The traditional deep Boltzmann machine training algorithm requires a greedy layerwise pretraining phase. Existing techniques for avoiding greedy pretraining do not perform as well for classification as the layerwise method. I show that second-order optimization methods applied to a deterministic training criterion can obtain better classification performance than the existing joint training methods.