Anatomically, the brain is deep. Understanding the ramifications of depth for learning in the brain requires a clear theory of deep learning. I develop the theory of gradient descent learning in deep linear neural networks, which gives exact quantitative answers to fundamental questions such as how learning speed scales with depth, how unsupervised pretraining speeds learning, and how internal representations change across a deep network. Several key hallmarks of deep learning are consistent with behavioral and neural observations. The theory can be further specialized for specific experimental paradigms. Taking visual perceptual learning as an example, I show that a deep learning theory accounts for neural tuning changes across the cortical hierarchy, and predicts behavioral performance transfer to untrained tasks as a function of task precision, restricted position training, and learning time. Together, these findings suggest that depth may be a key factor constraining learning dynamics in the brain. A better scientific understanding should eventually contribute to engineering advances, and I discuss one example from this work: a class of scaled, orthogonal initializations that permit rapid training of very deep nonlinear networks.
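The abstract names the initialization scheme without giving details. As a minimal sketch of one way a scaled, orthogonal initialization can be realized, the Python code below orthogonalizes a random Gaussian matrix with a QR decomposition and rescales it by a gain factor so that each layer begins as a scaled isometry. The function name scaled_orthogonal, the gain value of 1.1, and the example layer sizes are illustrative assumptions, not specifications taken from this work.

    import numpy as np

    def scaled_orthogonal(shape, gain=1.1, rng=None):
        # Orthogonalize a random Gaussian matrix via QR, then rescale by gain.
        # The gain nudges the singular values away from exactly 1 so that
        # activity neither decays nor explodes when propagated through many layers.
        rng = np.random.default_rng() if rng is None else rng
        rows, cols = shape
        a = rng.standard_normal((max(rows, cols), min(rows, cols)))
        q, r = np.linalg.qr(a)
        q = q * np.sign(np.diag(r))  # sign fix: uniform over orthogonal matrices
        if rows < cols:
            q = q.T
        return gain * q[:rows, :cols]

    # Illustrative deep network: initialize every weight matrix as a scaled isometry.
    layer_sizes = [784, 512, 512, 512, 10]
    weights = [scaled_orthogonal((fan_out, fan_in))
               for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:])]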