What does it mean to perceive something? What does it mean to understand? A long-standing hypothesis about these phenomena stems from the brain’s ability to perform “analysis-by-synthesis”. Rather than merely computing a complex function from inputs to outputs, the analysis-by-synthesis approach suggests that the brain incorporates a generative model of the world. To perceive and understand is then to find a configuration of this generative model that explains the sensory inputs. The idea of analysis-by-synthesis has been around for decades, but its development has been held back by the difficulty of solving the inverse problem. Recently, we have developed a framework in which generative models are expressed through the multiplication of high-dimensional vector representations, so that inverting the generative model amounts to factorizing the vector representation. I will cover our recent Nature Machine Intelligence papers, which explain how we formulate generative models of simple visual scenes and use the resonator network to solve the inverse factorization problem. The resonator network is a new type of recurrent neural network architecture that uses the principle of search in superposition (together with built-in inductive biases) to solve factorization problems. In the context of simple scenes, the resonator network “attends” to one of the objects and decomposes it into properties such as shape, color, and location (the generative factors). Finally, I will discuss some of our new efforts to extend these algorithms to visually guided spatial navigation and mapping (as a model of the hippocampus), to generalized translation-invariant object detection and tracking, and to 3D visual perception.
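
As a rough illustration of the factorization idea (a minimal sketch, not code from the papers), the following NumPy example factorizes a single bound vector with a basic resonator-network loop. It assumes bipolar (+1/−1) codevectors and Hadamard-product binding; the codebook sizes, dimensionality, and variable names are illustrative choices, not details from the talk.

```python
import numpy as np

# Minimal resonator-network sketch: recover shape, color, and location
# codevectors from their elementwise (Hadamard) product.
rng = np.random.default_rng(0)
D = 2000                                  # vector dimensionality (illustrative)
sizes = {"shape": 10, "color": 10, "location": 20}

# Random bipolar codebooks: one codevector per possible value of each factor.
codebooks = {k: rng.choice([-1, 1], size=(n, D)) for k, n in sizes.items()}

# Compose a "scene" vector for one object by binding one randomly chosen
# codevector from each factor (binding = elementwise multiplication).
truth = {k: int(rng.integers(sizes[k])) for k in sizes}
s = np.prod([codebooks[k][truth[k]] for k in sizes], axis=0)

# Initialize each factor estimate as the superposition of its whole codebook:
# the network searches over all combinations "in superposition".
est = {k: np.sign(codebooks[k].sum(axis=0) + 1e-9) for k in sizes}

for _ in range(100):
    for k in sizes:
        # Unbind the other factors' current estimates (bipolar vectors are
        # their own inverses under Hadamard binding).
        others = [est[j] for j in sizes if j != k]
        unbound = s * np.prod(others, axis=0)
        # Clean up by projecting onto the codebook and re-thresholding.
        est[k] = np.sign(codebooks[k].T @ (codebooks[k] @ unbound) + 1e-9)

decoded = {k: int(np.argmax(codebooks[k] @ est[k])) for k in sizes}
print("recovered:", decoded, "ground truth:", truth)
```

In this toy setting the loop typically settles on the correct combination of factors because the cleanup step keeps each estimate close to its codebook while the unbinding step shares information between factors, which is the "search in superposition" principle mentioned above.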