A central goal of vision science is to understand the principles underlying the perception and cortical encoding of the complex visual environment of our everyday experience. In the visual cortex, foundational work with artificial stimuli, and more recent work combining natural images and deep convolutional neural networks, have revealed much about the tuning of cortical neurons to specific image features. However, a major limitation of this existing work is its focus on single-neuron response strength to isolated images. First, during natural vision, the inputs to cortical neurons are not isolated but rather embedded in a spatial and temporal context. Second, the full structure of population activity—including the substantial trial-to-trial variability that is shared among neurons—determines encoded information and, ultimately, perception.
In the first part of this talk, I will argue for a normative approach to study encoding of natural images in primary visual cortex (V1), which combines a detailed understanding of the sensory inputs with a theory of how those inputs should be represented. Specifically, we hypothesize that V1 response structure serves to approximate a probabilistic representation optimized to the statistics of natural visual inputs, and that contextual modulation is an integral aspect of achieving this goal. I will present a concrete computational framework that instantiates this hypothesis, and data recorded using multielectrode arrays in macaque V1 to test its predictions. In the second part, I will discuss how we leveraged this framework to develop deep probabilistic algorithms for natural image segmentation, and test them with novel experimental measurements of human perceptual segmentation maps.