Visual Cortical Processing: Image to Object Representation
Rüdiger von der Heydt
Image understanding is often conceived as a hierarchical process with many levels, where the complexity and invariance of object selectivity gradually increase with level in the hierarchy. In contrast, neurophysiological studies have shown that figure-ground organization and border-ownership coding, which imply understanding of the object structure of an image, occur at levels as low as V1 and V2 of the visual cortex. This cannot be the result of back-projections from object recognition centers in the inferotemporal cortex, because border-ownership signals appear well before shape-selective responses emerge in inferotemporal cortex. Ultra-fast border-ownership signals have been found not only for simple figure displays, but also for complex natural scenes. This talk will review the hypothesis, and the neurophysiological evidence, that the brain uses dedicated grouping mechanisms early on to link elementary features into larger entities we might call “proto-objects.” This process is pre-attentive and does not rely on object recognition. The proposed mechanism consists of grouping cells that sum distributed feature signals through fixed templates and, by feedback, enhance those same feature signals. With this circuit, the system can enhance many feature signals by activating a single grouping cell top-down. The shapes and sizes of the grouping templates, and the rise and persistence of grouping-cell activity, give rise to the Gestalt laws of object perception. The proto-object structures serve to individuate objects and provide permanence; they enable the system to track moving objects and cope with the displacements caused by eye movements, to select one object out of many, and to scrutinize the selected object.
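The grouping circuit described above can be illustrated with a toy sketch (a minimal illustration with assumed dimensions and gains, not the author's implementation): grouping cells sum distributed feature signals through fixed templates, and top-down activation of a single grouping cell feeds back to enhance all the feature signals it pools.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features, n_groups = 12, 3
features = rng.random(n_features)            # bottom-up feature signals

# Fixed grouping templates: each grouping cell pools a subset of features.
templates = np.zeros((n_groups, n_features))
for g in range(n_groups):
    templates[g, g * 4:(g + 1) * 4] = 1.0

# Feedforward: each grouping cell sums the feature signals it pools.
grouping = templates @ features

# Top-down selection: activating one grouping cell enhances, via feedback,
# every feature signal covered by its template (0.5 is an assumed gain).
attended = np.argmax(grouping)
gain = 1.0 + 0.5 * templates[attended]       # multiplicative feedback gain
enhanced = gain * features
```

The key property the sketch shows is the fan-out of attention: a single top-down signal to one grouping cell modulates many distributed feature signals at once.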
Sparse Deep Predictive Coding: a model of visual perception
Building models that efficiently represent images is a central problem in the machine learning community. The brain, and especially the visual cortex, long ago found economical and robust solutions to this problem. At the local scale, Sparse Coding is one of the most successful frameworks for modeling neural computation in the visual cortex. It derives directly from the efficient coding hypothesis, and can be thought of as a competitive mechanism that describes a visual stimulus using a limited number of neurons. At the structural scale, Predictive Coding theory has been proposed to model the interconnection between cortical layers using feedforward and feedback connections.
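The competitive mechanism of Sparse Coding can be made concrete with a standard inference algorithm (a sketch using ISTA over a random dictionary; the dictionary, penalty, and sizes are illustrative assumptions, not the speaker's code). Minimizing the reconstruction error plus an L1 penalty leaves only a few "neurons" (coefficients) active.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink values toward zero; this is what silences most coefficients."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(x, D, lam=0.1, n_iter=200):
    """Infer a sparse code a minimizing 0.5*||x - D a||^2 + lam*||a||_1."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)           # gradient of the reconstruction error
        a = soft_threshold(a - grad / L, lam / L)
    return a

rng = np.random.default_rng(0)
D = rng.standard_normal((16, 32))
D /= np.linalg.norm(D, axis=0)             # unit-norm dictionary atoms
a_true = np.zeros(32)
a_true[[3, 17]] = [1.0, -0.5]              # a stimulus built from two atoms
x = D @ a_true
a_hat = ista(x, D)
n_active = int(np.sum(np.abs(a_hat) > 1e-3))
```

The overcomplete dictionary (32 atoms for a 16-dimensional input) is what makes the competition meaningful: many atoms could explain the input, but the L1 penalty forces a small subset to win.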
The presentation introduces a model combining Sparse Coding and Predictive Coding in a hierarchical, convolutional architecture. Our model, called Sparse Deep Predictive Coding (SDPC), was trained on several challenging databases, including faces and natural images. The SDPC allows us to analyze the impact of recurrent processing at both the neural-organization and perceptual levels. At the neural-organization level, the feedback signal of the model accounted for a reorganization of V1 association fields that promotes contour integration. At the perceptual level, the SDPC exhibited significant denoising ability, highly correlated with the strength of the feedback from V2 to V1. The SDPC demonstrates that neuro-inspiration might be the right path toward more powerful and more robust computer vision algorithms.
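The interplay of feedforward and feedback connections in a predictive-coding hierarchy can be sketched with a schematic two-layer loop (assumed linear layers and hand-picked constants, in the spirit of but not identical to the published SDPC, which is convolutional and sparse): each layer's state is driven both by the bottom-up prediction error it must explain and by the top-down prediction from the layer above.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(8)                       # input "image"
W1 = rng.standard_normal((8, 6)) * 0.3  # layer-1 dictionary (predicts the input)
W2 = rng.standard_normal((6, 4)) * 0.3  # layer-2 dictionary (predicts layer 1)
z1, z2 = np.zeros(6), np.zeros(4)
lr, k = 0.1, 0.05                       # step size and feedback strength (assumed)

for _ in range(100):
    e0 = x - W1 @ z1                    # prediction error at the input
    e1 = z1 - W2 @ z2                   # prediction error between layers
    z1 += lr * (W1.T @ e0 - k * e1)     # bottom-up drive plus top-down feedback
    z2 += lr * (W2.T @ e1)              # layer 2 explains layer 1's activity
```

Setting k to zero removes the feedback term and reduces the loop to independent per-layer reconstruction, which is the comparison that isolates the effect of recurrence in experiments like the denoising analysis above.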
Probabilistic computation in natural vision
A central goal of vision science is to understand the principles underlying the perception and cortical encoding of the complex visual environment of our everyday experience. In the visual cortex, foundational work with artificial stimuli, and more recent work combining natural images and deep convolutional neural networks, have revealed much about the tuning of cortical neurons to specific image features. However, a major limitation of this existing work is its focus on single-neuron response strength to isolated images. First, during natural vision, the inputs to cortical neurons are not isolated but rather embedded in a spatial and temporal context. Second, the full structure of population activity—including the substantial trial-to-trial variability that is shared among neurons—determines encoded information and, ultimately, perception.
In the first part of this talk, I will argue for a normative approach to study encoding of natural images in primary visual cortex (V1), which combines a detailed understanding of the sensory inputs with a theory of how those inputs should be represented. Specifically, we hypothesize that V1 response structure serves to approximate a probabilistic representation optimized to the statistics of natural visual inputs, and that contextual modulation is an integral aspect of achieving this goal. I will present a concrete computational framework that instantiates this hypothesis, and data recorded using multielectrode arrays in macaque V1 to test its predictions. In the second part, I will discuss how we leveraged this framework to develop deep probabilistic algorithms for natural image segmentation, and test them with novel experimental measurements of human perceptual segmentation maps.