Each waking moment, our brain is bombarded by sensory information, estimated to be in the range of hundreds of megabits/sec.  Somehow, we make sense of this data stream by extracting the forms of spatiotemporal structure embedded in it, and from this we build meaningful representations of objects, sounds, surface textures and so forth in the environment.  The overarching goal of research in my lab is to understand how this process occurs in the thalamocortical system.

The driving hypothesis behind our work is that the cortex essentially contains a probabilistic model of the environment, and that sensory information is interpreted and represented in terms of this model. Thus, much of our work focuses on building probabilistic models of natural images and constructing neural circuits capable of representing images in terms of these models (see review article). We also use psychophysical experiments, and neurophysiological experiments carried out through collaborations, to test the predictions of these models.
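One standard way to make "representing an image in terms of a probabilistic model" concrete is a linear generative model with a sparse prior, sketched below for illustration (the lab's specific models may differ in detail). The symbols are introduced here as assumptions: an image I is a weighted sum of basis functions φ_i with coefficients a_i plus Gaussian noise ν of variance σ², each coefficient has an independent sparse prior with cost function S, and the representation of a given image is the most probable set of coefficients.

```latex
\begin{aligned}
I(\mathbf{x}) &= \sum_i a_i\,\phi_i(\mathbf{x}) + \nu(\mathbf{x}),
  \qquad \nu \sim \mathcal{N}(0, \sigma^2) \\
P(\mathbf{a}) &= \prod_i \frac{1}{Z}\, e^{-\beta S(a_i)} \\
\hat{\mathbf{a}} &= \arg\max_{\mathbf{a}} P(\mathbf{a} \mid I, \Phi)
  = \arg\min_{\mathbf{a}} \Big[ \tfrac{1}{2\sigma^2} \big\| I - \Phi \mathbf{a} \big\|^2
  + \beta \sum_i S(a_i) \Big]
\end{aligned}
```

The last line follows from Bayes' rule, since the negative log of the Gaussian likelihood times the prior is the bracketed expression up to a constant. With S(a) = |a|, inference becomes an L1-penalized least-squares problem, which favors representations in which only a few coefficients are active.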

Sparse coding of spatiotemporal structure

In previous work we have shown (together with David Field) that when one seeks a sparse, linear decomposition of natural images, the basis functions that emerge are spatially localized, oriented, and bandpass (selective to structure at different spatial scales). We have extended this model to the time domain by describing images in terms of a set of time-varying coefficient signals that are convolved with a set of spatiotemporal basis functions. When the basis functions are optimized to produce sparse coefficient signals, they have the same spatial properties as before, except that they now translate as a function of time, similar to the space-time inseparable receptive fields of cortical simple cells. This suggests that the spatiotemporal receptive field properties of V1 neurons could be optimized to represent time-varying images in terms of sparse, spike-like events. Such a representation may facilitate the formation of associations and the detection of coincidences at later stages of processing. We are currently building a causal spike-coding model capable of representing time-varying images as a spike train in real time. The behavior of this model can then be directly tied to the actual dynamics of V1 neurons in response to natural images (see below). We are also applying this model to video streaming, with the hope of achieving lower bandwidth and higher perceptual quality than current methods. For further information see this chapter on the subject (or, for a shorter version, this ICIP paper).
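The convolutional model described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the model code itself: each movie frame is flattened into P pixels, each basis function spans L frames, the convolution is causal and truncated at the movie length, and sparse coefficient signals are inferred with ISTA (iterative shrinkage-thresholding) on an L1 penalty, which is one standard choice of sparse inference method. The function names (reconstruct, correlate, objective, infer_coefficients), the penalty weight lam, and the synthetic data are all hypothetical.

```python
import numpy as np

# Hypothetical helper names; an illustrative sketch, not the lab's model code.

def reconstruct(a, phi):
    """Causal convolutional generative model.

    a:   (N, T) time-varying coefficient signals
    phi: (N, P, L) spatiotemporal basis functions (P pixels, L frames)
    Returns a movie of shape (P, T) with
    movie[:, t] = sum_i sum_tau a[i, t - tau] * phi[i, :, tau].
    """
    N, T = a.shape
    _, P, L = phi.shape
    movie = np.zeros((P, T))
    for tau in range(min(L, T)):
        # An event a[i, s] contributes frame tau of basis function i at time s + tau.
        movie[:, tau:] += phi[:, :, tau].T @ a[:, :T - tau]
    return movie


def correlate(residual, phi):
    """Adjoint of reconstruct: g[i, s] = sum_tau phi[i, :, tau] . residual[:, s + tau]."""
    P, T = residual.shape
    N, _, L = phi.shape
    g = np.zeros((N, T))
    for tau in range(min(L, T)):
        g[:, :T - tau] += phi[:, :, tau] @ residual[:, tau:]
    return g


def objective(movie, a, phi, lam):
    """Half squared reconstruction error plus an L1 sparseness penalty."""
    r = movie - reconstruct(a, phi)
    return 0.5 * np.sum(r ** 2) + lam * np.sum(np.abs(a))


def infer_coefficients(movie, phi, lam=0.1, n_iter=500):
    """Sparse coefficient signals via ISTA (iterative shrinkage-thresholding)."""
    N, T = phi.shape[0], movie.shape[1]
    # Upper bound on the Lipschitz constant of the gradient of the error term.
    lip = sum(np.linalg.norm(phi[:, :, tau], 2) for tau in range(phi.shape[2])) ** 2
    step = 1.0 / lip
    a = np.zeros((N, T))
    for _ in range(n_iter):
        grad = -correlate(movie - reconstruct(a, phi), phi)
        z = a - step * grad
        a = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return a


# Usage on synthetic data: random basis functions and a few spike-like events.
rng = np.random.default_rng(0)
N, P, L, T = 4, 16, 3, 20
phi = rng.standard_normal((N, P, L))
a_true = np.zeros((N, T))
a_true.flat[rng.choice(N * T, size=5, replace=False)] = rng.uniform(1.0, 2.0, size=5)
movie = reconstruct(a_true, phi)
a_hat = infer_coefficients(movie, phi, lam=0.1)
```

Learning the basis functions themselves would alternate this inference step with a small gradient step on phi that reduces the reconstruction error; that step is omitted here to keep the sketch short.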

Adapting wavelet pyramid architectures to natural images

Wavelets are a popular technique for image coding. But most wavelet bases are designed by hand according to certain mathematical desiderata, which are only indirectly related to the types of structure occurring in natural scenes. Phil Sallee and I developed methods for adapting the basis functions of a wavelet pyramid directly to natural images. We have shown that the learned basis can achieve a higher degree of sparsity than standard wavelet bases. See our paper, or try out our code in Matlab. Phil's thesis shows how this method may be applied to the problems of denoising and super-resolution. (Phil is currently Sr. Consultant at Booz Allen Hamilton.)
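A sparsity comparison like this can be made concrete by measuring how concentrated a signal's energy is among its coefficients. The sketch below is a 1-D toy and does not reproduce the adaptive pyramid method or its Matlab code: it assumes an orthonormal Haar pyramid as a stand-in for a standard hand-designed wavelet basis, and uses the ratio of the L1 to the L2 norm as the sparsity measure (lower means sparser). The function names and the test signal are hypothetical; coefficients from a learned basis would be scored with the same measure.

```python
import numpy as np


def haar_pyramid(x, levels):
    """Orthonormal 1-D Haar wavelet pyramid (len(x) must be divisible by 2**levels).

    Returns all coefficients, coarsest lowpass band first.
    """
    approx = np.asarray(x, dtype=float)
    bands = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        bands.append((even - odd) / np.sqrt(2))  # bandpass (detail) coefficients
        approx = (even + odd) / np.sqrt(2)        # lowpass band passed to the next level
    bands.append(approx)
    return np.concatenate(bands[::-1])


def l1_l2_ratio(c):
    """Sparsity measure: small when energy is concentrated in few coefficients."""
    c = np.asarray(c, dtype=float)
    return np.sum(np.abs(c)) / np.linalg.norm(c)


# A piecewise-constant toy signal, scored in the pixel basis and in the Haar basis.
x = np.repeat([1.0, 3.0, -2.0, 0.5], 16)
c = haar_pyramid(x, levels=6)
print(l1_l2_ratio(x), l1_l2_ratio(c))
```

Because the transform is orthonormal, the L2 norm (total energy) is the same in both bases, so the lower ratio for the Haar coefficients reflects only how that energy is distributed.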

Responses of V1 neurons to natural images

While much is known about how visual neurons respond to simple stimuli such as bars and gratings, very little is known about how neurons actually encode the structures in natural scenes during free viewing. Together with Dr. Charles Gray at Montana State University, Bozeman, we are analyzing the responses of V1 neurons recorded while monkeys freely view natural scenes (see our SFN poster). Beyond characterizing how neurons actually respond, we will also test the predictions of the sparse coding model above.

Shape representation in human visual cortex - fMRI

Scott Murray and I investigated how different areas in human visual cortex participate in shape perception. Scott showed that as higher areas such as LOC become active in response to a coherent object, activity in lower areas such as V1 is reduced, even though the same elementary features are present in the image. The reduction of activity in V1 is consistent with predictive coding models, which postulate that cortico-cortical feedback pathways carry the predictions of higher-level areas to lower-level areas, and that activity in a lower area is reduced to the extent that the higher area can "explain it away." Read the paper. (Scott is currently Asst. Professor in the Dept. of Psychology at University of Washington.)
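The "explaining away" idea can be illustrated with a toy two-level linear model in the spirit of predictive coding. This is a sketch under simplifying assumptions, not the model tested in the paper: a higher level with a few causes h sends the prediction U h back to a lower level, the lower level carries only the residual x minus U h, and h is adjusted by gradient descent to shrink that residual. The weights U, the "coherent" input (one the higher level can represent) and the "scrambled" input (the same elementary values rearranged) are hypothetical.

```python
import numpy as np


def settle(x, U, n_iter=1000):
    """Let the higher level explain input x; return its causes h and the lower-level residual."""
    lr = 1.0 / np.linalg.norm(U, 2) ** 2  # step size 1/L guarantees convergence on this quadratic
    h = np.zeros(U.shape[1])
    for _ in range(n_iter):
        err = x - U @ h        # lower-level activity: what feedback has not yet explained
        h += lr * (U.T @ err)  # higher level updated by the feedforward error signal
    return h, x - U @ h


rng = np.random.default_rng(0)
U = rng.standard_normal((20, 3))             # 20 lower-level units, 3 higher-level causes
x_coherent = U @ np.array([1.0, -0.5, 2.0])  # an input the higher level can fully predict
x_scrambled = rng.permutation(x_coherent)    # same elementary values, rearranged

for name, x in [("coherent", x_coherent), ("scrambled", x_scrambled)]:
    _, residual = settle(x, U)
    print(name, np.linalg.norm(residual) / np.linalg.norm(x))
```

Only the coherent input is fully accounted for by the feedback prediction, so its lower-level residual falls to essentially zero while the scrambled input's does not, a very simplified analogue of reduced V1 activity when a higher area recognizes the object.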

Time course of object recognition and visual completion - EEG

Characterizing the time course of object recognition is important for constraining computational models of the visual system, especially those that rely on feedback loops to resolve ambiguity at lower levels of representation. Many studies seem to suggest that object recognition occurs extremely quickly. For example, EEG components corresponding to the presence of a particular object can be seen as early as 150 ms after image onset, which many have interpreted as evidence against heavy use of feedback. My student Jeff Johnson investigated the time course of object recognition using a combination of EEG and reaction-time measures. Jeff showed that the early components in the EEG waveform that were previously attributed to object recognition may instead be due to featural differences between the images, rather than reflecting object categorization per se (see our JOV paper). Jeff also examined the time course of visual completion under conditions of occlusion (see paper).
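Claims about how early a category effect appears rest on estimating the onset of a difference between averaged EEG waveforms. The sketch below shows one simple onset criterion on synthetic data: the first time at which the difference between two condition-averaged waveforms exceeds a fixed threshold for several consecutive samples. The threshold, the minimum run length, the function name, and the synthetic signal are all assumptions for illustration, not the analysis used in the JOV paper. A difference found this way cannot by itself distinguish categorization from low-level featural differences between the image sets.

```python
import numpy as np


def difference_onset(erp_a, erp_b, times_ms, threshold, min_run=3):
    """First time at which |mean(erp_a) - mean(erp_b)| exceeds threshold for min_run samples.

    erp_a, erp_b: (trials, samples) single-trial waveforms for two conditions.
    Returns None if no such run exists.
    """
    diff = erp_a.mean(axis=0) - erp_b.mean(axis=0)
    above = np.abs(diff) > threshold
    for i in range(len(above) - min_run + 1):
        if above[i:i + min_run].all():
            return times_ms[i]
    return None


# Synthetic data: condition A carries an extra deflection peaking near 170 ms.
rng = np.random.default_rng(0)
times_ms = np.arange(0, 400, 4)
bump = 3.0 * np.exp(-((times_ms - 170) ** 2) / (2 * 20 ** 2))
erp_a = bump + rng.standard_normal((50, times_ms.size))
erp_b = rng.standard_normal((50, times_ms.size))
print(difference_onset(erp_a, erp_b, times_ms, threshold=1.0))
```

Requiring several consecutive supra-threshold samples guards against a single noisy sample being read as an onset; the estimated latency still depends on the threshold and on the trial count, which sets the noise level of the averages.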

This research is currently supported by grants from NSF, NGA, and CIFAR.