Research
Each waking moment, our brain is bombarded by sensory
information, estimated to be in the range of hundreds of
megabits/sec. Somehow, we make sense of this data stream by
extracting the spatiotemporal structure embedded in it, and
from this we build meaningful representations of the objects, sounds,
surface textures and so forth in the environment. The
overarching goal of research in my lab is to understand how this
process occurs in the thalamocortical system.
The driving hypothesis behind our work is that the cortex
essentially contains a probabilistic model of the environment, and
that sensory information is interpreted and represented in terms of
this model. Thus, much of our work focuses on building probabilistic models of natural images, and
constructing neural circuits capable of representing images in terms
of these models (see review
article). We also use psychophysical experiments and, through
collaborations, neurophysiological experiments to test the
predictions of these models.
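To make the hypothesis concrete, here is a toy illustration (not the lab's actual model) of what "interpreting sensory information in terms of a probabilistic model" means: given a prior over world states and a likelihood model for a noisy measurement, perception amounts to computing the posterior by Bayes' rule. The two states, the prior, and the Gaussian response models below are all hypothetical choices for the sketch.

```python
import math

def gaussian(mean, sigma):
    """Return a Gaussian likelihood function (assumed response model)."""
    return lambda x: math.exp(-(x - mean) ** 2 / (2 * sigma ** 2)) \
        / (sigma * math.sqrt(2 * math.pi))

def posterior(prior, likelihood, observation):
    """Compute P(state | observation) by Bayes' rule."""
    unnorm = {s: prior[s] * likelihood[s](observation) for s in prior}
    z = sum(unnorm.values())
    return {s: p / z for s, p in unnorm.items()}

# Hypothetical setup: an "edge present"/"edge absent" world state and a
# noisy filter response; the prior favors "no edge" a priori.
prior = {"edge": 0.2, "no_edge": 0.8}
likelihood = {"edge": gaussian(1.0, 0.5),
              "no_edge": gaussian(0.0, 0.5)}

# A strong response shifts belief toward "edge" despite the prior.
print(posterior(prior, likelihood, observation=0.9))
```

The point of the sketch is only the structure of the computation: the interpretation of the data is determined jointly by the internal model (prior and likelihood) and the measurement, not by the measurement alone.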
Sparse coding of spatiotemporal
structure
In previous work we have shown (together with David Field) that when
one seeks a sparse, linear decomposition of natural images, the basis
functions that emerge are spatially localized, oriented, and bandpass
(selective to structure at different spatial scales). We have
extended this model to the time domain by describing images in terms
of a set of time-varying coefficient signals that are convolved with
a set of spatiotemporal basis functions. When the basis functions
are optimized to produce sparse coefficient signals, they retain the
same spatial properties but now translate over time, similar to the
space-time inseparable receptive fields of cortical
simple cells. This suggests that the spatiotemporal
receptive field properties of V1 neurons could be optimized to
represent time-varying images in terms of sparse, spike-like
events. This may facilitate the formation of associations and
detection of coincidences at later stages of processing. We
are currently building a causal spike-coding model capable of
representing time-varying images as a spike train in real time.
The behavior of this model can then be directly tied to the actual
dynamics of V1 neurons in response to natural images (see
below). We are also applying this model to video
streaming, with the hope of achieving lower bandwidth and
higher perceptual quality than current methods. For further
information, see this chapter on the subject (or, for a shorter
version, this ICIP paper).
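The static version of the sparse coding idea can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the lab's code: given a fixed dictionary of basis functions, infer coefficients that reconstruct a signal while keeping most coefficients exactly zero. It uses ISTA (iterative soft-thresholding) to minimize the reconstruction error plus an L1 sparsity penalty; the tiny 4-sample "image" and three basis vectors are hypothetical.

```python
def sparse_code(phi, signal, lam=0.1, step=0.1, iters=500):
    """Infer sparse coefficients a such that signal ~ sum_i a[i]*phi[i].

    phi: list of basis vectors; signal: target vector.
    Minimizes 0.5*||signal - Phi a||^2 + lam*||a||_1 via ISTA.
    """
    n = len(phi)
    a = [0.0] * n
    for _ in range(iters):
        # residual r = signal - Phi a
        recon = [sum(phi[i][j] * a[i] for i in range(n))
                 for j in range(len(signal))]
        r = [s - c for s, c in zip(signal, recon)]
        for i in range(n):
            # gradient step on the reconstruction error...
            g = sum(phi[i][j] * r[j] for j in range(len(signal)))
            a[i] += step * g
            # ...then soft-thresholding, which drives small coefficients
            # exactly to zero -- the sparsity-inducing step
            a[i] = max(abs(a[i]) - step * lam, 0.0) * (1 if a[i] > 0 else -1)
    return a

# Hypothetical overcomplete dictionary: two "localized" basis vectors
# and one broad one; the signal matches the first basis vector.
phi = [[1, 0, 0, 0], [0, 1, 0, 0], [0.5, 0.5, 0.5, 0.5]]
signal = [1.0, 0.0, 0.0, 0.0]
print(sparse_code(phi, signal))  # nearly all weight on the first coefficient
```

In the full model the dictionary itself is also learned (by adapting the basis functions to maximize sparseness over an ensemble of natural images), and the basis functions become spatiotemporal; the sketch above shows only the coefficient-inference step.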
Adapting wavelet pyramid architectures
to natural images
Wavelets are a popular technique for image coding, but most wavelet
bases are designed by hand according to certain mathematical desiderata,
which are only indirectly related to the types of structure occurring in
natural scenes. Phil Sallee
and I developed methods for adapting the basis functions of a wavelet pyramid
directly to natural images. We have shown that the learned basis can
achieve a higher degree of sparsity compared to standard wavelet bases.
Look at our paper or try out
our code in Matlab. Phil's thesis shows
how this method may be applied to the problems of denoising and super-resolution.
(Phil is currently Sr. Consultant at Booz Allen Hamilton.)
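To see what "sparsity of a wavelet representation" means concretely, here is a hedged sketch using a standard fixed 1-D Haar basis (not the learned bases from the paper): signals with piecewise-smooth structure produce many detail coefficients that are exactly zero, and a basis better matched to the signal ensemble yields more of them.

```python
def haar_step(x):
    """One level of the 1-D Haar transform: pairwise averages and details."""
    avg = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    det = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    return avg, det

def haar(x):
    """Full Haar pyramid: all detail coefficients plus the overall mean.

    Assumes len(x) is a power of two.
    """
    coeffs = []
    while len(x) > 1:
        x, det = haar_step(x)
        coeffs.extend(det)
    coeffs.extend(x)
    return coeffs

# Hypothetical piecewise-constant "image row": one step edge.
signal = [1.0] * 8 + [5.0] * 8
coeffs = haar(signal)
n_zero = sum(1 for c in coeffs if abs(c) < 1e-9)
print(n_zero, "of", len(coeffs), "coefficients are zero")  # 14 of 16
```

The adapted bases in the paper go further: rather than fixing the analysis filters by hand as above, the basis functions themselves are learned from natural images so that the resulting coefficient distributions are as sparse as possible.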
Responses of V1 neurons to natural
images
While much is known about how visual neurons respond to simple stimuli
such as bars and gratings, very little is known about how neurons actually
encode the structures in natural scenes during free viewing. Together
with Dr.
Charles Gray at Montana State University, Bozeman, we are analyzing
the responses of V1 neurons recorded while monkeys freely view natural scenes
(see our SFN poster). In addition
to simply characterizing how neurons actually respond, we will also be testing
the predictions of the sparse coding model above.
Shape representation in human
visual cortex - fMRI
Scott Murray and I
investigated how different areas in human visual cortex participate during
shape perception. Scott showed that as higher areas such as LOC become
active in response to a coherent object, activity in lower areas such as
V1 is reduced, even though the same elementary features are present in the
image. The reduction of activity in V1 is consistent with predictive
coding models which postulate that cortico-cortical feedback pathways are
carrying the predictions of higher-level areas to lower-level areas, and
that to the extent the higher area can "explain away" activity in the lower
area it is reduced. Read the paper. (Scott is currently Asst. Professor in the
Dept. of Psychology at University of Washington.)
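The "explaining away" logic can be sketched in a few lines. This is a minimal toy illustration of the predictive-coding account, not the model fit to the fMRI data: a higher area sends a prediction down a feedback pathway, and the lower area carries only the residual, so lower-area activity drops when the prediction is good, even though the input features are identical. The feature vector and prediction values below are hypothetical.

```python
def lower_area_activity(inputs, prediction):
    """Residual activity after subtracting the top-down prediction."""
    return [x - p for x, p in zip(inputs, prediction)]

# The same elementary features are present in both conditions.
features = [1.0, 1.0, 1.0, 1.0]

# No coherent object: the higher area predicts nothing, so nothing is
# explained away. Coherent object: a good top-down prediction arrives.
no_object = lower_area_activity(features, [0.0] * 4)
with_object = lower_area_activity(features, [0.9] * 4)

print(sum(abs(a) for a in no_object))    # high lower-area activity
print(sum(abs(a) for a in with_object))  # reduced activity
```

The essential prediction matches the fMRI result: identical bottom-up input, but less activity in the lower area whenever the higher area can account for it.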
Time course of object recognition
and visual completion - EEG
Characterizing the timecourse of object recognition is important for
providing constraints on computational models of the visual system, especially
those that require the use of feedback loops to resolve ambiguity at lower
levels of representation. Many studies seem to suggest that object
recognition occurs extremely quickly. For example, one can see components
in the EEG as early as 150 ms after image onset corresponding to the presence
of a particular object, which many have interpreted as evidence against heavy
use of feedback. My student Jeff Johnson investigated the timecourse
of object recognition using a combination of EEG and reaction-time measures.
Jeff showed that the early components in the EEG waveform that were previously
attributed to object recognition may be due to featural differences in the
images rather than reflecting object categorization per se (see our JOV paper). Jeff also
examined the timecourse of visual completion under conditions of occlusion
(see paper).
This research is currently supported by grants from NSF, NGA,
and CIFAR.