Neural representations for predictive processing of dynamic visual signals

Pierre-Etienne Helley Fiquet

New York University
Wednesday, May 31, 2023 at 12:00pm
Evans Hall Room 560 and via Zoom

All organisms make temporal predictions, and their evolutionary fitness generally scales with the accuracy of these predictions. In the context of visual perception, observer motion and continuous deformations of objects and textures structure the dynamics of visual signals, which allows for partial prediction of future inputs from past ones. Here, we propose a self-supervised representation-learning framework that reveals and exploits the regularities of natural videos to compute accurate predictions. The architecture is motivated by the Fourier shift theorem and its group-theoretic generalization, and is optimized for next-frame prediction. Through controlled experiments, we demonstrate that this approach can discover representations of simple transformation groups acting on data. When trained on natural video datasets, our framework achieves better prediction performance than traditional motion compensation and conventional deep networks, while maintaining interpretability and speed. Furthermore, we implement this framework using normalized simple and direction-selective complex cell-like units, which are the elements commonly used to describe the computations of primate V1 neurons. These results highlight the potential of a principled video processing framework in elucidating how the visual system transforms sensory inputs into representations suitable for temporal prediction.
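To illustrate the principle the abstract appeals to: by the Fourier shift theorem, a translation of a signal multiplies each Fourier coefficient by a complex phase, so for a uniformly translating signal the phase advance between two consecutive frames can simply be reapplied to extrapolate the next frame. The sketch below is a minimal 1-D illustration of this idea, not the talk's learned framework (which generalizes beyond global shifts); all variable names are illustrative.

```python
import numpy as np

# A 1-D "video": a fixed pattern translating circularly by 3 samples per frame.
n = 128
base = np.random.default_rng(0).standard_normal(n)
frame = lambda t: np.roll(base, 3 * t)

# Fourier coefficients of two consecutive frames.
f0 = np.fft.fft(frame(0))
f1 = np.fft.fft(frame(1))

# Per-frequency phase advance between the frames. For a pure shift s,
# this equals exp(-2*pi*i*k*s/n) at each frequency k (Fourier shift theorem).
phase_step = np.exp(1j * np.angle(f1 * np.conj(f0)))

# Predict the next frame by applying the same phase advance once more.
pred2 = np.fft.ifft(f1 * phase_step).real
err = np.abs(pred2 - frame(2)).max()  # near machine precision for this toy case
```

For an exact global shift the prediction is essentially perfect; natural videos contain many local, non-rigid deformations, which is what motivates the group-theoretic generalization and learned representation described in the talk.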