Image understanding is often conceived as a hierarchical process with many levels, in which the complexity and invariance of object selectivity gradually increase with each level of the hierarchy. In contrast, neurophysiological studies have shown that figure-ground organization and border-ownership coding, which imply an understanding of the object structure of an image, occur at levels as low as V1 and V2 of the visual cortex. This cannot be the result of back-projections from object recognition centers in the inferotemporal cortex, because border-ownership signals appear well before shape-selective responses emerge there. Ultra-fast border-ownership signals have been found not only for simple figure displays, but also for complex natural scenes. This talk will review the hypothesis, and the neurophysiological evidence, that the brain uses dedicated grouping mechanisms early on to link elementary features into larger entities we might call “proto-objects.” This process is pre-attentive and does not rely on object recognition. The proposed mechanism consists of grouping cells that sum distributed feature signals with fixed templates and, by feedback, enhance the same feature signals. With this circuit, the system can enhance many feature signals at once by top-down activation of a single grouping cell. The shapes and sizes of the grouping templates and the rise and persistence of grouping-cell activity give rise to the Gestalt laws of object perception. The proto-object structures serve to individuate objects and provide permanence; they enable the system to track moving objects and cope with the displacements caused by eye movements, to select one object out of many, and to scrutinize the selected object.
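
The grouping circuit described above can be illustrated with a toy sketch. It is not the model presented in the talk: the function names, the one-dimensional feature layout, the binary template, and the multiplicative feedback gain are all assumptions chosen only to show how activating a single grouping cell can enhance many feature signals at once.

```python
# Toy sketch of a grouping circuit. All names and numbers are illustrative
# assumptions, not the model from the talk.

def grouping_response(features, template):
    """Grouping cell: sum distributed feature signals, weighted by a fixed template."""
    return sum(w * f for w, f in zip(template, features))

def feedback_enhance(features, template, activity, gain=0.5):
    """Feedback: enhance the same feature signals the template sums over.

    Multiplicative modulation is an assumption made for this sketch.
    """
    return [f * (1.0 + gain * activity * w) for f, w in zip(features, template)]

# Eight feature signals (for example, local edge responses), all equally strong.
features = [1.0] * 8

# Fixed template covering the first four positions (a hypothetical proto-object).
template = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]

# Bottom-up drive to the grouping cell from the features it covers.
bottom_up = grouping_response(features, template)

# Top-down activation of the single grouping cell (assumed unit activity)
# enhances every feature inside the template and leaves the rest unchanged.
enhanced = feedback_enhance(features, template, activity=1.0)

print(bottom_up)
print(enhanced)
```

In this sketch the four features covered by the template are boosted together while the other four are untouched, which is the sense in which one grouping cell can select a whole proto-object.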