The human visual system consists of many areas, each of which represents different features or aspects of the visual world. Primary visual cortex represents simple image features, such as luminance edges and color contrast at particular image locations. Higher-order visual areas seem to represent the semantic categories present in a visual scene. However, the intermediate visual representations that bridge the conceptual gap between edges and semantic categories are comparatively poorly understood. In this talk, I will present a novel approach to modeling intermediate visual representations of objects and scenes. We record BOLD fMRI responses as human subjects watch realistic movies generated using 3D animation software. We then use the “ground truth” of the virtual world in the animation software (rather than the rendered stimulus pixels) to define the locations of boundary contours and the depth and orientation of 3D surfaces in the stimuli. We fit the boundary and surface parameters to the recorded BOLD data for each voxel, and use the resulting models to predict responses in a withheld portion of the data set. The distribution of prediction accuracy for the two models reveals separate representations of boundary contours (in lateral occipital cortex, V3A, and the Occipital Face Area) and background 3D surfaces (in the Parahippocampal Place Area, Occipital Place Area, and Retrosplenial Cortex).
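The fit-then-predict pipeline described above can be sketched as a voxelwise encoding model. This is a minimal illustration with synthetic data, not the talk's actual method: the abstract does not specify the regression or the accuracy metric, so ridge regression and Pearson correlation on held-out data are assumptions, and all array shapes and variable names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions, not the real stimuli or scans):
# X: per-timepoint feature vectors derived from the virtual world's ground
#    truth (e.g. boundary-contour or 3D-surface parameters).
# Y: BOLD responses for a set of voxels at the same timepoints.
n_train, n_test, n_feat, n_vox = 200, 50, 10, 5
X_train = rng.standard_normal((n_train, n_feat))
X_test = rng.standard_normal((n_test, n_feat))
true_w = rng.standard_normal((n_feat, n_vox))
Y_train = X_train @ true_w + 0.5 * rng.standard_normal((n_train, n_vox))
Y_test = X_test @ true_w + 0.5 * rng.standard_normal((n_test, n_vox))

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: one weight vector per voxel."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(k), X.T @ Y)

def prediction_accuracy(X, Y, W):
    """Pearson correlation between predicted and measured response, per voxel."""
    Y_hat = X @ W
    Yc = Y - Y.mean(axis=0)
    Hc = Y_hat - Y_hat.mean(axis=0)
    return (Yc * Hc).sum(axis=0) / (
        np.sqrt((Yc ** 2).sum(axis=0)) * np.sqrt((Hc ** 2).sum(axis=0))
    )

# Fit on one portion of the data, evaluate on the withheld portion.
W = fit_ridge(X_train, Y_train)
r = prediction_accuracy(X_test, Y_test, W)  # one accuracy value per voxel
```

Fitting separate models from boundary features and from surface features, then mapping each voxel's held-out prediction accuracy back onto cortex, is what lets the two feature classes be localized to different visual areas.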