One fundamental class of dimensionality reduction algorithms is that of nonlinear embedding (or manifold learning) algorithms. Their goal is to find meaningful low-dimensional coordinates for a collection of N objects, such as images, for which pairwise similarities or distances are known. These algorithms have their roots in the multidimensional scaling (MDS) algorithms developed in the psychology literature before the 1950s, and they have seen widespread interest and application in machine learning over the last 20 years.
Embedding algorithms define an objective function of the low-dimensional coordinates of the N data objects, possibly subject to some constraints. When this objective is quadratic, it defines a spectral problem whose global solution is given by the extremal eigenvectors of a certain N×N matrix. This includes algorithms such as classical MDS, Isomap, LLE and Laplacian eigenmaps. When the objective is nonlinear, iterative numerical optimisation is necessary. This includes algorithms such as stochastic neighbour embedding (SNE), t-SNE and the elastic embedding.
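As a concrete illustration of the spectral case, the following minimal NumPy sketch implements classical MDS: the quadratic objective leads to an eigendecomposition of the double-centered matrix of squared pairwise distances, whose top eigenvectors give the global solution. (This is a textbook sketch, not code from the talk; the function name and interface are illustrative.)

```python
import numpy as np

def classical_mds(D, d=2):
    """Classical MDS: embed N objects, given the NxN matrix D of squared
    pairwise Euclidean distances, into d dimensions via an eigendecomposition."""
    N = D.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N   # centering matrix
    B = -0.5 * J @ D @ J                  # double-centered Gram matrix
    w, V = np.linalg.eigh(B)              # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:d]         # keep the top-d (extremal) eigenpairs
    scale = np.sqrt(np.maximum(w[idx], 0.0))
    return V[:, idx] * scale              # N x d low-dimensional coordinates

# Toy usage: three collinear points at 0, 1 and 3 on a line.
X = np.array([[0.0], [1.0], [3.0]])
D = (X - X.T) ** 2
Z = classical_mds(D, d=1)                 # recovers the line up to sign/shift
```

For Euclidean distance data this recovers the original configuration exactly (up to rotation and translation), which is why classical MDS needs no iterative optimisation.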
I will give a brief introduction to spectral and nonlinear embedding algorithms and then discuss optimisation approaches for both, with particular attention to large-scale problems. For spectral methods, these include landmark-based methods such as the Nyström formula, column sampling, locally linear landmarks (LLL) and variational Nyström, which are based on solving a reduced eigenproblem on a small set of landmark objects and then extrapolating the solution to all N data objects. For nonlinear methods, I will describe the spectral direction method, which uses a sparse positive definite Hessian approximation to “bend” the true gradient with the curvature of the spectral part of the objective. I will also describe the use of fast multipole methods, which allow us to scale the optimisation to millions of objects by approximating the gradient (which is quadratic in N if computed exactly) in O(N) time.
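The landmark idea above can be sketched with the plain Nyström formula: solve the eigenproblem on the L×L block of affinities between landmarks, then extrapolate the eigenvectors to all N points through their affinities to the landmarks. (A generic sketch under assumed notation, not the specific LLL or variational Nyström methods of the talk.)

```python
import numpy as np

def nystrom_embedding(W, landmarks, d=2):
    """Nystrom approximation: solve a small eigenproblem on the landmark
    affinity block and extrapolate the eigenvectors to all N points.
    W: NxN symmetric affinity matrix; landmarks: indices of L << N points."""
    W_LL = W[np.ix_(landmarks, landmarks)]  # L x L landmark-landmark block
    W_NL = W[:, landmarks]                  # N x L point-landmark affinities
    w, U = np.linalg.eigh(W_LL)             # reduced eigenproblem, cost O(L^3)
    idx = np.argsort(w)[::-1][:d]           # top-d eigenpairs of the block
    lam, U = w[idx], U[:, idx]
    return W_NL @ U / lam                   # Nystrom out-of-sample extrapolation

# Usage: Gaussian affinities on random 2D points, 10 landmarks out of 50.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2))
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-D2)
Z = nystrom_embedding(W, np.arange(10), d=2)
```

By construction the extrapolated eigenvectors coincide with the reduced-problem eigenvectors on the landmarks themselves; the cost is dominated by the small eigenproblem rather than the full N×N one.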
Finally, I will discuss the out-of-sample mapping problem for embeddings, that is, how to define a mapping that finds the low-dimensional projection of points not in the training set. I will describe two approaches: a nonparametric one, which makes no model assumptions, illustrated with the Laplacian Eigenmaps Latent Variable Model; and a parametric one, which uses a parametric mapping such as a neural net. For the latter, I will show how to construct an optimisation algorithm that learns the out-of-sample mapping from the training set by using auxiliary coordinates. The algorithm is universal in that it can easily be adapted to any combination of embedding objective function and parametric mapping.
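The simplest form of the nonparametric approach can be sketched as kernel-weighted averaging: a new point is projected to a weighted combination of the training embedding, with weights given by its affinities to the training points. (This is only a generic Nadaraya–Watson-style sketch with an assumed Gaussian kernel and bandwidth; the Laplacian Eigenmaps Latent Variable Model of the talk formulates this idea probabilistically.)

```python
import numpy as np

def out_of_sample(x_new, X_train, Z_train, sigma=1.0):
    """Map a new high-dimensional point x_new to the embedding space as a
    kernel-weighted average of the training embedding Z_train.
    X_train: N x D training inputs; Z_train: N x d training coordinates."""
    d2 = ((X_train - x_new) ** 2).sum(axis=1)   # squared distances to x_new
    w = np.exp(-0.5 * d2 / sigma**2)            # Gaussian kernel weights
    w /= w.sum()                                # normalise to sum to 1
    return w @ Z_train                          # d-dimensional projection

# Usage: a new point coinciding with a training point maps near its embedding.
X = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
Z = np.array([[0.0], [1.0], [2.0]])
z_new = out_of_sample(np.array([10.0, 0.0]), X, Z, sigma=0.5)
```

A small bandwidth makes the mapping interpolate the training embedding; a larger one smooths over neighbours, which is the usual bias-variance trade-off of nonparametric regression.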
Miguel Á. Carreira-Perpiñán is a professor in Electrical Engineering and Computer Science at the University of California, Merced. He received the degree of “licenciado en informática” (MSc in computer science) from the Technical University of Madrid in 1995 and a PhD in computer science from the University of Sheffield in 2001. Prior to joining UC Merced, he did postdoctoral work at Georgetown University (in computational neuroscience) and the University of Toronto (in machine learning), and was an assistant professor at the Oregon Graduate Institute (Oregon Health & Science University). He is the recipient of an NSF CAREER award, a Google Faculty Research Award and best (student) paper awards at AISTATS and Interspeech. He is an associate editor for the IEEE Transactions on Pattern Analysis and Machine Intelligence and has been an area chair for several machine learning, computer vision and data mining conferences. His research interests lie in machine learning, in particular unsupervised learning problems such as dimensionality reduction, clustering and denoising, with an emphasis on optimisation aspects, and with applications to speech processing (e.g. articulatory inversion and model adaptation), computer vision, sensor networks, information retrieval and other areas.