(Joint work with Gil Kur and Boaz Nadler)
According to Huber (1985), the notion of projection pursuit first appeared in the work of J.B. Kruskal (1969, 1972) and was implemented and popularized by Friedman and Tukey (1974). Key papers are Huber (1985) and Diaconis and Freedman (1984). Crudely, following Huber, the notion is that a p-dimensional sample can be studied, in ways relatable to almost all of multivariate analysis, through its projections onto one- or higher-dimensional spaces of low dimension. The basic idea is to hunt for non-Gaussian projections.
Why?
1. Diaconis and Freedman showed (in much greater generality) that if the coordinates are iid and in L_2, then as p and n → ∞ the marginal distributions for ALMOST all projections are asymptotically Gaussian, and if p is fixed or p/n → 0, ALL empirical distributions of projections have limiting Gaussian distributions. Thus non-Gaussian empirical distributions suggest interesting nonlinear phenomena.
2. The ICA model makes sense iff all but at most one of the sources are non-Gaussian. Hence, the FastICA algorithm has as a basic ingredient finding the projection whose empirical distribution has maximal kurtosis, a standard measure of non-Gaussianity.
In the current era it is common to have p >> n, or at least p/n → c > 0. What then?
We show that, if p/n → ∞, then given any distribution F, a projection can be found whose empirical distribution function is arbitrarily close to F. If p/n → c > 0, some non-Gaussian F are attainable, but others are not. In these regimes it seems plausible that, if the signal-to-noise ratio is low, projection pursuit can be a dubious guide to structure, and FastICA and other such algorithms can be unstable.
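The p >> n phenomenon can be illustrated numerically. The following is a minimal sketch, not the argument of the talk: it assumes pure Gaussian noise data and uses the elementary fact that an n x p matrix X of full row rank (p >= n) lets one solve X w = y for any target vector y, so the projected sample can be made to reproduce any prescribed empirical distribution up to scale. The function names, the sample sizes and the two-point target are illustrative choices, and the direction is not normalized to unit length.

```python
import numpy as np

def excess_kurtosis(v):
    """Sample excess kurtosis m4 / m2**2 - 3 (equal to 0 for a Gaussian)."""
    c = v - v.mean()
    m2 = np.mean(c ** 2)
    m4 = np.mean(c ** 4)
    return m4 / m2 ** 2 - 3.0

def direction_for_target(X, target):
    """Minimum-norm w with X @ w = target (exact if X has full row rank)."""
    w, *_ = np.linalg.lstsq(X, target, rcond=None)
    return w

rng = np.random.default_rng(0)
n, p = 50, 2000                       # illustrative sizes with p >> n
X = rng.standard_normal((n, p))       # pure noise: no structure at all

# Target sample: a symmetric two-point distribution on {-1, +1},
# about as far from Gaussian as possible (excess kurtosis -2).
target = np.where(np.arange(n) < n // 2, -1.0, 1.0)

w_fit = direction_for_target(X, target)
proj_fit = X @ w_fit
max_err = np.max(np.abs(proj_fit - target))
kurt_fit = excess_kurtosis(proj_fit)

# For comparison: a random unit direction, whose projection is Gaussian.
w_rand = rng.standard_normal(p)
w_rand /= np.linalg.norm(w_rand)
kurt_rand = excess_kurtosis(X @ w_rand)

print("max deviation from target:", max_err)
print("excess kurtosis, fitted direction:", kurt_fit)
print("excess kurtosis, random direction:", kurt_rand)
```

In this sketch the fitted direction reproduces the two-point sample to numerical precision even though the data contain no signal, which is the sense in which a kurtosis-based search can find "structure" that is not there when p is large relative to n.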