Projection pursuit in high dimensions

Peter Bickel

Dept. of Statistics, UC Berkeley
Wednesday, October 17, 2018 at 12:00pm
560 Evans

(Joint work with Gil Kur and Boaz Nadler)

The notion of projection pursuit, according to Huber (1985), first appeared in the work of J.B. Kruskal (1969, 1972) and was implemented and popularized by Friedman and Tukey (1974). Key papers are Huber (1985) and Diaconis and Freedman (1984). The notion, crudely, according to Huber, is that a p-dimensional sample can be studied in various ways, relatable to almost all of multivariate analysis, through one- or higher-dimensional projections of the sample into low-dimensional spaces. The basic idea is to hunt for non-Gaussian projections.

Why?

1. Diaconis and Freedman showed (in much greater generality) that if the coordinates are i.i.d. and in L_2, then as p and n -> infinity the marginal distributions for ALMOST all projections are asymptotically Gaussian, and if p is fixed or p/n -> 0, ALL empirical distributions of projections have limiting Gaussian distributions. Thus non-Gaussian empirical distributions suggest interesting nonlinear phenomena.

2. The ICA model makes sense iff all but one of the sources are non-Gaussian. Hence, the FASTA algorithm has as a basic ingredient finding the projection whose empirical distribution has maximal kurtosis, a standard measure of non-Gaussianity.

In the current era it is common to have p >> n, or at least p/n -> c > 0. What then?

We show that if p/n -> infinity, then given any distribution F, a projection can be found whose empirical distribution function is arbitrarily close to F. If p/n -> c > 0, some non-Gaussian F are attainable, but others are not. In these regimes it seems plausible that, if the signal-to-noise ratio is low, projection pursuit can be a dubious guide to structure, and FASTA and other such algorithms can be unstable.
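
The contrast between the two regimes can be illustrated with a small simulation. The sketch below is not the construction used in the talk, and every choice in it (sample sizes, the uniform and Gaussian coordinate distributions, the two-point target, the helper name excess_kurtosis) is an assumption made for illustration. It relies only on an elementary fact: when p >= n and the n x p data matrix has full row rank, the linear system X a = y has a solution for any target vector y, so pure noise admits a projection with any prescribed set of sample values.

```python
import numpy as np


def excess_kurtosis(z):
    """Sample excess kurtosis (hypothetical helper); near 0 for Gaussian-looking data."""
    z = (z - z.mean()) / z.std()
    return float(np.mean(z**4) - 3.0)


rng = np.random.default_rng(0)

# Regime 1: p much smaller than n.
# Coordinates are i.i.d. uniform (clearly non-Gaussian), yet a random
# unit-norm projection has excess kurtosis close to 0.
n, p = 5000, 50
X = rng.uniform(-1.0, 1.0, size=(n, p))
a = rng.standard_normal(p)
a /= np.linalg.norm(a)
kurt_random = excess_kurtosis(X @ a)

# Regime 2: p >= n, pure Gaussian noise with no structure at all.
# Choose an arbitrary target sample (here a symmetric two-point
# distribution on {-1, +1}) and solve X2 @ a2 = target.
n2, p2 = 200, 400
X2 = rng.standard_normal((n2, p2))
target = np.where(np.arange(n2) % 2 == 0, -1.0, 1.0)
a2, *_ = np.linalg.lstsq(X2, target, rcond=None)  # minimum-norm solution
proj = X2 @ a2

# a2 is not unit norm; rescaling it only rescales the projection,
# and kurtosis is invariant to scale.
max_err = float(np.max(np.abs(proj - target)))
kurt_fitted = excess_kurtosis(proj)

print(f"p << n, random projection: excess kurtosis = {kurt_random:.3f}")
print(f"p >= n, fitted projection: max |error| = {max_err:.2e}, "
      f"excess kurtosis = {kurt_fitted:.3f}")
```

In the second regime the fitted projection reproduces the two-point target to numerical precision even though the data contain no structure, which is one concrete way to see why a strongly non-Gaussian projection need not indicate signal when p is comparable to or larger than n.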