Discovering Relationships and their Structures Across Disparate Data Modalities

Joshua Vogelstein

Johns Hopkins University
Wednesday, August 16, 2017 at 12:00pm
560 Evans Hall

Determining whether certain properties are related to other properties is fundamental to scientific discovery. As data collection rates accelerate, it is becoming increasingly difficult, yet ever more important, to determine whether one property of data (e.g., cloud density) is related to another (e.g., grass wetness). Only if two properties are related are further investigations into the geometry of the relationship warranted. While existing approaches can test whether two properties are related, they may require infeasibly large sample sizes in real-data scenarios, and they do not provide insight into the geometry underlying the structure of the relationship. We juxtapose hypothesis testing, manifold learning, and harmonic analysis to obtain Multiscale Generalized Correlation (MGC). Our key insight is that one can adaptively restrict the analysis to the “jointly local” observations – that is, one can estimate the scale with the most informative neighbors for determining the existence and geometry of a relationship. We prove that to achieve a given true positive rate, MGC typically requires far fewer samples than existing methods for all investigated dependence structures and dimensionalities, while maintaining computational efficiency. Moreover, MGC uniquely provides a simple and elegant characterization of the potentially complex latent geometry underlying the relationship. We used MGC to detect the presence, and reveal the geometry, of relationships between mental and brain properties, to perform a proteomics screening, and to develop an imaging biomarker for disease, while avoiding the false positive inflation problems that have plagued conventional parametric approaches. Our open-source implementation of MGC is easy to use, parameter-free, and applicable to previously vexing statistical questions that are ubiquitous in science, government, finance, and other disciplines.