Design of a Semantic Type System to Facilitate Data Sharing and Analysis Tool Reuse

Graham Cummins

Washington State University
Wednesday, October 19, 2011 at 12:00pm
Evans 560

Data sharing between labs, and indeed between disciplines, reduces duplication of effort, facilitates new discoveries, and leads to the development of more flexible, reliable, and reproducible analysis techniques. At a minimum, a data sharing solution must support entry, storage, and transfer of data. To be useful, however, it must also help potential collaborators locate appropriate data sets, apply their analysis tools, and interpret the results meaningfully. This requires that the stored data sets incorporate information describing both their structure and their meaning, preferably in a form that is useful to both humans and machines. Currently, most approaches to this problem focus on tagging data sets with additional meta-data. These tags are human-readable labels that describe the meaning, and typically also the origin and intended use, of the data. Meta-data tags, however, are of limited use to machines: they can be searched, but they usually do not specify enough information to allow analysis tools to be applied automatically. I present an alternative approach to data markup, which falls between meta-data tagging and computational data types (in the sense used in the design of computer languages). I call this approach a semantic type system. I present a design for such a system, based heavily on the pattern-matching behavior of functional computer languages, along with an initial implementation of this design. I will show examples of how this system provides an application interface for analysis tools, how humans can use it to understand data, and how it can make data sets easier to search. Finally, I propose a possible mechanism for converting recorded data into appropriately semantically typed structures.
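The abstract does not spell out the design, so the following is only an illustrative sketch of the general idea, not the system presented in the talk. It assumes a semantic type can be modeled as a name plus a set of labeled fields with units, and that an analysis tool declares a pattern that a data set's type must match. All names here (SemanticType, matches, applicable_datasets, the field names and units) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticType:
    """Hypothetical semantic type: a label plus field names mapped to units."""
    name: str
    fields: dict = field(default_factory=dict)


def matches(pattern, candidate):
    """Return True if the candidate provides every field the pattern
    requires, with the same unit. Extra fields on the candidate are allowed."""
    return all(
        candidate.fields.get(key) == unit
        for key, unit in pattern.fields.items()
    )


def applicable_datasets(pattern, catalog):
    """Search a catalog (data set name -> SemanticType) for data sets
    that a tool requiring `pattern` could be applied to."""
    return [name for name, stype in catalog.items() if matches(pattern, stype)]


# Hypothetical types describing two recorded data sets.
SPIKE_TRAIN = SemanticType("SpikeTrain", {"time": "s", "neuron_id": "index"})
VOLTAGE_TRACE = SemanticType("VoltageTrace", {"time": "s", "voltage": "mV"})

catalog = {"recording_A": SPIKE_TRAIN, "recording_B": VOLTAGE_TRACE}

# Patterns that two hypothetical analysis tools declare as their input.
TIMED_DATA = SemanticType("TimedData", {"time": "s"})
NEEDS_VOLTAGE = SemanticType("NeedsVoltage", {"voltage": "mV"})

print(applicable_datasets(TIMED_DATA, catalog))
print(applicable_datasets(NEEDS_VOLTAGE, catalog))
```

In this sketch the same matching step serves both purposes the abstract mentions: a tool uses its pattern as an application interface, and a person can use a pattern as a search query over the catalog.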