While “Big Data” has been commonplace in many industries, biomedical researchers have traditionally worked with data that could be easily managed in spreadsheets and even in paper lab notebooks. The switch to electronic health records and the increasing use of genomic technologies are rendering these tools inadequate. Many business intelligence software packages exist for exploring and analyzing complex data; however, these tools fit poorly in cross-institutional academic settings, where the cost and burden of end-user training make their deployment unsustainable. Furthermore, unlike transactional data such as website analytics, financial records, or business operations data, many aspects of biomedical data are not readily summed or averaged, further limiting the utility of existing tools.
This talk will discuss the capabilities, architecture, and motivations for Harvest. We will address some of our core strategies for mitigating data model and query complexity as well as for preserving the user experience when working with large data sets. The first strategy for data complexity is to expose a seemingly flat data access layer, which lets users quickly find the data they are interested in without having to understand the structure of the underlying data model. The second is highly descriptive metadata, which provides context and makes data discoverable. Researchers in particular typically use their own vocabulary to describe their data. The third is a set of abstraction approaches for making data robust and presentable without sacrificing the ability to query and sort discrete values.
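The idea of a flat access layer over a relational model can be illustrated with a minimal sketch. This is not Harvest's actual API: the two-table schema, the `FIELDS` catalog, and the `build_query` helper are all hypothetical names chosen for illustration. The sketch assumes that every user-facing field can be reached from a single root table through a known join.

```python
import sqlite3

# Hypothetical relational model: patients and their lab results.
SCHEMA = """
CREATE TABLE patient (id INTEGER PRIMARY KEY, sex TEXT, birth_year INTEGER);
CREATE TABLE lab_result (
    id INTEGER PRIMARY KEY,
    patient_id INTEGER REFERENCES patient(id),
    test TEXT,
    value REAL
);
"""

LAB_JOIN = "JOIN lab_result ON lab_result.patient_id = patient.id"

# Flat catalog (an assumption of this sketch): each user-facing field name
# maps to a qualified column and the join needed to reach it from `patient`.
FIELDS = {
    "sex": ("patient.sex", None),
    "birth_year": ("patient.birth_year", None),
    "lab_test": ("lab_result.test", LAB_JOIN),
    "lab_value": ("lab_result.value", LAB_JOIN),
}


def build_query(select, where):
    """Build SQL from flat field names.

    select: list of field names to return.
    where: dict mapping field names to required values (equality only).
    Unknown field names raise KeyError, so only catalog entries reach the SQL.
    """
    joins = []
    for name in list(select) + list(where):
        join = FIELDS[name][1]
        if join and join not in joins:
            joins.append(join)

    columns = ", ".join(f"{FIELDS[name][0]} AS {name}" for name in select)
    sql = f"SELECT {columns} FROM patient"
    if joins:
        sql += " " + " ".join(joins)

    params = []
    if where:
        conditions = []
        for name, value in where.items():
            conditions.append(f"{FIELDS[name][0]} = ?")
            params.append(value)  # values are bound, never interpolated
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params
```

A caller names fields such as `lab_value` or `birth_year` and never writes a join; the catalog decides which tables are involved. A real system would also need to handle multi-hop joins and operators other than equality, which this sketch leaves out.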
Harvest also addresses data scale by gradually exposing data and statistics depending on the current context. This part of the talk covers presenting aggregate statistics and distribution charts that show users what data is available before they build queries and view the data.
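As a rough illustration of summarizing a field before any query is built, the sketch below computes a different summary depending on the kind of values a field holds. It is not taken from Harvest; the `summarize` function and its output keys are assumptions made for this example. Numeric fields get a range and median, while other fields get a frequency distribution, since categorical biomedical values are not meaningfully summed or averaged.

```python
from collections import Counter
from statistics import median


def summarize(values):
    """Return a small summary of one field's values.

    None is treated as a missing value. Fields whose present values are all
    numeric get min, max, and median; any other field gets a frequency
    distribution ordered from most to least common.
    """
    present = [v for v in values if v is not None]
    summary = {"count": len(present), "missing": len(values) - len(present)}

    if present and all(isinstance(v, (int, float)) for v in present):
        summary["min"] = min(present)
        summary["max"] = max(present)
        summary["median"] = median(present)
    else:
        summary["distribution"] = Counter(present).most_common()
    return summary
```

The distribution list is the kind of data a bar chart could be drawn from, and the counts of present and missing values tell a user how much data a field actually contains before they filter on it.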
In addition to internal projects, Harvest powers several multi-center projects developed at CHOP including the [NIDCD](http://www.nidcd.nih.gov/)-funded [AudGenDB](http://audgendb.chop.edu) and the [NHLBI](http://www.nhlbi.nih.gov/)’s [Pediatric Cardiac Genomics Consortium](http://www.benchtobassinet.net/AboutPCGC.asp) Data Hub.
Byron Ruth is a Senior Analyst/Programmer in the Center for Biomedical Informatics at The Children’s Hospital of Philadelphia. Byron’s skills in advanced web programming environments and content code abstraction have enabled him to lead a variety of projects at CHOP, including the development of a highly integrated audiology research database, an electronic health record-mediated clinical decision support engine for the care of premature infants, and a data management system that helps to discover relationships between genetic markers of congenital heart defects and clinical outcomes.
Michael is a Lead Application Scientist in The Children’s Hospital of Philadelphia’s Center for Biomedical Informatics. His primary role is to lead, support, and advise projects with a need for integrated clinical, genomic, and imaging data to enable translational research.
Michael has over 10 years of experience building and managing complex biomedical data repositories. Prior to his work at CHOP, Michael spent 8 years designing and building several genomic data integration projects for one of the world’s largest pharmaceutical companies.
Michael has had a dual interest in biology and computer science since first discovering the field of Bioinformatics as an undergrad studying Biochemistry and Molecular Biology at The Pennsylvania State University. He also holds a master’s degree in Biotechnology/Bioinformatics from The University of Pennsylvania.