Presented By O'Reilly and Cloudera
Make Data Work
December 1–3, 2015 • Singapore

Petascale genomics

Uri Laserson (Cloudera)
4:00pm–4:40pm Thursday, 12/03/2015
Data Science and Advanced Analytics
Location: 321-322
Level: Intermediate
Average rating: 3.33 out of 5 (6 ratings)

Prerequisite Knowledge

General familiarity with the Hadoop ecosystem. Interest in or familiarity with genome research.

Description

The advent of next-generation DNA sequencing technologies is poised to revolutionize the way life sciences research is practiced. These new technologies are scaling significantly faster than Moore’s law and promise to catapult life sciences research and the biotech industry into the realm of big data. However, bioinformatics and data management in the life sciences have been slow to adopt the latest big data technologies pioneered by the internet industry (e.g., Google and Facebook), in part because these tools are only now becoming necessary.

In this talk, we will review several ways in which distributed computing tools (e.g., the Hadoop ecosystem) can be used to significantly advance the state of the art in life sciences research, including:

  • Scaling genome-wide association studies to find connections between your genes and your traits
  • Integrating data at scale across the many public databases
  • Assembling genome sequences from short snippets for use in cancer genomics

We will also cover the new ADAM project for rebooting genomics ETL on top of Spark, and the Eggo project for providing Parquet-formatted public data sets. The tools covered will include Hadoop, Spark, Impala, and Cloudera Director, among others.

Uri Laserson

Cloudera

Uri Laserson is a data scientist at Cloudera. Previously, he obtained his PhD from MIT, where he developed applications of high-throughput DNA sequencing to immunology. During that time, he co-founded Good Start Genetics, a next-generation diagnostics company focused on genetic carrier screening. In 2012, he was named to Forbes’s 30 Under 30 list.