The advent of next-generation DNA sequencing technologies is poised to revolutionize the way life sciences research is practiced. These new technologies are scaling significantly faster than Moore’s law, and promise to catapult life sciences research and the biotech industry into the realm of big data. However, bioinformatics and data management in the life sciences have been slow to adopt the latest big data technologies pioneered by the internet industry (e.g., Google and Facebook), in part because these tools are only now becoming necessary.
In this talk, we will review several ways in which distributed computing tools (e.g., the Hadoop ecosystem) can be used to significantly advance the state of the art in life sciences research.
We will also cover the new ADAM project for rebooting genomics ETL on top of Spark, and the Eggo project for providing Parquet-formatted public data sets. The tools covered will include Hadoop, Spark, Impala, and Cloudera Director, among others.
Uri Laserson is a data scientist at Cloudera. Previously, he obtained his PhD from MIT, where he developed applications of high-throughput DNA sequencing to immunology. During that time, he co-founded Good Start Genetics, a next-generation diagnostics company focused on genetic carrier screening. In 2012, he was named to Forbes’s 30 Under 30 list.
©2015, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with, and does not endorse or review, the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.