Presented By O'Reilly and Cloudera
Make Data Work
September 26–27, 2016: Training
September 27–29, 2016: Tutorials & Conference
New York, NY

Using parallel graph-processing libraries for cancer genomics

Crystal Valentine (MapR Technologies)
4:35pm–5:15pm Wednesday, 09/28/2016
Data science & advanced analytics
Location: 3D 10
Level: Beginner
Average rating: 4.50 (4 ratings)

What you'll learn

  • Explore a big data graph-processing application from computational biology
  • Learn how to implement graph algorithms using a big data graph processing framework on Hadoop

Description

    Several big data graph-processing frameworks designed for large graph datasets have been proposed, and they are in use at large corporations for applications ranging from social network analysis to machine learning to the PageRank algorithm. However, these libraries can also be put to work to study the nature of cancer.

    Cancer is a complex disease characterized by defective signaling pathways in the cell. New techniques in cancer genomics research discover the pathways implicated in cancer development by studying large protein-protein interaction networks representing how proteins interact with each other within the cell. Using somatic mutation data from a set of cancer patients and probabilistic graph algorithms, researchers can perform de novo identification of mutated pathways or subnetworks, which can then be used to develop therapies.

    Crystal Valentine explains how the large graph-processing frameworks that run on Hadoop can detect significantly mutated protein signaling pathways in cancer genomes, using techniques similar to those in social network analysis algorithms. She also describes an algorithm for de novo discovery of highly mutated signaling pathways in cancer patients using a big data graph-processing library.
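
    The abstract does not name the algorithm presented in the session, so the following is only an illustrative sketch of the general idea: spread per-gene mutation counts over a protein-protein interaction graph with a random walk with restart (a personalized PageRank-style diffusion), then report connected groups of high-scoring genes as candidate subnetworks. The gene names, edges, mutation counts, restart probability, and threshold are all hypothetical, and the code runs on a single machine rather than on a Hadoop graph-processing framework.

```python
from collections import deque

# Hypothetical toy protein-protein interaction network (undirected adjacency lists).
PPI = {
    "G1": ["G2", "G3"],
    "G2": ["G1", "G3"],
    "G3": ["G1", "G2", "G4"],
    "G4": ["G3", "G5"],
    "G5": ["G4", "G6"],
    "G6": ["G5"],
}

# Hypothetical number of patients with a somatic mutation in each gene.
MUTATIONS = {"G1": 8, "G2": 5, "G3": 1, "G6": 1}


def diffuse_scores(graph, seed, restart=0.3, iterations=50):
    """Random walk with restart: mutation 'heat' flows along interaction edges."""
    total = sum(seed.values())
    start = {node: seed.get(node, 0.0) / total for node in graph}
    scores = dict(start)
    for _ in range(iterations):
        updated = {}
        for node in graph:
            # Each neighbor splits its current score evenly among its own neighbors.
            incoming = sum(scores[nb] / len(graph[nb]) for nb in graph[node])
            updated[node] = restart * start[node] + (1 - restart) * incoming
        scores = updated
    return scores


def hot_subnetworks(graph, scores, threshold):
    """Connected components of the subgraph induced by genes scoring >= threshold."""
    hot = {node for node, score in scores.items() if score >= threshold}
    seen, components = set(), []
    for node in sorted(hot):
        if node in seen:
            continue
        component, queue = set(), deque([node])
        seen.add(node)
        while queue:
            current = queue.popleft()
            component.add(current)
            for nb in graph[current]:
                if nb in hot and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        components.append(component)
    return components


scores = diffuse_scores(PPI, MUTATIONS)
for component in hot_subnetworks(PPI, scores, threshold=0.2):
    print(sorted(component))
```

    The per-node update in `diffuse_scores` depends only on a node's neighbors, which is why this style of computation maps naturally onto vertex-centric graph-processing frameworks. A real analysis would also need a statistical test to decide which subnetworks are significantly mutated, which this sketch omits.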

    Crystal Valentine

    MapR Technologies

    Crystal Valentine is the vice president of technology strategy at MapR Technologies. She has nearly two decades’ experience in big data research and practice. Previously, Crystal was a consultant at Ab Initio, where she worked with Fortune 500 companies to design and implement high-throughput, mission-critical applications and with equity investors as a technical expert on competing technologies and market trends. She was also a tenure-track professor in the Department of Computer Science at Amherst College. She is the author of several academic publications in the areas of algorithms, high-performance computing, and computational biology and holds a patent for extreme virtual memory. Crystal was a Fulbright Scholar in Italy and holds a PhD in computer science from Brown University as well as a bachelor’s degree from Amherst College.

    Comments on this page are now closed.


    09/30/2016 5:53am EDT

    Dr Valentine did a great job of explaining how data science can be successfully applied to genomic research. More sessions like this would be a great service to our community of data scientists.