Several big data graph-processing frameworks have been built to run on large graph datasets and are in use at large corporations for applications ranging from social network analysis and machine learning to PageRank computation. However, these libraries can also be put to work to study the nature of cancer.
Cancer is a complex disease characterized by defective signaling pathways in the cell. New techniques in cancer genomics research discover the pathways implicated in cancer development by studying large protein-protein interaction networks representing how proteins interact with each other within the cell. Using somatic mutation data from a set of cancer patients and probabilistic graph algorithms, researchers can perform de novo identification of mutated pathways or subnetworks, which can then be used to develop therapies.
Crystal Valentine explains how the large graph-processing frameworks that run on Hadoop can be used to detect significantly mutated protein signaling pathways in cancer genomes, using techniques similar to those used in social network analysis. She also describes an algorithm for de novo discovery of highly mutated signaling pathways in cancer patients using a big data graph-processing library.
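The abstract does not name the algorithm or the library, so the following is only a rough illustration of the general idea, not the method presented in the session. It assumes a heat-diffusion style approach (in the spirit of published methods such as HotNet), simplified here to a random walk with restart: each gene starts with a score equal to its mutation frequency, scores spread along protein-protein interaction edges, and connected groups of high-scoring genes are reported as candidate subnetworks. The function names, gene names, edges, frequencies, and threshold are all hypothetical, the code runs on a single machine rather than on a distributed graph framework, and it omits the statistical significance testing a real analysis would need.

```python
from collections import defaultdict


def diffuse_heat(edges, heat, restart=0.4, iterations=50):
    """Spread per-gene mutation scores over an undirected interaction network.

    Simplified random walk with restart (an assumption, not the talk's method):
    at each step a gene keeps `restart` of its original score and receives the
    remainder from its neighbours, each of which splits its current score
    evenly across its edges.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    genes = set(neighbours) | set(heat)
    start = {g: float(heat.get(g, 0.0)) for g in genes}
    current = dict(start)
    for _ in range(iterations):
        updated = {}
        for g in genes:
            incoming = sum(current[n] / len(neighbours[n]) for n in neighbours[g])
            updated[g] = restart * start[g] + (1.0 - restart) * incoming
        current = updated
    return current


def hot_subnetworks(edges, scores, threshold, min_size=2):
    """Return connected components among genes scoring at least `threshold`."""
    hot = {g for g, s in scores.items() if s >= threshold}
    adjacency = defaultdict(set)
    for a, b in edges:
        if a in hot and b in hot:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, components = set(), []
    for gene in sorted(adjacency):
        if gene in seen:
            continue
        stack, component = [gene], set()
        while stack:  # depth-first search over the "hot" genes only
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        if len(component) >= min_size:
            components.append(sorted(component))
    return components


# Hypothetical toy data: gene names, edges, and frequencies are invented.
edges = [("G1", "G2"), ("G2", "G3"), ("G3", "G4"),
         ("G4", "G5"), ("G5", "G6"), ("X", "Y")]
mutation_frequency = {"G1": 1.0, "G2": 1.0}

scores = diffuse_heat(edges, mutation_frequency)
subnetworks = hot_subnetworks(edges, scores, threshold=0.1)
print(subnetworks)
```

In a distributed setting the diffusion loop maps naturally onto the message-passing style used for PageRank, which is one reason a general-purpose graph-processing library is a plausible fit for this kind of analysis.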
Crystal Valentine is the vice president of technology strategy at MapR Technologies. She has nearly two decades’ experience in big data research and practice. Previously, Crystal was a consultant at Ab Initio, where she worked with Fortune 500 companies to design and implement high-throughput, mission-critical applications and with equity investors as a technical expert on competing technologies and market trends. She was also a tenure-track professor in the Department of Computer Science at Amherst College. She is the author of several academic publications in the areas of algorithms, high-performance computing, and computational biology and holds a patent for extreme virtual memory. Crystal was a Fulbright Scholar in Italy and holds a PhD in computer science from Brown University as well as a bachelor’s degree from Amherst College.
©2016, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.