Biology is 3 billion base pairs of mostly unknown code. Each DNA sequence is just a string of the characters A, C, T, or G, and we don’t know what it says—yet it controls so much of our lives, including when and how we get disease.
Decoding the genome using deep learning differs fundamentally from most deep learning tasks, as we do not know the full structure of the data and therefore cannot design architectures to suit it. Laura Deming and Sasha Targ describe novel machine-learning search algorithms that help find the architectures best suited to decoding genomics and outline the open questions in biology that deep learning has extraordinary power to address today. Laura and Sasha discuss the progress they’ve made on those problems and offer a vision for the future of deep learning in decoding the information contained within DNA, improving our understanding of genomics in ways that impact quality of life.
Genomics is an excellent domain in which to apply deep learning: while we have an intuition that local patterns and long-range sequential dependencies affect genetic function, much structure remains to be discovered. Laura and Sasha demonstrate how to predict which sequences a biological entity (in this case, a class of proteins called transcription factors) can bind to, a standard benchmark task in biology. They also show how to train a deep network to take a sequence thousands of base pairs long and compress it into an internal feature representation from which useful information about the outcome of interest can be extracted (in this case, cell-type-specific gene expression; the technique is newly developed and captures the structure of regulatory sequences in promoter sequences genome-wide). Laura and Sasha share takeaway results about which types of architectures perform best at these genomics tasks, as well as novel, unpublished biological findings in which they demonstrate that their architectures can both recover known elements that are important in disease and identify promising new candidates for experimental follow-up.
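As a rough illustration of what "local patterns" means in practice, the sketch below one-hot encodes a DNA string and slides a single filter along it, which is the basic operation a convolutional layer performs on sequence input. It is not the speakers' code or architecture: the function names, the toy motif "TGAC", and the example sequence are all hypothetical, and a real model would learn many such filters from data rather than use a hand-set one.

```python
import numpy as np

BASES = "ACGT"  # column order of the one-hot encoding (an arbitrary choice)


def one_hot(seq):
    """Encode a DNA string as a (length, 4) array.

    Characters outside A/C/G/T (for example N) become all-zero rows.
    """
    index = {base: i for i, base in enumerate(BASES)}
    encoded = np.zeros((len(seq), 4), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        if base in index:
            encoded[pos, index[base]] = 1.0
    return encoded


def scan_filter(encoded, weights):
    """Slide a (width, 4) filter along the sequence.

    Returns one score per window; a higher score means a closer match.
    """
    width = weights.shape[0]
    n_windows = encoded.shape[0] - width + 1
    return np.array(
        [np.sum(encoded[i:i + width] * weights) for i in range(n_windows)]
    )


# Hypothetical filter: rewards an exact match to the toy motif "TGAC".
motif_filter = one_hot("TGAC")

# Toy sequence, invented for illustration only.
scores = scan_filter(one_hot("AATGACGT"), motif_filter)
best_position = int(np.argmax(scores))  # window start with the best match
```

Here the filter is fixed by hand so the behavior is easy to follow. In a trained network the filter weights are learned, and deeper layers combine the per-window scores to capture longer-range dependencies.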
Currently a Forbes 30 Under 30 star and partner at the Longevity Fund, Laura Deming has wanted to cure aging since the age of eight. After years working on nematode longevity at the UCSF graduate school, Laura matriculated at MIT at 14 to work on artificial organogenesis and bone aging. She is now based in San Francisco, working to find and fund therapies to extend the human health span. She has also recently become a board observer at Navitor Pharmaceuticals.
Sasha Targ is an MD-PhD student at the University of California, San Francisco, interested in applying computational approaches to solve problems in genomics and medicine. Sasha studied biology and physics at MIT and graduated Phi Beta Kappa in three years in order to pursue research full time. She previously spent six years conducting basic immunology research, both on mechanisms of antibody development that could be used to create better vaccines and on methods for efficiently characterizing patients with autoimmunity, resulting in coauthorships in Science and Nature Biotechnology.
©2016, O'Reilly Media, Inc.