Community Clouds for Cancer Genomics: Lessons Learned from Bionimbus

Robert Grossman (University of Chicago)
Precision Medicine Salon G
Bionimbus is an open source, petabyte-scale community cloud for managing, analyzing, and sharing large genomics datasets. It is built on OpenStack and operated by the not-for-profit Open Cloud Consortium.

It contains a variety of public datasets, including ENCODE and the 1000 Genomes dataset.

We recently expanded Bionimbus so that researchers can analyze data from controlled datasets, such as The Cancer Genome Atlas (TCGA), in a secure and compliant fashion. TCGA contains data from over 6,000 cancer patients, spanning 20 different types of cancer. Tissue samples from both cancerous and normal tissue are collected and sequenced.

Until now, researchers who had the required authorizations from NIH to analyze TCGA data had to first set up a secure, compliant computing environment capable of managing and analyzing terabytes of data (which can take months), download the data (which can take weeks), and install the appropriate analysis pipelines. This was a challenge for most research groups.

Authorized researchers can now simply log in to Bionimbus, select the data they would like to analyze, and launch one or more virtual machines; the data and frequently used pipelines that they need are then immediately available.

We begin the talk with a short demonstration of using Bionimbus and then discuss:

- the role of private, community and public clouds in bioinformatics

- the Bionimbus architecture

- the Bionimbus security and compliance framework

- how Bionimbus interoperates with Amazon Web Services

- how to interoperate your own resources with Bionimbus

- adding data to Bionimbus

- how Bionimbus allocates computing resources to the community

- how to get involved with Bionimbus

Robert Grossman

University of Chicago

Robert Grossman is a faculty member and the Chief Research Informatics Officer in the Biological Sciences Division of the University of Chicago. He is a Senior Fellow in the Institute for Genomics and Systems Biology (IGSB) and the Computation Institute (CI). He is also the Founder and a Partner of Open Data Group, which specializes in building predictive models over big data. His areas of research include big data, predictive analytics, bioinformatics, data intensive computing, and analytic infrastructure. He has led the development of open source software tools for analyzing big data (Augustus), cloud computing (Sector), and high performance networking (UDT). In 1996 he founded Magnify, Inc., which provides data mining solutions to the insurance industry. Grossman was Magnify’s Chairman until it was sold to ChoicePoint in 2005. He is also the Chair of the Open Cloud Consortium, a not-for-profit that supports the cloud community by operating cloud infrastructure, such as the Open Science Data Cloud. He blogs about big data, data science, and data engineering at RobertGrossman.com.

