Brought to you by NumFOCUS Foundation and O’Reilly Media Inc.
The official Jupyter Conference
August 22-23, 2017: Training
August 23-25, 2017: Tutorials & Conference
New York, NY

The KBase Narrative Interface: a platform for reproducible biological analysis

The DOE Systems Biology Knowledgebase (KBase, http://kbase.us) is an open-source, open-access software and data platform designed to make it easier for biological scientists to create, execute, collaborate on, and share sophisticated, reproducible analyses of their own data alongside public and privately shared data. KBase provides tens of thousands of publicly accessible genomes and related biological reference datasets, as well as a growing collection of bioinformatics tools. Users access KBase via the Narrative Interface, a modified Jupyter Notebook that serves as the front end to a scalable object store, an execution engine, a distributed compute cluster, and a library of computational tools packaged and distributed as Docker images. With these tools and datasets, KBase users can create, run, and share analysis workflows. These workflows are saved as “Narratives” that include the input data along with analysis steps, results, visualizations, and commentary. Narratives can be shared with other users, who can use KBase to re-run the analyses and reproduce the results.

The Narrative Interface supports both point-and-click and scripting access for creating workflows, so programmers and non-programmers can collaborate within the same platform and share their datasets and results. Point-and-click GUI tools make it easy for users to browse, select, configure, and run analysis functionality in the form of “apps”. Users can also add their own code cells to incorporate custom analysis steps that are not available as KBase apps. Under the hood, the Narrative Interface generates and runs code that executes on KBase servers and tracks each app’s progress. To support reproducibility, apps are packaged as versioned Docker images that contain their code and dependencies, enabling users to repeat exactly the same computation using the version of the application that was installed when the workflow was last run.
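The idea of pinning each step to a versioned image can be illustrated with a minimal sketch. This is not KBase code: the class names, the app identifiers, and the `name:version` image reference format are all hypothetical, chosen only to show how a saved workflow might record enough information to be repeated with the same app versions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AppStep:
    """One analysis step (hypothetical model, not a KBase data structure)."""
    app_id: str          # illustrative app identifier
    image_version: str   # exact version of the image used for this run
    params: Dict[str, str] = field(default_factory=dict)

    def image_ref(self) -> str:
        # Recording the exact version lets a later re-run resolve to the
        # same code and dependencies as the original run.
        return f"{self.app_id}:{self.image_version}"


@dataclass
class NarrativeRecord:
    """A saved workflow: an ordered list of version-pinned steps."""
    title: str
    steps: List[AppStep] = field(default_factory=list)

    def add_step(self, app_id: str, image_version: str, **params: str) -> None:
        self.steps.append(AppStep(app_id, image_version, dict(params)))

    def rerun_plan(self) -> List[str]:
        # The images that would have to be executed, in order, to repeat
        # the recorded computation.
        return [step.image_ref() for step in self.steps]


record = NarrativeRecord("Example analysis")
record.add_step("example/assembler", "1.0.2", reads="reads_1")
record.add_step("example/annotator", "0.3.0")
print(record.rerun_plan())
```

In this sketch, upgrading an app in the catalog would not change what `rerun_plan()` returns for an existing record, which is the property the versioning scheme described above is meant to provide.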

KBase was designed to be an extensible community resource. The KBase software development kit (SDK) uses Docker to enable third-party developers to wrap new or existing open-source tools and publish them to KBase by registering a GitHub repository with the KBase application catalog. The catalog can then build a copy of the application and expose it within the KBase Narrative Interface. This provides a model for how computational platforms can build open, collaborative communities around common standards.
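The register-then-build flow described above can be sketched as a toy, in-memory catalog. Everything here is an assumption made for illustration: `ToyCatalog`, its methods, the example repository URL, and the image tag format are hypothetical and do not reflect the actual KBase catalog API. In particular, where a real catalog would clone the repository and build a Docker image, this sketch only derives a tag from the commit hash.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RegisteredModule:
    """A registered tool version (hypothetical record layout)."""
    repo_url: str
    commit: str
    image_tag: str


class ToyCatalog:
    """In-memory stand-in for an application catalog."""

    def __init__(self) -> None:
        self._modules: Dict[str, List[RegisteredModule]] = {}

    def register(self, name: str, repo_url: str, commit: str) -> RegisteredModule:
        # Stand-in for the build step: derive an image tag from the
        # module name and the first seven characters of the commit hash.
        image_tag = f"{name.lower()}:{commit[:7]}"
        module = RegisteredModule(repo_url, commit, image_tag)
        self._modules.setdefault(name, []).append(module)
        return module

    def latest(self, name: str) -> RegisteredModule:
        # The most recently registered version of a module.
        return self._modules[name][-1]

    def list_apps(self) -> List[str]:
        # Names that would be exposed to users of the interface.
        return sorted(self._modules)


catalog = ToyCatalog()
module = catalog.register(
    "MyTool", "https://github.com/example/my_tool", "0123456789abcdef"
)
print(module.image_tag, catalog.list_apps())
```

Keeping every registered version, rather than overwriting the previous one, is what would allow older workflows to keep pointing at the image they were originally run with.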

The KBase platform is completely open source, with all core infrastructure and service code available on GitHub (https://github.com/kbase).