Brought to you by NumFOCUS Foundation and O’Reilly Media Inc.
The official Jupyter Conference
August 22-23, 2017: Training
August 23-25, 2017: Tutorials & Conference
New York, NY

Integration of the materials data contribution framework MPContribs with Jupyter(Hub)

Moderated by: Patrick Huck & Shreyas Cholia

The Materials Project (MP) [1], started in 2011, provides open web-based access to computed information on known and predicted materials, as well as powerful analysis tools to inspire and design novel materials. It has become a worldwide resource for the materials science community, with over 30,000 users who rely on the portal as a trusted source to accelerate their research. Many of these users wish to contribute back to MP’s efforts and ask for support in sharing their experimental and computational datasets alongside MP’s curated results. Such sharing gives researchers in both domains the opportunity to validate calculations or measurements almost instantaneously and to use the disseminated data for integrated materials studies.

We recently released our general contribution framework, MPContribs [2-6], as a sustainable solution for well-curated data management, organization and dissemination in the context of MP. The framework serves the purpose of collectively maintaining contributions to local and MP community databases as annotations to existing MP materials. It subsequently disseminates them through a generic interactive gateway powered by Jupyter notebooks or through custom project web apps enabled by the webtzite app kit [7].

As will be shown in a live demo of MPContribs during the presentation [8], Jupyter notebooks, tools and services are used in all aspects of MPContribs. To name a few:

i) Data ingestion relies on the automated, iterative, in-memory building of notebooks using the nbformat and nbconvert APIs. For the purpose of data validation before database storage, each contribution and its corresponding MPFile components of hierarchical, structural, tabular, and graphical data are converted into notebooks with native representations of the components. A new aspect in this context is the speed-up that could be achieved by parallelizing the notebook building.

ii) The integration of all MPContribs features/services with a custom JupyterHub instance significantly simplifies deployment. Interested contributors can get started immediately by logging into MP’s JupyterHub with their GitHub credentials, without the need for lengthy and difficult local installations and database setups. Upon first login, the JupyterHub instance automatically takes care of registering user-specific routes to the MPContribs services and uses supervisord to manage the necessary infrastructure, e.g. MongoDB databases.

iii) MPContribs is installed in the user’s Docker container in development mode [9]. Using the integrated tree and terminal features of JupyterHub, this enables users to immediately and collaboratively develop custom landing pages for their contributed datasets through direct code editing. Supervisord ensures the automated restart and reloading of libraries affected by the code changes.

iv) Since all users on MP’s JupyterHub instance make use of MPContribs services, they can collaboratively work on data preparation for MP by sharing databases. The MPContribs Data Ingester app is designed to allow pulling from and (possibly) pushing to another user’s identical database via a simple drop-down menu or Rester configuration.
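
The in-memory notebook building and its possible parallelization mentioned in item i) can be illustrated with a minimal sketch. It is not the MPContribs implementation: it uses only the Python standard library and builds plain dictionaries that follow the nbformat v4 JSON layout, where a real ingester would more likely go through the nbformat API. The record fields (identifier, hierarchical, tables), the helper names and the "mp-1"/"mp-2" identifiers are assumptions made up for this example.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def markdown_cell(text):
    """Return a markdown cell in the nbformat v4 JSON layout."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(source):
    """Return an unexecuted code cell in the nbformat v4 JSON layout."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": source,
    }


def build_notebook(contribution):
    """Build one notebook dict from a hypothetical contribution record."""
    cells = [markdown_cell("# Contribution for " + contribution["identifier"])]
    # Hierarchical data: stored as a literal the reader can re-evaluate.
    cells.append(markdown_cell("## Hierarchical data"))
    cells.append(code_cell("hdata = " + repr(contribution["hierarchical"])))
    # Tabular data: one heading and one code cell per named table.
    for name, rows in sorted(contribution["tables"].items()):
        cells.append(markdown_cell("## Table: " + name))
        cells.append(code_cell("rows = " + repr(rows)))
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 2}


def build_all(contributions, max_workers=4):
    """Build notebooks concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(build_notebook, contributions))


if __name__ == "__main__":
    # Made-up records standing in for parsed MPFile contributions.
    demo = [
        {"identifier": "mp-1", "hierarchical": {"phase": "cubic"},
         "tables": {"diffusivity": [[300, 1.2e-9], [400, 3.4e-8]]}},
        {"identifier": "mp-2", "hierarchical": {}, "tables": {}},
    ]
    notebooks = build_all(demo)
    # Each result serializes to JSON with the top-level keys of an .ipynb file.
    print(json.dumps(notebooks[0], indent=1)[:300])
```

A thread pool keeps the sketch short, but pure-Python cell construction is CPU-bound, so a measurable speed-up would more plausibly come from a process pool or from overlapping the build with I/O such as database reads; which of these applies to MPContribs is not stated here.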

We will demonstrate the above integration of MPContribs and Jupyter(Hub) with a real-world example based on the contribution and dissemination of theoretical diffusion data from our collaborators at the University of Wisconsin.

1 Materials Project, https://materialsproject.org
2 “A Community Contribution Framework for Sharing Materials Data with Materials Project”, Proc. IEEE eScience Conf. (2015) 535-541, arXiv:1510.05024, DOI:10.1109/eScience.2015.75
3 “User Applications Driven by the Community Contribution Framework MPContribs in the Materials Project”, Concurrency Computat.: Pract. Exper. (2015), arXiv:1510.05727, DOI:10.1002/cpe.3698
4 “Effective and interactive dissemination of diffusion data using MPContribs, plus a demo of UW/SI2 and MPContribs”, Proceedings of the 11th Gateway Computing Environments Conference, http://sciencegateways.org/wp-content/uploads/2016/09/Patrick-Huck-2016-11-02_Gateways2016-1.pdf
5 “Materials Project as Analysis and Validation Hub for Experimental and Computational Materials Data”, 2016 Materials Research Society Fall Meeting & Exhibit
6 MPContribs Code Repository, https://github.com/materialsproject/MPContribs
7 webtzite, https://github.com/materialsproject/webtzite
8 MPContribs Demo Video, https://youtu.be/McNif_-0Q7M
9 Dockerfile for singleuser spawner of MP JupyterHub deployment, https://github.com/materialsproject/mp-jupyter-docker/blob/11c274be3f7550d9f45770cca2bb571cefd636cf/Dockerfile