Brought to you by NumFOCUS Foundation and O’Reilly Media
The official Jupyter Conference
Aug 21-22, 2018: Training
Aug 22-24, 2018: Tutorials & Conference
New York, NY

Reproducible science with the Renku platform

Sandra Savchenko-de Jong (Swiss Data Science Center)
11:05am–11:45am Friday, August 24, 2018
Reproducible research and open science
Location: Nassau
Level: Beginner

Who is this presentation for?

  • Data scientists

Prerequisite knowledge

  • A working knowledge of the Jupyter Notebook

What you'll learn

  • Explore Renku, a highly scalable and secure open software platform designed to make (data) science reproducible, foster collaboration between scientists, and share resources in a federated environment

Description

Sandra Savchenko-de Jong offers an overview of Renku, a highly scalable and secure open software platform developed by the Swiss Data Science Center (a collaboration between ETH Zurich and EPFL) that is designed to make (data) science reproducible, foster collaboration between scientists, and share resources in a federated environment. The name was borrowed from the renku, a traditional form of Japanese collaborative poetry. Like its namesake, the platform encourages interdisciplinary cooperation (or competition) between scientists.

Renku appears to users as a shell around their Jupyter notebooks. Under the hood, the platform is governed by a loosely coupled federated model that allows organizations to share compute and storage resources while keeping complete control over those resources. Renku is developed in alignment with the FAIR principles, which aim to make data findable, accessible, interoperable, and reusable.

Reusability is enabled by Renku’s knowledge graph. All actions performed on the data and code, whether code execution or access to storage to read or write new results, are authorized and registered automatically by the Renku middleware in the knowledge graph. The knowledge graph is immutable and contains information about the versions of data and code (or notebooks) and the relationships between the two, such as which execution of a notebook generated a given version of a dataset and which dataset was used as input. The resulting knowledge graph can be used for governance, intellectual property attribution, auditing, and data science on data science. The last of these would enable new types of services, such as improved search algorithms for data science research and recommender systems that suggest algorithms or datasets to data scientists based on their research activities.
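
To make the lineage idea concrete, here is a minimal sketch of an append-only provenance record in Python. It is not Renku's API or data model: the `Execution` and `ProvenanceLog` classes, their methods, and the file names are all hypothetical, chosen only to illustrate how "which notebook run produced which dataset version, from which inputs" could be stored and queried.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Execution:
    """One recorded notebook run (hypothetical structure, not Renku's)."""
    notebook: str             # notebook name plus version, e.g. "clean.ipynb@v1"
    inputs: Tuple[str, ...]   # dataset versions that were read
    outputs: Tuple[str, ...]  # dataset versions that were written


class ProvenanceLog:
    """Append-only log: entries are added but never edited or removed."""

    def __init__(self) -> None:
        self._entries: List[Execution] = []

    def record(self, notebook: str, inputs, outputs) -> Execution:
        entry = Execution(notebook, tuple(inputs), tuple(outputs))
        self._entries.append(entry)
        return entry

    def generated_by(self, dataset: str) -> Optional[Execution]:
        """Return the execution that wrote this dataset version, if any."""
        for entry in self._entries:
            if dataset in entry.outputs:
                return entry
        return None

    def lineage(self, dataset: str) -> List[str]:
        """List all upstream dataset versions, nearest ancestors first."""
        seen: List[str] = []
        queue = [dataset]
        while queue:
            current = queue.pop(0)
            entry = self.generated_by(current)
            if entry is None:
                continue  # no recorded producer, so this is a source dataset
            for parent in entry.inputs:
                if parent not in seen:
                    seen.append(parent)
                    queue.append(parent)
        return seen


# Illustrative data only; these files and versions are made up.
log = ProvenanceLog()
log.record("clean.ipynb@v1", ["raw.csv@v1"], ["clean.csv@v1"])
log.record("model.ipynb@v3", ["clean.csv@v1"], ["scores.csv@v1"])

print(log.generated_by("scores.csv@v1").notebook)  # model.ipynb@v3
print(log.lineage("scores.csv@v1"))  # ['clean.csv@v1', 'raw.csv@v1']
```

Because each entry is a frozen dataclass and the log only appends, past records cannot be altered in place, which loosely mirrors the immutability described above. A real system would also need authorization, persistent storage, and content-addressed versions, none of which this sketch attempts.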

Sandra Savchenko-de Jong

Swiss Data Science Center

Sandra Savchenko-de Jong is a Lausanne-based software engineer and data scientist at the Swiss Data Science Center, where she works on the development of the Renku platform. Previously, she was a software engineer at a large bank in the Netherlands. An astrophysicist by education, Sandra studied at the Rijksuniversiteit Groningen in the Netherlands and holds a PhD from the Observatoire de Paris in France.