Brought to you by NumFOCUS Foundation and O’Reilly Media Inc.
The official Jupyter Conference
August 22-23, 2017: Training
August 23-25, 2017: Tutorials & Conference
New York, NY
Usage and application
Maarten Breddels (Kapteyn Astronomical Institute, University of Groningen)
I will present vaex and ipyvolume. Vaex enables calculating statistics for a billion samples per second on a regular N-dimensional grid. Ipyvolume enables volume and glyph rendering in the notebook. Together they allow interactive visualization and exploration of large, high-dimensional datasets in the notebook.
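The idea underlying vaex-style grid statistics can be sketched with plain NumPy (this is an illustrative sketch of the concept, not vaex's own API):

```python
import numpy as np

# Illustrative sketch of the idea underlying vaex: binned statistics
# on a regular N-dimensional grid (plain NumPy here, not vaex itself).
rng = np.random.default_rng(0)
x = rng.normal(size=1_000_000)
y = rng.normal(size=1_000_000)

# Count samples on a 128x128 grid over a fixed extent; vaex performs this
# kind of aggregation out-of-core and in parallel for billions of rows.
counts, xedges, yedges = np.histogram2d(x, y, bins=128, range=[(-4, 4), (-4, 4)])
print(counts.shape)  # (128, 128)
```

The resulting grid can then be fed directly to an image- or volume-rendering widget.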
Extensions and customization
Daina Bouquin (Harvard-Smithsonian Center for Astrophysics), John DeBlase (Freelance Development)
Network analytics using tools like NetworkX and Jupyter often leaves programmers with difficult-to-examine hairballs rather than useful visualizations. Meanwhile, more flexible tools like SigmaJS have steep learning curves for people new to JavaScript. This session will show how a simple, flexible architecture can help people make beautiful JavaScript network visualizations without ditching the Jupyter notebook.
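One common bridge between NetworkX and JavaScript renderers such as SigmaJS is serializing the graph as node-link JSON, with layout computed on the Python side. A minimal sketch (the field names a given JS library expects may differ and need light renaming):

```python
import json
import networkx as nx
from networkx.readwrite import json_graph

# Build a small example graph and compute positions server-side,
# so the JavaScript renderer only has to draw, not lay out.
G = nx.karate_club_graph()
pos = nx.spring_layout(G, seed=42)
for node, (x, y) in pos.items():
    G.nodes[node]["x"] = float(x)
    G.nodes[node]["y"] = float(y)

# node-link JSON is a generic format that SigmaJS-style renderers
# can consume after mapping fields to their own schema.
data = json_graph.node_link_data(G)
payload = json.dumps(data)
print(len(data["nodes"]))  # 34
```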
JupyterHub deployments
Scott Sanderson (Quantopian)
This talk describes the architecture of the Quantopian Research Platform, a Jupyter notebook deployment serving a community of over 100,000 users. We show how, using standard extension mechanisms, we provide features such as robust storage and retrieval of hundreds of gigabytes of notebooks, integration of the notebook into an existing web application, and sharing of notebooks between users.
Extensions and customization
Ali Marami (R-Brain Inc)
JupyterLab provides a robust foundation for building flexible computational environments. As one of the contributors to this project, we have leveraged the JupyterLab extension architecture to build a powerful IDE. R-Brain IDE is one of the few tools on the market that supports R and Python equally for data science, with important features such as IntelliSense, debugging, and environment and data views.
Reproducible research and open science
Bernie Randles (UCLA), Catherine Zucker (Harvard University)
Recently, researchers have begun citing Jupyter notebooks as a way to share the processes involved in the act of scientific inquiry. Traditionally, researchers have cited code and data related to a publication. The Jupyter notebook is a ‘recipe’ that explains the methods used and provides context for data, computations, and results, and, if shared in a public repository, furthers open science practices.
Reproducible research and open science
Mark Hahnel (figshare), Marius Tulbure (figshare)
Reports of a lack of reproducibility have led funders and others to require open data and code as the outputs of the research they fund. In this talk, we will describe the opportunities for Jupyter notebooks to be the final output of academic research. We will discuss how Jupyter could help disrupt the inefficiencies in cost and scale of open access academic publishing.
Usage and application
Kazunori Sato (Google)
Google Cloud Datalab is a Jupyter environment from Google that seamlessly integrates BigQuery, TensorFlow, and other Google Cloud services. With the massively parallel query engine, you can easily run SQL queries from Jupyter to access terabytes of data in seconds, and train deep models with TensorFlow on tens of GPUs in the cloud, with all the usual tools available in Jupyter.
Usage and application
yoshi NOBU Masatani (National Institute of Informatics)
Jupyter is useful for DevOps as well. It enables collaboration between experts and novices to accumulate infrastructure knowledge, and between technical and nontechnical users. Automation via notebooks enhances traceability and reproducibility. We frame knowledge sharing, workflow, and customer support as Literate Computing practices, and we show how to combine Jupyter with Ansible for reproducible infrastructure.
Reproducible research and open science
Paco Nathan (O'Reilly Media)
Lessons learned about using notebooks in media. Our project explores "computable content", combining Jupyter notebooks, video timelines, Docker containers, and HTML/JS for "last mile" presentation. What system architectures are needed at scale? How to coach authors to be effective with the medium? Can live coding augment formative assessment? What are typical barriers encountered in practice?
Usage and application
Andreas Mueller (Columbia University)
Tutorial Please note: to attend, your registration must include Tutorials on Wednesday.
In this tutorial we will use Jupyter notebooks together with the data analysis packages pandas, seaborn and scikit-learn to explore a variety of real-world datasets. We will walk through initial assessment of data, dealing with different data types, visualization and preprocessing, and finally build predictive models for tasks including health care and housing.
Usage and application
Natalino Busa (Teradata)
Jupyter notebooks are transforming the way we look at computing, coding and science. But is this the only "data scientist experience" that this technology can provide? Actually, you can use Jupyter to create interactive web applications for data exploration and analysis. In the background, these apps are still powered by well understood and documented Jupyter notebooks.
Usage and application
Gunjan Baid (UC Berkeley), Vinitra Swamy (UC Berkeley)
Engaging critically with data is now a required skill for students in all areas, but many traditional data science programs aren’t easily accessible to those without prior computing experience. Our data science program has 1200 students across 50 majors (ranging from history & literature to cognitive science), and we explain how we designed our pedagogy to make data science accessible to everyone.
Development and community
David Taieb (IBM)
Whether you are an experienced data scientist or just a beginner needing to do some data science in a Jupyter notebook, this session is for you. You will learn how PixieDust, a new open source library that has already been downloaded thousands of times, speeds data exploration with interactive auto-visualizations that make creating charts easy and fun.
Kernels
Tim Gasper (Bitfusion), Pierce Spitler (Bitfusion)
Jupyter is great for deep learning development and training. Combined with GPUs, it makes for fast development and fast execution, but it doesn’t make it easy to switch from a CPU execution context to GPUs and back. We’ll look at best practices for doing deep learning with Jupyter and then show how to work with CPUs and GPUs more easily by using Elastic GPUs and quick switching between custom kernels.
Reproducible research and open science
Matt Burton (University of Pittsburgh)
While Jupyter Notebooks are a boon for computational science, they are also a powerful tool in the digital humanities. This talk introduces the digital humanities community, discusses a novel use of Jupyter Notebooks to analyze computational research, and reflects upon Jupyter’s relationship to scholarly publishing and the production of knowledge.
JupyterHub deployments
Yuvi Panda (Wikimedia Foundation)
Open data by itself is not enough; people of all backgrounds should be able to easily use it however they want. We talk about how providing free, open, and public computational infrastructure with easy access to our open data has helped many more people from diverse backgrounds make use of our data, and why other organizations providing open data should do similar things!
Reproducible research and open science
Lindsey Heagy (University of British Columbia), Rowan Cockett (3point Science)
In deploying a short course on geophysics, we have been developing strategies for an “educational stack.” Web-based textbooks and interactive simulations built in Jupyter notebooks provide an entry point for course participants to reproduce the content they are shown and to dive into the code used to build it. We will share the tools we are using and discuss some of the lessons we have learned.
Usage and application
James Bednar (Continuum Analytics), Philipp Rudiger (Continuum Analytics)
Tutorial Please note: to attend, your registration must include Tutorials on Wednesday.
It can be difficult to assemble the right set of packages from the Python scientific software ecosystem to solve complex problems. This presentation will show step by step how to make and deploy a concise, fast, and fully reproducible recipe for interactive visualization of millions or billions of datapoints using very few lines of Python in a Jupyter notebook.
JupyterHub deployments
Min Ragan-Kelley (Simula Research Laboratory), Carol Willing (Cal Poly San Luis Obispo), Yuvi Panda (Wikimedia Foundation), Ryan Lovett (Department of Statistics, UC Berkeley)
Tutorial Please note: to attend, your registration must include Tutorials on Wednesday.
JupyterHub, a multi-user server for Jupyter notebooks, enables you to offer a notebook server to everyone in a group. When teaching a course, you can use JupyterHub to give each student access to the same resources and notebooks. There’s no need for the students to install software on their laptops. This tutorial will get you started deploying and customizing JupyterHub for your needs.
Development and community
Leah Silen (NumFOCUS)
What do the discovery of the Higgs boson, the landing of the Philae robot, the analysis of political engagement, and the freedom of human trafficking victims have in common? NumFOCUS projects were there. We invite you to come and learn how, together, we can empower scientists and save humanity.
Usage and application
Karlijn Willems (DataCamp)
Drawing inspiration from narrative theory and design thinking, among others, we will walk through examples that illustrate how to effectively use Jupyter notebooks in the data journalism workflow.
Extensions and customization
Matt Greenwood (Two Sigma Investments)
This talk will introduce BeakerX, a set of Jupyter notebook extensions that enable polyglot data science, time-series plotting and processing, research publication, and integration with Apache Spark. We’ll review the Jupyter extension architecture and how BeakerX plugs into it, cover the current set of BeakerX capabilities, and discuss the pivot from Beaker, a standalone notebook, to BeakerX.
Reproducible research and open science
Thorin Tabor (University of California, San Diego)
GenePattern Notebook allows Jupyter to communicate with the open source GenePattern environment for integrative genomics analysis. It wraps hundreds of software tools for analyzing “omics” data types, as well as general machine learning methods. It makes these available in Jupyter through a user-friendly interface that is accessible to both programming and nonprogramming researchers.
Extensions and customization
Chris Kotfila (Kitware Inc)
GeoNotebook is an extension to the Jupyter Notebook that provides interactive visualization and analysis of geo-spatial data. Unlike other geo-spatial extensions to the Notebook, GeoNotebook includes a fully integrated tile server providing easy visualization of vector and raster data formats.
Usage and application
Christopher Wilcox (Microsoft)
Have you thought about what it takes to host 500+ Jupyter users concurrently? What about managing 15,000+ users and their content? Learn how Azure Notebooks does this daily and about the challenges faced in designing and building a scalable Jupyter service.
Reproducible research and open science
Zach Sailer (University of Oregon)
Scientific research thrives on collaborations between computational and experimental groups who work together to solve problems using their separate expertise. This session highlights how tools like the Notebook, JupyterHub, and ipywidgets can be used to make these collaborations smoother and more effective.
JupyterHub deployments
Shreyas Cholia (Lawrence Berkeley National Laboratory), Rollin Thomas (Lawrence Berkeley National Laboratory), Shane Canon (Lawrence Berkeley National Laboratory)
Extracting scientific insights from data increasingly demands a richer, more interactive experience than high-performance computing systems have traditionally provided. We present our efforts to leverage JupyterHub to enable notebook services for data-intensive supercomputing on the Cray XC40 Cori system at the National Energy Research Scientific Computing Center (NERSC).
Core architecture
Safia Abdalla (nteract)
Tutorial Please note: to attend, your registration must include Tutorials on Wednesday.
Have you wondered what it takes to go from a Jupyter user to a Jupyter pro? Wonder no more! In this tutorial, we'll cover the core concepts of the Jupyter ecosystem: the extension ecosystem, the kernel ecosystem, and the front-end architecture. Attendees will leave with an understanding of the possibilities of the Jupyter ecosystem and practical skills for customizing the Notebook experience.
Usage and application
Paco Nathan (O'Reilly Media)
How do people manage AI systems by interacting with them? Semi-supervised learning is hard. With machine learning pipelines running at scale, there's still a large need to keep humans in the loop. This project uses Jupyter in two ways: (1) people tune ML pipelines by reviewing analytics and adjusting parameters managed within notebooks; (2) the pipelines update those notebooks in lieu of logs.
Usage and application
Srinivas Sunkara (Bloomberg LP), Cheryl Quah (Bloomberg LP)
Strong partnerships between the open-source community and industry have been driving many recent developments in Jupyter. Learn more about the results of the community's collaboration with financial service providers such as Bloomberg, including JupyterLab, bqplot and enhancements to ipywidgets that greatly enrich Jupyter as an environment for data science and quantitative financial research.
Usage and application
Aaron Kramer (DataScience Inc.)
Tutorial Please note: to attend, your registration must include Tutorials on Wednesday.
Modern natural language processing workflows often require interoperability between multiple tools. This tutorial is an introduction to interactive NLP with spaCy within the Jupyter notebook. We'll cover core NLP concepts and core workflows in spaCy, and work through examples of interacting with other tools like TensorFlow, networkx, and LIME as part of interactive NLP projects.
Usage and application
R. Stuart Geiger (UC Berkeley Institute for Data Science), Brittany Fiore-Gartland (eScience Institute, Department of Human Centered Design and Engineering, University of Washington), Charlotte Cabasse-Mazel (UC Berkeley Institute for Data Science)
Jupyter Notebooks are not only transforming how people communicate knowledge, but also supporting new social and collaborative practices. In this talk, we present ethnographic findings about various rituals performed with Jupyter notebooks. The concept of rituals is useful for thinking about how the core technology of notebooks is extended through other tools, platforms, and practices.
Usage and application
Andrew Therriault (City of Boston)
Jupyter notebooks are a great tool for exploratory analysis and early development, but what do you do when it's time to move to production? A few years ago, the obvious answer was to export to a pure Python script, but now that's not the only option. We'll look at real world cases and explore alternatives for integrating Jupyter into production workflows.
Didn’t have a chance to submit a talk to JupyterCon? Would you like targeted feedback on work in progress that may not quite be ready for a full-length presentation? Do you have work targeted to a narrow audience that could benefit from in-person discussion? The JupyterCon poster session may be just the ticket for you to share your work with the rest of the community.
Jupyter subprojects
Sylvain Corlay (QuantStack)
Tutorial Please note: to attend, your registration must include Tutorials on Wednesday.
Jupyter widgets enable building user interfaces with graphical controls, such as sliders and textboxes, inside Jupyter notebooks, documentation, and web pages. Jupyter widgets also provide a framework for building custom controls. We will show how to use Jupyter widgets effectively for interactive computing, explore the ecosystem of custom controls, and demonstrate how to build your own.
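A minimal illustration of the control model described above, assuming a standard ipywidgets installation (in a live notebook the slider renders in the browser; the `value` trait syncs with the kernel):

```python
import ipywidgets as widgets

# A slider is a widget object whose `value` trait synchronizes
# between the Python kernel and the browser front-end.
slider = widgets.IntSlider(value=5, min=0, max=10, description="n:")

# Observe changes to the value trait; in a live notebook this
# callback fires whenever the user drags the slider.
log = []
def on_change(change):
    log.append(change["new"])

slider.observe(on_change, names="value")
slider.value = 7          # a programmatic change also triggers the observer
print(slider.value)       # 7
print(log)                # [7]
```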
Core architecture
Min Ragan-Kelley (Simula Research Laboratory), Carol Willing (Cal Poly San Luis Obispo)
JupyterHub is a multi-user server for Jupyter notebooks. JupyterHub developers will discuss exciting recent additions and future plans for the project, including sharing notebooks with students and collaborators.
Core architecture
Steven Silvester (Continuum Analytics), Jason Grout (Bloomberg)
Tutorial Please note: to attend, your registration must include Tutorials on Wednesday.
A walkthrough of JupyterLab from the perspectives of both a user and an extension author: a tour of JupyterLab's capabilities and a demonstration of creating a simple extension to the environment.
Keynotes
Brett Cannon (Microsoft / Python Software Foundation)
Details to come.
Keynotes
Lorena Barba (George Washington University)
Details to come.
Keynotes
Nadia Eghbal (GitHub)
Details to come.
Keynotes
Wes McKinney (Two Sigma Investments)
Details to come.
Development and community
Kari Jordan (Data Carpentry)
Diversity can be achieved through sharing information among members of a community. As Jupyter prides itself on being a community of “dynamic developers”, “cutting edge scientists”, and “everyday users”, is our platform being shared with diverse populations? Explore how training has the potential to improve diversity and drive usage of Jupyter notebooks in broader communities.
Reproducible research and open science
Megan Risdal (Kaggle), Wendy Chih-wen Kan (Kaggle)
Kaggle Kernels, an in-browser code execution environment which includes a version of Jupyter Notebooks, has allowed Kaggle, home of the world’s largest data science community, to flourish in new ways. From a diverse repository of user-created notebooks paired with competitions and public datasets, we share how Kernels has impacted machine learning trends, collaborative data science, and learning.
Usage and application
Christine Doig (Continuum Analytics), Fabio Pliger (Continuum Analytics)
This talk will introduce how we built a commercial product on top of Jupyter to help Excel users access the capabilities of the rich data science Python ecosystem. We'll present examples and use cases in a variety of industries, the collaborative workflow between analysts and data scientists that the application has enabled, and how we leveraged the Jupyter architecture to build the product.
Robert Schroll (The Data Incubator)
2-Day Training Please note: to attend, your registration must include Training courses.
This training will introduce TensorFlow's capabilities through its Python interface with a series of Jupyter notebooks. It will move from building machine learning algorithms piece by piece to using the higher-level abstractions provided by TensorFlow. Students will use this knowledge to build and visualize machine-learning models on real-world data.
JupyterHub deployments
Ryan Lovett (Department of Statistics, UC Berkeley), Yuvi Panda (Wikimedia Foundation)
For our data science education program, we use Jupyter notebooks on a JupyterHub so students can learn data science without being distracted by details like installing and debugging Python packages. This talk will explain the DevOps principles we use to keep our hub (1000+ users, largest reported educational hub) stable and performant, and have all the features our instructors and students want.
Documentation
Carol Willing (Cal Poly San Luis Obispo)
Music, as a universal language, engages and delights. By combining music with Jupyter notebooks, you can explore and teach the basics of interactive computing and data science. We'll use music21, a tool for computer-aided musicology, and Magenta, a TensorFlow project for making music with machine learning, to create collaborative narratives and publishing materials for teaching and learning.
Development and community
Kyle Kelley (Netflix)
Netflix Data Scientists and Engineers. What do they know? Do they know things? Let's find out!
Usage and application
Patty Ryan (Microsoft), Lee Stott (Microsoft), Michael Lanzetta (Microsoft)
We describe, with video and demonstrations, four inspirational industry applications of Jupyter notebooks. These industry examples represent innovative applications of machine learning in manufacturing, retail, services, and education. We also present and share four reference industry Jupyter notebooks, along with demo datasets, for practical application to classic industry value areas.
Reproducible research and open science
Jupyter notebooks are a popular option for sharing data science workflows. We sought to explore best practices in this regard and chose to analyze Jupyter notebooks referenced in PubMed Central in terms of their reproducibility and other aspects of usability (e.g., documentation, ease of reuse). The project, which started at a hackathon earlier this month, is still ongoing and is documented on GitHub.
Christian Moscardi (The Data Incubator)
2-Day Training Please note: to attend, your registration must include Training courses.
We cover developing a machine learning pipeline, from prototyping to production, in the Jupyter platform. We look at data cleaning, feature engineering, model building/evaluation, and deployment. We dive into applications from real-world datasets. We highlight Jupyter magics, settings, and libraries to enable visualizations. We demonstrate Jupyter best practices in an industry-focused setting.
Usage and application
Laurent Gautier (Technical University of Denmark)
Tutorial Please note: to attend, your registration must include Tutorials on Wednesday.
Python is popular for data analysis, but restricting oneself to Python alone means missing a wealth of libraries and capabilities available in R or SQL. This tutorial will demonstrate that a polyglot approach can be pragmatic, reasonable, and good-looking, thanks to R visualizations.
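One inexpensive way to mix SQL into a notebook workflow, sketched here with only the Python standard library and pandas (the tutorial itself also covers R, which this sketch does not):

```python
import sqlite3
import pandas as pd

# An in-memory SQLite database stands in for a real data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (sample TEXT, value REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [("a", 1.0), ("a", 3.0), ("b", 2.0)],
)

# Let SQL do the aggregation, then hand the result to pandas
# for further analysis or plotting in the notebook.
df = pd.read_sql_query(
    "SELECT sample, AVG(value) AS mean_value "
    "FROM measurements GROUP BY sample ORDER BY sample",
    conn,
)
print(df)
```

Each language does what it is best at: SQL aggregates close to the data, and Python handles the downstream analysis.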
Jupyter subprojects
Ian Rose (UC Berkeley)
I demonstrate recent work on allowing for realtime collaboration in Jupyter notebooks, including installation, usage, and design decisions.
Kernels
This talk aims to give an opinionated answer to the question: why hasn't an official Scala kernel for Jupyter emerged yet? Part of the answer lies in the fact that there is no Scala shell as user-friendly as IPython. But a strong contender is emerging! It still has to overcome a few challenges, not the least of which is supporting big data frameworks like Spark, Scio, Scalding, etc.
Jupyter subprojects
Christian Moscardi (The Data Incubator)
This talk will focus on the practical solutions we have developed in our use of Jupyter notebooks for education. We will discuss some of the open-source Jupyter extensions we have written to improve the learning experience, as well as tools to clean notebooks before they are committed to version control.
Keynotes
Andrew Odewahn (O'Reilly Media), Fernando Perez (University of California at Berkeley)
Program chairs Andrew Odewahn and Fernando Perez open the first day of keynotes.
Usage and application
Marc Colangelo (Zymergen), Justin Nand (Zymergen), Danielle Chou (Zymergen)
Zymergen is a technology company approaching biology with an engineering and data-driven mindset. Our platform integrates robotics, software, and biology to deliver predictability and reliability during strain design and development. This session will highlight how Jupyter notebooks play an integral role in providing a shared Python environment between our software engineers and scientists.
Extensions and customization
Andreas Mueller (Columbia University)
One of the strengths of Jupyter notebooks is combining narrative, code, and graphics. This is the ideal combination for teaching anything programming-related, which is why I chose notebooks as the tool for writing "Introduction to Machine Learning with Python". However, going from notebook to book was not easy, and this talk will describe challenges and tricks for converting notebooks for print.
Kernels
Sylvain Corlay (QuantStack), Johan Mabille (QuantStack)
xeus is a library meant to facilitate the implementation of kernels for Jupyter. It takes on the burden of implementing the Jupyter kernel protocol so that kernel authors can focus on the language-specific part of the kernel and can more easily support features such as autocompletion and interactive widgets. We showcase a new C++ kernel, based on the cling interpreter, built with xeus.