Brought to you by NumFOCUS Foundation and O’Reilly Media
The official Jupyter Conference
Aug 21-22, 2018: Training
Aug 22-24, 2018: Tutorials & Conference
New York, NY

Presentations

Kevin McCormick (Amazon Web Services), Vladimir Zhukov (Amazon Web Services)
Kevin McCormick describes two approaches used internally at AWS to accelerate new ML algorithm development and to package Jupyter notebooks for scheduled execution: custom Jupyter kernels that automatically create Docker containers and dispatch them to either a distributed training service or a job execution environment.
Moderated by: Victor Dibia (PhD)
Rapidly creating effective visualizations using expressive grammars is challenging for users who have limited time and limited skills in statistics and data visualization. In this paper we introduce Data2Vis, a neural translation model that automatically learns data visualization strategies and generates visualizations based on data.
Zachary Glassman (The Data Incubator)
Zachary Glassman leads a hands-on dive into building intelligent business applications using machine learning, walking you through all the steps of developing a machine learning pipeline. You'll explore data cleaning, feature engineering, model building and evaluation, and deployment and extend these models into two applications using real-world datasets.
Adam Thornton (LSST)
LSST is an ambitious project to map the sky in the fastest, widest, and deepest survey ever made. The project's database disrupts traditional astronomical workflows, and its science platform requires a paradigm shift in how astronomy is done. Adam Thornton discusses the challenges of providing production services on a notebook-based architecture and the compelling advantages of JupyterLab.
Moderated by: Rose Chang
As a team of interns at Project Jupyter, we created the JupyterLab cell tags extension, which enables users to easily add, view, and manipulate descriptive tags for notebook cells. The extension includes the functionality to select all cells with a given tag. Cell tagging is designed to streamline user workflow and ease organization of notebooks.
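As a rough illustration of the data that selecting cells by tag works on: in the nbformat 4 notebook JSON, tags are stored as a list under each cell's `metadata.tags`. The sketch below is not the extension's code. The helper name `cells_with_tag` and the sample notebook are hypothetical, written only to show the idea.

```python
import json

# A tiny hand-written notebook in nbformat 4 layout (illustrative data only).
NOTEBOOK_JSON = """
{
  "nbformat": 4,
  "nbformat_minor": 2,
  "metadata": {},
  "cells": [
    {"cell_type": "markdown", "metadata": {"tags": ["intro"]},
     "source": ["# Title"]},
    {"cell_type": "code", "metadata": {"tags": ["setup", "slow"]},
     "execution_count": null, "outputs": [], "source": ["import time"]},
    {"cell_type": "code", "metadata": {},
     "execution_count": null, "outputs": [], "source": ["print('hi')"]},
    {"cell_type": "code", "metadata": {"tags": ["slow"]},
     "execution_count": null, "outputs": [], "source": ["time.sleep(1)"]}
  ]
}
"""


def cells_with_tag(notebook, tag):
    """Return the indices of cells whose metadata.tags list contains `tag`.

    Hypothetical helper: cells without a tags list are treated as untagged.
    """
    return [
        index
        for index, cell in enumerate(notebook["cells"])
        if tag in cell.get("metadata", {}).get("tags", [])
    ]


notebook = json.loads(NOTEBOOK_JSON)
slow_cells = cells_with_tag(notebook, "slow")
print(slow_cells)
```

Because tags are plain notebook metadata, any tool that reads the notebook file can filter on them in the same way.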
Moderated by: Alena Mueller
We are a team of development and design interns at Project Jupyter who are undergraduates at California Polytechnic State University, San Luis Obispo. Our poster will describe the design and development of a new JupyterLab extension for users to view and change keyboard shortcuts with the press of a button. This enables users of all experience levels to increase productivity with custom shortcuts!
Moderated by: Richa Gadgil
We are an intern team at Project Jupyter and have created a status bar extension for JupyterLab to revolutionize how information is displayed. The purpose of the status bar is to make relevant information constantly available to users to streamline workflow. Our status bar extension can house default widgets created by us as well as customizable widgets that can be created by other developers.
Moderated by: Jeffrey Treviño, nCoda
Part of the nCoda code-literate music composition and analysis environment project, abjadcompile, a package extension for the Atom text editor, enables composers to preview algorithmically generated music notations -- generated programmatically in Python using the Abjad API for Formalized Score Control -- by injecting kernel middleware into Hydrogen’s communication with a running Jupyter kernel.
Bruno Gonçalves (JPMorgan Chase & Co.)
Bruno Gonçalves offers an overview of the fundamental concepts and ideas behind human visual perception and explains how it informs scientific data visualization. To illustrate these concepts, Bruno shares practical examples using matplotlib and seaborn.
Matt Brems (General Assembly)
Missing data plagues nearly every data science problem. Often, people just drop or ignore missing data. However, this usually ends up with bad results. Matt Brems explains how bad dropping or ignoring missing data can be and teaches you how to handle missing data the right way by leveraging Jupyter notebooks to properly reweight or impute your data.
Will M Farr (Stony Brook University)
Will Farr shares examples of Jupyter use within the LIGO and Virgo Scientific Collaborations and offers lessons about the (many) advantages and (few) disadvantages of Jupyter for large, global scientific collaborations. Along the way, Will speculates on Jupyter's future role in gravitational wave astronomy.
Jane Herriman (Julia Computing)
Jane Herriman uses Jupyter notebooks to show you why Julia is special, demonstrate how easy it is to learn, and get you writing your first Julia programs.
Moderated by: Hsinyi Tsang
The National Cancer Institute Cloud Resources, including Broad Institute’s FireCloud and the Seven Bridges Cancer Genomics Cloud (CGC), seamlessly integrate interactive, exploratory analysis using Jupyter notebooks.
Alaa Moussawi offers an overview of anomaly detection algorithms that use data from phasor measurement unit sensors in the power grid. These algorithms are designed from first principles. They classify anomalies using fundamental classification algorithms such as decision trees and neural networks. Feature selection is used to identify the optimal set of parameters for the learning algorithms.
Moderated by: Anna Chang
Recent technological developments have allowed the modern-day neuroscientist to easily and quickly collect large amounts of data from the brain. Here, I discuss how neuroscientists are using Jupyter tools and Python packages to analyze and visualize neural data. These methods can extend to other systems/network scientists or artificial intelligence engineers.
Ariadne is a static analysis tool that provides support to developers by tracking tensors through TensorFlow code written in Python. The tool is based on WALA, an open source analysis framework started at IBM. By leveraging ongoing work to integrate the Monaco editor into Jupyter, this tool adds static analysis support to JupyterLab.
Wind down after a full day of sessions with delicious snacks and drinks as you network with attendees, speakers, and sponsors. The attendee reception is sponsored by Two Sigma.
Michelle Ufford (Netflix)
Netflix is reimagining what a Jupyter notebook is, who works with it, and what you can do with it. Michelle Ufford shares how Netflix leverages notebooks today and describes a brief vision for the future.
Tim Head (Wild Tree Tech)
The Binder project drastically lowers the bar to sharing and reusing software. Users wanting to try out someone else’s work need only click a single link to do so. Tim Head offers an overview of the Binder project and explores the concepts and ideas behind it. Tim then showcases examples from the community to show off the power of Binder.
Explore efforts to bring full ipywidget support to the plotly.py data visualization library. This work brings many exciting new features to Jupyter Notebook users working with plotly.py, including Python callbacks, offline image export, binary array serialization, and integration with the broader ipywidget ecosystem.
Moderated by: Kevin Bates
Data science and analytics departments are now commonplace in enterprises determined to maximize their operations. While Jupyter Notebooks have significantly decreased the cost of admission into this space, enterprises are finding that data science at scale is difficult within the current framework. Jupyter Enterprise Gateway is designed to address these scalability issues for the enterprise.
Paco Nathan (derwen.ai)
The Business Summit concludes with "unconference"-style breakout sessions that allow enterprise stakeholders to give input to Project Jupyter directly.
David Schaaf (Capital One), Julia Lane (Center for Urban Science and Progress and Wagner School, NYU), Dan Romuald Mbanga (Amazon Web Services), Dave Stuart (Department of Defense), Michael Li (The Data Incubator), Pramit Choudhary (Oracle (Datascience.com))
Join in for the Business Summit's roundtable discussion with participation from IBM, Capital One, the DoD, AWS, Oracle, and others. Speakers will discuss important issues in our current environment—everything from compliance and GDPR to ML models.
Moderated by: Byron Chu
The ability to process information in an analytical way is in high demand as students enter the workforce. Because of this, teachers are now feeling the pressure to include more coding and data analytics in their curricula. Imagine being able to easily use Jupyter (for data analytics, visualizations, math, and writing) in your K-12 class. This is a real opportunity that we call Callysto.
Ian Allison (Pacific Institute for the Mathematical Sciences), James Colliander (Pacific Institute for the Mathematical Sciences)
Over the past 18 months, Ian Allison and James Colliander have deployed Jupyter to more than 8,000 users at universities across Canada. Ian and James offer an overview of the Syzygy platform and explain how they plan to scale and deliver the service nationally and how they intend to make Jupyter integral to the working experience of students, researchers, and faculty members.
Moderated by: Peter Rose
Jupyter Notebooks have the potential to make research more reproducible. However, in practice, many notebooks fall short of this promise. Here we identify challenges and propose guidelines to organize, document, and deploy notebooks to increase reproducibility and reusability. These guidelines also apply to instructional materials.
Dave Stuart (Department of Defense)
Dave Stuart explains how Jupyter was used inside the US Department of Defense and the greater intelligence community to empower thousands of "citizen data scientists" to build and share analytics in order to meet the community’s dynamic challenges.
Closing remarks
Damián Avila (Anaconda, Inc.)
RISE has evolved into the main slideshow machinery for live presentations within the Jupyter notebook. Damián Avila explains how to install and use RISE. You'll also discover how to customize it and see some of its new capabilities. Damián concludes by discussing the migration from RISE into a new JupyterLab-RISE extension providing RISE-based capabilities in the new JupyterLab interface.
Michelle Gill, Ph.D. (BenevolentAI)
Michelle Gill explains how data science methodologies and tools can be used to link information from different scientific fields and accelerate discovery in a variety of areas, including the biological sciences.
Laura Noren (Obsidian Security)
Laura Noren offers an overview of a research project on the various infrastructure models supporting data science in research settings in terms of funding, educational uses, and research utilization. Laura then shares some of the findings, comparing the national federation model currently established in Canada to the more grassroots efforts in many US universities.
There are many great tutorials for training your deep learning models using TensorFlow, Keras, Spark, or one of the many other frameworks. But training is only a small part of the overall deep learning pipeline. This talk gives an overview of building a complete deep learning pipeline, from exploratory analysis through training, model storage, and model serving to monitoring.
Tracy Teal (The Carpentries)
We are generating vast amounts of data, but it's not the data itself that is valuable—it's the information and knowledge that can come from this data. Tracy Teal explains how to bring people to data and empower them to address their questions, reach their potential, and solve issues that are important in science, scholarship, and society.
Carol Willing (Cal Poly San Luis Obispo), Min Ragan-Kelley (Simula Research Laboratory), Erik Sundell (IT-Gymnasiet Uppsala)
Carol Willing, Min Ragan-Kelley, and Erik Sundell demonstrate how to provide easy access to Jupyter notebooks and JupyterLab without requiring users to install anything on their computers. You'll learn how to configure and deploy a cloud-based JupyterHub using Kubernetes and how to customize and extend it for your needs.
Kerim Kalafala and Nicholai L'Esperance share their experiences using Jupyter notebooks as a critical aid in designing the next generation of IBM Power and Z processors, focusing on analytics on graphs consisting of hundreds of millions of nodes. Along the way, Kerim and Nicholai explain how they leverage Jupyter notebooks as part of their overall design system.
Scott Sanderson (Quantopian)
Scott Sanderson explores how interactivity can and should influence the design of software libraries, details how the needs of interactive users differ from the needs of application developers, and shares techniques for improving the usability of libraries in interactive environments without sacrificing robustness in noninteractive environments.
Available building energy data analysis software doesn't meet the needs of building scientists and energy service professionals. Join in to explore a Python-based API and data visualization toolkit that can be used within a Jupyter notebook to create a powerful and flexible analysis tool and prototype code that can be plugged in to more robust applications.
Cristian Capdevila (Prognos)
Cristian Capdevila explains how Prognos is predicting disease by applying a combination of modern machine learning techniques and clinical expertise to the world’s largest clinical lab database and how the company is leveraging Amazon SageMaker to accelerate model development, training, and deployment.
Moderated by: Grant Nestor
A proof-of-concept that brings the power of interactive computing to the common doc (e.g. Google Docs) to cater to and empower a less technical but much larger category of users.
Moderated by: Naty Clementi
Our poster showcases the concepts and design principles behind new learning modules for teaching undergraduate engineering students to use computing to learn. The modules are Jupyter-first—i.e., written as a set of Jupyter notebooks and shared on GitHub—and refashioned as open online courses using the Open edX platform. We aim to gel a community of educators sharing teaching modules using Jupyter.
Brian Granger (Cal Poly San Luis Obispo)
Over the past two years, we have seen a dramatic shift in Jupyter’s deployment, from ad hoc usage by individuals to production enterprise application at scale. Brian Granger explains how this has expanded the Jupyter community and revealed new use cases with new challenges and opportunities.
Join in to discover lessons learned utilizing JupyterHub and Jupyter notebooks to facilitate workshops for participants and demonstrators at the ESIP 2018 Summer Meeting in Tucson, Arizona.
Today’s Balkanized “data cathedrals” force us to extract, transform, and load data before use, leaving us without a way to use data we don’t control. Join in to learn why this approach should be replaced by the "data bazaar," which lets us freely compose and build upon each other’s data much the way we do with software today, using Jupyter as a key tool.
Kevin Zielnicki (Stitch Fix)
Even with good intentions, analysis notebooks can quickly accumulate a mess of false starts and out-of-order statements. Best practices encourage cleaning up a notebook to ensure reproducibility, but many analyses will never reach this cleaned-up state. Kevin Zielnicki offers an overview of Nodebook, a Jupyter plugin that encourages reproducibility by preventing inconsistency.
Wenming Ye (Amazon Web Services), Miro Enev (NVIDIA)
Wenming Ye and Miro Enev offer an overview of deep learning along with hands-on Jupyter labs, demos, and instruction. You'll learn how DL is applied in modern business practice and how to leverage building blocks from the Amazon ML family of AI services.
Wenming Ye (Amazon Web Services), Miro Enev (NVIDIA)
Machine learning and IoT projects are increasingly common at enterprises and startups alike and have been the key innovation engine for Amazon businesses such as Go, Alexa, and Robotics. Wenming Ye and Miro Enev lead a hands-on deep dive into the AWS machine learning platform, using Project Jupyter-based Amazon SageMaker to build, train, and deploy ML/DL models to the cloud and AWS DeepLens.
Lorena Barba (George Washington University), Robert Talbert (Grand Valley State University)
In flipped learning, students encounter new material before class meetings, which helps them learn how to learn and frees up class time to focus on creative applications of the basic material. Lorena Barba and Robert Talbert discuss the use of Jupyter notebooks as a “tangible interface” for new material in a flipped course and share case studies from their own courses.
Paco Nathan (derwen.ai), Fernando Perez (UC Berkeley and Lawrence Berkeley National Laboratory), Brian Granger (Cal Poly San Luis Obispo)
JupyterCon cochairs Paco Nathan, Fernando Pérez, and Brian Granger open the second day of keynotes.
Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics.
As an institute, you run workshops, lecture courses, and research projects, and each one calls for a differently configured JupyterHub. Maintaining the support infrastructure and configuration for a single hub takes work and a lot of technical expertise; multiplying this by N JupyterHubs increases the maintenance burden even more.
Thorin Tabor (University of California, San Diego)
Making Jupyter accessible to all members of a research organization, regardless of their programming ability, empowers it to best utilize the latest analysis methods while avoiding bottlenecks. Thorin Tabor offers an overview of the GenePattern Notebook, which offers a wide suite of enhancements to the Jupyter environment to help bridge the gap between programmers and nonprogrammers.
Joshua Patterson (NVIDIA), Keith Kraus (NVIDIA), Leo Meyerovich (Graphistry)
Joshua Patterson, Leo Meyerovich, and Keith Kraus demonstrate how to use PyGDF and other GoAi technologies to easily analyze and interactively visualize large datasets from standard Jupyter notebooks.
Sylvain Corlay (QuantStack), Johan Mabille (QuantStack), Wolf Vollprecht (QuantStack), Martin Renou
Sylvain Corlay, Johan Mabille, Wolf Vollprecht, and Martin Renou share the latest features of the C++ Jupyter kernel, including live help, auto-completion, rich MIME type rendering, and interactive widgets. Join in to explore one of the most feature-rich implementations of the Jupyter kernel protocol, one that also brings Jupyter closer to the metal.
Tyler Erickson (Google)
Massive collections of data on the Earth's changing environment, collected by satellite sensors and generated by Earth system models, are being exposed via web APIs by multiple providers. Tyler Erickson highlights the use of JupyterLab and Jupyter widgets in analyzing complex high-dimensional datasets, providing insights into how our Earth is changing and what the future might look like.
ED MA (Synchrony Financial)
In the corporate tax world, Microsoft Excel, the king of spreadsheets, is the default tool for tracking information and managing tasks, but tax professionals are often annoyed by slowly updating or broken linked or referenced cells within or between spreadsheets. Jinli Ma explains how the Jupyter Notebook does a better job than Microsoft Excel with the original issue discount calculation process.
Kyle Kelley (Netflix)
Kyle Kelley walks you through creating a new web application from the ground up, teaching you how to build on top of Jupyter's protocols in the process. Along the way, you'll learn about Jupyter's REST and streaming APIs, message spec, and the notebook format.
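To give a flavor of the message spec mentioned above, here is a minimal sketch of an `execute_request` message built as a plain dictionary. It follows the general layout of the Jupyter messaging protocol (`header`, `parent_header`, `metadata`, `content`), but it is not material from the session. The function name `make_execute_request`, the `demo` username, and the protocol version string are assumptions for illustration, and the sketch neither signs nor sends the message.

```python
import json
import uuid
from datetime import datetime, timezone


def make_execute_request(code, session_id, username="demo"):
    """Build an execute_request message as a plain dict (hypothetical helper)."""
    header = {
        "msg_id": uuid.uuid4().hex,      # unique per message
        "session": session_id,           # shared by all messages of one client session
        "username": username,
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": "5.3",                # assumed protocol version for this sketch
    }
    content = {
        "code": code,                    # source code the kernel should run
        "silent": False,
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": False,
        "stop_on_error": True,
    }
    # parent_header is empty because this request is not a reply to anything.
    return {
        "header": header,
        "parent_header": {},
        "metadata": {},
        "content": content,
    }


message = make_execute_request("1 + 1", session_id=uuid.uuid4().hex)
print(json.dumps(message, indent=2))
```

A real client would also sign the serialized parts and deliver them over the kernel's transport; those steps are left out here.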
Yuvi Panda (Data Science Education Program (UC Berkeley))
Running infrastructure is challenging for an open source community. Yuvi Panda shares lessons drawn from the small community that operates MyBinder.org, covering the social and technical processes for keeping MyBinder.org reliable in the most open, transparent, and inclusive way possible, using pretty graphs about the state of MyBinder.org that anyone can see in real time.
Rachael Tatman (Kaggle)
Rachael Tatman offers a practical introduction to incorporating Jupyter notebooks into the classroom using active learning techniques.
Joel Grus (Allen Institute for Artificial Intelligence)
I have been using and teaching Python for many years. I wrote a best-selling book about learning data science. And here's my confession: I don't like notebooks. (There are dozens of us!) I'll explain why I find notebooks difficult, show how they frustrate my preferred pedagogy, demonstrate how I prefer to work, and discuss what Jupyter could do to win me over.
Moderated by: Melissa Ferrari
Jupyter notebooks, particularly widgets, are a hidden gem for academic researchers; they provide a framework to succinctly (and easily) explore data and present cohesive data narratives. I will demonstrate how I prototype pipelines for morphing raw data into interpretable plots. Specifically, I will illustrate the use of widgets for parameter estimation in image processing and model fitting.
The Minnesota Supercomputing Institute has implemented JupyterHub and the Jupyter Notebook server as a general-purpose point of entry to interactive high-performance computing services. This mode of operation runs counter to traditional job-oriented HPC operations but offers significant advantages for ease of use, data exploration, prototyping, and workflow development.
Get to know your fellow attendees over dinner. We've made reservations for you at some of the most sought-after restaurants in town. This is a great chance to make new connections and sample some of the great cuisine New York City has to offer.
Rob Newton (Trinity School)
In an effort to broaden graduates' mathematical toolkit and address gender equity in STEM education, Rob Newton has led the implementation of Python projects across all of his school's ninth-grade math courses. Now every student in the ninth grade completes three Python projects that introduce programming and integrate it with the ideas developed in class.
Douglas Blank (Bryn Mawr College)
For the last four years, Douglas Blank has used nothing but Jupyter in the classroom—from a first-year writing course to a course on assembly language, from biology to computer science, from lectures to homework. Join in to learn how Douglas has leveraged Jupyter and discover the successes and failures he experienced along the way. Nicole Petrozzo then offers a student's perspective.
Lorena Barba (George Washington University), Robert Talbert (Grand Valley State University)
The Jupyter in education track concludes with breakout sessions that allow presenters and attendees alike to work together on specific topics, potentially leading to new projects and collaborations.
Luciano Resende (IBM Watson)
IBM has leveraged the Jupyter stack in many of its products to offer industry-leading and business-critical services to its clients. Luciano Resende explores some of the open source initiatives that IBM is leading in the Jupyter ecosystem to address enterprise requirements in the community.
Gerald Rousselle (Teradata)
Gerald Rousselle reviews some of the trends in modern data and analytics ecosystems for large enterprises and shares some of the key challenges and opportunities for Jupyter adoption. He also details some recent examples and experiments in incorporating Jupyter in commercial products and platforms.
Explore IBM's Data Science Experience (DSX) and see how it leverages Jupyter to enable data scientists and AI professionals to create notebooks accessing cloud data services and Watson AI services to collaboratively analyze data and gain insights. Join in to see a demo of example Python notebooks created in Jupyter in DSX that apply AI to data and visualize the results.
David Schaaf (Capital One)
David Schaaf explains how data science and data engineering can work together in cross-functional teams—with Jupyter notebooks at the center of collaboration and the analytic workflow—to more effectively and more quickly deliver results to decision makers.
The Jupyter Poster Session is an opportunity for you to discuss your work with other attendees and presenters. Posters will be presented Wednesday evening in a friendly, networking setting so you can mingle with the presenters and discuss their work one on one.
Paco Nathan (derwen.ai)
Jupyter is built on a set of extensible, reusable building blocks, expressed through various open protocols, APIs, and standards. For many use cases, these are combined to provide an extensible software architecture for interactive computing with data. Paco Nathan shares a few somewhat unexpected things that emerged in 2018.
Help shape the future of Jupyter's user experience. We’ll be testing new UI ideas for JupyterLab, listening to your needs, and involving you in idea generation.
Maarten Breddels (Maarten Breddels), Sylvain Corlay (QuantStack)
Project Jupyter aims to provide a consistent set of tools for data science workflows, from the exploratory phase of the analysis to the sharing of the results. Maarten Breddels and Sylvain Corlay offer an overview of Jupyter's interactive widgets framework, which enables rich user interaction, including 2D and 3D interactive plotting, geographic data visualization, and much more.
Afshin Darian (Two Sigma | Project Jupyter), M Pacer (Netflix), Min Ragan-Kelley (Simula Research Laboratory), Matthias Bussonnier (UC Berkeley BIDS)
Jupyter's straightforward, out-of-the-box experience has been important for its success in widespread adoption. But good defaults only go so far. Join Afshin Darian, M Pacer, Min Ragan-Kelley, and Matthias Bussonnier to go beyond the defaults and make Jupyter your own.
Julia Lane (Center for Urban Science and Progress and Wagner School, NYU)
Government agencies have found it difficult to serve taxpayers because of the technical, bureaucratic, and ethical issues associated with access and use of sensitive data. Julia Lane explains how the Coleridge Initiative has partnered with Jupyter to design ways that can address the core problems such organizations face.
Join the leaders of and contributors to the Jupyter community for a hands-on "open studio" on Saturday, August 25, at the Hilton Midtown, sponsored by Bloomberg.
Kick off a great week at JupyterCon by meeting some of your fellow attendees at a casual happy hour.
Mariah Rogers (UC Berkeley Division of Data Sciences), Ronald Walker (UC Berkeley Division of Data Sciences), Julian Kudszus (UC Berkeley Division of Data Sciences)
The Data Science Modules program at UC Berkeley creates short explorations into data science using notebooks to allow students to work hands-on with a dataset relevant to their course. Mariah Rogers, Ronald Walker, and Julian Kudszus explain the logistics behind such a program and the indispensable features of JupyterHub that enable such a unique learning experience.
Ian Rose (UC Berkeley), Chris Colbert (Project Jupyter)
Ian Rose and Chris Colbert walk you through the JupyterLab interface and codebase and explain how it fits within the overall roadmap of Project Jupyter.
Lindsay Richman (McKinsey & Co.)
JupyterLab and Plotly both provide a rich set of tools for working with data. When combined, they create a powerful computational environment that enables users to produce versatile, robust visualizations in a fast-paced setting. Lindsay Richman demonstrates how to use JupyterLab, Plotly, and Plotly's Python-based Dash framework to create dynamic charts and interactive reports.
Chris Colbert (Project Jupyter), Ian Rose (UC Berkeley), Saul Shanabrook (Quansight)
Chris Colbert, Ian Rose, and Saul Shanabrook walk you through using, extending, and developing custom components for JupyterLab using PhosphorJS, React, JavaScript, TypeScript, and CSS. You'll learn how to make full use of the power features of JupyterLab, customize it to your needs, and develop custom extensions, making complete use of JupyterLab's current capabilities.
Jason Grout (Bloomberg), Matthias Bussonnier (UC Berkeley BIDS)
JupyterLab—Jupyter's new frontend—goes beyond the classic Jupyter Notebook, providing a flexible and extensible web application with a set of reusable components. Jason Grout and Matthias Bussonnier walk you through using JupyterLab, explain how to transition from the classic Jupyter Notebook frontend to JupyterLab, and demonstrate JupyterLab's new powerful features.
A case study of a Jupyter/JupyterHub deployment in an enterprise environment. This talk will brief the audience on what we had to do, the reasons why we love it, and how to adopt it in their companies.
Keynote - To Be Announced
Dan Romuald Mbanga (Amazon Web Services)
Keynote by Dan Romuald Mbanga
Carol Willing (Cal Poly San Luis Obispo), Jessica Forde (Jupyter), Erik Sundell (IT-Gymnasiet Uppsala)
Students learn by doing. Carol Willing, Jessica Forde, and Erik Sundell demonstrate the value of interactive content, using Jupyter notebooks, widgets, and visualization libraries, share notable examples of projects within the Jupyter community, and outline ways educators can help students develop data science literacy and use computational skills to build upon their interests.
Christopher Cho (Google)
Christopher Cho demonstrates how Kubernetes can be easily leveraged to build a complete deep learning pipeline, including data ingestion and aggregation, preprocessing, ML training, and serving with the mighty Kubernetes APIs.
M Pacer (Netflix)
Jupyter displays a rich array of media types out of the box. M Pacer explains how to use these capabilities to their full potential, covering how to add rich displays to existing and new Python classes and how to customize the way notebooks are converted to other formats. These skills will enable anyone to make beautiful objects with Jupyter.
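As a small illustration of adding a rich display to a Python class: IPython looks for methods such as `_repr_html_` on an object and, when one is present, a notebook frontend can render the returned HTML instead of the plain `repr`. The `Temperature` class below is a hypothetical example, not code from the talk.

```python
from html import escape


class Temperature:
    """Toy class with both a plain-text and an HTML representation."""

    def __init__(self, city, celsius):
        self.city = city
        self.celsius = celsius

    def __repr__(self):
        # Plain-text fallback, used in terminals and logs.
        return f"Temperature({self.city!r}, {self.celsius!r})"

    def _repr_html_(self):
        # Rich representation picked up by IPython's display machinery.
        # The city name is escaped so it cannot inject markup.
        return f"<b>{escape(self.city)}</b>: {self.celsius:.1f} &deg;C"


reading = Temperature("New York", 27.5)
print(repr(reading))
print(reading._repr_html_())
```

In a notebook, leaving `reading` as the last expression of a cell would show the bold HTML version, while a plain Python shell would still show the `repr`.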
Sam Lau (UC Berkeley), Caleb Siu (UC Berkeley)
The nbinteract package converts Jupyter notebooks with widgets into interactive, standalone HTML pages. Its built-in support for function-driven plotting makes authoring interactive pages simpler by allowing users to focus on data, not callbacks. Sam Lau and Caleb Siu offer an overview of nbinteract and walk you through the steps to publish an interactive web page from a Jupyter notebook.
Noemi Derzsy (AT&T Labs)
Networks, also known as graphs, are one of the most crucial data structures in our increasingly intertwined world. Social friendship networks, the web, financial systems, and infrastructure are all network structures. Noemi Derzsy explains how to generate, manipulate, analyze, and visualize graph structures that will help you gain insight about relationships between elements in your data.
Michelle Ufford (Netflix)
Netflix relies on notebooks to inform decisions and fuel experiments across the company. Now Netflix wants to go even further to deliver a compelling notebook experience for end-to-end workflows. Michelle Ufford shares some of the big bets Netflix is making on notebook infrastructure, covering data use at Netflix, architecture, kernels, UIs, and open source projects, such as nteract.
Matt Greenwood (Two Sigma Investments)
Matt Greenwood explains why Two Sigma, a company in a space notorious for protecting IP, thinks it's important to contribute to the open source community. Matt covers the evolution of Two Sigma's thinking and policies over the past five years and makes a case for why other companies should make a commitment to the open source ecosystem.
Learn how to make pandas faster by changing a single line of your code. Pandas on Ray gives users a seamless way to transition into multiprocess computing and parallel execution of their data science pipelines.
Ryan Abernathey (Columbia University), Yuvi Panda (Data Science Education Program (UC Berkeley))
Climate science is being flooded with petabytes of data, overwhelming traditional modes of data analysis. The Pangeo project is building a platform to take big data climate science into the cloud using SciPy and large-scale interactive computing tools. Join Ryan Abernathey and Yuvi Panda to find out what the Pangeo team is building and why and learn how to use it.
Romit Mehta (PayPal), Praveen Kanamarlapudi (PayPal)
Hundreds of PayPal's data scientists, analysts, and developers use Jupyter to access data spread across filesystem, relational, document, and key-value stores, enabling complex analytics and an easy way to build, train, and deploy machine learning models. Romit Mehta and Praveen Kanamarlapudi explain how PayPal built its Jupyter infrastructure and powerful extensions.
Moderated by: Sonyah Seiden
This exploratory project delves into different ways to incorporate variance into the modeling process using historical data on global renewable energy trends. Using pandas, NumPy, scikit-learn, and PyMC3, the methodology combines k-means clustering with autoregressive, ensemble, and Bayesian autoregressive modeling to understand how and to what degree each approach impacts the results.
April Clyburne-Sherin (Code Ocean)
April Clyburne-Sherin walks you through preparing Jupyter notebooks for computationally reproducible publication. You'll learn best practices for publishing notebooks and get hands-on experience preparing your own research for reuse, creating documentation, and submitting your notebook to share.
Moderated by: Trevor Lyon
QuantEcon Notes is an open source Jupyter notebook sharing site. Users can submit their own notebooks or discover other notebooks.
George Williams (GSI Technology), Harini Kannan (Capsule8), Alex Comerford (Capsule8)
The key to successful threat detection in cybersecurity is fast response. George Williams, Harini Kannan, and Alex Comerford offer an overview of specialized extensions they have built for data scientists working in cybersecurity that can be used and deployed via JupyterHub.
William Stein (SageMath, Inc. | University of Washington)
William Stein explains how CoCalc relates to Project Jupyter and shares how he implemented real-time collaborative editing of Jupyter notebooks in CoCalc.
Jackson Brown (Allen Institute for Cell Science), Aneesh Karve (Quilt)
Reproducible data is essential for notebooks that work across time, across contributors, and across machines. Jackson Brown and Aneesh Karve demonstrate how to use an open source data registry to create reproducible data dependencies for Jupyter and share a case study in open science over terabyte-size image datasets.
Elizabeth Wickes (School of Information Sciences, University of Illinois at Urbana-Champaign)
As practitioners of open science begin to migrate their educational material into public repositories, many of their common practices and platforms can be used to streamline the development of instructional material. Elizabeth Wickes explains how open science practices can be used in an educational context and why they are best facilitated by tools like the Jupyter Notebook.
Chris Harris (Kitware)
In silico prediction of chemical properties has seen vast improvements in both veracity and volume of data but is currently hamstrung by a lack of transparent, reproducible workflows coupled with environments for visualization and analysis. Chris Harris offers an overview of a platform that uses Jupyter notebooks to enable an end-to-end workflow from simulation setup to visualizing the results.
Rachael Tatman (Kaggle)
Rachael Tatman shows you how to take an existing research project and make it fully reproducible using Kaggle Kernels. You'll learn best practices for and get hands-on experience with each of the three components necessary for completely reproducible research.
Sandra Savchenko-de Jong (Swiss Data Science Center)
Sandra Savchenko-de Jong offers an overview of Renku, a highly scalable and secure open software platform designed to make (data) science reproducible, foster collaboration between scientists, and share resources in a federated environment.
Jupyter notebooks enable simple data analytics for data scientists, and BeakerX is a collection of kernels and extensions to the Jupyter interactive computing environment. But who is responsible for bringing up those notebooks? This talk discusses how to leverage Mesosphere DC/OS as a self-service platform for JupyterLab notebooks.
Ian Foster (Argonne National Laboratory | University of Chicago)
The Globus service simplifies the utilization of large and distributed data on the Jupyter platform. Ian Foster explains how to use Globus and Jupyter to seamlessly access notebooks using existing institutional credentials, connect notebooks with data residing on disparate storage systems, and make data securely available to business partners and research collaborators.
Luciano Resende (IBM Watson)
Luciano Resende outlines a pattern for building deep learning models using the Jupyter Notebook's interactive development in commodity hardware and leveraging platforms and services such as Fabric for Deep Learning (FfDL) for cost-effective full dataset training of deep learning models.
Matthew Seal (Netflix)
Using papermill, an nteract project, Matthew Seal walks you through how Netflix uses notebooks to track user jobs and provide a simple interface for work submission. You’ll get an inside peek at how Netflix is tackling the scheduling problem for a range of users who want easily managed workflows.
Fernando Perez (UC Berkeley and Lawrence Berkeley National Laboratory)
In 2018, UC Berkeley launched a new major in data science, anchored by two core courses that are the fastest-growing in the history of the university. Fernando Pérez discusses the program and explains how the core courses, which now reach roughly 40% of the campus population, are extending data science into specific domains that cover virtually all disciplinary areas of the campus.
Vijay Reddy (Google Cloud)
Vijay Reddy walks you through the process of building machine learning models with TensorFlow. You'll learn about data exploration, feature engineering, model creation, training, evaluation, deployment, and more.
Bo Peng (The University of Texas, MD Anderson Cancer Center)
Bo Peng offers an overview of Script of Scripts (SoS), a Python 3-based workflow engine with a Jupyter frontend that allows the use of multiple kernels in one notebook. This unique combination enables users to analyze data using multiple scripting languages in one notebook and, if needed, convert scripts to workflows in situ to analyze large amounts of data on remote systems.
Ready, set, network! Meet fellow attendees who are looking to connect at JupyterCon. We'll gather before Thursday keynotes for an informal speed networking event. Be sure to bring your business cards—and remember to have fun.
Ready, set, network! Meet fellow attendees who are looking to connect at JupyterCon. We'll gather before Friday keynotes for an informal speed networking event. Be sure to bring your business cards—and remember to have fun.
Versioning is easy when you only need a local versioning system (v1, v2, v3, etc.). It gets hard when versioning info needs to concisely say if upgrades are safe or risky and roughly what will change. Explore StabVS, a stabilizing versioning system developed for EvoSysBio research, which could help Jupyter open science users increase the long-term stability of their code.
David Koop (University of Massachusetts Dartmouth)
Dataflow notebooks build on the Jupyter Notebook environment by adding constructs to make dependencies between cells explicit and clear. David Koop offers an overview of the Dataflow kernel, shows how it can be used to robustly link cells as a notebook is developed, and demonstrates how that notebook can be reused and extended without impacting its reproducibility.
Carol Willing (Cal Poly San Luis Obispo)
New challenges are emerging for Jupyter, open information, and investing in the future. You, the innovators of this growing knowledge commons, will determine how we meet these challenges and sustain the ecosystem. Carol Willing shows how you can start.
Diogo Castro (CERN)
SWAN, CERN’s service for web-based analysis, leverages the power of Jupyter to provide the high energy physics community access to state-of-the-art infrastructure and services through a web service. Diogo Castro offers an overview of SWAN and explains how researchers and students are using it in their work.
Stephanie Stattel (Bloomberg LP), Paul Ivanov (Bloomberg LP)
Stephanie Stattel and Paul Ivanov walk you through a series of extensions that demonstrate the power and flexibility of JupyterLab’s architecture, from targeted functionality modifications to more extreme atmospheric changes that require extensive decoupling and flexibility within JupyterLab.
Moderated by: Tony Fast
This work highlights the diverse content shared during the first few months of our growing Atlanta Jupyter User Group community.
Min Ragan-Kelley (Simula Research Laboratory), Carol Willing (Cal Poly San Luis Obispo), Yuvi Panda (Data Science Education Program (UC Berkeley))
JupyterHub is a multiuser server for Jupyter notebooks, focused on supporting deployments in research and education. Min Ragan-Kelley, Carol Willing, and Yuvi Panda discuss recent additions and future plans for the project.
Moderated by: Rose Chang
As a team of UI/UX Design Interns at Project Jupyter, we designed a commenting and annotation system for JupyterLab. This system, when implemented, will enable users to collaborate and discuss text files, notebooks, data sets, and other JupyterLab supported documents!
John Miller (Honeywell UOP)
John Miller offers an overview of the Emacs IPython Notebook (EIN), a full-featured client for the Jupyter Notebook in Emacs, and shares a brief history of its development.
Ryan Abernathey (Columbia University)
Drawing on his experience with the Pangeo project, Ryan Abernathey makes the case for the large-scale migration of scientific data and research to the cloud. The cloud offers a way to make the largest datasets instantly accessible to the most sophisticated computational techniques. A global scientific data commons could usher in a golden age of data-driven discovery.
Carol Willing (Cal Poly San Luis Obispo), Natalia Clementi (The George Washington University), James Colliander (Pacific Institute for the Mathematical Sciences), Allen Downey (Olin College of Engineering), Jason Moore (UC Davis), Danny Caballero (Michigan State University)
Join this panel of seasoned educators and the cochairs of the education track at JupyterCon to look to the future of Jupyter in teaching and learning.
Viral Shah (Julia Computing), Jane Herriman (Julia Computing), Stefan Karpinski (Julia Computing, Inc.)
Julia and Jupyter share a common evolution path: Julia is the language for modern technical computing, while Jupyter is the development and presentation environment of choice for modern technical computing. Viral Shah and Jane Herriman discuss Julia's journey and the impact of Jupyter on Julia's growth.
Catherine Ordun (Booz Allen Hamilton)
Many US government agencies are just getting started with machine learning. As a result, data scientists need to de-"black box" models as much as possible. One simple way to do this is to transparently show how the model is coded and its results at each step. Notebooks do just this. Catherine Ordun walks you through a notebook built for RNNs and explains how government agencies can use notebooks.
Tony Fast (Ronin), Nick Bollweg (Georgia Tech Research Institute)
Notebook authors often consider only the interactive experience of creating computable documents. However, the dynamic state of a notebook is a minor period in its lifecycle; the majority is spent as a file at rest. Tony Fast and Nick Bollweg explore conventions that create notebooks with value long past their inception as documents, software packages, test suites, and interactive applications.
Mark Hansen (Columbia Journalism School | The Brown Institute for Media Innovation)
Data, code, and algorithms are forming systems of power in our society that extend well beyond Twitter, Facebook, and similar networks. Mark Hansen explains why it is crucial that journalists, the explainers of last resort, be able to interrogate these systems and hold power to account.
Paco Nathan (derwen.ai), Fernando Perez (UC Berkeley and Lawrence Berkeley National Laboratory), Brian Granger (Cal Poly San Luis Obispo)
JupyterCon cochairs Paco Nathan, Fernando Pérez, and Brian Granger open the first day of keynotes.
Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics.
Moderated by: Mark Coleman
Dotscience offers zero-effort version control that couples models, data, and parameters. It includes a dashboard that plots each model’s performance as a function of its parameters and hyperparameters to simplify model optimisation, as well as a per-model data provenance graph that is generated automatically as the model reads and writes data.
David Schaaf (Capital One), Shivraj Ramanan (Capital One)
In Capital One's recent exploration of "notebook" offerings, JupyterHub emerged as a top contender that could serve as a potential platform for analytics even in highly regulated industries like financial services. David Schaaf and Shivraj Ramanan discuss Capital One's journey and explain how Jupyter has become a part of the company's ever-growing analytics toolkit.
Sean Gorman (DigitalGlobe)
Satellite imagery can be a critical resource during disasters and humanitarian crises. While the community has improved data sharing, we still struggle to create reusable data science to solve problems on the ground. Sean Gorman offers an overview of GBDX Notebooks, a step toward creating an open data science community built around Jupyter to stream imagery and share analysis at scale.
Seth Lawler (Dewberry)
Creating flood maps for coastal and riverine communities requires geospatial processing, statistical analysis, finite element modeling, and a team of specialists working together. Seth Lawler explains how using the feature-rich JupyterLab to develop tools, share code with team members, and document workflows used in the creation of flood maps improves productivity and reproducibility.
Randy Zwitch (MapD)
MapD Core is an open source analytical SQL engine that has been designed from the ground up to harness the parallelism inherent in GPUs. This enables queries on billions of rows of data in milliseconds. Randy Zwitch offers an overview of the MapD kernel extension for the Jupyter Notebook and explains how to use it in a typical machine learning workflow.
Nicolas Fernandez (Icahn School of Medicine at Mount Sinai)
Nicolas Fernandez offers an overview of Clustergrammer-Widget, an interactive heatmap Jupyter widget that enables users to easily explore high-dimensional data within a Jupyter notebook and share their interactive visualizations using nbviewer.
Chakri Cherukuri (Bloomberg LP)
Chakri Cherukuri offers an overview of the interactive widget ecosystem available in the Jupyter notebook and illustrates how Jupyter widgets can be used to build rich visualizations of machine learning models. Along the way, Chakri walks you through algorithms like regression, clustering, and optimization and shares a wizard for building and training deep learning models with diagnostic plots.
Holden Karau (Google), Matt Hunt (Bloomberg)
Many of us believe that gender diversity in open source projects is important. (If you don’t, this isn’t going to convince you.) But what things are correlated with improved gender diversity, and what can we learn from similar historic industries? Holden Karau and Matt Hunt explore the diversity of different projects, examine historic EEOC complaints, and detail parallels and historic solutions.
Julia Meinwald (Two Sigma Investments)
Julia Meinwald outlines a few effective ways Two Sigma has identified to support the unseen labor maintaining a healthy open source ecosystem and details how the company’s thinking on this topic has evolved.