Brought to you by NumFOCUS Foundation and O’Reilly Media Inc.
The official Jupyter Conference
August 22-23, 2017: Training
August 23-25, 2017: Tutorials & Conference
New York, NY

Presentations

Maarten Breddels (Kapteyn Astronomical Institute, University of Groningen)
Maarten Breddels offers an overview of vaex, a Python library that enables calculating statistics for a billion samples per second on a regular n-dimensional grid, and ipyvolume, a library that enables volume and glyph rendering in Jupyter notebooks. Together, these libraries allow the interactive visualization and exploration of large, high-dimensional datasets in the Jupyter Notebook.
Diogo Munaro Vieira (Globo.com), Felipe Ferreira (Globo.com)
JupyterHub is an important tool for research and data-driven decisions at Globo.com. Diogo Munaro Vieira and Felipe Ferreira explain how data scientists at Globo.com—the largest media group in Latin America and second largest television group in the world—use Jupyter notebooks for data analysis and machine learning, making decisions that impact 50 million users per month.
Moderated by: Roy Hyunjin Han
Jupyter Notebook is already great, but did you know that you can use it to prototype computational web applications? In this whirlwind tour, we will introduce you to several favorite open source plugins that we have been using for the past few years (many of which we have developed) that let us rapidly deploy tools for processing tables, images, spatial data, satellite images, sounds and video.
Moderated by: Ashwin Trikuta Srinath, Linh Ngo, & Jeff Denton
This talk will be about how to build a JupyterHub setup with a rich set of features for interactive HPC, and solutions to practical problems encountered in integrating JupyterHub with other components of HPC systems. We will present several examples of how researchers at our institute are using JupyterHub, and demonstrate the different parts of our setup that enable their applications.
Moderated by: Feyzi Bagirov & Tatiana Yarmola
Poor data quality frequently invalidates data analysis, especially when performed in Excel, the most commonplace business intelligence tool, on data that underwent transformations, imputations, and manual manipulations. In this talk we will use Pandas to walk through an example of Excel data analysis and illustrate several common pitfalls that make this analysis invalid.
Come enjoy delicious snacks and beverages with fellow JupyterCon attendees, speakers, and sponsors.
Andreas will be at the O'Reilly booth signing copies of his book, Introduction to Machine Learning with Python.
Daina Bouquin (Harvard-Smithsonian Center for Astrophysics), John DeBlase (CUNY Building Performance Lab)
Performing network analytics with NetworkX and Jupyter often results in difficult-to-examine hairballs rather than useful visualizations. Meanwhile, more flexible tools like SigmaJS have high learning curves for people new to JavaScript. Daina Bouquin and John DeBlase share a simple, flexible architecture that can help create beautiful JavaScript networks without ditching the Jupyter Notebook.
Scott Sanderson (Quantopian)
Scott Sanderson describes the architecture of the Quantopian Research Platform, a Jupyter Notebook deployment serving a community of over 100,000 users, explaining how, using standard extension mechanisms, it provides robust storage and retrieval of hundreds of gigabytes of notebooks, integrates notebooks into an existing web application, and enables sharing notebooks between users.
Ali Marami (R-Brain Inc)
JupyterLab provides a robust foundation for building flexible computational environments. Ali Marami explains how R-Brain leveraged the JupyterLab extension architecture to build a powerful IDE for data scientists, one of the few tools in the market that evenly supports R and Python in data science and includes features such as IntelliSense, debugging, and environment and data view.
Moderated by: Luciano Resende & Jakob Odersky
Data scientists are becoming a necessity for every company in today's data-centric world, and with them comes the requirement to provide a flexible and interactive analytics platform. This session will describe our experience and best practices in putting together an analytical platform based on Jupyter Notebooks, Apache Toree and Apache Spark.
Romain Menegaux (Bloomberg LP), Chakri Cherukuri (Bloomberg LP)
Romain Menegaux and Chakri Cherukuri demonstrate how to develop advanced applications and dashboards using open source projects, illustrated with examples in machine learning, finance, and neuroscience.
Bernie Randles (UCLA), Hope Chen (Harvard University)
Although researchers have traditionally cited code and data related to their publications, they are increasingly using the Jupyter Notebook to share the processes involved in the act of scientific inquiry. Bernie Randles and Hope Chen explore various aspects of citing Jupyter notebooks in publications, discussing benefits, pitfalls, and best practices for creating the "paper of the future."
Program chairs Fernando Pérez and Andrew Odewahn close the first day of keynotes.
Program chairs Fernando Pérez and Andrew Odewahn close the second day of keynotes.
Mark Hahnel (figshare), Marius Tulbure (figshare)
Reports of a lack of reproducibility have led funders and others to require open data and code as the outputs of research they fund. Mark Hahnel and Marius Tulbure discuss the opportunities for Jupyter notebooks to be the final output of academic research, arguing that Jupyter could help disrupt the inefficiencies in cost and scale of open access academic publishing.
Kazunori Sato (Google)
Kazunori Sato explains how you can use Google Cloud Datalab—a Jupyter environment from Google that integrates BigQuery, TensorFlow, and other Google Cloud services seamlessly—to easily run SQL queries from Jupyter to access terabytes of data in seconds and train a deep learning model with TensorFlow with tens of GPUs in the cloud, with all the usual tools available on Jupyter.
yoshi NOBU Masatani (National Institute of Informatics)
Jupyter is useful for DevOps. It enables collaboration between experts and novices to accumulate infrastructure knowledge, while automation via notebooks enhances traceability and reproducibility. Yoshi Nobu Masatani shows how to combine Jupyter with Ansible for reproducible infrastructure and explores knowledge, workflow, and customer support as literate computing practices.
Moderated by: Paco Nathan
Paco Nathan shares lessons learned about using notebooks in media and explores computable content that combines Jupyter notebooks, video timelines, Docker containers, and HTML/JS for "last mile" presentation, covering system architectures, how to coach authors to be effective with the medium, whether live coding can augment formative assessment, and the typical barriers encountered in practice.
Moderated by: Faras Sadek and Demba Ba
At Harvard, we deployed JupyterHub on Amazon AWS for two classes in the School of Engineering. The Signal Processing class used a Docker-based JupyterHub, in which each user was provisioned with a Docker container notebook. For the Decision Theory class, we redesigned JupyterHub using a dedicated EC2 instance per user’s notebook, providing better scalability, reliability and cost efficiency.
Moderated by: Diogo Munaro Vieira & Felipe Ferreira
At Globo.com, all of our data scientists use Jupyter notebooks for analysis. Their analyses require some security because they work on our shared data science platform. We will show how JupyterHub was adapted to authenticate with the company's OAuth2 solution and to track user actions with a system based on Jupyter notebook hooks.
Andreas Mueller (Columbia University)
Andreas Müller walks you through a variety of real-world datasets using Jupyter notebooks together with the data analysis packages pandas, seaborn, and scikit-learn. You'll perform an initial assessment of data, deal with different data types, visualization, and preprocessing, and build predictive models for tasks such as health care and housing.
Laurent Gautier (Verily)
Python is popular for data analysis, but restricting yourself to Python means missing a wealth of libraries or capabilities available in R or SQL. Laurent Gautier walks you through a pragmatic, reasonable, and good-looking polyglot approach, all thanks to R visualizations.
Jupyter notebooks are transforming the way we look at computing, coding, and science. But is this the only "data scientist experience" that this technology can provide? Natalino Busa explains how you can create interactive web applications for data exploration and analysis that in the background are still powered by the well-understood and well-documented Jupyter Notebook.
Moderated by: Elijah Philpotts
3Blades has developed an innovative artificial intelligence agent to enhance productivity for data scientists when using Jupyter Notebooks for Exploratory Data Analysis (EDA).
Gunjan Baid (UC Berkeley), Vinitra Swamy (UC Berkeley)
Engaging critically with data is now a required skill for students in all areas, but many traditional data science programs aren’t easily accessible to those without prior computing experience. Gunjan Baid and Vinitra Swamy explore UC Berkeley's Data Science program—2,000 students across 50 majors—explaining how its pedagogy was designed to make data science accessible to everyone.
Christine Doig (Anaconda)
Christine Doig offers an overview of the Anaconda Project, an open source library created by Continuum Analytics that delivers lightweight, efficient encapsulation and portability of data science projects. A JupyterLab extension enables data scientists to install the necessary dependencies, download datasets, and set environment variables and deployment commands from a graphical interface.
David Taieb (IBM), Prithwish Chakraborty (IBM Watson Health), Faisal Farooq (IBM Watson Health)
David Taieb, Prithwish Chakraborty, and Faisal Farooq offer an overview of PixieDust, a new open source library that speeds data exploration with interactive autovisualizations that make creating charts easy and fun.
William Merchan (DataScience.com)
Ian Swanson explores the key components of a data science platform and explains how they are enabling organizations to realize the potential of their data science teams.
Wes McKinney (Two Sigma Investments)
Wes McKinney makes the case for a shared infrastructure for data science, discusses the open source community's efforts on Apache Arrow, and offers a vision for seamless computation and data sharing across languages.
Tim Gasper (Bitfusion), Subbu Rama (Bitfusion)
Combined with GPUs, Jupyter makes for fast development and fast execution, but it is not always easy to switch from a CPU execution context to GPUs and back. Tim Gasper and Subbu Rama share best practices for doing deep learning with Jupyter and explain how to work with CPUs and GPUs more easily by using Elastic GPUs and quick-switching between custom kernels.
Matt Burton (University of Pittsburgh)
While Jupyter notebooks are a boon for computational science, they are also a powerful tool in the digital humanities. Matt Burton offers an overview of the digital humanities community, discusses defactoring—a novel use of Jupyter notebooks to analyze computational research—and reflects upon Jupyter’s relationship to scholarly publishing and the production of knowledge.
Yuvi Panda (Data Science Education Program (UC Berkeley))
Open data by itself is not enough. You need open computational infrastructures as well. Yuvi Panda offers an overview of a volunteer-led open knowledge movement that makes all of its data available openly and explores the free, open, and public computational infrastructure recently set up for people to play with and build things on its data (using a JupyterHub deployment).
Lindsey Heagy (University of British Columbia), Rowan Cockett (3point Science)
Web-based textbooks and interactive simulations built in Jupyter notebooks provide an entry point for course participants to reproduce content they are shown and dive into the code used to build them. Lindsey Heagy and Rowan Cockett share strategies and tools for developing an educational stack that emerged from the deployment of a course on geophysics and some lessons learned along the way.
James Bednar (Anaconda, Inc.), Philipp Rudiger (Anaconda Inc.)
It can be difficult to assemble the right set of packages from the Python scientific software ecosystem to solve complex problems. James Bednar and Philipp Rudiger walk you step by step through making and deploying a concise, fast, and fully reproducible recipe for interactive visualization of millions or billions of data points using very few lines of Python in a Jupyter notebook.
Min Ragan-Kelley (Simula Research Laboratory), Carol Willing (Cal Poly San Luis Obispo), Yuvi Panda (Data Science Education Program (UC Berkeley)), Ryan Lovett (Department of Statistics, UC Berkeley)
JupyterHub, a multiuser server for Jupyter notebooks, enables you to offer a notebook server to everyone in a group—which is particularly useful when teaching a course, as students no longer need to install software on their laptops. Min Ragan-Kelley, Carol Willing, Yuvi Panda, and Ryan Lovett get you started deploying and customizing JupyterHub for your needs.
Lorena Barba (George Washington University)
Lorena Barba explores how to build the ability to support reproducible research into the design of tools like Jupyter and explains how better insights on designing for reproducibility might help extend this design to our research workflows, with the machine as our active collaborator.
Moderated by: Joy Chakraborty
How to run a Kerberized, secured multiuser Jupyter notebook server (JupyterHub) integrated with a Spark/YARN cluster, and how to use Docker to set up such a complex integrated platform quickly and with less difficulty.
Moderated by: Dave Goodsmith, Meredith Lee, Rene Baston, and Edgar Fuller
A demonstration station will feature donated cloud computing resources from DataScience.com, Amazon Web Services, GoogleCloud, Satori, and other partners in live executable Jupyter-based notebooks.
Leah Silen (NumFOCUS), Andy Terrel (NumFOCUS)
What do the discovery of the Higgs boson, the landing of the Philae robot, the analysis of political engagement, and the freedom of human trafficking victims have in common? NumFOCUS projects were there. Join Leah Silen and Andy Terrel to learn how we can empower scientists and save humanity.
Moderated by: Steven Anton
Sometimes data scientists need to work directly with highly sensitive data, such as personally identifiable information or health records. Jupyter notebooks provide a great platform for exploration, but don't meet strict security standards. We will walk through a solution that our data science team uses to harden security by seamlessly encrypting notebooks at rest.
Karlijn Willems (DataCamp)
Drawing inspiration from narrative theory and design thinking, Karlijn Willems walks you through effectively using Jupyter notebooks to guide the data journalism workflow and tackle some of the challenges that data can pose to data journalism.
Moderated by: Andrey Petrin
Big Data analytics is already outdated at Yandex. We need insights and action items from our logs and databases. In this new environment, speed of prototyping comes first. I'm going to give an overview of how we use Python and Jupyter to create prototypes that amaze and inspire real product creation.
Moderated by: en zyme & Zelda Kohn
Real estate transactions are geographically sparse and rare, often with both listing and selling agents. Many factors determine price; most models rely on physical parameters. Using Jupyter/Python geographic and data tools, we'll discover "farms" and pricing characteristics. Farms (found via clustering) can affect either listing or sales price, both of which are negotiated.
Meet the Experts sessions are your chance to meet face-to-face with JupyterCon presenters in a small-group setting. Drop in to discuss their sessions, ask questions, or make suggestions.
Fernando Perez (UC Berkeley and Lawrence Berkeley National Laboratory), Andrew Odewahn (O'Reilly Media)
Program chairs Fernando Pérez and Andrew Odewahn open the second day of keynotes.
Matt Greenwood (Two Sigma Investments)
Matt Greenwood introduces BeakerX, a set of Jupyter Notebook extensions that enable polyglot data science, time series plotting and processing, research publication, and integration with Apache Spark. Matt reviews the Jupyter extension architecture and how BeakerX plugs into it, covers the current set of BeakerX capabilities, and discusses the pivot from Beaker, a standalone notebook, to BeakerX.
Peter Wang (Anaconda)
Peter Wang explores open source commercial companies, offering a firsthand account of the unique challenges of building a company that is fundamentally centered around sustainable open source innovation and sharing guidelines for how to carry volunteer-based open source values forward, intentionally and thoughtfully, in a data-centric world.
Thorin Tabor (University of California, San Diego)
Thorin Tabor offers an overview of the GenePattern Notebook, which allows Jupyter to communicate with the open source GenePattern environment for integrative genomics analysis. It wraps hundreds of software tools for analyzing omics data types, as well as general machine learning methods, and makes them available through a user-friendly interface.
Chris Kotfila (Kitware)
Chris Kotfila offers an overview of the GeoNotebook extension to the Jupyter Notebook, which provides interactive visualization and analysis of geospatial data. Unlike other geospatial extensions to the Jupyter Notebook, GeoNotebook includes a fully integrated tile server providing easy visualization of vector and raster data formats.
Christopher Wilcox (Microsoft)
Have you thought about what it takes to host 500+ Jupyter users concurrently? What about managing 17,000+ users and their content? Christopher Wilcox explains how Azure Notebooks does this daily and discusses the challenges faced in designing and building a scalable Jupyter service.
Moderated by: Douglas Liming
Ready to take a deeper look at how the Jupyter platform is having a widespread impact on analytics? Learn how a large health organization was able to fit SAS into its open ecosystem, how, thanks to the Jupyter platform, you no longer have to choose between analytics languages like Python, R, or SAS, and how a single, unified open analytics platform supported by Jupyter empowers you to have it all.
Moderated by: Chris Rawles
The availability of data combined with new analytical tools has fundamentally transformed the sports industry. In this talk, I show how to use Jupyter Notebook with powerful analytical tools such as Apache Spark and visualization tools like Matplotlib and Seaborn to assist data science.
Zach Sailer (University of Oregon)
Scientific research thrives on collaborations between computational and experimental groups, who work together to solve problems using their separate expertise. Zach Sailer highlights how tools like the Jupyter Notebook, JupyterHub, and ipywidgets can be used to make these collaborations smoother and more effective.
Shreyas Cholia (Lawrence Berkeley National Laboratory), Rollin Thomas (Lawrence Berkeley National Laboratory), Shane Canon (Lawrence Berkeley National Laboratory)
Shreyas Cholia, Rollin Thomas, and Shane Canon share their experience leveraging JupyterHub to enable notebook services for data-intensive supercomputing on the Cray XC40 Cori system at the National Energy Research Scientific Computing Center (NERSC).
Rachel Thomas (fast.ai)
Although some claim you must start with advanced math to use deep learning, the best way for any coder to get started is with code. Rachel Thomas explains how fast.ai's Practical Deep Learning for Coders course uses Jupyter notebooks to provide an environment that encourages students to learn deep learning through experimentation.
Safia Abdalla (nteract)
Have you wondered what it takes to go from a Jupyter user to a Jupyter pro? Wonder no more. Safia Abdalla explores the core concepts of the Jupyter ecosystem, including the extensions ecosystem, the kernel ecosystem, and the frontend architecture, leaving you with an understanding of the possibilities of the Jupyter ecosystem and practical skills on customizing the Jupyter Notebook experience.
Paco Nathan (O'Reilly Media)
Paco Nathan reviews use cases where Jupyter provides a frontend to AI as the means for keeping humans in the loop. This process enhances the feedback loop between people and machines, and the end result is that a smaller group of people can handle a wider range of responsibilities for building and maintaining a complex system of automation.
Srinivas Sunkara (Bloomberg LP), Cheryl Quah (Bloomberg LP)
Strong partnerships between the open source community and industry have driven many recent developments in Jupyter. Srinivas Sunkara and Cheryl Quah discuss the results of some of these collaborations, including JupyterLab, bqplot, and enhancements to ipywidgets that greatly enrich Jupyter as an environment for data science and quantitative financial research.
Moderated by: Patrick Huck & Shreyas Cholia
The open Materials Project (MP, https://materialsproject.org), which supports the design of novel materials, now allows users to contribute and share new theoretical and experimental materials data via the MPContribs tool. MPContribs uses Jupyter and JupyterHub at every layer and is an important step in MP’s effort to deliver a next-generation collaborative platform for Materials (Data) Science.
Aaron Kramer (DataScience.com)
Modern natural language processing (NLP) workflows often require interoperability between multiple tools. Aaron Kramer offers an introduction to interactive NLP with SpaCy within the Jupyter Notebook, covering core NLP concepts, core workflows in SpaCy, and examples of interacting with other tools like TensorFlow, NetworkX, LIME, and others as part of interactive NLP projects.
Moderated by: Harold Mitchell
Today's healthcare and research professionals have so much precious historical data in need of a predictive outcome. Wouldn't it be nice to carry around a web-based notebook that had built‐in algorithms to perform predictions? Even more, the built‐in algorithms would be built by and maintained by you.
Peter Wang (Anaconda)
In recent years, open source has emerged as a valuable player in the enterprise, and companies like Jupyter and Anaconda are leading the way. Peter Wang discusses the coevolution of these two major players in the new open data science ecosystem and shares next steps to a sustainable future.
R. Stuart Geiger (UC Berkeley Institute for Data Science), Charlotte Cabasse-Mazel (UC Berkeley Institute for Data Science)
The concept of the ritual is useful for thinking about how the core technology of Jupyter notebooks is extended through other tools, platforms, and practices. R. Stuart Geiger, Brittany Fiore-Gartland, and Charlotte Cabasse-Mazel share ethnographic findings about various rituals performed with Jupyter notebooks.
Kyle Kelley (Netflix)
So, Netflix's data scientists and engineers. . .do they know things? Join Kyle Kelley to find out. Kyle explores how Netflix uses Jupyter and explains how you can learn from Netflix's experience to enable analysts at your organization.
Andrew Odewahn (O'Reilly Media)
For almost five years, O’Reilly Media has centered its publishing processes around tools like Jupyter, Git, GitHub, Docker, and a host of open source packages. Andrew Odewahn explores how O'Reilly is using the Jupyter architecture to create the next generation of technical content and offers a preview of what's in store for the future.
Kyle Kelley (Netflix), Brian Granger (Cal Poly San Luis Obispo)
Kyle Kelley and Brian Granger offer a broad look at Jupyter frontends, describing their common aspects and explaining how their differences help Jupyter reach a broader set of users. They also share ongoing challenges in building these frontends (real-time collaboration, security, rich output, different Markdown formats, etc.) as well as their ongoing work to address these questions.
This session will be given by a member of the core Jupyter team. More details to come.
Andrew Therriault (City of Boston)
Jupyter notebooks are a great tool for exploratory analysis and early development, but what do you do when it's time to move to production? A few years ago, the obvious answer was to export to a pure Python script, but now there are other options. Andrew Therriault dives into real-world cases to explore alternatives for integrating Jupyter into production workflows.
Skipper Seabold (Civis Analytics), Lori Eich (Civis Analytics)
It’s not enough just to give data scientists access to Jupyter notebooks in the cloud. Skipper Seabold and Lori Eich argue that to build truly data-driven organizations, everyone from data scientists and managers to business stakeholders needs to work in concert to bring data science out of the wilderness and into the core of decision-making processes.
Moderated by: Jacob Frias Koehler
Here, we present an undergraduate mathematics curriculum that leverages the Jupyter notebook and JupyterHub to deliver course material and serve as the computational platform for students. These materials are motivated by introductory classes typically labeled Quantitative Reasoning, PreCalculus, and Calculus I.
Moderated by: Laxmikanth Malladi
Spinning up Jupyter on AWS is easy, with many references available for deploying on EC2 and EMR. This session intends to provide additional configurations and patterns that enterprises can use to govern, track, and audit usage on AWS.
The JupyterCon Poster Session is an opportunity for you to discuss your work with other attendees and presenters. Posters will be presented Wednesday evening in a friendly, networking setting so you can mingle with the presenters and discuss their work one on one.
Sylvain Corlay (QuantStack), Jason Grout (Bloomberg)
Jupyter widgets allow you to build user interfaces with graphical controls inside a Jupyter notebook and provide a framework for building custom controls. Sylvain Corlay and Jason Grout demonstrate how to use Jupyter widgets effectively for interactive computing, explore the ecosystem of custom controls, and walk you through building your own control.
Matthias Bussonnier (UC Berkeley BIDS), Paul Ivanov (Bloomberg LP)
Matthias Bussonnier and Paul Ivanov walk you through the current Jupyter architecture and protocol and explain how kernels work (decoupled from but in communication with the environment for input and output, such as a notebook document). Matthias and Paul also offer an overview of a number of kernels developed by the community and show you how you can get started writing a new kernel.
Get to know your fellow attendees over dinner. We've made reservations for you at some great restaurants in town, for a chance to make new connections and sample some of the cuisine New York City has to offer.
Join the leaders and contributors from the Jupyter community in the free JupyterCon code sprint. At the sprint, you can work side-by-side with leaders and contributors in the Jupyter ecosystem to implement that feature you've always wanted, fix bugs, write documentation, test software, or dive deep into the internals of a project.
Moderated by: Jeffrey Denton
It is a match made in the cloud. By marrying JupyterHub and CloudyCluster, users gain access to scalable Jupyter without the headache and overhead of operations. Learn how CloudyCluster can scale JupyterHub to support thousands of users and thousands of computers, all from your smartphone, tablet, or desktop device.
Min Ragan-Kelley (Simula Research Laboratory), Carol Willing (Cal Poly San Luis Obispo)
JupyterHub is a multiuser server for Jupyter notebooks. Min Ragan-Kelley and Carol Willing discuss exciting recent additions and future plans for the project, including the ability to share notebooks with students and collaborators.
This session will be given by a member of the core Jupyter team. More details to come.
Steven Silvester (Anaconda Powered by Continuum Analytics), Jason Grout (Bloomberg)
Steven Silvester and Jason Grout lead a walkthrough of JupyterLab as a user and as an extension author, explore its capabilities, and offer a demonstration of how to create a simple extension to the environment.
Brian Granger (Cal Poly San Luis Obispo), Chris Colbert (Project Jupyter), Ian Rose (UC Berkeley)
Brian Granger, Chris Colbert, and Ian Rose offer an overview of JupyterLab, which enables users to work with the core building blocks of the classic Jupyter Notebook in a more flexible and integrated manner.
Moderated by: David Visontai
The advent of many interdisciplinary research areas and the cooperation of different scientific fields demand computational systems that allow for efficient collaboration. Kooplex, our highly integrated system incorporating the advantages of Jupyter notebooks, public dashboards, version control, and data sharing, serves as a basis for different projects in fields ranging from medicine to physics.
Demba Ba (Harvard University)
Demba Ba discusses two new signal processing/statistical modeling courses he designed and implemented at Harvard, exploring his perspective as an educator and that of the students as well as the steps that led him to adopt the current cloudJHub architecture. Along the way, Demba outlines the potential of architectures such as cloudJHub to help to democratize data science education.
Kari Jordan (Data Carpentry)
Diversity can be achieved through sharing information among members of a community. Jupyter prides itself on being a community of dynamic developers, cutting-edge scientists, and everyday users, but is our platform being shared with diverse populations? Kari Jordan explains how training has the potential to improve diversity and drive usage of Jupyter notebooks in broader communities.
Megan Risdal (Kaggle), Wendy Chih-wen Kan (Kaggle)
Kaggle Kernels, an in-browser code execution environment that includes a version of Jupyter Notebooks, has allowed Kaggle to flourish in new ways. Drawing on a diverse repository of user-created notebooks paired with competitions and public datasets, Megan Risdal and Wendy Chih-wen Kan explain how Kernels has impacted machine learning trends, collaborative data science, and learning.
Christine Doig (Anaconda), Fabio Pliger (Anaconda)
Christine Doig and Fabio Pliger explain how they built a commercial product on top of Jupyter to help Excel users access the capabilities of the rich data science Python ecosystem and share examples and use cases from a variety of industries that illustrate the collaborative workflow between analysts and data scientists that the application has enabled.
Industry Table discussions are a great way to informally network with people in similar industries or interested in the same topics. Industry Table discussions will happen during lunch on Thursday, August 24, and Friday, August 25.
Robert Schroll (The Data Incubator)
Robert Schroll introduces TensorFlow's capabilities through its Python interface with a series of Jupyter notebooks, moving from building machine learning algorithms piece by piece to using the higher-level abstractions provided by TensorFlow. You'll then use this knowledge to build and visualize machine learning models on real-world data.
Jeremy Freeman (Chan Zuckerberg Initiative)
Modern biology is evolving quickly, but if we want to make our science more robust, more scalable, and more reproducible, the major bottleneck is computation. Jeremy Freeman offers an overview of a growing ecosystem of solutions to this challenge—many of which involve Jupyter—in the context of exciting scientific projects past, present, and future.
Ryan Lovett (Department of Statistics, UC Berkeley), Yuvi Panda (Data Science Education Program (UC Berkeley))
The UC Berkeley Data Science Education program uses Jupyter notebooks on a JupyterHub. Ryan Lovett and Yuvi Panda outline the DevOps principles that keep the largest reported educational hub (with 1,000+ users) stable and performant while enabling all the features instructors and students require.
Raj Singh (IBM Cloud Data Services)
Raj Singh offers an overview of PixieDust, a Jupyter Notebook extension that provides an easy way to make interactive maps from DataFrames for visual exploratory data analysis. Raj explains how he built mapping into PixieDust, putting data from Apache Spark-based analytics on maps using Mapbox GL.
Andreas Mueller (Columbia University)
Do you have questions on general machine learning or maybe something a little more specific, like Python tools for machine learning, accessible machine learning and data science, automatic machine learning or scikit-learn? Andreas is a great resource; stop by for a chat.
Gunjan Baid (UC Berkeley)
Chat with Gunjan about the use of Jupyter notebooks in education and how to use these tools more effectively in classrooms.
Kyle Kelley (Netflix)
Kyle is happy to talk with people about how Netflix’s data platform uses and deploys the backing infrastructure for Jupyter, what it’s like to build frontends for Jupyter, and where to move Jupyter forward to meet current and future needs.
Paco Nathan (O'Reilly Media)
Paco will be available to discuss using Jupyter notebooks in media for publishing computable content, coaching authors to be more effective with Jupyter notebooks, and managing machine learning pipelines with Jupyter notebooks for active learning and human-in-the-loop design patterns.
yoshi NOBU Masatani (National Institute of Informatics)
Interested in literate computing for reproducibility and nblineage? Or understanding the notebook lifecycle and the consequences of computational narratives? Grab this opportunity to meet Nobu.
Pramit Choudhary (Datascience.com)
Pramit Choudhary offers an overview of Datascience.com's model interpretation library Skater, explains how to use it to evaluate models using the Jupyter environment, and shares how it could help analysts, data scientists, and statisticians better understand their model behavior—without compromising on the choice of algorithm.
Carol Willing (Cal Poly San Luis Obispo)
Music engages and delights. Carol Willing explains how to explore and teach the basics of interactive computing and data science by combining music with Jupyter notebooks, using music21, a tool for computer-aided musicology, and Magenta, a TensorFlow project for making music with machine learning, to create collaborative narratives and publishing materials for teaching and learning.
Patty Ryan (Microsoft), Lee Stott (Microsoft), Michael Lanzetta (Microsoft)
Patty Ryan, Lee Stott, and Michael Lanzetta explore four industry examples of Jupyter notebooks that illustrate innovative applications of machine learning in manufacturing, retail, services, and education and share four reference industry Jupyter notebooks (available in both Python and R)—along with demo datasets—for practical application to your specific industry value areas.
Hilary Parker (Stitch Fix)
Traditionally, statistical training has focused on statistical methods and tests, without addressing the process of developing a technical artifact, such as a report. Hilary Parker argues that it's critical to teach students how to go about developing an analysis so they avoid common pitfalls and explains why we must adopt a blameless postmortem culture to address these pitfalls as they occur.
Daniel Mietchen (University of Virginia)
Jupyter notebooks are a popular option for sharing data science workflows. Daniel Mietchen shares best practices for reproducibility and other aspects of usability (documentation, ease of reuse, etc.) gleaned from analyzing Jupyter notebooks referenced in PubMed Central, an ongoing project that started at a hackathon earlier this year and is being documented on GitHub.
Christian Moscardi (The Data Incubator)
Christian Moscardi walks you through developing a machine learning pipeline, from prototyping to production, with the Jupyter platform, exploring data cleaning, feature engineering, model building and evaluation, and deployment in an industry-focused setting. Along the way, you'll learn Jupyter best practices and the Jupyter settings and libraries that enable great visualizations.
Moderated by: Bill Walrond
In this presentation, Kevin Rasmussen, Solution Architect, Caserta Concepts, discusses why notebooks aren’t just for data scientists anymore. Drawing information from a current project with one of the most respected newspapers in the country, he will go into detail about how to put data engineering into production with notebooks.
Moderated by: Jonathan Whitmore
Project Jupyter contains tools that are perfect for many data science tasks, including rapid iteration for data munging, visualizing, and creating a beautiful presentation of results. The same tools that give power to individual data scientists can prove challenging to integrate in a team setting. This talk will emphasize overall best practices for data science team productivity.
Fernando Perez (UC Berkeley and Lawrence Berkeley National Laboratory)
Fernando Pérez opens JupyterCon with an overview of Project Jupyter, describing how it fits into a vision of collaborative, community-based open development of tools applicable to research, education, and industry.
Moderated by: David P. Sanders (Department of Physics, Faculty of Sciences, National University of Mexico)
An overview of using Julia with the Jupyter notebook, showing how the flexibility of the language is reflected in the notebook environment.
Moderated by: Trevor Lyon, Matt McKay, and Spencer Lyon
Introduction to the QuantEcon Open Notebook Archive, a community-driven home for sharing and discovering Jupyter notebooks.
Mac Rogers (Domino Data Lab)
Mac Rogers shares best practices for creating Jupyter dashboards and some lesser-known tricks for making Jupyter dashboards interactive and attractive.
Alexandre Archambault explores why an official Scala kernel for Jupyter has yet to emerge. Part of the answer lies in the fact that there is no user-friendly, easy-to-use Scala shell in the console (i.e., no IPython for Scala). But there's a new contender, Ammonite, although it still has to overcome a few challenges, not least being supported by big data frameworks like Spark, Scio, and Scalding.
Moderated by: Matt Henderson and Shreyas Cholia
Scientists increasingly rely on large-scale computation and data analysis, with applications ranging from designing better batteries to understanding our universe. In this talk we’ll describe how scientists could greatly benefit from a platform using the core Jupyter architecture of notebooks and kernels with large-scale HPC and data analysis systems to enable interactive supercomputing.
Moderated by: Majid Khorrami & Laura Kahn
What if decision makers could use data science techniques to predict how much economic aid they would receive each year? Our proposal will show how we did just that and used data for social good.
Gather before keynotes on Thursday and Friday morning for a speed networking event. Enjoy casual conversation while meeting new attendees.
Moderated by: Marius van Niekerk
Spylon kernel is a pure Python Jupyter metakernel that gives Python and Scala users an easy-to-use kernel for Apache Spark.
Christian Moscardi (The Data Incubator)
Christian Moscardi shares the practical solutions developed at the Data Incubator for using Jupyter notebooks for education. Christian explores some of the open source Jupyter extensions he has written to improve the learning experience as well as tools to clean notebooks before they are committed to version control.
Moderated by: Joshua Cook
This teaching session will take participants through using Docker's suite of tools, the NumPy/SciPy ecosystem, and the Jupyter project as a feature-rich programming interface to build powerful systems for performing rich analysis and transformation on data sets of any size.
Brett Cannon (Microsoft | Python Software Foundation)
Brett Cannon explains why, in order for open source projects to function long-term, a symbiotic relationship between user and project maintainer needs to exist. When users receive a useful piece of software and project maintainers receive useful help in maintaining the project, everyone is happy.
M Pacer (Project Jupyter | Berkeley Institute for Data Science), Jess Hamrick (UC Berkeley), Damián Avila (Anaconda Powered by Continuum Analytics)
M Pacer, Jess Hamrick, and Damián Avila explain how the structured nature of the notebook document format, combined with native tools for manipulation and creation, allows the notebook to be used across a wide range of domains and applications.
The DOE Systems Biology Knowledgebase (KBase) is an open source project that enables biological scientists to create, execute, collaborate on and share reproducible analysis workflows. KBase's Narrative Interface, built on the Jupyter Notebook, is the front end to a scalable object store, an execution engine, a distributed compute cluster, and a library of analysis tools packaged as Docker images.
William Merchan (DataScience.com)
William Merchan outlines the fundamental trends driving the adoption of Jupyter and shares lessons learned deploying Jupyter in large organizations. Join in to learn best practices in developing a high-performing data science team and moving data science to the core and discover where data science platforms fit in.
Author Book Signings will be held in the O’Reilly booth on Thursday. This is a great opportunity for you to meet O’Reilly authors and get a free copy of one of their books. Complimentary copies will be provided to the first 25 attendees. Limit one free book per attendee.
Andrew Odewahn (O'Reilly Media), Fernando Perez (UC Berkeley and Lawrence Berkeley National Laboratory)
Program chairs Andrew Odewahn and Fernando Pérez open the first day of keynotes.
Danielle Chou (Zymergen)
Zymergen approaches biology with an engineering and data-driven mindset. Its platform integrates robotics, software, and biology to deliver predictability and reliability during strain design and development. Danielle Chou explains the integral role Jupyter notebooks play in providing a shared Python environment between Zymergen's software engineers and scientists.
Moderated by: Timothy Dobbins
SQLCell is a magic function that executes raw, parallel, parameterized SQL queries. It can accept Python variables as parameters, switch between engines with a button click, run outside of a transaction block, and produce an intuitive query plan graph with D3.js to highlight slow points in a query, all while concurrently running Python code, among other features.
Moderated by: Jason Kuruzovich
FreeCodeCamp.com is an online learning platform for coding that has figured out how to use distributed content creation to power a learning community. This talk will discuss FreeCodeCamp and detail my current efforts to start a similar model for analytics with AnalyticsDojo.com, including content, technical, and community-related opportunities and challenges.
Nadia Eghbal (GitHub)
We know money has an important role to play in open source, but where does it help and where does it fall short? Nadia Eghbal explores how money can support open source development without changing its incentives—especially when grants are involved.
Andreas Mueller (Columbia University)
The Jupyter Notebook can combine narrative, code, and graphics—the ideal combination for teaching anything programming related. That's why Andreas Müller chose to write his book, Introduction to Machine Learning with Python, in a Jupyter notebook. However, going from notebook to book was not easy. Andreas shares challenges and tricks for converting notebooks for print.
Researchers, data scientists, and professionals spend their days doing cutting-edge work. But when it comes time to write and disseminate their work, they’re often still using models and tools that haven’t changed much in decades, if not centuries.
Sylvain Corlay (QuantStack), Johan Mabille (QuantStack)
Xeus takes on the burden of implementing the Jupyter kernel protocol so that kernel authors can focus on implementing the language-specific part of the kernel and more easily support features such as autocomplete or interactive widgets. Sylvain Corlay and Johan Mabille showcase a new C++ kernel based on the Cling interpreter built with xeus.