Brought to you by NumFOCUS Foundation and O’Reilly Media
The official Jupyter Conference
Aug 21-22, 2018: Training
Aug 22-24, 2018: Tutorials & Conference
New York, NY

Speakers

Hear from innovative practitioners, talented managers, and senior developers who are doing amazing things in the Jupyter ecosystem. More speakers will be announced; please check back for updates.

Ryan Abernathey is an assistant professor of Earth and environmental science at Columbia University and Lamont Doherty Earth Observatory. Ryan is a physical oceanographer who studies the large-scale ocean circulation and its relationship with Earth’s climate. High-resolution numerical modeling and satellite remote sensing are key tools in this research, which has led to an interest in high-performance computing and big data. Previously, he held a postdoc at Scripps Institution of Oceanography. In 2016, Ryan was awarded an Alfred P. Sloan Research Fellowship in ocean sciences and an NSF CAREER award for a project entitled “Evolution of Mesoscale Turbulence in a Changing Climate” and received a NASA New Investigator Award in 2013. He is an active participant in and advocate for open source software, open data, and reproducible science. He holds a PhD from MIT and a BA from Middlebury College.

Presentations

Keynote by Ryan Abernathey Keynote

Keynote by Ryan Abernathey

Pangeo: Big data climate science in the cloud Session

Climate science is being flooded with petabytes of data, overwhelming traditional modes of data analysis. The Pangeo project is building a platform to take big data climate science into the cloud using SciPy and large-scale interactive computing tools. Join Ryan Abernathey and Yuvi Panda to find out what the Pangeo team is building and why and learn how to use it.

I am an IT manager for the Pacific Institute for the Mathematical Sciences. I’m also a longtime user of IPython and Jupyter with a background in computational physics.

I helped create and deploy a system of JupyterHubs under the name syzygy.ca, allowing more than 8000 staff, students, and faculty to include Jupyter in their work. I am also involved in a program to leverage Jupyter in K-12 classrooms via the Canadian Government’s CanCode initiative.

Presentations

Canadians Land on Jupyter Session

Over the past 18 months, we have deployed Jupyter to more than 8000 users at universities across Canada. In this talk, we'll discuss how we did it, how we plan to scale and deliver the service nationally, how people are using the platform, and how we intend to make Jupyter integral to the working experience of students, researchers, and faculty members.

Damián Avila is a software developer, data scientist, and quantitative analyst from Córdoba, Argentina. His main interests are data science, finance, data visualization, and the Jupyter/IPython ecosystem. He has made meaningful contributions to several open source projects (he is a core developer of popular projects such as Jupyter/IPython, Nikola, and Bokeh) and has started his own projects, the most popular being RISE (a “live” slideshow for the Jupyter notebook). He has presented talks, tutorials, and posters at several national and international conferences. Currently, he works on and leads projects as a software developer at Anaconda, Inc.

Presentations

Current RISE candies and its evolution into the future. Session

RISE has evolved into the main slideshow machinery for live presentations within the Jupyter notebook. In this talk, we'll explain how to install and use RISE and how to customize it. Additionally, we will show some new capabilities. Finally, we'll show the beginning of the migration from RISE into a new jupyterlab-rise extension providing RISE-based capabilities in the new JupyterLab interface.

Lorena A. Barba is an associate professor of mechanical and aerospace engineering at the George Washington University in Washington, DC. In addition to her research in computational science and engineering, she is interested in education technology, social learning, and massive open online courses as well as innovations in STEM education, including flipped classrooms and other forms of blended learning. Lorena is a recipient of the 2016 Leamer-Rosenthal Award for Open Social Sciences and was awarded an honorable mention at the 2017 Open Education Awards for Excellence of the Open Education Consortium.

Presentations

Flipped learning with Jupyter: Experiences, best practices, and supporting research Session

In flipped learning, students encounter new material before class meetings, which helps them learn how to learn and frees up class time to focus on creative applications of the basic material. Lorena Barba and Robert Talbert discuss the use of Jupyter notebooks as a “tangible interface” for new material in a flipped course and share case studies from their own courses.

Honey Berk is the Managing Director of the CUNY Building Performance Lab. Honey manages the organization’s core contract with the NYC Department of Administrative Services, and oversees a number of applied research programs, primarily focused around energy data analysis and building automation systems. Further, she directs the Energy Data Lab, a program that trains student interns from the CUNY campuses in energy data analysis and M&V protocols. Honey has an M.S. in Data Analytics from the CUNY School of Professional Studies and a B.A. in Psychology from New York University.

Presentations

Developing an Inverse Energy Data Analysis Toolkit with Jupyter Notebook Poster

Available building energy data analysis software does not meet the needs of building scientists and energy service professionals. This session will showcase the development of a Python-based API and data visualization toolkit that can be used within a Jupyter Notebook to create a powerful and flexible analysis tool, and also to prototype code that can be plugged into more robust applications.

Doug is an associate professor of computer science at Bryn Mawr College, an all-women’s college outside of Philadelphia, PA. He has been using Python in education for 20 years, and Jupyter since its creation. He has developed many languages and tools for Jupyter specifically for pedagogy. His research area is in combining artificial neural networks and robotics in order to give robots self-motivation.

Presentations

Jupyter Graduates! Session

For the last four years, I have used nothing but Jupyter in the classroom. From a first-year writing course to a course on assembly language, from biology to computer science, from lectures to homework: everything has been in Jupyter. In this talk, I explore the ways I have leveraged Jupyter and detail the successes and failures experienced along the way.

Nick Bollweg is a core member of the Jupyter Project and contributor to conda-forge and other Python and JavaScript open source projects. Over his career, he has done work in the enterprise open source, medical, corporate, and applied research sectors, including oncology biostatistics curation, document management fleet optimization, complex system collaboration, and decision-making tools and enterprise data science platforms. Nick holds a BA in computer science and German from the University of Minnesota and a PM in applied systems engineering from the Georgia Institute of Technology.

Presentations

The reincarnation of a notebook Session

Notebook authors often consider only the interactive experience of creating computable documents. However, the dynamic state of a notebook is a minor period in its lifecycle; the majority is spent as a file at rest. Tony Fast and Nick Bollweg explore conventions that create notebooks with value long past their inception as documents, software packages, test suites, and interactive applications.

Maarten Breddels is an astronomer, freelance developer, consultant, and data scientist working mostly with Python, C++, and JavaScript in the Jupyter ecosystem. His expertise ranges from fast numerical computation and API design to 3D visualization. He holds a bachelor’s degree in ICT and both a master’s degree and PhD in astronomy.

Presentations

Jupyter widgets Session

Project Jupyter aims to provide a consistent set of tools for data science workflows, from the exploratory phase of the analysis to the sharing of the results. Maarten Breddels and Sylvain Corlay offer an overview of Jupyter's interactive widgets framework, which enables rich user interaction, including 2D and 3D interactive plotting, geographic data visualization, and much more.

Matt currently leads instruction for General Assembly’s Data Science Immersive in Washington, DC, where he helps bridge the gap between theoretical statistics and real-world insights. Matt is passionate about making data science more accessible and putting the revolutionary power of machine learning into the hands of as many people as possible. A recovering politico, Matt was a data scientist for a political consulting firm through the 2016 election. He holds a master’s degree in statistics from the Ohio State University. When he isn’t teaching, he’s thinking about how to be a better teacher, falling asleep to Netflix, or cuddling with his pug.

Presentations

Advanced data science, part 2: Five ways to handle missing data in Jupyter notebooks Tutorial

Missing data plagues nearly every data science problem. Often, people just drop or ignore missing data. However, this usually ends up with bad results. Matt Brems explains how bad dropping or ignoring missing data can be and teaches you how to handle missing data the right way by leveraging Jupyter notebooks to properly reweight or impute your data.

Matthias Bussonnier is a postdoc at UC Berkeley BIDS and a core developer of the Jupyter and IPython projects, where he is working in close collaboration with Google to bring real-time collaboration to the Jupyter environment.

Presentations

Jupyter's configuration system Session

Jupyter's straightforward, out-of-the-box experience has been important for its success in widespread adoption. But good defaults only go so far. Join Afshin Darian, M Pacer, Min Ragan-Kelley, and Matthias Bussonnier to go beyond the defaults and make Jupyter your own.

JupyterLab tutorial Tutorial

JupyterLab—Jupyter's new frontend—goes beyond the classic Jupyter Notebook, providing a flexible and extensible web application with a set of reusable components. Jason Grout and Matthias Bussonnier walk you through using JupyterLab, explain how to transition from the classic Jupyter Notebook frontend to JupyterLab, and demonstrate the new powerful features of JupyterLab.

Diogo Castro is a member of the Software Development for Experiments group at CERN, where he works in the SWAN team as a full-stack developer.

Presentations

SWAN: CERN's Jupyter-based Interactive Data Analysis Service Session

SWAN, CERN’s Service for Web-based ANalysis, is leveraging the power of Jupyter to provide the High Energy Physics community with access to state-of-the-art infrastructure and services through a web service. This presentation details how this was possible and how it is being used by researchers and students.

Chakri Cherukuri is a senior researcher in the Quantitative Financial Research Group at Bloomberg LP. His research interests include quantitative portfolio management, algorithmic trading strategies, and applied machine learning. Chakri has extensive experience in numerical computing and software development. Previously, he built analytical tools for the trading desks at Goldman Sachs and Lehman Brothers. He holds an undergraduate degree in engineering from the Indian Institute of Technology, Madras, an MS in computer science from Arizona State University, and an MS in computational finance from Carnegie Mellon University.

Presentations

Visualizing machine learning models in the Jupyter Notebook (sponsored by Bloomberg LP) Session

Chakri Cherukuri offers an overview of the interactive widget ecosystem available in the Jupyter notebook and illustrates how Jupyter widgets can be used to build rich visualizations of machine learning models. Along the way, Chakri walks you through algorithms like regression, clustering, and optimization and shares a wizard for building and training deep learning models with diagnostic plots.

Christopher Cho is a product manager and cloud program manager at Google, where he helps customers solve machine learning and infrastructure problems, and is one of the product managers on the Kubeflow team. Previously, Chris was research program manager at DeepMind, working on cutting-edge ML research. His background is in enterprise business consulting. Chris is currently working toward his MSCS at Georgia Tech. He holds a BS in mechanical engineering from the University of Illinois Urbana-Champaign.

Presentations

Machine learning at scale with Kubernetes 1-Day Training

Christopher Cho demonstrates how Kubernetes can be easily leveraged to build a complete deep learning pipeline, including data ingestion and aggregation, preprocessing, ML training, and serving with the mighty Kubernetes APIs.

Pramit Choudhary is a lead data scientist at DataScience.com, where he focuses on optimizing and applying classical machine learning and Bayesian design strategy to solve real-world problems. Currently, he is leading initiatives on figuring out better ways to explain a model’s learned decision policies to reduce the chaos in building effective models and close the gap between a prototype and operationalized model.

Presentations

Business Summit roundtable: The current environment—Compliance, ethics, ML model interpretation, GDPR, and more Session

Join in for the Business Summit's roundtable discussion with participation from IBM, Capital One, the DoD, Amazon AWS, Oracle, and others. Speakers will discuss important issues in our current environment—everything from compliance and GDPR to ML models.

Human in the loop: Understanding model interpretation with Jupyter and Skater Tutorial

Just predicting the target labels for a data science use case is not enough. It's important to understand the why, what, and how of a given model’s behavior. Pramit Choudhary explores algorithms (post hoc and rule extraction) to faithfully interpret ML models globally and locally with Jupyter's interactiveness and Skater, an open source library to demystify the inner workings of ML models.

April Clyburne-Sherin is an outreach scientist at Code Ocean, where she trains scientists in computational reproducibility best practices. An epidemiologist, methodologist, and expert in open science tools, methods, training, and community stewardship, since 2014, April has focused on training scientists in open and reproducible research methods at the Center for Open Science, Sense About Science, and SPARC. She is coauthor of FOSTER’s Open Science Training Handbook; cofounder of OOO Canada, a network to promote leadership in open access, open education, and open data; and producer of The Method, an open source podcast. She holds an MS in population medicine (epidemiology).

Presentations

Preparing your Jupyter notebook for computationally reproducible publication: A hands-on BYONotebook tutorial for researchers Tutorial

April Clyburne-Sherin walks you through preparing Jupyter notebooks for computationally reproducible publication. You'll learn best practices for publishing notebooks and get hands-on experience preparing your own research for reuse, creating documentation, and submitting your notebook to share.

Chris Colbert is a software architect for Project Jupyter.

Presentations

JupyterLab Session

Ian Rose and Chris Colbert walk you through the JupyterLab interface and codebase and explain how it fits within the overall roadmap of Project Jupyter.

JupyterLab training 1-Day Training

Chris Colbert, Ian Rose, and Saul Shanabrook walk you through using, extending, and developing custom components for JupyterLab using PhosphorJS, React, JavaScript, TypeScript, and CSS. You'll learn how to make full use of the power features of JupyterLab, customize it to your needs, and develop custom extensions, making complete use of JupyterLab's current capabilities.

James Colliander is Professor of Mathematics at UBC and serves as Director of the Pacific Institute for the Mathematical Sciences. He is also the Founder/CEO of Crowdmark, an education technology company based in Toronto. Colliander’s research intertwines partial differential equations, harmonic analysis, and dynamical systems to address problems arising from mathematical physics and other sources. He received his PhD in 1997 from the University of Illinois. After an NSF Postdoc at the University of California Berkeley, Colliander joined the University of Toronto and became Professor in 2007. He moved to UBC in 2015. Colliander was Professeur Invité at the Université de Paris-Nord, Université de Paris-Sud, and at the Institut Henri Poincaré. He has been a member of the Institute for Advanced Study. Colliander has received a Sloan Fellowship and the McLean Award and is an award-winning teacher.

Presentations

Canadians Land on Jupyter Session

Over the past 18 months, we have deployed Jupyter to more than 8000 users at universities across Canada. In this talk, we'll discuss how we did it, how we plan to scale and deliver the service nationally, how people are using the platform, and how we intend to make Jupyter integral to the working experience of students, researchers, and faculty members.

Alex Comerford is a data scientist at cybersecurity company Capsule8, where he focuses on developing interactive and informative data visualizations to identify security issues in large-scale cloud environments. His interests include data science, data visualization, statistics, and machine learning.

Presentations

Rapid data science deployment for cybersecurity with JupyterHub Session

The key to successful threat detection in cybersecurity is fast response. George Williams, Harini Kannan, and Alex Comerford offer an overview of specialized extensions they have built for data scientists working in cybersecurity that can be used and deployed via JupyterHub.

Sylvain Corlay is a quant researcher specializing in stochastic analysis and optimal control and the founder of QuantStack. Previously, Sylvain was a quant researcher at Bloomberg LP and an adjunct faculty member at Columbia University and NYU. As an open source developer, Sylvain mostly contributes to Project Jupyter in the area of interactive widgets and lower-level components such as traitlets. He is also a member of the steering committee of the project. Sylvain is also a contributor to a number of other open source projects for scientific computing and data visualization, such as bqplot, pythreejs, and ipyleaflet, and coauthored the xtensor C++ tensor algebra library. He holds a PhD in applied mathematics from University Paris VI.

Presentations

Going Native: C++ as a First-Class Citizen of the Jupyter Ecosystem Session

In this talk, we present the latest features of the C++ Jupyter kernel, including live help, auto-completion, rich MIME type rendering, and interactive widgets, which make it one of the most featureful implementations of the Jupyter kernel protocol and bring Jupyter closer to the metal.

Jupyter widgets Session

Project Jupyter aims to provide a consistent set of tools for data science workflows, from the exploratory phase of the analysis to the sharing of the results. Maarten Breddels and Sylvain Corlay offer an overview of Jupyter's interactive widgets framework, which enables rich user interaction, including 2D and 3D interactive plotting, geographic data visualization, and much more.

Afshin Darian is a Jupyter core developer at Two Sigma and a coauthor of JupyterLab. He has been active in the open source community for several years and has worked at several open source enterprises, including Anaconda, Alfresco Software, and OpenGamma. Darian holds degrees in philosophy and medieval history.

Presentations

Jupyter's configuration system Session

Jupyter's straightforward, out-of-the-box experience has been important for its success in widespread adoption. But good defaults only go so far. Join Afshin Darian, M Pacer, Min Ragan-Kelley, and Matthias Bussonnier to go beyond the defaults and make Jupyter your own.

John DeBlase is lead developer for the CUNY Building Performance Lab, where he helps develop Python-based statistical modeling applications for city-wide energy management research. A developer, data scientist, and musician from Queens, NY, John’s personal research revolves around the development of musical intelligence systems using natural language processing techniques with a focus on real-time human-computer interaction. John is interested in developing applications for data scientists that emphasize interactive data visualization, leveraging the best tools currently available in both Python and Node.js.

Presentations

Developing an Inverse Energy Data Analysis Toolkit with Jupyter Notebook Poster

Available building energy data analysis software does not meet the needs of building scientists and energy service professionals. This session will showcase the development of a Python-based API and data visualization toolkit that can be used within a Jupyter Notebook to create a powerful and flexible analysis tool, and also to prototype code that can be plugged into more robust applications.

Miro Enev is a senior solutions architect at NVIDIA, where he helps train and guide pilot deep learning projects at Amazon. Miro’s interests include advancing data science and machine intelligence while respecting human values in future technology ecosystems.

Presentations

Explore AWS Machine Learning Platform using Amazon SageMaker (Day 2) Training Day 2

Machine learning and IoT projects are now common for enterprises and startups alike. These advanced technologies have been the key innovation engine for businesses such as Amazon Go, Alexa, and Robotics. In this hands-on workshop, we will explore the AWS machine learning platform using Project Jupyter-based Amazon SageMaker to build, train, and deploy ML/DL models to the cloud and AWS DeepLens.

Explore the AWS machine learning platform using Amazon SageMaker 2-Day Training

Machine learning and IoT projects are increasingly common at enterprises and startups alike and have been the key innovation engine for Amazon businesses such as Go, Alexa, and Robotics. Wenming Ye and Miro Enev lead a hands-on deep dive into the AWS machine learning platform, using Project Jupyter-based Amazon SageMaker to build, train, and deploy ML/DL models to the cloud and AWS DeepLens.

Tyler A. Erickson is a senior developer advocate at Google, where he fosters collaborations with researchers from academia, NGOs, and governmental organizations seeking to capitalize on Earth Engine’s capabilities for geospatial analyses that involve immense satellite and model-based datasets. Tyler leads the development of Earth Engine’s core efforts in water and climate, guides the evolution of Earth Engine to support these scientific domains, and leads support efforts for the Earth Engine Python API. A snow hydrologist by training, he holds degrees in civil and environmental engineering and geography from Colorado State University, CalTech, Stanford, and the University of Colorado at Boulder. Tyler is a longtime Python programmer, with contributions to the Open Source Geospatial (OSGeo) Foundation and the Free and Open Source Software for Geospatial (FOSS4G) conferences.

Presentations

How JupyterLab and widgets enable interactive analysis of the Earth's past, present, and future Session

Massive collections of data on the Earth's changing environment, collected by satellite sensors and generated by Earth system models, are being exposed via web APIs by multiple providers. Tyler Erickson highlights the use of JupyterLab and Jupyter widgets in analyzing complex high-dimensional datasets, providing insights into how our Earth is changing and what the future might look like.

Will Farr is an associate professor in the Department of Physics and Astronomy at Stony Brook University and the Gravitational Wave Astronomy Group leader at the Flatiron Institute’s Center for Computational Astronomy. A theoretical astrophysicist with interests in astrostatistics, the gravitational dynamics of exoplanets and dense stellar systems, gravitational waves, compact object evolution, computational astrophysics, and general relativity, Will is also an enthusiastic programming language polyglot and has contributed software to many astronomical projects. You can find him as farr on GitHub.

Presentations

All the cool kids are doing it; maybe we should too? Jupyter, gravitational waves, and the LIGO and Virgo Scientific Collaborations Keynote

Will Farr shares examples of Jupyter use within the LIGO and Virgo Scientific Collaborations and offers lessons about the (many) advantages and (few) disadvantages of Jupyter for large, global scientific collaborations. Along the way, Will speculates on Jupyter's future role in gravitational wave astronomy.

Tony Fast is a modern scientist with over a decade of experience analyzing unstructured data for cross-functional teams in research, business, and security. Tony currently explores the intersection of applied engineering and computer science, trying to understand how open access will transform basic science for the next-generation workforce. He is actively building diverse communities around open source scientific software technologies in metro Atlanta; he currently organizes the Atlanta Jupyter user group and is a data lead at Code for Atlanta. He was also a cofounder of PyData Atlanta. Tony holds a PhD in materials science and engineering from Drexel University and a BS in ceramic engineering from Rutgers University.

Presentations

The reincarnation of a notebook Session

Notebook authors often consider only the interactive experience of creating computable documents. However, the dynamic state of a notebook is a minor period in its lifecycle; the majority is spent as a file at rest. Tony Fast and Nick Bollweg explore conventions that create notebooks with value long past their inception as documents, software packages, test suites, and interactive applications.

Nicolas Fernandez is a computational scientist at the Human Immune Monitoring Center at the Icahn School of Medicine at Mount Sinai. Nicolas is a computational biologist with interests in analysis and visualization of high-throughput biological data as a means to understanding biological regulatory networks.

Presentations

Visualizing High-Dimensional Biological Data with Clustergrammer-Widget in Jupyter Notebooks Session

Exploring high-dimensional data requires the development of sophisticated interactive visualizations that enable users to easily discover complex patterns within their data. We developed Clustergrammer-widget, an interactive heatmap Jupyter widget that enables users to easily explore high-dimensional data within a Jupyter notebook and share their interactive visualizations using NBviewer.

Jessica Forde is a technical writer for Project Jupyter. Her previous open source projects include datamicroscopes, a Bayesian nonparametrics library in Python, and density, a tool for Columbia University study spaces based on wireless device data.

Presentations

Learn by doing: Using data-driven stories and visualizations in the (high school and college) classroom Session

Students learn by doing. Carol Willing, Jessica Forde, and Erik Sundell demonstrate the value of interactive content, using Jupyter notebooks, widgets, and visualization libraries, share notable examples of projects within the Jupyter community, and outline ways educators can help students develop data science literacy and use computational skills to build upon their interests.

Ian Foster is a senior scientist, distinguished fellow, and director of the Data Science and Learning Division at Argonne National Laboratory as well as the Arthur Holly Compton Distinguished Service Professor of Computer Science at the University of Chicago and a fellow of the Institute for Molecular Engineering. A computer scientist whose work at the intersection of computing and the sciences has produced both practical technologies that have seen wide adoption and concepts and methods that have proven influential in research and education, Ian is also chief troublemaker at Globus. His research interests span a range of topics in parallel, distributed, and data-intensive computing. A unifying theme is a desire to use the power of rapid communication to accelerate discovery, whether by linking people with remote computers and data, accelerating complex computational processes, or enabling distributed virtual teams. Ian pursues use-inspired basic research, meaning that he employs challenging practical problems to motivate and focus work on hard problems in computer science. Over the years, these practical problems have come from such fields as environmental science, economics, high-energy physics, biomedicine, and engineering. He often builds sophisticated artifacts (i.e., software and distributed systems) in order to apply, evaluate, and disseminate new concepts and methods. Ian’s work frequently involves large teams of disciplinary scholars, computer scientists, and software engineers. Ian has received multiple awards for his work, including the IEEE TCSC Award for Excellence in Scalable Computing (2014), the Inaugural ACM HPDC Lifetime Achievement Award (2012), and the IEEE Tsutomu Kanai Award (2011).

Presentations

Scaling collaborative data science with Globus and Jupyter Session

The Globus service simplifies the utilization of large and distributed data on the Jupyter platform. Ian Foster explains how to use Globus and Jupyter to seamlessly access notebooks using existing institutional credentials, connect notebooks with data residing on disparate storage systems, and make data securely available to business partners and research collaborators.

Michelle Gill is a senior deep learning consultant within NVIDIA’s Professional Services group, where she assists clients across all sectors in utilizing deep learning for strategic advantage. Previously, Michelle was a senior data scientist at Metis, where she taught quarterly bootcamps and conducted corporate training focused on data science, machine learning, and related technologies; a scientist at the National Cancer Institute, where she developed parallelized software utilizing machine learning and compressed sensing algorithms; and a postdoctoral research fellow at Columbia University Medical School, where she utilized nuclear magnetic resonance (NMR) spectroscopy to study the biological activity of cancer-associated enzymes. She holds a PhD in molecular biophysics and biochemistry from Yale University.

Presentations

Data science as a catalyst for scientific discovery Keynote

Michelle Gill explains how data science methodologies and tools can be used to link information from different scientific fields and accelerate discovery in a variety of areas, including the biological sciences.

Zachary Glassman is a data scientist in residence at the Data Incubator. Zachary has a passion for building data tools and teaching others to use Python. He studied physics and mathematics as an undergraduate at Pomona College and holds a master’s degree in atomic physics from the University of Maryland.

Presentations

Hands-on data science with Python 2-Day Training

Zachary Glassman leads a hands-on dive into building intelligent business applications using machine learning, walking you through all the steps of developing a machine learning pipeline. You'll explore data cleaning, feature engineering, model building and evaluation, and deployment and extend these models into two applications from real-world datasets.

Hands-On Data Science with Python (Day 2) Training Day 2

This course offers a foundation in building intelligent business applications using machine learning. We will walk through all the steps of developing a machine learning pipeline. We’ll look at data cleaning, feature engineering, model building/evaluation, and deployment. Students will extend these models into two applications from real-world datasets.

Bruno Gonçalves is a Moore-Sloan fellow at NYU’s Center for Data Science. With a background in physics and computer science, Bruno has spent his career exploring the use of datasets from sources as diverse as Apache web logs, Wikipedia edits, Twitter posts, epidemiological reports, and census data to analyze and model human behavior and mobility. More recently, he has been focusing on the application of machine learning and neural network techniques to analyze large geolocated datasets.

Presentations

Advanced Data Science, Part 1: Data Visualization in Jupyter using matplotlib and seaborn Tutorial

The fundamental concepts and ideas behind human visual perception and how it informs scientific data visualization are introduced in an intuitive and grounded manner. These concepts are illustrated through practical examples using matplotlib and seaborn, following a tutorial on these two libraries. Finally, the main ideas will be summarized in the form of rules of thumb for ease of reference.

Sean Gorman is the head of technical product management at DigitalGlobe. Previously, Sean was a cofounder of Timbr.io, a platform for enabling algorithmic orchestrations with sensor and social data (acquired by DigitalGlobe), and the founder and CEO of GeoIQ, a collaborative data and analytics company serving commercial and government customers (acquired by ESRI). Sean also worked at ESRI integrating social data with ESRI’s mapping technologies and was a research professor at George Mason University, where he focused on the intersection of complexity science, statistical mechanics, and spatial analysis. Sean holds a PhD from George Mason University, where he was the Provost’s High Potential Research Candidate, a Fisher Prize winner, and an INFORMS Dissertation Prize recipient.

Presentations

Using Jupyter to create a community for satellite imagery analysis and sharing Session

Satellite imagery can be a critical resource during disasters and humanitarian crises. While the community has improved data sharing, we still struggle to create reusable data science to solve on-the-ground problems. Sean Gorman offers an overview of GBDX Notebooks, a step toward creating an open data science community built around Jupyter to stream imagery and share analysis at scale.

Brian Granger is an associate professor of physics and data science at Cal Poly State University in San Luis Obispo. Brian is a leader of the IPython project, cofounder of Project Jupyter, and an active contributor to a number of other open source projects focused on data science in Python. Recently, he cocreated the Altair package for statistical visualization in Python. He is an advisory board member of NumFOCUS and a faculty fellow of the Cal Poly Center for Innovation and Entrepreneurship.

Presentations

Enterprise usage of Jupyter: The business case and best practices for leveraging open source Session

Over the past two years, we have seen a dramatic shift in Jupyter’s deployment, from ad hoc usage by individuals to production enterprise application at scale. Brian Granger explains how this has expanded the Jupyter community and revealed new use cases with new challenges and opportunities.

Friday opening remarks Keynote

JupyterCon cochairs Paco Nathan, Fernando Pérez, and Brian Granger open the second day of keynotes.

Thursday opening remarks Keynote

JupyterCon cochairs Paco Nathan, Fernando Pérez, and Brian Granger open the first day of keynotes.

Matt Greenwood is chief inspiration officer at Two Sigma, where he has led a number of company-wide efforts in engineering and modeling. Matt began his career at Bell Labs, working in the Operating Systems group under Dennis Ritchie, before moving to IBM Research, where he was responsible for a number of early efforts in tablet computing and distributed computing. Matt also served as lead developer and manager for a number of systems on the network element at Entrisphere, which created a product providing access equipment for broadband service providers, and created the Customer Engineering department in preparation for initial customer trials. Matt holds a BA and an MA in math from Oxford University, a master’s degree in theoretical physics from the Weizmann Institute of Science in Israel, and a PhD in mathematics from Columbia University, where he taught for a number of years.

Presentations

Open Source Software and the Allocation of Capital Session

The presentation will explain why Two Sigma, a company in a space notorious for protecting IP, thinks it's important to contribute to the open source community. I'll talk about the evolution of our thinking and policies over the past five years, and make a case for why other companies should make a commitment to the open source ecosystem.

Jason Grout is a Jupyter developer at Bloomberg, working primarily on JupyterLab and the interactive widget system. Previously, Jason was an assistant professor of mathematics at Drake University in Des Moines, Iowa. Jason co-organizes the PyDataNYC Meetup. He has also been a major contributor to the open source Sage mathematical software system for many years. He holds a PhD in mathematics from Brigham Young University.

Presentations

JupyterLab tutorial Tutorial

JupyterLab—Jupyter's new frontend—goes beyond the classic Jupyter Notebook, providing a flexible and extensible web application with a set of reusable components. Jason Grout and Matthias Bussonnier walk you through using JupyterLab, explain how to transition from the classic Jupyter Notebook frontend to JupyterLab, and demonstrate the new powerful features of JupyterLab.

Joel Grus is a research engineer at the Allen Institute for Artificial Intelligence, the author of the beloved O’Reilly book “Data Science from Scratch”, and the author of the beloved blog post “Fizz Buzz in Tensorflow”. Previously he worked as a software engineer at Google and as a data scientist at a variety of startups. He lives in Seattle.

Presentations

I Don't Like Notebooks Session

I have been using and teaching Python for many years. I wrote a bestselling book about learning data science. And here's my confession: I don't like notebooks. [There are dozens of us!] In this talk I'll explain why I find notebooks difficult, show how they frustrate my preferred pedagogy, demonstrate how I prefer to work, and discuss what Jupyter could do to win me over.

Presentations

Keynote by Mark Hansen Keynote

Keynote with Mark Hansen

Chris Harris is a staff research and development engineer at Kitware. Chris has a wide range of research interests, from high-performance computing right through to client-side visualization of scientific data sets. Prior to working at Kitware, Chris worked at IBM on high-performance messaging systems. He holds a master’s degree in computing and artificial intelligence from Imperial College London.

Presentations

Reproducible quantum chemistry in Jupyter Session

In-silico prediction of chemical properties has seen vast improvements in both veracity and volume of data, but is currently hamstrung by a lack of transparent, reproducible workflows coupled with environments for visualization and analysis. We have developed a platform that uses Jupyter notebooks to enable an end-to-end workflow, from simulation setup right through to visualizing the results.

I have contributed to the development of the Jupyter project and other PyData projects for several years. I am a known good actor in the Python data ecosystem. I have extensive experience using and developing Python and C++ for data science applications.

As a PhD student and postdoc, I have given many talks at small and large international conferences to other physicists as well as undergraduate students. I co-organise the PyData meetup in Zurich and give talks at local meetups every few months as well as at open source conferences like PyCon and EuroSciPy. I am one of the maintainers of scikit-optimize, a Python library for black-box optimisation, and have contributed to scikit-learn. I run a freelance consultancy specialised in building full-stack data science solutions and teaching artificial intelligence skills. Customers include a large international organisation based in Geneva, startups, NGOs, open source projects, and research groups. I am a mentor for Mozilla’s Open Leadership programme.

My homepage: http://www.wildtreetech.com

Presentations

Binder - lowering the bar to sharing interactive software Session

The Binder project drastically lowers the bar to sharing and reusing software. For a user, trying out someone else’s work requires only clicking a single link. This talk will introduce the audience to the concepts and ideas behind the Binder project. We will showcase examples from the community to show off the power of Binder.

Jane Herriman is director of diversity and outreach at Julia Computing and a PhD student at Caltech. She is a Julia, dance, and strength training enthusiast who uses Jupyter notebooks to teach Julia.

Presentations

An introduction to Julia in Jupyter Tutorial

Jane Herriman uses Jupyter notebooks to show you why Julia is special, demonstrate how easy it is to learn Julia, and get you writing your first Julia programs.

The journey to Julia 1.0: The "Ju" in Jupyter Session

Julia and Jupyter share a common evolution path: Julia is the language for modern technical computing, while Jupyter is the development and presentation environment of choice for modern technical computing. Viral Shah and Jane Herriman discuss Julia's journey and the impact of Jupyter on Julia's growth.

Joel Horwitz is the vice president of strategic partnerships and offerings for the Digital Business Group at IBM, where he spearheads new partnerships and offerings for IBM Analytics, IBM Watson, IBM Cloud, IBM Hybrid Cloud, and emerging technology platforms IBM Blockchain, IBM Q, and many others. Joel is a passionate product, customer and marketing executive leading growth and transformation with developer ecosystems and partner offerings. His specialties include corporate development, product management, digital marketing, and data science. He holds an MS in nanotechnology from the University of Washington and an MBA from the University of Pittsburgh’s Katz School of Business.

Presentations

Business Summit roundtable: The current environment—Compliance, ethics, ML model interpretation, GDPR, and more Session

Join in for the Business Summit's roundtable discussion with participation from IBM, Capital One, the DoD, Amazon AWS, Oracle, and others. Speakers will discuss important issues in our current environment—everything from compliance and GDPR to ML models.

Matthew Hunt started playing with computers when he was 8, sold his first program at 13, and retains an unhealthy degree of curiosity. He lives in New York, where he can be found tinkering with 3D printers, dabbling in the future of flight, playing with VR headsets, and even doing work sometimes. He still believes that where you find people having the most fun, there will you find the future being created. Matthew runs the NYC Spark User’s group.

Presentations

What things are correlated with gender diversity: A dig through the ASF and Jupyter projects Session

Many of us believe that gender diversity in open source projects is important. (If you don’t, this isn’t going to convince you.) But what things are correlated with improved gender diversity, and what can we learn from similar historic industries? Holden Karau and Matt Hunt explore the diversity of different projects, examine historic EEOC complaints, and detail parallels and historic solutions.

Paul Ivanov is a senior software engineer at Bloomberg LP working on IPython- and Jupyter-related open source projects. Previously, Paul worked on backend and data engineering at Disqus; was a code monkey at the Brain Imaging Center at UC Berkeley, where he worked on IPython and taught at UC Berkeley’s Python bootcamps; worked in Bruno Olshausen’s lab at the Redwood Center for Theoretical Neuroscience; and was a PhD candidate in the Vision Science program at UC Berkeley. He holds a degree in computer science from UC Davis.

Presentations

Terraforming Jupyter: Changing JupyterLab to suit your needs Session

Stephanie Stattel and Paul Ivanov walk you through a series of extensions that demonstrate the power and flexibility of JupyterLab’s architecture, from targeted functionality modifications to more extreme atmospheric changes that require extensive decoupling and flexibility within JupyterLab.

Kerim Kalafala is a member of the IBM Academy of Technology, a Senior Technical Staff Member in the IBM Systems Group, and an IBM Master Inventor. His current role is lead architect of static timing and noise analysis software tools used to design and verify the world’s fastest microprocessors. Kerim has received multiple prestigious Research Division awards for publications in computer science and mathematics, an ACM/IEEE Technical Impact Award in Electronic Design Automation, and a best-paper award at the Design Automation Conference, and was recognized for co-authoring a top-10 most cited paper in the 50-year history of DAC. Kerim has also received both the IBM Corporate and Outstanding Technical Achievement Awards for contributions to the field of statistical timing analysis. He is an inventor on 49 issued patents worldwide, with approximately a dozen more pending. Kerim is a member of the executive board for the Rhinebeck Science Foundation and volunteers extensively in his local community. Before joining IBM, Kerim received his undergraduate and graduate degrees in Computer and Systems Engineering from Rensselaer Polytechnic Institute, where he graduated summa cum laude.

Presentations

Design and Analysis of the World’s Most Advanced Microprocessors Using Jupyter Notebooks Session

We will present our experiences using Jupyter notebooks as a critical aid in the design of the next generation of IBM Power and Z processors. Analytics on graphs consisting of hundreds of millions of nodes will be emphasized, along with leveraging Jupyter notebooks as part of our overall design system.

Praveen Kanamarlapudi is a senior software engineer on the core data platform team at PayPal, where he builds scalable and distributed platforms, including a highly available Jupyter platform that is being used by hundreds of the company’s data scientists, analysts, and developers. He’s also a contributor to Livy and Sparkmagic.

Presentations

PayPal Notebooks: Data science and machine learning at scale, powered by Jupyter Session

Hundreds of PayPal's data scientists, analysts, and developers use Jupyter to access data spread across filesystem, relational, document, and key-value stores, enabling complex analytics and an easy way to build, train, and deploy machine learning models. Romit Mehta and Praveen Kanamarlapudi explain how PayPal built its Jupyter infrastructure and powerful extensions.

Harini Kannan is a data scientist at cybersecurity company Capsule8, where she applies her skills in statistics, visualization, and machine learning to a broad range of threat detection and computer security problems. She enjoys using Python, JupyterLab, R, and TensorFlow in her daily work.

Presentations

Rapid data science deployment for cybersecurity with JupyterHub Session

The key to successful threat detection in cybersecurity is fast response. George Williams, Harini Kannan, and Alex Comerford offer an overview of specialized extensions they have built for data scientists working in cybersecurity that can be used and deployed via JupyterHub.

Holden Karau is a transgender Canadian open source developer advocate at Google focusing on Apache Spark, Beam, and related big data tools. Previously, she worked at IBM, Alpine, Databricks, Google (yes, this is her second time), Foursquare, and Amazon. Holden is the coauthor of Learning Spark, High Performance Spark, and another Spark book that’s a bit more out of date. She is a committer on the Apache Spark, SystemML, and Mahout projects. When not in San Francisco, Holden speaks internationally about different big data technologies (mostly Spark). She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal. Outside of work, she enjoys playing with fire, riding scooters, and dancing.

Presentations

What things are correlated with gender diversity: A dig through the ASF and Jupyter projects Session

Many of us believe that gender diversity in open source projects is important. (If you don’t, this isn’t going to convince you.) But what things are correlated with improved gender diversity, and what can we learn from similar historic industries? Holden Karau and Matt Hunt explore the diversity of different projects, examine historic EEOC complaints, and detail parallels and historic solutions.

Kyle Kelley is a senior software engineer at Netflix, a maintainer on nteract.io, and a core developer of the IPython/Jupyter project. He wants to help build great environments for collaborative analysis, development, and production workloads for everyone, from small teams to massive scale.

Presentations

How to build on top of Jupyter’s protocols Tutorial

Kyle Kelley walks you through creating a new web application from the ground up, teaching you how to build on top of Jupyter's protocols in the process. Along the way, you'll learn about Jupyter's REST and streaming APIs, message spec, and the notebook format.

David Koop is an assistant professor in the Computer and Information Science Department at UMass Dartmouth. His research interests include data visualization, computational provenance, and data science environments. He has served as a core developer for the VisTrails project and has collaborated with scientists in the fields of climate science, quantum physics, and invasive species modeling. David holds a PhD in computing from the University of Utah.

Presentations

Supporting reproducibility in Jupyter through Dataflow notebooks Session

Dataflow notebooks build on the Jupyter Notebook environment by adding constructs to make dependencies between cells explicit and clear. David Koop offers an overview of the Dataflow kernel, shows how it can be used to robustly link cells as a notebook is developed, and demonstrates how that notebook can be reused and extended without impacting its reproducibility.

Keith Kraus is a Washington, DC-based senior engineer on the AI infrastructure team at NVIDIA, where he builds GPU-accelerated solutions around data engineering, analytics, and visualization. Previously, Keith did extensive data engineering, systems engineering, and data visualization work in the cybersecurity domain, focused on building a GPU-accelerated big data solution for advanced threat detection and cyber threat-hunting capabilities. Keith holds a BEng in computer engineering and an MEng in networked information systems from Stevens Institute of Technology.

Presentations

GPU-accelerated data science with Jupyter notebooks Session

The GPU Open Analytics Initiative (GoAi) is a collection of open source libraries, frameworks, and APIs that make leveraging GPUs easy for data scientists. Joshua Patterson and Keith Kraus demonstrate how to build Jupyter notebooks with GPU-accelerated data processing and visualizations, rapidly accelerating data exploration all without writing any low-level code.

Julian Kudszus is a software engineer at Yelp. Previously, he was a curriculum developer and program coordinator for the Data Science Modules initiative at UC Berkeley, which brings data science lessons to thousands of the university’s students across a wide range of domains through the use of JupyterHub and Jupyter notebooks. He received the Outstanding Teaching and Leadership award for his work at the Division of Data Sciences. Julian holds a bachelor’s degree in computer science from UC Berkeley.

Presentations

JupyterHub for domain-focused integrated learning modules Session

The Data Science Modules program at UC Berkeley creates short explorations into data science using notebooks to allow students to work hands-on with a dataset relevant to their course. Mariah Rogers, Ronald Walker, and Julian Kudszus explain the logistics behind such a program and the indispensable features of JupyterHub that enable such a unique learning experience.

Nicholai L’Esperance is a staff engineer in the IBM Systems group in Essex Junction, Vermont, where he works in the Product Engineering Diagnostics group. In this role, Nicholai develops new tools and methodologies to aid yield, reliability, and characterization missions for IBM’s Power and Z programs. Before joining IBM, Nicholai completed his BSEE and MSEE at the University of Vermont, graduating cum laude. During his time at UVM, Nicholai’s studies focused on signal analysis, and he coauthored several papers on ground-penetrating radar and device testing. Nicholai is continuing his studies, pursuing a graduate degree in Computer Science.

Presentations

Design and Analysis of the World’s Most Advanced Microprocessors Using Jupyter Notebooks Session

We will present our experiences using Jupyter notebooks as a critical aid in the design of the next generation of IBM Power and Z processors. Analytics on graphs consisting of hundreds of millions of nodes will be emphasized, along with leveraging Jupyter notebooks as part of our overall design system.

Julia Lane is a professor at the NYU Wagner Graduate School of Public Service and the NYU Center for Urban Science and Progress as well as an NYU provostial fellow for innovation analytics. Previously, Julia was a senior managing economist and institute fellow at American Institutes for Research, where she cofounded the Institute for Research on Innovation and Science (IRIS) at the University of Michigan. Over her career, Julia has held positions at the National Science Foundation, the Urban Institute, the World Bank, American University, and NORC at the University of Chicago.

Presentations

Jupyter, sensitive data, and public policy Session

Government agencies have found it difficult to serve taxpayers because of the technical, bureaucratic, and ethical issues associated with access and use of sensitive data. Julia Lane explains how the Coleridge Initiative has partnered with Jupyter to design ways that can address the core problems such organizations face.

I’m an MS student at UC Berkeley advised by Josh Hug. I am interested in improving data science education. Currently, I’m building tools to make it easy to create and publish interactive educational content online.

Presentations

nbinteract: Shareable, Interactive Webpages From Notebooks Session

The nbinteract package converts Jupyter notebooks with widgets into interactive, standalone HTML pages. nbinteract’s built-in support for function-driven plotting makes authoring interactive pages simpler by allowing users to focus on data, not callbacks. We will introduce nbinteract and walk through the steps to publish an interactive web page from a Jupyter notebook.

Mr. Lawler is an engineering consultant with expertise in coastal and riverine surface water modeling. Mr. Lawler is a subject matter expert in scientific programming with experience developing and scaling serial applications for parallel processing in High Performance and Cloud Computing environments. He has worked on wide-ranging projects at the national, state, and local levels, including the development and quality control of tools in use by the US Army Corps of Engineers and the United States Geological Survey. Mr. Lawler is currently completing a PhD in Civil Engineering at George Mason University, where he is conducting research with the National Weather Service to enhance modeling and forecasting capabilities in areas influenced by coastal and fluvial flooding mechanisms.

Presentations

Using JupyterLab for flood map development: approaches for improving productivity and reproducibility Session

Creating flood maps for coastal and riverine communities requires geospatial processing, statistical analysis, finite element modeling, and a team of specialists working together. This talk will demonstrate how using the feature-rich JupyterLab to develop tools, share code with team members, and document the workflows used in creating flood maps improves productivity and reproducibility.

Assistant Professor in the Laboratory of Genetics and the Wisconsin Institute for Discovery at UW-Madison. Architecting Evolvix, the first general-purpose programming language designed by biologists for biologists.

Presentations

StabVS: a Stabilizing Versioning System for reproducible open science Poster

Versioning is easy when we only need a local versioning system like v1, v2, v3. It gets hard when versioning info needs to concisely say if upgrades are safe or risky and roughly what will change. Versioning is hard to change later. The stabilizing versioning system we developed for our EvoSysBio research could help Jupyter open science users increase the long-term stability of their code.

Jinli Ma is currently the VP of Tax Data Analytics at a financial company. He holds a master’s degree in computer information technologies, a master’s degree in financial engineering, and a bachelor’s degree in applied mathematics. His work experience includes data analytics, model development, model review, and model governance in the financial industry.

Presentations

How Jupyter Notebook Makes Corporate Tax Process Easier and Better Session

In the corporate tax world, Microsoft Excel, the king of spreadsheets, is often the default tool for tracking information and managing tasks. However, tax professionals are often lost or annoyed by slow-updating or broken links and cell references within or between spreadsheets. This session highlights how Jupyter Notebook can do a better job than Microsoft Excel in the OID calculation process.

Johan Mabille is a scientific software developer at QuantStack, where he specializes in high-performance computing in C++. Previously, Johan was a quant developer at HSBC. An open source developer, Johan is the coauthor of xtensor and xeus and the main author of xsimd. He holds a master’s degree in computer science from Centrale-Supelec.

Presentations

Going Native: C++ as a First-Class Citizen of the Jupyter Ecosystem Session

In this talk, we present the latest features of the C++ Jupyter kernel, including live help, auto-completion, rich MIME type rendering, and interactive widgets, making it one of the most featureful implementations of the Jupyter kernel protocol and bringing Jupyter closer to the metal.

Dan Romuald Mbanga is a global lead business development manager at AWS, where he leads business and technical initiatives involving Amazon AI platforms such as Amazon SageMaker, designed to provide end-to-end machine learning environments for AWS’s customers. He helps AWS customers in all GEOs, as well as internal AWS stakeholders across data science, product development, marketing, sales, and technical support achieve success with AWS’s machine and deep learning technologies. Previously, Dan was a big data and DevOps engineering manager at AWS, where he built and led two teams of specialized engineers on the Hadoop ecosystem and in CI/CD technologies. Dan holds BS degrees in physics and computer science from the University of Buea. In his spare time, he enjoys traveling, hacking hardware electronics, and learning new languages.

Presentations

Business Summit roundtable: The current environment—Compliance, ethics, ML model interpretation, GDPR, and more Session

Join in for the Business Summit's roundtable discussion with participation from IBM, Capital One, the DoD, Amazon AWS, Oracle, and others. Speakers will discuss important issues in our current environment—everything from compliance and GDPR to ML models.

Keynote by Dan Romuald Mbanga Keynote

Keynote by Dan Romuald Mbanga

Jon Mease is a data scientist and software developer with the Air and Missile Defense Sector of the Johns Hopkins Applied Physics Laboratory. He has interests and experiences across a range of technical domains, algorithm families, programming languages, data visualization techniques, and data science technologies. Jon holds bachelor’s degrees in Physics and Mathematics from Millersville University and a master’s degree in Computer Science from Johns Hopkins University.

Presentations

Bringing ipywidget support to plotly.py Poster

We present our efforts to bring full ipywidget support to the plotly.py data visualization library. This work brings many exciting new features to Jupyter Notebook users working with plotly.py including Python callbacks, offline image export, binary array serialization, and integration with the broader ipywidget ecosystem.

Romit Mehta is a product manager at PayPal focusing on core big data and analytics platform products, which include a compute framework, a data platform and a notebooks platform. In this role, Romit is working to simplify application development on big data technologies like Spark and improve analysts’ and data scientists’ agility and ease their access to data spread across a multitude of data stores via friendly technologies like SQL and notebooks. In his 19-year career, Romit has built data and analytics solutions for a wide variety of companies across the networking, semiconductor, telecom, security, and fintech industries. Outside of data products, Romit spends his time with his wife Kosha and their two wonderful kids, Annika and Vedant.

Presentations

PayPal Notebooks: Data science and machine learning at scale, powered by Jupyter Session

Hundreds of PayPal's data scientists, analysts, and developers use Jupyter to access data spread across filesystem, relational, document, and key-value stores, enabling complex analytics and an easy way to build, train, and deploy machine learning models. Romit Mehta and Praveen Kanamarlapudi explain how PayPal built its Jupyter infrastructure and powerful extensions.

Julia Meinwald is open source coordinator at Two Sigma. Julia’s background is in music, but she’s been learning more about technology and scientific computing ever since she joined Two Sigma in 2010. She’s enjoyed every stop of her quest to learn more about open source software, from getting to know what makes the products developed at Two Sigma special to writing backing tracks for her musical Reb + VoDKa + Me on Sonic Pi.

Presentations

Business Summit roundtable: The current environment—Compliance, ethics, ML model interpretation, GDPR, and more Session

Join in for the Business Summit's roundtable discussion with participation from IBM, Capital One, the DoD, Amazon AWS, Oracle, and others. Speakers will discuss important issues in our current environment—everything from compliance and GDPR to ML models.

Keynote by Julia Meinwald Keynote

Keynote by Julia Meinwald

A Chemical Engineering / Computer Science double major who has spent the last 20 years in the refining and petrochemicals industry looking (mostly unsuccessfully) to find a harmonious union of the two disciplines.

My employer, Honeywell UOP, has a long and illustrious history as a technology licensor in the refining and petrochemical industry. UOP has pioneered many advancements in catalyst and process technology that have revolutionized the oil refining industry. I currently work as a Regional Service Manager in UOP’s Technology Services division, which itself has a long and important history in the UOP organization for bringing world-class technical service to UOP’s many customers.

Prior to my current role I worked in UOP’s Field Operating Services, whose members travel the world helping refiners commission and operate UOP technology. In my travels I’ve had the pleasure of working with talented and welcoming individuals in Egypt, Mexico, Chile, Venezuela, Korea, Japan, and Russia.

My history with Python goes back to the beginning of my career in the late ’90s, when I built a simple web server on the company intranet using an early incarnation of Zope. After a long hiatus while I traveled the world, I came back to Python, discovering the wonders of IPython and Python’s amazing data science ecosystem.

At heart, though, I’m a Lisp guy, and my fascination with that language goes back to college and my favorite CS topic – Artificial Intelligence – in my junior and senior years. Up until then I was very much a vi user, but it is hard to learn Lisp without also learning that most daunting of text editors: Emacs. Interestingly, it really wasn’t until I was well into my career as a Chemical Engineer that I began to use Emacs in earnest, all because of the indispensable org-mode.

And finally, EIN! I found the emacs-ipython-notebook after (re)discovering IPython around version 0.11. Amazing things were happening in Python at the time, and I was re-learning the joys of using something other than Excel for doing engineering work. I started as a user, but soon became something more as ein’s creator, Takafumi Arakaki, moved on to other things and big changes in IPython required significant updates to ein to maintain compatibility. Not able to live with the thought of a world without ein, I foolishly dived into the world of elisp and Jupyter development and have not looked back.

At the moment ein enjoys a modest user community – over 500 stars on GitHub and over 40,000 downloads on MELPA. Their kind words and bug reports have helped keep ein relevant throughout the many changes in Jupyter’s architecture in the past few years.

Presentations

The Emacs IPython Notebook Session

A full-featured client for the Jupyter Notebook in Emacs. The Emacs IPython Notebook, or EIN, is a full-featured client for the Jupyter Notebook that runs in the venerable Emacs (https://www.gnu.org/software/emacs/) text editor. This presentation is intended to provide a general introduction to the tool along with a brief history of its development.

Michael has served (under various job titles) as an expert in scientific computing at the Minnesota Supercomputing Institute since 2011, and in that time has become a leading evangelist for Python and Jupyter at the University of Minnesota.

Michael received his doctorate in Astrophysics in 2011 after a stint developing flight systems and data analysis software for the EBEX suborbital payload.

Presentations

Interactive Supercomputing for Academics Poster

The Minnesota Supercomputing Institute has implemented JupyterHub and the Jupyter notebook server as a general-purpose point-of-entry to interactive high performance computing services. This mode of operation runs counter to traditional job-oriented HPC operations, but offers significant advantages for ease-of-use, data exploration, prototyping, and workflow development.

Genetics Major with various biological research interests; extensive contributions to the usability of StabVS, the Stabilizing Versioning System, and other aspects of Evolvix.

Presentations

StabVS: a Stabilizing Versioning System for reproducible open science Poster

Versioning is easy when we only need a local versioning system like v1, v2, v3. It gets hard when versioning info needs to concisely say if upgrades are safe or risky and roughly what will change. Versioning is hard to change later. The stabilizing versioning system we developed for our EvoSysBio research could help Jupyter open science users increase the long-term stability of their code.

Alaa Moussawi is currently pursuing a PhD in computational physics with a focus on network science from Rensselaer Polytechnic Institute. Previously, he was a high school math and astronomy teacher in the NYC public high school system. Alaa’s research interests lie in influence and controllability of networks, statistical physics, and novel neural network architectures. He has authored and coauthored papers in the field of network science, with a particular focus on cascading failures in power grids and global risk dynamics. He was awarded the Trip Advisor top prize at Hack RPI 2017, where he led a team of four computational physicists in creating an Amazon Alexa application that helps an uninformed vacation seeker plan a trip based on their personal preferences, including finding hotels and flights, by suggesting appropriate venues in the location. He holds a BSc in physics with a minor in secondary education from the City College of New York. In his free time, he studies economic models of wealth distribution and dabbles in algorithmic trading.

Presentations

Anomaly detection and classification with distribution grid sensor data Poster

Alaa Moussawi offers an overview of anomaly detection algorithms that use data from phasor measurement unit sensors in the power grid. These algorithms are designed from first principles. They classify anomalies using fundamental classification algorithms such as decision trees and neural networks. Feature selection is used to identify the optimal set of parameters for the learning algorithms.

Paco Nathan leads the Learning Group at O’Reilly Media. Known as a “player/coach” data scientist, Paco led innovative data teams building ML apps at scale for several years and more recently was an evangelist for Apache Spark, Apache Mesos, and Cascading. Paco has expertise in machine learning, distributed systems, functional programming, and cloud computing with 30+ years of tech industry experience, ranging from Bell Labs to early-stage startups. Paco is an advisor for Amplify Partners and was cited in 2015 as one of the top 30 people in big data and analytics by Innovation Enterprise. He is the author of Just Enough Math, Intro to Apache Spark, and Enterprise Data Workflows with Cascading.

Presentations

Friday opening remarks Keynote

JupyterCon cochairs Paco Nathan, Fernando Pérez, and Brian Granger open the second day of keynotes.

Keynote by Paco Nathan Keynote

Keynote by Paco Nathan

Thursday opening remarks Keynote

JupyterCon cochairs Paco Nathan, Fernando Pérez, and Brian Granger open the first day of keynotes.

I grew up in an Army family spending time in California, Texas, 2 different bases in Germany, and finished in Northern New York where I graduated high school. From there I was off to SUNY Potsdam to study mathematics. Upon graduating from Potsdam I knew I needed long sunny days and warm weather and so I went to graduate school at the University of Florida.

While at Florida I found what I was looking for: beautiful weather, beaches, and algebraic topology. I found manifolds to be an interesting blend of exotic yet well-behaved spaces and finished my PhD by analyzing a topological invariant that nobody can pronounce. The market for academics is pretty rough right now and so I was off to teach at an independent school.

I’ve been at Trinity five years now. Among my graduate school peers, I think I have the strongest students, the smallest classes, and the opportunity to do the most interesting work! Each year I teach a course on advanced topics that lie beyond a traditional high school curriculum. Recent courses have included algebraic number theory, combinatorics, linear algebra, group theory, and cryptography, each time with a significant coding component. Additionally I’m able to draw upon my background to extend the depth of our standard curriculum, improving everything from 9th grade math to BC calculus.

In my spare time I enjoy exploring NYC, particularly when an interesting restaurant is involved. I love fruity herbal tea, having spent some time as the adviser to the tea club at Trinity.

Presentations

Jupyter for every high schooler Session

In an effort to broaden our graduates' mathematical toolkit as well as address gender equity in STEM education, I've led the implementation of Python projects across all of our 9th grade math courses. Every student in the 9th grade completes 3 Python projects that introduce programming and integrate it with the ideas developed in class.

I am an organizational sociologist at NYU investigating how organizations integrate (or fail to integrate) data-driven decision making insights and processes.

Presentations

Data Science in US & Canadian Higher Education Session

This talk is based on research into the various infrastructure models supporting data science in research settings, compared in terms of funding, educational uses, and research utilization. Specifically, we explore the national federation model currently established in Canada, with the support of the Canadian federal government, in comparison to the more grassroots efforts in many US universities.

Brendan is a leader in the open source software development community and open data movement. He founded Qri (pronounced “query”) to help bring the benefits of open source software to public data. He helped to launch DataTogether.org, a network of communities, data scientists and developers dedicated to promoting a culture of data collection and sharing. He is also a member of EDGI, the Environmental Data and Governance Initiative, founded to support efforts to preserve at-risk government environmental data.

Presentations

Exit the Data Cathedral. Enter the Data Bazaar. Poster

Today’s balkanized “data cathedrals” force us to extract, transform and load data before use, without a way to depend on data we don’t control. We must replace this “cathedral approach” with the goal of building a data bazaar, allowing us to freely compose and build upon each other’s data much the way we do with software today, using Jupyter as a key tool for interacting with this data bazaar.

Catherine Ordun is a Senior Data Scientist at Booz Allen Hamilton, in Washington, D.C. She has a background in biology, public health, and business, and is a self-taught Python programmer. She has led data science work across the U.S. Government, including U.S. intelligence, public health, and DoD agencies. She is lucky to be on the Women in Data Science Committee at Booz Allen, is a two-time recipient of the Women of Color (WoC) award, has presented to the National Academy of Medicine, led her team to the Top 3 in a Health and Human Services Opioid Codeathon, and is currently a program reviewer for SciPy 2018. She is passionate about machine learning, has recently started participating in Kaggle challenges, and has started an internal firm-wide machine intelligence Meetup.

Presentations

Jupyter Notebook as a Transparent Way to Document Machine Learning Model Development - Case Study for a U.S. Defense Agency Session

Many U.S. government agencies are just getting started in machine learning. As a result, data scientists need to de-"black box" models as much as possible. One simple way to do this is to transparently show how the model is coded and its results at each step. Notebooks do just this. We will walk through a notebook we built for RNNs and discuss how we think agencies can use Notebooks.

Carl Osipov is a program manager focused on helping Google’s customers and business partners get trained and certified to run machine learning and data analytics workloads on Google Cloud. Carl has more than 16 years of experience in the IT industry and has held leadership roles for programs and projects in the areas of big data, cloud computing, service-oriented architecture, machine learning, and computational natural language processing at some of the world’s leading technology companies across the United States and Europe. Carl has written over 20 articles in professional, trade, and academic journals and holds six patents from the USPTO. He has received three corporate awards from IBM for his innovative work. You can find out more about Carl on his blog.

Presentations

Serverless Machine Learning with TensorFlow 1-Day Training

In this workshop, we walk through the process of building machine learning models with TensorFlow. We cover data exploration, feature engineering, model creation, training, evaluation and deployment.

M Pacer is a Jupyter core developer at the Berkeley Institute for Data Science (BIDS) focusing on the intersection between Jupyter and scientific publishing (with an eye toward constructing a total scientific record that is more amenable to machine learning techniques). M holds a PhD from UC Berkeley, where his research used machine learning and human experiments to study causal explanation and causal inference, and a BS from Yale University.

Presentations

Jupyter's configuration system Session

Jupyter's straightforward, out-of-the-box experience has been important for its success in widespread adoption. But good defaults only go so far. Join Afshin Darian, M Pacer, Min Ragan-Kelley, and Matthias Bussonnier to go beyond the defaults and make Jupyter your own.

Making beautiful objects with Jupyter Session

Jupyter displays a rich array of media types out-of-the-box. M Pacer explains how to use these capabilities to their full potential, covering how to add rich displays to existing and new Python classes and how to customize the way notebooks are converted to other formats. These skills will enable anyone to make beautiful objects with Jupyter.

Yuvi Panda is infrastructure lead for the Data Science Education Program at UC Berkeley, where he works on scaling JupyterHub for use by thousands of students. A programmer and DevOps engineer, he wants to make it easy for people who don’t traditionally consider themselves programmers to do things with code and builds tools (Quarry, PAWS, etc.) to sidestep the list of historical accidents that constitute the “command-line tax” that people have to pay before doing productive things with computing. He’s a core member of the JupyterHub team and works on mybinder.org as well. Yuvi is also a Wikimedian, since you can check out of Wikimedia, but you can never leave.

Presentations

How we run MyBinder.org: A case study in open infrastructure Session

Running infrastructure is challenging for an open source community. Yuvi Panda shares lessons drawn from the small community that operates MyBinder.org, covering the social and technical processes for keeping MyBinder.org reliable in the most open, transparent, and inclusive way possible, using pretty graphs about the state of MyBinder.org that anyone can see in real time.

Pangeo: Big data climate science in the cloud Session

Climate science is being flooded with petabytes of data, overwhelming traditional modes of data analysis. The Pangeo project is building a platform to take big data climate science into the cloud using SciPy and large-scale interactive computing tools. Join Ryan Abernathey and Yuvi Panda to find out what the Pangeo team is building and why and learn how to use it.

The current state of JupyterHub and what's in store for the future Session

JupyterHub is a multiuser server for Jupyter notebooks, focused on supporting deployments in research and education. Min Ragan-Kelley, Carol Willing, and Yuvi Panda discuss recent additions and future plans for the project.

Joshua Patterson is the director of applied solutions engineering at NVIDIA. Previously, Josh worked with leading experts across the public and private sectors and academia to build a next-generation cyberdefense platform. He was also a White House Presidential Innovation Fellow. His current passions are graph analytics, machine learning, and GPU data acceleration. Josh also loves storytelling with data and creating interactive data visualizations. He holds a BA in economics from the University of North Carolina at Chapel Hill and an MA in economics from the University of South Carolina’s Moore School of Business.

Presentations

GPU-accelerated data science with Jupyter notebooks Session

The GPU Open Analytics Initiative (GoAi) is a collection of open source libraries, frameworks, and APIs that make leveraging GPUs easy for data scientists. Joshua Patterson and Keith Kraus demonstrate how to build Jupyter notebooks with GPU-accelerated data processing and visualizations, rapidly accelerating data exploration all without writing any low-level code.

Fernando Pérez is a staff scientist at Lawrence Berkeley National Laboratory and a founding investigator of the Berkeley Institute for Data Science at UC Berkeley, created in 2013. His research focuses on creating tools for modern computational research and data science across domain disciplines, with an emphasis on high-level languages, interactive and literate computing, and reproducible research. He created IPython while a graduate student in 2001 and continues to lead its evolution into Project Jupyter, now as a collaborative effort with a talented team that does all the hard work. Fernando regularly lectures about scientific computing and data science and is a member of the Python Software Foundation, a founding member of NumFOCUS, and a National Academy of Sciences Kavli Frontiers of Science Fellow. He is also the recipient of the 2012 Award for the Advancement of Free Software from the Free Software Foundation. Fernando holds a PhD in particle physics from the University of Colorado at Boulder, which he followed with postdoctoral research in applied mathematics and developing numerical algorithms.

Presentations

Friday opening remarks Keynote

JupyterCon cochairs Paco Nathan, Fernando Pérez, and Brian Granger open the second day of keynotes.

Keynote by Fernando Pérez Keynote

Keynote by Fernando Pérez

Thursday opening remarks Keynote

JupyterCon cochairs Paco Nathan, Fernando Pérez, and Brian Granger open the first day of keynotes.

Devin Petersohn is a PhD student at the University of California, Berkeley. His research interests include large-scale computing, data science, and genomics research.

Presentations

Pandas on Ray Session

Make Pandas faster by changing a single line of your code. Pandas on Ray gives users a seamless way to transition into multi-process computing and parallel execution of their data science pipelines.

Nicole Petrozzo is graduating from the Department of Computer Science at Bryn Mawr College, Spring 2018. She first used Jupyter in her first-year seminar, and she last used Jupyter for her senior thesis exploring recommender systems using deep learning.

Presentations

Jupyter Graduates! Session

For the last four years, I have used nothing but Jupyter in the classroom. From a first-year writing course to a course on assembly language; from Biology to Computer Science; from lectures to homework: everything has been in Jupyter. In this talk, I explore the ways I have leveraged Jupyter, and detail the successes and failures experienced along the way.

Min has been a core contributor to IPython and Jupyter for over ten years. He is a Postdoctoral Fellow at Simula Research Laboratory where his focus is on developing JupyterHub, Binder, and related technologies and supporting deployments of Jupyter in science and education around the world.

Presentations

Deploying a cloud-based JupyterHub for students and researchers Tutorial

This tutorial will let you provide a group of your colleagues or students with easy access to Jupyter notebooks and JupyterLab without asking them to install anything on their computers. You will configure and deploy a cloud-based JupyterHub using Kubernetes. You will learn how to customize and extend it for your needs.

Jupyter's configuration system Session

Jupyter's straightforward, out-of-the-box experience has been important for its success in widespread adoption. But good defaults only go so far. Join Afshin Darian, M Pacer, Min Ragan-Kelley, and Matthias Bussonnier to go beyond the defaults and make Jupyter your own.

The current state of JupyterHub and what's in store for the future Session

JupyterHub is a multiuser server for Jupyter notebooks, focused on supporting deployments in research and education. Min Ragan-Kelley, Carol Willing, and Yuvi Panda discuss recent additions and future plans for the project.

Shivraj Ramanan is director of product management at Capital One. Shivraj combines a strong background in business strategy with technical depth to drive successful outcomes for product teams. Previously, he worked in product strategy in a Fortune 500 company, where he analyzed emerging markets and investigated strategic investments, and in strategy consulting, where he advised on a wide variety of complex topics. Shivraj started his career as a software engineer developing enterprise backup software.

Presentations

Using Jupyter notebooks in highly regulated environments Session

In Capital One's recent exploration of "notebook" offerings, JupyterHub emerged as a top contender that could serve as a potential platform for analytics even in highly regulated industries like financial services. David Schaaf and Shivraj Ramanan discuss Capital One's journey and explain how Jupyter has become a part of the company's ever-growing analytics toolkit.

Presentations

Going Native: C++ as a First-Class Citizen of the Jupyter Ecosystem Session

In this talk, we present the latest features of the C++ Jupyter kernel, including live help, auto-completion, rich MIME type rendering, and interactive widgets, making it one of the most featureful implementations of the Jupyter kernel protocol, and bringing Jupyter closer to the metal.

Luciano Resende is a data science platform architect at IBM CODAIT (formerly the Spark Technology Center). A member of the ASF, Luciano has been contributing to open source at the ASF for over 10 years and is currently contributing to various big data-related Apache projects around the Apache Spark ecosystem as well as building a scalable, secure, and flexible enterprise data science platform within the Jupyter ecosystem.

Presentations

Scaling notebooks for deep learning workloads (sponsored by IBM) Session

Luciano Resende outlines a pattern for building deep learning models using the Jupyter Notebook's interactive development on commodity hardware and leveraging platforms and services such as Fabric for Deep Learning (FfDL) for cost-effective full dataset training of deep learning models.

Lindsay Richman is a Digital Operations Specialist at McKinsey & Company. She programs in Python and JavaScript, primarily working in the areas of data visualization, front-end web development, and robotics. Lindsay uses machine learning and AI to help streamline operations, improve product quality, and drive informed decision making.

Presentations

JupyterLab and Plotly: A Data Visualization Power Couple Session

JupyterLab and Plotly both provide a rich set of tools for working with data. When combined, they create a powerful computational environment that enables users to produce versatile, robust visualizations in a fast-paced setting. This session demonstrates how to use JupyterLab, Plotly, and Plotly's Python-based Dash framework to create dynamic charts and interactive reports.

Mariah Rogers is program coordinator for the Division of Data Sciences at UC Berkeley, where she led the effort to build up the Data Scholars program that provides specialized academic support for students from underrepresented and nontraditional backgrounds. Mariah has been working with faculty on campus to build up the academic advising program for the new data science major (announced late Spring 2018) and has also been comanaging the Data Science Modules program to facilitate the introduction of data science concepts in existing courses across the UC Berkeley campus. Mariah holds a degree in computer science from UC Berkeley.

Presentations

JupyterHub for domain-focused integrated learning modules Session

The Data Science Modules program at UC Berkeley creates short explorations into data science using notebooks to allow students to work hands-on with a dataset relevant to their course. Mariah Rogers, Ronald Walker, and Julian Kudszus explain the logistics behind such a program and the indispensable features of JupyterHub that enable such a unique learning experience.

Ian Rose is a postdoctoral fellow at the Berkeley Institute for Data Science, where he works on the Jupyter Project. He holds a PhD in geology from UC Berkeley, where his research focused on the physics of the deep Earth.

Presentations

JupyterLab Session

Ian Rose and Chris Colbert walk you through the JupyterLab interface and codebase and explain how it fits within the overall roadmap of Project Jupyter.

JupyterLab training 1-Day Training

Chris Colbert, Ian Rose, and Saul Shanabrook walk you through using, extending, and developing custom components for JupyterLab using PhosphorJS, React, JavaScript, TypeScript, and CSS. You'll learn how to make full use of the power features of JupyterLab, customize it to your needs, and develop custom extensions, making complete use of JupyterLab's current capabilities.

Gerald Rousselle is director of product management at Teradata.

Presentations

Jupyter in the modern enterprise data and analytics ecosystem: Trends, experiments, and opportunities Session

Gerald Rousselle reviews some of the trends in modern data and analytics ecosystems for large enterprises and shares some of the key challenges and opportunities for Jupyter adoption. He also details some recent examples and experiments in incorporating Jupyter in commercial products and platforms.

Scott Sanderson is a senior software engineer at Quantopian, where he is responsible for the design and implementation of Quantopian’s backtesting and research APIs. Within the Jupyter ecosystem, most of Scott’s work focuses on enhancing the extensibility of the Jupyter Notebook for use in large deployments.

Presentations

Designing for Interaction Session

This presentation explores how interactivity can and should influence the design of software libraries. We discuss ways that the needs of interactive users differ from the needs of application developers, and we describe techniques for improving the usability of libraries in interactive environments without sacrificing robustness in non-interactive environments.

A physicist by education, I studied Astrophysics at the Rijksuniversiteit Groningen (the Netherlands) and earned my PhD from the Observatoire de Paris (France). After that I made the shift to software engineering and worked at a large bank in the Netherlands. While the work was enjoyable, I was ready for a new challenge after 2 years and joined the SDSC (Lausanne location) as a software engineer/data scientist to work on the development of the Renga platform.

Presentations

Reproducible science with the Renku platform Session

Renku is a highly-scalable and secure open software platform designed to make (data) science reproducible, to foster collaboration between scientists, and to share resources in a federated environment.

David Schaaf is a director of data engineering at Capital One, where he leads data product development within the Financial Services division. As part of his role, he guides agile teams to build data products for analyst and data communities with a primary focus on enabling self-service analytics, exploration, and insight discovery. David’s teams typically design data products using microservices, Angular, and Python and leverage core CI/CD practices for continuous delivery. David has more than 15 years of experience in software engineering and data analytics. He also has a wide breadth of knowledge across the financial services domain and in the retail industry. As a developer and analyst, David’s greatest interest is solving unique, complex problems and developing others as software and data engineers.

Presentations

Business Summit roundtable: The current environment—Compliance, ethics, ML model interpretation, GDPR, and more Session

Join the Business Summit's roundtable discussion with participation from IBM, Capital One, the DoD, Amazon AWS, Oracle, and others. Speakers will discuss important issues in our current environment—everything from compliance and GDPR to ML models.

Jupyter notebooks and the intersection of data science and data engineering Keynote

David Schaaf explains how data science and data engineering can work together in cross-functional teams—with Jupyter notebooks at the center of collaboration and the analytic workflow—to more effectively and more quickly deliver results to decision makers.

Using Jupyter notebooks in highly regulated environments Session

In Capital One's recent exploration of "notebook" offerings, JupyterHub emerged as a top contender that could serve as a potential platform for analytics even in highly regulated industries like financial services. David Schaaf and Shivraj Ramanan discuss Capital One's journey and explain how Jupyter has become a part of the company's ever-growing analytics toolkit.

Based in the Bay Area of California, Matthew attended Stanford University for undergraduate and graduate school. He stayed in the area focused on startups, spending a long stretch of time working at OpenGov. Now he’s working at Netflix and scaling data platform solutions.

Presentations

Scheduled Notebooks: A means for manageable and traceable code execution Session

Using an nteract project, papermill, we’ll walk through how we use notebooks to track user jobs and make a simple interface for work submission. You’ll get an inside peek at how Netflix is tackling the scheduling problem for a range of users who want easily managed workflows.

Viral B. Shah is a cofounder and CEO of Julia Computing and a cocreator of the Julia language. He spends all his time working toward making Julia the default language for all forms of data science and numerical computing. Previously, he architected the payment platforms for the National ID (Aadhaar) project of the Government of India and authored Rebooting India, a book on his experiences implementing a complex technology project in governance. Viral holds a PhD in computational sciences from UC Santa Barbara, where his thesis was on interactive supercomputing. The technology developed in his thesis was licensed commercially by Microsoft.

Presentations

The journey to Julia 1.0: The "Ju" in Jupyter Session

Julia and Jupyter share a common evolution path: Julia is the language for modern technical computing, while Jupyter is the development and presentation environment of choice for modern technical computing. Viral Shah and Jane Herriman discuss Julia's journey and the impact of Jupyter on Julia's growth.

Saul Shanabrook is a software developer at Quansight.

Presentations

JupyterLab training 1-Day Training

Chris Colbert, Ian Rose, and Saul Shanabrook walk you through using, extending, and developing custom components for JupyterLab using PhosphorJS, React, JavaScript, TypeScript, and CSS. You'll learn how to make full use of the power features of JupyterLab, customize it to your needs, and develop custom extensions, making complete use of JupyterLab's current capabilities.

I’m Caleb, a student at UC Berkeley studying Computer Science and Economics. I’m interested in applying data science in the context of education and social good. I’m currently working on nbinteract, a project that allows users to easily create interactive visualizations with just a few lines of Python.

Presentations

nbinteract: Shareable, Interactive Webpages From Notebooks Session

The nbinteract package converts Jupyter notebooks with widgets into interactive, standalone HTML pages. nbinteract’s built-in support for function-driven plotting makes authoring interactive pages simpler by allowing users to focus on data, not callbacks. We will introduce nbinteract and walk through the steps to publish an interactive web page from a Jupyter notebook.

Stephanie Stattel is a senior software developer who has been with Bloomberg LP for over 5 years and is currently developing applications to improve financial professionals’ research and investment workflows. She is a San Francisco lead of the company’s global Bloomberg Women In Tech (BWIT) community.

Presentations

Terraforming Jupyter: Changing JupyterLab to suit your needs Session

Stephanie Stattel and Paul Ivanov walk you through a series of extensions that demonstrate the power and flexibility of JupyterLab’s architecture, from targeted functionality modifications to more extreme atmospheric changes that require extensive decoupling and flexibility within JupyterLab.

William Stein is the founder of the SageMath open source math software project, and also came up with the name Cython and launched that project. He is a Full Professor of Mathematics at University of Washington (currently on leave), and is the CEO of SageMath, Inc., whose main product is CoCalc. He has published 3 books and a few dozen papers in number theory.

Presentations

Realtime collaboration with Jupyter notebooks using CoCalc Session

I will explain how CoCalc relates to the Jupyter project, then describe how I implemented realtime collaborative editing of Jupyter notebooks in CoCalc.

Dave Stuart is a senior data scientist within the US Department of Defense and is the lead of the nbgallery project.

Presentations

Business Summit roundtable: The current environment—Compliance, ethics, ML model interpretation, GDPR, and more Session

Join in for the Business Summit's roundtable discussion with participation from IBM, Capital One, the DoD, Amazon AWS, Oracle, and others. Speakers will discuss important issues in our current environment—everything from compliance and GDPR to ML models.

Citizen data science: An enterprise use case from inside the US intelligence community Session

Dave Stuart explains how Jupyter was used inside the US Department of Defense and the greater intelligence community to empower thousands of "citizen data scientists" to build and share analytics in order to meet the community’s dynamic challenges.

Erik is a math and physics teacher in Uppsala, Sweden. While working towards a machine learning degree online, he realized the potential of Jupyter for educators and established a JupyterHub deployment using the Zero to JupyterHub on Kubernetes guide for his students, thereafter contributing to the open source project.

Presentations

Deploying a cloud-based JupyterHub for students and researchers Tutorial

This tutorial will let you provide a group of your colleagues or students with easy access to Jupyter notebooks and JupyterLab without asking them to install anything on their computers. You will configure and deploy a cloud-based JupyterHub using Kubernetes. You will learn how to customize and extend it for your needs.

Learn by doing: Using data-driven stories and visualizations in the (high school and college) classroom Session

Students learn by doing. Carol Willing, Jessica Forde, and Erik Sundell demonstrate the value of interactive content, using Jupyter notebooks, widgets, and visualization libraries, share notable examples of projects within the Jupyter community, and outline ways educators can help students develop data science literacy and use computational skills to build upon their interests.

Thorin Tabor is a software engineer at UCSD and a contributing scientist at the Broad Institute. He is the lead developer of the GenePattern Notebook and an open source developer on the integration of bioinformatic tools with Jupyter.

Presentations

GenePattern Notebook: Jupyter Beyond the Programmer Session

Making Jupyter accessible to all members of a research organization, regardless of their programming ability, empowers it to best utilize the latest analysis methods without the number of coders presenting a bottleneck. To bridge the gap between programmers and nonprogrammers, we have developed GenePattern Notebook, which offers a wide suite of enhancements to the Jupyter environment.

Robert Talbert is a professor of mathematics at Grand Valley State University. Robert is an early adopter, proponent, and thought leader on flipped learning in higher education, and his flipped learning implementations include 10 different university mathematics and computer science courses. He is the author of Flipped Learning: A Guide for Higher Education Faculty; he has also written articles, book chapters, and blog posts and given workshops and presentations on flipped learning to audiences in colleges across the US and abroad.

Presentations

Flipped learning with Jupyter: Experiences, best practices, and supporting research Session

In flipped learning, students encounter new material before class meetings, which helps them learn how to learn and frees up class time to focus on creative applications of the basic material. Lorena Barba and Robert Talbert discuss the use of Jupyter notebooks as a “tangible interface” for new material in a flipped course and share case studies from their own courses.

I’m a software engineer who writes Scala/Akka and Python and has experience with Ruby, Node.js, and C/C++. As for human languages, I speak Japanese, business-level English, and conversational Chinese.

I developed anti-virus software, a web crawler that analyzes suspicious web pages, at my first company, Trend Micro in Taiwan.
Then I joined Kakaku.com and developed the most popular restaurant review website. I moved to NY to launch the US version of their website as the subsidiary’s CTO. I developed all the frontend, backend, and infrastructure of the website.
After I came back to Japan, I joined Kaizen Platform, a Web A/B testing platform company. I designed and developed an ad banner A/B testing platform, Web A/B testing with DMP integration, and a real-time conversion rate predictor with machine learning. I moved to its San Francisco office to support the US expansion.
Then I moved back to Japan again and started working at Preferred Networks. I’m developing a job scheduler and various useful tools with Mesos so our researchers can run distributed ML tasks in our cluster with 1024 GPUs and InfiniBand.

I love writing open source libraries on GitHub.
https://github.com/dtaniwaki

I’m currently interested in the computer vision, big data, and machine learning industries.

Presentations

GPU-enabled JupyterHub on Mesos Poster

At Preferred Networks, we operate a cluster with 1024 GPUs, and instant, flexible access to the cluster is essential for our researchers. However, we realized it's hard to provide exclusive access to GPU cores and therefore introduced JupyterHub on Mesos. Mesos is responsible for resource isolation, and using Docker images with shared home directories provides a highly flexible environment.

Rachael Tatman is a data scientist at Kaggle. She has a PhD in linguistics from the University of Washington, with a focus in computational sociolinguistics. Her interests include data science education and fairness in machine learning.

Presentations

I do, We do, You Do: Supporting active learning with notebooks Tutorial

A practical introduction to incorporating notebooks into the classroom using active learning techniques.

Reproducible Research Best Practices (highlighting Kaggle Kernels) 1-Day Training

In this workshop, we’ll take an existing research project and make it fully reproducible using Kaggle Kernels. This workshop will include hands-on instruction and best practices for each of the three components necessary for completely reproducible research.

Tracy Teal is a cofounder of Data Carpentry and the executive director of The Carpentries. Previously, Tracy was an NSF postdoctoral researcher in biological informatics and an assistant professor in microbiology at Michigan State University. After seeing researchers’ need for data skills to conduct research effectively and reproducibly, she cofounded Data Carpentry to scale data training along with data production. Tracy is involved in the open source software and reproducible research communities, including as an editor at the Journal of Open Source Software and the Journal of Open Source Education. She holds a PhD in computation and neural systems from the California Institute of Technology.

Presentations

Democratizing data Keynote

We are generating vast amounts of data, but it's not the data itself that is valuable—it's the information and knowledge that can come from this data. Tracy Teal explains how to bring people to data and empower them to address their questions, reach their potential, and solve issues that are important in science, scholarship, and society.

Adam Thornton is a software developer in Data Management/Science Quality and Reliability Engineering on the Large Synoptic Survey Telescope. He is working on the JupyterLab-based interactive component of the LSST Science Platform. He has nearly 30 years of development, IT consulting, and system administration experience in a wide variety of settings from academic computing to Fortune 20 enterprises.

Presentations

"If The Data Will Not Come to the Astronomer...": JupyterLab and a sea change in astronomical analysis Session

LSST is an ambitious project to map the sky in the fastest, widest, and deepest survey ever made. This petabyte-scale, 7 trillion-row database disrupts traditional astronomical workflows. Our science platform requires a paradigm shift in how astronomy is done. Learn the challenges of providing production services on a notebook-based architecture and the compelling advantages of JupyterLab.

Wolf is a scientific software developer at QuantStack. Prior to joining QuantStack, Wolf completed a master’s in Robotics at ETH Zurich and Stanford, focusing on Artificial Intelligence. He also wore a couple of hats: freelance web designer and developer, building software for the BeachBot with Disney Research, and making drones find their way at Rapyuta Robotics.
Outside of work, he’s a passionate cyclist and enjoys spending time outside the city.

Presentations

Going Native: C++ as a First-Class Citizen of the Jupyter Ecosystem Session

In this talk, we present the latest features of the C++ Jupyter kernel, including live help, auto-completion, rich MIME type rendering, and interactive widgets, making it one of the most featureful implementations of the Jupyter kernel protocol and bringing Jupyter closer to the metal.

Ronald (Ronnie) Walker is a senior at UC Berkeley, where he is studying Economics. Ronnie has served as an undergraduate student instructor, connector course teaching assistant, and modules team lead within the university’s Data Science Education Program. As team lead, he worked with faculty in Linguistics, Information Science, Education, Cognitive Science, Legal Studies, Near Eastern Studies, and Economics to build short modules for their courses. Most recently, he has been busy helping departments integrate existing full courses with data science approaches.

Presentations

JupyterHub for domain-focused integrated learning modules Session

The Data Science Modules program at UC Berkeley creates short explorations into data science using notebooks to allow students to work hands-on with a dataset relevant to their course. Mariah Rogers, Ronald Walker, and Julian Kudszus explain the logistics behind such a program and the indispensable features of JupyterHub that enable such a unique learning experience.

Elizabeth Wickes is a Lecturer at the School of Information Sciences at the University of Illinois, where she teaches foundational programming and information technology courses. She was previously a Data Curation Specialist for the Research Data Service at the University Library of the University of Illinois, and the Curation Manager for Wolfram|Alpha. She currently co-organizes the Champaign-Urbana Python user group, has been a Carpentries instructor since 2015 and a trainer since 2017, and is an elected member of The Carpentries’ Executive Council for 2018.

Presentations

Reproducible education: what teaching can learn from open science practices Session

As practitioners of open science begin to migrate their educational material into public repositories, many of their common practices and platforms can be used to streamline the development of instructional material. This talk will examine how many open science practices can be used in an educational context and are best facilitated by tools like the Jupyter Notebook.

George Williams is the chief data scientist for Capsule8, a cybersecurity startup based in Brooklyn. George has worked at the intersection of research and industry for two decades. He has published papers at major mathematics and AI conferences and holds several patents in computer vision and security.

Presentations

Rapid data science deployment for cybersecurity with JupyterHub Session

The key to successful threat detection in cybersecurity is fast response. George Williams, Harini Kannan, and Alex Comerford offer an overview of specialized extensions they have built for data scientists working in cybersecurity that can be used and deployed via JupyterHub.

Carol is currently a Research Software Engineer at Cal Poly San Luis Obispo working full-time on [Project Jupyter](https://jupyter.org). She is also a Python Software Foundation Fellow and former Director; a Project Jupyter Steering Council member; a core developer for CPython, Jupyter, AnitaB.org’s open source projects, and PyLadies; a co-organizer of PyLadies San Diego and San Diego Python User Group; a Geek-In-Residence at FabLab San Diego; and an independent developer of open hardware and software.

Presentations

Deploying a cloud-based JupyterHub for students and researchers Tutorial

This tutorial will let you provide a group of your colleagues or students with easy access to Jupyter notebooks and JupyterLab without asking them to install anything on their computers. You will configure and deploy a cloud-based JupyterHub using Kubernetes. You will learn how to customize and extend it for your needs.

Learn by doing: Using data-driven stories and visualizations in the (high school and college) classroom Session

Students learn by doing. Carol Willing, Jessica Forde, and Erik Sundell demonstrate the value of interactive content, using Jupyter notebooks, widgets, and visualization libraries, share notable examples of projects within the Jupyter community, and outline ways educators can help students develop data science literacy and use computational skills to build upon their interests.

Sustaining wonder: Jupyter and the knowledge commons Keynote

New challenges are emerging for Jupyter, open information, and investing in the future. You, the innovators of this growing knowledge commons, will determine how we meet these challenges and sustain the ecosystem. Carol Willing shows how you can start.

The current state of JupyterHub and what's in store for the future Session

JupyterHub is a multiuser server for Jupyter notebooks, focused on supporting deployments in research and education. Min Ragan-Kelley, Carol Willing, and Yuvi Panda discuss recent additions and future plans for the project.

Wenming Ye is a senior solution architect at Amazon Web Services.

Presentations

Explore AWS Machine Learning Platform using Amazon SageMaker (Day 2) Training Day 2

Machine learning and IoT projects are now common for enterprises and startups alike. These advanced technologies have been the key innovation engine for businesses such as Amazon Go, Alexa, and Robotics. In this hands-on workshop, we will explore the AWS machine learning platform, using Project Jupyter-based Amazon SageMaker to build, train, and deploy ML/DL models to the cloud and AWS DeepLens.

Explore the AWS machine learning platform using Amazon SageMaker 2-Day Training

Machine learning and IoT projects are increasingly common at enterprises and startups alike and have been the key innovation engine for Amazon businesses such as Go, Alexa, and Robotics. Wenming Ye and Miro Enev lead a hands-on deep dive into the AWS machine learning platform, using Project Jupyter-based Amazon SageMaker to build, train, and deploy ML/DL models to the cloud and AWS DeepLens.

Kevin Zielnicki is a Data Scientist on the Styling Algorithms team at Stitch Fix. Kevin holds a doctorate in physics in the field of quantum information processing, but he now enjoys working with data that can be observed without changing its value.

Presentations

Explorations in reproducible analysis with Nodebook Session

Even with good intentions, analysis notebooks can quickly accumulate a mess of false starts and out-of-order statements. Best practices encourage cleaning up a notebook to ensure reproducibility, but many analyses will never reach this cleaned-up state. As an alternative, this talk will describe Nodebook, a Jupyter plugin that encourages reproducibility by preventing inconsistency.

Randy Zwitch is a senior developer advocate at MapD.

Presentations

Using the MapD kernel for the Jupyter Notebook Session

MapD Core is an open source analytical SQL engine that has been designed from the ground up to harness the parallelism inherent in GPUs. This enables queries on billions of rows of data in milliseconds. Randy Zwitch offers an overview of the MapD kernel extension for the Jupyter Notebook and explains how to use it in a typical machine learning workflow.