Brought to you by NumFOCUS Foundation and O’Reilly Media
The official Jupyter Conference
Aug 21-22, 2018: Training
Aug 22-24, 2018: Tutorials & Conference
New York, NY

Speakers

Hear from innovative practitioners, talented managers, and senior developers who are doing amazing things in the Jupyter ecosystem. More speakers will be announced; please check back for updates.

Ryan Abernathey is an assistant professor of Earth and environmental science at Columbia University and Lamont-Doherty Earth Observatory. Ryan is a physical oceanographer who studies the large-scale ocean circulation and its relationship with Earth’s climate. High-resolution numerical modeling and satellite remote sensing are key tools in this research, which has led to an interest in high-performance computing and big data. Previously, he held a postdoc at Scripps Institution of Oceanography. In 2016, Ryan was awarded an Alfred P. Sloan Research Fellowship in ocean sciences and an NSF CAREER award for a project entitled “Evolution of Mesoscale Turbulence in a Changing Climate” and received a NASA New Investigator Award in 2013. He is an active participant in and advocate for open source software, open data, and reproducible science. He holds a PhD from MIT and a BA from Middlebury College.

Presentations

Pangeo: Big data climate science in the cloud Session

Climate science is being flooded with petabytes of data, overwhelming traditional modes of data analysis. The Pangeo project is building a platform to take big data climate science into the cloud using SciPy and large-scale interactive computing tools. Join Ryan Abernathey and Yuvi Panda to find out what the Pangeo team is building and why and learn how to use it.

The future of data-driven discovery in the cloud Keynote

Drawing on his experience with the Pangeo project, Ryan Abernathey makes the case for the large-scale migration of scientific data and research to the cloud. The cloud offers a way to make the largest datasets instantly accessible to the most sophisticated computational techniques. A global scientific data commons could usher in a golden age of data-driven discovery.

Ian Allison is an IT manager for the Pacific Institute for the Mathematical Sciences. A longtime user of IPython and Project Jupyter, Ian helped create and deploy a system of JupyterHubs under the name Syzygy, enabling more than 8,000 staff, students, and faculty members to include Jupyter in their work. Ian is also involved in a program to leverage Jupyter in K–12 classrooms via the Canadian government’s CanCode initiative. His background is in computational physics.

Presentations

Canadians land on Jupyter Session

Over the past 18 months, Ian Allison and James Colliander have deployed Jupyter to more than 8,000 users at universities across Canada. Ian and James offer an overview of the Syzygy platform and explain how they plan to scale and deliver the service nationally and how they intend to make Jupyter integral to the working experience of students, researchers, and faculty members.

Damián Avila is a software development team lead at Anaconda, Inc. A software developer, data scientist, and quantitative analyst from Córdoba, Argentina, his interests include data science, finance, data visualization, and the Jupyter/IPython ecosystem. Damián has made meaningful contributions to several open source projects and is a core developer for popular projects such as Jupyter/IPython, Nikola, and Bokeh. His personal project RISE, a “live” slideshow for the Jupyter Notebook, is quite popular. Damián is a frequent speaker at national and international conferences.

Presentations

Current RISE capabilities and its evolution into the future Session

RISE has evolved into the main slideshow machinery for live presentations within the Jupyter notebook. Damián Avila explains how to install and use RISE. You'll also discover how to customize it and see some of its new capabilities. Damián concludes by discussing the migration from RISE into a new JupyterLab-RISE extension providing RISE-based capabilities in the new JupyterLab interface.

Lorena A. Barba is associate professor of mechanical and aerospace engineering at the George Washington University in Washington, DC. In addition to her research in computational science and engineering, she is interested in education technology, social learning, and massively open online courses as well as innovations in STEM education, including flipped classrooms and other forms of blended learning. Lorena is a recipient of the 2016 Leamer-Rosenthal Award for Open Social Sciences and was recognized with an honorable mention at the Open Education Consortium’s 2017 Open Education Awards for Excellence.

Presentations

Flipped learning with Jupyter: Experiences, best practices, and supporting research Session

In flipped learning, students encounter new material before class meetings, which helps them learn how to learn and frees up class time to focus on creative applications of the basic material. Lorena Barba and Robert Talbert discuss the use of Jupyter notebooks as a “tangible interface” for new material in a flipped course and share case studies from their own courses.

Jupyter in education discussion group

The Jupyter in education track concludes with breakout sessions that allow presenters and attendees alike to work together on specific topics, potentially leading to new projects and collaborations.

Doug Blank is an associate professor of computer science at Bryn Mawr College, an all-women’s college outside of Philadelphia. He has been using Python in education for 20 years and Jupyter since its creation. He has developed many languages and tools for Jupyter specifically for pedagogy. His research focuses on combining artificial neural networks and robotics in order to give robots self-motivation.

Presentations

Jupyter graduates Session

For the last four years, Douglas Blank has used nothing but Jupyter in the classroom—from a first-year writing course to a course on assembly language, from biology to computer science, from lectures to homework. Join in to learn how Douglas has leveraged Jupyter and discover the successes and failures he experienced along the way. Nicole Petrozzo then offers a student's perspective.

Nick Bollweg is a core member of the Jupyter Project and contributor to conda-forge and other Python and JavaScript open source projects. Over his career, he has done work in the enterprise open source, medical, corporate, and applied research sectors, including oncology biostatistics curation, document management fleet optimization, complex system collaboration, and decision-making tools and enterprise data science platforms. Nick holds a BA in computer science and German from the University of Minnesota and a PM in applied systems engineering from the Georgia Institute of Technology.

Presentations

The reincarnation of a notebook Session

Notebook authors often consider only the interactive experience of creating computable documents. However, the dynamic state of a notebook is a minor period in its lifecycle; the majority is spent as a file at rest. Tony Fast and Nick Bollweg explore conventions that create notebooks with value long past their inception as documents, software packages, test suites, and interactive applications.

Maarten Breddels is an astronomer, freelance developer, consultant, and data scientist working mostly with Python, C++, and JavaScript in the Jupyter ecosystem. His expertise ranges from fast numerical computation and API design to 3D visualization. He holds a bachelor’s degree in ICT and both a master’s degree and PhD in astronomy.

Presentations

Jupyter widgets Session

Project Jupyter aims to provide a consistent set of tools for data science workflows, from the exploratory phase of the analysis to the sharing of the results. Maarten Breddels and Sylvain Corlay offer an overview of Jupyter's interactive widgets framework, which enables rich user interaction, including 2D and 3D interactive plotting, geographic data visualization, and much more.

Matt Brems currently leads instruction for General Assembly’s Data Science Immersive in Washington, DC, where he helps bridge the gap between theoretical statistics and real-world insights. Matt is passionate about making data science more accessible and putting the revolutionary power of machine learning into the hands of as many people as possible. A recovering politico, Matt was a data scientist for a political consulting firm through the 2016 election. He holds a master’s degree in statistics from the Ohio State University. When he isn’t teaching, he’s thinking about how to be a better teacher, falling asleep to Netflix, or cuddling with his pug.

Presentations

Advanced data science, part 2: Five ways to handle missing data in Jupyter notebooks Tutorial

Missing data plagues nearly every data science problem. Often, people just drop or ignore missing data. However, this usually ends up with bad results. Matt Brems explains how bad dropping or ignoring missing data can be and teaches you how to handle missing data the right way by leveraging Jupyter notebooks to properly reweight or impute your data.

Jackson Brown is a research engineer working on data release infrastructure for the modeling team at the Allen Institute for Cell Science. He is also the cofounder of the Council Data Project, an organization working to enable better public transparency and discourse. Previously, he was a designer for SageMathCloud (CoCalc), a collaborative computation service.

Presentations

Reproducible data dependencies for Jupyter: Distributing massive, versioned image datasets from the Allen Institute for Cell Science Session

Reproducible data is essential for notebooks that work across time, across contributors, and across machines. Jackson Brown and Aneesh Karve demonstrate how to use an open source data registry to create reproducible data dependencies for Jupyter and share a case study in open science over terabyte-size image datasets.

Matthias Bussonnier is a postdoc at UC Berkeley BIDS and a core developer of the Jupyter and IPython projects, where he is working in close collaboration with Google to bring real-time collaboration to the Jupyter environment.

Presentations

Jupyter's configuration system Session

Jupyter's straightforward, out-of-the-box experience has been important for its success in widespread adoption. But good defaults only go so far. Join Afshin Darian, M Pacer, Min Ragan-Kelley, and Matthias Bussonnier to go beyond the defaults and make Jupyter your own.

JupyterLab tutorial Tutorial

JupyterLab—Jupyter's new frontend—goes beyond the classic Jupyter Notebook, providing a flexible and extensible web application with a set of reusable components. Jason Grout and Matthias Bussonnier walk you through using JupyterLab, explain how to transition from the classic Jupyter Notebook frontend to JupyterLab, and demonstrate JupyterLab's new powerful features.

I’m a physics education researcher who studies how tools and science practices affect student learning in physics, and the conditions and environments that support or inhibit this learning. I conduct research from the high school to the upper-division level and am particularly interested in how students learn physics through their use of tools such as mathematics and computing. My work employs cognitive and sociocultural theories of learning and aims to blend these perspectives to enhance physics instruction at all levels. My projects range from the fine-grained (e.g., how students understand particular elements of code) to the course-scale (e.g., how students learn to model systems in electromagnetism) to the very broad (e.g., how computing affects learning across a degree program). Presently, I co-direct the Physics Education Research Lab at MSU.

Presentations

The future of Jupyter in education Session

Join this panel of seasoned educators and the cochairs of the education track at JupyterCon to look to the future of Jupyter in teaching and learning.

Cristian Capdevila is principal data scientist at Prognos, where he and his fellow data scientists work alongside clinical experts to develop disease prediction products for customers in the life sciences and payer markets. Previously, Cristian was a data scientist in the ad tech space, working on customer similarity models.

Presentations

Disease prediction using the world's largest clinical lab dataset (sponsored by Amazon Web Services) Keynote

Cristian Capdevila explains how Prognos is predicting disease by applying a combination of modern machine learning techniques and clinical expertise to the world’s largest clinical lab database and how the company is leveraging Amazon SageMaker to accelerate model development, training, and deployment.

Diogo Castro is a full stack developer on the SWAN team within the Software Development for Experiments Group at CERN.

Presentations

SWAN: CERN's Jupyter-based interactive data analysis service Session

SWAN, CERN’s service for web-based analysis, leverages the power of Jupyter to provide the high energy physics community access to state-of-the-art infrastructure and services through a web service. Diogo Castro offers an overview of SWAN and explains how researchers and students are using it in their work.

Chakri Cherukuri is a senior researcher in the Quantitative Financial Research Group at Bloomberg LP. His research interests include quantitative portfolio management, algorithmic trading strategies, and applied machine learning. Chakri has extensive experience in numerical computing and software development. Previously, he built analytical tools for the trading desks at Goldman Sachs and Lehman Brothers. He holds an undergraduate degree in engineering from the Indian Institute of Technology, Madras, an MS in computer science from Arizona State University, and an MS in computational finance from Carnegie Mellon University.

Presentations

Visualizing machine learning models in the Jupyter Notebook (sponsored by Bloomberg LP) Session

Chakri Cherukuri offers an overview of the interactive widget ecosystem available in the Jupyter notebook and illustrates how Jupyter widgets can be used to build rich visualizations of machine learning models. Along the way, Chakri walks you through algorithms like regression, clustering, and optimization and shares a wizard for building and training deep learning models with diagnostic plots.

Christopher Cho is a product manager and cloud program manager at Google, where he helps customers solve machine learning and infrastructure problems, and is one of the product managers on the Kubeflow team. Previously, Chris was research program manager at DeepMind, working on cutting-edge ML research. His background is in enterprise business consulting. Chris is currently working toward his MSCS at Georgia Tech. He holds a BS in mechanical engineering from the University of Illinois Urbana-Champaign.

Presentations

Machine learning at scale with Kubernetes 1-Day Training

Christopher Cho demonstrates how Kubernetes can be easily leveraged to build a complete deep learning pipeline, including data ingestion and aggregation, preprocessing, ML training, and serving with the mighty Kubernetes APIs.

Pramit Choudhary is a lead data scientist at DataScience.com, where he focuses on optimizing and applying classical machine learning and Bayesian design strategy to solve real-world problems. Currently, he is leading initiatives on figuring out better ways to explain a model’s learned decision policies to reduce the chaos in building effective models and close the gap between a prototype and operationalized model.

Presentations

Business Summit roundtable: The current environment—Compliance, ethics, ML model interpretation, GDPR, and more Session

Join in for the Business Summit's roundtable discussion with participation from IBM, Capital One, the DoD, AWS, Oracle, and others. Speakers will discuss important issues in our current environment—everything from compliance and GDPR to ML models.

Natalia Clementi is a PhD student at the George Washington University.

Presentations

The future of Jupyter in education Session

Join this panel of seasoned educators and the cochairs of the education track at JupyterCon to look to the future of Jupyter in teaching and learning.

April Clyburne-Sherin is an outreach scientist at Code Ocean, where she trains scientists in computational reproducibility best practices. She is an epidemiologist, methodologist, and expert in open science tools, methods, training, and community stewardship. Since 2014, April has focused on training scientists in open and reproducible research methods at the Center for Open Science, Sense about Science, and SPARC. She is coauthor of FOSTER’s Open Science Training Handbook; cofounder of OOO Canada, a network to promote leadership in open access, open education, and open data; and producer of The Method, an open source podcast. She holds an MS in population medicine (epidemiology).

Presentations

Preparing your Jupyter notebook for computationally reproducible publication: A hands-on BYONotebook tutorial for researchers Tutorial

April Clyburne-Sherin walks you through preparing Jupyter notebooks for computationally reproducible publication. You'll learn best practices for publishing notebooks and get hands-on experience preparing your own research for reuse, creating documentation, and submitting your notebook to share.

Chris Colbert is a software architect for Project Jupyter.

Presentations

JupyterLab Session

Ian Rose and Chris Colbert walk you through the JupyterLab interface and codebase and explain how it fits within the overall roadmap of Project Jupyter.

JupyterLab training 1-Day Training

Chris Colbert, Ian Rose, and Saul Shanabrook walk you through using, extending, and developing custom components for JupyterLab using PhosphorJS, React, JavaScript, TypeScript, and CSS. You'll learn how to make full use of the power features of JupyterLab, customize it to your needs, and develop custom extensions, making complete use of JupyterLab's current capabilities.

James Colliander is a professor of mathematics at UBC, director of the Pacific Institute for the Mathematical Sciences, and the founder and CEO of Toronto-based education technology company Crowdmark. James’s research intertwines partial differential equations, harmonic analysis, and dynamical systems to address problems arising from mathematical physics and other sources. Previously, he was an NSF postdoc at the University of California, Berkeley, a professor at the University of Toronto, and a professeur invité at the Université de Paris-Nord, Université de Paris-Sud, and the Institut Henri Poincaré. He has been a member of the Institute for Advanced Study. James has been recognized with a Sloan fellowship and the McLean Award and as an award-winning teacher. He holds a PhD from the University of Illinois.

Presentations

Canadians land on Jupyter Session

Over the past 18 months, Ian Allison and James Colliander have deployed Jupyter to more than 8,000 users at universities across Canada. Ian and James offer an overview of the Syzygy platform and explain how they plan to scale and deliver the service nationally and how they intend to make Jupyter integral to the working experience of students, researchers, and faculty members.

The future of Jupyter in education Session

Join this panel of seasoned educators and the cochairs of the education track at JupyterCon to look to the future of Jupyter in teaching and learning.

Alex Comerford is a data scientist at cybersecurity company Capsule8, where he focuses on developing interactive and informative data visualizations to identify security issues in large-scale cloud environments. His interests include data science, data visualization, statistics, and machine learning.

Presentations

Rapid data science exploration for cybersecurity Session

The key to successful threat detection in cybersecurity is fast response. George Williams, Harini Kannan, and Alex Comerford offer an overview of specialized extensions they have built for data scientists working in cybersecurity that can be used and deployed via JupyterHub.

Sylvain Corlay is the founder of QuantStack and a quant researcher specializing in stochastic analysis and optimal control. Previously, Sylvain was a quant researcher at Bloomberg LP and an adjunct faculty member at Columbia University and NYU. As an open source developer, Sylvain mostly contributes to Project Jupyter in the area of interactive widgets and lower-level components such as traitlets. He is also a member of the steering committee of the project. Sylvain is also a contributor to a number of other open source projects for scientific computing and data visualization, such as bqplot, pythreejs, and ipyleaflet, and coauthored the xtensor C++ tensor algebra library. He holds a PhD in applied mathematics from University Paris VI.

Presentations

Going native: C++ as a first-class citizen of the Jupyter ecosystem Session

Sylvain Corlay, Johan Mabille, Wolf Vollprecht, and Martin Renou share the latest features of the C++ Jupyter kernel, including live help, auto-completion, rich MIME type rendering, and interactive widgets. Join in to explore one of the most feature-full implementations of the Jupyter kernel protocol that also brings Jupyter closer to the metal.

Jupyter widgets Session

Project Jupyter aims to provide a consistent set of tools for data science workflows, from the exploratory phase of the analysis to the sharing of the results. Maarten Breddels and Sylvain Corlay offer an overview of Jupyter's interactive widgets framework, which enables rich user interaction, including 2D and 3D interactive plotting, geographic data visualization, and much more.

Afshin Darian is a Jupyter core developer at Two Sigma and a coauthor of JupyterLab. He has been active in the open source community for several years and has worked at several open source enterprises, including Anaconda, Alfresco Software, and OpenGamma. Darian holds degrees in philosophy and medieval history.

Presentations

Jupyter's configuration system Session

Jupyter's straightforward, out-of-the-box experience has been important for its success in widespread adoption. But good defaults only go so far. Join Afshin Darian, M Pacer, Min Ragan-Kelley, and Matthias Bussonnier to go beyond the defaults and make Jupyter your own.

Noemi Derzsy is a senior inventive scientist within the Data Science and AI Research organization at AT&T Labs. Previously, Noemi was a data science fellow at Insight Data Science NYC and a postdoctoral research associate at Social Cognitive Networks Academic Research Center at Rensselaer Polytechnic Institute. She holds a PhD in physics with over a decade of research experience in network science and computer science. Her interests revolve around the study of complex systems and complex networks through real-world data.

Presentations

Network and graph analysis with Jupyter notebooks Tutorial

Networks, also known as graphs, are one of the most crucial data structures in our increasingly intertwined world. Social friendship networks, the web, financial systems, and infrastructure are all network structures. Noemi Derzsy explains how to generate, manipulate, analyze, and visualize graph structures that will help you gain insight about relationships between elements in your data.

Allen Downey is a professor at Olin College and the author of Think Python, Think Stats, Think Bayes, and more. He writes about statistics in his blog Probably Overthinking It.

Presentations

The future of Jupyter in education Session

Join this panel of seasoned educators and the cochairs of the education track at JupyterCon to look to the future of Jupyter in teaching and learning.

Miro Enev is a senior solutions architect at NVIDIA, where he helps train and guide pilot deep learning projects at Amazon. Miro’s interests include advancing data science and machine intelligence while respecting human values in future technology ecosystems.

Presentations

Explore the AWS machine learning platform using Amazon SageMaker 2-Day Training

Wenming Ye and Miro Enev offer an overview of deep learning along with hands-on Jupyter labs, demos, and instruction. You'll learn how DL is applied in modern business practice and how to leverage building blocks from the Amazon ML family of AI services.

Explore the AWS machine learning platform using Amazon SageMaker (Day 2) Training Day 2

Machine learning and IoT projects are increasingly common at enterprises and startups alike and have been the key innovation engine for Amazon businesses such as Go, Alexa, and Robotics. Wenming Ye and Miro Enev lead a hands-on deep dive into the AWS machine learning platform, using Project Jupyter-based Amazon SageMaker to build, train, and deploy ML/DL models to the cloud and AWS DeepLens.

Tyler A. Erickson is a senior developer advocate at Google, where he fosters collaborations with researchers from academia, NGOs, and governmental organizations seeking to capitalize on Earth Engine’s capabilities for geospatial analyses that involve immense satellite and model-based datasets. Tyler leads the development of Earth Engine’s core efforts in water and climate, guides the evolution of Earth Engine to support these scientific domains, and leads support efforts for the Earth Engine Python API. A snow hydrologist by training, he holds degrees in civil and environmental engineering and geography from Colorado State University, Caltech, Stanford, and the University of Colorado at Boulder. Tyler is a longtime Python programmer, with contributions to the Open Source Geospatial (OSGeo) Foundation and the Free and Open Source Software for Geospatial (FOSS4G) conferences.

Presentations

How JupyterLab and widgets enable interactive analysis of the Earth's past, present, and future Session

Massive collections of data on the Earth's changing environment, collected by satellite sensors and generated by Earth system models, are being exposed via web APIs by multiple providers. Tyler Erickson highlights the use of JupyterLab and Jupyter widgets in analyzing complex high-dimensional datasets, providing insights into how our Earth is changing and what the future might look like.

Tony Fast is a modern scientist with over a decade of experience analyzing unstructured data for cross-functional teams in research, business, and security. Tony currently explores the intersection of applied engineering and computer science, trying to understand how open access will transform basic science for the next-generation workforce. He is actively building diverse communities around open source scientific software technologies in metro Atlanta; he currently organizes the Atlanta Jupyter user group and is a data lead at Code for Atlanta. He was also a cofounder of PyData Atlanta. Tony holds a PhD in materials science and engineering from Drexel University and a BS in ceramic engineering from Rutgers University.

Presentations

The reincarnation of a notebook Session

Notebook authors often consider only the interactive experience of creating computable documents. However, the dynamic state of a notebook is a minor period in its lifecycle; the majority is spent as a file at rest. Tony Fast and Nick Bollweg explore conventions that create notebooks with value long past their inception as documents, software packages, test suites, and interactive applications.

Nicolas Fernandez is a computational scientist at the Human Immune Monitoring Center at the Icahn School of Medicine at Mount Sinai. Nicolas is a computational biologist with interests in analysis and visualization of high-throughput biological data as a means to understanding biological regulatory networks.

Presentations

Visualizing high-dimensional biological data with Clustergrammer-Widget in the Jupyter Notebook Session

Nicolas Fernandez offers an overview of Clustergrammer-Widget, an interactive heatmap Jupyter widget that enables users to easily explore high-dimensional data within a Jupyter notebook and share their interactive visualizations using nbviewer.

Jessica Forde is a technical writer for Project Jupyter. Her previous open source projects include datamicroscopes, a Bayesian nonparametrics library in Python, and density, a tool for Columbia University study spaces based on wireless device data.

Presentations

Learn by doing: Using data-driven stories and visualizations in the (high school and college) classroom Session

Students learn by doing. Carol Willing, Jessica Forde, and Erik Sundell demonstrate the value of interactive content, using Jupyter notebooks, widgets, and visualization libraries, share notable examples of projects within the Jupyter community, and outline ways educators can help students develop data science literacy and use computational skills to build upon their interests.

Ian Foster is a senior scientist, distinguished fellow, and director of the Data Science and Learning Division at Argonne National Laboratory as well as the Arthur Holly Compton Distinguished Service Professor of Computer Science at the University of Chicago and a fellow of the Institute for Molecular Engineering. A computer scientist whose work at the intersection of computing and the sciences has produced both practical technologies that have seen wide adoption and concepts and methods that have proven influential in research and education, Ian is also chief troublemaker at Globus. His research interests span a range of topics in parallel, distributed, and data-intensive computing. A unifying theme is a desire to use the power of rapid communication to accelerate discovery, whether by linking people with remote computers and data, accelerating complex computational processes, or enabling distributed virtual teams. Ian pursues use-inspired basic research, meaning that he employs challenging practical problems to motivate and focus work on hard problems in computer science. Over the years, these practical problems have come from such fields as environmental science, economics, high-energy physics, biomedicine, and engineering. He often builds sophisticated artifacts (i.e., software and distributed systems) in order to apply, evaluate, and disseminate new concepts and methods. Ian’s work frequently involves large teams of disciplinary scholars, computer scientists, and software engineers. Ian has received multiple awards for his work, including the IEEE TCSC Award for Excellence in Scalable Computing (2014), the Inaugural ACM HPDC Lifetime Achievement Award (2012), and the IEEE Tsutomu Kanai Award (2011).

Presentations

Scaling collaborative data science with Globus and Jupyter Session

The Globus service simplifies the utilization of large and distributed data on the Jupyter platform. Ian Foster explains how to use Globus and Jupyter to seamlessly access notebooks using existing institutional credentials, connect notebooks with data residing on disparate storage systems, and make data securely available to business partners and research collaborators.

Michelle Gill is a senior data scientist at BenevolentAI, where she uses artificial intelligence to facilitate pharmaceutical discovery. Previously, Michelle was a deep learning consultant within NVIDIA’s Professional Services Group and a scientist at the National Cancer Institute, where she developed parallelized software utilizing machine learning and compressed sensing algorithms. As a postdoctoral research fellow at Columbia University Medical School, she studied the biological activity of cancer-associated enzymes. She holds a PhD in molecular biophysics and biochemistry from Yale University.

Presentations

Data science as a catalyst for scientific discovery Keynote

Michelle Gill explains how data science methodologies and tools can be used to link information from different scientific fields and accelerate discovery in a variety of areas, including the biological sciences.

Zachary Glassman is a data scientist in residence at the Data Incubator. Zachary has a passion for building data tools and teaching others to use Python. He studied physics and mathematics as an undergraduate at Pomona College and holds a master’s degree in atomic physics from the University of Maryland.

Presentations

Hands-on data science with Python 2-Day Training

Zachary Glassman leads a hands-on dive into building intelligent business applications using machine learning, walking you through all the steps of developing a machine learning pipeline. You'll explore data cleaning, feature engineering, model building and evaluation, and deployment and extend these models into two applications using real-world datasets.

Hands-on data science with Python (Day 2) Training Day 2

Zachary Glassman leads a hands-on dive into building intelligent business applications using machine learning, walking you through all the steps of developing a machine learning pipeline. You'll explore data cleaning, feature engineering, model building and evaluation, and deployment and extend these models into two applications using real-world datasets.

Bruno Gonçalves is currently a vice president in data science and finance at JPMorgan Chase. Previously, he was a data science fellow at NYU’s Center for Data Science while on leave from a tenured faculty position at Aix-Marseille Université. Since completing his PhD in the physics of complex systems in 2008, he has been pursuing the use of data science and machine learning to study human behavior. Using large datasets from Twitter, Wikipedia, web access logs, and Yahoo! Meme, he studied how we can observe both large-scale and individual human behavior in an unobtrusive and widespread manner. The main applications have been to the study of computational linguistics, information diffusion, behavioral change, and epidemic spreading.

Presentations

Advanced data science, part 1: Data visualization in Jupyter using Matplotlib and Seaborn Tutorial

Bruno Gonçalves offers an overview of the fundamental concepts and ideas behind human visual perception and explains how it informs scientific data visualization. To illustrate these concepts, Bruno shares practical examples using matplotlib and seaborn.

Sean Gorman is the head of technical product management at DigitalGlobe. Previously, Sean was a cofounder of Timbr.io, a platform for enabling algorithmic orchestrations with sensor and social data (acquired by DigitalGlobe), and the founder and CEO of GeoIQ, a collaborative data and analytics company serving commercial and government customers (acquired by Esri). Sean also worked at Esri integrating social data with Esri’s mapping technologies and was a research professor at George Mason University, where he focused on the intersection of complexity science, statistical mechanics, and spatial analysis. Sean holds a PhD from George Mason University, where he was the Provost’s High Potential Research Candidate, a Fisher Prize winner, and an INFORMS Dissertation Prize recipient.

Presentations

Using Jupyter to create a community for satellite imagery analysis and sharing Session

Satellite imagery can be a critical resource during disasters and humanitarian crises. While the community has improved data sharing, we still struggle to create reusable data science to solve problems on the ground. Sean Gorman offers an overview of GBDX Notebooks, a step toward creating an open data science community built around Jupyter to stream imagery and share analysis at scale.

Brian Granger is an associate professor of physics and data science at Cal Poly State University in San Luis Obispo. Brian is a leader of the IPython project, cofounder of Project Jupyter, and an active contributor to a number of other open source projects focused on data science in Python. Recently, he cocreated the Altair package for statistical visualization in Python. He is an advisory board member of NumFOCUS and a faculty fellow of the Cal Poly Center for Innovation and Entrepreneurship.

Presentations

Enterprise usage of Jupyter: The business case and best practices for leveraging open source Session

Over the past two years, we have seen a dramatic shift in Jupyter’s deployment, from ad hoc usage by individuals to production enterprise application at scale. Brian Granger explains how this has expanded the Jupyter community and revealed new use cases with new challenges and opportunities.

Friday opening remarks Keynote

JupyterCon cochairs Paco Nathan, Fernando Pérez, and Brian Granger open the second day of keynotes.

Thursday opening remarks Keynote

JupyterCon cochairs Paco Nathan, Fernando Pérez, and Brian Granger open the first day of keynotes.

Matt Greenwood is chief inspiration officer at Two Sigma, where he has led a number of company-wide efforts in engineering and modeling. Matt began his career at Bell Labs, working in the Operating Systems Group under Dennis Ritchie, before moving to IBM Research, where he was responsible for a number of early efforts in tablet computing and distributed computing. Matt also served as lead developer and manager for a number of systems on the network element at Entrisphere, which created a product providing access equipment for broadband service providers, and created the Customer Engineering Department in preparation for initial customer trials. Matt holds a BA and an MA in math from Oxford University, a master’s degree in theoretical physics from the Weizmann Institute of Science in Israel, and a PhD in mathematics from Columbia University, where he taught for a number of years.

Presentations

Open source software and the allocation of capital Session

Matt Greenwood explains why Two Sigma, a company in a space notorious for protecting IP, thinks it's important to contribute to the open source community. Matt covers the evolution of Two Sigma's thinking and policies over the past five years and makes a case for why other companies should make a commitment to the open source ecosystem.

Jason Grout is a Jupyter developer at Bloomberg, working primarily on JupyterLab and the interactive widget system. Previously, Jason was an assistant professor of mathematics at Drake University in Des Moines, Iowa. Jason co-organizes the PyDataNYC Meetup. He has also been a major contributor to the open source Sage mathematical software system for many years. He holds a PhD in mathematics from Brigham Young University.

Presentations

JupyterLab tutorial Tutorial

JupyterLab—Jupyter's new frontend—goes beyond the classic Jupyter Notebook, providing a flexible and extensible web application with a set of reusable components. Jason Grout and Matthias Bussonnier walk you through using JupyterLab, explain how to transition from the classic Jupyter Notebook frontend to JupyterLab, and demonstrate JupyterLab's new powerful features.

Joel Grus is a research engineer at the Allen Institute for Artificial Intelligence and the author of the beloved O’Reilly book Data Science from Scratch and the blog post “Fizz Buzz in TensorFlow.” Previously, he was a software engineer at Google and a data scientist at a variety of startups. He lives in Seattle.

Presentations

I don't like notebooks. Session

I have been using and teaching Python for many years. I wrote a best-selling book about learning data science. And here's my confession: I don't like notebooks. (There are dozens of us!) I'll explain why I find notebooks difficult, show how they frustrate my preferred pedagogy, demonstrate how I prefer to work, and discuss what Jupyter could do to win me over.

Mark Hansen is a professor of journalism and the director of the David and Helen Gurley Brown Institute for Media Innovation, a bicoastal collaboration between the Columbia Journalism School and the School of Engineering at Stanford University with a mission to explore the interplay between technology and story. Previously, Mark was a professor in the Department of Statistics at UCLA. In addition to his technical work, he also has an active art practice involving the presentation of data for the public. His work with the Office for Creative Research has been exhibited at the Museum of Modern Art in New York, the Whitney Museum, the Centro de Arte Reina Sofia, the London Science Museum, and the Cartier Foundation in Paris and in permanent displays in the lobbies of the New York Times building and the Public Theater in Manhattan. Mark holds a BS in applied math from the University of California, Davis, and a PhD and MA in statistics from the University of California, Berkeley.

Presentations

The reporter’s notebook Keynote

Beyond Twitter, Facebook, and similar networks, without question, data, code, and algorithms are forming systems of power in our society. Mark Hansen explains why it is crucial that journalists—explainers of last resort—be able to interrogate these systems, holding power to account.

Chris Harris is a staff research and development engineer at Kitware. Chris has a wide range of research interests, from high-performance computing to client-side visualization of scientific datasets. Previously, Chris worked on high-performance messaging systems at IBM. He holds a master’s degree in computing and artificial intelligence from Imperial College London.

Presentations

Reproducible quantum chemistry in Jupyter Session

In silico prediction of chemical properties has seen vast improvements in both veracity and volume of data but is currently hamstrung by a lack of transparent, reproducible workflows coupled with environments for visualization and analysis. Chris Harris offers an overview of a platform that uses Jupyter notebooks to enable an end-to-end workflow from simulation setup to visualizing the results.

Tim Head is CEO of Wild Tree Tech, a freelance consultancy specializing in building full stack data science solutions and teaching artificial intelligence skills. Customers include a large international organization based in Geneva, startups, NGOs, open source projects, and research groups. Tim is a mentor for Mozilla’s Open Leadership program. He is a known good actor in the Python data ecosystem and has contributed to the development of Project Jupyter and other PyData projects for several years. He has extensive experience using and developing Python and C++ for data science applications, is one of the maintainers of scikit-optimize, a Python library for blackbox optimization, and has contributed to scikit-learn. Tim has given many talks at small and large international conferences like PyCon and EuroSciPy and co-organizes the PyData meetup in Zurich. He holds a PhD in experimental particle physics from the University of Manchester.

Presentations

Binder: Lowering the bar to sharing interactive software Session

The Binder project drastically lowers the bar to sharing and reusing software. Users wanting to try out someone else’s work need only click a single link to do so. Tim Head offers an overview of the Binder project and explores the concepts and ideas behind it. Tim then showcases examples from the community to show off the power of Binder.

Jane Herriman is director of diversity and outreach at Julia Computing and a PhD student at Caltech. She is a Julia, dance, and strength training enthusiast who uses Jupyter notebooks to teach Julia.

Presentations

An introduction to Julia in Jupyter Tutorial

Jane Herriman uses Jupyter notebooks to show you why Julia is special, demonstrate how easy it is to learn, and get you writing your first Julia programs.

The journey to Julia 1.0: The "Ju" in Jupyter Session

Julia and Jupyter share a common evolution path: Julia is the language for modern technical computing, while Jupyter is the development and presentation environment of choice for modern technical computing. Viral Shah and Jane Herriman discuss Julia's journey and the impact of Jupyter on Julia's growth.

Matthew Hunt started playing with computers when he was 8, sold his first program at 13, and retains an unhealthy degree of curiosity. He lives in New York, where he can be found tinkering with 3D printers, dabbling in the future of flight, playing with VR headsets, and even doing work sometimes. He still believes that where you find people having the most fun, there will you find the future being created. Matthew runs the NYC Spark user group.

Presentations

What things are correlated with gender diversity: A dig through the ASF and Jupyter projects Session

Many of us believe that gender diversity in open source projects is important. (If you don’t, this isn’t going to convince you.) But what things are correlated with improved gender diversity, and what can we learn from similar historic industries? Holden Karau and Matt Hunt explore the diversity of different projects, examine historic EEOC complaints, and detail parallels and historic solutions.

Paul Ivanov is a member of the Jupyter Steering Council and a senior software engineer at Bloomberg LP working on IPython- and Jupyter-related open source projects. Previously, Paul worked on backend and data engineering at Disqus; was a code monkey at the Brain Imaging Center at UC Berkeley, where he worked on IPython and taught at UC Berkeley’s Python bootcamps; worked in Bruno Olshausen’s lab at the Redwood Center for Theoretical Neuroscience; and was a PhD candidate in the Vision Science program at UC Berkeley. He holds a degree in computer science from UC Davis.

Presentations

Terraforming Jupyter: Changing JupyterLab to suit your needs Session

Stephanie Stattel and Paul Ivanov walk you through a series of extensions that demonstrate the power and flexibility of JupyterLab’s architecture, from targeted functionality modifications to more extreme atmospheric changes that require extensive decoupling and flexibility within JupyterLab.

Kerim Kalafala is a member of the IBM Academy of Technology, a senior technical staff member in the IBM Systems Group, and an IBM Master Inventor. Currently, he is lead architect of static timing and noise analysis software tools used to design and verify the world’s fastest microprocessors. Kerim has received multiple prestigious Research Division awards for publications in computer science and mathematics, an ACM/IEEE Technical Impact Award in Electronic Design Automation, and a best-paper award at the Design Automation Conference and was recognized for coauthoring a top-10 most-cited paper in the 50-year history of DAC. Kerim has also received both the IBM Corporate and Outstanding Technical Achievement Awards for contributions to the field of statistical timing analysis. He is an inventor with 49 issued patents worldwide and approximately a dozen more pending. Kerim is a member of the executive board for the Rhinebeck Science Foundation and volunteers extensively in his local community. Kerim holds undergraduate and graduate degrees in computer and systems engineering from Rensselaer Polytechnic Institute, where he graduated with summa cum laude honors.

Presentations

Design and analysis of the world’s most advanced microprocessors using Jupyter notebooks Session

Kerim Kalafala and Nicholai L'Esperance share their experiences using Jupyter notebooks as a critical aid in designing the next generation of IBM Power and Z processors, focusing on analytics on graphs consisting of hundreds of millions of nodes. Along the way, Kerim and Nicholai explain how they leverage Jupyter notebooks as part of their overall design system.

Praveen Kanamarlapudi is a senior software engineer on the core data platform team at PayPal, where he builds scalable and distributed platforms, including a highly available Jupyter platform that is being used by hundreds of the company’s data scientists, analysts, and developers. He’s also a contributor to Livy and Sparkmagic.

Presentations

PayPal Notebooks: Data science and machine learning at scale, powered by Jupyter Session

Hundreds of PayPal's data scientists, analysts, and developers use Jupyter to access data spread across filesystem, relational, document, and key-value stores, enabling complex analytics and an easy way to build, train, and deploy machine learning models. Romit Mehta and Praveen Kanamarlapudi explain how PayPal built its Jupyter infrastructure and powerful extensions.

Harini Kannan is a data scientist at cybersecurity company Capsule8, where she applies her skills in statistics, visualization, and machine learning to a broad range of threat detection and computer security problems. She enjoys using Python, JupyterLab, R, and TensorFlow in her daily work.

Presentations

Rapid data science exploration for cybersecurity Session

The key to successful threat detection in cybersecurity is fast response. George Williams, Harini Kannan, and Alex Comerford offer an overview of specialized extensions they have built for data scientists working in cybersecurity that can be used and deployed via JupyterHub.

Holden Karau is a transgender Canadian open source developer advocate at Google focusing on Apache Spark, Beam, and related big data tools. Previously, she worked at IBM, Alpine, Databricks, Google (yes, this is her second time), Foursquare, and Amazon. Holden is the coauthor of Learning Spark, High Performance Spark, and another Spark book that’s a bit more out of date. She is a committer on the Apache Spark, SystemML, and Mahout projects. When not in San Francisco, Holden speaks internationally about different big data technologies (mostly Spark). She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal. Outside of work, she enjoys playing with fire, riding scooters, and dancing.

Presentations

What things are correlated with gender diversity: A dig through the ASF and Jupyter projects Session

Many of us believe that gender diversity in open source projects is important. (If you don’t, this isn’t going to convince you.) But what things are correlated with improved gender diversity, and what can we learn from similar historic industries? Holden Karau and Matt Hunt explore the diversity of different projects, examine historic EEOC complaints, and detail parallels and historic solutions.

Stefan Karpinski is one of the cocreators and core developers of the Julia language. He is also a cofounder and the chief open source officer at Julia Computing, a company founded by Julia’s creators to provide professional training, products, and consulting to support the use of Julia in large-scale, high-performance industrial applications. Stefan is an applied mathematician and data scientist by trade, having previously worked at Akamai, Citrix Online, and Etsy.

Presentations

The journey to Julia 1.0: The "Ju" in Jupyter Session

Julia and Jupyter share a common evolution path: Julia is the language for modern technical computing, while Jupyter is the development and presentation environment of choice for modern technical computing. Viral Shah and Jane Herriman discuss Julia's journey and the impact of Jupyter on Julia's growth.

Aneesh Karve is the CTO of Quilt Data, a Y Combinator company advancing an open source standard for versioned data. Previously, Aneesh was a product manager, lead designer, and software engineer at companies including Microsoft, NVIDIA, and Matterport and the general manager and founding member of AdJitsu, the first real-time 3D advertising platform for iOS (acquired by Amobee in 2012). He holds degrees in chemistry, mathematics, and computer science. Aneesh’s research background spans proteomics, machine learning, and algebraic number theory.

Presentations

Reproducible data dependencies for Jupyter: Distributing massive, versioned image datasets from the Allen Institute for Cell Science Session

Reproducible data is essential for notebooks that work across time, across contributors, and across machines. Jackson Brown and Aneesh Karve demonstrate how to use an open source data registry to create reproducible data dependencies for Jupyter and share a case study in open science over terabyte-size image datasets.

Kyle Kelley is a senior software engineer at Netflix, a maintainer on nteract.io, and a core developer of the IPython/Jupyter project. He wants to help build great environments for collaborative analysis, development, and production workloads for everyone, from small teams to massive scale.

Presentations

How to build on top of Jupyter’s protocols Tutorial

Kyle Kelley walks you through creating a new web application from the ground up, teaching you how to build on top of Jupyter's protocols in the process. Along the way, you'll learn about Jupyter's REST and streaming APIs, message spec, and the notebook format.

David Koop is an assistant professor in the Computer and Information Science Department at UMass Dartmouth. His research interests include data visualization, computational provenance, and data science environments. He has served as a core developer for the VisTrails project and has collaborated with scientists in the fields of climate science, quantum physics, and invasive species modeling. David holds a PhD in computing from the University of Utah.

Presentations

Supporting reproducibility in Jupyter through dataflow notebooks Session

Dataflow notebooks build on the Jupyter Notebook environment by adding constructs to make dependencies between cells explicit and clear. David Koop offers an overview of the Dataflow kernel, shows how it can be used to robustly link cells as a notebook is developed, and demonstrates how that notebook can be reused and extended without impacting its reproducibility.

Keith Kraus is a Washington, DC-based senior engineer on the AI infrastructure team at NVIDIA, where he builds GPU-accelerated solutions around data engineering, analytics, and visualization. Previously, Keith did extensive data engineering, systems engineering, and data visualization work in the cybersecurity domain, focused on building a GPU-accelerated big data solution for advanced threat detection and cyberthreat-hunting capabilities. Keith holds a BEng in computer engineering and an MEng in networked information systems from Stevens Institute of Technology.

Presentations

GoAi and PyGDF: GPU-accelerated data science with Jupyter notebooks Session

Joshua Patterson, Leo Meyerovich, and Keith Kraus demonstrate how to use PyGDF and other GoAi technologies to easily analyze and interactively visualize large datasets from standard Jupyter notebooks.

Julian Kudszus was a curriculum developer and program coordinator for the Data Science Modules initiative at UC Berkeley, which brings data science lessons to thousands of the university’s students across a wide range of domains through the use of JupyterHub and Jupyter notebooks. He received the Outstanding Teaching and Leadership award for his work at the Division of Data Sciences. Julian holds a bachelor’s degree in computer science from UC Berkeley.

Presentations

JupyterHub for domain-focused integrated learning modules Session

The Data Science Modules program at UC Berkeley creates short explorations into data science using notebooks to allow students to work hands-on with a dataset relevant to their course. Mariah Rogers, Ronald Walker, and Julian Kudszus explain the logistics behind such a program and the indispensable features of JupyterHub that enable such a unique learning experience.

Nicholai L’Esperance is an Essex Junction, Vermont-based staff engineer in the Product Engineering Diagnostics Group within the IBM Systems Group, where he develops new tools and methodologies to aid yield, reliability, and characterization missions for IBM’s Power and Z programs. Nicholai holds a BSEE and MSEE from the University of Vermont, where he graduated with cum laude honors. During his time at UVM, Nicholai focused on signal analysis, coauthoring several papers on ground-penetrating radar and device testing. Nicholai is continuing his studies, pursuing a graduate degree in computer science.

Presentations

Design and analysis of the world’s most advanced microprocessors using Jupyter notebooks Session

Kerim Kalafala and Nicholai L'Esperance share their experiences using Jupyter notebooks as a critical aid in designing the next generation of IBM Power and Z processors, focusing on analytics on graphs consisting of hundreds of millions of nodes. Along the way, Kerim and Nicholai explain how they leverage Jupyter notebooks as part of their overall design system.

Julia Lane is a professor at the NYU Wagner Graduate School of Public Service and the NYU Center for Urban Science and Progress as well as an NYU provostial fellow for innovation analytics. Previously, Julia was a senior managing economist and institute fellow at American Institutes for Research, where she cofounded the Institute for Research on Innovation and Science (IRIS) at the University of Michigan. Over her career, Julia has held positions at the National Science Foundation, the Urban Institute, the World Bank, American University, and NORC at the University of Chicago.

Presentations

Business Summit roundtable: The current environment—Compliance, ethics, ML model interpretation, GDPR, and more Session

Join in for the Business Summit's roundtable discussion with participation from IBM, Capital One, the DoD, AWS, Oracle, and others. Speakers will discuss important issues in our current environment—everything from compliance and GDPR to ML models.

Jupyter, sensitive data, and public policy Session

Government agencies have found it difficult to serve taxpayers because of the technical, bureaucratic, and ethical issues associated with access and use of sensitive data. Julia Lane explains how the Coleridge Initiative has partnered with Jupyter to design ways that can address the core problems such organizations face.

Sam Lau is a graduate student at UC Berkeley, where he is working on a master of science, advised by Josh Hug. Sam is interested in improving data science education. Currently, he’s building tools to make it easy to create and publish interactive educational content online.

Presentations

nbinteract: Shareable interactive web pages from notebooks Session

The nbinteract package converts Jupyter notebooks with widgets into interactive, standalone HTML pages. Its built-in support for function-driven plotting makes authoring interactive pages simpler by allowing users to focus on data, not callbacks. Sam Lau and Caleb Siu offer an overview of nbinteract and walk you through the steps to publish an interactive web page from a Jupyter notebook.

Seth Lawler is an engineering consultant with expertise in coastal and riverine surface water modeling. A subject-matter expert in scientific programming with experience developing and scaling serial applications for parallel processing in high-performance and cloud computing environments, Seth has worked on wide-ranging projects at the national, state, and local level, including the development and quality control of tools in use by the US Army Corps of Engineers and the United States Geological Survey. He is currently completing a PhD in civil engineering at George Mason University, where he is conducting research with the National Weather Service to enhance modeling and forecasting capabilities in areas influenced by coastal and fluvial flooding mechanisms.

Presentations

Using JupyterLab for flood map development: Approaches for improving productivity and reproducibility Session

Creating flood maps for coastal and riverine communities requires geospatial processing, statistical analysis, finite element modeling, and a team of specialists working together. Seth Lawler explains how using the feature-rich JupyterLab to develop tools, share code with team members, and document workflows used in the creation of flood maps improves productivity and reproducibility.

Tianhui Michael Li is the founder and CEO of the Data Incubator. Michael has worked as a data scientist lead at Foursquare, a quant at D.E. Shaw and JPMorgan, and a rocket scientist at NASA. At Foursquare, Michael discovered that his favorite part of the job was teaching and mentoring smart people about data science. He decided to build a startup that lets him focus on what he really loves. He did his PhD at Princeton as a Hertz fellow and read Part III Maths at Cambridge as a Marshall scholar.

Presentations

Business Summit roundtable: The current environment—Compliance, ethics, ML model interpretation, GDPR, and more Session

Join in for the Business Summit's roundtable discussion with participation from IBM, Capital One, the DoD, AWS, Oracle, and others. Speakers will discuss important issues in our current environment—everything from compliance and GDPR to ML models.

Will Farr is an associate professor in the Department of Physics and Astronomy at Stony Brook University and the Gravitational Wave Astronomy Group leader at the Flatiron Institute’s Center for Computational Astronomy. A theoretical astrophysicist with interests in astrostatistics, the gravitational dynamics of exoplanets and dense stellar systems, gravitational waves, compact object evolution, computational astrophysics, and general relativity, Will is also an enthusiastic programming language polyglot and has contributed software to many astronomical projects. You can find him as farr on GitHub.

Presentations

All the cool kids are doing it; maybe we should too? Jupyter, gravitational waves, and the LIGO and Virgo Scientific Collaborations Keynote

Will Farr shares examples of Jupyter use within the LIGO and Virgo Scientific Collaborations and offers lessons about the (many) advantages and (few) disadvantages of Jupyter for large, global scientific collaborations. Along the way, Will speculates on Jupyter's future role in gravitational wave astronomy.

ED Ma is the vice president of tax data analytics at Synchrony Financial. His work experience includes data analytics, model development, model review, and model governance in the financial industry. He holds a master’s degree in computer information technologies, a master’s degree in financial engineering, and a bachelor’s degree in applied mathematics.

Presentations

How the Jupyter Notebook makes the corporate tax process easier and better Session

In the corporate tax world, Microsoft Excel—the king of spreadsheets—is the default tool for tracking information and managing tasks, but tax professionals are often annoyed by linked or referenced cells that update slowly or break, within or between spreadsheets. Jinli Ma explains how the Jupyter Notebook does a better job than Microsoft Excel with the original issue discount calculation process.

Johan Mabille is a scientific software developer at QuantStack, where he specializes in high-performance computing in C++. Previously, Johan was a quant developer at HSBC. An open source developer, Johan is the coauthor of xtensor and xeus and the main author of xsimd. He holds a master’s degree in computer science from Centrale-Supelec.

Presentations

Going native: C++ as a first-class citizen of the Jupyter ecosystem Session

Sylvain Corlay, Johan Mabille, Wolf Vollprecht, and Martin Renou share the latest features of the C++ Jupyter kernel, including live help, auto-completion, rich MIME type rendering, and interactive widgets. Join in to explore one of the most feature-full implementations of the Jupyter kernel protocol that also brings Jupyter closer to the metal.

Dan Romuald Mbanga is a global lead business development manager at AWS, where he leads business and technical initiatives involving Amazon AI platforms such as Amazon SageMaker, designed to provide end-to-end machine learning environments for AWS’s customers. He helps AWS customers in all geographies, as well as internal AWS stakeholders across data science, product development, marketing, sales, and technical support, achieve success with AWS’s machine and deep learning technologies. Previously, Dan was a big data and DevOps engineering manager at AWS, where he built and led two teams of specialized engineers on the Hadoop ecosystem and in CI/CD technologies. Dan holds BS degrees in physics and computer science from the University of Buea. In his spare time, he enjoys traveling, hacking hardware electronics, and learning new languages.

Presentations

Business Summit roundtable: The current environment—Compliance, ethics, ML model interpretation, GDPR, and more Session

Join in for the Business Summit's roundtable discussion with participation from IBM, Capital One, the DoD, AWS, Oracle, and others. Speakers will discuss important issues in our current environment—everything from compliance and GDPR to ML models.

Keynote by Dan Romuald Mbanga Keynote

Keynote by Dan Romuald Mbanga

Kevin McCormick is a senior software development engineer at Amazon Web Services, where he is the lead engineer on the Amazon SageMaker notebook platform, which provides an easy-to-use Jupyter notebook experience as a first-class AWS offering. Kevin has over 15 years of software development and IT experience, including building software for everything from websites, IDEs, and web browsers to productivity applications and reusable libraries. He has contributed to a number of open source projects, including more than a dozen improvements to the Chromium project. Although he lives in Seattle, he’s a New York/New Jersey native, so he knows what a good slice of pizza is supposed to taste like.

Presentations

Containerizing notebooks for serverless execution (sponsored by AWS) Session

Kevin McCormick tells the story of two approaches used internally at AWS to accelerate new ML algorithm development and easily package Jupyter notebooks for scheduled execution: custom Jupyter kernels that automatically create Docker containers and dispatch them to either a distributed training service or a job execution environment.

Romit Mehta is a product manager at PayPal focusing on core big data and analytics platform products, which include a compute framework, a data platform, and a notebooks platform. In this role, Romit is working to simplify application development on big data technologies like Spark and improve analysts’ and data scientists’ agility and ease their access to data spread across a multitude of data stores via friendly technologies like SQL and notebooks. In his 19-year career, Romit has built data and analytics solutions for a wide variety of companies across the networking, semiconductor, telecom, security, and fintech industries. Outside of data products, Romit spends his time with his wife Kosha and their two wonderful kids, Annika and Vedant.

Presentations

PayPal Notebooks: Data science and machine learning at scale, powered by Jupyter Session

Hundreds of PayPal's data scientists, analysts, and developers use Jupyter to access data spread across filesystem, relational, document, and key-value stores, enabling complex analytics and an easy way to build, train, and deploy machine learning models. Romit Mehta and Praveen Kanamarlapudi explain how PayPal built its Jupyter infrastructure and powerful extensions.

Julia Meinwald is the open source coordinator at Two Sigma. Her background is in music, but she’s been learning more about technology and scientific computing ever since she joined Two Sigma in 2010. She’s enjoyed every stop of her quest to learn more about open source software, from getting to know what makes the products developed at Two Sigma special to writing backing tracks for her musical, Reb + VoDKa + Me, on Sonic Pi. Julia is a frequent speaker at open source meetups at Two Sigma and at the 2017 PyData conference in NYC and is thrilled to be a part of this year’s JupyterCon.

Presentations

Why contribute to open source? Keynote

Julia Meinwald outlines a few effective ways Two Sigma has identified to support the unseen labor maintaining a healthy open source ecosystem and details how the company’s thinking on this topic has evolved.

Leo Meyerovich cofounded Graphistry, Inc. to help enterprise and federal teams easily scale visual investigations of their event and graph data. Graphistry’s original approach of connecting GPUs in browsers to GPUs in datacenters builds upon the founding team’s work at UC Berkeley on the first parallel web browser and the Superconductor language. Leo is most cited for his work in language-based security and policy verification. His earlier research received awards for the first reactive web language Flapjax, parallelizing the web browser, and the sociological foundations of programming languages.

Presentations

GoAi and PyGDF: GPU-accelerated data science with Jupyter notebooks Session

Joshua Patterson, Leo Meyerovich, and Keith Kraus demonstrate how to use PyGDF and other GoAi technologies to easily analyze and interactively visualize large datasets from standard Jupyter notebooks.

John Miller is a regional services manager in the Technology Services Division of Honeywell UOP. A chemical engineering and computer science double major, John has spent the last 20 years in the refining and petrochemicals industry trying (mostly unsuccessfully) to find a harmonious union of the two disciplines. Previously, he worked in UOP’s Field Operating Services, where he traveled the world helping refiners commission and operate UOP technology. John’s Python experience began in the late ’90s, when he built a simple web server on the company intranet using an early incarnation of Zope. However, he’s a Lisp guy at heart. As a result, he was forced to learn that most daunting of text editors: Emacs. John found the Emacs IPython Notebook after (re)discovering IPython around version 0.11; when EIN’s creator, Takafumi Arakaki, moved on to other things and big changes in IPython required significant updates to EIN to maintain compatibility, he foolishly dived into the world of Emacs Lisp and Jupyter development and hasn’t looked back.

Presentations

The Emacs IPython Notebook Session

John Miller offers an overview of the Emacs IPython Notebook (EIN), a full-featured client for the Jupyter Notebook in Emacs, and shares a brief history of its development.

Presentations

The future of Jupyter in education Session

Join this panel of seasoned educators and the cochairs of the education track at JupyterCon to look to the future of Jupyter in teaching and learning.

Paco Nathan is known as a “player/coach” with core expertise in data science, natural language processing, machine learning, and cloud computing. He has 35+ years of experience in the tech industry, at companies ranging from Bell Labs to early-stage startups. His recent roles include director of the Learning Group at O’Reilly Media and director of community evangelism at Databricks and Apache Spark. Paco is the cochair of JupyterCon and an advisor for Amplify Partners, Deep Learning Analytics, and Recognai. He was named one of the top 30 people in big data and analytics in 2015 by Innovation Enterprise.

Presentations

Business Summit discussion group Session

The Business Summit concludes with "unconference"-style breakout sessions that allow enterprise stakeholders to give input to Project Jupyter directly.

Friday opening remarks Keynote

JupyterCon cochairs Paco Nathan, Fernando Pérez, and Brian Granger open the second day of keynotes.

Jupyter trends in 2018 Keynote

Jupyter is built on a set of extensible, reusable building blocks, expressed through various open protocols, APIs, and standards. For many use cases, these are combined to provide extensible software architecture for interactive computing with data. Paco Nathan shares a few somewhat unexpected things that emerged in 2018.

Thursday opening remarks Keynote

JupyterCon cochairs Paco Nathan, Fernando Pérez, and Brian Granger open the first day of keynotes.

Rob Newton is a mathematics instructor at the Trinity School, where each year he teaches a course on advanced topics that lie beyond a traditional high school curriculum. Recent courses have included algebraic number theory, combinatorics, linear algebra, group theory, and cryptography, each with a significant coding component. Rob grew up in a military family, moving from California to Texas to Germany to New York during his childhood. He holds a bachelor’s degree in mathematics from SUNY Potsdam and a PhD from the University of Florida, where his research analyzed a topological invariant that nobody can pronounce (and where he took advantage of the beautiful weather and beaches). Rob is an avid homebrewer and loves fruity herbal tea—experience he used as the adviser of the tea club at Trinity.

Presentations

Jupyter for every high schooler Session

In an effort to broaden graduates' mathematical toolkit and address gender equity in STEM education, Rob Newton has led the implementation of Python projects across all of his school's ninth-grade math courses. Now every student in the ninth grade completes three Python projects that introduce programming and integrate it with the ideas developed in class.

Laura Noren is an organizational sociologist at NYU investigating how organizations integrate (or fail to integrate) data-driven decision-making insights and processes.

Presentations

Data science in US and Canadian higher education Session

Laura Noren offers an overview of a research project on the various infrastructure models supporting data science in research settings in terms of funding, educational uses, and research utilization. Laura then shares some of the findings, comparing the national federation model currently established in Canada to the more grassroots efforts in many US universities.

Catherine Ordun is a Washington, DC-based senior data scientist at Booz Allen Hamilton. Catherine’s background is in biology, public health, and business. A self-taught Python programmer, she has led data science work across the US government, including intelligence and public health agencies and the DoD. She serves on the Women in Data Science Committee at Booz Allen, has presented to the National Academy of Medicine, and led her team to the top three in a Health and Human Services opioid codeathon. Catherine is a two-time recipient of the Women of Color (WoC) award and is currently a program reviewer for SciPy 2018. She is passionate about machine learning, has recently started participating in Kaggle challenges, and has started an internal firm-wide machine intelligence meetup.

Presentations

The Jupyter Notebook as a transparent way to document machine learning model development: A case study from a US defense agency Session

Many US government agencies are just getting started with machine learning. As a result, data scientists need to de-"black box" models as much as possible. One simple way to do this is to transparently show how the model is coded and its results at each step. Notebooks do just this. Catherine Ordun walks you through a notebook built for RNNs and explains how government agencies can use notebooks.

M Pacer is a Jupyter core developer and a senior notebook engineer at Netflix. Previously, M was a postdoctoral researcher at the Berkeley Institute for Data Science (BIDS), focusing on the intersection between Jupyter and scientific publishing. M holds a PhD from UC Berkeley, where their research used machine learning and human experiments to study causal explanation and causal inference, and a BS from Yale University, where their research focused on the role of causal language in evaluating scientific claims.

Presentations

Jupyter's configuration system Session

Jupyter's straightforward, out-of-the-box experience has been important for its success in widespread adoption. But good defaults only go so far. Join Afshin Darian, M Pacer, Min Ragan-Kelley, and Matthias Bussonnier to go beyond the defaults and make Jupyter your own.

Making beautiful objects with Jupyter Session

Jupyter displays a rich array of media types out of the box. M Pacer explains how to use these capabilities to their full potential, covering how to add rich displays to existing and new Python classes and how to customize the way notebooks are converted to other formats. These skills will enable anyone to make beautiful objects with Jupyter.
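
One of the capabilities mentioned above, adding a rich display to a new Python class, can be sketched roughly as follows. This is a minimal illustration and not material from the session: the Temperature class is hypothetical, while _repr_html_ is the method name that IPython's display machinery looks for when it renders an object as HTML.

```python
class Temperature:
    """Hypothetical example class, used only to illustrate a rich display hook."""

    def __init__(self, celsius):
        self.celsius = celsius

    def __repr__(self):
        # Plain-text fallback, used by terminals and other non-HTML frontends.
        return f"Temperature({self.celsius!r})"

    def _repr_html_(self):
        # Jupyter frontends call this method, when it exists, to obtain an
        # HTML representation of the object shown as a cell result.
        return f"<b>{self.celsius:.1f}</b> &deg;C"


reading = Temperature(21.5)
print(repr(reading))
print(reading._repr_html_())
```

When an instance is the last expression in a notebook cell, the frontend is expected to show the HTML form; in a plain Python session only the text form from __repr__ appears.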

Yuvi Panda is infrastructure lead for the Data Science Education Program at UC Berkeley, where he works on scaling JupyterHub for use by thousands of students. A programmer and DevOps engineer, he wants to make it easy for people who don’t traditionally consider themselves programmers to do things with code and builds tools (Quarry, PAWS, etc.) to sidestep the list of historical accidents that constitute the “command-line tax” that people have to pay before doing productive things with computing. He’s a core member of the JupyterHub team and works on mybinder.org as well. Yuvi is also a Wikimedian, since you can check out of Wikimedia, but you can never leave.

Presentations

How we run MyBinder.org: A case study in open infrastructure Session

Running infrastructure is challenging for an open source community. Yuvi Panda shares lessons drawn from the small community that operates MyBinder.org, covering the social and technical processes for keeping MyBinder.org reliable in the most open, transparent, and inclusive way possible, using pretty graphs about the state of MyBinder.org that anyone can see in real time.

Pangeo: Big data climate science in the cloud Session

Climate science is being flooded with petabytes of data, overwhelming traditional modes of data analysis. The Pangeo project is building a platform to take big data climate science into the cloud using SciPy and large-scale interactive computing tools. Join Ryan Abernathey and Yuvi Panda to find out what the Pangeo team is building and why and learn how to use it.

The current state of JupyterHub and what's in store for the future Session

JupyterHub is a multiuser server for Jupyter notebooks, focused on supporting deployments in research and education. Min Ragan-Kelley, Carol Willing, and Yuvi Panda discuss recent additions and future plans for the project.

Joshua Patterson is the director of applied solutions engineering at NVIDIA. Previously, Josh worked with leading experts across the public and private sectors and academia to build a next-generation cyberdefense platform. He was also a White House Presidential Innovation Fellow. His current passions are graph analytics, machine learning, and GPU data acceleration. Josh also loves storytelling with data and creating interactive data visualizations. He holds a BA in economics from the University of North Carolina at Chapel Hill and an MA in economics from the University of South Carolina’s Moore School of Business.

Presentations

GoAi and PyGDF: GPU-accelerated data science with Jupyter notebooks Session

Joshua Patterson, Leo Meyerovich, and Keith Kraus demonstrate how to use PyGDF and other GoAi technologies to easily analyze and interactively visualize large datasets from standard Jupyter notebooks.

Bo Peng is an assistant professor in the Department of Bioinformatics and Computational Biology at the University of Texas’s MD Anderson Cancer Center. Drawing on his background in mathematics, bioinformatics, and computer science, Bo applies advanced computational techniques (parallel computation, large-scale simulations) to research topics in population genetics, genetic epidemiology, and bioinformatics. He is the author of the leading population genetics simulator simuPOP as well as software tools for the integrated annotation, manipulation, and analysis of genetic variants from whole exome and whole genome sequencing studies (Variant Tools), with Script of Scripts being his most recent project.

Presentations

SoS: A polyglot notebook and workflow system for both interactive multilanguage data analysis and batch data processing Session

Bo Peng offers an overview of Script of Scripts (SoS), a Python 3-based workflow engine with a Jupyter frontend that allows the use of multiple kernels in one notebook. This unique combination enables users to analyze data using multiple scripting languages in one notebook and, if needed, convert scripts to workflows in situ to analyze large amounts of data on remote systems.

Fernando Pérez is a staff scientist at Lawrence Berkeley National Laboratory and a founding investigator of the Berkeley Institute for Data Science at UC Berkeley, created in 2013. His research focuses on creating tools for modern computational research and data science across domain disciplines, with an emphasis on high-level languages, interactive and literate computing, and reproducible research. He created IPython while a graduate student in 2001 and continues to lead its evolution into Project Jupyter, now as a collaborative effort with a talented team that does all the hard work. Fernando regularly lectures about scientific computing and data science and is a member of the Python Software Foundation, a founding member of NumFOCUS, and a National Academy of Sciences Kavli Frontiers of Science Fellow. He is also the recipient of the 2012 Award for the Advancement of Free Software from the Free Software Foundation. Fernando holds a PhD in particle physics from the University of Colorado at Boulder, which he followed with postdoctoral research in applied mathematics and the development of numerical algorithms.

Presentations

Friday opening remarks Keynote

JupyterCon cochairs Paco Nathan, Fernando Pérez, and Brian Granger open the second day of keynotes.

Sea change: What happens when Jupyter becomes pervasive at a university? Keynote

In 2018, UC Berkeley launched a new major in data science, anchored by two core courses that are the fastest-growing in the history of the university. Fernando Pérez discusses the program and explains how the core courses, which now reach roughly 40% of the campus population, are extending data science into specific domains that cover virtually all disciplinary areas of the campus.

Thursday opening remarks Keynote

JupyterCon cochairs Paco Nathan, Fernando Pérez, and Brian Granger open the first day of keynotes.

Min Ragan-Kelley is a postdoctoral fellow at Simula Research Lab in Oslo, Norway, where he focuses on developing JupyterHub, Binder, and related technologies and supporting deployments of Jupyter in science and education around the world. Min has been contributing to IPython and Jupyter since 2006 (full-time since 2013).

Presentations

Deploying a cloud-based JupyterHub for students and researchers Tutorial

Carol Willing, Min Ragan-Kelley, and Erik Sundell demonstrate how to provide easy access to Jupyter notebooks and JupyterLab without requiring users to install anything on their computers. You'll learn how to configure and deploy a cloud-based JupyterHub using Kubernetes and how to customize and extend it for your needs.

Jupyter's configuration system Session

Jupyter's straightforward, out-of-the-box experience has been important for its success in widespread adoption. But good defaults only go so far. Join Afshin Darian, M Pacer, Min Ragan-Kelley, and Matthias Bussonnier to go beyond the defaults and make Jupyter your own.

The current state of JupyterHub and what's in store for the future Session

JupyterHub is a multiuser server for Jupyter notebooks, focused on supporting deployments in research and education. Min Ragan-Kelley, Carol Willing, and Yuvi Panda discuss recent additions and future plans for the project.

Shivraj Ramanan is director of product management at Capital One. Shivraj combines a strong background in business strategy with technical depth to drive successful outcomes for product teams. Previously, he worked in product strategy in a Fortune 500 company, where he analyzed emerging markets and investigated strategic investments, and in strategy consulting, where he advised on a wide variety of complex topics. Shivraj started his career as a software engineer developing enterprise backup software.

Presentations

Using Jupyter notebooks in highly regulated environments Session

In Capital One's recent exploration of "notebook" offerings, JupyterHub emerged as a top contender that could serve as a potential platform for analytics even in highly regulated industries like financial services. David Schaaf and Shivraj Ramanan discuss Capital One's journey and explain how Jupyter has become a part of the company's ever-growing analytics toolkit.

Vijay Reddy is a machine learning specialist on the Google Cloud customer engineering team, where his mission is to democratize machine learning and help companies realize the power of machine learning via Google Cloud. Previously, he worked at a startup that used machine learning to detect bank fraud. Vijay studied computer science at Carnegie Mellon.

Presentations

Serverless machine learning with TensorFlow 1-Day Training

Vijay Reddy walks you through the process of building machine learning models with TensorFlow. You'll learn about data exploration, feature engineering, model creation, training, evaluation, deployment, and more.

Presentations

Going native: C++ as a first-class citizen of the Jupyter ecosystem Session

Sylvain Corlay, Johan Mabille, Wolf Vollprecht, and Martin Renou share the latest features of the C++ Jupyter kernel, including live help, auto-completion, rich MIME type rendering, and interactive widgets. Join in to explore one of the most feature-full implementations of the Jupyter kernel protocol that also brings Jupyter closer to the metal.

Luciano Resende is a data science platform architect at IBM CODAIT (formerly the Spark Technology Center). A member of the ASF, Luciano has been contributing to open source at the ASF for over 10 years and is currently contributing to various big data-related Apache projects around the Apache Spark ecosystem as well as building a scalable, secure, and flexible enterprise data science platform within the Jupyter ecosystem.

Presentations

Jupyter in the enterprise Keynote

IBM has leveraged the Jupyter stack in many of its products to offer industry-leading and business-critical services to its clients. Luciano Resende explores some of the open source initiatives that IBM is leading in the Jupyter ecosystem to address enterprise requirements in the community.

Scaling notebooks for deep learning workloads (sponsored by IBM Watson) Session

Luciano Resende outlines a pattern for building deep learning models using the Jupyter Notebook's interactive development on commodity hardware and leveraging platforms and services such as Fabric for Deep Learning (FfDL) for cost-effective full dataset training of deep learning models.

Lindsay Richman is a digital operations specialist at McKinsey & Company, where she programs in Python and JavaScript, primarily working in the areas of data visualization, frontend web development, and robotics. Lindsay uses machine learning and AI to help streamline operations, improve product quality, and drive informed decision making.

Presentations

JupyterLab and Plotly: A data visualization power couple Session

JupyterLab and Plotly both provide a rich set of tools for working with data. When combined, they create a powerful computational environment that enables users to produce versatile, robust visualizations in a fast-paced setting. Lindsay Richman demonstrates how to use JupyterLab, Plotly, and Plotly's Python-based Dash framework to create dynamic charts and interactive reports.

Mariah Rogers is program coordinator for the Division of Data Sciences at UC Berkeley, where she led the effort to build up the Data Scholars program that provides specialized academic support for students from underrepresented and nontraditional backgrounds. Mariah has been working with faculty on campus to build up the academic advising program for the new data science major (announced late Spring 2018) and has also been comanaging the Data Science Modules program to facilitate the introduction of data science concepts in existing courses across the UC Berkeley campus. Mariah holds a degree in computer science from UC Berkeley.

Presentations

JupyterHub for domain-focused integrated learning modules Session

The Data Science Modules program at UC Berkeley creates short explorations into data science using notebooks to allow students to work hands-on with a dataset relevant to their course. Mariah Rogers, Ronald Walker, and Julian Kudszus explain the logistics behind such a program and the indispensable features of JupyterHub that enable such a unique learning experience.

Ian Rose is a postdoctoral fellow at the Berkeley Institute for Data Science, where he works on the Jupyter Project. He holds a PhD in geology from UC Berkeley, where his research focused on the physics of the deep Earth.

Presentations

JupyterLab Session

Ian Rose and Chris Colbert walk you through the JupyterLab interface and codebase and explain how it fits within the overall roadmap of Project Jupyter.

JupyterLab training 1-Day Training

Chris Colbert, Ian Rose, and Saul Shanabrook walk you through using, extending, and developing custom components for JupyterLab using PhosphorJS, React, JavaScript, TypeScript, and CSS. You'll learn how to make full use of the power features of JupyterLab, customize it to your needs, and develop custom extensions, making complete use of JupyterLab's current capabilities.

Gerald Rousselle is director of product management at Teradata.

Presentations

Jupyter in the modern enterprise data and analytics ecosystem: Trends, experiments, and opportunities Session

Gerald Rousselle reviews some of the trends in modern data and analytics ecosystems for large enterprises and shares some of the key challenges and opportunities for Jupyter adoption. He also details some recent examples and experiments in incorporating Jupyter in commercial products and platforms.

Scott Sanderson is a senior software engineer at Quantopian, where he is responsible for the design and implementation of Quantopian’s backtesting and research APIs. Within the Jupyter ecosystem, most of Scott’s work focuses on enhancing the extensibility of the Jupyter Notebook for use in large deployments.

Presentations

Designing for interaction Session

Scott Sanderson explores how interactivity can and should influence the design of software libraries, details how the needs of interactive users differ from the needs of application developers, and shares techniques for improving the usability of libraries in interactive environments without sacrificing robustness in noninteractive environments.

Sandra Savchenko-de Jong is a Lausanne-based software engineer and data scientist at the Swiss Data Science Center, where she works on the development of the Renku platform. Previously, she was a software engineer at a large bank in the Netherlands. An astrophysicist by education, Sandra studied at the Rijksuniversiteit Groningen in the Netherlands and holds a PhD from the Observatoire de Paris in France.

Presentations

Reproducible science with the Renku platform Session

Sandra Savchenko-de Jong offers an overview of Renku, a highly scalable and secure open software platform designed to make (data) science reproducible, foster collaboration between scientists, and share resources in a federated environment.

David Schaaf is a director of data engineering at Capital One, where he leads data product development within the Financial Services Division. As part of his role, he guides agile teams to build data products for analyst and data communities with a primary focus on enabling self-service analytics, exploration, and insight discovery. David’s teams typically design data products using microservices, Angular, and Python and leverage core CI/CD practices for continuous delivery. David has more than 15 years of experience in software engineering and data analytics. He also has a wide breadth of knowledge across the financial services domain and in the retail industry. As a developer and analyst, David’s greatest interest is solving unique, complex problems and developing others as software and data engineers.

Presentations

Business Summit roundtable: The current environment—Compliance, ethics, ML model interpretation, GDPR, and more Session

Join in for the Business Summit's roundtable discussion with participation from IBM, Capital One, the DoD, AWS, Oracle, and others. Speakers will discuss important issues in our current environment—everything from compliance and GDPR to ML models.

Jupyter notebooks and the intersection of data science and data engineering Keynote

David Schaaf explains how data science and data engineering can work together in cross-functional teams—with Jupyter notebooks at the center of collaboration and the analytic workflow—to more effectively and more quickly deliver results to decision makers.

Using Jupyter notebooks in highly regulated environments Session

In Capital One's recent exploration of "notebook" offerings, JupyterHub emerged as a top contender that could serve as a potential platform for analytics even in highly regulated industries like financial services. David Schaaf and Shivraj Ramanan discuss Capital One's journey and explain how Jupyter has become a part of the company's ever-growing analytics toolkit.

Matthew Seal is a senior software engineer at Netflix, where he works on scaling data platform solutions. Based in the Bay Area of California, Matthew attended Stanford University for undergraduate and graduate school. He stayed in the area, working at startups and spending a long stretch of time working at OpenGov.

Presentations

Scheduled notebooks: A means for manageable and traceable code execution Session

Using an nteract project, papermill, Matthew Seal walks you through how Netflix uses notebooks to track user jobs and make a simple interface for work submission. You’ll get an inside peek at how Netflix is tackling the scheduling problem for a range of users who want easily managed workflows.

Viral B. Shah is a cofounder and CEO of Julia Computing and a cocreator of the Julia language. He spends all his time working toward making Julia the default language for all forms of data science and numerical computing. Previously, he architected the payment platforms for the National ID (Aadhaar) project of the Government of India and authored Rebooting India, a book on his experiences implementing a complex technology project in governance. Viral holds a PhD in computational sciences from UC Santa Barbara, where his thesis was on interactive supercomputing. The technology developed in his thesis was licensed commercially by Microsoft.

Presentations

The journey to Julia 1.0: The "Ju" in Jupyter Session

Julia and Jupyter share a common evolution path: Julia is the language for modern technical computing, while Jupyter is the development and presentation environment of choice for modern technical computing. Viral Shah and Jane Herriman discuss Julia's journey and the impact of Jupyter on Julia's growth.

Saul Shanabrook is a software developer at Quansight.

Presentations

JupyterLab training 1-Day Training

Chris Colbert, Ian Rose, and Saul Shanabrook walk you through using, extending, and developing custom components for JupyterLab using PhosphorJS, React, JavaScript, TypeScript, and CSS. You'll learn how to make full use of the power features of JupyterLab, customize it to your needs, and develop custom extensions, making complete use of JupyterLab's current capabilities.

Caleb Siu is studying computer science and economics at UC Berkeley. Caleb is interested in applying data science in the context of education and social good. He’s currently working on nbinteract, a project that allows users to easily create interactive visualizations with just a few lines of Python.

Presentations

nbinteract: Shareable interactive web pages from notebooks Session

The nbinteract package converts Jupyter notebooks with widgets into interactive, standalone HTML pages. Its built-in support for function-driven plotting makes authoring interactive pages simpler by allowing users to focus on data, not callbacks. Sam Lau and Caleb Siu offer an overview of nbinteract and walk you through the steps to publish an interactive web page from a Jupyter notebook.

Stephanie Stattel is a senior software developer at Bloomberg LP, where she is developing applications to improve financial professionals’ research and investment workflows. She is a San Francisco lead of the company’s global Bloomberg Women in Tech (BWIT) community.

Presentations

Terraforming Jupyter: Changing JupyterLab to suit your needs Session

Stephanie Stattel and Paul Ivanov walk you through a series of extensions that demonstrate the power and flexibility of JupyterLab’s architecture, from targeted functionality modifications to more extreme atmospheric changes that require extensive decoupling and flexibility within JupyterLab.

William Stein is a full professor of mathematics at the University of Washington (where he is currently on leave) and the CEO of SageMath, Inc., whose main product is CoCalc. William is the founder of the SageMath open source math software project. He also came up with the name Cython and launched that project. He has published three books and a few dozen papers on number theory.

Presentations

Real-time collaboration with Jupyter notebooks using CoCalc Session

William Stein explains how CoCalc relates to Project Jupyter and shares how he implemented real-time collaborative editing of Jupyter notebooks in CoCalc.

Dave Stuart is a senior data scientist within the US Department of Defense. Dave currently leads a large-scale effort to transform the workflows of thousands of enterprise business analysts through Jupyter and Python adoption, making tradecraft more efficient, sharable, and repeatable. Prior to this focus, Dave led multiple grass-roots technology adoption efforts, developing innovative training methods which tangibly increased the technical proficiency of a large, non-coding enterprise workforce.

Presentations

Business Summit roundtable: The current environment—Compliance, ethics, ML model interpretation, GDPR, and more Session

Join in for the Business Summit's roundtable discussion with participation from IBM, Capital One, the DoD, AWS, Oracle, and others. Speakers will discuss important issues in our current environment—everything from compliance and GDPR to ML models.

Citizen data science: An enterprise use case from inside the US intelligence community Session

Dave Stuart explains how Jupyter was used inside the US Department of Defense and the greater intelligence community to empower thousands of "citizen data scientists" to build and share analytics in order to meet the community’s dynamic challenges.

Erik Sundell is a math and physics teacher in Uppsala, Sweden. While working toward a machine learning degree online, he realized the potential of Jupyter for educators and established a JupyterHub deployment using the “Zero to JupyterHub on Kubernetes” guide for his students. Soon after, he began contributing to the open source project.

Presentations

Deploying a cloud-based JupyterHub for students and researchers Tutorial

Carol Willing, Min Ragan-Kelley, and Erik Sundell demonstrate how to provide easy access to Jupyter notebooks and JupyterLab without requiring users to install anything on their computers. You'll learn how to configure and deploy a cloud-based JupyterHub using Kubernetes and how to customize and extend it for your needs.

Learn by doing: Using data-driven stories and visualizations in the (high school and college) classroom Session

Students learn by doing. Carol Willing, Jessica Forde, and Erik Sundell demonstrate the value of interactive content, using Jupyter notebooks, widgets, and visualization libraries, share notable examples of projects within the Jupyter community, and outline ways educators can help students develop data science literacy and use computational skills to build upon their interests.

Thorin Tabor is a software engineer at UCSD and a contributing scientist at the Broad Institute. He is the lead developer of the GenePattern Notebook and an open source developer on the integration of bioinformatic tools with Jupyter.

Presentations

GenePattern Notebook: Jupyter beyond the programmer Session

Making Jupyter accessible to all members of a research organization, regardless of their programming ability, empowers it to best utilize the latest analysis methods while avoiding bottlenecks. Thorin Tabor offers an overview of the GenePattern Notebook, which provides a wide suite of enhancements to the Jupyter environment to help bridge the gap between programmers and nonprogrammers.

Robert Talbert is a professor of mathematics at Grand Valley State University. Robert is an early adopter, proponent, and thought leader on flipped learning in higher education, and his flipped learning implementations include 10 different university mathematics and computer science courses. He is the author of Flipped Learning: A Guide for Higher Education Faculty; he has also written articles, book chapters, and blog posts and given workshops and presentations on flipped learning to audiences in colleges across the US and abroad.

Presentations

Flipped learning with Jupyter: Experiences, best practices, and supporting research Session

In flipped learning, students encounter new material before class meetings, which helps them learn how to learn and frees up class time to focus on creative applications of the basic material. Lorena Barba and Robert Talbert discuss the use of Jupyter notebooks as a “tangible interface” for new material in a flipped course and share case studies from their own courses.

Robert Talbert is Professor and Assistant Chair in the Department of Mathematics at Grand Valley State University. He holds M.S. and Ph.D. degrees in Mathematics from Vanderbilt University. He is the author of the book Flipped Learning: A Guide for Higher Education Faculty (Stylus, 2017) and is a frequent workshop facilitator and keynote speaker on teaching and learning in the US and abroad. During the 2017-2018 academic year, Robert spent a sabbatical with Steelcase Education, where he worked with the Workspace Futures group to conduct research on active learning and active learning spaces. He writes about math, technology, education, and academic productivity at his website, rtalbert.org.

Presentations

Jupyter in education discussion group

The Jupyter in education track concludes with breakout sessions that allow presenters and attendees alike to work together on specific topics, potentially leading to new projects and collaborations.

Rachael Tatman is a data scientist at Kaggle. She holds a PhD in linguistics from the University of Washington, with a focus in computational sociolinguistics. Her interests include data science education and fairness in machine learning.

Presentations

I Do, We Do, You Do: Supporting active learning with notebooks Tutorial

Rachael Tatman offers a practical introduction to incorporating Jupyter notebooks into the classroom using active learning techniques.

Reproducible research best practices (highlighting Kaggle Kernels) 1-Day Training

Rachael Tatman shows you how to take an existing research project and make it fully reproducible using Kaggle Kernels. You'll learn best practices for and get hands-on experience with each of the three components necessary for completely reproducible research.

Tracy Teal is a cofounder of Data Carpentry and the executive director of The Carpentries. Previously, Tracy was an NSF postdoctoral researcher in biological informatics and an assistant professor in microbiology at Michigan State University. After seeing researchers’ need for data skills to conduct research effectively and reproducibly, she cofounded Data Carpentry to scale data training along with data production. Tracy is involved in the open source software and reproducible research communities, including as an editor at the Journal of Open Source Software and the Journal of Open Source Education. She holds a PhD in computation and neural systems from California Institute of Technology.

Presentations

Democratizing data Keynote

We are generating vast amounts of data, but it's not the data itself that is valuable—it's the information and knowledge that can come from this data. Tracy Teal explains how to bring people to data and empower them to address their questions, reach their potential, and solve issues that are important in science, scholarship, and society.

Adam Thornton is a software developer for the Large Synoptic Survey Telescope, where he focuses on data management, science quality, and reliability engineering and is working on the JupyterLab-based interactive component of the LSST science platform. He has nearly 30 years of development, IT consulting, and system administration experience in a wide variety of settings, from academic computing to Fortune 20 enterprises.

Presentations

"If the data will not come to the astronomer. . .": JupyterLab and a sea change in astronomical analysis Session

LSST is an ambitious project to map the sky in the fastest, widest, and deepest survey ever made. The project's database disrupts traditional astronomical workflows, and its science platform requires a paradigm shift in how astronomy is done. Adam Thornton discusses the challenges of providing production services on a notebook-based architecture and the compelling advantages of JupyterLab.

Michelle Ufford leads the big data tools team at Netflix. She specializes in analytics infrastructure and has spent the last decade leading high-impact projects in web-scale environments. Her team is responsible for innovations to improve the usability of Netflix’s 100 PB data warehouse and industry-leading data platform. Previously, she led data engineering, data management, and platform architecture for GoDaddy, where she set a TPS record for SQL Server and helped pioneer Hadoop data warehousing techniques. Michelle is also a published author, patented developer, and award-winning open source contributor. You can find her on Twitter at @MichelleUfford.

Presentations

Beyond interactive: Scaling impact with notebooks at Netflix Keynote

Netflix is reimagining what a Jupyter notebook is, who works with it, and what you can do with it. Michelle Ufford shares how Netflix leverages notebooks today and briefly describes a vision for the future.

Notebooks at Netflix: From analytics to engineering (sponsored by Netflix) Session

Netflix relies on notebooks to inform decisions and fuel experiments across the company. Now Netflix wants to go even further to deliver a compelling notebook experience for end-to-end workflows. Michelle Ufford shares some of the big bets Netflix is making on notebook infrastructure, covering data use at Netflix, architecture, kernels, UIs, and open source projects, such as nteract.

Wolf Vollprecht is a scientific software developer at QuantStack. Previously, Wolf was a freelance web designer and developer, building software for the BeachBot with Disney Research and making drones find their way at Rapyuta Robotics. He holds a master’s degree in robotics from ETH Zurich and Stanford, focusing on artificial intelligence. In his free time, he’s a passionate cyclist who enjoys spending time outside the city.

Presentations

Going native: C++ as a first-class citizen of the Jupyter ecosystem Session

Sylvain Corlay, Johan Mabille, Wolf Vollprecht, and Martin Renou share the latest features of the C++ Jupyter kernel, including live help, auto-completion, rich MIME type rendering, and interactive widgets. Join in to explore one of the most feature-full implementations of the Jupyter kernel protocol that also brings Jupyter closer to the metal.

Ronald (Ronnie) Walker is a senior at UC Berkeley, where he is studying economics. Ronnie has served as an undergraduate student instructor, connector course teaching assistant, and modules team lead within the university’s Data Science Education Program. As team lead, he worked with faculty in Linguistics, Information Science, Education, Cognitive Science, Legal Studies, Near Eastern Studies, and Economics to build short modules for their courses. Most recently, he has been busy helping departments integrate existing full courses with data science approaches.

Presentations

JupyterHub for domain-focused integrated learning modules Session

The Data Science Modules program at UC Berkeley creates short explorations into data science using notebooks to allow students to work hands-on with a dataset relevant to their course. Mariah Rogers, Ronald Walker, and Julian Kudszus explain the logistics behind such a program and the indispensable features of JupyterHub that enable such a unique learning experience.

Elizabeth Wickes is a lecturer at the School of Information Sciences at the University of Illinois, where she teaches foundational programming and information technology courses. Previously, Elizabeth was a data curation specialist for the Research Data Service at the University Library of the University of Illinois and the curation manager for Wolfram|Alpha. She currently co-organizes the Champaign-Urbana Python user group, has been a Carpentries instructor since 2015 and a trainer since 2017, and is an elected member of the Carpentries executive council for 2018.

Presentations

Reproducible education: What teaching can learn from open science practices Session

As practitioners of open science begin to migrate their educational material into public repositories, many of their common practices and platforms can be used to streamline the instruction material development process. Elizabeth Wickes explains how open science practices can be used in an educational context and why they are best facilitated by tools like the Jupyter Notebook.

George Williams is the Director of Data Science at an AI chip and embedded algorithms company. George has worked at the intersection of research and industry for two decades. He has published papers at major mathematics and AI conferences and holds several patents in computer vision and security.

Presentations

Rapid data science exploration for cybersecurity Session

The key to successful threat detection in cybersecurity is fast response. George Williams, Harini Kannan, and Alex Comerford offer an overview of specialized extensions they have built for data scientists working in cybersecurity that can be used and deployed via JupyterHub.

Carol Willing is a research software engineer at Cal Poly San Luis Obispo working full-time on Project Jupyter, a Python Software Foundation fellow and former director, a Jupyter Steering Council member, a geek in residence at FabLab San Diego, where she teaches wearable electronics and software development, and an independent developer of open hardware and software. She co-organizes PyLadies San Diego and San Diego Python, contributes to open source community projects, including OpenHatch, CPython, Jupyter, and AnitaB.org’s open source projects, and is an active member of the MIT Enterprise Forum in San Diego. She enjoys sharing her passion for electronics, software, problem solving, and the arts. Previously, Carol worked in software engineering management, product and project management, sales, and the nonprofit sector. She holds an MS in management with an emphasis on applied economics and high-tech marketing from MIT and a BSE in electrical engineering from Duke University.

Presentations

Deploying a cloud-based JupyterHub for students and researchers Tutorial

Carol Willing, Min Ragan-Kelley, and Erik Sundell demonstrate how to provide easy access to Jupyter notebooks and JupyterLab without requiring users to install anything on their computers. You'll learn how to configure and deploy a cloud-based JupyterHub using Kubernetes and how to customize and extend it for your needs.

Learn by doing: Using data-driven stories and visualizations in the (high school and college) classroom Session

Students learn by doing. Carol Willing, Jessica Forde, and Erik Sundell demonstrate the value of interactive content, using Jupyter notebooks, widgets, and visualization libraries, share notable examples of projects within the Jupyter community, and outline ways educators can help students develop data science literacy and use computational skills to build upon their interests.

Sustaining wonder: Jupyter and the knowledge commons Keynote

New challenges are emerging for Jupyter, open information, and investing in the future. You, the innovators of this growing knowledge commons, will determine how we meet these challenges and sustain the ecosystem. Carol Willing shows how you can start.

The current state of JupyterHub and what's in store for the future Session

JupyterHub is a multiuser server for Jupyter notebooks, focused on supporting deployments in research and education. Min Ragan-Kelley, Carol Willing, and Yuvi Panda discuss recent additions and future plans for the project.

The future of Jupyter in education Session

Join this panel of seasoned educators and the cochairs of the education track at JupyterCon to look to the future of Jupyter in teaching and learning.

Wenming Ye is a senior solution architect at Amazon Web Services.

Presentations

Explore the AWS machine learning platform using Amazon SageMaker 2-Day Training

Wenming Ye and Miro Enev offer an overview of deep learning along with hands-on Jupyter labs, demos, and instruction. You'll learn how DL is applied in modern business practice and how to leverage building blocks from the Amazon ML family of AI services.

Explore the AWS machine learning platform using Amazon SageMaker (Day 2) Training Day 2

Machine learning and IoT projects are increasingly common at enterprises and startups alike and have been the key innovation engine for Amazon businesses such as Go, Alexa, and Robotics. Wenming Ye and Miro Enev lead a hands-on deep dive into the AWS machine learning platform, using Project Jupyter-based Amazon SageMaker to build, train, and deploy ML/DL models to the cloud and AWS DeepLens.

Presentations

Containerizing notebooks for serverless execution (sponsored by AWS) Session

Kevin McCormick tells the story of two approaches used internally at AWS to accelerate new ML algorithm development and to easily package Jupyter notebooks for scheduled execution: custom Jupyter kernels that automatically create Docker containers and dispatch them to either a distributed training service or a job execution environment.

Kevin Zielnicki is a data scientist on the styling algorithms team at Stitch Fix. Kevin holds a PhD in physics in the field of quantum information processing, but he now enjoys working with data that can be observed without changing its value.

Presentations

Explorations in reproducible analysis with Nodebook Session

Even with good intentions, analysis notebooks can quickly accumulate a mess of false starts and out-of-order statements. Best practices encourage cleaning up a notebook to ensure reproducibility, but many analyses will never reach this cleaned-up state. Kevin Zielnicki offers an overview of Nodebook, a Jupyter plugin that encourages reproducibility by preventing inconsistency.

Randy Zwitch is a Senior Developer Advocate at MapD, enabling customers and community users alike to utilize MapD to its fullest potential. With broad industry experience in Energy, Digital Analytics, Banking, Telecommunications and Media, Randy brings a wealth of knowledge across verticals as well as an in-depth knowledge of open-source tools for analytics.

Presentations

Using the MapD kernel for the Jupyter Notebook Session

MapD Core is an open source analytical SQL engine that has been designed from the ground up to harness the parallelism inherent in GPUs. This enables queries on billions of rows of data in milliseconds. Randy Zwitch offers an overview of the MapD kernel extension for the Jupyter Notebook and explains how to use it in a typical machine learning workflow.