Brought to you by NumFOCUS Foundation and O’Reilly Media
The official Jupyter Conference
Aug 21-22, 2018: Training
Aug 22-24, 2018: Tutorials & Conference
New York, NY

Speakers

Hear from innovative practitioners, talented managers, and senior developers who are doing amazing things in the Jupyter ecosystem. More speakers will be announced; please check back for updates.

Ryan P. Abernathey is an Assistant Professor of Earth and Environmental Science at Columbia University and Lamont-Doherty Earth Observatory. He received his Ph.D. from MIT in 2012 and a B.A. from Middlebury College. He joined Columbia in 2013 after a postdoc at Scripps Institution of Oceanography. Ryan is a physical oceanographer who studies the large-scale ocean circulation and its relationship with Earth’s climate. High-resolution numerical modeling and satellite remote sensing are key tools in this research, which has led to an interest in high-performance computing and big data.

In Feb. 2016, Prof. Abernathey was awarded an Alfred P. Sloan Research Fellowship in Ocean Sciences and an NSF CAREER award for a project entitled “Evolution of Mesoscale Turbulence in a Changing Climate.” He received a NASA New Investigator Award in 2013. He is an active participant in and advocate for open source software, open data, and reproducible science.

Presentations

Pangeo: Big Data Climate Science in the Cloud 40-minute session

Climate science is being flooded with petabytes of data, overwhelming traditional modes of data analysis. The Pangeo project is building a platform to take Big Data Climate Science into the cloud using scientific Python and large-scale interactive computing tools. Come find out what we are building, why we are building it, and how you can use it!

I am an IT manager for the Pacific Institute for the Mathematical Sciences. I’m also a longtime user of IPython and Jupyter with a background in computational physics.

I helped create and deploy a system of JupyterHubs under the name syzygy.ca, allowing more than 8,000 staff, students, and faculty to include Jupyter in their work. I am also involved in a program to leverage Jupyter in K-12 classrooms via the Canadian Government’s CanCode initiative.

Presentations

Canadians Land on Jupyter 40-minute session

Over the past 18 months, we have deployed Jupyter to more than 8,000 users at universities across Canada. In this talk, we'll discuss how we did it, how we plan to scale and deliver the service nationally, how people are using the platform, and how we intend to make Jupyter integral to the working experience of students, researchers, and faculty members.

Damián Avila is a software developer, data scientist, and quantitative analyst from Córdoba, Argentina.
His main interests are data science, finance, data visualization, and the Jupyter/IPython ecosystem.
He has made meaningful contributions to several open source projects (as a core developer of popular projects such as Jupyter/IPython, Nikola, and Bokeh) and has also started his own projects, the most popular being RISE (a “live” slideshow for the Jupyter notebook).
He has presented talks, tutorials, and posters at several national and international conferences.
Currently, he works on and leads projects as a software developer at Anaconda, Inc.

Presentations

Current RISE candies and its evolution into the future. 40-minute session

RISE has evolved into the main slideshow machinery for live presentations within the Jupyter notebook. In this talk, we'll explain how to install, use, and customize RISE. Additionally, we will show some new capabilities. Finally, we'll show the beginning of the migration from RISE into a new jupyterlab-rise extension providing RISE-based capabilities in the new JupyterLab interface.

Doug is an associate professor of computer science at Bryn Mawr College, an all-women’s college outside of Philadelphia, PA. He has been using Python in education for 20 years, and Jupyter since its creation. He has developed many languages and tools for Jupyter specifically for pedagogy. His research area is in combining artificial neural networks and robotics in order to give robots self-motivation.

Presentations

Jupyter Graduates! 40-minute session

For the last four years, I have used nothing but Jupyter in the classroom. From a first-year writing course to a course on assembly language, from Biology to Computer Science, from lectures to homework: everything has been in Jupyter. In this talk, I explore the ways I have leveraged Jupyter, and detail the successes and failures experienced along the way.

Matt currently leads instruction for GA’s Data Science Immersive in Washington, D.C. and most enjoys bridging the gap between theoretical statistics and real-world insights. Matt is a recovering politico, having worked as a data scientist for a political consulting firm through the 2016 election. Prior to his work in politics, he earned his Master’s degree in statistics from The Ohio State University. Matt is passionate about making data science more accessible and putting the revolutionary power of machine learning into the hands of as many people as possible. When he isn’t teaching, he’s thinking about how to be a better teacher, falling asleep to Netflix, and/or cuddling with his pug.

Presentations

Advanced Data Science, Part 2: "I have missing data... how do I handle it (the right way)?" Handle missing data in 5 ways in Jupyter notebooks. Tutorial

Missing data plagues nearly every data science problem. Oftentimes, people just drop or ignore missing data. However, this usually leads to bad results. We'll show how bad dropping or ignoring missing data can be, then we'll learn how to fix this the right way! Leverage Jupyter notebooks to properly reweight or impute your data.

Diogo Castro is a member of the Software Development for Experiments group at CERN, where he works in the SWAN team as a full-stack developer.

Presentations

SWAN: CERN's Jupyter-based Interactive Data Analysis Service 40-minute session

SWAN, CERN’s Service for Web-based ANalysis, is leveraging the power of Jupyter to provide the High Energy Physics community with access to state-of-the-art infrastructure and services through a web service. This presentation details how this was possible and how it is being used by researchers and students.

Christopher Cho is a product manager and cloud program manager at Google, helping customers solve machine learning and infrastructure problems. Chris was a research program manager at DeepMind, supporting cutting-edge ML research, and decided to pivot in order to help customers solve real business problems. Chris comes from an enterprise business consulting background and is one of the product managers on the Kubeflow team. Chris is currently pursuing an MSCS at Georgia Tech and holds a BS in mechanical engineering from the University of Illinois Urbana-Champaign.

Presentations

Machine Learning at Scale with Kubernetes 1-Day Training

This talk will explore how Kubernetes can be easily leveraged to build complete deep learning pipelines, from data ingestion/aggregation and preprocessing through ML training and serving, with the mighty Kubernetes APIs.

Pramit Choudhary is a lead data scientist at DataScience.com, where he focuses on optimizing and applying classical machine learning and Bayesian design strategy to solve real-world problems. Currently, he is leading initiatives on figuring out better ways to explain a model’s learned decision policies to reduce the chaos in building effective models and close the gap between a prototype and an operationalized model.

Presentations

Human in the Loop: Understanding model interpretation with Jupyter and Skater Tutorial

Just predicting the target labels for a data science use case is not enough. It is important to understand the “why”, “what” & “how” of the model’s behavior. In the tutorial, we will explore algorithms (post hoc and rule extraction) to faithfully interpret ML models globally and locally with Jupyter's interactivity and “Skater”, an open source library to demystify the inner workings of ML models.

April Clyburne-Sherin is an epidemiologist, methodologist, and expert in open science tools, methods, training, and community stewardship. She holds an MS in Population Medicine (Epidemiology). She is co-founder of OOO Canada, a network to promote leadership in open access, open education, and open data, and producer of The Method, an open source podcast. Since 2014, she has focussed on training scientists in open and reproducible research methods (Center for Open Science, Sense About Science, SPARC) and is co-author of FOSTER’s Open Science Training Handbook. In her current role of Outreach Scientist, she trains scientists in computational reproducibility best practices using Code Ocean.

Presentations

Preparing your Jupyter notebook for computationally reproducible publication: A hands-on, BYONotebook tutorial for researchers Tutorial

This is a practical tutorial to prepare Jupyter notebooks for computationally reproducible publication. We start with introductory information about computational reproducibility but the bulk of the tutorial is guided work. Best practices for publishing notebooks are covered, with participants preparing their research for reuse, creating documentation, and submitting their notebook to share.

James Colliander is Professor of Mathematics at UBC and serves as Director of the Pacific Institute for the Mathematical Sciences. He is also the Founder/CEO of Crowdmark, an education technology company based in Toronto. Colliander’s research intertwines partial differential equations, harmonic analysis, and dynamical systems to address problems arising from mathematical physics and other sources. He received his PhD in 1997 from the University of Illinois. After an NSF Postdoc at the University of California Berkeley, Colliander joined the University of Toronto and became Professor in 2007. He moved to UBC in 2015. Colliander was Professeur Invité at the Université de Paris-Nord, Université de Paris-Sud, and at the Institut Henri Poincaré. He has been a member of the Institute for Advanced Study. Colliander has received a Sloan Fellowship and the McLean Award, and is an award-winning teacher.

Presentations

Canadians Land on Jupyter 40-minute session

Over the past 18 months, we have deployed Jupyter to more than 8,000 users at universities across Canada. In this talk, we'll discuss how we did it, how we plan to scale and deliver the service nationally, how people are using the platform, and how we intend to make Jupyter integral to the working experience of students, researchers, and faculty members.

Sylvain Corlay is a quant researcher specializing in stochastic analysis and optimal control and the founder of QuantStack. Previously, Sylvain was a quant researcher at Bloomberg LP and an adjunct faculty member at Columbia University and NYU. As an open source developer, Sylvain mostly contributes to Project Jupyter in the area of interactive widgets and lower-level components such as traitlets. He is also a member of the steering committee of the project. Sylvain is also a contributor to a number of other open source projects for scientific computing and data visualization, such as bqplot, pythreejs, and ipyleaflet, and coauthored the xtensor C++ tensor algebra library. He holds a PhD in applied mathematics from University Paris VI.

Presentations

Going Native: C++ as a First-Class Citizen of the Jupyter Ecosystem 40-minute session

In this talk, we present the latest features of the C++ Jupyter kernel, including live help, auto-completion, rich MIME type rendering, and interactive widgets, making it one of the most featureful implementations of the Jupyter kernel protocol and bringing Jupyter closer to the metal.

Miro Enev’s interests are in advancing data science and machine intelligence while respecting human values in future technology ecosystems. He is currently a Solutions Architect at NVIDIA. He helps train and guide pilot deep learning projects at Amazon.

Presentations

Explore AWS Machine Learning Platform using Amazon SageMaker 2-Day Training

Machine Learning and IoT projects are now common for enterprises and startups alike. These advanced technologies have been the key innovation engine for businesses such as Amazon Go, Alexa, and Robotics. In this hands-on workshop, we will explore the AWS Machine Learning Platform using the Project Jupyter-based Amazon SageMaker to build, train, and deploy ML/DL models to the cloud and AWS DeepLens.

Explore AWS Machine Learning Platform using Amazon SageMaker (Day 2) Training Day 2

Machine Learning and IoT projects are now common for enterprises and startups alike. These advanced technologies have been the key innovation engine for businesses such as Amazon Go, Alexa, and Robotics. In this hands-on workshop, we will explore the AWS Machine Learning Platform using the Project Jupyter-based Amazon SageMaker to build, train, and deploy ML/DL models to the cloud and AWS DeepLens.

Dr. Tyler A. Erickson is a Senior Developer Advocate at Google. In this role, he fosters collaborations with researchers from academia, NGOs, and governmental organizations seeking to capitalize on Earth Engine’s capabilities for geospatial analyses that involve immense satellite and model-based datasets. Dr. Erickson leads the development of Earth Engine’s core efforts in water and climate, guides the evolution of Earth Engine to support these scientific domains, and leads support efforts for the Earth Engine Python API. A snow hydrologist by training, he has degrees in civil and environmental engineering and geography from Colorado State University, Caltech, Stanford, and the University of Colorado at Boulder. Tyler is a longtime Python programmer, with contributions to the Open Source Geospatial (OSGeo) Foundation and the Free and Open Source Software for Geospatial (FOSS4G) conferences.

Presentations

How JupyterLab and Widgets Enable Interactive Analysis of the Earth's Past, Present, and Future 40-minute session

Massive collections of data on the Earth's changing environment, collected by satellite sensors and generated by Earth system models, are being exposed via web APIs by multiple providers. This presentation will highlight the use of JupyterLab and Jupyter Widgets in analyzing complex high-dimensional datasets, providing insights into how our Earth is changing and what the future might look like.

Nicolas Fernandez is a Computational Scientist at the Human Immune Monitoring Center at the Icahn School of Medicine at Mount Sinai. Nicolas is a computational biologist with interests in analysis and visualization of high-throughput biological data as a means to understanding biological regulatory networks.

Presentations

Visualizing High-Dimensional Biological Data with Clustergrammer-Widget in Jupyter Notebooks 40-minute session

Exploring high-dimensional data requires the development of sophisticated interactive visualizations to enable users to easily discover complex patterns within their data. We developed Clustergrammer-widget, an interactive heatmap Jupyter widget that enables users to easily explore high-dimensional data within a Jupyter notebook and share their interactive visualizations using nbviewer.

Jessica Forde is a technical writer for Project Jupyter. Her previous open source projects include datamicroscopes, a Bayesian nonparametrics library in Python, and density, a tool for Columbia University study spaces based on wireless device data.

Presentations

Learn by Doing: Using Data-driven Stories and Visualization in the Classroom (High School and College) 40-minute session

Students can learn by doing. In this talk, we will show how interactive content using Jupyter Notebooks, widgets, and visualization libraries puts the student in charge. We will share notable examples of projects within the Jupyter community and offer ways in which educators can help students to develop data science literacy and use computational skills to build upon their interests.

Zachary Glassman is a data scientist in residence at the Data Incubator. Zachary has a passion for building data tools and teaching others to use Python. He studied physics and mathematics as an undergraduate at Pomona College and holds a master’s degree in atomic physics from the University of Maryland.

Presentations

Hands-On Data Science with Python 2-Day Training

This course offers a foundation in building intelligent business applications using machine learning. We will walk through all the steps of developing a machine learning pipeline. We’ll look at data cleaning, feature engineering, model building/evaluation, and deployment. Students will extend these models into two applications from real-world datasets.

Hands-On Data Science with Python (Day 2) Training Day 2

This course offers a foundation in building intelligent business applications using machine learning. We will walk through all the steps of developing a machine learning pipeline. We’ll look at data cleaning, feature engineering, model building/evaluation, and deployment. Students will extend these models into two applications from real-world datasets.

Bruno Gonçalves is a Moore-Sloan fellow at NYU’s Center for Data Science. With a background in physics and computer science, Bruno has spent his career exploring the use of datasets from sources as diverse as Apache web logs, Wikipedia edits, Twitter posts, epidemiological reports, and census data to analyze and model human behavior and mobility. More recently, he has been focusing on the application of machine learning and neural network techniques to analyze large geolocated datasets.

Presentations

Advanced Data Science, Part 1: Data Visualization in Jupyter using matplotlib and seaborn Tutorial

The fundamental concepts and ideas behind human visual perception and how it informs scientific data visualization are introduced in an intuitive and grounded manner. These concepts are illustrated through practical examples using matplotlib and seaborn, following a tutorial on these two libraries. Finally, the main ideas will be summarized in the form of rules of thumb for ease of reference.

Sean is the head of technical product management at DigitalGlobe. Previously, Sean was a co-founder of Timbr.io, a platform for enabling algorithmic orchestrations with sensor and social data, which was acquired by DigitalGlobe. Before starting Timbr.io, he was the founder and CEO of GeoIQ, a collaborative data and analytics company serving commercial and government customers. GeoIQ was subsequently acquired by ESRI, where Sean worked on integrating social data with ESRI’s mapping technologies. Sean has also worked in academia, serving as a research professor at George Mason University. His academic research focused on the intersection of complexity science, statistical mechanics, and spatial analysis. Sean received his PhD from George Mason University as the Provost’s High Potential Research Candidate, Fisher Prize winner, and an INFORMS Dissertation Prize recipient.

Presentations

Using Jupyter to Create a Community for Satellite Imagery Analysis and Sharing 40-minute session

Satellite imagery can be a critical resource during disasters and humanitarian crises. While the community has improved data sharing, we still struggle to create reusable data science to solve on-the-ground problems. GBDX Notebooks is a step towards creating an open data science community built around Jupyter to stream imagery and share analysis at scale.

Loïc Gouarin is a Research Engineer at CNRS. He develops scientific computing software for different fields, such as Lattice-Boltzmann methods and Stokes solvers for fluid-particle interaction. Loïc is an expert in code optimization and HPC. He is also the director of the “Groupe Calcul” of the CNRS, whose role is to foster the scientific and high-performance computing community in France. He contributes to several open source projects: xeus-cling, xplot, xthreejs, …

Presentations

Going Native: C++ as a First-Class Citizen of the Jupyter Ecosystem 40-minute session

In this talk, we present the latest features of the C++ Jupyter kernel, including live help, auto-completion, rich MIME type rendering, and interactive widgets, making it one of the most featureful implementations of the Jupyter kernel protocol and bringing Jupyter closer to the metal.

Brian Granger is an associate professor of physics and data science at Cal Poly State University in San Luis Obispo. Brian is a leader of the IPython project, cofounder of Project Jupyter, and an active contributor to a number of other open source projects focused on data science in Python. Recently, he cocreated the Altair package for statistical visualization in Python. He is an advisory board member of NumFOCUS and a faculty fellow of the Cal Poly Center for Innovation and Entrepreneurship.

Presentations

Friday Opening Remarks Keynote

Friday Opening Remarks

JupyterLab 1-Day Training

Brian Granger offers an in-depth view of JupyterLab, which enables users to work with the core building blocks of the classic Jupyter Notebook in a more flexible and integrated manner.

JupyterLab Tutorial

Brian Granger offers an in-depth view of JupyterLab, which enables users to work with the core building blocks of the classic Jupyter Notebook in a more flexible and integrated manner.

Thursday Opening Remarks Keynote

Thursday Opening Remarks

Joel Grus is a research engineer at the Allen Institute for Artificial Intelligence, the author of the beloved O’Reilly book “Data Science from Scratch”, and the author of the beloved blog post “Fizz Buzz in Tensorflow”. Previously he worked as a software engineer at Google and as a data scientist at a variety of startups. He lives in Seattle.

Presentations

I Don't Like Notebooks 40-minute session

I have been using and teaching Python for many years. I wrote a bestselling book about learning data science. And here's my confession: I don't like notebooks. [There are dozens of us!] In this talk I'll explain why I find notebooks difficult, show how they frustrate my preferred pedagogy, demonstrate how I prefer to work, and discuss what Jupyter could do to win me over.

Chris Harris is a staff research and development engineer at Kitware. Chris has a wide range of research interests, from high-performance computing right through to client-side visualization of scientific data sets. Prior to working at Kitware, Chris worked at IBM on high-performance messaging systems. He holds a master’s degree in computing and artificial intelligence from Imperial College London.

Presentations

Reproducible quantum chemistry in Jupyter 40-minute session

In-silico prediction of chemical properties has seen vast improvements in both veracity and volume of data, but is currently hamstrung by a lack of transparent, reproducible workflows coupled with environments for visualization and analysis. We have developed a platform that uses Jupyter notebooks to enable end-to-end workflow from simulation setup, right through to visualizing the results.

I have contributed to the development of the Jupyter project and other PyData projects for several years. I am a known good actor in the Python data ecosystem. I have extensive experience using and developing Python and C++ for data science applications.

As a PhD student and postdoc, I have given many talks at small and large international conferences to other physicists as well as undergraduate students. I co-organise the PyData meetup in Zurich and give talks at local meetups every few months as well as at open-source conferences like PyCon and EuroSciPy. I am one of the maintainers of scikit-optimize, a Python library for black-box optimisation, and have contributed to scikit-learn. I run a freelance consultancy specialised in building full-stack data science solutions and teaching artificial intelligence skills. Customers include a large international organisation based in Geneva, startups, NGOs, open-source projects, and research groups. I am a mentor for Mozilla’s Open Leadership programme.

My homepage: http://www.wildtreetech.com

Presentations

Binder - lowering the bar to sharing interactive software 40-minute session

The Binder project drastically lowers the bar to sharing and re-using software. For a user, trying out someone else’s work requires only clicking a single link. This talk will introduce the audience to the concepts and ideas behind the Binder project. We will showcase examples from the community to show off the power of Binder.

Jane Herriman is Director of Diversity and Outreach at Julia Computing and a PhD student at Caltech. She is a Julia, dance, and strength training enthusiast who uses Jupyter notebooks to teach Julia.

Presentations

An introduction to Julia in Jupyter Tutorial

This introductory workshop assumes no prior exposure to Julia. It should be accessible (and hopefully useful!) to scientists, engineers, and anyone else with technical computing needs. We will use Jupyter notebooks to show you why Julia is special, demonstrate how easy it is to learn Julia, and get you writing your first Julia programs.

The journey to Julia 1.0 - the "Ju" in Jupyter 40-minute session

Julia and Jupyter share a common evolution path. Julia is the language for modern technical computing, while Jupyter is the development and presentation environment for modern technical computing. This talk will explore the journey of Julia and the impact of Jupyter on Julia's growth.

Matthew Hunt started playing with computers when he was 8, sold his first program at 13, and retains an unhealthy degree of curiosity. He lives in New York, where he can be found tinkering with 3D printers, dabbling in the future of flight, playing with VR headsets, and even doing work sometimes. He still believes that where you find people having the most fun, there will you find the future being created. He runs the NYC Spark User’s group.

Presentations

What things are correlated with gender diversity: A dig through the ASF, Jupyter projects 40-minute session

Many of us believe that gender diversity in open source projects is important (and if you don’t, this isn’t going to convince you), but what things are correlated with improved gender diversity, and what can we learn from similar historic industries? We will explore the diversity of different projects and possible factors. We’ll examine historic EEOC complaints and parallels, plus historic solutions.

Paul Ivanov is a senior software engineer at Bloomberg LP working on IPython- and Jupyter-related open source projects. Previously, Paul worked on backend and data engineering at Disqus; was a code monkey at the Brain Imaging Center at UC Berkeley, where he worked on IPython and taught at UC Berkeley’s Python bootcamps; worked in Bruno Olshausen’s lab at the Redwood Center for Theoretical Neuroscience; and was a PhD candidate in the Vision Science program at UC Berkeley. He holds a degree in computer science from UC Davis.

Presentations

Terraforming Jupyter: Changing JupyterLab to Suit Your Needs 40-minute session

We will walk through a series of extensions that demonstrate the power and flexibility of JupyterLab’s architecture. From targeted functionality modifications to more extreme atmospheric changes that require extensive decoupling and flexibility within JupyterLab, we explore the complexity and stability of extensions and how they can combine to form custom, opinionated JupyterLab environments.

Kerim Kalafala is a member of the IBM Academy of Technology, a Senior Technical Staff Member in the IBM Systems Group, and an IBM Master Inventor. His current role is lead architect of static timing and noise analysis software tools used to design and verify the world’s fastest microprocessors. Kerim has received multiple prestigious Research Division awards for publications in computer science and mathematics, an ACM/IEEE Technical Impact Award in Electronic Design Automation, as well as a best-paper award at the Design Automation Conference, and was recognized for co-authoring a top-10 most cited paper in the 50-year history of DAC. Kerim has also received both the IBM Corporate and Outstanding Technical Achievement Awards for contributions to the field of statistical timing analysis. He is an inventor on 49 issued patents worldwide and approximately a dozen more pending. Kerim is a member of the executive board for the Rhinebeck Science Foundation, and volunteers extensively in his local community. Before joining IBM, Kerim received his undergraduate and graduate degrees in Computer and Systems Engineering from Rensselaer Polytechnic Institute, where he graduated summa cum laude.

Presentations

Design and Analysis of the World’s Most Advanced Microprocessors Using Jupyter Notebooks 40-minute session

We will present our experiences using Jupyter notebooks as a critical aid in the design of the next generation of IBM Power and Z processors. Analytics on graphs consisting of hundreds of millions of nodes will be emphasized, along with leveraging Jupyter notebooks as part of our overall design system.

Praveen Kanamarlapudi is a senior software engineer at PayPal and a contributor to Livy and Sparkmagic. He has been working on building scalable and distributed platforms at PayPal. As part of the Core Data Platform team, Praveen has enabled a highly available Jupyter platform which is being used by hundreds of data scientists, analysts and developers at PayPal.

Presentations

PayPal Notebooks: Data science and machine learning at scale, powered by Jupyter 40-minute session

Hundreds of data scientists, analysts and developers at PayPal use Jupyter to access data spread across filesystem, relational, document and key-value stores. Jupyter enables complex analytics and an easy way to build, train and deploy machine learning models at PayPal. Learn more about how we built the Jupyter infrastructure and powerful extensions at PayPal.

Holden Karau is a transgender Canadian open source developer advocate at Google focusing on Apache Spark, Beam, and related big data tools. Previously, she worked at IBM, Alpine, Databricks, Google (yes, this is her second time), Foursquare, and Amazon. Holden is the coauthor of Learning Spark, High Performance Spark, and another Spark book that’s a bit more out of date. She is a committer on the Apache Spark, SystemML, and Mahout projects. When not in San Francisco, Holden speaks internationally about different big data technologies (mostly Spark). She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal. Outside of work she enjoys playing with fire, riding scooters, and dancing.

Presentations

What things are correlated with gender diversity: A dig through the ASF, Jupyter projects 40-minute session

Many of us believe that gender diversity in open source projects is important (and if you don’t, this isn’t going to convince you), but what things are correlated with improved gender diversity, and what can we learn from similar historic industries? We will explore the diversity of different projects and possible factors. We’ll examine historic EEOC complaints and parallels, plus historic solutions.

David Koop is an Assistant Professor in the Computer and Information Science Department at UMass Dartmouth. He received his Ph.D. in Computing from the University of Utah in 2012. His research interests include data visualization, computational provenance, and data science environments. He has served as a core developer for the VisTrails project and has collaborated with scientists in the fields of climate science, quantum physics, and invasive species modeling.

Presentations

Supporting Reproducibility in Jupyter through Dataflow Notebooks 40-minute session

Dataflow notebooks build on the Jupyter Notebook environment by adding constructs to make dependencies between cells explicit and clear. In this environment, unique, persistent cell identifiers make references between cells more robust. In addition, support for recursive dataflow execution of cells allows users to better organize, reuse, and reproduce their work.

Nicholai L’Esperance is a staff engineer in the IBM Systems group in Essex Junction, Vermont, where he works in the Product Engineering Diagnostics group. In this role, Nicholai develops new tools and methodologies to aid yield, reliability, and characterization missions for IBM’s Power and Z programs. Before joining IBM, Nicholai completed his BSEE and MSEE at the University of Vermont, graduating cum laude. During his time at UVM, Nicholai’s studies focused on signal analysis, and he co-authored several papers on ground-penetrating radar and device testing. Nicholai is continuing his studies, pursuing a graduate degree in Computer Science.

Presentations

Design and Analysis of the World’s Most Advanced Microprocessors Using Jupyter Notebooks 40-minute session

We will present our experiences using Jupyter notebooks as a critical aid in the design of the next generation of IBM Power and Z processors. Analytics on graphs consisting of hundreds of millions of nodes will be emphasized, along with leveraging Jupyter notebooks as part of our overall design system.

Julia Lane is a Professor at the NYU Wagner Graduate School of Public Service, at the NYU Center for Urban Science and Progress, and an NYU Provostial Fellow for Innovation Analytics.

Previously, Julia was a Senior Managing Economist and Institute Fellow at American Institutes for Research. In this role Julia co-founded the Institute for Research on Innovation and Science (IRIS) at the University of Michigan. Julia has held positions at the National Science Foundation, The Urban Institute, The World Bank, American University, and NORC at the University of Chicago.

Presentations

Jupyter, Sensitive Data and Public Policy 40-minute session

As Lew Platt, CEO of Hewlett Packard, once said, “If only HP knew what HP knows, we would be three times more productive.” Yet government agencies have found it difficult to serve taxpayers because of the technical, bureaucratic, and ethical issues associated with access to and use of sensitive data.

I’m an MS student at UC Berkeley advised by Josh Hug. I am interested in improving data science education. Currently, I’m building tools to make it easy to create and publish interactive educational content online.

Presentations

nbinteract: Shareable, Interactive Webpages From Notebooks 40-minute session

The nbinteract package converts Jupyter notebooks with widgets into interactive, standalone HTML pages. nbinteract’s built-in support for function-driven plotting makes authoring interactive pages simpler by allowing users to focus on data, not callbacks. We will introduce nbinteract and walk through the steps to publish an interactive web page from a Jupyter notebook.

Mr. Lawler is an engineering consultant with expertise in coastal and riverine surface water modeling. Mr. Lawler is a subject matter expert in scientific programming with experience developing and scaling serial applications for parallel processing in High Performance and Cloud Computing environments. He has worked on broad ranging projects at the national, state, and local level including the development and quality control of tools in use by the US Army Corps of Engineers and the United States Geological Survey. Mr. Lawler is currently completing a PhD in Civil Engineering at George Mason University, where he is conducting research with the National Weather Service to enhance modeling and forecasting capabilities in areas influenced by coastal and fluvial flooding mechanisms.

Presentations

Using JupyterLab for flood map development: approaches for improving productivity and reproducibility 40-minute session

Creating flood maps for coastal & riverine communities requires geospatial processing, statistical analysis, finite element modeling, and a team of specialists working together. This talk will demo how using the feature-rich JupyterLab to develop tools, share code with team members, and document the workflows used in creating flood maps improves productivity and reproducibility.

Johan Mabille is a scientific software developer at QuantStack, where he specializes in high-performance computing in C++. Previously, Johan was a quant developer at HSBC. An open source developer, Johan is the coauthor of xtensor and xeus and the main author of xsimd. He holds a master’s degree in computer science from Centrale-Supelec.

Presentations

Going Native: C++ as a First-Class Citizen of the Jupyter Ecosystem 40-minute session

In this talk, we present the latest features of the C++ Jupyter kernel, including live help, auto-completion, rich MIME type rendering, and interactive widgets, making it one of the most featureful implementations of the Jupyter kernel protocol and bringing Jupyter closer to the metal.

Romit has over 19 years of experience with data, building data and analytics solutions for a wide variety of companies across the networking, semiconductor, telecom, security, and fintech industries. At PayPal he leads product management for core big data and analytics platform products, which include a compute framework, a data platform, and a notebooks platform.

As part of this role, Romit is working to simplify data analysis and exploration, machine learning and data application development on big data technologies and improve analyst and data scientist agility and ease of accessing data spread across a multitude of data stores via friendly technologies like SQL and notebooks.

Outside of data products, Romit’s time is spent with his wife Kosha and two wonderful kids, Annika and Vedant.

Presentations

PayPal Notebooks: Data science and machine learning at scale, powered by Jupyter 40-minute session

Hundreds of data scientists, analysts and developers at PayPal use Jupyter to access data spread across filesystem, relational, document and key-value stores. Jupyter enables complex analytics and an easy way to build, train and deploy machine learning models at PayPal. Learn more about how we built the Jupyter infrastructure and powerful extensions at PayPal.

Julia’s background is in music, but she’s been learning more about technology and scientific computing ever since she joined Two Sigma in the spring of 2010. Her current role in the organization is that of open source coordinator. She’s enjoyed every stop of her quest to learn more about open source software, from getting to know what makes the products developed at Two Sigma special to writing backing tracks for her musical Reb + VoDKa + Me on Sonic Pi. www.juliameinwald.com.

Presentations

Open Source Software and the Allocation of Capital 40-minute session

The presentation will explain why Two Sigma, a company in a space notorious for protecting IP, thinks it's important to contribute to the open source community. I'll talk about the evolution of our thinking and policies over the past five years, and make a case for why other companies should make a commitment to the open source ecosystem.

A Chemical Engineering / Computer Science double major who has spent the last 20 years in the refining and petrochemicals industry looking (mostly unsuccessfully) to find a harmonious union of the two disciplines.

My employer, Honeywell UOP, has a long and illustrious history as a technology licensor in the refining and petrochemical industry. UOP has pioneered many advancements in catalyst and process technology that have revolutionized the oil refining industry. I currently work as a Regional Service Manager in UOP’s Technology Services division, which itself has a long and important history in the UOP organization for bringing world-class technical service to UOP’s many customers.

Prior to my current role I worked in UOP’s Field Operating Services, whose members travel the world helping refiners commission and operate UOP technology. In my travels I’ve had the pleasure of working with talented and welcoming individuals in Egypt, Mexico, Chile, Venezuela, Korea, Japan, and Russia.

My history with Python goes to the beginning of my career in the late 90’s when I built a simple web server on the company intranet using an early incarnation of Zope. After a long hiatus while I traveled the world I came back to Python, discovering the wonders of IPython and Python’s amazing data science ecosystem.

At heart, though, I’m a Lisp-guy and my fascination with that language goes back to college and my favorite CS topic – Artificial Intelligence – in my junior and senior years. Up until then I was very much a vi user, but it is hard to learn Lisp without also learning that most daunting of text editors: Emacs. Interestingly, it really wasn’t until I was well into my career as a Chemical Engineer that I began to use Emacs in earnest, all because of the indispensable org-mode.

And finally, EIN! I found the emacs-ipython-notebook after (re)discovering IPython around version 0.11. Amazing things were happening in Python at the time, and I was re-learning the joys of using something other than Excel for doing engineering work. I started as a user, but soon became something more as ein’s creator, Takafumi Arakaki, moved on to other things and big changes in IPython required significant updates to ein to maintain compatibility. Not able to live with the thought of a world without ein, I foolishly dived into the world of elisp and Jupyter development and have not looked back.

At the moment ein enjoys a modest user community – over 500 stars on github and over 40,000 downloads on MELPA. Their kind words and bug reports have helped keep ein relevant throughout the many changes in Jupyter’s architecture in the past few years.

Presentations

The Emacs IPython Notebook 40-minute session

The Emacs IPython Notebook, or EIN, is a full-featured client for the Jupyter Notebook that runs in the venerable Emacs text editor (https://www.gnu.org/software/emacs/). This presentation is intended to provide a general introduction to the tool along with a brief history of its development.

Paco Nathan leads the Learning Group at O’Reilly Media. Known as a “player/coach” data scientist, Paco led innovative data teams building ML apps at scale for several years and more recently was an evangelist for Apache Spark, Apache Mesos, and Cascading. Paco has expertise in machine learning, distributed systems, functional programming, and cloud computing with 30+ years of tech industry experience, ranging from Bell Labs to early-stage startups. Paco is an advisor for Amplify Partners and was cited in 2015 as one of the top 30 people in big data and analytics by Innovation Enterprise. He is the author of Just Enough Math, Intro to Apache Spark, and Enterprise Data Workflows with Cascading.

Presentations

Friday Opening Remarks Keynote

Friday Opening Remarks

Thursday Opening Remarks Keynote

Thursday Opening Remarks

I grew up in an Army family spending time in California, Texas, 2 different bases in Germany, and finished in Northern New York where I graduated high school. From there I was off to SUNY Potsdam to study mathematics. Upon graduating from Potsdam I knew I needed long sunny days and warm weather and so I went to graduate school at the University of Florida.

While at Florida I found what I was looking for: beautiful weather, beaches, and algebraic topology. I found manifolds to be an interesting blend of the exotic and the well-behaved, and I finished my PhD by analyzing a topological invariant that nobody can pronounce. The market for academics is pretty rough right now, and so I was off to teach at an independent school.

I’ve been at Trinity five years now. Among my graduate school peers, I think I have the strongest students, the smallest classes, and the opportunity to do the most interesting work! Each year I teach a course on advanced topics that lie beyond a traditional high school curriculum. Recent courses have included algebraic number theory, combinatorics, linear algebra, group theory, and cryptography, each time with a significant coding component. Additionally I’m able to draw upon my background to extend the depth of our standard curriculum, improving everything from 9th grade math to BC calculus.

In my spare time I enjoy exploring NYC, particularly when an interesting restaurant is involved. I love fruity herbal tea, having spent some time as the adviser to the tea club at Trinity.

Presentations

Jupyter for every high schooler 40-minute session

In an effort to broaden our graduates' mathematical toolkit as well as address gender equity in STEM education, I've led the implementation of Python projects across all of our 9th grade math courses. Every student in the 9th grade completes 3 Python projects that introduce programming and integrate it with the ideas developed in class.

I am an organizational sociologist at NYU investigating how organizations integrate (or fail to integrate) data-driven decision making insights and processes.

Presentations

Data Science in US & Canadian Higher Education 40-minute session

This talk is based on research into the various infrastructure models supporting data science in research settings, compared in terms of funding, educational uses, and research utilization. Specifically, we explore the national federation model currently established in Canada, with the support of the Canadian federal government, in comparison to the more grassroots efforts in many US universities.

Catherine Ordun is a Senior Data Scientist at Booz Allen Hamilton, in Washington, D.C. She has a background in biology, public health, and business, and is a self-taught Python programmer. She has led data science work across the U.S. Government, including U.S. intelligence, public health, and DoD agencies. She is lucky to be on the Women in Data Science Committee at Booz Allen, is a two-time recipient of the Women of Color (WoC) award, has presented to the National Academy of Medicine, led her team to the Top 3 in a Health and Human Services Opioid Codeathon, and is currently a program reviewer for SciPy2018. She is passionate about machine learning and has recently started participating in Kaggle challenges and launched an internal firm-wide machine intelligence Meetup.

Presentations

Jupyter Notebook as a Transparent Way to Document Machine Learning Model Development - Case Study for a U.S. Defense Agency 40-minute session

Many U.S. government agencies are just getting started in machine learning. As a result, data scientists need to de-"black box" models as much as possible. One simple way to do this is to transparently show how the model is coded and its results at each step. Notebooks do just this. We will walk through a notebook we built for RNNs and discuss how we think agencies can use Notebooks.

Carl is a program manager focused on helping Google’s customers and business partners get trained and certified to run machine learning and data analytics workloads on Google Cloud. With over 16 years of experience in the IT industry, Carl has worked with the world’s leading technology companies across the United States and Europe, including in leadership roles on programs and projects in the areas of big data, cloud computing, service-oriented architecture, machine learning, and computational natural language processing. Carl is the author of over 20 articles in professional, trade, and academic journals, an inventor with 6 patents at USPTO, and holds 3 corporate awards from IBM for his innovative work. You can find out more about Carl on his blog http://www.cloudswithcarl.com

Presentations

Serverless Machine Learning with TensorFlow 1-Day Training

In this workshop, we walk through the process of building machine learning models with TensorFlow. We cover data exploration, feature engineering, model creation, training, evaluation and deployment.

M Pacer is a Jupyter core developer at the Berkeley Institute for Data Science (BIDS) focusing on the intersection between Jupyter and scientific publishing (with an eye toward constructing a total scientific record that is more amenable to machine learning techniques). M holds a PhD from UC Berkeley, where his research used machine learning and human experiments to study causal explanation and causal inference, and a BS from Yale University.

Presentations

Making beautiful objects with Jupyter 40-minute session

Jupyter displays a rich array of media types out-of-the-box. In this talk, we will describe how to use these capabilities to their full potential. We will show how to add rich displays to existing and new Python classes. We will also show you how to customise the way notebooks are converted to other formats. These skills will enable anyone to make beautiful objects with Jupyter.

Yuvi Panda is infrastructure lead for the Data Science Education Program at UC Berkeley, where he works on scaling JupyterHub for use by thousands of students. A programmer and DevOps engineer, he wants to make it easy for people who don’t traditionally consider themselves programmers to do things with code and builds tools (Quarry, PAWS, etc.) to sidestep the list of historical accidents that constitute the “command-line tax” that people have to pay before doing productive things with computing. He’s a core member of the JupyterHub team and works on mybinder.org as well. Yuvi is also a Wikimedian, since you can check out of Wikimedia, but you can never leave.

Presentations

How we run MyBinder.org: A case study in open infrastructure 40-minute session

Running infrastructure is challenging for an open source community. A small community of individuals with varying skills operates MyBinder.org. In this talk, we'll discuss our social & technical processes for keeping mybinder.org reliable in the most open, transparent & inclusive way possible. We'll also share pretty graphs about the state of mybinder.org that anyone can see in real time!

Pangeo: Big Data Climate Science in the Cloud 40-minute session

Climate science is being flooded with petabytes of data, overwhelming traditional modes of data analysis. The Pangeo project is building a platform to take Big Data Climate Science into the cloud using scientific python and large-scale interactive computing tools. Come find out what we are building, why we are building it, and how you can use it!

Fernando Pérez is a staff scientist at Lawrence Berkeley National Laboratory and a founding investigator of the Berkeley Institute for Data Science at UC Berkeley, created in 2013. His research focuses on creating tools for modern computational research and data science across domain disciplines, with an emphasis on high-level languages, interactive and literate computing, and reproducible research. He created IPython while a graduate student in 2001 and continues to lead its evolution into Project Jupyter, now as a collaborative effort with a talented team that does all the hard work. Fernando regularly lectures about scientific computing and data science and is a member of the Python Software Foundation, a founding member of NumFOCUS, and a National Academy of Sciences Kavli Frontiers of Science Fellow. He is also the recipient of the 2012 Award for the Advancement of Free Software from the Free Software Foundation. Fernando holds a PhD in particle physics from the University of Colorado at Boulder, which he followed with postdoctoral research in applied mathematics and developing numerical algorithms.

Presentations

Friday Opening Remarks Keynote

Friday Opening Remarks

Thursday Opening Remarks Keynote

Thursday Opening Remarks

Nicole Petrozzo is graduating from the Department of Computer Science at Bryn Mawr College, Spring 2018. She first used Jupyter in her first-year seminar, and she last used Jupyter for her senior thesis exploring recommender systems using deep learning.

Presentations

Jupyter Graduates! 40-minute session

For the last four years, I have used nothing but Jupyter in the classroom. From a first-year writing course to a course on assembly language; from Biology to Computer Science; from lectures to homework, everything has been in Jupyter. In this talk, I explore the ways I have leveraged Jupyter, and detail the successes and failures experienced along the way.

Min has been a core contributor to IPython and Jupyter for over ten years. He is a Postdoctoral Fellow at Simula Research Laboratory where his focus is on developing JupyterHub, Binder, and related technologies and supporting deployments of Jupyter in science and education around the world.

Presentations

Deploying a cloud-based JupyterHub for students and researchers Tutorial

This tutorial will let you provide a group of your colleagues or students with easy access to Jupyter notebooks and JupyterLab without asking them to install anything on their computers. You will configure and deploy a cloud-based JupyterHub using Kubernetes. You will learn how to customize and extend it for your needs.

Shivraj combines a strong background in business strategy with technical depth to drive successful outcomes for product teams. Before joining Capital One, Shivraj worked in product strategy in a Fortune 500 company, where he analyzed emerging markets and investigated strategic investments, and in strategy consulting where he advised on a wide variety of complex topics. Shivraj started his career as a software engineer developing enterprise backup software.

Presentations

Using Jupyter Notebooks in Highly Regulated Environments 40-minute session

Capital One recently explored different "notebooks", looking for new ways to support our information-based strategy. As an outcome, JupyterHub emerged as one of many options that can serve as a potential platform for analytics even in highly regulated industries like Financial Services. Come learn more about our journey and how Jupyter has become a part of an ever-growing analytics toolkit!

I am currently a Digital Operations Specialist at McKinsey & Company. I program in Python and JavaScript, which I primarily use for data visualization, front-end web development, and robotics.

Presentations

JupyterLab and Plotly: A Data Visualization Power Couple 40-minute session

JupyterLab and Plotly both provide a rich set of tools for working with data. When combined, they create a powerful computational environment that enables users to produce versatile, robust visualizations in a fast-paced setting. This session demonstrates how McKinsey uses JupyterLab and Plotly to create dynamic charts and web apps, including those that stream IoT data, in Python, Julia, and R.

Mariah graduated from UC Berkeley in 2017 with a degree in Computer Science and began working for the Division of Data Sciences shortly after graduation. She led the effort to build up a program called Data Scholars that provides specialized academic support for students from underrepresented and non-traditional backgrounds. Since September 2017, Mariah has been working with faculty on campus to build up the academic advising program for the new Data Science major (announced late Spring 2018). She has also been co-managing the data science modules program to facilitate the introduction of data science concepts in existing courses across the UC Berkeley campus.

Presentations

JupyterHub for domain-focused integrated learning modules 40-minute session

The modules program at UC Berkeley creates short explorations into data science using notebooks to allow students to work hands-on with a dataset relevant to their course. We’ve served over 1500 students in over 25 different courses primarily in the social sciences, arts, and humanities by plugging in for 1-3 class periods, an impossibility without a JupyterHub eliminating installation time.

Presentations

Jupyter in the modern enterprise data and analytics ecosystem: Trends, experiments and opportunities 40-minute session

Gerald Rouselle reviews some of the trends in modern data and analytics ecosystems for large enterprises and shares some of the key challenges and opportunities for Jupyter adoption. He also shares some recent examples and experiments in incorporating Jupyter in commercial products and platforms.

Scott Sanderson is a senior software engineer at Quantopian, where he is responsible for the design and implementation of Quantopian’s backtesting and research APIs. Within the Jupyter ecosystem, most of Scott’s work focuses on enhancing the extensibility of the Jupyter Notebook for use in large deployments.

Presentations

Designing for Interaction 40-minute session

This presentation explores how interactivity can and should influence the design of software libraries. We discuss ways that the needs of interactive users differ from the needs of application developers, and we describe techniques for improving the usability of libraries in interactive environments without sacrificing robustness in non-interactive environments.

A physicist by education, I studied Astrophysics at the Rijksuniversiteit Groningen (the Netherlands) and earned my PhD from the Observatoire de Paris (France). After that I made the shift to software engineering and worked at a large bank in the Netherlands. While the work was enjoyable, I was ready for a new challenge after 2 years and joined the SDSC (Lausanne location) as a software engineer/data scientist to work on the development of the Renga platform.

Presentations

Reproducible science with the Renga platform 40-minute session

Renga is a highly-scalable and secure open software platform designed to make (data) science reproducible, to foster collaboration between scientists, and to share resources in a federated environment.

David has more than 15 years of experience in software engineering and data analytics and is currently a Director of Data Engineering at a Fortune 500 company, leading data product development within the Financial Services division. As part of his role, he guides agile teams to build data products for analyst and data communities with a primary focus of enabling self-service analytics, exploration, and insight discovery. David’s teams typically design data products using microservices, AngularJS, & Python and leverage core CICD practices for continuous delivery. He also has a wide breadth of knowledge across Financial Services domains and in the retail industry. As a developer and analyst, David’s greatest interest is solving unique, complex problems and developing others as software and data engineers.

Presentations

Using Jupyter Notebooks in Highly Regulated Environments 40-minute session

Capital One recently explored different "notebooks", looking for new ways to support our information-based strategy. As an outcome, JupyterHub emerged as one of many options that can serve as a potential platform for analytics even in highly regulated industries like Financial Services. Come learn more about our journey and how Jupyter has become a part of an ever-growing analytics toolkit!

Based in the Bay Area of California, Matthew attended Stanford University for undergraduate and graduate school. He stayed in the area focused on startups, spending a long stretch of time working at OpenGov. Now he’s working at Netflix and scaling data platform solutions.

Presentations

Scheduled Notebooks: A means for manageable and traceable code execution 40-minute session

Using an nteract project, papermill, we’ll walk through how we use notebooks to track user jobs and make a simple interface for work submission. You’ll get an inside peek at how Netflix is tackling the scheduling problem for a range of users who want easily managed workflows.

Dr. Viral B. Shah is a Co-founder and CEO of Julia Computing and a co-creator of the Julia language. Julia has been downloaded over 2M times to date. He spends all his time working towards making Julia the default language for all forms of data science and numerical computing.

Viral has a Ph.D. in computational sciences from UC Santa Barbara, where his thesis was on interactive supercomputing. The technology developed in his thesis was licensed commercially by Microsoft.

He also architected the payment platforms for the National ID (Aadhaar) project of the Government of India, and authored a book on his experiences implementing a complex technology project in governance – Rebooting India.

Presentations

The journey to Julia 1.0 - the "Ju" in Jupyter 40-minute session

Julia and Jupyter share a common evolution path. Julia is the language for modern technical computing, while Jupyter is the development and presentation environment for modern technical computing. This talk will explore the journey of Julia and the impact of Jupyter on Julia's growth.

Veda Shankar is a Developer Advocate at MapD. He comes to MapD with experience as a Director of Emerging Technologies at Quanta Cloud Technology, where he gained hands-on expertise in Red Hat products (OpenStack, Ceph, Gluster, and OpenShift). He has also been at Red Hat as a Principal Technical Marketing Manager and Storage Solutions Architect. Veda has an MS in Computer Science from New Jersey Institute of Technology.

Presentations

Using MapD Kernel for Jupyter Notebook by Veda Shankar 40-minute session

The Jupyter MapD kernel allows seamless integration of the GPU-based MapD Core SQL engine into a machine learning pipeline.

I’m Caleb, a student at UC Berkeley studying Computer Science and Economics. I’m interested in applying data science in the context of education and social good. I’m currently working on nbinteract, a project that allows users to easily create interactive visualizations with just a few lines of Python.

Presentations

nbinteract: Shareable, Interactive Webpages From Notebooks 40-minute session

The nbinteract package converts Jupyter notebooks with widgets into interactive, standalone HTML pages. nbinteract’s built-in support for function-driven plotting makes authoring interactive pages simpler by allowing users to focus on data, not callbacks. We will introduce nbinteract and walk through the steps to publish an interactive web page from a Jupyter notebook.

Stephanie Stattel is a senior software developer who has been with Bloomberg LP for over 5 years and is currently developing applications to improve financial professionals’ research and investment workflows. She is a San Francisco lead of the company’s global Bloomberg Women In Tech (BWIT) community.

Presentations

Terraforming Jupyter: Changing JupyterLab to Suit Your Needs 40-minute session

We will walk through a series of extensions that demonstrate the power and flexibility of JupyterLab’s architecture. From targeted functionality modifications to more extreme atmospheric changes that require extensive decoupling and flexibility within JupyterLab, we explore the complexity and stability of extensions and how they can combine to form custom, opinionated JupyterLab environments.

Dave is a senior data scientist within the U.S. Department of Defense and leads the nbgallery project.

Presentations

Citizen Data Science: An Enterprise Use Case from Inside the U.S. Intelligence Community 40-minute session

This talk describes how Jupyter was used inside the U.S. Department of Defense (DOD) and the Intelligence Community (IC) to empower thousands of “Citizen Data Scientists” to build and share analytics in order to meet the community’s dynamic challenges. These Citizen Data Scientists have the aptitude, curiosity, and creativity to put their tradecraft into code but historically lacked the technical training to do so.

Erik is a math and physics teacher in Uppsala, Sweden. While working towards a machine learning degree online, he realized the potential of Jupyter for educators and established a JupyterHub deployment using the Zero to JupyterHub on Kubernetes guide for his students, thereafter contributing to the open source project.

Presentations

Deploying a cloud-based JupyterHub for students and researchers Tutorial

This tutorial will let you provide a group of your colleagues or students with easy access to Jupyter notebooks and JupyterLab without asking them to install anything on their computers. You will configure and deploy a cloud-based JupyterHub using Kubernetes. You will learn how to customize and extend it for your needs.

Learn by Doing: Using Data-driven Stories and Visualization in the Classroom (High School and College) 40-minute session

Students can learn by doing. In this talk, we will show how interactive content, using Jupyter Notebooks, Widgets, and visualization libraries, puts the student in charge. We will share notable examples of projects within the Jupyter community and offer ways in which educators can help students to develop data science literacy and use computational skills to build upon their interests.

Rachael Tatman is a data scientist at Kaggle. She has a PhD in linguistics from the University of Washington, with a focus on computational sociolinguistics. Her interests include data science education and fairness in machine learning.

Presentations

I do, We do, You Do: Supporting active learning with notebooks Tutorial

A practical introduction to incorporating notebooks into the classroom using active learning techniques.

Reproducible Research Best Practices (highlighting Kaggle Kernels) 1-Day Training

In this workshop, we’ll take an existing research project and make it fully reproducible using Kaggle Kernels. This workshop will include hands-on instruction and best practices for each of the three components necessary for completely reproducible research.

Adam Thornton is a software developer in Data Management/Science Quality and Reliability Engineering on the Large Synoptic Survey Telescope. He is working on the JupyterLab-based interactive component of the LSST Science Platform. He has nearly 30 years of development, IT consulting, and system administration experience in a wide variety of settings from academic computing to Fortune 20 enterprises.

Presentations

"If The Data Will Not Come to the Astronomer...": JupyterLab and a sea change in astronomical analysis 40-minute session

LSST is an ambitious project to map the sky in the fastest, widest, and deepest survey ever made. This petabyte-scale, 7 trillion-row database disrupts traditional astronomical workflows. Our science platform requires a paradigm shift in how astronomy is done. Learn the challenges of providing production services on a notebook-based architecture and the compelling advantages of JupyterLab.

Wolf is a scientific software developer at QuantStack. Prior to joining QuantStack, Wolf completed a master’s in Robotics at ETH Zurich and Stanford, focusing on Artificial Intelligence. He has also worn a couple of other hats: freelance web designer and developer, building software for the BeachBot with Disney Research, and making drones find their way at Rapyuta Robotics.
Outside of work, he’s a passionate cyclist and enjoys spending time outside the city.

Presentations

Going Native: C++ as a First-Class Citizen of the Jupyter Ecosystem 40-minute session

In this talk, we present the latest features of the C++ Jupyter kernel, including live help, auto-completion, rich MIME type rendering, and interactive widgets, making it one of the most featureful implementations of the Jupyter kernel protocol and bringing Jupyter closer to the metal.

Ronald (Ronnie) Walker is a senior at UC Berkeley studying Economics. He’s been involved with the Data Science Education Program at UC Berkeley since 2016, assuming various roles including undergraduate student instructor, connector course teaching assistant, and modules team lead. As a team lead for the modules program, Ronnie has worked with faculty in Linguistics, Information Science, Education, Cognitive Science, Legal Studies, Near Eastern Studies, and Economics to build short modules for their courses. Most recently, he has been busy helping departments integrate existing full courses with data science approaches.

Presentations

JupyterHub for domain-focused integrated learning modules 40-minute session

The modules program at UC Berkeley creates short explorations into data science using notebooks to allow students to work hands-on with a dataset relevant to their course. We’ve served over 1,500 students in over 25 different courses, primarily in the social sciences, arts, and humanities, by plugging in for 1-3 class periods, which would be impossible without a JupyterHub to eliminate installation time.

Elizabeth Wickes is a Lecturer at the School of Information Sciences at the University of Illinois, where she teaches foundational programming and information technology courses. She was previously a Data Curation Specialist for the Research Data Service at the University Library of the University of Illinois, and the Curation Manager for Wolfram|Alpha. She currently co-organizes the Champaign-Urbana Python user group, has been a Carpentries instructor since 2015 and a trainer since 2017, and is an elected member of The Carpentries’ Executive Council for 2018.

Presentations

Reproducible education: what teaching can learn from open science practices 40-minute session

As practitioners of open science begin to migrate their educational material into public repositories, many of their common practices and platforms can be used to streamline the instruction material development process. This talk will compare how many open science practices can be used in an educational context, and how they are best facilitated by tools like the Jupyter Notebook.

George Williams has worked at the intersection of research and industry for two decades. He has published papers in major mathematics and AI conferences and holds several patents in computer vision and security. Currently, he is the chief data scientist for a cybersecurity startup based in Brooklyn.

Presentations

Rapid Data Science Deployment For Cybersecurity With JupyterHub 40-minute session

The key to successful threat detection in cybersecurity is fast response. Many actors are involved, including operations specialists, cybersecurity experts, developers, and, more recently, data scientists. We have built specialized extensions for data scientists working in cybersecurity that can be used and deployed via JupyterHub.

Carol is currently a Research Software Engineer at Cal Poly San Luis Obispo working full-time on [Project Jupyter](https://jupyter.org). She is also a Python Software Foundation Fellow and former Director; a Project Jupyter Steering Council member; a core developer for CPython, Jupyter, AnitaB.org’s open source projects, and PyLadies; a co-organizer of PyLadies San Diego and San Diego Python User Group; a Geek-In-Residence at FabLab San Diego; and an independent developer of open hardware and software.

Presentations

Deploying a cloud-based JupyterHub for students and researchers Tutorial

This tutorial will let you provide a group of your colleagues or students with easy access to Jupyter notebooks and JupyterLab without asking them to install anything on their computers. You will configure and deploy a cloud-based JupyterHub using Kubernetes. You will learn how to customize and extend it for your needs.

Learn by Doing: Using Data-driven Stories and Visualization in the Classroom (High School and College) 40-minute session

Students can learn by doing. In this talk, we will show how interactive content, using Jupyter Notebooks, Widgets, and visualization libraries, puts the student in charge. We will share notable examples of projects within the Jupyter community and offer ways in which educators can help students to develop data science literacy and use computational skills to build upon their interests.

Wenming Ye is currently a Senior Solution Architect for Amazon Web Services.

Presentations

Explore AWS Machine Learning Platform using Amazon SageMaker 2-Day Training

Machine Learning and IoT projects are now common for enterprises and startups alike. These advanced technologies have been the key innovation engine for businesses such as Amazon Go, Alexa, and Robotics. In this hands-on workshop, we will explore the AWS Machine Learning Platform using the Project Jupyter-based Amazon SageMaker to build, train, and deploy ML/DL models to the cloud and to AWS DeepLens.

Explore AWS Machine Learning Platform using Amazon SageMaker (Day 2) Training Day 2

Machine Learning and IoT projects are now common for enterprises and startups alike. These advanced technologies have been the key innovation engine for businesses such as Amazon Go, Alexa, and Robotics. In this hands-on workshop, we will explore the AWS Machine Learning Platform using the Project Jupyter-based Amazon SageMaker to build, train, and deploy ML/DL models to the cloud and to AWS DeepLens.

Kevin Zielnicki is a Data Scientist on the Styling Algorithms team at Stitch Fix. Kevin holds a doctorate in physics in the field of quantum information processing, but he now enjoys working with data that can be observed without changing its value.

Presentations

Explorations in reproducible analysis with Nodebook 40-minute session

Even with good intentions, analysis notebooks can quickly accumulate a mess of false starts and out-of-order statements. Best practices encourage cleaning up a notebook to ensure reproducibility, but many analyses will never reach this cleaned-up state. As an alternative, this talk will describe Nodebook, a Jupyter plugin that encourages reproducibility by preventing inconsistency.