Brought to you by NumFOCUS Foundation and O’Reilly Media Inc.
The official Jupyter Conference
August 22-23, 2017: Training
August 23-25, 2017: Tutorials & Conference
New York, NY

Speakers

New speakers are added regularly. Please check back to see the latest updates to the agenda.

Safia Abdalla is one of the maintainers of nteract, a desktop-based interactive computing experience. A data scientist and software engineer with an interest in open source software and data science for social good, Safia is the organizer of PyData Chicago. In her free time, she enjoys running, working out, and drinking tea.

Presentations

How to cross the asteroid belt Tutorial

Have you wondered what it takes to go from a Jupyter user to a Jupyter pro? Wonder no more. Safia Abdalla explores the core concepts of the Jupyter ecosystem, including the extensions ecosystem, the kernel ecosystem, and the frontend architecture, leaving you with an understanding of the possibilities of the Jupyter ecosystem and practical skills on customizing the Jupyter Notebook experience.

Alexandre Archambault is a software and data engineer at Teads.tv. Alexandre is also a contributor to and author of noted Scala projects, including coursier and shapeless.

Presentations

Scala: Why hasn't an official Scala kernel for Jupyter emerged yet? Session

Alexandre Archambault explores why an official Scala kernel for Jupyter has yet to emerge. Part of the answer lies in the fact that there is no user-friendly, easy-to-use Scala shell in the console (i.e., no IPython for Scala). But there's a new contender, Ammonite, although it still has to overcome a few challenges, not least being supported by big data frameworks like Spark, Scio, and Scalding.

Damián Avila is a senior software developer at Anaconda Powered by Continuum Analytics. A software developer, data scientist, and quantitative analyst focusing on data science, finance, data visualization, and the Jupyter ecosystem, Damián has made meaningful contributions to several open source projects and has been a core developer of Jupyter, Nikola, and Bokeh. He is the creator of RISE, a “live” slideshow machinery for the Jupyter Notebook. Previously, he was a biochemist-immunologist. Damián has presented talks, tutorials, and posters at a number of national and international conferences and led tutorials on the scientific Python ecosystem. He’s a member of the Jupyter Steering Council, Python Argentina, Scientific Python Argentina, and the Quantitative Finance Club.

Presentations

The Jupyter Notebook as document: From structure to application Session

M Pacer, Jess Hamrick, and Damián Avila explain how the structured nature of the notebook document format, combined with native tools for manipulation and creation, allows the notebook to be used across a wide range of domains and applications.

Demba Ba is an assistant professor of electrical engineering and bioengineering at Harvard University, where he directs the CRISP group. He and his group develop mathematical and computational tools to elucidate the role of dynamic networks of neurons in phenomena such as anesthesia, sleep, the learning of fear, and aging and to enable more efficient signal representations that exploit the structure present in natural media such as audio, images, and video. Demba is passionate about teaching, and eagerly incorporates Jupyter notebooks and the Python ecosystem in his courses because of the unique opportunity they provide for interactive, web-based teaching of content that has not traditionally leveraged scientific computing resources. Attempting to bridge the gap that has existed between theory and pen-and-paper courses and application- and coding-focused classes, he spearheaded the development and deployment of the JupyterHub Notebook on Amazon AWS cloud for two classes in Harvard’s School of Engineering and Applied Sciences. In 2016, Demba received a research fellowship in neuroscience from the Alfred P. Sloan Foundation. He holds a BSc in electrical engineering from the University of Maryland, College Park, and both an MSci and a PhD in electrical engineering and computer science with a minor in mathematics from the Massachusetts Institute of Technology.

Presentations

Labz 'N Da Wild 2.0: Teaching signal and data processing at scale using Jupyter notebooks in the cloud Keynote

Demba Ba discusses two new signal processing/statistical modeling courses he designed and implemented at Harvard, exploring his perspective as an educator and that of the students as well as the steps that led him to adopt the current cloudJHub architecture. Along the way, Demba outlines the potential of architectures such as cloudJHub to help to democratize data science education.

Gunjan Baid is a student at the University of California, Berkeley. She completed her bachelor’s degree in computer science and biochemistry and is now pursuing a master’s degree in computer science with a research focus on computational biology. Gunjan is associated with the undergraduate Data Science education program, where, as a student instructor, she worked with Jupyter notebooks in the classroom and now provides technical support for the program’s JupyterHub infrastructure.

Presentations

Data science at UC Berkeley: 2,000 undergraduates, 50 majors, no command line Session

Engaging critically with data is now a required skill for students in all areas, but many traditional data science programs aren’t easily accessible to those without prior computing experience. Gunjan Baid and Vinitra Swamy explore UC Berkeley's Data Science program—2,000 students across 50 majors—explaining how its pedagogy was designed to make data science accessible to everyone.

Meet the Expert with Gunjan Baid (UC Berkeley) Meet the Experts

Chat with Gunjan about the use of Jupyter notebooks in education and how to use these tools more effectively in classrooms.

Lorena A. Barba is associate professor of mechanical and aerospace engineering at the George Washington University in Washington, DC. Her research includes computational fluid dynamics, high-performance computing, computational biophysics, and animal flight, and she is well known for her courses and open educational resources using Jupyter notebooks. An international leader in computational science and engineering, Lorena is also a long-standing advocate of open source software for science and education. She was a recipient of the 2016 Leamer-Rosenthal Award for Open Social Sciences, and in 2017, she was nominated and received an honorable mention in the Open Education Awards for Excellence of the Open Education Consortium. She received the NSF Faculty Early CAREER award in 2012, was named a 2012 CUDA fellow by NVIDIA, and was awarded a grant by the UK Engineering and Physical Sciences Research Council (EPSRC) First Grant program in 2007. Lorena holds a PhD in aeronautics from the California Institute of Technology.

Presentations

Design for reproducibility Keynote

Lorena Barba explores how to build the ability to support reproducible research into the design of tools like Jupyter and explains how better insights on designing for reproducibility might help extend this design to our research workflows, with the machine as our active collaborator.

James Bednar is a solutions architect at Anaconda Powered by Continuum Analytics and an honorary fellow in the School of Informatics at the University of Edinburgh, Scotland. Previously, Jim was a lecturer and researcher in computational neuroscience at the University of Edinburgh, Scotland, and a software and hardware engineer at National Instruments. He manages the open source Python projects datashader, HoloViews, GeoViews, ImaGen, param, and paramnb. He has published more than 50 papers and books about the visual system, data visualization, and software development. Jim holds a PhD in computer science from the University of Texas as well as degrees in electrical engineering and philosophy.

Presentations

Deploying interactive Jupyter dashboards for visualizing hundreds of millions of datapoints, in 30 lines of Python Tutorial

It can be difficult to assemble the right set of packages from the Python scientific software ecosystem to solve complex problems. James Bednar and Philipp Rudiger walk you step by step through making and deploying a concise, fast, and fully reproducible recipe for interactive visualization of millions or billions of data points using very few lines of Python in a Jupyter notebook.

Daina Bouquin is the head librarian of the Harvard-Smithsonian Center for Astrophysics in Cambridge, MA. Her work aims to lower social and technical barriers that impact the astronomy community’s ability to create and share new knowledge. Her research interests focus primarily on how libraries can support open science, research software preservation, emerging computational methods, and the history of science. Daina is currently working toward an MS in data analytics at CUNY’s School of Professional Studies.

Presentations

Beautiful networks and network analytics made simpler with Jupyter Session

Performing network analytics with NetworkX and Jupyter often results in difficult-to-examine hairballs rather than useful visualizations. Meanwhile, more flexible tools like SigmaJS have high learning curves for people new to JavaScript. Daina Bouquin and John DeBlase share a simple, flexible architecture that can help create beautiful JavaScript networks without ditching the Jupyter Notebook.

Maarten Breddels is a postdoctoral researcher at the Kapteyn Astronomical Institute at the University of Groningen (RUG), Netherlands, where he works for the Gaia mission, combining astronomy and IT to enable visualization and exploration of the large dataset this satellite will yield. Maarten has experience with low-level languages, such as Assembly and C, and higher-level languages, including C++, Java, and Python. He holds a bachelor’s degree in information technology and a bachelor’s degree, master’s degree, and PhD in astronomy, where his research focused on the field of galactic dynamics.

Presentations

A billion stars in the Jupyter Notebook Session

Maarten Breddels offers an overview of vaex, a Python library that enables calculating statistics for a billion samples per second on a regular n-dimensional grid, and ipyvolume, a library that enables volume and glyph rendering in Jupyter notebooks. Together, these libraries allow the interactive visualization and exploration of large, high-dimensional datasets in the Jupyter Notebook.

Matt Burton is a visiting assistant professor at the School of Computing and Information at the University of Pittsburgh. His research interests include infrastructure studies, data science, and scholarly communication. Matt holds a PhD in information from the University of Michigan. His dissertation, Blogs as Infrastructure for Scholarly Communication, explored digital humanities blogging and the sociotechnical dynamics of web-centric publishing.

Presentations

Defactoring pace of change: Reviewing computational research in the digital humanities Session

While Jupyter notebooks are a boon for computational science, they are also a powerful tool in the digital humanities. Matt Burton offers an overview of the digital humanities community, discusses defactoring—a novel use of Jupyter notebooks to analyze computational research—and reflects upon Jupyter’s relationship to scholarly publishing and the production of knowledge.

Natalino Busa is the head of data science at Teradata, where he leads the definition, design, and implementation of big, fast data solutions for data-driven applications, such as predictive analytics, personalized marketing, and security event monitoring. Previously, Natalino served as enterprise data architect at ING and as senior researcher at Philips Research Laboratories on the topics of system-on-a-chip architectures, distributed computing, and parallelizing compilers. Natalino is an all-around technology manager, product developer, and innovator with a 15+ year track record in research, development, and management of distributed architectures and scalable services and applications.

Presentations

Data science apps: Beyond notebooks Session

Jupyter notebooks are transforming the way we look at computing, coding, and science. But is this the only "data scientist experience" that this technology can provide? Natalino Busa explains how you can create interactive web applications for data exploration and analysis that in the background are still powered by the well-understood and well-documented Jupyter Notebook.

Matthias Bussonnier is a postdoc at UC Berkeley BIDS and a core developer of the Jupyter and IPython projects, where he is working in close collaboration with Google to bring real-time collaboration to the Jupyter environment.

Presentations

Jupyter: Kernels, protocols, and the IPython reference implementation Session

Matthias Bussonnier and Paul Ivanov walk you through the current Jupyter architecture and protocol and explain how kernels work (decoupled from but in communication with the environment for input and output, such as a notebook document). Matthias and Paul also offer an overview of a number of kernels developed by the community and show you how you can get started writing a new kernel.

Charlotte Cabasse-Mazel is an ethnographer at the Berkeley Institute for Data Science at UC Berkeley. She is interested in the ways in which practices and methodologies of data science transform production of knowledge and interdisciplinary collaboration, as well as scientific personae and trajectories within the academic institution. Charlotte holds a PhD in geography and science and technologies studies from the University of Paris-Est, where she studied at the Laboratoire Techniques, Territoires et Sociétés (LATTS), at Ecole Nationale des Ponts et Chaussées.

Presentations

Jupyter and the changing rituals around computation Session

The concept of the ritual is useful for thinking about how the core technology of Jupyter notebooks is extended through other tools, platforms, and practices. R. Stuart Geiger, Brittany Fiore-Gartland, and Charlotte Cabasse-Mazel share ethnographic findings about various rituals performed with Jupyter notebooks.

Brett Cannon is a Python core developer working on Python on the Azure Data Science Tools team at Microsoft.

Presentations

The give and take of open source Keynote

Brett Cannon explains why, in order for open source projects to function long-term, a symbiotic relationship between user and project maintainer needs to exist. When users receive a useful piece of software and project maintainers receive useful help in maintaining the project, everyone is happy.

Shane Canon is a project engineer in the Data and Analytics Services group at NERSC in the Lawrence Berkeley National Laboratory, where he focuses on enabling data-intensive applications on HPC platforms and engaging with bioinformatics applications. Shane has held a number of positions at NERSC, including leading the Technology Integration group, where he focused on the Magellan Project and other areas of strategic focus, leading the Data Systems group, and serving as a system administrator for the PDSF cluster, where he gained experience in cluster administration, batch systems, parallel filesystems, and the Linux kernel. He was also a group leader at Oak Ridge National Laboratory, where he architected the 10 petabyte Spider filesystem. Shane is involved in a number of projects outside of NERSC, including serving as the production lead on the KBase project, which is developing a platform to enable predictive biology. Shane holds a PhD in physics from Duke University and a BS in physics from Auburn University.

Presentations

How JupyterHub tamed big science: Experiences deploying Jupyter at a supercomputing center Session

Shreyas Cholia, Rollin Thomas, and Shane Canon share their experience leveraging JupyterHub to enable notebook services for data-intensive supercomputing on the Cray XC40 Cori system at the National Energy Research Scientific Computing Center (NERSC).

Prithwish Chakraborty is a data scientist on the IBM Watson for Real World Evidence team at IBM Watson Health. His work focuses on applications of data science towards patient health characterization and risk modeling. Broadly, his research interests are temporal data mining, machine learning, and image recognition. His work has been published in key data science venues, including KDD, SDM, and AAAI, and he presented a tutorial on public health forecasting in AAAI 2016 and gave an invited talk at BCDE 2014. Prithwish holds a patent with HP labs on forecasting solar photovoltaic output. He holds a PhD in computer science from Virginia Tech, where his research, under the guidance of Naren Ramakrishnan, focused on the applications of data science to public health forecasting.

Presentations

Data science made easy in Jupyter notebooks using PixieDust and InsightFactory Session

David Taieb, Prithwish Chakraborty, and Faisal Farooq offer an overview of PixieDust, a new open source library that speeds data exploration with interactive autovisualizations that make creating charts easy and fun.

Born and raised in Taiwan, Hope Chen is a PhD candidate in astronomy and astrophysics at Harvard University. Since 2011, Hope has been a member of the Star Formation research group led by Alyssa A. Goodman of Harvard and has embarked on several research projects aimed at decoding the mysteries of structural formation in nearby cradles of stars, such as the Gould Belt molecular clouds. Hope is interested in making astronomy seamless and accessible and has been an ardent user of the Jupyter Notebook. Hope holds a degree from National Tsing Hua University with the highest distinction, the Dr. Mei Yi-Chih Prize.

Presentations

Citing the Jupyter Notebook in the scientific publication process Session

Although researchers have traditionally cited code and data related to their publications, they are increasingly using the Jupyter Notebook to share the processes involved in the act of scientific inquiry. Bernie Randles and Catherine Zucker explore various aspects of citing Jupyter notebooks in publications, discussing benefits, pitfalls, and best practices for creating the "paper of the future."

Chakri Cherukuri is a researcher in the Quantitative Financial Research group at Bloomberg LP. His research interests include quantitative portfolio management, algorithmic trading strategies, and applied machine learning. He has extensive experience in scientific computing and software development. Previously, he built analytical tools for the trading desks at Goldman Sachs and Lehman Brothers. Chakri is one of the main contributors to bqplot, a Jupyter Notebook–based interactive plotting library. He holds an undergraduate degree in mechanical engineering from the Indian Institute of Technology, Chennai, and an MS in computational finance from Carnegie Mellon University.

Presentations

Building interactive applications and dashboards in the Jupyter Notebook (sponsored by Bloomberg) Session

Romain Menegaux and Chakri Cherukuri demonstrate how to develop advanced applications and dashboards using open source projects, illustrated with examples in machine learning, finance, and neuroscience.

Shreyas Cholia leads the Usable Software Systems group at Lawrence Berkeley National Laboratory (LBNL), which focuses on making scientific computing more transparent and usable. He is particularly interested in how web APIs and tools can facilitate this. Shreyas also leads the science gateway, web, and grid efforts at the National Energy Research Scientific Computing Center (NERSC) at LBNL. His current work includes a project that enables Jupyter to interact with supercomputing resources, and NEWT, a REST API for high-performance computing. He holds a degree from Rice University, where he studied computer science and cognitive sciences.

Presentations

How JupyterHub tamed big science: Experiences deploying Jupyter at a supercomputing center Session

Shreyas Cholia, Rollin Thomas, and Shane Canon share their experience leveraging JupyterHub to enable notebook services for data-intensive supercomputing on the Cray XC40 Cori system at the National Energy Research Scientific Computing Center (NERSC).

Danielle Chou is a solutions engineer at Zymergen, where she works on custom software tools for scientists. Previously, she worked on failure detection software for an ingestible sensor company and studied bioengineering at UC Berkeley and UCSF.

Presentations

Using Jupyter at the intersection of robots and industrial biology Session

Zymergen approaches biology with an engineering and data-driven mindset. Its platform integrates robotics, software, and biology to deliver predictability and reliability during strain design and development. Marc Colangelo, Justin Nand, and Danielle Chou explain the integral role Jupyter notebooks play in providing a shared Python environment between Zymergen's software engineers and scientists.

Pramit Choudhary is a lead data scientist at DataScience.com, where he focuses on optimizing and applying classical machine learning and Bayesian design strategy to solve real-world problems. Currently, he is leading initiatives to find better ways to explain a model’s learned decision policies, with the aim of reducing the chaos in building effective models and closing the gap between a prototype and an operationalized model.

Presentations

Model interpretation guidelines for the enterprise: Using Jupyter’s interactiveness to build better predictive models (sponsored by DataScience.com) Session

Pramit Choudhary offers an overview of DataScience.com's model interpretation library Skater, explains how to use it to evaluate models using the Jupyter environment, and shares how it could help analysts, data scientists, and statisticians better understand their model behavior without compromising on the choice of algorithm.

Rowan Cockett is the founder and CTO of 3point Science (acquired by Aranz Geo in 2016), a company building web-based visualization software for the geoscience industry, including Steno3D. Rowan is also a graduate student at the University of British Columbia, where he is researching a numerical framework aimed at increasing quantitative communication in the geosciences developed through his studies on numerical geophysics, subsurface flow, and structural geology. Rowan is interested in the intersection of education, industry, and academia and seeing what happens when you make powerful scientific modeling, visualization, and communication tools accessible through the web. Much of his research is accessible through an open source software initiative for geophysical simulations and parameter estimation (SimPEG) and an open website for geoscience modeling (Visible Geology).

Presentations

Deploying a reproducible course Session

Web-based textbooks and interactive simulations built in Jupyter notebooks provide an entry point for course participants to reproduce content they are shown and dive into the code used to build them. Lindsey Heagy and Rowan Cockett share strategies and tools for developing an educational stack that emerged from the deployment of a course on geophysics and some lessons learned along the way.

Marc Colangelo is a solutions engineer at Zymergen. Previously, Marc worked in various research areas, including immunology, dynamic proteomics systems, and healthcare data modeling. Marc holds a bachelor of health sciences from McMaster University and a PhD from McMaster’s Medical Sciences program, with a focus on the infection and immunity stream, in the Department of Pathology and Molecular Medicine.

Presentations

Using Jupyter at the intersection of robots and industrial biology Session

Zymergen approaches biology with an engineering and data-driven mindset. Its platform integrates robotics, software, and biology to deliver predictability and reliability during strain design and development. Marc Colangelo, Justin Nand, and Danielle Chou explain the integral role Jupyter notebooks play in providing a shared Python environment between Zymergen's software engineers and scientists.

Chris Colbert is a software architect for Project Jupyter.

Presentations

JupyterLab: The next-generation Jupyter frontend Session

Brian Granger, Chris Colbert, and Ian Rose offer an overview of JupyterLab, which enables users to work with the core building blocks of the classic Jupyter Notebook in a more flexible and integrated manner.

Sylvain Corlay is a quant researcher specializing in stochastic analysis and optimal control and the founder of QuantStack. Previously, Sylvain was a quant researcher at Bloomberg LP and an adjunct faculty member at Columbia University and NYU. As an open source developer, Sylvain mostly contributes to Project Jupyter in the area of interactive widgets and lower-level components such as traitlets. He is also a member of the steering committee of the project. Sylvain is also a contributor to a number of other open source projects for scientific computing and data visualization, such as bqplot, pythreejs, and ipyleaflet, and coauthored the xtensor C++ tensor algebra library. He holds a PhD in applied mathematics from University Paris VI.

Presentations

Jupyter widgets: Interactive controls for Jupyter Tutorial

Jupyter widgets allow you to build user interfaces with graphical controls inside a Jupyter notebook and provide a framework for building custom controls. Sylvain Corlay and Jason Grout demonstrate how to use Jupyter widgets effectively for interactive computing, explore the ecosystem of custom controls, and walk you through building your own control.

Xeus: A framework for writing native Jupyter kernels Session

Xeus takes on the burden of implementing the Jupyter kernel protocol so that kernel authors can focus on more easily implementing the language-specific part of the kernel and support features, such as autocomplete or interactive widgets. Sylvain Corlay and Johan Mabille showcase a new C++ kernel based on the Cling interpreter built with xeus.

John DeBlase is lead developer for the CUNY Building Performance Lab, where he helps develop Python-based statistical modeling applications for city-wide energy management research. A developer, data scientist, and musician from Queens, NY, John’s personal research revolves around the development of musical intelligence systems using natural language processing techniques with a focus on real-time human-computer interaction. John is interested in developing applications for data scientists that emphasize interactive data visualization, leveraging the best tools currently available in both Python and Node.js.

Presentations

Beautiful networks and network analytics made simpler with Jupyter Session

Performing network analytics with NetworkX and Jupyter often results in difficult-to-examine hairballs rather than useful visualizations. Meanwhile, more flexible tools like SigmaJS have high learning curves for people new to JavaScript. Daina Bouquin and John DeBlase share a simple, flexible architecture that can help create beautiful JavaScript networks without ditching the Jupyter Notebook.

Christine Doig is a senior product manager and data scientist at Anaconda Powered by Continuum Analytics. Christine has 8+ years of experience in analytics, operations research, and machine learning in a variety of industries, including energy, manufacturing, and banking. An open source advocate, she has spoken at PyData, EuroPython, SciPy, PyCon, OSCON, and many other open source conferences. Christine holds an MS in industrial engineering from the Polytechnic University of Catalonia in Barcelona.

Presentations

Data science encapsulation and deployment with Anaconda Project and JupyterLab (sponsored by Anaconda Powered by Continuum Analytics) Session

Christine Doig offers an overview of the Anaconda Project, an open source library created by Continuum Analytics that delivers lightweight, efficient encapsulation and portability of data science projects. A JupyterLab extension enables data scientists to install the necessary dependencies, download datasets, and set environment variables and deployment commands from a graphical interface.

Leveraging Jupyter to build an Excel-Python bridge Session

Christine Doig and Fabio Pliger explain how they built a commercial product on top of Jupyter to help Excel users access the capabilities of the rich data science Python ecosystem and share examples and use cases from a variety of industries that illustrate the collaborative workflow between analysts and data scientists that the application has enabled.

Nadia Eghbal works on community programs at GitHub, where she is building sustainability initiatives. Nadia explores how we can better support open source infrastructure, highlighting current gaps in funding and knowledge. She recently published Roads and Bridges: The Unseen Labor Behind Our Digital Infrastructure with support from the Ford Foundation. Nadia is based in San Francisco.

Presentations

Where money meets open source Keynote

We know money has an important role to play in open source, but where does it help and where does it fall short? Nadia Eghbal explores how money can support open source development without changing its incentives—especially when grants are involved.

Lori Eich is a senior product manager at Civis Analytics, where she works with engineers to build the Civis Data Science Platform. Lori’s background is in software engineering, derivatives trading, environmental consulting, and competitive fringe sports. She holds an SM and SB in Earth, atmospheric, and planetary sciences from the Massachusetts Institute of Technology.

Presentations

Jupyter notebooks and the road to enabling data-driven teams Session

It’s not enough just to give data scientists access to Jupyter notebooks in the cloud. Skipper Seabold and Lori Eich argue that to build truly data-driven organizations, everyone from data scientists and managers to business stakeholders needs to work in concert to bring data science out of the wilderness and into the core of decision-making processes.

Faisal Farooq is the principal scientist in the Watson Health group of IBM Watson, where he works on next-generation healthcare software to improve patient care. Faisal is an expert in applying machine learning in the healthcare domain. Previously, he was a senior key expert (distinguished scientist) at Siemens Healthcare, where he successfully delivered the most widely adopted data science product in US healthcare. Faisal has published a number of papers in multiple journals and at conferences in the areas of machine learning, handwriting, biometrics, and text analysis. He holds a PhD in computer science and engineering from the University at Buffalo, where he worked as a graduate research assistant in the Center of Excellence for Document Analysis and Recognition (CEDAR) and the Center for Unified Biometrics and Sensors (CUBS). He also completed multiple research internships at the IBM T.J. Watson Research Center.

Presentations

Data science made easy in Jupyter notebooks using PixieDust and InsightFactory Session

David Taieb, Prithwish Chakraborty, and Faisal Farooq offer an overview of PixieDust, a new open source library that speeds data exploration with interactive autovisualizations that make creating charts easy and fun.

Felipe Ferreira is a big data engineer at Globo.com, where he focuses on big data platform analytics using Hadoop and associated ecosystem tools. Felipe is an analytical, performance-focused engineer with over 12 years of experience in enterprise systems development and architectural design using JEE technology combined with the ability to drive user-centric solutions, define strategy, and lead data management.

Presentations

Accelerating data-driven culture at the largest media group in Latin America with Jupyter Session

JupyterHub is an important tool for research and data-driven decisions at Globo.com. Diogo Munaro Vieira and Felipe Ferreira explain how data scientists at Globo.com—the largest media group in Latin America and second largest television group in the world—use Jupyter notebooks for data analysis and machine learning, making decisions that impact 50 million users per month.

Brittany Fiore-Gartland is the director of data science ethnography at the eScience Institute and a research scientist in the Department of Human Centered Design and Engineering at the University of Washington, where she leads a research group that studies the sociocultural implications of data-intensive science and how data-intensive technologies are reshaping how people work and organize. Her research focuses on cross-sector and interdisciplinary data science collaborations, emerging pedagogical models for data science, and bringing a human-centered, sociotechnical, and ethical perspective to data science practice. Brittany coleads UW Data Science Studies, an interdisciplinary group of researchers studying the sociotechnical and ethical dimensions of the emerging practice of data science that is part of a collaborative and multisited working group supported through the Moore-Sloan Data Science Environments and in partnership with researchers at the Berkeley Institute for Data Science and the Center for Data Science at New York University. Whenever possible, Brittany’s work follows a model of action research, meaning her research practice aims to inform and effect positive change within the communities she studies. Often this takes the form of articulating the challenges and opportunities for communication and collaboration during times of technological change. She works with communities to bridge communication gaps and develop value-informed, reflexive, and adaptive organizational practices.

Presentations

Jupyter and the changing rituals around computation Session

The concept of the ritual is useful for thinking about how the core technology of Jupyter notebooks is extended through other tools, platforms, and practices. R. Stuart Geiger, Brittany Fiore-Gartland, and Charlotte Cabasse-Mazel share ethnographic findings about various rituals performed with Jupyter notebooks.

Jeremy Freeman is manager of computational biology at the Chan Zuckerberg Initiative, where he is helping develop efforts to support and accelerate basic research with tools for analysis, visualization, and collaborative sharing of data and knowledge. Previously, he ran a neuroscience research lab for several years. A scientist at the intersection of biology and technology, Jeremy wants to understand how biological systems work and use that understanding to benefit both human health and the design of intelligent systems. He is passionate about open source and open science and bringing scientists and engineers together across a range of fields.

Presentations

Making science happen faster Keynote

Modern biology is evolving quickly, but if we want to make our science more robust, more scalable, and more reproducible, the major bottleneck is computation. Jeremy Freeman offers an overview of a growing ecosystem of solutions to this challenge—many of which involve Jupyter—in the context of exciting scientific projects past, present, and future.

Tim Gasper is director of product and marketing at Bitfusion, a deep learning automation software company enabling easier, faster development of AI applications, and cofounder of Ponos, an IoT-enabled hydroponics farming technology company. Tim has over eight years of big data, IoT, and enterprise content product management and product marketing experience. He is a writer and speaker on entrepreneurship, the Lean Startup methodology, and big data analytics. Previously, Tim was global portfolio manager for CSC Big Data and Analytics, where he was responsible for the overall strategy, roadmap, partnerships, and technology mix for the big data and analytics product portfolio; VP of product at Infochimps (acquired by CSC), where he led product development for its market-leading open data marketplace and big data platform as a service; and cofounder of Keepstream, a social media analytics and curation company.

Presentations

Deep learning and Elastic GPUs using Jupyter Session

Combined with GPUs, Jupyter makes for fast development and fast execution, but it is not always easy to switch from a CPU execution context to GPUs and back. Tim Gasper and Subbu Rama share best practices for doing deep learning with Jupyter and explain how to work with CPUs and GPUs more easily by using Elastic GPUs and quick-switching between custom kernels.

Laurent Gautier is a scientific research lead at Verily Life Sciences (fka Google Life Sciences). Laurent’s work focuses on data science, visualization, machine learning, data mining, and prototyping software to understand molecular, cellular, and clinical data. He is the author of popular open source tools in bioinformatics and statistical programming for applications in healthcare, life sciences, and beyond and has contributed to or led a number of open source projects, including Bioconductor, affy, and rpy2.

Presentations

Data analysis in Jupyter notebooks with SQL, Python, and R Tutorial

Python is popular for data analysis, but restricting yourself to Python means missing a wealth of libraries or capabilities available in R or SQL. Laurent Gautier walks you through a pragmatic, reasonable, and good-looking polyglot approach, all thanks to R visualizations.

R. Stuart Geiger is an ethnographer and postdoctoral scholar at the Berkeley Institute for Data Science at UC Berkeley, where he studies the infrastructures and institutions that support the production of knowledge. He uses ethnographic, historical, qualitative, and quantitative methods in his research, which is grounded in the fields of computer-supported cooperative work, science and technology studies, and communication and new media studies. He holds a PhD from the UC Berkeley School of Information, where his research focused on the governance and operation of Wikipedia and scientific research networks. He has also studied newcomer socialization, moderation and quality control, specialization and professionalization, cooperation and conflict, the roles of support staff and technicians, and diversity and inclusion.

Presentations

Jupyter and the changing rituals around computation Session

The concept of the ritual is useful for thinking about how the core technology of Jupyter notebooks is extended through other tools, platforms, and practices. R. Stuart Geiger, Brittany Fiore-Gartland, and Charlotte Cabasse-Mazel share ethnographic findings about various rituals performed with Jupyter notebooks.

Brian Granger is an associate professor of physics and data science at Cal Poly State University in San Luis Obispo. Brian is a leader of the IPython project, cofounder of Project Jupyter, and an active contributor to a number of other open source projects focused on data science in Python. Recently, he cocreated the Altair package for statistical visualization in Python. He is an advisory board member of NumFOCUS and a faculty fellow of the Cal Poly Center for Innovation and Entrepreneurship.

Presentations

Jupyter frontends: From the classic Jupyter Notebook to JupyterLab, nteract, and beyond Session

Kyle Kelley and Brian Granger offer a broad look at Jupyter frontends, describing their common aspects and explaining how their differences help Jupyter reach a broader set of users. They also share ongoing challenges in building these frontends (real-time collaboration, security, rich output, different Markdown formats, etc.) as well as their ongoing work to address these questions.

JupyterLab: The next-generation Jupyter frontend Session

Brian Granger, Chris Colbert, and Ian Rose offer an overview of JupyterLab, which enables users to work with the core building blocks of the classic Jupyter Notebook in a more flexible and integrated manner.

Matt Greenwood is chief inspiration officer at Two Sigma, where he has led a number of company-wide efforts in engineering and modeling. Matt began his career at Bell Labs, working in the Operating Systems group under Dennis Ritchie, before moving to IBM Research, where he was responsible for a number of early efforts in tablet computing and distributed computing. Matt also served as lead developer and manager for a number of systems on the network element at Entrisphere, which created a product providing access equipment for broadband service providers, and created the Customer Engineering department in preparation for initial customer trials. Matt holds a BA and an MA in math from Oxford University, a master’s degree in theoretical physics from the Weizmann Institute of Science in Israel, and a PhD in mathematics from Columbia University, where he taught for a number of years.

Presentations

From Beaker to BeakerX Session

Matt Greenwood introduces BeakerX, a set of Jupyter Notebook extensions that enable polyglot data science, time series plotting and processing, research publication, and integration with Apache Spark. Matt reviews the Jupyter extension architecture and how BeakerX plugs into it, covers the current set of BeakerX capabilities, and discusses the pivot from Beaker, a standalone notebook, to BeakerX.

Jason Grout is a Jupyter developer at Bloomberg, working primarily on JupyterLab and the interactive widget system. Previously, Jason was an assistant professor of mathematics at Drake University in Des Moines, Iowa. Jason co-organizes the PyDataNYC Meetup. He has also been a major contributor to the open source Sage mathematical software system for many years. He holds a PhD in mathematics from Brigham Young University.

Presentations

Jupyter widgets: Interactive controls for Jupyter Tutorial

Jupyter widgets allow you to build user interfaces with graphical controls inside a Jupyter notebook and provide a framework for building custom controls. Sylvain Corlay and Jason Grout demonstrate how to use Jupyter widgets effectively for interactive computing, explore the ecosystem of custom controls, and walk you through building your own control.

JupyterLab tutorial Tutorial

Steven Silvester and Jason Grout lead a walkthrough of JupyterLab as a user and as an extension author, explore its capabilities, and offer a demonstration of how to create a simple extension to the environment.

Mark Hahnel is the founder of figshare, an open data tool that allows researchers to publish all of their data in a citable, searchable, and sharable manner. Mark is passionate about open science and the potential it has to revolutionize the research community. He’s fresh out of academia, having just completed his PhD in stem cell biology at Imperial College London. Mark also studied genetics in Newcastle and Leeds.

Presentations

Closing the gap between Jupyter and academic publishing Session

Reports of a lack of reproducibility have led funders and others to require open data and code as the outputs of research they fund. Mark Hahnel and Marius Tulbure discuss the opportunities for Jupyter notebooks to be the final output of academic research, arguing that Jupyter could help disrupt the inefficiencies in cost and scale of open access academic publishing.

Jess Hamrick is a PhD candidate at UC Berkeley. Her research studies how people use imagination to solve problems and reason about the world and how to apply those ideas to machine learning and artificial intelligence. Jess is a member of the Jupyter Steering Council and is the lead maintainer of nbgrader, an open source tool for creating and grading assignments in the Jupyter Notebook. She holds a BS and an MEng in computer science from MIT.

Presentations

The Jupyter Notebook as document: From structure to application Session

M Pacer, Jess Hamrick, and Damián Avila explain how the structured nature of the notebook document format, combined with native tools for manipulation and creation, allows the notebook to be used across a wide range of domains and applications.

Lindsey Heagy is a PhD candidate at the University of British Columbia studying numerical geophysics. Her work focuses on using electromagnetic geophysics for monitoring subsurface injections, including carbon capture and storage and hydraulic fracturing. She is a project lead on GeoSci.xyz, an effort to build collaborative, interactive, web-based textbooks in the geosciences, and a core contributor to SimPEG, an open source framework for geophysical simulation and inversions.

Presentations

Deploying a reproducible course Session

Web-based textbooks and interactive simulations built in Jupyter notebooks provide an entry point for course participants to reproduce content they are shown and dive into the code used to build them. Lindsey Heagy and Rowan Cockett share strategies and tools for developing an educational stack that emerged from the deployment of a course on geophysics and some lessons learned along the way.

Paul Ivanov is a senior software engineer at Bloomberg LP working on IPython- and Jupyter-related open source projects. Previously, Paul worked on backend and data engineering at Disqus; was a code monkey at the Brain Imaging Center at UC Berkeley, where he worked on IPython and taught at UC Berkeley’s Python bootcamps; worked in Bruno Olshausen’s lab at the Redwood Center for Theoretical Neuroscience; and was a PhD candidate in the Vision Science program at UC Berkeley. He holds a degree in computer science from UC Davis.

Presentations

Jupyter: Kernels, protocols, and the IPython reference implementation Session

Matthias Bussonnier and Paul Ivanov walk you through the current Jupyter architecture and protocol and explain how kernels work (decoupled from but in communication with the environment for input and output, such as a notebook document). Matthias and Paul also offer an overview of a number of kernels developed by the community and show you how you can get started writing a new kernel.

Kari Jordan is the deputy director of assessment for Data Carpentry and an advocate for improving diversity in data science. Previously, Kari was a postdoctoral fellow at Embry-Riddle Aeronautical University, where her research focus was evidence-based instructional practices among STEM faculty. Kari served on the board of directors for the National Society of Black Engineers (NSBE) for three years. A product of the Detroit Public School system, Kari holds a BS and an MS in mechanical engineering from Michigan Technological University and a PhD in engineering education from the Ohio State University. During her education, she interned with Marathon Petroleum Company, SC Johnson, Ford Motor Company, and Educational Testing Services. As a graduate student, she received fellowships from the National Society of Black Engineers (NSBE), King-Chavez-Parks Initiative, and the National GEM Consortium.

Presentations

Learning to code isn’t enough: Training as a pathway to improve diversity Session

Diversity can be achieved through sharing information among members of a community. Jupyter prides itself on being a community of dynamic developers, cutting-edge scientists, and everyday users, but is our platform being shared with diverse populations? Kari Jordan explains how training has the potential to improve diversity and drive usage of Jupyter notebooks in broader communities.

Wendy Kan is a data scientist at Kaggle, the largest global data science community, where she works with companies and organizations to transform their data into machine learning competitions. Previously, Wendy was a software engineer and researcher. She holds BS and MS degrees in electrical engineering and a PhD in biomedical engineering.

Presentations

Lessons learned from tens of thousands of Kaggle notebooks Session

Kaggle Kernels, an in-browser code execution environment that includes a version of Jupyter Notebooks, has allowed Kaggle to flourish in new ways. Drawing on a diverse repository of user-created notebooks paired with competitions and public datasets, Megan Risdal and Wendy Chih-wen Kan explain how Kernels has impacted machine learning trends, collaborative data science, and learning.

Kyle Kelley is a senior software engineer at Netflix, a maintainer on nteract.io, and a core developer of the IPython/Jupyter project. He wants to help build great environments for collaborative analysis, development, and production workloads for everyone, from small teams to massive scale.

Presentations

Jupyter at Netflix Session

So, Netflix's data scientists and engineers. . .do they know things? Join Kyle Kelley to find out. Kyle explores how Netflix uses Jupyter and explains how you can learn from Netflix's experience to enable analysts at your organization.

Jupyter frontends: From the classic Jupyter Notebook to JupyterLab, nteract, and beyond Session

Kyle Kelley and Brian Granger offer a broad look at Jupyter frontends, describing their common aspects and explaining how their differences help Jupyter reach a broader set of users. They also share ongoing challenges in building these frontends (real-time collaboration, security, rich output, different Markdown formats, etc.) as well as their ongoing work to address these questions.

Meet the Expert with Kyle Kelley (Netflix) Meet the Experts

Kyle is happy to talk with people about how Netflix’s data platform uses and deploys the backing infrastructure for Jupyter, what it’s like to build frontends for Jupyter, and where to move Jupyter forward to meet current and future needs.

Chris Kotfila is an R&D engineer at Kitware. Chris’s research interests are in natural language processing, machine learning, knowledge organization, and geographic information science. He holds dual degrees in computer science and philosophy from Rensselaer Polytechnic Institute and a master’s degree in library science, where he focused on issues of open access, scholarly communication, and reproducible research. During his time at RPI, he worked regularly as a research programmer in the area of computational cognitive engineering. Chris also served overseas with the US Peace Corps. He is an avid open source enthusiast and a hopeless Emacs user.

Presentations

GeoNotebook: An extension to the Jupyter Notebook for exploratory geospatial analysis Session

Chris Kotfila offers an overview of the GeoNotebook extension to the Jupyter Notebook, which provides interactive visualization and analysis of geospatial data. Unlike other geospatial extensions to the Jupyter Notebook, GeoNotebook includes a fully integrated tile server providing easy visualization of vector and raster data formats.

Aaron Kramer is a data scientist and engineer at DataScience.com, where he builds powerful language and engagement models using natural language processing, deep learning, Bayesian inference, and machine learning.

Presentations

Interactive natural language processing with SpaCy and Jupyter Tutorial

Modern natural language processing (NLP) workflows often require interoperability between multiple tools. Aaron Kramer offers an introduction to interactive NLP with SpaCy within the Jupyter Notebook, covering core NLP concepts, core workflows in SpaCy, and examples of interacting with other tools, such as TensorFlow, NetworkX, and LIME, as part of interactive NLP projects.

Michael Lanzetta is a principal software development engineer on the Partner Catalyst team at Microsoft, where his current work ranges from implementing binary protocols in JavaScript to training domain-specific image classification convolutional neural networks. He works with everyone from small startups to large enterprise customers—anyone doing innovative work that is stretching the Microsoft stack (and in particular Azure) beyond its current limits. Michael is currently the head of the Machine Learning technical working group in DX, helping upskill Microsoft’s field in ML and deep learning and leading efforts to bring Microsoft’s suite of ML technologies to the aid of its partners, both large and small. Previously, Michael worked on Microsoft Live Search and Windows Mobile Services, Bing Travel, MSN and MSN Mobile, and FUSE Labs.

Presentations

Notebook narratives from industry: Inspirational real-world examples and reusable industry notebooks Session

Patty Ryan, Lee Stott, and Michael Lanzetta explore four industry examples of Jupyter notebooks that illustrate innovative applications of machine learning in manufacturing, retail, services, and education and share four reference industry Jupyter notebooks (available in both Python and R)—along with demo datasets—for practical application to your specific industry value areas.

Ryan Lovett manages research and instructional computing for the Department of Statistics at UC Berkeley and is a member of the Data Science Education Program’s infrastructure team. He is most often a sysadmin, though he also enjoys programming and consulting with faculty and students.

Presentations

Deploying JupyterHub for students and researchers Tutorial

JupyterHub, a multiuser server for Jupyter notebooks, enables you to offer a notebook server to everyone in a group—which is particularly useful when teaching a course, as students no longer need to install software on their laptops. Min Ragan-Kelley, Carol Willing, Yuvi Panda, and Ryan Lovett get you started deploying and customizing JupyterHub for your needs.

Managing a 1,000+ student JupyterHub without losing your sanity Session

The UC Berkeley Data Science Education program uses Jupyter notebooks on a JupyterHub. Ryan Lovett and Yuvi Panda outline the DevOps principles that keep the largest reported educational hub (with 1,000+ users) stable and performant while enabling all the features instructors and students require.

Johan Mabille is a scientific software developer at QuantStack, where he specializes in high-performance computing in C++. Previously, Johan was a quant developer at HSBC. An open source developer, Johan is the coauthor of xtensor and xeus and the main author of xsimd. He holds a master’s degree in computer science from Centrale-Supelec.

Presentations

Xeus: A framework for writing native Jupyter kernels Session

Xeus takes on the burden of implementing the Jupyter kernel protocol so that kernel authors can focus on implementing the language-specific part of the kernel and more easily support features such as autocomplete or interactive widgets. Sylvain Corlay and Johan Mabille showcase a new C++ kernel based on the Cling interpreter built with xeus.

Ali Marami holds a PhD in finance from the University of Neuchâtel in Switzerland and a BS in electrical engineering. He has extensive experience in financial and quantitative modeling and model risk management at several US banks. He is the chief data scientist and one of the founders of R-Brain, a platform for developing, sharing, and promoting models and applications in data science.

Presentations

Building a powerful data science IDE for R, Python, and SQL using JupyterLab Session

JupyterLab provides a robust foundation for building flexible computational environments. Ali Marami explains how R-Brain leveraged the JupyterLab extension architecture to build a powerful IDE for data scientists, one of the few tools on the market that supports R and Python equally in data science and includes features such as IntelliSense, debugging, and environment and data views.

Yoshi Nobu Masatani is a project researcher at the National Institute of Informatics, an interuniversity research institute for information and systems, where he is responsible for the design and operation of the academic cloud within NII. He has a broad range of experience with OSS-based enterprise infrastructure deployments and operations with mission-critical high-availability systems and big data clusters. Previously, Nobu was a senior specialist and manager of OSS professional services within NTT Data Corp.

Presentations

Collaboration and automated operation as literate computing for reproducible infrastructure Session

Jupyter is useful for DevOps. It enables collaboration between experts and novices to accumulate infrastructure knowledge, while automation via notebooks enhances traceability and reproducibility. Yoshi Nobu Masatani shows how to combine Jupyter with Ansible for reproducible infrastructure and explores knowledge, workflow, and customer support as literate computing practices.

Meet the Expert with Yoshi Nobu Masatani (National Institute of Informatics) Meet the Experts

Interested in literate computing for reproducibility and nblineage? Or understanding the notebook lifecycle and the consequences of computational narratives? Grab this opportunity to meet Nobu.

Wes McKinney is a software architect at Two Sigma Investments. He is the creator of Python’s pandas library and a PMC member for Apache Arrow and Apache Parquet. He wrote the book Python for Data Analysis. Previously, Wes worked for Cloudera and was the founder and CEO of DataPad.

Presentations

Data science without borders Keynote

Wes McKinney makes the case for a shared infrastructure for data science, discusses the open source community's efforts on Apache Arrow, and offers a vision for seamless computation and data sharing across languages.

Romain Menegaux is a researcher on the Quantitative Financial Research team at Bloomberg LP, where he develops derivatives pricing models and applies machine learning to a variety of financial problems. Romain is one of the main developers of bqplot and an occasional contributor to ipywidgets.

Presentations

Building interactive applications and dashboards in the Jupyter Notebook (sponsored by Bloomberg) Session

Romain Menegaux and Chakri Cherukuri demonstrate how to develop advanced applications and dashboards using open source projects, illustrated with examples in machine learning, finance, and neuroscience.

William Merchan is chief strategy officer at DataScience.com, where he leads business and corporate development, partner initiatives, and strategy. Previously, he served as senior vice president of strategic alliances and general manager of dynamic pricing at MarketShare, where he oversaw global business development and partner relationships and successfully led the company to a $450 million acquisition by Neustar.

Presentations

Three movements driving enterprise adoption of Jupyter (sponsored by DataScience.com) Keynote

William Merchan outlines the fundamental trends driving the adoption of Jupyter and shares lessons learned deploying Jupyter in large organizations. Join in to learn best practices in developing a high-performing data science team and moving data science to the core and discover where data science platforms fit in.

Daniel Mietchen is a biophysicist interested in integrating research workflows with the World Wide Web, particularly through open licensing, open standards, public version histories, and forkability. With research activities spanning from the subcellular to the organismic level, from fossils to developing embryos and from insect larvae to elephants, he has experienced multiple shades of the research cycle and a variety of approaches to collaboration and sharing in research contexts. He has also been contributing to Wikipedia and its sister projects for more than a decade and is actively engaged in increasing the interactions between the Wikimedia and research communities.

Presentations

Postpublication peer review of Jupyter notebooks referenced in articles on PubMed Central Session

Jupyter notebooks are a popular option for sharing data science workflows. Daniel Mietchen shares best practices for reproducibility and other aspects of usability (documentation, ease of reuse, etc.) gleaned from analyzing Jupyter notebooks referenced in PubMed Central, an ongoing project that started at a hackathon earlier this year and is being documented on GitHub.

Christian Moscardi is director of technology for the Data Incubator. Previously, Christian developed a CMS for food blogs, worked for Google, and researched and taught at Columbia. He organizes with BetaNYC, New York’s civic tech organization, and contributes to various civic data projects. His extracurricular activities include cooking, playing the piano, and exploring New York.

Presentations

Practical machine learning with the Jupyter Notebook 2-Day Training

Christian Moscardi walks you through developing a machine learning pipeline, from prototyping to production, with the Jupyter platform, exploring data cleaning, feature engineering, model building and evaluation, and deployment in an industry-focused setting. Along the way, you'll learn Jupyter best practices and the Jupyter settings and libraries that enable great visualizations.

Teaching from Jupyter notebooks Session

Christian Moscardi shares the practical solutions developed at the Data Incubator for using Jupyter notebooks for education. Christian explores some of the open source Jupyter extensions he has written to improve the learning experience as well as tools to clean notebooks before they are committed to version control.

Andreas Müller is a lecturer at the Data Science Institute at Columbia University and author of Introduction to Machine Learning with Python (O’Reilly), which describes a practical approach to machine learning with Python and scikit-learn. His mission is to create open tools to lower the barrier of entry for machine learning applications, promote reproducible science, and democratize the access to high-quality machine learning algorithms. Andreas is one of the core developers of the scikit-learn machine learning library and has been comaintaining it for several years. He is also a Software Carpentry instructor. Previously, he worked at the NYU Center for Data Science on open source and open science and as a machine learning scientist at Amazon.

Presentations

Data analysis and machine learning in Jupyter Tutorial

Andreas Müller walks you through a variety of real-world datasets using Jupyter notebooks together with the data analysis packages pandas, seaborn, and scikit-learn. You'll perform an initial assessment of data, deal with different data types, visualization, and preprocessing, and build predictive models for tasks such as health care and housing.

Meet the Expert with Andreas Müller (Columbia University) Meet the Experts

Do you have questions on general machine learning or maybe something a little more specific, like Python tools for machine learning, accessible machine learning and data science, automatic machine learning, or scikit-learn? Andreas is a great resource; stop by for a chat.

Writing (and publishing) a book written in Jupyter notebooks Session

The Jupyter Notebook can combine narrative, code, and graphics—the ideal combination for teaching anything programming related. That's why Andreas Müller chose to write his book, Introduction to Machine Learning with Python, in a Jupyter notebook. However, going from notebook to book was not easy. Andreas shares challenges and tricks for converting notebooks for print.

Diogo Munaro Vieira is a big data engineer at Globo.com. He is experienced in web development, P2P networking, collaborative systems, recommendation systems, open source software, and business intelligence. Diogo holds a bachelor’s degree in biological science and bioinformatics and a master’s degree in artificial intelligence from Universidade Federal do Rio de Janeiro.

Presentations

Accelerating data-driven culture at the largest media group in Latin America with Jupyter Session

JupyterHub is an important tool for research and data-driven decisions at Globo.com. Diogo Munaro Vieira and Felipe Ferreira explain how data scientists at Globo.com—the largest media group in Latin America and second largest television group in the world—use Jupyter notebooks for data analysis and machine learning, making decisions that impact 50 million users per month.

Justin Nand is a solutions engineer at Zymergen. Justin has a background in bioengineering and has worked as a software engineer developing tools for research and computational biology.

Presentations

Using Jupyter at the intersection of robots and industrial biology Session

Zymergen approaches biology with an engineering and data-driven mindset. Its platform integrates robotics, software, and biology to deliver predictability and reliability during strain design and development. Marc Colangelo, Justin Nand, and Danielle Chou explain the integral role Jupyter notebooks play in providing a shared Python environment between Zymergen's software engineers and scientists.

Paco Nathan leads the Learning group at O’Reilly Media. Known as a “player/coach” data scientist, Paco led innovative data teams building ML apps at scale for several years and more recently was evangelist for Apache Spark, Apache Mesos, and Cascading. Paco has expertise in machine learning, distributed systems, functional programming, and cloud computing with 30+ years of tech-industry experience, ranging from Bell Labs to early-stage startups. Paco is an advisor for Amplify Partners and was cited in 2015 as one of the top 30 people in big data and analytics by Innovation Enterprise. He is the author of Just Enough Math, Intro to Apache Spark, and Enterprise Data Workflows with Cascading.

Presentations

Humans in the loop: Jupyter notebooks as a frontend for AI pipelines at scale Session

Paco Nathan reviews use cases where Jupyter provides a frontend to AI as the means for keeping humans in the loop. This process enhances the feedback loop between people and machines, and the end result is that a smaller group of people can handle a wider range of responsibilities for building and maintaining a complex system of automation.

Meet the Expert with Paco Nathan (O'Reilly Media) Meet the Experts

Paco will be available to discuss using Jupyter notebooks in media for publishing computable content, coaching authors to be more effective with Jupyter notebooks, and managing machine learning pipelines with Jupyter notebooks for active learning and human-in-the-loop design patterns.

Andrew Odewahn is the CTO of O’Reilly Media, where he helps define and create the new products, services, and business models that will help O’Reilly continue to make the transition to an increasingly digital future. The author of two books on database development, he has experience as a software developer and consultant in a number of industries, including manufacturing, pharmaceuticals, and publishing. Andrew holds an MBA from New York University and a degree in computer science from the University of Alabama. He’s also thru-hiked the Appalachian Trail from Georgia to Maine.

Presentations

Friday opening welcome Keynote

Program chairs Fernando Pérez and Andrew Odewahn open the second day of keynotes.

Jupyter at O'Reilly Keynote

For almost five years, O’Reilly Media has centered its publishing processes around tools like Jupyter, Git, GitHub, Docker, and a host of open source packages. Andrew Odewahn explores how O'Reilly is using the Jupyter architecture to create the next generation of technical content and offers a preview of what's in store for the future.

Thursday opening welcome Keynote

Program chairs Andrew Odewahn and Fernando Pérez open the first day of keynotes.

M Pacer is a Jupyter core developer at the Berkeley Institute for Data Science (BIDS) focusing on the intersection between Jupyter and scientific publishing (with an eye toward constructing a total scientific record that is more amenable to machine learning techniques). M holds a PhD from UC Berkeley, where his research used machine learning and human experiments to study causal explanation and causal inference, and a BS from Yale University.

Presentations

The Jupyter Notebook as document: From structure to application Session

M Pacer, Jess Hamrick, and Damián Avila explain how the structured nature of the notebook document format, combined with native tools for manipulation and creation, allows the notebook to be used across a wide range of domains and applications.

Yuvi Panda is infrastructure lead for the Data Science Education Program at UC Berkeley, where he works on scaling JupyterHub for use by thousands of students. A programmer and DevOps engineer, he wants to make it easy for people who don’t traditionally consider themselves programmers to do things with code and builds tools (Quarry, PAWS, etc.) to sidestep the list of historical accidents that constitute the “command-line tax” that people have to pay before doing productive things with computing. He’s a core member of the JupyterHub team and works on mybinder.org as well. Yuvi is also a Wikimedian, since you can check out of Wikimedia, but you can never leave.

Presentations

Democratizing access to open data by providing open computational infrastructure Session

Open data by itself is not enough. You need open computational infrastructures as well. Yuvi Panda offers an overview of a volunteer-led open knowledge movement that makes all of its data available openly and explores the free, open, and public computational infrastructure recently set up for people to play with and build things on its data (using a JupyterHub deployment).

Deploying JupyterHub for students and researchers Tutorial

JupyterHub, a multiuser server for Jupyter notebooks, enables you to offer a notebook server to everyone in a group—which is particularly useful when teaching a course, as students no longer need to install software on their laptops. Min Ragan-Kelley, Carol Willing, Yuvi Panda, and Ryan Lovett get you started deploying and customizing JupyterHub for your needs.

Managing a 1,000+ student JupyterHub without losing your sanity Session

The UC Berkeley Data Science Education program uses Jupyter notebooks on a JupyterHub. Ryan Lovett and Yuvi Panda outline the DevOps principles that keep the largest reported educational hub (with 1,000+ users) stable and performant while enabling all the features instructors and students require.

Hilary Parker is a data scientist at Stitch Fix and cofounder of the Not So Standard Deviations podcast. Hilary focuses on R, experimentation, and rigorous analysis development methods such as reproducibility. Previously, she was a senior data analyst at Etsy. Hilary holds a PhD in biostatistics from the Johns Hopkins Bloomberg School of Public Health. Hilary can be found on Twitter at @hspter.

Presentations

Opinionated analysis development Session

Traditionally, statistical training has focused on statistical methods and tests, without addressing the process of developing a technical artifact, such as a report. Hilary Parker argues that it's critical to teach students how to go about developing an analysis so they avoid common pitfalls and explains why we must adopt a blameless postmortem culture to address these pitfalls as they occur.

Fernando Pérez is a staff scientist at Lawrence Berkeley National Laboratory and a founding investigator of the Berkeley Institute for Data Science at UC Berkeley, created in 2013. His research focuses on creating tools for modern computational research and data science across domain disciplines, with an emphasis on high-level languages, interactive and literate computing, and reproducible research. He created IPython while a graduate student in 2001 and continues to lead its evolution into Project Jupyter, now as a collaborative effort with a talented team that does all the hard work. Fernando regularly lectures about scientific computing and data science and is a member of the Python Software Foundation, a founding member of NumFOCUS, and a National Academy of Sciences Kavli Frontiers of Science Fellow. He is also the recipient of the 2012 Award for the Advancement of Free Software from the Free Software Foundation. Fernando holds a PhD in particle physics from the University of Colorado at Boulder, which he followed with postdoctoral research in applied mathematics and developing numerical algorithms.

Presentations

Friday opening welcome Keynote

Program chairs Fernando Pérez and Andrew Odewahn open the second day of keynotes.

Project Jupyter: From interactive Python to open science Keynote

Fernando Pérez opens JupyterCon with an overview of Project Jupyter, describing how it fits into a vision of collaborative, community-based open development of tools applicable to research, education, and industry.

Thursday opening welcome Keynote

Program chairs Andrew Odewahn and Fernando Pérez open the first day of keynotes.

Fabio Pliger is the technical lead for Anaconda Fusion and a Bokeh core developer at Anaconda Powered by Continuum Analytics, where he also worked on the XDATA DARPA and on customer projects. Fabio has 14+ years of experience in Python applied to both highly regulated enterprise and open source. He has been an open source and Python advocate for many years and has spoken at many tech conferences around the world. He is a former chairman of the EuroPython Society, cochair of the EuroPython Conference and PyCon Italy, and cofounder of the Python Italia Association. Fabio holds a bachelor’s degree in computer science from the University of Verona, Italy.

Presentations

Leveraging Jupyter to build an Excel-Python bridge Session

Christine Doig and Fabio Pliger explain how they built a commercial product on top of Jupyter to help Excel users access the capabilities of the rich data science Python ecosystem and share examples and use cases from a variety of industries that illustrate the collaborative workflow between analysts and data scientists that the application has enabled.

Cheryl Quah is a senior software engineer at Bloomberg LP, where she develops applications to improve financial professionals’ research and investment workflows.

Presentations

Industry and open source: Working together to drive advancements in Jupyter for quants and data scientists Session

Strong partnerships between the open source community and industry have driven many recent developments in Jupyter. Srinivas Sunkara and Cheryl Quah discuss the results of some of these collaborations, including JupyterLab, bqplot, and enhancements to ipywidgets that greatly enrich Jupyter as an environment for data science and quantitative financial research.

Min Ragan-Kelley is a postdoctoral fellow at Simula Research Lab in Oslo, Norway. Min has been contributing to IPython and Jupyter since 2006 (full-time since 2013). His areas of focus include the underlying infrastructure of Jupyter and deployment tools and services, such as JupyterHub and nbviewer.

Presentations

Deploying JupyterHub for students and researchers Tutorial

JupyterHub, a multiuser server for Jupyter notebooks, enables you to offer a notebook server to everyone in a group—which is particularly useful when teaching a course, as students no longer need to install software on their laptops. Min Ragan-Kelley, Carol Willing, Yuvi Panda, and Ryan Lovett get you started deploying and customizing JupyterHub for your needs.

JupyterHub: A roadmap of recent developments and future directions Session

JupyterHub is a multiuser server for Jupyter notebooks. Min Ragan-Kelley and Carol Willing discuss exciting recent additions and future plans for the project, including the ability to share notebooks with students and collaborators.

Subbu Rama is cofounder and CEO at Bitfusion, a company providing tools to make deep learning and AI application development and infrastructure management faster and easier. Previously, Subbu held various roles at Intel, leading engineering efforts spanning design, automation, validation, and postsilicon. He worked on Intel’s first integrated-graphics CPU, Intel’s first low-power CPU, Atom, and its SOC, high-performance microservers (Intel’s first initiative on microservers using low-power mobile phone processors), and Xeon servers. Subbu also led Dell Innovation Labs, driving innovation and skunk works, later overseeing numerous new strategic technology initiatives at the intersection of software and the cloud. There he built an engineering team from the ground up and launched Dell’s first cloud infrastructure marketplace.

Presentations

Deep learning and Elastic GPUs using Jupyter Session

Combined with GPUs, Jupyter makes for fast development and fast execution, but it is not always easy to switch from a CPU execution context to GPUs and back. Tim Gasper and Subbu Rama share best practices for doing deep learning with Jupyter and explain how to work with CPUs and GPUs more easily by using Elastic GPUs and quick-switching between custom kernels.

Bernie Randles is a graduate student in the Information Studies program at UCLA. Her work is centered around knowledge creation in astronomy, specifically examining astronomers’ data and software pipeline practices. She also researches the use of open source software in scientific research organizations, primarily in data-rich and computationally intensive fields. Previously, Bernie worked in IT (wearing many hats, some red) at several colleges and universities. She holds degrees in math, computer science, and fine arts.

Presentations

Citing the Jupyter Notebook in the scientific publication process Session

Although researchers have traditionally cited code and data related to their publications, they are increasingly using the Jupyter Notebook to share the processes involved in the act of scientific inquiry. Bernie Randles and Catherine Zucker explore various aspects of citing Jupyter notebooks in publications, discussing benefits, pitfalls, and best practices for creating the "paper of the future."

Megan Risdal is a marketing manager at Kaggle. She holds master’s degrees in linguistics from the University of California, Los Angeles, and North Carolina State University. Her curiosities lie at the intersection of data, science, language, and learning.

Presentations

Lessons learned from tens of thousands of Kaggle notebooks Session

Kaggle Kernels, an in-browser code execution environment that includes a version of Jupyter Notebooks, has allowed Kaggle to flourish in new ways. Drawing on a diverse repository of user-created notebooks paired with competitions and public datasets, Megan Risdal and Wendy Chih-wen Kan explain how Kernels has impacted machine learning trends, collaborative data science, and learning.

Mac Rogers is a research engineer at Domino Data Lab, where he helps teams at enterprise companies, nonprofits, and universities accelerate research and integrate predictive models into their business using Domino’s data science platform. Previously, Mac was an investment engineer focusing on equity research systemization at systematic investment management firm Bridgewater Associates.

Presentations

Reproducible dashboards and other great things to do with Jupyter (sponsored by Domino Data Lab) Session

Mac Rogers shares best practices for creating Jupyter dashboards and some lesser-known tricks for making Jupyter dashboards interactive and attractive.

Ian Rose is a postdoctoral fellow at the Berkeley Institute for Data Science, where he works on the Jupyter Project. He holds a PhD in geology from UC Berkeley, where his research focused on the physics of the deep Earth.

Presentations

JupyterLab: The next-generation Jupyter frontend Session

Brian Granger, Chris Colbert, and Ian Rose offer an overview of JupyterLab, which enables users to work with the core building blocks of the classic Jupyter Notebook in a more flexible and integrated manner.

Philipp Rudiger is a software developer at Anaconda Powered by Continuum Analytics, where he develops open source and client-specific software solutions for data management, visualization, and analysis. Philipp holds a PhD in computational modeling of the visual system.

Presentations

Deploying interactive Jupyter dashboards for visualizing hundreds of millions of datapoints, in 30 lines of Python Tutorial

It can be difficult to assemble the right set of packages from the Python scientific software ecosystem to solve complex problems. James Bednar and Philipp Rudiger walk you step by step through making and deploying a concise, fast, and fully reproducible recipe for interactive visualization of millions or billions of data points using very few lines of Python in a Jupyter notebook.

Patty Ryan leads prototyping engagements with partners, both large and small, on the Technology Evangelism and Development team at Microsoft. She specializes in designing and operationalizing predictive models that inform strategies, focus customer outreach, and increase engagement. Previously, Patty led telemetry, analytics, UX, and support in Dynamics, Azure Identity, and O365, driving innovation in customer-facing self-service and distributed analytics.

Presentations

Notebook narratives from industry: Inspirational real-world examples and reusable industry notebooks Session

Patty Ryan, Lee Stott, and Michael Lanzetta explore four industry examples of Jupyter notebooks that illustrate innovative applications of machine learning in manufacturing, retail, services, and education and share four reference industry Jupyter notebooks (available in both Python and R)—along with demo datasets—for practical application to your specific industry value areas.

Zach Sailer is a graduate student at the Harms Lab at the University of Oregon, where he studies the mechanisms that shape protein evolution from a biophysical perspective. Previously, he was a core developer for the IPython/Jupyter team at Cal Poly San Luis Obispo. Zach has created and contributed to various scientific open source projects and is also a strong advocate for open science, working hard to promote and practice open science in all aspects of his research.

Presentations

How Jupyter makes experimental and computational collaborations easy Session

Scientific research thrives on collaborations between computational and experimental groups, who work together to solve problems using their separate expertise. Zach Sailer highlights how tools like the Jupyter Notebook, JupyterHub, and ipywidgets can be used to make these collaborations smoother and more effective.

Scott Sanderson is a senior software engineer at Quantopian, where he is responsible for the design and implementation of Quantopian’s backtesting and research APIs. Within the Jupyter ecosystem, most of Scott’s work focuses on enhancing the extensibility of the Jupyter Notebook for use in large deployments.

Presentations

Building a notebook platform for 100,000 users Session

Scott Sanderson describes the architecture of the Quantopian Research Platform, a Jupyter Notebook deployment serving a community of over 100,000 users, explaining how, using standard extension mechanisms, it provides robust storage and retrieval of hundreds of gigabytes of notebooks, integrates notebooks into an existing web application, and enables sharing notebooks between users.

Kaz Sato is a staff developer advocate on Google’s Cloud Platform team, where he focuses on machine learning and data analytics products, such as TensorFlow, Cloud ML, and BigQuery. Kaz has also led and supported developer communities for Google Cloud for over eight years. He has been an invited speaker at events including Google Cloud Next ’17 SF, Google I/O 2016 and 2017, the 2017 Strata Data Conference in London, the 2016 Strata + Hadoop World in San Jose and NYC, the 2016 Hadoop Summit, and ODSC East 2016 and 2017. Kaz is also interested in hardware and the IoT and has been hosting FPGA meetups since 2013.

Presentations

Cloud Datalab: Jupyter with the power of BigQuery and TensorFlow Session

Kazunori Sato explains how you can use Google Cloud Datalab—a Jupyter environment from Google that integrates BigQuery, TensorFlow, and other Google Cloud services seamlessly—to easily run SQL queries from Jupyter to access terabytes of data in seconds and train a deep learning model with TensorFlow with tens of GPUs in the cloud, with all the usual tools available on Jupyter.

Robert Schroll is a data scientist in residence at the Data Incubator. Previously, he held postdocs in Amherst, Massachusetts, and Santiago, Chile, where he realized that his favorite parts of his job were teaching and analyzing data. He made the switch to data science and has been at the Data Incubator since. Robert holds a PhD in physics from the University of Chicago.

Presentations

Machine learning with TensorFlow and Jupyter 2-Day Training

Robert Schroll introduces TensorFlow's capabilities through its Python interface with a series of Jupyter notebooks, moving from building machine learning algorithms piece by piece to using the higher-level abstractions provided by TensorFlow. You'll then use this knowledge to build and visualize machine learning models on real-world data.

Skipper Seabold is the director of data science at data science technology and advisory firm Civis Analytics, where he drives the direction of the Civis Data Science Platform and pushes the capabilities of solutions that Civis can provide to its clients. Skipper is an economist by training and has a decade of experience working in the Python data open source community. He started and led the statsmodels Python project, was formerly on the core pandas team, and has contributed to many projects in the Python data stack.

Presentations

Jupyter notebooks and the road to enabling data-driven teams Session

It’s not enough just to give data scientists access to Jupyter notebooks in the cloud. Skipper Seabold and Lori Eich argue that to build truly data-driven organizations, everyone from data scientists and managers to business stakeholders needs to work in concert to bring data science out of the wilderness and into the core of decision-making processes.

Leah Silen has been the executive director of NumFOCUS from its beginning and worked with the founding board members to write the application for NumFOCUS’s nonprofit status. Previously, Leah was a public relations and program director in the nonprofit sector, where she focused on community relations and fundraising. Leah has volunteered and sat on several boards of nonprofit organizations.

Presentations

Empower scientists; save humanity: NumFOCUS—Five years in, five hundred thousand to go Session

What do the discovery of the Higgs boson, the landing of the Philae robot, the analysis of political engagement, and the freedom of human trafficking victims have in common? NumFOCUS projects were there. Join Leah Silen and Andy Terrel to learn how we can empower scientists and save humanity.

Steven Silvester is a software engineer at Anaconda Powered by Continuum Analytics, where he works on Project Jupyter and JupyterLab, a next-generation user interface for the Jupyter Notebook. He has also written kernels for Octave, MATLAB, and Scilab. Previously, Steven served 10 years in the US Air Force.

Presentations

JupyterLab tutorial Tutorial

Steven Silvester and Jason Grout lead a walkthrough of JupyterLab as a user and as an extension author, explore its capabilities, and offer a demonstration of how to create a simple extension to the environment.

Raj Singh is a developer advocate for IBM Cloud Data Services. Raj pioneered web mapping as a service in the late 1990s with his startup Syncline. Previously, Raj worked on geospatial data interoperability challenges for the Open Geospatial Consortium, an international standards body. He’s a frequent speaker on interoperability and geolocation services and contributed to Mapping Hacks. He holds a PhD from MIT, where his research explored the potential of web services to power urban information systems.

Presentations

Mapping data in Jupyter notebooks with PixieDust (sponsored by IBM) Session

Raj Singh offers an overview of PixieDust, a Jupyter Notebook extension that provides an easy way to make interactive maps from DataFrames for visual exploratory data analysis. Raj explains how he built mapping into PixieDust, putting data from Apache Spark-based analytics on maps using Mapbox GL.

Lee Stott is CTO of academic engagements at Microsoft, where he engages academic institutions across the UK in the ongoing development of the Microsoft platform. Lee has held a number of roles at Microsoft, including academic and technical evangelist. Previously, Lee was the head of information systems at the University of Manchester, where he led service and delivery teams across both academic and commercial markets. Lee holds a PGCE in higher education management from the University of Southampton and an MSc in information technology from the University of Liverpool.

Presentations

Notebook narratives from industry: Inspirational real-world examples and reusable industry notebooks Session

Patty Ryan, Lee Stott, and Michael Lanzetta explore four industry examples of Jupyter notebooks that illustrate innovative applications of machine learning in manufacturing, retail, services, and education and share four reference industry Jupyter notebooks (available in both Python and R)—along with demo datasets—for practical application to your specific industry value areas.

Srinivas Sunkara is a quant on the Quantitative Financial Research team at Bloomberg LP, where he works on developing financial models that apply machine learning techniques to various problems in finance. Srinivas is one of the main developers of bqplot, a Jupyter Notebook–based interactive plotting library, and contributes to other open source projects, including ipywidgets and traitlets.

Presentations

Industry and open source: Working together to drive advancements in Jupyter for quants and data scientists Session

Strong partnerships between the open source community and industry have driven many recent developments in Jupyter. Srinivas Sunkara and Cheryl Quah discuss the results of some of these collaborations, including JupyterLab, bqplot, and enhancements to ipywidgets that greatly enrich Jupyter as an environment for data science and quantitative financial research.

Vinitra Swamy graduated two years early with a bachelor’s degree in computer science from the University of California, Berkeley, and is now working toward a master’s degree in computer science. Her research interests include data science, cloud computing environments, and natural language processing. Vinitra is head student instructor for Berkeley’s new Foundations of Data Science course, helping shape curriculum and educating thousands of students from diverse backgrounds. Her efforts in data science education were recently recognized with a Berkeley EECS award of excellence in teaching and leadership. Vinitra also leads a Jupyter development student research team within the Data Science Education program and assists with the technical deployment and use of JupyterHub infrastructure campus-wide.

Presentations

Data science at UC Berkeley: 2,000 undergraduates, 50 majors, no command line Session

Engaging critically with data is now a required skill for students in all areas, but many traditional data science programs aren’t easily accessible to those without prior computing experience. Gunjan Baid and Vinitra Swamy explore UC Berkeley's Data Science program—2,000 students across 50 majors—explaining how its pedagogy was designed to make data science accessible to everyone.

Ian Swanson is the CEO of DataScience.com. An expert in big data and analytics, an accomplished entrepreneur, and a successful executive for such Fortune 500 companies as American Express and Sprint, Ian is at home in both startups and enterprise-level organizations. Previously, he founded Sometrics (acquired by American Express in 2011), which launched the industry’s first global virtual currency platform. That platform, for which he earned a patent, managed more than 3.3 trillion units of virtual currency and served an online audience of 250 million in more than 180 countries. Prior to Sometrics, Ian worked for the secure chat and messaging startup Userplane (acquired by AOL). A sought-after speaker on data science, the internet of things, big data, and performance-based analytics, he advises a number of companies on their product and marketing strategies and serves as a mentor to the Los Angeles startup incubators Amplify and Launchpad LA. Ian won the 2013 American Express Chairman’s Award and was twice recognized as one of Direct Marketing News’s 30 under 30. He attended the University of California, Santa Barbara.

Presentations

Data science platforms: Your key to actionable analytics (sponsored by DataScience.com) Session

Ian Swanson explores the key components of a data science platform and explains how they are enabling organizations to realize the potential of their data science teams.

Thorin Tabor is a software engineer at UCSD and a contributing scientist at the Broad Institute. Thorin is the lead developer of the GenePattern Notebook and an open source developer on the integration of bioinformatic tools with Jupyter.

Presentations

GenePattern Notebook: Jupyter for integrative genomics Session

Thorin Tabor offers an overview of the GenePattern Notebook, which allows Jupyter to communicate with the open source GenePattern environment for integrative genomics analysis. It wraps hundreds of software tools for analyzing omics data types, as well as general machine learning methods, and makes them available through a user-friendly interface.

David Taieb is the STSM for the Cloud Data Services Developer Advocacy team at IBM, where he leads a team of avid technologists with the mission of educating developers on the art of the possible with cloud technologies. He’s passionate about building open source tools, such as the PixieDust Python library for the Jupyter Notebook and Apache Spark, that help improve developers’ productivity and overall experience. Previously, David was the lead architect for the Watson Core UI and Tooling team based in Littleton, Massachusetts, where he led the design and development of a Unified Tooling Platform to support all the Watson Tools, including accuracy analysis, test experiments, corpus ingestion, and training data generation. Before that, he was the lead architect for the Domino Server OSGi team responsible for integrating the eXpeditor J2EE Web Container in Domino and building first-class APIs for the developer community. David started with IBM in 1996, working on various globalization technologies and products including Domino Global Workbench and a multilingual content management system for the WebSphere Application Server. David enjoys sharing his experience by speaking at conferences and meeting as many people as possible. You’ll find him at various events like the Strata Data Conference, Velocity, and IBM InterConnect.

Presentations

Data science made easy in Jupyter notebooks using PixieDust and InsightFactory Session

David Taieb, Prithwish Chakraborty, and Faisal Farooq offer an overview of PixieDust, a new open source library that speeds data exploration with interactive autovisualizations that make creating charts easy and fun.

Andy Terrel is president of NumFOCUS. He is also the chief data scientist of REX Real Estate, where he brings his experience building smart, scalable data systems to the real estate industry. A data architect, computational scientist, and technical leader, Andy is a passionate advocate for open source scientific codes and has been involved in the wider scientific Python community since 2006, contributing to numerous projects in the scientific stack.

Presentations

Empower scientists; save humanity: NumFOCUS—Five years in, five hundred thousand to go Session

What do the discovery of the Higgs boson, the landing of the Philae robot, the analysis of political engagement, and the freedom of human trafficking victims have in common? NumFOCUS projects were there. Join Leah Silen and Andy Terrel to learn how we can empower scientists and save humanity.

Andrew Therriault is the chief data officer for the City of Boston, where he leads Boston’s Analytics team, a nationally recognized leader in using data science to improve city operations and make progress in critical areas such as public safety, education, transportation, and health. Previously, Andrew was director of data science for the Democratic National Committee and served as editor of Data and Democracy: How Political Data Science Is Shaping the 2016 Elections from O’Reilly. He holds a PhD in political science from NYU and completed a postdoctoral research fellowship at Vanderbilt.

Presentations

Jupyter notebooks and production data science workflows Session

Jupyter notebooks are a great tool for exploratory analysis and early development, but what do you do when it's time to move to production? A few years ago, the obvious answer was to export to a pure Python script, but now there are other options. Andrew Therriault dives into real-world cases to explore alternatives for integrating Jupyter into production workflows.

Rachel Thomas is the cofounder of fast.ai and a researcher in residence at USF Data Institute, where she teaches numerical linear algebra. Rachel helped create the free Practical Deep Learning for Coders MOOC, which 50,000 students have started. Previously, she worked as a quant in energy trading, a data scientist and engineer at Uber, and a senior instructor at Hackbright. Rachel is a popular writer on data science and diversity in tech. Her writing has made the front page of Hacker News and Medium, has been included in newsletters by O’Reilly, Fortune, Crunchbase, and Mattermark, and has been translated into Spanish, Portuguese, and Chinese. Rachel holds a PhD in mathematics from Duke.

Presentations

How the Jupyter Notebook helped fast.ai teach deep learning to 50,000 students Keynote

Although some claim you must start with advanced math to use deep learning, the best way for any coder to get started is with code. Rachel Thomas explains how fast.ai's Practical Deep Learning for Coders course uses Jupyter notebooks to provide an environment that encourages students to learn deep learning through experimentation.

Rollin Thomas is a big data architect in the Data and Analytics Services group at Lawrence Berkeley National Laboratory. Previously, he was a staff scientist in the Computational Research division. Rollin has worked on numerical simulations of supernova atmospheres, observation and analysis of supernova spectroscopy data, and data management for supernova cosmology experiments. He has served as a member of the Nearby Supernova Factory, is a builder on the Dark Energy Survey, and is a full member of the Large Synoptic Survey Telescope Dark Energy Science Collaboration. Rollin holds a BS in physics from Purdue University and a PhD in astrophysics from the University of Oklahoma.

Presentations

How JupyterHub tamed big science: Experiences deploying Jupyter at a supercomputing center Session

Shreyas Cholia, Rollin Thomas, and Shane Canon share their experience leveraging JupyterHub to enable notebook services for data-intensive supercomputing on the Cray XC40 Cori system at the National Energy Research Scientific Computing Center (NERSC).

Marius Tulbure is a developer and JavaScript enthusiast at figshare, always looking to evolve and improve his code and skills. If asked, he’ll list his hobbies as “everything,” but for the sake of brevity, they include binge-watching TV series and movies, playing his electric guitar, and trying to solve all sorts of hacking puzzles.

Presentations

Closing the gap between Jupyter and academic publishing Session

Reports of a lack of reproducibility have led funders and others to require open data and code as the outputs of research they fund. Mark Hahnel and Marius Tulbure discuss the opportunities for Jupyter notebooks to be the final output of academic research, arguing that Jupyter could help disrupt the inefficiencies in cost and scale of open access academic publishing.

Peter Wang is the cofounder and CTO of Continuum Analytics, where he leads the product engineering team for the Anaconda platform and open source projects including Bokeh and Blaze. Peter has been developing commercial scientific computing and visualization software for over 15 years and has software design and development experience across a broad variety of areas, including 3D graphics, geophysics, financial risk modeling, large data simulation and visualization, and medical imaging. As a creator of the PyData conference, he also devotes time and energy to growing the Python data community by advocating, teaching, and speaking about Python at conferences worldwide. Peter holds a BA in physics from Cornell University.

Presentations

Fueling open innovation in a data-centric world (sponsored by Anaconda Powered by Continuum Analytics) Session

Peter Wang explores open source commercial companies, offering a firsthand account of the unique challenges of building a company that is fundamentally centered around sustainable open source innovation and sharing guidelines for how to carry volunteer-based open source values forward, intentionally and thoughtfully, in a data-centric world.

Jupyter and Anaconda: Shaking up the enterprise (sponsored by Anaconda Powered by Continuum Analytics) Keynote

In recent years, open source has emerged as a valuable player in the enterprise, and projects like Jupyter and Anaconda are leading the way. Peter Wang discusses the coevolution of these two major players in the new open data science ecosystem and shares next steps to a sustainable future.

Christopher Wilcox is a software engineer at Microsoft, where he works on a range of products including Azure Notebooks, Python Tools for Visual Studio, and the Azure SDK for Python. Chris has more than five years’ experience building developer tooling and, more recently, scalable web services. In his spare time, he races motorcycles, hikes, and explores the Seattle brewing scene.

Presentations

Hosting Jupyter at scale Session

Have you thought about what it takes to host 500+ Jupyter users concurrently? What about managing 17,000+ users and their content? Christopher Wilcox explains how Azure Notebooks does this daily and discusses the challenges faced in designing and building a scalable Jupyter service.

Karlijn Willems is a journalist at DataCamp, where she focuses on data science and data science education. Previously, she worked as a junior big data developer with Hadoop, Spark, and Scala. Karlijn holds a degree in literature and linguistics (English and Spanish) and information management from KU Leuven.

Presentations

Enhancing data journalism with Jupyter Session

Drawing inspiration from narrative theory and design thinking, Karlijn Willems walks you through effectively using Jupyter notebooks to guide the data journalism workflow and tackle some of the challenges that data can pose to data journalism.

Carol Willing is a director of the Python Software Foundation, a Jupyter Steering Council member, and a geek in residence at FabLab San Diego, where she teaches wearable electronics and software development. She co-organizes PyLadies San Diego and San Diego Python, contributes to open source community projects, including OpenHatch, and is an active member of the MIT Enterprise Forum in San Diego. She enjoys sharing her passion for electronics, software, problem solving, and the arts. Previously, Carol worked in software engineering management, product and project management, sales, and nonprofit organizations. She holds an MS in management with an emphasis on applied economics and high tech marketing from MIT and a BSE in electrical engineering from Duke University.

Presentations

Deploying JupyterHub for students and researchers Tutorial

JupyterHub, a multiuser server for Jupyter notebooks, enables you to offer a notebook server to everyone in a group—which is particularly useful when teaching a course, as students no longer need to install software on their laptops. Min Ragan-Kelley, Carol Willing, Yuvi Panda, and Ryan Lovett get you started deploying and customizing JupyterHub for your needs.

JupyterHub: A roadmap of recent developments and future directions Session

JupyterHub is a multiuser server for Jupyter notebooks. Min Ragan-Kelley and Carol Willing discuss exciting recent additions and future plans for the project, including the ability to share notebooks with students and collaborators.

Music and Jupyter: A combo for creating collaborative narratives for teaching Session

Music engages and delights. Carol Willing explains how to explore and teach the basics of interactive computing and data science by combining music with Jupyter notebooks, using music21, a tool for computer-aided musicology, and Magenta, a TensorFlow project for making music with machine learning, to create collaborative narratives and publishing materials for teaching and learning.