Brought to you by NumFOCUS Foundation and O’Reilly Media Inc.
The official Jupyter Conference
August 22-23, 2017: Training
August 23-25, 2017: Tutorials & Conference
New York, NY

Speakers

New speakers are added regularly. Please check back to see the latest updates to the agenda.

Safia Abdalla is one of the maintainers of nteract, a desktop-based interactive computing experience. A data scientist and software engineer with an interest in open source software and data science for social good, Safia is the organizer of PyData Chicago. In her free time, she enjoys running, working out, and drinking tea.

Presentations

How to cross the asteroid belt Tutorial

Have you wondered what it takes to go from a Jupyter user to a Jupyter pro? Wonder no more. Safia Abdalla explores the core concepts of the Jupyter ecosystem, including the extensions ecosystem, the kernel ecosystem, and the frontend architecture, leaving you with an understanding of the possibilities of the Jupyter ecosystem and practical skills on customizing the Jupyter Notebook experience.

Alexandre Archambault is a software and data engineer at Teads.tv. Alexandre is also a contributor to and author of noted Scala projects, including coursier and shapeless.

Presentations

Scala: Why hasn't an official Scala kernel for Jupyter emerged yet? Session

Alexandre Archambault explores why an official Scala kernel for Jupyter has yet to emerge. Part of the answer lies in the fact that there is no user-friendly, easy-to-use Scala shell in the console (i.e., no IPython for Scala). But there's a new contender, Ammonite—although it still has to overcome a few challenges, not least being supported by big data frameworks like Spark, Scio, and Scalding.

Demba Ba is an assistant professor of electrical engineering and bioengineering at Harvard University, where he directs the CRISP group. He and his group develop mathematical and computational tools to elucidate the role of dynamic networks of neurons in phenomena such as anesthesia, sleep, fear learning, and aging, and to enable more efficient signal representations that exploit the structure present in natural media such as audio, images, and video. In 2016, he received a research fellowship in neuroscience from the Alfred P. Sloan Foundation. Prof. Ba holds a BSc in electrical engineering from the University of Maryland, College Park (2004), as well as an MSci and a PhD in electrical engineering and computer science, with a minor in mathematics, from the Massachusetts Institute of Technology (2006 and 2011, respectively).

Prof. Ba is passionate about teaching and eagerly incorporates Jupyter notebooks and the Python ecosystem into his teaching because of the unique opportunity they provide for interactive, web-based teaching of content that has not traditionally leveraged scientific computing resources. In the School of Engineering and Applied Sciences, he spearheaded the development and deployment of JupyterHub on the Amazon AWS cloud for two classes: ES155 (Biological Signal Processing) and ES201 (Decision Theory). His motivation was to bridge the traditional gap between theory-focused, pen-and-paper classes and application-focused, coding-centered classes. ES155 and ES201 bridge this gap by bringing together under one umbrella data-friendly devices (a research-grade wearable), sophisticated signal processing and theory, and a Python-based scientific computing platform in the cloud powered by Jupyter notebooks.

How can we enable underserved communities to develop their own tech services and solutions based on data? Prof. Ba believes that teaching data science in these communities, beginning at the high-school level, is one solution. The Jupyter Notebook is a cost-effective way to teach data science, which will help democratize access to data science and data-related tools.

Presentations

Keynote by Demba Ba Keynote

Details to come.

Gunjan Baid is a student at the University of California, Berkeley. She completed her bachelor's degree in computer science and biochemistry and is now pursuing a master's degree in computer science with a research focus on computational biology. Gunjan is involved with the undergraduate data science education program, where, as a student instructor, she worked with Jupyter notebooks in the classroom; she now provides technical support for the program's JupyterHub infrastructure.

Presentations

Data science at UC Berkeley: 2,000 undergraduates, 50 majors, no command line Session

Engaging critically with data is now a required skill for students in all areas, but many traditional data science programs aren’t easily accessible to those without prior computing experience. Gunjan Baid and Vinitra Swamy explore UC Berkeley's Data Science program—1,200 students across 50 majors—explaining how its pedagogy was designed to make data science accessible to everyone.

Lorena A. Barba is associate professor of mechanical and aerospace engineering at the George Washington University in Washington, DC. Her research includes computational fluid dynamics, high-performance computing, computational biophysics, and animal flight, and she is well known for her courses and open educational resources using Jupyter notebooks. An international leader in computational science and engineering, Lorena is also a long-standing advocate of open source software for science and education. She received the 2016 Leamer-Rosenthal Award for Open Social Sciences, and in 2017 she received an honorable mention in the Open Education Consortium's Open Education Awards for Excellence. She received the NSF Faculty Early CAREER award in 2012, was named a 2012 CUDA fellow by NVIDIA, and was awarded a grant by the UK Engineering and Physical Sciences Research Council (EPSRC) First Grant program in 2007. Lorena holds a PhD in aeronautics from the California Institute of Technology.

Presentations

Keynote by Lorena Barba Keynote

Details to come.

James Bednar is a solutions architect at Continuum Analytics and an honorary fellow in the School of Informatics at the University of Edinburgh, Scotland. Previously, Jim was a lecturer and researcher in computational neuroscience at the University of Edinburgh, Scotland, and a software and hardware engineer at National Instruments. He manages the open source Python projects datashader, HoloViews, GeoViews, ImaGen, param, and paramnb. He has published more than 50 papers and books about the visual system, data visualization, and software development. Jim holds a PhD in computer science from the University of Texas as well as degrees in electrical engineering and philosophy.

Presentations

Deploying interactive Jupyter dashboards for visualizing hundreds of millions of datapoints, in 30 lines of Python Tutorial

It can be difficult to assemble the right set of packages from the Python scientific software ecosystem to solve complex problems. James Bednar and Philipp Rudiger walk you step by step through making and deploying a concise, fast, and fully reproducible recipe for interactive visualization of millions or billions of data points using very few lines of Python in a Jupyter notebook.

Daina Bouquin is the head librarian of the Harvard-Smithsonian Center for Astrophysics in Cambridge, MA. Her work aims to lower social and technical barriers that impact the astronomy community’s ability to create and share new knowledge. Her research interests focus primarily on how libraries can support open science, research software preservation, emerging computational methods, and the history of science. Daina is currently working toward an MS in data analytics at CUNY’s School of Professional Studies.

Presentations

Beautiful networks and network analytics made simpler with Jupyter Session

Performing network analytics with NetworkX and Jupyter often results in difficult-to-examine hairballs rather than useful visualizations. Meanwhile, more flexible tools like SigmaJS have high learning curves for people new to JavaScript. Daina Bouquin and John DeBlase share a simple, flexible architecture that can help create beautiful JavaScript networks without ditching the Jupyter Notebook.
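The general idea behind such an architecture, exporting graph data from Python as JSON that a JavaScript library can render, can be sketched with the standard library alone. This is an illustrative toy, not the speakers' actual design; the edge list and field names are made up:

```python
import json

# A toy graph as an edge list; a real workflow might build this with NetworkX.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
nodes = sorted({n for edge in edges for n in edge})

# Convert to the node-link JSON shape that JS libraries like SigmaJS consume.
graph = {
    "nodes": [{"id": n, "label": n} for n in nodes],
    "edges": [
        {"id": f"e{i}", "source": s, "target": t}
        for i, (s, t) in enumerate(edges)
    ],
}

# Serialized payload, ready to hand to a JavaScript visualization in the notebook.
payload = json.dumps(graph)
```

A notebook cell can then inject this payload into an HTML/JS snippet for rendering, keeping the whole round trip inside Jupyter.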

Maarten Breddels is a postdoctoral researcher at the Kapteyn Astronomical Institute at the University of Groningen (RUG), Netherlands, where he works for the Gaia mission, combining astronomy and IT, to enable visualization and exploration of the large dataset this satellite will yield. Maarten has experience with low-level languages, such as Assembly and C, and higher-level languages, including C++, Java, and Python. He holds a bachelor’s degree in information technology and a bachelor’s degree, master’s degree, and PhD in astronomy, where his research focused on the field of galactic dynamics.

Presentations

A billion stars in the Jupyter Notebook Session

Maarten Breddels offers an overview of vaex, a Python library that enables calculating statistics for a billion samples per second on a regular n-dimensional grid, and ipyvolume, a library that enables volume and glyph rendering in Jupyter notebooks. Together, these libraries allow the interactive visualization and exploration of large, high-dimensional datasets in the Jupyter Notebook.
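The core operation vaex accelerates, aggregating samples onto a regular grid, can be illustrated in plain Python. This toy sketch uses a 1-D grid and counts per bin; vaex performs the same kind of aggregation over billions of rows with its own optimized API:

```python
# Toy binned statistic on a regular 1-D grid: count samples per bin,
# the building block of vaex-style grid aggregation.
def binned_counts(samples, lo, hi, nbins):
    counts = [0] * nbins
    width = (hi - lo) / nbins
    for x in samples:
        if lo <= x < hi:  # samples outside [lo, hi) are ignored
            counts[int((x - lo) / width)] += 1
    return counts

# Four bins over [0, 1): [0, 0.25), [0.25, 0.5), [0.5, 0.75), [0.75, 1.0)
counts = binned_counts([0.1, 0.2, 0.25, 0.7, 0.9], 0.0, 1.0, 4)  # → [2, 1, 1, 1]
```

The same pattern generalizes to n-dimensional grids and other statistics (mean, min, max), which is what makes grid-based aggregation a natural fit for visualizing very large datasets.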

Matt Burton is a visiting assistant professor at the School of Computing and Information at the University of Pittsburgh. His research interests include infrastructure studies, data science, and scholarly communication. Matt holds a PhD in information from the University of Michigan. His dissertation, Blogs as Infrastructure for Scholarly Communication, explored digital humanities blogging and the sociotechnical dynamics of web-centric publishing.

Presentations

Defactoring pace of change: Reviewing computational research in the digital humanities Session

While Jupyter notebooks are a boon for computational science, they are also a powerful tool in the digital humanities. Matt Burton offers an overview of the digital humanities community, discusses defactoring, a novel use of Jupyter notebooks to analyze computational research, and reflects upon Jupyter's relationship to scholarly publishing and the production of knowledge.

Natalino Busa is the head of data science at Teradata, where he leads the definition, design, and implementation of big, fast data solutions for data-driven applications, such as predictive analytics, personalized marketing, and security event monitoring. Previously, Natalino served as enterprise data architect at ING and as senior researcher at Philips Research Laboratories on the topics of system-on-a-chip architectures, distributed computing, and parallelizing compilers. Natalino is an all-around technology manager, product developer, and innovator with a 15+ year track record in research, development, and management of distributed architectures and scalable services and applications.

Presentations

Data science apps: Beyond notebooks Session

Jupyter notebooks are transforming the way we look at computing, coding, and science. But is this the only "data scientist experience" that this technology can provide? Natalino Busa explains how you can create interactive web applications for data exploration and analysis that in the background are still powered by the well-understood and well-documented Jupyter Notebook.

Charlotte Cabasse-Mazel is an ethnographer at the Berkeley Institute for Data Science at UC Berkeley. She is interested in the ways in which practices and methodologies of data science transform production of knowledge and interdisciplinary collaboration, as well as scientific personae and trajectories within the academic institution. Charlotte holds a PhD in geography and science and technologies studies from the University of Paris-Est, where she studied at the Laboratoire Techniques, Territoires et Sociétés (LATTS), at Ecole Nationale des Ponts et Chaussées.

Presentations

Jupyter and the changing rituals around computation Session

The concept of rituals is useful for thinking about how the core technology of Jupyter notebooks is extended through other tools, platforms, and practices. R. Stuart Geiger, Brittany Fiore-Gartland, and Charlotte Cabasse-Mazel share ethnographic findings about various rituals performed with Jupyter notebooks.

Brett Cannon is a Python core developer working on Python on the Azure Data Science Tools team at Microsoft.

Presentations

Keynote by Brett Cannon Keynote

Details to come.

Shane Canon is a project engineer in the Data and Analytics Services group at NERSC in the Lawrence Berkeley National Laboratory, where he focuses on enabling data-intensive applications on HPC platforms and engaging with bioinformatics applications. Shane has held a number of positions at NERSC, including leading the Technology Integration Group, where he focused on the Magellan Project and other areas of strategic focus, leading the Data Systems group, and serving as a system administrator for the PDSF cluster, where he gained experience in cluster administration, batch systems, parallel file systems, and the Linux kernel. He was also a group leader at Oak Ridge National Laboratory, where he architected the 10-petabyte Spider filesystem. Shane is involved in a number of projects outside of NERSC, including serving as production lead on the KBase project, which is developing a platform to enable predictive biology. Shane holds a PhD in physics from Duke University and a BS in physics from Auburn University.

Presentations

How JupyterHub tamed big science: Experiences deploying Jupyter at a supercomputing center Session

Shreyas Cholia, Rollin Thomas, and Shane Canon share their experience leveraging JupyterHub to enable notebook services for data-intensive supercomputing on the Cray XC40 Cori system at the National Energy Research Scientific Computing Center (NERSC).

Prithwish Chakraborty is a data scientist on the IBM Watson for Real World Evidence team at IBM Watson Health. His work focuses on applications of data science toward patient health characterization and risk modeling. Broadly, his research interests are temporal data mining, machine learning, and image recognition. His work has been published in key data science venues, including KDD, SDM, and AAAI; he presented a tutorial on public health forecasting at AAAI 2016 and gave an invited talk at BCDE 2014. Prithwish holds a patent with HP Labs on forecasting solar photovoltaic output. He holds a PhD in computer science from Virginia Tech, where his research, under the guidance of Naren Ramakrishnan, focused on the applications of data science to public health forecasting.

Presentations

Data science made easy in Jupyter notebooks using PixieDust and InsightFactory Session

David Taieb, Prithwish Chakraborty, and Faisal Farooq offer an overview of PixieDust, a new open source library that speeds data exploration with interactive autovisualizations that make creating charts easy and fun.

Shreyas Cholia leads the Usable Software Systems group at Lawrence Berkeley National Laboratory (LBNL), which focuses on making scientific computing more transparent and usable. He is particularly interested in how web APIs and tools can facilitate this. Shreyas also leads the science gateway, web, and grid efforts at the National Energy Research Scientific Computing Center (NERSC) at LBNL. His current work includes a project that enables Jupyter to interact with supercomputing resources, and NEWT, a REST API for high-performance computing. He holds a degree from Rice University, where he studied computer science and cognitive sciences.

Presentations

How JupyterHub tamed big science: Experiences deploying Jupyter at a supercomputing center Session

Shreyas Cholia, Rollin Thomas, and Shane Canon share their experience leveraging JupyterHub to enable notebook services for data-intensive supercomputing on the Cray XC40 Cori system at the National Energy Research Scientific Computing Center (NERSC).

Danielle Chou is a solutions engineer at Zymergen, where she works on custom software tools for scientists. Previously, she worked on failure detection software for an ingestible sensor company and studied bioengineering at UC Berkeley and UCSF.

Presentations

Using Jupyter at the intersection of robots and industrial biology Session

Zymergen approaches biology with an engineering and data-driven mindset. Its platform integrates robotics, software, and biology to deliver predictability and reliability during strain design and development. Marc Colangelo, Justin Nand, and Danielle Chou explain the integral role Jupyter notebooks play in providing a shared Python environment between Zymergen's software engineers and scientists.

Rowan Cockett is the founder and CTO of 3point Science (acquired by Aranz Geo in 2016), a company building web-based visualization software for the geoscience industry, including Steno3D. Rowan is also a graduate student at the University of British Columbia, where he is researching a numerical framework aimed at increasing quantitative communication in the geosciences, developed through his studies of numerical geophysics, subsurface flow, and structural geology. Rowan is interested in the intersection of education, industry, and academia, and in seeing what happens when you make powerful scientific modeling, visualization, and communication tools accessible through the web. Much of his research is accessible through an open source software initiative for geophysical simulations and parameter estimation (SimPEG) and an open website for geoscience modeling (Visible Geology).

Presentations

Deploying a reproducible course Session

Web-based textbooks and interactive simulations built in Jupyter notebooks provide an entry point for course participants to reproduce content they are shown and dive into the code used to build them. Lindsey Heagy and Rowan Cockett share strategies and tools for developing an educational stack that emerged from the deployment of a course on geophysics and some lessons learned along the way.

Marc Colangelo is a solutions engineer at Zymergen. Previously, Marc worked in various research areas, including immunology, dynamic proteomics systems, and healthcare data modeling. Marc holds a bachelor of health sciences from McMaster University and a PhD from McMaster's Medical Sciences program, with a focus on the infection and immunity stream, in the Department of Pathology and Molecular Medicine.

Presentations

Using Jupyter at the intersection of robots and industrial biology Session

Zymergen approaches biology with an engineering and data-driven mindset. Its platform integrates robotics, software, and biology to deliver predictability and reliability during strain design and development. Marc Colangelo, Justin Nand, and Danielle Chou explain the integral role Jupyter notebooks play in providing a shared Python environment between Zymergen's software engineers and scientists.

Sylvain Corlay is a quant researcher specializing in stochastic analysis and optimal control and the founder of QuantStack. Previously, Sylvain was a quant researcher at Bloomberg LP and an adjunct faculty member at Columbia University and NYU. As an open source developer, Sylvain mostly contributes to Project Jupyter in the area of interactive widgets and lower-level components such as traitlets. He is also a member of the steering committee of the project. Sylvain is also a contributor to a number of other open source projects for scientific computing and data visualization, such as bqplot, pythreejs, and ipyleaflet, and coauthored the xtensor C++ tensor algebra library. He holds a PhD in applied mathematics from University Paris VI.

Presentations

Jupyter widgets: Interactive controls for Jupyter Tutorial

With Jupyter widgets, you can build user interfaces with graphical controls inside Jupyter notebooks, documentation, and web pages. Jupyter widgets also provide a framework for building custom controls. Sylvain Corlay demonstrates how to use Jupyter widgets effectively for interactive computing, explores the ecosystem of custom controls, and walks you through building your own control.

Xeus: A framework for writing native Jupyter kernels Session

Xeus takes on the burden of implementing the Jupyter kernel protocol so that kernel authors can focus on more easily implementing the language-specific part of the kernel and support features, such as autocomplete or interactive widgets. Sylvain Corlay and Johan Mabille showcase a new C++ kernel based on the Cling interpreter built with xeus.

John DeBlase is a freelance developer, data scientist, and musician from Queens, NY. He is currently working toward a master's degree in data analytics at CUNY's School of Professional Studies. His current research revolves around the development of musical intelligence systems using natural language processing techniques, with a focus on real-time human-computer interaction. John is also interested in developing applications for data scientists that emphasize interactive data visualization, leveraging the best tools currently available in both Python and Node.js.

Presentations

Beautiful networks and network analytics made simpler with Jupyter Session

Performing network analytics with NetworkX and Jupyter often results in difficult-to-examine hairballs rather than useful visualizations. Meanwhile, more flexible tools like SigmaJS have high learning curves for people new to JavaScript. Daina Bouquin and John DeBlase share a simple, flexible architecture that can help create beautiful JavaScript networks without ditching the Jupyter Notebook.

Christine Doig is a product manager and senior data scientist at Continuum Analytics. Christine has 8+ years of experience in analytics, operations research, and machine learning in a variety of industries, including energy, manufacturing, and banking. An open source advocate, she has spoken at PyData, EuroPython, SciPy, PyCon, OSCON, and many other open source conferences. Christine holds an MS in industrial engineering from the Polytechnic University of Catalonia in Barcelona.

Presentations

Leveraging Jupyter to build an Excel-Python bridge Session

Christine Doig and Fabio Pliger explain how they built a commercial product on top of Jupyter to help Excel users access the capabilities of the rich Python data science ecosystem and share examples and use cases from a variety of industries that illustrate the collaborative workflow between analysts and data scientists that the application has enabled.

Nadia Eghbal works on community programs at GitHub, where she is building sustainability initiatives. Nadia explores how we can better support open source infrastructure, highlighting current gaps in funding and knowledge. She recently published Roads and Bridges: The Unseen Labor Behind Our Digital Infrastructure with support from the Ford Foundation. Nadia is based in San Francisco.

Presentations

Keynote by Nadia Eghbal Keynote

Details to come.

Lori is a senior product manager at Civis with a background in software engineering, derivatives trading, environmental consulting, and competitive fringe sports. Lori holds an SM and an SB in earth, atmospheric, and planetary sciences from the Massachusetts Institute of Technology.

Presentations

Moving Jupyter into the cloud: challenges and lessons learned Session

The product and engineering teams at Civis Analytics integrated Jupyter notebooks into the company's cloud-based platform, providing the ability to run multiple notebooks concurrently and share them. Members of both teams present what they learned about notebook users and their user stories, the technical challenges they encountered, and the approaches each team took.

Faisal Farooq is the principal scientist in the Watson Health group of IBM Watson, where he works on next-generation healthcare software to improve patient care. Faisal is an expert in applying machine learning in the healthcare domain. Previously, he was a senior key expert (distinguished scientist) at Siemens Healthcare, where he successfully delivered the most widely adopted data science product in US healthcare. Faisal has published a number of papers in multiple journals and at conferences in the areas of machine learning, handwriting, biometrics, and text analysis. He holds a PhD in computer science and engineering from the University at Buffalo, where he worked as a graduate research assistant in Center of Excellence for Document Analysis and Recognition (CEDAR) and the Center for Unified Biometrics and Sensors (CUBS). He also completed multiple research internships at the IBM T.J. Watson Research Center.

Presentations

Data science made easy in Jupyter notebooks using PixieDust and InsightFactory Session

David Taieb, Prithwish Chakraborty, and Faisal Farooq offer an overview of PixieDust, a new open source library that speeds data exploration with interactive autovisualizations that make creating charts easy and fun.

Analytical, performance-focused engineer with over 12 years of experience in enterprise systems development and architectural design using JEE technology. Specializes in big data platform analytics using Hadoop and associated ecosystem tools. Combines exceptional technology skills with the ability to drive user-centric solutions, define strategy, and lead data management.

Presentations

Accelerating data driven culture at the largest media group of Latin America with Jupyter Session

JupyterHub is an important tool for research and data-driven decision making at Globo.com. This session shows how Globo.com's data scientists use Jupyter notebooks for data analysis and machine learning, with no installation or configuration required, to make decisions that affect 50 million users per month.

Brittany Fiore-Gartland is the director of data science ethnography at the eScience Institute and a research scientist in the Department of Human Centered Design and Engineering at the University of Washington, where she leads a research group that studies the sociocultural implications of data-intensive science and how data-intensive technologies are reshaping how people work and organize. Her research focuses on cross-sector and interdisciplinary data science collaborations, emerging pedagogical models for data science, and bringing a human-centered, sociotechnical, and ethical perspective to data science practice. Brittany co-leads UW Data Science Studies, an interdisciplinary group of researchers studying the sociotechnical and ethical dimensions of the emerging practice of data science that is part of a collaborative and multisited working group supported through the Moore-Sloan Data Science Environments and in partnership with researchers at the Berkeley Institute for Data Science and the Center for Data Science in New York University. Whenever possible, Brittany’s work follows a model of action research, meaning her research practice aims to inform and affect positive change within the communities she studies. Often this takes the form of articulating the challenges and opportunities for communication and collaboration during times of technological change. She works with communities to bridge communication gaps and develop value-informed, reflexive, and adaptive organizational practices.

Presentations

Jupyter and the changing rituals around computation Session

The concept of rituals is useful for thinking about how the core technology of Jupyter notebooks is extended through other tools, platforms, and practices. R. Stuart Geiger, Brittany Fiore-Gartland, and Charlotte Cabasse-Mazel share ethnographic findings about various rituals performed with Jupyter notebooks.

Jeremy Freeman is a scientist at the intersection of biology and technology. He wants to understand how biological systems work and use that understanding to benefit both human health and the design of intelligent systems. After running a neuroscience research lab for several years, Jeremy recently joined the Chan Zuckerberg Initiative, where he is helping develop its efforts to support and accelerate basic research with tools for analysis, visualization, and collaborative sharing of data and knowledge. He is passionate about open source and open science and about bringing scientists and engineers together across a range of fields.

Presentations

Keynote by Jeremy Freeman Keynote

Details to come.

Tim Gasper is director of product and marketing at Bitfusion, a deep learning automation software company enabling easier, faster development of AI applications, and cofounder of Ponos, an IoT-enabled hydroponics farming technology company. Tim has over eight years of big data, IoT, and enterprise content product management and product marketing experience. He is a writer and speaker on entrepreneurship, the Lean Startup methodology, and big data analytics. Previously, Tim was global portfolio manager for CSC Big Data and Analytics, where he was responsible for the overall strategy, roadmap, partnerships, and technology mix for the big data and analytics product portfolio; VP of product at Infochimps (acquired by CSC), where he led product development for its market-leading open data marketplace and big data platform as a service; and cofounder of Keepstream, a social media analytics and curation company.

Presentations

Deep learning and Elastic GPUs using Jupyter Session

Combined with GPUs, Jupyter makes for fast development and fast execution, but it is not always easy to switch from a CPU execution context to GPUs and back. Tim Gasper and Pierce Spitler share best practices on doing deep learning with Jupyter and explain how to work with CPUs and GPUs more easily by using Elastic GPUs and quick-switching between custom kernels.

Laurent Gautier is a scientific research lead at Verily Life Sciences (formerly Google Life Sciences). Laurent's work focuses on data science, visualization, machine learning, data mining, and prototyping software to understand molecular, cellular, and clinical data. He is the author of popular open source tools in bioinformatics and statistical programming for applications in healthcare, life sciences, and beyond and has contributed to or led a number of open source projects, including Bioconductor, affy, and rpy2.

Presentations

Data analysis in Jupyter notebooks with SQL, Python, and R Tutorial

Python is popular for data analysis, but restricting yourself to Python means missing a wealth of libraries or capabilities available in R or SQL. Laurent Gautier walks you through a pragmatic, reasonable, and good-looking polyglot approach, all thanks to R visualizations.
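One piece of such a polyglot workflow, querying SQL from Python inside a notebook, can be sketched with the standard library alone (the table and column names here are made up for illustration; in the tutorial's setting, a tool like rpy2 would handle the R side):

```python
import sqlite3

# In-memory SQLite database standing in for a real data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (sample TEXT, value REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [("a", 1.0), ("a", 3.0), ("b", 2.0)],
)

# Aggregate in SQL, then continue the analysis in Python
# (or hand the result frame to R for visualization).
rows = conn.execute(
    "SELECT sample, AVG(value) FROM measurements GROUP BY sample"
).fetchall()
```

Pushing the aggregation into SQL keeps the heavy lifting near the data, while Python (and R) handle the downstream analysis and plotting.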

R. Stuart Geiger is an ethnographer and postdoctoral scholar at the Berkeley Institute for Data Science at UC Berkeley, where he studies the infrastructures and institutions that support the production of knowledge. He uses ethnographic, historical, qualitative, and quantitative methods in his research, which is grounded in the fields of computer-supported cooperative work, science and technology studies, and communication and new media studies. He holds a PhD from the UC Berkeley School of Information, where his research focused on the governance and operation of Wikipedia and scientific research networks. He has also studied newcomer socialization, moderation and quality control, specialization and professionalization, cooperation and conflict, the roles of support staff and technicians, and diversity and inclusion.

Presentations

Jupyter and the changing rituals around computation Session

The concept of rituals is useful for thinking about how the core technology of Jupyter notebooks is extended through other tools, platforms, and practices. R. Stuart Geiger, Brittany Fiore-Gartland, and Charlotte Cabasse-Mazel share ethnographic findings about various rituals performed with Jupyter notebooks.

Matt Greenwood is chief inspiration officer at Two Sigma, where he has led a number of company-wide efforts in engineering and modeling. Matt began his career at Bell Labs, working in the Operating Systems group under Dennis Ritchie, before moving to IBM Research, where he was responsible for a number of early efforts in tablet computing and distributed computing. Matt also served as lead developer and manager for a number of systems on the network element at Entrisphere, which created a product providing access equipment for broadband service providers, and created the Customer Engineering department in preparation for initial customer trials. Matt holds a BA and an MA in math from Oxford University, a master's degree in theoretical physics from the Weizmann Institute of Science in Israel, and a PhD in mathematics from Columbia University, where he taught for a number of years.

Presentations

From Beaker to BeakerX Session

Matt Greenwood introduces BeakerX, a set of Jupyter Notebook extensions that enable polyglot data science, time series plotting and processing, research publication, and integration with Apache Spark. Matt reviews the Jupyter extension architecture and how BeakerX plugs into it, covers the current set of BeakerX capabilities, and discusses the pivot from Beaker, a standalone notebook, to BeakerX.

Jason Grout is a Jupyter developer at Bloomberg, working primarily on JupyterLab and the interactive widget system. He has also been a major contributor to the open source Sage mathematical software system for many years. Jason also co-organizes the PyDataNYC Meetup. Previously, Jason was an assistant professor of mathematics at Drake University in Des Moines, Iowa. He holds a PhD in mathematics from Brigham Young University.

Presentations

JupyterLab tutorial Tutorial

Steven Silvester and Jason Grout lead a walkthrough of JupyterLab as a user and as an extension author, explore the capabilities of JupyterLab, and offer a demonstration of how to create a simple extension to the environment.

Mark Hahnel is the founder of Figshare, an open data tool that allows researchers to publish all of their data in a citable, searchable, and sharable manner. Mark is passionate about open science and the potential it has to revolutionize the research community. He’s fresh out of academia, having just completed his PhD in stem cell biology at Imperial College London. Mark also studied genetics in Newcastle and Leeds.

Presentations

Closing the gap between Jupyter and academic publishing Session

Reports of a lack of reproducibility have led funders and others to require open data and code as the outputs of research they fund. Mark Hahnel and Marius Tulbure discuss the opportunities for Jupyter notebooks to be the final output of academic research, arguing that Jupyter could help disrupt the inefficiencies in cost and scale of open access academic publishing.

Lindsey Heagy is a PhD candidate at the University of British Columbia studying numerical geophysics. Her work focuses on using electromagnetic geophysics for monitoring subsurface injections, including carbon capture and storage and hydraulic fracturing. She is a project lead on GeoSci.xyz, an effort to build collaborative, interactive, web-based textbooks in the geosciences, and a core contributor to SimPEG, an open source framework for geophysical simulations and inversions.

Presentations

Deploying a reproducible course Session

Web-based textbooks and interactive simulations built in Jupyter notebooks provide an entry point for course participants to reproduce content they are shown and dive into the code used to build them. Lindsey Heagy and Rowan Cockett share strategies and tools for developing an educational stack that emerged from the deployment of a course on geophysics and some lessons learned along the way.

Kari Jordan is the deputy director of assessment for Data Carpentry and an advocate for improving diversity in data science. Previously, Kari was a postdoctoral fellow at Embry-Riddle Aeronautical University, where her research focused on evidence-based instructional practices among STEM faculty. Kari served on the board of directors for the National Society of Black Engineers (NSBE) for three years. A product of the Detroit Public School system, Kari holds a BS and an MS in mechanical engineering from Michigan Technological University and a PhD in engineering education from the Ohio State University. During her education, she interned with Marathon Petroleum Company, SC Johnson, Ford Motor Company, and Educational Testing Services. As a graduate student, she received fellowships from the National Society of Black Engineers (NSBE), King-Chavez-Parks Initiative, and the National GEM Consortium.

Presentations

Learning to code isn’t enough: Training as a pathway to improve diversity Session

Diversity can be achieved through sharing information among members of a community. Jupyter prides itself on being a community of dynamic developers, cutting-edge scientists, and everyday users, but is our platform being shared with diverse populations? Kari Jordan explains how training has the potential to improve diversity and drive usage of Jupyter notebooks in broader communities.

Wendy Kan is a data scientist at Kaggle, the largest global data science community, where she works with companies and organizations to transform their data into machine learning competitions. Previously, Wendy was a software engineer and researcher. She holds BS and MS degrees in electrical engineering and a PhD in biomedical engineering.

Presentations

Lessons learned from tens of thousands of Kaggle notebooks Session

Kaggle Kernels, an in-browser code execution environment that includes a version of Jupyter Notebooks, has allowed Kaggle to flourish in new ways. Drawing on a diverse repository of user-created notebooks paired with competitions and public datasets, Megan Risdal and Wendy Chih-wen Kan explain how Kernels has impacted machine learning trends, collaborative data science, and learning.

Kyle Kelley is a senior software engineer at Netflix, a maintainer on nteract.io, and a core developer of the IPython/Jupyter project. He wants to help build great environments for collaborative analysis, development, and production workloads for everyone, from small teams to massive scale.

Presentations

Jupyter at Netflix Session

So, Netflix's data scientists and engineers...do they know things? Join Kyle Kelley to find out. Kyle explores how Netflix uses Jupyter and explains how you can learn from Netflix's experience to enable analysts at your organization.

Saranga Komanduri is a Tech Lead at Civis, applying his expertise in both data science and software engineering to solve hard problems at scale. Saranga has a PhD from the School of Computer Science at Carnegie Mellon University, where he spent six years studying linguistic password models, security warnings, and privacy. Prior to joining Civis, Saranga interned at Google and Microsoft Research.

Presentations

Moving Jupyter into the cloud: challenges and lessons learned Session

The product and engineering teams at Civis Analytics integrated Jupyter notebooks into the company's cloud-based platform, providing the ability to run multiple notebooks concurrently and share them. Saranga Komanduri shares what the teams learned about notebook users and their user stories, the technical challenges they encountered, and the approaches taken from both the engineering and product perspectives.

Chris Kotfila is an R&D engineer at Kitware. Chris’s research interests are in natural language processing, machine learning, knowledge organization, and geographic information science. He holds dual degrees in computer science and philosophy from Rensselaer Polytechnic Institute and a master’s degree in library science, where he focused on issues of open access, scholarly communication, and reproducible research. During his time at RPI, he worked regularly as a research programmer in the area of computational cognitive engineering. Chris also served overseas with the US Peace Corps. He is an avid open source enthusiast and a hopeless Emacs user.

Presentations

GeoNotebook: An extension to the Jupyter Notebook for exploratory geospatial analysis Session

Chris Kotfila offers an overview of the GeoNotebook extension to the Jupyter Notebook, which provides interactive visualization and analysis of geospatial data. Unlike other geospatial extensions to the Jupyter Notebook, GeoNotebook includes a fully integrated tile server providing easy visualization of vector and raster data formats.

Aaron Kramer is a data scientist and engineer at DataScience.com, where he builds powerful language and engagement models using natural language processing, deep learning, Bayesian inference, and machine learning.

Presentations

Interactive natural language processing with spaCy and Jupyter Tutorial

Modern natural language processing (NLP) workflows often require interoperability between multiple tools. Aaron Kramer offers an introduction to interactive NLP with spaCy within the Jupyter Notebook, covering core NLP concepts, core workflows in spaCy, and examples of interacting with other tools like TensorFlow, NetworkX, LIME, and others as part of interactive NLP projects.

Michael Lanzetta is a principal software development engineer on the Partner Catalyst team at Microsoft, where his current work ranges from implementing binary protocols in JavaScript to training domain-specific image classification convolutional neural networks. He works with everyone from small startups to large enterprise customers—anyone doing innovative work that is stretching the Microsoft stack (and in particular Azure) beyond its current limits. Michael is currently the head of the Machine Learning Technical Working Group in DX, helping upskill Microsoft’s field in ML and deep learning, and leading efforts to bring Microsoft’s suite of ML technologies to the aid of its partners, both large and small. Previously, Michael worked on Microsoft Live Search and Windows Mobile Services, Bing Travel, MSN and MSN Mobile, and FUSE Labs.

Presentations

Notebook narratives from industry: Inspirational real-world examples and reusable industry notebooks Session

Patty Ryan, Lee Stott, and Michael Lanzetta explore four industry examples of Jupyter notebooks that illustrate innovative applications of machine learning in manufacturing, retail, services, and education and share four reference industry Jupyter notebooks (available in both Python and R)—along with demo datasets—for practical application to your specific industry value areas.

Ryan Lovett manages research and instructional computing for the Department of Statistics at UC Berkeley. He is most often a sysadmin, though he enjoys programming and consulting with faculty and students.

Presentations

Deploying JupyterHub for students and researchers Tutorial

JupyterHub, a multiuser server for Jupyter notebooks, enables you to offer a notebook server to everyone in a group—which is particularly useful when teaching a course, as students no longer need to install software on their laptops. Min Ragan-Kelley, Carol Willing, Yuvi Panda, and Ryan Lovett get you started deploying and customizing JupyterHub for your needs.

Managing a 1,000+ student JupyterHub without losing your sanity Session

The UC Berkeley Data Science Education program uses Jupyter notebooks on a JupyterHub. Ryan Lovett and Yuvi Panda outline the DevOps principles that keep the largest reported educational hub (with 1,000+ users) stable and performant while enabling all the features instructors and students require.

Johan Mabille is a scientific software developer at QuantStack, where he specializes in high-performance computing in C++. Previously, Johan was a quant developer at HSBC. An open source developer, Johan is the coauthor of xtensor and xeus and the main author of xsimd. He holds a master’s degree in computer science from Centrale-Supelec.

Presentations

Xeus: A framework for writing native Jupyter kernels Session

Xeus takes on the burden of implementing the Jupyter kernel protocol so that kernel authors can focus on the language-specific parts of the kernel and on supporting features such as autocompletion and interactive widgets. Sylvain Corlay and Johan Mabille showcase a new C++ kernel based on the Cling interpreter built with xeus.

Ali Marami is the chief data scientist and one of the founders of R-Brain, a platform for developing, sharing, and promoting models and applications in data science. He has extensive experience in financial and quantitative modeling and model risk management at several US banks. Ali holds a PhD in finance from University of Neuchâtel in Switzerland and a BS in electrical engineering.

Presentations

Building a powerful data science IDE for R, Python, and SQL using JupyterLab Session

JupyterLab provides a robust foundation for building flexible computational environments. Ali Marami explains how R-Brain leveraged the JupyterLab extension architecture to build a powerful IDE for data scientists, one of the few tools on the market that supports R and Python equally well for data science and includes features such as IntelliSense, debugging, and environment and data views.

Yoshi Nobu Masatani is a project researcher at the National Institute of Informatics (NII), an interuniversity research institute for information and systems, where he is responsible for the design and operation of the academic cloud within NII. He has a broad range of experience in OSS-based enterprise infrastructure deployments and operations, including mission-critical high-availability systems and big data clusters. Previously, Nobu was a senior specialist and manager of OSS professional services at NTT Data Corp.

Presentations

Collaboration and automated operation as literate computing for reproducible infrastructure Session

Jupyter is useful for DevOps. It enables collaboration between experts and novices to accumulate infrastructure knowledge, while automation via notebooks enhances traceability and reproducibility. Yoshi Nobu Masatani shows how to combine Jupyter with Ansible for reproducible infrastructure and explores knowledge, workflow, and customer support as literate computing practices.

Wes McKinney is a software architect at Two Sigma Investments. He is the creator of Python’s pandas library and a PMC member for Apache Arrow and Apache Parquet. He wrote the book Python for Data Analysis. Previously, Wes worked for Cloudera and was the founder and CEO of DataPad.

Presentations

Keynote by Wes McKinney Keynote

Details to come.

Daniel Mietchen is a biophysicist interested in integrating research workflows with the World Wide Web, particularly through open licensing, open standards, public version histories, and forkability. With research activities spanning from the subcellular to the organismic level, from fossils to developing embryos, and from insect larvae to elephants, he has experienced multiple shades of the research cycle and a variety of approaches to collaboration and sharing in research contexts. He has also been contributing to Wikipedia and its sister projects for more than a decade and is actively engaged in increasing the interactions between the Wikimedia and research communities.

Presentations

Postpublication peer review of Jupyter notebooks referenced in articles on PubMed Central Session

Jupyter notebooks are a popular option for sharing data science workflows. Daniel Mietchen shares best practices for reproducibility and other aspects of usability (documentation, ease of reuse, etc.) gleaned from analyzing Jupyter notebooks referenced in PubMed Central, a project that started at a hackathon earlier this year, is still ongoing, and is being documented on GitHub.

Christian Moscardi is director of technology for the Data Incubator. Previously, Christian developed a CMS for food blogs, worked for Google, and researched and taught at Columbia. He organizes with BetaNYC, New York’s civic tech organization, and contributes to various civic data projects. His extracurricular activities include cooking, playing the piano, and exploring New York.

Presentations

Practical machine learning with the Jupyter Notebook 2-Day Training

Christian Moscardi walks you through developing a machine learning pipeline, from prototyping to production, with the Jupyter platform, exploring data cleaning, feature engineering, model building and evaluation, and deployment in an industry-focused setting. Along the way, you'll learn Jupyter best practices and the Jupyter settings and libraries that enable great visualizations.

Teaching from Jupyter notebooks Session

Christian Moscardi shares the practical solutions developed at the Data Incubator for using Jupyter notebooks for education. Christian explores some of the open source Jupyter extensions he has written to improve the learning experience as well as tools to clean notebooks before they are committed to version control.

Andreas Müller is a lecturer at the Data Science Institute at Columbia University and author of Introduction to Machine Learning with Python (O’Reilly), which describes a practical approach to machine learning with Python and scikit-learn. His mission is to create open tools to lower the barrier of entry for machine learning applications, promote reproducible science, and democratize the access to high-quality machine learning algorithms. Andreas is one of the core developers of the scikit-learn machine learning library and has been comaintaining it for several years. He is also a Software Carpentry instructor. Previously, he worked at the NYU Center for Data Science on open source and open science and as a machine learning scientist at Amazon.

Presentations

Data analysis and machine learning in Jupyter Tutorial

Andreas Müller walks you through a variety of real-world datasets using Jupyter notebooks together with the data analysis packages pandas, seaborn, and scikit-learn. You'll perform an initial assessment of data, deal with different data types, visualization, and preprocessing, and build predictive models for tasks such as health care and housing.

Writing (and publishing) a book written in Jupyter notebooks Session

The Jupyter Notebook can combine narrative, code, and graphics—the ideal combination for teaching anything programming related. That's why Andreas Müller chose to write his book, Introduction to Machine Learning with Python, in a Jupyter notebook. However, going from notebook to book was not easy. Andreas shares challenges and tricks for converting notebooks for print.

Holds a bachelor’s degree in biological science with a focus on biophysics (bioinformatics) from UFRJ (2011) and a master’s degree in artificial intelligence from PPGI-UFRJ (2014). Has experience in computer science, working on web development, P2P networks, collaborative systems, recommendation systems, open source, and business intelligence (BI), and loves big data.

Presentations

Accelerating data-driven culture at the largest media group in Latin America with Jupyter Session

JupyterHub is an important tool for research and data-driven decisions at Globo.com. This session shows how all of Globo.com's data scientists can use Jupyter notebooks for data analysis and machine learning, with no installation or configuration, to make decisions that impact 50 million users per month.

Justin Nand is a solutions engineer at Zymergen. Justin has a background in bioengineering and has worked as a software engineer developing tools for research and computational biology.

Presentations

Using Jupyter at the intersection of robots and industrial biology Session

Zymergen approaches biology with an engineering and data-driven mindset. Its platform integrates robotics, software, and biology to deliver predictability and reliability during strain design and development. Marc Colangelo, Justin Nand, and Danielle Chou explain the integral role Jupyter notebooks play in providing a shared Python environment between Zymergen's software engineers and scientists.

Paco Nathan leads the Learning group at O’Reilly Media. Known as a “player/coach” data scientist, Paco led innovative data teams building ML apps at scale for several years and more recently was evangelist for Apache Spark, Apache Mesos, and Cascading. Paco has expertise in machine learning, distributed systems, functional programming, and cloud computing with 30+ years of tech-industry experience, ranging from Bell Labs to early-stage startups. Paco is an advisor for Amplify Partners and was cited in 2015 as one of the top 30 people in big data and analytics by Innovation Enterprise. He is the author of Just Enough Math, Intro to Apache Spark, and Enterprise Data Workflows with Cascading.

Presentations

Humans in the loop: Jupyter notebooks as a frontend for AI pipelines at scale Session

Paco Nathan reviews use cases where Jupyter provides a frontend to AI as the means for keeping humans in the loop. This process enhances the feedback loop between people and machines, and the end result is that a smaller group of people can handle a wider range of responsibilities for building and maintaining a complex system of automation.

Andrew Odewahn is the CTO of O’Reilly Media, where he helps define and create the new products, services, and business models that will help O’Reilly continue to make the transition to an increasingly digital future. The author of two books on database development, he has experience as a software developer and consultant in a number of industries, including manufacturing, pharmaceuticals, and publishing. Andrew holds an MBA from New York University and a degree in computer science from the University of Alabama. He’s also thru-hiked the Appalachian Trail from Georgia to Maine.

Presentations

Friday Opening Welcome Keynote

Program chairs Fernando Perez and Andrew Odewahn open the second day of keynotes.

Thursday opening welcome Keynote

Program chairs Andrew Odewahn and Fernando Perez open the first day of keynotes.

Yuvi Panda is a programmer and DevOps engineer at Wikimedia, where he works on making it easy for people who don’t traditionally consider themselves “programmers” to do things with code. He builds tools (Quarry, PAWS, etc.) to sidestep the list of historical accidents that constitute the “command-line tax” that people have to pay before doing productive things with computing. He also contributes to Project Jupyter, primarily around making it easier to deploy.

Presentations

Democratizing access to open data by providing open computational infrastructure Session

Open data by itself is not enough. Yuvi Panda explains how providing free, open, and public computational infrastructure with easy access to open data has helped people of all backgrounds easily use data however they want and why other organizations providing open data should do the same.

Deploying JupyterHub for students and researchers Tutorial

JupyterHub, a multiuser server for Jupyter notebooks, enables you to offer a notebook server to everyone in a group—which is particularly useful when teaching a course, as students no longer need to install software on their laptops. Min Ragan-Kelley, Carol Willing, Yuvi Panda, and Ryan Lovett get you started deploying and customizing JupyterHub for your needs.

Managing a 1,000+ student JupyterHub without losing your sanity Session

The UC Berkeley Data Science Education program uses Jupyter notebooks on a JupyterHub. Ryan Lovett and Yuvi Panda outline the DevOps principles that keep the largest reported educational hub (with 1,000+ users) stable and performant while enabling all the features instructors and students require.

Hilary Parker is a data scientist at Stitch Fix and cofounder of the Not So Standard Deviations podcast. Hilary focuses on R, experimentation, and rigorous analysis development methods such as reproducibility. Previously, she was a senior data analyst at Etsy. Hilary holds a PhD in biostatistics from the Johns Hopkins Bloomberg School of Public Health. Hilary can be found on Twitter at @hspter.

Presentations

Opinionated analysis development Session

Traditionally, statistical training has focused on statistical methods and tests, without addressing the process of developing a technical artifact, such as a report. Hilary Parker argues that it's critical to teach students how to go about developing an analysis so they avoid common pitfalls and explains why we must adopt a blameless postmortem culture to address these pitfalls as they occur.

Fernando Pérez is a staff scientist at Lawrence Berkeley National Laboratory and a founding investigator of the Berkeley Institute for Data Science at UC Berkeley, where his research focuses on creating tools for modern computational research and data science across domain disciplines, with an emphasis on high-level languages, interactive and literate computing, and reproducible research. Fernando created IPython while a graduate student in 2001 and continues to lead its evolution into Project Jupyter, now as a collaborative effort with a talented team that does all the hard work. He regularly lectures about scientific computing and data science and is a member of the Python Software Foundation, a founding member of the NumFOCUS Foundation, and a National Academy of Science Kavli Frontiers of Science Fellow. He is the recipient of the 2012 Award for the Advancement of Free Software from the Free Software Foundation. Fernando holds a PhD in particle physics from the University of Colorado at Boulder, followed by postdoctoral research in applied mathematics, developing numerical algorithms.

Presentations

Friday Opening Welcome Keynote

Program chairs Fernando Perez and Andrew Odewahn open the second day of keynotes.

Keynote by Fernando Perez Keynote

Details to come.

Thursday opening welcome Keynote

Program chairs Andrew Odewahn and Fernando Perez open the first day of keynotes.

Fabio Pliger is the technical lead for Anaconda Fusion and a Bokeh core developer at Continuum Analytics, where he also worked on the XDATA DARPA program and on customer projects. Fabio has 14+ years of experience in Python applied to both highly regulated enterprise environments and open source. He has been an open source and Python advocate for many years and has spoken at many tech conferences around the world. He is a former chairman of the EuroPython Society, cochair of the EuroPython Conference and PyCon Italy, and cofounder of the Python Italia Association. Fabio holds a bachelor’s degree in computer science from the University of Verona, Italy.

Presentations

Leveraging Jupyter to build an Excel-Python bridge Session

Christine Doig and Fabio Pliger explain how they built a commercial product on top of Jupyter to help Excel users access the capabilities of the rich Python data science ecosystem and share examples and use cases from a variety of industries that illustrate the collaborative workflow between analysts and data scientists that the application has enabled.

Cheryl Quah is a senior software engineer at Bloomberg LP, where she develops applications to improve financial professionals’ research and investment workflows.

Presentations

Industry and open source: Working together to drive advancements in Jupyter for quants and data scientists Session

Strong partnerships between the open source community and industry have driven many recent developments in Jupyter. Srinivas Sunkara and Cheryl Quah discuss the results of some of these collaborations, including JupyterLab, bqplot, and enhancements to ipywidgets that greatly enrich Jupyter as an environment for data science and quantitative financial research.

Min Ragan-Kelley is a postdoctoral fellow at Simula Research Lab in Oslo, Norway. Min has been contributing to IPython and Jupyter since 2006 (full-time since 2013). His areas of focus include the underlying infrastructure of Jupyter and deployment tools and services, such as JupyterHub and nbviewer.

Presentations

Deploying JupyterHub for students and researchers Tutorial

JupyterHub, a multiuser server for Jupyter notebooks, enables you to offer a notebook server to everyone in a group—which is particularly useful when teaching a course, as students no longer need to install software on their laptops. Min Ragan-Kelley, Carol Willing, Yuvi Panda, and Ryan Lovett get you started deploying and customizing JupyterHub for your needs.

JupyterHub: A roadmap of recent developments and future directions Session

JupyterHub is a multiuser server for Jupyter notebooks. Min Ragan-Kelley and Carol Willing discuss exciting recent additions and future plans for the project, including sharing notebooks with students and collaborators.

Bernie Randles is a graduate student in the Information Studies program at UCLA. Her work is centered around knowledge creation in astronomy, specifically examining astronomers’ data and software pipeline practices. She also researches the use of open source software in scientific research organizations, primarily in data-rich and computationally intensive fields. Previously, Bernie worked in IT (wearing many hats, some red) at several colleges and universities. She holds degrees in math, computer science, and fine arts.

Presentations

Citing the Jupyter Notebook in the scientific publication process Session

Although researchers have traditionally cited code and data related to their publications, they are increasingly using the Jupyter Notebook to share the processes involved in the act of scientific inquiry. Bernie Randles and Catherine Zucker explore various aspects of citing Jupyter notebooks in publications, discussing benefits, pitfalls, and best practices for creating the "paper of the future."

Megan Risdal is a marketing manager at Kaggle. She holds master’s degrees in linguistics from the University of California, Los Angeles, and North Carolina State University. Her curiosities lie at the intersection of data, science, language, and learning.

Presentations

Lessons learned from tens of thousands of Kaggle notebooks Session

Kaggle Kernels, an in-browser code execution environment that includes a version of Jupyter Notebooks, has allowed Kaggle to flourish in new ways. Drawing on a diverse repository of user-created notebooks paired with competitions and public datasets, Megan Risdal and Wendy Chih-wen Kan explain how Kernels has impacted machine learning trends, collaborative data science, and learning.

Ian Rose is a postdoctoral fellow at the Berkeley Institute for Data Science, where he works on the Jupyter Project. He holds a PhD in geology from UC Berkeley, where his research focused on the physics of the deep Earth.

Presentations

Real-time collaboration in Jupyter notebooks Session

Ian Rose shares recent work on allowing for real-time collaboration in Jupyter notebooks, including installation, usage, and design decisions.

Philipp Rudiger is a software developer at Continuum Analytics, where he develops open source and client-specific software solutions for data management, visualization, and analysis. Philipp holds a PhD in computational modeling of the visual system.

Presentations

Deploying interactive Jupyter dashboards for visualizing hundreds of millions of datapoints, in 30 lines of Python Tutorial

It can be difficult to assemble the right set of packages from the Python scientific software ecosystem to solve complex problems. James Bednar and Philipp Rudiger walk you step by step through making and deploying a concise, fast, and fully reproducible recipe for interactive visualization of millions or billions of data points using very few lines of Python in a Jupyter notebook.

Patty Ryan leads prototyping engagements with partners, both large and small, on the Technology Evangelism and Development team at Microsoft. She specializes in designing and operationalizing predictive models that inform strategies, focus customer outreach, and increase engagement. Previously, Patty led telemetry, analytics, UX, and support in Dynamics, Azure Identity, and O365, driving innovation in customer-facing self-service and distributed analytics.

Presentations

Notebook narratives from industry: Inspirational real-world examples and reusable industry notebooks Session

Patty Ryan, Lee Stott, and Michael Lanzetta explore four industry examples of Jupyter notebooks that illustrate innovative applications of machine learning in manufacturing, retail, services, and education and share four reference industry Jupyter notebooks (available in both Python and R)—along with demo datasets—for practical application to your specific industry value areas.

Zach Sailer is a graduate student in the Harms Lab at the University of Oregon, where he studies the mechanisms that shape protein evolution from a biophysical perspective. Previously, he was a core developer on the IPython/Jupyter team at Cal Poly San Luis Obispo. Zach has created and contributed to various scientific open source projects and is also a strong advocate for open science, working hard to promote and practice open science in all aspects of his research.

Presentations

How Jupyter makes experimental and computational collaborations easy Session

Scientific research thrives on collaborations between computational and experimental groups who work together to solve problems using their separate expertise. Zach Sailer highlights how tools like the Notebook, JupyterHub, and ipywidgets can be used to make these collaborations smoother and more effective.

Scott Sanderson is a senior software engineer at Quantopian, where he is responsible for the design and implementation of Quantopian’s backtesting and research APIs. Within the Jupyter ecosystem, most of Scott’s work focuses on enhancing the extensibility of the Jupyter Notebook for use in large deployments.

Presentations

Building a notebook platform for 100,000 users Session

Scott Sanderson describes the architecture of the Quantopian Research Platform, a Jupyter Notebook deployment serving a community of over 100,000 users, explaining how, using standard extension mechanisms, it provides robust storage and retrieval of hundreds of gigabytes of notebooks, integrates notebooks into an existing web application, and enables sharing notebooks between users.

Kaz Sato is a staff developer advocate on Google’s Cloud Platform team, where he focuses on machine learning and data analytics products, such as TensorFlow, Cloud ML, and BigQuery. Kaz has also led and supported developer communities for Google Cloud for over eight years. He has been an invited speaker at events including Google Cloud Next ’17 SF, Google I/O 2016 and 2017, the 2017 Strata Data Conference in London, the 2016 Strata + Hadoop World in San Jose and NYC, the 2016 Hadoop Summit, and ODSC East 2016 and 2017. Kaz is also interested in hardware and the IoT and has been hosting FPGA meetups since 2013.

Presentations

Cloud Datalab: Jupyter with the power of BigQuery and TensorFlow Session

Kazunori Sato explains how you can use Google Cloud Datalab—a Jupyter environment from Google that integrates BigQuery, TensorFlow, and other Google Cloud services seamlessly—to easily run SQL queries from Jupyter to access terabytes of data in seconds and train a deep learning model with TensorFlow with tens of GPUs in the cloud, with all the usual tools available on Jupyter.

Robert Schroll is a data scientist in residence at the Data Incubator. Previously, he held postdocs in Amherst, Massachusetts, and Santiago, Chile, where he realized that his favorite parts of his job were teaching and analyzing data. He made the switch to data science and has been at the Data Incubator since. Robert holds a PhD in physics from the University of Chicago.

Presentations

Machine learning with TensorFlow and Jupyter 2-Day Training

Robert Schroll introduces TensorFlow's capabilities through its Python interface with a series of Jupyter notebooks, moving from building machine learning algorithms piece by piece to using the higher-level abstractions provided by TensorFlow. You'll then use this knowledge to build and visualize machine learning models on real-world data.

Leah Silen has been the executive director of NumFOCUS since its founding and worked with the founding board members to write the application for NumFOCUS’s nonprofit status. Previously, Leah was a public relations and program director in the nonprofit sector, where she focused on community relations and fundraising. Leah has volunteered and sat on several boards of nonprofit organizations.

Presentations

Empower scientists; save humanity: NumFOCUS—Five years in, five hundred thousand to go Session

What do the discovery of the Higgs boson, the landing of the Philae robot, the analysis of political engagement, and the freedom of human trafficking victims have in common? NumFOCUS projects were there. Join Leah Silen to learn together how we can empower scientists and save humanity.

Steven Silvester is a software engineer at Continuum Analytics, where he works on Project Jupyter and JupyterLab, a next-generation user interface for the Jupyter Notebook. He has also written kernels for Octave, MATLAB, and Scilab. Previously, Steven served 10 years in the US Air Force.

Presentations

JupyterLab tutorial Tutorial

Steven Silvester and Jason Grout lead a walkthrough of JupyterLab as a user and as an extension author, explore the capabilities of JupyterLab, and offer a demonstration of how to create a simple extension to the environment.

Pierce Spitler leads product data science at Bitfusion, the world’s first end-to-end deep learning and AI development and infrastructure management platform. Previously, he served as the director of data science and insights for eyeQ, creators of next-generation personalized retail displays that leverage deep learning for facial recognition. He has several years’ experience interpreting sensor data, working with massive datasets, and performing deep learning on image and video data. Pierce is the co-organizer of the Austin Deep Learning meetup and writes and speaks on deep learning and applied data science.

Presentations

Deep learning and Elastic GPUs using Jupyter Session

Combined with GPUs, Jupyter makes for fast development and fast execution, but it is not always easy to switch from a CPU execution context to GPUs and back. Tim Gasper and Pierce Spitler share best practices on doing deep learning with Jupyter and explain how to work with CPUs and GPUs more easily by using Elastic GPUs and quick-switching between custom kernels.

Lee Stott is CTO of academic engagements at Microsoft, where he engages academic institutions across the UK in the ongoing development of the Microsoft platform. Lee has held a number of roles at Microsoft, including academic and technical evangelist. Previously, Lee was the head of information systems at the University of Manchester, where he led service and delivery teams across both academic and commercial markets. Lee holds a PGCE in higher education management from the University of Southampton and an MSc in information technology from the University of Liverpool.

Presentations

Notebook narratives from industry: Inspirational real-world examples and reusable industry notebooks Session

Patty Ryan, Lee Stott, and Michael Lanzetta explore four industry examples of Jupyter notebooks that illustrate innovative applications of machine learning in manufacturing, retail, services, and education and share four reference industry Jupyter notebooks (available in both Python and R)—along with demo datasets—for practical application to your specific industry value areas.

Srinivas Sunkara is a quant on the Quantitative Financial Research team at Bloomberg LP, where he works on developing financial models that apply machine learning techniques to various problems in finance. Srinivas is one of the main developers of bqplot, a Jupyter notebook–based interactive plotting library, and contributes to other open source projects, including ipywidgets and traitlets.

Presentations

Industry and open source: Working together to drive advancements in Jupyter for quants and data scientists Session

Strong partnerships between the open source community and industry have driven many recent developments in Jupyter. Srinivas Sunkara and Cheryl Quah discuss the results of some of these collaborations, including JupyterLab, bqplot, and enhancements to ipywidgets that greatly enrich Jupyter as an environment for data science and quantitative financial research.

Vinitra Swamy completed her bachelor’s degree in computer science at the University of California, Berkeley, in two years and is now pursuing a master’s in computer science. Her research interests include data science, cloud computing environments, and natural language processing. As head student instructor for Berkeley’s new Fundamentals of Data Science course, Vinitra has the chance to educate thousands of students from diverse backgrounds. Vinitra leads a Jupyter development research team of students at the Berkeley Institute for Data Science and assists with the technical deployment and use of JupyterHub infrastructure across campus.

Presentations

Data science at UC Berkeley: 2,000 undergraduates, 50 majors, no command line Session

Engaging critically with data is now a required skill for students in all areas, but many traditional data science programs aren’t easily accessible to those without prior computing experience. Gunjan Baid and Vinitra Swamy explore UC Berkeley's Data Science program—1,200 students across 50 majors—explaining how its pedagogy was designed to make data science accessible to everyone.

Thorin Tabor is a software engineer at UCSD and a contributing scientist at the Broad Institute. Thorin is the lead developer of the GenePattern Notebook and an open source developer working on the integration of bioinformatic tools with Jupyter.

Presentations

GenePattern Notebook: Jupyter for integrative genomics Session

Thorin Tabor offers an overview of the GenePattern Notebook, which allows Jupyter to communicate with the open source GenePattern environment for integrative genomics analysis. It wraps hundreds of software tools for analyzing omics data types, as well as general machine learning methods, and makes them available through a user-friendly interface.

David Taieb is the STSM for the Cloud Data Services Developer Advocacy team at IBM, where he leads a team of avid technologists with the mission of educating developers on the art of the possible with cloud technologies. He’s passionate about building open source tools, such as the PixieDust Python library for the Jupyter Notebook and Apache Spark, that help improve developers’ productivity and overall experience. Previously, David was the lead architect for the Watson Core UI and Tooling team based in Littleton, Massachusetts, where he led the design and development of a unified tooling platform to support all the Watson tools, including accuracy analysis, test experiments, corpus ingestion, and training data generation. Before that, he was the lead architect for the Domino Server OSGi team responsible for integrating the eXpeditor J2EE web container in Domino and building first-class APIs for the developer community. David started with IBM in 1996, working on various globalization technologies and products, including Domino Global Workbench and a multilingual content management system for the WebSphere Application Server. David enjoys sharing his experience by speaking at conferences and meeting as many people as possible. You’ll find him at various events like the Strata Data Conference, Velocity, and IBM Interconnect.

Presentations

Data science made easy in Jupyter notebooks using PixieDust and InsightFactory Session

David Taieb, Prithwish Chakraborty, and Faisal Farooq offer an overview of PixieDust, a new open source library that speeds data exploration with interactive autovisualizations that make creating charts easy and fun.

Andy is a data architect, computational scientist, and technical leader. As chief data scientist of REX Real Estate, he brings his experience building smart, scalable data systems to the real estate industry. He also leads the board of the NumFOCUS Foundation. A passionate advocate for open source scientific codes, Andy has been involved in the wider scientific Python community since 2006, contributing to numerous projects in the scientific stack.

Presentations

Empower scientists; save humanity: NumFOCUS—Five years in, five hundred thousand to go Session

What do the discovery of the Higgs boson, the landing of the Philae robot, the analysis of political engagement, and the freedom of human trafficking victims have in common? NumFOCUS projects were there. Join Leah Silen to learn together how we can empower scientists and save humanity.

Andrew Therriault is the chief data officer for the City of Boston, where he leads Boston’s Analytics team, a nationally recognized leader in using data science to improve city operations and make progress in critical areas such as public safety, education, transportation, and health. Previously, Andrew was director of data science for the Democratic National Committee and served as editor of Data and Democracy: How Political Data Science Is Shaping the 2016 Elections (O’Reilly). He holds a PhD in political science from NYU and completed a postdoctoral research fellowship at Vanderbilt.

Presentations

Jupyter notebooks and production data science workflows Session

Jupyter notebooks are a great tool for exploratory analysis and early development, but what do you do when it's time to move to production? A few years ago, the obvious answer was to export to a pure Python script, but now there are other options. Andrew Therriault dives into real-world cases to explore alternatives for integrating Jupyter into production workflows.

Rachel Thomas is the cofounder of fast.ai and a researcher-in-residence at USF Data Institute, where she teaches numerical linear algebra. Rachel helped create the free Practical Deep Learning for Coders MOOC, which 50,000 students have started. Previously, she worked as a quant in energy trading, a data scientist and engineer at Uber, and a senior instructor at Hackbright. Rachel is a popular writer on data science and diversity in tech. Her writing has made the front page of Hacker News and Medium, has been included in newsletters by O’Reilly, Fortune, crunchbase, and Mattermark, and has been translated into Spanish, Portuguese, and Chinese. Rachel holds a PhD in mathematics from Duke.

Presentations

How the Jupyter Notebook helped us teach deep learning to 50,000 students Keynote

A class of machine learning algorithms called deep learning is achieving state-of-the-art results across many fields. Although some people claim you must start with advanced math to use deep learning, we found that the best way for any coder to get started is with code. We used Jupyter notebooks to provide an environment that encourages students to learn deep learning through experimentation.

Rollin Thomas is a big data architect in the Data and Analytics Services group at Lawrence Berkeley National Laboratory. Previously, he was a staff scientist in the Computational Research division. Rollin has worked on numerical simulations of supernova atmospheres, observation and analysis of supernova spectroscopy data, and data management for supernova cosmology experiments. He has served as a member of the Nearby Supernova Factory, is a builder on the Dark Energy Survey, and is a full member of the Large Synoptic Survey Telescope Dark Energy Science Collaboration. Rollin holds a BS in physics from Purdue University and a PhD in astrophysics from the University of Oklahoma.

Presentations

How JupyterHub tamed big science: Experiences deploying Jupyter at a supercomputing center Session

Shreyas Cholia, Rollin Thomas, and Shane Canon share their experience leveraging JupyterHub to enable notebook services for data-intensive supercomputing on the Cray XC40 Cori system at the National Energy Research Scientific Computing Center (NERSC).

Marius Tulbure is a developer and JavaScript enthusiast at Figshare, always looking to evolve and improve his code and skills. If asked, he’ll list his hobbies as “everything,” but for the sake of brevity, they include binge-watching TV series and movies, playing his electric guitar, and trying to solve all sorts of hacking puzzles.

Presentations

Closing the gap between Jupyter and academic publishing Session

Reports of a lack of reproducibility have led funders and others to require open data and code as the outputs of research they fund. Mark Hahnel and Marius Tulbure discuss the opportunities for Jupyter notebooks to be the final output of academic research, arguing that Jupyter could help disrupt the inefficiencies in cost and scale of open access academic publishing.

Peter Wang is the cofounder and CTO of Continuum Analytics, where he leads the product engineering team for the Anaconda platform and open source projects including Bokeh and Blaze. Peter has been developing commercial scientific computing and visualization software for over 15 years and has software design and development experience across a broad variety of areas, including 3D graphics, geophysics, financial risk modeling, large data simulation and visualization, and medical imaging. As a creator of the PyData conference, he also devotes time and energy to growing the Python data community by advocating, teaching, and speaking about Python at conferences worldwide. Peter has a BA in physics from Cornell University.

Presentations

Fueling open innovation in a data-centric world Session

Peter Wang explores open source commercial companies, offering a firsthand account of the unique challenges of building a company that is fundamentally centered around sustainable open source innovation and sharing guidelines for how to carry volunteer-based open source values forward, intentionally and thoughtfully, in a data-centric world.

Jupyter & Anaconda: Shaking Up the Enterprise (sponsored by Continuum) Keynote

Open source has emerged as a valuable player in the enterprise in recent years, and projects like Jupyter and Anaconda are leading the way. Hear Peter Wang, CTO and cofounder of Continuum Analytics, discuss the coevolution of these two major players in the new open data science ecosystem and the next steps toward a sustainable future.

Christopher Wilcox is a software engineer at Microsoft, where he works on a range of products including Azure Notebooks, Python Tools for Visual Studio, and the Azure SDK for Python. Chris has more than five years’ experience building developer tooling and, more recently, scalable web services. In his spare time, he races motorcycles, hikes, and explores the Seattle brewing scene.

Presentations

Hosting Jupyter at scale Session

Have you thought about what it takes to host 500+ Jupyter users concurrently? What about managing 17,000+ users and their content? Christopher Wilcox explains how Azure Notebooks does this daily and discusses the challenges faced in designing and building a scalable Jupyter service.

Karlijn Willems is a data science journalist at DataCamp, where she writes for the DataCamp community, focusing on data science and data science education. Previously, she worked as a junior big data developer with Hadoop, Spark, and Scala. Karlijn holds a degree in literature and linguistics (English and Spanish) and information management from KU Leuven.

Presentations

Enhancing data journalism with Jupyter Session

Drawing inspiration from narrative theory and design thinking and exploring real-world examples, Karlijn Willems walks you through effectively using Jupyter notebooks to guide the data journalism workflow and tackle some of the challenges that data can pose to data journalism.

Carol Willing is a director of the Python Software Foundation, a Jupyter Steering Council member, and a geek in residence at FabLab San Diego, where she teaches wearable electronics and software development. She co-organizes PyLadies San Diego and San Diego Python, contributes to open source community projects, including OpenHatch, and is an active member of the MIT Enterprise Forum in San Diego. She enjoys sharing her passion for electronics, software, problem solving, and the arts. Previously, Carol worked in software engineering management, product and project management, sales, and nonprofit organizations. She holds an MS in management with an emphasis on applied economics and high-tech marketing from MIT and a BSE in electrical engineering from Duke University.

Presentations

Deploying JupyterHub for students and researchers Tutorial

JupyterHub, a multiuser server for Jupyter notebooks, enables you to offer a notebook server to everyone in a group—which is particularly useful when teaching a course, as students no longer need to install software on their laptops. Min Ragan-Kelley, Carol Willing, Yuvi Panda, and Ryan Lovett get you started deploying and customizing JupyterHub for your needs.

JupyterHub: A roadmap of recent developments and future directions Session

JupyterHub is a multiuser server for Jupyter notebooks. Min Ragan-Kelley and Carol Willing discuss exciting recent additions and future plans for the project, including sharing notebooks with students and collaborators.

Music and Jupyter: A combo for creating collaborative narratives for teaching Session

Music engages and delights. Carol Willing explains how to explore and teach the basics of interactive computing and data science by combining music with Jupyter notebooks, using music21, a tool for computer-aided musicology, and Magenta, a TensorFlow project for making music with machine learning, to create collaborative narratives and publishing materials for teaching and learning.

Catherine Zucker is a graduate student at Harvard University, where she is pursuing a PhD in astronomy. An NSF graduate research fellow, Catherine works with Alyssa Goodman and Douglas Finkbeiner on the 3D distribution of our galaxy’s gas and dust, in pursuit of a better understanding of the spiral structure of the Milky Way. She is an avid user of Jupyter Notebooks in her research and is broadly interested in their potential to make astronomy more open source, seamless, and accessible. Born and raised in Virginia, Catherine holds a double major in astronomy-physics and history from the University of Virginia, where her theses covered the evolution of galaxies in dense galaxy groups and the rise of the modern astronomical research observatory in the United States.

Presentations

Citing the Jupyter Notebook in the scientific publication process Session

Although researchers have traditionally cited code and data related to their publications, they are increasingly using the Jupyter Notebook to share the processes involved in the act of scientific inquiry. Bernie Randles and Catherine Zucker explore various aspects of citing Jupyter notebooks in publications, discussing benefits, pitfalls, and best practices for creating the "paper of the future."