Brought to you by NumFOCUS Foundation and O’Reilly Media Inc.
The official Jupyter Conference
August 22-23, 2017: Training
August 23-25, 2017: Tutorials & Conference
New York, NY

Speakers

New speakers are added regularly. Please check back to see the latest updates to the agenda.

Filter

Search Speakers

Safia Abdalla is one of the maintainers of nteract, a desktop-based interactive computing experience. A data scientist and software engineer with an interest in open source software and data science for social good, Safia is the organizer of PyData Chicago. In her free time, she enjoys running, working out, and drinking tea.

Presentations

How To Cross the Asteroid Belt Tutorial

Have you wondered what it takes to go from a Jupyter user to a Jupyter pro? Wonder no more! In this tutorial, we'll cover the core concepts of the Jupyter ecosystem, such as the extension ecosystem, the kernel ecosystem, and the frontend architecture. Attendees will leave with an understanding of the possibilities of the Jupyter ecosystem and practical skills for customizing the notebook experience.

I’m a software and data engineer, and a contributor to and author of noted Scala projects (coursier, shapeless, …).

Presentations

Scala: why hasn't an official Scala kernel for Jupyter emerged yet? Session

This talk aims to give an opinionated answer to the question: why hasn't an official Scala kernel for Jupyter emerged yet? Part of the answer lies in the fact that there is no Scala shell as user-friendly as IPython. But a strong contender is emerging! It still has to overcome a few challenges, not the least of which is supporting big data frameworks like Spark, Scio, Scalding, etc.

Gunjan is a student at the University of California, Berkeley. She completed her bachelor’s degree in computer science and biochemistry and is now pursuing a master’s in computer science with a research focus on computational biology. At UC Berkeley, Gunjan has been involved with the undergraduate data science education program. As a student instructor, she has worked with Jupyter notebooks in the classroom and now provides technical support for the JupyterHub infrastructure used at Berkeley. She looks forward to speaking about the use of notebooks in an educational setting.

Presentations

Data Science at UC Berkeley: 2000 undergraduates, 50 majors, no command line Session

Engaging critically with data is now a required skill for students in all areas, but many traditional data science programs aren’t easily accessible to those without prior computing experience. Our data science program has 1200 students across 50 majors (ranging from history & literature to cognitive science), and we explain how we designed our pedagogy to make data science accessible to everyone.

Lorena A. Barba is Associate Professor of Mechanical and Aerospace Engineering at the George Washington University in Washington, DC. She has a PhD in Aeronautics from the California Institute of Technology. Prof. Barba received the NSF Faculty Early CAREER award (2012), was named CUDA Fellow by NVIDIA Corp. (2012), is an awardee of the UK Engineering and Physical Sciences Research Council (EPSRC) First Grant program (2007), and is a leader in computational science and engineering internationally. Her research includes computational fluid dynamics, high-performance computing, computational biophysics and animal flight.

Prof. Barba is also a long-standing advocate of open-source software for science and education. Her courses and open educational resources using Jupyter notebooks are well known. She is a recipient of the 2016 Leamer-Rosenthal Award for Open Social Sciences, and in 2017 she received an Honorable Mention in the Open Education Consortium’s Open Education Awards for Excellence.

Presentations

Keynote by Lorena Barba Keynote

Details to come.

Dr. Jim Bednar is a Senior Solutions Architect at Continuum Analytics and an Honorary Fellow in the School of Informatics at the University of Edinburgh, Scotland. Dr. Bednar holds a Ph.D. in Computer Science from the University of Texas, along with degrees in Electrical Engineering and Philosophy. He has published more than 50 papers and books about the visual system, data visualization, and software development. Dr. Bednar manages the open source Python projects Datashader, HoloViews, GeoViews, ImaGen, Param, and ParamNB. Before Continuum, Dr. Bednar was a lecturer and researcher in Computational Neuroscience at the University of Edinburgh, Scotland, as well as a software and hardware engineer at National Instruments.

Presentations

Deploying interactive Jupyter dashboards for visualizing hundreds of millions of datapoints, in 30 lines of Python Tutorial

It can be difficult to assemble the right set of packages from the Python scientific software ecosystem to solve complex problems. This presentation will show step by step how to make and deploy a concise, fast, and fully reproducible recipe for interactive visualization of millions or billions of datapoints using very few lines of Python in a Jupyter notebook.

Daina is the Head Librarian of the Harvard-Smithsonian Center for Astrophysics in Cambridge, MA. Her work aims to lower social and technical barriers that impact the astronomy community’s ability to create and share new knowledge. Her research interests primarily focus on how libraries can support open science, research software preservation, emerging computational methods, and the history of science. She is currently enrolled in CUNY’s School of Professional Studies, where she is earning an MS in Data Analytics.

Presentations

Beautiful networks and network analytics made simpler with Jupyter Session

Network analytics using tools like NetworkX and Jupyter often leaves programmers with difficult-to-examine hairballs rather than useful visualizations. Meanwhile, more flexible tools like SigmaJS have steep learning curves for people new to JavaScript. This session will show how a simple, flexible architecture can help people make beautiful JavaScript network visualizations without ditching the Jupyter notebook.

Maarten Breddels is a postdoctoral researcher at the Kapteyn Astronomical Institute, University of Groningen (RUG), Netherlands. He earned a bachelor’s degree in information technology, and a bachelor’s, master’s, and PhD in astronomy. Maarten has experience ranging from low-level languages such as assembly and C to higher-level languages such as C++, Java, and Python. His PhD was in the field of galactic dynamics. He now works on the Gaia mission, combining astronomy and IT to enable visualization and exploration of the large dataset this satellite will yield.

Presentations

A billion stars in the Notebook Session

I will present vaex and ipyvolume. Vaex enables calculating statistics for a billion samples per second on a regular N-dimensional grid; ipyvolume enables volume and glyph rendering in the notebook. Together they allow interactive visualization and exploration of large, high-dimensional datasets in the notebook.

Matt Burton earned his PhD in Information from the University of Michigan in 2015. His dissertation “Blogs as Infrastructure for Scholarly Communication” explored digital humanities blogging and the socio-technical dynamics of web-centric publishing. His research interests include infrastructure studies, data science, and scholarly communication. He is currently a Visiting Assistant Professor at the School of Computing and Information at the University of Pittsburgh.

Presentations

Defactoring Pace of Change: Reviewing computational research in the digital humanities Session

While Jupyter Notebooks are a boon for computational science, they are also a powerful tool in the digital humanities. This talk introduces the digital humanities community, discusses a novel use of Jupyter Notebooks to analyze computational research, and reflects upon Jupyter’s relationship to scholarly publishing and the production of knowledge.

Natalino Busa is currently head of data science at Teradata, where he leads the definition, design, and implementation of big, fast data solutions for data-driven applications, such as predictive analytics, personalized marketing, and security event monitoring. Previously, Natalino served as enterprise data architect at ING and as senior researcher at Philips Research Laboratories on the topics of system-on-a-chip architectures, distributed computing, and parallelizing compilers. Natalino is an all-around technology manager, product developer, and innovator with a 15+ year track record in research, development, and management of distributed architectures and scalable services and applications.

Presentations

Data Science Apps: Beyond Notebooks Session

Jupyter notebooks are transforming the way we look at computing, coding, and science. But is this the only "data scientist experience" this technology can provide? In fact, you can use Jupyter to create interactive web applications for data exploration and analysis. In the background, these apps are still powered by well-understood and well-documented Jupyter notebooks.

Charlotte Cabasse-Mazel holds a PhD in Geography and Science and Technologies Studies from the University of Paris-Est, where she studied at the Laboratoire Techniques, Territoires et Sociétés (LATTS), at Ecole Nationale des Ponts et Chaussées. She is interested in the ways in which practices and methodologies of data science transform production of knowledge and interdisciplinary collaboration, as well as scientific personae and trajectories within the academic institution.

Presentations

Jupyter and the changing rituals around computation Session

Jupyter Notebooks are not only transforming how people communicate knowledge, but also supporting new social and collaborative practices. In this talk, we present ethnographic findings about various rituals performed with Jupyter notebooks. The concept of rituals is useful for thinking about how the core technology of notebooks is extended through other tools, platforms, and practices.

Brett is a Python core developer who works on Python at Microsoft on the Azure Data Science Tools team.

Presentations

Keynote by Brett Cannon Keynote

Details to come.

Shane Canon joined NERSC in 2000 to serve as a system administrator for the PDSF cluster. While working with PDSF, he gained experience in cluster administration, batch systems, parallel file systems, and the Linux kernel. In 2005, Shane left LBNL to take a position as a group leader at Oak Ridge National Laboratory; one of his more significant accomplishments while at ORNL was architecting the 10-petabyte Spider file system. In 2008, Shane returned to NERSC to lead the Data Systems Group, and in 2009 he transitioned to leading a newly created Technology Integration Group in order to concentrate on the Magellan Project and other areas of strategic focus. More recently, Shane has focused on enabling data-intensive applications on HPC platforms and engaging with bioinformatics applications; he joined the Data & Analytics Services group in 2016 to work on these topics. Shane is involved in a number of projects outside of NERSC, and is the production lead on the KBase project, which is developing a platform to enable predictive biology. Shane has a Ph.D. in Physics from Duke University and a B.S. in Physics from Auburn University.

Presentations

How JupyterHub Tamed Big Science: Experiences Deploying Jupyter at a Supercomputing Center Session

Extracting scientific insights from data increasingly demands a richer, more interactive experience than high-performance computing systems have traditionally provided. We present our efforts to leverage JupyterHub to enable notebook services for data-intensive supercomputing on the Cray XC40 Cori system at the National Energy Research Scientific Computing Center (NERSC).

Shreyas Cholia leads the Usable Software Systems Group at Lawrence Berkeley National Laboratory (LBNL), focused on making scientific computing more transparent and usable. He is particularly interested in how web APIs and tools can facilitate this. Shreyas also leads the science gateway, web and grid efforts at the National Energy Research Scientific Computing Center (NERSC) at LBNL. His current work includes a project that enables Jupyter to interact with supercomputing resources, and NEWT – a REST API for high performance computing. He graduated from Rice University, where he studied Computer Science and Cognitive Sciences.

Presentations

How JupyterHub Tamed Big Science: Experiences Deploying Jupyter at a Supercomputing Center Session

Extracting scientific insights from data increasingly demands a richer, more interactive experience than high-performance computing systems have traditionally provided. We present our efforts to leverage JupyterHub to enable notebook services for data-intensive supercomputing on the Cray XC40 Cori system at the National Energy Research Scientific Computing Center (NERSC).

Danielle is a solutions engineer at Zymergen, working on custom software tools for scientists. Prior to Zymergen, she worked on failure detection software for an ingestible sensor company and studied Bioengineering at UC Berkeley/UCSF.

Presentations

Using Jupyter at the intersection of robots and industrial biology Session

Zymergen is a technology company that approaches biology with an engineering and data-driven mindset. Our platform integrates robotics, software, and biology to deliver predictability and reliability during strain design and development. This session will highlight how Jupyter notebooks play an integral role in providing a shared Python environment for our software engineers and scientists.

I am interested in the intersection of education, industry, and academia, and in seeing what happens when you make powerful scientific modelling, visualization, and communication tools accessible through the web. To explore these ideas, I founded 3point Science, where we built web-based visualization software for the geoscience industry (Steno3D). 3point Science was acquired by Aranz Geo in 2016, and I have remained on as CTO.

I am also a graduate student at The University of British Columbia (Canada) where I am researching a numerical framework aimed at increasing quantitative communication in the geosciences. This framework has been developed through my studies on numerical geophysics, subsurface flow, and structural geology. Much of my research is accessible through an open-source software initiative for geophysical simulations and parameter estimation (SimPEG) and an open website for geoscience modelling (Visible Geology).

Presentations

Deploying a reproducible course Session

In deploying a short course on geophysics, we have been developing strategies for building an “educational stack.” Web-based textbooks and interactive simulations built in Jupyter notebooks provide an entry point for course participants to reproduce the content they are shown and to dive into the code used to build it. We will share the tools we are using and discuss some of the lessons we have learned.

Marc received his Bachelor of Health Sciences degree from McMaster University in Canada in 2004. He subsequently received his Ph.D. in 2011 from the Medical Sciences Program (Infection and Immunity Stream) in McMaster’s Department of Pathology and Molecular Medicine. Before joining Zymergen, Marc worked in various research areas, including immunology, dynamic proteomics systems, and health care data modeling.

Presentations

Using Jupyter at the intersection of robots and industrial biology Session

Zymergen is a technology company that approaches biology with an engineering and data-driven mindset. Our platform integrates robotics, software, and biology to deliver predictability and reliability during strain design and development. This session will highlight how Jupyter notebooks play an integral role in providing a shared Python environment for our software engineers and scientists.

Sylvain Corlay is a quant researcher specializing in stochastic analysis and optimal control. He holds a PhD in applied mathematics from University Paris VI.

As an open source developer, Sylvain mostly contributes to Project Jupyter in the area of interactive widgets and lower-level components such as traitlets; he is also a member of the project’s steering committee. Beyond Jupyter, Sylvain contributes to a number of other open-source projects for scientific computing and data visualization, such as bqplot, pythreejs, and ipyleaflet. He also coauthored the xtensor C++ tensor algebra library.

Sylvain founded QuantStack in September 2016. Prior to founding QuantStack, Sylvain was a quant researcher at Bloomberg LP and an adjunct faculty member at Columbia University and NYU.

Presentations

Jupyter Widgets: Interactive controls for Jupyter Tutorial

Jupyter widgets enable building user interfaces with graphical controls, such as sliders and textboxes, inside Jupyter notebooks, documentation, and web pages. Jupyter widgets also provide a framework for building custom controls. We will show how to use Jupyter widgets effectively for interactive computing, explore the ecosystem of custom controls, and demonstrate how to build your own.

Xeus: a framework for writing native Jupyter kernels Session

xeus is a library meant to facilitate the implementation of kernels for Jupyter. It takes on the burden of implementing the Jupyter kernel protocol so that kernel authors can focus on the language-specific part of a kernel and more easily support features such as autocompletion and interactive widgets. We will showcase a new C++ kernel, based on the cling interpreter, built with xeus.

John is a freelance developer, data scientist, and musician from Queens, NY. He is currently enrolled in CUNY’s School of Professional Studies, where he is finishing his master’s degree in data analytics. His current research revolves around the development of musical intelligence systems using natural language processing techniques, with a focus on realtime human-computer interaction. John also has an interest in developing applications for data scientists that emphasize interactive data visualization, leveraging the best tools currently available in both Python and Node.js.

Presentations

Beautiful networks and network analytics made simpler with Jupyter Session

Network analytics using tools like NetworkX and Jupyter often leaves programmers with difficult-to-examine hairballs rather than useful visualizations. Meanwhile, more flexible tools like SigmaJS have steep learning curves for people new to JavaScript. This session will show how a simple, flexible architecture can help people make beautiful JavaScript network visualizations without ditching the Jupyter notebook.

Christine Doig (@ch_doig) is a Product Manager and Senior Data Scientist at Continuum Analytics. She has 8+ years of experience in analytics, operations research, and machine learning in a variety of industries, including energy, manufacturing, and banking. Christine holds an M.S. in Industrial Engineering from the Polytechnic University of Catalonia in Barcelona. She is an open source advocate and has spoken at PyData, EuroPython, SciPy, PyCon, OSCON, and many other open source conferences.

Presentations

Leveraging Jupyter to build an Excel-Python bridge Session

This talk will introduce how we built a commercial product on top of Jupyter to help Excel users access the capabilities of the rich Python data science ecosystem. We'll present examples and use cases from a variety of industries, the collaborative workflow between analysts and data scientists that the application has enabled, and how we leveraged the Jupyter architecture to build the product.

Nadia explores how we can better support open source infrastructure, highlighting current gaps in funding and knowledge. She recently published “Roads and Bridges: The Unseen Labor Behind Our Digital Infrastructure” with support from the Ford Foundation. Nadia is currently building sustainability initiatives at GitHub. She is based in San Francisco.

Presentations

Keynote by Nadia Eghbal Keynote

Details to come.

Brittany Fiore-Gartland, Ph.D., is the Director of Data Science Ethnography at the eScience Institute and a Research Scientist in the Department of Human Centered Design and Engineering. She leads a research group that studies the sociocultural implications of data-intensive science and how data-intensive technologies are reshaping how people work and organize. Her research focuses on cross-sector and interdisciplinary data science collaborations; emerging pedagogical models for data science; and bringing a human-centered, sociotechnical, and ethical perspective to data science practice.

She co-leads UW Data Science Studies, an interdisciplinary group of researchers studying the sociotechnical and ethical dimensions of the emerging practice of data science. UW Data Science Studies is part of a collaborative and multi-sited working group supported through the Moore-Sloan Data Science Environments and in partnership with researchers at Berkeley Institute for Data Science and the Center for Data Science in New York University.

Whenever possible her work follows a model of action research. This means her research practice aims to inform and affect positive change within the communities she studies. Often this takes the form of articulating the challenges and opportunities for communication and collaboration during times of technological change. She works with communities to bridge communication gaps and develop value-informed, reflexive, and adaptive organizational practice.

Presentations

Jupyter and the changing rituals around computation Session

Jupyter Notebooks are not only transforming how people communicate knowledge, but also supporting new social and collaborative practices. In this talk, we present ethnographic findings about various rituals performed with Jupyter notebooks. The concept of rituals is useful for thinking about how the core technology of notebooks is extended through other tools, platforms, and practices.

Tim Gasper is Director of Product & Marketing at Bitfusion, a GPU virtualization company enabling easier, more scalable deep learning, as well as co-founder at Ponos, an IoT-enabled hydroponics farming technology company. Tim has over eight years of big data, IoT, and enterprise content product management and product marketing experience. He is also a writer and speaker on entrepreneurship, the Lean Startup methodology, and big data analytics. Previously, Tim was global portfolio manager for CSC Big Data and Analytics, where he was responsible for the overall strategy, roadmap, partnerships, and technology mix for the big data and analytics product portfolio; vice president of product at Infochimps (acquired by CSC), where he led product development for its market-leading open data marketplace and big data platform as a service; and cofounder of Keepstream, a social media analytics and curation company.

Presentations

Deep Learning and Elastic GPUs using Jupyter Session

Jupyter is great for deep learning development and training. Combined with GPUs, it makes for fast development and fast execution, but it doesn’t make it easy to switch from a CPU execution context to GPUs and back. We’ll look at best practices for doing deep learning with Jupyter and then show how to work with CPUs and GPUs more easily by using elastic GPUs and quick-switching between custom kernels.

I am a scientific research lead and author of popular open source tools in bioinformatics and statistical programming, with applications in healthcare, life sciences, and beyond. I focus on data science, visualization, machine learning, data mining, and prototyping software to understand molecular, cellular, and clinical data. I currently work at Verily Life Sciences (formerly known as Google Life Sciences).

Open source data-analysis projects I have led or significantly contributed to include:

  • Bioconductor
  • affy: Analysis of Affymetrix microarray data
  • rpy2, a Python-R bridge for data analysis and prototyping

Presentations

Pragmatic polyglot data analysis in Jupyter notebooks with SQL, Python and R Tutorial

Python is popular for data analysis, but restricting oneself to Python alone means missing a wealth of libraries and capabilities available in R or SQL. This tutorial will demonstrate that a polyglot approach can be pragmatic, reasonable, and good-looking, thanks to R visualizations.
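As a minimal sketch of the polyglot idea, assuming nothing beyond the Python standard library (the R leg, typically reached via rpy2, is omitted here): a notebook cell can let SQL do the aggregation and hand the result back to Python. The table and column names below are hypothetical.

```python
import sqlite3

# Hypothetical toy data: (species, sepal_length) pairs.
rows = [("setosa", 5.0), ("setosa", 5.5), ("virginica", 6.25)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE iris (species TEXT, sepal_length REAL)")
conn.executemany("INSERT INTO iris VALUES (?, ?)", rows)

# SQL performs the aggregation...
query = "SELECT species, AVG(sepal_length) FROM iris GROUP BY species"
means = dict(conn.execute(query).fetchall())

# ...and Python picks up the result for further work.
print(means["setosa"])  # 5.25
conn.close()
```

In a real notebook, the same pattern could extend to a Postgres client or an rpy2 `%%R` cell for ggplot2 visualizations.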

R. Stuart Geiger is an ethnographer and postdoctoral scholar at the Berkeley Institute for Data Science at UC Berkeley, where he studies the infrastructures and institutions that support the production of knowledge. His Ph.D. research at the UC Berkeley School of Information focused on the governance and operation of Wikipedia and scientific research networks. He has studied topics including newcomer socialization, moderation and quality control, specialization and professionalization, cooperation and conflict, the roles of support staff and technicians, and diversity and inclusion. He uses ethnographic, historical, qualitative, and quantitative methods in his research, which is grounded in the fields of computer-supported cooperative work, science and technology studies, and communication and new media studies.

Presentations

Jupyter and the changing rituals around computation Session

Jupyter Notebooks are not only transforming how people communicate knowledge, but also supporting new social and collaborative practices. In this talk, we present ethnographic findings about various rituals performed with Jupyter notebooks. The concept of rituals is useful for thinking about how the core technology of notebooks is extended through other tools, platforms, and practices.

Matt Greenwood joined Two Sigma in November of 2003 and since then has led a number of company-wide efforts in both engineering and modeling. Matt began his career at Bell Labs, working in the Operating Systems group under Dennis Ritchie. Subsequently, he moved on to IBM Research, where he was responsible for a number of early efforts in tablet computing and distributed computing. In 2000, Matt joined colleagues from Bell Labs at Entrisphere, Inc. in California. Entrisphere created a product providing access equipment for broadband service providers. During his tenure there, he was lead developer and manager for a number of systems on the network element, and he then created the Customer Engineering department in preparation for initial customer trials. Matt earned a B.A. and M.A. in Math from Oxford University, and a Master’s degree in Theoretical Physics from the Weizmann Institute of Science in Israel. He also holds a Ph.D. in Mathematics from Columbia University, where he taught for a number of years.

Presentations

From Beaker to BeakerX Session

This talk will introduce BeakerX, a set of Jupyter notebook extensions that enable polyglot data science, time-series plotting and processing, research publication, and integration with Apache Spark. We’ll review the Jupyter extension architecture and how BeakerX plugs into it, cover the current set of BeakerX capabilities, and discuss the pivot from Beaker, a standalone notebook, to BeakerX.

Jason Grout is a Jupyter developer at Bloomberg, working primarily on JupyterLab and the interactive widget system. He has also been a major contributor to the open source Sage mathematical software system for many years. Jason also co-organizes the PyDataNYC Meetup. Previously, Jason was an assistant professor of mathematics at Drake University in Des Moines, Iowa. He earned a PhD in mathematics from Brigham Young University.

Presentations

JupyterLab Tutorial Tutorial

A walkthrough of JupyterLab as a user and as an extension author: a tour of the capabilities of JupyterLab and a demonstration of creating a simple extension to the environment.

Mark is the founder and CEO of figshare, an open data tool that allows researchers to publish all of their data in a citable, searchable, and sharable manner. He previously completed his PhD in stem cell biology at Imperial College London. He is passionate about open science and the potential it has to revolutionise the research community. For more information about figshare, visit https://figshare.com. You can follow him at @MarkHahnel.

Presentations

Closing the gap between Jupyter and Academic Publishing Session

Reports of a lack of reproducibility have led funders and others to require open data and code as the outputs of research they fund. In this talk, we will describe the opportunities for Jupyter notebooks to be the final output of academic research and discuss how Jupyter could help disrupt the inefficiencies in cost and scale of open access academic publishing.

Lindsey Heagy is a PhD candidate at the University of British Columbia studying numerical geophysics. Her work focuses on using electromagnetic geophysics for monitoring subsurface injections, including carbon capture and storage and hydraulic fracturing. She is a project lead on GeoSci.xyz, an effort to build collaborative, interactive, web-based textbooks in the geosciences, and a core contributor to SimPEG, an open source framework for geophysical simulations and inversions.

Presentations

Deploying a reproducible course Session

In deploying a short course on geophysics, we have been developing strategies for building an “educational stack.” Web-based textbooks and interactive simulations built in Jupyter notebooks provide an entry point for course participants to reproduce the content they are shown and to dive into the code used to build it. We will share the tools we are using and discuss some of the lessons we have learned.

Dr. Jordan is a product of the Detroit Public School system. After graduating from Martin Luther King High School, she attended Michigan Technological University, majoring in Mechanical Engineering. She earned B.S. and M.S. degrees and interned with various companies, including Marathon Petroleum Company, S.C. Johnson, Ford Motor Company, and Educational Testing Services. As a graduate student, she received fellowships from the National Society of Black Engineers (NSBE), the King-Chavez-Parks Initiative, and the National GEM Consortium, and she served on the NSBE Board of Directors for three years. After completing a Master’s degree in Education and a PhD in Engineering Education at The Ohio State University, Dr. Jordan completed a two-year postdoctoral fellowship at Embry-Riddle Aeronautical University, where her research focused on evidence-based instructional practices among STEM faculty. She is currently the Deputy Director of Assessment for Data Carpentry and an advocate for improving diversity in data science.

Presentations

Learning to code isn’t enough. Training as a Pathway to Improve Diversity Session

Diversity can be achieved through sharing information among members of a community. As Jupyter prides itself on being a community of “dynamic developers,” “cutting-edge scientists,” and “everyday users,” is our platform being shared with diverse populations? This session explores how training has the potential to improve diversity and drive usage of Jupyter notebooks in broader communities.

Wendy Kan is a data scientist at Kaggle, the largest global data science community. Wendy works with companies and organizations to transform their data into machine learning competitions. She was a software engineer and researcher before joining Kaggle. She holds BS and MS degrees in Electrical Engineering and a PhD in Biomedical Engineering.

Presentations

Lessons learned from tens of thousands of Kaggle notebooks Session

Kaggle Kernels, an in-browser code execution environment that includes a version of Jupyter notebooks, has allowed Kaggle, home of the world’s largest data science community, to flourish in new ways. Drawing on a diverse repository of user-created notebooks paired with competitions and public datasets, we share how Kernels has impacted machine learning trends, collaborative data science, and learning.

Kyle Kelley is a senior software engineer at Netflix, a maintainer on nteract.io, and a core developer of the IPython/Jupyter project. He wants to help build great environments for collaborative analysis, development, and production workloads for everyone, from small teams to massive scale.

Presentations

Jupyter at Netflix Session

Netflix Data Scientists and Engineers. What do they know? Do they know things? Let's find out!

Chris Kotfila holds dual degrees in Computer Science and Philosophy from Rensselaer Polytechnic Institute. During his time at RPI, he worked regularly as a research programmer in the area of computational cognitive engineering. Chris served overseas with the US Peace Corps, and on returning to the States he earned a master’s degree in Library Science, focusing on issues of open access, scholarly communication, and reproducible research.

Chris’s research interests are in natural language processing, machine learning, knowledge organization and geographic information science. He is an avid open source enthusiast, and a hopeless Emacs user.

Presentations

GeoNotebook: an extension to the Jupyter Notebook for exploratory geospatial analysis Session

GeoNotebook is an extension to the Jupyter Notebook that provides interactive visualization and analysis of geospatial data. Unlike other geospatial extensions to the Notebook, GeoNotebook includes a fully integrated tile server, providing easy visualization of vector and raster data formats.

Aaron Kramer is a data scientist and engineer at DataScience Inc., where he builds powerful language and engagement models using natural language processing, deep learning, Bayesian inference, and machine learning.

Presentations

Interactive Natural Language Processing with spaCy and Jupyter Tutorial

Modern natural language processing workflows often require interoperability between multiple tools. This tutorial is an introduction to interactive NLP with spaCy within the Jupyter Notebook. We'll cover core NLP concepts and core workflows in spaCy, and work through examples of interacting with other tools like TensorFlow, networkx, and LIME as part of interactive NLP projects.

Michael has worked on Microsoft Live Search and Windows Mobile Services, Bing Travel (http://www.bing.com/travel), MSN and MSN Mobile, and FUSE Labs (http://www.so.cl), and now works in Developer Experience on the Partner Catalyst Team (http://dxdevblog.azurewebsites.net/developerblog/real-life-code/).

His current work ranges from implementing binary protocols in Javascript (https://github.com/noodlefrenzy/node-amqp10) to training domain-specific image classification convolutional neural networks (http://www.mikelanzetta.com/2015/09/image-stream-processing-to-blob-storage/). He works with everyone from small startups to large enterprise customers – anyone doing innovative work that is stretching the Microsoft stack (and in particular Azure) beyond its current limits.

Michael is currently the head of the Machine Learning Technical Working Group in DX, helping upskill Microsoft’s field in ML and Deep Learning, and leading efforts to bring Microsoft’s suite of ML technologies to the aid of our partners, large and small.

Presentations

Notebook Narratives from Industry – Inspirational Real-World Examples and Reusable Industry Notebooks Session

We describe, with video and demonstrations, four inspirational industry applications of Jupyter notebooks. These industry examples represent innovative applications of machine learning in manufacturing, retail, services, and education. We also present and share four reference industry Jupyter notebooks, along with demo datasets, for practical application to key industry value areas.

Ryan Lovett manages research and instructional computing for the Department of Statistics at UC Berkeley. He is most often a sysadmin though enjoys programming and consulting with faculty and students.

Presentations

Deploying JupyterHub for students and researchers Tutorial

JupyterHub, a multi-user server for Jupyter notebooks, enables you to offer a notebook server to everyone in a group. When teaching a course, you can use JupyterHub to give each student access to the same resources and notebooks. There’s no need for the students to install software on their laptops. This tutorial will get you started deploying and customizing JupyterHub for your needs.
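JupyterHub is customized through a single Python configuration file. As a minimal sketch (the option names come from JupyterHub's traitlets-based configuration system; the usernames, port, and landing page below are placeholder assumptions, not recommendations), a `jupyterhub_config.py` for a small course might look like:

```python
# jupyterhub_config.py -- minimal sketch for a small course deployment.
# JupyterHub injects get_config() when it loads this file; the user
# names and network settings below are placeholders.
c = get_config()

# Where the hub's public proxy listens.
c.JupyterHub.ip = '0.0.0.0'
c.JupyterHub.port = 8000

# Which system users may log in, and who can administer the hub.
c.Authenticator.whitelist = {'student1', 'student2'}
c.Authenticator.admin_users = {'instructor'}

# Land each user in the notebook file browser after their server spawns.
c.Spawner.default_url = '/tree'
```

Starting the hub with `jupyterhub -f jupyterhub_config.py` picks up this file; choosing an authenticator and spawner appropriate to your institution is the bulk of the customization work the tutorial covers.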

Managing a 1000+ student JupyterHub without losing your sanity Session

For our data science education program, we use Jupyter notebooks on a JupyterHub so students can learn data science without being distracted by details like installing and debugging Python packages. This talk will explain the DevOps principles we use to keep our hub (1000+ users, largest reported educational hub) stable and performant, and have all the features our instructors and students want.

Johan Mabille is a scientific software developer specializing in high-performance computing in C++. He holds a master’s degree in computer science from Centrale-Supelec. As an open source developer, Johan coauthored xtensor and xeus, and is the main author of xsimd. Prior to joining QuantStack, Johan was a quant developer at HSBC.

Presentations

Xeus: a framework for writing native Jupyter kernels Session

xeus is a library meant to facilitate the implementation of kernels for Jupyter. It takes on the burden of implementing the Jupyter kernel protocol so that kernel authors can focus on the language-specific part of the kernel and more easily support features such as autocomplete and interactive widgets. We showcase a new C++ kernel, built with xeus, based on the cling interpreter.

Ali Marami has a PhD in Finance from the University of Neuchâtel in Switzerland and a BS in Electrical Engineering. He has extensive experience in financial and quantitative modeling and model risk management at several US banks. He is the Chief Data Scientist and one of the founders of R-Brain, a platform for developing, sharing, and promoting models and applications in data science.

Presentations

Building a Powerful Data Science IDE for R, Python and SQL using JupyterLab Session

JupyterLab provides a robust foundation for building flexible computational environments. As contributors to this project, we have leveraged the JupyterLab extension architecture to build a powerful IDE. The R-Brain IDE is one of the few tools on the market that supports R and Python for data science equally well, with important features such as IntelliSense, debugging, and environment and data views.

Nobu is a project researcher responsible for the design and operation of the academic cloud at NII (the National Institute of Informatics), an inter-university research institute of the Research Organization of Information and Systems. He was formerly a senior specialist/manager of OSS professional services at NTT Data Corp. and has a broad range of experience with OSS-based enterprise infrastructure deployment and operations, from mission-critical high-availability systems to big data clusters.

Presentations

Collaboration and Automated Operation as Literate Computing for Reproducible Infrastructure Session

Jupyter is useful for DevOps as well. It enables collaboration between experts and novices to accumulate infrastructure knowledge, as well as between technical and nontechnical users. Automation via notebooks enhances traceability and reproducibility. We elaborate on knowledge sharing, workflows, and customer support as Literate Computing practices, and we show how to combine Jupyter with Ansible for reproducible infrastructure.

Wes McKinney is a software architect at Two Sigma Investments. He is the creator of Python’s pandas library, and he is a PMC member for Apache Arrow and Apache Parquet. He wrote the book Python for Data Analysis. Previously, Wes worked for Cloudera, and he was the founder and CEO of DataPad.

Presentations

Keynote by Wes McKinney Keynote

Details to come.

Daniel Mietchen is a biophysicist interested in integrating research workflows with the World Wide Web, particularly through open licensing, open standards, public version histories and forkability. With research activities spanning from the subcellular to the organismic level, from fossils to developing embryos, and from insect larvae to elephants, he experienced multiple shades of the research cycle and a variety of approaches to collaboration and sharing in research contexts. He has also been contributing to Wikipedia and its sister projects for more than a decade and is actively engaged in increasing the interactions between the Wikimedia and research communities.


Presentations

Post-publication peer review of Jupyter Notebooks referenced in articles on PubMed Central Session

Jupyter Notebooks are a popular option for sharing data science workflows. We sought to explore best practices in this regard and chose to analyze Jupyter Notebooks referenced in PubMed Central in terms of their reproducibility and other aspects of usability (e.g., documentation and ease of reuse). The project started at a hackathon earlier this month; it is still ongoing and is documented on GitHub.

Christian Moscardi has lived in NYC for the past 6 years, having previously developed a CMS for food blogs, worked for Google, and researched and taught at Columbia. Extracurricular activities include cooking, piano, and exploring New York.

Presentations

Practical Machine Learning with Jupyter Notebooks 2-Day Training

We cover developing a machine learning pipeline, from prototyping to production, in the Jupyter platform. We look at data cleaning, feature engineering, model building/evaluation, and deployment. We dive into applications from real-world datasets. We highlight Jupyter magics, settings, and libraries to enable visualizations. We demonstrate Jupyter best practices in an industry-focused setting.

Teaching from Jupyter notebooks Session

This talk will focus on the practical solutions we have developed in our use of Jupyter notebooks for education. We will discuss some of the open-source Jupyter extensions we have written to improve the learning experience, as well as tools to clean notebooks before they are committed to version control.

Andreas Müller is a lecturer at the Data Science Institute at Columbia University and author of the O’Reilly book “Introduction to Machine Learning with Python”, which describes a practical approach to machine learning with Python and scikit-learn. He is one of the core developers of the scikit-learn machine learning library and has been co-maintaining it for several years. He is also a Software Carpentry instructor. In the past, he worked at the NYU Center for Data Science on open source and open science, and as a Machine Learning Scientist at Amazon.

His mission is to create open tools to lower the barrier of entry for machine learning applications, promote reproducible science and democratize the access to high-quality machine learning algorithms.

Presentations

Data Analysis and Machine Learning in Jupyter Tutorial

In this tutorial we will use Jupyter notebooks together with the data analysis packages pandas, seaborn and scikit-learn to explore a variety of real-world datasets. We will walk through initial assessment of data, dealing with different data types, visualization and preprocessing, and finally build predictive models for tasks including health care and housing.

Writing (and publishing) a book written in Jupyter Notebooks Session

One of the strengths of Jupyter Notebooks is combining narrative, code, and graphics. This is the ideal combination for teaching anything programming-related, which is why I chose notebooks as the tool for writing "Introduction to Machine Learning with Python". However, going from notebook to book was not easy, and this talk will describe challenges and tricks for converting notebooks for print.

Justin has a background in bioengineering and has worked as a software engineer developing tools for research and computational biology. Currently, Justin is a Solutions Engineer at Zymergen.

Presentations

Using Jupyter at the intersection of robots and industrial biology Session

Zymergen is a technology company approaching biology with an engineering and data-driven mindset. Our platform integrates robotics, software, and biology to deliver predictability and reliability during strain design and development. This session will highlight how Jupyter notebooks play an integral role in providing a shared Python environment between our software engineers and scientists.

Paco Nathan leads the Learning Group at O’Reilly Media. Known as a “player/coach” data scientist, Paco led innovative data teams building ML apps at scale for several years and more recently was evangelist for Apache Spark, Apache Mesos, and Cascading. Paco has expertise in machine learning, distributed systems, functional programming, and cloud computing with 30+ years of tech-industry experience, ranging from Bell Labs to early-stage startups. Paco is an advisor for Amplify Partners and was cited in 2015 as one of the Top 30 People in Big Data and Analytics by Innovation Enterprise. He is the author of Just Enough Math, Intro to Apache Spark, and Enterprise Data Workflows with Cascading.

Presentations

Computable Content: lessons learned Session

Lessons learned about using notebooks in media. Our project explores "computable content", combining Jupyter notebooks, video timelines, Docker containers, and HTML/JS for "last mile" presentation. What system architectures are needed at scale? How to coach authors to be effective with the medium? Can live coding augment formative assessment? What are typical barriers encountered in practice?

Humans in a loop: Jupyter notebooks as a front-end for AI pipelines at scale Session

How do people manage AI systems by interacting with them? Semi-supervised learning is hard. With machine learning pipelines running at scale, there's still a large need to keep humans in the loop. This project uses Jupyter in two ways: (1) people tune ML pipelines by reviewing analytics and adjusting parameters managed within notebooks; (2) the pipelines update those notebooks in lieu of logs.

Andrew Odewahn is the CTO of O’Reilly Media, where he helps define and create the new products, services, and business models that will help O’Reilly continue to make the transition to an increasingly digital future. The author of two books on database development, he has experience as a software developer and consultant in a number of industries, including manufacturing, pharmaceuticals, and publishing. Andrew has an MBA from New York University and a degree in computer science from the University of Alabama. He’s also thru-hiked the Appalachian Trail from Georgia to Maine.

Presentations

Thursday Opening Welcome Keynote

Program chairs Andrew Odewahn and Fernando Pérez open the first day of keynotes.

Yuvi is a programmer/devops person at Wikimedia & UC Berkeley, where he works on making it easy for people who don’t traditionally consider themselves “programmers” to do things with code. He builds tools (e.g., Quarry, PAWS, etc.) to sidestep the list of historical accidents that constitute the “command line tax” that people have to pay before doing productive things with computing. He also contributes to the Jupyter project, primarily around making it easier to deploy.

Presentations

Democratizing access to Open Data by providing Open Computational Infrastructure Session

Open data by itself is not enough: people of all backgrounds should be able to easily use it however they want. We talk about how providing free, open, and public computational infrastructure with easy access to our open data has helped many more people from diverse backgrounds make use of our data, and why other organizations providing open data should do similar things!

Deploying JupyterHub for students and researchers Tutorial

JupyterHub, a multi-user server for Jupyter notebooks, enables you to offer a notebook server to everyone in a group. When teaching a course, you can use JupyterHub to give each student access to the same resources and notebooks. There’s no need for the students to install software on their laptops. This tutorial will get you started deploying and customizing JupyterHub for your needs.

Managing a 1000+ student JupyterHub without losing your sanity Session

For our data science education program, we use Jupyter notebooks on a JupyterHub so students can learn data science without being distracted by details like installing and debugging Python packages. This talk will explain the DevOps principles we use to keep our hub (1000+ users, largest reported educational hub) stable and performant, and have all the features our instructors and students want.

Fernando Pérez is a staff scientist at Lawrence Berkeley National Laboratory and a founding investigator of the Berkeley Institute for Data Science at UC Berkeley, created in 2013. He received a PhD in particle physics from the University of Colorado at Boulder, followed by postdoctoral research in applied mathematics, developing numerical algorithms. Today, his research focuses on creating tools for modern computational research and data science across domain disciplines, with an emphasis on high-level languages, interactive and literate computing, and reproducible research. He created IPython while a graduate student in 2001 and continues to lead its evolution into Project Jupyter, now as a collaborative effort with a talented team that does all the hard work. He regularly lectures about scientific computing and data science, and is a member of the Python Software Foundation, a founding member of the NumFOCUS Foundation, and a National Academy of Science Kavli Frontiers of Science Fellow. He is the recipient of the 2012 Award for the Advancement of Free Software from the Free Software Foundation.

Presentations

Thursday Opening Welcome Keynote

Program chairs Andrew Odewahn and Fernando Pérez open the first day of keynotes.

Fabio Pliger is the Technical Lead for Anaconda Fusion and a Bokeh core developer at Continuum Analytics, where he has also worked on the XDATA DARPA program and on customer projects. He has 14+ years of experience with Python in both highly regulated enterprise settings and open source. Fabio holds a bachelor’s degree in Computer Science from the University of Verona, Italy. He has been an open source and Python advocate for many years, has spoken at many tech conferences around the world, and is a former chairman of the EuroPython Society (for 4 years), co-chair of the EuroPython conference and PyCon Italia, and co-founder of the Python Italia Association in 2007.

Presentations

Leveraging Jupyter to build an Excel-Python bridge Session

This talk will introduce how we built a commercial product on top of Jupyter to help Excel users access the capabilities of the rich Python data science ecosystem. We'll present examples and use cases from a variety of industries, the collaborative workflow between analysts and data scientists that the application has enabled, and how we leveraged the Jupyter architecture to build the product.

Cheryl Quah is a Senior Software Engineer at Bloomberg LP. She works on developing applications for financial professionals to improve their research and investment workflows.

Presentations

Industry and Open-source: Working together to drive advancements in Jupyter for quants and data scientists Session

Strong partnerships between the open-source community and industry have been driving many recent developments in Jupyter. Learn more about the results of the community's collaboration with financial service providers such as Bloomberg, including JupyterLab, bqplot and enhancements to ipywidgets that greatly enrich Jupyter as an environment for data science and quantitative financial research.

Min has been contributing to IPython and Jupyter since 2006, and full-time since 2013. His current areas of focus include the underlying infrastructure of Jupyter and deployment tools and services such as JupyterHub and nbviewer. He is now a postdoctoral fellow at Simula Research Lab in Oslo, Norway.

Presentations

Deploying JupyterHub for students and researchers Tutorial

JupyterHub, a multi-user server for Jupyter notebooks, enables you to offer a notebook server to everyone in a group. When teaching a course, you can use JupyterHub to give each student access to the same resources and notebooks. There’s no need for the students to install software on their laptops. This tutorial will get you started deploying and customizing JupyterHub for your needs.

JupyterHub: a roadmap of its recent developments and future direction Session

JupyterHub is a multi-user server for Jupyter notebooks. JupyterHub developers will discuss exciting recent additions and future plans for the project, including sharing notebooks with students and collaborators.

Bernie is a graduate student in the Information Studies program at UCLA. Her work is centered around knowledge creation in astronomy, specifically examining astronomers’ data and software pipeline practices. She also researches the use of open source software in scientific research organizations, primarily in data rich and computationally intensive fields. Prior to her current program, Bernie worked in IT (wearing many hats, some red) at several colleges and universities. She holds degrees in math, computer science, and fine arts.

Presentations

Citing the Jupyter Notebook in the Scientific Publication Process Session

Increasingly, researchers cite the Jupyter notebook as a way to share the processes involved in the act of scientific inquiry. Traditionally, researchers have cited the code and data related to a publication. The Jupyter notebook is a ‘recipe’ that explains the methods used and provides context for data, computations, and results; if shared in a public repository, it also furthers open science practices.

Megan Risdal is on the marketing team at Kaggle, the world’s largest data science community, where she focuses on driving growth of the open data publishing platform. She has master’s degrees in linguistics from the University of California, Los Angeles and North Carolina State University.

Presentations

Lessons learned from tens of thousands of Kaggle notebooks Session

Kaggle Kernels, an in-browser code execution environment which includes a version of Jupyter Notebooks, has allowed Kaggle, home of the world’s largest data science community, to flourish in new ways. From a diverse repository of user-created notebooks paired with competitions and public datasets, we share how Kernels has impacted machine learning trends, collaborative data science, and learning.

Ian Rose is a postdoctoral fellow at the Berkeley Institute for Data Science, working on the Jupyter project there. He received his PhD in Geology from UC Berkeley, where he researched the physics of the deep Earth.

Presentations

Realtime collaboration in Jupyter notebooks Session

I demonstrate recent work enabling realtime collaboration in Jupyter notebooks, covering installation, usage, and design decisions.

Philipp has a PhD in computational modeling of the visual system and works on developing open source and client specific software solutions for data management, visualization, and analysis at Continuum Analytics.

Presentations

Deploying interactive Jupyter dashboards for visualizing hundreds of millions of datapoints, in 30 lines of Python Tutorial

It can be difficult to assemble the right set of packages from the Python scientific software ecosystem to solve complex problems. This presentation will show step by step how to make and deploy a concise, fast, and fully reproducible recipe for interactive visualization of millions or billions of datapoints using very few lines of Python in a Jupyter notebook.

Patty Ryan leads prototyping engagements with partners, large and small, in the Technology Evangelism and Development team at Microsoft. She specializes in designing and operationalizing predictive models that inform strategies, focus customer outreach and increase engagement. Patty has led telemetry, analytics, UX and support in Dynamics, Azure Identity, and O365, driving innovation in customer-facing self-service and distributed analytics.

Presentations

Notebook Narratives from Industry – Inspirational Real-World Examples and Reusable Industry Notebooks Session

We describe, with video and demonstrations, four inspirational industry applications of Jupyter notebooks. These industry examples represent innovative applications of machine learning in manufacturing, retail, services, and education. We also present and share four reference industry Jupyter notebooks, along with demo datasets, for practical application to key industry value areas.

Zach Sailer is a computational biophysics graduate student in the Harms Lab at the University of Oregon. He is currently studying protein evolution, specifically the underlying physical properties of proteins that shape their evolution. He graduated from Cal Poly San Luis Obispo with a Physics degree in 2013. Between undergraduate and graduate school, he worked as part of the core development team for IPython/Jupyter.

Presentations

How Jupyter makes experimental and computational collaborations easy Session

Scientific research thrives on collaborations between computational and experimental groups who work together to solve problems using their separate expertise. This session highlights how tools like the Notebook, JupyterHub, and ipywidgets can be used to make these collaborations smoother and more effective.

Scott Sanderson is a Senior Software Engineer at Quantopian, where he is responsible for design and implementation of Quantopian’s backtesting and research APIs.

Within the Jupyter ecosystem, most of Scott’s work focuses on enhancing the extensibility of the notebook for use in large deployments.

Presentations

Building a Notebook Platform for 100,000 Users Session

This talk describes the architecture of the Quantopian Research Platform, a Jupyter Notebook deployment serving a community of over 100,000 users. We show how, using standard extension mechanisms, we provide features such as robust storage and retrieval of hundreds of gigabytes of notebooks, integration of the notebook into an existing web application, and sharing of notebooks between users.

Kaz Sato is a Staff Developer Advocate on the Cloud Platform team at Google, where he leads the developer advocacy team for machine learning and data analytics products such as TensorFlow, Cloud ML, and BigQuery. He has spoken at major events, including Google I/O 2016, Hadoop Summit 2016, Strata + Hadoop World 2016 (San Jose and NYC), ODSC East/West 2016, and Google Next 2015 (NYC and Tel Aviv). Kaz has also been leading and supporting developer communities for Google Cloud for over seven years. He is also interested in hardware and IoT and has been hosting FPGA meetups since 2013.

Presentations

Cloud Datalab: Jupyter with the power of BigQuery and TensorFlow Session

Google Cloud Datalab is a Jupyter environment from Google that seamlessly integrates BigQuery, TensorFlow, and other Google Cloud services. With its massively parallel query engine, you can easily run SQL queries from Jupyter to access terabytes of data in seconds, and train deep models with TensorFlow on tens of GPUs in the cloud, with all the usual Jupyter tools available.

Robert Schroll is a data scientist in residence at the Data Incubator. Previously, he held postdocs in Amherst, Massachusetts, and Santiago, Chile, where he realized that his favorite parts of his job were teaching and analyzing data. He made the switch to data science and has been at the Data Incubator since. Robert holds a PhD in physics from the University of Chicago.

Presentations

Machine Learning with TensorFlow and Jupyter 2-Day Training

This training will introduce TensorFlow's capabilities through its Python interface with a series of Jupyter notebooks. It will move from building machine learning algorithms piece by piece to using the higher-level abstractions provided by TensorFlow. Students will use this knowledge to build and visualize machine learning models on real-world data.

Leah Silen has been the NumFOCUS Executive Director from its beginning, working with the founding board members to write the application for NumFOCUS’s nonprofit status. Before joining NumFOCUS, she worked in the nonprofit sector as a Public Relations and Program Director with a focus on community relations and fundraising. Leah has also volunteered for and sat on the boards of several nonprofit organizations.

Presentations

Empower Scientists, Save Humanity: NumFOCUS Five years in, five hundred thousand to go Session

What do the discovery of the Higgs boson, the landing of the Philae robot, the analysis of political engagement, and the freeing of human trafficking victims have in common? NumFOCUS projects were there. We invite you to come and learn how, together, we can empower scientists and save humanity.

I served 10 years in the US Air Force prior to joining Continuum Analytics in 2015. I started working on project Jupyter in 2013, and have since written kernels for Octave, Matlab, and Scilab. I am currently working on JupyterLab, the next generation user interface for the Jupyter Notebook.

Presentations

JupyterLab Tutorial Tutorial

A walkthrough of JupyterLab as a user and as an extension author: a tour of JupyterLab's capabilities and a demonstration of creating a simple extension to the environment.

Pierce leads product data science at Bitfusion, the world’s first end-to-end deep learning and AI development and infrastructure management platform. Previously, he served as the Director of Data Science and Insights for eyeQ, which builds next-generation personalized retail displays that leverage deep learning for facial recognition. He has several years of experience interpreting sensor data, working with massive datasets, and performing deep learning on image and video data. Pierce is the co-organizer of the Austin Deep Learning Meetup and writes and speaks on deep learning and applied data science.

Presentations

Deep Learning and Elastic GPUs using Jupyter Session

Jupyter is great for deep learning development and training. Combined with GPUs, it makes for fast development and fast execution, but it doesn’t make it easy to switch from a CPU execution context to GPUs and back. We’ll look at best practices for doing deep learning with Jupyter, then show how to work with CPUs and GPUs more easily by using Elastic GPUs and quick switching between custom kernels.

Lee has worked for Microsoft in a number of roles over the last five years, acting as both an academic and a technical evangelist before becoming CTO of Academic Engagements in 2016, where he now works to engage academic institutions across the UK in the ongoing development of the Microsoft platform. Prior to this, Lee worked for the University of Manchester as Head of Information Systems, giving him extensive experience leading service and delivery teams in both academic and commercial markets.

Lee holds a PGCE in Higher Education Management from the University of Southampton and an MSc in Information Technology from the University of Liverpool.

Presentations

Notebook Narratives from Industry – Inspirational Real-World Examples and Reusable Industry Notebooks Session

We describe, with video and demonstrations, four inspirational industry applications of Jupyter notebooks. These industry examples represent innovative applications of machine learning in manufacturing, retail, services, and education. We also present and share four reference industry Jupyter notebooks, along with demo datasets, for practical application to key industry value areas.

Srinivas Sunkara is a Quant on the Quantitative Financial Research team at Bloomberg LP. He works on developing financial models, focusing on applying machine learning techniques to various problems in finance. He is one of the main developers of bqplot, a Jupyter notebook-based interactive plotting library, and contributes to other open source projects such as ipywidgets and traitlets.

Presentations

Industry and Open-source: Working together to drive advancements in Jupyter for quants and data scientists Session

Strong partnerships between the open-source community and industry have been driving many recent developments in Jupyter. Learn more about the results of the community's collaboration with financial service providers such as Bloomberg, including JupyterLab, bqplot and enhancements to ipywidgets that greatly enrich Jupyter as an environment for data science and quantitative financial research.

Research lead of the DSEP Jupyter Development Team at the Berkeley Institute for Data Science, head student instructor of UC Berkeley’s Foundations of Data Science course, studying Computer Science.

Presentations

Data Science at UC Berkeley: 2000 undergraduates, 50 majors, no command line Session

Engaging critically with data is now a required skill for students in all areas, but many traditional data science programs aren’t easily accessible to those without prior computing experience. Our data science program has 1200 students across 50 majors (ranging from history & literature to cognitive science), and we explain how we designed our pedagogy to make data science accessible to everyone.

Software engineer at UCSD and contributing scientist at the Broad Institute. Lead developer of GenePattern Notebook and open source developer on the Jupyter integration of bioinformatic tools.

Presentations

GenePattern Notebook: Jupyter for Integrative Genomics Session

GenePattern Notebook allows Jupyter to communicate with the open source GenePattern environment for integrative genomics analysis. It wraps hundreds of software tools for analyzing “omics” data types, as well as general machine learning methods. It makes these available in Jupyter through a user-friendly interface that is accessible to both programming and nonprogramming researchers.

David Taieb is the STSM for the Watson Data Platform Developer Advocacy team at IBM, leading a team of avid technologists with the mission of educating developers on the art of possible with cloud technologies. He’s passionate about building Open Source tools like the PixieDust Python Library for Jupyter Notebooks and Apache Spark, that help improve developer productivity and overall experience. David enjoys sharing his experience by speaking at conferences and meeting as many people as possible.

Presentations

Data Science made easy with Jupyter Notebooks and PixieDust Session

Whether you are an experienced data scientist or just a beginner needing to do some data science in a Jupyter notebook, this session is for you. You will learn how PixieDust, a new open source library that has already been downloaded thousands of times, speeds data exploration with interactive auto-visualizations that make creating charts easy and fun.

Andrew Therriault joined the City of Boston as its first Chief Data Officer in 2016, after serving as Director of Data Science for the Democratic National Committee. He received his PhD in political science from NYU in 2011 and completed a postdoctoral research fellowship at Vanderbilt, and more recently served as editor of “Data and Democracy: How Political Data Science is Shaping the 2016 Elections” (O’Reilly Media). Therriault leads Boston’s Analytics Team, a group that is a nationally-recognized leader in using data science to improve city operations and make progress in critical areas such as public safety, education, transportation, and health.

Presentations

Jupyter Notebooks and Production Data Science Workflows Session

Jupyter notebooks are a great tool for exploratory analysis and early development, but what do you do when it's time to move to production? A few years ago, the obvious answer was to export to a pure Python script, but now that's not the only option. We'll look at real-world cases and explore alternatives for integrating Jupyter into production workflows.

Rollin Thomas is a Big Data Architect in the Data and Analytics Services group. Prior to joining NERSC in 2015, he was a Staff Scientist in the Computational Research Division. He has worked on numerical simulations of supernova atmospheres, observation and analysis of supernova spectroscopy data, and data management for supernova cosmology experiments. Rollin has served as a member of the Nearby Supernova Factory, is a builder on the Dark Energy Survey, and is a full member of the Large Synoptic Survey Telescope Dark Energy Science Collaboration. He holds a B.S. in physics from Purdue University and a Ph.D. in astrophysics from the University of Oklahoma.

Presentations

How JupyterHub Tamed Big Science: Experiences Deploying Jupyter at a Supercomputing Center Session

Extracting scientific insights from data increasingly demands a richer, more interactive experience than high-performance computing systems have traditionally provided. We present our efforts to leverage JupyterHub to enable notebook services for data-intensive supercomputing on the Cray XC40 Cori system at the National Energy Research Scientific Computing Center (NERSC).

Marius is a ( ΞΣ ) JavaScript enthusiast at figshare, always looking to evolve and improve his code and skills. If asked, he’ll list his hobbies as “everything”, but for the sake of brevity these include binge watching TV series and movies, playing his electric guitar and trying to solve all sorts of hacking puzzles. You can’t follow him anywhere, he will follow you.

Presentations

Closing the gap between Jupyter and Academic Publishing Session

Reports of a lack of reproducibility have led funders and others to require open data and code as the outputs of research they fund. In this talk, we will describe the opportunities for Jupyter notebooks to be the final output of academic research. We will discuss how Jupyter could help disrupt the inefficiencies in cost and scale of open access academic publishing.

Chris is a software engineer at Microsoft. He works on a range of products including Azure Notebooks, Python Tools for Visual Studio, and the Azure SDK for Python. With 5+ years of experience building developer tooling and more recently, scalable web services, he hopes to share some of his experiences with you.
In Chris’ spare time he races motorcycles, hikes, and explores the Seattle brewing scene.

Presentations

Hosting Jupyter at scale Session

Have you thought about what it takes to host 500+ Jupyter users concurrently? What about managing 15,000+ users and their content? Learn how Azure Notebooks does this daily and about the challenges faced in designing and building a scalable Jupyter service.

Karlijn Willems holds a degree in Literature and Linguistics (English and Spanish) and Information Management from KU Leuven. Before joining DataCamp as a data science journalist, she worked as a junior big data developer with Hadoop, Spark and Scala. Now, she writes for the DataCamp community, focusing on data science and data science education.

Presentations

Enhancing Data Journalism with Jupyter Session

Drawing inspiration from narrative theory and design thinking, among others, we will walk through examples that illustrate how to effectively use Jupyter notebooks in the data journalism workflow.

Carol Willing is a Director of the Python Software Foundation, a Jupyter Steering Council member, and a Geek in Residence at “FabLab San Diego,” where she teaches wearable electronics and software development.

She co-organizes PyLadies San Diego and San Diego Python, contributes to open source community projects like OpenHatch, and is an active member of the MIT Enterprise Forum in San Diego. She enjoys sharing her passion for electronics, software, problem solving, and the arts.

Previously, she worked in software engineering management, product and project management, sales, and nonprofit organizations. She attended MIT and received an MS in Management with an emphasis on applied economics and high-tech marketing. She also received a BSE in Electrical Engineering from Duke University.

Presentations

Deploying JupyterHub for students and researchers Tutorial

JupyterHub, a multi-user server for Jupyter notebooks, enables you to offer a notebook server to everyone in a group. When teaching a course, you can use JupyterHub to give each student access to the same resources and notebooks. There’s no need for the students to install software on their laptops. This tutorial will get you started deploying and customizing JupyterHub for your needs.
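As a taste of what “deploying and customizing JupyterHub” involves, a deployment is typically driven by a single `jupyterhub_config.py` file. The sketch below is illustrative only; the usernames are placeholders, and the authenticator and spawner shown are the defaults you would swap out for your environment:

```python
# jupyterhub_config.py -- a minimal illustrative sketch, not a complete deployment.
c = get_config()  # noqa: F821 -- `get_config` is injected by JupyterHub at startup

# Authenticate against the local system's user accounts (JupyterHub's default).
c.JupyterHub.authenticator_class = 'jupyterhub.auth.PAMAuthenticator'

# Spawn one single-user notebook server per user as a local process (also the default).
c.JupyterHub.spawner_class = 'jupyterhub.spawner.LocalProcessSpawner'

# Restrict access to a known set of users and name an admin.
# (Older JupyterHub releases call this setting `whitelist` rather than `allowed_users`.)
c.Authenticator.allowed_users = {'student1', 'student2'}
c.Authenticator.admin_users = {'instructor'}
```

Customizing a deployment for a course is largely a matter of swapping these classes, e.g. an OAuth-based authenticator for institutional logins or a container-based spawner for isolated per-student environments.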

JupyterHub: a roadmap of its recent developments and future direction Session

JupyterHub is a multi-user server for Jupyter notebooks. JupyterHub developers will discuss exciting recent additions and future plans for the project, including sharing notebooks with students and collaborators.

Music and Jupyter: a combo for creating collaborative narratives for teaching Session

Music, as a universal language, engages and delights. By combining music with Jupyter notebooks, you can explore and teach the basics of interactive computing and data science. We'll use music21, a tool for computer-aided musicology, and Magenta, a TensorFlow project for making music using machine learning, to create collaborative narratives and publishing materials for teaching and learning.

Born and raised in Virginia, Catherine graduated with a double major in astronomy-physics and history from the University of Virginia in 2015. While there, she completed theses on the evolution of galaxies in dense galaxy groups and the rise of the modern astronomical research observatory in the United States. Upon graduation, she moved north to Cambridge, MA to pursue a PhD in astronomy at Harvard University. Now a second-year PhD student and an NSF Graduate Research Fellow, she works with Professors Alyssa Goodman and Douglas Finkbeiner on the 3D distribution of our Galaxy’s gas and dust, in pursuit of a better understanding of the spiral structure of the Milky Way. She is an avid user of Jupyter Notebooks in her research and is broadly interested in their potential to make astronomy more open-source, seamless, and accessible.

Presentations

Citing the Jupyter Notebook in the Scientific Publication Process Session

Increasingly, researchers are citing the Jupyter notebook as a way to share the processes involved in the act of scientific inquiry. Traditionally, researchers have cited the code and data related to a publication. The Jupyter notebook is a ‘recipe’ that explains the methods used and provides context for data, computations, and results; if shared in a public repository, it furthers open science practices.