Brought to you by NumFOCUS Foundation and O’Reilly Media Inc.
The official Jupyter Conference
August 22-23, 2017: Training
August 23-25, 2017: Tutorials & Conference
New York, NY

Sponsored by:
Bloomberg

Jupyter Poster Session

5:00pm–7:00pm Wednesday, August 23, 2017
Location: Trianon Ballroom

Posters will be presented Wednesday evening in a friendly setting where attendees can mingle. This session is an opportunity for you to discuss your Jupyter work one-on-one with other attendees and presenters.

Moderated by: Roy Hyunjin Han
Jupyter Notebook is already great, but did you know that you can use it to prototype computational web applications? In this whirlwind tour, we will introduce you to several favorite open source plugins that we have been using for the past few years (many of which we have developed) that let us rapidly deploy tools for processing tables, images, spatial data, satellite images, sounds and video.
Moderated by: Ashwin Trikuta Srinath, Linh Ngo, & Jeff Denton
This talk will be about how to build a JupyterHub setup with a rich set of features for interactive HPC, and solutions to practical problems encountered in integrating JupyterHub with other components of HPC systems. We will present several examples of how researchers at our institute are using JupyterHub, and demonstrate the different parts of our setup that enable their applications.
Moderated by: Feyzi Bagirov & Tatiana Yarmola
Poor data quality frequently invalidates data analysis, especially analysis performed in Excel, the most commonplace business intelligence tool, on data that has undergone transformations, imputations, and manual manipulation. In this talk we will use Pandas to walk through an example of Excel data analysis and illustrate several common pitfalls that render such analysis invalid.
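As a rough illustration of the kind of pitfall the talk refers to (not an example from the talk itself), consider a hypothetical workbook sales.xlsx whose numeric column contains stray text; a short Pandas sketch can surface the problem before any aggregate is trusted.

import pandas as pd

# Hypothetical workbook; a column with a stray "N/A" string is read as object dtype.
df = pd.read_excel("sales.xlsx")
print(df["revenue"].dtype)

# Coerce explicitly and count what was lost before trusting any summary statistic.
revenue = pd.to_numeric(df["revenue"], errors="coerce")
print(revenue.isna().sum(), "cells could not be parsed as numbers")
print(revenue.sum())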
Moderated by: Luciano Resende & Jakob Odersky
Data scientists are becoming a necessity for every company in today's data-centric world, and with them comes the requirement to provide a flexible and interactive analytics platform. This session describes our experience and best practices in putting together an analytics platform based on Jupyter Notebooks, Apache Toree, and Apache Spark.
Moderated by: Paco Nathan
Paco Nathan shares lessons learned about using notebooks in media and explores computable content that combines Jupyter notebooks, video timelines, Docker containers, and HTML/JS for "last mile" presentation, covering system architectures, how to coach authors to be effective with the medium, whether live coding can augment formative assessment, and the typical barriers encountered in practice.
Moderated by: Faras Sadek and Demba Ba
At Harvard, we deployed JupyterHub on Amazon AWS for two classes in the School of Engineering. The Signal Processing class used a Docker-based JupyterHub, where each user was provisioned with a notebook running in its own Docker container. For the Decision Theory class, we redesigned the deployment to use a dedicated EC2 instance per user's notebook, providing better scalability, reliability, and cost efficiency.
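A minimal sketch of the Docker-per-user pattern described above, assuming the dockerspawner package is installed (this is an illustration, not the actual Harvard configuration):

# jupyterhub_config.py -- minimal Docker-per-user sketch
c = get_config()  # provided by JupyterHub when it loads this file

c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"
c.DockerSpawner.image = "jupyter/scipy-notebook"   # image used for each user's notebook server
c.DockerSpawner.network_name = "jupyterhub"        # shared Docker network for hub and user containers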
Moderated by: Diogo Munaro Vieira & Felipe Ferreira
At Globo.com all of our data scientists use Jupyter Notebooks for analysis. Because they work on a shared data science platform, these analyses require additional security. We will show how JupyterHub was adapted to authenticate against the company's OAuth2 solution and how a user action tracking system was built on Jupyter notebook hooks.
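A minimal sketch of the notebook-hook side of such a tracking system (the OAuth2 integration and the actual Globo.com logging backend are not shown; the print call stands in for whatever audit sink is used):

# jupyter_notebook_config.py -- sketch of auditing saves via a pre-save hook
import datetime, getpass

def log_save(model, path, contents_manager, **kwargs):
    # Called by the notebook server just before a file is written to disk.
    print(datetime.datetime.utcnow().isoformat(), getpass.getuser(), "saved", path)

c = get_config()
c.FileContentsManager.pre_save_hook = log_save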
Moderated by: Elijah Philpotts
3Blades has developed an innovative artificial intelligence agent to enhance productivity for data scientists when using Jupyter Notebooks for Exploratory Data Analysis (EDA).
Moderated by: Joy Chakraborty
How to run a Kerberos-secured, multi-user Jupyter notebook service (JupyterHub) integrated with a Spark/YARN cluster, and how to use Docker to set up such a complex integrated platform quickly and with fewer difficulties.
Moderated by: Dave Goodsmith, Meredith Lee, Rene Baston, and Edgar Fuller
A demonstration station will feature donated cloud computing resources from DataScience.com, Amazon Web Services, GoogleCloud, Satori, and other partners in live executable Jupyter-based notebooks.
Moderated by: Steven Anton
Sometimes data scientists need to work directly with highly sensitive data, such as personally identifiable information or health records. Jupyter notebooks provide a great platform for exploration, but don't meet strict security standards. We will walk through a solution that our data science team uses to harden security by seamlessly encrypting notebooks at rest.
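One way to sketch the encrypt-at-rest idea, assuming the cryptography package (this is an illustration, not the presenter's actual solution; a matching decrypt-on-load hook would also be needed):

# Sketch: encrypt each notebook file on disk right after it is saved.
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # in practice, load the key from a secrets manager; never hard-code it
fernet = Fernet(key)

def encrypt_notebook(model, os_path, contents_manager, **kwargs):
    # post_save_hook: rewrite the .ipynb on disk as ciphertext
    if os_path.endswith(".ipynb"):
        with open(os_path, "rb") as f:
            ciphertext = fernet.encrypt(f.read())
        with open(os_path, "wb") as f:
            f.write(ciphertext)

c = get_config()
c.FileContentsManager.post_save_hook = encrypt_notebook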
Moderated by: Andrey Petrin
Big Data analytics is already outdated at Yandex: we need insights and action items from our logs and databases. In this new environment, speed of prototyping comes first. I will give an overview of how we use Python and Jupyter to create prototypes that amaze and inspire real product creation.
Moderated by: en zyme & Zelda Kohn
Real estate transactions are geographically sparse and rare, and often involve both listing and selling agents. Many factors determine price, yet most models rely on physical parameters. Using Jupyter/Python geographic and data tools, we will discover "farms" and their pricing characteristics. Farms (found via clustering) can affect either the listing or the sale price, both of which are negotiated.
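A toy sketch of the clustering idea, assuming scikit-learn and a hypothetical listings.csv with coordinates and sale prices (not the presenters' actual pipeline):

import pandas as pd
from sklearn.cluster import DBSCAN

listings = pd.read_csv("listings.csv")                      # hypothetical input
coords = listings[["latitude", "longitude"]].to_numpy()

# DBSCAN suits sparse, irregular geographic data: no fixed number of clusters up front.
listings["farm"] = DBSCAN(eps=0.01, min_samples=5).fit_predict(coords)

# Compare negotiated prices within each discovered farm.
print(listings.groupby("farm")["sale_price"].median())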
Moderated by: Douglas Liming
Ready to take a deeper look at how the Jupyter platform is having a widespread impact on analytics? Learn how a large health organization was able to fit SAS into their open ecosystem. Thanks to the Jupyter platform, you no longer have to choose between analytics languages like Python, R, or SAS; a single, unified open analytics platform supported by Jupyter empowers you to have it all.
Moderated by: Chris Rawles
The availability of data combined with new analytical tools has fundamentally transformed the sports industry. In this talk I show how to use the Jupyter Notebook with powerful analytical tools such as Apache Spark and visualization libraries like Matplotlib and Seaborn to support sports data science.
Moderated by: Patrick Huck & Shreyas Cholia
The open Materials Project (MP, https://materialsproject.org), which supports the design of novel materials, now allows users to contribute and share new theoretical and experimental materials data via the MPContribs tool. MPContribs uses Jupyter and JupyterHub at every layer and is an important step in MP’s effort to deliver a next-generation collaborative platform for Materials (Data) Science.
Moderated by: Harold Mitchell
Today's healthcare and research professionals have a wealth of precious historical data in need of predictive outcomes. Wouldn't it be nice to carry around a web-based notebook with built-in algorithms to perform predictions? Even better, those built-in algorithms would be built and maintained by you.
Moderated by: Jacob Frias Koehler
Here, we present an undergraduate mathematics curriculum that leverages the Jupyter Notebook and JupyterHub to deliver course content and serve as the computational platform for students. These materials are motivated by introductory classes typically labeled Quantitative Reasoning, Precalculus, and Calculus I.
Moderated by: Laxmikanth Malladi
Spinning up Jupyter on AWS is easy, with many references available for deploying on EC2 and EMR. This session provides additional configurations and patterns for enterprises to govern, track, and audit usage on AWS.
Moderated by: Jeffrey Denton
It is a match made in the cloud. By marrying JupyterHub and CloudyCluster, users gain access to scalable Jupyter without the headache and overhead of operations. Learn how CloudyCluster can scale JupyterHub to support thousands of users and thousands of computers, all from your smartphone, tablet, or desktop device.
Moderated by: David Visontai
The advent of many interdisciplinary research areas and the cooperation of different scientific fields demand computational systems that allow for efficient collaboration. Kooplex, our highly integrated system incorporating the advantages of Jupyter notebooks, public dashboards, version control and data sharing serves as a basis for different projects in fields ranging from Medicine to Physics.
Moderated by: Bill Walrond
In this presentation, Kevin Rasmussen, solution architect at Caserta Concepts, discusses why notebooks aren’t just for data scientists anymore. Drawing on a current project with one of the most respected newspapers in the country, he will go into detail about how to put data engineering into production with notebooks.
Moderated by: Jonathan Whitmore
Project Jupyter contains tools that are perfect for many data science tasks, including rapid iteration for data munging, visualizing, and creating a beautiful presentation of results. The same tools that give power to individual data scientists can prove challenging to integrate in a team setting. This talk will emphasize overall best practices for data science team productivity.
Moderated by: David P. Sanders (Department of Physics, Faculty of Sciences, National University of Mexico)
An overview of using Julia with the Jupyter notebook, showing how the flexibility of the language is reflected in the notebook environment.
Moderated by: Trevor Lyon, Matt McKay, and Spencer Lyon
Introduction to the QuantEcon Open Notebook Archive, a community-driven home for sharing and discovering Jupyter notebooks.
Moderated by: Matt Henderson and Shreyas Cholia
Scientists increasingly rely on large-scale computation and data analysis, with applications ranging from designing better batteries to understanding our universe. In this talk we’ll describe how scientists could greatly benefit from a platform using the core Jupyter architecture of notebooks and kernels with large-scale HPC and data analysis systems to enable interactive supercomputing.
Moderated by: Majid Khorrami & Laura Kahn
What if decision makers could use data science techniques to predict how much economic aid they would receive each year? Our proposal will show how we did just that and used data for social good.
Moderated by: Marius van Niekerk
Spylon-kernel is a pure Python Jupyter metakernel that gives Python and Scala users an easy kernel to use with Apache Spark.
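Getting started is roughly as follows, a sketch based on the project's documented usage (commands and magics may differ by version):

# Terminal (one-time setup):
#   pip install spylon-kernel
#   python -m spylon_kernel install
#
# First notebook cell, to configure Spark before the session starts:
#   %%init_spark
#   launcher.num_executors = 4
#   launcher.executor_cores = 2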
Moderated by: Joshua Cook
This teaching session will take participants through using Docker's suite of tools, the NumPy/SciPy ecosystem, and the Jupyter project as a feature-rich programming interface to build powerful systems for performing rich analysis and transformation on data sets of any size.
The DOE Systems Biology Knowledgebase (KBase) is an open source project that enables biological scientists to create, execute, collaborate on and share reproducible analysis workflows. KBase's Narrative Interface, built on the Jupyter Notebook, is the front end to a scalable object store, an execution engine, a distributed compute cluster, and a library of analysis tools packaged as Docker images.
Moderated by: Timothy Dobbins
SQLCell is a magic function that executes raw, parallel, parameterized SQL queries with the ability to accept Python variables as parameters, switch between engines with a button click, run outside of a transaction block, and produce an intuitive D3.js query-plan graph that highlights slow points in a query, all while concurrently running Python code. And much more.
Moderated by: Jason Kuruzovich
FreeCodeCamp.com is an online learning platform for coding that has figured out how to use distributed content creation to power a learning community. This talk will discuss FreeCodeCamp and detail my current efforts to start a similar model for analytics with AnalyticsDojo.com, including content, technical, and community-related opportunities and challenges.
Researchers, data scientists, and professionals spend their days doing cutting-edge work. But when it comes time to write up and disseminate that work, they’re often still using models and tools that haven’t changed much in decades, if not centuries.