Brought to you by NumFOCUS Foundation and O’Reilly Media Inc.
The official Jupyter Conference
August 22-23, 2017: Training
August 23-25, 2017: Tutorials & Conference
New York, NY

Data analysis in Jupyter notebooks with SQL, Python, and R

Laurent Gautier (Verily)
9:00am–12:30pm Wednesday, August 23, 2017
Usage and application
Location: Concourse F Level: Intermediate

Who is this presentation for?

  • Data science practitioners or anyone working with data (data journalists, epidemiologists, business analytics specialists, computational biologists, etc.) who want to combine Python, R, and SQL (RDBMS and Spark tables) to perform data analysis and make graphics

Prerequisite knowledge

  • A working knowledge of Python and the Jupyter Notebook

Materials or downloads needed in advance

  • A laptop (at least dual core with 8 GB of RAM) with Docker installed
  • The base container for the workshop should preferably be downloaded ahead of time. With Docker installed, the command is:

    docker pull rpy2/polyglot:latest

What you'll learn

  • Learn how combining Python with SQL and R can significantly expand your abilities to perform data analysis and why the Jupyter Notebook is a very good working environment for Python-centered polyglot data analysis

Description

Python is popular for data analysis, but restricting yourself to Python means missing a wealth of libraries or capabilities available in R or SQL. Laurent Gautier walks you through a pragmatic, reasonable, and good-looking polyglot approach, all thanks to R visualizations. Along the way, you’ll learn why the Jupyter Notebook is a very good working environment for Python-centered polyglot data analysis.
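
As a rough illustration of the Python-plus-SQL half of this idea, here is a minimal sketch that is not taken from the workshop materials. It assumes only Python's standard-library sqlite3 module; the table name, column names, data, and the mean_by_state helper are all hypothetical. The aggregation is written in SQL, while Python handles loading the data and collecting the result.

```python
import sqlite3

# Hypothetical toy data (state, year, amount); not from the workshop.
rows = [
    ("NY", 2016, 12.5),
    ("NY", 2017, 14.0),
    ("CA", 2016, 20.0),
    ("CA", 2017, 23.5),
]


def mean_by_state(records):
    """Load records into an in-memory SQLite table and aggregate with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (state TEXT, year INTEGER, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
        # The grouping and averaging are expressed in SQL, not in Python.
        cursor = conn.execute(
            "SELECT state, AVG(amount) FROM sales GROUP BY state ORDER BY state"
        )
        return dict(cursor.fetchall())
    finally:
        conn.close()


result = mean_by_state(rows)
print(result)
```

In a notebook, a result like this could then be handed to a plotting layer, in Python or in R; that step is left out here because it depends on the environment provided by the workshop container.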

Laurent Gautier

Verily

Laurent Gautier is a scientific research lead at Verily Life Sciences (fka Google Life Sciences). Laurent’s work focuses on data science, visualization, machine learning, data mining, and prototyping software to understand molecular, cellular, and clinical data. He is the author of popular open source tools in bioinformatics and statistical programming for applications in healthcare, life sciences, and beyond and has contributed to or led a number of open source projects, including Bioconductor, affy, and rpy2.

Comments

Laurent Gautier | SCIENCE AND ENGINEERING
08/22/2017 3:28pm EDT

The error message points out two possible causes for the error, and I agree with both.

Feyzi Bagirov | FACULTY OF ANALYTICS
08/21/2017 2:04pm EDT

I am getting an error when running the docker command indicated above. The error is: "Post http:///var/run/docker.sock/v1.20/images/create?fromImage=rpy2%2Fpolyglot%3Alatest: dial unix /var/run/docker.sock: no such file or directory.

  • Are you trying to connect to a TLS-enabled daemon without TLS?
  • Is your docker daemon up and running?"
Laurent Gautier | SCIENCE AND ENGINEERING
08/20/2017 3:14pm EDT

Thanks for the note, Tyler. This is a good resource. Docker will help us get a turnkey environment that is expected to work on everyone’s system (from a laptop to a VM in the cloud).

Tyler Pugliese | DATA SCIENTIST
08/20/2017 12:29pm EDT

Hi,

I’ve installed docker and found that the Docker docs on getting started are invaluable: https://docs.docker.com/get-started/

I recommend them to anyone (who, like me, didn’t know what Docker was before this).