Sep 23–26, 2019

Expand your data science and machine learning skills with Python, R, SQL, Spark, and TensorFlow

Ian Cook (Cloudera)
9:00am–5:00pm Monday, September 23–Tuesday, September 24
Location: 1A 18

Participants should plan to attend both days of this training course. Note: to attend training courses, you must be registered for a Platinum or Training pass. The pass does not include access to tutorials on Tuesday.

Advancing your career in data science requires learning new languages and frameworks—but learners face an overwhelming array of choices, each with different syntaxes, conventions, and terminology. Ian Cook simplifies the learning process by elucidating the abstractions common to these systems. Through hands-on exercises, you'll overcome obstacles to getting started using new tools.

What you'll learn, and how you can apply it

  • Understand the fundamental abstractions common to data science and machine learning systems
  • Learn how to implement equivalent workflows using Python, R, SQL, Spark, and TensorFlow and overcome the obstacles to getting started

Who is this presentation for?

  • You want to expand your data science and machine learning skills without getting overwhelmed.

Level

Beginner

Prerequisites:

  • A basic knowledge of Python or R

Hardware and/or installation requirements:

  • A laptop, or a high-resolution tablet with a keyboard, with a recently updated web browser installed. No locally installed software or packages are required; this training uses Cloudera Data Science Workbench (CDSW).

Python and R are the leading open source languages for data science and machine learning, but getting comfortable with both of these languages requires grappling with different syntaxes, conventions, and terminology. Pairs of ostensibly comparable packages from PyPI and CRAN often have fundamentally different interfaces, and APIs connecting Python and R to the same external systems are often incongruous. Furthermore, when data scientists attempt to scale workflows from smaller local datasets to larger distributed datasets, they must contend with additional frameworks and interfaces with idiosyncrasies beyond those in the core Python and R ecosystems. But these differences belie a set of fundamental abstractions common to these systems.

Ian Cook illuminates the underlying commonalities of these systems through intuitive explanations and straightforward demonstrations. You’ll learn how:

  • The two-dimensional data structures familiar to data scientists—including SQL tables, NumPy arrays, pandas DataFrames, R data frames, Spark DataFrames, and TensorFlow datasets—are all implementations of the same abstract concept, with only a few important differences.
  • Popular data manipulation interfaces—including SQL, pandas, dplyr, and the Spark DataFrame API—are all based on the same set of relational operations.
  • Machine learning workflows implemented using popular packages and frameworks—including scikit-learn, the caret package for R, Spark MLlib, and the TensorFlow Estimator API—all follow the same fundamental steps: input training data, define features and labels, train model, evaluate model, and make predictions.
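As a rough illustration of the second point above, the following sketch expresses one filter, group, aggregate, and sort workflow twice: once with pandas and once with SQL. It is not taken from the course materials. The `flights` table, its columns, and its values are hypothetical, and an in-memory SQLite database stands in for whatever SQL engine the course actually uses.

```python
import sqlite3

import pandas as pd

# Hypothetical toy data, invented for this illustration only.
flights = pd.DataFrame({
    "carrier": ["AA", "AA", "DL", "DL", "UA"],
    "distance": [500, 1500, 700, 300, 2000],
})

# pandas version: filter rows, group, aggregate, then sort.
pandas_result = (
    flights[flights["distance"] > 400]
    .groupby("carrier", as_index=False)
    .agg(mean_distance=("distance", "mean"))
    .sort_values("carrier")
    .reset_index(drop=True)
)

# SQL version of the same relational operations, run against an
# in-memory SQLite database (an assumption made for portability).
conn = sqlite3.connect(":memory:")
flights.to_sql("flights", conn, index=False)
sql_result = pd.read_sql_query(
    """
    SELECT carrier, AVG(distance) AS mean_distance
    FROM flights
    WHERE distance > 400
    GROUP BY carrier
    ORDER BY carrier
    """,
    conn,
)
conn.close()

print(pandas_result)
print(sql_result)
```

Both results contain one row per carrier with the same mean distances; the WHERE, GROUP BY, and ORDER BY clauses map onto the boolean mask, `groupby`, and `sort_values` calls respectively.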

By exploring and running Python and R code in Cloudera Data Science Workbench (CDSW), you’ll gain familiarity with these two languages and their ecosystems of data science tools, plus SQL, Spark, and TensorFlow. By practicing on sets of equivalent data science and machine learning workflows implemented using these different languages and frameworks, you’ll overcome the obstacles to getting started using these tools.
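The five machine learning workflow steps named in the list above (input training data, define features and labels, train model, evaluate model, make predictions) might look roughly like this in scikit-learn. This is a minimal sketch, not course material: the data is synthetic and noise-free (y = 3x + 2), and the choice of linear regression and the R² metric are assumptions made only to keep the example short.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# 1. Input training data (synthetic and noise-free: y = 3x + 2).
x = np.arange(20, dtype=float)
y = 3.0 * x + 2.0

# 2. Define features (a 2-D array) and labels (a 1-D array).
features = x.reshape(-1, 1)
labels = y
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0
)

# 3. Train the model on the training split.
model = LinearRegression().fit(X_train, y_train)

# 4. Evaluate the model on the held-out split.
score = r2_score(y_test, model.predict(X_test))

# 5. Make a prediction for a new, unseen input.
prediction = model.predict(np.array([[100.0]]))

print(f"R^2 on held-out data: {score:.3f}")
print(f"Prediction for x=100: {prediction[0]:.1f}")
```

The other frameworks named above use different class and function names, but the sequence of steps stays the same.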

About your instructor

Ian Cook is a data scientist at Cloudera and the author of several R packages, including implyr. Previously, he was a data scientist at TIBCO and a statistical software developer at Advanced Micro Devices. Ian is cofounder of Research Triangle Analysts, the largest data science meetup group in the Raleigh, North Carolina, area, where he lives with his wife and two young children. He holds an MS in statistics from Lehigh University and a BS in applied mathematics from Stony Brook University.

Conference registration

Get the Platinum pass or the Training pass to add this course to your package. Best Price ends June 28.
