Python and R are the leading open source languages for data science and machine learning, but getting comfortable with both of these languages requires grappling with different syntaxes, conventions, and terminology. Pairs of ostensibly comparable packages from PyPI and CRAN often have fundamentally different interfaces, and APIs connecting Python and R to the same external systems are often incongruous. Furthermore, when data scientists attempt to scale workflows from smaller local datasets to larger distributed datasets, they must contend with additional frameworks and interfaces with idiosyncrasies beyond those in the core Python and R ecosystems. But these differences belie a set of fundamental abstractions common to these systems.
Ian Cook illuminates the underlying commonalities of these systems through intuitive explanations and straightforward demonstrations. You’ll learn how:
By exploring and running Python and R code in Cloudera Data Science Workbench (CDSW), you’ll gain familiarity with these two languages and their ecosystems of data science tools, plus SQL, Spark, and TensorFlow. By practicing on sets of equivalent data science and machine learning workflows implemented in these different languages and frameworks, you’ll overcome the obstacles to getting started with these tools.
Ian Cook is a data scientist at Cloudera and the author of several R packages, including implyr. Previously, he was a data scientist at TIBCO and a statistical software developer at AMD. Ian is a cofounder of Research Triangle Analysts, the largest data science meetup group in the Raleigh, North Carolina, area, where he lives with his wife and two young children. He holds an MS in statistics from Lehigh University and a BS in applied mathematics from Stony Brook University.
©2018, O'Reilly Media, Inc.
Comments
Great class! Highly recommend for anyone in or planning to get into data science. Good foundational class. Ian was very knowledgeable about the material and delivered the content in an easy-to-understand format.
Hi, does this course cover data manipulation (data cleaning) using Python?
Hi Yogesh, yes, the first half of this training will be about data manipulation. (The second half will be about machine learning.) I’ll show examples of data manipulation tasks in Python using pandas, and I’ll show how to extend those to larger datasets using PySpark and some other tools. I’ll do the same for R (using dplyr, sparklyr, and other packages). If you’re primarily interested in Python, you can focus more on the Python examples. I’ll also show that once you know how to do these tasks in one language or framework (even SQL), it’s surprisingly easy to do them in others when you look past the syntax differences and see the underlying commonalities.