Sparklyr, a free and open source package developed by RStudio in conjunction with IBM, Cloudera, and H2O, makes it easy and practical to analyze big data with R. The package provides an R interface to Spark’s distributed machine-learning algorithms and much more. With sparklyr, you can:
Edgar Ruiz walks you through these features and demonstrates how to use sparklyr to create R functions that access the full Spark API.
Edgar Ruiz is a solutions engineer at RStudio with a background in deploying enterprise reporting and business intelligence solutions. He is the author of multiple articles and blog posts on analytics and on server infrastructure for data science. Recently, Edgar authored the “Data Science on Spark using sparklyr” cheat sheet.