Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

Schedule: R sessions

9:00am–12:30pm Tuesday, March 14, 2017
Data science & advanced analytics
Location: LL21 C/D | Level: Intermediate
Vanja Paunic (Microsoft), Robert Horton (Microsoft), Hang Zhang (Microsoft), Srini Kumar (LevaData, Inc.), Mengyue Zhao (Microsoft), John-Mark Agosta (Microsoft), Mario Inchiosa (Microsoft), Debraj GuhaThakurta (Microsoft)
Average rating: 2.50 (4 ratings)
Join in to learn how to do scalable, end-to-end data science in R on single machines as well as on Spark clusters. You'll be assigned an individual Spark cluster, preloaded with all course content and software, and use it to gain experience building, operationalizing, and consuming machine-learning models with distributed functions in R.
9:00am–12:30pm Tuesday, March 14, 2017
Stephen Elston (Quantia Analytics, LLC), Ryan Hafen (Hafen Consulting)
Average rating: 4.12 (8 ratings)
Divide and recombine techniques provide scalable methods for exploration and visualization of otherwise intractable datasets. Stephen Elston and Ryan Hafen lead a series of hands-on exercises to help you develop skills in exploration and visualization of large, complex datasets using R, Hadoop, and Spark.
1:30pm–5:00pm Tuesday, March 14, 2017
Data science & advanced analytics
Location: LL21 C/D | Level: Intermediate
John Mount (Win-Vector LLC)
Average rating: 4.83 (6 ratings)
Sparklyr provides an R interface to Spark. With sparklyr, you can manipulate Spark datasets, bring them into R for analysis and visualization, and orchestrate distributed machine learning in Spark from R with the Spark MLlib and H2O Sparkling Water libraries. John Mount demonstrates how to use sparklyr to analyze big data in Spark.
11:00am–11:40am Wednesday, March 15, 2017
Visualization & user experience
Location: 212 A-B | Level: Non-technical
Rumman Chowdhury (Accenture)
Average rating: 3.00 (2 ratings)
In collaboration with the Gray Area Foundation for the Arts and Metis Data Science, Rumman Chowdhury created an interactive data art installation designed to educate San Franciscans about their own city. Rumman discusses the challenges of using historical, predigital-era data with D3 and R to craft a compelling and educational story at the intersection of art and technology.
2:40pm–3:20pm Wednesday, March 15, 2017
Spark & beyond
Location: LL21 C/D | Level: Beginner
Edgar Ruiz (RStudio)
Average rating: 4.80 (5 ratings)
Sparklyr makes it easy and practical to analyze big data with R: you can filter and aggregate Spark DataFrames to bring data into R for analysis and visualization, and use R to orchestrate distributed machine learning in Spark using Spark ML and H2O Sparkling Water. Edgar Ruiz walks you through these features and demonstrates how to use sparklyr to create R functions that access the full Spark API.