Presented By O'Reilly and Cloudera
December 5–6, 2016: Training
December 6–8, 2016: Tutorials & Conference
9:00am-5:00pm (8h)
Data science at scale: Using Spark and Hadoop
Maojin Jiang (江毛进) (Cloudera)
Maojin Jiang demonstrates how Spark and Hadoop enable data scientists to help companies reduce costs, increase profits, improve products, retain customers, and identify new opportunities. Through in-class simulations and exercises, Maojin walks you through applying data science methods to real-world challenges in different industries, preparing you for data scientist roles in the field.
9:00am-5:00pm (8h)
To be confirmed
9:00am-5:00pm (8h)
Spark foundations: Prototyping Spark use cases on Wikipedia datasets
Andy Huang (Servian Australia)
The real power and value proposition of Apache Spark lie in building a unified use case that combines ETL, batch analytics, real-time stream analysis, machine learning, graph processing, and visualization. Andy Huang uses hands-on exercises with various Wikipedia datasets to illustrate the range of programming paradigms Spark makes possible.