Please note that each participant must set up a Databricks account for use during the tutorial. To ensure a prompt start, set up your account before the tutorial begins.
The class will consist of about 60% lecture and 40% hands-on labs and demos. Note that the hands-on labs in class will be taught in Scala. All students will have access to Databricks for one month after class to continue working on labs and assignments.
9:00 AM – 9:30 AM
Introduction to Wikipedia and Spark
9:30 AM – 10:30 AM
DataFrames and Spark SQL
Datasets used: Pageviews and Clickstream
10:30 AM – 11:00 AM
11:00 AM – 12:00 PM
Spark core architecture
12:00 PM – 1:00 PM
1:00 PM – 2:00 PM
Resilient distributed datasets
Dataset used: Pagecounts
2:00 PM – 2:30 PM
Dataset used: Clickstream
2:30 PM – 3:00 PM
Datasets used: Live edit stream from multiple languages
3:00 PM – 3:30 PM
3:30 PM – 3:45 PM
Guest talk: Choosing an optimal storage backend for your Spark use case
3:45 PM – 4:45 PM
Sameer Farooqui is a client services engineer at Databricks, where he works with customers on Apache Spark deployments. Sameer works with the Hadoop ecosystem, Cassandra, Couchbase, and the general NoSQL domain. Prior to Databricks, he worked globally as a freelance big data consultant and trainer, teaching big data courses. Before that, Sameer was a systems architect at Hortonworks, an emerging data platforms consultant at Accenture R&D, and an enterprise consultant for Symantec/Veritas (specializing in VCS, VVR, and SF-HA).
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.