Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

Just enough Scala for Spark

Dean Wampler (Lightbend)
9:00–12:30 Tuesday, 23 May 2017
Spark & beyond
Location: Capital Suite 4
Level: Intermediate
Average rating: 4.50 (2 ratings)

Who is this presentation for?

  • Data developers and data scientists interested in using Scala for Spark

Prerequisite knowledge

  • A basic familiarity with Spark and Java

Materials or downloads needed in advance

It's *VERY* important that you set up the tutorial material before the tutorial. Unfortunately, with just half a day and a large crowd, Dean will not be able to help anyone with problems during the session.

Please clone or download the following GitHub repo:

  • https://github.com/deanwampler/JustEnoughScalaForSpark
  • Then, follow the setup instructions in the README.md file (https://github.com/deanwampler/JustEnoughScalaForSpark/blob/master/README.md).
  • If you have problems, post an issue to the GitHub repo (https://github.com/deanwampler/JustEnoughScalaForSpark/issues) or ask for help on the project's Gitter channel (https://gitter.im/deanwampler/JustEnoughScalaForSpark).
What you'll learn

  • Discover why Scala is an ideal programming language for data engineers using Spark
  • Learn the core features of Scala necessary to write Spark code
  • Pick up tips and tricks for effective Scala

Description

Apache Spark is written in Scala. Although Spark provides a Java API, many data engineers are adopting Scala, since it's the "native" language for Spark and because Spark code written in Scala is much more concise than comparable Java code. Most data scientists, however, continue to use Python and R.
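The conciseness point can be illustrated with plain Scala collections, whose combinators (`flatMap`, `groupBy`, `map`) carry over, with the same names, to Spark's RDD and Dataset APIs. This is a minimal sketch in ordinary Scala, not Spark code; the `WordCount` object and sample lines are invented for illustration:

```scala
// Word count over plain Scala collections. The same chained
// transformations would be written nearly identically against a
// Spark RDD, whereas the comparable Java (pre-lambda) version runs
// to dozens of lines of anonymous inner classes.
object WordCount {
  def count(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split("\\s+")) // split each line into words
      .groupBy(identity)                   // word -> all its occurrences
      .map { case (word, occurrences) => (word, occurrences.size) }

  def main(args: Array[String]): Unit = {
    val lines = Seq("to be or not to be", "that is the question")
    // Print the counts, most frequent first.
    count(lines).toSeq.sortBy(-_._2).foreach(println)
  }
}
```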

If you want to learn Scala for Spark, this is the tutorial for you. Dean Wampler offers an overview of the core features of Scala you need to use Spark effectively, using hands-on exercises with the Spark APIs. You’ll learn the most important Scala syntax, idioms, and APIs for Spark development.

Topics include:

  • Classes, methods, and functions
  • Immutable versus mutable values
  • Type inference
  • Pattern matching
  • Scala collections and the common operations on them (the basis of the RDD API)
  • Other Scala types like case classes, tuples, and options
  • Domain-specific languages in Scala
  • Effective use of the Spark shell (Scala interpreter)
  • Common mistakes (e.g., serialization errors) and how to avoid them
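A few of the topics above — case classes, pattern matching, and `Option` — can be sketched in a few lines of plain Scala. The `Reading` case class, the `describe` method, and the sample data here are invented for illustration, not taken from the tutorial material:

```scala
// A case class gives you an immutable record type with pattern-matching
// support for free; Option makes missing values explicit instead of
// relying on null.
case class Reading(sensor: String, value: Option[Double])

object Demo {
  // Pattern matching destructures the case class and the Option in one step.
  def describe(r: Reading): String = r match {
    case Reading(name, Some(v)) if v > 100.0 => s"$name: high ($v)"
    case Reading(name, Some(v))              => s"$name: ok ($v)"
    case Reading(name, None)                 => s"$name: missing"
  }

  def main(args: Array[String]): Unit = {
    val readings = Seq(
      Reading("t1", Some(120.5)),
      Reading("t2", Some(21.0)),
      Reading("t3", None))
    readings.map(describe).foreach(println)
  }
}
```

The same destructuring style is common in Spark code, for example when matching on the tuples produced by key-value transformations.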

Dean Wampler

Lightbend

Dean Wampler is the vice president of fast data engineering at Lightbend, where he leads the creation of the Lightbend Fast Data Platform, a streaming data platform built on the Lightbend Reactive Platform, Kafka, Spark, Flink, and Mesosphere DC/OS. Dean is the author of Programming Scala and Functional Programming for Java Developers and the coauthor of Programming Hive, all from O’Reilly. He is a contributor to several open source projects. He’s also the co-organizer of several conferences around the world and several user groups in Chicago.
