Presented By O'Reilly and Cloudera
Make Data Work
5–7 May, 2015 • London, UK

Spark Camp

Paco Nathan (Databricks), Alex Sicoe (Elsevier)
9:00–17:00 Tuesday, 5/05/2015
Tools & Technology
Location: King's Suite - Sandringham
Average rating: 4.36 (14 ratings)

Prerequisite Knowledge

Some coding experience in Python, SQL, Java, or Scala is required, plus some familiarity with Big Data issues and concepts.

Materials or downloads needed in advance

What’s required for a laptop to use in the tutorial?

- laptop with wifi and a browser, and reasonably current hardware (2+ GB RAM)
- Mac OS X, Windows, and Linux all work fine
- make sure that you do not have corporate security controls that would prevent use of the network

All of the materials will use cloud-based notebooks, and temporary free accounts for Databricks Cloud will be provided to all participants in the tutorial, to run Apache Spark within Amazon AWS.


Spark Camp, organized by the creators of the Apache Spark project at Databricks, will be a day-long, hands-on introduction to the Spark platform including Spark Core, the Spark Shell, Spark Streaming, Spark SQL, MLlib, and more. We will start with an overview of use cases and demonstrate writing simple Spark applications. We will cover each of the main components of the Spark stack via a series of technical talks targeted at developers who are new to Spark. Intermixed with the talks will be periods of hands-on lab work. Attendees will download and use Spark on their own laptops, and learn how to deploy Spark apps in distributed big data environments including common Hadoop distributions and Mesos.
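To give a taste of the style of application covered in the labs, Spark programs chain functional transformations over a distributed dataset, as in the classic word count. The sketch below is not actual course material: it mimics Spark's flatMap → map → reduceByKey pattern using plain Python built-ins, with no cluster required, purely to illustrate the shape of the code attendees will write in PySpark.

```python
from collections import defaultdict

def word_count(lines):
    """Plain-Python sketch of Spark's word count pattern:
    flatMap (split lines into words) -> map (pair each word with 1)
    -> reduceByKey (sum the counts for each word)."""
    words = (w for line in lines for w in line.split())   # flatMap
    pairs = ((w, 1) for w in words)                       # map
    counts = defaultdict(int)
    for word, n in pairs:                                 # reduceByKey
        counts[word] += n
    return dict(counts)

lines = ["spark camp london", "spark sql and spark streaming"]
print(word_count(lines))  # 'spark' appears 3 times, every other word once
```

In PySpark the same chain would run over an RDD (e.g. `sc.textFile(...).flatMap(...).map(...).reduceByKey(...)`), with the work distributed across a cluster rather than a single generator pipeline.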



Paco Nathan

O’Reilly author (Just Enough Math and Enterprise Data Workflows with Cascading) and a “player/coach” who’s led innovative data teams building large-scale apps. Director of community evangelism for Apache Spark with Databricks, advisor to Amplify Partners. Expert in machine learning, cluster computing, and enterprise use cases for big data.


Alex Sicoe


Alex Sicoe is a software engineer at Big Data Partnership, working with clients on projects involving scalable storage and compute systems such as Apache Spark, Apache Cassandra, Apache Storm, and Apache Hadoop. He has extensive experience building data pipelines with these systems, as well as giving training courses on them. Alex is the first-ever certified Databricks trainer. Previously, he worked at CERN, building a large-scale monitoring system for the ATLAS experiment on top of Apache Cassandra.

Comments on this page are now closed.


Paco Nathan
1/05/2015 17:42 BST

Hi Simon,

Yes, that’s correct. All the materials will be accessible through a browser, with no need for local admin privileges on your laptop.

See you there!

Simon Webb
1/05/2015 12:37 BST

Hi – I’ll be attending Spark Camp on Tuesday and have a question regarding equipment. Can I read into the statement “All of the materials will be using cloud-based notebooks” that I’ll be able to complete the tutorial without local administrator privileges on my laptop? Sorry if that seems a basic question, struggling with corporate IT!

Paco Nathan
9/04/2015 18:17 BST

Definitely not Java :)

In my experience, the choice between Scala and Python depends mostly on the nature of the organizations you intend to work in…

Scala has numerous advantages, and using Spark does not require deep Scala; writing extensions to Spark likely will, however. Choosing Scala is commonly more about engineering distributed systems infrastructure.

Python is rapidly becoming the preferred language for organizations involved in Data Science work. With Spark, Python is inherently slower than Scala in terms of what the CPUs are doing, but in practice often faster in terms of what teams of people can do to surface insights from data at scale.

Deepak Vadithala
9/04/2015 12:03 BST

Thank you Paco. That will be helpful. I’ll register today and look forward to attending Spark Camp.

On a separate note: if I’m embracing Spark as a beginner, what do I need to pick up – Scala, Python, or Java? I mean, does any of them have a big advantage? I want to pick the right language since I’m just beginning anyway.

Paco Nathan
9/04/2015 3:22 BST

Hi Deepak,

Definitely. The code examples in Python or Scala can mostly be cut&paste if needed. Of course, those with more coding experience should feel free to explore in more detail. We also provide many examples in SQL, which sounds like it would be closer to your background. Hope to see you at Spark Camp!


Deepak Vadithala
8/04/2015 18:13 BST

Hi – I’m very keen to attend Spark Camp but I don’t have Python, Java, or Scala skills. I come from a DB & C# background. Do you still think I can get something from this course? Thanks in advance.


Paco Nathan
31/03/2015 20:04 BST

Hi Elina,

Having some coding experience in Python or Scala will help, and SQL as well. We do not need any advanced features in either language, and frankly there will be many examples that allow for cut&paste. Use of cloud-based notebooks is particularly good for that. The point is more about how to conceptualize typical problems and use Spark to solve them.

We will probably have R support generally available in Spark and the notebooks by the time of Strata EU. I cannot promise it, but that’s going into Spark now and will be a game-changer.

Elina Jeskanen
31/03/2015 12:05 BST

What are the prerequisites to be able to follow Spark Camp? Should one know Python, or is R sufficient? The purpose would be to learn how to use R on Spark for statistical modeling.