Presented By O'Reilly and Cloudera
Make Data Work
5–7 May, 2015 • London, UK

Hadoop & Beyond conference sessions

Tools beyond Hadoop (such as Cassandra, Storm, Accumulo, Kafka, and Spark) and how they fit into the data science toolkit.

Wednesday, 06 May

10:55–11:35 Wednesday, 6/05/2015
Location: Buckingham Room - Palace Suite
Xuefu Zhang (Cloudera), Rui Li (Intel)
Average rating: 3.00 (5 ratings)
This presentation covers the motivation, design principles, architecture, challenges, and current status of the community project to make Spark a new back-end processing engine for Hive.
11:45–12:25 Wednesday, 6/05/2015
Location: Buckingham Room - Palace Suite
Patrick Wendell (Databricks)
Average rating: 4.73 (15 ratings)
Apache Spark is a popular engine for fast and efficient data processing. This talk will cover recent feature additions to Spark, such as elastic scaling support, new algorithms in MLlib, and the Spark SQL Data Sources API. It will also outline the Spark roadmap for the coming months. Since this talk is not until May, specific roadmap details will be determined closer to the talk itself.
13:45–14:25 Wednesday, 6/05/2015
Location: Buckingham Room - Palace Suite
Martin Kleppmann (Independent)
Average rating: 4.71 (14 ratings)
Data is only useful if you can process it, analyse it, and create valuable products from it. If you have an idea for a new data-driven product, how long does it take you to get it into production? In this talk, we'll discuss Apache Kafka and Samza, open-source tools created at LinkedIn to help teams implement data products and ship them to production rapidly.
14:35–15:15 Wednesday, 6/05/2015
Location: Buckingham Room - Palace Suite
Jim Scott (MapR Technologies, Inc.)
Average rating: 2.33 (6 ratings)
The Zeta Architecture is an enterprise architecture that describes how to move your business to the next generation. It combines a data-center-wide resource manager, a rock-solid distributed file system, containerization, a big data processing platform, stream processing, a real-time data store, independent application architectures, and custom enterprise applications.
16:15–16:55 Wednesday, 6/05/2015
Location: Buckingham Room - Palace Suite
Cory O'Connor (Google), Emre Baran (Qubit)
Average rating: 4.00 (6 ratings)
This presentation will provide a brief technical overview of Google Bigtable and the global problems we solve internally at Google with this revolutionary architecture. We'll discuss some of our innovations since the original paper was released, what we’ve been working on with HBase, and include announcements on where we're headed next!
17:05–17:45 Wednesday, 6/05/2015
Location: Buckingham Room - Palace Suite
Jacques Nadeau (Dremio)
Average rating: 4.67 (9 ratings)
A technical overview of how Apache Drill enables high-performance analysis of complex and dynamic data. The talk will discuss the role of self-describing data in a modern distributed data lake, the requirement for adaptive execution, and how to expose dynamic schemas using SQL.

Thursday, 07 May

10:55–11:35 Thursday, 7/05/2015
Location: Buckingham Room - Palace Suite
Costin Leau (Elastic)
Average rating: 4.33 (3 ratings)
Search is more than typing words into a box. It has evolved into the backbone of today’s analytics demands and is an asset that helps businesses ask the right questions in order to make sense of their data. Versatile, agile search and analytics can uncover the “uncommonly common” trends within that data, giving businesses real-time insights and setting them up to make the right data-driven decisions.
11:45–12:25 Thursday, 7/05/2015
Location: Buckingham Room - Palace Suite
Anirudh Koul (Microsoft), Shashank Singh (Microsoft)
Average rating: 4.78 (9 ratings)
We share lessons learned the hard way while building a real-time search, analytics, and trends pipeline over social media posts, using Elasticsearch, Azure, and Spark Streaming. Topics include building an end-to-end pipeline with stream processing, applying natural language processing tools, scaling and performance tuning, search relevance, and applications such as TV trends.
13:45–14:25 Thursday, 7/05/2015
Location: Buckingham Room - Palace Suite
Oscar Méndez (Stratio), David Morales (Stratio)
Average rating: 3.09 (11 ratings)
Businesses of all kinds now need to handle real-time information in order to deliver their core services successfully. SPARKTA was built to meet this demand: with it, real-time analysis is available for every use case with no coding required. SPARKTA is open source, easy to deploy, fast, scalable, and fault-tolerant.
14:35–15:15 Thursday, 7/05/2015
Location: Buckingham Room - Palace Suite
Dean Wampler (Lightbend)
Average rating: 4.70 (10 ratings)
Spark is often seen as a replacement for MapReduce in Hadoop systems, but Spark clusters can also be deployed and managed by Mesos. This talk explains how to use Mesos for Spark applications. Using example applications, we'll examine the pros and cons of using Mesos vs. Hadoop YARN as a data platform and discuss practical issues when running Spark on Mesos.
16:15–16:55 Thursday, 7/05/2015
Location: Buckingham Room - Palace Suite
Tyler Akidau (Google)
Average rating: 4.14 (7 ratings)
Learn what it takes to ditch your Big Data batch pipelines and go all-streaming-all-the-time, without compromising latency, correctness, or the flexibility to deal with changes in upstream data.
17:05–17:45 Thursday, 7/05/2015
Location: Buckingham Room - Palace Suite
Stephan Ewen (data Artisans)
Average rating: 4.20 (5 ratings)
Apache Flink is a data analysis engine designed to match Hadoop in reliability and Spark in performance. Flink introduces novel features such as cost-based optimization for Java and Scala programs, native iterative processing, unification of streaming and batch processing, and efficient hybrid in-memory/on-disk processing. Flink has more than 70 contributors from industry and academia.