Presented By O'Reilly and Cloudera
Make Data Work
Feb 17–20, 2015 • San Jose, CA

Schedule: Hadoop & Beyond sessions

Tools beyond Hadoop—such as Cassandra, Storm, Accumulo, Kafka and Spark—and how they fit in the data science toolkit.

Wednesday, February 18

9:00am–5:00pm Wednesday, 02/18/2015
Location: LL21 E/F
Paco Nathan (derwen.ai), Holden Karau (Independent), Krishna Sankar (U.S. Bank), Reza Zadeh (Matroid | Stanford), Denny Guang-yeu Lee (Microsoft), Chris Fregly (PipelineAI)
Average rating: ***..
(3.71, 17 ratings)
A full-day, hands-on tutorial introducing Apache Spark and libraries for building workflows: Spark SQL, Spark Streaming, MLlib, GraphX, etc.
9:00am–12:30pm Wednesday, 02/18/2015
Location: 210 A/E
John Russell (Cloudera), Alan Choi (Cloudera)
Average rating: *....
(1.80, 5 ratings)
Impala is the massively parallel analytic database delivering interactive performance on Hadoop. In this half-day tutorial, we'll walk you through hands-on exercises, taking you from zero to up and running with Impala.
9:00am–12:30pm Wednesday, 02/18/2015
Location: 210 B/F
Patrick McFadin (DataStax)
Average rating: ***..
(3.62, 8 ratings)
Apache Cassandra has proven to be one of the best solutions for storing and retrieving time series data. Add in Apache Spark and Kafka, and you have an amazing time series solution. We will talk data models and go through deployment and code to build a functional, real-time application. Languages used: Java, Scala.

Thursday, February 19

10:40am–11:20am Thursday, 02/19/2015
Location: 230 C
Jay Kreps (Confluent)
Average rating: ****.
(4.85, 13 ratings)
What happens if you take everything that is happening in your company (every click, every impression, every database change, every application log) and make it all available as a real-time stream of well-structured data? Companies such as LinkedIn have run this experiment, and this talk will describe how it changes the way data is thought about and put to use in an organization.
11:30am–12:10pm Thursday, 02/19/2015
Location: 230 C
Jim Scott (NVIDIA)
Average rating: ****.
(4.50, 6 ratings)
Processing data from social media streams and sensor devices in real time is becoming increasingly prevalent, and there are plenty of open source solutions to choose from. To help practitioners decide what to use when, we compare three popular Apache projects for stream processing: Apache Storm, Apache Spark, and Apache Samza.
1:30pm–2:10pm Thursday, 02/19/2015
Location: 210 C/G
Richard Williamson (Silicon Valley Data Science)
Average rating: ***..
(3.00, 3 ratings)
Getting the full value from data often requires combining stream processing on new events with large-scale historical analysis. While both of these activities are served by Spark’s execution framework, leveraging multiple persistence layers is key to enabling these use cases efficiently and extensibly.
1:30pm–2:10pm Thursday, 02/19/2015
Location: 230 C
Eric Schmidt (Google)
Average rating: ***..
(3.71, 7 ratings)
MapReduce, MillWheel, and other technologies changed the way data scientists approached data problems. Newer technologies like Spark and Cloud Dataflow deal with the complexity of stringing together MapReduce jobs and creating end-to-end programming logic from multiple steps by turning Big Data into a concrete set of executable operations. Gain insights into programming options and what comes next.
2:20pm–3:00pm Thursday, 02/19/2015
Location: 230 C
Jacques Nadeau (Dremio)
Average rating: ****.
(4.75, 8 ratings)
I will talk about how Drill achieves high performance with flexibility and ease of use. Topics include first-read planning and statistics; flexible code generation depending on workload; code optimization and planning techniques; dynamic schema subsets; advanced memory use and moving between Java and C; and making static typing appear dynamic through any-time and multi-phase planning.
4:00pm–4:40pm Thursday, 02/19/2015
Location: 230 C
Average rating: ***..
(3.88, 8 ratings)
The explosion of internal data sources, data “lakes” (e.g., Hadoop), external public data sources, and feeds from the Internet of Things is creating a tsunami of diverse data sources for enterprises to leverage. Top-down data-integration and data-scientist tools won’t scale to meet integration demands. Learn how a scalable data curation platform can help enterprises with data integration at scale.
4:50pm–5:30pm Thursday, 02/19/2015
Location: 230 C
Randy Guck (Dell Software)
Average rating: *****
(5.00, 1 rating)
Not all big data problems require big cluster solutions. Doradus OLAP compresses data into compact shards, yielding fast analytical queries using little disk even for big data sets. Learn how Doradus leverages OLAP techniques, columnar storage, and Cassandra to yield sophisticated query features while using amazingly little disk space.

Friday, February 20

10:40am–11:20am Friday, 02/20/2015
Location: 210 A/E
Kurt Brown (Netflix)
Average rating: ****.
(4.83, 18 ratings)
The Netflix Data Platform is a constantly evolving, large-scale infrastructure running in the AWS cloud. We are especially focused on performance and ease of use, with initiatives including Presto integration, Spark, and our Big Data Portal and API. This talk will dive into the various technologies we use, the motivations behind our approach, and the business benefits we get.
11:30am–12:10pm Friday, 02/20/2015
Location: 210 A/E
Costin Leau (Elastic)
Average rating: ***..
(3.00, 4 ratings)
Search is more than typing words into a box. It has evolved into the backbone for today’s analytics demands and an asset that helps businesses ask the right questions to make sense of their data. This session will highlight how a versatile, agile search and analytics platform can give shape to data and uncover the “uncommonly common” trends within it to support the right data-driven decisions.
1:30pm–2:10pm Friday, 02/20/2015
Location: 210 A/E
Jairam Ranganathan (Cloudera)
With hundreds of developers from a variety of organizations participating, Hadoop moves quickly. This talk will survey the important changes admins and users should be aware of and their impact on various use cases.
2:20pm–3:00pm Friday, 02/20/2015
Location: 210 A/E
Fangjin Yang (Imply), Vadim Ogievetsky (Imply)
Average rating: ****.
(4.00, 2 ratings)
The maturation of big data technologies has enabled numerous organizations to derive insights from vast quantities of data. The next set of challenges we face involves building applications that allow us to visualize, navigate, and interpret this data. Creating intuitive user interfaces is often a cumbersome process requiring complex data transformations, integrations, and queries.
2:20pm–3:00pm Friday, 02/20/2015
Location: 230 C
Ted Dunning (MapR)
Average rating: ***..
(3.50, 2 ratings)
YARN and Mesos are often positioned as competitors for managing datacenter resources, but in reality they work together to share those resources seamlessly. Why force IT to choose between these two great technologies when we can show you how they work in concert?