Schedule: Hadoop & Beyond sessions

Location: 212
Average rating: 3.67 (6 ratings)
SAMOA is an open-source platform for mining big data streams that runs on several distributed stream processing engines (such as S4 and Storm) and includes streaming algorithms for the most common machine learning tasks, such as classification and clustering. More info at http://samoa-project.net.
Location: 212
Claudiu Barbura (Ubix), David Talby (Atigeo)
Average rating: 3.31 (13 ratings)
A live demo of building an intelligent big data application from a web console. The underlying tools and APIs are built on top of Spark, Shark, Tachyon, Mesos, Aurora, Cassandra, and IPython. They include an ELT pipeline (ingestion and transformation), a data warehouse explorer, export to NoSQL and generated APIs, predictive model building, training, and publishing, a dashboard UI, and monitoring and instrumentation.
Location: 212
Paco Nathan (O'Reilly Media)
Average rating: 4.00 (10 ratings)
Apache Spark: Streaming case studies based on interviews with the dev teams, compared and contrasted with alternative open source projects, plus an open source example that demonstrates integration of Spark Streaming, Spark SQL, and Tachyon within a single app.
Location: 212
Tim Berglund (DataStax)
Average rating: 4.25 (4 ratings)
An exploration of Apache Spark, an in-memory analytics framework that applies functional programming paradigms to provide ad-hoc analysis for distributed databases like Cassandra.
Location: 212
Costin Leau (Elastic)
Average rating: 2.89 (9 ratings)
A practical exploration of anomaly detection (from credit card fraud to incorrectly tagged movies) by harnessing the power of the inverted index, the foundation of information retrieval systems. Use Hadoop, Elasticsearch, and Spark to gain insights into your big data and discover "what stands out" at scale.
Location: 211
Paco Nathan (O'Reilly Media), Hossein Falaki (Databricks Inc.), Aaron Davidson (Databricks), Sameer Farooqui (Databricks), Alex Sicoe (Elsevier), Olivier Girardot (Lateral Thoughts)
Average rating: 3.00 (30 ratings)
Spark Camp: An Introduction to Apache Spark with Hands-on Tutorials.
Location: 120-121
Shay Banon (Elasticsearch)
Average rating: 4.67 (3 ratings)
Thanks to technologies like NoSQL and Hadoop, organizations can store massive amounts of data that’s important to their business. The challenge now is extracting actionable insights from it. This session explores why search is the foundation for gaining value from “big data” across your business, from marketing to product to backend infrastructure, and highlights a few real-world examples.
Location: 120-121
Kathleen Ting (Cloudera)
Average rating: 4.50 (4 ratings)
YARN, the next generation of MapReduce, offers widely touted benefits in job throughput and Apache Hadoop cluster utilization. Less well known are the pitfalls littering the migration path to YARN. Learn from our extensive field experience how to avoid those pitfalls and get your YARN cluster configured right the first time.
Location: 120-121
Nick Dimiduk (Hortonworks, Inc.)
Average rating: 4.17 (6 ratings)
Your application is outgrowing its database, and you've started shopping for NoSQL options. Maybe you've adopted Hadoop into your data warehouse. You've heard HBase might be an appropriate technology, but you need to know more. This talk is for you. To understand how to use HBase, you first need to understand how it works. This talk explores the design of HBase and its critical paths to ground an understanding of its use.
Location: 120-121
Cindy Lamm (comSysto GmbH), Michael Hausenblas (Mesosphere)
Average rating: 3.89 (9 ratings)
We will describe our experience implementing a full-scale, data-driven application on a large anonymised dataset from the mobile operator Telefonica using MapReduce. Our project was unusual both in the breadth of techniques used and in the diversity of our goals. We will share our perspective based on this project and examine how upcoming technologies would have affected our efforts.
Location: 120-121
John Akred (Silicon Valley Data Science)
Average rating: 3.75 (4 ratings)
Creating a data architecture involves many moving parts. By examining the data value chain, from ingestion through to analytics, we will explain how the various parts of the Hadoop and big data ecosystem fit together to support batch, interactive, and real-time analytical workloads.
Location: 120-121
Jim Scott (MapR Technologies, Inc.)
Average rating: 4.25 (8 ratings)
Apache Mesos, Apache Hadoop, Apache Spark + custom enterprise applications: combined, this stack is greater than the sum of its parts. Running custom enterprise applications alongside Mesos, Hadoop, and Spark turns the data center into a well-oiled machine, and the combined software stack delivers unlimited flexibility for the entire data center.
Location: 127-128
Paco Nathan (O'Reilly Media), Aaron Davidson (Databricks), Sameer Farooqui (Databricks), Hossein Falaki (Databricks Inc.), Alex Sicoe (Elsevier), Olivier Girardot (Lateral Thoughts)
Average rating: 4.00 (2 ratings)
Join the Spark Team for an informal question-and-answer session.
Location: 122-123
Mark Grover (Cloudera), Gwen Shapira (Confluent), Ted Malaska (Blizzard), Jonathan Seidman (Cloudera)
Average rating: 3.77 (13 ratings)
Are you looking for a deeper understanding of how to integrate components in the Apache Hadoop ecosystem to implement data management and processing solutions? Then this tutorial is for you. We'll walk through a clickstream analytics example illustrating how to architect solutions with Apache Hadoop, and provide best practices and recommendations for using Hadoop and related tools.