Presented By O'Reilly and Cloudera
Make Data Work
31 May–1 June 2016: Training
1 June–3 June 2016: Conference
London, UK

Architecture conference sessions

13:30–17:00 Wednesday, 1/06/2016
Patrick McFadin (DataStax)
We as an industry are collecting more data every year. IoT, web, and mobile applications send torrents of bits to our data centers that have to be processed and stored, even as users expect an always-on experience—leaving little room for error. Patrick McFadin explores how successful companies do this every day using the powerful Team Apache: Apache Kafka, Spark, and Cassandra.
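The Kafka → Spark → Cassandra pattern the session covers boils down to three stages: consume events in micro-batches, aggregate them, and upsert the results into a keyed store. A minimal stdlib-only sketch of that shape (the function names and in-memory stand-ins are illustrative, not from the talk):

```python
from collections import defaultdict

def ingest(events, batch_size=3):
    """Yield micro-batches, the way Spark Streaming consumes a Kafka topic."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

def process(batch):
    """Aggregate events per device, a typical IoT rollup step."""
    counts = defaultdict(int)
    for device_id, _payload in batch:
        counts[device_id] += 1
    return counts

def store(table, counts):
    """Upsert aggregates into a keyed table, as a Cassandra sink would."""
    for device_id, n in counts.items():
        table[device_id] = table.get(device_id, 0) + n

table = {}
events = [("sensor-1", 20.5), ("sensor-2", 19.0), ("sensor-1", 21.1),
          ("sensor-1", 20.9), ("sensor-2", 18.7)]
for batch in ingest(events):
    store(table, process(batch))
```

In the real stack each stage is a separate distributed system, which is what makes the combination resilient: Kafka buffers the torrent, Spark absorbs processing spikes, and Cassandra keeps the store always-on.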
16:35–17:15 Friday, 3/06/2016
Calum Murray (Intuit)
As Intuit evolved QuickBooks, Payroll, Payments, and other product offerings into a SaaS business and an open cloud platform, it quickly became apparent that business analytics could no longer be treated as an afterthought but had to be part of the platform architecture as a first-class concern. Calum Murray outlines key design considerations when architecting analytics into your SaaS platform.
13:30–17:00 Wednesday, 1/06/2016
John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Data Whisperers)
What are the essential components of a data platform? John Akred and Stephen O'Sullivan explain how the various parts of the Hadoop and big data ecosystems fit together in production to create a data platform supporting batch, interactive, and real-time analytical workloads.
16:35–17:15 Thursday, 2/06/2016
Jennifer Wu (Cloudera)
Jennifer Wu outlines concepts for successfully running Hadoop in the cloud, provides guidance on selecting cloud storage, covers real-world examples of Hadoop deployment patterns in public clouds, and demos Cloudera Director provisioning on AWS.
9:00–12:30 Wednesday, 1/06/2016
Scott Kurth (Silicon Valley Data Science), John Akred (Silicon Valley Data Science)
Big data and data science have great potential to accelerate business, but how do you reconcile the opportunity with the sea of possible technologies? Conventional data strategy offers little to guide us, focusing more on governance than on creating new value. Scott Kurth and John Akred explain how to create a modern data strategy that powers data-driven business.
17:25–18:05 Thursday, 2/06/2016
Chad Metcalf (Docker), Seshadri Mahalingam (Trifacta)
Developers of big data applications face a unique challenge testing their software against a diverse ecosystem of data platforms that can be complex and resource intensive to deploy. Chad Metcalf and Seshadri Mahalingam explain why Docker offers a simpler model for systems by encapsulating complex dependencies and making deployment onto servers dynamic and lightweight.
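The encapsulation argument is easiest to see in a Compose file: a developer can bring up a throwaway broker for integration tests instead of provisioning shared cluster infrastructure. A hypothetical `docker-compose.yml` along those lines (image names and settings are illustrative, not from the talk):

```yaml
# Throwaway single-broker Kafka environment for integration testing.
version: "2"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:latest
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:latest
    depends_on:
      - zookeeper
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    ports:
      - "9092:9092"
```

`docker-compose up` starts the stack in seconds and `docker-compose down` discards it, so every test run gets a clean, reproducible data platform.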
9:00–12:30 Wednesday, 1/06/2016
Jonathan Seidman (Cloudera), Mark Grover (Lyft), Gwen Shapira (Confluent), Ted Malaska (Capital One)
Jonathan Seidman, Mark Grover, Gwen Shapira, and Ted Malaska walk attendees through an end-to-end case study of building a fraud detection system, providing a concrete example of how to architect and implement real-time systems.
14:55–15:35 Friday, 3/06/2016
Nicholas Turner (Incited)
Nick Turner offers an insightful view on how technology is delivering self-service analytics through visualization and enabling business users to quickly explore their data at scale.
16:35–17:15 Friday, 3/06/2016
Ignacio Manuel Mulas Viela (Ericsson), Nicolas Seyvet (Ericsson AB)
ICT systems are growing in size and complexity. Monitoring and orchestration mechanisms need to evolve and provide richer capabilities to help handle them. Ignacio Manuel Mulas Viela and Nicolas Seyvet analyze a stream of telemetry/logs in real time by following the Kappa architecture paradigm, using machine-learning algorithms to spot unexpected behaviors from an in-production cloud system.
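A key property of the Kappa architecture is that one streaming code path serves both live processing and historical reprocessing (replaying the log). A toy stand-in for the machine-learning stage, flagging telemetry values that deviate strongly from a running baseline; the class and threshold are assumptions for illustration, not the speakers' algorithm:

```python
import math

class StreamAnomalyDetector:
    """Flag values far from the running mean, using Welford's online
    algorithm so no history needs to be stored."""

    def __init__(self, threshold=3.0):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations
        self.threshold = threshold

    def observe(self, x):
        """Return True if x is an outlier, then fold it into the model."""
        is_anomaly = False
        if self.n >= 2:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                is_anomaly = True
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly

detector = StreamAnomalyDetector()
latencies = [10.0, 11.0, 10.5, 9.8, 10.2, 10.4, 250.0]  # ms; one spike
flags = [detector.observe(x) for x in latencies]
```

Because the detector is a pure function of the event stream, replaying logs through it (the Kappa reprocessing path) reproduces exactly the alerts the live system would have raised.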
17:25–18:05 Thursday, 2/06/2016
Jim Scott (NVIDIA)
Application messaging isn’t new. Solutions like message queues have been around for a long time, but newer solutions like Kafka have emerged as high-performance, high-scalability alternatives that integrate well with Hadoop. Should distributed messaging systems like Kafka be considered replacements for legacy technologies? Jim Scott answers that question by delving into architectural trade-offs.
12:05–12:45 Friday, 3/06/2016
Hellmar Becker (Hortonworks), Frank Albers (ING)
How do you connect a Hadoop cluster to an enterprise directory with 100,000+ users and centralized role and access management? Hellmar Becker and Frank Albers present ING's approach to aligning Hadoop authentication and role management with ING’s policies and architecture, discuss challenges they met on the way, and outline the solutions they found.
14:55–15:35 Thursday, 2/06/2016
Carl Steinbach (LinkedIn)
Carl Steinbach offers an overview of Dali, LinkedIn's collection of libraries, services, and development tools that are united by the common goal of providing a dataset API for Hadoop.
12:05–12:45 Thursday, 2/06/2016
David Talby (Pacific AI), Claudiu Branzan (Accenture)
David Talby and Claudiu Branzan offer a live demo of an end-to-end system that makes nontrivial clinical inferences from free-text patient records. Infrastructure components include Kafka, Spark Streaming, Spark, Titan, and Elasticsearch; data science components include custom UIMA annotators, curated taxonomies, machine-learned dynamic ontologies, and real-time inferencing.
12:05–12:45 Thursday, 2/06/2016
Vida Ha (Databricks), Prakash Chockalingam (Databricks)
So you’ve successfully tackled big data. Now let Vida Ha and Prakash Chockalingam help you take it real time and conquer fast data. Vida and Prakash cover the most common use cases for streaming, important streaming design patterns, and best practices for implementing them to maximize the throughput and performance of your system using Spark Streaming.
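One of the fundamental streaming design patterns is the tumbling window: group events into fixed, non-overlapping time buckets and aggregate per key. A plain-Python sketch of the idea that Spark Streaming expresses with its window operations (the event data and function name are hypothetical):

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds):
    """Count occurrences per key within fixed, non-overlapping windows."""
    windows = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        window_start = (ts // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return {w: dict(counts) for w, counts in sorted(windows.items())}

clicks = [(0, "home"), (3, "cart"), (7, "home"), (12, "home"), (14, "cart")]
result = tumbling_window_counts(clicks, window_seconds=10)
# [0,10) -> {"home": 2, "cart": 1}; [10,20) -> {"home": 1, "cart": 1}
```

In a real Spark Streaming job the same grouping happens continuously over an unbounded stream, and throughput tuning largely comes down to choosing window and batch intervals that match the arrival rate.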
12:05–12:45 Friday, 3/06/2016
Xavier Léauté (Confluent)
Xavier Léauté shares his experience scaling Metamarkets' real-time processing to over 3 million events per second and the challenges encountered along the way. Built entirely on open source, the stack performs streaming joins using Kafka and Samza and feeds into Druid, serving 1 million interactive queries per day.
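The core of a streaming join is buffering whichever side of a matching pair arrives first. A toy version of what Kafka + Samza do before enriched records land in Druid, here joining bid and impression events on an auction ID (the event shapes are assumptions for illustration):

```python
def streaming_join(events):
    """Join 'bid' and 'impression' events on auction_id as they arrive,
    buffering unmatched events until their counterpart shows up."""
    bids, impressions, joined = {}, {}, []
    for kind, auction_id, payload in events:
        if kind == "bid":
            if auction_id in impressions:
                joined.append((auction_id, payload, impressions.pop(auction_id)))
            else:
                bids[auction_id] = payload
        else:  # impression
            if auction_id in bids:
                joined.append((auction_id, bids.pop(auction_id), payload))
            else:
                impressions[auction_id] = payload
    return joined

stream = [("bid", "a1", 0.50), ("impression", "a2", "banner"),
          ("impression", "a1", "sidebar"), ("bid", "a2", 0.75)]
pairs = streaming_join(stream)
```

At millions of events per second the hard parts are the ones this sketch ignores: partitioning the buffers across machines, bounding their size, and expiring events whose counterpart never arrives.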
11:15–11:55 Friday, 3/06/2016
Steven Noels (NGDATA)
Steven Noels explains how to prime the Hadoop ecosystem for real-time data analysis and actionability, examining ways to evolve from batch processing to real-time stream-based processing.