Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Schedule: Architecture sessions

9:00am - 5:00pm Monday, September 25 & Tuesday, September 26
Data engineering
Location: 1A 04/05
SOLD OUT
Bruce Martin (Cloudera)
Average rating: *....
(1.50, 2 ratings)
Bruce Martin leads you through designing and architecting solutions to a challenging business problem. You'll explore big data application architecture concepts in general and then apply them to the design of a challenging system.
9:00am - 5:00pm Monday, September 25 & Tuesday, September 26
SOLD OUT
Jesse Anderson (Big Data Institute)
To handle real-time big data, you need to solve two difficult problems: how do you ingest that much data and how will you process that much data? Jesse Anderson explores the latest real-time frameworks (both open source and managed cloud services), discusses the leading cloud providers, and explains how to choose the right one for your company.
9:00am - 12:30pm Tuesday, September 26, 2017
Data Engineering & Architecture, Spark & beyond
Location: 1E 12/13 Level: Intermediate
John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science)
Average rating: ***..
(3.27, 11 ratings)
What are the essential components of a data platform? John Akred and Stephen O'Sullivan explain how the various parts of the Hadoop, Spark, and big data ecosystems fit together in production to create a data platform supporting batch, interactive, and real-time analytical workloads.
9:00am - 12:30pm Tuesday, September 26, 2017
Big data and the Cloud, Data Engineering & Architecture
Location: 1E 10 Level: Intermediate
Jennifer Wu (Cloudera), Fahd Siddiqui (Cloudera), Paul George (Cloudera), Eugene Fratkin (Cloudera)
Average rating: *....
(1.50, 2 ratings)
Jennifer Wu, Paul George, Fahd Siddiqui, and Eugene Fratkin lead a deep dive into running data engineering workloads in a managed service capacity in the public cloud. Along the way, they share AWS infrastructure best practices and explain how data engineering workloads interoperate with data analytic workloads.
1:30pm - 5:00pm Tuesday, September 26, 2017
Big data and the Cloud, Data Engineering & Architecture
Location: 1E 15/16 Level: Intermediate
Ryan Nienhuis (Amazon Web Services), Radhika Ravirala (Amazon Web Services (AWS)), Allan MacInnis (Amazon Web Services), Ben Snively (Amazon Web Services (AWS))
Average rating: ****.
(4.00, 2 ratings)
Want to learn how to use Amazon's big data web services to launch your first big data application on the cloud? Ryan Nienhuis, Radhika Ravirala, Allan MacInnis, and Ben Snively walk you through building a big data application using a combination of open source technologies and AWS managed services.
1:30pm - 5:00pm Tuesday, September 26, 2017
Jonathan Seidman (Cloudera), Gwen Shapira (Confluent), Mark Grover (Lyft)
Average rating: ****.
(4.11, 9 ratings)
Using Customer 360 and the IoT as examples, Jonathan Seidman, Mark Grover, and Gwen Shapira explain how to architect a modern, real-time big data platform leveraging recent advancements in the open source software world, using components like Kafka, Impala, Kudu, Spark Streaming, and Spark SQL with Hadoop to enable new forms of data processing and analytics.
1:30pm - 5:00pm Tuesday, September 26, 2017
Karthik Ramasamy (Streamlio), Sanjeev Kulkarni (Streamlio), Arun Kejariwal (MZ), Neng Lu (Twitter), Sijie Guo (Streamlio)
Average rating: ***..
(3.00, 3 ratings)
Karthik Ramasamy, Sanjeev Kulkarni, Avrilia Floratau, Ashvin Agrawal, Arun Kejariwal, and Sijie Guo walk you through state-of-the-art streaming systems, algorithms, and deployment architectures, covering the typical challenges in modern real-time big data platforms and offering insights on how to address them.
1:15pm - 1:55pm Wednesday, September 27, 2017
Michael Freedman (TimescaleDB | Princeton)
Average rating: ****.
(4.50, 4 ratings)
Michael Freedman offers an overview of TimescaleDB, a new open source scale-out database designed for time series workloads and engineered as a plugin to Postgres. Unlike most time series newcomers, TimescaleDB supports full SQL while achieving fast ingest and complex queries.
2:05pm - 2:45pm Wednesday, September 27, 2017
Big data and the Cloud, Data Engineering & Architecture
Location: 1A 15/16/17 Level: Intermediate
Henry Robinson (Cloudera), Greg Rahn (Cloudera)
Cloud environments will likely play a key role in your business’s future. Henry Robinson and Greg Rahn explore the workload considerations when evaluating the cloud for analytics and discuss common architectural patterns to optimize price and performance.
4:35pm - 5:15pm Wednesday, September 27, 2017
Stephen Devine (Big Fish Games), Kalah Brown (Big Fish Games)
Companies are increasingly interested in processing and analyzing live-streaming data. The Hadoop ecosystem includes platforms and software library frameworks to support this work, but these components require correct architecture, performance tuning, and customization. Stephen Devine and Kalah Brown explain how they used Spark, Flume, and Kafka to build a live-streaming data pipeline.
4:35pm - 5:15pm Wednesday, September 27, 2017
Data engineering, Data Engineering & Architecture
Location: 1A 15/16/17 Level: Intermediate
Paul Curtis (MapR Technologies)
Average rating: ****.
(4.67, 3 ratings)
A microservices architecture benefits from the agility of containers for convenient, predictable deployment of applications, while persistent, performant message streaming makes both work better. Paul Curtis explores these infrastructure components and discusses the design of highly scalable real-world systems that take advantage of this powerful triad.
4:35pm - 5:15pm Wednesday, September 27, 2017
Data engineering, Data Engineering & Architecture
Location: 1A 23/24 Level: Advanced
Barbara Eckman (Comcast)
Average rating: ***..
(3.00, 2 ratings)
Barbara Eckman offers an overview of Comcast’s streaming data platform, which is composed of a variety of ingest, transformation, and storage services. The platform uses Apache Avro schemas to support end-to-end data governance, Apache Atlas for data discovery and lineage, and custom asynchronous messaging libraries to notify Atlas of new data and schema entities and lineage links as they are created.
5:25pm - 6:05pm Wednesday, September 27, 2017
Dave Shuman (Cloudera), James Kirkland (Red Hat)
Eclipse IoT is an ecosystem of organizations that are working together to establish an IoT architecture based on open source technologies and standards. Dave Shuman and James Kirkland showcase an end-to-end architecture for the IoT based on open source standards, highlighting Eclipse Kura, an open source stack for gateways and the edge, and Eclipse Kapua, an open source IoT cloud platform.
11:20am - 12:00pm Thursday, September 28, 2017
Gwen Shapira (Confluent)
Average rating: ****.
(4.50, 2 ratings)
Gwen Shapira explains how the three realities of modern programming (the explosion of data and data systems, building business processes as microservices instead of monolithic applications, and the rise of the public cloud) affect how developers and companies operate today and why companies across all industries are turning to streaming data and Apache Kafka for mission-critical applications.
11:20am - 12:00pm Thursday, September 28, 2017
Michael Crutcher (Cloudera), Ryan Lippert (Cloudera)
A long time ago in a data center far, far away, we deployed complex lambda architectures as the backbone of our IoT solutions. Though hard, they enabled collection of real-time sensor data and slightly delayed analytics. Michael Crutcher and Ryan Lippert explain why Apache Kudu, a relational storage layer for fast analytics on fast data, is the key to unlocking the value in IoT data.
1:15pm - 1:55pm Thursday, September 28, 2017
Data Engineering & Architecture, Real-time applications
Location: 1E 09 Level: Beginner
Matteo Merli (Streamlio), Sijie Guo (Streamlio)
Average rating: *****
(5.00, 2 ratings)
Modern enterprises produce data at increasingly high volume and velocity. To process data in real time, new types of storage systems have been designed, implemented, and deployed. Matteo Merli and Sijie Guo offer an overview of Apache DistributedLog and Pulsar, real-time storage systems built using Apache BookKeeper and used heavily in production.
1:15pm - 1:55pm Thursday, September 28, 2017
Data engineering, Strata Business Summit
Location: 1E 10/11 Level: Intermediate
Kurt Brown (Netflix)
Average rating: ****.
(4.40, 5 ratings)
Kurt Brown explains how to get the most out of your data infrastructure with 20 principles and practices used at Netflix. Kurt covers each in detail and explores how they relate to the technologies used at Netflix, including S3, Spark, Presto, Druid, R, Python, and Jupyter.
1:15pm - 1:55pm Thursday, September 28, 2017
Data engineering, Machine Learning & Data Science
Location: 1A 06/07 Level: Intermediate
Steven Totman (Cloudera), Faraz Rasheed (TD Bank)
Average rating: *****
(5.00, 2 ratings)
Steven Totman and Faraz Rasheed offer an overview of Griffin, a high-level, easy-to-use framework built on top of Spark, which encapsulates the complexities of common model development tasks within four phases: data understanding, feature extraction, model development, and serving modeling results.
2:55pm - 3:35pm Thursday, September 28, 2017
Data engineering, Data Engineering & Architecture
Location: 1A 23/24 Level: Intermediate
Felix GV (LinkedIn), Yan Yan (LinkedIn)
Average rating: **...
(2.00, 1 rating)
Companies with batch and stream processing pipelines need to serve the insights they glean back to their users, an often-overlooked problem that can be hard to achieve reliably and at scale. Felix GV and Yan Yan offer an overview of Venice, a new data store capable of ingesting data from Hadoop and Kafka, merging it together, replicating it globally, and serving it online at low latency.
2:55pm - 3:35pm Thursday, September 28, 2017
Big data and the Cloud, Data Engineering & Architecture
Location: 1A 15/16/17 Level: Intermediate
Jennifer Wu (Cloudera), Philip Langdale (Cloudera), Kostas Sakellis (Cloudera)
With its scalable data store, elastic compute, and pay-as-you-go cost model, cloud infrastructure is well-suited for large-scale data engineering workloads. Jennifer Wu, Philip Langdale, and Kostas Sakellis explore the latest cloud technologies, focusing on data engineering workloads, cost, security, and ease-of-use implications for data engineers.
4:35pm - 5:15pm Thursday, September 28, 2017
Data-driven business management, Strata Business Summit
Location: 1A 18 Level: Intermediate
Philip Russom (TDWI: The Data Warehousing Institute)
Philip Russom explains how a data lake can improve the role of Hadoop in data-driven business management. With the right end-user tools, a data lake can enable self-service data practices that wring business value from big data and modernize and extend programs for data warehousing, analytics, data integration, and other data-driven solutions.