Presented By
O’Reilly + Cloudera

Make Data Work

29 April–2 May 2019
London, UK

Schedule: Data Integration and Data Pipelines sessions

9:00–17:00 Monday, 29 April & Tuesday, 30 April 2019

Professional Kafka development

Data Engineering and Architecture
Location: London Suite 2

Jesse Anderson (Big Data Institute)

Average rating:

(5.00, 1 rating)

Jesse Anderson offers an in-depth look at Apache Kafka. You'll learn how Kafka works and how to create real-time systems with it as well as how to create consumers and publishers. Jesse then walks you through Kafka’s ecosystem, demonstrating how to use tools like Kafka Streams, Kafka Connect, and KSQL.

9:00–17:00 Monday, 29 April & Tuesday, 30 April 2019

Building a serverless big data application on AWS

Data Engineering and Architecture
Location: London Suite 3

Jorge Lopez (Amazon Web Services), Nikki Rouda (Amazon Web Services), Damon Cortesi (Amazon Web Services), Sven Hansen (Amazon Web Services), Manos Samatas (Amazon Web Services), Alket Memushaj (Amazon Web Services)

Average rating:

(3.50, 2 ratings)

Serverless technologies let you build and scale applications and services rapidly without the need to provision or manage servers. Join in to learn how to incorporate serverless concepts into your big data architectures. You'll explore design patterns to ingest, store, and analyze your data as you build a big data application using AWS technologies such as S3, Athena, Kinesis, and more.

13:30–17:00 Tuesday, 30 April 2019

Architecture and algorithms for end-to-end streaming data processing

Data Engineering and Architecture, Streaming and IoT
Location: S11 A

Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio), Ivan Kelly (Streamlio)

Average rating:

(3.00, 10 ratings)

Many industry segments have been grappling with fast data (high-volume, high-velocity data). Arun Kejariwal and Karthik Ramasamy walk you through the state-of-the-art systems for each stage of an end-to-end data processing pipeline (messaging, compute, and storage) for real-time data, as well as algorithms to extract insights (e.g., heavy hitters and quantiles) from data streams.

11:15–11:55 Wednesday, 1 May 2019

Stream, stream, stream: Different streaming methods with Spark and Kafka

Data Engineering and Architecture, Expo Hall
Location: Expo Hall 2 (Capital Hall N24)

Itai Yaffe (Nielsen)

Average rating:

(4.45, 11 ratings)

NMC (Nielsen Marketing Cloud) provides customers (both marketers and publishers) with real-time analytics tools to profile their target audiences. To achieve that, the company needs to ingest billions of events per day into its big data stores in a scalable, cost-efficient way. Itai Yaffe explains how NMC continuously transforms its data infrastructure to support these goals.

12:05–12:45 Wednesday, 1 May 2019

The changing face of ETL: Event-driven architectures for data engineers

Data Engineering and Architecture
Location: Capital Suite 8/9

Robin Moffatt (Confluent)

Average rating:

(4.21, 14 ratings)

Robin Moffatt discusses the concepts of events, their relevance to software and data engineers, and their ability to unify architectures in a powerful way. Join in to learn why analytics, data integration, and ETL fit naturally into a streaming world. Along the way, Robin will lead a hands-on demonstration of these concepts in practice and offer commentary on the design choices made.

14:55–15:35 Wednesday, 1 May 2019

The Lyft data platform: Now and in the future

Data Engineering and Architecture
Location: Capital Suite 8/9

Mark Grover (Lyft), Deepak Tiwari (Lyft)

Average rating:

(4.69, 13 ratings)

Lyft’s data platform is at the heart of the company's business. Decisions from pricing to ETA to business operations rely on it, and it powers the enormous scale and speed at which Lyft operates. Mark Grover and Deepak Tiwari walk you through the choices Lyft made in developing and sustaining the data platform, along with what lies ahead.

16:35–17:15 Wednesday, 1 May 2019

Scalability-aware autoscaling of a Spark application

Data Engineering and Architecture
Location: S11 A

Anirudha Beria (Qubole), Rohit Karlupia (Qubole)

Average rating:

(3.67, 3 ratings)

Autoscaling of resources aims to achieve low latency for a big data application while reducing resource costs at the same time. Scalability-aware autoscaling uses historical information to make better scaling decisions. Anirudha Beria and Rohit Karlupia explain how to measure the efficiency of autoscaling policies and discuss more efficient autoscaling policies, in terms of latency and costs.

17:25–18:05 Wednesday, 1 May 2019

Mastering streaming and pipelines: Designing and supporting the nervous system of your company

Data Engineering and Architecture, Expo Hall
Location: Expo Hall 2 (Capital Hall N24)

Ted Malaska (Capital One)

Average rating:

(4.12, 8 ratings)

The world of data is all about building the best path to support time and quality to value. 80% to 90% of the work is getting the data into the hands and tools that can create value. Ted Malaska takes you on a journey to investigate strategies and designs that can change the way your company looks at and approaches data.

11:15–11:55 Thursday, 2 May 2019

Transforming a financial services data infrastructure for the modern era by building a PCI DSS-compliant data platform from the ground up on AWS

Data Engineering and Architecture
Location: Capital Suite 10/11

Eoin O'Flanagan (NewDay), Darragh McConville (Kainos)

Average rating:

(4.86, 7 ratings)

Eoin O'Flanagan and Darragh McConville explain how NewDay built a high-performance contemporary data processing platform from the ground up on AWS. Join in to explore the company's journey from a traditional legacy onsite data estate to an entirely cloud-based PCI DSS-compliant platform.

12:05–12:45 Thursday, 2 May 2019

Schema on read and the new logging way

Data Engineering and Architecture
Location: S11 A

David Josephsen (Sparkpost)

Average rating:

(3.50, 2 ratings)

David Josephsen tells the story of how Sparkpost's reliability engineering team abandoned ELK for a DIY schema-on-read logging infrastructure. Join in to learn the architectural details, trials, and tribulations from the company's Internal Event Hose data ingestion pipeline project, which uses Fluentd, Kinesis, Parquet, and AWS Athena to make logging sane.

14:05–14:45 Thursday, 2 May 2019

AI for good at scale in real time: Challenges in machine learning and deep learning

Data Science, Machine Learning & AI, Expo Hall
Location: Expo Hall (Capital Hall N24)

Alex Jaimes (Dataminr)

Average rating:

(3.00, 2 ratings)

When emergency events occur, social signals and sensor data are generated. Alex Jaimes explains how to apply machine learning and deep learning to process large amounts of heterogeneous data from various sources in real time, with a particular focus on how such information can be used for emergencies and in critical events for first responders and for other social good use cases.

14:55–15:35 Thursday, 2 May 2019

Learning how to perform ETL data migrations with open source tool Embulk

Data Engineering and Architecture
Location: Capital Suite 10/11

Jason Bell (Independent Speaker)

Average rating:

(5.00, 1 rating)

The Embulk data migration tool offers a convenient way to load data into a variety of systems with basic configuration. Jason Bell offers an overview of the Embulk tool and outlines some common data migration scenarios that a data engineer could employ using the tool.

14:55–15:35 Thursday, 2 May 2019

Architecting a data platform to support analytic workflows for scientific data

Data Engineering and Architecture
Location: Capital Suite 8/9

Jane McConnell (Teradata), Sun Maria Lehmann (Equinor)

Average rating:

(3.67, 3 ratings)

In upstream oil and gas, a vast amount of the data requested for analytics projects is scientific data: physical measurements about the real world. Historically, this data has been managed library style, but a new system was needed to best provide this data. Sun Maria Lehmann and Jane McConnell share architectural best practices learned from their work with subsurface data.

16:35–17:15 Thursday, 2 May 2019

From legacy to cloud: An end-to-end data integration journey

Data Engineering and Architecture
Location: Capital Suite 8/9

Max Schultze (Zalando SE)

Average rating:

(4.83, 12 ratings)

Max Schultze details Zalando's end-to-end data integration platform to serve analytical use cases and machine learning throughout the company, covering raw data collection, standardized data preparation (binary conversion, partitioning, etc.), user-driven analytics, and machine learning.

16:35–17:15 Thursday, 2 May 2019

Executive Briefing: What it takes to use machine learning in fast data pipelines

Executive Briefing and best practices, Strata Business Summit
Location: Capital Suite 13

Dean Wampler (Anyscale)

Average rating:

(5.00, 4 ratings)

Your team is building machine learning capabilities. Dean Wampler demonstrates how to integrate these capabilities in streaming data pipelines so you can leverage the results quickly and update them as needed. He also covers challenges such as how to build long-running services that are very reliable and scalable and how to combine a spectrum of very different tools, from data science to operations.

16:35–17:15 Thursday, 2 May 2019

Migrating Apache Oozie workflows to Apache Airflow

Data Engineering and Architecture
Location: S11 B

Feng Lu (Google Cloud), James Malone (Google), Apurva Desai (Google Cloud), Cameron Moberg (Truman State University | Google Cloud)

Average rating:

(4.00, 3 ratings)

Apache Oozie and Apache Airflow (incubating) are both widely used workflow orchestration systems, the former focusing on Apache Hadoop jobs. Feng Lu, James Malone, Apurva Desai, and Cameron Moberg explore an open source Oozie-to-Airflow migration tool developed at Google as a part of creating an effective cross-cloud and cross-system solution.

©2019, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com