Presented By
O’Reilly + Cloudera
Make Data Work
29 April–2 May 2019
London, UK

Schedule: Data Integration and Data Pipelines sessions

9:00 - 17:00 Monday, 29 April & Tuesday, 30 April
Data Engineering and Architecture
Location: Capital Suite 16
Jesse Anderson (Big Data Institute)
An in-depth look at Apache Kafka. We show how Kafka works and how to create real-time systems with it, including how to create consumers and publishers. We then look at Kafka’s ecosystem and how each component is used, and show how to use Kafka Streams, Kafka Connect, and KSQL.
9:00 - 17:00 Monday, 29 April & Tuesday, 30 April
Data Engineering and Architecture
Location: London Suite 3
Jorge Lopez (Amazon Web Services)
Serverless technologies let you build and scale applications and services rapidly without the need to provision or manage servers. In this workshop, we show you how to incorporate serverless concepts into your big data architectures, looking at design patterns to ingest, store, and analyze your data. You will build a big data application using AWS technologies such as S3, Athena, Kinesis, and more.
13:30 - 17:00 Tuesday, 30 April 2019
Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio)
Many industry segments have been grappling with fast data (high-volume, high-velocity data). In this tutorial, we walk the audience through the landscape of state-of-the-art systems for each stage of an end-to-end pipeline for real-time data (messaging, compute, and storage) and the algorithms used to extract insights from data streams (e.g., heavy hitters and quantiles).
11:15 - 11:55 Wednesday, 1 May 2019
Data Engineering and Architecture
Location: Capital Suite 7
Itai Yaffe (Nielsen)
At Nielsen Marketing Cloud, we provide our customers (marketers and publishers) with real-time analytics tools to profile their target audiences. To achieve that, we need to ingest billions of events per day into our big data stores, and we need to do it in a scalable yet cost-efficient manner. In this talk, we discuss how we continuously transform our data infrastructure to support these goals.
12:05 - 12:45 Wednesday, 1 May 2019
Data Engineering and Architecture
Location: Capital Suite 8/9
Robin Moffatt (Confluent)
This talk discusses the concepts of events, their relevance to software and data engineers and their ability to unify architectures in a powerful way. It describes why analytics, data integration and ETL fit naturally into a streaming world. There'll be a hands-on demonstration of these concepts in practice and commentary on the design choices made.
14:55 - 15:35 Wednesday, 1 May 2019
Data Engineering and Architecture
Location: Capital Suite 8/9
Mark Grover (Lyft), Deepak Tiwari (Lyft)
Lyft’s data platform is at the heart of Lyft’s business. Decisions ranging from pricing to ETA to business operations rely on it. Moreover, it powers the enormous scale and speed at which Lyft operates. In this talk, Mark Grover walks through the choices Lyft has made in developing and sustaining the data platform, the reasons behind them, and what lies ahead.
16:35 - 17:15 Wednesday, 1 May 2019
Anirudha Beria (Qubole), Rohit Karlupia (Qubole)
Autoscaling of resources aims to achieve low latency for a big data application while reducing resource costs at the same time. Scalability-aware autoscaling uses historical information to make better scaling decisions. In this talk, we discuss (1) measuring the efficiency of autoscaling policies and (2) devising more efficient autoscaling policies in terms of latency and cost.
17:25 - 18:05 Wednesday, 1 May 2019
Data Engineering and Architecture
Location: Capital Suite 7
Ted Malaska (Capital One)
In the world of data, it is all about building the best path to support time/quality to value. 80% to 90% of the work is getting the data into the hands and tools that can create value. This talk takes us on a journey through different patterns and solutions that can work at the largest of companies.
11:15 - 11:55 Thursday, 2 May 2019
Data Engineering and Architecture
Location: Capital Suite 10/11
Eoin O'Flanagan (Newday), Darragh McConville (Kainos)
In this session, you will learn how we built a high-performance, contemporary data processing platform from the ground up on AWS. We discuss our journey from a legacy, on-site, traditional data estate to an entirely cloud-based, PCI DSS-compliant platform.
12:05 - 12:45 Thursday, 2 May 2019
David Josephsen (Sparkpost)
This is the story of how Sparkpost Reliability Engineering abandoned ELK for a DIY schema-on-read logging infrastructure. We share architectural details and tribulations from our Internal Event Hose data ingestion pipeline project, which uses Fluentd, Kinesis, Parquet, and AWS Athena to make logging sane.
14:05 - 14:45 Thursday, 2 May 2019
Data Engineering and Architecture
Location: Capital Suite 10/11
Ravi Suhag (Go Jek)
At GO-JEK, we build products that help millions of Indonesians commute, shop, eat, and pay every day. The Data team is responsible for creating resilient and scalable data infrastructure across all of GO-JEK’s 18+ products. This involves building distributed big data infrastructure and real-time analytics and visualization pipelines for billions of data points per day.
14:05 - 14:45 Thursday, 2 May 2019
Data Science, Machine Learning & AI
Location: Expo Hall (Capital Hall N24)
Alex Jaimes (Dataminr)
When emergency events occur, social signals and sensor data are generated. In this talk, I describe how machine learning and deep learning are applied to process large amounts of heterogeneous data from various sources in real time, with a particular focus on how such information can be used by first responders in emergencies and critical events, and for other social good use cases.
14:55 - 15:35 Thursday, 2 May 2019
Data Engineering and Architecture
Location: Capital Suite 10/11
Jason Bell (DeskHoppa)
The Embulk data migration tool offers a convenient way to load data into a variety of systems with basic configuration. This talk gives an overview of Embulk and shows some common data migration scenarios that a data engineer could handle with it.
14:55 - 15:35 Thursday, 2 May 2019
Data Engineering and Architecture
Location: Capital Suite 8/9
Jane McConnell (Teradata), Sun Maria Lehmann (Equinor)
In Upstream Oil and Gas, a vast amount of the data requested for analytics projects is “scientific data” - physical measurements about the real world. Historically this data has been managed “library-style” in files - but to provide this data to analytics projects, we need to do something different. Sun and Jane discuss architectural best practices learned from their work with subsurface data.
16:35 - 17:15 Thursday, 2 May 2019
Data Engineering and Architecture
Location: Capital Suite 7
Max Schultze (Zalando SE)
Covers a data lake implementation at a large-scale company: raw data collection, standardized data preparation (e.g., binary conversion and partitioning), user-driven analytics, and machine learning.
16:35 - 17:15 Thursday, 2 May 2019
Dean Wampler (Lightbend)
Your team is building Machine Learning capabilities. I'll discuss how you can integrate these capabilities in streaming data pipelines so you can leverage the results quickly and update them as needed. There are big challenges. How do you build long-running services that are very reliable and scalable? How do you combine a spectrum of very different tools, from data science to operations?
16:35 - 17:15 Thursday, 2 May 2019
Feng Lu (Google Cloud), James Malone (Google), Apurva Desai (Google Cloud), Cameron Moberg (Truman State University / Google Cloud)
Apache Oozie and Apache Airflow (incubating) are both widely used workflow orchestration systems, with the former focusing on Apache Hadoop jobs. We see a need for an Oozie-to-Airflow workflow mapping as part of creating an effective cross-cloud, cross-system solution. This talk introduces an open-source Oozie-to-Airflow migration tool developed at Google.