Presented By
O’Reilly + Cloudera
Make Data Work
29 April–2 May 2019
London, UK

Schedule: Data Platforms sessions

9:00–12:30 Tuesday, 30 April 2019
Data Engineering and Architecture
Location: Capital Suite 9
Mark Madsen (Think Big Analytics), Todd Walter (Teradata)
Building a data lake involves more than installing Hadoop or putting data into AWS. The goal in most organizations is to build a multiuse data infrastructure that is not subject to past constraints. Mark Madsen and Todd Walter explore design assumptions and principles and walk you through a reference architecture to use as you work to unify your analytics infrastructure.
11:15–11:55 Wednesday, 1 May 2019
Data Engineering and Architecture
Location: Capital Suite 8/9
Moty Fania (Intel)
In this session, Moty Fania shares his experience implementing a Sales AI platform that processes millions of web pages and sifts through millions of tweets per day. The platform is based on unique open source technologies and was designed for real-time data extraction and actuation. The session highlights the key learnings and includes a thorough review of the architecture.
12:05–12:45 Wednesday, 1 May 2019
Case studies, Strata Business Summit
Location: Capital Suite 12
Dirk Petzoldt (Zalando SE)
A case study from Zalando, Europe’s leading online fashion platform, about its journey to a scalable, personalized, machine-learning-based marketing platform.
14:05–14:45 Wednesday, 1 May 2019
Data Engineering and Architecture
Location: Capital Suite 8/9
Jian Chang (Alibaba Group), Sanjian Chen (Alibaba Group)
We share the architecture design and many of the detailed technical innovations behind Alibaba TSDB, a state-of-the-art database for IoT data management, drawn from years of development and continuous improvement.
14:05–14:45 Wednesday, 1 May 2019
Ananth Durai (Slack Technologies Inc)
Logs are everywhere, and every organization collects tons of data every day. Logs are only as good as the trust they earn for making business-critical decisions, so building trustworthy, reliable logs is critical to creating a data-driven organization. Ananth Durai walks through his experience building reliable logging infrastructure at Slack and how it helped build confidence in the data.
14:55–15:35 Wednesday, 1 May 2019
Data Engineering and Architecture
Location: Capital Suite 8/9
Mark Grover (Lyft), Deepak Tiwari (Lyft)
Lyft’s data platform is at the heart of Lyft’s business. Decisions ranging from pricing to ETA to business operations rely on it. Moreover, it powers the enormous scale and speed at which Lyft operates. In this talk, Mark Grover walks through the choices Lyft has made in developing and sustaining the data platform, the reasons behind them, and what lies ahead.
14:55–15:35 Wednesday, 1 May 2019
Law and Ethics, Strata Business Summit
Location: Capital Suite 10/11
Our experience building Uber's Business Intelligence platform has been nothing short of extraordinary. This talk details how Uber thought about building its Business Intelligence platform. I’ll narrate how we decided to take a platform approach rather than adding features in a piecemeal fashion.
16:35–17:15 Wednesday, 1 May 2019
Data Engineering and Architecture
Location: Capital Suite 8/9
Neelesh Salian (Stitch Fix)
Developing data infrastructure is not trivial, and neither is changing it. It takes effort and discipline to make changes that can affect your team. In this talk, you will learn what we on Stitch Fix's Data Platform team do to maintain and innovate on our infrastructure for our data scientists.
17:25–18:05 Wednesday, 1 May 2019
Felix Cheung (Uber)
Did you know that your Uber rides are powered by Apache Spark? Join Felix Cheung to learn how Uber is building its data platform with Apache Spark at enormous scale and discover the unique challenges the company faced and overcame.
17:25–18:05 Wednesday, 1 May 2019
Data Engineering and Architecture
Location: Capital Suite 8/9
Hussein Mehanna (Google Cloud)
AI will change how we live in the next 30 years. However, AI is still limited to a small group of companies, because building AI systems is expensive and difficult. To scale the impact of AI across the globe, we need to reduce the cost of building AI solutions. How can we do that? Can we learn from other industries? Yes, we can. The automobile industry went through a similar cycle.
17:25–18:05 Wednesday, 1 May 2019
Mark Samson (Cloudera)
It is now possible to build a modern data platform capable of storing, processing and analysing a wide variety of data across multiple public and private Cloud platforms and on-premise data centres. This session will outline an information architecture for such a platform, informed by working with multiple large organisations who have built such platforms over the last 5 years.
11:15–11:55 Thursday, 2 May 2019
Thomas Weise (Lyft)
Fast data and stream processing are essential for making Lyft rides a good experience for passengers and drivers. Our systems need to track and react to event streams in real time to update locations, compute routes and estimates, balance prices, and more. The streaming platform at Lyft powers these use cases with development frameworks and a deployment stack based on Apache Flink and Beam.
12:05–12:45 Thursday, 2 May 2019
David Josephsen (Sparkpost)
This is the story of how Sparkpost Reliability Engineering abandoned ELK for a DIY Schema-On-Read logging infrastructure. We share architectural details and tribulations from our _Internal Event Hose_ data ingestion pipeline project, which uses Fluentd, Kinesis, Parquet and AWS Athena to make logging sane.
12:05–12:45 Thursday, 2 May 2019
Pradeep Bhadani (Hotels.com), Elliot West (Hotels.com)
Expedia Group is a travel platform with an extensive portfolio including Expedia.com and Hotels.com. We like to give our data teams flexibility and autonomy to work with different technologies. However, this approach generates challenges that cannot be solved by existing tools. We'll explain how we built a unified virtual data lake on top of our many heterogeneous and distributed data platforms.
12:05–12:45 Thursday, 2 May 2019
Data Engineering and Architecture
Location: Capital Suite 8/9
Václav Surovec (Deutsche Telekom IT), Gabor Kotalik (Deutsche Telekom AG)
Knowledge of customers' locations and travel patterns is important for many companies. One of them is the German telco service operator T-Mobile Czech Republic. The Commercial Roaming project, built on Cloudera Hadoop, helped the company better analyze the behavior of its customers from 10 countries in a very secure way, so that it can provide better predictions and visualizations for management.
14:05–14:45 Thursday, 2 May 2019
Data Engineering and Architecture
Location: Capital Suite 8/9
Willem Pienaar (GO-JEK), Zhi Ling Chen (GO-JEK)
Features are key to driving impact with AI at all scales. By democratizing the creation, discovery, and access of features through a unified platform, organizations are able to dramatically accelerate innovation and time to market. Find out how GO-JEK, Indonesia's first billion-dollar startup, built a feature platform to unlock insights in AI, and the lessons they learned along the way.
14:05–14:45 Thursday, 2 May 2019
Data Engineering and Architecture
Location: Capital Suite 10/11
Ravi Suhag (GO-JEK)
At GO-JEK, we build products that help millions of Indonesians commute, shop, eat and pay every day. The Data team is responsible for creating resilient and scalable data infrastructure across all of GO-JEK’s 18+ products. This involves building distributed big data infrastructure, real-time analytics and visualization pipelines for billions of data points per day.
16:35–17:15 Thursday, 2 May 2019
Nanda Vijaydev (BlueData), Thomas Phelan (BlueData)
Organizations need to keep ahead of their competition by using the latest AI/ML/DL technologies such as Spark, TensorFlow, and H2O. The challenge is in how to deploy these tools and keep them running in a consistent manner while maximizing the use of scarce hardware resources, such as GPUs. This session will discuss the effective deployment of such applications in a container environment.