Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Schedule: Data Platforms sessions

How do leading companies architect and develop data platforms? Which features matter most? In this series of sessions, companies will share the internal platforms they use for machine learning and AI. These are battle-tested platforms running in production, some at extremely large scale.

9:00–12:30 Tuesday, 22 May 2018
Data engineering and architecture
Location: Capital Suite 14 Level: Intermediate
Mark Madsen (Teradata), Todd Walter (Archimedata)
Average rating: 4.29 (7 ratings)
Building a data lake involves more than installing Hadoop or putting data into AWS. The goal in most organizations is to build multiuse data infrastructure that is not subject to past constraints. Mark Madsen and Todd Walter explore design assumptions and principles and walk you through a reference architecture to use as you work to unify your analytics infrastructure.
13:30–17:00 Tuesday, 22 May 2018
SOLD OUT
Data engineering and architecture
Location: Capital Suite 12 Level: Advanced
Ted Malaska (Capital One), Jonathan Seidman (Cloudera)
Average rating: 4.33 (3 ratings)
Using Customer 360 and the IoT as examples, Jonathan Seidman and Ted Malaska explain how to architect a modern, real-time big data platform that leverages recent advances in open source software, using components such as Kafka, Flink, Kudu, Spark Streaming, and Spark SQL along with modern storage engines to enable new forms of data processing and analytics.
11:15–11:55 Wednesday, 23 May 2018
Data engineering and architecture
Location: S11B Level: Intermediate
Jason Heo (Naver), Dooyong Kim (Navercorp)
Average rating: 3.00 (1 rating)
Naver.com is the largest search engine in Korea, with a 70% share of the Korean search market, and it handles billions of pages and events every day. Jason Heo and Dooyong Kim offer an overview of Naver's web analytics system, built with Druid.
12:05–12:45 Wednesday, 23 May 2018
Baolong Mao (JD.com), Yiran Wu (JD.com), Yupeng Fu (Alluxio)
Baolong Mao, Yiran Wu, and Yupeng Fu explain how JD.com uses Alluxio to support ad hoc and real-time stream computing, using Alluxio-compatible HDFS URLs and treating Alluxio as a pluggable optimization component. For example, one framework, JDPresto, has seen a 10x average performance improvement.
14:05–14:45 Wednesday, 23 May 2018
Data engineering and architecture
Location: S11B Level: Intermediate
Carsten Herbe (Audi Business Innovation GmbH), Matthias Graunitz (Audi AG)
Average rating: 4.33 (3 ratings)
Carsten Herbe and Matthias Graunitz detail Audi's journey from a Hadoop proof of concept to a multitenant enterprise platform, sharing lessons learned, the decisions Audi made, and how a number of use cases are implemented using the platform.
11:15–11:55 Thursday, 24 May 2018
Neelesh Salian (Stitch Fix)
Average rating: 1.00 (1 rating)
Neelesh Srinivas Salian offers an overview of the compute infrastructure used by the data science team at Stitch Fix, covering the architecture, tools within the larger ecosystem, and the challenges that the team overcame along the way.
11:15–11:55 Thursday, 24 May 2018
Data engineering and architecture
Location: S11B Level: Intermediate
Irene Gonzálvez (Spotify)
Average rating: 3.88 (8 ratings)
Irene Gonzálvez shares Spotify's process for ensuring data quality, covering why and how the company became aware of its importance, the products it has developed, and future strategy.
11:15–11:55 Thursday, 24 May 2018
Kinnary Jangla (Pinterest)
Average rating: 3.00 (5 ratings)
Having trouble coordinating the development of your production ML system across a team of developers? Are microservices drifting apart and making problems hard to debug? Kinnary Jangla explains how Pinterest dockerized the services powering its home feed and how doing so affected the engineering productivity of its ML teams while increasing uptime and ease of deployment.
12:05–12:45 Thursday, 24 May 2018
Moty Fania (Intel)
Moty Fania explains how Intel implemented an AI inference platform to enable internal visual inspection use cases and shares lessons learned along the way. The platform is based on open source technologies and was designed for real-time streaming and online actuation.
14:05–14:45 Thursday, 24 May 2018
Tony Xing (Microsoft), Bixiong Xu (Microsoft)
Average rating: 2.00 (1 rating)
Tony Xing and Bixiong Xu offer an overview of Project Kensho, Microsoft's one-stop shop for business incident monitoring and automated insights. They cover the technology's evolution, its architecture and algorithms, and its benefits and trade-offs. Along the way, they share a case study on monitoring key Bing Ads metrics and generating automated diagnostic insights.
14:55–15:35 Thursday, 24 May 2018
Data-driven business management, Strata Business Summit
Location: Capital Suite 15/16 Level: Non-technical
Simon Chan (Salesforce)
Average rating: 4.00 (1 rating)
The promises of AI are great, but taking the steps to implement AI within an enterprise is challenging. The secret behind enterprise AI success often traces back to the underlying platform that accelerates AI development at scale. Based on years of experience helping executives establish AI product strategies, Simon Chan helps you discover the AI platform journey that is right for your business.
14:55–15:35 Thursday, 24 May 2018
Data science and machine learning, Expo Hall
Location: Expo Hall Level: Beginner
Stamatis Stefanakos (D ONE AG)
Average rating: 4.33 (3 ratings)
Switzerland-based startup WinJi capitalizes on two current megatrends: big data and renewable energy. Stamatis Stefanakos offers an overview of WinJi's TruePower Asset Management Platform, covering the overall architecture and the motivation behind it, the physics behind the data, and the business case.
14:55–15:35 Thursday, 24 May 2018
Alvin Heib (Cloudera), Guy Le Roux (Atos)
Alvin Heib and Guy Leroux offer an overview of ClickFox, a platform built to handle high-performance analytical needs, from bits and bytes to solving customer needs, covering the platform's virtualization, big data, and analytical layers.
16:35–17:15 Thursday, 24 May 2018
Naghman Waheed (Bayer Crop Science), Brian Arnold (Bayer)
Average rating: 4.50 (2 ratings)
There are a number of tools that make it easy to implement a data lake. However, most lack the essential features that prevent your data lake from turning into a data swamp. Naghman Waheed and Brian Arnold offer an overview of Monsanto's Data Historian platform, which can ingest, store, and access datasets without compromising ease of use, governance, or security.