Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

Schedule: Data Platform sessions

9:00am–5:00pm Tuesday, March 14, 2017
Location: LL20 B
Michael Abbott (Stanford University), Christopher Pouliot (Nio), Jennifer Anderson, Renee DiResta (New Knowledge), Coco Krumme (Haven | UC Berkeley), Ryan Baumann (Mapbox), Javona White Bear (IBM), Andre Luckow (BMW Group), Rajiv Paul (Yakit), Evangelos Simoudis (Synapse Partners), Roland Major (Transport for London), Rodrigo Fontecilla (Unisys), Lloyd Palum (Vnomics), Andreas Ribbrock (#zeroG, A Lufthansa Systems Company)
Data, Transportation, and Logistics Day offers a daylong deep dive into how data science is changing transportation and logistics. We’ll investigate the latest advances in and applications of self-driving vehicles, automated drones, and embedded sensors and explore how new uses of data are challenging the industry to evolve infrastructure for the future.
9:00am–5:00pm Tuesday, March 14, 2017
Location: LL20 A
Barbara Eckman (Comcast), Dirk Jungnickel (Emirates Integrated Telecommunications Company (du)), Kishore Papineni (Astellas Pharma), Paul Barth (Podium Data), Carlo Torniai (Pirelli Tyre), Bryan Harrison (American Express), Chris Murphy (Zurich Insurance Group), Martin Lidl (Deloitte), Maura Lynch (Pinterest), Nixon Patel (Kovid Group), Bas Geerdink (ING), Robin Li (Tapjoy), Yohan Chin (Tapjoy), Jim Harrold (NationBuilder), Lana Novikova (Heartbeat AI Technologies)
In a series of 12 half-hour talks aimed at a business audience, you’ll hear data-themed case studies from household brands and global companies, explaining the challenges they wanted to tackle, the approaches they took, and the benefits (and drawbacks) of their solutions. If you want practical insights about applied data, look no further.
11:00am–11:40am Wednesday, March 15, 2017
Data engineering and architecture, Enterprise adoption
Location: 230 A Level: Beginner
Felix Gorodishter (GoDaddy)
Average rating: ****.
(4.25, 4 ratings)
GoDaddy ingests and analyzes logs, metrics, and events at a rate of 100,000 events per second (EPS). Felix Gorodishter shares GoDaddy's big data journey and explains how the company makes sense of data growing by more than 10 TB per day to gain operational insights into its cloud, leveraging Kafka, Hadoop, Spark, Pig, Hive, Cassandra, and Elasticsearch.
11:00am–11:40am Wednesday, March 15, 2017
Peng Du (Uber Inc.), Randy Wei (Uber Inc.)
Average rating: ***..
(3.11, 9 ratings)
Peng Du and Randy Wei offer an overview of Uber’s data science workbench, a central platform where data scientists perform interactive data analysis through notebooks, share and collaborate on scripts, and publish results to dashboards. The workbench is integrated with other Uber services and provides convenient features such as task scheduling, model publishing, and job monitoring.
11:50am–12:30pm Wednesday, March 15, 2017
Spark & beyond
Location: 230 A Level: Intermediate
Jasjeet Thind (Zillow)
Average rating: ****.
(4.50, 2 ratings)
Zillow pioneered providing access to unprecedented information about the housing market. Long gone are the days when you needed an agent to get comparables and prior sale and listing data. And with more data, data science has enabled more use cases. Jasjeet Thind explains how Zillow uses Spark and machine learning to transform real estate.
11:50am–12:30pm Wednesday, March 15, 2017
Data engineering and architecture
Location: LL20 A Level: Intermediate
Christopher Colburn (Netflix), Monal Daxini (Netflix)
Average rating: ****.
(4.00, 3 ratings)
In the past, typical real-time data processing was reserved for answering operational questions and very basic analytical questions, but with better processing frameworks and more-capable hardware, the streaming context can now enable personalization applications. Christopher Colburn and Monal Daxini explore the challenges faced when building a streaming application at scale at Netflix.
11:50am–12:30pm Wednesday, March 15, 2017
Jure Leskovec (Pinterest)
Average rating: ****.
(4.82, 11 ratings)
Pinterest built a flexible, graph-based system for making recommendations to users in real time. The system uses random walks on a user-and-object graph to make personalized recommendations to 100+ million Pinterest users out of a catalog of over a billion items. Jure Leskovec explains how Pinterest built its modern recommendation engine and the lessons learned along the way.
1:50pm–2:30pm Wednesday, March 15, 2017
Enterprise adoption
Location: 230 A Level: Intermediate
Eric Richardson (American Chemical Society)
Average rating: **...
(2.50, 2 ratings)
Eric Richardson explains how ACS used Hadoop, HBase, Spark, Kafka, and Solr to create a hybrid cloud enterprise data hub that scales without drama and drives adoption by ease of use, covering the architecture, technologies used, the challenges faced and defeated, and problems yet to solve.
1:50pm–2:30pm Wednesday, March 15, 2017
Business case studies, Strata Business Summit
Location: 210 D/H Level: Intermediate
Chandan Joarder (Macys.com)
Average rating: ***..
(3.56, 9 ratings)
Chandan Joarder shares a guide to building real-time dashboards in-house using tools such as Kafka, web frameworks, and an in-memory database, utilizing JavaScript and Scala. Along the way, Chandan also discusses the architectural principles used in these dashboards to provide up-to-the-hour business performance metrics and alerts.
1:50pm–2:30pm Wednesday, March 15, 2017
Data engineering and architecture
Location: LL20 A Level: Intermediate
Avinash Padmanabhan (Intuit)
Average rating: *****
(5.00, 2 ratings)
Data warehouses are critical in driving business decisions, and SQL is the dominant language for building ETL pipelines. While the technology has shifted from RDBMS-centric data warehouses to data pipelines based on Hadoop and MPP databases, engineering and quality processes have not kept pace. Avinash Padmanabhan highlights the changes that Intuit's team made to improve processes and data quality.
2:40pm–3:20pm Wednesday, March 15, 2017
Big data and the Cloud, Enterprise adoption
Location: 230 A Level: Intermediate
Gwen Shapira (Confluent), Bob Lehmann (Bayer)
Average rating: ****.
(4.50, 2 ratings)
Gwen Shapira and Bob Lehmann share their experience and patterns for building a cross-data-center streaming data platform for Monsanto. Learn how to facilitate your move to the cloud while "keeping the lights on" for legacy applications. In addition to integrating private and cloud data centers, you'll discover how to establish a solid foundation for a transition from batch to stream processing.
2:40pm–3:20pm Wednesday, March 15, 2017
Platform Security and Cybersecurity
Location: LL21 B Level: Intermediate
Ajit Gaddam (VISA), Jiphun Satapathy (VISA)
Average rating: ***..
(3.83, 6 ratings)
Apache Kafka is used by over 35% of Fortune 500 companies to store and process some of their most sensitive datasets. Ajit Gaddam and Jiphun Satapathy provide a security reference architecture to secure your Kafka cluster while leveraging it to support your organization's cybersecurity requirements.
4:20pm–5:00pm Wednesday, March 15, 2017
Business case studies, Strata Business Summit
Location: 210 D/H Level: Intermediate
Alan Chaney (Bitvore Corp)
Average rating: ***..
(3.50, 2 ratings)
Bitvore Corp’s Bitvore for Munis personalized news surveillance system is rapidly becoming a must-have for all major fixed-income securities analysts, investors, and brokers working in the three-trillion-dollar municipal bond market in the USA. Alan Chaney explains how Bitvore delivers the few important and relevant articles out of thousands each day, saving users many hours daily.
4:20pm–5:00pm Wednesday, March 15, 2017
Spark & beyond
Location: LL21 C/D Level: Beginner
Kaarthik Sivashanmugam (Microsoft)
Average rating: ***..
(3.00, 3 ratings)
Spark powers various services in Bing, but the Bing team had to customize and extend Spark to cover its use cases and scale the implementation of Spark-based data pipelines to handle internet-scale data volume. Kaarthik Sivashanmugam explores these use cases, covering the architecture of Spark-based data platforms, challenges faced, and the customization done to Spark to address the challenges.
4:20pm–5:00pm Wednesday, March 15, 2017
Data engineering and architecture
Location: LL20 C Level: Intermediate
Kevin Mao (Capital One)
Average rating: ****.
(4.67, 3 ratings)
Kevin Mao explores the value of and challenges associated with collecting raw security event data from disparate corners of enterprise infrastructure and transforming it into high-quality intelligence that can be used to forecast, detect, and mitigate cybersecurity threats.
11:00am–11:40am Thursday, March 16, 2017
Data engineering and architecture, Real-time applications
Location: LL20 A Level: Intermediate
Tony Xing (Microsoft)
Average rating: ***..
(3.00, 2 ratings)
Tony Xing offers an overview of Microsoft's common anomaly detection platform, an API service built internally to give product teams the flexibility to plug in any anomaly detection algorithm that fits their own signal types.
11:00am–11:40am Thursday, March 16, 2017
Stream processing and analytics
Location: LL20 D Level: Intermediate
Bill Graham (Twitter), Avrilia Floratau (Microsoft), Ashvin Agrawal (Microsoft)
Twitter processes billions of events per day the instant the data is generated using Heron, an open source streaming engine tailored for large-scale environments. Bill Graham, Avrilia Floratau, and Ashvin Agrawal explore the techniques Heron uses to elastically scale resources in order to handle highly varying loads without sacrificing real-time performance or user experience.
11:00am–11:40am Thursday, March 16, 2017
Business case studies, Strata Business Summit
Location: 210 D/H Level: Intermediate
Todd Mostak (MapD), Abdul Subhan (Verizon Wireless)
Average rating: ****.
(4.00, 2 ratings)
With more than 91M customers, Verizon produces oceans of data. The challenge this onslaught presents isn’t one of storage; it’s one of speed. The solution? Harnessing the power of GPUs to access insights in less than a millisecond. Todd Mostak and Abdul Subhan explain how Verizon solved its data challenge by implementing GPU-tuned analytics and visualization.
1:50pm–2:30pm Thursday, March 16, 2017
Mike Koelemay (Sikorsky Aircraft, Lockheed Martin)
Average rating: *****
(5.00, 2 ratings)
Sikorsky collects data onboard thousands of helicopters deployed worldwide and uses it for fleet management services, engineering analyses, and business intelligence. Mike Koelemay offers an overview of the data platform that Sikorsky has built to manage the ingestion, processing, and serving of this data so that it can be used to rapidly generate information to drive decision making.
2:40pm–3:20pm Thursday, March 16, 2017
Gleicon Moraes (luc.id), Arthur Grava (Luizalabs)
Average rating: ****.
(4.00, 3 ratings)
Gleicon Moraes and Arthur Grava share war stories about developing and deploying a cloud-based large-scale recommender system for a top-three Brazilian ecommerce company. The system, which uses Cassandra and graph traversal, led to a more than 15% increase in sales.
4:20pm–5:00pm Thursday, March 16, 2017
Business case studies, Strata Business Summit
Location: 210 D/H Level: Intermediate
Mahesh Goud T (Ticketmaster)
Average rating: **...
(2.00, 1 rating)
Mahesh Goud shares success stories using Ticketmaster's large-scale contextual bandit platform for SEM, which determines the optimal keyword bids under evolving keyword contexts to meet different business requirements, and explores Ticketmaster's streaming pipeline, consisting of Storm, Kafka, HBase, the ELK Stack, and Spring Boot.