
Presented by O’Reilly + Cloudera

Make Data Work

March 25-28, 2019
San Francisco, CA

Schedule: Streaming and IoT sessions

9:00am–12:30pm Tuesday, March 26, 2019

Introduction to Flink via Flink SQL

Location: 2004

Secondary topics: Streaming, realtime analytics, and IoT

Fabian Hueske (Ververica)

Average rating: 5.00 (1 rating)

Fabian Hueske offers an overview of Apache Flink through its SQL interface, covering stream processing and Flink's various modes of use. You'll then use Flink to run SQL queries on data streams and compare this approach with the Flink DataStream API.

9:00am–12:30pm Tuesday, March 26, 2019

Hands-on machine learning with Kafka-based streaming pipelines

Location: 2007

Secondary topics: Data Integration and Data Pipelines, Data preparation, data governance, and data lineage, Model lifecycle management

Boris Lublinsky (Lightbend), Dean Wampler (Anyscale)

Average rating: 3.85 (13 ratings)

Boris Lublinsky and Dean Wampler walk you through using ML in streaming data pipelines, including periodic model retraining and low-latency scoring in live streams. You'll explore using Kafka as a data backplane, the pros and cons of microservices versus systems like Spark and Flink, tips for TensorFlow and SparkML, performance considerations, model metadata tracking, and other techniques.

5:10pm–5:50pm Wednesday, March 27, 2019

Critical turbine maintenance: Monitoring and diagnosing planes and power plants in real time

Location: 2006

Secondary topics: Streaming, realtime analytics, and IoT, Transportation and Logistics

June Andrews (GE), John Rutherford (GE)

Average rating: 4.50 (2 ratings)

GE produces a third of the world's power and 60% of its airplane engines, a critical portion of the world's infrastructure that requires meticulous monitoring of the hundreds of sensors streaming data from each turbine. June Andrews and John Rutherford explain how GE's monitoring and diagnostics teams put into production the first real-time ML systems used to determine turbine health.

11:00am–11:40am Thursday, March 28, 2019

How Zhaopin.com built its enterprise event bus using Apache Pulsar

Location: 2006

Secondary topics: Data Platforms, Media, Marketing, Advertising, Streaming, realtime analytics, and IoT

Sijie Guo (StreamNative), Penghui Li (Zhaopin)

Average rating: 4.00 (1 rating)

Using a messaging system to build an event bus is very common. However, some use cases demand a specific set of features from that messaging system. Sijie Guo and Penghui Li discuss the event bus requirements of Zhaopin.com, one of China's biggest online recruitment service providers, and explain why the company chose Apache Pulsar.

11:50am–12:30pm Thursday, March 28, 2019

Flink SQL in action

Location: 2004

Secondary topics: Data Integration and Data Pipelines, Streaming, realtime analytics, and IoT

Fabian Hueske (Ververica)

Average rating: 4.30 (10 ratings)

Processing streaming data with SQL is becoming increasingly popular. Fabian Hueske explains why SQL queries on streams should have the same semantics as SQL queries on static data. He then shares a selection of common use cases and demonstrates how easily they can be addressed with Flink SQL.

11:50am–12:30pm Thursday, March 28, 2019

Serverless for data and AI

Location: 2007

Secondary topics: AI and Data technologies in the cloud, Data Integration and Data Pipelines, Data Platforms

Avner Braverman (Binaris)

Average rating: 4.00 (3 ratings)

What is serverless, and how can it be used for data analysis and AI? Avner Braverman outlines the benefits and limitations of serverless for data transformation (ETL), AI inference and training, and real-time streaming. This is a technical talk, so expect demos and code.

4:40pm–5:20pm Thursday, March 28, 2019

Apache Druid autoscale-out/in for streaming data ingestion on Kubernetes

Location: 2006

Secondary topics: AI and Data technologies in the cloud

Jinchul Kim (SK Telecom)

Average rating: 2.17 (6 ratings)

Druid supports autoscaling for data ingestion, but only on AWS EC2, so you can't rely on the feature in a private cloud. Jinchul Kim demonstrates autoscale-out/in on Kubernetes, details the benefits of this approach, and discusses the development of Druid Helm charts, rolling updates, and the use of custom metrics for horizontal autoscaling.


Sponsorship Opportunities

For exhibition and sponsorship opportunities, email strataconf@oreilly.com

Partner Opportunities

For information on trade opportunities with O'Reilly conferences, email partners@oreilly.com


©2019, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com