Presented By O’Reilly and Cloudera

Make Data Work

March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

Schedule: Expo Hall sessions

11:00am–11:40am Wednesday, March 7, 2018

Being smarter than dinosaurs: How NASA uses deep learning for planetary defense

Data science and machine learning
Location: Expo Hall 1

Siddha Ganju (NVIDIA)

Siddha Ganju explains how the FDL lab at NASA uses artificial intelligence to automate the identification of meteors from meteor shower images at above human-level performance, and to recover known meteor shower streams and characterize previously unknown meteor showers from orbital data. The research is aimed at providing more warning time for long-period comet impacts.

11:50am–12:30pm Wednesday, March 7, 2018

Spark NLP in action: Improving patient flow forecasting at Kaiser Permanente

Data science and machine learning
Location: Expo Hall 1

David Talby (Pacific AI), Santosh Kulkarni (Kaiser Permanente)

Average rating: 3.50 (2 ratings)

David Talby and Santosh Kulkarni explain how Kaiser Permanente uses the open source NLP library for Apache Spark to tackle one of the most common challenges in applying natural language processing in practice: integrating domain-specific NLP into a scalable, performant, measurable, and reproducible machine learning pipeline.

1:50pm–2:30pm Wednesday, March 7, 2018

Lessons learned deploying machine learning and deep learning models in production at major tech companies

Data science and machine learning
Location: Expo Hall 1

Harish Doddi (Datatron), Jerry Xu (Datatron Technologies)

Average rating: 4.00 (3 ratings)

Deploying machine learning and deep learning models in production is hard. Harish Doddi and Jerry Xu outline the enterprise data science lifecycle, covering how the production model deployment flow works, along with challenges, best practices, and lessons learned. Along the way, they explain why monitoring models in production should be mandatory.

2:40pm–3:20pm Wednesday, March 7, 2018

Real-time deep link analytics: The next stage of graph analytics

Big data and data science in the cloud, Data engineering and architecture, Data-driven business management, Platform security and cybersecurity, Streaming systems and real-time applications
Location: Expo Hall 1

Yu Xu (TigerGraph)

Average rating: 5.00 (2 ratings)

Graph databases are the fastest growing category in data management. However, most graph queries only traverse two hops in big graphs due to limitations in most graph databases. Real-world applications require deep link analytics that traverse far more than three hops. Yu Xu offers an overview of a fraud detection system that manages 100 billion graph elements to detect risk and fraudulent groups.

4:20pm–5:00pm Wednesday, March 7, 2018

Leveraging live data to realize the smart cities vision

Data engineering and architecture
Location: Expo Hall 1

Arun Kejariwal (Independent), Roman Smolgovsky (MZ)

One of the key application domains leveraging live data is smart cities, but success depends on the availability of generic platforms that support high throughput and ultralow latency. Arun Kejariwal and Francois Orsini offer an overview of Satori's live data platform and walk you through a country-scale case study of its implementation.

5:10pm–5:50pm Wednesday, March 7, 2018

Small pieces, loosely joined: A skater's code

Data science and machine learning
Location: Expo Hall 1

Rodney Mullen (Almost Skateboards)

Average rating: 5.00 (2 ratings)

The essence of modern skating is learning tricks that couple with specific terrain. Activision’s video game franchise testifies to the nearly endless possibilities. Rodney Mullen offers a nuanced look at how skaters nudge the endpoints of disparate submovements to create new combinations, which may shine a different light on ideas in machine learning. Plus, it’s a lot of fun.

11:00am–11:40am Thursday, March 8, 2018

Kafka streaming applications with Akka Streams and Kafka Streams

Data engineering and architecture, Streaming systems and real-time applications
Location: Expo Hall 1

Dean Wampler (Anyscale)

Average rating: 5.00 (1 rating)

Dean Wampler compares and contrasts data processing with Akka Streams and Kafka Streams, two libraries for building microservice streaming applications on Kafka. Dean discusses the strengths and weaknesses of each tool for particular design needs and contrasts them with Spark Streaming and Flink, so you'll know when to choose those engines instead.

1:50pm–2:30pm Thursday, March 8, 2018

The real-time journey from raw streaming data to AI-based analytics

Data engineering and architecture, Data science and machine learning, Streaming systems and real-time applications
Location: Expo Hall 1

Roy Ben-Alta (Amazon Web Services), Ira Cohen (Anodot)

Average rating: 5.00 (1 rating)

Many domains, such as mobile, web, the IoT, and ecommerce, have turned to analyzing streaming data. However, this presents challenges both in transforming the raw data into metrics and in automatically analyzing those metrics to produce insights. Roy Ben-Alta and Ira Cohen share a solution implemented using Amazon Kinesis as the real-time pipeline feeding Anodot's anomaly detection solution.

2:40pm–3:20pm Thursday, March 8, 2018

Building ML and AI pipelines with Spark and TensorFlow

Big data and data science in the cloud, Data engineering and architecture, Data science and machine learning, Streaming systems and real-time applications
Location: Expo Hall 1

Chris Fregly (Amazon Web Services)

Average rating: 5.00 (1 rating)

Chris Fregly demonstrates how to extend existing Spark-based data pipelines to include TensorFlow model training and deployment and offers an overview of TensorFlow’s TFRecord format, including libraries for converting to and from other popular file formats such as Parquet, CSV, JSON, and Avro stored in HDFS and S3.

Sponsorship Opportunities

For exhibition and sponsorship opportunities, email strataconf@oreilly.com

Partner Opportunities

For information on trade opportunities with O'Reilly conferences, email partners@oreilly.com

©2018, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com