Presented By O’Reilly and Cloudera
Make Data Work
March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

Schedule: Expo Hall sessions

11:00am–11:40am Wednesday, March 7, 2018
Data science and machine learning
Location: Expo Hall 1
Level: Intermediate
Siddha Ganju (Deep Vision)
Siddha Ganju explains how the FDL lab at NASA uses artificial intelligence to improve and automate the identification of meteors in meteor shower images at above human-level performance, and to recover known meteor shower streams and characterize previously unknown meteor showers from orbital data. The research aims to provide more warning time for long-period comet impacts.

11:50am–12:30pm Wednesday, March 7, 2018
Data science and machine learning
Location: Expo Hall 1
Level: Intermediate
David Talby (Pacific AI), Santosh Kulkarni (Kaiser Permanente)
Average rating: 3.50 (2 ratings)
David Talby and Santosh Kulkarni explain how Kaiser Permanente uses the open source NLP library for Apache Spark to tackle one of the most common challenges in applying natural language processing in practice: integrating domain-specific NLP as part of a scalable, performant, measurable, and reproducible machine learning pipeline.

1:50pm–2:30pm Wednesday, March 7, 2018
Data science and machine learning
Location: Expo Hall 1
Level: Non-technical
Harish Doddi (Datatron Technologies), Jerry Xu (Datatron Technologies)
Average rating: 4.00 (3 ratings)
Deploying machine learning and deep learning models in production is hard. Harish Doddi and Jerry Xu outline the enterprise data science lifecycle, covering how the production model deployment flow works, its challenges, best practices, and lessons learned. Along the way, they explain why monitoring models in production should be mandatory.

2:40pm–3:20pm Wednesday, March 7, 2018
Yu Xu (TigerGraph)
Average rating: 5.00 (2 ratings)
Graph databases are the fastest growing category in data management. However, most graph queries only traverse two hops in big graphs due to limitations in most graph databases. Real-world applications require deep link analytics that traverse far more than three hops. Yu Xu offers an overview of a fraud detection system that manages 100 billion graph elements to detect risk and fraudulent groups.

4:20pm–5:00pm Wednesday, March 7, 2018
Data engineering and architecture
Location: Expo Hall 1
Arun Kejariwal (Independent), Roman Smolgovsky (MZ)
One of the key application domains leveraging live data is smart cities, but success depends on the availability of generic platforms that support high throughput and ultralow latency. Arun Kejariwal and Francois Orsini offer an overview of Satori's live data platform and walk you through a country-scale case study of its implementation.

5:10pm–5:50pm Wednesday, March 7, 2018
Data science and machine learning
Location: Expo Hall 1
Rodney Mullen (Almost Skateboards)
Average rating: 5.00 (2 ratings)
The essence of modern skating is learning tricks that couple with specific terrain. Activision’s video game franchise testifies to the nearly endless possibilities. Rodney Mullen offers a nuanced look at how skaters nudge the endpoints of disparate submovements to create new combinations that may shine a different light on ideas in machine learning. Plus, it’s a lot of fun.

11:00am–11:40am Thursday, March 8, 2018
Dean Wampler (Lightbend)
Average rating: 5.00 (1 rating)
Dean Wampler compares and contrasts data processing with Akka Streams and Kafka Streams in microservice streaming applications based on Kafka. Dean discusses the strengths and weaknesses of each tool for particular design needs and contrasts them with Spark Streaming and Flink, so you'll know when to choose them instead.

1:50pm–2:30pm Thursday, March 8, 2018
Roy Ben-Alta (Amazon Web Services), Ira Cohen (Anodot)
Average rating: 5.00 (1 rating)
Many domains, such as mobile, web, the IoT, and ecommerce, have turned to analyzing streaming data. However, this presents challenges both in transforming the raw data into metrics and in automatically analyzing the metrics to produce insights. Roy Ben-Alta and Ira Cohen share a solution implemented using Amazon Kinesis as the real-time pipeline feeding Anodot's anomaly detection solution.

2:40pm–3:20pm Thursday, March 8, 2018
Chris Fregly (PipelineAI)
Average rating: 5.00 (1 rating)
Chris Fregly demonstrates how to extend existing Spark-based data pipelines to include TensorFlow model training and deployment and offers an overview of TensorFlow’s TFRecord format, including libraries for converting to and from other popular file formats such as Parquet, CSV, JSON, and Avro stored in HDFS and S3.