Presented By O’Reilly and Cloudera
Make Data Work
March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

Schedule: Graphs and Time-series sessions

These two fundamental data types were part of the rise of big data. Many common and important use cases lend themselves to graph analytics or time-series analysis. We want to showcase the latest generation of tools and methods for cleaning, preparing, storing, and analyzing graphs and time series. Improvements in both software and hardware are leading to new solutions for analysts, data scientists, and engineers.

9:00am–12:30pm Tuesday, March 6, 2018
Karthik Ramasamy (Streamlio), Sanjeev Kulkarni (Streamlio), Sijie Guo (Streamlio), Arun Kejariwal (MZ)
Across diverse segments in industry, there has been a shift in focus from big data to fast data. Karthik Ramasamy, Sanjeev Kulkarni, Arun Kejariwal, and Sijie Guo walk you through state-of-the-art streaming architectures, streaming frameworks, and streaming algorithms, covering the typical challenges in modern real-time big data platforms and offering insights on how to address them.
9:00am–12:30pm Tuesday, March 6, 2018
Mo Patel (Independent), Neejole Patel (Virginia Tech)
Since its arrival in early 2017, PyTorch has won over many deep learning researchers and developers due to its dynamic computation framework. Mo Patel and Neejole Patel walk you through using PyTorch to build a content recommendation model.
9:00am–5:00pm Tuesday, March 6, 2018
Location: LL20 B
David Boyle (MasterClass), Violeta Hennessey (Warner Bros.), April Chen (Civis Analytics), Sridhar Alla (Comcast), Noah Gift (UC Davis), Blake Irvine (Netflix), Kevin Lyons (Nielsen Marketing Cloud), Jennifer Webb (SuprFanz), Rizwan Patel (Caesars Entertainment), Anthony Accardo (Disney), Amanda Gerdes (Blizzard Entertainment)
Hear from innovators in ad tech, measurement, automation, and audience engagement about where the media industry is today and where it's likely to go next.
1:30pm–5:00pm Tuesday, March 6, 2018
Data engineering and architecture
Location: 210 B/F
Level: Intermediate
Ted Malaska (Blizzard Entertainment)
If your data has a time component, you need to think in terms of time series datasets. Ted Malaska explores time series in all its forms, from tumbling windows to sessionization, in batch or in streaming. You'll gain exposure to the tools and background you need to be successful in the world of time-oriented data.
11:00am–11:40am Wednesday, March 7, 2018
Shivnath Babu (Duke University | Unravel Data Systems), Sumit Jindal (Unravel Data Systems)
Getting the best performance, predictability, and reliability for Kafka-based applications is a complex art. Shivnath Babu and Sumit Jindal explain how to simplify the process by leveraging recent advances in machine learning and AI and outline a methodology for applying statistical learning to the rich and diverse monitoring data that is available from Kafka.
11:50am–12:30pm Wednesday, March 7, 2018
William Chambers (Databricks), Michael Armbrust (Databricks)
William Chambers and Michael Armbrust discuss the motivation and basics of Apache Spark's Structured Streaming processing engine and share lessons they've learned running hundreds of Structured Streaming workloads in the cloud.
1:50pm–2:30pm Wednesday, March 7, 2018
Alexandra Gunderson (Arundo Analytics)
Heavy industries, such as oil and gas, have tremendous amounts of data from which predictive models could be built, but it takes weeks or even months to create a comprehensive dataset from all of the various data sources. Alexandra Gunderson details the methodology behind an industry-tested approach that incorporates machine learning to structure and link data from different sources.
1:50pm–2:30pm Wednesday, March 7, 2018
Data science and machine learning
Location: LL20 C
Level: Advanced
Andrew Ray (Sam’s Club Technology)
Andrew Ray offers a brief introduction to the distributed graph algorithm abstractions provided by Pregel, PowerGraph, and GraphX, drawing on real-world examples, and provides historical context for how these three abstractions evolved from one to the next.
1:50pm–2:30pm Wednesday, March 7, 2018
Data science and machine learning
Location: LL21 B
Level: Intermediate
Chanchal Chatterjee (Google Cloud Platform)
Chanchal Chatterjee reveals how Wells Fargo was able to productionize credit risk analytics by leveraging LSTM-TensorSpark. Through a unique algorithm and process for model interpretations, Wells Fargo is now achieving an unprecedented 90%+ accuracy rate with its credit risk analysis.
2:40pm–3:20pm Wednesday, March 7, 2018
Yu Xu (TigerGraph)
Graph databases are the fastest-growing category in data management. However, most graph queries only traverse two hops in big graphs due to limitations in most graph databases. Real-world applications require deep link analytics that traverse far more than three hops. Yu Xu offers an overview of a fraud detection system that manages 100 billion graph elements to detect risk and fraudulent groups.
2:40pm–3:20pm Wednesday, March 7, 2018
Baron Schwartz (VividCortex)
Anomaly detection is white-hot in the monitoring industry, but many don't really understand or care about it, while others repeat the same pattern many times. Why? And what can we do about it? Baron Schwartz explains how he arrived at a "post-anomaly detection" point of view.
4:20pm–5:00pm Wednesday, March 7, 2018
Vlad A Ionescu (ShiftLeft), Fabian Yamaguchi (ShiftLeft)
Vlad Ionescu and Fabian Yamaguchi outline the Code Property Graph (CPG), a unique approach that represents the functional elements of code as an interconnected graph of data and control flows. This representation lets semantic information about code be stored scalably on distributed graph databases over the web while still being rapidly accessible.
4:20pm–5:00pm Wednesday, March 7, 2018
Sijie Guo (Streamlio)
Apache BookKeeper, a scalable, fault-tolerant, and low-latency storage service optimized for real-time workloads, has been widely adopted by enterprises like Twitter, Yahoo, and Salesforce to store and serve mission-critical data. Sijie Guo explains how Apache BookKeeper satisfies the needs of stream storage.
4:20pm–5:00pm Wednesday, March 7, 2018
Data science and machine learning
Location: LL20 A
Level: Intermediate
Joseph Richards (GE Digital)
Deploying ML software applications for use cases in the industrial internet presents a unique set of challenges. Data-driven problems require approaches that are highly accurate, robust, fast, scalable, and fault tolerant. Joseph Richards shares GE's approach to building production-grade ML applications and explores work across GE in industries such as power, aviation, and oil and gas.
4:20pm–5:00pm Wednesday, March 7, 2018
Data science and machine learning
Location: LL21 B
Level: Intermediate
Andrea Pasqua (Uber), Anny Chen (Uber)
Time series forecasting and anomaly detection are of utmost importance at Uber. However, the scale of the problem, the need for speed, and the importance of accuracy make anomaly detection a challenging data science problem. Andrea Pasqua and Anny Chen explain how the use of recurrent neural networks is allowing Uber to meet this challenge.
5:10pm–5:50pm Wednesday, March 7, 2018
Roger Barga (Amazon Web Services), Nina Mishra (Amazon Web Services), Sudipto Guha (Amazon Web Services), Ryan Nienhuis (Amazon Web Services)
Roger Barga, Nina Mishra, Sudipto Guha, and Ryan Nienhuis detail continuous machine learning algorithms that discover useful information in streaming data. They focus on explainable machine learning, including anomaly detection with attribution, the ability to reduce false positives through user feedback, and the detection of anomalies in directed graphs.
11:00am–11:40am Thursday, March 8, 2018
Michael Schrenk (Self-Employed)
Big data becomes much more powerful when it has context. Fortunately, creative data scientists can create the needed context through the use of metadata. Michael Schrenk explains how metadata is created and used to gain competitive advantages, predict troop strength, or even guess Social Security numbers.
11:00am–11:40am Thursday, March 8, 2018
Ryan Boyd (Neo4j)
Ryan Boyd explains how he and his team reconstructed a subset of the Twitter network of Russian troll accounts and applied graph analytics to the data using the Neo4j graph database to uncover how these accounts were spreading fake news.
11:50am–12:30pm Thursday, March 8, 2018
Ram Shankar Siva Kumar (Microsoft (Azure Security Data Science))
How should you best debug a security data science system: change the ML approach, redefine the security scenario, or start over from scratch? Ram Shankar answers this question by sharing the results of failed experiments and the lessons learned when building ML detections for cloud lateral movement, identifying anomalous executables, and automating the incident response process.
11:50am–12:30pm Thursday, March 8, 2018
Data engineering and architecture
Location: 230 C
Level: Intermediate
Alexis Roos (Salesforce), Noah Burbank (Salesforce)
In the customer age, being able to extract relevant communications information in real time and cross-reference it with context is key. Alexis Roos and Noah Burbank explain how Salesforce uses data science and engineering to enable salespeople to monitor their emails in real time and surface insights and recommendations, using a graph that models contextual data.
1:50pm–2:30pm Thursday, March 8, 2018
Data engineering and architecture
Location: 230 A
Level: Intermediate
Michael Freedman (TimescaleDB | Princeton)
Michael Freedman offers an overview of TimescaleDB, a new scale-out database designed for time series workloads, open-sourced and engineered as a plugin to Postgres. Unlike most time series newcomers, TimescaleDB supports full SQL while achieving fast ingest and complex queries.
1:50pm–2:30pm Thursday, March 8, 2018
Roy Ben-Alta (Amazon Web Services), Ira Cohen (Anodot)
Many domains, such as mobile, web, the IoT, and ecommerce, have turned to analyzing streaming data. However, this presents challenges both in transforming the raw data into metrics and in automatically analyzing those metrics to produce insights. Roy Ben-Alta and Ira Cohen share a solution implemented using Amazon Kinesis as the real-time pipeline feeding Anodot's anomaly detection solution.
2:40pm–3:20pm Thursday, March 8, 2018
Fabian Hueske (data Artisans), Flavio Junqueira (Dell EMC)
Flavio Junqueira and Fabian Hueske detail an open source streaming data stack consisting of Pravega (stream storage) and Apache Flink (computation on streams). The stack offers an unprecedented way of handling "everything as a stream": it provides unbounded streaming storage and a unified batch and streaming abstraction, and it dynamically accommodates workload variations in a novel way.