Presented By O’Reilly and Cloudera
Make Data Work
March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

Schedule: Graphs and Time-series sessions

These two fundamental data types were part of the rise of big data. Many common and important use cases lend themselves to graph analytics or time-series analysis. We want to showcase the latest generation of tools and methods for cleaning, preparing, storing, and analyzing graphs and time-series. Improvements in both software and hardware are leading to new solutions for analysts, data scientists, and engineers.

9:00am–12:30pm Tuesday, March 6, 2018
Karthik Ramasamy (Streamlio), Sanjeev Kulkarni (Streamlio), Sijie Guo (StreamNative), Arun Kejariwal (Independent)
Average rating: *****
(5.00, 2 ratings)
Across diverse segments in industry, there has been a shift in focus from big data to fast data. Karthik Ramasamy, Sanjeev Kulkarni, Arun Kejariwal, and Sijie Guo walk you through state-of-the-art streaming architectures, streaming frameworks, and streaming algorithms, covering the typical challenges in modern real-time big data platforms and offering insights on how to address them.
9:00am–12:30pm Tuesday, March 6, 2018
Mo Patel (Independent), Neejole Patel (Virginia Tech)
Average rating: **...
(2.50, 4 ratings)
Since its arrival in early 2017, PyTorch has won over many deep learning researchers and developers due to its dynamic computation framework. Mo Patel and Neejole Patel walk you through using PyTorch to build a content recommendation model.
9:00am–5:00pm Tuesday, March 6, 2018
Location: LL20 B
David Boyle (Audience Strategies), Violeta Hennessey (Warner Bros.), April Chen (Civis Analytics), Sridhar Alla (BlueWhale), Noah Gift (UC Davis), Blake Irvine (Netflix), Kevin Lyons (Nielsen Marketing Cloud), Jennifer Webb (SuprFanz), Rizwan Patel (Caesars Entertainment), Anthony Accardo (Disney), Amanda Gerdes (Blizzard Entertainment), Aneesh Karve (Quilt), Pete Skomoroch (Workday)
Hear from innovators in ad tech, measurement, automation, and audience engagement about where the media industry is today and where it's likely to go next.
1:30pm–5:00pm Tuesday, March 6, 2018
Ted Malaska (Capital One)
Average rating: **...
(2.80, 5 ratings)
If you have data that has a time factor to it, then you need to think in terms of time series datasets. Ted Malaska explores time series in all of its forms, from tumbling windows to sessionization in batch or in streaming. You'll gain exposure to the tools and background you need to be successful in the world of time-oriented data.
11:00am–11:40am Wednesday, March 7, 2018
Shivnath Babu (Duke University | Unravel Data Systems), Dhruv Goel (Microsoft)
Average rating: ****.
(4.50, 2 ratings)
Getting the best performance, predictability, and reliability for Kafka-based applications is a complex art. Shivnath Babu and Dhruv Goel explain how to simplify the process by leveraging recent advances in machine learning and AI and outline a methodology for applying statistical learning to the rich and diverse monitoring data that is available from Kafka.
11:50am–12:30pm Wednesday, March 7, 2018
Bill Chambers (Databricks), Michael Armbrust (Databricks)
Average rating: ****.
(4.60, 5 ratings)
William Chambers and Michael Armbrust discuss the motivation and basics of Apache Spark's Structured Streaming processing engine and share lessons they've learned running hundreds of Structured Streaming workloads in the cloud.
1:50pm–2:30pm Wednesday, March 7, 2018
Alexandra Gunderson (Arundo Analytics)
Average rating: *****
(5.00, 1 rating)
Heavy industries, such as oil and gas, have tremendous amounts of data from which predictive models could be built, but it takes weeks or even months to create a comprehensive dataset from all of the various data sources. Alexandra Gunderson details the methodology behind an industry-tested approach that incorporates machine learning to structure and link data from different sources.
1:50pm–2:30pm Wednesday, March 7, 2018
Andrew Ray (Sam’s Club Technology)
Average rating: ***..
(3.00, 3 ratings)
Andrew Ray offers a brief introduction to the distributed graph algorithm abstractions provided by Pregel, PowerGraph, and GraphX, drawing on real-world examples, and provides historical context for the evolution between these three abstractions.
1:50pm–2:30pm Wednesday, March 7, 2018
Kyle Grove (Teradata)
Average rating: *****
(5.00, 5 ratings)
Kyle Grove explains how Teradata and some of the world’s largest financial institutions are innovating credit risk ranking with deep learning techniques and AnalyticOps. With the AnalyticOps framework, these organizations have built models with increased accuracy to drive more profitable lending decisions while being explainable to regulators.
2:40pm–3:20pm Wednesday, March 7, 2018
Yu Xu (TigerGraph)
Average rating: *****
(5.00, 2 ratings)
Graph databases are the fastest-growing category in data management. However, most graph queries only traverse two hops in big graphs due to limitations in most graph databases. Real-world applications require deep link analytics that traverse far more than three hops. Yu Xu offers an overview of a fraud detection system that manages 100 billion graph elements to detect risk and fraudulent groups.
2:40pm–3:20pm Wednesday, March 7, 2018
Baron Schwartz (VividCortex)
Average rating: ****.
(4.80, 5 ratings)
Anomaly detection is white hot in the monitoring industry, but many don't really understand or care about it, while others repeat the same pattern many times. Why? And what can we do about it? Baron Schwartz explains how he arrived at a "post-anomaly detection" point of view.
4:20pm–5:00pm Wednesday, March 7, 2018
Vlad A Ionescu (ShiftLeft), Fabian Yamaguchi (ShiftLeft)
Average rating: ****.
(4.00, 1 rating)
Vlad Ionescu and Fabian Yamaguchi outline Code Property Graph (CPG), a unique approach that allows the functional elements of code to be represented in an interconnected graph of data and control flows. This enables semantic information about code to be stored scalably on distributed graph databases over the web while remaining rapidly accessible.
4:20pm–5:00pm Wednesday, March 7, 2018
Sijie Guo (StreamNative)
Average rating: ***..
(3.67, 3 ratings)
Apache BookKeeper, a scalable, fault-tolerant, and low-latency storage service optimized for real-time workloads, has been widely adopted by enterprises like Twitter, Yahoo, and Salesforce to store and serve mission-critical data. Sijie Guo explains how Apache BookKeeper satisfies the needs of stream storage.
4:20pm–5:00pm Wednesday, March 7, 2018
Joseph Richards (GE Digital)
Average rating: *****
(5.00, 1 rating)
Deploying ML software applications for use cases in the industrial internet presents a unique set of challenges. Data-driven problems require approaches that are highly accurate, robust, fast, scalable, and fault tolerant. Joseph Richards shares GE's approach to building production-grade ML applications and explores work across GE in industries such as power, aviation, and oil and gas.
4:20pm–5:00pm Wednesday, March 7, 2018
Andrea Pasqua (Uber), Anny Chen (Uber)
Average rating: ****.
(4.60, 5 ratings)
Time series forecasting and anomaly detection are of utmost importance at Uber. However, the scale of the problem, the need for speed, and the importance of accuracy make anomaly detection a challenging data science problem. Andrea Pasqua and Anny Chen explain how the use of recurrent neural networks is allowing Uber to meet this challenge.
5:10pm–5:50pm Wednesday, March 7, 2018
Roger Barga (Amazon Web Services), Nina Mishra (Amazon Web Services), Sudipto Guha (Amazon Web Services), Ryan Nienhuis (Amazon Web Services)
Average rating: *****
(5.00, 8 ratings)
Roger Barga, Nina Mishra, Sudipto Guha, and Ryan Nienhuis detail continuous machine learning algorithms that discover useful information in streaming data. They focus on explainable machine learning, including anomaly detection with attribution, the ability to reduce false positives through user feedback, and the detection of anomalies in directed graphs.
11:00am–11:40am Thursday, March 8, 2018
Michael Schrenk (Self-Employed)
Average rating: ****.
(4.00, 5 ratings)
Big data becomes much more powerful when it has context. Fortunately, creative data scientists can create needed context through the use of metadata. Michael Schrenk explains how metadata is created and used to gain competitive advantages, predict troop strength, or even guess Social Security numbers.
11:00am–11:40am Thursday, March 8, 2018
Ryan Boyd (Neo4j)
Average rating: *****
(5.00, 1 rating)
Ryan Boyd explains how he and his team reconstructed a subset of the Twitter network of Russian troll accounts and applied graph analytics to the data using the Neo4j graph database to uncover how these accounts were spreading fake news.
11:50am–12:30pm Thursday, March 8, 2018
Ram Shankar Siva Kumar (Microsoft (Azure Security Data Science))
Average rating: ****.
(4.00, 1 rating)
How should you best debug a security data science system: change the ML approach, redefine the security scenario, or start over from scratch? Ram Shankar answers this question by sharing the results of failed experiments and the lessons learned when building ML detections for cloud lateral movement, identifying anomalous executables, and automating the incident response process.
11:50am–12:30pm Thursday, March 8, 2018
Alexis Roos (Salesforce), Noah Burbank (Salesforce)
Average rating: ***..
(3.00, 1 rating)
In the customer age, being able to extract relevant communications information in real time and cross-reference it with context is key. Alexis Roos and Noah Burbank explain how Salesforce uses data science and engineering to enable salespeople to monitor their emails in real time and surface insights and recommendations using a graph that models contextual data.
1:50pm–2:30pm Thursday, March 8, 2018
Michael Freedman (TimescaleDB)
Average rating: ****.
(4.50, 4 ratings)
Michael Freedman offers an overview of TimescaleDB, a new open source scale-out database designed for time series workloads and engineered as a plugin to Postgres. Unlike most time series newcomers, TimescaleDB supports full SQL while achieving fast ingest and complex queries.
1:50pm–2:30pm Thursday, March 8, 2018
Roy Ben-Alta (Amazon Web Services), Ira Cohen (Anodot)
Average rating: *****
(5.00, 1 rating)
Many domains, such as mobile, web, the IoT, and ecommerce, have turned to analyzing streaming data. However, this presents challenges both in transforming the raw data into metrics and in automatically analyzing the metrics to produce insights. Roy Ben-Alta and Ira Cohen share a solution implemented using Amazon Kinesis as the real-time pipeline feeding Anodot's anomaly detection solution.
2:40pm–3:20pm Thursday, March 8, 2018
Fabian Hueske (data Artisans), Flavio Junqueira (Dell EMC)
Average rating: ***..
(3.33, 3 ratings)
Flavio Junqueira and Fabian Hueske detail an open source streaming data stack consisting of Pravega (stream storage) and Apache Flink (computation on streams). The stack offers an unprecedented way of handling “everything as a stream”: it includes unbounded streaming storage and a unified batch and streaming abstraction, and it dynamically accommodates workload variations in a novel way.