Presented By
O’Reilly + Cloudera

Make Data Work

29 April–2 May 2019
London, UK

Schedule: Temporal data and time-series sessions

13:30–17:00 Tuesday, 30 April 2019

Architecture and algorithms for end-to-end streaming data processing

Data Engineering and Architecture, Streaming and IoT
Location: S11 A

Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio), Ivan Kelly (Streamlio)

Average rating:

(3.00, 10 ratings)

Many industry segments have been grappling with fast data (high-volume, high-velocity data). Arun Kejariwal and Karthik Ramasamy walk you through the state-of-the-art systems for each stage of an end-to-end data processing pipeline (messaging, compute, and storage) for real-time data, along with algorithms to extract insights (e.g., heavy hitters and quantiles) from data streams.

13:30–17:00 Tuesday, 30 April 2019

Time series forecasting with Azure Machine Learning

Data Science, Machine Learning & AI
Location: Capital Suite 2/3

Francesca Lazzeri (Microsoft), Aashish Bhateja (Microsoft)

Average rating:

(4.25, 4 ratings)

Time series modeling and forecasting is fundamentally important to various practical domains; in the past few decades, machine learning model-based forecasting has become very popular in both private and public decision-making processes. Francesca Lazzeri walks you through using Azure Machine Learning to build and deploy your time series forecasting models.

11:15–11:55 Wednesday, 1 May 2019

Predicting real-time transaction fraud using supervised learning

Data Science, Machine Learning & AI
Location: Capital Suite 17

Sami Niemi (Barclays)

Average rating:

(4.62, 16 ratings)

Predicting transaction fraud of debit and credit card payments in real time is an important challenge, which state-of-the-art supervised machine learning models can help to solve. Sami Niemi offers an overview of the solutions Barclays has been developing and testing and details how well the models perform in a variety of situations, such as card-present and card-not-present debit and credit card transactions.

12:05–12:45 Wednesday, 1 May 2019

Sequence-to-sequence modeling for time series

Data Science, Machine Learning & AI
Location: Capital Suite 17

Arun Kejariwal (Independent), Ira Cohen (Anodot)

Average rating:

(4.00, 5 ratings)

Sequence-to-sequence modeling (seq2seq) is now being used for applications based on time series data. Arun Kejariwal and Ira Cohen offer an overview of seq2seq and explore its early use cases. They then walk you through leveraging seq2seq modeling for these use cases, particularly with regard to real-time anomaly detection and forecasting.

14:05–14:45 Wednesday, 1 May 2019

Building the data infrastructure for the internet of things at zettabyte scale

Data Engineering and Architecture
Location: Capital Suite 8/9

Jian Chang (Alibaba Group), Sanjian Chen (Alibaba Group)

Average rating:

(3.33, 3 ratings)

Jian Chang and Sanjian Chen share the architecture design and many detailed technology innovations of Alibaba TSDB, a state-of-the-art database for IoT data management, and discuss lessons learned from years of development and continuous improvement.

14:05–14:45 Wednesday, 1 May 2019

Using machine learning for stock picking

Data Science, Machine Learning & AI
Location: Capital Suite 15/16

Alun Biffin (Van Lanschot Kempen), David Dogon (Van Lanschot Kempen)

Average rating:

(4.45, 11 ratings)

Alun Biffin and David Dogon explain how machine learning revolutionized the stock-picking process for portfolio managers at Kempen Capital Management by filtering the vast small-cap investment universe down to a handful of optimal stocks.

16:35–17:15 Wednesday, 1 May 2019

LSTM-based time series anomaly detection using Analytics Zoo for Spark and BigDL

Data Science, Machine Learning & AI
Location: Capital Suite 17

Guoqiong Song (Intel)

Average rating:

(3.40, 5 ratings)

Collecting and processing massive time series data (e.g., logs and sensor readings) and detecting anomalies in real time is critical for many emerging smart systems in industrial, manufacturing, AIOps, and IoT settings. Guoqiong Song explains how to detect anomalies in time series data using Analytics Zoo and BigDL at scale on a standard Spark cluster.

16:35–17:15 Wednesday, 1 May 2019

A Magic 8 Ball for optimal cost and resource allocation for the big data stack

Data Science, Machine Learning & AI
Location: Capital Suite 15/16

Shivnath Babu (Unravel Data Systems | Duke University), Alkis Simitsis (Micro Focus)

Average rating:

(5.00, 1 rating)

Cost and resource provisioning are critical components of the big data stack. Shivnath Babu and Alkis Simitsis detail how to build a Magic 8 Ball for the big data stack: a decomposable time series model for optimal cost and resource allocation that offers enterprises a glimpse into their future needs and enables effective and cost-efficient project and operational planning.

14:05–14:45 Thursday, 2 May 2019

Reinforcement learning: A gentle introduction and an industrial application

Data Science, Machine Learning & AI
Location: Capital Suite 15/16

Christian Hidber (bSquare)

Average rating:

(4.86, 7 ratings)

Reinforcement learning (RL) autonomously learns complex processes such as walking, beating the world champion at Go, or flying a helicopter. No big datasets with the “right” answers are needed: the algorithms learn by experimenting. Christian Hidber shows how and why RL works and demonstrates how to apply it to an industrial hydraulics application with 7,000 clients in 42 countries.

14:55–15:35 Thursday, 2 May 2019

Performant time series data management and analytics with PostgreSQL

Data Engineering and Architecture, Expo Hall
Location: Expo Hall 2 (Capital Hall N24)

Michael Freedman (TimescaleDB | Princeton University)

Average rating:

(4.75, 4 ratings)

Time series databases require ingesting high volumes of structured data, answering complex queries with good performance over recent and historical time intervals, and performing specialized time-centric analysis and data management. Michael Freedman explains how to avoid these operational problems by reengineering Postgres to serve as a general data platform, including high-volume time series workloads.

14:55–15:35 Thursday, 2 May 2019

Early incident detection using fusion analytics of commuter-centric data sources

Data Science, Machine Learning & AI
Location: Capital Suite 15/16

Christopher Hooi (Land Transport Authority of Singapore)

Average rating:

(5.00, 3 ratings)

Christopher Hooi offers an overview of the Fusion Analytics for Public Transport Event Response (FASTER) system, a real-time advanced analytics solution for early warning of potential train incidents. FASTER uses engineering and commuter-centric IoT data sources to activate contingency plans at the earliest possible time and reduce the impact on commuters.


