Schedule: Streaming and IoT sessions: Data science + business analytics training: Strata Data Conference

9:00am–12:30pm Tuesday, September 24, 2019

Location: 1E 10

Real-time SQL stream processing at scale with Apache Kafka and KSQL

Viktor Gamov (Confluent)

Building stream processing applications is certainly one of the hot topics in the IT community. But if you've ever thought you needed to be a programmer to do stream processing and build stream processing data pipelines, think again. Viktor Gamov explores KSQL, the stream processing query engine built on top of Apache Kafka.

9:00am–12:30pm Tuesday, September 24, 2019

Location: 1E 09

Serverless streaming architectures and algorithms for the enterprise

Data Engineering and Architecture, Streaming and IoT

Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio), Anurag Khandelwal (Yale University)

Arun Kejariwal, Karthik Ramasamy, and Anurag Khandelwal walk you through the landscape of streaming systems and examine the inception and growth of the serverless paradigm. You'll take a deep dive into Apache Pulsar, which provides native serverless support in the form of Pulsar Functions, and get a bird’s-eye view of the application domains where you can leverage Pulsar Functions.

9:00am–12:30pm Tuesday, September 24, 2019

Location: 1E 11

Cloudera Edge Management in the IoT

Data Engineering and Architecture, Streaming and IoT

Purnima Reddy Kuchikulla (Cloudera), Timothy Spann (Cloudera), Abdelkrim Hadjidj (Cloudera), Andre Araujo (Cloudera), Hemanth Yamijala (Cloudera)

There are too many edge devices and agents, and you need to control and manage them. Purnima Reddy Kuchikulla, Timothy Spann, Abdelkrim Hadjidj, and Andre Araujo walk you through handling the difficulty in collecting real-time data and the trouble with updating a specific set of agents with edge applications. Get your hands dirty with CEM, which addresses these challenges with ease.

1:30pm–5:00pm Tuesday, September 24, 2019

Location: 1E 11

Sketching data and other magic tricks

Data Science, Machine Learning, & AI

Sophie Watson (Red Hat), William Benton (Red Hat)

Go hands-on with Sophie Watson and William Benton to examine data structures that let you answer interesting queries about massive datasets in fixed amounts of space and constant time. This seems like magic, but they'll explain the key trick that makes it possible and show you how to use these structures for real-world machine learning and data engineering applications.

1:30pm–5:00pm Tuesday, September 24, 2019

Location: 1E 14

Kafka and Streams Messaging Manager (SMM) crash course

Data Engineering and Architecture, Streaming and IoT

Purnima Reddy Kuchikulla (Cloudera), Dan Chaffelson (Cloudera), Attila Kanto (Cloudera), Tony Wu (Cloudera)

Kafka is omnipresent and the backbone of streaming analytics applications and data lakes. The challenge is understanding what's going on overall in the Kafka cluster, including performance, issues, and message flows. Purnima Reddy Kuchikulla and Dan Chaffelson walk you through a hands-on experience to visualize the entire Kafka environment end-to-end and simplify Kafka operations via SMM.

11:20am–12:00pm Wednesday, September 25, 2019

Location: 1A 15/16

Building a multitenant data processing and model inferencing platform with Kafka Streams

Data Engineering and Architecture

Navinder Pal Singh Brar (Walmart Labs)

Each week 275 million people shop at Walmart, generating interaction and transaction data. Navinder Pal Singh Brar explains how the customer backbone team enables extraction, transformation, and storage of customer data to be served to other teams. At 5 billion events per day, the Kafka Streams cluster processes events from various channels and maintains a uniform identity of a customer.

1:15pm–1:55pm Wednesday, September 25, 2019

Location: 1E 12/13

Turning petabytes of data from millions of vehicles into open data with Geotab

Case studies, Strata Business Summit

Felipe Hoffa (Google), Bob Bradley (Geotab)

Geotab is a world-leading asset-tracking company with millions of vehicles under service every day. Felipe Hoffa and Bob Bradley examine the challenges and solutions involved in creating an ML- and geographic information system (GIS)-enabled petabyte-scale data warehouse leveraging Google Cloud. They also dive into the process of publishing open data, how you can access it, and how cities are using it.

2:55pm–3:35pm Wednesday, September 25, 2019

Location: 1E 12/13

Enabling 5G use cases through location intelligence

Case studies, Strata Business Summit

Tim McKenzie (Pitney Bowes)

Tim McKenzie examines why planning 5G network rollout and associated services requires a good understanding of location-based data. Accurate addressing and linking consumers to property or points of interest allow data enrichment with attributes, demographics, and social data. Companies use location to organize and analyze network and customer data to understand where to target new services.

2:55pm–3:35pm Wednesday, September 25, 2019

Location: 1A 15/16

How Orange Financial combats financial fraud over 50M transactions a day using Apache Pulsar

Data Engineering and Architecture

Weisheng Xie (Orange Financial), Jia Zhai (StreamNative)

Orange Financial is a fintech company of China Telecom with half a billion registered users and 41 million monthly active users, and risk control decision deployment has been critical to its success. Weisheng Xie and Jia Zhai explore how the company leverages Apache Pulsar to boost the efficiency of its risk control decision development to combat financial fraud across more than 50 million transactions a day.

4:35pm–5:15pm Wednesday, September 25, 2019

Location: 1A 15/16

Trill: The crown jewel of Microsoft’s streaming pipeline explained

Data Engineering and Architecture, Streaming and IoT

James Terwilliger (Microsoft Corporation), Badrish Chandramouli (Microsoft Research), Jonathan Goldstein (Microsoft Research)

Trill has been open-sourced, making the streaming engine behind services like the Bing Ads platform available for all to use and extend. James Terwilliger, Badrish Chandramouli, and Jonathan Goldstein dive into the history of and insights from streaming data at Microsoft. They demonstrate how its API can power complex application logic and the performance that gives the engine its name.

5:25pm–6:05pm Wednesday, September 25, 2019

Location: 1A 15/16

Fast data with the KISSS stack

Data Engineering and Architecture

Bas Geerdink (Aizonic)

Streaming analytics (or fast data processing) is the field of making predictions based on real-time data. Bas Geerdink presents a fast data architecture that covers many use cases that follow a "pipes and filters" pattern. This architecture can be used to create enterprise-grade solutions with a diversity of technology options. The stack is Kafka, Ignite, and Spark Structured Streaming (KISSS).

11:20am–12:00pm Thursday, September 26, 2019

Location: 1A 23/24

Performant time series data management and analytics with PostgreSQL

Data Engineering and Architecture

Michael Freedman (TimescaleDB | Princeton University)

Leveraging polyglot solutions for your time series data can lead to issues including engineering complexity, operational challenges, and even referential integrity concerns. Michael Freedman explains why, by re-engineering PostgreSQL to serve as a general data platform, your high-volume time series workloads will be better streamlined, resulting in more actionable data and greater ease of use.

11:20am–12:00pm Thursday, September 26, 2019

Location: 1A 21/22

Online machine learning in streaming applications

Data Engineering and Architecture, Streaming and IoT

Stavros Kontopoulos (Lightbend), Debasish Ghosh (Lightbend)

Stavros Kontopoulos and Debasish Ghosh explore online machine learning algorithm choices for streaming applications, especially those with resource-constrained use cases like IoT and personalization. They dive into Hoeffding Adaptive Trees, classic sketch data structures, and drift detection algorithms from implementation to production deployment, describing the pros and cons of each of them.

2:05pm–2:45pm Thursday, September 26, 2019

Location: 1A 06/07

Learning asset naming patterns to find risky unmanaged devices

Data Science, Machine Learning, & AI

Ryan Foltz (Exabeam)

Unmanaged and foreign devices in corporate networks pose a security risk, and the first step toward reducing this risk is the ability to identify them. Ryan Foltz walks you through a comprehensive device management machine learning model based on deep learning that performs anomaly detection using only device names to flag devices that do not follow naming structures.

2:05pm–2:45pm Thursday, September 26, 2019

Location: 1A 03

Stream processing beyond streaming data

Data Engineering and Architecture, Streaming and IoT

Stephan Ewen (Ververica)

Stephan Ewen details how stream processing is becoming a "grand unifying paradigm" for data processing and the newest developments in Apache Flink to support this trend: new cross-batch-streaming machine learning algorithms, state-of-the-art batch performance, and new building blocks for data-driven applications and application consistency.

2:05pm–2:45pm Thursday, September 26, 2019

Location: 3B - Expo Hall

Machine learning for streaming data: Practical insights

Data Science, Machine Learning, & AI, Expo Hall

Heitor Murilo Gomes (Télécom ParisTech), Albert Bifet (Télécom ParisTech)

Heitor Murilo Gomes and Albert Bifet introduce you to a machine learning pipeline for streaming data using the streamDM framework. You'll also learn how to use streamDM for supervised and unsupervised learning tasks, see examples of online preprocessing methods, and discover how to expand the framework by adding new learning algorithms or preprocessing methods.

2:05pm–2:45pm Thursday, September 26, 2019

Location: 1A 15/16

Posttransaction processing using Apache Pulsar at Narvar

Data Engineering and Architecture, Streaming and IoT

Davor Bonaci (Kaskada), Anand Madhavan (Narvar)

Narvar provides a next-generation posttransaction experience for over 500 retailers. Karthik Ramasamy and Anand Madhavan take you on the journey of how Narvar moved away from using a slew of technologies for its platform and consolidated its use cases using Apache Pulsar.

3:45pm–4:25pm Thursday, September 26, 2019

Location: 1A 15/16

SK Telecom's 5G network monitoring and 3D visualization on streaming technologies

Data Engineering and Architecture, Streaming and IoT

Jonghyok Lee (SK Telecom), Chon Yong Lee (SK Telecom)

Jonghyok Lee and Chon Yong Lee discuss T-CORE, SK Telecom’s monitoring and service analytics platform, which collects system and application data from several thousand servers and applications and provides a 3D visualization of the real-time status of the whole network. Join in to hear lessons learned during development.

4:35pm–5:15pm Thursday, September 26, 2019

Location: 1E 10/11

Executive Briefing: What it takes to use machine learning in fast data pipelines

Executive Briefing and best practices, Strata Business Summit

Dean Wampler (Anyscale)

Dean Wampler dives into how (and why) to integrate ML into production streaming data pipelines and to serve results quickly; how to bridge data science and production environments with different tools, techniques, and requirements; how to build reliable and scalable long-running services; and how to update ML models without downtime.