Schedule: Data Integration and Data Processing sessions: Data science + business analytics training: Strata Data Conference

9:00am–5:00pm Monday, September 23 & Tuesday, September 24, 2019

Location: 1E 06

Jesse Anderson (Big Data Institute)

Jesse Anderson offers you an in-depth look at Apache Kafka. You'll learn how Kafka works and how to create real-time systems with it, as well as how to create producers and consumers. Jesse then walks you through Kafka’s ecosystem, demonstrating how to use tools like Kafka Streams, Kafka Connect, and KSQL.

9:00am–5:00pm Monday, September 23 & Tuesday, September 24, 2019

Location: 1A 17

Building a serverless big data application on AWS

Data Engineering and Architecture

Jorge Lopez (Amazon Web Services), Radhika Ravirala (Amazon Web Services), Nikki Rouda (Amazon Web Services), Jesse Gebhardt (Amazon Web Services), Rajeev Chakrabarti (Amazon Web Services)

Serverless technologies let you build and scale applications and services rapidly without the need to provision or manage servers. Join the AWS team to learn how to incorporate serverless concepts into your big data architectures. You'll explore design patterns to ingest, store, and analyze your data as you build a big data application using AWS technologies such as S3, Athena, Kinesis, and more.

9:00am–12:30pm Tuesday, September 24, 2019

Location: 1E 10

Real-time SQL stream processing at scale with Apache Kafka and KSQL

Data Engineering and Architecture

Viktor Gamov (Confluent)

Building stream processing applications is certainly one of the hot topics in the IT community. But if you've ever thought you needed to be a programmer to do stream processing and build stream processing data pipelines, think again. Viktor Gamov explores KSQL, the stream processing query engine built on top of Apache Kafka.

11:20am–12:00pm Wednesday, September 25, 2019

Location: 1A 15/16

Building a multitenant data processing and model inferencing platform with Kafka Streams

Data Engineering and Architecture

Navinder Pal Singh Brar (Walmart Labs)

Each week, 275 million people shop at Walmart, generating interaction and transaction data. Navinder Pal Singh Brar explains how the customer backbone team enables extraction, transformation, and storage of customer data to be served to other teams. Processing 5 billion events per day from various channels, the Kafka Streams cluster maintains a uniform identity for each customer.

1:15pm–1:55pm Wednesday, September 25, 2019

Location: 1E 07/08

Your easy move to serverless computing and radically simplified data processing

Data Engineering and Architecture

Gil Vernik (IBM)

Most analytic flows can benefit from serverless, from simple cases to complex data preparations for AI frameworks like TensorFlow. To address the challenge of integrating serverless easily without major disruptions to your system, Gil Vernik explores the “push to the cloud” experience, which dramatically simplifies serverless for big data processing frameworks.

2:55pm–3:35pm Wednesday, September 25, 2019

Location: 1E 07/08

Time travel for data pipelines: Solving the mystery of what changed

Data Engineering and Architecture

Shradha Ambekar (Intuit), Sunil Goplani (Intuit), Sandeep Uttamchandani (Intuit)

A business insight shows a sudden spike. It can take hours, or days, to debug data pipelines to find the root cause. Shradha Ambekar, Sunil Goplani, and Sandeep Uttamchandani outline how Intuit built a self-service tool that automatically discovers data pipeline lineage and tracks every change. The tool helps debug issues in minutes, establishing trust in data while improving developer productivity.

4:35pm–5:15pm Wednesday, September 25, 2019

Location: 1A 06/07

Deep learning on mobile

Data Science, Machine Learning, & AI

Anirudh Koul (Microsoft), Meher Kasam (Square)

Over the last few years, convolutional neural networks (CNNs) have risen in popularity, especially in the area of computer vision. Anirudh Koul and Meher Kasam take you through how you can get deep neural nets to run efficiently on mobile devices.

4:35pm–5:15pm Wednesday, September 25, 2019

Location: 1A 15/16

Trill: The crown jewel of Microsoft’s streaming pipeline explained

Data Engineering and Architecture, Streaming and IoT

James Terwilliger (Microsoft Corporation), Badrish Chandramouli (Microsoft Research), Jonathan Goldstein (Microsoft Research)

Trill has been open-sourced, making the streaming engine behind services like the Bing Ads platform available for all to use and extend. James Terwilliger, Badrish Chandramouli, and Jonathan Goldstein dive into the history of and insights from streaming data at Microsoft. They demonstrate how Trill’s API can power complex application logic and showcase the performance that gives the engine its name.

11:20am–12:00pm Thursday, September 26, 2019

Location: 1E 07/08

Using Spark for crunching astronomical data on the LSST scale

Data Engineering and Architecture

Petar Zecevic (SV Group)

The Large Synoptic Survey Telescope (LSST) is one of the most important future surveys. Its unique design allows it to cover large regions of the sky and obtain images of the faintest objects. After 10 years of operation, it will produce about 80 PB of image and catalog data. Petar Zecevic explains AXS (Astronomy eXtensions for Spark), a system built for fast processing and cross-matching of survey catalog data.

2:05pm–2:45pm Thursday, September 26, 2019

Location: 1E 07/08

Fuzzy matching and deduplicating data: Techniques for advanced data prep

Data Engineering and Architecture

Nikki Rouda (Amazon Web Services), Janisha Anand (Amazon Web Services)

Nikki Rouda and Janisha Anand demonstrate how to deduplicate or link records in a dataset, even when the records don’t have a common unique identifier and no fields match exactly. You'll also learn how to link customer records across different databases, match external product lists against your own catalog, and solve tough challenges to prepare and cleanse data for analysis.

2:05pm–2:45pm Thursday, September 26, 2019

Location: 1A 23/24

Creating an extensible 100+ PB real-time big data platform by unifying storage and serving

Data Engineering and Architecture

Reza Shiftehfar (Uber)

Building a reliable big data platform is extremely challenging when it has to store and serve hundreds of petabytes of data in real time. Reza Shiftehfar reflects on the challenges faced and proposes architectural solutions to scale a big data platform to ingest, store, and serve 100+ PB of data with minute-level latency while efficiently utilizing the hardware and meeting security needs.

2:05pm–2:45pm Thursday, September 26, 2019

Location: 1A 15/16

Posttransaction processing using Apache Pulsar at Narvar

Data Engineering and Architecture, Streaming and IoT

Davor Bonaci (Kaskada), Anand Madhavan (Narvar)

Narvar provides a next-generation posttransaction experience for over 500 retailers. Davor Bonaci and Anand Madhavan take you on the journey of how Narvar moved away from a slew of technologies for its platform and consolidated its use cases on Apache Pulsar.

3:45pm–4:25pm Thursday, September 26, 2019

Location: 1A 15/16

SK Telecom's 5G network monitoring and 3D visualization on streaming technologies

Data Engineering and Architecture, Streaming and IoT

Jonghyok Lee (SK Telecom), Chon Yong Lee (SK Telecom)

Jonghyok Lee and Chon Yong Lee discuss T-CORE, SK Telecom’s monitoring and service analytics platform, which collects system and application data from several thousand servers and applications and provides a 3D visualization of the real-time status of the whole network. Join in to hear lessons learned during development.