Sep 23–26, 2019

Schedule: Data Integration and Data Processing sessions

Add to your personal schedule
9:00am - 5:00pm Monday, September 23 & Tuesday, September 24
Location: 1E 06
Jesse Anderson (Big Data Institute)
Jesse Anderson offers an in-depth look at Apache Kafka. You'll learn how Kafka works and how to create real-time systems with it as well as how to create consumers and publishers. Jesse then walks you through Kafka’s ecosystem, demonstrating how to use tools like Kafka Streams, Kafka Connect, and KSQL. Read more.
Add to your personal schedule
9:00am - 5:00pm Monday, September 23 & Tuesday, September 24
Location: 1A 17
Jorge Lopez (Amazon Web Services)
Serverless technologies let you build and scale applications and services rapidly without the need to provision or manage servers. Join in to learn how to incorporate serverless concepts into your big data architectures. You'll explore design patterns to ingest, store, and analyze your data as you build a big data application using AWS technologies such as S3, Athena, Kinesis, and more. Read more.
Add to your personal schedule
9:00am12:30pm Tuesday, September 24, 2019
Location: 1E 10
Ricardo Ferreira (Confluent)
Building stream processing applications are certainly one of the hot topics among the IT community. Though a lot has been talked about this subject, one might say that building stream processing applications is the new sex during teenage. This tutorial aims to change this by introducing KSQL, the stream processing query engine built on top of Apache Kafka. Read more.
Add to your personal schedule
11:20am12:00pm Wednesday, September 25, 2019
Location: 1A 15/16
Navinder Pal Singh Brar (Walmart Labs)
Each week 275 million people shop at Walmart, generating multi-terabytes of interaction and transaction data. In Customer Backbone team, we enable extraction, transforming and storing of customer data to be served to teams such as Ads and Personalisation. At 5 Billion events/day our Kafka Streams cluster processes events from various channels and maintains a uniform identity of a customer. Read more.
Add to your personal schedule
1:15pm1:55pm Wednesday, September 25, 2019
Location: 1E 07/08
Gil Vernik (IBM)
Most analytic flows can benefit from the serverless, starting with simple cases to complex data preparations for AI frameworks, like TensorFlow. To address the challenge of how to easily integrate serverless, without major disruptions to your system, we present “push to the cloud” experience. This ability dramatically simplifies using serverless for different big data processing frameworks. Read more.
Add to your personal schedule
2:55pm3:35pm Wednesday, September 25, 2019
Location: 1E 07/08
Shradha Ambekar (Intuit), Sunil Goplani (Intuit), Sandeep Uttamchandani (Intuit)
Imagine a business insight showing a sudden spike.Debugging data pipelines is non-trivial and finding the root cause can take hours or even days! We’ll share how Intuit built a self-serve tool that automatically discovers data pipeline lineage and tracks every change that impacts pipeline.This helps debug pipeline issues in minutes–establishing trust in data while improving developer productivity. Read more.
Add to your personal schedule
4:35pm5:15pm Wednesday, September 25, 2019
Location: 1A 06/07
Siddha Ganju (NVIDIA), Meher Kasam (Square)
Optimizing deep neural nets to run efficiently on mobile devices. Read more.
Add to your personal schedule
4:35pm5:15pm Wednesday, September 25, 2019
Location: 1A 15/16
James Terwilliger (Microsoft Corporation), Badrish Chandramouli (Microsoft Research), Jonathan Goldstein (Microsoft Research)
Trill has been open-sourced, making the streaming engine behind services like the multi-billion-dollar Bing Ads platform available for all to use and extend. We give a brief history of streaming data at Microsoft and lessons learned. We then demonstrate how its API can power complex application logic, and the performance that gives the engine its name: a trillion events per day per node. Read more.
Add to your personal schedule
11:20am12:00pm Thursday, September 26, 2019
Location: 1E 07/08
Petar Zecevic (SV Group d.o.o.)
Large Scale Survey Telescope, or LSST, is one of the most important future surveys. Its unique design will allow it to cover large regions of the sky and obtain images of the faintest objects. In 10 years of its operation it will produce about 80 PB of data, both in images and catalog data. I will present AXS, a system we built for fast processing and cross-matching of survey catalog data. Read more.
Add to your personal schedule
2:05pm2:45pm Thursday, September 26, 2019
Location: 1E 07/08
Nikki Rouda (Amazon Web Services), Roy Hasson (Amazon Web Services)
Learn how to deduplicate or link records in a dataset, even when the records don’t have a common unique identifier and no fields match exactly. Link customer records across different databases (e.g. different name spelling or address.) Match external product lists against your own catalog, such as lists of hazardous goods. Solve tough challenges to prepare and cleanse data for analysis. Read more.
Add to your personal schedule
2:05pm2:45pm Thursday, September 26, 2019
Location: 1A 23/24
Reza Shiftehfar (Uber Technologies)
Building a reliable Big Data platform is extremely challenging when it has to store and serve 100s of PetaBytes of data in a real-time fashion . This talk reflects on the challenges faced and proposes architectural solutions to scale a Big Data Platform to ingest, store, and serve 100+ PB of data with minute level latency while efficiently utilizing the hardware and meeting the security needs. Read more.
Add to your personal schedule
2:05pm2:45pm Thursday, September 26, 2019
Location: 1A 15/16
Karthik Ramasamy (Streamlio), Anand Madhavan (Narvar)
Narvar provides next generation post transaction experience for over 500+ retailers. This talk explores the journey of how Narvar moving away from using a slew of technologies for their platform and consolidating their use cases using Apache Pulsar. Read more.
Add to your personal schedule
3:45pm4:25pm Thursday, September 26, 2019
Location: 1A 15/16
Jonghyok Lee (SK Telecom), Chon Yong Lee (SK Telecom)
Architecture and lessons learned from development of T-CORE, SK Telecom’s monitoring and service analytics platform, which collects system and application data from several thousand servers and applications and provides 3D visualized real-time status of the whole network and services for the operators and analytics platform for data scientists, engineers and developers. Read more.

    Contact us

    confreg@oreilly.com

    For conference registration information and customer service

    partners@oreilly.com

    For more information on community discounts and trade opportunities with O’Reilly conferences

    strataconf@oreilly.com

    For information on exhibiting or sponsoring a conference

    Contact list

    View a complete list of Strata Data Conference contacts