Sep 23–26, 2019

Schedule: Streaming and IoT sessions

9:00am–12:30pm Tuesday, September 24, 2019
Location: 1E 10
Ricardo Ferreira (Confluent)
Building stream processing applications is certainly one of the hottest topics in the IT community. Though much has been said about the subject, one might say that stream processing is like teenage sex: everyone talks about it, but few actually do it. This tutorial aims to change that by introducing KSQL, the stream processing query engine built on top of Apache Kafka.
9:00am–12:30pm Tuesday, September 24, 2019
Location: 1E 09
Arun Kejariwal (Facebook), Karthik Ramasamy (Streamlio), Anurag Khandelwal (RISELab, UC Berkeley)
In this tutorial, we shall walk the audience through the landscape of streaming systems and give an overview of the inception and growth of the serverless paradigm. Next, we shall present a deep dive into Apache Pulsar, which provides native serverless support in the form of Pulsar Functions, and give a bird’s-eye view of the application domains where Pulsar Functions can be leveraged.
9:00am–12:30pm Tuesday, September 24, 2019
Location: 1E 11
Purnima Reddy Kuchikulla (Cloudera), Timothy Spann (Cloudera), Abdelkrim Hadjidj (Cloudera)
Too many edge devices and agents? How does one control and manage them? How do we handle the difficulty of collecting real-time data and, most importantly, the trouble of updating a specific set of agents with edge applications? Get your hands dirty with Cloudera Edge Management, which addresses these challenges with ease.
9:00am–5:00pm Tuesday, September 24, 2019
Location: 1A 08
Alistair Croll (Solve For Interesting), Jennifer Yang (Wells Fargo ECS), Nitzan Mekel-Bobrov (Capital One), Dan Barker (RSA Security), Rochelle March (Trucost), Catherine Gu (Stanford University), Moto Tohda (Tokyo Century (USA) Inc.), Mikheil Nadareishvili (TBC Bank), Jennifer Kloke (Ayasdi)
From analyzing risk and detecting fraud to predicting payments and improving customer experience, take a deep dive into the ways data technologies are transforming the financial industry.
1:30pm–5:00pm Tuesday, September 24, 2019
Location: 1E 08
Sophie Watson (Red Hat), William Benton (Red Hat)
In this hands-on workshop, we’ll introduce several data structures that let you answer interesting queries about massive data sets in fixed amounts of space and constant time. This seems like magic, but we’ll explain the key trick that makes it possible and show you how to use these structures for real-world machine learning and data engineering applications.
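The abstract does not name the specific structures the workshop covers. As one classic illustration of the "fixed space, constant time" idea, chosen by us rather than taken from the workshop, here is a minimal Count-Min sketch in Python. The trick is hashing: each item maps to one counter per row, so memory stays at width x depth counters no matter how long the stream is, and each update or query touches exactly depth counters. The class name, the SHA-256-based row hashing, and the width/depth defaults are illustrative assumptions.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter backed by a fixed width x depth table.

    Illustrative sketch only; not the workshop's own code.
    """

    def __init__(self, width: int = 1024, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        # Memory is allocated up front and never grows with the stream.
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # One hash function per row, derived by prefixing the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def add(self, item: str, count: int = 1) -> None:
        # Constant time: exactly one counter per row is incremented.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # Collisions can only inflate counters, so the minimum across rows
        # is the tightest estimate, and it never undercounts.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))


if __name__ == "__main__":
    cms = CountMinSketch(width=256, depth=4)
    for word in ["kafka", "flink", "kafka", "pulsar", "kafka"]:
        cms.add(word)
    print(cms.estimate("kafka"))  # at least 3; higher only if every row collides
```

Widening the table lowers the chance of collisions, and adding rows lowers the chance that all rows collide at once; both trade memory for accuracy, independent of stream length.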
1:30pm–5:00pm Tuesday, September 24, 2019
Location: 1E 14
Purnima Reddy Kuchikulla (Cloudera), Dan Chaffelson (Cloudera)
Kafka is omnipresent: it is the backbone not only of streaming analytics applications but of data lakes as well. The challenge is understanding what is going on across the Kafka cluster, including performance, issues, and message flows. This session offers hands-on experience visualizing your entire Kafka environment end to end and shows how Streams Messaging Manager (SMM) simplifies Kafka operations.
11:20am–12:00pm Wednesday, September 25, 2019
Location: 1A 15/16
Navinder Pal Singh Brar (Walmart Labs)
Each week, 275 million people shop at Walmart, generating multiple terabytes of interaction and transaction data. On the Customer Backbone team, we enable the extraction, transformation, and storage of customer data to be served to teams such as Ads and Personalisation. At 5 billion events per day, our Kafka Streams cluster processes events from various channels and maintains a unified identity for each customer.
1:15pm–1:55pm Wednesday, September 25, 2019
Location: 1E 12/13
Felipe Hoffa (Google), Bob Bradley (Geotab)
Geotab is one of the world's leading asset tracking companies, with millions of vehicles under service every day. In the first part of this talk, we review their challenges and the solutions they used to create an ML- and GIS-enabled, petabyte-scale data warehouse on Google Cloud. Then we review their process for publishing open data, how to access it, and how cities are using it.
2:05pm–2:45pm Wednesday, September 25, 2019
Location: 1A 15/16
Stephan Ewen (Ververica), Aljoscha Krettek (data Artisans)
The talk discusses how stream processing is becoming a "grand unifying paradigm" for data processing, and the newest developments in Apache Flink that support this trend: new machine learning algorithms that span batch and streaming, state-of-the-art batch performance, and new building blocks for data-driven applications and application consistency.
2:55pm–3:35pm Wednesday, September 25, 2019
Location: 1E 12/13
Tim McKenzie (Pitney Bowes)
Planning a 5G network rollout and associated services requires a good understanding of location-based data. Accurate addressing, and linking consumers to property parcels or points of interest, enables data enrichment with property attributes, demographics, and social data. Companies use location to organize and analyze network and customer data in order to understand where to target new services.
2:55pm–3:35pm Wednesday, September 25, 2019
Location: 1A 15/16
Weisheng Xie (China Telecom BestPay Co., Ltd), Sijie Guo (ASF)
As a fintech company of China Telecom with half a billion registered users and 41 million monthly active users, we have found that deploying risk control decisions is critical to the success of our business. In this talk, we share how we leverage Apache Pulsar to make risk control decision development more efficient, combating financial fraud across more than 50 million transactions a day.
4:35pm–5:15pm Wednesday, September 25, 2019
Location: 1A 15/16
James Terwilliger (Microsoft Corporation), Badrish Chandramouli (Microsoft Research), Jonathan Goldstein (Microsoft Research)
Trill has been open-sourced, making the streaming engine behind services like the multi-billion-dollar Bing Ads platform available for all to use and extend. We give a brief history of streaming data at Microsoft and the lessons learned. We then demonstrate how Trill's API can power complex application logic, and show the performance that gives the engine its name: a trillion events per day per node.
5:25pm–6:05pm Wednesday, September 25, 2019
Location: 1A 15/16
Bas Geerdink (ING)
Streaming Analytics (or Fast Data processing) is the field of making predictions on real-time data. In this talk, I'll present a fast data architecture that follows a 'pipes and filters' pattern and covers many use cases. This architecture can be used to create enterprise-grade solutions with a variety of technology options. The stack is Kafka, Ignite, and Spark Structured Streaming (KISSS).
11:20am–12:00pm Thursday, September 26, 2019
Location: 1A 15/16
Michael Freedman (TimescaleDB)
Leveraging polyglot solutions for your time-series data can lead to a variety of issues, including engineering complexity, operational challenges, and even referential integrity concerns. Re-engineering Postgres to serve as a general data platform streamlines high-volume time-series workloads, resulting in more actionable data and greater ease of use.
11:20am–12:00pm Thursday, September 26, 2019
Location: 1A 21/22
Stavros Kontopoulos (Lightbend), Debasish Ghosh (Lightbend)
In this talk, we discuss online machine learning algorithm choices for streaming applications. We motivate the discussion with resource-constrained use cases such as IoT and personalization. We cover Hoeffding Adaptive Trees, classic sketch data structures, and drift detection algorithms, all the way from implementation to production deployment, describing the pros and cons of using each of them.
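Hoeffding Adaptive Trees build on the Hoeffding tree, which decides when it has seen enough examples to split a node by using the Hoeffding bound ε = sqrt(R² ln(1/δ) / (2n)), where R is the range of the split criterion, δ is the allowed probability of picking the wrong attribute, and n is the number of examples seen at the node. The sketch below is ours, not the speakers'; it computes that bound and applies the usual split test. The function names, the default δ = 1e-7, and the tie-breaking threshold of 0.05 are assumptions for illustration.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Epsilon such that, with probability 1 - delta, the true mean of a variable
    with range `value_range` lies within epsilon of the mean of n observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, n: int,
                 num_classes: int = 2, delta: float = 1e-7,
                 tie_threshold: float = 0.05) -> bool:
    # For information gain over num_classes classes, the range R is log2(num_classes).
    value_range = math.log2(num_classes)
    eps = hoeffding_bound(value_range, delta, n)
    # Split when the best attribute is clearly ahead of the runner-up, or when
    # epsilon is so small that the two are effectively tied (assumed threshold).
    return (best_gain - second_gain) > eps or eps < tie_threshold


if __name__ == "__main__":
    print(round(hoeffding_bound(1.0, 1e-7, 1000), 4))  # 0.0898
    print(should_split(0.30, 0.10, n=1000))  # gap 0.20 exceeds epsilon
    print(should_split(0.15, 0.10, n=1000))  # gap 0.05 is still inside epsilon
```

Because ε shrinks as 1/sqrt(n), a node that keeps receiving examples eventually either finds a clear winner or falls below the tie threshold, which is what lets the tree grow incrementally on a stream without storing the examples.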
2:05pm–2:45pm Thursday, September 26, 2019
Location: 1A 06/07
Ryan Foltz (Exabeam)
Unmanaged and foreign devices on corporate networks pose a security risk. The first step toward reducing the risk from these devices is being able to identify them. As part of a comprehensive device management program, we propose a deep learning model that performs anomaly detection on device names alone, flagging devices that do not follow naming conventions.
2:05pm–2:45pm Thursday, September 26, 2019
Location: 3B - Expo Hall
Heitor Murilo Gomes (Télécom ParisTech), Albert Bifet (Télécom ParisTech)
In this talk, we show how to develop a machine learning pipeline for streaming data using the StreamDM framework (https://github.com/huawei-noah/streamDM). We also show how to use StreamDM for supervised and unsupervised learning tasks, give examples of online preprocessing methods, and explain how to extend the framework with new learning algorithms or preprocessing methods.
2:05pm–2:45pm Thursday, September 26, 2019
Location: 1A 15/16
Karthik Ramasamy (Streamlio), Anand Madhavan (Narvar)
Narvar provides a next-generation post-transaction experience for more than 500 retailers. This talk explores Narvar's journey away from a slew of technologies toward a platform that consolidates its use cases on Apache Pulsar.
3:45pm–4:25pm Thursday, September 26, 2019
Location: 1A 15/16
Jonghyok Lee (SK Telecom), Chon Yong Lee (SK Telecom)
This talk covers the architecture of T-CORE, SK Telecom’s monitoring and service analytics platform, and the lessons learned from developing it. T-CORE collects system and application data from several thousand servers and applications, gives operators a real-time 3D visualization of the status of the entire network and its services, and serves as an analytics platform for data scientists, engineers, and developers.
4:35pm–5:15pm Thursday, September 26, 2019
Location: 1E 10/11
Dean Wampler (Lightbend)
Join me for a discussion of the following problems and their solutions:
1. How (and why) to integrate ML into production streaming data pipelines to serve results quickly
2. How to bridge data science and production environments, which have different tools, techniques, and requirements
3. How to build reliable, scalable, long-running services
4. How to update ML models without downtime

    Contact us

    confreg@oreilly.com

    For conference registration information and customer service

    partners@oreilly.com

    For more information on community discounts and trade opportunities with O’Reilly conferences

    strataconf@oreilly.com

    For information on exhibiting or sponsoring a conference

    Contact list

    View a complete list of Strata Data Conference contacts