Mar 15–18, 2020

Sunday, March 15, 2020

9:00am

9:00am–5:00pm Sunday, 03/15/2020
Training
Jesse Anderson (Big Data Institute)
Jesse Anderson leads a deep dive into Apache Kafka. You'll learn how Kafka works and how to create real-time systems with it. You'll also discover how to create consumers and publishers in Kafka and how to use Kafka Streams, Kafka Connect, and KSQL as you explore the Kafka ecosystem. Read more.
9:00am–5:00pm Sunday, 03/15/2020
Training
David Anderson (Ververica), Seth Wiesman (Ververica)
David Anderson and Seth Wiesman lead a hands-on introduction to Apache Flink for Java and Scala developers who want to learn to build streaming applications. You'll focus on the core concepts of distributed streaming data flows, event time, and key-partitioned state, while looking at runtime, ecosystem, and use cases with exercises to help you understand how the pieces fit together. Read more.
9:00am–5:00pm Sunday, 03/15/2020
Training
The instructors provide a nontechnical overview of AI and data science. Learn common techniques, how to apply them in your organization, and common pitfalls to avoid. You’ll pick up the language and develop a framework to be able to effectively engage with technical experts and use their input and analysis for your business’s strategic priorities and decision making. Read more.
9:00am–5:00pm Sunday, 03/15/2020
You'll walk through all the steps—from prototyping to production—of developing a machine learning pipeline. After looking at data cleaning, feature engineering, model building and evaluation, and deployment, you'll extend these models into two applications from real-world datasets. All your work will be done in Python. Read more.
9:00am–5:00pm Sunday, 03/15/2020
Training
The TensorFlow library provides computational graphs with automatic parallelization across resources, an architecture well suited to implementing neural networks. You'll be introduced to TensorFlow's capabilities in Python, moving from building machine learning algorithms piece by piece to using the Keras API provided by TensorFlow, with several hands-on applications. Read more.
9:00am–5:00pm Sunday, 03/15/2020
Training
Nikki Rouda (Amazon Web Services)
Serverless technologies let you build and scale applications and services rapidly without the need to provision or manage servers. Join Nikki Rouda to learn how to incorporate serverless concepts into your big data architectures. You'll explore design patterns to ingest, store, and analyze your data as you build a big data application using AWS technologies such as S3, Athena, Kinesis, and more. Read more.
9:00am–5:00pm Sunday, 03/15/2020
Training
Jorge Villamariona outlines how organizations using a single platform for processing all types of big data workloads are able to manage growth and complexity, react faster to customer needs, and improve collaboration—all at the same time. You'll leverage Apache Spark and Hive to build an end-to-end solution to address business challenges common in retail and ecommerce. Read more.
9:00am–5:00pm Sunday, 03/15/2020
Training
Bruno Goncalves (Data For Science)
Time series are everywhere around us. Understanding them requires taking into account the sequence of values seen in previous steps and even long-term temporal correlations. Bruno Goncalves explains a broad range of traditional machine learning (ML) and deep learning techniques to model and analyze time series datasets with an emphasis on practical applications. Read more.
9:00am–5:00pm Sunday, 03/15/2020
Training
The TensorFlow library provides computational graphs with automatic parallelization across resources, an ideal architecture for implementing neural networks. You'll walk through TensorFlow's capabilities in Python, from building machine learning algorithms piece by piece to using the Keras API provided by TensorFlow, with several hands-on applications. Read more.
9:00am–5:00pm Sunday, 03/15/2020
Training
Bargava Subramanian (Binaize Labs), Amit Kapoor (narrativeVIZ)
Bargava Subramanian and Amit Kapoor provide you with a thorough introduction to the art and science of building recommendation systems and paradigms across domains. You'll get an end-to-end overview of deep learning-based recommendation and learning-to-rank systems to understand practical considerations and guidelines for building and deploying RecSys. Read more.
9:00am–5:00pm Sunday, 03/15/2020
Training
Rich Ott (The Pragmatic Institute)
PyTorch is a machine learning library for Python that allows you to build deep neural networks with great flexibility. Its easy-to-use API and seamless use of GPUs make it a sought-after tool for deep learning. Join in to get the knowledge you need to build deep learning models using real-world datasets and PyTorch with Rich Ott. Read more.
9:00am–5:00pm Sunday, 03/15/2020
Training
Delip Rao (AI Foundation)
Delip Rao explores natural language processing (NLP) using a set of machine learning techniques known as deep learning. He walks you through neural network architectures and NLP tasks and teaches you how to apply these architectures for those tasks. Read more.
9:00am–5:00pm Sunday, 03/15/2020
Training
Wenming Ye (Amazon Web Services)
Machine learning (ML) and deep learning (DL) projects are becoming increasingly common at enterprises and startups alike and have been a key innovation engine for Amazon businesses such as Go, Alexa, and Robotics. Wenming Ye demonstrates a practical next step in DL learning with instructions, demos, and hands-on labs. Read more.

10:30am

10:30am–11:00am Sunday, 03/15/2020
Morning break (30m)

12:30pm

12:30pm–1:30pm Sunday, 03/15/2020
Lunch (1h)

3:00pm

3:00pm–3:30pm Sunday, 03/15/2020
Afternoon break (30m)

7:00pm

7:00pm–9:00pm Sunday, 03/15/2020
Event
Get to know your fellow attendees over dinner. We've made reservations for you at some of the most sought-after restaurants in town. This is a great chance to make new connections and sample some of the great cuisine San Jose has to offer. Read more.

Monday, March 16, 2020

9:00am

9:00am–5:00pm Monday, 03/16/2020
From banking to biotech, retail to government, every business sector is changing in the face of abundant data. Get better at defining business problems and applying data solutions at Strata Data & AI. Read more.
9:00am–12:30pm Monday, 03/16/2020 TBC
9:00am–12:30pm Monday, 03/16/2020
Tutorial
Data Quality
Matt Harrison (MetaSnake)
You can use pandas to load data, inspect it, tweak it, visualize it, and do analysis with only a few lines of code. Matt Harrison leads a deep dive into plotting and Matplotlib integration, data quality, and issues such as missing data. Matt uses the split-apply-combine paradigm with groupby and pivot and explains stacking and unstacking data. Read more.
9:00am–5:00pm Monday, 03/16/2020
Dive into health, technology, and data in a day-long series of curated talks. Health Data Day at Strata Data & AI takes a closer look at how algorithms, sensors, and big data are changing healthcare forever. Read more.
9:00am–12:30pm Monday, 03/16/2020
Alice Zhao (Metis)
Data scientists are known to crunch numbers, but you may also run into text data. Alice Zhao teaches you to turn text data into a format that a machine can understand, identifies some of the most popular text analytics techniques, and showcases several natural language processing (NLP) libraries in Python including the natural language toolkit (NLTK), TextBlob, spaCy, and gensim. Read more.
9:00am–12:30pm Monday, 03/16/2020
Secondary topics:  Streaming and IoT
David Anderson (Ververica), Seth Wiesman (Ververica)
David Anderson and Seth Wiesman demonstrate how building and managing scalable, stateful, event-driven applications can be easier and more straightforward than you might expect. You'll go hands-on to implement a ride-sharing application together. Read more.
9:00am–12:30pm Monday, 03/16/2020
Sourav Dey (Manifold), Alex Ng (Manifold)
Today, ML engineers are working at the intersection of data science and software engineering—that is, MLOps. Sourav Dey and Alex Ng highlight the six steps of the Lean AI process and explain how it helps ML engineers work as an integrated part of development and production teams. You'll go hands-on using real-world data so you can get up and running seamlessly. Read more.
9:00am–12:30pm Monday, 03/16/2020
Danilo Sato (ThoughtWorks)
Danilo Sato leads you through applying continuous delivery (CD) to data science and machine learning (ML). Join in to learn how to make changes to your models while safely integrating and deploying them into production using testing and automation techniques to release reliably at any time and with a high frequency. Read more.
9:00am–12:30pm Monday, 03/16/2020
Mehrnoosh Sameki (MERS) (Microsoft), Sarah Bird (Microsoft)
Mehrnoosh Sameki and Sarah Bird examine six core principles of responsible AI: fairness, reliability and safety, privacy and security, inclusiveness, transparency, and accountability, focusing on transparency, fairness, and privacy. You'll discover best practices and state-of-the-art open source toolkits that empower researchers, data scientists, and stakeholders to build trustworthy AI systems. Read more.
9:00am–12:30pm Monday, 03/16/2020
Tutorial
Jike Chong (LinkedIn), Yue Cathy Chang (TutumGene)
More than 85% of data science projects fail. This high failure rate is a main reason why data science is still a "science." As data science practitioners, reducing this failure rate is a priority. Jike Chong and Yue Cathy Chang explain the three key steps of applying data science technology to business problems and three concerns for applying domain insights in AI and ML initiatives. Read more.
9:00am–12:30pm Monday, 03/16/2020
Tutorial
Paroma Varma (Snorkel)
Paroma Varma teaches you how to build and manage training datasets programmatically with Snorkel, an open source framework developed at the Stanford AI Lab, and demonstrates how this can lead to more efficiently building and managing machine learning (ML) models in a range of practical settings. Read more.
9:00am–12:30pm Monday, 03/16/2020
Tutorial
Robert Crowe (Google)
Putting together an ML production pipeline for training, deploying, and maintaining ML and deep learning applications is much more than just training a model. Robert Crowe outlines what's involved in creating a production ML pipeline and walks you through working code. Read more.
9:00am–12:30pm Monday, 03/16/2020
Tutorial
Catherine Nelson (Concur Labs, SAP Concur), Hannes Hapke (Wunderbar.ai)
Most deep learning models don’t get analyzed, validated, and deployed. Catherine Nelson and Hannes Hapke explain the necessary steps to release machine learning models for real-world applications. You'll view an example project using the TensorFlow ecosystem, focusing on how to analyze models and deploy them efficiently. Read more.
9:00am–12:30pm Monday, 03/16/2020
Tutorial
Fatma Tarlaci (Quansight)
Language is at the heart of everything we—humans—do. Natural language processing (NLP) is one of the most challenging tasks of artificial intelligence, mainly due to the difficulty of detecting nuances and common sense reasoning in natural language. Fatma Tarlaci invites you to learn more about NLP and get a complete hands-on implementation of an NLP deep learning model. Read more.

10:30am

10:30am–11:00am Monday, 03/16/2020
Morning break (30m)

12:30pm

12:30pm–1:30pm Monday, 03/16/2020
Lunch sponsored by Intel AI (1h)

1:30pm

1:30pm–5:00pm Monday, 03/16/2020
Robert Horton (Microsoft), Mario Inchiosa (Microsoft), John-Mark Agosta (Microsoft)
Robert Horton, Mario Inchiosa, and John-Mark Agosta offer an overview of the fundamental concepts of machine learning (ML) for business and healthcare decision makers and software product managers, so you'll be able to make more effective use of ML results and better evaluate opportunities to apply ML in your industries. Read more.
1:30pm–5:00pm Monday, 03/16/2020
Robert Nishihara (University of California, Berkeley), Ion Stoica (University of California, Berkeley), Philipp Moritz (University of California, Berkeley)
There's no easy way to scale up Python applications to the cloud. Ray is an open source framework for parallel and distributed computing, making it easy to program and analyze data at any scale by providing general-purpose high-performance primitives. Robert Nishihara, Ion Stoica, and Philipp Moritz demonstrate how to use Ray to scale up Python applications, data processing, and machine learning. Read more.
1:30pm–5:00pm Monday, 03/16/2020
David Talby (Pacific AI), Alex Thomas (John Snow Labs), Claudiu Branzan (Accenture)
David Talby, Alex Thomas, and Claudiu Branzan detail the application of the latest advances in deep learning for common natural language processing (NLP) tasks such as named entity recognition, document classification, sentiment analysis, spell checking, and OCR. You'll learn to build complete text analysis pipelines using the highly performant, scalable, open source Spark NLP library in Python. Read more.
1:30pm–5:00pm Monday, 03/16/2020
Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio), Anurag Khandelwal (RISELab, UC Berkeley)
Arun Kejariwal, Karthik Ramasamy, and Anurag Khandelwal walk you through the landscape of streaming systems for each stage of an end-to-end data processing pipeline: messaging, compute, and storage. You'll get an overview of the inception and growth of the serverless paradigm. They explore Apache Pulsar, which provides native serverless support in the form of Pulsar Functions. Read more.
1:30pm–5:00pm Monday, 03/16/2020
Boris Lublinsky (Lightbend), Dean Wampler (Lightbend)
Machine learning (ML) models are data, which means they require the same data governance considerations as the rest of your data. Boris Lublinsky and Dean Wampler outline metadata management for model serving and explore what information about running systems you need and why it's important. You'll also learn how Apache Atlas can be used for storing and managing this information. Read more.
1:30pm–5:00pm Monday, 03/16/2020
TBC
1:30pm–5:00pm Monday, 03/16/2020
Patrick Hall (H2O.ai | George Washington University)
Even if you've followed current best practices for model training and assessment, machine learning models can be hacked, socially discriminatory, or just plain wrong. Patrick Hall breaks down model debugging strategies to test and fix security vulnerabilities, unwanted social biases, and latent inaccuracies in models. Read more.
1:30pm–5:00pm Monday, 03/16/2020
Tutorial
Ira Cohen (Anodot)
While the role of the manager doesn't require deep knowledge of ML algorithms, it does require understanding how ML-based products should be developed. Ira Cohen explores the cycle of developing ML-based capabilities (or entire products) and the role of the (product) manager in each step of the cycle. Read more.
1:30pm–5:00pm Monday, 03/16/2020
Tutorial
Mars Geldard (University of Tasmania), Paris Buttfield-Addison (Secret Lab), Tim Nugent (lonely.coffee)
Mars Geldard, Tim Nugent, and Paris Buttfield-Addison are here to prove Swift isn't just for app developers. Swift for TensorFlow provides the power of TensorFlow with all the advantages of Python (and complete access to Python libraries) and Swift—the safe, fast, incredibly capable open source programming language; Swift for TensorFlow is the perfect way to learn deep learning and Swift. Read more.
1:30pm–5:00pm Monday, 03/16/2020
Tutorial
Dennis Wei (IBM Research)
Dennis Wei teaches you to use and contribute to the new open source Python package AI Explainability 360 directly from its creators. Dennis translates new developments from research labs to data science practitioners in industry. You'll get a first look at the first comprehensive toolkit for explainable AI, including eight diverse and state-of-the-art methods from IBM Research. Read more.
1:30pm–5:00pm Monday, 03/16/2020
Tutorial
Vijay Srinivas Agneeswaran (Walmart Labs), Pramod Singh (Walmart Labs), Akshay Kulkarni (Publicis Sapient)
Vijay Srinivas Agneeswaran, Pramod Singh, and Akshay Kulkarni demonstrate the in-depth process of building a text summarization model with an attention network using TensorFlow (TF) 2.0. You'll gain the practical hands-on knowledge to build and deploy a scalable text summarization model on top of Kubeflow. Read more.
1:30pm–7:30pm Monday, 03/16/2020
Event
Join us during the O’Reilly Artificial Intelligence Conference to learn how to deploy enterprise AI solutions with Intel and its partner ecosystem. This event features offerings for a wide variety of industries and AI use cases. Read more.
1:30pm–5:00pm Monday, 03/16/2020
Tutorial
Lukas Biewald (Weights & Biases)
Join Lukas Biewald to build and deploy long short-term memory networks (LSTMs), gated recurrent units (GRUs), and other text classification techniques using Keras and scikit-learn. Read more.

3:00pm

3:00pm–3:30pm Monday, 03/16/2020
Afternoon break (30m)

5:00pm

5:00pm–7:00pm Monday, 03/16/2020
Event
Enjoy delicious snacks and beverages with fellow Strata & AI attendees, speakers, and sponsors at the Opening Reception, happening immediately after tutorials on Monday. Read more.

Tuesday, March 17, 2020

8:00am

8:00am–8:30am Tuesday, 03/17/2020
Event
Gather before keynotes on Tuesday morning to enjoy casual conversation while meeting fellow attendees. If one of your goals at Strata & AI is to meet new people, this session will jumpstart your networking opportunities. Read more.

8:45am

8:45am–10:30am Tuesday, 03/17/2020
Keynote
Rachel Roumeliotis (O'Reilly), Alistair Croll (Solve For Interesting)
Strata program chairs Rachel Roumeliotis and Alistair Croll welcome you to the first day of keynotes. Read more.

10:30am

10:30am–11:00am Tuesday, 03/17/2020
Morning break sponsored by Dataiku (30m)

11:00am

11:00am–11:40am Tuesday, 03/17/2020
Sandeep U (Intuit), Giriraj Bagdi (Intuit), Sunil Goplani (Intuit)
Data quality metrics focus on quantifying whether data is a mess. But you need to identify lead indicators before data becomes a mess. Sandeep U, Giriraj Bagdi, and Sunil Goplani explore developing lead indicators for data quality for Intuit's production data pipelines. You'll learn about the details of lead indicators, optimization tools, and lessons that moved the needle on data quality. Read more.
11:00am–11:40am Tuesday, 03/17/2020
Alasdair Allan (Babilim Light Industries)
Much of the data we collect is thrown away, but that's about to change; the power envelope needed to run machine learning models on embedded hardware has fallen dramatically, enabling you to put the smarts on the device rather than in the cloud. Alasdair Allan explains how the data you threw away can be processed in real time at the edge, and this has huge implications for how you deal with data. Read more.
11:00am–11:40am Tuesday, 03/17/2020
Dean Wampler (Lightbend), Boris Lublinsky (Lightbend)
Production deployment of machine learning (ML) models requires data governance, because models are data. Dean Wampler and Boris Lublinsky justify that claim and explore its implications and techniques for satisfying the requirements. Using motivating examples, you'll explore reproducibility, security, traceability, and auditing, plus some unique characteristics of models in production settings. Read more.
11:00am–11:40am Tuesday, 03/17/2020
TBC
11:00am–11:40am Tuesday, 03/17/2020
Rashmina Menon (GumGum), Jatinder Assi (GumGum)
GumGum receives 30 billion programmatic inventory impressions amounting to 25 TB of data per day. By generating near-real-time inventory forecasts subject to campaign-specific targeting rules, it enables users to set up successful future campaigns. Rashmina Menon and Jatinder Assi highlight the architecture enabling forecasting in less than 30 seconds with Delta Lake and Databricks Delta caching. Read more.
11:00am–11:40am Tuesday, 03/17/2020
George Chkadua (TBC Bank), Levan Borchkhadze (TBC Bank)
TBC Bank is in transition from a product-centric to a client-centric company. An obvious application of analytics is developing personalized next-best-product recommendations for clients. George Chkadua and Levan Borchkhadze explain why the bank decided to implement an ALS user-item matrix factorization method and a demographic model. As a result, the pilot increased sales conversion rates by 70%. Read more.
11:00am–11:40am Tuesday, 03/17/2020 TBC
11:00am–11:40am Tuesday, 03/17/2020
Mudasir Ahmad (Cisco)
Artificial intelligence (AI) is a natural fit for supply chain operations, where decisions and actions need to be taken daily or even hourly, relating to delivery, manufacturing, quality, logistics, and planning. Mudasir Ahmad explains how AI can be implemented in a scalable and cost-effective way in your business's supply chain operations. You'll identify benefits and potential challenges. Read more.
11:00am–11:40am Tuesday, 03/17/2020
Trevor Grant (IBM), Holden Karau (Independent)
Trevor Grant and Holden Karau discuss getting and keeping your models in production with Kubeflow. Read more.
11:00am–11:40am Tuesday, 03/17/2020
Session
Tony Xing (Microsoft)
Anomaly detection may sound old-fashioned, yet it's super important in many industrial applications. Tony Xing outlines a novel anomaly detection algorithm based on spectral residual (SR) and convolutional neural networks (CNNs) and how this novel method was applied in the monitoring system supporting Microsoft AIOps and business incident prevention. Read more.
11:00am–11:40am Tuesday, 03/17/2020
Session
Jameson Toole (Fritz AI)
Getting machine learning (ML) models ready for use on device is a major challenge. Jameson Toole explains optimization, pruning, and compression techniques that keep app sizes small and inference speeds high. You'll learn to apply these techniques using mobile ML frameworks such as Core ML and TensorFlow Lite. Read more.
11:00am–11:40am Tuesday, 03/17/2020
Session
Divya Sivasankaran (integrate.ai)
In recent years, there's been a lot of attention on the need for ethical considerations in ML, as well as different ways to address bias in different stages of the ML pipeline. However, there hasn't been a lot of focus on how to bring fairness to ML products. Divya Sivasankaran explores the key challenges (and how to overcome them) in operationalizing fairness and bias in ML products. Read more.
11:00am–11:40am Tuesday, 03/17/2020
Session
Hannes Hapke (Wunderbar.ai), Catherine Nelson (Concur Labs, SAP Concur)
Measuring the machine learning model’s performance is key for every successful data science project. Therefore, model feedback loops are essential to capture feedback from users and to expand your model’s training dataset. This talk will introduce the concept of model feedback to you and guide you through a framework for increasing the ROI of your data science project. Read more.
11:00am–11:40am Tuesday, 03/17/2020
Navinder Pal Singh Brar (Walmart Labs)
One of the major use cases for stream processing is real-time fraud detection. Ecommerce has to deal with fraud on a wider scale as more and more companies try to provide customers with incentives such as free shipping by moving to subscription-based models. Navinder Pal Singh Brar dives into the architecture, problems faced, and lessons from building such a pipeline. Read more.

11:50am

11:50am–12:30pm Tuesday, 03/17/2020
Abe Gong (Superconductive Health)
Data organizations everywhere struggle with pipeline debt: untested, unverified assumptions that corrupt data quality, drain productivity, and erode trust in data. Abe Gong shares best practices gathered from across the data community in the course of developing a leading open source library for fighting pipeline debt and ensuring data quality: Great Expectations. Read more.
11:50am–12:30pm Tuesday, 03/17/2020
Secondary topics:  Cloud Platforms and SaaS
Jimmy Bates (Pepperdata)
Jimmy Bates offers an impartial evaluation of Amazon Elastic MapReduce (EMR), Azure HDInsight, and Google Cloud Dataproc, three leading cloud services, with respect to Hadoop and big data autoscaling capabilities and provides guidance to help you determine the flavor of autoscaling that best fits your business needs. Read more.
11:50am–12:30pm Tuesday, 03/17/2020
Secondary topics:  Data Quality
Shradha Ambekar (Intuit), Sunil Goplani (Intuit)
Debugging data pipelines is nontrivial, and finding the root cause can take hours to days. Shradha Ambekar and Sunil Goplani outline how Intuit built a self-serve tool that automatically discovers data pipeline lineage and applies anomaly detection to detect and help debug issues in minutes, establishing trust in metrics and improving developer productivity by 10–100x. Read more.
11:50am–12:30pm Tuesday, 03/17/2020
TBC
11:50am–12:30pm Tuesday, 03/17/2020
Shankar Venkitachalam (Adobe), Megahanath Macha Yadagiri (Carnegie Mellon University), Deepak Pai (Adobe)
Identifying customer stages in a buying cycle enables you to perform personalized targeting depending on the stage. Shankar Venkitachalam, Megahanath Macha Yadagiri, and Deepak Pai identify ML techniques to analyze a customer's clickstream behavior to find the different stages of the buying cycle and quantify the critical click events that help transition a user from one stage to another. Read more.
11:50am–12:30pm Tuesday, 03/17/2020
Secondary topics:  Streaming and IoT
Mark Grover (Lyft), Dev Tagare (Lyft)
Mark Grover and Dev Tagare offer you a glimpse at the end-to-end data architecture Lyft uses to reduce data lag appearing in its analytical systems from 24+ hours to under 5 minutes. You'll learn the what and why of tech choices, monitoring, and best practices. They outline the use cases Lyft has enabled, especially in ML model performance and evaluation. Read more.
11:50am–12:30pm Tuesday, 03/17/2020
Secondary topics:  Technology Ethics
Guillaume Saint-Jacques (LinkedIn Corporation), Meg Garlinghouse (LinkedIn Corporation)
Most companies want to ensure their products and algorithms are fair. Guillaume Saint-Jacques and Meg Garlinghouse share LinkedIn's A/B testing approach to fairness and describe new methods that detect whether an experiment introduces bias or inequality. You'll learn about a scalable implementation on Spark and examples of use cases and impact at LinkedIn. Read more.
11:50am–12:30pm Tuesday, 03/17/2020
Jike Chong (LinkedIn), Yue Cathy Chang (TutumGene)
More than 85% of data science projects fail. This high failure rate is a main reason why data science is still a science. Jike Chong and Yue "Cathy" Chang outline how you can reduce this failure rate and improve teams' confidence in executing successful data science projects by applying data science technology to business problems: scenario mapping, pattern discovery, and success evaluation. Read more.
11:50am–12:30pm Tuesday, 03/17/2020
Shubhankar Jain (SurveyMonkey), Aliaksandr Padvitselski (SurveyMonkey), Manohar Angani (SurveyMonkey)
Every organization leverages ML to increase value to customers and understand their business. You may have created models, but now you need to scale. Shubhankar Jain, Aliaksandr Padvitselski, and Manohar Angani use a case study to teach you how to pinpoint inefficiencies in your ML data flow, how SurveyMonkey tackled this, and how to make your data more usable to accelerate ML model development. Read more.
11:50am–12:30pm Tuesday, 03/17/2020
Session
Sundar Varadarajan (Wipro), Peyman Behbahani (Wipro Technologies)
Time and motion studies of manufacturing operations on a shop floor are traditionally carried out through manual observation, which is time-consuming and subject to human error and limitations. This study introduces a new approach that combines video analytics with time series analysis to automate activity identification and timing measurements. Read more.
11:50am–12:30pm Tuesday, 03/17/2020
Session
Sukanya Mandal (Capgemini)
Heavy ML computation on resource-constrained IoT devices is a challenge. IoT demands near-zero latency, high bandwidth availability, continuous and seamless availability, and privacy. The right infrastructure derives the right ROI. This is where edge and cloud come in. Sukanya Mandal explains how training ML models in the cloud and running inference at the edge has made many IoT use cases plausible. Read more.
11:50am–12:30pm Tuesday, 03/17/2020
Session
Ilana Golbin (PwC), Anand Rao (PwC)
Join in for a practitioner’s overview of the risks of AI and a depiction of responsible AI deployment within an organization. You'll discover how to ensure the safety, security, standardized testing, and governance of systems and how models can be fooled or subverted. Ilana Golbin and Anand Rao illustrate how organizations safeguard AI applications and vendor solutions to mitigate AI risks. Read more.
11:50am–12:30pm Tuesday, 03/17/2020
Session
Sumeet Vij (Booz Allen Hamilton)
Weak supervision allows the use of noisy sources to provide supervision signals for labeling large amounts of training data. Sumeet Vij showcases an approach combining the Snorkel weak supervision framework with denoising labeling functions, a generative model, and AI-powered search to train classifiers leveraging enterprise knowledge, without the need for tens of thousands of hand-labeled examples. Read more.
11:50am–12:30pm Tuesday, 03/17/2020
Secondary topics:  Streaming and IoT
Teresa Tung (Accenture), William Gatehouse (Accenture)
The digital twin presents a problem of data and models at scale: how to mobilize IT and OT data, AI, and engineering models that work across lines of business and even across partners. Teresa Tung and William Gatehouse share their experience of implementing digital twin use cases that combine IoT, AI models, engineering models, and domain context. Read more.

12:30pm

12:30pm–1:45pm Tuesday, 03/17/2020
Lunch (1h 15m)
Add to your personal schedule
12:30pm–1:45pm Tuesday, 03/17/2020
Event
If you’d like to make new professional connections and hear ideas for supporting diversity in the tech community, come to the diversity and inclusion networking lunch on Tuesday. Read more.
Add to your personal schedule
12:30pm–1:45pm Tuesday, 03/17/2020
Event
Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics. Read more.

1:45pm

Add to your personal schedule
1:45pm–2:25pm Tuesday, 03/17/2020
Eitan Anzenberg (Bill.com)
Although the field of optical character recognition (OCR) has been around for almost half a century, document parsing and field extraction from images remain an open research topic. Eitan Anzenberg digs into using an end-to-end deep learning and OCR architecture to predict regions of interest within documents and automatically extract their text. Read more.
1:45pm–2:25pm Tuesday, 03/17/2020
Secondary topics:  Cloud Platforms and SaaS
TBC
Add to your personal schedule
1:45pm–2:25pm Tuesday, 03/17/2020
Session
Governance
Secondary topics:  Security and Privacy
Haopei Wang (DataVisor)
Haopei Wang details the design and implementation of a system that automatically extracts fraud-related features for digital identifiers commonly collected by online services. You'll learn the approach of addressing real-time feature computation and creating templates for feature generation. The system has been applied successfully to fraud detection as well as good user analysis. Read more.
Add to your personal schedule
1:45pm–2:25pm Tuesday, 03/17/2020
Suneeta Mall (Nearmap)
Using Kubernetes as the backbone of AI infrastructure, Nearmap built a fully automated deep learning inference pipeline that's highly resilient, scalable, and massively parallel. Using this system, Nearmap ran semantic segmentation over tens of quadrillions of pixels. Suneeta Mall demonstrates the solution, showing how Kubernetes can be used for big data crunching and machine learning at scale. Read more.
Add to your personal schedule
1:45pm–2:25pm Tuesday, 03/17/2020
Secondary topics:  Culture and Organization
Nancy Rausch (SAS)
For data to be meaningful, it needs to be presented in a way people can relate to. Nancy Rausch explains how SAS combined AI and art to tell a compelling data story and how the company combined streaming data from local bee hives to forecast hive health. It visualized this data in a live-action art sculpture, which helped to bring the data to life in a fun and compelling way. Read more.
Add to your personal schedule
1:45pm–2:25pm Tuesday, 03/17/2020
Joseph Sirosh (Compass)
Compass is changing real estate by leveraging its industry-leading software to build search and analytical tools that help real estate professionals find, market, and sell homes. Joseph Sirosh details how Compass leverages AWS services, including Amazon Elasticsearch Service, to deliver a complete, scalable home-search solution. Read more.
Add to your personal schedule
1:45pm–2:25pm Tuesday, 03/17/2020
Secondary topics:  Streaming and IoT
Minal Mishra (Netflix)
Minal Mishra walks you through Netflix's video player release process, the challenges with deriving time series metrics from a firehose of events, and some of the oddities in running analysis on real-time metrics. Read more.
Add to your personal schedule
1:45pm–2:25pm Tuesday, 03/17/2020
Katie Malone (Civis Analytics), Michelangelo D'Agostino (ShopRunner)
Data science is relatively young, but the job of managing data scientists is younger still. Many people undertake this management position without the tools, mentorship, or role models they need to do it well. Katie Malone and Michelangelo D'Agostino review key themes from a recent Strata report that examines the steps necessary to build, manage, sustain, and retain a growing data science team. Read more.
Add to your personal schedule
1:45pm–2:25pm Tuesday, 03/17/2020
Kelley Rivoire (Stripe)
Tools for training and optimizing models have become more prevalent and easier to use; however, these are insufficient for deploying ML in critical production applications. Kelley Rivoire dissects how Stripe approached challenges in developing reliable, accurate, and performant ML applications that affect hundreds of thousands of businesses. Read more.
1:45pm–2:25pm Tuesday, 03/17/2020
TBC
1:45pm–2:25pm Tuesday, 03/17/2020
Session
TBC
Add to your personal schedule
1:45pm–2:25pm Tuesday, 03/17/2020
Session
Financial services companies use machine learning models to solve critical business use cases. Regulators demand model explainability. Chanchal Chatterjee shares how Google solved financial services business critical problems such as credit card fraud, anti-money laundering, lending risk, and insurance loss using complex machine learning models that can be explained to the regulators. Read more.
Add to your personal schedule
1:45pm–2:25pm Tuesday, 03/17/2020
Session
Carlos Pazos (SparkCognition), Keith Moore (SparkCognition)
AutoML brings acceleration and democratization of data science, but in the game of accuracy and flexibility, the use of predefined blueprints to find adequate algorithms falls short. Carlos Pazos and Keith Moore shine a spotlight on a neuroevolutionary approach to AutoML to custom build novel, sophisticated neural networks that perfectly represent the relationships in your dataset. Read more.
1:45pm–2:25pm Tuesday, 03/17/2020
TBC

2:35pm

2:35pm–3:15pm Tuesday, 03/17/2020
TBC
Add to your personal schedule
2:35pm–3:15pm Tuesday, 03/17/2020
Michael Freedman (TimescaleDB | Princeton University)
Time series data tends to accumulate very quickly, across DevOps, IoT, industrial and energy, finance, and other domains. Time series data is everywhere, with monitoring and IoT applications generating tens of millions of metrics per second and petabytes of data. Michael Freedman shows you how to build a distributed time series database that offers the power of full SQL at scale. Read more.
Add to your personal schedule
2:35pm–3:15pm Tuesday, 03/17/2020
Secondary topics:  Security and Privacy
Amanda Chessell (IBM), John Mertic (Linux Foundation)
Building on its success at establishing standards in the Apache Hadoop data platform, the ODPi (Linux Foundation) now turns its focus to the next big data challenge—enabling metadata management and governance at scale across the enterprise. Amanda Chessell and John Mertic discuss how the ODPi's guidance on governance (GoG) aims to create an open data governance ecosystem. Read more.
Add to your personal schedule
2:35pm–3:15pm Tuesday, 03/17/2020
Zak Hassan (Red Hat)
The number of logs increases constantly and no human can monitor them all. Zak Hassan employs NLP for text encoding and machine learning methods for automated anomaly detection in an effort to construct a tool that could help developers perform root cause analysis more quickly on failing applications. Also, he provides a means to give feedback to the ML algorithm to learn from false positives. Read more.
Add to your personal schedule
2:35pm–3:15pm Tuesday, 03/17/2020
Secondary topics:  Data Quality
David Kohn (TimescaleDB)
The sheer volume of time series data from servers, applications, or IoT devices introduces performance challenges, both to insert data at high rates and to process aggregates for subsequent understanding. David Kohn demonstrates how systems can continuously maintain up-to-date aggregates, correctly handling even late or out-of-order data, to simplify data analysis. Read more.
Add to your personal schedule
2:35pm–3:15pm Tuesday, 03/17/2020
Secondary topics:  Security and Privacy
Sathya Chandran (DataVisor)
Sathya Chandran explains key insights into current trends of account takeover fraud by analyzing 52 billion events generated by 1.1 billion users and developing a set of features called user mobility features to capture suspicious device and IP-switching patterns. You'll learn to incorporate mobility features into an anomaly detection solution to detect suspicious account activity in real time. Read more.
2:35pm–3:15pm Tuesday, 03/17/2020
TBC
Add to your personal schedule
2:35pm–3:15pm Tuesday, 03/17/2020
Session
Data Quality
Barr Moses (Monte Carlo Data)
Ever had your CEO or customer look at your report and say the numbers look way off? Barr Moses defines data downtime—periods of time when your data is partial, erroneous, missing, or otherwise inaccurate. Data downtime is highly costly for organizations, yet is often addressed ad hoc. You'll discuss why data downtime matters to the data industry and how best-in-class teams address it. Read more.
Add to your personal schedule
2:35pm–3:15pm Tuesday, 03/17/2020
Secondary topics:  Cloud Platforms and SaaS
Rustem Feyzkhanov (Instrumental)
Machine and deep learning are becoming more and more essential for businesses for internal and external use; one of the main issues with deployment is finding the right way to train and operationalize the model. A serverless approach for deep learning provides cheap, simple, scalable, and reliable architecture for it. Rustem Feyzkhanov digs into how to do so within AWS infrastructure. Read more.
2:35pm–3:15pm Tuesday, 03/17/2020
TBC
Add to your personal schedule
2:35pm–3:15pm Tuesday, 03/17/2020
Session
Giacomo Bernardi (Extreme Networks)
Machines talk among themselves, but you may be able to understand their behavior by analyzing their language. Giacomo Bernardi outlines a lightweight approach for securing large internet of things (IoT) deployments by leveraging modern natural language processing (NLP) techniques. Rather than attempting cumbersome firewall rules, IoT deployments can be efficiently secured by online behavioral modeling. Read more.
2:35pm–3:15pm Tuesday, 03/17/2020
Session
TBC
Add to your personal schedule
2:35pm–3:15pm Tuesday, 03/17/2020
Session
Navdeep Gill (H2O.ai)
Like all good software, machine learning models should be debugged to discover and remediate errors. Navdeep Gill explores several standard techniques in the context of model debugging—disparate impact, residual, and sensitivity analysis—and introduces novel applications such as global and local explanation of model residuals. Read more.
2:35pm–3:15pm Tuesday, 03/17/2020
TBC

3:15pm

3:15pm–4:15pm Tuesday, 03/17/2020
Afternoon break (1h)

4:15pm

4:15pm–4:55pm Tuesday, 03/17/2020
TBC
Add to your personal schedule
4:15pm–4:55pm Tuesday, 03/17/2020
Secondary topics:  Data Management and Storage
Kamil Bajda-Pawlikowski explores Presto, an open source SQL engine, featuring low-latency queries, high concurrency, and the ability to query multiple data sources. With Kubernetes, you can easily deploy and manage Presto clusters across hybrid and multicloud environments with built-in high availability, autoscaling, and monitoring. Read more.
Add to your personal schedule
4:15pm–4:55pm Tuesday, 03/17/2020
Sihui Hu (Microsoft), Dom Divakaruni (Microsoft)
You'll discover effective ways to track the full lineage from data preparation to model training to inference. Sihui Hu and Dominic Divakaruni unpack how to retrieve data-to-data, data-to-model, and model-to-deployment lineages in one graph to achieve reproducible and reliable machine learning at scale. Read more.
Add to your personal schedule
4:15pm–4:55pm Tuesday, 03/17/2020
Ebrahim Safavi (Mist Systems), Jisheng Wang (Mist Systems)
Anomaly detection models are essential to run data-driven businesses intelligently. At Mist Systems, the need for accuracy and the scale of the data impose challenges to build and automate ML pipelines. Ebrahim Safavi and Jisheng Wang explain how recurrent neural networks and novel statistical models allow Mist Systems to build a cloud native solution and automate the anomaly detection workflow. Read more.
4:15pm–4:55pm Tuesday, 03/17/2020
TBC
Add to your personal schedule
4:15pm–4:55pm Tuesday, 03/17/2020
Uber spends hundreds of millions of dollars in marketing and constantly optimizes the allocation of these budgets. It deploys complex models, using Python and PyTorch, and borrowing from machine learning (ML) to speed up solvers to optimize marketing investment. Mario Vinasco explains the framework of the marketing spend problem and how it was implemented. Read more.
Add to your personal schedule
4:15pm–4:55pm Tuesday, 03/17/2020
Lior Gavish (Barracuda)
Lior Gavish breaks down a machine learning (ML)-based system that detects a highly evasive type of email-based fraud. The system combines innovative techniques for labeling and classifying highly unbalanced datasets with a distributed cloud application capable of processing high-volume communication in real time. Read more.
Add to your personal schedule
4:15pm–4:55pm Tuesday, 03/17/2020
Session
Compliance
Secondary topics:  Security and Privacy
Kathy Winger (Law Offices of Kathy Delaney Winger)
Kathy Winger breaks down what business owners and technology professionals need to know about potential risks in the cybersecurity arena. You'll learn the current legal and data security issues and practices along with what’s happening on the regulatory front. And she'll help you mitigate the risks you face. Read more.
Add to your personal schedule
4:15pm–4:55pm Tuesday, 03/17/2020
David Talby (Pacific AI)
The industry has about 40 years of experience forming best practices and tools for storing, versioning, collaborating, securing, testing, and building software source code—but only about 4 years doing so for AI models. David Talby catches you up on current best practices and freely available tools so that your team can go beyond experimentation to successfully deploy models. Read more.
Add to your personal schedule
4:15pm–4:55pm Tuesday, 03/17/2020
Session
Eitan Anzenberg (Bill.com)
Although the field of optical character recognition (OCR) has been around for half a century, document parsing and field extraction from images remain an open research topic. Eitan Anzenberg leads a deep dive into a learning architecture that leverages document understanding to extract fields of interest. Read more.
4:15pm–4:55pm Tuesday, 03/17/2020
TBC
Add to your personal schedule
4:15pm–4:55pm Tuesday, 03/17/2020
Session
Bahman Bahmani (Rakuten)
With California’s CCPA looming near, Europe’s GDPR still sending shockwaves, and public awareness of privacy breaches heightening, we are in the early days of a new era of personal data protection. We will explore the challenges and opportunities for AI in this new era, and provide actionable insights for the audience to navigate their paths to AI success in this brave new world of data privacy. Read more.
Add to your personal schedule
4:15pm–4:55pm Tuesday, 03/17/2020
Session
Ben Fowler (Southeast Toyota Finance)
Selecting the optimal set of features is a key step in the machine learning modeling process. Ben Fowler shares research that tested five approaches for feature selection. The approaches included current widely used methods, along with novel approaches for feature selection using open source libraries, building a classification model using the Lending Club dataset. Read more.
Add to your personal schedule
4:15pm–4:55pm Tuesday, 03/17/2020
Secondary topics:  Streaming and IoT
Dave Nielsen (Redis Labs)
Redis Streams enables you to collect data in time series format while matching the data processing rate of your continuous application. Apache Spark’s Structured Streaming API enables real-time decision making for your continuous data. Dave Nielsen demonstrates how to integrate open source Redis with Apache Spark’s Structured Streaming API using the Spark-Redis library. Read more.

5:05pm

5:05pm–5:45pm Tuesday, 03/17/2020
TBC
Add to your personal schedule
5:05pm–5:45pm Tuesday, 03/17/2020
Ben Galewsky (National Center for Supercomputing Applications), Gray Lindsey (Fermi National Accelerator Laboratory), Andrew Melo (Vanderbilt University)
Building a data engineering pipeline for serving segments of a 200 Pb dataset to particle physicists around the globe poses many challenges, some of which are unique to high energy physics and some of which apply to big science projects across disciplines. Ben Galewsky, Gray Lindsey, and Andrew Melo highlight how much of it can inform industry data science at scale. Read more.
Add to your personal schedule
5:05pm–5:45pm Tuesday, 03/17/2020
Lars George (Okera)
With various levels of security layers and different departments responsible for data, there are a number of challenges with managing security and governance within AWS identity and access management (IAM). Lars George identifies the security layers, why there’s such a conundrum with IAM, if IAM actually slows down data projects, and the access control requirements needed in data lakes. Read more.
Add to your personal schedule
5:05pm–5:45pm Tuesday, 03/17/2020
Secondary topics:  Security and Privacy
Nicola Corradi (DataVisor)
Fraudulent attacks like fake reviews, application fraud, and promotion abuse create a common pattern shared within coordinated malicious accounts. Nicola Corradi explains novel deep learning models that learned to detect suspicious patterns, leading to the individuation of coordinated fraud attacks on social, dating, ecommerce, financial, and news aggregator services. Read more.
5:05pm–5:45pm Tuesday, 03/17/2020
TBC
Add to your personal schedule
5:05pm–5:45pm Tuesday, 03/17/2020
Harrison Wang (LiveRamp)
A migration to a new environment is never easy. You'll learn how LiveRamp tackled migrating its large-scale production workflows from its private data center to the cloud while maintaining high uptime. Harrison Wang examines the high-level steps and decisions involved, lessons learned, and what to realistically expect out of a migration. Read more.
Add to your personal schedule
5:05pm–5:45pm Tuesday, 03/17/2020
Karthik Ramasamy (Streamlio), Anand Madhavan (Narvar)
Narvar originally used a large collection of point technologies such as AWS Kinesis, Lambda, and Apache Kafka to satisfy its requirements for pub/sub messaging, message queuing, logging, and processing. Karthik Ramasamy and Anand Madhavan walk you through how Narvar moved away from a slew of technologies and consolidated its use cases on Apache Pulsar. Read more.
Add to your personal schedule
5:05pm–5:45pm Tuesday, 03/17/2020
Sanjeev Mohan (Gartner)
The acceleration of the migration of workloads to the cloud isn't a binary journey. Some workloads will still be on-premises and some will be on multiple cloud providers. Sanjeev Mohan identifies key data and analytics considerations in modern data architectures, including strategies to handle data latency, gravity, ingress transformation, compliance, and governance needs and data orchestration. Read more.
5:05pm–5:45pm Tuesday, 03/17/2020
TBC
Add to your personal schedule
5:05pm–5:45pm Tuesday, 03/17/2020
Session
Josh Weisberg (Zillow Group)
Computer vision and deep learning enable new technologies to mimic how the human brain interprets images and create interactive shopping experiences. This progress has major implications for businesses providing customers with the information they need to make a purchase decision. Josh Weisberg offers an overview of implementing computer vision to create rich media experiences. Read more.
5:05pm–5:45pm Tuesday, 03/17/2020
TBC
5:05pm–5:45pm Tuesday, 03/17/2020
TBC
5:05pm–5:45pm Tuesday, 03/17/2020
Session
TBC
Add to your personal schedule
5:05pm–5:45pm Tuesday, 03/17/2020
Batch processing can benefit immensely from adopting some techniques from the stream processing world. Balaji Varadarajan shares how Apache Hudi (incubating), an open source project created at Uber and currently incubating with the ASF, can bridge this gap and enable more productive, efficient batch data engineering. Read more.

5:45pm

Add to your personal schedule
5:45pm–7:15pm Tuesday, 03/17/2020
Event
Make your way from booth to booth while you check out all the exhibitors in the Expo Hall on Tuesday after sessions end. Read more.

Wednesday, March 18, 2020

8:00am

Add to your personal schedule
8:00am–8:30am Wednesday, 03/18/2020
Event
Gather before keynotes on Wednesday morning to enjoy casual conversation while meeting fellow attendees. If one of your goals at Strata & AI is to meet new people, this session will jumpstart your networking opportunities. Read more.

8:45am

Add to your personal schedule
8:45am–10:30am Wednesday, 03/18/2020
Keynote
Rachel Roumeliotis (O'Reilly), Alistair Croll (Solve For Interesting)
Strata program chairs Rachel Roumeliotis and Alistair Croll welcome you to the first day of keynotes. Read more.

10:30am

10:30am–11:00am Wednesday, 03/18/2020
Morning break (30m)

11:00am

11:00am–11:40am Wednesday, 03/18/2020
TBC
Add to your personal schedule
11:00am–11:40am Wednesday, 03/18/2020
Sophie Watson (Red Hat), William Benton (Red Hat)
Cloud native infrastructure like Kubernetes has obvious benefits for machine learning systems, allowing you to scale out experiments, train on specialized hardware, and conduct A/B tests. What isn’t obvious are the challenges that come up on day two. Sophie Watson and William Benton share their experience helping end-users navigate these challenges and make the most of new opportunities. Read more.
11:00am–11:40am Wednesday, 03/18/2020
TBC
Add to your personal schedule
11:00am–11:40am Wednesday, 03/18/2020
Jaya Susan Mathew (Microsoft)
With the need to cater to a global audience, there's a growing demand for applications to support speech identification, translation, and transliteration from one language to another. Jaya Susan Mathew explores this topic and how to quickly use some of the readily available APIs to identify, translate, or even transliterate speech or text within your application. Read more.
Add to your personal schedule
11:00am–11:40am Wednesday, 03/18/2020
Secondary topics:  Streaming and IoT
Paige Roberts (Vertica)
What works in production is the only technology criterion that matters. Companies with successful high-scale production IoT analytics programs like Philips, Anritsu, and OptimalPlus show remarkable similarities. IoT at production scale requires certain technology choices. Paige Roberts drills into the architectures of successful production implementations to identify what works and what doesn’t. Read more.
Add to your personal schedule
11:00am–11:40am Wednesday, 03/18/2020
Secondary topics:  Data Management and Storage
Maulik Soneji (Gojek), Dinesh Kumar (Gojek)
Maulik Soneji and Dinesh Kumar explore Gojek's event-processing library to consume events from Kafka and push them to BigQuery. All of Gojek's services are event sourced; it handles hundreds of topics, a few of which carry a high load of 21K messages per second. Read more.
Add to your personal schedule
11:00am–11:40am Wednesday, 03/18/2020
Talia Tron (Intuit), Joy Rimchala (Intuit)
Explainable AI (XAI) has gained industry traction, given the importance of explaining ML-assisted decisions in human terms and detecting undesirable ML defects before systems are deployed. Talia Tron and Joy Rimchala delve into XAI techniques, advantages and drawbacks of black box versus glass box models, concept-based diagnostics, and real-world examples using design thinking principles. Read more.
Add to your personal schedule
11:00am–11:40am Wednesday, 03/18/2020
Mark Donsky (Okera)
Privacy regulation is increasing worldwide with Europe's GDPR, the California Consumer Privacy Act (CCPA), and the New York Privacy Act (NYPA). Penalties for noncompliance are stiff, but many companies still aren't prepared. Mark Donsky shares how to establish best practices for holistic privacy readiness as part of your data strategy. Read more.
Add to your personal schedule
11:00am–11:40am Wednesday, 03/18/2020
Moty Fania (Intel)
Moty Fania shares key insights from implementing and sustaining hundreds of ML models in production, including continuous delivery of ML models and systematic measures to minimize the cost and effort required to sustain them in production. You'll learn from examples from different business domains and deployment scenarios (on-premises, the cloud) covering the architecture and related AI platforms. Read more.
Add to your personal schedule
11:00am–11:40am Wednesday, 03/18/2020
Session
Imagine looking into a mirror, but not seeing your own face. Instead, you're looking in the eyes of Barack Obama or Angela Merkel. Your facial expressions are seamlessly transferred to the other person's face in real time. Martin Förtsch and Thomas Endres dig into a prototype from TNG that transfers faces from one person to another in real time based on deepfakes. Read more.
Add to your personal schedule
11:00am–11:40am Wednesday, 03/18/2020
Session
Jonathan Peck (Algorithmia)
ML has been advancing rapidly, but only a few contributors focus on the infrastructure and scaling challenges that come with it. Jonathan Peck explores why ML is a natural fit for serverless computing, a general architecture for scalable ML, and common issues when implementing on-demand scaling over GPU clusters, providing general solutions and a vision for the future of cloud-based ML. Read more.
Add to your personal schedule
11:00am–11:40am Wednesday, 03/18/2020
Session
Devices discover their way around the network and proxy the intent of the users behind them; leveraging this information for behavior analytics can raise privacy concerns. A selective use of embedding models on a crafted corpus from anonymized data can address these concerns. Ramsundar Janakiraman details a way to build representations with behavioral insights that also preserves user identity. Read more.
11:00am–11:40am Wednesday, 03/18/2020
TBC
Add to your personal schedule
11:00am–11:40am Wednesday, 03/18/2020
Kai Wähner (Confluent)
Apache Kafka became the de facto standard for microservice architectures, which also introduces new challenges. Kai Wähner explores the problems of distributed microservices communication and how both Kafka and a service mesh like Istio address them. You'll learn some approaches for combining both to build a reliable and scalable microservice architecture with decoupled and secure microservices. Read more.

11:50am

11:50am–12:30pm Wednesday, 03/18/2020
TBC
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/18/2020
Tomer Shiran (Dremio), Jacques Nadeau (Dremio)
Data lakes are hot again. With S3 as the data lake storage, the modern data lake architecture separates compute from storage. Companies can choose a variety of elastic, scalable, and cost-efficient technologies when designing a cloud data lake. Tomer Shiran and Jacques Nadeau share best practices for building a data lake on AWS, as well as various services and open source building blocks. Read more.
11:50am–12:30pm Wednesday, 03/18/2020
TBC
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/18/2020
Liqun Shao (Microsoft)
Liqun Shao leads you through material from a new GitHub repository to show how data scientists without NLP knowledge can quickly train, evaluate, and deploy state-of-the-art NLP models. She focuses on two use cases with distributed training on Azure Machine Learning with Horovod: GenSen for sentence similarity and BERT for question answering using Jupyter notebooks for Python. Read more.
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/18/2020
Claudiu Barbura (Blueprint)
Claudiu Barbura exposes a tech stack to consumers in BI tools and data science notebooks using live demos to explain the lessons learned using Spark (CPU), BlazingSQL and Rapids.ai (GPU), and Apache Arrow in its quest to exponentially increase the performance of its data virtualizer, which enables real-time access to data sources across different cloud providers and on-premises databases and APIs. Read more.
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/18/2020
Secondary topics:  Streaming and IoT
Jeff Chao (Netflix)
Netflix has experienced an unprecedented global increase in membership over the last several years. Production outages today have greater impact in less time than years before. Jeff Chao details the open-sourced Mantis, which allows Netflix to continue providing great experiences for its members, enabling it to get real-time, granular, cost-effective operational insights. Read more.
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/18/2020
Patryk Oleniuk (Virgin Hyperloop One), Sandhya Raghavan (Virgin Hyperloop One)
Patryk Oleniuk and Sandhya Raghavan investigate how to use demand data to improve on the design of the fifth mode of transport—Hyperloop. They discuss the passenger demand prediction methods and the tech stack (Spark, koalas, Keras, MLflow) used to build a deep neural network (DNN)-based near-future demand prediction for simulation purposes. Read more.
11:50am–12:30pm Wednesday, 03/18/2020
TBC
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/18/2020
Alice Zheng (Amazon)
You'll learn four lessons in building and operating large-scale, production-grade machine learning systems at Amazon with Alice Zheng, useful for practitioners and would-be practitioners in the field. Read more.
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/18/2020
Session
We’ll anchor on building an image classifier trained on the Stanford Cars dataset to evaluate fine tuning and feature extraction and the impact of hyperparameter optimization on these techniques, then tune image transformation parameters to augment the model. Our goal is to answer: how can resource-constrained teams make trade-offs between efficiency and effectiveness using pre-trained models? Read more.
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/18/2020
Session
Nick Pinckernell (Comcast)
With model serving becoming easier thanks to tools like Kubeflow, the focus is shifting to feature engineering. Nick Pinckernell reviews five ways to get your raw data into engineered features (and eventually to your model) with open source tools, flexible components, and various architectures. Read more.
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/18/2020
Session
Krishna Gade (Fiddler Labs)
Krishna Gade outlines how "explainable AI" fills a critical gap in operationalizing AI and adopting an explainable approach into the end-to-end ML workflow from training to production. You'll discover the benefits of explainability such as the early identification of biased data and better confidence in model outputs. Read more.
11:50am–12:30pm Wednesday, 03/18/2020
TBC
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/18/2020
Jay Smith (Google), Remy Welch (Google Cloud)
Data is a valuable resource, but collecting and analyzing the data can be challenging. Further, the cost of resource allocation often prohibits the speed at which analysis can take place. Jay Smith and Remy Welch break down how serverless architecture can improve the portability and scalability of streaming event-driven Apache Spark jobs and perform ETL tasks using serverless frameworks. Read more.

12:30pm

12:30pm–2:15pm Wednesday, 03/18/2020
Lunch (1h 45m)
Add to your personal schedule
12:30pm–1:45pm Wednesday, 03/18/2020
Event
Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics. Read more.

1:45pm

Add to your personal schedule
1:45pm–2:25pm Wednesday, 03/18/2020
Session
Data Quality
Mehul Sheth (Druva)
Any software product needs to be tested against data. It's difficult to have a random but realistic dataset representing production data. Mehul Sheth highlights using production data to generate models. Production data is accessed without exposing it or violating any customer agreements on privacy. The models are then used to generate test data at scale in lower environments. Read more.
Add to your personal schedule
1:45pm–2:25pm Wednesday, 03/18/2020
Secondary topics:  Streaming and IoT
Denise Gosnell (DataStax)
Self-organizing networks rely on sensor communication and a centralized mechanism, like a cell tower, for transmitting the network's status. Denise Gosnell walks you through what happens if the tower goes down and how a graph data structure gets involved in the network's healing process. You'll see graphs in this dynamic network and how path information helps sensors come back online. Read more.
1:45pm–2:25pm Wednesday, 03/18/2020
Jin Hyuk Chang (Lyft), Tao Feng (Lyft)
Jin Hyuk Chang and Tao Feng offer a glimpse of Amundsen, an open source data discovery and metadata platform from Lyft. Since it was open-sourced, Amundsen has been used and extended by many different companies within the community. Read more.
1:45pm–2:25pm Wednesday, 03/18/2020
TBC
1:45pm–2:25pm Wednesday, 03/18/2020
Secondary topics:  Culture and Organization
Dave Stuart (Department of Defense)
Dave Stuart takes a look into how the US Intelligence Community (IC) uses Jupyter and Python to harness subject matter expertise of analysts in a DIY analytic movement. You'll cover the technical and cultural challenges the community encountered in its quest to find success at a large scale and address the strategies used to mitigate the challenges. Read more.
1:45pm–2:25pm Wednesday, 03/18/2020
Secondary topics:  Data Management and Storage
Qorry Asfar (Pusat Demokrasi dan Hak Asasi Manusia), Muhammad Asfar (University of Airlangga)
With the disclosure of the Cambridge Analytica scandal, political practitioners have started to adopt big data technology to give them better understanding and management of data. Qorry Asfar and Muhammad Asfar provide a big data case study to develop political strategy and examine how technological adoption will shape a better political landscape. Read more.
1:45pm–2:25pm Wednesday, 03/18/2020
Utkarsh B (Flipkart), Giridhar Yasa (Flipkart)
Utkarsh B and Giridhar Yasa lead a deep dive into architectural patterns and the solutions Flipkart developed to ensure business continuity for millions of online customers, and how it leveraged technology to avert or mitigate risks from catastrophic failures. Solving for business continuity requires investments in application, data management, and infrastructure. Read more.
1:45pm–2:25pm Wednesday, 03/18/2020
Arvind Prabhakar (StreamSets)
DataOps is the best approach for enterprises to improve business and drive future revenue streams and competitive differentiation, which is why so many businesses are rethinking their data strategy. Arvind Prabhakar explains how DataOps solves the problems that come with managing data movement at scale. Read more.
1:45pm–2:25pm Wednesday, 03/18/2020
Ananth Kalyan Chakravarthy Gundabattula (Commonwealth Bank of Australia)
Feature engineering can make or break a machine learning model. The featuretools package and associated algorithm accelerate the way features are built. Ananth Kalyan Chakravarthy Gundabattula explains a Dask and Prefect-based framework that addresses challenges and opportunities using this approach in terms of lineage, risk, ethics and automated data pipelines for the enterprise. Read more.
1:45pm–2:25pm Wednesday, 03/18/2020
Session
Stephan Erberich (University of Southern California), Kalvin Ogbuefi (Children's Hospital Los Angeles), Long Ho (Children's Hospital Los Angeles)
Annotating radiological images by category at scale is a critical step for analytical ML. Supervised learning is challenging because image metadata doesn't reliably identify image content and manually labeling images for AI algorithms isn't feasible. Stephan Erberich, Kalvin Ogbuefi, and Long Ho share an approach for automated categorization of radiological images based on content category. Read more.
1:45pm–2:25pm Wednesday, 03/18/2020
Session
Fidan Boylu Uz (Microsoft), Mario Bourgoin (Microsoft), Gheorghe Iordanescu (Microsoft)
Hyperparameter optimization for machine learning is a complex task that requires advanced optimization techniques and can be implemented as a generic framework decoupled from the specific details of algorithms. Fidan Boylu Uz, Mario Bourgoin, and Gheorghe Iordanescu apply such a framework to tasks like object detection and text matching in a transparent, scalable, and easy-to-manage way in a cloud service. Read more.
1:45pm–2:25pm Wednesday, 03/18/2020
Session
Daniel Jeffries (Pachyderm)
With algorithms making more and more decisions in our lives, from who gets a job to who gets hired and fired, and even who goes to jail, it’s more critical than ever that we make AI auditable and explainable in the real world. Daniel Jeffries breaks down how you can make your AI and ML systems auditable and transparent right now with a few classic IT techniques your team already knows well. Read more.
1:45pm–2:25pm Wednesday, 03/18/2020
Session
Nicola Corradi (DataVisor)
Fraudulent attacks such as application fraud, fake reviews, and promotion abuse have to automate the generation of user content to scale; this creates latent patterns shared among the coordinated malicious accounts. Nicola Corradi digs into a deep learning model to detect such patterns for the identification of coordinated content abuse attacks on social, ecommerce, financial platforms, and more. Read more.
1:45pm–2:25pm Wednesday, 03/18/2020
TBC

2:35pm

2:35pm–3:15pm Wednesday, 03/18/2020
Benjamin Batorsky (MIT Sloan)
Identifying and labeling named entities such as companies or people in text is a key part of text processing pipelines. Benjamin Batorsky outlines how to train, test, and implement a named entity recognition (NER) model with spaCy. You'll get a sneak peek at how to use these techniques with large, non-English corpora. Read more.
2:35pm–3:15pm Wednesday, 03/18/2020
Wangda Tan (Cloudera), Arpit Agarwal (Cloudera)
In 2020, Hadoop is still evolving fast. You'll learn the current status of the Apache Hadoop community and the exciting present and future of Hadoop 3.x. Wangda Tan and Arpit Agarwal cover new features like Hadoop on Cloud, GPU support, NameNode federation, Docker, 10X scheduling improvements, and Ozone, and they offer upgrade guidance from 2.x to 3.x. Read more.
2:35pm–3:15pm Wednesday, 03/18/2020
Secondary topics:  Security and Privacy
Nong Li (Okera)
The evolution from storing data in a warehouse to a hybrid infrastructure of on-premises and cloud data lakes has enabled agility and scale. Nong Li looks at the problems between data and metadata, the privacy and security risks associated with them, how to avoid the pitfalls of these challenges, and why companies need to get it right by enforcing security and privacy consistently across all applications. Read more.
2:35pm–3:15pm Wednesday, 03/18/2020
TBC
2:35pm–3:15pm Wednesday, 03/18/2020
Kshitij Wadhwa (Rockset), Dhruba Borthakur (Rockset)
Rockset is a serverless search and analytics engine that enables real-time search and analytics on raw data from Amazon DynamoDB, with full-featured SQL. Kshitij Wadhwa and Dhruba Borthakur explore how Rockset takes an entirely new approach to loading, analyzing, and serving data so you can run powerful SQL analytics on data from DynamoDB without ETL. Read more.
2:35pm–3:15pm Wednesday, 03/18/2020
Ravi Krishnaswamy (Autodesk)
Today’s applications interact with data in a distributed and decentralized world. Using graphs at scale, you can infer communities and your interaction by tracking access to common data across users and applications. Ravi Krishnaswamy displays a real-world product example with millions of users that uses the combined powers of Spark and graph databases to gain insights into customer workflows. Read more.
2:35pm–3:15pm Wednesday, 03/18/2020
Micah Wylde (Lyft)
Lyft processes millions of events per second in real time to compute prices, balance marketplace dynamics, and detect fraud, among many other use cases. Micah Wylde showcases how Lyft uses Kubernetes along with Flink, Beam, and Kafka to enable service engineers and data scientists to easily build real-time data applications. Read more.
2:35pm–3:15pm Wednesday, 03/18/2020
TBC
2:35pm–3:15pm Wednesday, 03/18/2020
Jay Budzik (Zest AI)
More companies are adopting machine learning (ML) to run key business functions. The best performing models combine diverse model types into stacked ensembles, but explaining these hybrid models has been impossible—until now. Jay Budzik details a new technique, generalized integrated gradients (GIG), to explain complex ensembled ML models that are safe to use in high-stakes applications. Read more.
2:35pm–3:15pm Wednesday, 03/18/2020
Session
Digital brands focus heavily on personalizing consumers' experience at every single touchpoint. In order to engage with consumers in the most relevant ways, Lily AI helps brands dissect and understand how their consumers interact with their products, more specifically with the product features. Sowmiya Chocka Narayanan explores the lessons learned building AI-powered personalization for fashion. Read more.
2:35pm–3:15pm Wednesday, 03/18/2020
Session
TBC
2:35pm–3:15pm Wednesday, 03/18/2020
Session
Moin Nadeem (Intel)
The real world is highly biased, but we still train AI models on its data. This leads to models that are highly offensive and discriminatory. For instance, models have learned that male engineers are preferable and therefore discriminate when used in hiring. Moin Nadeem explores how to assess the social biases that popular models exhibit and how to leverage this to create a fairer model. Read more.
2:35pm–3:15pm Wednesday, 03/18/2020
Session
Leslie De Jesus (Wovenware)
Considering the cost of customer acquisition and the importance of making decisions based on customer data, churn prediction is key for retaining customers and anticipating future trends. Leslie De Jesus describes the case study of how a healthcare insurance provider reduced customer churn and examines three key considerations when creating the DL model to be a tool for preemptive decision making. Read more.
2:35pm–3:15pm Wednesday, 03/18/2020
TBC

3:15pm

3:15pm–4:15pm Wednesday, 03/18/2020
Afternoon break (1h)

4:15pm

4:15pm–4:55pm Wednesday, 03/18/2020
TBC
4:15pm–4:55pm Wednesday, 03/18/2020
Zhe Zhang (LinkedIn), Huangming Xie (LinkedIn)
Compute efficiency optimization is of critical importance in the big data era, as data science and ML algorithms become increasingly complex and data size increases exponentially over time. Opportunities exist throughout the resource use funnel, which Zhe Zhang and Huangming Xie characterize using a CLUE framework. Read more.
4:15pm–4:55pm Wednesday, 03/18/2020
Session
Compliance
Secondary topics:  Security and Privacy
Lisa Joy Rosner (Otonomo)
As cars introduce more advanced features, the role of customer privacy and responsible data stewardship has become an important focus for auto manufacturers and drivers. Lisa Joy Rosner discusses the future of connected vehicles, data compliance measures, and the impact of related policies like GDPR and the California Consumer Privacy Act (CCPA). Read more.
4:15pm–4:55pm Wednesday, 03/18/2020
Nisha Muktewar (Cloudera Fast Forward Labs), Victor Dibia (Cloudera Fast Forward Labs)
In many business use cases, it's frequently desirable to automatically identify and respond to abnormal data. This process can be challenging, especially when working with high-dimensional, multivariate data. Nisha Muktewar and Victor Dibia explore deep learning approaches (sequence models, VAEs, GANs) for anomaly detection, performance benchmarks, and product possibilities. Read more.
4:15pm–4:55pm Wednesday, 03/18/2020
Chendi Xue (Intel), Jian Zhang (Intel)
Chendi Xue and Jian Zhang explore how Intel accelerated Spark SQL with AVX-supported vectorization technology. They outline the design and evaluation, including how to enable columnar processing in Spark SQL, how to use Arrow as the intermediate data format, how to leverage AVX-enabled Gandiva for data processing, and performance analysis with system metrics and breakdown. Read more.
4:15pm–4:55pm Wednesday, 03/18/2020
Kelly Zhiling Wan (LinkedIn), Jason Wang (LinkedIn), Lili Zhou (LinkedIn)
Studies show that good customer service accelerates customers' cohesion toward a product, which increases product engagement and revenue. It's traditional to use customer surveys to measure how customers feel about services and products. Kelly Zhiling Wan, Chih Hui Wang, and Lili Zhou examine LinkedIn's innovative data product for measuring customer happiness. Read more.
4:15pm–4:55pm Wednesday, 03/18/2020
Penghui Li (Zhaopin), Jia Zhai (StreamNative)
Penghui Li and Jia Zhai walk you through building an event streaming platform based on Apache Pulsar and simplifying a stream processing pipeline with Pulsar Functions, Pulsar Schema, and Pulsar SQL. Read more.
4:15pm–4:55pm Wednesday, 03/18/2020
Session
Anand Rao (PwC), Joseph Voyles (PwC)
This session will provide enterprise data leaders, data scientists, and IT leaders with an introduction to the core differences between software and machine learning model life cycles. We will demonstrate how AI’s success will also limit scale, and will introduce leading practices for establishing AI Ops to overcome limitations by automating CI/CD, supporting continuous learning, and enabling model safety. Read more.
4:15pm–4:55pm Wednesday, 03/18/2020
TBC
4:15pm–4:55pm Wednesday, 03/18/2020
Session
AI techniques are finding use in a wide range of applications. Crowd-counting deep learning models have been used to count people, animals, and microscopic cells. Srikanth Gopalakrishnan introduces novel crowd-counting techniques and their applications, including a pharma case study to show how they were used for drug discovery to bring about 98% savings in drug characterization efforts. Read more.
4:15pm–4:55pm Wednesday, 03/18/2020
Session
Roshan Satish (DocuSign), Michael Chertushkin (John Snow Labs)
Roshan Satish and Michael Chertushkin lead you through a real-world case study about applying state-of-the-art deep learning techniques to a pipeline that combines computer vision (CV), optical character recognition (OCR), and natural language processing (NLP) at DocuSign. You'll discover how the project delivered on its extreme interpretability, scalability, and compliance requirements. Read more.
4:15pm–4:55pm Wednesday, 03/18/2020
TBC
4:15pm–4:55pm Wednesday, 03/18/2020
Session
Luyang Wang (Restaurant Brands International), Jiao (Jennie) Wang (Intel)
Lu Wang and Jennie Wang explain how to build a real-time menu recommendation system to leverage attention networks using Spark, Analytics Zoo, and MXNet in the cloud. You'll learn how to deploy the model and serve the real-time recommendation using both cloud and on-device infrastructure on Burger King’s production environment. Read more.
4:15pm–4:55pm Wednesday, 03/18/2020
Sijie Guo (StreamNative), Yong Zhang (StreamNative)
Sijie Guo and Yong Zhang lead a deep dive into the details of Pulsar transactions and how they can be used in Pulsar Functions and other processing engines to achieve transactional event streaming. Read more.

Contact us

confreg@oreilly.com

For conference registration information and customer service

partners@oreilly.com

For more information on community discounts and trade opportunities with O’Reilly conferences

Become a sponsor

For information on exhibiting or sponsoring a conference

pr@oreilly.com

For media/analyst press inquiries