20–23 April 2020

Monday, 20 April 2020

10:00

10:00–17:30 Monday, 20/04/2020
Training
Secondary topics:  Training
Nikki Rouda (Amazon Web Services)
Nikki Rouda walks you through building a data lake on Amazon S3 using different ingestion mechanisms, performing incremental data processing on the data lake to support transactions on S3, and securing the data lake with fine-grained access control policies. Read more.
10:00–17:30 Monday, 20/04/2020
Training
Secondary topics:  Training
Thomas Nield (Nield Consulting Group)
There's been an explosion of tools for machine learning, but two have emerged as practical go-to solutions: scikit-learn and Apache Spark. Using Python, Thomas Nield leads a deep dive into examples in parallel (no pun intended) for both of these tools and demonstrates how to tackle machine learning at small, medium, and large scales. Read more.
10:00–17:30 Monday, 20/04/2020
Training
Secondary topics:  Training
Hugo Bowne-Anderson (DataCamp)
Hugo Bowne-Anderson walks you through the math and stats you need to know to do data science and interpret your results correctly: calculus, linear algebra, statistical intuition, and probabilistic thinking, among others. Along the way, you'll dive into hands-on examples from machine learning, online experiments and hypothesis testing, natural language processing, data ethics, and more. Read more.
10:00–17:30 Monday, 20/04/2020
Training
Secondary topics:  Training
Grishma Jena (IBM)
Data science is rapidly changing every industry. This has resulted in a shift away from traditional software development and toward data-driven decision making. Grishma Jena shows you how to use Python to extract, wrangle, explore, and understand data so you can leverage it in the real world. Read more.
10:00–17:30 Monday, 20/04/2020
Training
Secondary topics:  Training
Michael Cullan (Pragmatic Institute)
Michael Cullan introduces TensorFlow’s capabilities through its Python interface. Starting at the low level of the TensorFlow graph, you’ll build fundamental machine learning tools like linear regression and softmax classification before combining them into a basic neural network. Read more.

12:00

12:00–13:00 Monday, 20/04/2020
Break (1h)

15:00

15:00–15:30 Monday, 20/04/2020
Break (30m)

Tuesday, 21 April 2020

9:00

9:00–17:00 Tuesday, 21/04/2020
Training
Secondary topics:  Training
Nathalie Rauschmayr (Amazon Web Services), Satadal Bhattacharjee (Amazon Web Services), Aparna Elangovan (Amazon Web Services)
Nathalie Rauschmayr, Satadal Bhattacharjee, and Aparna Elangovan take you through building, training, and deploying a deep learning model on Amazon SageMaker. You'll also learn how to use some of the latest SageMaker features such as SageMaker Debugger and SageMaker Model Monitor. Read more.
9:00–17:00 Tuesday, 21/04/2020
Training
Secondary topics:  Training
Dean Wampler (Anyscale)
Surprisingly, there's no simple way to scale up Python applications from your laptop to the cloud. Ray is an open source framework for parallel and distributed computing that makes it easy to program and analyze data at any scale by providing general-purpose high-performance primitives. Dean Wampler teaches you how to use Ray to scale up Python applications, data processing, and machine learning. Read more.
9:00–17:00 Tuesday, 21/04/2020
Training
Secondary topics:  Training
Alex Thomas (John Snow Labs), Maziyar Panahi (John Snow Labs)
Alex Thomas and Maziyar Panahi detail the application of the latest advances in deep learning for common natural language processing (NLP) tasks such as named entity recognition, document classification, sentiment analysis, spell checking, and OCR. You'll learn to build complete text analysis pipelines using the highly performant, scalable open source Spark NLP library in Python. Read more.
9:00–17:00 Tuesday, 21/04/2020
Training
Secondary topics:  Training
Oliver Hughes (Pivotal), Alberto C. Ríos (Pivotal)
Today's data engineer needs a deep understanding of the key tools and concepts within the vast, rapidly evolving Kubernetes ecosystem. Join Oliver Hughes and Alberto C. Ríos to gain a thorough grounding in Kubernetes concepts, learn best practices, and get hands-on with some of the essential tooling. Read more.
9:00–17:00 Tuesday, 21/04/2020
Training
Secondary topics:  Training
Pramod Singh (Walmart Labs), Rajesh Shreedhar Bhat (Walmart Labs)
With the latest developments and improvements in the field of deep learning and artificial intelligence, many demanding natural language processing tasks have become easy to implement and execute. Pramod Singh and Rajesh Shreedhar Bhat demonstrate how to implement text summarization using attention networks. Read more.
9:00–17:00 Tuesday, 21/04/2020
Training
Secondary topics:  Training
Janisha Anand (Amazon Web Services), Nikki Rouda (Amazon Web Services)
Janisha Anand and Nikki Rouda teach you how to build a serverless data lake on AWS. You'll ingest Instacart's public dataset to the data lake and draw valuable insights on consumer grocery shopping trends as you build data pipelines, leverage data lake storage infrastructure, configure security and governance policies, create a persistent catalog of data, perform ETL, and run an ad hoc analysis. Read more.
9:00–17:00 Tuesday, 21/04/2020
Training
Matt Kirk (YourChiefScientist.com)
Join Matt Kirk to dig into the theory, practice, and implementation of reinforcement learning—a highly promising field of machine learning. Read more.
9:00–17:00 Tuesday, 21/04/2020
Training
Secondary topics:  Training
Russell Jurney (Data Syndrome)
AI software is eating the world. Disruption has begun to affect every sector and industry. Your organization is either ahead of the curve or will fall rapidly behind as front runners pull away. Read more.

10:30

10:30–11:00 Tuesday, 21/04/2020
Break (30m)

12:30

12:30–13:30 Tuesday, 21/04/2020
Break (1h)

15:00

15:00–15:30 Tuesday, 21/04/2020
Break (30m)

17:00

17:00–18:00 Tuesday, 21/04/2020
TBC

Wednesday, 22 April 2020

8:15

8:15–8:45 Wednesday, 22/04/2020
Event
Ready, set, network! Meet fellow attendees who are looking to connect at the Strata Data & AI Conference. We'll gather before Wednesday and Thursday keynotes for an informal speed networking event. Be sure to bring your business cards—and remember to have fun. Read more.

9:00

9:00–10:45 Wednesday, 22/04/2020
Keynote
Rachel Roumeliotis (O'Reilly), Alistair Croll (Solve For Interesting)
Strata Data & AI Conference program chairs Rachel Roumeliotis and Alistair Croll welcome you to the first day of keynotes. Read more.

10:45

10:45–11:15 Wednesday, 22/04/2020
Morning break (30m)

11:15

11:15–12:45 Wednesday, 22/04/2020
Interactive session
ML in Production
Max Humber (General Assembly)
Max Humber helps you get your model in front of users as quickly as possible. You'll discover a step-by-step lean ML playbook showing you how to convert your idea into a fully deployed application. Read more.
11:15–12:45 Wednesday, 22/04/2020
TBC
11:15–11:55 Wednesday, 22/04/2020
Rupert Prescot (Elsevier), Jonathan Warner (Elsevier)
The ultimate purpose of data is to drive decisions, but things in the real world commonly aren’t as reliable or accurate as we'd like them to be. The main reason data gets dirty and often unreliable is simple: human intervention. Rupert Prescot and Jonathan Warner are here to help you maintain the reliability of data that's constantly exposed to and updated by your users. Read more.
11:15–11:55 Wednesday, 22/04/2020
Session
Case Studies
Flávio Santos (Spotify)
Data has been a first-class citizen at Spotify since the beginning. It is an important component of the ecosystem that allows data scientists and analysts to improve features and develop new products. Events collected from instrumented clients and backends go through a complex system before they are available for internal teams. This talk goes deep into how event delivery is built inside Spotify. Read more.
11:15–11:55 Wednesday, 22/04/2020
Anna Gressel (Debevoise & Plimpton LLP), Meeri Haataja (Saidot), Jim Pastore (Debevoise & Plimpton LLP)
The Canadian Government made waves when it passed a law requiring AI impact assessments for automated decision systems. Similar proposals are pending in the US and EU. Anna Gressel, Meeri Haataja, and Jim Pastore unpack what an AI impact assessment looks like in practice and how companies can get started from a technical and legal perspective, and they provide tips on assessing AI risk. Read more.
11:15–11:55 Wednesday, 22/04/2020
Session
Case Studies
Conor Sayles (Bank of Ireland)
Conor Sayles details how Bank of Ireland led a data value realization strategy, yielding a return of over €70M and incorporating infrastructure investment, agile management, and design thinking. An analytic system including Tableau, Teradata, SAS, and Cloudera provides a cornerstone for decision making across multiple functions. Underlying the success is a growing data community. Read more.
11:15–11:55 Wednesday, 22/04/2020
Session
Applied ML
Sami Niemi (Barclays)
Predicting transaction payment fraud in real time is an important challenge that state-of-the-art supervised machine learning models can help solve. In the last two years, Barclays has developed and tested different models and implementation solutions. In this talk, you'll learn how state-of-the-art machine learning models can be implemented while meeting strict real-time latency requirements. Read more.
11:15–11:55 Wednesday, 22/04/2020
During the last year, BBC's Datalab team adopted Apache Airflow to improve its recommendation model lifecycle and data processing pipeline. Tatiana Al-Chueyr Martins shares insights and practical examples, achievements, and challenges. You'll leave empowered to decide when to use Airflow. Read more.
11:15–11:55 Wednesday, 22/04/2020
Luyang Wang (Restaurant Brands International (RBI)), Jiao (Jennie) Wang (Intel)
Lu Wang and Jennie Wang explain how to build a real-time menu recommendation system that leverages an attention network, using MXNet, Ray, Apache Spark, and Analytics Zoo in the cloud. You'll learn how to deploy the model and serve real-time recommendations using both cloud and on-device infrastructure in Burger King’s production environment. Read more.
11:15–11:55 Wednesday, 22/04/2020
Ward Van Laer (IxorThink)
A machine learning solution is only as good as it's deemed by the end user. More often than not, we don't think through how results are communicated or measured. Join Ward Van Laer to understand why, if you want business end users to trust and correctly interpret AI models, you might need to make your models transparent and understandable. Read more.
11:15–11:55 Wednesday, 22/04/2020
Session
AI Engineering
Antje Barth (AWS)
Many machine learning systems focus primarily on training models but leave users with the task of deploying and retraining their models. Antje Barth discusses the importance of continuous machine learning for improving model performance and details practical approaches to building continuous model training pipelines using Kubeflow. Read more.
11:15–11:55 Wednesday, 22/04/2020
Session
AI at the Edge
The advance of the industrial internet of things (IIoT) promised much, particularly in the area of predictive maintenance. Tristan O'Gorman digs into whether or not those promises have been realized. You'll learn about the particular technical and strategic challenges that organizations seeking to adopt predictive maintenance have to overcome. Read more.

12:05

12:05–12:45 Wednesday, 22/04/2020
Andy Petrella (Kensu)
Recent papers from Google and the European Commission emphasized the need for solutions to monitor data quality and lineage. Andy Petrella highlights three advantages for monitoring in production: boosting efficiency of data processes, increasing confidence in models in real time, and ensuring accountability to fulfill policies. Read more.
12:05–12:45 Wednesday, 22/04/2020
Session
Case Studies
Enterprise IT has been delivering BI on Hadoop for a few years, but frustrated business analysts and data scientists want self-service data and ML in the cloud, so they can go much faster. Phillip Radley explores the challenges when enterprise IT teams have to quickly pivot from caring for an elephant on-premises to farming herds of clusters, pipelines, and models in clouds. Read more.
12:05–12:45 Wednesday, 22/04/2020
Michael Li (The Data Incubator)
Drawing on experiences gleaned from hundreds of clients, Michael Li provides case studies from companies in a variety of industries that have successfully incorporated data science into their products and services. He presents the Pragmatic Data Framework, which successful clients have embraced to jumpstart their data science efforts and prioritize high-impact data science projects. Read more.
12:05–12:45 Wednesday, 22/04/2020
Session
Case Studies
Kumar Sambhav (Barclays)
People analytics has become key to unlocking human resource insights to understand and measure policy effectiveness and implement improvements by embedding intelligent decision making. Kumar Sambhav draws on people analytics use cases from Barclays to discuss the pipeline it developed and the corresponding controls and governance model that was implemented. Read more.
12:05–12:45 Wednesday, 22/04/2020
Session
Applied ML
Eitan Anzenberg (Bill.com)
Although the field of optical character recognition (OCR) has been around for half a century, document parsing and field extraction from images remain an open research topic. We utilize an end-to-end deep learning architecture that leverages document understanding to extract fields of interest. Read more.
12:05–12:45 Wednesday, 22/04/2020
Siyao Meng (Cloudera), Wei-Chiu Chuang (Cloudera)
Distributed tracing is a well-known technique for identifying where failures occur and the reason behind poor performance, especially for complex systems like Hadoop, which involves many different components. Siyao Meng and Wei-Chiu Chuang demo the work on integrating OpenTracing in the Hadoop ecosystem and outline Cloudera's future integration plan. Read more.
12:05–12:45 Wednesday, 22/04/2020
Alon Nir (Deliveroo)
Alon Nir offers you a glimpse into what a powerful and impactful tool network analysis is. With a plethora of real-world examples and friendly Python syntax, you'll be equipped, and hopefully inspired, to start your journey with network analysis. Read more.
12:05–12:45 Wednesday, 22/04/2020
Dan Sullivan (New Relic)
ML models may perform as expected from a reliability and scalability perspective, but make poor decisions that cost sales and trust. In worst-case scenarios, decisions may violate policies and government regulations. Dan Sullivan showcases techniques for identifying bias, leveraging explainability methods to measure compliance, and incorporating these techniques into DevOps practices. Read more.
12:05–12:45 Wednesday, 22/04/2020
Session
AI Engineering
Oliver Gindele (Datatonic)
Productionizing machine learning (ML) pipelines can be a daunting and difficult task for data scientists. Oliver Gindele highlights some of the newest technologies that address that issue and explains how a global cosmetics brand used them to productionize a serverless ML pipeline in an exciting case study. Read more.
12:05–12:45 Wednesday, 22/04/2020
Session
AI at the Edge
Philip Kendall (Intercept IP)
Philip Kendall offers a look at the challenges involved in training and deploying a unique model to each of tens of thousands of Arduino-class IoT devices to minimize power use and maximize lifetime. The solution involves a high-level simulation of the system on the backend to perform the training and a custom virtual machine on the device to implement the learned model. Read more.

12:45

12:45–14:05 Wednesday, 22/04/2020
Event
Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics. Read more.

14:05

14:05–15:35 Wednesday, 22/04/2020
Interactive session
AI Engineering
Thomas Nield (Nield Consulting Group)
Linear regression, logistic regression, and Naïve Bayes are workhorse machine learning algorithms that achieve practical results with little overhead. As a matter of fact, building these algorithms from scratch (without libraries) is more accessible than you may think! Read more.
14:05–15:35 Wednesday, 22/04/2020
Interactive session
AI at the Edge
Axel Sirota (ASAPP)
In this hands-on training, you'll learn about TensorFlow Lite and how to leverage it to create a machine learning application that can run on your cell phone. Read more.
14:05–14:45 Wednesday, 22/04/2020
Abhishek Somani (Qubole), Shubham Tagra (Qubole), V Rajkumar (Qubole)
Abhishek Somani, Shubham Tagra, and V Rajkumar detail an open source framework for Apache Hive, Apache Spark, and Presto that provides cross-engine ACID transactions and enables performant and cost-effective updates and deletes on big data lakes in the cloud. Read more.
14:05–14:45 Wednesday, 22/04/2020
Session
Case Studies
David Benham (Chesapeake Energy)
Cloudera and Chesapeake Energy present a real-world use case for anomaly detection at scale to reduce time-to-action in response to pipeline blockage. You'll walk through the use case, including the business context, the problem, the machine learning approach taken, the technical architecture employed, and the lessons learned. Read more.
14:05–14:45 Wednesday, 22/04/2020
Maurice Coyle (Trūata)
Is customer trust dead? Maurice Coyle unpacks this question and explores some of the myths around the use of personal data and consumer privacy. He debunks some of the most common data privacy myths and shares valuable insights into the effective use of data for insights-driven organizations. Read more.
14:05–14:45 Wednesday, 22/04/2020
Session
Case Studies
Bhargavi Reddy (Netflix)
Bhargavi Reddy outlines the driving forces for effective data lifecycle management (DLM) at Netflix and the current state of Netflix’s S3 data warehouse, offers an overview of the S3 access logs collection process using SQS and Apache Iceberg, and details how the S3 logs are used for improving the efficiency and security posture of Netflix's cloud infrastructure at scale in the DLM realm. Read more.
14:05–14:45 Wednesday, 22/04/2020
Session
Applied ML
The Gaussian assumption in the Black-Scholes formula for option pricing has proven limited. Today, GANs are the new gold standard for simulation. They've worked wonders in image generation, but it remains to be seen whether they can be applied to option pricing. Alexandre Combessie tells you the story of how two data scientists deployed a GAN for option pricing in real time in 10 days. Read more.
14:05–14:45 Wednesday, 22/04/2020
Ted Dunning (MapR, now part of HPE)
Data pipelines are fast becoming a standard fixture in modern systems, but how to build and maintain them isn't nearly as widely known as, say, building a data warehouse. Ted Dunning demystifies the core building blocks of such pipelines and how to use tools such as TensorFlow (extended), scikit-learn, Apache Flink, and Apache Beam to build, maintain, and monitor them. Read more.
14:05–14:45 Wednesday, 22/04/2020
Session
Streaming
Shradha Ambekar (Intuit)
Data analysis at scale with fast query response is critical for businesses. Cassandra, a popular datastore used in streaming applications, integrates with Spark to run analytical workloads, but these can be slow. Shradha Ambekar unpacks similar challenges faced at Intuit and the solutions her team implemented to improve performance by 100x. Read more.
14:05–14:45 Wednesday, 22/04/2020
Session
Sponsored
Maria Laura Scuri (FACE IT Ltd), Lucy Vasserman (Jigsaw, a unit of Google)
Gaming is one of the most popular mediums for real-time social interactions. While the channel encourages healthy engagement, it can also experience toxicity. Maria Laura Scuri and Lucy Vasserman will discuss how they used AI to fight toxicity and share best practices for implementing machine learning models to create a moderation system that can react to real-time situations. Read more.
14:05–14:45 Wednesday, 22/04/2020
Firms and governments have become more aware of the risk of "black-box" algorithms that "work," but in an opaque way. Existing laws and regulations merely stipulate what ought to be the case and not how to achieve it technically. Richard Sargeant is joined by leading figures from law, technology, and businesses to interrogate this subject. Read more.
14:05–14:45 Wednesday, 22/04/2020
Session
AI Engineering
Robert Drysdale (Accenture, The Dock)
Robert Drysdale takes a look at building, training, and deploying machine learning and deep learning models on the main cloud platforms (AWS, Azure, GCP) and in a platform-agnostic way. Read more.
14:05–14:45 Wednesday, 22/04/2020
Magaly Alonzo (Elter)
Time series is a particular type of data for one reason: time. Because of this single property, time series needs a very specific kind of neural network, one that has memory. Magaly Alonzo offers an overview of what time series is and its properties. You'll also dive into recurrent neural nets, a particular architecture designed for this purpose. Read more.

14:55

14:55–15:35 Wednesday, 22/04/2020
Jeff Evans (StreamSets)
Spark is a powerful tool for data processing, but can it do slowly changing dimensions? The answer is yes, with some thoughtful use of its capabilities. And thanks to Spark’s built-in features, you aren’t limited to databases when it comes to handling deltas and persisting historical changes in records. Jeff Evans includes live demos so you can see these concepts in action. Read more.
14:55–15:35 Wednesday, 22/04/2020
Session
Case Studies
Gabor Kotalik (Deutsche Telekom), Václav Surovec (Deutsche Telekom)
Deutsche Telekom is the fourth-biggest telecommunication company in the world, and every day millions of its customers use its mobile services while roaming. Gabor Kotalik and Václav Surovec explain how the company designed and built its machine learning processes on top of the Cloudera Hadoop cluster to support its commercial roaming business. Read more.
14:55–15:35 Wednesday, 22/04/2020
Daniel Huss (gravityAI)
Many types of algorithms have become commoditized, yet companies continue to use tight resources to try to build these in-house all the time. Considering that according to Gartner, 87% of internal data science projects fail to make it into production, it's crazy to concentrate resources on anything but the most proprietary of projects. Daniel Huss is here to help you decide where to focus. Read more.
14:55–15:35 Wednesday, 22/04/2020
Session
Case Studies
Martin Goodson (Evolution AI)
Combining the exacting requirements of a leading data provider with a university’s expertise led to breakthrough technology that reads balance sheets more accurately than humans. But the journey wasn’t smooth. Martin Goodson shares the project’s structure, outcomes, and mistakes made along the way. Read more.
14:55–15:35 Wednesday, 22/04/2020
Session
Applied ML
Giacomo Bernardi (Extreme Networks)
Machines talk among themselves! What can we learn about their behaviour by analysing their "language"? In this talk, we present a lightweight approach for securing large IoT deployments by leveraging modern natural language processing techniques. Rather than attempting cumbersome firewall rules, we argue that IoT deployments can be efficiently secured by online behavioural modelling. Read more.
14:55–15:35 Wednesday, 22/04/2020
Alejandro Saucedo (The Institute for Ethical AI & Machine Learning)
Managing production machine learning systems at scale has uncovered new challenges that require fundamentally different approaches to traditional software engineering or data science. Alejandro Saucedo explores ML Ops, a concept that often encompasses the methodologies to continuously integrate, deploy and monitor machine learning in production at massive scale. Read more.
14:55–15:35 Wednesday, 22/04/2020
Session
Streaming
Scott Kidder (Mux)
Learn how the Mux Data service has leveraged Kafka and Go to build stateful stream-processing applications that operate on extremely high volumes of video-view beacons to drive real-time monitoring dashboards and historical metrics representing a viewer’s quality of experience. We’ll also cover fault tolerance, monitoring, and Kubernetes container deployments. Read more.
14:55–15:35 Wednesday, 22/04/2020
Morgan Gregory (Google)
The adoption of AI is accelerating. We're reaping many benefits from the advancement of AI, but we're also seeing hints of the unintended harm that occurs when responsibility isn’t front and center. Morgan Gregory explains why it’s critical to understand how and why this happens so we can build our future responsibly, with AI that's fair, safe, trustworthy, and green. Read more.
14:55–15:35 Wednesday, 22/04/2020
Session
AI Engineering
Adam Blum (Auger.AI)
First-generation AutoML was targeted at business analysts and "citizen data scientists": upload data to the service, watch the leaderboard, pick a winning model. Second-generation AutoML is targeted at developers and covers the full AutoML lifecycle. Join Adam Blum to learn how these tools transform applications by replacing logic with predictions. Read more.
14:55–15:35 Wednesday, 22/04/2020
Jonny Hancox (NVIDIA)
Federated learning (FL) is a relatively new technique pioneered to allow you to use much larger datasets to train machine learning models without needing to share sensitive data. Jonny Hancox describes why this technique is ideal for the healthcare sector, in which patient data is highly sensitive, but there's a need to increase the amount of training data to get models to clinically viable levels. Read more.

15:35

15:35–16:35 Wednesday, 22/04/2020
Afternoon break (1h)

16:35

16:35–18:05 Wednesday, 22/04/2020
Interactive session
Data Wrangling and Integration
Sarah Guido (InVision)
Getting your data ready for modeling is the essential first step in the machine learning process. Sarah Guido outlines the basics of preparing and standardizing data for use in machine learning models. Read more.
16:35–18:05 Wednesday, 22/04/2020
Interactive session
Applied ML
Aileen Nielsen (Skillman Consulting)
This talk poses the question of whether deep learning will ever come to dominate time series forecasting as it has come to dominate approaches to language and imagery. We'll both ask the question and provide a partial answer. Read more.
16:35–17:15 Wednesday, 22/04/2020
TBC
16:35–17:15 Wednesday, 22/04/2020
Session
Case Studies
Ben Sykes (Netflix)
Ensuring a consistently great Netflix experience while pushing innovative technology updates is no easy feat. Ben Sykes takes a look at how Netflix turns log streams into real-time metrics to provide visibility into how devices are performing in the field. You'll discover some of the lessons Netflix learned while optimizing Druid to handle its load. Read more.
16:35–17:15 Wednesday, 22/04/2020
TBC
16:35–17:15 Wednesday, 22/04/2020
Session
Case Studies
Kelly Carmody (Dramatic Solutions), Yaakov Bressler (Dramatic Solutions)
Dynamic pricing implemented properly by Broadway, the West End, and smaller theaters shows the promise of increasing revenue while selling more tickets and lowering prices. Kelly Carmody and Yaakov Bressler dig into their work proving the statistics behind dynamic pricing using probability distributions and a variety of modeling techniques in Python. Read more.
16:35–17:15 Wednesday, 22/04/2020
Elias Nema (OLX)
OLX includes 20+ brands, more than 350M monthly active users, and millions of new items added to its platform daily. Naturally, recommender systems play a crucial part in the platform. Elias Nema highlights the data flows and core components used for building, serving, and continuously iterating on recommenders in such a dynamic marketplace. Read more.
16:35–17:15 Wednesday, 22/04/2020
Wojciech Biela (Starburst), Karol Sobczak (Starburst)
Wojciech Biela and Karol Sobczak explore Presto, an open source SQL engine offering high-concurrency, low-latency queries across multiple data sources within one query. With Kubernetes, you can easily deploy and manage Presto clusters across hybrid and multicloud environments with built-in high availability, autoscaling, and monitoring. Read more.
16:35–17:15 Wednesday, 22/04/2020
Ken Johnston (Microsoft), Ankit Srivastava (Microsoft)
Today, normal growth isn't enough; you need hockey-stick levels of growth. Sales and marketing orgs are looking to AI to "growth hack" their way to new markets and segments. Ken Johnston and Ankit Srivastava explain how to use mutual information at scale across massive data sources to help filter out noise and share critical insights with new cohorts of users, businesses, and networks. Read more.
16:35–17:15 Wednesday, 22/04/2020
Hatem Hajri (Institut de recherche technologique SystemX)
Adversarial machine learning studies vulnerabilities of machine learning algorithms in adversarial settings and develops techniques to make learning more robust to adversarial examples. Hatem Hajri outlines adversarial machine learning and illustrates a new approach to address the problem of adversarial examples based on probabilistic techniques. Read more.
16:35–17:15 Wednesday, 22/04/2020
Session
AI at the Edge
Anthony Joseph (My House Geek)
IoT devices are increasing in power and capability, now allowing developers to use machine learning models on the device. Anthony Joseph analyzes a boxing training session with motion sensors onboard IoT devices using the TensorFlow framework and provides user feedback on technique and speed. Read more.
16:35–17:15 Wednesday, 22/04/2020
Julien Simon (AWS)
Julien Simon offers an overview of graph neural networks (GNNs), one of the most exciting developments in machine learning today. You'll discuss real-life use cases for which GNNs are a great fit and get started with GNNs using the Deep Graph Library, an open source library built on top of Apache MXNet and PyTorch. Read more.
16:35–17:15 Wednesday, 22/04/2020
Chris Santiago (Unravel)
Join Chris Santiago, Solution Engineering Manager, to learn how to reduce the time spent troubleshooting and the costs involved in operating your data platform, whether on-prem or in hybrid or multicloud environments. This session demonstrates how Unravel complements and extends your existing data platform. Read more.

17:25

17:25–18:05 Wednesday, 22/04/2020
TBC
17:25–18:05 Wednesday, 22/04/2020
Session
Case Studies
Gabriel Straub walks you through the BBC's experience building a framework for public service recommendations, deploying it in multiple clouds, following the BBC's machine learning principles, and reflecting editorial values to inform, educate, and entertain. Read more.
17:25–18:05 Wednesday, 22/04/2020
TBC
17:25–18:05 Wednesday, 22/04/2020
Session
Case Studies
Mike Lutz (Samtec)
Netflix proposed a novel best practice in using Jupyter notebooks as glue for working in the big data and AI-processing domain. You can follow a manufacturing company's adventure as it tries to implement Netflix's ideas on a dramatically smaller scale. Mike Lutz explains how Netflix's idea can be useful even for the small fry. Read more.
17:25–18:05 Wednesday, 22/04/2020
Session
Streaming
Sijie Guo (StreamNative)
Apache Pulsar, a cloud-native event streaming platform, is gaining adoption in mission-critical services due to its stronger consistency and durability guarantees. This presentation dives deep into the technical details driving the Pulsar adoption trend and showcases a real-world example of using Apache Pulsar to process billions of transactions every day. Read more.
17:25–18:05 Wednesday, 22/04/2020
TBC
17:25–18:05 Wednesday, 22/04/2020
Marko Letic (Mozilla)
Did you know that the beginnings of data visualization are strongly tied to solving some of the biggest problems humanity has ever faced? Wouldn’t it be more interesting to say that you’re not a doctor, but you do save lives than to say you’re just a developer? If you want to know more, join me on this trip through time and beyond. Read more.
17:25–18:05 Wednesday, 22/04/2020
Walid Daboubi (Richemont)
Traditional cybersecurity processes are by definition reactive in that they're based on a set of rules. Walid Daboubi offers you a glimpse into how Richemont made its cybersecurity approach more proactive by applying machine learning on a set of concrete use cases. Read more.
17:25–18:05 Wednesday, 22/04/2020
Session
AI at the Edge
Alasdair Allan (Babilim Light Industries)
The future of machine learning is on the edge and on small, embedded devices. Over the last year, custom silicon, intended to speed up machine learning inferencing on the edge, has started to appear. No cloud needed. Alasdair Allan evaluates the new silicon, looking not just at inferencing speed but also at heating, cooling, and the overall power envelope needed to run it. Read more.
17:25–18:05 Wednesday, 22/04/2020
Meher Kasam (Square)
Meher Kasam, Anirudh Koul, and Siddha Ganju highlight the must-have checklist for everyday AI practitioners to speed up your deep learning training and inference with TensorFlow code examples. Read more.

18:05

18:05–19:05 Wednesday, 22/04/2020
Event
Make your way from booth to booth while you check out all the exhibitors in the Expo Hall on Wednesday after sessions end. Read more.

Thursday, 23 April 2020

8:15

8:15–8:45 Thursday, 23/04/2020
Event
Ready, set, network! Meet fellow attendees who are looking to connect at the Strata Data & AI Conference. We'll gather before Wednesday and Thursday keynotes for an informal speed networking event. Be sure to bring your business cards—and remember to have fun. Read more.

9:00

9:00–10:45 Thursday, 23/04/2020
Keynote
Rachel Roumeliotis (O'Reilly), Alistair Croll (Solve For Interesting)
Strata Data & AI Conference program chairs Rachel Roumeliotis and Alistair Croll welcome you to the second day of keynotes. Read more.

10:45

10:45–11:15 Thursday, 23/04/2020
Break (30m)

11:15

11:15–12:45 Thursday, 23/04/2020
Interactive session
Data Engineering
Jeff Carpenter (DataStax)
In this hands-on training, you’ll learn how to incorporate Apache Cassandra and Apache Kafka into your data pipelines, using the Kafka Connect framework and the DataStax Kafka source and sink Connectors. Read more.
11:15–12:45 Thursday, 23/04/2020
TBC
11:15–11:55 Thursday, 23/04/2020
Session
Governance
Anna Gressel (Debevoise & Plimpton LLP), Jim Pastore (Debevoise & Plimpton LLP), Florian Ostmann (The Alan Turing Institute)
Anna Gressel, Jim Pastore, and Florian Ostmann lead a crash course on the emerging ethical and regulatory issues surrounding fintech AI. You'll hear insights from statements by US and UK regulators in banking and financial services and examine their priorities in 2020. You'll get practical guidance on how you can mitigate ethical and legal risks and position your AI products for success. Read more.
11:15–11:55 Thursday, 23/04/2020
Session
Case Studies
Melissa Singh (TD Bank), Pirabu Pathmasenan (TD Bank)
Melissa Singh and Pirabu Pathmasenan walk you through TD Bank's data-driven transformation. You'll learn how it started, where it is today, and where it's going with big data and AI. You'll uncover shifts in the company's cultural paradigm, along with the technical tools and practices used to transition traditional analytics teams into the world of big data and AI. Read more.
11:15–11:55 Thursday, 23/04/2020
Simon Lidberg (Microsoft), Benjamin Wright-Jones (Microsoft)
DevOps, DevSecOps, AIOps, ML Ops, Data Ops, No Ops....Ditch your confusion and join Simon Lidberg and Benjamin Wright-Jones to understand what DevOps means for AI and your organization. Read more.
11:15–11:55 Thursday, 23/04/2020
Session
Case Studies
Andras Szabo (Pivigo), Adam Hill (HAL24K)
Wildfires are a major environmental and health risk, with a frequency that has increased dramatically in the past decade. Early detection is critical; however, wildfires are most often discovered only through eyewitness accounts. This talk describes a data science partnership between HAL24K and Pivigo aimed at building an automated wildfire detection system using NOAA satellite data. Read more.
11:15–11:55 Thursday, 23/04/2020
Session
Applied ML
Brandy Freitas (Pitney Bowes)
Brandy Freitas demystifies the mathematical principles behind graph databases, offers a primer to graph native algorithms, and outlines the current use of graph technology in industry. Read more.
11:15–11:55 Thursday, 23/04/2020
Jacques Nadeau (Dremio)
Join in for a review of how to build a successful cloud data lake. Jacques Nadeau leads a deep dive into key topics such as landing, ETL, security, cost and performance trade-offs, and access patterns, as well as technologies such as Apache Arrow, Iceberg, and Spark in the context of real-world customer deployments. Read more.
11:15–11:55 Thursday, 23/04/2020
TBC
11:15–11:55 Thursday, 23/04/2020
Session
AI Engineering
Thunder Shiviah (Databricks), Cyrielle Simeone (Databricks)
Thunder Shiviah and Cyrielle Simeone dive into MLflow, an open source platform from Databricks, to manage the complete ML lifecycle, including experiment tracking, model management, and deployment. With over 140 contributors and 800,000 monthly downloads on PyPI, MLflow has gained tremendous community adoption, demonstrating the need for an open source platform for the ML lifecycle. Read more.
11:15–11:55 Thursday, 23/04/2020
Around the world, IKEA has an ever-growing number of loyalty club (Family) members. An important part of IKEA’s ongoing digital transformation is to improve communication with these customers and to inspire them with offers that are most relevant for improving their everyday life. Kim Falk shares IKEA's work on personalizing promotional emails. Read more.
11:15–11:55 Thursday, 23/04/2020
Holden Karau (Independent), Trevor Grant (IBM)
We'll show you a way to get and keep your models in production with Kubeflow. Read more.
11:15–11:55 Thursday, 23/04/2020
Session
Streaming
Jason Bell (Independent Speaker)
Apache Pulsar gives you the same robust real-time messaging capabilities as Kafka. Jason Bell examines the challenges of migrating from an existing Kafka cluster to Apache Pulsar and what considerations you need to make with brokers, topics, retention, consumers, and producers. Read more.

12:05

12:05–12:45 Thursday, 23/04/2020
Session
Governance
Dean Wampler (Anyscale)
Production deployment of machine learning (ML) models requires data governance, because models are data. Dean Wampler justifies that claim and explores its implications and techniques for satisfying the requirements. Using motivating examples, you'll explore reproducibility, security, traceability, and auditing, plus some unique characteristics of models in production settings. Read more.
12:05–12:45 Thursday, 23/04/2020
Session
Case Studies
Criteo's infrastructure provides capacity and connectivity to host its platform and applications; the evolution of its infrastructure is driven by the ability to forecast traffic demand. Hamlet Jesse Medina Ruiz explains how Criteo uses Bayesian dynamic time series models to accurately forecast its traffic load and optimize hardware resources across data centers. Read more.
12:05–12:45 Thursday, 23/04/2020
Kevin Kim (Socar)
Socar has been seriously focused on data operations. Kevin Kim describes how Socar is redefining the car-sharing industry through data science, with an experiment-based pricing strategy, machine learning–based demand prediction, optimized car management, accident risk profiling, and much more. Read more.
12:05–12:45 Thursday, 23/04/2020
Session
Case Studies
Rick Houlihan (Amazon Web Services)
When Amazon decided to migrate thousands of application services to NoSQL, many of those services required complex relational models that could not be reduced to simple key-value access patterns. The most commonly documented use cases for NoSQL are simplistic. This session shows how to model complex relational data efficiently in denormalized structures. Read more.
12:05–12:45 Thursday, 23/04/2020
Session
Applied ML
Jonathan Leslie (Pivigo)
MADE.com are a furniture and homewares retailer with a unique online-only business model. Given this format, it is crucial that customer service agents are able to respond to queries quickly and accurately. However, it can often be difficult to match the demand of incoming requests. This talk describes a project aimed at developing a framework for automated responses to customer queries. Read more.
12:05–12:45 Thursday, 23/04/2020
Francesco Mucio (Francescomuc.io)
Sit down and play data engineering worst practices bingo. From cloud infrastructure to stream processing, from data lakes to analytics, you'll see what can go wrong and the reasons behind these decisions. Francesco Mucio has been collecting stories for almost 20 years, and it's finally time to give back. If you recognize your organization in some of them, well, Francesco told you to sit down. Read more.
12:05–12:45 Thursday, 23/04/2020
TBC
12:05–12:45 Thursday, 23/04/2020
Session
NLP
Natural language processing (NLP) tasks using supervised ML perform poorly where conversational context is involved. Perumal Sudalai Kumaresa details how deep reinforcement learning (DRL) better handles NLP problems such as Q&A, dialogue generation, and article summarization by simulating two agents that take turns exploring the state-action space and learning a policy. Read more.
12:05–12:45 Thursday, 23/04/2020
Session
AI Engineering
Melanie Laffin (Booz Allen Hamilton)
Traditional automation is typically limited to clear-cut business rules that can be easily programmed. Melanie Laffin expands what automation can do by adding eyes (computer vision), a brain (general AI models), and speech (natural language processing) to automations to enhance their ability. Read more.
12:05–12:45 Thursday, 23/04/2020
Miguel Martínez (NVIDIA)
GPU acceleration has been at the heart of scientific computing and artificial intelligence for many years now. Since the launch of RAPIDS last year, this vast computational resource has become available for data science workloads too. Miguel Martínez details the RAPIDS framework, a GPU-accelerated drop-in replacement for utilities such as pandas, scikit-learn, NetworkX, and XGBoost. Read more.
12:05–12:45 Thursday, 23/04/2020
TBC
12:05–12:45 Thursday, 23/04/2020
Session
Streaming
Itai Yaffe (Nielsen)
Nielsen Marketing Cloud leverages Apache Druid to provide its customers (marketers and publishers) real-time analytics tools for various use cases, including in-flight analytics, reporting, and building target audiences. Itai Yaffe digs into advanced Druid techniques, such as efficient ingestion of billions of events per day, query optimization, and data retention and deletion. Read more.

12:45

12:45–14:05 Thursday, 23/04/2020
Event
Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics. Read more.

14:05

14:05–15:35 Thursday, 23/04/2020
Interactive session
Joseph Nelson (Roboflow)
In this session, Joseph walks you through the end-to-end flow required to train a model for mobile deployment, including image collection, preprocessing and augmenting considerations, model training, and saving the TFLite model in an appropriate format for deployment. For this session, participants should have awareness of machine learning, familiarity with Python, and their laptops. Read more.
14:05–15:35 Thursday, 23/04/2020
TBC
14:05–14:45 Thursday, 23/04/2020
Session
Governance
Sarah Gold (Projects by IF)
People care about how data about them is used. Building trust with consumers will require a change in how services treat data. Since 2016, IF has curated a data patterns catalogue which is used by product teams around the world. We’ll show how patterns help teams build digital services that give people agency over data, build trust and start addressing systemic balances of power. Read more.
14:05–14:45 Thursday, 23/04/2020
Session
Case Studies
TBC
14:05–14:45 Thursday, 23/04/2020
Asif Jan (Roche)
Advances in AI and ML are critical to advancing the understanding of diseases and bringing better and more efficacious treatments to patients, realizing the dream of personalized healthcare. Asif Jan shares insights from building data science teams in pharma and outlines a road map for success of AI and ML in the pharma industry. Read more.
14:05–14:45 Thursday, 23/04/2020
Session
Case Studies
Almost two years ago EnBW developed its core beliefs for the role of AI at EnBW and derived concrete actions that need to be taken to scale its AI activities. Rainer Hoffmann and Frank Säuberlich describe the actions and the challenges EnBW has faced on its journey so far and its approach to mastering these challenges. Read more.
14:05–14:45 Thursday, 23/04/2020
Session
Applied ML
Davin Kaing (IBM)
What is driving revenue? How can we improve our client experience? These are causal questions that many organizations face. Answering these questions using data can be challenging, especially since in most cases, only observational data are available. We will go through an overview of both traditional and modern causal inference techniques and address their limitations and applications. Read more.
14:05–14:45 Thursday, 23/04/2020
Session
Computer vision
Mary Wahl (Microsoft), Ye Xing (Microsoft)
With the increasing availability of massive high-resolution aerial imagery, the geospatial information system community and the computer vision (CV) community joined forces in the new field of "geo AI." Mary Wahl and Ye Xing introduce you to this new field with live demos and sample code for common AI applications to aerial imagery from both commercial and government use cases. Read more.
14:05–14:45 Thursday, 23/04/2020
Tal Doron (GigaSpaces)
More enterprises are using big data for better business decision making, but existing infrastructure lacks the performance and scale needed to support the growing requirements for real-time analysis and visualization of operational data. Tal Doron outlines how you can achieve BI visualization on fresh data for real-time dashboards and low-latency response time when generating reports. Read more.
14:05–14:45 Thursday, 23/04/2020
Session
NLP
Nipun Sadvilkar (Episource)
Episource is building a clinical natural language processing (NLP) engine to extract from medical charts to automate coding in claims submissions using a medical coder's expertise to review highlighted entities and autosuggested ICD10 codes. Nipun Sadvilkar details building a key component of Episource's clinical NLP engine—Clinical NER—from data annotation to models and techniques. Read more.
14:05–14:45 Thursday, 23/04/2020
Bargava Subramanian (Binaize), Amit Kapoor (narrativeVIZ)
Bargava Subramanian and Amit Kapoor use two real-world examples to show how you can quickly build visual data products using TensorFlow.js to address the challenges inherent in understanding the strengths, weaknesses, and biases of your models as well as involving business users to design and develop a more effective model. Read more.
14:05–14:45 Thursday, 23/04/2020
Robert Crowe (Google)
Production ML must address issues of modern software methodology as well as issues unique to ML. Different types of ML have different requirements, often driven by different data lifecycles and ground truth. And implementations often suffer from limitations in modularity, scalability, and extensibility. Robert Crowe examines production ML applications and reviews TensorFlow Extended (TFX). Read more.
14:05–14:45 Thursday, 23/04/2020
TBC

14:55

14:55–15:35 Thursday, 23/04/2020
Session
Governance
Robin Jose (Scorable)
A key challenge to AI adoption is the lack of transparency of black-box models. This talk shows how a Berlin-based startup democratized credit risk assessment with explainable AI. The black-box nature of AI raises concerns about adoption, regulation, and ethical use. We present the hope that explainable AI could not only solve this problem but, in doing so, make the world a better place. Read more.
14:55–15:35 Thursday, 23/04/2020
Session
Case Studies
Lukumon Oyedele (University of the West of England)
The time spent by frontline construction workers can be reduced by 50% through a hands-free assembly support building information modeling (BIM) system. Lukumon Oyedele explains how to make it possible for onsite construction workers to seek support from BIM through verbal query and augmented display through conversational AI and augmented reality (AR). Read more.
14:55–15:35 Thursday, 23/04/2020
Lomit Patel (IMVU)
The future of customer acquisition rests on the shoulders of leveraging intelligent machines, orchestrating complex campaigns across key marketing platforms—dynamically allocating budgets, pruning creatives, surfacing insights, and taking actions powered by AI. Lomit Patel shows you how to use AI and machine learning (ML) to provide an operational layer to deliver meaningful results. Read more.
14:55–15:35 Thursday, 23/04/2020
Session
Case Studies
Nutsa Abazadze (TBC Bank), Aleksandre Lomadze (TBC Bank)
We will tell you how our failed attempt to build an ML model brought us to discovering institutional problems and kicked off improvement of existing business processes so that we would collect quality data for future modeling; and how we still managed to increase deposit profitability by 20% in the process. Read more.
14:55–15:35 Thursday, 23/04/2020
Session
Applied ML
Fredrik Schlyter (Violet ventures)
Finvoice started out as a small project consisting of one machine learning engineer and 50 invoices; today it's used by companies that scan over 80 million invoices per year. Fredrik Schlyter describes how machine learning can capture payment information on invoices and how it expanded from a cloud-based API solution to doing the inference directly on customers' mobile phones. Read more.
14:55–15:35 Thursday, 23/04/2020
Session
Computer vision
Angus Taylor (Microsoft), Patrick Buehler (Microsoft)
Training and deployment of deep neural networks for computer vision (CV) in realistic business scenarios remains a challenge for both data scientists and engineers. Angus Taylor and Patrick Buehler dig into state-of-the-art in the CV domain and provide resources and code examples for various CV tasks by leveraging the Microsoft CV best-practices repository. Read more.
14:55–15:35 Thursday, 23/04/2020
Shradha Ambekar (Intuit), Sunil Goplani (Intuit)
Imagine a business metric showing a sudden spike. Debugging data pipelines is nontrivial and finding the root cause can take hours to days. Shradha Ambekar and Sunil Goplani outline how Intuit built a self-serve tool that automatically discovers data pipeline lineage and applies anomaly detection to detect and debug issues in minutes. Read more.
14:55–15:35 Thursday, 23/04/2020
Session
NLP
Barbara Fusinska (Google)
Natural language processing (NLP) offers techniques to gain insight from and generate text data. Barbara Fusinska introduces you to NLP concepts and deep learning architectures using document context. You'll see a series of demos with TensorFlow from classification task to text generation. Read more.
14:55–15:35 Thursday, 23/04/2020
Marcel Blattner (Tamedia)
We still lack a clear understanding of how deep learning neural networks learn. Theoretical physics can provide some tools to gain more insight about generalization and model robustness. Marcel Blattner offers an overview of ongoing research and the first promising and applicable results. Read more.
14:55–15:35 Thursday, 23/04/2020
Charu Jaiswal (integrate.ai)
You train ML models and deploy them into the wild. And then the performance of your models decreases over time as business operations and customer behaviors change. You may only notice months later, incurring costly results. Charu Jaiswal explains how to fight back against performance loss by monitoring, testing, and retraining ML models actively in production. Read more.
14:55–15:35 Thursday, 23/04/2020
TBC

15:35

15:35–16:35 Thursday, 23/04/2020
Break (1h)

16:35

16:35–17:15 Thursday, 23/04/2020
TBC
16:35–17:15 Thursday, 23/04/2020
TBC
16:35–17:15 Thursday, 23/04/2020
Session
Governance
Majken Sander (Majken Sander)
Join Majken Sander to learn about the importance of data literacy and ethics. Schools and society in general need to educate citizens to raise their digital awareness. Companies need to build their employees' data literacy competencies. And a company's digital economy strategy should include data ethics, which the company might also choose to embrace as a competitive edge gained via branding value. Read more.
16:35–17:15 Thursday, 23/04/2020
Session
Case Studies
Jennifer Yang (Wells Fargo ECS)
Traditional rule-based data quality management is costly and scales poorly. It requires subject matter experts within business, data, and technology domains. This presentation discusses a use case that demonstrates how machine learning techniques can be used for data quality management on a big data platform in the financial industry. Read more.
16:35–17:15 Thursday, 23/04/2020
Victor Gonzalez (ConCrédito)
Victor Gonzalez explores how the fintech ecosystem changes the rules of the financial services industry in Mexico. ConCrédito's data-driven digital transformation project is the basis for the growth and scope of its business objectives. The business needed to migrate from the traditional model to digital processes, allowing ConCrédito to be in the hands of its customers. Read more.
16:35–17:15 Thursday, 23/04/2020
Session
Case Studies
Kim Nilsson (Pivigo), Robert Grieg-Gran (Mindful Chef)
Mindful Chef is a health-focused company that delivers weekly recipe boxes. To create a more personalised experience for its customers, it teamed up with Pivigo to develop an innovative recommender system. This talk covers the project and the development of a novel approach to understanding user taste that had an unexpectedly large impact on recommendation accuracy. Read more.
16:35–17:15 Thursday, 23/04/2020
Session
Computer vision
Tuhin Sharma (Binaize), Pravin Jha (Ameren)
Offline signature verification is one of the most critical tasks in traditional banking and financial industries. The unique challenge is to detect subtle but crucial differences between genuine and forged signatures. This verification task is even more challenging in writer-independent scenarios. Tuhin Sharma and Pravin Jha detail few-shot image classification. Read more.
16:35–17:15 Thursday, 23/04/2020
Lambda architecture is a general-purpose architecture for data platforms. It's been known for a while but was always hard to implement. Viacheslav Inozemtsev explains how, with the release of Delta Lake tables after Spark Structured Streaming became mature, Lambda architecture can now be implemented much more easily than before for analytical and machine learning use cases. Read more.
16:35–17:15 Thursday, 23/04/2020
Swasti Kakker (LinkedIn), Manu Ram Pandit (LinkedIn), Navneet Verma (LinkedIn)
Come and learn about the challenges we overcame to make Darwin (Data Analytics and Relevance Workbench at LinkedIn) a reality. Learn how data scientists, developers, and analysts at LinkedIn can share their notebooks with their peers, author work in multiple languages, have their custom execution environments, execute long-running jobs, and do much more on a single hosted notebooks platform. Read more.
16:35–17:15 Thursday, 23/04/2020
Session
NLP
Markus Ludwig (Scout24)
Markus Ludwig shares insights from training and deploying a Transformer model that translates natural language to structured search queries. You'll cover the entire journey from idea to product, from teaching the model new tricks to helping it forget bad habits and iteratively refining the user experience. Read more.
16:35–17:15 Thursday, 23/04/2020
Debasish Ghosh (Lightbend), Stavros Kontopoulos (Lightbend)
Debasish Ghosh and Stavros Kontopoulos explore online machine learning algorithm choices for streaming applications, including resource-constrained use cases like IoT and personalization, complete with code samples. You'll learn about drift detection algorithms and Hoeffding Adaptive Trees, performance metrics for online models, and practical concerns with deployment in production. Read more.
16:35–17:15 Thursday, 23/04/2020
Session
Streaming
Naghman Waheed (Bayer Crop Science)
IT information systems are a key enabler for Bayer's business in a very competitive environment. As the complexity of its business grows, so does the need to provide data for real-time business analytics and BI. Naghman Waheed walks you through the unique architecture that streams data out of its SAP ERP using SAP SLT and Kafka, enabling business decisions based on real-time events. Read more.

Contact us

confreg@oreilly.com

For conference registration information and customer service

partners@oreilly.com

For more information on community discounts and trade opportunities with O’Reilly conferences

Become a sponsor

For information on exhibiting or sponsoring a conference

pr@oreilly.com

For media/analyst press inquiries