Mar 15–18, 2020

Schedule: Data Science and Machine Learning sessions

9:00am–12:30pm Monday, March 16, 2020
Location: LL21A
Alice Zhao (Metis)
Data scientists are known to crunch numbers, but you may run into text data. Alice Zhao walks you through the steps to turn text data into a format that a machine can understand, identifies some of the most popular text analytics techniques, and showcases several natural language processing (NLP) libraries in Python, including the Natural Language Toolkit (NLTK), TextBlob, spaCy, and gensim.
9:00am–12:30pm Monday, March 16, 2020
Location: LL21 C
Sourav Dey (Manifold), Alex Ng (Manifold)
Today, ML engineers are working at the intersection of data science and software engineering, that is, MLOps. This tutorial walks through the six steps of the Lean AI process and explains how it helps ML engineers work as an integrated part of development and production teams. We’ll also work through a hands-on example using real-world data, so you can get up and running seamlessly.
1:30pm–5:00pm Monday, March 16, 2020
Location: LL20D
Robert Nishihara (University of California, Berkeley), Ion Stoica (University of California, Berkeley), Philipp Moritz (University of California, Berkeley)
There's no easy way to scale up Python applications to the cloud. Ray is an open source framework for parallel and distributed computing, making it easy to program and analyze data at any scale by providing general-purpose high-performance primitives. Robert Nishihara, Ion Stoica, and Philipp Moritz demonstrate how to use Ray to scale up Python applications, data processing, and machine learning.
1:30pm–5:00pm Monday, March 16, 2020
Location: LL21A
David Talby (Pacific AI), Alex Thomas (John Snow Labs), Claudiu Branzan (Accenture)
David Talby, Alex Thomas, and Claudiu Branzan detail the application of the latest advances in deep learning for common natural language processing (NLP) tasks such as named entity recognition, document classification, sentiment analysis, spell checking, and OCR. You'll learn to build complete text analysis pipelines using the highly performant, scalable, open source Spark NLP library in Python.
11:00am–11:40am Tuesday, March 17, 2020
Location: LL21B
TBC Bank is in transition from a product-centric to a client-centric company. An obvious application of analytics is developing personalized next-best-product recommendations for clients. After considering various collaborative filtering approaches, we decided to implement an ALS user-item matrix factorization method and a demographic model. As a result, our pilot increased sales conversion rates by 70%.
11:50am–12:30pm Tuesday, March 17, 2020
Location: LL21A
Identifying customer stages in a buying cycle enables marketers to perform personalized targeting depending on the stage of a customer. In this talk, we explain ML techniques to analyze the online journey of a customer's clickstream behavior to find the different stages of the customer's buying cycle and quantify the critical click events that help transition a user from one stage to another.
11:50am–12:30pm Tuesday, March 17, 2020
Location: LL21 C
Secondary topics: Technology Ethics
Guillaume Saint-Jacques (LinkedIn Corporation), Meg Garlinghouse (LinkedIn Corporation)
Most companies want to ensure their products and algorithms are “fair.” In this talk, we share our A/B testing approach to fairness, describing new methods that detect whether an experiment is introducing bias or inequality. We share a scalable implementation on Spark and go through examples of use cases and impact at LinkedIn.
1:45pm–2:25pm Tuesday, March 17, 2020
Location: Expo Hall
Eitan Anzenberg (Bill.com)
Although the field of optical character recognition (OCR) has been around for almost half a century, document parsing and field extraction from images remain an open research topic. We utilize an end-to-end deep learning and OCR architecture to predict regions of interest within documents and automatically extract their text.
1:45pm–2:25pm Tuesday, March 17, 2020
Location: LL20C
Secondary topics: Security and Privacy
Haopei Wang (DataVisor)
We describe the design and implementation of a system that automatically extracts fraud-related features for digital identifiers commonly collected by online services. We detail our approach to real-time feature computation and to creating templates for feature generation. Our system has been applied successfully to fraud detection as well as good-user analysis.
2:35pm–3:15pm Tuesday, March 17, 2020
Location: LL20D
Zak Hassan (Red Hat Inc)
The number of logs is increasing constantly, and no human will, or can, monitor them all. We employ NLP for text encoding and machine learning methods for automated anomaly detection, in an effort to construct a tool that could help developers perform root cause analysis more quickly on failing applications. We also provide a means of giving feedback to the ML algorithm so that it can learn from false positives.
2:35pm–3:15pm Tuesday, March 17, 2020
Location: LL21B
Secondary topics: Security and Privacy
Sathya Chandran (DataVisor)
In this talk, we provide key insights into current trends in account takeover (ATO) fraud by analyzing 52 billion events generated by 1.1 billion users. Based on these insights, we develop a set of features called user mobility features to capture suspicious device and IP switching patterns. Finally, we incorporate mobility features into an anomaly detection solution to detect suspicious account activity in real time.
4:15pm–4:55pm Tuesday, March 17, 2020
Location: LL21B
Uber spends hundreds of millions of dollars on marketing and constantly optimizes the allocation of these budgets. It deploys complex models, using Python and PyTorch, and borrows from machine learning (ML) to speed up the solvers that optimize marketing investment. Mario Vinasco explains the framework of the marketing spend problem and how it was implemented.
4:15pm–4:55pm Tuesday, March 17, 2020
Location: LL21 C
Lior Gavish (Barracuda)
Lior Gavish breaks down a machine learning (ML)-based system that detects a highly evasive type of email-based fraud. The system combines innovative techniques for labeling and classifying highly unbalanced datasets with a distributed cloud application capable of processing high-volume communication in real time.
11:00am–11:40am Wednesday, March 18, 2020
Location: LL20D
Jaya Susan Mathew (Microsoft)
With the need to cater to a global audience, there's a growing demand for applications to support speech identification, translation, and transliteration from one language to another. Jaya Susan Mathew explores this topic and how to quickly use some of the readily available APIs to identify, translate, or even transliterate speech or text within your application.
11:00am–11:40am Wednesday, March 18, 2020
Location: LL21 C
Talia Tron (Intuit), Joy Rimchala (Intuit)
Explainable AI (XAI) has gained industry traction, given the importance of explaining ML-assisted decisions in human terms and detecting undesirable ML defects before systems are deployed. Intuit data scientists delve into XAI developments and techniques, the advantages and drawbacks of black box versus glass box models, concept-based diagnostics, and real-world examples using Design Thinking principles.
11:50am–12:30pm Wednesday, March 18, 2020
Location: LL21 C
Patryk Oleniuk (Virgin Hyperloop One), Sandhya Raghavan (Virgin Hyperloop One)
How can demand data be used to improve the design of the fifth mode of transport, Hyperloop? We’ll discuss passenger demand prediction methods and the tech stack (Spark/Koalas, Keras, MLflow) we used to build a DNN-based near-future demand prediction model for simulation purposes.
1:45pm–2:25pm Wednesday, March 18, 2020
Location: LL20A
Secondary topics: Streaming and IoT
Denise Gosnell (DataStax)
Self-organizing networks rely on sensor communication and a centralized mechanism, like a cell tower, for transmitting the network's status. So, what happens if the tower goes down? And, how does a graph data structure get involved in the network's healing process? In this session, we will show you how we see graphs in this dynamic network and how path information helps sensors come back online.
2:35pm–3:15pm Wednesday, March 18, 2020
Location: Expo Hall
Benjamin Batorsky (MIT Sloan)
Identifying and labelling named entities such as companies or people in text is a key part of text processing pipelines. In this talk, I will outline how to train, test, and implement a Named Entity Recognition (NER) model with spaCy. I will focus on how our team is using these techniques with large, non-English corpora.
2:35pm–3:15pm Wednesday, March 18, 2020
Location: LL21B
Ravi Krishnaswamy (Autodesk)
Today’s applications interact with data in a distributed and decentralized world. Using graphs at scale, you can infer communities and your interaction by tracking access to common data across users and applications. Ravi Krishnaswamy displays a real-world product example with millions of users that uses the combined powers of Spark and graph databases to gain insights into customer workflows.
2:35pm–3:15pm Wednesday, March 18, 2020
Location: LL21 E/F
Jay Budzik (ZestFinance)
More companies are adopting ML to run key business functions. The best-performing models combine diverse model types into stacked ensembles, but explaining these hybrid models accurately has been impossible until now. Hear how ZestFinance developed a new technique, Generalized Integrated Gradients (GIG), to explain complex ensembled ML models that are safe to use in high-stakes applications.
4:15pm–4:55pm Wednesday, March 18, 2020
Location: LL20D
Nisha Muktewar (Cloudera Fast Forward Labs), Victor Dibia (Cloudera Fast Forward Labs)
In many business use cases, it is frequently desirable to automatically identify and respond to abnormal data. This process can be challenging, especially when working with high-dimensional, multivariate data. This talk explores deep learning approaches (sequence models, VAEs, GANs) for anomaly detection, along with performance benchmarks and product possibilities.