Mar 15–18, 2020

Schedule: Deep dive into specific tools, platforms, or frameworks sessions

9:00am–12:30pm Monday, March 16, 2020
Location: LL21A
Alice Zhao (Metis)
Data scientists are known for crunching numbers, but you may also run into text data. Alice Zhao walks you through the steps to turn text data into a format that a machine can understand, identifies some of the most popular text analytics techniques, and showcases several natural language processing (NLP) libraries in Python, including the Natural Language Toolkit (NLTK), TextBlob, spaCy, and gensim.
9:00am–12:30pm Monday, March 16, 2020
Location: LL21 C
Sourav Dey (Manifold), Alex Ng (Manifold)
Today, ML engineers are working at the intersection of data science and software engineering (that is, MLOps). This tutorial walks through the six steps of the Lean AI process and explains how it helps ML engineers work as an integrated part of development and production teams. We’ll also work through a hands-on example using real-world data, so you can get up and running seamlessly.
1:30pm–5:00pm Monday, March 16, 2020
Location: LL20D
Robert Nishihara (University of California, Berkeley), Ion Stoica (University of California, Berkeley), Philipp Moritz (University of California, Berkeley)
There's no easy way to scale up Python applications to the cloud. Ray is an open source framework for parallel and distributed computing that provides general-purpose, high-performance primitives, making it easy to program and analyze data at any scale. Robert Nishihara, Ion Stoica, and Philipp Moritz demonstrate how to use Ray to scale up Python applications, data processing, and machine learning.
1:30pm–5:00pm Monday, March 16, 2020
Location: LL21A
David Talby (Pacific AI), Alex Thomas (John Snow Labs), Claudiu Branzan (Accenture)
David Talby, Alex Thomas, and Claudiu Branzan detail the application of the latest advances in deep learning for common natural language processing (NLP) tasks such as named entity recognition, document classification, sentiment analysis, spell checking, and OCR. You'll learn to build complete text analysis pipelines using the highly performant, scalable, open source Spark NLP library in Python.
11:50am–12:30pm Tuesday, March 17, 2020
Location: Expo Hall
Abe Gong (Superconductive Health)
Data organizations everywhere struggle with pipeline debt: untested, unverified assumptions that corrupt data quality, drain productivity, and erode trust in data. This presentation shares best practices gathered from across the data community in the course of developing Great Expectations, a leading open source library for fighting pipeline debt and ensuring data quality.
1:45pm–2:25pm Tuesday, March 17, 2020
Location: LL21B
Joseph Sirosh (Microsoft)
Compass is changing real estate by using its industry-leading software to build search and analytical tools that help real estate professionals find, market, and sell homes. In this session, Compass engineers discuss how they use AWS services, including Amazon Elasticsearch Service, to deliver a complete, scalable home-search solution.
2:35pm–3:15pm Tuesday, March 17, 2020
Location: LL20A
Michael Freedman (TimescaleDB | Princeton University)
Time series data is everywhere, with monitoring and IoT applications generating tens of millions of metrics per second and petabytes of data. In this talk, Michael Freedman shows how to build a distributed time series database that offers the power of full SQL at scale.
4:15pm–4:55pm Tuesday, March 17, 2020
Location: LL21B
Uber spends hundreds of millions of dollars on marketing and constantly optimizes the allocation of these budgets. It deploys complex models built with Python and PyTorch and borrows techniques from machine learning (ML) to speed up the solvers that optimize marketing investment. Mario Vinasco explains the framework of the marketing spend problem and how it was implemented.
5:05pm–5:45pm Tuesday, March 17, 2020
Location: LL20C
Lars George (Okera)
With multiple security layers and different departments responsible for different types of data, managing security and governance within AWS IAM poses a number of challenges. This session discusses the security layers, why IAM presents such a conundrum, whether IAM is actually slowing down data projects, and the access control requirements of data lakes.
11:50am–12:30pm Wednesday, March 18, 2020
Location: 230 A
Jay Smith (Google), Remy Welch (Google Cloud)
Data is a valuable resource, but collecting and analyzing it can be challenging. Further, the cost of resource allocation often limits the speed at which analysis can take place. Jay Smith and Remy Welch show you how serverless architecture can improve the portability and scalability of streaming event-driven Apache Spark jobs and how to perform ETL tasks using serverless frameworks.
1:45pm–2:25pm Wednesday, March 18, 2020
Location: LL21 D
Arvind Prabhakar (StreamSets)
DataOps is the best approach for enterprises to improve business and is currently driving future revenue streams and competitive differentiation, which is why so many businesses are rethinking their data strategy. DataOps solves all the problems that come along with managing data movement at scale.
2:35pm–3:15pm Wednesday, March 18, 2020
Location: LL21 E/F
Jay Budzik (ZestFinance)
More companies are adopting ML to run key business functions. The best-performing models combine diverse model types into stacked ensembles, but explaining these hybrid models accurately has been impossible until now. Hear how ZestFinance developed a new technique, Generalized Integrated Gradients (GIG), to explain complex ensembled ML models that are safe to use in high-stakes applications.
2:35pm–3:15pm Wednesday, March 18, 2020
Location: LL20A
Wangda Tan (Cloudera), Arpit Agarwal (Cloudera)
In 2020, Hadoop is still evolving fast! In this talk, we’ll start with the current status of the Apache Hadoop community, then move on to the exciting present and future of Hadoop 3.x. We will cover new features like Hadoop on Cloud, GPU support, NameNode federation, Docker, 10x scheduling improvements, Ozone, etc. We will also offer guidance on upgrading from 2.x to 3.x.
4:15pm–4:55pm Wednesday, March 18, 2020
Location: 230 A
Sijie Guo (StreamNative), Yong Zhang (StreamNative)
This presentation dives deep into the details of Pulsar transactions and how they can be used in Pulsar Functions and other processing engines to achieve transactional event streaming.
4:15pm–4:55pm Wednesday, March 18, 2020
Location: LL20D
Nisha Muktewar (Cloudera Fast Forward Labs), Victor Dibia (Cloudera Fast Forward Labs)
In many business use cases, it is desirable to automatically identify and respond to abnormal data. This process can be challenging, especially when working with high-dimensional, multivariate data. This talk explores deep learning approaches (sequence models, VAEs, GANs) for anomaly detection, along with performance benchmarks and product possibilities.