Data Science, Machine Learning, & AI: Data science + business analytics training: Strata Data Conference

Wednesday, September 25: Keynotes & Sessions (Platinum, Gold, Silver & Bronze passes)
8:45am | Location: 3E Strata Data Conference Keynotes
10:50 Morning break

Thursday, September 26: Keynotes & Sessions (Platinum, Gold, Silver & Bronze passes)
8:45am | Location: 3E Strata Data Conference Keynotes
10:50 Morning break

9:00am - 5:00pm Monday, September 23 & Tuesday, September 24

Location: 1A 03

Recommendation systems using deep learning

Secondary topics: Deep Learning, Media and Advertising, Retail and e-commerce

Bargava Subramanian (Binaize), Amit Kapoor (narrativeVIZ)

Recommendation systems play a significant role: for users, they open up a new world of options; for companies, they drive engagement and satisfaction. Amit Kapoor and Bargava Subramanian walk you through the different paradigms of recommendation systems and introduce you to deep learning-based approaches. You'll gain the practical hands-on knowledge to build, select, deploy, and maintain a recommendation system.

9:00am - 5:00pm Monday, September 23 & Tuesday, September 24

Location: 1A 15/16

Hands-on data science with Python

Secondary topics: Deep dive into specific tools, platforms, or frameworks

Michael Cullan (Pragmatic Institute)

Michael Cullan walks you through developing a machine learning pipeline from prototyping to production. You'll learn about data cleaning, feature engineering, model building and evaluation, and deployment and then extend these models into two applications from real-world datasets. All work will be done in Python.

9:00am - 5:00pm Monday, September 23 & Tuesday, September 24

Location: 1A 18

Expand your data science and machine learning skills with Python, R, SQL, Spark, and TensorFlow

Ian Cook (Cloudera)

Advancing your career in data science requires learning new languages and frameworks, but you face an overwhelming array of choices, each with different syntaxes, conventions, and terminology. Ian Cook simplifies the learning process by outlining the abstractions common to these systems. You'll work through hands-on exercises to overcome the obstacles to getting started with new tools.

9:00am - 5:00pm Monday, September 23 & Tuesday, September 24

Location: 1E 07

Machine learning from scratch in TensorFlow

Secondary topics: Deep dive into specific tools, platforms, or frameworks, Deep Learning

Dylan Bargteil (The Data Incubator)

The TensorFlow library supports computational graphs with automatic parallelization across resources. This architecture is ideal for implementing neural networks. Dylan Bargteil explores TensorFlow's capabilities in Python, demonstrating how to build machine learning algorithms piece by piece and how to use TensorFlow's Keras API with several hands-on applications.

9:00am–12:30pm Tuesday, September 24, 2019

Location: 1A 12/14

Efficient ML engineering: Tools and best practices

Secondary topics: Culture and Organization, Model Development, Governance, Operations

Sourav Dey (Manifold), Jakov Kucan (Manifold)

Sourav Dey and Jakov Kucan walk you through the six steps of the Lean AI process and explain how it helps your ML engineers work as an integrated part of your development and production teams. You'll get a hands-on example using real-world data, so you can get up and running with Docker and Orbyter and see firsthand how streamlined they can make your workflow.

9:00am–12:30pm Tuesday, September 24, 2019

Location: 1A 21

SOLD OUT: Managing the complete machine learning lifecycle with MLflow

Secondary topics: Model Development, Governance, Operations

Jules Damji (Databricks)

ML development brings many new complexities beyond the software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools, and parameters to get the best results, and they need to track this information. Jules Damji walks you through MLflow, an open source project that simplifies the entire ML lifecycle, to solve this problem.

9:00am–12:30pm Tuesday, September 24, 2019

Location: 1A 23/24

Introduction to natural language processing in Python

Secondary topics: Text and Language processing and analysis

Alice Zhao (Metis)

Data scientists are known for crunching numbers, but you need to decide what to do when you run into text data. Alice Zhao walks you through the steps to turn text data into a format that a machine can understand, explores some of the most popular text analytics techniques, and showcases several natural language processing (NLP) libraries in Python, including NLTK, TextBlob, spaCy, and gensim.
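
To make "a format that a machine can understand" concrete, here is a minimal sketch of one common option, a bag-of-words count vector, using only the Python standard library. It is an illustration rather than the session's material: the helper names (tokenize, build_vocabulary, vectorize) and the two sample sentences are hypothetical, and the libraries named in the abstract provide far richer tooling.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(documents):
    """Map each distinct token to a column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def vectorize(text, vocabulary):
    """Return token counts, one entry per vocabulary word, in column order."""
    counts = Counter(tokenize(text))
    # Dicts preserve insertion order, so iteration follows the column indices.
    return [counts.get(tok, 0) for tok in vocabulary]


# Hypothetical sample documents, used only for illustration.
docs = ["The cat sat on the mat", "The dog sat"]
vocab = build_vocabulary(docs)
matrix = [vectorize(doc, vocab) for doc in docs]
```

Each row of matrix is a fixed-length numeric vector, which is the kind of input most text analytics techniques expect.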

9:00am–12:30pm Tuesday, September 24, 2019

Location: 1E 12/13

Deep learning from scratch

Secondary topics: Deep Learning

Bruno Gonçalves (Data For Science)

You'll go hands-on to learn the theoretical foundations and principal ideas underlying deep learning and neural networks. Bruno Gonçalves provides implementations whose code structure closely resembles that of Keras, so that by the end of the course, you'll be prepared to dive deeper into the deep learning applications of your choice.

1:30pm–5:00pm Tuesday, September 24, 2019

Location: 1A 12/14

Deep learning methods for natural language processing

Secondary topics: Deep Learning, Financial Services, Text and Language processing and analysis

Garrett Hoffman (StockTwits)

Garrett Hoffman walks you through deep learning methods for natural language processing and natural language understanding tasks, using a live example in Python and TensorFlow with StockTwits data. Methods include Word2Vec, recurrent neural networks (RNNs) and variants (long short-term memory [LSTM] and gated recurrent unit [GRU]), and convolutional neural networks.

1:30pm–5:00pm Tuesday, September 24, 2019

Location: 1A 21

Building a recommender system with Amazon ML services

Secondary topics: Cloud Platforms and SaaS, Deep dive into specific tools, platforms, or frameworks

Karthik Sonti (Amazon Web Services), Emily Webber (Amazon Web Services), Varun Rao Bhamidimarri (Amazon Web Services)

Karthik Sonti, Emily Webber, and Varun Rao Bhamidimarri introduce you to the Amazon SageMaker machine learning platform and provide a high-level discussion of recommender systems. You'll dig into different machine learning approaches for recommender systems, including common methods such as matrix factorization as well as newer embedding approaches.
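
As a rough illustration of the matrix factorization idea mentioned above, the following sketch learns small user and item vectors by stochastic gradient descent in plain Python. It does not use Amazon SageMaker and is not the presenters' code: the function names, the hyperparameters, and the toy ratings are all assumptions chosen for illustration.

```python
import random


def predict(user_vec, item_vec):
    """Predicted rating is the dot product of the two latent vectors."""
    return sum(u * v for u, v in zip(user_vec, item_vec))


def squared_error(ratings, users, items):
    """Total squared error over the observed (user, item, rating) triples."""
    return sum((r - predict(users[u], items[i])) ** 2 for u, i, r in ratings)


def factorize(ratings, n_users, n_items, n_factors=2,
              epochs=500, lr=0.01, reg=0.02, seed=0):
    """Fit latent vectors with SGD and L2 regularization (assumed settings)."""
    rng = random.Random(seed)
    users = [[rng.random() for _ in range(n_factors)] for _ in range(n_users)]
    items = [[rng.random() for _ in range(n_factors)] for _ in range(n_items)]
    initial_error = squared_error(ratings, users, items)
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(users[u], items[i])
            for f in range(n_factors):
                pu, qi = users[u][f], items[i][f]
                # Move both vectors along the error gradient, shrunk by reg.
                users[u][f] += lr * (err * qi - reg * pu)
                items[i][f] += lr * (err * pu - reg * qi)
    return users, items, initial_error


# Hypothetical toy data: (user index, item index, rating).
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
users, items, initial_error = factorize(ratings, n_users=3, n_items=3)
final_error = squared_error(ratings, users, items)
```

Once trained, predict(users[u], items[i]) gives a score for a user and item pair that never appeared in the ratings, which is the basis for making a recommendation.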

1:30pm–5:00pm Tuesday, September 24, 2019

Location: 1A 23/24

Natural language understanding at scale with Spark NLP

Secondary topics: Deep dive into specific tools, platforms, or frameworks, Text and Language processing and analysis

David Talby (Pacific AI), Alex Thomas (John Snow Labs), Saif Addin Ellafi (John Snow Labs), Claudiu Branzan (Accenture)

David Talby, Alex Thomas, Saif Addin Ellafi, and Claudiu Branzan walk you through state-of-the-art natural language processing (NLP) using the highly performant, highly scalable open source Spark NLP library. You'll spend about half your time coding as you work through four sections, each with an end-to-end working codebase that you can change and improve.

1:30pm–5:00pm Tuesday, September 24, 2019

Location: 1E 11

Sketching data and other magic tricks

Secondary topics: Streaming and IoT, Temporal data and time-series analytics

Sophie Watson (Red Hat), William Benton (Red Hat)

Go hands-on with Sophie Watson and William Benton to examine data structures that let you answer interesting queries about massive datasets in fixed amounts of space and constant time. This seems like magic, but they'll explain the key trick that makes it possible and show you how to use these structures for real-world machine learning and data engineering applications.
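
The abstract does not say which structures the session covers. As one well-known example of a fixed-space, constant-time structure, here is a minimal count-min sketch, which estimates how often an item has been seen. The class name, the default width and depth, and the SHA-256-based hashing are assumptions made for this sketch.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter using a fixed depth x width table."""

    def __init__(self, width=1000, depth=5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Derive one hash per row by salting the item with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        """Increment one counter in every row; cost depends only on depth."""
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        """Take the minimum across rows; collisions can only inflate counts."""
        return min(
            self.table[row][self._index(item, row)]
            for row in range(self.depth)
        )


sketch = CountMinSketch()
for word in ["apple", "apple", "pear", "apple"]:
    sketch.add(word)
```

Memory use is fixed by width and depth no matter how many items are added, and estimate never undercounts, although hash collisions can cause it to overcount.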

9:05am–9:15am Wednesday, September 25, 2019

Location: 3E

Recent trends in data and machine learning technologies

Ben Lorica (O'Reilly)

Ben Lorica dives into emerging technologies for building data infrastructures and machine learning platforms.

11:20am–12:00pm Wednesday, September 25, 2019

Location: 3B - Expo Hall

Unified tooling for machine learning interpretability

Secondary topics: Ethics

Harsha Nori (Microsoft), Samuel Jenkins (Microsoft), Rich Caruana (Microsoft)

Understanding decisions made by machine learning systems is critical for sensitive uses, ensuring fairness, and debugging production models. Interpretability presents options for trying to understand model decisions. Harsha Nori, Samuel Jenkins, and Rich Caruana explore the tools Microsoft is releasing to help you train powerful, interpretable models and interpret existing black box systems.

11:20am–12:00pm Wednesday, September 25, 2019

Location: 1A 06/07

Lightning-fast time series modeling and prediction: (S)ARIMA on steroids

Secondary topics: Temporal data and time-series analytics

Meir Toledano (Anodot)

ARIMA has been used for time series modeling for decades. In practice, most time series collected from human activities exhibit seasonal patterns, but the estimation of seasonal ARIMA ((S)ARIMA) models has long been inefficient. Meir Toledano explains how Anodot was able to apply the technique for forecasting and anomaly detection for millions of time series every day.

11:20am–12:00pm Wednesday, September 25, 2019

Location: 1A 08/10

We run, we improve, we scale: The XGBoost story at Uber

Secondary topics: Deep dive into specific tools, platforms, or frameworks, Transportation and Logistics

Nan Zhu (Uber), Felix Cheung (Uber)

XGBoost has been widely deployed in companies across the industry. Nan Zhu and Felix Cheung dive into the internals of distributed training in XGBoost and demonstrate how XGBoost solves business problems at Uber at a scale of thousands of workers and tens of terabytes of training data.

11:20am–12:00pm Wednesday, September 25, 2019

Location: 1A 12/14

Practical feature engineering

Ted Dunning (MapR, now part of HPE)

Feature engineering is generally the section that gets left out of machine learning books, but it's also the most critical part in practice. Ted Dunning explores techniques, a few well known, but some rarely spoken of outside the institutional knowledge of top teams, including how to handle categorical inputs, natural language, transactions, and more in the context of machine learning.

1:15pm–1:55pm Wednesday, September 25, 2019

Location: 3B - Expo Hall

Feature engineering with Spark NLP to accelerate clinical trial recruitment

Secondary topics: Health and Medicine, Text and Language processing and analysis

Saif Addin Ellafi (John Snow Labs), Scott Hoch (BlackBox Engineering)

Recruiting patients for clinical trials is a major challenge in drug development. Saif Addin Ellafi and Scott Hoch explain how Deep 6 uses Spark NLP to scale its training and inference pipelines to millions of patients while achieving state-of-the-art accuracy. They dive into the technical challenges, the architecture of the full solution, and the lessons the company learned.

1:15pm–1:55pm Wednesday, September 25, 2019

Location: 1A 06/07

Improving OCR quality of documents using generative adversarial networks

Secondary topics: Deep Learning, Financial Services, Health and Medicine

Nagendra Shishodia (EXL), Chaithanya Manda (EXL), Solmaz Torabi (EXL)

Every NLP-based document-processing solution depends on converting scanned documents and images to machine-readable text using an OCR solution, which is limited by the quality of the scanned images. Nagendra Shishodia, Chaithanya Manda, and Solmaz Torabi explore how GANs can bring significant efficiencies to any document-processing solution by enhancing the resolution of scanned images and denoising them.

1:15pm–1:55pm Wednesday, September 25, 2019

Location: 1A 08/10

Machine learning and large-scale data analysis on a centralized platform

Secondary topics: Data, Analytics, and AI Architecture, Financial Services, Retail and e-commerce

James Tang (Walmart Labs), Yiyi Zeng (Walmart Labs), Linhong Kang (Walmart Labs)

James Tang, Yiyi Zeng, and Linhong Kang outline how Walmart provides a secure and seamless shopping experience through machine learning and large-scale data analysis on a centralized platform.

1:15pm–1:55pm Wednesday, September 25, 2019

Location: 1A 12/14

Learning with limited labeled data

Secondary topics: Deep Learning

Shioulin Sam (Cloudera Fast Forward Labs)

Supervised machine learning requires large labeled datasets, a prohibitive limitation in many real-world applications. But this could be avoided if machines could learn with a few labeled examples. Shioulin Sam explores and demonstrates an algorithmic solution that relies on collaboration between human and machine to label smartly, and she outlines product possibilities.

2:05pm–2:45pm Wednesday, September 25, 2019

Location: 3B - Expo Hall

Mind the semantic gap: How "talking semantics" can help you perform better data science

Secondary topics: Text and Language processing and analysis

Panos Alexopoulos (Textkernel)

In an era where discussions among data scientists are monopolized by the latest trends in machine learning, the role of semantics in data science is often underplayed. Panos Alexopoulos presents real-world cases where making fine, seemingly pedantic, distinctions in the meaning of data science tasks and the related data has helped significantly improve effectiveness and value.

2:05pm–2:45pm Wednesday, September 25, 2019

Location: 1A 06/07

Real-time anomaly detection on observability data using neural networks

Secondary topics: Deep Learning, Temporal data and time-series analytics, Transportation and Logistics

Keshav Peswani (Expedia Group), Ashish Aggarwal (Expedia Group)

Observability is the key in modern architecture to quickly detect and repair problems in microservices. Modern observability platforms have evolved beyond simple application logs and include distributed tracing systems like Zipkin and Haystack. Keshav Peswani and Ashish Aggarwal explore how combining them with real-time, intelligent alerting mechanisms helps in the automated detection of problems.

2:05pm–2:45pm Wednesday, September 25, 2019

Location: 1A 08/10

Data science versus engineering: Does it really have to be this way?

Secondary topics: Culture and Organization

Ann Spencer (Domino), Amy Heineike (Primer), Paco Nathan (derwen.ai), Chris Wiggins (NYT | Columbia)

If, as a data scientist, you've wondered why it takes so long to deploy your model into production or, as an engineer, thought data scientists have no idea what they want, you're not alone. Join a lively discussion with industry veterans Ann Spencer, Paco Nathan, Amy Heineike, and Chris Wiggins to find best practices or insights on increasing collaboration when developing and deploying models.

2:05pm–2:45pm Wednesday, September 25, 2019

Location: 1A 12/14

Fair, privacy-preserving, and secure ML

Secondary topics: Ethics, Privacy and Security, Retail and e-commerce

Mikio Braun (Zalando)

With ML becoming more mainstream, the side effects of machine learning and AI on our lives become more visible. You have to take extra measures to make machine learning models fair and unbiased. And awareness of the need to preserve privacy in ML models is rapidly growing. Mikio Braun explores techniques and concepts around fairness, privacy, and security when it comes to machine learning models.

2:55pm–3:35pm Wednesday, September 25, 2019

Location: 3B - Expo Hall

Toward more fine-grained sentiment and emotion analysis of text

Secondary topics: Text and Language processing and analysis

Gerard de Melo (Rutgers University)

Gerard de Melo takes a deep dive into the kinds of sentiment and emotion consumers associate with a text. With new data-driven approaches, organizations can better pay attention to what's being said about them in different markets. And you can consider fonts and palettes best suited to convey specific emotions, so organizations can make informed choices when presenting information to consumers.

2:55pm–3:35pm Wednesday, September 25, 2019

Location: 1A 06/07

Introducing a new anomaly detection algorithm (SR-CNN) inspired by computer vision

Secondary topics: Deep Learning, Temporal data and time-series analytics

Tony Xing (Microsoft), Congrui Huang (Microsoft), Qiyang Li (Microsoft), Wenyi Yang (Microsoft)

Anomaly detection may sound old-fashioned, yet it's super important in many industry applications. Tony Xing, Congrui Huang, Qiyang Li, and Wenyi Yang detail a novel anomaly-detection algorithm based on spectral residual (SR) and a convolutional neural network (CNN) and how this method was applied in the monitoring system supporting Microsoft AIOps and business incident prevention.

2:55pm–3:35pm Wednesday, September 25, 2019

Location: 1A 08/10

Building a machine learning framework to measure TV advertising attribution

Secondary topics: Media and Advertising, Retail and e-commerce

Fei Wang (CarGurus)

Fei Wang takes a deep dive into a case study for the CarGurus TV Attribution Model. You'll understand how you can leverage the creation of a causal inference model to calculate cost per acquisition (CPA) of TV spend and measure effectiveness when compared to CPA of digital performance marketing spend.

2:55pm–3:35pm Wednesday, September 25, 2019

Location: 1A 12/14

How machine learning meets optimization

Secondary topics: Financial Services

Jari Koister (FICO)

Machine learning and constraint-based optimization are both used to solve critical business problems. They come from distinct research communities and have traditionally been treated separately. But Jari Koister examines how they're similar, how they're different, and how they can be used to solve complex problems with amazing results.

4:35pm–5:15pm Wednesday, September 25, 2019

Location: 3B - Expo Hall

Search logs + machine learning = autotagged inventory

Secondary topics: Text and Language processing and analysis

John Berryman (Eventbrite)

Eventbrite is exploring a new machine learning approach that allows it to harvest data from customer search logs and automatically tag events based upon their content. John Berryman dives into the results and how they have allowed the company to provide users with a better inventory-browsing experience.

4:35pm–5:15pm Wednesday, September 25, 2019

Location: 1A 06/07

Deep learning on mobile

Secondary topics: Data Integration and Data Processing, Deep Learning, Financial Services

Anirudh Koul (Microsoft), Meher Kasam (Square)

Over the last few years, convolutional neural networks (CNNs) have risen in popularity, especially in the area of computer vision. Anirudh Koul and Meher Kasam take you through how you can get deep neural nets to run efficiently on mobile devices.

4:35pm–5:15pm Wednesday, September 25, 2019

Location: 1A 08/10

From whiteboard to production: A demand forecasting system for an online grocery shop

Secondary topics: Retail and e-commerce, Temporal data and time-series analytics

Robert Pesch (inovex), Robin Senge (inovex)

Data-driven software is revolutionizing the world and enables the intelligent services we interact with daily. Robert Pesch and Robin Senge outline the development process, statistical modeling, data-driven decision making, and components needed for productionizing a fully automated and highly scalable demand forecasting system for an online grocery shop for a billion-dollar retail group in Europe.

4:35pm–5:15pm Wednesday, September 25, 2019

Location: 1A 12/14

Predicting Criteo’s internet traffic load using Bayesian structural time series models

Secondary topics: Media and Advertising, Temporal data and time-series analytics

Hamlet Jesse Medina Ruiz (Criteo)

Criteo’s infrastructure provides the capacity and connectivity to host Criteo’s platform and applications. The evolution of this infrastructure is driven by the ability to forecast Criteo’s traffic demand. Hamlet Jesse Medina Ruiz explains how Criteo uses Bayesian dynamic time series models to accurately forecast its traffic load and optimize hardware resources across data centers.