Featured Speakers
Monday, Mar 25 – Tuesday, Mar 26: 2-Day Training (Platinum & Training passes)
Tuesday, Mar 26: Tutorials (Gold & Silver passes)
Wednesday, Mar 27: Keynotes & Sessions (Platinum, Gold, Silver & Bronze passes)
8:45am: Strata Data Conference Keynotes | Location: Ballroom
10:30am: Morning break
Thursday, Mar 28: Keynotes & Sessions (Platinum, Gold, Silver & Bronze passes)
8:45am: Strata Data Conference Keynotes | Location: Ballroom
10:30am: Morning break
9:00am–5:00pm Monday, March 25 & Tuesday, March 26, 2019
The TensorFlow library is built around computational graphs with automatic parallelization across resources, an architecture well suited to implementing neural networks. Robert Schroll offers an overview of TensorFlow's capabilities in Python, demonstrating how to build machine learning algorithms piece by piece and how to use TensorFlow's Keras API, with several hands-on applications.
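The computational-graph idea the session builds on can be sketched in a few lines of plain Python. This is a toy illustration of the dataflow concept, not TensorFlow's actual implementation or API; all names here are invented for the example.

```python
# Toy computational graph: each node stores an operation and its inputs,
# and evaluation recursively walks the graph. A (vastly simplified)
# sketch of the dataflow idea behind TensorFlow.

class Node:
    def __init__(self, op, inputs=(), value=None):
        self.op, self.inputs, self.value = op, inputs, value

    def eval(self):
        if self.op == "const":
            return self.value
        vals = [n.eval() for n in self.inputs]
        if self.op == "add":
            return vals[0] + vals[1]
        if self.op == "mul":
            return vals[0] * vals[1]
        raise ValueError(self.op)

# y = (2 * 3) + 4, expressed as a graph and then evaluated
two, three, four = (Node("const", value=v) for v in (2.0, 3.0, 4.0))
y = Node("add", (Node("mul", (two, three)), four))
print(y.eval())  # 10.0
```

Because the whole expression is data rather than immediate computation, a framework is free to schedule independent subgraphs in parallel, which is what TensorFlow's runtime does at scale.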
9:00am–5:00pm Monday, March 25 & Tuesday, March 26, 2019
Don Fox walks you through developing a machine learning pipeline, from prototyping to production. You'll learn about data cleaning, feature engineering, model building and evaluation, and deployment and then extend these models into two applications from real-world datasets. All work will be done in Python.
9:00am–5:00pm Monday, March 25 & Tuesday, March 26, 2019
Advancing your career in data science requires learning new languages and frameworks—but learners face an overwhelming array of choices, each with different syntaxes, conventions, and terminology. Ian Cook simplifies the learning process by elucidating the abstractions common to these systems. Through hands-on exercises, you'll overcome obstacles to getting started using new tools.
9:00am–5:00pm Monday, March 25 & Tuesday, March 26, 2019
Francesca Lazzeri and Jen Ren walk you through the core steps for using Azure Machine Learning services to train your machine learning models both locally and on remote compute resources.
9:00am–12:30pm Tuesday, March 26, 2019
Martin Gorner leads a hands-on introduction to recurrent neural networks and TensorFlow. Join in to discover what makes RNNs so powerful for time series analysis.
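The recurrence that makes RNNs suited to time series can be written out by hand. The sketch below is a minimal vanilla-RNN forward pass in numpy with fixed random weights, purely to show the mechanism a tutorial like this builds up to in TensorFlow; it is not trained and not the session's code.

```python
import numpy as np

# Minimal vanilla-RNN forward pass over a short time series.
# The same weights are reused at every step; the hidden state h
# carries memory of everything seen so far.

def rnn_forward(xs, W_x, W_h, b):
    h = np.zeros(W_h.shape[0])
    for x in xs:                            # one step per time point
        h = np.tanh(W_x @ x + W_h @ h + b)  # new state mixes input and memory
    return h                                # final state summarizes the series

rng = np.random.default_rng(0)
series = rng.normal(size=(10, 1))           # 10 time steps, 1 feature
W_x = rng.normal(size=(4, 1)) * 0.5         # input-to-hidden weights
W_h = rng.normal(size=(4, 4)) * 0.5         # hidden-to-hidden (the recurrence)
b = np.zeros(4)
state = rnn_forward(series, W_x, W_h, b)
print(state.shape)  # (4,)
```

Training consists of learning W_x, W_h, and b by backpropagation through these repeated steps, which is what TensorFlow automates.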
9:00am–12:30pm Tuesday, March 26, 2019
David Talby, Alex Thomas, and Claudiu Branzan lead a hands-on introduction to scalable NLP using the highly performant, highly scalable open source Spark NLP library. You’ll spend about half your time coding as you work through four sections, each with an end-to-end working codebase that you can change and improve.
9:00am–12:30pm Tuesday, March 26, 2019
From healthcare to smart home to autonomous vehicles, new applications of autonomous systems are raising ethical concerns about a host of issues, including bias, transparency, and privacy. Iman Saleh, Cory Ilo, and Cindy Tseng demonstrate tools and capabilities that can help data scientists address these concerns and bridge the gap between ethicists, regulators, and machine learning practitioners.
1:30pm–5:00pm Tuesday, March 26, 2019
If machine learning can lead to financial gains for your organization, why isn’t everyone doing it? One reason is training machine learning systems with transparent inner workings and auditable predictions is difficult. Patrick Hall details the good, bad, and downright ugly lessons learned from his years of experience implementing solutions for interpretable machine learning.
1:30pm–5:00pm Tuesday, March 26, 2019
Abhishek Kumar and Pramod Singh walk you through deep learning-based recommender and personalization systems they've built for clients. Join in to learn how to use TensorFlow Serving and MLflow for end-to-end productionalization, including model serving, Dockerization, reproducibility, and experimentation, and Kubernetes for deployment and orchestration of ML-based microservices.
1:30pm–5:00pm Tuesday, March 26, 2019
Jason Dai, Yuhao Yang, Jennie Wang, and Guoqiong Song explain how to build and productionize deep learning applications for big data with Analytics Zoo—a unified analytics and AI platform that seamlessly unites Spark, TensorFlow, Keras, and BigDL programs into an integrated pipeline—using real-world use cases from JD.com, MLSListings, the World Bank, Baosight, and Midea/KUKA.
1:30pm–5:00pm Tuesday, March 26, 2019
Thanks to the rapid growth in data resources, business leaders now appreciate the importance (and the challenge) of mining information from data. Join in as a group of LinkedIn's data scientists share their experiences successfully leveraging emerging techniques to assist in intelligent decision making.
11:00am–11:40am Wednesday, March 27, 2019
Financial services firms are increasingly deploying AI for a wide range of applications, such as identifying fraud and financial crimes. Such deployments require models to be interpretable, explainable, and resilient to adversarial attacks, since regulatory requirements prohibit black-box machine learning models. Jari Koister shares tools and infrastructure developed to support these needs.
11:00am–11:40am Wednesday, March 27, 2019
Data platforms are being asked to support an ever increasing range of workloads and compute environments, including machine learning and elastic cloud platforms. Tristan Zajonc and Tim Chen discuss emerging capabilities, including running machine learning and Spark workloads on autoscaling container platforms, and share their vision for the road ahead for ML and AI in the cloud.
11:00am–11:40am Wednesday, March 27, 2019
Jeremy Howard describes how to leverage the latest research from the deep learning and HCI communities to train neural networks from scratch—without code or preexisting labels. He then shares case studies in fashion, retail and ecommerce, travel, and agriculture where these approaches have been used.
11:00am–11:40am Wednesday, March 27, 2019
Alon Kaufman and Vinod Vaikuntanathan discuss the challenges and opportunities of machine learning on encrypted data and describe the state of the art in this space.
11:00am–11:40am Wednesday, March 27, 2019
Robert Horton, Mario Inchiosa, and Ali Zaidi demonstrate how to use three cutting-edge machine learning techniques—transfer learning from pretrained language models, active learning to make more effective use of a limited labeling budget, and hyperparameter tuning to maximize model performance—to up your modeling game.
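Of the three techniques above, active learning is the easiest to illustrate compactly. The sketch below shows margin-based uncertainty sampling, the baseline active-learning strategy: given a limited labeling budget, send the annotator the examples the current model is least sure about. The probability matrix is a stand-in for any classifier's predicted class probabilities, not output from the speakers' models.

```python
import numpy as np

def pick_for_labeling(probs, budget):
    """Return indices of the `budget` least confident examples."""
    sorted_p = np.sort(probs, axis=1)
    # margin between top two class probabilities; small margin = unsure
    margin = sorted_p[:, -1] - sorted_p[:, -2]
    return np.argsort(margin)[:budget]

probs = np.array([[0.98, 0.01, 0.01],   # confident
                  [0.40, 0.35, 0.25],   # unsure
                  [0.55, 0.44, 0.01],   # borderline
                  [0.90, 0.05, 0.05]])  # confident
print(pick_for_labeling(probs, budget=2))  # [1 2] (the least confident rows)
```

Each labeling round then retrains the model on the newly labeled examples and repeats, concentrating the budget where it most improves the decision boundary.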
11:00am–11:40am Wednesday, March 27, 2019
Sam Lightstone discusses how AI is fundamentally changing computer science and the practice of coding. Join in to discover what machine learning means today and explore recent advances in hardware and software and breakthrough innovations.
11:50am–12:30pm Wednesday, March 27, 2019
Quantitative finance is a rich field where advanced mathematical and statistical techniques are employed by both sell-side and buy-side institutions. Chakri Cherukuri explains how machine learning and deep learning techniques are being used in quantitative finance and details how these models work under the hood.
11:50am–12:30pm Wednesday, March 27, 2019
How does Salesforce make data science an Agile partner to over 100,000 customers? Sarah Aerni shares the nuts and bolts of the platform and details the Agile process behind it. From open source autoML library TransmogrifAI and experimentation to deployment and monitoring, Sarah covers the tools that make it possible for data scientists to rapidly iterate and adopt a truly Agile methodology.
11:50am–12:30pm Wednesday, March 27, 2019
Customer segmentation based on coarse survey data is a staple of traditional market research. Melinda Han Williams explains how Dstillery uses neural networks to model the digital pathways of 100M consumers and uses the resulting embedding space to cluster customer populations into fine-grained behavioral segments and inform smarter consumer insights—in the process, creating a map of the internet.
11:50am–12:30pm Wednesday, March 27, 2019
Google uses deep learning extensively in new and existing products. Join Ron Bodkin to learn how Google has used deep learning for recommendations at YouTube, in the Play store, and for customers in Google Cloud. You'll explore the role of embeddings, recurrent networks, contextual variables, and wide and deep learning and discover how to do candidate generation and ranking with deep learning.
11:50am–12:30pm Wednesday, March 27, 2019
How do you train a machine learning model with no training data? Michael Johnson and Norris Heintzelman share their journey implementing multiple solutions to bootstrapping training data in the NLP domain, covering topics including weak supervision, building an active learning framework, and annotation adjudication for named-entity recognition.
2:40pm–3:20pm Wednesday, March 27, 2019
Divya Choudhary explains how GO-JEK mines the chat messages and notes, written in a local language, that customers send their drivers while waiting for a ride to arrive. These messages yield unparalleled information about pickup points and their names (some unknown even to Google Maps), helping create a world-class customer pickup experience.
2:40pm–3:20pm Wednesday, March 27, 2019
Evaluating machine learning models is surprisingly hard, particularly because these systems interact in very subtle ways. Ted Dunning breaks the problem of evaluation apart into operational and functional evaluation, demonstrating how to do each without unnecessary pain and suffering. Along the way, he shares exciting visualization techniques that will help make differences strikingly apparent.
2:40pm–3:20pm Wednesday, March 27, 2019
Dilated neural networks are a class of recently developed neural networks that achieve promising results in time series forecasting. Chenhui Hu discusses representative network architectures of dilated neural networks and demonstrates their advantages in terms of training efficiency and forecast accuracy by applying them to solve sales forecasting and financial time series forecasting problems.
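The defining operation of these networks, a dilated convolution, is easy to show directly: the kernel skips `dilation - 1` samples between taps, so stacked layers see exponentially longer history with the same kernel size. The numpy sketch below shows one such layer on a toy series; it is an illustration of the operation, not a trained forecasting model.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """1-D convolution whose kernel taps are `dilation` samples apart."""
    k = len(kernel)
    span = (k - 1) * dilation                  # receptive field of one layer
    out = np.empty(len(x) - span)
    for t in range(len(out)):
        taps = x[t : t + span + 1 : dilation]  # every `dilation`-th sample
        out[t] = taps @ kernel
    return out

x = np.arange(10, dtype=float)                 # toy series 0..9
y = dilated_conv1d(x, kernel=np.array([1.0, 1.0]), dilation=4)
print(y)  # each output sums x[t] and x[t+4]: 4, 6, 8, 10, 12, 14
```

Stacking layers with dilations 1, 2, 4, 8, ... gives a receptive field that doubles per layer, which is why these architectures capture long-range temporal structure cheaply.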
2:40pm–3:20pm Wednesday, March 27, 2019
Sonal Gupta explores practical systems for building a conversational AI system for task-oriented queries and details a way to do more advanced compositional understanding, which can understand cross-domain queries, using hierarchical representations.
2:40pm–3:20pm Wednesday, March 27, 2019
The nascent field of fair machine learning aims to ensure that decisions guided by algorithms are equitable. Several formal definitions of fairness have gained prominence, but, as Sharad Goel argues, nearly all of them suffer from significant statistical limitations. Perversely, when used as a design constraint, they can even harm the very groups they were intended to protect.
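Two of the formal fairness criteria this line of work examines can be computed directly from decisions and outcomes: demographic parity compares positive decision rates across groups, while equal opportunity compares true-positive rates. The data below is made up for illustration and deliberately constructed so the two criteria disagree, echoing the talk's point that the definitions can conflict.

```python
import numpy as np

def positive_rate(decisions):
    return decisions.mean()

def true_positive_rate(decisions, outcomes):
    # of the truly positive cases, how many received a positive decision
    return decisions[outcomes == 1].mean()

group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
decision = np.array([1, 1, 0, 0, 1, 0, 0, 0])
outcome = np.array([1, 0, 0, 1, 1, 1, 0, 0])

for g in (0, 1):
    d, o = decision[group == g], outcome[group == g]
    print(g, positive_rate(d), true_positive_rate(d, o))
# Group 0: positive rate 0.5, TPR 0.5; group 1: positive rate 0.25, TPR 0.5.
# Equal opportunity holds here while demographic parity does not.
```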
4:20pm–5:00pm Wednesday, March 27, 2019
Yogesh Pandit, Saif Addin Ellafi, and Vishakha Sharma discuss how Roche applies Spark NLP for healthcare to extract clinical facts from pathology and radiology reports. They then detail the design of the deep learning pipelines used to simplify training, optimization, and inference of such domain-specific models at scale.
4:20pm–5:00pm Wednesday, March 27, 2019
Production ML applications benefit from reproducible, automated retraining and deployment of ever-more-predictive models trained on ever-increasing amounts of data. Kelley Rivoire explains how Stripe built a flexible API for training machine learning models that's used to train thousands of models per week on Kubernetes, supporting automated deployment of new models with improved performance.
4:20pm–5:00pm Wednesday, March 27, 2019
User-based real-time recommendation systems have become an important topic in ecommerce. Lu Wang, Nicole Kong, Guoqiong Song, and Maneesha Bhalla demonstrate how to build deep learning algorithms using Analytics Zoo with BigDL on Apache Spark and create an end-to-end system to serve real-time product recommendations.
4:20pm–5:00pm Wednesday, March 27, 2019
Talent search systems at LinkedIn strive to match the potential candidates to the hiring needs of a recruiter expressed in terms of a search query. Gungor Polatkan shares the results of the company's deployment of deep learning models on a real-world production system serving 500M+ users through LinkedIn Recruiter.
4:20pm–5:00pm Wednesday, March 27, 2019
Time series forecasting techniques are applied in a wide range of scientific disciplines, business scenarios, and policy settings. Jeffrey Yau discusses the applications of statistical time series models, such as ARIMA, VAR, and regime-switching models, and machine learning models, such as random forest and neural network-based models, to forecasting problems.
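The simplest member of the ARIMA family covered above, an AR(1) model, can be fit by ordinary least squares: regress each value on its predecessor. The sketch below uses synthetic data with a known coefficient, purely to show the mechanics; real applications would use a library such as statsmodels rather than this hand-rolled estimator.

```python
import numpy as np

rng = np.random.default_rng(42)
phi_true = 0.7
y = np.zeros(500)
for t in range(1, 500):                  # simulate y_t = 0.7 * y_{t-1} + noise
    y[t] = phi_true * y[t - 1] + rng.normal()

# OLS slope with no intercept: phi_hat = <y_{t-1}, y_t> / <y_{t-1}, y_{t-1}>
x_prev, x_next = y[:-1], y[1:]
phi_hat = (x_prev @ x_next) / (x_prev @ x_prev)

forecast = phi_hat * y[-1]               # one-step-ahead forecast
print(round(phi_hat, 2))                 # close to the true 0.7
```

VAR models generalize this to vectors of series (each value regressed on lagged values of all series), and regime-switching models let the coefficient itself change between hidden states.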
5:10pm–5:50pm Wednesday, March 27, 2019
Rakesh Kumar and Thomas Weise explore how Lyft dynamically prices its rides with a combination of various data sources, ML models, and streaming infrastructure for low latency, reliability, and scalability—allowing the pricing system to be more adaptable to real-world changes.
5:10pm–5:50pm Wednesday, March 27, 2019
Ting-Fang Yen details an approach for monitoring production machine learning systems that handle billions of requests daily by discovering detection anomalies, such as spurious false positives, as well as gradual concept drift, where the model no longer captures the target concept. Join in to explore new tools for detecting undesirable model behaviors early in large-scale online ML systems.
5:10pm–5:50pm Wednesday, March 27, 2019
From determining the most convenient rider pickup points to predicting the fastest routes, Uber uses data-driven analytics to create seamless trip experiences. Inside Uber, analysts are using deep learning and big data to train models, make predictions, and run analytics in real time. Zhenxiao Luo explains how Uber runs real-time analytics with deep learning.
5:10pm–5:50pm Wednesday, March 27, 2019
Kevin Moore walks you through how TransmogrifAI—Salesforce's open source AutoML library built on Spark—automatically generates models customized to a company's dataset and use case and provides insights into why a model makes the predictions it does.
5:10pm–5:50pm Wednesday, March 27, 2019
Imagine building a model whose training data is collected on edge devices such as cell phones or sensors. Each device collects data unlike any other, and the data cannot leave the device because of privacy concerns or unreliable network access. This challenging situation is known as federated learning. Mike Lee Williams discusses the algorithmic solutions and the product opportunities.
11:00am–11:40am Thursday, March 28, 2019
Boris Yakubchik and Salah Zalatimo offer an overview of Bertie, Forbes's new publishing platform—an AI assistant that learns from writers and suggests improvements—and detail Bertie’s features, architecture, and ultimate goals, paying special attention to how the company implemented an ensemble of machine learning models that, together, make up the AI assistant's skill set and personality.
11:00am–11:40am Thursday, March 28, 2019
Clustered data is all around us. The best way to attack it? Mixed effect models. Sourav Dey explains how the mixed effects random forests (MERF) model and Python package marries the world of classical mixed effect modeling with modern machine learning algorithms and shows how it can be extended to be used with other advanced modeling techniques like gradient boosting machines and deep learning.
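The core idea behind mixed effect models like MERF is that predictions combine a shared (fixed) component with a per-cluster (random) offset, where small clusters are shrunk toward global behavior. The toy random-intercept sketch below illustrates that shrinkage; it is not the MERF algorithm itself, and the shrinkage constant is an arbitrary choice for the example.

```python
import numpy as np

def shrunken_offsets(values, clusters, lam=5.0):
    """Global mean plus shrunken per-cluster offsets (random intercepts)."""
    mu = values.mean()                       # fixed effect: global mean
    offsets = {}
    for c in np.unique(clusters):
        v = values[clusters == c]
        # shrinkage factor n/(n+lam): big clusters keep their deviation,
        # small clusters are pulled toward a zero offset
        offsets[c] = len(v) / (len(v) + lam) * (v.mean() - mu)
    return mu, offsets

clusters = np.array(["a"] * 8 + ["b"] * 2)
values = np.array([10.0] * 8 + [0.0] * 2)
mu, offsets = shrunken_offsets(values, clusters)
for c, off in offsets.items():
    print(c, round(mu + off, 2))   # per-cluster prediction
```

MERF replaces the global mean here with a random forest fit on features, keeping the random-effect machinery for the cluster structure, and the same split extends to gradient boosting or deep networks as the fixed-effect learner.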
11:00am–11:40am Thursday, March 28, 2019
Online fraud flourishes as online services become ubiquitous in our daily life. Fang Yu explains how DataVisor leverages cutting-edge deep learning technologies to address the challenges in large-scale fraud detection.
11:00am–11:40am Thursday, March 28, 2019
Federated learning is an approach for training ML models across a fleet of participating devices without collecting their data in a central location. Alex Ingerman offers an overview of federated learning, compares traditional and federated ML workflows, and explores the current and upcoming use cases for decentralized machine learning, with examples from Google's deployment of this technology.
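The federated workflow described above can be sketched in miniature: each simulated device computes a model update on its local data, and only the updates (never the raw data) are averaged on a server. This is a toy federated-averaging loop on a linear model, not TensorFlow Federated's API or Google's production setup.

```python
import numpy as np

def local_update(w, X, y, lr=0.1):
    """One local gradient step on a device's private data."""
    grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of mean squared error
    return w - lr * grad

rng = np.random.default_rng(1)
w_true = np.array([2.0, -1.0])
devices = []
for _ in range(5):                          # 5 devices, each with private data
    X = rng.normal(size=(20, 2))
    devices.append((X, X @ w_true + 0.01 * rng.normal(size=20)))

w = np.zeros(2)
for _ in range(50):                         # 50 federated rounds
    updates = [local_update(w, X, y) for X, y in devices]
    w = np.mean(updates, axis=0)            # server averages device models
print(w.round(2))                           # converges near [2, -1]
```

Real deployments add device sampling, secure aggregation, and differential privacy on top of this basic average, but the data-stays-on-device structure is the same.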
11:00am–11:40am Thursday, March 28, 2019
How can we guarantee that the ML system we develop is adequately protected from adversarial manipulation? Ram Shankar Kumar shares a framework and corresponding best practices to quantitatively assess the safety of your ML systems.
11:50am–12:30pm Thursday, March 28, 2019
Jeff Chen shares strategies for overcoming time series challenges at the intersection of macroeconomics and data science, drawing from machine learning research conducted at the Bureau of Economic Analysis aimed at improving its flagship product, the gross domestic product.
11:50am–12:30pm Thursday, March 28, 2019
Today, normal growth isn't enough—you need hockey-stick levels of growth. Sales and marketing orgs are looking to AI to "growth hack" their way into new markets and segments. Ken Johnston and Ankit Srivastava explain how to use mutual information at scale across massive data sources to help filter out noise and share critical insights with new cohorts of users, businesses, and networks.
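Mutual information, the quantity named above, measures how much knowing one variable tells you about another; for discrete variables it can be computed straight from a contingency table. The counts below are made up for illustration; at the scale the talk describes, they would come from massive aggregated datasets.

```python
import numpy as np

def mutual_information(counts):
    """Mutual information (bits) between rows and columns of a count table."""
    p = counts / counts.sum()
    px = p.sum(axis=1, keepdims=True)       # marginal of rows
    py = p.sum(axis=0, keepdims=True)       # marginal of columns
    nz = p > 0                              # skip zero cells (0 * log 0 = 0)
    return float((p[nz] * np.log2(p[nz] / (px @ py)[nz])).sum())

independent = np.array([[25, 25], [25, 25]])   # no association -> MI = 0
dependent = np.array([[45, 5], [5, 45]])       # strong association
print(mutual_information(independent))  # 0.0
print(mutual_information(dependent))    # about 0.53 bits
```

Pairs of signals with near-zero mutual information are noise with respect to each other, which is exactly the filtering criterion described.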
11:50am–12:30pm Thursday, March 28, 2019
Machine learning is delivering immense value across industries. However, in some instances, machine learning models can produce overconfident results—with the potential for catastrophic outcomes. Kumar Sricharan explains how to address this challenge through Bayesian machine learning and highlights real-world examples to illustrate its benefits.
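A minimal example of the Bayesian idea in question: instead of a single point estimate, keep a posterior distribution, so reported confidence reflects how much evidence has actually been seen. The conjugate Beta-Bernoulli update below is standard textbook material, not the speaker's system.

```python
def beta_posterior(successes, failures, a=1.0, b=1.0):
    """Posterior mean and variance of a success rate under a Beta(a, b) prior."""
    a_post, b_post = a + successes, b + failures
    mean = a_post / (a_post + b_post)
    # posterior variance shrinks as evidence accumulates
    var = a_post * b_post / ((a_post + b_post) ** 2 * (a_post + b_post + 1))
    return mean, var

print(beta_posterior(3, 1))       # little data: wide, uncertain posterior
print(beta_posterior(300, 100))   # same 3:1 ratio, far more data: variance collapses
```

An overconfident point estimator would report "0.75" identically in both cases; the posterior variance is what distinguishes a guess from a well-supported estimate.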
11:50am–12:30pm Thursday, March 28, 2019
Data remains a linchpin of success for machine learning, yet too often it is a scarce resource. And even when data is available, trust issues arise about the quality and ethics of its collection. Roger Chen explores new models for generating and governing training data for AI applications.
11:50am–12:30pm Thursday, March 28, 2019
What is serverless, and how can it be utilized for data analysis and AI? Avner Braverman outlines the benefits and limitations of serverless with respect to data transformation (ETL), AI inference and training, and real-time streaming. This is a technical talk, so expect demos and code.
11:50am–12:30pm Thursday, March 28, 2019
Malicious DNS traffic patterns are inconsistent and typically thwart anomaly detection. David Rodriguez explains how Cisco uses Apache Spark and Stripe’s Bayesian inference software, Rainier, to fit the underlying time series distribution for millions of domains and outlines techniques to identify artificial traffic volumes related to spam, malvertising, and botnets (masquerading traffic).
1:50pm–2:30pm Thursday, March 28, 2019
Animesh Singh and Tommy Li explain how to implement state-of-the-art methods for attacking and defending classifiers using the open source Adversarial Robustness Toolbox. The library provides AI developers with interfaces that support the composition of comprehensive defense systems using individual methods as building blocks.
1:50pm–2:30pm Thursday, March 28, 2019
Anomaly detection has many applications, such as tracking business KPIs or fraud spotting in credit card transactions. Unfortunately, there's no one best way to detect anomalies across a variety of domains. Jonathan Merriman and Cynthia Freeman introduce a framework to determine the best anomaly detection method for the application based on time series characteristics.
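One of the simplest detectors such a framework can choose among is a rolling z-score: flag any point far outside the recent window's mean. The sketch below runs it on synthetic data with an injected spike; it shows only this baseline method, with arbitrary window and threshold values, not the speakers' framework.

```python
import numpy as np

def zscore_anomalies(x, window=20, threshold=4.0):
    """Indices where the value is > `threshold` std devs from its recent window."""
    flags = []
    for t in range(window, len(x)):
        hist = x[t - window : t]
        z = (x[t] - hist.mean()) / (hist.std() + 1e-9)  # guard zero std
        if abs(z) > threshold:
            flags.append(t)
    return flags

rng = np.random.default_rng(7)
series = rng.normal(size=200)
series[120] += 10.0                  # inject a spike
print(zscore_anomalies(series))      # the injected spike at index 120 is flagged
```

This method assumes a roughly stationary, non-seasonal series; series with trend or strong seasonality need different detectors, which is precisely why characteristics-based method selection matters.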
1:50pm–2:30pm Thursday, March 28, 2019
Deep learning using sequence-to-sequence (Seq2Seq) networks has demonstrated unparalleled success in neural machine translation. Forecasting, a less explored but highly sought-after area, can leverage the recent gains made in Seq2Seq networks. Aashish Sheshadri explains how PayPal has applied deep networks to monitoring and alerting intelligence.
1:50pm–2:30pm Thursday, March 28, 2019
Piero Molino offers an overview of Ludwig, a deep learning toolbox that allows you to train models and use them for prediction without the need to write code. It's unique in its ability to help make deep learning easier to understand for nonexperts and enable faster model improvement iteration cycles for experienced machine learning developers and researchers alike.
1:50pm–2:30pm Thursday, March 28, 2019
Louis DiValentin and Dillon Cullinan explain how Accenture's Cyber Security Lab built security analytics models to detect attempted lateral movement in networks by transforming enterprise-scale security data into a graph format, generating graph analytics for individual users, and building time series detection models that visualize the changing graph metrics for security operators.
2:40pm–3:20pm Thursday, March 28, 2019
Some people use digital devices to track their blood alcohol content (BAC). A BAC-tracking app that could anticipate when a person is likely to have a high BAC could offer coaching in a time of need. Kirstin Aschbacher shares a machine learning approach that predicts user BAC levels with good precision based on minimal information, thereby enabling targeted interventions.
2:40pm–3:20pm Thursday, March 28, 2019
Kapil Gupta explains how Airbnb approaches the personalization of travelers’ booking experiences using machine learning.
2:40pm–3:20pm Thursday, March 28, 2019
Any business, big or small, depends on analytics, whether the goal is revenue generation, churn reduction, or sales and marketing. No matter the algorithm and techniques used, the result depends on the accuracy and consistency of the data being processed. Sridhar Alla and Syed Nasar share techniques used to evaluate the quality of data and to detect anomalies in it.
2:40pm–3:20pm Thursday, March 28, 2019
A common problem in predictive modeling is label leakage. At enterprise companies such as Salesforce, this problem takes on monstrous proportions, as the data is populated by diverse business processes, making it hard to distinguish cause from effect. Till Bergmann explains how Salesforce—which needs to churn out thousands of customer-specific models for any given use case—tackled this problem.
3:50pm–4:30pm Thursday, March 28, 2019
Noah Gift and Michelle Davenport explore exciting ideas in nutrition using data science; specifically, they analyze the detrimental relationship between sugar and longevity, obesity, and chronic diseases.
3:50pm–4:30pm Thursday, March 28, 2019
RAPIDS is the next big step in data science, combining the ease of use of common APIs and the power and scalability of GPUs. Bartley Richardson and Joshua Patterson offer an overview of RAPIDS and explore cuDF, cuGraph, and cuML—a trio of RAPIDS tools that enable data scientists to work with data in a familiar interface and apply graph analytics and traditional machine learning techniques.
3:50pm–4:30pm Thursday, March 28, 2019
Yuhao Yang and Jennie Wang demonstrate how to run distributed TensorFlow on Apache Spark with the open source software package Analytics Zoo. Compared to other solutions, Analytics Zoo is built for production environments and encourages more industry users to run deep learning applications within big data ecosystems.
3:50pm–4:30pm Thursday, March 28, 2019
Brands that test the content of ads before they are shown to an audience can avoid spending resources on the 11% of ads that cause backlash. Using a survey experiment to choose the best ad typically improves effectiveness of marketing campaigns by 13% on average, and up to 37% for particular demographics. Patrick Miller explores data collection and statistical methods for analysis and reporting.
4:40pm–5:20pm Thursday, March 28, 2019
As a customer-facing fintech company, Earnin has access to various types of valuable customer data, from bank transactions to GPS location. Ji Peng shares how Earnin uses unique datasets to build machine learning models and navigates the challenges of prioritizing and applying machine learning in the fintech domain.
4:40pm–5:20pm Thursday, March 28, 2019
Alex Gorbachev and Paul Spiegelhalter use the example of a mining haul truck to explain how to map preventive maintenance needs to supervised machine learning problems, create labeled datasets, do feature engineering from sensors and alerts data, evaluate models—then convert it all to a complete AI solution on Google Cloud Platform that's integrated with existing on-premises systems.
4:40pm–5:20pm Thursday, March 28, 2019
The General Data Protection Regulation (GDPR) enacted by the European Union restricts the use of machine learning practices in many cases. Michael Gregory offers an overview of the regulations, important considerations for both EU and non-EU organizations, and tools and technologies to ensure that you're appropriately using ML applications to drive continued transformation and insights.
4:40pm–5:20pm Thursday, March 28, 2019
Idealo.de recently trained convolutional neural networks (CNN) for aesthetic and technical image quality predictions. Christopher Lennan shares the training approach, along with some practical insights, and sheds light on what the trained models actually learned by visualizing the convolutional filter weights and output nodes of the trained models.
4:40pm–5:20pm Thursday, March 28, 2019
Decision making often struggles with the exploration-exploitation dilemma. Multi-armed bandits (MAB) are a popular reinforcement learning solution, but increasing the number of decision criteria leads to an exponential blowup in complexity, and observational delays don’t allow for optimal performance. Shradha Agrawal offers an overview of MABs and explains how to overcome the above challenges.
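The exploration-exploitation dilemma described above has a classic baseline policy, epsilon-greedy: mostly pull the arm with the best estimated payout, but explore at random a fraction of the time. The simulation below uses three arms with hidden payout rates; it is a toy illustration of the MAB setting, not the speaker's system, and the rates and epsilon are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
true_rates = np.array([0.2, 0.5, 0.8])      # arm 2 is best, but the agent doesn't know
pulls = np.zeros(3)
wins = np.zeros(3)

for _ in range(5000):
    if rng.random() < 0.1:                  # explore: random arm 10% of the time
        arm = rng.integers(3)
    else:                                   # exploit: current best estimated rate
        arm = np.argmax(wins / np.maximum(pulls, 1))
    pulls[arm] += 1
    wins[arm] += rng.random() < true_rates[arm]

print(np.argmax(pulls))  # 2 (the best arm ends up with the vast majority of pulls)
```

The blowup the talk addresses appears when each "arm" becomes a combination of many decision criteria: the number of arms grows exponentially, and delayed reward observations further slow convergence of the estimates.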