Featured Speakers
Monday, Mar 25 – Tuesday, Mar 26: 2-Day Training (Platinum & Training passes)
Tuesday, Mar 26: Tutorials (Gold & Silver passes)
Wednesday, Mar 27: Keynotes & Sessions (Platinum, Gold, Silver & Bronze passes)
8:45am: Strata Data Conference Keynotes | Location: Ballroom
10:30am: Morning break
Thursday, Mar 28: Keynotes & Sessions (Platinum, Gold, Silver & Bronze passes)
8:45am: Strata Data Conference Keynotes | Location: Ballroom
10:30am: Morning break
9:00am–5:00pm Monday, March 25 & Tuesday, March 26, 2019
The TensorFlow library is built around computational graphs with automatic parallelization across resources, an architecture well suited to implementing neural networks. Robert Schroll offers an overview of TensorFlow's capabilities in Python, demonstrating how to build machine learning algorithms piece by piece and how to use TensorFlow's Keras API, with several hands-on applications.
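The computational-graph idea the session builds on can be sketched in a few lines of plain Python. This is a toy illustration of the dataflow concept, not TensorFlow's actual implementation or API; all names here are invented for the example.

```python
# Toy computational graph: each node stores an operation and its inputs,
# and evaluation recursively walks the graph. A (vastly simplified)
# sketch of the dataflow idea behind TensorFlow.

class Node:
    def __init__(self, op, inputs=(), value=None):
        self.op, self.inputs, self.value = op, inputs, value

    def eval(self):
        if self.op == "const":
            return self.value
        vals = [n.eval() for n in self.inputs]
        if self.op == "add":
            return vals[0] + vals[1]
        if self.op == "mul":
            return vals[0] * vals[1]
        raise ValueError(self.op)

# y = (2 * 3) + 4, expressed as a graph and then evaluated
two, three, four = (Node("const", value=v) for v in (2.0, 3.0, 4.0))
y = Node("add", (Node("mul", (two, three)), four))
print(y.eval())  # 10.0
```

Because the whole expression is data rather than immediate computation, a framework is free to schedule independent subgraphs in parallel, which is what TensorFlow's runtime does at scale.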
9:00am–5:00pm Monday, March 25 & Tuesday, March 26, 2019
Don Fox walks you through developing a machine learning pipeline, from prototyping to production. You'll learn about data cleaning, feature engineering, model building and evaluation, and deployment and then extend these models into two applications from real-world datasets. All work will be done in Python.
9:00am–5:00pm Monday, March 25 & Tuesday, March 26, 2019
Advancing your career in data science requires learning new languages and frameworks—but learners face an overwhelming array of choices, each with different syntaxes, conventions, and terminology. Ian Cook simplifies the learning process by elucidating the abstractions common to these systems. Through hands-on exercises, you'll overcome obstacles to getting started using new tools.
9:00am–5:00pm Monday, March 25 & Tuesday, March 26, 2019
Francesca Lazzeri and Jen Ren walk you through the core steps for using Azure Machine Learning services to train your machine learning models both locally and on remote compute resources.
9:00am–12:30pm Tuesday, March 26, 2019
Martin Gorner leads a hands-on introduction to recurrent neural networks and TensorFlow. Join in to discover what makes RNNs so powerful for time series analysis.
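The recurrence that makes RNNs suited to time series can be written out by hand. The sketch below is a minimal vanilla-RNN forward pass in numpy with fixed random weights, purely to show the mechanism a tutorial like this builds up to in TensorFlow; it is not trained and not the session's code.

```python
import numpy as np

# Minimal vanilla-RNN forward pass over a short time series.
# The same weights are reused at every step; the hidden state h
# carries memory of everything seen so far.

def rnn_forward(xs, W_x, W_h, b):
    h = np.zeros(W_h.shape[0])
    for x in xs:                            # one step per time point
        h = np.tanh(W_x @ x + W_h @ h + b)  # new state mixes input and memory
    return h                                # final state summarizes the series

rng = np.random.default_rng(0)
series = rng.normal(size=(10, 1))           # 10 time steps, 1 feature
W_x = rng.normal(size=(4, 1)) * 0.5         # input-to-hidden weights
W_h = rng.normal(size=(4, 4)) * 0.5         # hidden-to-hidden (the recurrence)
b = np.zeros(4)
state = rnn_forward(series, W_x, W_h, b)
print(state.shape)  # (4,)
```

Training consists of learning W_x, W_h, and b by backpropagation through these repeated steps, which is what TensorFlow automates.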
9:00am–12:30pm Tuesday, March 26, 2019
David Talby, Alex Thomas, and Claudiu Branzan lead a hands-on introduction to scalable NLP using the highly performant, highly scalable open source Spark NLP library. You’ll spend about half your time coding as you work through four sections, each with an end-to-end working codebase that you can change and improve.
9:00am–12:30pm Tuesday, March 26, 2019
From healthcare to smart home to autonomous vehicles, new applications of autonomous systems are raising ethical concerns about a host of issues, including bias, transparency, and privacy. Iman Saleh, Cory Ilo, and Cindy Tseng demonstrate tools and capabilities that can help data scientists address these concerns and bridge the gap between ethicists, regulators, and machine learning practitioners.
1:30pm–5:00pm Tuesday, March 26, 2019
If machine learning can lead to financial gains for your organization, why isn’t everyone doing it? One reason is training machine learning systems with transparent inner workings and auditable predictions is difficult. Patrick Hall details the good, bad, and downright ugly lessons learned from his years of experience implementing solutions for interpretable machine learning.
1:30pm–5:00pm Tuesday, March 26, 2019
Abhishek Kumar and Pramod Singh walk you through deep learning-based recommender and personalization systems they've built for clients. Join in to learn how to use TensorFlow Serving and MLflow for end-to-end productionalization, including model serving, Dockerization, reproducibility, and experimentation, and Kubernetes for deployment and orchestration of ML-based microservices.
1:30pm–5:00pm Tuesday, March 26, 2019
Jason Dai, Yuhao Yang, Jennie Wang, and Guoqiong Song explain how to build and productionize deep learning applications for big data with Analytics Zoo—a unified analytics and AI platform that seamlessly unites Spark, TensorFlow, Keras, and BigDL programs into an integrated pipeline—using real-world use cases from JD.com, MLSListings, the World Bank, Baosight, and Midea/KUKA.
1:30pm–5:00pm Tuesday, March 26, 2019
Thanks to the rapid growth in data resources, business leaders now appreciate the importance (and the challenge) of mining information from data. Join in as a group of LinkedIn's data scientists share their experiences successfully leveraging emerging techniques to assist in intelligent decision making.
11:00am–11:40am Wednesday, March 27, 2019
Financial services firms are increasingly deploying AI for a wide range of applications, such as identifying fraud and financial crimes. Such deployments require models to be interpretable, explainable, and resilient to adversarial attacks, since regulatory requirements prohibit black-box machine learning models. Jari Koister shares tools and infrastructure developed to support these needs.
11:00am–11:40am Wednesday, March 27, 2019
Data platforms are being asked to support an ever increasing range of workloads and compute environments, including machine learning and elastic cloud platforms. Tristan Zajonc and Tim Chen discuss emerging capabilities, including running machine learning and Spark workloads on autoscaling container platforms, and share their vision for the road ahead for ML and AI in the cloud.
11:00am–11:40am Wednesday, March 27, 2019
Jeremy Howard describes how to leverage the latest research from the deep learning and HCI communities to train neural networks from scratch—without code or preexisting labels. He then shares case studies in fashion, retail and ecommerce, travel, and agriculture where these approaches have been used.
11:00am–11:40am Wednesday, March 27, 2019
Alon Kaufman and Vinod Vaikuntanathan discuss the challenges and opportunities of machine learning on encrypted data and describe the state of the art in this space.
11:00am–11:40am Wednesday, March 27, 2019
Robert Horton, Mario Inchiosa, and Ali Zaidi demonstrate how to use three cutting-edge machine learning techniques—transfer learning from pretrained language models, active learning to make more effective use of a limited labeling budget, and hyperparameter tuning to maximize model performance—to up your modeling game.
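Of the three techniques above, active learning is the easiest to illustrate compactly. The sketch below shows margin-based uncertainty sampling, the baseline active-learning strategy: given a limited labeling budget, send the annotator the examples the current model is least sure about. The probability matrix is a stand-in for any classifier's predicted class probabilities, not output from the speakers' models.

```python
import numpy as np

def pick_for_labeling(probs, budget):
    """Return indices of the `budget` least confident examples."""
    sorted_p = np.sort(probs, axis=1)
    # margin between top two class probabilities; small margin = unsure
    margin = sorted_p[:, -1] - sorted_p[:, -2]
    return np.argsort(margin)[:budget]

probs = np.array([[0.98, 0.01, 0.01],   # confident
                  [0.40, 0.35, 0.25],   # unsure
                  [0.55, 0.44, 0.01],   # borderline
                  [0.90, 0.05, 0.05]])  # confident
print(pick_for_labeling(probs, budget=2))  # [1 2] (the least confident rows)
```

Each labeling round then retrains the model on the newly labeled examples and repeats, concentrating the budget where it most improves the decision boundary.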
11:00am–11:40am Wednesday, March 27, 2019
Sam Lightstone discusses how AI is fundamentally changing computer science and the practice of coding. Join in to discover what machine learning means today and explore recent advances in hardware and software and breakthrough innovations.
11:50am–12:30pm Wednesday, March 27, 2019
Quantitative finance is a rich field where advanced mathematical and statistical techniques are employed by both sell-side and buy-side institutions. Chakri Cherukuri explains how machine learning and deep learning techniques are being used in quantitative finance and details how these models work under the hood.
11:50am–12:30pm Wednesday, March 27, 2019
How does Salesforce make data science an Agile partner to over 100,000 customers? Sarah Aerni shares the nuts and bolts of the platform and details the Agile process behind it. From open source autoML library TransmogrifAI and experimentation to deployment and monitoring, Sarah covers the tools that make it possible for data scientists to rapidly iterate and adopt a truly Agile methodology.
11:50am–12:30pm Wednesday, March 27, 2019
Customer segmentation based on coarse survey data is a staple of traditional market research. Melinda Han Williams explains how Dstillery uses neural networks to model the digital pathways of 100M consumers and uses the resulting embedding space to cluster customer populations into fine-grained behavioral segments and inform smarter consumer insights—in the process, creating a map of the internet.
11:50am–12:30pm Wednesday, March 27, 2019
Google uses deep learning extensively in new and existing products. Join Ron Bodkin to learn how Google has used deep learning for recommendations at YouTube, in the Play store, and for customers in Google Cloud. You'll explore the role of embeddings, recurrent networks, contextual variables, and wide and deep learning and discover how to do candidate generation and ranking with deep learning.
11:50am–12:30pm Wednesday, March 27, 2019
How do you train a machine learning model with no training data? Michael Johnson and Norris Heintzelman share their journey implementing multiple solutions to bootstrapping training data in the NLP domain, covering topics including weak supervision, building an active learning framework, and annotation adjudication for named-entity recognition.
2:40pm–3:20pm Wednesday, March 27, 2019
Divya Choudhary explains how GO-JEK mines the chat messages and notes, written in a local language, that customers send their drivers while waiting for a ride to arrive. These messages yield unparalleled information about pickup points and their names (some unknown even to Google Maps), helping create a world-class customer pickup experience.
2:40pm–3:20pm Wednesday, March 27, 2019
Evaluating machine learning models is surprisingly hard, particularly because these systems interact in very subtle ways. Ted Dunning breaks the problem of evaluation apart into operational and functional evaluation, demonstrating how to do each without unnecessary pain and suffering. Along the way, he shares exciting visualization techniques that will help make differences strikingly apparent.
2:40pm–3:20pm Wednesday, March 27, 2019
Dilated neural networks are a class of recently developed neural networks that achieve promising results in time series forecasting. Chenhui Hu discusses representative network architectures of dilated neural networks and demonstrates their advantages in terms of training efficiency and forecast accuracy by applying them to solve sales forecasting and financial time series forecasting problems.
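The defining operation of these networks, a dilated convolution, is easy to show directly: the kernel skips `dilation - 1` samples between taps, so stacked layers see exponentially longer history with the same kernel size. The numpy sketch below shows one such layer on a toy series; it is an illustration of the operation, not a trained forecasting model.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """1-D convolution whose kernel taps are `dilation` samples apart."""
    k = len(kernel)
    span = (k - 1) * dilation                  # receptive field of one layer
    out = np.empty(len(x) - span)
    for t in range(len(out)):
        taps = x[t : t + span + 1 : dilation]  # every `dilation`-th sample
        out[t] = taps @ kernel
    return out

x = np.arange(10, dtype=float)                 # toy series 0..9
y = dilated_conv1d(x, kernel=np.array([1.0, 1.0]), dilation=4)
print(y)  # each output sums x[t] and x[t+4]: 4, 6, 8, 10, 12, 14
```

Stacking layers with dilations 1, 2, 4, 8, ... gives a receptive field that doubles per layer, which is why these architectures capture long-range temporal structure cheaply.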
2:40pm–3:20pm Wednesday, March 27, 2019
Sonal Gupta explores practical systems for building a conversational AI system for task-oriented queries and details a way to do more advanced compositional understanding, which can understand cross-domain queries, using hierarchical representations.
2:40pm–3:20pm Wednesday, March 27, 2019
The nascent field of fair machine learning aims to ensure that decisions guided by algorithms are equitable. Several formal definitions of fairness have gained prominence, but, as Sharad Goel argues, nearly all of them suffer from significant statistical limitations. Perversely, when used as a design constraint, they can even harm the very groups they were intended to protect.
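Two of the formal fairness criteria this line of work examines can be computed directly from decisions and outcomes: demographic parity compares positive decision rates across groups, while equal opportunity compares true-positive rates. The data below is made up for illustration and deliberately constructed so the two criteria disagree, echoing the talk's point that the definitions can conflict.

```python
import numpy as np

def positive_rate(decisions):
    return decisions.mean()

def true_positive_rate(decisions, outcomes):
    # of the truly positive cases, how many received a positive decision
    return decisions[outcomes == 1].mean()

group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
decision = np.array([1, 1, 0, 0, 1, 0, 0, 0])
outcome = np.array([1, 0, 0, 1, 1, 1, 0, 0])

for g in (0, 1):
    d, o = decision[group == g], outcome[group == g]
    print(g, positive_rate(d), true_positive_rate(d, o))
# Group 0: positive rate 0.5, TPR 0.5; group 1: positive rate 0.25, TPR 0.5.
# Equal opportunity holds here while demographic parity does not.
```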
4:20pm–5:00pm Wednesday, March 27, 2019
Yogesh Pandit, Saif Addin Ellafi, and Vishakha Sharma discuss how Roche applies Spark NLP for healthcare to extract clinical facts from pathology and radiology reports. They then detail the design of the deep learning pipelines used to simplify training, optimization, and inference of such domain-specific models at scale.
4:20pm–5:00pm Wednesday, March 27, 2019
Production ML applications benefit from reproducible, automated retraining and deployment of ever-more-predictive models trained on ever-increasing amounts of data. Kelley Rivoire explains how Stripe built a flexible API for training machine learning models that's used to train thousands of models per week on Kubernetes, supporting automated deployment of new models with improved performance.
4:20pm–5:00pm Wednesday, March 27, 2019
User-based real-time recommendation systems have become an important topic in ecommerce. Lu Wang, Nicole Kong, Guoqiong Song, and Maneesha Bhalla demonstrate how to build deep learning algorithms using Analytics Zoo with BigDL on Apache Spark and create an end-to-end system to serve real-time product recommendations.
4:20pm–5:00pm Wednesday, March 27, 2019
Talent search systems at LinkedIn strive to match the potential candidates to the hiring needs of a recruiter expressed in terms of a search query. Gungor Polatkan shares the results of the company's deployment of deep learning models on a real-world production system serving 500M+ users through LinkedIn Recruiter.
4:20pm–5:00pm Wednesday, March 27, 2019
Time series forecasting techniques are applied in a wide range of scientific disciplines, business scenarios, and policy settings. Jeffrey Yau discusses the applications of statistical time series models, such as ARIMA, VAR, and regime-switching models, and machine learning models, such as random forest and neural network-based models, to forecasting problems.
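The simplest member of the ARIMA family covered above, an AR(1) model, can be fit by ordinary least squares: regress each value on its predecessor. The sketch below uses synthetic data with a known coefficient, purely to show the mechanics; real applications would use a library such as statsmodels rather than this hand-rolled estimator.

```python
import numpy as np

rng = np.random.default_rng(42)
phi_true = 0.7
y = np.zeros(500)
for t in range(1, 500):                  # simulate y_t = 0.7 * y_{t-1} + noise
    y[t] = phi_true * y[t - 1] + rng.normal()

# OLS slope with no intercept: phi_hat = <y_{t-1}, y_t> / <y_{t-1}, y_{t-1}>
x_prev, x_next = y[:-1], y[1:]
phi_hat = (x_prev @ x_next) / (x_prev @ x_prev)

forecast = phi_hat * y[-1]               # one-step-ahead forecast
print(round(phi_hat, 2))                 # close to the true 0.7
```

VAR models generalize this to vectors of series (each value regressed on lagged values of all series), and regime-switching models let the coefficient itself change between hidden states.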
5:10pm–5:50pm Wednesday, March 27, 2019
Rakesh Kumar and Thomas Weise explore how Lyft dynamically prices its rides with a combination of various data sources, ML models, and streaming infrastructure for low latency, reliability, and scalability—allowing the pricing system to be more adaptable to real-world changes.
5:10pm–5:50pm Wednesday, March 27, 2019
Ting-Fang Yen details an approach for monitoring production machine learning systems that handle billions of requests daily by discovering detection anomalies, such as spurious false positives, as well as gradual concept drift, where the model no longer captures the target concept. Join in to explore new tools for detecting undesirable model behaviors early in large-scale online ML systems.
5:10pm–5:50pm Wednesday, March 27, 2019
From determining the most convenient rider pickup points to predicting the fastest routes, Uber uses data-driven analytics to create seamless trip experiences. Inside Uber, analysts are using deep learning and big data to train models, make predictions, and run analytics in real time. Zhenxiao Luo explains how Uber runs real-time analytics with deep learning.
5:10pm–5:50pm Wednesday, March 27, 2019
Kevin Moore walks you through how TransmogrifAI—Salesforce's open source AutoML library built on Spark—automatically generates models customized to a company's dataset and use case and provides insights into why a model makes the predictions it does.
5:10pm–5:50pm Wednesday, March 27, 2019
Imagine building a model whose training data is collected on edge devices such as cell phones or sensors. Each device collects data unlike any other, and the data cannot leave the device because of privacy concerns or unreliable network access. This challenging situation is known as federated learning. Mike Lee Williams discusses the algorithmic solutions and the product opportunities.
11:00am–11:40am Thursday, March 28, 2019
Boris Yakubchik and Salah Zalatimo offer an overview of Bertie, Forbes's new publishing platform—an AI assistant that learns from writers and suggests improvements—and detail Bertie’s features, architecture, and ultimate goals, paying special attention to how the company implemented an ensemble of machine learning models that, together, make up the AI assistant's skill set and personality.
11:00am–11:40am Thursday, March 28, 2019
Clustered data is all around us. The best way to attack it? Mixed effect models. Sourav Dey explains how the mixed effects random forests (MERF) model and Python package marries the world of classical mixed effect modeling with modern machine learning algorithms and shows how it can be extended to be used with other advanced modeling techniques like gradient boosting machines and deep learning.
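The core idea behind mixed effect models like MERF is that predictions combine a shared (fixed) component with a per-cluster (random) offset, where small clusters are shrunk toward global behavior. The toy random-intercept sketch below illustrates that shrinkage; it is not the MERF algorithm itself, and the shrinkage constant is an arbitrary choice for the example.

```python
import numpy as np

def shrunken_offsets(values, clusters, lam=5.0):
    """Global mean plus shrunken per-cluster offsets (random intercepts)."""
    mu = values.mean()                       # fixed effect: global mean
    offsets = {}
    for c in np.unique(clusters):
        v = values[clusters == c]
        # shrinkage factor n/(n+lam): big clusters keep their deviation,
        # small clusters are pulled toward a zero offset
        offsets[c] = len(v) / (len(v) + lam) * (v.mean() - mu)
    return mu, offsets

clusters = np.array(["a"] * 8 + ["b"] * 2)
values = np.array([10.0] * 8 + [0.0] * 2)
mu, offsets = shrunken_offsets(values, clusters)
for c, off in offsets.items():
    print(c, round(mu + off, 2))   # per-cluster prediction
```

MERF replaces the global mean here with a random forest fit on features, keeping the random-effect machinery for the cluster structure, and the same split extends to gradient boosting or deep networks as the fixed-effect learner.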
11:00am–11:40am Thursday, March 28, 2019
Online fraud flourishes as online services become ubiquitous in our daily life. Fang Yu explains how DataVisor leverages cutting-edge deep learning technologies to address the challenges in large-scale fraud detection.
11:00am–11:40am Thursday, March 28, 2019
Federated learning is an approach for training ML models across a fleet of participating devices without collecting their data in a central location. Alex Ingerman offers an overview of federated learning, compares traditional and federated ML workflows, and explores the current and upcoming use cases for decentralized machine learning, with examples from Google's deployment of this technology.
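The federated workflow described above can be sketched in miniature: each simulated device computes a model update on its local data, and only the updates (never the raw data) are averaged on a server. This is a toy federated-averaging loop on a linear model, not TensorFlow Federated's API or Google's production setup.

```python
import numpy as np

def local_update(w, X, y, lr=0.1):
    """One local gradient step on a device's private data."""
    grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of mean squared error
    return w - lr * grad

rng = np.random.default_rng(1)
w_true = np.array([2.0, -1.0])
devices = []
for _ in range(5):                          # 5 devices, each with private data
    X = rng.normal(size=(20, 2))
    devices.append((X, X @ w_true + 0.01 * rng.normal(size=20)))

w = np.zeros(2)
for _ in range(50):                         # 50 federated rounds
    updates = [local_update(w, X, y) for X, y in devices]
    w = np.mean(updates, axis=0)            # server averages device models
print(w.round(2))                           # converges near [2, -1]
```

Real deployments add device sampling, secure aggregation, and differential privacy on top of this basic average, but the data-stays-on-device structure is the same.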
11:00am–11:40am Thursday, March 28, 2019
How can we guarantee that the ML system we develop is adequately protected from adversarial manipulation? Ram Shankar Kumar shares a framework and corresponding best practices to quantitatively assess the safety of your ML systems.
11:50am–12:30pm Thursday, March 28, 2019
Jeff Chen shares strategies for overcoming time series challenges at the intersection of macroeconomics and data science, drawing from machine learning research conducted at the Bureau of Economic Analysis aimed at improving its flagship product, the gross domestic product.
11:50am–12:30pm Thursday, March 28, 2019
Today, normal growth isn't enough—you need hockey-stick levels of growth. Sales and marketing orgs are looking to AI to "growth hack" their way into new markets and segments. Ken Johnston and Ankit Srivastava explain how to use mutual information at scale across massive data sources to help filter out noise and share critical insights with new cohorts of users, businesses, and networks.
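Mutual information, the quantity named above, measures how much knowing one variable tells you about another; for discrete variables it can be computed straight from a contingency table. The counts below are made up for illustration; at the scale the talk describes, they would come from massive aggregated datasets.

```python
import numpy as np

def mutual_information(counts):
    """Mutual information (bits) between rows and columns of a count table."""
    p = counts / counts.sum()
    px = p.sum(axis=1, keepdims=True)       # marginal of rows
    py = p.sum(axis=0, keepdims=True)       # marginal of columns
    nz = p > 0                              # skip zero cells (0 * log 0 = 0)
    return float((p[nz] * np.log2(p[nz] / (px @ py)[nz])).sum())

independent = np.array([[25, 25], [25, 25]])   # no association -> MI = 0
dependent = np.array([[45, 5], [5, 45]])       # strong association
print(mutual_information(independent))  # 0.0
print(mutual_information(dependent))    # about 0.53 bits
```

Pairs of signals with near-zero mutual information are noise with respect to each other, which is exactly the filtering criterion described.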
11:50am–12:30pm Thursday, March 28, 2019
Machine learning is delivering immense value across industries. However, in some instances, machine learning models can produce overconfident results—with the potential for catastrophic outcomes. Kumar Sricharan explains how to address this challenge through Bayesian machine learning and highlights real-world examples to illustrate its benefits.
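A minimal example of the Bayesian idea in question: instead of a single point estimate, keep a posterior distribution, so reported confidence reflects how much evidence has actually been seen. The conjugate Beta-Bernoulli update below is standard textbook material, not the speaker's system.

```python
def beta_posterior(successes, failures, a=1.0, b=1.0):
    """Posterior mean and variance of a success rate under a Beta(a, b) prior."""
    a_post, b_post = a + successes, b + failures
    mean = a_post / (a_post + b_post)
    # posterior variance shrinks as evidence accumulates
    var = a_post * b_post / ((a_post + b_post) ** 2 * (a_post + b_post + 1))
    return mean, var

print(beta_posterior(3, 1))       # little data: wide, uncertain posterior
print(beta_posterior(300, 100))   # same 3:1 ratio, far more data: variance collapses
```

An overconfident point estimator would report "0.75" identically in both cases; the posterior variance is what distinguishes a guess from a well-supported estimate.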
11:50am–12:30pm Thursday, March 28, 2019
Data remains a linchpin of success for machine learning, yet too often it is a scarce resource. And even when data is available, trust issues arise about the quality and ethics of its collection. Roger Chen explores new models for generating and governing training data for AI applications.
11:50am–12:30pm Thursday, March 28, 2019
What is serverless, and how can it be utilized for data analysis and AI? Avner Braverman outlines the benefits and limitations of serverless with respect to data transformation (ETL), AI inference and training, and real-time streaming. This is a technical talk, so expect demos and code.
11:50am–12:30pm Thursday, March 28, 2019
Malicious DNS traffic patterns are inconsistent and typically thwart anomaly detection. David Rodriguez explains how Cisco uses Apache Spark and Stripe’s Bayesian inference software, Rainier, to fit the underlying time series distribution for millions of domains and outlines techniques to identify artificial traffic volumes related to spam, malvertising, and botnets (masquerading traffic).
1:50pm–2:30pm Thursday, March 28, 2019
Animesh Singh and Tommy Li explain how to implement state-of-the-art methods for attacking and defending classifiers using the open source Adversarial Robustness Toolbox. The library provides AI developers with interfaces that support the composition of comprehensive defense systems using individual methods as building blocks.
1:50pm–2:30pm Thursday, March 28, 2019
Anomaly detection has many applications, such as tracking business KPIs or fraud spotting in credit card transactions. Unfortunately, there's no one best way to detect anomalies across a variety of domains. Jonathan Merriman and Cynthia Freeman introduce a framework to determine the best anomaly detection method for the application based on time series characteristics.
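One of the simplest detectors such a framework can choose among is a rolling z-score: flag any point far outside the recent window's mean. The sketch below runs it on synthetic data with an injected spike; it shows only this baseline method, with arbitrary window and threshold values, not the speakers' framework.

```python
import numpy as np

def zscore_anomalies(x, window=20, threshold=4.0):
    """Indices where the value is > `threshold` std devs from its recent window."""
    flags = []
    for t in range(window, len(x)):
        hist = x[t - window : t]
        z = (x[t] - hist.mean()) / (hist.std() + 1e-9)  # guard zero std
        if abs(z) > threshold:
            flags.append(t)
    return flags

rng = np.random.default_rng(7)
series = rng.normal(size=200)
series[120] += 10.0                  # inject a spike
print(zscore_anomalies(series))      # the injected spike at index 120 is flagged
```

This method assumes a roughly stationary, non-seasonal series; series with trend or strong seasonality need different detectors, which is precisely why characteristics-based method selection matters.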
1:50pm–2:30pm Thursday, March 28, 2019
Deep learning using sequence-to-sequence (Seq2Seq) networks has demonstrated unparalleled success in neural machine translation. Forecasting, a less explored but highly sought-after area, can leverage the recent gains made in Seq2Seq networks. Aashish Sheshadri explains how PayPal has applied deep networks to monitoring and alerting intelligence.
1:50pm–2:30pm Thursday, March 28, 2019
Piero Molino offers an overview of Ludwig, a deep learning toolbox that allows you to train models and use them for prediction without the need to write code. It's unique in its ability to help make deep learning easier to understand for nonexperts and enable faster model improvement iteration cycles for experienced machine learning developers and researchers alike.
1:50pm–2:30pm Thursday, March 28, 2019
Louis DiValentin and Dillon Cullinan explain how Accenture's Cyber Security Lab built security analytics models to detect attempted lateral movement in networks by transforming enterprise-scale security data into a graph format, generating graph analytics for individual users, and building time series detection models that visualize the changing graph metrics for security operators.
2:40pm–3:20pm Thursday, March 28, 2019
Some people use digital devices to track their blood alcohol content (BAC). A BAC-tracking app that could anticipate when a person is likely to have a high BAC could offer coaching in a time of need. Kirstin Aschbacher shares a machine learning approach that predicts user BAC levels with good precision based on minimal information, thereby enabling targeted interventions.
2:40pm–3:20pm Thursday, March 28, 2019
Kapil Gupta explains how Airbnb approaches the personalization of travelers’ booking experiences using machine learning.
2:40pm–3:20pm Thursday, March 28, 2019
Any business, big or small, depends on analytics, whether the goal is revenue generation, churn reduction, or sales and marketing. No matter the algorithm and techniques used, the result depends on the accuracy and consistency of the data being processed. Sridhar Alla and Syed Nasar share techniques used to evaluate the quality of data and to detect anomalies in it.
2:40pm–3:20pm Thursday, March 28, 2019
A common problem in predictive modeling is label leakage. At enterprise companies such as Salesforce, this problem takes on monstrous proportions, as the data is populated by diverse business processes, making it hard to distinguish cause from effect. Till Bergmann explains how Salesforce—which needs to churn out thousands of customer-specific models for any given use case—tackled this problem.
3:50pm–4:30pm Thursday, March 28, 2019
Noah Gift and Michelle Davenport explore exciting ideas in nutrition using data science; specifically, they analyze the detrimental relationship between sugar and longevity, obesity, and chronic diseases.
3:50pm–4:30pm Thursday, March 28, 2019
RAPIDS is the next big step in data science, combining the ease of use of common APIs and the power and scalability of GPUs. Bartley Richardson and Joshua Patterson offer an overview of RAPIDS and explore cuDF, cuGraph, and cuML—a trio of RAPIDS tools that enable data scientists to work with data in a familiar interface and apply graph analytics and traditional machine learning techniques.
3:50pm–4:30pm Thursday, March 28, 2019
Yuhao Yang and Jennie Wang demonstrate how to run distributed TensorFlow on Apache Spark with the open source software package Analytics Zoo. Compared to other solutions, Analytics Zoo is built for production environments and encourages more industry users to run deep learning applications within big data ecosystems.
3:50pm–4:30pm Thursday, March 28, 2019
Brands that test the content of ads before they are shown to an audience can avoid spending resources on the 11% of ads that cause backlash. Using a survey experiment to choose the best ad typically improves effectiveness of marketing campaigns by 13% on average, and up to 37% for particular demographics. Patrick Miller explores data collection and statistical methods for analysis and reporting.
4:40pm–5:20pm Thursday, March 28, 2019
As a customer-facing fintech company, Earnin has access to various types of valuable customer data, from bank transactions to GPS location. Ji Peng shares how Earnin uses unique datasets to build machine learning models and navigates the challenges of prioritizing and applying machine learning in the fintech domain.
4:40pm–5:20pm Thursday, March 28, 2019
Alex Gorbachev and Paul Spiegelhalter use the example of a mining haul truck to explain how to map preventive maintenance needs to supervised machine learning problems, create labeled datasets, do feature engineering from sensors and alerts data, evaluate models—then convert it all to a complete AI solution on Google Cloud Platform that's integrated with existing on-premises systems.
4:40pm–5:20pm Thursday, March 28, 2019
The General Data Protection Regulation (GDPR) enacted by the European Union restricts the use of machine learning practices in many cases. Michael Gregory offers an overview of the regulations, important considerations for both EU and non-EU organizations, and tools and technologies to ensure that you're appropriately using ML applications to drive continued transformation and insights.
4:40pm–5:20pm Thursday, March 28, 2019
Idealo.de recently trained convolutional neural networks (CNN) for aesthetic and technical image quality predictions. Christopher Lennan shares the training approach, along with some practical insights, and sheds light on what the trained models actually learned by visualizing the convolutional filter weights and output nodes of the trained models.
4:40pm–5:20pm Thursday, March 28, 2019
Decision making often struggles with the exploration-exploitation dilemma. Multi-armed bandits (MAB) are a popular reinforcement learning solution, but increasing the number of decision criteria leads to an exponential blowup in complexity, and observational delays don’t allow for optimal performance. Shradha Agrawal offers an overview of MABs and explains how to overcome the above challenges.
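The exploration-exploitation dilemma described above has a classic baseline policy, epsilon-greedy: mostly pull the arm with the best estimated payout, but explore at random a fraction of the time. The simulation below uses three arms with hidden payout rates; it is a toy illustration of the MAB setting, not the speaker's system, and the rates and epsilon are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
true_rates = np.array([0.2, 0.5, 0.8])      # arm 2 is best, but the agent doesn't know
pulls = np.zeros(3)
wins = np.zeros(3)

for _ in range(5000):
    if rng.random() < 0.1:                  # explore: random arm 10% of the time
        arm = rng.integers(3)
    else:                                   # exploit: current best estimated rate
        arm = np.argmax(wins / np.maximum(pulls, 1))
    pulls[arm] += 1
    wins[arm] += rng.random() < true_rates[arm]

print(np.argmax(pulls))  # 2 (the best arm ends up with the vast majority of pulls)
```

The blowup the talk addresses appears when each "arm" becomes a combination of many decision criteria: the number of arms grows exponentially, and delayed reward observations further slow convergence of the estimates.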