Data science and machine learning: Big data conference & machine learning training

Wednesday Sep 12: Keynotes & Sessions (Platinum, Gold, Silver & Bronze passes)
9:00am | Strata Data Conference Keynotes | Location: 3E
10:50am Morning break

Thursday Sep 13: Keynotes & Sessions (Platinum, Gold, Silver & Bronze passes)
9:00am | Strata Data Conference Keynotes | Location: 3E
10:50am Morning break

9:00am–12:30pm Tuesday, 09/11/2018

Building a large-scale machine learning application using Amazon SageMaker and Spark

Location: 1A 10 Level: Intermediate

David Arpin (Amazon Web Services)

Average rating:

(2.80, 10 ratings)

David Arpin walks you through building a machine learning application, from data manipulation to algorithm training to deployment to a real-time prediction endpoint, using Spark and Amazon SageMaker.

9:00am–12:30pm Tuesday, 09/11/2018

Deep learning methods for natural language processing

Location: 1A 21/22 Level: Intermediate

Secondary topics: Deep Learning, Text and Language processing and analysis

Garrett Hoffman (StockTwits)

Average rating:

(4.75, 4 ratings)

Garrett Hoffman walks you through deep learning methods for natural language processing and natural language understanding tasks, using a live example in Python and TensorFlow with StockTwits data. Methods include word2vec, recurrent neural networks and variants (LSTM, GRU), and convolutional neural networks.

9:00am–12:30pm Tuesday, 09/11/2018

Model serving and management at scale using open source tools

Location: 1E 06 Level: Intermediate

Secondary topics: Model lifecycle management

Dan Crankshaw (UC Berkeley RISELab)

Average rating:

(5.00, 1 rating)

Dan Crankshaw offers an overview of the current challenges in deploying machine learning applications into production and the current state of prediction serving infrastructure. He then leads a deep dive into the Clipper serving system and shows you how to get started.

9:00am–12:30pm Tuesday, 09/11/2018

Learning machine learning using astronomy datasets

Location: 1E 14 Level: Beginner

Viviana Acquaviva (CUNY New York City College of Technology)

Average rating:

(4.75, 4 ratings)

Using diverse, publicly available datasets and actual problems from astronomy research, Viviana Acquaviva leads an intermediate tutorial on machine learning. You'll learn how to customize the algorithms and evaluation metrics that scientific applications require and discover best practices for choosing, developing, and evaluating machine learning algorithms on real-world datasets.

9:00am–12:30pm Tuesday, 09/11/2018

Deep learning-based search and recommendation systems using TensorFlow

Location: 1E 15/16 Level: Intermediate

Secondary topics: Deep Learning, Recommendation Systems

Vijay Agneeswaran (Walmart Labs), Abhishek Kumar (Publicis Sapient)

Average rating:

(4.40, 5 ratings)

Abhishek Kumar and Vijay Srinivas Agneeswaran offer an introduction to deep learning-based recommendation and learning-to-rank systems using TensorFlow. You'll learn how to build a deep learning recommender system based on intent prediction, drawing on a real-world implementation for an ecommerce client.

1:30pm–5:00pm Tuesday, 09/11/2018

Data science with Unix power tools

Location: 1A 10 Level: Intermediate

Jeroen Janssens (Data Science Workshops)

Average rating:

(3.00, 3 ratings)

The Unix command line remains an amazing environment for efficiently performing tedious but essential data science tasks. By combining small, powerful command-line tools, you can quickly scrub, explore, and model your data as well as hack together prototypes. Join Jeroen Janssens for a hands-on workshop based on his book Data Science at the Command Line.

1:30pm–5:00pm Tuesday, 09/11/2018

Recurrent neural networks for time series analysis

Location: 1A 12/14 Level: Intermediate

Secondary topics: Deep Learning, Temporal data and time-series analytics

Bruno Gonçalves (Data For Science)

Average rating:

(3.14, 7 ratings)

Time series are everywhere around us. Understanding them requires taking into account the sequence of values seen in previous steps and even long-term temporal correlations. Join Bruno Gonçalves to learn how to use recurrent neural networks to model and forecast time series and discover the advantages and disadvantages of recurrent neural networks with respect to more traditional approaches.

1:30pm–5:00pm Tuesday, 09/11/2018

Natural language understanding at scale with Spark NLP

Location: 1A 21/22 Level: Intermediate

Secondary topics: Text and Language processing and analysis

David Talby (Pacific AI), Claudiu Branzan (Accenture), Alex Thomas (John Snow Labs)

Average rating:

(3.00, 7 ratings)

David Talby, Claudiu Branzan, and Alex Thomas lead a hands-on tutorial for scalable NLP using the highly performant, highly scalable open source Spark NLP library. You’ll spend about half your time coding as you work through four sections, each with an end-to-end working codebase that you can change and improve.

1:30pm–5:00pm Tuesday, 09/11/2018

Leveraging Spark and deep learning frameworks to understand data at scale

Location: 1E 07/08 Level: Intermediate

Secondary topics: Deep Learning

Vartika Singh (Cloudera), Alan Silva (Cloudera), Alex Bleakley (Cloudera), Steven Totman (Cloudera), Mirko Kämpf (Cloudera), Syed Nasar (Cloudera)

Average rating:

(1.00, 1 rating)

Vartika Singh, Alan Silva, Alex Bleakley, Steven Totman, Mirko Kämpf, and Syed Nasar outline approaches for preprocessing, training, inference, and deployment across datasets (time series, audio, video, text, etc.) that leverage Spark, its extended ecosystem of libraries, and deep learning frameworks.

1:30pm–5:00pm Tuesday, 09/11/2018

How to be fair: A tutorial for beginners

Location: 1E 11 Level: Intermediate

Secondary topics: Ethics and Privacy

Aileen Nielsen (Skillman Consulting)

Average rating:

(4.00, 4 ratings)

There is mounting evidence that the widespread deployment of machine learning and artificial intelligence in business and government applications is reproducing or even amplifying existing prejudices and social inequalities. Aileen Nielsen demonstrates how to identify and avoid bias and other unfairness in your analyses.

11:20am–12:00pm Wednesday, 09/12/2018

Semantic recommendations

Location: 1A 06/07 Level: Beginner

Secondary topics: Deep Learning, Recommendation Systems

Shioulin Sam (Cloudera Fast Forward Labs)

Average rating:

(3.25, 4 ratings)

Recent advances in deep learning allow us to use the semantic content of items in recommendation systems, addressing a weakness of traditional methods. Shioulin Sam explores the limitations of classical approaches and explains how using the content of items can help solve common recommendation pitfalls, such as the cold start problem, and open up new product possibilities.

11:20am–12:00pm Wednesday, 09/12/2018

BlazeIt: An exploratory video analytics engine

Location: 1A 08 Level: Advanced

Secondary topics: Media, Marketing, Advertising

Daniel Kang (Stanford University)

Average rating:

(4.00, 2 ratings)

Daniel Kang offers an overview of exploratory video analytics engine BlazeIt, which offers FrameQL, a declarative SQL-like language for querying video, and a query optimizer for executing these queries. You'll see how FrameQL can capture a large set of real-world queries ranging from aggregation to scrubbing and how BlazeIt can execute certain queries up to 2,000x faster than a naive approach.

11:20am–12:00pm Wednesday, 09/12/2018

Breaking the rules: End-stage renal disease prediction

Location: 1A 12/14 Level: Beginner

Secondary topics: Health and Medicine

Olga Cuznetova (Optum), Manna Chang (Optum)

Average rating:

(3.33, 3 ratings)

Olga Cuznetova and Manna Chang demonstrate supervised and unsupervised learning methods for working with claims data and explain how the methods complement each other. The supervised method focuses on chronic kidney disease (CKD) patients at risk of developing end-stage renal disease (ESRD), while the unsupervised approach focuses on classifying patients who tend to develop this disease faster than others.

11:20am–12:00pm Wednesday, 09/12/2018

Machine learning for time series: What works and what doesn't

Location: 1A 15/16 Level: Intermediate

Secondary topics: Deep Learning, Retail and e-commerce, Temporal data and time-series analytics

Mikio Braun (Zalando)

Average rating:

(4.86, 7 ratings)

Time series data has many applications in industry, from analyzing server metrics to monitoring IoT signals and detecting outliers. Mikio Braun offers an overview of time series analysis with a focus on modern machine learning approaches and practical considerations, including recommendations for what works and what doesn’t, and industry use cases.

11:20am–12:00pm Wednesday, 09/12/2018

Deep learning: Assessing analytics project feasibility and requirements (sponsored by NVIDIA)

Location: 1E 15

Ward Eldred (NVIDIA)

Average rating:

(5.00, 2 ratings)

Ward Eldred offers an overview of the types of analytical problems that can be solved using deep learning and shares a set of heuristics that can be used to evaluate the feasibility of analytical AI projects.

1:15pm–1:55pm Wednesday, 09/12/2018

Document vectors in the wild: Building a content recommendation system for Reuters.com

Location: 1A 06/07 Level: Intermediate

Secondary topics: Media, Marketing, Advertising, Recommendation Systems, Text and Language processing and analysis

James Dreiss (Reuters)

Average rating:

(3.67, 3 ratings)

James Dreiss discusses the challenges of building a content recommendation system for Reuters.com, one of the largest news sites in the world. Distinctive features of the system include a scrolling newsfeed and the use of document vectors for the semantic representation of content.

1:15pm–1:55pm Wednesday, 09/12/2018

Why data scientists should love Linux containers

Location: 1A 08 Level: Beginner

Secondary topics: Model lifecycle management

William Benton (Red Hat)

Average rating:

(5.00, 2 ratings)

Containers are a hot technology for application developers, but they also provide key benefits for data scientists. William Benton details the advantages of containers for data scientists and AI developers, focusing on high-level tools that will enable you to become more productive and collaborate more effectively.

1:15pm–1:55pm Wednesday, 09/12/2018

Correlation analysis on live data streams

Location: 1A 12/14 Level: Intermediate

Secondary topics: Media, Marketing, Advertising, Temporal data and time-series analytics

Arun Kejariwal (Independent), Francois Orsini (MZ)

Average rating:

(4.00, 1 rating)

The rate of growth of data volume and velocity has been accelerating along with increases in the variety of data sources. This poses a significant challenge to extracting actionable insights in a timely fashion. Arun Kejariwal and Francois Orsini explain how marrying correlation analysis with anomaly detection can help and share techniques to guide effective decision making.

1:15pm–1:55pm Wednesday, 09/12/2018

Harnessing and customizing state-of-the-art recommendation solutions with OpenRec

Location: 1A 15/16 Level: Intermediate

Secondary topics: Deep Learning, Media, Marketing, Advertising, Recommendation Systems, Retail and e-commerce

Longqi Yang (Cornell Tech, Cornell University)

State-of-the-art recommendation algorithms are increasingly complex and no longer one size fits all. Current monolithic development practices pose significant challenges to rapid, iterative, and systematic experimentation. Longqi Yang explains how to use OpenRec to easily customize state-of-the-art solutions for diverse scenarios.

1:15pm–1:55pm Wednesday, 09/12/2018

Simplifying AI infrastructure: Lessons in scaling a deep learning enterprise (sponsored by NVIDIA)

Location: 1E 15

Darrin Johnson (NVIDIA)

While every enterprise is on a mission to infuse its business with deep learning, few know how to build the infrastructure to get them there. Darrin Johnson shares insights and best practices learned from NVIDIA's deep learning deployments around the globe that you can leverage to shorten deployment timeframes, improve developer productivity, and streamline operations.

2:05pm–2:45pm Wednesday, 09/12/2018

Diversification in recommender systems: Using topical variety to increase user satisfaction

Location: 1A 06/07 Level: Intermediate

Secondary topics: Media, Marketing, Advertising, Recommendation Systems

Ahsan Ashraf (Pinterest)

Online recommender systems often rely heavily on user engagement features. This can bias them toward exploitation over exploration, overoptimizing for users' established interests. Content diversification is important for user satisfaction, but measuring and evaluating its impact is challenging. Ahsan Ashraf outlines techniques used at Pinterest that drove ~2–3% impression gains and a ~1% time-spent gain.

2:05pm–2:45pm Wednesday, 09/12/2018

Bighead: Airbnb's end-to-end machine learning platform

Location: 1A 08 Level: Beginner

Secondary topics: Data Platforms, Model lifecycle management, Retail and e-commerce

Atul Kale (Airbnb), Xiaohan Zeng (Airbnb)

Average rating:

(5.00, 3 ratings)

Atul Kale and Xiaohan Zeng offer an overview of Bighead, Airbnb's user-friendly and scalable end-to-end machine learning framework that powers Airbnb's data-driven products. Built on Python, Spark, and Kubernetes, Bighead integrates popular libraries like TensorFlow, XGBoost, and PyTorch and is designed to be used in modular pieces.

2:05pm–2:45pm Wednesday, 09/12/2018

Continuous machine learning over streaming data: The story continues

Location: 1A 12/14 Level: Intermediate

Secondary topics: Retail and e-commerce, Temporal data and time-series analytics

Roger Barga (Amazon Web Services), Sudipto Guha (Amazon Web Services), Kapil Chhabra (Amazon Web Services)

Average rating:

(5.00, 3 ratings)

Roger Barga, Sudipto Guha, and Kapil Chhabra explain how unsupervised learning with the robust random cut forest (RRCF) algorithm enables insights into streaming data and share new applications to impute missing values, forecast future values, detect hotspots, and perform classification tasks. They also demonstrate how to implement unsupervised learning over massive data streams.

2:05pm–2:45pm Wednesday, 09/12/2018

Achieving personalization with LSTMs

Location: 1A 15/16 Level: Intermediate

Secondary topics: Deep Learning, Recommendation Systems, Temporal data and time-series analytics, Transportation and Logistics

Ankit Jain (Uber)

Average rating:

(3.00, 3 ratings)

Personalization is a common theme in social networks and ecommerce businesses. Personalization at Uber involves an understanding of how each driver and rider is expected to behave on the platform. Ankit Jain explains how Uber applies deep learning with LSTMs to its huge database to understand and predict the behavior of every user on the platform.

2:05pm–2:45pm Wednesday, 09/12/2018

Kubernetes on GPUs (sponsored by NVIDIA)

Location: 1E 15

Michael Balint (NVIDIA)

Michael Balint explains how NVIDIA employs its own distribution of Kubernetes, in conjunction with DGX hardware, to make the most efficient use of GPU resources and scale its efforts across a cluster, allowing multiple users to run experiments and push their finished work to production.

2:55pm–3:35pm Wednesday, 09/12/2018

Perverse incentives in metrics: Inequality in the like economy

Location: 1A 06/07 Level: Intermediate

Secondary topics: Ethics and Privacy, Media, Marketing, Advertising, Recommendation Systems

Bonnie Barrilleaux (LinkedIn)

Average rating:

(4.50, 4 ratings)

As LinkedIn encouraged members to join conversations, it found itself in danger of creating a "rich get richer" economy in which a few creators got an increasing share of all feedback. Bonnie Barrilleaux explains why you must regularly reevaluate metrics to avoid perverse incentives, that is, situations where efforts to increase a metric cause unintended negative side effects.

2:55pm–3:35pm Wednesday, 09/12/2018

Solving the cold start problem: Data and model aggregation using differential privacy

Location: 1A 08 Level: Beginner

Secondary topics: Ethics and Privacy

Chang Liu (Georgian Partners)

Average rating:

(5.00, 1 rating)

Chang Liu offers an overview of a common problem faced by many software companies, the cold-start problem, and explains how Georgian Partners has been successful at solving this problem by transferring knowledge from existing data through differentially private data aggregation.

2:55pm–3:35pm Wednesday, 09/12/2018

50 reasons to learn the shell for doing data science

Location: 1A 12/14 Level: Beginner

Jeroen Janssens (Data Science Workshops)

Average rating:

(1.50, 2 ratings)

"Anyone who does not have the command line at their beck and call is really missing something," tweeted Tim O'Reilly when Jeroen Janssens's Data Science at the Command Line was recently made available online for free. Join Jeroen to learn what you're missing out on if you're not applying the command line and many of its power tools to typical data science problems.

2:55pm–3:35pm Wednesday, 09/12/2018

A deep learning approach for precipitation nowcasting with RNN using Analytics Zoo on BigDL

Location: 1A 15/16 Level: Intermediate

Secondary topics: Deep Learning, Temporal data and time-series analytics

Alex Heye (Cray), Ding Ding (Intel)

Precipitation nowcasting is used to predict the future rainfall intensity over a relatively short timeframe. The forecasting resolution and time accuracy required are much higher than for other traditional forecasting tasks. Alexander Heye and Ding Ding explain how to build a precipitation nowcasting system with recurrent neural networks using BigDL on Apache Spark.

4:35pm–5:15pm Wednesday, 09/12/2018

Anxiety at scale: How Investopedia used readership data to track market volatility

Location: 1A 06/07 Level: Beginner

Secondary topics: Financial Services, Text and Language processing and analysis

Masha Westerlund (Investopedia)

Average rating:

(5.00, 2 ratings)

Businesses rely on user data to power their sites, products, and sales. Can we give back by sharing those insights with users? Masha Westerlund explains how Investopedia harnessed reader data to build an index that tracks market anxiety and moves with the VIX, a proprietary measure of market volatility. You'll see how thinking outside the box helps turn data into tools for users, not stakeholders.

4:35pm–5:15pm Wednesday, 09/12/2018

Programming by input-output examples

Location: 1A 08 Level: Intermediate

Sumit Gulwani (Microsoft)

Programming by input-output examples (PBE) is a new frontier in AI, set to revolutionize the programming experience for the masses. It can enable end users, 99% of whom are nonprogrammers, to create small scripts and can make data scientists 10–100x more productive for many data wrangling tasks. Sumit Gulwani leads a deep dive into this new programming paradigm and explores the science behind it.

4:35pm–5:15pm Wednesday, 09/12/2018

VC trends in machine learning and data science

Location: 1A 12/14

Sarah Catanzaro (Amplify Partners), Rama Sekhar (Norwest Venture Partners), Zavain Dar (Lux Capital), Jonathan Lehr (Work-Bench), Crystal Huang (NEA)

In this panel discussion, venture capital investors explain how startups can accelerate enterprise adoption of machine learning and explore the new tech trends that will give rise to the next transformation in the big data landscape.

4:35pm–5:15pm Wednesday, 09/12/2018

When Tiramisu meets online fashion retail

Location: 1A 15/16 Level: Intermediate

Secondary topics: Deep Learning, Media, Marketing, Advertising, Retail and e-commerce

Patty Ryan (Microsoft), CY Yam (Microsoft), Elena Terenzi (Microsoft)

Average rating:

(5.00, 1 rating)

Large online fashion retailers must efficiently maintain catalogues of millions of items. Due to human error, it's not unusual for some items to have duplicate entries. Since manually trawling such a large catalogue is next to impossible, how can you find these entries? Patty Ryan, CY Yam, and Elena Terenzi explain how they applied deep learning for image segmentation and background removal.

4:35pm–5:15pm Wednesday, 09/12/2018

GPU-accelerated analytics and machine learning ecosystems (Inception Showcase sponsored by NVIDIA)

Location: 1E 15

Alen Capalik (FASTDATA.io), Jim McHugh (NVIDIA), SriSatish Ambati (H2O.ai), Tim Delisle (Datalogue)

Explore case studies from Datalogue, FASTDATA.io, and H2O.ai that demonstrate how GPU-accelerated analytics, machine learning, and ETL help companies overcome slow queries and tedious data preparation processes, dynamically correlate data, and benefit from automatic feature engineering.

5:25pm–6:05pm Wednesday, 09/12/2018

Deploying machine learning models in the enterprise

Location: 1E 10/11 Level: Intermediate

Secondary topics: Model lifecycle management

Diego Oppenheimer (Algorithmia)

Average rating:

(4.50, 2 ratings)

After big investments in collecting and cleaning data and building machine learning (ML) models, enterprises face big challenges in deploying models to production and managing a growing portfolio of ML models. Diego Oppenheimer covers the strategic and technical hurdles each company must overcome and the best practices developed while deploying over 4,000 ML models for 70,000 engineers.

5:25pm–6:05pm Wednesday, 09/12/2018

Network effects: Working with modern graph analytic systems

Location: 1A 06/07 Level: Intermediate

Secondary topics: Financial Services

Zachary Hanif (Capital One)

Average rating:

(4.67, 3 ratings)

An understanding of graph-based analytical techniques can be extremely powerful when applied to modern practical problems, and modern frameworks and analytical techniques are making graph analysis methods viable for increasingly large, complex tasks. Zachary Hanif examines three prominent graph analytic methods, including graph convolutional networks, and applies them to concrete use cases.

5:25pm–6:05pm Wednesday, 09/12/2018

From emotion analysis and topic extraction to narrative modeling

Location: 1A 08 Level: Beginner

Secondary topics: Text and Language processing and analysis

Andreea Kremm (Netex Group), Mohammed Ibraaz Syed (UCLA)

Average rating:

(4.00, 2 ratings)

Narrative economics studies the impact of popular narratives and stories on economic fluctuations in the context of human interests and emotions. Andreea Kremm and Mohammed Ibraaz Syed describe the use of emotion analysis, entity relationship extraction, and topic modeling in modeling narratives from written human communication.

5:25pm–6:05pm Wednesday, 09/12/2018

A roadmap for open data science and AI for business: Panel discussion with State Street

Location: 1A 12/14 Level: Non-technical

Bethann Noble (Cloudera), Daniel Huss (State Street), Abhishek Kodi (State Street)

Average rating:

(4.00, 1 rating)

Bethann Noble, Abhishek Kodi, and Daniel Huss share their experience and best practices for designing and executing on a roadmap for open data science and AI for business.

5:25pm–6:05pm Wednesday, 09/12/2018

Accelerating financial data science workflows with GPUs

Location: 1A 15/16 Level: Intermediate

Secondary topics: Financial Services

Joshua Patterson (NVIDIA), Onur Yilmaz (NVIDIA)

GPUs have allowed financial firms to accelerate their computationally demanding workloads. Today, the bottleneck has moved completely to ETL. The GPU Open Analytics Initiative (GoAi) is helping accelerate ETL while keeping the entire workflow on GPUs. Joshua Patterson and Onur Yilmaz discuss several GPU-accelerated data science tools and libraries.

5:25pm–6:05pm Wednesday, 09/12/2018

Accelerate AI with synthetic data using generative adversarial networks (GAN) (sponsored by NVIDIA)

Location: 1E 15

Renee Yao (NVIDIA)

Average rating:

(5.00, 1 rating)

Renee Yao explains how generative adversarial networks (GANs) are successfully used to improve data generation and explores specific real-world examples where customers have deployed GANs to solve challenges in the healthcare, space, transportation, and retail industries.

11:20am–12:00pm Thursday, 09/13/2018

Applying petabyte-scale analytics and machine learning to billions of news reading sessions

Location: 1A 06/07 Level: Intermediate

Secondary topics: Media, Marketing, Advertising, Text and Language processing and analysis

Andrew Montalenti (Parse.ly)

Average rating:

(5.00, 1 rating)

What can we learn from a one-billion-person live poll of the internet? Andrew Montalenti explains how Parse.ly has gathered a unique dataset of news reading sessions from billions of devices, peaking at over two million sessions per minute on thousands of high-traffic news and information websites, and how the company uses this data to unearth the secrets behind online content.

11:20am–12:00pm Thursday, 09/13/2018

Predicting residential occupancy and hot water usage from high-frequency, multivector utilities data

Location: 1A 08 Level: Intermediate

Secondary topics: Temporal data and time-series analytics

Cris Lowery (Baringa Partners), Marc Warner (ASI)

Average rating:

(4.00, 1 rating)

In EU households, heating and hot water alone account for 80% of energy usage. Cristobal Lowery and Marc Warner explain how future home energy management systems could improve their energy efficiency by predicting resident needs through utilities data, with a particular focus on the key data features, the need for data compression, and the data quality challenges.

11:20am–12:00pm Thursday, 09/13/2018

The Vega project: Building an ecosystem of tools for interactive visualization

Location: 1A 12/14 Level: Beginner

Jeffrey Heer (Trifacta | University of Washington)

Average rating:

(4.75, 4 ratings)

Jeffrey Heer offers an overview of Vega and Vega-Lite, high-level declarative languages for interactive visualization that support exploratory data analysis, communication, and the development of new visualization tools.

11:20am–12:00pm Thursday, 09/13/2018

Democratizing deep learning with transfer learning

Location: 1A 15/16 Level: Beginner

Secondary topics: Deep Learning

Lars Hulstaert (Microsoft)

Average rating:

(5.00, 1 rating)

Transfer learning allows data scientists to leverage insights from large labeled datasets. The general idea of transfer learning is to use knowledge learned from tasks for which a lot of labeled data is available in settings where little labeled data is available. Lars Hulstaert explains what transfer learning is and how it can boost your NLP or CV pipelines.

1:10pm–1:50pm Thursday, 09/13/2018

Spark NLP in action: How SelectData uses AI to better understand home health patients

Location: 1A 06/07 Level: Intermediate

Secondary topics: Health and Medicine, Text and Language processing and analysis

David Talby (Pacific AI), Alberto Andreotti (John Snow Labs), Stacy Ashworth (SelectData), Tawny Nichols (SelectData)

Average rating:

(3.00, 4 ratings)

David Talby, Alberto Andreotti, Stacy Ashworth, and Tawny Nichols outline a question-answering system for accurately extracting facts from free-text patient records and share best practices for training domain-specific deep learning NLP models. The solution is based on Spark NLP, an extension of Spark ML that provides state-of-the-art performance and accuracy for natural language understanding.

1:10pm–1:50pm Thursday, 09/13/2018

Scalable machine learning for data cleaning

Location: 1A 08 Level: Non-technical

Secondary topics: Data preparation, governance and privacy

Ihab Ilyas (University of Waterloo)

Average rating:

(5.00, 2 ratings)

Machine learning tools promise to help solve data curation problems. While the principles are well understood, the engineering details in configuring and deploying ML techniques are the biggest hurdle. Ihab Ilyas explains why leveraging data semantics and domain-specific knowledge is key in delivering the optimizations necessary for truly scalable ML curation solutions.

1:10pm–1:50pm Thursday, 09/13/2018

Augmented reality: Going beyond plots in 3D

Location: 1A 12/14 Level: Beginner

Secondary topics: Ethics and Privacy, Financial Services, Media, Marketing, Advertising

Bob Levy (Virtual Cove, Inc.)

Average rating:

(3.00, 1 rating)

Augmented reality opens a completely new lens on your data through which you see and accomplish amazing things. Bob Levy explains how to use simple Python scripts to leverage completely new plot types. You'll explore use cases revealing new insight into financial markets data as well as new ways of interacting with data that build trust in otherwise “black box” machine learning solutions.

1:10pm–1:50pm Thursday, 09/13/2018

A high-performance system for deep learning inference and visual inspection

Location: 1A 15/16 Level: Intermediate

Secondary topics: Data Platforms, Deep Learning

Moty Fania (Intel), Sergei Kom (Intel)

Average rating:

(5.00, 1 rating)

Moty Fania and Sergei Kom share their experience and lessons learned implementing an AI inference platform to enable internal visual inspection use cases. The platform is based on open source technologies and was designed for real-time, streaming, and online actuation.

2:00pm–2:40pm Thursday, 09/13/2018

Let the machines learn to improve data quality

Location: 1A 08 Level: Intermediate

Secondary topics: Data preparation, governance and privacy, Financial Services

Archana Anandakrishnan (American Express)

Average rating:

(3.20, 5 ratings)

Building accurate machine learning models hinges on the quality of the data. Errors and anomalies get in the way of data scientists doing their best work. Archana Anandakrishnan explains how American Express created an automated, scalable system for measurement and management of data quality. The methods are modular and adaptable to any domain where accurate decisions from ML models are critical.

2:00pm–2:40pm Thursday, 09/13/2018

Stories beat statistics: How to master the art and science of data storytelling

Location: 1A 12/14 Level: Non-technical

Brent Dykes (Domo)

Average rating:

(4.78, 9 ratings)

Companies collect all kinds of data and use advanced tools and techniques to find insights, but they often fail in the last mile: communicating insights effectively to drive change. Brent Dykes discusses the power that stories wield over statistics and explores the art and science of data storytelling, an essential skill in today’s data economy.

2:00pm–2:40pm Thursday, 09/13/2018

Job recommendations leveraging deep learning using Analytics Zoo on Apache Spark and BigDL

Location: 1A 15/16 Level: Intermediate

Secondary topics: Deep Learning, Media, Marketing, Advertising

Guoqiong Song (Intel), Wenjing Zhan (Talroo), Jacob Eisinger (Talroo)

Can the talent industry make job search and matching more relevant and personalized for candidates by leveraging deep learning techniques? Guoqiong Song, Wenjing Zhan, and Jacob Eisinger demonstrate how to leverage distributed deep learning framework BigDL on Apache Spark to predict a candidate’s probability of applying to specific jobs based on their résumé.

3:30pm–4:10pm Thursday, 09/13/2018

Modeling time series in R

Location: 1A 06/07 Level: Beginner

Secondary topics: Temporal data and time-series analytics

Jared Lander (Lander Analytics)

Average rating:

(5.00, 3 ratings)

Temporal data is being produced in ever-greater quantity, but fortunately our time series capabilities are keeping pace. Jared Lander explores techniques for modeling time series, from traditional methods such as ARMA to more modern tools such as Prophet and machine learning models like XGBoost and neural nets. Along the way, Jared shares theory and code for training these models.

3:30pm–4:10pm Thursday, 09/13/2018

Data visualization in mixed reality with Python

Location: 1A 12/14 Level: Beginner

Anna Nicanorova (Annalect)

Average rating:

(3.00, 3 ratings)

Data visualization is supposed to be our map to information. However, contemporary charting techniques have a few shortcomings, including context reduction, difficulty grasping numbers, and perceptual dehumanization. Anna Nicanorova explains how augmented reality can address these issues by providing an intuitive, interactive environment for data exploration.

3:30pm–4:10pm Thursday, 09/13/2018

Classifying job execution using deep learning

Location: 1A 15/16 Level: Advanced

Secondary topics: Deep Learning

Ash Munshi (Pepperdata)

Ash Munshi outlines a technique for labeling applications using runtime measurements of CPU, memory, and network I/O along with a deep neural network. This labeling groups the applications into buckets that have understandable characteristics, which can then be used to reason about the cluster and its performance.

4:20pm–5:00pm Thursday, 09/13/2018

Analytics maturity: Industry trends and financial impacts

Location: 1A 06/07 Level: Non-technical

Secondary topics: Machine Learning in the enterprise

Bill Franks (International Institute for Analytics)

Drawing on a recent study of the analytics maturity level of large enterprises by the International Institute for Analytics, Bill Franks discusses how maturity varies by industry, shares key steps organizations can take to move up the maturity scale, and explains how the research correlates analytics maturity with a wide range of success metrics, including financial and reputational measures.

4:20pm–5:00pm Thursday, 09/13/2018

Infrastructure for deploying machine learning to production in large financial institutions: Lessons learned and best practices

Location: 1A 08 Level: Intermediate

Secondary topics: Financial Services, Model lifecycle management

Harish Doddi (Datatron), Jerry Xu (Datatron Technologies)

Large financial institutions have many data science teams (e.g., those for fraud, credit risk, and marketing), each often using a diverse set of tools to build predictive models. Productionizing these predictive AI models involves many challenges. Harish Doddi and Jerry Xu share the challenges they faced and the lessons they learned deploying AI models to production in large financial institutions.

4:20pm–5:00pm Thursday, 09/13/2018

UX strategies for underperforming analytics services and data products

Location: 1A 12/14 Level: Non-technical

Secondary topics: Machine Learning in the enterprise

Brian O'Neill (Designing for Analytics)

Average rating:

(5.00, 5 ratings)

Gartner says 85% or more of big data projects will fail, even though your company may have invested millions in engineering implementation. Why are customers and employees not engaging with these products and services? Brian O'Neill explains why a "people first, technology second" mission (in other words, a design strategy) enables the best possible UX and business outcomes.

4:20pm–5:00pm Thursday, 09/13/2018

Deep learning on audio in Azure to detect sounds in real time

Location: 1A 15/16 Level: Beginner

Secondary topics: Deep Learning

Swetha Machanavajhala (Microsoft), Xiaoyong Zhu (Microsoft)

Average rating:

(5.00, 3 ratings)

In this auditory world, the human brain effortlessly processes and reacts to a variety of sounds. While many of us take this for granted, there are over 360 million people in the world who are deaf or hard of hearing. Swetha Machanavajhala and Xiaoyong Zhu explain how applying deep learning to audio in Azure can make the auditory world more inclusive and meet the strong demand for such capabilities in other sectors.
