Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Data Science & Machine Learning

If you're in data, you need to understand machine learning

Machine learning lets you discover hidden insight from your data. It's a simple idea with phenomenal impact and sophisticated use cases like recommenders, text mining, real-time analytics, large-scale anomaly detection, and business forecasting.

At Strata, you’ll get a deeper and broader understanding of machine and deep learning—take a look at the sessions below.

Tuesday Sep 11: Tutorials (Gold & Silver passes)
Wednesday Sep 12: Keynotes & Sessions (Platinum, Gold, Silver & Bronze passes)
9:00am | Location: 3E
Strata Data Conference Keynotes
10:50am
Morning break
Thursday Sep 13: Keynotes & Sessions (Platinum, Gold, Silver & Bronze passes)
9:00am | Location: 3E
Strata Data Conference Keynotes
10:50am
Morning break
9:00am–12:30pm Tuesday, 09/11/2018
Location: 1A 10 Level: Intermediate
David Arpin (Amazon Web Services)
Average rating: **...
(2.80, 10 ratings)
David Arpin walks you through building a machine learning application, from data manipulation to algorithm training to deployment to a real-time prediction endpoint, using Spark and Amazon SageMaker. Read more.
9:00am–12:30pm Tuesday, 09/11/2018
Location: 1A 21/22 Level: Intermediate
Secondary topics:  Deep Learning, Text and Language processing and analysis
Garrett Hoffman (StockTwits)
Average rating: ****.
(4.75, 4 ratings)
Garrett Hoffman walks you through deep learning methods for natural language processing and natural language understanding tasks, using a live example in Python and TensorFlow with StockTwits data. Methods include word2vec, recurrent neural networks and variants (LSTM, GRU), and convolutional neural networks. Read more.
9:00am–12:30pm Tuesday, 09/11/2018
Location: 1E 06 Level: Intermediate
Secondary topics:  Model lifecycle management
Dan Crankshaw (UC Berkeley RISELab)
Average rating: *****
(5.00, 1 rating)
Dan Crankshaw offers an overview of the current challenges in deploying machine learning applications into production and the current state of prediction serving infrastructure. He then leads a deep dive into the Clipper serving system and shows you how to get started. Read more.
9:00am–12:30pm Tuesday, 09/11/2018
Location: 1E 14 Level: Beginner
Viviana Acquaviva (CUNY New York City College of Technology)
Average rating: ****.
(4.75, 4 ratings)
Using interesting, diverse publicly available datasets and actual problems in astronomy research, Viviana Acquaviva leads an intermediate tutorial on machine learning. You'll learn how to customize algorithms and evaluation metrics required by scientific applications and discover best practices for choosing, developing, and evaluating machine learning algorithms in "real-world" datasets. Read more.
9:00am–12:30pm Tuesday, 09/11/2018
Location: 1E 15/16 Level: Intermediate
Secondary topics:  Deep Learning, Recommendation Systems
Vijay Agneeswaran (Publicis Sapient), Abhishek Kumar (Publicis Sapient)
Average rating: ****.
(4.40, 5 ratings)
Abhishek Kumar and Vijay Srinivas Agneeswaran offer an introduction to deep learning-based recommendation and learning-to-rank systems using TensorFlow. You'll learn how to build a deep learning recommender system based on intent prediction, drawing on a real-world implementation for an ecommerce client. Read more.
1:30pm–5:00pm Tuesday, 09/11/2018
Location: 1A 10 Level: Intermediate
Jeroen Janssens (Data Science Workshops B.V.)
Average rating: ***..
(3.00, 3 ratings)
The Unix command line remains an amazing environment for efficiently performing tedious but essential data science tasks. By combining small, powerful command-line tools, you can quickly scrub, explore, and model your data as well as hack together prototypes. Join Jeroen Janssens for a hands-on workshop based on his book Data Science at the Command Line. Read more.
1:30pm–5:00pm Tuesday, 09/11/2018
Location: 1A 12/14 Level: Intermediate
Secondary topics:  Deep Learning, Temporal data and time-series analytics
Bruno Gonçalves (Data For Science, Inc)
Average rating: ***..
(3.14, 7 ratings)
Time series are everywhere around us. Understanding them requires taking into account the sequence of values seen in previous steps and even long-term temporal correlations. Join Bruno Gonçalves to learn how to use recurrent neural networks to model and forecast time series and discover the advantages and disadvantages of recurrent neural networks with respect to more traditional approaches. Read more.
1:30pm–5:00pm Tuesday, 09/11/2018
Location: 1A 21/22 Level: Intermediate
Secondary topics:  Text and Language processing and analysis
David Talby (Pacific AI), Claudiu Branzan (Accenture), Alex Thomas (John Snow Labs)
Average rating: ***..
(3.00, 7 ratings)
David Talby, Claudiu Branzan, and Alex Thomas lead a hands-on tutorial for scalable NLP using the highly performant, highly scalable open source Spark NLP library. You’ll spend about half your time coding as you work through four sections, each with an end-to-end working codebase that you can change and improve. Read more.
1:30pm–5:00pm Tuesday, 09/11/2018
Location: 1E 07/08 Level: Intermediate
Secondary topics:  Deep Learning
Vartika Singh (Cloudera), Alan Silva (Cloudera), Alex Bleakley (Cloudera), Steven Totman (Cloudera), Mirko Kämpf (Cloudera), Syed Nasar (Cloudera)
Average rating: *....
(1.00, 1 rating)
Vartika Singh, Alan Silva, Alex Bleakley, Steven Totman, Mirko Kämpf, and Syed Nasar outline approaches for preprocessing, training, inference, and deployment across datasets (time series, audio, video, text, etc.) that leverage Spark, its extended ecosystem of libraries, and deep learning frameworks. Read more.
1:30pm–5:00pm Tuesday, 09/11/2018
Location: 1E 11 Level: Intermediate
Secondary topics:  Ethics and Privacy
Aileen Nielsen (Skillman Consulting)
Average rating: ****.
(4.00, 4 ratings)
There is mounting evidence that the widespread deployment of machine learning and artificial intelligence in business and government applications is reproducing or even amplifying existing prejudices and social inequalities. Aileen Nielsen demonstrates how to identify and avoid bias and other unfairness in your analyses. Read more.
11:20am–12:00pm Wednesday, 09/12/2018
Location: 1A 06/07 Level: Beginner
Secondary topics:  Deep Learning, Recommendation Systems
Shioulin Sam (Cloudera Fast Forward Labs)
Average rating: ***..
(3.25, 4 ratings)
Recent advances in deep learning allow us to use the semantic content of items in recommendation systems, addressing a weakness of traditional methods. Shioulin Sam explores the limitations of classical approaches and explains how using the content of items can help solve common recommendation pitfalls, such as the cold start problem, and open up new product possibilities. Read more.
11:20am–12:00pm Wednesday, 09/12/2018
Location: 1A 08 Level: Advanced
Secondary topics:  Media, Marketing, Advertising
Daniel Kang (Stanford University)
Average rating: ****.
(4.00, 2 ratings)
Daniel Kang offers an overview of the exploratory video analytics engine BlazeIt, which provides FrameQL, a declarative SQL-like language for querying video, and a query optimizer for executing these queries. You'll see how FrameQL can capture a large set of real-world queries ranging from aggregation to scrubbing and how BlazeIt can execute certain queries up to 2,000x faster than a naive approach. Read more.
11:20am–12:00pm Wednesday, 09/12/2018
Location: 1A 12/14 Level: Beginner
Secondary topics:  Health and Medicine
Olga Cuznetova (Optum), Manna Chang (Optum)
Average rating: ***..
(3.33, 3 ratings)
Olga Cuznetova and Manna Chang demonstrate supervised and unsupervised learning methods to work with claims data and explain how the methods complement each other. The supervised method looks at chronic kidney disease (CKD) patients at risk of developing end-stage renal disease (ESRD), while the unsupervised approach looks at the classification of patients that tend to develop this disease faster than others. Read more.
11:20am–12:00pm Wednesday, 09/12/2018
Location: 1A 15/16 Level: Intermediate
Secondary topics:  Deep Learning, Retail and e-commerce, Temporal data and time-series analytics
Mikio Braun (Zalando SE)
Average rating: ****.
(4.86, 7 ratings)
Time series data has many applications in industry, from analyzing server metrics to monitoring IoT signals and outlier detection. Mikio Braun offers an overview of time series analysis with a focus on modern machine learning approaches and practical considerations, including recommendations for what works and what doesn’t, and industry use cases. Read more.
11:20am–12:00pm Wednesday, 09/12/2018
Location: 1 E15
Ward Eldred (NVIDIA)
Average rating: *****
(5.00, 2 ratings)
Ward Eldred offers an overview of the types of analytical problems that can be solved using deep learning and shares a set of heuristics that can be used to evaluate the feasibility of analytical AI projects. Read more.
1:15pm–1:55pm Wednesday, 09/12/2018
Location: 1A 06/07 Level: Intermediate
Secondary topics:  Media, Marketing, Advertising, Recommendation Systems, Text and Language processing and analysis
James Dreiss (Reuters)
Average rating: ***..
(3.67, 3 ratings)
James Dreiss discusses the challenges in building a content recommendation system for one of the largest news sites in the world, Reuters.com. The particularities of the system include developing a scrolling newsfeed and the use of document vectors for semantic representation of content. Read more.
1:15pm–1:55pm Wednesday, 09/12/2018
Location: 1A 08 Level: Beginner
Secondary topics:  Model lifecycle management
William Benton (Red Hat)
Average rating: *****
(5.00, 2 ratings)
Containers are a hot technology for application developers, but they also provide key benefits for data scientists. William Benton details the advantages of containers for data scientists and AI developers, focusing on high-level tools that will enable you to become more productive and collaborate more effectively. Read more.
1:15pm–1:55pm Wednesday, 09/12/2018
Location: 1A 12/14 Level: Intermediate
Secondary topics:  Media, Marketing, Advertising, Temporal data and time-series analytics
Arun Kejariwal (Independent), Francois Orsini (MZ)
Average rating: ****.
(4.00, 1 rating)
The rate of growth of data volume and velocity has been accelerating along with increases in the variety of data sources. This poses a significant challenge to extracting actionable insights in a timely fashion. Arun Kejariwal and Francois Orsini explain how marrying correlation analysis with anomaly detection can help and share techniques to guide effective decision making. Read more.
1:15pm–1:55pm Wednesday, 09/12/2018
Location: 1A 15/16 Level: Intermediate
Secondary topics:  Deep Learning, Media, Marketing, Advertising, Recommendation Systems, Retail and e-commerce
Longqi Yang (Cornell Tech, Cornell University)
State-of-the-art recommendation algorithms are increasingly complex and no longer one size fits all. Current monolithic development practice poses significant challenges to rapid, iterative, and systematic experimentation. Longqi Yang explains how to use OpenRec to easily customize state-of-the-art solutions for diverse scenarios. Read more.
1:15pm–1:55pm Wednesday, 09/12/2018
Location: 1 E15
Darrin Johnson (NVIDIA)
While every enterprise is on a mission to infuse its business with deep learning, few know how to build the infrastructure to get them there. Darrin Johnson shares insights and best practices learned from NVIDIA's deep learning deployments around the globe that you can leverage to shorten deployment timeframes, improve developer productivity, and streamline operations. Read more.
2:05pm–2:45pm Wednesday, 09/12/2018
Location: 1A 06/07 Level: Intermediate
Secondary topics:  Media, Marketing, Advertising, Recommendation Systems
Ahsan Ashraf (Pinterest)
Online recommender systems often rely heavily on user engagement features. This can cause a bias toward exploitation over exploration, overoptimizing on users' interests. Content diversification is important for user satisfaction, but measuring and evaluating impact is challenging. Ahsan Ashraf outlines techniques used at Pinterest that drove ~2–3% impression gains and a ~1% time-spent gain. Read more.
2:05pm–2:45pm Wednesday, 09/12/2018
Location: 1A 08 Level: Beginner
Secondary topics:  Data Platforms, Model lifecycle management, Retail and e-commerce
Atul Kale (Airbnb), Xiaohan Zeng (Airbnb)
Average rating: *****
(5.00, 3 ratings)
Atul Kale and Xiaohan Zeng offer an overview of Bighead, Airbnb's user-friendly and scalable end-to-end machine learning framework that powers Airbnb's data-driven products. Built on Python, Spark, and Kubernetes, Bighead integrates popular libraries like TensorFlow, XGBoost, and PyTorch and is designed to be used in modular pieces. Read more.
2:05pm–2:45pm Wednesday, 09/12/2018
Location: 1A 12/14 Level: Intermediate
Secondary topics:  Retail and e-commerce, Temporal data and time-series analytics
Roger Barga (Amazon Web Services), Sudipto Guha (Amazon Web Services), Kapil Chhabra (Amazon Web Services)
Average rating: *****
(5.00, 3 ratings)
Roger Barga, Sudipto Guha, and Kapil Chhabra explain how unsupervised learning with the robust random cut forest (RRCF) algorithm enables insights into streaming data and share new applications to impute missing values, forecast future values, detect hotspots, and perform classification tasks. They also demonstrate how to implement unsupervised learning over massive data streams. Read more.
2:05pm–2:45pm Wednesday, 09/12/2018
Location: 1A 15/16 Level: Intermediate
Secondary topics:  Deep Learning, Recommendation Systems, Temporal data and time-series analytics, Transportation and Logistics
Ankit Jain (Uber)
Average rating: ***..
(3.00, 3 ratings)
Personalization is a common theme in social networks and ecommerce businesses. Personalization at Uber involves an understanding of how each driver and rider is expected to behave on the platform. Ankit Jain explains how Uber employs deep learning using LSTMs and its huge database to understand and predict the behavior of each and every user on the platform. Read more.
2:05pm–2:45pm Wednesday, 09/12/2018
Location: 1 E15
Michael Balint (NVIDIA)
Michael Balint explains how NVIDIA employs its own distribution of Kubernetes, in conjunction with DGX hardware, to make the most efficient use of GPU resources and scale its efforts across a cluster, allowing multiple users to run experiments and push their finished work to production. Read more.
2:55pm–3:35pm Wednesday, 09/12/2018
Location: 1A 06/07 Level: Intermediate
Secondary topics:  Ethics and Privacy, Media, Marketing, Advertising, Recommendation Systems
Bonnie Barrilleaux (LinkedIn)
Average rating: ****.
(4.50, 4 ratings)
As LinkedIn encouraged members to join conversations, it found itself in danger of creating a "rich get richer" economy in which a few creators got an increasing share of all feedback. Bonnie Barrilleaux explains why you must regularly reevaluate metrics to avoid perverse incentives—situations where efforts to increase the metric cause unintended negative side effects. Read more.
2:55pm–3:35pm Wednesday, 09/12/2018
Location: 1A 08 Level: Beginner
Secondary topics:  Ethics and Privacy
Chang Liu (Georgian Partners)
Average rating: *****
(5.00, 1 rating)
Chang Liu offers an overview of a common problem faced by many software companies, the cold-start problem, and explains how Georgian Partners has been successful at solving this problem by transferring knowledge from existing data through differentially private data aggregation. Read more.
2:55pm–3:35pm Wednesday, 09/12/2018
Location: 1A 12/14 Level: Beginner
Jeroen Janssens (Data Science Workshops B.V.)
Average rating: *....
(1.50, 2 ratings)
"Anyone who does not have the command line at their beck and call is really missing something," tweeted Tim O'Reilly when Jeroen Janssens's Data Science at the Command Line was recently made available online for free. Join Jeroen to learn what you're missing out on if you're not applying the command line and many of its power tools to typical data science problems. Read more.
2:55pm–3:35pm Wednesday, 09/12/2018
Location: 1A 15/16 Level: Intermediate
Secondary topics:  Deep Learning, Temporal data and time-series analytics
Alex Heye (Cray), Ding Ding (Intel)
Precipitation nowcasting is used to predict the future rainfall intensity over a relatively short timeframe. The forecasting resolution and time accuracy required are much higher than for other traditional forecasting tasks. Alexander Heye and Ding Ding explain how to build a precipitation nowcasting system with recurrent neural networks using BigDL on Apache Spark. Read more.
4:35pm–5:15pm Wednesday, 09/12/2018
Location: 1A 06/07 Level: Beginner
Secondary topics:  Financial Services, Text and Language processing and analysis
Masha Westerlund (Investopedia)
Average rating: *****
(5.00, 2 ratings)
Businesses rely on user data to power their sites, products, and sales. Can we give back by sharing those insights with users? Masha Westerlund explains how Investopedia harnessed reader data to build an index that tracks market anxiety and moves with the VIX, a proprietary measure of market volatility. You'll see how thinking outside the box helps turn data into tools for users, not stakeholders. Read more.
4:35pm–5:15pm Wednesday, 09/12/2018
Location: 1A 08 Level: Intermediate
Sumit Gulwani (Microsoft)
Programming by input-output examples (PBE) is a new frontier in AI, set to revolutionize the programming experience for the masses. It can enable end users—99% of whom are nonprogrammers—to create small scripts and make data scientists 10–100x more productive for many data wrangling tasks. Sumit Gulwani leads a deep dive into this new programming paradigm and explores the science behind it. Read more.
4:35pm–5:15pm Wednesday, 09/12/2018
Location: 1A 12/14
Sarah Catanzaro (Amplify Partners), Rama Sekhar (Norwest Venture Partners), Zavain Dar (Lux Capital), Jonathan Lehr (Work-Bench), Crystal Huang (NEA)
In this panel discussion, venture capital investors explain how startups can accelerate enterprise adoption of machine learning and explore the new tech trends that will give rise to the next transformation in the big data landscape. Read more.
4:35pm–5:15pm Wednesday, 09/12/2018
Location: 1A 15/16 Level: Intermediate
Secondary topics:  Deep Learning, Media, Marketing, Advertising, Retail and e-commerce
Patty Ryan (Microsoft), CY Yam (Microsoft), Elena Terenzi (Microsoft)
Average rating: *****
(5.00, 1 rating)
Large online fashion retailers must efficiently maintain catalogues of millions of items. Due to human error, it's not unusual that some items have duplicate entries. Since manually trawling such a large catalogue is next to impossible, how can you find these entries? Patty Ryan, CY Yam, and Elena Terenzi explain how they applied deep learning for image segmentation and background removal. Read more.
4:35pm–5:15pm Wednesday, 09/12/2018
Location: 1 E15
Alen Capalik (FASTDATA.io), Jim McHugh (NVIDIA), SriSatish Ambati (H2O.ai), Tim Delisle (Datalogue)
Explore case studies from Datalogue, FASTDATA.io, and H2O.ai that demonstrate how GPU-accelerated analytics, machine learning, and ETL help companies overcome slow queries and tedious data preparation processes, dynamically correlate data, and enjoy automatic feature engineering. Read more.
5:25pm–6:05pm Wednesday, 09/12/2018
Location: 1E 10/11 Level: Intermediate
Secondary topics:  Model lifecycle management
Diego Oppenheimer (Algorithmia)
Average rating: ****.
(4.50, 2 ratings)
After big investments in collecting and cleaning data and building machine learning (ML) models, enterprises face big challenges in deploying models to production and managing a growing portfolio of ML models. Diego Oppenheimer covers the strategic and technical hurdles each company must overcome and the best practices developed while deploying over 4,000 ML models for 70,000 engineers. Read more.
5:25pm–6:05pm Wednesday, 09/12/2018
Location: 1A 06/07 Level: Intermediate
Secondary topics:  Financial Services
Zachary Hanif (Capital One)
Average rating: ****.
(4.67, 3 ratings)
An understanding of graph-based analytical techniques can be extremely powerful when applied to modern practical problems, and modern frameworks and analytical techniques are making graph analysis methods viable for increasingly large, complex tasks. Zachary Hanif examines three prominent graph analytic methods, including graph convolutional networks, and applies them to concrete use cases. Read more.
5:25pm–6:05pm Wednesday, 09/12/2018
Location: 1A 08 Level: Beginner
Secondary topics:  Text and Language processing and analysis
Andreea Kremm (Netex Group), Mohammed Ibraaz Syed (UCLA)
Average rating: ****.
(4.00, 2 ratings)
Narrative economics studies the impact of popular narratives and stories on economic fluctuations in the context of human interests and emotions. Andreea Kremm and Mohammed Ibraaz Syed describe the use of emotion analysis, entity relationship extraction, and topic modeling in modeling narratives from written human communication. Read more.
5:25pm–6:05pm Wednesday, 09/12/2018
Location: 1A 12/14 Level: Non-technical
Bethann Noble (Cloudera), Daniel Huss (State Street), Abhishek Kodi (State Street)
Average rating: ****.
(4.00, 1 rating)
Bethann Noble, Abhishek Kodi, and Daniel Huss share their experience and best practices for designing and executing on a roadmap for open data science and AI for business. Read more.
5:25pm–6:05pm Wednesday, 09/12/2018
Location: 1A 15/16 Level: Intermediate
Secondary topics:  Financial Services
Joshua Patterson (NVIDIA), Onur Yilmaz (NVIDIA)
GPUs have allowed financial firms to accelerate their computationally demanding workloads. Today, the bottleneck has moved completely to ETL. The GPU Open Analytics Initiative (GoAi) is helping accelerate ETL while keeping the entire workflow on GPUs. Joshua Patterson and Onur Yilmaz discuss several GPU-accelerated data science tools and libraries. Read more.
5:25pm–6:05pm Wednesday, 09/12/2018
Location: 1 E15
Renee Yao (NVIDIA)
Average rating: *****
(5.00, 1 rating)
Renee Yao explains how generative adversarial networks (GANs) are successfully used to improve data generation and explores specific real-world examples where customers have deployed GANs to solve challenges in the healthcare, space, transportation, and retail industries. Read more.
11:20am–12:00pm Thursday, 09/13/2018
Location: 1A 06/07 Level: Intermediate
Secondary topics:  Media, Marketing, Advertising, Text and Language processing and analysis
Andrew Montalenti (Parse.ly)
Average rating: *****
(5.00, 1 rating)
What can we learn from a one-billion-person live poll of the internet? Andrew Montalenti explains how Parse.ly has gathered a unique dataset of news reading sessions of billions of devices, peaking at over two million sessions per minute on thousands of high-traffic news and information websites, and how the company uses this data to unearth the secrets behind online content. Read more.
11:20am–12:00pm Thursday, 09/13/2018
Location: 1A 08 Level: Intermediate
Secondary topics:  Temporal data and time-series analytics
Cris Lowery (Baringa), Marc Warner (ASI)
Average rating: ****.
(4.00, 1 rating)
In EU households, heating and hot water alone account for 80% of energy usage. Cristobal Lowery and Marc Warner explain how future home energy management systems could improve their energy efficiency by predicting resident needs through utilities data, with a particular focus on the key data features, the need for data compression, and the data quality challenges. Read more.
11:20am–12:00pm Thursday, 09/13/2018
Location: 1A 12/14 Level: Beginner
Jeffrey Heer (Trifacta | University of Washington)
Average rating: ****.
(4.75, 4 ratings)
Jeffrey Heer offers an overview of Vega and Vega-Lite—high-level declarative languages for interactive visualization that support exploratory data analysis, communication, and the development of new visualization tools. Read more.
11:20am–12:00pm Thursday, 09/13/2018
Location: 1A 15/16 Level: Beginner
Secondary topics:  Deep Learning
Lars Hulstaert (Microsoft)
Average rating: *****
(5.00, 1 rating)
Transfer learning allows data scientists to leverage insights from large labeled datasets. The general idea of transfer learning is to use knowledge learned from tasks for which a lot of labeled data is available in settings where little labeled data is available. Lars Hulstaert explains what transfer learning is and how it can boost your NLP or CV pipelines. Read more.
1:10pm–1:50pm Thursday, 09/13/2018
Location: 1A 06/07 Level: Intermediate
Secondary topics:  Health and Medicine, Text and Language processing and analysis
David Talby (Pacific AI), Alberto Andreotti (John Snow Labs), Stacy Ashworth (SelectData), Tawny Nichols (SelectData)
Average rating: ***..
(3.00, 4 ratings)
David Talby, Alberto Andreotti, Stacy Ashworth, and Tawny Nichols outline a question-answering system for accurately extracting facts from free-text patient records and share best practices for training domain-specific deep learning NLP models. The solution is based on Spark NLP, an extension of Spark ML that provides state-of-the-art performance and accuracy for natural language understanding. Read more.
1:10pm–1:50pm Thursday, 09/13/2018
Location: 1A 08 Level: Non-technical
Secondary topics:  Data preparation, governance and privacy
Ihab Ilyas (Tamr | University of Waterloo)
Average rating: *****
(5.00, 2 ratings)
Machine learning tools promise to help solve data curation problems. While the principles are well understood, the engineering details in configuring and deploying ML techniques are the biggest hurdle. Ihab Ilyas explains why leveraging data semantics and domain-specific knowledge is key in delivering the optimizations necessary for truly scalable ML curation solutions. Read more.
1:10pm–1:50pm Thursday, 09/13/2018
Location: 1A 12/14 Level: Beginner
Secondary topics:  Ethics and Privacy, Financial Services, Media, Marketing, Advertising
Bob Levy (Virtual Cove, Inc.)
Average rating: ***..
(3.00, 1 rating)
Augmented reality opens a completely new lens on your data through which you see and accomplish amazing things. Bob Levy explains how to use simple Python scripts to leverage completely new plot types. You'll explore use cases revealing new insight into financial markets data as well as new ways of interacting with data that build trust in otherwise “black box” machine learning solutions. Read more.
1:10pm–1:50pm Thursday, 09/13/2018
Location: 1A 15/16 Level: Intermediate
Secondary topics:  Data Platforms, Deep Learning
Moty Fania (Intel), Sergei Kom (Intel)
Average rating: *****
(5.00, 1 rating)
Moty Fania and Sergei Kom share their experience and lessons learned implementing an AI inference platform to enable internal visual inspection use cases. The platform is based on open source technologies and was designed for real-time, streaming, and online actuation. Read more.
2:00pm–2:40pm Thursday, 09/13/2018
Location: 1A 08 Level: Intermediate
Secondary topics:  Data preparation, governance and privacy, Financial Services
Archana Anandakrishnan (American Express)
Average rating: ***..
(3.20, 5 ratings)
Building accurate machine learning models hinges on the quality of the data. Errors and anomalies get in the way of data scientists doing their best work. Archana Anandakrishnan explains how American Express created an automated, scalable system for measurement and management of data quality. The methods are modular and adaptable to any domain where accurate decisions from ML models are critical. Read more.
2:00pm–2:40pm Thursday, 09/13/2018
Location: 1A 12/14 Level: Non-technical
Brent Dykes (Domo)
Average rating: ****.
(4.78, 9 ratings)
Companies collect all kinds of data and use advanced tools and techniques to find insights, but they often fail in the last mile: communicating insights effectively to drive change. Brent Dykes discusses the power that stories wield over statistics and explores the art and science of data storytelling—an essential skill in today’s data economy. Read more.
2:00pm–2:40pm Thursday, 09/13/2018
Location: 1A 15/16 Level: Intermediate
Secondary topics:  Deep Learning, Media, Marketing, Advertising
Guoqiong Song (Intel), Wenjing Zhan (Talroo), Jacob Eisinger (Talroo)
Can the talent industry make the job search/match more relevant and personalized for a candidate by leveraging deep learning techniques? Guoqiong Song, Wenjing Zhan, and Jacob Eisinger demonstrate how to leverage distributed deep learning framework BigDL on Apache Spark to predict a candidate’s probability of applying to specific jobs based on their résumé. Read more.
3:30pm–4:10pm Thursday, 09/13/2018
Location: 1A 06/07 Level: Beginner
Secondary topics:  Temporal data and time-series analytics
Jared Lander (Lander Analytics)
Average rating: *****
(5.00, 3 ratings)
Temporal data is being produced in ever-greater quantity, but fortunately our time series capabilities are keeping pace. Jared Lander explores techniques for modeling time series, from traditional methods such as ARMA to more modern tools such as Prophet and machine learning models like XGBoost and neural nets. Along the way, Jared shares theory and code for training these models. Read more.
3:30pm–4:10pm Thursday, 09/13/2018
Location: 1A 12/14 Level: Beginner
Anna Nicanorova (Annalect)
Average rating: ***..
(3.00, 3 ratings)
Data visualization is supposed to be our map to information. However, contemporary charting techniques have a few shortcomings, including context reduction, hard numeric grasp, and perceptual dehumanization. Anna Nicanorova explains how augmented reality can solve these issues by presenting an intuitive and interactive environment for data exploration. Read more.
3:30pm–4:10pm Thursday, 09/13/2018
Location: 1A 15/16 Level: Advanced
Secondary topics:  Deep Learning
Ash Munshi (Pepperdata)
Ash Munshi outlines a technique for labeling applications using runtime measurements of CPU, memory, and network I/O along with a deep neural network. This labeling groups the applications into buckets that have understandable characteristics, which can then be used to reason about the cluster and its performance. Read more.
4:20pm–5:00pm Thursday, 09/13/2018
Location: 1A 06/07 Level: Non-technical
Secondary topics:  Machine Learning in the enterprise
Bill Franks (International Institute For Analytics)
Drawing on a recent study of the analytics maturity level of large enterprises by the International Institute for Analytics, Bill Franks discusses how maturity varies by industry, shares key steps organizations can take to move up the maturity scale, and explains how the research correlates analytics maturity with a wide range of success metrics, including financial and reputational measures. Read more.
4:20pm–5:00pm Thursday, 09/13/2018
Location: 1A 08 Level: Intermediate
Secondary topics:  Financial Services, Model lifecycle management
Harish Doddi (Datatron Technologies), Jerry Xu (Datatron Technologies)
Large financial institutions have many data science teams (e.g., those for fraud, credit risk, and marketing), each often using a diverse set of tools to build predictive models. There are many challenges involved in productionizing these predictive AI models. Harish Doddi and Jerry Xu share challenges and lessons learned deploying AI models to production in large financial institutions. Read more.
4:20pm–5:00pm Thursday, 09/13/2018
Location: 1A 12/14 Level: Non-technical
Secondary topics:  Machine Learning in the enterprise
Brian O'Neill (Designing for Analytics)
Average rating: *****
(5.00, 5 ratings)
Gartner says 85%+ of big data projects will fail, despite the fact that your company may have invested millions in engineering implementation. Why are customers and employees not engaging with these products and services? Brian O'Neill explains why a "people first, technology second" mission (a design strategy, in other words) enables the best UX and business outcomes possible. Read more.
4:20pm–5:00pm Thursday, 09/13/2018
Location: 1A 15/16 Level: Beginner
Secondary topics:  Deep Learning
Swetha Machanavajhala (Microsoft), Xiaoyong Zhu (Microsoft)
Average rating: *****
(5.00, 3 ratings)
In this auditory world, the human brain processes and reacts effortlessly to a variety of sounds. While many of us take this for granted, there are over 360 million people in the world who are deaf or hard of hearing. Swetha Machanavajhala and Xiaoyong Zhu explain how to make the auditory world inclusive and meet the great demand in other sectors by applying deep learning to audio in Azure. Read more.