Sep 23–26, 2019

Data Science, Machine Learning, & AI

Machine learning lets you discover hidden insights in your data. It's a simple idea with phenomenal impact and sophisticated use cases like recommenders, text mining, real-time analytics, large-scale anomaly detection, and business forecasting.

At Strata, you’ll get a deeper and broader understanding of machine and deep learning—take a look at the sessions below.

Featured Speakers

Monday-Tuesday, September 23-24: 2-Day Training (Platinum & Training passes)
Tuesday, September 24: Tutorials (Gold & Silver passes)
Wednesday, September 25: Keynotes & Sessions (Platinum, Gold, Silver & Bronze passes)
8:45am | Location: 3E
Strata Data Conference Keynotes
10:50am
Morning break
Thursday, September 26: Keynotes & Sessions (Platinum, Gold, Silver & Bronze passes)
8:45am | Location: 3E
Strata Data Conference Keynotes
10:50am
Morning break
9:00am - 5:00pm Monday, September 23 & Tuesday, September 24
Location: 1A 03
Bargava Subramanian (Binaize Labs), Amit Kapoor (narrativeVIZ)
Recommendation systems play a significant role: for users, they open up a new world of options; for companies, they drive engagement and satisfaction. Amit Kapoor and Bargava Subramanian walk you through the different paradigms of recommendation systems and introduce you to deep learning-based approaches. You'll gain practical, hands-on knowledge to build, select, deploy, and maintain a recommendation system.
9:00am - 5:00pm Monday, September 23 & Tuesday, September 24
Location: 1A 15/16
Michael Cullan (The Data Incubator)
Michael Cullan walks you through developing a machine learning pipeline from prototyping to production. You'll learn about data cleaning, feature engineering, model building and evaluation, and deployment, and then extend these models into two applications built on real-world datasets. All work will be done in Python.
9:00am - 5:00pm Monday, September 23 & Tuesday, September 24
Location: 1A 18
Ian Cook (Cloudera)
Advancing your career in data science requires learning new languages and frameworks, but you face an overwhelming array of choices, each with different syntaxes, conventions, and terminology. Ian Cook simplifies the learning process by outlining the abstractions common to these systems. You'll work through hands-on exercises to overcome the obstacles to getting started with new tools.
9:00am - 5:00pm Monday, September 23 & Tuesday, September 24
Location: 1E 07
Dylan Bargteil (The Data Incubator)
The TensorFlow library uses computational graphs with automatic parallelization across resources, an architecture that is ideal for implementing neural networks. Dylan Bargteil explores TensorFlow's capabilities in Python, demonstrating how to build machine learning algorithms piece by piece and how to use TensorFlow's Keras API with several hands-on applications.
9:00am - 12:30pm Tuesday, September 24, 2019
Location: 1A 12/14
Sourav Dey (Manifold), Jakov Kucan (Manifold)
Sourav Dey and Jakov Kucan walk you through the six steps of the Lean AI process and explain how it helps your ML engineers work as an integrated part of your development and production teams. You'll work through a hands-on example using real-world data, so you can get up and running with Docker and Orbyter and see firsthand how much they can streamline your workflow.
9:00am - 12:30pm Tuesday, September 24, 2019
Location: 1A 21
Jules Damji (Databricks)
ML development brings many new complexities beyond those of the traditional software development lifecycle. Unlike traditional software developers, ML developers want to try multiple algorithms, tools, and parameters to get the best results, and they need to track this information. Jules Damji walks you through MLflow, an open source project that addresses this problem by simplifying the entire ML lifecycle.
9:00am - 12:30pm Tuesday, September 24, 2019
Location: 1A 23/24
Alice Zhao (Metis)
Data scientists are known for crunching numbers, but what should you do when you run into text data? Alice Zhao walks you through the steps to turn text data into a format that a machine can understand, explores some of the most popular text analytics techniques, and showcases several natural language processing (NLP) libraries in Python, including NLTK, TextBlob, spaCy, and gensim.
9:00am - 12:30pm Tuesday, September 24, 2019
Location: 1E 12/13
Secondary topics: Deep Learning
Bruno Gonçalves (Data For Science)
You'll go hands-on to learn the theoretical foundations and principal ideas underlying deep learning and neural networks. Bruno Gonçalves structures the code for the implementations to closely resemble the way Keras is structured, so that by the end of the course, you'll be prepared to dive deeper into the deep learning applications of your choice.
1:30pm - 5:00pm Tuesday, September 24, 2019
Location: 1A 12/14
Garrett Hoffman (StockTwits)
Garrett Hoffman walks you through deep learning methods for natural language processing and natural language understanding tasks, using a live example in Python and TensorFlow with StockTwits data. Methods include Word2Vec, recurrent neural networks (RNNs) and variants (long short-term memory [LSTM] and gated recurrent unit [GRU]), and convolutional neural networks.
1:30pm - 5:00pm Tuesday, September 24, 2019
Location: 1A 21
Karthik Sonti (Amazon Web Services), Emily Webber (Amazon Web Services), Varun Rao Bhamidimarri (Amazon Web Services)
Karthik Sonti, Emily Webber, and Varun Rao Bhamidimarri introduce you to the Amazon SageMaker machine learning platform and provide a high-level discussion of recommender systems. You'll dig into different machine learning approaches for recommender systems, including common methods such as matrix factorization as well as newer embedding approaches.
1:30pm - 5:00pm Tuesday, September 24, 2019
Location: 1A 23/24
David Talby (Pacific AI), Alex Thomas (John Snow Labs), Saif Addin Ellafi (John Snow Labs), Claudiu Branzan (Accenture)
David Talby, Alex Thomas, Saif Addin Ellafi, and Claudiu Branzan walk you through state-of-the-art natural language processing (NLP) using the highly performant, highly scalable open source Spark NLP library. You'll spend about half your time coding as you work through four sections, each with an end-to-end working codebase that you can change and improve.
1:30pm - 5:00pm Tuesday, September 24, 2019
Location: 1E 11
Sophie Watson (Red Hat), William Benton (Red Hat)
Go hands-on with Sophie Watson and William Benton to examine data structures that let you answer interesting queries about massive datasets in fixed amounts of space and constant time. This seems like magic, but they'll explain the key trick that makes it possible and show you how to use these structures for real-world machine learning and data engineering applications.
9:05am - 9:15am Wednesday, September 25, 2019
Location: 3E
Ben Lorica (O'Reilly)
Ben Lorica dives into emerging technologies for building data infrastructures and machine learning platforms.
11:20am - 12:00pm Wednesday, September 25, 2019
Location: 3B - Expo Hall
Secondary topics: Ethics
Harsha Nori (Microsoft), Samuel Jenkins (Microsoft), Rich Caruana (Microsoft)
Understanding decisions made by machine learning systems is critical for sensitive uses, ensuring fairness, and debugging production models. Interpretability techniques offer ways to understand model decisions. Harsha Nori, Samuel Jenkins, and Rich Caruana explore the tools Microsoft is releasing to help you train powerful, interpretable models and interpret existing black box systems.
11:20am - 12:00pm Wednesday, September 25, 2019
Location: 1A 06/07
Meir Toledano (Anodot)
ARIMA has been used for time series modeling for decades. In practice, most time series collected from human activities exhibit seasonal patterns, but for decades seasonal ARIMA (SARIMA) models could not be estimated efficiently. Meir Toledano explains how Anodot was able to apply the technique to forecasting and anomaly detection for millions of time series every day.
11:20am - 12:00pm Wednesday, September 25, 2019
Location: 1A 08/10
Nan Zhu (Uber), Felix Cheung (Uber)
XGBoost has been widely deployed in companies across the industry. Nan Zhu and Felix Cheung dive into the internals of distributed training in XGBoost and demonstrate how XGBoost solves business problems at Uber at a scale of thousands of workers and tens of terabytes of training data.
11:20am - 12:00pm Wednesday, September 25, 2019
Location: 1A 12/14
Ted Dunning (MapR)
Feature engineering is generally the topic that gets left out of machine learning books, but it's also the most critical part in practice. Ted Dunning explores techniques, a few well known and some rarely discussed outside the institutional knowledge of top teams, including how to handle categorical inputs, natural language, transactions, and more in the context of machine learning.
1:15pm - 1:55pm Wednesday, September 25, 2019
Location: 3B - Expo Hall
Saif Addin Ellafi (John Snow Labs), Scott Hoch (BlackBox Engineering)
Recruiting patients for clinical trials is a major challenge in drug development. Saif Addin Ellafi and Scott Hoch explain how Deep 6 uses Spark NLP to scale its training and inference pipelines to millions of patients while achieving state-of-the-art accuracy. They dive into the technical challenges, the architecture of the full solution, and the lessons the company learned.
1:15pm - 1:55pm Wednesday, September 25, 2019
Location: 1A 06/07
Every NLP-based document-processing solution depends on converting scanned documents and images to machine-readable text using an OCR solution, whose output is limited by the quality of the scanned images. Nagendra Shishodia, Chaithanya Manda, and Solmaz Torabi explore how generative adversarial networks (GANs) can bring significant efficiencies to any document-processing solution by enhancing resolution and denoising scanned images.
1:15pm - 1:55pm Wednesday, September 25, 2019
Location: 1A 08/10
James Tang (Walmart Labs), Yiyi Zeng (Walmart Labs), Linhong Kang (Walmart Labs)
James Tang, Yiyi Zeng, and Linhong Kang outline how Walmart provides a secure and seamless shopping experience through machine learning and large-scale data analysis on a centralized platform.
1:15pm - 1:55pm Wednesday, September 25, 2019
Location: 1A 12/14
Secondary topics: Deep Learning
Shioulin Sam (Cloudera Fast Forward Labs)
Supervised machine learning requires large labeled datasets, a prohibitive limitation in many real-world applications. But this limitation could be avoided if machines could learn from a few labeled examples. Shioulin Sam explores and demonstrates an algorithmic solution that relies on collaboration between human and machine to label smartly, and she outlines product possibilities.
2:05pm - 2:45pm Wednesday, September 25, 2019
Location: 3B - Expo Hall
Panos Alexopoulos (Textkernel)
In an era where discussions among data scientists are monopolized by the latest trends in machine learning, the role of semantics in data science is often underplayed. Panos Alexopoulos presents real-world cases where making fine, seemingly pedantic distinctions in the meaning of data science tasks and their related data has significantly improved the tasks' effectiveness and value.
2:05pm - 2:45pm Wednesday, September 25, 2019
Location: 1A 06/07
Keshav Peswani (Expedia Group), Ashish Aggarwal (Expedia Group)
Observability is key in modern architectures for quickly detecting and repairing problems in microservices. Modern observability platforms have evolved beyond simple application logs and include distributed tracing systems like Zipkin and Haystack. Keshav Peswani and Ashish Aggarwal explore how combining these tracing systems with real-time, intelligent alerting mechanisms helps automate the detection of problems.
2:05pm - 2:45pm Wednesday, September 25, 2019
Location: 1A 08/10
Secondary topics: Culture and Organization
Ann Spencer (Domino), Amy Heineike (Primer), Paco Nathan (derwen.ai), Chris Wiggins (NYT | Columbia)
If, as a data scientist, you've wondered why it takes so long to deploy your model into production or, as an engineer, thought data scientists have no idea what they want, you're not alone. Join a lively discussion with industry veterans Ann Spencer, Paco Nathan, Amy Heineike, and Chris Wiggins to learn best practices and insights for increasing collaboration when developing and deploying models.
2:05pm - 2:45pm Wednesday, September 25, 2019
Location: 1A 12/14
Mikio Braun (Zalando)
With ML becoming more mainstream, the side effects of machine learning and AI on our lives are becoming more visible. You have to take extra measures to make machine learning models fair and unbiased, and awareness of the need to preserve privacy in ML models is growing rapidly. Mikio Braun explores techniques and concepts around fairness, privacy, and security when it comes to machine learning models.
2:55pm - 3:35pm Wednesday, September 25, 2019
Location: 3B - Expo Hall
Gerard de Melo (Rutgers University)
Gerard de Melo takes a deep dive into the kinds of sentiment and emotion consumers associate with a text. With new data-driven approaches, organizations can pay closer attention to what's being said about them in different markets. They can also consider which fonts and color palettes are best suited to convey specific emotions, so they can make informed choices when presenting information to consumers.
2:55pm - 3:35pm Wednesday, September 25, 2019
Location: 1A 06/07
Tony Xing (Microsoft), Congrui Huang (Microsoft), Qiyang Li (Microsoft), Wenyi Yang (Microsoft)
Anomaly detection may sound old-fashioned, yet it's extremely important in many industry applications. Tony Xing, Congrui Huang, Qiyang Li, and Wenyi Yang detail a novel anomaly-detection algorithm based on spectral residual (SR) and convolutional neural networks (CNNs) and how this method was applied in the monitoring system supporting Microsoft AIOps and business incident prevention.
2:55pm - 3:35pm Wednesday, September 25, 2019
Location: 1A 08/10
Fei Wang (CarGurus)
Fei Wang takes a deep dive into a case study of the CarGurus TV attribution model. You'll learn how to build a causal inference model to calculate the cost per acquisition (CPA) of TV spend and measure its effectiveness compared to the CPA of digital performance marketing spend.
2:55pm - 3:35pm Wednesday, September 25, 2019
Location: 1A 12/14
Secondary topics: Financial Services
Jari Koister (FICO)
Machine learning and constraint-based optimization are both used to solve critical business problems. They come from distinct research communities and have traditionally been treated separately. Jari Koister examines how they're similar, how they're different, and how they can be used to solve complex problems with amazing results.
4:35pm - 5:15pm Wednesday, September 25, 2019
Location: 3B - Expo Hall
John Berryman (Eventbrite)
Eventbrite is exploring a new machine learning approach that allows it to harvest data from customer search logs and automatically tag events based upon their content. John Berryman dives into the results and how they have allowed the company to provide users with a better inventory-browsing experience.
4:35pm - 5:15pm Wednesday, September 25, 2019
Location: 1A 06/07
Anirudh Koul (Microsoft), Meher Kasam (Square)
Over the last few years, convolutional neural networks (CNNs) have risen in popularity, especially in the area of computer vision. Anirudh Koul and Meher Kasam take you through how you can get deep neural nets to run efficiently on mobile devices.
4:35pm - 5:15pm Wednesday, September 25, 2019
Location: 1A 08/10
Robert Pesch (inovex), Robin Senge (inovex)
Data-driven software is revolutionizing the world and enabling the intelligent services we interact with daily. Robert Pesch and Robin Senge outline the development process, statistical modeling, data-driven decision making, and components needed to productionize a fully automated and highly scalable demand forecasting system for the online grocery shop of a billion-dollar retail group in Europe.
4:35pm - 5:15pm Wednesday, September 25, 2019
Location: 1A 12/14
Criteo’s infrastructure provides the capacity and connectivity to host Criteo’s platform and applications. The evolution of this infrastructure is driven by the ability to forecast Criteo’s traffic demand. Hamlet Jesse Medina Ruiz explains how Criteo uses Bayesian dynamic time series models to accurately forecast its traffic load and optimize hardware resources across data centers.
5:25pm - 6:05pm Wednesday, September 25, 2019
Location: 3B - Expo Hall
Sireesha Muppala (Amazon Web Services), Shelbee Eigenbrode (Amazon Web Services), Emily Webber (Amazon Web Services)
Mansplaining. Know it? Hate it? Want to make it go away? Sireesha Muppala, Shelbee Eigenbrode, and Emily Webber tackle the problem of men talking over or down to women and its impact on career progression for women. They also demonstrate an Alexa skill that uses deep learning techniques on incoming audio feeds, examine ownership of the problem for women and men, and suggest helpful strategies.
5:25pm - 6:05pm Wednesday, September 25, 2019
Location: 1A 06/07
The common perception of deep learning is that it results in a fully self-contained model. However, in most cases, these models have data preprocessing requirements similar to those of more "traditional" machine learning. Despite this, there are few standard solutions for deploying end-to-end deep learning. Nick Pentreath explores how the ONNX format and ecosystem address this challenge.
5:25pm - 6:05pm Wednesday, September 25, 2019
Location: 1A 08/10
Secondary topics: Media and Advertising
Aaron Owen (Major League Baseball), Matthew Horton (Major League Baseball), Josh Hamilton (Major League Baseball)
Using SAS, Python, and AWS SageMaker, Major League Baseball's (MLB's) data science team outlines how it predicts ticket purchasers’ likelihood to purchase again, evaluates prospective season schedules, estimates customer lifetime value, optimizes promotion schedules, quantifies the strength of fan avidity, and monitors the health of monthly subscriptions to its game-streaming service.
5:25pm - 6:05pm Wednesday, September 25, 2019
Location: 1A 12/14
Secondary topics: Retail and e-commerce
Subhasish Misra (Walmart Labs)
Causal questions are ubiquitous, and randomized tests are considered the gold standard for answering them. However, such tests are not always feasible, and then you have only observational data from which to draw causal insights. Techniques such as matching offer an opportunity to solve this problem. Subhasish Misra explores these techniques and shares practical tips for inferring causal effects.
5:25pm - 6:05pm Wednesday, September 25, 2019
Location: 1A 01/02
Secondary topics: Transportation and Logistics
Brandy Freitas (Pitney Bowes)
Brandy Freitas examines the interplay between graph analytics and machine learning, improved feature engineering with graph-native algorithms, and how to harness the power of graph structure for machine learning through node embedding.
5:25pm - 6:05pm Wednesday, September 25, 2019
Location: 1E 06
Venkata Gunnu (Comcast), Harish Doddi (Datatron)
Machine learning infrastructure is key to the success of AI at scale in enterprises, but bringing machine learning models to a production environment poses many challenges, given the legacy of the enterprise environment. Venkata Gunnu and Harish Doddi explore some key insights, what worked, what didn't, and best practices that helped the data engineering and data science teams.
11:20am - 12:00pm Thursday, September 26, 2019
Location: 3B - Expo Hall
Brian Keng (Rubikloud)
Automating decisions requires a system to consider more than just a data-driven prediction. Real-world decisions require additional constraints and fuzzy objectives to ensure they're robust and consistent with business goals. Brian Keng takes a deep dive into how to leverage modern machine learning methods and traditional mathematical optimization techniques for decision automation.
11:20am - 12:00pm Thursday, September 26, 2019
Location: 1A 06/07
Shital Shah (Microsoft Research)
Taming massive deep learning models, data, and training times requires a new way of thinking. Shital Shah explores new tools and methods to better understand AI. Explaining the decisions made by AI not only helps us accelerate its development but also makes it safer and more trustworthy.
11:20am - 12:00pm Thursday, September 26, 2019
Location: 1A 08/10
Anjali Samani (CircleUp)
The application of smoothing and imputation strategies is common practice in predictive modeling and time series analysis. With a technique-agnostic approach, Anjali Samani provides qualitative and quantitative frameworks that address questions related to smoothing and imputation of missing values to improve data density.
11:20am - 12:00pm Thursday, September 26, 2019
Location: 1A 12/14
Secondary topics: Ethics
Alejandro Saucedo (The Institute for Ethical AI & Machine Learning)
Alejandro Saucedo demystifies AI explainability through a hands-on case study in which the objective is to automate a loan-approval process by building and evaluating a deep learning model. He motivates the topic through the practical risks that arise from undesired bias and black box models and shows you how to tackle these challenges using tools from the latest research and domain knowledge.
11:20am - 12:00pm Thursday, September 26, 2019
Location: 1E 14
John Allen (Deutsche Bank)
As an early adopter of data science, machine learning, and AI, Deutsche Bank's analytics function is trailblazing new ways to drive revenues, lower costs, and reduce risk across all areas of the group. John Allen shares how his team combines commercial offerings with open source technologies to revolutionize legacy processes and transform the way the bank uses technology to drive innovation.
1:15pm - 1:55pm Thursday, September 26, 2019
Location: 3B - Expo Hall
Victor Dibia (Cloudera Fast Forward Labs)
Recent advances in machine learning frameworks for the browser, such as TensorFlow.js, provide the opportunity to craft truly novel experiences within frontend applications. Victor Dibia explores the state of the art for machine learning in the browser using TensorFlow.js and outlines its use in the design of Handtrack.js, a library for prototyping real-time hand detection in the browser.
1:15pm - 1:55pm Thursday, September 26, 2019
Location: 1A 06/07
Sameer Agarwal (Facebook), Ankit Agarwal (Facebook Inc.)
Apache Spark is the largest compute engine at Facebook by CPU. Sameer Agarwal dives into the story of how Facebook optimized, tuned, and scaled Apache Spark to run on clusters of tens of thousands of machines, processing hundreds of petabytes of data and used by thousands of data scientists, engineers, and product analysts every day.
1:15pm - 1:55pm Thursday, September 26, 2019
Location: 1A 08/10
Alfred Whitehead (Klick), Clare Jeon (Klick)
Time series forecasts depend on sensors or measurements made in the real, messy world. Sensors flake out, get turned off, disconnect, and otherwise conspire to cause missing signals, and those signals may tell you what tomorrow's temperature will be or what your blood glucose levels are before bed. Alfred Whitehead and Clare Jeon explore methods for handling data gaps and when to use each one.
1:15pm - 1:55pm Thursday, September 26, 2019
Location: 1A 12/14
Sandra Carrico (GLYNT)
Sandra Carrico explains mixed formal learning and outlines a machine learning example that previously required large numbers of training examples and now learns from either zero or a handful of them. The talk maps apparently idiosyncratic techniques to mixed formal learning, a general AI architecture that you can use in your projects.
2:05pm - 2:45pm Thursday, September 26, 2019
Location: 3B - Expo Hall
Heitor Murilo Gomes (Télécom ParisTech), Albert Bifet (Télécom ParisTech)
Heitor Murilo Gomes and Albert Bifet introduce you to a machine learning pipeline for streaming data using the streamDM framework. You'll also learn how to use streamDM for supervised and unsupervised learning tasks, see examples of online preprocessing methods, and discover how to expand the framework by adding new learning algorithms or preprocessing methods.
2:05pm - 2:45pm Thursday, September 26, 2019
Location: 1A 06/07
Secondary topics: Deep Learning, Streaming and IoT
Ryan Foltz (Exabeam)
Unmanaged and foreign devices in corporate networks pose a security risk, and the first step toward reducing this risk is the ability to identify them. Ryan Foltz walks you through a comprehensive, deep learning-based device management model that performs anomaly detection using only device names to flag devices that do not follow naming structures.
2:05pm - 2:45pm Thursday, September 26, 2019
Location: 1A 08/10
Anais Dotis (InfluxData)
Machine learning (ML) gets a lot of hype, but its classical predecessors are still immensely powerful, especially in the time series space, where classical algorithms can outperform machine learning methods for forecasting. Anais Dotis dives into how she used the Holt-Winters forecasting algorithm to predict water levels in a creek.
2:05pm - 2:45pm Thursday, September 26, 2019
Location: 1A 12/14
Mumin Ransom (Comcast), Nick Pinckernell (Comcast)
Mumin Ransom gives an overview of the data management and privacy challenges around automating ML model (re)deployments and stream-based inferencing at scale.
3:45pm - 4:25pm Thursday, September 26, 2019
Location: 1A 06/07
Secondary topics: Deep Learning
Sajan Govindan (Intel)
Sajan Govindan outlines CERN’s research on deep learning in high energy physics experiments as an alternative to customized rule-based methods, with an example of topology classification to improve real-time event selection at the Large Hadron Collider. CERN uses deep learning pipelines on Apache Spark using BigDL and Analytics Zoo open source software on Intel Xeon-based clusters.
3:45pm - 4:25pm Thursday, September 26, 2019
Location: 1A 12/14
Secondary topics: Financial Services
David Mack (Octavian)
Graphs are a powerful way to represent knowledge. Organizations in fields such as biosciences and finance are starting to amass large knowledge graphs, but they lack the machine learning tools to extract insights from them. David Mack offers an overview of what insights are possible and surveys the most popular approaches.
3:45pm - 4:25pm Thursday, September 26, 2019
Location: 1A 08/10
Chad Scherrer (Metis)
Chad Scherrer explores the basic ideas in Soss, a new probabilistic programming library for Julia. Soss allows a high-level representation of the kinds of models often written in PyMC3 or Stan, and offers a way to programmatically specify and apply model transformations like approximations or reparameterizations.
4:35pm - 5:15pm Thursday, September 26, 2019
Location: 1A 06/07
Secondary topics: Deep Learning
Naoto Umemori (NTT DATA), Masaru Dobashi (NTT DATA)
Giant hogweed is a highly toxic plant. Naoto Umemori and Masaru Dobashi aim to automate the process of detecting the plant with technologies like drones and image recognition and detection using machine learning. You'll see how they designed the architecture and took advantage of big data and machine and deep learning technologies (e.g., Hadoop, Spark, and TensorFlow), along with the lessons they learned.
4:35pm - 5:15pm Thursday, September 26, 2019
Location: 1E 07/08
Jordan Volz (Dataiku)
Spark on Kubernetes is a winning combination for data science that stitches together a flexible platform harnessing the best of both worlds. Jordan Volz gives a brief overview of Spark and Kubernetes, the Spark on Kubernetes project, why it’s an ideal fit for data scientists who may have been dissatisfied with other iterations of Spark in the past, and some applications.
4:35pm - 5:15pm Thursday, September 26, 2019
Location: 1A 08/10
Jeroen Janssens (Data Science Workshops)
Jeroen Janssens dives into stochastic outlier selection (SOS), an unsupervised algorithm for detecting anomalies in large, high-dimensional data. SOS has been implemented in Python, R, and, most recently, Spark. He illustrates the idea and intuition behind SOS, demonstrates the implementation of SOS on top of Spark, and applies SOS to a real-world use case.

    Contact us

    confreg@oreilly.com

    For conference registration information and customer service

    partners@oreilly.com

    For more information on community discounts and trade opportunities with O’Reilly conferences

    strataconf@oreilly.com

    For information on exhibiting or sponsoring a conference

    pr@oreilly.com

    For media/analyst press inquiries