Sep 23–26, 2019
 
3B - Expo Hall
11:20am ML is not enough: Decision automation in the real world Brian Keng (Rubikloud Technologies Inc)
2:05pm Machine learning for streaming data: Practical insights Heitor Murilo Gomes (Télécom ParisTech), Albert Bifet (Télécom ParisTech)
1A 06/07
1:15pm Scaling Apache Spark at Facebook Sameer Agarwal (Facebook Inc.)
4:35pm Deep learning technologies for giant hogweed eradication Naoto Umemori (NTT DATA Corporation), Masaru Dobashi (NTT Data Corp.)
1A 08/10
1:15pm Handling data gaps in time series using imputation Alfred Whitehead (Klick), Clare Jeon (KLICK INC)
2:05pm When Holt-Winters is better than machine learning Anais Dotis (InfluxData)
4:35pm Scalable anomaly detection with Spark and SOS Jeroen Janssens (Data Science Workshops B.V.)
1A 12
11:20am A practical guide to algorithmic bias and explainability in machine learning Alejandro Saucedo (The Institute for Ethical AI & Machine Learning)
3:45pm An introduction to machine learning on graphs David Mack (Octavian)
1A 15/16
2:05pm Posttransaction processing using Apache Pulsar at Narvar Karthik Ramasamy (Streamlio), Anand Madhavan (Narvar)
3:45pm SK Telecom's 5G network monitoring and 3D visualization on streaming technologies Jonghyok Lee (SK Telecom), Chon Yong Lee (SK Telecom)
4:35pm The why and how of data lineage Neelesh Salian (Stitch Fix)
1A 21
11:20am Online machine learning in streaming applications Stavros Kontopoulos (Lightbend), Debasish Ghosh (Lightbend)
2:05pm The New SDLC: CI/CD in the age of machine learning Diego Oppenheimer (Algorithmia)
3:45pm MLOps: Applying DevOps practices to machine learning workloads Sireesha Muppala (Amazon Web Services), Shelbee Eigenbrode (Amazon Web Services), Randall DeFauw (Amazon Web Services)
1A 23
11:20am Your cloud, your ML, but more and more scale? How SurveyMonkey did it Jing Huang (SurveyMonkey), Jessica Mong (SurveyMonkey)
1:15pm How to performance-tune Spark applications in large clusters Omkar Joshi (Uber Technologies), Bo Yang (uber inc)
3:45pm Enabling big data and AI workloads on the object store at DBS Bank Vitaliy Baklikov (Development Bank of Singapore), Dipti Borkar (Alluxio)
4:35pm Bridging the gap between big data computing and high-performance computing Supun Kamburugamuve (Indiana University)
1E 07/08
11:20am Using Spark for crunching astronomical data on the LSST scale Petar Zecevic (SV Group d.o.o.)
2:05pm Fuzzy matching and deduplicating data: Techniques for advanced data prep Nikki Rouda (Amazon Web Services), Roy Hasson (Amazon Web Services)
4:35pm Scaling data engineers Evgeny Vinogradov (Yandex.Money)
1E 09
1:15pm Intelligent design patterns for cloud-based analytics and BI Shant Hovsepian (Arcadia Data)
2:05pm Building a best-in-class data lake on AWS and Azure Tomer Shiran (Dremio), Jacques Nadeau (Dremio)
4:35pm Using Spark to speed up the diagnosis performance for big data applications Ruixin Xu (Microsoft), Long Tian (Microsoft), Yu Zhou (Microsoft)
1E 10/11
2:05pm
3:45pm Executive Briefing: Building a culture of self-service, from predeployment to continued engagement Jonathan Tudor (GE Aviation), Ross Schalmo (GE Aviation)
1E 12/13
11:20am
1:15pm An in-depth look at the data science career: Defining roles, assessing skills Usama Fayyad (Open Insights & OODA Health, Inc.), Hamit Hamutcu (Analytics Center)
3:45pm Migrating millions of users from voice- and email-based customer support to a chatbot Madhu Gopinathan (MakeMyTrip), Sanjay Mohan (MakeMyTrip)
1E 14
1:15pm
3:45pm Purposefully designing technology for civic engagement Audrey Lobo-Pulo (Phoensight), Annette Hester (National Energy Board, Canada), Ryan Hum (National Energy Board, Canada)
1A 01/02
2:05pm Powering the future with data intelligence (sponsored by Collibra) Jim Cushman (Collibra), Piyus Jain (Progressive)
1A 03
1A 04/05
1E 06
3E
8:45am Thursday keynotes Ben Lorica (O'Reilly Media), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
8:55am Staying safe in the AI era Cassie Kozyrkov (Google)
9:35am AI is not magic. It’s computer science. Robert Thomas (IBM), Tim O'Reilly (O'Reilly Media)
10:30am Data sonification: Making music from the yield curve Alan Smith (Financial Times)
10:50am Morning break sponsored by Cisco | Room: Expo Hall - 3B
12:00pm Break | Room: Expo Hall - 3B
2:45pm Afternoon break sponsored by Io-Tahoe | Room: Expo Hall - 3B
8:15am Speed Networking | Room: Keynote Foyer
12:00pm Thursday Business Summit Lunch | Room: Expo Hall - 3D
11:20am-12:00pm (40m) Data Science, Machine Learning, & AI Culture and Organization, Retail and e-commerce, Transportation and Logistics
ML is not enough: Decision automation in the real world
Brian Keng (Rubikloud Technologies Inc)
Automating decisions requires a system to consider more than just a data-driven prediction. Real-world decisions require additional constraints and fuzzy objectives to ensure that they are robust and consistent with business goals. This talk will describe how to leverage modern machine learning methods and traditional mathematical optimization techniques for decision automation.
1:15pm-1:55pm (40m) Data Science, Machine Learning, & AI Deep dive into specific tools, platforms, or frameworks, Deep Learning
Handtrack.js: Building gesture-based interactions in the browser using TensorFlow.js
Victor Dibia (Cloudera Fast Forward Labs)
Recent advances in machine learning frameworks for the browser, such as TensorFlow.js, provide an opportunity to craft truly novel experiences within front-end applications. This talk explores the state of the art for machine learning in the browser using TensorFlow.js and covers its use in the design of Handtrack.js, a library for prototyping real-time hand detection in the browser.
2:05pm-2:45pm (40m) Data Science, Machine Learning, & AI Streaming and IoT, Telecom, Temporal data and time-series analytics
Machine learning for streaming data: Practical insights
Heitor Murilo Gomes (Télécom ParisTech), Albert Bifet (Télécom ParisTech)
In this talk, we show how to develop a machine learning pipeline for streaming data using the StreamDM framework (https://github.com/huawei-noah/streamDM). We also introduce how to use StreamDM for supervised and unsupervised learning tasks, show examples of online preprocessing methods, and explain how to extend the framework with new learning algorithms or preprocessing methods.
11:20am-12:00pm (40m) Data Science, Machine Learning, & AI
Getting to know the elephant: Real-time debugging and visualization for deep learning
Shital Shah (Microsoft Research)
How do we visualize what exactly deep learning is doing? Taming massive models, data, and training times requires a new way of thinking about them. In this talk, we will introduce and explore new tools and methods to understand AI better. Explaining the decisions made by AI not only helps us accelerate its development but also makes it safer and more trustworthy.
1:15pm-1:55pm (40m) Data Science, Machine Learning, & AI
Scaling Apache Spark at Facebook
Sameer Agarwal (Facebook Inc.)
Apache Spark is the largest compute engine at Facebook by CPU. This talk will cover the story of how we optimized, tuned and scaled Apache Spark at Facebook to run on clusters of tens of thousands of machines, processing hundreds of petabytes of data, and used by thousands of data scientists, engineers and product analysts every day.
2:05pm-2:45pm (40m) Data Science, Machine Learning, & AI Deep Learning, Streaming and IoT
Learning asset naming patterns to find risky unmanaged devices
Ryan Foltz (Exabeam)
Unmanaged and foreign devices in corporate networks pose a security risk. The first step toward reducing risk from these devices is the ability to identify them. As part of a comprehensive device management program, we propose a deep learning model that performs anomaly detection on device names alone, flagging devices that do not follow device naming structures.
3:45pm-4:25pm (40m) Data Science, Machine Learning, & AI Deep Learning
Deep learning on Apache Spark at CERN’s Large Hadron Collider with Analytics Zoo
Sajan Govindan (Intel), Luca Canali (CERN)
We will show CERN’s research on applying deep learning in high energy physics experiments as an alternative to customized rule-based methods, with an example of topology classification to improve real-time event selection at the Large Hadron Collider experiments. CERN implemented deep learning pipelines on Apache Spark using the BigDL and Analytics Zoo open source software on Intel Xeon-based clusters.
4:35pm-5:15pm (40m) Data Science, Machine Learning, & AI Deep Learning
Deep learning technologies for giant hogweed eradication
Naoto Umemori (NTT DATA Corporation), Masaru Dobashi (NTT Data Corp.)
Giant hogweed is a highly toxic plant. Our project aims to automate the detection of giant hogweed using technologies such as drones and machine learning-based image recognition and detection. We show how we designed the architecture, how we took advantage of both big data and machine and deep learning technologies (e.g., Hadoop, Spark, and TensorFlow), and the lessons we learned.
11:20am-12:00pm (40m) Data Science, Machine Learning, & AI Financial Services, Temporal data and time-series analytics
Working with time series: Denoising and imputation frameworks to improve data density
Anjali Samani (CircleUp)
The application of smoothing and imputation strategies is common practice in predictive modelling and time series analysis. With a technique-agnostic approach, this session will provide qualitative and quantitative frameworks that address questions related to smoothing and imputation of missing values to improve data density.
1:15pm-1:55pm (40m) Data Science, Machine Learning, & AI Temporal data and time-series analytics
Handling data gaps in time series using imputation
Alfred Whitehead (Klick), Clare Jeon (KLICK INC)
What will tomorrow’s temperature be? My blood glucose levels tonight before bed? Time series forecasts depend on sensors or measurements made out in the real, messy world. Those sensors flake out, get turned off, disconnect, and otherwise conspire to cause missing data in our signals. We will show a number of methods for handling data gaps and give advice on which to consider and when.
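As a rough illustration of the kind of gap handling this abstract describes, the sketch below shows two common baseline methods, forward fill and linear interpolation, on a list where `None` marks a missing reading. The function names and the sample readings are hypothetical and are not taken from the talk.

```python
def forward_fill(values):
    """Replace each gap (None) with the most recent observed value."""
    filled = []
    last = None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)  # stays None until the first observation arrives
    return filled


def linear_interpolate(values):
    """Fill interior gaps on a straight line between neighbouring observations.

    Leading and trailing gaps are left as None, since there is no
    observation on one side to interpolate toward.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


# Hypothetical sensor readings with dropouts.
readings = [20.0, None, None, 23.0, None, 25.0]
print(forward_fill(readings))
print(linear_interpolate(readings))
```

Forward fill suits slowly changing signals and short gaps; interpolation assumes the signal moved smoothly across the gap, which may not hold for long outages.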
2:05pm-2:45pm (40m) Data Science, Machine Learning, & AI Temporal data and time-series analytics
When Holt-Winters is better than machine learning
Anais Dotis (InfluxData)
Did you know that Classical algorithms outperform Machine Learning methods in time series forecasting? I’ll show you how I used the Holt-Winters forecasting algorithm to predict water levels in a creek.
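For readers unfamiliar with the method, here is a minimal sketch of additive Holt-Winters (triple exponential smoothing). It is one textbook variant with a simple initialisation from the first two seasons, not the speaker's implementation; the function name, the smoothing parameters, and the sample data are assumptions made for illustration.

```python
def holt_winters_additive(series, season_length, alpha, beta, gamma, horizon):
    """Forecast `horizon` steps ahead with additive level, trend and season."""
    m = season_length
    if len(series) < 2 * m:
        raise ValueError("need at least two full seasons of data")

    # Simple initialisation (an assumption of this sketch): level from the
    # first season's mean, trend from the change between season means.
    first_mean = sum(series[:m]) / m
    second_mean = sum(series[m:2 * m]) / m
    level = first_mean
    trend = (second_mean - first_mean) / m
    seasonal = [series[i] - first_mean for i in range(m)]

    for t in range(m, len(series)):
        y = series[t]
        prev_level = level
        s = seasonal[t % m]
        # Smooth the deseasonalised observation against the previous forecast.
        level = alpha * (y - s) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % m] = gamma * (y - level) + (1 - gamma) * s

    n = len(series)
    return [
        level + h * trend + seasonal[(n + h - 1) % m]
        for h in range(1, horizon + 1)
    ]


# Hypothetical water-level readings with a period of 3 and no trend.
levels = [10.0, 20.0, 30.0, 10.0, 20.0, 30.0, 10.0, 20.0, 30.0]
print(holt_winters_additive(levels, season_length=3,
                            alpha=0.5, beta=0.5, gamma=0.5, horizon=3))
```

In practice the three smoothing parameters are usually fitted by minimising forecast error on held-out data rather than fixed by hand.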
3:45pm-4:25pm (40m) Data Science, Machine Learning, & AI Deep dive into specific tools, platforms, or frameworks
Soss: Lightweight probabilistic programming in Julia
Chad Scherrer (Metis)
This talk will explore the basic ideas in Soss, a new probabilistic programming library for Julia. Soss allows a high-level representation of the kinds of models often written in PyMC3 or Stan, and offers a way to programmatically specify and apply model transformations like approximations or reparameterizations.
4:35pm-5:15pm (40m) Data Science, Machine Learning, & AI Temporal data and time-series analytics
Scalable anomaly detection with Spark and SOS
Jeroen Janssens (Data Science Workshops B.V.)
In this talk, we present Stochastic Outlier Selection (SOS), an unsupervised algorithm for detecting anomalies in large, high-dimensional data. SOS has been implemented in Python, R, and most recently, Spark. First, we illustrate the idea and intuition behind SOS. Subsequently, we demonstrate our implementation of SOS on top of Spark. Finally, we apply SOS to a real-world use case.
11:20am-12:00pm (40m) Data Science, Machine Learning, & AI Ethics
A practical guide to algorithmic bias and explainability in machine learning
Alejandro Saucedo (The Institute for Ethical AI & Machine Learning)
Undesired bias in machine learning has become a worrying topic due to numerous high-profile incidents. In this talk we demystify machine learning bias through a hands-on example. We'll be tasked to automate the loan approval process for a company, and introduce key tools and techniques from the latest research that allow us to assess and mitigate undesired bias in our machine learning models.
1:15pm-1:55pm (40m) Data Science, Machine Learning, & AI Text and Language processing and analysis
Data need not be a moat: Mixed formal learning enables zero- and low-shot learning
Sandra Carrico (Glynt.ai)
This talk motivates mixed formal learning, explains it and outlines one machine learning example that previously used large numbers of examples and now learns with either zero or a handful of training examples. It maps apparently idiosyncratic techniques to Mixed Formal Learning, a general AI architecture that you can use in your projects.
2:05pm-2:45pm (40m) Automation in data science and data, Data Science, Machine Learning, & AI Data quality, data governance and data lineage, Media and Advertising, Model Development, Governance, Operations
Automating ML model training and deployments via metadata-driven data, infrastructure, feature engineering, and model management
Andrew Leamon (Comcast), Wadkar Sameer (Comcast NBCUniversal)
An overview of the data management and privacy challenges around automating ML model (re)deployments and stream-based inferencing at scale.
3:45pm-4:25pm (40m) Data Science, Machine Learning, & AI Financial Services
An introduction to machine learning on graphs
David Mack (Octavian)
Graphs are a powerful way to represent knowledge. Organizations (in fields such as bio-sciences and finance) are starting to amass large knowledge graphs, but lack the machine-learning tools to extract the insights they need from them. In this presentation, I’ll give an overview of what insights are possible and survey the most popular approaches.
4:35pm-5:15pm (40m) Data Science, Machine Learning, & AI Transportation and Logistics
Harnessing graph-native algorithms to enhance machine learning: A primer
Brandy Freitas (Pitney Bowes)
In this session, Brandy Freitas from Pitney Bowes will cover the interplay between graph analytics and machine learning, improved feature engineering with graph native algorithms, and harnessing the power of graph structure for machine learning through node embedding.
11:20am-12:00pm (40m) Data Engineering and Architecture Data Management and Storage, Streaming and IoT
Performant time series data management and analytics with Postgres
Michael Freedman (TimescaleDB)
Leveraging polyglot solutions for your time-series data can lead to a variety of issues including engineering complexity, operational challenges, and even referential integrity concerns. By re-engineering Postgres to serve as a general data platform, your high-volume time-series workloads will be better streamlined, resulting in more actionable data and greater ease of use.
1:15pm-1:55pm (40m) Data Engineering and Architecture Data Management and Storage, Deep dive into specific tools, platforms, or frameworks
Managing your Kafka in an explosive growth environment
Alon Gavra (AppsFlyer)
Kafka is often just a piece of the production stack that no one wants to touch, because it just works. At AppsFlyer, Kafka sits at the core of our infrastructure, which processes billions of events daily.
2:05pm-2:45pm (40m) Data Engineering and Architecture, Streaming and IoT Data Integration and Data Processing, Data, Analytics, and AI Architecture, Retail and e-commerce, Streaming and IoT
Posttransaction processing using Apache Pulsar at Narvar
Karthik Ramasamy (Streamlio), Anand Madhavan (Narvar)
Narvar provides a next-generation posttransaction experience for more than 500 retailers. This talk explores how Narvar moved away from using a slew of technologies for its platform and consolidated its use cases on Apache Pulsar.
3:45pm-4:25pm (40m) Data Engineering and Architecture, Streaming and IoT Data Integration and Data Processing, Data, Analytics, and AI Architecture, Streaming and IoT, Telecom
SK Telecom's 5G network monitoring and 3D visualization on streaming technologies
Jonghyok Lee (SK Telecom), Chon Yong Lee (SK Telecom)
Architecture and lessons learned from development of T-CORE, SK Telecom’s monitoring and service analytics platform, which collects system and application data from several thousand servers and applications and provides 3D visualized real-time status of the whole network and services for the operators and analytics platform for data scientists, engineers and developers.
4:35pm-5:15pm (40m) Data Engineering and Architecture Data quality, data governance and data lineage, Retail and e-commerce
The why and how of data lineage
Neelesh Salian (Stitch Fix)
It is important to understand why Data Lineage is needed for an organization. Once the purpose is defined, we can talk about how to go about building such a system.
11:20am-12:00pm (40m) Data Engineering and Architecture, Streaming and IoT Streaming and IoT, Temporal data and time-series analytics
Online machine learning in streaming applications
Stavros Kontopoulos (Lightbend), Debasish Ghosh (Lightbend)
In this talk, we discuss online machine learning algorithm choices for streaming applications. We motivate the discussion with resource-constrained use cases like IoT and personalization. We cover Hoeffding Adaptive Trees, classic sketch data structures, and drift detection algorithms, all the way from implementation to production deployment, describing the pros and cons of using each of them.
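The Hoeffding trees mentioned above rely on the Hoeffding bound to decide when enough stream examples have been seen to commit to a split. The sketch below computes that bound and applies the usual split test; the function names and the example numbers are hypothetical and are not drawn from the talk.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Return epsilon such that, with probability 1 - delta, the true mean of a
    variable with the given range lies within epsilon of the mean of n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Split once the best attribute beats the runner-up by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# Hypothetical information gains observed at a leaf after 1,000 examples.
print(hoeffding_bound(1.0, 1e-7, 1000))
print(should_split(0.30, 0.15, 1.0, 1e-7, 1000))
print(should_split(0.30, 0.25, 1.0, 1e-7, 1000))
```

Because the bound shrinks as the example count grows, a leaf that cannot yet separate two close candidates simply waits for more data, which keeps memory use bounded on a stream.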
1:15pm-1:55pm (40m) Data Engineering and Architecture Model Development, Governance, Operations
Problems taking AI to production and how to fix them
Jim Scott (NVIDIA)
Data scientists are creating and testing hundreds or thousands more models than in the past. Models require support from both real-time and static data sources. As data becomes enriched, and parameters tuned and explored, there is a need for versioning everything, including the data. We will discuss the very specific problems and approaches to fix them.
2:05pm-2:45pm (40m) Automation in data science and data, Data Engineering and Architecture Model Development, Governance, Operations
The New SDLC: CI/CD in the age of machine learning
Diego Oppenheimer (Algorithmia)
Machine Learning (ML) will fundamentally change the way we build and maintain applications. How can we adapt our infrastructure, operations, staffing, and training to meet the challenges of the new Software Development Life Cycle (SDLC) without throwing away everything that already works?
3:45pm-4:25pm (40m) Automation in data science and data, Data Engineering and Architecture Cloud Platforms and SaaS, Deep dive into specific tools, platforms, or frameworks, Model Development, Governance, Operations
MLOps: Applying DevOps practices to machine learning workloads
Sireesha Muppala (Amazon Web Services), Shelbee Eigenbrode (Amazon Web Services), Randall DeFauw (Amazon Web Services)
As an increasing level of automation is becoming available to data science, there is a balance between automation and quality that needs to be maintained. Applying DevOps practices to machine learning workloads not only brings models to the market faster but also maintains the quality and integrity of those models. This presentation will focus on applying DevOps practices to ML workloads.
11:20am-12:00pm (40m) Data Engineering and Architecture Cloud Platforms and SaaS, Data, Analytics, and AI Architecture, Media and Advertising
Your cloud, your ML, but more and more scale? How SurveyMonkey did it
Jing Huang (SurveyMonkey), Jessica Mong (SurveyMonkey)
You are a SaaS company that operates on a cloud infra prior to the ML era. How do you successfully extend your existing infrastructure to leverage the power of ML? In this case study, you will learn critical lessons from SurveyMonkey’s journey of expanding its ML capabilities with its rich data repo and hybrid cloud infrastructure.
1:15pm-1:55pm (40m) Data Engineering and Architecture Deep dive into specific tools, platforms, or frameworks, Transportation and Logistics
How to performance-tune Spark applications in large clusters
Omkar Joshi (Uber Technologies), Bo Yang (uber inc)
Omkar Joshi and Bo Yang offer an overview of how Uber’s ingestion (Marmaray) and observability team improved the performance of Apache Spark applications running on thousands of cluster machines and across 100,000+ applications, and how they methodically tackled these issues. They will also cover how they used Uber’s open source jvm-profiler for debugging issues at scale.
2:05pm-2:45pm (40m) Data Engineering and Architecture Data Integration and Data Processing, Data Management and Storage, Data, Analytics, and AI Architecture, Transportation and Logistics
Creating an extensible 100+ PB real-time big data platform by unifying storage and serving
Reza Shiftehfar (Uber Technologies)
Building a reliable big data platform is extremely challenging when it has to store and serve hundreds of petabytes of data in real time. This talk reflects on the challenges faced and proposes architectural solutions to scale a big data platform to ingest, store, and serve 100+ PB of data with minute-level latency while efficiently utilizing the hardware and meeting security needs.
3:45pm-4:25pm (40m) Data Engineering and Architecture Cloud Platforms and SaaS, Data Management and Storage, Data, Analytics, and AI Architecture, Financial Services
Enabling big data and AI workloads on the object store at DBS Bank
Vitaliy Baklikov (Development Bank of Singapore), Dipti Borkar (Alluxio)
In this presentation, Vitaliy Baklikov from DBS Bank and Dipti Borkar from Alluxio will share how DBS Bank has built a modern big data analytics stack leveraging an object store even for data-intensive workloads like ATM forecasting and how it uses Alluxio to orchestrate data locality and data access for Spark workloads.
4:35pm-5:15pm (40m) Data Engineering and Architecture Data, Analytics, and AI Architecture
Bridging the gap between big data computing and high-performance computing
Supun Kamburugamuve (Indiana University)
Big data computing and high-performance computing (HPC) have evolved over the years as separate paradigms. With the explosion of data and the demand for machine learning algorithms, these two paradigms are increasingly embracing each other for data management and algorithms. Supun Kamburugamuve explores the possibilities and tools available for getting the best of HPC and big data.
11:20am-12:00pm (40m) Data Engineering and Architecture Data Integration and Data Processing
Using Spark for crunching astronomical data on the LSST scale
Petar Zecevic (SV Group d.o.o.)
The Large Synoptic Survey Telescope (LSST) is one of the most important future surveys. Its unique design will allow it to cover large regions of the sky and obtain images of the faintest objects. Over its 10 years of operation, it will produce about 80 PB of data, both in images and catalog data. I will present AXS, a system we built for fast processing and cross-matching of survey catalog data.
1:15pm-1:55pm (40m) Data Engineering and Architecture Cloud Platforms and SaaS, Data, Analytics, and AI Architecture
The Hitchhiker’s Guide to the Cloud - Architecting for the Cloud through Customer Stories
Jason Wang (Cloudera), Sushant Rao (Cloudera)
We’ll give you an actionable understanding of cloud architecture and the different approaches customers took in their journey to the cloud. We start with the different ways we’ve seen customers succeed in the cloud. Then we dive deep into the decisions they made and how those decisions drove their cloud architecture. Along the way we review problems they overcame, lessons learned, and core cloud paradigms.
2:05pm-2:45pm (40m) Data Engineering and Architecture Data Integration and Data Processing, Data quality, data governance and data lineage
Fuzzy matching and deduplicating data: Techniques for advanced data prep
Nikki Rouda (Amazon Web Services), Roy Hasson (Amazon Web Services)
Learn how to deduplicate or link records in a dataset, even when the records don’t have a common unique identifier and no fields match exactly. Link customer records across different databases (e.g., different name spellings or addresses). Match external product lists against your own catalog, such as lists of hazardous goods. Solve tough challenges to prepare and cleanse data for analysis.
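To make the idea of linking records without an exact key concrete, the sketch below scores string similarity with Python's standard-library `difflib.SequenceMatcher` after light normalisation. This is a generic baseline, not the approach presented in the session; the helper names, the 0.85 threshold, and the sample records are all assumptions.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, drop commas and periods, and collapse whitespace."""
    cleaned = text.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalised strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left, right, threshold=0.85):
    """Pair each left record with its best right match, if it is close enough."""
    links = []
    for record in left:
        best = max(right, key=lambda candidate: similarity(record, candidate))
        score = similarity(record, best)
        if score >= threshold:
            links.append((record, best, score))
    return links


# Hypothetical customer names from two databases.
crm = ["John Smith", "Maria Garcia"]
billing = ["Jon Smith", "Alice Wong", "maria garcia."]
for left_name, right_name, score in link_records(crm, billing):
    print(left_name, "->", right_name, round(score, 3))
```

Comparing every pair costs time proportional to the product of the two list sizes, so larger datasets typically add a blocking step that only compares records sharing a coarse key.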
3:45pm-4:25pm (40m) Data Engineering and Architecture Data, Analytics, and AI Architecture
Lessons learned from scaling the tech stack of a modern analytics platform
Tom O'Neill (Sisense)
CCO Tom O’Neill will discuss lessons learned from scaling up Periscope Data to support incredibly large volumes of data and queries from its 2,000+ teams as part of Sisense. He’ll highlight the process of migrating from Heroku to Kubernetes and discovering new ways to leverage its power, plus other developments that have allowed users to delve deeper into new data science and ML analysis.
4:35pm-5:15pm (40m) Data Engineering and Architecture Culture and Organization, Financial Services, Model Development, Governance, Operations
Scaling data engineers
Evgeny Vinogradov (Yandex.Money)
In a microservice architecture, the data warehouse (DWH) is the first place where all the data comes together. It is fed by many different data sources and used for many purposes, from near-OLTP workloads to model fitting and real-time classification. This talk covers our experience managing and scaling a data engineering team and infrastructure to support 20+ product teams.
11:20am-12:00pm (40m) Data Engineering and Architecture Cloud Platforms and SaaS, Data Management and Storage
Where's my lookup table? Modeling relational data in a denormalized world
Rick Houlihan (Amazon Web Services)
Data has always been relational, and it always will be. NoSQL databases are gaining in popularity, but that does not change the fact that the data they manage is still relational, it just changes how we have to model the data. This session dives deep into how real Entity Relationship Models can be efficiently modeled in a denormalized manner using schema examples from real application services.
1:15pm-1:55pm (40m) Business Analytics and Visualization, Data Engineering and Architecture BI, Interactive Analytics and Visualization
Intelligent design patterns for cloud-based analytics and BI
Shant Hovsepian (Arcadia Data)
With cloud object storage (e.g., S3, ADLS), one expects business intelligence (BI) applications to benefit from the scale of data and real-time analytics. However, traditional BI in the cloud surfaces non-obvious challenges. This talk reviews service-oriented cloud design (storage, compute, catalog, security, SQL) and shows how native cloud BI provides analytic depth, low cost, and performance.
2:05pm-2:45pm (40m) Business Analytics and Visualization, Data Engineering and Architecture BI, Interactive Analytics and Visualization, Cloud Platforms and SaaS, Data Management and Storage
Building a best-in-class data lake on AWS and Azure
Tomer Shiran (Dremio), Jacques Nadeau (Dremio)
Data lakes have become a key ingredient in the data architecture of most companies. In the cloud, object storage systems such as S3 and ADLS make it easier than ever to operate a data lake. In this talk we describe how companies can build best-in-class data lakes in the cloud, leveraging open source technologies and the cloud's elasticity to run and optimize various workloads simultaneously.
3:45pm-4:25pm (40m) Data Engineering and Architecture, Security and Privacy Deep dive into specific tools, platforms, or frameworks, Privacy and Security
Protect your private data in your Hadoop clusters with ORC column encryption
Owen O'Malley (Cloudera)
Fine-grained data protection at a column level in data lake environments has become a mandatory requirement to demonstrate compliance with multiple local and international regulations across many industries today. This talk describes how column encryption in ORC files enables both fine grain protection and audits of who accessed the private data.
4:35pm-5:15pm (40m) Data Engineering and Architecture
Using Spark to speed up the diagnosis performance for big data applications
Ruixin Xu (Microsoft), Long Tian (Microsoft), Yu Zhou (Microsoft)
Microsoft’s big data team ran an experiment using Spark and Jupyter notebooks as a replacement for existing IDE-based diagnosis tools for internal DevOps. The results indicate that the Spark-based solution significantly improves diagnosis performance, especially for complex jobs with large profiles, and that Jupyter notebooks bring the benefits of fast iteration and easy knowledge sharing.
11:20am-12:00pm (40m) Culture and organization, Strata Business Summit Culture and Organization
Executive Briefing: Creating a center for data science from scratch—Lessons from nonprofit research
Gayle Bieler (RTI International)
This presentation is about building a thriving Center for Data Science within a large and well-respected nonprofit research institute. I'll discuss my transformation from an entrepreneurial statistician to data science leader, as well as some of our most impactful projects and best adventures to date: solving important national problems, improving our local communities, and transforming research.
1:15pm-1:55pm (40m) Executive Briefing and best practices, Strata Business Summit Culture and Organization, Ethics
Executive Briefing: Lessons from the front lines—Building a Responsible AI/ML Program in the enterprise
Michael Kubiske (Capital One)
This talk will explore some of the philosophy around the concept of explaining a model, given that the colloquial definition is partially recursive. It will cover the lens that banking regulation places on this philosophical basis and expand into techniques used for these well-governed aspects.
2:05pm-2:45pm (40m)
Session
3:45pm-4:25pm (40m) Culture and organization, Strata Business Summit Culture and Organization, Transportation and Logistics
Executive Briefing: Building a culture of self-service, from predeployment to continued engagement
Jonathan Tudor (GE Aviation), Ross Schalmo (GE Aviation)
GE Aviation has made it a mission to implement Self-Service Data. To ensure success beyond the initial implementation of tools, the Data Engineering and Analytics teams at GE Aviation created initiatives designed to foster engagement, from an ongoing partnership with each part of the business, to the gamification of tagging data in a data catalog, to forming a Published Dataset Council.
4:35pm-5:15pm (40m) Executive Briefing and best practices, Strata Business Summit Data, Analytics, and AI Architecture, Streaming and IoT
Executive Briefing: What it takes to use machine learning in fast data pipelines
Dean Wampler (Lightbend)
Join me for a discussion of the following problems and their solutions: 1. How (and why) to integrate ML into production streaming data pipelines, to serve results quickly? 2. How to bridge data science and production environments, with different tools, techniques, and requirements? 3. How to build reliable and scalable, long-running services? 4. How to update ML models without downtime?
11:20am-12:00pm (40m)
Session
1:15pm-1:55pm (40m) Culture and organization, Strata Business Summit Culture and Organization
An in-depth look at the data science career: Defining roles, assessing skills
Usama Fayyad (Open Insights & OODA Health, Inc.), Hamit Hamutcu (Analytics Center)
Ever confused about what it takes to be a data scientist? Or curious about how companies recruit, train, and manage analytics resources? This presentation covers insights from the most comprehensive research effort to date on the data analytics profession and proposes a framework for standardizing roles in the industry, along with methods for assessing skills.
2:05pm-2:45pm (40m) Case studies, Strata Business Summit BI, Interactive Analytics and Visualization, Telecom
T-Mobile's journey to turn crowdsourced big data into actionable insights
Alex Yoon (T-Mobile)
T-Mobile successfully improved the quality of voice calling by analyzing crowdsourced big data from mobile devices. In this session, you will learn how engineers from multiple backgrounds collaborated to achieve a 10% improvement in voice quality and why the analysis of big data was the key to bringing better voice call quality to millions of end users.
3:45pm-4:25pm (40m) Case studies, Strata Business Summit Text and Language processing and analysis, Transportation and Logistics
Migrating millions of users from voice- and email-based customer support to a chatbot
Madhu Gopinathan (MakeMyTrip), Sanjay Mohan (MakeMyTrip)
At MakeMyTrip, India’s leading online travel platform, customers were using voice or email to contact agents for post-sale support. To improve agent efficiency and the customer experience, MakeMyTrip developed a chatbot, Myra, using some of the latest advances in deep learning. In this talk, we will discuss the high-level architecture and the business impact created by Myra.
11:20am-12:00pm (40m) Data Science, Machine Learning, & AI
How Deutsche Bank industrialized AI and machine learning
John Allen (Deutsche Bank)
As an early adopter of data science, machine learning, and AI, Deutsche Bank's analytics function is trailblazing new ways to drive revenues, lower costs, and reduce risk across all areas of the group. John Allen shares how his team combines commercial offerings with open source technologies to revolutionize legacy processes and transform the way the bank uses technology to drive innovation.
1:15pm-1:55pm (40m)
Session
2:05pm-2:45pm (40m) Business Analytics and Visualization, Strata Business Summit BI, Interactive Analytics and Visualization, Media and Advertising, Temporal data and time-series analytics
ThirdEye: LinkedIn’s business-wide monitoring platform
Akshay Rai (LinkedIn)
Failures or issues in a product or service can negatively affect the business. Detecting issues in advance and recovering from them is crucial to keeping the business alive. Join us to learn more about LinkedIn's next-generation open-source monitoring platform, an integrated solution for real-time alerting and collaborative analysis.
3:45pm-4:25pm (40m) Law and Ethics, Strata Business Summit BI, Interactive Analytics and Visualization, Ethics
Purposefully designing technology for civic engagement
Audrey Lobo-Pulo (Phoensight), Annette Hester (National Energy Board, Canada), Ryan Hum (National Energy Board, Canada)
As new digital platforms emerge and governments look at new ways to engage with citizens, there is an increasing awareness of the role these platforms play in shaping public participation and democracy. This talk examines the design attributes of civic engagement technologies and their ensuing impacts. A framework for better achieving desired outcomes is demonstrated with an NEB Canada case study.
4:35pm-5:15pm (40m) Executive Briefing and best practices, Strata Business Summit Privacy and Security
Executive Briefing: Big data in the era of heavy worldwide privacy regulations
Mark Donsky (Okera)
California is following the EU's GDPR with the California Consumer Privacy Act (CCPA) in 2020. There are penalties for noncompliance, but many companies aren't prepared for this strict regulation. This session will explore the capabilities your data environment needs in order to simplify compliance with CCPA, GDPR, and other regulations.
2:05pm-2:45pm (40m) Sponsored
Powering the future with data intelligence (sponsored by Collibra)
Jim Cushman (Collibra), Piyus Jain (Progressive)
Transforming data into a trusted business asset that informs decision-making requires giving teams access to a powerful platform that makes it easy to harness data across the enterprise. In this session, you'll hear how Progressive uses Collibra to transform the way data is managed and used across the organization, driving real business value.
1:15pm-1:55pm (40m) Sponsored
Migrating Hadoop analytics to Spark in the cloud without disruption (sponsored by WANdisco)
Paul Scott-Murphy (WANdisco)
What you’ll learn:
* The options that exist for cloud migration, their advantages and disadvantages
* What cloud vendors do and don't offer to support large-scale migration
* The business risks associated with large-scale cloud migration
* How to migrate analytics data at scale for immediate use in Spark without disrupting on-premises operations
11:20am-12:00pm (40m) Sponsored
Organizing the chaos of healthcare with smart data discovery (sponsored by Io-Tahoe)
Charles Boicey (Clearsense)
Healthcare’s reliance on comprehensible data is critical to the mission of providing optimal and affordable care. Learn how the application of technology, such as machine learning, is paramount to the modernisation of healthcare that provides its professionals with fully integrated and complete medical records.
1:15pm-1:55pm (40m) Sponsored
The end of applications - How data collaboration is changing everything (sponsored by Cinchy)
Dan DeMers (Cinchy)
After 40 years of apps, enterprise companies are now realizing that building or buying an application for every use case has become a major threat to their ability to leverage and protect their core data assets. Join Cinchy CEO Dan DeMers for this live demo of Cinchy, the world’s first Data Collaboration Platform.
8:45am-8:55am (10m)
Thursday keynotes
Ben Lorica (O'Reilly Media), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
Program chairs Ben Lorica, Doug Cutting, and Alistair Croll welcome you to the second day of keynotes.
8:55am-9:10am (15m)
Staying safe in the AI era
Cassie Kozyrkov (Google)
Machine learning and artificial intelligence are no longer science fiction, but what does it take to harness their potential effectively, responsibly, and reliably? Based on lessons learned at Google, this talk will offer actionable advice to help you find opportunities to take advantage of machine learning, navigate the AI era, and stay safe as you innovate.
9:20am-9:30am (10m)
Cloudera keynote
Details to come.
9:35am-9:55am (20m)
AI is not magic. It’s computer science.
Robert Thomas (IBM), Tim O'Reilly (O'Reilly Media)
AI has the potential to add $16 trillion to the global economy by 2030, but adoption has been slow. While we understand the power of AI, many of us aren’t sure how to fully unleash its potential. The reality is: AI is not magic. It’s hard work.
10:00am-10:05am (5m)
Strata Data Awards: Winners announced
The Strata Data Awards recognize the most innovative startups, leaders, and data science projects from Strata sponsors and exhibitors around the world. Join us during keynotes for the announcement of the winners.
10:10am-10:25am (15m)
Say what? The ethical challenges of designing for humanlike interaction
Jonathan Foster (Microsoft)
Language shapes our thinking, our relationships, our sense of self. Conversation connects us in powerful, intimate, and often unconscious ways. Jonathan Foster explains why, as we design for natural language interactions and more humanlike digital experiences, language—as design material, conversation, and design canvas—reveals ethical challenges we couldn't encounter with GUI-powered experiences.
10:25am-10:30am (5m) Sponsored
Scoring your business in the AI matrix (sponsored by Dataiku)
Jed Dougherty (Dataiku)
One of the more common and fairly widely accepted definitions is that AI means going beyond simple statistics to mimic human skills in perception, learning, interaction, and decision making. But even this definition leaves some room for interpretation. Jed Dougherty breaks down the different parts of that definition and how they might manifest themselves in data science projects.
10:30am-10:50am (20m)
Data sonification: Making music from the yield curve
Alan Smith (Financial Times)
Based on a critical evaluation of the iconic yield curve chart, this talk argues that combining visualisation (data to pixels) with sonification (data to pitch) offers the potential not only to improve aesthetic multimedia experiences but also to take the presentation of data into the rapidly expanding universe of screenless devices and products.
10:50am-11:20am (30m)
Break: Morning break sponsored by Cisco
12:00pm-1:15pm (1h 15m)
Break
12:00pm-1:15pm (1h 15m)
Thursday Topic Tables at Lunch (sponsored by IBM)
Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics.
2:45pm-3:45pm (1h)
Break: Afternoon break sponsored by Io-Tahoe
8:15am-8:45am (30m)
Speed Networking
Gather before keynotes on Thursday morning to enjoy casual conversation while meeting fellow attendees. If one of your goals at Strata is to meet new people, this session will jumpstart your networking with other attendees.
12:00pm-1:15pm (1h 15m)
Thursday Business Summit Lunch
Join Strata Business Summit speakers and attendees for a networking lunch on Thursday.
