Sep 23–26, 2019
 
3B - Expo Hall
2:05pm Machine learning for streaming data: Practical insights Heitor Murilo Gomes (Télécom ParisTech), Albert Bifet (Télécom ParisTech)
1A 06/07
1:15pm Scaling Apache Spark at Facebook Sameer Agarwal (Facebook), Ankit Agarwal (Facebook Inc.)
4:35pm Deep learning technologies for giant hogweed eradication Naoto Umemori (NTT DATA), Masaru Dobashi (NTT DATA)
1A 08/10
1:15pm Handling data gaps in time series using imputation Alfred Whitehead (Klick), Clare Jeon (Klick)
2:05pm When Holt-Winters is better than machine learning Anais Dotis (InfluxData)
4:35pm Scalable anomaly detection with Spark and SOS Jeroen Janssens (Data Science Workshops)
1A 12/14
11:20am A practical guide to algorithmic bias and explainability in machine learning Alejandro Saucedo (The Institute for Ethical AI & Machine Learning)
3:45pm An introduction to machine learning on graphs David Mack (Octavian)
1A 15/16
11:20am Your cloud, your ML, but more and more scale? How SurveyMonkey did it Jing Huang (SurveyMonkey), Jessica Mong (SurveyMonkey)
2:05pm Posttransaction processing using Apache Pulsar at Narvar Davor Bonaci (Kaskada), Anand Madhavan (Narvar)
3:45pm SK Telecom's 5G network monitoring and 3D visualization on streaming technologies Jonghyok Lee (SK Telecom), Chon Yong Lee (SK Telecom)
1A 21/22
11:20am Online machine learning in streaming applications Stavros Kontopoulos (Lightbend), Debasish Ghosh (Lightbend)
2:05pm The new SDLC: CI/CD in the age of machine learning Diego Oppenheimer (Algorithmia)
3:45pm ML ops: Applying DevOps practices to machine learning workloads Sireesha Muppala (Amazon Web Services), Shelbee Eigenbrode (Amazon Web Services), Randall DeFauw (Amazon Web Services)
1A 23/24
11:20am Performant time series data management and analytics with PostgreSQL Michael Freedman (TimescaleDB | Princeton University)
1:15pm How to performance-tune Spark applications in large clusters Omkar Joshi (Uber), Bo Yang (Uber)
3:45pm Enabling big data and AI workloads on the object store at DBS Bank Vitaliy Baklikov (DBS Bank), Dipti Borkar (Alluxio)
4:35pm Bridging the gap between big data computing and high-performance computing Supun Kamburugamuve (Indiana University)
1E 07/08
2:05pm Fuzzy matching and deduplicating data: Techniques for advanced data prep Nikki Rouda (Amazon Web Services), Janisha Anand (Amazon Web Services)
4:35pm Spark on Kubernetes for data science Jordan Volz (Dataiku)
1E 09
1:15pm Intelligent design patterns for cloud-based analytics and BI Shant Hovsepian (Arcadia Data)
2:05pm Securing your cloud data lake with a "defense in depth" approach Tomer Shiran (Dremio), Jacques Nadeau (Dremio)
4:35pm Using Spark to speed up the diagnosis performance for big data applications Ruixin Xu (Microsoft), Long Tian (Microsoft), Yu Zhou (Microsoft)
1E 10/11
2:05pm Executive Briefing: Unpacking AutoML Paco Nathan (derwen.ai)
3:45pm Executive Briefing: Building a culture of self-service from predeployment to continued engagement Jonathan Tudor (GE Aviation), Ross Schalmo (GE Aviation)
1E 12/13
1:15pm An in-depth look at the data science career: Defining roles, assessing skills Usama Fayyad (Open Insights & OODA Health, Inc.), Hamit Hamutcu (Analytics Center)
3:45pm Migrating millions of users from voice- and email-based customer support to a chatbot Madhu Gopinathan (MakeMyTrip), Sanjay Mohan (MakeMyTrip)
4:35pm Combining creativity and analytics David Boyle (Audience Strategies)
1E 14
3:45pm Purposefully designing technology for civic engagement Audrey Lobo-Pulo (Phoensight), Annette Hester (National Energy Board, Canada)
1A 01/02
2:05pm Powering the future with data intelligence (sponsored by Collibra) Jim Cushman (Collibra), Piyush Jain (Progressive)
1A 03
2:05pm Stream processing beyond streaming data Stephan Ewen (Ververica)
3E
8:45am Thursday keynotes Ben Lorica (O'Reilly), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
8:55am Staying safe in the AI era Cassie Kozyrkov (Google)
9:25am Delivering the enterprise data cloud Arun Murthy (Cloudera)
10:20am Data sonification: Making music from the yield curve Alan Smith (Financial Times)
10:40am Closing remarks Ben Lorica (O'Reilly), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
10:50am Morning break sponsored by Cisco | Room: Expo Hall - 3B
12:00pm Break | Room: Expo Hall - 3B
12:30pm Why AI fails: Overcoming AI challenges (sponsored by IBM) | Room: 3B - Expo Hall Brittany Bogle (IBM)
2:45pm Afternoon break sponsored by Io-Tahoe | Room: Expo Hall - 3B
8:00am Speed Networking | Room: Keynote Foyer
8:30am Early morning coffee (8:00am - 8:45am) | Room: Keynote Foyer
12:00pm Thursday Business Summit Lunch | Room: Expo Hall - 3D
11:20am-12:00pm (40m) Data Science, Machine Learning, & AI, Expo Hall Culture and Organization, Retail and e-commerce, Transportation and Logistics
ML is not enough: Decision automation in the real world
Brian Keng (Rubikloud)
Automating decisions requires a system to consider more than just a data-driven prediction. Real-world decisions require additional constraints and fuzzy objectives to ensure they're robust and consistent with business goals. Brian Keng takes a deep dive into how to leverage modern machine learning methods and traditional mathematical optimization techniques for decision automation.
1:15pm-1:55pm (40m) Data Science, Machine Learning, & AI, Expo Hall Deep dive into specific tools, platforms, or frameworks, Deep Learning
Handtrack.js: Building gesture-based interactions in the browser using TensorFlow
Victor Dibia (Cloudera Fast Forward Labs)
Recent advances in machine learning frameworks for the browser, such as TensorFlow.js, provide the opportunity to craft truly novel experiences within frontend applications. Victor Dibia explores the state of the art for machine learning in the browser using TensorFlow.js and outlines its use in the design of Handtrack.js, a library for prototyping real-time hand detection in the browser.
2:05pm-2:45pm (40m) Data Science, Machine Learning, & AI, Expo Hall Streaming and IoT, Telecom, Temporal data and time-series analytics
Machine learning for streaming data: Practical insights
Heitor Murilo Gomes (Télécom ParisTech), Albert Bifet (Télécom ParisTech)
Heitor Murilo Gomes and Albert Bifet introduce you to a machine learning pipeline for streaming data using the streamDM framework. You'll also learn how to use streamDM for supervised and unsupervised learning tasks, see examples of online preprocessing methods, and discover how to expand the framework by adding new learning algorithms or preprocessing methods.
11:20am-12:00pm (40m) Data Science, Machine Learning, & AI
Getting to know the elephant: Real-time debugging and visualization for deep learning
Shital Shah (Microsoft Research)
Taming massive deep learning models, data, and training times requires a new way of thinking. Shital Shah explores new tools and methods to better understand AI. Explaining the decisions made by AI not only helps us accelerate its development but also makes it safer and more trustworthy.
1:15pm-1:55pm (40m) Data Science, Machine Learning, & AI
Scaling Apache Spark at Facebook
Sameer Agarwal (Facebook), Ankit Agarwal (Facebook Inc.)
Apache Spark is the largest compute engine at Facebook by CPU. Sameer Agarwal dives into the story of how Facebook optimized, tuned, and scaled Apache Spark to run on clusters of tens of thousands of machines, processing hundreds of petabytes of data and serving thousands of data scientists, engineers, and product analysts every day.
2:05pm-2:45pm (40m) Data Science, Machine Learning, & AI Deep Learning, Streaming and IoT
Learning asset naming patterns to find risky unmanaged devices
Ryan Foltz (Exabeam)
Unmanaged and foreign devices on corporate networks pose a security risk, and the first step toward reducing this risk is being able to identify them. Ryan Foltz walks you through a comprehensive deep learning model for device management that performs anomaly detection using only device names, flagging devices that don't follow naming structures.
3:45pm-4:25pm (40m) Data Science, Machine Learning, & AI Deep Learning
Deep learning on Apache Spark at CERN’s Large Hadron Collider with Analytics Zoo
Sajan Govindan (Intel)
Sajan Govindan outlines CERN’s research on deep learning in high-energy physics experiments as an alternative to customized rule-based methods, with an example of topology classification to improve real-time event selection at the Large Hadron Collider. CERN uses deep learning pipelines on Apache Spark using BigDL and Analytics Zoo open source software on Intel Xeon-based clusters.
4:35pm-5:15pm (40m) Data Science, Machine Learning, & AI Deep Learning
Deep learning technologies for giant hogweed eradication
Naoto Umemori (NTT DATA), Masaru Dobashi (NTT DATA)
Giant hogweed is a highly toxic plant. Naoto Umemori and Masaru Dobashi aim to automate the process of detecting the plant with technologies like drones and image recognition and detection using machine learning. You'll see how they designed the architecture and took advantage of big data, machine learning, and deep learning technologies (e.g., Hadoop, Spark, and TensorFlow), and you'll hear the lessons they learned.
11:20am-12:00pm (40m) Data Science, Machine Learning, & AI Financial Services, Temporal data and time-series analytics
Working with time series: Denoising and imputation frameworks to improve data density
Anjali Samani (CircleUp)
The application of smoothing and imputation strategies is common practice in predictive modeling and time series analysis. With a technique-agnostic approach, Anjali Samani provides qualitative and quantitative frameworks that address questions related to smoothing and imputation of missing values to improve data density.
1:15pm-1:55pm (40m) Data Science, Machine Learning, & AI Temporal data and time-series analytics
Handling data gaps in time series using imputation
Alfred Whitehead (Klick), Clare Jeon (Klick)
Time series forecasts depend on sensors or measurements made in the real, messy world. The sensors flake out, get turned off, disconnect, and otherwise conspire to cause missing signals. Signals that may tell you what tomorrow's temperature will be or what your blood glucose levels are before bed. Alfred Whitehead and Clare Jeon explore methods for handling data gaps and when to consider which.
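To illustrate what "handling data gaps" can mean in practice, here is a minimal sketch of two common baselines, last-observation-carried-forward and linear interpolation, in plain Python. These are generic techniques chosen for illustration, not necessarily the methods the speakers present; missing readings are assumed to be encoded as `None`, and the function names are hypothetical.

```python
from typing import List, Optional


def forward_fill(values: List[Optional[float]]) -> List[Optional[float]]:
    """Replace each missing reading (None) with the last observed value.

    Gaps before the first observation stay None.
    """
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def linear_interpolate(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior gaps with a straight line between the nearest observed
    neighbours; gaps at either end are left as None."""
    result = list(values)  # copy so the input is not mutated
    observed = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(observed, observed[1:]):
        gap = right - left
        if gap > 1:
            lo, hi = values[left], values[right]  # both observed by construction
            for k in range(1, gap):
                result[left + k] = lo + (hi - lo) * k / gap
    return result


readings = [5.0, None, None, 8.0, None, 10.0]
print(forward_fill(readings))        # [5.0, 5.0, 5.0, 8.0, 8.0, 10.0]
print(linear_interpolate(readings))  # [5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
```

Forward fill suits signals that hold their value between samples (a switched-off sensor); interpolation assumes the signal changed smoothly across the gap, which can be badly wrong for long outages.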
2:05pm-2:45pm (40m) Data Science, Machine Learning, & AI Temporal data and time-series analytics
When Holt-Winters is better than machine learning
Anais Dotis (InfluxData)
Machine learning (ML) gets a lot of hype, but its classical predecessors are still immensely powerful, especially in the time series space, where classical algorithms can outperform machine learning methods for forecasting. Anais Dotis dives into how she used the Holt-Winters forecasting algorithm to predict water levels in a creek.
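For readers unfamiliar with Holt-Winters, a minimal sketch of the textbook additive form (triple exponential smoothing: level, trend, and seasonal components) in plain Python. The simple two-season initialization and the hand-picked smoothing parameters are assumptions for illustration; this is not the speaker's code, and in practice `alpha`, `beta`, and `gamma` would be fitted to the data.

```python
from typing import List


def holt_winters_additive(series: List[float], season_length: int, alpha: float,
                          beta: float, gamma: float, horizon: int) -> List[float]:
    """Additive Holt-Winters forecast for the next `horizon` steps."""
    m = season_length
    if len(series) < 2 * m:
        raise ValueError("need at least two full seasons to initialise")

    # Initial level: mean of the first season.
    level = sum(series[:m]) / m
    # Initial trend: average per-step change between the first two seasons.
    trend = (sum(series[m:2 * m]) / m - level) / m
    # Initial seasonal offsets, kept in a circular buffer indexed by t % m.
    seasonals = [y - level for y in series[:m]]

    for t in range(m, len(series)):
        y = series[t]
        s_prev = seasonals[t % m]  # seasonal component from one season ago
        last_level = level
        level = alpha * (y - s_prev) + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
        seasonals[t % m] = gamma * (y - level) + (1 - gamma) * s_prev

    n = len(series)
    # h-step forecast: extrapolate the trend and reuse the latest seasonal value
    # for the matching phase of the cycle.
    return [level + h * trend + seasonals[(n + h - 1) % m] for h in range(1, horizon + 1)]


print(holt_winters_additive([1, 3, 1, 3, 1, 3, 1, 3], season_length=2,
                            alpha=0.5, beta=0.5, gamma=0.5, horizon=4))
# [1.0, 3.0, 1.0, 3.0]
```

The model is cheap, interpretable, and needs little data, which is one reason a classical method can compete with ML models on strongly seasonal series such as water levels.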
3:45pm-4:25pm (40m) Data Science, Machine Learning, & AI Deep dive into specific tools, platforms, or frameworks
Soss: Lightweight probabilistic programming in Julia
Chad Scherrer (Metis)
Chad Scherrer explores the basic ideas in Soss, a new probabilistic programming library for Julia. Soss allows a high-level representation of the kinds of models often written in PyMC3 or Stan, and offers a way to programmatically specify and apply model transformations like approximations or reparameterizations.
4:35pm-5:15pm (40m) Data Science, Machine Learning, & AI Temporal data and time-series analytics
Scalable anomaly detection with Spark and SOS
Jeroen Janssens (Data Science Workshops)
Jeroen Janssens dives into stochastic outlier selection (SOS), an unsupervised algorithm for detecting anomalies in large, high-dimensional data. SOS has been implemented in Python, R, and, most recently, Spark. He illustrates the idea and intuition behind SOS, demonstrates its implementation on top of Spark, and applies it to a real-world use case.
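To make the idea behind stochastic outlier selection concrete, a minimal O(n²) pure-Python sketch of its general formulation: Gaussian affinities whose bandwidth is tuned per point to a target perplexity, row-normalized into binding probabilities, and an outlier probability per point equal to the product of (1 − binding) over all other points. Euclidean distance, the bisection tolerances, and the function names are assumptions for illustration; this is not the Spark implementation the talk presents.

```python
import math
from typing import List, Sequence


def _row_affinities(dist_sq: List[float], i: int, perplexity: float,
                    tol: float = 1e-5, max_iter: int = 100) -> List[float]:
    """Affinities exp(-beta_i * d_ij^2) from point i to all other points, with
    beta_i = 1 / (2 * sigma_i^2) found by bisection so that the entropy of the
    normalized row equals log(perplexity)."""
    target = math.log(perplexity)
    beta, lo, hi = 1.0, 0.0, math.inf
    aff: List[float] = []
    for _ in range(max_iter):
        aff = [0.0 if j == i else math.exp(-beta * d) for j, d in enumerate(dist_sq)]
        total = sum(aff)
        if total == 0.0:  # beta so large that every affinity underflowed
            hi = beta
            beta = (lo + hi) / 2
            continue
        # Entropy of the normalized row: H = log(Z) + beta * E[d^2].
        entropy = math.log(total) + beta * sum(a * d for a, d in zip(aff, dist_sq)) / total
        if abs(entropy - target) < tol:
            break
        if entropy > target:  # too spread out: sharpen the kernel
            lo = beta
            beta = beta * 2 if hi == math.inf else (lo + hi) / 2
        else:  # too concentrated: widen the kernel
            hi = beta
            beta = (lo + hi) / 2
    return aff


def sos(points: Sequence[Sequence[float]], perplexity: float) -> List[float]:
    """Return an outlier probability in [0, 1] for each point."""
    n = len(points)
    dist_sq = [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]
    outlier_prob = [1.0] * n
    for i in range(n):
        aff = _row_affinities(dist_sq[i], i, perplexity)
        total = sum(aff)
        if total == 0.0:  # bandwidth search failed; skip this row in the sketch
            continue
        for j in range(n):
            if j != i:
                # Binding probability b_ij: how strongly point i picks j as a neighbour.
                outlier_prob[j] *= 1.0 - aff[j] / total
    return outlier_prob


probs = sos([(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)], perplexity=2.0)
print(probs)  # the isolated point (10, 10) gets the highest probability
```

A point that no other point is willing to bind to keeps an outlier probability near 1; the perplexity controls roughly how many effective neighbours each point considers and must be smaller than the number of other points.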
11:20am-12:00pm (40m) Data Science, Machine Learning, & AI Ethics
A practical guide to algorithmic bias and explainability in machine learning
Alejandro Saucedo (The Institute for Ethical AI & Machine Learning)
Alejandro Saucedo demystifies AI explainability through a hands-on case study, where the objective is to automate a loan-approval process by building and evaluating a deep learning model. He introduces motivations through the practical risks that arise with undesired bias and black box models and shows you how to tackle these challenges using tools from the latest research and domain knowledge.
1:15pm-1:55pm (40m) Data Science, Machine Learning, & AI Text and Language processing and analysis
Data need not be a moat: Mixed formal learning enables zero- and low-shot learning
Sandra Carrico (GLYNT)
Sandra Carrico explains mixed formal learning and outlines a machine learning example that previously required large numbers of training examples and now learns from zero or a handful of them. She maps apparently idiosyncratic techniques to mixed formal learning, a general AI architecture that you can use in your projects.
2:05pm-2:45pm (40m) Automation in data science and data, Data Science, Machine Learning, & AI Data quality, data governance and data lineage, Media and Advertising, Model Development, Governance, Operations
Automating ML model training and deployments via metadata-driven data, infrastructure, feature engineering, and model management
Mumin Ransom (Comcast), Nick Pinckernell (Comcast)
Mumin Ransom gives an overview of the data management and privacy challenges around automating ML model (re)deployments and stream-based inferencing at scale.
3:45pm-4:25pm (40m) Data Science, Machine Learning, & AI Financial Services
An introduction to machine learning on graphs
David Mack (Octavian)
Graphs are a powerful way to represent knowledge. Organizations in fields such as biosciences and finance are starting to amass large knowledge graphs, but they lack the machine learning tools to extract insights from them. David Mack offers an overview of what insights are possible and surveys the most popular approaches.
11:20am-12:00pm (40m) Data Engineering and Architecture Cloud Platforms and SaaS, Data, Analytics, and AI Architecture, Media and Advertising
Your cloud, your ML, but more and more scale? How SurveyMonkey did it
Jing Huang (SurveyMonkey), Jessica Mong (SurveyMonkey)
You're a SaaS company operating on a cloud infrastructure prior to the machine learning (ML) era and you need to successfully extend your existing infrastructure to leverage the power of ML. Jing Huang and Jessica Mong detail a case study with critical lessons from SurveyMonkey’s journey of expanding its ML capabilities with its rich data repo and hybrid cloud infrastructure.
1:15pm-1:55pm (40m) Data Engineering and Architecture Data Management and Storage, Deep dive into specific tools, platforms, or frameworks
Managing your Kafka in an explosive growth environment
Alon Gavra (AppsFlyer)
Kafka is often just one piece of the production stack, and no one wants to touch it because it just works. Alon Gavra outlines how Kafka sits at the core of AppsFlyer's infrastructure, which processes billions of events daily.
2:05pm-2:45pm (40m) Data Engineering and Architecture, Streaming and IoT Data Integration and Data Processing, Data, Analytics, and AI Architecture, Retail and e-commerce, Streaming and IoT
Posttransaction processing using Apache Pulsar at Narvar
Davor Bonaci (Kaskada), Anand Madhavan (Narvar)
Narvar provides next-generation posttransaction experiences for over 500 retailers. Davor Bonaci and Anand Madhavan take you on the journey of how Narvar moved away from a slew of technologies for its platform and consolidated its use cases on Apache Pulsar.
3:45pm-4:25pm (40m) Data Engineering and Architecture, Streaming and IoT Data Integration and Data Processing, Data, Analytics, and AI Architecture, Streaming and IoT, Telecom
SK Telecom's 5G network monitoring and 3D visualization on streaming technologies
Jonghyok Lee (SK Telecom), Chon Yong Lee (SK Telecom)
Jonghyok Lee and Chon Yong Lee discuss T-CORE, SK Telecom’s monitoring and service analytics platform, which collects system and application data from several thousand servers and applications and provides a 3D visualization of the real-time status of the whole network. Join in to hear lessons learned during development.
11:20am-12:00pm (40m) Data Engineering and Architecture, Streaming and IoT Streaming and IoT, Temporal data and time-series analytics
Online machine learning in streaming applications
Stavros Kontopoulos (Lightbend), Debasish Ghosh (Lightbend)
Stavros Kontopoulos and Debasish Ghosh explore online machine learning algorithm choices for streaming applications, especially those with resource-constrained use cases like IoT and personalization. They dive into Hoeffding Adaptive Trees, classic sketch data structures, and drift detection algorithms from implementation to production deployment, describing the pros and cons of each of them.
1:15pm-1:55pm (40m) Data Engineering and Architecture Model Development, Governance, Operations
Problems taking AI to production and how to fix them
Jim Scott (NVIDIA)
Data scientists now create and test hundreds or thousands more models than in the past. Models require support from both real-time and static data sources. As data is enriched and parameters are tuned and explored, everything needs to be versioned, including the data. Jim Scott examines these specific problems and approaches to fixing them.
2:05pm-2:45pm (40m) Automation in data science and data, Data Engineering and Architecture Model Development, Governance, Operations
The new SDLC: CI/CD in the age of machine learning
Diego Oppenheimer (Algorithmia)
Machine learning (ML) will fundamentally change the way we build and maintain applications. Diego Oppenheimer dives into how you can adapt your infrastructure, operations, staffing, and training to meet the challenges of the new software development life cycle (SDLC) without throwing away everything that already works.
3:45pm-4:25pm (40m) Automation in data science and data, Data Engineering and Architecture Cloud Platforms and SaaS, Deep dive into specific tools, platforms, or frameworks, Model Development, Governance, Operations
ML ops: Applying DevOps practices to machine learning workloads
Sireesha Muppala (Amazon Web Services), Shelbee Eigenbrode (Amazon Web Services), Randall DeFauw (Amazon Web Services)
As an increasing level of automation becomes available to data science, the balance between automation and quality needs to be maintained. Applying DevOps practices to machine learning workloads brings models to the market faster and maintains the quality and integrity of those models. Sireesha Muppala, Shelbee Eigenbrode, and Randall DeFauw explore applying DevOps practices to ML workloads.
11:20am-12:00pm (40m) Data Engineering and Architecture Data Management and Storage, Streaming and IoT
Performant time series data management and analytics with PostgreSQL
Michael Freedman (TimescaleDB | Princeton University)
Leveraging polyglot solutions for your time series data can lead to issues including engineering complexity, operational challenges, and even referential integrity concerns. Michael Freedman explains how re-engineering PostgreSQL to serve as a general data platform can better streamline high-volume time series workloads, resulting in more actionable data and greater ease of use.
1:15pm-1:55pm (40m) Data Engineering and Architecture Deep dive into specific tools, platforms, or frameworks, Transportation and Logistics
How to performance-tune Spark applications in large clusters
Omkar Joshi (Uber), Bo Yang (Uber)
Omkar Joshi and Bo Yang offer an overview of how Uber’s ingestion (Marmaray) and observability team improved the performance of Apache Spark applications running on thousands of cluster machines and across hundreds of thousands of applications, and how the team methodically tackled the performance issues it found. They also cover how they used Uber’s open source jvm-profiler to debug issues at scale.
2:05pm-2:45pm (40m) Data Engineering and Architecture Data Integration and Data Processing, Data Management and Storage, Data, Analytics, and AI Architecture, Transportation and Logistics
Creating an extensible 100+ PB real-time big data platform by unifying storage and serving
Reza Shiftehfar (Uber)
Building a reliable big data platform is extremely challenging when it has to store and serve hundreds of petabytes of data in real time. Reza Shiftehfar reflects on the challenges faced and proposes architectural solutions to scale a big data platform to ingest, store, and serve 100+ PB of data with minute-level latency while efficiently utilizing the hardware and meeting security needs.
3:45pm-4:25pm (40m) Data Engineering and Architecture Cloud Platforms and SaaS, Data Management and Storage, Data, Analytics, and AI Architecture, Financial Services
Enabling big data and AI workloads on the object store at DBS Bank
Vitaliy Baklikov (DBS Bank), Dipti Borkar (Alluxio)
Vitaliy Baklikov and Dipti Borkar explore how DBS Bank built a modern big data analytics stack leveraging an object store even for data-intensive workloads like ATM forecasting and how it uses Alluxio to orchestrate data locality and data access for Spark workloads.
4:35pm-5:15pm (40m) Data Engineering and Architecture Data, Analytics, and AI Architecture
Bridging the gap between big data computing and high-performance computing
Supun Kamburugamuve (Indiana University)
Big data computing and high-performance computing (HPC) evolved over the years as separate paradigms. With the explosion of data and the demand for machine learning algorithms, these two paradigms increasingly embrace each other for data management and algorithms. Supun Kamburugamuve explores the possibilities and tools available for getting the best of HPC and big data.
11:20am-12:00pm (40m) Data Engineering and Architecture Data Integration and Data Processing
Using Spark for crunching astronomical data on the LSST scale
Petar Zecevic (SV Group)
The Large Synoptic Survey Telescope (LSST) is one of the most important future surveys. Its unique design allows it to cover large regions of the sky and obtain images of the faintest objects. After 10 years of operation, it will produce about 80 PB of data in images and catalog data. Petar Zecevic explains AXS, a system built for fast processing and cross-matching of survey catalog data.
1:15pm-1:55pm (40m) Data Engineering and Architecture Cloud Platforms and SaaS, Data, Analytics, and AI Architecture
The hitchhiker’s guide to the cloud: Architecting for the cloud through customer stories
Sushant Rao (Cloudera)
Jason Wang and Sushant Rao offer an overview of cloud architecture, then go into detail on core cloud paradigms like compute (virtual machines), cloud storage, authentication and authorization, and encryption and security. They conclude by bringing these concepts together through customer stories to demonstrate how real-world companies have leveraged the cloud for their big data platforms.
2:05pm-2:45pm (40m) Data Engineering and Architecture Data Integration and Data Processing, Data quality, data governance and data lineage
Fuzzy matching and deduplicating data: Techniques for advanced data prep
Nikki Rouda (Amazon Web Services), Janisha Anand (Amazon Web Services)
Nikki Rouda and Janisha Anand demonstrate how to deduplicate or link records in a dataset, even when the records don’t have a common unique identifier and no fields match exactly. You'll also learn how to link customer records across different databases, match external product lists against your own catalog, and solve tough challenges to prepare and cleanse data for analysis.
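As a rough illustration of fuzzy deduplication when no field matches exactly, a minimal sketch using the Python standard library's `difflib.SequenceMatcher` similarity ratio and a greedy clustering pass. The normalization rules, the 0.85 threshold, and the function names are assumptions for illustration; this is not the AWS tooling the speakers demonstrate.

```python
from difflib import SequenceMatcher
from typing import List


def normalize(record: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting differences don't count.
    return " ".join(record.lower().split())


def similarity(a: str, b: str) -> float:
    # Ratio in [0, 1]; 1.0 means the normalized strings are identical.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def dedupe(records: List[str], threshold: float = 0.85) -> List[List[str]]:
    """Greedily assign each record to the first cluster whose first member is
    similar enough; otherwise start a new cluster.

    Compares every record against every cluster, so it is quadratic in the
    worst case; real pipelines add blocking to avoid comparing every pair.
    """
    clusters: List[List[str]] = []
    for rec in records:
        for cluster in clusters:
            if similarity(rec, cluster[0]) >= threshold:
                cluster.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters


records = ["Acme Corp, 12 Main St", "ACME Corp,  12 Main St.", "Globex Inc, 99 Elm Ave"]
print(dedupe(records))
# [['Acme Corp, 12 Main St', 'ACME Corp,  12 Main St.'], ['Globex Inc, 99 Elm Ave']]
```

A single string-similarity threshold is the simplest possible matcher; learned matchers weigh fields differently and are trained on labeled matching pairs instead of relying on a hand-set cutoff.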
3:45pm-4:25pm (40m) Data Engineering and Architecture Data, Analytics, and AI Architecture
Lessons learned from scaling the tech stack of a modern analytics platform
Scott Castle (Sisense)
Scott Castle, general manager at Sisense and former VP of product at Periscope Data, discusses lessons learned from scaling up Periscope Data to support very large volumes of data and queries from its data teams.
4:35pm-5:15pm (40m) Data Science, Machine Learning, & AI
Spark on Kubernetes for data science
Jordan Volz (Dataiku)
Spark on Kubernetes is a winning combination for data science that stitches together a flexible platform harnessing the best of both worlds. Jordan Volz gives a brief overview of Spark and Kubernetes, the Spark on Kubernetes project, why it’s an ideal fit for data scientists who may have been dissatisfied with other iterations of Spark in the past, and some applications.
11:20am-12:00pm (40m) Data Engineering and Architecture Cloud Platforms and SaaS, Data Management and Storage
Where's my lookup table? Modeling relational data in a denormalized world
Rick Houlihan (Amazon Web Services)
Data has always been and will always be relational. NoSQL databases are gaining in popularity, but that doesn't change the fact that the data is still relational; it just changes how we have to model it. Rick Houlihan dives deep into how real entity relationship models can be efficiently modeled in a denormalized manner using schema examples from real application services.
1:15pm-1:55pm (40m) Business Analytics and Visualization, Data Engineering and Architecture BI, Interactive Analytics and Visualization
Intelligent design patterns for cloud-based analytics and BI
Shant Hovsepian (Arcadia Data)
With cloud object storage (e.g., S3, ADLS), one expects business intelligence (BI) applications to benefit from the scale of data and real-time analytics. However, traditional BI in the cloud surfaces nonobvious challenges. Shant Hovsepian examines service-oriented cloud design (storage, compute, catalog, security, SQL) and how native cloud BI provides analytic depth, low cost, and performance.
2:05pm-2:45pm (40m) Data Engineering and Architecture Cloud Platforms and SaaS, Privacy and Security
Securing your cloud data lake with a "defense in depth" approach
Tomer Shiran (Dremio), Jacques Nadeau (Dremio)
With cheap and scalable storage services such as S3 and ADLS, it's never been easier to dump data into a cloud data lake. But you still need to secure that data and be sure it doesn't leak. Tomer Shiran and Jacques Nadeau explore capabilities for securing a cloud data lake, including authentication, access control, encryption (in motion and at rest), and auditing, as well as network protections.
3:45pm-4:25pm (40m) Data Engineering and Architecture, Security and Privacy Deep dive into specific tools, platforms, or frameworks, Privacy and Security
Protect your private data in your Hadoop clusters with ORC column encryption
Owen O'Malley (Cloudera)
Fine-grained data protection at a column level in data lake environments has become a mandatory requirement to demonstrate compliance with multiple local and international regulations across many industries today. Owen O'Malley dives into how column encryption in ORC files enables both fine-grained protection and audits of who accessed the private data.
4:35pm-5:15pm (40m) Data Engineering and Architecture
Using Spark to speed up the diagnosis performance for big data applications
Ruixin Xu (Microsoft), Long Tian (Microsoft), Yu Zhou (Microsoft)
Ruixin Xu, Long Tian, and Yu Zhou explore an experiment run using Spark and Jupyter notebooks as a replacement for existing IDE-based tools for internal DevOps. The Spark-based solution improved diagnosis performance significantly, especially for complex jobs with large profiles, and using Jupyter notebooks brings the benefits of fast iteration and easy knowledge sharing.
11:20am-12:00pm (40m) Culture and organization, Strata Business Summit Culture and Organization
Executive Briefing: Creating a center for data science from scratch—Lessons from nonprofit research
Gayle Bieler (RTI International)
Gayle Bieler explains how she built a thriving center for data science within a large, well-respected nonprofit research institute and shares some of its most impactful projects and best adventures to date, which have solved important national problems, improved local communities, and transformed research.
1:15pm-1:55pm (40m) Executive Briefing and best practices, Strata Business Summit Culture and Organization, Ethics
Executive Briefing: Lessons from the front lines—Building a responsible AI/ML program in the enterprise
Keegan Hines (Capital One)
Keegan Hines explores some of the philosophy behind explaining a model, given that the colloquial definition of an explanation is partially recursive. He covers the lens that banking regulation places on this philosophical basis and expands into techniques used for these well-governed aspects.
2:05pm-2:45pm (40m) Strata Business Summit
Executive Briefing: Unpacking AutoML
Paco Nathan (derwen.ai)
Paco Nathan outlines the history and landscape for vendors, open source projects, and research efforts related to AutoML. Starting from the perspective of an AI expert practitioner who speaks business fluently, Paco unpacks the ground truth of AutoML—translating from the hype into business concerns and practices in a vendor-neutral way.
3:45pm-4:25pm (40m) Culture and organization, Strata Business Summit Culture and Organization, Transportation and Logistics
Executive Briefing: Building a culture of self-service from predeployment to continued engagement
Jonathan Tudor (GE Aviation), Ross Schalmo (GE Aviation)
Jonathan Tudor and Ross Schalmo explore how GE Aviation made it a mission to implement self-service data. To ensure success beyond the initial implementation of tools, the data engineering and analytics teams created initiatives to foster engagement, ranging from an ongoing partnership with each part of the business, to the gamification of tagging data in a data catalog, to the formation of a published dataset council.
4:35pm-5:15pm (40m) Executive Briefing and best practices, Strata Business Summit Data, Analytics, and AI Architecture, Streaming and IoT
Executive Briefing: What it takes to use machine learning in fast data pipelines
Dean Wampler (Anyscale)
Dean Wampler dives into how (and why) to integrate ML into production streaming data pipelines and to serve results quickly; how to bridge data science and production environments with different tools, techniques, and requirements; how to build reliable and scalable long-running services; and how to update ML models without downtime.
11:20am-12:00pm (40m) Strata Business Summit
Executive Briefing: Say what? The ethical challenges of designing for humanlike interaction
Jonathan Foster (Microsoft)
Language shapes our thinking, our relationships, our sense of self. Conversation connects us in powerful, intimate, and often unconscious ways. Jonathan Foster explains why, as we design for natural language interactions and more humanlike digital experiences, language—as design material, conversation, and design canvas—reveals ethical challenges we couldn't encounter with GUI-powered experiences.
1:15pm-1:55pm (40m) Culture and organization, Strata Business Summit Culture and Organization
An in-depth look at the data science career: Defining roles, assessing skills
Usama Fayyad (Open Insights & OODA Health, Inc.), Hamit Hamutcu (Analytics Center)
If you've ever been confused about what it takes to be a data scientist or curious about how companies recruit, train, and manage analytics resources, Usama Fayyad and Hamit Hamutcu are here to explore insights from the most comprehensive research effort to date on the data analytics profession and propose a framework for the standardization of roles and methods for assessing skills.
2:05pm-2:45pm (40m) Case studies, Strata Business Summit BI, Interactive Analytics and Visualization, Telecom
T-Mobile's journey to turn crowdsourced big data into actionable insights
Alex Yoon (T-Mobile)
T-Mobile successfully improved the quality of voice calling by analyzing crowdsourced big data from mobile devices. Alex Yoon walks you through how engineers from multiple backgrounds collaborated to achieve a 10% improvement in voice quality and why the analysis of big data was the key to bringing better voice call quality to millions of end users.
3:45pm-4:25pm (40m) Case studies, Strata Business Summit Text and Language processing and analysis, Transportation and Logistics
Migrating millions of users from voice- and email-based customer support to a chatbot
Madhu Gopinathan (MakeMyTrip), Sanjay Mohan (MakeMyTrip)
At MakeMyTrip, customers used voice or email to contact agents for postsale support. To improve agent efficiency and the customer experience, MakeMyTrip developed a chatbot, Myra, using some of the latest advances in deep learning. Madhu Gopinathan and Sanjay Mohan explain the high-level architecture and the business impact Myra created.
4:35pm-5:15pm (40m) Strata Business Summit
Combining creativity and analytics
David Boyle (Audience Strategies)
Companies that harness creativity and data in tandem have growth rates twice as high as companies that don’t. David Boyle shares lessons from his successes and failures in trying to do just that across presidential politics, with pop stars, and with power brands in the world of luxury goods. Join in to find out how analysts can work differently to build these partnerships and unlock this growth.
11:20am-12:00pm (40m) Data Science, Machine Learning, & AI, Strata Business Summit
How Deutsche Bank industrialized AI and machine learning
John Allen (Deutsche Bank)
As an early adopter of data science, machine learning, and AI, Deutsche Bank's analytics function is trailblazing new ways to drive revenues, lower costs, and reduce risk across all areas of the group. John Allen shares how his team combines commercial offerings with open source technologies to revolutionize legacy processes and transform the way the bank uses technology to drive innovation.
1:15pm-1:55pm (40m) Executive Briefing and best practices, Strata Business Summit
Communication breakdown: Facing machine learning’s all-too-human failure
James Kotecki (Infinia ML)
Miscommunication between business leaders and technical experts can doom even the best data science project. Don’t let it drive you insane! In this session, we’ll dissect many flavors of communication failure, from goal misalignment to technical misunderstanding. Then, we’ll explore practical ways to bridge these gaps.
2:05pm-2:45pm (40m) Business Analytics and Visualization, Strata Business Summit BI, Interactive Analytics and Visualization, Media and Advertising, Temporal data and time-series analytics
ThirdEye: LinkedIn’s business-wide monitoring platform
Akshay Rai (LinkedIn)
Failures or issues in a product or service can negatively affect the business. Detecting issues in advance and recovering from them is crucial to keeping the business alive. Join Akshay Rai to learn more about LinkedIn's next-generation open source monitoring platform, an integrated solution for real-time alerting and collaborative analysis.
3:45pm-4:25pm (40m) Law and Ethics, Strata Business Summit BI, Interactive Analytics and Visualization, Ethics
Purposefully designing technology for civic engagement
Audrey Lobo-Pulo (Phoensight), Annette Hester (National Energy Board, Canada)
As new digital platforms emerge and governments look at new ways to engage with citizens, there's an increasing awareness of the role these platforms play in shaping public participation and democracy. Audrey Lobo-Pulo, Annette Hester, and Ryan Hum examine the design attributes of civic engagement technologies and their ensuing impacts, and they present a case study from NEB Canada.
4:35pm-5:15pm (40m) Executive Briefing and best practices, Strata Business Summit Privacy and Security
Executive Briefing: Big data in the era of heavy worldwide privacy regulations
Mark Donsky (Okera)
California is following the EU's GDPR with the California Consumer Privacy Act (CCPA) in 2020. The law carries penalties for noncompliance, but many companies aren't prepared for this strict regulation. This session explores the capabilities your data environment needs in order to simplify compliance with CCPA, GDPR, and other regulations.
11:20am-12:00pm (40m) Sponsored
The key to climbing the AI ladder (sponsored by IBM)
Daniel Hernandez (IBM)
AI isn't magic. It's still hard work. Daniel Hernandez explains why having the technology alone isn't enough; success requires a thoughtful and well-architected approach.
1:15pm-1:55pm (40m) Sponsored
So you built a model; now what? (sponsored by Dataiku)
Jed Dougherty (Dataiku)
Jed Dougherty takes a deep dive into an often overlooked aspect of the data science lifecycle: model deployment. Once they've constructed a model that does a good job of accurately predicting their test set, many data scientists think the job is over. But really, it's just begun.
2:05pm-2:45pm (40m) Sponsored
Powering the future with data intelligence (sponsored by Collibra)
Jim Cushman (Collibra), Piyush Jain (Progressive)
Transforming data into a trusted business asset that informs decision making requires giving teams access to a powerful platform that makes it easy to harness data across the enterprise. Jim Cushman and Piyush Jain detail how Progressive uses Collibra to transform the way data is managed and used across the organization, driving real business value.
11:20am-12:00pm (40m) Sponsored
Deliver personalized experiences and content like Xbox with Cognitive Services Personalizer (sponsored by Microsoft Azure)
Edward Jezierski (Microsoft), Jackie Nichols (Microsoft)
Edward Jezierski and Jackie Nichols demonstrate how Cognitive Services Personalizer works with your content and data, how it autonomously learns to make optimal decisions, how you can add it to your app with two lines of code, and what’s under the hood. Then they share the results Personalizer achieved on the Xbox One home page as well as best practices for applying it in your applications today.
1:15pm-1:55pm (40m) Sponsored
Migrating Hadoop analytics to Spark in the cloud without disruption (sponsored by WANdisco)
Paul Scott-Murphy (WANdisco)
Paul Scott-Murphy dives into the options that exist for cloud migration and their advantages and disadvantages, what cloud vendors do and don't offer to support large-scale migration, the business risks associated with large-scale cloud migration, and how to migrate analytics data at scale for immediate use in Spark without disrupting on-premises operations.
2:05pm-2:45pm (40m) Data Engineering and Architecture, Streaming and IoT Deep dive into specific tools, platforms, or frameworks, Streaming and IoT
Stream processing beyond streaming data
Stephan Ewen (Ververica)
Stephan Ewen details how stream processing is becoming a "grand unifying paradigm" for data processing and the newest developments in Apache Flink to support this trend: new cross-batch-streaming machine learning algorithms, state-of-the-art batch performance, and new building blocks for data-driven applications and application consistency.
11:20am-12:00pm (40m) Sponsored
Organizing the chaos of healthcare with smart data discovery (sponsored by Io-Tahoe)
Charles Boicey (Clearsense)
Comprehensible data is critical to healthcare's mission of providing optimal and affordable care. Charles Boicey takes a deep dive into how the application of technology, such as machine learning, is paramount to a modernization of healthcare that provides its professionals with fully integrated and complete medical records.
1:15pm-1:55pm (40m) Sponsored
Next-generation serverless data architecture for insights at the speed of thought (sponsored by Actian)
Paul Wolmering (Actian)
Paul Wolmering explores the key characteristics for building an Agile data warehouse and defines a reference architecture for hybrid data.
2:05pm-2:45pm (40m) Sponsored
Getting clinical trial data ready for analysis: How IQVIA wrangled its way to success (sponsored by Trifacta)
Matt Derda (Trifacta), Yogesh Prasad (IQVIA)
Clinical trial data analysis can be a complex process. The data is typically hand-coded, formatted inconsistently, and required to be delivered in an FDA-approved format. Matt Derda and Yogesh Prasad explain how IQVIA built its Clean Patient Tracker and how it enabled agility and flexibility for end users of the platform, from data acquisition to reporting and analytics.
11:20am-12:00pm (40m) Sponsored
Transforming financial reporting services with massively scalable OLAP (sponsored by Kyvos Insights)
Ajay Anand (Kyvos Insights)
Learn how to overcome the challenges of traditional OLAP solutions and scale BI to deliver quick insights to business users across your enterprise.
1:15pm-1:55pm (40m) Sponsored
The end of applications: How data collaboration is changing everything (sponsored by Cinchy)
Dan DeMers (Cinchy)
After 40 years of apps, enterprise companies now realize that building or buying an application for every use case has become a major threat to their ability to leverage and protect their core data assets. Dan DeMers provides a live demo of Cinchy, the world’s first data collaboration platform.
8:45am-8:55am (10m)
Thursday keynotes
Ben Lorica (O'Reilly), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
Program chairs Ben Lorica, Doug Cutting, and Alistair Croll welcome you to the second day of keynotes.
8:55am-9:15am (20m)
Staying safe in the AI era
Cassie Kozyrkov (Google)
Machine learning and artificial intelligence are no longer science fiction, so now you have to address what it takes to harness their potential effectively, responsibly, and reliably. Based on lessons learned at Google, Cassie Kozyrkov offers actionable advice to help you find opportunities to take advantage of machine learning, navigate the AI era, and stay safe as you innovate.
9:15am-9:25am (10m) Sponsored
Unlocking the value of your data (sponsored by IBM)
Daniel Hernandez (IBM)
Daniel Hernandez takes a deep dive into how, with a unified, prescriptive information architecture, organizations can successfully unlock the value of their data for an AI and multicloud world.
9:25am-9:35am (10m)
Delivering the enterprise data cloud
Arun Murthy (Cloudera)
In this keynote, we’ll introduce you to the new 100% open source Cloudera Data Platform (CDP), the world’s first enterprise data cloud. CDP is hybrid and multi-cloud, delivering the speed, agility, and scale you need to secure and govern your data anywhere from the edge to AI.
9:35am-9:40am (5m) Sponsored
Postrevolutionary big data: Promoting the general welfare (sponsored by Io-Tahoe)
Barbara Eckman (Comcast)
Barbara Eckman shares lessons learned from early big data mistakes and the progress her team at Comcast is making toward a postrevolutionary big data vision.
9:40am-9:45am (5m) Sponsored
RL in real life: Bringing reinforcement learning to the enterprise (sponsored by Microsoft Azure)
Edward Jezierski (Microsoft)
Microsoft has an ecosystem spanning research, gaming, and the cloud that's advancing reinforcement learning (RL) and putting it into everyday use. Join Edward Jezierski to see where RL is used practically across Microsoft and imagine the opportunities that exist for your business today.
9:45am-9:55am (10m)
Strata Data Awards: Winners announced
The Strata Data Awards recognize the most innovative startups, leaders, and data science projects from Strata sponsors and exhibitors around the world. Join us during keynotes for the announcement of the winners.
9:55am-10:15am (20m)
Say what? The ethical challenges of designing for humanlike interaction
Jonathan Foster (Microsoft)
Language shapes our thinking, our relationships, our sense of self. Conversation connects us in powerful, intimate, and often unconscious ways. Jonathan Foster explains why, as we design for natural language interactions and more humanlike digital experiences, language—as design material, conversation, and design canvas—reveals ethical challenges we couldn't encounter with GUI-powered experiences.
10:15am-10:20am (5m) Sponsored
Data Science Pioneers: Conquering the next frontier, a documentary investigating the future of data science (sponsored by Dataiku)
Jed Dougherty (Dataiku)
Jed Dougherty presents the trailer of the upcoming _Data Science Pioneers_ documentary about the passionate data scientists driving us toward technological revolution. Cut through the hype with _Data Science Pioneers_ and see what it really means to be a data scientist.
10:20am-10:40am (20m)
Data sonification: Making music from the yield curve
Alan Smith (Financial Times)
Based on a critical evaluation of the iconic yield curve chart, Alan Smith argues that combining visualization (data to pixels) with sonification (data to pitch) offers the potential not only to improve aesthetic multimedia experiences but also to take the presentation of data into the rapidly expanding universe of screenless devices and products.
10:40am-10:45am (5m)
Closing remarks
Ben Lorica (O'Reilly), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
Program chairs Ben Lorica, Doug Cutting, and Alistair Croll offer closing remarks.
10:50am-11:20am (30m)
Break: Morning break sponsored by Cisco
12:00pm-1:15pm (1h 15m)
Break
12:00pm-1:15pm (1h 15m)
Thursday Topic Tables at Lunch (sponsored by IBM)
Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics.
12:30pm-1:10pm (40m) Expo Hall
Why AI fails: Overcoming AI challenges (sponsored by IBM)
Brittany Bogle (IBM)
AI will be the most disruptive class of technologies over the next decade, fueled by near-endless amounts of data and unprecedented advances in deep learning. Brittany Bogle walks you through how to address some of the major AI challenges, like trust, talent, and data.
2:45pm-3:45pm (1h)
Break: Afternoon break sponsored by Io-Tahoe
8:00am-8:30am (30m)
Speed Networking
Gather before keynotes on Thursday morning to enjoy casual conversation while meeting fellow attendees. If one of your goals at Strata is to meet new people, this session will jumpstart your networking with other attendees.
8:30am-8:45am (15m)
Break: Early morning coffee (8:00am - 8:45am)
12:00pm-1:15pm (1h 15m)
Thursday Business Summit Lunch
Join Strata Business Summit speakers and attendees for a networking lunch on Thursday.
  • Cloudera
  • O'Reilly
  • Google Cloud
  • IBM
  • Cisco
  • Dataiku
  • Intel
  • Io-Tahoe
  • MemSQL
  • Microsoft Azure
  • Oracle Cloud Infrastructure
  • SAS
  • Arcadia Data
  • BMC Software
  • Hazelcast
  • SAP
  • Amazon Web Services
  • Anaconda
  • Esri
  • Infoworks.io, Inc.
  • Kyligence
  • Pitney Bowes
  • Talend
  • Google Cloud
  • Confluent
  • DataStax
  • Dremio
  • Immuta
  • Impetus Technologies Inc.
  • Keyence
  • Kyvos Insights
  • StreamSets
  • Striim
  • Syncsort
  • SK holdings C&C

    Contact us

    confreg@oreilly.com

    For conference registration information and customer service

    partners@oreilly.com

    For more information on community discounts and trade opportunities with O’Reilly conferences

    strataconf@oreilly.com

    For information on exhibiting or sponsoring a conference

    pr@oreilly.com

    For media/analyst press inquiries