Presented By
O’Reilly + Cloudera
Make Data Work
March 25-28, 2019
San Francisco, CA

Schedule

Monday, 03/25/2019

8:30am

8:30am–9:00am Monday, 03/25/2019
Location: 2nd floor lobby
Early morning coffee (30m)

9:00am

9:00am–5:00pm Monday, 03/25/2019
Training
Strata Business Summit
Location: 2010
Michael Li (The Data Incubator), Rich Ott (The Pragmatic Institute)
Average rating: ****.
(4.50, 4 ratings)
Michael Li and Rich Ott offer a nontechnical overview of AI and data science. Learn common techniques, how to apply them in your organization, and common pitfalls to avoid. You’ll pick up the language and develop a framework to be able to effectively engage with technical experts and utilize their input and analysis for your business’s strategic priorities and decision making. Read more.
9:00am–5:00pm Monday, 03/25/2019
Training
Data Science, Machine Learning & AI
Location: 2014
Secondary topics:  Deep Learning
Robert Schroll (The Data Incubator)
Average rating: ****.
(4.50, 2 ratings)
The TensorFlow library provides for the use of computational graphs, with automatic parallelization across resources. This architecture is ideal for implementing neural networks. Robert Schroll offers an overview of TensorFlow's capabilities in Python, demonstrating how to build machine learning algorithms piece by piece and how to use TensorFlow's Keras API with several hands-on applications. Read more.
9:00am–5:00pm Monday, 03/25/2019
Training
Data Science, Machine Learning & AI
Location: 2016
Don Fox (The Data Incubator)
Average rating: ****.
(4.75, 12 ratings)
Don Fox walks you through developing a machine learning pipeline, from prototyping to production. You'll learn about data cleaning, feature engineering, model building and evaluation, and deployment and then extend these models into two applications from real-world datasets. All work will be done in Python. Read more.
9:00am–5:00pm Monday, 03/25/2019
Training
Data Engineering & Architecture
Location: 2018
Jorge Lopez (Amazon Web Services), Roy Hasson (Amazon Web Services), Rajeev Chakrabarti (Amazon Web Services), Jesse Gebhardt (Amazon Web Services), Gautam Srinivasan (Amazon Web Services), Anthony Nguyen (Amazon Web Services)
Average rating: ****.
(4.50, 4 ratings)
Serverless technologies let you build and scale applications and services rapidly without the need to provision or manage servers. Join in to learn how to incorporate serverless concepts into your big data architectures, looking at design patterns to ingest, store, and analyze your data. You'll then build a big data application using AWS technologies such as S3, Athena, Kinesis, and more. Read more.
9:00am–5:00pm Monday, 03/25/2019
Training
Data Science, Machine Learning & AI
Location: 2020
Secondary topics:  Deep Learning
Ian Cook (Cloudera)
Average rating: ****.
(4.00, 1 rating)
Advancing your career in data science requires learning new languages and frameworks—but learners face an overwhelming array of choices, each with different syntaxes, conventions, and terminology. Ian Cook simplifies the learning process by elucidating the abstractions common to these systems. Through hands-on exercises, you'll overcome obstacles to getting started using new tools. Read more.
9:00am–5:00pm Monday, 03/25/2019
Training
Data Engineering & Architecture
Location: 3016
Jesse Anderson (Big Data Institute)
Average rating: ***..
(3.00, 1 rating)
Jesse Anderson leads a deep dive into Apache Kafka. You'll learn how Kafka works and how to create real-time systems with it. You'll also discover how to create consumers and publishers in Kafka and how to use Kafka Streams, Kafka Connect, and KSQL as you explore the Kafka ecosystem. Read more.
9:00am–5:00pm Monday, 03/25/2019
Training
Data Science, Machine Learning & AI
Location: 3018
Francesca Lazzeri (Microsoft), Jen Ren (Microsoft)
Francesca Lazzeri and Jen Ren walk you through the core steps for using Azure Machine Learning services to train your machine learning models both locally and on remote compute resources. Read more.

10:30am

10:30am–11:00am Monday, 03/25/2019
Location: 2nd floor lobby
Morning break (30m)

12:30pm

12:30pm–1:30pm Monday, 03/25/2019
Location: 2nd floor lobby
Lunch (1h)

3:00pm

3:00pm–3:30pm Monday, 03/25/2019
Location: 2nd floor lobby
Afternoon break (30m)

Tuesday, 03/26/2019

7:30am

7:30am–9:00am Tuesday, 03/26/2019
Location: 2nd floor lobby
Early morning coffee (1h 30m)

9:00am

9:00am–12:30pm Tuesday, 03/26/2019
Tutorial
Data Science, Machine Learning & AI
Location: 2001
Secondary topics:  Ethics, Security and Privacy
Iman Saleh (Intel), Cory Ilo (Intel), Cindy Tseng (Intel)
Average rating: *****
(5.00, 3 ratings)
From healthcare to smart home to autonomous vehicles, new applications of autonomous systems are raising ethical concerns about a host of issues, including bias, transparency, and privacy. Iman Saleh, Cory Ilo, and Cindy Tseng demonstrate tools and capabilities that can help data scientists address these concerns and bridge the gap between ethicists, regulators, and machine learning practitioners. Read more.
9:00am–12:30pm Tuesday, 03/26/2019
Tutorial
Data Science, Machine Learning & AI
Location: 2002
Martin Gorner (Google)
Average rating: ****.
(4.50, 4 ratings)
Martin Gorner leads a hands-on introduction to recurrent neural networks and TensorFlow. Join in to discover what makes RNNs so powerful for time series analysis. Read more.
9:00am–12:30pm Tuesday, 03/26/2019
Joshua Poduska (Domino Data Lab), Kimberly Shenk (NakedPoppy), Mac Steele (Domino)
Average rating: ****.
(4.60, 15 ratings)
The honeymoon era of data science is ending, and accountability is coming. Successful data science leaders must deliver measurable impact on an increasing share of an enterprise's KPIs. Joshua Poduska, Kimberly Shenk, and Mac Steele explain how leading organizations take a holistic approach to people, process, and technology to build a sustainable competitive advantage. Read more.
9:00am–12:30pm Tuesday, 03/26/2019
Fabian Hueske (Ververica)
Average rating: *****
(5.00, 1 rating)
Fabian Hueske offers an overview of Apache Flink via the SQL interface, covering stream processing and Flink's various modes of use. Then you'll use Flink to run SQL queries on data streams and contrast this with the Flink DataStream API. Read more.
9:00am–12:30pm Tuesday, 03/26/2019
Tutorial
Data Engineering & Architecture
Location: 2005
Mark Madsen (Teradata), Todd Walter (Archimedata)
Average rating: ****.
(4.21, 28 ratings)
Building a data lake involves more than installing Hadoop or putting data into AWS. The goal in most organizations is to build a multiuse data infrastructure that isn't subject to past constraints. Mark Madsen and Todd Walter explore design assumptions and principles and walk you through a reference architecture to use as you work to unify your analytics infrastructure. Read more.
9:00am–12:30pm Tuesday, 03/26/2019
Tutorial
Data Engineering & Architecture
Location: 2006
Jonathan Seidman (Cloudera), Ted Malaska (Capital One)
Average rating: ****.
(4.00, 6 ratings)
The enterprise data management space has changed dramatically in recent years, and this has led to new challenges for organizations in creating successful data practices. Jonathan Seidman and Ted Malaska share guidance and best practices from planning to implementation based on years of experience working with companies to deliver successful data projects.
9:00am–12:30pm Tuesday, 03/26/2019
Boris Lublinsky (Lightbend), Dean Wampler (Anyscale)
Average rating: ***..
(3.85, 13 ratings)
Boris Lublinsky and Dean Wampler walk you through using ML in streaming data pipelines and doing periodic model retraining and low-latency scoring in live streams. You'll explore using Kafka as a data backplane, the pros and cons of microservices versus systems like Spark and Flink, tips for TensorFlow and SparkML, performance considerations, model metadata tracking, and other techniques.
9:00am–12:30pm Tuesday, 03/26/2019
Tutorial
Data Engineering & Architecture
Location: 2008
Santosh Kumar (Cloudera), Andre Araujo (Cloudera), Wim Stoop (Cloudera)
Average rating: *****
(5.00, 1 rating)
Cloudera SDX provides unified metadata control, simplifies administration, and maintains context and data lineage across storage services, workloads, and operating environments. Santosh Kumar, Andre Araujo, and Wim Stoop offer an overview of SDX before diving deep into the moving parts and guiding you through setting it up. You'll leave with the skills to set up your own SDX. Read more.
9:00am–12:30pm Tuesday, 03/26/2019
Tutorial
Data Science, Machine Learning & AI
Location: 2009
David Talby (Pacific AI), Alex Thomas (John Snow Labs), Claudiu Branzan (Accenture)
Average rating: ****.
(4.75, 8 ratings)
David Talby, Alex Thomas, and Claudiu Branzan lead a hands-on introduction to scalable NLP using the highly performant, highly scalable open source Spark NLP library. You’ll spend about half your time coding as you work through four sections, each with an end-to-end working codebase that you can change and improve. Read more.
9:00am–12:30pm Tuesday, 03/26/2019
Location: 2011
Tutorial TBC
9:00am–5:00pm Tuesday, 03/26/2019
Location: 2022
Alex Kudriashova (Astro Digital), Jonathan Francis (Starbucks), JoLynn Lavin (General Mills), Robin Way (Corios), June Andrews (GE), Kyungtaak Noh (SK Telecom), Taposh DuttaRoy (Kaiser Permanente), Sabrina Dahlgren (Kaiser Permanente), Craig Rowley (Columbia Sportswear), Ambal Balakrishnan (IBM), Benjamin Glicksberg (UCSF), Patrick Lucey (Stats Perform), Rhonda Textor (True Fit)
Hear practical insights from household brands and global companies: the challenges they tackled, approaches they took, and the benefits—and drawbacks—of their solutions. Read more.
9:00am–5:00pm Tuesday, 03/26/2019
Location: 2024
Susan Etlinger (Altimeter Group), Alistair Croll (Solve For Interesting), Jake Metcalf (Ethical Resolve), Emanuel Moss (Data & Society), Bradley Voytek (UC San Diego), Jonathan Foster (Microsoft), Yiannis Kanellopoulos (Code4Thought), Kathy Baxter (Salesforce), Bulbul Gupta (Socos Labs), Brian Rieger (Labelbox), Carole Piovesan (INQ Data Law), Jana Eggers (Nara Logics), Irina Raicu (Santa Clara University), Brian Green (Santa Clara University), Tim O'Reilly (O'Reilly Media), Rachel Thomas (fast.ai), Rumman Chowdhury (Accenture), Stuart Buck (Arnold Ventures)
In this day-long event, academics, practitioners, and innovators dive deep into the thorny issues of data, privacy, bias, and morality that are at the forefront of today's headlines. Read more.

10:30am

10:30am–11:00am Tuesday, 03/26/2019
Location: 2nd floor lobby
Morning break (30m)

12:30pm

12:30pm–1:30pm Tuesday, 03/26/2019
Location: 2nd and 3rd floor lobbies
Lunch (1h)

1:30pm

1:30pm–5:00pm Tuesday, 03/26/2019
Tutorial
Data Science, Machine Learning & AI
Location: 2001
Secondary topics:  Ethics
Patrick Hall (bnh.ai | H2O.ai)
Average rating: ****.
(4.00, 9 ratings)
If machine learning can lead to financial gains for your organization, why isn’t everyone doing it? One reason is that training machine learning systems with transparent inner workings and auditable predictions is difficult. Patrick Hall details the good, bad, and downright ugly lessons learned from his years of experience implementing solutions for interpretable machine learning.
1:30pm–5:00pm Tuesday, 03/26/2019
Tutorial
Data Science, Machine Learning & AI
Location: 2002
Abhishek Kumar (Publicis Sapient), Pramod Singh (Walmart Labs)
Average rating: ****.
(4.17, 6 ratings)
Abhishek Kumar and Pramod Singh walk you through deep learning-based recommender and personalization systems they've built for clients. Join in to learn how to use TensorFlow Serving and MLflow for end-to-end productionalization, including model serving, Dockerization, reproducibility, and experimentation, and Kubernetes for deployment and orchestration of ML-based microarchitectures. Read more.
1:30pm–5:00pm Tuesday, 03/26/2019
Andrew Burt (bnh.ai), Steven Touw (Immuta), Richard Geering (Immuta), Joseph Regensburger (Immuta), Alfred Rossi (Immuta)
Average rating: *****
(5.00, 2 ratings)
As ML becomes increasingly important for businesses and data science teams alike, managing its risks is quickly becoming one of the biggest challenges to the technology’s widespread adoption. Join Andrew Burt, Steven Touw, Richard Geering, Joseph Regensburger, and Alfred Rossi for a hands-on overview of how to train, validate, and audit machine learning (ML) models in practice.
1:30pm–5:00pm Tuesday, 03/26/2019
Tutorial
Data Engineering & Architecture
Location: 2004
Matt Fuller (Starburst)
Average rating: ***..
(3.57, 7 ratings)
Used by Facebook, Netflix, Airbnb, LinkedIn, Twitter, Uber, and others, Presto has become the ubiquitous open source software for SQL on anything. Presto was built from the ground up for fast interactive SQL analytics against disparate data sources ranging in size from GBs to PBs. Join Matt Fuller to learn how to use Presto and explore use cases and best practices you can implement today. Read more.
1:30pm–5:00pm Tuesday, 03/26/2019
Tutorial
Data Engineering & Architecture
Location: 2005
Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio)
Average rating: **...
(2.67, 12 ratings)
Many industry segments have been grappling with fast data (high-volume, high-velocity data). Arun Kejariwal and Karthik Ramasamy walk you through the state-of-the-art systems for each stage of an end-to-end data processing pipeline—messaging, compute, and storage—for real-time data and algorithms to extract insights (e.g., heavy hitters and quantiles) from data streams. Read more.
1:30pm–5:00pm Tuesday, 03/26/2019
Tutorial
Data Engineering & Architecture
Location: 2006
Sourav Dey (Manifold), Alex Ng (Manifold)
Average rating: ****.
(4.25, 4 ratings)
Many teams are still run as if data science is mainly about experimentation, but those days are over. Now it must offer turnkey solutions to take models into production. Sourav Dey and Alex Ng explain how to streamline an ML project and help your engineers work as an integrated part of your production teams, using a Lean AI process and the Orbyter package for Docker-first data science.
1:30pm–5:00pm Tuesday, 03/26/2019
Tutorial
Data Engineering & Architecture
Location: 2007
Holden Karau (Independent), Francesca Lazzeri (Microsoft), Trevor Grant (IBM)
Average rating: ***..
(3.00, 2 ratings)
Holden Karau, Francesca Lazzeri, and Trevor Grant offer an overview of Kubeflow and walk you through using it to train and serve models across different cloud environments (and on-premises). You'll use a script to do the initial setup work, so you can jump (almost) straight into training a model on one cloud and then look at how to set up serving in another cluster/cloud. Read more.
1:30pm–5:00pm Tuesday, 03/26/2019
Tutorial
Data Engineering & Architecture
Location: 2008
Jason Wang (Cloudera), Brandon Freeman (Cloudera), Michael Kohs (Cloudera), Akihiro Ishikawa (Cloudera), Toby Ferguson (Cloudera)
Average rating: ***..
(3.20, 5 ratings)
There are many challenges with moving multidisciplinary big data workloads to the cloud and running them. Jason Wang, Brandon Freeman, Michael Kohs, Akihiro Nishikawa, and Toby Ferguson explore cloud architecture and its challenges and walk you through using Cloudera Altus to build data warehousing and data engineering clusters and run workloads that share metadata between them using Cloudera SDX. Read more.
1:30pm–5:00pm Tuesday, 03/26/2019
Tutorial
Data Science, Machine Learning & AI
Location: 2009
Jason Dai (Intel), Yuhao Yang (Intel), Jiao (Jennie) Wang (Intel), Guoqiong Song (Intel)
Average rating: ***..
(3.00, 6 ratings)
Jason Dai, Yuhao Yang, Jennie Wang, and Guoqiong Song explain how to build and productionize deep learning applications for big data with Analytics Zoo—a unified analytics and AI platform that seamlessly unites Spark, TensorFlow, Keras, and BigDL programs into an integrated pipeline—using real-world use cases from JD.com, MLSListings, the World Bank, Baosight, and Midea/KUKA. Read more.
1:30pm–5:00pm Tuesday, 03/26/2019
Tutorial
Data Science, Machine Learning & AI
Location: 2011
Chi-Yi Kuan (LinkedIn), Tiger Zhang (LinkedIn), Xiaojing Dong (LinkedIn), Burcu Baran (LinkedIn), Emily Huang (LinkedIn)
Average rating: ****.
(4.43, 14 ratings)
Thanks to the rapid growth in data resources, business leaders now appreciate the importance (and the challenge) of mining information from data. Join in as a group of LinkedIn's data scientists share their experiences successfully leveraging emerging techniques to assist in intelligent decision making. Read more.

3:00pm

3:00pm–3:30pm Tuesday, 03/26/2019
Location: 2nd floor lobby
Afternoon break (30m)

5:00pm

5:00pm–7:00pm Tuesday, 03/26/2019
Event
Location: Expo Hall (Exhibit Hall - Level 1)
Average rating: *****
(5.00, 1 rating)
Enjoy delicious snacks and beverages with fellow Strata attendees, speakers, and sponsors at the Opening Reception happening immediately after tutorials on Tuesday. Read more.

7:30pm

7:30pm–9:30pm Tuesday, 03/26/2019
Event
Location: Various Locations
Average rating: *****
(5.00, 1 rating)
Get to know your fellow attendees over dinner. We've made reservations for you at some of the most sought-after restaurants in town. This is a great chance to make new connections and sample some of the great cuisine San Francisco has to offer. Read more.

Wednesday, 03/27/2019

7:30am

7:30am–8:45am Wednesday, 03/27/2019
Location: 3rd floor lobby
Early morning coffee (1h 15m)

8:15am

8:15am–8:45am Wednesday, 03/27/2019
Event
Location: 3rd floor lobby
Average rating: ****.
(4.00, 3 ratings)
Gather before keynotes on Wednesday morning to enjoy casual conversation while meeting fellow attendees. If one of your goals at Strata is to meet new people, this session will jumpstart your networking opportunities. Read more.

8:55am

8:55am–9:05am Wednesday, 03/27/2019
Keynote
Location: Ballroom
Ben Lorica (O'Reilly), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
Average rating: ***..
(3.89, 9 ratings)
Program chairs Ben Lorica, Alistair Croll, and Doug Cutting welcome you to the first day of keynotes. Read more.

9:05am

9:05am–9:20am Wednesday, 03/27/2019
Keynote
Location: Ballroom
Amy O'Connor (Cloudera)
Average rating: ***..
(3.26, 46 ratings)
Cloudera “drinks its own champagne,” running Cloudera on Cloudera. The company analyzes data from the edge and runs probabilistic models to tune its business processes with AI, from marketing, sales, and support to strategic planning. Amy O'Connor shares what Cloudera has learned from the edge to AI and explains how it's helping Cloudera and its customers become more data driven.

9:20am

9:20am–9:25am Wednesday, 03/27/2019
Keynote
Location: Ballroom
Average rating: **...
(2.78, 27 ratings)
The journey to AI begins with data and making intelligent use of it. Dinesh Nirmal shares a strategic framework for streamlining your data assets, a framework that takes into account the current state of your existing data structures, the new technologies driving enterprise, the complexities of business processes, and at the foundation, the elements required in an AI-fluent data platform. Read more.

9:25am

9:25am–9:30am Wednesday, 03/27/2019
Keynote
Location: Ballroom
Jed Dougherty (Dataiku)
Average rating: ****.
(4.37, 41 ratings)
One widely accepted definition of AI is that it means going beyond simple statistics to mimic human skills in perception, learning, interaction, and decision making. Jed Dougherty tightens up this definition by sharing examples on a matrix that breaks down the different parts of that definition and how they might manifest themselves in data science projects at different levels. Read more.

9:30am

9:30am–9:40am Wednesday, 03/27/2019
Keynote
Location: Ballroom
Ben Lorica (O'Reilly)
Average rating: ****.
(4.21, 29 ratings)
Keynote with Ben Lorica

9:40am

9:40am–10:00am Wednesday, 03/27/2019
Keynote
Location: Ballroom
Secondary topics:  Security and Privacy
David Sanger (The New York Times)
Average rating: ****.
(4.32, 50 ratings)
David Sanger explains how the rise of cyberweapons has transformed geopolitics like nothing since the invention of the atomic bomb. From crippling infrastructure to sowing discord and doubt, cyber is now the weapon of choice for democracies, dictators, and terrorists. Read more.

10:00am

10:00am–10:20am Wednesday, 03/27/2019
Keynote
Location: Ballroom
Secondary topics:  Security and Privacy
Shafi Goldwasser (UC Berkeley | MIT | Weizmann Institute of Science | Duality)
Average rating: ***..
(3.41, 22 ratings)
Keynote with Shafi Goldwasser

10:30am

10:30am–11:00am Wednesday, 03/27/2019
Location: Expo Hall (Exhibit Hall - Level 1)
Morning break sponsored by Dataiku (30m)

11:00am

11:00am–11:40am Wednesday, 03/27/2019
Session
Data Engineering & Architecture
Location: 2001
Jitender Aswani (Netflix), Di Lin (Netflix), Girish Lingappa (Netflix)
Average rating: ***..
(3.40, 15 ratings)
Hundreds of thousands of ETL pipelines ingest over a trillion events daily to populate millions of data tables downstream at Netflix. Jitender Aswani, Girish Lingappa, and Di Lin discuss Netflix’s internal data lineage service, which was essential for enhancing the platform’s reliability, increasing trust in data, and improving data infrastructure efficiency.
11:00am–11:40am Wednesday, 03/27/2019
Session
Data Engineering & Architecture
Location: 2002
Jian Chang (Alibaba Group), Sanjian Chen (Alibaba Group)
Average rating: ****.
(4.50, 4 ratings)
Jian Chang and Sanjian Chen outline the design of the AI engine on Alibaba's TSDB service, which enables fast and complex analytics of large-scale retail data. They then share a successful case study of the Fresh Hema Supermarket, a major “new retail” platform operated by Alibaba Group, highlighting solutions to the major technical challenges in data cleaning, storage, and processing. Read more.
11:00am–11:40am Wednesday, 03/27/2019
Session
Sponsored
Location: 2003
Ian Swanson (Oracle)
Average rating: ***..
(3.00, 2 ratings)
Being an AI-driven enterprise earlier than a competitor is an opportunity within your reach. Join in to find out how, as Ian Swanson dives into problem domains, platform differentiators, ease of use, automation, and scale and shares best practices on quick starts with the right infrastructure choices.
11:00am–11:40am Wednesday, 03/27/2019
Session
Data Engineering & Architecture
Location: 2004
Shubham Tagra (Qubole)
Average rating: ***..
(3.50, 8 ratings)
Did you know you can run Presto in AWS at a tenth of the cost with AWS Spot nodes, with just a few architectural enhancements to Presto? Shubham Tagra explores the gaps in Presto architecture, explains how to use Spot nodes, covers enhancements, and showcases the improvements in terms of reliability and TCO achieved through them.
11:00am–11:40am Wednesday, 03/27/2019
Average rating: ****.
(4.50, 4 ratings)
Sam Lightstone discusses how AI is fundamentally changing computer science and the practice of coding. Join in to discover what machine learning means today and explore recent advances in hardware and software and breakthrough innovations. Read more.
11:00am–11:40am Wednesday, 03/27/2019
Session
Data Engineering & Architecture
Location: 2006
Osman Sarood (Mist Systems), Chunky Gupta (Mist Systems)
Average rating: ****.
(4.67, 3 ratings)
Osman Sarood and Chunky Gupta discuss Mist’s real-time data pipeline, focusing on Live Aggregators (LA)—a highly reliable and scalable in-house real-time aggregation system that can autoscale for sudden changes in load. LA is 80% cheaper than competing streaming solutions due to running over AWS Spot Instances and having 70% CPU utilization. Read more.
11:00am–11:40am Wednesday, 03/27/2019
Session
Sponsored
Location: 2007
Chris Bush (Levi's)
Average rating: ****.
(4.50, 2 ratings)
Building a data science practice in any environment is difficult. Integrating data science into a long-standing company with established processes, complex business operations, and global scale creates additional layers of complexity that need to be navigated. Chris Bush explains how Levi’s is tackling this challenge and shares the company's continuing evolution to leverage data science. Read more.
11:00am–11:40am Wednesday, 03/27/2019
Session
Data Engineering & Architecture
Location: 2008
Diego Oppenheimer (Algorithmia)
Average rating: ****.
(4.00, 11 ratings)
You've invested heavily in cleaning your data, feature engineering, training, and tuning your model, but now you have to deploy your model into production, and you discover it's a huge challenge. Diego Oppenheimer shares common architectural patterns and best practices from the most advanced organizations that are deploying models for scalability and accessibility.
11:00am–11:40am Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2009
Secondary topics:  Ethics, Financial Services
Jari Koister (FICO)
Average rating: ****.
(4.33, 3 ratings)
Financial services are increasingly deploying AI services for a wide range of applications, such as identifying fraud and financial crimes. Such deployment requires models to be interpretable, explainable, and resilient to adversarial attacks; regulatory requirements prohibit black-box machine learning models. Jari Koister shares tools and infrastructure FICO has developed to support these needs.
11:00am–11:40am Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2010
Robert Horton (Microsoft), Mario Inchiosa (Microsoft), Ali Zaidi (Microsoft)
Average rating: ****.
(4.70, 10 ratings)
Robert Horton, Mario Inchiosa, and Ali Zaidi demonstrate how to use three cutting-edge machine learning techniques—transfer learning from pretrained language models, active learning to make more effective use of a limited labeling budget, and hyperparameter tuning to maximize model performance—to up your modeling game. Read more.
11:00am–11:40am Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2011
Tristan Zajonc (Cloudera), Tim Chen (Cloudera)
Average rating: ****.
(4.40, 5 ratings)
Data platforms are being asked to support an ever increasing range of workloads and compute environments, including machine learning and elastic cloud platforms. Tristan Zajonc and Tim Chen discuss emerging capabilities, including running machine learning and Spark workloads on autoscaling container platforms, and share their vision for the road ahead for ML and AI in the cloud. Read more.
11:00am–11:40am Wednesday, 03/27/2019
Session
Sponsored
Location: 2014
Secondary topics:  Jupyter
Alan Chin (IBM), Luciano Resende (IBM)
Average rating: ****.
(4.75, 4 ratings)
Alan Chin and Luciano Resende explain how to introduce Jupyter Enterprise Gateway into new and existing notebook environments to enable a "bring your own notebook" model while simultaneously optimizing resources consumed by the notebook kernels running across managed clusters within the enterprise. Read more.
11:00am–11:40am Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2016
Jeremy Howard (fast.ai | USF | doc.ai and platform.ai)
Average rating: ****.
(4.80, 5 ratings)
Jeremy Howard describes how to leverage the latest research from the deep learning and HCI communities to train neural networks from scratch—without code or preexisting labels. He then shares case studies in fashion, retail and ecommerce, travel, and agriculture where these approaches have been used. Read more.
11:00am–11:40am Wednesday, 03/27/2019
Average rating: ***..
(3.40, 5 ratings)
Jaipaul Agonus and Daniel Monteiro do Carmo Rosa detail big data analytics and visualization practices and tools used by FINRA to support machine learning and other surveillance activities that the Market Regulation Department conducts in the AWS cloud. Read more.
11:00am–11:40am Wednesday, 03/27/2019
Mike Olson (Cloudera)
Average rating: ***..
(3.80, 5 ratings)
It's easier than ever to collect data, but managing it securely in compliance with regulations and legal constraints is harder. Mike Olson discusses the risks and the issues that matter most and explains how an enterprise data cloud that embraces your data center and the public cloud in combination can address them, delivering real business results for your organization. Read more.
11:00am–11:40am Wednesday, 03/27/2019
Session
Sponsored
Location: 2022
Average rating: ****.
(4.33, 3 ratings)
Raghu Chakravarth explores key considerations when building an Agile data warehouse and outlines a reference architecture for hybrid data. Read more.
11:00am–11:40am Wednesday, 03/27/2019
Session
Case studies, Strata Business Summit
Location: 2024
Secondary topics:  Media, Marketing, Advertising
Bysshe Easton (KIXEYE), Thomas Dobbs (KIXEYE)
Average rating: ****.
(4.50, 2 ratings)
As a fully closed model economy, games offer a unique opportunity to use analytics to create unique purchase opportunities for customers. Bysshe Easton and Thomas Dobbs explain how KIXEYE uses machine learning to create personalized offer recommendations for its customers, resulting in significantly increased monetization and retention. Read more.
11:00am–11:40am Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI, Expo Hall
Location: Expo Hall
Alon Kaufman (Duality), Vinod Vaikuntanathan (MIT and Duality Technologies)
Average rating: ***..
(3.75, 4 ratings)
Alon Kaufman and Vinod Vaikuntanathan discuss the challenges and opportunities of machine learning on encrypted data and describe the state of the art in this space. Read more.

11:50am

11:50am–12:30pm Wednesday, 03/27/2019
Session
Data Engineering & Architecture
Location: 2001
Sandeep Uttamchandani (Intuit)
Average rating: ****.
(4.57, 7 ratings)
How efficient is your data platform? The single metric Intuit uses is time to reliable insights: the total of time spent to ingest, transform, catalog, analyze, and publish. Sandeep Uttamchandani shares three design patterns/frameworks Intuit implemented to deal with three challenges to determining time to reliable insights: time to discover, time to catalog, and time to debug for data quality. Read more.
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/27/2019
Session
Data Engineering & Architecture
Location: 2002
Kurt Brown (Netflix)
Average rating: ****.
(4.22, 9 ratings)
The Netflix data platform is a massive-scale, cloud-only suite of tools and technologies. It includes big data tech (Spark and Flink), enabling services (federated metadata management), and machine learning support. But with power comes complexity. Kurt Brown explains how Netflix is working toward an easier, "self-service" data platform without sacrificing any enabling capabilities. Read more.
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/27/2019
Session
Sponsored
Location: 2003
Sarah Gates (SAS)
Average rating: ***..
(3.50, 2 ratings)
SAS empowers you with choice and control, helping you uncover insights from any data for better, faster decisions regardless of language. Sarah Gates shares methods for accelerating the analytics lifecycle, improving data preparation, quality, and governance, automating and speeding up time-consuming tasks, and quickly creating, selecting, and deploying models, be it one or thousands. Read more.
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/27/2019
Session
Data Engineering & Architecture
Location: 2004
Lars Volker (Cloudera), Michael Ho (Cloudera)
Average rating: ****.
(4.50, 6 ratings)
In recent years, Apache Impala has been deployed to clusters that are large enough to hit architectural limitations in the stack. Lars Volker and Michael Ho cover the efforts to address the scalability limitations in the now legacy Thrift RPC framework by using Apache Kudu's RPC, which was built from the ground up to support asynchronous communication, multiplexed connections, TLS, and Kerberos. Read more.
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/27/2019
Session
Sponsored
Location: 2005
Mehul Shah (Amazon Web Services)
Average rating: *****
(5.00, 2 ratings)
Mehul Shah offers an overview of serverless computing and details AWS Glue's serverless analytics features for data science, data discovery, data cleaning and transformation, and data lake management. Read more.
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/27/2019
Session
Data Engineering & Architecture
Location: 2006
Average rating: ****.
(4.60, 5 ratings)
In a large global health services company, streaming data for processing and sharing comes with its own challenges. Data science and analytics platforms need data fast, from relevant sources, to act on this data quickly and share the insights with consumers with the same speed and urgency. Join Mohammad Quraishi to learn why streaming data architectures are a necessity—Kafka and Hadoop are key. Read more.
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/27/2019
Session
Sponsored
Location: 2007
Latheef Syed (Verizon)
Average rating: ****.
(4.00, 1 rating)
Verizon wanted to use its BI on Big Data platform to enable real-time artificial intelligence and machine learning to identify friction points, detect anomalies on the fly, and fix issues instantly. Latheef Syed explains how Verizon utilizes Kyvos as a next-generation analytical platform that delivers real-time AI, ML, and BI. Read more.
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/27/2019
Session
Data Engineering & Architecture
Location: 2008
Tobias Knaup (Mesosphere), Joerg Schad (ArangoDB)
Average rating: ****.
(4.50, 2 ratings)
There are many great tutorials for training your deep learning models, but training is only a small part of the overall deep learning pipeline. Tobias Knaup and Joerg Schad offer an introduction to building a complete automated deep learning pipeline, from exploratory analysis through training, model storage, model serving, and monitoring. Read more.
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2009
Chakri Cherukuri (Bloomberg LP)
Average rating: ****.
(4.33, 3 ratings)
Quantitative finance is a rich field in which both sell-side and buy-side institutions employ advanced mathematical and statistical techniques. Chakri Cherukuri explains how machine learning and deep learning techniques are being used in quantitative finance and details how these models work under the hood. Read more.
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2010
Michael Johnson (Lockheed Martin), Norris Heintzelman (Lockheed Martin)
Average rating: ****.
(4.60, 15 ratings)
How do you train a machine learning model with no training data? Michael Johnson and Norris Heintzelman share their journey implementing multiple solutions to bootstrapping training data in the NLP domain, covering topics including weak supervision, building an active learning framework, and annotation adjudication for named-entity recognition. Read more.
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2011
Sarah Aerni (Salesforce)
Average rating: ****.
(4.25, 4 ratings)
How does Salesforce make data science an Agile partner to over 100,000 customers? Sarah Aerni shares the nuts and bolts of the platform and details the Agile process behind it. From open source autoML library TransmogrifAI and experimentation to deployment and monitoring, Sarah covers the tools that make it possible for data scientists to rapidly iterate and adopt a truly Agile methodology. Read more.
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/27/2019
Session
Sponsored
Location: 2014
Secondary topics:  Jupyter
Omoju Miller (GitHub)
Average rating: ***..
(3.50, 10 ratings)
GitHub has a relatively nascent ML group. Its major challenge is to integrate ML product building processes into a mature product engineering org. This means that it's responsible for building end-to-end ML products, from ETL to production. Omoju Miller details the many roles Jupyter notebooks play in the building of ML products. Read more.
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2016
Melinda Han Williams (Dstillery)
Average rating: ****.
(4.86, 14 ratings)
Customer segmentation based on coarse survey data is a staple of traditional market research. Melinda Han Williams explains how Dstillery uses neural networks to model the digital pathways of 100M consumers and uses the resulting embedding space to cluster customer populations into fine-grained behavioral segments and inform smarter consumer insights—in the process, creating a map of the internet. Read more.
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/27/2019
Session
Strata Business Summit
Location: 2018
Swatee Singh (American Express)
Average rating: ****.
(4.00, 3 ratings)
Organizations developing artificial intelligence and machine learning (AI/ML)-powered applications face two existential questions: Should they consider a fully or partially hybrid cloud environment for AI/ML deployments, and which public cloud will give them the most features and capabilities? Swatee Singh discusses available options for companies facing these challenges. Read more.
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/27/2019
Session
Law and Ethics, Strata Business Summit
Location: 2020
Secondary topics:  Ethics
Bill Franks (International Institute For Analytics)
Average rating: ****.
(4.67, 3 ratings)
Concerns are constantly being raised today about what data is appropriate to collect and how (or if) it should be analyzed. There are many ethical, privacy, and legal issues to consider, and no clear standards exist in many cases as to what is fair and what is foul. Bill Franks explores a variety of dilemmas and provides some guidance on how to approach them. Read more.
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/27/2019
Session
Sponsored
Location: 2022
Average rating: *****
(5.00, 1 rating)
Recently, Scott Mcclellan's team—which analyzes over six petabytes of data using Hadoop technology—created a high-performance data lake using object storage for consumption by big data workloads. Scott shares his experience deploying object storage for AI workloads. Read more.
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/27/2019
Maryam Jahanshahi (TapRecruit)
Average rating: ****.
(4.80, 5 ratings)
Hiring teams largely rely on both intuition and experience to scout talent for data science and data engineering roles. Drawing on results from analyzing over 15 million jobs and their outcomes, Maryam Jahanshahi interrogates these “common sense” judgments to determine whether they help or hurt hiring of data scientists and engineers. Read more.
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI, Expo Hall
Location: Expo Hall
Ron Bodkin (Google)
Average rating: ****.
(4.33, 6 ratings)
Google uses deep learning extensively in new and existing products. Join Ron Bodkin to learn how Google has used deep learning for recommendations at YouTube, in the Play store, and for customers in Google Cloud. You'll explore the role of embeddings, recurrent networks, contextual variables, and wide and deep learning and discover how to do candidate generation and ranking with deep learning. Read more.

12:30pm

Add to your personal schedule
12:30pm–2:40pm Wednesday, 03/27/2019
Event
Location: 3016
Average rating: *****
(5.00, 4 ratings)
If you’d like to make new professional connections and hear ideas for supporting diversity in the tech community, come to the diversity and inclusion networking lunch on Wednesday. Read more.
Add to your personal schedule
12:30pm–2:40pm Wednesday, 03/27/2019
Event
Location: Expo Hall (Exhibit Hall - Level 1)
Average rating: *****
(5.00, 1 rating)
Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics. Read more.
Add to your personal schedule
12:30pm–2:40pm Wednesday, 03/27/2019
Event
Location: Expo Hall
Average rating: *****
(5.00, 1 rating)
Join fellow executives, business leaders, and strategists for a networking lunch on Wednesday for Strata Business Summit attendees and speakers. Read more.

2:40pm

Add to your personal schedule
2:40pm–3:20pm Wednesday, 03/27/2019
Session
Data Engineering & Architecture
Location: 2001
James Taylor (Lyft)
Average rating: ***..
(3.56, 9 ratings)
James Taylor offers an overview of an automated feedback loop at Lyft to adapt ETL based on the aggregate cost of queries run across the cluster. He also discusses future work to enhance the system through the use of materialized views to reduce the number of ad hoc joins and sorting performed by the most expensive queries by transparently rewriting queries when possible. Read more.
Add to your personal schedule
2:40pm–3:20pm Wednesday, 03/27/2019
Session
Data Engineering & Architecture
Location: 2002
Yaron Haviv (iguazio)
Average rating: ****.
(4.00, 2 ratings)
Faced with the need to handle increasing volumes of data, alternative datasets ("alt data"), and AI, many enterprises are working to design or redesign their big data architectures, but traditional batch platforms fail to generate sufficient ROI. Yaron Haviv shares a continuous analytics approach that yields faster answers for the business while remaining simpler and less expensive for IT. Read more.
Add to your personal schedule
2:40pm–3:20pm Wednesday, 03/27/2019
Session
Sponsored
Location: 2003
Average rating: ****.
(4.00, 1 rating)
Stephen Dantu shares insurance broker Marsh’s pioneering journey into the public cloud and explains why this move was necessary to unleash new opportunities and future-proof the company. Read more.
Add to your personal schedule
2:40pm–3:20pm Wednesday, 03/27/2019
Session
Data Engineering & Architecture
Location: 2004
Zhenxiao Luo (Twitter)
Average rating: ****.
(4.09, 11 ratings)
From determining the most convenient rider pickup points to predicting the fastest routes, Uber uses data-driven analytics to create seamless trip experiences. Zhenxiao Luo explains how Uber supports real-time analytics with deep learning on the fly, without any data copying. Read more.
Add to your personal schedule
2:40pm–3:20pm Wednesday, 03/27/2019
Session
Sponsored
Location: 2005
Harinder Singh (AB inBev)
Average rating: ****.
(4.50, 4 ratings)
Harinder Singh explains how, over the course of two years, the world’s largest brewer completely modernized its data architecture and moved it to the cloud. By accelerating data analytics and freeing up the time of its data scientists, AB inBev has been able to better anticipate demand and production, streamline logistics, and develop new beverages that have become best-sellers. Read more.
Add to your personal schedule
2:40pm–3:20pm Wednesday, 03/27/2019
Session
Data Engineering & Architecture
Location: 2006
Adem Efe Gencer (LinkedIn)
Average rating: ***..
(3.50, 2 ratings)
Adem Efe Gencer explains how LinkedIn alleviated the management overhead of large-scale Kafka clusters using Cruise Control. Read more.
Add to your personal schedule
2:40pm–3:20pm Wednesday, 03/27/2019
Session
Sponsored
Location: 2007
Ashwin Ramachandran (Syncsort)
Average rating: ***..
(3.50, 2 ratings)
"Legacy" data sources like mainframes and data warehouses still power mission-critical applications, holding the historical and transactional insight essential for advanced analytics and real-time applications. Ashwin Ramachandran shares strategies, tools, and techniques for successfully deriving value from these sources using today's modern architectures while future-proofing for what lies ahead. Read more.
Add to your personal schedule
2:40pm–3:20pm Wednesday, 03/27/2019
Session
Data Engineering & Architecture
Location: 2008
Denise Gosnell (DataStax)
Average rating: ****.
(4.73, 11 ratings)
The graph community has spent years defining and describing its passion: applying graph thinking to solve difficult problems. Denise Gosnell leverages years of experience shipping large-scale applications built on graph databases to share practical and tangible decisions that come into play when designing and delivering distributed graph applications...or playing SimCity 2000. Read more.
Add to your personal schedule
2:40pm–3:20pm Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2009
Divya Choudhary (University of Southern California)
Average rating: ****.
(4.50, 2 ratings)
Divya Choudhary explains how GO-JEK uses random chat messages and notes written in a local language sent by customers to their drivers while waiting for a ride to arrive to carve out unparalleled information about pickup points and their names (which sometimes even Google Maps has no idea of) and help create a world-class customer pickup experience feature. Read more.
Add to your personal schedule
2:40pm–3:20pm Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2010
Secondary topics:  Ethics
Sharad Goel (Stanford University)
Average rating: ****.
(4.00, 4 ratings)
The nascent field of fair machine learning aims to ensure that decisions guided by algorithms are equitable. Several formal definitions of fairness have gained prominence, but, as Sharad Goel argues, nearly all of them suffer from significant statistical limitations. Perversely, when used as a design constraint, they can even harm the very groups they were intended to protect. Read more.
Add to your personal schedule
2:40pm–3:20pm Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2011
Secondary topics:  Model lifecycle management
Ted Dunning (MapR, now part of HPE)
Average rating: ****.
(4.70, 10 ratings)
Evaluating machine learning models is surprisingly hard, particularly because these systems interact in very subtle ways. Ted Dunning breaks the problem of evaluation apart into operational and functional evaluation, demonstrating how to do each without unnecessary pain and suffering. Along the way, he shares exciting visualization techniques that will help make differences strikingly apparent. Read more.
Add to your personal schedule
2:40pm–3:20pm Wednesday, 03/27/2019
Session
Sponsored
Location: 2014
Secondary topics:  Jupyter
M Pacer (Netflix)
Average rating: ****.
(4.57, 7 ratings)
M Pacer discusses two meanings of "Talking with Jupyter": talking to others with Jupyter notebooks and talking to Jupyter in the language of its standards, formats, and protocols. M describes tools, workflows, and patterns that make both kinds of talking with Jupyter easier while unlocking new ways of interacting with the Jupyter ecosystem. Read more.
Add to your personal schedule
2:40pm–3:20pm Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2016
Chenhui Hu (Microsoft)
Average rating: ****.
(4.67, 6 ratings)
Dilated neural networks are a class of recently developed neural networks that achieve promising results in time series forecasting. Chenhui Hu discusses representative network architectures of dilated neural networks and demonstrates their advantages in terms of training efficiency and forecast accuracy by applying them to solve sales forecasting and financial time series forecasting problems. Read more.
Add to your personal schedule
2:40pm–3:20pm Wednesday, 03/27/2019
John Haddad (Informatica)
Average rating: ****.
(4.60, 5 ratings)
Just like a powerful space telescope that scans the universe, a data catalog scans the data universe to help data scientists and analysts find, collaborate on, and curate data for analytic and data governance projects. John Haddad explains how a data catalog can help you find data you need and can trust for these projects. Read more.
Add to your personal schedule
2:40pm–3:20pm Wednesday, 03/27/2019
Session
Strata Business Summit
Location: 2020
Greg Quist (SmartCover Systems)
Average rating: ****.
(4.00, 1 rating)
SmartCover Systems has been providing an IoT solution to its customers for 15 years, using techniques honed in defense and remote sensing, gathering more than 200 million hours of sewer data. Greg Quist shares case studies and results from applying the IoT and AI to underground infrastructure. Read more.
Add to your personal schedule
2:40pm–3:20pm Wednesday, 03/27/2019
Session
Sponsored
Location: 2022
Jagane Sundar (WANdisco)
Average rating: ****.
(4.50, 2 ratings)
Jagane Sundar shares a system for replicating data across geographically distributed data centers and discusses the benefits of consistently replicating data that is used by TensorFlow for training. Read more.
Add to your personal schedule
2:40pm–3:20pm Wednesday, 03/27/2019
Eric Colson (Stitch Fix), Daragh Sibley (Stitch Fix)
Average rating: ****.
(4.79, 14 ratings)
A/B testing has revealed the fallibility in human intuition that typically drives business decisions. Eric Colson and Daragh Sibley describe some types of systematic errors domain experts commit, explain how cognitive biases arise from heuristic reasoning processes, and share several mechanisms to mitigate these human limitations and improve decision making. Read more.
Add to your personal schedule
2:40pm–3:20pm Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI, Expo Hall
Location: Expo Hall
Sonal Gupta (Facebook)
Average rating: ****.
(4.40, 5 ratings)
Sonal Gupta explores practical systems for building a conversational AI system for task-oriented queries and details a way to do more advanced compositional understanding, which can understand cross-domain queries, using hierarchical representations. Read more.

3:20pm

3:20pm–4:20pm Wednesday, 03/27/2019
Location: Expo Hall (Exhibit Hall - Level 1)
Afternoon break sponsored by IBM (1h)

4:20pm

Add to your personal schedule
4:20pm–5:00pm Wednesday, 03/27/2019
Session
Data Engineering & Architecture
Location: 2001
Alex Kira (Uber)
Average rating: ****.
(4.00, 13 ratings)
Uber operates at scale, with thousands of microservices serving millions of rides a day, leading to 100+ PB of data. Alex Kira details Uber's journey toward a unified and scalable data workflow system used to manage this data and shares the challenges faced and how the company has rearchitected the system to make it highly available and horizontally scalable. Read more.
Add to your personal schedule
4:20pm–5:00pm Wednesday, 03/27/2019
Session
Data Engineering & Architecture
Location: 2002
Jowanza Joseph (Pluralsight), Karthik Ramasamy (Streamlio)
Average rating: ****.
(4.00, 1 rating)
After two years of running streaming pipelines through Kinesis and Spark at One Click Retail, Jowanza Joseph and Karthik Ramasamy decided to explore a new platform that would take advantage of Kubernetes and support a simpler data processing DSL. Join in to discover why they chose Apache Pulsar and learn tips and tricks for using Pulsar Functions. Read more.
Add to your personal schedule
4:20pm–5:00pm Wednesday, 03/27/2019
Session
Sponsored
Location: 2003
Average rating: ***..
(3.00, 1 rating)
MySQL is great but has limits. When you need key-value pair storage with geospatial and JSON support, easy and fast ingestion from various streams, aggregate queries against 100+ million rows in under one second, and more, there's only one solution. Franck Leveneur explains how on-demand dog walking service Wag! uses MemSQL to take its real-time data access and reporting to the next level. Read more.
Add to your personal schedule
4:20pm–5:00pm Wednesday, 03/27/2019
Session
Data Engineering & Architecture
Location: 2004
Julien Le Dem (WeWork)
Average rating: ****.
(4.83, 6 ratings)
Big data infrastructure has evolved from flat files in a distributed filesystem to an efficient ecosystem to a fully deconstructed and open source database with reusable components. Julien Le Dem discusses the key open source components of the big data ecosystem and explains how they relate to each other and how they make the ecosystem more of a database and less of a filesystem. Read more.
Add to your personal schedule
4:20pm–5:00pm Wednesday, 03/27/2019
Session
Sponsored
Location: 2005
Yang Li (Kyligence)
Average rating: ****.
(4.00, 1 rating)
Augmenting data management and analytics platforms with artificial intelligence and machine learning is game changing for analysts, engineers, and other users. It enables companies to optimize their storage, speed, and spending. Yang Li details the Kyligence platform, which is evolving to the next level with augmented capabilities such as intelligent modeling, smart pushdowns, and more. Read more.
Add to your personal schedule
4:20pm–5:00pm Wednesday, 03/27/2019
Session
Data Engineering & Architecture
Location: 2006
Sean Glover (Lightbend)
Average rating: ****.
(4.00, 1 rating)
The best way to run stateful services with complex operational needs like Kafka is to use the operator pattern. Sean Glover offers an overview of the Strimzi Kafka Operator, a popular new open source Operator-based Apache Kafka implementation on Kubernetes. Read more.
Add to your personal schedule
4:20pm–5:00pm Wednesday, 03/27/2019
Session
Data Engineering & Architecture
Location: 2008
Secondary topics:  Model lifecycle management
Corey Zumar (Databricks)
Average rating: ****.
(4.89, 9 ratings)
Developing applications that leverage machine learning is difficult. Practitioners need to be able to reproduce their model development pipelines, as well as deploy models and monitor their health in production. Corey Zumar offers an overview of MLflow, which simplifies this process by managing, reproducing, and operationalizing machine learning through a suite of model tracking and deployment APIs. Read more.
Add to your personal schedule
4:20pm–5:00pm Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2009
Yogesh Pandit (Roche), Saif Addin Ellafi (John Snow Labs), Vishakha Sharma (Roche Molecular Solutions)
Average rating: ****.
(4.67, 3 ratings)
Yogesh Pandit, Saif Addin Ellafi, and Vishakha Sharma discuss how Roche applies Spark NLP for healthcare to extract clinical facts from pathology and radiology reports. They then detail the design of the deep learning pipelines used to simplify training, optimization, and inference of such domain-specific models at scale. Read more.
Add to your personal schedule
4:20pm–5:00pm Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2010
Ying Yau (Walmart Labs)
Average rating: ***..
(3.29, 7 ratings)
Time series forecasting techniques are applied in a wide range of scientific disciplines, business scenarios, and policy settings. Jeffrey Yau discusses the applications of statistical time series models, such as ARIMA, VAR, and regime-switching models, and machine learning models, such as random forest and neural network-based models, to forecasting problems. Read more.
Add to your personal schedule
4:20pm–5:00pm Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2011
Kelley Rivoire (Stripe)
Average rating: ****.
(4.33, 3 ratings)
Production ML applications benefit from reproducible, automated retraining and deployment of ever-more predictive models trained on ever-increasing amounts of data. Kelley Rivoire explains how Stripe built a flexible API for training machine learning models that's used to train thousands of models per week on Kubernetes, supporting automated deployment of new models with improved performance. Read more.
Add to your personal schedule
4:20pm–5:00pm Wednesday, 03/27/2019
Session
Sponsored
Location: 2014
Secondary topics:  Jupyter
Chris Holdgraf (Berkeley Institute for Data Science)
Average rating: ****.
(4.75, 4 ratings)
Chris Holdgraf shares recent tools from the Jupyter project in partnership with UC Berkeley that facilitate communication with Jupyter and get us closer to displaying notebook-style content in a more discoverable and reader-friendly form—allowing you to turn collections of notebooks into an online book and connect this content with the cloud in order to make your online content interactive. Read more.
Add to your personal schedule
4:20pm–5:00pm Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2016
Secondary topics:  Deep Learning, Retail and e-commerce
Luyang Wang (Restaurant Brands International), Jing (Nicole) Kong (Office Depot), Guoqiong Song (Intel), Maneesha Bhalla (Office Depot)
Average rating: ****.
(4.00, 2 ratings)
User-based real-time recommendation systems have become an important topic in ecommerce. Lu Wang, Nicole Kong, Guoqiong Song, and Maneesha Bhalla demonstrate how to build deep learning algorithms using Analytics Zoo with BigDL on Apache Spark and create an end-to-end system to serve real-time product recommendations. Read more.
Add to your personal schedule
4:20pm–5:00pm Wednesday, 03/27/2019
Secondary topics:  Visualization, Design, and UX
Average rating: ****.
(4.50, 4 ratings)
Maxime Beauchemin offers an overview of Apache Superset, discussing the project's open source development dynamics, security, architecture, and underlying technologies as well as the key items on its roadmap. Read more.
Add to your personal schedule
4:20pm–5:00pm Wednesday, 03/27/2019
Paco Nathan (derwen.ai)
Average rating: ***..
(3.67, 6 ratings)
Effective data governance is foundational for AI adoption in enterprise, but it's an almost overwhelming topic. Paco Nathan offers an overview of its history, themes, tools, process, standards, and more. Join in to learn what impact machine learning has on data governance and vice versa. Read more.
Add to your personal schedule
4:20pm–5:00pm Wednesday, 03/27/2019
Session
Sponsored
Location: 2022
Joel Hron (ThoughtTrace), Nick Vandivere (ThoughtTrace)
Average rating: ****.
(4.00, 1 rating)
Building a SaaS AI company targeted at enterprise users presents unique challenges, both technical and nontechnical. Joel Hron and Nick Vandivere walk you through ThoughtTrace's journey, highlighting its beginnings as a company and sharing the challenging use cases the company tackled first. Read more.
Add to your personal schedule
4:20pm–5:00pm Wednesday, 03/27/2019
Session
Strata Business Summit
Location: 2024
Ashley Fontana (Zetta), Katherine Boyle (General Catalyst), Sarah Catanzaro (Amplify Partners), Arif Janmohamed (Lightspeed Venture Partners), Lan Xuezhao (Basis Set Ventures)
Average rating: ****.
(4.00, 1 rating)
What does it mean to be an AI investor? How is this approach different from traditional venture capital? Ash Fontana and Katherine Boyle share their perspectives on investments in machine intelligence and data science. Read more.
Add to your personal schedule
4:20pm–5:00pm Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI, Expo Hall
Location: Expo Hall
Gungor Polatkan (LinkedIn)
Average rating: ****.
(4.33, 3 ratings)
Talent search systems at LinkedIn strive to match the potential candidates to the hiring needs of a recruiter expressed in terms of a search query. Gungor Polatkan shares the results of the company's deployment of deep learning models on a real-world production system serving 500M+ users through LinkedIn Recruiter. Read more.

5:10pm

Add to your personal schedule
5:10pm–5:50pm Wednesday, 03/27/2019
Session
Data Engineering & Architecture
Location: 2001
Gwen Shapira (Confluent)
Average rating: ****.
(4.64, 11 ratings)
As microservices, data services, and serverless APIs proliferate, data engineers need to collect and standardize data in an increasingly complex and diverse system. Gwen Shapira discusses how data engineering requirements have changed in a cloud native world and shares architectural patterns that are commonly used to build flexible, scalable, and reliable data pipelines. Read more.
Add to your personal schedule
5:10pm–5:50pm Wednesday, 03/27/2019
Session
Data Engineering & Architecture
Location: 2002
Rustem Feyzkhanov (Instrumental)
Average rating: ***..
(3.50, 8 ratings)
Serverless implementation of core processing is quickly becoming a production-ready solution. However, companies with existing processing pipelines may find it hard to go completely serverless. Serverless workflows unite the serverless and cluster worlds, with the benefits of both approaches. Rustem Feyzkhanov demonstrates how serverless workflows change your perception of software architecture. Read more.
Add to your personal schedule
5:10pm–5:50pm Wednesday, 03/27/2019
Session
Sponsored
Location: 2003
Geoff Tudor (Vizion.ai)
Average rating: *....
(1.00, 2 ratings)
Elasticsearch is powerful. In its current form, it's also nontrivial and rather expensive to deploy. Not very "elastic." Fortunately, innovations like serverless and microservices are eliminating these barriers, lowering upfront costs, and reducing complexity. Geoff Tudor explains how this is unfolding in the market. Read more.
Add to your personal schedule
5:10pm–5:50pm Wednesday, 03/27/2019
Session
Data Engineering & Architecture
Location: 2004
Tim Armstrong (Cloudera)
Average rating: ****.
(4.80, 5 ratings)
As the popularity and utilization of Apache Impala deployments increase, clusters often become victims of their own success when demand for resources exceeds the supply. Tim Armstrong dives into the latest resource management features in Impala to maintain high cluster availability and optimal performance and provides examples of how to configure them in your Impala deployment. Read more.
Add to your personal schedule
5:10pm–5:50pm Wednesday, 03/27/2019
Average rating: ****.
(4.50, 2 ratings)
GE produces a third of the world's power and 60% of its airplane engines—a critical portion of the world's infrastructure that requires meticulous monitoring of the hundreds of sensors streaming data from each turbine. June Andrews and John Rutherford explain how GE's monitoring and diagnostics teams released the first real-time ML systems used to determine turbine health into production. Read more.
Add to your personal schedule
5:10pm–5:50pm Wednesday, 03/27/2019
Session
Data Engineering & Architecture
Location: 2008
Skyler Thomas (MapR), Terry He (MapR Technologies)
Average rating: ****.
(4.75, 4 ratings)
KubeFlow separates compute and storage to provide the ability to deploy best-of-breed open source systems for machine learning to any cluster running Kubernetes, whether on-premises or in the cloud. Skyler Thomas and Terry He explore the problems of state and storage and explain how distributed persistent storage can logically extend the compute flexibility provided by KubeFlow. Read more.
Add to your personal schedule
5:10pm–5:50pm Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2009
Rakesh Kumar (Lyft), Thomas Weise (Lyft)
Average rating: ****.
(4.00, 3 ratings)
Rakesh Kumar and Thomas Weise explore how Lyft dynamically prices its rides with a combination of various data sources, ML models, and streaming infrastructure for low latency, reliability, and scalability—allowing the pricing system to be more adaptable to real-world changes. Read more.
Add to your personal schedule
5:10pm–5:50pm Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2010
Secondary topics:  Security and Privacy
Mike Lee Williams (Cloudera Fast Forward Labs)
Average rating: ****.
(4.00, 1 rating)
Imagine building a model whose training data is collected on edge devices such as cell phones or sensors. Each device collects data unlike any other, and the data cannot leave the device because of privacy concerns or unreliable network access. This challenging situation is known as federated learning. Mike Lee Williams discusses the algorithmic solutions and the product opportunities. Read more.
Add to your personal schedule
5:10pm–5:50pm Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2011
Ting-Fang Yen (DataVisor)
Average rating: ****.
(4.00, 3 ratings)
Ting-Fang Yen details an approach for monitoring production machine learning systems that handle billions of requests daily by discovering detection anomalies, such as spurious false positives, as well as gradual concept drifts when the model no longer captures the target concept. Join in to explore new tools for detecting undesirable model behaviors early in large-scale online ML systems. Read more.
Add to your personal schedule
5:10pm–5:50pm Wednesday, 03/27/2019
Session
Sponsored
Location: 2014
Secondary topics:  Jupyter
Average rating: ***..
(3.43, 7 ratings)
Project Jupyter is very popular for data science, data exploration, and visualization. Manu Mukerji and Justin Driemeyer explain how to use it for AI/ML in a production environment. Read more.
Add to your personal schedule
5:10pm–5:50pm Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2016
Zhenxiao Luo (Twitter)
Average rating: ****.
(4.00, 4 ratings)
From determining the most convenient rider pickup points to predicting the fastest routes, Uber uses data-driven analytics to create seamless trip experiences. Inside Uber, analysts are using deep learning and big data to train models, make predictions, and run analytics in real time. Zhenxiao Luo explains how Uber runs real-time analytics with deep learning. Read more.
Add to your personal schedule
5:10pm–5:50pm Wednesday, 03/27/2019
Dave Stuart (Department of Defense)
Average rating: ****.
(4.38, 8 ratings)
Many organizations look to add data science to their skill portfolios by hiring data science experts. Dave Stuart shares a complementary way to build a data science-savvy workforce that nets tremendous value: using Jupyter to introduce basic data science practices to domain experts and business analysts. Read more.
Add to your personal schedule
5:10pm–5:50pm Wednesday, 03/27/2019
Average rating: **...
(2.50, 4 ratings)
How do you decide if you should invest in upskilling business teams? The question is no longer "if" but "when" and "how." Barkha Gvalani shares a framework for developing and delivering analytics training to nontechnical users. Read more.
Add to your personal schedule
5:10pm–5:50pm Wednesday, 03/27/2019
Session
Location: 2022
Average rating: ****.
(4.00, 1 rating)
Satheesh Bandaram and Saumitra Buragohain detail how IBM and Cloudera are advancing AI and ML for their customers with solutions to build on-premises or cloud-based secure governed data lakes. Read more.
Add to your personal schedule
5:10pm–5:50pm Wednesday, 03/27/2019
Session
Case studies, Strata Business Summit
Location: 2024
Secondary topics:  Media, Marketing, Advertising
Eric Bradlow (The Wharton School), Zachery Anderson (Electronic Arts)
Average rating: ***..
(3.00, 1 rating)
Eric Bradlow and Zachery Anderson discuss the Wharton Customer Analytics Initiative research opportunity process and explain how EA solved some of its business problems by sharing its data with 11 teams of researchers from around the world. Read more.
Add to your personal schedule
5:10pm–5:50pm Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI, Expo Hall
Location: Expo Hall
Kevin Moore (Salesforce)
Average rating: ****.
(4.50, 2 ratings)
Kevin Moore walks you through how TransmogrifAI, Salesforce's open source AutoML library built on Spark, automatically generates models customized to a company's dataset and use case and provides insights into why a model makes the predictions it does. Read more.

5:50pm

Add to your personal schedule
5:50pm–6:50pm Wednesday, 03/27/2019
Event
Location: Expo Hall (Exhibit Hall - Level 1)
Average rating: *****
(5.00, 2 ratings)
Make your way from booth to booth while you check out all the exhibitors in the Expo Hall on Wednesday after sessions end. Read more.

6:50pm

6:50pm–7:30pm Wednesday, 03/27/2019
Location: On your own
Dinner (40m)

7:30pm

Add to your personal schedule
7:30pm–9:30pm Wednesday, 03/27/2019
Event
Location: SPIN, 690 Folsom St., San Francisco
Average rating: *****
(5.00, 2 ratings)
Don't miss an exciting evening filled with cocktails, food, and entertainment at Data After Dark at Strata San Francisco. Read more.

Thursday, 03/28/2019

8:00am

8:00am–8:45am Thursday, 03/28/2019
Location: 3rd floor lobby
Break (45m)

8:15am

Add to your personal schedule
8:15am–8:45am Thursday, 03/28/2019
Event
Location: 3rd floor lobby
Average rating: ***..
(3.67, 3 ratings)
Gather before keynotes on Thursday morning to enjoy casual conversation while meeting fellow attendees. If one of your goals at Strata is to meet new people, this session will jumpstart your networking opportunities. Read more.

8:45am

Add to your personal schedule
8:45am–8:50am Thursday, 03/28/2019
Keynote
Location: Ballroom
Ben Lorica (O'Reilly), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
Average rating: ****.
(4.20, 5 ratings)
Program chairs Ben Lorica, Doug Cutting, and Alistair Croll welcome you to the second day of keynotes. Read more.

8:50am

Add to your personal schedule
8:50am–9:00am Thursday, 03/28/2019
Keynote
Location: Ballroom
Elizabeth Svoboda (What Makes a Hero?)
Average rating: ***..
(3.89, 18 ratings)
Using biosensors and predictive analytics, political campaigns aim to decode your true desires—and influence your vote—without your knowledge. Elizabeth Svoboda explains how these tools work, who's using them, and what they mean for the future of free and fair elections. Read more.

9:00am

Add to your personal schedule
9:00am–9:10am Thursday, 03/28/2019
Keynote
Location: Ballroom
Jordan Tigani (Google)
Average rating: ***..
(3.45, 11 ratings)
Modern data analysis requirements have fundamentally redefined what our expectations should be for data warehouses. Join Google BigQuery cocreator Jordan Tigani as he shares his vision for where he sees cloud-scale data analytics heading as well as what technology leaders should be considering as part of their data warehousing roadmap. Read more.

9:10am

Add to your personal schedule
9:10am–9:25am Thursday, 03/28/2019
Keynote
Location: Ballroom
Lauren Kunze (Pandorabots)
Average rating: ****.
(4.85, 34 ratings)
Keynote with Lauren Kunze. Read more.

9:25am

Add to your personal schedule
9:25am–9:35am Thursday, 03/28/2019
Keynote
Location: Ballroom
Mike Olson (Cloudera)
Average rating: ***..
(3.29, 17 ratings)
Most enterprises want the same flexibility and convenience they get in the public cloud, no matter where their data lives or their applications run. We've reached the point that the "enterprise data cloud" must span the firewall and the services offered by hyperscale vendors. Mike Olson describes the key capabilities that such a system requires and why hybrid and multicloud is the future. Read more.

9:35am

Add to your personal schedule
9:35am–9:45am Thursday, 03/28/2019
Keynote
Location: Ballroom
Average rating: **...
(2.62, 8 ratings)
The Strata Data Awards recognize the most innovative startups, leaders, and data science projects from Strata sponsors and exhibitors around the world. Join us during keynotes for the announcement of the winners. Read more.

9:45am

Add to your personal schedule
9:45am–9:55am Thursday, 03/28/2019
Keynote
Location: Ballroom
Theresa Johnson (Airbnb)
Average rating: ****.
(4.22, 18 ratings)
Airbnb uses AI and machine learning in many parts of its user-facing business. But it's also advancing the state of AI-powered internal tools. Theresa Johnson details the AI powering Airbnb's next-generation end-to-end metrics forecasting platform, which leverages machine learning, Bayesian inference, TensorFlow, Hadoop, and web technology. Read more.

9:55am

Add to your personal schedule
9:55am–10:10am Thursday, 03/28/2019
Keynote
Location: Ballroom
Zachery Anderson (Electronic Arts)
Average rating: ****.
(4.54, 24 ratings)
Developing games at EA is where creativity meets AI, analytics, and machine learning, combining an understanding of player motivations with the means to improve the game design process. Zachery Anderson leads a tour of EA’s history combining data with development, taking you through the early days of balancing gameplay to the future of personalized games for everyone. Read more.

10:10am

Add to your personal schedule
10:10am–10:25am Thursday, 03/28/2019
Keynote
Location: Ballroom
Secondary topics:  Security and Privacy
Peter Singer (New America)
Average rating: ****.
(4.80, 20 ratings)
Terrorists live-stream their attacks, “Twitter wars” sell music albums and produce real-world casualties, and viral misinformation alters not just the result of battles but the very fate of nations. The result is that war, tech, and politics have blurred into a new kind of battle space that plays out on our smartphones. P. W. Singer explains. Read more.

10:30am

10:30am–11:00am Thursday, 03/28/2019
Location: Expo Hall (Exhibit Hall - Level 1)
Morning break sponsored by Google Cloud (30m)

11:00am

Add to your personal schedule
11:00am–11:40am Thursday, 03/28/2019
Session
Data Engineering & Architecture
Location: 2001
Mark Grover (Lyft), Tao Feng (Lyft)
Average rating: ****.
(4.40, 10 ratings)
Lyft has reduced the time it takes to discover data by 10x by building its own data portal, Amundsen. Mark Grover and Tao Feng offer a demo of Amundsen and lead a deep dive into its architecture, covering how it leverages centralized metadata, PageRank, and a comprehensive data graph to achieve its goal. They also explore the future roadmap, unsolved problems, and its collaboration model. Read more.
Add to your personal schedule
11:00am–11:40am Thursday, 03/28/2019
Session
Data Engineering & Architecture
Location: 2002
Subhadra Tatavarti (PayPal), Chen Kovacs (PayPal)
Average rating: ****.
(4.12, 8 ratings)
The PayPal data ecosystem is large, with 250+ PB of data transacting in 200+ countries. Given this massive scale and complexity, discovering and accessing the right datasets in a frictionless environment is a challenge. Subhadra Tatavarti and Chen Kovacs explain how PayPal’s data platform team is helping solve this problem with a combination of self-service integrated and interoperable products. Read more.
Add to your personal schedule
11:00am–11:40am Thursday, 03/28/2019
Session
Data Engineering & Architecture
Location: 2004
Kamil Bajda-Pawlikowski (Starburst), Martin Traverso (Presto Software Foundation)
Average rating: ***..
(3.33, 3 ratings)
Kamil Bajda-Pawlikowski and Martin Traverso explore Presto's recently introduced cost-based optimizer, which must account for heterogeneous inputs with differing and often incomplete data statistics, and detail use cases for Presto across several industries. They also share recent Presto advancements, such as geospatial analytics at scale, and the project roadmap going forward. Read more.
Add to your personal schedule
11:00am–11:40am Thursday, 03/28/2019
Session
Sponsored
Location: 2005
Prakhar Mehrotra (Walmart Labs)
Average rating: ****.
(4.14, 7 ratings)
Prakhar Mehrotra shares Walmart’s digital transformation journey and explains how the company is using recent advancements in machine learning to power core retail operations like pricing, assortment, and replenishment. Along the way, Prakhar demonstrates how to leverage human expertise and use it as feedback to improve your algorithms. Read more.
Add to your personal schedule
11:00am–11:40am Thursday, 03/28/2019
Sijie Guo (StreamNative), Penghui Li (Zhaopin)
Average rating: ****.
(4.00, 1 rating)
Using a messaging system to build an event bus is very common. However, certain use cases demand a messaging system with a certain set of features. Sijie Guo and Penghui Li discuss the event bus requirements for Zhaopin.com, one of China's biggest online recruitment services providers, and explain why the company chose Apache Pulsar. Read more.
Add to your personal schedule
11:00am–11:40am Thursday, 03/28/2019
Session
Data Engineering & Architecture
Location: 2007
Eric Jonas (UC Berkeley)
Average rating: ****.
(4.50, 2 ratings)
Eric Jonas offers a quick history of cloud computing, including an accounting of the predictions of the 2009 "Berkeley View of Cloud Computing" paper, explains the motivation for serverless computing, describes applications that stretch the current limits of serverless, and then lists obstacles and research opportunities required for serverless computing to fulfill its full potential. Read more.
Add to your personal schedule
11:00am–11:40am Thursday, 03/28/2019
Session
Data Engineering & Architecture
Location: 2008
Yue Li (MemVerge), Shouwei Chen (Rutgers University)
Average rating: *****
(5.00, 4 ratings)
JD.com recently designed a brand-new architecture to optimize Spark computing clusters. Yue Li and Shouwei Chen detail the problems the team faced when building it and explain how the company now benefits from the in-memory distributed filesystem. Read more.
Add to your personal schedule
11:00am–11:40am Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2009
Secondary topics:  Media, Marketing, Advertising
Boris Yakubchik (Forbes), Salah Zalatimo (Forbes)
Average rating: ****.
(4.50, 2 ratings)
Boris Yakubchik and Salah Zalatimo offer an overview of Bertie, Forbes's new publishing platform—an AI assistant that learns from writers and suggests improvements—and detail Bertie’s features, architecture, and ultimate goals, paying special attention to how the company implemented an ensemble of machine learning models that, together, make up the AI assistant's skill set and personality. Read more.
Add to your personal schedule
11:00am–11:40am Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2010
Ram Shankar Siva Kumar (Microsoft (Azure Security))
Average rating: ****.
(4.33, 3 ratings)
How can we guarantee that the ML system we develop is adequately protected from adversarial manipulation? Ram Shankar Siva Kumar shares a framework and corresponding best practices to quantitatively assess the safety of your ML systems. Read more.
Add to your personal schedule
11:00am–11:40am Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2011
Sourav Dey (Manifold)
Average rating: ****.
(4.75, 4 ratings)
Clustered data is all around us. The best way to attack it? Mixed effect models. Sourav Dey explains how the mixed effects random forests (MERF) model and Python package marry the world of classical mixed effect modeling with modern machine learning algorithms and shows how MERF can be extended for use with other advanced modeling techniques like gradient boosting machines and deep learning. Read more.
Add to your personal schedule
11:00am–11:40am Thursday, 03/28/2019
Session
Future of the Firm
Location: 2014
Josh Bersin (Bersin by Deloitte)
Average rating: *****
(5.00, 3 ratings)
Josh Bersin explains how firms are transforming for the digital era, covering the death of the traditional organizational hierarchy, new models of leadership and management, changes in the way people learn and progress, new models of pay, and the importance of trust and transparency as a central business value. Read more.
Add to your personal schedule
11:00am–11:40am Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2016
Secondary topics:  Deep Learning, Security and Privacy
Fang Yu (DataVisor)
Average rating: ***..
(3.75, 4 ratings)
Online fraud flourishes as online services become ubiquitous in our daily life. Fang Yu explains how DataVisor leverages cutting-edge deep learning technologies to address the challenges in large-scale fraud detection. Read more.
Add to your personal schedule
11:00am–11:40am Thursday, 03/28/2019
Marc Paradis (UnitedHealth Group)
Average rating: ****.
(4.75, 4 ratings)
Data Science University (DSU) was established to bring analytics education to UnitedHealth Group, the world’s largest healthcare company, with over 270,000 employees. Marc Paradis explains how DSU was built out over time in an era of rapidly changing analytics technology and capabilities in an industry ripe for disruption, covering the challenges faced and lessons learned. Read more.
Add to your personal schedule
11:00am–11:40am Thursday, 03/28/2019
Session
Law and Ethics, Strata Business Summit
Location: 2020
Nick Curcuru (Mastercard)
Average rating: ****.
(4.50, 2 ratings)
Data—in part, harvested personal data—brings industries unprecedented insights about customer behavior. We know more about our customers and neighbors than at any other time in history, but we need to avoid crossing the "creepy" line. Nick Curcuru discusses how ethical behavior drives trust, especially in today's IoT age. Read more.
Add to your personal schedule
11:00am–11:40am Thursday, 03/28/2019
Session
Data Engineering & Architecture
Location: 2024
Thomas Phelan (HPE BlueData)
Average rating: ****.
(4.50, 2 ratings)
Recent headline-grabbing data breaches demonstrate that protecting data is essential for every enterprise. The best-of-breed approach for big data is HDFS configured with Transparent Data Encryption (TDE). But TDE is difficult to configure and manage—particularly when run in Docker containers. Thomas Phelan discusses these challenges and explains how to overcome them. Read more.
Add to your personal schedule
11:00am–11:40am Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI, Expo Hall
Location: Expo Hall
Secondary topics:  Security and Privacy, Storage
Alex Ingerman (Google)
Average rating: ****.
(4.67, 12 ratings)
Federated learning is an approach for training ML models across a fleet of participating devices without collecting their data in a central location. Alex Ingerman offers an overview of federated learning, compares traditional and federated ML workflows, and explores the current and upcoming use cases for decentralized machine learning, with examples from Google's deployment of this technology. Read more.

11:50am

Add to your personal schedule
11:50am–12:30pm Thursday, 03/28/2019
Session
Data Engineering & Architecture
Location: 2001
Jason Wang (Cloudera), Sushant Rao (Cloudera)
Average rating: ****.
(4.00, 2 ratings)
Jason Wang and Sushant Rao offer an overview of cloud architecture, then go into detail on core cloud paradigms like compute (virtual machines), cloud storage, authentication and authorization, and encryption and security. They conclude by bringing these concepts together through customer stories to demonstrate how real-world companies have leveraged the cloud for their big data platforms. Read more.
Add to your personal schedule
11:50am–12:30pm Thursday, 03/28/2019
Session
Data Engineering & Architecture
Location: 2002
Average rating: ****.
(4.75, 4 ratings)
Juan Paulo Gutierrez explains how a small team in Tokyo went through several evolutions as they built an analytics service to help 200+ businesses accelerate their decision-making process. Join in to hear about the background, challenges, architecture, success stories, and best practices as they built and productionalized Rakuten Analytics. Read more.
Add to your personal schedule
11:50am–12:30pm Thursday, 03/28/2019
Session
Sponsored
Location: 2003
Adam Famularo (erwin, Inc.)
Average rating: ****.
(4.00, 1 rating)
Adam Famularo showcases erwin's combination of data management and data governance to produce actionable insights. Erwin customer Nasdaq then shares a real-world use case. You'll learn how to answer tough data questions, how to maintain a metadata landscape, and how to use data management and governance to produce actionable insights. Read more.
Add to your personal schedule
11:50am–12:30pm Thursday, 03/28/2019
Fabian Hueske (Ververica)
Average rating: ****.
(4.30, 10 ratings)
Processing streaming data with SQL is becoming increasingly popular. Fabian Hueske explains why SQL queries on streams should have the same semantics as SQL queries on static data. He then shares a selection of common use cases and demonstrates how easily they can be addressed with Flink SQL. Read more.
Add to your personal schedule
11:50am–12:30pm Thursday, 03/28/2019
Session
Sponsored
Location: 2005
Priyank Patel (Arcadia Data)
Average rating: ****.
(4.00, 1 rating)
With cloud object storage, you'd expect business intelligence (BI) applications to benefit from the scale of data and real-time analytics. However, traditional BI in the cloud surfaces non-obvious challenges. Priyank Patel reviews service-oriented cloud design (storage, compute, catalog, security, SQL) and shows how native cloud BI provides analytic depth, low cost, and high performance. Read more.
Add to your personal schedule
11:50am–12:30pm Thursday, 03/28/2019
Session
Data Engineering & Architecture
Location: 2006
Vivek Pasari (Netflix), Jitender Aswani (Netflix)
Average rating: ***..
(3.14, 7 ratings)
Netflix has over 125 million members spread across 191 countries. Each day its members interact with its client applications on 250 million+ devices under highly variable network conditions. These interactions result in over 200 billion daily data points. Vivek Pasari dives into the data engineering and architecture that enables application performance measurement at this scale. Read more.
Add to your personal schedule
11:50am–12:30pm Thursday, 03/28/2019
Avner Braverman (Binaris)
Average rating: ****.
(4.00, 3 ratings)
What is serverless, and how can it be utilized for data analysis and AI? Avner Braverman outlines the benefits and limitations of serverless with respect to data transformation (ETL), AI inference and training, and real-time streaming. This is a technical talk, so expect demos and code. Read more.
Add to your personal schedule
11:50am–12:30pm Thursday, 03/28/2019
Session
Data Engineering & Architecture
Location: 2008
Fait Poms (Stanford University), Will Crichton (Stanford University)
Average rating: ****.
(4.75, 4 ratings)
Video is now the largest source of data on the internet, so we need tools to make it easier to process and analyze. Fait Poms and Will Crichton offer an overview of Scanner, the first open source distributed system for building large-scale video processing applications, and explore real-world use cases. Read more.
Add to your personal schedule
11:50am–12:30pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2009
Jeff Chen (US Bureau of Economic Analysis)
Average rating: ****.
(4.50, 2 ratings)
Jeff Chen shares strategies for overcoming time series challenges at the intersection of macroeconomics and data science, drawing from machine learning research conducted at the Bureau of Economic Analysis aimed at improving its flagship product, the gross domestic product. Read more.
Add to your personal schedule
11:50am–12:30pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2010
David Rodriguez (Cisco Systems)
Average rating: ****.
(4.50, 2 ratings)
Malicious DNS traffic patterns are inconsistent and typically thwart anomaly detection. David Rodriguez explains how Cisco uses Apache Spark and Stripe’s Bayesian inference software, Rainier, to fit the underlying time series distribution for millions of domains and outlines techniques to identify artificial traffic volumes related to spam, malvertising, and botnets (masquerading traffic). Read more.
Add to your personal schedule
11:50am–12:30pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2011
Ken Johnston (Microsoft), Ankit Srivastava (Microsoft)
Average rating: ****.
(4.50, 2 ratings)
Today, normal growth isn't enough; you need hockey-stick levels of growth. Sales and marketing orgs are looking to AI to "growth hack" their way to new markets and segments. Ken Johnston and Ankit Srivastava explain how to use mutual information at scale across massive data sources to help filter out noise and share critical insights with new cohorts of users, businesses, and networks. Read more.
Add to your personal schedule
11:50am–12:30pm Thursday, 03/28/2019
Session
Future of the Firm
Location: 2014
Renee DiResta (New Knowledge)
Average rating: *****
(5.00, 1 rating)
Renee DiResta, lead author of the US Senate report about Russian disinformation operations, discusses how influence operations are manifesting in 2019 as they've moved beyond politics. Read more.
Add to your personal schedule
11:50am–12:30pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2016
Sricharan Kumar (Intuit)
Average rating: ****.
(4.29, 7 ratings)
Machine learning is delivering immense value across industries. However, in some instances, machine learning models can produce overconfident results—with the potential for catastrophic outcomes. Kumar Sricharan explains how to address this challenge through Bayesian machine learning and highlights real-world examples to illustrate its benefits. Read more.
Add to your personal schedule
11:50am–12:30pm Thursday, 03/28/2019
Francesco Mucio (Francescomuc.io)
Average rating: ****.
(4.00, 2 ratings)
Francesco Mucio tells the story of how Zalando went from an old-school BI company to an AI-driven company built on a solid data platform. Along the way, he shares what Zalando learned in the process and the challenges that still lie ahead. Read more.
Add to your personal schedule
11:50am–12:30pm Thursday, 03/28/2019
Secondary topics:  Model lifecycle management
David Talby (Pacific AI)
Average rating: ****.
(4.90, 10 ratings)
Machine learning and data science systems often fail in production in unexpected ways. David Talby shares real-world case studies showing why this happens and explains what you can do about it, covering best practices and lessons learned from a decade of experience building and operating such systems at Fortune 500 companies across several industries. Read more.
Add to your personal schedule
11:50am–12:30pm Thursday, 03/28/2019
Session
Data Engineering & Architecture
Location: 2024
Alkis Simitsis (Micro Focus), Shivnath Babu (Unravel Data Systems | Duke University)
Average rating: **...
(2.67, 3 ratings)
Alkis Simitsis and Shivnath Babu share an automated, deep learning-based technique for root cause analysis (RCA) of big data stack applications, with Spark and Impala as examples. The concepts they discuss apply generally to the big data stack. Read more.
Add to your personal schedule
11:50am–12:30pm Thursday, 03/28/2019
Roger Chen (Computable)
Average rating: **...
(2.00, 1 rating)
Data remains a linchpin of success for machine learning yet too often is a scarce resource. And even when data is available, trust issues arise about the quality and ethics of collection. Roger Chen explores new models for generating and governing training data for AI applications. Read more.

12:30pm

Add to your personal schedule
12:30pm–1:50pm Thursday, 03/28/2019
Event
Location: Expo Hall (Exhibit Hall - Level 1)
Average rating: *****
(5.00, 1 rating)
Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics. Read more.
Add to your personal schedule
12:30pm–1:50pm Thursday, 03/28/2019
Event
Location: Expo Hall
Average rating: ***..
(3.50, 2 ratings)
Join Strata Business Summit speakers and attendees for a networking lunch on Thursday. Read more.

1:50pm

Add to your personal schedule
1:50pm–2:30pm Thursday, 03/28/2019
Session
Data Engineering & Architecture
Location: 2001
Krishna Gade (Fiddler Labs)
Average rating: ****.
(4.67, 3 ratings)
Join Krishna Gade to learn how to address engineering and organizational challenges for AI fairness and operationalize these concepts in a production AI system—and crucially, create a culture of trust in AI. Read more.
Add to your personal schedule
1:50pm–2:30pm Thursday, 03/28/2019
Session
Data Engineering & Architecture
Location: 2002
Jacques Nadeau (Dremio)
Average rating: ****.
(4.60, 5 ratings)
Apache Arrow Flight is a new initiative focused on providing high-performance communication within data engineering and data science infrastructure. Jacques Nadeau explains how Flight works and where it has been integrated. He also discusses how Flight can be used to abstract physical data management from logical access and shares benchmarks of workloads that have been improved by Flight. Read more.
Add to your personal schedule
1:50pm–2:30pm Thursday, 03/28/2019
Session
Sponsored
Location: 2003
Jordan Tigani (Google)
Average rating: ****.
(4.00, 3 ratings)
Google Cloud Platform combines powerful serverless solutions for enterprise data warehousing, streaming analytics, managed Spark and Hadoop, modern BI, planet-scale data lake, and AI. Jordan Tigani details Google Cloud’s vision and engineering strategy, which can help you move big data analytics solutions to the next level of benefits. Read more.
Add to your personal schedule
1:50pm–2:30pm Thursday, 03/28/2019
Session
Data Engineering & Architecture
Location: 2004
Haifeng Chen (Intel)
Average rating: ****.
(4.00, 3 ratings)
Spark SQL is widely used, but it still suffers from stability and performance challenges in highly dynamic environments with large-scale data. Haifeng Chen shares a Spark adaptive execution engine built to address these challenges. It can handle task parallelism, join conversion, and data skew dynamically during runtime, guaranteeing the best plan is chosen using runtime statistics. Read more.
Add to your personal schedule
1:50pm–2:30pm Thursday, 03/28/2019
Session
Data Engineering & Architecture
Location: 2006
Matvey Arye (TimescaleDB)
Average rating: ***..
(3.75, 4 ratings)
Matvey Arye offers an overview of two newly released features of TimescaleDB—automated adaptation of time-partitioning intervals and continuous aggregations in near real time—and discusses how these capabilities ease time series data management. Along the way, he also shares real-world use cases, including TimescaleDB's use with other technologies such as Kafka. Read more.
1:50pm–2:30pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2007
Piero Molino (Uber AI)
Average rating: ****.
(4.60, 5 ratings)
Piero Molino offers an overview of Ludwig, a deep learning toolbox that allows you to train models and use them for prediction without the need to write code. It's unique in its ability to help make deep learning easier to understand for nonexperts and enable faster model improvement iteration cycles for experienced machine learning developers and researchers alike. Read more.
1:50pm–2:30pm Thursday, 03/28/2019
Session
Data Engineering & Architecture
Location: 2008
Arun Kumar (University of California, San Diego)
Average rating: ****.
(4.00, 2 ratings)
Arun Kumar details recent techniques to accelerate ML over data that is the output of joins of multiple tables. Using ideas from query optimization and learning theory, Arun demonstrates how to avoid joins before ML to reduce runtimes and memory and storage footprints. Along the way, he explores open source software prototypes and sample ML code in both R and Python. Read more.
1:50pm–2:30pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2009
Secondary topics:  Security and Privacy
Animesh Singh (IBM), Tommy Li (IBM)
Average rating: ****.
(4.50, 2 ratings)
Animesh Singh and Tommy Li explain how to implement state-of-the-art methods for attacking and defending classifiers using the open source Adversarial Robustness Toolbox. The library provides AI developers with interfaces that support the composition of comprehensive defense systems using individual methods as building blocks. Read more.
1:50pm–2:30pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2010
Louis DiValentin (Accenture), Dillon Cullinan (Accenture)
Average rating: ***..
(3.00, 3 ratings)
Louis DiValentin and Dillon Cullinan explain how Accenture's Cyber Security Lab built security analytics models to detect attempted lateral movement in networks by transforming enterprise-scale security data into a graph format, generating graph analytics for individual users, and building time series detection models that visualize the changing graph metrics for security operators. Read more.
1:50pm–2:30pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2011
Jonathan Merriman (Verint Intelligent Self-Service), Cynthia Freeman (Verint Intelligent Self-Service)
Average rating: ***..
(3.89, 9 ratings)
Anomaly detection has many applications, such as tracking business KPIs or fraud spotting in credit card transactions. Unfortunately, there's no one best way to detect anomalies across a variety of domains. Jonathan Merriman and Cynthia Freeman introduce a framework to determine the best anomaly detection method for the application based on time series characteristics. Read more.
1:50pm–2:30pm Thursday, 03/28/2019
Session
Future of the Firm
Location: 2014
Moderated by:
Tim O'Reilly (O'Reilly Media)
Panelists:
Janet Haven (Data & Society), Catherine Bracy (TechEquity Collaborative)
Average rating: ***..
(3.67, 3 ratings)
Tim O'Reilly will be joined by Janet Haven, executive director of Data & Society, and Catherine Bracy, director of the TechEquity Collaborative, to discuss ways in which tech employees are flexing their muscles as the conscience of their companies. Read more.
1:50pm–2:30pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2016
Average rating: ****.
(4.50, 2 ratings)
Deep learning using sequence-to-sequence networks (Seq2Seq) has demonstrated unparalleled success in neural machine translation. A less explored but highly sought-after area of forecasting can leverage recent gains made in Seq2Seq networks. Aashish Sheshadri explains how PayPal has applied deep networks to monitoring and alerting intelligence. Read more.
1:50pm–2:30pm Thursday, 03/28/2019
Session
Strata Business Summit
Location: 2018
Stuart Buck (Arnold Ventures)
Average rating: ****.
(4.50, 4 ratings)
Academic research has been plagued by a reproducibility crisis in fields ranging from medicine to psychology. Stuart Buck explains how to take precautions in your data analysis and experiments so as to avoid those reproducibility problems. Read more.
1:50pm–2:30pm Thursday, 03/28/2019
Ken Johnston (Microsoft), Ankit Srivastava (Microsoft)
Average rating: ****.
(4.80, 5 ratings)
At the rate data sources are multiplying, business value can often be developed faster by joining data sources rather than mining a single source to the very end. Ken Johnston and Ankit Srivastava share four years of hands-on practical experience sourcing and integrating massive numbers of data sources to build the Microsoft Business Intelligence Graph (M360 BIG). Read more.
1:50pm–2:30pm Thursday, 03/28/2019
Session
Visualization and UX
Location: 2024
Stefaan Vervaet (Western Digital Corporation), Alain Dufaux (École Polytechnique Fédérale de Lausanne (EPFL))
Average rating: *****
(5.00, 1 rating)
The École Polytechnique Fédérale de Lausanne (EPFL) spearheaded the official digital archival of 15,000+ hours of A/V content captured from the Montreux Jazz Festival since 1967. Stefaan Vervaet and Alain Dufaux explain how EPFL created an immersive 3D VR experience. From capture and store to delivery and experience, they detail the evolution of the workflow that made it all possible. Read more.

2:40pm

2:40pm–3:20pm Thursday, 03/28/2019
Session
Data Engineering & Architecture
Location: 2001
Xiao Li (Databricks), Wenchen Fan (Databricks)
Average rating: ***..
(3.25, 4 ratings)
Xiao Li and Wenchen Fan offer an overview of the major features and enhancements in Apache Spark 2.4 and give insight into upcoming releases. Then you'll get the chance to ask all your burning Spark questions. Read more.
2:40pm–3:20pm Thursday, 03/28/2019
Session
Data Engineering & Architecture
Location: 2002
Rohan Dhupelia (Atlassian), Jimmy Li (Atlassian)
Average rating: ****.
(4.67, 3 ratings)
Analytics is easy, but good analytics is hard. Atlassian knows this all too well. Rohan Dhupelia and Jimmy Li explain how the company's push to become truly data driven has transformed the way it thinks about behavioral analytics, from how it defined its events to how it ingests and analyzes them. Read more.
2:40pm–3:20pm Thursday, 03/28/2019
Session
Data Engineering & Architecture
Location: 2004
Eva Andreasson (Cloudera), Mark Brine (Cloudera), Michael Kohs (Cloudera)
Average rating: **...
(2.00, 3 ratings)
Michael Kohs, Eva Andreasson, and Mark Brine explain how Cloudera’s Finance Department used a hybrid model to speed up report delivery and reduce the cost of end-of-quarter reporting. They also share guidelines for deploying modern data warehousing in a hybrid cloud environment, outlining when you should choose a private cloud service over a public one, the available options, and some dos and don'ts. Read more.
2:40pm–3:20pm Thursday, 03/28/2019
Session
Data Engineering & Architecture
Location: 2006
Akshai Sarma (Yahoo), Nathan Speidel (Yahoo)
Average rating: ***..
(3.67, 3 ratings)
Akshai Sarma and Nathan Speidel offer an overview of Bullet, a scalable, pluggable, lightweight multitenant query system that runs on any data flowing through a streaming system without storing it. Bullet efficiently supports intractable operations like top K, count distincts, and windowing without any storage using sketch-based algorithms. Read more.
2:40pm–3:20pm Thursday, 03/28/2019
Session
Culture and organization
Location: 2007
Jesse Anderson (Big Data Institute), Thomas Goolsby (USAA)
Average rating: ***..
(3.67, 6 ratings)
What happens when you have a data science organization but no data engineering organization? Jesse Anderson and Thomas Goolsby explain what happened at USAA without data engineering, how they fixed it, and the results since. Read more.
2:40pm–3:20pm Thursday, 03/28/2019
Session
Data Engineering & Architecture
Location: 2008
Paul Curtis (Weaveworks)
Average rating: ****.
(4.50, 2 ratings)
What do you do when your technology doesn’t easily fit on a single laptop and consists of many components? Paul Curtis explains how MapR Technologies rolled out a containerized, scalable, globally available, and easily updatable environment using a combination of Kubernetes to orchestrate, shared data fabric to store and persist, and AppLariat to provide the user interface. Read more.
2:40pm–3:20pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2009
Secondary topics:  Health and Medicine
Kirstin Aschbacher (UCSF Cardiology)
Average rating: ****.
(4.20, 5 ratings)
Some people use digital devices to track their blood alcohol content (BAC). A BAC-tracking app that could anticipate when a person is likely to have a high BAC could offer coaching in a time of need. Kirstin Aschbacher shares a machine learning approach that predicts user BAC levels with good precision based on minimal information, thereby enabling targeted interventions. Read more.
2:40pm–3:20pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2010
Till Bergmann (Salesforce)
Average rating: ***..
(3.67, 6 ratings)
A problem in predictive modeling data is label leakage. At enterprise companies such as Salesforce, this problem takes on monstrous proportions as the data is populated by diverse business processes, making it hard to distinguish cause from effect. Till Bergmann explains how Salesforce—which needs to churn out thousands of customer-specific models for any given use case—tackled this problem. Read more.
2:40pm–3:20pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2011
Secondary topics:  Retail and e-commerce
Kapil Gupta (Airbnb)
Average rating: ***..
(3.50, 4 ratings)
Kapil Gupta explains how Airbnb approaches the personalization of travelers’ booking experiences using machine learning. Read more.
2:40pm–3:20pm Thursday, 03/28/2019
Session
Future of the Firm
Location: 2014
Moderated by:
Josh Bersin (Bersin by Deloitte)
Panelists:
Nancy Vitale (Genentech), Josh Alwitt (Publicis Sapient), Erin Flynn (Optimizely)
Average rating: ****.
(4.50, 2 ratings)
In this panel session, executives will discuss how their companies are adapting to the workforce, business, and economic trends shaping the future of business. Read more.
2:40pm–3:20pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2016
Sridhar Alla (BlueWhale), Syed Nasar (Cloudera)
Average rating: **...
(2.86, 7 ratings)
Any business, big or small, depends on analytics, whether the goal is revenue generation, churn reduction, or sales and marketing. No matter the algorithm and the techniques used, the result depends on the accuracy and consistency of the data being processed. Sridhar Alla and Syed Nasar share techniques used to evaluate the quality of data and the means to detect anomalies in the data. Read more.
2:40pm–3:20pm Thursday, 03/28/2019
Session
Case studies, Strata Business Summit
Location: 2018
Mei Fung (People Centered Internet)
Average rating: ****.
(4.67, 3 ratings)
Data sharing requires stakeholders and populations of people to come together to learn the benefits, risks, challenges, and known and unknown "unknowns." Data sharing policies and frameworks require increasing levels of trust, which takes time to build. Join Mei Fung for trail-blazing stories from Solano County, California, and ASEAN (SE Asia), which offer important insights. Read more.
2:40pm–3:20pm Thursday, 03/28/2019
Secondary topics:  Security and Privacy
Mark Donsky (Okera), Nikki Rouda (Amazon Web Services)
Average rating: ****.
(4.33, 3 ratings)
The implications of new privacy regulations for data management and analytics, such as the General Data Protection Regulation (GDPR) and the upcoming California Consumer Privacy Act (CCPA), can seem complex. Mark Donsky and Nikki Rouda highlight aspects of the rules and outline the approaches that will assist with compliance. Read more.
2:40pm–3:20pm Thursday, 03/28/2019
Session
Data Engineering & Architecture
Location: 2024
John Bennett (Netflix), Siamac Mirzaie (Netflix)
Average rating: ***..
(3.33, 3 ratings)
Data has become a foundational pillar for security teams operating in organizations of all shapes and sizes. This new norm has created a need for platforms that enable engineers to harness data for various security purposes. John Bennett and Siamac Mirzaie offer an overview of Netflix's internal platform for quickly deploying data-based detection capabilities in the corporate environment. Read more.

3:20pm

3:20pm–3:50pm Thursday, 03/28/2019
Location: Foyer
Afternoon break (30m)

3:50pm

3:50pm–4:30pm Thursday, 03/28/2019
Session
Data Engineering & Architecture
Location: 2001
Li Gao (Lyft), Bill Graham (Lyft)
Average rating: ****.
(4.00, 2 ratings)
Li Gao and Bill Graham discuss the challenges the Lyft team faced and solutions they developed to support Apache Spark on Kubernetes in production and at scale. Read more.
3:50pm–4:30pm Thursday, 03/28/2019
Session
Data Engineering & Architecture
Location: 2002
Igor Canadi (Rockset), Dhruba Borthakur (Rockset)
Average rating: ****.
(4.00, 1 rating)
Most existing big data systems prefer sequential scans for processing queries. Igor Canadi and Dhruba Borthakur challenge this view, offering an overview of converged indexing: a single system called ROCKSET that builds inverted, columnar, and document indices. Converged indexing is economically feasible due to the elasticity of cloud resources and write-optimized storage engines. Read more.
3:50pm–4:30pm Thursday, 03/28/2019
Session
Data Engineering & Architecture
Location: 2004
Secondary topics:  Data Platforms
Adrian Lungu (Adobe), Serban Teodorescu (Adobe)
Average rating: ****.
(4.75, 4 ratings)
Adrian Lungu and Serban Teodorescu explain how—inspired by the green-blue deployment technique—the Adobe Audience Manager team developed an active-passive database migration procedure that allows them to test database clusters in production, minimizing the risks without compromising the innovation. Read more.
3:50pm–4:30pm Thursday, 03/28/2019
Session
Data Engineering & Architecture
Location: 2006
Václav Surovec (Deutsche Telekom), Gabor Kotalik (Deutsche Telekom)
Average rating: ****.
(4.00, 1 rating)
Knowledge of customers' location and travel patterns is important for many companies, including German telco service operator Deutsche Telekom. Václav Surovec and Gabor Kotalik explain how a commercial roaming project using Cloudera Hadoop helped the company better analyze the behavior of its customers from 10 countries and provide better predictions and visualizations for management. Read more.
3:50pm–4:30pm Thursday, 03/28/2019
Session
Business Analytics and Visualization
Location: 2007
Neerav Jain (Walgreens), Anne Cruz (Walgreens), Vikas Hardia (Kyvos)
Average rating: **...
(2.75, 4 ratings)
Walgreens recently faced the challenge of analyzing 466 billion rows of data from 20,000 suppliers and 9,000 stores, which strained its existing systems when dealing with the scale and cardinality of data. Neerav Jain, Vikas Hardia, and Anne Cruz describe how they used Kyvos and Tableau to transform Walgreens's supply chain with instant, interactive analysis on two years of data. Read more.
3:50pm–4:30pm Thursday, 03/28/2019
Session
Data Engineering & Architecture
Location: 2008
Yuan Zhou (Intel), Haodong Tang (Intel), Jian Zhang (Intel)
Average rating: ***..
(3.33, 3 ratings)
Yuan Zhou, Haodong Tang, and Jian Zhang offer an overview of Spark-PMOF and explain how it improves Spark analytics performance. Read more.
3:50pm–4:30pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2009
Secondary topics:  Health and Medicine
Noah Gift (UC Davis), Michelle Davenport (Quantitative Nutrition)
Average rating: **...
(2.89, 9 ratings)
Noah Gift and Michelle Davenport explore exciting ideas in nutrition using data science; specifically, they analyze the detrimental relationship between sugar and longevity, obesity, and chronic diseases. Read more.
3:50pm–4:30pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2010
Patrick Miller (Civis Analytics)
Average rating: ***..
(3.40, 5 ratings)
Brands that test the content of ads before they are shown to an audience can avoid spending resources on the 11% of ads that cause backlash. Using a survey experiment to choose the best ad typically improves effectiveness of marketing campaigns by 13% on average, and up to 37% for particular demographics. Patrick Miller explores data collection and statistical methods for analysis and reporting. Read more.
3:50pm–4:30pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2011
Average rating: ****.
(4.00, 2 ratings)
RAPIDS is the next big step in data science, combining the ease of use of common APIs and the power and scalability of GPUs. Bartley Richardson and Joshua Patterson offer an overview of RAPIDS and explore cuDF, cuGraph, and cuML—a trio of RAPIDS tools that enable data scientists to work with data in a familiar interface and apply graph analytics and traditional machine learning techniques. Read more.
3:50pm–4:30pm Thursday, 03/28/2019
Session
Future of the Firm
Location: 2014
Average rating: ***..
(3.33, 3 ratings)
Jeffrey Wong explains how an old-world firm leveraged technology to transform everything and thrive in our new world of continuous change—anticipating, scaling, and adapting to meet internal needs and client expectations. Read more.
3:50pm–4:30pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2016
Secondary topics:  Data Platforms, Deep Learning
Yuhao Yang (Intel), Jiao (Jennie) Wang (Intel)
Average rating: **...
(2.67, 3 ratings)
Yuhao Yang and Jennie Wang demonstrate how to run distributed TensorFlow on Apache Spark with the open source software package Analytics Zoo. Compared to other solutions, Analytics Zoo is built for production environments and encourages more industry users to run deep learning applications with big data ecosystems. Read more.
3:50pm–4:30pm Thursday, 03/28/2019
Pierre Romera (International Consortium of Investigative Journalists (ICIJ))
Average rating: ****.
(4.67, 6 ratings)
The ICIJ was the team behind the Panama Papers and Paradise Papers. Pierre Romera offers a behind-the-scenes look into the ICIJ's process and explores the challenges in handling 1.4 TB of data (in many different formats)—and making it available securely to journalists all over the world. Read more.
3:50pm–4:30pm Thursday, 03/28/2019
Dean Wampler (Anyscale)
Average rating: ****.
(4.33, 6 ratings)
Your team is building machine learning capabilities. Dean Wampler demonstrates how to integrate these capabilities in streaming data pipelines so you can leverage the results quickly and update them as needed and covers challenges such as how to build long-running services that are very reliable and scalable and how to combine a spectrum of very different tools, from data science to operations. Read more.
3:50pm–4:30pm Thursday, 03/28/2019
Session
Data Engineering & Architecture
Location: 2024
Julien Delange (Twitter), Neng Lu (Twitter)
Average rating: **...
(2.67, 3 ratings)
Julien Delange and Neng Lu explain how Twitter uses the Heron stream processing engine to monitor and analyze its network infrastructure—implementing a new data pipeline that ingests multiple sources and processes about 1 billion tuples to detect network issues and generate usage statistics. Join in to learn the key technologies used, the architecture, and the challenges Twitter faced. Read more.

4:40pm

4:40pm–5:20pm Thursday, 03/28/2019
Session
Data Engineering & Architecture
Location: 2001
Holden Karau (Independent), Rachel B Warren (Salesforce Einstein)
Average rating: ****.
(4.60, 5 ratings)
Apache Spark is an amazing distributed system, but part of the bargain we've made with the infrastructure daemons involves providing the correct set of magic numbers (a.k.a. tuning) or our jobs may be eaten by Cthulhu. Holden Karau and Rachel Warren explore auto-tuning jobs using systems like Apache Beam, Mahout, and internal Spark ML jobs as workloads—including new settings in 2.4. Read more.
4:40pm–5:20pm Thursday, 03/28/2019
Session
Data Engineering & Architecture
Location: 2002
Sonali Sharma (Netflix), Shriya Arora (Netflix)
Average rating: ***..
(3.00, 2 ratings)
With so much data being generated in real time, what if we could combine all these high-volume data streams and provide near real-time feedback for model training, improving personalization and recommendations and taking the customer experience to a whole new level? Sonali Sharma and Shriya Arora explain how to do exactly that, using Flink's keyed state. Read more.
4:40pm–5:20pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2004
Ji Peng (Earnin)
Average rating: ****.
(4.50, 2 ratings)
As a customer-facing fintech company, Earnin has access to various types of valuable customer data, from bank transactions to GPS location. Ji Peng shares how Earnin uses unique datasets to build machine learning models and navigates the challenges of prioritizing and applying machine learning in the fintech domain. Read more.
4:40pm–5:20pm Thursday, 03/28/2019
Jinchul Kim (SK Telecom)
Average rating: **...
(2.17, 6 ratings)
Druid supports autoscaling for data ingestion, but it's only available on AWS EC2. You can't rely on the feature on your private cloud. Jinchul Kim demonstrates autoscale-out/in on Kubernetes, details the benefits of this approach, and discusses the development of Druid Helm charts, rolling updates, and custom metric usage for horizontal autoscaling. Read more.
4:40pm–5:20pm Thursday, 03/28/2019
Session
Case studies
Location: 2007
Nancy Rausch (SAS)
Average rating: ****.
(4.80, 5 ratings)
For data to be meaningful, it needs to be presented in a way that people can relate to. Nancy Rausch explains how she combined streaming data from a solar array and machine learning techniques to create a live-action art piece—an approach that helped bring the data to life in a fun and compelling way. Read more.
4:40pm–5:20pm Thursday, 03/28/2019
Session
Data Engineering & Architecture
Location: 2008
Patrick Stuedi (IBM Research)
Average rating: ****.
(4.00, 1 rating)
Modern networking and storage technologies like RDMA or NVMe are finding their way into the data center. Patrick Stuedi offers an overview of Apache Crail (incubating), a new project that facilitates running data processing workloads (ML, SQL, etc.) on such hardware. Patrick explains what Crail does and how it benefits workloads based on TensorFlow or Spark. Read more.
4:40pm–5:20pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2009
Alex Gorbachev (Pythian), Paul Spiegelhalter (Pythian)
Average rating: ****.
(4.67, 3 ratings)
Alex Gorbachev and Paul Spiegelhalter use the example of a mining haul truck to explain how to map preventive maintenance needs to supervised machine learning problems, create labeled datasets, do feature engineering from sensors and alerts data, evaluate models—then convert it all to a complete AI solution on Google Cloud Platform that's integrated with existing on-premises systems. Read more.
4:40pm–5:20pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2010
Secondary topics:  Media, Marketing, Advertising
Shradha Agrawal (Adobe)
Average rating: ****.
(4.17, 6 ratings)
Decision making often struggles with the exploration-exploitation dilemma. Multi-armed bandits (MAB) are a popular reinforcement learning solution, but increasing the number of decision criteria leads to an exponential blowup in complexity, and observational delays don’t allow for optimal performance. Shradha Agrawal offers an overview of MABs and explains how to overcome the above challenges. Read more.
4:40pm–5:20pm Thursday, 03/28/2019
Secondary topics:  Security and Privacy
Michael Gregory (Cloudera)
Average rating: ****.
(4.25, 4 ratings)
The General Data Protection Regulation (GDPR) enacted by the European Union restricts the use of machine learning practices in many cases. Michael Gregory offers an overview of the regulations, important considerations for both EU and non-EU organizations, and tools and technologies to ensure that you're appropriately using ML applications to drive continued transformation and insights. Read more.
4:40pm–5:00pm Thursday, 03/28/2019
Session
Future of the Firm
Location: 2014
Cathryn Posey (Capital One)
Average rating: ****.
(4.33, 3 ratings)
Cathryn Posey explains how Capital One—the only bank fully committed to a cloud-based infrastructure—is approaching machine learning with a responsible, human-centered focus. Join in to hear about Capital One's research in areas like explainable AI, how the bank is leveraging the technology, and ways in which it can be used for good. Read more.
4:40pm–5:20pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2016
Secondary topics:  Deep Learning, Retail and e-commerce
Christopher Lennan (idealo.de)
Average rating: ****.
(4.00, 1 rating)
Idealo.de recently trained convolutional neural networks (CNN) for aesthetic and technical image quality predictions. Christopher Lennan shares the training approach, along with some practical insights, and sheds light on what the trained models actually learned by visualizing the convolutional filter weights and output nodes of the trained models. Read more.
4:40pm–5:20pm Thursday, 03/28/2019
Session
Data Engineering & Architecture
Location: 2018
Secondary topics:  Model lifecycle management
Harish Doddi (Datatron), Jerry Xu (Datatron Technologies)
Average rating: ****.
(4.00, 1 rating)
Harish Doddi and Jerry Xu share the challenges they faced scaling machine learning models and detail the solutions they're building to conquer them. Read more.
4:40pm–5:20pm Thursday, 03/28/2019
Michael Li (The Data Incubator)
Average rating: ***..
(3.75, 4 ratings)
As their data and AI teams scale from one to thousands of employees and the maturity of their analytics capabilities evolves, companies find that the analytics journey is not always smooth. Drawing on experiences gleaned from dozens of clients, Michael Li discusses organizational growing pains and the best practices that successful executives have adopted to scale and grow their teams. Read more.
4:40pm–5:20pm Thursday, 03/28/2019
Session
Data Engineering & Architecture
Location: 2024
Yves Thibaudeau (US Census Bureau)
Average rating: ***..
(3.33, 3 ratings)
The US Census Bureau has been involved in record linkage projects for over 40 years. In that time, there's been a lot of change in computing capabilities and new techniques, and the Census Bureau is reviewing an inventory of linkage methodologies. Yves Thibaudeau describes the progress made so far in identifying specific record linkage techniques for specific applications. Read more.

5:00pm

5:00pm–5:20pm Thursday, 03/28/2019
Session
Future of the Firm
Location: 2014
James Cham (Bloomberg Beta)
Average rating: ****.
(4.67, 3 ratings)
Missing amid conversations about corporate strategy and innovation is a mostly untapped source of new ideas and efficiency—the people actually doing the work. James Cham explains why this is a problem and suggests some possible solutions. Read more.