Presented By
O’Reilly + Cloudera
Make Data Work
March 25-28, 2019
San Francisco, CA

Monday, 03/25/2019

9:00am

Add to your personal schedule
9:00am–5:00pm Monday, 03/25/2019
Training
Strata Business Summit
Location: 2010 Level: Non-technical
Secondary topics:  AI and machine learning in the enterprise
Rich Ott (The Data Incubator)
This course offers a non-technical overview of AI and data science. You’ll learn common techniques, how to apply them in your organization, and common pitfalls to avoid. Through this course, you’ll pick up the language and develop a framework to be able to effectively engage with technical experts and utilize their input and analysis for your business’s strategic priorities and decision making. Read more.
Add to your personal schedule
9:00am–5:00pm Monday, 03/25/2019
Training
Data Science, Machine Learning & AI
Location: 2014 Level: Intermediate
Secondary topics:  Deep Learning
Robert Schroll (The Data Incubator)
The TensorFlow library provides for the use of computational graphs, with automatic parallelization across resources. This architecture is ideal for implementing neural networks. This training will introduce TensorFlow's capabilities in Python. It will move from building machine learning algorithms piece by piece to using the Keras API provided by TensorFlow with several hands-on applications. Read more.
Add to your personal schedule
9:00am–5:00pm Monday, 03/25/2019
Training
Data Science, Machine Learning & AI
Location: 2016 Level: Intermediate
Zachary Glassman (The Data Incubator)
We will walk through all the steps - from prototyping to production - of developing a machine learning pipeline. We’ll look at data cleaning, feature engineering, model building/evaluation, and deployment. Students will extend these models into two applications from real-world datasets. All work will be done in Python. Read more.
Add to your personal schedule
9:00am–5:00pm Monday, 03/25/2019
Training
Data Engineering and Architecture
Location: 2018 Level: Intermediate
Secondary topics:  AI and Data technologies in the cloud
Jorge A. Lopez (Amazon Web Services)
Serverless technologies let you build and scale applications and services rapidly without the need to provision or manage servers. In this workshop, we show you how to incorporate serverless concepts into your big data architectures, looking at design patterns to ingest, store, and analyze your data. You will build a big data application using AWS technologies such as S3, Athena, Kinesis, and more. Read more.
Add to your personal schedule
9:00am–5:00pm Monday, 03/25/2019
Training
Data Science, Machine Learning & AI
Location: 2020 Level: Beginner
Ian Cook (Cloudera)
Advancing your career in data science requires learning new languages and frameworks—but learners face an overwhelming array of choices, each with different syntaxes, conventions, and terminology. Ian Cook simplifies the learning process by elucidating the abstractions common to these systems. Through hands-on exercises, you'll overcome obstacles to getting started using new tools. Read more.
Add to your personal schedule
9:00am–5:00pm Monday, 03/25/2019
Training
Data Engineering and Architecture
Location: 3004 Level: Intermediate
Jesse Anderson (Big Data Institute)
This course takes participants through an in-depth look at Apache Kafka. We show how Kafka works and how to create real-time systems with it, including how to create consumers and publishers in Kafka. Then we look at Kafka’s ecosystem and how each component is used. We show how to use Kafka Streams, Kafka Connect, and KSQL. Read more.
Add to your personal schedule
9:00am–5:00pm Monday, 03/25/2019
Training
Data Science, Machine Learning & AI
Location: 3006
Secondary topics:  Deep Learning, Temporal data and time-series analytics
Francesca Lazzeri (Microsoft)
Francesca Lazzeri will walk you through the core steps for using Azure Machine Learning services to train your machine learning models both locally and on remote compute resources. Read more.

10:30am

10:30am–11:00am Monday, 03/25/2019
Location: 2nd floor lobby
Morning break (30m)

12:30pm

12:30pm–1:30pm Monday, 03/25/2019
Location: 2nd floor lobby
Lunch (1h)

3:00pm

3:00pm–3:30pm Monday, 03/25/2019
Location: 2nd floor lobby
Afternoon break (30m)

Tuesday, 03/26/2019

9:00am

Add to your personal schedule
9:00am–12:30pm Tuesday, 03/26/2019
Tutorial
Data Science, Machine Learning & AI
Location: 2001 Level: Intermediate
Iman Saleh (Intel), Cory Ilo (Intel), Cindy Tseng (Intel)
From healthcare to smart home to autonomous vehicles, new applications of autonomous systems are raising ethical concerns including bias, transparency, and privacy. In this tutorial, we will demonstrate tools and capabilities that can help data scientists address these concerns. The tools help bridge the gap between ethicists and regulators, and machine learning practitioners. Read more.
Add to your personal schedule
9:00am–12:30pm Tuesday, 03/26/2019
Tutorial
Data Science, Machine Learning & AI
Location: 2002 Level: Advanced
Secondary topics:  Deep Learning, Temporal data and time-series analytics
Martin Gorner (Google)
Hands-on with recurrent neural networks and TensorFlow. Discover what makes RNNs so powerful for time series analysis. Read more.
Add to your personal schedule
9:00am–12:30pm Tuesday, 03/26/2019
Tutorial
Executive Briefing and best practices, Strata Business Summit
Location: 2003 Level: Non-technical
Secondary topics:  AI and machine learning in the enterprise
Joshua Poduska (Domino Data Lab)
The honeymoon era of data science is ending: accountability is coming. Not content to wait for results that may or may not arrive, successful data science leaders deliver measurable impact on an increasing share of an enterprise’s KPIs. You’ll learn how leading organizations take a holistic approach to people, process, and technology to build a sustainable competitive advantage. Read more.
Add to your personal schedule
9:00am–12:30pm Tuesday, 03/26/2019
Tutorial
Streaming and IoT
Location: 2004 Level: Beginner
Jeff Bean (data Artisans)
This hands-on session introduces Flink via the SQL interface. You will receive an overview of stream processing, and a survey of Apache Flink with its various modes of use. Then we’ll use Flink to run SQL queries on data streams and contrast this with the Flink data stream API. Read more.
Add to your personal schedule
9:00am–12:30pm Tuesday, 03/26/2019
Tutorial
Data Engineering and Architecture
Location: 2005 Level: Intermediate
Mark Madsen (Think Big Analytics), Todd Walter (Teradata)
Building a data lake involves more than installing Hadoop or putting data into AWS. The goal in most organizations is to build a multiuse data infrastructure that is not subject to past constraints. Mark Madsen and Todd Walter explore design assumptions and principles and walk you through a reference architecture to use as you work to unify your analytics infrastructure. Read more.
Add to your personal schedule
9:00am–12:30pm Tuesday, 03/26/2019
Tutorial
Data Engineering and Architecture
Location: 2006 Level: Intermediate
Secondary topics:  AI and machine learning in the enterprise
Jonathan Seidman (Cloudera), Ted Malaska (Capital One)
The enterprise data management space has changed dramatically in recent years, and this has led to new challenges for organizations in creating successful data practices. In this presentation, we’ll provide guidance and best practices from planning to implementation, based on years of experience working with companies to deliver successful data projects. Read more.
Add to your personal schedule
9:00am–12:30pm Tuesday, 03/26/2019
Tutorial
Streaming and IoT
Location: 2007 Level: Intermediate
Secondary topics:  Data preparation, data governance, data lineage, and data privacy, Model lifecycle management
Boris Lublinsky (Lightbend), Dean Wampler (Lightbend)
This hands-on tutorial examines production use of ML in streaming data pipelines: how to do periodic model retraining and low-latency scoring in live streams. We'll discuss Kafka as the data backplane, pros and cons of microservices vs. systems like Spark and Flink, tips for TensorFlow and SparkML, performance considerations, model metadata tracking, and other techniques. Read more.
Add to your personal schedule
9:00am–12:30pm Tuesday, 03/26/2019
Tutorial
Data Engineering and Architecture
Location: 2008 Level: Intermediate
Santosh Kumar (Cloudera)
Cloudera SDX provides unified metadata control, simplifies administration, and maintains context as well as data lineage across storage services, workloads, and operating environments. In this 3-hour tutorial, we cover the background to SDX before diving deep into its moving parts and getting hands-on with setting it up. You'll leave with all the skills and experience you need to set up your own SDX. Read more.
Add to your personal schedule
9:00am–12:30pm Tuesday, 03/26/2019
Tutorial
Data Science, Machine Learning & AI
Location: 2009 Level: Intermediate
Secondary topics:  Deep Learning, Text and Language processing and analysis
David Talby (Pacific AI), Alexander Thomas (Indeed), Claudiu Branzan (G2 Web Services)
This is a hands-on tutorial for scalable NLP using the highly performant, highly scalable open-source Spark NLP library. You’ll spend about half your time coding as you work through four sections, each with an end-to-end working codebase that you can change and improve. Read more.
Add to your personal schedule
9:00am–12:30pm Tuesday, 03/26/2019
Tutorial
Data Science, Machine Learning & AI
Location: 2011 Level: Intermediate
Secondary topics:  AI and Data technologies in the cloud, Media, Marketing, Advertising
David Arpin (Amazon Web Services)
Learn how to use the Amazon SageMaker platform to build a machine learning model to recommend products to customers based on their past preferences. Read more.
Add to your personal schedule
9:00am–5:00pm Tuesday, 03/26/2019
Location: 2022
Tao Feng (Lyft), Alex Kudriashova (Astro Digital), Jonathan Francis (Starbucks), JoLynn Lavin (General Mills, Inc), Robin Way (Corios), June Andrews (GE), Kyungtaak Noh (SK Telecom), Taposh Dutta Roy (Kaiser Permanente), Sabrina Dahlgren (Kaiser Permanente), Craig Rowley (Columbia Sportswear), Ambal Balakrishnan (IBM), Benjamin Glicksberg (UCSF)
Hear practical insights from household brands and global companies: the challenges they tackled, approaches they took, and the benefits—and drawbacks—of their solutions. Read more.
Add to your personal schedule
9:00am–5:00pm Tuesday, 03/26/2019
Location: 2024
From data breaches and campaign influence to fraud and recidivism, data ethics are at the forefront of today's headlines. In this day-long event, academics, practitioners, and innovators dive deep into the thorny issues of data, privacy, bias, and morality. Read more.

10:30am

10:30am–11:00am Tuesday, 03/26/2019
Location: 2nd floor lobby
Morning break (30m)

12:30pm

12:30pm–1:30pm Tuesday, 03/26/2019
Location: 2nd and 3rd floor lobbies
Lunch (1h)

1:30pm

Add to your personal schedule
1:30pm–5:00pm Tuesday, 03/26/2019
Tutorial
Data Science, Machine Learning & AI
Location: 2001 Level: Intermediate
Patrick Hall (H2O.ai | George Washington University)
If machine learning can lead to financial gains for your organization, why isn’t everyone doing it? One reason is that training machine learning systems with transparent inner workings and auditable predictions is difficult. This talk will present the good, bad, and downright ugly lessons learned from the presenters’ years of experience in implementing solutions for interpretable machine learning. Read more.
Add to your personal schedule
1:30pm–5:00pm Tuesday, 03/26/2019
Tutorial
Data Science, Machine Learning & AI
Location: 2002 Level: Beginner
Secondary topics:  Deep Learning, Media, Marketing, Advertising, Model lifecycle management
Abhishek Kumar (Publicis.Sapient), Dr. Vijay Srinivas Agneeswaran (Publicis.Sapient)
This tutorial describes deep learning based recommender and personalisation systems that we have built for clients. The tutorial primarily gives the view of TensorFlow Serving and MLFlow for the end-to-end productionalization, including model serving, dockerization, reproducibility and experimentation plus how to use Kubernetes for deployment and orchestration of ML based micro-architectures. Read more.
Add to your personal schedule
1:30pm–5:00pm Tuesday, 03/26/2019
Tutorial
Executive Briefing and best practices, Strata Business Summit
Location: 2003 Level: Non-technical
Secondary topics:  Ethics and Privacy, Security
Andrew Burt (Immuta), Steve Touw (Immuta)
This tutorial will provide a hands-on overview of how to train, validate, and audit machine learning (ML) models across the enterprise. As ML becomes increasingly important, managing its risks is quickly becoming one of the biggest challenges to the technology’s widespread adoption. Join us to walk through practical tools and best practices to help safely deploy ML. Read more.
Add to your personal schedule
1:30pm–5:00pm Tuesday, 03/26/2019
Tutorial
Data Engineering and Architecture
Location: 2004 Level: Beginner
Matt Fuller (Starburst)
Used by Facebook, Netflix, Airbnb, LinkedIn, Twitter, Uber, and others, Presto has become the ubiquitous open source software for SQL-on-Anything. Presto was built from the ground up for fast interactive SQL analytics against disparate data sources ranging in size from gigabytes to petabytes. In this tutorial, attendees will learn Presto usage and best practices, with optional hands-on exercises. Read more.
Add to your personal schedule
1:30pm–5:00pm Tuesday, 03/26/2019
Tutorial
Data Engineering and Architecture
Location: 2005 Level: Intermediate
Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio)
Many industry segments have been grappling with fast data (high-volume, high-velocity data). In this tutorial we shall lead the audience through a journey of the landscape of state-of-the-art systems for each stage of an end-to-end data processing pipeline - messaging, compute and storage - for real-time data and algorithms to extract insights - e.g., heavy-hitters, quantiles - from data streams. Read more.
Add to your personal schedule
1:30pm–5:00pm Tuesday, 03/26/2019
Tutorial
Data Science, Machine Learning & AI
Location: 2006 Level: Intermediate
Secondary topics:  AI and machine learning in the enterprise
Sourav Dey (Manifold), Alex Ng (Manifold)
Many teams are still run as if data science is mainly about experimentation, but those days are over. Now it must be turnkey to take models into production. Sourav Dey and Alex Ng explain how to streamline a machine learning project and help your engineers work as an integrated part of your production teams, using a Lean AI process and the Orbyter package for Docker-first data science. Read more.
Add to your personal schedule
1:30pm–5:00pm Tuesday, 03/26/2019
Tutorial
Data Science, Machine Learning & AI
Location: 2007 Level: Intermediate
Secondary topics:  AI and Data technologies in the cloud, Model lifecycle management
Holden Karau (Google), Francesca Lazzeri (Microsoft), Trevor Grant (IBM), Ilan Filonenko (Bloomberg LP)
This workshop will quickly introduce what Kubeflow is and how we can use it to train and serve models across different cloud environments (and on-prem). We’ll have a script ready to do the initial setup work so you can jump (almost) straight into training a model on one cloud, and then look at how to set up serving in another cluster/cloud. We will start with a simple model, with follow-up links. Read more.
Add to your personal schedule
1:30pm–5:00pm Tuesday, 03/26/2019
Tutorial
Data Engineering and Architecture
Location: 2008 Level: Intermediate
Secondary topics:  AI and Data technologies in the cloud
Jason Wang (Cloudera), Tony Wu (Cloudera), Vinithra Varadharajan (Cloudera)
Moving to the cloud poses challenges from re-architecting to be cloud-native, to data context consistency across workloads that span multiple clusters on-prem and in the cloud. First, we’ll cover in depth cloud architecture and challenges; second, you’ll use Cloudera Altus to build data warehousing and data engineering clusters and run workloads that share metadata between them using Cloudera SDX. Read more.
Add to your personal schedule
1:30pm–5:00pm Tuesday, 03/26/2019
Tutorial
Data Science, Machine Learning & AI
Location: 2009 Level: Intermediate
Secondary topics:  Deep Learning, Temporal data and time-series analytics
Jason Dai (Intel), Yuhao Yang (Intel), Jennie Wang (Intel), Guoqiong Song (Intel)
In this tutorial, we will show how to build and productionize deep learning applications for big data using Analytics Zoo (https://github.com/intel-analytics/analytics-zoo), a unified analytics + AI platform that seamlessly unites Spark, TensorFlow, Keras, and BigDL programs into an integrated pipeline, drawing on real-world use cases (such as JD.com, MLSListings, World Bank, Baosight, and Midea/KUKA). Read more.
Add to your personal schedule
1:30pm–5:00pm Tuesday, 03/26/2019
Tutorial
Data Science, Machine Learning & AI
Location: 2011 Level: Intermediate
Secondary topics:  AI and machine learning in the enterprise
Chi-Yi Kuan (LinkedIn), Yongzheng Zhang (LinkedIn), Julie Wang (LinkedIn), Xiaojing Dong (LinkedIn), Wei Di (LinkedIn)
Thanks to the rapid growth in data resources, business leaders now commonly appreciate both the challenge and the importance of mining information from data. In this tutorial, a group of well-respected data scientists will share their experiences and successes in leveraging emerging techniques to assist intelligent decisions that lead to impactful outcomes at LinkedIn. Read more.

3:00pm

3:00pm–3:30pm Tuesday, 03/26/2019
Location: 2nd floor lobby
Afternoon break (30m)

5:00pm

Add to your personal schedule
5:00pm–7:00pm Tuesday, 03/26/2019
Event
Location: Expo Hall (Exhibit Hall - Level 1)
Enjoy delicious snacks and beverages with fellow Strata attendees, speakers and sponsors at the Opening Reception happening immediately after tutorials on Tuesday. Read more.

Wednesday, 03/27/2019

8:00am

Add to your personal schedule
8:00am–8:30am Wednesday, 03/27/2019
Event
Location: TBD
Gather before Keynotes on Wednesday morning to enjoy casual conversation while meeting fellow attendees. If one of your goals at Strata is to meet new people, this session will jumpstart your networking opportunities. Read more.

8:45am

Add to your personal schedule
8:45am–10:30am Wednesday, 03/27/2019
Keynote
Location: Ballroom
Ben Lorica (O'Reilly Media), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
Program Chairs, Ben Lorica, Alistair Croll, and Doug Cutting, welcome you to the first day of keynotes. Read more.

10:30am

10:30am–11:00am Wednesday, 03/27/2019
Location: Expo Hall (Exhibit Hall - Level 1)
Break (30m)

11:00am

Add to your personal schedule
11:00am–11:40am Wednesday, 03/27/2019
Session
Data Engineering and Architecture
Location: 2001 Level: Intermediate
Jitender Aswani (Netflix), Di Lin (Netflix)
Hundreds of thousands of ETL pipelines ingest over a trillion events daily to populate millions of data tables downstream at Netflix. This session discusses Netflix’s internal data lineage service, aimed at establishing end-to-end lineage across millions of data artifacts, which was essential for enhancing the platform’s reliability, increasing trust in data, and improving data infrastructure efficiency. Read more.
Add to your personal schedule
11:00am–11:40am Wednesday, 03/27/2019
Session
Data Engineering and Architecture
Location: 2002 Level: Beginner
Secondary topics:  Data Platforms, Retail and e-commerce, Temporal data and time-series analytics
Jian Chang (Alibaba Group), Sanjian Chen (Alibaba Group)
We share the design of the AI Engine on the Alibaba TSDB service, which enables fast and complex analytics of large-scale retail data, and present a successful case study of the Fresh Hema Supermarket, a major “New Retail” platform operated by Alibaba Group. We will highlight our solutions to the major technical challenges in data cleaning, storage, and processing. Read more.
Add to your personal schedule
11:00am–11:40am Wednesday, 03/27/2019
Session
Data Engineering and Architecture
Location: 2004 Level: Intermediate
Shubham Tagra (Qubole)
Running Presto in AWS at 1/10th the cost with AWS Spot nodes can be achieved with a few architectural enhancements to Presto. This talk will explain the gaps in the Presto architecture that hinder the use of Spot nodes, cover these enhancements, and showcase the improvements in reliability and TCO achieved through them. Read more.
Add to your personal schedule
11:00am–11:40am Wednesday, 03/27/2019
Session
Data Engineering and Architecture
Location: 2006 Level: Intermediate
Osman Sarood (Mist Systems), Chunky Gupta (Mist Systems)
Live Aggregators (LA) is a highly reliable and scalable in-house real-time aggregation system that can autoscale for sudden changes in load. LA consumes billions of Kafka messages and does over 1.5 billion writes to Cassandra per day. It is 80% cheaper than competing streaming solutions due to running over AWS spot instances and having 70% CPU utilization. Read more.
Add to your personal schedule
11:00am–11:40am Wednesday, 03/27/2019
Session
Data Engineering and Architecture
Location: 2008 Level: Intermediate
Diego Oppenheimer (Algorithmia)
You've invested heavily in cleaning your data, feature engineering, and training and tuning your model, but now you have to deploy your model into production and you discover it's a huge challenge. In this talk, you'll learn common architectural patterns and best practices from the most advanced organizations that are deploying models for scalability and accessibility. Read more.
Add to your personal schedule
11:00am–11:40am Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2009 Level: Intermediate
Jari Koister (FICO)
Financial Services are increasingly deploying AI services for a wide range of applications such as credit life cycle, fraud, and financial crimes. Such deployment requires models to be interpretable, explainable and resilient to adversarial attacks. Regulatory requirements prohibit application of black-box machine learning models. This talk describes what FICO has developed to support these needs. Read more.
Add to your personal schedule
11:00am–11:40am Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2011 Level: Intermediate
Tristan Zajonc (Cloudera), Tim Chen
Data platforms are being asked to support an ever increasing range of workloads and compute environments, including machine learning and elastic cloud platforms. In this talk, we will discuss some emerging capabilities, including running machine learning and Spark workloads on autoscaling container platforms, and share our vision of the road ahead for ML and AI in the cloud. Read more.
Add to your personal schedule
11:00am–11:40am Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2014 Level: Intermediate
Robert Horton (Microsoft), Mario Inchiosa (Microsoft), Ali Zaidi (Microsoft)
We show how three cutting-edge machine learning techniques can be used together to up your modeling game: (1) transfer learning from pre-trained language models, (2) active learning to make more effective use of a limited labeling budget, and (3) hyperparameter tuning to maximize model performance. We will apply these techniques to a growing business challenge: moderating public discussions. Read more.
Add to your personal schedule
11:00am–11:40am Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2016 Level: Beginner
Jeremy Howard (Enlitic)
When deep learning can be easily applied by non-engineers (who possess extensive domain expertise), we can accelerate not only the pace of industry adoption but also the rate at which we uncover interesting and relevant research problems. Read more.
11:00am–11:40am Wednesday, 03/27/2019 TBC
11:00am–11:40am Wednesday, 03/27/2019
Location: 2020
TBC
Add to your personal schedule
11:00am–11:40am Wednesday, 03/27/2019
Session
Case studies, Strata Business Summit
Location: 2024 Level: Intermediate
Bysshe Easton (KIXEYE), Thomas Dobbs (KIXEYE)
As fully closed model economies, games offer a unique opportunity to use analytics to create tailored purchase opportunities for customers. We’ll cover how KIXEYE was able to use machine learning to create personalized offer recommendations for our customers, resulting in significantly increased monetization and retention. We’ll go over some of the important choices to make and pitfalls to avoid. Read more.
Add to your personal schedule
11:00am–11:40am Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: Expo Hall
Secondary topics:  AI and Data technologies in the cloud, Ethics and Privacy, Security
Alon Kaufman (Duality), Vinod Vaikuntanathan (MIT and Duality Technologies)
In this talk, we will discuss the challenges and opportunities of machine learning on encrypted data and describe the state of the art in this space. Read more.

11:50am

Add to your personal schedule
11:50am–12:30pm Wednesday, 03/27/2019
Session
Data Engineering and Architecture
Location: 2001 Level: Intermediate
This talk describes the data lineage system we built at Stitch Fix and the journey of building it from the ground up. Read more.
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/27/2019
Session
Data Engineering and Architecture
Location: 2002 Level: Intermediate
Kurt Brown (Netflix)
The Netflix data platform is a massive-scale, cloud-only suite of tools and technologies. It includes big data techs (e.g. Spark and Flink), enabling services (e.g. federated metadata management), and machine learning support. But with power comes complexity. I'll talk through how we are investing towards an easier, "self-service" data platform without sacrificing our enabling capabilities. Read more.
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/27/2019
Session
Data Engineering and Architecture
Location: 2004 Level: Advanced
Lars Volker (Cloudera), Michael Ho (Cloudera)
In recent years, Apache Impala has been deployed to clusters that are large enough to hit architectural limitations in the stack. Our talk will cover the efforts and results to address the scalability limitations in the now legacy Thrift RPC framework by using Apache Kudu's RPC which was built from the ground up to support asynchronous communication, multiplexed connections, TLS, and Kerberos. Read more.
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/27/2019
Session
Data Engineering and Architecture
Location: 2006 Level: Intermediate
In a large global health service company, streaming data for processing and sharing comes with its own challenges. Data science and analytics platforms need data fast from relevant sources, so they can act on it quickly and share the insights with consumers with the same speed and urgency. Streaming data architectures are a necessity. Kafka and Hadoop are key. Read more.
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/27/2019
Session
Data Engineering and Architecture
Location: 2008 Level: Intermediate
Tobias Knaup (Mesosphere), Jörg Schad (Mesosphere, Inc.)
There are many great tutorials for training your deep learning models using TensorFlow, Keras, Spark, or one of the many other frameworks. But training is only a small part of the overall deep learning pipeline. This talk gives an overview of building a complete automated deep learning pipeline, from exploratory analysis through training, model storage, model serving, and monitoring. Read more.
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2009 Level: Intermediate
Chakri Cherukuri (Bloomberg LP)
In this talk we will see how machine learning and deep learning techniques can be applied in the field of quantitative finance. We will look at a few use-cases in detail and see how machine learning techniques can supplement and sometimes even improve upon already existing statistical models. We will also look at novel visualizations to help us better understand and interpret these models. Read more.
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2011 Level: Advanced
Sarah Aerni (Salesforce)
How does Salesforce manage to make data science an agile partner to over 100,000 customers? We will share the nuts and bolts of the platform and our agile process. From our open-source autoML library (TransmogrifAI) and experimentation to deployment and monitoring, we will cover how the tools make it possible for our data scientists to rapidly iterate and adopt a truly agile methodology. Read more.
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2014 Level: Intermediate
Michael Johnson (Lockheed Martin), Norris Heintzelman (Lockheed Martin)
How do you train a machine learning model with no training data? We will present our journey implementing multiple solutions to bootstrapping training data in the NLP domain. We will cover topics including weak supervision, building an active learning framework, and annotation adjudication for Named Entity Recognition. Read more.
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2016 Level: Intermediate
Melinda Williams (Dstillery)
Customer segmentation based on coarse survey data has long been a staple of traditional market research. We use deep learning to model the digital pathways of over a hundred million consumers and use this embedding to cluster customer populations into fine-grained behavioral segments and inform smarter consumer insights. Along the way, we create a map of the internet. Read more.
11:50am–12:30pm Wednesday, 03/27/2019
Location: 2018
TBC
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/27/2019
Session
Executive Briefing and best practices, Strata Business Summit
Location: 2020 Level: Intermediate
How do you decide if you should invest in upskilling business teams? The question is no longer IF but rather WHEN and HOW. I’m going to share with you a framework for that. In my time at GE, Google and GV, I have created and conducted multiple analytics trainings for non-technical users. They have resulted in increased productivity as well as work satisfaction. Read more.
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/27/2019
Session
Culture and organization, Strata Business Summit
Location: 2024 Level: Non-technical
Maryam Jahanshahi (TapRecruit)
Hiring teams largely rely on both intuition and experience to scout talent for data science and data engineering roles. Drawing on results from analyzing over 15 million jobs and their outcomes, Maryam Jahanshahi interrogates these “common sense” judgements to determine whether they help or hurt hiring of data scientists and engineers. Read more.
Add to your personal schedule
11:50am–12:30pm Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: Expo Hall Level: Intermediate
Ron Bodkin (Google)
Google uses Deep Learning extensively in new and existing products. Come learn about how Google has used Deep Learning for recommendations at YouTube, the Play store and for customers in Google Cloud. Learn about the role of embeddings, recurrent networks, contextual variables and wide and deep learning and how to do both candidate generation and ranking with Deep Learning. Read more.

12:30pm

Add to your personal schedule
12:30pm–2:40pm Wednesday, 03/27/2019
Event
Location: Expo Hall (Exhibit Hall - Level 1)
Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics. Read more.

2:40pm

Add to your personal schedule
2:40pm–3:20pm Wednesday, 03/27/2019
Session
Data Engineering and Architecture
Location: 2001 Level: Advanced
James Taylor (Lyft)
This talk will provide details of an automated feedback loop at Lyft to adapt ETL based on the aggregate cost of queries run across the cluster. In addition, future work will be outlined to enhance the system through the use of materialized views to reduce the number of ad hoc joins and sorting performed by the most expensive queries by transparently rewriting queries when possible. Read more.
Add to your personal schedule
2:40pm–3:20pm Wednesday, 03/27/2019
Session
Data Engineering and Architecture
Location: 2002 Level: Intermediate
Yaron Haviv (iguazio)
Faced with the need to handle increasing volumes of data, alternative data sets ("alt data") and AI, many enterprises are working to design or redesign their big data architectures. While traditional batch platforms fail to generate sufficient ROI, Yaron Haviv suggests a Continuous Analytics approach yielding faster answers for the business while remaining simpler and less expensive for IT. Read more.
Add to your personal schedule
2:40pm–3:20pm Wednesday, 03/27/2019
Session
Data Engineering and Architecture
Location: 2004 Level: Intermediate
Zhenxiao Luo (Uber)
From determining the most convenient rider pickup points to predicting the fastest routes, Uber uses data-driven analytics to create seamless trip experiences. Inside Uber, analysts want to run analytics on any data source in real time. This talk shares Uber’s engineering effort to enable real-time analytics on any data source on the fly, without copying data. Read more.
Add to your personal schedule
2:40pm–3:20pm Wednesday, 03/27/2019
Session
Data Engineering and Architecture
Location: 2006 Level: Intermediate
Adem Efe Gencer (LinkedIn)
This talk will describe our work and experiences towards alleviating the management overhead of large-scale Kafka clusters using Cruise Control at LinkedIn. Read more.
2:40pm–3:20pm Wednesday, 03/27/2019
Session
Data Engineering and Architecture
Location: 2008
TBC
Add to your personal schedule
2:40pm–3:20pm Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2009 Level: Intermediate
Kapil Gupta (Airbnb)
In this talk, we will present how we approach personalization of travelers’ booking experience using Machine Learning. Read more.
Add to your personal schedule
2:40pm–3:20pm Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2011 Level: Intermediate
Ted Dunning (MapR)
Evaluating machine learning models is surprisingly hard. It gets even harder because these systems interact in very subtle ways. I will break the problem of evaluation apart into operational and functional evaluation and show how each can be done without unnecessary pain and suffering. In particular, I will show some exciting visualization techniques that help make differences strikingly apparent. Read more.
2:40pm–3:20pm Wednesday, 03/27/2019
Location: 2014
TBC
Add to your personal schedule
2:40pm–3:20pm Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2016 Level: Intermediate
Chenhui Hu (Microsoft)
Dilated neural networks are a class of recently developed neural networks that achieve promising results in time series forecasting. We introduce representative network architectures of dilated neural networks. Then, we demonstrate their advantages in terms of training efficiency and forecast accuracy by applying them to solve sales forecasting and financial time series forecasting problems. Read more.
Add to your personal schedule
2:40pm–3:20pm Wednesday, 03/27/2019
Session
Business Analytics and Visualization, Strata Business Summit
Location: 2018 Level: Intermediate
Pierre Romera (International Consortium of Investigative Journalists (ICIJ))
Pierre Romera, the ICIJ’s Chief Technical Officer, can offer a behind-the-scenes look into the process and explore the challenges in handling 1.4 TB of data (in many different formats) – and making it available securely to journalists all over the world. The ICIJ was the team behind the Panama Papers and Paradise Papers. Read more.
Add to your personal schedule
2:40pm–3:20pm Wednesday, 03/27/2019
Session
Culture and organization, Strata Business Summit
Location: 2024 Level: Non-technical
Eric Colson (Stitch Fix), Daragh Sibley (Stitch Fix)
A/B testing has revealed the fallibility of the human intuition that typically drives business decisions. We describe some types of systematic errors that domain experts commit. In this interactive session, we demonstrate and discuss how cognitive biases arise from heuristic reasoning processes. Finally, we propose several mechanisms to mitigate these human limitations and improve our decision-making. Read more.

3:20pm

3:20pm–4:20pm Wednesday, 03/27/2019
Location: Expo Hall (Exhibit Hall - Level 1)
Afternoon Break (1h)

4:20pm

Add to your personal schedule
4:20pm–5:00pm Wednesday, 03/27/2019
Session
Data Engineering and Architecture
Location: 2001 Level: Intermediate
Alex Kira (Uber)
Uber operates at scale, with thousands of microservices serving millions of rides a day leading to more than a hundred petabytes of data. We will describe our journey towards a unified and scalable data workflow system at Uber used to manage this data. We will talk about the challenges we faced and how we have re-architected our system to make it highly available and horizontally scalable. Read more.
Add to your personal schedule
4:20pm–5:00pm Wednesday, 03/27/2019
Session
Data Engineering and Architecture
Location: 2002 Level: Intermediate
Jowanza Joseph (OneClickRetail), Karthik Ramasamy (Streamlio)
After 2 years of running streaming pipelines through Kinesis and Spark at One Click Retail, we evaluated our solution and decided to explore a new platform that would (1) take advantage of Kubernetes and (2) support a simpler data processing DSL. We settled on Apache Pulsar because of its native support for Kubernetes and Pulsar Functions, a serverless functions model on top of Pulsar. Read more.
Add to your personal schedule
4:20pm–5:00pm Wednesday, 03/27/2019
Session
Data Engineering and Architecture
Location: 2004 Level: Intermediate
Julien Le Dem (WeWork)
Big data infrastructure has evolved from flat files in a distributed filesystem to an efficient ecosystem to a fully deconstructed and open source database with reusable components. Julien Le Dem discusses the key open source components of the big data ecosystem and explains how they relate to each other and how they make the ecosystem more of a database and less of a filesystem. Read more.
Add to your personal schedule
4:20pm–5:00pm Wednesday, 03/27/2019
Session
Data Engineering and Architecture
Location: 2006 Level: Beginner
Sean Glover (Lightbend)
Introducing Strimzi, a Kafka project for Kubernetes. The best way to run stateful services with complex operational needs like Kafka is to use the operator pattern. This talk will review a popular new open source operator-based Apache Kafka implementation on Kubernetes called the Strimzi Kafka Operator. Read more.
4:20pm–5:00pm Wednesday, 03/27/2019
Location: 2008
TBC
Add to your personal schedule
4:20pm–5:00pm Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2009 Level: Intermediate
Anish Kejariwal (Roche), David Talby (Atigeo)
We’ll show how Roche applies Spark NLP for Healthcare to extract clinical facts from pathology reports and radiology, and the design of the deep learning pipelines used to simplify training, optimization, and inference of such domain-specific models at scale. Read more.
Add to your personal schedule
4:20pm–5:00pm Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2011 Level: Intermediate
Kelley Rivoire (Stripe)
Production ML applications benefit from reproducible, automated retraining and deployment of ever-more predictive models trained on ever-increasing amounts of data. In this talk, I’ll describe how Stripe built a flexible API for training machine learning models that we use to train thousands of models per week on Kubernetes, supporting automated deployment of new models with improved performance. Read more.
Add to your personal schedule
4:20pm–5:00pm Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2014 Level: Intermediate
Jeffrey Yau (AllianceBernstein)
Time series forecasting techniques are applied in a wide range of scientific disciplines, business scenarios, and policy settings. This presentation discusses the applications of statistical time series models, such as ARIMA, VAR, and Regime Switching Models, and machine learning models, such as random forest and neural network-based models, to forecasting problems. Read more.
Add to your personal schedule
4:20pm–5:00pm Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2016 Level: Intermediate
Jennie Wang (Intel), Luyang Wang (OfficeDepot), Jing (Nicole) Kong (OfficeDepot)
User-based real-time recommendation systems have become an important topic in e-commerce. This talk demonstrates how to build deep learning algorithms using Analytics Zoo with BigDL on Apache Spark and create an end-to-end system to serve real-time product recommendations. Read more.
Add to your personal schedule
4:20pm–5:00pm Wednesday, 03/27/2019
Session
Executive Briefing and best practices, Strata Business Summit
Location: 2020 Level: Non-technical
Secondary topics:  AI and machine learning in the enterprise, Data preparation, data governance, data lineage, and data privacy
Paco Nathan (derwen.ai)
Data governance is an almost overwhelming topic. This talk surveys its history and themes, plus tools, processes, standards, etc. Mistakes imply data quality issues, lack of availability, and other risks that prevent leveraging data. On the other hand, compliance issues aim to prevent the risks of leveraging data inappropriately. Ultimately, risk management plays the "thin edge of the wedge" in enterprise. Read more.
Add to your personal schedule
4:20pm–5:00pm Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: Expo Hall Level: Intermediate
Gungor Polatkan (LinkedIn)
Talent search systems at LinkedIn strive to match potential candidates to the hiring needs of a recruiter expressed in terms of a search query. In this talk, we present the results of our deployment of deep learning models on a real-world production system serving 500M+ users through LinkedIn Recruiter. The challenges and approaches discussed generalize to any multi-faceted search engine. Read more.

5:10pm

Add to your personal schedule
5:10pm–5:50pm Wednesday, 03/27/2019
Session
Data Engineering and Architecture
Location: 2001 Level: Intermediate
Secondary topics:  AI and Data technologies in the cloud, Data Integration and Data Pipelines
Gwen Shapira (Confluent)
As microservices, data services and serverless APIs proliferate, data engineers need to collect and standardize data in an increasingly complex and diverse system. In this presentation, we’ll discuss how data engineering requirements changed in a cloud-native world and share architectural patterns that are commonly used to build flexible, scalable and reliable data pipelines. Read more.
Add to your personal schedule
5:10pm–5:50pm Wednesday, 03/27/2019
Session
Data Engineering and Architecture
Location: 2002 Level: Beginner
Rustem Feyzkhanov (Instrumental)
Serverless implementation of core processing is becoming a production-ready solution for a lot of companies. Companies with existing processing pipelines may find it hard to go completely serverless. Serverless workflows unite the serverless world and the cluster world to gain the benefits of both approaches. My talk will show how serverless workflows change our perception of software architecture. Read more.
Add to your personal schedule
5:10pm–5:50pm Wednesday, 03/27/2019
Session
Data Engineering and Architecture
Location: 2004 Level: Intermediate
Tim Armstrong (Cloudera)
As the popularity and utilization of Apache Impala deployments increase, clusters often become victims of their own success when demand for resources exceeds the supply. This talk will dive into the latest resource management features in Impala to maintain high cluster availability and optimal performance as well as provide examples of how to configure them in your Impala deployment. Read more.
Add to your personal schedule
5:10pm–5:50pm Wednesday, 03/27/2019
Session
Data Engineering and Architecture, Streaming and IoT
Location: 2006 Level: Beginner
GE produces a third of the world's power and 60% of airplane engines. These engines form a critical portion of the world's infrastructure and require meticulous monitoring of the hundreds of sensors streaming data from each turbine. Here, we share the case study of releasing into production the first real-time ML systems used to determine turbine health by GE's monitoring and diagnostics teams. Read more.
Add to your personal schedule
5:10pm–5:50pm Wednesday, 03/27/2019
Session
Data Engineering and Architecture
Location: 2008 Level: Intermediate
Rachel Silver (MapR Technologies)
KubeFlow separates compute and storage to provide the ability to deploy best-of-breed open source systems for machine learning to any cluster running Kubernetes, whether on-premises or in the cloud. This talk will explore the problems of state and storage and how distributed persistent storage can logically extend the compute flexibility provided by KubeFlow. Read more.
5:10pm–5:50pm Wednesday, 03/27/2019
Location: 2009
TBC
Add to your personal schedule
5:10pm–5:50pm Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2011 Level: Intermediate
Ting-Fang Yen (DataVisor)
We describe a monitor for production machine learning systems that handle billions of requests daily. Our approach discovers detection anomalies, such as spurious false positives, as well as gradual concept drifts when the model no longer captures the target concept. This session presents new tools for detecting undesirable model behaviors early in large-scale online ML systems. Read more.
Add to your personal schedule
5:10pm–5:50pm Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2014 Level: Beginner
Mike Lee Williams (Cloudera Fast Forward Labs)
Imagine building a model whose training data is collected on edge devices such as cell phones or sensors. Each device collects data unlike any other, and the data cannot leave the device because of privacy concerns or unreliable network access. This challenging situation is known as federated learning. In this talk we’ll cover the algorithmic solutions and the product opportunities. Read more.
Add to your personal schedule
5:10pm–5:50pm Wednesday, 03/27/2019
Session
Data Science, Machine Learning & AI
Location: 2016 Level: Intermediate
Zhenxiao Luo (Uber)
From determining the most convenient rider pickup points to predicting the fastest routes, Uber uses data-driven analytics to create seamless trip experiences. Inside Uber, analysts are using deep learning and big data to train models, make predictions, and run analytics in real time. This talk shares Uber’s engineering effort to run real-time analytics with deep learning. Read more.
Add to your personal schedule
5:10pm–5:50pm Wednesday, 03/27/2019
Session
Culture and organization, Strata Business Summit
Location: 2018 Level: Non-technical
Dave Stuart (Department of Defense)
Many organizations look to add data science to their skill portfolios through the hiring of data science experts. We explore a complementary way to build a data science savvy workforce that nets tremendous value by using Jupyter to add introductory data science practices to domain experts and business analysts. Read more.
5:10pm–5:50pm Wednesday, 03/27/2019
Location: 2020
TBC
Add to your personal schedule
5:10pm–5:50pm Wednesday, 03/27/2019
Session
Case studies, Strata Business Summit
Location: 2024 Level: Intermediate
Secondary topics:  Media, Marketing, Advertising
Eric Bradlow (The Wharton School), Zachery Anderson (Electronic Arts)
A case study presented by leadership at the Wharton Customer Analytics Initiative and Electronic Arts about the WCAI Research Opportunity process & how some of EA’s business problems were solved using their data by 11 teams of researchers from around the world. Read more.

5:50pm

Add to your personal schedule
5:50pm–6:50pm Wednesday, 03/27/2019
Event
Location: Expo Hall (Exhibit Hall - Level 1)
Make your way from booth to booth while you check out all the exhibitors in the Expo Hall on Wednesday after sessions end. Read more.

7:30pm

Add to your personal schedule
7:30pm–10:00pm Wednesday, 03/27/2019
Event
Location: TBD
Don't miss an exciting evening filled with cocktails, food, and entertainment at Data After Dark at Strata San Francisco. Read more.

Thursday, 03/28/2019

8:00am

Add to your personal schedule
8:00am–8:30am Thursday, 03/28/2019
Event
Location: TBD
Gather before Keynotes on Thursday morning to enjoy casual conversation while meeting fellow attendees. If one of your goals at Strata is to meet new people, this session will jumpstart your networking opportunities. Read more.

8:45am

Add to your personal schedule
8:45am–10:30am Thursday, 03/28/2019
Keynote
Location: Ballroom
Ben Lorica (O'Reilly Media), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
Program Chairs, Ben Lorica, Alistair Croll, and Doug Cutting, welcome you to the second day of keynotes. Read more.

10:30am

10:30am–11:00am Thursday, 03/28/2019
Location: Expo Hall (Exhibit Hall - Level 1)
Morning break (30m)

11:00am

Add to your personal schedule
11:00am–11:40am Thursday, 03/28/2019
Session
Data Engineering and Architecture
Location: 2001 Level: Intermediate
Mark Grover (Lyft)
In this talk, we'll discuss how Lyft has reduced the time taken to discover data by 10x by building its own data portal, Amundsen. We will give a demo of Amundsen, deep dive into its architecture, and discuss how it leverages centralized metadata, page rank, and a comprehensive data graph to achieve its goal. We will close with the future roadmap, unsolved problems, and the collaboration model. Read more.
Add to your personal schedule
11:00am–11:40am Thursday, 03/28/2019
Session
Data Engineering and Architecture
Location: 2002 Level: Intermediate
Subha Tatavarti (PayPal), Vadim Kutsyy (PayPal)
PayPal’s data ecosystem is fairly large, with 250+ PB of data transacting in 200+ countries. Given this massive scale and complexity, discovering and accessing the right data sets in a frictionless environment is a massive challenge. PayPal’s Data Platform team is helping solve this problem holistically with a combination of self-service, integrated, and interoperable products. Read more.
Add to your personal schedule
11:00am–11:40am Thursday, 03/28/2019
Session
Data Engineering and Architecture
Location: 2004 Level: Intermediate
Kamil Bajda-Pawlikowski (Starburst), Martin Traverso (Facebook)
Presto, an open source distributed SQL engine, is designed for interactive queries and the ability to query multiple data sources. With the ever-growing list of connectors (e.g., Apache Kudu, Pulsar, Netflix Iceberg, Elasticsearch), the recently introduced Cost-Based Optimizer in Presto must account for heterogeneous data sources with incomplete statistics and new use cases such as geospatial analytics. Read more.
11:00am–11:40am Thursday, 03/28/2019 TBC
Add to your personal schedule
11:00am–11:40am Thursday, 03/28/2019
Session
Data Engineering and Architecture
Location: 2008 Level: Intermediate
Zhen Fan (JD.com)
JD.com has designed a brand new architecture to optimize its Spark computing clusters. We will show the problems we faced before and how we now benefit from the in-memory distributed filesystem. Read more.
Add to your personal schedule
11:00am–11:40am Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2009 Level: Intermediate
Boris Yakubchik (Forbes)
Introducing Bertie, our new publishing platform at Forbes. Bertie is an AI assistant that learns from writers at all times and suggests improvements along the way. We will discuss Bertie’s features, architecture, and ultimate goals. We will give special attention to how we implement an ensemble of machine learning models that, together, make up the skill set and personality of the AI assistant. Read more.
Add to your personal schedule
11:00am–11:40am Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2011 Level: Intermediate
Sourav Dey (Manifold)
Clustered data is all around us. The best way to attack it? Mixed effect models. This talk explains how the Mixed Effects Random Forests (MERF) model and Python package marry the world of classical mixed effect modeling with modern machine learning algorithms, and how they can be extended for use with other advanced modeling techniques like gradient boosting machines and deep learning. Read more.
Add to your personal schedule
11:00am–11:40am Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2014 Level: Intermediate
Ram Shankar Kumar (Microsoft (Azure Security))
How can we guarantee to our customers that the ML system we develop is adequately protected from adversarial manipulation? Data scientists, program managers, and security experts will take away a framework and corresponding best practices to quantitatively assess the safety of their ML systems. Read more.
Add to your personal schedule
11:00am–11:40am Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2016 Level: Beginner
Olivia Wang (Datavisor)
Online fraud flourishes as online services become ubiquitous in our daily life. This talk will discuss how Datavisor leverages cutting-edge deep learning technologies to address the challenges in large-scale fraud detection. Read more.
Add to your personal schedule
11:00am–11:40am Thursday, 03/28/2019
Session
Culture and organization, Strata Business Summit
Location: 2018 Level: Non-technical
Marc Paradis (UnitedHealth Group)
Data Science University (DSU) was established to bring analytics education to UnitedHealth Group, the world’s largest healthcare company with over 270,000 employees. In an era of rapidly changing analytics technology and capability in an industry ripe for disruption, this session will cover how DSU has been built out over time, the challenges faced, and lessons learned. Read more.
11:00am–11:40am Thursday, 03/28/2019 TBC
Add to your personal schedule
11:00am–11:40am Thursday, 03/28/2019
Session
Data Engineering and Architecture
Location: 2024 Level: Intermediate
Secondary topics:  Security
Thomas Phelan (BlueData)
Recent headline-grabbing data breaches demonstrate that protecting data is essential for every enterprise. The best-of-breed approach for Big Data is HDFS configured with Transparent Data Encryption (TDE). TDE is difficult to configure and manage - even more so when run in Docker containers. This session will discuss these challenges and how to overcome them. Read more.

11:50am

11:50am–12:30pm Thursday, 03/28/2019
Location: 2001
TBC
Add to your personal schedule
11:50am–12:30pm Thursday, 03/28/2019
Session
Data Engineering and Architecture
Location: 2002 Level: Beginner
Secondary topics:  Data Platforms, Retail and e-commerce
Learn about how a small team in Tokyo went through several evolutions as they built an analytics service to help 200+ businesses accelerate their decision-making process. This presentation will cover the background, challenges, architecture, success stories, and best practices as they built and productionalized Rakuten Analytics. Read more.
Add to your personal schedule
11:50am–12:30pm Thursday, 03/28/2019
Session
Data Engineering and Architecture, Streaming and IoT
Location: 2004 Level: Intermediate
Fabian Hueske (data Artisans)
Processing streaming data with SQL is gaining a lot of attention. In this talk, Fabian Hueske explains why SQL queries on streams should have the same semantics as SQL queries on static data. Moreover, Fabian will present a selection of common use cases and demonstrate how easily they can be addressed by Flink SQL. Read more.
Add to your personal schedule
11:50am–12:30pm Thursday, 03/28/2019
Session
Data Engineering and Architecture
Location: 2006 Level: Intermediate
Vivek Pasari (Netflix)
Netflix has over 125 million members spread across 191 countries. Each day our members interact with our client applications on 250 million+ devices under highly variable network conditions. These interactions result in over 200 billion daily data points. In this session, we will highlight the data engineering and architecture which enables application performance measurement at this scale. Read more.
Add to your personal schedule
11:50am–12:30pm Thursday, 03/28/2019
Session
Data Engineering and Architecture
Location: 2008 Level: Beginner
Alex Poms (Stanford University), Will Crichton (Stanford University)
Systems like Spark made it possible to process big numerical/textual data on hundreds of machines. Today, the majority of data in the world is video. Scanner is the first open-source distributed system for building large-scale video processing applications. Scanner is being used at Stanford for analyzing TBs of film with deep learning on GCP, and at Facebook for synthesizing VR video on AWS. Read more.
Add to your personal schedule
11:50am–12:30pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2009 Level: Intermediate
Jeff Chen (US Bureau of Economic Analysis)
Jeff Chen presents strategies for overcoming time series challenges at the intersection of macroeconomics and data science, drawing from machine learning research conducted at the Bureau of Economic Analysis aimed at improving its flagship product, the Gross Domestic Product. Read more.
Add to your personal schedule
11:50am–12:30pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2011 Level: Beginner
Ken Johnston (Microsoft), Ankit Srivastava (Microsoft)
These days it’s not about normal growth, it’s about driving hockey-stick levels of growth. Sales & marketing orgs are looking to AI to help growth hack their way to new markets and segments. We have used Mutual Information for many years to help filter out noise and find the critical insights into new cohorts of users, businesses, and networks, and now we can do it at scale across massive data sources. Read more.
Add to your personal schedule
11:50am–12:30pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2014 Level: Intermediate
David Rodriguez (Cisco Systems)
Malicious DNS traffic patterns are inconsistent, ranging from periodic to sporadic, and typically thwart anomaly detection. Using Apache Spark and Rainier, Stripe’s Bayesian inference software, we fit the underlying time-series distribution for millions of domains and outline techniques to identify artificial traffic volumes related to spam, malvertising, and botnets, which we call masquerading traffic. Read more.
11:50am–12:30pm Thursday, 03/28/2019
Location: 2016
TBC
Add to your personal schedule
11:50am–12:30pm Thursday, 03/28/2019
Session
Culture and organization, Strata Business Summit
Location: 2018 Level: Non-technical
Francesco Mucio (Zalando SE)
Francesco Mucio tells the story of how Zalando went from an old-school BI company to an AI-driven company built on a solid data platform. Along the way, he shares what Zalando learned in the process and the challenges that still lie ahead. Read more.
Add to your personal schedule
11:50am–12:30pm Thursday, 03/28/2019
Session
Executive Briefing and best practices, Strata Business Summit
Location: 2020 Level: Intermediate
David Talby (Pacific AI)
Machine learning and data science systems often fail in production in unexpected ways. David Talby shares real-world case studies showing why this happens and explains what you can do about it, covering best practices and lessons learned from a decade of experience building and operating such systems at Fortune 500 companies across several industries. Read more.
Add to your personal schedule
11:50am–12:30pm Thursday, 03/28/2019
Session
Data Engineering and Architecture
Location: 2024 Level: Advanced
Alkis Simitsis (Micro Focus), Shivnath Babu (Unravel Data Systems, Duke University)
This talk describes an automated technique for root cause analysis (RCA) for big data stack applications using deep learning techniques. Spark and Impala will be used as examples, but the concepts generalize to the big data stack. Read more.

12:30pm

Add to your personal schedule
12:30pm–1:50pm Thursday, 03/28/2019
Event
Location: Expo Hall (Exhibit Hall - Level 1)
Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics. Read more.

1:50pm

Add to your personal schedule
1:50pm–2:30pm Thursday, 03/28/2019
Session
Data Engineering and Architecture
Location: 2001 Level: Beginner
How efficient is your data platform? The single metric we use is Time-to-Reliable-Insights: the total time spent to ingest, transform, catalog, analyze, and publish. There are three elephants in the room when it comes to Time-to-Reliable-Insights: time-to-discover, time-to-catalog, and time-to-debug for data quality. This talk covers three design patterns and/or frameworks we have implemented. Read more.
1:50pm–2:30pm Thursday, 03/28/2019
Session
Data Engineering and Architecture
Location: 2002
TBC
Add to your personal schedule
1:50pm–2:30pm Thursday, 03/28/2019
Session
Data Engineering and Architecture
Location: 2004 Level: Intermediate
Haifeng Chen (Intel)
Spark SQL is widely used today. However, it still suffers from stability and performance challenges in highly dynamic environments with large-scale data. To address these challenges, we introduced the Spark adaptive execution engine, which can handle task parallelism, join conversion, and data skew dynamically at run-time, guaranteeing the best plan is chosen using run-time statistics. Read more.
Add to your personal schedule
1:50pm–2:30pm Thursday, 03/28/2019
Session
Data Engineering and Architecture
Location: 2006 Level: Intermediate
Michael Freedman (Timescale)
In this talk, I focus on two newly-released features of TimescaleDB (automated adaptation of time-partitioning intervals and continuous aggregations in near-real-time), and discuss how these capabilities ease time-series data management. I discuss how these capabilities have been leveraged across several different use cases, including in use with other technologies such as Kafka. Read more.
Add to your personal schedule
1:50pm–2:30pm Thursday, 03/28/2019
Session
Data Engineering and Architecture
Location: 2008 Level: Intermediate
Arun Kumar (University of California, San Diego)
This talk presents a couple of recent techniques from research to accelerate ML over data that is the output of joins of multiple tables. Using ideas from query optimization and learning theory, we show how to avoid joins before ML to reduce runtimes and memory/storage footprints. Open source software prototypes and sample ML code in both R and Python will also be shown. Read more.
Add to your personal schedule
1:50pm–2:30pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2009 Level: Intermediate
Animesh Singh (IBM), Tommy Li (IBM)
In this talk, we are going to discuss how to provide an implementation for many state-of-the-art methods for attacking and defending classifiers using the open source Adversarial Robustness Toolbox. For AI developers, the library provides interfaces that support the composition of comprehensive defense systems using individual methods as building blocks. Read more.
Add to your personal schedule
1:50pm–2:30pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2011 Level: Beginner
Jonathan Merriman (Verint Intelligent Self Service), Cynthia Freeman (Verint Intelligent Self Service)
An anomaly is a pattern not conforming to past, expected behavior. Its detection has many applications such as tracking business KPIs or fraud spotting in credit card transactions. Unfortunately, there is no one best way to detect anomalies across a variety of domains. We introduce a framework to determine the best anomaly detection method for the application based on time series characteristics. Read more.
1:50pm–2:30pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2014 Level: Intermediate
Louis DiValentin (Accenture Technology Labs), Dillon Cullinan (Accenture)
In this talk, we will show how Accenture's Cyber Security Lab built Security Analytics Models to detect Attempted Lateral Movement in networks by transforming enterprise scale security data into a graph format, generating graph analytics for individual users, and building time series detection models that visualize the changing graph metrics for security operators. Read more.
1:50pm–2:30pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2016 Level: Intermediate
Aashish Sheshadri (PayPal Inc)
Deep learning using Sequence to Sequence networks (Seq2Seq) has demonstrated unparalleled success in Neural Machine Translation. A less explored but highly sought-after area of Forecasting can leverage recent gains made in Seq2Seq networks. This talk will introduce the application of deep networks to monitoring and alerting intelligence at PayPal. Read more.
1:50pm–2:30pm Thursday, 03/28/2019
Session
Culture and organization, Strata Business Summit
Location: 2018 Level: Non-technical
Jesse Anderson (Big Data Institute), Thomas Goolsby (USAA)
What happens when you have a data science organization, but no data engineering organization? This is what happened at USAA. In this session, we will share what happened without data engineering, how we fixed it, and what the results were. Read more.
1:50pm–2:30pm Thursday, 03/28/2019
Session
Executive Briefing and best practices, Strata Business Summit
Location: 2020 Level: Beginner
Ken Johnston (Microsoft), Ankit Srivastava (Microsoft)
At the rate data sources are multiplying, business value can often be developed faster by joining data sources rather than mining a single source to the very end. This presentation covers four years of hands-on practical experience sourcing and integrating massive numbers of data sources to build the Microsoft Business Intelligence Graph (M360 BIG). Read more.
1:50pm–2:30pm Thursday, 03/28/2019
Session
Strata Business Summit, Visualization and UX
Location: 2024 Level: Non-technical
Stefaan Vervaet (Western Digital Corporation)
The École Polytechnique Fédérale de Lausanne (EPFL) spearheaded the official digital archival of 15,000+ hours of A/V content captured from the Montreux Jazz Festival since 1967, and most recently, created an immersive 3D VR experience. Covering capture, storage, delivery, and experience, this case study focuses on the evolution of the M&E workflow, from camera to cloud, that made it all possible. Read more.

2:40pm

2:40pm–3:20pm Thursday, 03/28/2019
Session
Data Engineering and Architecture
Location: 2001 Level: Beginner
Xiao Li (Databricks), Wenchen Fan (Databricks)
This talk will provide an overview of the major features and enhancements in the Apache Spark 2.4 release and upcoming releases, and will be followed by a Q&A session. Read more.
2:40pm–3:20pm Thursday, 03/28/2019
Session
Data Engineering and Architecture
Location: 2002 Level: Beginner
Rohan Dhupelia (Atlassian), Jimmy Li (Atlassian)
Analytics is easy; good analytics is hard. Here at Atlassian we know this all too well from our push to become a truly data-driven organisation. To achieve this, we've transformed the way we think about behavioural analytics, from how we define our events all the way to how we ingest and analyse them. Read more.
2:40pm–3:20pm Thursday, 03/28/2019
Session
Data Engineering and Architecture
Location: 2004 Level: Intermediate
Secondary topics:  AI and Data technologies in the cloud
Alan Choi (Cloudera), Eva Andreasson (Cloudera), Mark Brine (Cloudera)
In this talk, you will learn how Cloudera’s Finance Department used a hybrid model to speed up report delivery and reduce the cost of end-of-quarter reporting. Drawing on our experience, we offer guidelines for deploying modern data warehousing in a hybrid cloud environment: When should you choose private vs. public cloud services? What options are there? What are the dos and don'ts? Read more.
2:40pm–3:20pm Thursday, 03/28/2019
Session
Data Engineering and Architecture
Location: 2006 Level: Beginner
Akshai Sarma (Yahoo), Michael Natkovich (Yahoo)
Bullet is a scalable, pluggable, light, multi-tenant query system on any data flowing through a streaming system without storing it. Bullet queries are submitted first and operate on data flowing through the system from the point of submission. Bullet efficiently supports intractable operations like Top K, Counting Distincts and Windowing without any storage using Sketch-based algorithms. Read more.
2:40pm–3:20pm Thursday, 03/28/2019
Session
Data Engineering and Architecture
Location: 2008 Level: Beginner
Paul Curtis (MapR Technologies)
Just like almost everybody, we needed a way for ordinary users to stand up applications on top of Kubernetes, but we had additional requirements. And we had to do it without breaking the bank. Our field sales engineering force of sixty engineers around the globe now can spin up and down our technology quickly and simply using Kubernetes, the cloud, and shared data storage. Read more.
2:40pm–3:20pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2009 Level: Beginner
Kirstin Aschbacher (UCSF Cardiology)
Some people use digital devices to track their blood alcohol content (BAC) – for example, to avoid driving drunk. If a BAC-tracking App could anticipate when a person is likely to have a high BAC, it might offer coaching in a time of need. We offer a machine learning approach that predicts user BAC levels with good precision based on minimal information, thereby enabling targeted interventions. Read more.
2:40pm–3:20pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2011 Level: Intermediate
Divya Choudhary (GOJEK)
Who would have imagined that a random chat message or a note written in a local language, sent by customers to their drivers while waiting for a ride to arrive, could be used to extract unparalleled information about pickup points, including names that sometimes even Google Maps has no idea of, and ultimately help create a world-class customer pickup experience? Read more.
2:40pm–3:20pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2014 Level: Intermediate
Till Bergmann (Salesforce)
A problem in predictive modeling data is label leakage. At Enterprise companies such as Salesforce, this problem takes on monstrous proportions as the data is populated by diverse business processes, making it hard to distinguish cause from effect. We will describe how we tackled this problem at Salesforce, where we need to churn out thousands of customer-specific models for any given use case. Read more.
2:40pm–3:20pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2016 Level: Advanced
Sridhar Alla (Comcast), Syed Nasar (Cloudera)
Any business, big or small, depends on analytics, whether the goal is revenue generation, churn reduction, or sales and marketing. No matter the algorithm and the techniques used, the result depends on the accuracy and consistency of the data being processed. In this talk, we will present some techniques used to evaluate the quality of data and the means to detect anomalies in the data. Read more.
2:40pm–3:20pm Thursday, 03/28/2019
Session
Case studies, Strata Business Summit
Location: 2018 Level: Non-technical
Mei Fung (Customer Think)
Data sharing requires stakeholders and populations of people to come together and learn the benefits, risks, challenges, and the known and unknown "unknowns". Data sharing, its policies, and its policy frameworks require increasing levels of trust, which takes time to build. Trailblazing stories from Solano County, California and ASEAN (SE Asia) offer important insights. Read more.
2:40pm–3:20pm Thursday, 03/28/2019
Session
Executive Briefing and best practices, Strata Business Summit
Location: 2020 Level: Intermediate
Mark Donsky (Okera)
The General Data Protection Regulation (GDPR) went into effect in 2018, and California is following suit with the California Consumer Privacy Act (CCPA) in 2020. However, many companies aren't prepared for the strict regulation or the fines for noncompliance. This session will explore the capabilities your data environment needs in order to simplify CCPA and GDPR compliance, as well as compliance with other regulations. Read more.
2:40pm–3:20pm Thursday, 03/28/2019
Session
Data Engineering and Architecture
Location: 2024 Level: Intermediate
Secondary topics:  Financial Services, Security
Cory Watson (Stripe)
How Stripe uses data sketching and off-the-shelf parts to build a novel observability pipeline that unifies measurements across its infrastructure to both improve reliability and keep vendor costs down. Read more.

3:20pm

3:20pm–3:50pm Thursday, 03/28/2019
Location: Foyer
Afternoon break (30m)

3:50pm

3:50pm–4:30pm Thursday, 03/28/2019
Session
Data Engineering and Architecture
Location: 2001 Level: Intermediate
Li Gao (Lyft Inc.), Bill Graham (Lyft Inc.)
In this talk, Li Gao and Bill Graham will talk about challenges the Lyft team faced and solutions they developed to support Apache Spark on Kubernetes in production and at scale. Read more.
3:50pm–4:30pm Thursday, 03/28/2019
Session
Data Engineering and Architecture
Location: 2002
TBC
3:50pm–4:30pm Thursday, 03/28/2019
Session
Data Engineering and Architecture
Location: 2004 Level: Intermediate
Adrian Lungu (Adobe), Serban Teodorescu (Adobe)
Inspired by the Green / Blue deployment technique, the Adobe Audience Manager team developed an Active / Passive database migration procedure that allows us to test our database clusters in production, minimising the risks without compromising the innovation. We successfully applied this approach twice to upgrade the entire technology stack. But it never was a smooth move. Read more.
3:50pm–4:30pm Thursday, 03/28/2019
Session
Data Engineering and Architecture
Location: 2006 Level: Beginner
Václav Surovec (T-Mobile Czech Republic)
Knowledge of customers' locations and travel patterns is important for many companies. One of them is the German telco operator Deutsche Telekom. The Commercial Roaming project, built on Cloudera Hadoop, helped the company better analyze the behavior of its customers from 13 countries in a very secure way, so that it could provide better predictions and visualizations for senior management. Read more.
3:50pm–4:30pm Thursday, 03/28/2019
Session
Data Engineering and Architecture
Location: 2008 Level: Intermediate
Yuan Zhou (Intel), Haodong Tang (Intel), Jian Zhang (Intel)
We introduce Spark-PMOF and explain how it improves Spark analytics performance. Read more.
3:50pm–4:30pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2009 Level: Beginner
Noah Gift (UC Davis), Michelle Davenport (Ritual)
Learn how to explore exciting ideas in Nutrition using Data Science. In this presentation we analyze the detrimental relationship between sugar and longevity, obesity and chronic diseases. Read more.
3:50pm–4:30pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2011 Level: Intermediate
The next big step in data science combines the ease of use of common Python APIs with the power and scalability of GPUs. This session highlights the progress that has been made on PyGDF, the first step toward giving data scientists access to familiar APIs while increasing speed. We also discuss how to get started doing data science on the GPU and provide use cases involving graph analytics. Read more.
3:50pm–4:30pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2014 Level: Beginner
Patrick Miller (Civis Analytics)
Brands that test the content of ads before they are shown to an audience can avoid spending resources on the 11% of ads that cause backlash. Using a survey experiment to choose the best ad typically improves effectiveness of marketing campaigns by 13% on average, and up to 37% for particular demographics. We discuss data collection and statistical methods for analysis and reporting. Read more.
3:50pm–4:30pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2016 Level: Non-technical
Yuhao Yang (Intel)
This talk introduces how to run distributed TensorFlow on Apache Spark with the open source software package Analytics Zoo. Compared to other solutions, Analytics Zoo is built for production environments and encourages more industry users to run deep learning applications within the big data ecosystem. Read more.
3:50pm–4:30pm Thursday, 03/28/2019
Session
Law and Ethics, Strata Business Summit
Location: 2018 Level: Non-technical
Nick Curcuru (Mastercard)
In recent years, security breaches have happened to a number of household names, and users feel violated. People around the world have shared their valuable, personally identifiable information with companies they trusted, and many of those companies didn’t guard that information appropriately. Read more.
3:50pm–4:30pm Thursday, 03/28/2019
Session
Executive Briefing and best practices, Strata Business Summit
Location: 2020 Level: Non-technical
Dean Wampler (Lightbend)
Your team is building Machine Learning capabilities. I'll discuss how you can integrate these capabilities in streaming data pipelines so you can leverage the results quickly and update them as needed. There are big challenges. How do you build long-running services that are very reliable and scalable? How do you combine a spectrum of very different tools, from data science to operations? Read more.
3:50pm–4:30pm Thursday, 03/28/2019
Session
Data Engineering and Architecture
Location: 2024 Level: Intermediate
Secondary topics:  Security
John Bennett (Netflix), Siamac Mirzaie (Netflix)
Data has become a foundational pillar for security teams operating in organizations of all shapes and sizes. This new norm has created a need for platforms that enable engineers to harness data for various security purposes. This talk introduces our internal platform aimed at quickly deploying data-based detection capabilities in the Netflix corporate environment. Read more.

4:40pm

4:40pm–5:20pm Thursday, 03/28/2019
Session
Data Engineering and Architecture
Location: 2001 Level: Intermediate
Holden Karau (Google), Rachel Warren (Salesforce Einstein)
Apache Spark is an amazing distributed system, but part of the bargain we've all made with the infrastructure demons involves providing the correct set of magic numbers (aka tuning) or our jobs may be eaten by Cthulhu. This talk will look at auto-tuning jobs using historical & static job information using systems like Mahout, and internal Spark ML jobs as workloads including new settings in 2.4. Read more.
4:40pm–5:20pm Thursday, 03/28/2019
Session
Data Engineering and Architecture
Location: 2002 Level: Intermediate
Sonali Sharma (Netflix), Shriya Arora (Netflix)
With so much data being generated in real time, what if we could combine all these high-volume data streams in real time and provide near-real-time feedback for model training, improving personalization and recommendations and thereby taking the customer experience on the product to a whole new level? Well, it is possible to tame large state joins for exactly that purpose using Flink's keyed state. Read more.
4:40pm–5:20pm Thursday, 03/28/2019
Session
Data Engineering and Architecture
Location: 2004 Level: Intermediate
Igor Canadi (Rockset), Dhruba Borthakur (Rockset)
Most existing big data systems prefer sequential scans for processing queries. We challenge this view and present converged indexing: a single system called ROCKSET that builds inverted, columnar, and document indices. Converged indexing is economically feasible due to the elasticity of cloud resources and write-optimized storage engines. Read more.
4:40pm–5:20pm Thursday, 03/28/2019
Session
Data Engineering and Architecture, Streaming and IoT
Location: 2006 Level: Intermediate
Jinchul Kim (SK Telecom)
Druid supports an auto-scaling feature for data ingestion, but it is only available on AWS EC2, so we cannot rely on it in our private cloud. In this talk, we introduce automatic scale-out and scale-in on Kubernetes. We will show the benefits of our approach and where they come from, and share our development of a Druid Helm chart, rolling updates, and custom metric usage for horizontal auto-scaling. Read more.
4:40pm–5:20pm Thursday, 03/28/2019
Session
Data Engineering and Architecture
Location: 2008 Level: Advanced
Patrick Stuedi (IBM Research)
Modern networking and storage technologies like RDMA and NVMe are finding their way into the data center. Apache Crail (incubating) is a new project that facilitates running data processing workloads (ML, SQL, etc.) on such hardware. In this talk I will present Apache Crail, what it does, and how workloads based on TensorFlow or Spark can benefit from it. Read more.
4:40pm–5:20pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2009 Level: Intermediate
Alex Gorbachev (Pythian), Paul Speigelhalter (The Pythian Group)
Using the example of a mining haul truck at a leading Canadian mining company, we will cover mapping preventive maintenance needs to supervised machine learning problems, creating labeled datasets, feature engineering from sensor and alert data, and evaluating models, then converting it all into a complete AI solution on Google Cloud Platform that is integrated with existing on-premise systems. Read more.
4:40pm–5:20pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI, Law and Ethics
Location: 2011 Level: Beginner
Michael Gregory (Cloudera)
The General Data Protection Regulation (GDPR) enacted by the European Union can restrict the use of Machine Learning practices in many cases. This presentation will provide an overview of the regulations, important considerations for both EU and non-EU organizations and tools and technologies to ensure that ML applications can appropriately be used to drive continued transformation and insights. Read more.
4:40pm–5:20pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2014 Level: Beginner
Shradha Agrawal (Adobe Systems Inc)
Decision making often struggles with the exploration-exploitation dilemma, and multi-armed bandits (MAB) are a popular reinforcement learning approach for tackling it. However, increasing the number of decision criteria leads to an exponential blowup in the complexity of MAB, and observational delays don’t allow for optimal performance. This talk will introduce MAB and explain how to overcome these challenges. Read more.
4:40pm–5:20pm Thursday, 03/28/2019
Session
Data Science, Machine Learning & AI
Location: 2016 Level: Intermediate
Christopher Lennan (idealo.de)
At idealo.de we trained Convolutional Neural Networks (CNN) for aesthetic and technical image quality predictions. We will present our training approach, practical insights, and shed some light on what the trained models actually learned by visualising the convolutional filter weights and output nodes of our trained models. Read more.
4:40pm–5:20pm Thursday, 03/28/2019
Location: 2018
TBC
4:40pm–5:20pm Thursday, 03/28/2019 TBC
4:40pm–5:20pm Thursday, 03/28/2019
Session
Data Engineering and Architecture
Location: 2024 Level: Beginner
Secondary topics:  Security
Julien Delange (Twitter), Neng Lu (Twitter)
This presentation shows how Twitter uses the Heron data processing engine to monitor and analyze its network infrastructure. Within 2 months, infrastructure engineers implemented a new data pipeline that ingests multiple sources and processes about 1 billion tuples to detect network issues and generate usage statistics. The talk focuses on the key technologies used, the architecture, and the challenges. Read more.