Presented By
O’Reilly + Cloudera
Make Data Work
29 April–2 May 2019
London, UK

Monday, 29/04/2019

9:00

9:00–17:00 Monday, 29/04/2019
Training
Data Science, Machine Learning & AI
Location: S11 C
Secondary topics:  Deep Learning
Ana Hocevar (The Data Incubator)
The TensorFlow library provides for the use of computational graphs, with automatic parallelization across resources. This architecture is ideal for implementing neural networks. This training will introduce TensorFlow's capabilities in Python. It will move from building machine learning algorithms piece by piece to using the Keras API provided by TensorFlow with several hands-on applications. Read more.
9:00–17:00 Monday, 29/04/2019
Training
Data Science, Machine Learning & AI
Location: Capital Suite 1
Secondary topics:  Data preparation, data governance, and data lineage
Don Fox (The Data Incubator)
We will walk through all the steps - from prototyping to production - of developing a machine learning pipeline. We’ll look at data cleaning, feature engineering, model building/evaluation, and deployment. Students will extend these models into two applications from real-world datasets. All work will be done in Python. Read more.
9:00–17:00 Monday, 29/04/2019
Training
Data Science, Machine Learning & AI
Location: Capital Suite 7
Secondary topics:  Deep Learning
Ian Cook (Cloudera)
Advancing your career in data science requires learning new languages and frameworks—but learners face an overwhelming array of choices, each with different syntaxes, conventions, and terminology. Ian Cook simplifies the learning process by elucidating the abstractions common to these systems. Through hands-on exercises, you'll overcome obstacles to getting started using new tools. Read more.
9:00–17:00 Monday, 29/04/2019
Training
Data Engineering and Architecture
Location: Capital Suite 16
Secondary topics:  Data Integration and Data Pipelines, Streaming and realtime analytics
Jesse Anderson (Big Data Institute)
This training takes participants through an in-depth look at Apache Kafka. We show how Kafka works and how to create real-time systems with it, including how to create consumers and publishers. Then we look at Kafka’s ecosystem and how each component is used. We show how to use Kafka Streams, Kafka Connect, and KSQL. Read more.
9:00–17:00 Monday, 29/04/2019
Training
Strata Business Summit
Location: Capital Suite 17
Secondary topics:  AI and machine learning in the enterprise
Angie Ma (ASI Data Science)
Angie Ma and Jonny Howell offer a condensed introduction to key AI and machine learning concepts and techniques, showing you what is (and isn't) possible with these exciting new tools and how they can benefit your organization. Read more.
9:00–17:00 Monday, 29/04/2019
Training
Data Science, Machine Learning & AI
Location: London Suite 3
Secondary topics:  Deep Learning, Model lifecycle management
Amir Issaei (Databricks)
The course covers the fundamentals of neural networks and how to build distributed Keras/TensorFlow models on top of Spark DataFrames. Throughout the class, you will use Keras, TensorFlow, Deep Learning Pipelines, and Horovod to build and tune models. You will also use MLflow to track experiments and manage the machine learning lifecycle. NOTE: This course is taught entirely in Python. Read more.
9:00–17:00 Monday, 29/04/2019
Training
Data Engineering and Architecture
Location: London Suite 3
Secondary topics:  AI and Data technologies in the cloud, Data Integration and Data Pipelines
Jorge Lopez (Amazon Web Services)
Serverless technologies let you build and scale applications and services rapidly without the need to provision or manage servers. In this workshop, we show you how to incorporate serverless concepts into your big data architectures, looking at design patterns to ingest, store, and analyze your data. You will build a big data application using AWS technologies such as S3, Athena, Kinesis, and more. Read more.

10:30

10:30–11:00 Monday, 29/04/2019
Location: Capital Suite Foyer
Morning break (30m)

12:30

12:30–13:30 Monday, 29/04/2019
Location: Capital Suite Foyer
Lunch (1h)

15:00

15:00–15:30 Monday, 29/04/2019
Location: Capital Suite Foyer
Afternoon break (30m)

Tuesday, 30/04/2019

9:00

9:00–12:30 Tuesday, 30/04/2019
Tutorial
Data Science, Machine Learning & AI
Location: Capital Suite 2/3
Secondary topics:  AI and Data technologies in the cloud, Model lifecycle management
Holden Karau (Google), Trevor Grant (IBM), Ilan Filonenko (Bloomberg LP), Francesca Lazzeri (Microsoft)
This workshop will quickly introduce what Kubeflow is and how we can use it to train and serve models across different cloud environments (and on-prem). We’ll have a script ready to do the initial setup work so you can jump (almost) straight into training a model on one cloud, and then look at how to set up serving in another cluster/cloud. We will start with a simple model, with follow-up links. Read more.
9:00–12:30 Tuesday, 30/04/2019
Tutorial
Data Science, Machine Learning & AI
Location: Capital Suite 4
Secondary topics:  AI and Data technologies in the cloud, Deep Learning
Amy Unruh (Google)
This tutorial provides an introduction to designing and building machine learning models on Google Cloud Platform. Through a combination of presentations, demos, and hands-on labs, you’ll learn machine learning (ML) and TensorFlow concepts and develop skills in developing, evaluating, and productionizing ML models. Read more.
9:00–17:00 Tuesday, 30/04/2019
Location: Capital Suite 13
Nicolette Bullivant (Santander UK Technology), Ivan Danesi (UniCredit Services S.C.p.A.), Charlotte Werger (Van Lanschot Kempen), Yoav Einav (GigaSpaces), Yiannis Kanellopoulos (Code4Thought), Romi Mahajan (KKM Group), Rashed Iqbal (Investment and Development Office)
From analyzing risk and detecting fraud to predicting payments and improving customer experience, take a deep dive into the ways data technologies are transforming the financial industry. Read more.
9:00–12:30 Tuesday, 30/04/2019
Tutorial
Data Science, Machine Learning & AI
Location: Capital Suite 14
Secondary topics:  Model lifecycle management
Danilo Sato (ThoughtWorks), Christoph Windheuser (ThoughtWorks Inc.)
In this workshop, we will present how to apply the concept of Continuous Delivery (CD) - which ThoughtWorks pioneered - to data science and machine learning. It allows data scientists to make changes to their models, while at the same time safely integrating and deploying them into production, using testing and automation techniques to release reliably at any time and with a high frequency. Read more.
9:00–17:00 Tuesday, 30/04/2019
Location: Capital Suite 12
Alistair Croll (Solve For Interesting), Ganes Kesari (Gramener Inc), Alicia Williams (Google), Simon Moritz (Ericsson AB), Samuel Cristóbal (Innaxis), Volker Schnecke (Novo Nordisk), Julia Butter (Scout24 AG), Cecilia Marchi (Jakala), Caroline GOULARD (Dataveyes)
Hear practical insights from household brands and global companies: the challenges they tackled, approaches they took, and the benefits—and drawbacks—of their solutions. Read more.
9:00–12:30 Tuesday, 30/04/2019
Tutorial
Data Engineering and Architecture
Location: Capital Suite 8
Secondary topics:  Security and Privacy
Mark Donsky (Okera)
New regulations such as CCPA and GDPR are driving new compliance, governance, and security challenges for big data. Infosec and security groups must ensure a consistently secured and governed environment across multiple workloads that span on-prem, private cloud, multi-cloud, and hybrid cloud. We will share hands-on best practices for meeting these challenges, with special attention to CCPA. Read more.
9:00–12:30 Tuesday, 30/04/2019
Tutorial
Data Engineering and Architecture
Location: Capital Suite 9
Secondary topics:  AI and Data technologies in the cloud, Data Platforms
Mark Madsen (Think Big Analytics), Todd Walter (Teradata)
Building a data lake involves more than installing Hadoop or putting data into AWS. The goal in most organizations is to build a multiuse data infrastructure that is not subject to past constraints. Mark Madsen and Todd Walter explore design assumptions and principles and walk you through a reference architecture to use as you work to unify your analytics infrastructure. Read more.
9:00–12:30 Tuesday, 30/04/2019
Tutorial
Data Engineering and Architecture
Location: Capital Suite 10
Secondary topics:  Financial Services
Ted Malaska (Capital One), Jonathan Seidman (Cloudera)
The enterprise data management space has changed dramatically in recent years, and this has led to new challenges for organizations in creating successful data practices. In this presentation, we’ll provide guidance and best practices, from planning to implementation, based on years of experience working with companies to deliver successful data projects. Read more.
9:00–12:30 Tuesday, 30/04/2019
Tutorial
Data Engineering and Architecture
Location: Capital Suite 11
Secondary topics:  Streaming and realtime analytics
Robin Moffatt (Confluent)
In this workshop you will learn the architectural reasoning for Apache Kafka and the benefits of real-time integration, and then build a streaming data pipeline using nothing but your bare hands, Kafka Connect, and KSQL. Read more.
9:00–12:30 Tuesday, 30/04/2019
Tutorial
Data Science, Machine Learning & AI
Location: Capital Suite 15
Secondary topics:  AI and Data technologies in the cloud, Data preparation, data governance, and data lineage, Health and Medicine
S.P.T. Krishnan (REAN Cloud (A Hitachi Vantara company))
This tutorial provides an overview of the latest big data and machine learning serverless technologies from AWS, and a deep dive into using them to process and analyze two different datasets. The first is a publicly available Bureau of Labor Statistics dataset, and the second is chest X-ray image data. Read more.

10:30

10:30–11:00 Tuesday, 30/04/2019
Location: Capital Suite Foyer
Morning break (30m)

12:30

12:30–13:30 Tuesday, 30/04/2019
Location: Hall N11
Lunch (1h)

13:30

13:30–17:00 Tuesday, 30/04/2019
Tutorial
Streaming and IoT
Location: Capital Suite 2/3
Secondary topics:  Model lifecycle management, Streaming and realtime analytics
Boris Lublinsky (Lightbend), Dean Wampler (Lightbend)
This hands-on tutorial examines production use of ML in streaming data pipelines: how to do periodic model retraining and low-latency scoring in live streams. We'll discuss Kafka as the data backplane, pros and cons of microservices vs. systems like Spark and Flink, tips for TensorFlow and SparkML, performance considerations, model metadata tracking, and other techniques. Read more.
13:30–17:00 Tuesday, 30/04/2019
Tutorial
Data Science, Machine Learning & AI
Location: Capital Suite 4
Secondary topics:  AI and Data technologies in the cloud, Deep Learning
Amy Unruh (Google)
This tutorial provides an introduction to designing and building machine learning models on Google Cloud Platform. Through a combination of presentations, demos, and hands-on labs, you’ll learn machine learning (ML) and TensorFlow concepts and develop skills in developing, evaluating, and productionizing ML models. Read more.
13:30–17:00 Tuesday, 30/04/2019
Tutorial
Data Science, Machine Learning & AI
Location: Capital Suite 14
Secondary topics:  Deep Learning, Text and Language processing and analysis
Alexander Thomas, Claudiu Branzan (G2 Web Services)
This is a hands-on tutorial for scalable NLP using the highly performant, highly scalable open-source Spark NLP library. You’ll spend about half your time coding as you work through four sections, each with an end-to-end working codebase that you can change and improve. Read more.
13:30–17:00 Tuesday, 30/04/2019
Tutorial
Strata Business Summit
Location: Capital Suite 8
Secondary topics:  AI and machine learning in the enterprise
Peter Aiken (Data BluePrint, DAMA International, Virginia Commonwealth University)
This tutorial presents a more operational perspective on the use of data strategy that is especially useful for organizations just getting started with data. Read more.
13:30–17:00 Tuesday, 30/04/2019
Tutorial
Data Engineering and Architecture
Location: Capital Suite 9
Secondary topics:  AI and Data technologies in the cloud
Jason Wang (Cloudera), Tony Wu (Cloudera), Vinithra Varadharajan (Cloudera)
Moving to the cloud poses challenges ranging from re-architecting to be cloud-native to keeping data context consistent across workloads that span multiple clusters on-prem and in the cloud. First, we’ll cover cloud architecture and its challenges in depth; second, you’ll use Cloudera Altus to build data warehousing and data engineering clusters and run workloads that share metadata between them using Cloudera SDX. Read more.
13:30–17:00 Tuesday, 30/04/2019
Tutorial
Data Engineering and Architecture
Location: Capital Suite 10
Secondary topics:  AI and Data technologies in the cloud
Matt Fuller (Starburst)
Used by Facebook, Netflix, Airbnb, LinkedIn, Twitter, Uber, and others, Presto has become the ubiquitous open source software for SQL-on-Anything. Presto was built from the ground up for fast interactive SQL analytics against disparate data sources ranging in size from gigabytes to petabytes. In this tutorial, attendees will learn Presto usage and best practices, with optional hands-on exercises. Read more.
13:30–17:00 Tuesday, 30/04/2019
Tutorial
Data Engineering and Architecture, Streaming and IoT
Location: Capital Suite 11
Secondary topics:  AI and Data technologies in the cloud, Data Integration and Data Pipelines, Streaming and realtime analytics, Temporal data and time-series
Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio)
Many industry segments have been grappling with fast data (high-volume, high-velocity data). In this tutorial we shall lead the audience through a journey of the landscape of state-of-the-art systems for each stage of an end-to-end data processing pipeline - messaging, compute and storage - for real-time data and algorithms to extract insights - e.g., heavy-hitters, quantiles - from data streams. Read more.
13:30–17:00 Tuesday, 30/04/2019
Tutorial
Data Science, Machine Learning & AI
Location: Capital Suite 15
Secondary topics:  AI and Data technologies in the cloud, Deep Learning, Financial Services, Temporal data and time-series
Francesca Lazzeri (Microsoft), Aashish Bhateja (Microsoft)
Time series modeling and forecasting has fundamental importance to various practical domains and, during the past few decades, machine learning model-based forecasting has become very popular in the private and the public decision-making process. In this tutorial, we will walk you through the core steps for using Azure Machine Learning to build and deploy your time series forecasting models. Read more.

15:00

15:00–15:30 Tuesday, 30/04/2019
Location: Hall N11
Afternoon break (30m)

17:00

17:00–18:00 Tuesday, 30/04/2019
Event
Location: Expo Hall
Join us after tutorials on Tuesday in the Expo Hall. Grab a drink and mingle with fellow Strata attendees while you check out all of the exhibitors. Read more.

Wednesday, 1/05/2019

8:15

8:15–8:45 Wednesday, 1/05/2019
Event
Location: TBD
Gather before keynotes on Wednesday morning for a speed networking event. Enjoy casual conversation while meeting fellow attendees. If one of your goals at Strata is to meet new people, this session will jumpstart your networking with fellow attendees. Read more.

9:00

9:00–10:45 Wednesday, 1/05/2019
Keynote
Location: Auditorium
Ben Lorica (O'Reilly Media), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
Program chairs Ben Lorica, Alistair Croll, and Doug Cutting welcome you to the first day of keynotes. Read more.

10:45

10:45–11:15 Wednesday, 1/05/2019
Location: Expo Hall
Morning break (30m)

11:15

11:15–11:55 Wednesday, 1/05/2019
Session
Data Engineering and Architecture
Location: Capital Suite 7
Secondary topics:  AI and Data technologies in the cloud, Data Integration and Data Pipelines, Media, Marketing, Advertising, Streaming and realtime analytics
Itai Yaffe (Nielsen)
At Nielsen Marketing Cloud, we provide our customers (marketers and publishers) real-time analytics tools to profile their target audiences. To achieve that, we need to ingest billions of events per day into our big data stores and we need to do it in a scalable yet cost-efficient manner. In this talk, we will discuss how we continuously transform our data infrastructure to support these goals. Read more.
11:15–11:55 Wednesday, 1/05/2019
Session
Data Science, Machine Learning & AI
Location: Capital Suite 17
Secondary topics:  Deep Learning, Financial Services, Temporal data and time-series
Sami Niemi (Barclays)
Predicting transaction fraud in debit and credit card payments in real time is an important challenge, which state-of-the-art supervised machine learning models can help to solve. Barclays has been developing and testing different solutions and will show how well different models perform in a variety of situations, such as card-present and card-not-present debit and credit card transactions. Read more.
11:15–11:55 Wednesday, 1/05/2019
Session
Data Science, Machine Learning & AI
Location: Expo Hall (Capital Hall N24)
Secondary topics:  Media, Marketing, Advertising, Retail and e-commerce
Mounia Lalmas (Spotify)
The aim of our mission is "to match fans and artists in a personal and relevant way". In this talk, Mounia will describe some of the (research) work we are doing to achieve this, from using machine learning to metric validation. She will describe work done in the context of Home, Search, and Voice. Read more.
11:15–11:55 Wednesday, 1/05/2019
Session
Data Engineering and Architecture
Location: S11 A
Secondary topics:  AI and Data technologies in the cloud
Wojciech Biela (Starburst), Piotr Findeisen (Starburst)
Presto is a popular open source distributed SQL engine for interactive queries over heterogeneous data sources (Hadoop/HDFS, Amazon S3/Azure ADLS, RDBMS, NoSQL, etc.). Starburst recently contributed the Cost-Based Optimizer for Presto, which brings a great performance boost. Learn about this CBO’s internals, the motivating use cases, and observed improvements. Read more.
11:15–11:55 Wednesday, 1/05/2019
Session
Data Engineering and Architecture
Location: S11 B
Secondary topics:  AI and Data technologies in the cloud, Open Data, Data Generation and Data Networks, Security and Privacy
Felipe Hoffa (Google)
Before releasing a public dataset, practitioners need to thread the needle between utility and protection of individuals. We will explore massive public datasets, taking you from theory to real life and showcasing newly available tools that help with PII detection and bring concepts like k-anonymity and l-diversity to the practical realm (with options such as removing, masking, and coarsening). Read more.
11:15–11:55 Wednesday, 1/05/2019
Session
Data Engineering and Architecture
Location: Capital Suite 8/9
Secondary topics:  AI and machine learning in the enterprise, Data Platforms, Deep Learning, Text and Language processing and analysis
Moty Fania (Intel)
In this session, Moty Fania will share his experience of implementing a Sales AI platform. It processes millions of website pages and sifts through millions of tweets per day. The platform is based on unique open source technologies and was designed for real-time data extraction and actuation. This session highlights the key learnings with a thorough review of the architecture. Read more.
11:15–11:55 Wednesday, 1/05/2019
Session
Law and Ethics, Strata Business Summit
Location: Capital Suite 10/11
Secondary topics:  Security and Privacy
Mark Hinely (KirkpatrickPrice)
Organizations across the globe are trying to determine whether GDPR applies to them. Now, it seems as though GDPR principles are headed to the US. In 2018 alone, more than ten states passed or amended consumer privacy and breach notification laws. Mark Hinely will provide insight on the current and future data privacy laws in the US and how they will impact organizations across the globe. Read more.
11:15–11:55 Wednesday, 1/05/2019
Secondary topics:  AI and Data technologies in the cloud, AI and machine learning in the enterprise, IoT and its applications
Mike Olson (Cloudera)
It's easier than ever to collect data, but managing it securely, in compliance with regulations and legal constraints, is harder. There are plenty of tools that promise to bring machine learning techniques to your data, but choosing the right tools, and managing models and applications in compliance with regulation and law, is quite difficult. Read more.
11:15–11:55 Wednesday, 1/05/2019
Session
Data Science, Machine Learning & AI
Location: Capital Suite 14
Secondary topics:  Deep Learning, Media, Marketing, Advertising, Text and Language processing and analysis
In this talk you will learn how to use Spark NLP and Apache Spark to standardize semi-structured text. You will see how Indeed standardizes resume content at scale. Read more.
11:15–11:55 Wednesday, 1/05/2019
Session
Data Science, Machine Learning & AI
Location: Capital Suite 15/16
Secondary topics:  Ethics, Security and Privacy
The application of AI algorithms in domains such as criminal justice, credit scoring, and hiring holds unlimited promise. At the same time, it raises legitimate concerns about algorithmic fairness. There is a growing demand for fairness, accountability, and transparency from machine learning (ML) systems. In this talk we cover how to build just such a pipeline leveraging open source tools. Read more.
11:15–11:55 Wednesday, 1/05/2019
Jane McConnell (Teradata), Sun Maria Lehmann (Equinor)
Implementing Enterprise Data Management is never easy, but it's even harder in industrial and scientific organisations. Three worlds of business data, facilities data and scientific data have long been managed separately but must be brought together to realise business value. Sun and Jane will address the cultural and organisational differences as well as data management requirements to succeed. Read more.

12:05

12:05–12:45 Wednesday, 1/05/2019
Session
Data Engineering and Architecture, Streaming and IoT
Location: Capital Suite 7
Secondary topics:  Streaming and realtime analytics
Ted Dunning (MapR)
As a community, we have been pushing streaming architectures, particularly microservices, for several years now. But what are the results in the field? I will describe several (anonymized) case histories and describe the good, the bad and the ugly. In particular, I will describe how several teams who were new to big data fared by skipping map-reduce and jumping straight into streaming. Read more.
12:05–12:45 Wednesday, 1/05/2019
Session
Data Science, Machine Learning & AI
Location: Capital Suite 17
Secondary topics:  Deep Learning, Temporal data and time-series
Arun Kejariwal (Independent), Ira Cohen (Anodot)
Recently, Sequence-2-Sequence (S2S) has also been used for applications based on time series data. In this talk, we first give an overview of S2S and its early use cases. Subsequently, we shall walk through how S2S modeling can be leveraged for the aforementioned use cases, viz., real-time anomaly detection and forecasting. Read more.
12:05–12:45 Wednesday, 1/05/2019
Session
Data Science, Machine Learning & AI
Location: Expo Hall (Capital Hall N24)
Secondary topics:  AI and machine learning in the enterprise, Text and Language processing and analysis
Matthew Honnibal (Explosion AI)
In this talk, I'll discuss "one weird trick" that can give your NLP project a better chance of success. The advice is this: avoid a "waterfall" methodology where data definition, corpus construction, modelling and deployment are performed as separate phases of work. Read more.
12:05–12:45 Wednesday, 1/05/2019
Session
Data Engineering and Architecture
Location: S11 A
Secondary topics:  AI and Data technologies in the cloud
Jacques Nadeau (Dremio)
Performance and cost are two important considerations in determining optimized solutions for SQL workloads in the cloud. We look at TPC workloads and how they can be accelerated, invisible to client apps. We explore how Apache Arrow, Parquet, and Calcite can be used to provide a scalable, high-performance solution optimized for cloud deployments, while significantly reducing operational costs. Read more.
12:05–12:45 Wednesday, 1/05/2019
Session
Data Engineering and Architecture
Location: S11 B
Secondary topics:  Automation in data science and big data, Data preparation, data governance, and data lineage
Peter Billen (Accenture)
In this session, we will explain how to use metadata to automate delivery and operations of a data platform. By injecting automation into the delivery processes, we shorten the time-to-market while improving the quality of the initial user experience. Typical examples include data profiling and prototyping, test automation, continuous delivery and deployment, and automated code creation. Read more.
12:05–12:45 Wednesday, 1/05/2019
Session
Data Engineering and Architecture
Location: Capital Suite 8/9
Secondary topics:  Data Integration and Data Pipelines
Robin Moffatt (Confluent)
This talk discusses the concepts of events, their relevance to software and data engineers and their ability to unify architectures in a powerful way. It describes why analytics, data integration and ETL fit naturally into a streaming world. There'll be a hands-on demonstration of these concepts in practice and commentary on the design choices made. Read more.
12:05–12:45 Wednesday, 1/05/2019
Session
Law and Ethics, Strata Business Summit
Location: Capital Suite 10/11
Secondary topics:  AI and machine learning in the enterprise, Ethics
Laila Paszti (GTC Law Group PC & Affiliates)
As companies commercialize novel applications of AI in areas such as finance, hiring, and public policy, there is concern that these automated decision-making systems may unconsciously duplicate social biases, with unintended societal consequences. This talk will provide practical advice for companies to counteract such prejudices through a legal- and ethics-based approach to innovation. Read more.
12:05–12:45 Wednesday, 1/05/2019
Secondary topics:  AI and machine learning in the enterprise, Open Data, Data Generation and Data Networks
Pete Skomoroch (Workday)
Companies that understand how to apply machine intelligence will scale and win their respective markets over the next decade. Others will fail to ship successful AI products that matter to customers. This talk describes how to combine product design, machine learning, and executive strategy to create a business where every product interaction benefits from your investment in machine intelligence. Read more.
12:05–12:45 Wednesday, 1/05/2019
Session
Data Science, Machine Learning & AI
Location: Capital Suite 14
Secondary topics:  Text and Language processing and analysis
Yves Peirsman (NLP Town)
In this age of big data, NLP professionals are all too often faced with a lack of data: written language is abundant, but labelled texts are much harder to come by. In my talk, I will discuss the most effective ways of addressing this challenge: from the semi-automatic construction of labelled training data to transfer learning approaches that reduce the need for labelled training examples. Read more.
12:05–12:45 Wednesday, 1/05/2019
Session
Data Science, Machine Learning & AI, Visualization and UX
Location: Capital Suite 15/16
Secondary topics:  Visualization, Design, and UX
Michael Freeman (University of Washington)
Statistical and machine learning techniques are only useful when they're understood by decision makers. While implementing these techniques is easier than ever, communicating about their assumptions and mechanics is not. In this session, participants will learn a design process for crafting visual explanations of analytical techniques and communicating them to stakeholders. Read more.
12:05–12:45 Wednesday, 1/05/2019
Session
Case studies, Strata Business Summit
Location: Capital Suite 12
Secondary topics:  Data Platforms, Retail and e-commerce
Dirk Petzoldt (Zalando SE)
A case study from Europe’s leading online fashion platform, Zalando, about its journey to a scalable, personalized, machine learning-based marketing platform. Read more.

12:45

12:45–14:05 Wednesday, 1/05/2019
Event
Location: Lunch
Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics. Read more.

14:05

14:05–14:45 Wednesday, 1/05/2019
Session
Data Engineering and Architecture
Location: Capital Suite 7
Secondary topics:  AI and Data technologies in the cloud, Media, Marketing, Advertising, Streaming and realtime analytics
Simona Meriam (Nielsen)
We ingest billions of events per day into our big data stores, and we need to do it in a scalable, cost-efficient, and consistent way. When working with Spark and Kafka, the way you manage your consumer offsets has major implications for data consistency. We will go in depth into the solution we ended up implementing and discuss the working process and the dos and don'ts that led us to its final design. Read more.
14:05–14:45 Wednesday, 1/05/2019
Session
Data Science, Machine Learning & AI
Location: Capital Suite 17
Secondary topics:  Deep Learning, Text and Language processing and analysis
David Low (Pand.ai)
Transfer learning has proven to be a tremendous success in the computer vision field as a result of the ImageNet competition. In the past months, the natural language processing field has witnessed several breakthroughs with transfer learning, namely ELMo, OpenAI Transformer, and ULMFit. In this talk, David will showcase the use of transfer learning in an NLP application with state-of-the-art accuracy. Read more.
14:05–14:45 Wednesday, 1/05/2019
Data Science, Machine Learning & AI
Location: Expo Hall (Capital Hall N24)
Secondary topics:  Security and Privacy
Mikio Braun (Zalando SE)
In this talk, we will look at techniques and concepts around fairness, privacy, and security when it comes to machine learning models. Read more.
14:05–14:45 Wednesday, 1/05/2019
Session
Data Engineering and Architecture
Location: S11 A
Lars Volker (Cloudera), Anna Szonyi (Cloudera)
The Parquet format recently added column indexes, which improve the performance of query engines like Impala, Hive, and Spark on selective queries. We will cover the technical details of the design and its implementation, and we will give practical tips to help data architects leverage these new capabilities in their schema design. Finally, we will show performance results for common workloads. Read more.
Add to your personal schedule
14:05–14:45 Wednesday, 1/05/2019
Session
Data Engineering and Architecture
Location: S11 B
Secondary topics:  Data Platforms, Data preparation, data governance, and data lineage
Ananth Durai (Slack Technologies Inc)
Logs are everywhere. Every organization collects tons of data every day. Logs are only as good as the trust they earn for making business-critical decisions. Building trust in logs and making them reliable is critical to creating a data-driven organization. Ananth walks through his experience building reliable logging infrastructure at Slack and how it helped build confidence in data. Read more.
Add to your personal schedule
14:05–14:45 Wednesday, 1/05/2019
Session
Data Engineering and Architecture
Location: Capital Suite 8/9
Secondary topics:  Data Platforms, IoT and its applications, Retail and e-commerce, Temporal data and time-series
Jian Chang (Alibaba Group), Sanjian Chen (Alibaba Group)
We would like to share the architecture design and many detailed technology innovations of Alibaba TSDB, a state-of-the-art database for IoT data management, from years of development and continuous improvement. Read more.
Add to your personal schedule
14:05–14:45 Wednesday, 1/05/2019
Session
Law and Ethics, Strata Business Summit
Location: Capital Suite 10/11
Secondary topics:  AI and machine learning in the enterprise, Ethics
Duncan Ross (TES Global), Francine Bennett (Mastodon C)
Being good is hard. Being evil is fun and gets you paid more. Once more Duncan Ross and Francine Bennett explore how to do high-impact evil with data and analysis (and possibly AI). Make the maximum (negative) impact on your friends, your business, and the world—or use this talk to avoid ethical dilemmas, develop ways to deal responsibly with data, or even do good. But that would be perverse. Read more.
Add to your personal schedule
14:05–14:45 Wednesday, 1/05/2019
Secondary topics:  Data preparation, data governance, and data lineage, Security and Privacy
Mark Donsky (Okera), Steven Ross (Cloudera)
The General Data Protection Regulation (GDPR) went into effect in May 2018 for firms doing any business in the EU. However, many companies aren't prepared for the strict regulation or the fines for noncompliance (up to €20 million or 4% of global annual revenue). This session will explore the capabilities your data environment needs in order to simplify compliance with GDPR, as well as with future regulations. Read more.
Add to your personal schedule
14:05–14:45 Wednesday, 1/05/2019
Session
Data Science, Machine Learning & AI
Location: Capital Suite 14
Secondary topics:  Media, Marketing, Advertising, Text and Language processing and analysis
Maryam Jahanshahi (TapRecruit)
In this talk I will discuss exponential family embeddings, which are methods that extend the idea behind word embeddings to other data types. I will describe how we used dynamic embeddings to understand how data science skill-sets have transformed over the last 3 years using our large corpus of job descriptions. The key takeaway is that these models can enrich analysis of specialized datasets. Read more.
Add to your personal schedule
14:05–14:45 Wednesday, 1/05/2019
Session
Data Science, Machine Learning & AI
Location: Capital Suite 15/16
Secondary topics:  Financial Services, Temporal data and time-series
Alun Biffin (Van Lanschot Kempen), David Dogon (Van Lanschot Kempen)
In this talk, we describe how machine learning revolutionized the stock picking process for portfolio managers at Kempen Capital Management by filtering the vast small-cap investment universe down to a handful of optimal stocks. Read more.
14:05–14:45 Wednesday, 1/05/2019
Session
Case studies, Strata Business Summit
Location: Capital Suite 12
Secondary topics:  Data Platforms, Retail and e-commerce
TBC

14:55

Add to your personal schedule
14:55–15:35 Wednesday, 1/05/2019
Session
Data Engineering and Architecture, Streaming and IoT
Location: Capital Suite 7
Secondary topics:  AI and Data technologies in the cloud, IoT and its applications, Streaming and realtime analytics
Geir Endahl (Cognite), Daniel Bergqvist (Google)
Learn how Cognite is developing IIoT smart maintenance systems that can process 10M samples/second from thousands of sensors. We’ll review an architecture designed for high-performance, robust ingestion of streaming sensor data and cost-effective storage of large volumes of time series data, along with best practices for aggregation and fast queries and for achieving high performance with machine learning. Read more.
14:55–15:35 Wednesday, 1/05/2019
Session
Data Science, Machine Learning & AI
Location: Capital Suite 17
Secondary topics:  Deep Learning
TBC
Add to your personal schedule
14:55–15:35 Wednesday, 1/05/2019
Session
Data Science, Machine Learning & AI
Location: Expo Hall (Capital Hall N24)
Secondary topics:  Deep Learning
Wolff Dobson (Google)
In this talk, we will cover the latest in TensorFlow, both for beginners and for developers migrating from 1.x to 2.0. We'll cover the best ways to set up your model, feed your data to it, and distribute it for fast training. We'll also look at how TensorFlow has been recently upgraded to be more intuitive. Read more.
Add to your personal schedule
14:55–15:35 Wednesday, 1/05/2019
Session
Data Engineering and Architecture
Location: S11 A
Secondary topics:  AI and Data technologies in the cloud
Holden Karau (Google), Mikayla Konst (Google), Ben Sidhom (Google)
As more workloads move to “serverless”-like environments, the importance of properly handling downscaling increases. Read more.
Add to your personal schedule
14:55–15:35 Wednesday, 1/05/2019
Session
Data Engineering and Architecture
Location: S11 B
Secondary topics:  AI and Data technologies in the cloud, Model lifecycle management
Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio)
In this talk, we shall walk the audience through an architecture whereby models are served in real-time and the models are updated, using Apache Pulsar, without restarting the application at hand. Further, we will describe how Pulsar functions can be applied to support two example use cases, viz., sampling and filtering. We shall lead the audience through a concrete case study of the same. Read more.
Add to your personal schedule
14:55–15:35 Wednesday, 1/05/2019
Session
Data Engineering and Architecture
Location: Capital Suite 8/9
Secondary topics:  Data Integration and Data Pipelines, Data Platforms, Data preparation, data governance, and data lineage, Model lifecycle management, Security and Privacy, Transportation and Logistics
Mark Grover (Lyft), Deepak Tiwari (Lyft)
Lyft’s data platform is at the heart of Lyft’s business. Decisions ranging from pricing to ETA to business operations rely on Lyft’s data platform. Moreover, it powers the enormous scale and speed at which Lyft operates. In this talk, Mark Grover walks through the choices Lyft has made in developing and sustaining the data platform, the reasons behind them, and what lies ahead. Read more.
Add to your personal schedule
14:55–15:35 Wednesday, 1/05/2019
Session
Law and Ethics, Strata Business Summit
Location: Capital Suite 10/11
Secondary topics:  Data Platforms, Transportation and Logistics, Visualization, Design, and UX
Our experience with building the Business Intelligence platform has been nothing short of extraordinary. This talk details how Uber thought about building its Business Intelligence platform. I’ll narrate how we decided to take a platform approach rather than adding features in a piecemeal fashion. Read more.
Add to your personal schedule
14:55–15:35 Wednesday, 1/05/2019
Secondary topics:  AI and machine learning in the enterprise, Data preparation, data governance, and data lineage
Paco Nathan (derwen.ai)
Data governance is an almost overwhelming topic. This talk surveys its history and themes, along with tools, processes, standards, etc. Mistakes imply data quality issues, lack of availability, and other risks that prevent leveraging data. On the other hand, compliance efforts aim to prevent the risks of leveraging data inappropriately. Ultimately, risk management plays the "thin edge of the wedge" in the enterprise. Read more.
Add to your personal schedule
14:55–15:35 Wednesday, 1/05/2019
Session
Data Science, Machine Learning & AI
Location: Capital Suite 14
Secondary topics:  Data preparation, data governance, and data lineage
Ihab Ilyas (University of Waterloo | Tamr)
Last year, we covered two primary challenges in applying machine learning to data curation: entity consolidation and using probabilistic inference to suggest data repairs for identified errors and anomalies. This year, we'll cover these limitations in greater detail and explain why data unification projects quickly come to require human-guided machine learning and a probabilistic model. Read more.
Add to your personal schedule
14:55–15:35 Wednesday, 1/05/2019
Session
Data Science, Machine Learning & AI
Location: Capital Suite 15/16
Secondary topics:  Ethics, Financial Services, Health and Medicine
Eitan Anzenberg (Flowcast AI)
Machine learning applications balance interpretability and performance. Linear models provide formulas to directly compare the influence of the input variables, while non-linear algorithms produce more accurate models. We utilize "what-if" scenarios to calculate the marginal influence of features per prediction and compare with standardized methods such as LIME. Read more.
Add to your personal schedule
14:55–15:35 Wednesday, 1/05/2019
Session
Case studies, Strata Business Summit
Location: Capital Suite 12
David Maman (Binah.ai)
The combination of a mere few minutes of video, signal processing, remote heart rate monitoring, machine learning, and data science can identify a person’s emotions, health condition, and performance. Financial institutions and potential employers can analyze whether you have good or bad intentions. Read more.

15:35

15:35–16:35 Wednesday, 1/05/2019
Location: Expo Hall
Afternoon break (1h)

16:35

Add to your personal schedule
16:35–17:15 Wednesday, 1/05/2019
Session
Data Engineering and Architecture
Location: Capital Suite 7
Secondary topics:  AI and Data technologies in the cloud, Automation in data science and big data
Constantin Muraru (Adobe), Dan Popescu (Adobe)
Obtaining servers to run your realtime application has never been easier. Cloud providers have removed the cumbersome process of provisioning new hardware to suit your needs. What happens, though, when you wish to deploy your (web) applications frequently, on hundreds or even thousands of servers, in a fast and reliable way with minimal human intervention? This session addresses this precise topic. Read more.
Add to your personal schedule
16:35–17:15 Wednesday, 1/05/2019
Session
Data Science, Machine Learning & AI
Location: Capital Suite 17
Secondary topics:  Deep Learning, Temporal data and time-series
Guoqiong Song (Intel)
Collecting and processing massive time series data (e.g., logs, sensor readings, etc.) and detecting anomalies in real time are critical for many emerging smart systems in areas such as industry, manufacturing, AIOps, and IoT. This talk will share how to detect anomalies in time series data using Analytics Zoo and BigDL at scale on a standard Spark cluster. Read more.
16:35–17:15 Wednesday, 1/05/2019
Session
Data Science, Machine Learning & AI
Location: Expo Hall (Capital Hall N24)
Secondary topics:  Ethics, Security and Privacy
TBC
Add to your personal schedule
16:35–17:15 Wednesday, 1/05/2019
Session
Data Engineering and Architecture
Location: S11 A
Secondary topics:  AI and Data technologies in the cloud, Data Integration and Data Pipelines
Anirudha Beria (Qubole), Rohit Karlupia (Qubole)
Autoscaling of resources aims to achieve low latency for a big data application while reducing resource costs at the same time. Scalability-aware autoscaling aims to use historical information to make better scaling decisions. In this talk, we will discuss (1) measuring the efficiency of autoscaling policies and (2) coming up with more efficient autoscaling policies in terms of latency and cost. Read more.
Add to your personal schedule
16:35–17:15 Wednesday, 1/05/2019
Session
Data Engineering and Architecture
Location: S11 B
Secondary topics:  Model lifecycle management
Arif Wider (ThoughtWorks), Emily Gorcenski (ThoughtWorks)
Machine learning can be challenging to deploy and maintain. Data change, and both models and the systems that implement them must be able to adapt. Any delay in moving models from research to production means leaving your data scientists' best work on the table. In this talk, we explore continuous delivery (CD) for AI/ML and examine case studies for applying CD principles to data science workflows. Read more.
Add to your personal schedule
16:35–17:15 Wednesday, 1/05/2019
Session
Data Engineering and Architecture
Location: Capital Suite 8/9
Secondary topics:  Data Platforms, Data preparation, data governance, and data lineage, Retail and e-commerce
Neelesh Salian (Stitch Fix)
Developing data infrastructure is not trivial and neither is changing it. It takes effort and discipline to make changes that can affect your team. In this talk, we shall learn what we, in Stitch Fix's Data Platform team, do to maintain and innovate our infrastructure for our Data Scientists. Read more.
Add to your personal schedule
16:35–17:15 Wednesday, 1/05/2019
Session
Strata Business Summit, Visualization and UX
Location: Capital Suite 10/11
Secondary topics:  Visualization, Design, and UX
Brian O'Neill (Designing for Analytics)
Gartner says 85%+ of big data projects will fail, despite the fact your company may have invested millions on engineering implementation. Why are customers and employees not engaging with these products and services? Brian O'Neill explains why a "people first, technology second" mission—a design strategy, in other words—enables the best UX and business outcomes possible. Read more.
Add to your personal schedule
16:35–17:15 Wednesday, 1/05/2019
Ellen Friedman (MapR Technologies)
A surprising fact of modern technology is that not knowing some things can make you better at what you do. This isn’t just lack of distraction or being too delicate to face reality. It’s about separation of concerns, with a techno flavor. In this talk, I go through five things that best practices with emerging technologies and new architectures let us not know, and why that’s important. Read more.
Add to your personal schedule
16:35–17:15 Wednesday, 1/05/2019
Session
Data Science, Machine Learning & AI
Location: Capital Suite 14
Secondary topics:  AI and machine learning in the enterprise, Data preparation, data governance, and data lineage, Text and Language processing and analysis, Transportation and Logistics
Divya Choudhary (GOJEK)
Data scientists around the globe would agree that addresses are the most unorganised textual data. Structuring addresses has almost led to a new stream of NLP itself. Who would've imagined that address text data can be used to develop one of the coolest product features: finding the most precise pick-up/drop-off locations for e-commerce, logistics, food delivery, or ride/car services companies! Read more.
Add to your personal schedule
16:35–17:15 Wednesday, 1/05/2019
Session
Data Science, Machine Learning & AI
Location: Capital Suite 15/16
Secondary topics:  Automation in data science and big data, Temporal data and time-series
Shivnath Babu (Unravel Data Systems | Duke University), Alkis Simitsis (Micro Focus)
Cost and resource provisioning are critical components of the big data stack. A magic 8-ball for the big data stack would give an enterprise a glimpse into its future needs and would enable effective and cost-efficient project and operational planning. This talk covers how to build that magic 8-ball, a decomposable time-series model, for optimal cost and resource allocation for the big data stack. Read more.
Add to your personal schedule
16:35–17:15 Wednesday, 1/05/2019
Session
Case studies, Strata Business Summit
Location: Capital Suite 12
Secondary topics:  Data preparation, data governance, and data lineage, Financial Services, Security and Privacy
Maurício Lins (everis consultancy UK), Lidia Crespo (Santander UK)
Big data is usually regarded as a menace to data privacy. However, with the right principles and mind-set, it can be a game changer that puts customers first and treats data privacy as an inalienable right. Santander UK applied this model to comply with GDPR, using graph technology, Hadoop, Spark, and Kudu to drive data obscuring, data portability, and machine learning exploration. Read more.

17:25

Add to your personal schedule
17:25–18:05 Wednesday, 1/05/2019
Session
Data Engineering and Architecture
Location: Capital Suite 7
Secondary topics:  Data Integration and Data Pipelines, Financial Services, Streaming and realtime analytics
Ted Malaska (Capital One)
In the world of data, it is all about building the best path to support time/quality to value. 80% to 90% of the work is getting the data into the hands and tools that can create value. This talk will take us on a journey through different patterns and solutions that can work at the largest of companies. Read more.
17:25–18:05 Wednesday, 1/05/2019
Session
Data Science, Machine Learning & AI
Location: Capital Suite 17
Secondary topics:  Deep Learning, Retail and e-commerce
TBC
Add to your personal schedule
17:25–18:05 Wednesday, 1/05/2019
Session
Data Engineering and Architecture
Location: S11 A
Secondary topics:  Data Platforms, Transportation and Logistics
Felix Cheung (Uber)
Did you know that your Uber rides are powered by Apache Spark? Join Felix Cheung to learn how Uber is building its data platform with Apache Spark at enormous scale and discover the unique challenges the company faced and overcame. Read more.
Add to your personal schedule
17:25–18:05 Wednesday, 1/05/2019
Session
Data Engineering and Architecture
Location: S11 B
Secondary topics:  AI and Data technologies in the cloud, Data Platforms
Mark Samson (Cloudera)
It is now possible to build a modern data platform capable of storing, processing and analysing a wide variety of data across multiple public and private Cloud platforms and on-premise data centres. This session will outline an information architecture for such a platform, informed by working with multiple large organisations who have built such platforms over the last 5 years. Read more.
Add to your personal schedule
17:25–18:05 Wednesday, 1/05/2019
Session
Data Engineering and Architecture
Location: Capital Suite 8/9
Secondary topics:  Data Platforms
Hussein Mehanna (Google Cloud)
AI will change how we live in the next 30 years. However, AI is still limited to a small group of companies. Building AI systems is expensive and difficult. But in order to scale the impact of AI across the globe, we need to reduce the cost of building AI solutions. How can we do that? Can we learn from other industries? Yes, we can. The automobile industry went through a similar cycle. Read more.
Add to your personal schedule
17:25–18:05 Wednesday, 1/05/2019
Session
Strata Business Summit, Visualization and UX
Location: Capital Suite 10/11
Secondary topics:  Visualization, Design, and UX
Mars Geldard (University of Tasmania), Paris Buttfield-Addison (Secret Lab Pty. Ltd.)
Science fiction has been showcasing complex, AI-driven (often AR or VR) interfaces (for huge amounts of data!) for decades. As television, movies, and video games became more capable of visualising a possible future, the grandeur of these imagined science-fictional interfaces has increased. What can we learn from Hollywood UX? Is there a useful takeaway? Does sci-fi show the future of AI UX? Read more.
Add to your personal schedule
17:25–18:05 Wednesday, 1/05/2019
Secondary topics:  AI and machine learning in the enterprise, Financial Services, Graph technologies and analytics
Teresa Tung (Accenture Labs), Jean-Luc Chatelain (Accenture)
How do enterprises scale beyond one-off AI projects and make AI reusable? Teresa Tung and Jean-Luc Chatelain explain how domain knowledge graphs, the same technology behind today's Internet search, can bring the same democratized experience to enterprise AI. Beyond search applications, we show other applications of knowledge graphs in oil & gas, financial services, and enterprise IT. Read more.
Add to your personal schedule
17:25–18:05 Wednesday, 1/05/2019
Session
Data Science, Machine Learning & AI
Location: Capital Suite 14
Secondary topics:  Security and Privacy
Chris Wallace (Cloudera)
Imagine building a model whose training data is collected on edge devices such as cell phones or sensors. Each device collects data unlike any other, and the data cannot leave the device because of privacy concerns or unreliable network access. This challenging situation is known as federated learning. In this talk we’ll cover the algorithmic solutions and the product opportunities. Read more.
Add to your personal schedule
17:25–18:05 Wednesday, 1/05/2019
Session
Data Science, Machine Learning & AI
Location: Capital Suite 15/16
Secondary topics:  Text and Language processing and analysis
Weifeng Zhong (American Enterprise Institute)
We developed a machine learning algorithm to “read” the People’s Daily — the official newspaper of the Communist Party of China — and predict changes in China’s policy priorities using only the information in the newspaper. The output of this algorithm, which we call the Policy Change Index (PCI) of China, turns out to be a leading indicator of the actual policy changes in China since 1951. Read more.
Add to your personal schedule
17:25–18:05 Wednesday, 1/05/2019
Session
Law and Ethics, Strata Business Summit
Location: Capital Suite 12
Secondary topics:  Ethics
Duncan Ross (TES Global), Giselle Cory (DataKind)
DataKind UK has been working in data for good since 2013, working with over 100 UK charities and helping them to do data science for the benefit of their users. Some of those projects have delivered above and beyond expectations; others haven't. In this session, Duncan and Giselle will talk about how to identify the right data-for-good projects... Read more.

18:05

Add to your personal schedule
18:05–19:05 Wednesday, 1/05/2019
Event
Location: Expo Hall
Unwind after a long day of sessions with small bites and drinks while networking with Strata attendees, exhibitors, and sponsors. Read more.

19:05

19:05–20:00 Wednesday, 1/05/2019
Location: On Your Own
Dinner (55m)

20:00

Add to your personal schedule
20:00–22:00 Wednesday, 1/05/2019
Event
Location: Madison London: One New Change, St Paul’s, London
Make plans to join us for Data After Dark during Strata. Food, drink, and entertainment will be provided at a venue in London which surely promises to be the highlight of Strata. Read more.

Thursday, 2/05/2019

8:15

Add to your personal schedule
8:15–8:45 Thursday, 2/05/2019
Event
Location: TBD
Gather before keynotes on Thursday morning for a speed networking event. Enjoy casual conversation while meeting fellow attendees. If one of your goals at Strata is to meet new people, this session will jumpstart your networking with fellow attendees. Read more.

9:00

Add to your personal schedule
9:00–10:45 Thursday, 2/05/2019
Keynote
Location: Auditorium
Ben Lorica (O'Reilly Media), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
Program Chairs, Ben Lorica, Doug Cutting, and Alistair Croll, welcome you to the second day of keynotes. Read more.

10:45

10:45–11:15 Thursday, 2/05/2019
Location: Expo Hall
Morning break (30m)

11:15

Add to your personal schedule
11:15–11:55 Thursday, 2/05/2019
Session
Data Engineering and Architecture, Streaming and IoT
Location: Capital Suite 7
Secondary topics:  Data Platforms, Streaming and realtime analytics, Transportation and Logistics
Thomas Weise (Lyft)
Fast data and stream processing are essential for making Lyft rides a good experience for passengers and drivers. Our systems need to track and react to event streams in real time to update locations, compute routes and estimates, balance prices, and more. The streaming platform at Lyft powers these use cases with development frameworks and a deployment stack that are based on Apache Flink and Beam. Read more.
Add to your personal schedule
11:15–11:55 Thursday, 2/05/2019
Session
Data Science, Machine Learning & AI
Location: Capital Suite 17
Secondary topics:  Deep Learning, Graph technologies and analytics, Security and Privacy
Scott Stevenson (Faculty)
Modern deep learning systems allow us to build speech synthesis systems with the naturalness of a human speaker. Whilst there are myriad benevolent applications, this also ushers in a new era of fake news. This talk will explore the danger of such systems, as well as how deep learning can also be used to build countermeasures to protect against political disinformation. Read more.
Add to your personal schedule
11:15–11:55 Thursday, 2/05/2019
Session
Data Science, Machine Learning & AI
Location: Expo Hall (Capital Hall N24)
Secondary topics:  Ethics
Machine-learning algorithms are good at learning new behaviors, but bad at identifying when those behaviors are harmful or don’t make sense. Bias, ethics, and fairness are big risk factors in machine learning (ML). We have a lot of experience dealing with intelligent beings: one another. In this talk, we use this common sense to build a checklist for protecting against ethical violations with ML. Read more.
Add to your personal schedule
11:15–11:55 Thursday, 2/05/2019
Session
Data Engineering and Architecture
Location: S11 A
Manish Maheshwari (Cloudera), Lars Volker (Cloudera)
Apache Impala is an MPP SQL query engine for planet-scale queries. When set up and used properly, Impala is able to handle hundreds of nodes and tens of thousands of queries hourly. In this talk, we will discuss how to avoid pitfalls in Impala configuration (memory limits, admission pools, metadata management, statistics), along with best practices and antipatterns for end users or BI applications. Read more.
Add to your personal schedule
11:15–11:55 Thursday, 2/05/2019
Session
Data Engineering and Architecture
Location: S11 B
Secondary topics:  AI and Data technologies in the cloud
Jian Zhang (Intel), Chendi Xue (Intel), Yuan Zhou (Intel)
This talk introduces the challenges of migrating big data analytics workloads to the public cloud, such as performance loss and missing features. It showcases how the new in-memory data accelerator, leveraging persistent memory and RDMA NICs, can resolve these issues and enable new opportunities for big data workloads in the cloud. Read more.
Add to your personal schedule
11:15–11:55 Thursday, 2/05/2019
Session
Data Engineering and Architecture
Location: Capital Suite 8/9
Secondary topics:  Data preparation, data governance, and data lineage, Financial Services
Sandeep U (Intuit)
Teams today rely on tribal data dictionaries, which are a mixed bag with respect to correctness: some datasets have accurate attribute details, while others are incorrect and outdated. This significantly impacts the productivity of analysts and scientists. Existing data dictionary tools are manually updated and difficult to maintain. This talk covers 3 patterns we have deployed to manage data dictionaries. Read more.
Add to your personal schedule
11:15–11:55 Thursday, 2/05/2019
Session
Data Engineering and Architecture
Location: Capital Suite 10/11
Secondary topics:  AI and Data technologies in the cloud, Data Integration and Data Pipelines, Financial Services, Security and Privacy
Eoin O'Flanagan (Newday), Darragh McConville (Kainos)
In this session, you will learn how we have built a high-performance contemporary data processing platform, from the ground up, on AWS. We will discuss our journey from a legacy, onsite, traditional data estate to an entirely cloud-based, PCI DSS-compliant platform. Read more.
Add to your personal schedule
11:15–11:55 Thursday, 2/05/2019
Secondary topics:  AI and machine learning in the enterprise, Transportation and Logistics
Brandy Freitas (Pitney Bowes)
Data science is an approachable field given the right framing. Often, though, practitioners and executives are describing opportunities using completely different languages. In this session, Harvard Biophysicist-turned-Data Scientist, Brandy Freitas, will work with participants to develop context and vocabulary around data science topics to help build a culture of data within their organization. Read more.
Add to your personal schedule
11:15–11:55 Thursday, 2/05/2019
Session
Data Science, Machine Learning & AI
Location: Capital Suite 14
Secondary topics:  AI and machine learning in the enterprise, Financial Services, Security and Privacy, Text and Language processing and analysis
Charlotte Werger (Van Lanschot Kempen)
This talk discusses a best-practice use case for detecting fraud at a financial institution. Where traditional systems fall short, machine learning models can provide a solution. Sifting through large amounts of transaction data, external hit lists, and unstructured text data, we managed to build a dynamic and robust monitoring system that successfully detects unwanted client behavior. Read more.
Add to your personal schedule
11:15–11:55 Thursday, 2/05/2019
Session
Data Science, Machine Learning & AI
Location: Capital Suite 15/16
Secondary topics:  Media, Marketing, Advertising, Retail and e-commerce
Sophie Watson (Red Hat)
Identifying relevant documents quickly and efficiently enhances both user experience and business revenue every day. Sophie Watson demonstrates how to implement Learning to Rank algorithms and provides you with the information you need to implement your own successful ranking system. Read more.
Add to your personal schedule
11:15–11:55 Thursday, 2/05/2019
Session
Case studies, Strata Business Summit
Location: Capital Suite 12
Secondary topics:  Health and Medicine
Fabio Ferraretto (Accenture), Tatiane Canero (Hospital Albert Einstein)
How Albert Einstein and Accenture evolved patient flow experience and efficiency with the use of applied AI, statistics, and combinatorial math, allowing the hospital to anticipate E2E visibility within patient flow operations, from admission of emergency and elective demands to assignment and medical releases. Read more.

12:05

Add to your personal schedule
12:05–12:45 Thursday, 2/05/2019
Session
Data Engineering and Architecture
Location: Capital Suite 7
Secondary topics:  AI and Data technologies in the cloud, Model lifecycle management
Kai Wähner (Confluent)
How can you leverage the flexibility and extreme scale of the public cloud combined with the Apache Kafka ecosystem to build scalable, mission-critical machine learning infrastructures that span multiple public clouds or bridge your on-premise data centre to the cloud? Join this talk to learn how to apply technologies such as TensorFlow with Kafka’s open source ecosystem for machine learning infrastructures. Read more.
Add to your personal schedule
12:05–12:45 Thursday, 2/05/2019
Session
Data Science, Machine Learning & AI
Location: Capital Suite 17
Swetha Machanavajhala (Microsoft), Xiaoyong Zhu (Microsoft)
In this auditory world, the human brain processes and reacts effortlessly to a variety of sounds. While many of us take this for granted, there are over 360 million people in this world who are deaf or hard of hearing. We will explain how to make the auditory world inclusive and meet the great demand in other sectors by applying deep learning to audio in Azure. Read more.
Add to your personal schedule
12:05–12:45 Thursday, 2/05/2019
Session
Data Science, Machine Learning & AI
Location: Expo Hall (Capital Hall N24)
Secondary topics:  Text and Language processing and analysis
Ines Montani (Explosion AI)
In this talk, I'll explain spaCy's new support for efficient and easy transfer learning, and show you how it can kickstart new NLP projects with our new annotation tool, Prodigy Scale. Read more.
Add to your personal schedule
12:05–12:45 Thursday, 2/05/2019
Session
Data Engineering and Architecture
Location: S11 A
Secondary topics:  AI and Data technologies in the cloud, Data Integration and Data Pipelines, Data Platforms, Streaming and realtime analytics
David Josephsen (Sparkpost)
This is the story of how Sparkpost Reliability Engineering abandoned ELK for a DIY Schema-On-Read logging infrastructure. We share architectural details and tribulations from our _Internal Event Hose_ data ingestion pipeline project, which uses Fluentd, Kinesis, Parquet and AWS Athena to make logging sane. Read more.
12:05–12:45 Thursday, 2/05/2019
Session
Data Engineering and Architecture
Location: S11 B
Secondary topics:  Data Platforms
Pradeep Bhadani (Hotels.com), Elliot West (Hotels.com)
Expedia Group is a travel platform with an extensive portfolio including Expedia.com and Hotels.com. We like to give our data teams flexibility and autonomy to work with different technologies. However, this approach generates challenges that cannot be solved by existing tools. We'll explain how we built a unified virtual data lake on top of our many heterogeneous and distributed data platforms. Read more.
12:05–12:45 Thursday, 2/05/2019
Session
Data Engineering and Architecture
Location: Capital Suite 8/9
Secondary topics:  Data Platforms, Security and Privacy, Transportation and Logistics
Václav Surovec (Deutsche Telekom IT), Gabor Kotalik (Deutsche Telekom AG)
Knowledge of customers' locations and travel patterns is important for many companies. One of them is the German telco service operator T-Mobile Czech Republic. The Commercial Roaming project, built on Cloudera Hadoop, helped the company better analyze the behavior of its customers from 10 countries in a very secure way, so that it could provide better predictions and visualizations for management. Read more.
12:05–12:45 Thursday, 2/05/2019
Session
Data Engineering and Architecture
Location: Capital Suite 10/11
Secondary topics:  AI and machine learning in the enterprise
Rebecca Simmonds (Red Hat), Michael McCune (Red Hat)
Artificial intelligence and machine learning are now popularly used terms, but how do we make use of these techniques without throwing away the valuable knowledge of experienced employees? This session will delve into this idea with examples of how distributed machine learning frameworks fit together naturally with business rules management systems. Read more.
12:05–12:45 Thursday, 2/05/2019
Secondary topics:  IoT and its applications, Security and Privacy
Alasdair Allan (Babilim Light Industries)
The arrival of a new generation of smart embedded hardware may cause the demise of large-scale data harvesting. In its place, smart devices will allow us to process data at the edge, extracting insights from the data without storing potentially privacy- and GDPR-infringing data. The current age, where privacy is no longer "a social norm", may not long survive the coming of the Internet of Things. Read more.
12:05–12:45 Thursday, 2/05/2019
Session
Data Science, Machine Learning & AI
Location: Capital Suite 14
Secondary topics:  Deep Learning, Text and Language processing and analysis
Moshe Wasserblat presents an overview of NLP Architect, an open source deep learning NLP library that provides state-of-the-art NLP models, making it easy for researchers to implement NLP algorithms and for data scientists to build NLP-based solutions that extract insight from textual data to improve business operations. Read more.
12:05–12:45 Thursday, 2/05/2019
Session
Data Science, Machine Learning & AI
Location: Capital Suite 15/16
Seonmin Kim (LINE Corp)
Kim will introduce activities that mitigate the risk of mobile payments through a range of data analysis techniques drawn from actual case studies of mobile fraud, including tree-based machine learning, graph analytics, and statistical approaches. Read more.
12:05–12:45 Thursday, 2/05/2019
Secondary topics:  AI and machine learning in the enterprise
Vidya Raman (Cloudera)
Not surprisingly, there is no single approach to embracing data-driven innovations within any industry vertical. However, some enterprises are doing a better job than others when it comes to establishing a culture, process and infrastructure that lend themselves to data-driven innovations. In this talk, we will share some key foundational ingredients that span multiple industries. Read more.

12:45

12:45–14:05 Thursday, 2/05/2019
Event
Location: Lunch
Topic Table discussions are a great way to informally network with people in similar industries or interested in the same topics. Read more.

14:05

14:05–14:45 Thursday, 2/05/2019
Session
Data Engineering and Architecture
Location: Capital Suite 7
Secondary topics:  AI and Data technologies in the cloud
Holden Karau (Google), Kris Nova (VMware)
In the Kubernetes world, where declarative resources are first-class citizens, running complicated workloads across distributed infrastructure is easy, and processing big data workloads using Spark is common practice, we can finally look at constructing a hybrid system for running Spark in a distributed, cloud-native way. Join experts Kris Nova and Holden Karau for a fun adventure. Read more.
14:05–14:45 Thursday, 2/05/2019
Session
Data Science, Machine Learning & AI
Location: Capital Suite 17
Secondary topics:  Deep Learning
Deep Learning has enabled massive breakthroughs in offbeat tracks and a better understanding of how an artist paints, how an artist composes music, and so on. As part of Nischal & Raghotham’s much-loved project, Deep Learning for Humans, they want to build a font classifier and show a wide audience how fonts can be classified and how and why two or more fonts are similar. Read more.
14:05–14:45 Thursday, 2/05/2019
Session
Data Science, Machine Learning & AI
Location: Expo Hall (Capital Hall N24)
Secondary topics:  Data Integration and Data Pipelines, Deep Learning
Alex Jaimes (Dataminr)
When emergency events occur, social signals and sensor data are generated. In this talk, I will describe how Machine Learning and Deep Learning are applied in processing large amounts of heterogeneous data from various sources in real time, with a particular focus on how such information can be used for emergencies and in critical events for first responders and for other social good use cases. Read more.
14:05–14:45 Thursday, 2/05/2019
Session
Data Engineering and Architecture
Location: S11 A
Elliot West (Hotels.com), Jaydene Green (Hotels.com)
Hotels.com describe approaches for applying software engineering best practices to SQL-based data applications in order to improve maintainability and data quality. Using open source tools we show how to build effective test suites for Apache Hive code bases. We also present Mutant Swarm, a mutation testing tool we’ve developed to identify weaknesses in tests and to measure SQL code coverage. Read more.
14:05–14:45 Thursday, 2/05/2019
Session
Data Engineering and Architecture
Location: S11 B
Secondary topics:  Security and Privacy
Marcel Ruiz Forns (Wikimedia Foundation)
Analysts and researchers studying Wikipedia are hungry for long term data to build experiments and feed data-driven decisions. But Wikipedia has a strict privacy policy that prevents storing privacy-sensitive data over 90 days. The Wikimedia Foundation's analytics team is working on a vegan data diet to satisfy both. Read more.
14:05–14:45 Thursday, 2/05/2019
Session
Data Engineering and Architecture
Location: Capital Suite 8/9
Secondary topics:  AI and Data technologies in the cloud, AI and machine learning in the enterprise, Data Platforms, Transportation and Logistics
Willem Pienaar (GO-JEK), Zhi Ling Chen (GO-JEK)
Features are key to driving impact with AI at all scales. By democratizing the creation, discovery, and access of features through a unified platform, organizations are able to dramatically accelerate innovation and time to market. Find out how GO-JEK, Indonesia's first billion-dollar startup, built a feature platform to unlock insights in AI, and the lessons they learned along the way. Read more.
14:05–14:45 Thursday, 2/05/2019
Session
Data Engineering and Architecture
Location: Capital Suite 10/11
Secondary topics:  Data Integration and Data Pipelines, Data Platforms, Transportation and Logistics, Visualization, Design, and UX
Ravi Suhag (GO-JEK)
At GO-JEK, we build products that help millions of Indonesians commute, shop, eat and pay, daily. The Data team is responsible for creating resilient and scalable data infrastructure across all of GO-JEK’s 18+ products. This involves building distributed big data infrastructure, real-time analytics and visualization pipelines for billions of data points per day. Read more.
14:05–14:45 Thursday, 2/05/2019
Secondary topics:  AI and machine learning in the enterprise
TBC
14:05–14:45 Thursday, 2/05/2019
Session
Data Science, Machine Learning & AI
Location: Capital Suite 14
Secondary topics:  Graph technologies and analytics
Mingxi Wu (TigerGraph)
A graph query language is the key to unleashing the value of connected data. In this talk, we point out 8 prerequisites of a practical graph query language, drawn from our 6 years of experience dealing with real-world graph analytics use cases. We also compare GSQL, Gremlin, Cypher and SPARQL in this regard. Read more.
14:05–14:45 Thursday, 2/05/2019
Session
Data Science, Machine Learning & AI
Location: Capital Suite 15/16
Secondary topics:  IoT and its applications, Temporal data and time-series
Christian Hidber (bSquare)
Reinforcement learning (RL) learns complex processes autonomously, like walking, beating the world champion in Go or flying a helicopter. No big data sets with the “right” answers are needed: the algorithms learn by experimenting. We show “how” and “why” RL works in an intuitive fashion and highlight how to apply it to an industrial hydraulics application with 7000 clients in 42 countries. Read more.
14:05–14:45 Thursday, 2/05/2019
Session
Case studies, Strata Business Summit
Location: Capital Suite 12
Secondary topics:  AI and machine learning in the enterprise
Rosaria Silipo (KNIME)
This is a collection of past data science projects. While the structure is often similar (data collection, data transformation, model training, deployment), each one of them has needed some special trick. The turning point in implementing the data science solution was either a change in perspective or a particular technique for dealing with special cases and special business questions. Read more.

14:55

14:55–15:35 Thursday, 2/05/2019
Session
Data Engineering and Architecture
Location: Capital Suite 7
Secondary topics:  Streaming and realtime analytics, Temporal data and time-series
Erik Nordström (Timescale)
Requirements of time-series databases include ingesting high volumes of structured data; answering complex, performant queries for both recent & historical time intervals; & performing specialized time-centric analysis & data management. I explain how one can avoid these operational problems by re-engineering Postgres to serve as a general data platform, including high-volume time-series workloads. Read more.
14:55–15:35 Thursday, 2/05/2019
Session
Data Science, Machine Learning & AI
Location: Capital Suite 17
Secondary topics:  AI and machine learning in the enterprise, Deep Learning
Yoav Einav (GigaSpaces)
Technological advancements are transforming customer experience, and businesses are beginning to benefit from Deep Learning innovations that automate call center routing to the most suitable agent. This session will discuss how Deep Learning models can be run with Intel BigDL and Spark frameworks co-located on an in-memory computing platform to enhance the customer experience without the need for GPUs. Read more.
14:55–15:35 Thursday, 2/05/2019
Session
Data Science, Machine Learning & AI
Location: Expo Hall (Capital Hall N24)
Secondary topics:  Deep Learning, Media, Marketing, Advertising, Retail and e-commerce
Oliver Gindele (Datatonic)
The success of Deep Learning has reached the realm of structured data in the past few years, where neural networks have been shown to improve the effectiveness and predictability of recommendation engines. This session will give a brief overview of such deep recommender systems and how they can be implemented in TensorFlow. Read more.
14:55–15:35 Thursday, 2/05/2019
Session
Data Engineering and Architecture
Location: S11 A
Secondary topics:  AI and Data technologies in the cloud
Greg Rahn (Cloudera)
Data warehouses have traditionally run in the data center and in recent years they have adapted to be more cloud-native. In this talk, we'll discuss a number of emerging trends and technologies that will impact how data warehouses are run both in the cloud and on-prem and share our vision on what that means for architects, administrators, and end users. Read more.
14:55–15:35 Thursday, 2/05/2019
Session
Data Engineering and Architecture
Location: S11 B
Secondary topics:  Automation in data science and big data, Data preparation, data governance, and data lineage
Sonal Goyal (Nube)
Enterprise data on customers, vendors, products, etc. is siloed and represented differently in diverse systems, hurting analytics, compliance, regulatory reporting and 360 views. Traditional rule-based MDM systems with legacy architectures struggle to unify this growing data. This talk covers a modern master data application using Spark, Cassandra, ML and Elastic. Read more.
14:55–15:35 Thursday, 2/05/2019
Session
Data Engineering and Architecture
Location: Capital Suite 8/9
Secondary topics:  AI and Data technologies in the cloud, Data Integration and Data Pipelines, IoT and its applications
Jane McConnell (Teradata), Sun Maria Lehmann (Equinor)
In Upstream Oil and Gas, a vast amount of the data requested for analytics projects is “scientific data” - physical measurements about the real world. Historically this data has been managed “library-style” in files - but to provide this data to analytics projects, we need to do something different. Sun and Jane discuss architectural best practices learned from their work with subsurface data. Read more.
14:55–15:35 Thursday, 2/05/2019
Session
Data Engineering and Architecture
Location: Capital Suite 10/11
Secondary topics:  Data Integration and Data Pipelines
Jason Bell (DeskHoppa)
The Embulk data migration tool offers a convenient way to load data into a variety of systems with basic configuration. This talk gives an overview of the Embulk tool and shows some common data migration scenarios that a data engineer could employ using the tool. Read more.
14:55–15:35 Thursday, 2/05/2019
Secondary topics:  AI and Data technologies in the cloud
Nikki Rouda (Amazon Web Services (AWS))
This talk is about some of the key trends we see in data lakes and analytics, and how they shape the services we offer at AWS. Specific topics include the rise of machine-generated data and semi-structured/unstructured data as dominant sources of new data, the move towards serverless, API-centric computing, and the growing need for local access to data from users around the world. Read more.
14:55–15:35 Thursday, 2/05/2019
Session
Data Science, Machine Learning & AI
Location: Capital Suite 14
Shioulin Sam (Cloudera Fast Forward Labs)
Machine learning requires large datasets - a prohibitive limitation in many real world applications. What if we could build models from scratch that could recognize images using only a handful of labeled examples? In this talk, we will cover algorithmic solutions that enable learning with limited data, and discuss product opportunities. Read more.
14:55–15:35 Thursday, 2/05/2019
Session
Data Science, Machine Learning & AI
Location: Capital Suite 15/16
Secondary topics:  IoT and its applications, Temporal data and time-series, Transportation and Logistics
Christopher Hooi (Land Transport Authority of Singapore)
The Fusion Analytics for Public Transport Event Response (FASTER) system provides a real-time advanced analytics solution for early warning of potential train incidents. Using novel fusion analytics of multiple data sources, FASTER harnesses the use of engineering and commuter-centric IoT data sources to activate contingency plans at the earliest possible time and reduce impact to commuters. Read more.
14:55–15:35 Thursday, 2/05/2019
Session
Culture and organization, Strata Business Summit
Location: Capital Suite 12
Secondary topics:  AI and machine learning in the enterprise
Robert Cohen (Economic Strategy Institute)
This talk describes the skills that employers are seeking from employees in digital jobs – linked to the new software hierarchy driving digital transformation. We describe this software hierarchy as one that ranges from DevOps, CI/CD, and microservices to Kubernetes and Istio. This hierarchy is used to define the jobs that are central to data-driven digital transformation. Read more.

15:35

15:35–16:35 Thursday, 2/05/2019
Location: Expo Hall
Afternoon break (1h)

16:35

16:35–17:15 Thursday, 2/05/2019
Session
Data Engineering and Architecture
Location: Capital Suite 7
Secondary topics:  AI and Data technologies in the cloud, Data Integration and Data Pipelines, Retail and e-commerce
Max Schultze (Zalando SE)
This talk covers a data lake implementation at a large-scale company: raw data collection, standardized data preparation (e.g. binary conversion, partitioning), user-driven analytics and machine learning. Read more.
16:35–17:15 Thursday, 2/05/2019
Session
Data Engineering and Architecture
Location: S11 A
Secondary topics:  AI and Data technologies in the cloud, Data Platforms
Nanda Vijaydev (BlueData), Thomas Phelan (BlueData)
Organizations need to keep ahead of their competition by using the latest AI/ML/DL technologies such as Spark, TensorFlow, and H2O. The challenge is in how to deploy these tools and keep them running in a consistent manner while maximizing the use of scarce hardware resources, such as GPUs. This session will discuss the effective deployment of such applications in a container environment. Read more.
16:35–17:15 Thursday, 2/05/2019
Session
Data Engineering and Architecture
Location: S11 B
Secondary topics:  Data Integration and Data Pipelines
Feng Lu (Google Cloud), James Malone (Google), Apurva Desai (Google Cloud), Cameron Moberg (Truman State University / Google Cloud)
Apache Oozie and Apache Airflow (incubating) are both widely used workflow orchestration systems, where the former focuses on Apache Hadoop jobs. We see a need to build an Oozie-to-Airflow workflow mapping as part of creating an effective cross-cloud/cross-system solution. This talk aims to introduce an open-source Oozie-to-Airflow migration tool developed at Google. Read more.
16:35–17:15 Thursday, 2/05/2019
Secondary topics:  Data Integration and Data Pipelines, Streaming and realtime analytics
Dean Wampler (Lightbend)
Your team is building Machine Learning capabilities. I'll discuss how you can integrate these capabilities in streaming data pipelines so you can leverage the results quickly and update them as needed. There are big challenges. How do you build long-running services that are very reliable and scalable? How do you combine a spectrum of very different tools, from data science to operations? Read more.
16:35–17:15 Thursday, 2/05/2019
Session
Data Science, Machine Learning & AI
Location: Capital Suite 14
Secondary topics:  AI and machine learning in the enterprise, Financial Services, Security and Privacy
Brennan Lodge (Goldman Sachs), Jay Kesavan (Bowery Analytics LLC)
Cyber security analysts are under siege to keep pace with the ever-changing threat landscape. The analysts are overworked, burned out and bombarded with the sheer number of alerts that they must carefully investigate. To empower our cyber security analysts, we can use a data science model for alert evaluations. Read more.
16:35–17:15 Thursday, 2/05/2019
Session
Data Science, Machine Learning & AI
Location: Capital Suite 15/16
Secondary topics:  IoT and its applications, Transportation and Logistics
GRDF helps bring natural gas to nearly 11 million customers every day. In partnership with GRDF, Dataiku worked to optimise the manual process of qualifying addresses to visit and ultimately save GRDF time and money. This solution was the culmination of a year-long adventure in the land of maintenance experts, legacy IT systems and agile development. Read more.
16:35–17:15 Thursday, 2/05/2019
Session
Law and Ethics, Strata Business Summit
Location: Capital Suite 12
Secondary topics:  Ethics, Security and Privacy
Sundeep Reddy Mallu (Gramener Inc)
Answering the simple question of what rights Indian citizens have over their data is a nightmare. The rollout of India Stack technology-based solutions has added fuel to the fire. Sundeep explains, with on-the-ground examples, how businesses and citizens are navigating the India Stack ecosystem while dealing with data privacy, security and ethics in India's booming digital economy. Read more.