Presented By
O’Reilly + Cloudera

Make Data Work

29 April–2 May 2019
London, UK

Tutorials

These expert-led presentations on Tuesday, 30 April give you a chance to dive deep into the subject matter. Please note: to attend tutorials, you must register for a Gold or Silver pass; these passes do not include access to training courses on Monday or Tuesday.

Tuesday, 30 April

9:00–12:30 Tuesday, 30 April 2019

Foundations for successful data projects

Location: Capital Suite 8

Secondary topics: Financial Services

Ted Malaska (Capital One), Jonathan Seidman (Cloudera)

Average rating:

(3.50, 12 ratings)

The enterprise data management space has changed dramatically in recent years, and this has led to new challenges for organizations in creating successful data practices. Jonathan Seidman and Ted Malaska share guidance and best practices from planning to implementation based on years of experience working with companies to deliver successful data projects.

9:00–12:30 Tuesday, 30 April 2019

Real-time SQL stream processing at scale with Apache Kafka and KSQL

Location: Capital Suite 11

Secondary topics: Streaming and realtime analytics

Robin Moffatt (Confluent)

Average rating:

(5.00, 5 ratings)

Robin Moffatt walks you through the architectural reasoning for Apache Kafka and the benefits of real-time integration. You'll then build a streaming data pipeline using nothing but your bare hands, Kafka Connect, and KSQL.

9:00–12:30 Tuesday, 30 April 2019

Continuous intelligence: Moving machine learning into production reliably

Location: Capital Suite 14

Secondary topics: Model lifecycle management

Danilo Sato (ThoughtWorks), Christoph Windheuser (ThoughtWorks)

Average rating:

(4.31, 13 ratings)

Danilo Sato and Christoph Windheuser walk you through applying continuous delivery (CD), pioneered by ThoughtWorks, to data science and machine learning. Join in to learn how to make changes to your models while safely integrating and deploying them into production, using testing and automation techniques to release reliably at any time and with a high frequency.

9:00–12:30 Tuesday, 30 April 2019

Cross-cloud model training and serving with Kubeflow

Location: Capital Suite 15

Secondary topics: AI and Data technologies in the cloud, Model lifecycle management

Holden Karau (Independent), Trevor Grant (IBM), Francesca Lazzeri (Microsoft)

Average rating:

(4.43, 7 ratings)

Holden Karau, Francesca Lazzeri, and Trevor Grant offer an overview of Kubeflow and walk you through using it to train and serve models across different cloud environments (and on-premises). You'll use a script to do the initial setup work, so you can jump (almost) straight into training a model on one cloud and then look at how to set up serving in another cluster/cloud.

9:00–12:30 Tuesday, 30 April 2019

Using AWS serverless technologies to analyze large datasets

Location: Capital Suite 4

Secondary topics: AI and Data technologies in the cloud, Data preparation, data governance, and data lineage, Health and Medicine

Krishnan Saidapet (REAN Cloud, A Hitachi Vantara company)

Average rating:

(3.43, 7 ratings)

Krishnan Saidapet offers an overview of the latest big data and machine learning serverless technologies from Amazon Web Services (AWS) and leads a deep dive into using them to process and analyze two different datasets: the publicly available Bureau of Labor Statistics dataset and the Chest X-Ray Image Data dataset.

9:00–12:30 Tuesday, 30 April 2019

Getting ready for GDPR and CCPA: Securing and governing hybrid, cloud, and on-premises big data deployments

Location: Capital Suite 10

Secondary topics: Security and Privacy

Mark Donsky (Okera), Ifigeneia Derekli (Cloudera), Lars George (Okera), Michael Ernest (Dataiku)

Average rating:

(4.00, 2 ratings)

New regulations such as CCPA and GDPR are driving new compliance, governance, and security challenges for big data. Infosec and security groups must ensure a consistently secured and governed environment across multiple workloads. Mark Donsky, Ifigeneia Derekli, Lars George, and Michael Ernest share hands-on best practices for meeting these challenges, with special attention paid to CCPA.

9:00–12:30 Tuesday, 30 April 2019

Serverless machine learning with TensorFlow: Part I

Location: Capital Suite 2/3

Secondary topics: AI and Data technologies in the cloud, Deep Learning

Melinda King (ROI Training)

Average rating:

(3.00, 8 ratings)

Melinda King offers an introduction to designing and building machine learning models on Google Cloud Platform. Through a combination of presentations, demos, and hands-on labs, you’ll learn machine learning (ML) and TensorFlow concepts and build skills in developing, evaluating, and productionizing ML models.

9:00–12:30 Tuesday, 30 April 2019

Architecting a data platform for enterprise use

Location: S11 A

Secondary topics: AI and Data technologies in the cloud, Data Platforms

Mark Madsen (Teradata), Todd Walter (Archimedata)

Average rating:

(3.71, 7 ratings)

Building a data lake involves more than installing Hadoop or putting data into AWS. The goal in most organizations is to build a multiuse data infrastructure that is not subject to past constraints. Mark Madsen and Todd Walter explore design assumptions and principles and walk you through a reference architecture to use as you work to unify your analytics infrastructure.

9:00–17:00 Tuesday, 30 April 2019

Data Case Studies

Location: Capital Suite 12

Paco Nathan (derwen.ai), Ganes Kesari (Gramener), Alicia Williams (Google), Semih Kumluk (Turkcell), Simon Moritz (Ericsson), Samuel Cristóbal (Innaxis), Volker Schnecke (Novo Nordisk), Julia Butter (Scout24), Cecilia Marchi (Jakala), Caroline Goulard (Dataveyes), Marc Rind (ADP), Juan Bengochea (Royal Caribbean Cruise Lines), Aaronpal Dhanda (EasyJet)

Hear practical insights from household brands and global companies: the challenges they tackled, the approaches they took, and the benefits and drawbacks of their solutions.

9:00–17:00 Tuesday, 30 April 2019

Findata Day

Location: Capital Suite 13

Alistair Croll (Solve For Interesting), Nicolette Bullivant (Santander UK Technology), Charlotte Werger (Van Lanschot Kempen), Daniel First (QuantumBlack), Yiannis Kanellopoulos (Code4Thought), Romi Mahajan (Quantarium), Rashed Iqbal (Investment and Development Office), Martin Leijen (Rabobank / Digital Transformation Office), Tal Doron (GigaSpaces), Chris Taggart (OpenCorporates), Jan Novotny (Deutsche Bank)

From analyzing risk and detecting fraud to predicting payments and improving customer experience, take a deep dive into the ways data technologies are transforming the financial industry.

13:30–17:00 Tuesday, 30 April 2019

Running multidisciplinary big data workloads in the cloud

Location: Capital Suite 4

Secondary topics: AI and Data technologies in the cloud

Colm Moynihan (Cloudera), Jonathan Seidman (Cloudera), Michael Kohs (Cloudera)

Average rating:

(4.00, 2 ratings)

Moving to the cloud poses a number of challenges. Join Colm Moynihan, Jonathan Seidman, and Michael Kohs to explore cloud architecture and challenges and learn how to use Cloudera Altus to build data warehousing and data engineering clusters and run workloads that share metadata between them using Cloudera SDX.

13:30–17:00 Tuesday, 30 April 2019

Architecture and algorithms for end-to-end streaming data processing

Location: S11 A

Secondary topics: AI and Data technologies in the cloud, Data Integration and Data Pipelines, Streaming and realtime analytics, Temporal data and time-series

Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio), Ivan Kelly (Streamlio)

Average rating:

(3.00, 10 ratings)

Many industry segments have been grappling with fast data (high-volume, high-velocity data). Arun Kejariwal and Karthik Ramasamy walk you through the state-of-the-art systems for each stage of an end-to-end data processing pipeline for real-time data (messaging, compute, and storage), as well as algorithms for extracting insights (e.g., heavy hitters and quantiles) from data streams.

13:30–17:00 Tuesday, 30 April 2019

Your data strategy: It should be concise, actionable, and understandable by business and IT

Location: Capital Suite 8

Secondary topics: AI and machine learning in the enterprise

Peter Aiken (Data BluePrint | DAMA International | Virginia Commonwealth University)

Average rating:

(3.43, 14 ratings)

Peter Aiken offers a more operational perspective on the use of data strategy, which is especially useful for organizations just getting started with data.

13:30–17:00 Tuesday, 30 April 2019

Hands-on machine learning with Kafka-based streaming pipelines

Location: Capital Suite 10

Secondary topics: Model lifecycle management, Streaming and realtime analytics

Boris Lublinsky (Lightbend), Dean Wampler (Anyscale)

Average rating:

(4.20, 5 ratings)

Boris Lublinsky and Dean Wampler walk you through using ML in streaming data pipelines and doing periodic model retraining and low-latency scoring in live streams. You'll explore using Kafka as a data backplane, the pros and cons of microservices versus systems like Spark and Flink, tips for TensorFlow and SparkML, performance considerations, model metadata tracking, and other techniques.

13:30–17:00 Tuesday, 30 April 2019

Natural language understanding at scale with Spark NLP

Location: Capital Suite 14

Secondary topics: Deep Learning, Text and Language processing and analysis

Alexander Thomas (John Snow Labs), Claudiu Branzan (Accenture)

Average rating:

(4.00, 4 ratings)

Alex Thomas and Claudiu Branzan lead a hands-on introduction to scalable NLP using the highly performant, highly scalable open source Spark NLP library. You’ll spend about half your time coding as you work through four sections, each with an end-to-end working code base that you can change and improve.

13:30–17:00 Tuesday, 30 April 2019

Serverless machine learning with TensorFlow: Part II

Location: Capital Suite 11

Secondary topics: AI and Data technologies in the cloud, Deep Learning

Melinda King (ROI Training)

Average rating:

(3.12, 8 ratings)

Melinda King offers an introduction to designing and building machine learning models on Google Cloud Platform. Through a combination of presentations, demos, and hands-on labs, you’ll learn machine learning (ML) and TensorFlow concepts and build skills in developing, evaluating, and productionizing ML models.

13:30–17:00 Tuesday, 30 April 2019

Learning Presto: SQL on anything

Location: Capital Suite 15

Secondary topics: AI and Data technologies in the cloud

Matt Fuller (Starburst)

Average rating:

(5.00, 2 ratings)

Used by Facebook, Netflix, Airbnb, LinkedIn, Twitter, Uber, and others, Presto has become the ubiquitous open source software for SQL on anything. Presto was built from the ground up for fast interactive SQL analytics against disparate data sources ranging in size from GBs to PBs. Join Matt Fuller to learn how to use Presto and explore use cases and best practices you can implement today.

13:30–17:00 Tuesday, 30 April 2019

Time series forecasting with Azure Machine Learning

Location: Capital Suite 2/3

Secondary topics: AI and Data technologies in the cloud, Deep Learning, Financial Services, Temporal data and time-series

Francesca Lazzeri (Microsoft), Aashish Bhateja (Microsoft)

Average rating:

(4.25, 4 ratings)

Time series modeling and forecasting are fundamentally important to various practical domains; in the past few decades, machine learning model-based forecasting has become very popular in both private and public decision-making processes. Francesca Lazzeri walks you through using Azure Machine Learning to build and deploy your time series forecasting models.

Sponsorship Opportunities

For exhibition and sponsorship opportunities, email strataconf@oreilly.com

Partner Opportunities

For information on trade opportunities with O'Reilly conferences, email partners@oreilly.com

©2019, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com