Presented By
O’Reilly + Cloudera

Make Data Work

29 April–2 May 2019
London, UK

S11 A

9:00 Tutorial Architecting a data platform for enterprise use Mark Madsen (Teradata), Todd Walter (Archimedata)

13:30 Tutorial Architecture and algorithms for end-to-end streaming data processing Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio), Ivan Kelly (Streamlio)

Capital Suite 7

9:00 Training Expand your data science and machine learning skills with Python, R, SQL, Spark, and TensorFlow (Day 2) Ian Cook (Cloudera)

Capital Suite 12

9:00 Tutorial Data Case Studies Paco Nathan (derwen.ai), Ganes Kesari (Gramener), Alicia Williams (Google), Semih Kumluk (Turkcell), Simon Moritz (Ericsson), Samuel Cristóbal (Innaxis), Volker Schnecke (Novo Nordisk), Julia Butter (Scout24), Cecilia Marchi (Jakala), Caroline Goulard (Dataveyes), Marc Rind (ADP), Juan Bengochea (Royal Caribbean Cruise Lines), Aaronpal Dhanda (EasyJet)

Capital Suite 13

9:00 Tutorial Findata Day Alistair Croll (Solve For Interesting), Nicolette Bullivant (Santander UK Technology), Charlotte Werger (Van Lanschot Kempen), Daniel First (QuantumBlack), Yiannis Kanellopoulos (Code4Thought), Romi Mahajan (Quantarium), Rashed Iqbal (Investment and Development Office), Martin Leijen (Rabobank / Digital Transformation Office), Tal Doron (GigaSpaces), Alistair Croll (Solve For Interesting), Chris Taggart (OpenCorporates), Jan Novotny (Deutsche Bank)

Capital Suite 14

9:00 Tutorial Continuous intelligence: Moving machine learning into production reliably Danilo Sato (ThoughtWorks), Christoph Windheuser (ThoughtWorks)

13:30 Tutorial Natural language understanding at scale with Spark NLP Alexander Thomas (John Snow Labs), Claudiu Branzan (Accenture)

Capital Suite 17

9:00 Training Large-scale ML with MLflow, deep learning, and Apache Spark (Day 2) Amir Issaei (Databricks)

Capital Suite 8

9:00 Tutorial Foundations for successful data projects Ted Malaska (Capital One), Jonathan Seidman (Cloudera)

13:30 Tutorial Your data strategy: It should be concise, actionable, and understandable by business and IT Peter Aiken (Data BluePrint | DAMA International | Virginia Commonwealth University)

Capital Suite 9

9:00 Training Machine learning from scratch in TensorFlow (Day 2) Ana Hocevar (The Data Incubator)

Capital Suite 10

9:00 Tutorial Getting ready for GDPR and CCPA: Securing and governing hybrid, cloud, and on-premises big data deployments Mark Donsky (Okera), Ifigeneia Derekli (Cloudera), Lars George (Okera), Michael Ernest (Dataiku)

13:30 Tutorial Hands-on machine learning with Kafka-based streaming pipelines Boris Lublinsky (Lightbend), Dean Wampler (Anyscale)

Capital Suite 11

9:00 Tutorial Real-time SQL stream processing at scale with Apache Kafka and KSQL Robin Moffatt (Confluent)

13:30 Tutorial Serverless machine learning with TensorFlow: Part II Melinda King (ROI Training)

Capital Suite 15

9:00 Tutorial Cross-cloud model training and serving with Kubeflow Holden Karau (Independent), Trevor Grant (IBM), Francesca Lazzeri (Microsoft)

13:30 Tutorial Learning Presto: SQL on anything Matt Fuller (Starburst)

Capital Suite 2/3

9:00 Tutorial Serverless machine learning with TensorFlow: Part I Melinda King (ROI Training)

13:30 Tutorial Time series forecasting with Azure Machine Learning Francesca Lazzeri (Microsoft), Aashish Bhateja (Microsoft)

Capital Suite 4

9:00 Tutorial Using AWS serverless technologies to analyze large datasets Krishnan Saidapet (REAN Cloud, A Hitachi Vantara company)

13:30 Tutorial Running multidisciplinary big data workloads in the cloud Colm Moynihan (Cloudera), Jonathan Seidman (Cloudera), Michael Kohs (Cloudera)

Capital Suite 1

9:00 Training Hands-on data science with Python (Day 2) Robert Schroll (The Data Incubator)

London Suite 2

9:00 Training Professional Kafka development (Day 2) Jesse Anderson (Big Data Institute)

Capital Suite 16

9:00 Training AI for managers (Day 2) Nijma Khan (Faculty ai), Alberto Favaro (Faculty)

London Suite 3

9:00 Training Building a serverless big data application on AWS (Day 2) Jorge Lopez (Amazon Web Services)

17:00 Opening Reception | Room: Expo Hall

7:30 Early morning coffee | Room: Capital Suite Foyer

10:30 Morning break | Room: Capital Suite Foyer

15:00 Afternoon Break | Room: Capital Suite Foyer

12:30 Lunch | Room: Hall N11

9:00-12:30 (3h 30m) Data Engineering and Architecture AI and Data technologies in the cloud, Data Platforms

Architecting a data platform for enterprise use

Mark Madsen (Teradata), Todd Walter (Archimedata)

Building a data lake involves more than installing Hadoop or putting data into AWS. The goal in most organizations is to build a multiuse data infrastructure that is not subject to past constraints. Mark Madsen and Todd Walter explore design assumptions and principles and walk you through a reference architecture to use as you work to unify your analytics infrastructure.

13:30-17:00 (3h 30m) Data Engineering and Architecture, Streaming and IoT AI and Data technologies in the cloud, Data Integration and Data Pipelines, Streaming and realtime analytics, Temporal data and time-series

Architecture and algorithms for end-to-end streaming data processing

Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio), Ivan Kelly (Streamlio)

Many industry segments have been grappling with fast data (high-volume, high-velocity data). Arun Kejariwal and Karthik Ramasamy walk you through state-of-the-art systems for each stage of an end-to-end pipeline for real-time data (messaging, compute, and storage), along with algorithms for extracting insights (e.g., heavy hitters and quantiles) from data streams.

9:00-17:00 (8h)

Expand your data science and machine learning skills with Python, R, SQL, Spark, and TensorFlow (Day 2)

Ian Cook (Cloudera)

Advancing your career in data science requires learning new languages and frameworks—but learners face an overwhelming array of choices, each with different syntaxes, conventions, and terminology. Ian Cook simplifies the learning process by elucidating the abstractions common to these systems. Through hands-on exercises, you'll overcome obstacles to getting started using new tools.

9:00-17:00 (8h)

Data Case Studies

Paco Nathan (derwen.ai), Ganes Kesari (Gramener), Alicia Williams (Google), Semih Kumluk (Turkcell), Simon Moritz (Ericsson), Samuel Cristóbal (Innaxis), Volker Schnecke (Novo Nordisk), Julia Butter (Scout24), Cecilia Marchi (Jakala), Caroline Goulard (Dataveyes), Marc Rind (ADP), Juan Bengochea (Royal Caribbean Cruise Lines), Aaronpal Dhanda (EasyJet)

Hear practical insights from household brands and global companies: the challenges they tackled, approaches they took, and the benefits—and drawbacks—of their solutions.

9:00-17:00 (8h)

Findata Day

Alistair Croll (Solve For Interesting), Nicolette Bullivant (Santander UK Technology), Charlotte Werger (Van Lanschot Kempen), Daniel First (QuantumBlack), Yiannis Kanellopoulos (Code4Thought), Romi Mahajan (Quantarium), Rashed Iqbal (Investment and Development Office), Martin Leijen (Rabobank / Digital Transformation Office), Tal Doron (GigaSpaces), Alistair Croll (Solve For Interesting), Chris Taggart (OpenCorporates), Jan Novotny (Deutsche Bank)

From analyzing risk and detecting fraud to predicting payments and improving customer experience, take a deep dive into the ways data technologies are transforming the financial industry.

9:00-12:30 (3h 30m) Data Science, Machine Learning & AI Model lifecycle management

Continuous intelligence: Moving machine learning into production reliably

Danilo Sato (ThoughtWorks), Christoph Windheuser (ThoughtWorks)

Danilo Sato and Christoph Windheuser walk you through applying continuous delivery (CD), pioneered by ThoughtWorks, to data science and machine learning. Join in to learn how to make changes to your models while safely integrating and deploying them into production, using testing and automation techniques to release reliably at any time and with a high frequency.

13:30-17:00 (3h 30m) Data Science, Machine Learning & AI Deep Learning, Text and Language processing and analysis

Natural language understanding at scale with Spark NLP

Alexander Thomas (John Snow Labs), Claudiu Branzan (Accenture)

Alex Thomas and Claudiu Branzan lead a hands-on introduction to scalable NLP using the highly performant, highly scalable open source Spark NLP library. You’ll spend about half your time coding as you work through four sections, each with an end-to-end working code base that you can change and improve.

9:00-17:00 (8h)

Large-scale ML with MLflow, deep learning, and Apache Spark (Day 2)

Amir Issaei (Databricks)

Join Amir Issaei to explore neural network fundamentals and learn how to build distributed Keras/TensorFlow models on top of Spark DataFrames. You'll use Keras, TensorFlow, Deep Learning Pipelines, and Horovod to build and tune models and MLflow to track experiments and manage the machine learning lifecycle. This course is taught entirely in Python.

9:00-12:30 (3h 30m) Data Engineering and Architecture Financial Services

Foundations for successful data projects

Ted Malaska (Capital One), Jonathan Seidman (Cloudera)

The enterprise data management space has changed dramatically in recent years, and this has led to new challenges for organizations in creating successful data practices. Jonathan Seidman and Ted Malaska share guidance and best practices from planning to implementation based on years of experience working with companies to deliver successful data projects.

13:30-17:00 (3h 30m) Strata Business Summit AI and machine learning in the enterprise

Your data strategy: It should be concise, actionable, and understandable by business and IT

Peter Aiken (Data BluePrint | DAMA International | Virginia Commonwealth University)

Peter Aiken offers a more operational perspective on the use of data strategy, which is especially useful for organizations just getting started with data.

9:00-17:00 (8h)

Machine learning from scratch in TensorFlow (Day 2)

Ana Hocevar (The Data Incubator)

The TensorFlow library provides for the use of computational graphs, with automatic parallelization across resources. This architecture is ideal for implementing neural networks. Ana Hocevar offers an intro to TensorFlow's capabilities in Python, taking you from building machine learning algorithms piece by piece to using the Keras API provided by TensorFlow with several hands-on applications.

9:00-12:30 (3h 30m) Data Engineering and Architecture Security and Privacy

Getting ready for GDPR and CCPA: Securing and governing hybrid, cloud, and on-premises big data deployments

Mark Donsky (Okera), Ifigeneia Derekli (Cloudera), Lars George (Okera), Michael Ernest (Dataiku)

New regulations such as CCPA and GDPR are driving new compliance, governance, and security challenges for big data. Infosec and security groups must ensure a consistently secured and governed environment across multiple workloads. Mark Donsky, Ifigeneia Derekli, Lars George, and Michael Ernest share hands-on best practices for meeting these challenges, with special attention paid to CCPA.

13:30-17:00 (3h 30m) Streaming and IoT Model lifecycle management, Streaming and realtime analytics

Hands-on machine learning with Kafka-based streaming pipelines

Boris Lublinsky (Lightbend), Dean Wampler (Anyscale)

Boris Lublinsky and Dean Wampler walk you through using ML in streaming data pipelines and doing periodic model retraining and low-latency scoring in live streams. You'll explore using Kafka as a data backplane, the pros and cons of microservices versus systems like Spark and Flink, tips for TensorFlow and SparkML, performance considerations, model metadata tracking, and other techniques.

9:00-12:30 (3h 30m) Data Engineering and Architecture Streaming and realtime analytics

Real-time SQL stream processing at scale with Apache Kafka and KSQL

Robin Moffatt (Confluent)

Robin Moffatt walks you through the architectural reasoning for Apache Kafka and the benefits of real-time integration. You'll then build a streaming data pipeline using nothing but your bare hands, Kafka Connect, and KSQL.

13:30-17:00 (3h 30m) Data Science, Machine Learning & AI AI and Data technologies in the cloud, Deep Learning

Serverless machine learning with TensorFlow: Part II

Melinda King (ROI Training)

Melinda King offers an introduction to designing and building machine learning models on Google Cloud Platform. Through a combination of presentations, demos, and hands-on labs, you’ll learn machine learning (ML) and TensorFlow concepts and build skills in developing, evaluating, and productionizing ML models.

9:00-12:30 (3h 30m) Data Science, Machine Learning & AI AI and Data technologies in the cloud, Model lifecycle management

Cross-cloud model training and serving with Kubeflow

Holden Karau (Independent), Trevor Grant (IBM), Francesca Lazzeri (Microsoft)

Holden Karau, Francesca Lazzeri, and Trevor Grant offer an overview of Kubeflow and walk you through using it to train and serve models across different cloud environments (and on-premises). You'll use a script to do the initial setup work, so you can jump (almost) straight into training a model on one cloud and then look at how to set up serving in another cluster/cloud.

13:30-17:00 (3h 30m) Data Engineering and Architecture AI and Data technologies in the cloud

Learning Presto: SQL on anything

Matt Fuller (Starburst)

Used by Facebook, Netflix, Airbnb, LinkedIn, Twitter, Uber, and others, Presto has become the ubiquitous open source software for SQL on anything. Presto was built from the ground up for fast interactive SQL analytics against disparate data sources ranging in size from GBs to PBs. Join Matt Fuller to learn how to use Presto and explore use cases and best practices you can implement today.

9:00-12:30 (3h 30m) Data Science, Machine Learning & AI AI and Data technologies in the cloud, Deep Learning

Serverless machine learning with TensorFlow: Part I

Melinda King (ROI Training)

Melinda King offers an introduction to designing and building machine learning models on Google Cloud Platform. Through a combination of presentations, demos, and hands-on labs, you’ll learn machine learning (ML) and TensorFlow concepts and build skills in developing, evaluating, and productionizing ML models.

13:30-17:00 (3h 30m) Data Science, Machine Learning & AI AI and Data technologies in the cloud, Deep Learning, Financial Services, Temporal data and time-series

Time series forecasting with Azure Machine Learning

Francesca Lazzeri (Microsoft), Aashish Bhateja (Microsoft)

Time series modeling and forecasting is fundamentally important to various practical domains; in the past few decades, machine learning model-based forecasting has become very popular in both private and public decision-making processes. Francesca Lazzeri walks you through using Azure Machine Learning to build and deploy your time series forecasting models.

9:00-12:30 (3h 30m) Data Science, Machine Learning & AI AI and Data technologies in the cloud, Data preparation, data governance, and data lineage, Health and Medicine

Using AWS serverless technologies to analyze large datasets

Krishnan Saidapet (REAN Cloud, A Hitachi Vantara company)

Krishnan Saidapet offers an overview of the latest big data and machine learning serverless technologies from Amazon Web Services (AWS) and leads a deep dive into using them to process and analyze two different datasets: the publicly available Bureau of Labor Statistics dataset and the Chest X-Ray Image Data dataset.

13:30-17:00 (3h 30m) Data Engineering and Architecture AI and Data technologies in the cloud

Running multidisciplinary big data workloads in the cloud

Colm Moynihan (Cloudera), Jonathan Seidman (Cloudera), Michael Kohs (Cloudera)

Moving to the cloud poses a number of challenges. Join Colm Moynihan, Jonathan Seidman, and Michael Kohs to explore cloud architecture and challenges and learn how to use Cloudera Altus to build data warehousing and data engineering clusters and run workloads that share metadata between them using Cloudera SDX.

9:00-17:00 (8h)

Hands-on data science with Python (Day 2)

Robert Schroll (The Data Incubator)

Robert Schroll walks you through all the steps of developing a machine learning pipeline from prototyping to production. You'll explore data cleaning, feature engineering, model building and evaluation, and deployment and then extend these models into two applications from real-world datasets. All work will be done in Python.

9:00-17:00 (8h)

Professional Kafka development (Day 2)

Jesse Anderson (Big Data Institute)

Jesse Anderson offers an in-depth look at Apache Kafka. You'll learn how Kafka works and how to create real-time systems with it as well as how to create consumers and publishers. Jesse then walks you through Kafka’s ecosystem, demonstrating how to use tools like Kafka Streams, Kafka Connect, and KSQL.

9:00-17:00 (8h)

AI for managers (Day 2)

Nijma Khan (Faculty ai), Alberto Favaro (Faculty)

Nijma Khan and Alberto Favaro offer a condensed introduction to key AI and machine learning concepts and techniques, showing you what is (and isn't) possible with these exciting new tools and how they can benefit your organization.

9:00-17:00 (8h)

Building a serverless big data application on AWS (Day 2)

Jorge Lopez (Amazon Web Services)

Serverless technologies let you build and scale applications and services rapidly without the need to provision or manage servers. Join in to learn how to incorporate serverless concepts into your big data architectures. You'll explore design patterns to ingest, store, and analyze your data as you build a big data application using AWS technologies such as S3, Athena, Kinesis, and more.

17:00-18:00 (1h)

Opening Reception

Join us after tutorials on Tuesday in the Expo Hall. Grab a drink and mingle with fellow Strata attendees while you check out all of the exhibitors.

7:30-9:00 (1h 30m)

Break: Early morning coffee

10:30-11:00 (30m)

Break: Morning break

15:00-15:30 (30m)

Break: Afternoon Break

12:30-13:30 (1h)

Break: Lunch

Sponsorship Opportunities

For exhibition and sponsorship opportunities, email strataconf@oreilly.com

Partner Opportunities

For information on trade opportunities with O'Reilly conferences, email partners@oreilly.com

©2019, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com