Presented By
O’Reilly + Cloudera
Make Data Work
29 April–2 May 2019
London, UK
 
S11 A
9:00 Tutorial Architecting a data platform for enterprise use Mark Madsen (Teradata), Todd Walter (Archimedata)
13:30 Tutorial Architecture and algorithms for end-to-end streaming data processing Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio), Ivan Kelly (Streamlio)
Capital Suite 7
Capital Suite 12
9:00 Tutorial Data Case Studies Paco Nathan (derwen.ai), Ganes Kesari (Gramener), Alicia Williams (Google), Semih Kumluk (Turkcell), Simon Moritz (Ericsson), Samuel Cristóbal (Innaxis), Volker Schnecke (Novo Nordisk), Julia Butter (Scout24), Cecilia Marchi (Jakala), Caroline Goulard (Dataveyes), Marc Rind (ADP), Juan Bengochea (Royal Caribbean Cruise Lines), Aaronpal Dhanda (EasyJet)
Capital Suite 13
9:00 Tutorial Findata Day Alistair Croll (Solve For Interesting), Nicolette Bullivant (Santander UK Technology), Charlotte Werger (Van Lanschot Kempen), Daniel First (QuantumBlack), Yiannis Kanellopoulos (Code4Thought), Romi Mahajan (Quantarium), Rashed Iqbal (Investment and Development Office), Martin Leijen (Rabobank / Digital Transformation Office), Tal Doron (GigaSpaces), Chris Taggart (OpenCorporates), Jan Novotny (Deutsche Bank)
Capital Suite 14
9:00 Tutorial Continuous intelligence: Moving machine learning into production reliably Danilo Sato (ThoughtWorks), Christoph Windheuser (ThoughtWorks)
13:30 Tutorial Natural language understanding at scale with Spark NLP Alexander Thomas (John Snow Labs), Claudiu Branzan (Accenture)
Capital Suite 17
Capital Suite 8
9:00 Tutorial Foundations for successful data projects Ted Malaska (Capital One), Jonathan Seidman (Cloudera)
13:30 Tutorial Your data strategy: It should be concise, actionable, and understandable by business and IT Peter Aiken (Data BluePrint | DAMA International | Virginia Commonwealth University)
Capital Suite 9
9:00 Training Machine learning from scratch in TensorFlow (Day 2) Ana Hocevar (The Data Incubator)
Capital Suite 10
9:00 Tutorial Getting ready for GDPR and CCPA: Securing and governing hybrid, cloud, and on-premises big data deployments Mark Donsky (Okera), Ifigeneia Derekli (Cloudera), Lars George (Okera), Michael Ernest (Dataiku)
13:30 Tutorial Hands-on machine learning with Kafka-based streaming pipelines Boris Lublinsky (Lightbend), Dean Wampler (Anyscale)
Capital Suite 11
13:30 Tutorial Serverless machine learning with TensorFlow: Part II Melinda King (ROI Training)
Capital Suite 15
9:00 Tutorial Cross-cloud model training and serving with Kubeflow Holden Karau (Independent), Trevor Grant (IBM), Francesca Lazzeri (Microsoft)
13:30 Tutorial Learning Presto: SQL on anything Matt Fuller (Starburst)
Capital Suite 2/3
9:00 Tutorial Serverless machine learning with TensorFlow: Part I Melinda King (ROI Training)
13:30 Tutorial Time series forecasting with Azure Machine Learning Francesca Lazzeri (Microsoft), Aashish Bhateja (Microsoft)
Capital Suite 4
9:00 Tutorial Using AWS serverless technologies to analyze large datasets Krishnan Saidapet (REAN Cloud, A Hitachi Vantara company)
13:30 Tutorial Running multidisciplinary big data workloads in the cloud Colm Moynihan (Cloudera), Jonathan Seidman (Cloudera), Michael Kohs (Cloudera)
Capital Suite 1
9:00 Training Hands-on data science with Python (Day 2) Robert Schroll (The Data Incubator)
London Suite 2
9:00 Training Professional Kafka development (Day 2) Jesse Anderson (Big Data Institute)
Capital Suite 16
9:00 Training AI for managers (Day 2) Nijma Khan (Faculty ai), Alberto Favaro (Faculty)
London Suite 3
9:00 Training Building a serverless big data application on AWS (Day 2) Jorge Lopez (Amazon Web Services)
17:00 Opening Reception | Room: Expo Hall
7:30 Early morning coffee | Room: Capital Suite Foyer
10:30 Morning break | Room: Capital Suite Foyer
15:00 Afternoon break | Room: Capital Suite Foyer
12:30 Lunch | Room: Hall N11
9:00-12:30 (3h 30m) Data Engineering and Architecture AI and Data technologies in the cloud, Data Platforms
Architecting a data platform for enterprise use
Mark Madsen (Teradata), Todd Walter (Archimedata)
Building a data lake involves more than installing Hadoop or putting data into AWS. The goal in most organizations is to build a multiuse data infrastructure that is not subject to past constraints. Mark Madsen and Todd Walter explore design assumptions and principles and walk you through a reference architecture to use as you work to unify your analytics infrastructure.
13:30-17:00 (3h 30m) Data Engineering and Architecture, Streaming and IoT AI and Data technologies in the cloud, Data Integration and Data Pipelines, Streaming and realtime analytics, Temporal data and time-series
Architecture and algorithms for end-to-end streaming data processing
Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio), Ivan Kelly (Streamlio)
Many industry segments have been grappling with fast data (high-volume, high-velocity data). Arun Kejariwal and Karthik Ramasamy walk you through the state-of-the-art systems for each stage of an end-to-end data processing pipeline—messaging, compute, and storage—for real-time data and algorithms to extract insights (e.g., heavy hitters and quantiles) from data streams.
9:00-17:00 (8h)
Expand your data science and machine learning skills with Python, R, SQL, Spark, and TensorFlow (Day 2)
Ian Cook (Cloudera)
Advancing your career in data science requires learning new languages and frameworks—but learners face an overwhelming array of choices, each with different syntaxes, conventions, and terminology. Ian Cook simplifies the learning process by elucidating the abstractions common to these systems. Through hands-on exercises, you'll overcome obstacles to getting started using new tools.
9:00-17:00 (8h)
Data Case Studies
Paco Nathan (derwen.ai), Ganes Kesari (Gramener), Alicia Williams (Google), Semih Kumluk (Turkcell), Simon Moritz (Ericsson), Samuel Cristóbal (Innaxis), Volker Schnecke (Novo Nordisk), Julia Butter (Scout24), Cecilia Marchi (Jakala), Caroline Goulard (Dataveyes), Marc Rind (ADP), Juan Bengochea (Royal Caribbean Cruise Lines), Aaronpal Dhanda (EasyJet)
Hear practical insights from household brands and global companies: the challenges they tackled, approaches they took, and the benefits—and drawbacks—of their solutions.
9:00-17:00 (8h)
Findata Day
Alistair Croll (Solve For Interesting), Nicolette Bullivant (Santander UK Technology), Charlotte Werger (Van Lanschot Kempen), Daniel First (QuantumBlack), Yiannis Kanellopoulos (Code4Thought), Romi Mahajan (Quantarium), Rashed Iqbal (Investment and Development Office), Martin Leijen (Rabobank / Digital Transformation Office), Tal Doron (GigaSpaces), Chris Taggart (OpenCorporates), Jan Novotny (Deutsche Bank)
From analyzing risk and detecting fraud to predicting payments and improving customer experience, take a deep dive into the ways data technologies are transforming the financial industry.
9:00-12:30 (3h 30m) Data Science, Machine Learning & AI Model lifecycle management
Continuous intelligence: Moving machine learning into production reliably
Danilo Sato (ThoughtWorks), Christoph Windheuser (ThoughtWorks)
Danilo Sato and Christoph Windheuser walk you through applying continuous delivery (CD), pioneered by ThoughtWorks, to data science and machine learning. Join in to learn how to make changes to your models while safely integrating and deploying them into production, using testing and automation techniques to release reliably at any time and with a high frequency.
13:30-17:00 (3h 30m) Data Science, Machine Learning & AI Deep Learning, Text and Language processing and analysis
Natural language understanding at scale with Spark NLP
Alexander Thomas (John Snow Labs), Claudiu Branzan (Accenture)
Alex Thomas and Claudiu Branzan lead a hands-on introduction to scalable NLP using the highly performant, highly scalable open source Spark NLP library. You’ll spend about half your time coding as you work through four sections, each with an end-to-end working code base that you can change and improve.
9:00-17:00 (8h)
Large-scale ML with MLflow, deep learning, and Apache Spark (Day 2)
Amir Issaei (Databricks)
Join Amir Issaei to explore neural network fundamentals and learn how to build distributed Keras/TensorFlow models on top of Spark DataFrames. You'll use Keras, TensorFlow, Deep Learning Pipelines, and Horovod to build and tune models and MLflow to track experiments and manage the machine learning lifecycle. This course is taught entirely in Python.
9:00-12:30 (3h 30m) Data Engineering and Architecture Financial Services
Foundations for successful data projects
Ted Malaska (Capital One), Jonathan Seidman (Cloudera)
The enterprise data management space has changed dramatically in recent years, and this has led to new challenges for organizations in creating successful data practices. Jonathan Seidman and Ted Malaska share guidance and best practices from planning to implementation based on years of experience working with companies to deliver successful data projects.
13:30-17:00 (3h 30m) Strata Business Summit AI and machine learning in the enterprise
Your data strategy: It should be concise, actionable, and understandable by business and IT
Peter Aiken (Data BluePrint | DAMA International | Virginia Commonwealth University)
Peter Aiken offers a more operational perspective on the use of data strategy, which is especially useful for organizations just getting started with data.
9:00-17:00 (8h)
Machine learning from scratch in TensorFlow (Day 2)
Ana Hocevar (The Data Incubator)
The TensorFlow library provides for the use of computational graphs, with automatic parallelization across resources. This architecture is ideal for implementing neural networks. Ana Hocevar offers an intro to TensorFlow's capabilities in Python, taking you from building machine learning algorithms piece by piece to using the Keras API provided by TensorFlow with several hands-on applications.
9:00-12:30 (3h 30m) Data Engineering and Architecture Security and Privacy
Getting ready for GDPR and CCPA: Securing and governing hybrid, cloud, and on-premises big data deployments
Mark Donsky (Okera), Ifigeneia Derekli (Cloudera), Lars George (Okera), Michael Ernest (Dataiku)
New regulations such as CCPA and GDPR are driving new compliance, governance, and security challenges for big data. Infosec and security groups must ensure a consistently secured and governed environment across multiple workloads. Mark Donsky, Ifigeneia Derekli, Lars George, and Michael Ernest share hands-on best practices for meeting these challenges, with special attention paid to CCPA.
13:30-17:00 (3h 30m) Streaming and IoT Model lifecycle management, Streaming and realtime analytics
Hands-on machine learning with Kafka-based streaming pipelines
Boris Lublinsky (Lightbend), Dean Wampler (Anyscale)
Boris Lublinsky and Dean Wampler walk you through using ML in streaming data pipelines and doing periodic model retraining and low-latency scoring in live streams. You'll explore using Kafka as a data backplane, the pros and cons of microservices versus systems like Spark and Flink, tips for TensorFlow and SparkML, performance considerations, model metadata tracking, and other techniques.
9:00-12:30 (3h 30m) Data Engineering and Architecture Streaming and realtime analytics
Real-time SQL stream processing at scale with Apache Kafka and KSQL
Robin Moffatt (Confluent)
Robin Moffatt walks you through the architectural reasoning for Apache Kafka and the benefits of real-time integration. You'll then build a streaming data pipeline using nothing but your bare hands, Kafka Connect, and KSQL.
13:30-17:00 (3h 30m) Data Science, Machine Learning & AI AI and Data technologies in the cloud, Deep Learning
Serverless machine learning with TensorFlow: Part II
Melinda King (ROI Training)
Melinda King offers an introduction to designing and building machine learning models on Google Cloud Platform. Through a combination of presentations, demos, and hands-on labs, you’ll learn machine learning (ML) and TensorFlow concepts and develop skills in developing, evaluating, and productionizing ML models.
9:00-12:30 (3h 30m) Data Science, Machine Learning & AI AI and Data technologies in the cloud, Model lifecycle management
Cross-cloud model training and serving with Kubeflow
Holden Karau (Independent), Trevor Grant (IBM), Francesca Lazzeri (Microsoft)
Holden Karau, Francesca Lazzeri, and Trevor Grant offer an overview of Kubeflow and walk you through using it to train and serve models across different cloud environments (and on-premises). You'll use a script to do the initial setup work, so you can jump (almost) straight into training a model on one cloud and then look at how to set up serving in another cluster/cloud.
13:30-17:00 (3h 30m) Data Engineering and Architecture AI and Data technologies in the cloud
Learning Presto: SQL on anything
Matt Fuller (Starburst)
Used by Facebook, Netflix, Airbnb, LinkedIn, Twitter, Uber, and others, Presto has become the ubiquitous open source software for SQL on anything. Presto was built from the ground up for fast interactive SQL analytics against disparate data sources ranging in size from GBs to PBs. Join Matt Fuller to learn how to use Presto and explore use cases and best practices you can implement today.
9:00-12:30 (3h 30m) Data Science, Machine Learning & AI AI and Data technologies in the cloud, Deep Learning
Serverless machine learning with TensorFlow: Part I
Melinda King (ROI Training)
Melinda King offers an introduction to designing and building machine learning models on Google Cloud Platform. Through a combination of presentations, demos, and hands-on labs, you’ll learn machine learning (ML) and TensorFlow concepts, and develop skills in developing, evaluating, and productionizing ML models.
13:30-17:00 (3h 30m) Data Science, Machine Learning & AI AI and Data technologies in the cloud, Deep Learning, Financial Services, Temporal data and time-series
Time series forecasting with Azure Machine Learning
Francesca Lazzeri (Microsoft), Aashish Bhateja (Microsoft)
Time series modeling and forecasting is fundamentally important to various practical domains; in the past few decades, machine learning model-based forecasting has become very popular in both private and public decision-making processes. Francesca Lazzeri walks you through using Azure Machine Learning to build and deploy your time series forecasting models.
9:00-12:30 (3h 30m) Data Science, Machine Learning & AI AI and Data technologies in the cloud, Data preparation, data governance, and data lineage, Health and Medicine
Using AWS serverless technologies to analyze large datasets
Krishnan Saidapet (REAN Cloud, A Hitachi Vantara company)
Krishnan Saidapet offers an overview of the latest big data and machine learning serverless technologies from Amazon Web Services (AWS) and leads a deep dive into using them to process and analyze two different datasets: the publicly available Bureau of Labor Statistics dataset and the Chest X-Ray Image Data dataset.
13:30-17:00 (3h 30m) Data Engineering and Architecture AI and Data technologies in the cloud
Running multidisciplinary big data workloads in the cloud
Colm Moynihan (Cloudera), Jonathan Seidman (Cloudera), Michael Kohs (Cloudera)
Moving to the cloud poses a number of challenges. Join Colm Moynihan, Jonathan Seidman, and Michael Kohs to explore cloud architecture and challenges and learn how to use Cloudera Altus to build data warehousing and data engineering clusters and run workloads that share metadata between them using Cloudera SDX.
9:00-17:00 (8h)
Hands-on data science with Python (Day 2)
Robert Schroll (The Data Incubator)
Robert Schroll walks you through all the steps of developing a machine learning pipeline from prototyping to production. You'll explore data cleaning, feature engineering, model building and evaluation, and deployment and then extend these models into two applications from real-world datasets. All work will be done in Python.
9:00-17:00 (8h)
Professional Kafka development (Day 2)
Jesse Anderson (Big Data Institute)
Jesse Anderson offers an in-depth look at Apache Kafka. You'll learn how Kafka works and how to create real-time systems with it as well as how to create consumers and publishers. Jesse then walks you through Kafka’s ecosystem, demonstrating how to use tools like Kafka Streams, Kafka Connect, and KSQL.
9:00-17:00 (8h)
AI for managers (Day 2)
Nijma Khan (Faculty ai), Alberto Favaro (Faculty)
Nijma Khan and Alberto Favaro offer a condensed introduction to key AI and machine learning concepts and techniques, showing you what is (and isn't) possible with these exciting new tools and how they can benefit your organization.
9:00-17:00 (8h)
Building a serverless big data application on AWS (Day 2)
Jorge Lopez (Amazon Web Services)
Serverless technologies let you build and scale applications and services rapidly without the need to provision or manage servers. Join in to learn how to incorporate serverless concepts into your big data architectures. You'll explore design patterns to ingest, store, and analyze your data as you build a big data application using AWS technologies such as S3, Athena, Kinesis, and more.
17:00-18:00 (1h)
Opening Reception
Join us after tutorials on Tuesday in the Expo Hall. Grab a drink and mingle with fellow Strata attendees while you check out all of the exhibitors.
7:30-9:00 (1h 30m)
Break: Early morning coffee
10:30-11:00 (30m)
Break: Morning break
15:00-15:30 (30m)
Break: Afternoon break
12:30-13:30 (1h)
Break: Lunch