Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK
 
Capital Suite 12
9:00 Understanding data at scale leveraging Spark and Deep Learning Frameworks Vartika Singh (Cloudera), Jeffrey Shmain (Cloudera)
13:30 Natural language understanding at scale with spaCy and Spark NLP David Talby (Pacific AI), Claudiu Branzan (G2 Web Services)
Capital Suite 13
9:00 Running data analytic workloads in the cloud Mala Ramakrishnan (Cloudera), Eugene Fratkin (Cloudera), Mark Samson (Cloudera)
13:30 Architecting a next-generation data platform Ted Malaska (Blizzard Entertainment), Jonathan Seidman (Cloudera)
Capital Suite 14
9:00 Architecting a data platform for enterprise use Mark Madsen (Third Nature)
Capital Suite 2/3
Capital Suite 4
9:00 Findata Day Paul Lashmet (Arcadia Data), Konrad Sippel (Deutsche Börse), Paul Damien Lynn (Nordea), Olaf Hein (ORDIX AG), Mikheil Nadareishvili (TBC Bank)
Capital Suite 8
9:00 Measure What Matters: How your measurement strategy can reduce OpEx Radhika Dutt (Radical Product), Geordie Kaytes (Fresh Tilled Soil), Nidhi Aggarwal (Radical Product)
13:30 Managing data science in the enterprise Nick Elprin (Domino Data Lab)
Capital Suite 9
13:30 Making data visual: A practical session on using visualization for insight Danyel Fisher (Microsoft Research), Miriah Meyer (University of Utah)
Capital Suite 10
13:30 Learning PyTorch by building a recommender system Neejole Patel (Virginia Tech)
Capital Suite 11
9:00 Getting up and running with TensorFlow Yufeng Guo (Google)
Capital Suite 15
9:00 Modern real-time streaming architectures Arun Kejariwal (MZ), Karthik Ramasamy (Streamlio), Sanjeev Kulkarni (Streamlio), Sijie Guo (Streamlio)
13:30 Hands-on Kafka Streaming Microservices with Akka Streams and Kafka Streams Dean Wampler (Lightbend), Boris Lublinsky (Lightbend)
10:30 Morning break | Room: Capital Suite Foyer
15:00 Afternoon break | Room: Capital Suite Foyer
12:30 Lunch | Room: N11
17:00 Opening Reception | Room: Expo Hall (Capital Hall 24)
9:00-12:30 (3h 30m) Data science and machine learning
Understanding data at scale leveraging Spark and Deep Learning Frameworks
Vartika Singh (Cloudera), Jeffrey Shmain (Cloudera)
We go through approaches to preprocessing, training, inference, and deployment across datasets (time series, audio, video, and text), leveraging Spark, its extended ecosystem of libraries, and deep learning frameworks. Using sample data and code for each dataset type, we examine implementation nuances and then highlight the bottlenecks, and their solutions, when scaling data and models.
13:30-17:00 (3h 30m) Data science and machine learning
Natural language understanding at scale with spaCy and Spark NLP
David Talby (Pacific AI), Claudiu Branzan (G2 Web Services)
Natural language processing is a key component in many data science systems that must understand or reason about text. This is a hands-on tutorial for scalable NLP using spaCy for building annotation pipelines, Spark NLP for building distributed natural language machine-learned pipelines, and Spark ML and TensorFlow for using deep learning to build and apply word embeddings.
9:00-12:30 (3h 30m) Data engineering and architecture
Running data analytic workloads in the cloud
Mala Ramakrishnan (Cloudera), Eugene Fratkin (Cloudera), Mark Samson (Cloudera)
The cloud makes it possible to deliver solutions on single multipurpose clusters that offer hyperscale storage decoupled from elastic, on-demand compute. Mala Ramakrishnan, Eugene Fratkin, and Mark Samson detail new paradigms for running production-level pipelines effectively with minimal operational overhead. Join in to learn how to remove barriers to data discovery, metadata sharing, and access control.
13:30-17:00 (3h 30m) Data engineering and architecture
Architecting a next-generation data platform
Ted Malaska (Blizzard Entertainment), Jonathan Seidman (Cloudera)
Using Customer 360 and the IoT as examples, Jonathan Seidman and Ted Malaska explain how to architect a modern, real-time big data platform that leverages recent advances in open source software, using components such as Kafka, Flink, Kudu, Spark Streaming, and Spark SQL, along with modern storage engines, to enable new forms of data processing and analytics.
9:00-12:30 (3h 30m) Data engineering and architecture
Architecting a data platform for enterprise use
Mark Madsen (Third Nature)
Building a data lake involves more than installing Hadoop or putting data into AWS. The goal in most organizations is to build multi-use data infrastructure that is not subject to past constraints. Mark Madsen explores design assumptions and principles and walks you through a reference architecture to use as you work to unify your analytics infrastructure.
13:30-17:00 (3h 30m) Law, ethics, and governance, Platform security and cybersecurity, Security and Privacy
Securing and governing hybrid, cloud, and on-prem big data deployments: step-by-step
Mark Donsky (Cloudera)
Hybrid big data deployments present significant new security risks that need to be managed. It's incumbent upon security admins to ensure a consistently secured and governed experience for end users and administrators across multiple workloads that span on-prem, private cloud, multi-cloud, and hybrid cloud. We will share hands-on best practices for meeting these challenges.
9:00-17:00 (8h)
Data Case Studies
Hear practical insights from household brands and global companies: the challenges they tackled, approaches they took, and the benefits—and drawbacks—of their solutions.
9:00-17:00 (8h)
Findata Day
Paul Lashmet (Arcadia Data), Konrad Sippel (Deutsche Börse), Paul Damien Lynn (Nordea), Olaf Hein (ORDIX AG), Mikheil Nadareishvili (TBC Bank)
From analyzing risk and detecting fraud to predicting payments and improving customer experience, take a deep dive into the ways data technologies are transforming the financial industry.
9:00-12:30 (3h 30m) Data-driven business management, Strata Business Summit
Measure What Matters: How your measurement strategy can reduce OpEx
Radhika Dutt (Radical Product), Geordie Kaytes (Fresh Tilled Soil), Nidhi Aggarwal (Radical Product)
These days it’s easy for companies to say, “We measure everything!” The problem is that most “popular” metrics may not be appropriate or relevant for your business. Measurement isn’t free and should be done strategically. This session covers how you can align measurement with your product strategy so that you measure what matters for your business.
13:30-17:00 (3h 30m) Strata Business Summit
Managing data science in the enterprise
Nick Elprin (Domino Data Lab)
The honeymoon era of data science is ending: accountability is coming. Rather than waiting for results that may or may not arrive, successful data science leaders deliver measurable impact on an increasing share of an enterprise’s KPIs. You’ll learn how leading organizations take a holistic approach to people, process, and technology to build a sustainable competitive advantage.
9:00-12:30 (3h 30m) Law, ethics, and governance, Strata Business Summit, Security and Privacy
General Data Protection Regulation (GDPR) Tutorial (+ ePrivacy introduction)
Aurélie Pols (Mind Your Privacy)
Using a 5+5 Pillars Framework for GDPR Readiness, this tutorial walks attendees through what the GDPR means for data-fueled businesses. Anchored in the accountability principle, this interactive session helps attendees attribute responsibility to ensure compliance and, ideally, build toward ethical data practices, minimizing risk for your company while fostering trust with your clients.
13:30-17:00 (3h 30m) Visualization and user experience
Making data visual: A practical session on using visualization for insight
Danyel Fisher (Microsoft Research), Miriah Meyer (University of Utah)
Danyel Fisher and Miriah Meyer explore the human side of data analysis and visualization, covering operationalization, the process of reducing vague problems to specific tasks, and how to choose a visual representation that addresses those tasks. Along the way, they also discuss single views and explain how to link them into multiple views.
9:00-12:30 (3h 30m) Data science and machine learning, Emerging technologies and case studies
Introduction to Natural Language Processing with Python
Barbara Fusinska (Katacoda)
Natural language processing techniques make it possible to address tasks such as text classification, information extraction, and content generation. In this session, Barbara Fusinska walks the audience through building a bag-of-words representation and using it for text classification. The goal of this tutorial is to build intuition through a simple natural language processing task.
13:30-17:00 (3h 30m) Data science and machine learning
Learning PyTorch by building a recommender system
Neejole Patel (Virginia Tech)
Since its arrival in early 2017, PyTorch has won over many deep learning researchers and developers due to its dynamic computation framework. Neejole Patel walks you through using PyTorch to build a content recommendation model.
9:00-17:00 (8h) Big data and data science in the cloud
Getting up and running with TensorFlow
Yufeng Guo (Google)
Yufeng Guo teaches you how to train a machine-learning system using the popular open source ML library TensorFlow. You'll start with a conceptual overview and build all the way up to complex classifiers as you gain insight into deep learning and how it can be applied to complex problems in science and industry.
9:00-12:30 (3h 30m) Streaming systems and real-time applications
Modern real-time streaming architectures
Arun Kejariwal (MZ), Karthik Ramasamy (Streamlio), Sanjeev Kulkarni (Streamlio), Sijie Guo (Streamlio)
The need for instant data-driven insights has led to the proliferation of messaging and streaming frameworks. In this tutorial, we present an in-depth overview of state-of-the-art streaming architectures, streaming frameworks, and streaming algorithms, covering the typical challenges in modern real-time big data platforms and offering insights on how to address them.
13:30-17:00 (3h 30m) Streaming systems and real-time applications
Hands-on Kafka Streaming Microservices with Akka Streams and Kafka Streams
Dean Wampler (Lightbend), Boris Lublinsky (Lightbend)
This hands-on tutorial builds streaming apps as microservices using Kafka with Akka Streams and Kafka Streams. We'll assess the strengths and weaknesses of each tool for particular needs, so you'll be better informed when choosing between them. We'll also contrast them with Spark Streaming and Flink, including when to choose those instead.
10:30-11:00 (30m)
Break: Morning break
15:00-15:30 (30m)
Break: Afternoon break
12:30-13:30 (1h)
Break: Lunch
17:00-18:00 (1h)
Opening Reception
Join us after tutorials on Tuesday in the Expo Hall. Grab a drink and mingle with fellow Strata attendees while you check out all of the exhibitors.