Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK
 
Capital Suite 12
9:00 Practical machine learning with Python Charlotte Werger (ASI Data Science)
13:30 Developing a modern enterprise data strategy Scott Kurth (Silicon Valley Data Science), John Akred (Silicon Valley Data Science)
Capital Suite 13
13:30 Distributed deep learning on AWS using Apache MXNet Anima Anandkumar (UC Irvine)
Capital Suite 14
9:00 Data 101 Shannon Cutt (O'Reilly Media), Sanjay Mathur (Silicon Valley Data Science), Jim Scott (MapR Technologies), Ellen Friedman (Independent), Martin Goodson (Evolution AI), Majken Sander (TimeXtender), Darren Cook (QQ Trend Ltd.)
13:30 Interactive data visualizations using Visdown Amit Kapoor (narrativeVIZ Consulting), Bargava Subramanian (Independent)
Capital Suite 11
Capital Suite 2/3
9:00 A practitioner’s guide to securing your Hadoop cluster Mark Donsky (Cloudera), Andre Araujo (Cloudera), Mubashir Kazia (Cloudera), Syed Rafice (Cloudera)
13:30 Unraveling data with Spark using machine learning Jeffrey Shmain (Cloudera), Jayant Shekhar (Sparkflows Inc.), Vartika Singh (Cloudera)
Capital Suite 4
9:00 Just enough Scala for Spark Dean Wampler (Lightbend)
13:30 Real-time data pipelines with Apache Kafka Tim Berglund (Confluent)
Capital Suite 8
9:00 Architecting a data platform John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science)
13:30 Architecting a next-generation data platform Jonathan Seidman (Cloudera), Mark Grover (Lyft), Ted Malaska (Blizzard Entertainment)
Capital Suite 9
9:00 Deploying and managing Hive, Spark, and Impala in the public cloud David Tishgart (Cloudera), Philip Langdale (Cloudera), Eugene Fratkin (Cloudera), Jennifer Wu (Cloudera)
13:30 Spark and R with sparklyr Douglas Ashton (Mango Solutions), Aimee Gott (Mango Solutions), Mark Sellors (Mango Solutions)
Capital Suite 10
9:00 Building your first big data application on AWS Ian Meyers (Amazon Web Services (AWS)), Pratim Das (Amazon Web Services (AWS)), Ian Robinson (Amazon Web Services (AWS))
Capital Suite 15
9:00 FinData Day Doron Reuter (ING), Aida Mehonic (ASI Data Science), Colin White (Goldman Sachs), Simon Wardley (Leading Edge Forum), Tanvi Singh (Credit Suisse), Olivier de Garrigues (Trifacta)
13:30 Data Case Studies Allison Nau (Cox Automotive UK), Sriskandarajah Suhothayan (WSO2), Roland Major (Transport for London), Denis C. Bauer (Commonwealth Scientific and Industrial Research Organisation), Alberto Rey (easyJet PLC), Alistair Croll (Solve For Interesting), Wael Elrifai (Pentaho)
London Suite 2/3
9:00 Hardcore Data Science Angie Ma (ASI), Ben Lorica (O'Reilly Media), Ira Cohen (Anodot), Yingsong Zhang (ASI Data Science), Ali Hürriyetoglu (Statistics Netherlands), Nelleke Oostdijk (Radboud University), Robin Senge (inovex GmbH), Mathew Salvaris (Microsoft), Miguel Gonzalez-Fierro (Microsoft), Amitai Armon (Intel), Yahav Shadmi (Intel), Kay Brodersen (Google), Ding Ding (Intel), Alan Mosca (Sendence | Birkbeck, University of London), Eduard Vazquez (Cortexica Vision Systems), Aida Mehonic (ASI Data Science), David Barber (Department of Computer Science, UCL)
12:30 Lunch | Room: Hall N21/22/23
17:00 Opening Reception | Room: Capital Hall (N24)
9:00-12:30 (3h 30m) Data science and advanced analytics
Practical machine learning with Python
Charlotte Werger (ASI Data Science)
Charlotte Werger offers a hands-on overview of implementing machine learning with Python, providing practical experience while covering the most commonly used libraries, including NumPy, pandas, and scikit-learn.
13:30-17:00 (3h 30m) Data-driven business management, Strata Business Summit
Developing a modern enterprise data strategy
Scott Kurth (Silicon Valley Data Science), John Akred (Silicon Valley Data Science)
Big data and data science have great potential for accelerating business, but how do you reconcile the business opportunity with the sea of possible technologies? Data should serve the strategic imperatives of a business—those aspirations that will define an organization’s future vision. Scott Kurth and John Akred explain how to create a modern data strategy that powers data-driven business.
9:00-12:30 (3h 30m) Data science and advanced analytics | AI, Deep learning
Deep learning for object detection and neural network deployment
Alison Lowndes (NVIDIA)
Alison Lowndes leads a hands-on exploration of approaches to the challenging problem of detecting whether an object of interest is present within an image and, if so, recognizing its precise location within the image. Along the way, Alison walks you through testing three different approaches to deploying a trained DNN for inference.
13:30-17:00 (3h 30m) Data science and advanced analytics | Cloud, Deep learning
Distributed deep learning on AWS using Apache MXNet
Anima Anandkumar (UC Irvine)
Deep learning is the state of the art in domains such as computer vision and natural language understanding. Apache MXNet is a highly flexible and developer-friendly deep learning framework. Anima Anandkumar offers hands-on experience using Apache MXNet with preconfigured Deep Learning AMIs and CloudFormation templates to help speed your development.
9:00-12:30 (3h 30m)
Data 101
Shannon Cutt (O'Reilly Media), Sanjay Mathur (Silicon Valley Data Science), Jim Scott (MapR Technologies), Ellen Friedman (Independent), Martin Goodson (Evolution AI), Majken Sander (TimeXtender), Darren Cook (QQ Trend Ltd.)
Data 101 introduces you to core principles of data architecture, teaches you how to build and manage successful data teams, and inspires you to do more with your data through real-world applications. Setting the foundation for deeper dives on the following days of Strata Data Conference, Data 101 reinforces data fundamentals and helps you focus on how data can solve your business problems.
13:30-17:00 (3h 30m) Strata Business Summit, Visualization & user experience
Interactive data visualizations using Visdown
Amit Kapoor (narrativeVIZ Consulting), Bargava Subramanian (Independent)
Crafting interactive data visualizations for the web is hard: you're stuck either using proprietary tools or becoming proficient in JavaScript libraries like D3. But what if creating a visualization were as easy as writing text? Amit Kapoor and Bargava Subramanian outline the grammar of interactive graphics and explain how to use Visdown, a declarative, Markdown-based tool, to build them with ease.
9:00-17:00 (8h) Spark & beyond | Text Analysis and Mining
Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML
Stephane Rion (Big Data Partnership)
Stephane Rion introduces you to Apache Spark 2.0 core concepts with a focus on Spark's machine-learning library, using text mining on real-world data as the primary end-to-end use case.
9:00-12:30 (3h 30m) Hadoop platform and applications, Platform Security and Cybersecurity
A practitioner’s guide to securing your Hadoop cluster
Mark Donsky (Cloudera), Andre Araujo (Cloudera), Mubashir Kazia (Cloudera), Syed Rafice (Cloudera)
Mark Donsky, André Araujo, Syed Rafice, and Mubashir Kazia walk you through securing a Hadoop cluster. You’ll start with a cluster with no security and then add security features related to authentication, authorization, encryption of data at rest, encryption of data in transit, and complete data governance.
13:30-17:00 (3h 30m) Spark & beyond
Unraveling data with Spark using machine learning
Jeffrey Shmain (Cloudera), Jayant Shekhar (Sparkflows Inc.), Vartika Singh (Cloudera)
Vartika Singh, Jayant Shekhar, and Jeffrey Shmain walk you through various approaches that use the machine-learning algorithms available in the Spark framework (and beyond) to uncover meaningful patterns in real-world data.
9:00-12:30 (3h 30m) Spark & beyond
Just enough Scala for Spark
Dean Wampler (Lightbend)
Apache Spark is written in Scala. Hence, many if not most data engineers adopting Spark are also adopting Scala, while most data scientists continue to use Python and R. Dean Wampler offers an overview of the core features of Scala you need to use Spark effectively, using hands-on exercises with the Spark APIs.
13:30-17:00 (3h 30m) Stream processing and analytics
Real-time data pipelines with Apache Kafka
Tim Berglund (Confluent)
Tim Berglund demonstrates how to use Kafka Connect and Kafka Streams to build real-world, real-time streaming data pipelines—using Kafka Connect to ingest data from a relational database into Kafka topics as the data is being generated and then using Kafka Streams to process and enrich the data in real time before writing it out for further analysis.
9:00-12:30 (3h 30m) Data engineering and architecture, Spark & beyond
Architecting a data platform
John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science)
What are the essential components of a data platform? John Akred and Stephen O'Sullivan explain how the various parts of the Hadoop, Spark, and big data ecosystems fit together in production to create a data platform supporting batch, interactive, and real-time analytical workloads.
13:30-17:00 (3h 30m) Data engineering and architecture, Hadoop platform and applications, Stream processing and analytics
Architecting a next-generation data platform
Jonathan Seidman (Cloudera), Mark Grover (Lyft), Ted Malaska (Blizzard Entertainment)
Using Entity 360 as an example, Jonathan Seidman, Ted Malaska, and Mark Grover explain how to architect a modern, real-time big data platform leveraging recent advancements in the open source software world, using components like Kafka, Impala, Kudu, Spark Streaming, and Spark SQL with Hadoop to enable new forms of data processing and analytics.
9:00-12:30 (3h 30m) Big data and the Cloud, Data engineering and architecture
Deploying and managing Hive, Spark, and Impala in the public cloud
David Tishgart (Cloudera), Philip Langdale (Cloudera), Eugene Fratkin (Cloudera), Jennifer Wu (Cloudera)
Public cloud usage for Hadoop workloads is accelerating. Consequently, Hadoop components have adapted to leverage cloud infrastructure. Eugene Fratkin, Philip Langdale, David Tishgart, and Jennifer Wu explore best practices for Hadoop deployments in the public cloud and provide detailed guidance for deploying, configuring, and managing Hive, Spark, and Impala in the public cloud.
13:30-17:00 (3h 30m) Big data and the Cloud, Spark & beyond
Spark and R with sparklyr
Douglas Ashton (Mango Solutions), Aimee Gott (Mango Solutions), Mark Sellors (Mango Solutions)
R is a top contender for statistics and machine learning, but Spark has emerged as the leader for in-memory distributed data analysis. Douglas Ashton, Aimee Gott, and Mark Sellors introduce Spark, cover data manipulation with Spark as a backend to dplyr and machine learning via MLlib, and explore RStudio's sparklyr package, giving you the power of Spark without having to leave your R session.
9:00-12:30 (3h 30m) Big data and the Cloud
Building your first big data application on AWS
Ian Meyers (Amazon Web Services (AWS)), Pratim Das (Amazon Web Services (AWS)), Ian Robinson (Amazon Web Services (AWS))
Want to ramp up your knowledge of Amazon's big data web services and launch your first big data application on the cloud? Ian Meyers, Pratim Das, and Ian Robinson walk you through building a big data application in real time using a combination of open source technologies, including Apache Hadoop, Spark, and Zeppelin, as well as AWS managed services such as Amazon EMR, Amazon Kinesis, and more.
13:30-17:00 (3h 30m) Big data and the Cloud, Data engineering and architecture
Architecting and building enterprise-class Spark and Hadoop in cloud environments
John Mikula (Google Cloud)
John Mikula explores how to meet enterprise requirements with the Spark and Hadoop ecosystem by combining managed Spark and Hadoop services in public clouds with cloud products for storage, analysis, and message queues.
9:00-12:30 (3h 30m)
FinData Day
Doron Reuter (ING), Aida Mehonic (ASI Data Science), Colin White (Goldman Sachs), Simon Wardley (Leading Edge Forum), Tanvi Singh (Credit Suisse), Olivier de Garrigues (Trifacta)
Finance is information. From analyzing risk and detecting fraud to predicting payments and improving customer experience, data technologies are transforming the financial industry. And we're diving deep into this change with a new day of data-meets-finance talks, tailored for Strata Data Conference events in the world's financial hubs.
13:30-17:00 (3h 30m)
Data Case Studies
Allison Nau (Cox Automotive UK), Sriskandarajah Suhothayan (WSO2), Roland Major (Transport for London), Denis C. Bauer (Commonwealth Scientific and Industrial Research Organisation), Alberto Rey (easyJet PLC), Alistair Croll (Solve For Interesting), Wael Elrifai (Pentaho)
In a series of six half-hour talks aimed at a business audience, you’ll hear data-themed case studies from household brands and global companies, explaining the challenges they wanted to tackle, the approaches they took, and the benefits (and drawbacks) of their solutions. If you want practical insights about applied data, look no further.
9:00-17:00 (8h)
Hardcore Data Science
Angie Ma (ASI), Ben Lorica (O'Reilly Media), Ira Cohen (Anodot), Yingsong Zhang (ASI Data Science), Ali Hürriyetoglu (Statistics Netherlands), Nelleke Oostdijk (Radboud University), Robin Senge (inovex GmbH), Mathew Salvaris (Microsoft), Miguel Gonzalez-Fierro (Microsoft), Amitai Armon (Intel), Yahav Shadmi (Intel), Kay Brodersen (Google), Ding Ding (Intel), Alan Mosca (Sendence | Birkbeck, University of London), Eduard Vazquez (Cortexica Vision Systems), Aida Mehonic (ASI Data Science), David Barber (Department of Computer Science, UCL)
A full day of hardcore data science, exploring emerging topics and new areas of study made possible by vast troves of raw data and cutting-edge architectures for analyzing and exploring information. Along the way, leading data science practitioners teach new techniques and technologies to add to your data science toolbox.
12:30-13:30 (1h)
Break: Lunch
17:00-18:00 (1h) Event
Opening Reception
Grab a drink and mingle with fellow Strata Data Conference attendees while you check out all of the exhibitors in the Expo Hall.
18:00-20:00 (2h) Event
Strata London Community Lightning Talks
Join us for a fun, high-energy evening with 5- to 15-minute lightning talks from the London data community.