Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

Tutorials

On Tuesday, 23 May, these expert-led presentations give you a chance to dive deep into the subject matter. Please note: to attend, your registration package must include tutorials on Tuesday. Tutorial access does not include training courses.

Tuesday, 23 May

9:00–12:30 Tuesday, 23 May 2017
Location: Capital Suite 8
Level: Intermediate
John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science)
Average rating: 3.64 (14 ratings)
What are the essential components of a data platform? John Akred and Stephen O'Sullivan explain how the various parts of the Hadoop, Spark, and big data ecosystems fit together in production to create a data platform supporting batch, interactive, and real-time analytical workloads.
9:00–12:30 Tuesday, 23 May 2017
Location: Capital Suite 2/3
Level: Intermediate
Mark Donsky (Cloudera), André Araujo (Cloudera), Mubashir Kazia (Cloudera), Syed Rafice (Cloudera)
Average rating: 3.50 (4 ratings)
Mark Donsky, André Araujo, Syed Rafice, and Mubashir Kazia walk you through securing a Hadoop cluster. You’ll start with a cluster with no security and then add security features related to authentication, authorization, encryption of data at rest, encryption of data in transit, and complete data governance.
9:00–12:30 Tuesday, 23 May 2017
Location: Capital Suite 4
Level: Intermediate
Dean Wampler (Lightbend)
Average rating: 4.50 (2 ratings)
Apache Spark is written in Scala. Hence, many, if not most, data engineers adopting Spark are also adopting Scala, while most data scientists continue to use Python and R. Dean Wampler offers an overview of the core features of Scala you need to use Spark effectively, using hands-on exercises with the Spark APIs.
9:00–12:30 Tuesday, 23 May 2017
SOLD OUT
Location: Capital Suite 13
Secondary topics: AI, Deep learning
Level: Intermediate
Alison Lowndes (NVIDIA)
Average rating: 2.50 (4 ratings)
Alison Lowndes leads a hands-on exploration of approaches to the challenging problem of detecting whether an object of interest is present within an image and, if so, recognizing its precise location within the image. Along the way, Alison walks you through testing three different approaches to deploying a trained DNN for inference.
9:00–12:30 Tuesday, 23 May 2017
Location: Capital Suite 9
Level: Intermediate
David Tishgart (Cloudera), Philip Langdale (Cloudera), Eugene Fratkin (Cloudera), Jennifer Wu (Cloudera)
Public cloud usage for Hadoop workloads is accelerating. Consequently, Hadoop components have adapted to leverage cloud infrastructure. Eugene Fratkin, Philip Langdale, David Tishgart, and Jennifer Wu explore best practices for Hadoop deployments in the public cloud and provide detailed guidance for deploying, configuring, and managing Hive, Spark, and Impala in the public cloud.
9:00–12:30 Tuesday, 23 May 2017
Location: Capital Suite 12
Level: Beginner
Charlotte Werger (ASI Data Science)
Average rating: 3.80 (5 ratings)
Charlotte Werger offers a hands-on overview of implementing machine learning with Python, providing practical experience while covering the most commonly used libraries, including NumPy, pandas, and scikit-learn.
9:00–17:00 Tuesday, 23 May 2017
Location: Capital Suite 11
Secondary topics: Text Analysis and Mining
Stephane Rion (Big Data Partnership)
Average rating: 4.00 (2 ratings)
Stephane Rion introduces you to Apache Spark 2.0 core concepts with a focus on Spark's machine-learning library, using text mining on real-world data as the primary end-to-end use case.
9:00–12:30 Tuesday, 23 May 2017
Location: Capital Suite 10
Ian Meyers (Amazon Web Services (AWS)), Pratim Das (Amazon Web Services (AWS)), Ian Robinson (Amazon Web Services (AWS))
Average rating: 5.00 (2 ratings)
Want to ramp up your knowledge of Amazon's big data web services and launch your first big data application on the cloud? Ian Meyers, Pratim Das, and Ian Robinson walk you through building a big data application in real time using a combination of open source technologies, including Apache Hadoop, Spark, and Zeppelin, as well as AWS managed services such as Amazon EMR, Amazon Kinesis, and more.
9:00–12:30 Tuesday, 23 May 2017
Location: Capital Suite 14
Shannon Cutt (O'Reilly Media), Sanjay Mathur (Silicon Valley Data Science), Jim Scott (MapR Technologies), Ellen Friedman (Independent), Martin Goodson (Evolution AI), Majken Sander (TimeXtender), Darren Cook (QQ Trend Ltd.)
Data 101 introduces you to core principles of data architecture, teaches you how to build and manage successful data teams, and inspires you to do more with your data through real-world applications. Setting the foundation for deeper dives on the following days of Strata Data Conference, Data 101 reinforces data fundamentals and helps you focus on how data can solve your business problems.
9:00–17:00 Tuesday, 23 May 2017
Location: London Suite 2/3
Angie Ma (ASI), Ben Lorica (O'Reilly Media), Ira Cohen (Anodot), Yingsong Zhang (ASI Data Science), Ali Hürriyetoglu (Statistics Netherlands), Nelleke Oostdijk (Radboud University), Robin Senge (inovex GmbH), Mathew Salvaris (Microsoft), Miguel Gonzalez-Fierro (Microsoft), Amitai Armon (Intel), Yahav Shadmi (Intel), Kay Brodersen (Google), Ding Ding (Intel), Alan Mosca (Sendence | Birkbeck, University of London), Eduard Vazquez (Cortexica Vision Systems), Aida Mehonic (ASI Data Science), David Barber (Department of Computer Science, UCL)
A full day of hardcore data science, exploring emerging topics and new areas of study made possible by vast troves of raw data and cutting-edge architectures for analyzing and exploring information. Along the way, leading data science practitioners teach new techniques and technologies to add to your data science toolbox.
13:30–17:00 Tuesday, 23 May 2017
Location: Capital Suite 12
Level: Intermediate
Scott Kurth (Silicon Valley Data Science), John Akred (Silicon Valley Data Science)
Average rating: 4.20 (5 ratings)
Big data and data science have great potential for accelerating business, but how do you reconcile the business opportunity with the sea of possible technologies? Data should serve the strategic imperatives of a business: those aspirations that will define an organization’s future vision. Scott Kurth and John Akred explain how to create a modern data strategy that powers data-driven business.
13:30–17:00 Tuesday, 23 May 2017
Location: Capital Suite 2/3
Level: Intermediate
Jeffrey Shmain (Cloudera), Jayant Shekhar (Sparkflows Inc.), Vartika Singh (Cloudera)
Average rating: 3.50 (4 ratings)
Vartika Singh, Jayant Shekhar, and Jeffrey Shmain walk you through various approaches using the machine-learning algorithms available in the Spark framework (and more) to understand and decipher meaningful patterns in real-world data.
13:30–17:00 Tuesday, 23 May 2017
Location: Capital Suite 4
Level: Intermediate
Tim Berglund (Confluent)
Average rating: 3.50 (2 ratings)
Tim Berglund demonstrates how to use Kafka Connect and Kafka Streams to build real-world, real-time streaming data pipelines, using Kafka Connect to ingest data from a relational database into Kafka topics as the data is being generated and then using Kafka Streams to process and enrich the data in real time before writing it out for further analysis.
13:30–17:00 Tuesday, 23 May 2017
Location: Capital Suite 9
Level: Intermediate
Douglas Ashton (Mango Solutions), Aimee Gott (Mango Solutions), Mark Sellors (Mango Solutions)
Average rating: 5.00 (1 rating)
R is a top contender for statistics and machine learning, but Spark has emerged as the leader for in-memory distributed data analysis. Douglas Ashton, Aimee Gott, and Mark Sellors introduce Spark, cover data manipulation with Spark as a backend to dplyr and machine learning via MLlib, and explore RStudio's sparklyr package, giving you the power of Spark without having to leave your R session.
13:30–17:00 Tuesday, 23 May 2017
Location: Capital Suite 8
Level: Advanced
Jonathan Seidman (Cloudera), Mark Grover (Lyft), Ted Malaska (Blizzard Entertainment)
Average rating: 5.00 (6 ratings)
Using Entity 360 as an example, Jonathan Seidman, Ted Malaska, and Mark Grover explain how to architect a modern, real-time big data platform leveraging recent advancements in the open source software world, using components like Kafka, Impala, Kudu, Spark Streaming, and Spark SQL with Hadoop to enable new forms of data processing and analytics.
13:30–17:00 Tuesday, 23 May 2017
Location: Capital Suite 10
Level: Intermediate
John Mikula (Google Cloud)
Average rating: 1.33 (3 ratings)
John Mikula explores using managed Spark and Hadoop solutions in public clouds alongside cloud products for storage, analysis, and message queues to meet enterprise requirements via the Spark and Hadoop ecosystem.
13:30–17:00 Tuesday, 23 May 2017
Location: Capital Suite 14
Level: Beginner
Amit Kapoor (narrativeVIZ Consulting), Bargava Subramanian (Independent)
Average rating: 3.50 (2 ratings)
Crafting interactive data visualizations for the web is hard: you're stuck using proprietary tools or must become proficient in JavaScript libraries like D3. But what if creating a visualization was as easy as writing text? Amit Kapoor and Bargava Subramanian outline the grammar of interactive graphics and explain how to use Visdown, a declarative Markdown-based tool, to build them with ease.
13:30–17:00 Tuesday, 23 May 2017
Location: Capital Suite 13
Secondary topics: Cloud, Deep learning
Level: Advanced
Anima Anandkumar (UC Irvine)
Average rating: 3.67 (3 ratings)
Deep learning is the state of the art in domains such as computer vision and natural language understanding. Apache MXNet is a highly flexible and developer-friendly deep learning framework. Anima Anandkumar offers hands-on experience using Apache MXNet with preconfigured Deep Learning AMIs and CloudFormation templates to help speed your development.
13:30–17:00 Tuesday, 23 May 2017
Location: Capital Suite 15
Allison Nau (Cox Automotive UK), Sriskandarajah Suhothayan (WSO2), Roland Major (Transport for London), Denis C. Bauer (Commonwealth Scientific and Industrial Research Organisation), Alberto Rey (easyJet PLC), Alistair Croll (Solve For Interesting), Wael Elrifai (Pentaho)
In a series of six half-hour talks aimed at a business audience, you’ll hear data-themed case studies from household brands and global companies, explaining the challenges they wanted to tackle, the approaches they took, and the benefits (and drawbacks) of their solutions. If you want practical insights about applied data, look no further.