Presented By O'Reilly and Cloudera
Make Data Work
December 1–3, 2015 • Singapore

Strata + Hadoop World in Singapore 2015 Tutorials

On Tuesday, December 1, choose from all-day and half-day tutorials. These expert-led presentations give you a chance to dive deep into the subject matter and offer a more participatory classroom experience. Please note: to attend, your registration package must include tutorials on Tuesday.

9:00am–12:30pm Tuesday, 12/01/2015
Location: 321-322 Level: Intermediate
Gwen Shapira (Confluent), Ted Malaska (Capital One), Mark Grover (Lyft), Jonathan Seidman (Cloudera)
Average rating: 4.16 (19 ratings)
Looking for a deeper understanding of how to architect real-time data processing solutions? This tutorial builds that understanding through a real-world example: a fraud detection system. We'll use this example to discuss the considerations involved in building such a system, how to integrate the various technologies, and why those choices make sense for this use case.
9:00am–5:00pm Tuesday, 12/01/2015
SOLD OUT
Location: 328-329 Level: Intermediate
Sameer Farooqui (Databricks), Paco Nathan (derwen.ai), Reynold Xin (Databricks)
Average rating: 4.00 (20 ratings)
The real power and value proposition of Apache Spark lies in building a unified use case that combines ETL, batch analytics, real-time stream analysis, machine learning, graph processing, and visualization. In class we will explore various Wikipedia datasets, applying the most suitable programming paradigm to each analysis. The class will comprise roughly 50% lecture and 50% hands-on labs and demos.
9:00am–12:30pm Tuesday, 12/01/2015
Location: 324 Level: Advanced
Tags: telecom
Juliet Hougland (Cloudera), Sandy Ryza (Cloudera)
Average rating: 3.40 (5 ratings)
In this half-day tutorial, attendees will get a taste of how large-scale data science techniques and technologies developed for the consumer internet can be applied in the world of telecom.
9:00am–12:30pm Tuesday, 12/01/2015
Location: 334 Level: Intermediate
Matthew Conlen (FiveThirtyEight)
Average rating: 2.44 (16 ratings)
This session teaches the use of modern data analysis and visualization tools for effective interactive data science. Attendees will learn how to use notebook environments to set up shareable, reproducible analysis pipelines, and will use tools for large-scale analysis and web-based data visualization to drive further analysis and decision making.
9:00am–12:30pm Tuesday, 12/01/2015
Location: 331 Level: Intermediate
Andreas Mueller (NYU, scikit-learn)
Average rating: 3.83 (6 ratings)
This talk is a tutorial on the Python machine learning library scikit-learn. It starts with a short introduction to what machine learning is, then dives in depth into how to use scikit-learn in practice. The tutorial is delivered as an IPython notebook and includes exercises.
1:30pm–5:00pm Tuesday, 12/01/2015
Location: 321-322 Level: Intermediate
Kathleen Ting (Cloudera), Jonathan Hsieh (Cloudera), Philip Langdale (Cloudera), Kostas Sakellis (Cloudera)
Average rating: 3.62 (8 ratings)
Hadoop is emerging as the standard for big data processing and analytics. However, as use of Hadoop clusters grows, so do the demands of managing and monitoring them. In this tutorial, attendees will get an overview of every phase of successfully managing Hadoop clusters, with an emphasis on production systems.
1:30pm–5:00pm Tuesday, 12/01/2015
Location: 334 Level: Intermediate
Danielle Dean (iRobot), Wee Hyong Tok (Microsoft)
Average rating: 4.57 (7 ratings)
In this tutorial, you will build end-to-end predictive models using the extensive library of machine learning algorithms in Microsoft Azure Machine Learning Studio, along with its R and Python extensibility. You will then deploy and consume the model, using it to make predictions on business data.
1:30pm–5:00pm Tuesday, 12/01/2015
Location: 324 Level: Intermediate
Patrick McFadin (DataStax)
Average rating: 4.40 (5 ratings)
This tutorial is all about managing large volumes of data arriving at your data center quickly and continuously. If you don't have a strategy, allow me to help. Software from Apache projects can make this problem much easier to deal with. Spend a few hours learning how each part works and how the parts work together. Your users will thank you.
1:30pm–5:00pm Tuesday, 12/01/2015
Location: 331
Edd Wilder-James (Google), John Akred (Silicon Valley Data Science)
Average rating: 3.61 (23 ratings)
Big data and data science have great potential for accelerating business, but how do you reconcile that opportunity with the sea of possible technologies? Conventional data strategy offers little guidance, focusing more on governance than on creating new value. In this tutorial, we explain how to create a modern data strategy that powers a data-driven business.