Presented By O'Reilly and Cloudera
Make Data Work
December 1–3, 2015 • Singapore

Data Science and Advanced Analytics conference sessions

Tuesday, December 1

9:00am–12:30pm Tuesday, 12/01/2015
Location: 324 Level: Advanced
Tags: telecom
Juliet Hougland (Cloudera), Sandy Ryza (Cloudera)
Average rating: ***..
(3.40, 5 ratings)
In this half-day tutorial, attendees will get a taste of how large-scale data science techniques and technologies developed for the consumer internet can be applied in the world of Telecom.
9:00am–12:30pm Tuesday, 12/01/2015
Location: 331 Level: Intermediate
Andreas Mueller (NYU, scikit-learn)
Average rating: ***..
(3.83, 6 ratings)
This talk is a tutorial for the machine learning library scikit-learn in Python. It starts with a short introduction to what machine learning is, and then dives in depth into how to use scikit-learn in practice. The tutorial will be in the format of an IPython notebook and includes exercises.
9:00am–12:30pm Tuesday, 12/01/2015
Location: 334 Level: Intermediate
Matthew Conlen (FiveThirtyEight)
Average rating: **...
(2.44, 16 ratings)
This session teaches the use of modern data analysis and visualization tools for effective interactive data science. Attendees will learn how to use notebook environments to set up sharable and reproducible analysis pipelines, and will leverage tools for large-scale analysis and web-based data visualization to drive further analysis and decision making.
1:30pm–5:00pm Tuesday, 12/01/2015
Location: 334 Level: Intermediate
Danielle Dean (Microsoft), Wee Hyong Tok (Microsoft)
Average rating: ****.
(4.57, 7 ratings)
In this tutorial, you will create end-to-end predictive models based on an extensive library of machine learning algorithms included in Microsoft Azure Machine Learning Studio with its R and Python language extensibility. You will then deploy and consume the model and use it for making predictions over business data.

Wednesday, December 2

11:00am–11:40am Wednesday, 12/02/2015
Location: 321-322 Level: Intermediate
Tags: featured
Kai Xin Thia (Lazada)
Average rating: ****.
(4.45, 11 ratings)
Southeast Asia provides a unique challenge to large recommender systems: how will you design one system that recommends products to millions of users, many of whom are spread across several countries, each with its own language and cultural preferences? Well, you don't. Instead, we will explore a hybrid system that integrates inputs from a variety of recommenders and is deployed on a distributed system.
11:50am–12:30pm Wednesday, 12/02/2015
Location: 321-322 Level: Intermediate
Ju Fan (National University of Singapore), Wei Wang (National University of Singapore)
Average rating: ***..
(3.20, 5 ratings)
We will introduce Apache SINGA, a flexible and scalable deep learning platform for big data analytics. SINGA is flexible enough to support various deep learning models and general enough to provide a scalable training architecture. We will also show two applications that demonstrate how SINGA helps with healthcare data analytics: predicting risk of readmission and modeling chronic disease progression.
1:30pm–2:10pm Wednesday, 12/02/2015
Location: 321-322 Level: Intermediate
Marcel Kornacker (Cloudera), Skye Wanderman-Milne (Cloudera)
Average rating: ***..
(3.90, 10 ratings)
In this talk, we will explain how data scientists use nested data structures to increase analytic productivity. We will use two well-known relational schemas - TPC-H and Twitter - to demonstrate how to simplify data science workloads with nested schemas. Also, we will outline best practices for converting flat relational schemas into nested ones, and give examples of data science-style analysis.
4:00pm–4:40pm Wednesday, 12/02/2015
Location: 321-322 Level: Intermediate
Stephen Hardy (National ICT Australia)
Average rating: ****.
(4.17, 6 ratings)
Privacy in the world of big data is often considered a legal or regulatory function. However, there are technology solutions for analytics that can be used today to protect users' privacy and to enable applications over data that is too sensitive to share. We will illustrate the state of the art in privacy-preserving machine learning, including new techniques we have developed.
4:50pm–5:30pm Wednesday, 12/02/2015
Location: 321-322 Level: Intermediate
Jennifer Marsman (Microsoft)
Average rating: ****.
(4.50, 6 ratings)
Using the EPOC headset from Emotiv, I can capture the big data stream of EEG from our brains. I will share my results on a “lie detector” experiment comparing brain waves when telling the truth and lying. I have built classifiers based on the EEG data using Azure Machine Learning to predict whether a subject is telling the truth. The effectiveness of multiple classifiers can be easily compared.

Thursday, December 3

11:00am–11:40am Thursday, 12/03/2015
Location: 321-322 Level: Intermediate
Tags: commerce
Deepak Agrawal (24[7] Inc.)
Average rating: ****.
(4.00, 6 ratings)
This talk is about an application of big data predictive analytics to improve the online customer experience. The application is built on big data infrastructure with Hadoop and Cassandra, and uses machine learning algorithms in R and Python that predict customer intent and take actions in real time to deliver an enhanced experience. Key challenges and lessons learned are also discussed.
11:50am–12:30pm Thursday, 12/03/2015
Location: 321-322 Level: Intermediate
Yuichi Kuroda (Mitsubishi UFJ Information Technology (MUIT))
Average rating: **...
(2.80, 10 ratings)
In this session, attendees will learn the concepts underlying graph data analytics based on MUFG's experiences. Moreover, it will cover how to analyze huge graph data with Apache Spark GraphX. Finally, it will explore what type of data tends to cause problems and how to solve them.
1:30pm–2:10pm Thursday, 12/03/2015
Location: 321-322 Level: Intermediate
Wes McKinney (Two Sigma Investments)
Average rating: ***..
(3.60, 5 ratings)
Many data applications are written in Python or R, but developing and deploying these applications at scale or in production is a pain point for many users. We will discuss our new efforts to bridge the gap between familiar in-memory data tools and distributed data systems. In particular, we are working to enable users to streamline interactions with Hadoop and scalable query engines like Impala.
2:20pm–3:00pm Thursday, 12/03/2015
Location: 321-322 Level: Non-technical
Tags: geo
Whye Loon Tung (Nielsen)
Average rating: ***..
(3.29, 7 ratings)
Geospatial data is revolutionising the marketing research industry. In this talk, Nielsen researchers will describe how such information is being used by the company to improve internal processes and to give new insights into client behaviour. The goal is to give clients an analytic edge, as will be illustrated through key methodology and insights of recent projects.
4:00pm–4:40pm Thursday, 12/03/2015
Location: 321-322 Level: Intermediate
Uri Laserson (Cloudera)
Average rating: ***..
(3.33, 6 ratings)
The advent of next-generation DNA sequencing technologies is revolutionizing life sciences research by routinely generating extremely large data sets. Big data tools developed to handle large-scale internet data (like Hadoop) will help scientists effectively manage this new scale of data, and also make it possible to address a host of questions that were previously out of reach.
4:00pm–4:40pm Thursday, 12/03/2015
Location: 331 Level: Intermediate
Melanie Warrick (Google)
Average rating: ****.
(4.00, 10 ratings)
This talk will briefly explain what neural nets are and why they’re important, as well as give context about GPUs. Then we will walk through code and launch a neural net on a GPU. I will cover key pitfalls you may hit and techniques to diagnose and troubleshoot. You will walk away understanding how to start using GPUs and where to go for additional help.
4:50pm–5:30pm Thursday, 12/03/2015
Location: 321-322 Level: Advanced
Josh Patterson (Skymind)
Average rating: ****.
(4.00, 4 ratings)
In this session we will take a practical look at what deep learning is and introduce DL4J. We'll look at how it supports deep learning in the enterprise on the JVM. We’ll discuss the architecture of DL4J’s scale-out parallelization on Hadoop and Spark in support of modern machine learning workflows.