Presented By O'Reilly and Cloudera
December 5–6, 2016: Training
December 6–8, 2016: Tutorials & Conference
Singapore

Schedule: Data science and advanced analytics sessions

Inside the world of data practitioners—from the hard science of the latest algorithms and advances in machine learning, to the thorny issues of cultural change and team-building.

1:30pm–5:00pm Tuesday, December 6, 2016
Location: 321/322 Level: Intermediate
Tags: pydata
Juliet Hougland (Cloudera), Sean Owen (Cloudera)
Average rating: ***..
(3.00, 2 ratings)
Sean Owen and Juliet Hougland offer a practical overview of the basics of using Python data tools with a Hadoop cluster, covering HDFS connectivity and dealing with raw data files, running SQL queries with a SQL-on-Hadoop system like Apache Hive or Apache Impala (incubating), and using Apache Spark to write some more-complex analytical jobs.
2:35pm–3:15pm Wednesday, December 7, 2016
Location: 321/322 Level: Beginner
Nir Lotan (Intel)
Average rating: ****.
(4.00, 3 ratings)
Nir Lotan describes a new, free software tool based on existing deep learning frameworks that enables the fast and easy creation of deep learning models and incorporates extensive optimizations that provide high performance on standard CPUs.
5:05pm–5:45pm Wednesday, December 7, 2016
Location: Summit 1 Level: Beginner
Eugene Yan (Lazada)
Average rating: ****.
(4.00, 2 ratings)
As the number of products on Lazada grows exponentially, helping customers find relevant, quality products is key to customer experience. Eugene Yan shares how Lazada ranks products on its website, covering how Lazada scales data pipelines to collect user-behavioral data, cleans and prepares data, creates simple features, builds models to meet key objectives, and measures outcomes.
5:05pm–5:45pm Wednesday, December 7, 2016
Location: Summit 2 Level: Intermediate
Todd Lipcon (Cloudera), Marcel Kornacker (Cloudera)
Average rating: ****.
(4.17, 6 ratings)
Todd Lipcon and Marcel Kornacker provide an introduction to using Impala + Kudu to power your real-time data-centric applications for use cases like time series analysis (fraud detection, stream market data), machine data analytics, and online reporting.
11:15am–11:55am Thursday, December 8, 2016
Location: Summit 2 Level: Intermediate
Jason (Jinquan) Dai (Intel), Yiheng Wang (Intel)
Average rating: **...
(2.00, 2 ratings)
Jason Dai and Yiheng Wang share their experience building web-scale machine learning using Apache Spark, focusing specifically on "war stories" (e.g., in-game purchase, fraud detection, and deep learning). They also outline best practices to scale these learning algorithms and discuss trade-offs in designing learning systems for the Spark framework.
11:15am–11:55am Thursday, December 8, 2016
Location: 328/329 Level: Intermediate
Bargava Subramanian (Binaize Labs), Amit Kapoor (narrativeVIZ Consulting)
Creating better models is a critical component of building a good data science product. It is relatively easy to build a first-cut machine-learning model, but what does it take to build a reasonably good or state-of-the-art model? One answer is ensemble models, which help exploit the power of computing in searching the solution space. Bargava Subramanian discusses various strategies to build ensemble models.
12:05pm–12:45pm Thursday, December 8, 2016
Location: Summit 1 Level: Intermediate
Chi-Yi Kuan (LinkedIn), Weidong Zhang (LinkedIn), Tiger Zhang (LinkedIn)
Average rating: ****.
(4.50, 8 ratings)
Chi-Yi Kuan, Weidong Zhang, and Yongzheng Zhang explain how LinkedIn has built a "voice of member" platform to analyze hundreds of millions of text documents. Chi-Yi, Weidong, and Yongzheng illustrate the critical components of this platform and showcase how LinkedIn leverages it to derive insights such as customer value propositions from an enormous amount of unstructured data.
1:45pm–2:25pm Thursday, December 8, 2016
Location: 308/309 Level: Intermediate
Rajesh Sampathkumar (The Data Team)
Average rating: ****.
(4.00, 1 rating)
One challenge when dealing with manufacturing sensor data analysis is to formulate an efficient model of the underlying physical system. Rajesh Sampathkumar shares his experience working with sensor data at scale to model a real-world manufacturing subsystem with simple techniques, such as moving average analysis, and advanced ones, like VAR, applied to the problem of predictive maintenance.
2:35pm–3:15pm Thursday, December 8, 2016
Location: 321/322 Level: Intermediate
Mateusz Dymczyk (H2O.ai)
Average rating: ***..
(3.67, 3 ratings)
Deep learning has made a huge impact on predictive analytics and is here to stay, so you'd better get up to speed with the neural net craze. Mateusz Dymczyk explains why all the top companies are using deep learning, what it's all about, and how you can start experimenting and implementing deep learning solutions in your business in only a few easy steps.
4:15pm–4:55pm Thursday, December 8, 2016
Location: Summit 1 Level: Beginner
Ofer Ron (LivePerson)
Average rating: ***..
(3.00, 1 rating)
Ofer Ron examines the development of LivePerson's traffic targeting solution from a generic to a domain-specific implementation to demonstrate that a thorough understanding of the problem domain is essential to a good machine-learning-based product. Ofer then reviews the underlying architecture that makes this possible.
5:05pm–5:45pm Thursday, December 8, 2016
Location: Summit 1 Level: Intermediate
Tags: sports
Jared Lander (Lander Analytics)
Average rating: ***..
(3.67, 3 ratings)
Jared Lander worked with the Minnesota Vikings to bring moneyball to football for the 2015 NFL draft. Join Jared as he dives further into football, using statistical modeling and R to analyze opponent play-calling, examine when the New York Giants will run or pass the ball, and discern quarterback Eli Manning's favorite receivers.
5:05pm–5:45pm Thursday, December 8, 2016
Location: 321/322 Level: Intermediate
Bargava Subramanian (Binaize Labs), Amit Kapoor (narrativeVIZ Consulting)
Average rating: ***..
(3.50, 2 ratings)
Ever wondered how Google Translate works so well, how the autocaptioning works on YouTube, or how to mine the sentiments of tweets on Twitter? What’s the underlying theme? They all use deep learning. Bargava Subramanian and Amit Kapoor explore artificial neural networks and deep learning for natural language processing to get you started.