Presented By O'Reilly and Cloudera
Make Data Work
December 1–3, 2015 • Singapore

Advanced analytics with large scale distributed machine learning on Apache Spark

11:50am–12:30pm Thursday, 12/03/2015
Location: 332
Average rating: 4.50 (4 ratings)

Expanded data analytics, which enables better and faster decisions, is expected to accelerate process and product utilization, improve consumer and market understanding, and reduce risk sooner. New business requirements and usage models are emerging and driving the need for new big data analysis paradigms. In particular, organizations increasingly want to discover and explore data using advanced analytics algorithms (e.g., large-scale machine learning, graph analysis, and statistical modeling) to gain deep insights. In this talk, we will present our efforts to build large-scale distributed machine learning on Apache Spark with many “web-scale” companies. These efforts include very complex, advanced analytics applications and algorithms (e.g., topic modeling and deep neural networks), as well as a massively scalable learning system and platform that leverages both application-specific and infrastructure-specific optimizations (e.g., exploiting data sparsity and using a parameter server).

This session is sponsored by Intel

Shengsheng Huang


Shengsheng (Shane) Huang is a software architect at Intel and an Apache Spark committer and PMC member. She leads the development of large-scale analytical applications and infrastructure on Spark at Intel. Her area of focus is big data and distributed machine learning, especially deep (convolutional) neural networks. Previously, at the National University of Singapore (NUS), her research interests were large-scale vision data analysis and statistical machine learning.