Presented By O'Reilly and Cloudera
Make Data Work
December 1–3, 2015 • Singapore

Advanced analytics with large scale distributed machine learning on Apache Spark

11:50am–12:30pm Thursday, 12/03/2015
Location: 332
Average rating: 4.50 (4 ratings)
Slides:   1-PPTX 

Expanding data analytics to enable better and faster decisions is expected to accelerate process/product utilization and consumer/market understanding, and to minimize risk sooner. New business requirements and usage models are emerging and driving the need for new big data analysis paradigms. In particular, organizations increasingly want to discover and explore data using advanced analytics algorithms (e.g., large-scale machine learning, graph analysis, statistical modeling) for deep insights. In this talk, we will present our efforts to build large-scale distributed ML on Apache Spark with many “web-scale” companies, including very complex and advanced analytics applications and algorithms (e.g., topic modeling, deep neural networks), as well as a massively scalable learning system/platform that leverages both application-specific and infrastructure-specific optimizations (e.g., exploiting data sparsity, parameter server).

This session is sponsored by Intel

Shengsheng Huang


Shengsheng (Shane) Huang is a software architect at Intel and an Apache Spark committer and PMC member, where she leads the development of large-scale analytical applications and infrastructure on Spark. Her area of focus is big data and distributed machine learning, especially deep (convolutional) neural networks. Previously, at the National University of Singapore (NUS), her research interests were large-scale vision data analysis and statistical machine learning.