Presented By O'Reilly and Cloudera
Make Data Work
December 1–3, 2015 • Singapore

Speaker Slides & Video

Presentation slides are made available after each session has concluded and the speaker has provided the files, so some files may be posted later. Please note that some speakers choose not to share their presentations.

Josh Patterson (Patterson Consulting)
Slides:   1-PPTX 
In this session we will take a practical look at what deep learning is and introduce DL4J. We'll look at how it supports deep learning in the enterprise on the JVM, and discuss the architecture of DL4J's scale-out parallelization on Hadoop and Spark in support of modern machine learning workflows.
Slides:   1-PPTX 
In this talk, we will present our work with many "web-scale" companies on building large-scale distributed machine learning on Apache Spark. This includes complex, advanced analytics applications and algorithms (e.g., topic modelling and deep neural networks), as well as massively scalable learning systems and platforms that leverage both application-specific and infrastructure-specific optimizations.
Kathleen Ting (Cloudera), Jonathan Hsieh (Cloudera, Inc), Philip Langdale (Cloudera), Kostas Sakellis (Cloudera)
Slides:   external link,   2-PDF 
Hadoop is emerging as the standard for big data processing and analytics. However, as usage of Hadoop clusters grows, so do the demands of managing and monitoring these systems. In this tutorial, attendees will get an overview of all phases of successfully managing Hadoop clusters, with an emphasis on production systems.
Ivan Teh (Fusionex)
If you could simulate the results of your business decisions, wouldn't that change the way you manage your business? The availability of big data solutions today introduces new management principles and opportunities, as well as challenges.
Sujit Mathew (PayPal), Yew Yap Goh (PayPal)
Slides:   external link
Our team’s main focus at PayPal is to boost customer engagement. This talk is about how we use predictive modeling to recommend products to consumers. We will talk about the technologies we use and how we deploy our models to production.
Evan Chan (Tuplejump)
Slides:   external link
This talk will show architectures and techniques for combining Apache Cassandra and Spark to yield a 10-1000x improvement in OLAP analytical performance, and introduce a new open source database that takes advantage of these techniques.
Shirshanka Das (LinkedIn)
Slides:   1-PDF 
LinkedIn describes how it built a self-serve, petabyte-scale reporting platform centered on Hadoop that powers all business decision-making at LinkedIn. We describe how we overcame the challenges of scaling to over a thousand analysts and over a thousand metrics while providing daily, hourly, and real-time reports. This has reduced turnaround times for dashboards from weeks to a few hours.
Doug Cutting (Cloudera)
The data century is upon us, and Apache Hadoop has emerged as the platform for managing your big data opportunity. The path to success is not without its perils, however: without a thoughtful approach, progress can be hindered by the impact of change, trust, and security.
Dave Chan (UBM Asia), Sonal Goyal (Nube)
Slides:   1-PDF 
UBM Asia is the largest trade show organizer in Asia. To deal with duplicate customer records and ensure clean marketing data, UBM Asia has built an end-to-end solution using Reifier from Nube Technologies, which is built on top of Spark. This talk will discuss UBM's use case and our use of the Reifier fuzzy matching engine, Spark, and machine learning. We will also cover Reifier's architecture and its use of Spark.
Tara Hirebet (R/GA)
When data is hidden and crunched, and used purely for organization and optimization, we may be losing out on a crucial value it can offer: empowerment, engagement, and impactful behavioral change.
Amit Bansal (Accenture Digital)
Learn how the intersection of people, data and intelligent machines will have far-reaching impact on the productivity, efficiency and operations of industries around the world as organizations transform to become data-driven, insight-powered enterprises.
Melanie Warrick (Google)
Deep learning is taking hold as a popular machine learning modeling technique because of its real-world applications, especially with regard to image, signal, and language datasets (e.g., medical diagnosis, self-driving cars, real-time language translation). This talk provides an overview of what deep learning is, with an emphasis on recent applications.
Jun Liu (Intel), Zhaojuan Bian (Intel)
Slides:   1-PDF 
In our experience, designing an Impala cluster for production poses many challenges, such as table schema design, data placement, file format selection, hardware selection, and software stack parameter tuning. We will walk through a real-world case study in the banking and financial services sector to illustrate how we use our simulator-based approach to design an Impala cluster.
Edd Wilder-James (Google), John Akred (Silicon Valley Data Science)
Slides:   1-PDF 
Big data and data science have great potential for accelerating business, but how do you reconcile the opportunity with the sea of possible technologies? Conventional data strategy has little to guide us, focusing more on governance than on creating new value. In this tutorial, we explain how to create a modern data strategy that powers data-driven business.
Hong Eng Koh (Oracle), Vladimir Videnovic (Oracle)
Slides:   1-PDF 
Public safety and national security are increasingly being challenged by technology; the need to use data to detect and investigate criminal activities has increased dramatically. But with the sheer volume of data and noise, law enforcement organisations are struggling to keep up. This session will examine trends and use cases on how big data can be utilised to make the world a safer place.
The high volume, velocity, variety and veracity of big data have been pushing for more comprehensive solutions and services to enable decision-making and insight discovery across business market segments. Ziya Ma, General Manager of Big Data Software Technologies in Intel's Software and Services Group, will discuss Intel’s software enabling role for making this possible and easier.
Jim Scott (NVIDIA)
Slides:   1-PPTX 
Application developers have long created complex schemas to store data with minor relationships in an RDBMS. This talk will show how to convert an existing music database with a complicated schema to HBase for transactional workloads, and how to use Drill against HBase for real-time queries. HBase column families will also be discussed.
Bin Fan (Alluxio), Xiang Wen (Baidu)
Slides:   1-PPTX 
Baidu runs Tachyon in production on more than 100 nodes managing 2 PB of storage! In this talk we will focus on how Tachyon has helped Baidu improve big data analytics (ad hoc queries), with a 30x performance improvement.
Guy Harrison (Dell Software)
Slides:   1-ZIP 
When people think of big data processing, they think of Apache Hadoop, but that doesn't mean traditional databases don't play a role. In most cases users will still draw on data stored in relational databases. Apache Sqoop can be used to unlock that data and transfer it to Hadoop, enabling users with information stored in existing SQL tables to use new analytic tools.
Sean Zhong (Previously Intel)
Slides:   1-PPTX 
GearPump is an Akka-based framework that processes real-time data across a DAG of actors. Its data delivery is highly scalable, with at-least-once delivery guarantees.
Paul Scott-Murphy (WANdisco)
Slides:   1-ZIP 
Hadoop lacks a mechanism to extend the distributed file system beyond the confines of a single cluster. Done right, active-active consensus can guarantee consistency of replicated file system changes regardless of Hadoop versions, distributions and communication latency. Find out how to perform selective data replication for cluster migration, disaster recovery, multi-site ingest, backup and more.
Nikhil Joshi (EMC, Advanced Software Division), Priya Lakshminarayanan (EMC Corporation)
Slides:   1-PDF 
This session focuses on strategies and technologies you can use to build a global Hadoop cloud with geo-distributed access and protection for analytics in use cases such as IoT, handling billions of small files or multi-terabyte files in the same system.
Mark Donsky (Okera), Naren Koneru (Cloudera)
Slides:   1-PDF 
Find out how the world's most sophisticated Hadoop deployments are addressing data governance challenges head-on, while preserving Hadoop's flexibility, through an integrated data management and governance approach.
Slides:   1-PPT 
The fast evolution of services and mobile terminals, combined with aggressive competition between mobile operators, is driving a continuous upgrade of the radio access network (RAN). This upgrade process is expensive and time-consuming, and its cost scales with the number of base stations. This talk stresses the importance of the customer and proposes a new methodology for an efficient RAN upgrade.
Farrah Bostic (The Difference Engine)
This talk will highlight the top 5 mistakes we make in collecting and analyzing qualitative data, how to do it better, and how it can inspire your next big thing.
Albert Bifet (Télécom ParisTech), Silviu Maniu (Huawei)
Slides:   1-PDF 
Real-time analytics are becoming increasingly important to telecommunication operators due to the large amount of data that flows through their networks. Drawing on our experience at Huawei, we present StreamDM, a new open source data mining and machine learning library built on top of Spark Streaming. We will present the advanced methods it implements and demonstrate its ease of use and extensibility.
Ken Medlock (ANZ Banking Group)
Slides:   1-PDF 
ANZ has adopted an innovative approach to continuously driving the business and identifying business value opportunities using new and disruptive big data technologies.
Eric Frenkiel (MemSQL)
Slides:   external link
Eric Frenkiel, CEO and cofounder of MemSQL, will demonstrate a prototype of a futuristic smart city where all household energy devices are tracked in real time. He will show the challenges, design choices, and architecture required to let urban planners and energy companies see what is possible for efficient energy consumption through a real-time data pipeline combining Kafka, Spark, and an in-memory database.
Rishi Malhotra (Saavn)
In this session, we’ll look at how music streaming delivers real-time data that enables us to proxy a billion behaviors and apply the signals to other industries. Rishi was also a participant in the O’Reilly study “Music Science”, published in 2015 by Alistair Croll.
Utkarsh B (Flipkart), Vinod Venkatraman (Flipkart Internet Private Limited)
Slides:   1-FILE 
Have you faced the challenge of storing and optimally serving multibillion-row EAV-modeled data out of a traditional data store? Monolithic data stores fall short, even with fast storage like SSDs, for a large online marketplace: in our case, 3 billion catalog entries and 100 million catalog updates a day. This talk is about the paradigms and patterns we adopted to address this problem.
Deepak Ramanathan (SAS Asia Pacific)
Join this keynote presentation to get tips from the future and hear about key patterns emerging from a wide cross section of corporate and institutional Hadoop journeys. Perhaps they’ll inspire yours.
Regunath Balasubramanian (Flipkart Internet)
Slides:   1-PDF 
Aesop is an open source, reliable change data propagation system. It has been used to build tiered data stores using best-in-class SQL and NoSQL databases. Aesop provides simple pub/sub-like interfaces with implementations for popular technologies such as MySQL, HBase, Redis, Elasticsearch, and Kafka. Aesop scales to multi-node clusters that process millions of data records.
Rod Smith (IBM Emerging Internet Technologies)
Big data and analytics continue to be a disruptive business force. Are we entering another phase, real-time digital business transformation, in which businesses realize that the time they have to respond to market and customer opportunities and threats is shrinking quickly?
Reynold Xin (Databricks)
In this talk, Reynold will look back and review Spark’s growth in adoption, use cases, and development. He will then look forward and discuss both technical initiatives and the evolution of the Spark community for 2016.
Kevin Lee (GrabTaxi)
Why do taxi drivers not want to pick me up when I most need a taxi? Join GrabTaxi's Kevin Lee to learn how GrabTaxi uses machine learning to answer this age-old question and build models that predict taxi availability in order to improve matching on the platform.
Stephen Hardy (National ICT Australia)
Slides:   1-PDF 
Privacy in the world of big data is often considered a legal or regulatory function. However, there are technology solutions for analytics that can be used today to protect users' privacy and to enable applications over data that is too sensitive to share. We will illustrate the state of the art in privacy-preserving machine learning, including new techniques we have developed.
Pauline Brown (Dataiku)
Slides:   1-PDF 
Getting from raw data to deploying data-driven solutions requires technology, data, and people, all of which exist. So why aren’t we seeing more truly data-driven companies? What's missing, and why? Find out how a lack of collaboration is keeping companies from imagining, and actually accomplishing, what is possible with big data.
Thomas Beaujard (Accenture Digital), Tom Ridsdill-Smith (Woodside)
Slides:   1-PDF 
In 2015, Woodside has been working with Accenture to deliver predictive analytics to Woodside’s LNG operations. By combining Accenture’s expertise in data analytics with Woodside’s leading operational experience in oil and gas, the two have discovered valuable, actionable insights throughout 2015.
Mike Olson (Cloudera)
Hadoop has come a long way from monolithic storage and batch processing. Today the ecosystem is diverse and flexible and is emerging as the foundation of next-generation analytic applications. Join Mike Olson, Cloudera's Chief Strategy Officer, as he discusses new innovations across the ecosystem and presents a vision of Hadoop as an architectural must-have for analytics transformation.
Sanqi Li (Huawei)
With recent advances in big data and machine learning technologies, there has never been a better time to develop telecom data products. However, there are various challenges associated with researching and developing telecom data products at scale.
Jennifer Marsman (Microsoft)
Slides:   1-ZIP 
Using the EPOC headset from Emotiv, I can capture the big data stream of EEG signals from our brains. I will share my results from a “lie detector” experiment comparing brain waves when telling the truth and when lying. I have built classifiers on the EEG data using Azure Machine Learning to predict whether a subject is telling the truth, and the effectiveness of multiple classifiers can be easily compared.
Amit Kapoor (narrativeVIZ)
Slides:   1-PDF 
Understand techniques for effectively visualising multi-dimensional data to aid exploratory data analysis. We will look at standard 2D/3D, geometric transformation, glyph-based, pixel-based, and stacking-based approaches to visualising this data, and also explore the interactive approaches needed to make them work.
Jana Eggers (Nara Logics)
Within the next decade, 16 percent of current US jobs will be done by artificial intelligences. It’s time to start thinking about how we onboard these employees. While we’ll look at what it takes to get started with machine learning projects, our focus will be on the top 5 things you need to consider when your next employee is an AI.
Edd Wilder-James (Google)
Slides:   1-PDF 
Big data and data science have great potential for accelerating business, but how do you reconcile the opportunity with the sea of possible technologies? Conventional data strategy has little to guide us, focusing more on governance than on creating new value. In this talk, we explain how to create a modern data strategy that powers data-driven business.