Presented By O'Reilly and Cloudera
Make Data Work
December 1–3, 2015 • Singapore

Hadoop & Beyond conference sessions

Wednesday, December 2

11:00am–11:40am Wednesday, 12/02/2015
Location: 328-329 Level: Intermediate
Bin Fan (Alluxio), Xiang Wen (Baidu)
Average rating: ***..
(3.22, 9 ratings)
Baidu runs Tachyon in production on more than 100 nodes managing 2 PB of storage. In this talk, we focus on how Tachyon helps improve big data analytics (ad hoc queries) at Baidu, with a 30x performance improvement.
11:50am–12:30pm Wednesday, 12/02/2015
Location: 328-329 Level: Non-technical
Feng-Yuan Liu (Infocomm Development Authority of Singapore)
Average rating: ***..
(3.91, 11 ratings)
At IDA’s Government Analytics department, our team of data scientists works with bus operators to offer demand-driven express bus routes by combining crowdsourcing and big data. We use Apache Spark to analyze ticketing, taxi, and crowdsourced data to find bus routes that are both time-saving and financially viable. We show how these insights are turned into a new transport option for commuters.
1:30pm–2:10pm Wednesday, 12/02/2015
Location: 328-329 Level: Intermediate
Tags: commerce
Regunath Balasubramanian (Flipkart Internet)
Average rating: ***..
(3.67, 3 ratings)
Aesop is an open source system for reliable change data propagation. It has been used to build tiered data stores using best-in-class SQL and NoSQL databases. Aesop provides simple pubsub-like interfaces with implementations for popular technologies like MySQL, HBase, Redis, Elasticsearch, and Kafka. Aesop scales to multi-node clusters that process millions of data records.
2:20pm–3:00pm Wednesday, 12/02/2015
Location: 328-329 Level: Intermediate
Gwen Shapira (Confluent)
Average rating: ***..
(3.75, 4 ratings)
Kafka provides the low latency, high throughput, high availability, and scale that financial services firms require. But can it also provide complete reliability? In this session, we will go over everything that happens to a message, from producer to consumer, and pinpoint all the places where data can be lost if you are not careful.
4:00pm–4:40pm Wednesday, 12/02/2015
Location: 328-329 Level: Intermediate
Ted Malaska (Blizzard), Mark Grover (Cloudera)
Average rating: ***..
(3.71, 7 ratings)
In this session, we will discuss common architectural patterns for building streaming applications.
4:50pm–5:30pm Wednesday, 12/02/2015
Location: 328-329 Level: Intermediate
Tyler Akidau (Google)
Average rating: ***..
(3.91, 11 ratings)
Join me for a whirlwind tour of the conceptual building blocks of massive-scale data processing systems over the last decade, comparing and contrasting systems at Google with popular open source systems in use today.

Thursday, December 3

11:00am–11:40am Thursday, 12/03/2015
Location: 328-329 Level: Intermediate
Felipe Hoffa (Google), Kalev Leetaru (GDELT Project (http://gdeltproject.org/))
Average rating: ***..
(3.75, 8 ratings)
The GDELT Project is a real-time open data global graph over human society, inventorying the world’s events, emotions, and narratives in 65 languages, used by organizations from the UN to Wall Street. Google BigQuery enables real-time querying and whole-of-data analysis of GDELT, such as exploring the cycles of world history through mass cross-correlation.
11:50am–12:30pm Thursday, 12/03/2015
Location: 328-329 Level: Intermediate
Tags: commerce
Utkarsh B (Flipkart Internet Private Limited), Vinod Venkatraman (Flipkart Internet Private Limited)
Average rating: ****.
(4.00, 2 ratings)
Have you faced the challenge of storing and optimally serving multibillion-row EAV-modeled data out of a traditional data store? Monolithic data stores fall short, even with fast storage like SSDs, at the scale of a large online marketplace: here, 3 billion catalog entries and 100 million catalog updates a day. This talk is about the paradigms and patterns we adopted to address this problem.
1:30pm–2:10pm Thursday, 12/03/2015
Location: 328-329 Level: Intermediate
Evan Chan (Tuplejump)
Average rating: ***..
(3.67, 3 ratings)
This talk will show architectures and techniques for combining Apache Cassandra and Spark to yield a 10-1000x improvement in OLAP analytical performance, and introduce a new open source database that takes advantage of these techniques.
4:00pm–4:40pm Thursday, 12/03/2015
Location: 328-329 Level: Intermediate
Mingfei Shi (Intel), Bin Fan (Alluxio)
Average rating: ****.
(4.00, 3 ratings)
Current memory capacity is far too small to hold today's data sets. Non-volatile memory (NVM) has emerged to address this need, but integrating NVM into a modern big data system is a challenge. In this talk, we present our work on a tiered store in Tachyon, which provides a software solution for next-generation data center platforms with NVM.
4:50pm–5:30pm Thursday, 12/03/2015
Location: 328-329 Level: Advanced
Sandy Ryza (Cloudera)
Average rating: ****.
(4.00, 1 rating)
This talk will cover Spark design patterns in time series analysis, data visualization, and Monte Carlo simulation, and will show you what it is like to approach financial modeling with Spark.
4:50pm–5:30pm Thursday, 12/03/2015
Location: 334-335 Level: Intermediate
Dave Chan (UBM Asia), Sonal Goyal (Nube)
Average rating: ****.
(4.00, 7 ratings)
UBM Asia is the largest trade show organizer in Asia. To deal with duplicate customer records and ensure clean marketing data, UBM Asia has built an end-to-end solution using Reifier from Nube Technologies, which is built atop Spark. This talk will discuss UBM's use case and our use of the Reifier fuzzy matching engine, Spark, and machine learning. We will also cover Reifier's architecture and its usage of Spark.