Distributed merge trees are a class of big data computations built to aggregate user information across multiple data sources in the media domain. Vijay Srinivas Agneeswaran explores prototypes built on Apache HAWQ, Druid, and Kinetica, one of the open source GPU databases. The results show that Kinetica running on a single G2.8x node outperformed clusters of HAWQ and Druid nodes.
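The abstract does not describe how the merge trees are structured, so the following is only a minimal sketch of the general idea, assuming each data source produces a partial per-user aggregate (here, a simple event count) and the partials are combined pairwise, level by level, in a binary tree. The function names `merge` and `tree_merge` and the sample sources are hypothetical illustrations, not part of HAWQ, Druid, or Kinetica.

```python
from collections import Counter
from typing import Dict, List


def merge(left: Dict[str, int], right: Dict[str, int]) -> Dict[str, int]:
    """Combine two partial per-user aggregates by summing counts per user ID."""
    merged = Counter(left)
    merged.update(right)  # Counter.update adds counts for matching keys
    return dict(merged)


def tree_merge(partials: List[Dict[str, int]]) -> Dict[str, int]:
    """Merge partial aggregates pairwise, one tree level at a time.

    Hypothetical sketch: in a distributed system, each pair at a level could be
    merged on a different worker; here the levels simply run sequentially.
    """
    if not partials:
        return {}
    level = list(partials)
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                next_level.append(merge(level[i], level[i + 1]))
            else:
                next_level.append(level[i])  # odd partial is promoted unchanged
        level = next_level
    return level[0]


# Hypothetical per-source partial aggregates (user ID -> event count).
sources = [
    {"u1": 3, "u2": 1},  # e.g. web logs
    {"u1": 2, "u3": 5},  # e.g. set-top box events
    {"u2": 4},           # e.g. mobile app events
]
print(tree_merge(sources))  # {'u1': 5, 'u2': 5, 'u3': 5}
```

Because summing counts is associative and commutative, the tree shape does not change the final result; it only determines how much of the merging can proceed in parallel.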
Data science for enterprise use cases causes the number of intermediate datasets to explode. One of the upcoming challenges, therefore, is finding a way to navigate these ever-growing data sources. Andy Petrella proposes a data-science-on-data-science approach that combines behavioral data with the static and runtime metadata of processes.