Mithun Radhakrishnan is a committer on the HCatalog project, and a Hive developer at Yahoo. He’s the author of DistCp on Hadoop 0.23+. He’s an erstwhile firmware developer and is prone to flare-ups from C++ withdrawal.
The past year has seen the advent of various "low latency" solutions for querying big data, such as Shark, Impala, and Presto. The Hive team at Yahoo has spent the past several months benchmarking multiple versions of Hive (and Tez) across permutations of file formats, compression, and query engine features, at various data sizes. In this talk, we present our tests, results, and findings.