The Big Data Platform team at Netflix continuously pushes the boundaries of Hadoop 2 and related technologies in the cloud, both to maximize performance outside the traditional datacenter environment and to expand integration with cloud services. To get the most out of both storage and processing, we evaluated the performance of columnar file formats against cloud-backed storage and subsequently upgraded our storage to Parquet. We will share our experience integrating Presto with cloud storage and the performance of interactive queries across our petabyte-scale data warehouse in the cloud.
As cloud hosting solutions continue to improve, we evaluated new cloud instances against our big data workloads and will share our findings and the basis for our selections going forward. We will demonstrate how Inviso, the latest open source addition to the Netflix big data cloud architecture, is used for job performance tuning and visualization to help optimize performance in our multi-tenant cloud environment. Finally, we will feature the second major release of Genie, which ties our architecture together and adds more generic cluster configuration management and job execution for Hadoop 2 applications and beyond.
Daniel Weeks is the tech lead for the Big Data Platform team at Netflix. Prior to joining Netflix, he focused on research in big data solutions and distributed systems.