Qubole’s Big Data Service began three years ago with a hardened Hadoop-1 stack and later started offering YARN-based clusters for next-generation technologies like Spark and Tez, in addition to MapReduce. YARN is a big shift from the traditional Hadoop model, and it supports multitenant platforms. But with multitenancy, and with bigger organizations moving to the cloud, security becomes a major concern. With YARN, security features such as SSL encryption, Kerberos-based authentication, and HDFS encryption were added.
Achieving the same level of reliability and performance as Qubole’s first-generation Hadoop offering while migrating scores of customers to these new security features was a big challenge. Qubole runs YARN-based services in the cloud with features like autoscaling, where nodes may be added and removed at runtime, which makes SSL-based communication between them challenging. In addition, services like Hive Server, Spark Notebooks, and Qubole-specific services need to communicate with YARN. Nitin Khandelwal and Abhishek Modi share the challenges they faced in enabling these features for ephemeral clusters running in the cloud with multitenancy support, as well as performance numbers for the different encryption algorithms available.
Nitin Khandelwal is a staff engineer at Qubole. He has worked on a range of projects, including encrypted communication for ephemeral cluster nodes running in the cloud, Hive as a multitenant service, and autoscaling. He has contributed significantly to optimizing the Tez engine for ETL workloads by adding features like workload-aware autoscaling, fault tolerance, and effective use of spot nodes.
Previously, Nitin worked at Microsoft on the VPN site-to-site gateway service, which forms the backbone of Microsoft Azure Stack’s network.
Nitin holds a master’s degree in computer science from IIIT-Hyderabad, where his main areas of focus were distributed computing, databases, and networks.
Abhishek Modi works on the Hadoop and YARN stack at Qubole, where he has worked on key YARN features like its autoscaling framework and the balancing of spot nodes in a cluster. Previously, he worked at Adobe Systems, where he filed multiple patents. Abhishek holds a degree from IIT-Varanasi.