Presented By O’Reilly and Cloudera

San Jose • London • New York

Make Data Work

March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

HDFS on Kubernetes: Tech deep dive on locality and security

Kimoon Kim (Pepperdata), Ilan Filonenko (Bloomberg LP)

4:20pm–5:00pm Thursday, March 8, 2018

Data engineering and architecture, Data science and machine learning, Streaming systems and real-time applications
Location: LL21 C/D

Who is this presentation for?

Data scientists, big data engineers, software developers, and big data architects

Prerequisite knowledge

A basic understanding of Spark and big data platforms and architecture

What you'll learn

Learn how to run Spark on Kubernetes while accessing HDFS data with data locality and security preserved

Description

There is growing interest in running Spark natively on Kubernetes, and Spark data is often stored in HDFS. Kimoon Kim and Ilan Filonenko explain how to make Spark on Kubernetes work seamlessly with HDFS by addressing challenges such as HDFS data locality and secure HDFS support. Kimoon and Ilan demonstrate how the Spark scheduler can still provide HDFS data locality on Kubernetes when HDFS also runs on Kubernetes, and how they made Spark correctly discover the mapping from Kubernetes containers to physical nodes, and from those nodes to HDFS datanode daemons. You’ll also discover how Spark on Kubernetes interacts with secure HDFS using Kubernetes constructs such as secrets and RBAC. The same secure HDFS solution also applies when Spark on Kubernetes accesses HDFS running outside the Kubernetes cluster.
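The locality problem described above comes down to a two-step lookup: HDFS reports each block replica by its datanode address, and the scheduler needs to know which executors sit on the same physical node. The Python sketch below illustrates that idea under stated assumptions; it is not the speakers' implementation. It assumes the datanode-to-node and executor-to-node mappings have already been resolved (for example, by querying the Kubernetes API), and the function name, dictionary names, IP addresses, and node names are all hypothetical.

```python
from typing import Dict, List, Set


def preferred_executors(
    block_hosts: List[str],
    datanode_to_node: Dict[str, str],
    executor_to_node: Dict[str, str],
) -> Set[str]:
    """Return executor IDs that are node-local to at least one replica of a block.

    Hypothetical helper: the mappings are assumed to be precomputed elsewhere.
    """
    # Step 1: translate each replica's datanode address to its physical node name.
    # Addresses with no known mapping are skipped rather than raising an error.
    replica_nodes = {
        datanode_to_node[host] for host in block_hosts if host in datanode_to_node
    }
    # Step 2: an executor is node-local if its pod runs on one of those nodes.
    return {ex for ex, node in executor_to_node.items() if node in replica_nodes}


if __name__ == "__main__":
    # Hypothetical cluster layout: three datanode pods and three executor pods.
    datanodes = {"10.0.1.5": "k8s-node-1", "10.0.2.7": "k8s-node-2", "10.0.3.9": "k8s-node-3"}
    executors = {"exec-1": "k8s-node-2", "exec-2": "k8s-node-4", "exec-3": "k8s-node-1"}
    # A block replicated on the datanodes at 10.0.1.5 and 10.0.2.7.
    print(sorted(preferred_executors(["10.0.1.5", "10.0.2.7"], datanodes, executors)))
```

Comparing raw pod IPs would find no match here, because executor pods and datanode pods have different addresses; only after both sides are mapped to a node name does the scheduler see that `exec-1` and `exec-3` are local to the block.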

Kimoon Kim

Pepperdata

Kimoon Kim is a software engineer at Pepperdata. Previously, he worked for the Google Search and Yahoo Search teams for many years. Kimoon has hands-on experience with large distributed systems processing massive datasets.

Ilan Filonenko

Bloomberg LP

Ilan Filonenko is a four-time returning engineering intern at Bloomberg LP, where he has designed and architected distributed systems at both the application and infrastructure levels. Previously, Ilan was an engineering consultant and technical lead at various startups and research divisions across multiple industry verticals, including medicine, hospitality, finance, and music. Ilan’s current research studies algorithmic, software, and hardware techniques for high-performance machine learning, with a focus on optimizing stochastic algorithms such as stochastic gradient descent (SGD).

©2018, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com