In the last few years, Netflix’s S3 data warehouse has grown to more than 100 PB. In that time, the company has shared several techniques and released open source tools for working around S3’s quirks, including s3mper to work around eventual consistency, S3 multipart committers to commit data without renames, and the batchid pattern for cross-partition atomic commits.
Ryan Blue and Daniel Weeks share lessons learned, the tools Netflix currently uses and those it has retired, and the improvements it is rolling out, including Iceberg, a new table format for S3 that is replacing many of the company’s current tools. Iceberg enables a new generation of improvements, including:
Ryan Blue is an engineer on Netflix’s big data platform team. Previously, Ryan was responsible for the Avro and Parquet file formats at Cloudera. He is the author of the Analytic Data Storage in Hadoop series of screencasts from O’Reilly.
Daniel Weeks manages the big data compute team at Netflix and is a Parquet committer. Previously, Daniel focused on research in big data solutions and distributed systems.
For exhibition and sponsorship opportunities, email strataconf@oreilly.com
For information on trade opportunities with O'Reilly conferences, email partners@oreilly.com
View a complete list of Strata Data Conference contacts
©2018, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com