Next-generation big data engines (Apache Spark, Tez, etc.) are known for the performance boost they get from in-memory computing. However, current memory capacity is far too small to hold an entire data set. Non-volatile memory (NVM) has emerged in response to this need, but integrating NVM into a modern big data system is a challenge: for example, it means handling garbage collection (GC) overhead and refactoring system APIs. NVM brings benefits for in-memory and real-time computation, but it also raises new questions about memory management in big data.
In this talk, we present our efforts to build a tiered store in Tachyon, which provides a software solution for next-generation data center platforms with NVM. It is transparent to the end user while delivering better performance for real-world applications.
Mingfei Shi is a senior software engineer on Intel’s big data technology team. He is one of the top contributors to the Tachyon project, and also a contributor to the Spark project.
Bin Fan is a software engineer at Alluxio and a PMC member of the Alluxio project. Previously, Bin worked at Google, building next-generation storage infrastructure, where he won Google’s technical infrastructure award. He holds a PhD in computer science from Carnegie Mellon University.
©2015, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.