In this talk, we focus on how Tachyon helps improve the efficiency of big data analytics (ad hoc queries) within Baidu.
Baidu currently runs a production Tachyon cluster with 100 nodes and over 2 PB of storage space, which mainly serves as the cache layer for our big data analytics engine. We first introduce the big data analytics infrastructure within Baidu. We then explain why we started using Tachyon a few months ago and describe the problems we encountered along the way. Next, we delve into how Tachyon helps accelerate our big data analytics pipeline in its current state. Finally, we discuss the new features we would like to see and our plan to scale further.
Bin Fan is a software engineer at Alluxio and a PMC member of the Alluxio project. Previously, Bin worked at Google, building next-generation storage infrastructure, where he won Google’s technical infrastructure award. He holds a PhD in computer science from Carnegie Mellon University.
Currently works in the Baidu Big Data Group, focusing on big data infrastructure.
©2015, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • firstname.lastname@example.org
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.