Coupling online data processing with scalable analytics is a popular technique for systems that produce large amounts of data, such as those at CERN. This has always been difficult to achieve with traditional database systems or the Hadoop ecosystem: although feasible, it involves many compromises and brings extra cost and complexity.
Apache Kudu is a new distributed storage engine that combines low-latency data ingestion, scalable analytics, and fast data lookups. But what does it deliver in practice? Zbigniew Baranowski explains how to use Apache Kudu for scale-out database-like systems, such as those used at CERN for controlling and supervising the accelerator infrastructure and for the particle-collision catalogue, covering its advantages and limitations and measuring its performance.
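To make the "database-like" pattern concrete, here is a minimal sketch using the Apache Kudu Python client: it creates a table, ingests a row, and performs a keyed lookup. The master address, table name, and schema are illustrative assumptions, not CERN's actual setup.

import kudu
from kudu.client import Partitioning
from datetime import datetime

# Connect to a Kudu master (hypothetical address; adjust for your cluster).
client = kudu.connect(host='kudu-master.example.org', port=7051)

# Illustrative schema: a (device_id, ts) primary key supports both fast
# point lookups and time-range scans for analytics.
builder = kudu.schema_builder()
builder.add_column('device_id', kudu.int64, nullable=False)
builder.add_column('ts', kudu.unixtime_micros, nullable=False)
builder.add_column('value', kudu.double)
builder.set_primary_keys(['device_id', 'ts'])
schema = builder.build()

# Hash-partition on device_id to spread ingest load across tablets.
partitioning = Partitioning().add_hash_partitions(
    column_names=['device_id'], num_buckets=4)
client.create_table('sensor_readings', schema, partitioning)

# Low-latency ingestion: apply and flush a single-row insert.
table = client.table('sensor_readings')
session = client.new_session()
session.apply(table.new_insert(
    {'device_id': 1, 'ts': datetime.utcnow(), 'value': 3.14}))
session.flush()

# Fast lookup: predicate pushdown on the leading key column.
scanner = table.scanner()
scanner.add_predicate(table['device_id'] == 1)
print(scanner.open().read_all_tuples())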
Zbigniew Baranowski is a database system specialist and a member of a group that provides central database and Hadoop services at CERN.