Data generation rates for some database workloads are expected to grow rapidly during the second run of the Large Hadron Collider (LHC) and beyond. This applies in particular to data coming from control, logging, and monitoring systems. In certain cases, storing, administering, and accessing such large data sets in a relational database system places heavy demands on the technology and therefore on cost. The CERN database community is therefore very interested in finding alternatives to relational database systems that can store and query large data volumes with fast and scalable access times. Scale-out database engines are an emerging and rapidly developing area. One solution that has recently attracted attention is Cloudera Impala, with columnar storage provided by Parquet on top of the Hadoop Distributed File System (HDFS). This solution has the additional benefit of offering SQL as its main data access interface, which makes it easy to integrate with existing client applications. In this presentation we will discuss the results of our tests of the Cloudera Impala query engine, including tests of data loading and of integration with existing data sources, notably Oracle databases. We will also report on query performance tests carried out on various data sets of interest at CERN, especially the accelerator log database.
Zbigniew Baranowski is a database systems specialist and a member of the group that provides and supports central database services at CERN.
©2015, O’Reilly UK Ltd
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with, and does not endorse or review, the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.