Apache Kylin is an open source distributed analytics engine, contributed by eBay Inc., that provides a SQL interface and multidimensional analysis (OLAP) on Hadoop and supports extremely large datasets. Kylin’s pre-built MOLAP cubes, distributed architecture, and high concurrency help users run multidimensional queries through Kylin’s SQL interface as well as through other BI tools such as Tableau and MicroStrategy. Kylin is successfully deployed at eBay for a variety of production use cases, including web traffic analysis and geographical expansion analysis. It was open sourced on Oct 1, 2014 and has 320 stars and 125 forks.
Kylin was accepted as an Apache Incubator Project on Nov 25, 2014.
The challenge we face at eBay is that our data volume keeps growing while our user base becomes more diverse. Our users, for example in analytics and business units, consistently ask for minimal latency but want to keep using their favorite tools, such as Tableau and Excel. We worked closely with our internal analytics community and outlined the requirements for a successful product at eBay:
- Sub-second query latency on billions of rows
- ANSI SQL availability for those using SQL-compatible tools
- Full OLAP capability to offer advanced functionality
- Support for high cardinality and very large dimensions
- High concurrency for thousands of users
- Distributed and scale-out architecture for analysis in the TB to PB size range.
We quickly realized that nothing available externally met our exact requirements, especially in the open-source Hadoop community. To meet our emerging business needs, we decided to build a platform from scratch. With an excellent team and several pilot customers, we have been able to bring the Kylin platform into production as well as open-source it.
Kylin is a platform offering the following features for big data analytics:
- Extremely fast OLAP engine at scale – Kylin is designed to reduce query latency on Hadoop for 10+ billion rows of data.
- ANSI SQL on Hadoop – Kylin supports most ANSI SQL query functions through its SQL interface.
- Interactive query capability – Users can interact with Hadoop data via Kylin at sub-second latency, better than Hive queries on the same dataset.
- MOLAP cube query serving on billions of rows – Users can define a data model and pre-build it in Kylin with more than 10 billion raw data records.
- Seamless integration with BI Tools – Kylin currently offers integration with business intelligence tools such as Tableau and third-party applications.
- Open-source ODBC driver – Kylin’s ODBC driver is built from scratch and works very well with Tableau. We have open-sourced the driver to the community as well.
- Job management and monitoring
- Compression and encoding to reduce storage
- Incremental refresh of cubes
- Leveraging of the HBase coprocessor for query latency
- Approximate query capability for distinct counts (HyperLogLog)
- Easy-to-use web interface to manage, build, monitor, and query cubes
- Security capability to set ACL at the cube/project level
- Support for LDAP integration.
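The idea behind the pre-built MOLAP cubes listed above can be sketched in a few lines of Python. This is a toy, in-memory illustration and not Kylin's implementation: the rows, the dimension names (`site`, `category`, `day`), and the helpers `build_cube` and `query` are all hypothetical. It shows only the general technique of aggregating a measure once for every combination of dimensions, so that a later GROUP BY becomes a lookup instead of a scan over the raw rows.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (site, category, day, sales). Values are made up.
ROWS = [
    ("US", "phones", "2014-10-01", 120.0),
    ("US", "books", "2014-10-01", 30.0),
    ("DE", "phones", "2014-10-01", 80.0),
    ("US", "phones", "2014-10-02", 100.0),
    ("DE", "books", "2014-10-02", 20.0),
]
DIMENSIONS = ("site", "category", "day")


def build_cube(rows, dimensions):
    """Pre-aggregate SUM(sales) for every subset of the dimensions.

    Each subset is one "cuboid": a mapping from dimension values to the sum.
    """
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(range(len(dimensions)), size):
            cuboid = defaultdict(float)
            for row in rows:
                key = tuple(row[i] for i in dims)
                cuboid[key] += row[-1]  # the last column is the measure
            names = tuple(dimensions[i] for i in dims)
            cube[names] = dict(cuboid)
    return cube


def query(cube, group_by):
    """Answer SUM(sales) GROUP BY <group_by> with a single cuboid lookup."""
    # Normalize to the canonical dimension order used for cuboid keys.
    names = tuple(d for d in DIMENSIONS if d in group_by)
    return cube[names]


CUBE = build_cube(ROWS, DIMENSIONS)

if __name__ == "__main__":
    print(query(CUBE, ["site"]))
    print(query(CUBE, ["category", "site"]))
```

With three dimensions the sketch materializes 2^3 = 8 cuboids. The number of cuboids doubles with each added dimension, which is one reason the compression, encoding, and incremental refresh features in the list matter at scale.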
Session outline:
- What’s Kylin
- Tech highlights
- Open source
- Q & A
Co-creator and PMC member of Apache Kylin, Sr. Product Manager at eBay.
Luke Han joined eBay in late 2011 as a staff BI architect on the Business Intelligence Platform Team. He is the Sr. Product Manager of Kylin, managing its evolution as the interactive analytics solution on top of Hadoop; driving Kylin’s vision, roadmap, features, and plans; engaging customers; coordinating teams across different geographies; and developing internal and external partners to grow the Apache Kylin community.
Prior to eBay, Luke was a Sr. Consultant at Actuate.
Yang Li is the tech lead for Apache Kylin. He joined eBay-Shanghai in January 2014 as a member of the technical staff, and has been a key developer and architect of the Kylin OLAP engine. Yang also leads the Kylin team of engineers in Shanghai, where they develop the Kylin product and deploy it for eBay as an analytics platform. Prior to eBay, Yang spent eight years with IBM and two years with Morgan Stanley. At IBM, Yang focused on the core Java library (Apache Harmony), J2EE, and big data engineering development. He was the technical lead at IBM User Technologies and won the Outstanding Technical Achievement Award in 2008. During his time with Morgan Stanley, Yang was the vice president of the Asia Markets team responsible for global regulatory reporting architecture, engine development, and end-to-end production support infrastructure. Yang received his Master’s degree from the School of Computer Science at Shanghai Jiaotong University, a highly ranked school for computer science in China.
©2015, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.