Recruit Group is one of the largest web service providers in Japan. It operates many services across diverse business fields, including travel and restaurant reservations, human resource services, and POS systems. Analyzing the application logs collected from these services enables the company to provide more insightful services for individual and corporate customers. Rough estimates put the log volume at around 1 TB per day, and the number of servers and instances to collect logs from is expected to exceed 1,000 in the future.
Recruit Group had to design a platform that could handle these ever-changing requirements. It began with a project to collect and analyze all the application logs generated by these services efficiently and easily. The first step was to develop a platform that takes in large volumes of logs from upstream applications and transfers them to downstream ones efficiently and effectively. This platform is based on the data hub architecture and uses Apache Kafka for high performance and scalability. The Kafka cluster was built on Google Compute Engine and is used alongside managed services in Google Cloud Platform, such as BigQuery and Pub/Sub, for analysis.
Recruit Group faced quite a few technical problems while developing this platform. Kenji Hayashida and Toru Sasaki share some of these critical problems and explain how the company solved them. Along the way, you’ll explore the platform and learn lessons and best practices drawn from this experience.
Topics include:
Kenji Hayashida is a Japan-based data engineer at Recruit Lifestyle Co., Ltd., part of Recruit Group, where he has worked on projects such as advertising technology, content marketing, and the company’s data pipeline. Kenji started his career as a software engineer at HITECLAB while he was in college. He is the author of a popular data science textbook and holds a master’s degree in information engineering from Osaka University. In his free time, Kenji enjoys programming competitions such as TopCoder, Google Code Jam, and Kaggle.
Toru Sasaki is a system infrastructure engineer and leads the OSS professional services team at NTT Data Corporation. He is interested in open source distributed computing systems, such as Apache Hadoop, Apache Spark, and Apache Kafka. Over his career, Toru has designed and developed many clusters utilizing these products to solve his customers’ problems. He is a coauthor of one of the most popular Apache Spark books written in Japanese.
©2018, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com