Business and franchise users need access to data to generate reports and dashboards, perform analytics, and build customer-centric predictive and personalization models that help manage demand at Choice Hotels properties. Making that data available accurately, promptly, and reliably to everyone authorized to consume it is no easy task.
Narasimhan Sampath and Avinash Ramineni share how Choice Hotels International used Spark Streaming, Kafka, Spark, and Spark SQL to create an advanced analytics platform that enables business users to be self-reliant by accessing the data they need from a variety of sources to generate customer insights and property dashboards and enable data-driven decisions with minimal IT engagement. Narasimhan and Avinash highlight the architecture, lessons learned, and the challenges that were overcome on both the business and technology fronts.
The analytics platform is designed as a framework that lets business users handle data intake, data processing, and report/model generation on a self-service basis. The data-driven framework consists of a distributed hybrid-cloud data ingestor for data intake and a Cloudera CDH cluster with Spark as the distributed compute engine. Storage and compute are decoupled, which encourages a BYOC (bring your own compute) model. The platform runs CDH on EC2 instances and uses Amazon S3 as the data warehouse storage layer (data lake), Spark as the ETL engine, and Spark SQL as the distributed query engine. Results (computations and derived tables) are exposed to end users via Spark SQL and are discovered via Tableau. The platform supports both batch and streaming use cases and is built on the following technology stack: AWS (S3, EC2, SQS, SNS), Cloudera CDH (YARN, Navigator, Sentry), Spark, Kafka, Spark SQL, and Spark Streaming.
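To make the decoupled storage and compute idea concrete, here is a minimal PySpark sketch of one batch ETL step: read raw data from S3, aggregate it with Spark, write the derived table back to S3 as Parquet, and query it with Spark SQL. It is an illustration under stated assumptions, not the platform's actual code. The bucket name, the `raw`/`derived` layout, the `bookings` dataset, and the `property_id` and `revenue` columns are all hypothetical, and the sketch assumes a Spark 2.x or later session already configured with S3 credentials.

```python
from datetime import date

# Hypothetical bucket and layout; the abstract does not describe the real ones.
LAKE_BUCKET = "example-data-lake"


def lake_path(zone: str, dataset: str, day: date) -> str:
    """Build an s3a:// URI for one day's partition of a dataset."""
    return f"s3a://{LAKE_BUCKET}/{zone}/{dataset}/dt={day.isoformat()}"


def run_daily_etl(day: date) -> None:
    """Aggregate one day of raw bookings per property and store the result in S3."""
    # Imported here so lake_path() stays usable on machines without Spark.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("daily-bookings-etl").getOrCreate()

    # Storage lives in S3; any cluster with access can supply the compute.
    raw = spark.read.json(lake_path("raw", "bookings", day))

    # Assumed columns: property_id and revenue.
    daily = raw.groupBy("property_id").agg(
        F.count("*").alias("bookings"),
        F.sum("revenue").alias("revenue"),
    )
    daily.write.mode("overwrite").parquet(
        lake_path("derived", "property_daily", day)
    )

    # Register the result so it can be queried with Spark SQL in this session.
    daily.createOrReplaceTempView("property_daily")
    spark.sql(
        "SELECT property_id, revenue FROM property_daily "
        "ORDER BY revenue DESC LIMIT 10"
    ).show()

    spark.stop()
```

Because the derived table is written to S3 rather than to cluster-local storage, a different cluster (or a user's own compute) could read the same Parquet files later, which is the point of the BYOC approach described above.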
Narasimhan Sampath is a systems architect at Choice Hotels International, one of the largest and most successful hotel chains in the world, where he works on enterprise big data and cloud architectures with a focus on performance tuning and scalability. Narasimhan also has rich experience in a variety of relational and NoSQL databases. He regularly presents at technology events and his work on scalability has been recognized and published by Microsoft.
Avinash Ramineni is a principal at Clairvoyant and leads the engineering efforts in the big data space. He is a passionate technologist with a drive to understand the bigger picture and vision and convert them into pragmatic, implementable solutions. Avinash has over 13 years of experience in engineering and architecting systems on a large scale. He specializes in providing solutions in the areas of big data, cloud, NoSQL, SOA, and event-driven architectures. Before Clairvoyant, Avinash was a principal engineer at Apollo Group, where he was responsible for innovation and technical guidance for all the product development efforts. Avinash holds an MS in computer science from Arizona State University.
©2016, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.