Presented By O’Reilly and Cloudera
Make Data Work
March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

Architecting an open source enterprise data lake

Sagar Kewalramani (Cloudera)
11:50am–12:30pm Thursday, March 8, 2018

Who is this presentation for?

  • Analysts, data scientists, data engineers, software engineers, solution architects, and business people

Prerequisite knowledge

  • An understanding of common Hadoop ecosystem tools and components

What you'll learn

  • Learn best practices for building an efficient, cost-effective, and open source enterprise data lake


There are many business intelligence tools in the Hadoop ecosystem and no common measure of their efficiency. Often, BI tools provide meaningful access to data for certain use cases while failing to fulfill other architectural needs, leaving you stuck with high subscription fees (usually paid upfront or yearly). So where do you begin to build or modify your enterprise data lake strategy?

Sagar Kewalramani shares real-world BI problems and how they were resolved with Hadoop tools and demonstrates how to build an effective data lake strategy with open source tools and components, focusing on business requirements, deriving business value, and creating a productive and Agile environment for data discovery and business analytics to achieve an organization’s goals.

Sagar offers an overview of an open source enterprise data lake that was designed for scalability, processing resource needs, storage requirements, and cost efficiency. This ecosystem is built using Kafka for reliable message transport, Spark for real-time data processing, Hive for analytical queries, HBase for tactical queries, and Python for machine learning requirements. Along the way, Sagar also illustrates how batch and real-time data processing and analysis can be implemented with open source tools.
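The split between batch and real-time processing described above can be illustrated with a small, tool-free Python sketch. This is not the session's code. A plain list stands in for a Kafka topic, a full recomputation stands in for a Hive batch query, and a running dictionary updated once per micro-batch stands in for Spark streaming state written to a low-latency store such as HBase. The event shape and all function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator

# Hypothetical event shape: (store_id, amount_in_cents).
# In the architecture described, events would arrive via Kafka; a list stands in here.
Event = tuple[str, int]


def batch_totals(events: Iterable[Event]) -> dict[str, int]:
    """Batch path: recompute totals over the full history (analogous to a Hive query)."""
    totals: Counter = Counter()
    for store, amount in events:
        totals[store] += amount
    return dict(totals)


def micro_batches(events: list[Event], size: int) -> Iterator[list[Event]]:
    """Split the event stream into fixed-size micro-batches, loosely mimicking streaming triggers."""
    for i in range(0, len(events), size):
        yield events[i:i + size]


def streaming_totals(events: list[Event], size: int) -> dict[str, int]:
    """Real-time path: update running state incrementally, one micro-batch at a time.

    The `state` dict plays the role of a low-latency serving store (e.g. an HBase table).
    """
    state: dict[str, int] = {}
    for batch in micro_batches(events, size):
        for store, amount in batch:
            state[store] = state.get(store, 0) + amount
    return state


if __name__ == "__main__":
    events = [("s1", 10), ("s2", 5), ("s1", 7), ("s3", 2), ("s2", 1)]
    print(batch_totals(events))
    print(streaming_totals(events, size=2))
```

Both paths should converge on the same totals. The usual trade-off is that the incremental path answers sooner, while the batch path can be rerun over the full history to correct or backfill results.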


Sagar Kewalramani


Sagar Kewalramani is a strategic solution architect and data scientist at Cloudera, where he helps customers install, build, secure, optimize, and tune their Hadoop clusters. He also helps new customers transition to the Hadoop platform and implement their initial use cases. Sagar has worked with customers from all verticals, including banking, manufacturing, healthcare, and retail. He has extensive experience in building business use cases; high-volume, real-time data ingestion, transformation, and movement; and data lineage and discovery. He has led the discovery and development of big data and machine learning applications to accelerate digital business and simplify data management and analytics. He has spoken at multiple Hadoop and big data conferences, including O’Reilly Strata. Previously, he was a data architect at Meijer Inc., where he focused primarily on architecture design and administration of ETL tools and databases, including Teradata.