Presented By O'Reilly and Cloudera
Make Data Work
Sept 29–Oct 1, 2015 • New York, NY

How a global entertainment company successfully built a data lake for continued digital dominance

Joe Caserta (Caserta Concepts), Elliott Cordo (Caserta Concepts, LLC)
2:05pm–2:45pm Wednesday, 09/30/2015
Data-driven Business
Location: 1 E10 / 1 E11
Level: Intermediate
Average rating: 4.00 (7 ratings)
Slides: 1-PPTX

With a broad roster of new stars and legendary artists, this global record company has long been considered a technology innovator and progressive force in the music business. Ahead of the industry with its development of a dedicated digital strategy team, the company is recognized for adopting and harnessing digital technology for music creation and distribution.

Because ongoing digital innovation is essential to this music giant’s continued success, the company needs to replace its legacy applications with a fully integrated framework based on open source big data technologies. This means seamlessly integrating data from its more than 15 record labels and a global publishing catalog containing more than 100 million copyrights held worldwide.

To make this transition and embrace the benefits of a rapidly evolving technology ecosystem, the company and Caserta Concepts worked together to develop a comprehensive roadmap and implement a new data platform in the cloud.

In this presentation, you’ll hear about the strategy, the process, the challenges, and the solution, including:

  • The process for assessing the current landscape and developing strategic recommendations, which included re-architecting critical components of the platform to gain stability, performance, and resiliency
  • The core components required to build a production-worthy data lake
  • The framework for integrating data feeds from real-time and streaming sources such as Pandora, Spotify, and iTunes, each supplied in different formats and different time sequences
  • The task of systematically onboarding more than 140 unique data feeds
  • The strategy built with a laser focus on capacity models to ensure system scalability
  • Resolving data ingestion bottlenecks with a new, open and scalable framework that seamlessly accommodates existing and new data sources
  • A complete overview of the data ecosystem core components and moving parts

The discussion also covers emerging alternatives such as Spark, and how, where, when, and why these technologies are relevant in the new data lake architecture.

Joe Caserta

Caserta Concepts

Joe Caserta is president of Caserta Concepts, an award-winning New York-based innovation consulting and technology implementation firm specializing in big data analytics, data warehousing, business intelligence solutions, and helping clients maximize data value. A recognized big data strategy consultant, author, and educator, Joe is coauthor of the best-selling book The Data Warehouse ETL Toolkit (Wiley, 2004), a contributor to industry publications, and a frequent keynote speaker and expert panelist at industry conferences and events. He also serves on the advisory boards of financial and technical institutions and is the organizer and host of the Big Data Warehousing Meetup group in NYC.

Elliott Cordo

Caserta Concepts, LLC

Elliott is a big data, data warehouse, information management, and technology innovation expert with a passion for helping transform data into powerful information. He has more than a decade of experience implementing tailored big data and data warehouse solutions, with hands-on experience in every component of the data warehouse software development lifecycle. At Caserta Concepts, Elliott oversees large-scale technology projects, including those involving cloud, business intelligence, data analytics, big data, and data warehousing.
Elliott is recognized for his many successful big data projects, ranging from big data warehousing to machine learning and, his personal favorite, recommendation engines. His passion is helping people understand the true potential in their data, working hand in hand with clients and partners to learn and develop cutting-edge platforms that truly enable their organizations.