Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

The cloud is expensive, so build your own redundant Hadoop clusters.

Stuart Pook (Criteo)
11:15–11:55 Wednesday, 23 May 2018
Average rating: 4.40 (5 ratings)

Who is this presentation for?

  • DevOps engineers, big data engineers, and data architects

Prerequisite knowledge

  • Experience with Hadoop or bare metal deployments

What you'll learn

  • Lessons and best practices from Criteo's experience building and running a production cluster of 2,000 nodes that runs over 300,000 jobs a day, along with a backup cluster of 1,200 nodes, in its own data centers rather than in the cloud


Criteo has a main production cluster of 2,000 nodes that runs over 300,000 jobs a day, along with a backup cluster of 1,200 nodes. Criteo’s job is to keep these clusters running together as it builds a cluster to replace the backup cluster. These clusters are in the company’s own data centers, as running in the cloud would be many times more expensive. These two clusters were meant to provide a redundant solution to Criteo’s storage and compute needs, including a tested failover mechanism.

Building a cluster requires testing hardware from several manufacturers and choosing the most cost-effective option. Criteo has now run these tests twice and can offer advice on how to get them right the first time. The tests were effective except for the RAID controller for the company’s 35,000 disks: Criteo had so many problems with the new controller that it had to replace it, and it is now working on a solution that will help it better manage its disks. Stuart Pook offers an overview of the project, shares challenges and lessons learned, and discusses Criteo’s progress in building another cluster to survive the loss of a full data center.

Stuart Pook


Stuart Pook is a senior DevOps engineer at Criteo, where he is part of the Lake team that runs several small and two rather large Hadoop clusters. Stuart loves storage (208 PB at Criteo) and automation with Chef, because configuring more than 3,000 Hadoop nodes by hand is just too slow. Before discovering Hadoop, he developed user interfaces and databases for biotech companies. Stuart has presented at ACM CHI 2000, Devoxx 2016, NABD 2016, Hadoop Summit Tokyo 2016, Apache Big Data Europe 2016, Big Data Tech Warsaw 2017, and Apache Big Data North America 2017.