Using HBase

Hadoop: Tools & Technology, Grand East (NY Hilton)
Tutorial. Please note: to attend, your registration must include Tutorials.
Average rating: 2.50 (10 ratings)

HBase is one of the NoSQL data stores that have emerged in recent years, and it has been gaining popularity quickly. It is an open source implementation of Google's Bigtable design and is part of the Hadoop ecosystem. HBase is known to scale to hundreds of nodes, providing fast random access to terabytes and petabytes of data. This tutorial will get you started with HBase so you can build a scalable application of your own.

We’ll accomplish this by covering the following aspects:

  • The background of HBase as a datastore
  • Setting up HBase on a *nix machine (bring a laptop running Linux; Macs work just as well, and so does a remote EC2 instance)
  • Getting familiar with the client libraries through hands-on exercises
  • HBase data model and schema design basics
  • Overview of HBase internals and design assumptions

At the end of the tutorial, you’ll have an understanding of how to build applications that use HBase as the backend store.
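
As a rough illustration of the data model listed above, a Bigtable-style table can be pictured as a sorted map from row key to column (family:qualifier) to timestamped versions of a value. The sketch below is a toy in-memory model in plain Python, not the HBase client API; the class name `ToyTable`, its methods, and the sample row keys are all hypothetical and exist only for this example.

```python
from collections import defaultdict


class ToyTable:
    """Toy in-memory model of an HBase-style table (hypothetical, not the HBase API)."""

    def __init__(self, families):
        # Column families are fixed when the table is created.
        self.families = set(families)
        # row key -> "family:qualifier" -> {timestamp: value}
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, family, qualifier, value, ts):
        """Store one version of a cell under an explicit timestamp."""
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row][f"{family}:{qualifier}"][ts] = value

    def get(self, row, family, qualifier):
        """Return the newest version of a cell, or None if it is absent."""
        versions = self.rows.get(row, {}).get(f"{family}:{qualifier}")
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start_row, stop_row):
        """Yield row keys in sorted order within [start_row, stop_row)."""
        for key in sorted(self.rows):
            if start_row <= key < stop_row:
                yield key


# Sample data: row keys share a prefix so related rows sort next to each other.
table = ToyTable(families=["info"])
table.put("user#001", "info", "name", "Ada", ts=1)
table.put("user#001", "info", "name", "Ada L.", ts=2)  # newer version of the same cell
table.put("user#002", "info", "name", "Bob", ts=1)
table.put("order#001", "info", "total", "9.99", ts=1)

latest_name = table.get("user#001", "info", "name")
user_rows = list(table.scan("user#", "user$"))
```

Because rows are kept in key order, a range scan over a shared key prefix returns related rows together, which is one reason row key design gets so much attention in schema discussions. A real HBase table stores keys and values as byte arrays and distributes ranges of rows across servers; none of that is modeled here.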

Requirement: Bring a laptop (Mac or Linux, or with access to an EC2 instance) and, if possible, download the HBase 0.94.1 tarball from the Apache website beforehand so we can get to work right away. The tutorial includes hands-on exercises.

Amandeep Khurana


Amandeep Khurana is a solutions architect at Cloudera, where he’s involved in the entire lifecycle of Hadoop adoption for customers from use-case discovery to taking systems to production. Amandeep is also a coauthor of HBase In Action, a book geared toward building applications using HBase. Prior to Cloudera, Amandeep was at Amazon Web Services, where he was a part of the Elastic MapReduce team, and built the first version of EMR’s HBase offering.

Matteo Bertozzi


Software Engineer at Cloudera, currently focused on the Apache HBase project.

Comments on this page are now closed.


Warren Pfeffer
10/22/2012 6:16pm EDT

Would the Cloudera CDH3 version be OK?

Name    : hadoop-hbase
Version : 0.90.6+84.73
Repo    : cloudera-cdh3

Eric Czech
10/22/2012 1:30pm EDT

Are there any best practices for serving low-latency random reads from HBase using a cluster that is simultaneously running a lot of MapReduce jobs? More specifically, how do you keep the MapReduce jobs from creating intermittent, large spikes in read latency? Is replication typically the best option for dealing with this?

Amandeep Khurana
10/22/2012 11:11am EDT

Matthew, not really. Any Linux instance should do fine as long as you are able to connect to it from your laptop. I’d recommend not using EC2, because you’ll need reliable internet connectivity for the whole time you are doing the exercises.

Jack, we’ll work with standalone. You don’t need Hadoop installed. In fact, it’s cleaner to keep Hadoop out of the picture for this tutorial.

Matthew Kleiderman
10/22/2012 8:53am EDT

Any configuration suggestions for EC2 instances?

Jinyuan Zhou
10/21/2012 6:55pm EDT

Hi Amandeep, are we going to run some examples on top of a pseudo cluster? If we do, does the cluster version matter? I have installed Hadoop 1.0.4, but HBase 0.94.1 has a hadoop-core-1.0.3.jar in its lib directory. Does this matter? Thanks.

Amandeep Khurana
10/21/2012 3:19pm EDT

0.94.0 would work just fine and so would 0.92.x. We’ll be doing some basic work with the API and any of those versions would suffice.

Robert Goretsky
10/21/2012 10:13am EDT

I’m just preparing my Mac laptop for the tutorial on Tuesday. I have been using the ‘brew’ package manager to install Hadoop and HBase. The latest version of HBase currently supported by brew is 0.94.0. Is there anything critical in the upgrade to 0.94.1 that is needed for this tutorial? If so, I could take a stab at updating the brew formula. I think it just involves pointing it to the correct tarball.

