Hadoop Data Warehousing with Hive

Hadoop in Practice Ballroom E
Tutorial. Please note: to attend, your registration must include Tutorials on Tuesday.

*There are a few setup steps I need you to do in advance. They should take a few minutes at most. You’ll find the instructions here: http://thinkbig-academy.s3.amazonaws.com/Strata2013/HiveTutorial/index.html

You can view these instructions now. The zip file with the tutorial content will be available Friday, February 22nd. If you have any problems, post a comment to the tutorial page on the conference site: https://conferences.oreilly.com/strata/strata2013/public/schedule/detail/26899

See you in Santa Clara!
Dean Wampler*

In this hands-on tutorial, you’ll learn how to use Hive for Hadoop-based data warehousing. You’ll also learn some tricks of the trade and how to handle known issues.

Writing Hive Queries

We’ll spend most of the tutorial working through hands-on exercises with actual Hive queries, so you can learn by doing. We’ll cover the main features of Hive’s query language, HiveQL, and how Hive works with data in Hadoop.

Advanced Techniques

Hive is very flexible about data file formats, the “schema” of records, and so forth. We’ll discuss options for customizing these and other aspects of your Hive and data cluster setup. We’ll briefly examine how you can write Java user-defined functions (UDFs), as well as other plugins that extend Hive to data formats that aren’t supported natively.
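To make the idea of applying a “schema” at read time concrete, here is a minimal Python sketch (not Hive code) that parses one line in Hive's default text layout, where fields are separated by Ctrl-A, collection items by Ctrl-B, and map keys from values by Ctrl-C. The column names, the sample record, and the `parse_record` helper are hypothetical illustrations, and details such as NULL handling are left out.

```python
# Hive's default delimiters for text files.
FIELD_SEP = "\x01"  # Ctrl-A separates fields
ITEM_SEP = "\x02"   # Ctrl-B separates array items and map entries
KEY_SEP = "\x03"    # Ctrl-C separates a map key from its value

# Hypothetical schema; it is applied only when a line is read.
SCHEMA = [
    ("name", "string"),
    ("salary", "float"),
    ("subordinates", "array"),
    ("deductions", "map"),
]


def parse_record(line, schema=SCHEMA):
    """Split one delimited line and interpret each field by its column type."""
    fields = line.rstrip("\n").split(FIELD_SEP)
    record = {}
    for (column, kind), raw in zip(schema, fields):
        if kind == "float":
            record[column] = float(raw)
        elif kind == "array":
            record[column] = raw.split(ITEM_SEP) if raw else []
        elif kind == "map":
            entries = raw.split(ITEM_SEP) if raw else []
            pairs = (entry.split(KEY_SEP, 1) for entry in entries)
            record[column] = {key: float(value) for key, value in pairs}
        else:
            record[column] = raw
    return record


# Hypothetical sample line, as it might sit in a plain text file.
sample = (
    "John Doe\x01100000.0\x01Mary Smith\x02Todd Jones"
    "\x01Federal Taxes\x030.2\x02State Taxes\x030.05\n"
)
record = parse_record(sample)
print(record)
```

The file itself carries no type information; the same bytes could be read with a different schema, which is the flexibility described above.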

Hive in the Hadoop Ecosystem

We’ll look at Hive’s place in the Hadoop ecosystem, including how it compares to other available tools. We’ll discuss installation and configuration choices that affect performance and ease of use in a real production cluster. In particular, we’ll discuss how to create Hive’s separate “metadata” store (the metastore) in a traditional relational database, such as MySQL. We’ll offer tips on data formats and layouts that improve performance in various scenarios.
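One layout choice that often matters for performance is partitioning, which the abstract does not name explicitly, so treat it here as an assumed example. Hive stores each partition of a table in a directory named `column=value`, so a query that filters on partition columns only needs to read the matching directories. The Python sketch below mimics that pruning over a list of paths; the paths and the helper names `partition_values` and `prune` are hypothetical.

```python
def partition_values(path):
    """Extract partition columns from 'column=value' path components."""
    values = {}
    for part in path.strip("/").split("/"):
        if "=" in part:
            key, value = part.split("=", 1)
            values[key] = value
    return values


def prune(paths, **wanted):
    """Keep only the paths whose partition values match every filter."""
    return [
        path
        for path in paths
        if all(partition_values(path).get(k) == v for k, v in wanted.items())
    ]


# Hypothetical warehouse layout for a table partitioned by year and month.
paths = [
    "/user/hive/warehouse/logs/year=2012/month=12/000000_0",
    "/user/hive/warehouse/logs/year=2013/month=01/000000_0",
    "/user/hive/warehouse/logs/year=2013/month=02/000000_0",
]

selected = prune(paths, year="2013", month="02")
print(selected)
```

Because the filter is resolved from directory names alone, the files in the other partitions never have to be opened.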

Dean Wampler


Dean Wampler, Ph.D., is Principal Consultant at Think Big Analytics, specialists in Big Data, particularly Data Science in the Hadoop ecosystem. He speaks frequently at conferences on various big data and other programming topics.

Dean is the co-author of Programming Hive, the author of Functional Programming for Java Developers, and the co-author of Programming Scala, all published by O’Reilly.


Roy Ben Alta
02/23/2013 10:36pm PST

Thanks, that works

Dean Wampler
02/23/2013 2:01am PST

There’s a typo in the web page, which I’ll fix. This link downloads the zip file: https://s3.amazonaws.com/thinkbig-academy/Strata2013/HiveTutorial/tutorial.zip

Chon Lei
02/23/2013 1:20am PST

I followed the instructions to download the tutorial.zip file, but my browser comes back with a “server not found” message. Has the file been set up for download? Here is the link I use:


