The problem with writing a book about a platform like Hadoop is that the material is already outdated by the time the book is published. Programming Hive is just two years old and already has that problem. What it has going in its favor is a comprehensive look at how Hive works (for the most part, given how much has changed since its publication). I enjoy how the authors show Hive as a lot more than a simple SQL interface to Hadoop. Being able to create and maintain indexes, partition tables, and introduce Java user-defined functions gives this language a lot of power. Pair it with a Hadoop platform (my favorite is the Hortonworks sandbox, but go with your preference if you have one) and run with it. Unfortunately, a lot of the examples won’t work exactly as written due to changes in the language, but when you run into those, you can either skip them or find out how to do the same thing with a more recent version of Hadoop and Hive.
