- Paperback: 536 pages
- Publisher: Dreamtech Press (19 September 2013)
- Language: English
- ISBN-10: 9351191508
- ISBN-13: 978-9351191506
- Package Dimensions: 23.6 x 18 x 4 cm
- Amazon Bestsellers Rank: #2,07,113 in Books
Hadoop in Practice (Manning) Paperback – 19 Sep 2013
About the Author
Alex Holmes is a senior software engineer with extensive expertise in solving big data problems using Hadoop. He has presented at JavaOne and Jazoon and is a technical lead at VeriSign.
Most helpful customer reviews on Amazon.com
None of the installation instructions in the book work with the newer versions of the applications. In some cases the entire idea of how you would run and use a tool has changed. Also, the entire way that HDFS and MapReduce work has changed since YARN was added, so the book's explanation of that is out of date.
The book often omits important details, like which jar you need for a particular piece of code. Classpath and dependency issues are always a nightmare to deal with, and the book offers little help with them. The author should list everything that you would put in a Maven dependency. He often omits the import lines in Java code, so you have little idea which class he is referring to.
There are often times when he requires you to use software he wrote himself, such as the "File Slurper". I am very wary of using code like that: if it doesn't have the support of the Apache Hadoop community, it is very likely to be out of date and unsupported sooner or later. I skipped any chapter I saw like that. I also kept seeing references to a bash script called "run.sh" and could not figure out what he was referring to. I could find no such shell script in any software I downloaded. I think it must be a script in his git project, and as I said, I don't want to depend on any code that is not supported by the community.
There were also countless compatibility issues whenever I tried to do anything. Almost no two pieces of Hadoop software work together out of the box. It's so bad that using anything besides Cloudera's Hadoop distribution was practically impossible. I am not a stupid guy either.
Here is my advice to you:
1. Use Cloudera's pre-built CDH VM, at least at first. I used the CDH 4.5 pre-built VM, and that is the only thing I got to work.
2. Do not follow any installation instructions in the HIP book.
3. Do not follow any installation instructions on the Hadoop websites.
4. Only follow the installation and re-configuration instructions found in Cloudera's manual for CDH 4.5.
5. Do not deviate from the standard configuration. For example, I encountered a lot of bugs when I tried switching to Java 7.
6. You might want to hold off on buying this book until a newer edition is released.
7. If you use Maven for dependencies, make sure you get your Hadoop dependencies from the Cloudera repository, not Maven Central.
8. Instead of reading the book, just go to each Hadoop project's website. Skip their installation instructions, as I said before, but try to follow any tutorials you see, and practice using everything you read about.
9. Only after you figure out how to do everything should you try to install things from scratch on a new VM. If you try to set up a VM on your own from the start, all the frustration will kill your motivation to learn Hadoop.
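To make the Maven tip above concrete, here is a rough sketch of a `pom.xml` fragment that pulls the Hadoop client from Cloudera's repository. The review does not give any of this: the repository `id` is an arbitrary label, and the version string is my assumption for CDH 4.5, so check it against Cloudera's documentation for the release you actually run.

```xml
<!-- Sketch only: goes inside the <project> element of pom.xml. -->
<repositories>
  <repository>
    <!-- The id is an arbitrary label chosen for this example. -->
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <!-- Assumed version for CDH 4.5; it should match the cluster's CDH release. -->
    <version>2.0.0-cdh4.5.0</version>
  </dependency>
</dependencies>
```

The point is that the client jars should come from the same distribution as the cluster, which is where most of the classpath mismatches described above seem to come from.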
The one thing this book was good for was giving me ideas of what things to try, which is why I give it two stars instead of one.