- Paperback: 310 pages
- Publisher: Packt Publishing Limited (14 April 2014)
- Language: English
- ISBN-10: 1783285559
- ISBN-13: 978-1783285556
- Product Dimensions: 19 x 1.8 x 23.5 cm
- Amazon Bestsellers Rank: #1,11,898 in Books
Pig Design Patterns Paperback – Import, 14 Apr 2014
About the Author
Pradeep Pasupuleti has over 16 years of experience in architecting and developing distributed and real-time data-driven systems. Currently, his focus is on developing robust data platforms and data products that are fuelled by scalable machine-learning algorithms, and on delivering value to customers by addressing business problems, juxtaposing his deep technical insights into Big Data technologies with future data management and analytical needs. He is extremely passionate about Big Data and believes that it will be the cradle of many innovations that will save humans their time, money, and lives. He has built solid data product teams with experience spanning every aspect of data science, thus successfully helping clients to build an end-to-end strategy around how their current data architecture can evolve into a hybrid pattern that is capable of supporting analytics in both batch and real time, all using the lambda architecture. He has created COEs (Centers of Excellence) to provide quick wins with data products that analyze high-dimensional multi-structured data using scalable natural language processing and deep learning techniques. He has performed roles in technology consulting advising Fortune 500 companies on their Big Data strategy, product management, systems architecture, social network analysis, negotiations, conflict resolution, chaos and nonlinear dynamics, international policy, high-performance computing, advanced statistical techniques, risk management, marketing, visualization of high-dimensional data, human-computer interaction, machine learning, information retrieval, and data mining. He has strong experience of working in ambiguity to solve complex problems using innovation by bringing smart people together. His other interests include writing and reading poetry, enjoying the expressive delights of ghazals, spending time with kids discussing impossible inventions, and searching for archeological sites.
You can reach him at http://www.linkedin.com/in/pradeeppasupuleti and email@example.com.
Most helpful customer reviews on Amazon.com
Now the bad stuff. There is a lot of it. If you are actually in the self-described target audience for this book, i.e. you are a trained engineer who has used Pig to solve real-world problems, you may very well end up feeling cheated! The text takes very well-known concepts and describes them as "patterns." CUBE and ROLLUP are the "aggregation pattern." The various JOIN options are described as the "integration pattern." "Patterns" that don't fit existing Pig operators are all basically well-known methods from statistics, plus a few machine learning algorithms (all implemented with well-known and respected open source libraries). Of course, if you have had no exposure to those methods, then you might actually see them as "Pig Design Patterns".
It gets worse. Consider: "Apache Log File Ingestion" is described as a distinct Design Pattern. It is a different pattern from "Custom Log File Ingestion", and different from "XML Ingest". Oh, and those are all different from "JSON Ingest". You see, some of those are "structured" data, and some of them are "unstructured". I'm a little ticked that nginx did not get its own pattern...!
Some of the commentary in the text also made me cringe. For example, the popularity of JSON as an encoding standard is attributed to the rise of "social web companies, such as LinkedIn, Twitter, and Facebook." The discussion continued with statements like "...MongoDB, CouchDB, and Riak, have JSON as their primary storage... [and thus] exhibit extremely high performance... [and] scale horizontally." If you ask me, opinions like that should never be given, even for free!
For the reader who has more than a passing familiarity with statistics, parsing, and fundamental computer algorithms, there is little of value here. Design Patterns, whatever else they might be, unfortunately do not substitute for an education!
Finally, "Pig Design Patterns" has arrived, and it has immediately become my source book, where I have found answers to a lot of my questions.
I've discovered different patterns that exactly fit what I was looking for.
This book should be adopted by every 'serious Pig programmer'.
I give it 5 stars because of its use cases and machine learning algorithm implementations. It is not simple to design and implement advanced analytic algorithms in Pig. This book covers a large number of machine learning use cases, which I like a lot.