- Paperback: 325 pages
- Publisher: Manning Publications; 1 edition (31 January 2016)
- Language: English
- ISBN-10: 1633430030
- ISBN-13: 978-1633430037
- Product Dimensions: 18.5 x 1 x 23.4 cm
- Average Customer Review: 3 customer reviews
- Amazon Bestsellers Rank: #8,02,325 in Books
Introducing Data Science: Big Data, Machine Learning, and more, using Python tools Paperback – 31 Jan 2016
About the Author
Davy Cielen is one of the founders and managing partners of Optimately, where he focuses on leading and developing data science projects and solutions in various sectors and closely follows new developments in data science. Before Optimately, he worked on data science and big data projects at a major retailer.
Arno Meysman is one of the founders and managing partners of Optimately, where he focuses on leading and developing data science projects and solutions in various sectors and closely follows new developments in data science. Before Optimately, he worked on data science and big data projects at a major retailer. Apart from data science, he is also into data visualisation and generally "Creating data-driven things that are smart, interactive and pretty".
Top customer reviews
This book is just an appetizer. Once you are done with it, it is up to you to choose a path: big data analysis, machine learning, etc.
So, if you are interested in knowing more about data science, go ahead and pick up this book. You won't be disappointed.
Most helpful customer reviews on Amazon.com
This book provides a very well-rounded approach to data science. By that I mean it truly gives you a ride through all aspects of the field, rather than showing you a regression algorithm in Python and calling it data science.
The book has it all: not only does it use probably the most popular language (Python) for its examples, it also goes into detail on supporting tools and ecosystems. Spark, for example: why create something when Spark is already here and we can just use it in our work?
It covers NoSQL technologies, giving readers enough information to get started, and weighs the pros and cons of each. I especially enjoyed the sections on ACID, BASE, and the CAP theorem. I am familiar with them and gave a presentation on the exact same topic a few years ago, and I enjoyed the read because it covered the important key points, leaving me with a nice warm feeling in my stomach that unaware readers will be in good hands!
During the discussion of NoSQL, Elasticsearch is introduced, and an entire chapter is devoted to leveraging its search capabilities to get valuable results. Search is what Elasticsearch does best! The section about Damerau-Levenshtein was great. It makes you think about the dirty data that is present in the real world and how you deal with it (versus giving you examples with perfectly clean, ready-to-use data).
Speaking of real-world experience: the book takes a step back and, instead of just being a data science book that throws cool Python libraries at you, it talks about the general approach to real-world data science projects by making you think about the project's research goals. Why are we doing this? This helps you think and pick the right solutions.
Another example of real-world problems is the chapter on dealing with big, and I mean truly big, data. In a sample program you can surely play with tens or hundreds of sample records, but what do you do with gigabytes or more of data? When running production servers, you are not dealing with 2-3 lines of log entries; you sometimes deal with gigabytes! So I was very happy to see a section on how you can tackle problems like that.
The authors did a great job, in my opinion, by cloning the pywebhdfs package and making available a version that works with their example code. (They did use a now-outdated Hortonworks sandbox, which made a few chapters hard to follow, but it was not hard to figure out where the menus and buttons had moved.)
A nice final touch was the section on visualizing results. How would you communicate what you found to others? Will you point them at some hard-to-read printout, or show them a picture or graph that makes your findings easy to read?
So... there are many, many gems in this book that give you a great overview of the field of data science and get you started not only in a strictly academic, demo-only way, but also in a real-life production environment.
I will definitely be re-reading this book and recommending it to my colleagues!
To be clear, machine learning is included, with algorithm explanations complete with Python code examples. This includes typical data science topics such as sparse data, text mining, and supervised and unsupervised learning. Data scientists tend to split into "R" and "Python" bins, and this book is a shout-out to "Python". A nod is given to "R" with the availability of the RPy library and the popularity of "R".
The authors address scaling Python code both through optimization and through big data tools. They give a crisp overview of the Hadoop framework and the memory advantages of using Spark. Another important part of data science is working with data, and this book provides an excellent overview of SQL and NoSQL databases, complete with ACID and BASE concepts and how they contrast. Special attention is given to graph databases, which the authors argue are a contender for efficiently modelling complex data.
This book is great for the aspiring data scientist who wants to become familiar with the data science process. It does require a technical background in order to understand how to set up the examples and follow the theory. It would also be useful for a manager, data architect, or data engineer who wants to understand how best to support a data scientist in finding business solutions by mining gems from a business data pool.