- Paperback: 258 pages
- Publisher: Packt Publishing Limited (30 April 2015)
- Language: English
- ISBN-10: 1785280422
- ISBN-13: 978-1785280429
- Product Dimensions: 19 x 1.5 x 23.5 cm
- Average Customer Review: Be the first to review this item
- Amazon Bestsellers Rank: #2,54,375 in Books (See Top 100 in Books)
Python Data Science Essentials Paperback – Import, 30 Apr 2015
There is a newer edition of this item:
Customers who viewed this item also viewed
What other items do customers buy after viewing this item?
About the Author
Alberto Boschetti is a data scientist with expertise in signal processing and statistics. He holds a PhD in telecommunication engineering and currently lives and works in London. In his work projects, he faces challenges involving natural language processing (NLP), machine learning, and probabilistic graph models everyday. He is very passionate about his job and he always tries to stay updated on the latest developments in data science technologies by attending meetups, conferences, and other events. Luca Massaron is a data scientist and marketing research director who specializes in multivariate statistical analysis, machine learning, and customer insight, with over a decade of experience in solving real-world problems and generating value for stakeholders by applying reasoning, statistics, data mining, and algorithms. From being a pioneer of web audience analysis in Italy to achieving the rank of a top 10 Kaggler, he has always been passionate about everything regarding data and analysis and about demonstrating the potentiality of data-driven knowledge discovery to both experts and nonexperts. Favoring simplicity over unnecessary sophistication, he believes that a lot can be achieved in data science by understanding its essentials.
Enter your mobile number or email address below and we'll send you a link to download the free Kindle App. Then you can start reading Kindle books on your smartphone, tablet, or computer - no Kindle device required.
To get the free app, enter mobile phone number.
|5 star (0%)|
|4 star (0%)|
|3 star (0%)|
|2 star (0%)|
|1 star (0%)|
Most helpful customer reviews on Amazon.com
In little more than 200 pages it delivers the essential you need to know if you want to do data science and use Python for that (and you should, the authors suggest!). The part of the book I have particularly appreciated is the list of problems you have to face in practice and the proposed solution: loading your data in a fast and easy way by different sources, for instance, or the way to build and tune complex machine learning models for regression and classification problems. Your data can't fit into memory? It happens, as you know. Well there's a paragraph just for that and there are clear and efficient coding examples to solve the problem. I feel safer with this book with so many examples and solutions. Some books I was looking for help in my projects. This one appears to be very useful.
The books begins with a description of how to install Python and various packages needed to run the code. The purpose of these packages is also explained. Different Python distributions are briefly discussed together with their characteristics, so that a reader can select a distribution particularly suitable to his/her needs. As all code examples in the book are run in IPython Notebook, special attention is paid to a short but comprehensive introduction into IPython itself. Data sets used in the book are described too.
After advising on installation of Python and its packages, the book guides readers towards fast and easy data loading from a file, including the case when the entire data set cannot be loaded during one read in the memory and the solution offered is to load it in chunks by using pandas.
Furthermore, answers to the following problems are provided: how to deal with erroneous records, how to treat categorical and text data, what are useful data cleansing and transformation operations implemented in pandas, how to use the optimized data structures - numpy arrays - and what operations on them can be done.
Once data is loaded and converted to a suitable representation, the book then spends a chapter on the general Data Science pipeline that can be implemented with scikit-learn. The pipeline includes dimensionality reduction via either feature extraction or feature selection, outlier detection, predictive modeling (classification and regression), optimization of model's hyper-parameters, and model's performance evaluation. This material creates the holistic view what typical data analysis is comprised of.
The next chapter introduces several popular machine learning algorithms in detail. Among them are linear and logistic regression, Naive Bayes, support vector machines, bagging and boosting ensembles. Special attention is paid to scikit-learn solutions of the 3Vs of big data: namely, volume, velocity and variety. Scalability with volume is solved with incremental learning when at any given moment of time, only a portion (batch) of the entire data fit to the available memory is used to update a model, hence, a model learns incrementally as new batches arrive. To keep up with velocity, scikit-learn offers a number of classification and regression algorithms optimized for speed. Data variety is deal with the help of hashing and sparse matrices. The chapter ends with short examples of doing basic operations of Natural Language Processing with the NLTK package and data clustering.
Final two chapters are devoted to social network analysis with the NetworkX package and data visualization with the matplotlib and pandas packages, respectively.
Although I have both paper and electronic versions of this book, I would advise first to buy the paper version as numerous code is much easier to understand in this format because one can see the entire snapshot at once.