Book recommendation: Practical Data Science with R

Practical Data Science with R by Nina Zumel and John Mount

   I found this book very helpful and informative for understanding data science with R. As its title states, it is a practical book with many example scripts in R. The book is an excellent resource for grasping data models and using them in R.

   Part one introduces data science. It starts with the data science process and how to handle data in R. Exploring the data is very important before applying any model. The book explains how to get to know the data through summaries and visualizations with different graphs, and how to understand the relationships between the variables in a given dataset. Data preparation steps such as checking for and treating missing values, normalizing, and scaling data are also explained.
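
   To give a rough idea of what these preparation steps look like, here is a minimal sketch of my own. It is written in Python with only the standard library, not taken from the book (whose examples are in R), and the function names and the `ages` data are made up for illustration. It assumes missing values are marked as None and that filling them with the mean is acceptable for the column.

```python
from statistics import mean, pstdev

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (z-scores).

    Assumes the values are not all identical, otherwise sigma is 0.
    """
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical column with two missing entries.
ages = [23, None, 35, 41, None, 29]
filled = impute_mean(ages)      # missing entries become the observed mean
scaled = standardize(filled)    # centered and scaled version of the column
```

   Mean imputation is only one of several possible treatments; which one fits depends on why the values are missing.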

   Part two discusses modeling methods. To choose the right model, the problem should first be mapped to the kind of analysis it needs. Classification, regression, clustering, and other machine learning methods are explained. Evaluation and validation of models are covered. Prediction with decision trees, k-nearest neighbors, and naive Bayes models is described. Before applying more complicated models, linear regression (modeling quantities) and logistic regression (modeling probabilities) are suitable techniques for initial tests.
   Unsupervised learning is exploratory analysis that finds similar groups in the data and relationships between its attributes. Clustering and association rule mining are explained under this topic. More advanced methods such as generalized additive models (GAMs), kernel functions, and support vector machines (SVMs) are also described. The book states that these methods are meant for specific cases, to fix issues that remain after the common modeling methods have been tried.
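
   As an illustration of the clustering idea mentioned above, here is a small k-means sketch on one-dimensional data. Again, this is my own Python example and not code from the book; the function name `kmeans_1d`, the starting centers, and the sample points are all assumptions chosen to keep it short.

```python
def kmeans_1d(points, centers, iterations=10):
    """Cluster 1-D points with plain k-means, starting from the given centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its previous center.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical data with two visibly separate groups.
points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
```

   With well separated groups like these, the two centers settle near the middle of each group after the first pass. Real data usually needs several random starts and a considered choice of the number of clusters.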
     
   Part three covers delivering results. Organizing source code and documentation is an essential part of software development. This part describes the knitr package, the use of Git for version control, and model deployment. Presenting the model to end users is also well explained.
   In the appendices, we can find some additional material, such as working with R, basic statistical concepts, and descriptions of further big data tools.

   In short, I give this book five stars and recommend it to every data scientist.

Find it here


