Book Recommendation: R in Action

R in Action: Data Analysis and Graphics with R
by Robert I. Kabacoff

   A very good reference for learning statistical methods and graphical visualization with R. It covers the basic but most important topics in data analysis and statistics. It is a practical book with many R code examples that you can easily adapt to your own test cases. I highly recommend reading this book.

   The first part of the book covers how to use R, data sets, working with graphs, data management, simple operations on data, numerical functions (statistical, probability, etc.), and techniques for aggregating and reshaping data. The remaining parts of the book are divided into basic, intermediate, and advanced methods sections.


Basic methods

   The basic methods section covers graphs and plotting, statistical summaries, distributions of variables, correlations, t-tests, and other basic statistical methods. These concepts are vital and should be well understood before performing further analysis on a dataset.
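
   To make two of these basic statistics concrete, here is a minimal sketch in Python (not the book's R code) of the Pearson correlation coefficient and the Welch two-sample t statistic, the unequal-variance form that R's t.test() uses by default. The function names pearson_r and welch_t are hypothetical, chosen only for illustration, and the sketch computes the statistics without p-values.

```python
import math
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def welch_t(x, y):
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    vx = stdev(x) ** 2 / len(x)  # squared standard error of mean(x)
    vy = stdev(y) ** 2 / len(y)  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df
```

   In R, the same quantities come from cor(x, y) and t.test(x, y).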


Intermediate methods

   Regression analysis is a major topic in data science, and it is the main focus of this section. The book describes regression diagnostics and various types of regression models. Analysis of variance, power analysis, resampling, and visualization methods are also covered in this part of the book, along with explanations of statistical tests and bootstrapping methods.
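
   As an illustration of the simplest regression model, the sketch below fits a straight line by ordinary least squares in plain Python, assuming a single predictor. The helper name fit_simple_ols is hypothetical; in R the equivalent fit is lm(y ~ x), and calling plot() on the fitted model draws diagnostic plots. The residuals returned here are the quantities such diagnostics examine.

```python
from statistics import mean

def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residuals (observed minus fitted) are what regression diagnostics inspect.
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    ss_res = sum(e ** 2 for e in residuals)
    ss_tot = sum((b - my) ** 2 for b in y)
    r_squared = 1 - ss_res / ss_tot  # share of the variance in y explained by x
    return intercept, slope, r_squared, residuals
```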


Advanced methods

   This section explains generalized linear models for regression analysis, principal components and factor analysis, advanced graphics, and other methods.

  Key definitions:  


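   Power analysis helps determine the sample size needed to detect an effect of a given size at a chosen significance level with a desired power (the probability of detecting the effect if it really exists). The pwr package provides power analysis functions.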
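
   A minimal sketch of a sample-size calculation, assuming a two-sided comparison of two group means with equal group sizes and the normal approximation n = 2 * ((z_(1-alpha/2) + z_power) / d)^2 per group, where d is Cohen's effect size. The function n_per_group is hypothetical and written in Python; pwr.t.test() in the pwr package solves the same problem with the t distribution, so it returns a slightly larger n.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample test of means.

    effect_size is Cohen's d: (mean1 - mean2) / common standard deviation.
    Uses the normal approximation, so it runs slightly below t-based results.
    """
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value (about 1.96 for alpha = 0.05)
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    n = 2 * ((z_alpha + z_power) / effect_size) ** 2
    return math.ceil(n)                 # round up to whole subjects
```

   For a medium effect (d = 0.5) at alpha = 0.05 and 80% power, n_per_group(0.5) returns 63 per group; smaller effects or higher power require more subjects.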

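   Permutation (or randomization) tests build the sampling distribution of a test statistic by repeatedly reshuffling the observed data among the groups, that is, by resampling without replacement. The coin and lmPerm packages provide permutation test functions.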
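
   The idea can be sketched in a few lines of Python for a two-sample difference in means. This is a Monte Carlo approximation that samples random relabelings rather than enumerating all of them; the function name, the fixed seed, and the add-one correction to the p-value are assumptions of this sketch, not the behavior of coin or lmPerm.

```python
import random
from statistics import mean

def permutation_test(x, y, n_perm=10_000, seed=1):
    """Two-sided Monte Carlo permutation test for a difference in means."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling: resampling WITHOUT replacement
        diff = abs(mean(pooled[:n_x]) - mean(pooled[n_x:]))
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (n_perm + 1)
```

   For small samples, enumerating every possible relabeling gives an exact p-value; random relabeling is the usual approximation when that is too expensive.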

   Bootstrapping generates the sampling distribution of a statistic by repeatedly resampling the original observed data with replacement. The boot package provides the bootstrapping function boot() in R.
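
   A minimal Python sketch of the percentile bootstrap, assuming the statistic of interest is the mean and a 95% interval is wanted. random.Random.choices draws n observations with replacement, which is the key difference from the reshuffling in a permutation test. The function name bootstrap_ci and its defaults are hypothetical; in R, boot() generates the replicates and boot.ci() computes intervals from them.

```python
import random
from statistics import mean

def bootstrap_ci(data, stat=mean, n_boot=5000, level=0.95, seed=1):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(data)
    # Each replicate recomputes the statistic on n draws WITH replacement.
    reps = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    # Take the lower and upper (1 - level) / 2 tails of the replicates.
    lo_idx = round((1 - level) / 2 * n_boot)
    hi_idx = round((1 + level) / 2 * n_boot) - 1
    return reps[lo_idx], reps[hi_idx]
```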

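   Principal Component Analysis (PCA) is a data-reduction method that replaces a large set of correlated variables with a smaller set of uncorrelated variables, the principal components, that retain as much of the original variance as possible.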
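
   For two variables, PCA reduces to the eigen-decomposition of a 2 x 2 covariance matrix, which has a closed form. The Python sketch below (the function name pca_2d is hypothetical) finds the first principal component, the share of variance it explains, and the component scores. It is limited to two variables for brevity; base R's prcomp() handles any number of variables. The sign of a principal component is arbitrary, so other software may report the direction flipped.

```python
import math
from statistics import mean

def pca_2d(x, y):
    """First principal component of two variables via the 2 x 2 covariance matrix."""
    n = len(x)
    mx, my = mean(x), mean(y)
    a = sum((u - mx) ** 2 for u in x) / (n - 1)                    # var(x)
    c = sum((v - my) ** 2 for v in y) / (n - 1)                    # var(y)
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / (n - 1)   # cov(x, y)
    # Eigenvalues of [[a, b], [b, c]]: component variances, largest first.
    half_trace = (a + c) / 2
    disc = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + disc, half_trace - disc
    # Eigenvector for lam1; fall back to an axis when the variables are uncorrelated.
    if b != 0:
        vx, vy = lam1 - c, b
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    pc1 = (vx / norm, vy / norm)
    total = lam1 + lam2
    explained = lam1 / total if total > 0 else 0.0  # proportion of variance on PC1
    # Scores: centered observations projected onto the first component.
    scores = [(u - mx) * pc1[0] + (v - my) * pc1[1] for u, v in zip(x, y)]
    return pc1, explained, scores
```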

   Exploratory Factor Analysis (EFA) is a statistical method that reduces a set of observed variables to a smaller number of underlying factors, uncovering the latent structure that explains the correlations among the variables.

