There are many ways to detect outliers in a dataset. In this post, we'll learn how to detect outliers with the boxplot.stats() function in R. You can find more information about this function by running the ?boxplot.stats command in R.
We'll start by generating a sample dataset for this tutorial.
> m <- rnorm(100)
> head(m, 20)
 [1]  1.07766774 -0.26540253  0.74722125 -1.46459965
 [5]  0.56082679  0.24564791 -0.53357662 -0.62622695
 [9]  0.77265093 -2.99791152  0.76610916  0.06503116
[13]  0.60389088  0.87890446  1.73781867  1.56105272
[17]  0.61592582 -0.86839875  1.51704497  1.58302684
Next, we'll get the statistics of m with the boxplot.stats() function.
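Because rnorm() draws random numbers, the values will differ on every run. A minimal sketch for making the sample repeatable, assuming an arbitrary seed of 123 (the seed is not from the original session, so the numbers will not match the output shown in this post):

```r
# Fix the random number generator state so the sample is repeatable.
# The seed value 123 is an arbitrary choice for illustration.
set.seed(123)
m1 <- rnorm(100)

# Resetting the same seed reproduces exactly the same draws.
set.seed(123)
m2 <- rnorm(100)

identical(m1, m2)
```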
> st <- boxplot.stats(m)
> st
$stats
[1] -2.0254878 -0.6059728 0.1234828 0.7693800
[5] 2.3504506
$n
[1] 100
$conf
[1] -0.09382298 0.34078853
$out
[1] -2.997912 -2.673720 -2.981618
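In this output, $stats holds the lower whisker end, the lower hinge, the median, the upper hinge, and the upper whisker end; $n is the number of non-missing values; $conf gives the notch limits around the median; and $out lists the points that lie beyond the whiskers. With the default coef = 1.5, a point is flagged when it lies more than 1.5 times the hinge spread below the lower hinge or above the upper hinge. The sketch below reproduces that rule by hand with fivenum(). The variable names (x, lower_fence, upper_fence, manual_out) and the seed are illustrative assumptions, not part of the original session.

```r
set.seed(123)                 # arbitrary seed, for a repeatable sample
x <- rnorm(100)

# Tukey's five numbers: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(x)
spread <- five[4] - five[2]   # distance between the hinges

coef <- 1.5                   # default multiplier in boxplot.stats()
lower_fence <- five[2] - coef * spread
upper_fence <- five[4] + coef * spread

# Points outside the fences are the outliers
manual_out <- x[x < lower_fence | x > upper_fence]

# Compare with the values reported by boxplot.stats()
isTRUE(all.equal(manual_out, boxplot.stats(x)$out))
```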
The outliers are stored in the out component of the st list. We'll find the indexes of those elements in m.
> st$out
[1] -2.997912 -2.673720 -2.981618
> out_index <- which(m %in% st$out)
> m[out_index]
[1] -2.997912 -2.673720 -2.981618
> out_index
[1]  10  85 100
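Matching on values with %in% is safe here because any element equal to a flagged value is itself beyond the whiskers. The lookup can be wrapped in a small helper, sketched below. The function name outlier_index and the test vector v are hypothetical; coef is the boxplot.stats() argument that controls how far the whiskers extend (1.5 by default).

```r
# Return the positions of the outliers in x, as judged by boxplot.stats().
# 'outlier_index' is a hypothetical helper name used for illustration.
outlier_index <- function(x, coef = 1.5) {
  out <- boxplot.stats(x, coef = coef)$out
  which(x %in% out)
}

v <- c(1, 2, 3, 4, 5, 100)

outlier_index(v)               # 6: the value 100 lies beyond the upper whisker
outlier_index(v, coef = 100)   # integer(0): whiskers wide enough to cover 100
```

A larger coef widens the whiskers and flags fewer points; a smaller one flags more.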
Finally, we'll plot the m vector as a line and highlight the outliers in red.
> plot(m, type = "l", col = "blue")
> points(x = out_index, y = m[out_index], pch = 19, col = "red")
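As a cross-check, the boxplot() function relies on boxplot.stats() for its computations, so it reports the same outliers. A minimal sketch, assuming a small hand-made vector v (not from the original post); plot = FALSE returns the statistics without drawing anything:

```r
v <- c(1, 2, 3, 4, 5, 100)

# Statistics computed by boxplot() without opening a plot
bp <- boxplot(v, plot = FALSE)

# Outliers from both functions should agree
bp$out
boxplot.stats(v)$out
isTRUE(all.equal(bp$out, boxplot.stats(v)$out))
```

Calling boxplot(v) without plot = FALSE draws the box plot, with the outliers shown as individual points beyond the whiskers.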
In this post, we have learned how to detect outliers with the boxplot.stats() function in R. Thank you for reading!