DataTechNotes: Clustering Examples with R

Data clustering is an unsupervised learning method for exploring data: it finds similar and dissimilar parts of a dataset by grouping and subsetting its elements. It is a useful technique for getting a first look at the content of the data and choosing an appropriate analysis method accordingly.

Clustering groups a set of elements into clusters, each of which contains multiple nearby elements. Elements that belong to different clusters are dissimilar to each other in some way. The graph below shows clustered data: the given data is grouped into 3 clusters, and the center point of each cluster is marked.

Several clustering methods are available. Centroid and hierarchical clustering were covered in my previous posts; you can check them for more information.

Center-based (centroid) clustering calculates distances between the elements of a dataset and separates them into clusters based on those distances. The k-means algorithm is an example of this method.
Hierarchical clustering builds groups of similar elements from the bottom up, starting with individual elements and merging them step by step. For more information, refer to the post on the hierarchical method.
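
As a rough illustration of the two methods above, here is a minimal sketch using the base R functions kmeans(), hclust(), dist(), and cutree(). The simulated data, the three assumed group centers, the choice of k = 3, and the "complete" linkage are assumptions made for this example only; they are not taken from the earlier posts.

```r
# Simulated 2-D data (an assumption for illustration):
# three groups of 30 points scattered around hand-picked centers.
set.seed(42)
true_centers <- matrix(c(0, 0,
                         5, 5,
                         0, 5), ncol = 2, byrow = TRUE)
x <- do.call(rbind, lapply(1:3, function(i) {
  cbind(rnorm(30, mean = true_centers[i, 1], sd = 0.5),
        rnorm(30, mean = true_centers[i, 2], sd = 0.5))
}))

# Centroid clustering: k-means with k = 3 and several random starts.
km <- kmeans(x, centers = 3, nstart = 10)
km$centers          # estimated center point of each cluster
table(km$cluster)   # number of elements assigned to each cluster

# Hierarchical (bottom-up) clustering on Euclidean distances.
hc <- hclust(dist(x, method = "euclidean"), method = "complete")
hc_groups <- cutree(hc, k = 3)   # cut the tree into 3 groups
table(hc_groups)
```

Calling plot(x, col = km$cluster) followed by points(km$centers, pch = 8, cex = 2) draws the clusters with their centers, and plot(hc) draws the dendrogram of the hierarchical result.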

Distance calculation is a key concept in data clustering. There are many distance measures, such as Euclidean, Manhattan, Canberra, and Hamming. Distance measures like these underlie both k-means and hierarchical clustering.
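
To make the distance measures above concrete, the following sketch computes them for two small vectors. The vectors are made-up values for illustration. Euclidean, Manhattan, and Canberra are computed with the base R dist() function; dist() has no Hamming option, so the Hamming distance is computed by hand here as the count of differing positions.

```r
# Two example vectors (made-up values), stacked as rows of a matrix.
a <- c(1, 2, 3)
b <- c(4, 6, 3)
m <- rbind(a, b)

# dist() returns the pairwise distances between the rows of m.
d_euclidean <- dist(m, method = "euclidean")  # sqrt(3^2 + 4^2 + 0^2) = 5
d_manhattan <- dist(m, method = "manhattan")  # 3 + 4 + 0 = 7
d_canberra  <- dist(m, method = "canberra")   # 3/5 + 4/8 + 0/6 = 1.1

# Hamming distance: number of positions where two sequences differ.
u <- c("a", "b", "c", "d")
v <- c("a", "x", "c", "y")
d_hamming <- sum(u != v)                      # positions 2 and 4 differ: 2

print(d_euclidean)
print(d_manhattan)
print(d_canberra)
print(d_hamming)
```

A distance object produced by dist() can be passed directly to hclust(), so changing the method argument is one way to try a different measure in hierarchical clustering.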
