Discuss the two types of Hierarchical clustering.

Hierarchical clustering is an unsupervised machine learning algorithm that groups data points into clusters based on their similarity. Rather than producing a single flat partition, it builds a hierarchy of nested clusters, with smaller clusters contained inside larger ones. The two main types of hierarchical clustering are agglomerative (bottom-up) clustering and divisive (top-down) clustering.


Agglomerative clustering

Agglomerative clustering starts with each data point in its own cluster. It then iteratively merges the two most similar clusters until only one cluster remains. Similarity between points is measured with a distance metric, such as the Euclidean distance or the Manhattan distance, and the distance between clusters is defined by a linkage criterion (for example single, complete, average, or Ward linkage).
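
As a concrete illustration, here is a minimal sketch using scikit-learn's AgglomerativeClustering. The height/weight values and the choice of three clusters are assumptions made up for the example.

```python
# A minimal sketch of bottom-up (agglomerative) clustering,
# assuming scikit-learn is available; the data is invented.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Ten 2-D points (height in cm, weight in kg), invented for the example.
X = np.array([[160, 55], [162, 58], [165, 60], [170, 68], [172, 70],
              [175, 75], [180, 80], [182, 85], [185, 88], [190, 95]])

# Merge bottom-up with Euclidean distance and Ward linkage,
# stopping when three clusters remain.
model = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = model.fit_predict(X)
print(labels)  # cluster index assigned to each point
```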

Divisive clustering

Divisive clustering works in the opposite, top-down direction: it starts with all of the data points in a single cluster and iteratively splits a cluster in two until each data point is in its own cluster (or some stopping criterion is met). Dissimilarity between clusters is again measured using a distance metric. Divisive clustering is used less often in practice, partly because choosing the best split is computationally harder than choosing the best merge.
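
Divisive clustering is rarely available off the shelf in mainstream libraries. One common stand-in is bisecting k-means, which repeatedly splits the largest remaining cluster in two; the sketch below uses scikit-learn's KMeans for the splits and illustrates the top-down idea rather than a full divisive algorithm.

```python
# A rough top-down sketch: repeatedly bisect the largest cluster
# with 2-means until the desired number of clusters is reached.
# This is bisecting k-means, a common stand-in for divisive clustering.
import numpy as np
from sklearn.cluster import KMeans

def bisecting_clusters(X, n_clusters):
    clusters = [np.arange(len(X))]  # start with everything in one cluster
    while len(clusters) < n_clusters:
        # Pop the largest remaining cluster...
        i = max(range(len(clusters)), key=lambda j: len(clusters[j]))
        idx = clusters.pop(i)
        # ...and split it in two.
        halves = KMeans(n_clusters=2, n_init=10).fit_predict(X[idx])
        clusters.append(idx[halves == 0])
        clusters.append(idx[halves == 1])
    return clusters

rng = np.random.default_rng(0)
X = rng.random((20, 2))             # toy data
for members in bisecting_clusters(X, 4):
    print(members)                  # indices of the points in each cluster
```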

Which type of hierarchical clustering to use?

The choice between agglomerative and divisive clustering depends on the problem you are trying to solve. Agglomerative clustering builds the hierarchy from individual points upward, so it is well suited to discovering small, local clusters; divisive clustering considers the whole dataset first, so it can be better at capturing large-scale, global structure. In practice, agglomerative clustering is by far the more common choice and is what most libraries implement.

Example of hierarchical clustering

Let's say we have data on the height and weight of 10 people. We can use hierarchical clustering to group these people into clusters based on their similarity in height and weight.

The first step is to compute the distance between every pair of people, for example the Euclidean or Manhattan distance between their (height, weight) vectors.
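
For instance, SciPy's pdist computes all pairwise distances in one call; the four example points below are invented for illustration.

```python
# Pairwise Euclidean distances between (height, weight) vectors.
import numpy as np
from scipy.spatial.distance import pdist, squareform

X = np.array([[160, 55], [162, 58], [175, 75], [190, 95]])  # invented data
d = pdist(X, metric="euclidean")  # condensed vector of pairwise distances
print(squareform(d))              # same distances as a full symmetric matrix
```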

Once we have the distances between all pairs of people, we begin by merging the closest pair into a cluster. In our example, the closest pair is Alice and Bob, at a distance of 0.5, so we merge Alice and Bob into a single cluster.

We then repeat the process, at each step merging the two closest clusters. In our example, the next closest pair is Carol and David, at a distance of 1.0, so they form a second cluster. (How the distance between clusters, rather than individual points, is measured depends on the linkage criterion.)

We continue merging clusters until we are left with only one cluster, which contains all 10 people.
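
This merge process is exactly what an agglomerative linkage routine computes. As a sketch, SciPy's linkage function returns one row per merge, recording which two clusters were joined, at what distance, and how large the new cluster is; the data below is again invented.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

X = np.array([[160, 55], [162, 58], [175, 75], [190, 95]])  # invented data
# Each row of Z: indices of the two clusters merged, the merge
# distance, and the number of points in the new cluster.
Z = linkage(X, method="single", metric="euclidean")
print(Z)
```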

The dendrogram is a graphical representation of the hierarchy of clusters created by hierarchical clustering. It shows how the clusters are nested within each other, with the height of each join indicating the distance at which two clusters were merged; cutting the tree at a chosen height yields a flat clustering with a particular number of clusters.
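
Continuing the sketch above, SciPy can draw the dendrogram directly; the names are the hypothetical people from the worked example.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.array([[160, 55], [162, 58], [175, 75], [190, 95]])  # invented data
Z = linkage(X, method="single")
dendrogram(Z, labels=["Alice", "Bob", "Carol", "David"])  # hypothetical names
plt.ylabel("merge distance")
plt.show()
```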

Advantages of hierarchical clustering

Hierarchical clustering has several advantages over other clustering algorithms:

  • It is relatively easy to understand and implement.
  • It does not require the number of clusters to be specified in advance; a flat clustering can be read off the dendrogram afterwards.
  • It works with any distance or dissimilarity measure, so it can handle clusters of a variety of shapes and sizes.
  • It needs only pairwise distances, not raw features, so data with missing values can be clustered if the distance metric accounts for them.
  • Outliers tend to end up in their own small branches of the dendrogram, which makes them easy to identify.

Disadvantages of hierarchical clustering

Hierarchical clustering also has some disadvantages:

  • It can be computationally expensive: standard agglomerative algorithms require O(n²) memory for the distance matrix and up to O(n³) time, which makes them impractical for very large datasets.
  • It can be difficult to determine the optimal number of clusters, i.e., where to cut the dendrogram.
  • The dendrogram can be difficult to interpret, especially for large datasets.
