diff --git a/guide/english/machine-learning/clustering-algorithms/index.md b/guide/english/machine-learning/clustering-algorithms/index.md
index 67395c482f..e2c3a50b00 100644
--- a/guide/english/machine-learning/clustering-algorithms/index.md
+++ b/guide/english/machine-learning/clustering-algorithms/index.md
@@ -117,6 +117,26 @@ Each iteration of the EM algorithm consists of two processes: The E-step, and th
 Convergence is assured since the algorithm is guaranteed to increase the likelihood at each iteration.
+## Evaluation of Clustering:
+The million-dollar question in any data mining task is how well your algorithm performs. How can you tell whether the clustering algorithm you chose for your dataset gives a correct result? Your main aim in this phase is to assess the feasibility of cluster analysis on your particular dataset and the quality of the clusters generated. The major tasks in evaluating clusters are as follows:
+
+### 1. Assessing clustering tendency:
+You analyse whether a nonrandom structure exists in your data. A clustering algorithm returns clusters for any dataset you give it, even one with no real structure, so the clusters it finds may be misleading. Cluster analysis on a dataset is meaningful only when the data has a nonrandom structure.
+
+### 2. Determining the number of clusters in the data set:
+Your algorithm should generate only as many clusters as the particular problem requires.
+
+### 3. Measuring the clustering quality:
+In this task, you answer the question: "How good is the clustering generated by a method, and how can you compare the clusterings generated by different methods?"
+
+The methods for this fall into two categories, depending on whether the ground truth is available. The term ground truth refers to the ideal clustering, which is often built by human experts.
+
+#### > Extrinsic method:
+If the ground truth is available, you compare it with the clusters that your algorithm generates. This is also known as the supervised method of evaluation.
+
+#### > Intrinsic method:
+If the ground truth is not available, you evaluate the goodness of the clustering by considering how well the clusters are separated. This is also known as the unsupervised method of evaluation.
+
 ## More Information:
 * [Wikipedia Cluster Analysis article](https://en.wikipedia.org/wiki/Cluster_analysis)