Added the topic Evaluating the clustering (#35855)

* Added the topic Evaluating the clustering 

How can we evaluate clustering algorithms based on an analysis of our results?

* Fixed grammar and put in second person
Vyshak Puthusseri
2019-07-20 02:57:13 +05:30
committed by Quincy Larson
parent d59f422be3
commit 7b08f7b4ed


@ -117,6 +117,26 @@ Each iteration of the EM algorithm consists of two processes: The E-step, and th
Convergence is assured since the algorithm is guaranteed not to decrease the likelihood at any iteration, although it may converge to a local maximum rather than the global one.
## Evaluation of Clustering:
The million-dollar question in any data mining task is how well your algorithm performs. How can you tell whether the clustering algorithm you have chosen for your dataset gives meaningful results? In this phase, your main aim is to assess whether cluster analysis is feasible on your particular dataset and how good the resulting clusters are. The major tasks in evaluating clusters are as follows:
### 1. Assessing clustering tendency:
You analyse whether a nonrandom structure exists in your data. A clustering algorithm will return clusters for any dataset you give it, even one with no real structure, so the clusters it mines may be misleading. Cluster analysis on a dataset is meaningful only when such a nonrandom structure exists.
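One common way to test clustering tendency is the Hopkins statistic, which compares nearest-neighbour distances from random points with nearest-neighbour distances between real data points. The sketch below is a minimal NumPy version under a few assumptions: it samples about 10% of the points by default, draws the random comparison points uniformly from the data's bounding box, and uses the convention in which H near 0.5 suggests uniformly distributed data and H near 1 suggests clusters (some texts use the complementary ratio, where values near 0 indicate clusters). The function name `hopkins_statistic` and its parameters are illustrative, not taken from a particular library.

```python
import numpy as np

def hopkins_statistic(X, sample_size=None, seed=0):
    """Hopkins statistic H for an (n, d) array X (illustrative sketch).

    Convention assumed here: H = sum(u) / (sum(u) + sum(w)), where
      u_i = distance from a uniform random point to its nearest data point,
      w_i = distance from a sampled data point to its nearest *other* data point.
    H near 0.5 suggests uniformly random data; H near 1 suggests clusters.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    m = sample_size or max(1, n // 10)  # assumed default: about 10% of the points
    rng = np.random.default_rng(seed)

    # m random points drawn uniformly from the bounding box of the data
    lo, hi = X.min(axis=0), X.max(axis=0)
    U = rng.uniform(lo, hi, size=(m, d))

    # m distinct real data points, sampled without replacement
    idx = rng.choice(n, size=m, replace=False)
    S = X[idx]

    # u: nearest-neighbour distance from each random point to the data
    u = np.linalg.norm(U[:, None, :] - X[None, :, :], axis=2).min(axis=1)

    # w: nearest-neighbour distance from each sampled point to the rest of the data
    dist_s = np.linalg.norm(S[:, None, :] - X[None, :, :], axis=2)
    dist_s[np.arange(m), idx] = np.inf  # ignore each sampled point's distance to itself
    w = dist_s.min(axis=1)

    return u.sum() / (u.sum() + w.sum())

# Usage: three tight blobs versus uniformly scattered points
rng = np.random.default_rng(42)
blobs = np.vstack([rng.normal(c, 0.3, size=(100, 2)) for c in [(0, 0), (5, 5), (0, 5)]])
uniform = np.random.default_rng(7).uniform(0, 5, size=(300, 2))
print(hopkins_statistic(blobs))    # expected to be well above 0.5
print(hopkins_statistic(uniform))  # expected to be close to 0.5
```

Because the random points and the sampled points are drawn by a random generator, the value varies a little from run to run; fixing `seed` makes it reproducible.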
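### 2. Determining the number of clusters in the dataset:
Your algorithm should generate only as many clusters as the particular problem requires. This matters because algorithms such as k-means take the number of clusters as an input: too few clusters merge distinct groups, while too many split natural groups apart.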
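A common heuristic for choosing the number of clusters is the elbow method: run k-means for several values of k, track the within-cluster sum of squared errors (SSE), and pick the k after which adding more clusters stops reducing the SSE much. The sketch below assumes NumPy, uses a small hand-written k-means with k-means++ seeding and several restarts, and locates the elbow as the largest second difference of the SSE curve; that last rule is just one simple assumption among many ways to read the curve. All function names are hypothetical.

```python
import numpy as np

def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: spread the initial centroids out over the data."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # squared distance from each point to its nearest already-chosen centroid
        d2 = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        # pick the next centroid with probability proportional to that distance
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)

def kmeans_sse(X, k, n_init=10, n_iter=100, seed=0):
    """Run Lloyd's k-means n_init times and return the lowest SSE found."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    best = np.inf
    for _ in range(n_init):
        centers = kmeans_pp_init(X, k, rng)
        for _ in range(n_iter):
            # assignment step: index of the nearest centroid for every point
            labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
            # update step: move each centroid to the mean of its points (keep it if empty)
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        # SSE: squared distance from each point to its closest final centroid
        sse = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).min(axis=1).sum()
        best = min(best, sse)
    return best

def elbow_k(X, k_max=6):
    """Return (k at the sharpest bend of the SSE curve, SSE values for k = 1..k_max)."""
    sse = [kmeans_sse(X, k) for k in range(1, k_max + 1)]
    # sse[i] belongs to k = i + 1; the second difference is defined for k = 2..k_max-1
    bends = [sse[i - 1] - 2 * sse[i] + sse[i + 1] for i in range(1, k_max - 1)]
    return int(np.argmax(bends)) + 2, sse

# Usage: three well-separated blobs
rng = np.random.default_rng(42)
blobs = np.vstack([rng.normal(c, 0.3, size=(100, 2)) for c in [(0, 0), (5, 5), (0, 5)]])
k, sse = elbow_k(blobs)
print(k)  # the sharpest bend is expected at k = 3 for these blobs
```

On real data the elbow is often much less pronounced than in this toy example, so it is worth plotting the SSE values and checking the chosen k against domain knowledge.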
### 3. Measuring the clustering quality:
In this task, you answer the question: "How good is the clustering generated by a method, and how can you compare the clusterings generated by different methods?"
Methods for this task fall into two categories, depending on whether a ground truth is available. The term ground truth refers to the ideal clustering, which is often built by human experts.
#### > Extrinsic method:
If the ground truth is available, you compare it with the clusters that your algorithm generates. This is also known as the supervised method of evaluation.
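Two simple extrinsic scores are purity and the Rand index. Purity credits each cluster with the most frequent true class among its members; the Rand index is the fraction of point pairs on which the clustering and the ground truth agree (both put the pair together, or both keep it apart). The plain-Python sketch below is illustrative and the function names are hypothetical. Both scores ignore what the cluster IDs are called, so a clustering that matches the ground truth up to relabeling scores 1.0.

```python
from collections import Counter
from itertools import combinations

def purity(truth, pred):
    """Share of points that belong to the majority true class of their cluster."""
    clusters = {}
    for t, p in zip(truth, pred):
        clusters.setdefault(p, []).append(t)
    # for each predicted cluster, count the members of its most common true class
    majority = sum(Counter(members).most_common(1)[0][1] for members in clusters.values())
    return majority / len(truth)

def rand_index(truth, pred):
    """Fraction of point pairs on which two labelings agree (same/same or different/different)."""
    if len(truth) != len(pred):
        raise ValueError("labelings must have the same length")
    agree = total = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = pred[i] == pred[j]
        agree += same_truth == same_pred
        total += 1
    return agree / total

# Usage: ground truth versus a clustering that misplaces one point
truth = [0, 0, 0, 1, 1, 1]
pred = [0, 0, 1, 1, 1, 1]
print(purity(truth, pred))      # 5/6, about 0.833
print(rand_index(truth, pred))  # 10/15, about 0.667
```

Purity alone can be misleading: putting every point in its own cluster gives a purity of 1.0. That is why pair-based scores such as the Rand index, or its chance-corrected variant (available in scikit-learn as `sklearn.metrics.adjusted_rand_score`), are often reported alongside it.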
#### > Intrinsic method:
If the ground truth is not available, you evaluate the goodness of the clustering by considering how well separated your clusters are from one another and how compact each cluster is. This is also known as the unsupervised method of evaluation.
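A widely used intrinsic score is the silhouette coefficient. For each point, a is its mean distance to the other members of its own cluster and b is its smallest mean distance to the members of any other cluster; the point's silhouette is (b - a) / max(a, b), which ranges from -1 to 1, and the overall score is the average over all points. Values near 1 indicate compact, well-separated clusters, while negative values suggest points assigned to the wrong cluster. The NumPy sketch below uses brute-force pairwise distances (fine for small datasets) and, as an assumption, scores points in single-member clusters as 0. The function is hand-written for illustration; scikit-learn ships a production version as `sklearn.metrics.silhouette_score`.

```python
import numpy as np

def silhouette_score(X, labels):
    """Mean silhouette coefficient of a labeling (illustrative brute-force sketch)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    unique = np.unique(labels)
    if len(unique) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    # full pairwise Euclidean distance matrix, O(n^2) memory
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # assumption: a point alone in its cluster scores 0
        # a: mean distance to the other members of i's own cluster (D[i, i] is 0)
        a = D[i, same].sum() / (same.sum() - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(D[i, labels == c].mean() for c in unique if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

# Usage: the same four points under a sensible and a poor labeling
X = [[0, 0], [0, 1], [10, 0], [10, 1]]
print(silhouette_score(X, [0, 0, 1, 1]))  # about 0.90: compact, well-separated clusters
print(silhouette_score(X, [0, 1, 0, 1]))  # negative: each point is grouped with a far-away partner
```

Because the score needs no ground truth, it can also be used to compare clusterings produced with different numbers of clusters or by different algorithms on the same data.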
## More Information:
<!-- Please add any articles you think might be helpful to read before writing the article -->
* [Wikipedia Cluster Analysis article](https://en.wikipedia.org/wiki/Cluster_analysis)