International Conference on Advances in Computer Science and Electronics Engineering - CSEE 2012
Author(s): M. J. YEOLA
Document clustering is the unsupervised grouping of text documents into meaningful groups, usually corresponding to topics in the document collection. It is one way to organize information without prior knowledge of how the documents should be classified. The well-known K-means clustering algorithm requires users to specify the number of clusters in advance. However, when this pre-specified number is changed, the precision of the results changes as well. To address this problem, this paper proposes a new clustering algorithm based on the Kea keyphrase extraction algorithm. As in K-means, documents are grouped into several clusters, but the number of clusters is determined automatically from the similarities between the documents and the extracted keyphrases. The algorithm also computes the F-measure from precision and recall, which indicates that it yields better clusters.
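As an illustration of the evaluation step, the F-measure of a single cluster against a single reference topic can be sketched as below. This is a minimal sketch, assuming the standard balanced F-measure (the harmonic mean of precision and recall); the abstract does not state which variant the paper uses, and the function name `f_measure` and the document identifiers are hypothetical.

```python
def f_measure(cluster, relevant):
    """Return (precision, recall, F) for one cluster against one reference class.

    Assumption: balanced F-measure, F = 2PR / (P + R). The paper may weight
    precision and recall differently.
    """
    cluster, relevant = set(cluster), set(relevant)
    overlap = len(cluster & relevant)
    if overlap == 0:
        # No shared documents: precision, recall, and F are all zero.
        return 0.0, 0.0, 0.0
    precision = overlap / len(cluster)   # share of the cluster that is on-topic
    recall = overlap / len(relevant)     # share of the topic that was recovered
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


# Hypothetical example: a 4-document cluster compared with a 6-document topic.
p, r, f = f_measure({"d1", "d2", "d3", "d4"},
                    {"d1", "d2", "d3", "d5", "d6", "d7"})
print(p, r, f)
```

Here three of the four clustered documents belong to the topic (precision 0.75) and three of the six topic documents were recovered (recall 0.5), so a higher F-measure reflects clusters that are both purer and more complete.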