TY  - JOUR
T1  - Study of Ontology or Thesaurus Based Document Clustering and Information Retrieval
AU  - Bharathi, G.
AU  - Venkatesan, D.
JO  - Journal of Engineering and Applied Sciences
VL  - 7
IS  - 4
SP  - 342
EP  - 347
PY  - 2012
DA  - 2001/08/19
SN  - 1816-949x
DO  - jeasci.2012.342.347
UR  - https://makhillpublications.co/view-article.php?doi=jeasci.2012.342.347
KW  - Ontology
KW  - thesaurus
KW  - document clustering
KW  - intelligent information retrieval
KW  - semantics
KW  - Wikipedia
KW  - Wordnet
AB  - Document clustering generates clusters from the whole document collection automatically and is used in many fields, including data mining and information retrieval. Clustering text data faces a number of challenges, the most important being the volume of text data, dimensionality, sparsity and complex semantics. These characteristics require clustering techniques that scale to large, high-dimensional data and can handle sparsity and semantics. In the traditional vector space model, the unique words occurring in the document set are used as features. However, because of synonymy and polysemy, such a bag of original words cannot represent the content of a document precisely. Most existing text clustering methods rely only on term strength and document frequency: single terms are used as features for representing the documents and are treated independently, an approach that is easily applied to non-ontological clustering. To overcome these issues, this study surveys recent research on ontology- or thesaurus-based document clustering.
ER  - 