TY  - JOUR
T1  - Study of Ontology or Thesaurus Based Document Clustering and Information Retrieval
AU  - Bharathi, G.
AU  - Venkatesan, D.
JO  - Journal of Engineering and Applied Sciences
VL  - 7
IS  - 4
SP  - 342
EP  - 347
PY  - 2012
DA  - 2001/08/19
SN  - 1816-949X
DO  - jeasci.2012.342.347
UR  - https://makhillpublications.co/view-article.php?doi=jeasci.2012.342.347
KW  - Ontology
KW  - thesaurus
KW  - document clustering
KW  - intelligent information retrieval
KW  - semantics
KW  - Wikipedia
KW  - WordNet
AB  - Document clustering automatically generates clusters from an entire document collection and is used in many fields, including data mining and information retrieval. Clustering text data poses a number of new challenges, the most important of which are the volume of text data, its high dimensionality, its sparsity and its complex semantics. These characteristics require clustering techniques that scale to large, high-dimensional data and that can handle sparsity and semantics. In the traditional vector space model, the unique words occurring in the document set are used as features. However, because of synonymy and polysemy, such a bag of original words cannot represent the content of a document precisely. Most existing text clustering methods depend only on term strength and document frequency: single terms are used as features to represent the documents and are treated independently of one another, an approach that is easily applied to non-ontological clustering. To overcome these issues, this study surveys recent research on ontology- or thesaurus-based document clustering.
ER  -