TY  - JOUR
T1  - Canonical Data Model for Text Document Clustering
AU  - Sakira Kamaruddin, Siti
AU  - Yusof, Yuhanis
AU  - Kabir Ahmad, Farzana
AU  - Ahmed Taiye, Mohammed
JO  - Journal of Engineering and Applied Sciences
VL  - 12
IS  - 21
SP  - 5554
EP  - 5559
PY  - 2017
DA  - 2001/08/19
SN  - 1816-949x
DO  - jeasci.2017.5554.5559
UR  - https://makhillpublications.co/view-article.php?doi=jeasci.2017.5554.5559
KW  - Canonical Data Model
KW  - text clustering
KW  - latent semantic analysis
KW  - text dimensionality reduction
KW  - summarization
KW  - multi-documents
AB  - The abundance of text data has been witnessed with the growth of the web and other text repositories.
There is an important need to provide improved mechanisms to effectively represent and retrieve text data. This
study advocates the construction of Canonical Data Models for mapping contents of multi-documents into a
few general models that can represent the corpus. However, constructing a Canonical Data Model for text
involves non-trivial text mining techniques prior to the actual construction process. Furthermore, constructing
Canonical Data Models for all terms in a set of documents will be costly and will not reduce the sparsity
problem that is associated with text document processing. To solve this problem, we propose a
two-tier dimensionality reduction step adopting commonly used feature extraction and feature selection methods.
The reduced features are then used to construct a Canonical Data Model. A Canonical Data Model for text
documents can be used as a general model that has the potential to act as a reference model for text comparison
in a wide variety of text mining tasks such as text clustering, text classification, text summarization and text
deviation detection. Experimental results reveal that the proposed approach produces better results compared
to methods without a Canonical Data Model.
ER  -