TY  - JOUR
T1  - A Study on How to Improve the Performance of k-mean Data Mining Algorithm in a Parallel Environment
AU  - Dias, N.G.J.
AU  - Wijegunasekara, M.C.
AU  - Gunasekara, R.P.T.H.
JO  - Journal of Engineering and Applied Sciences
VL  - 9
IS  - 10
SP  - 441
EP  - 446
PY  - 2014
DA  - 2001/08/19
SN  - 1816-949x
DO  - jeasci.2014.441.446
UR  - https://makhillpublications.co/view-article.php?doi=jeasci.2014.441.446
KW  - programming language
KW  - clusters
KW  - Clustering algorithm
KW  - datasets
KW  - attributes
AB  - The k-mean algorithm is a widely used clustering algorithm for large datasets, but it has limitations when applied to very large datasets. This study was carried out to enhance the performance of the k-mean data-mining algorithm by using parallel programming methodologies. Two approaches to parallelizing k-mean clustering were compared: k-mean clustering with parallel and non-parallel execution in WEKA, and k-mean clustering with a purpose-built program that uses the Message Passing Interface (MPI). First, the cluster-building ability of WEKA parallel relative to non-parallel WEKA was investigated for very large datasets. To assess the effect of parallelization, the number of machines connected to WEKA parallel was varied, and performance was analyzed for several k values in each setup. The experiments were run on three real electricity consumption datasets consisting of 80,000, 50,000 and 30,000 data entries, with 65 attributes. WEKA parallel showed a significant improvement in performance. Furthermore, WEKA parallel could be applied to very large datasets on which non-parallel WEKA failed to run. Second, the k-mean algorithm was implemented in the C programming language and its performance was compared with that of non-parallel WEKA. The time taken to build clusters was nearly the same for small datasets.
ER  -