TY  - JOUR
T1  - A Study on How to Improve the Performance of k-mean Data Mining Algorithm in a Parallel Environment
AU  - Dias, N.G.J.
AU  - Wijegunasekara, M.C.
AU  - Gunasekara, R.P.T.H.
JO  - Journal of Engineering and Applied Sciences
VL  - 9
IS  - 10
SP  - 441
EP  - 446
PY  - 2014
DA  - 2001/08/19
SN  - 1816-949x
DO  - jeasci.2014.441.446
UR  - https://makhillpublications.co/view-article.php?doi=jeasci.2014.441.446
KW  - programming language
KW  - clusters
KW  - Clustering algorithm
KW  - datasets
KW  - attributes
AB  - The k-mean algorithm is a widely used clustering algorithm for large datasets, but it has limitations when applied to very large datasets. This study aims to enhance the performance of the k-mean data-mining algorithm by using parallel programming methodologies. Two main approaches to parallelizing the k-mean clustering algorithm were compared: k-mean clustering with parallel and non-parallel execution in WEKA, and k-mean clustering in a purpose-built program that uses the Message Passing Interface (MPI). First, the cluster-building ability of WEKA parallel relative to non-parallel WEKA was investigated for very large datasets. To assess the effect of parallelization, the number of machines connected to WEKA parallel was varied, and performance was analyzed for several k values in each setup. The experiments used three real electricity consumption datasets consisting of 80,000, 50,000 and 30,000 data entries, with 65 attributes. WEKA parallel showed a significant improvement in performance. Furthermore, WEKA parallel can be applied to very large datasets that failed to run in WEKA. Second, the k-mean algorithm was implemented in the C programming language and its performance was compared with that of non-parallel WEKA. The time taken to build clusters was almost the same for small datasets.
ER  - 