
Journal of Engineering and Applied Sciences

ISSN: Online 1818-7803
ISSN: Print 1816-949x

A Study on How to Improve the Performance of k-mean Data Mining Algorithm in a Parallel Environment

N.G.J. Dias, M.C. Wijegunasekara and R.P.T.H. Gunasekara
Page: 441-446 | Received 21 Sep 2022, Published online: 21 Sep 2022


Abstract

The k-mean algorithm is a widely used clustering algorithm for large datasets, but it has limitations when applied to very large datasets. This study was carried out to enhance the performance of the k-mean data-mining algorithm by using parallel programming methodologies. Two approaches to parallelizing k-mean clustering were compared: k-mean clustering under parallel and non-parallel execution in WEKA, and k-mean clustering in a purpose-built program that uses the Message Passing Interface (MPI) to implement a parallel k-mean algorithm. Firstly, the cluster-building ability of WEKA parallel was compared with that of non-parallel WEKA on very large datasets. To measure the effect of parallelization, the number of machines connected to WEKA parallel was varied, and performance was analyzed for several values of k in each setup. The experiment was done on three real electricity consumption datasets consisting of 80,000, 50,000 and 30,000 data entries, each with 65 attributes. WEKA parallel showed a significant improvement in performance. Furthermore, WEKA parallel can be applied to very large datasets that non-parallel WEKA failed to process. Secondly, the k-mean algorithm was implemented in the C programming language and its performance was compared with that of non-parallel WEKA. In that comparison, the time taken to build clusters was nearly the same for small datasets.
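
The data-parallel pattern that a message-passing version of k-mean commonly follows can be sketched as below. This is a minimal single-process Python illustration, not the authors' C/MPI program. The function names (`nearest`, `local_sums`, `kmeans_step`, `kmeans`) and the toy data are assumptions made for the example. Each chunk stands in for the data held by one process, and the loop that combines partial sums stands in for a reduction across processes.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def local_sums(chunk, centroids):
    """Work of one (hypothetical) worker: per-cluster sums and counts for its chunk."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for point in chunk:
        j = nearest(point, centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += point[d]
    return sums, counts


def kmeans_step(chunks, centroids):
    """One iteration: combine the partial results of all chunks into new centroids."""
    k, dim = len(centroids), len(centroids[0])
    total = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for chunk in chunks:  # with message passing, each chunk would sit on its own process
        s, c = local_sums(chunk, centroids)
        for j in range(k):
            counts[j] += c[j]
            for d in range(dim):
                total[j][d] += s[j][d]
    # A cluster that received no points keeps its previous centroid.
    return [
        tuple(total[j][d] / counts[j] for d in range(dim)) if counts[j] else tuple(centroids[j])
        for j in range(k)
    ]


def kmeans(chunks, centroids, max_iter=100):
    """Repeat the step until the centroids stop changing or max_iter is reached."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        updated = kmeans_step(chunks, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


# Toy data (assumed): four 2-D points split across two chunks.
chunks = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
result = kmeans(chunks, [(0.0, 0.0), (10.0, 10.0)])
```

Only the per-cluster sums and counts are exchanged in this scheme, so the communicated data scales with k and the number of attributes rather than with the number of data entries.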


How to cite this article:

N.G.J. Dias, M.C. Wijegunasekara and R.P.T.H. Gunasekara. A Study on How to Improve the Performance of k-mean Data Mining Algorithm in a Parallel Environment.
DOI: https://doi.org/10.36478/jeasci.2014.441.446
URL: https://www.makhillpublications.co/view-article/1816-949x/jeasci.2014.441.446