Asian Journal of Information Technology

ISSN: Online 1993-5994
ISSN: Print 1682-3915

The Use of Hartigan Index for Initializing K-Means++ in Detecting Similar Texts of Clustered Documents as a Plagiarism Indicator

Diana Purwitasari, I. Wayan Surya Priantara, Putu Yuwono Kusmawan, Umi Laili Yuhana and Daniel Oranova Siahaan
Page: 341-347 | Received 21 Sep 2022, Published online: 21 Sep 2022

Abstract

Plagiarism is increasingly alarming, especially in the field of education. Many written works contain passages copied from other people’s works. Similar sentences, as an indicator of plagiarism, can be detected with Winnowing, an n-gram based hashing algorithm. Winnowing generates a document fingerprint by converting the text of a document into a collection of hash values. Similar fingerprints between documents indicate similar texts and hence possible plagiarism. Plagiarism usually occurs among documents with similar topics, so documents with similar topics should be clustered before detection. K-means++ is a clustering algorithm that requires the number of clusters as input; here the Hartigan index is used to recommend that number. After the documents are clustered, a document fingerprint is first compared with the cluster fingerprints rather than with the fingerprints of individual documents. A second comparison is then made only against the documents that are members of the closest cluster selected in the first comparison.
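
As an illustration of the fingerprinting step, the following is a minimal Python sketch of Winnowing, not the authors' implementation. The k-gram length `k`, the window size `w`, the use of CRC32 as the hash function, the text normalization and the Jaccard-style similarity score are all assumptions chosen for this example; the abstract does not specify them.

```python
import re
import zlib


def kgram_hashes(text, k=5):
    """Hash every k-gram of the normalized text (lowercase, alphanumerics only)."""
    cleaned = re.sub(r"[^a-z0-9]", "", text.lower())
    return [zlib.crc32(cleaned[i:i + k].encode("utf-8"))
            for i in range(len(cleaned) - k + 1)]


def winnow(text, k=5, w=4):
    """Return a fingerprint as a set of (hash, position) pairs.

    In each window of w consecutive k-gram hashes the minimum hash is
    selected; ties are broken by taking the rightmost occurrence.
    """
    hashes = kgram_hashes(text, k)
    if not hashes:
        return set()
    fingerprint = set()
    # If there are fewer than w hashes, treat them as a single window.
    for start in range(max(len(hashes) - w + 1, 1)):
        window = hashes[start:start + w]
        min_val = min(window)
        offset = len(window) - 1 - window[::-1].index(min_val)
        fingerprint.add((min_val, start + offset))  # the set removes repeats
    return fingerprint


def fingerprint_similarity(text_a, text_b, k=5, w=4):
    """Jaccard similarity of the selected hash values (an assumed measure)."""
    hashes_a = {h for h, _ in winnow(text_a, k, w)}
    hashes_b = {h for h, _ in winnow(text_b, k, w)}
    if not hashes_a or not hashes_b:
        return 0.0
    return len(hashes_a & hashes_b) / len(hashes_a | hashes_b)
```

Calling `fingerprint_similarity(doc_a, doc_b)` on two strings returns a value between 0 and 1, where higher values mean that more fingerprint hashes are shared. Larger `k` makes matches stricter, while larger `w` yields smaller fingerprints.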

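The cluster-number recommendation can be sketched in the same hedged way. The Hartigan index for k clusters is commonly defined as H(k) = (W(k) / W(k+1) - 1) * (n - k - 1), where W(k) is the within-cluster sum of squares for k clusters and n is the number of documents. The stopping rule used below (take the smallest k with H(k) <= 10) is the conventional rule of thumb and is an assumption here, as are the function names and the input format; the W(k) values would come from separate K-means++ runs, which are not shown.

```python
def hartigan_index(wcss_k, wcss_k_plus_1, n_docs, k):
    """Hartigan index H(k) from the within-cluster sums of squares
    for k and k + 1 clusters (both assumed to be positive)."""
    return (wcss_k / wcss_k_plus_1 - 1.0) * (n_docs - k - 1)


def recommend_cluster_count(wcss_by_k, n_docs, threshold=10.0):
    """Return the smallest k whose Hartigan index is at or below threshold.

    wcss_by_k maps a cluster count k to the within-cluster sum of squares
    obtained for that k. Returns None if no tested k satisfies the rule.
    """
    for k in sorted(wcss_by_k):
        if k + 1 not in wcss_by_k:
            break  # H(k) needs the result for k + 1 as well
        if hartigan_index(wcss_by_k[k], wcss_by_k[k + 1], n_docs, k) <= threshold:
            return k
    return None
```

For example, with hypothetical values `{1: 1000.0, 2: 400.0, 3: 150.0, 4: 140.0, 5: 135.0}` for 50 documents, adding a fourth cluster barely reduces the within-cluster sum of squares, so the rule recommends 3 clusters.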

How to cite this article:

Diana Purwitasari, I. Wayan Surya Priantara, Putu Yuwono Kusmawan, Umi Laili Yuhana and Daniel Oranova Siahaan. The Use of Hartigan Index for Initializing K-Means++ in Detecting Similar Texts of Clustered Documents as a Plagiarism Indicator.
DOI: https://doi.org/10.36478/ajit.2011.341.347
URL: https://www.makhillpublications.co/view-article/1682-3915/ajit.2011.341.347