Analysis and Comparison of Document Clustering Quality Using the K-Means and K-Medians Methods

Abstract

Analyzing a large set of documents is not an easy task. The common stages are document filtering, document selection, and document clustering. Clustering is a data mining technique for finding groups in data that has no predefined grouping. Many clustering algorithms have been introduced, two of which are K-means and K-medians. Both methods group documents based on the proximity of their word weights. This study compares the quality of the clusters produced by K-means and K-medians. The results show that K-medians produces better cluster quality than K-means, but it takes more time to cluster.
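To make the difference between the two methods concrete, the following is a minimal sketch, not the implementation used in this study. It assumes each document is already represented as a vector of word weights. The function names (`cluster`, `k_means`, `k_medians`), the seeding with the first k vectors, and the pairing of squared Euclidean distance with K-means and Manhattan distance with K-medians are illustrative assumptions.

```python
from statistics import mean, median


def euclidean_sq(a, b):
    """Squared Euclidean distance between two weight vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def manhattan(a, b):
    """Manhattan (L1) distance between two weight vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cluster(vectors, k, dist, center, max_iter=100):
    """Generic assign-then-update loop (hypothetical helper).

    dist   -- distance function used to assign a vector to a center
    center -- function that reduces one coordinate column to one value
    """
    # Assumption: seed the centers with the first k vectors so the
    # sketch is deterministic; real studies often seed randomly.
    centers = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest center.
        new_labels = [
            min(range(k), key=lambda c: dist(v, centers[c])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: recompute each center coordinate by coordinate.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [center(col) for col in zip(*members)]
    return labels, centers


def k_means(vectors, k):
    # Centers are coordinate-wise means.
    return cluster(vectors, k, euclidean_sq, mean)


def k_medians(vectors, k):
    # Centers are coordinate-wise medians.
    return cluster(vectors, k, manhattan, median)
```

In this sketch the two methods differ only in the distance function and in how a center is recomputed. The coordinate-wise median requires ordering the values in each cluster, which is one plausible reason a K-medians run can take longer than a K-means run.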