CLASSIFICATION OF FINAL PROJECT (THESIS) DOCUMENTS USING K-NEAREST NEIGHBOR
Abstract
Various scientific works by academics, such as theses, research reports, and practical work reports, are now available in digital form. In general, however, this availability has not been accompanied by growth in the amount of information or knowledge extracted from these electronic documents. This study aims to classify the abstracts of informatics engineering theses. The algorithm used is K-Nearest Neighbor. The data comprise 50 Indonesian-language abstracts, 454 English abstracts, and 504 titles. Each data set is divided into training data and test data, and the test data are classified automatically by the resulting classifier model. For the Indonesian abstracts, classification achieved higher accuracy without stemming: 100.0% at a 9:1 split, compared with 90.0% at 8:2, 80.0% at 7:3, and 60.0% at 6:4, while splitting the data with K-fold cross-validation gave 80.0%.
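The procedure summarized above (split the data by a fixed ratio, classify each test document by its nearest training documents, and report accuracy) can be illustrated with a minimal sketch. It is not the implementation used in this study: the bag-of-words representation, cosine similarity, the value k = 3, the helper names, and the sample documents and labels are all assumptions made for illustration, and no stemming is applied.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts for a lowercased, whitespace-split text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def knn_predict(train, text, k=3):
    """Majority label among the k training documents most similar to text."""
    query = vectorize(text)
    ranked = sorted(train, key=lambda item: cosine(vectorize(item[0]), query),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def split(data, train_parts, test_parts):
    """Split data in order by a ratio such as 8:2 (no shuffling)."""
    cut = len(data) * train_parts // (train_parts + test_parts)
    return data[:cut], data[cut:]

def accuracy(train, test, k=3):
    """Fraction of test documents whose predicted label is correct."""
    correct = sum(knn_predict(train, text, k) == label for text, label in test)
    return correct / len(test)

# Hypothetical toy corpus; texts and labels are invented for illustration.
DOCS = [
    ("neural network training for image classification", "ai"),
    ("router configuration and network bandwidth monitoring", "network"),
    ("decision tree classification of student data", "ai"),
    ("wireless network security with firewall rules", "network"),
    ("clustering and classification with machine learning", "ai"),
    ("bandwidth management on a campus router", "network"),
    ("machine learning model for text classification", "ai"),
    ("firewall and router setup for network security", "network"),
    ("image classification with machine learning", "ai"),
    ("network bandwidth and firewall monitoring", "network"),
]

if __name__ == "__main__":
    for ratio in [(9, 1), (8, 2), (7, 3), (6, 4)]:
        train, test = split(DOCS, *ratio)
        print(f"{ratio[0]}:{ratio[1]} accuracy = {accuracy(train, test):.1%}")
```

With a corpus this small, the accuracy at each ratio depends heavily on which documents fall into the test portion, which is one reason a K-fold estimate is usually reported alongside fixed splits.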