Plagiarism Detection in Students' Theses Using the Cosine Similarity Method


Completing a final scientific paper is the main graduation requirement for students. One factor that determines the quality of a student's scientific work is its originality and innovation. This research applies data mining methods to detect similarities in the titles, abstracts, or topics of students' final scientific papers so that plagiarism can be prevented. The cosine similarity method is combined with text preprocessing and TF-IDF weighting to calculate the level of similarity between the title and abstract of a student's final scientific paper and the documents in the existing final project repository. The resulting score is then compared against a threshold value to decide whether the work is accepted or rejected. Applying TF-IDF to the training data and test data shows that the similarity between the training document and the test document is 8%, which indicates that the student thesis is still classified as unique and does not contain plagiarized content. These findings can help the university manage the administration of student theses so that plagiarism does not occur. Further study is needed on additional methods that improve the accuracy of the system so that it also runs faster and more optimally.
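The pipeline described above (preprocessing, TF-IDF weighting, cosine similarity, threshold decision) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the sample texts, and the threshold of 0.5 are hypothetical, the preprocessing is reduced to lowercasing and tokenization (no stemming or stopword removal), and a smoothed IDF variant is assumed because the abstract does not state which formula was used.

```python
import math
import re
from collections import Counter

# Hypothetical threshold; the abstract does not state the value actually used.
THRESHOLD = 0.5


def tokenize(text):
    """Minimal preprocessing: lowercase and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document (assumed smoothed TF-IDF)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def check_submission(submission, repository, threshold=THRESHOLD):
    """Compare a submission with every repository document.

    Returns the highest similarity found and the resulting decision.
    """
    vectors = tfidf_vectors(repository + [submission])
    new_vec = vectors[-1]
    best = max(
        (cosine_similarity(new_vec, vec) for vec in vectors[:-1]),
        default=0.0,
    )
    decision = "rejected" if best >= threshold else "accepted"
    return best, decision


if __name__ == "__main__":
    # Made-up repository entries, for illustration only.
    repository = [
        "plagiarism detection using cosine similarity",
        "image classification with neural networks",
    ]
    score, decision = check_submission(
        "detecting plagiarism in theses with cosine similarity", repository
    )
    print(f"highest similarity: {score:.2%} -> {decision}")
```

In this sketch the submission is vectorized together with the repository so that all documents share one vocabulary and one set of IDF weights. A score such as the 8% reported above would fall well below any reasonable threshold and lead to acceptance.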