An Analysis of Spam Email Detection Performance Assessment Using Machine Learning

Abstract

Spam email makes it harder for email account users to find relevant information. Spam detection is already built into public email services, which use a variety of methods. However, not every email server used by a company with a limited number of accounts provides a spam detection feature. In such cases, the server administrator must add a separate or modular spam detection feature to protect the accounts from spam email. This study aims to identify the best method for detecting spam emails. Several machine learning methods, namely Logistic Regression, Decision Tree, and Random Forest, are applied and their results compared to find the most efficient method of detecting spam email. Efficiency is measured by the speed of the training and testing processes and by the accuracy of spam detection. The results indicate that the Random Forest method performs best, processing the test data in 0.19 seconds with an accuracy of 98%. These results can serve as a reference for developing spam detection with other methods.
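The measurement protocol described above (training time, testing time, and accuracy per method) can be illustrated with a minimal sketch. It is not the implementation used in this study: the `Result` record, the `evaluate` helper, the `KeywordBaseline` model, and the toy messages are all hypothetical and exist only to keep the example self-contained. Any classifier that exposes `fit` and `predict`, such as scikit-learn's `LogisticRegression`, `DecisionTreeClassifier`, or `RandomForestClassifier` applied to vectorized email text, could presumably be passed to `evaluate` in place of the baseline.

```python
import time
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    train_seconds: float
    test_seconds: float
    accuracy: float


class KeywordBaseline:
    """Hypothetical stand-in model: flags a message as spam (1) if it
    contains a word that appeared only in spam training messages."""

    def fit(self, texts, labels):
        spam_words, ham_words = set(), set()
        for text, label in zip(texts, labels):
            target = spam_words if label == 1 else ham_words
            target.update(text.lower().split())
        self.spam_only_ = spam_words - ham_words
        return self

    def predict(self, texts):
        return [int(any(w in self.spam_only_ for w in t.lower().split()))
                for t in texts]


def evaluate(name, model, X_train, y_train, X_test, y_test):
    """Time fit and predict separately, then compute accuracy."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    train_seconds = time.perf_counter() - start

    start = time.perf_counter()
    predictions = model.predict(X_test)
    test_seconds = time.perf_counter() - start

    correct = sum(int(p == y) for p, y in zip(predictions, y_test))
    return Result(name, train_seconds, test_seconds, correct / len(y_test))


# Toy data invented for illustration only (1 = spam, 0 = not spam).
X_train = ["win free prize now", "meeting agenda for monday",
           "free lottery winner", "project report attached"]
y_train = [1, 0, 1, 0]
X_test = ["claim your free prize", "monday project meeting"]
y_test = [1, 0]

result = evaluate("keyword baseline", KeywordBaseline(),
                  X_train, y_train, X_test, y_test)
print(f"{result.name}: train={result.train_seconds:.4f}s "
      f"test={result.test_seconds:.4f}s accuracy={result.accuracy:.2f}")
```

Timing `fit` and `predict` separately mirrors the distinction the abstract draws between training speed and testing speed. Calling `evaluate` once per candidate model on the same split would give directly comparable rows.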