IDENTIFYING HOAXES IN ONLINE NEWS SEARCH RESULTS USING WEB SCRAPING AND THE C4.5 ALGORITHM

Abstract

Online news is a journalistic product that reports facts or events and is produced and distributed via the internet. However, not all information published through online media is factual; false information of this kind is described as a hoax. The large volume of hoax news affects the people who read it and can cause misperceptions or inappropriate actions. We use a web scraping technique to extract content from search engine results, and we then apply the C4.5 algorithm for classification. Three parameters serve as attributes: an invitation to spread the news, the credibility of the source, and a provocative title. The result of this work is a decision tree that can classify news content as a hoax or as legitimate. In our experiments, classification using web scraping and the C4.5 algorithm achieved an accuracy of 80% in identifying hoaxes.
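
As a rough illustration of how C4.5 could choose among the three parameters, the sketch below computes the gain ratio (information gain divided by split information) for each attribute and picks the highest-scoring one as the root split. The attribute names (`invites_sharing`, `credible_source`, `provocative_title`), the boolean encoding, and the six toy records are assumptions invented for this example; they are not the data or the implementation used in this work.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, attribute, target="label"):
    """C4.5 split criterion: information gain divided by split information."""
    total = len(rows)
    # Group the class labels by the value the attribute takes in each row.
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    # Weighted entropy that remains after splitting on the attribute.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    gain = entropy([row[target] for row in rows]) - remainder
    # Split information is the entropy of the attribute's own value distribution.
    split_info = entropy([row[attribute] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, attributes):
    """Return the attribute with the highest gain ratio."""
    return max(attributes, key=lambda a: gain_ratio(rows, a))


# Hypothetical attribute names and toy records, for illustration only.
ATTRIBUTES = ["invites_sharing", "credible_source", "provocative_title"]
ROWS = [
    {"invites_sharing": True,  "credible_source": False, "provocative_title": True,  "label": "hoax"},
    {"invites_sharing": True,  "credible_source": False, "provocative_title": False, "label": "hoax"},
    {"invites_sharing": False, "credible_source": False, "provocative_title": True,  "label": "hoax"},
    {"invites_sharing": False, "credible_source": True,  "provocative_title": True,  "label": "legitimate"},
    {"invites_sharing": False, "credible_source": True,  "provocative_title": False, "label": "legitimate"},
    {"invites_sharing": True,  "credible_source": True,  "provocative_title": False, "label": "legitimate"},
]

if __name__ == "__main__":
    for name in ATTRIBUTES:
        print(f"{name}: gain ratio = {gain_ratio(ROWS, name):.4f}")
    print("root split:", best_attribute(ROWS, ATTRIBUTES))
```

In this toy set, source credibility separates the two classes completely, so it would become the root of the tree. A full C4.5 implementation would repeat the selection recursively on each partition and then prune the resulting tree, which this sketch omits.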