CLOSED AND OPEN TASK AUTHORSHIP ATTRIBUTION: A COMPUTATIONAL AUTHORSHIP ANALYSIS

Abstract

Authorship analysis is an area of forensic linguistics whose main task is to investigate the characteristics of a text with respect to its authorship. Specifically, authorship attribution assesses the likelihood that a given author wrote a text by analyzing that author's other works. This experimental research addresses two problems: which author wrote which text (closed task authorship attribution) and who wrote each text (open task authorship attribution). To do so, it uses R for statistical computing, employing both the stylo() and classify() functions. Based on experiments with 1-grams as a fixed variable, it is concluded that the SVM algorithm may be best suited to closed task authorship attribution, given its 100% consistency, whereas the k-NN algorithm may be best suited to the open task, since it reaches 94% consistency. In addition, for the open task, the stylo() function may perform better than the classify() function, since stylo() yields results closer to the actual answer. As the legal system often challenges authorship analysis for lacking a valid methodology, analyzing style through stylometry and measuring it computationally may help forensic linguists provide an adequate analysis for the legal system. Scientifically, this research provides a framework for conducting authorship analysis computationally; practically, it is expected to be usable as a tool for detecting plagiarism.