Evaluation of the role of corpus in Vietnamese-related machine translation quality

Abstract

The quality of current Vietnamese-related automatic translation systems remains low compared with that of systems for other popular language pairs. Many factors affect the quality of a translation model, including the translation method and the corpus. Building a good translation system requires linguistic resources that are both high in quality and large in quantity. This article surveys the current state of Vietnamese bilingual corpora and builds English-Vietnamese translation systems from corpora of different sizes, using different translation methods. The results show that translation quality improves as the corpus size increases.