WCSE 2020 SPRING ISBN: 978-981-14-4787-7
DOI: 10.18178/wcse.2020.02.007

Statistical Machine Translation between Myanmar and Myeik

Thazin Myint Oo, Ye Kyaw Thu, Khin Mar Soe, Thepchai Supnithi

Abstract— This paper contributes the first evaluation of the quality of machine translation between Myanmar and Myeik (also known as Beik). We also developed a Myanmar-Myeik parallel corpus (around 10K sentences) based on the Myanmar-language data of the ASEAN MT corpus. In addition, two types of segmentation were studied: word and syllable segmentation. Ten-fold cross-validation experiments were carried out using three different statistical machine translation approaches: phrase-based, hierarchical phrase-based, and the operation sequence model (OSM). The results show that all three approaches give high and comparable BLEU and RIBES scores for both Myanmar-to-Myeik and Myeik-to-Myanmar translation. The OSM approach achieved the highest BLEU and RIBES scores among the three approaches. We also found that syllable segmentation is more appropriate than word-level segmentation in terms of translation quality.
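
As an illustration of the cross-validation setup mentioned in the abstract, the following is a minimal sketch of how a parallel corpus could be divided into 10 folds, with each fold held out once as the test set. It is not the authors' procedure: the function name `ten_fold_splits`, the shuffling seed, and the placeholder sentence pairs are all assumptions made for this example, and the abstract does not say how the actual folds were drawn.

```python
import random


def ten_fold_splits(pairs, k=10, seed=0):
    """Yield (train, test) lists for k-fold cross-validation.

    `pairs` is a list of (source, target) sentence pairs. The shuffle
    and the seed are assumptions of this sketch, not details taken
    from the paper.
    """
    indices = list(range(len(pairs)))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at offset i.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test_ids = set(folds[held_out])
        train = [pairs[i] for i in indices if i not in test_ids]
        test = [pairs[i] for i in folds[held_out]]
        yield train, test


# Hypothetical placeholder data standing in for Myanmar-Myeik sentence pairs.
corpus = [(f"myanmar sentence {i}", f"myeik sentence {i}") for i in range(100)]
splits = list(ten_fold_splits(corpus))
```

Each sentence pair appears in exactly one test fold, so scores averaged over the 10 runs cover the whole corpus.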

Index Terms— Statistical machine translation, Myanmar language (Burmese), Myeik dialect, Machine translation for dialects, Parallel corpus development

Thazin Myint Oo
University of Computer Studies, MYANMAR
Ye Kyaw Thu
National Electronics and Computer Technology Center, THAILAND
Khin Mar Soe
University of Computer Studies, MYANMAR
Thepchai Supnithi
National Electronics and Computer Technology Center, THAILAND



Cite: Thazin Myint Oo, Ye Kyaw Thu, Khin Mar Soe, Thepchai Supnithi, "Statistical Machine Translation between Myanmar and Myeik," Proceedings of 2020 the 10th International Workshop on Computer Science and Engineering (WCSE 2020), pp. 36-45, Yangon (Rangoon), Myanmar (Burma), February 26-28, 2020.