Joint Word Segmentation and Stemming with Neural Sequence Labeling for Myanmar Language

WCSE 2019 SPRING ISBN: 978-981-14-1455-8
DOI: 10.18178/wcse.2019.03.017

Yadanar Oo, Khin Mar Soe

Abstract— Word segmentation is a widely studied sequence labeling problem, commonly addressed with machine learning methods such as conditional random fields. Deep learning approaches have achieved state-of-the-art performance on word segmentation. Normally, segmentation is treated as a separate process from stemming. We propose a joint model with stronger capabilities for Myanmar word segmentation and stemming. As far as we know, this is the first work on joint Myanmar word segmentation and stemming. In this paper, we evaluate the performance of neural network architectures that rely on two sources of information, syllable-level and character-level representations, using LSTM, CNN, GRU and CRF components. For comparison and analysis, we examine the importance of different network designs and of factors such as the last layer of the network and the choice of optimizer.

Index Terms— Myanmar word segmentation, stemming, joint model, neural networks.
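
To illustrate how segmentation and stemming can be treated as one sequence labeling task, the following minimal sketch decodes a single joint tag sequence into words and their stems. The abstract does not state the tag scheme, so the tag set here (`B`/`I` for word boundary position, `S`/`A` for stem or affix role) and the function name `decode_joint_tags` are assumptions made for illustration only. English toy units stand in for Myanmar syllables.

```python
from typing import List, Tuple


def decode_joint_tags(units: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Turn one joint tag per unit into (word, stem) pairs.

    Hypothetical tag scheme (not taken from the paper):
      B-S / B-A : unit begins a new word and is part of the stem / an affix
      I-S / I-A : unit continues the current word and is stem / affix
    """
    if len(units) != len(tags):
        raise ValueError("units and tags must have the same length")

    words: List[Tuple[str, str]] = []
    word_parts: List[str] = []
    stem_parts: List[str] = []

    for unit, tag in zip(units, tags):
        boundary, role = tag.split("-")
        # A "B" tag closes the word collected so far (if any).
        if boundary == "B" and word_parts:
            words.append(("".join(word_parts), "".join(stem_parts)))
            word_parts, stem_parts = [], []
        word_parts.append(unit)
        if role == "S":
            stem_parts.append(unit)

    # Flush the final word.
    if word_parts:
        words.append(("".join(word_parts), "".join(stem_parts)))
    return words


# Toy input: English units stand in for syllables.
units = ["un", "lock", "ed", "door", "s"]
tags = ["B-A", "I-S", "I-A", "B-S", "I-A"]
print(decode_joint_tags(units, tags))
```

Under this assumed scheme, a single tagger (for example an LSTM or GRU encoder with a CRF output layer, as named in the abstract) would predict one tag per unit, and both the word boundaries and the stems fall out of the same prediction.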

Yadanar Oo, Khin Mar Soe
University of Computer Studies, Yangon, Myanmar


Cite: Yadanar Oo, Khin Mar Soe, "Joint Word Segmentation and Stemming with Neural Sequence Labeling for Myanmar Language," Proceedings of 2019 the 9th International Workshop on Computer Science and Engineering WCSE_2019_SPRING, pp. 95-100, Yangon, Myanmar, February 27-March 1, 2019.
