Integrate Words Internal Information to Improve Word Embeddings

Chuanxiang Tang, Yun Tang

Abstract— We propose a method for improving word embeddings by fusing the information hidden within words. This differs from the traditional approach, which trains word embeddings directly on the morphological information visible on the surface of words. Based on the average principle and two attention mechanisms, we exploit this hidden information, which this paper calls the implied meanings of the morphemes of words, and propose six implied-meaning embedding models. Comparative experiments on two basic Natural Language Processing tasks show that our models capture semantic information better than the classical models represented by CBOW, Skip-Gram, and GloVe. In addition, we explore the relationship between the importance of the synthesized implied meanings and the importance of the word itself.

Index Terms— average principle, attention mechanism, word embedding, fusion.
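The abstract names the average principle and attention mechanisms but does not give their formulas. The following pure-Python sketch is a hypothetical illustration only, not the authors' six models. It assumes that each word has its own vector plus a list of implied-meaning vectors for its morphemes, and it contrasts two generic fusion rules: (a) adding the plain average of the implied-meaning vectors to the word vector, and (b) adding a softmax-weighted sum whose weights come from scaled dot-product scores against the word vector. The function names `average_fusion` and `attention_fusion`, the additive combination, and the scoring function are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def average_fusion(word: Vector, meanings: List[Vector]) -> Vector:
    """Hypothetical 'average principle': word vector + mean of implied-meaning vectors."""
    if not meanings:
        return list(word)
    dim = len(word)
    mean = [sum(m[i] for m in meanings) / len(meanings) for i in range(dim)]
    return [w + mu for w, mu in zip(word, mean)]


def attention_fusion(word: Vector, meanings: List[Vector]) -> Tuple[Vector, List[float]]:
    """Hypothetical attention fusion: the word vector acts as the query, and each
    implied-meaning vector is weighted by softmax(word . m / sqrt(dim))."""
    if not meanings:
        return list(word), []
    dim = len(word)
    scores = [dot(word, m) / math.sqrt(dim) for m in meanings]
    weights = softmax(scores)
    context = [sum(a * m[i] for a, m in zip(weights, meanings)) for i in range(dim)]
    return [w + c for w, c in zip(word, context)], weights


if __name__ == "__main__":
    word = [1.0, 0.0]
    meanings = [[1.0, 0.0], [0.0, 1.0]]  # toy implied-meaning vectors
    print(average_fusion(word, meanings))  # [1.5, 0.5]
    fused, weights = attention_fusion(word, meanings)
    print(fused, weights)  # the first meaning, aligned with the word, gets more weight
```

When every score is equal (for example, with a zero word vector), the attention weights become uniform and the two rules return the same vector, so in this sketch plain averaging is a special case of the attention rule.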

