Enhancing Topic Models for Short Texts using Deep Autoencoder

WCSE 2017
ISBN: 978-981-11-3671-9 DOI: 10.18178/wcse.2017.06.067

Luepol Pipanmaekaporn, Suwatchai Kamolsantiroj

Abstract— With the prevalence of micro-blogging platforms such as Twitter, short texts now make up a large portion of online text data. Nevertheless, inferring meaningful topics from short texts remains challenging because each document provides very limited word co-occurrence information. In this paper, we propose a novel approach to topic learning and inference in short text data: we explore semantic relations between words across short texts and build a word relation matrix to avoid the data sparsity issue. A deep autoencoder is then used to learn meaningful relations between words and topics from the auxiliary information in the word relation matrix. After acquiring the topics, we formulate topic inference as Non-Negative Matrix Factorization (NNMF) with the learned word-topic relations. Experiments on two real-world short text datasets show that our method outperforms state-of-the-art baselines for topic modeling.

Index Terms— Short Text, Topic Modeling, Representation Learning, Deep Autoencoder
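
The two matrix steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the function names `cooccurrence_matrix` and `infer_doc_topics`, the choice of raw co-occurrence counts as the word relation, and the multiplicative-update solver are all assumptions made for illustration. The deep autoencoder is not reproduced here; a random non-negative matrix `W` stands in for the word-topic relations it would learn.

```python
import numpy as np

def cooccurrence_matrix(docs, vocab):
    """Count how often two distinct words appear in the same short text.

    Assumption: raw document-level co-occurrence counts serve as the
    word relation; the paper may define the relation differently.
    """
    index = {w: i for i, w in enumerate(vocab)}
    C = np.zeros((len(vocab), len(vocab)))
    for doc in docs:
        ids = sorted({index[w] for w in doc.split() if w in index})
        for a in ids:
            for b in ids:
                if a != b:
                    C[a, b] += 1.0
    return C

def infer_doc_topics(X, W, n_iter=200, eps=1e-9, seed=0):
    """Approximately minimize ||X - W H||_F^2 over H >= 0 with W held fixed.

    Uses the standard multiplicative update for non-negative matrix
    factorization, applied to H only.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], X.shape[1])) + eps
    WtX = W.T @ X
    WtW = W.T @ W
    for _ in range(n_iter):
        H *= WtX / (WtW @ H + eps)
    return H

# Hypothetical toy corpus of four short texts.
docs = [
    "apple banana fruit",
    "banana fruit juice",
    "goal football match",
    "football match team",
]
vocab = sorted({w for d in docs for w in d.split()})

# Word relation matrix (vocabulary x vocabulary).
C = cooccurrence_matrix(docs, vocab)

# Term-document count matrix (vocabulary x documents).
X = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        X[vocab.index(w), j] += 1.0

# Placeholder word-topic matrix; in the paper's pipeline this would come
# from the autoencoder trained on the word relation matrix.
n_topics = 2
W = np.random.default_rng(1).random((len(vocab), n_topics))

# Topic weights per document (topics x documents).
H = infer_doc_topics(X, W)
```

Each column of `H` gives non-negative topic weights for one short text. With a learned, rather than random, `W`, the largest entry in a column would indicate the document's dominant topic.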

Department of Computer and Information Science, King Mongkut’s University of Technology North Bangkok, THAILAND


Cite: Luepol Pipanmaekaporn, Suwatchai Kamolsantiroj, "Enhancing Topic Models for Short Texts using Deep Autoencoder," Proceedings of 2017 the 7th International Workshop on Computer Science and Engineering, pp. 392-397, Beijing, 25-27 June, 2017.
