WCSE 2017
ISBN: 978-981-11-3671-9 DOI: 10.18178/wcse.2017.06.067

Enhancing Topic Models for Short Texts using Deep Autoencoder

Luepol Pipanmaekaporn, Suwatchai Kamolsantiroj

Abstract— With the prevalence of micro-blogging platforms such as Twitter, short texts now account for a large portion of online text data. Inferring meaningful topics from short texts nevertheless remains challenging, because each document offers very limited word co-occurrence information. In this paper, we propose a novel approach to topic learning and inference for short texts: we explore semantic relations between words across the short texts and build a word relation matrix to avoid the data sparsity issue. A deep autoencoder then learns meaningful relations between words and topics, using the word relations as auxiliary information. Once the topics have been acquired, we formulate topic inference as Non-Negative Matrix Factorization (NNMF) with the learned word-topic relations. Experiments on two real-world short text datasets show that our method outperforms state-of-the-art topic modeling baselines.

Index Terms— Short Text, Topic Modeling, Representation Learning, Deep Autoencoder
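
The two matrix steps named in the abstract, building a word relation matrix and inferring document topics by non-negative factorization against fixed word-topic relations, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the toy corpus, the function names, the use of raw co-occurrence counts as the word relation, and the multiplicative-update solver are all hypothetical choices. The deep autoencoder stage is not implemented; a hand-written word-topic matrix `W` stands in for its output.

```python
import numpy as np

def word_relation_matrix(docs, vocab):
    """Count how often each pair of distinct words appears in the same short text.
    Raw co-occurrence is an assumed stand-in for the paper's word relations."""
    index = {w: i for i, w in enumerate(vocab)}
    R = np.zeros((len(vocab), len(vocab)))
    for doc in docs:
        ids = sorted({index[w] for w in doc.split() if w in index})
        for a in ids:
            for b in ids:
                if a != b:
                    R[a, b] += 1.0
    return R

def doc_term_matrix(docs, vocab):
    """Bag-of-words counts: one row per document, one column per vocabulary word."""
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for d, doc in enumerate(docs):
        for w in doc.split():
            if w in index:
                X[d, index[w]] += 1.0
    return X

def infer_doc_topics(X, W, n_iter=200, eps=1e-9, seed=0):
    """Minimise ||X - H W^T||_F over H >= 0 with W (words x topics) held fixed,
    using multiplicative updates that keep H non-negative."""
    rng = np.random.default_rng(seed)
    H = rng.random((X.shape[0], W.shape[1])) + eps
    XW = X @ W        # documents x topics
    WtW = W.T @ W     # topics x topics
    for _ in range(n_iter):
        H *= XW / (H @ WtW + eps)
    return H

# Hypothetical toy corpus and vocabulary.
docs = ["apple banana", "banana fruit apple", "goal match", "match team goal"]
vocab = ["apple", "banana", "fruit", "goal", "match", "team"]

R = word_relation_matrix(docs, vocab)
X = doc_term_matrix(docs, vocab)

# Hand-written word-topic matrix (assumption): a placeholder for what the
# autoencoder would learn from R. Topic 0 = fruit words, topic 1 = sport words.
W = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)

H = infer_doc_topics(X, W)
print(H.argmax(axis=1))  # dominant topic index for each document
```

Holding `W` fixed and solving only for `H` mirrors the abstract's separation between learning topics and inferring them for documents; in a fuller pipeline `W` would be derived from the word relation matrix `R` rather than written by hand.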

Luepol Pipanmaekaporn, Suwatchai Kamolsantiroj
Department of Computer and Information Science, King Mongkut’s University of Technology North Bangkok, Thailand


Cite: Luepol Pipanmaekaporn, Suwatchai Kamolsantiroj, "Enhancing Topic Models for Short Texts using Deep Autoencoder," Proceedings of the 2017 7th International Workshop on Computer Science and Engineering, pp. 392-397, Beijing, 25-27 June, 2017.