ISBN: 978-981-11-3671-9 DOI: 10.18178/wcse.2017.06.067
Enhancing Topic Models for Short Texts using Deep Autoencoder
Abstract— With the prevalence of micro-blogging platforms such as Twitter, short texts are becoming a large portion of online text data. Despite this, inferring meaningful topics from short texts remains a challenging task because each document provides very limited word co-occurrence information. In this paper, we propose a novel approach to topic learning and inference on short text data that explores semantic relations between words across short texts and then builds a word relation matrix to alleviate the data sparsity issue. A deep autoencoder is used to learn meaningful relations between words and topics, using the word relations as auxiliary information. After acquiring the topics, we formulate the problem of topic inference as Non-Negative Matrix Factorization (NNMF) with word-topic relations. Experiments on two real-world short text datasets show that our method outperforms state-of-the-art topic modeling baselines.
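The two non-neural steps named in the abstract, building word relations over short texts and inferring a document's topics by NNMF with the word-topic relations held fixed, can be illustrated with a minimal sketch. It rests on stated assumptions: plain document-level co-occurrence counts stand in for the paper's word relation measure, which the abstract does not define; the topic-word matrix is supplied by hand rather than learned by the deep autoencoder; and inference uses the standard Lee-Seung multiplicative update for the coefficient matrix. The function names `word_cooccurrence` and `infer_topics` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def word_cooccurrence(docs):
    """Count how often each unordered word pair appears in the same short text.

    Assumption: document-level co-occurrence is a stand-in for the paper's
    (unspecified) word relation measure.
    """
    counts = Counter()
    for doc in docs:
        vocab = sorted(set(doc.split()))  # each pair counted once per document
        for a, b in combinations(vocab, 2):
            counts[(a, b)] += 1
    return counts


def infer_topics(x, topics, iters=200, eps=1e-9):
    """Infer non-negative topic weights h for one document vector x.

    Minimizes ||x - h @ topics||^2 over h >= 0 with the topic-word matrix
    held fixed, via Lee-Seung multiplicative updates:
        h_k <- h_k * (topics @ x)_k / (topics @ topics^T @ h)_k
    x: list of V word counts; topics: K rows of V non-negative word weights.
    Assumption: topics would come from the learned model; here it is given.
    """
    k, v = len(topics), len(x)
    h = [1.0] * k
    # The numerator topics @ x does not depend on h, so compute it once.
    num = [sum(topics[i][j] * x[j] for j in range(v)) for i in range(k)]
    # Gram matrix topics @ topics^T (K x K), also constant across iterations.
    gram = [[sum(topics[i][j] * topics[m][j] for j in range(v)) for m in range(k)]
            for i in range(k)]
    for _ in range(iters):
        den = [sum(gram[i][m] * h[m] for m in range(k)) for i in range(k)]
        # Multiplicative step keeps every weight non-negative.
        h = [h[i] * num[i] / (den[i] + eps) for i in range(k)]
    return h


docs = ["apple banana", "apple banana cherry", "cherry date"]
print(word_cooccurrence(docs)[("apple", "banana")])

# Vocabulary order: apple, banana, cherry, date (two hand-set toy topics).
topics = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
print(infer_topics([1, 1, 0, 0], topics))
```

Holding the topic-word matrix fixed turns inference into a small non-negative least-squares problem per document, so a new short text can be folded in without refitting the topics; the multiplicative form is chosen here only because it preserves non-negativity with no projection step.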
Index Terms— Short Text, Topic Modeling, Representation Learning, Deep Autoencoder
Luepol Pipanmaekaporn, Suwatchai Kamolsantiroj
Department of Computer and Information Science, King Mongkut’s University of Technology North Bangkok, THAILAND
Cite: Luepol Pipanmaekaporn, Suwatchai Kamolsantiroj, "Enhancing Topic Models for Short Texts using Deep Autoencoder," Proceedings of 2017 the 7th International Workshop on Computer Science and Engineering, pp. 392-397, Beijing, 25-27 June, 2017.