Keyphrase Generation with a Seq2seq Model
Abstract— Keyphrases capture the core information of a source text and are thus useful in many applications.
Previous research focuses on extracting keywords that appear in the original document, but misses absent
keyphrases, i.e., those that do not occur in the source. We therefore propose a generative model based on a
sequence-to-sequence (seq2seq) RNN, which can generate both present and absent keyphrases by capturing the
semantic information of the source. We adopt the large vocabulary trick to construct the target word
vocabulary and thereby improve training efficiency. We also introduce feature-rich encoders to leverage the
linguistic and statistical information in the source. Additionally, we incorporate a switching
generator-pointer mechanism to copy out-of-vocabulary words from the original document. To evaluate our
model, we conduct two tasks, i.e., predicting present keyphrases and generating absent keyphrases, on
real-life datasets. The results demonstrate the effectiveness of our model, which consistently and
significantly outperforms state-of-the-art models.
Index Terms— Keyphrase Generation; Seq2seq Model; Recurrent Neural Network
Pengfei Zhang, Dan Li, Yuheng Wang, Yang Fang
National University of Defense Technology, CHINA
Cite: Pengfei Zhang, Dan Li, Yuheng Wang, Yang Fang, "Keyphrase Generation with a Seq2seq Model," Proceedings of the 2019 9th International Workshop on Computer Science and Engineering, pp. 721-727, Hong Kong, 15-17 June 2019.
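The switching generator-pointer mechanism mentioned in the abstract can be illustrated with a minimal sketch of a single decoding step. This is not the authors' implementation; the function name, the scalar switch probability `p_switch`, and all dimensions are illustrative assumptions. The idea shown is standard: mix a vocabulary distribution (generate) with an attention distribution over source positions (point/copy), so out-of-vocabulary source words remain reachable.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def switching_pointer_step(vocab_logits, attn_scores, p_switch, src_ids, vocab_size):
    """One decoding step of a switching generator-pointer mechanism (sketch).

    With probability p_switch the model generates from the fixed target
    vocabulary; with probability (1 - p_switch) it points to a source
    position and copies that token, so OOV source words (ids >= vocab_size)
    can still be produced.
    """
    # Extended vocabulary: fixed vocab plus extra ids for OOV source tokens.
    n_extra = max((i - vocab_size + 1 for i in src_ids if i >= vocab_size),
                  default=0)
    dist = np.zeros(vocab_size + n_extra)
    # Generate branch: probability mass over the fixed vocabulary.
    dist[:vocab_size] += p_switch * softmax(vocab_logits)
    # Point branch: attention mass routed to each source token's id.
    attn = softmax(attn_scores)
    for pos, tok in enumerate(src_ids):
        dist[tok] += (1.0 - p_switch) * attn[pos]
    return dist
```

In a trained model, `p_switch`, `vocab_logits`, and `attn_scores` would all be produced from the decoder hidden state at each step; here they are passed in directly to keep the mixing logic visible.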