WCSE 2019 SUMMER ISBN: 978-981-14-1684-2
DOI: 10.18178/wcse.2019.06.106

Research on Network Public Opinion Detection Based on Improved TF-IDF Algorithm

Lu Peng, Zongfeng Qin

Abstract— TF-IDF is a widely used text feature weighting technique. Its core idea is as follows: in a corpus, if a term appears frequently in a given text but rarely in other texts, the term is a good distinguishing feature of that text. Although this idea is simple, it faces problems in practical applications: it indiscriminately inflates the importance of uncommon words in a text, and the same weakness appears in the field of public opinion monitoring. To address this problem, this paper does the following: (1) introduces a lexical weight coefficient for feature words into TF-IDF; (2) introduces a word position weight (span weight) coefficient into TF-IDF. Experiments show that the improved TF-IDF method highlights the importance of text feature words and facilitates classification. Furthermore, the improved method was applied to a public opinion analysis system and achieved good results.
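
To make the weighting idea concrete, the following is a minimal sketch of standard TF-IDF with two optional extra coefficients. The abstract does not give the paper's formulas, so several things here are assumptions for illustration only: the function name `tf_idf`, the hypothetical callables `lexical_coef` and `position_coef`, and the choice to combine the coefficients with the base weight by multiplication. It should not be read as the authors' implementation.

```python
import math
from collections import Counter


def tf_idf(docs, lexical_coef=None, position_coef=None):
    """Return one {term: weight} dict per document.

    docs          : list of documents, each a list of already segmented terms.
    lexical_coef  : hypothetical callable term -> float (assumed, not from the paper).
    position_coef : hypothetical callable (doc, term) -> float (assumed, not from the paper).
    Missing coefficients default to 1.0, which gives plain TF-IDF.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))

    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        doc_weights = {}
        for term, count in counts.items():
            tf = count / total                   # relative frequency in this text
            idf = math.log(n_docs / df[term])    # rarer across the corpus -> larger
            lex = lexical_coef(term) if lexical_coef else 1.0
            pos = position_coef(doc, term) if position_coef else 1.0
            # Assumption: the extra coefficients scale the base weight multiplicatively.
            doc_weights[term] = tf * idf * lex * pos
        weights.append(doc_weights)
    return weights


docs = [
    ["flood", "flood", "city"],
    ["city", "traffic"],
    ["city", "weather"],
]
plain = tf_idf(docs)
boosted = tf_idf(docs, lexical_coef=lambda term: 2.0)
```

In this toy corpus, "city" occurs in every document and therefore gets weight zero, while "flood" occurs only in the first document and gets a positive weight there. This is the behaviour the core idea describes, and it also shows why a rare but uninformative word would be overweighted unless some additional coefficient corrects for it.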

Index Terms— Network Public Opinion; Cosine Similarity; TF-IDF; Emotional Analysis.

Lu Peng, Zongfeng Qin
City College, Wuhan University of Science and Technology, CHINA

Cite: Lu Peng, Zongfeng Qin, "Research on Network Public Opinion Detection Based on Improved TF-IDF Algorithm," Proceedings of 2019 the 9th International Workshop on Computer Science and Engineering, pp. 715-720, Hong Kong, 15-17 June, 2019.