WCSE 2022
ISBN: 978-981-18-3959-7 DOI: 10.18178/wcse.2022.06.037

Research on Topic Concentration of a Text from the Perspective of Readability Research

Yongquan Li

Abstract— In an era of information explosion, finding the information we need quickly, accurately, and at high quality is pivotal, and scientific, efficient reading is particularly important. Measuring readers’ reading level and finding texts suited to that level remains a difficult problem. Solving it first requires a scientific measure of the difficulty of a text. From the perspective of readability research, this paper attempts to mine and calculate the difficulty of a text from its topic concentration. The paper extracts topic words using the topic concentration formula in the Quantitative Index Text Analyzer (QUITA), calculates the “topic concentration” of each topic word according to the “topic degree” formula of Hua Liu (the value for a topic word is the “topic concentration” of a single word multiplied by the frequency of the word), and then sums these values to obtain the final “topic concentration” of the text. The constructed topic concentration can, to a certain extent, distinguish differences between versions and grades. It can be extracted automatically by computer, which reduces the degree of manual intervention.

Index Terms—Chinese, frequency, topic concentration, circulation degree

College of Chinese Language and Culture, Jinan University, Guangzhou 510610, CHINA


