Human Action Recognition in Still Images Combining CBOW Language Processing and CNN

WCSE 2022
ISBN: 978-981-18-3959-7 DOI: 10.18178/wcse.2022.06.031

Quanzhi Gong, Yuxiang Xie, Jie Yan, Xidao Luan

Abstract—Human action recognition is a difficult computer vision problem, especially in still images. Traditional action recognition methods rely on manually designed local or global features, which are limited by human visual understanding of images and therefore rarely achieve optimal recognition results. In recent years, convolutional neural networks (CNNs) have become popular for action recognition, and performance is usually improved by redesigning the network structure. Most previous methods solve the task using only the association information implicitly contained in the image content. Because convolutional networks lack interpretability, it is difficult to judge which parts of the image play an important role, and therefore hard to optimize the model by strengthening particular image features. This paper proposes an action recognition method that combines language processing technology with a CNN. The proposed method uses the continuous bag-of-words (CBOW) model to assist the CNN by explicitly exploiting the co-occurrence information of object pairs. The method is tested on two public datasets, Stanford 40 Actions and Pascal VOC Action 2012. Comparison with state-of-the-art methods shows that, as an exploration of combining a language model with a general CNN, the proposed method performs satisfactorily, reaching accuracy rates of 91.4% and 85.9%, respectively.

Index Terms—action recognition, CBOW, DenseNet, deep learning
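
To illustrate what "co-occurrence information of object pairs" could look like as input to a CBOW-style model, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not say how objects are detected or how pairs are encoded, so the per-image label lists, the function names `cooccurrence_counts` and `cbow_examples`, and the choice to treat all other labels in an image as the context are assumptions made only for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(images):
    """Count how often each unordered pair of object labels shares an image."""
    counts = Counter()
    for labels in images:
        # Deduplicate and sort so each pair has one canonical ordering.
        for a, b in combinations(sorted(set(labels)), 2):
            counts[(a, b)] += 1
    return counts


def cbow_examples(images):
    """Build CBOW-style (context, target) examples.

    Assumption: each object label is the target once, and the remaining
    labels detected in the same image form its context.
    """
    examples = []
    for labels in images:
        unique = sorted(set(labels))
        for target in unique:
            context = [label for label in unique if label != target]
            if context:  # skip images with a single detected object
                examples.append((context, target))
    return examples


# Hypothetical detector output (object labels per image), not data from the paper.
images = [
    ["person", "horse"],
    ["person", "bicycle", "helmet"],
    ["person", "horse", "helmet"],
]

pair_counts = cooccurrence_counts(images)
examples = cbow_examples(images)
```

In a full pipeline, examples of this kind would train label embeddings that are then combined with CNN features; how that combination is done is described in the paper itself and is not reproduced here.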

Quanzhi Gong, Yuxiang Xie, Jie Yan
College of Systems Engineering, National University of Defense Technology, Changsha, China
Xidao Luan
College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China

Cite: Quanzhi Gong, Yuxiang Xie, Jie Yan, Xidao Luan, "Human Action Recognition in Still Images Combining CBOW Language Processing and CNN," Proceedings of 2022 the 12th International Workshop on Computer Science and Engineering (WCSE 2022), pp. 218-227, June 24-27, 2022.
