WCSE 2021 SPRING ISBN: 978-981-18-1791-5
DOI: 10.18178/wcse.2021.02.002

Spoken Digit Classification: A Method Using Convolutional Neural Network and Mixed Feature

He Ba

Abstract— Spoken digit recognition is an active research area in artificial intelligence. Much prior work exists in this field, but little of it focuses on recognizing a single digit, and little of it uses a Convolutional Neural Network (CNN). In addition, although Chinese is a widely spoken language, relatively little work has been done on recognizing Chinese digits. Among the existing Chinese spoken digit recognition methods, none uses a convolutional neural network together with the Mel Frequency Cepstral Coefficients (MFCC) feature. This paper proposes a new method that uses a mixture of short-time Fourier transform (STFT) and MFCC features as the neural network’s input and a convolutional neural network as the classifier. The method is also applied to a Chinese dataset. The new method achieves an accuracy higher than 90% on both the English and Chinese datasets.

Index Terms— Chinese spoken digit recognition; Deep neural network; Signal processing; MFCC; Spectrogram

He Ba
Nankai University, CHINA

Cite: He Ba, "Spoken Digit Classification: A Method Using Convolutional Neural Network and Mixed Feature," Proceedings of 2021 the 11th International Workshop on Computer Science and Engineering (WCSE 2021), pp. 7-12, February 25-27, 2021.