Spoken Digit Classification: A Method Using Convolutional Neural Network and Mixed Feature
Abstract— Spoken digit recognition is an active research area in artificial intelligence. Although much work has been done in this field, few studies focus on recognizing a single digit, and few use a convolutional neural network (CNN). In addition, although Chinese is a widely spoken language, relatively little work has addressed the recognition of Chinese digits. Among existing Chinese spoken digit recognition methods, none uses a convolutional neural network together with Mel Frequency Cepstral Coefficients (MFCC) features. This paper proposes a new method that uses a mixture of short-time Fourier transform (STFT) and MFCC features as the neural network’s input and a convolutional neural network as the classifier. The method is also applied to a Chinese dataset. It achieves an accuracy higher than 90% on both the English and Chinese datasets.
Index Terms— Chinese spoken digit recognition; Deep neural network; Signal processing; MFCC; Spectrogram
Nankai University, CHINA
Cite: He Ba, "Spoken Digit Classification: A Method Using Convolutional Neural Network and Mixed Feature," Proceedings of 2021 the 11th International Workshop on Computer Science and Engineering (WCSE 2021), pp. 7-12, February 25-27, 2021.
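
The mixed feature described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not say how the STFT and MFCC features are combined, so this sketch assumes they are concatenated along the feature axis of each frame. The function names, the frame length (256), hop size (128), sample rate (8000 Hz), 26 mel filters, and 13 cepstral coefficients are all illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def stft_magnitude(signal, n_fft=256, hop=128):
    """Magnitude spectrogram of shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular mel filters of shape (n_mels, n_fft // 2 + 1)."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        left, center, right = hz_points[m:m + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def mfcc(magnitude, sample_rate, n_fft, n_mels=26, n_coeffs=13):
    """MFCCs of shape (n_frames, n_coeffs) from a magnitude spectrogram."""
    mel_energy = (magnitude ** 2) @ mel_filterbank(n_mels, n_fft, sample_rate).T
    log_mel = np.log(mel_energy + 1e-10)
    # Unnormalized DCT-II over the mel axis, keeping the first n_coeffs terms.
    k = np.arange(n_coeffs)[:, None]
    n = np.arange(n_mels)[None, :]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return log_mel @ dct_basis.T

def mixed_feature(signal, sample_rate=8000, n_fft=256, hop=128):
    """Assumed mixing scheme: concatenate log-STFT and MFCC per frame."""
    magnitude = stft_magnitude(signal, n_fft, hop)
    log_spec = np.log(magnitude + 1e-10)
    coeffs = mfcc(magnitude, sample_rate, n_fft)
    return np.concatenate([log_spec, coeffs], axis=1)
```

Under these assumptions, a one-second recording at 8000 Hz yields a 61 x 142 array (129 STFT bins plus 13 MFCCs per frame), which could be passed to a CNN as a single-channel image. Other mixing schemes, such as stacking the two features as separate channels, are equally plausible readings of the abstract.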