Onset-Aware Polyphonic Piano Transcription: A CNN-Based Approach
Abstract: Automatic music transcription (AMT) converts musical audio into symbolic notation,
including note onsets, offsets, and pitches. In this paper, we design a polyphonic piano transcription
system based on convolutional neural networks (CNNs) that improves note-level results. The proposed
method has two advantages. First, one CNN model detects onset events and aligns the onsets of the notes
to more accurate positions. Second, another CNN model detects the onsets of the 88 piano notes.
We further improve model performance by using a dual-channel spectrogram as input, choosing an
appropriate number of convolutional layers, and weighting the positive samples in the loss function.
The public MAPS dataset is used for training and evaluation. On the 'ENSTDkCl' subset, the proposed
solution achieves a note-level F1-measure of 85.15%. To the best of our knowledge, this is the highest
F1-measure reported in the state of the art.
Index Terms: polyphonic piano transcription, convolutional neural network, onset detection, onset alignment
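The abstract mentions weighting the positive samples in the loss function but gives no formula. The following is a minimal sketch of one common way to do this, a binary cross-entropy in which frames containing an onset are multiplied by a constant factor. The function name `weighted_bce`, the parameter `pos_weight`, and its default value are illustrative assumptions and are not taken from the paper.

```python
import math


def weighted_bce(probs, targets, pos_weight=2.0, eps=1e-7):
    """Mean binary cross-entropy with positive samples up-weighted.

    probs      -- predicted onset probabilities in [0, 1]
    targets    -- ground-truth labels (1 = onset, 0 = no onset)
    pos_weight -- hypothetical multiplier for the positive term;
                  the value used in the paper is not stated in the abstract
    """
    if len(probs) != len(targets):
        raise ValueError("probs and targets must have the same length")
    if not probs:
        raise ValueError("inputs must not be empty")

    total = 0.0
    for p, y in zip(probs, targets):
        # Clamp to avoid log(0) for saturated predictions.
        p = min(max(p, eps), 1.0 - eps)
        positive_term = pos_weight * y * math.log(p)
        negative_term = (1 - y) * math.log(1.0 - p)
        total -= positive_term + negative_term
    return total / len(probs)


if __name__ == "__main__":
    # Onsets are sparse, so a missed onset is penalised more than a false alarm.
    print(weighted_bce([0.1, 0.1], [1, 0], pos_weight=3.0))
```

Because onset frames are far rarer than non-onset frames in piano recordings, a weight greater than 1 on the positive term is a usual way to keep the network from predicting "no onset" everywhere. Setting `pos_weight=1.0` recovers the ordinary unweighted cross-entropy.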
Sicong Kong, Wei Xu, Wei Liu, Xuan Gong, Juanting Liu, Wenqing Cheng
School of Electronic Information and Communications, Huazhong University of Science and Technology, CHINA
Cite: Sicong Kong, Wei Xu, Wei Liu, Xuan Gong, Juanting Liu, Wenqing Cheng, "Onset-Aware Polyphonic Piano Transcription: A CNN-Based Approach," Proceedings of 2019 the 9th International Workshop on Computer Science and Engineering, pp. 454-461, Hong Kong, 15-17 June, 2019.