Hawthorne, Curtis
Elsen, Erich
Song, Jialin
Roberts, Adam
Simon, Ian
Raffel, Colin
Engel, Jesse
Oore, Sageev
Eck, Douglas
https://arxiv.org/pdf/1710.11153.pdf
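This paper's method uses two objectives, onset and frame. Both losses are minimized jointly during training, and at inference time the onset predictions constrain the frame-level pitch predictions. The results are far better than the previous state of the art.
The source code is also public: https://goo.gl/7zTMPf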
Colab demo: https://colab.research.google.com/notebook#fileId=/v2/external/notebooks/magenta/onsets_frames_transcription/onsets_frames_transcription.ipynb
The accompanying blog post has rich demos: https://magenta.tensorflow.org/onsets-frames
This is in keeping with Google's usual way of doing things.
Two-task, two-objective learning for piano transcription.
The works we reviewed previously use a neural network only to predict pitch at the frame level. This work predicts both pitch and onset by jointly minimizing the two losses.
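To make the joint objective concrete, here is a minimal NumPy sketch, not the authors' code. It assumes both heads output per-frame, per-pitch probabilities and that each loss is a mean binary cross-entropy; the function names and the optional `frame_weights` argument (a placeholder for the onset-distance weighting of the frame loss) are hypothetical.

```python
import numpy as np


def binary_cross_entropy(probs, labels, eps=1e-7):
    """Element-wise binary cross-entropy between probabilities and 0/1 labels."""
    probs = np.clip(probs, eps, 1.0 - eps)  # avoid log(0)
    return -(labels * np.log(probs) + (1.0 - labels) * np.log(1.0 - probs))


def joint_loss(onset_probs, onset_labels, frame_probs, frame_labels,
               frame_weights=None):
    """Sum of the onset loss and the (optionally weighted) frame loss.

    All arrays are assumed to have shape [num_frames, num_pitches].
    """
    onset_loss = binary_cross_entropy(onset_probs, onset_labels).mean()
    frame_ce = binary_cross_entropy(frame_probs, frame_labels)
    if frame_weights is not None:
        # Hypothetical per-element weights, e.g. larger near the onset frame.
        frame_ce = frame_ce * frame_weights
    return onset_loss + frame_ce.mean()
```

Minimizing the sum means a gradient step improves both detectors at once, rather than training the onset and frame models separately.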
At inference time they add a restriction: an activation from the frame detector may start a note only when the onset detector also reports an onset in that frame. During training, the frame loss is weighted according to the distance between the current frame and the onset frame.
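The inference restriction can be sketched as a small decoding loop. This is an illustrative version under assumptions, not the released implementation: both detectors are assumed to output probabilities of shape [num_frames, num_pitches], a single threshold of 0.5 is a guess, and the name `constrained_decode` is hypothetical.

```python
import numpy as np


def constrained_decode(frame_probs, onset_probs, threshold=0.5):
    """Return a boolean piano roll where notes may only start on an onset.

    A pitch is active in frame t if the frame detector fires and either
    the pitch was already active in frame t-1 or the onset detector fires
    in frame t.
    """
    frames = frame_probs > threshold
    onsets = onset_probs > threshold
    active = np.zeros_like(frames, dtype=bool)
    prev = np.zeros(frames.shape[1], dtype=bool)  # state of the previous frame
    for t in range(frames.shape[0]):
        prev = frames[t] & (prev | onsets[t])
        active[t] = prev
    return active
```

For a single pitch with frame probabilities 0.9, 0.9, 0.1, 0.9, 0.9 and an onset only in the second frame, the result is active only in the second frame: the first frame has no onset to start from, and the frame activations after the gap are discarded because no new onset accompanies them.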
Results:
On all three evaluation metrics, (1) frame, (2) note, and (3) note with offset, this method is much better than the previous state of the art.
Remarks:
(1) The input to the network is not a small frame context but 20-second segments.
(2) The onset detector's output is connected to the frame BLSTM. I guess the intuition behind this is to combine onset features with frame features.
(3) They did not share the conv layers between the frame and onset branches. Apparently, this allows each branch to learn better features.
(4) The performance gain in this work comes mainly from two sources: joint onset and frame training, and restricted inference.
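Remarks (1) and (2) can be illustrated with array shapes alone. The sketch below is an assumption-laden illustration, not the paper's architecture code: it assumes the onset output reaches the frame BLSTM by concatenation along the feature axis, and the function names, the frame rate, and the feature sizes are hypothetical.

```python
import numpy as np


def split_into_segments(features, frames_per_second, segment_seconds=20.0):
    """Cut a [num_frames, num_features] array into 20-second training segments.

    The last segment may be shorter than the others.
    """
    seg_len = int(round(frames_per_second * segment_seconds))
    return [features[i:i + seg_len] for i in range(0, len(features), seg_len)]


def frame_blstm_input(frame_features, onset_probs):
    """Combine frame-branch features with the onset detector's output.

    Assumes concatenation along the last axis, so each time step the
    frame BLSTM sees both its own features and the onset predictions.
    """
    return np.concatenate([frame_features, onset_probs], axis=-1)


# Usage with made-up sizes: 25 frames/s, 4 input features, 88 piano pitches.
spectrogram = np.zeros((1200, 4))
segments = split_into_segments(spectrogram, frames_per_second=25)
combined = frame_blstm_input(np.zeros((500, 16)), np.zeros((500, 88)))
```

With these sizes the 1200-frame input yields segments of 500, 500, and 200 frames, and the combined BLSTM input has 16 + 88 = 104 features per frame.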