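Georgia Tech's music technology group has three more papers on music performance assessment that do not use deep learning. All three were done on the Florida Bandmasters Association dataset. The first two are very similar; the last one uses sparse coding to learn features in an unsupervised way.
Honestly, none of the three has much of a highlight; the point of this review is just to pick over the scraps.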
Towards the Objective Assessment of Music Performances
ICMPC 2016
This paper can be skipped; just go straight to the next two.
Vidwans, Amruta
Gururani, Siddharth
Wu, Chih-Wei
Subramanian, Vinod
Swaminathan, Rupak Vignesh
Lerch, Alexander
Objective descriptors for the assessment of student music performances
http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2017/06/Vidwans-et-al_2017_Objective-descriptors-for-the-assessment-of-student-music-performances.pdf
The pitch track is aligned with the reference score using DTW, and the alignment is then used to segment the pitch track into notes. The extracted features fall into two groups: score-based and score-independent. Using both groups together works best.
Features:
(1) note steadiness
(2) duration histogram
(3) DTW-based features: alignment cost normalized by the DTW path length, and slope deviation
(4) note insertion ratio
(5) note deletion ratio
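The alignment step and the normalized DTW cost from item (3) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a frame-wise pitch track and a score given as one pitch per note (both in MIDI numbers), uses the absolute pitch difference as the local cost, and uses the textbook three-step DTW recursion. The function names `dtw_align` and `segment_notes` are hypothetical, and slope deviation is not computed here because the notes do not define it.

```python
import numpy as np

def dtw_align(pitch_track, score_pitches):
    """Align a frame-wise pitch track to a sequence of score pitches with DTW.

    Returns the warping path as (frame_index, note_index) pairs and the
    accumulated cost divided by the path length.
    """
    x = np.asarray(pitch_track, dtype=float)
    y = np.asarray(score_pitches, dtype=float)
    n, m = len(x), len(y)
    dist = np.abs(x[:, None] - y[None, :])  # assumed local cost: pitch difference

    # Accumulated cost matrix with an extra border row/column of infinities.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )

    # Backtrack from the last cell to the first to recover the warping path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(candidates, key=lambda ij: acc[ij])
        path.append((i - 1, j - 1))
    path.reverse()
    return path, acc[n, m] / len(path)

def segment_notes(path):
    """Group frame indices by the score note they were aligned to."""
    notes = {}
    for frame, note in path:
        notes.setdefault(note, []).append(frame)
    return notes

# Toy example: eight pitch frames played against a three-note score.
path, normalized_cost = dtw_align([60, 60, 60, 62, 62, 64, 64, 64], [60, 62, 64])
note_frames = segment_notes(path)
```

Once each note has its own frame range, per-note descriptors such as note steadiness or note durations can be computed from `note_frames`.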
Combining the score-based and score-independent features gives the best performance.
Dataset:
A subset of 394 students from the Florida Bandmasters Association dataset, covering three grades and four assessment dimensions.
Wu, Chih-Wei
Lerch, Alexander
Learned Features for the Assessment of Percussive Music Performances
http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/01/Wu_Lerch_2018_Learned-Features-for-the-Assessment-of-Percussive-Music-Performances.pdf
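This paper uses sparse coding to learn features in an unsupervised way and applies them to the assessment of percussion performances.
Dataset: 274 snare drum performances by middle school students. The assessed criteria are musicality and rhythmic accuracy.
Features:
Local histogram matrix (LHM). Three features are used: IOI, amplitude, and average MFCC. Each performance is cut into 10-second segments and a histogram is computed for every segment. The histograms of the three features are then concatenated along the feature dimension and the time dimension to form a matrix.
Model: features learned from the LHM by sparse coding, followed by SVR regression.
Results: sparse coding on the LHM and statistics computed directly from the LHM perform about equally well. Using both feature sets together works better.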
The paper uses sparse coding, an unsupervised method, to learn features for assessing percussive music performances.
Dataset: 274 recordings of middle school snare etudes. Assessment of musicality and rhythmic accuracy.
Features:
Local histogram matrix (LHM):
(1) IOI (inter-onset interval) histogram vector
(2) Amplitude histogram vector
(3) Average MFCC vector
They segment the whole piece into non-overlapping 10 s segments, compute the local histogram vectors of the three features above, and concatenate these vectors along both the feature and time dimensions.
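The LHM construction described above can be sketched like this. It is a rough illustration under several assumptions that are not in the paper notes: onsets, onset amplitudes, and frame-wise MFCCs are already extracted, the number of histogram bins and the histogram ranges are arbitrary, and the function name `local_histogram_matrix` is hypothetical.

```python
import numpy as np

def local_histogram_matrix(onset_times, onset_amps, mfcc, mfcc_times,
                           duration, seg_len=10.0, n_bins=8,
                           ioi_range=(0.0, 1.0), amp_range=(0.0, 1.0)):
    """Build one column per 10 s segment: [IOI hist, amplitude hist, mean MFCC].

    onset_times, onset_amps: one entry per detected onset (seconds, amplitude).
    mfcc: array of shape (n_frames, n_coeffs); mfcc_times: frame times in seconds.
    Bin count and ranges are illustrative assumptions.
    """
    onset_times = np.asarray(onset_times, dtype=float)
    onset_amps = np.asarray(onset_amps, dtype=float)
    mfcc = np.asarray(mfcc, dtype=float)
    mfcc_times = np.asarray(mfcc_times, dtype=float)

    columns = []
    n_segments = int(np.ceil(duration / seg_len))
    for k in range(n_segments):
        start, end = k * seg_len, (k + 1) * seg_len
        in_seg = (onset_times >= start) & (onset_times < end)

        # Inter-onset intervals between consecutive onsets inside the segment.
        iois = np.diff(onset_times[in_seg])
        ioi_hist, _ = np.histogram(iois, bins=n_bins, range=ioi_range)
        amp_hist, _ = np.histogram(onset_amps[in_seg], bins=n_bins, range=amp_range)

        # Average MFCC vector over the frames that fall inside the segment.
        frames = (mfcc_times >= start) & (mfcc_times < end)
        mean_mfcc = mfcc[frames].mean(axis=0) if frames.any() else np.zeros(mfcc.shape[1])

        # Concatenate along the feature dimension ...
        columns.append(np.concatenate([ioi_hist, amp_hist, mean_mfcc]))
    # ... and stack the segments along the time dimension.
    return np.stack(columns, axis=1)

# Toy input: an onset every 0.5 s for 25 s, constant amplitude, dummy MFCCs.
onsets = np.arange(0, 25, 0.5)
lhm = local_histogram_matrix(onsets, np.full(len(onsets), 0.5),
                             np.ones((250, 13)), np.arange(250) * 0.1,
                             duration=25.0)
```

With 8 bins per histogram and 13 MFCC coefficients, the toy input yields a matrix with 29 rows (features) and 3 columns (segments).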
Baseline:
(1) a set of designed features
(2) statistics of LHM features (crest, skewness, ...)
(3) Sparse code of STFT
Model:
SVR
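To make the "sparse code" input to the regressor concrete, here is a minimal sketch of computing sparse codes for a given dictionary with ISTA (iterative soft thresholding). This is a stand-in, not the paper's procedure: the notes do not say which sparse coding solver or dictionary learning method was used, nor how the codes are pooled into one feature vector per recording. The names `soft_threshold` and `sparse_codes`, the random dictionary, and the value of `lam` are all assumptions for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_codes(X, D, lam=0.1, n_iter=200):
    """Approximately solve min_A 0.5 * ||X - D A||_F^2 + lam * ||A||_1 with ISTA.

    X: (n_features, n_samples) data, e.g. columns of an LHM.
    D: (n_features, n_atoms) dictionary, assumed to be given.
    """
    # Step size 1/L, where L is the squared spectral norm of D.
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ A - X)  # gradient of the quadratic term
        A = soft_threshold(A - step * grad, step * lam)
    return A

# Toy example: a random dictionary with unit-norm atoms (an assumption;
# in practice the dictionary would be learned from training data).
rng = np.random.default_rng(0)
D = rng.standard_normal((29, 64))
D /= np.linalg.norm(D, axis=0, keepdims=True)
X = rng.standard_normal((29, 3))
codes = sparse_codes(X, D, lam=0.5)
```

The resulting codes, after some pooling over time, would then be passed to a regressor such as `sklearn.svm.SVR` to predict the assessment scores.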
Metrics:
Correlation coefficient and coefficient of determination (R^2)
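For reference, the two metrics can be computed as below. The sketch uses the standard textbook definitions (Pearson correlation, and R^2 as one minus the ratio of residual to total sum of squares); whether the paper uses exactly these variants is an assumption, and the function names are hypothetical.

```python
import numpy as np

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between ratings and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.corrcoef(y_true, y_pred)[0, 1])

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Predictions that track the ratings perfectly but are offset by a constant.
ratings = [1.0, 2.0, 3.0, 4.0]
predictions = [2.0, 3.0, 4.0, 5.0]
r = pearson_r(ratings, predictions)
r2 = r_squared(ratings, predictions)
```

In this toy case the correlation is perfect while R^2 is low, because correlation ignores a constant bias and R^2 does not. That is one reason to report both.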
Results:
The learned features (sparse codes of the LHM) achieve results comparable to the designed features. Combining the designed features with the SC features gives the highest performance.