Music Tempo and Beat Estimation (2): Tutorial Reading Notes
Reading notes on the ISMIR 2021 tutorial on tempo, beat, and downbeat estimation
Annotate
Automatic annotation + automatic correction + manual correction
Baseline
See "Basic Methods for Music Tempo and Beat Estimation" (音乐速度与节拍估计基本方法) | WiZardWen (wzw21.cn)
Evaluate
F-measure
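- Treats beat estimation as a binary classification problem for evaluation: each estimated beat either matches an annotation or it does not.
- Typically uses a tolerance window of +/- 70 ms around each ground truth annotation.
- Benefit of F-measure with this window: it either i) absorbs natural human variation in tapping without punishing it, or ii) copes with cases like arpeggiated chords where it is difficult to mark a single beat location.
- Computation: F = 2PR / (P + R), with precision P = TP / (TP + FP) and recall R = TP / (TP + FN), where an estimated beat inside a tolerance window counts as a true positive (TP).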
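The F-measure computation can be sketched as follows. This is a minimal illustration rather than the tutorial's code: the function name `beat_f_measure` is hypothetical, and the greedy one-to-one matching is a simplifying assumption (evaluation libraries may match estimates to annotations differently).

```python
def beat_f_measure(estimated, annotated, tolerance=0.07):
    """F-measure for beat times in seconds, with a +/- tolerance window."""
    estimated = sorted(estimated)
    annotated = sorted(annotated)
    used = [False] * len(estimated)  # each estimate may match only one annotation
    true_positives = 0
    for ann in annotated:
        # Assumption: greedily take the closest unused estimate inside the window.
        best, best_dist = None, None
        for i, est in enumerate(estimated):
            if used[i]:
                continue
            dist = abs(est - ann)
            if dist <= tolerance and (best_dist is None or dist < best_dist):
                best, best_dist = i, dist
        if best is not None:
            used[best] = True
            true_positives += 1
    if true_positives == 0:
        return 0.0
    false_positives = len(estimated) - true_positives  # unmatched estimates
    false_negatives = len(annotated) - true_positives  # unmatched annotations
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)
```

For example, with annotations at 0.5, 1.0, 1.5, 2.0 s and estimates at 0.52, 1.0, 1.6, 2.01 s, three estimates fall inside a window and one misses by 100 ms, so precision and recall are both 0.75 and the F-measure is 0.75.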
Cemgil’s method
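- Uses a Gaussian function of the timing error as the score, instead of a hard tolerance window.
- Use F-measure and Cemgil together: F-measure reflects whether the estimated beats have the right metrical level and phase, while Cemgil reflects how precisely the beat positions are located.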
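A minimal sketch of a Cemgil-style score is given below. The function name is hypothetical, and the Gaussian width `sigma=0.04` (40 ms) is an assumed default that the notes above do not specify. Each annotation is scored by a Gaussian of its distance to the nearest estimate, and the sum is normalised by the mean number of estimated and annotated beats.

```python
import math

def cemgil_accuracy(estimated, annotated, sigma=0.04):
    """Gaussian-weighted beat accuracy for beat times in seconds."""
    if not estimated or not annotated:
        return 0.0
    total = 0.0
    for ann in annotated:
        # Timing error to the nearest estimated beat
        nearest = min(abs(est - ann) for est in estimated)
        # Gaussian score: 1.0 for a perfect hit, decaying with the error
        total += math.exp(-(nearest ** 2) / (2 * sigma ** 2))
    # Normalising by the mean count penalises extra or missing beats
    return total / (0.5 * (len(estimated) + len(annotated)))
```

Unlike the F-measure, a beat that is 40 ms late still loses credit here (it scores about 0.61 with the assumed `sigma`), which is why the two measures complement each other.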
Continuity-based method
- Consider beat i to be accurate if it falls within the tolerance window and beat i-1 also falls within its respective tolerance window.
- CMLc: "correct" (i.e., annotated) metrical level, scored by the longest single continuous segment.
- CMLt: "correct" (i.e., annotated) metrical level, scored by the total of all continuous segments.
- AMLc: "allowed" metrical levels, scored by the longest single continuous segment.
- AMLt: "allowed" metrical levels, scored by the total of all continuous segments.
- The allowed metrical levels include:
- The same metrical level and "in-phase"
- The same metrical level, but tapped on the "off-beat"
- Twice the annotated metrical level
- Half the annotated metrical level (two variants, one for each choice of starting beat)
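The continuity rule for the annotated metrical level (CMLc and CMLt) can be sketched as below. This is a simplified illustration under stated assumptions, not a reference implementation: the function name is hypothetical, the window is assumed to be 17.5% of the local inter-annotation interval, and the first estimate is judged by its own window only.

```python
def continuity_scores(estimated, annotated, rel_tolerance=0.175):
    """Return (CMLc, CMLt) for sorted beat times in seconds (simplified)."""
    if len(estimated) < 2 or len(annotated) < 2:
        return 0.0, 0.0
    correct = []
    for i, est in enumerate(estimated):
        # Index of the annotation nearest to this estimate
        j = min(range(len(annotated)), key=lambda k: abs(annotated[k] - est))
        # Assumed window: a fraction of the local inter-annotation interval
        ref = j if j > 0 else 1
        window = rel_tolerance * (annotated[ref] - annotated[ref - 1])
        this_ok = abs(est - annotated[j]) <= window
        if i == 0:
            prev_ok = True   # no predecessor to check
        elif j == 0:
            prev_ok = False  # no earlier annotation for the previous estimate
        else:
            prev_ok = abs(estimated[i - 1] - annotated[j - 1]) <= window
        correct.append(this_ok and prev_ok)
    # Collect lengths of runs of consecutive correct beats
    runs, run = [], 0
    for ok in correct:
        if ok:
            run += 1
        else:
            if run:
                runs.append(run)
            run = 0
    if run:
        runs.append(run)
    denom = max(len(estimated), len(annotated))
    cmlc = max(runs, default=0) / denom  # longest continuous segment
    cmlt = sum(runs) / denom             # total of all continuous segments
    return cmlc, cmlt
```

With eight annotated beats one second apart and a single estimate displaced by 0.4 s in the middle, the sketch gives CMLc = 0.5 and CMLt = 0.75: the one bad beat also invalidates its successor and splits the sequence into segments of four and two beats. AMLc and AMLt would apply the same scoring against each allowed variant of the annotations and keep the best result.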
Others
- Choosing an evaluation method
- Extending to tempo and downbeat
Theoretical Underpinnings
General pipeline commonly used for beat and/or downbeat tracking systems: feature extraction -> likelihood estimation -> post-processing.
Feature extraction
- The three most explored categories of musically inspired features:
- Chroma (CH), which reflects the harmonic content of the signal
- Onset detection function (ODF) or spectral flux (SF), which are event-oriented indicators
- Spectral coefficients or MFCCs, which are timbre-inspired features
- More recently: combinations of logarithmic spectrograms with different resolutions
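As an illustration of an event-oriented feature, spectral flux can be sketched as the half-wave rectified frame-to-frame increase in spectral magnitude. The function name and the input layout (a list of frames, each a list of non-negative bin magnitudes) are assumptions made for this example; computing the spectrogram itself is left out.

```python
def spectral_flux(magnitudes):
    """Spectral flux per frame from a magnitude spectrogram.

    magnitudes: list of frames, each a list of non-negative bin magnitudes.
    """
    if not magnitudes:
        return []
    flux = [0.0]  # the first frame has no predecessor
    for prev, cur in zip(magnitudes, magnitudes[1:]):
        # Keep only energy increases (half-wave rectification), summed over bins
        flux.append(sum(max(c - p, 0.0) for p, c in zip(prev, cur)))
    return flux
```

Peaks in the resulting curve mark frames where energy rises sharply, which tends to coincide with note onsets.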
Likelihood estimation
- Heuristic or traditional machine learning methods
- Templates
- GMM, k-means: recognize rhythm patterns
- Limitation: these require assumptions about style or genre
- Deep learning methods
Inference (post-processing)
- Obtain the final downbeat sequence
- Most widely used: probabilistic graphical models (PGMs)
- PGMs are a set of probabilistic models that express conditional dependencies between random variables as a graph.
- The graph can be directed or undirected
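To make the inference stage concrete, here is a toy directed PGM: a hidden Markov model decoded with the Viterbi algorithm. Everything in it is an illustrative assumption rather than the tutorial's model: the hidden state is a position inside a fixed beat period of four frames, state 0 means "beat", and the activation values are made up. Practical trackers typically also model tempo.

```python
import math

NEG_INF = float("-inf")

def viterbi(log_init, log_trans, log_obs):
    """Most likely state path. log_trans[r][s] is the log-probability of r -> s."""
    n_states = len(log_init)
    score = [log_init[s] + log_obs[0][s] for s in range(n_states)]
    back = []  # back[t-1][s] = best predecessor of state s at frame t
    for t in range(1, len(log_obs)):
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda r: score[r] + log_trans[r][s])
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_obs[t][s])
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)
    # Backtrack from the best final state
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Toy setup (hypothetical values): per-frame beat activations in (0, 1)
period = 4
activation = [0.1, 0.9, 0.2, 0.1, 0.1, 0.8, 0.1, 0.2, 0.1, 0.9]
log_init = [math.log(1.0 / period)] * period
# The position always advances by one frame and wraps around at the period
log_trans = [[0.0 if s == (r + 1) % period else NEG_INF for s in range(period)]
             for r in range(period)]
# State 0 explains high activations, all other states explain low ones
log_obs = [[math.log(a) if s == 0 else math.log(1.0 - a) for s in range(period)]
           for a in activation]

path = viterbi(log_init, log_trans, log_obs)
beats = [t for t, s in enumerate(path) if s == 0]  # frames decoded as beats
```

Here the decoder places beats at frames 1, 5, and 9, the phase that best explains the three activation peaks.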
Pros and Cons of DL
- Pro: flexible and adaptable across tasks
- Pro: removes the stage of hand-crafted feature design
- Con: dependence on annotated data
- Con: bias in the data
- Con: lack of interpretability
DNN architectures
- Brief introductions to MLP, CNN, RNN, GRU, and bidirectional RNN
- TCN (temporal convolutional network):
- Uses dilated convolutions, which enable exponentially large receptive fields
- Good at learning sequential/temporal structure
- Handles more context while retaining the parallelisation property of CNNs
- Trains more efficiently than RNNs, LSTMs, or GRUs
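The claim about exponentially large receptive fields can be checked with a short calculation. The sketch below assumes one stride-1 convolution per layer with the same kernel size throughout; the function name and the example layer counts are hypothetical.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in frames) of stacked stride-1 dilated convolutions.

    Assumes one convolution per layer, all with the same kernel size.
    """
    # Each layer adds (kernel_size - 1) * dilation frames of context
    return 1 + (kernel_size - 1) * sum(dilations)

# Dilation doubles at every layer: 1, 2, 4, ..., 1024 (11 layers)
dilated = receptive_field(5, [2 ** i for i in range(11)])
# Same depth without dilation
plain = receptive_field(5, [1] * 11)
```

With a kernel size of 5 and 11 layers, doubling the dilation gives a receptive field of 8189 frames, against 45 frames for the undilated stack of the same depth. Context grows exponentially with depth while each layer remains an ordinary parallelisable convolution.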
Unfinished
Unless otherwise stated, all posts on this blog are licensed under CC BY-NC-SA 4.0. Please credit WiZardWen when reposting!