TMI 2019 | Multi-Modal Knowledge Distillation

机器学习炼丹术

Published 2023-09-26 16:53:33


Included in the column: 机器学习炼丹术

Related Work

Before presenting the method, we review the two key components that inspired the design of our multi-modal learning scheme:

separating internal feature normalizations for each modality, given the very different statistical distributions of CT and MRI;
knowledge distillation from pre-softmax activations.

In other words, the authors propose a new chilopod-shaped architecture and combine it with knowledge distillation.

Separate internal feature normalization

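CT and MRI each get their own normalization layers. All convolution kernels are shared between the two modalities; only the normalization layers are modality-specific.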
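To make the "shared kernels, separate normalization" idea concrete, here is a minimal numpy sketch. It is not the authors' implementation: the class name `SharedConvSeparateNorm`, the use of a 1x1 convolution (a plain channel-mixing matrix), and batch normalization as the normalization type are all illustrative assumptions. The point is only that one weight matrix serves both modalities, while the affine parameters and running statistics live in per-modality dictionaries.

```python
import numpy as np


class SharedConvSeparateNorm:
    """Hypothetical layer: a 1x1 convolution whose kernel is shared by all
    modalities, followed by batch normalization whose affine parameters and
    running statistics are kept separately for each modality."""

    def __init__(self, c_in, c_out, modalities=("CT", "MRI"),
                 eps=1e-5, momentum=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # Shared kernel: a 1x1 conv is just a (c_in x c_out) channel-mixing matrix.
        self.weight = rng.standard_normal((c_in, c_out)) * 0.1
        self.eps, self.momentum = eps, momentum
        # Modality-specific normalization parameters and statistics.
        self.gamma = {m: np.ones(c_out) for m in modalities}
        self.beta = {m: np.zeros(c_out) for m in modalities}
        self.running_mean = {m: np.zeros(c_out) for m in modalities}
        self.running_var = {m: np.ones(c_out) for m in modalities}

    def __call__(self, x, modality, training=True):
        # x: (N, H, W, C_in), channels-last; each batch holds a single modality.
        y = x @ self.weight  # shared convolution: same kernel for CT and MRI
        if training:
            mean = y.mean(axis=(0, 1, 2))
            var = y.var(axis=(0, 1, 2))
            m = self.momentum
            # Update only this modality's running statistics.
            self.running_mean[modality] = (1 - m) * self.running_mean[modality] + m * mean
            self.running_var[modality] = (1 - m) * self.running_var[modality] + m * var
        else:
            mean, var = self.running_mean[modality], self.running_var[modality]
        y_hat = (y - mean) / np.sqrt(var + self.eps)
        return self.gamma[modality] * y_hat + self.beta[modality]


# Usage on synthetic data with very different intensity distributions.
layer = SharedConvSeparateNorm(c_in=4, c_out=8)
rng = np.random.default_rng(1)
ct = rng.normal(100.0, 50.0, size=(2, 16, 16, 4))   # synthetic "CT-like" intensities
mri = rng.normal(0.0, 1.0, size=(2, 16, 16, 4))     # synthetic "MRI-like" intensities
out_ct = layer(ct, "CT")
out_mri = layer(mri, "MRI")
```

Because the two modalities never share normalization statistics, the CT batch's large intensity offset does not leak into the statistics used for MRI, while every convolution weight is still trained on both.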

Knowledge Distillation Loss

The assumption behind KD is that softmax probabilities carry richer information than one-hot outputs.
The process is described below for 2D; the authors note that it extends easily to 3D.

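Assume the pre-softmax activation tensor has shape N×W×H×C, where N is the batch size and C is the number of channels, which equals the number of classes. The goal is to distill one C-dimensional vector for each label.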
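The text does not spell out how the N×W×H×C tensor is reduced to one C-dimensional vector per label. A natural reading, and an assumption here, is to average the pre-softmax activation vectors over all pixels whose ground-truth label is that class. The function `class_wise_logits` below is a hypothetical numpy sketch of that reduction.

```python
import numpy as np


def class_wise_logits(activations, labels):
    """Average the pre-softmax activations over the pixels of each class.

    activations: (N, W, H, C) array, C = number of classes.
    labels:      (N, W, H) integer ground-truth map with values in [0, C).
    Returns a (C, C) array whose row c is the mean activation vector of class c
    (left as zeros if class c does not occur in the batch).
    """
    c = activations.shape[-1]
    flat_act = activations.reshape(-1, c)   # one row per pixel
    flat_lab = labels.reshape(-1)           # matching label per pixel
    z = np.zeros((c, c))
    for k in range(c):
        mask = flat_lab == k
        if mask.any():
            z[k] = flat_act[mask].mean(axis=0)
    return z


# Tiny example: one 2x2 image with C = 3 classes.
acts = np.zeros((1, 2, 2, 3))
acts[0, 0, 0] = [2.0, 0.0, 0.0]   # class 0
acts[0, 0, 1] = [4.0, 1.0, 0.0]   # class 0
acts[0, 1, 0] = [0.0, 3.0, 1.0]   # class 1
acts[0, 1, 1] = [0.0, 0.0, 5.0]   # class 2
labels = np.array([[[0, 0], [1, 2]]])
z = class_wise_logits(acts, labels)
# Row 0 is the mean of the two class-0 pixels: [3.0, 0.5, 0.0]
```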

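The probability distribution is then computed with a temperature-scaled softmax:

$$p_i = \frac{\exp(z_i / T)}{\sum_{j=1}^{C} \exp(z_j / T)}$$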

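Here z is the pre-softmax activation vector and p is the probability distribution after the softmax. T is the temperature scalar introduced in Distilling the Knowledge in a Neural Network (2015), which produces softer outputs. The authors set T = 2; with T = 1 the expression reduces to the ordinary softmax.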
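The effect of the temperature can be checked numerically. The sketch below (the function name `softmax_with_temperature` and the example logits are illustrative, not from the paper) applies the temperature-scaled softmax with T = 1 and T = 2 to the same vector: both keep the same ranking of classes, but T = 2 spreads probability mass more evenly.

```python
import numpy as np


def softmax_with_temperature(z, T=2.0):
    """Temperature-scaled softmax over the last axis:
    p_i = exp(z_i / T) / sum_j exp(z_j / T)."""
    scaled = np.asarray(z, dtype=float) / T
    # Subtracting the max does not change the result but avoids overflow.
    scaled = scaled - scaled.max(axis=-1, keepdims=True)
    e = np.exp(scaled)
    return e / e.sum(axis=-1, keepdims=True)


z = np.array([3.0, 1.0, 0.2])                  # illustrative pre-softmax vector
p_hard = softmax_with_temperature(z, T=1.0)    # ordinary softmax
p_soft = softmax_with_temperature(z, T=2.0)    # T = 2, the value used by the authors
```

Softer distributions expose how the network ranks the non-target classes, which is exactly the extra information KD tries to transfer.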

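Using this procedure, the authors obtain the per-label probability distributions for CT and MRI, denoted $q^a$ and $q^b$ for modalities a and b respectively.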
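The excerpt stops before the loss that compares $q^a$ and $q^b$ is written out. A common way to align two such distributions, and the assumption made in this sketch, is a symmetric Kullback-Leibler divergence averaged over labels. The names `kl` and `kd_loss` and the example probability tables are hypothetical.

```python
import numpy as np


def kl(p, q, eps=1e-12):
    """KL(p || q) along the last axis; clipping guards against log(0)."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)


def kd_loss(q_a, q_b):
    """Hypothetical KD loss: symmetric KL between the two modalities,
    averaged over labels. q_a, q_b: (C, C) arrays, row c = softened
    distribution for label c."""
    return float(np.mean(0.5 * (kl(q_a, q_b) + kl(q_b, q_a))))


q_a = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]])
q_b = np.array([[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.1, 0.3, 0.6]])
loss = kd_loss(q_a, q_b)
```

Symmetrizing the divergence means neither modality is treated as the fixed "teacher": both distributions are pulled toward each other.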