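Author: 木石 https://zhuanlan.zhihu.com/p/129279351 This article is published with the original author's permission; do not repost it without authorization.
Paper: https://arxiv.org/abs/2004.04730
Code (not yet open-sourced):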
https://github.com/facebookresearch/SlowFast
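PS: Christoph Feichtenhofer is the sole author of this work, which is seriously impressive! The acknowledgements (the support crew), however, include the following people (trembling.jpg):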
This paper presents X3D, a family of efficient video networks that progressively expand a tiny 2D image classification architecture along multiple network axes, in space, time, width and depth. Inspired by feature selection methods in machine learning, a simple stepwise network expansion approach is employed that expands a single axis in each step, such that good accuracy to complexity trade-off is achieved. To expand X3D to a specific target complexity, we perform progressive forward expansion followed by backward contraction. X3D achieves state-of-the-art performance while requiring 4.8× and 5.5× fewer multiply-adds and parameters for similar accuracy as previous work.
expand model from the 2D space into 3D spacetime domain.
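A few points to note about coordinate descent:
Consider an optimization task:
A general framework for block coordinate descent is shown in the figure below: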
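As a minimal sketch of the idea, the snippet below runs coordinate descent on a small convex quadratic, f(x) = 0.5 * x^T A x - b^T x. It is an illustrative assumption rather than anything from the paper: each "block" is a single coordinate, the matrix `A` is assumed symmetric positive definite, and the function name `coordinate_descent` is made up for this example.

```python
def coordinate_descent(A, b, num_sweeps=100):
    """Minimize f(x) = 0.5 * x^T A x - b^T x one coordinate at a time.

    Assumes A is symmetric positive definite (list of lists) and b is a list.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(num_sweeps):
        for i in range(n):
            # Hold every other coordinate fixed and solve df/dx_i = 0 exactly.
            rest = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - rest) / A[i][i]
    return x


# Toy problem: the minimizer solves A x = b.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x_star = coordinate_descent(A, b)
print(x_star)
```

Each inner step only touches one coordinate, which is the property X3D borrows: improve along a single axis at a time instead of searching all axes jointly.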
find relevant features to improve in a greedy fashion by including (forward selection) a single feature in each step, or start with a full set of features and aim to find irrelevant ones that are excluded by repeatedly deleting the feature that reduces performance the least (backward elimination).
X2D baseline
The paper designs the following expansion operations:
Definition
expansion is simple and cheap e.g. our low-compute model is completed after only training 30 tiny models that accumulatively require over 25× fewer multiply-add operations for training than one large state-of-the-art network
If the expanded model exceeds the target complexity (GFLOPs), the expansion rate of the last step is reduced slightly, for example to a value a little below 2.
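The forward expansion and backward contraction described above can be sketched roughly as follows. Everything concrete here is an assumption made for illustration: the four axes are a simplification taken from the abstract's wording, `complexity` is a toy cost model (product of axis factors), and `toy_accuracy` is a hypothetical stand-in for actually training and validating each candidate model.

```python
import math

AXES = ("space", "time", "width", "depth")  # simplified set of axes (assumption)


def complexity(cfg):
    # Toy cost model (assumption): cost is the product of all axis factors.
    return math.prod(cfg[a] for a in AXES)


def toy_accuracy(cfg):
    # Hypothetical stand-in for "train the candidate and measure accuracy".
    # Each axis helps, with diminishing returns as it grows.
    weights = {"space": 1.0, "time": 0.8, "width": 0.6, "depth": 0.4}
    return sum(w * (1.0 - 1.0 / cfg[a]) for a, w in weights.items())


def expand_to_target(target, rate=2.0):
    cfg = {a: 1.0 for a in AXES}
    history = []
    while complexity(cfg) < target:
        # Forward expansion: expand each axis alone, keep the best candidate.
        candidates = []
        for axis in AXES:
            cand = dict(cfg)
            cand[axis] *= rate
            candidates.append((toy_accuracy(cand), axis, cand))
        _, axis, best = max(candidates, key=lambda c: c[0])
        history.append(axis)
        if complexity(best) > target:
            # Backward contraction: shrink the last expansion to hit the target.
            cfg = dict(cfg)
            cfg[axis] *= target / complexity(cfg)
            break
        cfg = best
    return cfg, history


cfg, history = expand_to_target(target=12.0)
print(history, cfg)
```

In this toy run the last step would overshoot the target with a rate of 2, so it is contracted to an effective rate of 1.5. In the real procedure the ranking of candidates comes from trained models, not from a closed-form score.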
The figure shows the best axis to expand that was selected at each step.
The corresponding selected models
version of X3D-M therefore has the same number of parameters,