Paper Notes: BBN: Bilateral-Branch Network with Cumulative Learning for Long-Tailed Visual Recognition
2022-08-11 04:52:00 【shier_smile】

Paper: https://arxiv.org/abs/1912.02413
Code: https://github.com/megvii-research/BBN
1 Motivation
1.1 Problem
The author points out that the class re-balancing methods commonly used in long-tailed tasks achieve good results, but they damage the model's ability to represent deep features.

The author's analysis: class re-balancing methods can produce better classification performance, but after re-balancing, the intra-class distribution of each class becomes more dispersed.
1.2 How class re-balancing strategies work
To further verify this claim, controlled experiments were performed, varying a single factor at a time.
The model is divided into two parts: a feature extractor (the backbone) and a classifier.
For class re-balancing, two methods were selected, Re-Sampling (RS) and Re-Weighting (RW), along with the Cross Entropy loss commonly used in ordinary classification.
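As a concrete reference for what RW can look like, here is a common inverse-frequency form; this is my own sketch, not necessarily the exact scheme used in the paper:

```python
def inverse_frequency_weights(class_counts):
    # Per-class loss weights proportional to 1 / N_i, rescaled so the
    # weights average to 1 across classes (the normalization choice is mine).
    raw = [1.0 / n for n in class_counts]
    scale = len(class_counts) / sum(raw)
    return [w * scale for w in raw]
```

On long-tailed counts such as [1000, 100, 10], the tail class receives the largest weight, so its samples contribute more to the loss during training.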
A two-stage experimental protocol was designed:
(1) Representation learning manner: first train the feature extractor part of the classification model directly with Cross Entropy or a class re-balancing method.
(2) Classifier learning manner: fix the feature extractor's parameters, then train only the classifier part of the model, following the training strategies in (1).

Comparing horizontally (holding the classifier learning manner fixed): using RW (Re-Weighting) or RS (Re-Sampling) in the representation learning manner degrades performance.
Comparing vertically (holding the representation learning manner fixed): using RW or RS in the classifier learning manner improves classification performance.
This phenomenon appears not only on Long-tailed CIFAR-100-IR50 (left figure) but is also reflected on Long-tailed CIFAR-10-IR50 (right figure).
Using Cross Entropy when training the feature extractor and RS when training the classifier achieves the best classification results.
Conclusion: RW and RS can improve the classifier's performance, but they degrade the model's deep-feature representations.
1.3 Solution
A unified model, BBN, is proposed, which takes both representation learning and classifier learning into account.
A new cumulative learning strategy is developed to adjust the learning of BBN's two branches (conventional learning and re-balancing). Concretely: during training, the model is first inclined to learn universal patterns, then gradually shifts its focus to the tail classes.
2 BBN(Bilateral-Branch Network)

2.1 Bilateral-branch structure
(1) Data samplers
The conventional learning branch uses a uniform sampler.
The re-balancing branch uses a reversed sampler, whose sampling probability is computed as:
$$P_i=\frac{w_i}{\sum^C_{j=1}w_j},\qquad w_i=\frac{N_{max}}{N_i}$$
A class is first sampled with probability $P_i$, and then a sample is drawn uniformly from that class. The samples obtained by the two branches are then fed into the model simultaneously for training.
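The reversed sampler's class probabilities can be computed directly from the per-class sample counts (the function name is mine):

```python
def reversed_sampling_probs(class_counts):
    # w_i = N_max / N_i, then normalize: P_i = w_i / sum_j w_j.
    # Rare classes get the highest sampling probability.
    n_max = max(class_counts)
    w = [n_max / n for n in class_counts]
    total = sum(w)
    return [wi / total for wi in w]
```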
(2) Weight sharing
ResNet-32 and ResNet-50 are used as backbone networks; except for the last residual block, the weights of all other blocks are shared between the two branches.
Benefits:
- The features learned on the conventional learning branch can be better exploited by the re-balancing branch.
- The amount of network computation is reduced.
2.2 Cumulative learning strategy
During training, a parameter $\alpha$ adjusts the weights of the two branches; $\alpha$ gradually decreases as the epochs increase, while during inference $\alpha$ is simply set to 0.5.
During training, $\alpha$ changes as:
$$\alpha=1-\left(\frac{T}{T_{max}}\right)^2$$
where $T$ is the current epoch and $T_{max}$ is the maximum epoch.
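The schedule is simple enough to state as a one-line function (a sketch; the name is mine):

```python
def alpha_schedule(T, T_max):
    # alpha = 1 - (T / T_max)^2: starts at 1, so the conventional branch
    # dominates early, and decays to 0, handing over to the re-balancing branch.
    return 1.0 - (T / T_max) ** 2
```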
2.3 Output logit and loss function
Output logit:
$$z=\alpha W^T_c f_c+(1-\alpha)W^T_r f_r$$
Loss:
$$L=\alpha E(\hat{p}, y_c)+(1-\alpha)E(\hat{p}, y_r)$$
where:
$f_c$: the feature vector obtained after GAP in the conventional learning branch
$f_r$: the feature vector obtained after GAP in the re-balancing branch
$W^T$: the classifier weights.
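Putting the logit fusion and the loss together, a minimal plain-Python sketch (helper names and the toy shapes are mine) might look like:

```python
import math

def matvec(W_T, f):
    # W^T f, with W_T given as rows, each of length len(f)
    return [sum(w * x for w, x in zip(row, f)) for row in W_T]

def softmax_ce(logits, label):
    # Cross-entropy E(p_hat, y) computed via a numerically stable log-sum-exp
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[label]

def bbn_logits_and_loss(alpha, Wc_T, fc, Wr_T, fr, yc, yr):
    # z = alpha * Wc^T fc + (1 - alpha) * Wr^T fr,
    # L = alpha * E(p_hat, yc) + (1 - alpha) * E(p_hat, yr)
    zc = matvec(Wc_T, fc)
    zr = matvec(Wr_T, fr)
    z = [alpha * a + (1 - alpha) * b for a, b in zip(zc, zr)]
    loss = alpha * softmax_ce(z, yc) + (1 - alpha) * softmax_ce(z, yr)
    return z, loss
```

Note that both cross-entropy terms are evaluated on the same fused logits $z$; only the target labels, one from each branch's sampler, differ.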
3 Experiments
3.1 Experimental settings
1 CIFAR-LT (10, 100)
(1) Preprocessing:
- random crop: 32x32
- horizontal flip
- padding: 4 pixels on each side
(2) Backbone: ResNet-32
(3) Training details:
- momentum: 0.9
- weight decay: $2\times10^{-4}$
- batch size: 128
- epochs: 200
- lr scheduler: multistep (0, 120, 160), gamma: 0.01, start lr = 0.1
2 iNaturalist (2017, 2018)
(1) Preprocessing:
- random_resized_crop (first resize to 256, then crop to 224)
- random_horizontal_flip
(2) Backbone: ResNet-50
(3) Training details:
- momentum: 0.9
- weight decay: $1\times10^{-4}$
- batch size: 128 (parameter from the code)
- epochs: 180 (parameter from the code)
- lr scheduler: multistep (0, 120, 160), gamma: 0.1, base_lr: 0.4 (parameters from the code). Note: the paper gives the milestone epochs as 60 and 80 here, but the code uses 120 and 160.
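For reference, a multistep schedule of this kind just multiplies the learning rate by gamma at each milestone epoch reached. A sketch of my own (using the 120/160 milestones and treating the leading 0 in the tuple as the start epoch):

```python
def multistep_lr(base_lr, milestones, gamma, epoch):
    # Multiply base_lr by gamma once for every milestone already reached.
    decays = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** decays
```

For example, with base_lr 0.4, milestones [120, 160], and gamma 0.1, the learning rate is 0.4 before epoch 120, 0.04 from 120 to 159, and 0.004 from epoch 160 on.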
3.2 Experimental results
1 Performance comparison with other class-balancing methods


2 Self-comparison
Different sampling methods are compared on the re-balancing branch.

Comparison of different schedules for $\alpha$.

3 Ablation study

The performance of BBN's conventional branch is close to that of directly using CE, which shows that the BBN model preserves its feature-extraction capability on long-tailed data. Meanwhile, BBN's re-balancing branch performs better than RW and RS; the author attributes this to weight sharing within the model, which lets the features learned by the conventional branch be better exploited on the re-balancing branch.

The author also visualizes the weights in BBN's classifier and compares them with those of other class-balancing methods:
- BBN-ALL has the smallest variance; although RW and RS have relatively flat distributions, their variances are slightly larger than BBN-ALL's.
- The distribution of BBN-CB (the conventional branch) is similar to that of CE.
- The distribution of BBN-RB (the re-balancing branch) fits the reversed sampling distribution.
4 References
Paper: BBN: Bilateral-Branch Network with Cumulative Learning for Long-Tailed Visual Recognition
Blog: https://zhuanlan.zhihu.com/p/109648173
This post was written on August 8, 2022. Reproduction without the author's permission is prohibited.