SENet | Attention Mechanism - Source Code + Comments
2022-04-22 03:47:00 【Graduate students are not late】
1 SENet Introduction
- SENet is short for Squeeze-and-Excitation Networks, proposed by the company Momenta. The paper appeared at CVPR 2017, and SENet won the image classification championship of the last ImageNet competition (ImageNet 2017).
- SENet mainly learns the correlation between channels and distills a channel-wise attention. It adds a small amount of computation but gives better results.
- It automatically learns the importance of each feature channel, then uses this importance to enhance useful features and suppress features that are of little use to the current task (the equations after this list summarize the computation).
- The idea of the SE module is simple, it is easy to implement, and it is easy to plug into existing network architectures.
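As a compact reference, the SE computation can be written in the notation of the original Squeeze-and-Excitation paper (here δ is ReLU, σ is the sigmoid gate, C is the number of channels and r is the reduction ratio); the three steps are detailed in Section 3:

z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} x_c(i, j)                      % Squeeze: global average pooling over H×W
s = \sigma\big( W_2 \, \delta( W_1 z ) \big), \quad W_1 \in \mathbb{R}^{C/r \times C},\; W_2 \in \mathbb{R}^{C \times C/r}   % Excitation: bottleneck FC layers + sigmoid gate
\tilde{x}_c = s_c \cdot x_c                                                             % Scale: channel-wise recalibration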
2 SENet advantages
- Adds only a small number of parameters, and can improve the accuracy of a model to a certain extent.
- A strategy built on top of ResNet with good novelty; it is well suited for building new, high-accuracy models.
- Easy to insert into your own deep neural network model to improve its accuracy (a sketch of inserting it into a residual block follows the SELayer code below).
3 The SE module in detail
- Squeeze: compress the features along the spatial dimensions (H×W), turning each two-dimensional feature map into a single real number per channel. Each of these real numbers has, to some extent, a global receptive field, and the output dimension matches the number of input feature channels. It represents the global distribution of responses over the feature channels, and lets even layers close to the input obtain a global receptive field.
  Concrete operation (corresponding one-to-one to the numbers in the code): apply global average pooling to the original 50×512×7×7 feature map, which yields a 50×512×1×1 feature map with a global receptive field.
- Excitation: the 50×512×1×1 feature map is passed through two fully connected layers, followed by a gating mechanism similar to the gates in recurrent neural networks. The learned parameters generate a weight for each feature channel and explicitly model the correlation between channels (the paper uses sigmoid). 50×512×1×1 becomes 50×(512/16)×1×1 and is then restored to 50×512×1×1.
- Feature recalibration: the Excitation output is used as a set of per-channel weights, which are multiplied channel by channel onto the C channels of U (the 50×512×1×1 weights are broadcast to 50×512×7×7 via expand_as). This completes the recalibration of the original features in the channel dimension, and the result is fed into the next layer.
from torch import nn


class SELayer(nn.Module):
    def __init__(self, channel, reduction=16):
        super(SELayer, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)  # Squeeze: global average pooling -> b×c×1×1
        self.fc = nn.Sequential(                 # Excitation: FC (compress) -> ReLU -> FC (restore) -> Sigmoid
            nn.Linear(channel, channel // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)          # b×c×1×1 -> b×c
        y = self.fc(y).view(b, c, 1, 1)          # per-channel weights, reshaped to b×c×1×1
        return x * y.expand_as(x)                # Scale: reweight each channel of x
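To illustrate the "easy to insert into an existing network" point above, here is a minimal sketch of dropping SELayer into a ResNet-style basic block. The block structure and names (SEBasicBlock, conv3x3) are assumptions made only for this example, not part of any specific library; it reuses the SELayer class defined above.

import torch
from torch import nn


def conv3x3(in_planes, out_planes, stride=1):
    # Hypothetical helper for this sketch: 3x3 convolution with padding
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, padding=1, bias=False)


class SEBasicBlock(nn.Module):
    # Minimal residual block with an SELayer applied before the residual addition
    def __init__(self, planes, reduction=16):
        super().__init__()
        self.conv1 = conv3x3(planes, planes)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = nn.BatchNorm2d(planes)
        self.se = SELayer(planes, reduction)   # channel attention on the block output

    def forward(self, x):
        identity = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = self.se(out)                     # recalibrate channels
        return self.relu(out + identity)       # residual connection


if __name__ == '__main__':
    block = SEBasicBlock(planes=512, reduction=16)
    x = torch.randn(50, 512, 7, 7)
    print(block(x).shape)                      # torch.Size([50, 512, 7, 7])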



4 Complete code
import numpy as np
import torch
from torch import nn
from torch.nn import init


class SEAttention(nn.Module):

    def __init__(self, channel=512, reduction=16):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)  # global average pooling, output is c×1×1
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction, bias=False),  # channel // reduction compresses the channels
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel, bias=False),  # restore the original channel count
            nn.Sigmoid()
        )

    def init_weights(self):
        for m in self.modules():
            print(m)  # does not execute in this demo, because init_weights is never called below
            if isinstance(m, nn.Conv2d):          # type check: is m an instance of nn.Conv2d?
                init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):
                init.constant_(m.weight, 1)
                init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                init.normal_(m.weight, std=0.001)
                if m.bias is not None:
                    init.constant_(m.bias, 0)

    def forward(self, x):
        b, c, _, _ = x.size()                     # 50×512×7×7
        y = self.avg_pool(x).view(b, c)           # ① after avg_pool: 50×512×1×1  ② view reshapes to 50×512
        y = self.fc(y).view(b, c, 1, 1)           # 50×512×1×1
        return x * y.expand_as(x)                 # expand y according to x.size()


if __name__ == '__main__':
    input = torch.randn(50, 512, 7, 7)
    se = SEAttention(channel=512, reduction=8)    # instantiate the model se
    output = se(input)
    print(output.shape)
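Note that init_weights is never invoked in the demo above, which is why its print(m) line never runs. If you want the explicit initialization, a minimal sketch (reusing the SEAttention class, imports, and tensor shape from the code above) is simply to call it after constructing the module:

se = SEAttention(channel=512, reduction=8)
se.init_weights()                          # walks self.modules() and re-initializes the Linear layers
output = se(torch.randn(50, 512, 7, 7))
print(output.shape)                        # torch.Size([50, 512, 7, 7])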
Copyright notice
This article was created by [Graduate students are not late]. When reposting, please include a link to the original article. Thank you.
https://yzsam.com/2022/04/202204220345024898.html