
Lightweight Networks (1): The MobileNet V1, V2, and V3 Series

2022-08-11 08:45:00 Tao Jiang



In real applications, model accuracy is not the only concern; inference speed also matters. Lightweight networks emerged to balance accuracy and speed. They deliver performance comparable to bulky models while using fewer parameters and less computation, which makes them more hardware-friendly. Several families have appeared to date: the SqueezeNet series, the MobileNet series, the ShuffleNet series, EfficientNet, and others. This article covers only the evolution of MobileNet from V1 to V3.

MobileNet V1

The main innovation of MobileNet V1 is replacing standard convolutions with depthwise separable convolutions, which effectively reduces both the computation and the number of model parameters. As the table below shows, the MobileNet built with depthwise separable convolutions, compared to the same network built with standard convolutions, loses 1.1% accuracy on the ImageNet dataset, but has about 7x fewer parameters and about 9x fewer multiply-add operations.
[Table: ImageNet accuracy, mult-adds, and parameters of MobileNet with depthwise separable vs. standard convolutions]

Computational cost of convolution

Standard convolution

Assume a standard convolution takes an input feature map $F$ of size $D_F \times D_F \times M$ and produces an output feature map $G$ of size $D_F \times D_F \times N$, where $D_F$ is the spatial width and height of the feature map, and $M$ and $N$ are the numbers of input and output channels. Let the convolution kernel be $K$, with size $D_K \times D_K \times M \times N$, where $D_K$ is the kernel width and height.

For a standard convolution with stride 1 and padding chosen so that the output feature map has the same width and height as the input, the computational cost is $D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F$.

Depthwise separable convolution

A depthwise separable convolution is a form of factorized convolution. It decomposes a standard convolution into a **depthwise convolution** and a $1 \times 1$ convolution, where the $1 \times 1$ convolution is called a pointwise convolution. The depthwise convolution's filters have size $D_K \times D_K \times 1 \times M$: it applies a single filter to each input channel. The pointwise convolution then uses $1 \times 1$ convolutions to fuse the output channels of the depthwise convolution. This factorization effectively reduces computation and model size.

Suppose the input feature map of a depthwise separable convolution is $D_F \times D_F \times M$ and its output feature map is $D_F \times D_F \times N$, where $D_F$ is the width and height of the feature map and $M$, $N$ are the numbers of input and output channels.

  1. Depthwise convolution: the kernel is $D_K \times D_K \times 1 \times M$, and the output feature map has size $D_F \times D_F \times M$.
  2. $1 \times 1$ convolution: the kernel is $1 \times 1 \times M \times N$, and the final output has size $D_F \times D_F \times N$.

The computational cost of a depthwise separable convolution is the sum of the depthwise and pointwise costs: $D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F$.
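
To make the decomposition concrete, here is a minimal PyTorch sketch of a MobileNet V1-style depthwise separable block (depthwise 3x3, BN, ReLU, then pointwise 1x1, BN, ReLU). The class name and defaults are illustrative, not taken from any reference implementation:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 conv followed by a pointwise 1x1 conv (MobileNet V1 style)."""
    def __init__(self, in_channels: int, out_channels: int, stride: int = 1):
        super().__init__()
        # Depthwise: one D_K x D_K filter per input channel (groups=in_channels).
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   stride=stride, padding=1,
                                   groups=in_channels, bias=False)
        self.bn1 = nn.BatchNorm2d(in_channels)
        # Pointwise: 1x1 conv fuses the M depthwise outputs into N channels.
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1,
                                   bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.relu(self.bn1(self.depthwise(x)))
        return self.relu(self.bn2(self.pointwise(x)))

# Example: a 32 -> 64 channel block on a 112x112 feature map.
y = DepthwiseSeparableConv(32, 64)(torch.randn(1, 32, 112, 112))
```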

The following formula gives the ratio of the cost of a depthwise separable convolution to that of a standard convolution: the depthwise separable convolution costs $\frac{1}{N} + \frac{1}{D_K^2}$ of the standard one. Typically the kernel size is $3 \times 3$, so $D_K^2 = 9$; since the number of output channels $N$ is generally much larger than 9, the ratio is only slightly above $\frac{1}{9}$, i.e., roughly an 8x to 9x reduction in computation.
$$\frac{\text{depthwise separable}}{\text{standard}} = \frac{D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F}{D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F} = \frac{1}{N} + \frac{1}{D_K^2}$$
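
As a quick sanity check of this ratio, the following sketch plugs in typical sizes; all the numbers here are illustrative:

```python
D_K, D_F, M, N = 3, 14, 256, 256  # illustrative kernel, feature map, channel sizes

standard = D_K * D_K * M * N * D_F * D_F                   # standard conv mult-adds
separable = D_K * D_K * M * D_F * D_F + M * N * D_F * D_F  # depthwise + pointwise

print(separable / standard)   # ~0.115, i.e. slightly above 1/9
print(1 / N + 1 / D_K**2)     # same value: 1/N + 1/D_K^2
```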

The MobileNet V1 network architecture is shown below:
[Figure: MobileNet V1 network architecture]

Slimming the model

From the comparison of depthwise separable and standard convolutions above, the MobileNet architecture already has fewer parameters and less computation than other convolutional network structures, but many scenarios call for even smaller and faster models. To obtain smaller models, the paper introduces two hyperparameters, the width multiplier $\alpha$ and the resolution multiplier $\rho$, to slim the model. The width multiplier scales the number of channels in each layer; the resolution multiplier scales the resolution of the input image.

Width multiplier

$\alpha$ is called the width multiplier; its role is to thin the network uniformly. For a given layer with width multiplier $\alpha$, the number of input channels $M$ becomes $\alpha M$ and the number of output channels $N$ becomes $\alpha N$. Here $\alpha \in (0, 1]$, typically $1, 0.75, 0.5, 0.25$; $\alpha = 1$ is the baseline MobileNet, and $\alpha < 1$ gives a reduced MobileNet. The computational cost of the reduced depthwise separable convolution is shown below; computation drops by roughly a factor of $\alpha^2$.
$$D_K \cdot D_K \cdot \alpha M \cdot D_F \cdot D_F + \alpha M \cdot \alpha N \cdot D_F \cdot D_F$$

Resolution multiplier

$\rho$ is called the resolution multiplier. Scaling the input size by $\rho$ scales every feature map in the network by $\rho$ as well. Here $\rho \in (0, 1]$; the network input resolution is typically $224, 192, 160, 128$. $\rho = 1$ is the baseline MobileNet, and $\rho < 1$ gives a reduced MobileNet. The computational cost of the reduced depthwise separable convolution is shown below; computation is reduced by a factor of $\rho^2$.

$$D_K \cdot D_K \cdot \alpha M \cdot \rho D_F \cdot \rho D_F + \alpha M \cdot \alpha N \cdot \rho D_F \cdot \rho D_F$$
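
Combining both multipliers, here is a small helper (function name and sizes are illustrative) showing how $\alpha$ and $\rho$ compound:

```python
def dws_mult_adds(D_K: int, D_F: int, M: int, N: int,
                  alpha: float = 1.0, rho: float = 1.0) -> float:
    """Mult-adds of a depthwise separable conv under width (alpha)
    and resolution (rho) multipliers."""
    M, N = alpha * M, alpha * N   # width multiplier scales channel counts
    D_F = rho * D_F               # resolution multiplier scales the feature map
    return D_K * D_K * M * D_F * D_F + M * N * D_F * D_F

base = dws_mult_adds(3, 14, 256, 256)
slim = dws_mult_adds(3, 14, 256, 256, alpha=0.5, rho=160 / 224)
# The ratio is roughly alpha^2 * rho^2 for the dominant pointwise term.
print(slim / base)
```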

MobileNet V2

The innovation of MobileNet V2 is the inverted residual with linear bottleneck. In the paper, the authors find that applying a nonlinearity in a low-dimensional space loses significant information, whereas in a high-dimensional space the loss is relatively small. They therefore introduce inverted residuals: first expand the dimension, then perform the convolution, so that features are better preserved. Since expansion increases computation, the block must project back to a lower dimension; and since a nonlinearity would destroy information there, the projection is done linearly, hence the name linear bottleneck.

Bottleneck residual block

[Figure: MobileNet V1 block vs. MobileNet V2 bottleneck residual block]
As shown above, compared with MobileNet V1, MobileNet V2 keeps the depthwise convolution and the $1 \times 1$ convolution and adds a $1 \times 1$ linear convolution layer; this basic component of MobileNet V2 is called the bottleneck residual block. Note that a $1 \times 1$ convolution sits right after the input, before the depthwise convolution, to expand the dimensionality. Experiments verify that making the final $1 \times 1$ convolution linear prevents the nonlinearity from destroying too much information. When the stride is 1, the block borrows the residual connection, but unlike the classical residual block, which first reduces and then restores the channel count, MobileNet V2 first increases and then reduces it.

Suppose the input feature map has size $h \times w \times k$. It first passes through a $1 \times 1$ convolution, giving a feature map of size $h \times w \times tk$, where $t$ is the expansion factor; the MobileNet V2 architecture uses $t = 6$. Next comes a $3 \times 3$ depthwise convolution with stride $s$, giving a feature map of size $\frac{h}{s} \times \frac{w}{s} \times tk$. Finally, a $1 \times 1$ linear convolution produces the output feature map of size $\frac{h}{s} \times \frac{w}{s} \times k'$.
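
A minimal PyTorch sketch of this inverted residual block, assuming a 3x3 depthwise kernel and ReLU6 activations as in the paper (class and argument names are illustrative):

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """Expand (1x1) -> depthwise (3x3) -> linear project (1x1), MobileNet V2 style."""
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1, t: int = 6):
        super().__init__()
        hidden = in_ch * t
        # Residual connection only when the block preserves shape.
        self.use_residual = (stride == 1 and in_ch == out_ch)
        self.block = nn.Sequential(
            # 1x1 expansion: k -> t*k channels
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 3x3 depthwise convolution with stride s
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1,
                      groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 1x1 linear projection: t*k -> k' channels, no nonlinearity
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.block(x)
        return x + out if self.use_residual else out
```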

[Table: bottleneck residual block, layer-by-layer input and output sizes]

Model architecture

The MobileNet V2 network architecture is shown below, where $t$ is the expansion factor, $c$ is the number of output channels, $n$ is the number of repetitions, and $s$ is the stride.

[Table: MobileNet V2 network architecture]

MobileNet V3

MnasNet adds an SE module on top of the MobileNet V2 bottleneck block to introduce lightweight attention. As shown in the figure below, so that the attention acts on the largest representation, the SE module is placed after the expanded depthwise convolution. The MobileNet V3 architecture uses both the MnasNet and the MobileNet V2 building blocks.

[Figure: MobileNet V2 block with a Squeeze-and-Excite module, as used in MobileNet V3]
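
For illustration, here is a sketch of the SE module itself (class name is hypothetical; MobileNet V3 gates with a hard sigmoid and a reduction ratio of 4):

```python
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """Squeeze-and-Excite: global pool -> bottleneck MLP -> per-channel gate."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)   # squeeze: global spatial average
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Hardsigmoid(),                 # hard-sigmoid gate, as in MobileNet V3
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # excite: reweight each channel
```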
The MobileNet V3 architecture is constructed in two steps. First, a base architecture is found by a combined search using platform-aware NAS and the NetAdapt algorithm; then several new components are introduced to improve performance, yielding the final model. Platform-aware NAS searches the global network structure by optimizing over network blocks, while the NetAdapt algorithm searches the number of filters per layer. MobileNet V3 also introduces a new nonlinearity, h-swish (an improved version of swish), which is faster to compute and more quantization-friendly. Finally, while maintaining accuracy, MobileNet V3 redesigns the computationally expensive initial and final layers of the network to reduce latency. See the paper for details.

$$\mathrm{swish}(x) = x \cdot \sigma(x), \qquad \text{h-swish}(x) = x \cdot \frac{\mathrm{ReLU6}(x + 3)}{6}$$
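
Implemented with ReLU6 alone, h-swish is cheap and quantization-friendly; a minimal sketch follows (recent PyTorch versions also ship this as nn.Hardswish):

```python
import torch
import torch.nn.functional as F

def h_swish(x: torch.Tensor) -> torch.Tensor:
    """h-swish(x) = x * ReLU6(x + 3) / 6, a piecewise-linear stand-in for swish."""
    return x * F.relu6(x + 3.0) / 6.0
```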

MobileNet V3 comes in two models: MobileNetV3-Large and MobileNetV3-Small, targeting high- and low-resource use cases respectively. Their network structures are shown in the tables below, where SE indicates whether the block includes a Squeeze-and-Excite module, NL is the type of nonlinearity (HS for h-swish, RE for ReLU), and NBN means no batch normalization.

The MobileNetV3-Large network structure is as follows:

[Table: MobileNetV3-Large network structure]
The MobileNetV3-Small network structure is as follows:
[Table: MobileNetV3-Small network structure]

Comparison of MobileNet V1, V2, and V3

Below is a comparison of the MobileNet V1-V3 networks' top-1 accuracy and parameter counts on ImageNet-1k.

| Network | Top-1 (%) | Params |
| --- | --- | --- |
| MobileNetV1 | 70.6 | 4.2M |
| MobileNetV2 | 72.0 | 3.4M |
| MobileNetV2 (1.4) | 74.7 | 6.9M |
| MobileNetV3-Large (1.0) | 75.2 | 5.4M |
| MobileNetV3-Large (0.75) | 73.3 | 4.0M |
| MobileNetV3-Small (1.0) | 67.4 | 2.5M |
| MobileNetV3-Small (0.75) | 65.4 | 2.0M |

References

  1. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
  2. MobileNetV2: Inverted Residuals and Linear Bottlenecks
  3. Searching for MobileNetV3