
Contrastive Learning Series (3): SimCLR

2022-08-11 08:46:00 Tao Jiang

SimCLR

SimCLR learns representations by maximizing the agreement between differently augmented views of the same data via a contrastive loss in the latent space. The SimCLR framework has four main components: data augmentation, an encoder network, a projection head network, and a contrastive loss function.

[Figure: the SimCLR framework]
For a data sample $x$, two independent augmentation operators are drawn from the same family of augmentations ($t \sim \mathcal{T}$, $t' \sim \mathcal{T}$) to produce two correlated views $\hat{x}_{i}$ and $\hat{x}_{j}$, which form a positive pair. A neural network encoder $f(\cdot)$ then extracts features from the augmented data: $h_{i} = f(\hat{x}_{i})$, $h_{j} = f(\hat{x}_{j})$. Next, a small neural network projection head $g(\cdot)$ maps the features into the space where the contrastive loss is applied. The projection head is an MLP with one hidden layer: $z_{i} = g(h_{i}) = W^{(2)} \sigma(W^{(1)} h_{i})$.
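As a concrete illustration, the projection head $g(\cdot)$ above can be sketched in plain numpy. This is a minimal sketch, not the paper's implementation; the dimensions (2048-d encoder features, 2048-d hidden layer, 128-d output) and ReLU as $\sigma$ follow common SimCLR configurations but are assumptions here:

```python
import numpy as np

def projection_head(h, W1, W2):
    """One-hidden-layer MLP g(.): z = W^(2) * sigma(W^(1) * h), with sigma = ReLU."""
    return W2 @ np.maximum(W1 @ h, 0.0)

rng = np.random.default_rng(0)
h = rng.standard_normal(2048)           # encoder output h = f(x_hat), e.g. ResNet-50 features (assumed dim)
W1 = rng.standard_normal((2048, 2048))  # hidden-layer weights W^(1)
W2 = rng.standard_normal((128, 2048))   # output-layer weights W^(2) (assumed 128-d projection)
z = projection_head(h, W1, W2)          # z has shape (128,)
```

Only `z`, not `h`, enters the contrastive loss; the projection head is discarded after pre-training.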

Given a set $\{\hat{x}_{k}\}$ containing a positive pair $\hat{x}_{i}$ and $\hat{x}_{j}$, the contrastive prediction task is, for a given $\hat{x}_{i}$, to identify $\hat{x}_{j}$ among $\{\hat{x}_{k}\}_{k \neq i}$. A minibatch of $N$ samples is drawn at random, yielding $2N$ augmented data samples; the other $2(N-1)$ augmented samples serve as the negatives within the minibatch. Let $\mathrm{sim}(u, v) = u^{\top}v / \|u\| \|v\|$ denote the dot product between $\ell_{2}$-normalized $u$ and $v$ (i.e., cosine similarity). Then for a positive pair $(i, j)$, the loss function is defined as follows:

$$\ell_{i,j} = -\log \frac{\exp\left( \mathrm{sim}\left( z_{i}, z_{j}\right) / \tau \right)}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp\left( \mathrm{sim}\left( z_{i}, z_{k}\right) / \tau \right)}$$
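The per-pair loss above can be sketched directly in numpy. This is an illustrative sketch, not reference code; the temperature value $\tau = 0.5$ is an assumption:

```python
import numpy as np

def nt_xent_pair(z, i, j, tau=0.5):
    """Loss l_{i,j} for one positive pair (i, j) among 2N projected embeddings (rows of z)."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # l2-normalize, so sim(u, v) is a dot product
    sim = z @ z[i] / tau                              # similarities of z_i to every z_k, scaled by tau
    mask = np.ones(len(z), dtype=bool)
    mask[i] = False                                   # indicator 1_{[k != i]}: drop the k == i term
    return -sim[j] + np.log(np.exp(sim[mask]).sum())  # -log(exp(sim_ij) / sum_{k != i} exp(sim_ik))

rng = np.random.default_rng(0)
z = rng.standard_normal((4, 8))   # 2N = 4 embeddings (N = 2), hypothetical 8-d projections
loss = nt_xent_pair(z, 0, 1)      # pair (0, 1) is positive; rows 2, 3 are negatives
```

Because the denominator sums over $2N - 1$ terms that include the numerator, the softmax probability is below 1 and the loss is always positive.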

The final loss is computed over all positive pairs in a minibatch, including both $(i, j)$ and $(j, i)$. Below is the pseudocode for SimCLR. As it shows, the parameters of both the encoder $f(\cdot)$ and the projection head $g(\cdot)$ are updated during training, but only the encoder $f(\cdot)$ is used for downstream tasks.
[Figure: SimCLR pseudocode]
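The full minibatch loss, averaging $\ell_{i,j}$ and $\ell_{j,i}$ over all positive pairs, can be sketched as follows. This is a minimal numpy sketch under two assumptions: $\tau = 0.5$, and adjacent rows $(2k, 2k+1)$ of `z` are the positive pairs:

```python
import numpy as np

def nt_xent_loss(z, tau=0.5):
    """Total SimCLR loss over a minibatch: z has 2N rows; rows (2k, 2k+1) are positive pairs."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / tau                              # pairwise similarities sim(z_i, z_k) / tau
    np.fill_diagonal(sim, -np.inf)                   # exp(-inf) = 0 removes the k == i term
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    idx = np.arange(len(z))
    pos = idx ^ 1                                    # partner index: 0<->1, 2<->3, ...
    return -log_prob[idx, pos].mean()                # averages l(i,j) and l(j,i) over all pairs

rng = np.random.default_rng(0)
z = rng.standard_normal((8, 16))  # 2N = 8 projected embeddings (N = 4 pairs)
loss = nt_xent_loss(z)
```

A practical implementation would subtract the row-wise maximum before exponentiating for numerical stability; that detail is omitted here for clarity.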
SimCLR does not train with a memory bank; instead it increases the batch size. With a batch size of 8192, each positive pair has 16382 negative examples. Increasing the batch size is effectively equivalent to dynamically generating a memory bank from each minibatch. The paper found that training with large batch sizes is unstable under standard SGD/Momentum, so the LARS optimizer is used instead.

