R-Dropout

2022-04-22 00:29:00 【hxxjxw】

R-Drop (R-Dropout) stands for Regularized Dropout.

It addresses the inconsistency between how Dropout behaves during training and during testing (inference).

Dropout is essentially a form of ensemble learning: many sub-networks are trained at the same time.
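The gap between training and inference can be seen on a single ReLU unit. The sketch below is only an illustration: the inputs, weights and the helper `unit_output` are made up for this example. It enumerates every dropout mask (every sub-network) and compares the average training-time output with the output of the full network used at inference.

```python
from itertools import product


def relu(v):
    return max(0.0, v)


def unit_output(x, w, mask, p):
    # Inverted dropout on the inputs: zero the inputs where mask is 0
    # and rescale the kept ones by 1 / (1 - p), then apply ReLU.
    return relu(sum(wi * xi * mi / (1 - p) for wi, xi, mi in zip(w, x, mask)))


x = [1.0, -1.0]  # hypothetical inputs
w = [1.0, 1.0]   # hypothetical weights
p = 0.5          # drop probability

# Inference: the full network, no dropout and no rescaling.
inference_out = relu(sum(wi * xi for wi, xi in zip(w, x)))

# Training: each mask selects one sub-network. With p = 0.5 all 2^n masks
# are equally likely, so a plain average gives the expected output.
masks = list(product([0, 1], repeat=len(x)))
train_mean = sum(unit_output(x, w, m, p) for m in masks) / len(masks)

print(inference_out)  # 0.0
print(train_mean)     # 0.5
```

The sub-networks average to 0.5 while the full network outputs 0.0, because the rescaling only preserves the expectation before the nonlinearity.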

R-Drop requires that the different sub-models produced by dropout have output distributions that are consistent with each other.

Concretely, for each training sample, R-Drop minimizes the KL divergence between the output distributions of two sub-models.

In each mini-batch training, each data sample goes through the forward pass twice, and each pass is processed by a different sub model by randomly dropping out some hidden units. R-Drop forces the two distributions for the same data sample outputted by the two sub models to be consistent with each other, through minimizing the bidirectional Kullback-Leibler (KL) divergence between the two distributions.
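Written as a formula, the per-sample objective can be sketched as follows. The notation is an assumption borrowed from the R-Drop paper rather than from the text above: $P_1$ and $P_2$ are the output distributions of the two forward passes with different dropout masks, and $\alpha$ is a weight on the KL term.

```latex
\begin{aligned}
\mathcal{L}_{\mathrm{NLL}} &= -\log P_1(y \mid x) - \log P_2(y \mid x) \\
\mathcal{L}_{\mathrm{KL}} &= \tfrac{1}{2}\left[ D_{\mathrm{KL}}\!\left(P_1(\cdot \mid x)\,\|\,P_2(\cdot \mid x)\right) + D_{\mathrm{KL}}\!\left(P_2(\cdot \mid x)\,\|\,P_1(\cdot \mid x)\right) \right] \\
\mathcal{L} &= \mathcal{L}_{\mathrm{NLL}} + \alpha\,\mathcal{L}_{\mathrm{KL}}
\end{aligned}
```

With $\alpha = 1$ this has the same shape as the `nll1 + nll2 + kl_loss` total used in the code.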

Code implementation

A simplified sketch to demonstrate the principle:
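import numpy as np


def log_softmax(logits):
    # Numerically stable log-softmax over the last axis
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def nll(logits, label):
    # Mean negative log-likelihood of the integer class labels
    log_p = log_softmax(logits)
    return -log_p[np.arange(len(label)), label].mean()


def kl(logits1, logits2):
    # Bidirectional KL divergence between the two predicted distributions,
    # averaged over the batch
    log_p, log_q = log_softmax(logits1), log_softmax(logits2)
    p, q = np.exp(log_p), np.exp(log_q)
    kl_pq = (p * (log_p - log_q)).sum(axis=-1)
    kl_qp = (q * (log_q - log_p)).sum(axis=-1)
    return 0.5 * (kl_pq + kl_qp).mean()


# Simulate a two-layer network trained with ordinary (inverted) dropout
def train(p, x, w1, b1, w2, b2):
    layer1 = np.maximum(0, np.dot(x, w1.T) + b1)
    mask1 = np.random.binomial(1, 1-p, layer1.shape)  # keep each unit with probability 1-p
    layer1 = layer1 * mask1
    layer1 = layer1 / (1-p)  # rescale so the expected activation is unchanged

    layer2 = np.maximum(0, np.dot(layer1, w2.T) + b2)
    mask2 = np.random.binomial(1, 1-p, layer2.shape)
    layer2 = layer2 * mask2
    layer2 = layer2 / (1-p)

    return layer2


# Simulate the same two-layer network trained with R-Dropout
def train_r_dropout(p, x, label, w1, b1, w2, b2):
    bs = x.shape[0]
    # Duplicate the batch: the two copies receive independent dropout masks
    x = np.concatenate((x, x), axis=0)

    #----------- the original Dropout part remains the same ---------
    layer1 = np.maximum(0, np.dot(x, w1.T) + b1)
    mask1 = np.random.binomial(1, 1-p, layer1.shape)
    layer1 = layer1 * mask1
    layer1 = layer1 / (1-p)

    layer2 = np.maximum(0, np.dot(layer1, w2.T) + b2)
    mask2 = np.random.binomial(1, 1-p, layer2.shape)
    layer2 = layer2 * mask2
    layer2 = layer2 / (1-p)
    #-------------------------------------

    # Assumption: the output layer is used directly as the class logits
    logits = layer2
    logits1, logits2 = logits[:bs, :], logits[bs:, :]
    nll1 = nll(logits1, label)
    nll2 = nll(logits2, label)
    kl_loss = kl(logits1, logits2)
    loss = nll1 + nll2 + kl_loss

    return loss


# Inference: no dropout and no rescaling
def test(x, w1, b1, w2, b2):
    layer1 = np.maximum(0, np.dot(x, w1.T) + b1)
    layer2 = np.maximum(0, np.dot(layer1, w2.T) + b2)

    return layer2


inputs = np.random.randn(5, 4)
w1 = np.random.rand(30, 20)
b1 = np.random.rand(30)
w2 = np.random.rand(40, 30)
b2 = np.random.rand(40)
output1 = train(p=0.5, x=inputs.reshape(-1), w1=w1, b1=b1, w2=w2, b2=b2)
print(output1)
output2 = test(x=inputs.reshape(-1), w1=w1, b1=b1, w2=w2, b2=b2)
print(output2)

# R-Dropout on a batch of 8 samples with random labels over the 40 outputs
batch = np.random.randn(8, 20)
label = np.random.randint(0, 40, size=8)
loss = train_r_dropout(p=0.5, x=batch, label=label, w1=w1, b1=b1, w2=w2, b2=b2)
print(loss)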

Copyright notice
This article was written by [hxxjxw]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204220025138424.html
