
Handling padded variable-length RNN input sequences in PyTorch

2022-04-22 12:13:00 【Guanrunwei is getting fatter and fatter】

Why does an RNN need to handle variable-length input?

Suppose we have a sentiment analysis task in which every sentence is classified by its sentiment. The overall process is roughly as shown in the figure below:

The idea is simple, but when we train on a whole batch at once, the samples in the batch have different lengths. The natural fix is padding: shorter sentences are padded to the length of the longest sentence.
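A minimal sketch of this padding step, assuming the sentences are already tokenized into integer ids (the ids below are made up) and that 0 is the padding id. It uses torch.nn.utils.rnn.pad_sequence to pad every sentence to the length of the longest one:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Three tokenized sentences of different lengths (made-up token ids;
# 0 is assumed to be the padding id)
sentences = [
    torch.tensor([5, 8, 2, 9, 4, 7]),  # longest sentence: 6 tokens
    torch.tensor([3, 6, 1]),           # 3 tokens
    torch.tensor([11]),                # "Yes": 1 token
]

# Pad every sentence with 0 up to the length of the longest one -> shape (B=3, T=6)
batch = pad_sequence(sentences, batch_first=True, padding_value=0)
print(batch.shape)    # torch.Size([3, 6])
print(batch[2])       # "Yes" followed by 5 pad ids
```

Note that "Yes" now carries 5 pad ids, which is exactly the situation discussed next.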

For example, see the figure below:

But this creates a problem. In the example above, the sentence "Yes" has only one word, yet it is padded with 5 pad symbols. The LSTM therefore keeps reading many meaningless symbols after "Yes", and the resulting sentence representation is distorted. A more intuitive picture is shown below:
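The effect can be reproduced with a small sketch. It assumes an untrained single-layer LSTM with arbitrary sizes, and assumes pad tokens embed to all-zero vectors (as nn.Embedding does for its padding_idx row). Feeding the padded sentence straight into the LSTM yields a different final hidden state than feeding "Yes" alone:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)

# Embedding of the one-word sentence "Yes": shape (batch=1, T=1, features=4)
yes = torch.randn(1, 1, 4)
# The same sentence padded with 5 all-zero "pad" vectors to length 6
yes_padded = torch.cat([yes, torch.zeros(1, 5, 4)], dim=1)

_, (h_true, _) = lstm(yes)           # state right after reading "Yes"
_, (h_padded, _) = lstm(yes_padded)  # state after also reading 5 pads

# The hidden state keeps changing while the LSTM reads the pads
# (biases and recurrent weights act even on zero input), so the two differ
print(torch.allclose(h_true, h_padded))
```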

So what is the right way to handle this?

This is why RNNs in PyTorch need special handling for variable-length input. In the example above, what we want is the LSTM representation right after the word "Yes", not the representation obtained after also passing through several useless "Pad" symbols, as in the figure below:
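A sketch of the desired behaviour under the same assumptions (untrained LSTM, zero pad vectors, arbitrary sizes): if the LSTM is told that the real length is 1 by packing the input, the final hidden state matches the one for "Yes" on its own:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

torch.manual_seed(0)
lstm = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)

yes = torch.randn(1, 1, 4)                                  # "Yes" alone
yes_padded = torch.cat([yes, torch.zeros(1, 5, 4)], dim=1)  # padded to length 6

_, (h_true, _) = lstm(yes)

# Declare the real length (1), so the 5 pad steps are never fed to the LSTM
packed = pack_padded_sequence(yes_padded, lengths=[1], batch_first=True)
_, (h_packed, _) = lstm(packed)

print(torch.allclose(h_true, h_packed))  # True
```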

How PyTorch RNNs handle variable-length padding

This is done mainly with two functions:

torch.nn.utils.rnn.pack_padded_sequence()
torch.nn.utils.rnn.pad_packed_sequence()

Let's look at how each of them is used.

Here, "pack" is best understood as compression: pack_padded_sequence() compresses a padded batch of variable-length sequences. Padding adds redundant positions, so they are squeezed out.
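To make the "compression" concrete, here is a small sketch with arbitrary sizes and random data: the padded batch stores B×T time steps, while the packed version keeps only the sum of the real lengths:

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

# Padded batch: 3 sequences, longest length T=6, feature size 4
padded = torch.randn(3, 6, 4)
lengths = [6, 3, 1]  # real lengths, longest first

packed = pack_padded_sequence(padded, lengths, batch_first=True)

print(padded.shape[0] * padded.shape[1])  # 18 time steps stored when padded
print(packed.data.shape)                  # torch.Size([10, 4]): only 6 + 3 + 1 real steps
```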

The input has shape T×B×*, where T is the length of the longest sequence, B is the batch size, and * stands for any number of trailing dimensions (possibly 0). If batch_first=True, the input shape is B×T×* instead.
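A quick sketch (random data, arbitrary sizes) showing that the two layouts describe the same batch: packing a B×T×* tensor with batch_first=True gives the same result as packing its T×B×* transpose with the default batch_first=False:

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

lengths = [3, 2]
batch_major = torch.randn(2, 3, 5)        # B x T x *  (use batch_first=True)
time_major = batch_major.transpose(0, 1)  # T x B x *  (default layout)

p1 = pack_padded_sequence(batch_major, lengths, batch_first=True)
p2 = pack_padded_sequence(time_major, lengths)  # batch_first=False by default

# Both layouts pack to identical data and batch sizes
print(torch.equal(p1.data, p2.data))                # True
print(torch.equal(p1.batch_sizes, p2.batch_sizes))  # True
```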

The sequences in the input should be sorted by length, longest first (pay special attention to this ordering): input[:, 0] holds the longest sequence and input[:, B-1] the shortest. In newer PyTorch versions (1.1 and later), passing enforce_sorted=False lifts this requirement, and the batch is sorted internally.
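A sketch of the unsorted case using enforce_sorted=False (small hand-written data). The packed object records the sorting it applied, and pad_packed_sequence restores the original batch order:

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Batch NOT sorted by length: real lengths are 2, 3, 1 (feature dim omitted)
padded = torch.tensor([[1., 2., 0.],
                       [3., 4., 5.],
                       [6., 0., 0.]])
lengths = [2, 3, 1]

# With the default enforce_sorted=True this would raise a RuntimeError;
# enforce_sorted=False sorts internally and remembers the permutation
packed = pack_padded_sequence(padded, lengths, batch_first=True,
                              enforce_sorted=False)
print(packed.sorted_indices)  # tensor([1, 0, 2]): longest sequence first

# Unpacking restores the original (unsorted) batch order
restored, restored_lengths = pad_packed_sequence(packed, batch_first=True)
print(torch.equal(restored, padded))  # True
print(restored_lengths)               # tensor([2, 3, 1])
```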

Parameters:

input (Tensor) – a padded batch of variable-length sequences.
lengths (list[int] or Tensor) – the length of each sequence in the batch. Knowing each length tells the RNN where to stop processing each sequence.
batch_first (bool, optional) – if True, input has shape B×T×*.
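In practice, lengths is often computed from the padded token ids. A minimal sketch, assuming a hypothetical padding id PAD_IDX = 0 that only ever appears at the end of a sentence; when lengths is given as a tensor, it must be a 1-D int64 tensor on the CPU:

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

PAD_IDX = 0  # hypothetical padding token id

token_ids = torch.tensor([[7, 3, 9, 4],
                          [5, 2, 0, 0],
                          [8, 0, 0, 0]])

# Real length = number of non-pad tokens in each row (assumes pads only at the end)
lengths = (token_ids != PAD_IDX).sum(dim=1).cpu()
print(lengths)  # tensor([4, 2, 1])

embedding = torch.nn.Embedding(10, 6, padding_idx=PAD_IDX)
packed = pack_padded_sequence(embedding(token_ids), lengths, batch_first=True)
print(packed.batch_sizes)  # tensor([3, 2, 1, 1]): sequences still running at each step
```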

Return value:

A PackedSequence object, laid out as shown in the figure below:
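As a small illustration of that layout (three toy sequences, feature dimension omitted): data stores the batch time step by time step, and batch_sizes records how many sequences are still active at each step:

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

# Three sequences [1, 2, 3], [4, 5], [6], padded with 0
padded = torch.tensor([[1., 2., 3.],
                       [4., 5., 0.],
                       [6., 0., 0.]])
packed = pack_padded_sequence(padded, [3, 2, 1], batch_first=True)

# First elements of all sequences, then second elements of those still
# running, and so on; pad positions never appear in data
print(packed.data)         # tensor([1., 4., 6., 2., 5., 3.])
print(packed.batch_sizes)  # tensor([3, 2, 1])
```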

The code is as follows:

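from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
# sentence_lens: real length of each sentence, sorted longest first (enforce_sorted defaults to True)
embed_input_x_packed = pack_padded_sequence(embed_input_x, sentence_lens, batch_first=True)
encoder_outputs_packed, (h_last, c_last) = self.lstm(embed_input_x_packed)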
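For context, a self-contained sketch of where these lines might live. The SentenceEncoder class, its sizes and the token ids are hypothetical; unlike the snippet above, it passes enforce_sorted=False so the batch does not have to be pre-sorted:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence


class SentenceEncoder(nn.Module):
    """Hypothetical encoder: embedding -> LSTM over packed sequences."""

    def __init__(self, vocab_size, embed_dim, hidden_dim, pad_idx=0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def forward(self, input_x, sentence_lens):
        embed_input_x = self.embedding(input_x)  # B x T x embed_dim
        embed_input_x_packed = pack_padded_sequence(
            embed_input_x, sentence_lens, batch_first=True, enforce_sorted=False)
        encoder_outputs_packed, (h_last, c_last) = self.lstm(embed_input_x_packed)
        # Back to a padded B x T x hidden_dim tensor (pad positions become 0)
        encoder_outputs, _ = pad_packed_sequence(encoder_outputs_packed, batch_first=True)
        return encoder_outputs, h_last, c_last


encoder = SentenceEncoder(vocab_size=20, embed_dim=8, hidden_dim=16)
input_x = torch.tensor([[4, 7, 1, 0], [2, 9, 5, 3], [6, 0, 0, 0]])
outputs, h_last, c_last = encoder(input_x, [3, 4, 1])
print(outputs.shape)  # torch.Size([3, 4, 16])
print(h_last.shape)   # torch.Size([1, 3, 16]): (num_layers, B, hidden_dim)
```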

Here, the returned h_last and c_last are the hidden state and cell state computed without the padding characters; both are tensors. For each sentence they are taken at its last real word: the LSTM only runs over the sentence's actual length and never passes through useless padding characters (marked with red ticks in the figure below):
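This can be checked with a small sketch, assuming a single-layer, unidirectional LSTM with arbitrary sizes and random inputs: for each sentence, h_last equals the LSTM output at that sentence's last real step, not at the final padded step:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
lstm = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)  # 1 layer, 1 direction

x = torch.randn(3, 5, 4)  # padded batch, T = 5
lens = [5, 3, 2]          # real lengths, longest first

packed_out, (h_last, c_last) = lstm(pack_padded_sequence(x, lens, batch_first=True))
out, _ = pad_packed_sequence(packed_out, batch_first=True)

# h_last[0, i] is the output at step lens[i] - 1 of sentence i
for i, n in enumerate(lens):
    print(torch.allclose(h_last[0, i], out[i, n - 1]))  # True for each i
```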

However, the returned output, encoder_outputs_packed, is a PackedSequence. It can be converted back with:

encoder_outputs, _ = pad_packed_sequence(encoder_outputs_packed, batch_first=True)

This turns it back into an ordinary padded tensor, encoder_outputs; the second return value, discarded here as _, holds the length of each sentence.
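A sketch of what the unpacked tensor looks like, using a random tensor as a stand-in for LSTM outputs (sizes arbitrary). Positions past each sentence's length are filled with 0.0 by default; the padding_value and total_length arguments of pad_packed_sequence change the filler and the padded time length:

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

x = torch.randn(2, 3, 4)  # stand-in for LSTM outputs: B=2, T=3, hidden=4
packed = pack_padded_sequence(x, [3, 1], batch_first=True)

# Default: positions past each sentence's length are filled with 0.0
out, lengths = pad_packed_sequence(packed, batch_first=True)
print(lengths)                            # tensor([3, 1])
print(torch.all(out[1, 1:] == 0).item())  # True

# padding_value picks another filler; total_length pads the time axis further
out2, _ = pad_packed_sequence(packed, batch_first=True,
                              padding_value=-1.0, total_length=5)
print(out2.shape)                            # torch.Size([2, 5, 4])
print(torch.all(out2[1, 1:] == -1).item())   # True
```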

Summary

In summary, when an RNN processes variable-length sequences such as sentences, we can use

torch.nn.utils.rnn.pack_padded_sequence()
torch.nn.utils.rnn.pad_packed_sequence()

together to keep padding from affecting the sentence representation.
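Tying this back to the sentiment example at the start, here is a minimal sketch of a classifier built on the final hidden state. The SentimentClassifier class, its sizes and the token ids are hypothetical and untrained; the point is only that the logits for "Yes" are the same whether it is classified alone or padded with 5 pads inside a batch:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence


class SentimentClassifier(nn.Module):
    """Hypothetical sentence classifier: LSTM over packed input + linear head."""

    def __init__(self, vocab_size=50, embed_dim=8, hidden_dim=16, num_classes=2, pad_idx=0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids, lengths):
        packed = pack_padded_sequence(self.embedding(token_ids), lengths,
                                      batch_first=True, enforce_sorted=False)
        _, (h_last, _) = self.lstm(packed)
        return self.fc(h_last[-1])  # last layer's state at each sentence's real end


torch.manual_seed(0)
model = SentimentClassifier().eval()
yes = torch.tensor([[11]])                                       # "Yes" on its own
batch = torch.tensor([[5, 8, 2, 9, 4, 7], [11, 0, 0, 0, 0, 0]])  # "Yes" padded with 5 pads

alone = model(yes, [1])
in_batch = model(batch, [6, 1])
# The logits for "Yes" do not depend on how much padding it received
print(torch.allclose(alone[0], in_batch[1], atol=1e-6))  # True
```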

Copyright notice
This article was written by [Guanrunwei is getting fatter and fatter]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/04/202204221206458180.html
