Understanding LSTM (Long Short-Term Memory)
2022-04-23 10:03:00 [Code ape chicken]
LSTM (Long Short-Term Memory)
0. Starting from RNNs
A recurrent neural network (Recurrent Neural Network, RNN) is a class of neural networks for processing sequence data. Compared with an ordinary feed-forward network, it can handle data that changes over a sequence. For example, the meaning of a word can differ depending on the context that precedes it; RNNs handle this kind of problem well.
1. The vanilla RNN
Let's briefly introduce the vanilla RNN first.
Its main form is shown in the figure below (the figures come from Professor Hung-yi Lee's slides at NTU):
Here:
$x$ is the input at the current step, and $h$ is the state received from the previous node.
$y$ is the output at the current step, and $h'$ is the state passed on to the next node.
From the figure we can see that the output $h'$ depends on both $x$ and $h$.
$y$ is usually obtained by feeding $h'$ through a linear layer (mainly for dimension mapping) and then applying $softmax$ to get the classification we need.
Exactly how $y$ is computed from $h'$ depends on the specific model being used.
By feeding the input in sequence, we obtain the following unrolled form of the RNN.
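The step described above can be sketched in NumPy. This is a minimal illustration, not code from the original article; the weight names (`W_x`, `W_h`, `W_y`) and the choice of a tanh recurrence are assumptions made for the sketch.

```python
import numpy as np

def rnn_step(x, h_prev, W_x, W_h, b):
    """One vanilla RNN step: the new state h' depends on both x and h."""
    return np.tanh(W_x @ x + W_h @ h_prev + b)

def rnn_output(h_new, W_y, b_y):
    """y is obtained by a linear layer over h' followed by softmax."""
    logits = W_y @ h_new + b_y
    exp = np.exp(logits - logits.max())   # subtract max for numerical stability
    return exp / exp.sum()
```

Running the whole sequence is then just a loop that feeds each `h'` back in as the next step's `h`.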
2. LSTM
2.1 What is LSTM?
Long short-term memory (LSTM) is a special kind of RNN, designed mainly to address the vanishing-gradient and exploding-gradient problems that arise when training on long sequences. Simply put, compared with a vanilla RNN, an LSTM performs better on longer sequences.
The main input/output differences between the LSTM structure (shown in the figure) and the vanilla RNN are as follows.
Compared with the RNN, which has only one transmitted state $h^t$, the LSTM has two transmitted states: a $c^t$ (cell state) and an $h^t$ (hidden state). (Tip: the $h^t$ in the RNN plays a role analogous to the $c^t$ in the LSTM.)
The transmitted $c^t$ changes slowly; typically the output $c^t$ is the previous state $c^{t-1}$ with some values added to it.
In contrast, $h^t$ often differs greatly between adjacent nodes.
2.2 A deeper look at the LSTM structure
Below we analyze the internal structure of the LSTM in detail.
First, the LSTM concatenates the current input $x^t$ with $h^{t-1}$ from the previous state, and four states are computed from the concatenation.
Among them, $z^f$, $z^i$, and $z^o$ are obtained by multiplying the concatenated vector by a weight matrix and then passing the result through a $sigmoid$ activation, which maps it to values between $0$ and $1$ so it can act as a gating state. $z$, by contrast, is passed through a $tanh$ activation, which maps it to values between $-1$ and $1$ ($tanh$ is used here because $z$ serves as input data rather than as a gate).
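The computation of the four states can be sketched as follows. This is an illustrative sketch, not the article's own code; the per-gate weight matrices (`W_f`, `W_i`, `W_o`, `W`) and biases are assumed names.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_states(x_t, h_prev, W_f, W_i, W_o, W, b_f, b_i, b_o, b):
    """Compute the four states from the concatenation [x_t, h_prev]."""
    v = np.concatenate([x_t, h_prev])   # the stitched vector
    z_f = sigmoid(W_f @ v + b_f)        # forget gate, values in (0, 1)
    z_i = sigmoid(W_i @ v + b_i)        # information (input) gate, in (0, 1)
    z_o = sigmoid(W_o @ v + b_o)        # output gate, in (0, 1)
    z   = np.tanh(W @ v + b)            # candidate input data, in (-1, 1)
    return z_f, z_i, z_o, z
```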
With these four states in hand, we can further describe how they are used inside the LSTM.
$\odot$ is the Hadamard product, i.e. element-wise multiplication of the corresponding entries of two matrices, which therefore must have the same shape. $\oplus$ denotes matrix addition.
2.3 The LSTM has three main internal stages:
- The forget stage. This stage selectively forgets the input coming from the previous node. Simply put, it "forgets the unimportant and remembers the important."
Specifically, the computed $z^f$ (f for forget) serves as the forget gate, controlling what from the previous state $c^{t-1}$ should be kept and what should be forgotten.
- The select-memory stage. This stage selectively "remembers" the input of the current step. It mainly decides what to remember from the input $x^t$: the important parts are emphasized, the unimportant parts are recorded less. The current input is represented by the previously computed $z$, and the gating signal is controlled by $z^i$ (i for information).
Adding the results of the two steps above gives the cell state transmitted to the next step: $c^t = z^f \odot c^{t-1} + z^i \odot z$. This is the first formula in the figure above.
- The output stage. This stage determines what will be treated as the output of the current state, controlled mainly by $z^o$. The $c^t$ obtained in the previous stage is also rescaled (via a $tanh$ activation).
Similar to the vanilla RNN, the output $y^t$ is often obtained by transforming $h^t$.
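The three stages above can be sketched in a few lines. This is a minimal illustration under the notation of this article, assuming the four states have already been computed from $[x^t, h^{t-1}]$ as described in section 2.2.

```python
import numpy as np

def lstm_step(c_prev, z_f, z_i, z_o, z):
    """One LSTM cell update, following the three stages."""
    # Forget stage: z_f gates what survives from c_prev (Hadamard product).
    # Select-memory stage: z_i gates how much of the candidate z is written.
    c_t = z_f * c_prev + z_i * z
    # Output stage: z_o gates a tanh-rescaled cell state to give h_t.
    h_t = z_o * np.tanh(c_t)
    return c_t, h_t
```

Setting `z_f` to all ones and `z_i` to all zeros makes the cell state pass through unchanged, which illustrates why $c^t$ changes slowly compared with $h^t$.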
Reference
Original article: http://iloveeli.top:8090/archives/lstmlongshort-termmemory
Copyright notice
This article was created by [Code ape chicken]; when reposting, please include a link to the original. Thanks.