Understanding LSTM (Long Short-Term Memory)
2022-04-23 10:03:00 [Code ape chicken]
LSTM (Long Short-Term Memory)
0. Starting from RNNs
A recurrent neural network (Recurrent Neural Network, RNN) is a class of neural networks for processing sequence data. Compared with an ordinary feed-forward network, it can handle data that changes over a sequence. For example, the meaning of a word can differ depending on the context that precedes it; RNNs handle this kind of problem well.
1. The vanilla RNN
Let's briefly introduce the vanilla RNN first.
Its main form is shown in the figure below (the figures come from Professor Hung-yi Lee's slides at NTU):
Here:
$x$ is the input at the current step, and $h$ is the state received from the previous node.
$y$ is the output at the current step, and $h'$ is the state passed on to the next node.
From the figure we can see that the output $h'$ depends on both $x$ and $h$.
$y$ is usually obtained by feeding $h'$ through a linear layer (mainly for dimension mapping) and then applying $softmax$ to get the classification we need.
Exactly how $y$ is computed from $h'$ depends on the specific model being used.
By feeding the input in sequence, we obtain the following unrolled form of the RNN.
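The step described above can be sketched in NumPy. This is a minimal illustration, not code from the original article; the weight names (`W_x`, `W_h`, `W_y`) and the choice of a tanh recurrence are assumptions made for the sketch.

```python
import numpy as np

def rnn_step(x, h_prev, W_x, W_h, b):
    """One vanilla RNN step: the new state h' depends on both x and h."""
    return np.tanh(W_x @ x + W_h @ h_prev + b)

def rnn_output(h_new, W_y, b_y):
    """y is obtained by a linear layer over h' followed by softmax."""
    logits = W_y @ h_new + b_y
    exp = np.exp(logits - logits.max())   # subtract max for numerical stability
    return exp / exp.sum()
```

Running the whole sequence is then just a loop that feeds each `h'` back in as the next step's `h`.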
2. LSTM
2.1 What is LSTM?
Long short-term memory (LSTM) is a special kind of RNN, designed mainly to address the vanishing-gradient and exploding-gradient problems that arise when training on long sequences. Simply put, compared with a vanilla RNN, an LSTM performs better on longer sequences.
The main input/output differences between the LSTM structure (shown in the figure) and the vanilla RNN are as follows.
Compared with the RNN, which has only one transmitted state $h^t$, the LSTM has two transmitted states: a $c^t$ (cell state) and an $h^t$ (hidden state). (Tip: the $h^t$ in the RNN plays a role analogous to the $c^t$ in the LSTM.)
The transmitted $c^t$ changes slowly; typically the output $c^t$ is the previous state $c^{t-1}$ with some values added to it.
In contrast, $h^t$ often differs greatly between adjacent nodes.
2.2 A deeper look at the LSTM structure
Below we analyze the internal structure of the LSTM in detail.
First, the LSTM concatenates the current input $x^t$ with $h^{t-1}$ from the previous state, and four states are computed from the concatenation.
Among them, $z^f$, $z^i$, and $z^o$ are obtained by multiplying the concatenated vector by a weight matrix and then passing the result through a $sigmoid$ activation, which maps it to values between $0$ and $1$ so it can act as a gating state. $z$, by contrast, is passed through a $tanh$ activation, which maps it to values between $-1$ and $1$ ($tanh$ is used here because $z$ serves as input data rather than as a gate).
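The computation of the four states can be sketched as follows. This is an illustrative sketch, not the article's own code; the per-gate weight matrices (`W_f`, `W_i`, `W_o`, `W`) and biases are assumed names.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_states(x_t, h_prev, W_f, W_i, W_o, W, b_f, b_i, b_o, b):
    """Compute the four states from the concatenation [x_t, h_prev]."""
    v = np.concatenate([x_t, h_prev])   # the stitched vector
    z_f = sigmoid(W_f @ v + b_f)        # forget gate, values in (0, 1)
    z_i = sigmoid(W_i @ v + b_i)        # information (input) gate, in (0, 1)
    z_o = sigmoid(W_o @ v + b_o)        # output gate, in (0, 1)
    z   = np.tanh(W @ v + b)            # candidate input data, in (-1, 1)
    return z_f, z_i, z_o, z
```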
With these four states in hand, we can further describe how they are used inside the LSTM.
$\odot$ is the Hadamard product, i.e. element-wise multiplication of the corresponding entries of two matrices, which therefore must have the same shape. $\oplus$ denotes matrix addition.
2.3 The LSTM has three main internal stages:
- The forget stage. This stage selectively forgets the input coming from the previous node. Simply put, it "forgets the unimportant and remembers the important."
Specifically, the computed $z^f$ (f for forget) serves as the forget gate, controlling what from the previous state $c^{t-1}$ should be kept and what should be forgotten.
- The select-memory stage. This stage selectively "remembers" the input of the current step. It mainly decides what to remember from the input $x^t$: the important parts are emphasized, the unimportant parts are recorded less. The current input is represented by the previously computed $z$, and the gating signal is controlled by $z^i$ (i for information).
Adding the results of the two steps above gives the cell state transmitted to the next step: $c^t = z^f \odot c^{t-1} + z^i \odot z$. This is the first formula in the figure above.
- The output stage. This stage determines what will be treated as the output of the current state, controlled mainly by $z^o$. The $c^t$ obtained in the previous stage is also rescaled (via a $tanh$ activation).
Similar to the vanilla RNN, the output $y^t$ is often obtained by transforming $h^t$.
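The three stages above can be sketched in a few lines. This is a minimal illustration under the notation of this article, assuming the four states have already been computed from $[x^t, h^{t-1}]$ as described in section 2.2.

```python
import numpy as np

def lstm_step(c_prev, z_f, z_i, z_o, z):
    """One LSTM cell update, following the three stages."""
    # Forget stage: z_f gates what survives from c_prev (Hadamard product).
    # Select-memory stage: z_i gates how much of the candidate z is written.
    c_t = z_f * c_prev + z_i * z
    # Output stage: z_o gates a tanh-rescaled cell state to give h_t.
    h_t = z_o * np.tanh(c_t)
    return c_t, h_t
```

Setting `z_f` to all ones and `z_i` to all zeros makes the cell state pass through unchanged, which illustrates why $c^t$ changes slowly compared with $h^t$.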
Reference
Original article: http://iloveeli.top:8090/archives/lstmlongshort-termmemory
Copyright notice
This article was created by [Code ape chicken]; when reposting, please include a link to the original. Thanks.