Building a Bidirectional LSTM with PyTorch for Time Series Forecasting (Load Forecasting)
2022-04-22 05:10:00 · Cyril_KI
I. Preface
The previous articles in this series all used a unidirectional LSTM; this one covers the bidirectional LSTM.
Series articles:
- In-Depth Understanding of LSTM Input and Output in PyTorch (from input to Linear output)
- Building an LSTM with PyTorch for Time Series Forecasting (Load Forecasting)
- Building an LSTM with PyTorch for Multivariate Time Series Forecasting (Load Forecasting)
- Building an LSTM with PyTorch for Multivariate, Multi-Step Time Series Forecasting (Load Forecasting)
- Building a Bidirectional LSTM with PyTorch for Time Series Forecasting (Load Forecasting)
II. Principle
The input and output of the LSTM have already been described in detail in In-Depth Understanding of LSTM Input and Output in PyTorch (from input to Linear output).
The official documentation describes the parameters of nn.LSTM (the screenshot is omitted here; see the PyTorch docs for the full signature).
There are seven parameters in total, of which only the first three are required. Because PyTorch's DataLoader is commonly used to form batches, batch_first is particularly important. The two most common LSTM application scenarios are text processing and time series forecasting, so each parameter is explained from both perspectives.
- input_size: In text processing, a word cannot participate in computation directly, so each word is embedded (e.g., via Word2Vec) as a vector, and input_size=embedding_size. For example, if each sentence has five words and each word is represented by a 100-dimensional vector, then input_size=100. In time series forecasting, e.g., load forecasting, each load value is a single number that can participate in computation directly, so there is no need to represent it as a vector and input_size=1. If we make a multivariate prediction instead, e.g., use [load, wind speed, temperature, pressure, humidity, precipitation, weather] at each of the previous 24 hours to predict the load at the next time step, then input_size=7.
- hidden_size: the number of hidden units. It can be set freely.
- num_layers: the number of stacked LSTM layers; the default is 1 (with num_layers=1, nn.LSTM is comparable to nn.LSTMCell).
- batch_first: defaults to False; its meaning is explained below.
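As a quick sketch (the sizes here are illustrative, not fixed by this article), a bidirectional LSTM for the multivariate case above could be constructed as:

import torch
from torch import nn

# Only the first three parameters are required; hidden_size=64 is an
# illustrative choice. batch_first is left at its default, False.
lstm = nn.LSTM(input_size=7,        # 7 features per time step (multivariate case)
               hidden_size=64,      # number of hidden units
               num_layers=1,        # number of stacked layers (default 1)
               bidirectional=True)  # two directions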
Inputs
The official documentation defines the LSTM input as follows (screenshot omitted).
As shown, the input consists of two parts: input and the pair (initial hidden state h_0, initial cell state c_0).
where input has the shape:
input(seq_len, batch_size, input_size)
- seq_len: In text processing, if a sentence has 7 words, then seq_len=7. In time series forecasting, if we use the previous 24 hours of load to predict the load at the next time step, then seq_len=24.
- batch_size: the number of samples fed into the LSTM at once. In text processing, many sentences can be fed in at a time; in time series forecasting, many windows of data can likewise be fed in at a time.
- input_size: See above.
(h_0, c_0):
h_0(num_directions * num_layers, batch_size, hidden_size)
c_0(num_directions * num_layers, batch_size, hidden_size)
h_0 and c_0 have the same shape.
- num_directions: 2 for a bidirectional LSTM, otherwise 1.
- num_layers: See above.
- batch_size: See above.
- hidden_size: See above.
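Continuing the sketch above (all sizes are illustrative), tensors with these shapes can be built as:

seq_len, batch_size = 24, 30
num_directions, num_layers, hidden_size = 2, 1, 64

inputs = torch.randn(seq_len, batch_size, 7)  # (seq_len, batch_size, input_size)
h_0 = torch.randn(num_directions * num_layers, batch_size, hidden_size)
c_0 = torch.randn(num_directions * num_layers, batch_size, hidden_size)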
Outputs
The official documentation defines the LSTM output as follows (screenshot omitted).
As shown, the output also consists of two parts: output and the pair (final hidden state h_n, final cell state c_n).
where output has the shape:
output(seq_len, batch_size, num_directions * hidden_size)
h_n and c_n have the same shapes as h_0 and c_0; see above for the meaning of each dimension.
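Continuing the sketch, a forward pass confirms these shapes:

output, (h_n, c_n) = lstm(inputs, (h_0, c_0))
print(output.shape)  # torch.Size([24, 30, 128]): (seq_len, batch_size, num_directions * hidden_size)
print(h_n.shape)     # torch.Size([2, 30, 64]):   (num_directions * num_layers, batch_size, hidden_size)
print(c_n.shape)     # same as h_n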
batch_first
If batch_first=True is passed when initializing the LSTM, the shapes of input and output change from:
input(seq_len, batch_size, input_size)
output(seq_len, batch_size, num_directions * hidden_size)
to:
input(batch_size, seq_len, input_size)
output(batch_size, seq_len, num_directions * hidden_size)
i.e., the batch dimension moves to the front.
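A sketch of the same forward pass with batch_first=True (note that h_0 and c_0 keep their shapes; only input and output swap their first two dimensions):

lstm_bf = nn.LSTM(input_size=7, hidden_size=64, num_layers=1,
                  batch_first=True, bidirectional=True)
inputs_bf = torch.randn(batch_size, seq_len, 7)  # (batch_size, seq_len, input_size)
output, (h_n, c_n) = lstm_bf(inputs_bf, (h_0, c_0))
print(output.shape)  # torch.Size([30, 24, 128]): (batch_size, seq_len, num_directions * hidden_size)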
Output extraction
Suppose we end up with output(batch_size, seq_len, 2 * hidden_size) and need to feed it into a linear layer. There are two approaches:
(1) Direct input
As in the unidirectional case, we can feed output directly into the Linear layer. In a unidirectional LSTM:
self.linear = nn.Linear(self.hidden_size, self.output_size)
And in a bidirectional LSTM:
self.linear = nn.Linear(2 * self.hidden_size, self.output_size)
Model code:
class BiLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size, batch_size):
        super().__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.output_size = output_size
        self.num_directions = 2
        self.batch_size = batch_size
        self.lstm = nn.LSTM(self.input_size, self.hidden_size, self.num_layers,
                            batch_first=True, bidirectional=True)
        # the LSTM output has num_directions * hidden_size features per time step
        self.linear = nn.Linear(self.num_directions * self.hidden_size, self.output_size)

    def forward(self, input_seq):
        h_0 = torch.randn(self.num_directions * self.num_layers, self.batch_size, self.hidden_size).to(device)
        c_0 = torch.randn(self.num_directions * self.num_layers, self.batch_size, self.hidden_size).to(device)
        seq_len = input_seq.shape[1]
        # input(batch_size, seq_len, input_size)
        input_seq = input_seq.view(self.batch_size, seq_len, self.input_size)
        # output(batch_size, seq_len, num_directions * hidden_size)
        output, _ = self.lstm(input_seq, (h_0, c_0))
        # flatten to (batch_size * seq_len, num_directions * hidden_size) for the linear layer
        output = output.contiguous().view(self.batch_size * seq_len, self.num_directions * self.hidden_size)
        pred = self.linear(output)  # (batch_size * seq_len, output_size)
        pred = pred.view(self.batch_size, seq_len, -1)
        pred = pred[:, -1, :]  # keep only the prediction at the last time step
        return pred
(2) Process before feeding in
In a unidirectional LSTM, the output after the linear layer has shape (batch_size, seq_len, output_size). Suppose we use the previous 24 hours (times 1 to 24) to predict the next 2 hours of load (times 25 and 26); then seq_len=24 and output_size=2. Since the LSTM produces an output at every time step, after the linear layer each position carries a two-step-ahead prediction, i.e., (2, 3), (3, 4), (4, 5), ..., (25, 26). Obviously we only need the last one, output[:, -1, :].
In a bidirectional LSTM, the raw output(batch_size, seq_len, 2 * hidden_size) contains the outputs of both directions at every position. Roughly speaking, the output at the first time step is the concatenation of the first hidden state of the left-to-right pass and the last hidden state of the right-to-left pass; the output at the last time step is the concatenation of the last hidden state of the left-to-right pass and the first hidden state of the right-to-left pass.
If we want to use both the forward and backward outputs, we can split them down the middle and average. For example, if output has shape (30, 24, 2 * 64), we reshape it to (30, 24, 2, 64) and then take the mean over dim=2, giving an output of shape (30, 24, 64), which matches the output of a unidirectional LSTM.
In code:
output = output.contiguous().view(self.batch_size, seq_len, self.num_directions, self.hidden_size)
output = torch.mean(output, dim=2)
Model code:
class BiLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size, batch_size):
        super().__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.output_size = output_size
        self.num_directions = 2
        self.batch_size = batch_size
        self.lstm = nn.LSTM(self.input_size, self.hidden_size, self.num_layers,
                            batch_first=True, bidirectional=True)
        # after averaging the two directions, only hidden_size features remain
        self.linear = nn.Linear(self.hidden_size, self.output_size)

    def forward(self, input_seq):
        h_0 = torch.randn(self.num_directions * self.num_layers, self.batch_size, self.hidden_size).to(device)
        c_0 = torch.randn(self.num_directions * self.num_layers, self.batch_size, self.hidden_size).to(device)
        seq_len = input_seq.shape[1]
        # input(batch_size, seq_len, input_size)
        input_seq = input_seq.view(self.batch_size, seq_len, self.input_size)
        # output(batch_size, seq_len, num_directions * hidden_size)
        output, _ = self.lstm(input_seq, (h_0, c_0))
        # separate the two directions, then average them
        output = output.contiguous().view(self.batch_size, seq_len, self.num_directions, self.hidden_size)
        output = torch.mean(output, dim=2)
        pred = self.linear(output)  # (batch_size, seq_len, output_size)
        pred = pred.view(self.batch_size, seq_len, -1)
        pred = pred[:, -1, :]  # keep only the prediction at the last time step
        return pred
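As a quick check (the hyperparameter values are illustrative, and device is assumed to be defined as in the earlier articles of this series), either model can be exercised like this:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = BiLSTM(input_size=7, hidden_size=64, num_layers=1,
               output_size=1, batch_size=30).to(device)
x = torch.randn(30, 24, 7).to(device)  # (batch_size, seq_len, input_size)
print(model(x).shape)                  # torch.Size([30, 1])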
III. Training and forecasting
Data processing, training, and prediction are the same as in the previous articles of this series.
Here we compare single-step multivariate forecasts; with all other conditions held equal, the experimental results are as follows:
| Method | LSTM | BiLSTM (1) | BiLSTM (2) |
|---|---|---|---|
| MAPE (%) | 7.43 | 9.29 | 9.29 |
As we can see, at least on the data used here, the unidirectional LSTM performs considerably better, while the two bidirectional output-handling methods show little difference between themselves.
IV. Source code and data
The source code and data are available on GitHub; if you download them, please leave a follow and a star, thanks!
LSTM-Load-Forecasting
Copyright notice: This article was created by Cyril_KI. Please include the original link when reposting: https://yzsam.com/2022/04/202204220508203870.html