By default, a recurrent layer returns only the final state, so to get the state after every step, feed the input one step at a time.

    # coding=UTF-8
    import torch
    import torch.nn as nn  # neural network module

    torch.manual_seed(1)

    # The LSTM's input and hidden (output) dimensions are both 3
    lstm = nn.LSTM(input_size=3, hidden_size=3)

    # Build a sequence of length 5 whose elements are 1x3 tensors;
    # this 3 corresponds to input_size above
    inputs = [torch.randn(1, 3) for _ in range(5)]

    # Initial hidden state h_0 and cell state c_0,
    # each of shape (num_layers, batch, hidden_size)
    hidden = (torch.randn(1, 1, 3), torch.randn(1, 1, 3))

    # Feed one step at a time to see the state after every step
    for i in inputs:
        out, hidden = lstm(i.view(1, 1, -1), hidden)
        print(out.size())
        print(hidden[0].size())
        print("--------")

    print("-----------------------------------------------")

    # Feeding several steps in a single call
    inputs_stack = torch.stack(inputs)  # shape (5, 1, 3)
    out, hidden = lstm(inputs_stack, hidden)
    print(out.size())
    print(hidden[0].size())

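Printed output (current PyTorch versions report shapes as torch.Size):

torch.Size([1, 1, 3])
torch.Size([1, 1, 3])
--------
torch.Size([1, 1, 3])
torch.Size([1, 1, 3])
--------
torch.Size([1, 1, 3])
torch.Size([1, 1, 3])
--------
torch.Size([1, 1, 3])
torch.Size([1, 1, 3])
--------
torch.Size([1, 1, 3])
torch.Size([1, 1, 3])
--------
-----------------------------------------------
torch.Size([5, 1, 3])
torch.Size([1, 1, 3])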

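As the output shows, the LSTM definition itself never changes. The number of outputs follows the number of steps in the input: feed n steps in one call and you get n outputs back, but only the final state.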
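To check this claim, here is a minimal sketch (the variable names and the zero initial state are illustrative assumptions, not part of the original example). It feeds the same sequence once step by step and once in a single call, then compares the results.

```python
import torch
import torch.nn as nn

torch.manual_seed(1)
lstm = nn.LSTM(input_size=3, hidden_size=3)
seq = torch.randn(5, 1, 3)                       # (seq_len, batch, input_size)
init = (torch.zeros(1, 1, 3), torch.zeros(1, 1, 3))

# Step by step: we have to collect each per-step output ourselves.
hidden = init
step_outputs = []
for x_t in seq:                                   # x_t has shape (1, 3)
    out, hidden = lstm(x_t.view(1, 1, -1), hidden)
    step_outputs.append(out)
stepwise = torch.cat(step_outputs, dim=0)         # (5, 1, 3)

# All at once: one call returns all 5 outputs plus the final state only.
full, (h_n, c_n) = lstm(seq, init)

same_outputs = torch.allclose(stepwise, full, atol=1e-5)
same_h = torch.allclose(hidden[0], h_n, atol=1e-5)
same_c = torch.allclose(hidden[1], c_n, atol=1e-5)
print(same_outputs, same_h, same_c)
```

If all three flags are True, the two ways of feeding the data are equivalent. The step-by-step loop is only needed when you want the cell state c after every step, since the per-step hidden states are already in the output.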

Supplement: understanding the inputs, outputs, and parameters of RNN, LSTM, and GRU, the basic recurrent units in PyTorch

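Preface: this article is aimed at readers who already understand the math and the computation behind RNN, LSTM, and GRU reasonably well. Without that background, it may be hard to follow.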

I. Start with an example

This example comes from the official documentation. LSTM is used throughout, but RNN, GRU, and LSTM all work in much the same way.

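    import torch
    import torch.nn as nn

    # input_size=10, hidden_size=20, num_layers=2
    lstm = nn.LSTM(10, 20, 2)

    # seq_len=5, batch_size=3, input feature dimension=10
    input = torch.randn(5, 3, 10)

    # Initial hidden state and cell state; they have the same shape:
    # 2 LSTM layers, batch_size=3, hidden_size=20
    h0 = torch.randn(2, 3, 20)
    c0 = torch.randn(2, 3, 20)

    # With 2 LSTM layers, output holds the top layer's hidden state at every time step;
    # its shape depends on the sequence length, not on the number of layers.
    # hn and cn hold the hidden and cell state of every layer at the last time step.
    output, (hn, cn) = lstm(input, (h0, c0))

    print(output.size(), hn.size(), cn.size())
    # Prints:
    # torch.Size([5, 3, 20]) torch.Size([2, 3, 20]) torch.Size([2, 3, 20])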

The computation above is explained in detail later. First, look at the definition of LSTM, which is a class.

II. Definition of the LSTM class

    class LSTM(RNNBase):
        r'''Args:
            input_size: number of features in each input element. For a univariate time series
                it is 1; for a sentence whose words are 10-dimensional embeddings it is 10.
            hidden_size: number of features in the hidden state. You choose this freely;
                more on it below.
            num_layers: number of stacked LSTM layers. Default: 1
            bias: whether the layer uses the bias weights `b_ih` and `b_hh`.
                Default: ``True``
            batch_first: If ``True``, then the input and output tensors are provided
                as (batch, seq, feature). Default: ``False``, i.e. (seq, batch, feature)
            dropout: if non-zero, applies dropout to the output of every LSTM layer except
                the last one, with this value as the dropout probability. Default: 0 (no dropout)
            bidirectional: If ``True``, becomes a bidirectional LSTM. Default: ``False``
    #---------------------------------------------------------------------------------------
        Inputs (arguments of the forward call): input, (h_0, c_0)
            - **input** of shape `(seq_len, batch, input_size)`: tensor containing the features of the input sequence.
            - **h_0** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor
              containing the initial hidden state for each element in the batch.
              If the LSTM is bidirectional, num_directions should be 2, else it should be 1.
            - **c_0** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor
              containing the initial cell state for each element in the batch.
              If `(h_0, c_0)` is not provided, both **h_0** and **c_0** default to zero.
    #----------------------------------------------------------------------------------
        Outputs: output, (h_n, c_n)
            - **output** of shape `(seq_len, batch, num_directions * hidden_size)`: tensor
              containing the output features `(h_t)` from the last layer of the LSTM,
              for each `t`. If a :class:`torch.nn.utils.rnn.PackedSequence` has been
              given as the input, the output will also be a packed sequence.
              For the unpacked case, the directions can be separated
              using ``output.view(seq_len, batch, num_directions, hidden_size)``,
              with forward and backward being direction `0` and `1` respectively.
              Similarly, the directions can be separated in the packed case.
            - **h_n** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor
              containing the hidden state for `t = seq_len`.
              Like *output*, the layers can be separated using
              ``h_n.view(num_layers, num_directions, batch, hidden_size)`` and similarly for *c_n*.
            - **c_n** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor
              containing the cell state for `t = seq_len`.
    #------------------------------------------------------------------------------------------
        Attributes:
            weight_ih_l[k] : the learnable input-hidden weights of the :math:`\text{k}^{th}` layer
                `(W_ii|W_if|W_ig|W_io)`, of shape `(4*hidden_size, input_size)` for `k = 0`.
                Otherwise, the shape is `(4*hidden_size, num_directions * hidden_size)`
            weight_hh_l[k] : the learnable hidden-hidden weights of the :math:`\text{k}^{th}` layer
                `(W_hi|W_hf|W_hg|W_ho)`, of shape `(4*hidden_size, hidden_size)`
            bias_ih_l[k] : the learnable input-hidden bias of the :math:`\text{k}^{th}` layer
                `(b_ii|b_if|b_ig|b_io)`, of shape `(4*hidden_size)`
            bias_hh_l[k] : the learnable hidden-hidden bias of the :math:`\text{k}^{th}` layer
                `(b_hi|b_hf|b_hg|b_ho)`, of shape `(4*hidden_size)`
        '''

The list above is long, but each entry is fairly self-explanatory, so only the most important ones are discussed further below.
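The docstring mentions that the input may be a `PackedSequence`. As a hedged illustration of what that looks like in practice, the sketch below packs a padded batch of three sequences with lengths 5, 3, and 2. The lengths and tensor sizes are made up for the example. It uses `pack_padded_sequence` and `pad_packed_sequence` from `torch.nn.utils.rnn`.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
lstm = nn.LSTM(input_size=10, hidden_size=20)

padded = torch.randn(5, 3, 10)        # (max_seq_len, batch, input_size)
lengths = torch.tensor([5, 3, 2])     # true lengths, sorted in decreasing order

packed = pack_padded_sequence(padded, lengths)
packed_out, (h_n, c_n) = lstm(packed)              # the output is packed as well
output, out_lengths = pad_packed_sequence(packed_out)

print(output.shape)            # padded back to (max_seq_len, batch, hidden_size)
print(out_lengths.tolist())

# Positions past a sequence's true length are filled with zeros,
# and h_n holds each sequence's state at its own last valid step.
padding_is_zero = bool((output[3:, 1] == 0).all())
state_at_true_end = torch.allclose(h_n[0, 1], output[2, 1])
print(padding_is_zero, state_at_true_end)
```

With packing, the final state of a short sequence is not polluted by padding steps, which is usually why one bothers with it.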

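III. A closer look at the key parameters

1. The three main constructor arguments of RNN, GRU, and LSTM (step one: construct the recurrent layer object)

The first step in using a recurrent layer is to construct it. Three arguments matter most: input_size and hidden_size are required, and num_layers is optional. Construct the layer like this: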

    lstm = nn.LSTM(10, 20, 2)

The constructor's parameter list is:

    class LSTM(RNNBase):
        '''Args:
            input_size:
            hidden_size:
            num_layers:
            bias:
            batch_first:
            dropout:
            bidirectional:
        '''

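(1) input_size: the feature dimension of each word. For example, if every word in a sentence is represented by a 10-dimensional vector, input_size is 10.

(2) hidden_size: the number of hidden units in each LSTM cell of the recurrent layer, i.e. the dimension of the hidden state. You are free to choose it.

(3) num_layers: the number of stacked recurrent layers. The default is 1; choose it to suit your task.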

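For example, in the figure below the network on the left has a single recurrent layer and the one on the right has two.

[Figure: one recurrent layer (left) and two stacked recurrent layers (right)]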
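The effect of num_layers can also be seen in code. The sketch below builds a one-layer and a two-layer LSTM with sizes borrowed from the earlier example and prints the output shape, the state shape, and the parameter names reported by `named_parameters()`.

```python
import torch
import torch.nn as nn

x = torch.randn(5, 3, 10)   # (seq_len, batch, input_size)

for num_layers in (1, 2):
    lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=num_layers)
    output, (h_n, c_n) = lstm(x)
    names = [name for name, _ in lstm.named_parameters()]
    # output keeps the same shape; h_n gains one slice per layer,
    # and each extra layer adds its own weight_*_l{k} / bias_*_l{k} parameters.
    print(num_layers, tuple(output.shape), tuple(h_n.shape), names)
```

Only the state tensors and the parameter list grow with the number of layers. The output always comes from the top layer.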

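2. Running the forward pass with the object built in step one (step two: call the recurrent layer, pass in the arguments, and collect the return values)

A typical call looks like this: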

    output, (hn, cn) = lstm(input, (h0, c0))

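LSTM is again used as the example here.

(1) Input arguments

input: by default it must have the layout (seq, batch, feature); with batch_first=True the layout is (batch, seq, feature) instead. The first dimension, seq, is the sequence length, which depends on your data: if the longest sentence has 20 words it is 20, and the example above assumes a sentence length of 5. The second, batch, is simply how many samples are processed at once, for example 3. The third, feature, is the dimension of each word vector. It must equal the constructor's first argument, input_size, which is 10 in the example above.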
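If your data is stored batch-first, the `batch_first=True` constructor flag avoids transposing by hand. The following sketch copies the weights from a default LSTM into a batch-first one purely so the results can be compared. The module and variable names are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_first = nn.LSTM(input_size=10, hidden_size=20)
batch_first = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)
batch_first.load_state_dict(seq_first.state_dict())   # same weights in both modules

x = torch.randn(5, 3, 10)                              # (seq, batch, feature)
out_a, (h_a, c_a) = seq_first(x)
out_b, (h_b, c_b) = batch_first(x.transpose(0, 1).contiguous())  # (batch, seq, feature)

print(tuple(out_a.shape), tuple(out_b.shape))
# The outputs are transposes of each other; the state layout is unaffected by batch_first.
same_output = torch.allclose(out_a, out_b.transpose(0, 1), atol=1e-5)
same_state = torch.allclose(h_a, h_b, atol=1e-5)
print(same_output, same_state)
```

Note that batch_first changes only the layout of input and output. The states h and c keep the shape (num_layers * num_directions, batch, hidden_size).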

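(h0, c0): the initial state of each recurrent layer. It is optional; if omitted, every state is initialized to zero. An LSTM carries two states, so two tensors are passed. A plain RNN or a GRU carries only one state, so only h is passed: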

    output, hn = rnn(input, h0)  # plain RNN
    output, hn = gru(input, h0)  # GRU
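The zero default is easy to confirm. This small sketch, with sizes chosen to match the earlier example, calls a GRU once without an initial state and once with an explicit all-zero h0, then compares the results.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
gru = nn.GRU(input_size=10, hidden_size=20, num_layers=2)
x = torch.randn(5, 3, 10)

out_default, h_default = gru(x)                      # no initial state given
out_zeros, h_zeros = gru(x, torch.zeros(2, 3, 20))   # explicit zero initial state

matches = (torch.allclose(out_default, out_zeros)
           and torch.allclose(h_default, h_zeros))
print(matches)
```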

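Pay attention to the shape of the state arguments. Taking LSTM again:

h0 and c0 both have shape (num_layers * num_directions, batch, hidden_size). What does each part mean?

First, num_layers is the number of stacked recurrent layers: each layer needs its own initial state.

Second, num_directions says whether the layer is bidirectional (set through the constructor's bidirectional argument): it is 1 for a unidirectional layer and 2 for a bidirectional one.

Third, batch is the batch size, which must match the batch dimension of input.

Finally, hidden_size is the dimension of the hidden state in each layer. Making sense of it requires a good grasp of how a recurrent network computes.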
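The shape rule can be written down as a tiny function. `initial_state_shape` below is a hypothetical helper, not part of PyTorch. It reads the `num_layers`, `bidirectional`, and `hidden_size` attributes of the module and assumes the default configuration without `proj_size`.

```python
import torch
import torch.nn as nn

def initial_state_shape(rnn, batch_size):
    """Hypothetical helper: shape expected for h_0 (and c_0 for an LSTM)."""
    num_directions = 2 if rnn.bidirectional else 1
    return (rnn.num_layers * num_directions, batch_size, rnn.hidden_size)

lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, bidirectional=True)
x = torch.randn(5, 3, 10)                       # (seq_len, batch, input_size)

shape = initial_state_shape(lstm, batch_size=x.size(1))
h0 = torch.zeros(shape)
c0 = torch.zeros(shape)
output, (hn, cn) = lstm(x, (h0, c0))

# 2 layers * 2 directions = 4 state slices; output concatenates both directions.
print(shape, tuple(output.shape), tuple(hn.shape))
```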

(2) Outputs

The outputs mirror the inputs:

    output, hn = rnn(input, h0)  # plain RNN
    output, hn = gru(input, h0)  # GRU
    output, (hn, cn) = lstm(input, (h0, c0))  # LSTM

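Again taking LSTM:

output has shape (seq_len, batch, num_directions * hidden_size), which is (5, 3, 20) in the example above, as the printed sizes confirm. Note that the first dimension is seq_len: the output at every time step is returned, which is not the case for the hidden state.

hn and cn have shape (num_layers * num_directions, batch, hidden_size), which is (2, 3, 20) in the example above, also confirmed by the printed sizes. These shapes do not depend on seq_len, because the returned state is only the state of each layer at the last time step.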

The figure below illustrates this. It is drawn for a plain RNN, so there is only the state h and no state c.

[Figure: plain RNN, per-step outputs and final state h]
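The relationship between output and hn can be verified directly. In this sketch, which assumes a unidirectional two-layer RNN, the last time step of output should coincide with the top layer's slice of hn.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=10, hidden_size=20, num_layers=2)
x = torch.randn(5, 3, 10)
output, hn = rnn(x)

# output[-1]: top layer, last time step  -> shape (batch, hidden_size)
# hn[-1]:     top layer, final state     -> shape (batch, hidden_size)
top_layer_matches = torch.allclose(output[-1], hn[-1])
# hn[0] is the first layer's final state, which output never exposes.
print(top_layer_matches, tuple(hn[0].shape))
```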

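3. Understanding a few important attributes

Whether it is RNN, GRU, or LSTM, the learnable parameters are just a handful of weight matrices plus the bias vectors. How do you inspect the learned values? Through the attributes below. The shapes given here are for a plain RNN; in an LSTM the first dimension is 4*hidden_size and in a GRU it is 3*hidden_size, because the gate matrices are stacked.

(1) weight_ih_l[k]: the input-to-hidden weight matrix, where k is the index of the recurrent layer.

For k = 0 it maps the input to the first recurrent layer and has shape (hidden_size, input_size). For k > 0 it maps the previous recurrent layer to layer k (first to second, second to third, and so on) and has shape (hidden_size, num_directions * hidden_size).

(2) weight_hh_l[k]: the hidden-to-hidden weight matrix inside layer k, with k = 0, 1, 2, 3, 4, and so on. Its shape is (hidden_size, hidden_size).

Note: layer indices start at 0, so 0 is the first recurrent layer, 1 is the second, and so on.

(3) bias_ih_l[k]: the input-to-hidden bias of layer k, of shape (hidden_size).

(4) bias_hh_l[k]: the hidden-to-hidden bias of layer k, of shape (hidden_size).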

    # Import the modules needed for the RNN
    import torch
    import torch.nn as nn

    # Input feature dimension 10, hidden size 20, 2 stacked RNN layers
    # (num_layers can be omitted when it is 1, the default)
    rnn = nn.RNN(10, 20, 2)

    # seq_len=5, batch_size=3, input feature dimension=10
    input = torch.randn(5, 3, 10)

    # Initial hidden state (a plain RNN has no cell state):
    # 2 RNN layers, batch_size=3, hidden size 20
    h0 = torch.randn(2, 3, 20)

    # With 2 RNN layers, output holds the top layer's hidden state at every time step;
    # its shape depends on the sequence length, not on the number of layers.
    # hn holds the hidden state of every layer at the last time step.
    output, hn = rnn(input, h0)
    print(output.size(), hn.size())  # torch.Size([5, 3, 20]) torch.Size([2, 3, 20])

    # Inspect the important attributes:
    print("------------input -> hidden------------------------------")
    print(rnn.weight_ih_l0.size())
    print(rnn.weight_ih_l1.size())
    print(rnn.bias_ih_l0.size())
    print(rnn.bias_ih_l1.size())
    print("------------hidden -> hidden------------------------------")
    print(rnn.weight_hh_l0.size())
    print(rnn.weight_hh_l1.size())
    print(rnn.bias_hh_l0.size())
    print(rnn.bias_hh_l1.size())

    '''Output:
    ------------input -> hidden------------------------------
    torch.Size([20, 10])
    torch.Size([20, 20])
    torch.Size([20])
    torch.Size([20])
    ------------hidden -> hidden------------------------------
    torch.Size([20, 20])
    torch.Size([20, 20])
    torch.Size([20])
    torch.Size([20])
    '''
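To make the role of these matrices concrete, the sketch below recomputes a single-layer RNN by hand from its four parameter tensors. It assumes the default tanh nonlinearity and compares the result with the module's own output. It also prints the first-layer weight shapes of an LSTM and a GRU, whose gate matrices are stacked along the first dimension.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=10, hidden_size=20)   # one layer, default tanh nonlinearity
x = torch.randn(5, 3, 10)
h0 = torch.zeros(1, 3, 20)
output, hn = rnn(x, h0)

# Manual recurrence: h_t = tanh(x_t W_ih^T + b_ih + h_{t-1} W_hh^T + b_hh)
h = h0[0]
steps = []
with torch.no_grad():
    for x_t in x:                              # x_t has shape (batch, input_size)
        h = torch.tanh(x_t @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
                       + h @ rnn.weight_hh_l0.T + rnn.bias_hh_l0)
        steps.append(h)
manual = torch.stack(steps)                    # (seq_len, batch, hidden_size)
matches = torch.allclose(manual, output.detach(), atol=1e-5)
print(matches)

# Gated units stack their gate matrices along the first dimension.
lstm = nn.LSTM(10, 20)
gru = nn.GRU(10, 20)
print(lstm.weight_ih_l0.shape)   # 4 * hidden_size rows (input, forget, cell, output gates)
print(gru.weight_ih_l0.shape)    # 3 * hidden_size rows (reset, update, new gates)
```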