torch.nn

Parameters

class torch.nn.Parameter()

A kind of Variable that is intended to be used as a module parameter.

Parameters is a subclass of Variable that has special behavior when used with Modules: when a Parameter is assigned as a Module attribute, it is automatically added to the module's list of parameters (i.e. it will appear in the parameters() iterator). Assigning a plain Variable to a Module attribute does not have this effect. The reason for this distinction is that we sometimes need to cache temporary state, such as the last hidden state of an RNN inside a model; if the Parameter class did not exist, these temporaries would also get registered as model parameters.

Another difference between Variable and Parameter is that a Parameter cannot be volatile (i.e. volatile=True cannot be set) and it defaults to requires_grad=True, whereas a Variable defaults to requires_grad=False.

Parameters:

  • data (Tensor) – parameter tensor.

  • requires_grad (bool, optional) – defaults to True; the parameter will be differentiated during backpropagation.
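
For example, a minimal sketch (the module name Scale below is just a hypothetical example) showing that assigning a Parameter registers it with the module, while a plain Variable does not:

import torch
import torch.nn as nn
from torch.autograd import Variable

class Scale(nn.Module):
    def __init__(self):
        super(Scale, self).__init__()
        self.weight = nn.Parameter(torch.ones(3))   # registered as a parameter
        self.cache = Variable(torch.ones(3))        # plain Variable: not registered

m = Scale()
print(len(list(m.parameters())))  # 1 -- only 'weight' shows up in parameters()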

Containers

class torch.nn.Module

Base class for all neural network modules.

Your models should subclass this class as well.

Modules can also contain other Modules, allowing them to be nested in a tree structure. You can assign submodules as regular attributes of the model.

import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)  # submodule: Conv2d
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
       x = F.relu(self.conv1(x))
       return F.relu(self.conv2(x))

Submodules assigned in this way are registered. When .cuda() is called, the submodules' parameters are converted to CUDA tensors as well.

add_module(name, module)

Adds a child module to the current module. The added module can be accessed via the given name. Example:

import torch.nn as nn
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.add_module("conv", nn.Conv2d(10, 20, 4))
        # self.conv = nn.Conv2d(10, 20, 4) would be equivalent to the add_module call above
model = Model()
print(model.conv)

Output:

Conv2d(10, 20, kernel_size=(4, 4), stride=(1, 1))

children()

Returns an iterator over the immediate child modules of the current model.

import torch.nn as nn
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.add_module("conv", nn.Conv2d(10, 20, 4))
        self.add_module("conv1", nn.Conv2d(20 ,10, 4))
model = Model()

for sub_module in model.children():
    print(sub_module)
Conv2d(10, 20, kernel_size=(4, 4), stride=(1, 1))
Conv2d(20, 10, kernel_size=(4, 4), stride=(1, 1))

cpu(device_id=None)

Moves all model parameters and buffers to the CPU.

cuda(device_id=None)

Moves all model parameters and buffers to the GPU.

Parameters:

  • device_id (int, optional) – if specified, all parameters will be copied to that device.
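
A hedged usage sketch, assuming torch and the Model class defined earlier are in scope, and guarding on CUDA availability:

model = Model()
if torch.cuda.is_available():
    model.cuda()       # parameters and buffers of all submodules become CUDA tensors
model.cpu()            # moves everything back to the CPU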

double()

parametersbuffers的数据类型转换成double

eval()

Sets the module in evaluation mode.

This only has an effect on modules such as Dropout and BatchNorm.
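
For example, a minimal sketch of toggling between evaluation and training behaviour (the Dropout layer is only illustrative):

drop = nn.Dropout(p=0.5)
drop.eval()    # in evaluation mode Dropout passes its input through unchanged
drop.train()   # switches Dropout back to its training behaviour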

float()

parametersbuffers的数据类型转换成float

forward(*input)

Defines the computation performed at every call. Should be overridden by all subclasses.

half()

parametersbuffers的数据类型转换成half

load_state_dict(state_dict)

state_dict中的parametersbuffers复制到此module和它的后代中。state_dict中的key必须和 model.state_dict()返回的key一致。 NOTE:用来加载模型参数。

参数说明:

  • state_dict (dict) – 保存parameterspersistent buffers的字典。
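
A minimal sketch of the usual save/load round trip; the file name checkpoint.pth is only an example:

torch.save(model.state_dict(), 'checkpoint.pth')        # serialize parameters and buffers
model.load_state_dict(torch.load('checkpoint.pth'))     # restore them into the same architecture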

modules()

Returns an iterator over all modules in the network.

import torch.nn as nn
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.add_module("conv", nn.Conv2d(10, 20, 4))
        self.add_module("conv1", nn.Conv2d(20 ,10, 4))
model = Model()

for module in model.modules():
    print(module)
Model (
  (conv): Conv2d(10, 20, kernel_size=(4, 4), stride=(1, 1))
  (conv1): Conv2d(20, 10, kernel_size=(4, 4), stride=(1, 1))
)
Conv2d(10, 20, kernel_size=(4, 4), stride=(1, 1))
Conv2d(20, 10, kernel_size=(4, 4), stride=(1, 1))

As the output shows, the iterator returned by modules() includes the module itself and all nested modules, not only the immediate children. This is the difference from children().

NOTE: duplicate modules are returned only once (the same holds for children()). In the following example, submodule is returned only once:

import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        submodule = nn.Conv2d(10, 20, 4)
        self.add_module("conv", submodule)
        self.add_module("conv1", submodule)
model = Model()

for module in model.modules():
    print(module)
Model (
  (conv): Conv2d(10, 20, kernel_size=(4, 4), stride=(1, 1))
  (conv1): Conv2d(10, 20, kernel_size=(4, 4), stride=(1, 1))
)
Conv2d(10, 20, kernel_size=(4, 4), stride=(1, 1))

named_children()

Returns an iterator over immediate child modules, yielding both the name of the module and the module itself.

Example:

for name, module in model.named_children():
    if name in ['conv4', 'conv5']:
        print(module)

named_modules(memo=None, prefix='')[source]

Returns an iterator over all modules in the network, yielding both the name of the module and the module itself.

Note:

Duplicate modules are returned only once (as with children()). A submodule registered under two names is yielded only once, as in the sketch below.
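
A minimal sketch, reusing the last Model defined above (the one that registers the same Conv2d under two names):

for name, module in model.named_modules():
    print(name)
# prints '' for the model itself, then 'conv'; the duplicate registration is skipped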

parameters(memo=None)

Returns an iterator over the module's parameters.

This is typically passed to an optimizer.

Example:

for param in model.parameters():
    print(type(param.data), param.size())

<class 'torch.FloatTensor'> (20L,)
<class 'torch.FloatTensor'> (20L, 1L, 5L, 5L)

register_backward_hook(hook)

module上注册一个bachward hook

每次计算moduleinputs的梯度的时候,这个hook会被调用。hook应该拥有下面的signature

hook(module, grad_input, grad_output) -> Variable or None

如果module有多个输入输出的话,那么grad_input grad_output将会是个tuplehook不应该修改它的arguments,但是它可以选择性的返回关于输入的梯度,这个返回的梯度在后续的计算中会替代grad_input

这个函数返回一个 句柄(handle)。它有一个方法 handle.remove(),可以用这个方法将hookmodule移除。
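
A minimal sketch of registering and later removing a backward hook; the hook below only prints and is purely illustrative:

def print_grad_hook(module, grad_input, grad_output):
    # grad_input / grad_output are tuples when the module has several inputs or outputs
    print(module.__class__.__name__, 'backward pass reached')
    # returning None leaves grad_input unchanged

handle = model.register_backward_hook(print_grad_hook)
# ... run forward and backward passes ...
handle.remove()   # detach the hook once it is no longer needed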

register_buffer(name, tensor)

module添加一个persistent buffer

persistent buffer通常被用在这么一种情况:我们需要保存一个状态,但是这个状态不能看作成为模型参数。 例如:, BatchNorm’s running_mean 不是一个 parameter, 但是它也是需要保存的状态之一。

Buffers可以通过注册时候的name获取。

NOTE:我们可以用 buffer 保存 moving average

例子:

self.register_buffer('running_mean', torch.zeros(num_features))

self.running_mean

register_forward_hook(hook)

module上注册一个forward hook。 每次调用forward()计算输出的时候,这个hook就会被调用。它应该拥有以下签名:

hook(module, input, output) -> None

hook不应该修改 inputoutput的值。 这个函数返回一个 句柄(handle)。它有一个方法 handle.remove(),可以用这个方法将hookmodule移除。
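
A minimal sketch, assuming model.conv is a registered submodule as in the add_module() example; the hook only prints the output size:

def shape_hook(module, input, output):
    # input is the tuple of positional arguments passed to forward()
    print(module.__class__.__name__, 'output size:', output.size())

handle = model.conv.register_forward_hook(shape_hook)
# ... every forward pass through model.conv now prints its output size ...
handle.remove()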

register_parameter(name, param)

module添加 parameter

parameter可以通过注册时候的name获取。
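
A minimal sketch (the class name MyParamModule is just for illustration); registering a Parameter this way behaves like assigning it as an attribute:

class MyParamModule(nn.Module):
    def __init__(self):
        super(MyParamModule, self).__init__()
        self.register_parameter('weight', nn.Parameter(torch.randn(3, 3)))
        # equivalent to: self.weight = nn.Parameter(torch.randn(3, 3))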

state_dict(destination=None, prefix='')[source]

Returns a dictionary containing the whole state of the module.

Both parameters and persistent buffers are included; the dictionary keys are the corresponding parameter and buffer names.

Example:

import torch
from torch.autograd import Variable
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.conv2 = nn.Linear(1, 2)
        self.vari = Variable(torch.rand([1]))
        self.par = nn.Parameter(torch.rand([1]))
        self.register_buffer("buffer", torch.randn([2,3]))

model = Model()
print(model.state_dict().keys())

odict_keys(['par', 'buffer', 'conv2.weight', 'conv2.bias'])

train(mode=True)

module设置为 training mode

仅仅当模型中有DropoutBatchNorm是才会有影响。

zero_grad()

module中的所有模型参数的梯度设置为0.

class torch.nn.Sequential(*args)

A sequential container. Modules are added to it in the order they are passed to the constructor. Alternatively, an OrderedDict of modules can be passed.

To make it easier to understand how Sequential is used, here is an example:

# Example of using Sequential

model = nn.Sequential(
          nn.Conv2d(1,20,5),
          nn.ReLU(),
          nn.Conv2d(20,64,5),
          nn.ReLU()
        )
# Example of using Sequential with OrderedDict
model = nn.Sequential(OrderedDict([
          ('conv1', nn.Conv2d(1,20,5)),
          ('relu1', nn.ReLU()),
          ('conv2', nn.Conv2d(20,64,5)),
          ('relu2', nn.ReLU())
        ]))

class torch.nn.ModuleList(modules=None)[source]

submodules保存在一个list中。

ModuleList可以像一般的Python list一样被索引。而且ModuleList中包含的modules已经被正确的注册,对所有的module method可见。

参数说明:

  • modules (list, optional) – 将要被添加到MuduleList中的 modules 列表

例子:

class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.linears = nn.ModuleList([nn.Linear(10, 10) for i in range(10)])

    def forward(self, x):
        # ModuleList can act as an iterable, or be indexed using ints
        for i, l in enumerate(self.linears):
            x = self.linears[i // 2](x) + l(x)
        return x

append(module)[source]

Equivalent to the append() method of a Python list.

Parameters:

  • module (nn.Module) – module to append

extend(modules)[source]

Equivalent to the extend() method of a Python list.

Parameters:

  • modules (list) – list of modules to append

class torch.nn.ParameterList(parameters=None)

submodules保存在一个list中。

ParameterList可以像一般的Python list一样被索引。而且ParameterList中包含的parameters已经被正确的注册,对所有的module method可见。

参数说明:

  • modules (list, optional) – a list of nn.Parameter

例子:

class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.params = nn.ParameterList([nn.Parameter(torch.randn(10, 10)) for i in range(10)])

    def forward(self, x):
        # ParameterList can act as an iterable, or be indexed using ints
        for i, p in enumerate(self.params):
            x = self.params[i // 2].mm(x) + p.mm(x)
        return x

append(parameter)[source]

Equivalent to the append() method of a Python list.

Parameters:

  • parameter (nn.Parameter) – parameter to append

extend(parameters)[source]

Equivalent to the extend() method of a Python list.

Parameters:

  • parameters (list) – list of parameters to append

Convolution Layers

Pooling Layers

Non-linear Activations

Normalization layers

Recurrent layers

class torch.nn.RNN(*args, **kwargs)[source]

Applies a multi-layer Elman RNN with tanh or ReLU non-linearity to an input sequence.

For each element in the input sequence, each layer computes the following function: $$ h_t = \tanh(w_{ih} x_t + b_{ih} + w_{hh} h_{t-1} + b_{hh}) $$ where $h_t$ is the hidden state at time $t$, and $x_t$ is the hidden state of the previous layer at time $t$, or the input at time $t$ for the first layer. If nonlinearity='relu', then ReLU is used instead of tanh.

Parameters:

  • input_size – the number of expected features in the input x.

  • hidden_size – the number of features in the hidden state h.

  • num_layers – number of recurrent layers.

  • nonlinearity – the non-linearity to use, 'tanh' or 'relu'. Default: 'tanh'.

  • bias – if False, the layer does not use the bias weights b_ih and b_hh. Default: True.

  • batch_first – if True, the input and output tensors are provided as (batch, seq, feature).

  • dropout – if non-zero, introduces a dropout layer on the outputs of each layer except the last.

  • bidirectional – if True, becomes a bidirectional RNN. Default: False.

Inputs: input, h_0

  • input (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable-length sequence; see torch.nn.utils.rnn.pack_padded_sequence() for details.

  • h_0 (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch.

Outputs: output, h_n

  • output (seq_len, batch, hidden_size * num_directions): tensor containing the output features of the last layer of the RNN for each time step. If the input was a packed sequence, the output will also be a packed sequence.
  • h_n (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t=seq_len.

Variables:

  • weight_ih_l[k] – the learnable input-hidden weights of the k-th layer, of shape (input_size x hidden_size)

  • weight_hh_l[k] – the learnable hidden-hidden weights of the k-th layer, of shape (hidden_size x hidden_size)

  • bias_ih_l[k] – the learnable input-hidden bias of the k-th layer, of shape (hidden_size)

  • bias_hh_l[k] – the learnable hidden-hidden bias of the k-th layer, of shape (hidden_size)

Example:

rnn = nn.RNN(10, 20, 2)
input = Variable(torch.randn(5, 3, 10))
h0 = Variable(torch.randn(2, 3, 20))
output, hn = rnn(input, h0)

class torch.nn.LSTM(*args, **kwargs)[source]

Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.

For each element in the input sequence, each layer computes the following function: $$ \begin{aligned} i_t &= \mathrm{sigmoid}(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi}) \\ f_t &= \mathrm{sigmoid}(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf}) \\ o_t &= \mathrm{sigmoid}(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho}) \\ g_t &= \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg}) \\ c_t &= f_t * c_{t-1} + i_t * g_t \\ h_t &= o_t * \tanh(c_t) \end{aligned} $$ where $h_t$ is the hidden state at time $t$, $c_t$ is the cell state at time $t$, $x_t$ is the hidden state of the previous layer at time $t$ (or the input at time $t$ for the first layer), and $i_t$, $f_t$, $g_t$, $o_t$ are the input, forget, cell, and output gates, respectively.

Parameters:

  • input_size – The number of expected features in the input x
  • hidden_size – The number of features in the hidden state h
  • num_layers – Number of recurrent layers.
  • bias – If False, then the layer does not use bias weights b_ih and b_hh. Default: True
  • batch_first – If True, then the input and output tensors are provided as (batch, seq, feature)
  • dropout – If non-zero, introduces a dropout layer on the outputs of each RNN layer except the last layer
  • bidirectional – If True, becomes a bidirectional RNN. Default: False

Inputs: input, (h_0, c_0)

  • input (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence. See torch.nn.utils.rnn.pack_padded_sequence() for details.
  • h_0 (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch.
  • c_0 (num_layers * num_directions, batch, hidden_size): tensor containing the initial cell state for each element in the batch.

Outputs: output, (h_n, c_n)

  • output (seq_len, batch, hidden_size * num_directions): tensor containing the output features (h_t) from the last layer of the RNN, for each t. If a torch.nn.utils.rnn.PackedSequence has been given as the input, the output will also be a packed sequence.
  • h_n (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t=seq_len
  • c_n (num_layers * num_directions, batch, hidden_size): tensor containing the cell state for t=seq_len

Variables:

  • weight_ih_l[k] – the learnable input-hidden weights of the k-th layer (W_ii|W_if|W_ig|W_io), of shape (input_size x 4*hidden_size)
  • weight_hh_l[k] – the learnable hidden-hidden weights of the k-th layer (W_hi|W_hf|W_hg|W_ho), of shape (hidden_size x 4*hidden_size)
  • bias_ih_l[k] – the learnable input-hidden bias of the k-th layer (b_ii|b_if|b_ig|b_io), of shape (4*hidden_size)
  • bias_hh_l[k] – the learnable hidden-hidden bias of the k-th layer (b_hi|b_hf|b_hg|b_ho), of shape (4*hidden_size)

Example:

lstm = nn.LSTM(10, 20, 2)
input = Variable(torch.randn(5, 3, 10))
h0 = Variable(torch.randn(2, 3, 20))
c0 = Variable(torch.randn(2, 3, 20))
output, hn = lstm(input, (h0, c0))

class torch.nn.GRU(*args, **kwargs)[source]

Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence.

For each element in the input sequence, each layer computes the following function:

$$ \begin{aligned} r_t &= \mathrm{sigmoid}(W_{ir} x_t + b_{ir} + W_{hr} h_{t-1} + b_{hr}) \\ i_t &= \mathrm{sigmoid}(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi}) \\ n_t &= \tanh(W_{in} x_t + b_{in} + r_t * (W_{hn} h_{t-1} + b_{hn})) \\ h_t &= (1 - i_t) * n_t + i_t * h_{t-1} \end{aligned} $$ where $h_t$ is the hidden state at time $t$, $x_t$ is the hidden state of the previous layer at time $t$ (or $input_t$ for the first layer), and $r_t$, $i_t$, $n_t$ are the reset, input, and new gates, respectively.

Parameters:

  • input_size – The number of expected features in the input x
  • hidden_size – The number of features in the hidden state h
  • num_layers – Number of recurrent layers.
  • bias – If False, then the layer does not use bias weights b_ih and b_hh. Default: True
  • batch_first – If True, then the input and output tensors are provided as (batch, seq, feature)
  • dropout – If non-zero, introduces a dropout layer on the outputs of each RNN layer except the last layer
  • bidirectional – If True, becomes a bidirectional RNN. Default: False

Inputs: input, h_0

  • input (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence. See torch.nn.utils.rnn.pack_padded_sequence() for details.
  • h_0 (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch.

Outputs: output, h_n

  • output (seq_len, batch, hidden_size * num_directions): tensor containing the output features h_t from the last layer of the RNN, for each t. If a torch.nn.utils.rnn.PackedSequence has been given as the input, the output will also be a packed sequence.
  • h_n (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t=seq_len

Variables:

  • weight_ih_l[k] – the learnable input-hidden weights of the k-th layer (W_ir|W_ii|W_in), of shape (input_size x 3*hidden_size)
  • weight_hh_l[k] – the learnable hidden-hidden weights of the k-th layer (W_hr|W_hi|W_hn), of shape (hidden_size x 3*hidden_size)
  • bias_ih_l[k] – the learnable input-hidden bias of the k-th layer (b_ir|b_ii|b_in), of shape (3*hidden_size)
  • bias_hh_l[k] – the learnable hidden-hidden bias of the k-th layer (b_hr|b_hi|b_hn), of shape (3*hidden_size)

Example:

>>> rnn = nn.GRU(10, 20, 2)
>>> input = Variable(torch.randn(5, 3, 10))
>>> h0 = Variable(torch.randn(2, 3, 20))
>>> output, hn = rnn(input, h0)

class torch.nn.RNNCell(input_size, hidden_size, bias=True, nonlinearity='tanh')[source]

An Elman RNN cell with tanh or ReLU non-linearity.

$$ h' = \tanh(w_{ih} x + b_{ih} + w_{hh} h + b_{hh}) $$ If nonlinearity='relu', then ReLU is used in place of tanh.

Parameters:

  • input_size – The number of expected features in the input x
  • hidden_size – The number of features in the hidden state h
  • bias – If False, then the layer does not use bias weights b_ih and b_hh. Default: True
  • nonlinearity – The non-linearity to use ['tanh'|'relu']. Default: 'tanh'

Inputs: input, hidden

  • input (batch, input_size): tensor containing input features
  • hidden (batch, hidden_size): tensor containing the initial hidden state for each element in the batch.

Outputs: h'

  • h' (batch, hidden_size): tensor containing the next hidden state for each element in the batch

Variables:

  • weight_ih – the learnable input-hidden weights, of shape (input_size x hidden_size)
  • weight_hh – the learnable hidden-hidden weights, of shape (hidden_size x hidden_size)
  • bias_ih – the learnable input-hidden bias, of shape (hidden_size)
  • bias_hh – the learnable hidden-hidden bias, of shape (hidden_size)

Example:

>>> rnn = nn.RNNCell(10, 20)
>>> input = Variable(torch.randn(6, 3, 10))
>>> hx = Variable(torch.randn(3, 20))
>>> output = []
>>> for i in range(6):
...     hx = rnn(input[i], hx)
...     output.append(hx)

class torch.nn.LSTMCell(input_size, hidden_size, bias=True)[source]

A long short-term memory (LSTM) cell.

$$ \begin{aligned} i &= \mathrm{sigmoid}(W_{ii} x + b_{ii} + W_{hi} h + b_{hi}) \\ f &= \mathrm{sigmoid}(W_{if} x + b_{if} + W_{hf} h + b_{hf}) \\ g &= \tanh(W_{ig} x + b_{ig} + W_{hg} h + b_{hg}) \\ o &= \mathrm{sigmoid}(W_{io} x + b_{io} + W_{ho} h + b_{ho}) \\ c' &= f * c + i * g \\ h' &= o * \tanh(c') \end{aligned} $$

Parameters:

  • input_size – The number of expected features in the input x
  • hidden_size – The number of features in the hidden state h
  • bias – If False, then the layer does not use bias weights b_ih and b_hh. Default: True

Inputs: input, (h_0, c_0)

  • input (batch, input_size): tensor containing input features
  • h_0 (batch, hidden_size): tensor containing the initial hidden state for each element in the batch.
  • c_0 (batch, hidden_size): tensor containing the initial cell state for each element in the batch.

Outputs: h_1, c_1

  • h_1 (batch, hidden_size): tensor containing the next hidden state for each element in the batch
  • c_1 (batch, hidden_size): tensor containing the next cell state for each element in the batch

Variables:

  • weight_ih – the learnable input-hidden weights, of shape (4*hidden_size x input_size)
  • weight_hh – the learnable hidden-hidden weights, of shape (4*hidden_size x hidden_size)
  • bias_ih – the learnable input-hidden bias, of shape (4*hidden_size)
  • bias_hh – the learnable hidden-hidden bias, of shape (4*hidden_size)

Example:

>>> rnn = nn.LSTMCell(10, 20)
>>> input = Variable(torch.randn(6, 3, 10))
>>> hx = Variable(torch.randn(3, 20))
>>> cx = Variable(torch.randn(3, 20))
>>> output = []
>>> for i in range(6):
...     hx, cx = rnn(input[i], (hx, cx))
...     output.append(hx)

class torch.nn.GRUCell(input_size, hidden_size, bias=True)[source]

A gated recurrent unit (GRU) cell

$$ \begin{aligned} r &= \mathrm{sigmoid}(W_{ir} x + b_{ir} + W_{hr} h + b_{hr}) \\ i &= \mathrm{sigmoid}(W_{ii} x + b_{ii} + W_{hi} h + b_{hi}) \\ n &= \tanh(W_{in} x + b_{in} + r * (W_{hn} h + b_{hn})) \\ h' &= (1 - i) * n + i * h \end{aligned} $$

Parameters:

  • input_size – The number of expected features in the input x
  • hidden_size – The number of features in the hidden state h
  • bias – If False, then the layer does not use bias weights b_ih and b_hh. Default: True

Inputs: input, hidden

  • input (batch, input_size): tensor containing input features
  • hidden (batch, hidden_size): tensor containing the initial hidden state for each element in the batch.

Outputs: h'

  • h' (batch, hidden_size): tensor containing the next hidden state for each element in the batch

Variables:

  • weight_ih – the learnable input-hidden weights, of shape (3*hidden_size x input_size)
  • weight_hh – the learnable hidden-hidden weights, of shape (3*hidden_size x hidden_size)
  • bias_ih – the learnable input-hidden bias, of shape (3*hidden_size)
  • bias_hh – the learnable hidden-hidden bias, of shape (3*hidden_size)

Example:

>>> rnn = nn.GRUCell(10, 20)
>>> input = Variable(torch.randn(6, 3, 10))
>>> hx = Variable(torch.randn(3, 20))
>>> output = []
>>> for i in range(6):
...     hx = rnn(input[i], hx)
...     output.append(hx)

Linear layers

Dropout layers

Sparse layers

Distance functions

Loss functions

Vision layers

Multi-GPU layers

class torch.nn.DataParallel(module, device_ids=None, output_device=None, dim=0)[source]

Implements data parallelism at the module level.

This container parallelizes the application of the given module by splitting the input across the specified devices by chunking in the batch dimension. In the forward pass, the module is replicated on each device, and each replica handles a portion of the input. During the backwards pass, gradients from each replica are summed into the original module.

The batch size should be larger than the number of GPUs used. It should also be an integer multiple of the number of GPUs so that each chunk is the same size (so that each GPU processes the same number of samples).

See also: Use nn.DataParallel instead of multiprocessing

Arbitrary positional and keyword inputs are allowed to be passed into DataParallel EXCEPT Tensors. All variables will be scattered on dim specified (default 0). Primitive types will be broadcasted, but all other types will be a shallow copy and can be corrupted if written to in the model’s forward pass.

Parameters:

  • module – module to be parallelized
  • device_ids – CUDA devices (default: all devices)
  • output_device – device location of output (default: device_ids[0])

Example:

>>> net = torch.nn.DataParallel(model, device_ids=[0, 1, 2])
>>> output = net(input_var)

Utilities

torch.nn.utils.clip_grad_norm(parameters, max_norm, norm_type=2)[source]

Clips gradient norm of an iterable of parameters.

The norm is computed over all gradients together, as if they were concatenated into a single vector. Gradients are modified in place.

Parameters:

  • parameters (Iterable[Variable]) – an iterable of Variables whose gradients will be normalized.
  • max_norm (float or int) – max norm of the gradients.
  • norm_type (float or int) – type of the used p-norm. Can be 'inf' for infinity norm.

Returns:

Total p-norm of the parameters (viewed as a single vector).
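
A typical use is right after backward() and before the optimizer step; optimizer and loss are assumed to already exist:

loss.backward()
torch.nn.utils.clip_grad_norm(model.parameters(), max_norm=1.0)
optimizer.step()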

torch.nn.utils.rnn.PackedSequence(_cls, data, batch_sizes)[source]

Holds the data and list of batch_sizes of a packed sequence.

All RNN modules accept packed sequences as inputs.

NOTE: instances of this class should never be created manually. They are meant to be instantiated by functions such as pack_padded_sequence().

Parameters:

  • data (Variable) – Variable containing the packed sequence.

  • batch_sizes (list[int]) – list of integers holding the batch size at each sequence step.

torch.nn.utils.rnn.pack_padded_sequence(input, lengths, batch_first=False)[source]

Packs a Variable containing padded sequences of variable length.

The input can have shape (T x B x *), where T is the length of the longest sequence, B is the batch size, and * is any number of trailing dimensions (including zero). If batch_first=True, the input is expected in (B x T x *) format.

The sequences stored in the Variable should be sorted by length in decreasing order, i.e. input[:, 0] should be the longest sequence and input[:, B-1] the shortest.

NOTE: this function accepts any input with at least two dimensions. You can use it to pack the labels as well, and then use the RNN output together with the packed labels to compute the loss. The underlying Variable of a PackedSequence object can be retrieved via its .data attribute.

Parameters:

  • input (Variable) – padded batch of variable-length sequences.

  • lengths (list[int]) – list of the lengths of each sequence in the batch.

  • batch_first (bool, optional) – if True, the input is expected in B x T x * format.

Returns:

A PackedSequence object.

torch.nn.utils.rnn.pad_packed_sequence(sequence, batch_first=False)[source]

Pads a packed batch of variable length sequences.

It is an inverse operation to pack_padded_sequence().

The returned Variable's data will be of size T x B x *, where T is the length of the longest sequence and B is the batch size. If batch_first is True, the data will be transposed into B x T x * format.

Batch elements will be ordered decreasingly by their length.

Parameters:

  • sequence (PackedSequence) – batch to pad

  • batch_first (bool, optional) – if True, the output will be in B x T x * format.

Returns:

Tuple of Variable containing the padded sequence, and a list of lengths of each sequence in the batch.
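
A minimal end-to-end sketch of packing a padded batch, feeding it to an RNN, and unpadding the result; the sizes below are arbitrary:

import torch
import torch.nn as nn
from torch.autograd import Variable
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

padded = Variable(torch.zeros(4, 3, 5))      # T=4, B=3, 5 features, already padded
lengths = [4, 3, 2]                          # sequence lengths, sorted in decreasing order
packed = pack_padded_sequence(padded, lengths)

lstm = nn.LSTM(5, 7)                         # RNN modules accept PackedSequence inputs
packed_output, _ = lstm(packed)

unpacked, unpacked_lengths = pad_packed_sequence(packed_output)
print(unpacked.size())                       # (4, 3, 7) -- padded back to T x B x hidden_size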