In this section we show how to apply a recurrent neural network to build a character-level language model. Suppose the number of samples in the minibatch is 1 and the text sequence is "h", "e", "l", "l", "o". The network predicts the next character based on the current and past characters. During training, we apply a softmax operation to the output of the output layer at each time step, and then use the cross-entropy loss function to compute the error between that output and the label.
As shown in the figure below, because the hidden state in the hidden layer is computed recurrently, the output $O_4$ at time step 4 depends on the text sequence "h", "e", "l", "l". Since the next character of this sequence in the training data is "o", the loss at time step 4 depends on the probability distribution over the next character that the model generates at this time step from the sequence "h", "e", "l", "l", together with the label "o" at this time step.
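As a concrete illustration of the per-time-step loss, here is a minimal sketch with made-up logits (not values from the model below). PyTorch's CrossEntropyLoss applies log-softmax internally, so it takes the raw output-layer scores together with the label index:

import torch

criterion = torch.nn.CrossEntropyLoss()
# hypothetical output-layer scores over ['e', 'h', 'l', 'o'] at time step 4
logits_t4 = torch.tensor([[0.1, 0.2, 0.3, 1.5]])
label_t4 = torch.tensor([3])  # index of the label "o"
loss_t4 = criterion(logits_t4, label_t4)  # time-step-4 contribution to the total loss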
# Prepare the data
import torch

idx2char = ['e', 'h', 'l', 'o']
x_data = [1, 0, 2, 2, 3]  # "hello"
y_data = [3, 1, 2, 3, 2]  # "ohlol"
# one-hot encoding lookup table
one_hot_dir = [[1, 0, 0, 0],
               [0, 1, 0, 0],
               [0, 0, 1, 0],
               [0, 0, 0, 1]]
x_one_hot = [one_hot_dir[x] for x in x_data]
batch_size = 1
input_size = 4
hidden_size = 4
inputs = torch.Tensor(x_one_hot).view(-1, batch_size, input_size)
labels = torch.LongTensor(y_data).view(-1, 1)
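As a side note, the manual lookup table above can be replaced by PyTorch's built-in helper; a minimal equivalent sketch:

import torch.nn.functional as F

# builds the same (seq_len, batch_size, input_size) tensor as above
inputs_alt = F.one_hot(torch.tensor(x_data), num_classes=4).float().view(-1, batch_size, input_size)
print(torch.equal(inputs_alt, inputs))  # True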
Implementing a single-hidden-layer RNN based on torch.nn.RNNCell
import torch

# Initialize model hyperparameters
input_size = 4
hidden_size = 4
batch_size = 1
seq_len = 5

# Design the model
class Model(torch.nn.Module):
    def __init__(self, input_size, hidden_size, batch_size):
        super(Model, self).__init__()
        # batch size
        self.batch_size = batch_size
        # dimensionality of the input features
        self.input_size = input_size
        # number of hidden units
        self.hidden_size = hidden_size
        # basic recurrent unit
        self.rnncell = torch.nn.RNNCell(input_size=input_size, hidden_size=hidden_size)

    def forward(self, input, hidden):
        """
        input:  (batch_size, input_size)
        hidden: (batch_size, hidden_size)
        """
        hidden = self.rnncell(input, hidden)
        return hidden

    # With no prior information, initialize the hidden state h0 to all zeros
    def init_hidden(self):
        return torch.zeros(self.batch_size, self.hidden_size)

net = Model(input_size, hidden_size, batch_size)
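Before training, a quick sanity check of one step through the cell can be helpful (an optional sketch, reusing the inputs tensor prepared earlier):

hidden = net.init_hidden()       # (batch_size, hidden_size), all zeros
hidden = net(inputs[0], hidden)  # one step: inputs[0] has shape (batch_size, input_size)
print(hidden.shape)              # torch.Size([1, 4])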
# Loss function and optimizer
criterion = torch.nn.CrossEntropyLoss()
# The optimizer takes the network parameters and the learning rate
optimizer = torch.optim.Adam(net.parameters(), lr=0.1)
# Train the model
for epoch in range(15):
    loss = 0
    # clear the gradients
    optimizer.zero_grad()
    # initialize h0
    hidden = net.init_hidden()
    print("Predicted string:", end='')
    for input, label in zip(inputs, labels):
        """
        inputs: (seq_len, batch_size, input_size)
        input:  (batch_size, input_size)
        labels: (seq_len, 1)
        label:  (1,)
        """
        # forward pass
        hidden = net(input, hidden)
        # accumulate the loss over all time steps
        loss += criterion(hidden, label)
        # prediction: take the index with the highest score
        _, idx = hidden.max(dim=1)
        print(idx2char[idx.item()], end="")
    loss.backward()   # backpropagation
    optimizer.step()  # parameter update
    print(', Epoch [%d/15] loss=%.4f' % (epoch+1, loss.item()))
The model's training output is shown below:
Predicted string:llloo, Epoch [1/15] loss=7.0430
Predicted string:llloo, Epoch [2/15] loss=6.2088
Predicted string:ollll, Epoch [3/15] loss=5.5192
Predicted string:ollll, Epoch [4/15] loss=4.9792
Predicted string:ollll, Epoch [5/15] loss=4.4750
Predicted string:oholl, Epoch [6/15] loss=4.0680
Predicted string:ohool, Epoch [7/15] loss=3.7638
Predicted string:oholl, Epoch [8/15] loss=3.5293
Predicted string:oholl, Epoch [9/15] loss=3.3460
Predicted string:oholl, Epoch [10/15] loss=3.2084
Predicted string:oholl, Epoch [11/15] loss=3.0930
Predicted string:oholl, Epoch [12/15] loss=2.9962
Predicted string:oholl, Epoch [13/15] loss=2.9253
Predicted string:ohlol, Epoch [14/15] loss=2.8782
Predicted string:ohlol, Epoch [15/15] loss=2.8446
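Once training is done, the learned mapping can be checked with a separate forward pass; a hedged inference sketch (no gradient tracking, reusing the same inputs):

with torch.no_grad():
    hidden = net.init_hidden()
    pred = []
    for input in inputs:
        hidden = net(input, hidden)
        pred.append(idx2char[hidden.max(dim=1)[1].item()])
print('Final prediction:', ''.join(pred))  # should approach "ohlol" as the loss falls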
Implementing the same model based on torch.nn.RNN
Step 1: prepare the training dataset
# Prepare the training data
idx2char = ['e', 'h', 'l', 'o']
x_data = [1, 0, 2, 2, 3]  # "hello"
y_data = [3, 1, 2, 3, 2]  # "ohlol"
one_hot_dir = [[1, 0, 0, 0],
               [0, 1, 0, 0],
               [0, 0, 1, 0],
               [0, 0, 0, 1]]
x_one_hot = [one_hot_dir[x] for x in x_data]
seq_len = 5
batch_size = 1
input_size = 4
inputs = torch.Tensor(x_one_hot).view(seq_len, batch_size, input_size)
labels = torch.LongTensor(y_data)
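A brief shape check (optional) makes the difference from the RNNCell version explicit: torch.nn.RNN consumes the whole sequence at once, and the labels stay flat so they line up with the flattened outputs below.

print(inputs.shape)  # torch.Size([5, 1, 4]): (seq_len, batch_size, input_size)
print(labels.shape)  # torch.Size([5]): one class index per time step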
Step 2: design the model
import torch

# Initialize model hyperparameters
input_size = 4
hidden_size = 4
batch_size = 1
seq_len = 5

# Design the model
class Model(torch.nn.Module):
    def __init__(self, input_size, hidden_size, batch_size, num_layers=1):
        super(Model, self).__init__()
        self.num_layers = num_layers
        self.batch_size = batch_size
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.rnn = torch.nn.RNN(input_size=input_size, hidden_size=hidden_size,
                                num_layers=num_layers)

    def forward(self, input):
        # h0: (num_layers, batch_size, hidden_size)
        hidden = torch.zeros(self.num_layers,
                             self.batch_size,
                             self.hidden_size)
        out, _ = self.rnn(input, hidden)
        # flatten to (seq_len * batch_size, hidden_size) for the loss
        return out.view(-1, self.hidden_size)

net = Model(input_size, hidden_size, batch_size, num_layers=1)
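A quick forward pass (optional sketch) confirms the flattened output shape expected by the loss function:

out = net(inputs)
print(out.shape)  # torch.Size([5, 4]): (seq_len * batch_size, hidden_size)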
Step 3: define the loss function and optimizer, then train the model.
# Define the loss function and optimizer
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.05)

# Train the model
for epoch in range(15):
    optimizer.zero_grad()
    outputs = net(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
    # prediction: take the index with the highest score at each time step
    _, idx = outputs.max(dim=1)
    idx = idx.data.numpy()
    print('Predicted:', ''.join([idx2char[x] for x in idx]), end='')
    print(', Epoch [%d/15] loss=%.4f' % (epoch+1, loss.item()))
The model's training results are shown below:
Predicted: oholl, Epoch [1/15] loss=0.8424
Predicted: oholl, Epoch [2/15] loss=0.8001
Predicted: oholl, Epoch [3/15] loss=0.7587
Predicted: ohool, Epoch [4/15] loss=0.7207
Predicted: ohool, Epoch [5/15] loss=0.6839
Predicted: ohlol, Epoch [6/15] loss=0.51
Predicted: ohlol, Epoch [7/15] loss=0.6073
Predicted: ohlol, Epoch [8/15] loss=0.5744
Predicted: ohlol, Epoch [9/15] loss=0.75
Predicted: ohlol, Epoch [10/15] loss=0.5240
Predicted: ohlol, Epoch [11/15] loss=0.5015
Predicted: ohlol, Epoch [12/15] loss=0.4810
Predicted: ohlol, Epoch [13/15] loss=0.4635
Predicted: ohlol, Epoch [14/15] loss=0.4492
Predicted: ohlol, Epoch [15/15] loss=0.4372
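As a closing note, torch.nn.RNN also accepts batch-first tensors via its batch_first flag; a minimal sketch under the same data (the reshape below is safe only because batch_size is 1):

rnn_bf = torch.nn.RNN(input_size=input_size, hidden_size=hidden_size,
                      num_layers=1, batch_first=True)
out, _ = rnn_bf(inputs.view(batch_size, seq_len, input_size))
print(out.shape)  # torch.Size([1, 5, 4]): (batch_size, seq_len, hidden_size)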