Long Short-Term Memory (LSTM) models are a type of recurrent neural network capable of learning across sequences of observations.
This makes them well suited to time series forecasting.
A problem with LSTMs is that they can easily overfit the training data, which hurts their predictive skill.
Dropout is a regularization method that can be applied during training: the input and recurrent connections to LSTM units are probabilistically excluded from activation and weight updates in the forward pass and the weight update, which can reduce overfitting and improve model performance.
In this tutorial, you will discover how to use dropout with LSTM networks and how to design experiments to test its effectiveness for time series forecasting.
After completing this tutorial, you will know:
- How to design a robust test harness for evaluating LSTM models on a time series forecasting problem.
- How to design, run, and interpret the results of experiments with dropout on the input connections of an LSTM.
- How to design, run, and interpret the results of experiments with dropout on the recurrent connections of an LSTM.
Let's get started.
This tutorial is broken down into 5 parts; they are:
1. Shampoo Sales Dataset
2. Experimental Test Harness
3. Baseline LSTM Model
4. Dropout on the Input Connections
5. Dropout on the Recurrent Connections
This tutorial assumes you have a Python SciPy environment installed; either Python 2 or 3 will work.
This tutorial assumes you have Keras v2.0 or higher installed with either the TensorFlow or Theano backend.
This tutorial also assumes you have scikit-learn, Pandas, NumPy, and Matplotlib installed.
If you need help setting up your Python environment, see this post:
Next, let's take a look at a standard time series forecasting problem that we can use as context for this tutorial.
The dataset describes the monthly number of sales of shampoo over a 3-year period.
The units are a sales count, and there are 36 observations. The original dataset is credited to Makridakis, Wheelwright, and Hyndman (1998).
The example below loads the dataset and creates a line plot of the loaded data.
# load and plot dataset
from pandas import read_csv
from pandas import datetime
from matplotlib import pyplot
# load dataset
def parser(x):
    return datetime.strptime('190'+x, '%Y-%m')
series = read_csv('shampoo-sales.csv', header=0, parse_dates=[0], index_col=0, squeeze=True, date_parser=parser)
# summarize first few rows
print(series.head())
# line plot
series.plot()
pyplot.show()
Running the example loads the dataset as a Pandas Series and prints the first 5 rows.
Month
1901-01-01 266.0
1901-02-01 145.9
1901-03-01 183.1
1901-04-01 119.3
1901-05-01 180.3
Name: Sales, dtype: float64
A line plot of the series is then created, showing a clear increasing trend in sales over time.
Next, we will take a look at the model configuration and test harness used in the experiments.
This section describes the test harness used in this tutorial.
We will split the dataset into two parts: a training set and a test set.
The first two years of data will be used for the training dataset and the remaining one year of data will be used for the test set.
Models will be developed on the training dataset and used to make predictions on the test dataset.
A naive persistence forecast achieves an error of 136.761 monthly shampoo sales on the test set; this provides an upper bound on acceptable error (i.e., a lower bound on performance).
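As a point of reference, the sketch below shows one way such a persistence score can be computed; the helper name persistence_rmse is ours for illustration, not part of the tutorial code.

from math import sqrt
from sklearn.metrics import mean_squared_error

# naive baseline: forecast each month as the previous month's observed value
def persistence_rmse(values, n_test=12):
    train, test = values[:-n_test], values[-n_test:]
    history = list(train)
    predictions = list()
    for t in range(len(test)):
        predictions.append(history[-1])  # repeat the last known observation
        history.append(test[t])          # walk forward: reveal the true value
    return sqrt(mean_squared_error(test, predictions))

# usage on the loaded series: persistence_rmse(series.values)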
A rolling-forecast scenario will be used, also called walk-forward model validation.
Each time step of the test dataset will be walked one at a time: a model is used to make a forecast for the time step, then the actual observation from the test set is made available to the model for the forecast of the next time step.
This mimics a real-world scenario where new shampoo sales observations would become available each month and be used in forecasting the following month.
This is simulated by the way the train and test datasets are structured.
All forecasts on the test dataset will be collected and an error score calculated to summarize the skill of the model. The root mean squared error (RMSE) will be used because it punishes large errors and results in a score that is in the same units as the forecast data, namely monthly shampoo sales.
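Concretely, RMSE here is just the square root of scikit-learn's mean squared error; with two toy forecasts that are off by 10 and 20 units of sales:

from math import sqrt
from sklearn.metrics import mean_squared_error

# sqrt((10^2 + 20^2) / 2) is roughly 15.81, in the same units as the data
print(sqrt(mean_squared_error([100.0, 120.0], [110.0, 100.0])))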
Before we can fit a model to the dataset, we must transform the data.
The following three data transforms are performed on the dataset prior to fitting a model and making a forecast:
- Difference the series to remove the increasing trend and make it stationary.
- Reframe the series as a supervised learning problem of lagged input and output values.
- Scale the observations to the range [-1, 1], matching the hyperbolic tangent output of the LSTM units.
When making predictions, these transforms are inverted in reverse order to return the forecasts to their original scale before the error score is calculated.
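As a toy illustration of this round trip (the numbers here are made up; the real helper functions appear in the listings below):

import numpy
from sklearn.preprocessing import MinMaxScaler

raw = numpy.array([10.0, 12.0, 15.0, 14.0, 18.0])  # toy series with a trend

# forward: difference to remove the trend, then scale to [-1, 1]
diff = raw[1:] - raw[:-1]
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = scaler.fit_transform(diff.reshape(-1, 1))

# a model would forecast in this transformed scale; use the last value as a stand-in
yhat_scaled = scaled[-1, 0]

# inverse: unscale, then add back the last raw observation to undo the differencing
yhat_diff = scaler.inverse_transform([[yhat_scaled]])[0, 0]
yhat = yhat_diff + raw[-1]
print(yhat)  # back in the original units (monthly sales)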
We will use a basic stateful LSTM model with 3 neurons fit for 1000 training epochs.
A batch size of 4 is used, as in the code below; it divides evenly into both the trimmed training set and the 12 months of the test set.
Fitting with such small batches approaches online training rather than batch or mini-batch training, so some variance in the model fit is to be expected.
Ideally, more training epochs would be used (such as 1500), but this was trimmed to 1000 to keep run times reasonable.
The model will be fit using the efficient ADAM optimization algorithm and the mean squared error loss function.
Each experimental scenario will be run 30 times, and the RMSE on the test set recorded at the end of each run.
Let's dive into the experiments.
We will start with the baseline LSTM model.
The baseline LSTM model for this problem has the following configuration, matching the code below:
- Lag inputs: 1
- Epochs: 1000
- Units in the LSTM hidden layer: 3
- Batch size: 4
- Repeats: 30
The complete code listing is provided below.
This code provides the basis for all of the experiments that follow; in subsequent sections, only the changes to this code are listed.
from pandas import DataFrame
from pandas import Series
from pandas import concat
from pandas import read_csv
from pandas import datetime
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from math import sqrt
import matplotlib
# be able to save images on server
matplotlib.use('Agg')
from matplotlib import pyplot
import numpy

# date-time parsing function for loading the dataset
def parser(x):
    return datetime.strptime('190'+x, '%Y-%m')

# frame a sequence as a supervised learning problem
def timeseries_to_supervised(data, lag=1):
    df = DataFrame(data)
    columns = [df.shift(i) for i in range(1, lag+1)]
    columns.append(df)
    df = concat(columns, axis=1)
    return df

# create a differenced series
def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return Series(diff)

# invert differenced value
def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]

# scale train and test data to [-1, 1]
def scale(train, test):
    # fit scaler
    scaler = MinMaxScaler(feature_range=(-1, 1))
    scaler = scaler.fit(train)
    # transform train
    train = train.reshape(train.shape[0], train.shape[1])
    train_scaled = scaler.transform(train)
    # transform test
    test = test.reshape(test.shape[0], test.shape[1])
    test_scaled = scaler.transform(test)
    return scaler, train_scaled, test_scaled

# inverse scaling for a forecasted value
def invert_scale(scaler, X, yhat):
    new_row = [x for x in X] + [yhat]
    array = numpy.array(new_row)
    array = array.reshape(1, len(array))
    inverted = scaler.inverse_transform(array)
    return inverted[0, -1]

# fit an LSTM network to training data
def fit_lstm(train, n_batch, nb_epoch, n_neurons):
    X, y = train[:, 0:-1], train[:, -1]
    X = X.reshape(X.shape[0], 1, X.shape[1])
    model = Sequential()
    model.add(LSTM(n_neurons, batch_input_shape=(n_batch, X.shape[1], X.shape[2]), stateful=True))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    for i in range(nb_epoch):
        model.fit(X, y, epochs=1, batch_size=n_batch, verbose=0, shuffle=False)
        model.reset_states()
    return model

# run a repeated experiment
def experiment(series, n_lag, n_repeats, n_epochs, n_batch, n_neurons):
    # transform data to be stationary
    raw_values = series.values
    diff_values = difference(raw_values, 1)
    # transform data to be supervised learning
    supervised = timeseries_to_supervised(diff_values, n_lag)
    supervised_values = supervised.values[n_lag:, :]
    # split data into train and test-sets
    train, test = supervised_values[0:-12], supervised_values[-12:]
    # transform the scale of the data
    scaler, train_scaled, test_scaled = scale(train, test)
    # run experiment
    error_scores = list()
    for r in range(n_repeats):
        # fit the model, trimming the training set so it divides evenly by the batch size
        train_trimmed = train_scaled[2:, :]
        lstm_model = fit_lstm(train_trimmed, n_batch, n_epochs, n_neurons)
        # forecast test dataset
        test_reshaped = test_scaled[:, 0:-1]
        test_reshaped = test_reshaped.reshape(len(test_reshaped), 1, 1)
        output = lstm_model.predict(test_reshaped, batch_size=n_batch)
        predictions = list()
        for i in range(len(output)):
            yhat = output[i, 0]
            X = test_scaled[i, 0:-1]
            # invert scaling
            yhat = invert_scale(scaler, X, yhat)
            # invert differencing
            yhat = inverse_difference(raw_values, yhat, len(test_scaled)+1-i)
            # store forecast
            predictions.append(yhat)
        # report performance
        rmse = sqrt(mean_squared_error(raw_values[-12:], predictions))
        print('%d) Test RMSE: %.3f' % (r+1, rmse))
        error_scores.append(rmse)
    return error_scores

# configure the experiment
def run():
    # load dataset
    series = read_csv('shampoo-sales.csv', header=0, parse_dates=[0], index_col=0, squeeze=True, date_parser=parser)
    # configure the experiment
    n_lag = 1
    n_repeats = 30
    n_epochs = 1000
    n_batch = 4
    n_neurons = 3
    # run the experiment
    results = DataFrame()
    results['results'] = experiment(series, n_lag, n_repeats, n_epochs, n_batch, n_neurons)
    # summarize results
    print(results.describe())
    # save boxplot
    results.boxplot()
    pyplot.savefig('experiment_baseline.png')

# entry point
run()
Running the experiment prints summary statistics of the test RMSE over all 30 runs.
We can see that on average this model configuration achieved a test RMSE of about 92 monthly shampoo sales, with a standard deviation of about 5.
results
count 30.000000
mean 92.842537
std 5.748456
min 81.205979
25% 89.514367
50% 92.030003
75% 96.926145
max 105.247117
A box and whisker plot of the distribution of test RMSE scores is also created and saved to file.
The plot gives a concise summary of the spread of the results: the box covers the middle 50% of the data (the interquartile range) and the green line marks the median.
Another angle to consider with a network configuration is how the model behaves over time while it is being fit.
We can evaluate the model on the training and test datasets after each training epoch to get an idea of whether the configuration is overfitting or underfitting the problem.
We will use this diagnostic approach on the top result from each set of experiments. The configuration will be run a total of 10 times, and the train and test RMSE after each training epoch plotted as line plots.
In this case, we will run the diagnostic on the baseline LSTM fit for 1000 epochs.
The complete diagnostic code listing is provided below.
As with the previous listing, this code will serve as the basis for all diagnostics in this tutorial; in subsequent sections, only the changes to it are listed.
from pandas import DataFrame
from pandas import Series
from pandas import concat
from pandas import read_csv
from pandas import datetime
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from math import sqrt
import matplotlib
# be able to save images on server
matplotlib.use('Agg')
from matplotlib import pyplot
import numpy

# date-time parsing function for loading the dataset
def parser(x):
    return datetime.strptime('190'+x, '%Y-%m')

# frame a sequence as a supervised learning problem
def timeseries_to_supervised(data, lag=1):
    df = DataFrame(data)
    columns = [df.shift(i) for i in range(1, lag+1)]
    columns.append(df)
    df = concat(columns, axis=1)
    return df

# create a differenced series
def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return Series(diff)

# scale train and test data to [-1, 1]
def scale(train, test):
    # fit scaler
    scaler = MinMaxScaler(feature_range=(-1, 1))
    scaler = scaler.fit(train)
    # transform train
    train = train.reshape(train.shape[0], train.shape[1])
    train_scaled = scaler.transform(train)
    # transform test
    test = test.reshape(test.shape[0], test.shape[1])
    test_scaled = scaler.transform(test)
    return scaler, train_scaled, test_scaled

# inverse scaling for a forecasted value
def invert_scale(scaler, X, yhat):
    new_row = [x for x in X] + [yhat]
    array = numpy.array(new_row)
    array = array.reshape(1, len(array))
    inverted = scaler.inverse_transform(array)
    return inverted[0, -1]

# evaluate the model on a dataset, returns RMSE in transformed units
def evaluate(model, raw_data, scaled_dataset, scaler, offset, batch_size):
    # separate
    X, y = scaled_dataset[:, 0:-1], scaled_dataset[:, -1]
    # reshape
    reshaped = X.reshape(len(X), 1, 1)
    # forecast dataset
    output = model.predict(reshaped, batch_size=batch_size)
    # invert data transforms on forecast
    predictions = list()
    for i in range(len(output)):
        yhat = output[i, 0]
        # invert scaling
        yhat = invert_scale(scaler, X[i], yhat)
        # invert differencing
        yhat = yhat + raw_data[i]
        # store forecast
        predictions.append(yhat)
    # report performance
    rmse = sqrt(mean_squared_error(raw_data[1:], predictions))
    # reset model state
    model.reset_states()
    return rmse

# fit an LSTM network to training data
def fit_lstm(train, test, raw, scaler, batch_size, nb_epoch, neurons):
    X, y = train[:, 0:-1], train[:, -1]
    X = X.reshape(X.shape[0], 1, X.shape[1])
    # prepare model
    model = Sequential()
    model.add(LSTM(neurons, batch_input_shape=(batch_size, X.shape[1], X.shape[2]), stateful=True))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    # fit model
    train_rmse, test_rmse = list(), list()
    for i in range(nb_epoch):
        model.fit(X, y, epochs=1, batch_size=batch_size, verbose=0, shuffle=False)
        model.reset_states()
        # evaluate model on train data
        raw_train = raw[-(len(train)+len(test)+1):-len(test)]
        train_rmse.append(evaluate(model, raw_train, train, scaler, 0, batch_size))
        # evaluate model on test data
        raw_test = raw[-(len(test)+1):]
        test_rmse.append(evaluate(model, raw_test, test, scaler, 0, batch_size))
    history = DataFrame()
    history['train'], history['test'] = train_rmse, test_rmse
    return history

# run diagnostic experiments
def run():
    # config
    n_lag = 1
    n_repeats = 10
    n_epochs = 1000
    n_batch = 4
    n_neurons = 3
    # load dataset
    series = read_csv('shampoo-sales.csv', header=0, parse_dates=[0], index_col=0, squeeze=True, date_parser=parser)
    # transform data to be stationary
    raw_values = series.values
    diff_values = difference(raw_values, 1)
    # transform data to be supervised learning
    supervised = timeseries_to_supervised(diff_values, n_lag)
    supervised_values = supervised.values[n_lag:, :]
    # split data into train and test-sets
    train, test = supervised_values[0:-12], supervised_values[-12:]
    # transform the scale of the data
    scaler, train_scaled, test_scaled = scale(train, test)
    # fit and evaluate model, trimming the training set so it divides evenly by the batch size
    train_trimmed = train_scaled[2:, :]
    # run diagnostic tests
    for i in range(n_repeats):
        history = fit_lstm(train_trimmed, test_scaled, raw_values, scaler, n_batch, n_epochs, n_neurons)
        pyplot.plot(history['train'], color='blue')
        pyplot.plot(history['test'], color='orange')
        print('%d) TrainRMSE=%f, TestRMSE=%f' % (i+1, history['train'].iloc[-1], history['test'].iloc[-1]))
    pyplot.savefig('diagnostic_baseline.png')

# entry point
run()
Running the diagnostic prints the final train and test RMSE for each of the 10 runs and, more importantly, creates a line plot of the train (blue) and test (orange) RMSE after each training epoch.
In this case, the diagnostic plot shows a steady decrease in error over the first 400-500 epochs, after which the train error keeps falling while the test error rises slightly, a sign that the model may be overfitting the training data.
We can apply dropout to the input connections of the LSTM units.
Dropout on the inputs means that for a given probability, the data on each input connection to an LSTM block will be excluded from node activation and weight updates.
In Keras, this is specified with the dropout argument when creating an LSTM layer; its value is a dropout probability between 0 and 1.
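As a minimal sketch, the layer below mirrors the sizes used in the experiment that follows (3 units, batch size 4, a single lag input) with 20% input dropout:

from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM

# 20% of the input connections are dropped while fitting;
# dropout is inactive when calling predict(), so forecasts stay deterministic
model = Sequential()
model.add(LSTM(3, batch_input_shape=(4, 1, 1), stateful=True, dropout=0.2))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')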
In this experiment, we will compare no dropout to input dropout rates of 20%, 40%, and 60%.
Below are the updated fit_lstm(), experiment(), and run() functions for using input dropout with the LSTM.
# fit an LSTM network to training data
def fit_lstm(train, n_batch, nb_epoch, n_neurons, dropout):
    X, y = train[:, 0:-1], train[:, -1]
    X = X.reshape(X.shape[0], 1, X.shape[1])
    model = Sequential()
    model.add(LSTM(n_neurons, batch_input_shape=(n_batch, X.shape[1], X.shape[2]), stateful=True, dropout=dropout))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    for i in range(nb_epoch):
        model.fit(X, y, epochs=1, batch_size=n_batch, verbose=0, shuffle=False)
        model.reset_states()
    return model

# run a repeated experiment
def experiment(series, n_lag, n_repeats, n_epochs, n_batch, n_neurons, dropout):
    # transform data to be stationary
    raw_values = series.values
    diff_values = difference(raw_values, 1)
    # transform data to be supervised learning
    supervised = timeseries_to_supervised(diff_values, n_lag)
    supervised_values = supervised.values[n_lag:, :]
    # split data into train and test-sets
    train, test = supervised_values[0:-12], supervised_values[-12:]
    # transform the scale of the data
    scaler, train_scaled, test_scaled = scale(train, test)
    # run experiment
    error_scores = list()
    for r in range(n_repeats):
        # fit the model
        train_trimmed = train_scaled[2:, :]
        lstm_model = fit_lstm(train_trimmed, n_batch, n_epochs, n_neurons, dropout)
        # forecast test dataset
        test_reshaped = test_scaled[:, 0:-1]
        test_reshaped = test_reshaped.reshape(len(test_reshaped), 1, 1)
        output = lstm_model.predict(test_reshaped, batch_size=n_batch)
        predictions = list()
        for i in range(len(output)):
            yhat = output[i, 0]
            X = test_scaled[i, 0:-1]
            # invert scaling
            yhat = invert_scale(scaler, X, yhat)
            # invert differencing
            yhat = inverse_difference(raw_values, yhat, len(test_scaled)+1-i)
            # store forecast
            predictions.append(yhat)
        # report performance
        rmse = sqrt(mean_squared_error(raw_values[-12:], predictions))
        print('%d) Test RMSE: %.3f' % (r+1, rmse))
        error_scores.append(rmse)
    return error_scores

# configure the experiment
def run():
    # load dataset
    series = read_csv('shampoo-sales.csv', header=0, parse_dates=[0], index_col=0, squeeze=True, date_parser=parser)
    # configure the experiment
    n_lag = 1
    n_repeats = 30
    n_epochs = 1000
    n_batch = 4
    n_neurons = 3
    n_dropout = [0.0, 0.2, 0.4, 0.6]
    # run the experiment
    results = DataFrame()
    for dropout in n_dropout:
        results[str(dropout)] = experiment(series, n_lag, n_repeats, n_epochs, n_batch, n_neurons, dropout)
    # summarize results
    print(results.describe())
    # save boxplot
    results.boxplot()
    pyplot.savefig('experiment_dropout_input.png')
Running this experiment prints descriptive statistics of the test RMSE for each configuration.
The results suggest that on average an input dropout rate of 40% performs best, although the difference between the mean results for 20%, 40%, and 60% dropout is very small; all three improved on no dropout.
0.0 0.2 0.4 0.6
count 30.000000 30.000000 30.000000 30.000000
mean 97.578280 89.448450 88.957421 89.810789
std 7.927639 5.807239 4.070037 3.467317
min 84.749785 81.315336 80.662878 84.300135
25% 92.520968 84.712064 85.885858 87.766818
50% 97.324110 88.109654 88.790068 89.585945
75% 101.258252 93.642621 91.515127 91.109452
max 123.578235 104.528209 96.687333 99.660331
Again, a box and whisker plot is created to compare the distributions of results for each configuration.
The plot shows the spread of test RMSE narrowing as the input dropout rate increases. It also suggests that the 20% dropout rate may have a slightly lower median test RMSE.
The results do encourage the use of some input dropout for this LSTM configuration, perhaps set to around 40%.
We can analyze how 40% input dropout affects the dynamics of the model while it is being fit on the training data.
The code below summarizes the updates to the fit_lstm() and run() functions of the diagnostic script.
# fit an LSTM network to training data
def fit_lstm(train, test, raw, scaler, batch_size, nb_epoch, neurons, dropout):
    X, y = train[:, 0:-1], train[:, -1]
    X = X.reshape(X.shape[0], 1, X.shape[1])
    # prepare model
    model = Sequential()
    model.add(LSTM(neurons, batch_input_shape=(batch_size, X.shape[1], X.shape[2]), stateful=True, dropout=dropout))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    # fit model
    train_rmse, test_rmse = list(), list()
    for i in range(nb_epoch):
        model.fit(X, y, epochs=1, batch_size=batch_size, verbose=0, shuffle=False)
        model.reset_states()
        # evaluate model on train data
        raw_train = raw[-(len(train)+len(test)+1):-len(test)]
        train_rmse.append(evaluate(model, raw_train, train, scaler, 0, batch_size))
        # evaluate model on test data
        raw_test = raw[-(len(test)+1):]
        test_rmse.append(evaluate(model, raw_test, test, scaler, 0, batch_size))
    history = DataFrame()
    history['train'], history['test'] = train_rmse, test_rmse
    return history

# run diagnostic experiments
def run():
    # config
    n_lag = 1
    n_repeats = 10
    n_epochs = 1000
    n_batch = 4
    n_neurons = 3
    dropout = 0.4
    # load dataset
    series = read_csv('shampoo-sales.csv', header=0, parse_dates=[0], index_col=0, squeeze=True, date_parser=parser)
    # transform data to be stationary
    raw_values = series.values
    diff_values = difference(raw_values, 1)
    # transform data to be supervised learning
    supervised = timeseries_to_supervised(diff_values, n_lag)
    supervised_values = supervised.values[n_lag:, :]
    # split data into train and test-sets
    train, test = supervised_values[0:-12], supervised_values[-12:]
    # transform the scale of the data
    scaler, train_scaled, test_scaled = scale(train, test)
    # fit and evaluate model
    train_trimmed = train_scaled[2:, :]
    # run diagnostic tests
    for i in range(n_repeats):
        history = fit_lstm(train_trimmed, test_scaled, raw_values, scaler, n_batch, n_epochs, n_neurons, dropout)
        pyplot.plot(history['train'], color='blue')
        pyplot.plot(history['test'], color='orange')
        print('%d) TrainRMSE=%f, TestRMSE=%f' % (i+1, history['train'].iloc[-1], history['test'].iloc[-1]))
    pyplot.savefig('diagnostic_dropout_input.png')
Running the updated diagnostic creates a plot of the train and test RMSE of the model, now with input dropout, after each training epoch.
The results show a clear change in the trajectory of the error curves, most visibly on the test set.
We can also see that the symptom of overfitting has been addressed: the test RMSE keeps decreasing over the full 1000 epochs, which may suggest that more training epochs are needed to take full advantage of this behavior.
Dropout can also be applied to the recurrent input signal of the LSTM units.
In Keras, this is achieved by setting the recurrent_dropout argument when defining an LSTM layer.
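A minimal sketch, again mirroring the layer sizes used in the experiment below, this time with 20% recurrent dropout:

from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM

# 20% of the recurrent (state-to-state) connections are dropped while fitting only
model = Sequential()
model.add(LSTM(3, batch_input_shape=(4, 1, 1), stateful=True, recurrent_dropout=0.2))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')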
In this experiment, we will compare no dropout to recurrent dropout rates of 20%, 40%, and 60%.
Below are the updated fit_lstm(), experiment(), and run() functions for using recurrent dropout with the LSTM.
# fit an LSTM network to training data
def fit_lstm(train, n_batch, nb_epoch, n_neurons, dropout):
    X, y = train[:, 0:-1], train[:, -1]
    X = X.reshape(X.shape[0], 1, X.shape[1])
    model = Sequential()
    model.add(LSTM(n_neurons, batch_input_shape=(n_batch, X.shape[1], X.shape[2]), stateful=True, recurrent_dropout=dropout))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    for i in range(nb_epoch):
        model.fit(X, y, epochs=1, batch_size=n_batch, verbose=0, shuffle=False)
        model.reset_states()
    return model

# run a repeated experiment
def experiment(series, n_lag, n_repeats, n_epochs, n_batch, n_neurons, dropout):
    # transform data to be stationary
    raw_values = series.values
    diff_values = difference(raw_values, 1)
    # transform data to be supervised learning
    supervised = timeseries_to_supervised(diff_values, n_lag)
    supervised_values = supervised.values[n_lag:, :]
    # split data into train and test-sets
    train, test = supervised_values[0:-12], supervised_values[-12:]
    # transform the scale of the data
    scaler, train_scaled, test_scaled = scale(train, test)
    # run experiment
    error_scores = list()
    for r in range(n_repeats):
        # fit the model
        train_trimmed = train_scaled[2:, :]
        lstm_model = fit_lstm(train_trimmed, n_batch, n_epochs, n_neurons, dropout)
        # forecast test dataset
        test_reshaped = test_scaled[:, 0:-1]
        test_reshaped = test_reshaped.reshape(len(test_reshaped), 1, 1)
        output = lstm_model.predict(test_reshaped, batch_size=n_batch)
        predictions = list()
        for i in range(len(output)):
            yhat = output[i, 0]
            X = test_scaled[i, 0:-1]
            # invert scaling
            yhat = invert_scale(scaler, X, yhat)
            # invert differencing
            yhat = inverse_difference(raw_values, yhat, len(test_scaled)+1-i)
            # store forecast
            predictions.append(yhat)
        # report performance
        rmse = sqrt(mean_squared_error(raw_values[-12:], predictions))
        print('%d) Test RMSE: %.3f' % (r+1, rmse))
        error_scores.append(rmse)
    return error_scores

# configure the experiment
def run():
    # load dataset
    series = read_csv('shampoo-sales.csv', header=0, parse_dates=[0], index_col=0, squeeze=True, date_parser=parser)
    # configure the experiment
    n_lag = 1
    n_repeats = 30
    n_epochs = 1000
    n_batch = 4
    n_neurons = 3
    n_dropout = [0.0, 0.2, 0.4, 0.6]
    # run the experiment
    results = DataFrame()
    for dropout in n_dropout:
        results[str(dropout)] = experiment(series, n_lag, n_repeats, n_epochs, n_batch, n_neurons, dropout)
    # summarize results
    print(results.describe())
    # save boxplot
    results.boxplot()
    pyplot.savefig('experiment_dropout_recurrent.png')
Running this experiment prints descriptive statistics of the test RMSE for each configuration.
The average results suggest that a recurrent dropout of 20% or 40% is preferred, but overall the results are not much better than no dropout at all.
0.0 0.2 0.4 0.6
count 30.000000 30.000000 30.000000 30.000000
mean 95.743719 93.658016 93.706112 97.354599
std 9.222134 7.318882 5.591550 5.626212
min 80.144342 83.668154 84.585629 87.215540
25% 88.336066 87.071944 89.859503 93.940016
50% 96.703481 92.522428 92.698024 97.119864
75% 101.902782 100.554822 96.252689 100.915336
max 113.400863 106.222955 104.347850 114.160922
Again, a box and whisker plot is created to compare the distributions of results for each configuration.
The plot highlights the tighter distribution of test RMSE with 40% recurrent dropout compared to 20% and no dropout, which may make that configuration more dependable. The plot also suggests that the minimum (best-case) test RMSE appears to have been hurt by dropout, giving a worse best-case performance.
We can analyze how 40% recurrent dropout affects the dynamics of the model while it is being fit on the training data.
The code below summarizes the updates to the fit_lstm() and run() functions of the diagnostic script.
# fit an LSTM network to training data
def fit_lstm(train, test, raw, scaler, batch_size, nb_epoch, neurons, dropout):
    X, y = train[:, 0:-1], train[:, -1]
    X = X.reshape(X.shape[0], 1, X.shape[1])
    # prepare model
    model = Sequential()
    model.add(LSTM(neurons, batch_input_shape=(batch_size, X.shape[1], X.shape[2]), stateful=True, recurrent_dropout=dropout))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    # fit model
    train_rmse, test_rmse = list(), list()
    for i in range(nb_epoch):
        model.fit(X, y, epochs=1, batch_size=batch_size, verbose=0, shuffle=False)
        model.reset_states()
        # evaluate model on train data
        raw_train = raw[-(len(train)+len(test)+1):-len(test)]
        train_rmse.append(evaluate(model, raw_train, train, scaler, 0, batch_size))
        # evaluate model on test data
        raw_test = raw[-(len(test)+1):]
        test_rmse.append(evaluate(model, raw_test, test, scaler, 0, batch_size))
    history = DataFrame()
    history['train'], history['test'] = train_rmse, test_rmse
    return history

# run diagnostic experiments
def run():
    # config
    n_lag = 1
    n_repeats = 10
    n_epochs = 1000
    n_batch = 4
    n_neurons = 3
    dropout = 0.4
    # load dataset
    series = read_csv('shampoo-sales.csv', header=0, parse_dates=[0], index_col=0, squeeze=True, date_parser=parser)
    # transform data to be stationary
    raw_values = series.values
    diff_values = difference(raw_values, 1)
    # transform data to be supervised learning
    supervised = timeseries_to_supervised(diff_values, n_lag)
    supervised_values = supervised.values[n_lag:, :]
    # split data into train and test-sets
    train, test = supervised_values[0:-12], supervised_values[-12:]
    # transform the scale of the data
    scaler, train_scaled, test_scaled = scale(train, test)
    # fit and evaluate model
    train_trimmed = train_scaled[2:, :]
    # run diagnostic tests
    for i in range(n_repeats):
        history = fit_lstm(train_trimmed, test_scaled, raw_values, scaler, n_batch, n_epochs, n_neurons, dropout)
        pyplot.plot(history['train'], color='blue')
        pyplot.plot(history['test'], color='orange')
        print('%d) TrainRMSE=%f, TestRMSE=%f' % (i+1, history['train'].iloc[-1], history['test'].iloc[-1]))
    pyplot.savefig('diagnostic_dropout_recurrent.png')
Running the updated diagnostic creates a plot of the train and test RMSE of the model after each training epoch.
The results show a clear change in the error trajectory on the test set, with little visible effect on the training set. We can also see that the test RMSE levels off after about 500 epochs and shows no tendency to rise again.
At least for this LSTM configuration on this problem, recurrent dropout may not add much value.
This section lists some ideas for extending the experiments that you may wish to explore after completing this tutorial.
For more on using dropout with MLP models in Keras, see the post:
Below are some papers on dropout with LSTM networks that you may find useful for further reading.
In this tutorial, you discovered how to use dropout with LSTM models for time series forecasting.
Specifically, you learned:
- How to design a test harness for evaluating LSTM models on a time series forecasting problem.
- How to design, run, and interpret the results of experiments with dropout on the input connections of an LSTM.
- How to design, run, and interpret the results of experiments with dropout on the recurrent connections of an LSTM.
Do you still have questions about using dropout with LSTMs? Ask your questions in the comments below and I will do my best to answer.