In my dataset the target/output variable is the Sales column, and each row records the Sales for one day of the years 2008-2017.
My goal is to build an LSTM model on such a dataset that can produce a forecast once training ends. I train the model on the 2008-2016 data, use half of the 2017 data for validation, and keep the rest as the test set.
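For concreteness, a minimal sketch of this split, assuming the raw data sits in a data frame dt with a Date column (both names are illustrative; the post never shows them) and taking the first half of 2017 as the validation slice:

yr <- as.integer(format(dt$Date, "%Y"))
dt.tr.raw  <- dt[yr <= 2016, ]             # 2008-2016: training
dt.2017    <- dt[yr == 2017, ]
half       <- floor(nrow(dt.2017) / 2)
dt.val.raw <- dt.2017[seq_len(half), ]     # first half of 2017: validation
dt.te.raw  <- dt.2017[-seq_len(half), ]    # second half of 2017: test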
Previously, I tried building a model with dropout and early stopping, as shown below:
mdl1 <- keras_model_sequential()
mdl1 %>%
  # two stacked LSTM layers, each followed by dropout for regularization
  layer_lstm(units = 512, input_shape = c(1, 3), return_sequences = TRUE) %>%
  layer_dropout(rate = 0.3) %>%
  layer_lstm(units = 512, return_sequences = FALSE) %>%
  layer_dropout(rate = 0.2) %>%
  # single linear output for the Sales forecast
  layer_dense(units = 1, activation = "linear")
mdl1 %>% compile(loss = 'mse', optimizer = 'rmsprop')
The model looks like this:
___________________________________________________________
Layer (type)                 Output Shape          Param #
===========================================================
lstm_25 (LSTM)               (None, 1, 512)        1056768
___________________________________________________________
dropout_25 (Dropout)         (None, 1, 512)        0
___________________________________________________________
lstm_26 (LSTM)               (None, 512)           2099200
___________________________________________________________
dropout_26 (Dropout)         (None, 512)           0
___________________________________________________________
dense_13 (Dense)             (None, 1)             513
===========================================================
Total params: 3,156,481
Trainable params: 3,156,481
Non-trainable params: 0
___________________________________________________________
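As a sanity check on these counts (a general Keras fact, not something stated in the post): an LSTM layer with input dimension d and u units has 4 * (u * (d + u) + u) parameters, four gates each with input weights, recurrent weights and a bias, so

4 * (512 * (3 + 512) + 512)   = 1,056,768   (lstm_25)
4 * (512 * (512 + 512) + 512) = 2,099,200   (lstm_26)
512 + 1                       = 513         (dense_13)

which matches the summary above.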
To train the model, early stopping is used together with the validation set.
mdl1.history <- mdl1 %>%
  fit(dt.tr, dt.tr.out, epochs = 500, shuffle = FALSE,
      validation_data = list(dt.val, dt.val.out),
      callbacks = list(
        # stop once validation loss has not improved for 10 epochs
        callback_early_stopping(min_delta = 0.000001, patience = 10, verbose = 1)
      ))
On top of this, I would like to use batch normalization to speed up training. As I understand it, to use batch normalization I need to divide the data into batches and apply layer_batch_normalization to the input of each hidden layer. The model layers then look like this:
batch_size <- 32
mdl2 <- keras_model_sequential()
mdl2 %>%
  # batch normalization placed before each hidden layer's input
  layer_batch_normalization(input_shape = c(1, 3), batch_size = batch_size) %>%
  layer_lstm(units = 512, return_sequences = TRUE) %>%
  layer_dropout(rate = 0.3) %>%
  layer_batch_normalization(batch_size = batch_size) %>%
  layer_lstm(units = 512, return_sequences = FALSE) %>%
  layer_dropout(rate = 0.2) %>%
  layer_batch_normalization(batch_size = batch_size) %>%
  layer_dense(units = 1, activation = "linear")
mdl2 %>% compile(loss = 'mse', optimizer = 'rmsprop')
This model looks like this:
______________________________________________________________________________
Layer (type)                                  Output Shape           Param #
==============================================================================
batch_normalization_34 (BatchNormalization)   (32, 1, 3)             12
______________________________________________________________________________
lstm_27 (LSTM)                                (32, 1, 512)           1056768
______________________________________________________________________________
dropout_27 (Dropout)                          (32, 1, 512)           0
______________________________________________________________________________
batch_normalization_35 (BatchNormalization)   (32, 1, 512)           2048
______________________________________________________________________________
lstm_28 (LSTM)                                (32, 1, 512)           2099200
______________________________________________________________________________
dropout_28 (Dropout)                          (32, 1, 512)           0
______________________________________________________________________________
batch_normalization_36 (BatchNormalization)   (32, 1, 512)           2048
______________________________________________________________________________
dense_14 (Dense)                              (32, 1, 1)             513
==============================================================================
Total params: 3,160,589
Trainable params: 3,158,535
Non-trainable params: 2,054
______________________________________________________________________________
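The extra parameters relative to mdl1 all come from the batch-normalization layers: each BN layer carries four values per feature (gamma, beta, moving mean, moving variance), of which only gamma and beta are trainable. That gives 4 * 3 = 12 and twice 4 * 512 = 2048 parameters, and a non-trainable count of (12 + 2048 + 2048) / 2 = 2,054, matching the summary.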
Training the model looks the same as before. The only difference lies in the training and validation datasets, whose sizes are made a multiple of batch_size (32 here) by resampling data from the second-to-last batch into the last batch, as sketched below.
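One way to do that resampling, sketched here for a 2-D matrix of samples since the post does not show its exact code (for the 3-D LSTM input array the same idea applies along the first dimension):

pad_to_batch <- function(x, batch_size = 32) {
  n     <- nrow(x)
  short <- (batch_size - n %% batch_size) %% batch_size  # rows missing from a full last batch
  if (short > 0)
    x <- rbind(x, x[(n - short + 1):n, , drop = FALSE])  # reuse rows from the tail
  x
}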
However, as the (omitted) performance plots showed, mdl1 performs much better than mdl2.
I am not sure what exactly I am doing wrong, since I am just getting started with keras (and with practical neural networks in general). Also, the first model's performance is not very good either; any suggestions on how to improve it would be welcome too.
Posted on 2019-08-07 00:55:30
Batch normalization in an LSTM is not that easy to apply. Some papers show amazing results; https://arxiv.org/pdf/1603.09025.pdf calls it recurrent batch normalization. The authors apply it following the equations below.
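The equations from the paper (arXiv:1603.09025), reproduced in LaTeX here because the original image did not survive:

\begin{pmatrix} \tilde{f}_t \\ \tilde{i}_t \\ \tilde{o}_t \\ \tilde{g}_t \end{pmatrix}
  = \mathrm{BN}(W_h h_{t-1};\, \gamma_h, \beta_h) + \mathrm{BN}(W_x x_t;\, \gamma_x, \beta_x) + b

c_t = \sigma(\tilde{f}_t) \odot c_{t-1} + \sigma(\tilde{i}_t) \odot \tanh(\tilde{g}_t)

h_t = \sigma(\tilde{o}_t) \odot \tanh(\mathrm{BN}(c_t;\, \gamma_c, \beta_c))

where \mathrm{BN}(h;\, \gamma, \beta) = \beta + \gamma \odot (h - \widehat{E}[h]) / \sqrt{\widehat{\mathrm{Var}}[h] + \epsilon}, with separate normalization statistics kept per time step.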
Unfortunately, this model has not been implemented in keras yet, only in tensorflow: https://github.com/OlavHN/bnlstm.
However, I was able to get good results using (default) batch normalization after the activation function, with no centering and scaling. This approach differs from the one above, which applies BN after c_t and h_t; it may be worth a try.
import tensorflow as tf
from keras.models import Sequential
from keras.layers import LSTM, BatchNormalization, Dense

# neurons1, neurons2, m (the BN momentum), timesteps and data_dim
# are placeholders to be set for your own data
model = Sequential()
model.add(LSTM(neurons1,
               activation=tf.nn.relu,
               return_sequences=True,
               input_shape=(timesteps, data_dim)))
model.add(BatchNormalization(momentum=m, scale=False, center=False))
model.add(LSTM(neurons2,
               activation=tf.nn.relu))
model.add(BatchNormalization(momentum=m, scale=False, center=False))
model.add(Dense(1))
Posted on 2018-01-31 17:50:43
I use Keras in Python, but I can give R a try. The documentation for the fit method says that batch_size defaults to 32 when omitted; that is no longer true in the current version, as can be seen in the source code. I think you should try it like this (at least, this is how it works in Python):
mdl2 <- keras_model_sequential()
mdl2 %>%
  # no per-layer batch_size; the input shape goes on the first layer
  layer_batch_normalization(input_shape = c(1, 3)) %>%
  layer_lstm(units = 512, return_sequences = TRUE, dropout = 0.3) %>%
  layer_batch_normalization() %>%
  layer_lstm(units = 512, return_sequences = FALSE, dropout = 0.2) %>%
  layer_batch_normalization() %>%
  layer_dense(units = 1, activation = "linear")
mdl2 %>% compile(loss = 'mse', optimizer = 'rmsprop')
mdl2.history <- mdl2 %>%
  fit(dt.tr, dt.tr.out, epochs = 500, shuffle = FALSE,
      validation_data = list(dt.val, dt.val.out),
      batch_size = 32,   # the batch size is set here, in fit()
      callbacks = list(
        callback_early_stopping(min_delta = 0.000001, patience = 10, verbose = 1)
      ))
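Once trained, a forecast for the held-out part of 2017 can then be produced with predict; dt.te here is an assumed name for the test array, which the post never names:

preds <- mdl2 %>% predict(dt.te, batch_size = 32)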
https://stackoverflow.com/questions/48544953