Keras Bidirectional LSTM: an `initial_state` was passed that is not compatible with `cell.state_size`

Stack Overflow user
Asked on 2019-07-24 20:28:20
1 answer · 4.1K views · 0 followers · 5 votes

I am trying to build a stacked bidirectional LSTM seq2seq model in Keras, but I am running into a problem when passing the encoder's output states to the decoder as its initial states. Based on this pull request, it looks like this should be possible. Ultimately, I also want to keep the encoder_output vector for other downstream tasks.

Error message:

ValueError: An `initial_state` was passed that is not compatible with `cell.state_size`. Received `state_spec`=[InputSpec(shape=(None, 100), ndim=2)]; however `cell.state_size` is (100, 100)
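For context on what the error is comparing: a Keras LSTM cell tracks two states (the hidden state h and the cell state c), so its `state_size` has two entries. A minimal sketch, assuming TensorFlow 2.x's tf.keras:

from tensorflow.keras.layers import LSTMCell

# An LSTM cell keeps two states: hidden state h and cell state c.
# With 100 units, state_size holds two entries of 100 each -- the
# "(100, 100)" in the error. A single (None, 100) tensor per direction
# therefore cannot satisfy it.
cell = LSTMCell(100)
print(list(cell.state_size))  # [100, 100]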

My model:

MAX_SEQUENCE_LENGTH = 50
EMBEDDING_DIM = 250
latent_size_1 = 100
latent_size_2 = 50
latent_size_3 = 250

embedding_layer = Embedding(num_words,
                            EMBEDDING_DIM,
                            embeddings_initializer=Constant(embedding_matrix),
                            input_length=MAX_SEQUENCE_LENGTH,
                            trainable=False,
                            mask_zero=True)

encoder_inputs = Input(shape=(MAX_SEQUENCE_LENGTH,), name="encoder_input")
encoder_emb = embedding_layer(encoder_inputs)
encoder_lstm_1 = Bidirectional(LSTM(latent_size_1, return_sequences=True),                                                         
                               merge_mode="concat",
                               name="encoder_lstm_1")(encoder_emb)
encoder_outputs, forward_h, forward_c, backward_h, backward_c = Bidirectional(LSTM(latent_size_2, return_state=True), 
                               merge_mode="concat",
                               name="encoder_lstm_2")(encoder_lstm_1)
state_h = Concatenate()([forward_h, backward_h])
state_c = Concatenate()([forward_c, backward_c])
encoder_states = [state_h, state_c]

decoder_inputs = Input(shape=(MAX_SEQUENCE_LENGTH,), name="decoder_input")
decoder_emb = embedding_layer(decoder_inputs)
decoder_lstm_1 =  Bidirectional(LSTM(latent_size_1, return_sequences=True), 
                                merge_mode="concat", 
                                name="decoder_lstm_1")(decoder_emb, initial_state=encoder_states)
decoder_lstm_2 =  Bidirectional(LSTM(latent_size_3, return_sequences=True), 
                                merge_mode="concat",
                                name="decoder_lstm_2")(decoder_lstm_1)
decoder_outputs = Dense(num_words, activation='softmax', name="Dense_layer")(decoder_lstm_2)

seq2seq_Model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

Any help/suggestions/guidance would be greatly appreciated!


1 Answer

Stack Overflow user

Answer accepted

Posted on 2019-07-24 23:21:54

There are two issues with your code:

  1. As @Daniel pointed out, you should not concatenate the states in encoder_states (use encoder_states = [forward_h, forward_c, backward_h, backward_c] instead; Bidirectional expects one [h, c] pair per direction, as shown in the sketch after this list).
  2. The states returned by the encoder have size latent_size_2 (not latent_size_1). So if you want to use them as the decoder's initial state, the decoder's first LSTM must have latent_size_2 units.
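To illustrate the first point, here is a minimal, self-contained sketch (the input shapes are hypothetical) of how Bidirectional distributes a four-tensor initial_state: the first half goes to the forward layer, the second half to the backward layer, and each direction consumes its own [h, c] pair:

from tensorflow.keras.layers import Input, LSTM, Bidirectional

units = 50  # matches latent_size_2 in the corrected code below
seq_in = Input(shape=(10, 8))  # hypothetical (timesteps, features)
# One (h, c) pair per direction: [forward_h, forward_c, backward_h, backward_c]
states = [Input(shape=(units,)) for _ in range(4)]
out = Bidirectional(LSTM(units, return_sequences=True))(seq_in, initial_state=states)
print(out.shape)  # (None, 10, 100): forward and backward outputs concatenated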

You can find the code with these corrections below.

from tensorflow.keras.layers import Embedding, Input, Bidirectional, LSTM, Dense, Concatenate
from tensorflow.keras.initializers import Constant
from tensorflow.keras.models import Model

MAX_SEQUENCE_LENGTH = 50
EMBEDDING_DIM = 250
latent_size_1 = 100
latent_size_2 = 50
latent_size_3 = 250
num_words = 5000
embedding_layer = Embedding(num_words,
                            EMBEDDING_DIM,
                            embeddings_initializer=Constant(1.0),
                            input_length=MAX_SEQUENCE_LENGTH,
                            trainable=False,
                            mask_zero=True)

encoder_inputs = Input(shape=(MAX_SEQUENCE_LENGTH,), name="encoder_input")
encoder_emb = embedding_layer(encoder_inputs)
encoder_lstm_1 = Bidirectional(LSTM(latent_size_1, return_sequences=True),                                                         
                               merge_mode="concat",
                               name="encoder_lstm_1")(encoder_emb)
encoder_outputs, forward_h, forward_c, backward_h, backward_c = Bidirectional(LSTM(latent_size_2, return_state=True), 
                               merge_mode="concat", name="encoder_lstm_2")(encoder_lstm_1)
encoder_states = [forward_h, forward_c, backward_h, backward_c]  # fix 1: one (h, c) pair per direction, not concatenated

decoder_inputs = Input(shape=(MAX_SEQUENCE_LENGTH,), name="decoder_input")
decoder_emb = embedding_layer(decoder_inputs)
decoder_lstm_1 =  Bidirectional(
    LSTM(latent_size_2, return_sequences=True),  # fix 2: latent_size_2 to match the encoder state size
    merge_mode="concat", name="decoder_lstm_1")(decoder_emb, initial_state=encoder_states)
decoder_lstm_2 =  Bidirectional(LSTM(latent_size_3, return_sequences=True), 
                                merge_mode="concat",
                                name="decoder_lstm_2")(decoder_lstm_1)
decoder_outputs = Dense(num_words, activation='softmax', name="Dense_layer")(decoder_lstm_2)

seq2seq_Model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
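As a quick sanity check (a sketch, assuming the corrected code above has been run), the model should now build without the ValueError and its layer shapes can be inspected:

seq2seq_Model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
seq2seq_Model.summary()  # decoder_lstm_1 now receives four (None, 50) initial-state tensors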
10 votes
Original page content provided by Stack Overflow; translation supported by Tencent Cloud's translation engine.
Original link:

https://stackoverflow.com/questions/57190769
