keras doc 8 BatchNormalization

CreateAMind

Published 2018-07-25 03:25:55

Part of the column: CreateAMind

Normalization: BatchNormalization

BatchNormalization layer

keras.layers.normalization.BatchNormalization(epsilon=1e-06, mode=0, axis=-1, momentum=0.9, weights=None, beta_init='zero', gamma_init='one')

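This layer renormalizes the activations of the previous layer on each batch, so that its output has a mean close to 0 and a standard deviation close to 1.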

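Parameters

epsilon: a small float greater than 0, used to avoid division by zero
mode: integer, the normalization mode, either 0 or 1
- 0: feature-wise normalization. Each feature map of the input is normalized independently, along the axis given by the axis parameter. Note that if the input is a 4D image tensor of shape (samples, channels, rows, cols), you should set axis to 1 so that normalization runs along the channel axis. The same reasoning applies to inputs in 'tf' format: normalize along the channel axis.
- 1: sample-wise normalization. This mode assumes a 2D input.
axis: integer, the axis to normalize along when mode=0. For example, if the input is a 4D image tensor of shape (samples, channels, rows, cols), set axis to 1 so that each feature map is normalized.
momentum: the momentum used when computing the exponential average of the mean and standard deviation of the data in feature-wise normalization
weights: initial weights, a list of 2 numpy arrays with shapes [(input_shape,), (input_shape,)]
beta_init: initialization method for beta, either the name of a predefined initialization method (a string) or a Theano function used to initialize the weights. Only relevant if you do not pass a weights argument.
gamma_init: initialization method for gamma, either the name of a predefined initialization method (a string) or a Theano function used to initialize the weights. Only relevant if you do not pass a weights argument.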
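
The roles of epsilon, momentum, gamma and beta can be illustrated with a small framework-free sketch. It assumes the common formulation of feature-wise batch normalization on a 2D batch (normalizing along the last axis); the helper names batch_norm_features and update_running are made up for this example, and the arithmetic is not claimed to match the Keras source line by line.

```python
import math

def batch_norm_features(batch, gamma, beta, epsilon=1e-6):
    """Normalize each feature (column) of a 2D batch, then scale and shift."""
    n = len(batch)
    n_features = len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(n_features)]
    variances = [sum((row[j] - means[j]) ** 2 for row in batch) / n
                 for j in range(n_features)]
    # epsilon keeps the division safe when a feature has zero variance
    normed = [[gamma[j] * (row[j] - means[j]) / math.sqrt(variances[j] + epsilon) + beta[j]
               for j in range(n_features)]
              for row in batch]
    return normed, means, variances

def update_running(running, batch_stat, momentum=0.9):
    """Exponential moving average of a batch statistic (assumed update rule)."""
    return [momentum * r + (1.0 - momentum) * b for r, b in zip(running, batch_stat)]

batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
normed, means, variances = batch_norm_features(batch, gamma=[1.0, 1.0], beta=[0.0, 0.0])
running_mean = update_running([0.0, 0.0], means)
```

With gamma set to one and beta set to zero (the defaults named by gamma_init and beta_init), both columns of normed end up with a mean near 0 and a standard deviation near 1, even though the raw features differ by a factor of ten.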

Input shape

Arbitrary. When using this layer as the first layer of a model, specify the input_shape argument.

Output shape

Same as the input shape

References

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

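[Tips] A key assumption in statistical learning is that the source space and the target space share the same data distribution. The outputs of the layers in a neural network, however, do not necessarily follow the distribution of the input, and the mismatch grows as the network gets deeper. BatchNormalization relaxes "same distribution" to "same mean and variance", yet even this weaker condition has a significant effect on training. In addition, a more important role of BN is to counter vanishing gradients: by normalizing activations to a common mean and variance, it scales back up activations that would otherwise shrink. [@Bigmoyan]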

Noise layers

GaussianNoise layer

keras.layers.noise.GaussianNoise(sigma)

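Applies additive zero-mean Gaussian noise with standard deviation sigma to the input of the layer. This is useful for mitigating overfitting; you can think of it as a form of random data augmentation. Gaussian noise is a natural choice when the input data needs to be corrupted.

A typical use of a noise layer is building a denoising autoencoder (DAE), which tries to reconstruct the clean signal from a noisy input in order to learn a robust representation of the original signal.

As this is a regularization layer, it is only active at training time.

Parameters

sigma: float, the standard deviation of the Gaussian noise to generate

Input shape

Arbitrary. When using this layer as the first layer of a model, specify the input_shape argument.

Output shape

Same as the input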
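
The behavior described for GaussianNoise (additive zero-mean noise, active only during training) can be sketched without Keras. The function name gaussian_noise and its explicit training flag are assumptions made for this illustration; the real layer gets its phase from the backend rather than from an argument.

```python
import random

def gaussian_noise(values, sigma, training, rng=random):
    """Add zero-mean Gaussian noise with std `sigma`, only while training."""
    if not training:
        return list(values)  # identity at test time
    return [v + rng.gauss(0.0, sigma) for v in values]

rng = random.Random(0)  # seeded so the example is reproducible
clean = [1.0] * 10000
noisy = gaussian_noise(clean, sigma=0.5, training=True, rng=rng)
unchanged = gaussian_noise(clean, sigma=0.5, training=False, rng=rng)
```

Averaged over many values, noisy stays centered on the clean signal, which is why the layer acts as a regularizer rather than a systematic distortion.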

GaussianDropout layer

keras.layers.noise.GaussianDropout(p)

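Applies multiplicative Gaussian noise with mean 1 and standard deviation sqrt(p/(1-p)) to the input of the layer.

As this is a regularization layer, it is only active at training time.

Parameters

p: float, the drop probability, as in the Dropout layer

Input shape

Arbitrary. When using this layer as the first layer of a model, specify the input_shape argument.

Output shape

Same as the input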
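
As a rough illustration of the multiplicative noise described above, the sketch below draws one multiplier per value from a Gaussian with mean 1 and standard deviation sqrt(p/(1-p)). The function name gaussian_dropout and the explicit training flag are assumptions for this example, not the layer's actual interface.

```python
import math
import random

def gaussian_dropout(values, p, training, rng=random):
    """Multiply each value by noise ~ N(1, sqrt(p / (1 - p))) while training."""
    if not training:
        return list(values)  # identity at test time
    stddev = math.sqrt(p / (1.0 - p))
    return [v * rng.gauss(1.0, stddev) for v in values]

rng = random.Random(0)  # seeded so the example is reproducible
ones = [1.0] * 20000
scaled = gaussian_dropout(ones, p=0.5, training=True, rng=rng)
passthrough = gaussian_dropout(ones, p=0.5, training=False, rng=rng)
```

For p=0.5 the standard deviation is exactly 1, so the multipliers in scaled spread widely around 1 while their average stays close to 1.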

References

Dropout: A Simple Way to Prevent Neural Networks from Overfitting

Wrappers

TimeDistributed wrapper

keras.layers.wrappers.TimeDistributed(layer)

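This wrapper applies a layer to every time step of the input.

Parameters

layer: a Keras layer object

The input must be at least a 3D tensor; the dimension of index 1 is treated as the time dimension.

For example, consider a batch of 32 samples, where each sample is a sequence of 10 vectors of length 16. The input then has shape (32, 10, 16), and the input_shape, which excludes the batch size, is (10, 16).

You can then wrap Dense in TimeDistributed to apply a fully connected layer to each time step independently: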

from keras.models import Sequential
from keras.layers import Dense, TimeDistributed

# as the first layer in a model
model = Sequential()
model.add(TimeDistributed(Dense(8), input_shape=(10, 16)))
# now model.output_shape == (None, 10, 8)

# subsequent layers: no need for input_shape
model.add(TimeDistributed(Dense(32)))
# now model.output_shape == (None, 10, 32)

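With a batch of 32 samples, the output of the first layer has shape (32, 10, 8), and the output of the second layer has shape (32, 10, 32).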
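
To make the shape bookkeeping concrete, here is a framework-free sketch of what wrapping a dense layer in a time-distributed wrapper amounts to: the same weights are applied to every time step. The helpers dense and time_distributed are hypothetical names for this illustration and use nested lists instead of tensors.

```python
def dense(vector, weights, bias):
    """Fully connected layer; `weights` holds one column (a list) per output unit."""
    return [sum(v * w for v, w in zip(vector, column)) + b
            for column, b in zip(weights, bias)]

def time_distributed(layer_fn, batch):
    """Apply `layer_fn` to every time step (axis 1) of a (samples, steps, features) batch."""
    return [[layer_fn(step) for step in sample] for sample in batch]

weights = [[0.5] * 16 for _ in range(8)]  # 16 inputs -> 8 outputs
bias = [0.0] * 8
batch = [[[1.0] * 16 for _ in range(10)] for _ in range(32)]  # shape (32, 10, 16)

output = time_distributed(lambda step: dense(step, weights, bias), batch)
shape = (len(output), len(output[0]), len(output[0][0]))
```

Here shape comes out as (32, 10, 8): the batch and time dimensions pass through untouched and only the last dimension is transformed.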

Wrapping Dense in TimeDistributed is strictly equivalent to layers.TimeDistributedDense. The difference is that the TimeDistributed wrapper can also wrap other layers, such as Convolution2D here:

from keras.layers import Convolution2D
model = Sequential()
model.add(TimeDistributed(Convolution2D(64, 3, 3), input_shape=(10, 3, 299, 299)))

Bidirectional wrapper

keras.layers.wrappers.Bidirectional(layer, merge_mode='concat', weights=None)

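Bidirectional wrapper for RNNs

Parameters

layer: a Recurrent object
merge_mode: how the outputs of the forward and backward RNNs are combined, one of sum, mul, concat, ave and None. If None, the outputs are not combined and are returned as a list instead.

Example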

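from keras.models import Sequential
from keras.layers import Activation, Bidirectional, Dense, LSTM

model = Sequential()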
model.add(Bidirectional(LSTM(10, return_sequences=True), input_shape=(5, 10)))
model.add(Bidirectional(LSTM(10)))
model.add(Dense(5))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
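
The merge_mode options are easier to compare on a toy example. The sketch below runs a stand-in recurrent step (a running sum, chosen only because it is easy to check by hand) forward and backward over a sequence and then combines the two output sequences. The names run_rnn and bidirectional are hypothetical, and the per-step alignment of the backward outputs is an assumption about the sequence-returning case.

```python
def run_rnn(sequence, step, state=0.0):
    """Run a recurrent `step(state, x)` over the sequence, returning every state."""
    outputs = []
    for x in sequence:
        state = step(state, x)
        outputs.append(state)
    return outputs

def bidirectional(sequence, step, merge_mode="concat"):
    forward = run_rnn(sequence, step)
    # process the reversed sequence, then flip back so steps line up with the input
    backward = run_rnn(sequence[::-1], step)[::-1]
    if merge_mode is None:
        return [forward, backward]
    if merge_mode == "concat":
        return [[f, b] for f, b in zip(forward, backward)]
    if merge_mode == "sum":
        return [f + b for f, b in zip(forward, backward)]
    if merge_mode == "mul":
        return [f * b for f, b in zip(forward, backward)]
    if merge_mode == "ave":
        return [(f + b) / 2.0 for f, b in zip(forward, backward)]
    raise ValueError("unknown merge_mode: %r" % (merge_mode,))

running_sum = lambda state, x: state + x  # toy recurrent step
seq = [1.0, 2.0, 3.0]
merged = {mode: bidirectional(seq, running_sum, mode)
          for mode in ("concat", "sum", "mul", "ave", None)}
```

Note that concat is the only mode that changes the size of each per-step output; the other modes keep it and None hands back both sequences separately.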

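Writing your own layers

For simple custom operations, you may be able to use the layers.core.Lambda layer. For any custom layer with trainable weights, however, you should implement your own layer.

Here is the skeleton a Keras layer should have. To write your own layer, you need to implement the following three methods:

build(input_shape): this is where you define the weights. Trainable weights should be added to the list self.trainable_weights. Other notable attributes are self.non_trainable_weights (a list) and self.updates (a list of (tensor, new_tensor) tuples to update). See the implementation of the BatchNormalization layer for an example of how to use these two attributes.
call(x): this is where the layer's logic lives. Unless you want your layer to support masking, you only need to care about the first argument of call: the input tensor.
get_output_shape_for(input_shape): if your layer changes the shape of its input, specify the shape transformation here. This lets Keras do automatic shape inference.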

import numpy as np
from keras import backend as K
from keras.engine.topology import Layer

class MyLayer(Layer):
    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(MyLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        input_dim = input_shape[1]
        initial_weight_value = np.random.random((input_dim, self.output_dim))
        self.W = K.variable(initial_weight_value)
        self.trainable_weights = [self.W]  # register W as trainable

    def call(self, x, mask=None):
        return K.dot(x, self.W)

    def get_output_shape_for(self, input_shape):
        return (input_shape[0], self.output_dim)  # (batch size, output_dim)
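
The division of labor between the three methods can also be tried out without Keras installed. The class below is a hypothetical stand-in (ToyDense is not a Keras class) that follows the same build / call / get_output_shape_for split using plain lists, with fixed initial weights so the result is deterministic.

```python
class ToyDense:
    """Framework-free stand-in following the build / call / shape-inference split."""

    def __init__(self, output_dim):
        self.output_dim = output_dim
        self.trainable_weights = []

    def build(self, input_shape):
        # weights are created here, once the input dimension is known
        input_dim = input_shape[1]
        self.W = [[0.1] * self.output_dim for _ in range(input_dim)]
        self.trainable_weights = [self.W]

    def call(self, x):
        # x is a list of rows; returns the matrix product of x and W
        return [[sum(row[i] * self.W[i][j] for i in range(len(row)))
                 for j in range(self.output_dim)]
                for row in x]

    def get_output_shape_for(self, input_shape):
        # only the last dimension changes; the batch dimension is passed through
        return (input_shape[0], self.output_dim)

layer = ToyDense(output_dim=3)
layer.build((None, 4))
out = layer.call([[1.0, 2.0, 3.0, 4.0]])
out_shape = layer.get_output_shape_for((None, 4))
```

The shape method never touches the data: it answers "what shape would come out" from the input shape alone, which is what makes automatic shape inference possible before any batch is fed in.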

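Adapting layers written for earlier versions of Keras to Keras 1.0

The points below are what to watch for when porting a layer from an earlier version of Keras. They are also helpful when writing your own layers in Keras 1.0.

Your Layer should inherit from keras.engine.topology.Layer rather than the former keras.layers.core.Layer. In addition, MaskedLayer has been removed.
The build method now takes an input_shape argument instead of reading the value from self.input_shape as before, so convert build(self) to build(self, input_shape).
Convert the output_shape property to the method get_output_shape_for(self, input_shape), and remove the old output_shape.
The layer's computation logic now belongs in the call method rather than the former get_output. Be careful not to modify the __call__ method. After converting get_output(self, train=False) to call(self, x, mask=None), delete the old get_output method.
Keras 1.0 no longer uses the boolean train to switch between training and test behavior. If your layer behaves differently in the two phases, use the phase-aware functions in call, such as x = K.in_train_phase(train_x, test_x). For example, this is what you find in the call method of Dropout: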

return K.in_train_phase(K.dropout(x, level=self.p), x)

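The configuration returned by get_config may include the class name; remove it from that function. If your layer needs more information at instantiation time (that is, more than passing config as kwargs can provide), reimplement from_config. See the Lambda or Merge layers for examples of a more complex from_config.
If you use masking, implement compute_mask(input_tensor, input_mask), which returns output_mask. Make sure to set self.supports_masking = True in __init__().
If you want Keras to check input compatibility when your layer is connected to built-in Keras layers, set self.input_specs in __init__, or implement input_specs() and wrap it as a property (@property). It should be a list of engine.InputSpec objects. This attribute is also useful when you want to access the input shape inside call.
The following methods and attributes are built in; do not override them: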
- __call__
- add_input
- assert_input_compatibility
- set_input
- input
- output
- input_shape
- output_shape
- input_mask
- output_mask
- get_input_at
- get_output_at
- get_input_shape_at
- get_output_shape_at
- get_input_mask_at
- get_output_mask_at
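
The train/test switch mentioned in the porting notes (the K.in_train_phase call in Dropout) can be mimicked in plain Python. This is only a sketch under stated assumptions: the in_train_phase below takes an explicit training flag, which is a simplification invented for this example, and the dropout helper assumes the common "inverted" scaling where kept values are divided by the keep probability.

```python
import random

def in_train_phase(train_value, test_value, training):
    """Hypothetical stand-in: pick the training or the test value by phase."""
    return train_value if training else test_value

def dropout(values, level, rng=random):
    """Zero each value with probability `level`; rescale the survivors (assumed)."""
    keep = 1.0 - level
    return [v / keep if rng.random() < keep else 0.0 for v in values]

def dropout_call(values, level, training, rng=random):
    # mirrors the shape of the Dropout example: noisy in training, identity at test time
    return in_train_phase(dropout(values, level, rng), list(values), training)

rng = random.Random(0)  # seeded so the example is reproducible
ones = [1.0] * 1000
train_out = dropout_call(ones, level=0.5, training=True, rng=rng)
test_out = dropout_call(ones, level=0.5, training=False, rng=rng)
```

In this sketch train_out contains only zeros and rescaled values, while test_out is the input unchanged, which is the behavior split that the boolean train argument used to control.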