The Maxout activation function is a nonlinear activation that learns a piecewise-linear function of its input, which increases the expressive power of a neural network. Its basic form is:
[ \text{maxout}(x) = \max(w_1^T x + b_1,\ w_2^T x + b_2) ]
where ( w_1 ) and ( w_2 ) are weight vectors and ( b_1 ) and ( b_2 ) are bias terms.
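More generally, a maxout unit takes the maximum over ( k ) affine pieces, the two-piece form above being the case ( k = 2 ):
[ \text{maxout}(x) = \max_{j \in \{1, \dots, k\}} \left( w_j^T x + b_j \right) ]
Since the maximum of affine functions is convex and piecewise linear, a maxout unit with enough pieces can approximate an arbitrary convex activation shape.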
import numpy as np

def maxout_forward(x, weights, biases):
    """
    x: input data, shape (batch_size, input_dim)
    weights: weight matrix, shape (num_units, input_dim)
    biases: bias vector, shape (num_units,)
    Returns the maxout output, shape (batch_size,).
    """
    # Evaluate every affine piece; z has shape (batch_size, num_units)
    z = np.dot(x, weights.T) + biases
    # Keep only the largest piece for each sample
    return np.max(z, axis=1)
# Example data
x = np.array([[1, 2], [3, 4]])          # input data, shape (2, 2)
weights = np.array([[1, -1], [-1, 1]])  # weight matrix, shape (2, 2)
biases = np.array([0, 0])               # bias vector, shape (2,)
output = maxout_forward(x, weights, biases)
print("Maxout Forward Output:", output)  # [1 1]; the second piece wins for both samples
def maxout_backward(x, weights, biases, grad_output):
    """
    x: input data, shape (batch_size, input_dim)
    weights: weight matrix, shape (num_units, input_dim)
    biases: bias vector, shape (num_units,)
    grad_output: gradient of the loss w.r.t. the output, shape (batch_size,)
    """
    batch_size = x.shape[0]
    # Re-evaluate every affine piece, as in the forward pass
    z = np.dot(x, weights.T) + biases
    # One-hot mask selecting the winning piece per sample; using argmax
    # (rather than z == max) routes the gradient to a single piece on ties
    mask = np.zeros_like(z, dtype=float)
    mask[np.arange(batch_size), np.argmax(z, axis=1)] = 1.0
    # Route the upstream gradient to the winning pieces
    routed = grad_output[:, None] * mask  # (batch_size, num_units)
    # Gradients; note grad_weights must match weights' shape (num_units, input_dim)
    grad_weights = np.dot(routed.T, x)
    grad_biases = np.sum(routed, axis=0)
    grad_x = np.dot(routed, weights)
    return grad_x, grad_weights, grad_biases
# Example upstream gradient
grad_output = np.array([1, 1])  # gradient of the loss w.r.t. each sample's output
grad_x, grad_weights, grad_biases = maxout_backward(x, weights, biases, grad_output)
print("Maxout Backward Gradients:")
print("Gradient w.r.t. x:", grad_x)              # each row is the winning piece's weights
print("Gradient w.r.t. weights:", grad_weights)  # only the winning piece accumulates gradient
print("Gradient w.r.t. biases:", grad_biases)
The code and explanations above walk through the forward and backward passes of the Maxout activation function and how to implement them in NumPy.
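The implementation above realizes a single maxout unit, producing one scalar per sample. As a minimal sketch of how the same idea extends to a full layer, the hypothetical maxout_layer below (its name and the (num_outputs, k, input_dim) weight layout are illustrative assumptions, not from the original) maps a batch to num_outputs values, each the max over k affine pieces:

def maxout_layer(x, W, b):
    """
    x: (batch_size, input_dim)
    W: (num_outputs, k, input_dim) -- k affine pieces per output unit (assumed layout)
    b: (num_outputs, k)
    Returns: (batch_size, num_outputs)
    """
    # Contract over input_dim: z[n, o, j] = w_{o,j}^T x_n + b_{o,j}
    z = np.tensordot(x, W, axes=([1], [2])) + b  # (batch_size, num_outputs, k)
    # Each output unit keeps its largest piece
    return np.max(z, axis=2)

# Shape check with random data
rng = np.random.default_rng(0)
x2 = rng.normal(size=(4, 3))    # 4 samples, 3 features
W = rng.normal(size=(5, 2, 3))  # 5 output units, k = 2 pieces each
b = rng.normal(size=(5, 2))
print(maxout_layer(x2, W, b).shape)  # (4, 5)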