You will write two helper functions to initialize the parameters of your model. The first function will be used to initialize the parameters of a two-layer model. The second generalizes this initialization procedure to L layers.
**Exercise**: Create and initialize the parameters of the 2-layer neural network.
**Instructions**:
* The model's structure is: *LINEAR -> RELU -> LINEAR -> SIGMOID*.
* Use random initialization for the weight matrices: np.random.randn(shape) * 0.01 with the correct shape.
* Use zero initialization for the biases: np.zeros(shape).
# GRADED FUNCTION: initialize_parameters

def initialize_parameters(n_x, n_h, n_y):
    """
    Argument:
    n_x -- size of the input layer
    n_h -- size of the hidden layer
    n_y -- size of the output layer

    Returns:
    parameters -- python dictionary containing your parameters:
                    W1 -- weight matrix of shape (n_h, n_x)
                    b1 -- bias vector of shape (n_h, 1)
                    W2 -- weight matrix of shape (n_y, n_h)
                    b2 -- bias vector of shape (n_y, 1)
    """
    np.random.seed(1)

    ### START CODE HERE ### (≈ 4 lines of code)
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))
    ### END CODE HERE ###

    assert(W1.shape == (n_h, n_x))
    assert(b1.shape == (n_h, 1))
    assert(W2.shape == (n_y, n_h))
    assert(b2.shape == (n_y, 1))

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}

    return parameters

The initialization of a deeper L-layer neural network is more involved because there are more weight matrices and bias vectors. When completing initialize_parameters_deep, you should make sure that the dimensions match between adjacent layers. Recall that n^[l] is the number of units in layer l. So, for example, if the size of our input X is (12288, 209) (with m = 209 examples), then W1 has shape (n^[1], 12288) and b1 has shape (n^[1], 1); in general, W^[l] has shape (n^[l], n^[l-1]) and b^[l] has shape (n^[l], 1).
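As a concrete sanity check of these shapes, once initialize_parameters_deep below is filled in you could run a quick test like the following sketch (the layer sizes here are arbitrary illustration values, not part of the assignment):

```python
# Hypothetical shape check for a 3-layer configuration:
# 5 inputs, hidden layers of 4 and 3 units, and a single output unit.
params = initialize_parameters_deep([5, 4, 3, 1])
for l in range(1, 4):
    print("W" + str(l), params["W" + str(l)].shape)  # (4, 5), (3, 4), (1, 3)
    print("b" + str(l), params["b" + str(l)].shape)  # (4, 1), (3, 1), (1, 1)
```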

# GRADED FUNCTION: initialize_parameters_deep

def initialize_parameters_deep(layer_dims):
    """
    Arguments:
    layer_dims -- python array (list) containing the dimensions of each layer in our network

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    Wl -- weight matrix of shape (layer_dims[l], layer_dims[l-1])
                    bl -- bias vector of shape (layer_dims[l], 1)
    """
    np.random.seed(3)
    parameters = {}
    L = len(layer_dims)  # number of layers in the network

    for l in range(1, L):
        ### START CODE HERE ### (≈ 2 lines of code)
        parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) * 0.01
        parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))
        ### END CODE HERE ###

        assert(parameters['W' + str(l)].shape == (layer_dims[l], layer_dims[l-1]))
        assert(parameters['b' + str(l)].shape == (layer_dims[l], 1))

    return parameters

Now that you have initialized your parameters, you will build the forward propagation module. You will start by implementing some basic functions that you will use later when implementing the model. You will complete three functions in this order:
* LINEAR
* LINEAR -> ACTIVATION, where ACTIVATION will be either ReLU or Sigmoid.
* [LINEAR -> RELU] × (L-1) -> LINEAR -> SIGMOID (the whole model)
# GRADED FUNCTION: linear_forward

def linear_forward(A, W, b):
    """
    Implement the linear part of a layer's forward propagation.

    Arguments:
    A -- activations from previous layer (or input data): (size of previous layer, number of examples)
    W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)
    b -- bias vector, numpy array of shape (size of the current layer, 1)

    Returns:
    Z -- the input of the activation function, also called pre-activation parameter
    cache -- a python tuple containing "A", "W" and "b"; stored for computing the backward pass efficiently
    """
    ### START CODE HERE ### (≈ 1 line of code)
    Z = np.dot(W, A) + b
    ### END CODE HERE ###

    assert(Z.shape == (W.shape[0], A.shape[1]))
    cache = (A, W, b)

    return Z, cache

You will use two activation functions:

* **Sigmoid**: σ(Z) = 1 / (1 + e^(-Z)). We have provided you with the sigmoid function. It returns **two** items: the activation value "A" and a "cache" that contains "Z" (this is what we will feed into the corresponding backward function). To use it you can call: ```A, activation_cache = sigmoid(Z)```
* **ReLU**: The mathematical formula for ReLU is A = RELU(Z) = max(0, Z). We have provided you with the relu function. It also returns **two** items: the activation value "A" and a "cache" that contains "Z" (this is what we will feed into the corresponding backward function). To use it you can call: ```A, activation_cache = relu(Z)```
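These two helpers (like the backward helpers used later) are supplied by the helper file that ships with the original assignment and are not reproduced in this post. If you are following along outside the notebook, a minimal sketch consistent with how they are called here could look like this (the provided implementation may differ in details):

```python
import numpy as np

def sigmoid(Z):
    # Element-wise sigmoid activation; returns the activation A and a cache holding Z.
    A = 1 / (1 + np.exp(-Z))
    cache = Z
    return A, cache

def relu(Z):
    # Element-wise ReLU activation; returns the activation A and a cache holding Z.
    A = np.maximum(0, Z)
    cache = Z
    return A, cache
```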
# GRADED FUNCTION: linear_activation_forward
def linear_activation_forward(A_prev, W, b, activation):
    """
    Implement the forward propagation for the LINEAR->ACTIVATION layer

    Arguments:
    A_prev -- activations from previous layer (or input data): (size of previous layer, number of examples)
    W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)
    b -- bias vector, numpy array of shape (size of the current layer, 1)
    activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"

    Returns:
    A -- the output of the activation function, also called the post-activation value
    cache -- a python tuple containing "linear_cache" and "activation_cache";
             stored for computing the backward pass efficiently
    """
    if activation == "sigmoid":
        # Inputs: "A_prev, W, b". Outputs: "A, activation_cache".
        ### START CODE HERE ### (≈ 2 lines of code)
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = sigmoid(Z)
        ### END CODE HERE ###

    elif activation == "relu":
        # Inputs: "A_prev, W, b". Outputs: "A, activation_cache".
        ### START CODE HERE ### (≈ 2 lines of code)
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = relu(Z)
        ### END CODE HERE ###

    assert(A.shape == (W.shape[0], A_prev.shape[1]))
    cache = (linear_cache, activation_cache)

    return A, cache

For even more convenience when implementing the L-layer neural network, you will need a function that replicates the previous one (linear_activation_forward with RELU) L-1 times, and then follows that with one linear_activation_forward with SIGMOID.
# GRADED FUNCTION: L_model_forward

def L_model_forward(X, parameters):
    """
    Implement forward propagation for the [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID computation

    Arguments:
    X -- data, numpy array of shape (input size, number of examples)
    parameters -- output of initialize_parameters_deep()

    Returns:
    AL -- last post-activation value
    caches -- list of caches containing:
                every cache of linear_activation_forward() (there are L of them, indexed from 0 to L-1)
    """
    caches = []
    A = X
    L = len(parameters) // 2  # number of layers in the neural network

    # Implement [LINEAR -> RELU]*(L-1). Add "cache" to the "caches" list.
    for l in range(1, L):
        A_prev = A
        ### START CODE HERE ### (≈ 2 lines of code)
        A, cache = linear_activation_forward(A_prev, parameters['W' + str(l)], parameters['b' + str(l)], activation='relu')
        caches.append(cache)
        ### END CODE HERE ###

    # Implement LINEAR -> SIGMOID. Add "cache" to the "caches" list.
    ### START CODE HERE ### (≈ 2 lines of code)
    AL, cache = linear_activation_forward(A, parameters['W' + str(L)], parameters['b' + str(L)], activation='sigmoid')
    caches.append(cache)
    ### END CODE HERE ###

    assert(AL.shape == (1, X.shape[1]))

    return AL, caches
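Having computed AL, you now compute the cross-entropy cost J so you can check that the model is actually learning. The cost implemented below (the "equation (7)" referenced in the docstring) is:

$$ J = -\frac{1}{m}\sum_{i=1}^{m}\Big( y^{(i)}\log a^{[L](i)} + \big(1-y^{(i)}\big)\log\big(1-a^{[L](i)}\big)\Big) $$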
# GRADED FUNCTION: compute_cost

def compute_cost(AL, Y):
    """
    Implement the cost function defined by equation (7).

    Arguments:
    AL -- probability vector corresponding to your label predictions, shape (1, number of examples)
    Y -- true "label" vector (for example: containing 0 if non-cat, 1 if cat), shape (1, number of examples)

    Returns:
    cost -- cross-entropy cost
    """
    m = Y.shape[1]

    # Compute loss from aL and y.
    ### START CODE HERE ### (≈ 1 line of code)
    cost = -1/m * np.sum(np.multiply(Y, np.log(AL)) + np.multiply(1 - Y, np.log(1 - AL)))
    ### END CODE HERE ###

    cost = np.squeeze(cost)  # To make sure your cost's shape is what we expect (e.g. this turns [[17]] into 17).
    assert(cost.shape == ())

    return cost
### 6.1 - Linear backward
Suppose you have already computed the derivative dZ for the current layer. You use it to obtain the other derivatives dW, db and dA_prev, as given by the formulas below.
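Using the course's notation (m examples, A^[l-1] the activations of the previous layer), the three formulas implemented in linear_backward are:

$$ dW^{[l]} = \frac{1}{m}\, dZ^{[l]} A^{[l-1]T}, \qquad db^{[l]} = \frac{1}{m}\sum_{i=1}^{m} dZ^{[l](i)}, \qquad dA^{[l-1]} = W^{[l]T} dZ^{[l]} $$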

# GRADED FUNCTION: linear_backward

def linear_backward(dZ, cache):
    """
    Implement the linear portion of backward propagation for a single layer (layer l)

    Arguments:
    dZ -- Gradient of the cost with respect to the linear output (of current layer l)
    cache -- tuple of values (A_prev, W, b) coming from the forward propagation in the current layer

    Returns:
    dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
    dW -- Gradient of the cost with respect to W (current layer l), same shape as W
    db -- Gradient of the cost with respect to b (current layer l), same shape as b
    """
    A_prev, W, b = cache
    m = A_prev.shape[1]

    ### START CODE HERE ### (≈ 3 lines of code)
    dW = 1/m * np.dot(dZ, A_prev.T)
    db = 1/m * np.sum(dZ, axis=1, keepdims=True)
    dA_prev = np.dot(W.T, dZ)
    ### END CODE HERE ###

    assert(dA_prev.shape == A_prev.shape)
    assert(dW.shape == W.shape)
    assert(db.shape == b.shape)

    return dA_prev, dW, db

Next, you will create a function that merges the two helper functions: **linear_backward** and the backward step for the activation, **linear_activation_backward**.
To help you implement linear_activation_backward, we provide two backward functions:
* **sigmoid_backward**: implements the backward propagation for the SIGMOID unit. You can call it as follows: ```dZ = sigmoid_backward(dA, activation_cache)```
* **relu_backward**: implements the backward propagation for the RELU unit. You can call it as follows: ```dZ = relu_backward(dA, activation_cache)```
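Like sigmoid and relu, these two functions come from the assignment's helper file and are not reproduced in this post. A minimal sketch consistent with how they are called here could look like this (the provided implementation may differ in details):

```python
import numpy as np

def sigmoid_backward(dA, cache):
    # dZ = dA * s'(Z), where s is the sigmoid; cache holds the Z stored during the forward pass.
    Z = cache
    s = 1 / (1 + np.exp(-Z))
    dZ = dA * s * (1 - s)
    return dZ

def relu_backward(dA, cache):
    # The ReLU derivative is 1 where Z > 0 and 0 elsewhere.
    Z = cache
    dZ = np.array(dA, copy=True)
    dZ[Z <= 0] = 0
    return dZ
```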
# GRADED FUNCTION: linear_activation_backward
def linear_activation_backward(dA, cache, activation):
    """
    Implement the backward propagation for the LINEAR->ACTIVATION layer.

    Arguments:
    dA -- post-activation gradient for current layer l
    cache -- tuple of values (linear_cache, activation_cache) we store for computing backward propagation efficiently
    activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"

    Returns:
    dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
    dW -- Gradient of the cost with respect to W (current layer l), same shape as W
    db -- Gradient of the cost with respect to b (current layer l), same shape as b
    """
    linear_cache, activation_cache = cache

    if activation == "relu":
        ### START CODE HERE ### (≈ 2 lines of code)
        dZ = relu_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)
        ### END CODE HERE ###

    elif activation == "sigmoid":
        ### START CODE HERE ### (≈ 2 lines of code)
        dZ = sigmoid_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)
        ### END CODE HERE ###

    return dA_prev, dW, db

Recall that when you implemented the L_model_forward function, at each iteration you stored a cache containing (X, W, b and Z). In the backpropagation module, you will use those variables to compute the gradients. So, in the L_model_backward function, you will iterate through all the hidden layers backward, starting from layer L. At each step, you will use the cached values of layer l to backpropagate through layer l.
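To start the backward pass you need the gradient of the cost with respect to AL. Differentiating the cross-entropy cost above with respect to A^[L] gives the (elementwise) expression used for dAL in the code below:

$$ dA^{[L]} = -\left(\frac{Y}{A^{[L]}} - \frac{1-Y}{1-A^{[L]}}\right) $$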
# GRADED FUNCTION: L_model_backward

def L_model_backward(AL, Y, caches):
    """
    Implement the backward propagation for the [LINEAR->RELU] * (L-1) -> LINEAR -> SIGMOID group

    Arguments:
    AL -- probability vector, output of the forward propagation (L_model_forward())
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat)
    caches -- list of caches containing:
                every cache of linear_activation_forward() with "relu" (it's caches[l], for l in range(L-1) i.e. l = 0...L-2)
                the cache of linear_activation_forward() with "sigmoid" (it's caches[L-1])

    Returns:
    grads -- A dictionary with the gradients
             grads["dA" + str(l)] = ...
             grads["dW" + str(l)] = ...
             grads["db" + str(l)] = ...
    """
    grads = {}
    L = len(caches)  # the number of layers
    m = AL.shape[1]
    Y = Y.reshape(AL.shape)  # after this line, Y is the same shape as AL

    # Initializing the backpropagation
    ### START CODE HERE ### (1 line of code)
    dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
    ### END CODE HERE ###

    # Lth layer (SIGMOID -> LINEAR) gradients. Inputs: "dAL, current_cache". Outputs: "grads["dAL-1"], grads["dWL"], grads["dbL"]"
    ### START CODE HERE ### (approx. 2 lines)
    current_cache = caches[-1]
    grads["dA" + str(L - 1)], grads["dW" + str(L)], grads["db" + str(L)] = linear_activation_backward(dAL, current_cache, activation="sigmoid")
    ### END CODE HERE ###

    # Loop from l=L-2 to l=0
    for l in reversed(range(L - 1)):
        # lth layer: (RELU -> LINEAR) gradients.
        # Inputs: "grads["dA" + str(l + 1)], current_cache". Outputs: "grads["dA" + str(l)], grads["dW" + str(l + 1)], grads["db" + str(l + 1)]"
        ### START CODE HERE ### (approx. 5 lines)
        current_cache = caches[l]
        dA_prev_temp, dW_temp, db_temp = linear_activation_backward(grads["dA" + str(l + 1)], current_cache, activation="relu")
        grads["dA" + str(l)] = dA_prev_temp
        grads["dW" + str(l + 1)] = dW_temp
        grads["db" + str(l + 1)] = db_temp
        ### END CODE HERE ###

    return grads
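Finally, you use the gradients to update the parameters of the model with one step of gradient descent. For each layer l and learning rate α, the rule implemented in update_parameters below is:

$$ W^{[l]} := W^{[l]} - \alpha\, dW^{[l]}, \qquad b^{[l]} := b^{[l]} - \alpha\, db^{[l]} $$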
# GRADED FUNCTION: update_parameters

def update_parameters(parameters, grads, learning_rate):
    """
    Update parameters using gradient descent

    Arguments:
    parameters -- python dictionary containing your parameters
    grads -- python dictionary containing your gradients, output of L_model_backward
    learning_rate -- the learning rate, a scalar

    Returns:
    parameters -- python dictionary containing your updated parameters
                  parameters["W" + str(l)] = ...
                  parameters["b" + str(l)] = ...
    """
    L = len(parameters) // 2  # number of layers in the neural network

    # Update rule for each parameter. Use a for loop.
    ### START CODE HERE ### (≈ 3 lines of code)
    for l in range(L):
        parameters["W" + str(l + 1)] = parameters["W" + str(l + 1)] - learning_rate * grads["dW" + str(l + 1)]
        parameters["b" + str(l + 1)] = parameters["b" + str(l + 1)] - learning_rate * grads["db" + str(l + 1)]
    ### END CODE HERE ###

    return parameters

Now that you are familiar with the dataset, it is time to build a deep neural network to distinguish cat images from non-cat images.
You will build two different models:
* A 2-layer neural network
* An L-layer deep neural network

You will then compare the performance of these models, and try out different values for L.
Detailed architecture of the 2-layer network (summarized as equations right after this list):
1. The input is a (64, 64, 3) image, which is flattened to a vector of size (12288, 1).
2. The corresponding vector [x_0, x_1, ..., x_12287]^T is then multiplied by the weight matrix W^[1] of size (n^[1], 12288).
3. You then add a bias term and take its ReLU to get the vector [a^[1]_0, a^[1]_1, ..., a^[1]_(n^[1]-1)]^T.
4. You then repeat the same process.
5. You multiply the resulting vector by W^[2] and add the intercept (bias).
6. Finally, you take the sigmoid of the result. If it is greater than 0.5, you classify the image as a cat.
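In equation form, with input x of shape (12288, 1), the 2-layer forward pass described above is:

$$ z^{[1]} = W^{[1]} x + b^{[1]}, \quad a^{[1]} = \mathrm{ReLU}(z^{[1]}), \quad z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}, \quad \hat{y} = a^{[2]} = \sigma(z^{[2]}) $$

with the image classified as a cat when a^[2] > 0.5.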

Detailed architecture of the L-layer network (see the equations right after this list):
1. The input is a (64, 64, 3) image, which is flattened to a vector of size (12288, 1).
2. The corresponding vector [x_0, x_1, ..., x_12287]^T is then multiplied by the weight matrix W^[1], and you add the intercept b^[1]. The result is called the linear unit.
3. Next, you take the ReLU of the linear unit. This process may be repeated several times, once for each pair (W^[l], b^[l]), depending on the model architecture.
4. Finally, you take the sigmoid of the final linear unit. If it is greater than 0.5, you classify the image as a cat.
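In equation form, the L-layer forward pass is the repeated pattern implemented by L_model_forward:

$$ a^{[0]} = x; \qquad z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]}, \quad a^{[l]} = \mathrm{ReLU}\big(z^{[l]}\big) \ \ \text{for } l = 1, \dots, L-1; \qquad a^{[L]} = \sigma\big(W^{[L]} a^{[L-1]} + b^{[L]}\big) $$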
## 3.3 - General methodology

As usual, you will follow the deep learning methodology to build the model:
a. Forward propagation
b. Compute cost function
c. Backward propagation
d. Update parameters (using parameters, and grads from backprop)

**Question**: Use the helper functions you implemented in the previous sections to build a 2-layer neural network with the following structure: *LINEAR -> RELU -> LINEAR -> SIGMOID*. The functions you may need are initialize_parameters, linear_activation_forward, compute_cost, linear_activation_backward and update_parameters; their signatures appear above.
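Once two_layer_model below is complete, training would be kicked off with a call along these lines (a sketch: train_x and train_y are assumed to be the flattened, normalized training set from the original notebook, and the layer sizes and iteration count shown here follow that assignment's setup; adjust them to your own data):

```python
# Hypothetical training call; assumes train_x has shape (12288, m) and train_y has shape (1, m).
layers_dims = (12288, 7, 1)  # (n_x, n_h, n_y)
parameters = two_layer_model(train_x, train_y, layers_dims=layers_dims,
                             num_iterations=2500, print_cost=True)
```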
# GRADED FUNCTION: two_layer_model

def two_layer_model(X, Y, layers_dims, learning_rate=0.0075, num_iterations=3000, print_cost=False):
    """
    Implements a two-layer neural network: LINEAR->RELU->LINEAR->SIGMOID.

    Arguments:
    X -- input data, of shape (n_x, number of examples)
    Y -- true "label" vector (containing 1 if cat, 0 if non-cat), of shape (1, number of examples)
    layers_dims -- dimensions of the layers (n_x, n_h, n_y)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- If set to True, this will print the cost every 100 iterations

    Returns:
    parameters -- a dictionary containing W1, W2, b1, and b2
    """
    np.random.seed(1)
    grads = {}
    costs = []  # to keep track of the cost
    m = X.shape[1]  # number of examples
    (n_x, n_h, n_y) = layers_dims

    # Initialize parameters dictionary, by calling one of the functions you'd previously implemented
    ### START CODE HERE ### (≈ 1 line of code)
    parameters = initialize_parameters(n_x, n_h, n_y)
    ### END CODE HERE ###

    # Get W1, b1, W2 and b2 from the dictionary parameters.
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]

    # Loop (gradient descent)
    for i in range(0, num_iterations):

        # Forward propagation: LINEAR -> RELU -> LINEAR -> SIGMOID. Inputs: "X, W1, b1, W2, b2". Output: "A1, cache1, A2, cache2".
        ### START CODE HERE ### (≈ 2 lines of code)
        A1, cache1 = linear_activation_forward(X, W1, b1, activation='relu')
        A2, cache2 = linear_activation_forward(A1, W2, b2, activation='sigmoid')
        ### END CODE HERE ###

        # Compute cost
        ### START CODE HERE ### (≈ 1 line of code)
        cost = compute_cost(A2, Y)
        ### END CODE HERE ###

        # Initializing backward propagation
        dA2 = -(np.divide(Y, A2) - np.divide(1 - Y, 1 - A2))

        # Backward propagation. Inputs: "dA2, cache2, cache1". Outputs: "dA1, dW2, db2; also dA0 (not used), dW1, db1".
        ### START CODE HERE ### (≈ 2 lines of code)
        dA1, dW2, db2 = linear_activation_backward(dA2, cache2, activation='sigmoid')
        dA0, dW1, db1 = linear_activation_backward(dA1, cache1, activation='relu')
        ### END CODE HERE ###

        # Set grads['dW1'] to dW1, grads['db1'] to db1, grads['dW2'] to dW2, grads['db2'] to db2
        grads['dW1'] = dW1
        grads['db1'] = db1
        grads['dW2'] = dW2
        grads['db2'] = db2

        # Update parameters.
        ### START CODE HERE ### (approx. 1 line of code)
        parameters = update_parameters(parameters, grads, learning_rate)
        ### END CODE HERE ###

        # Retrieve W1, b1, W2, b2 from parameters
        W1 = parameters["W1"]
        b1 = parameters["b1"]
        W2 = parameters["W2"]
        b2 = parameters["b2"]

        # Print the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print("Cost after iteration {}: {}".format(i, np.squeeze(cost)))
        if print_cost and i % 100 == 0:
            costs.append(cost)

    # plot the cost
    plt.plot(np.squeeze(costs))
    plt.ylabel('cost')
    plt.xlabel('iterations (per hundreds)')
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()
    return parameters

## 5 - L-layer neural network
# GRADED FUNCTION: L_layer_model

def L_layer_model(X, Y, layers_dims, learning_rate=0.0075, num_iterations=3000, print_cost=False):  # lr was 0.009
    """
    Implements a L-layer neural network: [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID.

    Arguments:
    X -- data, numpy array of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 1 if cat, 0 if non-cat), of shape (1, number of examples)
    layers_dims -- list containing the input size and each layer size, of length (number of layers + 1).
    learning_rate -- learning rate of the gradient descent update rule
    num_iterations -- number of iterations of the optimization loop
    print_cost -- if True, it prints the cost every 100 steps

    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """
    np.random.seed(1)
    costs = []  # keep track of cost

    # Parameters initialization. (≈ 1 line of code)
    ### START CODE HERE ###
    parameters = initialize_parameters_deep(layers_dims)
    ### END CODE HERE ###

    # Loop (gradient descent)
    for i in range(0, num_iterations):

        # Forward propagation: [LINEAR -> RELU]*(L-1) -> LINEAR -> SIGMOID.
        ### START CODE HERE ### (≈ 1 line of code)
        AL, caches = L_model_forward(X, parameters)
        ### END CODE HERE ###

        # Compute cost.
        ### START CODE HERE ### (≈ 1 line of code)
        cost = compute_cost(AL, Y)
        ### END CODE HERE ###

        # Backward propagation.
        ### START CODE HERE ### (≈ 1 line of code)
        grads = L_model_backward(AL, Y, caches)
        ### END CODE HERE ###

        # Update parameters.
        ### START CODE HERE ### (≈ 1 line of code)
        parameters = update_parameters(parameters, grads, learning_rate)
        ### END CODE HERE ###

        # Print the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print("Cost after iteration %i: %f" % (i, cost))
        if print_cost and i % 100 == 0:
            costs.append(cost)

    # plot the cost
    plt.plot(np.squeeze(costs))
    plt.ylabel('cost')
    plt.xlabel('iterations (per hundreds)')
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()
    return parameters

pred_train = predict(train_x, train_y, parameters)
Accuracy: 0.985645933014
pred_test = predict(test_x, test_y, parameters)
Accuracy: 0.8
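predict is another helper provided with the original assignment and is not shown in this post. If you want to reproduce accuracy numbers like these outside the notebook, a minimal sketch consistent with its usage here (forward-propagate, threshold at 0.5, report accuracy) could be:

```python
import numpy as np

def predict(X, y, parameters):
    # Hypothetical re-implementation: runs L_model_forward and thresholds the output at 0.5.
    m = X.shape[1]
    probas, _ = L_model_forward(X, parameters)
    p = (probas > 0.5).astype(int)  # predicted labels, shape (1, m)
    print("Accuracy: " + str(np.sum(p == y) / m))
    return p
```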