PS:
RL intro (OpenAI Spinning Up): https://spinningup.openai.com/en/latest/spinningup/rl_intro2.html
Deep RL course (Thomas Simonini): https://simoninithomas.github.io/Deep_reinforcement_learning_Course/
Policy gradient (Lilian Weng): https://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html#a2c
A3C: https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-8-asynchronous-actor-critic-agents-a3c-c88f72a5e9f2
PS:
Dueling DQN implemented in TensorFlow. The network splits into a state-value stream V(s) and an advantage stream A(s, a), then recombines them as Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a)):
import numpy as np
import tensorflow as tf


def normc_initializer(std=1.0):
    # Column-normalizing initializer, as used in OpenAI baselines / Ray RLlib.
    def _initializer(shape, dtype=None, partition_info=None):
        out = np.random.randn(*shape).astype(np.float32)
        out *= std / np.sqrt(np.square(out).sum(axis=0, keepdims=True))
        return tf.constant(out)
    return _initializer


# `inputs` (an image batch), `filters` (a list of (out_size, kernel, stride)
# tuples), and `num_outputs` (the number of actions) are assumed to be
# defined by the surrounding model code.
with tf.name_scope("conv_net"):
    # All conv layers except the last.
    for i, (out_size, kernel, stride) in enumerate(filters[:-1], 1):
        inputs = tf.layers.conv2d(
            inputs,
            out_size,
            kernel,
            stride,
            activation=tf.nn.relu,
            padding="valid",
            name="conv{}".format(i))
    # Final conv layer.
    out_size, kernel, stride = filters[-1]
    conv3 = tf.layers.conv2d(
        inputs,
        out_size,
        kernel,
        stride,
        activation=tf.nn.relu,
        padding="valid",
        name="conv3")
    conv3_flat = tf.layers.flatten(conv3)

with tf.name_scope("fc_net"):
    # Shared fully connected layer feeding both streams.
    fcn4 = tf.layers.dense(
        conv3_flat,
        512,
        kernel_initializer=normc_initializer(1.0),
        activation=tf.nn.relu,
        name="fcn4")
    # Value stream: V(s), one scalar per state.
    fcnv = tf.layers.dense(
        fcn4,
        units=1,
        kernel_initializer=normc_initializer(1.0),
        activation=None,
        name="fcnv")
    # Advantage stream: A(s, a), one value per action.
    fcna = tf.layers.dense(
        fcn4,
        units=num_outputs,
        kernel_initializer=normc_initializer(1.0),
        activation=None,
        name="fcna")

# Dueling aggregation: Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a)).
q_values = fcnv + tf.subtract(fcna, tf.reduce_mean(fcna, axis=1, keepdims=True))
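Subtracting the mean advantage keeps the V/A decomposition identifiable: adding a constant to A(s, a) and subtracting it from V(s) would otherwise leave Q(s, a) unchanged. As a quick shape check, the head can be wired up as in the minimal sketch below; the 84x84x4 observation shape, the filter spec, and num_outputs = 6 are placeholder assumptions chosen so the "valid"-padded convolutions reduce to a 1x1 map, not values implied by the snippet above:

# Placeholder definitions (assumptions, not taken from the snippet above);
# these must appear before the conv/fc/dueling graph is built.
inputs = tf.placeholder(tf.float32, [None, 84, 84, 4], name="observations")
filters = [(16, 8, 4), (32, 4, 2), (256, 9, 1)]  # (out_size, kernel, stride)
num_outputs = 6

# ... build the conv_net / fc_net layers and q_values exactly as above ...

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    obs = np.zeros((1, 84, 84, 4), dtype=np.float32)
    print(sess.run(q_values, feed_dict={inputs: obs}).shape)  # -> (1, 6)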