Google's GitHub account has just released a new framework called Dopamine.
Dopamine helps you prototype reinforcement learning algorithms faster.
It is worth trying. The figure below compares algorithms on the Atari game Seaquest; you can see that Rainbow performs best.
Its design principles are as follows:
Why this framework exists: mainly to make it practical to implement the sophisticated RL algorithms proposed by DeepMind, up to and including Rainbow, which combines several of them. Three key points follow:
The team states that this is not an official Google product, but it is still well worth studying.
Some of us have already tried it, and it is very convenient. For example, here is a DQN training run on Atari:
(dopamine-env) neil@neil-workstation:~/Projects/dopamine$ python -um dopamine.atari.train \
> --agent_name=dqn \
> --base_dir=/tmp/dopamine \
> --gin_files='dopamine/agents/dqn/configs/dqn.gin'
2018-08-28 02:19:22.543030: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
I0828 02:19:22.543931 139761019946752 tf_logging.py:115] Creating DQNAgent agent with the following parameters:
I0828 02:19:22.544101 139761019946752 tf_logging.py:115] gamma: 0.990000
I0828 02:19:22.544147 139761019946752 tf_logging.py:115] update_horizon: 1.000000
I0828 02:19:22.544184 139761019946752 tf_logging.py:115] min_replay_history: 20000
I0828 02:19:22.544219 139761019946752 tf_logging.py:115] update_period: 4
I0828 02:19:22.544251 139761019946752 tf_logging.py:115] target_update_period: 8000
I0828 02:19:22.544284 139761019946752 tf_logging.py:115] epsilon_train: 0.010000
I0828 02:19:22.544317 139761019946752 tf_logging.py:115] epsilon_eval: 0.001000
I0828 02:19:22.544348 139761019946752 tf_logging.py:115] epsilon_decay_period: 250000
I0828 02:19:22.544380 139761019946752 tf_logging.py:115] tf_device: /gpu:0
I0828 02:19:22.544410 139761019946752 tf_logging.py:115] use_staging: True
I0828 02:19:22.544441 139761019946752 tf_logging.py:115] optimizer: <tensorflow.python.training.rmsprop.RMSPropOptimizer object at 0x7f1c7c2adf90>
I0828 02:19:22.545419 139761019946752 tf_logging.py:115] Creating a OutOfGraphReplayBuffer replay memory with the following parameters:
I0828 02:19:22.545480 139761019946752 tf_logging.py:115] observation_shape: 84
I0828 02:19:22.545521 139761019946752 tf_logging.py:115] stack_size: 4
I0828 02:19:22.545557 139761019946752 tf_logging.py:115] replay_capacity: 1000000
I0828 02:19:22.545592 139761019946752 tf_logging.py:115] batch_size: 32
I0828 02:19:22.545624 139761019946752 tf_logging.py:115] update_horizon: 1
I0828 02:19:22.545656 139761019946752 tf_logging.py:115] gamma: 0.990000
I0828 02:19:23.212261 139761019946752 tf_logging.py:115] Beginning training...
I0828 02:19:23.212377 139761019946752 tf_logging.py:115] Starting iteration 0
Steps executed: 53072 Episode length: 812 Return: -21.00
...
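The hyperparameters printed at startup come from the gin config file passed via --gin_files. If you want to experiment with different values, you can edit a copy of that file; a hypothetical excerpt (the parameter names below match the log above, but treat the exact binding syntax as an assumption and check dqn.gin in the repo):

```
# Hypothetical gin bindings overriding DQN defaults (names taken from the log above)
DQNAgent.gamma = 0.99
DQNAgent.epsilon_train = 0.01
DQNAgent.min_replay_history = 20000
```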
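The epsilon_train, epsilon_decay_period, and min_replay_history values in the log describe a standard linearly decaying exploration schedule: epsilon stays at 1.0 while the replay buffer warms up, then decays linearly down to epsilon_train. A minimal sketch of that schedule (our own illustration, not Dopamine's exact code), using the logged values as defaults:

```python
def linearly_decaying_epsilon(step, warmup_steps=20000,
                              decay_period=250000, final_epsilon=0.01):
    """Epsilon-greedy schedule: 1.0 during warmup, then linear decay.

    warmup_steps mirrors min_replay_history, decay_period mirrors
    epsilon_decay_period, final_epsilon mirrors epsilon_train in the log.
    """
    steps_left = decay_period + warmup_steps - step
    # Fraction of the decayable range still remaining, clipped to [0, 1 - final].
    bonus = (1.0 - final_epsilon) * steps_left / decay_period
    bonus = min(max(bonus, 0.0), 1.0 - final_epsilon)
    return final_epsilon + bonus


# Epsilon is 1.0 at step 0, and settles at 0.01 after warmup + decay_period steps.
print(linearly_decaying_epsilon(0))       # 1.0
print(linearly_decaying_epsilon(300000))  # 0.01
```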
Now let it run for a while and see~
Follow us, and see you next time.