
Teach agent how to walk with sac algorithm

CreateAMind · 2019-04-28 14:54:52

Trained with https://github.com/rail-berkeley/softlearning for about ten hours on 24 CPU cores.

[Video content]

Pretty cool, right? Today we will discuss some fundamental ideas in reinforcement learning; one day you too can make this agent walk.

First of all, let's go back to that cooking example. When you cook, you take many actions: adding water, adding eggs, and so on. Every action you take is based on two things: the state and the policy. The state is what your kitchen looks like and what your dish looks like; the policy tells you what to do under those circumstances. Of course, your action earns you something, maybe a sweet cookie or a burned mess: that is the so-called reward. Let's make it more formal. The loop below illustrates the dynamics of state, policy, action, and reward.
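To make that loop concrete, here is a minimal sketch of one episode of agent-environment interaction, assuming the classic OpenAI Gym interface (the environment name and the random stand-in policy are illustrative assumptions, not the setup used in the video above):

```python
import gym

# One episode of the state -> action -> reward loop, using the
# pre-0.26 Gym API. BipedalWalker-v2 needs Box2D installed; any
# other environment works here too.
env = gym.make("BipedalWalker-v2")

state = env.reset()
episode_return = 0.0
done = False
while not done:
    # A real policy maps the current state to an action; a random
    # sample stands in for it in this sketch.
    action = env.action_space.sample()
    state, reward, done, info = env.step(action)
    episode_return += reward

print("Return of this episode:", episode_return)
```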

Just as you want to make as many cookies as you can, the goal of the agent is to maximize its cumulative reward, called the return. Reinforcement learning methods are ways the agent can learn behaviors to achieve that goal [1].
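For a preview of the math, the infinite-horizon discounted return is usually written as below (following the Spinning Up notes [1]; here r_t is the reward at step t and the discount factor makes near-term cookies worth more than distant ones):

```latex
% Discounted return of a trajectory \tau, with discount factor \gamma
R(\tau) = \sum_{t=0}^{\infty} \gamma^{t} r_t, \qquad \gamma \in (0, 1)
```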

Next time we will jump into these key concepts with accurate definitions. Of course, math is inevitable, but we will make it easier to understand: just like cooking, you will get used to it once you have practiced enough.

  • Policies (a small preview sketch follows this list)
  • Trajectories
  • Reward and Return
  • Value functions
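As a preview of the first item, here is a hedged sketch of the kind of squashed Gaussian policy that SAC-style agents use: a network maps the state to the mean and standard deviation of a distribution over actions, and an action is sampled from that distribution. It is written in PyTorch for illustration; the class and parameter names are assumptions, not code from the softlearning repository:

```python
import torch
import torch.nn as nn

class StochasticPolicy(nn.Module):
    """Illustrative SAC-style Gaussian policy with tanh squashing."""

    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mean = nn.Linear(hidden, act_dim)
        self.log_std = nn.Linear(hidden, act_dim)

    def forward(self, obs):
        h = self.net(obs)
        mean = self.mean(h)
        # Clamp log-std for numerical stability, then exponentiate.
        std = self.log_std(h).clamp(-20, 2).exp()
        dist = torch.distributions.Normal(mean, std)
        # Sample with the reparameterization trick, squash to [-1, 1].
        return torch.tanh(dist.rsample())

# Usage: for a 24-dim observation and 4-dim action space,
# policy = StochasticPolicy(24, 4); action = policy(torch.randn(1, 24))
```

Sampling an action (rather than always taking the mean) is what makes the policy stochastic, which is central to SAC's maximum-entropy objective.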

One more thing: the SQN algorithm in yesterday's article was an old version. The one that follows is the latest version; please read this one instead.

[1] OpenAI Spinning Up, "Part 1: Key Concepts in RL". https://spinningup.openai.com/en/latest/spinningup/rl_intro.html

Originally published 2019-01-14 on the CreateAMind WeChat public account; shared via the Tencent Cloud self-media sync program.
