
Teach agent how to walk with sac algorithm

CreateAMind · 2019-04-28 14:54:52

Trained with https://github.com/rail-berkeley/softlearning for about ten hours on 24 CPU cores.

[Video content]

Pretty cool, right? Today we will discuss some fundamental ideas in reinforcement learning; one day you too can make this agent walk.

First of all, let's go back to that cooking example. When you cook, you take many actions: adding water, adding eggs, and so on. Every action you take is based on two things: the state and the policy. The state is what your kitchen looks like and what your dish looks like; the policy tells you what to do under those circumstances. Of course, your action earns you something, maybe a sweet cookie or a burned mess: that is the so-called reward. Let's make it more formal. The loop below illustrates the dynamics of state, policy, action, and reward.
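To make that loop concrete, here is a minimal sketch of one episode of agent-environment interaction, assuming the classic OpenAI Gym interface (the environment name and the random stand-in policy are illustrative assumptions, not the setup used in the video above):

```python
import gym

# One episode of the state -> action -> reward loop, using the
# pre-0.26 Gym API. BipedalWalker-v2 needs Box2D installed; any
# other environment works here too.
env = gym.make("BipedalWalker-v2")

state = env.reset()
episode_return = 0.0
done = False
while not done:
    # A real policy maps the current state to an action; a random
    # sample stands in for it in this sketch.
    action = env.action_space.sample()
    state, reward, done, info = env.step(action)
    episode_return += reward

print("Return of this episode:", episode_return)
```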

Just as you want to make as many cookies as you can, the goal of the agent is to maximize its cumulative reward, called the return. Reinforcement learning methods are ways the agent can learn behaviors to achieve that goal [1].
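For a preview of the math, the infinite-horizon discounted return is usually written as below (following the Spinning Up notes [1]; here r_t is the reward at step t and the discount factor makes near-term cookies worth more than distant ones):

```latex
% Discounted return of a trajectory \tau, with discount factor \gamma
R(\tau) = \sum_{t=0}^{\infty} \gamma^{t} r_t, \qquad \gamma \in (0, 1)
```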

Next time we will jump into these key concepts with accurate definitions. Of course, math is inevitable, but we will make it easier to understand: just like cooking, you will get used to it once you have practiced enough.

  • Policies (a small preview sketch follows this list)
  • Trajectories
  • Reward and Return
  • Value functions
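As a preview of the first item, here is a hedged sketch of the kind of squashed Gaussian policy that SAC-style agents use: a network maps the state to the mean and standard deviation of a distribution over actions, and an action is sampled from that distribution. It is written in PyTorch for illustration; the class and parameter names are assumptions, not code from the softlearning repository:

```python
import torch
import torch.nn as nn

class StochasticPolicy(nn.Module):
    """Illustrative SAC-style Gaussian policy with tanh squashing."""

    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mean = nn.Linear(hidden, act_dim)
        self.log_std = nn.Linear(hidden, act_dim)

    def forward(self, obs):
        h = self.net(obs)
        mean = self.mean(h)
        # Clamp log-std for numerical stability, then exponentiate.
        std = self.log_std(h).clamp(-20, 2).exp()
        dist = torch.distributions.Normal(mean, std)
        # Sample with the reparameterization trick, squash to [-1, 1].
        return torch.tanh(dist.rsample())

# Usage: for a 24-dim observation and 4-dim action space,
# policy = StochasticPolicy(24, 4); action = policy(torch.randn(1, 24))
```

Sampling an action (rather than always taking the mean) is what makes the policy stochastic, which is central to SAC's maximum-entropy objective.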

One more thing: the SQN algorithm in yesterday's article was an old version. The one that follows is the latest version; please read this one instead.

[1] OpenAI Spinning Up, "Part 1: Key Concepts in RL". https://spinningup.openai.com/en/latest/spinningup/rl_intro.html

Originally published 2019-01-14 on the CreateAMind WeChat public account; shared via the Tencent Cloud self-media sync program.
