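Reinforcement learning (RL) needs neither these assumptions nor this data; it only needs a system that can be sampled (the problem setting). In RL no expert tells you what to do: you try different actions yourself and learn from the system's feedback. In the ad recommendation example, the feedback can be expressed as the number of clicks you have received. This means, however, that an RL algorithm must explore all possible actions. If it never tries the best action, it risks never learning it. Another difficulty in RL is that you must explore the whole state space very carefully. Otherwise, you may miss the answer to the problem.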
International version:
The difference between RL and supervised learning:
In supervised learning, you generally assume that you have a dataset containing not only the observations but also the answers that some expert provided. Your task is to get as close as possible to this expert's opinion. Of course, this means you need the expertise to gather the data in the first place; if you don't have it, your model is more or less stuck. It is also important that supervised learning can usually assume the training points are independent of one another. This helps a great deal when you apply, say, stochastic gradient descent, where you want to sample the data.
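To make the independence assumption concrete, here is a minimal sketch of stochastic gradient descent on a one-dimensional linear model. The function name `sgd_linear_fit`, the squared-error loss, and the learning rate and step count are illustrative assumptions, not something taken from the lecture. The point is only that each update draws one training point uniformly at random, which is justified when the points are independent of one another.

```python
import random

def sgd_linear_fit(xs, ys, lr=0.1, steps=5000, seed=0):
    """Fit y ~ w * x + b with plain SGD on squared error (hypothetical helper)."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Supervised setting: a fixed dataset, sampled uniformly and independently.
        i = rng.randrange(n)
        err = (w * xs[i] + b) - ys[i]  # prediction minus the expert's answer
        # Gradient step for the loss 0.5 * err ** 2 on this single sample.
        w -= lr * err * xs[i]
        b -= lr * err
    return w, b

if __name__ == "__main__":
    xs = [0.0, 0.25, 0.5, 0.75, 1.0]
    ys = [2.0 * x + 1.0 for x in xs]  # labels supplied by the "expert"
    print(sgd_linear_fit(xs, ys))
```

Note that the learner here never influences which `(x, y)` pairs exist; it only chooses which ones to look at next.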
Reinforcement learning, however, makes none of these assumptions. First, you don't have a dataset. Instead, you have some kind of system from which you can sample data, but you get no reference answers, so there is no expert telling you what to do. You try actions yourself, and some kind of critic assigns you positive or negative feedback. In the advertising case, this feedback could be the number of clicks you receive. This implies that whatever RL algorithm you use, you have to take care to explore all the possible actions, lest you risk never trying the optimal action and never learning it. Another difficulty in RL is that the decision-maker affects its own observations. You therefore have to be careful to explore the state space well; otherwise, you risk misinterpreting it and failing to grasp the entirety of your problem.
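The need for exploration can be illustrated with a small sketch of the advertising case, modeled here as a multi-armed bandit with an epsilon-greedy rule. Both the bandit framing and the epsilon-greedy strategy are assumptions chosen for illustration (the text above does not name a specific algorithm), and `run_epsilon_greedy` and `click_probs` are hypothetical names. Each "ad" has a hidden click probability, and the only feedback is whether a click happened.

```python
import random

def run_epsilon_greedy(click_probs, steps=10000, epsilon=0.1, seed=0):
    """Epsilon-greedy on a click/no-click bandit; returns (estimates, counts)."""
    rng = random.Random(seed)
    n = len(click_probs)
    counts = [0] * n        # how often each ad was shown
    estimates = [0.0] * n   # running average click rate per ad
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n)  # explore: show a random ad
        else:
            # Exploit: show the ad that currently looks best (ties go to the first).
            action = max(range(n), key=lambda a: estimates[a])
        # The "critic": the system only reports whether a click happened.
        reward = 1.0 if rng.random() < click_probs[action] else 0.0
        counts[action] += 1
        # Incremental mean update of the estimated click rate.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

if __name__ == "__main__":
    print(run_epsilon_greedy([0.1, 0.5, 0.9], epsilon=0.1))
    # With epsilon=0.0 the learner never explores at all.
    print(run_epsilon_greedy([0.0, 0.9], epsilon=0.0))
```

In the second call, the first ad never earns a click, yet with `epsilon=0.0` its estimate ties with the untried second ad and the tie-break keeps choosing it, so the better ad is never shown. This is the failure mode described above: an action that is never tried is never learned.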
Other domains, such as unsupervised learning, also differ from RL a great deal. Unsupervised learning has no expert either, but it tries to do something different. Instead of learning an optimal strategy, it tries to describe the data by finding some underlying structure in it. This is very different from trying to find a strategy, because sometimes it is much easier to ride a bicycle than to understand its structure, especially when the thing in question is not a bicycle but a computer. Finally, it is important to know that although these distinguishing features can be listed as bullet points, there is no hard boundary between supervised learning, RL, and unsupervised learning. If you are trying to solve a particular practical problem, you may find yourself using some combination of the three, perhaps with one serving as a helper for another. RL is more or less the most general of these areas and can be treated as a kind of superset of supervised and unsupervised learning.