Generalizing Q-learning to work with a continuous *action* space

2023-09-11 03:51:14 Author: 岁月染过的梦

I'm trying to get an agent to learn the mouse movements necessary to best perform some task in a reinforcement learning setting (i.e. the reward signal is the only feedback for learning).

I'm hoping to use the Q-learning technique, but while I've found a way to extend this method to continuous state spaces, I can't seem to figure out how to accommodate a problem with a continuous action space.
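For context, one common way to extend Q-learning to a continuous state space is to replace the Q-table with a function approximator over state features while keeping a small discrete action set. The sketch below illustrates that idea with semi-gradient Q-learning on a linear approximator; the 2-D state, the feature map `phi()`, and all hyperparameters are illustrative assumptions, not something stated in the question.

```python
import numpy as np

# Minimal sketch: semi-gradient Q-learning over a continuous 2-D state
# (e.g. the cursor position) with a hypothetical discrete action set.
n_actions = 8                          # assumed discrete action set
n_features = 32                        # size of the feature vector phi(s)
rng = np.random.default_rng(0)
P = rng.normal(size=(n_features, 2))   # fixed random projection of the 2-D state
w = np.zeros((n_actions, n_features))  # one weight vector per action

def phi(state):
    """Fixed random-projection features for a continuous 2-D state."""
    return np.cos(P @ np.asarray(state, dtype=float))

def q_values(state):
    """Q(s, a) = w[a] . phi(s) for every discrete action a."""
    return w @ phi(state)

def q_update(state, action, reward, next_state, done, alpha=0.1, gamma=0.99):
    """One semi-gradient Q-learning step on the linear weights."""
    target = reward + (0.0 if done else gamma * q_values(next_state).max())
    td_error = target - q_values(state)[action]
    w[action] += alpha * td_error * phi(state)
```

Note that this still assumes a discrete action set, which is exactly where the difficulty described next comes from.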

I could just force all mouse movement to be of a certain magnitude and in only a certain number of different directions, but any reasonable way of making the actions discrete would yield a huge action space. Since standard Q-learning requires the agent to evaluate all possible actions, such an approximation doesn't solve the problem in any practical sense.
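To make the size problem concrete, here is a back-of-the-envelope check; the particular magnitude and direction counts are assumptions chosen only to show the growth.

```python
# Rough arithmetic on discretizing mouse moves: even a coarse grid of
# magnitudes and directions gives hundreds of actions per step, and standard
# Q-learning must evaluate (take an argmax over) all of them at every step.
n_magnitudes = 10        # e.g. 1..10 pixels per move (assumption)
n_directions = 36        # e.g. one direction every 10 degrees (assumption)
actions_per_step = n_magnitudes * n_directions
print(actions_per_step)        # 360 discrete actions per step
print(actions_per_step ** 3)   # 46,656,000 possible three-step gestures
```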

Recommended Answer

The common way of dealing with this problem is with actor-critic methods. These naturally extend to continuous action spaces. Basic Q-learning can diverge when working with approximations; however, if you still want to use it, you can try combining it with a self-organizing map, as done in "Applications of the self-organising map to reinforcement learning". The paper also contains some further references you might find useful.
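For concreteness, below is a minimal sketch of what an actor-critic agent with a continuous 2-D action (a mouse displacement) might look like, assuming a PyTorch setup. The network sizes, the Gaussian policy, and the one-step TD update are illustrative choices, not the method from the cited paper.

```python
# Minimal one-step actor-critic sketch for a continuous 2-D action.
# Assumes states/actions arrive as torch tensors (e.g. from a Gym-style loop).
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Gaussian policy: maps a state to the mean of a 2-D action."""
    def __init__(self, state_dim, action_dim=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(),
                                 nn.Linear(64, action_dim))
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def dist(self, state):
        return torch.distributions.Normal(self.net(state), self.log_std.exp())

class Critic(nn.Module):
    """State-value function V(s)."""
    def __init__(self, state_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(),
                                 nn.Linear(64, 1))

    def forward(self, state):
        return self.net(state).squeeze(-1)

def update(actor, critic, opt_a, opt_c,
           state, action, reward, next_state, done, gamma=0.99):
    """One-step actor-critic update using the TD error as the advantage."""
    value = critic(state)
    with torch.no_grad():
        target = reward + gamma * critic(next_state) * (1.0 - done)
    td_error = target - value

    # Critic: regress V(s) toward the bootstrapped target.
    critic_loss = td_error.pow(2).mean()
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

    # Actor: raise the log-probability of actions with positive TD error.
    log_prob = actor.dist(state).log_prob(action).sum(-1)
    actor_loss = -(log_prob * td_error.detach()).mean()
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()
```

In practice you would add exploration noise, batching, and possibly target networks, but this captures the key structural point: the actor outputs a continuous action directly, so there is no maximization over a discretized action set.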