Citation: CHEN Y T, CAO S J, ZENG F M. A Real-time Q-Learning Algorithm for Unmanned Surface Vehicle Target Tracking[J]. Chinese Journal of Ship Research, 2020, 37(0): 1–6. doi: 10.19693/j.issn.1673-3185.01763

A Real-time Q-Learning Algorithm for Unmanned Surface Vehicle Target Tracking

Abstract: To address the target tracking problem in unmanned surface vehicle (USV) motion planning, this paper studies the application of reinforcement learning to USV target tracking control. The reinforcement learning process and the Q-learning model are analyzed, and an improved real-time Q-learning algorithm is proposed. A Q-learning framework suited to the target tracking problem is then designed, comprising the action space, state space, reward function, and reinforcement learning strategy. Offline and online test scenarios are set up in both fixed and uncertain environments to evaluate the self-learning algorithm and its control performance. The results show that the proposed Q-learning framework has self-learning capability: it can autonomously evolve its action strategy, maximize the reward function, and achieve real-time target tracking. This work provides a research basis for enhancing the self-learning ability of USV control systems.
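As an illustration of the kind of framework the abstract describes, the following is a minimal Python sketch of a tabular Q-learning update for a target-tracking agent. The state discretization (bearing-error and range bins), the rudder-increment action set, the reward shaping, and all parameter values are illustrative assumptions for this sketch, not the paper's actual definitions.

```python
# Minimal sketch of tabular Q-learning for USV target tracking.
# State, action, and reward definitions below are assumed for illustration;
# the abstract only names the components (state space, action space,
# reward function, learning strategy), not their exact forms.
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1       # learning rate, discount factor, exploration rate
ACTIONS = [-10.0, -5.0, 0.0, 5.0, 10.0]     # assumed rudder-angle increments (deg)

Q = defaultdict(float)                       # Q[(state, action)] -> estimated return

def discretize(bearing_err_deg, range_m):
    """Assumed state: coarse bins of bearing error to the target and range to it."""
    return (int(bearing_err_deg // 15), int(range_m // 50))

def reward(bearing_err_deg, range_m):
    """Assumed reward: penalize heading error and distance to the target."""
    return -abs(bearing_err_deg) / 180.0 - range_m / 500.0

def choose_action(state):
    """Epsilon-greedy policy over the discrete action set."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(state, action, r, next_state):
    """Standard one-step Q-learning backup: Q <- Q + alpha*(r + gamma*max Q' - Q)."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (r + GAMMA * best_next - Q[(state, action)])
```

In a setting like the paper's, each control cycle would observe the USV and target, call choose_action on the discretized state, apply the maneuver, observe the new bearing error and range, and pass the resulting reward and next state to q_update, so the action strategy evolves online as the abstract describes.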

     

