ZHU K, HUANG Z, WANG X. Tracking control of intelligent ship based on deep reinforcement learning[J]. Chinese Journal of Ship Research, 2021, 16(1): 105–113. doi: 10.19693/j.issn.1673-3185.01940

Tracking control of intelligent ship based on deep reinforcement learning

  • Abstract:
      Objectives  Track-following control of intelligent ships is typically complicated by complex control environments, limited controller stability and a heavy computational load. To achieve precise tracking control, this paper proposes a heading controller based on deep reinforcement learning (DRL).
      Methods  First, using line-of-sight (LOS) guidance and building on the ship's maneuvering characteristics and control requirements, the tracking problem is modeled as a Markov decision process (MDP), and its state space, action space and reward function are designed. Next, the deep deterministic policy gradient (DDPG) algorithm is used to implement the controller, which is trained offline. Finally, the trained controller is compared with a BP-PID controller to analyze the control performance.
      Results  Simulation results show that the proposed DRL controller converges quickly during training and meets the control requirements. Compared with the BP-PID controller, the trained network tracks the path faster, with a smaller yaw error and a lower frequency of rudder angle changes.
      Conclusions  The results can serve as a reference for the tracking control of intelligent ships.
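The abstract only states that the heading controller is guided by the line-of-sight (LOS) algorithm. As a minimal sketch, assuming the common lookahead-based LOS law for a straight segment between two waypoints, the desired heading and cross-track error could be computed as below. The function names and the lookahead distance `delta` are illustrative assumptions, not details taken from the paper.

```python
import math


def wrap_angle(angle: float) -> float:
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def los_heading(x, y, wp_prev, wp_next, delta):
    """Lookahead-based LOS guidance for one straight path segment (sketch).

    x, y     -- current ship position (m)
    wp_prev  -- (x_k, y_k), start waypoint of the active segment
    wp_next  -- (x_k1, y_k1), end waypoint of the active segment
    delta    -- lookahead distance (m); an assumed tuning parameter

    Returns (desired_heading_rad, cross_track_error_m).
    """
    xk, yk = wp_prev
    xk1, yk1 = wp_next
    # Path-tangential angle of the active segment.
    alpha = math.atan2(yk1 - yk, xk1 - xk)
    # Signed cross-track error: lateral offset of the ship from the path.
    e = -(x - xk) * math.sin(alpha) + (y - yk) * math.cos(alpha)
    # Aim at a point `delta` metres ahead on the path, correcting for e.
    psi_d = wrap_angle(alpha + math.atan2(-e, delta))
    return psi_d, e


if __name__ == "__main__":
    # Path along the x-axis; the ship is 10 m off to the +y side.
    psi_d, e = los_heading(0.0, 10.0, (0.0, 0.0), (100.0, 0.0), delta=20.0)
    print(round(e, 3), round(math.degrees(psi_d), 2))  # prints: 10.0 -26.57
```

A smaller `delta` steers back to the path more aggressively at the cost of larger heading commands; in a DRL setup like the one described, the resulting heading error would typically become part of the agent's observed state.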

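The abstract says the tracking problem is cast as an MDP with a designed state space, action space and reward function, but does not give their definitions. The sketch below is a hypothetical environment only: it assumes a first-order Nomoto yaw model (T·ṙ + r = K·δ), a straight reference path along y = 0, a state of (heading error, yaw rate, cross-track error), a normalized rudder action limited to ±35°, and a reward that penalizes tracking error and rudder changes. Every parameter value (K, T, speed, dt, lookahead, weights) is an illustrative assumption, not the paper's.

```python
import math
from dataclasses import dataclass

# Illustrative assumption: rudder saturation of +/-35 degrees.
MAX_RUDDER = math.radians(35.0)


def wrap_angle(a: float) -> float:
    """Wrap an angle in radians to [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


@dataclass
class ShipState:
    x: float = 0.0       # position along the path (m)
    y: float = 0.0       # lateral position; reference path is y = 0 (m)
    psi: float = 0.0     # heading (rad)
    r: float = 0.0       # yaw rate (rad/s)
    rudder: float = 0.0  # previously applied rudder angle (rad)


class TrackingEnv:
    """Hypothetical MDP for tracking the straight path y = 0.

    Dynamics: first-order Nomoto model T*r_dot + r = K*delta plus planar
    kinematics at constant speed, integrated with explicit Euler steps.
    """

    def __init__(self, K=0.2, T=10.0, speed=5.0, dt=0.5, lookahead=50.0,
                 w_e=1.0, w_psi=1.0, w_rudder=0.5):
        self.K, self.T, self.U, self.dt = K, T, speed, dt
        self.lookahead = lookahead
        self.w_e, self.w_psi, self.w_rudder = w_e, w_psi, w_rudder
        self.s = ShipState()

    def reset(self, y0=20.0, psi0=0.0):
        """Start an episode with a lateral offset y0 and heading psi0."""
        self.s = ShipState(y=y0, psi=psi0)
        return self.observe()

    def observe(self):
        """State: (heading error, yaw rate, cross-track error)."""
        # LOS desired heading for a path along the x-axis (path angle 0).
        psi_d = math.atan2(-self.s.y, self.lookahead)
        heading_err = wrap_angle(self.s.psi - psi_d)
        return (heading_err, self.s.r, self.s.y)

    def step(self, action: float):
        """Apply a normalized action in [-1, 1] as a rudder angle."""
        delta = max(-1.0, min(1.0, action)) * MAX_RUDDER
        s = self.s
        r_dot = (self.K * delta - s.r) / self.T
        s.x += self.U * math.cos(s.psi) * self.dt
        s.y += self.U * math.sin(s.psi) * self.dt
        s.psi = wrap_angle(s.psi + s.r * self.dt)
        s.r += r_dot * self.dt
        rudder_change = abs(delta - s.rudder)
        s.rudder = delta
        obs = self.observe()
        heading_err, _, cross_track = obs
        # Negative cost: tracking errors plus a penalty on rudder changes.
        reward = -(self.w_e * abs(cross_track) / self.lookahead
                   + self.w_psi * abs(heading_err)
                   + self.w_rudder * rudder_change / MAX_RUDDER)
        return obs, reward
```

The rudder-change term is one plausible way to encourage the lower rudder-change frequency reported in the abstract; a training loop would call `reset()` once per episode and `step()` once per control period.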
     
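The controller is implemented with DDPG and trained offline, but the abstract gives no network or hyperparameter details. The following sketch shows, in plain Python and under stated assumptions, three framework-independent ingredients of DDPG: an experience replay buffer, the critic's bootstrapped target y = r + γ(1 − done)·Q′(s′, μ′(s′)) computed from the target actor μ′ and target critic Q′, and the soft (Polyak) target update θ′ ← τθ + (1 − τ)θ′. The gradient steps for the actor and critic networks are omitted. Gaussian exploration noise is used here as a common simplification of the Ornstein-Uhlenbeck noise from the original DDPG paper, and the values of `gamma`, `tau` and `sigma` are assumptions.

```python
import random
from collections import deque
from typing import List, Sequence, Tuple

# (state, action, reward, next_state, done)
Transition = Tuple[tuple, float, float, tuple, bool]


class ReplayBuffer:
    """Fixed-size FIFO buffer of transitions for off-policy training."""

    def __init__(self, capacity: int = 100_000):
        self.buf = deque(maxlen=capacity)  # oldest entries drop out first

    def push(self, s, a, r, s_next, done):
        self.buf.append((s, a, r, s_next, done))

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement.
        return random.sample(list(self.buf), batch_size)

    def __len__(self):
        return len(self.buf)


def critic_targets(batch, target_actor, target_critic, gamma=0.99):
    """y_i = r_i + gamma * (1 - done_i) * Q'(s'_i, mu'(s'_i))."""
    ys = []
    for _, _, r, s_next, done in batch:
        a_next = target_actor(s_next)
        q_next = target_critic(s_next, a_next)
        ys.append(r + gamma * (0.0 if done else 1.0) * q_next)
    return ys


def soft_update(target_params: List[float], online_params: Sequence[float],
                tau: float = 0.005) -> None:
    """Polyak averaging theta' <- tau*theta + (1 - tau)*theta', in place."""
    for i, p in enumerate(online_params):
        target_params[i] = tau * p + (1.0 - tau) * target_params[i]


def explore(action: float, sigma: float = 0.1,
            low: float = -1.0, high: float = 1.0) -> float:
    """Add Gaussian exploration noise and clip to the action bounds."""
    return max(low, min(high, action + random.gauss(0.0, sigma)))
```

In a full implementation the critic would be regressed toward `critic_targets(...)`, the actor would be updated along the critic's action gradient, and `soft_update` would be applied to both target networks after each training step.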
