Adaptive control of unmanned surface vehicle based on improved DDPG algorithm

  • Abstract:
    Objective To address the poor navigation stability of unmanned surface vehicles (USVs) under interference conditions, an intelligent parameter-tuning method based on deep reinforcement learning (DRL) is proposed to achieve effective control of a USV subject to disturbances.
    Method First, a dynamic model of the USV is established, and its course is controlled by a PID controller combined with the line-of-sight (LOS) guidance method. In view of the time-varying PID parameters required for course control under interference conditions, DRL is introduced: the environment state, action and reward functions of the agent are designed so that the PID parameters are adjusted online. To address the slow convergence of the deep deterministic policy gradient (DDPG) algorithm and its tendency to fall into local optima during training, an improved DDPG algorithm is proposed in which the original experience pool is separated into a success experience pool and a failure experience pool. Finally, an adaptive batch-sampling function is designed to optimize the experience replay structure. (Minimal illustrative sketches of the online gain tuning and the separated experience pools are given after the abstract.)
    Results The simulation results show that the improved algorithm converges rapidly, with a slightly improved average return in the later stages of training. Under interference conditions, the lateral error and heading angle deviation of the controller based on the improved DDPG algorithm are both significantly reduced, so the USV reaches the desired path faster and then tracks it more steadily.
    Conclusion The improved DDPG algorithm greatly reduces the training-time cost, enhances the steady-state performance of the agent in the later stages of training, and achieves more accurate path tracking.
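
As a concrete illustration of the online parameter-tuning step in the Method above, the following minimal Python sketch shows a heading PID controller whose gains are overwritten each step by the agent's action, together with a reward that penalizes cross-track error and heading deviation. The state layout, gain bounds and reward weights are assumptions chosen for illustration, not values from the paper.

```python
import numpy as np


class HeadingPID:
    """Heading PID whose gains are overwritten each control step by the agent."""

    def __init__(self, dt=0.1):
        self.dt = dt
        self.kp = self.ki = self.kd = 0.0
        self.integral = 0.0
        self.prev_err = 0.0

    def set_gains(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd

    def rudder(self, heading_err):
        # Standard positional PID acting on the heading error.
        self.integral += heading_err * self.dt
        derivative = (heading_err - self.prev_err) / self.dt
        self.prev_err = heading_err
        return self.kp * heading_err + self.ki * self.integral + self.kd * derivative


def reward(cross_track_err, heading_err, w_e=1.0, w_psi=0.5):
    # Penalize lateral (cross-track) error and heading deviation;
    # the weights are illustrative assumptions.
    return -(w_e * abs(cross_track_err) + w_psi * abs(heading_err))


# One interaction step: the agent observes the LOS tracking errors, outputs a
# gain triple within fixed bounds, and the PID produces the rudder command
# that would be fed to the USV dynamics model.
pid = HeadingPID()
state = np.array([2.0, 0.3, 0.01])    # [cross-track error, heading error, yaw rate] (assumed)
action = np.array([1.5, 0.05, 0.8])   # [kp, ki, kd] proposed by the DRL agent
pid.set_gains(*np.clip(action, [0.0, 0.0, 0.0], [5.0, 0.5, 2.0]))
rudder_cmd = pid.rudder(state[1])
print(rudder_cmd, reward(state[0], state[1]))
```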

     

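The core change to DDPG described in the Method, i.e. splitting the experience pool into success and failure pools and drawing each mini-batch from both with an adaptive ratio, can be sketched as follows. This is a minimal sketch under assumed settings; the split schedule, buffer capacity and transition format are not taken from the paper.

```python
import random
from collections import deque


class DualReplayBuffer:
    """Separate success/failure experience pools with adaptive batch sampling."""

    def __init__(self, capacity=100_000):
        self.success = deque(maxlen=capacity)   # transitions from successful episodes
        self.failure = deque(maxlen=capacity)   # transitions from failed episodes

    def add_episode(self, transitions, succeeded):
        pool = self.success if succeeded else self.failure
        pool.extend(transitions)

    def sample(self, batch_size, progress):
        """progress in [0, 1]: fraction of training completed."""
        # Adaptive split: early training relies mostly on the (plentiful)
        # failure pool; later, more of the batch comes from the success pool.
        # The 0.2 -> 0.8 schedule is an assumption for illustration.
        frac_success = 0.2 + 0.6 * min(max(progress, 0.0), 1.0)
        n_success = min(int(batch_size * frac_success), len(self.success))
        n_failure = min(batch_size - n_success, len(self.failure))
        batch = random.sample(list(self.success), n_success) + \
                random.sample(list(self.failure), n_failure)
        random.shuffle(batch)
        return batch


# Usage with placeholder (state, action, reward, next_state, done) transitions.
buf = DualReplayBuffer()
buf.add_episode([("s", "a", -1.0, "s'", False)] * 50, succeeded=False)
buf.add_episode([("s", "a", +1.0, "s'", True)] * 50, succeeded=True)
minibatch = buf.sample(batch_size=32, progress=0.5)
print(len(minibatch))   # the DDPG actor/critic update would consume this batch
```

In this form the separated pools only change where each transition is stored and how the replay batch is composed; the actor-critic updates of DDPG itself are unchanged.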