Research on Adaptive Control of Unmanned Surface Vehicle Based on Improved DDPG Algorithm
-
Abstract: [Objectives] To address the poor navigation stability of unmanned surface vehicles (USVs) under interference conditions, an intelligent parameter-tuning method based on deep reinforcement learning is proposed to control the USV effectively under such conditions. [Methods] A dynamic model of the USV is established, and heading control is performed by combining the line-of-sight (LOS) guidance method with a PID controller. Because the PID parameters for course control are time-varying under interference, deep reinforcement learning is introduced: the environment state, action, and reward function of the agent are designed so that the PID parameters are adjusted online. To address the slow convergence of the Deep Deterministic Policy Gradient (DDPG) algorithm and its tendency to fall into local optima during training, an improved DDPG algorithm is proposed: the original experience pool is separated into success and failure experience pools, and an adaptive batch-sampling function is designed to optimize the experience replay structure. [Results] Simulation experiments show that the improved algorithm converges rapidly, and the average return is slightly improved in the later stage of training. Under external disturbance, the controller based on the improved DDPG algorithm significantly reduces the lateral error and heading-angle deviation, fitting the desired path faster and then maintaining more stable path tracking. [Conclusions] The improved algorithm greatly reduces the cost of training time, enhances the steady-state performance of the agent in the later stage of training, and achieves more accurate path tracking.
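The heading-control loop described in the methods (LOS guidance feeding a PID controller) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the gain values, time step, and function names are assumptions.

```python
import math

def los_heading(x, y, wx, wy):
    """LOS guidance: desired heading from vessel position (x, y) toward
    the current waypoint (wx, wy)."""
    return math.atan2(wy - y, wx - x)

class PID:
    """Heading PID controller; in the paper's scheme the gains kp, ki, kd
    would be adjusted online by the reinforcement-learning agent."""

    def __init__(self, kp, ki, kd, dt=0.1):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, err):
        # Wrap heading error into [-pi, pi] so the vessel turns the short way.
        err = math.atan2(math.sin(err), math.cos(err))
        self.integral += err * self.dt
        deriv = (err - self.prev_err) / self.dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv
```

In use, `los_heading` supplies the desired course at each step, and `step` is called on the difference between the desired and actual heading to produce a rudder command.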
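The core modification to DDPG described above, separate success/failure experience pools with adaptive batch sampling, can be sketched as below. The routing of transitions and the schedule that shifts sampling toward successful experience as training progresses are illustrative assumptions; the paper's exact adaptive sampling function is not given in the abstract.

```python
import random
from collections import deque

class DualReplayBuffer:
    """Replay structure with separate success and failure pools, in the
    spirit of the improved DDPG algorithm described in the abstract."""

    def __init__(self, capacity=10000):
        self.success = deque(maxlen=capacity)
        self.failure = deque(maxlen=capacity)

    def add(self, transition, succeeded):
        # Route each transition to the matching pool; the success
        # criterion (e.g. lateral error below a threshold) is assumed.
        (self.success if succeeded else self.failure).append(transition)

    def sample(self, batch_size, progress):
        """Adaptive batch sampling: `progress` in [0, 1] is training
        progress; later in training, more successful transitions are
        drawn (an assumed linear schedule)."""
        n_succ = int(batch_size * min(max(progress, 0.0), 1.0))
        n_succ = min(n_succ, len(self.success))
        n_fail = min(batch_size - n_succ, len(self.failure))
        batch = (random.sample(self.success, n_succ)
                 + random.sample(self.failure, n_fail))
        random.shuffle(batch)
        return batch
```

The DDPG critic update would then draw its minibatch from `sample` instead of a single uniform pool, which is the structural change the abstract credits for faster convergence.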
-
Key words:
- USV
- deep reinforcement learning
- intelligent control
- trajectory tracking
- parameter tuning