Autonomous collision avoidance decision-making method for unmanned ships based on an improved DDPG algorithm

  • Abstract:
    Objectives To make maritime traffic safer and more efficient, this paper proposes an autonomous collision avoidance decision-making method for unmanned ships based on an improved deep deterministic policy gradient (DDPG) algorithm.
    Methods To address the low data utilization and poor convergence of the traditional DDPG algorithm, prioritized experience replay (PER) is used to adaptively adjust experience priorities and reduce the correlation between samples, and a long short-term memory (LSTM) network is used to improve convergence. Based on the ship domain and the International Regulations for Preventing Collisions at Sea (COLREGs), an encounter situation determination model and a newly defined set of reward functions are established. Immediate danger is also taken into account to handle cases in which other ships do not comply with the rules.
    Results To validate the proposed method, simulation experiments are conducted in two-ship and multi-ship encounter scenarios. The results show that the improved DDPG algorithm increases convergence speed by approximately 28.8% compared with the traditional DDPG algorithm.
    Conclusions The trained collision avoidance model enables an unmanned ship to make decisions and navigate autonomously while complying with COLREGs, providing a reference for intelligent decision-making in maritime traffic.
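
    The abstract names prioritized experience replay but does not give its details. As a rough illustration only, the sketch below shows the commonly used proportional variant, in which a transition is sampled with probability proportional to its priority raised to a power and the resulting bias is corrected with importance-sampling weights. The class name `PrioritizedReplayBuffer`, the parameters `alpha`, `beta`, and `eps`, and their default values are assumptions for this example and are not taken from the paper; a practical implementation would likely use a sum tree instead of the linear-time sampling shown here.

```python
import random


class PrioritizedReplayBuffer:
    """Simplified proportional prioritized replay (hypothetical sketch, O(n) sampling)."""

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-5):
        self.capacity = capacity
        self.alpha = alpha  # assumed: how strongly priorities skew sampling (0 = uniform)
        self.beta = beta    # assumed: strength of the importance-sampling correction
        self.eps = eps      # assumed: keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0        # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions receive the current maximum priority so that
        # they are likely to be sampled before their TD error is known.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        # Sampling probability is proportional to priority ** alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        n = len(self.data)
        idxs = random.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights, normalized by the largest weight in the batch.
        weights = [(n * probs[i]) ** (-self.beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        batch = [self.data[i] for i in idxs]
        return idxs, batch, weights

    def update_priorities(self, idxs, td_errors):
        # After a learning step, the priority becomes the absolute TD error.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

    In a DDPG training loop, the critic loss for each sampled transition would typically be multiplied by its weight, and `update_priorities` would be called with the new TD errors after each update.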

     
