Research on AUV cooperative on-call anti-submarine search path planning based on an artificial potential field-improved MADDPG algorithm

On-call antisubmarine path planning for AUVs based on an artificial potential field-enhanced MADDPG algorithm

  • Abstract:
    Objectives To improve the cooperative detection efficiency and stability of AUVs in complex underwater environments, this study improves the multi-agent deep deterministic policy gradient (MADDPG) algorithm with the artificial potential field (APF) method and establishes a new cooperative on-call anti-submarine search path planning model for autonomous underwater vehicles (AUVs).
    Methods To address the tendency of APF to fall into local optima in search path planning, as well as the blind early-stage exploration and poor convergence of MADDPG, an algorithm (APF-MADDPG) is proposed that uses the attractive field of APF to guide the early motion of the AUVs and combines it with MADDPG. A large number of possible target trajectories are simulated by the Monte Carlo method, and the sea-area positions of all trajectory points at each time step are tallied to predict the distribution pattern of the dynamic underwater target. In addition, the detection probability of sonar at different ranges is combined with the cumulative detection probability (CDP) formula as the path evaluation metric, and the algorithm is used to run cooperative detection simulations with 2 and 3 AUVs.
    Results The experimental results show that, compared with the original MADDPG algorithm, APF-MADDPG raises the CDP by 7% to 80.93% in the 2-AUV cooperative detection scenario, and by 0.6% to 92.67% in the 3-AUV scenario.
    Conclusions The APF-MADDPG algorithm can effectively improve the detection efficiency and stability of AUV cooperative anti-submarine search missions. Future work may compare other deep reinforcement learning algorithms in the same search scenario to further improve the detection efficiency and cooperative combat capability of multi-AUV collaboration.
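The APF-guided early exploration described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the gain `k_att`, the linear decay over `decay_episodes`, and the blending rule are all assumed details chosen only to show how an attractive field can steer a policy's actions early in training.

```python
import numpy as np

def apf_attractive_force(pos, goal, k_att=1.0):
    """Standard APF attractive force: F = k_att * (goal - pos)."""
    return k_att * (np.asarray(goal, dtype=float) - np.asarray(pos, dtype=float))

def guided_action(policy_action, pos, goal, episode, decay_episodes=500, k_att=1.0):
    """Blend the APF attractive direction with the MADDPG policy action.

    The APF weight w decays linearly with the episode index (an assumed
    schedule), so early episodes follow the attractive field toward the
    predicted target area while later episodes are driven by the learned
    policy alone.
    """
    w = max(0.0, 1.0 - episode / decay_episodes)
    f = apf_attractive_force(pos, goal, k_att)
    norm = np.linalg.norm(f)
    apf_dir = f / norm if norm > 0 else np.zeros_like(f)
    return (1.0 - w) * np.asarray(policy_action, dtype=float) + w * apf_dir
```

At episode 0 the action is the pure APF direction; after `decay_episodes` it is the pure policy output, which is one simple way to avoid the blind early exploration the abstract attributes to plain MADDPG.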

     

    Abstract:
    Objectives To enhance the cooperative detection efficiency and mission stability of autonomous underwater vehicles (AUVs) in complex underwater environments, this study proposes an improved multi-agent deep deterministic policy gradient (MADDPG) algorithm based on the artificial potential field (APF) method, establishing a novel cooperative search-and-track path planning model for AUVs. This model addresses key challenges in traditional MADDPG algorithms, such as inefficient early-stage exploration and insufficient multi-agent coordination in cooperative detection tasks, providing valuable insights for improving AUV collaborative operations.
    Methods To overcome the limitations of conventional APF in path planning (e.g., local optima) and MADDPG’s drawbacks (e.g., poor convergence and training instability due to random early-stage exploration), this study introduces the APF-MADDPG algorithm, which integrates APF’s attractive field to guide AUVs’ initial movement. The key innovations include: (1) constructing a dynamic time-varying potential field model that adjusts the field strength coefficient in real time to optimize early-stage exploration; (2) employing Monte Carlo simulations to generate possible target trajectories, statistically analyzing their spatiotemporal distribution in the operational area, and establishing a probabilistic model to predict dynamic underwater target movements; and (3) incorporating sonar detection probabilities at varying distances into the reward function design and path evaluation metrics using the cumulative detection probability (CDP) formula. Comparative simulations were conducted for cooperative detection tasks involving 2 and 3 AUVs under identical initial conditions to evaluate the performance differences between APF-MADDPG and conventional MADDPG.
    Results The experimental results demonstrate that: (1) in terms of detection performance, APF-MADDPG achieves a CDP of 80.93% in the 2-AUV scenario, a 7% improvement over conventional MADDPG, while in the 3-AUV scenario it reaches 92.67%, a 0.6% enhancement; (2) regarding algorithmic performance, APF-MADDPG exhibits superior initial convergence speed and final convergence stability in both scenarios; (3) in stability tests, the improved algorithm shows reduced performance fluctuations across repeated trials, confirming its stronger robustness.
    Conclusions The proposed APF-MADDPG algorithm effectively combines APF’s guidance advantages with deep reinforcement learning’s decision-making capabilities, significantly improving detection efficiency and algorithmic stability in AUV cooperative search missions. Future research may explore performance comparisons with other deep reinforcement learning algorithms and investigate multi-AUV cooperative optimization strategies in more complex scenarios to further advance underwater collaborative operations.

     
