Abstract:
Objectives To enhance the cooperative detection efficiency and mission stability of autonomous underwater vehicles (AUVs) in complex underwater environments, this study proposes an improved multi-agent deep deterministic policy gradient (MADDPG) algorithm based on the artificial potential field (APF) method and establishes a novel cooperative search-and-track path planning model for AUVs. This model addresses key challenges of the conventional MADDPG algorithm in cooperative detection tasks, such as inefficient early-stage exploration and insufficient multi-agent coordination, providing valuable insights for improving AUV collaborative operations.
Methods To overcome the limitations of conventional APF in path planning (e.g., local optima) and MADDPG’s drawbacks (e.g., poor convergence and training instability due to random early-stage exploration), this study introduces the APF-MADDPG algorithm, which integrates APF’s attractive field to guide AUVs’ initial movement. The key innovations include: (1) constructing a dynamic time-varying potential field model that adjusts the field strength coefficient in real time to optimize early-stage exploration; (2) employing Monte Carlo simulations to generate possible target trajectories, statistically analyzing their spatiotemporal distribution in the operational area, and establishing a probabilistic model to predict dynamic underwater target movements; and (3) incorporating sonar detection probabilities at varying distances into the reward function design and path evaluation metrics using the cumulative detection probability (CDP) formula. Comparative simulations were conducted for cooperative detection tasks involving 2 and 3 AUVs under identical initial conditions to evaluate the performance differences between APF-MADDPG and conventional MADDPG.
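The CDP metric used for path evaluation can be sketched as follows. This is a minimal illustration assuming independent detection opportunities along a path, with a hypothetical exponential range-attenuation model for the single-ping sonar detection probability (the abstract does not specify the paper's exact sonar model; `p0` and `decay` are placeholder parameters):

```python
import math

def sonar_detection_prob(distance, p0=0.95, decay=0.05):
    # Hypothetical single-opportunity detection probability that
    # decays exponentially with sensor-to-target distance.
    return p0 * math.exp(-decay * distance)

def cumulative_detection_prob(distances):
    # CDP over independent opportunities: CDP = 1 - prod_i (1 - p_i),
    # i.e., the probability that at least one detection succeeds.
    miss = 1.0
    for d in distances:
        miss *= 1.0 - sonar_detection_prob(d)
    return 1.0 - miss

# Example: three detection opportunities at different ranges.
cdp = cumulative_detection_prob([10.0, 8.0, 12.0])
```

Under this formulation, every additional detection opportunity can only increase the CDP, which is why it serves naturally both as a reward signal and as a path quality metric.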
Results The experimental results demonstrate that: (1) In terms of detection performance, APF-MADDPG achieves a CDP of 80.93% in the 2-AUV scenario, representing a 7% improvement over conventional MADDPG, while in the 3-AUV scenario it reaches 92.67%, a 0.6% enhancement; (2) Regarding algorithmic performance, APF-MADDPG exhibits superior initial convergence speed and final convergence stability in both scenarios; (3) In stability tests, the improved algorithm shows reduced performance fluctuations across repeated trials, confirming its stronger robustness.
Conclusions The proposed APF-MADDPG algorithm effectively combines APF’s guidance advantages with deep reinforcement learning’s decision-making capabilities, significantly improving detection efficiency and algorithmic stability in AUV cooperative search missions. Future research may explore performance comparisons with other deep reinforcement learning algorithms and investigate multi-AUV cooperative optimization strategies in more complex scenarios to further advance underwater collaborative operations.