Abstract:
Objectives To improve the efficiency of cooperative detection and the mission stability of autonomous underwater vehicles (AUVs) in complex underwater environments, this study proposes an improved multi-agent deep deterministic policy gradient (MADDPG) algorithm based on the artificial potential field (APF) method and establishes a novel cooperative search-and-tracking path planning model for AUVs. The model addresses key shortcomings of the conventional MADDPG algorithm, namely inefficient early-stage exploration and limited coordination among agents in cooperative detection missions, and provides insights for improving AUV collaborative operations.
Method To overcome the limitations of conventional APF-based path planning, such as entrapment in local optima, and the drawbacks of MADDPG, such as slow convergence and training instability caused by random early-stage exploration, this study proposes the APF-MADDPG algorithm, which uses the attractive field of APF to guide the initial movement of the AUVs. The key innovations include: (1) constructing a dynamic, time-varying potential field model that adjusts the field strength coefficient in real time to enhance early-stage exploration; (2) employing Monte Carlo simulation to generate possible target trajectories, statistically analyzing their spatiotemporal distribution over the operational area, and establishing a probabilistic model to predict the dynamic movement of underwater targets; and (3) incorporating sonar detection probabilities at varying distances into the reward function and path evaluation metrics via the cumulative detection probability (CDP) formula. Comparative simulations of cooperative detection tasks involving 2 and 3 AUVs under identical initial conditions were conducted to evaluate the performance differences between the APF-MADDPG and conventional MADDPG algorithms.
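For concreteness, the following is a minimal sketch of the two quantities referenced above, assuming the standard quadratic form of the APF attractive potential and the usual independence-based definition of cumulative detection probability; the exact formulations used in the study may differ, and the symbols $k(t)$, $\mathbf{q}$, $\mathbf{q}_{\mathrm{tar}}$, $N$, $p_i$, and $d_i$ are introduced here only for illustration.
\[
U_{\mathrm{att}}(\mathbf{q},t) = \tfrac{1}{2}\,k(t)\,\lVert \mathbf{q} - \mathbf{q}_{\mathrm{tar}} \rVert^{2},
\qquad
\mathbf{F}_{\mathrm{att}}(\mathbf{q},t) = -\nabla U_{\mathrm{att}} = -k(t)\,\bigl(\mathbf{q} - \mathbf{q}_{\mathrm{tar}}\bigr),
\]
where $\mathbf{q}$ is the AUV position, $\mathbf{q}_{\mathrm{tar}}$ the (predicted) target position, and $k(t)$ the time-varying field strength coefficient that can be attenuated as training proceeds so that the learned policy gradually takes over from the potential-field guidance.
\[
\mathrm{CDP} = 1 - \prod_{i=1}^{N}\bigl(1 - p_i(d_i)\bigr),
\]
where $p_i(d_i)$ is the per-step sonar detection probability, assumed to depend on the AUV-target distance $d_i$ at step $i$, and $N$ is the number of detection opportunities along the path. Under this form, maximizing CDP rewards trajectories that keep the AUVs within high-detection-probability range of the predicted target positions for as many steps as possible.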
Conclusions The experimental results demonstrate that: (1) In terms of detection performance, APF-MADDPG achieves a CDP of 80.93% in the 2-AUV scenario, a 7% improvement over conventional MADDPG, and 92.67% in the 3-AUV scenario, a 0.6% increase; (2) In terms of algorithmic performance, APF-MADDPG exhibits faster initial convergence and more stable final convergence in both scenarios; (3) In stability tests, the improved algorithm shows smaller performance fluctuations across repeated trials, confirming its greater robustness.