秦雷洪, 张松涛, 南晓峰, 等. 基于深度强化学习的双体船姿态控制[J]. 中国舰船研究, 2024, 19(X): 1–9. DOI: 10.19693/j.issn.1673-3185.03492
QIN L H, ZHANG S T, NAN X F, et al. Deep reinforcement learning for attitude control of catamaran[J]. Chinese Journal of Ship Research, 2024, 19(X): 1–9 (in Chinese). DOI: 10.19693/j.issn.1673-3185.03492

基于深度强化学习的双体船姿态控制

Deep reinforcement learning for attitude control of catamaran

  • Abstract:
    Objectives To address the dependence of traditional control algorithms on precise mathematical models and system parameters in the longitudinal motion control of catamarans, this paper proposes a longitudinal motion control algorithm based on deep reinforcement learning.
    Methods A reward function and a neural network structure are designed, the relevant hyperparameters are tuned, and the resulting controller is combined with a catamaran model. Experiments then compare the deep deterministic policy gradient (DDPG) algorithm with the GA-LQR method in two respects: control performance under three different control modes, and robustness under different operating conditions and initial states.
    Results Under the same operating conditions, the DDPG algorithm has a slight advantage over the GA-LQR algorithm in control performance, but its fin angle output during control is more aggressive. In simulation experiments under different operating conditions and initial states, the control performance of the DDPG algorithm degrades considerably when the system and environment models change substantially. When the changes are small, the DDPG algorithm adapts better and outperforms the GA-LQR algorithm. Taken together, the two algorithms perform comparably.
    Conclusions The DDPG algorithm, based on deep reinforcement learning, shows application potential in the longitudinal motion control of catamarans, and offers a new research direction and methodological support for ship motion control under complex sea conditions.
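
The abstract does not give the form of the reward function. As a purely illustrative sketch, a reward for this kind of attitude control task could penalize heave, pitch, and fin angle quadratically, which also mirrors the structure of an LQR cost. The function name, the choice of state variables, and all weights below are assumptions made for illustration and are not taken from the paper.

```python
def attitude_reward(heave_m, pitch_rad, fin_angle_rad,
                    w_heave=1.0, w_pitch=1.0, w_fin=0.1):
    """Hypothetical quadratic reward (not the paper's actual design).

    Returns 0 when the hull is at rest with a neutral fin, and an
    increasingly negative value as heave, pitch, or fin angle grow.
    The weights are illustrative placeholders.
    """
    attitude_cost = w_heave * heave_m ** 2 + w_pitch * pitch_rad ** 2
    actuator_cost = w_fin * fin_angle_rad ** 2
    return -(attitude_cost + actuator_cost)


if __name__ == "__main__":
    # Same attitude error, two different fin deflections (radians).
    print(attitude_reward(0.2, 0.05, 0.1))
    print(attitude_reward(0.2, 0.05, 0.4))
```

Under this assumed form, raising `w_fin` would discourage large fin deflections, which is one plausible way to trade control performance against the more aggressive fin angle output reported for DDPG.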

     
