Citation: WANG X Z, WANG M, LUO W. Intelligent decision technology in combat deduction based on soft actor-critic algorithm[J]. Chinese Journal of Ship Research, 2021, 16(6): 99–108. doi: 10.19693/j.issn.1673-3185.02099

Intelligent decision technology in combat deduction based on soft actor-critic algorithm

    Abstract:
      Objectives  Existing combat deduction simulation systems make decisions mainly on the basis of operational rules and empirical knowledge, which limits their application scenarios and leaves them inefficient and inflexible. To address these shortcomings, an intelligent decision-making model based on deep reinforcement learning (DRL) is proposed.
      Methods  First, the simulation deduction is formulated as a maximum entropy Markov decision process (MDP). An agent training network is then built on the actor-critic (AC) architecture; it generates stochastic policies that improve the agent's exploration ability, and soft policy iteration is used to search for better policies and continuously raise the agent's decision-making level. Finally, the model is validated by simulation on the Mozi AI platform.
      Results  The results show that an agent trained with the improved soft actor-critic (SAC) decision-making algorithm can make decisions autonomously. Compared with the deep deterministic policy gradient (DDPG) algorithm, its probability of winning is about 24.53% higher.
      Conclusions  The proposed design of the decision-making model can serve as a theoretical reference for research on intelligent decision-making technology and is relevant to combat simulation and deduction.
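
The soft policy iteration named in the Methods can be illustrated on a small tabular maximum entropy MDP. The sketch below is not the authors' implementation: the paper trains neural networks in an actor-critic architecture, and the abstract does not detail its improved SAC variant. The function names, the temperature `alpha`, the discount `gamma`, the iteration counts, and the random toy MDP are all assumptions made for illustration.

```python
import numpy as np


def softmax_policy(Q, alpha):
    """Return pi(a|s) proportional to exp(Q(s,a)/alpha), plus its log."""
    logits = Q / alpha
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_pi = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return np.exp(log_pi), log_pi


def soft_policy_iteration(P, R, alpha=0.2, gamma=0.9, n_iters=200, eval_sweeps=50):
    """Tabular soft policy iteration (illustrative sketch only).

    P: (S, A, S) transition probabilities; R: (S, A) rewards.
    alpha: entropy temperature; gamma: discount factor.
    Returns the soft Q-table (S, A) and the stochastic policy (S, A).
    """
    n_states, n_actions, _ = P.shape
    Q = np.zeros((n_states, n_actions))
    pi, log_pi = softmax_policy(Q, alpha)  # uniform starting policy
    for _ in range(n_iters):
        # Soft policy evaluation: repeated soft Bellman backups under pi.
        # V(s) = E_{a~pi}[Q(s,a) - alpha * log pi(a|s)] includes the entropy bonus.
        for _ in range(eval_sweeps):
            V = np.sum(pi * (Q - alpha * log_pi), axis=1)
            Q = R + gamma * (P @ V)
        # Soft policy improvement: move to the softmax of the current soft Q.
        pi, log_pi = softmax_policy(Q, alpha)
    return Q, pi


# Hypothetical toy MDP: 4 states, 3 actions, random dynamics and rewards.
rng = np.random.default_rng(0)
P = rng.random((4, 3, 4))
P /= P.sum(axis=2, keepdims=True)  # each (state, action) row sums to 1
R = rng.random((4, 3))

Q, pi = soft_policy_iteration(P, R)
print(np.round(pi, 3))  # every action keeps nonzero probability
```

Because the improved policy is a softmax rather than an argmax, it stays stochastic, which is the exploration property the Methods attribute to the maximum entropy formulation. A larger `alpha` pushes the policy toward uniform, and a smaller one toward a greedy, deterministic choice.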

     
