
Intelligent decision technology in combat deduction based on soft actor-critic algorithm

WANG Xingzhong, WANG Min, LUO Wei

Citation: WANG X Z, WANG M, LUO W. Intelligent decision technology in combat deduction based on soft actor-critic algorithm[J]. Chinese Journal of Ship Research, 2021, 16(X): 1–10. doi: 10.19693/j.issn.1673-3185.02099


doi: 10.19693/j.issn.1673-3185.02099
Details
    About the authors:

    WANG Xingzhong, male, born in 1979, Ph.D., senior engineer

    WANG Min, female, born in 1997, master's degree candidate

    LUO Wei, male, born in 1980, Ph.D., senior engineer

    Corresponding author:

    WANG Min

  • CLC number: U662.9

Intelligent decision technology in combat deduction based on soft actor-critic algorithm

  • Abstract:   [Objective]  Existing combat simulation and deduction systems make decisions mainly on the basis of combat rules and empirical knowledge, which leads to limited application scenarios, low efficiency and poor flexibility. To address this, an intelligent decision-making model based on deep reinforcement learning (DRL) is proposed.  [Method]  First, a maximum-entropy Markov decision process (MDP) for the simulation deduction is established. Then, an agent training network is built on the actor-critic architecture, which generates stochastic policies to improve the agent's exploration ability and uses soft policy iteration to search for better policies, continuously improving the agent's decision-making level. Finally, the decision-making model is verified on a simulation deduction platform.  [Results]  The results show that the agent trained with the improved SAC decision algorithm can make decisions autonomously, and its winning probability is about 24.53% higher than that of the deep deterministic policy gradient (DDPG) algorithm.  [Conclusion]  The proposed decision-making model design can provide a theoretical reference for research on intelligent decision-making technology and has important reference value for combat simulation and deduction.
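    For reference, the maximum-entropy objective that SAC optimizes can be written in the standard form of [8], where $\alpha$ is the temperature parameter weighting the entropy term and $\rho_\pi$ is the state-action distribution induced by policy $\pi$:

    $$\pi^{*}=\arg\max_{\pi}\sum_{t}\mathbb{E}_{(s_t,a_t)\sim\rho_{\pi}}\left[r(s_t,a_t)+\alpha\,\mathcal{H}\big(\pi(\cdot\mid s_t)\big)\right]$$

    Soft policy iteration then alternates the soft Bellman backup (policy evaluation) with policy improvement:

    $$Q(s_t,a_t)\leftarrow r(s_t,a_t)+\gamma\,\mathbb{E}_{s_{t+1}}\big[V(s_{t+1})\big],\qquad V(s_t)=\mathbb{E}_{a_t\sim\pi}\big[Q(s_t,a_t)-\alpha\log\pi(a_t\mid s_t)\big]$$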
  • Figure 1.  The architecture of actor-critic[10]

    Figure 2.  AC architecture with off-policy updates

    Figure 3.  Water surface reflection detection

    Figure 4.  State analysis of the red team's ASW helicopter

    Figure 5.  Decision-making network of the red team's helicopter based on SAC

    Figure 6.  Structure of the value network

    Figure 7.  Structure of the policy network

    Figure 8.  Structure of the soft Q network

    Figure 9.  Exploration action selection of the red team agent

    Figure 10.  Overall framework of the simulation deduction platform

    Figure 11.  Overall framework of the AI development platform

    Figure 12.  Training process of the red team agent

    Figure 13.  Red team destroys the blue team's conventional submarine

    Figure 14.  Track map of the red team destroying the blue team's conventional submarine

    Figure 15.  Red team destroys the blue team's nuclear submarine

    Figure 16.  Track map of the red team destroying the blue team's nuclear submarine

    Figure 17.  Comparison of the average returns of the two decision algorithms

    Figure 18.  Comparison of the red team's winning probability

    Table 1.  The navy strength of the red team

    Unit type and name | Speed/(km·h⁻¹) | Position | Quantity | Main weapons
    MH-60R "Seahawk" ASW helicopter | 259.28 | (34°13'9" E, 43°48'37" N) | 1 | Mk-54 lightweight torpedo ×2; AN/SSQ-62E directional command active sonobuoy system (DICASS) ×8; AN/SSQ-53F directional frequency analysis and recording (DIFAR) passive sonobuoy ×1
    "Arleigh Burke"-class Flight IIA destroyer | 0 | (33°50'15" E, 43°26'30" N) | 1 | Mk-54 lightweight torpedo ×40; RUM-139C VLA anti-submarine rocket ×8

    Table 2.  The navy strength of the blue team

    Unit type and name | Speed/(km·h⁻¹) | Position | Quantity | Main weapons
    Project 955A "Borei"-class strategic nuclear submarine | 0 | (34°65'28" E, 43°4'36" N) | 1 | SS-N-15 "Starfish" anti-submarine missile ×2; USET-80K torpedo ×14
    Project 21310 "Gill-NN"-class conventional submarine | 0 | (33°84'74" E, 43°73'80" N) | 1 | High-performance explosives ×6

    Table 3.  The hyper-parameter settings

    Parameter | Value
    Learning rate | 0.001
    Discount factor $\gamma$ | 0.99
    Soft update rate $\tau$ | 0.001
    Temperature parameter $\alpha$ | 1.00
    Experience replay buffer size $N_s$ | 1 000 000
    Training samples per batch | 128
    Maximum number of training episodes | 5 000
    Maximum training steps per episode | 30
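    To make the training configuration concrete, the sketch below wires the hyper-parameters of Table 3 into a single SAC update step (soft Q evaluation, policy improvement, and soft target update). This is only an illustrative PyTorch sketch, not the authors' implementation: the network widths, the twin soft-Q variant without a separate state-value network (as in [11]), and the optimizer/replay-buffer plumbing are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

LR = 1e-3                # learning rate (Table 3)
GAMMA = 0.99             # discount factor gamma
TAU = 1e-3               # soft update rate tau
ALPHA = 1.0              # temperature parameter alpha
BATCH_SIZE = 128         # training samples per batch
BUFFER_SIZE = 1_000_000  # replay buffer capacity Ns (not used directly in this snippet)

def mlp(in_dim, out_dim, hidden=256):
    """Small fully connected network; the hidden width is an assumption, not from the paper."""
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

class GaussianPolicy(nn.Module):
    """Stochastic policy: a tanh-squashed Gaussian, so actions are sampled rather than deterministic."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = mlp(obs_dim, 2 * act_dim)

    def sample(self, obs):
        mean, log_std = self.net(obs).chunk(2, dim=-1)
        std = log_std.clamp(-20, 2).exp()
        dist = torch.distributions.Normal(mean, std)
        pre_tanh = dist.rsample()        # reparameterised sample for low-variance gradients
        action = torch.tanh(pre_tanh)    # squash actions into [-1, 1]
        log_prob = (dist.log_prob(pre_tanh)
                    - torch.log(1 - action.pow(2) + 1e-6)).sum(-1, keepdim=True)
        return action, log_prob

def sac_update(policy, q1, q2, q1_targ, q2_targ, pi_opt, q_opt, batch):
    """One soft policy iteration step: soft Q evaluation, policy improvement, soft target update."""
    obs, act, rew, next_obs, done = batch  # tensors with leading dimension BATCH_SIZE

    # Soft Q update (policy evaluation): the target includes the entropy bonus -ALPHA * log pi.
    with torch.no_grad():
        next_act, next_logp = policy.sample(next_obs)
        next_in = torch.cat([next_obs, next_act], dim=-1)
        q_next = torch.min(q1_targ(next_in), q2_targ(next_in))
        target = rew + GAMMA * (1.0 - done) * (q_next - ALPHA * next_logp)
    q_in = torch.cat([obs, act], dim=-1)
    q_loss = F.mse_loss(q1(q_in), target) + F.mse_loss(q2(q_in), target)
    q_opt.zero_grad(); q_loss.backward(); q_opt.step()

    # Policy update (policy improvement): maximise soft Q, i.e. minimise ALPHA * logp - Q.
    new_act, logp = policy.sample(obs)
    new_in = torch.cat([obs, new_act], dim=-1)
    pi_loss = (ALPHA * logp - torch.min(q1(new_in), q2(new_in))).mean()
    pi_opt.zero_grad(); pi_loss.backward(); pi_opt.step()

    # Soft (Polyak) update of the target Q networks with rate TAU.
    for net, targ in ((q1, q1_targ), (q2, q2_targ)):
        for p, p_t in zip(net.parameters(), targ.parameters()):
            p_t.data.mul_(1.0 - TAU).add_(TAU * p.data)

# Typical use (per Table 3): build optimizers with torch.optim.Adam(..., lr=LR), keep a replay
# buffer of capacity BUFFER_SIZE, sample BATCH_SIZE transitions per update, and train for up to
# 5 000 episodes of at most 30 steps each.
```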
  • [1] HU H, WU Z Q. Research on the current application and development trend of artificial intelligence technology in US military intelligence work[J]. National Defense Science & Technology, 2020, 41(2): 15–20 (in Chinese).
    [2] FU C J, ZHENG W M, GE L, et al. Application of artificial intelligence in combat simulation[J]. Radio Engineering, 2020, 50(4): 257–261 (in Chinese). doi: 10.3969/j.issn.1003-3106.2020.04.001
    [3] SUN P, TAN Y X, LI L Y. Research on external decision model of army operational simulation based on situation description[J]. Command Control & Simulation, 2016, 38(2): 15–19 (in Chinese). doi: 10.3969/j.issn.1673-3819.2016.02.004
    [4] DONG Q, JI M Q, ZHU Y F, et al. Behavioral tree modeling and simulation for air operations decision[J]. Command Control & Simulation, 2019, 41(1): 12–19 (in Chinese). doi: 10.3969/j.issn.1673-3819.2019.01.003
    [5] PENG X L, WANG J K, ZHANG C, et al. The technology of wargame based on intelligent decision[C]//Proceedings of the 7th China Command and Control Conference. Beijing: Chinese Institute of Command and Control, 2019: 193–198 (in Chinese).
    [6] LIAO X, SUN Z H. Exploration on application of intelligent decision-making in battle deduction simulation[C]//Proceedings of the 20th China Annual Conference on System Simulation Technology and Its Application. Urumqi: System Simulation Committee of China Automation Society, 2019: 368–374 (in Chinese).
    [7] CUI W H, LI D, TANG Y B, et al. Framework of wargaming decision-making methods based on deep reinforcement learning[J]. National Defense Science & Technology, 2020, 41(2): 113–121 (in Chinese).
    [8] HAARNOJA T, ZHOU A, ABBEEL P, et al. Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor[C]//Proceedings of the 35th International Conference on Machine Learning. Stockholm, Sweden: ACM Press, 2018.
    [9] SUTTON R S, BARTO A G. Reinforcement learning: an introduction[M]. Cambridge: MIT Press, 1998.
    [10] SPIELBERG S, GOPALUNI R, LOEWEN P. Deep reinforcement learning approaches for process control[C]//2017 6th International Symposium on Advanced Control of Industrial Processes. [S.l.]: IEEE, 2017: 201–203.
    [11] HAARNOJA T, ZHOU A, HARTIKAINEN K, et al. Soft actor-critic algorithms and applications[EB/OL]. arXiv: 1812.05905 (2018-12-13)[2020-08-30]. https://arxiv.org/abs/1812.05905.
    [12] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529–533. doi: 10.1038/nature14236
    [13] SCHULMAN J, CHEN X, ABBEEL P. Equivalence between policy gradients and soft Q-learning[EB/OL]. arXiv: 1704.06440 (2017-04-21)[2020-08-30]. https://arxiv.org/pdf/1704.06440.pdf.
    [14] HAARNOJA T, TANG H, ABBEEL P, et al. Reinforcement learning with deep energy-based policies[C]//Proceedings of the 34th International Conference on Machine Learning. Sydney, Australia: PMLR, 2017: 1352–1361.
    [15] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[C]//Proceedings of the 4th International Conference on Learning Representations. San Juan, Puerto Rico: Elsevier, 2016.
Publication history
  • Received:  2020-08-31
  • Revised:  2021-02-04
  • Published online:  2021-06-11
