
Reinforcement learning-based autonomous generation technology of loading scheme for product oil tanker

NI Hongtao, ZHOU Qingji, CHAI Song, QI Ming

Citation: NI H T, ZHOU Q J, CHAI S, et al. Reinforcement learning-based autonomous generation technology of loading scheme for product oil tanker[J]. Chinese Journal of Ship Research, 2023, 18(Supp 1): 1–10. doi: 10.19693/j.issn.1673-3185.03474


doi: 10.19693/j.issn.1673-3185.03474
Funding: Tianjin Transportation Science and Technology Development Plan Project (G2022-48)
Detailed information
    Author biographies:

    NI Hongtao, male, born in 1982, master's degree, senior engineer. Research interests: enterprise application development and artificial intelligence. E-mail: hongtao.ni@szcu.edu.cn

    ZHOU Qingji, male, born in 1986, Ph.D., associate professor. Research interests: maritime shipping data analysis and safety. E-mail: zqj@tju.edu.cn

    CHAI Song, male, born in 1985, Ph.D.

    Corresponding author:

    ZHOU Qingji

  • CLC number: U674.13

Reinforcement learning-based autonomous generation technology of loading scheme for product oil tanker

"Reinforcement learning-based autonomous generation technology of loading scheme for product oil tanker" by NI Hongtao et al. is licensed under a Creative Commons Attribution 4.0 International License.
  • Abstract:   Objective  This study aims to develop an autonomous generation technique for liquid cargo tank loading schemes based on reinforcement learning.  Method  Taking the cargo weight of a product oil tanker in actual operation as the input and the filling levels of the cargo tanks and ballast tanks as the targets, the agent and its environment are built with Unity ML-Agents, and the agent is trained with the PyTorch framework. A reward function that jointly accounts for loading time and the magnitude of trim variation is proposed, and case studies are used to verify the effectiveness of the proposed method.  Results  The results show that the trained agent learns a good policy and autonomously generates liquid cargo tank loading schemes.  Conclusion  The findings demonstrate that applying reinforcement learning to the autonomous generation of liquid cargo tank loading schemes under multi-objective conditions is reasonable and feasible.
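
The reward design sketched in the abstract, a small penalty for every loading step combined with a penalty on the trim change between consecutive steps, can be illustrated as follows. The weights, function name and additive form below are illustrative assumptions and are not taken from the paper.

    # Minimal sketch of a per-step reward that trades off loading time against
    # trim variation, as described qualitatively in the abstract.
    # The weights w_time and w_trim and the additive form are assumptions.
    def step_reward(prev_trim: float, curr_trim: float,
                    w_time: float = 0.1, w_trim: float = 1.0) -> float:
        time_penalty = -w_time                                # each extra loading step costs a little
        trim_penalty = -w_trim * abs(curr_trim - prev_trim)   # large trim jumps cost more
        return time_penalty + trim_penalty

    # Example: trim moves from 0.30 m to 0.25 m in one step
    print(step_reward(0.30, 0.25))   # -0.1 - 1.0*0.05 = -0.15
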
  • Figure 1. System schematic diagram
    Figure 2. General plan of product oil tanker
    Figure 3. Average cumulative reward curve for estimation
    Figure 4. Learning rate
    Figure 5. Entropy
    Figure 6. Value loss curve
    Figure 7. Policy loss curve
    Figure 8. Cargo tank loading scheme of Example 1
    Figure 9. Ballast water tank loading scheme of Example 1
    Figure 10. Stability curve for Example 1
    Figure 11. Strength curves for Example 1
    Figure 12. Cargo tank loading scheme for Example 2
    Figure 13. Ballast water tank loading scheme for Example 2
    Figure 14. Stability curve for Example 2
    Figure 15. Strength curves for Example 2

      Algorithm: pseudocode for policy learning
      Input: training tasks $O(T)$, set of maximum iteration counts $K$, number of observed trajectories $N$, hyperparameters of the training model
      Output: policy $\pi_{\theta}$
      Randomly initialize the parameters $\theta$
      while the maximum number of iterations is not reached do
        foreach $K_i, O_i(T)$ in $(K, O(T))$ do
          while $k \le K_i$ do
            Sample a batch of tasks $T \in O_i(T)$
            for all $T$ do
              Sample $N$ trajectories according to policy $\pi_{\theta}$
              Compute the gradient of the loss function with respect to $\theta$
            end
            Update $\theta$ with the computed gradient
          end
        end
      end
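
The inner update in the pseudocode, sampling trajectories with the current policy and then updating $\theta$ from the gradient of a loss, can be sketched with a compact REINFORCE-style loop in PyTorch. The dummy environment, network size and learning rate below are placeholders; the paper trains the agent through Unity ML-Agents with the PPO-style settings of Table 4, so this is an illustrative simplification rather than the authors' implementation.

    import torch
    import torch.nn as nn

    # Stand-in for the Unity ML-Agents environment used in the paper.
    class DummyEnv:
        def reset(self):
            return torch.zeros(4)                     # observation: 4 features
        def step(self, action):
            obs = torch.randn(4)
            reward = float(-abs(action - 1))          # toy reward
            done = torch.rand(1).item() < 0.1
            return obs, reward, done

    policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 3))  # 3 discrete actions
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
    env = DummyEnv()

    for iteration in range(10):                       # "while not at the maximum number of iterations"
        log_probs, returns = [], []
        for _ in range(8):                            # sample N trajectories with the current policy
            obs, total_reward, traj_log_probs = env.reset(), 0.0, []
            for _ in range(200):                      # cap trajectory length
                dist = torch.distributions.Categorical(logits=policy(obs))
                action = dist.sample()
                traj_log_probs.append(dist.log_prob(action))
                obs, reward, done = env.step(action.item())
                total_reward += reward
                if done:
                    break
            log_probs.append(torch.stack(traj_log_probs).sum())
            returns.append(total_reward)
        returns = torch.tensor(returns)
        advantage = returns - returns.mean()          # simple baseline
        loss = -(torch.stack(log_probs) * advantage).mean()
        optimizer.zero_grad()
        loss.backward()                               # gradient of the loss w.r.t. theta
        optimizer.step()                              # update theta with the computed gradient
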

    Table 1. Values for the gravity center of light ship

    Parameter                 Value
    LCG from midship/m        44.999
    VCG/m                     8.017
    TCG from centerline/m     0

    Table 2. Tank parameters

    Tank ID    Volume/m3    LCG from midship/m    VCG/m    TCG from centerline/m
    RCAR1P     548.24       93.648                6.124    −2.713
    RCAR1S     542.59       93.647                6.127     2.738
    RCAR2P     765.22       82.614                5.938    −3.870
    RCAR2S     768.29       82.617                5.937     3.848
    RCAR3P     1028.45      70.144                5.848    −4.101
    RCAR3S     1025.32      70.127                5.848     4.101
    RCAR4P     1247.74      54.774                5.821    −4.146
    RCAR4S     1249.13      54.774                5.821     4.146
    RCAR5P     1213.90      38.047                5.900    −4.045
    RCAR5S     1210.20      38.047                5.900     4.045
    RWBT1P     318.58       95.052                4.595    −4.901
    RWBT1S     328.78       95.052                4.467     4.743
    RWBT2P     258.15       83.259                3.399    −6.606
    RWBT2S     268.06       83.254                3.298     6.363
    RWBT3P     304.71       70.608                2.983    −6.731
    RWBT3S     317.27       70.604                2.892     6.468
    RWBT4P     352.41       55.100                2.961    −6.682
    RWBT4S     367.35       55.100                2.868     6.412
    RWBT5P     246.90       41.029                2.984    −6.733
    RWBT5S     256.82       41.032                2.894     6.473
    RWBT6P     228.11       30.246                3.275    −6.526
    RWBT6S     237.38       30.318                3.146     6.301
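
As an illustration of how the light ship data (Table 1) and the tank data (Table 2) enter the loading problem, the combined longitudinal centre of gravity of the loaded ship is a weight-weighted average of the light ship and the tank contents. The sketch below uses the full-tank LCG values from Table 2 for simplicity (Table 3 gives the fill-dependent values); the light ship weight, cargo and ballast densities, and the chosen fill levels are illustrative assumptions only.

    # Combined LCG from light ship (Table 1) and a few tanks (Table 2).
    # The light ship weight and the densities are assumptions for this example.
    TANKS = {                      # tank id: (volume m3, LCG from midship m)
        "RCAR1P": (548.24, 93.648),
        "RCAR4P": (1247.74, 54.774),
        "RWBT6S": (237.38, 30.318),
    }

    def combined_lcg(fills, light_weight=3000.0, light_lcg=44.999,
                     cargo_density=0.85, ballast_density=1.025):
        """Weight-weighted average LCG of light ship plus partially filled tanks.

        fills maps tank id -> fill fraction (0..1). Uses full-tank LCGs, so the
        shift of a tank's own centre with fill level is neglected here.
        """
        total_weight = light_weight
        total_moment = light_weight * light_lcg
        for tank, frac in fills.items():
            volume, lcg = TANKS[tank]
            rho = ballast_density if tank.startswith("RWBT") else cargo_density
            weight = frac * volume * rho
            total_weight += weight
            total_moment += weight * lcg
        return total_moment / total_weight

    print(combined_lcg({"RCAR1P": 0.9, "RCAR4P": 0.5, "RWBT6S": 1.0}))
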

    Table 3. Summary of partial tank capacities of RCAR1P

    Fill level/%   Volume/m3   LCG from midship/m   VCG/m   TCG from centerline/m   Moment of inertia/m4
    0.0            0.00        94.171               1.301   −0.587                  3.93
    0.6            3.26        93.199               1.352   −1.547                  43.54
    1.2            6.84        93.197               1.404   −1.613                  49.25
    1.9            10.63       93.215               1.456   −1.664                  56.27
    2.7            14.62       93.235               1.509   −1.712                  63.87
    3.4            18.82       93.256               1.563   −1.760                  72.08
    4.2            23.22       93.276               1.617   −1.808                  80.93
    5.1            27.82       93.295               1.672   −1.856                  90.02
    10.8           59.47       93.409               2.012   −2.136                  143.87
    20.3           111.23      93.489               2.496   −2.340                  152.18
    29.9           164.08      93.523               2.965   −2.432                  160.86
    39.8           218.04      93.543               3.432   −2.493                  169.91
    49.8           273.09      93.558               3.899   −2.541                  179.35
    60.1           329.25      93.571               4.369   −2.582                  190.00
    70.6           387.19      93.585               4.845   −2.624                  208.36
    80.4           440.56      93.601               5.276   −2.665                  226.04
    90.3           494.91      93.620               5.707   −2.702                  217.93
    97.7           535.38      93.638               6.024   −2.722                  207.65
    98.8           541.70      93.642               6.073   −2.722                  120.97
    99.7           546.49      93.646               6.111   −2.717                  42.29
    100.0          548.24      93.648               6.124   −2.713                  21.11
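
Capacity tables such as Table 3 are normally queried at arbitrary intermediate fill levels. A minimal sketch, assuming simple linear interpolation between the tabulated rows (the paper does not state which interpolation scheme it uses), is given below with a few rows of Table 3 hard-coded.

    import bisect

    # (fill level %, volume m3, VCG m) rows taken from Table 3 for RCAR1P.
    ROWS = [(0.0, 0.00, 1.301), (49.8, 273.09, 3.899),
            (90.3, 494.91, 5.707), (100.0, 548.24, 6.124)]

    def lookup(fill_pct):
        """Linearly interpolate volume and VCG at an arbitrary fill level."""
        fills = [row[0] for row in ROWS]
        i = bisect.bisect_left(fills, fill_pct)
        if i == 0:
            return ROWS[0][1:]
        if i == len(ROWS):
            return ROWS[-1][1:]
        (f0, v0, z0), (f1, v1, z1) = ROWS[i - 1], ROWS[i]
        t = (fill_pct - f0) / (f1 - f0)
        return v0 + t * (v1 - v0), z0 + t * (z1 - z0)

    print(lookup(75.0))   # volume and VCG between the 49.8 % and 90.3 % rows
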

    Table 4. Model configuration parameters and hyper-parameters

    Parameter                  Value
    batch_size                 2048
    buffer_size                20480
    learning_rate              0.001
    learning_rate_schedule     linear
    beta                       0.0015
    epsilon                    0.18
    lambd                      0.90
    num_epoch                  5
    hidden_units               256
    num_layers                 3
    gamma                      0.99
    strength                   1.0
    time_horizon               256
    max_steps                  1.5×10^7
    Note: detailed descriptions and admissible ranges of these parameters are given at https://unity-technologies.github.io/ml-agents/Training-Configuration-File/
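
The entries of Table 4 correspond to fields of an ML-Agents trainer configuration file (see the URL in the note). A minimal sketch of assembling them into such a YAML file from Python is shown below; the behavior name "TankLoading" is a placeholder, and the exact nesting should be checked against the linked documentation.

    import yaml  # PyYAML

    # Hyper-parameters from Table 4 arranged in the general shape of an
    # ML-Agents PPO trainer config. "TankLoading" is a placeholder name.
    config = {
        "behaviors": {
            "TankLoading": {
                "trainer_type": "ppo",
                "hyperparameters": {
                    "batch_size": 2048, "buffer_size": 20480,
                    "learning_rate": 0.001, "learning_rate_schedule": "linear",
                    "beta": 0.0015, "epsilon": 0.18, "lambd": 0.90, "num_epoch": 5,
                },
                "network_settings": {"hidden_units": 256, "num_layers": 3},
                "reward_signals": {"extrinsic": {"gamma": 0.99, "strength": 1.0}},
                "time_horizon": 256,
                "max_steps": 15000000,
            }
        }
    }

    with open("tank_loading.yaml", "w") as f:
        yaml.safe_dump(config, f, sort_keys=False)
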

    Table 5. Statistics of actual loaded weight for the two examples

                 Target weight/t   Actual weight/t   Relative error/%
    Example 1    5180              5238.946          1.125
    Example 2    3950              3955.778          0.146
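
The relative errors in Table 5 are reproduced when the weight difference is taken relative to the actual loaded weight, as the quick check below shows.

    # Quick check of the relative errors in Table 5 (difference divided by the
    # actual loaded weight reproduces the tabulated percentages).
    for target, actual in [(5180.0, 5238.946), (3950.0, 3955.778)]:
        rel_err = abs(actual - target) / actual * 100
        print(f"target {target} t, actual {actual} t, relative error {rel_err:.3f} %")
    # prints 1.125 % and 0.146 %
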

    Table 6. Difference of trim value between adjacent steps for the two examples

                 Number of steps   Mean of the differences   Standard deviation of the differences
    Example 1    76                0.037                     0.002
    Example 2    69                0.015                     0.007
Publication history
  • Received:  2023-07-26
  • Revised:  2023-11-13
  • Published online:  2023-11-17
