Reinforcement learning-based autonomous generation of loading schemes for product oil tankers
Abstract:
Objectives This paper studies the autonomous generation of loading schemes for the liquid cargo tanks of a product oil tanker based on reinforcement learning.
Methods Taking the cargo capacity of a product oil tanker in actual operation as the input and the loading rates of the cargo tanks and ballast tanks as the targets, an agent and its environment are built with Unity ML-Agents, and the agent is trained with the PyTorch framework. A reward function is proposed that jointly accounts for the loading time and the amplitude of trim changes, and example analyses are carried out to verify the effectiveness of the proposed method.
Results The trained agent learns good strategies and achieves autonomous generation of liquid cargo tank loading schemes.
Conclusions Applying reinforcement learning to the autonomous generation of liquid cargo tank loading schemes under multi-objective conditions is reasonable and feasible.
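The reward described above balances total loading time against the amplitude of trim changes between steps, but its exact form is not given in the abstract. The following is a minimal illustrative sketch in Python, assuming a simple weighted penalty; the weights and the function name are hypothetical, not the paper's actual formula.

```python
# Illustrative only: one possible shape for a reward that trades off loading
# time against trim stability. The weights w_time and w_trim are assumptions.
def step_reward(step_index, trim_prev, trim_curr, w_time=0.01, w_trim=1.0):
    """Penalize elapsed loading steps and large trim changes between steps."""
    time_penalty = w_time * step_index                    # longer loading -> lower reward
    trim_penalty = w_trim * abs(trim_curr - trim_prev)    # abrupt trim change -> lower reward
    return -(time_penalty + trim_penalty)
```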
Algorithm: pseudocode for policy learning
Input: training tasks $O(T)$, set of maximum iteration counts $K$, number of observed trajectories $N$, hyperparameters of the training model
Output: policy $\pi_\theta$
Randomly initialize the parameters $\theta$
while the maximum number of iterations has not been reached do
    foreach $K_i, O_i(T)$ in $(K, O(T))$ do
        while $k \le K_i$ do
            Sample a batch of tasks $T \in O_i(T)$
            for all $T$ do
                Sample $N$ trajectories according to policy $\pi_\theta$
                Compute the gradient of the loss function with respect to $\theta$
            end
            Update $\theta$ with the computed gradients
        end
    end
end
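The pseudocode above is agnostic to the particular policy-gradient algorithm. As a rough illustration of its structure, a minimal PyTorch sketch with a REINFORCE-style update is given below; the placeholder environment, network sizes, and sampling constants are assumptions, not the ML-Agents trainer actually used in the paper.

```python
import torch
import torch.nn as nn

class DummyTankEnv:
    """Placeholder environment with an 8-dim observation and 4 discrete actions."""
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
        return [0.0] * 8
    def step(self, action):
        self.t += 1
        done = self.t >= 20
        return [0.0] * 8, -0.1, done            # observation, reward, done

policy = nn.Sequential(nn.Linear(8, 256), nn.ReLU(), nn.Linear(256, 4))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
N, MAX_ITERATIONS = 4, 10                       # trajectories per task, outer iterations

def sample_trajectory(env, policy):
    """Roll out one episode; return per-step log-probabilities and the return."""
    log_probs, total_reward = [], 0.0
    obs, done = env.reset(), False
    while not done:
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        obs, reward, done = env.step(action.item())
        log_probs.append(dist.log_prob(action))
        total_reward += reward
    return torch.stack(log_probs), total_reward

for _ in range(MAX_ITERATIONS):                 # "while not at max iterations"
    tasks = [DummyTankEnv() for _ in range(3)]  # "sample a batch of tasks T"
    losses = []
    for task in tasks:                          # "for all T"
        for _ in range(N):                      # "sample N trajectories under pi_theta"
            log_probs, ret = sample_trajectory(task, policy)
            losses.append(-log_probs.sum() * ret)   # policy-gradient surrogate loss
    optimizer.zero_grad()
    torch.stack(losses).mean().backward()       # gradient of the loss w.r.t. theta
    optimizer.step()                            # "update theta with the computed gradients"
```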
Table 1. Center of gravity of the light ship
Parameter                  Value
LCG from midship/m         44.999
VCG/m                      8.017
TCG from centerline/m      0
Table 2. Tank parameters
Tank ID   Volume/m³   LCG from midship/m   VCG/m   TCG from centerline/m
RCAR1P    548.24      93.648               6.124   −2.713
RCAR1S    542.59      93.647               6.127   2.738
RCAR2P    765.22      82.614               5.938   −3.870
RCAR2S    768.29      82.617               5.937   3.848
RCAR3P    1028.45     70.144               5.848   −4.101
RCAR3S    1025.32     70.127               5.848   4.101
RCAR4P    1247.74     54.774               5.821   −4.146
RCAR4S    1249.13     54.774               5.821   4.146
RCAR5P    1213.90     38.047               5.900   −4.045
RCAR5S    1210.20     38.047               5.900   4.045
RWBT1P    318.58      95.052               4.595   −4.901
RWBT1S    328.78      95.052               4.467   4.743
RWBT2P    258.15      83.259               3.399   −6.606
RWBT2S    268.06      83.254               3.298   6.363
RWBT3P    304.71      70.608               2.983   −6.731
RWBT3S    317.27      70.604               2.892   6.468
RWBT4P    352.41      55.100               2.961   −6.682
RWBT4S    367.35      55.100               2.868   6.412
RWBT5P    246.90      41.029               2.984   −6.733
RWBT5S    256.82      41.032               2.894   6.473
RWBT6P    228.11      30.246               3.275   −6.526
RWBT6S    237.38      30.318               3.146   6.301
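The tank coordinates above, together with the light-ship center of gravity in Table 1, allow the floating position of a candidate loading state to be evaluated: the combined center of gravity is the weight-weighted average of the individual centers. The sketch below illustrates this calculation; the cargo density and the choice of filled tanks are assumptions for illustration only.

```python
# Weight-weighted combined center of gravity from tank entries such as those
# in Table 2. The cargo weights below are made up for illustration (a cargo
# density of 0.85 t/m3 is an assumption, not a value from the paper).
def combined_cg(items):
    """items: iterable of (weight_t, lcg_m, vcg_m, tcg_m) tuples."""
    total_weight = sum(w for w, _, _, _ in items)
    lcg = sum(w * x for w, x, _, _ in items) / total_weight
    vcg = sum(w * z for w, _, z, _ in items) / total_weight
    tcg = sum(w * y for w, _, _, y in items) / total_weight
    return total_weight, lcg, vcg, tcg

density = 0.85                                    # t/m3, assumed cargo density
cargo = [
    (548.24 * density, 93.648, 6.124, -2.713),    # RCAR1P, full (Table 2 row)
    (542.59 * density, 93.647, 6.127, 2.738),     # RCAR1S, full (Table 2 row)
]
print(combined_cg(cargo))
```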
Table 3. Summary of partial tank capacities of RCAR1P
Filling rate/%   Volume/m³   LCG from midship/m   VCG/m   TCG from centerline/m   Moment of inertia/m⁴
0.0              0.00        94.171               1.301   −0.587                  3.93
0.6              3.26        93.199               1.352   −1.547                  43.54
1.2              6.84        93.197               1.404   −1.613                  49.25
1.9              10.63       93.215               1.456   −1.664                  56.27
2.7              14.62       93.235               1.509   −1.712                  63.87
3.4              18.82       93.256               1.563   −1.760                  72.08
4.2              23.22       93.276               1.617   −1.808                  80.93
5.1              27.82       93.295               1.672   −1.856                  90.02
10.8             59.47       93.409               2.012   −2.136                  143.87
20.3             111.23      93.489               2.496   −2.340                  152.18
29.9             164.08      93.523               2.965   −2.432                  160.86
39.8             218.04      93.543               3.432   −2.493                  169.91
49.8             273.09      93.558               3.899   −2.541                  179.35
60.1             329.25      93.571               4.369   −2.582                  190.00
70.6             387.19      93.585               4.845   −2.624                  208.36
80.4             440.56      93.601               5.276   −2.665                  226.04
90.3             494.91      93.620               5.707   −2.702                  217.93
97.7             535.38      93.638               6.024   −2.722                  207.65
98.8             541.70      93.642               6.073   −2.722                  120.97
99.7             546.49      93.646               6.111   −2.717                  42.29
100.0            548.24      93.648               6.124   −2.713                  21.11
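Between the tabulated filling rates, intermediate tank states are typically obtained by linear interpolation in the capacity table. The following is a minimal sketch using a few of the RCAR1P rows above; the interpolation scheme is an assumption, as the paper does not specify how intermediate states are evaluated.

```python
from bisect import bisect_left

# Linear interpolation in the RCAR1P capacity table (Table 3): given a
# filling rate, look up volume and center-of-gravity values. Only a few
# sample rows are reproduced here for illustration.
RCAR1P_TABLE = [
    # fill_%, volume_m3, lcg_m,  vcg_m, tcg_m
    (10.8,     59.47,    93.409, 2.012, -2.136),
    (20.3,    111.23,    93.489, 2.496, -2.340),
    (29.9,    164.08,    93.523, 2.965, -2.432),
    (39.8,    218.04,    93.543, 3.432, -2.493),
]

def interpolate(fill_rate, table=RCAR1P_TABLE):
    """Linearly interpolate all columns of the capacity table at fill_rate."""
    fills = [row[0] for row in table]
    i = bisect_left(fills, fill_rate)
    if i == 0:
        return table[0]
    if i == len(table):
        return table[-1]
    lo, hi = table[i - 1], table[i]
    frac = (fill_rate - lo[0]) / (hi[0] - lo[0])
    return tuple(a + frac * (b - a) for a, b in zip(lo, hi))

print(interpolate(15.0))   # state roughly halfway between the 10.8% and 20.3% rows
```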
Table 4. Model configuration parameters and hyper-parameters
Parameter                 Value
batch_size                2 048
buffer_size               20 480
learning_rate             0.001
learning_rate_schedule    linear
beta                      0.0015
epsilon                   0.18
lambd                     0.90
num_epoch                 5
hidden_units              256
num_layers                3
gamma                     0.99
strength                  1.0
time_horizon              256
max_steps                 1.5×10⁷
Note: for detailed descriptions and value ranges of these parameters, see https://unity-technologies.github.io/ml-agents/Training-Configuration-File/
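The names beta, epsilon, lambd, and num_epoch correspond to PPO hyperparameters in ML-Agents, so the values in Table 4 would normally be supplied to the trainer through a YAML configuration file. The sketch below shows one possible layout; the behavior name, file name, and PPO assumption are not taken from the paper, and the exact nesting should be checked against the documentation linked in the note above.

```python
import yaml  # PyYAML; used only to write the values in ML-Agents' YAML layout

# How the Table 4 values might be arranged in an ML-Agents trainer
# configuration file. The behavior name "TankLoading" is hypothetical.
config = {
    "behaviors": {
        "TankLoading": {
            "trainer_type": "ppo",
            "hyperparameters": {
                "batch_size": 2048,
                "buffer_size": 20480,
                "learning_rate": 0.001,
                "learning_rate_schedule": "linear",
                "beta": 0.0015,
                "epsilon": 0.18,
                "lambd": 0.90,
                "num_epoch": 5,
            },
            "network_settings": {"hidden_units": 256, "num_layers": 3},
            "reward_signals": {"extrinsic": {"gamma": 0.99, "strength": 1.0}},
            "time_horizon": 256,
            "max_steps": 15000000,
        }
    }
}

with open("tank_loading_config.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
```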
Table 5. Statistics of actual loaded weight for the two examples
            Target weight/t   Actual weight/t   Relative error/%
Example 1   5 180             5238.946          1.125
Example 2   3 950             3955.778          0.146
Table 6. Trim differences between adjacent steps for the two examples
            Number of steps   Mean of differences   Std. dev. of differences
Example 1   76                0.037                 0.002
Example 2   69                0.015                 0.007
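The statistics in Tables 5 and 6 can be reproduced directly from the raw loading results. The sketch below recomputes the relative weight error (the published figures are consistent with the error being referenced to the actual loaded weight) and the mean and standard deviation of the trim differences between adjacent steps; the example trim series is invented for illustration.

```python
from statistics import mean, pstdev

# Reproducing Table 5 / Table 6 style statistics from raw values. Only the
# target and actual weights are taken from Table 5; the per-step trim series
# below is made up for illustration.
def relative_error_percent(target, actual):
    """Relative error of the achieved loading weight, in percent (the Table 5
    figures are consistent with referencing the actual loaded weight)."""
    return abs(actual - target) / actual * 100.0

print(round(relative_error_percent(5180.0, 5238.946), 3))   # ~1.125 (example 1)
print(round(relative_error_percent(3950.0, 3955.778), 3))   # ~0.146 (example 2)

def trim_step_stats(trims):
    """Mean and standard deviation of trim differences between adjacent steps."""
    diffs = [abs(b - a) for a, b in zip(trims, trims[1:])]
    return mean(diffs), pstdev(diffs)

print(trim_step_stats([0.00, 0.03, 0.07, 0.10, 0.14]))       # illustrative series
```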