﻿ 基于蒙特卡罗树搜索的多载具自动化存取系统优化算法
 舰船科学技术  2022, Vol. 44 Issue (8): 169-173    DOI: 10.3404/j.issn.1672-7649.2022.08.036 PDF

1. 中国船舶集团有限公司第七一三研究所, 河南 郑州 450015;
2. 郑州大学 信息工程学院, 河南 郑州 450001

Research on multi vehicle automatic access system optimization algorithm based on Monte Carlo tree search
CHEN Jian-xin1, NING Meng1, HUANG Yu-luo1, Zhang Lei1, ZHAO Xin-can2
1. The 713 Research Institute of CSSC, Zhengzhou 450015, China;
2. School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China
Abstract: To optimize the distribution of the multi vehicle automatic access system storage space, the operation characteristics of a multi vehicle automatic access system are analyzed, the Markov decision process model of the multi vehicle automatic access system optimization problem is established, and an improved Monte Carlo tree search algorithm is proposed to solve the model. Firstly, the cargo location optimization model is established aiming at the minimum total handling capacity and the distance between the same type of containers. Then, in order to better control the rationality of Monte Carlo tree search branch, the node selection part of the algorithm is optimized. Finally, the improved Monte Carlo tree search algorithm is optimized and tested. The experimental results show that the improved Monte Carlo tree search algorithm is better than the greedy algorithm, the cube reduction algorithm and the traditional Monte Carlo tree search algorithm.
Key words: multi vehicle automatic access system     Markov model     Monte Carlo tree search
0 引　言

1 问题描述及建模 1.1 问题描述

 图 1 某型多载具自动化存取系统一种初始状态 Fig. 1 An initial state of the multi vehicle automatic access system

 图 2 某型多载具自动化存取系统使用后的某一可能状态 Fig. 2 An used state of the multi vehicle automatic access system

1.2 模型建立

1）状态空间

2）动作空间

3）奖励函数

 $\overline {{c_{t{K_i}}}} \left[ {x,y} \right] = \frac{1}{{{K_i}}}\mathop \sum \limits_{j = 1}^{{K_{in}}} \left[ {{x_{t{K_{ij}}}},{y_{t{K_{ij}}}}} \right] 。$ (1)

 $d = \mathop \sum \limits_{i = 1}^{{K_I}} \mathop \sum \limits_{j = 1}^{{K_{in}}} \sqrt {{{\left[ {{x_{t{K_{ij}}}} - \overline {{c_{t{K_i}}}} \left( x \right)} \right]}^2} + {{\left[ {{y_{t{K_{ij}}}} - \overline {{c_{t{K_i}}}} \left( y \right)} \right]}^2}} 。$ (2)

 $C = \frac{1}{n}\sum\limits_{i = 1}^n {\overline {{c_{t{K_i}}}} \left[ {x,y} \right]} 。$ (3)

 $D = \mathop \sum \limits_{i = 1}^n \sqrt {{{\left[ {\overline {{c_{t{K_i}}}} \left( x \right) - C(x)} \right]}^2} + {{\left[ {\overline {{c_{t{K_i}}}} \left( y \right) - C(y)} \right]}^2}} 。$ (4)

 $\min {F_1} = \frac{d}{D} 。$ (5)

 $\mathop \sum \limits_{n = 1}^{{N_r}} {f_h}({x_{{K_{ijt}}}},{x_{{K_{ij\left( {t - 1} \right)}}}}) 。$ (6)

 ${\eta _2}{f_v}({y_{{K_{ijt}}}},{y_{{K_{ij\left( {t - 1} \right)}}}}) + {\eta _1}{f_h}({x_{{K_{ijt}}}},{x_{{K_{ij\left( {t - 1} \right)}}}}) ，$ (7)

 {{{Q}}_{S{{i}}}} = \left\{ {\begin{aligned} &{\mathop \sum \limits_{r = 1}^2 \mathop \sum \limits_{n = 1}^{{N_r}} {\eta _1}{f_h}({x_{{K_{ijt}}}},{x_{{K_{ij\left( {t - 1} \right)}}}})} ，\\ &{{\eta _2}{f_v}({y_{{K_{ijt}}}},{y_{{K_{ij\left( {t - 1} \right)}}}}) + {\eta _1}{f_h}({x_{{K_{ijt}}}},{x_{{K_{ij\left( {t - 1} \right)}}}})} 。\end{aligned}} \right. (8)

 $\min {F_2} = T{{\text{Q}}_{S{{i}}}} 。$ (9)

 $R = {\omega _1}{F_{\text{1}}} + {\omega _2}{F_2}。$ (10)

 {{reward}} = \left\{ \begin{aligned} &1\quad&{\min {R_F} - \min {R_L} > 0} ，\\ &{ - 1}\quad&{{\text{min}}{R_F} - \min {R_L} \leqslant 0} 。\end{aligned} \right. (11)

4）剪枝优化

 $\left\{ {{S_t},a,R_{t + 1}^h,S_{t + 1}^h,A_{t + 1}^h, \cdots ,S_T^h} \right\}_{h = 1}^H\text{～}{M_{{v}}},{\text{π}} 。$ (12)

2 改进型MCTS算法设计

MCTS算法属于强化学习的一种，对路径规划、序贯决策问题具有良好适应性[11]，且能够在拥有较少专业知识的情况下仍能够有较好收敛效果，本文基于自动化存取系统货位优化问题提出兼顾节点收益的蒙特卡洛树搜索的货位优化算法。求解货位优化问题的总体思路是将货位优化过程看作成一个模拟随机货位运动的过程，以当前状态得分作为货位优化效果。然后依据数学模型中的约束关系制定规则，以模型中定义的收益最大化为优化目标，整个搜索过程包括选择、扩展、模拟和更新４个阶段。

MCTS算法每次迭代均通过树策略选择下一个节点进行扩展，不断从扩展节点开始，递归地选择得分最高的货位优化最终状态（循环时间到时停止）。在MCTS中通过树的方法遍历所有可能动作空间，其中，选择步骤被称为树策略，该策略统计分数由上限界限信心（upper confidence bound，UCB）方程给出：

 ${V_{UCB}} = \frac{{Q\left( {{v_i}} \right)}}{{N\left( {{v_i}} \right)}} + c\sqrt {\frac{{{\text{ln}}\left( {N\left( v \right)} \right)}}{{N\left( {{v_i}} \right)}}} 。$ (13)

 $\begin{split}{V_{UCB}}^{1} = &0.5 \times Q{\left( {{v_i}} \right)^{\max }} + 0.5 \times Q\left( {{v_i}} \right)_{_{random}}^{90\% } +\\ &\frac{{Q\left( {{v_i}} \right)}}{{N\left( {{v_i}} \right)}} + c\sqrt {\frac{{{\text{ln}}\left( {N\left( v \right)} \right)}}{{N\left( {{v_i}} \right)}}} 。\end{split}$ (14)

 图 3 改进的蒙特卡罗树搜索算法流程图 Fig. 3 A flowchart of the improved Monte Carlo tree search algorithm

3 仿真结果及分析

 图 4 模拟器下自动化存取系统货位初始状态 Fig. 4 The initial state of the multi vehicle automatic access system space under the simulator

 图 5 自动化存取系统货位优化过程及结果 Fig. 5 The optimization process and results of the storage space of the multi vehicle automatic access system

4 结　语

 [1] 李越. 集装箱装载配置优化算法研究[D]. 上海: 上海交通大学, 2002. [2] 张延华, 姜雄文. 基于改进遗传算法的电气设备仓库库位优化[J/OL]. 控制工程: 1–9 [2022-04-06]. [3] 柏乐. 仓储货位优化及调度模型的研究与实现[D]. 北京: 北京邮电大学, 2020. [4] 邓爱民, 蔡佳, 毛浪. 基于时间的自动化存取系统货位优化模型研究[J]. 中国管理科学, 2013, 21(6): 107-112. DOI:10.3969/j.issn.1003-207X.2013.06.013 [5] YUE L, GUAN Z, HE C, et al. Slotting optimization of automated storage and retrieval system (AS/RS) for efficient delivery of parts in an assembly shop using genetic algorithm: a case study[J]. Iop Conference, 2017, 215: 012002. DOI:10.1088/1757-899X/215/1/012002 [6] GIU J D, JIANG Z Y, TANG M A, et al. Research on slotting optimization of AS/RS based on cyclical virus evolutionary genetic algorithm[J]. Journal of Lanzhou Jiaotong University, 2013. [7] WU J H, QIN T D, CHEN J, et al. Slotting optimization algorithm of the stereo warehouse[J]. Advanced Materials Research, 2012, 756-759: 1371-1376. [8] 张贵军, 姚俊, 周晓根, 等. 基于精英多策略的货位分配优化方法[J]. 计算机科学, 2018, 45(1): 273-279. DOI:10.11896/j.issn.1002-137X.2018.01.048 [9] 陈月婷, 何芳. 基于改进粒子群算法的自动化存取系统货位分配优化[J]. 计算机工程与应用, 2008, 44(11): 229-231,236. DOI:10.3778/j.issn.1002-8331.2008.11.068 [10] WALEDZIK K, MANDZIUK J. Applying hybrid monte carlo tree search methods to risk-awareproject scheduling problem[J]. Information Sciences, 2017. [11] 封佳祥, 江坤颐, 周彬, 等. 多任务约束条件下基于强化学习的水面无人艇路径规划算法[J]. 舰船科学技术, 2019, 41(12): 140-146. [12] PAN C H, WU M H. A study of storage assignment problem for an order picking line in a pick-and-pass warehousing system[J]. Computers & Industrial Engineering, 2009, 57(1): 261-268.