CN111829527B - A path planning method for unmanned ships based on deep reinforcement learning and considering marine environment elements - Google Patents

A path planning method for unmanned ships based on deep reinforcement learning and considering marine environment elements

Info

Publication number
CN111829527B
CN111829527B (application number CN202010717418.XA; also published as CN111829527A)
Authority
CN
China
Prior art keywords
unmanned ship
network
target
time
actual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010717418.XA
Other languages
Chinese (zh)
Other versions
CN111829527A (en)
Inventor
曾喆
杜沛
刘善伟
万剑华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Petroleum East China
Original Assignee
China University of Petroleum East China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Petroleum East China
Priority to CN202010717418.XA
Publication of CN111829527A
Application granted
Publication of CN111829527B
Legal status: Active
Anticipated expiration: (not listed)

Abstract

Translated from Chinese

The invention discloses an unmanned ship path planning method based on deep reinforcement learning that takes marine environment elements into account. The basic steps are: S1, interpolate the wind, wave and current data of the target sea area, and add obstacle, start point and end point information; S2, use a Bayesian network to evaluate the maximum wind, wave and current the unmanned ship can withstand; S3, reorganize the AIS data of the target sea area to train the network, obtaining an optimized experience pool and preliminary network parameters; S4, input the state feature vectors of the unmanned ship into the deep reinforcement learning module for algorithm iteration, update the network parameters, and output actions; S5, the unmanned ship moves for 15 s per iteration, and the data are updated when the accumulated time reaches 1 h; S6, when the unmanned ship reaches the target point, the iteration ends and the path is output. The invention fully considers the influence of marine environment elements on unmanned ship navigation, better matches the actual long-distance voyage of an unmanned ship, and allows the unmanned ship to obtain a high-quality safe path by considering both environment elements and obstacle information under severe sea conditions.


Description

Unmanned ship path planning method based on deep reinforcement learning and considering marine environment elements
Technical Field
The patent relates to the field of unmanned ship path planning, in particular to an unmanned ship path planning method based on deep reinforcement learning and considering marine environment elements.
Background
Relying on the development of artificial intelligence control technology, unmanned ships have achieved breakthroughs in many technical fields, gradually entered the public eye, begun to take on tasks such as ocean exploration and data acquisition, and are gradually expanding into the marine operations industry.
The presently published patents CN109657863A, CN109726866A and CN107289939A all provide good path planning methods in this field, but they only consider the influence of obstacles on unmanned ships in a general way. Each unmanned ship has a limit on the wind and waves it can bear, determined by factors such as its material, structure and draught; when affected by strong wind and waves in a real sea area, an unmanned ship is in danger of capsizing or overturning. Therefore, avoiding dangerous marine environment elements and areas with obstacles while sailing is extremely important for navigation safety, especially for marine transport unmanned ships.
The invention considers the influence of marine environment elements on unmanned ship navigation: the marine environment elements and the obstacle information around the unmanned ship are used as the feature input vector of deep reinforcement learning, and an attention matrix is used to highlight the elements that most influence the algorithm output at each moment. Compared with collision-avoidance reinforcement learning methods, the reward value here is not a fixed value but changes with the combined degree of influence of the marine environment elements and obstacles on the unmanned ship. The method better matches the actual situation of unmanned ship navigation, and a high-quality safe path can be obtained by considering environmental elements and obstacle information during the voyage.
Disclosure of Invention
(I) Objects of the invention
Aiming at the problem that marine environment elements are not considered in many currently proposed unmanned ship path planning methods, the invention provides an unmanned ship path planning method based on deep reinforcement learning that takes marine environment elements into account; it fully considers real marine environment elements and marine obstacles and combines a deep reinforcement learning method to plan a safe and efficient path for the unmanned ship.
(II) Technical scheme
In order to achieve the purpose, the technical scheme of the invention is as follows: an unmanned ship path planning method based on deep reinforcement learning and considering marine environment elements comprises the following specific steps:
(1) Interpolate the wind speed, current speed and wave height data of the target sea area at time t onto a 200 m × 200 m grid, and use $s_t$ to describe the characteristic state vector of the unmanned ship at time t, namely:

$$s_t = \left( v_t^{wind},\; h_t^{wave},\; v_t^{cur},\; d_t^{obs} \right)$$

where $v_t^{wind}$, $h_t^{wave}$ and $v_t^{cur}$ respectively represent the wind speed, wave height and current speed at the position of the unmanned ship at time t, and $d_t^{obs}$ is the distance between the unmanned ship and the obstacle at time t; $d_t^{obs} = \mathrm{NaN}$ indicates that the unmanned ship has not detected an obstacle;
(2) The capability of the unmanned ship to withstand wind, waves and current is evaluated by a Bayesian network. The inputs are the material, displacement, length, width and height of the unmanned ship, and the output is $\left( v_{\max}^{wind},\; h_{\max}^{wave},\; v_{\max}^{cur} \right)$, three parameters respectively representing the maximum wind speed, wave height and current speed the unmanned ship can bear, which are used to calculate the reward function;
(3) Initialize the deep reinforcement learning model, specifically comprising: two identical LSTM networks (serving as the target Q network and the actual Q network, respectively), a reward function model, a model experience pool, and an action output set;
(4) Retain the three attributes of coordinates, course and speed from the real AIS data of the target sea area, superpose the three marine environment element values and the obstacle information onto the AIS data according to time and position, and put the new AIS data as training samples into the deep reinforcement learning model for training to obtain an optimized experience pool and preliminary network parameters;
(5) Set the start and end coordinates of the unmanned ship's voyage, and input the state feature vector $s_t$ obtained by the unmanned ship at time t into the actual Q network and the reward function model respectively;

wherein: the actual Q network computes $Q_{actual}$ and finds the action corresponding to $Q_{actual}$ according to an ε-greedy strategy, then outputs it; the reward function model calculates the reward value $R_t$ of the current iteration; the target Q network randomly extracts n records from the experience pool and, combined with $R_t$, calculates $Q_{target}$; $Q_{actual}$ and $Q_{target}$ are used together to calculate the loss function, and the network parameters of the actual Q network are updated by gradient descent; when the number of iterations reaches a threshold α, all parameters of the actual Q network are copied to the target Q network;
(6) The unmanned ship moves for 15 seconds in each iteration; when the accumulated motion time reaches 1 h, the wind speed, ocean current, wave height and obstacle information of the sea area are updated to the current time;
(7) When the unmanned ship reaches the target point, the iteration ends and the safe path is output.
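Before the module-level details below, the outer loop of steps (5)-(7) can be summarized in a short sketch; every callable in it is a hypothetical stand-in for a module described in this disclosure, not an interface defined by the patent.

```python
# Minimal sketch of the outer planning loop of steps (5)-(7).
# All callables are injected stand-ins for the modules described in the text;
# none of these names come from the patent itself.
MOTION_SECONDS = 15       # motion time per iteration
UPDATE_SECONDS = 3600     # environment data refreshed every 1 h

def plan_path(start, goal, env, get_state, choose_action, move,
              update_environment, reached):
    position, elapsed, path = start, 0, [start]
    while not reached(position, goal):
        s_t = get_state(env, position)         # (wind, wave, current, obstacle dist.)
        action = choose_action(s_t)            # epsilon-greedy heading change
        position = move(position, action, MOTION_SECONDS)
        path.append(position)
        elapsed += MOTION_SECONDS
        if elapsed % UPDATE_SECONDS == 0:      # step (6): refresh env data hourly
            env = update_environment(env)
    return path                                # step (7): safe path output
```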
Specifically, the Bayesian network construction method in step (2) includes the following steps:
(2.1) The nodes of the Bayesian network for evaluating the unmanned ship include: material, displacement, length, width, height, wind resistance level, wave resistance level and current resistance level. The material, displacement, length, width and height are taken as bottom-layer nodes; the wind resistance level, wave resistance level and current resistance level are high-level nodes; the bottom-layer nodes are fully connected to the high-level nodes;
(2.2) Train the Bayesian network with unmanned ship structure data as samples to obtain the conditional probability table of each node;
(2.3) Input the unmanned ship information to be evaluated, including material, displacement, length, width and height; calculate the probability of each grade of the three high-level nodes according to the conditional probability tables, and output the maximum-probability grade as the final value;
(2.4) Map the wind speed, wave height and current speed resistance grades of the unmanned ship obtained from the Bayesian network to specific numerical values according to the corresponding sea state grades, as the values of $\left( v_{\max}^{wind},\; h_{\max}^{wave},\; v_{\max}^{cur} \right)$.
Specifically, for the reward function model described in step (3), the reward value $R_t$ is calculated as:

$$R_t = \mathrm{softmax}\!\left(\,\left|(\theta_{safe}-s_t)\cdot w_1\right|\,\right)\cdot\left((\theta_{safe}-s_t)\cdot w_2\right)^{T}$$

where $s_t$ is the characteristic state vector of the unmanned ship at time t, and $\theta_{safe}$ is the safety threshold vector containing four parameters, $\theta_{safe}=\left( v_{\max}^{wind},\; h_{\max}^{wave},\; v_{\max}^{cur},\; d^{sense} \right)$, in which $v_{\max}^{wind}$, $h_{\max}^{wave}$ and $v_{\max}^{cur}$ are the maximum wind speed, wave height and current speed the unmanned ship can bear, obtained in step (2), and $d^{sense}$ is the obstacle sensing range of the unmanned ship. The attention matrix $w_1$ of the reward function is a 4 × 4 upper-triangular constant square matrix; its diagonal elements $W_{ii}$ (i = 1, 2, 3, 4) correspond respectively to the degree of influence of wind speed, wave height, ocean current and obstacles on path planning, and its off-diagonal elements $W_{ij}$ represent the correlation between element i and element j. The function of this matrix is to bring marine environment element values of different orders of magnitude to the same order of magnitude for comparison, and it can highlight the key elements. $w_2$ is a 4 × 4 diagonal matrix; combined with the $(\theta_{safe}-s_t)$ part, it gives the final reward value $R_t$ its sign and at the same time enlarges the reward value to facilitate decision making.

The $\mathrm{softmax}(|(\theta_{safe}-s_t)\cdot w_1|)$ part calculates the coefficients of the reward function and is responsible for assigning a weight to each element value; the weights highlight the elements that are more important to the decision in each iteration, and the reward value drops rapidly when an element value suddenly increases or at the moment an obstacle is detected. When the unmanned ship senses no obstacle, the reward function guides the model to avoid high-wind-and-wave areas; when an obstacle is sensed, a collision avoidance action is taken immediately.
(III) Advantageous effects
The advantages of the invention are as follows:
1. Wind speed, wave height, ocean current speed and obstacle information are jointly used as the main references for unmanned ship path planning, making the planned path more feasible; during the calculation, the data are updated according to the running time of the unmanned ship, which ensures the reliability of the path planning result.
2. The designed reward function highlights the elements that are more important to the decision in each iteration, while taking into account the ship's detection capability and its capacity to withstand wind and wave impact; it gives rewards in safe areas, gives appropriate penalties in dangerous areas, and makes an avoidance decision immediately when an obstacle is detected, which improves the path planning efficiency of the method and optimizes the path planning result.
3. A method for evaluating the wind and wave resistance of an unmanned ship using a Bayesian network is provided, replacing the conventional practice of assigning wind and wave resistance grades from expert experience; it is more scientific and efficient.
Drawings
FIG. 1 is a flow chart of a method for planning a unmanned ship route based on deep reinforcement learning and considering marine environment elements
FIG. 2 is a schematic diagram of a Bayesian network for evaluating the capability of an unmanned ship in resisting wind, wave and current
FIG. 3 is a flow chart of a deep reinforcement learning algorithm used by the model
FIG. 4 is a schematic diagram of path planning under the influence of elements and obstacles in marine environment
Detailed Description
The invention will now be described more fully and clearly with reference to the accompanying drawings and examples:
FIG. 1 is a flowchart of the unmanned ship path planning method based on deep reinforcement learning and considering marine environment elements. By fully considering the material and structure of the unmanned ship as well as the strong wind, high waves, ocean currents and obstacles that may occur in the sea area, the method gives a reasonable solution for safely completing the unmanned ship's navigation task. The method mainly comprises two modules: the first is a Bayesian network evaluation module used to evaluate the wind and wave resistance of the unmanned ship, and the second is a deep reinforcement learning path planning module that considers marine environment elements. The reward function of deep reinforcement learning couples the two modules, so that the unmanned ship can make appropriate risk avoidance decisions according to its own material and structure. The method is suitable for planning the path of an unmanned ship executing a long-range mission.
Specifically, the method comprises the following steps:
(1) Obtain the wind speed, current speed and wave height data of the target sea area at time t, together with the forecast wind speed, ocean current and wave height data after time t, and interpolate them onto a 200 m × 200 m grid using the kriging interpolation method. The 200 m × 200 m grid size is chosen so that the unmanned ship acquires a new element value after at most three movements. The data are stored in a three-dimensional array whose three dimensions are longitude, latitude and time, with a data time interval of 1 h. At time t, $s_t$ describes the characteristic state vector of the unmanned ship, namely:

$$s_t = \left( v_t^{wind},\; h_t^{wave},\; v_t^{cur},\; d_t^{obs} \right)$$

where $v_t^{wind}$, $h_t^{wave}$ and $v_t^{cur}$ respectively represent the wind speed, wave height and current speed at the position of the unmanned ship at time t, and $d_t^{obs}$ is the distance between the unmanned ship and the obstacle at time t; $d_t^{obs} = \mathrm{NaN}$ indicates that the unmanned ship has not detected an obstacle;
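As an illustration of this interpolation step, the following sketch uses the pykrige package (an assumed choice; the patent only names kriging) to grid one field at one time slice and stack hourly slices into the longitude-latitude-time array described above; all coordinates and observations are synthetic.

```python
# Sketch: krige scattered wave-height observations onto a regular grid and
# stack hourly slices into a (lat, lon, time) array, as in step (1).
# pykrige is an assumed implementation choice, not named in the patent.
import numpy as np
from pykrige.ok import OrdinaryKriging

def grid_field(lons, lats, values, grid_lon, grid_lat):
    ok = OrdinaryKriging(lons, lats, values, variogram_model="spherical")
    field, _var = ok.execute("grid", grid_lon, grid_lat)   # kriged surface + variance
    return np.asarray(field)                               # (len(grid_lat), len(grid_lon))

# 200 m is roughly 0.0018 degrees of latitude, so the grid spacing is approximate.
grid_lon = np.arange(120.0, 120.09, 0.0018)
grid_lat = np.arange(35.0, 35.09, 0.0018)

rng = np.random.default_rng(0)
hourly_slices = []
for hour in range(3):                                      # three synthetic hourly fields
    lons = rng.uniform(120.0, 120.09, 30)                  # scattered observation points
    lats = rng.uniform(35.0, 35.09, 30)
    wave = 1.0 + 0.2 * hour + rng.normal(0, 0.1, 30)       # synthetic wave heights (m)
    hourly_slices.append(grid_field(lons, lats, wave, grid_lon, grid_lat))

wave_height = np.stack(hourly_slices, axis=-1)             # (lat, lon, time), 1 h step
```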
(2) The capability of the unmanned ship to withstand wind, waves and current is evaluated by a Bayesian network. The inputs are the material, displacement, length, width and height of the unmanned ship, and the output is $\left( v_{\max}^{wind},\; h_{\max}^{wave},\; v_{\max}^{cur} \right)$, three parameters respectively representing the maximum wind speed, wave height and current speed the unmanned ship can bear, which are used to calculate the reward function. The specific steps for constructing the Bayesian network are as follows:
(2.1) FIG. 2 shows the schematic diagram of the Bayesian network for evaluating the capability of the unmanned ship to withstand wind, waves and current. The nodes comprise: material, displacement, length, width, height, wind speed resistance grade, wave height resistance grade and current speed resistance grade. The material, displacement, length, width and height are taken as bottom-layer nodes; the wind speed resistance grade, wave height resistance grade and current speed resistance grade are high-level nodes; the bottom-layer nodes are fully connected to the high-level nodes;
(2.2) Train the Bayesian network with the unmanned ship structure information as samples; the data need to be discretized first. (Table 1, the unmanned ship structure table, appears as an image in the original document.) The data are put into the Bayesian network for training to obtain the conditional probability table of each node.
(2.3) Input the unmanned ship information to be evaluated, including material, displacement, length, width and height; calculate the probability of each grade of the three high-level nodes according to the conditional probability tables, and output the maximum-probability grade as the final value.
(2.4) Map the wind speed, wave height and current speed resistance grades of the unmanned ship obtained from the Bayesian network to specific numerical values according to the sea state grade table, as the values of $\left( v_{\max}^{wind},\; h_{\max}^{wave},\; v_{\max}^{cur} \right)$.
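As an illustration of steps (2.1)-(2.4), the sketch below builds such a fully connected two-layer discrete Bayesian network with the pgmpy library; pgmpy, the node names, the discretization levels and the tiny sample table are all assumptions standing in for Table 1, not details from the patent.

```python
# Sketch: fully connected two-layer Bayesian network for the three resistance
# grades. pgmpy and all node names/levels here are assumptions for illustration.
import pandas as pd
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

bottom = ["material", "displacement", "length", "width", "height"]
top = ["wind_grade", "wave_grade", "current_grade"]
model = BayesianNetwork([(b, t) for b in bottom for t in top])  # full bipartite links

# Tiny synthetic sample of discretized ship structures (stand-in for Table 1).
ship_df = pd.DataFrame({
    "material":      ["steel", "frp", "steel", "aluminium"],
    "displacement":  ["high", "low", "mid", "low"],
    "length":        ["long", "short", "mid", "short"],
    "width":         ["wide", "narrow", "mid", "narrow"],
    "height":        ["high", "low", "mid", "low"],
    "wind_grade":    [6, 3, 5, 4],
    "wave_grade":    [5, 2, 4, 3],
    "current_grade": [4, 2, 3, 3],
})
model.fit(ship_df, estimator=MaximumLikelihoodEstimator)        # one CPT per node

infer = VariableElimination(model)
grades = infer.map_query(variables=top,
                         evidence={"material": "steel", "displacement": "mid",
                                   "length": "mid", "width": "mid", "height": "mid"})
# The maximum-probability grades are then mapped to numeric maxima
# (v_max_wind, h_max_wave, v_max_cur) via the sea state grade table.
print(grades)
```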
(3) Initialize the deep reinforcement learning model, specifically comprising: two identical LSTM networks (serving as the target Q network and the actual Q network, respectively), a reward function model, a model experience pool and an action output set;
In particular, in the reward function model, the reward value $R_t$ is calculated as:

$$R_t = \mathrm{softmax}\!\left(\,\left|(\theta_{safe}-s_t)\cdot w_1\right|\,\right)\cdot\left((\theta_{safe}-s_t)\cdot w_2\right)^{T}$$

where $s_t$ is the characteristic state vector of the unmanned ship at time t, and $\theta_{safe}$ is the safety threshold vector containing four parameters, $\theta_{safe}=\left( v_{\max}^{wind},\; h_{\max}^{wave},\; v_{\max}^{cur},\; -d^{sense} \right)$, in which $v_{\max}^{wind}$, $h_{\max}^{wave}$ and $v_{\max}^{cur}$ are the maximum wind speed, wave height and current speed the unmanned ship can bear, as evaluated by the Bayesian network, and $d^{sense}$ is the collision avoidance sensing range of the unmanned ship, with a negative sign attached in front for convenience of calculation. The weight matrix $w_1$ is a 4 × 4 symmetric constant square matrix whose diagonal elements $W_{ii}$ (i = 1, 2, 3, 4) correspond respectively to the degree of influence of wind speed, wave height, ocean current and obstacles on path planning, and whose off-diagonal elements $W_{ij}$ represent the correlation between element i and element j; the matrix brings marine environment element values of different orders of magnitude to the same order of magnitude for comparison and can highlight the key elements. The specific values of $w_1$ are given empirically (shown as a matrix figure in the original document). $w_2$ is a 4 × 4 diagonal matrix that, combined with the $(\theta_{safe}-s_t)$ part, gives the final reward value $R_t$ its sign and at the same time enlarges the reward value to speed up decision making; the specific values of $w_2$ are likewise shown as a figure in the original document.
The $\mathrm{softmax}(|(\theta_{safe}-s_t)\cdot w_1|)$ part calculates the coefficients of the reward function and is responsible for assigning a weight to each characteristic state element; the weights highlight the elements that are more important to the decision in each iteration, and the reward value drops rapidly when an element value suddenly increases or at the moment an obstacle is detected. The $(\theta_{safe}-s_t)\cdot w_2$ part attaches a sign to the calculation result, indicating whether a reward or a punishment is given. The calculation of the reward function is illustrated below for two cases, without and with an obstacle:

When no obstacle is encountered:

Suppose the characteristic state vector $s_{t_n}$ at a certain time $t_n$ has $\mathrm{NaN}$ as its fourth component (the concrete values appear as a figure in the original document; NaN means the component does not participate in the calculation), and the safety threshold vector of the unmanned ship is $\theta_{safe} = [3, 1.5, 0.2, 500]$. Then $\mathrm{softmax}(|(\theta_{safe}-s_t)\cdot w_1|)$ evaluates to $[0.867, 0.117, 0.016, 0]$, which means that in this calculation the marine element "wind speed" needs attention; the $(\theta_{safe}-s_t)\cdot w_2$ part is dot-multiplied with the weights and attaches a sign to the result, indicating reward or punishment; the final result of -19.95 represents a punishment.
When an obstacle is encountered:

Suppose the characteristic state vector $s_{t_n}$ at a certain time $t_n$ has its fourth component equal to 50 m (the full vector appears as a figure in the original document), indicating that an obstacle is detected 50 m from the unmanned ship, i.e. the unmanned ship has just sensed the obstacle, and the safety threshold vector $\theta_{safe}$ is as given above. Then $\mathrm{softmax}(|(\theta_{safe}-s_t)\cdot w_1|)$ evaluates to $[0, 0, 0, 1]$, meaning that in this calculation avoiding the obstacle is the most important; the $(\theta_{safe}-s_t)\cdot w_2$ part is dot-multiplied with the weights and attaches a sign to the result; the final result of -200 represents a punishment.
Through the above calculations, the algorithm drives the unmanned ship through the reward function: it focuses on marine environment elements when no obstacle is detected, and reacts immediately when an obstacle is detected;
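For reference, the reward computation just illustrated can be written compactly; the sketch below is a minimal numpy version in which the w1 and w2 values are placeholders (the patent gives the real matrices only as figures), so its outputs are illustrative rather than the -19.95 and -200 of the examples above.

```python
# Sketch: R_t = softmax(|(theta_safe - s_t) @ w1|) . ((theta_safe - s_t) @ w2)^T
# w1/w2 values are placeholders; the patent supplies them only as figures.
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def reward(s_t, theta_safe, w1, w2):
    diff = theta_safe - s_t
    diff = np.where(np.isnan(diff), 0.0, diff)   # NaN components do not participate
    coeff = softmax(np.abs(diff @ w1))           # attention weights over the 4 elements
    return float(coeff @ (diff @ w2))            # signed, scaled reward

# Placeholder matrices (illustrative only; real values are empirical figures).
w1 = np.diag([1.0, 2.0, 15.0, 0.01])
w2 = np.diag([5.0, 5.0, 5.0, 4.0])
theta_safe = np.array([3.0, 1.5, 0.2, 500.0])
s_t = np.array([4.2, 1.0, 0.1, np.nan])          # strong wind, no obstacle detected
print(reward(s_t, theta_safe, w1, w2))           # negative result => punishment
```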
(4) Retain the three attributes of coordinates, course and speed from the real AIS data of the target sea area, and superpose the three marine environment element values and the obstacle information onto the AIS data according to time and position. (A sample of the new AIS data, Table 2, appears as a figure in the original document.) Put the newly organized AIS data as training samples into the deep reinforcement learning model for training to obtain an optimized experience pool and preliminary network parameters;
(5) Fix the unmanned ship's sailing speed at v = 10 m/s and select the discretized heading angle of the unmanned ship as the action output of the deep reinforcement learning. Considering the steering capability of the ship, the heading change range is limited to between 35° and -35° and discretized at equal intervals, i.e. the action set output by the model:
A={35,25,15,5,-5,-15,-25,-35}
(6) Referring to FIG. 3, the flowchart of the deep reinforcement learning algorithm: two identical LSTM networks serve as the actual Q network and the target Q network in the deep reinforcement learning framework, respectively. The state feature vector $s_t$ of the unmanned ship at time t is input into the actual Q network and the reward function model respectively. The LSTM input layer of the actual Q network at time t is the feature state vector $s_t$ together with the output $Q(s_{t-1})_{actual}$ of the actual Q network at the previous moment, and the output layer is the $Q(s_t)_{actual}$ value; an ε-greedy strategy is then used to select the action $a_t$ ($a_t \in A$) corresponding to the Q value;
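One way to realize this recurrent Q network and the ε-greedy choice is sketched below in PyTorch; the hidden size, the single LSTM layer and the carrying of the previous step's hidden state are assumptions, since the patent does not specify network dimensions.

```python
# Sketch: LSTM Q network over the 4-element state, epsilon-greedy action pick.
# Hidden size and layer count are assumptions; the patent gives no dimensions.
import random
import torch
import torch.nn as nn

ACTIONS = [35, 25, 15, 5, -5, -15, -25, -35]   # heading changes, degrees

class LstmQNet(nn.Module):
    def __init__(self, state_dim=4, hidden=64, n_actions=len(ACTIONS)):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, s_seq, hc=None):
        out, hc = self.lstm(s_seq, hc)       # carries last step's state forward
        return self.head(out[:, -1]), hc     # one Q value per action

def epsilon_greedy(qnet, s_t, hc, eps=0.1):
    q, hc = qnet(s_t.view(1, 1, -1), hc)
    if random.random() < eps:
        idx = random.randrange(len(ACTIONS))  # explore
    else:
        idx = int(q.argmax(dim=1))            # exploit: action with max Q
    return ACTIONS[idx], idx, hc
```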
(7) Calculate the reward value $R_t$ at time t, and store the feature state vector $s_t$ at time t, the action $a_t$, the feature state vector $s_t'$ after executing $a_t$, and the Boolean value is_end that determines whether the iteration has terminated, together as one record $rec_t = \{s_t, a_t, R_t, s_t', \mathrm{is\_end}\}$ in the experience pool D;
(8) Randomly extract n records $\{s_i, a_i, R_i, s_i', \mathrm{is\_end}_i\}$, $i = 1, 2, \ldots, n$, from the experience pool D and calculate the target Q value $Q_{target}$:

$$Q_{target,i} = \begin{cases} R_i, & \mathrm{is\_end}_i \text{ is true} \\ R_i + \gamma\, Q\!\left(s_i',\, a_{\max}(s_i', \omega);\, \omega'\right), & \text{otherwise} \end{cases}$$

where $R_i$ is the reward value of the i-th record, γ is the discount factor (γ = 0.9 in this example), ω is a parameter of the actual Q network, ω′ is a parameter of the target Q network, and $a_{\max}(s_i', \omega)$ is the action selected by feeding record i back into the actual Q network:

$$a_{\max}(s_i', \omega) = \arg\max_{a} Q\!\left(s_i', a;\, \omega\right)$$

where $s_i'$, $a$ and ω are respectively the state feature vector, the action and the network parameters for record i;
(9) Calculate the accumulated loss over the n records and update the parameters ω of the actual Q network by gradient descent, using the loss function:

$$L(\omega) = \frac{1}{n}\sum_{i=1}^{n}\left(Q_{target,i} - Q\!\left(s_i, a_i;\, \omega\right)\right)^{2}$$
(10) When the number of iterations of the actual Q network reaches the threshold α, the parameters ω of the actual Q network are copied in their entirety to the target Q network.
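Steps (8)-(10) together amount to a double-DQN update. A minimal sketch under these assumptions (the LstmQNet from the previous sketch, a list-based experience pool of (s, a_index, R, s', is_end) tuples, and an Adam optimizer) is:

```python
# Sketch: double-DQN update for steps (8)-(10). Uses the LstmQNet sketch above;
# replay records are (s, a_idx, r, s_next, is_end) tuples. Illustrative only.
import random
import torch
import torch.nn.functional as F

GAMMA, ALPHA_COPY = 0.9, 1000   # discount factor; parameter-copy threshold (alpha)

def train_step(actual_q, target_q, optimizer, pool, n=32, step=0):
    batch = random.sample(pool, n)                              # step (8): sample n records
    s   = torch.stack([b[0] for b in batch]).unsqueeze(1)       # (n, 1, 4)
    a   = torch.tensor([b[1] for b in batch])
    r   = torch.tensor([b[2] for b in batch])
    s2  = torch.stack([b[3] for b in batch]).unsqueeze(1)
    end = torch.tensor([b[4] for b in batch], dtype=torch.bool)

    with torch.no_grad():
        a_max = actual_q(s2)[0].argmax(dim=1)                   # action from actual net
        q_next = target_q(s2)[0].gather(1, a_max.unsqueeze(1)).squeeze(1)
        q_target = torch.where(end, r, r + GAMMA * q_next)      # target Q value

    q_actual = actual_q(s)[0].gather(1, a.unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(q_actual, q_target)                       # step (9): squared loss
    optimizer.zero_grad(); loss.backward(); optimizer.step()

    if step % ALPHA_COPY == 0:                                  # step (10): copy params
        target_q.load_state_dict(actual_q.state_dict())

# Usage (illustrative): optimizer = torch.optim.Adam(actual_q.parameters())
```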
(11) The unmanned ship moves for 15 seconds in each iteration; when the accumulated motion time reaches 1 h, the wind speed, ocean current, wave height and obstacle information of the sea area are updated to the current time;
(12) When the unmanned ship reaches the end point, the iteration ends and a safe path is output.
FIG. 4 is a schematic diagram of path planning under the influence of marine environment elements and obstacles; when planning a path, the method can avoid areas of high marine environmental risk as well as obstacles.
The above is an embodiment of the present invention; all changes made according to the technical scheme of the present invention that produce the same functional effects, without departing from the technical scheme of the present invention, belong to the protection scope of the present invention.

Claims (1)

Translated from Chinese

1. An unmanned ship path planning method based on deep reinforcement learning and taking marine environment elements into account, characterized by comprising the following steps:

(1) Interpolate the wind speed, current speed and wave height data of the target sea area at time t onto a 500 m × 500 m grid, and use $s_t$ to describe the characteristic state vector of the unmanned ship at time t, namely:

$$s_t = \left( v_t^{wind},\; h_t^{wave},\; v_t^{cur},\; d_t^{obs} \right)$$

where $v_t^{wind}$, $h_t^{wave}$ and $v_t^{cur}$ respectively represent the wind speed, wave height and current speed at the position of the unmanned ship at time t, and $d_t^{obs}$ is the distance between the unmanned ship and the obstacle at time t; $d_t^{obs} = \mathrm{NaN}$ indicates that the unmanned ship has not detected an obstacle;

(2) Use a Bayesian network to evaluate the capability of the unmanned ship to withstand wind, waves and current; the inputs are the material, displacement, length, width and height of the unmanned ship, and the output is the three parameters $\left( v_{\max}^{wind},\; h_{\max}^{wave},\; v_{\max}^{cur} \right)$, respectively representing the maximum wind speed, wave height and current speed the unmanned ship can bear;

1) Construct the Bayesian network nodes, including: material, displacement, length, width, height, wind resistance grade, wave resistance grade and current speed resistance grade; take material, displacement, length, width and height as bottom-layer nodes; the wind speed resistance grade, wave height resistance grade and current speed resistance grade are high-level nodes, and the bottom-layer nodes are fully connected to the high-level nodes;

2) Train the Bayesian network with unmanned ship structure data as samples to obtain the conditional probability table of each node;

3) Input the unmanned ship information to be evaluated, including material, displacement, length, width and height; calculate the probability of each grade of the three high-level nodes according to the conditional probability tables, and output the maximum-probability grade as the final value;

4) Map the wind speed, wave height and current speed resistance grades of the unmanned ship obtained through the Bayesian network to specific numerical values according to the corresponding sea state grades, as the values of $\left( v_{\max}^{wind},\; h_{\max}^{wave},\; v_{\max}^{cur} \right)$;

(3) Initialize the deep reinforcement learning model, specifically including: two identical LSTM networks serving as the target Q network and the actual Q network, a reward function model, a model experience pool and an action output set, where the reward value $R_t$ is calculated as:

$$R_t = \mathrm{softmax}\!\left(\,\left|(\theta_{safe}-s_t)\cdot w_1\right|\,\right)\cdot\left((\theta_{safe}-s_t)\cdot w_2\right)^{T}$$

where $s_t$ is the characteristic state vector of the unmanned ship at time t, and $\theta_{safe}$ is the safety threshold vector containing four parameters $\left( v_{\max}^{wind},\; h_{\max}^{wave},\; v_{\max}^{cur},\; d^{sense} \right)$, in which $v_{\max}^{wind}$, $h_{\max}^{wave}$ and $v_{\max}^{cur}$ are the maximum wind speed, wave height and current speed the unmanned ship can bear, obtained in step (2), and $d^{sense}$ is the collision avoidance range of the unmanned ship; the weight matrix $w_1$ is the attention matrix of the reward function, a 4 × 4 upper-triangular constant square matrix whose diagonal elements $W_{ii}$, i = 1, 2, 3, 4, correspond respectively to the degree of influence of wind speed, wave height, ocean current and obstacles on path planning, and whose off-diagonal elements $W_{ij}$ represent the correlation between element i and element j; the $\mathrm{softmax}(|(\theta_{safe}-s_t)\cdot w_1|)$ part calculates the coefficients of the reward function and is responsible for assigning a weight to each characteristic state element; the weights highlight the elements that are more important to the decision in each iteration, and the reward value drops rapidly when an element value suddenly increases or at the moment an obstacle is detected; when the unmanned ship senses no obstacle, the reward function guides the model to avoid high-wind-and-wave areas, and when an obstacle is sensed it makes a collision avoidance action immediately;

(4) Retain the three attributes of coordinates, course and speed from the real AIS data of the target sea area, and superpose the three marine environment element values and the obstacle information onto the AIS data according to time and position; put the new AIS data as training samples into the deep reinforcement learning model for training to obtain an optimized experience pool and preliminary network parameters;

(5) Set the start and end coordinates of the unmanned ship's voyage, and input the state feature vector $s_t$ obtained by the unmanned ship at time t into the actual Q network and the reward function model respectively;

wherein: the actual Q network computes $Q_{actual}$, finds the action corresponding to $Q_{actual}$ according to an ε-greedy strategy and outputs it; the reward function model calculates the reward value $R_t$ of the current iteration; the target Q network randomly extracts n records from the experience pool and, combined with $R_t$, calculates $Q_{target}$; $Q_{actual}$ and $Q_{target}$ are used together to calculate the loss function, and the network parameters of the actual Q network are updated by gradient descent; when the number of iterations reaches the threshold α, all parameters of the actual Q network are copied to the target Q network;

(6) The unmanned ship moves for 15 seconds in each iteration; when the accumulated motion time reaches 1 h, the wind speed, ocean current, wave height and obstacle information of the sea area are updated to the current time;

(7) When the unmanned ship reaches the target point, the iteration ends and the safe path is output.
CN202010717418.XA · Priority 2020-07-23 · Filed 2020-07-23 · A path planning method for unmanned ships based on deep reinforcement learning and considering marine environment elements · Active · CN111829527B (en)

Priority Applications (1)

Application Number · Priority Date · Filing Date · Title
CN202010717418.XA · 2020-07-23 · 2020-07-23 · A path planning method for unmanned ships based on deep reinforcement learning and considering marine environment elements

Applications Claiming Priority (1)

Application Number · Priority Date · Filing Date · Title
CN202010717418.XA · 2020-07-23 · 2020-07-23 · A path planning method for unmanned ships based on deep reinforcement learning and considering marine environment elements

Publications (2)

Publication Number · Publication Date
CN111829527A (en) · 2020-10-27
CN111829527B (en) · 2021-07-20

Family

Family ID: 72925135

Family Applications (1)

Application Number · Title · Priority Date · Filing Date
CN202010717418.XA (Active) · A path planning method for unmanned ships based on deep reinforcement learning and considering marine environment elements · 2020-07-23 · 2020-07-23

Country Status (1)

Country · Link
CN · CN111829527B (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party

Publication number · Priority date · Publication date · Assignee · Title
CN112180950B (en)* · 2020-11-05 · 2022-07-08 · 武汉理工大学 · Intelligent ship autonomous collision avoidance and path planning method based on reinforcement learning
CN112698646B (en)* · 2020-12-05 · 2022-09-13 · 西北工业大学 · Aircraft path planning method based on reinforcement learning
CN112580801B (en)* · 2020-12-09 · 2021-10-15 · 广州优策科技有限公司 · Reinforced learning training method and decision-making method based on reinforced learning
CN112800545B (en)* · 2021-01-28 · 2022-06-24 · 中国地质大学(武汉) · Unmanned ship self-adaptive path planning method, equipment and storage medium based on D3QN
CN112947431B (en)* · 2021-02-03 · 2023-06-06 · 海之韵(苏州)科技有限公司 · Unmanned ship path tracking method based on reinforcement learning
CN113176776B (en)* · 2021-03-03 · 2022-08-19 · 上海大学 · Unmanned ship weather self-adaptive obstacle avoidance method based on deep reinforcement learning
CN113297801B (en)* · 2021-06-15 · 2022-10-14 · 哈尔滨工程大学 · Marine environment element prediction method based on STEOF-LSTM
CN114371700B (en)* · 2021-12-15 · 2023-07-18 · 中国科学院深圳先进技术研究院 · A probabilistic filtering reinforcement learning unmanned ship control method, device and terminal equipment
WO2023108494A1 (en)* · 2021-12-15 · 2023-06-22 · 中国科学院深圳先进技术研究院 · Probability filtering reinforcement learning-based unmanned ship control method and apparatus, and terminal device
CN114721409B (en)* · 2022-06-08 · 2022-09-20 · 山东大学 · Underwater vehicle docking control method based on reinforcement learning
CN114942596B (en)* · 2022-07-26 · 2022-11-18 · 山脉科技股份有限公司 · Intelligent control system for urban flood control and drainage
CN115470934A (en)* · 2022-09-14 · 2022-12-13 · 天津大学 · Sequence model-based reinforcement learning path planning algorithm in marine environment
CN115493595A (en)* · 2022-09-28 · 2022-12-20 · 天津大学 · An AUV path planning method based on local perception and proximal optimization strategy
CN115855226B (en)* · 2023-02-24 · 2023-05-30 · 青岛科技大学 · Multi-AUV cooperative underwater data acquisition method based on DQN and matrix completion
CN117113227A (en)* · 2023-09-07 · 2023-11-24 · 天津光电通信技术有限公司 · Communication modulation recognition model training method and device
CN118972809B (en)* · 2024-10-14 · 2025-02-11 · 清华大学 · Communication method for ocean Internet of things buoy networking
CN120256821A (en)* · 2025-03-20 · 2025-07-04 · 国能神皖马鞍山发电有限责任公司 · A training method and device for ship unloader path generation model

Citations (6)

* Cited by examiner, † Cited by third party

Publication number · Priority date · Publication date · Assignee · Title
CN102278995A (en)* · 2011-04-27 · 2011-12-14 · 中国石油大学(华东) · Bayes path planning device and method based on GPS (Global Positioning System) detection
CN102788581A (en)* · 2012-07-17 · 2012-11-21 · 哈尔滨工程大学 · Ship route planning method based on modified differential evolution algorithm
CN109726866A (en)* · 2018-12-27 · 2019-05-07 · 浙江农林大学 · Path planning method for unmanned ship based on Q-learning neural network
CN110362089A (en)* · 2019-08-02 · 2019-10-22 · 大连海事大学 · Unmanned ship autonomous navigation method based on deep reinforcement learning and genetic algorithm
CN110514206A (en)* · 2019-08-02 · 2019-11-29 · 中国航空无线电电子研究所 · A kind of unmanned plane during flying path prediction technique based on deep learning
CN111338356A (en)* · 2020-04-07 · 2020-06-26 · 哈尔滨工程大学 · Improved distributed genetic algorithm for multi-objective unmanned ship collision avoidance path planning method

Family Cites Families (1)

* Cited by examiner, † Cited by third party

Publication number · Priority date · Publication date · Assignee · Title
KR102215520B1 (en)* · 2018-09-13 · 2021-02-15 · 주식회사 웨더아이 · Method and server for providing course information of vessel including coast weather information


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party

Siyu Guo et al., "An Autonomous Path Planning Model for Unmanned Ships Based on Deep Reinforcement Learning", Sensors, vol. 20, no. 2, 2020-01-11 (full text)*

Also Published As

Publication number · Publication date
CN111829527A (en) · 2020-10-27

Similar Documents

Publication · Title
CN111829527B (en) · A path planning method for unmanned ships based on deep reinforcement learning and considering marine environment elements
CN112650237B (en) · Ship path planning method and device based on cluster processing and artificial potential field
Tan et al. · Fast marching square method based intelligent navigation of the unmanned surface vehicle swarm in restricted waters
CN116952239A (en) · An unmanned boat path planning method based on the integration of improved A* and DWA
CN111273670B (en) · Unmanned ship collision prevention method for fast moving obstacle
CN112925319B (en) · A dynamic obstacle avoidance method for underwater autonomous vehicle based on deep reinforcement learning
CN111880549A (en) · Deep reinforcement learning reward function optimization method for unmanned ship path planning
CN114610046B (en) · A dynamic safety trajectory planning method for unmanned vehicles considering dynamic water depth
CN110906935A (en) · Unmanned ship path planning method
Wang et al. · Roboat III: An autonomous surface vessel for urban transportation
CN116954232A (en) · Multi-ship collision avoidance decision-making method and system for unmanned ships based on reinforcement learning
Su et al. · A constrained locking sweeping method and velocity obstacle based path planning algorithm for unmanned surface vehicles in complex maritime traffic scenarios
CN116700269A (en) · Unmanned ship path planning method and system considering environmental disturbance and multi-target constraint
Chaysri et al. · Unmanned surface vehicle navigation through generative adversarial imitation learning
Gao et al. · An optimized path planning method for container ships in Bohai bay based on improved deep Q-learning
CN117666557A (en) · An unmanned boat path planning algorithm for ocean current interference
CN116466701A (en) · Unmanned surface vessel energy consumption minimum track planning method
Zhang et al. · Path planning of USV in confined waters based on improved A* and DWA fusion algorithm
Zhang et al. · Five-tiered route planner for multi-AUV accessing fixed nodes in uncertain ocean environments
CN116300913A (en) · Unmanned ship multi-constraint local path planning method based on visual information
CN119717888A (en) · A collaborative path planning method for unmanned intelligent swarm across air and sea domains
CN119645026A (en) · Multi-ship collaborative target searching method and system for complex searching scene
CN115469651A (en) · A method for fully electric unmanned tugboats intelligently assisting the automatic berthing of large cargo ships
CN118913278A (en) · Water area path planning method, device and storage medium
CN119148714A (en) · Improved DWA and A algorithm path planning method based on intelligent ship collision avoidance decision system

Legal Events

Code · Title
PB01 · Publication
SE01 · Entry into force of request for substantive examination
GR01 · Patent grant
