Unmanned boat path planning method based on Q-learning neural network
Technical field
The invention belongs to the field of intelligent control of unmanned ships, and specifically relates to an unmanned boat path planning method based on a Q-learning neural network.
Background technique
Water quality monitoring is the main means of assessing water quality and preventing water pollution. With the increase of industrial wastewater, water pollution is getting worse and the demand for dynamic monitoring of water pollution is urgent. However, traditional water quality monitoring methods involve numerous steps and are time-consuming, and the diversity and accuracy of the data obtained fall far short of decision-making needs. In response to these problems, a variety of water quality monitoring methods have been proposed: Cao Lijie et al. proposed establishing a sensor network to obtain a more accurate water quality inversion model, and Field et al. proposed performing inversion on satellite data through a water quality model to obtain the distribution map of water quality parameters in the monitored waters. However, the above methods cannot flexibly change the monitored waters, and their engineering workload is large and their steps are numerous. In contrast, an unmanned water quality monitoring boat is small and easy to carry, is not affected by the terrain of the monitoring site, and can continuously carry out in-situ monitoring of multiple water quality parameters, so the monitoring results are more diverse and accurate.
An unmanned ship (Unmanned Surface Vehicle, USV) is a water surface motion platform that can navigate autonomously in an unknown water environment and complete various tasks. Because of its wide range of applications, its research content covers automatic piloting, automatic obstacle avoidance, navigation planning, pattern recognition and other aspects. It can be used not only for mine clearance, reconnaissance and anti-submarine operations in the military field, but also for hydrometeorological detection, environmental monitoring and water rescue in the civil field. However, because water is mobile and can flow through various complex terrains that staff cannot reach, for example when water flows through a cave, or because the weather is changeable, for example when the waters are foggy for a long time so that staff cannot see clearly and cannot operate the USV accurately in real time, the autonomous navigation of the USV can be used to reach the target waters for detection, and the autonomous navigation function is realized through path planning technology.
USV path planning technology means that, in the operating waters, the USV searches for a collision-free path from the starting point to the target point according to certain performance indicators (such as shortest distance or shortest time). It is a core component of USV navigation technology and represents the standard of USV intelligence. Currently used planning methods mainly include the particle swarm algorithm, the A* algorithm, the visual graph method, the artificial potential field method and the ant colony algorithm, but these methods are mostly used under known environmental conditions.
The path planning problem in a known environment has already been solved fairly well, but a USV in unknown waters cannot obtain the environmental information of the waters to be monitored before executing a task, so path planning methods based on known environmental information cannot be used to plan the USV navigation path. Moreover, because the monitored water environment is complex and there is a large amount of sensor information, the computational burden of the system is heavy, which causes the USV to suffer from poor real-time performance and oscillation in front of obstacles. USV path planning therefore urgently needs an algorithm that is simple, has strong real-time performance and can handle the uncertainty in the system, so it is necessary to introduce methods with independent learning ability, among which path planning based on the Q-learning algorithm is suitable for path planning in unknown environments. In existing research, Guo Na et al. used simulated annealing for action selection on the basis of the traditional Q-learning algorithm to solve the balance between exploration and exploitation. Chen Zili et al. proposed using a genetic algorithm to establish a new Q-value table for static global path planning. Dong Peifang et al. added the artificial potential field method to the Q-learning algorithm, using the gravitational potential field as the initial environmental prior information and then searching the environment step by step to accelerate the Q-value iteration.
The Chinese patent with publication number CN108106623A discloses an unmanned vehicle path planning method based on a flow field, comprising the following steps: establishing a flow field calculation model according to the starting point and end point of the vehicle and the obstacles in the environment; taking the front wheel angle as the input quantity and the coordinates and heading angle as the state quantities, establishing a vehicle kinematics model; taking the vehicle kinematics model as the rolling equation, solving the rolling-horizon optimization problem of the flow field, and using the flow field velocity vector distribution as the guidance information for path planning to obtain the planned path, wherein the optimized quantity is the front wheel angle, the optimization objectives include making the vehicle motion consistent with the flow field motion and avoiding collision with obstacles during vehicle motion, and the constraint condition is that the front wheel angle does not exceed the maximum steering wheel angle. This method can find a smooth, obstacle-avoiding path connecting the start and end points in complex terrain, and achieves good smoothness and completeness under the premise of obstacle avoidance. However, this method needs to know the terrain of the environment and the positions of the obstacles, and cannot carry out path planning for an unknown field.
Summary of the invention
Goal of the invention: in order to overcome the deficiencies in the prior art, the invention proposes a Q-learning reinforcement learning path planning algorithm based on a BP neural network, in which the Q function in the Q-learning method is fitted with a neural network so that it can take continuous system states as input, and the convergence speed of the network during training is significantly improved by experience replay and by setting a target network. Experimental simulation verifies the feasibility of the improved planning method presented here.
Technical solution: to achieve the above object, the invention provides an unmanned boat path planning method based on a Q-learning neural network, characterized in that it comprises the following steps:
a) Initialize the memory block D;
b) Initialize the Q network and the initial values of the state and action; the Q network involves the following elements: S, A, P_{s,a}, R, where S denotes the set of system states the USV can be in, A denotes the set of actions the USV can take, P_{s,a} denotes the system state transition probability, and R denotes the reward function;
c) Set a training target at random;
d) Randomly select an action a_t, obtain the current reward r_t and the next-moment state s_{t+1}, and store (s_t, a_t, r_t, s_{t+1}) in the memory block D;
e) Randomly sample a batch of data (s_t, a_t, r_t, s_{t+1}) from the memory block D for training; when the USV reaches the target position, or the maximum time of a round is exceeded, the state is regarded as the terminal state;
f) If s_{t+1} is not the terminal state, return to step d); if s_{t+1} is the terminal state, update the Q network parameters and return to step d); the algorithm ends after repeating n rounds;
g) Set the target and carry out path planning with the trained Q network until the USV reaches the target position.
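The following Python outline is only an illustrative sketch of steps a)–g), not the patented implementation; the environment interface (env.reset, env.step), the network object qnet and its select_action/update methods are assumed placeholder names.

```python
import random
from collections import deque

def train(env, qnet, n_rounds=3000, batch_size=32, max_steps=200):
    """Illustrative outline of steps a)-g): collect transitions into a
    replay memory D and update the Q network from random mini-batches."""
    D = deque(maxlen=40000)                        # step a): memory block D
    for episode in range(n_rounds):                # step f): repeat n rounds
        s = env.reset(random_target=True)          # step c): random training target
        for t in range(max_steps):
            a = qnet.select_action(s)              # step d): choose an action (e.g. epsilon-greedy)
            s_next, r, done = env.step(a)          # current reward and next state
            D.append((s, a, r, s_next, done))      # store the transition in D
            if len(D) >= batch_size:
                batch = random.sample(D, batch_size)   # step e): random mini-batch
                qnet.update(batch)                 # step f): update the Q network parameters
            s = s_next
            if done:                               # target reached or time limit exceeded
                break
    return qnet
```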
Preferably, in step a) the memory block D is an experience replay memory block used to store the training samples acquired during USV navigation; because of experience replay, the multiple samples used in each training step are not consecutive in time.
Preferably, the update rule of the Q network is as follows:
Q(s_t, a_t) = Q(s_t, a_t) + α·δ'_t
where the function Q(s_t, a_t) is the value of executing action a_t in state s_t, α is the learning rate, and δ'_t is the TD(0) deviation; the 0 in TD(0) indicates looking 1 step ahead from the current state, and the deviation is:
δ'_t = R(s_t) + γ·V(s_{t+1}) − Q(s_t, a_t)
where γ is the discount factor, R(s) is the reward function, and V(s) is the value function, V(s) = max_a Q(s, a). Alternatively, the TD(0) deviation can also be defined as
δ_{t+1} = R(s_{t+1}) + γ·V(s_{t+2}) − V(s_{t+1})
where δ_{t+1} is the TD(0) deviation, R(s) is the reward function, and V(s) is the value function.
Another discount factor λ ∈ [0, 1] is used to discount the TD deviations of future steps, giving the update Q(s_t, a_t) = Q(s_t, a_t) + α·δ_t^λ, where the function Q(s_t, a_t) is the value of executing action a_t in state s_t, α is the learning rate, and δ_t^λ is the TD(λ) deviation; TD(λ) looks several steps ahead, weighted by λ, from the current state.
Here the TD(λ) deviation δ_t^λ is defined in terms of δ'_t, the deviation obtained from past learning, and δ_{t+i}, the deviations of the current and future learning steps, discounted by the discount factor γ and the factor λ, with λ ∈ [0, 1].
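The explicit formula for δ_t^λ is not reproduced legibly in the source; a standard forward-view definition consistent with the variables listed above (an assumption, not necessarily the patent's exact expression) would be:

```latex
\delta^{\lambda}_{t} = \sum_{i=0}^{\infty} (\gamma\lambda)^{i}\, \delta_{t+i}
```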
Preferably, η_t(s, a) is defined as an indicator function: it returns 1 if (s, a) occurs at time t and 0 otherwise. For simplicity, ignoring the learning efficiency, an eligibility trace e_t(s, a) is defined for each (s, a).
Then the online update at time t is
Q(s, a) = Q(s, a) + α[δ'_t·η_t(s, a) + δ_t·e_t(s, a)]
where Q(s, a) is the value of executing action a in state s, α is the learning rate, η_t(s, a) is the indicator function, e_t(s, a) is the eligibility trace, δ'_t is the deviation from past learning, and δ_t is the deviation learned now; the update is obtained from the deviation between the accumulated return R(s) and the current estimate V(s), multiplied by the learning rate.
Preferably, reinforcement learning aims to maximize the expected overall return obtained while the system runs, i.e. to maximize E(R(s_0) + γR(s_1) + γ²R(s_2) + ...); for this purpose an optimal policy π must be found such that, when the USV makes decisions and acts according to π, the total return obtained is maximal.
The objective function of reinforcement learning is one of the following:
V^π(s) = E(R(s_0) + γR(s_1) + γ²R(s_2) + ... | s_0 = s, π)
Q^π(s, a) = E(R(s_0) + γR(s_1) + γ²R(s_2) + ... | s_0 = s, a_0 = a, π)
where V^π(s) denotes the expected return that can be obtained by acting according to policy π starting from the current initial state s, Q^π(s, a) denotes the expected return that can be obtained by taking action a in the current state s and thereafter acting according to policy π in all subsequent states, E(R(s_0) + γR(s_1) + γ²R(s_2) + ...) is the expected overall return obtained while the system runs, R(s_t) denotes the reward function at time t, and γ is the discount factor;
The purpose of Q-learning is to find the optimal policy π* that maximizes this expected return.
Preferably, define Q*(s, a) = Q^{π*}(s, a): it refers to the expected return that can be harvested by executing action a in state s and thereafter making every decision according to the optimal policy in all subsequent states. Assuming Q*(s, a) is known, π* can easily be generated from Q*(s, a): it suffices that π*(s) = argmax_a Q*(s, a) holds for each s. In this way, the problem of finding the optimal policy is transformed into finding Q*(s, a). Since:
Q*(s, a) = R(s_0) + γ·E(R(s_1) + γR(s_2) + ... | s_1, a_1)
where Q^π(s, a) denotes the expected return obtained by taking action a in the current state s and thereafter acting according to policy π in all subsequent states, E(R(s_0) + γR(s_1) + γ²R(s_2) + ...) is the expected overall return obtained while the system runs, R(s_t) denotes the reward function at time t, and γ is the discount factor;
and a_1 is determined by π*, i.e. a_1 = π*(s_1), then:
a_1 denotes the action taken under the optimal policy, π*(s_1) denotes the optimal policy applied at state s_1, and Q^π(s, a) denotes the expected return obtained by taking action a in the current state s and thereafter acting according to policy π in all subsequent states.
Then, according to the Bellman equation, the Q function can be found iteratively.
Preferably, the Bellman equation defines Q*(s, a) in recursive form, so that the Q function can be found iteratively; the Bellman equation is Q*(s, a) = R(s) + γ·E(max_{a'} Q*(s', a') | s, a), where R(s) is the reward function, γ is the discount factor, and the expectation is taken over the next state s' reached after executing action a in state s.
Preferably, the reward function is divided into three kinds: the first rewards the USV according to its distance from the target position; the second rewards the USV for reaching the target position; the third punishes the USV for colliding with an obstacle. An illustrative form is sketched below.
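The patent does not reproduce the explicit reward formula here; the following Python sketch only illustrates the three-part structure described above, and the specific constants (±1, ±10) and the distance measure are assumptions for illustration, not the patent's actual values.

```python
import math

def reward(usv_pos, target_pos, prev_dist, collided, reach_radius=0.5):
    """Illustrative three-part reward: distance shaping, goal bonus, collision penalty.
    Returns (reward, new_distance) so the caller can track prev_dist."""
    dist = math.dist(usv_pos, target_pos)
    if collided:                         # third kind: collision with an obstacle is punished
        return -10.0, dist
    if dist < reach_radius:              # second kind: reaching the target is rewarded
        return 10.0, dist
    # first kind: small reward related to the distance from the target
    return (1.0 if dist < prev_dist else -1.0), dist
```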
Preferably, in step f), the number of repeated rounds n ranges from 3000 to 5000.
Beneficial effects:
Compared with the prior art, the present invention has the following advantages:
1, the method for intensified learning of the invention solve water quality monitoring unmanned boat when unknown waters carries out water quality monitoring oneselfLeading bit path planning problem is fitted Q function by BP neural network, so that trained strategy can be according to currentThe real time information of barrier makes a policy in environment.
2. The method of the invention enables the water quality monitoring unmanned boat to plan a feasible path in an unknown environment according to different states, with a short decision time and a more optimized route, which satisfies the requirement of real-time online planning, overcomes the disadvantages of heavy computation and slow convergence of traditional Q-learning path planning methods, and allows problem waters to be monitored at the earliest possible time.
3. The invention fits the Q function in the Q-learning method with a neural network, enabling it to take continuous system states as input, and significantly improves the convergence speed of the network during training through experience replay and by setting a target network.
4. The invention improves traditional Q-learning and realizes Q-value iteration using a BP neural network; the output of the network corresponds to the Q value of each action, and the input of the network corresponds to the description of the environment state.
5. Through the design of the reward function, the invention returns different reward values for different situations, so that the USV learns and explores more efficiently.
Description of the drawings
Fig. 1 is the overall flow chart of the invention;
Fig. 2 is the simulation diagram of the complex-waters terrain;
Fig. 3 is the absolute error diagram of the distance between the actually reached point and the target point for the complex-waters terrain;
Fig. 4 is the simulation diagram of the simple concentric-circle maze;
Fig. 5 is the absolute error diagram of the distance between the actually reached point and the target point for the simple concentric-circle maze;
Fig. 6 is the simulation diagram of the complex maze;
Fig. 7 is the absolute error diagram of the distance between the actually reached point and the target point for the complex maze;
Fig. 8 is the simulation result diagram of the East Lake background;
Fig. 9 is the absolute error diagram of the distance between the actually reached point and the target point for the East Lake background;
Fig. 10 is the iteration-number diagram of the East Lake background.
Specific embodiment
The present invention will be further explained with reference to the accompanying drawings and examples.
Embodiment one:
The unmanned boat path planning method based on a Q-learning neural network of this embodiment comprises the following steps:
a) Initialize the memory block D;
b) Initialize the Q network and the initial values of the state and action; the Q network involves the following elements: S, A, P_{s,a}, R, where S denotes the set of system states the USV can be in, A denotes the set of actions the USV can take, P_{s,a} denotes the system state transition probability, and R denotes the reward function;
c) Set a training target at random;
d) Randomly select an action a_t, obtain the current reward r_t and the next-moment state s_{t+1}, and store (s_t, a_t, r_t, s_{t+1}) in the memory block D;
e) Randomly sample a batch of data (s_t, a_t, r_t, s_{t+1}) from the memory block D for training; when the USV reaches the target position, or the maximum time of a round is exceeded, the state is regarded as the terminal state;
f) If s_{t+1} is not the terminal state, return to step d); if s_{t+1} is the terminal state, update the Q network parameters and return to step d); the algorithm ends after repeating n rounds;
g) Set the target and carry out path planning with the trained Q network until the USV reaches the target position.
Here, D is the experience replay memory block, used to store the training samples acquired during USV navigation. Because of experience replay, the multiple samples used in each training step are not consecutive in time, which minimizes the correlation between samples and enhances the stability and accuracy of training.
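A minimal sketch of such an experience replay memory block, written in Python for illustration (the class and method names are placeholders, not the patent's implementation):

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size memory block D; the oldest transitions are discarded when full."""
    def __init__(self, capacity=40000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next, done):
        # store one transition (s_t, a_t, r_t, s_{t+1}) plus a terminal flag
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        # uniform random sampling breaks the temporal correlation between samples
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```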
Embodiment two:
The unmanned boat path planning method based on a Q-learning neural network of this embodiment is based on embodiment one. The traditional Q-learning algorithm is as follows:
Q-learning describes the problem on the basis of a Markov decision process (Markov Decision Process). A Markov decision process contains 4 elements: S, A, P_{s,a}, R. S denotes the set of system states of the USV, i.e. the current state of the USV and the current state of the environment, such as the size and position of obstacles; A denotes the set of actions the USV can take, i.e. the rotation direction of the USV; P_{s,a} denotes the system model, i.e. the system state transition probability, where P(s'|s, a) describes the probability that the system reaches state s' after executing action a in the current state s; R denotes the reward function, which is determined by the current state and the action taken. Q-learning can be regarded as incremental planning that finds a strategy maximizing the overall evaluation. The idea of Q-learning is not to consider the environment model, but to directly optimize a Q function that can be computed iteratively. The function Q(s_t, a_t) is defined as the accumulated discounted reinforcement value obtained by executing action a_t in state s_t and thereafter executing the optimal action sequence, that is:
Q(s_t, a_t) = R(s_t) + γ·max_{a_{t+1}} Q(s_{t+1}, a_{t+1})   (1)
In the formula, s_t is the state of the USV at time t, s_{t+1} is the state of the USV at the next moment, a_t is the action executed at time t, γ is the discount factor with 0 ≤ γ ≤ 1, and R(s_t) is the reward function, whose value is positive or negative. In the initial stage of learning, the Q values may not correctly reflect the strategy they define, and the initial Q_0(s, a) is assumed and given for all states and actions. For a given environment state set S and a possible USV action set A with many choices, the amount of data is large, a large amount of system storage space is needed, and the result cannot be generalized. In order to overcome the above drawbacks, traditional Q-learning is improved: Q-value iteration is realized using a BP neural network, the output of the network corresponds to the Q value of each action, and the input of the network corresponds to the description of the environment state.
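A minimal sketch of such a state-to-Q-value network is given below in plain NumPy; the two hidden-layer sizes follow the 64/32 neurons mentioned in the experiments, while the state and action dimensions are assumed placeholders, not values fixed by the patent.

```python
import numpy as np

class QNetwork:
    """Small BP (feed-forward) network: input = environment state, output = one Q value per action."""
    def __init__(self, state_dim=4, n_actions=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.1, (state_dim, 64)); self.b1 = np.zeros(64)
        self.W2 = rng.normal(0, 0.1, (64, 32));        self.b2 = np.zeros(32)
        self.W3 = rng.normal(0, 0.1, (32, n_actions)); self.b3 = np.zeros(n_actions)

    def forward(self, s):
        h1 = np.tanh(s @ self.W1 + self.b1)      # first hidden layer, 64 neurons
        h2 = np.tanh(h1 @ self.W2 + self.b2)     # second hidden layer, 32 neurons
        return h2 @ self.W3 + self.b3            # Q value of each action

    def greedy_action(self, s):
        return int(np.argmax(self.forward(s)))   # action with the largest predicted Q value
```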
Improved Q-learning path planning algorithm
The Q(λ) algorithm is generated by drawing on the TD(λ) algorithm. Through the idea of backtracking, it allows information to be passed back continuously, so that the action decision of a state is influenced by its succeeding states. If a certain future decision under π is a failed decision, the current decision will also bear the corresponding punishment, and this influence is appended to the current decision; if a certain future decision under π is a correct decision, it likewise influences the current decision, and the current decision is rewarded accordingly. This improvement increases the convergence speed of the algorithm and meets the practicality requirement of learning. The update rule of the improved Q(λ) algorithm is
Q(s_t, a_t) = Q(s_t, a_t) + α·δ'_t   (2)
where the function Q(s_t, a_t) is the value of executing action a_t in state s_t, α is the learning rate, and δ'_t is the TD(0) deviation; the 0 in TD(0) indicates looking 1 step ahead from the current state, and the deviation is:
δ'_t = R(s_t) + γ·V(s_{t+1}) − Q(s_t, a_t)   (3)
where γ is the discount factor, R(s) is the reward function, and V(s) is the value function, V(s) = max_a Q(s, a). Alternatively, the TD(0) deviation can also be defined as
δ_{t+1} = R(s_{t+1}) + γ·V(s_{t+2}) − V(s_{t+1})   (4)
where δ_{t+1} is the TD(0) deviation, R(s) is the reward function, V(s) is the value function, and the 0 in TD(0) indicates looking 1 step ahead from the current state.
Here another discount factor λ ∈ [0, 1] is also used to discount the TD deviations of future steps, giving the update Q(s_t, a_t) = Q(s_t, a_t) + α·δ_t^λ, where the function Q(s_t, a_t) is the value of executing action a_t in state s_t, α is the learning rate, and δ_t^λ is the TD(λ) deviation.
A new parameter λ is introduced here; without increasing the computational complexity, this new parameter makes it possible to comprehensively consider the predictions over all step numbers and, like the γ parameter before, it is used to control the weights. TD(λ) looks several steps ahead, weighted by λ, from the current state.
Here the TD(λ) deviation δ_t^λ is defined in terms of δ'_t, the deviation obtained from past learning, and δ_{t+i}, the deviations of the current and future learning steps, discounted by the discount factor γ and the factor λ, with λ ∈ [0, 1].
Embodiment three
The unmanned boat path planning method based on a Q-learning neural network of this embodiment is based on embodiment two. As long as the future TD deviations are unknown, the above update cannot be carried out; however, they can be computed gradually by using eligibility traces. Below, η_t(s, a) is defined as an indicator function: it returns 1 if (s, a) occurs at time t and 0 otherwise. For simplicity, ignoring the learning efficiency, an eligibility trace e_t(s, a) is defined for each (s, a).
Then the online update at time t is
Q(s, a) = Q(s, a) + α[δ'_t·η_t(s, a) + δ_t·e_t(s, a)]   (8)
where Q(s, a) is the value of executing action a in state s, α is the learning rate, η_t(s, a) is the indicator function, e_t(s, a) is the eligibility trace, δ'_t is the deviation from past learning, and δ_t is the deviation learned now; the update is obtained from the deviation between the accumulated return R(s) and the current estimate V(s), multiplied by the learning rate.
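The following tabular Python sketch illustrates how an eligibility trace can implement this kind of backward-view update; the recursive trace decay e ← γλ·e + η is a standard assumption, since the patent's explicit trace formula is not reproduced legibly here, and the sketch is not the patented implementation.

```python
import numpy as np

def q_lambda_step(Q, e, s, a, r, s_next, alpha=0.1, gamma=0.9, lam=0.8):
    """One online Q(lambda) update on a tabular Q and an eligibility trace e (same shape as Q)."""
    delta = r + gamma * np.max(Q[s_next]) - Q[s, a]   # one-step TD deviation
    e *= gamma * lam                                  # decay all traces (assumed recursive form)
    e[s, a] += 1.0                                    # indicator eta_t(s, a) for the visited pair
    Q += alpha * delta * e                            # spread the deviation along the trace
    return Q, e
```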
Embodiment four:
The unmanned boat path planning method based on a Q-learning neural network of this embodiment is based on embodiment three. Reinforcement learning aims to maximize the expected overall return obtained while the system runs; for this purpose an optimal policy π must be found such that, when the USV makes decisions and acts according to π, the total return obtained is maximal. In general, the objective function of reinforcement learning is one of the following:
V^π(s) = E(R(s_0) + γR(s_1) + γ²R(s_2) + ... | s_0 = s, π)
Q^π(s, a) = E(R(s_0) + γR(s_1) + γ²R(s_2) + ... | s_0 = s, a_0 = a, π)   (9)
where V^π(s) denotes the expected return that can be obtained by acting according to policy π starting from the current initial state s, Q^π(s, a) denotes the expected return that can be obtained by taking action a in the current state s and thereafter acting according to policy π in all subsequent states, E(R(s_0) + γR(s_1) + γ²R(s_2) + ...) is the expected overall return obtained while the system runs, R(s_t) denotes the reward function at time t, and γ is the discount factor.
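As a small worked illustration of this discounted-return objective (not part of the patent), the sum inside E(·) can be computed for a finite reward sequence as follows:

```python
def discounted_return(rewards, gamma=0.9):
    """Return R(s_0) + gamma*R(s_1) + gamma^2*R(s_2) + ... for a finite episode."""
    total, weight = 0.0, 1.0
    for r in rewards:
        total += weight * r
        weight *= gamma
    return total

# e.g. three steps of reward 1 followed by a goal bonus of 10:
# discounted_return([1, 1, 1, 10]) == 1 + 0.9 + 0.81 + 7.29 == 10.0
```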
The purpose of Q-learning is to find the optimal policy π* that maximizes this expected return.
Embodiment five:
The unmanned boat path planning method based on a Q-learning neural network of this embodiment is based on embodiment four. Define Q*(s, a) = Q^{π*}(s, a): it refers to the expected return that can be harvested by executing action a in state s and thereafter making every decision according to the optimal policy in all subsequent states. Assuming Q*(s, a) is known, π* can easily be generated from Q*(s, a): it suffices that π*(s) = argmax_a Q*(s, a) holds for each s. In this way, the problem of finding the optimal policy is transformed into finding Q*(s, a). Since:
Q*(s, a) = R(s_0) + γ·E(R(s_1) + γR(s_2) + ... | s_1, a_1)   (10)
where Q^π(s, a) denotes the expected return obtained by taking action a in the current state s and thereafter acting according to policy π in all subsequent states, E(R(s_0) + γR(s_1) + γ²R(s_2) + ...) is the expected overall return obtained while the system runs, R(s_t) denotes the reward function at time t, and γ is the discount factor;
and a_1 is determined by π*, i.e. a_1 = π*(s_1), then:
where a_1 denotes the action taken under the optimal policy, π*(s_1) denotes the optimal policy applied at state s_1, and Q^π(s, a) denotes the expected return obtained by taking action a in the current state s and thereafter acting according to policy π in all subsequent states. Then, according to the Bellman equation, the Q function can be found iteratively.
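The step from Q* to π* is just a greedy argmax over actions; a minimal Python illustration (the function name is a placeholder, not the patent's):

```python
import numpy as np

def optimal_policy_from_q(q_values):
    """Given Q*(s, a) for one state s as a vector over actions, return pi*(s) = argmax_a Q*(s, a)."""
    return int(np.argmax(q_values))

# Example: if Q*(s, .) = [0.2, 1.5, -0.3], the optimal policy chooses action index 1.
```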
Embodiment six
The unmanned boat path planning method based on a Q-learning neural network of this embodiment is based on embodiment five. The Bellman equation defines Q*(s, a) in recursive form, so that the Q function can be found iteratively; the Bellman equation is Q*(s, a) = R(s) + γ·E(max_{a'} Q*(s', a') | s, a), where R(s) is the reward function, γ is the discount factor, and the expectation is taken over the next state s' reached after executing action a in state s.
In the traditional Q-learning algorithm, the Q function is stored and updated in the form of a table, but in USV obstacle-avoidance path planning, since obstacles may appear at any position in space, it is difficult for a Q function in table form to describe obstacles appearing in a continuous space. Therefore, on the basis of Q-learning, deep Q-learning is used here: the Q function is fitted with a BP neural network, and the input state s is a continuous variable. In general, the learning process is difficult to converge when the Q function is approximated with a nonlinear function, so experience replay and a target network are used to improve the stability of learning.
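A minimal sketch of how a target network is used in one training step, reusing the illustrative QNetwork and ReplayMemory placeholders sketched earlier; the gradient step that minimizes the squared error (Q(s, a) − y)² by backpropagation is not shown, and none of this is the patented implementation.

```python
import numpy as np

def td_targets(batch, q_target_net, gamma=0.9):
    """Compute y = r + gamma * max_a' Q_target(s', a') for a mini-batch; terminal states keep only r."""
    targets = []
    for s, a, r, s_next, done in batch:
        y = r if done else r + gamma * np.max(q_target_net.forward(s_next))
        targets.append((s, a, y))
    return targets

def sync_target(q_net, q_target_net):
    """Periodically copy the online network's weights into the target network."""
    for name in ("W1", "b1", "W2", "b2", "W3", "b3"):
        setattr(q_target_net, name, getattr(q_net, name).copy())
```

Freezing the target network between synchronizations keeps the regression target y fixed for a while, which is what stabilizes learning when the Q function is approximated by a nonlinear network.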
Embodiment seven:
The unmanned boat path planning method based on a Q-learning neural network of this embodiment is based on embodiment six. In reinforcement learning, the design of the reward function directly affects the quality of the learning result. In general, the reward function corresponds to a person's description of the task, and prior knowledge for solving the task can be incorporated into learning through the design of the reward function. In USV path planning, the hope is that during navigation the USV reaches the target position as early as possible while safely avoiding collisions with obstacles. Here the reward function is divided into three kinds: the first rewards the USV according to its distance from the target position; the second rewards the USV for reaching the target position; the third punishes the USV for colliding with an obstacle.
In terms of magnitude, the second and third reward values are larger than the first, because for the USV obstacle-avoidance task the main goals are to avoid obstacles and reach the target position, rather than merely to shorten the distance between the USV and the target position. The reason the first reward is added is that, if the USV were only rewarded for reaching the target position and punished for hitting an obstacle, a large number of steps during motion would have a reward of 0; in most cases this would prevent the USV from improving its strategy and make learning inefficient. Adding this reward is equivalent to incorporating human prior knowledge of the task, so that the USV learns and explores more efficiently.
Embodiment eight:
In order to test the path planning algorithm designed here, simulation experiments were carried out on Matlab2014a software. In the experiments, the simulation environment is a 20*20 region, the discount factor γ is 0.9, the size of the memory block D is set to 40000, the number of cycles is 1000, the first layer of the neural network has 64 neurons, and the second layer has 32 neurons. In each round of training, whenever the USV hits an obstacle or reaches the target position, the round ends immediately and a reward is returned.
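Collected as a small Python configuration sketch for reference (the values are taken from the description above; the key names themselves are illustrative assumptions):

```python
# Simulation settings reported in the experiments (key names are illustrative only)
SIM_CONFIG = {
    "grid_size": (20, 20),        # simulated region
    "discount_factor": 0.9,       # gamma
    "replay_capacity": 40000,     # size of memory block D
    "episodes": 1000,             # number of training cycles
    "hidden_layers": (64, 32),    # neurons in the first and second network layers
}
```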
In order to verify the accuracy of the method presented here, maze terrains are used for the tests. Three different terrains are designed for algorithm comparison: a complex-waters terrain (as shown in Fig. 2), a simple concentric-circle maze terrain (as shown in Fig. 4) and a complex maze terrain (as shown in Fig. 6). The improved algorithm presented here and the traditional Q-learning algorithm are simulated on the above terrains. From the path diagrams it can be seen that the route of the improved algorithm, shown in blue, is shorter and more direct than the route simulated by the traditional Q-learning algorithm. From the absolute error diagrams of the distance between the actually reached point and the target point, it can be seen that the improved algorithm converges and stabilizes one third earlier than the traditional Q-learning algorithm.
Embodiment nine:
An experimental simulation is carried out taking the actual environment of the East Lake waters in Lin'an as the background. It can be seen from Fig. 8 that during the simulation the USV never collides with an obstacle and the path is simple and fast. Fig. 9 is the standard error curve and Fig. 10 is the learning curve. It can be seen from these figures that when the number of training iterations reaches 56, the curves tend to be stable, indicating that a safe and efficient overall route has basically been planned and that at this point the USV can avoid obstacles and reach the target position in most cases. It can therefore be concluded that the improved Q-learning algorithm based on the BP neural network converges faster and produces a more optimized path than the traditional Q-learning algorithm.
The above is only a preferred embodiment of the present invention. It should be pointed out that those skilled in the art can make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.