Disclosure of Invention
The invention aims to provide a robust navigation planning method, based on aggregated spatio-temporal features, for an unmanned surface vessel. Addressing the local path planning problem described in the background art, the method targets water surface environments with highly dynamic chasers, high-density obstacles, and vortex interference, and enables the vessel to avoid collisions in real time with obstacles that follow various movement patterns.
The invention adopts an anti-collision method for an unmanned surface vessel with aggregated spatio-temporal features, which comprises the following steps:
Step 1, defining a local navigation area range, setting the starting point coordinates and end point coordinates of the unmanned surface vessel in the area, and randomly generating vortex signals in the area;
Step 2, constructing a navigation environment, and randomly generating, in the blank area, static obstacles of different sizes and dynamic obstacles at random positions;
Step 3, based on the types of sensors carried by a real vessel, generating simulated signals that provide the vessel's speed, the vessel's own position, and the relative distances to obstacles, and constructing observation signals from these simulated signals;
Step 4, sending the observation signals into a spatio-temporal feature aggregation module, and aggregating spatial features by stacking frames along the time dimension;
Step 5, determining a distributional reinforcement learning algorithm as the backbone framework of the navigation planning network, and defining an action space and a reward function;
Step 6, collecting experience tuples during training in an experience pool, optimizing the network weight parameters with mini-batches of experience tuples, and training the deep reinforcement learning model with an experience replay strategy until convergence;
Step 7, constructing an experimental verification scene, loading the converged model parameters, and verifying the anti-collision capability of the vessel in a local environment and its generalization capability in new scenes.
The local navigation area is a simulation environment built for verifying the effectiveness of the algorithm.
The spatio-temporal feature aggregation module is formed by connecting an independent spatial feature aggregation module and a temporal feature extraction module in series.
Step 1 specifically comprises the following steps:
1-1, defining the local navigation area range of the unmanned surface vessel, and setting its length to H and its width to W;
1-2, initializing the starting coordinate $P_{STA}(x, y)$ and the ending coordinate $P_{END}(x, y)$ of the unmanned surface vessel, wherein, within the current local navigation area, the coordinate of the unmanned surface vessel at each moment is $P(x, y)$.
Step 2 specifically comprises the following steps:
2-1, determining a random operator, and randomly initializing the position coordinates and radius of each static obstacle in the blank area, wherein $s_i$ denotes the i-th static obstacle;
2-2, for each dynamic obstacle, randomly generating its initial position coordinates in the remaining blank area, and setting its end point to the position coordinate $P(x, y)$ of the unmanned surface vessel at the current moment, so that the end point changes continuously as the unmanned surface vessel moves, wherein $d_j$ denotes the j-th dynamic obstacle.
Step 3 comprises the following steps:
3-1, simulating the working principle and output characteristics of each sensor based on the types of shipborne sensors, and generating the observation signal at the current moment;
3-2, simulating satellite and inertial device signals to provide the vessel's own position $P(x, y)$;
3-3, simulating Doppler velocimeter signals to provide the vessel's current ground speed $V(v_x, v_y)$;
3-4, simulating a lidar signal to provide the relative distances $D(d_1, d_2, \ldots, d_N)$ from the vessel to obstacles in the surrounding environment, wherein N is the number of laser beams generated when simulating the lidar signal;
3-5, concatenating the vessel's position information, speed information, and obstacle relative distance information, and using the result as the observation signal $s_t = \{P(x, y), V(v_x, v_y), D(d_1, d_2, \ldots, d_N)\}$ in the current environment.
Step 4 comprises the following steps:
4-1, designing a stack structure for the observation signals $s_t$ concatenated in step 3, stacking the signals along the time dimension to construct spatial feature information $S_t = \{s_{t-k}, \ldots, s_{t-1}, s_t\}$ that captures the changes in the spatial positions of the obstacles, wherein $s_t = \{P(x, y), V(v_x, v_y), D(d_1, d_2, \ldots, d_N)\}$, and extracting motion features from the spatial feature information with a rectangular convolutional neural network;
4-2, after the neural network has aggregated the spatial feature information, further extracting the motion dependencies contained in it with a continuous-time neural network, so as to obtain discriminative features of the environment observation signals in the time dimension, wherein the continuous-time neural network is composed of the sub-networks {f, g, h}, and the three sub-networks share a basic backbone network structure.
Step 5 comprises the following steps:
5-1, determining the value-distribution reinforcement learning algorithm (IQN) as the network framework, which models the distribution of cumulative returns and uses this return distribution to guide decisions, thereby avoiding the neglect of risk changes in the scene that results from using only the mean return in every state;
5-2, defining the action space as changes in acceleration and angular velocity in a discrete space, wherein α is the acceleration of the vessel and the second component is its angular velocity;
5-3, determining the reward function $r_t$: defining a reward factor for reaching the end point and a penalty factor for colliding with an obstacle, defining a time penalty factor, and defining a reward function directed toward the end point g at the current time t.
Step 6 comprises the following steps:
6-1, initializing the deep reinforcement learning network and the input state $s_t$, selecting an action $a_t$ in the current input state, executing the action to obtain the reward value $r_t$ at the current moment and the state $s_{t+1}$ at the next moment, and storing the experience tuple $(s_t, a_t, r_t, s_{t+1})$ in the experience replay pool;
6-2, determining the total capacity of the experience replay pool, and, once the number of accumulated experience tuples exceeds this capacity, sampling experience tuples in mini-batches, computing gradients by back propagation, and updating the network parameters with a gradient descent algorithm until the model converges.
Step 7 comprises the following steps:
7-1, keeping the properties of the vortices, static obstacles, and dynamic obstacles in the experimental scene consistent with those in the training scene, while randomly initializing their positions and numbers according to the experimental requirements.
Compared with the prior art, the invention has the following advantages:
1. The invention provides a local anti-collision method for an unmanned surface vessel with aggregated spatio-temporal features, which constructs an end-to-end local path planning network that maps the environmental data collected by the sensors to the navigation planning decision signal of the vessel. The network structure fully extracts the spatial position changes contained in consecutive observation signals, and a continuous-time network then extracts from these features the long-term motion dependencies of the obstacles. Local collision avoidance and navigation route planning are thereby achieved in high-risk, obstacle-dense scenes.
2. The invention fully considers the water surface environment and the characteristics of shipborne sensors, and adds vortex interference to simulate the influence of the dynamic water surface environment on the vessel's speed. Environment observation data are generated by simulating the output characteristics of the shipborne sensors, so that the simulated observations are consistent with the front-end perception input of a vessel in a real environment. Compared with prior algorithms that rely on obstacle position and length information for position calculation, sensing data sources such as lidar update at a higher frequency and propagate less error than the obstacle position calculation process, which helps generate decision signals that are more robust and more timely.
Detailed Description
The technical scheme of the invention is further described in detail below with reference to the accompanying drawings:
As shown in fig. 1, the invention discloses a local anti-collision method for an unmanned surface vessel with aggregated spatio-temporal features. During local path planning, environment observation signals from multiple information sources are generated by simulating the working principle and output state of the sensors, and a stack structure stacks these observations along the time dimension to obtain the state information. The network architecture shown in fig. 3 is then constructed; it comprises spatial and temporal feature extraction modules and is combined with a value-distribution reinforcement learning algorithm to realize anti-collision guidance and navigation path planning for the unmanned surface vessel in high-risk environments. The method is mainly intended for the navigation path planning module of an unmanned surface vessel, for which it provides a real-time, high-safety path planning algorithm in highly dynamic, strongly disturbed water surface environments. It specifically comprises the following steps:
Step 1, defining a local navigation area range, setting the starting point coordinates and end point coordinates of the unmanned surface vessel in the area, and randomly generating vortex signals in the area.
Step 2, constructing a navigation environment, and randomly generating, in the blank area, static obstacles of different sizes and dynamic obstacles at random positions.
Step 3, based on the types of sensors carried by a real vessel, generating simulated signals that provide the vessel's speed, the vessel's own position, and the relative distances to obstacles, and constructing observation signals from these simulated signals.
Step 4, for the observation signals, aggregating the spatial features by stacking frames along the time dimension, and then feeding the observation signals frame by frame into a continuous-time neural network to aggregate the temporal features.
Step 5, determining a distributional reinforcement learning algorithm (IQN) as the backbone framework of the navigation planning network, and defining an action space and a reward function.
Step 6, collecting experience tuples during training in an experience pool, optimizing the network weight parameters with mini-batches of experience tuples, and training the deep reinforcement learning model with an experience replay strategy until convergence.
Step 7, constructing an experimental verification scene, loading the converged model parameters, and verifying the anti-collision capability of the vessel in a local environment and its generalization capability in new scenes.
Further, step 1 specifically includes the following steps:
1-1, defining the local navigation area range of the unmanned surface vessel, and setting its length to H and its width to W.
1-2, initializing the starting coordinate $P_{STA}(x, y)$ and the ending coordinate $P_{END}(x, y)$ of the unmanned surface vessel. Within the current local navigation area, the coordinate of the unmanned surface vessel at each moment is $P(x, y)$.
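As a minimal sketch of step 1, the local area and its random vortices could be represented as below. The class names `NavArea` and `Vortex`, the Rankine vortex flow model, the sampling ranges for the vortex core radius and circulation, and the choice of x for the length H and y for the width W are all illustrative assumptions; the text only states that vortex signals are generated at random in the area.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Vortex:
    x: float
    y: float
    core_radius: float  # assumed parameter, not specified in the text
    circulation: float  # assumed parameter; its sign sets the rotation direction

    def velocity_at(self, px: float, py: float) -> Tuple[float, float]:
        """Flow velocity at (px, py) under an assumed Rankine vortex model."""
        dx, dy = px - self.x, py - self.y
        r = math.hypot(dx, dy)
        if r < 1e-9:
            return 0.0, 0.0
        if r < self.core_radius:  # solid-body rotation inside the core
            speed = self.circulation * r / (2 * math.pi * self.core_radius ** 2)
        else:  # speed decays as 1/r outside the core
            speed = self.circulation / (2 * math.pi * r)
        # Unit tangent (-dy, dx)/r: counter-clockwise for positive circulation.
        return -speed * dy / r, speed * dx / r


@dataclass
class NavArea:
    H: float  # length of the local navigation area (taken along x here)
    W: float  # width of the local navigation area (taken along y here)
    p_sta: Tuple[float, float]  # starting coordinate P_STA(x, y)
    p_end: Tuple[float, float]  # ending coordinate P_END(x, y)

    def contains(self, x: float, y: float) -> bool:
        return 0.0 <= x <= self.H and 0.0 <= y <= self.W

    def sample_vortices(self, n: int, rng: random.Random) -> List[Vortex]:
        """Place n vortices uniformly at random; value ranges are illustrative."""
        return [
            Vortex(
                x=rng.uniform(0.0, self.H),
                y=rng.uniform(0.0, self.W),
                core_radius=rng.uniform(0.5, 1.0),
                circulation=rng.choice([-1.0, 1.0]) * rng.uniform(5.0, 10.0),
            )
            for _ in range(n)
        ]
```

In such a sketch, the flow velocity returned by `velocity_at` would be added to the vessel's own velocity to perturb its ground speed.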
Further, step 2 specifically includes the following steps:
2-1, determining a random operator, and randomly initializing the position coordinates and radius of each static obstacle in the blank area, with the values drawn from a uniform distribution.
2-2, for each dynamic obstacle, randomly generating its initial position coordinates in the remaining blank area, and setting its end point to the position coordinate $P(x, y)$ of the unmanned surface vessel at the current moment, so that the end point changes continuously as the unmanned surface vessel moves.
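One way to realize step 2 is rejection sampling for the static obstacles and a simple pursuit update for the dynamic ones. The function names, the `clearance` margin around the start and end points, and the constant-speed straight-line pursuit rule are assumptions made for illustration; the text only specifies that positions are random, that obstacles occupy blank area, and that the dynamic obstacle's end point tracks the vessel.

```python
import math
import random
from typing import List, Sequence, Tuple

Circle = Tuple[float, float, float]  # (x, y, radius)


def sample_static_obstacles(
    n: int, H: float, W: float, r_min: float, r_max: float,
    keep_clear: Sequence[Tuple[float, float]], rng: random.Random,
    clearance: float = 1.0, max_tries: int = 10_000,
) -> List[Circle]:
    """Rejection-sample up to n non-overlapping circles inside the H x W area."""
    obstacles: List[Circle] = []
    tries = 0
    while len(obstacles) < n and tries < max_tries:
        tries += 1
        r = rng.uniform(r_min, r_max)  # radius drawn from a uniform distribution
        x = rng.uniform(r, H - r)
        y = rng.uniform(r, W - r)
        # "Blank area": the candidate must not overlap any existing obstacle.
        blank = all(math.hypot(x - ox, y - oy) > r + orad
                    for ox, oy, orad in obstacles)
        # Keep the start and end points free (assumed margin).
        clear = all(math.hypot(x - px, y - py) > r + clearance
                    for px, py in keep_clear)
        if blank and clear:
            obstacles.append((x, y, r))
    return obstacles


def chaser_step(chaser_xy, usv_xy, speed: float, dt: float):
    """Move a dynamic obstacle toward the vessel's current position P(x, y)."""
    dx, dy = usv_xy[0] - chaser_xy[0], usv_xy[1] - chaser_xy[1]
    dist = math.hypot(dx, dy)
    if dist < 1e-9:
        return chaser_xy
    step = min(speed * dt, dist)  # do not overshoot the target
    return (chaser_xy[0] + step * dx / dist, chaser_xy[1] + step * dy / dist)
```

Calling `chaser_step` once per simulation step with the vessel's latest position makes the chaser's end point move with the vessel, as described in 2-2.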
Further, step 3 includes the following steps:
3-1, simulating the working principle and output characteristics of each sensor based on the types of shipborne sensors, and generating the observation signal at the current moment.
3-2, simulating satellite and inertial device signals to provide the vessel's own position $P(x, y)$.
3-3, simulating Doppler velocimeter signals to provide the vessel's current ground speed $V(v_x, v_y)$.
3-4, simulating a lidar signal to provide the relative distances $D(d_1, d_2, \ldots, d_N)$ from the vessel to obstacles in the surrounding environment, wherein N is the number of laser beams generated when simulating the lidar signal.
3-5, concatenating the vessel's position information, speed information, and obstacle relative distance information, and using the result as the observation signal $s_t = \{P(x, y), V(v_x, v_y), D(d_1, d_2, \ldots, d_N)\}$ in the current environment.
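The observation construction of steps 3-4 and 3-5 can be sketched with an idealized lidar that casts N evenly spaced rays against circular obstacles. The function names, the `max_range` cutoff, the even angular spacing, and the absence of sensor noise are assumptions; the text does not specify the lidar model.

```python
import math
import numpy as np


def lidar_ranges(pos, heading, circles, n_beams, max_range):
    """Distance along each of n_beams rays to the nearest circular obstacle."""
    ranges = np.full(n_beams, max_range, dtype=float)
    angles = heading + 2.0 * np.pi * np.arange(n_beams) / n_beams
    for k, ang in enumerate(angles):
        ux, uy = math.cos(ang), math.sin(ang)
        for cx, cy, r in circles:
            # Solve |pos + t*u - c|^2 = r^2, i.e. t^2 + 2*b*t + c0 = 0.
            fx, fy = pos[0] - cx, pos[1] - cy
            b = fx * ux + fy * uy
            c0 = fx * fx + fy * fy - r * r
            disc = b * b - c0
            if disc < 0.0:
                continue  # the ray misses this circle
            t = -b - math.sqrt(disc)  # nearer intersection point
            if 0.0 <= t < ranges[k]:
                ranges[k] = t
    return ranges


def build_observation(pos, vel, ranges):
    """Concatenate P(x, y), V(v_x, v_y), and D(d_1..d_N) into one vector s_t."""
    return np.concatenate([np.asarray(pos, dtype=float),
                           np.asarray(vel, dtype=float),
                           np.asarray(ranges, dtype=float)])
```

For a vessel at the origin with an obstacle of radius 1 centered at (5, 0), the forward beam returns a distance of 4 and the remaining beams return `max_range`; the resulting observation has length N + 4.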
Further, step 4 includes the following steps:
4-1, designing a stack structure for the observation signals $s_t$ concatenated in step 3, stacking the signals along the time dimension, and constructing spatial feature information $S_t = \{s_{t-k}, \ldots, s_{t-1}, s_t\}$ that captures the changes in the spatial positions of the obstacles, wherein $s_t = \{P(x, y), V(v_x, v_y), D(d_1, d_2, \ldots, d_N)\}$. Motion features are extracted from the spatial feature information with a rectangular convolutional neural network.
4-2, after the neural network has aggregated the spatial feature information, further extracting the motion dependencies contained in it with a continuous-time neural network, so as to obtain discriminative features of the environment observation signals in the time dimension. Specifically, the continuous-time neural network is composed of the sub-networks {f, g, h}, and the three sub-networks share a basic backbone network structure.
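The text does not give the equations of the continuous-time network. One plausible reading, given three sub-networks {f, g, h} on a shared backbone, is a closed-form continuous-time (CfC) style cell in which f controls a time-dependent gate that interpolates between g and h. The sketch below assumes that form; the class name, layer sizes, activations, and random initialization are illustrative only.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class ContinuousTimeCell:
    """CfC-style cell: heads f, g, h on a shared backbone (assumed form)."""

    def __init__(self, input_dim, hidden_dim, backbone_dim, seed=0):
        rng = np.random.default_rng(seed)

        def init(rows, cols):
            return rng.normal(0.0, 1.0 / np.sqrt(cols), size=(rows, cols))

        self.W_b = init(backbone_dim, input_dim + hidden_dim)
        self.b_b = np.zeros(backbone_dim)
        self.W_f = init(hidden_dim, backbone_dim)
        self.W_g = init(hidden_dim, backbone_dim)
        self.W_h = init(hidden_dim, backbone_dim)
        self.hidden_dim = hidden_dim

    def step(self, x, inp, dt=1.0):
        # Shared backbone applied to the current input and the hidden state.
        z = np.tanh(self.W_b @ np.concatenate([inp, x]) + self.b_b)
        f = self.W_f @ z
        g = np.tanh(self.W_g @ z)
        h = np.tanh(self.W_h @ z)
        gate = sigmoid(-f * dt)  # time-dependent interpolation weight
        return gate * g + (1.0 - gate) * h

    def run(self, frames, dt=1.0):
        """Feed the stacked frames one by one, oldest first."""
        x = np.zeros(self.hidden_dim)
        for frame in frames:
            x = self.step(x, frame, dt)
        return x
```

Because the output is a convex combination of two tanh heads, every component of the hidden state stays within [-1, 1]; the final state would serve as the temporal feature passed on to the decision network.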
Further, step 5 includes the following steps:
5-1, determining the value-distribution reinforcement learning algorithm (IQN) as the network framework, which models the distribution of cumulative returns and uses this return distribution to guide decisions, thereby avoiding the neglect of risk changes in the scene that results from using only the mean return in every state.
5-2, defining the action space as changes in acceleration and angular velocity in a discrete space.
5-3, determining the reward function $r_t$: defining a reward factor for reaching the end point and a penalty factor for colliding with an obstacle, defining a time penalty factor, and defining a reward function directed toward the end point at the current time.
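IQN is usually trained with a quantile Huber loss between sampled quantiles of the current return distribution and of the target distribution. The sketch below shows that loss and a greedy action choice in NumPy; the function names and the threshold `kappa` are assumptions, and the text does not state which loss or action-selection rule the invention actually uses.

```python
import numpy as np


def quantile_huber_loss(current_q, taus, target_q, kappa=1.0):
    """Quantile Huber loss.

    current_q: (N,) quantile values Z_tau_i(s, a) at fractions taus (N,).
    target_q:  (M,) target samples r + gamma * Z_tau'_j(s', a*).
    """
    delta = target_q[None, :] - current_q[:, None]  # pairwise TD errors (N, M)
    abs_d = np.abs(delta)
    huber = np.where(abs_d <= kappa,
                     0.5 * delta ** 2,
                     kappa * (abs_d - 0.5 * kappa))
    # Asymmetric weight |tau - 1{delta < 0}| turns the Huber loss into a
    # quantile regression loss.
    weight = np.abs(taus[:, None] - (delta < 0.0).astype(float))
    return float((weight * huber / kappa).mean(axis=1).sum())


def greedy_action(quantiles):
    """Pick the action with the highest mean of its sampled return quantiles.

    quantiles: (num_actions, N) array.
    """
    return int(np.argmax(quantiles.mean(axis=1)))
```

With a quantile fraction of 0.9, an underestimate of the target by 1 costs 0.45 while an overestimate by 1 costs 0.05, which is what pushes each output toward its own quantile of the return distribution. A risk-averse variant could average only the low quantiles when selecting actions; whether the invention does so is not stated.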
Further, step 6 includes the following steps:
6-1, initializing the deep reinforcement learning network and the input state $s_t$, selecting an action $a_t$ in the current input state, executing the action to obtain the reward value $r_t$ at the current moment and the state $s_{t+1}$ at the next moment, and storing the experience tuple $(s_t, a_t, r_t, s_{t+1})$ in the experience replay pool.
6-2, determining the total capacity of the experience replay pool, and, once the number of accumulated experience tuples exceeds this capacity, sampling experience tuples in mini-batches, computing gradients by back propagation, and updating the network parameters with a gradient descent algorithm until the model converges.
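A minimal sketch of the experience replay pool of step 6 is given below. The class name `ReplayPool`, the first-in-first-out eviction rule, and the reading of 6-2 as "start sampling once the pool has been filled" are assumptions; the gradient update itself is omitted because it depends on the network implementation.

```python
import random
from collections import deque


class ReplayPool:
    """Fixed-capacity pool of experience tuples (s_t, a_t, r_t, s_{t+1})."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.buffer = deque(maxlen=capacity)  # oldest tuples are evicted first
        self.num_stored = 0                   # total tuples ever stored
        self.rng = random.Random(seed)

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))
        self.num_stored += 1

    def ready(self):
        # Assumed reading of 6-2: training starts once the pool is full.
        return self.num_stored >= self.capacity

    def sample(self, batch_size):
        """Draw a mini-batch without replacement, returned as per-field lists."""
        batch = self.rng.sample(list(self.buffer), batch_size)
        states, actions, rewards, next_states = map(list, zip(*batch))
        return states, actions, rewards, next_states

    def __len__(self):
        return len(self.buffer)
```

In a training loop, each environment step would call `store`, and once `ready()` returns true each step would also call `sample` and apply one gradient descent update to the network.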
Further, step 7 includes the following steps:
7-1, keeping the properties of the vortices, static obstacles, and dynamic obstacles in the experimental scene consistent with those in the training scene, while randomly initializing their positions and numbers according to the experimental requirements.
Example 1
The invention relates to a local obstacle avoidance technology for unmanned surface vessels, which aims to guide the vessel to autonomously plan its navigation path in water surface environments with dense obstacles, high dynamics, and high risk. In this embodiment, a deep reinforcement learning network model is trained in a simulation environment, and the verification of the algorithm's effectiveness in a single scene is taken as an example to describe in detail the specific steps of local anti-collision and path planning for the unmanned surface vessel. The relevant parameter configuration and operation flow are as follows:
Step 1, defining the local obstacle avoidance area as 50 m × 50 m, and setting the navigation starting point of the unmanned surface vessel to (5, 5) and its navigation end point to (45, 45). The unmanned surface vessel has a length of 1.255 m and a width of 0.29 m. The vortex locations are randomly initialized.
Step 2, randomly initializing static obstacles in the blank area, with obstacle radii uniformly distributed over [1, 3] m. The starting coordinates of the dynamic chaser obstacle are randomly initialized, wherein the chaser has a length of 1 m and a width of 0.5 m.
Step 3, constructing the composite observation vector shown in fig. 2, which comprises the vessel position provided by the satellite and inertial device signals, the speed information acquired by the Doppler velocimeter, and the relative distance information acquired by the lidar. The stack length is set to 4, so that four consecutive observation frames are stacked and used as the input of the deep reinforcement learning network.
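The stacking of four consecutive observation frames can be sketched with a bounded queue. The class name `FrameStack` and the rule of padding the stack with copies of the first frame at the start of an episode are assumptions; the text only fixes the stack length.

```python
from collections import deque

import numpy as np


class FrameStack:
    """Keeps the k most recent observation vectors, oldest first."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)  # the oldest frame drops out automatically

    def reset(self, first_obs):
        # Assumed padding rule: repeat the first frame until the stack is full.
        first = np.asarray(first_obs, dtype=float)
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(first)
        return self.state()

    def push(self, obs):
        self.frames.append(np.asarray(obs, dtype=float))
        return self.state()

    def state(self):
        """Stacked state with shape (k, obs_dim)."""
        return np.stack(list(self.frames))
```

With an observation of length N + 4 (position, speed, and N lidar distances), the stacked state has shape (4, N + 4), and its last row is always the most recent frame.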
Step 4, sending the composite observation signals into the spatial feature extraction and temporal feature extraction modules constructed as shown in fig. 2, which reduce the data dimension while aggregating the spatial and temporal features of the environment observation signals; the value-distribution reinforcement learning network then makes the decision and outputs the action signal.
Step 5, the action space comprises, in a discrete space, the acceleration changes [-0.2, 0, 0.2] m/s² and the angular velocity changes [-5, 0, 5]/s. In the reward function, the reward for reaching the target is set to 50, the collision penalty to -100, and the time penalty to -1, and the difference between the distance to the end point at the previous moment and the distance to the end point at the current moment is taken as a sub-reward factor.
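The numerical settings of this step can be written out as follows. Pairing every acceleration change with every angular velocity change (nine joint actions), treating the goal and collision terms as terminal rewards, and adding the time penalty to the progress term are assumptions about how the listed factors are combined; the angular unit is left unspecified because the text does not state it.

```python
import itertools
import math

ACCEL_CHANGES = (-0.2, 0.0, 0.2)    # m/s^2, from the example
ANGULAR_CHANGES = (-5.0, 0.0, 5.0)  # per second; angular unit not stated in the text
# Assumed pairing: every acceleration change with every angular velocity change.
ACTIONS = list(itertools.product(ACCEL_CHANGES, ANGULAR_CHANGES))

GOAL_REWARD = 50.0
COLLISION_PENALTY = -100.0
TIME_PENALTY = -1.0


def reward(prev_pos, pos, goal, reached, collided):
    """Reward r_t for one step (the combination of terms is an assumption)."""
    if collided:
        return COLLISION_PENALTY
    if reached:
        return GOAL_REWARD
    # Progress term: previous distance to the end point minus current distance.
    progress = math.dist(prev_pos, goal) - math.dist(pos, goal)
    return TIME_PENALTY + progress
```

Under this combination, a step that moves the vessel 1 m closer to the end point exactly cancels the time penalty, so only steps with more than 1 m of progress earn a positive reward.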
Step 6, training the neural network with the experience replay strategy until convergence.
Step 7, loading the network parameters trained to convergence in step 6 into the model, randomly generating a local navigation environment, and verifying the vessel's anti-collision capability and navigation path planning capability in the local environment, which yields the experimental result shown in fig. 4. In the figure, the orange rectangle represents the dynamic chaser, the green rectangle represents the unmanned surface vessel, and the gray circles represent static obstacles.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
While the foregoing describes specific embodiments of the present invention in further detail, it should be understood that this description is merely illustrative and is not intended to limit the scope of the invention; any modification, equivalent replacement, or improvement made within the spirit and principles of the invention falls within its scope of protection.