Intelligent automobile track prediction system and method integrating peripheral automobile interaction information
Technical Field
The invention belongs to the technical field of intelligent driving, and particularly relates to an intelligent automobile track prediction system and method integrating peripheral automobile interaction information.
Background
With the development of intelligent automobile technology and the rise of 5G communication technology, research on the automatic driving of vehicles by scholars at home and abroad has grown rapidly, and one of its main purposes is to reduce traffic accidents. The decision system, as a core part of the automatic driving technology of the vehicle, is required to predict in real time a driving track capable of avoiding surrounding obstacles, and is therefore important for the safe driving of the vehicle. The decision system acts as the automatic driving brain, planning a safe and reasonable optimal track for the intelligent automobile mainly according to the own-vehicle driving information sensed by the vehicle sensors and the information of other traffic participants, such as the positions, speeds and lane lines of surrounding vehicles acquired based on V2X.
The main research direction of the vehicle decision-making system is, given the state of the intelligent vehicle, to predict the future track according to the collected historical track data. The methods used can be divided into two classes: methods based on traditional physical models and methods based on neural network prediction. The first class, based on traditional physical models, generally uses models such as the constant velocity model, the bicycle model and the Kalman filtering model to generate the future track of the predicted vehicle from historical data representing its physical motion; however, such methods rarely consider the influence of surrounding vehicles, and their parameters must be re-tuned for each situation, so real-time performance and accuracy cannot be well guaranteed. The second class, based on neural network prediction, mainly uses the neural network to encode and decode the historical tracks of vehicles and generate future tracks; its effect has been shown to be superior to that of the traditional physical model methods, but the environmental data features are not fully mined, and the interaction information between the vehicle and its surrounding environment is not well utilized.
In fact, an intelligent automobile must share the road with surrounding vehicles while traveling, and its travel trajectory is also affected and constrained by the road environment, e.g., lane geometry, crosswalks, traffic lights, and the behavior of other vehicles. Starting from existing neural network methods, the invention provides an intelligent automobile track prediction method integrating peripheral automobile interaction information, by considering the influence of the running scene of the automobile and the surrounding vehicle environment on track prediction and combining a dynamic graph neural network with a lane graph neural network.
Disclosure of Invention
Aiming at the defects of the prior art, the invention aims to provide an intelligent automobile track prediction system and method fusing peripheral automobile interaction information, so as to solve the problem in the prior art that track prediction accuracy suffers because the interaction between the own vehicle and peripheral vehicles is neglected, and to provide a guarantee for the safe and efficient running of the intelligent automobile.
In order to achieve the above purpose, the invention adopts the following technical scheme:
the invention discloses an intelligent automobile track prediction system integrating peripheral vehicle interaction information, which, as shown in fig. 1, comprises four modules: a vehicle interaction relation extraction module, a driving scene characterization module, a space-time relation fusion module and a track prediction module. The vehicle interaction relation extraction module defines an influence threshold of 10 meters over the vehicle historical track data perceived by the sensors, mainly the position coordinates, constructs an interaction graph representing the interaction relation between the vehicle and surrounding vehicles, passes the original track sequence coordinates and the interaction graph into a GCN network, and outputs track data representing the inter-vehicle interaction relation graph.
The driving scene representation module constructs an interaction relation graph between lane segments according to the originally perceived map information M, i.e., it represents the predecessor, successor, left-neighbor and right-neighbor lanes of each lane, and then passes the interaction relation graph together with the original map information M into a lane graph convolution network, outputting map data representing the lane interaction relations.
The space-time relationship fusion module fuses the data of the two preceding modules: it first transmits the track information representing the inter-vehicle interaction relation graph to the map data, so as to grasp traffic congestion and lane usage; it then updates the map data fused with the track information through a lane graph convolution network, realizing real-time interconnection among lane segments and outputting map feature data implicitly containing vehicle information; finally, it feeds the updated real-time map features and the original track information back to the vehicle, so the output implicitly represents historical track information carrying both real-time map interaction and surrounding vehicle interaction.
The track prediction module takes as input the historical track data fused by the space-time relation fusion module, decodes the two-dimensional track coordinates of the vehicle at future moments through encoder and decoder processing, and, by stacking several encoders and decoders together with a classification loss, outputs multiple modes. The final output track coordinates are thus represented as several sets of future track values, representing several possible future tracks for the same vehicle.
Further, the vehicle interaction relation extraction module includes: the vehicle historical tracks X, the constructed vehicle interaction graph G, and a graph convolutional network GCN. The vehicle interaction graph G is constructed by receiving the historical track information X of the vehicle and surrounding vehicles and describing the interaction relations of the vehicles at the temporal and spatial levels in the form of graph matrices; the graph is input into the graph convolutional GCN network together with the vehicle historical tracks X to capture the complex interactions among different traffic vehicles and obtain historical track information carrying interaction information;
Further, the driving scene representation module comprises a high-definition vector map M, an interactive lane graph, a lane graph convolution GCN, and a fully connected layer FC1. Considering the influence of driving scene information (including lane center lines, steering and traffic control) on the track of a target vehicle, the interactive lane graph acquires the lane information of the high-definition vector map M; the lane information and the high-definition vector map M are input into the lane graph convolution GCN network, and the map feature information is extracted through the fully connected layer FC1;
Further, the space-time relation fusion module is divided into three units. Unit one receives the historical track information and the map feature information, introduces real-time vehicle information to the lane nodes through one attention layer and the fully connected layer FC2, acquires the usage condition of each lane, and outputs map data containing the historical track information of the vehicles. Unit two receives the output of unit one, the historical track information and the map feature information, and updates the lane node features by transmitting lane information through a lane graph convolution GCN layer and the fully connected layer FC3. Unit three receives the historical track information and the map feature information, and fuses the real-time traffic information with the features updated by unit two through an attention mechanism and the fully connected layer FC4. The three units form a stack of sequential cyclic fusion blocks that transmit real-time traffic information, acquiring the information flows between vehicles, between lanes, and between vehicles and lanes, and finally output the vehicle track information to the track prediction module;
Further, the trajectory prediction module comprises the encoder GRU and decoder GRU of a Seq2Seq structure and the coordinates of the last observed frame. The encoder GRU first receives the fused feature information from the space-time relation fusion module as input and encodes it along the time dimension; the encoding is then input into the decoder GRU together with the last observed frame coordinates, and the BEV track coordinate values of the future time steps are decoded step by step. A classification branch is used to predict the confidence score of each mode, obtaining K modal tracks of the own vehicle.
The invention also provides an intelligent automobile track prediction method integrating the peripheral vehicle interaction information, which comprises the following steps:
S1: first, preprocess the input, namely the historical tracks of the predicted vehicle and the surrounding vehicles and the interaction graph G between the vehicles; the historical tracks are processed into a three-dimensional array of size n × T_h × c, where n represents the n objects observed in the traffic scene over the past time steps, T_h represents the history horizon, and c = 2 represents the x and y coordinates of each object;
the interaction graph G between vehicles is represented as G = (V, E), where V represents the nodes of the graph, i.e., the observed vehicles, with the feature vector on each node being the coordinates of the object at time t; E represents the interaction edges between vehicles and is represented by adjacency matrices. Considering that vehicles are connected along both spatio-temporal dimensions, namely edges of mutual interaction between different vehicles generated by spatial proximity, and edges connecting each vehicle with its own historical moments in the time domain, the interaction graph G is expressed as a pair of adjacency matrices:
G = {A_0, A_1}
where A_0 is the temporal-edge adjacency matrix and A_1 is the spatial-edge adjacency matrix;
S2: mapping the history track and the interaction graph G to a high-dimensional convolution layer through a two-dimensional convolution layer; then carrying out space-time interaction through two layers of picture scroll lamination; the convolution kernel of the space-time interaction comprises two parts, namely an interaction graph G of a current observation frame and a trainable graph Gtrain with the same size as G; extracting space interaction information by using a convolution network with a convolution kernel as a sum of G and Gtrain, and processing data by using a time convolution layer with the size fixed as (1 multiplied by 3) through the convolution kernel on a time layer, so that data in the dimension of n multiplied byh multiplied by c are processed along the time dimension, and after the space layer and the time layer are alternately processed, outputting track data of an inter-vehicle interaction relation diagram with the dimension of n multiplied byh multiplied by c;
S3: extracting features according to the map data to obtain a structured map representation from the vectorized map data;
S3.1: first, construct a lane graph according to the map data: the obtained lane center line L_cen is expressed as a series of two-dimensional bird's-eye-view coordinate points; the information of any two connected lanes, namely the left-neighbor, right-neighbor, predecessor and successor lanes, is obtained and processed into four connectivity dictionaries keyed by lane id, respectively representing the predecessor lane L_pre, the successor lane L_suc, the left-neighbor lane L_left and the right-neighbor lane L_right of a given lane L, thereby obtaining the lane graph;
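The sketch below illustrates how the four connectivity dictionaries of S3.1 could be assembled; the input field names (lane_id, predecessors, successors, left_neighbors, right_neighbors) are hypothetical placeholders for whatever the map source actually provides:

```python
from typing import Dict, List

def build_lane_graph(lanes: List[dict]) -> Dict[str, Dict[int, List[int]]]:
    """Collect the four connectivity dictionaries, each keyed by lane id."""
    graph = {"pre": {}, "suc": {}, "left": {}, "right": {}}
    for lane in lanes:
        lid = lane["lane_id"]
        graph["pre"][lid] = lane.get("predecessors", [])
        graph["suc"][lid] = lane.get("successors", [])
        graph["left"][lid] = lane.get("left_neighbors", [])
        graph["right"][lid] = lane.get("right_neighbors", [])
    return graph

lanes = [
    {"lane_id": 0, "successors": [1], "left_neighbors": [2]},
    {"lane_id": 1, "predecessors": [0]},
    {"lane_id": 2, "right_neighbors": [0]},
]
print(build_lane_graph(lanes)["suc"])   # {0: [1], 1: [], 2: []}
```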
S3.2: features in the lane map and map data are then added, the features including: the method comprises the steps of inputting a lane serial number lid, a lane central line sequence point lcen, a lane steering condition lturn, whether a lane has traffic control lcon or not, whether the lane is an intersection linter or not, inputting the lane serial number lid and the lane central line sequence point lcen into a lane graph convolution GCN network together, and outputting map data containing lane interaction relations;
S4: fusing the track data of the inter-vehicle interaction relation diagram output in the step S2 with the map data containing the lane interaction relation output in the step S3.2, wherein the method comprises the following steps:
(1) Fusing the vehicle information to the lane nodes, and grasping the traffic congestion condition;
(2) Fusing and updating information between lane nodes to achieve real-time interconnection between lane segments;
(3) Fusing and feeding back map data characteristics and real-time traffic information to the vehicle;
The information update between lane nodes in part (2) adopts the lane graph convolution GCN: a graph convolution is constructed using the adjacency matrices carrying lane information to extract the lane interaction information;
The mutual transmission between vehicle information and lane information, i.e., parts (1) and (3), extracts the interaction features of three types of information, namely the input lane features, the vehicle features and the context node information, through a spatial attention mechanism; a context node is defined as a lane node whose l2 distance to a vehicle node is smaller than a threshold;
the network of part (1) is arranged as follows: the n × 128 two-dimensional lane position information and the n × 4 lane property features form the new map feature information, which together with the two-dimensional vehicle feature data serves as the unit input; after stacking two graph attention layers and one fully connected layer, the lane features carrying vehicle information are output, keeping the dimension n × 128; the lane property features comprise whether the lane turns, whether it has traffic control, and whether it is an intersection;
The network of part (3) is set up in the same way as part (1); it finally outputs the vehicle feature information containing lane information and lane interaction information, keeping the output dimension n × 128;
S5: output the final motion track prediction according to the vehicle feature information fused in S4; specifically:
For each vehicle agent, K possible future trajectories and their corresponding confidence scores are predicted; the prediction comprises two branches: a regression branch predicts the trajectory of each mode, and a classification branch predicts the confidence score of each mode. For the n-th participant, the K sequences of BEV coordinates are regressed in the regression branch using a Seq2Seq structure, as follows: first, the fused vehicle features are expanded to n × T_h × c and input into the Seq2Seq network, and the vectors representing the vehicle features are fed to the corresponding input units of the encoder; the hidden features of the encoder are then fed to the decoder together with the coordinates of the vehicle at the previous time step to predict the position coordinates of the current time step. In particular, the coordinates of the vehicle at the last historical time step serve as the input to the first decoding step, and the output of each step is fed to the next decoder unit; this decoding process is repeated until the model has predicted the position coordinates of all expected future time steps.
Further, in S2 the graph convolution is defined as Y = LXW, where X ∈ R^(N×F) represents the node features, W ∈ R^(F×O) represents the weight matrix, Y ∈ R^(N×O) represents the output, N represents the total number of input nodes, F the feature number of the input nodes, and O the feature number of the output nodes; the graph Laplacian matrix L ∈ R^(N×N) is expressed as:

L = D^(-1/2) (I + A) D^(-1/2)

where I, A and D are the identity matrix, the adjacency matrix and the degree matrix, respectively; I and A represent the self-connections and the connections between different nodes, all connections share the same weight W, and the degree matrix D is used to normalize the data.
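A plain NumPy illustration of this graph convolution, under the Laplacian form reconstructed above (a sketch, not the invention's code):

```python
import numpy as np

def graph_conv(X: np.ndarray, A: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Y = L X W with L = D^{-1/2}(I + A)D^{-1/2}."""
    A_hat = A + np.eye(len(A))               # I + A: add self-connections
    d = A_hat.sum(axis=1)                    # node degrees (diagonal of D)
    L = A_hat / np.sqrt(np.outer(d, d))      # symmetric degree normalization
    return L @ X @ W

X = np.random.randn(5, 2)                    # N = 5 nodes, F = 2 input features
A = np.zeros((5, 5)); A[0, 1] = A[1, 0] = 1  # one spatial interaction edge
W = np.random.randn(2, 8)                    # O = 8 output features
print(graph_conv(X, A, W).shape)             # (5, 8)
```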
Further, before performing the graph convolution in S2, the interaction graph G is normalized:

A_j ← D_j^(-1/2) A_j D_j^(-1/2)

where A refers to the adjacency matrix, D to the degree matrix, and j indexes the data sequence: A_j is the adjacency matrix constructed for the j-th data sequence, and D_j is the degree matrix of the j-th data sequence, computed as:

D_j^(ii) = Σ_k A_j^(ik) + α

The degree matrix D_j is a diagonal matrix that counts, for each node i, the nodes adjacent to it among the k nodes; α is set to 0.001 to avoid empty rows in A_j.
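A minimal sketch of this normalization, assuming the symmetric form given above with α padding the degrees:

```python
import numpy as np

def normalize_adjacency(A_j: np.ndarray, alpha: float = 0.001) -> np.ndarray:
    """Symmetric normalization D_j^{-1/2} A_j D_j^{-1/2} for one data sequence."""
    deg = A_j.sum(axis=1) + alpha            # alpha = 0.001 keeps rows non-empty
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    return D_inv_sqrt @ A_j @ D_inv_sqrt

A_1 = np.array([[0.0, 1.0], [1.0, 0.0]])     # two interacting vehicles
print(normalize_adjacency(A_1).round(3))     # off-diagonal entries near 1
```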
Further, the lane graph convolution GCN network in S3.2 is expressed as:

Y = XW + Σ_i A_i X W_i,  i ∈ {predecessor, successor, left-neighbor, right-neighbor}

where A_i and W_i refer to the adjacency matrix and the weight matrix corresponding to the i-th lane connection type, and X refers to the node feature matrix. The corresponding node feature x_i is the i-th row of the node feature matrix X and represents the input feature of the i-th lane node, comprising the shape feature and the position feature of the lane, namely:

x_i = [v_i^end − v_i^start, v_i],  v_i = (v_i^start + v_i^end) / 2

where v_i refers to the location of the i-th lane node, i.e. the midpoint between the two end points of the lane segment, and v_i^start and v_i^end refer respectively to the start and end position coordinates of the i-th lane segment.
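A small worked example of this node feature, assuming it is the concatenation of the shape vector and the midpoint location:

```python
import numpy as np

def lane_node_feature(start: np.ndarray, end: np.ndarray) -> np.ndarray:
    v_i = (start + end) / 2.0       # node location: midpoint of the segment
    shape = end - start             # segment shape: direction and length
    return np.concatenate([shape, v_i])

print(lane_node_feature(np.array([0.0, 0.0]), np.array([4.0, 2.0])))
# [4. 2. 2. 1.]
```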
The invention has the beneficial effects that:
(1) The invention provides a graph convolutional neural network considering surrounding vehicle interaction, solving the problem that existing track prediction algorithms do not consider the information interaction of surrounding vehicles.
(2) The invention provides a method for extracting map information from a high-definition vector map instead of a bird's-eye view; the vector map defines the geometric shape of the lanes, reducing the prediction discretization caused by resolution limits.
(3) The invention provides a method for fusing the space-time relationship between the vehicle and the driving scene, and introduces new lane characteristics to represent the generalized geometric relationship between the vehicle and the lanes, thereby effectively improving the accuracy of track prediction when facing lanes with different shapes and numbers.
(4) The invention provides a stacked multi-Seq2Seq structure for predicting the future multi-modal tracks of a vehicle and the probabilities of selecting different lanes, overcoming the limitation of a single track output.
Drawings
FIG. 1 is a schematic diagram of the predictive model architecture of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
(I) Modeling analysis of the trajectory prediction problem
The trajectory prediction problem can be expressed as the problem of predicting the trajectory of a vehicle in a future scene based on the historical trajectory information of all objects. Specifically, the input to the model is the historical track X of all observed objects over the history horizon T_h:

X = [X^1, X^2, …, X^(T_h)],  X^t = [(x_1^t, y_1^t), (x_2^t, y_2^t), …, (x_n^t, y_n^t)]

where (x_i^t, y_i^t) denotes the lateral and longitudinal positions of the i-th of the n observed vehicles at time t.
Further, in consideration of the influence of the static environment around the vehicle on its running, the invention simultaneously takes into account the map lane information within the scene; therefore, in addition to the historical tracks of the vehicles, the input also includes the map data M of the scene:

M = [l_id, l_cen, l_turn, l_con, l_inter]

where l_id denotes the lane id, l_cen the lane center line sequence points, l_turn the lane steering condition, l_con whether the lane has traffic control, and l_inter whether the lane is an intersection.
After model training, the future coordinate sequence Y from T_h + 1 to T_h + T_f is output:

Y = [Y^(T_h + 1), …, Y^(T_h + T_f)]
The raw data need to be preprocessed before being input into the model. First, the predicted vehicle and the surrounding vehicles in the traffic scene are sampled at a frequency of 10 Hz to obtain the position coordinates of all sampling points, i.e., the lateral and longitudinal coordinates of each vehicle. The coordinates of the predicted vehicle are set to (0, 0), and the coordinates of the surrounding vehicles are converted into relative coordinates with the predicted vehicle as the origin, which enhances the generalization and robustness of the model. The track information of the next 3 s is then predicted from the track information of the previous 2 s, which serves as the history data.
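A hedged sketch of this preprocessing; the choice of the last observed position of the predicted vehicle as the origin is an assumption consistent with the text:

```python
import numpy as np

def preprocess(tracks: np.ndarray, target_idx: int, t_h: int = 20, t_f: int = 30):
    """tracks: (n, t_h + t_f, 2) absolute (x, y) positions sampled at 10 Hz."""
    origin = tracks[target_idx, t_h - 1]          # last observed target position
    rel = tracks - origin                         # predicted vehicle at (0, 0)
    return rel[:, :t_h], rel[target_idx, t_h:]    # 2 s history X, 3 s future Y

tracks = np.cumsum(np.random.randn(4, 50, 2), axis=1)  # 4 vehicles, 5 s at 10 Hz
X, Y = preprocess(tracks, target_idx=0)
print(X.shape, Y.shape)   # (4, 20, 2) (30, 2)
```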
(II) Designing the model to implement trajectory prediction
As shown in fig. 1, a track prediction model for fusing surrounding vehicle interaction information according to the present invention includes: the system comprises a vehicle interaction relation extraction module, a driving scene characterization module, a space-time relation fusion module and a track prediction module. The track prediction method by using the model comprises the following steps:
The input of the vehicle interaction relation extraction module comprises two parts, namely a history track of the predicted vehicle and surrounding vehicles and an interaction graph G between the vehicles.
The input is first preprocessed: the historical tracks are processed into a three-dimensional array of size n × T_h × c, where n represents the n objects observed in the traffic scene over the past time steps, T_h denotes the history horizon, and c = 2 represents the x and y coordinates of each object.
The interaction graph G between vehicles is represented as G = (V, E), where V represents the nodes of the graph, i.e., the observed vehicles, with the feature vector on each node being the coordinates of the object at time t; E represents the interaction edges between vehicles and is represented by adjacency matrices at model input. Considering that vehicles are connected along both spatio-temporal dimensions, namely edges of mutual interaction between different vehicles generated by spatial proximity, and edges connecting each vehicle with its own historical moments in the time domain, the interaction graph G is expressed as:
G = {A_0, A_1}
where A_0 is the temporal-edge adjacency matrix and A_1 is the spatial-edge adjacency matrix.
In the data processing, considering all tracks at the last history frame, a 10-meter area centered on the predicted vehicle (an empirical value) is defined as the influence radius r; the distance l between each peripheral vehicle and the predicted vehicle is calculated, and when l ≤ r the two vehicles are considered to interact, so the corresponding adjacency matrix entry is set to 1, and otherwise to 0. The spatial matrix A_1 at the current observation time is thus constructed, and A_0 is taken as the identity matrix I in the vehicle time domain.
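The construction of A_0 and A_1 can be sketched as follows; function and variable names are illustrative:

```python
import numpy as np

def build_interaction_graph(pos: np.ndarray, target_idx: int = 0, r: float = 10.0):
    """pos: (n, 2) vehicle positions at the last history frame."""
    n = len(pos)
    A1 = np.zeros((n, n))
    dist = np.linalg.norm(pos - pos[target_idx], axis=1)   # l <= r => interaction
    for i in np.flatnonzero(dist <= r):
        if i != target_idx:
            A1[target_idx, i] = A1[i, target_idx] = 1.0    # spatial edge
    A0 = np.eye(n)                                         # temporal self-edges
    return A0, A1

pos = np.array([[0.0, 0.0], [3.0, 4.0], [30.0, 0.0]])
A0, A1 = build_interaction_graph(pos)
print(A1)   # only the vehicle 5 m away is connected to the predicted vehicle
```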
After data processing, the historical tracks and the interaction graph G are passed into the vehicle interaction relation extraction module and mapped to a high-dimensional space through one 2D convolution layer; spatio-temporal interaction is then performed through two graph convolution layers. Considering the time-varying nature of the spatial interaction, the convolution kernel of the spatial interaction consists of the sum of two parts, namely the interaction graph G of the current observed frame and a trainable graph G_train that matches the size of G and participates in training.
The graph convolution is defined as Y = LXW, where X ∈ R^(N×F) represents the node feature matrix, W ∈ R^(F×O) represents the weight matrix, and Y ∈ R^(N×O) represents the output (N is the total number of input nodes, F the feature number of the input nodes, and O the feature number of the output nodes); the graph Laplacian matrix L ∈ R^(N×N) is expressed as:

L = D^(-1/2) (I + A) D^(-1/2)

where I, A and D are the identity matrix, the adjacency matrix and the degree matrix, respectively. I and A represent the self-connections and the connections between different nodes. All connections share the same weight W, and the degree matrix D is used to normalize the data.
Therefore, before performing the graph convolution operation, in order to keep the value range of the graph elements unchanged after the graph operation, the invention normalizes the interaction graph G using:

A_j ← D_j^(-1/2) A_j D_j^(-1/2)

where A refers to the adjacency matrix, D to the degree matrix, and j indexes the data sequence: A_j is the adjacency matrix constructed for the j-th data sequence, and D_j is the corresponding degree matrix, computed as:

D_j^(ii) = Σ_k A_j^(ik) + α

The degree matrix D_j is a diagonal matrix that counts, for each node i, the nodes adjacent to it among the k nodes; α is set to 0.001 to avoid empty rows in A_j.
In this way, the spatial interaction information is extracted through the graph convolution network with kernel G + G_train, and the data of dimension n × T_h × c are then processed along the time dimension (the second dimension) by a temporal convolution layer whose kernel is fixed at (1 × 3). After the spatial and temporal layers are applied alternately, data of unchanged dimension n × T_h × c are output, which are subsequently fused with the output data of the driving scene representation module.
The driving scene representation module extracts features from the input map data M and learns a structured map representation from the vectorized map data. Before entering the module, a lane graph must first be built from the map data M: from the obtained lane center line l_cen (expressed as a series of two-dimensional bird's-eye-view coordinate points), any two connected lanes, namely the left-neighbor, right-neighbor, predecessor and successor lanes, can be obtained. These data are processed into four connectivity dictionaries keyed by lane id, respectively representing the predecessor lane L_pre, the successor lane L_suc, the left-neighbor lane L_left and the right-neighbor lane L_right of a given lane L, thereby obtaining the lane graph. The interactive lane graph is then input into the lane graph convolution GCN network together with the other features in the map data M, including the lane id l_id, the lane center line sequence points l_cen, the lane steering condition l_turn, whether the lane has traffic control l_con, and whether the lane is an intersection l_inter. The invention deforms the conventional graph convolution to obtain the lane graph convolution GCN network, which is expressed as:
Y = XW + Σ_i A_i X W_i,  i ∈ {predecessor, successor, left-neighbor, right-neighbor}

where A_i and W_i refer to the adjacency matrix and the weight matrix corresponding to the i-th lane connection type, and X refers to the node feature matrix. The corresponding node feature x_i is the i-th row of the node feature matrix X and represents the input feature of the i-th lane node, comprising the shape feature and the position feature of the lane, namely:

x_i = [v_i^end − v_i^start, v_i],  v_i = (v_i^start + v_i^end) / 2

where v_i refers to the location of the i-th lane node, i.e. the midpoint between the two end points of the lane segment, and v_i^start and v_i^end refer respectively to the start and end position coordinates of the i-th lane segment.
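A hedged PyTorch sketch of this lane graph convolution, with one weight matrix per connection type as in the formula reconstructed above (the module name LaneGraphConv is illustrative):

```python
import torch
import torch.nn as nn

class LaneGraphConv(nn.Module):
    """Y = X W + sum_i A_i X W_i over the four lane connection types."""
    def __init__(self, in_dim: int, out_dim: int, n_relations: int = 4):
        super().__init__()
        self.self_fc = nn.Linear(in_dim, out_dim, bias=False)     # W
        self.rel_fc = nn.ModuleList(                              # W_i per type
            [nn.Linear(in_dim, out_dim, bias=False) for _ in range(n_relations)])

    def forward(self, X, A):
        # A: list of adjacency matrices [A_pre, A_suc, A_left, A_right]
        Y = self.self_fc(X)
        for A_i, fc in zip(A, self.rel_fc):
            Y = Y + A_i @ fc(X)      # aggregate neighbors of connection type i
        return Y

X = torch.randn(3, 4)                # 3 lane nodes, 4-dim input features
A_suc = torch.tensor([[0., 1., 0.], [0., 0., 0.], [0., 0., 0.]])  # edge 0 -> 1
A = [torch.zeros(3, 3), A_suc, torch.zeros(3, 3), torch.zeros(3, 3)]
print(LaneGraphConv(4, 16)(X, A).shape)   # torch.Size([3, 16])
```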
Considering that, within a fixed history period, a vehicle traveling too fast produces a long historical track segment, which usually occurs in straight lane segments, a dilated convolution can be adopted in straight segments, enlarging the adjacency matrix to expand the receptive field. After the convolution network, the lane features are output through one fully connected layer of dimension n × 128.
The space-time relationship fusion module mainly fuses the features output by the vehicle interaction relation extraction module and the driving scene representation module, realizing in sequence: (1) transmitting the vehicle information to the lane nodes, grasping lane congestion and other usage conditions; (2) updating the information between lane nodes to realize real-time interconnection between lane segments; (3) fusing the updated map features and the real-time traffic information and feeding them back to the vehicle. The information update between lane nodes in part (2) again adopts the lane graph convolution GCN: a graph convolution is constructed with the adjacency matrices carrying lane information to extract the lane interaction information. The mutual transmission between vehicle information and lane information, i.e., parts (1) and (3), extracts the interaction features of three types of information, namely the input lane features, the vehicle features and the context node information, through a spatial attention mechanism. Here a context node is defined as a lane node whose l2 distance to a vehicle node is smaller than a threshold, where the threshold may take an empirical value of 6 meters. The network of part (1) is arranged as follows: the n × 128 two-dimensional lane position information and the n × 4 lane property features (whether the lane turns, has traffic control, or is an intersection) extracted by the driving scene representation module form the new map feature information, which together with the two-dimensional vehicle feature data serves as the unit input; after stacking two graph attention layers and one fully connected layer, the lane features carrying vehicle information are output with dimension n × 128. The network structure of part (3) is consistent with that of part (1); it finally extracts the vehicle feature information containing lane information and lane interaction information, keeping the output dimension n × 128.
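One vehicle-to-lane fusion unit of part (1) might be sketched as below, assuming a standard attention layer restricted to context nodes within the 6-meter threshold; the residual update and the module name are assumptions:

```python
import torch
import torch.nn as nn

class VehicleToLaneAttention(nn.Module):
    """Lane nodes attend over vehicle nodes within the context radius."""
    def __init__(self, dim: int = 128, radius: float = 6.0):
        super().__init__()
        self.radius = radius
        self.attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        self.fc = nn.Linear(dim, dim)

    def forward(self, lane_feat, lane_pos, veh_feat, veh_pos):
        # block vehicles beyond the 6 m context-node threshold (True = blocked)
        mask = torch.cdist(lane_pos, veh_pos) > self.radius
        out, _ = self.attn(lane_feat.unsqueeze(0), veh_feat.unsqueeze(0),
                           veh_feat.unsqueeze(0), attn_mask=mask.unsqueeze(0))
        return self.fc(out.squeeze(0)) + lane_feat     # residual lane update

lane_feat, veh_feat = torch.randn(5, 128), torch.randn(3, 128)
lane_pos, veh_pos = torch.rand(5, 2) * 4, torch.rand(3, 2) * 4   # all within 6 m
print(VehicleToLaneAttention()(lane_feat, lane_pos, veh_feat, veh_pos).shape)
# torch.Size([5, 128])
```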
The track prediction module takes the fused vehicle feature information as input, and a multi-modal prediction head outputs the final motion track prediction. For each vehicle agent, K possible future trajectories and corresponding confidence scores are predicted. The prediction module thus has two branches: a regression branch predicts the trajectory of each mode, and a classification branch predicts the confidence score of each mode. For the n-th participant, the K sequences of BEV coordinates are regressed using a Seq2Seq structure in the regression branch. The specific process is as follows: first, the fused vehicle features are expanded to n × T_h × c and input into the Seq2Seq network, and the vectors representing the vehicle features are fed to the corresponding input units of the encoder GRU (one per time step); the hidden features of the encoder GRU, together with the coordinates of the vehicle at the previous time step, are then fed to the decoder GRU to predict the position coordinates of the current time step. Specifically, the input to the first decoding step is the coordinates of the vehicle at the last historical moment, and the output of each step is fed to the next GRU unit. This decoding process is repeated until the model has predicted the position coordinates of all expected future time steps.
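A minimal sketch of the Seq2Seq regression branch described above, with one mode shown; the hidden size and the module name are assumptions:

```python
import torch
import torch.nn as nn

class Seq2SeqPredictor(nn.Module):
    """GRU encoder over fused features; GRU decoder fed back its own outputs."""
    def __init__(self, feat_dim: int = 128, hidden: int = 64, t_f: int = 30):
        super().__init__()
        self.t_f = t_f
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)
        self.decoder = nn.GRUCell(2, hidden)
        self.out = nn.Linear(hidden, 2)          # BEV (x, y) per future step

    def forward(self, feats: torch.Tensor, last_pos: torch.Tensor):
        """feats: (n, T_h, feat_dim); last_pos: (n, 2) last historical coords."""
        _, h = self.encoder(feats)               # hidden features of the encoder
        h, pos, preds = h.squeeze(0), last_pos, []
        for _ in range(self.t_f):
            h = self.decoder(pos, h)             # decode one future time step
            pos = self.out(h)                    # output fed to the next step
            preds.append(pos)
        return torch.stack(preds, dim=1)         # (n, t_f, 2)

pred = Seq2SeqPredictor()(torch.randn(4, 20, 128), torch.zeros(4, 2))
print(pred.shape)   # torch.Size([4, 30, 2]) - 3 s of future positions at 10 Hz
```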
(III) Model training
The invention collects real vehicle data in a continuous time period in a track prediction implementation scene as a data set of model training, and a training set, a verification set and a test set used for model training are all taken from the data set.
The invention trains the model using the PyTorch framework; the model uses the Adam optimizer to accelerate learning, with the learning rate of the Adam optimizer set to 0.001, so that training can locate the global optimum more accurately. The loss function consists of the sum of a lane classification error and a track regression error, where the lane classification loss adopts a binary-classification margin loss and the track regression loss adopts the root mean square error (RMSE) loss. The evaluation uses FDE, the l2 distance between the end point of the best predicted trajectory and the ground truth, and ADE, the average l2 distance between the best predicted trajectory and the ground truth.
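The losses and metrics named here can be illustrated as follows; the best-mode selection rule and the exact loss composition are assumptions:

```python
import torch

def rmse_loss(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """Per-mode RMSE between predicted and ground-truth trajectories."""
    return torch.sqrt(((pred - gt) ** 2).sum(-1).mean(-1))   # (K,)

def ade_fde(pred: torch.Tensor, gt: torch.Tensor):
    """ADE/FDE of the best of K predicted trajectories."""
    err = torch.linalg.norm(pred - gt, dim=-1)    # (K, t_f) pointwise l2 error
    best = err.mean(-1).argmin()                  # mode closest on average
    return err[best].mean().item(), err[best, -1].item()

pred = torch.randn(6, 30, 2)                      # K = 6 modes, 3 s at 10 Hz
gt = torch.randn(30, 2)
print("regression loss:", rmse_loss(pred, gt).min().item())
ade, fde = ade_fde(pred, gt)
print(f"ADE={ade:.2f}  FDE={fde:.2f}")
```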
The number of training rounds is adjusted in real time according to actual demands and training effects, and the model parameter file is saved after each training round.
The above list of detailed descriptions is only specific to practical embodiments of the present invention, and they are not intended to limit the scope of the present invention, and all equivalent manners or modifications that do not depart from the technical scope of the present invention should be included in the scope of the present invention.