Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing an uncertainty-aware lane change decision control method for an autonomous vehicle. By taking the uncertainty of the predicted trajectory of the other vehicle into consideration, the method improves the reliability of the decision result and thereby enhances the comfort and safety of vehicle driving.
The aim of the invention can be achieved by the following technical scheme: a lane change decision control method for an autonomous vehicle considering uncertainty, comprising the following steps:
S1, based on the perceived state of the own vehicle and the predicted state of the other vehicle, constructing a state space and an action space in combination with a vehicle dynamics model, and establishing a state transition equation;
S2, respectively establishing an observation space model and a belief space model of the own vehicle, an observation space model and a belief space model of the other vehicle and an uncertainty model of a predicted track of the other vehicle;
S3, setting a reward function, combining steps S1 and S2 to construct a POMDP (Partially Observable Markov Decision Process) model, and solving for a decision state point set of the own vehicle;
S4, decoupling the decision state point set of the own vehicle into a transverse space decision set and a longitudinal time decision set;
S5, determining a transverse drivable boundary, introducing road boundary constraint, vehicle speed constraint and obstacle distance constraint, and dividing a transverse drivable region and a reference path;
determining a longitudinal drivable boundary, introducing road boundary constraint, vehicle speed constraint and obstacle distance constraint, and dividing a longitudinal drivable region and a reference speed curve;
S6, the vehicle planning module outputting the corresponding optimal trajectory of the own vehicle according to the transverse drivable region and reference path and the longitudinal drivable region and reference speed curve divided in step S5, so that the own vehicle travels along the optimal trajectory.
Further, the step S1 specifically includes the following steps:
S11, simplifying the motion of a vehicle into the motion of particles in a Frenet coordinate system by using a simplified vehicle kinematic model so as to construct a state space and an action space of the vehicle;
S12, using the state space and the action space to establish state transfer functions of the own vehicle and the other vehicle.
Further, the step S11 specifically includes the following steps:
S111, acquiring the perceived position, longitudinal speed, longitudinal acceleration and lateral speed of the own vehicle to form the state space of the own vehicle;
acquiring the predicted heading angle, position, longitudinal speed, longitudinal acceleration and lateral speed of the other vehicle, and combining them with the length and width of its vehicle body to form the state space of the other vehicle;
S112, setting a discrete transverse acceleration sequence and a discrete longitudinal acceleration sequence as the action space.
Further, the specific process of step S12 is as follows: the evolution process of the states of the own vehicle and the other vehicle is assumed to be independent, a state transition equation of the own vehicle is obtained according to a simplified vehicle motion model, and the state transition equation of the other vehicle is expressed by probabilities of different states at the next moment.
Further, the observation function of the observation space model in step S2 is defined separately for the own vehicle and the other vehicle, wherein the observation function of the own vehicle is represented by a Boolean value, which is set to 1 if the next state exists and to 0 otherwise;
the observation function of the other vehicle follows a Gaussian distribution;
in step S2, the uncertainty model of the predicted trajectory of the other vehicle is constructed by adopting a multivariate Gaussian distribution.
Further, the step S3 specifically includes the following steps:
S31, setting a reward function comprising a safety reward function $R_{safe}$, a comfort reward function $R_{comfort}$ and an efficiency reward function $R_{efficiency}$, wherein the safety reward function $R_{safe}$ comprises a collision reward function $R_{colli}$ and a distance reward function $R_{dis}$;
the comfort reward function $R_{comfort}$ comprises a speed-related comfort reward function $R_{speed}$, a lateral speed action penalty function $R_{vlat}$ and a continuity index $R_{continuity}$;
the efficiency reward function $R_{efficiency}$ comprises a target task reward function $R_{lane}$ and a target speed function $R_{v\_tar}$;
S32, solving the POMDP model by the Determinized Sparse Partially Observable Tree (DESPOT) method based on the constructed state space, action space, belief space and reward function to obtain the decision state point set of the own vehicle, wherein the decision state point set comprises position information of the own vehicle and its speed and acceleration information at different moments.
Further, in step S31, the collision reward function $R_{colli}$ is designed according to the probability of collision, that is, a penalty is given when the collision probability exceeds a set threshold;
in step S31, the distance reward function $R_{dis}$ is designed according to a time-to-collision (TTC) model, and upper and lower limits of the safe distance are set;
in step S31, the factors influencing the speed-related comfort reward function $R_{speed}$ include the current longitudinal speed $v_{lon}^{cur}$, longitudinal acceleration $a_{lon}^{cur}$ and lateral speed $v_{lat}^{cur}$, and the longitudinal speed $v_{lon}^{next}$, longitudinal acceleration $a_{lon}^{next}$ and lateral speed $v_{lat}^{next}$ at the next moment;
in step S31, the lateral speed action penalty function $R_{vlat}$ depends only on the lateral speed;
in step S31, the continuity index $R_{continuity}$ is positively correlated with the position change between two successive decisions: the larger the position change between the previous and the current decision, the larger $R_{continuity}$.
Further, the lateral space decision set in step S4 includes location information of the own vehicle; the longitudinal time decision set includes speed and acceleration information of the own vehicle.
Further, in step S5, the transverse drivable boundary is selected such that the distance from each boundary node to the obstacle is greater than the safety distance, and the transverse drivable boundary value does not exceed the original boundary of the structured road and keeps a safety-threshold distance from it;
The establishment process of the transverse reference path specifically comprises the following steps:
firstly, constructing a transverse optimization problem and designing a transverse cost function to evaluate each node, the point at which the transverse cost function is minimal being taken as the optimal solution, wherein the constraint of the transverse optimization problem is that the heading angle lies between the minimum and maximum wheel steering angles, the objective function of the transverse optimization problem is the transverse cost function, and the transverse cost function $C_{node\text{-}h}$ is a weighted sum of a first distance cost $C_d$, a first safety cost $C_o$ and a continuity cost $C_c$;
calculating the distance from the node to the target state point as the first distance cost; calculating the distance from the node to the obstacle as the first safety cost; calculating the position change rate between successive nodes as the continuity cost; and combining the corresponding weight coefficients to obtain the transverse cost function;
and finally, connecting the points at which the transverse cost function is minimal to form the transverse expected reference path.
Further, the principle of determining the longitudinal drivable boundary in step S5 is as follows: the longitudinal drivable boundary coincides with the parking space coordinates of the obstacle;
the constraint on the longitudinal drivable boundary is: the s-t curve of the longitudinal drivable boundary does not exceed the s-t curves representing the highest and lowest average vehicle speeds;
the establishment process of the longitudinal reference speed curve specifically comprises the following steps:
firstly, constructing a longitudinal optimization problem and designing a longitudinal cost function to evaluate each node, the point at which the longitudinal cost function is minimal being taken as the optimal solution, wherein the longitudinal cost function $C_{node\text{-}z}$ is a weighted sum of a second distance cost $C_{d\text{-}ref}$, a second safety cost $C_{o\text{-}ref}$ and a speed change cost $C_v$;
calculating the distance from the node to the line connecting the state points as the second distance cost; calculating the square of the difference between the distance from the node to the obstacle along the s-axis and the distance threshold from the obstacle as the second safety cost; calculating the change rate of the reference speed as the speed change cost; and combining the corresponding weight coefficients to obtain the longitudinal cost function;
and finally, connecting the points at which the longitudinal cost function is minimal to form the longitudinal expected reference speed curve.
Compared with the prior art, the method fully considers the uncertainty of the predicted trajectory of the other vehicle: it establishes observation space models and belief space models of the own vehicle and the other vehicle together with an uncertainty model of the predicted trajectory of the other vehicle, and, combined with the set reward function, obtains the decision state point set of the own vehicle based on the POMDP model. The determined state point set of the own vehicle is therefore more stable and reliable, the comfort and safety of vehicle driving are enhanced, and the method is well suited to unmanned driving under typical driving conditions such as car following, lane changing and autonomous overtaking.
After the decision state point set of the own vehicle is obtained, it is decoupled into a transverse space decision set and a longitudinal time decision set. By introducing road boundary constraint, vehicle speed constraint and obstacle distance constraint, the transverse drivable region and reference path and the longitudinal drivable region and reference speed curve are divided separately, which further ensures the reliability of the decision result.
Detailed Description
The invention will now be described in detail with reference to the drawings and specific examples.
Examples
As shown in fig. 1, a lane change decision control method for an autonomous vehicle considering uncertainty comprises the following steps:
S1, based on the perceived state of the own vehicle and the predicted state of the other vehicle, constructing a state space and an action space in combination with a vehicle dynamics model, and establishing a state transition equation;
S2, respectively establishing an observation space model and a belief space model of the own vehicle, an observation space model and a belief space model of the other vehicle and an uncertainty model of a predicted track of the other vehicle;
S3, setting a reward function, combining steps S1 and S2 to construct a POMDP model, and solving for a decision state point set of the own vehicle;
S4, decoupling the decision state point set of the own vehicle into a transverse space decision set and a longitudinal time decision set;
S5, determining a transverse drivable boundary, introducing road boundary constraint, vehicle speed constraint and obstacle distance constraint, and dividing a transverse drivable region and a reference path;
determining a longitudinal drivable boundary, introducing road boundary constraint, vehicle speed constraint and obstacle distance constraint, and dividing a longitudinal drivable region and a reference speed curve;
S6, the vehicle planning module outputting the corresponding optimal trajectory of the own vehicle according to the transverse drivable region and reference path and the longitudinal drivable region and reference speed curve divided in step S5, so that the own vehicle travels along the optimal trajectory.
The lane change decision process of this embodiment is shown in fig. 2 and includes:
1. Based on the perceived state of the own vehicle and the predicted state of the other vehicle, constructing a state space and an action space according to the simplified vehicle dynamics model, and establishing a state transition equation;
the method specifically comprises the following steps:
11) using a simplified vehicle kinematic model, simplifying the motion of the vehicle into the motion of a particle in the Frenet coordinate system, so as to construct the state space and the action space of the vehicle;
111) the state space is expressed as:
$S = [state_{ego}, state_1, state_2, \ldots, state_n]$
wherein $state_{ego}$ is the state space of the own vehicle, $time$ is a time stamp, $(s_{ego}, l_{ego})$ is the position of the own vehicle, $v_{lon}^{ego}$ is the longitudinal speed of the own vehicle, $acc_{ego}$ is the longitudinal acceleration of the own vehicle, and $v_{lat}^{ego}$ is the lateral speed of the own vehicle;
$state_n$ is the state space of the other vehicle $n$, $(length, width)$ is the length and width of the vehicle body of the other vehicle $n$, $\theta$ is the predicted heading angle of the other vehicle, $(s, l)$ is the position of the other vehicle, and $v_n$ is the speed of the other vehicle;
112) a discrete lateral acceleration sequence and a discrete longitudinal acceleration sequence are set as the action space:
$A = [Acc_{lon}, Vel_{lat}]$
$Vel_{lat} = \{-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0\}$
$Acc_{lon} = \{-3.0, -2.5, -2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0\}$
wherein $Vel_{lat}$ is the lateral acceleration sequence and $Acc_{lon}$ is the longitudinal acceleration sequence;
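As a minimal sketch of how the two discrete sequences could be combined into one joint action set, the following assumes that every longitudinal value may be paired with every lateral value (a Cartesian product). That pairing rule and the names `ACC_LON`, `VEL_LAT` and `build_action_space` are illustrative assumptions, not taken from the embodiment.

```python
from itertools import product

# Discrete sequences as listed in the embodiment.
ACC_LON = [-3.0, -2.5, -2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
VEL_LAT = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]


def build_action_space(acc_lon=ACC_LON, vel_lat=VEL_LAT):
    """Return every (longitudinal, lateral) pair as a list of tuples.

    Assumption: the joint action set is the full Cartesian product.
    """
    return list(product(acc_lon, vel_lat))


actions = build_action_space()
print(len(actions))  # 11 longitudinal values x 9 lateral values
```

Under this assumption the solver would branch over 99 candidate actions at each decision step, which is one reason a sparse tree search is attractive.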
12) using the state space and the action space to establish the state transition functions of the own vehicle and the other vehicle;
the specific establishment process is as follows:
121) assuming that the state evolution processes of the own vehicle and the other vehicle are mutually independent, the state transition equation of the own vehicle is obtained according to the simplified vehicle motion model, and the state transition equation of the other vehicle is expressed by the probabilities of different states at the next moment;
the state transition equation of the own vehicle is calculated as follows:
the state transition equation of the other vehicle is calculated as follows:
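For illustration only, and not as the formulas referred to above, a point-mass update of the own vehicle in the Frenet frame might be sketched as follows. The constant-acceleration form, the treatment of the lateral action as a lateral velocity command, the time step `dt` and the names `EgoState` and `step_ego` are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EgoState:
    time: float   # time stamp [s]
    s: float      # longitudinal Frenet coordinate [m]
    l: float      # lateral Frenet coordinate [m]
    v_lon: float  # longitudinal speed [m/s]
    acc: float    # longitudinal acceleration [m/s^2]
    v_lat: float  # lateral speed [m/s]


def step_ego(state: EgoState, acc_lon: float, v_lat: float, dt: float) -> EgoState:
    """Advance the own vehicle by one step (assumed constant-acceleration model)."""
    s_next = state.s + state.v_lon * dt + 0.5 * acc_lon * dt ** 2
    l_next = state.l + v_lat * dt          # lateral action treated as a velocity
    v_next = state.v_lon + acc_lon * dt
    return EgoState(state.time + dt, s_next, l_next, v_next, acc_lon, v_lat)


start = EgoState(time=0.0, s=0.0, l=0.0, v_lon=10.0, acc=0.0, v_lat=0.0)
nxt = step_ego(start, acc_lon=2.0, v_lat=1.0, dt=1.0)
print(nxt)
```

Because the update is deterministic, only the other vehicle's transition needs to be represented as a probability distribution over next states.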
2. Establishing observation space models and belief space models of the own vehicle and the other vehicle, and establishing an uncertainty model of the predicted trajectory of the other vehicle, wherein the observation space comprises the information that can be observed by the environment perception and positioning system of the intelligent vehicle, namely the position, speed and heading angle of the own vehicle and the position, speed and heading angle of the other vehicle;
the specific steps are as follows:
21) the state of the own vehicle is fully observable, while the observation function of the other vehicle comprises coordinates and heading angle and follows a Gaussian distribution;
an observation function of the observation space is established, wherein the observation function of the own vehicle is represented by a Boolean value, which is set to 1 if the next state exists and to 0 otherwise;
the observation function of the own vehicle is:
the observation function of the other vehicle follows a Gaussian distribution and is calculated as follows:
wherein $\mu_{s,l,\theta}$ represents the observed value and $\sigma_{s,l,\theta}$ represents the variance;
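A hedged sketch of a Gaussian observation likelihood over the triple $(s, l, \theta)$ is given below. It uses the standard univariate normal density with the spread parameter interpreted as a variance, as the text states, and assumes the three components are independent. The function names are hypothetical.

```python
import math


def gaussian_pdf(x: float, mean: float, variance: float) -> float:
    """Standard univariate normal density, parameterised by variance."""
    if variance <= 0.0:
        raise ValueError("variance must be positive")
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(
        2.0 * math.pi * variance
    )


def observation_likelihood(obs, mean, variance) -> float:
    """Likelihood of an (s, l, theta) observation.

    Assumption: s, l and theta are independent, so densities multiply.
    """
    p = 1.0
    for x, m, v in zip(obs, mean, variance):
        p *= gaussian_pdf(x, m, v)
    return p


peak = gaussian_pdf(0.0, 0.0, 1.0)
joint = observation_likelihood((0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
```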
The process of calculating the position uncertainty through the observation model comprises the following steps:
22) constructing an uncertainty model of the pose of the other vehicle by adopting a multivariate Gaussian distribution;
the uncertainty of the pose of the other vehicle is calculated as follows:
wherein $\Phi$ is the state transition matrix, $M$ represents the noise in the state transition process, $Z$ is the covariance of the noise, $\Sigma$ is the covariance matrix of the system, and $X$ is the state quantity, expressed as follows:
the Gaussian distribution covariance of each state quantity at the predicted trajectory points of the other vehicle, obtained through uncertainty modeling, is shown in fig. 3;
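One common way to obtain such per-point covariances is the linear-Gaussian prediction step $\Sigma_{k+1} = \Phi \Sigma_k \Phi^{T} + Z$. The sketch below assumes that form, which is the standard Kalman-filter covariance prediction and is not confirmed to be the exact expression used in this embodiment. The helper names and the two-dimensional example values are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    return [list(row) for row in zip(*a)]


def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def propagate_covariance(sigma, phi, z):
    """One prediction step: phi * sigma * phi^T + z (assumed form)."""
    return mat_add(mat_mul(mat_mul(phi, sigma), transpose(phi)), z)


def predict_covariances(sigma0, phi, z, steps):
    """Covariance at each predicted trajectory point."""
    out, sigma = [], sigma0
    for _ in range(steps):
        sigma = propagate_covariance(sigma, phi, z)
        out.append(sigma)
    return out


phi = [[1.0, 1.0], [0.0, 1.0]]      # position-speed model with dt = 1 s
sigma0 = [[1.0, 0.0], [0.0, 1.0]]
z = [[0.1, 0.0], [0.0, 0.1]]
covs = predict_covariances(sigma0, phi, z, steps=3)
```

In this example the position variance grows at every step, which matches the intuition that predictions further in the future are less certain.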
3. Setting a reward function, and solving for the decision state point set of the own vehicle based on the POMDP model;
31) the reward function comprises: a safety index $R_{safe}$, a comfort index $R_{comfort}$ and an efficiency index $R_{efficiency}$;
32) designing the reward function of the safety index: considering the collision reward function $R_{colli}$ and the distance reward function $R_{dis}$, the specific calculation formula is:
$R_{safe} = R_{colli} + R_{dis}$
33) designing the collision reward function $R_{colli}$ according to the probability of collision, a penalty being given when the collision probability exceeds a threshold, the collision reward function being calculated as follows:
wherein $w_{colli} = 100$ is the weight value and $P_{safe}$ is the safety threshold;
designing the reward function $R_{dis}$ of the distance to the preceding vehicle according to a time-to-collision (TTC) model and setting upper and lower limits of the safe distance, the reward function of the distance to the preceding vehicle being calculated as follows:
wherein $d_{max}$ and $d_{min}$ are the maximum and minimum safety distances between the own vehicle and the other vehicle, and $w_{dis} = 20$ is the weight value of the distance reward;
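The threshold behaviour of the collision reward and the usual definition of time-to-collision can be sketched as below. The step-shaped penalty (a constant $-w_{colli}$ above the threshold, zero otherwise) is an assumption about the form of $R_{colli}$, and the shape of $R_{dis}$ is deliberately not modelled. The function names are hypothetical.

```python
import math

W_COLLI = 100.0  # weight value from the embodiment


def collision_reward(p_collision: float, p_safe: float, w_colli: float = W_COLLI) -> float:
    """Penalise only when the collision probability exceeds the threshold.

    Assumption: a constant penalty of -w_colli; the exact expression may differ.
    """
    if not 0.0 <= p_collision <= 1.0:
        raise ValueError("p_collision must lie in [0, 1]")
    return -w_colli if p_collision > p_safe else 0.0


def time_to_collision(gap: float, v_ego: float, v_front: float) -> float:
    """Standard TTC: gap divided by closing speed, infinite if not closing."""
    closing = v_ego - v_front
    return gap / closing if closing > 0.0 else math.inf


r_hit = collision_reward(0.3, p_safe=0.2)
r_ok = collision_reward(0.1, p_safe=0.2)
ttc = time_to_collision(gap=20.0, v_ego=15.0, v_front=10.0)
```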
34) designing the comfort reward function: considering the speed-related comfort reward function $R_{speed}$, the lateral speed action penalty function $R_{vlat}$ and the continuity index $R_{continuity}$, the comfort reward function is written as:
$R_{comfort} = R_{speed} + R_{vlat} + R_{continuity}$
the speed-related comfort reward function $R_{speed}$ is calculated as follows:
wherein the current longitudinal speed is $v_{lon}^{cur}$, the current longitudinal acceleration is $a_{lon}^{cur}$, the current lateral speed is $v_{lat}^{cur}$, the longitudinal speed at the next moment is $v_{lon}^{next}$, the longitudinal acceleration at the next moment is $a_{lon}^{next}$, the lateral speed at the next moment is $v_{lat}^{next}$, and $w_{speed} = 15$ is the weight coefficient of the speed;
the lateral speed action penalty function is calculated as:
$R_{vlat} = -v_{lat}^{2} \times w_{vlat}$
wherein $w_{vlat} = 12$ is the weight coefficient of the lateral speed;
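The lateral penalty is simple enough to evaluate directly. The sketch below implements the stated expression with the stated weight; the function name is hypothetical.

```python
W_VLAT = 12.0  # weight coefficient from the embodiment


def lateral_velocity_penalty(v_lat: float, w_vlat: float = W_VLAT) -> float:
    """R_vlat = -(v_lat ** 2) * w_vlat; symmetric in the sign of v_lat."""
    return -(v_lat ** 2) * w_vlat


penalties = [lateral_velocity_penalty(v) for v in (-2.0, -0.5, 0.0, 0.5, 2.0)]
```

Because the penalty is quadratic, a lateral speed of 2.0 costs sixteen times as much as a lateral speed of 0.5, which discourages abrupt lateral manoeuvres more strongly than gentle ones.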
the continuity index of the result is calculated as follows:
wherein $t$ represents the time of each step of the decision result; $m$ and $n$ represent the total numbers of steps of the previous and the current decision results respectively; $(s_{last\_loop,t}, l_{last\_loop,t})$ and $(s_{cur\_loop,t}, l_{cur\_loop,t})$ are the position information of the previous and the current decision results respectively; and $w_{con} = 12$ is the weight coefficient of the continuity index;
35) designing the efficiency reward function $R_{efficiency}$: considering the target task reward function $R_{lane}$ and the target speed function $R_{v\_tar}$, the efficiency reward function is written as:
$R_{efficiency} = R_{lane} + R_{v\_tar}$
wherein the target task reward function is $R_{lane} = -|l_{lane} - l_{ego}| \times w_{lane}$;
the target speed function is $R_{v\_tar} = -|v_{tar} - v_{ego}| \times w_{v\_tar}$;
$w_{v\_tar} = 12$ is the weight coefficient of the target speed;
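The two efficiency terms can be evaluated as in the following sketch. The value of $w_{lane}$ is not given in the text, so it is left as a required argument, and the value used in the example call is purely illustrative. The function names are hypothetical.

```python
W_V_TAR = 12.0  # weight coefficient for the target speed, from the embodiment


def lane_reward(l_lane: float, l_ego: float, w_lane: float) -> float:
    """R_lane = -|l_lane - l_ego| * w_lane (w_lane must be supplied)."""
    return -abs(l_lane - l_ego) * w_lane


def target_speed_reward(v_tar: float, v_ego: float, w_v_tar: float = W_V_TAR) -> float:
    """R_v_tar = -|v_tar - v_ego| * w_v_tar."""
    return -abs(v_tar - v_ego) * w_v_tar


def efficiency_reward(l_lane, l_ego, v_tar, v_ego, w_lane, w_v_tar=W_V_TAR):
    """R_efficiency = R_lane + R_v_tar."""
    return lane_reward(l_lane, l_ego, w_lane) + target_speed_reward(v_tar, v_ego, w_v_tar)


# Illustrative call: target lane centre 3.5 m away, 2 m/s below target speed.
r_eff = efficiency_reward(l_lane=3.5, l_ego=0.0, v_tar=20.0, v_ego=18.0, w_lane=10.0)
```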
36) solving the decision action sequence: specifically, the POMDP model is solved by the Determinized Sparse Partially Observable Tree (DESPOT) method to obtain the decision state point set of the own vehicle, which comprises the position information of the own vehicle and its speed and acceleration information at different moments, and the decision is then divided into a transverse space decision set and a longitudinal time decision set;
4. Decoupling the decision state point set into a transverse space decision set and a longitudinal time decision set, wherein the transverse space decision set comprises the position information of the own vehicle at each moment; the longitudinal time decision set comprises speed and acceleration information of each moment of the vehicle;
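The decoupling step amounts to projecting each decision state point onto two views. A minimal sketch follows, assuming each point carries a time stamp, a Frenet position, a speed and an acceleration. The record layout and the names `DecisionPoint` and `decouple` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class DecisionPoint(NamedTuple):
    t: float  # time stamp [s]
    s: float  # longitudinal position [m]
    l: float  # lateral position [m]
    v: float  # speed [m/s]
    a: float  # acceleration [m/s^2]


def decouple(
    points: List[DecisionPoint],
) -> Tuple[List[Tuple[float, float, float]], List[Tuple[float, float, float]]]:
    """Split decision points into a transverse (position) set and a
    longitudinal (speed, acceleration) set, both indexed by time."""
    transverse_set = [(p.t, p.s, p.l) for p in points]
    longitudinal_set = [(p.t, p.v, p.a) for p in points]
    return transverse_set, longitudinal_set


points = [
    DecisionPoint(0.0, 0.0, 0.0, 10.0, 0.5),
    DecisionPoint(1.0, 10.25, 0.5, 10.5, 0.5),
]
transverse_set, longitudinal_set = decouple(points)
```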
5. Determining a transverse drivable boundary, introducing road boundary constraint, vehicle speed constraint and obstacle distance constraint, and dividing a transverse drivable region and a reference path;
when the transverse drivable boundary is selected, the distance from each boundary node to the obstacle must be greater than the safety distance, which is expressed as follows:
wherein $d_{obs}$ is the distance from the current node to the obstacle, the node being regarded as unsafe when this distance is smaller than the safety distance, and $l_k$ is the lateral coordinate of the node, which should satisfy $l_{min} \le l_k \le l_{max}$;
at the same time, the boundary values $l_{min}$ and $l_{max}$ of the drivable region must not exceed the original boundary of the structured road and must keep a distance of the safety threshold $d_{safe}$ from it:
the calculated transverse drivable region is shown in fig. 4;
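The two conditions described above, a margin from the road edges and a minimum clearance from obstacles, can be checked per node as in the sketch below. Treating obstacles as points, using Euclidean distance in the Frenet plane, and using a separate clearance parameter `d_obs_safe` are assumptions. All names are hypothetical.

```python
import math


def lateral_bounds(road_l_min: float, road_l_max: float, d_safe: float):
    """Shrink the road boundary inward by the safety threshold d_safe."""
    l_min, l_max = road_l_min + d_safe, road_l_max - d_safe
    if l_min > l_max:
        raise ValueError("road is narrower than twice the safety threshold")
    return l_min, l_max


def node_is_drivable(s_k, l_k, obstacles, l_min, l_max, d_obs_safe) -> bool:
    """A node is kept only if it lies inside [l_min, l_max] and is farther
    than d_obs_safe from every obstacle (obstacles are (s, l) points)."""
    if not (l_min <= l_k <= l_max):
        return False
    return all(math.hypot(s_k - s_o, l_k - l_o) > d_obs_safe for s_o, l_o in obstacles)


l_min, l_max = lateral_bounds(road_l_min=-1.75, road_l_max=5.25, d_safe=0.5)
obstacles = [(10.0, 0.3)]
near = node_is_drivable(10.0, 0.0, obstacles, l_min, l_max, d_obs_safe=0.5)
clear = node_is_drivable(10.0, 3.0, obstacles, l_min, l_max, d_obs_safe=0.5)
outside = node_is_drivable(10.0, 5.0, obstacles, l_min, l_max, d_obs_safe=0.5)
```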
a transverse optimization model is established within the transverse drivable region to obtain the transverse reference path, specifically as follows:
constructing a transverse optimization problem and designing a transverse cost function to evaluate each node, the point at which the transverse cost function is minimal being the optimal point;
the constraint is the heading angle constraint:
wherein $\theta'_{min}$ and $\theta'_{max}$ are the minimum and maximum values of the change of the heading angle of the vehicle motion, respectively;
the transverse cost function is used as the objective function and is expressed as:
$C_{node\text{-}h} = w_d C_d + w_o C_o + w_c C_c$
wherein $C_d$ is the first distance cost of the node, $C_o$ is the first safety cost, $C_c$ is the continuity cost, and the corresponding weight coefficients are $w_d = 0.35$, $w_o = 0.45$ and $w_c = 0.2$, respectively;
the first distance cost function $C_d$ is expressed as:
wherein $(s_{state}, l_{state})$ is the position of the target state point and $(s_k, l_k)$ is the position of the extension node;
the first safety cost function $C_o$ is expressed as:
wherein $d_{obs}$ represents the distance from the node to the nearest obstacle and $d_{max}$ represents the distance threshold from the obstacle;
the cost function $C_c$ representing continuity is expressed as:
wherein $l_i$, $l_{i+1}$ and $l_{i+2}$ are the lateral positions of three successive nodes;
finally, the points at which the transverse cost function is minimal are connected to form the transverse expected reference path;
The transverse desired reference path formed in this embodiment is represented in fig. 5 by the hollow dots;
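The weighted sum and the minimum-cost selection can be sketched as follows. The component costs $C_d$, $C_o$ and $C_c$ are passed in as precomputed numbers. Selecting the minimum independently at each longitudinal station is a simplification, since the continuity cost in general couples neighbouring nodes. The names and the sample values are hypothetical.

```python
W_D, W_O, W_C = 0.35, 0.45, 0.2  # weights from the embodiment


def lateral_node_cost(c_d, c_o, c_c, w_d=W_D, w_o=W_O, w_c=W_C):
    """C_node-h = w_d * C_d + w_o * C_o + w_c * C_c."""
    return w_d * c_d + w_o * c_o + w_c * c_c


def select_reference_path(layers):
    """Pick the minimum-cost node at each station.

    layers: one list per station of (l_k, c_d, c_o, c_c) tuples.
    Returns the lateral coordinate chosen at each station.
    """
    path = []
    for nodes in layers:
        best = min(nodes, key=lambda n: lateral_node_cost(n[1], n[2], n[3]))
        path.append(best[0])
    return path


layers = [
    [(0.0, 1.0, 0.0, 0.0), (1.0, 0.0, 0.5, 0.0)],
    [(0.0, 2.0, 2.0, 0.0), (1.0, 0.1, 0.1, 0.1)],
]
reference_l = select_reference_path(layers)
```

A fuller treatment would evaluate the continuity term along candidate sequences, for example with dynamic programming, rather than per node.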
6. Determining a longitudinal drivable boundary, introducing road boundary constraint, vehicle speed constraint and obstacle distance constraint, and dividing a longitudinal drivable region and a reference speed curve;
the longitudinal drivable boundary is determined as follows:
the boundary $s_{border}$ in the s-axis direction is ensured to satisfy the formula:
the constraint on $s_{border}$ is: $s_{min\_t} \le s_{border} \le s_{max\_t}$, wherein $s_{max\_t}$ and $s_{min\_t}$ are the upper and lower boundaries given by the average speed, respectively;
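As a sketch of this constraint, the bounds at time $t$ can be generated from a lowest and a highest average speed and the candidate boundary clamped between them. Modelling $s_{min\_t}$ and $s_{max\_t}$ as straight lines $s_0 + v t$ in the s-t plane is an assumption, and the names are hypothetical.

```python
def clamp_s_border(s_border: float, t: float, v_min: float, v_max: float, s0: float = 0.0) -> float:
    """Clamp s_border into [s_min_t, s_max_t] at time t.

    Assumption: the bounds are the s-t lines of constant average speed
    v_min and v_max starting from s0.
    """
    if v_min > v_max:
        raise ValueError("v_min must not exceed v_max")
    s_min_t = s0 + v_min * t
    s_max_t = s0 + v_max * t
    return min(max(s_border, s_min_t), s_max_t)


too_far = clamp_s_border(100.0, t=2.0, v_min=5.0, v_max=20.0)
too_near = clamp_s_border(5.0, t=2.0, v_min=5.0, v_max=20.0)
inside = clamp_s_border(25.0, t=2.0, v_min=5.0, v_max=20.0)
```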
the method for establishing the longitudinal reference speed curve specifically comprises the following steps:
constructing a longitudinal optimization problem and designing a longitudinal cost function to evaluate each node, the point at which the longitudinal cost function is minimal being the optimal point; the longitudinal cost function is calculated as:
$C_{node\text{-}z} = w_{d\text{-}ref} C_{d\text{-}ref} + w_{o\text{-}ref} C_{o\text{-}ref} + w_v C_v$
wherein $C_{d\text{-}ref}$ is the second distance cost from the node to the line connecting the state points, $C_{o\text{-}ref}$ is the second safety cost of the node, $C_v$ is the speed change cost, and the corresponding weight coefficients are $w_{d\text{-}ref} = 0.4$, $w_{o\text{-}ref} = 0.4$ and $w_v = 0.2$, respectively.
the second distance cost function is:
$C_{d\text{-}ref} = (s_i - s_{ref})^{2}$
wherein $s_{ref}$ is the position of the node on the line connecting the two state points;
the second safety cost function is:
wherein $s_{obs}$ represents the distance along the s-axis from the node to the nearest obstacle and $s_{max}$ represents the distance threshold from the obstacle;
the cost function of the speed change is:
and finally, the points at which the longitudinal cost function is minimal are connected to form the longitudinal expected reference speed curve.
The longitudinal reference velocity profile formed in this embodiment is shown in fig. 6.
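The longitudinal node cost can be sketched from the descriptions in the text: the second distance cost is $(s_i - s_{ref})^2$, and the second safety cost is the squared difference between the obstacle distance along the s-axis and its threshold. The speed change cost is taken here as the absolute speed difference per unit time, which is an assumption about how the "change rate of the reference speed" is measured. The function names are hypothetical.

```python
W_D_REF, W_O_REF, W_V = 0.4, 0.4, 0.2  # weights from the embodiment


def second_distance_cost(s_i: float, s_ref: float) -> float:
    """C_d-ref = (s_i - s_ref) ** 2."""
    return (s_i - s_ref) ** 2


def second_safety_cost(s_obs: float, s_max: float) -> float:
    """Squared difference between obstacle distance and its threshold."""
    return (s_obs - s_max) ** 2


def speed_change_cost(v_prev: float, v_next: float, dt: float) -> float:
    """Assumed form: absolute change of reference speed per unit time."""
    return abs(v_next - v_prev) / dt


def longitudinal_node_cost(s_i, s_ref, s_obs, s_max, v_prev, v_next, dt):
    """C_node-z = w_d-ref * C_d-ref + w_o-ref * C_o-ref + w_v * C_v."""
    return (
        W_D_REF * second_distance_cost(s_i, s_ref)
        + W_O_REF * second_safety_cost(s_obs, s_max)
        + W_V * speed_change_cost(v_prev, v_next, dt)
    )


cost = longitudinal_node_cost(
    s_i=12.0, s_ref=10.0, s_obs=8.0, s_max=10.0, v_prev=10.0, v_next=12.0, dt=0.5
)
```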
In summary, the technical scheme considers the behavioral interaction between the own vehicle and other traffic participants as well as the uncertainty of the predicted trajectory of the other vehicle. The decision process is therefore more stable and the resulting state point set of the own vehicle more reasonable and reliable, so that the downstream vehicle planning module can solve accurately and quickly, which effectively enhances the comfort and safety of vehicle driving.