Disclosure of Invention
Purpose of the invention: the invention provides a roll control method for an autonomous vehicle based on DDPG and iterative control, which enables the vehicle to drive safely in a sea-crossing bridge traffic environment of any complexity level and raises the vehicle's level of intelligence through instantaneous control of the roll angle.
The technical scheme is as follows: the invention provides an automatic driving vehicle roll control method based on DDPG and iterative control, which specifically comprises the following steps:
(1) installing a laser radar, a vision sensor, a millimeter-wave radar, an ultrasonic radar sensor, a positioning system and an inertial navigation system on the autonomous vehicle;
(2) using the vision sensor, the positioning system and the inertial navigation system to obtain the vehicle position and the map in each of the different scenes, so as to generate autonomous-vehicle driving maps for the different scenes and provide the environment required for the vehicle driving trajectory;
(3) driving on the sea-crossing bridge by controlling the steering wheel, the accelerator and the brake pedal, recording the corresponding driving trajectories in rain and snow, in strong wind and severe weather, and on sunny days, and constructing a data set;
(4) training the DDPG algorithm on the autonomous-vehicle driving maps of the different scenes, for the driving states on the sea-crossing bridge under road conditions of different complexity levels in severe weather; the autonomous vehicle generates a real-time vehicle state by interacting with the map environments of the different scenes and determines its action; during action training, the action space is initialized, the online policy network of the actor network takes the state-space information as input and outputs an action, and action noise is added to obtain an exploratory action;
(5) generating a predicted path of the autonomous-vehicle state based on LSTM historical memory and road planning attributes, using the DDPG algorithm for tracking control of the path trajectory of the autonomous vehicle under normal driving road conditions and under extreme driving road conditions, and using an iterative control method for compensation control of the autonomous vehicle.
Further, the laser radar sensor in step (1) is used for detecting dynamic and static obstacles on the road, including pedestrians, motorcycles and vehicles of various kinds, as well as the drivable road area; the vision sensor is used for detecting lane lines, pedestrians and vehicles, and for localization and simultaneous map creation; the millimeter-wave radar sensor is used for measuring the distance between the vehicle and pedestrians and between the vehicle and other moving vehicles; the ultrasonic radar is used for measuring the distance to vehicles at close range; the vision sensor, the positioning system and the inertial navigation system together implement vehicle positioning.
Further, the different scenes in the step (2) include five scenes, namely a sea-crossing bridge road condition in rainy and snowy weather, a sea-crossing bridge road condition in severe wind and severe weather, a road condition when the bridge vibrates in sunny days, a driving road condition of a single vehicle in frequently changeable weather, and a driving road condition of multiple vehicles in frequently changeable weather.
Further, the data set in step (3) comprises vehicle speed, driving track, vehicle position, heading angle, slip angle, yaw rate and roll angle.
Further, the network design of the DDPG algorithm of step (4) is as follows:
an actor network is constructed that takes the vehicle state and the environment state as input and outputs a vector consisting of the steering angle, throttle and brake signals, corresponding to the 3 neurons of the output layer of the actor policy network; the activation function of the throttle and brake outputs is Sigmoid, and the activation function of the steering output is Tanh. The hidden layers are structured as follows: the first layer is a convolution with size 7 × 7, filter size 48 and stride 4, 200 neurons in total; the second layer is a convolution with size 5 × 5, filter size 16 and stride 2, with ReLU activation, 400 neurons in total; the third layer is an LSTM layer of 100 neurons; the fourth layer is a fully connected layer of 128 units; the fifth layer is a fully connected layer of 128 units. The critic network takes the state and the action as input; these are concatenated and passed through two hidden layers with ReLU activation, a first layer of 200 neurons and a second layer of 400 neurons, to obtain the Q value. Define $h_i \in (S_{t-T}, S_{t-T+1}, \dots, S_t)$, where $S_{t-T}$ and $S_t$ denote the state information at time $t-T$ and at the current time $t$ respectively; the encoded state is $s = f(h_i; \beta)$, and the policy of the modified actor network is defined as $a = \mu(h_i/\beta, \gamma_\pi) + \eta$.
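The output stage of the actor described above can be illustrated with a minimal sketch. It covers only the three output neurons and the additive noise term $\eta$, not the convolutional or LSTM layers. The function names, the Gaussian form of the noise and the clipping back into the activation ranges are assumptions made for illustration and are not stated in the text.

```python
import math
import random


def sigmoid(x):
    """Logistic function, used here for the throttle and brake outputs."""
    return 1.0 / (1.0 + math.exp(-x))


def clip(x, lo, hi):
    """Keep a noisy action inside its valid range."""
    return max(lo, min(hi, x))


def actor_action(raw, noise_std=0.0, rng=None):
    """Map three raw output-layer values to (steering, throttle, brake).

    raw       -- pre-activation values of the 3 output neurons (hypothetical input)
    noise_std -- std. dev. of the exploration noise eta (Gaussian is an assumption)
    """
    rng = rng or random.Random(0)
    steering = math.tanh(raw[0])   # Tanh: steering in [-1, 1]
    throttle = sigmoid(raw[1])     # Sigmoid: throttle in [0, 1]
    brake = sigmoid(raw[2])        # Sigmoid: brake in [0, 1]
    eta = [rng.gauss(0.0, noise_std) for _ in range(3)]
    # a = mu(...) + eta, then clipped so the action stays physically meaningful
    return (
        clip(steering + eta[0], -1.0, 1.0),
        clip(throttle + eta[1], 0.0, 1.0),
        clip(brake + eta[2], 0.0, 1.0),
    )
```

With `noise_std=0.0` the function returns the deterministic policy output; a positive value gives the exploratory action used during training.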
Further, the implementation process of tracking and controlling the path trajectory of the automatic driving vehicle under the normal driving road condition in the step (5) is as follows:
for the normal driving road condition, namely the road condition when the bridge vibrates on a clear day, a vehicle dynamics model is established that accounts for the roll, sideslip and yaw characteristics of the vehicle; vehicle state constraints are set, and the lateral stability range, the maximum steering angle range and the range of vehicle control allowed for preventing roll are determined, so as to reduce the lateral offset error of the vehicle:
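$\omega_{z\text{-}min} \le \omega_z \le \omega_{z\text{-}max},\quad \omega_{x\text{-}min} \le \omega_x \le \omega_{x\text{-}max},\quad u_{x\text{-}min} \le u_x \le u_{x\text{-}max},\quad e_{r\text{-}x\text{-}min} \le e_r \le e_{r\text{-}x\text{-}max}$
in the formula, $\omega_z$ is the yaw angular velocity; $\omega_x$ is the vehicle roll angle; $u_x$ is the steering angle; $e_r$ is the lateral tracking offset error;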
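A minimal sketch of how these four box constraints could be checked and enforced is given here. The numeric bounds are placeholders chosen for illustration only, since the text gives no values, and the names `Bound`, `BOUNDS` and `violated` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bound:
    """Closed interval [lo, hi] for one constrained quantity."""
    lo: float
    hi: float

    def contains(self, x):
        return self.lo <= x <= self.hi

    def clamp(self, x):
        return max(self.lo, min(self.hi, x))


# Placeholder limits (assumed values, not from the source).
BOUNDS = {
    "yaw_rate": Bound(-0.5, 0.5),       # omega_z
    "roll": Bound(-0.1, 0.1),           # omega_x
    "steering": Bound(-0.6, 0.6),       # u_x
    "lateral_error": Bound(-0.3, 0.3),  # e_r
}


def violated(state):
    """Return the names of all quantities that lie outside their bounds."""
    return [name for name, b in BOUNDS.items() if not b.contains(state[name])]
```

A controller could call `violated` on each predicted state and use `clamp` on the steering command before it is applied.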
according to the road state information predicted by the LSTM, an objective function is constructed that considers the steering angle, the tire adhesion coefficient, the roll angle error and the path tracking error; the physical constraints of the vehicle, determined by the lateral tracking offset error, are set with full consideration of the dynamic constraint on the maximum allowable vehicle error, thereby reducing the tracking-control error of the vehicle:
in the formula, $w_1, w_2, w_3, w_4$ are parameter variables; $\mu_r$ is the road adhesion coefficient;
compensation control of vehicle roll is achieved by an iterative control algorithm: a reference vehicle state, a reference control input and a reference output value are set so that tracking is guaranteed under multiple constraints, namely the vehicle physical constraints and the road constraints, the disturbance rejection of the vehicle during driving is improved, and the model error rate is reduced;
according to the vehicle driving condition, the state space, the action space and the reward function required by the DDPG algorithm are constructed; the action space mainly comprises the steering wheel angle, the throttle signal and the brake signal, and the state space comprises the vehicle lateral tracking error and its rate of change, the vehicle roll angle error and its rate of change, and the yaw rate error and its rate of change; the reward function is constructed on the basis that, when the sea-crossing bridge vibrates, the actual trajectory of the vehicle changes and the road surface takes on a certain inclination angle, so the reward is set equal to the product of the discount factor and the change in speed.
Further, the implementation process of the step (5) for tracking and controlling the path track of the automatic driving vehicle under the extreme driving road condition is as follows:
under extreme driving road conditions, namely in severe weather and under other adverse road-condition factors, the vehicle is prone to slipping on wet surfaces and to vibration, which changes the driving trajectory, affects the actual running speed of the vehicle and produces a deviation between the actual vehicle speed and the planned vehicle speed; the change of the actual vehicle speed is therefore set as follows:
in the formula, $v_{ref}$ is the reference vehicle speed; $v_{act}$ is the actual vehicle speed; $\alpha$ is an influencing factor;
when the error between the actual vehicle speed and the reference vehicle speed is within 2 km/h, the actual vehicle speed can be assumed to be equal to the reference vehicle speed; when the error is in the interval [2, 5] km/h, the actual vehicle speed is determined from the reference vehicle speed and the difference between the two speeds, where $\alpha$ is the speed variation factor; when the error is greater than 5 km/h, the vehicle must brake immediately to ensure driving safety;
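The three speed-deviation bands can be sketched as a small decision function. The thresholds of 2 km/h and 5 km/h come from the text; the formula used in the middle band, $v_{ref} + \alpha\,(v_{act} - v_{ref})$, is only an assumed reading because the original formula is not reproduced here, and the function name and return convention are hypothetical.

```python
def adjust_speed(v_act, v_ref, alpha=0.5):
    """Return (speed_to_use_kmh, brake_required) for one speed measurement.

    v_act -- actual vehicle speed in km/h
    v_ref -- reference (planned) vehicle speed in km/h
    alpha -- speed variation factor (value assumed for illustration)
    """
    err = abs(v_act - v_ref)
    if err < 2.0:
        # Small deviation: treat the actual speed as equal to the reference.
        return v_ref, False
    if err <= 5.0:
        # ASSUMPTION: blend the reference speed with the speed difference.
        return v_ref + alpha * (v_act - v_ref), False
    # Large deviation: the vehicle must brake immediately.
    return v_act, True
```

For example, `adjust_speed(64.0, 60.0, alpha=0.5)` falls in the middle band and yields a working speed of 62.0 km/h without braking.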
at time $t$, the set of actions and states of the vehicle is expressed as $\{v_1, \dots, v_i, \dots, v_n\}$, $i = 1, \dots, n$, and a Bezier curve is used for path planning to generate a collision-free predicted trajectory; to judge the accuracy of the vehicle speed planning, a small sample of vehicle states and actions is taken from the experience buffer of the DDPG algorithm as reference values, $\{v_{r1}, \dots, v_{ri}, \dots, v_{rn}\}$, $i = 1, \dots, n$, and the error between the two is calculated as:
in the formula, $v_i$ is the vehicle speed; $L$ is the vehicle speed error rate; $v_{ri}$ is the reference vehicle speed obtained from the experience buffer;
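As an illustration, a Bezier path can be evaluated with de Casteljau's algorithm, and the planned speeds compared with the buffer reference speeds. The mean relative error used for `speed_error_rate` is an assumption standing in for the error formula, which is not reproduced in the text; the sketch also does no collision checking, and all function names are hypothetical.

```python
def bezier_point(control_points, t):
    """Evaluate a Bezier curve at parameter t in [0, 1] (de Casteljau)."""
    pts = [tuple(p) for p in control_points]
    while len(pts) > 1:
        # Repeated linear interpolation between neighbouring points.
        pts = [
            tuple((1 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(pts, pts[1:])
        ]
    return pts[0]


def bezier_path(control_points, n_samples):
    """Sample n_samples (>= 2) evenly spaced points along the curve."""
    return [
        bezier_point(control_points, i / (n_samples - 1))
        for i in range(n_samples)
    ]


def speed_error_rate(v, v_ref):
    """ASSUMED error measure: mean relative deviation of v_i from v_ri."""
    return sum(abs(a - b) / b for a, b in zip(v, v_ref)) / len(v)
```

For the control points `[(0, 0), (1, 2), (2, 0)]` the curve midpoint is `(1.0, 1.0)`, which shows how the middle control point pulls the path without lying on it.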
given a desired state reference trajectory $S_r$ under extreme driving conditions, the output error is $e_k(t) = S_r(t) - s(t)$ and the learning law is $u_{k+1}(t) = L(u_k(t), e_k(t))$, which yields the compensation control action $a_k$; under normal driving, the DDPG algorithm is used for vehicle control and its output action is $a_\pi = \mu(h_i/\beta, \gamma_\pi) + \eta$; the total control action is $a = a_k + a_\pi$, and the reference formula is as follows:
to judge the accuracy of path planning, the trajectory error between the actual driving trajectory of the vehicle and the reference trajectory when the roll angle changes is calculated as follows:
in the formula, $r_{act}$ is the actual vehicle trajectory under vibration; $r_{ref}$ is the reference trajectory; the formula also contains the maximum angle difference between the actual trajectory and the reference trajectory; $\mu$ is the road surface adhesion coefficient; $\sigma$ is the deviation angle when the vehicle sideslips laterally; $\phi$ is the vehicle sideslip angle; $d$ is the vertical vibration distance; $\chi$ is the deviation angle when the bridge vibrates vertically, namely the vertical inclination angle produced by the bridge deck;
the maximum value of the vertical inclination angle $\chi$ produced by bridge vibration can be set as:
an optimal road driving area is searched for on the high-precision map built from the vision sensor; within the optimal road driving area, the reference vehicle state, control inputs and parameter output values are designed, multi-state constraints are designed, and an iterative learning control algorithm is used for roll control of the vehicle;
a limit road driving area is searched for using the high-precision map built from the vision sensor and the state predicted by the LSTM, and layered ranges of uncertain state parameters and action parameters are designed; when the vehicle drives in the limit road driving area, roll control of the vehicle is achieved by the DDPG (deep deterministic policy gradient) algorithm together with the iterative control algorithm, where the iterative control algorithm plays a compensating role, and the reward function is constructed from the constraint range of the vehicle state as follows:
$R = v \cdot (R_1 + R_2 + \dots + R_6)$
wherein $R_1$ and $R_2$ represent the lateral distance error and its rate of change; $R_3$ and $R_4$ represent the lateral angular velocity and its rate of change; $R_5$ and $R_6$ represent the roll angle and its rate of change; $x$ is an angle value; $v$ is the vehicle speed; $e_y$ is the lateral distance; $k_i$ and $k_j$ are reward factors.
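The structure $R = v \cdot (R_1 + \dots + R_6)$ can be sketched as follows. The text does not give the form of the individual terms, so each term is assumed here to be a penalty $R_i = -k_i\,|x_i|$ on the corresponding error or rate of change; this choice, and the name `reward`, are illustrative assumptions only.

```python
def reward(v, errors, factors):
    """Speed-scaled sum of six penalty terms.

    v       -- vehicle speed
    errors  -- six values: lateral distance error and its rate, lateral
               angular velocity and its rate, roll angle and its rate
    factors -- six non-negative reward factors k_i (assumed weights)
    """
    if len(errors) != 6 or len(factors) != 6:
        raise ValueError("expected six error terms and six reward factors")
    # ASSUMPTION: R_i = -k_i * |x_i|, so perfect tracking gives zero penalty.
    terms = [-k * abs(x) for k, x in zip(factors, errors)]
    return v * sum(terms)
```

Under this assumed form, a lateral distance error of 0.5 with factor 2 at speed 10 gives a reward of -10.0, so larger tracking errors at higher speed are penalized more strongly.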
Advantageous effects: compared with the prior art, the invention has the following beneficial effects: 1. the invention designs an integrated roll control method for autonomous vehicles based on a reinforcement learning algorithm (DDPG), in which vehicle roll in a complex road environment is controlled through reinforcement learning, so that the autonomous vehicle can drive under complex road conditions and in extreme weather by means of exploration and exploitation; 2. for extreme driving conditions, iterative learning control compensates the DDPG algorithm, which achieves integrated control of the vehicle and ensures that the vehicle ultimately drives safely.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
The invention provides an automatic driving vehicle roll control method based on DDPG and iterative control, which specifically comprises the following steps:
Step 1: a laser radar, a vision sensor, a millimeter-wave radar, an ultrasonic radar sensor, a positioning system and an inertial navigation system are installed on the autonomous vehicle.
The invention targets the road environment of a sea-crossing bridge and aims to control the vehicle so that it drives safely at medium and low speed (5-80 km/h), achieving a high level of vehicle intelligence through control actions applied at the instant the vehicle rolls. To this end, the invention installs several laser radar, machine vision, millimeter-wave radar and ultrasonic radar sensors on the autonomous vehicle, together with a positioning system and an inertial navigation system (IMU). The laser radar sensor is used for detecting dynamic and static obstacles on the road, including pedestrians, motorcycles and vehicles of various kinds, as well as the drivable road area; the machine vision sensor is used for detecting lane lines, pedestrians and vehicles, and for localization and simultaneous map creation; the millimeter-wave radar sensor is used for measuring the distance between the vehicle and pedestrians and between the vehicle and other moving vehicles; the ultrasonic radar is used for measuring the distance to vehicles at close range; the positioning system and the inertial navigation system (IMU) are used to implement vehicle positioning.
Step 2: the vision sensor, the positioning system and the inertial navigation system are used to obtain the vehicle position and the map in each of the different scenes, so as to generate autonomous-vehicle driving maps for the different scenes and provide the environment required for the vehicle driving trajectory.
The vision sensor, the positioning system and the inertial navigation system (IMU) are used to obtain the vehicle position and the map on sunny days and in severe weather such as rain, snow, fog and strong wind. This produces autonomous-vehicle driving maps for five scenes: the sea-crossing bridge road condition in rain and snow, the sea-crossing bridge road condition in strong wind and severe weather, the road condition when the bridge vibrates on a sunny day, the driving road condition of a single vehicle in frequently changing weather, and the driving road condition of multiple vehicles in frequently changing weather. These maps provide the environment required for the vehicle driving trajectory.
Step 3: the vehicle is driven on the sea-crossing bridge by controlling the steering wheel, the accelerator and the brake pedal, the corresponding driving trajectories in rain and snow, in strong wind and severe weather, and on sunny days are recorded, and a data set is constructed.
Several experienced drivers each drive on the sea-crossing bridge in the five scenes by controlling the steering wheel, the accelerator and the brake pedal, and the corresponding driving trajectories are recorded to construct the corresponding data sets. The data set includes the vehicle speed, driving trajectory, vehicle position, heading angle, slip angle, yaw rate and roll angle, and provides the reference data needed for training and for evaluating the controllability of the vehicle.
Step 4: the DDPG algorithm is trained on the autonomous-vehicle driving maps of the different scenes, for the driving states on the sea-crossing bridge under road conditions of different complexity levels in severe weather; the autonomous vehicle generates a real-time vehicle state by interacting with the map environments of the different scenes and determines its action; during action training, the action space is initialized, the online policy network of the actor network takes the state-space information as input and outputs an action, and action noise is added to obtain an exploratory action.
As shown in fig. 1, an actor network is constructed that takes the vehicle state and the environment state as input and outputs a vector consisting of the steering angle, throttle and brake, corresponding to the 3 neurons of the output layer of the actor policy network; the activation function of the throttle and brake outputs is Sigmoid, and the activation function of the steering output is Tanh. The hidden layers are structured as follows: the first layer is a convolution with size 7 × 7, filter size 48 and stride 4, 200 neurons in total; the second layer is a convolution with size 5 × 5, filter size 16 and stride 2, with ReLU activation, 400 neurons in total; the third layer is an LSTM layer of 100 neurons; the fourth layer is a fully connected layer of 128 units; the fifth layer is a fully connected layer of 128 units. The critic network takes the state and the action as input; these are concatenated and passed through two hidden layers with ReLU activation, a first layer of 200 neurons and a second layer of 400 neurons, to obtain the Q value. Define $h_i \in (S_{t-T}, S_{t-T+1}, \dots, S_t)$, where $S_{t-T}$ and $S_t$ denote the state information at time $t-T$ and at the current time $t$ respectively; the encoded state is $s = f(h_i; \beta)$, and the policy of the modified actor network is defined as $a = \mu(h_i/\beta, \gamma_\pi) + \eta$.
The action space of the vehicle is then designed; it comprises the steering wheel angle $\delta$, the brake signal and the throttle signal of the vehicle.
Because the driving environment of the vehicle is complex, the vehicle drives at variable speed when the road environment is complex; the brake signal is an action generated under extreme driving conditions to prevent roll and rollover caused by braking on a wet and slippery road surface, and the action space is then set to three actions: steering wheel angle, brake signal and throttle signal. Under normal driving conditions the vehicle is assumed to travel at constant speed; to prevent roll caused by a wet and slippery road surface, the action space is then set to the steering wheel angle and the throttle signal. The constraint ranges of the three actions are set according to these two driving conditions to ensure that the vehicle drives controllably within the drivable road area.
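The condition-dependent action space can be sketched as a small lookup. The normalized ranges in `ACTION_BOUNDS` are assumed values consistent with Tanh and Sigmoid outputs, and the names `Condition` and `action_space` are hypothetical.

```python
from enum import Enum


class Condition(Enum):
    NORMAL = "normal"    # constant-speed driving
    EXTREME = "extreme"  # variable-speed driving in severe weather


# Assumed normalized constraint ranges for each action.
ACTION_BOUNDS = {
    "steering": (-1.0, 1.0),
    "throttle": (0.0, 1.0),
    "brake": (0.0, 1.0),
}


def action_space(condition):
    """Return the active actions and their constraint ranges."""
    names = ["steering", "throttle"]
    if condition is Condition.EXTREME:
        # Braking is only part of the action space under extreme conditions.
        names.append("brake")
    return {name: ACTION_BOUNDS[name] for name in names}
```

Keeping the brake out of the normal-condition action space reduces the dimension the policy has to explore when the vehicle is assumed to travel at constant speed.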
The vehicle acquires state data through exploration of and exploitation in the environment; these data typically include the lateral distance and its rate of change and the vehicle roll angle and its rate of change, and are stored in an experience buffer. When iterative learning is used for vehicle control, parameters such as the reference vehicle state and the reference vehicle trajectory must be designed; the reference state and trajectory can be obtained from the experience buffer, and the reference trajectory can be adjusted according to the complexity of the road environment.
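A minimal experience buffer of the kind described can be sketched as a bounded queue with uniform random sampling. The field names of `Transition` and the class name `ExperienceBuffer` are hypothetical, and uniform sampling is an assumption; the text does not say how samples are drawn.

```python
import random
from collections import deque, namedtuple

# One stored interaction step (field names are illustrative).
Transition = namedtuple("Transition", "state action reward next_state")


class ExperienceBuffer:
    """Fixed-capacity buffer; the oldest entries are dropped when full."""

    def __init__(self, capacity, seed=None):
        self._data = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, state, action, reward, next_state):
        self._data.append(Transition(state, action, reward, next_state))

    def sample(self, batch_size):
        """Draw a small batch without replacement (uniform, by assumption)."""
        k = min(batch_size, len(self._data))
        return self._rng.sample(list(self._data), k)

    def __len__(self):
        return len(self._data)
```

The same `sample` call could supply the small batch of reference states and speeds that the iterative learning controller uses as its reference values.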
In the five scenes (the sea-crossing bridge road condition in rain and snow, the sea-crossing bridge road condition in strong wind and severe weather, the road condition when the bridge vibrates on a clear day, the driving road condition of a single vehicle in frequently changing weather, and the driving road condition of multiple vehicles in frequently changing weather), the different recorded trajectories are used as reference paths, and the path actually planned by the vehicle is compared with the reference path to obtain the error. The reference path must also be subject to constraints that satisfy the dynamic characteristics of the vehicle, and it is modified and adjusted before being used as the actual path, which can be expressed as follows:
where $\sigma$ is the path influence factor, $p_{ref}$ is the reference trajectory and $p_{act}$ is the actual trajectory; that is, when the vehicle drives in different road environments, the recorded driving trajectory must be suitably modified and adjusted to satisfy the vehicle dynamics during automatic driving before it can be used as the driving trajectory of the autonomous vehicle.
Step 5: as shown in fig. 2, a predicted path of the autonomous-vehicle state is generated based on LSTM historical memory and road planning attributes, the DDPG algorithm is used for tracking control of the path trajectory of the autonomous vehicle under normal driving road conditions and under extreme driving road conditions, and an iterative control method is used for compensation control of the autonomous vehicle.
Under normal driving road conditions, namely the road condition when the bridge vibrates on a clear day among the five scenes, the roll, sideslip and yaw dynamics of the vehicle must be considered during vehicle dynamics modeling to ensure the safety and stability of the vehicle; vehicle state constraints are set, and the lateral stability range, the maximum steering angle range and the range of vehicle control allowed for preventing roll are determined, so as to reduce the lateral offset error of the vehicle:
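$\omega_{z\text{-}min} \le \omega_z \le \omega_{z\text{-}max},\quad \omega_{x\text{-}min} \le \omega_x \le \omega_{x\text{-}max},\quad u_{x\text{-}min} \le u_x \le u_{x\text{-}max},\quad e_{r\text{-}x\text{-}min} \le e_r \le e_{r\text{-}x\text{-}max}$
in the formula, $\Psi = [v_y\ \omega_z\ \omega_x\ \phi]^T$ is the state vector, $u_1$ is the control input and $u_2$ is an auxiliary control input; $\omega_z$ is the yaw angular velocity; $\omega_x$ is the vehicle roll angle; $u_x$ is the steering angle; $e_r$ is the lateral tracking offset error.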
According to the road state information predicted by the LSTM, an objective function is constructed that considers the steering angle, the tire adhesion coefficient, the roll angle error and the path tracking error; the physical constraints of the vehicle, determined by the lateral tracking offset error, are set with full consideration of the dynamic constraint on the maximum allowable vehicle error, thereby reducing the tracking-control error of the vehicle:
in the formula, $w_1, w_2, w_3, w_4$ are parameter variables; $\mu_r$ is the road adhesion coefficient.
As shown in fig. 3, an iterative control algorithm is used for compensation control of vehicle roll: the reference vehicle state, the reference control input and the reference output value are set so that tracking is guaranteed under multiple constraints, namely the vehicle physical constraints and the road constraints, the disturbance rejection of the vehicle during driving is improved, and the model error rate is reduced. First, the DDPG algorithm is used to train the network model through interaction between the vehicle and the dynamic sea-crossing bridge road conditions until the training task is completed. If the task is completed, the trained action is stored. If the training task is not completed satisfactorily, the iterative control algorithm compensates the parameters of the output action space until the training task is completed and a better action is obtained, which ultimately yields better roll control of the autonomous vehicle.
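The compensation idea can be illustrated with a P-type iterative learning update, $u_{k+1}(t) = u_k(t) + \Gamma\, e_k(t)$. This is one common choice of learning law and is an assumption here; the text only specifies the general form $u_{k+1}(t) = L(u_k(t), e_k(t))$. The static `plant` callable and the gain value are likewise hypothetical stand-ins for the vehicle response.

```python
def ilc_update(u_k, e_k, gain=0.5):
    """P-type learning law (assumed): u_{k+1}(t) = u_k(t) + gain * e_k(t)."""
    return [u + gain * e for u, e in zip(u_k, e_k)]


def total_action(a_k, a_pi):
    """Combine ILC compensation a_k with the DDPG action a_pi: a = a_k + a_pi."""
    return [x + y for x, y in zip(a_k, a_pi)]


def run_ilc(reference, plant, n_iters=20, gain=0.5):
    """Repeat the same trial, refining the input from the previous error.

    reference -- desired output at each time step (S_r)
    plant     -- callable mapping one input to one output (stand-in model)
    """
    u = [0.0] * len(reference)
    for _ in range(n_iters):
        y = [plant(x) for x in u]                   # run trial k
        e = [r - s for r, s in zip(reference, y)]   # e_k(t) = S_r(t) - s(t)
        u = ilc_update(u, e, gain)                  # input for trial k + 1
    return u
```

With the stand-in plant `lambda x: 0.8 * x` and gain 0.5, the tracking error shrinks by a factor of 0.6 per trial, so after 20 trials the output is within about 1e-4 of the reference.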
The extreme driving conditions are, among the five scenes, the sea-crossing bridge road condition in rain and snow, the sea-crossing bridge road condition in strong wind and severe weather, the driving road condition of a single vehicle in frequently changing weather, and the driving road condition of multiple vehicles in frequently changing weather; under the influence of severe weather the road environment is uncertain, which interferes with normal driving of the vehicle.
Under extreme driving road conditions, the vehicle is prone to slipping on wet surfaces and to vibration, which changes the driving trajectory and affects the actual running speed of the vehicle, producing a deviation between the actual vehicle speed and the planned vehicle speed; the change of the actual vehicle speed can therefore be set as follows:
in the formula, $v_{ref}$ is the reference vehicle speed; $v_{act}$ is the actual vehicle speed; $\alpha$ is an influencing factor.
When the error between the actual vehicle speed and the reference vehicle speed is within 2 km/h, the actual vehicle speed can be assumed to be equal to the reference vehicle speed; when the error is in the interval [2, 5] km/h, the actual vehicle speed is determined from the reference vehicle speed and the difference between the two speeds, where $\alpha$ is the speed variation factor; when the error is greater than 5 km/h, the vehicle must brake immediately to ensure driving safety.
After the LSTM has memorized the historical states of the autonomous vehicle for path prediction, a predicted speed path is generated using the road planning attributes. At time $t$, the set of actions and states of the vehicle is expressed as $\{v_1, \dots, v_i, \dots, v_n\}$, $i = 1, \dots, n$, and a Bezier curve is used for path planning to generate a collision-free predicted trajectory; to judge the accuracy of the vehicle speed planning, a small sample of vehicle states and actions is taken from the experience buffer of the DDPG algorithm as reference values, $\{v_{r1}, \dots, v_{ri}, \dots, v_{rn}\}$, $i = 1, \dots, n$, and the error between the two is calculated as:
in the formula, $v_i$ is the vehicle speed; $L$ is the vehicle speed error rate; $v_{ri}$ is the reference vehicle speed obtained from the experience buffer.
Under extreme conditions, given a desired state reference trajectory $S_r$, the output error is $e_k(t) = S_r(t) - s(t)$ and the learning law is $u_{k+1}(t) = L(u_k(t), e_k(t))$, which yields the compensation control action $a_k$; under normal driving, the DDPG algorithm is used for vehicle control and its output action is $a_\pi = \mu(h_i/\beta, \gamma_\pi) + \eta$; the total control action is $a = a_k + a_\pi$, and the reference formula is as follows:
To judge the accuracy of path planning, the trajectory error between the actual driving trajectory of the vehicle and the reference trajectory when the roll angle changes is calculated. Roll of the vehicle arises mainly in two situations. In the first, the bridge vibrates and the autonomous vehicle deviates from the planned path, so the vehicle tends to roll and a roll angle is produced; the trajectory error between the actual driving trajectory and the reference trajectory under this change of roll angle can therefore be expressed as:
in the formula, $r_{act}$ is the actual vehicle trajectory under vibration; $r_{ref}$ is the reference trajectory; the formula also contains the maximum angle difference between the actual trajectory and the reference trajectory; $\sigma$ is the deviation angle when the vehicle sideslips laterally; $\phi$ is the vehicle sideslip angle; $d$ is the vertical vibration distance; $\chi$ is the deviation angle when the bridge vibrates vertically, namely the vertical inclination angle produced by the bridge deck.
The maximum value of the vertical inclination angle $\chi$ produced by bridge vibration can be set as:
In the second situation, severe weather makes the road surface wet and slippery, so the road adhesion coefficient changes and the vehicle rolls, sideslips or rolls over; the trajectory error between the actual driving trajectory and the reference trajectory under this change of roll angle can therefore be expressed as:
in the formula, $r_{act}$ is the actual vehicle trajectory under vibration; $r_{ref}$ is the reference trajectory; the formula also contains the maximum angle difference between the actual trajectory and the reference trajectory; $\sigma$ is the deviation angle when the vehicle sideslips laterally; $\phi$ is the vehicle sideslip angle; $d$ is the vertical vibration distance; $\chi$ is the deviation angle when the bridge vibrates vertically, namely the vertical inclination angle produced by the bridge deck; $\mu$ is the road surface adhesion coefficient, with range [0, 1].
An optimal road driving area is searched for on the high-precision map built from the vision sensor; within the optimal road driving area, the reference vehicle state, control input values and parameter output values are designed, multi-state constraints are designed, and an iterative learning control algorithm is used for roll control of the vehicle, with the DDPG algorithm playing a compensating role in this case, so that the vehicle drives safely within the controllable road area. A limit road driving area is searched for using the high-precision map built from the vision sensor and the state predicted by the LSTM, and layered ranges of uncertain state parameters and action parameters are designed; when the vehicle drives in the limit road driving area, roll control of the vehicle is achieved by DDPG together with the iterative control algorithm, where the iterative control algorithm plays a compensating role. When the vehicle is in a limit condition, additional constraints on the vehicle state are required, and the reward function is constructed as follows:
$R = v \cdot (R_1 + R_2 + \dots + R_6)$
wherein $R_1$ and $R_2$ represent the lateral distance error and its rate of change; $R_3$ and $R_4$ represent the lateral angular velocity and its rate of change; $R_5$ and $R_6$ represent the roll angle and its rate of change; $x$ is an angle value; $v$ is the vehicle speed; $e_y$ is the lateral distance; $k_i$ and $k_j$ are reward factors.
The above description is only specific to possible embodiments of the present invention, and they are not intended to limit the scope of the present invention, and all equivalent modifications or variations that do not depart from the technical spirit of the present invention are intended to be included within the scope of the present invention.