CN114228690A

Info

Publication number: CN114228690A
Application number: CN202111353270.7A
Authority: CN
Inventors: 唐晓峰
Original assignee: Yangzhou University
Current assignee: Yangzhou University
Priority date: 2021-11-16
Filing date: 2021-11-16
Publication date: 2022-03-25
Anticipated expiration: 2041-11-16
Also published as: CN114228690B

Abstract

The invention discloses a roll control method for an autonomous vehicle based on DDPG and iterative control. A DDPG algorithm is trained on driving maps of the autonomous vehicle in different scenes; by interacting with the map environment in each scene, the vehicle generates its real-time state and determines its actions. During action training, the action space is initialized, the online policy network in the actor network generates state-space information and outputs an action, and action noise is added to obtain an exploratory action space. A predicted path for the state of the autonomous vehicle is generated based on LSTM historical memory and road planning attributes; the DDPG algorithm provides tracking control of the path trajectory under both normal and extreme driving conditions, and an iterative control method provides compensation control. The invention avoids the instability of reinforcement learning algorithms when the vehicle drives in extreme road environments and improves driving safety and robustness.

Description

Automatic driving vehicle roll control method based on DDPG and iterative control

Technical Field

The invention belongs to the field of intelligent vehicle control, and particularly relates to an automatic driving vehicle roll control method based on DDPG and iterative control.

Background

With the development of artificial intelligence, autonomous driving technology has advanced considerably and is now commonly deployed in closed-park scenes, such as closed campuses and logistics industrial parks, especially in road environments with structured road features and few pedestrians and vehicles. An autonomous vehicle relies on environment perception, navigation, map positioning, decision-making, motion planning, and trajectory-tracking control. However, when the vehicle operates in complex weather and complex driving environments such as a sea-crossing bridge, severe weather degrades the road conditions on the bridge, so that the vehicle may deviate, sideslip, or roll over; rain, snow, and wind change the road adhesion coefficient and cause tire slip, which in turn affects path tracking, lane keeping, and control accuracy. In addition, wind may cause the bridge to vibrate, so that the vehicle rolls and may become uncontrollable. Road vibration characteristics, road angles, and aerodynamic characteristics must therefore also be considered in the overall control design. Control becomes a complex task when the vehicle exhibits uncertain behavior such as sideslip on a wet road surface, yaw caused by bridge vibration, and roll dynamics at high speed. Research on driving safety and stability based on roll control of autonomous vehicles in severe environments is therefore an important key technology.
Reinforcement learning is a branch of artificial intelligence in which an agent explores an unknown dynamic environment, tries different actions, and interacts with that environment without requiring a precise vehicle model or a given description of the surroundings. By learning from the actions and states of this interaction, it can capture complex vehicle dynamics, which makes it a new approach suited to perceiving dynamic road environments and handling complex vehicle dynamics. Using a reinforcement learning algorithm to control an autonomous vehicle on a sea-crossing bridge therefore helps to achieve intelligent vehicle safety and supports the large-scale industrial development of autonomous vehicles.

Disclosure of Invention

Purpose of the invention: the invention provides a roll control method for an autonomous vehicle based on DDPG and iterative control, which enables the vehicle to drive safely on a sea-crossing bridge under road conditions of any complexity level and raises the vehicle's level of intelligence by controlling its instantaneous roll motion.

The technical scheme is as follows: the invention provides an automatic driving vehicle roll control method based on DDPG and iterative control, which specifically comprises the following steps:

(1) installing lidar, vision sensors, millimeter-wave radar, ultrasonic radar sensors, a positioning system, and an inertial navigation system on the autonomous vehicle;

(2) using the vision sensors, the positioning system, and the inertial navigation system to localize the vehicle and build maps in different scenes, thereby generating driving maps of the autonomous vehicle for the different scenes and providing the environment required for the vehicle's driving trajectories;

(3) driving on the sea-crossing bridge by operating the steering wheel, the accelerator, and the pedal, recording the corresponding driving trajectories in rain and snow, in strong wind and severe weather, and on sunny days, and constructing a data set;

(4) training the DDPG algorithm on the driving maps of the autonomous vehicle for the different scenes, covering driving states on the sea-crossing bridge at different road-condition complexity levels in severe weather; the autonomous vehicle generates its real-time state by interacting with the map environment in each scene and determines its actions; during action training, the action space is initialized, the online policy network in the actor network generates state-space information and outputs an action, and action noise is added to obtain an exploratory action space;

(5) generating a predicted path for the state of the autonomous vehicle based on LSTM historical memory and road planning attributes, using the DDPG algorithm to track the path trajectory of the autonomous vehicle under normal and extreme driving road conditions, and using an iterative control method to provide compensation control of the autonomous vehicle.

Further, in step (1) the lidar sensor detects dynamic and static obstacles on the road, including pedestrians, motorcycles, and other vehicles, as well as drivable road areas; the vision sensor senses lane lines, pedestrians, and vehicles for detection, localization, and simultaneous map creation; the millimeter-wave radar sensor measures the distance from the vehicle to pedestrians and to moving vehicles; the ultrasonic radar measures the distance to nearby vehicles; and the vision sensor, positioning system, and inertial navigation system together implement vehicle positioning.

Further, the different scenes in step (2) comprise five scenes: sea-crossing bridge road conditions in rain and snow; sea-crossing bridge road conditions in strong wind and severe weather; road conditions when the bridge vibrates on a sunny day; single-vehicle driving in frequently changing weather; and multi-vehicle driving in frequently changing weather.

Further, the data set in step (3) comprises vehicle speed, driving trajectory, vehicle position, heading angle, slip angle, yaw rate, and roll angle.

Further, the network design of the DDPG algorithm of step (4) is as follows:

an actor network is constructed that takes the vehicle state and the environment state as input and outputs a vector consisting of the steering angle, throttle, and brake signals, corresponding to the 3 neurons of the output layer of the actor policy network; the activation function for the throttle and brake is Sigmoid, and the activation function for the steering action value is Tanh; the hidden layers are structured as follows: the first layer is a convolutional layer with 200 neurons, convolution size 7 × 7, filter size 48, and stride 4; the second layer is a convolutional layer with 400 neurons in total, convolution size 5 × 5, filter size 16, stride 2, and ReLU activation; the third layer is an LSTM layer with 100 neurons; the fourth and fifth layers are fully connected layers with 128 units each; the critic network takes the state and the action as input, concatenates them, and passes them through two hidden layers with ReLU activation, the first with 200 neurons and the second with 400 neurons, to obtain the Q value; define $h_i \in (S_{t-T}, S_{t-T+1}, \dots, S_t)$, where $S_{t-T}$ and $S_t$ denote the state information at time $t-T$ and at the current time $t$, respectively; the encoded state is $s = f(h_i; \beta)$, and the modified actor-network policy is defined as $a = \mu(h_i/\beta, \gamma^{\pi}) + \eta$.
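The output stage described above (Tanh for steering, Sigmoid for throttle and brake, plus additive action noise) can be illustrated with a minimal sketch. The function names, the Gaussian form of the noise, and the clipping step are assumptions made for illustration; the source only states that noise is added to the policy output.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def actor_output(pre_activations, noise_std=0.0, rng=None):
    """Map three raw output-layer values to (steering, throttle, brake).

    Steering uses tanh; throttle and brake use sigmoid, as in the text.
    Gaussian noise and clipping are assumptions of this sketch.
    """
    rng = rng or random.Random()
    z_steer, z_throttle, z_brake = pre_activations
    action = [math.tanh(z_steer), sigmoid(z_throttle), sigmoid(z_brake)]
    if noise_std > 0.0:
        action = [a + rng.gauss(0.0, noise_std) for a in action]
    # Clip back into each activation's natural range after adding noise.
    steer = max(-1.0, min(1.0, action[0]))
    throttle = max(0.0, min(1.0, action[1]))
    brake = max(0.0, min(1.0, action[2]))
    return steer, throttle, brake
```

Calling `actor_output((0.0, 0.0, 0.0))` with no noise gives a centered steering value and half-range throttle and brake, which is the neutral point of the two activation functions.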

Further, tracking control of the path trajectory of the autonomous vehicle under normal driving road conditions in step (5) is implemented as follows:

for normal driving road conditions, namely the road conditions when the bridge vibrates on a sunny day, a vehicle dynamics model is established that accounts for the roll, sideslip, and yaw characteristics of the vehicle; vehicle state constraints are set, and the lateral stability range, the maximum steering angle range, and the allowable control range for preventing roll are determined so as to reduce the lateral offset error of the vehicle:

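$\omega_{z\text{-}\min} \le \omega_z \le \omega_{z\text{-}\max}$, $\omega_{x\text{-}\min} \le \omega_x \le \omega_{x\text{-}\max}$, $u_{x\text{-}\min} \le u_x \le u_{x\text{-}\max}$, $e_{r\text{-}x\text{-}\min} \le e_r \le e_{r\text{-}x\text{-}\max}$

where $\omega_z$ is the yaw angular velocity; $\omega_x$ is the vehicle roll angle; $u_x$ is the steering angle; and $e_r$ is the lateral tracking offset error;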

based on the road state information predicted by the LSTM, an objective function is constructed that accounts for the steering angle, the tire adhesion coefficient, the roll angle error, and the path tracking error; the physical constraints of the vehicle, determined by the lateral tracking offset error, are fixed while fully respecting the dynamic constraint of the maximum allowable vehicle error, so as to reduce the tracking control error:

where $w_1, w_2, w_3, w_4$ are parameter variables and $\mu_r$ is the road adhesion coefficient;

compensation control of vehicle roll is implemented by an iterative control algorithm; a reference vehicle state, a reference control input, and a reference output value are set so as to guarantee tracking under the combined physical constraints of the vehicle and the road constraints, improve disturbance rejection while driving, and reduce the model error rate;

a state space, an action space, and a reward function for the DDPG algorithm are constructed according to the vehicle driving conditions; the action space mainly comprises the steering wheel angle, throttle, and brake signals; the state space comprises the lateral tracking error of the vehicle and its rate of change, the roll angle error and its rate of change, and the yaw rate error and its rate of change; as for the reward function, when the sea-crossing bridge vibrates, the actual trajectory of the vehicle changes and the road surface tilts by a certain angle, so the reward function is set equal to the product of the discount factor and the change in speed.

Further, tracking control of the path trajectory of the autonomous vehicle under extreme driving road conditions in step (5) is implemented as follows:

under extreme driving road conditions, namely in severe weather and under other adverse road factors, the vehicle is prone to slipping on wet surfaces and to vibration, which changes the driving trajectory and affects the actual running speed, so that the actual vehicle speed deviates from the planned vehicle speed; the change in actual vehicle speed is therefore set as follows:

where $v_{ref}$ is the reference vehicle speed; $v_{act}$ is the actual vehicle speed; and $\alpha$ is an influence factor;

when the error between the actual and reference vehicle speeds is within 2 km/h, the actual vehicle speed can be assumed equal to the reference vehicle speed; when the error lies in the interval [2, 5] km/h, the actual vehicle speed is determined from the reference vehicle speed and the difference between the two speeds, where $\alpha$ is the speed variation factor; when the error exceeds 5 km/h, the vehicle must brake immediately to ensure driving safety;

at time $t$, the set of vehicle actions and states is expressed as $\{v_1, \dots, v_i, \dots, v_n\}$, $i = 1, \dots, n$, and a Bezier curve is used for path planning to generate a collision-free predicted trajectory; to judge the accuracy of the vehicle speed planning, a small sample of vehicle states and actions drawn from the experience buffer of the DDPG algorithm is taken as reference values $\{v_{r1}, \dots, v_{ri}, \dots, v_{rn}\}$, $i = 1, \dots, n$, and the error between the two is calculated as:

where $v_i$ is the vehicle speed; $l$ is the vehicle speed error rate; and $v_{ri}$ is the reference vehicle speed obtained from the experience buffer;

under extreme driving conditions, given a desired state reference trajectory $S_r$, the output error is $e_k(t) = S_r(t) - S(t)$ and the learning law is $u_{k+1}(t) = L(u_k(t), e_k(t))$, which yields the compensation control action $a_k$; under normal driving, vehicle control is realized by the DDPG algorithm, whose output action is $a_\pi = \mu(h_i/\beta, \gamma^{\pi}) + \eta$; the total control action is $a = a_k + a_\pi$, and the reference formula is as follows:

to judge the accuracy of path planning, the trajectory error that arises when the actual driving trajectory of the vehicle deviates from the reference trajectory and a change in roll angle occurs is calculated as follows:

where $r_{act}$ is the actual vehicle trajectory under vibration and $r_{ref}$ is the reference trajectory;

the remaining quantities are the maximum angle difference between the actual trajectory and the reference trajectory; $\mu$, the road adhesion coefficient; $\sigma$, the deviation angle when the vehicle sideslips laterally; $\phi$, the vehicle sideslip angle; $d$, the vertical vibration distance; and $\chi$, the deviation angle when the bridge vibrates vertically, that is, the vertical inclination angle of the bridge deck;

the maximum vertical inclination angle $\chi$ produced by bridge vibration can be set as:

an optimal road driving area is identified from the high-precision map built with the vision sensor; within this area, reference vehicle states, control inputs, and output parameter values are designed, multi-state constraints are defined, and roll control of the vehicle is realized by an iterative learning control algorithm;

a limit road driving area is identified from the high-precision map built with the vision sensor and the state predicted by the LSTM, and ranges are defined for the layered uncertain state parameters and action parameters; when the vehicle drives in the limit road driving area, roll control is realized by DDPG (deep deterministic policy gradient) together with the iterative control algorithm, the latter acting as compensation, and a reward function is constructed from the constraint ranges of the vehicle state as follows:

$R = v \cdot (R_1 + R_2 + \dots + R_6)$

where $R_1$ and $R_2$ represent the lateral distance error and its rate of change; $R_3$ and $R_4$ represent the lateral angular velocity and its rate of change; $R_5$ and $R_6$ represent the roll angle and its rate of change; $x$ is an angle value; $v$ is the vehicle speed; $e_y$ is the lateral distance; and $k_i$, $k_j$ are reward factors.
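The structure of this reward, a speed factor multiplying the sum of six terms, can be sketched as below. The individual expressions for $R_1$ through $R_6$ are not recoverable from the text, so this hypothetical helper simply takes them as inputs.

```python
def total_reward(speed: float, terms) -> float:
    """R = v * (R1 + R2 + ... + R6); the six terms are supplied by the caller."""
    terms = list(terms)
    if len(terms) != 6:
        raise ValueError("expected six reward terms R1..R6")
    return speed * sum(terms)
```

Because the sum is multiplied by the vehicle speed, a stationary vehicle (`speed == 0`) receives zero reward regardless of how small its errors are.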

Advantageous effects: compared with the prior art, the invention has the following beneficial effects: 1. it provides an integrated roll control method for autonomous vehicles based on a reinforcement learning algorithm (DDPG); by controlling vehicle roll in complex road environments through reinforcement learning, the autonomous vehicle can drive under complex road conditions and in extreme weather by means of exploration and exploitation; 2. for extreme driving conditions, iterative learning control compensates the DDPG algorithm, achieving integrated control of the vehicle and ensuring that it ultimately drives safely.

Drawings

FIG. 1 is a schematic diagram of the DDPG network architecture;

FIG. 2 is a schematic diagram of the integrated control of vehicle roll;

FIG. 3 is a flowchart of the integrated roll control of the autonomous vehicle.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings.

The invention provides an automatic driving vehicle roll control method based on DDPG and iterative control, which specifically comprises the following steps:

Step 1: lidar, vision sensors, millimeter-wave radar, ultrasonic radar sensors, a positioning system, and an inertial navigation system are installed on the autonomous vehicle.

The invention targets the road environment of a sea-crossing bridge. Its aim is to control the vehicle so that it drives safely at low to medium speed (5-80 km/h) and to achieve a high level of vehicle intelligence by controlling the vehicle's instantaneous roll. To this end, several lidar, machine vision, millimeter-wave radar, and ultrasonic radar sensors are installed on the autonomous vehicle, together with a positioning system and an inertial navigation system (IMU). The lidar sensor detects dynamic and static obstacles on the road, including pedestrians, motorcycles, and other vehicles, as well as drivable road areas; the machine vision sensor senses lane lines, pedestrians, and vehicles for detection, localization, and simultaneous map creation; the millimeter-wave radar sensor measures the distance from the vehicle to pedestrians and to moving vehicles; the ultrasonic radar measures the distance to nearby vehicles; and the positioning system and the inertial navigation system (IMU) implement vehicle positioning.

Step 2: the vision sensor, the positioning system, and the inertial navigation system are used to localize the vehicle and build maps in different scenes, thereby generating driving maps of the autonomous vehicle for the different scenes and providing the environment required for the vehicle's driving trajectories.

The vision sensor, the positioning system, and the inertial navigation system (IMU) are used to localize the vehicle and build maps on sunny days and in severe weather such as rain, snow, fog, and strong wind. This generates driving maps of the autonomous vehicle for five scenes: sea-crossing bridge road conditions in rain and snow; sea-crossing bridge road conditions in strong wind and severe weather; road conditions when the bridge vibrates on a sunny day; single-vehicle driving in frequently changing weather; and multi-vehicle driving in frequently changing weather. These maps provide the environment required for the vehicle's driving trajectories.

Step 3: the vehicle is driven on the sea-crossing bridge by operating the steering wheel, the accelerator, and the pedal; the corresponding driving trajectories are recorded in rain and snow, in strong wind and severe weather, and on sunny days; and a data set is constructed.

Several experienced drivers drive on the sea-crossing bridge in each of the five scenes by operating the steering wheel, the accelerator, and the pedal, and the corresponding driving trajectories are recorded to construct the data sets. Each data set includes vehicle speed, driving trajectory, vehicle position, heading angle, slip angle, yaw rate, and roll angle, providing the reference data needed for training and for evaluating the controllability of the vehicle.
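One possible layout for a recorded sample is sketched below. The class name, field names, units, and scene labels are assumptions for illustration; the source lists only the quantities themselves.

```python
from dataclasses import dataclass


@dataclass
class DrivingSample:
    """One recorded time step; field names and units are assumptions."""
    scene: str               # which of the five scenes was driven
    speed_kmh: float         # vehicle speed
    position: tuple          # (x, y) vehicle position in map coordinates
    heading_rad: float       # heading angle
    slip_angle_rad: float    # slip angle
    yaw_rate_rad_s: float    # yaw rate
    roll_angle_rad: float    # roll angle


def trajectory(samples):
    """The driving trajectory is the ordered sequence of recorded positions."""
    return [s.position for s in samples]
```

Storing the position per sample means the driving trajectory does not need its own field: it falls out of the time-ordered list of samples.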

Step 4: the DDPG algorithm is trained on the driving maps of the autonomous vehicle for the different scenes, covering driving states on the sea-crossing bridge at different road-condition complexity levels in severe weather; the autonomous vehicle generates its real-time state by interacting with the map environment in each scene and determines its actions; during action training, the action space is initialized, the online policy network in the actor network generates state-space information and outputs an action, and action noise is added to obtain an exploratory action space.

As shown in FIG. 1, an actor network is constructed that takes the vehicle state and the environment state as input and outputs a vector consisting of the steering angle, throttle, and brake, corresponding to the 3 neurons of the output layer of the actor policy network. The activation function for the throttle and brake is Sigmoid, and the activation function for the steering action value is Tanh. The hidden layers are structured as follows: the first layer is a convolutional layer with 200 neurons, convolution size 7 × 7, filter size 48, and stride 4; the second layer is a convolutional layer with 400 neurons in total, convolution size 5 × 5, filter size 16, stride 2, and ReLU activation; the third layer is an LSTM layer with 100 neurons; the fourth and fifth layers are fully connected layers with 128 units each. The critic network takes the state and the action as input, concatenates them, and passes them through two hidden layers with ReLU activation, the first with 200 neurons and the second with 400 neurons, to obtain the Q value. Define $h_i \in (S_{t-T}, S_{t-T+1}, \dots, S_t)$, where $S_{t-T}$ and $S_t$ denote the state information at time $t-T$ and at the current time $t$, respectively; the encoded state is $s = f(h_i; \beta)$, and the modified actor-network policy is defined as $a = \mu(h_i/\beta, \gamma^{\pi}) + \eta$.
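The history $(S_{t-T}, \dots, S_t)$ that feeds the LSTM layer can be maintained as a sliding window. The sketch below is one way to do this; the class name `StateHistory` and the `horizon` parameter (standing in for $T$) are hypothetical.

```python
from collections import deque


class StateHistory:
    """Keeps the last T + 1 states, i.e. S_{t-T} through S_t."""

    def __init__(self, horizon: int):
        # A bounded deque silently discards the oldest state when full.
        self._window = deque(maxlen=horizon + 1)

    def push(self, state):
        self._window.append(state)

    def ready(self) -> bool:
        """True once a full window of T + 1 states is available."""
        return len(self._window) == self._window.maxlen

    def sequence(self):
        """Oldest state first, in the order a sequence encoder expects."""
        return list(self._window)
```

With `horizon=2`, pushing four states leaves only the three most recent in the window, so the encoder always sees a fixed-length sequence ending at the current state.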

The action space of the vehicle is designed to comprise the steering wheel angle $\delta$, the brake signal, and the throttle signal. Because the driving environment is complex, the vehicle drives at variable speed when road conditions are complex. The brake signal is the action applied under extreme driving conditions to prevent roll and rollover caused by braking and a wet, slippery road surface; in that case the action space is set to three actions: steering wheel angle, brake signal, and throttle signal. Under normal driving conditions the vehicle is assumed to travel at constant speed, and to prevent roll on wet, slippery surfaces the action space is set to the steering wheel angle and the throttle signal. Constraint ranges for the three actions are set according to these two driving conditions to ensure that the vehicle remains controllable within the drivable road area.
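The switch between the two action spaces, together with the constraint ranges, can be sketched as follows. The numeric ranges and all names here are illustrative assumptions; the source does not give concrete limits.

```python
def clamp(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


# Illustrative constraint ranges; the source does not state numeric limits.
ACTION_RANGES = {
    "steering": (-1.0, 1.0),
    "throttle": (0.0, 1.0),
    "brake": (0.0, 1.0),
}


def active_actions(extreme: bool):
    """Extreme conditions use all three actions; normal driving omits braking."""
    if extreme:
        return ("steering", "brake", "throttle")
    return ("steering", "throttle")


def constrain_action(raw: dict, extreme: bool) -> dict:
    """Keep only the active actions and clamp each to its allowed range."""
    return {name: clamp(raw[name], *ACTION_RANGES[name])
            for name in active_actions(extreme)}
```

Under normal driving the brake entry is dropped entirely rather than set to zero, which mirrors the two-action space described for constant-speed travel.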

The vehicle acquires state data through exploration and exploitation in its interaction with the environment; these data typically include the lateral distance and its rate of change and the vehicle roll angle and its rate of change, and are stored in an experience buffer. When iterative learning is used for vehicle control, reference vehicle states and reference trajectories must be designed; these can be obtained from the experience buffer, and the reference trajectories can be adjusted to match the complexity of different road environments.
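A minimal experience buffer of the kind used by DDPG-style algorithms might look like the sketch below. The class name, the tuple layout, and uniform sampling are assumptions; the source does not specify how the buffer is organized.

```python
import random
from collections import deque


class ExperienceBuffer:
    """Fixed-capacity store of (state, action, reward, next_state) tuples."""

    def __init__(self, capacity: int, seed=None):
        self._items = deque(maxlen=capacity)   # oldest entries are dropped first
        self._rng = random.Random(seed)

    def add(self, state, action, reward, next_state):
        self._items.append((state, action, reward, next_state))

    def sample(self, batch_size: int):
        """Draw a small batch uniformly at random, without replacement."""
        count = min(batch_size, len(self._items))
        return self._rng.sample(list(self._items), count)

    def __len__(self):
        return len(self._items)
```

The same `sample` call could supply the small batches of reference states and trajectories that the iterative learning step needs.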

In the five scenes (sea-crossing bridge road conditions in rain and snow, sea-crossing bridge road conditions in strong wind and severe weather, road conditions when the bridge vibrates on a sunny day, single-vehicle driving in frequently changing weather, and multi-vehicle driving in frequently changing weather), the trajectories generated in each scene serve as reference paths, and the path actually planned for the vehicle is compared against the reference path to compute the error. Constraints that satisfy the dynamic characteristics of the vehicle must also be added to the reference path, which is then modified and adjusted to serve as the actual path; this can be expressed as:

where $\sigma$ is the path influence factor, $p_{ref}$ is the reference trajectory, and $p_{act}$ is the actual trajectory. That is, when the vehicle drives in different road environments, the recorded driving trajectory must be suitably modified and adjusted to satisfy the vehicle dynamics during autonomous driving before it can be used as the driving trajectory of the autonomous vehicle.

Step 5: as shown in FIG. 2, a predicted path for the state of the autonomous vehicle is generated based on LSTM historical memory and road planning attributes; the DDPG algorithm provides tracking control of the path trajectory of the autonomous vehicle under normal and extreme driving road conditions, and an iterative control method provides compensation control of the autonomous vehicle.

Under normal driving road conditions, namely the scene among the five in which the bridge vibrates on a sunny day, the roll, sideslip, and yaw dynamics of the vehicle must be considered in the vehicle dynamics model to ensure safety and stability. Vehicle state constraints are set, and the lateral stability range, the maximum steering angle range, and the allowable control range for preventing roll are determined so as to reduce the lateral offset error of the vehicle:

where $\Psi = [v_y\ \omega_z\ \omega_x\ \phi]^T$ is the state vector, $u_1$ is the control input, and $u_2$ is the auxiliary control input; $\omega_z$ is the yaw angular velocity; $\omega_x$ is the vehicle roll angle; $u_x$ is the steering angle; and $e_r$ is the lateral tracking offset error.
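The box constraints on $\omega_z$, $\omega_x$, $u_x$, and $e_r$ amount to checking each quantity against a lower and an upper bound. A minimal sketch follows; the dictionary keys, units, and numeric limits are illustrative assumptions and do not come from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


def violated_constraints(state: dict, limits: dict) -> list:
    """Return the names of quantities that fall outside their bounds."""
    return [name for name, bounds in limits.items()
            if not bounds.contains(state[name])]


# Illustrative limits only; the source gives no numeric values.
LIMITS = {
    "yaw_rate": Bounds(-0.5, 0.5),        # omega_z
    "roll": Bounds(-0.1, 0.1),            # omega_x
    "steering": Bounds(-0.6, 0.6),        # u_x
    "lateral_error": Bounds(-0.3, 0.3),   # e_r
}
```

An empty result from `violated_constraints` means the state lies inside the allowed region; a non-empty list names exactly which limits a controller would need to react to.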

where $w_1, w_2, w_3, w_4$ are parameter variables and $\mu_r$ is the road adhesion coefficient.

As shown in FIG. 3, an iterative control algorithm provides compensation control of vehicle roll. A reference vehicle state, a reference control input, and a reference output value are set so as to guarantee tracking under the combined physical constraints of the vehicle and the road constraints, improve disturbance rejection while driving, and reduce the model error rate. First, the DDPG algorithm is used to train the network model through interaction between the vehicle and the dynamic road conditions on the sea-crossing bridge, and the training checks whether the task is completed. If the task is completed, the trained actions are stored. If the training result is unsatisfactory, the iterative control algorithm compensates the parameters of the output action space until the training task is completed, yielding better actions and ultimately better roll control of the autonomous vehicle.
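The compensation idea can be illustrated with a proportional (P-type) iterative learning update, $u_{k+1}(t) = u_k(t) + \Gamma\, e_k(t)$, which is one common choice for the learning law $L$ given in the disclosure; the source does not say which form it uses. The gain value, the static toy plant, and all function names below are assumptions made for this sketch.

```python
def ilc_update(u_k, e_k, gain=0.5):
    """P-type law u_{k+1}(t) = u_k(t) + gain * e_k(t), one possible choice of L."""
    return [u + gain * e for u, e in zip(u_k, e_k)]


def run_ilc(reference, plant, iterations=30, gain=0.5):
    """Repeat the same trial, refining the compensation input each time."""
    u = [0.0] * len(reference)
    for _ in range(iterations):
        output = [plant(x) for x in u]
        error = [r - y for r, y in zip(reference, output)]   # e_k = S_r - S
        u = ilc_update(u, error, gain)
    return u


def total_action(a_policy, a_compensation):
    """a = a_k + a_pi: the DDPG action plus the ILC compensation term."""
    return [p + c for p, c in zip(a_policy, a_compensation)]
```

For a toy plant such as `lambda u: 0.8 * u`, each iteration shrinks the tracking error by a constant factor, so the compensation input converges toward the value that reproduces the reference. A real vehicle model would need a gain chosen to satisfy the corresponding convergence condition.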

Extreme driving conditions comprise four of the five scenes: sea-crossing bridge road conditions in rain and snow, sea-crossing bridge road conditions in strong wind and severe weather, single-vehicle driving in frequently changing weather, and multi-vehicle driving in frequently changing weather. Under the influence of severe weather, the road environment becomes uncertain, which interferes with normal driving.
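The split of the five scenes into one normal and four extreme conditions can be captured in a small enumeration. The identifier names are hypothetical; only the grouping itself comes from the text.

```python
from enum import Enum


class Scene(Enum):
    RAIN_SNOW_BRIDGE = "sea-crossing bridge in rain or snow"
    STRONG_WIND_BRIDGE = "sea-crossing bridge in strong wind"
    SUNNY_VIBRATING_BRIDGE = "vibrating bridge on a sunny day"
    SINGLE_VEHICLE_CHANGEABLE = "single vehicle in changeable weather"
    MULTI_VEHICLE_CHANGEABLE = "multiple vehicles in changeable weather"


def is_extreme(scene: Scene) -> bool:
    """Only the sunny-day vibrating-bridge scene counts as normal driving."""
    return scene is not Scene.SUNNY_VIBRATING_BRIDGE
```

A controller could branch on `is_extreme(scene)` to decide which action space and constraint set to apply.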

Under extreme driving road conditions, the vehicle is prone to slipping on wet surfaces and to vibration, which changes the driving trajectory and affects the actual running speed, so that the actual vehicle speed deviates from the planned vehicle speed. The change in actual vehicle speed can therefore be set as follows:

where $v_{ref}$ is the reference vehicle speed; $v_{act}$ is the actual vehicle speed; and $\alpha$ is an influence factor.

When the error between the actual and reference vehicle speeds is within 2 km/h, the actual vehicle speed can be assumed equal to the reference vehicle speed. When the error lies in the interval [2, 5] km/h, the actual vehicle speed is determined from the reference vehicle speed and the difference between the two speeds, where $\alpha$ is the speed variation factor. When the error exceeds 5 km/h, the vehicle must brake immediately to ensure driving safety.
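The three speed-deviation regimes can be expressed as a simple classifier. The regime labels are hypothetical, and treating a deviation of exactly 2 km/h as part of the middle interval is an assumption; the correction formula for the middle regime is not reproduced here because it is not recoverable from the text.

```python
def speed_regime(v_act_kmh: float, v_ref_kmh: float) -> str:
    """Classify the speed deviation into the three regimes described above."""
    error = abs(v_act_kmh - v_ref_kmh)
    if error < 2.0:
        return "track_reference"    # treat v_act as equal to v_ref
    if error <= 5.0:
        return "correct_speed"      # adjust using the speed variation factor
    return "emergency_brake"        # deviation above 5 km/h
```

For example, a vehicle at 63.5 km/h against a 60 km/h reference falls in the correction regime, while one at 66 km/h triggers immediate braking.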

Path prediction through LSTM calendar for autonomous vehiclesAfter the state is memorized by history, a predicted speed path is generated by using the road planning attribute, and at the time t, the motion and state of the vehicle are collectively expressed as follows: { v₁,…,v_i…v_n1, …, n, and realizing path planning by adopting a Bezier curve to generate a predicted trajectory without collision; to judge the accuracy of the vehicle speed planning, the vehicle states and actions of small samples taken from the experience buffer of the DDPG algorithm as reference values are: { v_r1,…,v_ri…v_rn1, …, n, and calculating the error of the two as:

where v_i is the vehicle speed; L is the vehicle speed error rate; v_ri is the reference vehicle speed obtained from the experience buffer.
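As a rough illustration of the two ingredients above, the sketch below evaluates a cubic Bezier curve and computes a speed error rate. Both helpers are hypothetical: the mean relative error is only one plausible form for L (the original formula is not reproduced here), the choice of a cubic curve is an assumption, and collision checking is not included.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point, s: float) -> Point:
    """Evaluate a cubic Bezier curve at parameter s in [0, 1] (Bernstein form)."""
    u = 1.0 - s
    return tuple(
        u**3 * a + 3 * u**2 * s * b + 3 * u * s**2 * c + s**3 * d
        for a, b, c, d in zip(p0, p1, p2, p3)
    )


def speed_error_rate(planned: Sequence[float], reference: Sequence[float]) -> float:
    """Mean relative error between planned speeds v_i and buffer speeds v_ri.

    Assumed definition of the error rate L.
    """
    if len(planned) != len(reference) or len(planned) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    if any(vr == 0 for vr in reference):
        raise ValueError("reference speeds must be non-zero")
    total = sum(abs(v - vr) / abs(vr) for v, vr in zip(planned, reference))
    return total / len(planned)


# Midpoint of a curve from (0, 0) to (1, 0) with two control points.
print(cubic_bezier((0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0), 0.5))
# Error rate for three planned speeds against a constant 50 km/h reference.
print(round(speed_error_rate([50.0, 55.0, 60.0], [50.0, 50.0, 50.0]), 6))
```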

Under extreme driving conditions, given a desired state reference trajectory S_r, the output error is e_k(t) = S_r(t) − S(t) and the learning law is u_{k+1}(t) = L(u_k(t), e_k(t)), which yields the compensation control action a_k. Under normal driving, the DDPG algorithm implements vehicle control, and its output action is a_π = μ(h_i/β, γ^π) + η. The total control action is then a = a_k + a_π; the reference formula is as follows:
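The learning law and the summed action can be illustrated with a minimal Python sketch. It assumes a P-type update u_{k+1}(t) = u_k(t) + gain·e_k(t) as one possible choice of L(·); the static `toy_plant`, the `gain` value, and the fixed number `a_pi` standing in for the DDPG actor output are all hypothetical.

```python
from typing import Callable, List


def output_error(s_ref: List[float], s: List[float]) -> List[float]:
    """e_k(t) = S_r(t) - S(t) for every time step t."""
    return [r - y for r, y in zip(s_ref, s)]


def ilc_update(u_k: List[float], e_k: List[float], gain: float) -> List[float]:
    """Assumed P-type learning law: u_{k+1}(t) = u_k(t) + gain * e_k(t)."""
    return [u + gain * e for u, e in zip(u_k, e_k)]


def run_ilc(plant: Callable[[List[float]], List[float]],
            s_ref: List[float], gain: float, iterations: int) -> List[float]:
    """Repeat the trial, refining the control input from the previous error."""
    u = [0.0] * len(s_ref)
    for _ in range(iterations):
        e = output_error(s_ref, plant(u))
        u = ilc_update(u, e, gain)
    return u


def toy_plant(u: List[float]) -> List[float]:
    """Hypothetical static plant S(t) = 2 * u(t), used only for the demo."""
    return [2.0 * x for x in u]


s_ref = [1.0, 2.0, 3.0]
u_final = run_ilc(toy_plant, s_ref, gain=0.25, iterations=20)
final_error = output_error(s_ref, toy_plant(u_final))

a_k = u_final[0]      # compensation action at the first time step
a_pi = 0.10           # placeholder for the DDPG actor output
a_total = a_k + a_pi  # total action a = a_k + a_pi
print(max(abs(e) for e in final_error) < 1e-5, round(a_total, 3))
```

With this plant and gain the error halves on every trial, which is why twenty iterations suffice in the demo; a real vehicle model would need its own convergence analysis.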

To judge the accuracy of path planning, the trajectory error arising when the actual running trajectory of the vehicle and the reference trajectory differ by a change in roll angle is calculated as follows. Vehicle roll motion arises mainly in two situations. In the first, the bridge vibrates and the autonomous vehicle deviates from the planned path, at which point the vehicle rolls easily and a roll angle is produced; the trajectory error between the actual running trajectory and the reference trajectory under this change in roll angle can therefore be expressed as:

The formula uses the maximum angle difference between the actual trajectory and the reference trajectory; σ is the deviation angle when the vehicle slides laterally; Φ is the vehicle roll angle; d is the vertical vibration distance; and χ is the deviation angle during vertical vibration of the bridge, that is, the vertical inclination produced by the bridge deck.

In the second situation, severe weather makes the road surface wet and slippery, so the road adhesion coefficient changes and the vehicle undergoes roll, sideslip, and rollover motion; the trajectory error between the actual running trajectory and the reference trajectory under this change in roll angle can therefore be expressed as:

The formula uses the maximum angle difference between the actual trajectory and the reference trajectory; μ is the road adhesion coefficient, whose range is [0, 1]; σ is the deviation angle when the vehicle slides laterally; Φ is the vehicle roll angle; d is the vertical vibration distance; and χ is the deviation angle during vertical vibration of the bridge, that is, the vertical inclination produced by the bridge deck.

According to the high-precision map built with the vision sensor, the optimal road driving area is searched for. Within this area, the reference vehicle state, control input, and parameter output values are designed, constraints on multiple classes of state are designed, and an iterative learning control algorithm implements roll control of the vehicle, with the DDPG algorithm playing a compensating role, so that the vehicle drives safely within the controllable road area. According to the high-precision map built with the vision sensor and the state predicted by the LSTM, the limit road driving area is searched for, and the ranges of the layered uncertain state parameters and action parameters are designed. When the vehicle drives in the limit road driving area, DDPG and the iterative control algorithm together implement roll control of the vehicle, with the iterative control algorithm playing a compensating role. Under limit conditions the constraints on the vehicle state need to be increased, and the reward function is constructed as follows:

R = v·(R₁ + R₂ + … + R₆)

where R₁ and R₂ represent the lateral distance error and its rate of change; R₃ and R₄ represent the lateral angular velocity and its rate of change; R₅ and R₆ represent the roll angle and its rate of change; x is an angle value; v is the vehicle speed; e_y is the lateral distance; and k_i and k_j are reward factors.
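The structure R = v·(R₁ + … + R₆) can be sketched as below. The text does not give the expressions for the individual terms, so the `penalty` helper (a factor-weighted negative absolute value) and all numeric values are assumptions made purely for illustration.

```python
from typing import Sequence


def penalty(k: float, value: float) -> float:
    """Hypothetical term shape: a reward factor k times a negative absolute error."""
    return -k * abs(value)


def total_reward(v: float, components: Sequence[float]) -> float:
    """R = v * (R1 + R2 + ... + R6), following the structure in the text."""
    if len(components) != 6:
        raise ValueError("expected the six terms R1..R6")
    return v * sum(components)


components = [
    penalty(1.0, 0.5),   # R1: lateral distance error (assumed form)
    penalty(0.5, 0.2),   # R2: its rate of change
    penalty(1.0, 0.1),   # R3: lateral angular velocity
    penalty(0.5, 0.0),   # R4: its rate of change
    penalty(2.0, 0.05),  # R5: roll angle
    penalty(1.0, 0.0),   # R6: its rate of change
]
print(round(total_reward(10.0, components), 6))  # -8.0
```

Scaling the summed terms by v means that the same tracking errors cost more at higher speed, which matches the multiplicative form given above.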

The above describes only possible embodiments of the present invention and is not intended to limit its scope of protection; all equivalent modifications or variations that do not depart from the technical spirit of the present invention are intended to fall within that scope.

Claims

1. A roll control method for an autonomous vehicle based on DDPG and iterative control, characterized by comprising the following steps:

(1) installing lidar, vision, millimeter-wave radar, and ultrasonic radar sensors, a positioning system, and an inertial navigation system on the autonomous vehicle;

(2) using the vision sensor, the positioning system, and the inertial navigation system to obtain the position of the vehicle and the map in each of the different scenarios, so as to generate autonomous-vehicle driving maps for the different scenarios and provide the environment required for the vehicle's driving trajectory;

(3) controlling the steering wheel, the accelerator, and the pedal respectively while driving on a sea-crossing bridge, acquiring the corresponding driving trajectories in rain and snow, in strong wind and severe weather, and in sunny weather, and constructing a data set;

(4) training the DDPG algorithm on the autonomous-vehicle driving maps of the different scenarios, for the driving states on the sea-crossing bridge at different levels of road complexity in severe weather; the autonomous vehicle interacts with the map environment of the different scenarios to generate the real-time vehicle state and determine the action of the vehicle; during action training, the action space is initialized, the online policy network in the actor network generates state space information and outputs an action, and an action noise is added to obtain an exploratory action space;

(5) generating a path for state prediction of the autonomous vehicle based on LSTM historical memory and road planning attributes, using the DDPG algorithm to implement tracking control of the path trajectory of the autonomous vehicle under normal driving road conditions and under extreme driving road conditions, and using an iterative control method to implement compensation control of the autonomous vehicle.

2. The roll control method for an autonomous vehicle based on DDPG and iterative control according to claim 1, characterized in that the lidar sensor in step (1) is used to detect dynamic and static obstacles on the road, including pedestrians, motorcycles, and various vehicles, as well as the drivable road area; the vision sensor is used for lane line perception, pedestrian and vehicle detection, and localization and simultaneous map creation; the millimeter-wave radar sensor is used to detect the distance between the vehicle and pedestrians and moving vehicles; the ultrasonic radar is used to detect short-range distances between vehicles; and the vision sensor, the positioning system, and the inertial navigation system are used to implement vehicle positioning.

3. The roll control method for an autonomous vehicle based on DDPG and iterative control according to claim 1, characterized in that the different scenarios in step (2) comprise five scenarios: sea-crossing bridge road conditions in rain and snow, sea-crossing bridge road conditions in strong wind and severe weather, road conditions when the bridge vibrates in sunny weather, single-vehicle driving road conditions under frequently changing weather, and multi-vehicle driving road conditions under frequently changing weather.

4. The roll control method for an autonomous vehicle based on DDPG and iterative control according to claim 1, characterized in that the data set in step (3) comprises vehicle speed, driving trajectory, vehicle position, heading angle, slip angle, yaw rate, and roll angle.

5. The roll control method for an autonomous vehicle based on DDPG and iterative control according to claim 1, characterized in that the network of the DDPG algorithm in step (4) is designed as follows:

An actor network is constructed that takes the vehicle state and the environment state as input and outputs a vector composed of the steering angle, accelerator, and brake signals, corresponding respectively to the 3 neurons of the output layer of the actor policy network. The activation function for the accelerator and the brake is Sigmoid, and the activation function for the steering action value is Tanh. The hidden layers are structured as follows: the first layer has a convolution size of 7*7, a filter size of 48, and a stride of 4, with 200 neurons in total; the second layer has a convolution size of 5*5, a filter size of 16, and a stride of 2, uses the ReLU activation function, and has 400 neurons in total; the third layer adds an LSTM layer with 100 neurons; the fourth layer is a fully connected layer with 128 units; the fifth layer is a fully connected layer with 128 units in total. The critic network takes the state and action spaces as input and passes them through two hidden layers, the first with 200 neurons and the second with 400 neurons, concatenated with the ReLU activation function, to finally obtain the Q value. Define h_i ∈ (S_{t-T}, S_{t-T+1}, …, S_t), where S_{t-T} and S_t respectively denote the current time and the state information at the current time; the encoded state is then s = f(h_i; β), and the policy of the modified actor network is defined as a = μ(h_i/β, γ^π) + η.

6. The roll control method for an autonomous vehicle based on DDPG and iterative control according to claim 1, characterized in that the tracking control of the path trajectory of the autonomous vehicle under normal driving road conditions in step (5) is implemented as follows:

The normal driving road condition is the road condition when the bridge vibrates in sunny weather. Considering the roll, sideslip, and yaw dynamics of the vehicle, a vehicle dynamics model is established and vehicle state constraints are set, determining the lateral stability range, the maximum steering angle range, and the allowable vehicle control range for preventing roll, so as to reduce the lateral offset error of the vehicle:

ω_z-min ≤ ω_z ≤ ω_z-max, ω_x-min ≤ ω_x ≤ ω_x-max, u_x-min ≤ u_x ≤ u_x-max, e_r-x-min ≤ e_r ≤ e_r-x-max

where ω_z is the yaw rate; ω_x is the vehicle roll angle; u_x is the steering angle; e_r is the lateral tracking offset error;

According to the road state information predicted by the LSTM, an objective function is constructed that considers the steering angle, tire adhesion coefficient, roll angle error, and path tracking error; taking full account of the dynamic constraint of the vehicle's maximum allowable error, the physical constraints of the vehicle determined by the lateral tracking offset error are established, so as to reduce the error of tracking control of the vehicle:

where w₁, w₂, w₃, and w₄ are parameter variables; μ_r is the road adhesion coefficient;

An iterative control algorithm is used to implement compensation control of vehicle roll, and the reference vehicle state, reference control input, and reference output values are set to ensure the tracking function under the vehicle's physical constraints and the road constraints in the multi-constraint setting, increasing the vehicle's disturbance rejection while driving and reducing the model error rate;

According to the driving conditions of the vehicle, the state space, action space, and reward function required by the DDPG algorithm are constructed. The action space mainly comprises the steering wheel angle, accelerator, and brake signals. The state space comprises the vehicle lateral tracking error and its rate of change, the vehicle roll angle error and its rate of change, and the yaw rate error and its rate of change. As for the reward function, when the sea-crossing bridge vibrates, the actual trajectory of the vehicle changes and the road acquires a certain inclination, so the reward function equals the cumulative product of the discount factor and the change in speed.

7. The roll control method for an autonomous vehicle based on DDPG and iterative control according to claim 1, characterized in that the tracking control of the path trajectory of the autonomous vehicle under extreme driving road conditions in step (5) is implemented as follows:

Under extreme driving road conditions, that is, in severe weather and under other factors that affect road conditions, the vehicle is prone to slipping and vibration, which changes the driving trajectory and affects the actual running speed of the vehicle, so that the actual vehicle speed deviates from the planned vehicle speed. The change in actual vehicle speed is therefore set as follows:

where v_ref is the reference vehicle speed; v_act is the actual vehicle speed; α is the influence factor;

When the error between the actual vehicle speed and the reference vehicle speed is within 2 km/h, the actual vehicle speed can be assumed equal to the reference vehicle speed; when the error lies in the [2, 5] km/h interval, the actual vehicle speed is determined from the reference vehicle speed and the difference between the two speeds, where α is the speed change factor; when the error exceeds 5 km/h, the vehicle must brake immediately to ensure driving safety;

At time t, the set of vehicle actions and states is expressed as {v₁, …, v_i, …, v_n}, i = 1, …, n, and path planning is implemented with a Bezier curve to generate a collision-free predicted trajectory; to judge the accuracy of the vehicle speed planning, a small sample of vehicle states and actions is taken from the experience buffer of the DDPG algorithm as reference values {v_r1, …, v_ri, …, v_rn}, i = 1, …, n, and the error between the two is calculated as:

where v_i is the vehicle speed; L is the vehicle speed error rate; v_ri is the reference vehicle speed obtained from the experience buffer;

Under extreme driving conditions, given a desired state reference trajectory S_r, the output error is e_k(t) = S_r(t) − S(t) and the learning law is u_{k+1}(t) = L(u_k(t), e_k(t)), which yields the compensation control action a_k; under normal driving, the DDPG algorithm implements vehicle control, and its output action is a_π = μ(h_i/β, γ^π) + η; the total control action is then a = a_k + a_π, and the reference formula is as follows:

To judge the accuracy of path planning, the trajectory error arising when the actual running trajectory of the vehicle and the reference trajectory differ by a change in roll angle is calculated as follows:

where r_act is the vehicle trajectory after the actual vibration and r_ref is the reference trajectory; the formula also uses the maximum angle difference between the actual trajectory and the reference trajectory; μ is the road adhesion coefficient; σ is the deviation angle when the vehicle slides laterally; Φ is the vehicle roll angle; d is the vertical vibration distance; and χ is the deviation angle during vertical vibration of the bridge, that is, the vertical inclination produced by the bridge deck;

The maximum value of the vertical inclination χ produced by bridge vibration can be set as:

According to the high-precision map built with the vision sensor, the optimal road driving area is searched for; within this area, the reference vehicle state, control input, and parameter output values are designed, constraints on multiple classes of state are designed, and an iterative learning control algorithm implements roll control of the vehicle;

According to the high-precision map built with the vision sensor and the state predicted by the LSTM, the limit road driving area is searched for, and the ranges of the layered uncertain state parameters and action parameters are designed; when the vehicle drives in the limit road driving area, DDPG and the iterative control algorithm together implement roll control of the vehicle, with the iterative control algorithm playing a compensating role, and the reward function is constructed according to the constraint range of the vehicle state as:

R = v·(R₁ + R₂ + … + R₆)

where R₁ and R₂ represent the lateral distance error and its rate of change; R₃ and R₄ represent the lateral angular velocity and its rate of change; R₅ and R₆ represent the roll angle and its rate of change; x is an angle value; v is the vehicle speed; e_y is the lateral distance; and k_i and k_j are reward factors.