CN116524741A

Info

Publication number: CN116524741A
Application number: CN202310451039.4A
Authority: CN
Inventors: 曹健; 戈萧; 钱诗友
Original assignee: Shanghai Jiao Tong University
Current assignee: Shanghai Jiao Tong University
Priority date: 2023-04-24
Filing date: 2023-04-24
Publication date: 2023-08-01
Anticipated expiration: 2043-04-24
Also published as: CN116524741B

Abstract

The invention provides a reinforcement-learning-based method and system for priority passage of special vehicles. The method comprises the following steps. Arrangement step: a signal light control agent and a communication judgment agent are deployed at the intersection. Signal light control step: the signal light control agent controls the operation of the intersection's signal lights and decides, from the acquired traffic state, which phase to switch to. Communication judgment step: when a special vehicle arrives, the communication judgment agent determines which surrounding agents to communicate with and notifies the downstream intersections of the special vehicle's information. Coordination step: the phase scheduling policy is optimized by combining the signal light control agent and the communication judgment agent, so that special vehicles pass with priority. By combining the two agents, the invention obtains a better phase scheduling policy and achieves priority passage for special vehicles.

Description

Method and system for priority passage of special vehicles based on reinforcement learning

Technical field

The present invention relates to the technical field of special vehicle traffic, and in particular to a reinforcement-learning-based method and system for priority passage of special vehicles.

Background art

Special vehicles are vehicles with special purposes, as distinct from the ordinary (social) vehicles seen in everyday traffic. When a traffic accident or other emergency occurs in a city, emergency vehicles such as ambulances and fire trucks must reach the scene for rescue, and special vehicles such as street sweepers, police cars, and engineering vehicles must carry out their special tasks. These vehicles need to reach the scene quickly; that is, they need priority passage.

Under the relevant Chinese traffic laws, a special vehicle that does not disrupt normal traffic has an absolute right of way and is not bound by driving route, lane, direction, speed, or traffic signals. However, data show that even with an absolute right of way, a special vehicle is helpless in congested traffic: with social vehicles ahead of it, an emergency vehicle has to stop and wait for the green light. In addition, when social vehicles fail to give way in time, the delay of special vehicles increases greatly.

At present, priority passage of special vehicles is implemented with a signal priority strategy: when an emergency vehicle approaches a signal and is recognized by roadside detection equipment, the current signal phase is switched to serve the lane the emergency vehicle is in, so that the emergency vehicle and the vehicles in front of it pass quickly. This approach cannot handle several emergency vehicles arriving at an intersection at the same time, is unsuited to traffic that is already congested, and can make congestion worse.

Chinese invention patent publication CN113096419A discloses a signal control method serving priority passage of vehicles. Taking improved green-light utilization as the optimization goal, it senses a vehicle's priority demand at an intersection from the vehicle's real-time positioning data; analyzes how well the timing plan matches traffic demand from operating dynamics such as the real-time flow and queue length of each movement at the intersection; raises the green-time utilization of non-priority movements without changing the cycle or phase sequence; and extends the priority vehicle's phase as far as possible so that special vehicles queue for less time and pass the intersection first. This effectively reduces the impact of special vehicle priority on the normal passage of social vehicles. In addition, by predicting the special vehicle's arrival time at the intersection, it fine-tunes the timing two cycles in advance so that the intersection signal plan transitions smoothly.

Regarding the above related art, the inventors consider that the direction of traffic flow at an intersection is governed mainly by the signal lights, so designing an efficient traffic signal controller has long been an important problem in traffic engineering. Because the traffic environment is complex and uncertain, traditional models are difficult to solve. Against this background, a traffic signal control algorithm suited to priority passage of special vehicles is of considerable significance.

Summary of the invention

In view of the defects in the prior art, the object of the present invention is to provide a reinforcement-learning-based method and system for priority passage of special vehicles.

The reinforcement-learning-based method for priority passage of special vehicles provided by the present invention comprises the following steps:

Arrangement step: a signal light control agent is deployed at the intersection;

Signal light control step: the signal light control agent controls the operation of the intersection's signal lights and decides, from the acquired traffic state, which phase to switch to.

Preferably, in the arrangement step, a communication judgment agent is also deployed at the intersection;

The method further comprises a communication judgment step: when a special vehicle arrives, the communication judgment agent determines which surrounding agents to communicate with and notifies the downstream intersections of the special vehicle's information;

Coordination step: the phase scheduling policy is optimized by combining the signal light control agent and the communication judgment agent, so that special vehicles pass with priority.

Preferably, the method further comprises an agent training step: the traffic signal light control agent and the communication judgment agent are trained using reinforcement learning.

Preferably, in the signal light control step, the signal light control agent observes the real-time traffic state of the intersection, makes periodic decisions based on the observed state, and dynamically schedules the intersection's signals so that special vehicles pass with priority;

The traffic state perceived by the signal light control agent comprises the special vehicle state, the social vehicle state, and the traffic signal state;

The special vehicle state state_ev comprises the lane information p_i of the special vehicle and its instantaneous speed s_i, encoded as:

state_ev = [p₁, p₂, …, p_n, s₁, s₂, …, s_n]

where n is the total number of entrance lanes at the intersection. All values are initialized to 0; if a special vehicle is detected on lane i, then p_i = 1 and the corresponding s_i is set to that vehicle's current speed;

The social vehicle state state_social comprises, for each entrance lane i, the queue length q_i of social vehicles within the observation range and the vehicle density d_i within the perception range:

state_social = [d₁, d₂, …, d_n, q₁, q₂, …, q_n]

The traffic signal state state_cross comprises the current traffic signal phase;

The perceived state S of the signal light control agent is

S = (state_ev, state_social, state_cross).

Preferably, in the signal light control step, the reward function of the signal light control agent comprises a special vehicle reward function and a social vehicle reward function;

The special vehicle reward function reward_ev is calculated by the following formula:

R_p = 20

where E_c denotes the set of special vehicles observed in the current decision round, E_p denotes the set of special vehicles absent from the current round but observed in the previous round, R_c(e) denotes the reward for special vehicle e in E_c, R_p denotes the reward for a special vehicle in E_p, speed_e denotes the speed of vehicle e, and wait_e denotes the waiting time of vehicle e at the intersection;

The social vehicle reward function reward_social is calculated by the following formula:

where L is the total number of entrance lanes, q_l is the queue length of lane l, N is the total number of vehicles within the lane detection range, wait_i is the waiting time of vehicle i, and α is a coefficient;

The reward function R of the signal light control agent is the weighted sum of the social vehicle reward function and the special vehicle reward function:

R = reward_ev + β * reward_social

where β is a coefficient that adjusts the weight of social vehicles in the reward.

Preferably, in the communication judgment step, the communication judgment agent observes the real-time traffic state; if it detects an arriving special vehicle, it judges the vehicle's direction of travel and notifies the agents at the downstream intersections;

The traffic state perceived by the communication judgment agent comprises the state of the special vehicle and the state of the lanes leading to the surrounding intersections;

The special vehicle state state_ev comprises the lane the special vehicle is in and the category of the special vehicle, encoded as:

state_ev = [p₁, p₂, …, p_n, c₁, c₂, …, c_g]

where n is the total number of entrance lanes at the intersection and g is the number of special vehicle categories. All values are initialized to 0; if a special vehicle is detected on lane i, then p_i = 1, and if the special vehicle is of category i, then c_i = 1;

The state state_to of the lanes to the surrounding intersections comprises the overall vehicle density of the lanes from this intersection to each neighboring intersection, encoded as:

state_to = [t₁, t₂, …, t_h]

where h is the total number of neighbors of the intersection and t_i is the vehicle density on the road from this intersection to neighboring intersection i;

The perceived state S_j of the communication judgment agent is:

S_j = (state_ev, state_to).

Preferably, in the communication judgment step, the reward function R_j of the communication judgment agent is calculated by the following formula:

where N_target is the node the special vehicle finally reaches, N_impossible is the set of nodes it cannot reach, and N_judge is the set of nodes the agent judges it will reach. R(x) denotes the reward for a node x in N_judge, and R_j is the agent's reward function.

Preferably, in the coordination step, when the two layers of agents are combined, the state of the signal light control agent is extended with the encoded information state_coming about approaching special vehicles, with one position assigned to each lane, defined as:

state_coming = [l₁, l₂, …, l_n]

where l_i corresponds to lane i. If a special vehicle is judged to be arriving on a given edge (road), l is set to 1 for every lane on that edge. The perceived state S of the signal light control agent is updated to:

S = (state_ev, state_social, state_cross, state_coming).
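The state_coming encoding can be illustrated with a short sketch. The lane names, the lane-to-edge mapping, and the function name `encode_state_coming` are hypothetical and chosen only for illustration; the text specifies only that every lane on a flagged edge is set to 1.

```python
from typing import Dict, List


def encode_state_coming(
    lanes: List[str],
    lane_edge: Dict[str, str],
    notified_edges: List[str],
) -> List[int]:
    """Return [l_1, ..., l_n], one entry per entrance lane.

    l_i is 1 when lane i lies on an edge (road) for which an approaching
    special vehicle has been announced, and 0 otherwise.
    """
    flagged = set(notified_edges)
    return [1 if lane_edge[lane] in flagged else 0 for lane in lanes]


# Hypothetical intersection: two lanes on the north approach, one on each other approach.
lanes = ["N_0", "N_1", "E_0", "S_0", "W_0"]
lane_edge = {"N_0": "north", "N_1": "north", "E_0": "east", "S_0": "south", "W_0": "west"}

state_coming = encode_state_coming(lanes, lane_edge, ["north"])
print(state_coming)  # [1, 1, 0, 0, 0]
```

Because the vector always has length n, it can be appended to the existing state without changing the shape of the other three components.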

The reinforcement-learning-based system for priority passage of special vehicles provided by the present invention comprises the following modules:

Arrangement module: deploys a signal light control agent at the intersection;

Signal light control module: uses the signal light control agent to control the operation of the intersection's signal lights and to decide, from the acquired traffic state, which phase to switch to.

Preferably, the arrangement module also deploys a communication judgment agent at the intersection;

The system further comprises a communication judgment module: when a special vehicle arrives, it uses the communication judgment agent to determine which surrounding agents to communicate with and notifies the downstream intersections of the special vehicle's information;

Coordination module: optimizes the phase scheduling policy by combining the signal light control agent and the communication judgment agent, so that special vehicles pass with priority.

Compared with the prior art, the present invention has the following beneficial effects:

1. By combining the traffic signal light control agent and the communication judgment agent, the invention obtains a better phase scheduling policy and achieves priority passage for special vehicles;

2. The agents learn the best control decisions by interacting with the environment and respond flexibly to different traffic conditions, letting special vehicles pass through intersections quickly without affecting social traffic flow, and can handle several special vehicles appearing at the same time;

3. At intersections where no special vehicle is present, the intelligent signal lights improve traffic conditions for social vehicles, reduce intersection congestion, and raise green-time utilization.

Brief description of the drawings

Other features, objects, and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments, made with reference to the drawings:

Fig. 1 is a structural diagram of the traffic signal light control agent of the present invention;

Fig. 2 is a structural diagram of the communication judgment agent of the present invention;

Fig. 3 is a framework diagram of the system combining the traffic signal light control agent and the communication judgment agent.

Detailed description of the embodiments

The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art to further understand the invention, but do not limit it in any form. It should be noted that those of ordinary skill in the art can make several changes and improvements without departing from the inventive concept, all of which fall within the scope of protection of the invention.

The invention discloses a method for priority passage of special vehicles based on two-layer agent reinforcement learning, comprising a traffic signal light control agent, a communication judgment agent, and the combination of the two reinforcement learning layers.

The method comprises the following steps:

Arrangement step: a signal light control agent and a communication judgment agent are deployed at the intersection.

Communication judgment step: when a special vehicle arrives, the communication judgment agent determines which surrounding agents to communicate with and notifies the downstream intersections of the special vehicle's information.

That is, the method uses the traffic signal light control agent to control the operation of the intersection's signal lights, periodically deciding which phase to switch to from the observed traffic flow state. The communication judgment agent decides, when a special vehicle arrives, which surrounding agents to communicate with, and notifies the downstream intersections of the special vehicle's information. Finally, combining the two agents yields a better phase scheduling policy and achieves priority passage for special vehicles.

Traffic signal light control agent: one signal light control agent is placed at each intersection. The agent observes the real-time traffic state of the intersection, which can be obtained from on-board GPS devices, roadside detectors, intersection cameras, and similar equipment. Based on the observed state, the agent makes periodic decisions and dynamically schedules appropriate signals, relieving congestion at the intersection while letting special vehicles pass with priority. The agent is trained using reinforcement learning.

The environment state space perceived by the intersection signal light control agent is divided into the special vehicle state, the social vehicle state, and the traffic signal state, all of which can be obtained from roadside detectors, intersection cameras, and similar equipment. The agent at each intersection perceives the vehicle state on the entrance lanes within a range of 50 to 60 meters from the intersection; if a lane is shorter than this range, the agent observes the entire lane.

The special vehicle state comprises the lane the special vehicle is in and its instantaneous speed, encoded as:

state_ev = [p₁, p₂, …, p_n, s₁, s₂, …, s_n]

where n is the total number of entrance lanes at the intersection. All values are initialized to 0; if a special vehicle is detected on lane i, then p_i = 1 and the corresponding s_i is set to that vehicle's current speed.

The social vehicle state comprises, for each entrance lane, the queue length of social vehicles within the observation range and the vehicle density within the perception range:

state_social = [d₁, d₂, …, d_n, q₁, q₂, …, q_n]

where n is the total number of entrance lanes at the intersection and, for each lane i, d_i is the vehicle density on that lane and q_i is the queue length on that lane.

The traffic signal state state_cross comprises the current traffic signal phase and is encoded as a one-hot vector. In summary, the state S perceived by the traffic signal light control agent is

S = (state_ev, state_social, state_cross).
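As a minimal sketch of how the three components might be assembled into one flat vector, the following assumes a hypothetical per-lane record `LaneObservation` and a phase index into the feasible phase set; neither name comes from the description, and the ordering of the concatenation simply follows the definitions above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LaneObservation:
    """Hypothetical per-lane measurement within the perception range."""
    special_speed: Optional[float]  # speed of a special vehicle on this lane, None if absent
    density: float                  # vehicle density d_i
    queue_length: float             # queue length q_i


def encode_signal_state(lanes: List[LaneObservation], phase: int, num_phases: int) -> List[float]:
    """Concatenate state_ev, state_social and state_cross into one vector."""
    n = len(lanes)
    p = [0.0] * n  # p_i = 1 if a special vehicle is on lane i
    s = [0.0] * n  # s_i = speed of that special vehicle
    for i, lane in enumerate(lanes):
        if lane.special_speed is not None:
            p[i] = 1.0
            s[i] = lane.special_speed
    state_ev = p + s
    state_social = [lane.density for lane in lanes] + [lane.queue_length for lane in lanes]
    state_cross = [1.0 if k == phase else 0.0 for k in range(num_phases)]  # one-hot phase
    return state_ev + state_social + state_cross


# Two entrance lanes; a special vehicle travelling at 12.5 m/s is on the second lane.
observed = [LaneObservation(None, 0.2, 3.0), LaneObservation(12.5, 0.5, 0.0)]
print(encode_signal_state(observed, phase=1, num_phases=4))
```

For n lanes and a phase set of size |U|, the resulting vector has length 4n + |U|.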

For the agent's action space, each action is defined as a green phase, that is, a combination of red and green signals at the intersection. Specifically, the set U of all feasible phases is defined in advance for each intersection. The agent makes a decision every 5 s, selecting one phase from U to execute. If the phase is unchanged, the signal continues as before; if the phase changes, the signal switches from the current phase to the new one, with the corresponding yellow signal inserted in between. This switching process is the agent executing its action.
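The decision-and-switch rule above can be sketched as follows. The 5 s decision interval comes from the description; the 3 s yellow duration, the assumption that the yellow time is taken out of the same 5 s interval, and the phase names are illustrative assumptions only.

```python
from typing import List, Tuple

DECISION_INTERVAL_S = 5  # from the description: one decision every 5 s
YELLOW_S = 3             # assumed yellow duration; not specified in the description


def apply_action(current_phase: int, chosen_phase: int, phases: List[str]) -> List[Tuple[str, int]]:
    """Return the (signal, seconds) segments displayed until the next decision.

    If the chosen phase equals the current one, the signal simply continues.
    Otherwise a yellow segment is inserted before the new green phase.
    """
    if not 0 <= chosen_phase < len(phases):
        raise ValueError("chosen phase is not in the feasible phase set U")
    if chosen_phase == current_phase:
        return [(phases[current_phase], DECISION_INTERVAL_S)]
    yellow = "yellow after " + phases[current_phase]
    return [(yellow, YELLOW_S), (phases[chosen_phase], DECISION_INTERVAL_S - YELLOW_S)]


# Hypothetical feasible phase set U for a four-way intersection.
U = ["NS-through", "NS-left", "EW-through", "EW-left"]
print(apply_action(0, 0, U))  # phase kept
print(apply_action(0, 2, U))  # phase changed, yellow inserted
```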

The agent's reward function has two parts: a reward function for emergency vehicles and a reward function for social vehicles. The emergency vehicle reward function takes into account the vehicle's speed, its waiting time, and whether it has passed through the intersection. It is calculated by the following formula.

R_p = 20

where E_c denotes the set of special vehicles observed in the current decision round, E_p denotes the set of special vehicles absent from the current round but observed in the previous round, R_c(e) denotes the reward for special vehicle e in E_c, R_p denotes the reward for a special vehicle in E_p, speed_e denotes the speed of vehicle e, and wait_e denotes the waiting time of vehicle e at the intersection.

The social vehicle reward function mainly considers the waiting time and queue length of social vehicles. It can be calculated by the following formula:

where L is the total number of entrance lanes, q_l is the queue length of lane l, N is the total number of vehicles within the lane detection range, wait_i is the waiting time of vehicle i, and α is a coefficient.

The agent's reward function is the weighted sum of the social vehicle reward function and the emergency vehicle reward function:

R = reward_ev + β * reward_social

where β is a coefficient that adjusts the weight of social vehicles in the reward.
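The way the pieces of the reward fit together can be sketched in code. Two caveats apply: the per-vehicle term R_c(e) is passed in as a caller-supplied function because its formula is not given in this text, and summing R_c over E_c and R_p over E_p is an assumed reading of the symbol definitions above, not a confirmed formula. Only R_p = 20 and R = reward_ev + β * reward_social are taken directly from the description.

```python
from typing import Callable, Iterable

R_P = 20.0  # reward for each special vehicle in E_p, as given in the description


def reward_ev(e_c: Iterable[str], e_p: Iterable[str], r_c: Callable[[str], float]) -> float:
    """Assumed form: sum of R_c(e) over E_c plus R_p for every vehicle in E_p."""
    return sum(r_c(e) for e in e_c) + R_P * len(list(e_p))


def total_reward(reward_ev_value: float, reward_social_value: float, beta: float) -> float:
    """R = reward_ev + beta * reward_social (weighted sum from the description)."""
    return reward_ev_value + beta * reward_social_value


# Hypothetical numbers: one special vehicle still waiting (R_c = -2.0 is a placeholder),
# one special vehicle that was seen in the previous round and has since left.
placeholder_r_c = {"ambulance_1": -2.0}
ev = reward_ev(["ambulance_1"], ["fire_truck_7"], lambda e: placeholder_r_c[e])
print(ev)                                                          # 18.0
print(total_reward(ev, reward_social_value=-10.0, beta=0.5))      # 13.0
```

A smaller β makes the agent favor special vehicles more strongly; a larger β gives more weight to social traffic.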

Fig. 1 shows the structure of the signal control agent. The agent concatenates the perceived states, passes them through an LSTM layer to extract traffic-flow features, and feeds these features into a policy network and a value network to obtain the corresponding policy and value. The signal light control agent is trained with the A2C reinforcement learning algorithm, in the following steps.

First, the environment and the network parameters are initialized. The agent structure comprises three modules. The first module is the state S observed by the agent, which includes the traffic signal phase, special vehicle speed, special vehicle lane, lane vehicle density, and lane queue length; these are concatenated and passed as input to the next module. The second module is the LSTM (long short-term memory) layer, which takes the concatenated state and outputs the traffic-flow features over a period of time. In the third module, the extracted features are fed separately into the policy network, which outputs the agent's policy π_θ, and the value network, which outputs the corresponding value V.

The agent is then trained. The agent feeds the observed environment state S into its network to obtain the policy π_θ and the value V output by the value network, then samples its action a with the probabilities given by π_θ. After the agent executes action a, the environment changes and the new observable state is S′. From S and S′, the agent's reward R for this decision is calculated using the method provided by the invention, and the agent's parameters are then updated using the following formulas (the standard practice of the A2C algorithm).

where θ denotes the parameters of the policy network, ω the parameters of the value network, and α the learning rate of the policy network; π_θ(S, a) is the value of the policy for state S and action a; ∇_θ log π_θ(S, a) is the derivative of log π_θ(S, a) with respect to θ; R is the reward for this decision; γ is the discount factor; and V(S′) and V(S) are the values of states S′ and S.
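The update formulas themselves do not appear in this text. Since the description calls them the standard practice of A2C, one reading consistent with the symbols just defined is the usual one-step advantage actor-critic update below. The temporal-difference error δ and the value-network learning rate α_ω are notation introduced here as assumptions; the description defines only α for the policy network.

```latex
\begin{aligned}
\delta &= R + \gamma\, V_\omega(S') - V_\omega(S) \\
\theta &\leftarrow \theta + \alpha\, \delta\, \nabla_\theta \log \pi_\theta(S, a) \\
\omega &\leftarrow \omega + \alpha_\omega\, \delta\, \nabla_\omega V_\omega(S)
\end{aligned}
```

Here δ serves as the advantage estimate: a positive δ raises the probability of the sampled action a, and a negative δ lowers it.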

After the parameters are updated in this way, the next round begins: the environment state S is set to S′ and the update is repeated until learning converges. This is the reinforcement learning training process of the signal light control agent.
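The training loop described above (observe S, sample a from π_θ, observe S′ and R, update, set S to S′) can be sketched with a deliberately simplified stand-in. The sketch below swaps the LSTM-based policy and value networks for a tabular softmax policy and a value table, and uses a hypothetical two-state toy environment in place of a traffic simulator; all function names, hyperparameters, and the environment are assumptions made for illustration only.

```python
import math
import random
from typing import Callable, List, Tuple


def softmax(prefs: List[float]) -> List[float]:
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(x - m) for x in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_actor_critic(
    step: Callable[[int, int], Tuple[int, float]],
    num_states: int,
    num_actions: int,
    steps: int = 5000,
    alpha: float = 0.1,     # policy learning rate
    alpha_v: float = 0.1,   # value learning rate (assumed)
    gamma: float = 0.9,     # discount factor
    seed: int = 0,
) -> Tuple[List[List[float]], List[float]]:
    """One-step actor-critic with a tabular softmax policy (stand-in for the networks)."""
    rng = random.Random(seed)
    theta = [[0.0] * num_actions for _ in range(num_states)]  # policy parameters
    value = [0.0] * num_states                                # value estimates V(S)
    s = 0
    for _ in range(steps):
        pi = softmax(theta[s])
        a = rng.choices(range(num_actions), weights=pi)[0]    # sample a ~ pi(S, .)
        s_next, r = step(s, a)                                # environment transition
        delta = r + gamma * value[s_next] - value[s]          # TD error as advantage
        for b in range(num_actions):
            # Gradient of log softmax w.r.t. theta[s][b] is 1{b == a} - pi[b].
            theta[s][b] += alpha * delta * ((1.0 if b == a else 0.0) - pi[b])
        value[s] += alpha_v * delta
        s = s_next                                            # S <- S'
    return theta, value


def make_toy_env(seed: int = 1) -> Callable[[int, int], Tuple[int, float]]:
    """Toy environment: state = approach holding a special vehicle, action = approach served."""
    rng = random.Random(seed)

    def step(state: int, action: int) -> Tuple[int, float]:
        reward = 1.0 if action == state else -1.0
        return rng.randrange(2), reward

    return step


theta, value = train_actor_critic(make_toy_env(), num_states=2, num_actions=2)
learned_policy = [softmax(row) for row in theta]
print(learned_policy)
```

In this toy setting the learned policy should come to favor serving the approach that holds the special vehicle; a realistic implementation would replace the tables with the LSTM, policy, and value networks of Fig. 1 and the toy step function with a traffic simulation.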

Communication judgment agent: one communication judgment agent is placed at each intersection. It observes the real-time traffic state, which can be obtained from on-board GPS devices, roadside detectors, intersection cameras, and similar equipment. If the agent detects an arriving special vehicle, it judges the vehicle's direction of travel and notifies the agents at the downstream intersections. The neighbor chosen for communication need not be unique; the agent can notify several neighbors at the same time.

The environment state space perceived by the communication judgment agent consists of the state of the special vehicle and the state of the lanes leading to the surrounding intersections, which can be obtained from roadside detectors, intersection cameras, and similar equipment. The agent at each intersection perceives the vehicle state on the entrance lanes within a range of 50 to 60 meters from the intersection; if a lane is shorter than this range, the agent observes the entire lane.

对于特车的状态，包括(考虑到了)特车所在的车道信息p_i，以及特车的类别c_i，其状态编码如下：For the state of the special vehicle, including (considering) the lane information p_i where the special vehicle is located, and the category_ci of the special vehicle, its state code is as follows:

其中，n为该交叉口的进口车道总数，g为特车的种类数，该状态所有的值初始化为0，若检测到车道i上有特车，则p_i＝1，若特车的种类为第i种，则c_i＝1。Among them, n is the total number of entrance lanes at the intersection, g is the number of types of special vehicles, and all the values of this state are initialized to 0, if it is detected that there is a special vehicle on lane i, then p_i =1, if the type of special vehicle is the i-th type, then c_i =1.

The state of the lanes to the surrounding intersections comprises the overall vehicle density of the lanes leading from this intersection to each neighboring intersection. It is encoded as follows:

state_to = [t₁, t₂, ..., t_h]

where h is the total number of neighbors of the intersection and t_i is the vehicle density on the road from this intersection to neighboring intersection i. Combining the two parts, the state S_j perceived by the communication judgment agent is:

S_j = [state_ev, state_to]
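As a rough sketch, the combined state can be built by concatenating the two parts. The helper name `build_comm_state` and the density values are hypothetical; how the densities are measured is not specified here, so they are simply taken as given inputs.

```python
def build_comm_state(state_ev, state_to):
    """Concatenate the special-vehicle part and the road-density part into S_j."""
    return list(state_ev) + list(state_to)


# Hypothetical intersection: n = 4 approach lanes, g = 3 categories, h = 3 neighbors.
state_ev = [0, 1, 0, 0, 0, 0, 1]   # special vehicle on lane 2, category 3
state_to = [0.2, 0.0, 0.5]         # assumed density values, one per neighbor
s_j = build_comm_state(state_ev, state_to)
```

The resulting vector has length n + g + h, which would be the input dimension of the value network under these assumptions.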

The reward function R_j of the agent is designed so that agents that should be notified are notified as far as possible, while agents that should not be notified are, as far as possible, not notified. It is calculated by the following formula:

where N_target is the node that the special vehicle finally reaches, N_impossible is the set of nodes that the special vehicle cannot reach, and N_judge is the set of nodes that the agent judges the special vehicle will reach. R(x) is the reward for a node x in the set N_judge, and R_j is the reward function of the agent.
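The reward formula itself does not appear in this text, so the following is only an illustrative sketch of a reward with the stated intent. The specific values (+1 for notifying the target node, -1 for notifying an impossible node, 0 otherwise) and the summation over N_judge are assumptions of this example, not values taken from the invention.

```python
def node_reward(x, target, impossible, hit=1.0, miss=-1.0, neutral=0.0):
    """R(x) for one judged node; the three reward values are placeholders."""
    if x == target:
        return hit        # the node the special vehicle actually reaches
    if x in impossible:
        return miss       # a node the special vehicle could never reach
    return neutral        # reachable but not the final target


def comm_reward(judged, target, impossible):
    """R_j, assumed here to be the sum of R(x) over the judged set N_judge."""
    return sum(node_reward(x, target, impossible) for x in judged)


# Hypothetical neighbors "B", "C", "D": the vehicle ends up at "B", "C" is unreachable.
r_good = comm_reward({"B"}, target="B", impossible={"C"})
r_mixed = comm_reward({"B", "C"}, target="B", impossible={"C"})
```

Under these placeholder values, notifying only the correct neighbor scores higher than additionally notifying an unreachable one, which matches the stated design goal.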

For the action space of the agent, each action is defined as a set of neighbors to notify. Because the agent decides separately for each neighbor whether to notify it, the size of the action space is H = 2^h, where h is the total number of neighbors of the agent. When the agent observes the arrival of a special vehicle, it makes a decision based on the state defined above, selects the best set of neighbors to notify from the action set, and then communicates with them, passing the information that a special vehicle is about to arrive to the downstream agents.
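One way to realize an action space of size 2^h is to treat each action index as a bitmask over the neighbors. The sketch below assumes this bitmask convention and the helper name `decode_action`; the text does not state how actions are indexed.

```python
def decode_action(action_index, neighbors):
    """Map an action index in [0, 2**h) to the list of neighbors to notify.

    Bit k of action_index decides whether neighbors[k] is notified
    (an indexing convention assumed for this sketch).
    """
    h = len(neighbors)
    if not 0 <= action_index < 2 ** h:
        raise ValueError("action index out of range")
    return [nb for bit, nb in enumerate(neighbors) if (action_index >> bit) & 1]


neighbors = ["N", "E", "S", "W"]                 # hypothetical intersection, h = 4
all_actions = [decode_action(a, neighbors) for a in range(2 ** len(neighbors))]
notify = decode_action(5, neighbors)             # binary 0101: bits 0 and 2 are set
```

Index 0 corresponds to notifying nobody and index 2^h - 1 to notifying every neighbor.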

The structure of the communication judgment agent of the present invention is shown in Figure 2. The agent merges the perceived states into a feature vector describing the observed traffic flow and feeds it into the value network to obtain the value of each action; it then executes the action with the largest value. The communication judgment agent is trained with the DQN reinforcement learning algorithm. The specific steps are as follows.

The reinforcement learning training of the communication judgment agent proceeds as follows. First, the environment and the network parameters are initialized. The structure of the communication judgment agent of the present invention is shown in Figure 2 and comprises two modules. The first module is the state S_j observed by the agent, which includes the category of the special vehicle, the lane the special vehicle occupies, and the traffic density of the lanes leading to the surrounding intersections. These state components are merged and concatenated to form the input of the next module. The second module is the value network module, which takes the merged state as input and outputs the value of each action. In the figure, Q(S, a₀) denotes the value of executing action a₀ in state S_j, and Q(S, a_H) denotes the value of executing action a_H in state S_j. H in the figure is the total number of actions of the communication judgment agent.

The agent is then trained. The agent feeds the observed environment state S_j into its network to obtain the value Q(S_j, a) of each action, and takes the action with the largest value as its action a_j. After the agent executes a_j, the environment changes and the newly observed state is S′_j. From S_j and S′_j, the reward R_j of this decision is calculated using the method provided by the present invention, and the agent parameters are then updated using the following formula (the general practice of the DQN algorithm).
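The update formula is not reproduced in this text. As a hedged reconstruction, the usual DQN temporal-difference update, written with the symbols defined in the next paragraph, would read as follows. Whether the original also uses a separate target network or a replay buffer is not stated here, so neither is shown.

```latex
w_j \leftarrow w_j + \alpha \left[ R_j + \gamma \max_{a} Q(S'_j, a; w_j) - Q(S_j, a_j; w_j) \right] \nabla_{w_j} Q(S_j, a_j; w_j)
```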

where R_j is the reward value, γ is the discount factor applied to future rewards, S_j is the observed environment state, Q is the value function of the value network, a_j is the selected action, S′_j is the state observed after executing action a_j in state S_j, w_j denotes the network parameters of the value network, and α is the learning rate of the value network.

After the parameters are updated as described above, the next round begins: the environment state S_j is set to S′_j, and the update is repeated in the same way until learning converges. This completes the reinforcement learning training process of the communication judgment agent.
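A single round of this procedure can be sketched in plain Python. To keep the example self-contained, a linear value function (one weight row per action) stands in for the value network, and no replay buffer or target network is used; these simplifications, the function names, and the hyperparameter values are assumptions of the sketch rather than details from the text.

```python
def q_values(w, s):
    """Q(s, a; w) for every action a, using one weight row per action."""
    return [sum(wi * si for wi, si in zip(row, s)) for row in w]


def greedy_action(w, s):
    """Index of the action with the largest value in state s."""
    q = q_values(w, s)
    return max(range(len(q)), key=q.__getitem__)


def q_learning_step(w, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One temporal-difference update of the weights for action a; returns the TD error."""
    target = r + gamma * max(q_values(w, s_next))
    td_error = target - q_values(w, s)[a]
    # For a linear model, the gradient of Q(s, a; w) with respect to w[a] is s.
    w[a] = [wi + alpha * td_error * si for wi, si in zip(w[a], s)]
    return td_error


# Toy example: 2 actions, 2 state features, weights initialized to zero.
w = [[0.0, 0.0], [0.0, 0.0]]
s, s_next = [1.0, 0.0], [0.0, 1.0]
td = q_learning_step(w, s, a=1, r=1.0, s_next=s_next)
s = s_next  # the next round starts from the newly observed state
```

In a training loop, `greedy_action` (typically combined with some exploration) would pick the action, the environment would return the reward and next state, and `q_learning_step` would be repeated until the values stop changing.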

Combination of the two layers of agents: In this step, the signal light control agent and the communication judgment agent are combined and cooperate to give emergency vehicles priority passage. In the combined system, the signal light control agent is responsible for controlling the operation of the traffic lights, and the communication judgment agent decides which neighbors to communicate with when a special vehicle is present. The system framework combining the traffic light control agent and the communication judgment agent is shown in Figure 3.

When the two layers of agents are combined, a new component must be added to the state of the signal light control agent: state_coming, which encodes the information about the approaching special vehicle and assigns one position to each lane. It is defined as follows:

state_coming = [l₁, l₂, ..., l_n]

where n is the total number of approach lanes at the intersection and l_i corresponds to lane i. If a special vehicle is judged to be arriving on a given edge, the entries l corresponding to all lanes on that edge are set to 1. The state perceived by the agent is therefore updated to:
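The construction of state_coming can be sketched as below. The lane-to-edge mapping, the edge names, and the function name `encode_state_coming` are hypothetical; only the rule "set every lane on an incoming edge to 1" comes from the text.

```python
def encode_state_coming(lane_to_edge, incoming_edges):
    """Return [l_1..l_n], with l_i = 1 when lane i lies on an edge with an incoming special vehicle.

    lane_to_edge[i] names the edge that lane i belongs to (assumed input format).
    """
    return [1 if edge in incoming_edges else 0 for edge in lane_to_edge]


# Hypothetical intersection: two lanes on the north edge, one each on east and south.
lane_to_edge = ["north", "north", "east", "south"]
state_coming = encode_state_coming(lane_to_edge, incoming_edges={"north"})
```

When no neighbor has announced a special vehicle, the set of incoming edges is empty and the vector stays all zeros.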

S = [state_ev, state_social, state_cross, state_coming]

In addition, combining the two layers of agents requires changing the reward function of the signal light control agent, that is, an additional reward term is needed. On top of the basic reward setting, if the selected action fully covers the edge on which the special vehicle is arriving, a constant is added to the reward value. That is:

R = R + c
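A minimal sketch of this reward adjustment follows. It interprets "fully covers" as "every lane flagged in state_coming receives a green light under the selected phase"; that interpretation, the representation of a phase as a set of green lane indices, and the bonus value are assumptions of the example.

```python
def adjusted_reward(base_reward, green_lanes, state_coming, bonus=1.0):
    """Return R + c when the chosen phase serves every flagged lane, otherwise R."""
    flagged = {i for i, flag in enumerate(state_coming) if flag == 1}
    # The bonus applies only if a special vehicle is expected and all its lanes are green.
    if flagged and flagged <= set(green_lanes):
        return base_reward + bonus
    return base_reward


state_coming = [1, 1, 0, 0]                        # special vehicle expected on lanes 0 and 1
r_cover = adjusted_reward(-2.0, {0, 1}, state_coming, bonus=5.0)
r_partial = adjusted_reward(-2.0, {0}, state_coming, bonus=5.0)
```

A phase that serves only some of the flagged lanes earns no bonus in this sketch, which keeps the incentive tied to clearing the whole incoming edge.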

Training of the two layers of agents: First, the communication judgment agent is trained using its reinforcement learning training steps described above. After it converges, the signal light control agent is trained using the reinforcement learning training steps described above for the signal light control agent. During this training, the agent observes the environment state S. If the arrival of a special vehicle is detected, the communication judgment agent notifies the neighbors, the state S of the signal light control agent is updated as described above, the action a selected by the agent is executed, the reward R is calculated according to the updated reward function, and finally the network parameters are updated following the original steps. If no special vehicle is detected, the agent is trained following the original steps. The combined two-layer agent is obtained at the end.

The present invention also provides a special vehicle priority passage system based on reinforcement learning. The system can be implemented by executing the process steps of the special vehicle priority passage method based on reinforcement learning; that is, those skilled in the art may understand the method as a preferred embodiment of the system.

The system includes the following modules:

Arrangement module: arranges a signal light control agent at the intersection and arranges a communication judgment agent at the intersection.

Communication judgment module: when a special vehicle arrives, uses the communication judgment agent to determine which surrounding agents to select, communicates with them, and notifies the downstream intersections of the special vehicle information.

Those skilled in the art know that, besides implementing the system provided by the present invention and its devices, modules, and units purely as computer-readable program code, the method steps can be logically programmed so that the system and its devices, modules, and units achieve the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. The system provided by the present invention and its devices, modules, and units can therefore be regarded as a hardware component, and the devices, modules, and units it contains for realizing various functions can be regarded as structures within that hardware component; the devices, modules, and units for realizing various functions can also be regarded both as software modules implementing the method and as structures within the hardware component.

Specific embodiments of the present invention have been described above. It should be understood that the present invention is not limited to the specific embodiments described above, and those skilled in the art may make various changes or modifications within the scope of the claims without affecting the substance of the present invention. Where no conflict arises, the embodiments of the present application and the features of the embodiments may be combined with one another arbitrarily.