CN116523162A


Info

Publication number: CN116523162A
Application number: CN202310541102.3A
Authority: CN
Inventors: 杨卫东; 朱治邦; 李大韦; 周涛; 刘锐鑫; 白桦; 王皖东; 张桐; 曹璐; 范甬辰; 齐超
Original assignee: China Design Group Co Ltd
Current assignee: China Design Group Co Ltd
Priority date: 2023-05-15
Filing date: 2023-05-15
Publication date: 2023-08-01
Anticipated expiration: 2043-05-15
Also published as: CN116523162B

Abstract

The invention discloses a comprehensive traffic assignment method based on mobile phone signaling. It uses mobile phone signaling data to address parameter estimation and parameter calibration for a comprehensive-transportation stochastic user equilibrium model. The method comprises the following steps: preprocessing the mobile phone signaling data; extracting travel OD and trip chain data; constructing an integrated comprehensive transportation network and a travel cost function; generating a travel path choice set; constructing a discrete choice model based on generalized choice; estimating its parameters by maximum likelihood estimation; constructing a stochastic user equilibrium model and solving it with the MSWA algorithm; constructing and training a support vector regression model, adjusting its parameters and retraining if the accuracy does not meet the requirement; and solving for the optimal parameters by combining the trained model with the STA_GA algorithm. The invention exploits the advantages of mobile phone signaling data, improves the accuracy of traffic assignment with the comprehensive-transportation stochastic user equilibrium model, improves the efficiency of model parameter calibration through machine learning, and provides a better scientific basis for regional traffic planning and management.

Description

Translated from Chinese

A comprehensive traffic assignment method based on mobile phone signaling

Technical Field

The invention belongs to the technical field of large-scale travel planning, and in particular relates to a comprehensive traffic assignment method based on mobile phone signaling.

Background Art

With continued economic and social development, China's transportation infrastructure has expanded rapidly and matured. To make better use of the transportation system, a comprehensive transportation system urgently needs to be built. The comprehensive transportation model is the foundation of that system: it describes how users travel and provides data support for building the system. However, limited by the data available, most earlier transportation models follow the traditional four-stage method, which shows regional travel only roughly and cannot give clear guidance for comprehensive transportation planning and management. With the development of information technology and the arrival of the big data era, traffic-related data such as GPS data, mobile phone signaling data and highway toll data can be stored and used more effectively, which makes it possible to improve the traditional comprehensive traffic assignment model.

With the spread of smartphones, mobile phone signaling data has become valuable in many fields. It records a user's complete travel trajectory and offers wide coverage, a large user base, low acquisition cost and strong real-time performance. It is one of the important types of traffic big data and can be used for regional travel analysis.

However, the larger the data volume, the harder it is to solve a traditional traffic assignment model, and the solution time can far exceed what is acceptable. Technologies such as parallel computing, machine learning and heuristic algorithms greatly improve the capacity to process massive data and the efficiency of model solving, and so provide technical support for the application and development of traffic big data.

Summary of the Invention

The purpose of the present invention is to provide a comprehensive traffic assignment method based on mobile phone signaling that addresses the problems of the prior art described above.

The technical solution that achieves this purpose is a comprehensive traffic assignment method based on mobile phone signaling, the method comprising:

Step 1: obtain residents' travel OD data and trip chain data from mobile phone signaling data;

Step 2: generate an integrated comprehensive transportation network from the existing road network;

Step 3: generate a travel path choice set based on the trip chain data from mobile phone signaling;

Step 4: construct a discrete choice model that considers generalized choice;

Step 5: construct a traffic assignment model;

Step 6: estimate the parameters of the discrete choice model by maximum likelihood estimation;

Step 7: calibrate the parameters of the traffic assignment model using machine learning;

Step 8: perform traffic assignment according to travel demand, based on the traffic assignment model.

Further, before step 1 is executed, the method also includes preprocessing the mobile phone signaling data, specifically: cleaning the data to remove erroneous records, including duplicate data, ping-pong data and drift data.

Further, removing the erroneous data specifically includes:

Sort the records by MDN code and time. Using the record locations, check whether the distance between two adjacent records is less than a preset threshold; if so, treat the two as duplicates and delete the later record. Using the time and location information, check whether the speed implied by two adjacent records is greater than a preset threshold; if so, treat it as ping-pong data and delete the later record. Check whether, within the records of one MDN code, a base station appears consecutively more times than a preset threshold; if so, treat it as drift data, keep only the first and last records at that base station, and delete the rest.
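The three cleaning rules above can be sketched as separate passes over one user's time-sorted records. This is a minimal illustration, not the patent's implementation: the record fields (`mdn`, `t`, `x`, `y`, `cell`), the use of planar coordinates in metres, and all threshold values are assumptions, and the order in which the passes are combined is left to the caller.

```python
from itertools import groupby
from math import hypot


def split_by_user(records):
    """Sort by (MDN, time) and return {mdn: [records in time order]}."""
    ordered = sorted(records, key=lambda r: (r["mdn"], r["t"]))
    return {mdn: list(grp) for mdn, grp in groupby(ordered, key=lambda r: r["mdn"])}


def drop_duplicates(recs, min_dist_m=50.0):
    """Delete the later of two adjacent records closer than min_dist_m."""
    kept = []
    for r in recs:
        if kept and hypot(r["x"] - kept[-1]["x"], r["y"] - kept[-1]["y"]) < min_dist_m:
            continue
        kept.append(r)
    return kept


def drop_pingpong(recs, max_speed_mps=60.0):
    """Delete the later record when the implied speed exceeds max_speed_mps."""
    kept = []
    for r in recs:
        if kept:
            dt = r["t"] - kept[-1]["t"]
            dist = hypot(r["x"] - kept[-1]["x"], r["y"] - kept[-1]["y"])
            speed = dist / dt if dt > 0 else float("inf")
            if speed > max_speed_mps:
                continue
        kept.append(r)
    return kept


def collapse_drift(recs, max_repeat=3):
    """For runs of one base station longer than max_repeat, keep first and last."""
    out = []
    for _, run in groupby(recs, key=lambda r: r["cell"]):
        run = list(run)
        out.extend(run if len(run) <= max_repeat else [run[0], run[-1]])
    return out
```

Each pass compares a record with the last record that was kept, so deleting one record does not hide a violation by the next one.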

Further, obtaining residents' travel OD data and trip chain data from mobile phone signaling data in step 1 specifically includes:

Based on the preprocessed mobile phone signaling data, determine each user's stay points from the existing trajectory data, and then identify the OD data of the user's trips from the stay points;

Extract the traveler codes from the OD data to obtain travel trajectory data;

Match the travel trajectory data with GIS road network data to obtain the user's trip chain data.

Further, generating the integrated comprehensive transportation network from the existing road network in step 2 specifically includes:

Link the hub nodes of different transportation modes through virtual transfer links, so that the modes are connected into one integrated comprehensive transportation network;

Establish a link travel cost function for the integrated network, compute the travel cost of each link, and use it as the link weight.

Further, generating the travel path choice set based on the trip chain data from mobile phone signaling in step 3 specifically includes:

For each origin-destination pair (OD pair), use an improved deviation path algorithm to obtain a travel path choice set containing K possible travel paths, specifically:

Step 3-1: construct an empty shortest path set P and an empty candidate path set B;

Step 3-2: find the shortest path between the OD pair, put it into the shortest path set P, and set the index k of the path to be solved to 2;

Step 3-3: let i be the cardinality of the shortest path set P, that is, the number of shortest paths in P; take the nodes on the i-th shortest path p_i other than the end point v_n in turn as the deviation point v_j, j = 1, 2, ..., n-1, where n+1 is the total number of nodes on path p_i;

Step 3-4: for the deviation point v_j, set the weight d_jq of the link on the shortest path p_i between v_j and its successor v_j+1 to ∞; use the A* algorithm to find the deviation path h_j from the deviation point v_j to the end point v_n; combine it with the part of the shortest path p_i from the start point v₀ to v_j; and put the result into the candidate path set B. Repeat this step for all deviation points to obtain n candidate shortest paths;

Step 3-5: add the path with the smallest weight in the candidate path set B to the shortest path set P, and delete that path from B;

Step 3-6: check whether the number of paths in the shortest path set P is less than K. If so, set k = k+1 and go to step 3-3; if not, output the first K paths in P as the initial travel path choice set;

Step 3-7: compare the initial travel path choice set with the trip chain data extracted in step 1, and keep the paths that are consistent with the data as the final travel path choice set of the OD pair.
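Steps 3-1 to 3-6 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patent's code: the graph is a plain dictionary `{node: {neighbor: weight}}`, Dijkstra's algorithm stands in for the A* search named in step 3-4 (it is equivalent to A* with a zero heuristic), and the checks that skip looping or repeated candidates are additions made here to keep the output well defined. The function names are hypothetical.

```python
import heapq


def dijkstra(graph, src, dst, banned_edge=None):
    """Shortest path from src to dst, ignoring banned_edge. Returns (cost, path)."""
    dist, prev, done = {src: 0.0}, {}, set()
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            break
        for v, w in graph.get(u, {}).items():
            if (u, v) == banned_edge:  # same effect as setting the weight to infinity
                continue
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, v))
    if dst not in dist:
        return float("inf"), None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return dist[dst], path[::-1]


def path_cost(graph, path):
    return sum(graph[u][v] for u, v in zip(path, path[1:]))


def k_deviation_paths(graph, src, dst, K):
    """Return up to K paths, cheapest first, built by deviating from the latest path."""
    _, first = dijkstra(graph, src, dst)
    if first is None:
        return []
    P, B = [first], []  # P: accepted paths, B: heap of (cost, candidate path)
    while len(P) < K:
        base = P[-1]
        for j in range(len(base) - 1):  # every node except the end point
            _, spur = dijkstra(graph, base[j], dst, banned_edge=(base[j], base[j + 1]))
            if spur is None:
                continue
            cand = base[:j] + spur
            if len(set(cand)) != len(cand):  # skip paths that revisit a node
                continue
            if cand in P or any(cand == p for _, p in B):
                continue
            heapq.heappush(B, (path_cost(graph, cand), cand))
        if not B:
            break  # fewer than K distinct paths were found
        P.append(heapq.heappop(B)[1])
    return P
```

Because only the single link after the deviation point is blocked, this simplified scheme can return fewer than K paths on networks where a stricter K-shortest-path procedure would find more.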

Further, the discrete choice model considering generalized choice constructed in step 4 is the Path-Size Logit (PSL) model. According to the PSL model, the probability that a user chooses path k is:

where P_k is the probability of choosing path k; V_k = -μU_k, where U_k and V_k are the characteristic variable and the systematic utility of path k respectively; U_k = x_kγ + ε_k, where x_k is the vector of cost factors of path k, γ is the coefficient vector of those factors, and ε_k is an error term with mean 0; μ is a positive parameter; U_l and V_l are the characteristic variable and the systematic utility of path l respectively; K_ij is the set of paths for OD pair (i, j), with k ∈ K_ij; β₁ and β₂ are parameters to be calibrated; and S_k is the correction term:

where l_a is the cost of link a, L_k is the cost of path k, Γ_k is the set of links on path k, δ_aj = 1 if link a is on path k and 0 otherwise, C_n is the path set, and the remaining term is the minimum path cost over C_n.
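The effect of the correction term can be illustrated with the basic path-size logit form, in which each link's share of the path cost is divided by the number of paths in the choice set that use that link. This sketch is an assumption-laden stand-in: the patent's own correction term, with β₁, β₂ and the minimum path cost, is not recoverable from this text, so a single scaling parameter `beta` is used instead, and the function names are hypothetical.

```python
from math import exp, log


def path_size_factors(paths, link_cost):
    """paths: list of link-id lists; link_cost: {link_id: cost}."""
    use_count = {}
    for p in paths:
        for a in set(p):
            use_count[a] = use_count.get(a, 0) + 1
    factors = []
    for p in paths:
        L_k = sum(link_cost[a] for a in p)
        # Links shared with other paths contribute less than unique links.
        factors.append(sum(link_cost[a] / L_k / use_count[a] for a in p))
    return factors


def psl_probabilities(paths, link_cost, mu=1.0, beta=1.0):
    """P_k proportional to exp(V_k + beta * ln S_k), with V_k = -mu * path cost."""
    sizes = path_size_factors(paths, link_cost)
    v = [-mu * sum(link_cost[a] for a in p) + beta * log(s)
         for p, s in zip(paths, sizes)]
    v_max = max(v)  # subtract the maximum for numerical stability
    w = [exp(x - v_max) for x in v]
    total = sum(w)
    return [x / total for x in w]
```

For three equal-cost paths where two share a link, the independent path receives a higher probability than either overlapping path, which is the behavior plain multinomial logit cannot reproduce.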

Further, constructing the traffic assignment model in step 5 means building a PSL-based stochastic user equilibrium traffic assignment model on top of the discrete choice model. Its specific form is:

The first term of the model is the total travel cost over all links, and the second term is the error in perceived travel cost. The constraints express, respectively, flow conservation, the relationship between path flows and travel demand, the feasible domain of the variables, and the formula for computing link flows.

where x_a is the flow on link a; the link-path incidence indicator takes the value 1 if link a is on path k between OD pair (o, d) and 0 otherwise; t_a(w) is the average cost of traversing link a when its flow is w; the path flow variable is the traffic flow on path k between OD pair (o, d); and θ is a non-negative correction parameter.

Further, step 6 estimates the discrete choice model parameters by maximum likelihood estimation. Because the positioning accuracy of mobile phone signaling is low, the choice set is grouped by main travel mode to form a broad choice set, and the likelihood function is then constructed for parameter estimation. Let the original exact choice set contain all paths k = 1, 2, 3, ..., K, and let B be the broad choice set. C_b (b = 1, 2, 3, ..., B) denotes the path choice group dominated by transportation mode b. Each path k belongs to exactly one choice group, and the probability that a user chooses a given path choice group is the sum of the probabilities of choosing each path in that group.

This step specifically includes: using the trip chain data obtained in step 1 and the discrete choice model constructed in step 4, estimate the parameters of the discrete choice model by maximum likelihood estimation, where the likelihood function of the discrete choice model is constructed as:

where S_b is the set of routes in generalized category b, and Z_n'b is a dummy variable that equals 1 when user n' chooses generalized path choice group b and 0 otherwise;

Load the users' trip chain data and the road network data into the discrete choice model and maximize the likelihood function to obtain the parameter values of the discrete choice model.
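The grouped likelihood described above sums path probabilities within the chosen group before taking the logarithm. The sketch below illustrates this under simplifying assumptions: a plain logit on path cost stands in for the PSL model, the only parameter estimated is the scale `mu`, and a coarse grid search replaces a proper optimizer. The observation layout `(costs, groups, chosen_group)` is hypothetical.

```python
from math import exp, log


def logit_probs(costs, mu):
    """Plain logit on path cost; a stand-in for the PSL probabilities."""
    c_min = min(costs)
    w = [exp(-mu * (c - c_min)) for c in costs]
    total = sum(w)
    return [x / total for x in w]


def grouped_log_likelihood(observations, mu):
    """observations: list of (path_costs, path_groups, chosen_group).

    Only the chosen group is observed, so each term is the log of the
    summed probability of all paths belonging to that group.
    """
    ll = 0.0
    for costs, groups, chosen in observations:
        probs = logit_probs(costs, mu)
        ll += log(sum(p for p, g in zip(probs, groups) if g == chosen))
    return ll


def estimate_mu(observations, grid):
    """Return the grid value of mu with the highest grouped log-likelihood."""
    return max(grid, key=lambda mu: grouped_log_likelihood(observations, mu))
```

For example, with path costs `[1, 2, 3]`, groups `["road", "road", "rail"]`, three users observed on road and one on rail, `estimate_mu(obs, [0.0, 0.25, 1.0])` selects 0.25, the grid value whose predicted rail share is closest to the observed one in four.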

Further, calibrating the parameters of the traffic assignment model using machine learning in step 7 specifically includes:

Step 7-1: randomly generate traffic assignment model parameters, that is, randomly generate several sets of candidate parameters near the empirical values of the link travel cost function parameters;

Step 7-2: feed each set of candidate parameters into the traffic assignment model constructed in step 5, load the travel OD data into the model, and solve the model iteratively with the method of successive weighted averages (MSWA) to obtain the traffic volume on each link. The MSWA procedure is:

(1) Set the iteration counter m = 1 and the auxiliary variable γ₀ = 0, and initialize the flow on each link a;

(2) Use the link travel cost function to compute the total travel cost of each path in the travel path choice set of each OD pair, then compute the traffic volume assigned to every path in the choice set from the OD demand:

where the quantities are, in order, the flow assigned to the k-th path between OD pair (i, j), the probability that a user chooses the k-th path between OD pair (i, j), and q^ij, the total demand between OD pair (i, j);

Load the assigned path flows onto the integrated comprehensive transportation network to obtain the auxiliary flow on each link a;

(3) Compute the iteration step size: set the auxiliary variables β_m = m^d and γ_m = γ_m-1 + β_m, where d is a parameter, and compute the step size;

(4) Update the link flows, where the two flow terms are the flows on link a at iterations m and m+1 respectively;

(5) Check whether the convergence condition is satisfied, where ε' is the preset convergence threshold. If it is, terminate and output the link flows; otherwise set m = m+1 and go to (2);
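The MSWA loop in (1) to (5) can be sketched for a single OD pair. Several pieces are assumptions because the formulas are not reproduced in this text: the step size is taken as β_m / γ_m and the update as x ← x + step · (y − x), which is the usual form of the method of successive weighted averages; initial flows are zero; the convergence test is the relative change in link flows; link costs follow a BPR-type function; and a plain logit replaces the PSL loading. Parameter names are hypothetical.

```python
from math import exp


def mswa(paths, free_time, capacity, demand, theta=0.5, d=1.0,
         eps=1e-6, max_iter=1000, bpr_alpha=0.15, bpr_beta=4.0):
    """Stochastic loading for one OD pair, averaged with MSWA step sizes.

    paths: list of link-id lists; free_time, capacity: {link_id: value}.
    Returns {link_id: flow}.
    """
    links = sorted(free_time)
    x = {a: 0.0 for a in links}  # (1) initial flows, assumed zero
    gamma = 0.0
    for m in range(1, max_iter + 1):
        # (2) link costs, path costs, choice probabilities, auxiliary flows y
        t = {a: free_time[a] * (1 + bpr_alpha * (x[a] / capacity[a]) ** bpr_beta)
             for a in links}
        costs = [sum(t[a] for a in p) for p in paths]
        c_min = min(costs)
        w = [exp(-theta * (c - c_min)) for c in costs]
        total = sum(w)
        y = {a: 0.0 for a in links}
        for p, w_k in zip(paths, w):
            for a in p:
                y[a] += demand * w_k / total
        # (3) step size from beta_m = m^d and gamma_m = gamma_{m-1} + beta_m
        beta_m = m ** d
        gamma += beta_m
        step = beta_m / gamma
        # (4) flow update
        new_x = {a: x[a] + step * (y[a] - x[a]) for a in links}
        change = sum((new_x[a] - x[a]) ** 2 for a in links) ** 0.5
        x = new_x
        # (5) convergence on relative flow change (assumed criterion)
        if change / max(sum(x.values()), 1e-12) <= eps:
            break
    return x
```

With d = 0 every iteration has equal weight and the scheme reduces to the ordinary method of successive averages; larger d gives later iterations more weight.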

Step 7-3: compute the coupling degree z between the flows produced by the traffic assignment model and the observed flows:

where the first term is the flow assigned to link a, y_a is the observed flow on link a, and n' is the total number of links in the road network;

Iterate over all candidate parameter sets and pair each set with its coupling degree to form the training set for the subsequent machine learning model;

Step 7-4: construct the machine learning model:

Define the mapping between the parameters to be calibrated and the link-flow coupling degree as:

z = f(x) = w·x + b

where x is the vector of values of the parameters to be calibrated, w is the coefficient vector, and b is the bias;

Let x_i be the i-th candidate parameter vector, i = 1, 2, …, q, where q is the number of candidate parameter sets; let z_i be the link-flow coupling degree corresponding to the i-th candidate set; and let ε be the acceptable error. Introducing the slack variables ξ_i and ξ_i^*, the SVR model is expressed as:

s.t. z_i - f(x_i) = ε + ξ_i, i = 1, 2, …, q

where C is the regularization constant and l_ε is the ε-insensitive loss function:

Construct the Lagrangian function:

where a_i, μ_i and the remaining multipliers are auxiliary variables. Setting the partial derivatives of the Lagrangian with respect to the variables w, b, ξ and ξ^* (that is, all variables other than the Lagrange multipliers) to zero gives:

Transform the SVR model into its dual form:
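For reference, the textbook ε-insensitive support vector regression formulation, written in the symbols used above, is given below. It is offered as an assumed reading of the missing equations rather than a transcription of the patent's own formulas; in particular the starred multipliers and the inequality form of the constraints follow the standard derivation.

```latex
% Epsilon-insensitive loss
l_\varepsilon(u) =
\begin{cases}
0, & |u| \le \varepsilon \\
|u| - \varepsilon, & \text{otherwise}
\end{cases}

% Primal problem with slack variables
\min_{w,\,b,\,\xi,\,\xi^*} \quad \frac{1}{2}\lVert w \rVert^2
  + C \sum_{i=1}^{q} \left( \xi_i + \xi_i^* \right)
\\
\text{s.t.} \quad z_i - f(x_i) \le \varepsilon + \xi_i, \qquad
f(x_i) - z_i \le \varepsilon + \xi_i^*, \qquad
\xi_i \ge 0, \; \xi_i^* \ge 0, \qquad i = 1, 2, \dots, q

% Stationarity of the Lagrangian in w, b, xi, xi^*
w = \sum_{i=1}^{q} \left( a_i - a_i^* \right) x_i, \qquad
\sum_{i=1}^{q} \left( a_i - a_i^* \right) = 0, \qquad
C = a_i + \mu_i, \qquad C = a_i^* + \mu_i^*

% Dual problem
\max_{a,\,a^*} \quad \sum_{i=1}^{q} z_i \left( a_i - a_i^* \right)
  - \varepsilon \sum_{i=1}^{q} \left( a_i + a_i^* \right)
  - \frac{1}{2} \sum_{i=1}^{q} \sum_{j=1}^{q}
    \left( a_i - a_i^* \right)\left( a_j - a_j^* \right) x_i \cdot x_j
\\
\text{s.t.} \quad \sum_{i=1}^{q} \left( a_i - a_i^* \right) = 0, \qquad
0 \le a_i,\, a_i^* \le C
```

Here a_i is the multiplier of the constraint z_i − f(x_i) ≤ ε + ξ_i and a_i^* that of the opposite constraint; substituting the stationarity conditions into the Lagrangian yields the dual objective shown.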

Step 7-5: feed the training set obtained in step 7-2 into the SVR model and train it to obtain the best SVR model. Given a set of traffic assignment model parameters, the trained SVR outputs the corresponding coupling degree between assigned and observed flows;

Step 7-6: use a genetic algorithm, a heuristic method, to find the travel cost function parameters with the highest coupling degree under the best SVR model. The specific steps are:

(1) Choose a suitable encoding scheme and randomly generate a population as the initial solution;

(2) Compute the fitness value of each individual in the population;

(3) Select the individuals to keep according to their fitness;

(4) Cross the selected individuals pairwise and apply mutation to obtain the offspring population;

(5) Check whether the fitness difference among individuals satisfies the termination condition. If it does, exit the algorithm; otherwise return to step (2) and continue breeding new generations until the optimal individual is obtained.

This completes the parameter calibration of the traffic assignment model.
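The genetic search in step 7-6 can be sketched with a real-valued encoding. Everything specific here is an assumption for illustration: the `fitness` callable stands in for the trained SVR's predicted coupling degree, selection keeps the fitter half of the population, crossover is a random weighted average of two parents, mutation adds clipped Gaussian noise, and the loop stops when the fitness spread of the population falls below `tol`. This is not the STA_GA algorithm of FIG. 4, whose details are not given in this text.

```python
import random


def genetic_search(fitness, bounds, pop_size=30, generations=200,
                   mutation_rate=0.2, tol=1e-6, seed=0):
    """Maximize fitness(params) over box bounds [(lo, hi), ...]."""
    rng = random.Random(seed)
    # (1) random initial population
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        # (2) evaluate and rank by fitness
        ranked = sorted(pop, key=fitness, reverse=True)
        # (5) stop when the population's fitness values have converged
        if fitness(ranked[0]) - fitness(ranked[-1]) < tol:
            break
        # (3) keep the fitter half
        parents = ranked[: pop_size // 2]
        # (4) pairwise crossover plus mutation
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            w = rng.random()
            child = [w * a + (1 - w) * b for a, b in zip(p1, p2)]
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    noise = rng.gauss(0, 0.1 * (hi - lo))
                    child[i] = min(hi, max(lo, child[i] + noise))
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)
```

Because the parents survive into the next generation, the best individual found so far is never lost, and each fitness evaluation costs one surrogate prediction instead of one full equilibrium assignment.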

Further, performing traffic assignment according to travel demand based on the traffic assignment model in step 8 specifically includes:

Feed the existing travel OD data into the traffic assignment model calibrated in step 7 and solve it with the MSWA algorithm to obtain the traffic assignment result on each link.

Compared with the prior art, the present invention has the following significant advantages:

1) Users' travel information is obtained from mobile phone signaling and matched with an integrated comprehensive transportation network composed mainly of railways, highways and virtual transfer links to generate travel OD information and trip chain data. These data are used for choice set generation, parameter estimation of the discrete choice model and parameter calibration of the traffic assignment model respectively, which makes full use of the value of mobile phone signaling data. Because the data is large in volume and wide in coverage, the model results are closer to actual travel conditions.

2) Generating generalized choice groups with the generalized choice method overcomes the drawback that, because of its low positioning accuracy, mobile phone signaling cannot reveal a user's specific travel route. A generalized choice group contains several specific paths and can fit the user's choice at a higher level of aggregation. Combining this with the path-size logit (PSL) model resolves the illogical choice probabilities that the MNL model produces on highly overlapping routes.

3) A machine learning model replaces repeated solving of the stochastic user equilibrium model. The mapping from model parameters to the coupling degree between assigned and observed flows can be learned from a small number of samples, which greatly reduces calibration time. A genetic algorithm, one of the heuristic algorithms, is then used to search for the optimal parameters, which improves calibration efficiency. The calibrated stochastic user equilibrium model has higher accuracy.

The present invention is described in further detail below with reference to the accompanying drawings.

Brief Description of the Drawings

FIG. 1 is a flow chart of the comprehensive traffic assignment method based on mobile phone signaling of the present invention.

FIG. 2 is a schematic diagram of the comprehensive transportation network.

FIG. 3 is a flow chart of the deviation path algorithm.

FIG. 4 is a flow chart of the STA_GA algorithm.

Detailed Description

To make the purpose, technical solution and advantages of the present application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the application and do not limit it.

It should be noted that where the embodiments of the present invention use descriptions such as "first" and "second", these are for descriptive purposes only and shall not be understood as indicating or implying relative importance or the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with one another, but only where a person of ordinary skill in the art can implement the combination. Where a combination of technical solutions is contradictory or cannot be implemented, the combination shall be deemed not to exist and falls outside the scope of protection claimed by the present invention.

In one embodiment, a comprehensive traffic assignment method based on mobile phone signaling is provided. It extracts the travel information of mobile phone signaling users, uses the travel OD and trip chain data in a stochastic user equilibrium model, and calibrates the model with a machine learning model and a heuristic algorithm, improving the accuracy and efficiency of traffic assignment on a regional comprehensive transportation network. With reference to FIG. 1, the method includes:

Step S1: obtain residents' travel OD data and trip chain data from mobile phone signaling data. Before this step is executed, the method also includes preprocessing the mobile phone signaling data, specifically: cleaning the data to remove erroneous records, including duplicate data, ping-pong data and drift data.

Here, removing the erroneous data specifically includes:

Sort the records by MDN code and time. Using the record locations, check whether the distance between two adjacent records is less than a preset threshold; if so, treat the two as duplicates and delete the later record. Using the time and location information, check whether the speed implied by two adjacent records is greater than a preset threshold; if so, treat it as ping-pong data and delete the later record. Check whether, within the records of one MDN code, a base station appears consecutively more times than a preset threshold; if so, treat it as drift data, keep only the first and last records at that base station, and delete the rest.

Step S1 specifically includes:

Step S2: generate an integrated comprehensive transportation network from the existing road network. Because the positioning accuracy of mobile phone signaling is low and travel within a province is mainly by road and rail, this embodiment uses the provincial highway network and railway network as the skeleton of the comprehensive transportation network. The nodes and routes of the two networks are stored as a network graph, and virtual transfer links are added between adjacent nodes of the two networks to connect them into a single transportation network on which users can travel by multiple modes. The connected network is shown in FIG. 2. A link travel cost function is established for the network, expressed as:

where the left-hand side is the generalized travel cost of path p between OD pair w; b_lp is 1 if link l belongs to path p and 0 otherwise; and c_l is the generalized travel cost of link l:

where the three cost components are the travel time cost, the monetary travel cost and the travel comfort loss cost respectively, and γ_t, γ_p and γ_u are the weights of time, money and comfort respectively, which sum to 1.

For highway links, the costs are computed as follows:

where the time term is the travel time on the selected link l, v_l is the traffic volume on link l, the capacity term is the capacity of link l, α and β are parameters to be calibrated, x_l is the length of link l, ρ is the toll per unit length, s^h is the comfort loss per unit time of driving, and η and λ are the conversion coefficients between monetary cost and time cost and between comfort loss and time respectively, with values 3.09 and 1.5.

For railway links, the time and fare are determined by the actual operation and pricing of the route, and the monetary cost and the travel comfort loss cost are computed as follows:

where p_l is the fare on railway link l, s^r is the comfort loss per unit time on rail, and the capacity term is the maximum passenger capacity of railway link l. For transfer links, a highway-to-rail transfer is assigned 30 minutes and a rail-to-highway transfer 15 minutes; the monetary cost is set to 0, and the comfort loss equals the time cost.
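The structure of the generalized link cost can be sketched as a weighted sum of the three components, with a BPR-type function for highway travel time. This is an assumption-heavy illustration: the BPR form is inferred from the named symbols (flow, capacity, α, β), the default α and β are common textbook values rather than the patent's calibrated ones, the example weights are arbitrary, and the conversion coefficients η and λ are omitted because their position in the original formulas is not recoverable here.

```python
def bpr_time(free_time, volume, capacity, alpha=0.15, beta=4.0):
    """BPR-type travel time: free_time * (1 + alpha * (v / C) ** beta)."""
    return free_time * (1.0 + alpha * (volume / capacity) ** beta)


def generalized_cost(time_cost, money_cost, comfort_cost, weights):
    """Weighted sum of time, money and comfort costs; weights must sum to 1."""
    g_t, g_p, g_u = weights
    if abs(g_t + g_p + g_u - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return g_t * time_cost + g_p * money_cost + g_u * comfort_cost


def transfer_link_cost(direction, weights):
    """Transfer links: 30 min road-to-rail, 15 min rail-to-road, no fare,
    comfort loss equal to the time cost (values taken from the text)."""
    minutes = {"road_to_rail": 30.0, "rail_to_road": 15.0}[direction]
    return generalized_cost(minutes, 0.0, minutes, weights)
```

For example, `transfer_link_cost("road_to_rail", (0.5, 0.3, 0.2))` gives 0.5 × 30 + 0.3 × 0 + 0.2 × 30 = 21 cost units under these illustrative weights.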

Step S3: generate the travel path choice set based on the travel chain data from mobile phone signaling. With reference to Figure 3, this specifically includes:

Step S3-1: construct an empty shortest path set P and an empty candidate path set B;

Step S3-2: find the shortest path between the OD pair, put it into the shortest path set P, and set the index k of the next path to be found to 2;

Step S3-3: let i be the cardinality of the shortest path set P, that is, the number of shortest paths in P, and take each node on the i-th shortest path p_i except the end point v_n in turn as the deviation point v_j, j = 0, 1, …, n−1, where n+1 is the total number of nodes on path p_i;

Step S3-4: for deviation point v_j, set the weight of the section between v_j and its successor v_j+1 on the shortest path p_i to d_jq = ∞, use the A* algorithm to find the deviation path h_j from v_j to the end point v_n, combine it with the part of p_i from the start point v₀ to v_j, and put the result into the candidate path set B; repeat this step over all deviation points to obtain n candidate shortest paths;

Step S3-5: add the path with the smallest weight in the candidate path set B to the shortest path set P, and delete that path from B;

Step S3-6: check whether the number of paths in the shortest path set P is less than K; if so, set k = k+1 and return to step S3-3; otherwise, output the first K paths in P as the initial travel path choice set;

Step S3-7: compare the initial travel path choice set with the travel chain data extracted in step S1, and keep the paths consistent with the data as the final travel path choice set of the OD pair.
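Steps S3-1 to S3-6 follow the pattern of Yen's K-shortest-paths algorithm. The Python sketch below is one possible reading of them, not the patented implementation. It makes two labeled substitutions: Dijkstra's algorithm is used in place of A* (it is equivalent to A* with a zero heuristic), and the standard Yen safeguards are added (blocking the next edge of every already accepted path that shares the same root, and blocking root nodes) so that candidates are loop-free and not duplicated. The graph layout and all function names are hypothetical.

```python
import heapq


def shortest_path(graph, src, dst, banned_edges=frozenset(), banned_nodes=frozenset()):
    """Dijkstra search; returns (cost, path) or None if dst is unreachable."""
    heap = [(0.0, src, [src])]
    done = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in done:
            continue
        done.add(node)
        for nxt, w in graph.get(node, {}).items():
            if nxt in done or nxt in banned_nodes or (node, nxt) in banned_edges:
                continue
            heapq.heappush(heap, (cost + w, nxt, path + [nxt]))
    return None


def path_cost(graph, path):
    return sum(graph[a][b] for a, b in zip(path, path[1:]))


def k_shortest_paths(graph, src, dst, K):
    first = shortest_path(graph, src, dst)
    if first is None:
        return []
    P = [first[1]]   # shortest path set P (S3-1, S3-2)
    B = []           # candidate path set B, kept as a heap of (cost, path)
    while len(P) < K:                       # S3-6
        last = P[-1]                        # i-th shortest path p_i (S3-3)
        for j in range(len(last) - 1):      # deviation points v_0 .. v_{n-1}
            root = last[:j + 1]
            # S3-4: block the edge leaving v_j on every accepted path with this root
            banned_edges = {(p[j], p[j + 1]) for p in P
                            if len(p) > j + 1 and p[:j + 1] == root}
            banned_nodes = set(root[:-1])   # keeps candidate paths loop-free
            spur = shortest_path(graph, root[-1], dst, banned_edges, banned_nodes)
            if spur is None:
                continue
            cand = root[:-1] + spur[1]
            if cand not in P and all(cand != c for _, c in B):
                heapq.heappush(B, (path_cost(graph, cand), cand))
        if not B:
            break
        P.append(heapq.heappop(B)[1])       # S3-5: move the cheapest candidate to P
    return P
```

Step S3-7 would then filter the returned list against the observed travel chains; that comparison depends on the signaling data format and is not sketched here.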

Step S4: construct a discrete choice model that accounts for generalized choice. The Path-Size Logit (PSL) model is adopted. Under the PSL model, the probability that a user chooses path k is:
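As a minimal sketch, assuming the commonly used PSL form P_k = exp(−θ·c_k + β_PS·ln PS_k) / Σ_j exp(−θ·c_j + β_PS·ln PS_j), with path-size term PS_k = Σ_{a∈k} (l_a / L_k) / N_a, where l_a is the length of link a, L_k the length of path k and N_a the number of paths in the choice set that use link a. The parameter names `theta` and `beta_ps` and the data layout are assumptions made for illustration and may differ from the patent's own expression.

```python
import math


def path_size_factors(paths, link_len):
    """PS_k for each path; a path is a list of link ids."""
    usage = {}
    for p in paths:
        for a in set(p):
            usage[a] = usage.get(a, 0) + 1   # N_a: paths sharing link a
    factors = []
    for p in paths:
        total = sum(link_len[a] for a in p)  # L_k
        factors.append(sum(link_len[a] / total / usage[a] for a in p))
    return factors


def psl_probabilities(costs, ps, theta=1.0, beta_ps=1.0):
    """Choice probabilities under the assumed PSL form."""
    utils = [-theta * c + beta_ps * math.log(s) for c, s in zip(costs, ps)]
    m = max(utils)                            # shift for numerical stability
    expu = [math.exp(u - m) for u in utils]
    tot = sum(expu)
    return [e / tot for e in expu]
```

Paths that overlap heavily receive a path-size factor below 1 and are therefore penalized relative to an independent path of equal cost.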

Step S5: construct the traffic assignment model. Specifically:

Based on the discrete choice model, a PSL-based stochastic user equilibrium (SUE) traffic assignment model is constructed, with the following form:

The first term of the model is the total travel cost over all sections, and the second term is the error in perceived travel cost;

Step S6: estimate the parameters of the discrete choice model by maximum likelihood estimation. Because the positioning accuracy of mobile phone signaling is low, the choice set is grouped by main travel mode to generate a broad choice set, and the likelihood function is then constructed for parameter estimation. Let the original exact choice set contain all paths k = 1, 2, 3, …, K, and let B denote the broad choice set, with C_b (b = 1, 2, 3, …, B) denoting the path choice group dominated by the b-th transport mode. Each path k belongs to exactly one choice group, and the probability that a user chooses a given path choice group is the sum of the probabilities of choosing each path in that group. Specifically, based on the travel chain data obtained in step S1 and the discrete choice model constructed in step S4, the parameters of the discrete choice model are estimated by maximum likelihood, where the likelihood function of the discrete choice model is constructed as:
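The grouping idea can be illustrated with a short Python sketch: each observation contributes the log of the summed probabilities of all paths in the observed group. For brevity the sketch assumes a plain logit with a single cost coefficient `theta` (the path-size term is omitted) and estimates `theta` by a simple grid search; the data layout, function names and the grid search itself are assumptions, not the patent's estimation procedure.

```python
import math


def path_probs(costs, theta):
    """Plain logit path probabilities (path-size term omitted for brevity)."""
    m = min(costs)
    w = [math.exp(-theta * (c - m)) for c in costs]
    s = sum(w)
    return [wi / s for wi in w]


def group_log_likelihood(theta, observations):
    """observations: list of (path costs, group id per path, observed group id)."""
    ll = 0.0
    for costs, groups, chosen_group in observations:
        probs = path_probs(costs, theta)
        # Probability of a group = sum of probabilities of its member paths
        p_group = sum(p for p, g in zip(probs, groups) if g == chosen_group)
        ll += math.log(p_group)
    return ll


def estimate_theta(observations, candidates):
    """Grid-search maximum likelihood estimate over candidate theta values."""
    return max(candidates, key=lambda th: group_log_likelihood(th, observations))
```

In practice a gradient-based optimizer would replace the grid search, but the likelihood being maximized has the same grouped structure.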

Step S7: calibrate the parameters of the traffic assignment model based on machine learning. This specifically includes:

Step S7-1: randomly generate traffic assignment model parameters, that is, randomly generate several groups of candidate parameters near the empirical values of the section travel cost function parameters;

Step S7-2: input each group of candidate parameters into the traffic assignment model constructed in step S5, load the travel OD data into the model, and solve the model iteratively with the method of successive weighted averages (MSWA) to obtain the traffic volume on each section. The MSWA algorithm proceeds as follows:
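As a hedged illustration of an MSWA iteration, the Python sketch below solves a toy logit-based stochastic user equilibrium on parallel routes. It assumes the step size commonly given for MSWA, n^d / (1^d + 2^d + … + n^d), which reduces to the ordinary method of successive averages when d = 0, together with BPR-type route costs. The network, the parameter values and the function names are hypothetical and are not taken from the patent.

```python
import math


def bpr(t0, v, cap, alpha=0.15, beta=4.0):
    """Assumed BPR-type cost on one route."""
    return t0 * (1.0 + alpha * (v / cap) ** beta)


def logit_loading(costs, demand, theta):
    """Split the demand over routes with a logit model."""
    m = min(costs)
    w = [math.exp(-theta * (c - m)) for c in costs]
    s = sum(w)
    return [demand * wi / s for wi in w]


def mswa(t0, cap, demand, theta=0.5, d=1.0, tol=1e-8, max_iter=10000):
    """Returns (flows, iterations used) for parallel routes between one OD pair."""
    x = logit_loading(t0, demand, theta)        # initial loading at free-flow cost
    weight_sum = 0.0
    for n in range(1, max_iter + 1):
        costs = [bpr(t, v, c) for t, v, c in zip(t0, x, cap)]
        y = logit_loading(costs, demand, theta)  # auxiliary flows
        if max(abs(yi - xi) for xi, yi in zip(x, y)) < tol:
            return x, n
        weight_sum += n ** d
        step = n ** d / weight_sum               # later iterations weigh more
        x = [xi + step * (yi - xi) for xi, yi in zip(x, y)]
    return x, max_iter
```

At convergence the flows reproduce themselves under the loading step, which is the fixed-point condition of the stochastic user equilibrium for this toy network.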

Step S7-3: calculate the coupling degree z between the result of the traffic assignment model and the actual flow as:

Step S7-4: construct the machine learning model, a support vector regression (SVR) model:

z = f(x) = w·x + b

s.t. z_i − f(x_i) = ε + ξ_i, i = 1, 2, …, q

Construct the Lagrangian function:

Step S7-5: input the training set data obtained in step S7-2 into the SVR model and train it to obtain the best SVR model; feeding traffic assignment model parameters into the trained SVR then outputs the corresponding coupling degree between the assigned flow and the actual flow;

Step S7-6: use a heuristic genetic algorithm (STA_GA) to find the travel cost function parameters with the highest coupling degree under the best SVR model. The specific steps are:

(5) Check whether the fitness difference between individuals satisfies the termination condition; if so, exit the algorithm; otherwise return to step (2) and continue reproducing the population to generate offspring until the optimal individual is obtained. The flow of the algorithm and the change in coupling degree are shown in Figure 4.
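A generic genetic search of this kind can be sketched in Python as follows. This is not the STA_GA procedure itself: the operators (tournament selection, blend crossover, Gaussian mutation, elitism), the stopping rule based on the fitness spread of the population, and the `surrogate_coupling` function standing in for the trained SVR are all assumptions chosen for illustration.

```python
import random


def surrogate_coupling(params):
    """Hypothetical stand-in for the trained SVR: peaks at alpha=0.15, beta=4."""
    alpha, beta = params
    return -((alpha - 0.15) / 0.45) ** 2 - ((beta - 4.0) / 7.0) ** 2


def genetic_search(fitness, bounds, pop_size=40, generations=200,
                   mutation_rate=0.2, tol=1e-6, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        # Stop when the fitness spread of the population is small enough
        if fitness(scored[0]) - fitness(scored[-1]) < tol:
            break
        children = [scored[0][:]]                  # elitism: keep the best individual
        while len(children) < pop_size:
            p1 = max(rng.sample(scored, 3), key=fitness)   # tournament selection
            p2 = max(rng.sample(scored, 3), key=fitness)
            w = rng.random()
            child = [w * a + (1.0 - w) * b for a, b in zip(p1, p2)]  # blend crossover
            for g, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:   # Gaussian mutation, clipped to bounds
                    child[g] += rng.gauss(0.0, 0.05 * (hi - lo))
                    child[g] = max(lo, min(hi, child[g]))
            children.append(child)
        pop = children
    return max(pop, key=fitness)
```

Calling `genetic_search(surrogate_coupling, [(0.05, 0.5), (1.0, 8.0)])` searches the two cost-function parameters within the given hypothetical bounds; with a real surrogate, `fitness` would wrap the SVR's prediction of the coupling degree.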

Step S8: perform traffic assignment according to travel demand. The existing travel OD data are input into the calibrated stochastic user equilibrium model, which is solved with the MSWA algorithm to obtain the traffic assignment result on each section.

In one embodiment, a comprehensive traffic assignment system based on mobile phone signaling is provided, the system comprising:

a first module, configured to obtain residents' travel OD data and travel chain data from mobile phone signaling data;

a second module, configured to generate a comprehensive integrated transportation network from the existing road network;

a third module, configured to generate the travel path choice set based on the travel chain data from mobile phone signaling;

a fourth module, configured to construct a discrete choice model that accounts for generalized choice;

a fifth module, configured to construct the traffic assignment model;

a sixth module, configured to estimate the parameters of the discrete choice model by maximum likelihood estimation;

a seventh module, configured to calibrate the parameters of the traffic assignment model based on machine learning;

an eighth module, configured to perform traffic assignment according to travel demand, based on the traffic assignment model.

For the specific limitations of the comprehensive traffic assignment system based on mobile phone signaling, refer to the limitations of the comprehensive traffic assignment method based on mobile phone signaling above, which are not repeated here. Each module of the system may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, the processor of a computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can call them and perform the operations corresponding to each module.

In one embodiment, a computer device is provided, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer program:

For the specific limitations of each step, refer to the limitations of the comprehensive traffic assignment method based on mobile phone signaling above, which are not repeated here.

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, wherein the following steps are implemented when the computer program is executed by a processor:

The present invention exploits the advantages of mobile phone signaling data, improves the accuracy of traffic assignment with the comprehensive-transport stochastic user equilibrium model, improves the efficiency of model parameter calibration through machine learning, and provides a better scientific basis for regional traffic planning and management.

The basic principles, main features and advantages of the present invention have been shown and described above. Those skilled in the art should understand that the present invention is not limited by the above embodiments; the embodiments and the description only illustrate the principles of the invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.