Disclosure of Invention
In view of the defects of the prior art, the invention provides a highway pavement performance prediction method, which solves the problem of poor model prediction accuracy in the prior art.
In order to achieve the aim of the invention, the invention adopts the following technical scheme:
A highway pavement performance prediction method comprises the following steps:
S1, acquiring a target road section data set, wherein the target road section data set comprises a road surface influence factor data set and a road surface performance prediction index data set, and the time corresponding to each road surface influence factor datum in the road surface influence factor data set corresponds one-to-one with the time corresponding to each road surface performance prediction index datum in the road surface performance prediction index data set;
S2, performing correlation analysis on the target road section data set to obtain a preprocessed target road section data set;
S3, performing influence factor feature selection on the preprocessed target road section data set by using the Lasso method to obtain an influence characteristic factor set;
S4, performing characteristic factor gray prediction on the influence characteristic factor set by using GM(1,1) to obtain a future characteristic factor set;
S5, processing the future characteristic factor set by using support vector regression (SVR) to obtain a future pavement prediction result.
The beneficial effects of the above scheme are:
(1) The invention predicts the performance of project-level highway pavement with a combination model that couples feature factor selection by a four-stage adaptive Lasso method based on the increasing function f(j), GM(1,1) characteristic factor gray prediction, and support vector regression (SVR) optimized by the slime mould algorithm (SMA). The model has a clear and complete theory, high feature selection accuracy, and easily realized SVR parameter optimization, and achieves high-precision prediction of project-level highway pavement performance indexes.
(2) Compared with the traditional Lasso algorithm, which converges slowly and selects feature factors poorly, the proposed four-stage adaptive Lasso method based on the increasing function f(j) applies different penalty weights to different lag orders, offers fast parameter tuning and convergence, and improves the accuracy and effectiveness of the model's parameter selection.
(3) Optimizing the SVR penalty parameter C, the insensitive-loss-function maximum error factor ε, and the kernel parameter γ of the support vector regression machine with the slime mould algorithm (SMA) has the advantages of a complete and simple theory, strong optimization capability, and convenient implementation.
Further, step S2 specifically includes:
Performing correlation analysis on the target road section data set by using the Pearson correlation coefficient method to obtain a preprocessed target road section data set.
Further, step S3 specifically includes:
S31, establishing a preset regression model of the pavement influence factor data and the pavement performance prediction index data in the preprocessed target road section data set;
S32, establishing an adaptive Lasso estimation equation for the fitting parameters in the preset regression model by using the Lasso method;
S33, solving the adaptive Lasso estimation equation, determining the fitting parameters, and determining the influence characteristic factor set.
Further, in step S31, the preset regression model is:
$$y_i = \sum_{j=1}^{n} \beta_j x_{i,j} + \varepsilon_i$$
where $x_{i,j}$ denotes the value of influence factor j at time i and $y_i$ denotes the pavement performance prediction index at time i; j indexes the influence factors, that is, the order of the preset regression model, j = 1, 2, …, n; i indexes time in the target road section data set, i = 1, 2, …, t, with t the current time point, so that the data set runs from the earliest time up to the current time point t; $\beta_j$ denotes a fitting parameter; and $\varepsilon_i$ denotes an error term.
The adaptive Lasso estimation equation is:
$$\hat{\beta} = \arg\min_{\beta}\left\{\sum_{i=1}^{t}\Big(y_i - \sum_{j=1}^{n}\beta_j x_{i,j}\Big)^{2} + \lambda_t \sum_{j=1}^{n} w_j\,|\beta_j|\right\}$$
where $\beta_j$ denotes the fitting parameters; $\lambda_t$ denotes a non-negative adjustment parameter; $w_j$ denotes an adaptive weight; and $\hat{\beta}_j$ denotes the estimate of the fitting parameter $\beta_j$.
In the adaptive Lasso estimation equation of the fitting parameter $\beta_j$, the defining equation of the adaptive weight $w_j$ is:
$$w_j = \frac{f(j)^{\delta_2}}{\big|\hat{\beta}_j^{(0)}\big|^{\delta_1}}$$
where $\hat{\beta}_j^{(0)}$ is the initial estimate, which can be obtained by ordinary least squares (OLS) regression; $\delta_1$ is the penalty exponent on the initial estimate, $\delta_1 > 0$; $\delta_2$ is the penalty parameter for the order j, $\delta_2 > 0$; and f(j) is an increasing function of the order j.
In the defining equation of the adaptive weight $w_j$, the defining equation of the increasing function f(j) is:
$$f(j) = k \cdot j$$
where k is a non-negative coefficient that takes a different value on each of the four stages delimited by the quartiles of the order sequence (j ≤ Q1, Q1 < j ≤ Q2, Q2 < j ≤ Q3, j > Q3), and Q1, Q2, Q3 are the lower quartile, the median, and the upper quartile, respectively, of the sequence j.
The further scheme has the beneficial effects that different penalty weights are applied to different lag orders, parameter tuning and convergence are fast, and the accuracy and effectiveness of the model's parameter selection are improved.
Further, step S4 specifically includes:
S41, constructing a 1-AGO sequence of each pavement influence factor datum in the influence characteristic factor set;
S42, generating an adjacent mean sequence based on the 1-AGO sequence;
S43, constructing the gray differential equation GM(1,1) based on the adjacent mean sequence to obtain the least squares estimation parameter sequence of the gray differential equation;
S44, obtaining a 1-AGO estimation sequence based on the least squares estimation parameter sequence;
S45, performing subtraction restoration on the 1-AGO estimation sequence to obtain a future time sequence of each influence characteristic factor;
S46, combining the influence characteristic factor set and the future time sequences to construct the future characteristic factor set.
Further, after step S4, the method further includes:
Checking the result of step S4 based on a small-residual-probability P-value check and a variance-ratio C-value check.
The further scheme has the advantages that GM(1,1) is computationally efficient, suits the small data volumes involved in the invention, and offers high prediction accuracy for first-order linearly accumulated sequences.
Further, step S5 specifically includes:
S51, constructing a training set based on the future characteristic factor set, wherein the training set comprises input feature vectors and prediction indexes;
S52, mapping the input feature vectors to a high-dimensional linear space by using a nonlinear transformation relation;
S53, constructing the function expression of support vector regression (SVR) according to the nonlinear transformation relation, and solving the function expression;
S54, determining a future pavement prediction result according to the function expression and the future characteristic factor set.
Further, step S53 specifically includes:
S531, converting the process of solving the function expression into the problem of searching for the optimization penalty parameter, the optimization insensitive-loss-function maximum error factor, and the optimization kernel parameter;
S532, determining the optimization penalty parameter, the optimization insensitive-loss-function maximum error factor, and the optimization kernel parameter by using the slime mould algorithm (SMA) to solve the function expression.
Further, step S532 specifically includes:
S5321, setting initial parameters and initializing a population;
the initial parameters comprise the population size, the individual dimension, the fitness function, the maximum number of iterations, the individual-dimension upper bound, and the individual-dimension lower bound;
S5322, calculating initial fitness values;
S5323, calculating the weight, a first parameter, and a second parameter;
S5324, generating a random number and a pseudo-random number, comparing the random number with the update parameter, and updating the individual positions according to the comparison result;
S5325, calculating fitness values, and updating the optimization penalty parameter, the optimization insensitive-loss-function maximum error factor, and the optimization kernel parameter;
S5326, judging whether the end condition is met; if so, outputting the optimization penalty parameter, the optimization insensitive-loss-function maximum error factor, and the optimization kernel parameter; if not, repeating steps S5323 to S5325.
Further, in step S5321, the fitness function is:
$$\mathrm{fit}_i(C, \varepsilon, \gamma) = \sqrt{\frac{1}{h}\sum_{g=1}^{h}\big(y_g - \hat{y}_g^{(i)}\big)^{2}}$$
where C, ε, and γ respectively denote the optimization penalty parameter, the optimization insensitive-loss-function maximum error factor, and the optimization kernel parameter; $\mathrm{fit}_i$ denotes the fitness value of the i-th individual; $y_g$ denotes the actual value in the training set; $\hat{y}_g^{(i)}$ denotes the SVR regression prediction on the training set when the i-th individual is taken; g indexes the training-set samples; and h denotes the number of samples in the training set.
The further scheme has the beneficial effects that the SVR penalty parameter C, the insensitive-loss-function maximum error factor ε, and the kernel parameter γ of the support vector regression machine are optimized by the slime mould algorithm (SMA), which has the advantages of a complete and simple theory, strong optimization capability, and convenient implementation.
Detailed Description
The invention will be further described with reference to the drawings and specific examples.
As shown in fig. 1, a highway pavement performance prediction method comprises the following steps:
S1, acquiring a target road section data set, wherein the target road section data set comprises a road surface influence factor data set and a road surface performance prediction index data set, and the time corresponding to each road surface influence factor datum in the road surface influence factor data set corresponds one-to-one with the time corresponding to each road surface performance prediction index datum in the road surface performance prediction index data set.
In the present embodiment, the acquired target segment data set may be a time series.
Optionally, the road surface influence factor data set may be obtained by analyzing the basic information, maintenance history, climate environment, traffic volume, maintenance funding, and other information of the project-level target road section. For example, annual rainfall, annual maximum air temperature, annual minimum air temperature, annual average air temperature, resident population, general public budget expenditure (transportation), and AADT (annual average daily traffic) may be used as road surface influence factor data.
Alternatively, the road surface performance prediction index data set may contain the pavement damage condition index (PCI) or the pavement riding quality index (RQI).
S2, performing correlation analysis on the target road section data set to obtain a preprocessed target road section data set.
In this embodiment, step S2 specifically includes:
Performing correlation analysis on the target road section data set by using the Pearson correlation coefficient method, where the analysis covers the correlations among the road surface influence factor data as well as the correlations between the road surface influence factor data and the road surface performance prediction index data, to obtain the preprocessed target road section data set.
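As a minimal illustrative sketch (not part of the claimed method), the Pearson screening of step S2 could be carried out as follows in Python; the file name and the column names (e.g. "AADT", "PCI") are hypothetical placeholders for the factor and index data described above.

```python
# Pearson correlation screening of step S2 (illustrative sketch only).
import pandas as pd

# Each row is one year of the target road section: influence factor columns
# plus the performance prediction index (PCI is assumed here).
df = pd.read_csv("target_section.csv")  # hypothetical data file
factors = ["annual_rainfall", "max_temp", "min_temp", "avg_temp",
           "population", "budget_transport", "AADT"]

# Correlations among the influence factors (to spot redundant factors) ...
factor_corr = df[factors].corr(method="pearson")
# ... and between each factor and the prediction index.
index_corr = df[factors].corrwith(df["PCI"], method="pearson")
print(index_corr.sort_values(key=abs, ascending=False))
```

Factors that are strongly inter-correlated, or weakly correlated with the prediction index, can then be flagged before the Lasso selection of step S3.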
S3, performing influence factor feature selection on the preprocessed target road section data set by using the Lasso method to obtain an influence characteristic factor set.
In this embodiment, step S3 specifically includes:
S31, establishing a preset regression model of the pavement influence factor data and the pavement performance prediction index data in the preprocessed target road section data set;
S32, establishing an adaptive Lasso estimation equation for the fitting parameters in the preset regression model by using the Lasso method;
S33, solving the adaptive Lasso estimation equation, determining the fitting parameters, and determining the influence characteristic factor set.
In this embodiment, in step S31, the preset regression model is:
$$y_i = \sum_{j=1}^{n} \beta_j x_{i,j} + \varepsilon_i$$
where $x_{i,j}$ denotes the value of influence factor j at time i and $y_i$ denotes the pavement performance prediction index at time i; j indexes the influence factors, that is, the order of the preset regression model, j = 1, 2, …, n; i indexes time in the target road section data set, i = 1, 2, …, t, with t the current time point, so that the data set runs from the earliest time up to the current time point t; $\beta_j$ denotes a fitting parameter; and $\varepsilon_i$ denotes an error term.
The adaptive Lasso estimation equation is:
$$\hat{\beta} = \arg\min_{\beta}\left\{\sum_{i=1}^{t}\Big(y_i - \sum_{j=1}^{n}\beta_j x_{i,j}\Big)^{2} + \lambda_t \sum_{j=1}^{n} w_j\,|\beta_j|\right\}$$
where $\beta_j$ denotes the fitting parameters; $\lambda_t$ denotes a non-negative adjustment parameter; $w_j$ denotes an adaptive weight; and $\hat{\beta}_j$ denotes the estimate of the fitting parameter $\beta_j$.
In this embodiment, in the adaptive Lasso estimation equation of the fitting parameter $\beta_j$, the defining equation of the adaptive weight $w_j$ is:
$$w_j = \frac{f(j)^{\delta_2}}{\big|\hat{\beta}_j^{(0)}\big|^{\delta_1}}$$
where $\hat{\beta}_j^{(0)}$ is the initial estimate, which can be obtained by ordinary least squares (OLS) regression; $\delta_1$ is the penalty exponent on the initial estimate, $\delta_1 > 0$; $\delta_2$ is the penalty parameter for the order j, $\delta_2 > 0$; and f(j) is an increasing function of the order j.
In the defining equation of the adaptive weight $w_j$, the defining equation of the increasing function f(j) is:
$$f(j) = k \cdot j$$
where k is a non-negative coefficient that takes a different value on each of the four stages delimited by the quartiles of the order sequence (j ≤ Q1, Q1 < j ≤ Q2, Q2 < j ≤ Q3, j > Q3), and Q1, Q2, Q3 are the lower quartile, the median, and the upper quartile, respectively, of the sequence j.
A grid search is then carried out over the non-negative adjustment parameter $\lambda_t$ in the adaptive Lasso estimation equation and over the penalty parameters $\delta_1$ and $\delta_2$ in the defining equation of the adaptive weight $w_j$, with the initial estimate $\hat{\beta}_j^{(0)}$ obtained by OLS regression, to obtain the fitting parameters $\hat{\beta}_j$ based on adaptive Lasso estimation.
Finally, through this feature selection, the influence characteristic factor set (the factors whose fitted parameters are non-zero) is determined.
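A minimal sketch of this four-stage adaptive Lasso selection is given below, assuming the weight form reconstructed above, $w_j = f(j)^{\delta_2} / |\hat{\beta}_j^{(0)}|^{\delta_1}$ with f(j) = k·j; the stage values in k_stages and all parameter values are illustrative assumptions, not the patent's calibrated settings. It uses the standard reduction of the adaptive Lasso to an ordinary Lasso on rescaled features.

```python
# Four-stage adaptive Lasso sketch (illustrative): non-zero coefficients
# mark the selected influence characteristic factors.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def adaptive_lasso(X, y, lam, delta1, delta2, k_stages=(0.5, 1.0, 1.5, 2.0)):
    n_feat = X.shape[1]
    j = np.arange(1, n_feat + 1)                   # order of each factor
    q1, q2, q3 = np.percentile(j, [25, 50, 75])    # quartiles of the sequence j
    # Four-stage coefficient k: one value per quartile interval of the order j.
    k = np.select([j <= q1, j <= q2, j <= q3], k_stages[:3], default=k_stages[3])
    f_j = k * j                                    # increasing function f(j)
    beta0 = LinearRegression().fit(X, y).coef_     # OLS initial estimate
    w = f_j ** delta2 / np.abs(beta0) ** delta1    # adaptive weights w_j
    # Standard reduction: adaptive Lasso == ordinary Lasso on rescaled features.
    model = Lasso(alpha=lam, max_iter=50_000).fit(X / w, y)
    return model.coef_ / w                         # undo the rescaling
```

In practice λ, δ1, and δ2 would be chosen by the grid search described above, for instance by cross-validated error over a small grid of candidate values.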
S4, performing characteristic factor gray prediction on the influence characteristic factor set by using GM(1,1) to obtain a future characteristic factor set.
In this embodiment, step S4 specifically includes:
S41, constructing a 1-AGO sequence of each pavement influence factor datum in the influence characteristic factor set;
S42, generating an adjacent mean sequence based on the 1-AGO sequence;
S43, constructing the gray differential equation GM(1,1) based on the adjacent mean sequence to obtain the least squares estimation parameter sequence of the gray differential equation;
S44, obtaining a 1-AGO estimation sequence based on the least squares estimation parameter sequence;
S45, performing subtraction restoration on the 1-AGO estimation sequence to obtain a future time sequence of each influence characteristic factor;
S46, combining the influence characteristic factor set and the future time sequences to construct the future characteristic factor set.
Illustratively, the time series of each road surface influence factor v in the influence characteristic factor set is constructed as:
$$x_v^{(0)} = \big(x_v^{(0)}(1), x_v^{(0)}(2), \ldots, x_v^{(0)}(t)\big)$$
Illustratively, the 1-AGO sequence generated from the time series $x_v^{(0)}$ may be $x_v^{(1)} = \big(x_v^{(1)}(1), x_v^{(1)}(2), \ldots, x_v^{(1)}(t)\big)$, where:
$$x_v^{(1)}(k) = \sum_{m=1}^{k} x_v^{(0)}(m), \quad k = 1, 2, \ldots, t$$
Illustratively, the adjacent mean sequence generated based on the 1-AGO sequence may be $z_v^{(1)} = \big(z_v^{(1)}(2), \ldots, z_v^{(1)}(t)\big)$, where:
$$z_v^{(1)}(k) = \tfrac{1}{2}\big(x_v^{(1)}(k) + x_v^{(1)}(k-1)\big), \quad k = 2, 3, \ldots, t$$
Illustratively, the gray differential equation constructed based on the adjacent mean sequence may be:
$$x_v^{(0)}(k) + a\,z_v^{(1)}(k) = u$$
and, letting
$$B = \begin{bmatrix} -z_v^{(1)}(2) & 1 \\ \vdots & \vdots \\ -z_v^{(1)}(t) & 1 \end{bmatrix}, \qquad Y = \begin{bmatrix} x_v^{(0)}(2) \\ \vdots \\ x_v^{(0)}(t) \end{bmatrix}$$
the least squares estimation parameter sequence of the gray differential equation may be:
$$\hat{a} = [a, u]^{T} = (B^{T}B)^{-1}B^{T}Y$$
Illustratively, the resulting 1-AGO estimation sequence may be:
$$\hat{x}_v^{(1)}(k+1) = \Big(x_v^{(0)}(1) - \frac{u}{a}\Big)e^{-ak} + \frac{u}{a}, \quad k = 0, 1, 2, \ldots$$
Illustratively, the time estimation sequence obtained by subtraction restoration of the 1-AGO estimation sequence for the influence characteristic factor v may be:
$$\hat{x}_v^{(0)}(k+1) = \hat{x}_v^{(1)}(k+1) - \hat{x}_v^{(1)}(k)$$
Optionally, when k ≥ t, prediction of each influence characteristic factor v at future time points is achieved, yielding the future time sequence.
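A minimal sketch of steps S41 to S45 for a single influence characteristic factor, under the formulas above (illustrative only; the function name and the example series are assumptions):

```python
# GM(1,1) gray prediction sketch for one influence characteristic factor.
import numpy as np

def gm11_forecast(x0, steps):
    """Fit GM(1,1) on the historical series x0 and extend `steps` points
    past the last observation."""
    t = len(x0)
    x1 = np.cumsum(x0)                              # S41: 1-AGO sequence
    z1 = 0.5 * (x1[1:] + x1[:-1])                   # S42: adjacent mean sequence
    B = np.column_stack([-z1, np.ones(t - 1)])      # S43: gray differential eqn
    Y = x0[1:]
    a, u = np.linalg.lstsq(B, Y, rcond=None)[0]     # least squares [a, u]^T
    k = np.arange(t + steps)                        # k = 0, 1, ..., t+steps-1
    x1_hat = (x0[0] - u / a) * np.exp(-a * k) + u / a    # S44: 1-AGO estimate
    x0_hat = np.concatenate([[x0[0]], np.diff(x1_hat)])  # S45: subtraction restore
    return x0_hat                                   # indices >= t are the future

# e.g. three future values of one factor from five historical observations:
future = gm11_forecast(np.array([2.1, 2.3, 2.4, 2.6, 2.9]), steps=3)[-3:]
```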
Optionally, after step S4, the method further comprises:
Checking the result of step S4 based on a small-residual-probability P-value check and a variance-ratio C-value check.
Illustratively, checking step S4 based on the small residual probability P-value check may specifically include:
First, the residual sequence $\varepsilon(k) = x_v^{(0)}(k) - \hat{x}_v^{(0)}(k)$ is constructed; then the mean relative error $\mathrm{MRE} = \frac{1}{t}\sum_{k=1}^{t}\big|\varepsilon(k)/x_v^{(0)}(k)\big|$ is calculated; the small residual probability P value is then $P = (1 - \mathrm{MRE}) \times 100\%$.
Illustratively, checking step S4 by the variance ratio C-value check may specifically include:
First, the mean of the time series $x_v^{(0)}$ is calculated as $\bar{x}_v = \frac{1}{t}\sum_{k=1}^{t} x_v^{(0)}(k)$; next, its variance $S_1^{2} = \frac{1}{t}\sum_{k=1}^{t}\big(x_v^{(0)}(k) - \bar{x}_v\big)^{2}$ is calculated; then, the variance of the residual sequence $\varepsilon(k)$ is calculated as $S_2^{2} = \frac{1}{t}\sum_{k=1}^{t}\big(\varepsilon(k) - \bar{\varepsilon}\big)^{2}$; finally, the variance ratio C value is calculated:
$$C = \frac{S_2}{S_1}$$
Optionally, the GM(1,1) model is evaluated by combining the small residual probability P value and the variance ratio C value, and the evaluation results can be classified as good, qualified, barely qualified, and unqualified. Table 1 gives the grading criteria for evaluating the GM(1,1) model by the combination of the small residual probability P value and the variance ratio C value.
TABLE 1 Grading criteria for the P-value and C-value evaluation

| P value | C value | Model evaluation |
| ------- | ------- | ---------------- |
| >0.95 | <0.35 | Good |
| >0.80 | <0.50 | Qualified |
| >0.70 | <0.65 | Barely qualified |
| ≤0.70 | ≥0.65 | Unqualified |
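A minimal sketch of this posterior check, computing P (expressed as a fraction so it matches Table 1) from the mean relative error as described above, computing C, and grading against the Table 1 criteria; the function name is an assumption:

```python
# Posterior check of step S4: small residual probability P and variance
# ratio C, graded per Table 1 (illustrative implementation).
import numpy as np

def gm11_check(x0, x0_hat):
    eps = x0 - x0_hat                        # residual sequence epsilon(k)
    mre = np.mean(np.abs(eps / x0))          # mean relative error MRE
    P = 1.0 - mre                            # small residual probability (fraction)
    C = np.std(eps) / np.std(x0)             # variance ratio S2 / S1
    if P > 0.95 and C < 0.35:
        grade = "good"
    elif P > 0.80 and C < 0.50:
        grade = "qualified"
    elif P > 0.70 and C < 0.65:
        grade = "barely qualified"
    else:
        grade = "unqualified"
    return P, C, grade
```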
S5, processing the future characteristic factor set by using support vector regression SVR to obtain a future pavement prediction result.
In this embodiment, step S5 specifically includes:
S51, constructing a training set based on the future characteristic factor set, wherein the training set comprises input feature vectors and prediction indexes;
S52, mapping the input feature vectors to a high-dimensional linear space by using a nonlinear transformation relation;
S53, constructing the function expression of support vector regression (SVR) according to the nonlinear transformation relation, and solving the function expression;
S54, determining a future pavement prediction result according to the function expression and the future characteristic factor set.
In this embodiment, step S53 specifically includes:
S531, converting the process of solving the function expression into the problem of searching for the optimization penalty parameter, the optimization insensitive-loss-function maximum error factor, and the optimization kernel parameter;
S532, determining the optimization penalty parameter, the optimization insensitive-loss-function maximum error factor, and the optimization kernel parameter by using the slime mould algorithm (SMA) to solve the function expression.
In this embodiment, step S532 specifically includes:
S5321, setting initial parameters and initializing a population;
the initial parameters comprise the population size, the individual dimension, the fitness function, the maximum number of iterations, the individual-dimension upper bound, and the individual-dimension lower bound;
S5322, calculating initial fitness values;
S5323, calculating the weight, a first parameter, and a second parameter;
S5324, generating a random number and a pseudo-random number, comparing the random number with the update parameter, and updating the individual positions according to the comparison result;
S5325, calculating fitness values, and updating the optimization penalty parameter, the optimization insensitive-loss-function maximum error factor, and the optimization kernel parameter;
S5326, judging whether the end condition is met; if so, outputting the optimization penalty parameter, the optimization insensitive-loss-function maximum error factor, and the optimization kernel parameter; if not, repeating steps S5323 to S5325.
In this embodiment, in step S5321, the fitness function is:
$$\mathrm{fit}_i(C, \varepsilon, \gamma) = \sqrt{\frac{1}{h}\sum_{g=1}^{h}\big(y_g - \hat{y}_g^{(i)}\big)^{2}}$$
where C, ε, and γ respectively denote the optimization penalty parameter, the optimization insensitive-loss-function maximum error factor, and the optimization kernel parameter; $\mathrm{fit}_i$ denotes the fitness value of the i-th individual; $y_g$ denotes the actual value in the training set; $\hat{y}_g^{(i)}$ denotes the SVR regression prediction on the training set when the i-th individual is taken; g indexes the training-set samples; and h denotes the number of samples in the training set.
Illustratively, the training set constructed based on the future characteristic factor set may be $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_h, y_h)\}$, where h is the number of samples in the training set; $x_i \in \mathbb{R}^q$ (i = 1, 2, …, h) is the input feature vector of the i-th sample, $x_i = [x_{i1}, x_{i2}, \ldots, x_{iq}]^T$; and the output value $y_i \in \mathbb{R}$ is the prediction index.
Illustratively, a nonlinear transformation $\varphi(\cdot)$ is employed to map the training-set input feature vectors $x_i \in \mathbb{R}^q$ to a high-dimensional linear space, and the function expression is constructed as:
$$f(x) = \omega^{T}\varphi(x) + b$$
where ω is the weight vector and b is the bias term.
Adopting the ε-insensitive loss function and considering the slack variables $\xi_i$, $\xi_i^{*}$, the process of solving f(x) is converted into solving the following optimization problem:
$$\min_{\omega, b, \xi, \xi^{*}} \ \frac{1}{2}\|\omega\|^{2} + C\sum_{i=1}^{h}(\xi_i + \xi_i^{*})$$
$$\text{s.t.}\ \ y_i - \omega^{T}\varphi(x_i) - b \le \varepsilon + \xi_i,\quad \omega^{T}\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^{*},\quad \xi_i,\ \xi_i^{*} \ge 0$$
where C is the penalty parameter and ε is the maximum error factor of the insensitive loss function.
Illustratively, using the Lagrange multiplier method, the Lagrange multipliers $\alpha_i$, $\alpha_i^{*}$ are introduced, and the saddle-point conditions yield:
$$f(x) = \sum_{i=1}^{h}(\alpha_i - \alpha_i^{*})\,k(x_i, x) + b$$
where $k(x_i, x) = \varphi(x_i)^{T}\varphi(x)$ is the defined kernel function.
Illustratively, a radial basis function (RBF) is employed as the kernel function $k(x_i, x)$ of the support vector machine, expressed as $k_{RBF}(x_i, x) = \exp(-\gamma\|x_i - x\|^{2})$, where γ > 0 is the kernel parameter.
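A minimal sketch of the ε-SVR with RBF kernel using scikit-learn follows (an assumed tooling choice; the patent does not prescribe a library). The StandardScaler step is an added practical assumption, since RBF distances are scale-sensitive and the patent does not mention scaling:

```python
# epsilon-SVR with RBF kernel for steps S52-S53 (illustrative sketch).
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def fit_svr(X_train, y_train, C, epsilon, gamma):
    # Scale features, then fit the RBF-kernel support vector regressor.
    model = make_pipeline(StandardScaler(),
                          SVR(kernel="rbf", C=C, epsilon=epsilon, gamma=gamma))
    return model.fit(X_train, y_train)

# Usage: predict the performance index from the future characteristic factors.
# model = fit_svr(X_train, y_train, C=50.0, epsilon=0.5, gamma=0.1)
# y_future = model.predict(X_future)
```

The placeholders C, epsilon, and gamma are exactly the three parameters that the SMA search described below is meant to supply.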
Illustratively, determining the optimization penalty parameter, the optimization insensitive-loss-function maximum error factor, and the optimization kernel parameter by using the slime mould algorithm (SMA) may specifically include:
The parameters (population size pop, individual dimension dim, fitness function fitn, maximum number of iterations M, individual-dimension upper bound $\mathrm{maxx}_j$, and individual-dimension lower bound $\mathrm{minx}_j$) are set, and the population $X_{pop \times dim}$ is initialized.
The initial fitness values $\mathrm{fitn}_i$ are calculated, where i indexes the individuals, i ≤ pop, and j indexes the individual dimensions, j ≤ dim.
In this embodiment, the optimization targets the penalty parameter C, the insensitive-loss-function maximum error factor ε, and the kernel parameter γ of the support vector regression machine SVR, so dim = 3, with $\mathrm{minx}_C = 1$, $\mathrm{maxx}_C = 200$, $\mathrm{minx}_\varepsilon = 0.1$, $\mathrm{maxx}_\varepsilon = 2$, $\mathrm{minx}_\gamma = 0.01$, $\mathrm{maxx}_\gamma = 20$, and M = 200.
Illustratively, the fitness function may be:
$$\mathrm{fitn}_i(C, \varepsilon, \gamma) = \sqrt{\frac{1}{h}\sum_{g=1}^{h}\big(y_g - \hat{y}_g^{(i)}\big)^{2}}$$
where the right-hand side measures the deviation between the actual values $y_g$ and the predicted values $\hat{y}_g^{(i)}$ of the training-set SVR regression when the i-th individual (C, ε, γ) is taken; g indexes the training-set samples and h is the number of samples in the training set.
Illustratively, the first parameter may be a, the second parameter may be b, and the weight may be W, with:
$$a = \operatorname{arctanh}\Big(1 - \frac{m}{M}\Big)$$
$$b = 1 - \frac{m}{M}$$
$$W_{i}(m) = \begin{cases} 1 + \operatorname{rand}(1, \mathrm{dim}) \cdot \log\Big(\dfrac{BF(m) - F_i(m)}{BF(m) - WF(m)} + 1\Big), & \text{individual } i \text{ in the better half of the population} \\ 1 - \operatorname{rand}(1, \mathrm{dim}) \cdot \log\Big(\dfrac{BF(m) - F_i(m)}{BF(m) - WF(m)} + 1\Big), & \text{otherwise} \end{cases}$$
where m is the current iteration number, m ≤ M; BF(m) is the best fitness of the population at the m-th iteration; WF(m) is the worst fitness of the population at the m-th iteration; $F_i(m)$ is the fitness value of individual i at the m-th iteration; and rand(1, dim) is a 1 × dim vector of random numbers.
Illustratively, the random number may be r, the pseudo random number may be rand, and the update parameter may be z.
In this embodiment, z = 0.03; the random numbers r and rand are generated, and r is compared with the position update parameter z:
If r < z, the individual position is updated according to $X_{ij}(m+1) = \operatorname{rand} \cdot (\mathrm{maxx}_j - \mathrm{minx}_j) + \mathrm{minx}_j$; otherwise, the parameters p, $v_a$, $v_b$ are updated:
$$p = \tanh\big(|F_i(m) - BF(m)|\big)$$
$$v_a = 2a \cdot \operatorname{rand}([1, \mathrm{dim}]) - a$$
$$v_b = 2b \cdot \operatorname{rand}([1, \mathrm{dim}]) - b$$
Optionally, if r < p, the individual position is updated according to $X_{ij}(m+1) = X_{BFj}(m) + v_a \cdot \big(W_{ij}(m) \cdot X_{Aj}(m) - X_{Bj}(m)\big)$, where A and B are two randomly selected individuals and $X_{BFj}(m)$ is the position of the individual with the best fitness at the m-th iteration.
Optionally, if r ≥ p, the individual position is updated as $X_{ij}(m+1) = v_b \cdot X_{ij}(m)$.
Then the fitness values are calculated, the global optimal solution is updated, and whether the end condition is satisfied is judged;
if so, the optimization penalty parameter, the optimization insensitive-loss-function maximum error factor, and the optimization kernel parameter are output; if not, steps S5323 to S5325 are repeated.
Alternatively, the end condition may be that the maximum number of iterations M is reached.
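A minimal sketch of the SMA loop of steps S5321 to S5326 follows, implementing the update rules above; the fitness is the training-set RMSE of an SVR fitted with each individual's (C, ε, γ). The bounds and M follow this embodiment, while pop, the random seed, and fit_svr (the sketch given earlier) are illustrative assumptions:

```python
# Slime mould algorithm (SMA) sketch for tuning (C, epsilon, gamma) of the SVR.
import numpy as np

def sma_optimize(X_tr, y_tr, pop=30, M=200, z=0.03,
                 lo=np.array([1.0, 0.1, 0.01]),      # minx for C, eps, gamma
                 hi=np.array([200.0, 2.0, 20.0])):   # maxx for C, eps, gamma
    dim = 3
    rng = np.random.default_rng(0)
    X = rng.uniform(lo, hi, size=(pop, dim))         # S5321: init population

    def fitness(ind):                                # RMSE on the training set
        model = fit_svr(X_tr, y_tr, C=ind[0], epsilon=ind[1], gamma=ind[2])
        return np.sqrt(np.mean((y_tr - model.predict(X_tr)) ** 2))

    F = np.array([fitness(ind) for ind in X])        # S5322: initial fitness
    best_i = F.argmin()
    best, best_F = X[best_i].copy(), F[best_i]
    for m in range(1, M + 1):                        # S5323 to S5326
        order = F.argsort()                          # best individuals first
        BF, WF = F[order[0]], F[order[-1]]
        a = np.arctanh(1 - m / M)                    # first parameter a
        b = 1 - m / M                                # second parameter b
        # Weight W: "+" branch for the better half of the population, "-" else.
        log_term = np.log((F - BF) / (WF - BF + 1e-12) + 1)
        W = np.empty((pop, dim))
        for rank, i in enumerate(order):
            sign = 1.0 if rank < pop // 2 else -1.0
            W[i] = 1.0 + sign * rng.random(dim) * log_term[i]
        for i in range(pop):                         # S5324: position updates
            r = rng.random()
            if r < z:                                # random relocation
                X[i] = rng.uniform(lo, hi)
            else:
                p = np.tanh(abs(F[i] - BF))
                va = 2 * a * rng.random(dim) - a
                vb = 2 * b * rng.random(dim) - b
                A, B = rng.integers(pop, size=2)     # two random individuals
                if r < p:
                    X[i] = best + va * (W[i] * X[A] - X[B])
                else:
                    X[i] = vb * X[i]
            X[i] = np.clip(X[i], lo, hi)             # keep within the bounds
        F = np.array([fitness(ind) for ind in X])    # S5325: refresh fitness
        if F.min() < best_F:                         # update global optimum
            best_i = F.argmin()
            best, best_F = X[best_i].copy(), F[best_i]
    return best                                      # optimized (C, eps, gamma)
```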
In this embodiment, the SVR penalty parameter C, the insensitive-loss-function maximum error factor ε, and the kernel parameter γ of the support vector regression machine are optimized by the slime mould algorithm (SMA), which has the advantages of a complete and simple theory, strong optimization capability, and convenient implementation.
Those of ordinary skill in the art will recognize that the embodiments described herein are intended to help the reader understand the principles of the invention, and it should be understood that the scope of the invention is not limited to these specific statements and embodiments. Those of ordinary skill in the art can make various other specific modifications and combinations based on the teachings of the present disclosure without departing from the spirit of the invention, and such modifications and combinations remain within the scope of the invention.