Disclosure of Invention
In order to overcome the technical defects of low accuracy, low operating efficiency and susceptibility to overfitting in existing prediction technology, the invention provides a method and a system for predicting online ride-hailing demand that offer high accuracy, high operating efficiency and resistance to overfitting.
To solve the above problems, the invention is realized according to the following technical scheme:
The invention discloses a method for predicting online ride-hailing demand, characterized by comprising the following steps:
reading a plurality of associated data that influence ride-hailing demand;
repairing abnormal and missing data in the associated data;
calculating the features of each kind of associated data and generating a feature set;
performing overall training on the data in the feature set;
performing secondary training on the data of high-order-volume regions in the feature set;
performing multi-model fusion on the results of the two trainings to obtain a prediction model;
and inputting current associated data to obtain a predicted future ride-hailing demand.
The associated data includes: historical order volume data, order weather data, holiday information data, regional population density data, and regional community density data.
The repairing of abnormal and missing data in the associated data is specifically as follows: each kind of associated data is sorted and arranged into a normal distribution diagram according to the frequency of occurrence of the data, and data outside the 3σ range of the normal distribution diagram are deleted to reduce abnormal data; if the missing data can be assigned a numerical value, the missing data are filled with the element 0; if the missing data cannot be assigned a numerical value, the data in the segments before and after the missing data are acquired, and the most frequent data are extracted to fill the missing data.
The calculating of the features of each kind of associated data and the generating of a feature set is specifically as follows: the historical order volume data are divided at a fixed time interval to obtain a sequence of historical order volume data; the period-over-period ratio, the day-over-day ratio and the week-over-week ratio of the order volume are then calculated for the sequence; the time dimension and the space dimension of the historical order volume data are marked using the associated data; and finally the data are processed with a time-frequency transform algorithm to obtain the feature set.
The time-frequency transform algorithm includes a Fourier transform algorithm.
The overall training of the data in the feature set is specifically as follows: the feature set is read and its data are input into a first machine learning model for training; the generated data are merged into the feature set; data are then extracted again and input into a second machine learning model for training; and the generated data are merged into the feature set.
The first machine learning model is an XGBoost model.
The second machine learning model is a LightGBM model.
The secondary training of the data of high-order-volume regions in the feature set is specifically as follows: the feature set is read, the data marked as belonging to a high-order-volume region are extracted and input into a deep learning model for training, and the generated data are then merged into the feature set.
The deep learning model is an LSTM model.
The multi-model fusion of the results of the two trainings to obtain a prediction model is specifically as follows: the prediction results and the mean square errors of the overall training and the secondary training are obtained respectively, the multi-model fusion prediction result is obtained as a weighted average of the two prediction results, and a stable prediction model is obtained through repeated iterative computation.
A system for predicting online ride-hailing demand, the system comprising:
a reading module, used for reading a plurality of associated data that influence ride-hailing demand;
a repair module, used for repairing abnormal and missing data in the associated data;
a feature module, used for calculating the features of each kind of associated data and generating a feature set;
an overall training module, used for performing overall training on the data in the feature set;
a secondary training module, used for performing secondary training on the data of high-order-volume regions in the feature set;
a fusion module, used for performing multi-model fusion on the results of the two trainings to obtain a prediction model;
and a prediction module, used for inputting current associated data and obtaining a predicted future ride-hailing demand.
Compared with the prior art, the invention has the following beneficial effects:
The prediction method and prediction system for online ride-hailing demand offer high accuracy, high operating efficiency and resistance to overfitting. Machine learning models are built from a plurality of associated data, which gives higher accuracy than simply calculating the order volume. The feature sets corresponding to the associated data are separated and different training models are selected according to the magnitude of the data, which gives higher efficiency than a general system, reduces the complexity of a single training run and reduces overfitting. The invention thereby solves the problems of low accuracy, low operating efficiency and susceptibility to overfitting in existing prediction technology and meets the need for predicting ride-hailing demand.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.
As shown in FIGS. 1 and 2, the method for predicting online ride-hailing demand according to the present invention includes:
101. Reading a plurality of associated data that influence ride-hailing demand;
The associated data includes, but is not limited to: historical order volume data, order weather data, holiday information data, regional population density data and regional community density data. Processing and analyzing these several kinds of associated data together gives higher prediction accuracy than using historical order volume data alone, reduces the influence of a small amount of abnormal data on the overall result, and ensures stable prediction.
102. Repairing abnormal and missing data in the associated data;
The repairing of abnormal and missing data in the associated data is specifically as follows: each kind of associated data is sorted and arranged into a normal distribution diagram according to the frequency of occurrence of the data. Data outside the 3σ range of the normal distribution diagram occur with low frequency and are usually maximum or minimum values caused by statistical errors, repeated calculation, system bugs or extreme events, so this part of the data is deleted to reduce abnormal data.
If the missing data can be assigned a numerical value, as with historical order volume data, regional population density data or regional community density data, the missing data are filled with the element 0 so that the data remain continuous; if the missing data cannot be assigned a numerical value, as with order weather data or holiday information data, the data in the segments before and after the missing data are acquired, the frequency of the valid data within that fixed interval is calculated, and the most frequent data are extracted to fill the missing data.
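As a non-limiting illustration, the repair step could be sketched in Python as follows. The function names, the use of `None` to represent a missing entry, and the neighbour window of two entries on each side are assumptions made for this sketch and are not specified by the invention.

```python
from collections import Counter
from statistics import mean, pstdev


def drop_outliers_3sigma(values):
    """Keep only the values lying within mean +/- 3 standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return list(values)
    return [v for v in values if abs(v - mu) <= 3 * sigma]


def fill_numeric(values):
    """Fill missing numeric entries (None) with 0 so the series stays continuous."""
    return [0 if v is None else v for v in values]


def fill_categorical(values, window=2):
    """Fill missing labels (None) with the most frequent valid label found in
    the `window` entries before and after the gap (window size is assumed)."""
    filled = list(values)
    for i, v in enumerate(values):
        if v is None:
            neighbours = list(values[max(0, i - window):i]) + list(values[i + 1:i + 1 + window])
            valid = [x for x in neighbours if x is not None]
            if valid:
                filled[i] = Counter(valid).most_common(1)[0][0]
    return filled
```

In this sketch `fill_numeric` would apply to order volume or density data, while `fill_categorical` would apply to weather or holiday labels. A gap with no valid neighbours is left unfilled.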
103. Calculating the features of each kind of associated data and generating a feature set;
The calculating of the features of each kind of associated data and the generating of a feature set is specifically as follows: a fixed time interval is set as a constant t, and the historical order volume data are divided at the time interval t to obtain a sequence {d0, d1, ..., dn} of historical order volume data, where dn is the historical order volume contained in the n-th time interval t.
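By way of example only, the division of the historical order volume data at the time interval t could be sketched as below. It is assumed here that each order is represented by a timestamp in seconds; the function name `bucket_orders` and its parameters are hypothetical.

```python
def bucket_orders(timestamps, start, t):
    """Count orders in consecutive intervals of length t (seconds) from `start`.

    Returns the sequence [d0, d1, ..., dn], where each element is the number
    of orders whose timestamp falls in the corresponding interval.
    """
    if not timestamps:
        return []
    n_buckets = int((max(timestamps) - start) // t) + 1
    counts = [0] * n_buckets
    for ts in timestamps:
        if ts >= start:  # orders before the start of the window are ignored
            counts[int((ts - start) // t)] += 1
    return counts
```

For instance, with t = 600 (ten-minute intervals, an assumed value), five orders at 0, 100, 599, 600 and 1900 seconds would give the sequence [3, 1, 0, 1].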
Then the period-over-period ratio of the order volume is calculated for the sequence, the formula being:
$$ r_i = \frac{d_i - d_{i-1}}{d_{i-1}} \times 100\% $$
which gives the percentage change of the historical order volume data relative to the previous time interval.
The day-over-day ratio of the order volume is calculated for the sequence, the formula being:
$$ r_i^{\mathrm{day}} = \frac{d_i - d_{i-m}}{d_{i-m}} \times 100\% $$
where m is the number of time intervals t contained in one day, which gives the percentage change of the historical order volume data relative to the same time interval of the previous day.
The week-over-week ratio of the order volume is calculated for the sequence, the formula being:
$$ r_i^{\mathrm{week}} = \frac{d_i - d_{i-7m}}{d_{i-7m}} \times 100\% $$
which gives the percentage change of the historical order volume data relative to the same time interval of the previous week.
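The three ratios can be illustrated with the following minimal sketch. It assumes that each ratio is a percentage change against the value one interval, one day or one week earlier, and that `intervals_per_day` (a hypothetical parameter) gives the number of intervals t in a day. Entries whose reference value is unavailable or zero are returned as `None`, which is a choice made for this sketch.

```python
def pct_change(seq, lag):
    """Percentage change of seq[i] relative to seq[i - lag]."""
    out = []
    for i, value in enumerate(seq):
        if i < lag or seq[i - lag] == 0:
            out.append(None)  # no usable reference value
        else:
            out.append((value - seq[i - lag]) / seq[i - lag] * 100.0)
    return out


def ratio_features(seq, intervals_per_day):
    """Period-over-period, day-over-day and week-over-week ratios of seq."""
    return {
        "period_over_period": pct_change(seq, 1),
        "day_over_day": pct_change(seq, intervals_per_day),
        "week_over_week": pct_change(seq, 7 * intervals_per_day),
    }
```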
The time dimension of the historical order volume data is marked using the order weather data and the holiday information data, so that the historical order volume data are labeled with information such as peak periods, off-peak periods, working days, holidays and rainy days. The space dimension of the historical order volume data is marked using the regional population density data and the regional community density data: the administrative district, the hexagonal region and the longitude and latitude of the center of the region to which the historical order volume data belong are marked, and high-order-volume regions and low-order-volume regions are marked at the same time. Finally, the data are processed with a time-frequency transform algorithm, which includes a Fourier transform algorithm, the formula being:
$$ D_k = \left| \sum_{j=0}^{n} d_j \, e^{-i 2\pi k j/(n+1)} \right|, \qquad k = 0, 1, \ldots, n $$
where D_k is the amplitude after the Fourier transform and d_j is the j-th historical order volume datum. The feature set is obtained by reading the data processed by the time-frequency transform algorithm.
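As an illustrative sketch only, the amplitudes of a discrete Fourier transform of the order volume sequence could be computed as follows. A direct O(N²) summation with the standard library is used so that the example is self-contained; in practice a fast Fourier transform routine such as `numpy.fft.fft` would normally be preferred. The function name is hypothetical.

```python
import cmath


def dft_amplitudes(seq):
    """Amplitude of each discrete Fourier coefficient of seq."""
    n = len(seq)
    amplitudes = []
    for k in range(n):
        # k-th coefficient: sum over j of d_j * exp(-i * 2*pi * k * j / n)
        coeff = sum(seq[j] * cmath.exp(-2j * cmath.pi * k * j / n) for j in range(n))
        amplitudes.append(abs(coeff))
    return amplitudes
```

A sequence that alternates between two values, such as [1, 0, 1, 0], concentrates its amplitude in the zero-frequency term and in the term whose period is two intervals, which is how periodic demand patterns become visible as features.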
104. Performing overall training on the data in the feature set;
The overall training of the data in the feature set is specifically as follows: the feature set is read and its data are input into a first machine learning model for training, the first machine learning model being an XGBoost model; the generated data are merged into the feature set; data are then extracted again and input into a second machine learning model for training, the second machine learning model being a LightGBM model. Through the XGBoost model and the LightGBM model, the prediction result for the low-order-volume regions in the feature set is obtained and denoted predict1, its mean square error is obtained and denoted error1, and the generated data are merged into the feature set.
The mean square error is calculated as follows:
$$ \mathrm{error} = \frac{1}{N} \sum_{i=1}^{N} \left( d_i - \hat{d}_i \right)^2 $$
where d_i is the true value of the data, used as a reference, \hat{d}_i is the predicted value of the data, and N is the number of data points.
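The two-stage flow and the mean square error can be sketched as below. This is a minimal illustration under stated assumptions: `GroupMeanModel` is a hypothetical stand-in regressor used only so that the example runs without external libraries, whereas an actual embodiment would use XGBoost and LightGBM models in its place. It is also assumed that the "generated data" merged into the feature set are the first model's predictions, appended as an extra feature column.

```python
def mean_squared_error(actual, predicted):
    """Mean of the squared differences between true and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


class GroupMeanModel:
    """Hypothetical stand-in regressor: predicts the mean target observed
    for the value found in feature column `col`."""

    def __init__(self, col):
        self.col = col
        self.means = {}
        self.default = 0.0

    def fit(self, rows, targets):
        groups = {}
        for row, y in zip(rows, targets):
            groups.setdefault(row[self.col], []).append(y)
        self.means = {key: sum(v) / len(v) for key, v in groups.items()}
        self.default = sum(targets) / len(targets)
        return self

    def predict(self, rows):
        return [self.means.get(row[self.col], self.default) for row in rows]


def overall_training(rows, targets, first_model, second_model):
    """Train the first model, merge its output into the features,
    train the second model, and return (predict1, error1)."""
    first_model.fit(rows, targets)
    stage1 = first_model.predict(rows)
    augmented = [list(row) + [p] for row, p in zip(rows, stage1)]
    second_model.fit(augmented, targets)
    predict1 = second_model.predict(augmented)
    # Evaluated in-sample here for brevity; held-out data would be used in practice.
    error1 = mean_squared_error(targets, predict1)
    return predict1, error1
```

The same `overall_training` function would accept any pair of objects exposing `fit` and `predict`, which is why swapping in gradient-boosted models does not change the surrounding flow.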
105. Performing secondary training on the data of high-order-volume regions in the feature set;
The secondary training of the data of high-order-volume regions in the feature set is specifically as follows: the feature set is read, and the data marked as belonging to a high-order-volume region are extracted and input into a deep learning model for training. As a preferred embodiment of the invention, the deep learning model is an LSTM model. Through training of the LSTM model, the prediction result for the high-order-volume regions in the feature set is obtained and denoted predict2, its mean square error is obtained and denoted error2, and the generated data are then merged into the feature set.
106. Performing multi-model fusion on the results of the two trainings to obtain a prediction model;
The multi-model fusion of the results of the two trainings to obtain a prediction model is specifically as follows: the prediction results and the mean square errors of the overall training and the secondary training are obtained respectively, and the multi-model fusion prediction result is obtained as a weighted average of the two prediction results, the formula being:
$$ \mathrm{predict} = w_1 \cdot \mathrm{predict1} + w_2 \cdot \mathrm{predict2}, \qquad w_1 + w_2 = 1 $$
where w_1 and w_2 are the weights of the overall training and the secondary training, respectively.
A stable prediction model is obtained through repeated iterative computation.
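One possible weighting scheme is sketched below for illustration. It is an assumption of this sketch, not a statement of the invention's formula, that each weight is inversely proportional to the corresponding mean square error, so that the more accurate model contributes more. It is further assumed that predict1 and predict2 refer to the same regions and time intervals.

```python
def fuse_predictions(predict1, error1, predict2, error2):
    """Weighted average of two prediction lists.

    Weights are inversely proportional to each model's mean square error
    (assumed scheme) and sum to 1.
    """
    if error1 < 0 or error2 < 0:
        raise ValueError("mean square errors must be non-negative")
    if len(predict1) != len(predict2):
        raise ValueError("prediction lists must have equal length")
    total = error1 + error2
    if total == 0:
        w1 = w2 = 0.5  # both models are error-free: weight them equally
    else:
        w1 = error2 / total  # equals (1/error1) / (1/error1 + 1/error2)
        w2 = error1 / total
    return [w1 * p1 + w2 * p2 for p1, p2 in zip(predict1, predict2)]
```

With error1 = 1.0 and error2 = 3.0, the weights would be 0.75 and 0.25, so predictions of 100 and 110 for the same region fuse to 102.5.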
107. Inputting current associated data to obtain a predicted future ride-hailing demand.
The current associated data, including but not limited to historical order volume data, order weather data, holiday information data, regional population density data and regional community density data, are input into the prediction model, and the predicted ride-hailing demand of each region in a specified future time period is calculated.
A system for predicting online ride-hailing demand, the system comprising:
a reading module 1, used for reading a plurality of associated data that influence ride-hailing demand;
a repair module 2, used for repairing abnormal and missing data in the associated data;
a feature module 3, used for calculating the features of each kind of associated data and generating a feature set;
an overall training module 4, used for performing overall training on the data in the feature set;
a secondary training module 5, used for performing secondary training on the data of high-order-volume regions in the feature set;
a fusion module 6, used for performing multi-model fusion on the results of the two trainings to obtain a prediction model;
and a prediction module 7, used for inputting current associated data and obtaining a predicted future ride-hailing demand.
The prediction method and prediction system offer high accuracy, high operating efficiency and resistance to overfitting. Machine learning models are built from a plurality of associated data, which gives higher accuracy than simply calculating the order volume. The feature sets corresponding to the associated data are separated and different training models are selected according to the magnitude of the data, which gives higher efficiency than general computation, reduces the complexity of a single training run and reduces overfitting. The invention thereby solves the problems of low accuracy, low operating efficiency and susceptibility to overfitting in existing prediction technology and meets the need for predicting ride-hailing demand.
The present invention is not limited to the preferred embodiments described above; any modification, equivalent variation or adaptation made to the above embodiments according to the technical principles of the present invention falls within the scope of the technical solution of the present invention.