CN113469739B

Info

Publication number: CN113469739B
Application number: CN202110711952.4A
Authority: CN
Inventors: 吴元琪
Original assignee: Guangzhou Chenqi Travel Technology Co Ltd
Current assignee: Guangzhou Chenqi Travel Technology Co Ltd
Priority date: 2021-06-25
Filing date: 2021-06-25
Publication date: 2024-05-28
Anticipated expiration: 2041-06-25
Also published as: CN113469739A

Abstract

The invention relates to the technical field of ride-hailing, and in particular to a method and a system for predicting ride-hailing demand for online ride-hailing services. The method comprises the following steps: reading a plurality of associated data that influence ride-hailing demand; repairing abnormal and missing data in the associated data; calculating features of the associated data respectively and generating a feature set; performing overall training on the data in the feature set; performing secondary training on the data of high-order-volume regions in the feature set; performing multi-model fusion on the results of the two trainings to obtain a prediction model; and inputting current associated data to obtain a predicted future ride-hailing demand result. The prediction method and system offer high accuracy, high operating efficiency, and resistance to overfitting. Because machine learning models are built from a plurality of associated data, accuracy is higher than with a simple order-volume calculation. The invention thereby addresses the low accuracy, low operating efficiency, and tendency to overfit of existing prediction techniques, and meets the need for predicting ride-hailing demand.

Description

Method and system for predicting ride-hailing demand for online ride-hailing

Technical Field

The invention relates to the technical field of ride-hailing, and in particular to a method and a system for predicting ride-hailing demand for online ride-hailing services.

Background

With the development of the Internet, ride-hailing has gradually moved from offline to online: a user only needs to enter a starting point and a destination in an application to be matched with a suitable online ride-hailing vehicle and place an order. In general, the dispatching of ride-hailing vehicles is controlled by the platform, which dispatches drivers according to the order volume shown on a city heat map so as to improve capacity utilization in the coverage area. Because passenger demand in each area changes continuously, real-time adjustment alone is rarely effective; historical data must be analyzed in advance and future ride-hailing demand predicted in order to improve the carrying efficiency of the online ride-hailing service.

Existing demand prediction relies mainly on changes in the number of historical orders: the order volume of a subsequent time period is predicted by averaging the orders of the same time period in each month or each day. This approach is simple and easy to implement, but it predicts poorly, often deviates substantially from the actual situation, and its accuracy does not meet requirements. Techniques that predict ride-hailing demand with machine learning also exist, but because their algorithms are not optimized for the ride-hailing domain, the models run inefficiently when processing large amounts of order data and are prone to overfitting; that is, they perform well when tested on historical data, but their accuracy drops when predicting demand for future time periods. A prediction method and system for online ride-hailing demand is therefore needed to solve these problems.

Disclosure of Invention

To overcome the low accuracy, low operating efficiency, and tendency to overfit of existing prediction techniques, the invention provides a method and a system for predicting online ride-hailing demand that offer high accuracy, high operating efficiency, and resistance to overfitting.

To solve the above problems, the invention adopts the following technical solution:

The invention discloses a method for predicting ride-hailing demand for online ride-hailing, the method comprising the following steps:

reading a plurality of associated data that influence ride-hailing demand;

repairing abnormal and missing data in the associated data;

calculating features of the associated data respectively and generating a feature set;

performing overall training on the data in the feature set;

performing secondary training on the data of high-order-volume regions in the feature set;

performing multi-model fusion on the results of the two trainings to obtain a prediction model;

and inputting current associated data to obtain a predicted future ride-hailing demand result.

The associated data includes: historical order volume data, order weather data, holiday information data, regional population density data, and regional community density data.

The repairing of abnormal and missing data in the associated data is specifically as follows: the associated data are each sorted and arranged into a normal distribution plot according to the frequency of occurrence of the data, and data outside the 3σ range of the normal distribution are deleted to reduce abnormal data; missing data that can be assigned a numerical value are filled with 0, and for missing data that cannot be assigned a numerical value, the data in the segments before and after the missing data are obtained and the most frequent value is extracted to fill the missing data.

The calculating of features of the associated data and generating of a feature set is specifically as follows: the historical order volume data are divided at a fixed time interval to obtain a sequence of historical order volume data; the period-over-period ratio, the day-over-day ratio, and the week-over-week ratio of the order volume are then calculated for the sequence; the time dimension and the space dimension of the historical order volume data are labeled using the associated data; and finally the data are processed with a time-frequency transform algorithm to obtain the feature set.

The time-frequency transform algorithm includes a Fourier transform algorithm.

The overall training on the data in the feature set is specifically as follows: the feature set is read and data are input into a first machine learning model for training; the generated data are merged into the feature set; data are then extracted again and input into a second machine learning model for training, and the generated data are merged into the feature set.

The first machine learning model is an XGBoost model.

The second machine learning model is a LightGBM model.

The secondary training on the data of high-order-volume regions in the feature set is specifically as follows: the feature set is read, the data labeled as high-order-volume regions are extracted and input into a deep learning model for training, and the generated data are then merged into the feature set.

The deep learning model is an LSTM model.

The multi-model fusion of the results of the two trainings to obtain a prediction model is specifically as follows: the prediction results and mean square errors of the overall training and the secondary training are obtained respectively; the multi-model fusion prediction result is obtained as a weighted average of the two prediction results; and a stable prediction model is obtained through repeated iterative computation.

A system for predicting ride-hailing demand for online ride-hailing, the system comprising:

a reading module, configured to read a plurality of associated data that influence ride-hailing demand;

a repair module, configured to repair abnormal and missing data in the associated data;

a feature module, configured to calculate features of the associated data respectively and generate a feature set;

an overall training module, configured to perform overall training on the data in the feature set;

a secondary training module, configured to perform secondary training on the data of high-order-volume regions in the feature set;

a fusion module, configured to perform multi-model fusion on the results of the two trainings to obtain a prediction model;

and a prediction module, configured to input current associated data and obtain a predicted future ride-hailing demand result.

Compared with the prior art, the invention has the beneficial effects that:

The method and system for predicting online ride-hailing demand offer high accuracy, high operating efficiency, and resistance to overfitting. Because machine learning models are built from a plurality of associated data, accuracy is higher than with a simple order-volume calculation. By separating the feature sets corresponding to the associated data and selecting different training models according to the magnitude of the data, the system runs more efficiently than a general-purpose system, the complexity of each individual training is reduced, and overfitting is reduced. The invention thereby addresses the low accuracy, low operating efficiency, and tendency to overfit of existing prediction techniques, and meets the need for predicting ride-hailing demand.

Drawings

The invention is described in further detail below with reference to the attached drawing figures, wherein:

FIG. 1 is a schematic flow diagram of the method of the present invention;

FIG. 2 is a schematic diagram of the system architecture of the present invention.

Detailed Description

The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.

As shown in FIG. 1 and FIG. 2, the method for predicting ride-hailing demand for online ride-hailing according to the present invention includes:

101. Reading a plurality of associated data that influence ride-hailing demand;

The associated data include, but are not limited to: historical order volume data, order weather data, holiday information data, regional population density data, and regional community density data. Processing and analyzing these data together gives higher prediction accuracy than using historical order volume data alone, reduces the influence of a small amount of abnormal data on the overall result, and ensures prediction stability.

102. Repairing abnormal and missing data in the associated data;

The repairing of abnormal and missing data in the associated data is specifically as follows: the associated data are each sorted and arranged into a normal distribution plot according to the frequency of occurrence of the data. Data outside the 3σ range of the normal distribution occur with low frequency and are usually maxima or minima caused by statistical errors, repeated counting, system bugs, or extreme events, so this portion of the data is deleted to reduce abnormal data.

If the missing data can be assigned a numerical value, as with historical order volume data, regional population density data, or regional community density data, the missing data are filled with 0 so that the data remain continuous. If the missing data cannot be assigned a numerical value, as with order weather data or holiday information data, the data in the segments before and after the missing data are obtained, the frequency of the valid data within that fixed interval is calculated, and the most frequent value is extracted to fill the missing data.
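The repair step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patent's implementation: missing entries are represented as `None`, the 3σ rule is applied to the mean and population standard deviation of the values, and the function names and the `window` parameter (how many neighbouring entries count as the segments before and after a gap) are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def drop_outliers(values, k=3.0):
    """Delete values farther than k standard deviations from the mean.

    None entries (missing data) are kept so they can be filled later.
    """
    present = [v for v in values if v is not None]
    mu, sigma = mean(present), pstdev(present)
    return [v for v in values if v is None or abs(v - mu) <= k * sigma]


def fill_numeric(values):
    """Fill missing numeric entries (e.g. order volume) with 0."""
    return [0 if v is None else v for v in values]


def fill_categorical(values, window=3):
    """Fill missing labels (e.g. weather) with the most frequent valid
    label among the `window` entries before and after the gap."""
    filled = list(values)
    for i, v in enumerate(values):
        if v is None:
            neighbours = values[max(0, i - window):i] + values[i + 1:i + 1 + window]
            valid = [n for n in neighbours if n is not None]
            if valid:  # leave the gap if no valid neighbour exists
                filled[i] = Counter(valid).most_common(1)[0][0]
    return filled
```

Deleting outliers shortens the series, so a real pipeline would probably need to keep timestamps alongside the values; that bookkeeping is omitted here.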

103. Calculating features of the associated data respectively and generating a feature set;

The calculating of features of the associated data and generating of a feature set is specifically as follows: a fixed time interval is set as a constant t, and the historical order volume data are divided at the time interval t to obtain a sequence { d₀,d₁,...,d_n } of historical order volume data, where d_n is the historical order volume data contained in the n-th time interval t.

Then the period-over-period ratio of the order volume is calculated for the sequence, to obtain the percentage change of the historical order volume data relative to the previous time interval.

The day-over-day ratio of the order volume is calculated for the sequence, to obtain the percentage change of the historical order volume data relative to the previous day.

The week-over-week ratio of the order volume is calculated for the sequence, to obtain the percentage change of the historical order volume data relative to the previous week.
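The formulas for the three ratios are not reproduced in this text. As a hedged sketch, the code below assumes the conventional percentage-change form (d_n − d_{n−lag}) / d_{n−lag} × 100, with a lag of one interval for the period-over-period ratio, one day's worth of intervals for the day-over-day ratio, and seven days' worth for the week-over-week ratio. The names `pct_change`, `order_ratios`, and `interval_minutes` are hypothetical.

```python
def pct_change(seq, lag):
    """Percentage change of each element relative to the element `lag`
    steps earlier; None where no (non-zero) reference value exists."""
    out = []
    for n, d in enumerate(seq):
        if n < lag or seq[n - lag] == 0:
            out.append(None)
        else:
            out.append((d - seq[n - lag]) / seq[n - lag] * 100.0)
    return out


def order_ratios(seq, interval_minutes):
    """Compute the three ratio features for a sequence sampled every
    `interval_minutes` minutes (assumed to divide a day evenly)."""
    per_day = (24 * 60) // interval_minutes  # intervals in one day
    return {
        "period_over_period": pct_change(seq, 1),
        "day_over_day": pct_change(seq, per_day),
        "week_over_week": pct_change(seq, 7 * per_day),
    }
```

For example, with 30-minute intervals the day-over-day ratio compares each interval with the one 48 steps earlier.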

The time dimension of the historical order volume data is labeled using the order weather data and the holiday information data, so that the historical order volume data carry labels such as peak period, off-peak period, working day, holiday, and rainy day. The space dimension of the historical order volume data is labeled using the regional population density data and the regional community density data: the administrative district, the hexagonal grid cell, and the longitude and latitude of the center point corresponding to the region of the historical order volume data are labeled, and high-order-volume regions and low-order-volume regions are labeled at the same time. Finally, the data are processed with a time-frequency transform algorithm, which includes a Fourier transform algorithm.

In the formula of the time-frequency transform algorithm, D_k is the amplitude after the Fourier transform and d_k is the k-th historical order volume data point; the feature set is obtained by reading the data processed by the time-frequency transform algorithm.
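The transform formula itself is not reproduced in this text. A minimal sketch, assuming the standard discrete Fourier transform D_k = |Σ_m d_m · e^(−2πi·m·k/N)| over a sequence of N order-volume values, is shown below; the function name `dft_amplitudes` is hypothetical, and a production system would more likely call an FFT routine than this O(N²) loop.

```python
import cmath


def dft_amplitudes(seq):
    """Return the amplitude |D_k| of the discrete Fourier transform of
    `seq` for every frequency index k = 0 .. N-1."""
    n = len(seq)
    amplitudes = []
    for k in range(n):
        # Sum d_m * exp(-2*pi*i*m*k/N) over all samples m.
        total = sum(d * cmath.exp(-2j * cmath.pi * m * k / n)
                    for m, d in enumerate(seq))
        amplitudes.append(abs(total))
    return amplitudes
```

A constant sequence puts all of its amplitude in the k = 0 term, while a daily or weekly rhythm in the order volume would show up as a peak at the corresponding frequency index.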

104. Performing overall training on the data in the feature set;

The overall training on the data in the feature set is specifically as follows: the feature set is read and data are input into a first machine learning model for training, the first machine learning model being an XGBoost model; the generated data are merged into the feature set; data are then extracted again and input into a second machine learning model for training, the second machine learning model being a LightGBM model. Through the XGBoost model and the LightGBM model, the prediction result for the low-order-volume regions in the feature set is obtained and denoted predict1, and its mean square error is obtained and denoted error1; the generated data are then merged into the feature set.
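The two-stage flow described above (train a first model, merge its output into the feature set, then train a second model on the enlarged feature set) can be sketched independently of any particular library. In the sketch below the two models are deliberately trivial stand-ins, not XGBoost or LightGBM; the function names, the `peak` label, and the column names `first_model_output` and `second_model_output` are all hypothetical. In practice, `fit_first` and `fit_second` would wrap real gradient-boosting regressors.

```python
def fit_global_mean(rows, target):
    """Stand-in for the first model: always predicts the overall mean."""
    mu = sum(target) / len(target)
    return lambda row: mu


def fit_peak_mean(rows, target):
    """Stand-in for the second model: mean target per 'peak' label."""
    sums = {}
    for r, t in zip(rows, target):
        s, c = sums.get(r["peak"], (0.0, 0))
        sums[r["peak"]] = (s + t, c + 1)
    means = {k: s / c for k, (s, c) in sums.items()}
    return lambda row: means[row["peak"]]


def train_stacked(rows, target, fit_first, fit_second):
    """Train two models in sequence, merging each model's output back
    into the feature set; return (features, predict1, error1)."""
    first = fit_first(rows, target)
    stage1 = [dict(r, first_model_output=first(r)) for r in rows]
    second = fit_second(stage1, target)
    stage2 = [dict(r, second_model_output=second(r)) for r in stage1]
    predict1 = [r["second_model_output"] for r in stage2]
    # Mean square error between true values and predictions.
    error1 = sum((t - p) ** 2 for t, p in zip(target, predict1)) / len(target)
    return stage2, predict1, error1
```

Passing the fitting routines in as callables keeps the merge-and-retrain logic separate from the choice of model, which is the part the text actually specifies.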

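The mean square error is calculated as follows:

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(D_i - \hat{D}_i\right)^2$$

where D_i is the true value of the data, used as the reference, \hat{D}_i is the predicted value of the data, and N is the number of data points.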

105. Performing secondary training on the data of high-order-volume regions in the feature set;

The secondary training on the data of high-order-volume regions in the feature set is specifically as follows: the feature set is read, and the data labeled as high-order-volume regions are extracted and input into a deep learning model for training. In a preferred embodiment of the invention, the deep learning model is an LSTM model. Through training of the LSTM model, the prediction result for the high-order-volume regions in the feature set is obtained and denoted predict2, and its mean square error is obtained and denoted error2; the generated data are then merged into the feature set.
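The text does not say how the extracted data are shaped before they reach the LSTM. As an illustrative assumption only, the sketch below filters the rows labeled as high-order-volume and turns an order-volume series into (input window, next value) pairs, a common way to prepare sequences for a recurrent model. The key `volume_label`, its value `"high"`, and both function names are hypothetical.

```python
def high_volume_rows(rows):
    """Keep only the feature rows labeled as a high-order-volume region."""
    return [r for r in rows if r.get("volume_label") == "high"]


def sliding_windows(series, window):
    """Turn a series into (input window, next value) training pairs."""
    return [(series[i:i + window], series[i + window])
            for i in range(len(series) - window)]
```

Each pair asks the model to predict the next interval's order volume from the preceding `window` intervals.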

106. Performing multi-model fusion on the results of the two trainings to obtain a prediction model;

The multi-model fusion of the results of the two trainings to obtain a prediction model is specifically as follows: the prediction results and mean square errors of the overall training and the secondary training are obtained respectively, and the multi-model fusion prediction result is obtained as a weighted average of the two prediction results, predict1 and predict2.

A stable prediction model is then obtained through repeated iterative computation.
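The weighting formula is not reproduced in this text. One plausible choice, shown here purely as an assumption, is to weight each prediction inversely to its mean square error, so that the model with the smaller error contributes more: w1 = error2 / (error1 + error2) and w2 = 1 − w1. The function name `fuse` is hypothetical.

```python
def fuse(predict1, error1, predict2, error2):
    """Weighted average of two prediction lists, with weights inversely
    proportional to each model's mean square error (an assumption)."""
    if error1 == 0 and error2 == 0:
        w1 = 0.5  # both models are perfect: weight them equally
    else:
        w1 = error2 / (error1 + error2)
    w2 = 1.0 - w1
    return [w1 * p1 + w2 * p2 for p1, p2 in zip(predict1, predict2)]
```

With error1 = 1 and error2 = 3, for instance, the first prediction receives weight 0.75 and the second 0.25.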

107. Inputting current associated data to obtain a predicted future ride-hailing demand result.

The current associated data, including but not limited to historical order volume data, order weather data, holiday information data, regional population density data, and regional community density data, are input into the prediction model, and the predicted ride-hailing demand of each region in a specified future time period is calculated.

The system for predicting ride-hailing demand for online ride-hailing according to the present invention comprises:

a reading module 1, configured to read a plurality of associated data that influence ride-hailing demand;

a repair module 2, configured to repair abnormal and missing data in the associated data;

a feature module 3, configured to calculate features of the associated data respectively and generate a feature set;

an overall training module 4, configured to perform overall training on the data in the feature set;

a secondary training module 5, configured to perform secondary training on the data of high-order-volume regions in the feature set;

a fusion module 6, configured to perform multi-model fusion on the results of the two trainings to obtain a prediction model;

and a prediction module 7, configured to input current associated data and obtain a predicted future ride-hailing demand result.

The prediction method and system offer high accuracy, high operating efficiency, and resistance to overfitting. Because machine learning models are built from a plurality of associated data, accuracy is higher than with a simple order-volume calculation. By separating the feature sets corresponding to the associated data and selecting different training models according to the magnitude of the data, operation is more efficient than in a general-purpose approach, the complexity of each individual training is reduced, and overfitting is reduced. The invention thereby addresses the low accuracy, low operating efficiency, and tendency to overfit of existing prediction techniques, and meets the need for predicting ride-hailing demand.

The present invention is not limited to the preferred embodiments described above; any modification, equivalent variation, or adaptation made to the above embodiments in accordance with the technical principles of the present invention falls within the scope of the technical solution of the present invention.

Claims

1. A method for predicting ride-hailing demand for online ride-hailing, the method comprising:

repairing abnormal and missing data in the associated data, specifically: sorting the associated data respectively, arranging them into a normal distribution plot according to the frequency of occurrence of the data, and deleting data outside the 3σ range of the normal distribution to reduce abnormal data; for missing data, if a numerical value can be assigned, filling the missing data with 0, and if a numerical value cannot be assigned, obtaining the data in the segments before and after the missing data, extracting the most frequent value, and filling the missing data with it;

calculating features of the associated data respectively and generating a feature set, specifically: dividing the historical order volume data at a fixed time interval to obtain a sequence of historical order volume data, then calculating the period-over-period ratio, the day-over-day ratio, and the week-over-week ratio of the order volume for the sequence, labeling the time dimension and the space dimension of the historical order volume data using the associated data, and finally processing the data with a time-frequency transform algorithm to obtain the feature set;

performing overall training on the data in the feature set, specifically: reading the feature set, inputting data into a first machine learning model for training, merging the generated data into the feature set, extracting data again and inputting them into a second machine learning model for training, and merging the generated data into the feature set; the first machine learning model is an XGBoost model; the second machine learning model is a LightGBM model;

performing secondary training on the data of high-order-volume regions in the feature set, specifically: reading the feature set, extracting the data labeled as high-order-volume regions, inputting the data into a deep learning model for training, and then merging the generated data into the feature set; the deep learning model is an LSTM model;

2. The method for predicting ride-hailing demand for online ride-hailing according to claim 1, wherein the associated data include: historical order volume data, order weather data, holiday information data, regional population density data, and regional community density data.

3. The method for predicting ride-hailing demand for online ride-hailing according to claim 1, wherein the time-frequency transform algorithm includes a Fourier transform algorithm.

4. The method for predicting ride-hailing demand for online ride-hailing according to claim 1, wherein the multi-model fusion of the results of the two trainings to obtain a prediction model is specifically: obtaining the prediction results and mean square errors of the overall training and the secondary training respectively, obtaining the multi-model fusion prediction result as a weighted average of the two prediction results, and obtaining a stable prediction model through repeated iterative computation.

5. A system for predicting ride-hailing demand for online ride-hailing, the system comprising:

a repair module, configured to repair abnormal and missing data in the associated data, specifically: sorting the associated data respectively, arranging them into a normal distribution plot according to the frequency of occurrence of the data, and deleting data outside the 3σ range of the normal distribution to reduce abnormal data; for missing data, if a numerical value can be assigned, filling the missing data with 0, and if a numerical value cannot be assigned, obtaining the data in the segments before and after the missing data, extracting the most frequent value, and filling the missing data with it;

a feature module, configured to calculate features of the associated data respectively and generate a feature set, specifically: dividing the historical order volume data at a fixed time interval to obtain a sequence of historical order volume data, then calculating the period-over-period ratio, the day-over-day ratio, and the week-over-week ratio of the order volume for the sequence, labeling the time dimension and the space dimension of the historical order volume data using the associated data, and finally processing the data with a time-frequency transform algorithm to obtain the feature set;

an overall training module, configured to perform overall training on the data in the feature set, specifically: reading the feature set, inputting data into a first machine learning model for training, merging the generated data into the feature set, extracting data again and inputting them into a second machine learning model for training, and merging the generated data into the feature set; the first machine learning model is an XGBoost model; the second machine learning model is a LightGBM model;

a secondary training module, configured to perform secondary training on the data of high-order-volume regions in the feature set, specifically: reading the feature set, extracting the data labeled as high-order-volume regions, inputting the data into a deep learning model for training, and then merging the generated data into the feature set; the deep learning model is an LSTM model;