Land use property	Attraction coefficient
Residence land	1
Commercial service facility land	1.2
Public management and public service land	1.1
Industrial land	1
Logistics and storage land	0.6
Public land	0.8
Greenbelt and plaza land	0.7
Traffic facility land	1.3

Meanwhile, the number of BRT lines passing through each station can be determined from the static line and station information of the BRT, as shown in Fig. 6.

2. Evaluation method and index

The verification of the present invention is divided into two parts: (1) verification of the Logistic regression model in the second layer of the two-layer Stacking framework method, which learns from bus IC card data whose alighting stations were identified by the trip chain method; and (2) verification of the identification of the alighting stations of bus IC card passengers when the trip chain is broken.

(1) Verification of the Logistic regression model

Verification method: both the training set and the test set of the Logistic regression model are drawn from the bus IC card data whose alighting stations were identified by the trip chain method.

Verification index: the F1 score is selected as the verification index; the closer its value is to 1, the better the learned Logistic regression model. Let this part contain $TP$ records whose actual tag value is 1 and that are predicted as tag value 1, $FN$ records whose actual tag value is 1 and that are predicted as tag value 0, and $FP$ records whose actual tag value is 0 and that are predicted as tag value 1. The F1 score is then:

$$F1 = \frac{2\,TP}{2\,TP + FN + FP}$$
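The F1 score can be computed directly from the three record counts described above. The following is a minimal sketch; the function name, the argument names, and the sample counts are illustrative assumptions and are not results of this embodiment.

```python
def f1_score(tp: int, fn: int, fp: int) -> float:
    """F1 score from record counts.

    tp: actual tag 1, predicted tag 1
    fn: actual tag 1, predicted tag 0
    fp: actual tag 0, predicted tag 1
    """
    denominator = 2 * tp + fn + fp
    if denominator == 0:
        # No positive records and no positive predictions: define F1 as 0.
        return 0.0
    return 2 * tp / denominator


# Hypothetical counts, chosen only to illustrate the calculation.
print(round(f1_score(tp=80, fn=20, fp=60), 4))
```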

(2) Verification of the alighting station identification for bus IC card passengers when the trip chain is broken

Verification method: the existing alighting station of an IC card record is assumed to be missing, the alighting station is identified as if the trip chain were broken, and the identified alighting station is compared with the actual alighting station.

Verification index: the identification rate and the accuracy rate are selected as the verification indexes; the closer both values are to 100%, the better the method. Let this part contain $N^{un}$ records with a broken trip chain whose alighting station needs to be identified, of which $N^{id}$ records can be assigned an alighting station. The identification rate is then:

$$\text{identification rate} = \frac{N^{id}}{N^{un}} \times 100\%$$

If, among these $N^{id}$ records, $N^{cor}$ records are assigned the correct alighting station, the accuracy rate is:

$$\text{accuracy rate} = \frac{N^{cor}}{N^{id}} \times 100\%$$
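As an illustration of the two indexes, the sketch below computes both rates from record counts. It assumes that the accuracy rate is taken over the records for which an alighting station could be identified; the function names and the sample counts are hypothetical.

```python
def identification_rate(n_identified: int, n_broken: int) -> float:
    """Share (in %) of broken-trip-chain records that receive an alighting station."""
    if n_broken == 0:
        raise ValueError("no records with a broken trip chain")
    return 100.0 * n_identified / n_broken


def accuracy_rate(n_correct: int, n_identified: int) -> float:
    """Share (in %) of identified records whose alighting station is correct."""
    if n_identified == 0:
        raise ValueError("no identified records")
    return 100.0 * n_correct / n_identified


# Hypothetical counts: 1000 broken records, 950 identified, 570 correct.
print(identification_rate(950, 1000), accuracy_rate(570, 950))
```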

3. Setting of experimental parameters

The parameter values involved in this example are determined as follows:

(1) Distance threshold setting based on travel chain method

According to the actual BRT station spacing in Xiamen, this part sets the radius of the study area for the land use properties around a BRT station to Dis^land-use = 800 meters, and sets the distance threshold of the trip chain method to 2000 meters.
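The role of the 2000-meter threshold can be sketched as follows. The sketch assumes the common trip chain rule that a trip alights at the downstream station nearest to the passenger's next boarding location, and that the chain counts as broken when that distance exceeds the threshold; this rule, the function names, and the coordinates are assumptions for illustration and are not taken from this embodiment.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def trip_chain_alighting(downstream, next_boarding, threshold_m=2000.0):
    """Return the number of the downstream station nearest to the next boarding
    location, or None when the trip chain is broken (assumed rule).

    downstream: list of (station_no, lat, lon); next_boarding: (lat, lon).
    """
    nearest = min(
        downstream,
        key=lambda s: haversine_m(s[1], s[2], next_boarding[0], next_boarding[1]),
        default=None,
    )
    if nearest is None:
        return None
    distance = haversine_m(nearest[1], nearest[2], next_boarding[0], next_boarding[1])
    return nearest[0] if distance <= threshold_m else None


# Hypothetical station coordinates.
stations = [(9, 24.50, 118.10), (10, 24.51, 118.10), (11, 24.52, 118.10)]
print(trip_chain_alighting(stations, (24.511, 118.10)))
```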

(2) Penalty coefficient determination for Logistic regression model

In the second layer of the two-layer Stacking framework method, the optimal penalty coefficient of the Logistic regression model is determined to be 100 through repeated experiments, so that the model can realize its potential.

(3) Parameter setting for the comparison methods

The KNN-based method, the decision-tree-based method, the random-forest-based method, and the method based on passenger high-frequency stations and downstream station attraction weights are selected for comparison.

In existing research on the KNN-based, decision-tree-based, and random-forest-based methods, POI data are used as a substitute for the land use properties around a station. The known data used in the present invention already include the land use properties around each station, so POI data are not needed as a substitute. Through repeated experiments, the number of nearest neighbors suited to the data set of this part in the KNN-based method is determined to be 1000, the number of trees in the random-forest-based method is determined to be 2000, and the Gini coefficient is selected as the splitting criterion in the decision-tree-based method.

Both the method based on passenger high-frequency stations and downstream station attraction weights and the downstream-station-attraction method in the first layer of the Stacking framework of the present method need the passenger flow of each station within the bus shift to determine the alighting station when the trip chain is broken. However, this embodiment uses only the BRT data recorded by the entry and exit gates, from which the shift taken by a passenger cannot be known. Therefore, the passenger flow of each station during the hour containing the passenger's entry time is used in place of the passenger flow of each station in the passenger's shift, and the probability calculation is performed on that basis to determine the alighting station.
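The hourly substitution described above can be sketched as follows. The sketch assumes that stations are numbered in increasing order along the travel direction and that the downstream attraction probability is the station's share of the hourly entries over all downstream stations; the normalization, the function names, and the sample records are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime


def hourly_station_flow(entry_records):
    """Count gate entries per (date, hour, station_no).

    entry_records: iterable of (station_no, entry_datetime).
    """
    flow = Counter()
    for station_no, entry_time in entry_records:
        flow[(entry_time.date(), entry_time.hour, station_no)] += 1
    return flow


def downstream_attraction_probabilities(flow, entry_time, boarding_station, last_station):
    """Share of the hourly entries at each downstream station (assumed formula)."""
    stations = range(boarding_station + 1, last_station + 1)
    counts = {s: flow[(entry_time.date(), entry_time.hour, s)] for s in stations}
    total = sum(counts.values())
    if total == 0:
        return {s: 0.0 for s in stations}
    return {s: c / total for s, c in counts.items()}


# Hypothetical gate records.
records = [
    (9, datetime(2018, 11, 5, 7, 10)),
    (9, datetime(2018, 11, 5, 7, 40)),
    (10, datetime(2018, 11, 5, 7, 20)),
    (10, datetime(2018, 11, 5, 8, 5)),
    (11, datetime(2018, 11, 5, 7, 59)),
]
flow = hourly_station_flow(records)
print(downstream_attraction_probabilities(flow, datetime(2018, 11, 5, 7, 30), 8, 11))
```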

4. Example results

With the experimental parameters set as above, of the 3673184 IC card records with an identified boarding station in the study period, the trip chain method identifies the alighting station for 2425101 records (with an accuracy of 80.96%). The remaining 1248083 records have a broken trip chain, and their alighting stations are identified with the KNN-based method, the decision-tree-based method, the random-forest-based method, the method based on passenger high-frequency stations and downstream station attraction weights, and the method of the present invention.

The example results are presented below for the group history method, for the Logistic regression model in the second layer of the two-layer Stacking framework, and for the identification of the alighting stations of IC card passengers when the trip chain is broken.

(1) Method based on group history record

The card swiping frequency of the 3673184 IC card records with an identified boarding station in the study period is counted, as shown in Fig. 7. The number of cards decreases as the swiping frequency increases, and more than 80% of IC cards are swiped 8 times or fewer.

When the passenger loyalty index of the group history method is constructed, the numbers of card swipes and the corresponding numbers of cards in the 2018 Xiamen study data are clustered into three classes with the K-Means algorithm, with the following result: when a card is swiped 1 or 2 times in a month, the passenger rides the BRT only occasionally, and the passenger's loyalty = 1; when a card is swiped 3 to 8 times in a month, the BRT is one of the passenger's optional travel modes, and the passenger's loyalty = 2; when a card is swiped more than 8 times in a month, the passenger is a loyal BRT user, and the passenger's loyalty = 3. The share of IC card records of each card type at each loyalty level is shown in Fig. 8.
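The loyalty levels reported above amount to a simple threshold mapping on the monthly number of card swipes. A minimal sketch follows; the function name is a hypothetical choice, and the thresholds are the ones stated in the text.

```python
def loyalty_level(swipes_per_month: int) -> int:
    """Map a card's monthly number of swipes to the loyalty level 1, 2, or 3."""
    if swipes_per_month < 1:
        raise ValueError("a card in the data set has at least one swipe")
    if swipes_per_month <= 2:
        return 1  # occasional BRT rider
    if swipes_per_month <= 8:
        return 2  # BRT is one optional travel mode
    return 3      # loyal BRT user


print([loyalty_level(n) for n in (1, 2, 3, 8, 9)])
```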

Eleven clustering indexes are constructed from the actual data of the Xiamen study period, as shown in Table 2, and min-max normalization is applied to them. Clustering with the K-Means algorithm then gives the variation of the average distortion with the number of clusters, as shown in Fig. 9.
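The normalization step applied to each clustering index before K-Means can be sketched as below. The sketch scales one index to the interval [0, 1]; the function name and the sample values are illustrative assumptions, and the handling of a constant index is a design choice of the sketch.

```python
def min_max_normalize(values):
    """Scale a list of index values linearly to the interval [0, 1]."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A constant index carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


# Hypothetical values of one clustering index.
print(min_max_normalize([2, 4, 10]))
```

Scaling every index to the same range keeps an index with large raw values from dominating the Euclidean distances that K-Means relies on.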

Table 2: Clustering indexes for IC card data in the group history method

According to Fig. 9 and the elbow rule, the clustering is optimal with 2 clusters. The 3673184 IC card records with an identified boarding station in the study period are accordingly divided into cluster R₁ and cluster R₂; the number of records in each cluster is shown in Table 3.

Table 3: Number of records in each cluster

Consider two trips with a broken trip chain that both board at the 8th station in the uplink direction of Xiamen BRT Line 1 but belong to cluster R₁ and cluster R₂, respectively. With the group history method, the alighting probability of each trip at each possible alighting station can be determined from the alighting stations that the trip chain method identified for the same group, as shown in Fig. 10. The trip belonging to cluster R₁ has its highest alighting probability at the 14th station, and the trip belonging to cluster R₂ has its highest alighting probability at the 16th station.
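The group history idea in this example can be sketched as a relative-frequency count within one cluster. The sketch assumes that the alighting probability is the share of same-cluster records with the same boarding station that alight at each downstream station; this formula, the function name, and the sample records are assumptions for illustration.

```python
from collections import Counter


def group_history_probabilities(cluster_records, boarding_station):
    """Alighting probabilities from records of one cluster (assumed formula).

    cluster_records: iterable of (boarding_station_no, alighting_station_no)
    pairs whose alighting station was identified by the trip chain method.
    """
    counts = Counter(
        alight
        for board, alight in cluster_records
        if board == boarding_station and alight > boarding_station
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {station: n / total for station, n in sorted(counts.items())}


# Hypothetical records of one cluster.
records = [(8, 14), (8, 14), (8, 16), (8, 12), (5, 14)]
probabilities = group_history_probabilities(records, boarding_station=8)
print(probabilities, max(probabilities, key=probabilities.get))
```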

(2) Logistic regression model in second layer based on two-layer Stacking framework

The model is trained and tested with the 2425101 records whose alighting stations were identified by the trip chain method. After each IC card record is paired with each of its possible alighting stations, 40113752 records are obtained, of which 2425101 records are paired with the correct alighting station and have tag value 1, and the remaining 37688651 records have tag value 0. Because the data are imbalanced, these records are resampled, as shown in Table 4.

Table 4: Number of IC card records before and after sampling

As can be seen from Table 4, the amount of tag-0 data after sampling remains large, in order to prevent the model from underfitting because of too little data. Nevertheless, the ratio of tag-0 data to tag-1 data falls from 15.54:1 before sampling to 1.82:1, a clear improvement.

Of the 75178131 sampled records, 90% are randomly selected as the training set of the Logistic regression model, and the remaining 10% are used as the test set. With a penalty coefficient of 100, the F1 score on the test set is 0.67, and the parameters of the resulting Logistic regression model are as follows:
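One simple way to lower the ratio of tag-0 to tag-1 records is random undersampling of the tag-0 records, sketched below. This is only an assumed illustration of rebalancing; the sampling scheme actually used is the one summarized in Table 4, and the function name, the seed, and the sample data are hypothetical.

```python
import random


def undersample_negatives(records, ratio, seed=0):
    """Keep all tag-1 records and at most ratio * (number of tag-1) tag-0 records.

    records: list of (features, tag) with tag in {0, 1}.
    """
    rng = random.Random(seed)
    positives = [r for r in records if r[1] == 1]
    negatives = [r for r in records if r[1] == 0]
    n_keep = min(len(negatives), int(round(ratio * len(positives))))
    return positives + rng.sample(negatives, n_keep)


# Ratio before sampling, from the record counts given in the text.
print(round(37688651 / 2425101, 2))

# Hypothetical small data set: 10 tag-1 and 155 tag-0 records.
data = [((i,), 1) for i in range(10)] + [((i,), 0) for i in range(155)]
sampled = undersample_negatives(data, ratio=1.82)
print(sum(1 for _, tag in sampled if tag == 0), sum(1 for _, tag in sampled if tag == 1))
```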

(3) Identification of the alighting stations of IC card passengers when the trip chain is broken

The results of identifying the alighting stations of IC card records with a broken trip chain by the two-layer Stacking framework method are shown in Table 5.

Table 5: Alighting station identification results of the different methods

The accuracy of alighting station identification by the two-layer Stacking framework method is examined for different periods on different days of the week. The records are divided by day of the week and, according to the passenger transaction time, into one-hour periods, and the accuracy of each period is counted, as shown in Fig. 11.

The accuracy of alighting station identification by the two-layer Stacking framework method is also examined for different card types in different periods. The records are divided by card type and, according to the passenger transaction time, into one-hour periods, and the accuracy of each period is counted, as shown in Fig. 12.

Finally, the accuracy of alighting station identification by the two-layer Stacking framework method is examined for passengers with different card types and different loyalty levels. The records are divided by card type and loyalty level, and the accuracy of each division is counted, as shown in Fig. 13.

5. Analysis of results

(1) As shown in Fig. 7, more than 80% of IC cards are swiped only a few times. In the personal history method, the personal historical data set consists of an individual's records whose alighting stations were identified by the trip chain method, so this data set is small for most passengers. The group history method proposed by the present invention is therefore necessary to overcome the small size of personal historical data sets.

As shown in Fig. 11, when the records whose alighting stations were identified by the trip chain method are used as the historical data set, different clusters behave differently. Clustering at the data level effectively brings similar records together, which shows that the group history method, which clusters with the IC card record as the smallest unit, is effective.

(2) As can be seen from Table 5, the identification rate of the two-layer Stacking framework method proposed by the present invention is 100.00%, the same as that of the KNN-based, decision-tree-based, and random-forest-based methods; that is, an alighting station can be identified for every IC card record with a broken trip chain. This identification rate is slightly higher than that of the method based on passenger high-frequency stations and downstream station attraction weights.

(3) As can be seen from Table 5, the two-layer Stacking framework method and the method based on passenger high-frequency stations and downstream station attraction weights, both of which integrate econometric and non-econometric models, achieve much higher accuracy than the KNN-based, decision-tree-based, and random-forest-based methods, each of which uses only a single model. Integrating econometric and non-econometric models therefore gives better results than using a single model. Of the two integrated methods, the method of the present invention is the more accurate. This shows that using the Logistic regression model in the second layer to determine the weights of the first-layer methods for different data sets is effective, and that the resulting model parameters fit the data set better and generalize better.

(4) As shown in Fig. 11, the accuracy is highest during the weekday morning peak (7:00:00-9:00:00) in November 2018 and is higher than the accuracy on non-working days. Likewise, as shown in Fig. 8 and Fig. 12, for ordinary cards and student cards, which together account for more than 93.5% of the IC card records, the accuracy during the morning peak is higher than during other periods.

Thus the weekday morning peak is the period in which passenger travel is most regular, which agrees with the fact that a large number of commuters and students travel from a fixed residence to a fixed workplace or school during the morning peak. Analysis of Fig. 8 and Fig. 13 shows that, for every card type, the more often the card is swiped, the higher the accuracy of alighting station identification; that is, the higher the passenger's loyalty to the BRT, the more regular the travel behavior from origin to destination.

The method of the present invention differs considerably from existing typical alighting station identification methods. A comprehensive comparison in terms of method system, data volume, application range, identification rate, and other aspects is given in Table 6.

Table 6: Comparison of the present invention with existing typical alighting station identification methods

The above examples are only for illustrating the present invention and are not to be construed as limiting the invention. Variations, modifications, etc. of the above-described embodiments are intended to fall within the scope of the claims of the present invention, as long as they are in accordance with the technical spirit of the present invention.

Claims

1. A method for identifying the alighting station of a bus IC card passenger when the trip chain is broken, characterized by comprising the following steps:

1) According to the IC card swiping data and the operating vehicle data of conventional buses, the alighting station of a conventional bus IC card passenger is identified using the first layer of the Stacking framework; in step 1), the m-th passenger has a trip with a broken trip chain on day d and boards at the j₁-th station of the t-th shift in direction f of line l, and identification yields the probability that the trip alights at the j₂-th possible alighting station, where j₁ < j₂ < J;

2) Taking the identification result of step 1) as input, the alighting station of the conventional bus IC card passenger is identified using the second layer of the Stacking framework, which is based on a Logistic regression model; step 2) is specifically as follows:

wherein the input vector consists of one or more of the alighting probabilities obtained in step 1); the weight vector consists of one or more of w₁, w₂, w₃, w₄, w₅, which are the weights of the corresponding alighting probabilities; and w₀ is the bias;

wherein the estimated weight vector is the maximum likelihood estimate of the weight vector;
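The second-layer combination can be illustrated with the standard Logistic regression form, in which the first-layer alighting probabilities are weighted, summed with the bias, and passed through the sigmoid function. The sketch below assumes this standard form and assumes that the candidate station with the highest output is selected; the function names, the weights, and the probabilities are hypothetical and are not the parameters obtained in the embodiment.

```python
import math


def second_layer_probability(first_layer_probs, weights, bias):
    """Sigmoid of the bias plus the weighted sum of first-layer probabilities."""
    if len(first_layer_probs) != len(weights):
        raise ValueError("one weight is needed per first-layer probability")
    z = bias + sum(w * p for w, p in zip(weights, first_layer_probs))
    return 1.0 / (1.0 + math.exp(-z))


def pick_alighting_station(candidates, weights, bias):
    """Return the candidate station with the highest second-layer output.

    candidates: dict mapping station_no to its list of first-layer probabilities.
    """
    return max(
        candidates,
        key=lambda s: second_layer_probability(candidates[s], weights, bias),
    )


# Hypothetical first-layer outputs for two candidate stations.
candidates = {14: [0.5, 0.2], 16: [0.1, 0.1]}
print(pick_alighting_station(candidates, weights=[1.0, 1.0], bias=0.0))
```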

2. The method for identifying the alighting station of a bus IC card passenger when the trip chain is broken according to claim 1, wherein in step 1), the alighting probability of a possible alighting station is determined by the personal high-frequency station method as follows:

The total number of boarding card swipes of the m-th passenger at the j₂-th possible alighting station during the D days of the study period is counted, and the probability that the trip alights at the j₂-th possible alighting station is then as follows:
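The counting step of the personal high-frequency station method can be sketched as below. The sketch assumes that the probability is the passenger's boarding count at a possible alighting station divided by the total boarding count over all possible alighting stations; this normalization, the function name, and the sample counts are assumptions for illustration.

```python
def high_frequency_probabilities(boarding_counts, possible_stations):
    """Alighting probabilities from a passenger's own boarding counts (assumed formula).

    boarding_counts: dict mapping station_no to the passenger's number of
    boarding swipes at that station during the study period.
    possible_stations: station numbers downstream of the boarding station.
    """
    total = sum(boarding_counts.get(s, 0) for s in possible_stations)
    if total == 0:
        return {s: 0.0 for s in possible_stations}
    return {s: boarding_counts.get(s, 0) / total for s in possible_stations}


# Hypothetical counts: station 3 is upstream and is therefore ignored.
counts = {3: 10, 9: 1, 14: 3}
print(high_frequency_probabilities(counts, possible_stations=[9, 14, 16]))
```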

3. The method for identifying the alighting station of a bus IC card passenger when the trip chain is broken according to claim 1, wherein in step 1), the alighting probability of a possible alighting station is determined by the downstream station attraction method as follows:

For the bus shift that the m-th passenger boards at the j₁-th station, the total number of boarding card swipes at the j₂-th possible alighting station is counted, and the probability that the trip alights at the j₂-th possible alighting station is then as follows:

4. The method for identifying the alighting station of a bus IC card passenger when the trip chain is broken according to claim 1, wherein in step 1), the alighting probability of a possible alighting station is determined by the transfer convenience probability method as follows:

and,

5. The method for identifying the alighting station of a bus IC card passenger when the trip chain is broken according to claim 1, wherein in step 1), the alighting probability of a possible alighting station is determined by the land property attraction probability method as follows:

wherein C_h is the attraction coefficient of the h-th type of urban construction land, h ∈ {1, 2, ..., H}, and the accompanying term refers to the h-th type of urban construction land around the possible alighting station.
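How the attraction coefficients might enter the probability can be sketched as follows. The sketch assumes that each possible alighting station is scored by the coefficient-weighted area shares of the land types within the study radius and that the scores are normalized over the candidate stations; this formula, the function names, and the area shares are assumptions for illustration, while the coefficient values are those of the land use table in the description.

```python
# Attraction coefficients by land use property, as listed in the description.
ATTRACTION = {
    "Residence land": 1.0,
    "Commercial service facility land": 1.2,
    "Public management and public service land": 1.1,
    "Industrial land": 1.0,
    "Logistics and storage land": 0.6,
    "Public land": 0.8,
    "Greenbelt and plaza land": 0.7,
    "Traffic facility land": 1.3,
}


def land_use_score(area_shares):
    """Coefficient-weighted sum of land type area shares around one station."""
    return sum(ATTRACTION[land] * share for land, share in area_shares.items())


def land_use_probabilities(stations):
    """Normalize station scores over all candidate stations (assumed formula).

    stations: dict mapping station_no to {land use property: area share}.
    """
    scores = {s: land_use_score(shares) for s, shares in stations.items()}
    total = sum(scores.values())
    if total == 0:
        return {s: 0.0 for s in stations}
    return {s: score / total for s, score in scores.items()}


# Hypothetical area shares around two candidate stations.
example = {
    14: {"Commercial service facility land": 1.0},
    16: {"Logistics and storage land": 1.0},
}
print(land_use_probabilities(example))
```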

6. The method for identifying the alighting station of a bus IC card passenger when the trip chain is broken according to claim 1, wherein in step 1), the alighting probability of a possible alighting station is determined by the group history method as follows:

C) A plurality of clustering indexes are selected and normalized using min-max standardization, so that each index value lies between a given minimum value and a given maximum value and the feature value of each clustering index is scaled to unit size;

7. The method for identifying the alighting station of a bus IC card passenger when the trip chain is broken according to claim 1, wherein in step 2.2), if the number of incorrect alighting stations is greater than the number of correct alighting stations, the following steps are performed:

8. The method for identifying the alighting station of a bus IC card passenger when the trip chain is broken according to claim 1, wherein in step 2.4), after the number of the alighting station is determined, the name and the longitude and latitude of the alighting station are determined in combination with the static line and station information.