CN112733891B - A method for identifying the alighting station of bus IC card passengers when the travel chain is broken - Google Patents


Info

Publication number
CN112733891B
Authority
CN
China
Prior art keywords
station
departure
bus
card
data
Legal status
Active
Application number
CN202011593440.4A
Other languages
Chinese (zh)
Other versions
CN112733891A (en)
Inventor
王成
崔紫薇
张烜榕
Current Assignee
Huaqiao University
Original Assignee
Huaqiao University
Application filed by Huaqiao University
Priority to CN202011593440.4A
Publication of CN112733891A
Application granted
Publication of CN112733891B
Legal status: Active


Abstract

The invention relates to a method for identifying the alighting stop of bus IC card passengers when the travel chain is broken. The method combines aggregate and disaggregate models within a two-layer Stacking framework. In the first layer, a method based on group history records is proposed to raise the identification probability of the alighting stop, overcoming the scarcity of personal historical trip records that limits methods based on personal history. In the second layer, a logistic regression model determines the weight of each first-layer method for a given data set, so the fitted parameters suit that data set better, generalization improves, and identification accuracy benefits. The invention can identify the alighting stops of all IC card records whose travel chains are broken; because the second-layer logistic regression model assigns different weights to the first-layer methods and these weights adapt to different data sets, the method generalizes better and achieves higher accuracy.

Description

Method for identifying the alighting stop of bus IC card passengers when the travel chain is broken
Technical Field
The invention relates to the technical field of data processing, and in particular to a method for identifying the alighting stop of bus IC card passengers when the travel chain is broken.
Background
Alighting-stop data of bus IC card passengers are essential basic data for research on bus passenger flow forecasting, bus network optimization, vehicle scheduling, system evaluation, and related topics.
At present, conventional methods for estimating alighting stops fall mainly into two types: aggregate models and disaggregate models.
The aggregate model takes a group as its object of study and can determine the number of passengers alighting at each stop. A common approach is the stop attraction method, which computes the number of passengers alighting at each stop from alighting probabilities based on factors such as residents' bus trip distance, the land use around each stop, and the attraction strength of each stop. However, an aggregate model cannot determine the alighting stop of an individual passenger.
The disaggregate model takes the individual as its object of study and can determine the alighting stop of each passenger. The most classical approach is the travel-chain method, which applies when a card has several swipe records per day that form a closed travel chain. Its basic idea is to infer the alighting stop of one trip from the swipe time and the location of the boarding stop of the next trip. For the last swipe record of a card on a given day, some studies assume that the destination is near the first boarding stop of that day or of the next day, and use this to determine the alighting stop of the last record. In practice, some passengers ride public transport only occasionally or do not form a closed travel chain, so the destination of a broken travel chain cannot be determined this way. In particular, when a card has only one swipe record per day, its alighting stop cannot be determined by the travel-chain method.
For passengers whose public transport travel chains are incomplete, several scholars have treated the records whose destinations were determined by the travel-chain method as a personal history set, split trips into weekday and non-weekday cases, found similar records in the personal history for each case, calculated the boarding frequency at each possible alighting stop, and selected the stop with the highest frequency as the destination.
Other scholars have analyzed the alighting probability of each stop in both time and space, multiplying the temporal and spatial probabilities and taking the stop with the highest product as the destination. They also determined a suitable threshold for the distance between the identified stop and the true stop. However, none of these methods can determine the alighting stop for every swipe record.
Some scholars have proposed a two-step algorithm to address this problem: the first step determines alighting stops with a deterministic model such as the travel-chain method, and the second step determines the alighting stops of records with broken travel chains using multiple features and a machine learning algorithm. That study also divided passengers into several groups by K-Means clustering and determined alighting stops separately for each group to improve accuracy.
The above methods all rely on either an aggregate model or a disaggregate model alone.
Some scholars have integrated passengers' individual travel characteristics into a probability model based on stop attraction weights and determined the alighting stops of all records in their case study. First, alighting stops are determined by the travel-chain method. Next, the possible alighting stop with the highest personal boarding frequency is selected as the destination. Finally, the boarding passenger flow of each possible alighting stop is computed as a proportion of the total boarding flow over all possible alighting stops, and alighting stops are assigned to records at random in these proportions.
In the prior art, KNN-based, decision-tree-based, and random-forest-based methods are disaggregate models. All three machine learning algorithms must be trained on data whose alighting stops were identified by the travel-chain method or a similar approach, and existing studies use features such as IC card information, the boarding stop recorded by the card, boarding time, and land use as inputs.
The KNN-based method identifies the alighting stop directly from the several records most similar, across multiple features, to the record being identified. However, the number of nearest neighbors is usually chosen subjectively without an objective basis, which makes it hard to find a set of records truly similar to the record being identified and lowers accuracy.
The decision-tree-based method splits nodes on features using common internal-node splitting criteria such as the Gini coefficient, information gain, or information gain ratio; a tree built under one of these criteria from data with identified alighting stops then predicts the alighting stops of the records to be identified. However, decision trees are prone to overfitting, which reduces prediction accuracy.
The random-forest-based method consists of many decision trees, which greatly reduces overfitting; the alighting stop of a record is the mode of the trees' outputs. However, the number of trees usually has to be set manually. More trees generally perform better, but when identifying alighting stops in large data sets, too many trees incur excessive time and memory costs, while too few reduce accuracy.
In summary, the prior art has two main approaches to identifying the alighting stops of IC card records with broken travel chains: methods based on disaggregate models, including machine learning algorithms; and methods combining aggregate and disaggregate models. The former are generally complex and poorly interpretable; the latter usually apply several fixed methods in sequence, so the identification rate is hard to guarantee, generalization is poor, and accuracy is low.
Disclosure of Invention
The invention aims to overcome these shortcomings of the prior art by providing a method for identifying the alighting stop of bus IC card passengers when the travel chain is broken. The method identifies the alighting stops of conventional buses with high accuracy under broken travel chains, applies widely, and meets practical engineering requirements.
The technical scheme of the invention is as follows:
A method for identifying the alighting stop of bus IC card passengers when the travel chain is broken comprises the following steps:
1) Based on the IC card swipe data and operating vehicle data of conventional buses, use the first layer of a Stacking framework to identify the alighting stops of conventional bus IC card passengers;
2) Taking the identification results of step 1) as input, use the second layer of the Stacking framework, based on a logistic regression model, to identify the alighting stops of bus IC card passengers.
Preferably, in step 1), suppose the m-th passenger's b-th trip with a broken travel chain on day d boards at the $j_1$-th stop of run T in direction f of route l, which has J stops in total; identification yields the alighting probability of each possible alighting stop $j_2$, where $j_1 < j_2 < J$;
The first layer of the Stacking framework identifies the alighting stop of conventional bus IC card passengers with one or more of the following methods: the personal high-frequency stop method, the downstream stop attraction method, the transfer convenience probability method, the land use attraction probability method, and the group history record method. Each method yields the alighting probability $P_{m,d,b}(j_1, j_2)$ at each possible alighting stop $j_2$.
Preferably, in step 1), the personal high-frequency stop method determines the alighting probability of a possible alighting stop as follows:
Count the total number of times the m-th passenger swiped the card to board at possible alighting stop $j_2$ during the study period of D days; the alighting probability of this trip at possible alighting stop $j_2$ is then:
Preferably, in step 1), the downstream stop attraction method determines the alighting probability of a possible alighting stop as follows:
In the bus run in which the m-th passenger boarded at stop $j_1$, count the total number of card swipes for boarding at possible alighting stop $j_2$; the alighting probability of this trip at possible alighting stop $j_2$ is then:
Preferably, in step 1), the transfer convenience probability method determines the alighting probability of a possible alighting stop as follows:
From the static bus route and stop information, count the number of bus routes serving possible alighting stop $j_2$; the alighting probability of this trip at possible alighting stop $j_2$ is then:
where, since every bus stop is served by at least one bus route, the route count is always at least 1.
Preferably, in step 1), the land use attraction probability method determines the alighting probability of a possible alighting stop as follows:
Suppose the study area around possible alighting stop $j_2$ contains H types of urban construction land; the alighting probability of this trip at possible alighting stop $j_2$ is then:
where $C_h$ is the attraction coefficient of urban construction land type $h \in \{1, 2, \ldots, H\}$, and the second factor is the amount of land of type $h$ around possible alighting stop $j_2$.
Preferably, in step 1), the group history record method determines the alighting probability of a possible alighting stop as follows:
a) Cluster the bus IC card records whose alighting stops have been identified; within each cluster, the records whose alighting stops were identified by the travel-chain method serve as the group history records used to determine the alighting stops to be identified;
b) Construct the clustering indexes, which are of two types: the first type consists of the relevant fields in the bus IC card records with identified alighting stops, which record the data generated by each card swipe; the second type consists of indexes constructed from the first type and from practical conditions, used to capture the similarity between different IC card records;
c) Select several clustering indexes and normalize them with min-max scaling, so that each index value lies between a given minimum and maximum and every index is scaled to unit range;
d) Cluster with the K-Means algorithm, using the elbow rule to determine the optimal number of clusters $C_G$, to obtain card-swiping travel patterns;
e) Suppose the m-th passenger's b-th trip with a broken travel chain on day d belongs to a given cluster; the records of that cluster whose alighting stops were determined by the travel-chain method form the group history data set. From this data set, count the number of trips that boarded at stop $j_1$ and alighted at possible alighting stop $j_2$; the alighting probability of this trip at possible alighting stop $j_2$ is then:
Preferably, step 2) is specifically as follows:
2.1) Build the model: label each possible alighting stop identified in step 1) as 0 or 1, where 1 marks the correctly identified alighting stop and 0 marks an incorrect one, and use the correct and incorrect alighting stops as the input of the logistic regression model in the second layer of the Stacking framework;
For the m-th passenger's b-th trip with a broken travel chain on day d, the logistic regression model outputs the alighting probability of possible alighting stop $j_2$:
where the input vector consists of one or more of the first-layer alighting probabilities $P_{m,d,b}(j_1, j_2)$; the weight vector consists of the corresponding one or more of $w_1, w_2, w_3, w_4, w_5$, the weights of those probabilities; and $w_0$ is the bias;
2.2) Use the bus IC card records whose alighting stops were identified by the travel-chain method as the training and test sets to learn the model;
2.3) Estimate the model parameters by maximum likelihood, using the L-BFGS algorithm, which suits large-scale data, to compute the parameter values; the alighting probability at possible alighting stop $j_2$ is then:
where the weights are replaced by their maximum likelihood estimates;
2.4) The alighting stop of the m-th passenger's b-th trip with a broken travel chain on day d is the possible alighting stop $j_2$ with the highest alighting probability, namely:
Preferably, in step 2.2), if the number of incorrect alighting stops exceeds the number of correct alighting stops, the following steps are performed:
2.2.1) Randomly undersample the incorrect-alighting-stop records and merge them with the original correct-alighting-stop records;
2.2.2) Oversample the correct-alighting-stop records with the SMOTE algorithm and merge them with the original incorrect-alighting-stop records;
2.2.3) Combine the data obtained in steps 2.2.1) and 2.2.2), and use 90% as the training set and the remaining 10% as the test set of the logistic regression model.
Preferably, in step 2.4), once the number of the alighting stop is determined, its name and its longitude and latitude are determined from the static bus route stop information.
The beneficial effects of the invention are as follows:
The method combines aggregate and disaggregate models in a two-layer Stacking framework. The first layer uses the personal high-frequency stop method, the downstream stop attraction method, the transfer convenience probability method, the land use attraction probability method, and the group history record method. The second layer uses a logistic regression model to determine the weight of each first-layer method for a given data set, so the fitted parameters suit that data set better, generalization improves, and identification accuracy benefits. In addition, the weekday morning peak is when passengers travel most regularly, and the more often a passenger swipes the card, the more regular that passenger's travel behavior.
In the first layer, the invention proposes a group history record method that raises the identification probability of the alighting stop and overcomes the scarcity of personal historical trip records faced by personal-history methods. A personal-history method relies on records of the passenger's past trips on the same route from the same stop, and identifies the alighting stop from the similar travel patterns in that personal history.
Compared with the conventional KNN-based, decision-tree-based, and random-forest-based methods, the two-layer Stacking framework of the invention can identify the alighting stops of all IC card records with broken travel chains. Because the second-layer logistic regression model assigns different weights to the first-layer methods and these weights adapt to different data sets, the method generalizes better and achieves higher accuracy.
Drawings
FIG. 1 is a functional block diagram of the invention;
FIG. 2 is a schematic flow chart of how data are used in learning the logistic regression model of the second layer of the Stacking framework;
FIG. 3 is a schematic diagram of the analysis of a single passenger's trips within a single travel chain in the invention;
FIG. 4 is a schematic diagram of the relationships among the study data;
FIG. 5 is a schematic diagram of the land use distribution within 800 meters of the stops of Xiamen BRT Line 1;
FIG. 6 is a schematic diagram of stop names and the number of routes serving each stop;
FIG. 7 is a diagram of the number of IC cards versus card-swiping frequency;
FIG. 8 is a schematic diagram of the share of data volume for each loyalty level of each card type;
FIG. 9 is a graph of the average distortion versus the number of clusters in the K-Means algorithm;
FIG. 10 is a schematic diagram of the alighting probability of each possible alighting stop in different clusters in an example;
FIG. 11 is a diagram of the accuracy of alighting-stop identification in different time periods on different days of the week;
FIG. 12 is a schematic diagram of the accuracy of alighting-stop identification in different time periods for different card types;
FIG. 13 is a diagram of the accuracy of alighting-stop identification for different loyalty levels of different card types.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
As shown in FIGS. 1 and 2, the method for identifying the alighting stop of bus IC card passengers when the travel chain is broken comprises the following steps:
1) Based on the IC card swipe data and operating vehicle data of conventional buses, use the first layer of a Stacking framework to identify the alighting stops of conventional bus IC card passengers, where the m-th passenger's b-th trip with a broken travel chain on day d boards at the $j_1$-th stop of run T in direction f of route l, which has J stops in total, and identification yields the alighting probability of each possible alighting stop $j_2$, where $j_1 < j_2 < J$;
The first layer of the Stacking framework identifies the alighting stop of conventional bus IC card passengers with one or more of the following methods: the personal high-frequency stop method, the downstream stop attraction method, the transfer convenience probability method, the land use attraction probability method, and the group history record method. Each method yields the alighting probability $P_{m,d,b}(j_1, j_2)$ at each possible alighting stop $j_2$.
2) Taking the identification results of step 1) as input, use the second layer of the Stacking framework, based on a logistic regression model, to identify the alighting stops of bus IC card passengers.
In step 1), the personal high-frequency stop method determines the alighting probability of a possible alighting stop as follows:
Count the total number of times the m-th passenger swiped the card to board at possible alighting stop $j_2$ during the study period of D days, regardless of whether the alighting stops of those trips were identified by the travel-chain method; the alighting probability of this trip at possible alighting stop $j_2$ is then:
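As a minimal sketch of this count-based method, assume the probability is the passenger's boarding count at $j_2$ divided by the sum of the counts over all possible alighting stops. That normalization, the helper name `personal_high_freq_probs`, and the stop IDs are illustrative assumptions, not the patent's published formula.

```python
from collections import Counter


def personal_high_freq_probs(boarding_history, candidate_stops):
    """Hypothetical helper: alighting probabilities from one passenger's boardings.

    boarding_history: stop IDs where the passenger boarded during the D-day
        study period (all trips, identified by the travel-chain method or not).
    candidate_stops: possible alighting stops j2 downstream of boarding stop j1.
    Assumption: P(j2) = count(j2) / sum of counts over all candidates.
    """
    counts = Counter(boarding_history)
    total = sum(counts[s] for s in candidate_stops)
    if total == 0:
        # No boardings at any candidate stop: this method gives no evidence.
        return {s: 0.0 for s in candidate_stops}
    return {s: counts[s] / total for s in candidate_stops}


history = ["S3", "S5", "S3", "S1", "S3", "S5"]  # illustrative data only
print(personal_high_freq_probs(history, ["S2", "S3", "S4", "S5"]))
# {'S2': 0.0, 'S3': 0.6, 'S4': 0.0, 'S5': 0.4}
```

Boardings at S1 are ignored because S1 is not downstream of the current boarding stop in this example.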
The downstream stop attraction method determines the alighting probability of a possible alighting stop as follows:
In the bus run in which the m-th passenger boarded at stop $j_1$, count the total number of card swipes for boarding at possible alighting stop $j_2$, regardless of whether the alighting stops of those trips were identified by the travel-chain method; the alighting probability of this trip at possible alighting stop $j_2$ is then:
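A comparable sketch for the downstream stop attraction method, again assuming the boarding counts in the same bus run are normalized over the possible alighting stops. The `(run_id, stop)` record layout and the helper name are hypothetical.

```python
from collections import Counter


def downstream_attraction_probs(swipes, run_id, candidate_stops):
    """Hypothetical helper: alighting probabilities from boardings in one bus run.

    swipes: (run_id, boarding_stop) pairs for all card swipes.
    run_id: the bus run in which the passenger boarded at stop j1.
    Assumption: P(j2) = boardings at j2 in this run / sum over all candidates.
    """
    counts = Counter(stop for run, stop in swipes if run == run_id)
    total = sum(counts[s] for s in candidate_stops)
    if total == 0:
        return {s: 0.0 for s in candidate_stops}
    return {s: counts[s] / total for s in candidate_stops}


swipes = [("T1", "S1"), ("T1", "S2"), ("T1", "S2"), ("T1", "S4"), ("T2", "S4")]
probs = downstream_attraction_probs(swipes, "T1", ["S2", "S3", "S4"])
# S2 -> 2/3, S3 -> 0.0, S4 -> 1/3; the swipe in run T2 is ignored
```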
The transfer convenience probability method determines the alighting probability of a possible alighting stop as follows:
From the static bus route and stop information, count the number of bus routes serving possible alighting stop $j_2$; the alighting probability of this trip at possible alighting stop $j_2$ is then:
Since every bus stop is served by at least one bus route, the route count is at least 1, and therefore:
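A minimal sketch of the transfer convenience idea, assuming the probability is proportional to the number of routes serving each possible alighting stop. The route dictionary and the helper name are illustrative; the at-least-one-route property stated above is what keeps the denominator positive.

```python
def transfer_convenience_probs(route_stops, candidate_stops):
    """Hypothetical helper: alighting probabilities proportional to route counts.

    route_stops: static line data, route ID -> list of stop IDs it serves.
    Assumption: P(j2) = routes serving j2 / sum over all candidate stops.
    Every stop is served by at least one route, so the total is positive.
    """
    n_routes = {
        s: sum(1 for stops in route_stops.values() if s in stops)
        for s in candidate_stops
    }
    total = sum(n_routes.values())
    return {s: n / total for s, n in n_routes.items()}


routes = {"A": ["S1", "S2", "S3"], "B": ["S2", "S3", "S4"], "C": ["S3", "S5"]}
probs = transfer_convenience_probs(routes, ["S2", "S3", "S4"])
# route counts 2, 3, 1 -> probabilities 1/3, 1/2, 1/6
```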
The land use attraction probability method determines the alighting probability of a possible alighting stop as follows:
Suppose the study area around possible alighting stop $j_2$ contains H types of urban construction land; the alighting probability of this trip at possible alighting stop $j_2$ is then:
where $C_h$ is the attraction coefficient of urban construction land type $h \in \{1, 2, \ldots, H\}$, and the second factor is the amount of land of type $h$ around possible alighting stop $j_2$.
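The land use method can be sketched under the assumption that each stop's attraction score is $\sum_h C_h$ times the amount of land of type $h$ around the stop, normalized over the possible alighting stops. Both that form and the coefficient values below are placeholders, not values from the patent.

```python
def land_use_probs(land_amount, coeff, candidate_stops):
    """Hypothetical helper: alighting probabilities from land-use attraction.

    land_amount[stop][h]: amount (for example, area) of land type h near the stop.
    coeff[h]: attraction coefficient C_h of land type h.
    Assumption: score(j2) = sum_h C_h * amount(h, j2), normalized over candidates.
    """
    scores = {
        s: sum(c * land_amount.get(s, {}).get(h, 0.0) for h, c in coeff.items())
        for s in candidate_stops
    }
    total = sum(scores.values())
    if total == 0:
        return {s: 0.0 for s in candidate_stops}
    return {s: v / total for s, v in scores.items()}


coeff = {"residential": 1.0, "commercial": 1.5, "industrial": 0.5}  # illustrative
land = {
    "S2": {"residential": 2.0, "commercial": 1.0},  # score 3.5
    "S3": {"industrial": 4.0},                      # score 2.0
}
probs = land_use_probs(land, coeff, ["S2", "S3"])
```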
The group history record method determines the alighting probability of a possible alighting stop as follows:
a) Cluster the bus IC card records whose alighting stops have been identified; within each cluster, the records whose alighting stops were identified by the travel-chain method serve as the group history records used to determine the alighting stops to be identified;
b) Construct the clustering indexes, which are of two types:
The first type consists of the relevant fields in the bus IC card records with identified alighting stops, such as card number and card type. These fields record the data generated by each card swipe, ensuring that each IC card record corresponds to exactly one swipe, so that the smallest unit being clustered is a single record.
The second type consists of indexes constructed from the first type and from practical conditions, used to capture the similarity between different IC card records. For example, the boarding stop type index is obtained by clustering stops, by stop number and total passenger flow at the boarding stop, into three classes with the K-Means algorithm, i.e., three levels of passenger-flow hub; this helps records boarding at stops of the same hub level fall into the same cluster. The passenger loyalty index is obtained by clustering the number of card swipes and the corresponding number of cards into three classes with the K-Means algorithm, so that records with more similar behavioral habits fall into the same cluster.
c) Select several clustering indexes and normalize them with min-max scaling, so that each index value lies between a given minimum and maximum and every index is scaled to unit range;
d) Cluster with the K-Means algorithm, using the elbow rule to determine the optimal number of clusters $C_G$, to obtain card-swiping travel patterns;
e) Suppose the m-th passenger's b-th trip with a broken travel chain on day d belongs to a given cluster; the records of that cluster whose alighting stops were determined by the travel-chain method form the group history data set. From this data set, count the number of trips that boarded at stop $j_1$ and alighted at possible alighting stop $j_2$; the alighting probability of this trip at possible alighting stop $j_2$ is then:
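Steps c) to e) can be sketched in plain Python under stated assumptions: the feature values and helper names are hypothetical, K-Means is a basic Lloyd's implementation rather than any particular library's, and the step e) probability is assumed to be the normalized count of same-cluster trips from $j_1$ to each $j_2$.

```python
import math
import random
from collections import Counter


def min_max_scale(rows):
    """Step c): scale every column of `rows` to the range [0, 1]."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - a) / (b - a) if b > a else 0.0 for v, a, b in zip(r, lo, hi)]
        for r in rows
    ]


def _assign(points, centroids):
    # Index of the nearest centroid for every point.
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]


def kmeans(points, k, iters=100, seed=0):
    """Step d): plain Lloyd's K-Means; returns (labels, SSE)."""
    centroids = [list(p) for p in random.Random(seed).sample(points, k)]
    for _ in range(iters):
        labels = _assign(points, centroids)
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the previous centroid if a cluster becomes empty.
            new.append([sum(x) / len(members) for x in zip(*members)]
                       if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    labels = _assign(points, centroids)
    sse = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, sse


def group_history_probs(history, cluster, j1, candidate_stops):
    """Step e): history holds (cluster, boarding stop, alighting stop) tuples
    for records identified by the travel-chain method."""
    counts = Counter(a for c, b, a in history if c == cluster and b == j1)
    total = sum(counts[s] for s in candidate_stops)
    if total == 0:
        return {s: 0.0 for s in candidate_stops}
    return {s: counts[s] / total for s in candidate_stops}


raw = [[1, 120], [2, 150], [20, 900], [22, 950]]  # illustrative index values
X = min_max_scale(raw)
sse = [kmeans(X, k)[1] for k in range(1, 4)]  # elbow curve: look for the bend
labels, _ = kmeans(X, 2)
hist = [(0, "S1", "S3"), (0, "S1", "S3"), (0, "S1", "S4"), (1, "S1", "S4")]
probs = group_history_probs(hist, 0, "S1", ["S3", "S4", "S5"])
```

The elbow rule is applied by inspecting how `sse` falls as k grows and picking the k after which the decrease flattens; the patent's FIG. 9 plots the analogous average-distortion curve.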
Step 2) is specifically as follows:
2.1) Label each possible alighting stop identified in step 1) as 0 or 1, where 1 marks the correctly identified alighting stop and 0 marks an incorrect one, and use the correct and incorrect alighting stops as the input of the logistic regression model in the second layer of the Stacking framework. The input values of the second-layer logistic regression model then lie in [0, 1], so no normalization is needed.
For the m-th passenger's b-th trip with a broken travel chain on day d, the logistic regression model outputs the alighting probability of possible alighting stop $j_2$:
where the input vector consists of one or more of the first-layer alighting probabilities $P_{m,d,b}(j_1, j_2)$; the weight vector consists of the corresponding one or more of $w_1, w_2, w_3, w_4, w_5$, the weights of those probabilities; and $w_0$ is the bias;
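Assuming the second layer has the standard logistic regression form, i.e. the sigmoid of $w_0$ plus the weighted sum of the first-layer probabilities, the output for one possible alighting stop can be sketched as below. The weights shown are illustrative only; in the method they come from step 2.3).

```python
import math


def stacked_alighting_prob(x, w, w0):
    """Second-layer output for one possible alighting stop j2.

    x: first-layer probabilities P_{m,d,b}(j1, j2), one per method used.
    w: weights w1..wn for those methods; w0: bias.
    Assumes the standard logistic form sigmoid(w0 + w . x).
    """
    z = w0 + sum(wi * xi for wi, xi in zip(w, x))
    # Numerically stable sigmoid (avoids overflow in math.exp for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Illustrative weights only; here z = -1.5 + 2.4 = 0.9.
p = stacked_alighting_prob([0.6, 0.3, 0.2, 0.1, 0.5], [2.0, 1.0, 0.5, 0.5, 1.5], -1.5)
```

Because every input is already a probability in [0, 1], the inputs can be fed in without rescaling, which matches the remark in step 2.1).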
2.2) In this embodiment, the bus IC card records whose alighting stops were identified by the travel-chain method are used as the training and test sets to learn the model. In practice, the number of possible alighting stops is often large, so records labeled 0 far outnumber records labeled 1. To handle this imbalance, i.e., when the number of incorrect alighting stops exceeds the number of correct alighting stops, the following steps are performed:
2.2.1) Randomly undersample the incorrect-alighting-stop records (label 0) and merge them with the original correct-alighting-stop records (label 1);
2.2.2) Oversample the correct-alighting-stop records (label 1) with the SMOTE algorithm and merge them with the original incorrect-alighting-stop records (label 0);
2.2.3) Combine the data obtained in steps 2.2.1) and 2.2.2), and use 90% as the training set and the remaining 10% as the test set of the logistic regression model.
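The three resampling steps can be sketched as follows, assuming each record is a list of first-layer probabilities. The SMOTE routine is a small hand-written version of the standard interpolation (pick a minority row, pick one of its k nearest minority neighbours, interpolate at a random point), not a library call; helper names, `k`, and the seed are hypothetical. Read literally, both merged sets contain the original rows, so some rows appear twice; the sketch keeps that reading.

```python
import math
import random


def smote(minority, n_new, k, rng):
    """Generate n_new synthetic rows by SMOTE interpolation between a minority
    row and one of its k nearest minority neighbours (needs >= 2 rows)."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        u = rng.random()
        synthetic.append([a + u * (b - a) for a, b in zip(x, nn)])
    return synthetic


def balance_and_split(pos, neg, k=5, train_frac=0.9, seed=0):
    """Steps 2.2.1)-2.2.3) for len(neg) > len(pos); rows are feature lists."""
    rng = random.Random(seed)
    # 2.2.1) undersample label-0 rows to the label-1 count; merge with label-1 rows
    part1 = [(x, 0) for x in rng.sample(neg, len(pos))] + [(x, 1) for x in pos]
    # 2.2.2) SMOTE label-1 rows up to the label-0 count; merge with label-0 rows
    synth = smote(pos, len(neg) - len(pos), k, rng)
    part2 = [(x, 1) for x in pos + synth] + [(x, 0) for x in neg]
    # 2.2.3) combine both parts, shuffle, then split 90% / 10%
    data = part1 + part2
    rng.shuffle(data)
    cut = round(train_frac * len(data))
    return data[:cut], data[cut:]


rng0 = random.Random(1)
pos = [[rng0.random(), rng0.random()] for _ in range(3)]    # label 1
neg = [[rng0.random(), rng0.random()] for _ in range(20)]   # label 0
train, test = balance_and_split(pos, neg, k=2)
```

With 3 positive and 20 negative rows, the combined set has 46 rows (23 of each label), split into 41 for training and 5 for testing.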
2.3) Estimate the model parameters by maximum likelihood, which turns the problem into an optimization problem that maximizes a criterion function, and compute the parameter values with the L-BFGS (limited-memory BFGS) algorithm, which suits large-scale data; the alighting probability at possible alighting stop $j_2$ is then:
where the weights are replaced by their maximum likelihood estimates;
2.4 The alighting stop of the broken-chain trip made by the m-th passenger on day d is taken to be the $j_2$-th possible alighting stop, i.e. the candidate with the highest alighting probability:
Once the stop number has been determined, the stop name and its longitude and latitude can be obtained from the static bus line and stop information.
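Step 2.4 and the lookup that follows amount to an argmax over the candidate stops followed by a join with static stop data. In this sketch the function name, the table layout and the stop IDs, names and coordinates used in testing are hypothetical placeholders.

```python
def pick_alighting_stop(candidate_probs, stop_table):
    """Return (stop_id, name, longitude, latitude) of the most probable alighting stop.

    candidate_probs: {stop_id: alighting probability} for one broken-chain trip.
    stop_table: {stop_id: (name, longitude, latitude)}, a hypothetical layout for
    the static bus line and stop information.
    """
    # Step 2.4: the candidate with the highest alighting probability (first one wins ties)
    best = max(candidate_probs, key=candidate_probs.get)
    # Join with the static stop data to recover the stop's name and position
    name, lon, lat = stop_table[best]
    return best, name, lon, lat
```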
Examples
1. Experimental objects and data sets
Bus Rapid Transit (BRT) is a public passenger transport system positioned between rail transit and conventional buses. The Xiamen BRT records the gate entry and exit of every passenger IC card, so the complete boarding and alighting stops of each IC card record can be determined. In addition, because Xiamen's dedicated BRT lanes are physically segregated, the BRT lines form a small, self-contained public transport network.
The research problem of this example is how to identify the alighting stop of a bus IC card passenger when the trip chain breaks. It is described formally below using three bus trips made by one passenger on one day, shown in Fig. 3.
First trip: the passenger boards at stop $k_1$ in the up direction of line A, so the set of possible alighting stops is $\{k_2, k_3, k_4, k_5, k_6\}$. From the distance between the boarding stop of the second trip and each possible alighting stop, the bus GPS arrival times, and the card-swiping time of the second trip, the alighting stop of the first trip can be determined by the trip-chain method.
Second trip: the passenger boards at stop $k_5$ in the down direction of line B, so the set of possible alighting stops is $\{k_2, k_3, k_4\}$. The distance between the boarding stop of the third trip and every possible alighting stop exceeds the threshold, so the trip chain breaks and the alighting stop of the second trip cannot be determined by the trip-chain method.
Third trip: the passenger boards at stop $k_1$ in the up direction of line C, so the set of possible alighting stops is $\{k_2, k_3, k_4\}$. Because this is the passenger's last trip of the day, the boarding stop of the day's first trip is assumed to be the boarding stop of the next trip, and the trip-chain method is applied to check whether the alighting stop can be determined.
In this example, the distance between the boarding stop of the first trip and every possible alighting stop exceeds the threshold, so the trip chain breaks and the alighting stop of the third trip cannot be determined by the trip-chain method.
In summary, of the three bus trips made by the passenger that day (Fig. 3), the first trip has a complete trip chain and is not a study object; the second and third trips have broken trip chains, and identifying their alighting stops is the research problem of this example.
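The trip-chain logic of this example can be sketched as follows, under simplifying assumptions: stops have planar coordinates in metres, a trip's alighting stop is taken to be the candidate closest to the next boarding stop (the day's first boarding stop for the last trip), and the chain is treated as broken when that distance exceeds the 2000 m threshold used in the experiments. The GPS arrival-time and card-swiping-time checks described above are omitted, and the function name and data layout are hypothetical.

```python
import math


def trip_chain_alighting(trips, coords, threshold_m=2000.0):
    """Return the inferred alighting stop of each trip, or None where the chain breaks.

    trips: one passenger's trips for one day in time order, each a
      (boarding_stop, candidate_alighting_stops) pair.
    coords: stop id -> (x, y) planar coordinates in metres (hypothetical).
    """
    result = []
    for i, (_, candidates) in enumerate(trips):
        # Next boarding stop: the following trip's, or the day's first one for the last trip
        next_board = trips[i + 1][0] if i + 1 < len(trips) else trips[0][0]
        nx, ny = coords[next_board]

        def gap(stop):
            return math.hypot(coords[stop][0] - nx, coords[stop][1] - ny)

        best = min(candidates, key=gap)
        # The chain breaks if even the closest candidate is beyond the threshold
        result.append(best if gap(best) <= threshold_m else None)
    return result
```

With hypothetical stops laid out so that only the first trip ends near the next boarding stop, the function returns an alighting stop for the first trip and `None` for the second and third, mirroring Fig. 3.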
In this example, the study object is the November 2018 IC card data from Xiamen, Fujian, whose boarding stops, identified from the station entry gates, are on BRT Line 1; IC card data with gate-identified boarding stops on the other BRT lines over the same period are used to help identify alighting stops with the trip-chain method, as shown in Fig. 4. During the study period there are 3673184 IC card records with identified boarding stops at BRT Line 1 stations, and the card types are student cards, senior cards, regular cards and special cards. In Xiamen the morning peak is 7:00:00-9:00:00 and the evening peak is 17:00:00-19:00:00, and whether it rained on each day of the study period can be determined from the weather reports published by the China Weather Network.
Xiamen BRT Line 1 runs in an up direction and a down direction, each serving the same 27 stops in opposite order, across three administrative districts: Siming, Huli and Jimei. The types and areas of urban construction land within 800 meters of each stop are shown in Fig. 5. The land-use attraction coefficient around a stop depends on the size of the city studied, and cities of similar size have similar values; the invention therefore sets the attraction coefficient of each type of urban construction land according to Xiamen's city size and related studies, as shown in Table 1.
Table 1: Attraction coefficients of land-use types around stops
Land-use type | Attraction coefficient
Residential land | 1
Commercial service facility land | 1.2
Public management and public service land | 1.1
Industrial land | 1
Logistics and warehouse land | 0.6
Public utility land | 0.8
Green space and square land | 0.7
Transportation facility land | 1.3
In addition, the number of BRT lines serving each stop, determined from the static BRT line and stop information, is shown in Fig. 6.
2. Evaluation methods and metrics
The evaluation has two parts: (1) validation of the Logistic regression model in the second layer of the two-layer Stacking framework method, which is learned from bus IC card data whose alighting stops were identified by the trip-chain method; and (2) evaluation of alighting-stop identification for bus IC card passengers when the trip chain breaks.
(1) Validation of the Logistic regression model
Validation method: the training and test sets of the Logistic regression model are drawn from bus IC card data whose alighting stops were identified by the trip-chain method.
Validation metric: the F1 score is used; the closer it is to 1, the better the learned Logistic regression model. Suppose that in this part $N_{TP}$ records have actual label 1 and are predicted as label 1, $N_{FN}$ records have actual label 1 and are predicted as label 0, and $N_{FP}$ records have actual label 0 and are predicted as label 1. The F1 score is then

$$F1 = \frac{2PR}{P + R}, \qquad P = \frac{N_{TP}}{N_{TP} + N_{FP}}, \qquad R = \frac{N_{TP}}{N_{TP} + N_{FN}}$$
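The F1 score can be computed directly from the three record counts described above (true positives, false negatives and false positives), with label 1 marking the correct alighting stop. A minimal sketch; the function name is illustrative:

```python
def f1_score(y_true, y_pred):
    """F1 score for binary labels (1 = correct alighting stop, 0 = otherwise)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall; 0 when both are 0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

For instance, 2 true positives, 1 false negative and 1 false positive give precision = recall = 2/3 and therefore F1 = 2/3.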
(2) Evaluation of alighting-stop identification for bus IC card passengers when the trip chain breaks
Evaluation method: the known alighting stops are treated as missing, the alighting stops of the IC card records with broken trip chains are identified, and the identified stops are compared with the actual alighting stops.
And (3) checking the index: the identification rate and the accuracy rate are selected as the inspection indexes, and the method is better as the numerical value of the identification rate and the accuracy rate are close to 100%. Let the present part share Nun The data of the broken trip chain needs to be identified as the station for getting off the vehicle, whereinThe bar may be determined to be the next stop at which point there is an identification rate as follows:
if it isThere is +.>The accuracy rate of the bar data when the bar data is correctly identified to the next station is as follows:
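Both metrics can be computed as in this sketch, which assumes that a record whose alighting stop could not be identified is stored as `None` and that the actual alighting stop of every record is known from the gate exit data. The function name is illustrative.

```python
def identification_and_accuracy(identified, actual):
    """Return (identification rate, accuracy rate) in percent.

    identified: inferred alighting stop per broken-chain record (None = not identified).
    actual: true alighting stop per record, e.g. from the BRT gate exit data.
    """
    n_un = len(identified)  # records that need an alighting stop
    hits = [(p, a) for p, a in zip(identified, actual) if p is not None]
    n_identified = len(hits)
    n_correct = sum(1 for p, a in hits if p == a)
    ident_rate = 100.0 * n_identified / n_un if n_un else 0.0
    # Accuracy is measured over the identified records only
    acc_rate = 100.0 * n_correct / n_identified if n_identified else 0.0
    return ident_rate, acc_rate
```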
3. Experimental parameter settings
The parameter values used in this example are set as follows:
(1) Distance settings for the trip-chain method
Based on the actual spacing of BRT stops in Xiamen, the radius of the land-use study area around each BRT stop is set to $Dis_{\text{land-use}} = 800$ m, and the distance threshold of the trip-chain method is set to 2000 m.
(2) Penalty coefficient of the Logistic regression model
In the second layer of the two-layer Stacking framework method, repeated experiments showed that a penalty coefficient of 100 gives the best performance of the Logistic regression model.
(3) Parameter settings for the comparison methods
The KNN-based method, the decision-tree-based method, the random-forest-based method, and the method based on passengers' high-frequency stops and downstream-stop attraction weights are selected for comparison.
In existing studies, the KNN-, decision-tree- and random-forest-based methods use POI data as a substitute for the land-use properties around stops; because the known data used in this invention already include the land-use properties around each stop, no POI substitute is needed. Based on repeated experiments on this data set, the number of nearest neighbours in the KNN-based method is set to 1000, the number of trees in the random-forest-based method is set to 2000, and the decision-tree-based method uses the Gini coefficient as its splitting criterion.
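For reference, the Gini criterion used by the decision-tree comparison method scores a node by its impurity $1 - \sum_k p_k^2$, where $p_k$ is the share of class $k$ at the node, and a split is chosen to minimise the size-weighted impurity of the child nodes. A minimal sketch with illustrative function names follows. The text does not say which library was used; if it were scikit-learn, the other settings would correspond to `KNeighborsClassifier(n_neighbors=1000)` and `RandomForestClassifier(n_estimators=2000)`.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of the class labels at one node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gini(left, right):
    """Size-weighted Gini impurity of a candidate split; lower is better."""
    n = len(left) + len(right)
    return len(left) / n * gini_impurity(left) + len(right) / n * gini_impurity(right)
```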
In the comparison method based on passengers' high-frequency stops and downstream-stop attraction weights, and in the downstream-stop-attraction-based method in the first layer of the proposed Stacking framework, the passenger flow at each stop for the specific bus run taken must be known in order to determine the alighting stop when the trip chain breaks. However, this example uses only BRT gate entry and exit records, which do not reveal which run a passenger took. The passenger flow at each stop during the hour containing the passenger's entry time is therefore used as a proxy for the per-stop passenger flow of that run, and the probabilities for determining the alighting stop are computed on this basis.
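The hourly proxy described above can be sketched as follows. This is illustrative only: the record layout is hypothetical, the sketch does not distinguish entry from exit transactions, and normalising the hourly counts into shares over the candidate stops is an assumption, since the text only says that the hourly flow feeds the probability calculation.

```python
from collections import Counter
from datetime import datetime


def hourly_stop_flow(records):
    """records: iterable of (stop_id, timestamp) gate transactions (hypothetical layout).
    Returns a Counter keyed by (stop_id, date, hour)."""
    return Counter((stop, ts.date(), ts.hour) for stop, ts in records)


def flow_share(flow, candidates, entry_time: datetime):
    """Share of the hourly flow at each candidate stop during the hour containing
    entry_time; this hourly flow stands in for the unknown run's per-stop flow."""
    key = (entry_time.date(), entry_time.hour)
    counts = {s: flow[(s, *key)] for s in candidates}  # missing keys count as 0
    total = sum(counts.values())
    return {s: (c / total if total else 0.0) for s, c in counts.items()}
```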
4. Example results
With these parameter settings, of the 3673184 IC card records with identified boarding stops in the study period, the trip-chain method identifies the alighting stops of 2425101 records (accuracy 80.96%), leaving 1248083 records with broken trip chains. Their alighting stops are identified with the KNN-based method, the decision-tree-based method, the random-forest-based method, the method based on passengers' high-frequency stops and downstream-stop attraction weights, and the proposed method.
The results are presented below for the group-history-based method, for the Logistic regression model in the second layer of the two-layer Stacking framework, and for alighting-stop identification of IC card passengers when the trip chain breaks.
(1) Group-history-based method
The card-swiping frequency of the 3673184 IC card records with identified boarding stops in the study period is shown in Fig. 7. The number of cards decreases as swiping frequency increases, and more than 80% of IC cards are swiped 8 times or fewer.
When the passenger loyalty index of the group-history-based method is constructed, the swipe counts and the corresponding numbers of cards in the 2018 Xiamen study data are clustered into three classes with the K-Means algorithm, giving the following result: a passenger who swipes 1 or 2 times a month rides BRT only occasionally (loyalty = 1); a passenger who swipes 3 to 8 times a month treats BRT as one of several optional modes (loyalty = 2); and a passenger who swipes more than 8 times a month is a loyal BRT user (loyalty = 3). The share of IC card records of each card type at each loyalty level is shown in Fig. 8.
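The loyalty levels obtained from the K-Means clustering above can be assigned from a card's monthly swipe count as in this sketch; the function name is illustrative and the cut points are taken directly from the text.

```python
def loyalty_level(swipes_per_month):
    """Loyalty index from the monthly swipe count, using the cut points above:
    1-2 swipes -> 1 (occasional rider), 3-8 -> 2 (BRT is one option), >8 -> 3 (loyal user)."""
    if swipes_per_month <= 2:
        return 1
    if swipes_per_month <= 8:
        return 2
    return 3
```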
Eleven clustering indicators are constructed from the actual data for the Xiamen study period, as listed in Table 2, and min-max normalization is applied to them. K-Means clustering is then performed, giving the change in average distortion with the number of clusters shown in Fig. 9.
Table 2: Clustering indicators for IC card data in the group-history-based method
From Fig. 9 and the elbow rule, two clusters are optimal. The 3673184 IC card records with identified boarding stops in the study period are therefore divided into clusters $R_1$ and $R_2$; the number of records in each cluster is shown in Table 3.
Table 3: Number of records in each cluster
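The normalisation, clustering and elbow procedure described above can be sketched in plain Python. Assumptions: average distortion is taken as the mean Euclidean distance from each record to its nearest centroid, and the centroids are initialised deterministically by farthest-first selection; the text does not specify either choice, and the function names are illustrative.

```python
import math


def min_max_normalise(rows):
    """Scale every column of `rows` to [0, 1]; constant columns become 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)]
            for r in rows]


def kmeans(points, k, n_iter=100):
    """Lloyd's K-Means with farthest-first initialisation (an assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # next centroid: the point farthest from all centroids chosen so far
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # move each centroid to its cluster mean (keep it if the cluster is empty)
        updated = [[sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[i]
                   for i, cl in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def average_distortion(points, centroids):
    """Mean Euclidean distance from each point to its nearest centroid."""
    return sum(min(math.dist(p, c) for c in centroids) for p in points) / len(points)
```

Computing `average_distortion(X, kmeans(X, k))` for a range of `k` and looking for the bend in the curve gives the elbow plot; `math.dist` requires Python 3.8 or later.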
Consider two trips with broken trip chains that both board at the 8th stop in the up direction of Xiamen BRT Line 1 but belong to clusters $R_1$ and $R_2$ respectively. The group-history-based method uses the alighting stops identified by the trip-chain method for records in the same cluster to determine each trip's alighting probability at every possible alighting stop, as shown in Fig. 10. The trip in cluster $R_1$ has its highest alighting probability at the 14th stop, and the trip in cluster $R_2$ at the 16th stop.
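A minimal sketch of the group-history idea: the alighting probability of a broken-chain trip at each candidate stop is estimated from the frequency of that alighting stop among same-cluster records with the same boarding stop whose alighting stops were identified by the trip-chain method. The function name and the tuple layout are hypothetical, and `boarding_stop` could in practice be a key that also encodes line and direction.

```python
from collections import Counter


def group_alighting_probs(history, cluster, boarding_stop, candidates):
    """history: (cluster_id, boarding_stop, alighting_stop) triples for trips whose
    alighting stop was identified by the trip-chain method (hypothetical layout).
    Returns the empirical alighting probability at each candidate stop for a trip
    of the given cluster boarding at boarding_stop."""
    counts = Counter(a for c, b, a in history
                     if c == cluster and b == boarding_stop and a in candidates)
    total = sum(counts.values())
    return {s: (counts[s] / total if total else 0.0) for s in candidates}
```

For a trip boarding at the 8th stop of a 27-stop direction, `candidates` would be stops 9 to 27, e.g. `range(9, 28)`.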
(2) Logistic regression model in the second layer of the two-layer Stacking framework
The model is learned and tested on the 2425101 records whose alighting stops were identified by the trip-chain method. Pairing each record with each of its possible alighting stops yields 40113752 rows: the 2425101 rows that pair a record with its correct alighting stop have label 1, and the remaining 37688651 rows have label 0. Because of this class imbalance, the rows are resampled as shown in Table 4.
Table 4: Number of IC card records before and after sampling
As Table 4 shows, the number of label-0 rows is kept large after sampling so that too little data does not cause the model to underfit, yet the ratio of label 0 to label 1 falls from 15.54:1 before sampling to 1.82:1, a substantial improvement.
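The labelled rows counted above, one row per pairing of a trip with one of its possible alighting stops, can be produced with a sketch like the following, in which the function name, trip IDs and stop numbers are hypothetical. Because each trip contributes exactly one label-1 row, the number of label-1 rows equals the number of trips (2425101), and 40113752 rows correspond to about 16.5 candidate stops per trip on average.

```python
def expand_candidates(trips):
    """trips: (trip_id, candidate_stops, alighting_stop) for trips whose alighting stop
    was identified by the trip-chain method. Returns one (trip_id, stop, label) row per
    candidate: label 1 for the identified alighting stop, 0 for every other candidate."""
    rows = []
    for trip_id, candidates, alighting in trips:
        for stop in candidates:
            rows.append((trip_id, stop, 1 if stop == alighting else 0))
    return rows
```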
Of the 75178131 records after sampling, 90% are randomly selected as the training set of the Logistic regression model and the remaining 10% as its test set. With a penalty coefficient of 100, the F1 score on the test set is 0.67, and the resulting Logistic regression parameters are as follows:
(3) Alighting-stop identification for IC card passengers when the trip chain breaks
Table 5 shows the results of identifying the alighting stops of IC card records with broken trip chains using the two-layer Stacking framework method and the comparison methods.
Table 5: Results of different methods for identifying alighting stops
The accuracy of the two-layer Stacking framework method is examined across days of the week and times of day: the records are grouped by day of the week and by the hour of the passenger's transaction time, and the accuracy of each group is computed, as shown in Fig. 11.
The accuracy is likewise examined across card types and times of day, grouping the records by card type and by the hour of the transaction time, as shown in Fig. 12.
Finally, the accuracy is examined for passengers with different card types and loyalty levels, grouping the records by card type and loyalty, as shown in Fig. 13.
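The three breakdowns above (day of week and hour, card type and hour, card type and loyalty) all reduce to computing accuracy per group. A minimal sketch, in which the function name and the record field names are hypothetical:

```python
from collections import defaultdict


def accuracy_by_group(records, key):
    """records: dicts with 'predicted' and 'actual' alighting stops plus attributes such
    as 'weekday', 'hour', 'card_type' or 'loyalty' (hypothetical field names).
    key: function mapping a record to its group. Returns {group: accuracy in percent}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in records:
        g = key(r)
        totals[g] += 1
        hits[g] += r["predicted"] == r["actual"]  # True counts as 1
    return {g: 100.0 * hits[g] / totals[g] for g in totals}
```

For Fig. 11 the key would be `lambda r: (r["weekday"], r["hour"])`; for Fig. 13, `lambda r: (r["card_type"], r["loyalty"])`.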
5. Analysis of results
(1) As Fig. 7 shows, more than 80% of IC cards are swiped only a few times, so in the personal-history-based method each passenger's personal history data set (the passenger's records whose alighting stops were identified by the trip-chain method) is small. The group-history-based method proposed by the invention is therefore needed to overcome this shortcoming.
As Fig. 11 shows, when the records whose alighting stops were identified by the trip-chain method are used as the historical data set within different clusters, the clusters behave differently. Clustering at the record level effectively groups similar records together, which shows that the group-history-based method, which clusters with the individual IC card record as the smallest unit, is effective.
(2) Table 5 shows that the identification rate of the proposed two-layer Stacking framework method is 100.00%, the same as that of the KNN-, decision-tree- and random-forest-based methods: all of them can assign an alighting stop to every IC card record with a broken trip chain, and their identification rate is slightly higher than that of the method based on passengers' high-frequency stops and downstream-stop attraction weights.
(3) Table 5 also shows that the two methods combining aggregate and disaggregate models, namely the proposed two-layer Stacking framework method and the method based on passengers' high-frequency stops and downstream-stop attraction weights, are far more accurate than the KNN-, decision-tree- and random-forest-based methods, each of which uses only a single model; combining aggregate and disaggregate models therefore works better than using only one type of model. Of the two combined methods, the proposed method is the more accurate, because the Logistic regression model in the second layer effectively determines the weight of each first-layer method for a given data set, so the resulting model parameters fit the data better and generalize better.
(4) As Fig. 11 shows, in November 2018 accuracy was highest during the weekday morning peak (7:00:00-9:00:00) and was higher than during non-working days. As Figs. 8 and 12 show, regular cards and student cards, which together account for more than 93.5% of IC card records, also have higher accuracy during the morning peak than in other periods.
The weekday morning peak is therefore the period in which passengers travel most regularly, which is consistent with the many commuters and students who travel from fixed residences to fixed workplaces or schools during the morning peak. Figs. 8 and 13 further show that, for every card type, cards that are swiped more often have a higher alighting-stop identification accuracy: the more loyal a passenger is to BRT, the more regular their travel between origin and destination.
The proposed method differs substantially from existing typical alighting-stop identification methods; Table 6 compares them comprehensively in terms of methodological framework, applicable data volume, identification rate and other aspects.
Table 6: Comparison of the present invention with existing typical alighting-stop identification methods
The above examples are only for illustrating the present invention and are not to be construed as limiting the invention. Variations, modifications, etc. of the above-described embodiments are intended to fall within the scope of the claims of the present invention, as long as they are in accordance with the technical spirit of the present invention.

Claims (8)

CN202011593440.4A | Priority date: 2020-12-29 | Filing date: 2020-12-29 | A method for identifying the alighting station of bus IC card passengers when the travel chain is broken | Active | CN112733891B (en)

Publications (2)

Publication Number | Publication Date
CN112733891A | 2021-04-30
CN112733891B | 2023-08-01

Family

ID=75607891

Country Status (1)

Country | Link
CN (1) | CN112733891B (en)

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
