Feature extraction method based on reinforcement learning of mobile phone signaling historical data
Technical Field
The invention relates to the field of mobile phone signaling data processing, in particular to a feature extraction method based on mobile phone signaling historical data reinforcement learning.
Background
As mobile phones have grown more capable, people have become increasingly dependent on them in daily life, for example for mobile payment, navigation, and information lookup. A phone therefore accompanies its user almost around the clock, and its position information represents the user's position to a large extent. For this reason, mobile phone signaling data is widely used as one of the data sources for travel information acquisition.
At present, travel analysis based on mobile phone signaling mainly identifies trip origins and destinations in two ways. The first determines them by judging the trajectory from handover and location update information in the signaling data; the second determines them with hierarchical or density-based clustering algorithms.
However, the widely used techniques still do not solve the problem well. In particular, when the coordinates of mobile phone base stations carry identification errors, it is difficult to obtain the user's geographic information accurately. In addition, the mismatch between base station coverage areas and traffic cells causes errors in identifying user positions, produces data drift and ping-pong phenomena, and further aggravates errors in identifying trip origins and destinations.
On the algorithm side, existing methods for processing mobile phone signaling data also suffer from insufficient handling of noise data, unclear rules for assigning data to travel states, ill-defined thresholds, unsatisfactory clustering of point data, and difficulty in recognizing travel modes and mode choice. These lead to pain points such as low data utilization and travel analysis results that do not match reality, so a new method is needed.
Disclosure of Invention
The problem to be solved by the invention is to provide a feature extraction method based on reinforcement learning of mobile phone signaling historical data, which addresses the accuracy of user travel analysis and data statistics and improves the utilization of mobile phone signaling data.
The invention adopts the following technical scheme: a feature extraction method based on mobile phone signaling history data reinforcement learning comprises the following steps:
S10, constructing a historical database from the mobile phone signaling historical data of the most recent period, the historical database being used for the double reinforcement learning of mobile phone signaling data;
S20, first reinforcement learning (data feature learning): constructing a multichannel convolution Bayesian learning algorithm, learning the signal strength, time length, and coordinate features of the mobile phone signaling historical data, eliminating the drift data and ping-pong data present in the user's initial mobile phone signaling data, filling in missing data points, and outputting high-credibility complete trip sequence data of the user;
S30, second reinforcement learning (travel behavior learning): defining a travel coincidence degree, learning from the historical data that has a high travel coincidence degree with the user's complete trip sequence data, decomposing the data into three motion states of travel, stillness, and small-range activity, outputting the motion state fuzzy weights of the three states, and constructing a fuzzy travel membership set;
S40, in combination with the fuzzy travel membership set, providing a three-Gaussian mixture clustering algorithm based on fuzzy travel membership, clustering the user's complete trip sequence data, extracting a travel set and a stay point set comprising two subsets, a stationary set and an edge motion set, and calculating the coverage area of each stay point set;
S50, reading the land use information within the coverage area of each stay point set through the API (application program interface) of a map platform, providing a double-dynamic POI (point of interest) similarity mapping algorithm that dynamically adjusts its parameters according to the land use information, performing geographic information mapping and POI matching on the stay point sets, and outputting the user's accurate origin-destination points and complete travel chain;
S60, reading and processing the data of all users, outputting the travel OD matrix between traffic cells for the same day divided by time period, compiling statistics on user travel modes and patterns, and feeding the user data results back to the historical database in S10 for parameter updating.
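As an illustration of the OD matrix output in S60, the following minimal sketch counts trips per time slot and per pair of traffic cells. It assumes that each user's travel chain has already been reduced to (departure time, origin cell, destination cell) tuples; the function name `build_od_matrix`, the one-hour slot width, and the cell labels are hypothetical and are not part of the claimed method.

```python
from collections import Counter
from datetime import datetime


def build_od_matrix(trips, slot_hours=1):
    """Count trips per (time slot, origin cell, destination cell).

    `trips` is an iterable of (departure_time, origin_cell, destination_cell);
    the cell identifiers are hypothetical traffic-cell labels.
    """
    od = Counter()
    for departure, origin, destination in trips:
        slot = departure.hour // slot_hours  # index of the time slot within the day
        od[(slot, origin, destination)] += 1
    return od


# Hypothetical trips of one day.
trips = [
    (datetime(2024, 1, 1, 8, 5), "cell_A", "cell_B"),
    (datetime(2024, 1, 1, 8, 40), "cell_A", "cell_B"),
    (datetime(2024, 1, 1, 18, 10), "cell_B", "cell_A"),
]
od = build_od_matrix(trips)
```

A sparse counter is used here instead of a dense matrix because most cell pairs carry no trips in a given time slot.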
Specifically, according to the characteristics of mobile phone signaling historical data, each point record stored in the historical database of S10 comprises: a user identifier; the user's mobile phone number; a timestamp with date and time information; position information comprising longitude and latitude coordinates and the base station attribution; signal strength information reflecting the signal quality between the base station and the mobile phone; and the three-valued dynamic and static weights. The record format is as follows:
$$MS(U_1, U_2, \ldots, U_s)$$
$$U_s(i) = \{U_s,\ U_{sp},\ DATE,\ MM,\ (Lo, La, LCI),\ SI,\ Tra(j)\}^{T}$$
wherein MS(Us) is a complete sample of user mobile phone signaling historical data, Us(i) is the i-th point record of the trip sequence data of user Us, Usp is the mobile phone number of user Us, DATE is the date in year/month/day format, MM is the time length data in hour/minute/second format, Lo is the user's longitude, La is the user's latitude, LCI is the base station cell attribution coordinate, SI is the signal strength, Tra(j) is the j-th motion state fuzzy weight value, and T denotes matrix transposition.
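The record format above can be pictured as a simple data structure. The following sketch is only one possible in-memory representation; the class name `SignalingPoint`, the field names, the string formats, and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SignalingPoint:
    """One point record Us(i); field names are illustrative, not from the source."""
    user_id: str                # Us, user identifier
    phone_number: str           # Usp
    date: str                   # DATE, year/month/day
    time: str                   # MM, hour/minute/second
    lon: float                  # Lo, longitude
    lat: float                  # La, latitude
    lci: str                    # LCI, base station cell attribution
    signal_strength: float      # SI
    # Tra(1..3): fuzzy weights for travel, stillness, small-range activity
    tra: Tuple[float, float, float] = (0.0, 0.0, 0.0)


# Hypothetical sample record; the weights stay at zero until S30 assigns them.
point = SignalingPoint("U1", "000-0000-0000", "2024/01/01", "08:05:00",
                       118.78, 32.04, "LCI-001", -85.0)
```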
Specifically, in S20, drift data refers to noise records in which the user data suddenly jumps to an abnormal value and then switches back to the original value within a short time; ping-pong data refers to noise records in which the user data switches back and forth between the coverage areas of two base stations. The first reinforcement learning of the double reinforcement learning of mobile phone signaling data comprises the following steps:
S21, performing format conversion on the historical data in S10 and the initial data of the user in S20, normalizing the timestamp data, longitude and latitude coordinates and signal strength information, and ensuring that the scales are consistent;
S22, constructing a multichannel convolution Bayesian learning algorithm, learning historical data, calculating abnormal scores of user data according to a learning result, and eliminating abnormal data and carrying out Bayesian processing according to a dynamic threshold.
The multichannel convolution Bayesian learning algorithm comprises the following steps:
S221, taking the mobile phone signaling historical data as input to perform feature learning, performing CNN first-order learning on a signal intensity channel, and outputting a one-dimensional convolution feature learning result, wherein the one-dimensional convolution feature learning result is expressed as follows:
Wherein X1(i) is the one-dimensional convolution feature learning result of the i-th point location data, bSI is the signal strength bias value, ωSI is the signal strength weight, PSIl(i) is the signal loss probability of the i-th point location data, and N is the data quantity of the signaling information input;
S222, on the basis of S221, performing second-order CNN learning in combination with the time channel, and outputting a two-dimensional convolution feature learning result, expressed as follows:
Wherein X2(i) is the two-dimensional convolution feature learning result of the i-th point location data, bTIM is the timestamp offset value, σTIM is the time compensation parameter, ωTIM is the time weight, PTIMl(i) is the time loss probability of the i-th point location data, the input is the two-dimensional convolution input, and n and p are the data quantities of the signaling information input;
S223, on the basis of S222, performing third-order CNN learning in combination with the coordinate channel, and outputting a three-dimensional convolution feature learning result, expressed as follows:
Wherein X3(i) is the three-dimensional convolution feature learning result of the i-th point location data, bLOA is the coordinate offset value, σLOA is the coordinate balance parameter, ωLOA is the coordinate weight, PLOAl(i) is the coordinate loss probability of the i-th point location data, the input is the three-dimensional convolution input, and n, p, and q are the data quantities of the signaling information input;
S224, introducing user initial data, and calculating anomaly score for each point location data:
Wherein:
AS(i) is the anomaly score of the i-th point location data of the user data Us, ASm(i) is the anomaly residual in the m-th dimension, ρm is the dimension index, Xm(i) is the convolution feature learning result in the m-th dimension, and J is the total number of historical data, with j = 1, 2, …, J;
Usi(SI), Usi(MM), and Usi(LOA) are the signal strength, the time length data, and the user coordinate value of the i-th point data in the user data, respectively, and Hj(SI), Hj(MM), and Hj(LOA) are the signal strength, the time length information, and the coordinate information of the j-th historical data, respectively;
Sd, Dd, and Td are the one-dimensional, two-dimensional, and three-dimensional data forms, and ωs, ωd, and ωt are the one-dimensional, two-dimensional, and three-dimensional anomaly correction weights.
S225, introducing Bayesian posterior observation smoothing prediction, correcting and eliminating data on the basis of the historical data, and outputting the corrected point data of the user:
Wherein,
the output is the point data of the user's complete trip sequence, MAXA is the maximum allowable anomaly value, A(i) is the dynamic anomaly segmentation value of the i-th point data, α and β are dynamic parameters, Bay(Us(i)) denotes the Bayesian posterior observation smoothing prediction of the i-th data point, and the last term is the posterior observation probability of the data Y(j) and the data X(i);
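To make the drift definition of S20 concrete, the following sketch flags points that leave a base station cell and return to it within a short time. It is a plain rule-based baseline written for illustration and is not the multichannel convolution Bayesian algorithm described above; the function name `flag_short_excursions`, the 120-second window, and the (timestamp, cell) input format are assumptions.

```python
def flag_short_excursions(records, max_gap_s=120):
    """Return indices of points that jump to another cell and come straight back.

    `records` is a time-ordered list of (timestamp_seconds, cell_id).
    A point is flagged when its neighbours share one cell, the point itself
    lies in a different cell, and the neighbours are at most `max_gap_s` apart.
    """
    flagged = []
    for i in range(1, len(records) - 1):
        t_prev, c_prev = records[i - 1]
        _, c_cur = records[i]
        t_next, c_next = records[i + 1]
        if c_prev == c_next and c_cur != c_prev and t_next - t_prev <= max_gap_s:
            flagged.append(i)
    return flagged


# Hypothetical sequence: the brief visit to cell "B" at index 2 is a drift point.
records = [(0, "A"), (30, "A"), (60, "B"), (90, "A"), (600, "C")]
flagged = flag_short_excursions(records)
```

On a ping-pong sequence such as A, B, A, B, A the same rule flags every interior point, so only the two end records survive; a production rule would likely need a dedicated treatment of that case.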
Specifically, the second reinforcement learning of the double reinforcement learning of mobile phone signaling data comprises the following steps:
S31, retrieving data from the historical database and calculating the travel coincidence degree CR(i) between each historical record and the user's complete trip sequence data obtained in S20:
Wherein CR(i) is the coincidence degree between the i-th historical data and the user data, H(i) is the extracted i-th historical data, a coincidence index scale value is applied, ε is the spatial similarity decay parameter, X denotes an item of information in the user's trip sequence data, and XH(i) denotes the same item X in the i-th historical data;
S32, setting a coincidence degree dividing threshold CRT according to the data quantity, and extracting historical data meeting the condition that CR (i) is more than or equal to CRT to construct an available data set HC, wherein the method is specifically expressed as follows:
HC={HC1,HC2,HC3,…,HCk}
S33, constructing a travel covariance matrix incorporating the coincidence weight according to the available data set, specifically expressed as follows:
Wherein, Rcov HC is a travel covariance matrix incorporating coincidence weights, RHC is a matrix element, CRT is a coincidence degree dividing threshold, COV is a covariance function, K is the data volume contained in the dataset HC, and k=1, 2, …, K;
S34, adopting an algorithm of combining PCA with K-means clustering, carrying out principal component analysis on Rcov HC, dividing data into three motion states of travel, stillness and small-range activity on a new feature matrix, and giving a fuzzy weight value Tra (i), wherein the specific expression is as follows:
Wherein,
Tra(1), Tra(2), and Tra(3) are the fuzzy weight values of the three motion states of travel, stillness, and small-range activity, respectively; the three balance coefficients are the time fuzzy balance coefficients of the three motion states of travel, stillness, and small-range activity, respectively; λC = {λC1, λC2, …, λCK} are the coincidence characteristic roots of Rcov HC; and Nm is the data quantity of the m-th class in the clustering result;
S35, constructing a fuzzy travel membership parameter set μf(i) according to the fuzzy weight values of the three motion states in S34:
Wherein μf(1), μf(2), and μf(3) are the fuzzy membership parameters of the three states of travel, stillness, and small-range activity, respectively; the distance term is the spatial distance between two consecutive data points of the user's complete trip sequence data; εT, ζT, and ηT are the first-order, second-order, and third-order fuzzy matching coefficients of the travel state; εS, ζS, and ηS are those of the stationary state; and εA, ζA, and ηA are those of the small-range activity state.
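The selection of the available data set HC in S31 and S32 can be sketched as follows. The coincidence formula of the source is not reproduced here; the score below, the mean of exp(-d/ε) over aligned points, is only a stand-in chosen because ε is described as a spatial decay parameter. The function names, the planar metre coordinates, and the default ε = 500 are all assumptions.

```python
import math


def coincidence(user_seq, hist_seq, epsilon=500.0):
    """Illustrative overlap score in [0, 1]: mean of exp(-d / epsilon).

    Sequences are lists of (x, y) in metres in a local planar frame
    (an assumption); points are compared position by position.
    """
    n = min(len(user_seq), len(hist_seq))
    if n == 0:
        return 0.0
    total = 0.0
    for (ux, uy), (hx, hy) in zip(user_seq, hist_seq):
        total += math.exp(-math.hypot(ux - hx, uy - hy) / epsilon)
    return total / n


def select_available(user_seq, history, crt):
    """Keep the historical sequences whose score reaches the threshold CRT."""
    return [h for h in history if coincidence(user_seq, h) >= crt]


# Hypothetical data: the first historical sequence matches the user exactly.
user = [(0.0, 0.0), (100.0, 0.0)]
history = [[(0.0, 0.0), (100.0, 0.0)], [(5000.0, 5000.0), (6000.0, 5000.0)]]
hc = select_available(user, history, crt=0.8)
```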
Specifically, in step S40, the three-Gaussian mixture clustering algorithm based on fuzzy travel membership (FTT-GMM) comprises the following steps:
S41, adopting the idea of three-way decisions and combining the three states of travel behavior described in S30, constructing a three-weight GMM posterior probability function, and iterating the function parameters with the CS algorithm to obtain the three-weight GMM posterior probability value of each point data in the user data, specifically expressed as follows:
Wherein,
the posterior probability value is the three-weight GMM posterior probability that the i-th point data of the user's complete trip sequence belongs to the k-th Gaussian component, k = 3; π is the probability that the data belongs to the k-th Gaussian component; pt(x) is the three-weight GMM probability density function; Tra(k) is the fuzzy weight value of the k-th Gaussian component; the mean term is the vector dimension ablation mean of the k-th Gaussian component; Σk is the tri-state covariance matrix of the k-th Gaussian component; and the remaining coefficient is the data dimension coefficient;
S42, defining three GMM cluster division thresholds based on the fuzzy trip membership parameter set;
Wherein σTLV is the upper threshold for dividing the three travel states and σLLV is the lower threshold; θTf, θSf, and θAf are the loss costs of assigning a point to travel, stillness, and small-range activity, respectively, under a fuzzy decision; θTc, θSc, and θAc are the corresponding loss costs under a clear decision; Ω is the fuzzy state decision index; and the remaining index is the clear state decision index;
S43, clustering the data of each point and dividing them into the three classes of travel, stationary, and edge motion according to the thresholds in S42;
Wherein,
TCU={TC1,TC2,TC3}
TCU is a clustering result set, TC1 is a travel set, TC2 is a static set, and TC3 is an edge motion set;
S44, merging the stationary set and the edge motion set, constructing a stay point set, and calculating a coverage area weighted radius SORi of each stay point set, wherein the coverage area weighted radius SOR is specifically expressed as:
SU={TC2}∪{TC3}
Wherein SU is the stay point set, SRi is the coverage area weighted radius of the i-th stay point set, max<·> denotes taking the maximum, srij is the two-dimensional coverage radius of the j-th point data in the i-th stay point set, f(·) denotes range-mapping normalization, v is the conversion rate control parameter, τi is the point set clustering attribution coefficient, and ‖dj-Di‖2 denotes the Euclidean distance between the j-th point data in the i-th stay point set and its center point Di;
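Steps S43 and S44 can be pictured with the following sketch. It assumes that the division is made on a single per-point posterior score for the travel state, with points at or above the upper threshold assigned to travel, points at or below the lower threshold assigned to the stationary set, and the remainder to the edge motion set. The radius is an unweighted stand-in (largest distance to the centroid) for the weighted radius of the source. All names and threshold values are hypothetical.

```python
import math


def three_way_split(posteriors, upper, lower):
    """Split point indices into travel (TC1), stationary (TC2), edge motion (TC3)."""
    tc1, tc2, tc3 = [], [], []
    for i, p in enumerate(posteriors):
        if p >= upper:
            tc1.append(i)   # confidently travelling
        elif p <= lower:
            tc2.append(i)   # confidently stationary
        else:
            tc3.append(i)   # boundary region: edge motion
    return tc1, tc2, tc3


def coverage_radius(points):
    """Largest Euclidean distance from the centroid of a stay point set."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return max(math.hypot(x - cx, y - cy) for x, y in points)


# Hypothetical travel posteriors for four points.
tc1, tc2, tc3 = three_way_split([0.9, 0.1, 0.5, 0.05], upper=0.7, lower=0.2)
stay_indices = sorted(tc2 + tc3)           # SU = TC2 ∪ TC3
radius = coverage_radius([(0.0, 0.0), (2.0, 0.0)])
```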
Specifically, in step S50, the dual dynamic POI similarity mapping algorithm includes the following steps:
S51, taking the longitude and latitude coordinates of the center point of the stay point set as the center, reading the land use information within the coverage area weighted radius of the stay point set, the land use information comprising the population, the building rectangular degree, the urban degree, the number of POIs, and the daily average travel amount, and defining a matching domain density Lρ, specifically expressed as follows:
Wherein Lρ is the density of the matching domain, ADT is the daily average travel in the domain, AR is the building rectangle, NPOI is the number of POIs in the domain, peo is the number of population in the domain, SR is the weighted radius of the coverage domain, CLL is the urban coefficient in the domain, and lambda is the dimension normalization coefficient;
S52, dynamically adjusting the matching domain: the matching domain radius SRρ of the stay point set is corrected according to the matching domain density described in S51, specifically expressed as:
Wherein SRρ is the radius of the matching domain, SR (&) is the calculation of the weighted radius of the coverage domain, Lρ is the density of the matching domain, sri is the two-dimensional coverage radius of the ith point location, τi is the cluster attribution coefficient, deltar is the radius adjustment step length, and Tρ is the density dividing value;
S53, dynamic similarity weighted matching: extracting the POIs within the radius range of the matching domain in S52, constructing POI candidate sets, providing a dynamic and static set weighted similarity estimation algorithm, outputting similarity estimation values of all candidate POIs, and finally matching POI points of origin and destination points and complete travel chains, wherein the specific expression is as follows:
POIEi=max<SIM(k)>
PATHUs={POIE1,POIE2,…,POIEn}
Wherein SIM(k) is the similarity estimate of the k-th POI in the matching domain of the i-th stay point set; the two vector terms are the coordinate vector of the k-th POI in the i-th matching domain and the coordinate vector of the j-th point data in the i-th stay point set; ‖·‖ denotes the vector norm; the radius term is the region radius of the k-th POI; srij is the two-dimensional coverage radius of the j-th point data in the i-th stay point set; ωTj is the state set inclination weight; ed(·) denotes the similarity-adjusted Euclidean distance; POIEi takes the POI with the maximum similarity estimate as the accurate POI of the i-th origin-destination point; and PATHUs is the complete travel chain of user Us;
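The selection rule POIEi = max<SIM(k)> and the assembly of PATHUs can be sketched as follows. The similarity function used here, the inverse of one plus the mean distance between a POI and the stay points, is a hypothetical placeholder for the weighted similarity estimate of the source; the POI names and coordinates are invented for illustration.

```python
import math


def similarity(poi_xy, stay_points):
    """Hypothetical similarity: 1 / (1 + mean distance from the POI to the stay points)."""
    mean_d = sum(math.hypot(poi_xy[0] - x, poi_xy[1] - y)
                 for x, y in stay_points) / len(stay_points)
    return 1.0 / (1.0 + mean_d)


def match_poi(candidates, stay_points):
    """Return the name of the candidate POI with the largest similarity (POIEi)."""
    return max(candidates, key=lambda name: similarity(candidates[name], stay_points))


def travel_chain(stay_sets, candidate_sets):
    """Match one POI per stay point set, in visiting order (PATHUs)."""
    return [match_poi(cands, pts) for pts, cands in zip(stay_sets, candidate_sets)]


# Two hypothetical stay point sets with their candidate POIs.
stay_sets = [[(0.0, 0.0), (1.0, 0.0)], [(10.0, 10.0)]]
candidate_sets = [
    {"home": (0.5, 0.0), "mall": (5.0, 5.0)},
    {"office": (10.0, 10.0), "park": (20.0, 20.0)},
]
chain = travel_chain(stay_sets, candidate_sets)
```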
Specifically, in step S60, after the data of all users have been read and processed, the users' complete trip sequence data and the motion state fuzzy weights {Tra} are returned to the historical database of S10 for updating.
The technical scheme of the invention also provides: an electronic device, comprising:
One or more processors;
a storage device having one or more programs stored thereon;
when the one or more programs are executed by the one or more processors, the one or more processors implement any of the feature extraction methods based on reinforcement learning of mobile phone signaling history data described above.
The technical scheme of the invention also provides a computer readable storage medium, on which a computer program is stored, which when being executed by a processor, realizes the steps in any feature extraction method based on the reinforcement learning of the mobile phone signaling historical data.
Compared with the prior art, the technical scheme provided by the invention has the following technical effects:
On the basis of double reinforcement learning, the feature extraction method based on reinforcement learning of mobile phone signaling historical data further ensures the accuracy and reliability of mobile phone signaling data as a data source for traffic travel research. The proposed clustering algorithm improves the data clustering effect, and the dynamic information matching mechanism enriches the information available about the user's trip origins and destinations. In addition, the historical database built from the original and processed user data can be reused for other travel analysis and research in the future, which improves the utilization of user mobile phone signaling data and addresses the accuracy of user travel analysis and data statistics.
Drawings
Fig. 1 is a flowchart of a trip origin-destination extraction algorithm based on reinforcement learning of mobile phone signaling history data according to an embodiment of the present application;
FIG. 2 is an algorithm flow chart of a trip origin-destination extraction algorithm based on reinforcement learning of mobile phone signaling history data;
FIG. 3 compares the user's raw data with the finally extracted accurate origin-destination points of the user.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present invention clearer, the technical solutions of the application are further described below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention. All other embodiments obtained by those skilled in the art without creative effort fall within the scope of the invention. The step numbers in the embodiments are used only for convenience of description; they do not limit the order of the steps, and the execution order of the steps can be adjusted as understood by those skilled in the art.
The invention discloses a feature extraction method based on reinforcement learning of mobile phone signaling historical data, which comprises the following specific steps as shown in fig. 1:
S10, constructing a historical database from the mobile phone signaling historical data of the most recent period, the historical database being used for the double reinforcement learning of mobile phone signaling data;
S20, first reinforcement learning (data feature learning): constructing a multichannel convolution Bayesian learning algorithm, learning the signal strength, time length, and coordinate features of the mobile phone signaling historical data, eliminating the drift data and ping-pong data present in the user's initial mobile phone signaling data, filling in missing data points, and outputting high-credibility complete trip sequence data of the user;
S30, second reinforcement learning (travel behavior learning): defining a travel coincidence degree, learning from the historical data that has a high travel coincidence degree with the user's complete trip sequence data, decomposing the data into three motion states of travel, stillness, and small-range activity, outputting the motion state fuzzy weights of the three states, and constructing a fuzzy travel membership set;
S40, in combination with the fuzzy travel membership set, providing a three-Gaussian mixture clustering algorithm based on fuzzy travel membership, clustering the user's complete trip sequence data, extracting a travel set and a stay point set comprising two subsets, a stationary set and an edge motion set, and calculating the coverage area of each stay point set;
S50, reading the land use information within the coverage area of each stay point set through the API (application program interface) of a map platform, providing a double-dynamic POI (point of interest) similarity mapping algorithm that dynamically adjusts its parameters according to the land use information, performing geographic information mapping and POI matching on the stay point sets, and outputting the user's accurate origin-destination points and complete travel chain;
S60, reading and processing the data of all users, outputting the travel OD matrix between traffic cells for the same day divided by time period, compiling statistics on user travel modes and patterns, and feeding the user data results back to the historical database in S10 for parameter updating.
In one embodiment of the invention, taking the mobile phone signaling data collected in a certain city as an example, a history database is constructed according to the mobile phone signaling history data in the last year, and meanwhile, the mobile phone signaling original data of a certain user in the city is taken as a processing sample to carry out reinforcement learning and feature extraction, and the specific method is as follows:
Step one: user data cleaning and preprocessing.
Historical database creation and user data reading: according to the characteristics of mobile phone signaling data, each point record stored in the historical database is defined to comprise a user identifier; the user's mobile phone number; a timestamp with date and time information; position information comprising longitude and latitude coordinates and the base station attribution; signal strength information reflecting the signal quality between the base station and the mobile phone; and the three-valued dynamic and static weights. The record format is as follows:
$$MS(U_1, U_2, \ldots, U_s)$$
$$U_s(i) = \{U_s,\ U_{sp},\ DATE,\ MM,\ (Lo, La, LCI),\ SI,\ Tra(j)\}^{T}$$
Wherein MS(Us) is a complete user data sample, Us(i) is the i-th point record of the trip sequence data of user Us, Usp is the mobile phone number of user Us, DATE is the date in year/month/day format, MM is the time length data in hour/minute/second format, Lo is the user's longitude, La is the user's latitude, LCI is the base station cell attribution coordinate, SI is the signal strength, and Tra(j) is the j-th motion state fuzzy weight value.
Step two: double reinforcement learning of mobile phone signaling data.
In this embodiment, based on the mobile phone signaling history data, double reinforcement learning is performed to determine the accurate trip origin-destination and the complete trip chain of the user, as shown in fig. 2.
First, the first reinforcement learning (data feature learning) is performed.
To begin, format conversion is performed on the mobile phone signaling historical data in S10 and the user's initial mobile phone signaling data in S20, and the timestamp data, longitude and latitude coordinates, and signal strength information are normalized so that their scales are consistent;
Then, a multichannel convolution Bayesian learning algorithm is constructed, which specifically comprises the following steps:
First, feature learning is performed with a CNN: first-order CNN learning is performed on the signal strength channel to output a one-dimensional convolution feature learning result; second-order CNN learning is performed in combination with the time channel to output a two-dimensional convolution feature learning result; and third-order CNN learning is performed in combination with the coordinate channel to output a three-dimensional convolution feature learning result.
Next, introducing user initial mobile phone signaling data, and calculating an anomaly score for each point location data:
Wherein AS(i) is the anomaly score of the i-th point location data of the user data Us, ASm(i) is the anomaly residual in the m-th dimension, and ρm is the dimension index.
Next, Bayesian posterior observation smoothing prediction is introduced, data are corrected and eliminated on the basis of the historical data, and the corrected point data of the user are output:
Wherein the output is the point data of the user's complete trip sequence, and MAXA is the maximum allowable anomaly value;
A(i) is the dynamic anomaly segmentation value of the i-th point data, and Bay(Us(i)) denotes the Bayesian posterior observation smoothing prediction of the i-th data point;
Then, the second reinforcement learning (travel behavior learning) is performed.
First, data are retrieved from the historical database, and the travel coincidence degree CR(i) between each historical record and the user's complete trip sequence data is calculated, specifically expressed as:
Wherein CR(i) is the coincidence degree between the i-th historical data and the user's complete trip sequence data, H(i) is the extracted i-th historical data, a coincidence index scale value is applied, ε is the spatial similarity decay parameter, X denotes an item of information in the user's trip sequence data, and XH(i) denotes the same item X in the i-th historical data;
Next, a coincidence degree dividing threshold CRT is set according to the data volume, and the historical data meeting the conditions of CR (i) not less than CRT is extracted to construct an available data set HC, which is specifically expressed as:
HC={HC1,HC2,HC3,…,HCk}
Next, a travel covariance matrix incorporating the coincidence weight is constructed, specifically expressed as:
Wherein Rcov HC is the travel covariance matrix incorporating the coincidence weight, RHC is a matrix element, and CRT is the coincidence degree dividing threshold;
Next, an algorithm combining PCA with K-means clustering is used: principal component analysis is performed on Rcov HC, the data are divided on the new feature matrix into the three motion states of travel, stillness, and small-range activity, and fuzzy weight values Tra(i) are assigned; Tra(1), Tra(2), and Tra(3) are the fuzzy weight values of the three motion states of travel, stillness, and small-range activity, respectively;
Next, according to the fuzzy weight values of the three motion states, a fuzzy travel membership parameter set μf(i) is constructed, specifically expressed as follows:
Wherein μf(1), μf(2), and μf(3) are the fuzzy membership parameters of the three states of travel, stillness, and small-range activity, respectively; the distance term is the spatial distance between two consecutive data points of the user's complete trip sequence data; εT, ζT, and ηT are the first-order, second-order, and third-order fuzzy matching coefficients of the travel state; εS, ζS, and ηS are those of the stationary state; and εA, ζA, and ηA are those of the small-range activity state;
Step three: user origin-destination extraction and geographic information matching.
The user's raw data are compared with the finally extracted accurate origin-destination points of the user in fig. 3. The method comprises the following steps:
Firstly, extracting a stay point set and demarcating a coverage area:
In the embodiment, on the basis of the fuzzy travel membership set, three Gaussian mixture clustering algorithm (FTT-GMM) based on the fuzzy travel membership degree is provided, and the specific process of the FTT-GMM algorithm is as follows:
First, following the idea of three-way decisions and combining the three states of travel behavior, a three-weight GMM posterior probability function is constructed, and the function parameters are iterated with the CS algorithm to obtain the three-weight GMM posterior probability value of each point data in the user data, specifically expressed as follows:
Wherein the posterior probability value is the three-weight GMM posterior probability that the i-th point data in the user's complete travel sequence belongs to the k-th Gaussian component, with k = 3; π denotes the probability that the data belongs to the k-th Gaussian component; pt(x) is the three-weight GMM probability density function; Tra(k) is the fuzzy weight value of the k-th Gaussian component; the mean term is the vector-dimension ablation mean of the k-th Gaussian component; Σk is the tri-state covariance matrix of the k-th Gaussian component; and the remaining coefficient is the data dimension coefficient;
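As a rough illustration of a weighted GMM posterior, the sketch below assumes that the posterior of component k is proportional to Tra(k) · π_k · N(x | μ_k, Σ_k) for two-dimensional point data. This proportionality is an assumption made for illustration; the source's exact expression (including the data dimension coefficient and the CS parameter iteration) is not reproduced here, and the function names are hypothetical.

```python
import math

def gaussian_pdf_2d(x, mean, cov):
    """Density of a 2-D normal distribution at point x."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form using the closed-form inverse of a 2x2 matrix.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def weighted_posterior(x, tra, pi, means, covs):
    """Assumed posterior: proportional to Tra(k) * pi_k * N(x | mu_k, Sigma_k)."""
    scores = [
        t * p * gaussian_pdf_2d(x, m, s)
        for t, p, m, s in zip(tra, pi, means, covs)
    ]
    total = sum(scores)
    if total == 0.0:
        # Every density underflowed; fall back to a uniform posterior.
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]

# Illustrative parameters (all values are made up for the example).
identity = ((1.0, 0.0), (0.0, 1.0))
tra = [0.5, 0.3, 0.2]                       # fuzzy weights Tra(1..3)
pi = [1 / 3, 1 / 3, 1 / 3]                  # mixing probabilities
means = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
covs = [identity, identity, identity]

posterior = weighted_posterior((0.2, -0.1), tra, pi, means, covs)
```

In a full implementation the parameters would be refined iteratively rather than fixed as above.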
Then, the three-way GMM cluster division thresholds are defined on the basis of the fuzzy trip membership parameter set;
Wherein σTLV is the upper threshold for dividing the three travel states and σLLV is the lower threshold; θTf, θSf and θAf are the loss costs of assigning the point data to the travel, stationary and small-range activity states, respectively, under a fuzzy decision; θTc, θSc and θAc are the loss costs of assigning the point data to the travel, stationary and small-range activity states, respectively, under a clear decision; Ω is the fuzzy-state decision index, with a corresponding clear-state decision index;
Then, each point data is clustered and divided into three classes (travel, stationary and edge motion) according to the three-way GMM cluster division thresholds;
Wherein,
TCU={TC1,TC2,TC3}
TCU is the clustering result set, TC1 is the travel set, TC2 is the stationary set, and TC3 is the edge motion set;
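The threshold-based division into TC1, TC2 and TC3 can be sketched with the standard three-way decision rule: accept above the upper threshold, reject below the lower threshold, and defer in between. Applying that rule to each point's travel posterior is an assumption; the source's own division expression is not reproduced, and the function and argument names are hypothetical.

```python
def three_way_divide(travel_posteriors, upper, lower):
    """Split point indices into travel (TC1), stationary (TC2), edge motion (TC3).

    Assumed rule: p >= upper -> TC1, p <= lower -> TC2, otherwise TC3.
    `upper` and `lower` play the roles of the upper and lower thresholds.
    """
    if not lower < upper:
        raise ValueError("lower threshold must be below upper threshold")
    tc1, tc2, tc3 = [], [], []
    for idx, p in enumerate(travel_posteriors):
        if p >= upper:
            tc1.append(idx)      # confidently travelling
        elif p <= lower:
            tc2.append(idx)      # confidently stationary
        else:
            tc3.append(idx)      # boundary region: edge motion
    return {"TC1": tc1, "TC2": tc2, "TC3": tc3}

# Example with made-up posterior values and thresholds.
tcu = three_way_divide([0.9, 0.1, 0.5, 0.75, 0.2], upper=0.75, lower=0.2)
```

The boundary class is what distinguishes a three-way division from a single-threshold split: uncertain points are kept apart instead of being forced into travel or stationary.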
Finally, the stationary set and the edge motion set are merged to construct the stay point set, and the coverage domain weighted radius SRi of each stay point set is calculated, specifically expressed as follows:
SU={TC2}∪{TC3}
Wherein SU is the stay point set, SRi is the coverage domain weighted radius of the i-th stay point set, max⟨·⟩ denotes the maximum value operation, srij is the two-dimensional coverage radius of the j-th point data in the i-th stay point set, f(·) denotes the range mapping normalization operation, v is the transformation rate control parameter, τi is the point set clustering attribution coefficient, and ‖dj − Di‖2 denotes the Euclidean distance between the j-th point data in the i-th stay point set and the center point Di.
Second, travel origin-destination information matching:
Through the API (application programming interface) of a map platform, land-use information within the coverage domain of each stay point set, including POIs (points of interest), population and other factors, is read. A double-dynamic POI similarity mapping algorithm is proposed in this embodiment, and proceeds as follows:
First, taking the longitude and latitude coordinates of the center point of the stay point set as the center, the land-use information within the coverage domain weighted radius of the stay point set is read, including population, building rectangularity, degree of urbanization, number of POIs and daily average travel volume, and the matching domain density Lρ is defined, specifically expressed as follows:
Wherein ADT is the daily average travel volume in the domain, AR is the building rectangularity, NPOI is the number of POIs in the domain, Peo is the population in the domain, SR is the coverage domain weighted radius, CLL is the urbanization coefficient in the domain, and λ is the dimension normalization coefficient;
Then, the matching domain is dynamically adjusted: the matching domain radius SRρ of the stay point set is corrected according to the matching domain density:
Wherein SRρ is the matching domain radius, SR(·) is the coverage domain weighted radius operation, Lρ is the matching domain density, sri is the two-dimensional coverage radius of the i-th point, τi is the cluster attribution coefficient, Δr is the radius adjustment step, and Tρ is the density division value;
Then, dynamic similarity weighted matching is performed:
The POIs within the matching domain radius are extracted to construct a POI candidate set; a dynamic and static set weighted similarity estimation algorithm is proposed to output the similarity estimate of every candidate POI; and finally the POIs of the origin-destination points and the complete travel chain are matched, specifically expressed as follows:
POIEi=max<SIM(k)>
PATHUs={POIE1,POIE2,…,POIEn}
Wherein SIM(k) is the similarity estimate of the k-th POI in the matching domain corresponding to the i-th stay point set; the two coordinate vectors are those of the k-th POI in the i-th matching domain and of the j-th point data in the i-th stay point set; ‖·‖ denotes the vector norm operation; the region radius is that of the k-th POI; srij is the two-dimensional coverage radius of the j-th point data in the i-th stay point set; ωTj is the state set inclination weight; ed(·) denotes the similarity-adjusted Euclidean distance operation; POIEi is the POI with the maximum similarity estimate, taken as the accurate POI of the i-th origin-destination point; and PATHUs is the complete travel chain of user Us;
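The selection rule POIEi = max⟨SIM(k)⟩ and the assembly of PATHUs can be sketched as follows. The similarity values are taken as given inputs, since the SIM(k) expression itself is not reproduced here; the POI names and the function name are hypothetical.

```python
def match_travel_chain(candidate_scores):
    """Pick the best POI per stay point set and return the travel chain.

    candidate_scores: one dict per stay point set, mapping a candidate POI
    identifier to its similarity estimate SIM(k).
    Returns a list [POIE1, POIE2, ..., POIEn]; an entry is None when a stay
    point set has no candidate POI in its matching domain.
    """
    chain = []
    for scores in candidate_scores:
        if not scores:
            chain.append(None)
            continue
        # POIEi: the candidate with the maximum similarity estimate.
        chain.append(max(scores, key=scores.get))
    return chain

# Example with made-up candidates for two stay point sets.
path_us = match_travel_chain([
    {"home": 0.9, "cafe": 0.4},
    {"office": 0.7, "gym": 0.8},
])
```

Here `path_us` lists one matched POI per stay point set in visiting order, which corresponds to the user's complete travel chain.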
Step four: data statistics and feedback.
All user data are read and processed; the travel OD matrix among traffic cells for the same day is divided by time and output; and the travel modes and patterns of the users are counted. The user's complete travel sequence data and the motion state fuzzy weights {Tra} are returned to the historical database as parameter updates, and are used to perform the same data learning and data processing on the mobile phone signaling data of the next user. The parameters comprise the user's complete travel sequence data and the motion state fuzzy weights {Tra}.
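The time-divided OD matrix output can be sketched as a simple count of trips per (origin cell, destination cell) pair within each time period. The trip tuple layout, the one-hour default period and the function name are assumptions made for illustration.

```python
from collections import defaultdict

def od_matrices_by_period(trips, period_hours=1):
    """Count trips between traffic cells, grouped by departure time period.

    trips: iterable of (origin_cell, destination_cell, departure_hour),
           where departure_hour is a number in [0, 24).
    Returns {period_index: {(origin, destination): trip_count}}.
    """
    matrices = defaultdict(lambda: defaultdict(int))
    for origin, dest, hour in trips:
        period = int(hour // period_hours)
        matrices[period][(origin, dest)] += 1
    # Convert nested defaultdicts to plain dicts for output.
    return {p: dict(m) for p, m in matrices.items()}

# Example with three made-up trips between cells "A" and "B".
od = od_matrices_by_period([
    ("A", "B", 8.2),
    ("A", "B", 8.7),
    ("B", "A", 17.5),
])
```

A sparse dictionary keyed by cell pair avoids allocating a full cell-by-cell matrix for periods in which few cells are active.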
The foregoing is only a preferred embodiment of the invention. It should be noted that those skilled in the art may make various modifications and adaptations without departing from the principles of the present invention, and such modifications and adaptations are intended to fall within the scope of the invention.