Disclosure of Invention
The invention mainly aims to solve the technical problems of high subjectivity, high time, labor and capital cost in the prior art, and can detect the level overlapping structure in a city and identify the characteristics of the level overlapping structure.
In order to achieve the above object, the present invention provides a method for detecting urban overlapping structure features based on a label propagation algorithm, comprising the steps of:
s1: obtaining city map data, taxi track data and interest point data in a research area, and carrying out city unit division on the city map data to obtain city grids; preprocessing the taxi track data to obtain processed track data;
s2: carrying out weighted matching on the urban grids and the processed trajectory data to obtain a four-layer directed weighting network;
s3: carrying out unsupervised training on the four-layer directed weighting network input graph division model, obtaining four-layer urban community structures after the training is finished, and extracting an overlapping structure in the four-layer urban community structures;
s4: and constructing a measurement index through the point of interest data, and identifying the overlapped structure through the measurement index to obtain the land use characteristics and the space interaction mode of the overlapped structure.
Preferably, step S1 is specifically:
s11: processing the urban map data through GIS software, performing spatial fishing net analysis on urban areas of the urban map data, and dividing the urban areas into urban grids; the city grid comprises a plurality of 500mx500m city grid cells;
s12: removing the point data and invalid point data which are not in the urban area in the taxi track data to obtain removed track data;
s13: and extracting the data of the point of getting on or off the vehicle in each piece of the eliminated track data, wherein the processed track data is a set of the data of the point of getting on or off the vehicle.
Preferably, step S2 is specifically:
s21: matching each getting-on and getting-off point data in the processed track data with each city grid unit, simulating each city grid unit into a graph node, and simulating the number of interaction times among the city grid units into the weight of an edge;
s22: the directed weighting network is composed of a plurality of city grid units and interactive relations among the city grid units, and the interactive relations are related to the interactive times among the city grid units;
s23: and path layering is carried out on the processed track data, a first layer is used when the track path is less than 3km, a second layer is used when the track path is less than 5km, a third layer is used when the track path is less than 9km, all the track paths are fourth layers, corresponding directed weighting networks are respectively constructed on the first layer, the second layer, the third layer and the fourth layer, and the four-layer directed weighting networks are obtained.
Preferably, step S3 is specifically:
s31: initializing the memory of each node in the graph partitioning model by using the id of the corresponding node, and obtaining a corresponding unique label by each node;
s32: selecting a certain node as a listener node;
s33: all adjacent nodes of the listener node send own unique labels to the listener node, and the listener node selects the most popular label from all the received labels;
s34: repeating the steps S32-S33 for n times, traversing all the nodes and obtaining the most popular labels of all the nodes;
s35: post-processing all the labels of the nodes to obtain the four-layer urban community structure, and evaluating the division result of the four-layer urban community structure through an overlapping modularity function, wherein the overlapping modularity function specifically comprises the following steps:
wherein m is the weight sum of edges in the network, A is the weighted adjacent matrix of the network, and if an edge exists between the node v and the node w, A isvwThe weight of vw edge is, otherwise, 0 is adopted; k is a radical ofv,kwRespectively, the out-degree weight of the node v and the in-degree weight of the node w, Ov,OwThe number of communities to which the node v and the node w belong respectively;
s36: and extracting an overlapping structure in the four-layer city community structure.
Preferably, in step S4, the measurement index includes: richness, simpson index and entropy measure index;
the land use condition and the function type can be identified through the richness; the land mixing condition can be identified through the Simpson index and the entropy measurement index;
the formula of the richness is specifically as follows:
Fi,ldenotes the enrichment index, n, of POIs of class I in the ith plotl,iRepresenting the number of type I land use types in the ith plot, niIs the number of all POIs in the ith parcel. N is a radical oflN is the total number of POIs of class i, and N is the total number of POIs in the whole research area;
the simpson index and the entropy measurement index are expressed by a hill index, and the formula specifically comprises:
in the formula, D represents the value of the Hill index, piRepresenting the proportion of the i-th POI; when q is 1, it represents entropy, and higher values indicate that the distribution of POI species is more disordered, and lower values indicate that the distribution of POI species is more ordered; q is 2, the inverse of the happson index, which measures the probability that two POIs randomly selected from an urban area belong to the same category; therefore, the abundance of the POI and the relative abundance of different types of POI are considered, and the lower the value is, the higher the mixed utilization degree of the land is, and the higher the value is, the lower the mixed utilization degree of the land is.
Preferably, in step S4;
the land use characteristics are land use types and land mixing degrees, the functional areas are structures which show certain functional attributes in urban areas, such as residential areas and scenic areas, and the functional structures and the function mixing degrees of the overlapped structures can be measured through the richness; the POI mixing degree of the overlapped structure can be calculated through the Simpson index and the entropy measurement index, and the POI mixing degree can reflect the vitality of the land;
the space interaction mode represents the interaction condition of the overlapping area and the adjacent community, the interaction strength is expressed through the interaction times, an interaction network is built, the travel mode of people is analyzed through the trajectory flow from one functional area to another functional area, the travel mode has a fixed time law such as the interaction mode from a living area to a working area in the morning rush hour and the interaction mode from the living area to a leisure area in the holiday, the interaction hotspot area of the overlapping area is analyzed through the interaction strength, and the functional structure of the interaction area is further identified through the richness.
A city overlapping structure feature detection system based on a label propagation algorithm comprises the following modules:
the data acquisition module is used for acquiring city map data, taxi track data and interest point data in a research area, and performing city unit division on the city map data to obtain city grids; preprocessing the taxi track data to obtain processed track data;
the four-layer directed weighting network acquisition module is used for performing weighted matching on the urban grids and the processed trajectory data to acquire a four-layer directed weighting network;
the overlapping structure extraction module is used for carrying out unsupervised training on the four-layer directed weighting network input graph division model, obtaining four-layer urban community structures after the training is finished, and extracting an overlapping structure in the four-layer urban community structures;
and the identification module is used for constructing a measurement index through the point of interest data, identifying the overlapped structure through the measurement index and obtaining the land use characteristics and the space interaction mode of the overlapped structure.
The invention has the following beneficial effects:
1. the invention adopts an analogy reasoning method, introduces a method for dividing the network science into city planning, has better benefit, and can divide the city structure in batch and automatically;
2. the invention can fully mine the spatial interaction information of the city hidden in the urban resident activity, pay attention to the overlapping structure existing in the network, and mine the land utilization characteristics and the spatial interaction relationship thereof.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, the invention provides a method for detecting characteristics of an urban overlapping structure based on a label propagation algorithm, which is an extension method based on a model method, and applies the idea of network science to urban division on the basis of the previous space division model research, and detects the overlapping structure existing in the city; the method is based on the idea of label propagation in graph division, adopts an analogy reasoning method, aims at the hierarchy and the overlapping property of cities in the urbanization process, discusses the hierarchy overlapping property from human activities, and is beneficial to gradually grasp the regional change and the spatial distribution of the city space structure from local to whole; the method effectively introduces the concept of overlapping nodes in the complex network into city division, combines taxi track data, performs community detection on long and short distance city community structures, fully excavates the space interaction information of cities hidden in city resident activities, pays attention to the overlapping structures in the city community structures, and excavates the land utilization type and the space interaction relationship thereof;
the method specifically comprises the following steps:
s1: obtaining city map data, taxi track data and interest point data in a research area, and carrying out city unit division on the city map data to obtain city grids; preprocessing the taxi track data to obtain processed track data;
s2: carrying out weighted matching on the urban grids and the processed trajectory data to obtain a four-layer directed weighting network;
s3: carrying out unsupervised training on the four-layer directed weighting network input graph division model, obtaining four-layer urban community structures after the training is finished, and extracting an overlapping structure in the four-layer urban community structures;
s4: and constructing a measurement index through the point of interest data, and identifying the overlapped structure through the measurement index to obtain the land use characteristics and the space interaction mode of the overlapped structure.
In this embodiment, step S1 specifically includes:
s11: processing the urban map data through GIS software, performing spatial fishing net analysis on urban areas of the urban map data, and dividing the urban areas into urban grids; the city grid comprises a plurality of 500mx500m city grid cells; in this embodiment, 4853 city grid cells are obtained in total;
s12: removing the point data and invalid point data which are not in the urban area in the taxi track data to obtain removed track data;
s13: extracting the boarding and alighting point data in each eliminated track data, wherein the processed track data is a set of the boarding and alighting point data; in this embodiment, 793253 pieces of boarding and alighting point data are obtained in total.
In this embodiment, step S2 specifically includes:
s21: matching each getting-on and getting-off point data in the processed track data with each city grid unit, simulating each city grid unit into a graph node, and simulating the number of interaction times among the city grid units into the weight of an edge;
s22: the directed weighting network is composed of a plurality of city grid units and interactive relations among the city grid units, and the interactive relations are related to the interactive times among the city grid units; each city grid unit interacts with a plurality of other city grid units;
s23: path layering is carried out on the processed track data, a first layer is used when the track path is less than 3km, a second layer is used when the track path is less than 5km, a third layer is used when the track path is less than 9km, all the track paths are fourth layers, corresponding directed weighting networks are respectively constructed on the first layer, the second layer, the third layer and the fourth layer, and the four layers of directed weighting networks are obtained;
in the specific implementation, the route layering threshold is determined according to the track proportion of each layer of route, and the tracks less than 3km, less than 5km and less than 9km respectively account for 29.3301%, 51.1679% and 76.0485% of the total track; meanwhile, the network structure of the four layers of directional weighting networks can be changed according to requirements.
In this embodiment, step S3 specifically includes:
s31: initializing the memory of each node in the graph partitioning model by using the id of the corresponding node, and obtaining a corresponding unique label by each node;
s32: selecting a certain node as a listener node;
s33: all adjacent nodes of the listener node send own unique labels to the listener node, and the listener node selects the most popular label from all the received labels;
s34: repeating the steps S32-S33 for n times, traversing all the nodes and obtaining the most popular labels of all the nodes;
in the specific implementation, when a four-layer directed weighting network input graph division model is subjected to unsupervised training, an SLPA model parameter needs to be set, in the graph division model, the iteration number of the model is set to be 100, the number of communities divided into the smallest layers is set to be 3, in order to keep the result of each division consistent, the random seed is set to be 5140727168289296997, meanwhile, through contrast detection, a random seed value is used, the modularity fluctuation result of the community division result is less than 0.1, and the normal modularity value is 0.3-0.7. In addition, r parameters for controlling the output of the overlapped communities are determined through comparison, wherein r belongs to {0.01,0.05,0.1,0.15,0.2,0.25,0.3,0.35,0.4,0.45 and 0.5}, the modularity curve inflection point is determined when r is 0.1 through comparison experiments, and finally, r is selected to be 0.1 through training parameters, and the parameter selection can be changed according to requirements;
s35: post-processing all the labels of the nodes to obtain the four-layer urban community structure, and evaluating the division result of the four-layer urban community structure through an overlapping modularity function, wherein the overlapping modularity function specifically comprises the following steps:
wherein m is the weight sum of edges in the network, A is the weighted adjacent matrix of the network, and if an edge exists between the node v and the node w, A isvwThe weight of vw edge is, otherwise, 0 is adopted; k is a radical ofv,kwRespectively, the out-degree weight of the node v and the in-degree weight of the node w, Ov,OwThe number of communities to which the node v and the node w belong respectively;
in the specific implementation, the overlapping modularity function results of the four-layer urban community structure fluctuate up and down at 0.65, 0.6, 0.55 and 0.3 respectively, and considering the reasons of overlapping nodes and path lengths, it can be explained that the more the overlapping nodes are identified, the more chaotic the area is, the area possibly belongs to a plurality of areas, the result of community division is not good, the longer the path is, the larger the scale of community division is, and the interaction of short paths can influence the division result; for the overlapped nodes, the first community is the community with the maximum label probability, and so on;
s36: and extracting an overlapping structure in the four-layer city community structure.
In this embodiment, in step S4, the point-of-interest data is obtained through the gold API, and the data is reclassified; adopting a crawler method to acquire POI data of each category, wherein the data comprises 622206 data, and the POI data comprises attributes such as POI name, longitude and latitude, large category, medium category, small category, address and the like; meanwhile, referring to urban land use categories, reclassifying POIs into 16 categories;
the measurement index includes: richness, simpson index and entropy measure index;
the land use condition and the function type can be identified through the richness; the land mixing condition can be identified through the Simpson index and the entropy measurement index;
the formula of the richness is specifically as follows:
Fi,ldenotes the enrichment index, n, of POIs of class I in the ith plotl,iRepresenting the number of type I land use types in the ith plot, niIs the number of all POIs in the ith parcel. N is a radical oflN is the total number of POIs of class i, and N is the total number of POIs in the whole research area;
the simpson index and the entropy measurement index are expressed by a hill index, and the formula specifically comprises:
formula (II)Wherein D represents the value of the Hill index, piRepresenting the proportion of the i-th POI; when q is 1, it represents entropy, and higher values indicate that the distribution of POI species is more disordered, and lower values indicate that the distribution of POI species is more ordered; q is 2, the inverse of the happson index, which measures the probability that two POIs randomly selected from an urban area belong to the same category; therefore, the abundance of the POI and the relative abundance of different types of POI are considered, and the lower the value is, the higher the mixed utilization degree of the land is, and the higher the value is, the lower the mixed utilization degree of the land is.
In this embodiment, in step S4;
the land use characteristics are land use types and land mixing degrees, the functional areas are structures which show certain functional attributes in urban areas, such as residential areas and scenic areas, and the functional structures and the function mixing degrees of the overlapped structures can be measured through the richness; the POI mixing degree of the overlapped structure can be calculated through the Simpson index and the entropy measurement index, and the POI mixing degree can reflect the vitality of the land;
the space interaction mode represents the interaction condition of the overlapping area and the adjacent community, the interaction strength is expressed through the interaction times, an interaction network is built, the travel mode of people is analyzed through the trajectory flow from one functional area to another functional area, the travel mode has a fixed time law such as the interaction mode from a living area to a working area in the morning rush hour and the interaction mode from the living area to a leisure area in the holiday, the interaction hotspot area of the overlapping area is analyzed through the interaction strength, and the functional structure of the interaction area is further identified through the richness.
Referring to fig. 2, the present invention provides a system for detecting characteristics of an urban overlapping structure based on a label propagation algorithm, including the following modules:
thedata acquisition module 10 is configured to acquire city map data, taxi track data, and interest point data in a research area, and perform city unit division on the city map data to obtain city grids; preprocessing the taxi track data to obtain processed track data;
a four-layer directional weightingnetwork obtaining module 20, configured to perform weighted matching on the city grid and the processed trajectory data to obtain a four-layer directional weighting network;
the overlapstructure extraction module 30 is configured to perform unsupervised training on the four-layer directed weighting network input graph partitioning model, obtain a four-layer urban community structure after the training is completed, and extract an overlap structure in the four-layer urban community structure;
and theidentification module 40 is configured to construct a measurement index according to the point-of-interest data, identify the overlapping structure according to the measurement index, and obtain land use characteristics and a space interaction mode of the overlapping structure.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third and the like do not denote any order, but rather the words first, second and the like may be interpreted as indicating any order.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.