Summary of the invention
It is an object of the invention to overcome the defects of the above-mentioned prior art by providing a community search method for attributed networks.
According to a first aspect of the invention, a community search method for an attributed network is provided. The method includes:
Step S1: delimiting a search region according to the spatial positions of network users;
Step S2: searching the attributed network for a target community according to the connection tightness between network users, wherein the spatial positions of the users in the target community lie within the delimited search region.
In one embodiment, step S1 includes the following sub-steps:
Representing the attributed network by a connected undirected graph G = (V, E, S), where V denotes the vertex set, E denotes the edge set, S denotes the set of spatial locations, and each vertex represents a network user;
Searching the connected undirected graph G for a target community represented by a connected subgraph, wherein the vertex positions of the subgraph can be enclosed by a circle of diameter D and, relative to the other subgraphs of G, the vertices of the subgraph form the k-core of highest order.
In one embodiment, in step S2, the target community represented by a connected subgraph is searched for according to the following steps:
Step S21: constructing a quadtree index structure for the connected undirected graph G, wherein the root node corresponds to the entire space of G;
Step S22: traversing the quadtree index structure to obtain all nodes whose side length is less than D and whose parent node has a side length greater than D, and storing these nodes in a node list nodeList;
Step S23: obtaining, from the nodes in the node list nodeList, the maximum core number k_cur;
Step S24: pruning from the node list every node N for which N.DistMap[k_cur] > D, where N.DistMap denotes the distance mapping table of node N;
Step S25: sorting the remaining nodes in nodeList in ascending order of their core number upper bounds and verifying them in turn, so as to find the k-core that has the highest order and can be enclosed by a circle of diameter D.
In one embodiment, in step S25, a node N in the node list nodeList is verified using the following steps:
Extending N by the length D, performing core decomposition in the extended square region, and ignoring vertices whose core number is less than k_cur;
Verifying whether the remaining vertices in the extended square region contain a k-core of order higher than k_cur and, if so, recording that k-core and updating k_cur.
In one embodiment, the following steps are used to verify whether the remaining vertices in the extended square region contain a k-core of order higher than k_cur:
Placing a vertex of node N on the boundary of a circle of diameter D and rotating the circle;
Whenever a new vertex enters the circle, checking whether a k-core of order higher than k_cur exists.
In one embodiment, the following steps are used to verify whether the remaining vertices in the extended square region contain a k-core of order higher than k_cur:
Dividing the extended square region into m × m cells, and using a square that covers s × s cells and can enclose a circle of diameter D to search for k-cores in the extended square region, where s and m are positive integers and s is less than m.
In one embodiment, the following steps are used to verify whether the remaining vertices in the extended square region contain a k-core of order higher than k_cur:
Placing a vertex of node N on the boundary of a circle of diameter D and rotating the circle;
While the circle rotates, stopping the rotation when the new vertices that have entered the circle satisfy the k_c-core condition, where k_c denotes the core number currently being verified.
In one embodiment, the target community represented by a connected subgraph is searched for according to the following steps:
Searching the connected undirected graph G for all circles of diameter D;
For all circles found, checking the maximum core order of the vertices that each circle can enclose, and taking the vertices enclosed by the circle with the maximum core order as the target community.
Compared with the prior art, the advantages of the present invention are as follows: the present invention provides a solution for searching co-located communities with structure cohesiveness; and, during the community search, constructing an index structure that integrates spatial information with local structural information improves the efficiency and effectiveness of the search.
Specific embodiment
In order to make the purpose, technical solution, design method and advantages of the present invention clearer, the present invention is described in more detail below through specific embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described herein serve only to explain the present invention and are not intended to limit it.
In all examples shown and discussed herein, any specific value should be construed as merely illustrative and not as a limitation. Other examples of the exemplary embodiments may therefore have different values.
Techniques, methods and apparatus known to persons of ordinary skill in the relevant art may not be discussed in detail but, where appropriate, such techniques, methods and apparatus should be regarded as part of the specification.
One research goal of the invention is to solve the search problem of the most cohesive co-located community (referred to herein as MC3, the most cohesive co-located community), wherein the community sought satisfies the following two properties: structure cohesiveness, meaning that the members of the community are most closely connected; and spatial co-location, meaning that the members are geographically close to one another, i.e. spatially cohesive.
According to one embodiment of the present invention, a community search method for an attributed network is provided. In short, the method represents the attributed network by a connected undirected graph and determines the target community by searching that graph for a connected subgraph that meets the structure cohesiveness and space cohesiveness criteria. Specifically, as shown in Fig. 1, the method includes the following steps:
Step S110: representing the attributed network by a connected undirected graph.
In embodiments of the present invention, the undirected attributed network is described by a connected undirected graph G = (V, E, S), where G has a vertex set V, an edge set E and a set of spatial locations S. The degree of a vertex v in G (for example, a user in a social network) is denoted Deg_G(v). Each vertex v has a spatial position v.l = (x, y) ∈ S (for example, the check-in location of the user), where x and y are the coordinates along the x-axis and y-axis of the two-dimensional space.
For ease of description, the symbols used herein are summarized as follows:
G(V, E, S): the geo-social graph with vertex set V, edge set E and set of spatial locations S;
(v.x, v.y): the position of a vertex v in V along the x-axis and y-axis;
Deg_G(v): the degree of a vertex v in G;
γ(N): the side length of node N in the index structure.
The research goal of the invention is to find, in the connected undirected graph G, a community represented by a connected subgraph that satisfies the following conditions: structure cohesiveness, i.e. the vertices of the connected subgraph are most densely connected; and space cohesiveness, i.e. the vertices of the connected subgraph are spatially compact.
In embodiments of the present invention, the assessment of structure cohesiveness is illustrated using the k-core as an example; it should be understood that the method of the invention can also be generalized to other measures of structure cohesiveness, such as the k-truss and the clique.
For ease of description, the following concepts are introduced first:
1) Definition of k-core
Given a non-negative integer k, the k-core of G is the largest subgraph of G in which the degree of every vertex v is not less than k.
Specifically, in embodiments of the present invention, a connected k-core in G (denoted G_k) is used to represent a community, and the order of G_k is said to be k. Given a graph, its k-cores can be obtained by algorithms known in the prior art, for example the linear core decomposition algorithm, whose complexity is O(|E|) (see, for example, "An O(m) algorithm for cores decomposition of networks", Batagelj, V., Zaversnik, M., arXiv preprint cs/0310049 (2003)).
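The peeling idea behind core decomposition can be sketched as follows. This is a minimal illustration and not the linear-time algorithm of the cited reference: it selects a minimum-degree vertex with a plain scan, so it runs in quadratic time. The function names `build_adjacency` and `core_numbers` and the sample edge list are assumptions made for this example.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs (hypothetical helper)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def core_numbers(adj):
    """Core number of every vertex, obtained by peeling a minimum-degree vertex each round."""
    deg = {v: len(ns) for v, ns in adj.items()}
    left, core, k = set(adj), {}, 0
    while left:
        v = min(left, key=deg.get)   # plain scan; bucket queues would give O(|E|) overall
        k = max(k, deg[v])           # the running maximum is the core number of v
        core[v] = k
        left.remove(v)
        for u in adj[v]:
            if u in left:
                deg[u] -= 1
    return core


# Hypothetical sample: a triangle {A, B, C} attached to a 4-clique {D, E, F, G}.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"),
         ("D", "E"), ("D", "F"), ("D", "G"),
         ("E", "F"), ("E", "G"), ("F", "G")]
cores = core_numbers(build_adjacency(edges))
```

In this sample the triangle vertices receive core number 2 and the clique vertices receive core number 3.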
2) Definition of core number
For a given vertex v in G, its core number is the highest order of any k-core that contains v, denoted C_G[v].
3) Definition of co-located community
In embodiments of the present invention, a co-located community is a connected subgraph (k-core) G_k whose vertex positions can be enclosed by a circle of predetermined diameter D. The closer the vertex positions within a co-located community, the better, since this reflects the "co-location" of the community.
In embodiments of the present invention, given an undirected attributed graph G and a diameter D, the co-located community search (MC3) returns a group of vertices and their positions that satisfy the following constraints: the positions of the vertices can be enclosed by a circle of diameter D; and the vertices form the k-core of highest order.
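To make the definition concrete, the sketch below evaluates it by brute force on a toy instance. It relies on a standard geometric observation that is an assumption here rather than something stated in the text: any vertex set that fits in a circle of diameter D also fits in such a circle that is either centred on a vertex or has two vertices on its boundary. The names `candidate_centers` and `brute_force_mc3`, the coordinates and the edges are hypothetical, and the returned vertex set is the maximum core inside the best circle without a separate connectivity check.

```python
import math
from itertools import combinations


def core_numbers(adj):
    """Core number of every vertex (simple quadratic peeling)."""
    deg = {v: len(ns) for v, ns in adj.items()}
    left, core, k = set(adj), {}, 0
    while left:
        v = min(left, key=deg.get)
        k = max(k, deg[v])
        core[v] = k
        left.remove(v)
        for u in adj[v]:
            if u in left:
                deg[u] -= 1
    return core


def candidate_centers(points, D):
    """Centres on each vertex, plus both centres of circles through each close pair."""
    r = D / 2
    centers = list(points.values())
    for (ax, ay), (bx, by) in combinations(points.values(), 2):
        d = math.hypot(bx - ax, by - ay)
        if 0 < d <= D:
            mx, my = (ax + bx) / 2, (ay + by) / 2
            h = math.sqrt(max(r * r - (d / 2) ** 2, 0.0))
            ux, uy = -(by - ay) / d, (bx - ax) / d   # unit normal to the segment
            centers += [(mx + h * ux, my + h * uy), (mx - h * ux, my - h * uy)]
    return centers


def brute_force_mc3(points, adj, D, eps=1e-9):
    """Return (order, vertices) of the highest-order core that fits in a circle of diameter D."""
    best_k, best_members = -1, set()
    for cx, cy in candidate_centers(points, D):
        inside = {v for v, (x, y) in points.items()
                  if math.hypot(x - cx, y - cy) <= D / 2 + eps}
        core = core_numbers({v: [u for u in adj[v] if u in inside] for v in inside})
        k = max(core.values(), default=-1)
        if k > best_k:
            best_k, best_members = k, {v for v in inside if core[v] == k}
    return best_k, best_members


# Hypothetical data: a triangle near the origin and a 4-clique far away from it.
points = {"A": (0, 0), "B": (1, 0), "C": (0, 1),
          "D": (10, 10), "E": (11, 10), "F": (10, 11), "G": (11, 11)}
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("E", "G"), ("F", "G")]
adj = {v: set() for v in points}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
order, members = brute_force_mc3(points, adj, D=3.0)
```

With these inputs the 4-clique is returned with order 3, while the triangle alone would only reach order 2.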
Fig. 2 is a schematic diagram of an attributed network and its co-located communities, where C1 and C2 are two co-located communities in the attributed network whose members can be enclosed by a circle of diameter D; the members of C1 are A, B and C, and the members of C2 are D, G, H, F and E. C2 is a 3-core and, of the two co-located communities, it is the core of highest order (with respect to the diameter D); C2 is therefore the MC3 of the attributed network, i.e. the target community to be searched for.
Step S120: searching the connected undirected graph for a connected subgraph that meets the structure cohesiveness and space cohesiveness criteria.
The present invention aims to find the most structurally cohesive community that can be enclosed by a circle of diameter D. Various embodiments can be used to search the attributed network for a community that meets the structure cohesiveness and space cohesiveness criteria.
Embodiment 3: distance-aware k-core quadtree approach
Neither the space-first approach nor the structure-first approach described above achieves good performance, because the MC3 problem requires space cohesiveness and structure cohesiveness to be considered simultaneously, whereas each of these approaches ignores either the spatial features or the structural features of the data.
In a preferred embodiment, the target community is searched for using a distance-aware k-core quadtree, referred to herein as DkQ-TREE (Distance-aware k-core Quadtree). The quadtree structure makes it possible to build an index that precomputes local structure cohesiveness, which accelerates the search and prunes the search space.
The following first introduces the tree index structure, a quadtree-based spatial index, and then solves the MC3 problem with community search methods built on that index structure.
1) Index structure of the quadtree
The known linear k-core decomposition algorithm computes only the global core number of each vertex, so local cohesiveness information is unknown at query time. In the quadtree-based index structure, structural information and spatial information are combined to compute local cohesiveness (with respect to the diameter D).
The quadtree structure is shown in Fig. 3. In short, a DkQ-TREE is built as follows: the root node corresponds to the entire space; the entire space is divided into four subspaces, each corresponding to one child of the root node; each node is then repeatedly divided into four child nodes. In Fig. 3, for example, the entire space is the root node (root), whose four child nodes correspond to {A, B, C}, {K, J}, {L} and {D, E, F, G, H, I}; these four child nodes can be subdivided further in the same way.
In this embodiment, the quadtree and the spatial monotonicity of local cohesiveness are used to precompute the local cohesiveness of each tree node together with other useful information. Spatial monotonicity means the following: given a spatial region R (for example, a square), if the vertices in R can form a k-core of order at most h, then for any region R' contained in R, the order of any k-core formed by the vertices in R' is no greater than h. The property holds because a smaller region contains fewer vertices.
For each node N of the DkQ-TREE, the core number of each vertex of the node is precomputed on the subgraph extracted from the node's region, and the maximum core number of the vertices in the node is recorded, labelled LCN. This computation is motivated by the following principle (referred to herein as Lemma 1): given a query diameter D and a tree node N that can be enclosed by a circle of diameter D, the order of MC3 is not less than LCN.
Since N can be enclosed by a circle of diameter D, the above principle can be proved from the spatial monotonicity property. Accordingly, a lower bound on the order of MC3 can be obtained from the information precomputed in the DkQ-TREE.
However, this is still not enough to capture local cohesiveness, because only the maximum core number inside each node is obtained. As can be seen from Fig. 3, the vertices of a node may form a k-core together with vertices that lie outside that node. Consequently, for a given diameter D, a bound on the core number of these nodes cannot be obtained. A distance mapping table DistMap is therefore additionally computed for each tree node. The idea is as follows: given a node N, for each value k > LCN, the node is extended to the vertex at the minimum distance d such that the vertices covered during the extension can form a k-core, and the distance d and the corresponding k are recorded in the mapping table.
The distance mapping table helps prune the search space according to the following principle (referred to herein as Lemma 2): assume the current best order of MC3 is k_cur; given a query diameter D and a node N, if N.DistMap[k_cur] > D, then N cannot contribute any vertex to MC3, where N.DistMap[k_cur] is the entry for order k_cur in the distance mapping table of node N.
This principle can also be proved using the spatial monotonicity property: N.DistMap[k_cur] > D means that even when the boundary of N is extended by the diameter D, no k_cur-core can be found in the resulting region; the core number of every vertex in that region is therefore less than k_cur, and the node can be pruned.
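The pruning rule of Lemma 2 amounts to a simple filter over the candidate nodes, sketched below. The `TreeNode` class, its field names and the sample values are hypothetical. Two conventions are assumptions of this sketch: an order that the node already reaches by itself (k ≤ LCN) is treated as distance 0, and an order missing from the table is treated as unreachable.

```python
import math
from dataclasses import dataclass, field


@dataclass
class TreeNode:
    name: str
    lcn: int                                       # maximum core number inside the node
    dist_map: dict = field(default_factory=dict)   # order k -> extension distance d


def extension_distance(node, k):
    """Distance by which the node must be extended before a k-core appears."""
    if k <= node.lcn:
        return 0.0                       # assumption: the node already holds a k-core
    return node.dist_map.get(k, math.inf)  # assumption: missing entry means unreachable


def prune_by_lemma2(node_list, k_cur, D):
    """Keep only the nodes N with N.DistMap[k_cur] <= D."""
    return [n for n in node_list if extension_distance(n, k_cur) <= D]


nodes = [
    TreeNode("N1", lcn=3),
    TreeNode("N2", lcn=1, dist_map={2: 1.0, 3: 4.0}),
    TreeNode("N3", lcn=2, dist_map={3: 6.5}),
    TreeNode("N4", lcn=0, dist_map={1: 0.5}),
]
kept = prune_by_lemma2(nodes, k_cur=3, D=5.0)
```

Here N3 is dropped because it needs an extension of 6.5 to reach a 3-core, and N4 is dropped because it never reaches one.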
In addition, for the case where a vertex has multiple positions, a vertex mapping table can be used to obtain a vertex's mapping information quickly from a position.
To sum up, in embodiments of the present invention, the information stored for each node of the DkQ-TREE includes: the vertices in the node; the maximum core number in the node; the vertex mapping table; and the distance mapping table.
2) Index construction of the quadtree
Referring again to the quadtree structure shown in Fig. 3, the entire space is the root node (root), whose four child nodes correspond to {A, B, C}, {K, J}, {L} and {D, E, F, G, H, I}; node {A, B, C} is further subdivided into {A}, {B} and {C}, and node {D, E, F, G, H, I} is further subdivided into {D}, {E}, {F} and {G, H, I}. When a new node is obtained, core decomposition is first performed on the vertices in the node and the maximum core number is stored. If the maximum core number is less than a threshold k_ε, the node is not split further. In Fig. 3, for example, the vertices {A, B, C} form a 2-core, so this region is divided into {A}, {B} and {C}. After the division, none of the subregions can form a 2-core, so the nodes corresponding to these subregions are not split any further.
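The splitting rule can be sketched as a short recursive construction. The sketch stores only the vertices and the maximum core number (LCN) of each node; the vertex and distance mapping tables are left out. The names `QuadNode` and `build_node`, the `min_side` guard against endless splitting, the half-open cell boundaries and the sample data are assumptions made for this example.

```python
from dataclasses import dataclass, field


def core_numbers(adj):
    """Core number of every vertex (simple quadratic peeling)."""
    deg = {v: len(ns) for v, ns in adj.items()}
    left, core, k = set(adj), {}, 0
    while left:
        v = min(left, key=deg.get)
        k = max(k, deg[v])
        core[v] = k
        left.remove(v)
        for u in adj[v]:
            if u in left:
                deg[u] -= 1
    return core


@dataclass
class QuadNode:
    x: float
    y: float
    side: float
    vertices: list
    lcn: int
    children: list = field(default_factory=list)


def build_node(points, adj, vertices, x, y, side, k_eps, min_side=1.0):
    """Build the node for the square [x, x+side) x [y, y+side) and split it if LCN >= k_eps."""
    inside = [v for v in vertices
              if x <= points[v][0] < x + side and y <= points[v][1] < y + side]
    keep = set(inside)
    local = {v: [u for u in adj[v] if u in keep] for v in inside}
    lcn = max(core_numbers(local).values(), default=0)
    node = QuadNode(x, y, side, inside, lcn)
    if lcn >= k_eps and side > min_side:
        half = side / 2
        for cx, cy in ((x, y), (x + half, y), (x, y + half), (x + half, y + half)):
            node.children.append(
                build_node(points, adj, inside, cx, cy, half, k_eps, min_side))
    return node


# Hypothetical data: a triangle in the lower-left quadrant, a 4-clique in the upper-right.
points = {"A": (1, 1), "B": (3, 1), "C": (1, 3),
          "D": (5, 5), "E": (7, 5), "F": (5, 7), "G": (7, 7)}
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("E", "G"), ("F", "G")]
adj = {v: set() for v in points}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
root = build_node(points, adj, list(points), 0, 0, 8, k_eps=2)
```

As in the Fig. 3 discussion, the quadrant holding the triangle is split once more because it forms a 2-core, and its single-vertex children are then left unsplit.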
In addition, when a new node is obtained, its distance mapping table (Distance Map) and vertex mapping table (Vertex Map) are also constructed. The vertex mapping table marks the positions of each vertex; in Fig. 3, for example, the position record of vertex A states that the position of v_A is A (v_A's locations: A), and likewise for the other vertices. The distance mapping table is constructed as follows: for each value k, a binary search is performed to extend the node to the vertex at the minimum distance such that the vertices brought in during the extension can form a k-core. Referring to Fig. 4, for example, the node contains only one vertex, C; when it is extended to vertex B, a 1-core is formed and the extension distance is d1; when it is extended to vertex A, a 2-core is first formed and the extension distance is d2. The distances d1 and d2 are stored in the distance mapping table, for example in the format 1-core: d1; 2-core: d2.
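The construction of the distance mapping table can be sketched as a binary search over the sorted distances of the vertices from the node. Several details are assumptions of this sketch rather than statements of the text: the extension distance of a vertex is measured as its distance to the node's square in the maximum norm (so extending the square by d on every side covers it), a k-core only counts if it contains a vertex of the node, and the names `square_distance` and `build_dist_map` and the coordinates are hypothetical.

```python
def core_numbers(adj):
    """Core number of every vertex (simple quadratic peeling)."""
    deg = {v: len(ns) for v, ns in adj.items()}
    left, core, k = set(adj), {}, 0
    while left:
        v = min(left, key=deg.get)
        k = max(k, deg[v])
        core[v] = k
        left.remove(v)
        for u in adj[v]:
            if u in left:
                deg[u] -= 1
    return core


def square_distance(p, x, y, side):
    """Extension needed before the square [x, x+side] x [y, y+side] covers point p."""
    dx = max(x - p[0], 0.0, p[0] - (x + side))
    dy = max(y - p[1], 0.0, p[1] - (y + side))
    return max(dx, dy)


def build_dist_map(points, adj, x, y, side):
    """Return (LCN, {k: minimum extension distance at which a node vertex joins a k-core})."""
    dist = {v: square_distance(p, x, y, side) for v, p in points.items()}
    inside = [v for v, d in dist.items() if d == 0.0]
    radii = sorted(set(dist.values()))

    def best_order(d):
        members = {v for v in points if dist[v] <= d}
        core = core_numbers({v: [u for u in adj[v] if u in members] for v in members})
        return max((core[v] for v in inside), default=0)

    lcn = best_order(0.0)
    dist_map = {}
    for k in range(lcn + 1, best_order(radii[-1]) + 1):
        lo, hi = 0, len(radii) - 1
        while lo < hi:                      # binary search: best_order is monotone in d
            mid = (lo + hi) // 2
            if best_order(radii[mid]) >= k:
                hi = mid
            else:
                lo = mid + 1
        dist_map[k] = radii[lo]
    return lcn, dist_map


# Hypothetical layout in the spirit of Fig. 4: the node holds only C; B is nearer than A.
points = {"C": (0.5, 0.5), "B": (2.0, 0.5), "A": (0.5, 4.0)}
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}
lcn, dist_map = build_dist_map(points, adj, x=0.0, y=0.0, side=1.0)
```

The resulting table corresponds to the format 1-core: d1; 2-core: d2, with d1 = 1.0 and d2 = 3.0 for these coordinates.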
3) Quadtree-based community search method MC3Alg
In embodiments of the present invention, two algorithms based on the quadtree index structure are proposed, referred to as the MC3Alg algorithm and the MC3Alg+ algorithm respectively, where MC3Alg+ is an improvement of MC3Alg.
In short, the MC3Alg algorithm involves two iterative steps: pruning nodes in the DkQ-TREE, and finding MC3 among the nodes that cannot be pruned. Specifically, MC3Alg includes the following steps:
Step S211: pruning nodes in the DkQ-TREE.
In this step, a lower bound on the order of MC3 is obtained according to Lemma 1 above.
Specifically, given the diameter D, the DkQ-TREE is traversed from top to bottom to obtain all nodes whose side length is less than D and whose parent node has a side length greater than D. These nodes are stored in the node list nodeList. The maximum core number over the nodes in the list is then taken as the lower bound, denoted k_cur. Using this lower bound on the order of MC3, the nodes in nodeList are pruned further according to Lemma 2 (i.e. given a query diameter D and a node N, if N.DistMap[k_cur] > D, then N cannot contribute any vertex to MC3).
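The top-down selection of candidate nodes and the resulting lower bound can be sketched as follows. The `TreeNode` class and the hand-built tree are hypothetical. The sketch descends while a node's side length is at least D, so a selected node's parent has side length greater than or equal to D; treating equality this way is an assumption, as is the fact that unsplit leaves with side length of at least D are simply skipped.

```python
from dataclasses import dataclass, field


@dataclass
class TreeNode:
    side: float
    lcn: int
    children: list = field(default_factory=list)


def select_nodes(node, D, out=None):
    """Collect the first nodes on each root-to-leaf path whose side length is below D."""
    if out is None:
        out = []
    if node.side < D:
        out.append(node)
    else:
        for child in node.children:
            select_nodes(child, D, out)
    return out


def lower_bound(node_list):
    """k_cur: the largest LCN among the selected nodes (Lemma 1)."""
    return max((n.lcn for n in node_list), default=0)


def leaves(n, side=2.0):
    return [TreeNode(side, 0) for _ in range(n)]


# Hypothetical tree: root of side 8, four children of side 4, two of which are split again.
root = TreeNode(8.0, 3, [
    TreeNode(4.0, 2, leaves(4)),
    TreeNode(4.0, 0),
    TreeNode(4.0, 0),
    TreeNode(4.0, 3, leaves(4)),
])
node_list = select_nodes(root, D=5.0)
k_cur = lower_bound(node_list)
```

With D = 5 the four nodes of side length 4 are selected and the lower bound is k_cur = 3; with D = 3 only the eight nodes of side length 2 qualify.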
Step S212: searching for the target community among the nodes remaining after pruning.
After pruning, the remaining nodes in nodeList are sorted by the core number upper bound obtained from the distance mapping table, and verification then starts from the best node N.
Specifically, given a node N, if N.DistMap[k1] ≤ D ≤ N.DistMap[k2], then k1 is the upper bound on the core number of the vertices in N. First, N is extended by the length D and core decomposition is performed in the extended square region. Vertices whose core number is less than k_cur can then be safely ignored, because they cannot be included in MC3. To verify whether the remaining vertices in the extended region contain a k-core of higher order, it is not necessary to check every possible circle as in the space-first approach. In one embodiment, a rotating circle method is used instead. The basic idea is to place each vertex in node N on the boundary of a circle of diameter D and then rotate the circle clockwise. Whenever a vertex enters the circle, a check is made for a k-core of order higher than k_cur; if one exists, the k-core is recorded and k_cur is updated. Referring to Fig. 5, for example, with vertex G on the boundary of the circle and the circle rotating clockwise, the 2-core formed by {G, F, H, I} is found when F enters the circle.
After N has been verified, k_cur may be updated, and the updated k_cur is used to prune further nodes in nodeList; verification then continues with the next best node. The above process is repeated until all nodes in nodeList have been processed.
For further clarity, Example 1 describes the framework of MC3Alg in pseudocode form. First, nodeList is obtained from the DkQ-TREE (line 1). Then, the lower bound on the order of MC3 is obtained and φ is used to store the best k-core in N_max (lines 2-4). For each node in nodeList, its distance mapping table DistMap is retrieved and the distance by which the node must be extended to contain a k-core is checked; nodes are safely deleted by Lemma 2 (lines 5-8), and the upper bound on the core number of the vertices in the node is obtained (line 9). Next, nodeList is sorted in ascending order of the node upper bounds (line 10); each node is extended by the length D and its vertices are pruned as described above; for each unpruned vertex in N, the rotating circle method is used to check for k-cores and update φ (lines 11-15). The k-core of highest order is finally stored in φ (line 16).
Referring again to Fig. 5, the given candidate node contains G, H and I. With G placed on the boundary of the circle, the vertices I, H, F, E and D lie in the region swept by the rotating circle, and ordering them by when they enter the circle yields the ordered list {I, H, F, E, D}. The circle is then rotated clockwise; whenever a vertex in {I, H, F, E, D} enters the circle (i.e. lies on its boundary), the rotation stops and the interior is checked for a k-core. For example, when F enters the circle, a 2-core ({G, I, H, F}) is obtained in the circle. When the circle has rotated to vertex D, a 3-core ({G, H, F, E, D}) is obtained. After H and I have been processed in the same way, {G, H, F, E, D} turns out to be the k-core of highest order in the node.
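The rotating circle check for a single boundary vertex can be sketched as follows. With the pivot fixed on the boundary, a circle of diameter D whose centre lies in direction θ from the pivot contains another vertex at distance d and direction φ exactly when cos(θ − φ) ≥ d/D, so each vertex is inside over an arc of half-width arccos(d/D) and enters, for clockwise rotation, at θ = φ + arccos(d/D). The sketch evaluates the k-core at every such entry angle. The names `angle_gap` and `rotate_circle` and the sample data are hypothetical, and vertices that coincide with the pivot are ignored for simplicity.

```python
import math


def core_numbers(adj):
    """Core number of every vertex (simple quadratic peeling)."""
    deg = {v: len(ns) for v, ns in adj.items()}
    left, core, k = set(adj), {}, 0
    while left:
        v = min(left, key=deg.get)
        k = max(k, deg[v])
        core[v] = k
        left.remove(v)
        for u in adj[v]:
            if u in left:
                deg[u] -= 1
    return core


def angle_gap(a, b):
    """Absolute angular difference between a and b, in [0, pi]."""
    return abs((a - b + math.pi) % (2 * math.pi) - math.pi)


def rotate_circle(pivot, points, adj, D, eps=1e-9):
    """Best (order, vertices) over all circles of diameter D with the pivot on the boundary."""
    px, py = points[pivot]
    arcs = {}                                   # vertex -> (direction, half-width of its arc)
    for v, (x, y) in points.items():
        d = math.hypot(x - px, y - py)
        if v != pivot and 0 < d <= D:
            arcs[v] = (math.atan2(y - py, x - px), math.acos(d / D))
    best_k, best_members = 0, {pivot}
    for phi, alpha in arcs.values():
        theta = phi + alpha                     # angle at which this vertex enters the circle
        inside = {pivot} | {u for u, (p, a) in arcs.items()
                            if angle_gap(theta, p) <= a + eps}
        core = core_numbers({v: [u for u in adj[v] if u in inside] for v in inside})
        k = max(core.values())
        if k > best_k:
            best_k, best_members = k, {v for v in inside if core[v] == k}
    return best_k, best_members


# Hypothetical data: triangle P, Q, R plus a pendant vertex S on the far side of P.
points = {"P": (0.0, 0.0), "Q": (1.0, 0.0), "R": (0.0, 1.0), "S": (-1.0, 0.0)}
adj = {"P": {"Q", "R", "S"}, "Q": {"P", "R"}, "R": {"P", "Q"}, "S": {"P"}}
order, members = rotate_circle("P", points, adj, D=1.5)
```

In this sample the triangle is found when Q enters the circle, while S can never share a circle with Q.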
The computational complexity of the MC3Alg algorithm is analyzed as follows:
Assume that, on average, each unit area of space contains n vertices and m edges, and that X nodes are obtained from the DkQ-TREE for the given D.
First, the nodes are sorted by their core number upper bounds, which costs O(X log X). Then, each node N with γ(N) = l is extended by the length D, so that γ(N_ex) = 2D + l, and core decomposition is performed in this square region. The extended square contains (2D + l)^2·m edges, so the core decomposition costs O((2D + l)^2·m). Next, the circle is rotated on each vertex in N. In each circle there are [formula missing] vertices and [formula missing] edges.
Note that the k-core verification performed in a circle can be divided into three steps:
The degree check costs [formula missing]; the core decomposition costs [formula missing]; the BFS (breadth-first search) check costs [formula missing]. The k-core verification therefore costs at most [formula missing]. In the worst case, it is executed at most πD^2·n times for each vertex in N (the number of vertices in N is l^2·n). The total complexity of the MC3Alg algorithm is therefore [formula missing].
4) Quadtree-based community search method MC3Alg+
MC3Alg is still not efficient enough on large attributed networks and has limitations. First, each node to be checked contains many vertices, and the rotating circle method must be applied to every one of them; second, the extended region of a node contains many vertices, so the k-core verification has to be run many times while the circle rotates. To overcome these problems, a more efficient algorithm is provided, referred to herein as MC3Alg+. The main difference between MC3Alg+ and MC3Alg lies in the verification cost of a node; the node pruning in the DkQ-TREE is the same as in MC3Alg.
In MC3Alg+, a binary search is performed for each node N to be checked in order to find the maximum core number in that node. As in MC3Alg, the upper limit of the core number is obtained from N's distance mapping table, and the lower limit is the current best order. During the binary search, the extended region of N is checked for a k-core with the current core number k_c. In this way a larger k_c is obtained quickly, which has two benefits: first, fewer vertices in N need to be examined; second, fewer vertices from the extended region are brought in during circle rotation.
Next, to further reduce the number of vertices in N that must be checked, the extended square region is divided into m × m cells, and a small square is used to filter out vertices that cannot form a solution. The basic principle is that, instead of checking the vertices one by one, a square covering s × s cells, which can enclose a circle of diameter D, is used to search for all k-cores in the extended square region. The (s × s) square is moved from the upper-left corner of the extended square region (comprising m × m cells) to its lower-right corner, and each position of the square is checked for a k_c-core. All squares containing a k_c-core are recorded, and circle rotation is performed only for vertices that lie both in N and in such a square. Here m and s are positive integers with s less than m; in practice, suitable values of m and s can be chosen according to the diameter of the circle, the required search granularity, and so on. In this approach the unit of verification is a cell rather than a vertex, so verification is faster.
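The cell-based filter can be sketched as a sliding window over the grid. The function name `window_filter`, its parameters and the sample data are hypothetical. As a simplification, the sketch returns every vertex that belongs to a k_c-core inside at least one window position, and it visits m − s + 1 positions along each axis.

```python
def core_numbers(adj):
    """Core number of every vertex (simple quadratic peeling)."""
    deg = {v: len(ns) for v, ns in adj.items()}
    left, core, k = set(adj), {}, 0
    while left:
        v = min(left, key=deg.get)
        k = max(k, deg[v])
        core[v] = k
        left.remove(v)
        for u in adj[v]:
            if u in left:
                deg[u] -= 1
    return core


def window_filter(points, adj, x0, y0, side, m, s, k_c):
    """Vertices that lie in a k_c-core within some s x s window of the m x m grid."""
    cell = side / m
    cells = {v: (min(int((x - x0) / cell), m - 1), min(int((y - y0) / cell), m - 1))
             for v, (x, y) in points.items()}
    survivors = set()
    for i in range(m - s + 1):
        for j in range(m - s + 1):
            members = {v for v, (ci, cj) in cells.items()
                       if i <= ci < i + s and j <= cj < j + s}
            core = core_numbers({v: [u for u in adj[v] if u in members] for v in members})
            survivors |= {v for v in members if core[v] >= k_c}
    return survivors


# Hypothetical data: a tight triangle and a far-away vertex Z linked only to A.
points = {"A": (1.5, 1.5), "B": (2.5, 1.5), "C": (1.5, 2.5), "Z": (8.5, 8.5)}
adj = {"A": {"B", "C", "Z"}, "B": {"A", "C"}, "C": {"A", "B"}, "Z": {"A"}}
survivors = window_filter(points, adj, x0=0.0, y0=0.0, side=10.0, m=10, s=3, k_c=2)
```

Only the triangle vertices survive for k_c = 2, so circle rotation would be skipped for Z.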
Finally, a binary rotating circle method is proposed for checking candidate vertices, so as to reduce the verification cost. The main difference from MC3Alg is that the rotation does not stop every time a new vertex enters the circle; instead, a binary search strategy is used. Specifically, the rotation stops at the first vertex such that the vertices from the first one to enter up to that vertex satisfy the k_c-core condition. The circle with that vertex on its boundary is then checked: if a k_c-core exists, it is recorded and the rotation stops; otherwise, starting from the circle just checked, the next vertex that can satisfy the k_c-core condition is sought. This approach is very effective because large regions that contain no core can be skipped.
Fig. 6 shows an example of the binary search process. Given the same candidate node as in Fig. 4, a binary search is performed on the core number. Initially, the upper bound is upper = 3 (from the distance mapping table) and the lower bound is lower = 2 (the current best value), so the current core number k_c is 2 (the midpoint of lower and upper, rounded down). The vertices G, H and I are then set as boundary vertices, and the binary strategy is applied during rotation. First, an ordered list {I, H, F, E, D} is obtained according to the order in which the vertices enter the search circle; this ordered list is labelled InAngleList. Next, a binary search is performed on InAngleList to find the first vertex that satisfies a 2-core. Because the rotation region {G, H, I} forms a 2-core, vertex H is found first, i.e. the circle is rotated to H and a 2-core ({G, H, I}) is found. It is recorded and lower is updated to 2 + 1 = 3. Now k_c is 3; vertex G is set as the boundary vertex and the above process is repeated. When vertex D is on the boundary of the search circle, the rotation region {G, D, E, F, H} forms a 3-core. The circle is rotated directly to vertex D, where a 3-core is found; in this way the rotation does not stop at F or E but proceeds directly to vertex D. Finally, {G, D, E, F, H} is found to be the best core in the node.
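The outer binary search over the core number can be sketched independently of the geometric check. In the sketch below, `exists_core` stands for the verification of a k_c-core in the extended region and is supplied by the caller; it, the function name `max_core_by_binary_search` and the toy predicate are hypothetical. The midpoint is rounded down, which reproduces the probe order 2, then 3, of the walk-through above.

```python
def max_core_by_binary_search(lower, upper, exists_core):
    """Largest k in [lower, upper] for which exists_core(k) holds, or None if there is none.

    Assumes exists_core is monotone: if a k-core exists, so does a (k-1)-core.
    """
    best = None
    while lower <= upper:
        k_c = (lower + upper) // 2
        if exists_core(k_c):
            best = k_c          # record the core and raise the lower bound
            lower = k_c + 1
        else:
            upper = k_c - 1     # no such core: lower the upper bound
    return best


# Toy predicate: the region is assumed to contain cores up to order 3.
probes = []


def exists_core(k):
    probes.append(k)
    return k <= 3


best = max_core_by_binary_search(lower=2, upper=3, exists_core=exists_core)
```

Each probe narrows the range by half, so at most about log2(upper − lower + 1) + 1 verifications are needed per node.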
The computational complexity of the MC3Alg+ algorithm is analyzed as follows:
Under the same assumptions as for MC3Alg, a binary search is performed on each extended node N_ex (γ(N_ex) = 2D + l) that needs to be checked. Assume the maximum core number obtained from the distance mapping table is k_max; the binary search over k then takes at most log k_max iterations. The extended square region is divided into T × T cells, and some vertices are filtered out using a small square covering s × s cells. The small square covers [formula missing] vertices and [formula missing] edges and needs to be moved (T − s)^2 times, so the cost of the moving process is at most [formula missing]. For each vertex in N, the cost of the binary circle rotation is at most [formula missing] (each circle covers [formula missing] vertices and [formula missing] edges). Therefore, in the worst case, the total complexity of MC3Alg+ is [formula missing].
To further verify the effect of the invention, simulation experiments were carried out to assess the technical effect of the above embodiments. The experiments evaluated the quadtree-based MC3Alg and MC3Alg+ algorithms together with the structure-first and space-first approaches. Because the structure-first and space-first approaches run very slowly, their performance is reported in only one group of experiments below. The experimental setup is as follows:
1) Data sets
The experiments use four data sets: three real data sets (Gowalla, FourSquare, Flickr) and one synthetic data set (YoutubeSyn). In Gowalla, each vertex is a Gowalla user and each edge represents the friendship between two users. Each user has many check-ins, and the most frequent check-in is selected as the user's position. This data set is also used to test the case in which a user has multiple check-ins. In FourSquare, each vertex is a user of the Foursquare website and each edge represents the social relationship between two users. For each user, the most frequent check-in is selected as the user's position. In Flickr, vertices are users and edges represent the "following" relationship between two users. Each user is placed at the location where the user has the most photo tags. In YoutubeSyn, each vertex represents a Youtube user and each edge is the "following" relationship between two users. The users have no location information, however, so a position is generated for each user. In addition, positions are generated in the experiments using two location distributions, a random distribution and a Gaussian distribution. Details of the data sets are given in Table 1, which includes the average degree and max_k, the maximum number of positions on a vertex.
Table 1: data set attributes
2) Parameter settings
The number m (the number of grid cells in the extended search region) is set to 10. Experiments show that this parameter does not have a large effect on performance; the best running time is obtained with m = 10, which is therefore used as the default value in all experiments. In the experiments with multiple positions per user, for Gowalla the positions of a user are all of that user's check-ins; for YoutubeSyn the positions of the users are generated randomly. In the experiments with different distributions, positions are generated to satisfy two distributions, a random distribution and a Gaussian distribution. For all data sets, the positions are placed in a square of size [0, 100] × [0, 100].
3) Experimental equipment
The experiments were run on a machine with an Intel i7-6700 3.40 GHz processor and 16 GB of memory running Windows 10, and all algorithms were implemented in Java.
The experimental results show that factors such as changing the diameter D, a vertex having multiple check-in locations, and changing the distribution of user positions all affect the technical effect of the embodiments of the present invention.
Fig. 7(a) to Fig. 7(c) illustrate the relationship between diameter and running time. Specifically, changing the diameter D affects the search region and efficiency of the structure-first method, the space-first method, MC3Alg and MC3Alg+. In Fig. 7(a) to Fig. 7(c), the abscissa is the diameter D, varied from 2.5 to 12.5 (in coordinates after the actual coordinates have been mapped into the [0, 100] × [0, 100] square search region), and the ordinate is the running time in seconds (sec). The figures show the running time of four algorithms, namely the space-first method (spatial), the structure-first method (structure), MC3Alg and MC3Alg+; Fig. 7(a) shows the results on the Flickr data set, Fig. 7(b) on the FourSquare data set, and Fig. 7(c) on the Gowalla data set. It can be observed that MC3Alg+ always outperforms the other algorithms, because it has the most pruning and optimization strategies, whereas the space-first and structure-first methods are very time-consuming; these two algorithms are therefore omitted from the subsequent experiments.
Fig. 8(a) and Fig. 8(b) illustrate the relationship between the number of positions and running time, where the abscissa is the number of positions and the ordinate is the running time (sec); Fig. 8(a) shows the results on the YoutubeSyn data set and Fig. 8(b) on the Gowalla data set. When a vertex has multiple check-in locations, more locations lead to more k-core checks, so the number of check-ins affects the performance of MC3Alg and MC3Alg+. It can be observed that MC3Alg+ is less affected by multiple check-ins, because the binary search accelerates its rotation process. In addition, MC3Alg+ runs about 7 times faster than MC3Alg.
Fig. 9(a) to Fig. 9(b) are schematic diagrams of the relationship between location distribution and running time, where the abscissa is the diameter value and the ordinate is the running time. Fig. 9(a) corresponds to a Gaussian distribution of the YoutubeSyn data set, and Fig. 9(b) corresponds to a random distribution of the YoutubeSyn data set. It can be observed that MC3Alg+ always outperforms MC3Alg. It should be noted that the advantage of MC3Alg+ is more pronounced under the Gaussian distribution, because some nodes contain a very large number of vertices, which gives MC3Alg a higher complexity when searching these nodes.
Fig. 10(a) to Fig. 10(b) show the scalability results, where the abscissa is the percentage of vertices, that is, the percentage of the number of vertices in the whole data set (for example, 20% indicates that the experiment is carried out on a sub-data set containing 20% of the vertices of a given data set), and the ordinate is the running time. Fig. 10(a) corresponds to the Flickr data set, and Fig. 10(b) corresponds to the FourSquare data set. The scalability of the embodiments of the present invention is demonstrated by varying the size of these two data sets. It can be observed that both algorithms adapt well to the data set size, and that MC3Alg+ runs fastest because it has more pruning strategies.
In conclusion for the search problem of the most co-located community of cohesiveness, the present invention provides various embodiments,It is preferably based in the index structure (i.e. DkQ-TREE) of quaternary tree, spatial information and local structural information is integrated,Accelerate the speed of target community's search.Also, it is based on DkQ-TREE, proposes two kinds of efficient algorithms, by true and conjunctionAt data set progress, experimental results demonstrate the efficiency of the algorithm proposed and validity.The community search side of the embodiment of the present inventionMethod can be used for the behavioural analysis of social network user, recommend, disease forecasting etc..
It should be noted that, although the steps are described above in a particular order, this does not mean that the steps must be executed in that order. In fact, some of these steps may be executed concurrently, or even in a different order, as long as the required functions can be realized. In addition, those skilled in the art may suitably modify some embodiments without departing from the spirit of the invention, for example, by rotating the circle counterclockwise, or by setting an appropriate diameter D based on the scale of the attributed network, user requirements, query speed requirements, and the like.
The present invention may be a system, a method and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement various aspects of the present invention.
The computer-readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing.
Various embodiments of the present invention have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.