Disclosure of Invention
The invention provides a knowledge-graph-based method, device and storage medium for identifying high-net-value customer groups.
The invention adopts the following technical scheme:
the high-net-value customer group identification method based on the knowledge graph comprises the following steps:
constructing a client knowledge graph model, wherein the knowledge graph model comprises a plurality of nodes and node relations, each node has a unique name, the nodes are divided into client nodes and non-client nodes, edges exist between the client nodes and the non-client nodes, and the edges are node relations;
importing the client data in the MySQL structured database into a Neo4j graph database;
randomly determining a client node as an initial client node in a graph, traversing neighbor client nodes of the initial client node in a breadth-first mode, and calculating the strength of social relationship among clients;
if the calculated social relationship strength between a client and the initial client is greater than 0.5, judging that the social relationship is strong and regarding the client as belonging to the same group, and if the calculated social relationship strength between the client and the initial client is less than 0.5, judging that the social relationship is weak and excluding the client from the current node's group;
for each client node to be added, judging whether the number of contracts increases after the client node is added and whether the social relationship strength between the clients is greater than 0.5, and if so, adding the client node to the set corresponding to the node;
sequentially traversing the client nodes that have a strong social relationship with the initial client node, to obtain the set corresponding to each such client node;
and combining the sets of all the client nodes to obtain the high-net-value client group in the graph.
Further, the customer node is identification information of a customer, the non-customer node is one of a mobile phone number, a bank card, a contract and a home address, the node relationship of the contract includes but is not limited to contract number, contract time, contract building and contract amount, and the node relationship of the identification card includes but is not limited to name and home address.
Further, the edges also represent incidence relations and ternary relations between the client nodes and the non-client nodes.
Further, the ternary relationship is a set consisting of a start node, a tail node and a relationship edge pointing from the start node to the tail node, and the ternary relationships comprise (identity card, uses telephone, mobile phone number), (contract, uses bank card, bank card number), (identity card, contract), and (identity card, residence address, home address).
Further, importing the customer data in the MySQL structured database into the Neo4j graph database includes: standardizing the format of the home address, and writing the ternary relationships into the knowledge graph model using Python.
Further, the calculation formula for calculating the social relationship strength between the clients is as follows:
wherein $Rs$ is the social relationship strength between clients, $k$ is the number of independent variables, $a_k$ is the weight of the $k$-th independent variable, and $f(x_k)$ is the normalized value of $x_k$, which lies in $[0, 1]$.
Furthermore, in the calculation of the social relationship strength between clients, the number of independent variables is three, wherein $x_1$ is the number of paths between two client nodes, $x_2$ indicates whether the contract-building node relationship of the two client nodes is the same, and $x_3$ is the time interval between the contract signing times of the two client nodes; $a_1$ is the weight of $x_1$ and equals 3/5, and $a_2$ and $a_3$ are the weights of $x_2$ and $x_3$ respectively, each equal to 1/5; the calculation formula of $f(x_k)$ is as follows:
An apparatus comprising a processor, a memory, and a communication bus; the communication bus is used for realizing connection and communication between the processor and the memory; the processor is configured to execute one or more programs stored in the memory to perform the steps of the knowledge-graph-based high-net-value customer group identification method of any of claims 1 to 7.
A storage medium storing one or more programs, the one or more programs being executable by one or more processors to perform the steps of the knowledge-graph-based high-net-value customer group identification method of any of claims 1 to 7.
The beneficial effects of the invention are as follows: the invention builds a knowledge graph from customers' contract-signing information, calculates the social relationship strength between customers from the number of paths connecting a customer to its neighbor nodes in the knowledge graph, the contract signing time, and whether the same building was purchased, and, for customers that meet the relationship strength condition, judges whether the number of contracts increases after the customer node is added, so as to determine the high-net-value customer group.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
Here, high-net-value customers are defined as the repeat-purchase customers of a real estate company.
Example one
As shown in fig. 1, the present invention provides a method for identifying a high-net-value customer group based on a knowledge-graph, comprising:
S1, constructing a customer knowledge graph model, wherein the knowledge graph model comprises a plurality of nodes and node relations, each node has a unique name, the nodes are divided into customer nodes and non-customer nodes, edges exist between the customer nodes and the non-customer nodes, and the edges are the node relations.
The client node is the identity identification information of the client, and the non-client node is one of a mobile phone number, a bank card, a contract and a home address; in the node relationship, the node relationship of the contract includes but is not limited to contract number, contract time, contract building and contract amount, and the node relationship of the identity card includes but is not limited to name and home address.
The edges also represent an association relationship, the ternary relationship (h, r, t), between the customer node and the non-customer nodes. In the ternary relationship (h, r, t), h represents the start node, t represents the tail node, and r is the relationship edge pointing from h to t. The ternary relationships involved are: (identification card, uses phone, mobile phone number), (contract, uses bank card, bank card number), (identification card, contract), (identification card, residence, home address).
S2, importing the client data in the MySQL structured database into a Neo4j graph database. The data processing mainly involves removing empty records and standardizing the format of the home address; the ternary relationships are then written into the knowledge graph model using Python, according to the relationships among the nodes in S1.
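The import step can be sketched as follows. This is a minimal illustration, not the implementation described in the source: the `Triple` type, the `normalize_address` and `merge_statement` helpers, and the node labels and relationship names are all hypothetical. The sketch only builds parameterized Cypher `MERGE` statements; node labels and relationship types are interpolated into the query text because Cypher does not accept them as parameters, so they are checked first.

```python
from typing import Dict, NamedTuple, Tuple


class Triple(NamedTuple):
    """One (h, r, t) ternary relationship; field names are hypothetical."""
    head_label: str
    head_name: str
    relation: str
    tail_label: str
    tail_name: str


def normalize_address(address: str) -> str:
    """Assumed standardization: trim and collapse runs of whitespace."""
    return " ".join(address.split())


def merge_statement(triple: Triple) -> Tuple[str, Dict[str, str]]:
    """Build a parameterized Cypher MERGE query for one triple."""
    for token in (triple.head_label, triple.relation, triple.tail_label):
        if not token.isidentifier():
            raise ValueError(f"unsafe label or relationship type: {token!r}")
    query = (
        f"MERGE (h:{triple.head_label} {{name: $head}}) "
        f"MERGE (t:{triple.tail_label} {{name: $tail}}) "
        f"MERGE (h)-[:{triple.relation}]->(t)"
    )
    return query, {"head": triple.head_name, "tail": triple.tail_name}


# Hypothetical record: an identity card linked to a home address.
example = Triple("Customer", "ID001", "LIVES_AT", "Address",
                 normalize_address("  12  Garden   Road "))
query, params = merge_statement(example)
```

Using `MERGE` rather than `CREATE` keeps each node name unique when the same phone number or address appears in several rows. With the official Neo4j Python driver, each pair could then be passed to `session.run(query, params)`.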
S3, randomly determining a client node as an initial client node in a graph, traversing the neighbor client nodes of the initial client node in a breadth-first mode, and calculating the social relationship strength among clients.
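The breadth-first traversal in S3 can be sketched on an in-memory adjacency list. This is an assumption-laden illustration: the source runs on Neo4j, whereas the `graph` dictionary, the node names and both helper functions below are hypothetical. Because edges only connect client nodes to non-client nodes, a "neighbor client" is taken here to be a client reached through one shared non-client node.

```python
from collections import deque
from typing import Dict, List, Set


def neighbor_clients(graph: Dict[str, List[str]], clients: Set[str],
                     node: str) -> List[str]:
    """Clients that share at least one non-client node with `node`."""
    found: List[str] = []
    for shared in graph.get(node, []):
        if shared in clients:
            continue  # edges of interest run client -> non-client -> client
        for other in graph.get(shared, []):
            if other in clients and other != node and other not in found:
                found.append(other)
    return found


def bfs_clients(graph: Dict[str, List[str]], clients: Set[str],
                start: str) -> List[str]:
    """Return client nodes in breadth-first order, excluding `start`."""
    order: List[str] = []
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbor_clients(graph, clients, current):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


# Hypothetical graph: A1 shares a phone (M1) and address (F1) with A2,
# and a contract (D1) with A3.
graph = {
    "A1": ["M1", "F1", "D1"], "A2": ["M1", "F1"], "A3": ["D1"],
    "M1": ["A1", "A2"], "F1": ["A1", "A2"], "D1": ["A1", "A3"],
}
clients = {"A1", "A2", "A3"}
traversal = bfs_clients(graph, clients, "A1")
```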
The calculation formula for calculating the social relationship strength among the clients is as follows:
wherein $Rs$ is the social relationship strength between clients, $k$ is the number of independent variables, $a_k$ is the weight of the $k$-th independent variable, and $f(x_k)$ is the normalized value of $x_k$, which lies in $[0, 1]$.
In the above formula, the number of independent variables is three, wherein $x_1$ is the number of paths between two client nodes, $x_2$ indicates whether the contract-building node relationship of the two client nodes is the same, and $x_3$ is the time interval between the contract signing times of the two client nodes.
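The definitions above suggest a weighted sum of normalized variables. The form below is a reconstruction under that assumption, not the formula as printed in the source; the summation index $i$ is introduced here for clarity, and the weights are those stated in the disclosure (3/5, 1/5, 1/5).

$$Rs = \sum_{i=1}^{k} a_i\, f(x_i), \qquad \text{and for } k = 3:\quad Rs = \tfrac{3}{5} f(x_1) + \tfrac{1}{5} f(x_2) + \tfrac{1}{5} f(x_3)$$

Under this reading, because each $f(x_i)$ lies in $[0, 1]$ and the weights sum to 1, $Rs$ also lies in $[0, 1]$, which is consistent with comparing it against a threshold of 0.5.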
In the above formula, $f(x_k)$ is the variable normalization function, which covers the normalization of three variables: the number of associated paths, the signing time interval, and whether the contracts are for the same building. The signing time interval and the number of associated paths are continuous variables, whereas whether the contracts are for the same building is a categorical variable, so the two types of variable are processed separately. $a_1$, the weight corresponding to the number of paths between client nodes, plays an important role in the relationship measure and is 3/5; the weights $a_2$ and $a_3$ are 1/5 each.
In the formula for $f(x_k)$, when $k \in \{1, 2\}$ the maximum values of $x$ are 4 and 1 respectively; when $k = 3$, $x$ is the time interval variable, whose minimum is 0 and whose maximum is determined by the time of the repurchase. Data statistics show that repeat-purchase customers generally repurchase within three years, so three years is taken as the effective time interval.
For example, as shown in fig. 2, A represents a customer node, M a telephone node, C a bank card node, D a contract node, and F a home address node. A1 is the initial client node, and its associated neighbor client nodes are A2 and A3, where A2 and A1 share the mobile phone number and the home address information, and A3 and A1 have the same contract information.
Taking A1 in the graph, the relationship attributes between this client and its neighbor clients are calculated with the Neo4j query language Cypher from the building and purchase-time attributes of the contract nodes; the table below shows the statistical results.
| Node | A2 | A3 |
| --- | --- | --- |
| Number of associated paths | 2 | 4 |
| Same building purchased | 0 | 1 |
| Purchase time interval (days) | 121 | 0 |
| Number of contracts increases | Yes | No |
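The three attributes in the table can be illustrated on in-memory data. The source computes them with Cypher; the sketch below uses plain Python instead, purely to show what each attribute measures. Everything in it is an assumption: the `links` and `contracts` structures, the building names and signing dates are hypothetical, each client is simplified to a single contract, and the path count is taken to be the number of non-client nodes shared by the two clients.

```python
from datetime import date
from typing import Dict, Set


def pair_attributes(links: Dict[str, Set[str]],
                    contracts: Dict[str, dict],
                    a: str, b: str) -> Dict[str, int]:
    """Compute the three relationship attributes for clients a and b."""
    shared = links[a] & links[b]  # non-client nodes both clients connect to
    same_building = int(contracts[a]["building"] == contracts[b]["building"])
    interval_days = abs((contracts[a]["signed"] - contracts[b]["signed"]).days)
    return {
        "paths": len(shared),
        "same_building": same_building,
        "interval_days": interval_days,
    }


# Hypothetical data chosen to mirror the A2 column of the table.
links = {"A1": {"M1", "F1"}, "A2": {"M1", "F1"}}
contracts = {
    "A1": {"building": "B1", "signed": date(2020, 1, 1)},
    "A2": {"building": "B2", "signed": date(2020, 5, 1)},
}
attributes = pair_attributes(links, contracts, "A1", "A2")
```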
For the nodes {A2, A3} obtained by breadth-first traversal, the social relationship strength of each node is calculated in turn. Social relationship strength of A2:
Social relationship strength of A3:
S4, if the calculated social relationship strength between a client and the initial client is greater than 0.5, judging that the social relationship is strong and regarding the client as belonging to the same group, and if the calculated social relationship strength between the client and the initial client is less than 0.5, judging that the social relationship is weak and excluding the client from the current node's group.
In the example of step S3, the social relationship strength of A2 satisfies the condition, so the relationship is judged strong and A2 is regarded as part of the group; the social relationship strength of A3 also satisfies the condition, so the relationship is judged strong and A3 is regarded as part of the group.
S5, for each client node to be added, judging whether the number of contracts increases after the client node is added and whether the social relationship strength between the clients is greater than 0.5, and if so, adding the client node to the set corresponding to the node.
In the example in step S3, the social relationship strength of A2 satisfies the condition and the number of contracts increases after node A2 is added, so A2 is added to the result set {A1, A2}. The social relationship strength of A3 satisfies the condition, but the number of contracts does not increase, so A3 does not satisfy the overall condition.
S6, sequentially traversing the client nodes that have a strong social relationship with the initial client node, to obtain the set corresponding to each such client node.
In the example in step S3, neither A2 nor A3 has other neighbor client nodes. If they did, the other neighbor client nodes of A2 would be traversed in turn to obtain the set of nodes with a strong social relationship to A2, and the other neighbor client nodes of A3 would be traversed in turn to obtain the corresponding set for A3.
S7, merging the sets of all the client nodes to obtain the high-net-value client group in the graph.
In the example in step S3, since neither A2 nor A3 has any other neighbor client nodes, the high-net-value client group finally obtained by traversing the graph is {A1, A2}.
In practical application, when A2 and A3 have other neighbor client nodes, the same traversal applied to the neighbor nodes of A1 continues from them, and the result sets are finally merged to obtain the result for the graph.
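Steps S3 to S7 can be sketched as a single breadth-first expansion. This is one interpretation under stated assumptions, not the source's implementation: `identify_group`, `strength` and `contracts_increase` are hypothetical names, the strength function is supplied by the caller, and the strength values 0.6 and 1.0 are placeholders (the source states only that both exceed 0.5). Following S6, every strongly related client is traversed further, but only those whose addition increases the number of contracts join the group.

```python
from collections import deque
from typing import Callable, Dict, List, Set


def identify_group(start: str,
                   neighbors: Dict[str, List[str]],
                   strength: Callable[[str, str], float],
                   contracts_increase: Callable[[str], bool],
                   threshold: float = 0.5) -> Set[str]:
    """Expand from `start`, keeping strong neighbors that add contracts."""
    group = {start}
    visited = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for candidate in neighbors.get(current, []):
            if candidate in visited:
                continue
            visited.add(candidate)
            if strength(current, candidate) > threshold:   # S4
                queue.append(candidate)                     # S6
                if contracts_increase(candidate):           # S5
                    group.add(candidate)
    return group                                            # S7: merged set


# Worked example from the text; strength values are placeholders.
neighbors = {"A1": ["A2", "A3"]}
placeholder_strength = {("A1", "A2"): 0.6, ("A1", "A3"): 1.0}
adds_contracts = {"A2": True, "A3": False}

result = identify_group(
    "A1",
    neighbors,
    lambda a, b: placeholder_strength[(a, b)],
    lambda node: adds_contracts[node],
)
```

With these inputs, A3 passes the strength test but fails the contract test, so the result is the set containing A1 and A2, matching the example in the text.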
Example two
In this embodiment, on the basis of the first embodiment, an apparatus is provided, as shown in fig. 3, which is mainly used for implementing the steps of the knowledge-graph-based high-net-value customer group identification method of the first embodiment. The apparatus mainly includes a processor 21, a memory 22 and a communication bus 23; the communication bus 23 is used for realizing connection and communication between the processor 21 and the memory 22; the processor 21 is configured to execute one or more programs stored in the memory 22 to implement the steps of the knowledge-graph-based high-net-value customer group identification method of the first embodiment. For details, please refer to the description in the first embodiment, which is not repeated here.
In addition, the present embodiment also provides a storage medium, where the storage medium stores one or more programs, and the one or more programs are executable by one or more processors to implement the steps of the knowledge-graph-based high-net-value customer group identification method of the first embodiment. For details, please refer to the description in the first embodiment, which is not repeated here.
The beneficial effects of the invention are as follows: the invention builds a knowledge graph from customers' contract-signing information, calculates the social relationship strength between customers from the number of paths connecting a customer to its neighbor nodes in the knowledge graph, the contract signing time, and whether the same building was purchased, and, for customers that meet the relationship strength condition, judges whether the number of contracts increases after the customer node is added, so as to determine the high-net-value customer group.
It will be apparent to those skilled in the art that the modules or steps of the invention described above may be implemented in a general purpose computing device, they may be centralized on a single computing device or distributed across a network of computing devices, and optionally they may be implemented in program code executable by a computing device, such that they may be stored on a computer storage medium (ROM/RAM, magnetic disks, optical disks) and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
Finally, it should be noted that the above examples are only intended to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced; such modifications and substitutions do not cause the essence of the corresponding technical solution to depart from the scope of the technical solutions of the embodiments of the present invention, and are intended to be covered by the claims and the specification of the present invention.