Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application.
With the continuous development of Internet technology, artificial intelligence (AI) technology has also advanced. Artificial intelligence is the theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
Artificial intelligence is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Machine learning (ML) is a multi-field interdiscipline involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It studies how a computer simulates or implements human learning behavior to acquire new knowledge or skills, and how it reorganizes existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence; it is applied throughout the various fields of artificial intelligence. Deep learning is machine learning performed with deep neural networks. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
Based on a machine learning/deep learning technology in an AI technology, the embodiment of the application provides an account identification scheme to improve the recall rate of an abnormal account. It should be noted that the embodiments of the present application may be applied to various scenarios, including, but not limited to, cloud technology, artificial intelligence, intelligent transportation, driving assistance, and the like.
Referring to FIG. 1a, the general principle of the account identification scheme provided by the embodiment of the present application is as follows. First, a plurality of account attributes of each account to be identified are obtained, and feature extraction is performed on each account attribute of each account to obtain a plurality of target account features. An account relationship network is then constructed using the plurality of target account features. Next, a plurality of triplets are sampled from the account relationship network and used to train and optimize a vector representation model, and the optimized vector representation model is used to determine the target characterization vector of each node in the account relationship network. Finally, the accounts corresponding to the nodes are clustered based on the target characterization vectors of the nodes, and abnormal accounts are identified among the accounts participating in the clustering according to the clustering result.
Practice shows that the account identification scheme provided by the embodiment of the present application has the following advantages: ① account attributes of more dimensions can be referred to at the same time, and expansion of the account attributes is supported, so that a more accurate account relationship network is constructed; ② the target characterization vector of each node is obtained with improved accuracy, so that the account corresponding to each node is represented more accurately and the accuracy of the clustering result is effectively improved; ③ the recall rate of abnormal accounts is improved without affecting the identification accuracy.
In a specific implementation, the account identification scheme may be executed by a computer device, which may be a terminal or a server. The terminal may include, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart watch, a smart voice interaction device, a smart home appliance, a vehicle-mounted terminal, an aircraft, and the like, and various clients (applications) may run in the terminal, such as a video playing client, a social client, a browser client, an information flow client, an education client, and the like. The server mentioned herein may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms. Cloud computing is a computing mode that distributes computing tasks over a resource pool formed by a large number of computers, so that various application systems can acquire computing power, storage space and information services as required.
The computer device in the embodiment of the present application may be located inside or outside a blockchain network, which is not limited herein. A blockchain network is a network composed of a peer-to-peer (P2P) network and a blockchain. A blockchain is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms and encryption algorithms; it is essentially a decentralized database, consisting of a series of data blocks (or blocks) that are generated in association with one another using cryptographic methods.
Alternatively, in other embodiments, the above account identification scheme may be executed jointly by the server and the terminal, where the terminal and the server may be connected directly or indirectly through wired or wireless communication, which is not limited herein. For example, the terminal may be responsible for acquiring a plurality of account attributes of each account to be identified to obtain a plurality of target account features, and then send the plurality of target account features to the server; the server may then use the plurality of target account features to construct an account relationship network, determine the target characterization vector of each node through the account relationship network, and identify abnormal accounts based on the target characterization vector of each node, as shown in FIG. 1b. For another example, the terminal may be responsible for acquiring a plurality of account attributes of each account to be identified to obtain a plurality of target account features, and then send the plurality of target account features to the server; the server may use the plurality of target account features to construct an account relationship network, determine the target characterization vector of each node based on the account relationship network, and send the target characterization vector of each node to the terminal; the terminal then clusters the accounts corresponding to the nodes based on the target characterization vectors of the nodes, and identifies abnormal accounts among the accounts participating in the clustering according to the clustering result. It should be understood that these two cases of the terminal and the server jointly executing the account identification scheme are merely examples and are not exhaustive.
Based on the above description of the account identification scheme, the embodiment of the present application proposes an account identification method, which may be executed by the above-mentioned computer device (terminal or server), or may be executed jointly by the terminal and the server. For ease of description, the following takes the computer device executing the account identification method as an example. Referring to FIG. 2, the account identification method may include the following steps S201-S205:
S201, obtaining a plurality of account attributes of each account to be identified, and extracting features of the account attributes of each account to obtain a plurality of target account features.
The accounts to be identified may be all accounts in any platform, that is, every account created in the platform, or may be the accounts that have been complained about in the platform, which is not limited in the present application.
It should be noted that a platform may receive a plurality of complaints every day. In this case, the computer device may take the accounts that have been complained about as the accounts to be identified; alternatively, when association detection needs to be performed on every account in a platform, the computer device may take every account in the platform as an account to be identified, and so on.
Correspondingly, the account attributes may include, but are not limited to, account information, report information, and the like; the specific content of the account attributes of any account is not limited in the present application. The account information includes, but is not limited to, a nickname, an avatar, a personalized signature, a login address, and the like, and the report information includes, but is not limited to, a report description, a reported chat, and the like.
In a specific implementation, when obtaining the plurality of account attributes of each account to be identified, if the computer device stores a plurality of accounts and the account attributes of each account in its own storage space, the computer device may select each account to be identified, together with its plurality of account attributes, from the stored accounts and corresponding account attributes. Alternatively, if the accounts of a platform and the corresponding account attributes are stored in a target operation device, the computer device may obtain each account to be identified and its plurality of account attributes from the target operation device. It should be noted that the manner of acquiring the plurality of account attributes of each account to be identified is not limited in the present application.
S202, constructing an account relationship network using the plurality of target account features, wherein one node in the account relationship network records the target account features of one account, and two nodes connected by an edge record the same target account feature.
It should be noted that an account among the accounts to be identified may have at least one of the plurality of target account features, that is, several of the target account features may be account features of the same account. An account among the accounts to be identified may also have none of the plurality of target account features, which is not limited in the present application.
In a specific implementation, the computer device may use the plurality of target account features to construct an account relationship network for the accounts to be identified, where the account corresponding to any node in the account relationship network has at least one target account feature. It can be understood that an account without any target account feature is an account that is not associated with any other account among the accounts to be identified; the computer device may therefore determine that such an account is a non-abnormal account and need not identify it further. Optionally, the account relationship network may also include nodes corresponding to accounts without any target account feature; in this case, the target account features recorded by such a node may be empty, and the node is not connected to any other node. The specific manner of constructing the account relationship network is not limited in the present application.
It should be noted that the computer device may screen the account attributes of each account based on the SAN framework (Similar Attribute Network, a general framework based on similar attributes of accounts) to obtain the plurality of target account features (i.e., screened features). That is, the computer device may perform feature extraction on the account attributes of each account based on the framework to obtain a plurality of initial account features, screen the plurality of initial account features to obtain the plurality of target account features, and then use the plurality of target account features as relationships to associate the nodes corresponding to the accounts with one another, thereby forming the account relationship network, as shown in FIG. 3. It can be understood that the SAN framework provided by the embodiment of the present application can flexibly use the account attributes of each account to construct the account relationship network; that is, the framework is general-purpose for merchant-based account clustering and supports expansion of the account attributes.
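The construction in step S202 can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the embodiment: the function name `build_account_network`, the dictionary input format and the sample feature strings are hypothetical, and accounts without any target account feature are simply left out of the network (one of the two options described above).

```python
from collections import defaultdict
from itertools import combinations


def build_account_network(account_features):
    """Build an undirected account relationship network.

    account_features: dict mapping an account id to the set of target
    account features that the account holds (hypothetical input format).
    Returns a dict mapping each node to the set of its neighbor nodes.
    """
    # Invert the mapping: which accounts hold each target account feature?
    holders = defaultdict(set)
    for account, features in account_features.items():
        for feature in features:
            holders[feature].add(account)

    # Only accounts holding at least one target account feature become nodes.
    graph = {acc: set() for acc, feats in account_features.items() if feats}

    # Two nodes are connected by an edge when they record the same feature.
    for accounts in holders.values():
        for a, b in combinations(sorted(accounts), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Hypothetical sample data: A shares a nickname feature with B and a
# login-address feature with C; D holds no target account feature.
accounts = {
    "A": {"nickname:honest tcm", "login:addr-1"},
    "B": {"nickname:honest tcm"},
    "C": {"login:addr-1"},
    "D": set(),
}
network = build_account_network(accounts)
```

In this sketch, account D does not appear in `network`, which corresponds to treating an account without any target account feature as a non-abnormal account that needs no further identification.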
S203, sampling a plurality of triplets from the account relationship network, wherein one triplet comprises a center node, a positive sample node and a negative sample node, the center node is a node selected from the account relationship network, the positive sample node is a node connected with the center node, and the negative sample node is a node not connected with the center node.
It should be noted that the computer device may traverse the account relationship network and generate the plurality of triplets by random walk, so as to obtain a training data set composed of the plurality of triplets for the subsequent training and optimization of the vector representation model.
In one embodiment, for any one of the plurality of triplets, the computer device may randomly select a node from the account relationship network as the center node (center) of the triplet, select a node from the nodes connected to that center node as the positive sample node (positive) of the triplet, and select a node from the nodes not connected to that center node as the negative sample node (negative) of the triplet. In this case, the computer device randomly selects a node in the account relationship network as the center node of each triplet, so as to sample the plurality of triplets from the account relationship network.
In another embodiment, the computer device may take each node in the account relationship network in turn as the center node of at least one triplet, so as to sample the plurality of triplets from the account relationship network. Correspondingly, after determining the center node of a triplet, the computer device may select one node from the nodes connected to that center node as the positive sample node of the triplet, and select one node from the nodes not connected to that center node as the negative sample node of the triplet.
For example, assume that the computer device takes node A in the account relationship network as the center node of a triplet, that the nodes connected to the center node include node D, node E and node H, and that the nodes not connected to the center node include node B, node C, node F, node G, node I, node J and node K. The computer device may then select one node from node D, node E and node H as the positive sample node of the triplet, and select one node from node B, node C, node F, node G, node I, node J and node K as the negative sample node of the triplet, as shown in FIG. 3.
Further, for any triplet of the above-mentioned plurality of triples, the positive sample node in any triplet may be determined by, but not limited to, the following ways:
In the first determination manner, the computer device may randomly select a node from the nodes connected to the center node of the triplet as the positive sample node of the triplet; that is, the computer device selects one of the nodes connected to the center node with equal probability.
In the second determination manner, the computer device may determine the weight value of the edge between the center node of the triplet and each node connected to it, and perform sampling among the nodes connected to the center node according to these weight values. That is, weighted sampling is performed on the nodes connected to the center node, and the sampled node is taken as the positive sample node of the triplet. The manner of determining the weight value of the edge between any two connected nodes is described below and is not repeated here.
It can be understood that, in the weighted sampling process, a node connected by an edge with a larger weight value (i.e., a node with a larger weight value) is more likely to be sampled; in other words, after sampling is performed a plurality of times, nodes with larger weight values are sampled more often.
Correspondingly, the negative sample node of any triplet may be determined by randomly selecting one node from the nodes that are not connected to the center node of the triplet.
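The triplet sampling of step S203, including both ways of determining the positive sample node, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `sample_triplet`, the adjacency-set graph format, and the `edge_weights` dictionary keyed by `frozenset` node pairs are hypothetical, and the weight values used in the example are made up for demonstration (the actual weight determination is described elsewhere in the application).

```python
import random


def sample_triplet(graph, center, rng, edge_weights=None):
    """Sample one (center, positive, negative) triplet for a given center node.

    graph: dict mapping each node to the set of its neighbor nodes.
    edge_weights: optional dict mapping frozenset({u, v}) to an edge weight;
    when given, the positive sample node is drawn by weighted sampling,
    otherwise with equal probability.
    Returns None when the center node has no neighbor or no non-neighbor.
    """
    neighbors = sorted(graph[center])
    non_neighbors = sorted(
        n for n in graph if n != center and n not in graph[center]
    )
    if not neighbors or not non_neighbors:
        return None

    if edge_weights is None:
        # First way: every connected node is equally likely.
        positive = rng.choice(neighbors)
    else:
        # Second way: nodes on heavier edges are sampled more often.
        weights = [edge_weights[frozenset((center, n))] for n in neighbors]
        positive = rng.choices(neighbors, weights=weights, k=1)[0]

    # The negative sample node is drawn uniformly from unconnected nodes.
    negative = rng.choice(non_neighbors)
    return center, positive, negative


# Toy graph following the example: A is connected to D, E and H only.
graph = {node: set() for node in "ABCDEFGHIJK"}
for neighbor in "DEH":
    graph["A"].add(neighbor)
    graph[neighbor].add("A")

rng = random.Random(0)
uniform_triplet = sample_triplet(graph, "A", rng)

# Hypothetical edge weights, chosen only for illustration.
weights = {
    frozenset(("A", "D")): 5.0,
    frozenset(("A", "E")): 1.0,
    frozenset(("A", "H")): 1.0,
}
weighted_triplet = sample_triplet(graph, "A", rng, edge_weights=weights)
```

With the weights above, node D would be expected to appear as the positive sample node more often than node E or node H over many draws, which matches the behaviour described for weighted sampling.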
S204, training and optimizing the vector representation model according to the initial characterization vector of each node in each triplet, and determining the target characterization vector of each node in the account relation network by adopting the optimized vector representation model.
The vector representation model may include, but is not limited to, GraphSAGE (Graph SAmple and aggreGatE, a graph neural network (GNN) that generates the target characterization vector of a center node by learning a function that aggregates the representations of neighboring nodes), DeepWalk (an algorithm for learning node embeddings), GCN (Graph Convolutional Network), and the like, which is not limited in the present application. It can be understood that the vector representation model is an unsupervised model, and the computer device may perform unsupervised training with the vector representation model to determine the target characterization vector of each node in the account relationship network.
S205, clustering the accounts corresponding to the nodes based on the target characterization vectors of the nodes, and identifying abnormal accounts among the accounts participating in clustering according to the clustering result.
It should be noted that, when clustering the accounts corresponding to the nodes based on the target characterization vectors of the nodes, the computer device may cluster the target characterization vectors of the nodes using a K-Means clustering algorithm, a mean shift clustering algorithm, a hierarchical clustering algorithm, and the like, thereby clustering the accounts corresponding to the nodes.
In the embodiment of the present application, a plurality of account attributes of each account to be identified can be obtained, so that account attributes of more dimensions can be referred to at the same time and expansion of the account attributes is supported. Correspondingly, feature extraction can be performed on each account attribute of each account to obtain a plurality of target account features, and a more accurate account relationship network is constructed using the plurality of target account features, where one node in the account relationship network records the target account features of one account, and two nodes connected by an edge record the same target account feature. A plurality of triplets can then be sampled from the account relationship network, the vector representation model is trained and optimized according to the initial characterization vector of each node in each triplet, and the optimized vector representation model is used to determine a more accurate target characterization vector for each node in the account relationship network, so that the target characterization vector of each node better represents the corresponding account. Correspondingly, the accounts corresponding to the nodes can be clustered based on the target characterization vectors of the nodes to obtain a more accurate clustering result, and abnormal accounts are identified among the accounts participating in the clustering according to the clustering result, so that the recall rate of abnormal accounts is improved without affecting the identification accuracy.
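The clustering of step S205 can be illustrated with a small K-Means sketch. This is a minimal, dependency-free illustration rather than the clustering code of the embodiment: the function name `kmeans`, the deterministic choice of the first k vectors as initial centroids, the two-dimensional sample vectors and the node identifiers are all assumptions made for the example. How abnormal accounts are picked out from the resulting clusters is not specified in this passage, so the sketch stops at grouping accounts by cluster.

```python
import math


def kmeans(vectors, k, iterations=20):
    """Cluster vectors with a basic K-Means loop.

    Assumption: the first k vectors are used as initial centroids so that
    the example is reproducible; real systems usually initialize randomly.
    Returns (labels, centroids).
    """
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach every vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical target characterization vectors for five nodes.
node_ids = ["n1", "n2", "n3", "n4", "n5"]
vectors = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2)]

labels, centroids = kmeans(vectors, k=2)

# Group the accounts (nodes) by the cluster they were assigned to.
clusters = {}
for node, label in zip(node_ids, labels):
    clusters.setdefault(label, []).append(node)
```

Mean shift or hierarchical clustering could be substituted for the `kmeans` function without changing the surrounding grouping logic.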
FIG. 4 is a flowchart of another account identification method according to an embodiment of the present application. The account identification method may be executed by the above-mentioned computer device (terminal or server), or may be executed jointly by the terminal and the server. For ease of description, the following takes the computer device executing the account identification method as an example. Referring to FIG. 4, the account identification method may include the following steps S401-S409:
S401, acquiring a plurality of account attributes of each account to be identified.
S402, respectively extracting features of each account attribute of each account to obtain a plurality of initial account features, wherein one account attribute corresponds to one or more initial account features.
In a specific implementation, for any account attribute, the computer device may determine the feature extraction mode corresponding to that account attribute according to its attribute type. When the attribute type is a text type, the feature extraction mode includes at least one of an original string extraction mode, a word segmentation extraction mode, and a keyword extraction mode. The original string extraction mode indicates that the account attribute itself is used as the corresponding initial account feature; the word segmentation extraction mode indicates that word segmentation is performed on the account attribute and the word segmentation result is used as the corresponding initial account feature; the keyword extraction mode indicates that keywords are extracted from the account attribute and each extracted keyword is used as a corresponding initial account feature. It can be understood that the word segmentation extraction mode includes, but is not limited to, a bigram extraction mode, a trigram extraction mode, and the like; that is, the word segmentation result may be bigrams or trigrams, which is not limited in the present application. When extracting keywords from any account attribute, the computer device may use techniques such as hot-word discovery or new-word discovery.
It should be noted that the text type may include a short text type and a long text type. The short text type is the text type of short text, where short text is text whose length is less than a preset length threshold, and the text length may refer to the number of characters contained in the text. Correspondingly, the long text type is the text type of long text, where long text is text whose length is greater than or equal to the preset length threshold. It should be understood that the preset length threshold may be set empirically or according to actual requirements, which is not limited in the present application. Optionally, when the attribute type is the short text type, the feature extraction mode may include at least one of the original string extraction mode and the word segmentation extraction mode; when the attribute type is the long text type, the feature extraction mode may include the keyword extraction mode.
Correspondingly, when the attribute type is an image type, the feature extraction mode includes an image quantization extraction mode, which indicates that image quantization is performed on the account attribute and the image quantization result is used as the corresponding initial account feature. Image quantization may refer to feature extraction from an image. When the image quantization result is a hash string, performing image quantization on an account attribute means extracting the hash string of that account attribute as its image quantization result. The hash algorithm may be an aHash (average hash) algorithm, a dHash (difference hash) algorithm, and the like, which is not limited in the present application. Alternatively, when the attribute type is a numeric type or a character type, the feature extraction mode includes the original string extraction mode; examples of such attributes include an encrypted mobile phone number, an MD5 (Message-Digest Algorithm 5) code (i.e., a feature code obtained by mathematically transforming the original information with the MD5 algorithm), and the like.
Further, the computer device may perform feature extraction on any account attribute according to the feature extraction mode corresponding to that account attribute, to obtain one or more initial account features corresponding to it. Specifically, when the feature extraction mode corresponding to an account attribute includes the original string extraction mode, the computer device may use the account attribute itself as a corresponding initial account feature; when it includes the word segmentation extraction mode, the computer device may perform word segmentation on the account attribute and use the word segmentation result as a corresponding initial account feature; when it includes the keyword extraction mode, the computer device may extract keywords from the account attribute and use the extracted keywords as corresponding initial account features; and when it includes the image quantization extraction mode, the computer device may perform image quantization on the account attribute and use the image quantization result as a corresponding initial account feature.
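As an illustration of the image quantization extraction mode, the following sketch computes an average hash (aHash) string. It is a simplified example under stated assumptions: the function name `average_hash` is hypothetical, and the input is assumed to be an image that has already been decoded, converted to grayscale and downscaled to a small grid (a practical pipeline would typically use an imaging library for those steps and a larger grid such as 8x8).

```python
def average_hash(pixels):
    """Return the average-hash string of a small grayscale image.

    pixels: 2D list of grayscale values (assumed already resized).
    Each pixel contributes one bit: 1 if brighter than the mean, else 0.
    The bit string is returned as zero-padded hexadecimal.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = "".join("1" if p > mean else "0" for p in flat)
    width = (len(bits) + 3) // 4  # number of hex digits needed
    return format(int(bits, 2), "0{}x".format(width))


# Two hypothetical 2x2 avatars: the second is a slightly brightened copy
# of the first, so both produce the same hash string.
avatar_a = [[10, 200], [220, 30]]
avatar_b = [[12, 205], [225, 28]]

hash_a = average_hash(avatar_a)
hash_b = average_hash(avatar_b)
```

Because near-identical images map to the same hash string, such a string can serve as an initial account feature that links accounts using the same or slightly altered avatar.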
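The original string and word segmentation (bigram) extraction modes for a short text attribute can be sketched as follows. This is only an illustration: the function names, the `"<source>-original"` and `"<source>-bigram"` feature type labels, and the use of whitespace splitting in place of a real word segmenter are assumptions made for the example.

```python
def word_bigrams(text):
    """Return the bigrams of a text, joining adjacent tokens with '-'.

    Assumption: tokens are obtained by whitespace splitting, which stands
    in for a real word segmentation step.
    """
    tokens = text.split()
    return [tokens[i] + "-" + tokens[i + 1] for i in range(len(tokens) - 1)]


def extract_short_text_features(source, text):
    """Extract initial account features from one short text attribute.

    source: the attribute source, e.g. "nickname" (hypothetical label).
    Returns a list of (feature_type, feature) pairs: one original-string
    feature followed by one feature per bigram.
    """
    features = [(source + "-original", text)]
    features.extend((source + "-bigram", bigram) for bigram in word_bigrams(text))
    return features


# Hypothetical nickname attribute.
features = extract_short_text_features("nickname", "honest tcm shop")
```

One account attribute thus yields one or more initial account features, and tagging each feature with its feature type makes the later grouping into account feature groups straightforward.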
S403, calculating the characteristic value of each initial account characteristic according to the characteristic type of each initial account characteristic.
In a specific implementation, the computer device may group the plurality of initial account features according to the feature type of each initial account feature to obtain a plurality of account feature groups, where one account feature group corresponds to one feature type; that is, all initial account features in the same account feature group have the same feature type. Further, for any initial account feature, the computer device may determine the feature value of that initial account feature according to its proportion in the corresponding account feature group, where the feature value of an initial account feature is negatively correlated with its proportion. Based on this, the computer device may calculate the feature value of an initial account feature f using Formula 1.1:
$$ t_f = -\log_2 p \qquad \text{(Formula 1.1)} $$
Here, p is the proportion of the initial account feature f in its corresponding account feature group.
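The grouping and feature value calculation of step S403 can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `feature_values` and the list-of-pairs input are hypothetical, the sample features are made up, and the logarithm is assumed to be base 2.

```python
import math
from collections import Counter, defaultdict


def feature_values(features):
    """Compute the feature value of every initial account feature.

    features: list of (feature_type, feature) pairs, one pair per
    occurrence (hypothetical input format).
    Returns a dict mapping (feature_type, feature) to -log2(p), where p is
    the proportion of the feature within its account feature group.
    """
    # Group the occurrences by feature type and count each feature.
    groups = defaultdict(Counter)
    for feature_type, feature in features:
        groups[feature_type][feature] += 1

    values = {}
    for feature_type, counts in groups.items():
        total = sum(counts.values())
        for feature, count in counts.items():
            p = count / total  # proportion within the group
            values[(feature_type, feature)] = -math.log2(p)
    return values


# Hypothetical occurrences: four nickname bigrams and one original string.
occurrences = [
    ("nickname-bigram", "honest-tcm"),
    ("nickname-bigram", "honest-tcm"),
    ("nickname-bigram", "i-am"),
    ("nickname-bigram", "tcm-shop"),
    ("nickname-original", "honest tcm shop"),
]
values = feature_values(occurrences)
```

In this example the bigram "honest-tcm" makes up half of its group and receives the value 1.0, while the two rarer bigrams each make up a quarter and receive 2.0, which shows the negative correlation between the feature value and the proportion.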
It should be noted that the feature type of an initial account feature may indicate the attribute source of the account attribute corresponding to that initial account feature; that is, initial account features of the same feature type correspond to account attributes with the same attribute source. The attribute source of an account attribute includes, but is not limited to, a nickname, an avatar, a personalized signature, a login address, a report description, and the like. For example, if account attribute A is the nickname "i love you" and account attribute B is the nickname "honest Chinese medicine", the attribute source of both account attribute A and account attribute B is the nickname. Alternatively, the feature type of an initial account feature may indicate both the attribute source of the corresponding account attribute and the corresponding feature extraction mode; that is, initial account features of the same feature type have the same attribute source and the same feature extraction mode. The feature type may also indicate the attribute type of the corresponding account attribute; that is, initial account features of the same feature type correspond to account attributes of the same attribute type, and so on, which is not limited in the present application.
For example, assume that initial account features of the same feature type share both the same attribute source and the same feature extraction mode, and take the nickname as an example. Assume that the nicknames of all accounts are short texts, that account attribute A is the nickname "I am honest traditional Chinese medicine", and that account attribute B is the nickname "honest traditional Chinese medicine". The computer device may perform feature extraction on account attribute A and account attribute B according to the original string extraction mode and the word segmentation extraction mode, respectively, to obtain one or more initial account features of account attribute A and one or more initial account features of account attribute B. Assume that the word segmentation extraction mode is a bigram (two-word) extraction mode, that the nickname bigrams of account attribute A include "I am" and "honest-traditional Chinese medicine", and that the nickname bigrams of account attribute B include "honest-traditional Chinese medicine". The initial account features of account attribute A then include the nickname original string "I am honest traditional Chinese medicine", the nickname bigram "I am", and the nickname bigram "honest-traditional Chinese medicine"; the initial account features of account attribute B include the nickname original string "honest traditional Chinese medicine" and the nickname bigram "honest-traditional Chinese medicine". The nickname original strings "I am honest traditional Chinese medicine" and "honest traditional Chinese medicine" therefore have the same feature type, and the nickname bigrams "I am" and "honest-traditional Chinese medicine" have the same feature type. In this case, the computer device may divide the initial account features into two account feature groups: one group containing the nickname original strings, and the other group containing the nickname bigrams.
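The extraction and grouping described in this example can be sketched as follows. This is a minimal illustration, not the implementation of the present application: the function names, the feature-type labels `nickname_string` and `nickname_bigram`, and the assumption that each nickname has already been segmented into words are all hypothetical. A sliding two-word window is used here, which may yield more bigrams than the ones listed in the example.

```python
from collections import defaultdict


def extract_nickname_features(tokens):
    """Return (feature_type, feature) pairs for one pre-segmented nickname.

    `tokens` is assumed to be the output of some word segmenter (hypothetical).
    """
    # Original string extraction: the whole nickname is one feature.
    features = [("nickname_string", " ".join(tokens))]
    # Word segmentation extraction in bigram form: each adjacent word pair.
    for left, right in zip(tokens, tokens[1:]):
        features.append(("nickname_bigram", f"{left}-{right}"))
    return features


def group_by_feature_type(accounts):
    """Group the initial account features of all accounts by feature type."""
    groups = defaultdict(list)
    for account_id, tokens in accounts.items():
        for feature_type, feature in extract_nickname_features(tokens):
            groups[feature_type].append((account_id, feature))
    return dict(groups)


accounts = {
    "A": ["I", "am", "honest", "traditional Chinese medicine"],
    "B": ["honest", "traditional Chinese medicine"],
}
groups = group_by_feature_type(accounts)
for feature_type, members in groups.items():
    print(feature_type, members)
```

Each key of `groups` plays the role of one account feature group, so the proportion of a feature within its group can later be computed per feature type.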
Based on this, taking the determination of the feature value of the nickname bigram "honest-traditional Chinese medicine" as an example, assume that the number of occurrences of the nickname bigram "honest-traditional Chinese medicine" in the corresponding account feature group is C_{honest-traditional Chinese medicine}, and that the number of all features in the corresponding account feature group (i.e., the total number of nickname bigrams) is C_{nickname-bigram}. The computer device may then calculate the feature value of the nickname bigram "honest-traditional Chinese medicine" by using formula 1.2:
S404, detecting common features from the plurality of initial account features according to the feature values of the initial account features, wherein the common features are initial account features held by at least K accounts, and K is a positive integer.
It should be appreciated that a common feature is highly prevalent, i.e., it may be held by at least K accounts. The prevalence of an initial account feature is positively correlated with its proportion in the corresponding account feature group: the larger the proportion, the more prevalent the feature. Because the feature value of an initial account feature is negatively correlated with that proportion, a larger feature value indicates a less prevalent feature and a smaller feature value indicates a more prevalent one. The computer device may therefore treat initial account features with smaller feature values as common features. A common feature may be, for example, a province or a city, which is not limited in the present application.
In a specific implementation, the computer device may sort the initial account features in descending order of feature value to obtain a feature sequence, and take the initial account features located after a target arrangement position in the feature sequence as common features.
In one embodiment, before taking the initial account features located after the target arrangement position in the feature sequence as common features, the computer device may determine the target arrangement position according to a preset culling threshold, which may be a specific culling number or a culling percentage; this is not limited in the present application. When the preset culling threshold is a culling number, the target arrangement position is the position immediately before the last positions of the feature sequence whose count equals the culling number. When the preset culling threshold is a culling percentage, the target arrangement position is the position immediately before the last positions of the feature sequence whose count equals a target number, where the target number is the number of initial account features in the feature sequence multiplied by the culling percentage.
For example, if the preset culling threshold is a culling number of 20, the target arrangement position is the position immediately before the last 20 positions of the feature sequence, i.e., the 21st position from the end, and the computer device may take the initial account features located after that position (the last 20) as common features. As another example, if the preset culling threshold is a culling percentage of 10% and the feature sequence contains 100 initial account features, the target number is 10 and the target arrangement position is the position immediately before the last 10 positions, i.e., the 11th position from the end; the computer device may then take the initial account features located after that position, i.e., the last 10% of the feature sequence, as common features, and so on.
In another embodiment, before taking the initial account features located after the target arrangement position in the feature sequence as common features, the computer device may determine the target arrangement position according to a preset feature threshold. Specifically, among the feature values in the feature sequence that are larger than the preset feature threshold, the computer device may take the position of the initial account feature with the smallest such feature value as the target arrangement position. In this case, taking the initial account features located after the target arrangement position as common features means taking every initial account feature whose feature value is smaller than or equal to the preset feature threshold as a common feature. The preset feature threshold may be set empirically or according to actual requirements, which is not limited in the present application.
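The three ways of locating the target arrangement position (culling number, culling percentage, feature threshold) can be sketched as below. This is a minimal illustration under stated assumptions: the function name `split_common_features` and its parameter names are hypothetical, feature values are assumed to be precomputed, and rounding of the target number is an assumption since the text only specifies a multiplication.

```python
def split_common_features(feature_values, cull_count=None,
                          cull_percent=None, value_threshold=None):
    """Split features into (target_features, common_features).

    feature_values: dict mapping feature name -> feature value.
    Exactly one of the three keyword arguments is expected to be set.
    """
    # Feature sequence: descending order of feature value.
    ranked = sorted(feature_values.items(), key=lambda kv: kv[1], reverse=True)
    if value_threshold is not None:
        # Everything with a value <= threshold lies after the target position.
        split = sum(1 for _, value in ranked if value > value_threshold)
    else:
        if cull_count is None:
            # Target number = feature count * culling percentage (rounded here).
            cull_count = round(len(ranked) * cull_percent)
        cull_count = min(cull_count, len(ranked))
        split = len(ranked) - cull_count
    target = [name for name, _ in ranked[:split]]
    common = [name for name, _ in ranked[split:]]
    return target, common


values = {"a": 5.0, "b": 4.0, "c": 3.0, "d": 2.0, "e": 1.0}
print(split_common_features(values, cull_count=2))
print(split_common_features(values, cull_percent=0.4))
print(split_common_features(values, value_threshold=2.0))
```

With the sample values, all three calls treat `d` and `e` (the two smallest feature values) as common features and keep `a`, `b`, and `c` as target account features.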
S405, taking the remaining initial account features, other than the common features, among the plurality of initial account features as target account features.
It can be understood that the computer device can select each initial account feature with a larger feature value from the plurality of initial account features as a target account feature, that is, the computer device can select a representative target account feature according to the ratio of the initial account feature in the group, and further, the selected target account features can be used for associating accounts together to form an account relation network.
S406, constructing an account relationship network by using the plurality of target account features, wherein one node in the account relationship network records the target account features of one account, and two nodes connected by one edge record the same target account features.
In a specific implementation, the computer device may generate a plurality of nodes and, following the principle that one node records the target account features of one account, record each target account feature into the node of its corresponding account. For any two nodes, the computer device may then detect whether the two nodes record the same target account features; when they do, it counts the feature number of the same target account features and, if the feature number is greater than a number threshold, connects the two nodes by one edge. The same target account features may also be referred to as similar account features. In other words, the computer device may take accounts as nodes and the target account features shared between accounts as edges to form the account relationship network, in which one node corresponds to one account and two nodes connected by one edge have the same target account features.
The number threshold may be set empirically or according to actual requirements, which is not limited in the present application. For example, when the number threshold is 1, the computer device connects any two nodes with one edge if and only if the number of features corresponding to those two nodes is greater than 1.
Further, let the feature number be M, where M is a positive integer. A specific implementation of connecting any two nodes by one edge when the feature number is greater than the number threshold may include: counting the number of different account attributes among the account attributes corresponding to the M target account features, and connecting the two nodes by one edge if the counted number of different account attributes is greater than an attribute number threshold.
The attribute number threshold may be set empirically or according to actual requirements, which is not limited in the present application. For example, when the attribute number threshold is 1, the two nodes are connected by one edge if and only if the counted number of different account attributes is greater than 1; that is, if two accounts share target account features of only one kind of account attribute, the nodes corresponding to the two accounts are not connected. Suppose the same target account features between the account indicated by node A and the account indicated by node B are the nickname original string "honest traditional Chinese medicine" and the nickname bigram "honest-traditional Chinese medicine", and the account attribute corresponding to both target account features is the nickname "honest traditional Chinese medicine". Only one kind of shared account attribute then exists between the two accounts, i.e., the counted number of different account attributes equals 1, so the two nodes are not connected by an edge and are not associated. This is mainly because, as the data size increases, many accounts will share the same target account features purely by coincidence; an attribute number threshold can therefore be set to constrain the connection between any two nodes and improve the accuracy of the account relationship network.
It should be noted that, for any one of the M target account features, the computer device may take as the account attribute corresponding to that target account feature both of the account attributes from which the feature was extracted in the two accounts indicated by the two nodes, or only one of those account attributes, and so on.
Specifically, when counting the number of different account attributes among the account attributes corresponding to the M target account features, the computer device may initialize an account attribute set and traverse each of the M target account features. For the currently traversed target account feature, the computer device determines whether its corresponding account attribute is already in the account attribute set; if not, the account attribute is added to the set, and if so, the traversal continues. After all M target account features have been traversed, the number of account attributes in the account attribute set is counted to obtain the number of different account attributes.
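The two-threshold edge rule described above (feature number, then number of different account attributes) can be sketched as follows. This is an illustrative sketch only: the function name, the data layout (each account mapped to a dict from target account feature to the name of its source attribute), and the attribute labels are assumptions made for the example.

```python
from itertools import combinations


def build_account_network(account_features, number_threshold=1,
                          attribute_threshold=1):
    """Return {(account_a, account_b): shared_features} for connected pairs.

    account_features: {account: {target_feature: source_attribute_name}}
    """
    edges = {}
    for a, b in combinations(sorted(account_features), 2):
        # Same target account features recorded by both nodes.
        shared = set(account_features[a]) & set(account_features[b])
        if len(shared) <= number_threshold:
            continue
        # Account attribute set: collect each distinct source attribute once.
        attribute_set = set()
        for feature in shared:
            attribute = account_features[a][feature]
            if attribute not in attribute_set:
                attribute_set.add(attribute)
        if len(attribute_set) > attribute_threshold:
            edges[(a, b)] = shared
    return edges


nick_string = ("nickname_string", "honest traditional Chinese medicine")
nick_bigram = ("nickname_bigram", "honest-traditional Chinese medicine")
avatar = ("avatar_hash", "abc123")  # hypothetical avatar hash value

account_features = {
    "A": {nick_string: "nickname", nick_bigram: "nickname", avatar: "avatar"},
    "B": {nick_string: "nickname", nick_bigram: "nickname"},
    "C": {nick_bigram: "nickname", avatar: "avatar"},
}
edges = build_account_network(account_features)
print(sorted(edges))
```

In the sample data, accounts A and B share two features but only one kind of attribute (the nickname), so they are not connected; accounts A and C share a nickname feature and an avatar feature, so they are.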
Further, for any two nodes, the computer device may further determine an account association degree corresponding to the any two nodes (i.e., account association degrees of accounts corresponding to the two nodes), and use the account association degree as a weight value of an edge connecting the any two nodes, that is, the account relationship network may be a weighted network. Specifically, if the two nodes have the same target account number characteristics, the computer device may calculate the account number association corresponding to the two nodes according to the characteristic values of the same target account number characteristics recorded by the two nodes when determining the account number association corresponding to the two nodes, and if the two nodes do not have the same target account number characteristics, may determine that the account number association corresponding to the two nodes is zero.
In one embodiment, when the account association degree corresponding to the any two nodes is calculated according to the feature values of the same target account features recorded by the any two nodes, the computer device may perform a summation operation on the feature values of the same target account features recorded by the any two nodes, and use the summation operation result as the account association degree corresponding to the any two nodes.
In another embodiment, when calculating the account association degree corresponding to any two nodes according to the feature values of the same target account features recorded by the two nodes, the computer device may determine one or more target feature values from those feature values. A target feature value is the largest feature value among the feature values of the at least one target account feature under the corresponding account attribute; in other words, each target feature value corresponds to the target account feature with the largest feature value under one account attribute. The computer device then sums the one or more target feature values and takes the sum as the account association degree corresponding to the two nodes. Specifically, the computer device may calculate the account association degree corresponding to node A and node B (i.e., the account association degree between the account corresponding to node A and the account corresponding to node B) by using formula 1.3 as follows:
r_{AB} = \max\left(T_{\text{nickname bigram: honest-traditional Chinese medicine}},\ T_{\text{nickname string: honest traditional Chinese medicine}},\ \dots\right) + T_{\text{avatar hash}} + \dots + T_{\text{other type}} \quad (1.3)
Here, the account attribute corresponding to the nickname bigram "honest-traditional Chinese medicine" and the account attribute corresponding to the nickname original string "honest traditional Chinese medicine" are both the nickname "honest traditional Chinese medicine". That is, when several of the same target account features correspond to the same account attribute, the computer device may select the maximum of their feature values as the target feature value, and then sum the target feature values obtained for all account attributes to obtain the account association degree corresponding to the two nodes, i.e., the account association degree of the two corresponding accounts.
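Formula 1.3 amounts to taking the maximum feature value within each account attribute and summing across attributes. A minimal sketch follows; the function name and the input layout, a list of (account attribute, feature value) pairs for the target account features shared by two nodes, are assumptions made for illustration, and the numeric values are invented sample data.

```python
def account_association(shared_feature_values):
    """Sum, over account attributes, the largest shared feature value.

    shared_feature_values: iterable of (account_attribute, feature_value)
    pairs, one per target account feature shared by the two nodes.
    """
    best_per_attribute = {}
    for attribute, value in shared_feature_values:
        current = best_per_attribute.get(attribute)
        if current is None or value > current:
            best_per_attribute[attribute] = value
    # No shared features -> empty dict -> association degree of zero.
    return sum(best_per_attribute.values())


shared = [
    ("nickname", 2.0),   # nickname bigram (sample value)
    ("nickname", 3.5),   # nickname original string (sample value)
    ("avatar", 1.5),     # avatar hash (sample value)
]
print(account_association(shared))  # max(2.0, 3.5) + 1.5 = 5.0
```

The plain-sum variant of the previous embodiment would instead return 7.0 for the same input, since it would count both nickname features.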
S407, sampling a plurality of triples from the account relationship network, wherein one triplet comprises a center node, a positive sample node and a negative sample node, the center node is a node selected from the account relationship network, the positive sample node is a node connected with the center node, and the negative sample node is a node not connected with the center node.
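Step S407 can be sketched as below. The text does not specify how the center, positive, and negative nodes are drawn, so uniform random selection is an assumption here, as are the function name and the representation of edges as node pairs.

```python
import random


def sample_triplets(nodes, edges, num_triplets, seed=0):
    """Sample (center, positive, negative) triplets from an undirected graph."""
    rng = random.Random(seed)
    neighbors = {node: set() for node in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    # A usable center needs at least one connected and one unconnected node.
    centers = [n for n in nodes
               if neighbors[n] and len(neighbors[n]) < len(nodes) - 1]
    triplets = []
    if not centers:
        return triplets
    for _ in range(num_triplets):
        center = rng.choice(centers)
        positive = rng.choice(sorted(neighbors[center]))
        unconnected = sorted(set(nodes) - neighbors[center] - {center})
        negative = rng.choice(unconnected)
        triplets.append((center, positive, negative))
    return triplets


nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("C", "D")]
triplets = sample_triplets(nodes, edges, num_triplets=5)
print(triplets)
```

For a weighted account relationship network, the positive node could instead be drawn in proportion to edge weight; that variation is not shown.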
S408, training and optimizing the vector representation model according to the initial characterization vector of each node in each triplet, and determining the target characterization vector of each node in the account relation network by adopting the optimized vector representation model.
It should be noted that, before training and optimizing the vector representation model according to the initial characterization vector of each node in each triplet, the computer device may determine the initial characterization vector of any node in the account relationship network according to a preset characterization mode.
In one embodiment, the computer device may perform vector representation processing on the target account attribute of the account corresponding to each node in the account relationship network to obtain the initial characterization vector of each node. The target account attribute of an account may be any one of the plurality of account attributes of that account, and the target account attributes of the accounts corresponding to the nodes all have the same attribute source, such as the account nickname or the avatar. For example, if the target account attribute is the account nickname, the computer device may perform vector representation processing on the nickname of the account corresponding to each node to obtain the initial characterization vector of each node. In this case, the computer device may use w2v (word2vec, a word vector construction model) or BERT (Bidirectional Encoder Representations from Transformers, a pre-trained language representation model) to perform the vector representation processing; the specific implementation of the vector representation processing is not limited in the present application.
In another embodiment, the computer device may randomly generate a vector for each node in the account relationship network according to the target dimension, and use each randomly generated vector as the initial characterization vector of the corresponding node. Specifically, the computer device may determine the target dimension and randomly generate a vector according to the target dimension, or the computer device may randomly generate a vector within a preset vector range, where the dimension of any vector within the preset vector range is the target dimension. The target dimension and the preset vector range may be set empirically, or may be set according to actual requirements, which is not limited in the present application.
In another embodiment, for any node in the account relationship network, the computer device may perform a random walk in the account relationship network with that node as the starting point according to a preset walk length, and generate the initial characterization vector of that node according to the random walk result. It should be noted that the computer device may perform the random walk with equal probability according to the preset walk length, or may perform the random walk according to the weight values of the edges in the account relationship network; in the latter case, at each step the computer device randomly selects one of the nodes connected to the current node according to the weight values of the edges connecting them to the current node. It can be understood that, in weighted sampling, a node connected by an edge with a larger weight value has a higher probability of being sampled; in other words, over many samplings, such a node is sampled more often.
The preset walk length may be set empirically or according to actual requirements, which is not limited in the present application. It should be appreciated that the dimension of the initial characterization vector is equal to the preset walk length, such as 5 or 6.
For example, as shown in FIG. 3, assume that the account relationship network includes node A, node B, node C, node D, node E, node F, node G, node H, node I, node J, and node K, and that the preset walk length is 5. Starting from node A, the computer device may walk randomly to any one of node D, node E, and node H. Assume the walk moves to node D; since node D is connected to node A, node B, node C, and node G, the computer device may next walk randomly to any one of those nodes, and so on. Assume the resulting walk path starting from node A is node A, node D, node B, node C, node D, and that the node identifiers of node A, node B, node C, and node D are 1, 2, 3, and 4, respectively. The computer device may then generate the initial characterization vector (1, 4, 2, 3, 4) of node A according to the random walk result.
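A random walk of this kind, in both the equal-probability and the edge-weighted variants, can be sketched as follows. The adjacency layout (`{node_id: {neighbor_id: edge_weight}}`), the function name, and the small sample graph are assumptions for illustration; the sample graph is not the graph of FIG. 3.

```python
import random


def random_walk(adjacency, start, walk_length, rng, weighted=True):
    """Return the list of node identifiers visited, starting at `start`.

    adjacency: {node_id: {neighbor_id: edge_weight}} for an undirected graph.
    The returned list can serve directly as an initial characterization
    vector whose dimension equals the walk length.
    """
    path = [start]
    current = start
    while len(path) < walk_length:
        neighbors = adjacency.get(current, {})
        if not neighbors:
            break  # isolated node: the walk cannot continue
        candidates = list(neighbors)
        # With weights=None, random.choices samples with equal probability.
        weights = [neighbors[n] for n in candidates] if weighted else None
        current = rng.choices(candidates, weights=weights, k=1)[0]
        path.append(current)
    return path


# Hypothetical weighted graph; identifiers 1..4 stand for nodes A..D.
adjacency = {
    1: {4: 2.0},
    2: {3: 1.0, 4: 1.0},
    3: {2: 1.0, 4: 1.0},
    4: {1: 2.0, 2: 1.0, 3: 1.0},
}
walk = random_walk(adjacency, start=1, walk_length=5, rng=random.Random(0))
print(tuple(walk))
```

In the sample graph node 1 has a single neighbor, so every walk from node 1 begins 1, 4; the remaining steps depend on the random draws and edge weights.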
Further, when training and optimizing the vector representation model according to the initial characterization vector of each node in each triplet, the computer device may call the vector representation model to perform vector representation on each node in each triplet according to its initial characterization vector, obtaining an intermediate characterization vector of each node in each triplet. The computer device then calculates a model loss value of the vector representation model according to the intermediate characterization vectors of the nodes in each triplet and the vector difference condition that the intermediate characterization vectors of the nodes in a single triplet are required to satisfy, and trains and optimizes the vector representation model in the direction of reducing the model loss value.
It should be appreciated that the characterization vectors of connected nodes (i.e., closely related nodes) should be more similar, while the characterization vectors of unconnected nodes (i.e., unrelated nodes) should be farther apart. Accordingly, in any triplet, the distance between the intermediate characterization vector of the center node and that of the corresponding negative sample node should be greater than the distance between the intermediate characterization vector of the center node and that of the corresponding positive sample node. The vector difference condition that the intermediate characterization vectors of the nodes in a single triplet are required to satisfy is that these two distances differ by more than a preset distance threshold. The preset distance threshold may be set empirically or according to actual requirements, which is not limited in the present application.
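One common way to turn such a vector difference condition into a loss is a margin-based (hinge) triplet loss. The text does not state the exact loss form, so the sketch below is an assumption: it uses Euclidean distance, treats the preset distance threshold as the margin, and uses hypothetical function names.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def triplet_loss(center, positive, negative, margin=1.0):
    """Zero once d(center, negative) exceeds d(center, positive) by >= margin."""
    d_pos = euclidean(center, positive)
    d_neg = euclidean(center, negative)
    return max(0.0, d_pos - d_neg + margin)


center = (0.0, 0.0)
positive = (0.0, 1.0)

# Negative already far away: condition satisfied, no loss.
print(triplet_loss(center, positive, (3.0, 4.0)))   # 1 - 5 + 1 < 0 -> 0.0
# Negative too close: positive loss pushes the model to separate them.
print(triplet_loss(center, positive, (0.0, 1.5)))   # 1 - 1.5 + 1 = 0.5
```

Averaging this quantity over all sampled triplets would give one possible model loss value to minimize during training.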
S409, clustering the accounts corresponding to the nodes based on the target characterization vectors of the nodes, and identifying abnormal accounts among the accounts participating in clustering according to the clustering result.
The clustering result includes multiple types of accounts (i.e., multiple clustered account groups), and one type of accounts includes at least one account. Correspondingly, when identifying abnormal accounts among the accounts participating in clustering according to the clustering result, the computer device may, for any type of accounts in the clustering result, divide that type of accounts into a plurality of account pairs and calculate the account association degree of each account pair according to the feature values of the target account features of the accounts in that pair. If the two accounts in an account pair have the same target account features, the account association degree of the pair is obtained by summing feature values of the same target account features; if the two accounts do not have the same target account features, the account association degree of the pair is zero. The computer device may then sum the account association degrees of the plurality of account pairs to obtain the general account association degree of that type of accounts and, if the general account association degree is greater than or equal to a preset association threshold, identify every account in that type of accounts as an abnormal account. When calculating the account association degree of an account pair, the computer device may sum the feature values of all of the same target account features of the pair and take the sum as the account association degree, or may sum only the target feature values among those feature values and take that sum as the account association degree.
Specifically, when summing the account association degrees of the plurality of account pairs to obtain the general account association degree of any type of accounts, the computer device may calculate the general account association degree S of that type of accounts (i.e., the association score of that type of accounts) according to formula 1.4 as follows:
Here, Rij denotes the account association degree between the i-th account and the j-th account in that type of accounts, n is the number of accounts in that type of accounts, and n is a positive integer.
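The pairwise summation and threshold check can be sketched as follows, assuming the general account association degree is the plain sum of Rij over all unordered account pairs in a cluster, as the text describes; any normalization is not modeled. The function names, the lookup-table representation of Rij, and the sample values are hypothetical.

```python
from itertools import combinations


def general_association(cluster, pair_association):
    """Sum the account association degree over every unordered account pair."""
    return sum(pair_association(a, b) for a, b in combinations(cluster, 2))


def flag_abnormal_clusters(clusters, pair_association, threshold):
    """Return the clusters whose general association degree >= threshold."""
    return [cluster for cluster in clusters
            if general_association(cluster, pair_association) >= threshold]


# Sample pairwise association degrees; missing pairs share no features.
R = {("a1", "a2"): 5.0, ("a1", "a3"): 2.0, ("a2", "a3"): 3.0,
     ("b1", "b2"): 0.5}


def lookup(x, y):
    return R.get((x, y), R.get((y, x), 0.0))


clusters = [["a1", "a2", "a3"], ["b1", "b2"], ["c1"]]
print(general_association(clusters[0], lookup))         # 5.0 + 2.0 + 3.0 = 10.0
print(flag_abnormal_clusters(clusters, lookup, 8.0))    # only the first cluster
```

A cluster with a single account has no pairs, so its general association degree is zero under this sketch.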
It should be noted that the preset association threshold may be set empirically or according to actual requirements, which is not limited in the present application. Further, assuming the preset association threshold is denoted St, the computer device may submit each type of accounts with S >= St for manual review, and St may be adjusted according to the actual situation. For example, FIG. 5a shows a schematic diagram of list data provided for manual review: the data of one type of accounts forms an independent batch with a unique batch number, the main feature is the most prominent feature in the data of the batch, the auxiliary features are other features for reference, and the account number is the number of accounts in the batch. As another example, clicking on the details of any batch in FIG. 5a shows the condition of each account in the batch, including but not limited to the basic information and complaint information of the account, as shown in FIG. 5b.
It should be understood that, through unsupervised learning based on a graph neural network, types of accounts whose features are scattered yet mutually associated can be mined. For example, as shown in FIG. 5c, the personalized signatures, avatars, and nicknames of the accounts do not share a single identical marker, but they are associated with one another and the accounts have similar geographic locations. It should be noted that FIGS. 5a-5c only show the content of each display interface by way of example, and the present application is not limited thereto; for example, FIG. 5a may omit information such as batch status, and FIG. 5b may omit real-name information.
Further, to better illustrate the effect of the account identification method provided by the present application, result statistics were collected in practical application. The statistics indicate that the daily average number of abnormal clustered account groups found by the method increased by 50%, and that the accuracy for abnormal clustered account groups is 60%, where one abnormal clustered account group is one type of abnormal accounts, i.e., a type of accounts whose general account association degree is greater than or equal to the preset association threshold. Compared with clustering based on account attributes of a single dimension, the method can refer to account attributes of more dimensions at the same time, so the recall rate of abnormal clustered account groups, and hence the recall rate of abnormal accounts, is improved without affecting accuracy. In addition, the statistics indicate that the result for high-quality account groups obtained by the method (a high-quality account group being a type of accounts with similar account attributes in P dimensions, where P is a positive integer greater than a preset dimension threshold) is 93%, which is consistent with the prior art, while the non-high-quality account groups obtained (a non-high-quality account group being a type of accounts with similar account attributes in Q dimensions, where Q is a positive integer less than or equal to the preset dimension threshold) show a 25% improvement. That is, the method can mine more abnormal clustered account groups, i.e., more abnormal accounts.
In the embodiments of the present application, after the plurality of account attributes of each account to be identified are acquired, feature extraction is performed on the account attributes of each account to obtain a plurality of initial account features. Common features are detected from the plurality of initial account features according to their feature values, and the remaining initial account features other than the common features are taken as target account features, so that a more accurate account relationship network is constructed. A plurality of triplets are then sampled from the account relationship network, the vector representation model is trained and optimized according to the initial characterization vector of each node in each triplet, and the optimized vector representation model is used to determine the target characterization vector of each node in the account relationship network, which improves the accuracy of the characterization vectors. Based on these more accurate target characterization vectors, the accounts corresponding to the nodes are clustered, and abnormal accounts are identified among the clustered accounts according to the more accurate clustering result, thereby improving the recall rate of abnormal accounts without affecting identification accuracy.
Based on the description of the related embodiments of the account identification method, an embodiment of the present application further provides an account identification device, which may be a computer program (including program code) running in a computer device. The account identification device may execute the account identification method shown in FIG. 2 or FIG. 4. Referring to FIG. 6, the account identification device may run the following units:
The acquiring unit 601 is configured to acquire a plurality of account attributes of each account to be identified, and perform feature extraction on each account attribute of each account to obtain a plurality of target account features;
The processing unit 602 is configured to construct an account relationship network by using the plurality of target account features, where one node in the account relationship network records the target account features of one account, and two nodes connected by one edge record the same target account features;
The processing unit 602 is further configured to sample a plurality of triplets from the account relational network, where one triplet includes a center node, a positive sample node and a negative sample node, the center node is a node selected from the account relational network, the positive sample node is a node connected to the center node, and the negative sample node is a node unconnected to the center node;
The processing unit 602 is further configured to perform training optimization on a vector representation model according to the initial characterization vector of each node in each triplet, and determine a target characterization vector of each node in the account relationship network by using the optimized vector representation model;
The processing unit 602 is further configured to perform clustering processing on the accounts corresponding to the nodes based on the target characterization vectors of the nodes, and identify an abnormal account from among the multiple accounts participating in clustering according to the clustering result.
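As a non-limiting illustration, the triplet sampling performed by the processing unit 602 can be sketched as follows. The function name `sample_triplets`, the adjacency-dictionary representation of the account relationship network, and the uniform random choice of nodes are assumptions made for this sketch and are not prescribed by the embodiments.

```python
import random

def sample_triplets(adjacency, num_triplets, seed=0):
    """Sample (center, positive, negative) triplets from an undirected graph.

    adjacency: dict mapping each node to the set of nodes connected to it
    (a hypothetical representation of the account relationship network).
    """
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    # A node can serve as a center only if it has at least one connected
    # node (positive sample) and at least one unconnected node (negative sample).
    candidates = []
    for node in nodes:
        positives = sorted(adjacency[node])
        negatives = [m for m in nodes if m != node and m not in adjacency[node]]
        if positives and negatives:
            candidates.append((node, positives, negatives))
    triplets = []
    if not candidates:
        return triplets
    for _ in range(num_triplets):
        center, positives, negatives = rng.choice(candidates)
        triplets.append((center, rng.choice(positives), rng.choice(negatives)))
    return triplets
```

In this sketch, a node with no connected node or no unconnected node is simply never chosen as a center node.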
In one embodiment, when extracting the characteristics of each account attribute of each account, the obtaining unit 601 may be specifically configured to:
Extracting the characteristics of each account attribute of each account to obtain a plurality of initial account characteristics, wherein one account attribute corresponds to one or more initial account characteristics;
Calculating characteristic values of the characteristics of each initial account according to the characteristic types of the characteristics of each initial account;
Detecting common characteristics from the plurality of initial account characteristics according to the characteristic values of the initial account characteristics, wherein the common characteristics are the initial account characteristics held by at least K accounts, and K is a positive integer;
And taking the remaining initial account characteristics except the common characteristics in the plurality of initial account characteristics as target account characteristics.
In another embodiment, when performing feature extraction on each account attribute of each account to obtain a plurality of initial account features, the obtaining unit 601 may be specifically configured to:
for any account attribute, determining a feature extraction mode corresponding to that account attribute according to the attribute type of that account attribute;
And carrying out feature extraction on that account attribute according to the feature extraction mode to obtain one or more initial account features corresponding to that account attribute.
In another embodiment, when the attribute type is a text type, the feature extraction mode includes at least one of an original string extraction mode, a word segmentation extraction mode and a keyword extraction mode, wherein the original string extraction mode is used for indicating that any account attribute is used as a corresponding initial account feature, the word segmentation extraction mode is used for indicating that any account attribute is subjected to word segmentation processing, a word segmentation processing result is used as a corresponding initial account feature, and the keyword extraction mode is used for indicating that all extracted keywords are used as the corresponding initial account feature;
when the attribute type is an image type, the characteristic extraction mode comprises an image quantization extraction mode, wherein the image quantization extraction mode is used for indicating that any account attribute is subjected to image quantization processing, and an image quantization processing result is used as a corresponding initial account characteristic;
And when the attribute type is numerical value type or character type, the characteristic extraction mode comprises an original string extraction mode.
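By way of example only, the selection of a feature extraction mode according to the attribute type can be sketched as below. The whitespace-based word segmentation, the stop-word filter used as a stand-in for keyword extraction, and the coarse byte quantization followed by hashing used as a stand-in for image quantization are all assumptions of this sketch; the embodiments do not limit how these modes are implemented.

```python
import hashlib

# Hypothetical stop-word list used only to illustrate keyword extraction.
STOP_WORDS = {"the", "a", "of", "and"}

def extract_features(attribute, attribute_type, value):
    """Return a list of (attribute, mode, feature) tuples for one account attribute."""
    if attribute_type == "text":
        tokens = value.lower().split()                         # word segmentation
        keywords = [t for t in tokens if t not in STOP_WORDS]  # keyword extraction
        return ([(attribute, "raw", value)]                    # original string
                + [(attribute, "token", t) for t in tokens]
                + [(attribute, "keyword", t) for t in keywords])
    if attribute_type == "image":
        # Coarsely quantize the raw bytes so that near-identical images
        # map to the same feature, then hash the quantized bytes.
        quantized = bytes(b // 64 for b in value)
        return [(attribute, "image", hashlib.sha1(quantized).hexdigest())]
    if attribute_type in ("numeric", "character"):
        return [(attribute, "raw", str(value))]                # original string
    raise ValueError(f"unsupported attribute type: {attribute_type}")
```

Here one account attribute of the text type yields several initial account features, whereas a numerical or character attribute yields exactly one.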
In another embodiment, the obtaining unit 601 may be specifically configured to, when calculating the feature values of the features of each initial account according to the feature types of the features of each initial account:
Grouping the plurality of initial account characteristics according to the characteristic types of the initial account characteristics to obtain a plurality of account characteristic groups, wherein one account characteristic group corresponds to one characteristic type;
And, for any initial account feature, determining the feature value of that initial account feature according to its proportion in the corresponding account feature group, wherein the feature value of the initial account feature is negatively correlated with that proportion.
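The grouping and feature value calculation can be illustrated with the following sketch. The embodiments only require that the feature value be negatively correlated with the proportion; the logarithmic form `log(total / count)` used here, as well as the record layout and the function name `feature_values`, are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict

def feature_values(records):
    """records: iterable of (account_id, feature_type, feature) tuples.

    Returns a dict mapping (feature_type, feature) to a feature value that
    decreases as the feature's share of its feature-type group increases.
    """
    groups = defaultdict(Counter)
    for _account_id, feature_type, feature in records:
        groups[feature_type][feature] += 1   # one group per feature type
    values = {}
    for feature_type, counts in groups.items():
        total = sum(counts.values())
        for feature, count in counts.items():
            # proportion = count / total; the value is log(1 / proportion)
            values[(feature_type, feature)] = math.log(total / count)
    return values
```

Under this assumed formula, a feature that accounts for its entire group receives a feature value of zero, and rarer features receive larger values.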
In another embodiment, the obtaining unit 601 may be specifically configured to, when detecting the common feature from the plurality of initial account features according to the feature values of the respective initial account features:
Sorting the initial account features in descending order of feature value to obtain a feature sequence;
and taking all initial account features located after the target arrangement position in the feature sequence as common features.
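A minimal sketch of this sorting-based detection is given below. It assumes that the target arrangement position is expressed as the number of leading features to keep, which is one possible reading; the function name `split_common_features` is hypothetical.

```python
def split_common_features(values, target_position):
    """values: dict mapping each initial account feature to its feature value.

    Returns (target_features, common_features), where the common features are
    those located after the target arrangement position in the sorted sequence.
    """
    ordered = sorted(values, key=lambda feature: values[feature], reverse=True)
    return ordered[:target_position], ordered[target_position:]
```

Because low feature values correspond to widely shared features, the tail of the sequence collects the common features, and the head of the sequence is retained as the target account features.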
In another embodiment, the processing unit 602 may be specifically configured to, when using the plurality of target account characteristics to construct an account relationship network:
generating a plurality of nodes, and recording each target account feature into the node of the account to which it belongs, on the principle that one node records the target account features of one account;
For any two nodes, detecting the target account features that both nodes record, and, when such same target account features are detected, counting their number to obtain a feature quantity;
And if the feature quantity is larger than the quantity threshold value, connecting any two nodes by adopting one edge.
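The construction of the account relationship network can be sketched as follows, assuming each node is keyed by its account and the network is held as an adjacency dictionary. These representations and the name `build_account_graph` are illustrative assumptions.

```python
from itertools import combinations

def build_account_graph(account_features, count_threshold):
    """account_features: dict mapping each account to its set of target features.

    Two nodes are connected by one edge when the number of target account
    features they share is larger than count_threshold.
    """
    adjacency = {account: set() for account in account_features}  # one node per account
    for a, b in combinations(sorted(account_features), 2):
        shared = account_features[a] & account_features[b]
        if len(shared) > count_threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency
```

The pairwise comparison in this sketch is quadratic in the number of accounts; it is intended only to make the edge condition concrete.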
In another embodiment, the feature number is represented by M, where M is a positive integer, and the processing unit 602 may be specifically configured to, when the feature number is greater than a number threshold and one edge is used to connect any two nodes:
if the feature quantity is larger than the quantity threshold, counting the quantity of different account attributes in the account attributes corresponding to the M target account features;
If the counted number of the different account attributes is larger than the attribute number threshold, connecting any two nodes by adopting one edge.
In another embodiment, the processing unit 602 may be specifically configured to, when counting the number of different account attributes in the account attributes corresponding to the M target account features:
Initializing an account attribute set, wherein the initialized account attribute set is a blank set;
traversing each target account number feature in the M target account number features, and judging whether the account number attribute corresponding to the currently traversed target account number feature is positioned in the account number attribute set;
if not, adding the account attributes corresponding to the currently traversed target account features into the account attribute set, and if so, continuing to traverse the M target account features;
and after each target account number feature in the M target account number features is traversed, counting the number of account number attributes in the account number attribute set to obtain the number of different account number attributes.
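The traversal described above, together with the two-threshold edge condition, can be illustrated by the sketch below. The mapping `attribute_of` from a target account feature to its account attribute, and the function names, are hypothetical helpers introduced for this example.

```python
def count_distinct_attributes(shared_features, attribute_of):
    """Count the different account attributes behind the M shared features."""
    attribute_set = set()                     # initialized as a blank set
    for feature in shared_features:           # traverse the M target account features
        attribute = attribute_of[feature]
        if attribute not in attribute_set:    # add only attributes not yet recorded
            attribute_set.add(attribute)
    return len(attribute_set)

def should_connect(shared_features, attribute_of, count_threshold, attribute_threshold):
    """Connect two nodes only if both the feature-quantity and the
    attribute-quantity conditions hold."""
    if len(shared_features) <= count_threshold:
        return False
    return count_distinct_attributes(shared_features, attribute_of) > attribute_threshold
```

The second condition prevents two accounts from being connected merely because one attribute, such as a long nickname, happens to produce many identical features.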
In another embodiment, the processing unit 602 may be further configured to:
Performing vector representation processing on target account attributes of accounts corresponding to all nodes in the account relation network respectively to obtain initial characterization vectors of all the nodes;
Or randomly generating a vector for each node in the account relation network according to the target dimension, and taking each randomly generated vector as an initial characterization vector of the corresponding node;
or aiming at any node in the account relation network, taking the any node as a starting point, carrying out random walk in the account relation network according to a preset walk length, and generating an initial characterization vector of the any node according to a random walk result.
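Two of these options can be sketched as follows: random initialization according to a target dimension, and a random walk of a preset walk length. How the random walk result is converted into an initial characterization vector is not specified here, so the sketch stops at producing the walk; the uniform value range, the seeds, and the function names are assumptions.

```python
import random

def random_initial_vectors(nodes, dim, seed=0):
    """Randomly generate one vector of the target dimension for each node."""
    rng = random.Random(seed)
    return {node: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for node in nodes}

def random_walk(adjacency, start, walk_length, seed=0):
    """Walk from `start` until `walk_length` nodes are visited or a node
    without connected nodes is reached."""
    rng = random.Random(seed)
    walk = [start]
    while len(walk) < walk_length:
        neighbours = sorted(adjacency[walk[-1]])
        if not neighbours:
            break
        walk.append(rng.choice(neighbours))
    return walk
```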
In another embodiment, the processing unit 602 may be specifically configured to, when performing training optimization on the vector representation model according to the initial token vector of each node in each triplet:
calling a vector representation model to represent each node in each triplet according to an initial representation vector of each node in each triplet, so as to obtain an intermediate representation vector of each node in each triplet;
calculating a model loss value of the vector representation model according to the intermediate representation vector of each node in each triplet and the vector difference condition required to be met by the intermediate representation vector of each node in a single triplet;
And training and optimizing the vector representation model according to the direction of reducing the model loss value.
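One common way to express such a vector difference condition is a margin-based triplet loss, in which the center node should be closer to the positive sample node than to the negative sample node by at least a margin. The sketch below uses that form as an assumption; the squared Euclidean distance, the margin value, and the function names are illustrative and are not stated by the embodiments.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(vectors, triplets, margin=1.0):
    """Average margin-based loss over (center, positive, negative) triplets.

    vectors: dict mapping each node to its intermediate characterization vector.
    """
    if not triplets:
        return 0.0
    total = 0.0
    for center, positive, negative in triplets:
        d_pos = squared_distance(vectors[center], vectors[positive])
        d_neg = squared_distance(vectors[center], vectors[negative])
        # The loss is zero once the negative is farther than the positive by the margin.
        total += max(0.0, d_pos - d_neg + margin)
    return total / len(triplets)
```

Training then adjusts the parameters of the vector representation model in the direction that reduces this value, for example by gradient descent.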
In another embodiment, the clustering result includes multiple types of accounts, and one type of account includes at least one account, and the processing unit 602 is specifically configured to, when identifying an abnormal account among the multiple accounts participating in the clustering according to the clustering result:
for any class of accounts in the clustering result, dividing the accounts in that class into a plurality of account pairs;
Calculating the account association degree of each account pair according to the feature values of the target account features of the accounts in that pair, wherein, if the two accounts in an account pair have the same target account features, the account association degree of that account pair is obtained by summing the feature values of those same target account features;
Summing the account association degrees of all the account pairs in the plurality of account pairs to obtain the overall account association degree of that class of accounts;
and if the overall account association degree of that class of accounts is greater than or equal to a preset association threshold, identifying each account in that class as an abnormal account.
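The identification step can be sketched as below, assuming each class in the clustering result is given as a list of accounts and that feature values are looked up from a dictionary. The function name `flag_abnormal_clusters` and the data layout are assumptions of this sketch.

```python
from itertools import combinations

def flag_abnormal_clusters(clusters, account_features, feature_value, threshold):
    """Return the set of accounts belonging to classes whose summed pairwise
    association degree is greater than or equal to `threshold`.

    clusters: list of classes, each a list of accounts.
    account_features: dict mapping each account to its set of target features.
    feature_value: dict mapping each target feature to its feature value.
    """
    abnormal = set()
    for cluster in clusters:
        total = 0.0
        for a, b in combinations(sorted(cluster), 2):        # every account pair
            shared = account_features[a] & account_features[b]
            total += sum(feature_value[f] for f in shared)   # pair association degree
        if total >= threshold:
            abnormal.update(cluster)
    return abnormal
```

A class containing a single account yields no account pair, so its association degree is zero in this sketch.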
According to one embodiment of the present application, the steps involved in the method shown in fig. 2 or fig. 4 may be performed by the units in the account identification apparatus shown in fig. 6. For example, step S201 shown in fig. 2 may be performed by the acquisition unit 601 shown in fig. 6, and steps S202 to S205 may each be performed by the processing unit 602 shown in fig. 6. As another example, steps S401 to S405 shown in fig. 4 may be performed by the acquisition unit 601 shown in fig. 6, steps S406 to S409 may be performed by the processing unit 602 shown in fig. 6, and so on.
According to another embodiment of the present application, each unit in the account identification apparatus shown in fig. 6 may be separately or completely combined into one or several other units, or some unit(s) thereof may be further split into a plurality of units with smaller functions, which may achieve the same operation without affecting the implementation of the technical effects of the embodiments of the present application. The above units are divided based on logic functions, and in practical applications, the functions of one unit may be implemented by a plurality of units, or the functions of a plurality of units may be implemented by one unit. In other embodiments of the present application, the account identification apparatus may also include other units, and in practical applications, these functions may also be implemented with assistance of other units, and may be implemented by cooperation of multiple units.
According to another embodiment of the present application, the account identification apparatus shown in fig. 6 may be constructed, and the account identification method of the embodiment of the present application may be implemented, by running a computer program (including program code) capable of executing the steps involved in the methods shown in fig. 2 or fig. 4 on a general-purpose computing device, such as a computer, that includes a processing element and storage elements, for example a central processing unit (CPU), a random access storage medium (RAM), and a read-only storage medium (ROM). The computer program may be recorded on, for example, a computer storage medium, and loaded into and run in the above-described computing device through the computer storage medium.
In the embodiments of the present application, a plurality of account attributes of each account to be identified can be acquired, so that account attributes of more dimensions can be referenced at the same time and the set of account attributes can be extended. Correspondingly, feature extraction can be performed on each account attribute of each account to obtain a plurality of target account features, and a more accurate account relationship network is constructed by using the plurality of target account features, wherein one node in the account relationship network records the target account features of one account, and two nodes connected by one edge record the same target account features. Then, a plurality of triplets can be sampled from the account relationship network, the vector representation model is trained and optimized according to the initial characterization vector of each node in each triplet, and the optimized vector representation model is used to determine a more accurate target characterization vector for each node in the account relationship network, so that the accuracy of the characterization vector of each node is improved and the target characterization vector of each node better characterizes the corresponding account. Correspondingly, the accounts corresponding to the nodes can be clustered based on the target characterization vectors of the nodes to obtain a more accurate clustering result, and abnormal accounts are identified among the accounts participating in clustering according to the clustering result, so that the recall rate of abnormal accounts is improved without affecting the identification accuracy.
Based on the description of the method embodiment and the device embodiment, the embodiment of the application also provides a computer device. Referring to fig. 7, the computer device includes at least a processor 701, an input interface 702, an output interface 703, and a computer storage medium 704. Wherein the processor 701, input interface 702, output interface 703, and computer storage medium 704 within a computer device may be connected by a bus or other means.
The computer storage medium 704 may be stored in a memory of the computer device. The computer storage medium 704 is used for storing a computer program that comprises program instructions, and the processor 701 is used for executing the program instructions stored in the computer storage medium 704. The processor 701 (or central processing unit, CPU) is the computing core and control core of the computer device; it is adapted to implement one or more instructions, and in particular to load and execute one or more instructions so as to implement a corresponding method flow or a corresponding function. In one embodiment, the processor 701 of the embodiment of the application may be used to perform a series of account identification operations, specifically including: acquiring a plurality of account attributes of each account to be identified, and performing feature extraction on each account attribute of each account to obtain a plurality of target account features; constructing an account relationship network by using the plurality of target account features, wherein one node in the account relationship network records the target account features of one account, and two nodes connected by one edge record the same target account features; sampling a plurality of triplets from the account relationship network, wherein one triplet comprises a center node, a positive sample node and a negative sample node, the center node is a node selected from the account relationship network, the positive sample node is a node connected to the center node, and the negative sample node is a node not connected to the center node; performing training optimization on a vector representation model according to the initial characterization vector of each node in each triplet, and determining the target characterization vector of each node in the account relationship network by using the optimized vector representation model; and clustering the accounts corresponding to the nodes based on the target characterization vectors of the nodes, and identifying abnormal accounts among the accounts participating in clustering according to the clustering result.
The embodiment of the application also provides a computer storage medium (Memory), which is a memory device in the computer device and is used for storing programs and data. It is understood that the computer storage medium herein may include both a built-in storage medium of the computer device and an extended storage medium supported by the computer device. The computer storage medium provides a storage space that stores an operating system of the computer device. Also stored in this storage space are one or more instructions, which may be one or more computer programs (including program code), adapted to be loaded and executed by the processor. The computer storage medium may be a high-speed RAM memory, a non-volatile memory such as at least one magnetic disk memory, or at least one computer storage medium located remotely from the processor. In one embodiment, one or more instructions stored in the computer storage medium may be loaded and executed by the processor to implement the method steps described above in connection with the embodiments of the account identification method shown in fig. 2 or fig. 4.
It should be noted that, according to an aspect of the present application, there is also provided a computer program product or a computer program comprising computer instructions stored in a computer storage medium. The processor of the computer device reads the computer instructions from the computer storage medium and executes them, so that the computer device performs the methods provided in the various optional implementations of the account identification method embodiments shown in fig. 2 or fig. 4 described above.
It is also to be understood that the foregoing is merely illustrative of the present application and is not to be construed as limiting the scope of the application, which is defined by the appended claims.