CN110895604A - Correlation fusion method of virtual identity information - Google Patents

Correlation fusion method of virtual identity information

Info

Publication number
CN110895604A
Authority
CN
China
Prior art keywords
identity information
virtual identity
metadata
account
tags
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811059284.6A
Other languages
Chinese (zh)
Other versions
CN110895604B (en)
Inventor
周琳娜
黄琳凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Relations, University of
Original Assignee
International Relations, University of
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-09-12
Filing date
2018-09-12
Publication date
2020-03-20
2018-09-12 Application filed by International Relations, University of
2018-09-12 Priority to CN201811059284.6A
2020-03-20 Publication of CN110895604A
2022-03-11 Application granted
2022-03-11 Publication of CN110895604B
Status: Expired - Fee Related
Anticipated expiration


Abstract

The invention provides a method for associating and fusing virtual identity information, comprising the following steps: acquiring virtual identity information from a plurality of platforms; parsing the virtual identity information and extracting metadata; associating the metadata to form an association network; building a tag for the metadata; calculating the membership coefficient of each tag; and performing identity fusion with a label propagation algorithm to determine communities. The method can identify virtual identities to a certain extent and provides a reliable scheme for tracing them.

Description

Correlation fusion method of virtual identity information
Technical Field
The invention relates to the field of information technology, and in particular to a method for associating and fusing virtual identity information.
Background
Virtual identity information is the identity information of a person or organization in cyberspace. For example, if user A registers account 123456 on a forum with the mailbox abc@def.com, the nickname GHIJ, and the password qwer, then the mailbox abc@def.com, the account 123456, the nickname GHIJ, the password qwer, and similar information are all regarded as virtual identity information.
With the continuous development of science and technology, the Internet has reached countless households, and individuals and organizations have created a large number of online virtual accounts, most of which can be used without real-name registration. This makes security maintenance and supervision of cyberspace very difficult. In response, many departments and organizations have begun building knowledge graphs based on virtual identity information to keep the cyberspace environment clean and safe and to ensure that every offense is prosecuted. Association and fusion of virtual identity information is an indispensable link in building such a knowledge graph.
At present, individuals and organizations have created a large number of online virtual accounts, each account is associated with far more than two pieces of virtual identity information, and in real life an individual or organization may hold several virtual accounts on one or more platforms. Virtual identity information is therefore both massive and diverse. Moreover, major websites present virtual identity information without any unified standard, so the virtual identity information needed to build a knowledge graph, such as mailboxes, accounts, and nicknames, must first be extracted from massive data; this provides the data foundation for the subsequent association and fusion of virtual identity information.
At present, knowledge graphs built on virtual identities generally model the relationships between different virtual identities; for example, if A and B are friends, an A-B edge is created, so that points form lines and lines form a network. Little research, however, addresses associating and fusing the virtual identity information that belongs to one and the same entity.
Disclosure of Invention
The invention aims to solve the prior-art problem that virtual identities in networks are numerous and varied and that useful information is hard to extract. It provides a method for associating and fusing virtual identity information that can identify virtual identities to a certain extent and offers a reliable scheme for tracing them.
A method for associating and fusing virtual identity information comprises the following steps:
acquiring virtual identity information from a plurality of platforms;
parsing the virtual identity information and extracting metadata;
associating the metadata to form an association network;
building a tag for the metadata;
calculating the membership coefficient of the tag;
and performing identity fusion with a label propagation algorithm to determine communities.
Further, the virtual identity information is acquired by crawling and monitoring.
Further, the virtual identity information includes at least an account ID.
Further, the metadata includes attributes and attribute values.
Further, the tag has the format ((account ID, membership coefficient), source), and the initial membership coefficient of a tag is 1.
Further, calculating the membership coefficient of the tag comprises:
storing the metadata from different sources together with their tags;
simplifying the tags by removing the source;
and comparing all attributes and attribute values, and reassigning the membership coefficients.
Further, comparing all attributes and attribute values and reassigning the membership coefficients comprises:
for tags that have the same attribute and attribute value but different account IDs, reassigning their membership coefficients so that the membership coefficients of all tags with that attribute and attribute value sum to 1.
Further, comparing all attributes and attribute values and reassigning the membership coefficients further comprises:
merging initial tags that have the same attribute, attribute value, and account ID, and reassigning the merged tags so that the membership coefficients of all tags with that attribute and attribute value sum to 1.
Further, performing identity fusion with the label propagation algorithm and determining communities comprises:
selecting any two account IDs and fusing them to serve as the target node;
extracting all metadata and tags related to the two account IDs as the neighbor nodes of the target node;
fusing the tags of the neighbor nodes according to the fused account ID;
updating the membership tag of the target node and normalizing its membership coefficients so that they sum to 1;
and judging whether the membership coefficient is greater than or equal to a preset threshold; if so, determining that the two account IDs belong to the same person, and treating the account IDs of the same person as one community.
The method provided by the invention associates and fuses the virtual identities of the same person or organization. Using a label propagation algorithm, it can effectively determine the associations between account IDs, and the corresponding real-world information can help confirm whether the account IDs belong to the same person or the same organization. This supports the construction of a knowledge graph of virtual identity information and gives network regulators a new way to trace virtual identities to their source.
Drawings
Fig. 1 is a flowchart of an embodiment of the method for associating and fusing virtual identity information provided by the present invention.
Fig. 2 is a block diagram of an embodiment of the method for associating and fusing virtual identity information provided by the present invention.
Fig. 3 is a schematic diagram of extracting virtual identity information by crawling and monitoring in an application scenario of the method for associating and fusing virtual identity information provided by the present invention.
Fig. 4 is a schematic diagram of the database format used to store metadata in an application scenario of the method for associating and fusing virtual identity information provided by the present invention.
Fig. 5 is a schematic diagram of an association network in an application scenario of the method for associating and fusing virtual identity information provided by the present invention.
Fig. 6 is a schematic diagram of the tags of metadata in an application scenario of the method for associating and fusing virtual identity information provided by the present invention.
Fig. 7 is a schematic diagram of the tags of metadata after merging in an application scenario of the method for associating and fusing virtual identity information provided by the present invention.
Fig. 8 is a tag diagram of the metadata "mailbox-zhangsan1990@163.com" in an application scenario of the method for associating and fusing virtual identity information provided by the present invention.
Fig. 9 is a tag diagram of the metadata "phone-13312341234" in an application scenario of the method for associating and fusing virtual identity information provided by the present invention.
Fig. 10 is a schematic diagram of the metadata "nickname-HappyZS" in an application scenario of the method for associating and fusing virtual identity information provided by the present invention.
Fig. 11 is a schematic diagram of the fusion of two account IDs in an application scenario of the method for associating and fusing virtual identity information provided by the present invention.
Fig. 12 is a schematic diagram of all metadata and tags of the two account IDs in an application scenario of the method for associating and fusing virtual identity information provided by the present invention.
Fig. 13 is a schematic diagram of fusing the tags of neighbor nodes in an application scenario of the method for associating and fusing virtual identity information provided by the present invention.
Fig. 14 is a schematic diagram of updating the membership tag of the target node in an application scenario of the method for associating and fusing virtual identity information provided by the present invention.
Detailed Description
To make the objectives, technical solutions, and effects of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described here merely illustrate the invention and are not intended to limit it.
Referring to fig. 1 and fig. 2, the present embodiment provides a method for fusing association of virtual identity information, including:
step S101, acquiring virtual identity information of a plurality of platforms;
step S102, analyzing the virtual identity information and extracting metadata;
step S103, associating the metadata to form an associated network;
step S104, establishing a tag for the metadata;
step S105, calculating the membership coefficient of the tag;
step S106, performing identity fusion with a label propagation algorithm and determining a community.
Specifically, in step S101 the virtual identity information can be obtained by crawling, monitoring, and similar means. A web crawler is a program or script that automatically collects web information according to certain rules; less common names for it include ant, automatic indexer, simulation program, and worm. Network monitoring includes network traffic monitoring, website keyword monitoring, and similar modes: monitoring and reassembling network traffic recovers the useful information it carries, while monitoring website keywords lets the user obtain data on the keywords of interest more directly.
Fig. 3 shows an example of obtaining virtual identity information from a web page of site X.
Step S102 is then executed to parse the acquired virtual identity information. Information obtained by crawling requires data preprocessing to yield the corresponding virtual identity data; information from traffic monitoring requires the traffic to be reassembled; information from website keyword monitoring requires keyword extraction; and so on. The extracted data consist of attributes and attribute values, which together are called metadata. For example, in "name-Zhang San", "name" is the attribute and "Zhang San" is the value of that attribute; we define "name-Zhang San" as one piece of metadata, and likewise "nickname-HappyZS" is a piece of metadata.
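The patent leaves the page layout of the crawled data open. As a minimal sketch, assuming a hypothetical profile page that lists each field as a `<dt>attribute</dt><dd>value</dd>` pair, Python's standard-library `html.parser.HTMLParser` can turn the page into attribute-value metadata. The class `ProfileParser` and the function `extract_metadata` are illustrative names, not part of the described method.

```python
from html.parser import HTMLParser


class ProfileParser(HTMLParser):
    """Collect (attribute, value) pairs from <dt>/<dd> pairs (assumed page layout)."""

    def __init__(self):
        super().__init__()
        self.pairs = []          # extracted metadata as (attribute, value)
        self._current = None     # "dt", "dd", or None when outside both
        self._attribute = None   # last attribute name waiting for its value
        self._buffer = []        # text fragments of the current element

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current:
            return
        text = "".join(self._buffer).strip()
        if tag == "dt":
            self._attribute = text
        elif self._attribute is not None:
            # A <dd> closes one attribute-value pair, i.e. one piece of metadata.
            self.pairs.append((self._attribute, text))
            self._attribute = None
        self._current = None


def extract_metadata(html):
    """Return the attribute-value pairs found in one crawled page."""
    parser = ProfileParser()
    parser.feed(html)
    parser.close()
    return parser.pairs
```

Real pages would need a parser per site layout; the point is only that step S102 reduces each page to attribute-value pairs.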
Because each virtual identity has a fixed account, the account ID is chosen as the unique identifier of the account's virtual identity information on a given platform X and is recorded as (account ID: information source X). The acquired virtual identity information therefore includes at least the account ID; it should also include other virtual identity information such as email, phone number, QQ number, password, and nickname, and the more information is extracted, the more complete the resulting virtual identity association network. Acquired name, gender, date of birth, age, ID card number, school, and similar information reflect a person's real identity, so they are kept as auxiliary information, which plays an important role in virtual identity fusion. For convenient later retrieval, each piece of metadata is saved in the database as a triple ((account ID: information source)-attribute-value). Fig. 4 shows the database format used to store the metadata.
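A sketch of the triple storage, assuming an SQLite table (the patent does not name a database). The table name `metadata`, its columns, and the helpers `open_store`, `save_triple`, and `accounts_with` are hypothetical; each row holds one ((account ID: information source)-attribute-value) triple.

```python
import sqlite3

# Hypothetical schema: one row per ((account ID: source)-attribute-value) triple.
SCHEMA = """
CREATE TABLE IF NOT EXISTS metadata (
    account_id TEXT NOT NULL,
    source     TEXT NOT NULL,
    attribute  TEXT NOT NULL,
    value      TEXT NOT NULL
)
"""


def open_store(path=":memory:"):
    """Open (or create) the metadata store; in-memory by default."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_triple(conn, account_id, source, attribute, value):
    """Store one piece of metadata together with its account ID and source."""
    conn.execute(
        "INSERT INTO metadata VALUES (?, ?, ?, ?)",
        (account_id, source, attribute, value),
    )


def accounts_with(conn, attribute, value):
    """List the (account_id, source) pairs that carry a given attribute-value."""
    rows = conn.execute(
        "SELECT account_id, source FROM metadata "
        "WHERE attribute = ? AND value = ? ORDER BY account_id, source",
        (attribute, value),
    )
    return rows.fetchall()
```

Indexing on `(attribute, value)` would speed up the lookups that later steps rely on, since tags are grouped by identical attribute-values.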
Further, step S103 is executed to associate the metadata into an association network; a tree-shaped association network shows the associations among account IDs, attributes, and attribute values more intuitively. Fig. 5 shows an association network in an application scenario.
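The association network can be sketched as two in-memory maps, assuming the triples from step S102 are available as `(account_id, source, attribute, value)` tuples: a tree from account ID to attribute to values, and an inverse index from each attribute-value pair to the accounts that carry it. The function names are illustrative.

```python
from collections import defaultdict


def build_association_network(triples):
    """Build the account -> attribute -> values tree plus an inverse index.

    triples: iterable of (account_id, source, attribute, value).
    """
    tree = defaultdict(lambda: defaultdict(set))  # account -> attribute -> {values}
    index = defaultdict(set)                      # (attribute, value) -> {accounts}
    for account_id, _source, attribute, value in triples:
        tree[account_id][attribute].add(value)
        index[(attribute, value)].add(account_id)
    return tree, index


def linked_accounts(index, account_id):
    """Return the other accounts that share at least one attribute-value pair."""
    linked = set()
    for accounts in index.values():
        if account_id in accounts:
            linked |= accounts
    linked.discard(account_id)
    return sorted(linked)
```

The inverse index is what later steps need: every attribute-value key whose account set has more than one member is a candidate link between virtual identities.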
Further, step S104 is executed to create a tag for each piece of metadata. The tag has the format ((account ID, membership coefficient), source), with an initial membership coefficient of 1. For example, the initial tag of the metadata "name-Zhang San" obtained from platform X under account ID zhangsan0123 is ((zhangsan0123, 1), X).
Further, step S105 is executed: the metadata from different sources and the initial tags created in the previous step are stored together. Because the goal is fused virtual identity information, the source of the information no longer matters and can be removed from the tag. For example, for the metadata ((zhangsan0123: X)-mailbox-zhangsan1990@163.com) from source X, the tag becomes (zhangsan0123, 1) once the source is removed.
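A minimal sketch of the tag format from steps S104 and S105, assuming exact fractions for the membership coefficient (the patent writes coefficients as fractions such as 1/3). `InitialTag`, `initial_tag`, and `drop_source` are hypothetical names.

```python
from fractions import Fraction
from typing import NamedTuple


class InitialTag(NamedTuple):
    """((account ID, membership coefficient), source), as created in step S104."""
    account_id: str
    membership: Fraction
    source: str


def initial_tag(account_id: str, source: str) -> InitialTag:
    # Every freshly extracted piece of metadata starts with membership 1.
    return InitialTag(account_id, Fraction(1), source)


def drop_source(tag: InitialTag) -> tuple:
    # Step S105: the source is irrelevant to fusion, so keep (account ID, membership).
    return (tag.account_id, tag.membership)
```

Using `Fraction` rather than floats keeps sums such as 1/3 + 1/3 + 1/3 exactly equal to 1, which matters when later steps check that coefficients sum to 1.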
The membership coefficients are then reassigned by comparing all attributes and attribute values:
for tags that have the same attribute and attribute value but different account IDs, the membership coefficients are reassigned so that the membership coefficients of all tags with that attribute and attribute value sum to 1.
For example:
S1. For the metadata ((zhangsan0123: X)-mailbox-zhangsan1990@163.com) in source X, the initial tag is ((zhangsan0123, 1), X);
S2. for the metadata ((zhangsi0123: Y)-mailbox-zhangsan1990@163.com) in source Y, the initial tag is ((zhangsi0123, 1), Y);
S3. for the metadata ((zhangsi0234: Y)-mailbox-zhangsan1990@163.com) in source Y, the initial tag is ((zhangsi0234, 1), Y).
These three pieces of metadata share the attribute-value "mailbox-zhangsan1990@163.com". Because their sources or account IDs differ, three initial tags were created; but because the attribute-value is identical, their membership coefficients must sum to 1. All three membership coefficients are therefore reassigned to 1/3, and the resulting tags for this metadata are shown in fig. 6.
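The reassignment rule can be sketched as follows, assuming source-stripped tags are grouped by their (attribute, value) pair and each occurrence receives an equal share so that the shares sum to 1. This reproduces the 1/3 split of examples S1 to S3; the function name `membership_tags` is illustrative.

```python
from collections import defaultdict
from fractions import Fraction


def membership_tags(triples):
    """Group metadata by attribute-value and share a total membership of 1.

    triples: iterable of (account_id, source, attribute, value). The source is
    dropped (step S105); each occurrence receives an equal share 1/n.
    """
    groups = defaultdict(list)
    for account_id, _source, attribute, value in triples:
        groups[(attribute, value)].append(account_id)
    return {
        key: [(account_id, Fraction(1, len(accounts))) for account_id in accounts]
        for key, accounts in groups.items()
    }
```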
Further, initial tags that have the same attribute, attribute value, and account ID are merged, and the merged tags are reassigned so that the membership coefficients of all tags with that attribute and attribute value sum to 1.
For example:
S4. For the metadata ((zhangsan0123: X)-mailbox-zhangsan1990@163.com) in source X, the initial tag is ((zhangsan0123, 1), X);
S5. for the metadata ((zhangsi0123: Y)-mailbox-zhangsan1990@163.com) in source Y, the initial tag is ((zhangsi0123, 1), Y);
S6. for the metadata ((zhangsan0123: Y)-mailbox-zhangsan1990@163.com) in source Y, the initial tag is ((zhangsan0123, 1), Y).
Apart from the information source, S4 and S6 have the same account ID, attribute, and attribute value. Because the attribute-value of all three pieces of metadata is identical and their membership coefficients must sum to 1, each is first reassigned a membership coefficient of 1/3; the tags of S4 and S6 can then be merged, giving zhangsan0123 a membership coefficient of 2/3 and zhangsi0123 a membership coefficient of 1/3. The resulting tags for this metadata are shown in fig. 7.
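Merging tags that share an account ID amounts to counting occurrences per account and dividing by the total, which yields the 2/3 and 1/3 of examples S4 to S6. The sketch below assumes the source has already been removed; `merged_tag` is a hypothetical helper.

```python
from collections import Counter
from fractions import Fraction


def merged_tag(account_ids):
    """Merge tags of one attribute-value that share an account ID.

    account_ids: the account ID of every source-stripped tag for this
    attribute-value (one entry per occurrence). Each occurrence is worth
    1/total, so an account seen k times ends up with k/total.
    """
    counts = Counter(account_ids)
    total = sum(counts.values())
    return {account_id: Fraction(k, total) for account_id, k in counts.items()}
```

With `["zhangsan0123", "zhangsi0123", "zhangsan0123"]` (S4, S5, S6) this gives zhangsan0123 a coefficient of 2/3 and zhangsi0123 a coefficient of 1/3; with three distinct IDs it reduces to the equal 1/3 split of S1 to S3.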
Further, step S106 is executed to perform identity fusion with a label propagation algorithm. First, any two account IDs are selected and fused into the target node; all metadata and tags associated with these account IDs are extracted and used as the neighbor nodes of the target node; the tags of the neighbor nodes are fused according to the fused account ID; and the target node updates its own tag according to the membership coefficients of the neighbor tags. As an example, suppose the steps above have produced the metadata and tags of many virtual identities, and the account IDs zhangsan0123 and zhangsi0123 are selected. For simplicity, assume that all metadata and tags of the two account IDs are those shown in figs. 8-10: fig. 8 shows the tags of the metadata "mailbox-zhangsan1990@163.com", fig. 9 the tags of the metadata "phone-13312341234", and fig. 10 the metadata "nickname-HappyZS". Fig. 11 shows the fusion of zhangsan0123 and zhangsi0123, which together serve as the target node.
Further, referring to fig. 12, all metadata and tags of the two account IDs are extracted and serve as the neighbor nodes of the target node.
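Selecting the neighbor nodes can be sketched as a filter over the tags, assuming they are held as a dict from (attribute, value) to `{account_id: coefficient}`. `neighbor_nodes` is an illustrative name, and the sample data in the `__main__` block (including the unrelated account `other01`) is hypothetical.

```python
from fractions import Fraction


def neighbor_nodes(tags_by_metadata, a, b):
    """Select the metadata whose tags mention either target account ID.

    tags_by_metadata: dict mapping (attribute, value) -> {account_id: coefficient}.
    """
    return {
        key: tag
        for key, tag in tags_by_metadata.items()
        if a in tag or b in tag
    }


if __name__ == "__main__":
    tags = {
        ("mailbox", "zhangsan1990@163.com"): {
            "zhangsan0123": Fraction(1, 3),
            "zhangsi0123": Fraction(1, 3),
            "zhangsi0234": Fraction(1, 3),
        },
        ("nickname", "HappyZS"): {"zhangsan0123": Fraction(1, 2), "zhangsi0234": Fraction(1, 2)},
        ("nickname", "Other"): {"other01": Fraction(1)},
    }
    print(sorted(neighbor_nodes(tags, "zhangsan0123", "zhangsi0123")))
```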
Referring to figs. 12 and 13, the tags of the neighbor nodes are fused according to the fused account ID. The original tag of the metadata "mailbox-zhangsan1990@163.com" is (zhangsan0123, 1/3)(zhangsi0123, 1/3)(zhangsi0234, 1/3); the account IDs zhangsan0123 and zhangsi0123 are merged into the target node zhangsan0123&zhangsi0123 and their membership coefficients are added, giving the fused tag (zhangsan0123&zhangsi0123, 2/3)(zhangsi0234, 1/3). Similarly, the fused tag of the metadata "phone-13312341234" is (zhangsan0123&zhangsi0123, 1). The original tag of the metadata "nickname-HappyZS" does not contain zhangsi0123, so after fusion its zhangsan0123 entry simply becomes zhangsan0123&zhangsi0123, and the fused tag is (zhangsan0123&zhangsi0123, 1/2)(zhangsi0234, 1/2).
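The tag fusion can be sketched as relabeling both target account IDs to one fused ID and summing their coefficients. This assumes the `&`-joined ID convention used in the text; `fuse_tag` is a hypothetical helper. With the mailbox tag it returns (zhangsan0123&zhangsi0123, 2/3)(zhangsi0234, 1/3). The nickname tag is assumed here to be (zhangsan0123, 1/2)(zhangsi0234, 1/2), the original tag consistent with the sums in the update step, and fusing it gives (zhangsan0123&zhangsi0123, 1/2)(zhangsi0234, 1/2).

```python
from fractions import Fraction


def fuse_tag(tag, a, b):
    """Relabel target accounts a and b as one fused ID and add their coefficients.

    tag: dict mapping account_id -> membership coefficient for one piece of metadata.
    Accounts other than a and b keep their entries unchanged.
    """
    fused_id = f"{a}&{b}"
    fused = {}
    for account_id, coeff in tag.items():
        key = fused_id if account_id in (a, b) else account_id
        fused[key] = fused.get(key, Fraction(0)) + coeff
    return fused
```

Because only keys are renamed and values are added, the coefficients of a fused tag still sum to 1 whenever the original tag did.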
Further, referring to fig. 14, the membership tag of the target node is updated, and its membership coefficients are normalized so that they sum to 1.
To update the membership tag of the target node, the corresponding membership coefficients are first added. For zhangsan0123&zhangsi0123 the sum is 2/3 + 1 + 1/2 = 13/6, and for zhangsi0234 it is 1/3 + 1/2 = 5/6. Since 13/6 + 5/6 = 3 rather than 1, the coefficients are normalized by dividing each by 3 so that they sum to 1, giving 13/18 and 5/18. The updated membership tag of the target node is therefore (zhangsan0123&zhangsi0123, 13/18)(zhangsi0234, 5/18).
When the membership coefficient of the fused account ID reaches a preset threshold, the two account IDs can be judged to belong to the same person or the same organization. Assuming the preset threshold is 1/2: the membership tag of the target node is (zhangsan0123&zhangsi0123, 13/18), and since 13/18 is greater than 1/2, zhangsan0123 and zhangsi0123 can be regarded as the same person.
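The target-node update and the threshold test can be sketched together, assuming the fused neighbor tags are dicts from account ID to `Fraction`. Summing per account and dividing by the grand total reproduces 13/18 and 5/18, and a default threshold of 1/2 mirrors the example. `update_target` and `is_same_entity` are illustrative names; the `>=` comparison treats reaching the threshold as a match, as the disclosure describes.

```python
from collections import defaultdict
from fractions import Fraction


def update_target(neighbor_tags):
    """Sum the fused neighbor tags per account ID, then normalize to sum to 1."""
    totals = defaultdict(Fraction)  # Fraction() == 0
    for tag in neighbor_tags:
        for account_id, coeff in tag.items():
            totals[account_id] += coeff
    grand_total = sum(totals.values())
    return {account_id: value / grand_total for account_id, value in totals.items()}


def is_same_entity(target_tag, fused_id, threshold=Fraction(1, 2)):
    """Accept the fusion when the fused ID's coefficient reaches the threshold."""
    return target_tag.get(fused_id, Fraction(0)) >= threshold
```

For the three fused tags in the example (mailbox, phone, nickname) the totals are 13/6 and 5/6, the grand total is 3, and the normalized tag is 13/18 and 5/18.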
Auxiliary information such as name, gender, date of birth, and ID card number is then consulted: if it points to the same person, the two account IDs are regarded as the same person; if it points to different people, the two account IDs are regarded as two people in the same organization.
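The auxiliary-information check is described only informally. One possible reading, sketched below under assumptions: compare the real-world fields that both accounts carry; if they all agree, call the pair the same person, and if any disagree, call it two people in the same organization. The field list `AUXILIARY_KEYS`, the function `classify_pair`, and the extra "undetermined" outcome for pairs with no shared field are hypothetical additions.

```python
# Illustrative choice of real-world (auxiliary) fields; not fixed by the patent.
AUXILIARY_KEYS = ("name", "gender", "birth date", "ID card number")


def classify_pair(aux_a, aux_b, keys=AUXILIARY_KEYS):
    """Label a pair of account IDs already accepted by the threshold test.

    aux_a, aux_b: dicts of auxiliary attribute -> value for each account.
    Returns "same person" if every auxiliary field present on both sides agrees,
    "same organization" if some shared field disagrees, and "undetermined" if
    the two accounts share no auxiliary field at all.
    """
    shared = [k for k in keys if k in aux_a and k in aux_b]
    if not shared:
        return "undetermined"
    if all(aux_a[k] == aux_b[k] for k in shared):
        return "same person"
    return "same organization"
```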
After repeated fusion, the account IDs confirmed to belong to the same person or the same organization are defined as a new community, which carries the attributes and attribute values of the metadata of all its account IDs.
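Repeated pairwise fusion naturally maps onto a union-find (disjoint-set) structure: each accepted fusion unions two account IDs, and each resulting set is a community whose metadata is the union of its members' metadata. This is a swapped-in data structure for illustration, not one the patent names; `Communities` and `community_metadata` are hypothetical.

```python
class Communities:
    """Union-find over account IDs; each root stands for one community."""

    def __init__(self):
        self.parent = {}

    def find(self, account_id):
        """Return the root of account_id, registering it if unseen."""
        self.parent.setdefault(account_id, account_id)
        while self.parent[account_id] != account_id:
            # Path halving keeps later lookups short.
            self.parent[account_id] = self.parent[self.parent[account_id]]
            account_id = self.parent[account_id]
        return account_id

    def union(self, a, b):
        """Record an accepted fusion of account IDs a and b."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def groups(self):
        """Return every community as a set of account IDs."""
        members = {}
        for account_id in list(self.parent):
            members.setdefault(self.find(account_id), set()).add(account_id)
        return list(members.values())


def community_metadata(members, metadata):
    """Pool the (attribute, value) pairs of all accounts in one community.

    metadata: dict mapping account_id -> set of (attribute, value).
    """
    pooled = set()
    for account_id in members:
        pooled |= metadata.get(account_id, set())
    return pooled
```

Union-find makes the result independent of the order in which pairs are fused: if A-B and B-C are both accepted, A, B, and C end up in one community.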
The association and fusion method of this embodiment targets the virtual identities of one person or organization. Using a label propagation algorithm, it can effectively determine the associations between account IDs, and the corresponding real-world information helps confirm whether account IDs belong to the same person or the same organization. It also supports building a knowledge graph of virtual identity information and gives network regulators a new way to trace virtual identities to their source.
It will be understood that persons skilled in the art can make modifications and variations in light of the above teachings, and all such modifications and variations are intended to fall within the scope of the invention as defined by the appended claims.

Claims (9)


Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201811059284.6A, CN110895604B (en) | 2018-09-12 | 2018-09-12 | Correlation fusion method of virtual identity information


Publications (2)

Publication Number | Publication Date
CN110895604A | 2020-03-20
CN110895604B | 2022-03-11

Family

ID=69784835

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN201811059284.6A | Expired - Fee Related | CN110895604B (en) | 2018-09-12 | 2018-09-12 | Correlation fusion method of virtual identity information

Country Status (1)

Country | Link
CN (1) | CN110895604B (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20150074039A1 (en)* | 2013-09-11 | 2015-03-12 | Oracle International Corporation | Metadata-driven audit reporting system
CN105677648A (en)* | 2014-11-18 | 2016-06-15 | 四三九九网络股份有限公司 | Community detection method and system based on label propagation algorithm
CN105893382A (en)* | 2014-12-23 | 2016-08-24 | 天津科技大学 | Priori knowledge based microblog user group division method
CN107093149A (en)* | 2017-04-11 | 2017-08-25 | 浙江工商大学 | Online friend relation strength assessment method and system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
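吴小兰 (Wu Xiaolan) et al.: "Research on overlapping community detection based on multi-label propagation with contribution degree" (基于贡献度的多标签传播重叠社区发现研究), Journal of the China Society for Scientific and Technical Information (情报学报)*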

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN111459999A (en)* | 2020-03-27 | 2020-07-28 | 北京百度网讯科技有限公司 | Identity information processing method and device, electronic equipment and storage medium
CN111459999B (en)* | 2020-03-27 | 2023-08-18 | 北京百度网讯科技有限公司 | Identity information processing method, device, electronic equipment and storage medium
CN114860822A (en)* | 2022-03-24 | 2022-08-05 | 北京华宇信息技术有限公司 | Information data fusion method and device

Also Published As

Publication number | Publication date
CN110895604B (en) | 2022-03-11

Similar Documents

Publication | Title
Pacheco et al. | Uncovering coordinated networks on social media: methods and case studies
JP6574904B2 (en) | Method, server, and storage medium for mining a target object social account
Adamic et al. | How to search a social network
CN110110093A (en) | A kind of recognition methods, device, electronic equipment and the storage medium of knowledge based map
CN111666501B (en) | Abnormal community identification method, device, computer equipment and storage medium
CN112632405A (en) | Recommendation method, device, equipment and storage medium
CN106796682A (en) | Populate user contact entries
CN110880124A (en) | Conversion rate evaluation method and device
CN111177481A (en) | User identifier mapping method and device
CN107766470B (en) | Intelligent statistical method, intelligent statistical display method and device for data sharing
Vatsalan et al. | Privacy risk quantification in education data using Markov model
CN110895604B (en) | Correlation fusion method of virtual identity information
CN112925899B (en) | Ordering model establishment method, case clue recommendation method, device and medium
CN112307297B (en) | User identification unification method and system based on priority rule
CN107070932B (en) | Anonymous method for preventing label neighbor attack in social network dynamic release
CN106780062B (en) | User group updating method and system based on social network and big data analysis
CN119089237B (en) | Refined data processing method based on artificial intelligence
JP5350319B2 (en) | Friend recommendation device, method and program
CN113987087A (en) | Account processing method and device, electronic equipment and storage medium
CN113313505B (en) | Abnormal location method, device and computing equipment
Nettasinghe et al. | In-Group Love, Out-Group Hate: A Framework to Measure Affective Polarization via Contentious Online Discussions
CN113806555A (en) | Operation abnormity identification method, system, device and storage medium for APP
CN112416922A (en) | Group partner association data mining method, device, equipment and storage medium
CN110399399B (en) | User analysis method, device, electronic equipment and storage medium
CN117009202A (en) | Buried data processing method, buried data processing device, buried data processing equipment and storage medium

Legal Events

Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2022-03-11
