CN110737662B - Data analysis method, device, server and computer storage medium - Google Patents

Data analysis method, device, server and computer storage medium

Info

Publication number
CN110737662B
Authority
CN
China
Prior art keywords
data
tag
user data
analyzed
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910958968.8A
Other languages
Chinese (zh)
Other versions
CN110737662A (en)
Inventor
俄万有
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201910958968.8A
Publication of CN110737662A
Application granted
Publication of CN110737662B
Legal status: Active
Anticipated expiration

Abstract

Translated from Chinese

An embodiment of the present invention provides a data analysis method, device, server and computer storage medium. The method includes: upon receiving a data analysis request, obtaining the identifier of the number package to be analyzed that is carried by the request, together with the target data tag of the requested user data to be analyzed; obtaining the target tag category corresponding to the target data tag, and obtaining the index table corresponding to that target tag category, where the index table includes the mapping relationship among data tags, user data information and number package identifiers; obtaining, according to the index table, the user data to be analyzed that matches both the identifier of the number package to be analyzed and the target data tag, where the user data to be analyzed is the user data indicated by the user data information in the index table; and analyzing the obtained user data to obtain the attribute features corresponding to the numbers in the number package to be analyzed. Embodiments of the present invention enable real-time analysis of user data and effectively improve the efficiency of user data analysis.

Description

Data analysis method, device, server and computer storage medium

Technical Field

The present invention relates to the field of data processing technology, and in particular to a data analysis method, device, server and computer storage medium.

Background

With the rapid development of Internet technology and big data technology, fine-grained user operation strategies built on mining massive amounts of user data have become an important means for Internet services to increase traffic. A commonly used approach to user data analysis today is to map offline user data to number packages, generate a number package tag table, and then analyze user data based on that table. This approach, however, relies on offline computation during analysis. Offline computation is time-consuming, so analysis results cannot be returned to users in real time, which makes data analysis inefficient.

Summary of the Invention

Embodiments of the present invention provide a data analysis method, device, server and computer storage medium that enable real-time analysis of user data and effectively improve the efficiency of user data analysis.

In one aspect, an embodiment of the present invention provides a data analysis method, the method comprising:

upon receiving a data analysis request, obtaining the identifier of the number package to be analyzed that is carried by the data analysis request and the target data tag of the requested user data to be analyzed;

obtaining the target tag category corresponding to the target data tag, and obtaining the index table corresponding to the target tag category, where the index table includes the mapping relationship among data tags, user data information and number package identifiers;

obtaining, according to the index table, the user data to be analyzed that matches the identifier of the number package to be analyzed and the target data tag, where the user data to be analyzed is the user data indicated by the user data information in the index table;

analyzing the obtained user data to be analyzed to obtain the attribute features corresponding to the numbers in the number package to be analyzed.
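The four steps above can be sketched end to end in a few lines of Python. This is only an illustrative sketch under stated assumptions: the names `TAG_DICTIONARY`, `INDEX_TABLES` and `analyze`, the sample tags and numbers, and the choice of a simple value count as the "analysis" are all hypothetical and do not come from the patent.

```python
from collections import Counter

# Hypothetical tag dictionary: data tag -> tag category.
TAG_DICTIONARY = {"member_level": "vuid", "play_duration": "guid"}

# Hypothetical per-category index tables:
# (data tag, number package id) -> list of user-data info records.
INDEX_TABLES = {
    "vuid": {
        ("member_level", "pkg_1"): [
            {"number": "v1", "data": "gold"},
            {"number": "v2", "data": "silver"},
            {"number": "v3", "data": "gold"},
        ],
    },
}


def analyze(package_id, target_tag):
    """Run the claimed steps for one request and return a value distribution."""
    # Step 2: data tag -> tag category -> index table for that category.
    category = TAG_DICTIONARY[target_tag]
    index_table = INDEX_TABLES[category]
    # Step 3: fetch the records matching both the package id and the tag.
    records = index_table.get((target_tag, package_id), [])
    # Step 4: a stand-in analysis that counts how often each value occurs.
    return dict(Counter(record["data"] for record in records))


# Step 1 is represented by the two arguments taken from the request.
result = analyze("pkg_1", "member_level")
```

Because the index table is prepared before the request arrives, the request path consists only of dictionary lookups, which is the property the method relies on for real-time responses.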

In one embodiment, the number package includes at least one number, and each number in the number package represents a user's communication identifier, which is any one of a device identifier, an application account and a routing address.

In one embodiment, before obtaining, upon receiving the data analysis request, the identifier of the number package to be analyzed and the target data tag of the requested user data to be analyzed, the method further includes:

when a newly added number package is detected, obtaining the newly added number package, where the newly added number package matches the number package to be analyzed;

obtaining a feature tag table generated from user data tagged with data tags of the target tag category, where the feature tag table includes the mapping relationship between data tags and user data information;

creating an index table according to the newly added number package and the feature tag table.

In one embodiment, the user data information includes a number that identifies the user data information, and creating the index table according to the newly added number package and the feature tag table includes:

obtaining the identifier of the newly added number package and the numbers in the newly added number package;

adding number package marks to the feature tag table, so that user data information in the feature tag table that meets a preset condition is marked with the identifier of the newly added number package, where meeting the preset condition means that the number identifying the user data information is a number in the newly added number package;

creating the index table from the feature tag table with the number package marks added.

In one embodiment, creating the index table from the feature tag table with the number package marks added includes:

creating an inverted index table from the feature tag table with the number package marks added, where the inverted index table uses data tags and number package identifiers as attribute values and uses user data information as the data having those attribute values.
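As a rough illustration of the inverted index described above, the sketch below turns marked feature-table rows into a mapping from (data tag, number package identifier) to the user data information carrying both. The row layout, the field names (`tag`, `number`, `address`, `packages`) and the function name `build_inverted_index` are assumptions made for this example; the patent does not prescribe a concrete layout or search engine.

```python
from collections import defaultdict

# Hypothetical feature tag table rows after number-package marking: each row
# maps a data tag to user-data info and lists the package ids it was marked with.
marked_rows = [
    {"tag": "member_level", "number": "v1", "address": "store/v1", "packages": ["pkg_1"]},
    {"tag": "member_level", "number": "v2", "address": "store/v2", "packages": ["pkg_1", "pkg_2"]},
    {"tag": "member_level", "number": "v3", "address": "store/v3", "packages": []},
]


def build_inverted_index(rows):
    """Key: (data tag, package id). Value: user-data info having both attributes."""
    index = defaultdict(list)
    for row in rows:
        info = {"number": row["number"], "address": row["address"]}
        for package_id in row["packages"]:
            index[(row["tag"], package_id)].append(info)
    return dict(index)


inverted = build_inverted_index(marked_rows)
numbers_in_pkg_1 = [info["number"] for info in inverted[("member_level", "pkg_1")]]
```

Rows that carry no package mark (such as `v3` here) never enter the index, so a query by tag and package identifier only touches records that belong to that package.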

In one embodiment, obtaining the target tag category corresponding to the target data tag includes:

querying a preset tag dictionary for at least one tag category corresponding to the target data tag, where the target tag category is any one of the at least one tag category.
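A preset tag dictionary of this kind could be as simple as a mapping from data tag to a list of categories. The sketch below assumes hypothetical tag names and a hypothetical helper `lookup_categories`; only the category names guid, vuid and wifi-mac appear in the source.

```python
# Hypothetical preset tag dictionary: one data tag may belong to several categories.
TAG_DICTIONARY = {
    "member_level": ["vuid"],
    "play_duration": ["guid"],
    "family_size": ["wifi-mac"],
    "active_days": ["guid", "vuid"],
}


def lookup_categories(target_tag):
    """Return every tag category registered for a data tag (empty if unknown)."""
    return TAG_DICTIONARY.get(target_tag, [])


categories = lookup_categories("active_days")
# Any one of the returned categories may serve as the target tag category.
target_category = categories[0]
```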

In one embodiment, the user data information includes the storage address of the user data that it indicates, and obtaining, according to the index table, the user data to be analyzed that matches the identifier of the number package to be analyzed and the target data tag includes:

obtaining, according to the index table, the target user data information that matches the identifier of the number package to be analyzed and the target data tag;

obtaining the storage address of the target user data from the target user data information, where the target user data is the user data indicated by the target user data information;

obtaining the target user data according to the storage address, and using the target user data as the user data to be analyzed.

In another aspect, an embodiment of the present invention provides a data analysis device, the device comprising:

an acquisition unit, configured to obtain, when a transceiver unit receives a data analysis request, the identifier of the number package to be analyzed that is carried by the data analysis request and the target data tag of the requested user data to be analyzed;

the acquisition unit being further configured to obtain the target tag category corresponding to the target data tag, and to obtain the index table corresponding to the target tag category, where the index table includes the mapping relationship among data tags, user data information and number package identifiers;

a processing unit, configured to obtain, according to the index table, the user data to be analyzed that matches the identifier of the number package to be analyzed and the target data tag, where the user data to be analyzed is the user data indicated by the user data information in the index table;

the processing unit being further configured to analyze the obtained user data to be analyzed to obtain the attribute features corresponding to the numbers in the number package to be analyzed.

In yet another aspect, an embodiment of the present invention provides a server, including a processor and a memory, where the memory stores executable program code and the processor is configured to call the executable program code to execute the above data analysis method.

Correspondingly, an embodiment of the present invention further provides a computer storage medium storing instructions that, when run on a computer, cause the computer to execute the above data analysis method.

In embodiments of the present invention, the identifier of the number package to be analyzed and the target data tag are obtained in response to a data analysis request; the index table matching the target tag category of the target data tag is obtained; the user data to be analyzed that matches the identifier and the target data tag is obtained according to the index table; and that user data is analyzed to obtain attribute features. This enables real-time analysis of user data and effectively improves the efficiency of user data analysis.

Brief Description of the Drawings

To illustrate the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of a data analysis method provided by an embodiment of the present invention;

FIG. 2 is a schematic flowchart of another data analysis method provided by an embodiment of the present invention;

FIG. 3 is a schematic architecture diagram of a data analysis system provided by an embodiment of the present invention;

FIG. 4 is a schematic flowchart of an index table creation method provided by an embodiment of the present invention;

FIG. 5 is a schematic flowchart of yet another data analysis method provided by an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of a data analysis device provided by an embodiment of the present invention;

FIG. 7 is a schematic structural diagram of a server provided by an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.

Some embodiments of the present invention are described in detail below with reference to the accompanying drawings. Where no conflict arises, the following embodiments and the features in them may be combined with one another.

In Internet services, the data describing a user is often multi-dimensional and overlapping. In a video service, for example, user data includes platform-wide playback data and activity data keyed by device ID (guid), where the device ID includes the device's physical address (MAC address); membership attribute data keyed by the user's application account (vuid); and household attribute data keyed by the home routing address (wifi-mac). Each key identifies one user and corresponds to one set of user data.

Data analysis for Internet services often requires cross-type analysis of a single user group. For a group of video members (a number package made up of vuids), for example, the distribution of membership features and the distribution of playback features may both need to be analyzed: the playback features can be analyzed from the platform-wide playback data corresponding to that group, and the membership features from the corresponding membership attribute data. Such a requirement calls for integrating data of different types. In one feasible implementation, at least two types of offline user data are mapped to the number package to generate a number package tag table that associates the cross-type data with the number package, and user data is then analyzed based on that table. The processing flow of this approach is shown in FIG. 1 and includes the following steps:

A data analyst submits a data analysis task in the analysis system and configures the analysis. The system parses the submitted configuration, extracts the corresponding attribute tags and the number package, and classifies the extracted attribute tags, for example as platform-wide tags, membership tags or household tags. According to the tag type, the original number package is mapped to a number package corresponding to each tag type. The tag types and number packages are then combined to generate a sub-tag table for each tag type. The sub-tag tables are merged into one large tag table keyed by the target number package type. The tag table is imported into a search engine, and user data is extracted and analyzed. Once the analysis is complete, the result is returned to the analyst. This approach, however, relies on offline computation during analysis (including number package mapping, sub-tag table extraction and tag table merging). Offline computation is time-consuming, so analysis results cannot be returned to users in real time, which makes data analysis inefficient.

To address this, an embodiment of the present invention provides a data analysis method in which the feature tag tables corresponding to the various kinds of user data are loaded into a search system in advance, inverted index tables are created, and account mapping data is stored. Upon receiving a data analysis request, the method obtains the identifier of the number package to be analyzed that is carried by the request and the target data tag of the requested user data to be analyzed; obtains the target tag category corresponding to the target data tag, and obtains the index table corresponding to that category, where the index table includes the mapping relationship among data tags, user data information and number package identifiers; obtains, according to the index table, the user data to be analyzed that matches the identifier of the number package to be analyzed and the target data tag, where the user data to be analyzed is the user data indicated by the user data information in the index table; and analyzes the obtained user data to obtain the attribute features corresponding to the numbers in the number package to be analyzed. This enables real-time analysis of user data and effectively improves the efficiency of user data analysis, as described in detail below.

Refer to FIG. 2, which is a schematic flowchart of a data analysis method provided by an embodiment of the present invention. The data analysis method described in this embodiment is applied to a server and includes:

S201. Upon receiving a data analysis request, obtain the identifier of the number package to be analyzed that is carried by the data analysis request and the target data tag of the requested user data to be analyzed.

In this embodiment, a data analyst submits a data analysis task through a terminal and configures the analysis. The terminal sends a data analysis request to the server according to the configuration data entered by the user; the request carries the identifier of the number package to be analyzed and the target data tag of the requested user data to be analyzed. When the server receives the request from the terminal, it responds by obtaining the identifier of the number package to be analyzed and the target data tag.

The number package includes at least one number of a single category, and each number in the number package represents a user's communication identifier, which is any one of a device identifier, an application account and a routing address. The target data tag includes one or more data tags, and a data tag indicates the data type of user data. In one embodiment, the user data includes platform-wide playback data and activity data keyed by device ID, membership attribute data keyed by the user's application account, and household attribute data keyed by the home routing address.
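One possible way to model the number package and the request described in S201 is shown below. The class names `NumberKind`, `NumberPackage` and `AnalysisRequest`, their fields, and the sample values are assumptions introduced for illustration; the source only states that a package holds at least one number of a single category and that a request carries a package identifier and one or more data tags.

```python
from dataclasses import dataclass
from enum import Enum


class NumberKind(Enum):
    """The three kinds of communication identifier named in the text."""
    DEVICE_ID = "guid"
    APP_ACCOUNT = "vuid"
    ROUTING_ADDRESS = "wifi-mac"


@dataclass(frozen=True)
class NumberPackage:
    package_id: str
    kind: NumberKind       # all numbers in one package share a single category
    numbers: frozenset

    def __post_init__(self):
        # The text requires at least one number per package.
        if not self.numbers:
            raise ValueError("a number package needs at least one number")


@dataclass(frozen=True)
class AnalysisRequest:
    package_id: str        # identifier of the number package to be analyzed
    target_tags: tuple     # one or more target data tags


package = NumberPackage("pkg_1", NumberKind.APP_ACCOUNT, frozenset({"v1", "v2"}))
request = AnalysisRequest(package.package_id, ("member_level", "play_duration"))
```

Note that the request carries only the package identifier, not the numbers themselves, which matches the text: the numbers were already registered when the package was added.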

S202. Obtain the target tag category corresponding to the target data tag, and obtain the index table corresponding to the target tag category, where the index table includes the mapping relationship among data tags, user data information and number package identifiers.

In this embodiment, the server queries a preset tag dictionary for at least one tag category corresponding to the target data tag, such as platform-wide tags, membership tags, household tags, guid tags, vuid tags or wifi-mac tags. The target tag category is any one of the at least one tag category. The user data information in the index table includes one or more of: a number that identifies the user data information, the storage address of the user data indicated by the user data information, and the user data indicated by the user data information. In one implementation, the number that identifies the user data information is the same as the number that identifies the user data it indicates; this number is the corresponding user's communication identifier, which is any one of a device identifier, an application account and a routing address.

In one embodiment, the index table corresponding to the target tag category is created in advance. When the server detects a newly added number package, it obtains that package. The newly added number package matches the number package to be analyzed; that is, the two packages are the same, and the identifier of the newly added number package is identical to the identifier of the number package to be analyzed. The server obtains the feature tag table corresponding to the target tag category. This table is generated in advance from user data tagged with data tags of the target tag category and includes the mapping relationship between data tags and user data information.

The server then creates the index table according to the newly added number package and the feature tag table. In one implementation, it obtains the identifier of the newly added number package and the numbers in that package, and adds number package marks to the feature tag table so that user data information meeting a preset condition is marked with the identifier of the newly added number package. Meeting the preset condition means either that the number identifying the user data information is a number in the newly added number package, or that it is a number of a target category that corresponds to a number in the newly added number package. The target category differs from the category of the numbers in the newly added number package, and the server stores the mapping relationship between numbers of different categories. The index table is then created from the feature tag table with the number package marks added. Specifically, an inverted index table is created in which data tags and number package identifiers serve as attribute values and user data information serves as the data having those attribute values. With this structure, the information of the user data to be analyzed can be looked up quickly by data tag and number package identifier, and the user data itself can then be retrieved quickly.
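The marking step, including the second form of the preset condition (a number of another category that maps to a number in the package), might look like the following sketch. The function name `mark_feature_table`, the row fields and the `vuid_to_guid` mapping are hypothetical; the source states only that the server stores a mapping between numbers of different categories.

```python
def mark_feature_table(feature_rows, package_id, package_numbers, cross_map=None):
    """Mark rows whose identifying number is in the package, directly or via mapping."""
    cross_map = cross_map or {}
    # Numbers of the other category that correspond to numbers in the package.
    mapped = {cross_map[n] for n in package_numbers if n in cross_map}
    accepted = set(package_numbers) | mapped
    marked = []
    for row in feature_rows:
        packages = list(row.get("packages", []))
        if row["number"] in accepted and package_id not in packages:
            packages.append(package_id)
        # Build a new row so the input table is left untouched.
        marked.append({**row, "packages": packages})
    return marked


# Hypothetical guid-keyed feature tag table and a vuid package to register.
guid_rows = [
    {"tag": "play_duration", "number": "g1"},
    {"tag": "play_duration", "number": "g2"},
]
vuid_to_guid = {"v1": "g1"}  # assumed stored cross-category mapping
marked = mark_feature_table(guid_rows, "pkg_1", {"v1", "v2"}, vuid_to_guid)
```

Here the package contains vuid numbers, yet the guid row `g1` is still marked because `v1` maps to it; `g2` has no counterpart in the package and stays unmarked.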

S203. Obtain, according to the index table, the user data to be analyzed that matches the identifier of the number package to be analyzed and the target data tag, where the user data to be analyzed is the user data indicated by the user data information in the index table.

In this embodiment, the server uses the index table corresponding to the target tag category to obtain the target user data information that is marked with both the identifier of the number package to be analyzed and the target data tag, and then obtains the user data to be analyzed from that information. The target user data is the user data indicated by the target user data information. When the target user data information includes the target user data itself, the server reads the target user data directly from that information and uses it as the user data to be analyzed. When the target user data information includes the storage address of the target user data, the server reads the storage address from that information, fetches the target user data from that address, and uses it as the user data to be analyzed.
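The two retrieval cases in S203 can be sketched as a single resolver. The `DATA_STORE` dictionary stands in for whatever storage backend holds the user data, and the field names `data` and `address` are assumptions for this example.

```python
# Hypothetical backing store, addressed by storage address.
DATA_STORE = {"store/v2": {"member_level": "silver"}}


def resolve_user_data(info, store=DATA_STORE):
    """Return the user data that a user-data info record points to."""
    if "data" in info:
        # Case 1: the record embeds the user data itself.
        return info["data"]
    # Case 2: the record only carries a storage address to dereference.
    return store[info["address"]]


inline_info = {"number": "v1", "data": {"member_level": "gold"}}
addressed_info = {"number": "v2", "address": "store/v2"}
to_analyze = [resolve_user_data(i) for i in (inline_info, addressed_info)]
```

Embedding the data avoids a second lookup, while storing only the address keeps the index small; the text allows either, or both, per record.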

In another embodiment, the target data tag corresponds to at least two tag categories, and these are tag categories from a preset tag category set that includes at least two of the guid tag, vuid tag and wifi-mac tag categories. The target tag category is any one of the at least two tag categories. In one implementation, a first tag category is any one of the at least two tag categories. For the first tag category, the server uses the corresponding index table to obtain the target user data information that is marked with the identifier of the number package to be analyzed and with a data tag of the first tag category, and obtains user data to be analyzed from that information. The server also uses the index table of the first tag category to obtain at least one first number marked with the identifier of the number package to be analyzed, where a first number is a number corresponding to the first tag category, and forms a first number package from the at least one first number. The server then obtains the preset mapping relationship between first numbers and second numbers and uses it to obtain the second number package corresponding to the first number package, where the second numbers in the second number package correspond one-to-one to the first numbers in the first number package. A second number is a number corresponding to a second tag category, and the second tag category is any of the at least two tag categories other than the first. For the second tag category, the server uses the corresponding index table to obtain the target user data information that is identified by a second number in the second number package, marked with the identifier of the number package to be analyzed, and marked with a data tag of the second tag category, and obtains user data to be analyzed from that information.

For example, if the at least two tag categories are the vuid tag and the guid tag, the target tag category is either of the two. Assume that the first tag category is the vuid tag and the second tag category is the guid tag. For the vuid tag, the server obtains, according to the index table corresponding to the vuid tag, the information of the target user data that is marked with the identifier of the number package to be analyzed and with the vuid tag, and obtains the first user data to be analyzed according to that information. In addition, the server obtains, according to the same index table, at least one vuid number marked with the identifier of the number package to be analyzed, and forms a vuid number package from the at least one vuid number. The server obtains the preset mapping relationship between vuid numbers and guid numbers and, according to the mapping relationship, obtains the guid number package corresponding to the vuid number package; the guid numbers in the guid number package correspond one-to-one to the vuid numbers in the vuid number package. For the guid tag, the server obtains, according to the index table corresponding to the guid tag, the information of the target user data that is identified by a guid number in the guid number package, marked with the identifier of the number package to be analyzed, and marked with the guid tag, and obtains the second user data to be analyzed according to that information. In this way, at least two categories of user data to be analyzed can be obtained, which facilitates subsequent cross-category user data analysis.
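The vuid/guid example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the index tables are modeled as in-memory lists, and the row layout (`id`, `packages`, `data_info`), the sample values, and the helper names are hypothetical rather than taken from the embodiment.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical in-memory stand-ins for the vuid and guid index tables.
# Each row holds a number (ID), the package identifiers it is marked with,
# and the information (here a storage path) of the associated user data.
VUID_INDEX: List[dict] = [
    {"id": "vuid_1", "packages": {"pkg1"}, "data_info": "member/vuid_1"},
    {"id": "vuid_2", "packages": {"pkg1"}, "data_info": "member/vuid_2"},
    {"id": "vuid_3", "packages": set(), "data_info": "member/vuid_3"},
]
GUID_INDEX: List[dict] = [
    {"id": "guid_1", "packages": {"pkg1"}, "data_info": "play/guid_1"},
    {"id": "guid_2", "packages": {"pkg1"}, "data_info": "play/guid_2"},
    {"id": "guid_3", "packages": set(), "data_info": "play/guid_3"},
]
# Assumed preset one-to-one mapping between vuid numbers and guid numbers.
VUID_TO_GUID: Dict[str, str] = {
    "vuid_1": "guid_1", "vuid_2": "guid_2", "vuid_3": "guid_3",
}


def numbers_in_package(index: List[dict], package_id: str) -> List[str]:
    """Collect the numbers that are marked with the given package identifier."""
    return [row["id"] for row in index if package_id in row["packages"]]


def lookup(index: List[dict], package_id: str,
           ids: Optional[Set[str]] = None) -> List[str]:
    """Return the data info of rows marked with package_id (optionally limited to ids)."""
    return [
        row["data_info"]
        for row in index
        if package_id in row["packages"] and (ids is None or row["id"] in ids)
    ]


def cross_category_lookup(package_id: str) -> Tuple[List[str], List[str]]:
    # First tag category (vuid): look up directly by package identifier.
    first_info = lookup(VUID_INDEX, package_id)
    # Form the vuid number package, then map it to the guid number package.
    vuid_package = numbers_in_package(VUID_INDEX, package_id)
    guid_package = {VUID_TO_GUID[v] for v in vuid_package if v in VUID_TO_GUID}
    # Second tag category (guid): look up by mapped number and package identifier.
    second_info = lookup(GUID_INDEX, package_id, guid_package)
    return first_info, second_info
```

Calling `cross_category_lookup("pkg1")` returns the data information for both categories, which the server would then resolve into the first and second user data to be analyzed.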

In another implementation, the server obtains, according to the index table corresponding to the target tag category, the information of the target user data that is marked with the identifier of the number package to be analyzed and with a data tag of the target tag category, and obtains the user data to be analyzed according to that information. For example, if the at least two tag categories are the vuid tag and the guid tag, the target tag category is either of the two. For the vuid tag, the server obtains, according to the index table corresponding to the vuid tag, the information of the target user data that is marked with the identifier of the number package to be analyzed and with the vuid tag, and obtains the first user data to be analyzed according to that information. Similarly, for the guid tag, the server obtains, according to the index table corresponding to the guid tag, the information of the target user data that is marked with the identifier of the number package to be analyzed and with the guid tag, and obtains the second user data to be analyzed according to that information. In this way, at least two categories of user data to be analyzed can likewise be obtained, which facilitates subsequent cross-category user data analysis.

S204: Analyze the acquired user data to be analyzed to obtain the attribute features corresponding to the numbers in the number package to be analyzed.

For example, when the acquired data to be analyzed is platform-wide user playback data, analyzing it yields the playback characteristics of the application user group corresponding to the number package to be analyzed; when the acquired data is member attribute data, analyzing it yields the membership characteristics of that user group.
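As a rough illustration of what such an analysis might compute, the sketch below summarizes member attribute records into the share of each membership level. The record layout and the `member_level` field are assumptions made for this example; the embodiment does not prescribe a particular statistic.

```python
from collections import Counter
from typing import Dict, List


def membership_features(records: List[dict]) -> Dict[str, float]:
    """Return the share of users at each membership level (hypothetical field)."""
    counts = Counter(record["member_level"] for record in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {level: count / total for level, count in counts.items()}
```

For a user group in which three of four users are members, the function reports a share of 0.75 for the member level and 0.25 for the non-member level.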

In this embodiment of the present invention, in response to a data analysis request, the identifier of the number package to be analyzed and the target data tag are obtained; the index table matching the target tag category corresponding to the target data tag is obtained; the user data to be analyzed that matches the identifier and the target data tag is obtained according to the index table; and the user data to be analyzed is analyzed to obtain attribute features. User data can thus be analyzed in real time, which improves the efficiency of user data analysis.

In a feasible implementation, the data analysis method provided in the embodiments of the present invention can be applied to a data analysis system, which can be deployed in a server. As shown in FIG. 3, the data analysis system includes five parts: an interaction layer, an access layer, a logic layer, a data synchronization layer, and a data storage layer. The functions of each part are as follows:

1. Interaction layer: provides users with a web system with input and output capabilities, offering a simple and clear configuration page for data analysis tasks and a display page for analysis results, so as to meet users' data analysis needs.

2. Access layer: provides the query adaptation service, which may be provided by the query adaptation module. It receives the data analysis request submitted by the user, parses the request information, and obtains the data tags of the requested user data. In one implementation, when the data analysis request contains multiple data tags that belong to different tag categories, the request can be split into multiple concurrent data analysis requests according to the tag categories to which the data tags belong; each of these requests is then adapted to the syntax supported by the search module to generate a corresponding query statement.

3. Logic layer: provides the search service, which may be provided by the search module. It includes a search cluster, such as an Elasticsearch cluster. Elasticsearch is a Lucene-based search server that provides a distributed, multi-user full-text search engine; Lucene is an open-source full-text search engine toolkit. A cluster consists of one or more nodes organized together, which jointly hold all the data and together provide indexing and search functions. Based on the query statements provided by the access layer, the cluster provides online real-time data search and aggregation services.

4. Data synchronization layer: includes a tag data synchronization module, a number package data synchronization module, and a first account mapping data storage database. The tag data synchronization module periodically loads user tag data of multiple data types into the search cluster; user tag data is, for example, a group of data for user A consisting of gender, age, education, viewing preferences, and similar information. The data synchronization layer also provides an account mapping data storage service, which periodically loads the mapping relationships between the various kinds of accounts into the first account mapping data storage database. This database supports online real-time queries and may be a Redis (Remote Dictionary Server) database. For each newly added user number package, the number package data synchronization module assigns a number package identifier to the package; it also queries the stored account mapping relationships and marks the identification information of each converted account with the identifier of the original number package. For example, if the guid number package pkg_100 contains the user guid_1, whose corresponding vuid account is vuid_1, a tag_pkg_100 tag column is added to the vuid tag table, and tag_pkg_100 is set to 1 for vuid_1.

5. Data storage layer: provides offline big data storage services, which may be provided by the Hadoop Distributed File System (HDFS). These include: a user tag data storage service, in which a tag data storage database stores the user tag data computed by the big data platform; a user number package data storage service, in which a number package data storage database stores the user number package data computed by the big data platform; and a user account mapping data storage service, in which a second account mapping data storage database stores the user account mapping data computed by the big data platform. It should be noted that the tag data storage database, the number package data storage database, and the second account mapping data storage database may be the same database or different databases.
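To make the access layer's request splitting (item 2 above) more concrete, the following sketch groups the requested data tags by tag category and builds one query body per index table. The query bodies follow the general shape of the Elasticsearch query DSL (`bool`/`filter`/`term` clauses and `terms` aggregations), but the embodiment does not give the actual query statements; the tag dictionary contents, the field names, and the choice of aggregation are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical tag dictionary: data tag -> tag category.
TAG_DICTIONARY: Dict[str, str] = {
    "member_level": "vuid",
    "gender": "vuid",
    "play_duration_bucket": "guid",
}
# Index table names as used in the description (guid_tag, vuid_tag).
INDEX_FOR_CATEGORY: Dict[str, str] = {"vuid": "vuid_tag", "guid": "guid_tag"}


def split_request(package_tag: str, data_tags: List[str]) -> Dict[str, dict]:
    """Split one analysis request into one query body per tag category."""
    tags_by_category: Dict[str, List[str]] = defaultdict(list)
    for tag in data_tags:
        tags_by_category[TAG_DICTIONARY[tag]].append(tag)

    queries: Dict[str, dict] = {}
    for category, tags in tags_by_category.items():
        queries[INDEX_FOR_CATEGORY[category]] = {
            "size": 0,  # only aggregation results are wanted, not raw hits
            # Restrict the search to rows flagged with the number package tag.
            "query": {"bool": {"filter": [{"term": {package_tag: 1}}]}},
            # One terms aggregation per requested data tag (assumed analysis).
            "aggs": {tag: {"terms": {"field": tag}} for tag in tags},
        }
    return queries
```

Each returned entry could then be submitted to the search cluster as an independent, concurrent request, with the results merged afterwards.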

When a newly added number package is detected, an index table needs to be created for it. Refer also to FIG. 4, which shows the flow for creating index tables, taking guid and vuid tag data as examples. Guid tag data is user data whose data tag is a device ID, for example, user device ID plus platform-wide playback data, with the user device ID serving as the data tag. Vuid tag data is user data whose data tag is a user application account, for example, user application account plus member attribute data, with the user application account serving as the data tag.

As shown in FIG. 4, the index table creation flow includes the following steps:

1. The tag data synchronization module sends a guid tag data read request to the tag data storage database and, in parallel, sends a vuid tag data read request to the same database. The tag data storage database periodically updates the stored user tag data, including the guid tag data and the vuid tag data.

2. The tag data storage database returns the guid tag data and the vuid tag data to the tag data synchronization module.

3. After obtaining the guid tag data, the tag data synchronization module sends a guid tag index table creation request to the search module, so that the search module creates the guid tag index table guid_tag. After obtaining the vuid tag data, the tag data synchronization module sends, in parallel, a vuid tag index table creation request to the search module, so that the search module creates the vuid tag index table vuid_tag. In one implementation, the guid tag index table creation request carries the guid tag data obtained by the tag data synchronization module, and the vuid tag index table creation request carries the vuid tag data obtained by the tag data synchronization module.

4. After receiving the guid tag index table creation request, the search module creates the guid tag index table from the guid tag data. This may be an inverted index table, which supports real-time indexing of the guid tag data. Specifically, a feature tag table is first created from the guid tag data. The feature tag table is a forward table; for example, its format may be device ID plus the information of the platform-wide playback data (such as the storage address of that data). The search module then creates an inverted index table over the data in the feature tag table and returns the creation result of the guid tag index table to the tag data synchronization module. Similarly, after receiving the vuid tag index table creation request, the search module creates the vuid tag index table from the vuid tag data, which may also be an inverted index table supporting real-time indexing of the vuid tag data, and returns the creation result of the vuid tag index table to the tag data synchronization module.

5. After receiving the responses indicating that the guid tag index table and the vuid tag index table have been created successfully, the tag data synchronization module sends a notification of successful tag data synchronization to the number package data synchronization module. After receiving this notification, the number package data synchronization module returns an acknowledgment message to the tag data synchronization module. After receiving the acknowledgment message, the tag data synchronization module updates the synchronization status of the tag data to completed.

6. The number package data synchronization module requests a vuid number package from the number package data storage database; the requested vuid number package is any newly added number package consisting of user application accounts. The number package data storage database returns the corresponding vuid number package to the number package data synchronization module; the identifier of the returned number package is pkg1.

7. For each vuid number x in the number package pkg1, the number package data synchronization module queries the first account mapping data storage database for the guid number y corresponding to the vuid number x; the vuid number x and the guid number y correspond to the same user. For the vuid number x, the number package data synchronization module sends a vuid tag index table update request to the search module. After receiving the request, the search module adds the tag pkg1 to the vuid tag index table vuid_tag, sets the pkg1 tag to 1 for the data corresponding to vuid=x (including the account, the data tag, and the tag data or the information of the tag data), and returns the update result to the number package data synchronization module.

For the guid number y corresponding to the vuid number x, the number package data synchronization module sends a guid tag index table update request to the search module. After receiving the request, the search module adds the tag pkg1 to the guid tag index table guid_tag, sets the pkg1 tag to 1 for the data corresponding to guid=y, and returns the update result to the number package data synchronization module.

8. After all the numbers in the number package pkg1 have been processed, the number package data synchronization module updates the synchronization status of the number package pkg1 to completed.
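The relationship between the forward feature tag table of step 4 and the inverted index built from it can be sketched as follows. This is a toy, in-memory model and not the search module's actual data structure: it assumes each forward-table row is a dictionary of field values keyed by the number (device ID or account), and that a number package tag such as `pkg1` is stored as an ordinary field with value 1.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def build_inverted_index(forward_table: Dict[str, dict]) -> Dict[Tuple[str, object], Set[str]]:
    """Map each (field, value) pair to the set of row IDs that carry it."""
    inverted: Dict[Tuple[str, object], Set[str]] = defaultdict(set)
    for row_id, fields in forward_table.items():
        for field, value in fields.items():
            inverted[(field, value)].add(row_id)
    return inverted


def search(inverted: Dict[Tuple[str, object], Set[str]], **conditions) -> Set[str]:
    """Return the row IDs that satisfy every field=value condition."""
    postings = [inverted.get(pair, set()) for pair in conditions.items()]
    return set.intersection(*postings) if postings else set()


# Forward table: number -> fields (sample values are illustrative only).
guid_forward = {
    "guid_1": {"data_tag": "play", "pkg1": 1},
    "guid_2": {"data_tag": "play", "pkg1": 0},
    "guid_3": {"data_tag": "play", "pkg1": 1},
}
guid_inverted = build_inverted_index(guid_forward)
```

With this structure, `search(guid_inverted, data_tag="play", pkg1=1)` resolves the combination of data tag and number package tag by intersecting two posting sets instead of scanning every row, which is the property that makes real-time lookups feasible.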

It should be noted that, for steps 6 and 7, the number package data synchronization module may instead request a guid number package from the number package data storage database; the requested guid number package is any newly added number package consisting of user device identifiers. The number package data storage database returns the corresponding guid number package to the number package data synchronization module; the identifier of the returned number package is pkg2. For each guid number y in the number package pkg2, the number package data synchronization module queries the first account mapping data storage database for the vuid number x corresponding to the guid number y; the vuid number x and the guid number y correspond to the same user. For the guid number y, the number package data synchronization module sends a guid tag index table update request to the search module. After receiving the request, the search module adds the tag pkg2 to the guid tag index table guid_tag, sets the pkg2 tag to 1 for the data corresponding to guid=y, and returns the update result to the number package data synchronization module. For the vuid number x corresponding to the guid number y, the number package data synchronization module sends a vuid tag index table update request to the search module. After receiving the request, the search module adds the tag pkg2 to the vuid tag index table vuid_tag, sets the pkg2 tag to 1 for the data corresponding to vuid=x, and returns the update result to the number package data synchronization module.
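The symmetric tagging described in steps 6 to 8 and in the note above can be summarized in one routine. The sketch below is a simplified, in-memory approximation: the function name, the dictionary-based tables, and the shape of the account mapping are assumptions, and a real deployment would issue index update requests to the search module rather than mutate dictionaries.

```python
from typing import Dict, List


def sync_package(package_id: str, numbers: List[str], source: str,
                 tables: Dict[str, Dict[str, dict]],
                 mapping: Dict[str, Dict[str, str]]) -> str:
    """Flag each number in the package, and its mapped counterpart, with package_id.

    source  -- "vuid" or "guid": the category of the numbers in the package
    tables  -- {"vuid": {...}, "guid": {...}}, index rows keyed by number
    mapping -- {"vuid": {vuid: guid}, "guid": {guid: vuid}} account mappings
    """
    target = "guid" if source == "vuid" else "vuid"
    for number in numbers:
        # Mark the row in the index table of the package's own category.
        if number in tables[source]:
            tables[source][number][package_id] = 1
        # Look up the counterpart account and mark it in the other index table.
        counterpart = mapping[source].get(number)
        if counterpart in tables[target]:
            tables[target][counterpart][package_id] = 1
    # Mirrors step 8: report the package's synchronization status.
    return "completed"
```

Because both index tables end up carrying the same package tag, a later analysis request can filter either table on that tag directly, without consulting the account mapping at query time.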

When data analysis is needed, it is performed in response to a data analysis request. FIG. 5 shows the processing flow of user data analysis, which includes the following steps:

1. A data analyst (that is, a user of the data analysis system) sends a data analysis request to the query adaptation module through the web system. The request carries the identifier pkg1 of the number package to be analyzed and the data tags of the requested user data to be analyzed.

2. After receiving the data analysis request, the query adaptation module parses the data tags and the number package identifier in the request, and queries the tag dictionary to obtain the category of each data tag. The tag dictionary returns the tag category corresponding to each target data tag.

3. For the vuid tag, the query adaptation module adds the analysis condition that the pkg1 tag equals 1 and sends a vuid tag analysis request to the search module, requesting the search module to obtain the user data corresponding to the vuid tag. For the guid tag, the query adaptation module adds the analysis condition that the pkg1 tag equals 1 and sends a guid tag analysis request to the search module, requesting the search module to obtain the user data corresponding to the guid tag.

4. After receiving the vuid tag analysis request, the search module obtains the first user data to be analyzed according to the previously created vuid tag index table, where the users corresponding to the first user data to be analyzed are those whose pkg1 tag equals 1; it analyzes the first user data to be analyzed and returns the analysis result to the query adaptation module. Similarly, after receiving the guid tag analysis request, the search module obtains the second user data to be analyzed according to the previously created guid tag index table, where the users corresponding to the second user data to be analyzed are those whose pkg1 tag equals 1; it analyzes the second user data to be analyzed and returns the analysis result to the query adaptation module.

5. After receiving the analysis results for all categories of user data, the query adaptation module aggregates them to obtain the final data analysis result, and sends the result to the web system for display to the data analyst.
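The fan-out and aggregation in steps 3 to 5 might look like the following sketch. The search module is replaced by a stub that returns canned partial results, so the function names, the result shapes, and the sample figures are purely illustrative; only the index table names guid_tag and vuid_tag come from the description.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def search_module(index_name: str, package_id: str) -> Dict[str, dict]:
    """Hypothetical stand-in for the search module: returns one partial result."""
    canned = {
        "vuid_tag": {"member_level": {"vip": 2, "none": 1}},
        "guid_tag": {"play_bucket": {"high": 1, "low": 2}},
    }
    return canned[index_name]


def analyze(package_id: str, categories: List[str]) -> Dict[str, dict]:
    """Query one index table per tag category concurrently and merge the results."""
    index_names = [f"{category}_tag" for category in categories]
    if not index_names:
        return {}
    with ThreadPoolExecutor(max_workers=len(index_names)) as pool:
        # pool.map preserves input order, so results line up with index_names.
        partials = list(
            pool.map(lambda name: search_module(name, package_id), index_names)
        )
    merged: Dict[str, dict] = {}
    for partial in partials:
        merged.update(partial)
    return merged
```

Here `analyze("pkg1", ["vuid", "guid"])` plays the role of the query adaptation module: it issues one request per tag category and combines the per-category results into a single response for the web system.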

The above approach enables real-time analysis of user data through joint indexing across data types. For each number package to be analyzed, adding the corresponding number package tag to the feature tag table of every data type enables real-time analysis of the user data corresponding to that number package. Moreover, when user data spanning multiple data types is analyzed, the request is split into corresponding index requests according to the feature types to which the attribute features to be analyzed belong, which enables real-time analysis of user data across data types and greatly improves the efficiency of data analysis.

Refer to FIG. 6, which is a schematic structural diagram of a data analysis apparatus provided in an embodiment of the present invention. The data analysis apparatus described in this embodiment corresponds to the server described above, and the apparatus includes:

an acquisition unit 601, configured to: when a transceiver unit 602 receives a data analysis request, acquire the identifier of the number package to be analyzed and the target data tag of the requested user data to be analyzed, both carried in the data analysis request;

the acquisition unit 601 being further configured to acquire the target tag category corresponding to the target data tag, and acquire the index table corresponding to the target tag category, where the index table includes the mapping relationship among data tags, information of user data, and number package identifiers;

a processing unit 603, configured to obtain, according to the index table, the user data to be analyzed that matches the identifier of the number package to be analyzed and the target data tag, where the user data to be analyzed is the user data indicated by the information of user data in the index table;

the processing unit 603 being further configured to analyze the acquired user data to be analyzed to obtain the attribute features corresponding to the numbers in the number package to be analyzed.

In one embodiment, a number package includes at least one number, and a number in the number package represents a communication identifier of a user; the communication identifier is any one of a device identifier, an application account, and a routing address.

In one embodiment, the acquisition unit 601 is further configured to: when a newly added number package is detected, acquire the newly added number package, where the newly added number package matches the number package to be analyzed; and acquire a feature tag table generated from user data tagged with data tags of the target tag category, where the feature tag table includes the mapping relationship between data tags and information of user data;

the processing unit 603 is further configured to create the index table according to the newly added number package and the feature tag table.

In one embodiment, the information of user data includes a number that identifies the information of user data, and when creating the index table according to the newly added number package and the feature tag table, the processing unit 603 is specifically configured to:

acquire the identifier of the newly added number package and the numbers in the newly added number package;

add a number package mark to the feature tag table, so as to mark the information of user data in the feature tag table that meets a preset condition with the identifier of the newly added number package, where meeting the preset condition means that the number identifying the information of user data is a number in the newly added number package;

create the index table according to the feature tag table to which the number package mark has been added.

In one embodiment, when creating the index table according to the feature tag table to which the number package mark has been added, the processing unit 603 is specifically configured to:

create an inverted index table according to the feature tag table to which the number package mark has been added, where the inverted index table uses data tags and number package identifiers as attribute values and uses the information of user data as the data having those attribute values.

In one embodiment, when acquiring the target tag category corresponding to the target data tag, the acquisition unit 601 is specifically configured to:

query a preset tag dictionary for at least one tag category corresponding to the target data tag, where the target tag category is any one of the at least one tag category.

In one embodiment, the information of user data includes the storage address of the user data indicated by that information, and when obtaining, according to the index table, the user data to be analyzed that matches the identifier of the number package to be analyzed and the target data tag, the processing unit 603 is specifically configured to:

obtain, according to the index table, the information of target user data that matches the identifier of the number package to be analyzed and the target data tag;

obtain the storage address of the target user data from the information of the target user data, where the target user data is the user data indicated by the information of the target user data;

obtain the target user data according to the storage address, and use the target user data as the user data to be analyzed.
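The three operations above (match in the index table, extract the storage address, fetch the data) can be illustrated with a short sketch. The row layout, the `storage_address` key, and the dictionary standing in for the storage system are assumptions for this example; in the described system the addresses would refer to data held in the data storage layer.

```python
from typing import Dict, List


def fetch_user_data(index_rows: List[dict], package_id: str, data_tag: str,
                    storage: Dict[str, object]) -> List[object]:
    """Resolve matching index rows to the user data stored at their addresses."""
    # 1. Find the information of target user data matching package and data tag.
    matches = [
        row for row in index_rows
        if row["data_tag"] == data_tag and row.get(package_id) == 1
    ]
    # 2. Extract the storage address from each piece of information.
    addresses = [row["info"]["storage_address"] for row in matches]
    # 3. Fetch the target user data from storage; it becomes the data to analyze.
    return [storage[address] for address in addresses]


# Illustrative data only: two index rows, one of which is flagged with pkg1.
rows = [
    {"data_tag": "play", "pkg1": 1, "info": {"storage_address": "addr/guid_1"}},
    {"data_tag": "play", "pkg1": 0, "info": {"storage_address": "addr/guid_2"}},
]
store = {"addr/guid_1": {"plays": 12}, "addr/guid_2": {"plays": 3}}
```

Keeping only the storage address in the index table, rather than the user data itself, keeps the index small while still allowing the matching data to be retrieved on demand.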

It can be understood that the functions of the functional units of the data analysis apparatus in this embodiment of the present invention can be implemented according to the methods in the foregoing method embodiments. For the specific implementation process, refer to the related descriptions of the foregoing method embodiments; details are not repeated here.

In this embodiment of the present invention, in response to a data analysis request, the identifier of the number package to be analyzed and the target data tag are obtained; the index table matching the target tag category corresponding to the target data tag is obtained; the user data to be analyzed that matches the identifier and the target data tag is obtained according to the index table; and the user data to be analyzed is analyzed to obtain attribute features. User data can thus be analyzed in real time, which effectively improves the efficiency of user data analysis.

Refer to FIG. 7, which is a schematic structural diagram of a server provided in an embodiment of the present invention. The server described in this embodiment includes a processor 701, a communication interface 702, and a memory 703. The processor 701, the communication interface 702, and the memory 703 may be connected through a bus or in other ways; this embodiment takes connection through a bus as an example.

The processor 701 (also called a CPU, Central Processing Unit) is the computing core and control core of the server. It can parse the various instructions in the server and process the various data of the server; for example, the CPU can transfer various kinds of interaction data between the internal components of the server. The communication interface 702 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi or mobile communication interface), and is controlled by the processor 701 to send and receive data. The memory 703 is the storage device of the server and is used to store programs and data. It can be understood that the memory 703 here may include the built-in memory of the server and may also include extended memory supported by the server. The memory 703 provides storage space that stores the operating system of the server, which may include but is not limited to the Android system, the iOS system, and the Windows Phone system; the present invention imposes no limitation on this.

In this embodiment of the present invention, the processor 701 performs the following operations by running the executable program code stored in the memory 703:

when a data analysis request is received through the communication interface 702, acquiring the identifier of the number package to be analyzed carried in the data analysis request and the target data tag of the requested user data to be analyzed;

acquiring the target tag category corresponding to the target data tag, and acquiring the index table corresponding to the target tag category, wherein the index table includes a mapping relationship among data tags, user data information, and number package identifiers;

acquiring, according to the index table, the user data to be analyzed that matches the identifier of the number package to be analyzed and the target data tag, wherein the user data to be analyzed is the user data indicated by the user data information in the index table;

analyzing the acquired user data to be analyzed to obtain the attribute features corresponding to the numbers in the number package to be analyzed.
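
As a non-limiting illustration, the four operations above may be sketched in Python as follows. The names `TAG_DICTIONARY`, `INDEX_TABLES`, and `handle_request`, the sample tags and numbers, and the choice of counting records by city as the "analysis" are assumptions made for this sketch and are not taken from the embodiment. For brevity, the index entries here hold the user data records inline.

```python
from collections import Counter

# Hypothetical tag dictionary: data tag -> tag category.
TAG_DICTIONARY = {
    "age_18_24": "age",
    "age_25_34": "age",
    "likes_games": "interest",
}

# Hypothetical index tables, one per tag category. Each table maps
# (data tag, number package identifier) -> list of user data records.
INDEX_TABLES = {
    "age": {
        ("age_18_24", "pkg_1"): [
            {"number": "u1", "city": "Shenzhen"},
            {"number": "u2", "city": "Beijing"},
        ],
        ("age_25_34", "pkg_1"): [{"number": "u3", "city": "Shenzhen"}],
    },
}


def handle_request(request):
    """Serve one data analysis request against the prebuilt index tables."""
    # Step 1: read the package identifier and target data tag from the request.
    package_id = request["package_id"]
    target_tag = request["target_tag"]
    # Step 2: resolve the tag category, then the index table for that category.
    category = TAG_DICTIONARY[target_tag]
    index_table = INDEX_TABLES[category]
    # Step 3: fetch the user data matching both the package and the tag.
    records = index_table.get((target_tag, package_id), [])
    # Step 4: analyze; here, simply count the matched records per city.
    return dict(Counter(record["city"] for record in records))


if __name__ == "__main__":
    print(handle_request({"package_id": "pkg_1", "target_tag": "age_18_24"}))
```

Because the lookup in step 3 is a single keyed access rather than a scan over all user data, a sketch of this shape is consistent with the real-time behavior the embodiment aims for.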

In one embodiment, a number package includes at least one number, and each number in the number package represents a communication identifier of a user, the communication identifier being any one of a device identifier, an application account, and a routing address.

In one embodiment, before acquiring, when a data analysis request is received through the communication interface 702, the identifier of the number package to be analyzed carried in the data analysis request and the target data tag of the requested user data to be analyzed, the processor 701 is further configured to: when a newly added number package is detected, acquire the newly added number package, wherein the newly added number package matches the number package to be analyzed; acquire a feature tag table generated from user data labeled with data tags of the target tag category, wherein the feature tag table includes a mapping relationship between data tags and user data information; and create the index table according to the newly added number package and the feature tag table.

In one embodiment, the user data information includes a number used to identify the user data information. When creating the index table according to the newly added number package and the feature tag table, the processor 701 is specifically configured to: acquire the identifier of the newly added number package and the numbers in the newly added number package; add a number package mark to the feature tag table, so as to mark the user data information in the feature tag table that meets a preset condition with the identifier of the newly added number package, wherein meeting the preset condition means that the number used to identify the user data information is a number in the newly added number package; and create the index table according to the feature tag table to which the number package mark has been added.
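
The marking step described above may be sketched as follows. This is only an illustrative sketch: the function name `mark_number_package`, the row fields `tag`, `number`, `address`, and `package_ids`, and the representation of the feature tag table as a list of dictionaries are all assumptions introduced here.

```python
def mark_number_package(feature_tag_table, package_id, package_numbers):
    """Return a copy of the table in which matching rows carry the package id.

    A row meets the preset condition when its identifying number is one of
    the numbers in the newly added number package.
    """
    numbers = set(package_numbers)  # set membership keeps each check O(1)
    marked = []
    for row in feature_tag_table:
        row = dict(row)  # shallow copy so the input table is left untouched
        if row["number"] in numbers:
            # A row may belong to several packages, so keep a list of ids.
            row["package_ids"] = row.get("package_ids", []) + [package_id]
        marked.append(row)
    return marked


if __name__ == "__main__":
    table = [
        {"tag": "age_18_24", "number": "u1", "address": "store/u1"},
        {"tag": "age_18_24", "number": "u9", "address": "store/u9"},
    ]
    for row in mark_number_package(table, "pkg_1", ["u1", "u2"]):
        print(row)
```

Rows whose number is not in the package are passed through unmarked, so the same feature tag table can be marked again when a further number package is added.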

In one embodiment, when creating the index table according to the feature tag table to which the number package mark has been added, the processor 701 is specifically configured to: create an inverted index table according to the feature tag table to which the number package mark has been added, wherein the inverted index table uses data tags and number package identifiers as attribute values and uses user data information as the data having those attribute values.
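
One possible shape for such an inverted index table is sketched below, assuming the marked feature tag table is a list of dictionaries with the hypothetical fields `tag`, `number`, `address`, and `package_ids`. The function name `build_inverted_index` and the use of a `(data tag, package identifier)` tuple as the key are choices made for this sketch, not details given in the embodiment.

```python
from collections import defaultdict


def build_inverted_index(marked_rows):
    """Map (data tag, number package id) -> user data information entries."""
    index = defaultdict(list)
    for row in marked_rows:
        # Rows without a package mark belong to no package and are skipped.
        for package_id in row.get("package_ids", []):
            key = (row["tag"], package_id)  # the attribute values
            index[key].append(
                {"number": row["number"], "address": row["address"]}
            )
    return dict(index)


if __name__ == "__main__":
    rows = [
        {"tag": "age_18_24", "number": "u1", "address": "store/u1",
         "package_ids": ["pkg_1"]},
        {"tag": "age_18_24", "number": "u9", "address": "store/u9"},
    ]
    print(build_inverted_index(rows))
```

Keying on the attribute values means that a request carrying a package identifier and a target data tag resolves to its user data information with one dictionary access.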

In one embodiment, when acquiring the target tag category corresponding to the target data tag, the processor 701 is specifically configured to: query a preset tag dictionary for at least one tag category corresponding to the target data tag, wherein the target tag category is any one of the at least one tag category.
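
A minimal sketch of the tag dictionary lookup follows. The dictionary contents, the function name `lookup_target_category`, and the policy of returning the first category found are assumptions; the embodiment only states that the target tag category is any one of the categories returned.

```python
# Hypothetical preset tag dictionary: data tag -> one or more tag categories.
TAG_DICTIONARY = {
    "likes_games": ["interest", "entertainment"],
    "age_18_24": ["age"],
}


def lookup_target_category(target_tag, tag_dictionary=TAG_DICTIONARY):
    """Return one tag category for the target data tag."""
    categories = tag_dictionary.get(target_tag, [])
    if not categories:
        raise KeyError(f"no tag category registered for {target_tag!r}")
    # Any of the categories would satisfy the embodiment; this sketch
    # simply takes the first one.
    return categories[0]


if __name__ == "__main__":
    print(lookup_target_category("likes_games"))
```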

In one embodiment, the user data information includes the storage address of the user data that it indicates. When acquiring, according to the index table, the user data to be analyzed that matches the identifier of the number package to be analyzed and the target data tag, the processor 701 is specifically configured to: acquire, according to the index table, the target user data information that matches the identifier of the number package to be analyzed and the target data tag; acquire the storage address of the target user data from the target user data information, wherein the target user data is the user data indicated by the target user data information; and acquire the target user data according to the storage address and use the target user data as the user data to be analyzed.
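
The two-stage retrieval described above, index lookup followed by a fetch by storage address, may be sketched as follows. The in-memory `data_store` dictionary stands in for whatever storage system holds the user data, and the names `fetch_user_data`, `address`, and `number` are hypothetical.

```python
def fetch_user_data(index_table, package_id, target_tag, data_store):
    """Resolve matching index entries to the user data they point at."""
    # Stage 1: the index yields user data information, not the data itself.
    infos = index_table.get((target_tag, package_id), [])
    # Stage 2: follow each storage address to load the actual user data.
    return [data_store[info["address"]] for info in infos]


if __name__ == "__main__":
    index_table = {
        ("age_18_24", "pkg_1"): [{"number": "u1", "address": "store/u1"}],
    }
    data_store = {"store/u1": {"number": "u1", "city": "Shenzhen"}}
    print(fetch_user_data(index_table, "pkg_1", "age_18_24", data_store))
```

Keeping only addresses in the index keeps the index table small; the cost is one additional read per matched entry.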

In a specific implementation, the processor 701, the communication interface 702, and the memory 703 described in this embodiment may carry out the server-side implementation described for the data analysis method provided in the embodiments of the present invention, and may also carry out the implementation described for the data analysis device provided in the embodiments of the present invention; details are not repeated here.

In this embodiment of the present invention, the identifier of the number package to be analyzed and the target data tag are acquired in response to a data analysis request; the index table matching the target tag category corresponding to the target data tag is acquired; the user data to be analyzed that matches the identifier and the target data tag is acquired according to the index table; and the user data to be analyzed is analyzed to obtain the attribute features. User data can thereby be analyzed in real time, and the efficiency of user data analysis is effectively improved.

An embodiment of the present invention further provides a computer-readable storage medium storing instructions which, when run on a computer, cause the computer to execute the data analysis method described in the embodiments of the present invention.

An embodiment of the present invention further provides a computer program product containing instructions which, when run on a computer, cause the computer to execute the data analysis method described in the embodiments of the present invention.

It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of combined actions. Those skilled in the art should understand, however, that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by the present invention.

A person of ordinary skill in the art can understand that all or some of the steps in the methods of the above embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may include a flash drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.

What is disclosed above is merely some of the embodiments of the present invention and certainly cannot be used to limit the scope of rights of the present invention. Equivalent changes made according to the claims of the present invention therefore still fall within the scope covered by the present invention.

Claims (9)

1. A data analysis method, characterized in that the method comprises:
when a newly added number package is detected, acquiring the newly added number package;
acquiring a feature tag table generated from user data labeled with data tags of a target tag category, wherein the feature tag table includes a mapping relationship between data tags and user data information, and the user data information includes a number used to identify the user data information;
acquiring an identifier of the newly added number package and the numbers in the newly added number package;
adding a number package mark to the feature tag table, so as to mark the user data information in the feature tag table that meets a preset condition with the identifier of the newly added number package, wherein meeting the preset condition means that the number used to identify the user data information is a number in the newly added number package;
creating an index table corresponding to the target tag category according to the feature tag table to which the number package mark has been added;
when a data analysis request is received, acquiring an identifier of a number package to be analyzed carried in the data analysis request and a target data tag of the requested user data to be analyzed, wherein the newly added number package matches the number package to be analyzed;
acquiring the target tag category corresponding to the target data tag, and acquiring the index table corresponding to the target tag category, wherein the index table includes a mapping relationship among data tags, user data information, and number package identifiers;
acquiring, according to the index table, the user data to be analyzed that matches the identifier of the number package to be analyzed and the target data tag, wherein the user data to be analyzed is the user data indicated by the user data information in the index table;
analyzing the acquired user data to be analyzed to obtain attribute features corresponding to the numbers in the number package to be analyzed.

2. The method according to claim 1, characterized in that a number package includes at least one number, and each number in the number package represents a communication identifier of a user, the communication identifier being any one of a device identifier, an application account, and a routing address.

3. The method according to claim 1 or 2, characterized in that creating the index table according to the feature tag table to which the number package mark has been added comprises:
creating an inverted index table according to the feature tag table to which the number package mark has been added, wherein the inverted index table uses data tags and number package identifiers as attribute values and uses user data information as the data having those attribute values.

4. The method according to claim 1, characterized in that acquiring the target tag category corresponding to the target data tag comprises:
querying a preset tag dictionary for at least one tag category corresponding to the target data tag, wherein the target tag category is any one of the at least one tag category.

5. The method according to claim 1, characterized in that the user data information further includes a storage address of the user data indicated by the user data information, and acquiring, according to the index table, the user data to be analyzed that matches the identifier of the number package to be analyzed and the target data tag comprises:
acquiring, according to the index table, target user data information that matches the identifier of the number package to be analyzed and the target data tag;
acquiring a storage address of target user data from the target user data information, wherein the target user data is the user data indicated by the target user data information;
acquiring the target user data according to the storage address, and using the target user data as the user data to be analyzed.

6. A data analysis device, characterized in that the device comprises:
an acquisition unit, configured to: when a newly added number package is detected, acquire the newly added number package; and acquire a feature tag table generated from user data labeled with data tags of a target tag category, wherein the feature tag table includes a mapping relationship between data tags and user data information, and the user data information includes a number used to identify the user data information;
a processing unit, configured to: acquire an identifier of the newly added number package and the numbers in the newly added number package; add a number package mark to the feature tag table, so as to mark the user data information in the feature tag table that meets a preset condition with the identifier of the newly added number package, wherein meeting the preset condition means that the number used to identify the user data information is a number in the newly added number package; and create an index table corresponding to the target tag category according to the feature tag table to which the number package mark has been added;
the acquisition unit being further configured to acquire, when a transceiver unit receives a data analysis request, an identifier of a number package to be analyzed carried in the data analysis request and a target data tag of the requested user data to be analyzed, wherein the newly added number package matches the number package to be analyzed;
the acquisition unit being further configured to acquire the target tag category corresponding to the target data tag and to acquire the index table corresponding to the target tag category, wherein the index table includes a mapping relationship among data tags, user data information, and number package identifiers;
the processing unit being further configured to acquire, according to the index table, the user data to be analyzed that matches the identifier of the number package to be analyzed and the target data tag, wherein the user data to be analyzed is the user data indicated by the user data information in the index table;
the processing unit being further configured to analyze the acquired user data to be analyzed to obtain attribute features corresponding to the numbers in the number package to be analyzed.

7. A server, characterized by comprising a processor and a memory, wherein the memory stores executable program code, and the processor is configured to call the executable program code to execute the data analysis method according to any one of claims 1 to 5.

8. A computer storage medium, characterized in that the computer storage medium stores instructions which, when run on a computer, cause the computer to execute the data analysis method according to any one of claims 1 to 5.

9. A computer program product containing instructions, characterized in that, when run on a computer, the computer program product causes the computer to execute the data analysis method according to any one of claims 1 to 5.

CN201910958968.8A | 2019-10-10 | 2019-10-10 | Data analysis method, device, server and computer storage medium | Active | CN110737662B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201910958968.8A / CN110737662B (en) | 2019-10-10 | 2019-10-10 | Data analysis method, device, server and computer storage medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201910958968.8A / CN110737662B (en) | 2019-10-10 | 2019-10-10 | Data analysis method, device, server and computer storage medium

Publications (2)

Publication Number | Publication Date
CN110737662A (en) | 2020-01-31
CN110737662B (en) | 2024-06-18

Family

ID=69270051

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201910958968.8A, Active, CN110737662B (en) | Data analysis method, device, server and computer storage medium | 2019-10-10 | 2019-10-10

Country Status (1)

Country | Link
CN (1) | CN110737662B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN111427884A (en)*2020-03-032020-07-17中国平安人寿保险股份有限公司 Form data processing method, device, electronic device and storage medium
CN113421108B (en)*2021-05-112025-04-18北京沃东天骏信息技术有限公司 A method, device, equipment and storage medium for determining data relationship
CN117579456B (en)*2023-10-182024-10-29中移互联网有限公司Service message sending method and device, electronic equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN110020333A (en) * | 2017-07-27 | 2019-07-16 | 北京嘀嘀无限科技发展有限公司 | Data analysing method and device, electronic equipment, storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN106959965B (en) * | 2016-01-12 | 2021-02-05 | 腾讯科技(北京)有限公司 | Information processing method and server
CN106997357B (en) * | 2016-01-22 | 2020-10-09 | 腾讯科技(深圳)有限公司 | Message processing method, device and system
CN107918618B (en) * | 2016-10-10 | 2023-03-24 | 腾讯科技(北京)有限公司 | Data processing method and device

Also Published As

Publication number | Publication date
CN110737662A (en) | 2020-01-31

Similar Documents

Publication | Publication Date | Title
US9953639B2 (en) | | Voice recognition system and construction method thereof
WO2018024057A1 (en) | | Method and apparatus for accessing service
CN110221901A (en) | | Container asset creation method, apparatus, equipment and computer readable storage medium
US11308031B2 (en) | | Resolving in-memory foreign keys in transmitted data packets from single-parent hierarchies
CN107483522B (en) | | Service access method and device
CN109669980B (en) | | Cross-database access method and device for data
US20110302277A1 (en) | | Methods and apparatus for web-based migration of data in a multi-tenant database system
CN110737662B (en) | | Data analysis method, device, server and computer storage medium
CN108319661A (en) | | A kind of structured storage method and device of spare part information
US20140007038A1 (en) | | Social project management system and marketplace
CN113076729B (en) | | Method and system for importing report, readable storage medium and electronic equipment
CN112579898A (en) | | Enterprise information management method and device and server
CN108959294B (en) | | Method and device for accessing search engine
CN107491463B (en) | | Optimization method and system for data query
CN108154024B (en) | | Data retrieval method and device and electronic equipment
CN114820080A (en) | | User grouping method, system, device and medium based on crowd flow
CN117725077A (en) | | Identification search method, apparatus, computer device, storage medium, and program product
CN105786941B (en) | | Information mining method and device
WO2020024824A1 (en) | | Method and device for determining user status identifier
CN112231377B (en) | | Data mapping method, system, device, server and storage medium
CN110266596B (en) | | Message processing method, device, equipment and computer readable storage medium
CN110674383B (en) | | Public opinion query method, device and equipment
US20130262662A1 (en) | | Methods and systems for smart adapters in a social media content analytics environment
US20210141791A1 (en) | | Method and system for generating a hybrid data model
CN113449003B (en) | | Information query method, device, electronic equipment and medium

Legal Events

Date | Code | Title | Description
 | PB01 | Publication |
 | REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 40021043; Country of ref document: HK
 | SE01 | Entry into force of request for substantive examination |
 | GR01 | Patent grant |
 | TG01 | Patent term adjustment |
