CN115115386A - Data analysis method, apparatus, equipment and storage medium - Google Patents


Publication number
CN115115386A
Authority
CN
China
Prior art keywords
data
trigger
address
conversion
attribute
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110308220.0A
Other languages
Chinese (zh)
Inventor
林俊安
黄卓玥
傅含笑
康彪
张源良
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN202110308220.0A
Publication of CN115115386A
Current legal status: Pending


Abstract

Embodiments of the present disclosure relate to a data analysis method, apparatus, device, and storage medium. The method includes: acquiring conversion data generated when a preset operation corresponding to promotion information is completed; parsing the conversion data, and determining, based on the parsing result, a plurality of candidate trigger data matching the conversion data, where trigger data are data obtained when the promotion information is triggered on different information publishing platforms; determining a matching confidence between each candidate trigger data and the conversion data; and determining, based on the matching confidence of each candidate trigger data, target trigger data corresponding to the conversion data. With this technical solution, the matching confidence between each candidate trigger data and the conversion data is determined without introducing new data, and the target trigger data that best matches the conversion data is then selected, which improves the accuracy and recall of analyzing how the promotion information converts on each information publishing platform.

Description

Data analysis method, apparatus, equipment and storage medium

Technical Field

The present disclosure relates to the field of computer technology, and in particular to a data analysis method, apparatus, device, and storage medium.

Background

With the development of Internet technology, content delivery platforms have become increasingly diverse. To analyze delivery performance on each platform, the conversions brought by each platform's traffic must be analyzed, that is, it must be determined which delivery platform a conversion came from.

However, conversion analysis methods in the prior art usually produce results of low precision, and sometimes produce no result at all.

Summary

To solve, or at least partially solve, the above technical problems, the present disclosure provides a data analysis method, apparatus, device, and storage medium.

An embodiment of the present disclosure provides a data analysis method, including:

acquiring conversion data generated when a preset operation corresponding to promotion information is completed;

parsing the conversion data, and determining, based on the parsing result, a plurality of candidate trigger data matching the conversion data, where the candidate trigger data are data obtained when the promotion information is triggered on different information publishing platforms;

determining a matching confidence between each candidate trigger data and the conversion data; and

determining, based on the matching confidence corresponding to each candidate trigger data, target trigger data corresponding to the conversion data.

An embodiment of the present disclosure further provides a data analysis apparatus, including:

a conversion data acquisition module, configured to acquire conversion data generated when a preset operation corresponding to promotion information is completed;

a candidate trigger data determination module, configured to parse the conversion data and determine, based on the parsing result, a plurality of candidate trigger data matching the conversion data, where the candidate trigger data are data obtained when the promotion information is triggered on different information publishing platforms;

a matching confidence determination module, configured to determine the matching confidence between each candidate trigger data and the conversion data; and

a target trigger data determination module, configured to determine, based on the matching confidence corresponding to each candidate trigger data, target trigger data corresponding to the conversion data.

An embodiment of the present disclosure further provides an electronic device, including:

a processor and a memory;

where the processor is configured to execute the steps of the data analysis method in any embodiment of the present disclosure by invoking a program or instructions stored in the memory.

An embodiment of the present disclosure further provides a computer-readable storage medium storing a program or instructions that cause a computer to execute the steps of the data analysis method in any embodiment of the present disclosure.

In the data analysis solution provided by the embodiments of the present disclosure, conversion data generated when a preset operation corresponding to promotion information is completed are acquired; the conversion data are parsed, and a plurality of candidate trigger data matching the conversion data are determined based on the parsing result, where the candidate trigger data are data obtained when the promotion information is triggered on different information publishing platforms; the matching confidence between each candidate trigger data and the conversion data is determined; and target trigger data corresponding to the conversion data are determined based on these matching confidences. By evaluating the matching confidence between each candidate trigger data and the conversion data and then selecting the target trigger data that best matches the conversion data, the solution improves the accuracy of conversion analysis for promotion information; and by flexibly adjusting the confidence threshold used to screen candidate trigger data, it can also improve the recall of that analysis.

Brief Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure.

To describe the technical solutions in the embodiments of the present disclosure or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Evidently, a person of ordinary skill in the art can derive other drawings from these drawings without creative effort.

FIG. 1 is a schematic flowchart of a data analysis method according to an embodiment of the present disclosure;

FIG. 2 is a schematic flowchart of another data analysis method according to an embodiment of the present disclosure;

FIG. 3 is a schematic flowchart of yet another data analysis method according to an embodiment of the present disclosure;

FIG. 4 is a schematic structural diagram of a data analysis apparatus according to an embodiment of the present disclosure;

FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.

Detailed Description

To make the above objectives, features, and advantages of the present disclosure clearer, the solutions of the present disclosure are described in further detail below. It should be noted that, where no conflict arises, the embodiments of the present disclosure and the features in the embodiments may be combined with each other.

Many specific details are set forth in the following description to facilitate a full understanding of the present disclosure, but the present disclosure may also be implemented in ways other than those described herein. Evidently, the embodiments in this specification are only some, not all, of the embodiments of the present disclosure.

The data analysis method provided by the embodiments of the present disclosure is mainly applicable to analyzing the conversion performance of promotion information across different delivery channels (also called information publishing platforms), a task also known as channel attribution. It is especially suitable for channel attribution scenarios in which the trigger data generated when promotion information is triggered contain no uniquely identifying information such as a device ID or channel ID, for example short-link delivery. The method may be executed by a data analysis apparatus, which may be implemented in software and/or hardware and integrated into an electronic device with adequate computing capability, such as a mobile phone, tablet computer, laptop, desktop computer, or server.

FIG. 1 is a flowchart of a data analysis method according to an embodiment of the present disclosure. Referring to FIG. 1, the method includes the following steps:

S110: Acquire conversion data generated when a preset operation corresponding to the promotion information is completed.

Promotion information refers to information published and promoted through the various information publishing platforms, for example an application (app) or an e-commerce item. The preset operation is the operation that constitutes a conversion of the promotion information, for example cold-starting and registering an application, upgrading the usage level of an application, browsing, adding to cart and/or purchasing an item on an e-commerce platform, or clicking through to a promoted web page. Conversion data are the data generated by the device when the promotion information is converted; they include the user-authorized IP address at the time of conversion (the conversion IP address) and user-authorized device attribute information (such as device ID, operating system information, and device brand and model).

Specifically, to analyze the conversion performance of the promotion information on its different information publishing platforms, the data produced when the promotion information is converted must first be acquired, that is, the conversion data generated when the preset operation corresponding to the promotion information is completed.

S120: Parse the conversion data, and determine, based on the parsing result, a plurality of candidate trigger data matching the conversion data.

Trigger data are data obtained when the promotion information is triggered on different information publishing platforms, for example data generated when the promotion information on any platform is triggered by a click, voice control, eye-movement control, or similar means. Trigger data may include the user-authorized IP address of the device at the time of triggering (the trigger IP address) and a user-authorized User-Agent (UA) string. User-authorized device attribute information can be parsed from the UA string, but it may not include a device ID.

Specifically, the conversion data of the promotion information are parsed to obtain a parsing result, which is then coarsely matched against all corresponding trigger data; the trigger data that match the parsing result are taken as candidate trigger data. In some embodiments, the conversion data can be coarsely matched against each trigger data using an existing approach from the related art that performs delivery-platform conversion analysis based on the user-authorized IP address and user-authorized UA string, yielding the candidate trigger data. In some embodiments, the candidate trigger data corresponding to the promotion information may be obtained directly from the results of such an IP-and-UA-based conversion analysis.

S130: Determine the matching confidence between each candidate trigger data and the conversion data.

Matching confidence is the degree of reliability with which two data records are correctly matched. The higher the matching confidence, the more likely it is that the conversion data resulted from the corresponding candidate trigger data.

When neither the device ID nor the channel ID can be obtained from the trigger data, the related art uses one of two schemes to analyze the conversion of promotion information on each information publishing platform. The first performs conversion analysis based on the user-authorized IP address and user-authorized UA string; its results are of low precision, sometimes no target trigger data can be obtained at all, and because the resulting target trigger data are imprecise, the scheme cannot guard against data fraud. The second adds uniquely identifying information to both the trigger data and the conversion data so that they can be matched exactly; however, this introduces new data and is poorly compatible with services already in operation.

In view of this, the embodiments of the present disclosure compute the matching confidence between the conversion data and each candidate trigger data from the already-available user-authorized IP address and user-authorized UA string, without introducing new data. For example, one can analyze the role in attribution matching, and the direction of influence, of the user-authorized IP address and of each attribute dimension of the user-authorized device attribute information parsed from the UA string, and build a probabilistic model for confidence calculation based on Bayes' theorem. The constructed model is then used to calculate the matching confidence between the conversion data and each candidate trigger data.
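The text does not disclose the form of the Bayesian model. As a minimal sketch, assuming the attribute dimensions are conditionally independent (a naive Bayes simplification), the matching confidence can be computed as a posterior probability from per-dimension agreement likelihoods. The dimension names, the `LIKELIHOODS` table, the prior of 0.1, and the function `match_confidence` are hypothetical placeholders, not names or values from the source.

```python
from typing import Dict, Mapping, Tuple

# Hypothetical likelihoods per attribute dimension:
# (P(attribute agrees | true match), P(attribute agrees | not a match)).
# The numbers are illustrative placeholders only.
LIKELIHOODS: Dict[str, Tuple[float, float]] = {
    "ip": (0.95, 0.05),
    "brand": (0.98, 0.30),
    "model": (0.95, 0.05),
    "os_type": (0.99, 0.50),
    "os_version": (0.90, 0.20),
}


def match_confidence(agreement: Mapping[str, bool], prior: float = 0.1) -> float:
    """Posterior P(match | observed agreements) under a naive Bayes assumption.

    Dimensions absent from `agreement` (e.g. not recoverable from the UA
    string) are skipped instead of being counted as disagreements.
    """
    p_match, p_non = prior, 1.0 - prior
    for dim, agrees in agreement.items():
        if dim not in LIKELIHOODS:
            continue  # unknown dimension: no evidence either way
        p_agree_match, p_agree_non = LIKELIHOODS[dim]
        if agrees:
            p_match *= p_agree_match
            p_non *= p_agree_non
        else:
            p_match *= 1.0 - p_agree_match
            p_non *= 1.0 - p_agree_non
    # Bayes' theorem: normalize the two joint probabilities.
    return p_match / (p_match + p_non)
```

In a real system the likelihoods would have to be estimated from historically attributed data; skipping missing dimensions reflects that a UA string may not expose every attribute.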

S140: Determine, based on the matching confidence corresponding to each candidate trigger data, target trigger data corresponding to the conversion data.

Specifically, a final attribution result for the conversion data, namely the target trigger data, is determined from all candidate trigger data according to their respective matching confidences.

In some embodiments, the target trigger data may be determined by screening the candidate trigger data against a matching-confidence threshold. For example, a candidate trigger data whose confidence exceeds the threshold is determined as the target trigger data. If multiple candidates exceed the threshold, they can be screened further: for example, one of them may be selected at random as the target trigger data; alternatively, from the matching confidences of the candidates exceeding the threshold, the value corresponding to a statistic such as the maximum or the median is selected, and the candidate with that confidence is taken as the target trigger data. In this embodiment, the threshold can be adjusted flexibly according to business requirements. For example, if the business emphasizes recall of target trigger data and the candidate confidences for conversion data are generally low, the threshold can be lowered so that more conversion data are matched to target trigger data. Conversely, if the business emphasizes precision, the threshold can be raised.
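The threshold-based screening above can be sketched as follows. This is an illustration under assumptions: `select_by_threshold`, the `(id, confidence)` tuple layout, and the strategy names are hypothetical; "exceeds the threshold" is read as strictly greater; and `statistics.median_low` is used so that the median always corresponds to an actual candidate.

```python
import random
import statistics
from typing import List, Optional, Tuple

# (hypothetical trigger-record id, matching confidence)
Candidate = Tuple[str, float]


def select_by_threshold(candidates: List[Candidate], threshold: float,
                        strategy: str = "highest",
                        rng: Optional[random.Random] = None) -> Optional[str]:
    """Keep candidates whose confidence exceeds `threshold`, then pick one."""
    if strategy not in ("highest", "median", "random"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    above = [c for c in candidates if c[1] > threshold]
    if not above:
        return None  # the conversion stays unattributed at this threshold
    if strategy == "highest":
        return max(above, key=lambda c: c[1])[0]
    if strategy == "median":
        # median_low returns an actual element, so it maps back to a candidate
        med = statistics.median_low([c[1] for c in above])
        return next(c[0] for c in above if c[1] == med)
    return (rng or random.Random()).choice(above)[0]
```

Lowering `threshold` lets more conversions receive a target (higher recall); raising it keeps only well-supported matches (higher precision), which is the trade-off the text describes.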

In some embodiments, the candidate trigger data with the maximum matching confidence is determined as the target trigger data corresponding to the conversion data. In this embodiment, to balance precision and recall, the candidate trigger data corresponding to the highest of all matching confidences (the maximum matching confidence) is taken as the target trigger data.

In some embodiments, if several candidates share the maximum matching confidence, the candidate whose data generation time is closest to that of the conversion data is selected from among them and determined as the target trigger data corresponding to the conversion data. Because a user who triggers promotion information is most likely to convert soon afterwards, re-screening the tied candidates by how close their generation time is to that of the conversion data further improves the accuracy of the analysis. In a specific implementation, among the candidates with the maximum matching confidence, the one whose data generation time is closest to the generation time of the conversion data is selected as the target trigger data of the conversion data.
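Maximum-confidence selection with the time-proximity tie-break can be sketched as below, assuming each candidate carries its confidence and data generation time. `ScoredTrigger` and `pick_target` are hypothetical names, and detecting ties with `math.isclose` instead of exact float equality is a choice of this sketch, not of the source.

```python
import math
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class ScoredTrigger:
    trigger_id: str        # hypothetical identifier of a candidate trigger record
    confidence: float      # matching confidence against the conversion data
    created_at: datetime   # data generation time of the trigger record


def pick_target(cands: List[ScoredTrigger], conversion_time: datetime) -> Optional[str]:
    """Return the max-confidence candidate; break ties by closeness in time."""
    if not cands:
        return None
    best = max(c.confidence for c in cands)
    tied = [c for c in cands if math.isclose(c.confidence, best)]
    # Among the tied candidates, prefer the one generated closest to the conversion.
    closest = min(tied, key=lambda c: abs((conversion_time - c.created_at).total_seconds()))
    return closest.trigger_id
```

When only one candidate holds the maximum, the tie-break is a no-op and the function reduces to plain maximum-confidence selection.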

The above technical solution of the embodiments of the present disclosure acquires conversion data generated when a preset operation corresponding to promotion information is completed; parses the conversion data and determines, based on the parsing result, a plurality of candidate trigger data matching the conversion data, where the candidate trigger data are data obtained when the promotion information is triggered on different information publishing platforms; determines the matching confidence between each candidate trigger data and the conversion data; and determines, based on these matching confidences, the target trigger data corresponding to the conversion data. By evaluating the matching confidence between each candidate trigger data and the conversion data and selecting the target trigger data that best matches the conversion data, it improves the accuracy of conversion analysis for promotion information; and by flexibly adjusting the confidence threshold used to screen candidate trigger data, it can also improve the recall of that analysis.

FIG. 2 is a flowchart of another data analysis method according to an embodiment of the present disclosure. This embodiment further refines "determining, based on the parsing result, a plurality of candidate trigger data of the promotion information matching the conversion data". On this basis, "determining the matching confidence between each candidate trigger data and the conversion data" may be refined further, and steps for filtering trigger data may be added. Explanations of terms identical or corresponding to those in the foregoing embodiments are not repeated here. Referring to FIG. 2, the method includes the following steps:

S201: Determine, based on the parsing result, the user-authorized conversion IP address and user-authorized conversion device attribute information corresponding to the conversion data.

Conversion device attribute information refers to information about the device the user used to perform the preset operation. In some embodiments, it includes the device brand (e.g., the manufacturer), device model, operating system type (e.g., Android or iOS), and system version number.

Specifically, ETL attribute calculation is performed on the acquired conversion data: for example, fields are extracted and normalized by IP address, device brand, device model, operating system type, system version number, and so on, yielding the user-authorized conversion IP address and user-authorized conversion device attribute information contained in the conversion data for use in the subsequent initial matching.
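The ETL step is not specified beyond the fields it extracts. The sketch below assumes a flat dictionary input with hypothetical keys (`ip`, `brand`, `model`, `os`, `os_version`) and illustrative normalization rules: lowercasing, canonical IP text via Python's `ipaddress` module, folding Apple OS spellings to `ios`, and replacing `_` with `.` in version strings.

```python
import ipaddress
from typing import Dict, Optional


def _clean(value: Optional[str]) -> Optional[str]:
    """Trim and lowercase; map empty or missing values to None."""
    value = (value or "").strip().lower()
    return value or None


def normalize_conversion(raw: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Extract and normalize the fields used for initial matching (hypothetical rules)."""
    ip = _clean(raw.get("ip"))
    if ip is not None:
        # Canonical text form; raises ValueError on a malformed address.
        ip = str(ipaddress.ip_address(ip))
    os_type = _clean(raw.get("os"))
    if os_type in ("iphone os", "ipados"):
        os_type = "ios"  # fold Apple OS spellings into one value
    version = _clean(raw.get("os_version"))
    if version is not None:
        version = version.replace("_", ".")  # "14_4" -> "14.4"
    return {
        "ip": ip,
        "brand": _clean(raw.get("brand")),
        "model": _clean(raw.get("model")),
        "os_type": os_type,
        "os_version": version,
    }
```

Normalizing both conversion and trigger records with the same rules is what makes a later field-by-field comparison meaningful.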

S202: Acquire the trigger data generated when the promotion information is triggered on each information publishing platform, and parse each trigger data to obtain its user-authorized trigger device attribute information and user-authorized trigger IP address.

Trigger device attribute information refers to information about the device the user used to perform the triggering action. What it contains depends on the device information that can be obtained when the trigger data are collected. In some embodiments, it includes attribute information in at least one attribute dimension among device brand, device model, operating system type, and system version number.

Specifically, within the statistical attribution time window, the trigger data generated when the promotion information is triggered are acquired from each information publishing platform on which it is published. ETL attribute calculation is then performed on each trigger data: for example, the user-authorized trigger IP address is extracted, and the user-authorized UA string is parsed to obtain whichever of the four attribute dimensions (device brand, device model, operating system type, and system version number) carry actual data. These attributes are normalized, yielding the user-authorized trigger IP address and user-authorized trigger device attribute information of each trigger data for use in the subsequent initial matching.
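UA parsing can be sketched with regular expressions for the two common mobile UA shapes. This is a heuristic under assumptions, not the parser used in the source: `parse_ua` and both patterns are hypothetical, Android UAs are assumed to expose a model but not a brand (so `brand` stays `None`), and iOS UAs are assumed to imply the brand `apple`. Dimensions that cannot be recovered remain `None`, in line with keeping only the dimensions that carry actual data.

```python
import re
from typing import Dict, Optional

# e.g. "(Linux; Android 10; ELS-AN00 Build/HUAWEIELS-AN00; wv)" -> version, model
_ANDROID = re.compile(r"Android\s+([\d.]+);\s*([^;)]+?)(?:\s+Build/[^;)]*)?[;)]")
# e.g. "(iPhone; CPU iPhone OS 14_4 like Mac OS X)" -> device family, version
_IOS = re.compile(r"\((iPhone|iPad)[^)]*?OS\s+([\d_]+)")


def parse_ua(ua: str) -> Dict[str, Optional[str]]:
    """Heuristically extract device attributes from a mobile User-Agent string."""
    attrs: Dict[str, Optional[str]] = {
        "brand": None, "model": None, "os_type": None, "os_version": None}
    m = _ANDROID.search(ua)
    if m:
        attrs.update(os_type="android", os_version=m.group(1),
                     model=m.group(2).strip().lower())
        return attrs
    m = _IOS.search(ua)
    if m:
        attrs.update(brand="apple", model=m.group(1).lower(), os_type="ios",
                     os_version=m.group(2).replace("_", "."))
    return attrs
```

Real-world UA strings vary widely (locale tokens, in-app browser suffixes), so a production system would more likely rely on a maintained UA database than on two regular expressions.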

Note that the acquired trigger data are stored in Redis as hashes (HashMap), so that all potentially matching trigger data can be retrieved in O(1) time, further improving attribution efficiency.
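The text states only that trigger data are kept in Redis hashes for O(1) retrieval; the key layout is not given. The sketch below assumes one hash per trigger IP and uses a plain Python dictionary as an in-memory stand-in for Redis. The class `TriggerIndex` and the `trigger:<ip>` key naming in the comments are hypothetical; `HSET` and `HGETALL` are the real Redis hash commands that would play the corresponding roles.

```python
from collections import defaultdict
from typing import Dict, List


class TriggerIndex:
    """In-memory stand-in for one Redis hash per trigger IP (hypothetical layout)."""

    def __init__(self) -> None:
        self._by_ip: Dict[str, Dict[str, dict]] = defaultdict(dict)

    def add(self, trigger_id: str, record: dict) -> None:
        # Roughly: HSET trigger:<ip> <trigger_id> <serialized record>
        self._by_ip[record["ip"]][trigger_id] = record

    def lookup(self, ip: str) -> List[dict]:
        # Roughly: HGETALL trigger:<ip>. A single hash lookup (average O(1))
        # locates every trigger that shares this IP; .get avoids creating
        # empty buckets for unseen IPs.
        bucket = self._by_ip.get(ip)
        return list(bucket.values()) if bucket else []
```

Keying by IP fits the matching step, since a trigger can only become a candidate for a conversion that shares its IP; returning the bucket still costs time proportional to its size.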

S203: Determine whether the user-authorized trigger IP address belongs to the abnormal IP category or the oversized public IP category.

The oversized public IP category consists of IP addresses shared by more users than a set threshold, which can be chosen according to actual business conditions.

Specifically, because many users share the same IP address in the oversized public IP category, such addresses easily lead to attribution errors; to further improve attribution accuracy, the embodiments of the present disclosure therefore discard trigger data belonging to that category. Trigger data belonging to the abnormal IP category are likewise discarded, since attributing conversions to abnormal IPs has no practical meaning. In a specific implementation, the IP category of each user-authorized trigger IP address is identified to determine whether it belongs to the abnormal IP category or the oversized public IP category. If it belongs to neither, S205 is executed; if it belongs to either, S204 is executed.

In some embodiments, before determining whether the user-authorized trigger IP address belongs to the abnormal IP category or the oversized public IP category, the method further includes: collecting a plurality of user-authorized IP addresses and statistically classifying them to obtain the abnormal IP category, which contains the abnormal IP addresses, and the oversized public IP category, which contains the oversized public IP addresses. In this embodiment, user-authorized IP addresses are collected in advance; for example, on a big-data processing platform, Hive is used to analyze log data stored in Hadoop on a daily schedule to obtain user-authorized IP addresses across the whole network. An IP field classification operation is then performed on these addresses: all abnormal IP addresses are assigned to the abnormal IP category, and all oversized public IP addresses to the oversized public IP category.
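The source runs this classification as a daily Hive job over Hadoop logs; the in-memory sketch below shows only the classification logic, assuming log rows of `(ip, user_id)`. The text does not define "abnormal", so this sketch approximates it with unparseable, private, loopback, reserved, multicast, or unspecified addresses using Python's `ipaddress` module. `classify_ips` and `max_users` are hypothetical names.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def classify_ips(log_rows: Iterable[Tuple[str, str]],
                 max_users: int) -> Tuple[Set[str], Set[str]]:
    """Return (abnormal_ips, oversized_public_ips) from (ip, user_id) rows."""
    users_per_ip: Dict[str, Set[str]] = defaultdict(set)
    abnormal: Set[str] = set()
    for ip, user_id in log_rows:
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            abnormal.add(ip)  # unparseable address (assumed abnormal)
            continue
        if (addr.is_private or addr.is_loopback or addr.is_reserved
                or addr.is_multicast or addr.is_unspecified):
            abnormal.add(ip)  # not a routable public address (assumed abnormal)
            continue
        users_per_ip[ip].add(user_id)
    # Oversized public IP: shared by more distinct users than the threshold.
    oversized = {ip for ip, users in users_per_ip.items() if len(users) > max_users}
    return abnormal, oversized
```

Counting distinct users rather than raw rows keeps a single very active user from pushing an IP into the oversized category.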

Correspondingly, in this embodiment S203 may be implemented as follows: the user-authorized triggering IP address is matched against each abnormal IP address and each oversized public IP address. If it matches an abnormal IP address, it is determined to belong to the abnormal IP category; if it matches an oversized public IP address, it is determined to belong to the oversized public IP category. If neither match succeeds, the address belongs to neither category. Because the check reduces to a lookup against precomputed address lists, this improves the efficiency of identifying user-authorized triggering IP addresses and, in turn, of filtering trigger data.
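The IP classification and filtering described above can be sketched in Python as follows. This is a minimal illustration, not the disclosed implementation: the disclosure does not define what makes an IP address "abnormal", so `is_abnormal` assumes unparseable, private, loopback, reserved, unspecified, or multicast addresses; the threshold `MAX_USERS_PER_IP` and the record shapes are hypothetical. The disclosure runs the statistics with Hive over Hadoop logs; here an in-memory list of `(ip, user_id)` pairs stands in for those logs.

```python
import ipaddress
from collections import defaultdict

# Hypothetical threshold; the disclosure only speaks of "a set threshold".
MAX_USERS_PER_IP = 50


def is_abnormal(ip):
    """Assumed definition of an abnormal IP: unparseable or not publicly routable."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return True
    return (addr.is_private or addr.is_loopback or addr.is_reserved
            or addr.is_unspecified or addr.is_multicast)


def classify_ips(log_pairs, max_users=MAX_USERS_PER_IP):
    """Build the abnormal and oversized public IP sets from (ip, user_id) log pairs."""
    users_per_ip = defaultdict(set)
    for ip, user_id in log_pairs:
        users_per_ip[ip].add(user_id)
    abnormal = {ip for ip in users_per_ip if is_abnormal(ip)}
    # An IP is "oversized" when more distinct users share it than the threshold allows.
    oversized = {ip for ip, users in users_per_ip.items()
                 if ip not in abnormal and len(users) > max_users}
    return abnormal, oversized


def filter_triggers(triggers, abnormal, oversized):
    """Drop trigger records whose IP falls into either category (O(1) set lookups)."""
    return [t for t in triggers
            if t["ip"] not in abnormal and t["ip"] not in oversized]
```

Holding both categories as hash sets is one way to obtain the efficiency the text attributes to matching against precomputed address lists.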

S204. If so, remove the trigger data corresponding to that user-authorized triggering IP address.

Specifically, trigger data whose user-authorized triggering IP address belongs to the abnormal IP category or the oversized public IP category are removed from all collected trigger data, and the remaining trigger data take part in the subsequent initial matching.

S205. Match the conversion data against each trigger data record based on the user-authorized conversion IP address, the user-authorized triggering IP address of each record, the user-authorized conversion device attribute information, and the user-authorized triggering device attribute information of each record, to obtain the candidate trigger data matching the conversion data.

Specifically, for the conversion data and any given trigger data record, the user-authorized conversion IP address is matched against the user-authorized triggering IP address, and at the same time the user-authorized conversion device attribute information is matched against the user-authorized triggering device attribute information. Whether the trigger data and the conversion data meet the initial matching requirement is determined by how closely these two kinds of data agree. If the requirement is met, the trigger data record is taken as candidate trigger data.

It should be understood that no initially matched candidate trigger data record conflicts with the conversion data. For example, a candidate trigger record may lack attribute information in some attribute dimensions, but the attribute information it does contain in the other dimensions is identical to the conversion data's attribute information in the corresponding dimensions.
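A minimal sketch of the initial matching in S205, assuming that "meeting the initial matching requirement" means the IP addresses are equal and no attribute dimension present in both records carries different values (the no-conflict property stated above). The field names `ip` and `attrs` and the dimension keys are hypothetical.

```python
ATTR_DIMS = ("brand", "model", "os_type", "os_version")


def no_conflict(conv_attrs, trig_attrs):
    """True if no dimension has a value in both records with different values."""
    for dim in ATTR_DIMS:
        a, b = conv_attrs.get(dim), trig_attrs.get(dim)
        if a is not None and b is not None and a != b:
            return False
    return True


def initial_match(conversion, triggers):
    """Keep trigger records with the same IP and no conflicting device attribute."""
    return [t for t in triggers
            if t["ip"] == conversion["ip"]
            and no_conflict(conversion["attrs"], t["attrs"])]
```

A record missing some dimensions still passes, which mirrors the example in the paragraph above.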

S206. Parse each candidate trigger data record to obtain its user-authorized triggering device attribute information and user-authorized triggering IP address.

The device attribute information includes attribute information in multiple attribute dimensions.

Specifically, the user-authorized triggering device attribute information and user-authorized triggering IP address of each candidate trigger data record can be obtained through the operation of S202.

S207. Based on the user-authorized conversion device attribute information in the parsing result of the conversion data and the user-authorized triggering device attribute information of each candidate, determine at least one attribute combination whose attribute information is identical in the same attribute dimensions.

An attribute combination is a combination of attributes in at least two dimensions.

Specifically, if a trigger data record is in fact the one to which the conversion should be attributed, the user-authorized IP addresses and the user-authorized device attribute information of the two records should be highly consistent. On this basis, the embodiments of the present disclosure calculate the matching confidence per combination of IP address and device attribute combination.

As described above, both the user-authorized conversion device attribute information and the user-authorized triggering device attribute information cover four attribute dimensions: device brand, device model, operating system type, and system version number. This operation therefore combines, for the conversion data and each candidate, the attributes whose values are identical in the same dimension, yielding the attribute combinations.

For example, let 1 indicate that a dimension has an attribute value and that the value is consistent, and let 0 indicate that the dimension has no attribute value. Suppose the user-authorized conversion device attribute information in the conversion data is encoded as (1,1,0,1) and, under the same IP address, there are five candidate trigger data records whose user-authorized triggering device attribute information is encoded as (1,1,0,1), (1,1,0,1), (1,0,0,1), (1,0,1,1), and (0,1,1,1). The resulting attribute combinations are then (1,1,0,1), (1,0,0,1), and (0,1,0,1). A dimension contributes to a combination only if both records have a value there; for instance, (1,0,1,1) yields (1,0,0,1) because the conversion data has no value in the third dimension.
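The 0/1 encoding in this example amounts to keeping, for each candidate, the dimensions where both records have a value and the values agree. A minimal Python sketch follows; the dimension names and dictionary records are hypothetical, and the helper `as_mask` exists only to reproduce the 0/1 notation.

```python
ATTR_DIMS = ("brand", "model", "os_type", "os_version")


def attribute_combination(conv_attrs, trig_attrs):
    """Dimensions where both records have a value and the values are equal.

    Returns None when fewer than two dimensions agree, since an attribute
    combination spans at least two dimensions.
    """
    combo = tuple(dim for dim in ATTR_DIMS
                  if conv_attrs.get(dim) is not None
                  and conv_attrs.get(dim) == trig_attrs.get(dim))
    return combo if len(combo) >= 2 else None


def as_mask(combo):
    """Encode a combination in the 1/0 notation used in the text."""
    return tuple(1 if dim in combo else 0 for dim in ATTR_DIMS)


# The example above: the conversion has brand, model and version, but no OS type.
conv = {"brand": "A", "model": "X", "os_version": "11"}
cands = [
    {"brand": "A", "model": "X", "os_version": "11"},
    {"brand": "A", "model": "X", "os_version": "11"},
    {"brand": "A", "os_version": "11"},
    {"brand": "A", "os_type": "android", "os_version": "11"},
    {"model": "X", "os_type": "android", "os_version": "11"},
]
masks = [as_mask(attribute_combination(conv, c)) for c in cands]
# masks == [(1, 1, 0, 1), (1, 1, 0, 1), (1, 0, 0, 1), (1, 0, 0, 1), (0, 1, 0, 1)]
```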

S208. Determine the distribution probability value of each attribute combination.

The distribution probability value expresses how rare an attribute combination is within a given statistical range: the fewer times a combination occurs in that range, the rarer it is and the smaller its distribution probability value.

Specifically, the rarer an attribute combination is within the attribution statistics range, the more likely it is that a candidate trigger record containing that combination is the target trigger data of a conversion record containing the same combination. The embodiments of the present disclosure therefore use the distribution probability value of each attribute combination as the base data for calculating the matching confidence. The distribution probability value of each combination within the attribution statistics range can be obtained either by acquiring data and computing statistics in real time, or by computing and storing the values in advance and then looking them up directly by attribute combination.

S209. For any user-authorized triggering IP address and any attribute combination contained in the candidate trigger data, determine, based on the distribution probability value of that attribute combination, the matching confidence between the conversion data and the candidate trigger data corresponding to that combination under that triggering IP address.

Specifically, for any user-authorized triggering IP address and any attribute combination, the distribution probability value of the combination serves as the base data. Following Bayesian reasoning and the principle that a rarer combination yields a higher matching confidence, the matching confidence between the conversion data and the candidate trigger data corresponding to that combination under that triggering IP address is calculated.

S210. Based on the matching confidence of each candidate trigger data record, determine the target trigger data corresponding to the conversion data.

In the above technical solution of the embodiments of the present disclosure, it is determined whether the user-authorized triggering IP address in each trigger data record belongs to the abnormal IP category or the oversized public IP category, and if so, the corresponding trigger data is removed. This filters trigger data by IP address in advance, reduces the volume of data in the attribution process, and keeps trigger data prone to large attribution errors out of it, further improving the accuracy of subsequent attribution.
The user-authorized conversion IP address and conversion device attribute information are determined from the parsing result of the conversion data; the trigger data generated when the promotion information is triggered on each information release platform are obtained and parsed to yield each record's user-authorized triggering device attribute information and triggering IP address; and the conversion data is matched against each trigger data record on these four items to obtain the matching candidate trigger data. This provides a preliminary screening of the trigger data and a data basis for subsequent precise attribution. At least one attribute combination with identical attribute information in the same dimensions is then determined from the conversion and triggering device attribute information; the distribution probability value of each combination is determined; and for each user-authorized triggering IP address and attribute combination in the candidate trigger data, the matching confidence between the corresponding candidates and the conversion data is determined from that probability value.
Using the pair of user-authorized triggering IP address and attribute combination as the unit of confidence calculation, and basing the calculation on a distribution probability value that reflects how rare the combination is, improves the accuracy of the matching confidence and thus further improves attribution accuracy.

FIG. 3 is a flowchart of yet another data analysis method provided by an embodiment of the present disclosure. It further refines the step "determining the distribution probability value of each attribute combination" and, on that basis, can also refine the step "for any user-authorized triggering IP address and any attribute combination contained in the candidate trigger data, determining, based on the distribution probability value of the attribute combination, the matching confidence between the corresponding candidate trigger data and the conversion data". Terms identical or corresponding to those in the above embodiments are not explained again. Referring to FIG. 3, the data analysis method includes:

S310. Acquire the conversion data generated when a preset operation corresponding to the promotion information is completed.

S320. Parse the conversion data, and determine multiple candidate trigger data matching the conversion data based on the parsing result.

S330. Parse each candidate trigger data record to obtain its user-authorized triggering device attribute information and user-authorized triggering IP address.

S340. Based on the user-authorized conversion device attribute information in the parsing result and the user-authorized triggering device attribute information of each candidate, determine at least one attribute combination whose attribute information is identical in the same attribute dimensions.

S350. Using the attribute combination as the key, query the attribute probability dictionary to determine the distribution probability value of that combination.

The attribute probability dictionary stores the distribution probability value of each attribute combination. To make queries efficient, it is stored as a dictionary, a structure with fast reads and writes.

Specifically, each determined attribute combination is used as the index to query the attribute probability dictionary, and the query result is the distribution probability value of that combination.

In some embodiments, the attribute probability dictionary is obtained in advance by computing statistics over device attribute information authorized by multiple users. It is built as follows: for any attribute combination, count the device attribute records, among those authorized by multiple users, that are consistent with the combination, and take the ratio of this count to the total number of such records as the distribution probability value of the combination. Specifically, the attribute combinations Attr are determined from the pre-collected user-authorized device attribute information; the number of occurrences N_Attr of each combination Attr and the total number of occurrences N of all combinations are counted; and their ratio is taken as the distribution probability value of the combination, that is, P(Attr) = N_Attr / N.
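One way to build and query the attribute probability dictionary is sketched below. The disclosure does not fix how a combination is keyed; this sketch assumes the key is the tuple of (dimension, value) pairs, so that rarity is measured per concrete set of values, and that every subset of at least two present dimensions in a record counts as one occurrence of that combination. N is taken as the number of collected records, following the "total number of device attribute information" wording. All names are hypothetical.

```python
from collections import Counter
from itertools import combinations

ATTR_DIMS = ("brand", "model", "os_type", "os_version")


def combo_key(attrs, dims):
    """Dictionary key for a combination: (dimension, value) pairs in ATTR_DIMS order."""
    return tuple((dim, attrs[dim]) for dim in dims)


def build_probability_dict(device_records):
    """P(Attr) = N_Attr / N over pre-collected device attribute records."""
    counts = Counter()
    for attrs in device_records:
        present = [d for d in ATTR_DIMS if attrs.get(d) is not None]
        # Every combination of at least two present dimensions occurs once in this record.
        for size in range(2, len(present) + 1):
            for dims in combinations(present, size):
                counts[combo_key(attrs, dims)] += 1
    total = len(device_records)
    return {key: n / total for key, n in counts.items()}


def lookup_probability(prob_dict, attrs, dims):
    """S350: query by combination; returns None for a combination never observed."""
    return prob_dict.get(combo_key(attrs, dims))
```

Precomputing the dictionary offline keeps the online step to a single hash lookup per combination; how to treat a combination that was never observed (here `None`) is left to the caller.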

S360. For any user-authorized triggering IP address and any attribute combination, determine the number of candidate trigger data records corresponding to that triggering IP address (the data count), and, based on the distribution probability value of the attribute combination, determine the initial probability values that different numbers of those candidate records (the combination count) match the conversion data.

The combination count is a natural number in the range [1, data count].

Specifically, for any user-authorized triggering IP address and any attribute combination, the candidate trigger data records containing both that triggering IP address and that combination are identified among all candidates, and their number n (the data count) is counted. Then, according to formula (1), with the distribution probability value P(Attr) of the combination as the base probability, the probability that i of these n candidate records match the conversion data is determined for each i:

[Formula (1): equation image not reproduced in the text]

Here i ranges over [1, n] and denotes the combination counts from 1 to n; P(Attr, ip, i) is the initial probability value for combination count i under the user-authorized triggering IP address ip and the attribute combination Attr.

Formula (1) yields all initial probability values for combination counts 1 to n under that triggering IP address and attribute combination: P(Attr, ip, 1), P(Attr, ip, 2), ..., P(Attr, ip, n).

S370. For any user-authorized triggering IP address and any attribute combination, determine, based on the corresponding data count and initial probability values, the matching confidence between the conversion data and the candidate trigger data corresponding to that combination under that triggering IP address.

Specifically, for any user-authorized triggering IP address and any attribute combination Attr, the matching confidence P(Attr, ip) between the conversion data and the candidate trigger data corresponding to that IP address and Attr is calculated by formula (2) from the initial probability values for all combination counts:

[Formula (2): equation image not reproduced in the text]

It follows from formula (2) that if multiple candidate trigger data records correspond to the same user-authorized triggering IP address and attribute combination Attr, they all have the same matching confidence.

Continue the earlier example: the user-authorized conversion device attribute information is (1,1,0,1); under the same user-authorized triggering IP address, the five candidates' triggering device attribute information is (1,1,0,1), (1,1,0,1), (1,0,0,1), (1,0,1,1), and (0,1,1,1); and the resulting attribute combinations are (1,1,0,1), (1,0,0,1), and (0,1,0,1). The matching confidences of the five candidates are then P1, P1, P2, P2, and P3 respectively, and P1, P2, and P3 differ provided the underlying initial probability values differ.
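The grouping implied by S360 and S370 can be sketched as follows: candidates are bucketed by (triggering IP address, attribute combination), the bucket size gives the data count n, and every member of a bucket receives the same confidence, which is why the example yields P1, P1, P2, P2, P3. Formulas (1) and (2) are available only as images, so this sketch does not reproduce them; `confidence_fn(p_attr, n)` is a hypothetical placeholder into which an implementation of those formulas would be plugged, and the toy function and probability values below are made up for illustration. For brevity, combinations are keyed by the 0/1 masks from the example.

```python
from collections import defaultdict


def assign_confidences(candidates, prob_of, confidence_fn):
    """Give every candidate in the same (ip, combination) bucket the same confidence.

    candidates: dicts with hypothetical keys "id", "ip" and "combo".
    prob_of: maps a combination to its distribution probability value P(Attr).
    confidence_fn: placeholder for formulas (1) and (2), called as f(P(Attr), n).
    """
    buckets = defaultdict(list)
    for cand in candidates:
        buckets[(cand["ip"], cand["combo"])].append(cand["id"])
    confidences = {}
    for (_ip, combo), ids in buckets.items():
        conf = confidence_fn(prob_of(combo), len(ids))  # n = data count of the bucket
        for cid in ids:
            confidences[cid] = conf
    return confidences


def toy_confidence(p_attr, n):
    """NOT formula (1)/(2): a stand-in that merely rewards rarer combinations."""
    return (1 - p_attr) / n


# Example from the text: five candidates under one IP, three combinations.
cands = [
    {"id": 1, "ip": "1.2.3.4", "combo": (1, 1, 0, 1)},
    {"id": 2, "ip": "1.2.3.4", "combo": (1, 1, 0, 1)},
    {"id": 3, "ip": "1.2.3.4", "combo": (1, 0, 0, 1)},
    {"id": 4, "ip": "1.2.3.4", "combo": (1, 0, 0, 1)},
    {"id": 5, "ip": "1.2.3.4", "combo": (0, 1, 0, 1)},
]
probs = {(1, 1, 0, 1): 0.1, (1, 0, 0, 1): 0.3, (0, 1, 0, 1): 0.2}  # made-up values
conf = assign_confidences(cands, probs.get, toy_confidence)
```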

S380. Based on the matching confidence of each candidate trigger data record, determine the target trigger data corresponding to the conversion data.

In the above technical solution of the embodiments of the present disclosure, querying the attribute probability dictionary by attribute combination to obtain its distribution probability value makes determining that value more efficient, further improving channel attribution efficiency. For any user-authorized triggering IP address and attribute combination, the data count of the corresponding candidate trigger data is determined; the initial probability values that different combination counts of candidates match the conversion data are determined from the combination's distribution probability value; and the matching confidence between the corresponding candidates and the conversion data is then determined from the data count and the initial probability values. Computing the matching confidence per pair of user-authorized triggering IP address and attribute combination further improves its reliability, which makes conversion analysis results more stable and reliable on short-link placement and similar promotion information platforms, and in turn broadens the range of platforms on which promotion information can be placed.

FIG. 4 is a schematic structural diagram of a data analysis apparatus provided by an embodiment of the present disclosure. The apparatus may be implemented in software and/or hardware, can generally be integrated into an electronic device, and matches conversion data to more accurate target trigger data by executing the data analysis method. As shown in FIG. 4, the apparatus includes:

A conversion data acquisition module 410, configured to acquire the conversion data generated when a preset operation corresponding to the promotion information is completed;

A candidate trigger data determination module 420, configured to parse the conversion data and determine, based on the parsing result, multiple candidate trigger data matching the conversion data, where the candidate trigger data are data obtained when the promotion information is triggered on different information release platforms;

A matching confidence determination module 430, configured to determine the matching confidence between each candidate trigger data record and the conversion data;

A target trigger data determination module 440, configured to determine the target trigger data corresponding to the conversion data based on the matching confidence of each candidate trigger data record.

In some embodiments, the matching confidence determination module 430 includes:

A candidate trigger data parsing submodule, configured to parse each candidate trigger data record to obtain its user-authorized triggering device attribute information and user-authorized triggering IP address, where the device attribute information includes attribute information in multiple attribute dimensions;

An attribute combination determination submodule, configured to determine, based on the user-authorized conversion device attribute information in the parsing result and the user-authorized triggering device attribute information of each candidate, at least one attribute combination whose attribute information is identical in the same attribute dimensions;

A distribution probability value determination submodule, configured to determine the distribution probability value of each attribute combination;

A matching confidence determination submodule, configured to determine, for any user-authorized triggering IP address and any attribute combination contained in the candidate trigger data, the matching confidence between the conversion data and the candidate trigger data corresponding to that combination under that triggering IP address, based on the combination's distribution probability value.

In some embodiments, the user-authorized conversion device attribute information includes the device brand, device model, operating system type, and system version number; the user-authorized triggering device attribute information includes at least one of the device brand, device model, operating system type, and system version number.

In some embodiments, the distribution probability value determination submodule is specifically configured to:

Query the attribute probability dictionary by attribute combination to determine the combination's distribution probability value, where the attribute probability dictionary is obtained in advance by computing statistics over device attribute information authorized by multiple users.

In some embodiments, the apparatus further includes an attribute probability dictionary establishment module, configured to build the attribute probability dictionary in advance as follows:

For any attribute combination, count the device attribute records, among those authorized by multiple users, that are consistent with the combination, and take the ratio of this count to the total number of such records as the distribution probability value of the combination.

In some embodiments, the matching confidence determination submodule is specifically configured to:

For any user-authorized triggering IP address and any attribute combination, determine the data count of the candidate trigger data corresponding to that triggering IP address, and determine, based on the combination's distribution probability value, the initial probability values that different combination counts of candidate trigger data match the conversion data, where the combination count is a natural number in the range [1, data count];

For any user-authorized triggering IP address and any attribute combination, determine, based on the corresponding data count and initial probability values, the matching confidence between the conversion data and the candidate trigger data corresponding to that combination under that triggering IP address.

In some embodiments, the target trigger data determination module 440 is specifically configured to:

Determine the candidate trigger data with the highest matching confidence as the target trigger data corresponding to the conversion data.

In some embodiments, the target trigger data determination module 440 is specifically configured to:

If several candidates share the highest matching confidence, select among them the candidate whose data generation time is closest to the data generation time of the conversion data, and determine it as the target trigger data corresponding to the conversion data.
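The selection rule of the target trigger data determination module 440 can be sketched as below: take the highest matching confidence and, if several candidates share it, keep the one whose generation time is closest to the conversion's. The record fields are hypothetical. Exact float equality is used to detect ties, which suffices when tied values come from the same (IP address, combination) bucket; an implementation might prefer a tolerance.

```python
from datetime import datetime


def select_target(candidates, confidences, conversion_time):
    """Return the candidate with maximum confidence, tie-broken by time proximity."""
    if not candidates:
        return None
    best = max(confidences[c["id"]] for c in candidates)
    tied = [c for c in candidates if confidences[c["id"]] == best]
    # Among equally confident candidates, prefer the one generated closest to the conversion.
    return min(tied, key=lambda c: abs((c["time"] - conversion_time).total_seconds()))


conversion_time = datetime(2021, 3, 1, 12, 0)
candidates = [
    {"id": 1, "time": datetime(2021, 3, 1, 9, 0)},
    {"id": 2, "time": datetime(2021, 3, 1, 11, 30)},
    {"id": 3, "time": datetime(2021, 3, 1, 11, 59)},
]
target = select_target(candidates, {1: 0.8, 2: 0.8, 3: 0.5}, conversion_time)
# Ids 1 and 2 tie at 0.8; id 2 is 30 minutes from the conversion, so it is chosen.
```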

In some embodiments, the candidate trigger data determination module 420 includes:

A conversion data parsing submodule, configured to determine, based on the parsing result, the user-authorized conversion IP address and the user-authorized conversion device attribute information corresponding to the conversion data;

A trigger data parsing submodule, configured to obtain the trigger data generated when the promotion information on each information release platform is triggered, and to parse each trigger data record to obtain its user-authorized triggering device attribute information and user-authorized triggering IP address;

A candidate trigger data acquisition submodule, configured to match the conversion data against each trigger data record based on the user-authorized conversion IP address, the user-authorized triggering IP address of each record, the user-authorized conversion device attribute information, and the user-authorized triggering device attribute information of each record, to obtain the candidate trigger data matching the conversion data.

In some embodiments, the candidate trigger data acquisition module 420 further includes a trigger data filtering submodule, configured to:

After each piece of trigger data has been parsed to obtain the corresponding user-authorized trigger device attribute information and user-authorized trigger IP address, determine whether the user-authorized trigger IP address belongs to the abnormal IP category or the oversized public IP category, where the oversized public IP category is the category of IP addresses shared by more users than a set threshold;

If so, remove the trigger data corresponding to that user-authorized trigger IP address.
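The filtering step, together with the category matching it relies on, might look like the following Python sketch. The stored categories are described as containing "IP address fields". This sketch therefore assumes that each category entry may be a single address or a CIDR range, parsed with the standard `ipaddress` module. The function names and the `(trigger_id, ip)` tuple layout are hypothetical.

```python
import ipaddress
from typing import Iterable, List, Sequence, Tuple


def to_networks(entries: Iterable[str]) -> list:
    """Parse entries such as '203.0.113.7' or '198.51.100.0/24' into network objects."""
    return [ipaddress.ip_network(e, strict=False) for e in entries]


def ip_category(ip: str, abnormal: Sequence, oversized_public: Sequence) -> str:
    """Return 'abnormal', 'oversized_public', or 'normal' for a trigger IP address."""
    addr = ipaddress.ip_address(ip)
    # Compare only same-version networks; mixing IPv4 and IPv6 is never a match.
    if any(addr.version == net.version and addr in net for net in abnormal):
        return "abnormal"
    if any(addr.version == net.version and addr in net for net in oversized_public):
        return "oversized_public"
    return "normal"


def filter_triggers(triggers: Iterable[Tuple[str, str]],
                    abnormal: Sequence, oversized_public: Sequence) -> List[Tuple[str, str]]:
    """Drop (trigger_id, ip) pairs whose IP falls into either excluded category."""
    return [t for t in triggers if ip_category(t[1], abnormal, oversized_public) == "normal"]


abnormal = to_networks(["198.51.100.0/24"])
oversized_public = to_networks(["203.0.113.7"])
triggers = [("t1", "198.51.100.23"), ("t2", "203.0.113.7"), ("t3", "192.0.2.10")]
print(filter_triggers(triggers, abnormal, oversized_public))  # prints [('t3', '192.0.2.10')]
```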

In some embodiments, the apparatus further includes an IP address classification module, configured to:

Before determining whether the user-authorized trigger IP address belongs to the abnormal IP category or the oversized public IP category, collect IP addresses authorized by multiple users and classify them statistically to obtain the abnormal IP category and the oversized public IP category, where the abnormal IP category contains the abnormal IP addresses and the oversized public IP category contains the oversized public IP addresses.
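The statistical classification can be sketched in Python as follows, assuming the collected data is a stream of `(ip, user_id)` observations. The oversized public category follows the stated definition: an IP shared by more distinct users than a threshold. The disclosure does not say how abnormal IPs are identified. The event-volume rule used here is purely a hypothetical placeholder, and both thresholds are illustrative parameters.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def classify_ips(observations: Iterable[Tuple[str, str]],
                 max_users_per_ip: int,
                 max_events_per_ip: int) -> Tuple[Set[str], Set[str]]:
    """Build (abnormal, oversized_public) IP sets from (ip, user_id) observations."""
    users: Dict[str, Set[str]] = defaultdict(set)
    events: Dict[str, int] = defaultdict(int)
    for ip, user_id in observations:
        users[ip].add(user_id)
        events[ip] += 1
    # As defined in the disclosure: more distinct users share the IP than the threshold.
    oversized_public = {ip for ip, u in users.items() if len(u) > max_users_per_ip}
    # Hypothetical stand-in: the disclosure does not say how abnormal IPs are detected,
    # so this flags IPs with an unusually high event volume.
    abnormal = {ip for ip, n in events.items() if n > max_events_per_ip}
    return abnormal, oversized_public


observations = (
    [("203.0.113.7", f"u{i}") for i in range(5)]   # one IP shared by 5 users
    + [("192.0.2.10", "u1")] * 20                   # one user, very many events
    + [("198.51.100.1", "u2")]
)
print(classify_ips(observations, max_users_per_ip=3, max_events_per_ip=10))
# prints ({'192.0.2.10'}, {'203.0.113.7'})
```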

Correspondingly, the trigger data filtering submodule is specifically configured to:

Match the user-authorized trigger IP address against each abnormal IP address and each oversized public IP address;

If it matches an abnormal IP address, determine that the user-authorized trigger IP address belongs to the abnormal IP category;

If it matches an oversized public IP address, determine that the user-authorized trigger IP address belongs to the oversized public IP category.

The data analysis apparatus provided by the embodiments of the present disclosure analyzes the matching confidence between each candidate trigger data and the conversion data. It then selects, according to the matching confidence, the target trigger data that best matches the conversion data. This improves the accuracy of conversion analysis for promotion information. The recall of that analysis can also be raised by flexibly adjusting the confidence threshold used to screen candidate trigger data.

The data analysis apparatus provided by the embodiments of the present disclosure can execute the data analysis method provided by any embodiment of the present disclosure, and has the functional modules and beneficial effects corresponding to executing that method.

It is worth noting that, in the above embodiments of the data analysis apparatus, the modules and submodules are divided only according to functional logic. The division is not limited to the one above, as long as the corresponding functions can be realized. In addition, the specific names of the functional modules and submodules serve only to distinguish them from one another and are not intended to limit the protection scope of the present disclosure.

FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in FIG. 5, the electronic device 500 includes one or more processors 501 and a memory 502.

The processor 501 may be a central processing unit (CPU) or another form of processing unit having data processing and/or instruction execution capabilities. It may control other components in the electronic device 500 to perform desired functions.

The memory 502 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache memory. Non-volatile memory may include, for example, read-only memory (ROM), a hard disk, and flash memory. One or more computer program instructions may be stored on the computer-readable storage medium. The processor 501 may execute these program instructions to implement the data analysis method of the embodiments of the present disclosure described above and/or other desired functions. The computer-readable storage medium may also store content such as the attribute probability dictionary and each IP category together with the IP address fields belonging to it.

In one example, the electronic device 500 may further include an input device 503 and an output device 504, interconnected by a bus system and/or another form of connection mechanism (not shown). The input device 503 may include, for example, a keyboard and a mouse. The output device 504 may output various information externally, including the target trigger data corresponding to the conversion data and its matching confidence. The output device 504 may include, for example, a display, a speaker, a printer, and a communication network together with the remote output devices connected to it.

Of course, for simplicity, FIG. 5 shows only some of the components of the electronic device 500 that are relevant to the present disclosure. Components such as buses and input/output interfaces are omitted. In addition, depending on the specific application, the electronic device 500 may include any other appropriate components.

In addition to the methods and devices described above, embodiments of the present disclosure may also take the form of a computer program product. Such a product includes computer program instructions that, when executed by a processor, cause the processor to perform the steps of the data analysis method provided by any embodiment of the present disclosure.

Program code for performing the operations of the embodiments of the present disclosure may be written for the computer program product in any combination of one or more programming languages. These include object-oriented languages such as Java and C++, as well as conventional procedural languages such as the "C" language or similar languages. The program code may execute entirely on the user computing device, partly on the user device, as a stand-alone software package, partly on the user computing device and partly on a remote computing device, or entirely on a remote computing device or server.

Furthermore, an embodiment of the present disclosure may also take the form of a computer-readable storage medium storing computer program instructions. When executed by a processor, these instructions cause the processor to perform the steps of the data analysis method provided by any embodiment of the present disclosure.

The computer-readable storage medium may be any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific (non-exhaustive) examples of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.

It should be noted that the terminology used in the present disclosure is only for describing specific embodiments and is not intended to limit the scope of the present application. As used in the specification and claims of the present disclosure, unless the context clearly indicates otherwise, words such as "a", "an", and/or "the" do not specifically denote the singular and may include the plural. The term "and/or" includes any and all combinations of one or more of the associated listed items. The terms "comprise", "include", and any variants thereof are intended to cover non-exclusive inclusion. A process, method, or device that includes a series of elements therefore includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, or device. Without further limitation, an element qualified by the phrase "including a..." does not exclude the presence of additional identical elements in the process, method, or device that includes that element.

The above are only specific embodiments of the present disclosure, provided to enable those skilled in the art to understand or implement it. Various modifications to these embodiments will be readily apparent to those skilled in the art. The general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present disclosure. Therefore, the present disclosure is not limited to the embodiments described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (13)

1. A data analysis method, comprising:
obtaining conversion data generated when a preset operation corresponding to promotion information is completed;
parsing the conversion data, and determining, based on a parsing result, a plurality of candidate trigger data matching the conversion data, wherein trigger data are data obtained when the promotion information is triggered on different information publishing platforms;
determining a matching confidence between each of the candidate trigger data and the conversion data; and
determining, based on the matching confidence corresponding to each of the candidate trigger data, target trigger data corresponding to the conversion data.

2. The method according to claim 1, wherein determining the matching confidence between each of the candidate trigger data and the conversion data comprises:
parsing each of the candidate trigger data to obtain trigger device attribute information and a trigger IP address corresponding to each of the candidate trigger data, wherein device attribute information includes attribute information in a plurality of attribute dimensions;
determining, based on conversion device attribute information in the parsing result and each piece of trigger device attribute information, at least one attribute combination whose attribute information is identical in the same attribute dimensions;
determining a distribution probability value of each attribute combination; and
for any trigger IP address and any attribute combination contained in the candidate trigger data, determining, based on the distribution probability value of the attribute combination, the matching confidence between the conversion data and the candidate trigger data corresponding to the attribute combination under the trigger IP address.

3. The method according to claim 2, wherein determining the distribution probability value of the attribute combination comprises:
querying an attribute probability dictionary based on the attribute combination to determine the distribution probability value of the attribute combination, wherein the attribute probability dictionary is obtained in advance by performing statistics on a plurality of pieces of device attribute information.

4. The method according to claim 3, wherein the attribute probability dictionary is established in advance as follows:
for any attribute combination, determining the number of pieces of device attribute information, among the plurality of pieces of device attribute information, that are consistent with the attribute combination, and determining the ratio of that number to the total number of pieces of device attribute information as the distribution probability value of the attribute combination.

5. The method according to claim 2, wherein, for any trigger IP address and any attribute combination contained in the candidate trigger data, determining, based on the distribution probability value of the attribute combination, the matching confidence between the conversion data and the candidate trigger data corresponding to the attribute combination under the trigger IP address comprises:
for any trigger IP address and any attribute combination, determining a data quantity of the candidate trigger data corresponding to the trigger IP address, and determining, based on the distribution probability value of the attribute combination, initial probability values that different combination numbers of the candidate trigger data match the conversion data, wherein the combination number is a natural number in the range [1, data quantity]; and
for any trigger IP address and any attribute combination, determining, based on the data quantity and the initial probability values corresponding to the trigger IP address and the attribute combination, the matching confidence between the conversion data and the candidate trigger data corresponding to the attribute combination under the trigger IP address.

6. The method according to claim 1, wherein determining, based on the matching confidence corresponding to each of the candidate trigger data, the target trigger data corresponding to the conversion data comprises:
determining the candidate trigger data corresponding to the maximum matching confidence as the target trigger data corresponding to the conversion data.

7. The method according to claim 1, wherein determining, based on the matching confidence corresponding to each of the candidate trigger data, the target trigger data corresponding to the conversion data comprises:
if the matching confidences include a plurality of maximum matching confidences, selecting, from the candidate trigger data corresponding to the maximum matching confidences, the candidate trigger data whose data generation time is closest to the data generation time of the conversion data, and determining the selected candidate trigger data as the target trigger data corresponding to the conversion data.

8. The method according to claim 1, wherein determining, based on the parsing result, the plurality of candidate trigger data of the promotion information matching the conversion data comprises:
determining, based on the parsing result, a conversion IP address and conversion device attribute information corresponding to the conversion data;
obtaining each piece of trigger data generated when the promotion information on each of the information publishing platforms is triggered, and parsing each piece of trigger data to obtain trigger device attribute information and a trigger IP address corresponding to each piece of trigger data; and
matching the conversion data against each piece of trigger data based on the conversion IP address, each trigger IP address, the conversion device attribute information, and each piece of trigger device attribute information, to obtain the candidate trigger data matching the conversion data.

9. The method according to claim 8, wherein, after parsing each piece of trigger data to obtain the trigger device attribute information and the trigger IP address corresponding to each piece of trigger data, the method further comprises:
determining whether the trigger IP address belongs to an abnormal IP category or an oversized public IP category, wherein the oversized public IP category is an IP category in which the number of people sharing an IP address exceeds a set threshold; and
if so, removing the trigger data corresponding to that trigger IP address.

10. The method according to claim 9, wherein, before determining whether the trigger IP address belongs to the abnormal IP category or the oversized public IP category, the method further comprises:
collecting a plurality of IP addresses and classifying the plurality of IP addresses statistically to obtain the abnormal IP category and the oversized public IP category, wherein the abnormal IP category contains abnormal IP addresses and the oversized public IP category contains oversized public IP addresses;
and determining whether the trigger IP address belongs to the abnormal IP category or the oversized public IP category comprises:
matching the trigger IP address against each abnormal IP address and each oversized public IP address;
if the trigger IP address matches an abnormal IP address, determining that the trigger IP address belongs to the abnormal IP category; and
if the trigger IP address matches an oversized public IP address, determining that the trigger IP address belongs to the oversized public IP category.

11. A data analysis apparatus, comprising:
a conversion data acquisition module, configured to acquire conversion data generated when a preset operation corresponding to promotion information is completed;
a candidate trigger data determination module, configured to parse the conversion data and determine, based on a parsing result, a plurality of candidate trigger data matching the conversion data, wherein the candidate trigger data are data obtained when the promotion information is triggered on different information publishing platforms;
a matching confidence determination module, configured to determine a matching confidence between each of the candidate trigger data and the conversion data; and
a target trigger data determination module, configured to determine, based on the matching confidence corresponding to each of the candidate trigger data, target trigger data corresponding to the conversion data.

12. An electronic device, comprising:
a processor and a memory;
wherein the processor is configured to perform the steps of the method according to any one of claims 1 to 10 by invoking a program or instructions stored in the memory.

13. A computer-readable storage medium, wherein the computer-readable storage medium stores a program or instructions that cause a computer to perform the steps of the method according to any one of claims 1 to 10.
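Claims 3 and 4 define the attribute probability dictionary as a ratio. For each attribute combination, the value is the number of device records consistent with the combination divided by the total number of device records. A minimal Python sketch of building and querying such a dictionary follows. Enumerating every non-empty subset of each record's (dimension, value) pairs is an assumption made for illustration. It costs 2**d - 1 entries per record for d dimensions, so it only suits a small number of dimensions. The claims do not give the formula that turns these probabilities into the final confidence of claim 5, so that step is not sketched.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

Combo = Tuple[Tuple[str, str], ...]


def build_attribute_probability_dict(devices: List[Dict[str, str]]) -> Dict[Combo, float]:
    """Map each attribute combination to (#devices consistent with it) / (#devices)."""
    counts: Counter = Counter()
    for attrs in devices:
        items = sorted(attrs.items())  # canonical order so equal combos share a key
        # Every non-empty subset of this device's (dimension, value) pairs is a
        # combination the device is consistent with.
        for r in range(1, len(items) + 1):
            for combo in combinations(items, r):
                counts[combo] += 1
    total = len(devices)
    return {combo: n / total for combo, n in counts.items()} if total else {}


def distribution_probability(prob_dict: Dict[Combo, float], combo: Combo) -> float:
    """Claim 3 lookup; unseen combinations get probability 0.0 (an assumption)."""
    return prob_dict.get(tuple(sorted(combo)), 0.0)


devices = [
    {"os": "Android", "model": "X1"},
    {"os": "Android", "model": "X2"},
    {"os": "iOS", "model": "P1"},
    {"os": "Android", "model": "X1"},
]
probs = build_attribute_probability_dict(devices)
print(distribution_probability(probs, (("os", "Android"),)))                  # prints 0.75
print(distribution_probability(probs, (("os", "Android"), ("model", "X1"))))  # prints 0.5
```

A rarer shared combination (lower probability) is more distinctive, which is presumably why the distribution probability feeds into the matching confidence.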
CN202110308220.0A | 2021-03-23 | 2021-03-23 | Data analysis method, apparatus, equipment and storage medium | Pending | CN115115386A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202110308220.0A (CN115115386A) | 2021-03-23 | 2021-03-23 | Data analysis method, apparatus, equipment and storage medium

Publications (1)

Publication Number | Publication Date
CN115115386A | 2022-09-27

Family

ID=83323156

Family Applications (1)

Application Number | Status (Publication) | Priority Date | Filing Date | Title
CN202110308220.0A | Pending (CN115115386A) | 2021-03-23 | 2021-03-23 | Data analysis method, apparatus, equipment and storage medium

Country Status (1)

Country | Link
CN (1) | CN115115386A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20160371740A1 * | 2008-03-17 | 2016-12-22 | Segmint Inc. | System and method for delivering a financial application to a prospective customer
US20170091809A1 * | 2015-09-30 | 2017-03-30 | LinkedIn Corporation | Tracking interaction with sponsored and unsponsored content
CN109034906A * | 2018-08-03 | 2018-12-18 | 北京木瓜移动科技股份有限公司 | Anti-cheat method, device, electronic equipment and storage medium for advertising conversion
CN110033315A * | 2019-03-18 | 2019-07-19 | 北京品友互动信息技术股份公司 | Attribution method and device for advertising information conversion, storage medium, electronic device
CN110189152A * | 2018-02-23 | 2019-08-30 | 北京国双科技有限公司 | Channel attribution method and apparatus
CN111242687A * | 2020-01-13 | 2020-06-05 | 腾讯科技(深圳)有限公司 | Advertising data analysis method, device, electronic device and storage medium

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
