

CN118569936B - Advertisement user analysis method and system - Google Patents

Advertisement user analysis method and system

Info

Publication number
CN118569936B
CN118569936BCN202410775423.4ACN202410775423ACN118569936BCN 118569936 BCN118569936 BCN 118569936BCN 202410775423 ACN202410775423 ACN 202410775423ACN 118569936 BCN118569936 BCN 118569936B
Authority
CN
China
Prior art keywords
instance
sample
behavior
user
screening
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410775423.4A
Other languages
Chinese (zh)
Other versions
CN118569936A (en)
Inventor
谭荣棉
徐彬
林浩勇
李政平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Liheng Information Technology Guangzhou Co ltd
Original Assignee
Liheng Information Technology Guangzhou Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Liheng Information Technology Guangzhou Co ltd
Priority to CN202410775423.4A
Publication of CN118569936A
Application granted
Publication of CN118569936B
Status: Active
Anticipated expiration


Abstract

Translated from Chinese


The present invention discloses an advertisement user analysis method and system. In response to an advertisement promotion operation, user behavior data are first obtained in real time and input into a trained advertisement user behavior recognition model for classification. According to the classification result, the system determines a personalized push strategy from an advertisement push strategy library, thereby achieving accurate analysis of user behavior and effective advertisement pushing. This design improves both the accuracy of advertisement pushing and user satisfaction.

Description

Advertisement user analysis method and system
Technical Field
The invention relates to the technical field of advertisement pushing, in particular to an advertisement user analysis method and system.
Background
In existing advertisement push systems, advertisement user analysis is often based on simple user profiles and static behavior data, which makes it difficult to accurately capture users' dynamic interests and needs and results in low advertisement push efficiency.
Disclosure of Invention
The invention aims to provide an advertisement user analysis method and system.
In a first aspect, an embodiment of the present invention provides an advertisement user analysis method, where the method includes:
in response to an advertisement promotion operation triggered by a current user, acquiring the current user behavior of the current user within a preset monitoring period;
transmitting the current user behavior to a trained advertisement user behavior recognition model, and obtaining a classification result of the current user behavior through the trained model;
determining, according to the classification result, an advertisement push strategy corresponding to the current user from a pre-stored advertisement push strategy library;
and taking the classification result and the advertisement push strategy as the advertisement user analysis result of the current user.
In a second aspect, an embodiment of the present invention provides a server system, including a server, where the server is configured to perform the method described in the first aspect.
Compared with the prior art, the disclosed advertisement user analysis method and system have the following beneficial effects. User behavior data are first obtained in real time in response to an advertisement promotion operation and input into a trained advertisement user behavior recognition model for classification. According to the classification result, the system determines a personalized push strategy from the advertisement push strategy library, achieving accurate analysis of user behavior and effective advertisement pushing. This design improves both the accuracy of advertisement pushing and user satisfaction.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. The following drawings depict only certain embodiments of the invention and should therefore not be regarded as limiting its scope. Those of ordinary skill in the art can derive other relevant drawings from these drawings without creative effort.
FIG. 1 is a flowchart illustrating steps of an advertisement user analysis method according to an embodiment of the present invention;
FIG. 2 is a schematic block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, embodiments of the invention. The components of the embodiments generally described and illustrated in the figures herein may be arranged and designed in a wide variety of configurations.
The following describes specific embodiments of the present invention in detail with reference to the drawings.
To solve the technical problem described in the background, FIG. 1 shows a schematic flowchart of an advertisement user analysis method according to an embodiment of the disclosure; the method is described in detail below.
Step S201: in response to an advertisement promotion operation triggered by the current user, acquire the current user behavior of the current user within a preset monitoring period;
Step S202: transmit the current user behavior to the trained advertisement user behavior recognition model, and obtain a classification result of the current user behavior through the model;
Step S203: determine, according to the classification result, the advertisement push strategy corresponding to the current user from the pre-stored advertisement push strategy library;
Step S204: take the classification result and the advertisement push strategy as the advertisement user analysis result of the current user.
In an embodiment of the present invention, for example, the server detects that user A clicks an advertisement link in a social media application, which is identified as the trigger of an advertisement promotion operation. The server then monitors the behavior of user A during the following preset monitoring period (e.g., one week). This behavior includes, but is not limited to, user A's browsing records in the social media application, search history, likes and comments. Suppose user A clicks an advertisement link for a "new smartphone", and the server begins recording user A's behavior for the next 7 days. Over these 7 days, the server records that user A browsed several articles about mobile phones, entered keywords such as "mobile phone review" in the search bar, and liked several comments about the phone's camera. The server inputs the collected behavior data of user A (browsing records, search history, likes and comments) into the trained advertisement user behavior recognition model. Having been trained on a large amount of data, the model can identify a user's behavior pattern and classify it into a behavior category, for example "potential buyer", "interested but hesitant" or "not intending to buy". The model classifies user A as a "potential buyer", because user A's behavior shows sustained interest in mobile phone products. The server then looks up the corresponding advertisement push strategy in the pre-stored advertisement push strategy library according to the classification result given by the model.
A strategy may specify the advertisement content to push, the push frequency, the push time, and so on. Since user A is classified as a "potential buyer", the server finds the corresponding strategy in the advertisement push strategy library and decides to push an advertisement for the latest smartphone to user A once a day for the next week, at the start of user A's active period. The server stores the classification result and the corresponding advertisement push strategy as the advertisement user analysis result, which can be used for subsequent advertisement pushing and effect evaluation. For user A, the stored result (the "potential buyer" classification plus the corresponding push strategy) is used for subsequent advertisement pushing to user A and can be used to evaluate the effectiveness of this push, e.g., by click-through rate and conversion rate.
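Steps S201 to S204 can be sketched as a small pipeline. This is a minimal sketch, not the patented implementation: the event format, the keyword-count rule standing in for the trained recognition model, the strategies for the two non-buyer categories, and the names `collect_window`, `toy_classifier`, `PushStrategy` and `analyze_user` are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Tuple

Event = Tuple[datetime, str]  # hypothetical format: (timestamp, behavior description)

@dataclass
class PushStrategy:
    content: str
    pushes_per_day: int
    days: int
    timing: str

@dataclass
class AnalysisResult:
    classification: str
    strategy: PushStrategy

def collect_window(events: List[Event], trigger: datetime, period: timedelta) -> List[str]:
    """S201: keep behaviors recorded within the monitoring period after the trigger."""
    return [b for t, b in events if trigger <= t < trigger + period]

def toy_classifier(behaviors: List[str]) -> str:
    """Hypothetical stand-in for the trained model (S202): counts phone-related behaviors."""
    hits = sum("phone" in b for b in behaviors)
    if hits >= 3:
        return "potential buyer"
    if hits >= 1:
        return "interested but hesitant"
    return "not intending to buy"

def analyze_user(behaviors: List[str],
                 classify: Callable[[List[str]], str],
                 library: Dict[str, PushStrategy]) -> AnalysisResult:
    label = classify(behaviors)             # S202: classification result
    strategy = library[label]               # S203: look up the push strategy
    return AnalysisResult(label, strategy)  # S204: analysis result

library = {
    "potential buyer": PushStrategy("latest smartphone ad", 1, 7, "start of active period"),
    "interested but hesitant": PushStrategy("smartphone review digest", 1, 3, "evening"),
    "not intending to buy": PushStrategy("none", 0, 0, "n/a"),
}

trigger = datetime(2024, 6, 1, 9, 0)
events = [
    (trigger + timedelta(days=1), "browse: phone article"),
    (trigger + timedelta(days=2), "search: phone review"),
    (trigger + timedelta(days=3), "like: phone camera comment"),
    (trigger + timedelta(days=9), "browse: phone article"),  # outside the 7-day window
]
behaviors = collect_window(events, trigger, timedelta(days=7))
result = analyze_user(behaviors, toy_classifier, library)
print(result.classification, result.strategy.pushes_per_day)  # potential buyer 1
```

Keeping the classifier as a plain callable means a trained model can replace `toy_classifier` without changing the lookup and result steps.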
In an embodiment of the invention, the advertisement user behavior recognition model is obtained as follows:
obtaining a target behavior instance and at least two reference behavior categories, wherein the target behavior instance corresponds to at least two sample behavior categories;
obtaining a reference behavior category vector for each reference behavior category through a behavior feature extraction model, performing an aggregation operation on the at least two reference behavior categories according to the reference behavior category vectors to obtain at least two reference behavior class centers, and obtaining an aggregate behavior category for each reference behavior class center from the sample behavior categories;
transmitting the target behavior instance to the advertisement user behavior recognition model, obtaining an inferred behavior class center from the at least two reference behavior class centers through the model, and obtaining the reference behavior category corresponding to the target behavior instance from the inferred behavior class center;
obtaining a first deviation from the determination result for the inferred behavior class center and the aggregate behavior category, and obtaining a second deviation from the determination result for the reference behavior category corresponding to the target behavior instance and the sample behavior categories;
and jointly training the behavior feature extraction model and the advertisement user behavior recognition model according to the first deviation and the second deviation to obtain the trained advertisement user behavior recognition model.
In an exemplary embodiment of the present invention, the server first collects a series of target behavior instances, i.e., actual behavior data of users during an advertising campaign. The server also defines multiple reference behavior categories, such as "active buyer", "occasional follower" and "potential churn user". Each target behavior instance is labeled with at least two sample behavior categories, derived from expert judgment or preliminary analysis. For example, the server collects behavior data of user B during an advertisement promotion as a target behavior instance, and an expert initially judges that user B's behavior may belong to either of two sample behavior categories, "active buyer" or "occasional follower". The server processes the behavior data of each reference behavior category with the behavior feature extraction model, extracts key features, and generates a reference behavior category vector. An aggregation operation (e.g., averaging or clustering) is then performed on these vectors to obtain reference behavior class centers. Finally, based on the labeled sample behavior categories, the server further aggregates the reference behavior class centers into higher-level aggregate behavior categories. For the two reference behavior categories "active buyer" and "occasional follower", the server uses the behavior feature extraction model to extract their feature vectors and computes the respective category centers; then, based on user B's sample behavior category labels, it aggregates the two category centers into a more specific aggregate behavior category, such as "high-intent buyer". The server transmits the target behavior instance (e.g., user B's behavior data) to the advertisement user behavior recognition model.
The model infers the most likely behavior class center from the similarity between the features of the target behavior instance and the reference behavior class centers, and determines the reference behavior category of the target behavior instance from that center. For user B, the model finds that user B's behavior features are closer to the center of "active buyer" than to that of "occasional follower", and therefore classifies user B as an "active buyer". The server compares the model's determination for the inferred behavior class center with the ground truth of the aggregate behavior category to compute a first deviation. It also compares the model's reference behavior category determination for the target behavior instance with the ground truth of the sample behavior categories to compute a second deviation. Here, the model determines user B to be an "active buyer" while the ground truth of the aggregate behavior category is "high-intent buyer"; the difference between the two produces the first deviation. Meanwhile, user B's sample behavior category label is "active buyer" or "occasional follower"; if the label is "occasional follower", the model's determination produces a second deviation. The server uses the first and second deviations as feedback to jointly adjust and optimize the behavior feature extraction model and the advertisement user behavior recognition model. This iterative process improves model accuracy and generalization by continually reducing the deviations.
The server adjusts the parameters of the behavior feature extraction model and the advertisement user behavior recognition model according to the first and second deviations. Over repeated training iterations the deviations gradually decrease, finally yielding a trained advertisement user behavior recognition model that can recognize users' behavior categories more accurately.
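One reading of the aggregation and joint-training steps is sketched below, assuming plain k-means as the aggregation operation (the text mentions averaging or clustering) and a weighted sum of the two deviations as the joint objective. The category vectors, the `weight` parameter and the function names are hypothetical, and the parameter updates themselves (e.g., backpropagation through both models) are omitted.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]

def mean(vectors: List[Vector]) -> Vector:
    """Component-wise average of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def kmeans_categories(class_vectors: Dict[str, Vector], k: int,
                      iters: int = 10) -> Tuple[List[Vector], List[List[str]]]:
    """Group reference behavior category vectors into k class centers (plain k-means)."""
    names = list(class_vectors)
    centers = [list(class_vectors[n]) for n in names[:k]]  # deterministic init: first k vectors
    groups: List[List[str]] = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for n in names:
            v = class_vectors[n]
            nearest = min(range(k), key=lambda i: math.dist(v, centers[i]))
            groups[nearest].append(n)
        # Recompute each center as the mean of its members; keep it if the group is empty.
        centers = [mean([class_vectors[n] for n in g]) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, groups

def joint_objective(first_deviation: float, second_deviation: float,
                    weight: float = 1.0) -> float:
    """Combined training signal for both models (assumed: weighted sum)."""
    return first_deviation + weight * second_deviation

category_vectors = {
    "active buyer": [1.0, 1.0],
    "occasional follower": [-1.0, -1.0],
    "potential churn user": [-1.2, -0.8],
}
centers, groups = kmeans_categories(category_vectors, k=2)
print(groups)
print(joint_objective(0.0025, 0.1))
```

Summing the deviations lets a single optimizer step reduce both at once; the weight is a hypothetical knob for balancing center-level against category-level errors.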
Transmitting the target behavior instance to the advertisement user behavior recognition model and obtaining the inferred behavior class center from the at least two reference behavior class centers through the model can be implemented as follows:
transmitting the target behavior instance to the feature extraction component of the model to obtain an instance feature vector;
transmitting the instance feature vector to the feature recognition component to obtain a first class reliability coefficient for each reference behavior class center;
normalizing each first class reliability coefficient to obtain a first category attribution confidence for each reference behavior class center;
and, when a first category attribution confidence is not lower than a preset first confidence threshold, taking the corresponding reference behavior class center as an inferred behavior class center.
In an embodiment of the present invention, for example, after collecting a user's target behavior instance, the server transmits the behavior data (e.g., browsing records, search history, likes and comments) to the feature extraction component of the advertisement user behavior recognition model, which extracts key features and generates an instance feature vector. Suppose the server collects behavior data of user C during an advertisement promotion as a target behavior instance: user C browsed a number of travel-related articles, searched keywords such as "travel destination recommendations", and liked several comments about travel experiences. The feature extraction component extracts key features (e.g., the number of travel articles browsed, the frequency of travel-related search keywords, and the number of travel comments liked) and generates an instance feature vector containing these features. The server then transmits this vector to the feature recognition component of the model, which computes, for each reference behavior class center, a reliability coefficient (i.e., a matching degree or similarity) between that center and the target behavior instance. For user C, the component compares the instance feature vector with the centers of three reference behavior categories, "travel enthusiast", "casual tourist" and "planned traveler", to obtain a reliability coefficient for each.
Suppose the calculated reliability coefficients are 0.85, 0.60 and 0.75, respectively. Because reliability coefficients of different reference behavior class centers may lie in different numerical ranges, the server normalizes them into confidence values in a common range so that they can be compared. Assume the normalization maps every reliability coefficient to a value between 0 and 1, with larger values indicating higher confidence, and that after normalization user C's confidences for "travel enthusiast", "casual tourist" and "planned traveler" are 0.90, 0.65 and 0.80 (from reliability coefficients 0.85, 0.60 and 0.75), respectively. To decide which reference behavior categories the target behavior instance most likely belongs to, the server sets a confidence threshold: only a reference behavior class center whose confidence reaches or exceeds the threshold is treated as an inferred behavior class center. With a first confidence threshold of 0.75, the confidences of "travel enthusiast" (0.90) and "planned traveler" (0.80) meet the threshold, while that of "casual tourist" (0.65) falls below it. The server therefore takes the "travel enthusiast" center, which has the highest confidence, as the inferred behavior class center and considers user C most likely to belong to the "travel enthusiast" category.
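The inference step (reliability coefficient, normalization, threshold) can be sketched as follows. The text does not fix the similarity or normalization functions; this sketch assumes cosine similarity as the reliability coefficient and a linear rescaling from [-1, 1] to [0, 1] as the normalization. `infer_class_centers` and the toy vectors are hypothetical. Following the procedure, every center at or above the threshold is returned, ordered by confidence, so the first entry is the highest-confidence center chosen in the example.

```python
import math
from typing import Dict, List, Tuple

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity (assumed reliability coefficient); both vectors must be non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def infer_class_centers(instance: List[float], centers: Dict[str, List[float]],
                        threshold: float = 0.75) -> Tuple[List[str], Dict[str, float]]:
    """Return centers whose normalized confidence >= threshold (best first) and all confidences."""
    confidences = {name: (cosine(instance, c) + 1.0) / 2.0  # map [-1, 1] -> [0, 1]
                   for name, c in centers.items()}
    ranked = sorted(confidences, key=confidences.get, reverse=True)
    return [n for n in ranked if confidences[n] >= threshold], confidences

centers = {
    "travel enthusiast": [1.0, 0.0],
    "casual tourist": [-1.0, 0.0],
    "planned traveler": [0.0, 1.0],
}
inferred, conf = infer_class_centers([1.0, 0.0], centers)
print(inferred)  # ['travel enthusiast']
```

Lowering `threshold` to 0.5 in this toy setup admits "planned traveler" as well, mirroring how more than one center can pass the first threshold.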
In an embodiment of the invention, the determination result for the inferred behavior class center includes the first category attribution confidence of each reference behavior class center, and obtaining the first deviation from this determination result and the aggregate behavior categories can be implemented as follows:
obtaining an aggregate behavior confidence for each reference behavior class center according to its aggregate behavior category;
calculating the mean squared deviation between each aggregate behavior confidence and the corresponding first category attribution confidence (for a single pair of values, their squared difference) to obtain at least two confidence deviations;
and taking the sum of all confidence deviations as the first deviation.
In an embodiment of the invention, for example, the aggregate behavior category of each reference behavior class center was obtained in the previous step. To calculate the first deviation, the server assigns an aggregate behavior confidence to each reference behavior class center; this confidence may be set from expert knowledge, historical data or other criteria. Continuing the example with the three centers "travel enthusiast", "casual tourist" and "planned traveler", suppose an expert assigns them aggregate behavior confidences of 0.9, 0.7 and 0.8, respectively; these values reflect the reliability or importance of each aggregate behavior category. The first category attribution confidences of user C for the three centers, obtained earlier, are 0.90, 0.65 and 0.80. The server now computes, for each reference behavior class center, the deviation between its aggregate behavior confidence and the corresponding first category attribution confidence, using the mean squared error (or another statistical measure).
For example, the deviation is (0.9 - 0.90)^2 = 0 for "travel enthusiast", (0.7 - 0.65)^2 = 0.0025 for "casual tourist", and (0.8 - 0.80)^2 = 0 for "planned traveler". Summing the per-center confidence deviations gives the first deviation, 0 + 0.0025 + 0 = 0.0025, which reflects the overall difference between the model's inferred behavior class center determination and the aggregate behavior categories. Here the small value indicates that, for user C's behavior instance, the model's inference differs only slightly from the aggregate behavior categories defined by experts or historical data; a large first deviation would indicate that the model may need further adjustment or optimization.
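The first-deviation arithmetic can be checked with a short sketch that sums per-center squared differences. The dictionary layout and the name `summed_squared_deviation` are hypothetical; the values are those of the user C example.

```python
from typing import Dict

def summed_squared_deviation(targets: Dict[str, float], predicted: Dict[str, float]) -> float:
    """Sum over categories of (target - predicted)^2."""
    return sum((targets[k] - predicted[k]) ** 2 for k in targets)

aggregate_confidence = {"travel enthusiast": 0.9, "casual tourist": 0.7, "planned traveler": 0.8}
first_category_confidence = {"travel enthusiast": 0.90, "casual tourist": 0.65, "planned traveler": 0.80}
first_deviation = summed_squared_deviation(aggregate_confidence, first_category_confidence)
print(round(first_deviation, 6))  # 0.0025
```

The rounding only hides floating-point noise in 0.7 - 0.65; the same summation applies to the second deviation with target category confidences in place of aggregate behavior confidences.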
In an embodiment of the invention, obtaining the reference behavior category corresponding to the target behavior instance from the inferred behavior class center can be implemented as follows:
for each reference behavior category in the inferred behavior class center, calculating the vector distance between the instance feature vector and the reference behavior category vector of that category;
normalizing the vector distance to obtain a second category attribution confidence for the reference behavior category;
and, when a second category attribution confidence is not lower than a preset second confidence threshold, taking the corresponding reference behavior category as the reference behavior category of the target behavior instance.
In an embodiment of the present invention, for example, after the feature recognition component determines the inferred behavior class center, the server traverses the reference behavior categories represented by that center. For each reference behavior category, the server calculates the vector distance between the instance feature vector of the target behavior instance and the category's reference behavior category vector; this distance reflects how similar or different the two vectors are. Suppose the inferred behavior class center includes two reference behavior categories, "travel enthusiast" and "planned traveler". The server calculates the vector distance between user C's instance feature vector and each of these two reference behavior category vectors, using Euclidean distance, cosine similarity or another vector similarity measure. Because distances for different reference behavior categories may lie in different numerical ranges, the server normalizes them (e.g., with min-max normalization) into confidence values in a common range; each confidence reflects the likelihood that the target behavior instance belongs to that reference behavior category. Assume the resulting second category attribution confidence is 0.85 for "travel enthusiast" and 0.70 for "planned traveler".
These values indicate how similar user C's behavior instance is to each of the two reference behavior categories. To ensure accurate and reliable results, the server sets a second confidence threshold; only a reference behavior category whose second category attribution confidence reaches or exceeds the threshold is taken as the reference behavior category of the target behavior instance. With the second confidence threshold set to 0.75, the confidence of "travel enthusiast" (0.85) meets the threshold, while that of "planned traveler" (0.70) falls below it. The server therefore takes "travel enthusiast" as the reference behavior category of user C's target behavior instance.
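The distance-based step can be sketched as below. The text names Euclidean distance and min-max normalization, but min-max normalization over only two categories always yields 1 and 0, so this sketch instead maps each Euclidean distance d to 1 / (1 + d), an assumed choice that keeps values in (0, 1] with closer categories scoring higher. When several categories pass, the text does not say which to take; the sketch assumes the highest. `assign_reference_category` and the vectors are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

def assign_reference_category(instance: List[float],
                              category_vectors: Dict[str, List[float]],
                              threshold: float = 0.75) -> Tuple[Optional[str], Dict[str, float]]:
    """Pick the best category whose distance-based confidence meets the threshold, else None."""
    confidences = {name: 1.0 / (1.0 + math.dist(instance, vec))  # distance -> (0, 1]
                   for name, vec in category_vectors.items()}
    passing = {n: c for n, c in confidences.items() if c >= threshold}
    best = max(passing, key=passing.get) if passing else None
    return best, confidences

inferred_center_categories = {
    "travel enthusiast": [1.0, 0.0],
    "planned traveler": [1.0, 1.0],
}
best, conf = assign_reference_category([1.0, 0.0], inferred_center_categories)
print(best, conf)
```

Returning `None` when nothing passes makes the "no category meets the second threshold" case explicit for the caller.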
In an embodiment of the invention, the determination result for the reference behavior category of the target behavior instance includes the second category attribution confidence of each reference behavior category in the inferred behavior class center, and obtaining the second deviation from this determination result and the sample behavior categories can be implemented as follows:
obtaining, according to the sample behavior categories, a target category confidence for each reference behavior category in the inferred behavior class center;
calculating the mean squared deviation between each target category confidence and the corresponding second category attribution confidence to obtain at least two category deviations;
and taking the sum of all category deviations as the second deviation.
In an embodiment of the present invention, for example, the server holds sample behavior data labeled with specific behavior categories (the sample behavior categories) and uses them to determine a target category confidence for each reference behavior category in the inferred behavior class center. This typically involves counting the labeled samples to obtain the frequency or probability of each reference behavior category. Suppose a large set of sample behavior data is labeled with the three reference behavior categories "travel enthusiast", "casual tourist" and "planned traveler", and of 1000 samples, 300 are labeled "travel enthusiast", 200 "casual tourist" and 500 "planned traveler". The target category confidences are then 0.3, 0.2 and 0.5, respectively. The server then computes the deviation between each target category confidence and the second category attribution confidence of the corresponding reference behavior category, typically as the mean squared error (or another statistical measure), to quantify how far apart they are. For example, the second category attribution confidence of "travel enthusiast" computed earlier is 0.85 and its target category confidence is 0.3, and the server computes the squared deviation between these two values.
The deviations between the target category confidences and the second category attribution confidences of "casual tourist" and "planned traveler" are calculated in the same way. The server sums the per-category deviations to obtain the second deviation, which reflects the overall difference between the second category attribution confidences of the reference behavior categories in the inferred behavior class center and the target category confidences derived from the sample behavior categories. Assume, purely for illustration, category deviations of 0.05 for "travel enthusiast", 0.03 for "casual tourist" and 0.02 for "planned traveler" (actual values depend on the specific algorithm and data); the second deviation is then 0.05 + 0.03 + 0.02 = 0.1. A large second deviation may indicate that the advertisement user behavior recognition model needs further adjustment or optimization.
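The target category confidences and the second deviation can be sketched as follows, assuming target confidences are label frequencies over the sample data, as in the 1000-sample example, restricted to the categories in the inferred behavior class center. The label counts come from that example and the second category attribution confidences (0.85 and 0.70) from the earlier threshold example; the function names are hypothetical. With these inputs the deviations are computed directly rather than taken from the illustrative 0.05, 0.03 and 0.02.

```python
from collections import Counter
from typing import Dict, Iterable, List

def target_confidences(labels: Iterable[str], categories: List[str]) -> Dict[str, float]:
    """Relative frequency of each category among the labeled samples."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {c: counts[c] / total for c in categories}

def second_deviation(targets: Dict[str, float], second_conf: Dict[str, float]) -> float:
    """Sum of squared differences between target and second category attribution confidences."""
    return sum((targets[c] - second_conf[c]) ** 2 for c in targets)

labels = (["travel enthusiast"] * 300 + ["casual tourist"] * 200
          + ["planned traveler"] * 500)
in_center = ["travel enthusiast", "planned traveler"]
targets = target_confidences(labels, in_center)
second_conf = {"travel enthusiast": 0.85, "planned traveler": 0.70}
print(targets, round(second_deviation(targets, second_conf), 4))
```

Here (0.3 - 0.85)^2 + (0.5 - 0.70)^2 = 0.3025 + 0.04 = 0.3425.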
In the embodiment of the invention, transmitting the target behavior instance to the feature extraction component to obtain an instance feature vector can be implemented as follows.
Performing time-series segmentation on the target behavior instance to obtain a user behavior set, the user behavior set comprising at least two user interaction operations;
configuring an initial tag for the first interaction operation of the user behavior set and a termination tag for the last interaction operation of the user behavior set, to obtain a behavior set to be processed;
performing feature extraction on the behavior set to be processed to obtain a behavior operation vector set;
performing, by the feature extraction component, a feature extraction operation on the behavior operation vector set through a recurrent neural network to obtain an interaction feature vector set, the interaction feature vector set comprising a feature representation of each user interaction operation in the behavior set to be processed;
and combining the feature representations through a recurrent neural network to obtain the instance feature vector.
In the embodiment of the invention, the server receives a target behavior instance, which may be a record of a user's behavior over a period of time. The server performs time-series segmentation on the instance, splitting it into individual user interaction operations and assembling them, in chronological order, into a user behavior set. For example, suppose the server receives a target behavior instance for user C on an e-commerce platform that records every operation from entering to leaving the platform. Following the order of user C's operations, the server splits the instance into interaction operations such as browsing item A, adding to cart, browsing item B and submitting an order; these operations form the user behavior set. The server then tags the interactions to mark where the behavior begins and ends: typically the first interaction is marked as the start and the last as the end, which aids subsequent feature extraction and model training. In user C's behavior set, "browse item A" is the first operation, so the server assigns it an initial tag marking the start of the user's behavior; "submit order" is the last operation, so it receives a termination tag marking the end. This yields a behavior set to be processed with start and end tags. Next, the server extracts features from each interaction operation in the behavior set to be processed and converts them into vectors a computer can process.
These vectors typically carry key information about the interaction, such as operation type, operation time and operation result. For user C, "browse item A" becomes a vector containing the item ID, browsing time, browsing duration and similar information, and "add to cart" becomes a vector containing the item ID, time added, cart quantity and similar information. The result is the behavior operation vector set. The server processes this set with a recurrent neural network (RNN), which captures temporal dependencies in sequence data and can therefore extract features that relate user interactions to one another. The RNN processes each vector in turn and outputs a corresponding interaction feature vector; together these form the interaction feature vector set, which contains a feature representation of each interaction in the user behavior sequence, including the timing relationships and dependencies between interactions. Finally, the server uses another RNN (or another sequence model) to merge the interaction feature vectors into a single, unified instance feature vector for downstream classification or prediction.
This vector integrates the feature information of all interactions in the user behavior sequence and serves as the feature representation input to the subsequent model.
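The segmentation, tagging, vectorization and two-stage RNN pipeline described above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the patent's implementation: the operation vocabulary, the vector layout and the time scaling are hypothetical, the Elman RNN uses fixed untrained weights, and taking the merging RNN's final hidden state as the instance feature vector is one common choice assumed here. A production system would use a trained RNN from a deep learning framework.

```python
import math
from dataclasses import dataclass

START, END = "<start>", "<end>"                       # initial / termination tags
OP_TYPES = ["browse", "add_to_cart", "submit_order"]  # hypothetical operation vocabulary


@dataclass
class Interaction:
    op: str
    timestamp: float  # seconds since the session began
    duration: float   # seconds spent on the operation


def segment_and_tag(events):
    """Time-series segmentation: order interactions by time, then add start/end tags."""
    ops = sorted(events, key=lambda e: e.timestamp)
    if len(ops) < 2:
        raise ValueError("a user behavior set needs at least two interactions")
    return [START] + ops + [END]


def vectorize(item):
    """Encode a tag or an interaction as a fixed-length vector (hypothetical layout)."""
    vec = [0.0] * (len(OP_TYPES) + 4)  # one-hot op | start | end | time | duration
    if isinstance(item, Interaction):
        vec[OP_TYPES.index(item.op)] = 1.0
        vec[-2] = item.timestamp / 3600.0
        vec[-1] = item.duration / 60.0
    elif item == START:
        vec[len(OP_TYPES)] = 1.0
    else:  # END tag
        vec[len(OP_TYPES) + 1] = 1.0
    return vec


class ElmanRNN:
    """Minimal Elman RNN, h_t = tanh(W x_t + U h_{t-1}), with fixed untrained weights."""

    def __init__(self, in_dim, hid_dim, scale=0.1):
        self.W = [[scale * ((i + j) % 3 - 1) for j in range(in_dim)] for i in range(hid_dim)]
        self.U = [[scale * ((i * j) % 3 - 1) for j in range(hid_dim)] for i in range(hid_dim)]
        self.hid_dim = hid_dim

    def run(self, xs):
        h = [0.0] * self.hid_dim
        states = []
        for x in xs:
            h = [
                math.tanh(sum(w * v for w, v in zip(self.W[k], x))
                          + sum(u * p for u, p in zip(self.U[k], h)))
                for k in range(self.hid_dim)
            ]
            states.append(h)
        return states


def instance_feature_vector(events, hid_dim=8):
    tagged = segment_and_tag(events)
    op_vectors = [vectorize(t) for t in tagged]        # behavior operation vector set
    encoder = ElmanRNN(len(op_vectors[0]), hid_dim)
    interaction_feats = encoder.run(op_vectors)        # interaction feature vector set
    merger = ElmanRNN(hid_dim, hid_dim)
    return merger.run(interaction_feats)[-1]           # final state = instance vector


session = [Interaction("browse", 10, 30), Interaction("add_to_cart", 45, 5),
           Interaction("browse", 60, 20), Interaction("submit_order", 90, 10)]
vec = instance_feature_vector(session)
```

The tags are encoded as their own one-hot slots so the RNN can see where the behavior sequence begins and ends, mirroring the initial and termination tags in the text.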
In the embodiment of the invention, the target behavior instance is obtained in the following manner.
Acquiring a sample instance set for training the advertisement user behavior recognition model, and determining a candidate instance set to be screened based on the sample instance set;
obtaining a multi-task integrated model for performing instance screening on the candidate instance set, wherein the multi-task integrated model is composed of a plurality of subtask networks matched to a plurality of evaluation indexes, each evaluation index corresponding to one subtask network;
loading the candidate instance set into each of the plurality of subtask networks, each subtask network performing instance screening on the candidate instance set to obtain an instance screening result of the candidate instance set under its evaluation index;
preprocessing the sample instance set based on the instance screening results of the candidate instance set under the evaluation indexes;
and determining the target behavior instance based on the preprocessed sample instance set.
In an exemplary embodiment of the invention, the server first retrieves from a data repository a large amount of sample data related to advertising user behavior, including records of users' various behaviors on the advertising platform. Based on this sample data, the server may apply predefined rules or algorithms to select potentially valuable candidate instances as the basis for further screening and processing. Assume the server obtains a sample instance set of 100,000 user clicks, browses, purchases and similar actions on the advertising platform. Using data cleansing and screening algorithms, the server removes duplicate, anomalous or insignificant instances and arrives at a more representative and relevant candidate instance set of 50,000 instances. The server then loads a pre-trained multi-task integrated model composed of several subtask networks, each corresponding to a specific evaluation index that assesses the quality and value of candidate instances from a different perspective. Here the model contains three subtask networks, corresponding to click-through rate (CTR), conversion rate (CVR) and user satisfaction (US), each trained to predict how candidate instances perform on its evaluation index. The server inputs the candidate instance set into every subtask network, and each network screens and scores the candidate instances according to its own evaluation index, so every candidate instance is evaluated under multiple criteria. Concretely, the server loads the 50,000 candidate instances into the three subtask networks.
The first subtask network scores and ranks the candidate instances by CTR, the second by CVR and the third by US, so each candidate instance receives a screening result comprising three evaluation index scores. The server then preprocesses the original sample instance set according to these screening results, which may include deleting instances that fail the requirements and weighting the remaining ones, so that the final target behavior instances better reflect the actual behavior of advertising users. In this example the server preprocesses the original 100,000 sample instances: it first deletes instances that score too low under any evaluation index and then assigns each remaining instance a weight based on its scores under the different indexes, improving the quality of the sample instance set. Finally, the server determines the target behavior instances from the preprocessed sample instance set; these serve as training data for the advertisement user behavior recognition model to improve its accuracy and performance. After preprocessing, the server selects 20,000 high-quality target behavior instances from the original 100,000. These instances perform well on click-through rate, conversion rate and user satisfaction and are therefore selected as training data for the advertisement user behavior recognition model.
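The multi-index screening and the "delete if too low on any index, then weight" preprocessing can be sketched as below. This is a minimal illustration, not the patent's model: the three subtask "networks" are hypothetical scoring functions standing in for trained networks, the per-index floors are assumed values, the mean score as weight is an assumed weighting rule, and for simplicity the candidate set and the sample set being preprocessed are taken to be the same instances.

```python
from statistics import mean


# Hypothetical subtask "networks": simple scoring functions standing in for trained models.
def ctr_net(inst):
    return inst["clicks"] / max(inst["impressions"], 1)


def cvr_net(inst):
    return inst["conversions"] / max(inst["clicks"], 1)


def us_net(inst):
    return inst["rating"] / 5.0


SUBTASK_NETWORKS = {"CTR": ctr_net, "CVR": cvr_net, "US": us_net}


def screen(candidates, networks=SUBTASK_NETWORKS):
    """Each subtask network scores every candidate under its own evaluation index."""
    return {inst["id"]: {name: net(inst) for name, net in networks.items()}
            for inst in candidates}


def preprocess(samples, results, min_scores):
    """Delete instances below the floor on ANY index; weight survivors by mean score."""
    kept = []
    for inst in samples:
        scores = results.get(inst["id"])
        if scores is None:
            continue  # assumption: instances that were never screened are not kept
        if all(scores[k] >= floor for k, floor in min_scores.items()):
            kept.append({**inst, "weight": mean(list(scores.values()))})
    return kept


samples = [
    {"id": 1, "impressions": 100, "clicks": 10, "conversions": 2, "rating": 4},
    {"id": 2, "impressions": 100, "clicks": 1, "conversions": 0, "rating": 2},
    {"id": 3, "impressions": 200, "clicks": 30, "conversions": 6, "rating": 5},
]
results = screen(samples)
targets = preprocess(samples, results, {"CTR": 0.05, "CVR": 0.1, "US": 0.5})
print([t["id"] for t in targets])  # [1, 3]
```

Instance 2 is removed because its CTR (0.01) falls below the assumed 0.05 floor, even though only one index fails.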
In the embodiment of the invention, the sample instance set comprises a main sample instance set, and determining the candidate instance set to be screened based on the sample instance set can be implemented as follows.
Performing instance sampling on the main sample instance set to obtain sampled sample instances matched with the main sample instance set;
and taking the sampled sample instances as sample instances of a candidate instance set, and determining the candidate instance set to be screened based on those sample instances.
In an exemplary embodiment of the invention, the server stores a large amount of user behavior data, organized into several sample instance sets for different model training and data analysis tasks. The main sample instance set is the core data set, containing a large volume of user behavior data for subsequent data screening and model training. For example, the main sample instance set on the server contains all user behavior data generated on the advertising platform over the past year, about 5 million records in total, covering clicks, browses, purchases, comments and other actions, together with the corresponding timestamps, advertisement IDs, user IDs and similar fields. Because the main sample instance set can be very large, processing it in full would consume significant computing resources and time, so the server uses sampling to extract a representative subset as sampled sample instances. Here the server randomly samples 100,000 records from the main sample instance set; these cover different time periods, user groups and advertisement types and therefore represent the characteristics of the whole set reasonably well. The server uses these 100,000 sampled instances as the initial data of the candidate instance set. Because sampling can introduce errors or biases, the candidate instance set must be further screened and processed so that the resulting data set is of higher quality and more representative.
The server therefore applies data cleaning and preprocessing to remove duplicate, anomalous and irrelevant records, ensuring the data is accurate and consistent, and then further screens and sorts the candidates according to business requirements and data characteristics, finally arriving at a candidate instance set of 50,000 instances to be screened. Although the number of instances is smaller, their quality and representativeness are markedly higher, providing more reliable support for subsequent model training and analysis.
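The sample-then-clean step can be sketched as follows. This is a minimal illustration under assumptions: uniform random sampling without replacement stands in for the unspecified sampling technique, the record fields (`user_id`, `ad_id`, `action`, `ts`) and the set of relevant actions are hypothetical, and "duplicate" is taken to mean an identical key tuple.

```python
import random

RELEVANT_ACTIONS = {"click", "browse", "purchase", "comment"}  # assumed action vocabulary


def sample_main_set(main_set, k, seed=0):
    """Uniform random sampling without replacement from the main sample instance set."""
    rng = random.Random(seed)
    return rng.sample(main_set, min(k, len(main_set)))


def clean(records):
    """Drop records missing key fields, irrelevant actions, and exact duplicates."""
    seen, out = set(), []
    for r in records:
        key = (r.get("user_id"), r.get("ad_id"), r.get("action"), r.get("ts"))
        if None in key or r["action"] not in RELEVANT_ACTIONS or key in seen:
            continue
        seen.add(key)
        out.append(r)
    return out


main_set = [{"user_id": i % 500, "ad_id": i % 7, "action": "click", "ts": i}
            for i in range(10_000)]
sampled = sample_main_set(main_set, 1_000)
candidates = clean(sampled)
```

A fixed seed keeps the sample reproducible; in the text's terms, `main_set`, `sampled` and `candidates` correspond to the 5 million, 100,000 and 50,000-instance stages at a smaller scale.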
In the embodiment of the invention, the main sample instance set comprises a plurality of sample instance data sets, each containing at least one sample instance, and performing instance sampling on the main sample instance set to obtain sampled sample instances matched with it can be implemented as follows.
Acquiring a sampling number for instance sampling;
extracting, in turn, the sampling number of sample instances from each sample instance data set, to obtain a sampled data set corresponding to each sample instance data set;
and determining the sampled sample instances matched with the main sample instance set based on the sampled data sets corresponding to the sample instance data sets.
In an exemplary embodiment of the present invention, the server stores a large amount of user behavior data from the advertising system, organized into several sample instance data sets, each representing a particular type of user behavior or advertising campaign, such as click data for a specific campaign or browsing data for a specific period. For example, the main sample instance set includes a click data set for advertisement A, a purchase data set for advertisement B, a first-quarter browsing data set and so on, each containing many concrete sample instances, such as individual user click records for advertisement A or purchase records for advertisement B. Before sampling, the server must determine the sampling number, which is generally set by weighing business requirements, computing resource limits and data characteristics. Here the server decides, based on business requirements, to extract 1,000 sample instances from each sample instance data set; this number keeps the samples representative without placing excessive computational load on the server. The server then extracts that number of sample instances from each data set in turn, either randomly or according to a specific rule or policy: it randomly extracts 1,000 click records from the advertisement A click data set, 1,000 purchase records from the advertisement B purchase data set and 1,000 browsing records from the first-quarter browsing data set. The extracted instances form a sampled data set for each sample instance data set.
The server then merges these sampled data sets into a single set of sampled sample instances matched with the main sample instance set, which becomes the input for subsequent data processing and model training. In this example, the 1,000 click records for advertisement A, 1,000 purchase records for advertisement B and 1,000 first-quarter browsing records are combined into a sampled set of 3,000 instances. This set is intended to stay consistent with the main sample instance set in data and feature distribution and can therefore serve as reliable input for subsequent data processing and model training.
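The per-data-set sampling (a fixed sampling number drawn from each data set, then merged) can be sketched as below. This assumes random extraction, one of the two options the text allows, and uses hypothetical data set names and tuple records for illustration. Data sets smaller than the sampling number are taken whole, which is an assumed edge-case policy.

```python
import random


def sample_per_dataset(datasets, n_per_set, seed=0):
    """Draw n_per_set instances from each sample instance data set, then merge them."""
    rng = random.Random(seed)
    per_set = {
        name: rng.sample(instances, min(n_per_set, len(instances)))
        for name, instances in datasets.items()
    }
    merged = [inst for name in datasets for inst in per_set[name]]
    return per_set, merged


datasets = {
    "ad_A_clicks": [("click", "A", i) for i in range(5_000)],
    "ad_B_purchases": [("purchase", "B", i) for i in range(3_000)],
    "q1_browses": [("browse", None, i) for i in range(8_000)],
}
per_set, merged = sample_per_dataset(datasets, 1_000)
print(len(merged))  # 3000
```

Because every data set contributes the same count, small data sets are over-represented relative to their share of the main set; if proportional coverage matters, the sampling number could instead be set per data set.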
In the embodiment of the invention, the sample instance set comprises a main sample instance set, and preprocessing the sample instance set based on the instance screening results of the candidate instance set under the evaluation indexes can be implemented as follows.
Determining, based on the instance screening results of the candidate instance set under the evaluation indexes, an evaluation index to be processed that requires preprocessing;
and preprocessing the sample instances in the main sample instance set based on the evaluation index to be processed.
In an exemplary embodiment of the present invention, the server in an advertising data analysis system stores a large amount of user behavior data organized into a main sample instance set. This set contains detailed data on advertising campaigns, user behavior and related aspects and forms the basis of subsequent data analysis and model training. For example, the main sample instance set contains user behavior data, such as clicks, browses and purchases, for all advertising campaigns over the past year, with each record detailing the user ID, advertisement ID, behavior type, timestamp and similar fields. Based on the screening results of the candidate instance set under the evaluation indexes, the server analyzes which indexes show poor or anomalous data and thereby determines which evaluation indexes need preprocessing. For instance, the server inputs the candidate instance set into the multi-task integrated model and obtains screening results under click-through rate (CTR), conversion rate (CVR) and user satisfaction (US). The analysis shows that under CVR the candidate instance set performs poorly overall and contains a large number of outliers, so the server designates CVR as the evaluation index to be processed. The server then preprocesses the sample instances in the main sample instance set with respect to that index, aiming to improve data quality and reduce the influence of anomalous values on subsequent model training. For CVR, the server first performs outlier detection and removes values exceeding a set threshold.
It then fills in missing CVR values using statistical methods (e.g., median or mean imputation), and finally adjusts the weights of sample instances according to the distribution of CVR values to keep the data distribution balanced. After these steps, the data quality of the main sample instance set under the CVR index is markedly improved.
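The three CVR preprocessing steps (threshold-based outlier removal, median imputation, distribution-based reweighting) can be sketched as follows. The threshold (CVR outside [0, 1] counts as exceeding it), the four equal-width bins and the inverse-frequency weighting rule, which makes the weights sum to the number of instances, are all assumptions chosen for illustration; the text does not specify them.

```python
from collections import Counter
from statistics import median


def preprocess_cvr(instances, upper=1.0, n_bins=4):
    """Sketch of CVR preprocessing: threshold outliers, median-fill gaps, rebalance weights."""
    # 1. Outlier removal: a CVR outside [0, upper] is treated as exceeding the threshold.
    kept = [dict(x) for x in instances if x["cvr"] is None or 0.0 <= x["cvr"] <= upper]
    # 2. Median imputation for missing CVR values.
    fill = median([x["cvr"] for x in kept if x["cvr"] is not None])
    for x in kept:
        if x["cvr"] is None:
            x["cvr"] = fill

    # 3. Inverse-frequency weights over equal-width CVR bins, so sparse bins count more.
    def bin_of(v):
        return min(int(v / upper * n_bins), n_bins - 1)

    counts = Counter(bin_of(x["cvr"]) for x in kept)
    for x in kept:
        x["weight"] = len(kept) / (len(counts) * counts[bin_of(x["cvr"])])
    return kept


rows = [{"id": 1, "cvr": 0.02}, {"id": 2, "cvr": 0.03}, {"id": 3, "cvr": None},
        {"id": 4, "cvr": 0.9}, {"id": 5, "cvr": 1.7}, {"id": 6, "cvr": 0.04}]
clean_rows = preprocess_cvr(rows)
```

Here row 5 is dropped as out of range, row 3 receives the median of the observed values (0.035), and the lone high-CVR row 4 gets a larger weight than the four low-CVR rows.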
In the embodiment of the invention, the plurality of evaluation indexes include a target evaluation index, and the instance screening result under the target evaluation index includes an instance screening sub-result for each instance in the candidate instance set. Determining the evaluation index to be processed based on the instance screening results of the candidate instance set under the evaluation indexes can then be implemented as follows.
Taking, in the candidate instance set, each sample instance whose instance screening sub-result under the target evaluation index indicates an abnormal state as an abnormal sample instance;
and when the number of abnormal sample instances meets a preprocessing condition, taking the target evaluation index as the evaluation index to be processed.
In an exemplary embodiment of the present invention, the server in an advertising data analysis system uses several evaluation indexes to measure advertisement effectiveness and user behavior, one of which is designated the "target evaluation index" because it is critical to business decisions or model performance. For example, among click-through rate (CTR), conversion rate (CVR) and user satisfaction (US), CVR is set as the target evaluation index because conversion rate bears directly on the economic return of advertising campaigns. The server screens each sample instance in the candidate instance set according to its performance under the target evaluation index and generates a corresponding screening sub-result for use in subsequent analysis and preprocessing decisions. Concretely, the server calculates the CVR of each sample instance and judges, using business rules or statistical methods, whether the value is abnormal; for example, a CVR far below the historical or industry average is treated as abnormal, and a corresponding screening sub-result is generated. Based on these sub-results, the server identifies the instances that behave abnormally and marks them as abnormal sample instances; such anomalies may stem from data errors, abnormal behavior or other causes. In the advertisement effectiveness analysis, for instance, the server finds that several sample instances in the candidate instance set have extremely low CVR values, far below normal, and marks them as abnormal.
When the number of abnormal sample instances reaches a preset threshold, or another preprocessing condition is met, the server concludes that the data for the target evaluation index is problematic and needs preprocessing. In this example, once the count of abnormal instances exceeds the threshold, the server designates CVR as the evaluation index to be processed and starts the corresponding preprocessing flow, such as data cleaning, outlier handling and data reconstruction. This ensures that high-quality data is used for subsequent data analysis and model training.
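The abnormal-instance check and the count-based preprocessing condition can be sketched as below. The abnormality rule ("below 20% of the historical average CVR") and the count threshold are hypothetical parameters; the text names the idea but not the values, and it uses both "reaches" and "exceeds", so the `>=` comparison here is an assumption.

```python
def screening_sub_results(candidates, historical_avg, ratio=0.2):
    """Sub-result per instance: 'abnormal' if CVR is below ratio * historical average."""
    cutoff = ratio * historical_avg
    return {c["id"]: "abnormal" if c["cvr"] < cutoff else "normal" for c in candidates}


def index_needs_preprocessing(sub_results, max_abnormal):
    """Preprocessing condition (assumed): the abnormal count reaches max_abnormal."""
    abnormal = sorted(i for i, r in sub_results.items() if r == "abnormal")
    return len(abnormal) >= max_abnormal, abnormal


candidates = [{"id": 1, "cvr": 0.05}, {"id": 2, "cvr": 0.001}, {"id": 3, "cvr": 0.002},
              {"id": 4, "cvr": 0.06}, {"id": 5, "cvr": 0.0}]
subs = screening_sub_results(candidates, historical_avg=0.04)
flag, abnormal_ids = index_needs_preprocessing(subs, max_abnormal=3)
pending = ["CVR"] if flag else []  # CVR becomes the evaluation index to be processed
```

Three of the five candidates fall below the assumed cutoff of 0.008, which meets the threshold of 3, so CVR is flagged.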
In the embodiment of the invention, the sample instance set comprises an auxiliary sample instance set, and determining the candidate instance set to be screened based on the sample instance set can be implemented as follows.
Taking the sample instances in the auxiliary sample instance set as sample instances of a candidate instance set, and determining the candidate instance set to be screened based on those sample instances.
In an embodiment of the present invention, illustratively, the server in a data analysis system typically holds several data sets, including a main sample instance set and an auxiliary sample instance set. The auxiliary set usually contains additional, non-core data that supplements the main set and improves the accuracy and completeness of the analysis. In advertising data analysis, the main sample instance set contains core advertising behavior data such as user clicks, browses and purchases, while the auxiliary sample instance set may contain other user information, such as search history, browsing preferences and geographic location. Although this information does not directly measure advertising effectiveness, it provides context that helps the server understand user behavior and advertising effects more accurately. The server therefore takes the auxiliary sample instances into account when determining the candidate instance set to be screened, because even non-core data may carry information valuable to the analysis. For example, before model training the server first takes all sample instances in the auxiliary sample instance set (e.g., users' search histories and browsing preferences) as the initial sample instances of the candidate instance set; these are then screened and processed together with sample instances from the main sample instance set.
The server then screens and processes the candidate instance set according to certain rules or algorithms, which may include data cleaning, outlier detection and data transformation, with the aim of removing invalid or low-quality instances while keeping high-quality, valuable ones. Here, the server first removes instances containing invalid data or missing critical information (e.g., an empty user ID or a nonexistent advertisement ID), then uses statistical methods to detect and remove outliers (e.g., CVR values far above or below normal levels). The resulting cleaned and optimized candidate instance set serves as input data for subsequent model training and data analysis.
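Seeding the candidate set with auxiliary instances and then cleaning it can be sketched as follows. The record fields are hypothetical, and the interquartile-range (IQR) rule is an assumed stand-in for the unspecified "statistical methods"; auxiliary instances that carry no CVR value are kept by the outlier step rather than discarded.

```python
from statistics import quantiles


def seed_candidates(aux_set, main_set):
    """All auxiliary instances become initial candidates, screened alongside main-set ones."""
    return list(aux_set) + list(main_set)


def drop_invalid(cands, known_ad_ids):
    """Remove instances with an empty user ID or an advertisement ID that does not exist."""
    return [c for c in cands
            if c.get("user_id") not in (None, "") and c.get("ad_id") in known_ad_ids]


def drop_iqr_outliers(cands, field="cvr", k=1.5):
    """IQR rule (assumed method); instances without the field are kept as-is."""
    values = [c[field] for c in cands if c.get(field) is not None]
    if len(values) < 2:
        return list(cands)
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [c for c in cands if c.get(field) is None or lo <= c[field] <= hi]


aux = [{"id": "s1", "user_id": "u1", "ad_id": "A", "search": "beach hotel"},
       {"id": "s2", "user_id": "", "ad_id": "A", "search": "flights"}]
main = [{"id": i, "user_id": f"u{i}", "ad_id": "A", "cvr": v}
        for i, v in enumerate([0.02, 0.03, 0.025, 0.035, 0.03, 0.5])]
main.append({"id": 99, "user_id": "u99", "ad_id": "Z", "cvr": 0.03})
cands = drop_iqr_outliers(drop_invalid(seed_candidates(aux, main), known_ad_ids={"A", "B"}))
```

In this toy run, `s2` (empty user ID) and `99` (unknown advertisement `Z`) are removed as invalid, and the 0.5 CVR is removed as an outlier.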
In the embodiment of the invention, the sample instance set comprises an auxiliary sample instance set, the plurality of evaluation indexes include a target evaluation index, and the instance screening result under the target evaluation index includes an instance screening sub-result for each sample instance in the candidate instance set. Preprocessing the sample instance set based on the instance screening results of the candidate instance set under the evaluation indexes can then be implemented as follows.
Taking, in the candidate instance set, each sample instance whose instance screening sub-result under the target evaluation index indicates an abnormal state as an abnormal sample instance;
and preprocessing the abnormal sample instances based on the target evaluation index.
In an exemplary embodiment of the present invention, the server in an advertising data analysis system maintains a sample instance set comprising a main sample instance set and an auxiliary sample instance set. The auxiliary set holds additional information related to the advertising campaign, such as users' browsing histories and search records, which can help analyze advertising effectiveness more accurately. For example, the main sample instance set contains core data such as advertisement clicks, impressions and conversions, while the auxiliary sample instance set contains users' search keywords, browsed pages, geographic locations and similar information. Although this information does not enter the evaluation of advertising effectiveness directly, it supplements the server's understanding of user behavior. When analyzing advertising effectiveness, the server uses several evaluation indexes, such as click-through rate (CTR), conversion rate (CVR) and user satisfaction (US), and designates CVR as the target evaluation index because conversion rate bears directly on the economic return of advertising campaigns. The server screens each sample instance in the candidate instance set under the target evaluation index and generates a corresponding screening sub-result for subsequent analysis and preprocessing decisions.
Concretely, the server calculates each instance's CVR and judges, against a set threshold or business rule, whether the value is abnormal; a CVR far below normal is treated as abnormal, and a corresponding screening sub-result is generated. Based on these sub-results, the server identifies and marks abnormal sample instances, which may stem from data errors, abnormal behavior or other causes. In this example, the server finds several instances in the candidate instance set with extremely low CVR values, far below normal, and marks them as abnormal. It then preprocesses these abnormal sample instances based on the target evaluation index to improve data quality and reduce the influence of outliers on subsequent model training or data analysis. For the low-CVR instances identified above, the server may adopt various strategies, such as removing the instances (if they are deemed invalid or misleading) or replacing the abnormal values using a more robust statistical method such as median imputation. This ensures that more accurate and reliable data is used for subsequent analysis and model training.
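The two preprocessing strategies named above for abnormal instances (removal, or median replacement) can be sketched as a single function with a strategy switch. The set of abnormal IDs is assumed to come from an earlier screening step, and computing the median over the normal instances only is an assumed design choice so that the abnormal values do not influence the fill value.

```python
from statistics import median


def preprocess_abnormal(cands, abnormal_ids, field="cvr", strategy="median"):
    """Handle abnormal instances under the target evaluation index."""
    if strategy == "drop":
        # Treat abnormal instances as invalid or misleading and remove them.
        return [c for c in cands if c["id"] not in abnormal_ids]
    if strategy == "median":
        # Replace abnormal values with the median of the normal instances.
        fill = median([c[field] for c in cands if c["id"] not in abnormal_ids])
        return [{**c, field: fill} if c["id"] in abnormal_ids else c for c in cands]
    raise ValueError(f"unknown strategy: {strategy!r}")


cands = [{"id": 1, "cvr": 0.05}, {"id": 2, "cvr": 0.001},
         {"id": 3, "cvr": 0.04}, {"id": 4, "cvr": 0.06}]
dropped = preprocess_abnormal(cands, {2}, strategy="drop")
filled = preprocess_abnormal(cands, {2}, strategy="median")
```

Median replacement keeps the instance (and its auxiliary context) in the training set, while dropping is the safer choice when the record itself is suspected to be erroneous.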
In the embodiment of the invention, the target evaluation index among the plurality of evaluation indexes comprises a plurality of evaluation sub-indexes, and the subtask network matched with the target evaluation index is composed of a plurality of subtask units matched with those evaluation sub-indexes, one subtask unit per evaluation sub-index. Loading the candidate instance set into each of the plurality of subtask networks, with each subtask network performing instance screening on the candidate instance set to obtain the instance screening result under each evaluation index, can then be implemented as follows.
Loading the candidate instance set into each of the plurality of subtask units, each subtask unit performing instance screening on the candidate instance set to obtain an instance screening result of the candidate instance set under its evaluation sub-index;
determining the instance screening result of the candidate instance set under the target evaluation index based on the instance screening results under the evaluation sub-indexes;
and determining the instance screening results of the candidate instance set under the evaluation indexes based on its instance screening result under the target evaluation index.
In an exemplary embodiment of the present invention, in the advertisement data analysis system, the server not only measures advertisement effectiveness using a plurality of evaluation indexes, but also further subdivides certain key evaluation indexes (such as the target evaluation index) into a plurality of evaluation sub-indexes, so that the performance of an advertisement can be evaluated more fully across its various aspects. For example, in advertisement effectiveness analysis, the target evaluation index "advertisement quality" is subdivided into evaluation sub-indexes such as "advertisement relevance" (the degree to which the advertisement matches the user's search intent), "advertisement creative" (the appeal of the advertisement content), and "advertisement loading speed" (how quickly the advertisement loads). The server builds a subtask network matched with the target evaluation index. The network is composed of a plurality of subtask units, each dedicated to processing the data of one evaluation sub-index. This division of labor lets each subtask unit focus on its particular evaluation sub-index, improving processing efficiency and accuracy. In the advertisement data analysis system, the server constructs a subtask network matched with the "advertisement quality" target evaluation index. The network includes three subtask units: one processes "advertisement relevance" data, one processes "advertisement creative" data, and one processes "advertisement loading speed" data. The server loads the candidate instance set into each subtask unit in the subtask network. Each subtask unit receives a portion of the candidate instance data and performs instance screening processing according to its evaluation sub-index, producing the instance screening result under that evaluation sub-index.
The server loads the candidate advertisement instance set into the three subtask units. The first subtask unit receives candidate advertisement instance data and screens it according to the "advertisement relevance" evaluation sub-index, identifying advertisements that do not match the user's search intent. The second subtask unit screens out advertisements with poor creatives according to the "advertisement creative" evaluation sub-index. The third subtask unit screens out advertisements that load too slowly according to the "advertisement loading speed" evaluation sub-index. The server collects the instance screening results of each subtask unit under its evaluation sub-index and, from these results, determines the instance screening result of the candidate instance set under the target evaluation index. This process may involve weighting, aggregating, or otherwise jointly analyzing the results of the multiple evaluation sub-indexes. For example, after collecting the screening results of the three subtask units under "advertisement relevance", "advertisement creative", and "advertisement loading speed", the server integrates them according to certain weights or rules to obtain an overall score or status for each candidate advertisement under the "advertisement quality" target evaluation index. Based on this score or status, the server can determine which advertisements are of high quality and which need further optimization or adjustment. Because the target evaluation index is only one of the plurality of evaluation indexes, the server typically also needs to determine the screening results of the candidate instance set under the other evaluation indexes. This may involve combining or correlating the screening result under the target evaluation index with the results under the other evaluation indexes.
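A minimal Python sketch of the three subtask units and their weighted aggregation into the "advertisement quality" target evaluation index is shown below. Everything concrete here is an illustrative assumption rather than the patent's method: the `Ad` fields, the 0-to-1 scoring functions, the 3000 ms loading-time cap, the 0.5/0.3/0.2 weights, and the 0.6 pass score. The source states only that sub-index results are combined "according to a certain weight or rule".

```python
from dataclasses import dataclass


@dataclass
class Ad:
    ad_id: str
    relevance: float  # hypothetical 0..1 match between ad and user search intent
    creative: float   # hypothetical 0..1 creative-appeal score
    load_ms: float    # hypothetical measured loading time in milliseconds


# Each subtask unit scores one evaluation sub-index on an assumed 0..1 scale.
def relevance_unit(ad: Ad) -> float:
    return ad.relevance


def creative_unit(ad: Ad) -> float:
    return ad.creative


def load_speed_unit(ad: Ad, max_ms: float = 3000.0) -> float:
    # Faster loads score higher; loads at or above max_ms score 0
    return max(0.0, 1.0 - ad.load_ms / max_ms)


SUB_UNITS = {"relevance": relevance_unit, "creative": creative_unit, "load_speed": load_speed_unit}
WEIGHTS = {"relevance": 0.5, "creative": 0.3, "load_speed": 0.2}  # assumed weights


def screen_quality(ads, pass_score=0.6):
    results = {}
    for ad in ads:
        # Instance screening result under each evaluation sub-index
        sub = {name: unit(ad) for name, unit in SUB_UNITS.items()}
        # Weighted aggregation into the target evaluation index "advertisement quality"
        score = sum(WEIGHTS[name] * value for name, value in sub.items())
        results[ad.ad_id] = {"sub": sub, "quality": score, "high_quality": score >= pass_score}
    return results
```

Keeping the per-sub-index scores in the result, rather than only the final score, lets the server see which aspect (relevance, creative, or loading speed) caused a low-quality verdict.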
After determining the screening result of the candidate advertisement instance set under the "advertisement quality" target evaluation index, the server also considers other evaluation indexes such as click-through rate (CTR) and conversion rate (CVR). By jointly analyzing the advertisement quality screening result and the results under these evaluation indexes, the server obtains a more comprehensive screening result for the candidate advertisement instance set. The server can then optimize the advertising strategy and improve advertising effectiveness based on evaluation results across multiple dimensions.
In the embodiment of the invention, the sample instance set comprises sample instance data sets for a plurality of user group classifications, including a first user group classification sample instance data set and a second user group classification sample instance data set that is subordinate to the first user group classification sample instance data set.
Acquiring a user group association rule associated with the first user group classification sample instance data set and the second user group classification sample instance data set, wherein the user group association rule represents the correspondence between the first and second user group classification sample instance data sets;
Summarizing, based on the user group association rule, the user group analysis results of the candidate instance set under each evaluation index to obtain user group analysis data matched with the second user group classification sample instance data set;
Summarizing, based on the user group association rule, the user group analysis results of the candidate instance set under each evaluation index to obtain user group analysis data matched with the first user group classification sample instance data set.
In an exemplary embodiment of the present invention, in an advertising data analysis system, the sample instance set maintained by the server contains sample instance data sets for a plurality of user group classifications. These classifications are partitioned according to user characteristics, behavior, or interests, so that advertisements can be delivered and their effectiveness analyzed more accurately. For example, in an e-commerce advertising system, the sample instance set contains sample instance data sets for user group classifications such as "young female user group", "middle-aged male user group", and "student user group". Each user group classification sample instance data set contains historical data of the corresponding user group, such as advertisement clicks, browsing, and purchases. In some cases the user group classifications are hierarchical: a large user group classification can be further subdivided into smaller sub-user group classifications, which allows the behavior and needs of different user groups to be analyzed more finely. In the e-commerce advertising system, the "young female user group" is a large user group classification that can be further subdivided into two sub-user group classifications, "young female fashion seekers" and "young female housewives", which correspond to different shopping preferences and advertising needs. To associate the screening results of the candidate instance set with the user group classifications, the server needs to obtain user group association rules. These rules define the correspondence between different user group classifications for user group analysis.
In the e-commerce advertising system, the server obtains the association rules between the "young female user group" and its two sub-user groups, "young female fashion seekers" and "young female housewives". These rules may include conditions or logic, such as age ranges, shopping preferences, and browsing behavior, for determining whether a user belongs to a given sub-user group. After the candidate instance set has been screened under each evaluation index, the server associates the screening results with the user group classifications according to the user group association rules and aggregates them into user group analysis data matched with a specific user group classification (such as the second user group classification). For example, the server screens the candidate advertisement instance set under each evaluation index to obtain each advertisement's performance on indexes such as click-through rate and conversion rate, and then associates these screening results with the user group classifications according to the user group association rules. For the "young female fashion seekers" sub-user group, the server aggregates the performance data, under each evaluation index, of all advertisements matched with that sub-user group to form a user group analysis report for it. The same flow is then applied to the first user group classification (that is, the larger user group classification): after completing the analysis of the two sub-user groups "young female fashion seekers" and "young female housewives", the server also organizes and aggregates the results for the larger "young female user group".
According to the user group association rules, the analysis data of the two sub-user groups are integrated into a comprehensive user group analysis report for the "young female user group". This report can help advertisers more fully understand the advertising needs of this user group and how advertisements perform within it.
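The two-level summarization (sub-user groups first, then the parent group) can be sketched in Python as follows. This is a minimal illustration, assuming the association rules can be written as membership predicates over hypothetical user attributes (`gender`, `age`, `pref`) and that the sub-user groups are mutually exclusive; the `RULES` table, the age range, and the event fields are invented for illustration and are not taken from the patent.

```python
from collections import defaultdict

METRICS = ("impressions", "clicks", "conversions")

# Hypothetical user group association rules: sub-group -> (parent group, membership predicate).
# The predicates are assumed mutually exclusive so the parent roll-up does not double count.
RULES = {
    "young_female_fashion_seekers": (
        "young_female_user_group",
        lambda u: u["gender"] == "F" and 18 <= u["age"] <= 35 and u["pref"] == "fashion",
    ),
    "young_female_housewives": (
        "young_female_user_group",
        lambda u: u["gender"] == "F" and 18 <= u["age"] <= 35 and u["pref"] == "home",
    ),
}


def aggregate(events, users):
    """events: dicts with "user_id" plus METRICS counts; users: user_id -> attribute dict."""
    # Second (sub) user group classification: apply each rule to each event's user
    sub = defaultdict(lambda: dict.fromkeys(METRICS, 0))
    for e in events:
        user = users[e["user_id"]]
        for group, (_, belongs) in RULES.items():
            if belongs(user):
                for k in METRICS:
                    sub[group][k] += e[k]

    # First (parent) user group classification: roll sub-group totals up via the rules
    parent = defaultdict(lambda: dict.fromkeys(METRICS, 0))
    for group, totals in sub.items():
        parent_name = RULES[group][0]
        for k, v in totals.items():
            parent[parent_name][k] += v
    return dict(sub), dict(parent)
```

Rates such as CTR (clicks / impressions) can then be derived per group from the summed counts; summing counts first and dividing afterwards avoids averaging ratios across groups of different sizes.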
In the embodiment of the present invention, the following implementation manner is also provided.
Determining quantitative index information at a plurality of multi-dimensional data analysis levels based on the instance screening results of the candidate instance set under each evaluation index;
Displaying a visual interaction interface, wherein the visual interaction interface comprises visual components corresponding to the quantitative index information at the plurality of multi-dimensional data analysis levels.
In the embodiment of the present invention, after the candidate instance set (such as advertisement instances) has been screened under each evaluation index (such as click-through rate and conversion rate), the server analyzes the screening results further to determine quantitative index information across several different dimensions or levels. This quantitative index information helps a user understand the performance and characteristics of the candidate instances more fully. Suppose the server in an e-commerce advertising system screens a group of candidate advertisement instances and obtains their performance under evaluation indexes such as click-through rate and conversion rate. The server then analyzes the screening results along multiple dimensions, such as region, time, and user profile. For example, the server may determine how an advertisement's click-through rate is distributed across provinces, or how its conversion rate fluctuates over different periods of the day. This multi-dimensional quantitative index information is extracted for subsequent visual presentation. So that the user can intuitively understand the candidate instances' performance under multiple evaluation indexes together with the multi-dimensional quantitative index information, the server generates a visual interaction interface and presents the information in it. The interface typically contains visualization components such as charts and images, through which the user can interact with the data to explore and analyze it further.
In the e-commerce advertising system, the server generates a visual interaction interface that displays the screening results of candidate advertisements under multiple evaluation indexes together with the multi-dimensional quantitative index information. The interface may include several different visualization components, such as a bar chart showing the click-through rate distribution of advertisements across provinces and a line chart showing how their conversion rate fluctuates over the periods of a day. The user can interact with these components by clicking, dragging, and similar operations to view data details in different dimensions and thereby understand advertising effectiveness and characteristics in more depth. In this scenario, the server first determines quantitative index information at a plurality of multi-dimensional data analysis levels based on the screening results of the candidate instance set under the plurality of evaluation indexes, then generates a visual interaction interface and displays the corresponding visual components in it. Through this interface, the user can intuitively understand the performance and characteristics of the candidate instances and carry out further data analysis and exploration.
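The data behind the bar chart (CTR by province) and the line chart (CVR by hour of day) described above can be computed with a small group-by, sketched below in Python. This is a minimal illustration only: the record fields `province`, `hour`, `impressions`, `clicks` and `conversions` and the helper names `rate_by` and `quantize` are hypothetical, and rendering the charts is left to whatever visualization layer the interface uses.

```python
from collections import defaultdict


def rate_by(records, dim, num, den):
    """Group records by the field `dim` and return sum(num) / sum(den) for each group."""
    totals = defaultdict(lambda: [0, 0])
    for r in records:
        totals[r[dim]][0] += r[num]
        totals[r[dim]][1] += r[den]
    # Groups with a zero denominator get a rate of 0.0 instead of raising
    return {key: (n / d if d else 0.0) for key, (n, d) in totals.items()}


def quantize(records):
    # Two quantitative indexes, each on its own analysis dimension
    return {
        "ctr_by_province": rate_by(records, "province", "clicks", "impressions"),
        "cvr_by_hour": rate_by(records, "hour", "conversions", "clicks"),
    }
```

Each returned dictionary maps a dimension value (a province, an hour) to a rate, which is the shape a bar or line chart component typically consumes.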
In the embodiment of the invention, the plurality of evaluation indexes include a target evaluation index, and the plurality of subtask networks include a target subtask network matched with the target evaluation index; the embodiment of the invention also provides the following implementation.
Obtaining a sample screening instance matched with the target evaluation index, wherein the sample screening instance is associated with a screening target value that represents the real screening result of the sample screening instance under the target evaluation index;
Acquiring an original subtask network matched with the target subtask network, loading the sample screening instance into the original subtask network, and performing instance screening processing on the sample screening instance through the original subtask network to obtain an inferred screening result of the sample screening instance under the target evaluation index;
Tuning the original subtask network based on the real screening result and the inferred screening result, and taking the tuned original subtask network as the target subtask network.
In an exemplary embodiment of the present invention, in a recommendation system, the server maintains a plurality of evaluation indexes, such as click-through rate, conversion rate, and user satisfaction. Conversion rate is chosen as the target evaluation index because it is directly related to the core business goal of increasing sales. The server has a plurality of subtask networks, each optimized for a specific evaluation index, and the target subtask network matched with conversion rate is selected for subsequent training and tuning. For example, in an e-commerce platform recommendation system, the server has a subtask network named "conversion rate prediction" that is dedicated to predicting the rate at which users purchase recommended goods. This subtask network is selected as the target subtask network matched with the conversion rate target evaluation index. The server retrieves historical data related to the target evaluation index (conversion rate) from the data warehouse; these data are the sample screening instances. Each sample screening instance is associated with a screening target value representing its real screening result under the target evaluation index, that is, whether the user actually purchased the recommended goods. Concretely, the server retrieves from the database the users' interaction data with recommended goods over a period of time, including the user ID, the recommended goods ID, and whether a purchase was made. The "purchased or not" field is the screening target value, indicating whether the user actually bought the recommended goods. The server loads the acquired sample screening instances into the original subtask network matched with the target evaluation index (namely, the conversion rate prediction subtask network).
The original subtask network performs instance screening processing on these sample screening instances, that is, it predicts the likely outcome of each instance under the target evaluation index (conversion rate). This prediction is called the inferred screening result. The server feeds the retrieved user-goods interaction data into the conversion rate prediction subtask network, which processes the data according to the patterns it has learned and outputs the probability that each user purchases the recommended goods (the inferred screening result). The server compares the inferred screening results (predicted conversion) output by the original subtask network with the real screening results (actual conversion) in the sample screening instances and, based on the comparison, adjusts the parameters of the original subtask network to improve prediction accuracy. The tuned subtask network then serves as the new target subtask network. For example, the server compares the predicted conversion output by the "conversion rate prediction" subtask network with the actual conversion recorded in the database; if the predictions deviate significantly from the actual results, the server adjusts the network's parameters (such as weights and biases). After multiple iterations of optimization, a better-performing "conversion rate prediction" subtask network is obtained and used as the new target subtask network for subsequent recommendation tasks.
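The tuning loop above (infer, compare with the real screening result, adjust weights and bias, repeat) can be illustrated with a plain logistic regression trained by batch gradient descent. This is a swapped-in technique chosen for brevity: the patent does not specify the architecture of the "conversion rate prediction" subtask network. The feature vectors, the learning rate, the epoch count, and the function names `predict`, `tune` and `log_loss` are assumptions; the "original" network here is simply one with all-zero parameters.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    # Inferred screening result: predicted probability that the user purchases
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def tune(samples, labels, lr=0.5, epochs=500):
    """Tune an untrained (all-zero) logistic model on labeled sample screening instances.

    samples: feature vectors (hypothetical user/goods features).
    labels: screening target values, 1 if the user actually purchased, else 0.
    """
    n_features = len(samples[0])
    weights, bias = [0.0] * n_features, 0.0
    m = len(samples)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(samples, labels):
            err = predict(weights, bias, x) - y  # deviation between inferred and real result
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        # Gradient step on the mean log-loss
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


def log_loss(weights, bias, samples, labels):
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in zip(samples, labels):
        p = predict(weights, bias, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(samples)
```

Comparing `log_loss` before and after `tune` on held-out instances is one simple way to decide whether the tuned network should replace the current target subtask network.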
In the embodiment of the invention, the sample instance set includes a primary sample instance set stored in a sample instance data pool, and the following implementation is also provided.
When the preprocessed primary sample instance set has been determined, the primary sample instance set stored in the sample instance data pool is optimized using the preprocessed primary sample instance set.
In an exemplary embodiment of the present invention, the server first collects a large amount of advertising user behavior data, including users' clicks, browsing, and purchases, together with attribute information related to these behaviors, such as advertisement content, time, and geographic location. Among these data, the server identifies sample instances that are particularly important or representative and categorizes them as the "primary sample instance set". For convenient later use and management, the server maintains these primary sample instances in a dedicated data structure called the "sample instance data pool". For example, in an e-commerce advertising scenario, the server gathers data on a large number of users browsing and purchasing goods. Some of these users are considered important because of high purchase frequency, large purchase amounts, or a wide range of purchased item types; their browsing and purchase records are treated as primary sample instances and stored in the sample instance data pool. Before using the sample instances for advertising user analysis, the server preprocesses the primary sample instances. Preprocessing may include data cleansing (such as removing duplicate data and filling in missing values), feature extraction (such as extracting features of user interest and purchasing power from behavior data), and feature encoding (such as converting text data into numeric data). In the e-commerce advertising scenario, the server first removes duplicate purchase records so that each user's data are counted only once, then extracts user features such as purchase frequency, purchase amount, and purchased goods categories for subsequent user analysis.
Finally, the server encodes these features into numerical data suitable for machine learning models. After determining the preprocessed primary sample instance set, the server uses it to optimize the original primary sample instance set in the sample instance data pool. The optimization may aim to improve sample quality, increase sample diversity, or improve model training efficiency. Specifically, in the e-commerce advertising scenario, the server may check whether each newly preprocessed sample instance duplicates or closely resembles an existing sample in the data pool. If so, the server may retain the higher-quality or more representative sample and delete or replace the poorer-quality or redundant one. As a result, the optimized sample instance data pool contains more diverse, higher-quality primary sample instances, which helps improve the accuracy and efficiency of advertising user analysis. Through these steps, the server continuously optimizes and updates the sample instance data pool with preprocessed primary sample instance sets, helping ensure accurate and effective advertising user analysis.
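The preprocessing (duplicate removal and feature extraction) and pool optimization steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: purchase records are hypothetical `(user_id, order_id, amount, category)` tuples, a duplicate is a repeated `(user_id, order_id)` pair, "duplicate or similar" in the pool is reduced to "same user", and "higher quality" is approximated by the number of observed purchases. None of these choices come from the patent.

```python
from dataclasses import dataclass


@dataclass
class UserSample:
    user_id: str
    purchase_count: int   # purchase frequency feature
    total_amount: float   # purchase amount feature
    category_count: int   # number of distinct purchased categories


def preprocess(records):
    """records: (user_id, order_id, amount, category) tuples; duplicate orders are dropped."""
    seen, per_user = set(), {}
    for user_id, order_id, amount, category in records:
        if (user_id, order_id) in seen:  # data cleansing: remove duplicate purchase records
            continue
        seen.add((user_id, order_id))
        stats = per_user.setdefault(user_id, {"n": 0, "amt": 0.0, "cats": set()})
        stats["n"] += 1
        stats["amt"] += amount
        stats["cats"].add(category)
    # Feature extraction and numeric encoding into one sample per user
    return [UserSample(u, s["n"], s["amt"], len(s["cats"])) for u, s in per_user.items()]


def quality(sample):
    # Assumed proxy for "higher quality / more representative": more observed purchases
    return sample.purchase_count


def optimize_pool(pool, new_samples):
    """pool: dict user_id -> UserSample. Keep the better sample when a user already exists."""
    for s in new_samples:
        old = pool.get(s.user_id)
        if old is None or quality(s) >= quality(old):
            pool[s.user_id] = s
    return pool
```

A production system would likely use a richer similarity test (for example, distance between feature vectors) rather than exact user-ID matching; the replace-if-better structure stays the same.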
In the embodiment of the invention, the sample instance set includes a primary sample instance set and an auxiliary sample instance set, the primary sample instance set being stored in a sample instance data pool, and the following implementation is also provided.
When the preprocessed auxiliary sample instance set has been determined, the preprocessed auxiliary sample instance set is saved to the sample instance data pool, and its sample instances are added to the primary sample instance set.
In an exemplary embodiment of the invention, the sample instance set maintained by the server is divided into two parts: a primary sample instance set and an auxiliary sample instance set. The primary sample instance set consists of sample instances that are critical to advertising user analysis; they are typically highly representative and important and directly reflect users' advertising-related behavior characteristics. The auxiliary sample instance set contains additional sample instances that may be less critical but still provide useful information that can enhance the diversity and richness of the primary sample instance set. For example, in an e-commerce advertising scenario, the primary sample instance set may contain key information such as users' purchase history, browsing behavior, and search keywords, which directly reflects their shopping preferences and interests. The auxiliary sample instance set may include users' comments, shared content, and likes on social media; this information is less direct than purchase records but still provides valuable reference for advertising user analysis. When the server collects new auxiliary sample instances, it preprocesses them, mainly to clean the data, extract features, and transform formats for subsequent analysis and model training. Preprocessing may include removing duplicate data, filling in missing values, removing noisy data, text segmentation, and feature encoding. In the e-commerce advertising scenario, after collecting users' comments on social media, the server first cleans the comment text by removing meaningless tags, special characters, and the like.
The server then uses natural language processing techniques to segment the comments into words and tag their parts of speech, extracting keywords and key phrases as features, and encodes and transforms these features to suit the subsequent advertising user analysis model. After determining the preprocessed auxiliary sample instance set, the server saves these instances to the sample instance data pool and adds them to the primary sample instance set. This expands the primary sample instance set and increases the diversity and richness of the samples, improving the accuracy and reliability of advertising user analysis. In the e-commerce advertising scenario, after preprocessing users' social media comment data, the server stores the data in the sample instance data pool and then adds them to the primary sample instance set, so that the set contains not only direct information such as users' purchase history but also indirect information such as their behavior on social media. In this way, the advertising user analysis model can understand users' interests and needs more fully and provide more accurate advertisement push strategies. Through these steps, the server makes full use of the auxiliary sample instance set to enhance the diversity and richness of the primary sample instance set and improve the accuracy and reliability of advertising user analysis. Meanwhile, by continuously collecting new auxiliary sample instances and preprocessing and merging them, the server keeps the sample instance set updated and optimized to adapt to changing user needs and advertising market conditions.
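The auxiliary-sample path above (clean comment text, extract keywords, save to the pool, add to the primary set) can be sketched in Python with the standard library. This is a simplified illustration under loud assumptions: whitespace tokenization and a tiny English stopword list stand in for the word segmentation and part-of-speech tagging the text describes (Chinese comments would need a dedicated word segmenter, and no POS tagging is done here), keyword extraction is plain term frequency, and the sample dictionary layout and function names are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); a real system would use a full lexicon
STOPWORDS = {"the", "a", "an", "is", "and", "this", "it", "i", "to", "of"}


def clean_comment(text):
    text = re.sub(r"<[^>]+>", " ", text)          # remove HTML-like tags
    text = re.sub(r"[^\w\s]", " ", text.lower())  # remove special characters
    return text


def extract_keywords(text, top_k=3):
    # Whitespace tokenization stands in for proper word segmentation
    tokens = [t for t in clean_comment(text).split() if t not in STOPWORDS]
    return [word for word, _ in Counter(tokens).most_common(top_k)]


def merge_auxiliary(pool, primary_set, comments):
    """comments: list of (user_id, text). Save preprocessed auxiliary samples and merge them."""
    for user_id, text in comments:
        sample = {"user_id": user_id, "source": "social", "keywords": extract_keywords(text)}
        pool.append(sample)         # save to the sample instance data pool
        primary_set.append(sample)  # add to the primary sample instance set
    return pool, primary_set
```

Tagging each merged sample with its `source` keeps direct purchase data and indirect social-media data distinguishable, so a downstream model can weight them differently if needed.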
The embodiment of the invention provides a computer device 100 comprising a processor and a non-volatile memory storing computer instructions; when the computer instructions are executed by the processor, the computer device 100 performs the advertising user analysis method. As shown in fig. 2, which is a block diagram of the computer device 100 according to an embodiment of the present invention, the computer device 100 comprises a memory 111, a processor 112, and a communication unit 113. The memory 111, the processor 112, and the communication unit 113 are electrically connected to one another, directly or indirectly, for data transmission or interaction; for example, they may be connected via one or more communication buses or signal lines.
The foregoing description, for purpose of explanation, has been presented with reference to particular embodiments. The illustrative discussions above are not intended to be exhaustive or to limit the disclosure to the precise forms disclosed. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the disclosure and its practical application, to thereby enable others skilled in the art to best utilize the disclosure and various embodiments with various modifications as are suited to the particular use contemplated.

Claims (9)

CN202410775423.4A (priority date 2024-06-17, filed 2024-06-17): Advertisement user analysis method and system; status Active; CN118569936B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202410775423.4A | 2024-06-17 | 2024-06-17 | Advertisement user analysis method and system

Publications (2)

Publication Number | Publication Date
CN118569936A | 2024-08-30
CN118569936B | 2025-02-07

Family

ID=92467094

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
