Movatterモバイル変換


[0]ホーム

URL:


CN114022209A - Advertisement recall method and device based on alliance flow - Google Patents

Advertisement recall method and device based on alliance flow
Download PDF

Info

Publication number
CN114022209A
CN114022209ACN202111323295.2ACN202111323295ACN114022209ACN 114022209 ACN114022209 ACN 114022209ACN 202111323295 ACN202111323295 ACN 202111323295ACN 114022209 ACN114022209 ACN 114022209A
Authority
CN
China
Prior art keywords
advertisement
channel
model
recall
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111323295.2A
Other languages
Chinese (zh)
Inventor
罗伟东
杨璞
林冠彦
丁炜塨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Hexun Huagu Information Technology Co ltd
Original Assignee
Shenzhen Hexun Huagu Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Hexun Huagu Information Technology Co ltdfiledCriticalShenzhen Hexun Huagu Information Technology Co ltd
Priority to CN202111323295.2ApriorityCriticalpatent/CN114022209A/en
Publication of CN114022209ApublicationCriticalpatent/CN114022209A/en
Pendinglegal-statusCriticalCurrent

Links

Images

Classifications

Landscapes

Abstract

Translated fromChinese

本发明提供了一种基于联盟流量的广告召回方法及装置,方法包括:对广告请求内的用户的ID进行基于hash编码进行通道分流,确定广告的召回渠道;过滤掉文案相似的广告信息;判断分流后的广告信息是否为新的广告信息,若是,则进行精排序;若否,则通过多路广告召回模型中进行处理;将多路广告召回模型处理的广告,按照分配的广告数量并行匹配数据,得到去重复广告信息;将符合数量要求的广告列表返回到精排序中,不符合则进行随机填充后返回到精排序中;精排序计算出广告的竞价出价排行,返回竞价成功的广告ID。本发明通过多路广告召回模型,保证每个产品都有曝光的机会,保证了竞价体系的公平性,利用相似文案过滤策略,防止了恶性竞价行为。

Figure 202111323295

The present invention provides a method and device for recalling advertisements based on alliance traffic. The method includes: performing channel shunting based on hash coding on the IDs of users in advertisement requests to determine the recalling channels of advertisements; filtering out advertisement information with similar copywriting; Whether the diverted advertisement information is new advertisement information, if so, it will be finely sorted; if not, it will be processed in the multi-channel advertisement recall model; the advertisements processed by the multi-channel advertisement recall model will be matched in parallel according to the number of allocated advertisements Data, get de-duplicated advertisement information; return the list of advertisements that meet the quantity requirements to the fine sorting, if they do not meet the requirements, they will be filled randomly and then returned to the fine sorting; the fine sorting calculates the bidding ranking of the advertisement, and returns the ID of the successful bidding advertisement . Through the multi-channel advertisement recall model, the present invention ensures that each product has the opportunity to be exposed, ensures the fairness of the bidding system, and uses the similar copywriting filtering strategy to prevent vicious bidding behaviors.

Figure 202111323295

Description

Advertisement recall method and device based on alliance flow
Technical Field
The present invention relates to an advertisement recall method and apparatus, and more particularly, to an advertisement recall method and apparatus based on federation traffic.
Background
In the whole link of the real-time bidding advertisement of the alliance flow, a request is initiated by a channel side and sent to a DSP (flow demand party platform, namely an advertiser can release the demand and bid on which the advertiser needs to put the advertisement) through an ADX (advertisement real-time bidding transaction platform), the DSP returns the advertisement to participate in bidding on the basis of flow evaluation, the advertisement is returned after the ADX is subjected to price comparison, and the advertisement is displayed and reported by a medium to complete the whole advertisement flow.
After receiving an advertisement request, the DSP needs to find out a conforming advertisement within 300ms and return the conforming advertisement to the ADX, the number of candidates in the advertisement library is huge, scoring evaluation of a large number of advertisements is to be completed in a very short time, and the common practice in the industry is divided into two main links similar to that of a recommendation system: recall + sort.
The recall aims at accurately screening a candidate set, reducing the number of candidates and facilitating the accurate sequencing of a complex model used in a subsequent sequencing link; because a large number of candidates are evaluated in a short time, the key point of recall is a quick word, and compared with the limitation of sorting, the recalled algorithm model is relatively simple and uses fewer features.
Under the premise that a single-way recall and a single model are not enough, multiple strategies are generally applied at the same time in one personalized recommendation, and if the recommendation is attempted only through a certain refined recommendation strategy, users tend to be interested in the early stage, and the users gradually fatigue with the increase of the number. After all, the reading tendency of the user is often multivariate, and therefore, the recall strategy is generally to screen the candidate set from various strategies.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: provided are an advertisement recall method and device based on alliance flow.
In order to solve the technical problems, the invention adopts the technical scheme that: a federation traffic-based advertisement recall method, comprising:
acquiring an advertisement request which contains the ID of a user and the information of an advertisement;
performing channel distribution on the ID of the user in the advertisement request based on hash coding, and determining a recall channel of the advertisement;
after the advertisement recall channel is determined, filtering out advertisement information with similar files based on file vectors through a similar file filtering model;
judging whether the advertisement information after the shunting is new advertisement information, if so, transmitting the new advertisement information to a fine sorting; if not, inputting the advertisement information into a multi-channel advertisement recall model for processing, wherein the multi-channel advertisement recall model comprises a collaborative filtering model based on users, a matrix decomposition model, a product division random model and a full random model based on stream processing;
matching the advertisements processed by the multi-channel advertisement recall model with data in parallel according to the number of the distributed advertisements to obtain repeated advertisement information;
judging whether the advertisement quantity meets the threshold requirement, if so, returning the advertisement list to the fine sorting, if not, randomly filling, and returning the advertisement list to the fine sorting;
calculating the estimated click rate of each advertisement and the bid price of each click of an advertiser according to behavior information, equipment information, user liveness, geographical position and advertisement context characteristics given to the user by fine sorting to obtain the final bid price ranking of the advertisement;
and (4) according to the bidding and ranking of the advertisements, returning the advertisement ID with successful bidding and completing the advertisement recall.
Further, user similarity optimization is carried out in the user-based collaborative filtering model, the channel exposure and the channel high-version proportion number are increased to describe the users, and single feature dimensions and cross feature dimensions are aggregated from the user dimensions.
Further, the performing channel distribution on the ID of the user in the advertisement request based on the hash code to determine the advertisement recall channel includes: applying an intra-application message pushing and message pushing form;
the application message pushing comprises five channels, namely a matrix decomposition and product division random fusion model channel, a high ECPM strategy and product division random fusion model channel, a new product channel, a product division random algorithm channel and a pure random algorithm channel;
the message pushing mode comprises six channels, namely a high ECPM strategy and product division random fusion model channel, a new product channel, a product division random algorithm channel and a pure random algorithm channel.
Further, after confirming the advertisement recall channel, filtering out advertisement information similar to the file based on the file vector, including,
coding the pattern of the advertisement through a dictionary, and converting characters of the pattern into vectors;
based on the file vector, calculating vector similarity and setting a threshold, sorting completely consistent files according to ECPM, and reserving 50% of files with the highest ECPM; and for the documents with the similarity of more than 95%, the documents with the highest proportion of 30% of ECPM are reserved according to the ECPM sequence.
Further, the above-mentioned judging that the advertisement information after shunting is new advertisement information, if yes, transmitting the new advertisement information to the fine sorting includes,
and the recall system judges whether the historical data exists in the historical data of the model or not, if not, the historical data is judged to be new advertisement information, the new advertisement information is subjected to advertisement cold start strategy processing and channel cold start strategy processing, and then the new advertisement information is transmitted to the fine sorting.
The invention also provides an advertisement recalling device based on the alliance flow, which comprises:
the advertisement request acquisition module is used for acquiring an advertisement request which contains the ID of a user and the information of the advertisement;
the channel distribution module is used for carrying out channel distribution on the ID of the user in the advertisement request based on hash coding and determining a recall channel of the advertisement;
the file filtering module is used for filtering out advertisement information similar to the file on the basis of the file vector through the similar file filtering model after the advertisement recall channel is determined;
the model processing module is used for judging whether the shunted advertisement information is new advertisement information or not, and if so, transmitting the new advertisement information to the fine sorting module; if not, inputting the advertisement information into a multi-channel advertisement recall model for processing, wherein the multi-channel advertisement recall model comprises a collaborative filtering model based on users, a matrix decomposition model, a product division random model and a full random model based on stream processing;
the model integration module is used for matching the advertisements processed by the multi-channel advertisement recall model with the data in parallel according to the number of the distributed advertisements to obtain the repeated advertisement information;
the advertisement threshold value judging module is used for judging whether the advertisement quantity meets the threshold value requirement, returning the advertisement list to the fine sorting if the advertisement quantity meets the threshold value requirement, randomly filling if the advertisement quantity does not meet the threshold value requirement, and returning the advertisement list to the fine sorting;
the advertisement bidding module is used for calculating the estimated click rate of each advertisement and the bid price of each click of the advertiser according to the behavior information, the equipment information, the user liveness, the geographic position and the advertisement context characteristics given to the user by the fine sorting to obtain the final bidding bid price ranking of the advertisement;
and the advertisement returning module is used for ranking according to the bidding price of the advertisement, returning the advertisement ID with successful bidding price and finishing the advertisement recall.
Further, in the model processing module, user similarity optimization is performed on the collaborative filtering model based on the user, the channel exposure and the channel high version ratio number are increased to describe the user, and the single feature dimension and the cross feature dimension are aggregated from the user dimension.
Further, the channel splitting module is configured to split a channel based on a hash code for an ID of a user in the advertisement request, and determine a recall channel of the advertisement, including: applying an intra-application message pushing and message pushing form;
the application message pushing comprises five channels, namely a matrix decomposition and product division random fusion model channel, a high ECPM strategy and product division random fusion model channel, a new product channel, a product division random algorithm channel and a pure random algorithm channel;
the message pushing mode comprises six channels, namely a high ECPM strategy and product division random fusion model channel, a new product channel, a product division random algorithm channel and a pure random algorithm channel.
Further, the file filtering module is used for,
coding the pattern of the advertisement through a dictionary, and converting characters of the pattern into vectors;
based on the file vector, calculating vector similarity and setting a threshold, sorting completely consistent files according to ECPM, and reserving 50% of files with the highest ECPM; and for the documents with the similarity of more than 95%, the documents with the highest proportion of 30% of ECPM are reserved according to the ECPM sequence.
Further, the model processing module judges whether the advertisement information after the shunting is new advertisement information, if so, the new advertisement information is transmitted to the fine sorting module,
and the recall system judges whether the historical data exists in the historical data of the model or not, if not, the historical data is judged to be new advertisement information, the new advertisement information is subjected to advertisement cold start strategy processing and channel cold start strategy processing, and then the new advertisement information is transmitted to the fine sorting.
The invention has the technical effects that: the product-based random model is adopted in the model recall, and the product-based random algorithm is utilized to ensure that each product has creative participation in the bidding, more opportunities are given to the products with high income, the original purely random opportunities are changed into product-based opportunities, each advertiser is ensured to have exposure opportunities, and the fairness of the bidding system is ensured. By utilizing a similar file filtering strategy, the malignant bidding behavior caused by the water injection file is ensured not to occur, and the number of repeated files is greatly reduced compared with the prior art. Through the fusion of the model and the strategy, the effective distribution of the flow is ensured, and the ECPM (thousands of exposure gains) is improved by about 12 percent compared with the standard average whole.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the mechanisms shown in the drawings without creative efforts.
FIG. 1 is a flowchart of a federation traffic-based advertisement recall method according to an embodiment of the present invention;
fig. 2 is a block diagram of an advertisement recall apparatus based on federation traffic according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the description of the invention relating to "first", "second", etc. is for descriptive purposes only and is not to be construed as indicating or implying any relative importance or implicit indication of the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, technical solutions between various embodiments may be combined with each other, but must be realized by a person skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination should not be considered to exist, and is not within the protection scope of the present invention.
Referring to fig. 1, a first embodiment of the present invention is: an advertisement recall method based on alliance flow comprises the following steps:
s10, obtaining an advertisement request, wherein the advertisement request contains the ID of the user and the information of the advertisement;
s20, carrying out channel distribution on the ID of the user in the advertisement request based on hash coding, and determining a recall channel of the advertisement; the method comprises the following steps: applying an intra-application message pushing and message pushing form;
the application message pushing comprises five channels, namely a matrix decomposition and product division random fusion model channel, a high ECPM strategy and product division random fusion model channel, a new product channel, a product division random algorithm channel and a pure random algorithm channel;
the message pushing mode comprises six channels, namely a high ECPM strategy and product division random fusion model channel, a new product channel, a product division random algorithm channel and a pure random algorithm channel.
The interior of the recall system can be shunted according to hash codes, different models can be obtained for different quantities according to the inconsistency of advertisement styles, a plurality of rounds of tests can be carried out in the early period, and an optimal flow distribution strategy is obtained based on data analysis.
a) Message push within application:
i. the 45% flow is distributed to a matrix decomposition and product distribution random fusion model, the matrix decomposition is divided into 50 advertisements, and a random algorithm obtains 150 advertisements.
ii. The 40% flow is distributed to a high ECPM strategy and product distribution random fusion model, the high ECPM strategy is divided into 50 advertisements, and a random algorithm obtains 150 advertisements.
And iii, 5% of flow is distributed to new products, new documentations and direct randomness.
iv, 5% flow is distributed to a product random algorithm, and malignant bidding among products is reduced.
v, 5% flow is distributed to a pure random algorithm to be used as effect comparison of model lifting.
b) Message push mode:
i. the 30% flow is divided into a matrix decomposition model, a high ECPM strategy and a product division random fusion model, the matrix decomposition is divided into 50 advertisements, the high ECPM strategy is divided into 50 advertisements, and a random algorithm obtains 100 advertisements.
ii. The method comprises the steps of distributing 30% of flow to collaborative filtering, randomly fusing a high ECPM strategy and a distributed product model, distributing 50 advertisements based on channel collaborative filtering, distributing 50 advertisements to the high ECPM strategy, and obtaining 100 advertisements by a random algorithm.
And iii, distributing the flow of 25% to a high ECPM strategy and product distribution random fusion model, distributing 50 advertisements to the high ECPM strategy, and obtaining 150 advertisements by a random algorithm.
iv, 5% flow rate is assigned to new product, new case, direct randomization.
v, 5% flow is divided into product random algorithm, and the vicious bidding among products is reduced.
vi, 5% flow is distributed to a pure random algorithm for comparison of the effect of model lifting.
S30, after the advertisement recall channel is determined, filtering out advertisement information with similar files based on file vectors through a similar file filtering model; the method specifically comprises the following steps: coding the pattern of the advertisement through a dictionary, and converting characters of the pattern into vectors; based on the file vector, calculating vector similarity and setting a threshold, sorting completely consistent files according to ECPM, and reserving 50% of files with the highest ECPM; and for the documents with the similarity of more than 95%, the documents with the highest proportion of 30% of ECPM are reserved according to the ECPM sequence. If no matching documentation occurs, it is ignored.
S40, judging whether the advertisement information after the shunting is new advertisement information, if so, transmitting the new advertisement information to a fine sorting; if not, inputting the advertisement information into a multi-channel advertisement recall model for processing, wherein the multi-channel advertisement recall model comprises a collaborative filtering model based on users, a matrix decomposition model, a product division random model and a full random model based on stream processing;
the recall system will first determine if it is a cold start, such as new data, new advertisements, and no historical data in the model:
a) cold starting of the advertisement:
i. the number of cold starts of the creative is judged, and if the advertisements come from are all cold starts, the advertisements are directly processed according to a random algorithm.
ii. If there are old advertisements, the model channel is followed.
b) Channel cold starting; if the channel is cold started, the aid number returned by the model becomes 0.
S50, matching the advertisements processed by the multi-channel advertisement recall model with data in parallel according to the number of the distributed advertisements to obtain repeated advertisement information;
s60, judging whether the advertisement quantity meets the threshold value requirement, if so, returning the advertisement list to the fine sorting, if not, randomly filling, and then, returning the advertisement list to the fine sorting;
s70, calculating the estimated click rate of each advertisement and the bid price of each click of an advertiser according to behavior information, equipment information, user liveness, geographic position and advertisement context characteristics given to the user by fine sorting, and obtaining the final bid price ranking of the advertisement;
and S80, ranking according to the bidding price of the advertisement, returning the advertisement ID with successful bidding price, and completing the advertisement recall.
In a specific embodiment, user similarity optimization is performed on a user-based collaborative filtering model, channel exposure and channel high-version proportion number are increased to describe users, and single feature dimensions and cross feature dimensions are aggregated from user dimensions.
In the embodiment, a product-based random model is adopted in the model recall, and a product-based random algorithm is utilized, so that more opportunities are given to products with high income while each product is initiatively participated in the bidding, the original pure randomness is changed into product-based opportunities, each advertiser is ensured to have an exposure opportunity, and the fairness of a bidding system is ensured. By utilizing a similar file filtering strategy, the malignant bidding behavior caused by the water injection file is ensured not to occur, and the number of repeated files is greatly reduced compared with the prior art. Through the fusion of the model and the strategy, the effective distribution of the flow is ensured, and the ECPM (thousands of exposure gains) is improved by about 12 percent compared with the standard average whole. Based on the assumption that the number of union flow channels is large and the channel users have similarity, the channels are used for replacing the users to calculate the similarity, the calculation difficulty is reduced, and meanwhile flow preferred distribution is achieved.
The number of operational users based on federation traffic can reach 2 million per day, while the model needs to compute full users, and under this time delay pressure, it is not practical to compute full users in the recall phase and do full matches at high QPS. If a clustering algorithm is used, the characteristic information of the whole number of users also needs to be stored on the line, and the condition of cold start of a new user cannot be included, and on the basis, some optimization needs to be carried out on the model.
Recalling the model: a user-based collaborative filtering model (UserCF).
1. Based on the principle that users coming from the same channel are similar, such as users who install k songs of the whole population, the user quality is similar. A layer of improvement is made, and a channel main mark is used for representing a user, so that the calculation amount is reduced, and the occupied ratio of cold start of the user is greatly reduced.
2. Grouping is performed from a channel only, so that the situation that the granularity is too coarse possibly exists, although the recall speed is guaranteed, the recall accuracy cannot be guaranteed, and therefore a feature with the highest feature importance is selected from more than 2 hundred features by using a decision tree algorithm.
The decision tree model training process comprises the following steps:
a) sequentially traversing all the features, and selecting the feature with the largest information gain (the largest degree of information uncertainty reduction);
b) then continuously selecting the features and splitting into 2 groups, continuously and respectively calculating the information gain of each node under the 2 groups of the feature groups, selecting the features with the maximum information gain, and continuously splitting;
c) stopping splitting after threshold conditions (number of leaf nodes, tree depth, number of users on each leaf node) are reached;
and (3) screening the feature flow with the highest importance by the decision tree:
according to the following feature importance formula, traversing all features can obtain feature importance ranking, and the larger the calculated value is, the higher the importance is.
The calculation formula is as follows:
feature_importances=N_t/N*(impurity-N_t_R/N_t*right_impurity-N_t_L/N_t*left_impurity);
where N is the total number of samples, N _ t is the number of samples for the current node, N _ t _ L is the number of samples for the left child of the node, N _ t _ R is the number of samples for the right child of the node, impuity is the degree of non-purity (the kini index or entropy) of the feature node, right _ impuity is the right-side tree degree of non-purity, and 1eft _ impuity is the left-side tree degree of non-purity.
3. The concept of a "user" (hereinafter "user") is replaced with channel + features. Then under the concept of the user, the click willingness (CTR click rate) of the user to the advertisement, the income (ECPM) brought by the advertisement, the click willingness to the advertisement products (different advertisement similar products), the income (ECPM) brought by the advertisement products (different advertisement similar products) and the channel ECPM are calculated. From these 5 dimensions, the user vector is derived.
4. By comparing a plurality of similarity calculation modes, the cosine similarity is finally selected to calculate the similarity of the user, and the following formula is a formula for calculating the cosine similarity by using two vectors A and B:
Figure BDA0003345637280000091
5. then based on the similarity of the users, the following formula is used for obtaining the recommendation score:
Figure BDA0003345637280000092
the UserCF algorithm will recommend items to the "user" that are liked by the most similar K "users". The set K is a full user, and comprehensiveness of information is guaranteed.
The formula mainly measures the interest degree of the user u in the item i in the UserCF algorithm: where S (u, K) contains K users with the closest interests to the target "user" u, n (i) is the set of users who have performed the item i, Wuv is the similarity between user u and user v, Rvi represents the interest measure of user v for item i, where ECPM is used as the measure.
6. And after the recommendation scores of the result set are obtained, sorting in a reverse order to obtain the recommended advertisement list of the user.
Recalling the model: and (5) decomposing the model by using a matrix.
1. Calculating a co-occurrence matrix of the user and the advertisement: (similarity of user concept and collaborative filtering for matrix decomposition, which is not described here);
the matrix is filled with interest metrics, ECPM was initially chosen because both revenue and user click interest can be described. However, since some "user" exposures are too high, resulting in a false high ECPM, a layer of weight reduction needs to be applied to high-exposure users using the following formula:
the formula: newEcpm ═ Ecpm (log (base, exposure/scaling));
the base number is 1000;
the scaling is 50, and the value has the best effect after testing a plurality of groups;
when the exposure is less than 5 ten thousand, a layer of weight reduction is carried out, and the ECPM data fluctuation is increased due to low exposure; when the exposure is more than 5 ten thousand, the ECPM is not changed.
2. And decomposing the co-occurrence matrix into two small matrixes of U and V by using a least square method.
3. And multiplying the matrix U and the matrix V to obtain the final recommendation score of each user, and obtaining the recommendation list of each user in a reverse order.
Recalling the model: and (5) dividing a product random model.
In the product-based random stage, the random of the weight is realized, the weight is added for some advertisements (such as new creatives or creatives which need to artificially increase the probability of the participation competition), and the probability of the participation competition in the random part is increased:
1. the weight information of all advertisements is calculated off-line and stored on-line.
2. The request for entering the stochastic model reads the weights of all advertisements from the database.
3. When a request comes, assuming that 1000 advertisements come, 10 products and 200 advertisements are needed at random, there are 20 opportunities for each product, and then advertisements are selected in parallel for a plurality of products.
4. Calculating the number of advertisements that can be selected from each product, and obtaining the weight sum (product division) of the advertisements
5. Using this data, the probability of each advertisement being selected is calculated:
Figure BDA0003345637280000101
pi is the probability of the advertisement being selected, index is the weight, and the denominator is the sum of the weights of the requested advertisements.
6. A random number k of (0, 1) is generated then let p be the current weight.
7. If the random number K is less than the probability of the first advertisement, the first advertisement is returned.
8. If the random number K is greater than the probability of the first advertisement, then p is the weight of the first advertisement + the weight of the second advertisement.
9. And sequentially circulating until K is less than p, and returning the advertisements with the corresponding number.
10. And then repeated among the remaining advertisements until 20 advertisements for each product are randomly completed.
Recalling the model: full-scale stochastic model based on stream processing
The random without weight of stream processing is realized, and because training sets with required number are obtained randomly in a large number of advertisements in an equal probability manner, an algorithm of stream processing equal probability is needed, and the algorithm rapidly passes through one cycle, thereby reducing cycle times and time complexity.
1. The ad request carries n ads, and assuming that k ads need to be selected from the n ads, an algorithm is needed to ensure that the probability of each ad being selected is k/n.
2. According to the sequence of n advertisements brought by the request, the 1 st to the kth enter an initialized return list re.
3. Then starting with k +1 ads, k +1 ads may replace any of the first k or be static. So the probability of returning the k +1 th advertisement is k/(k +1), while the previous k advertisements can be successfully returned as long as they are not replaced, so the probability of successfully returning is 1 x (1-1/(k +1)) -k/(k +1)
4. Then, each advertisement is processed in turn, and according to the same idea, the probability of each final advertisement is demonstrated to be k/n, and the formula is as follows: (since k +1 has already been demonstrated, the probability of selection of the ith advertisement is also demonstrated below as k/n)
Figure BDA0003345637280000111
5. And after traversing n advertisements, returning after obtaining the advertisement list re.
Recalling the model: similar document filtering model.
1. And arranging a text dictionary table comprising Chinese, emoji expressions, other languages, full-angle punctuations, half-angle punctuations and the like.
2. And (3) sorting the text mapping, and defining the corresponding conditions of several special conditions:
a) at the beginning of the text: filled uniformly with a notation [ STR ].
b) Ending the case: filled uniformly with a notation [ END ].
c) Dictionary tables cannot be matched: filled uniformly with a notation [ UNK ].
d) Truncation is direct beyond setting the vector length.
e) Not satisfying the set vector length, is filled with the uniform mark symbol [ SEP ].
3. The full-scale file is coded, and the text is corresponding to non-repeated numbers, such as [ STR ] corresponding to 0 and the like.
4. And calculating and storing the similarity of the file vector.
5. A threshold is set based on the result and adjusted according to the result. Completely consistent files are sorted according to the ECPM, and the file with the highest 50% proportion of the ECPM is reserved; and (4) documents with the similarity larger than 95% are sorted according to the ECPM, and documents with the highest proportion of 30% of the ECPM are reserved.
As shown in fig. 2, another embodiment of the present invention is: an advertisement recall apparatus based on federation traffic, comprising:
an advertisementrequest obtaining module 10, configured to obtain an advertisement request, where the advertisement request includes an ID of a user and information of an advertisement;
thechannel distribution module 20 is configured to perform channel distribution on the ID of the user in the advertisement request based on hash coding, and determine a recall channel of the advertisement;
thefile filtering module 30 is configured to filter advertisement information with similar files based on file vectors through a similar file filtering model after determining a recall channel of the advertisement;
themodel processing module 40 is configured to determine whether the shunted advertisement information is new advertisement information, and if so, transmit the new advertisement information to the fine sorting; if not, inputting the advertisement information into a multi-channel advertisement recall model for processing, wherein the multi-channel advertisement recall model comprises a collaborative filtering model based on users, a matrix decomposition model, a product division random model and a full random model based on stream processing;
themodel integration module 50 is used for matching the advertisements processed by the multi-channel advertisement recall model with the data in parallel according to the number of the distributed advertisements to obtain the repeated advertisement information;
the advertisement thresholdvalue judging module 60 judges whether the number of advertisements meets the threshold value requirement, returns the advertisement list to the fine sorting if the number of advertisements meets the threshold value requirement, randomly fills the advertisement list if the number of advertisements does not meet the threshold value requirement, and returns the advertisement list to the fine sorting;
theadvertisement bidding module 70 is used for calculating the estimated click rate of each advertisement and the bidding price of the advertiser in each click according to the behavior information, the equipment information, the user liveness, the geographic position and the advertisement context characteristics given to the user by the fine sorting, so as to obtain the final bidding price ranking of the advertisement;
and anadvertisement return module 80, configured to arrange according to the bidding price of the advertisement, return an advertisement ID with which bidding is successful, and complete advertisement recall.
Further, in themodel processing module 40, user similarity optimization is performed on the collaborative filtering model based on the user, the channel exposure and the channel high-version proportion number are increased to describe the user, and a single feature dimension and a cross feature dimension are aggregated from the user dimension.
Further, thechannel splitting module 20 is configured to split a channel based on a hash code for an ID of a user in an advertisement request, and determine a recall channel of an advertisement, where the channel splitting module includes: applying an intra-application message pushing and message pushing form;
the application message pushing comprises five channels, namely a matrix decomposition and product division random fusion model channel, a high ECPM strategy and product division random fusion model channel, a new product channel, a product division random algorithm channel and a pure random algorithm channel;
the message pushing mode comprises six channels, namely a high ECPM strategy and product division random fusion model channel, a new product channel, a product division random algorithm channel and a pure random algorithm channel.
Further, thedocument filtering module 30 is used for,
coding the pattern of the advertisement through a dictionary, and converting characters of the pattern into vectors;
based on the file vector, calculating vector similarity and setting a threshold, sorting completely consistent files according to ECPM, and reserving 50% of files with the highest ECPM; and for the documents with the similarity of more than 95%, the documents with the highest proportion of 30% of ECPM are reserved according to the ECPM sequence.
Further, in themodel processing module 40, it is determined whether the advertisement information after being distributed is new advertisement information, if yes, the new advertisement information is transmitted to the fine sorting module,
and the recall system judges whether the historical data exists in the historical data of the model or not, if not, the historical data is judged to be new advertisement information, the new advertisement information is subjected to advertisement cold start strategy processing and channel cold start strategy processing, and then the new advertisement information is transmitted to the fine sorting.
It should be noted that, as can be clearly understood by those skilled in the art, for a specific implementation process of the above advertisement recall device based on federation traffic, reference may be made to the corresponding description in the foregoing method embodiment, and for convenience and brevity of description, no further description is provided herein.
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes performed by the present specification and drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

Translated fromChinese
1.一种基于联盟流量的广告召回方法,其特征在于,包括:1. an advertisement recall method based on alliance traffic, is characterized in that, comprises:获取广告请求,广告请求内包含有用户的ID及广告的信息;Obtain an advertisement request, which contains the user's ID and advertisement information;对广告请求内的用户的ID进行基于hash编码进行通道分流,确定广告的召回渠道;Perform channel shunting based on hash coding on the user ID in the advertisement request to determine the recall channel of the advertisement;在确定广告的召回渠道后,通过相似文案过滤模型基于文案向量过滤掉文案相似的广告信息;After determining the recall channel of the advertisement, the similar copy filtering model is used to filter out the advertisement information with similar copy based on the copy vector;判断分流后的广告信息是否为新的广告信息,若是,则将新的广告信息传输到精排序中;若否,则将广告信息输入到多路广告召回模型中进行处理,包括基于用户的协同过滤模型、矩阵分解模型、分产品随机模型、基于流处理的全量随机模型;Determine whether the diverted advertisement information is new advertisement information, if so, transmit the new advertisement information to the fine sorting; if not, input the advertisement information into the multi-channel advertisement recall model for processing, including user-based collaboration Filtering model, matrix decomposition model, random model by product, full random model based on stream processing;将多路广告召回模型处理的广告,按照分配的广告数量并行匹配数据,得到去重复广告信息;Recall the advertisements processed by the multi-channel advertisement model, and match the data in parallel according to the allocated number of advertisements to obtain de-duplicated advertisement information;判断广告数量是否符合阈值要求,符合则将广告列表返回到精排序中,不符合则进行随机填充,再将广告列表返回到精排序中;Determine whether the number of advertisements meets the threshold requirements. If they meet the requirements, the advertisement list will be returned to the fine sorting. If it does not meet the requirements, it will be filled randomly, and then the advertisement list will be returned to the fine sorting;根据精排序给予用户的行为信息、设备信息、用户活跃度、地理位置、广告上下文特征,计算出每个广告的预估点击率、广告主每次点击出价,得到最终的广告的竞价出价排行;According to the user's behavior information, device information, user activity, geographic location, and advertising context characteristics given by the refined sorting, calculate the estimated click-through rate of each advertisement and the advertiser's bid per click, and obtain the final bidding ranking of the advertisement;根据广告的竞价出价排行,返回竞价成功的广告ID,完成广告召回。According to the bidding ranking of advertisements, return the ID of the advertisement that successfully bids, and complete the advertisement recall.2.如权利要求1所述的基于联盟流量的广告召回方法,其特征在于:对基于用户的协同过滤模型中进行用户相似度优化,增加渠道曝光量和渠道高版本占比个数来描述用户,并从用户维度聚合成单特征维度和交叉特征维度。2. the advertisement recall method based on alliance traffic as claimed in claim 1, is characterized in that: carry out user similarity optimization in the collaborative filtering model based on user, increase channel exposure and channel high version ratio number to describe user , and aggregated from the user dimension into a single-feature dimension and a cross-feature dimension.3.如权利要求1所述的基于联盟流量的广告召回方法,其特征在于:所述对广告请求内的用户的ID进行基于hash编码进行通道分流,确定广告的召回渠道中,包括:应用内消息推送及消息推送形式;3. the advertisement recall method based on alliance traffic as claimed in claim 1, is characterized in that: described to the ID of the user in the advertisement request is carried out channel shunting based on hash coding, determines in the recall channel of advertisement, comprises: in the application Message push and message push form;应用消息推送包括五个通道,分别为矩阵分解与分产品随机融合模型通道、高ECPM策略与分产品随机融合模型通道、新产品通道、分产品随机算法通道、纯随机算法通道;Application message push includes five channels, namely, matrix decomposition and sub-product random fusion model channel, high ECPM strategy and sub-product random fusion model channel, new product channel, sub-product random algorithm channel, and pure random algorithm channel;消息推送形式包括六个通道,分别为高ECPM策略与分产品随机融合模型通道、高ECPM策略与分产品随机融合模型通道、高ECPM策略与分产品随机融合模型通道、新产品通道、分产品随机算法通道、纯随机算法通道。The form of message push includes six channels, namely high ECPM strategy and sub-product random fusion model channel, high ECPM strategy and sub-product random fusion model channel, high ECPM strategy and sub-product random fusion model channel, new product channel, and sub-product random fusion model channel. Algorithm channel, pure random algorithm channel.4.如权利要求1所述的基于联盟流量的广告召回方法,其特征在于:所述在确认广告的召回渠道后,基于文案向量过滤掉文案相似的广告信息,包括,4. The advertising recall method based on alliance traffic as claimed in claim 1, characterized in that: after confirming the recall channel of the advertisement, filtering out the advertisement information with similar copy based on the copy vector, including,通过字典对广告的文案进行编码,将文案的文字转换为向量;Encode the copy of the advertisement through a dictionary, and convert the text of the copy into a vector;基于文案向量,计算向量相似度并设置阈值,对于完全一致的文案,按照ECPM排序,保留ECPM最高的50%比例文案;对于相似度大于95%的文案,按照ECPM排序,保留ECPM最高的30%比例文案。Based on the copy vector, calculate the vector similarity and set a threshold. For the completely consistent copy, sort according to ECPM, and keep the 50% of the copy with the highest ECPM; for the copy with a similarity greater than 95%, sort according to the ECPM, and keep the 30% with the highest ECPM Proportional copy.5.如权利要求1所述的基于联盟流量的广告召回方法,其特征在于:所述判断分流后的广告信息是否为新的广告信息,若是,则将新的广告信息传输到精排序中包括,5. the advertisement recall method based on alliance flow as claimed in claim 1 is characterized in that: whether the advertisement information after the described judgment shunt is new advertisement information, if so, then the new advertisement information is transmitted to the fine sorting including ,召回系统判断在模型的历史数据中有没有该历史数据,若没有则判断为新的广告信息,对新的广告信息进行广告冷启动策略处理和渠道冷启动策略处理,然后将新的广告信息传输到精排序中。The recall system determines whether there is such historical data in the historical data of the model. If not, it is determined as new advertisement information, and the new advertisement information is processed by the advertisement cold start strategy and the channel cold start strategy, and then the new advertisement information is transmitted. into fine sorting.6.一种基于联盟流量的广告召回装置,其特征在于,包括:6. An advertisement recall device based on affiliate traffic, characterized in that, comprising:广告请求获取模块,用于获取广告请求,广告请求内包含有用户的ID及广告的信息;The advertisement request obtaining module is used to obtain the advertisement request, and the advertisement request contains the user's ID and advertisement information;通道分流模块,用于对广告请求内的用户的ID进行基于hash编码进行通道分流,确定广告的召回渠道;The channel diversion module is used to divert the channel based on hash coding on the user ID in the advertisement request, and determine the recall channel of the advertisement;文案过滤模块,用于在确定广告的召回渠道后,通过相似文案过滤模型基于文案向量过滤掉文案相似的广告信息;The copy filtering module is used to filter out the advertisement information with similar copy based on the copy vector through the similar copy filtering model after determining the recall channel of the advertisement;模型处理模块,用于判断分流后的广告信息是否为新的广告信息,若是,则将新的广告信息传输到精排序中;若否,则将广告信息输入到多路广告召回模型中进行处理,包括基于用户的协同过滤模型、矩阵分解模型、分产品随机模型、基于流处理的全量随机模型;The model processing module is used to judge whether the diverted advertisement information is new advertisement information, if so, transmit the new advertisement information to the fine sorting; if not, input the advertisement information into the multi-channel advertisement recall model for processing , including user-based collaborative filtering model, matrix factorization model, product-by-product stochastic model, and full-scale stochastic model based on stream processing;模型整合模块,用于将多路广告召回模型处理的广告,按照分配的广告数量并行匹配数据,得到去重复广告信息;The model integration module is used to recall the advertisements processed by the multi-channel advertisement model, and match the data in parallel according to the allocated number of advertisements to obtain the de-duplicated advertisement information;广告阈值判断模块,判断广告数量是否符合阈值要求,符合则将广告列表返回到精排序中,不符合则进行随机填充,再将广告列表返回到精排序中;The advertisement threshold judgment module determines whether the number of advertisements meets the threshold requirements. If it meets the requirements, the advertisement list will be returned to the fine sorting. If it does not meet the requirements, it will be filled randomly and then the advertisement list will be returned to the fine sorting;广告竞价模块,用于根据精排序给予用户的行为信息、设备信息、用户活跃度、地理位置、广告上下文特征,计算出每个广告的预估点击率、广告主每次点击出价,得到最终的广告的竞价出价排行;The advertising bidding module is used to calculate the estimated click-through rate of each advertisement and the advertiser's bid for each click according to the behavior information, device information, user activity, geographical location, and advertisement context characteristics given to the user, and obtain the final result. The bidding ranking of the advertisement;广告返回模块,用于根据广告的竞价出价排行,返回竞价成功的广告ID,完成广告召回。The advertisement return module is used to rank the advertisements according to the bidding bids, and return the IDs of the advertisements whose bidding is successful, so as to complete the advertisement recall.7.如权利要求6所述的基于联盟流量的广告召回装置,其特征在于:模型处理模块中,对基于用户的协同过滤模型中进行用户相似度优化,增加渠道曝光量和渠道高版本占比个数来描述用户,并从用户维度聚合成单特征维度和交叉特征维度。7. The advertisement recalling device based on alliance traffic as claimed in claim 6, is characterized in that: in the model processing module, user similarity optimization is carried out in the collaborative filtering model based on user, and the ratio of channel exposure and channel high version is increased. The number of users is used to describe users, and the user dimension is aggregated into single-feature dimension and cross-feature dimension.8.如权利要求6所述的基于联盟流量的广告召回装置,其特征在于:所述通道分流模块用于广告请求内的用户的ID进行基于hash编码进行通道分流,确定广告的召回渠道中,包括:应用内消息推送及消息推送形式;8. the advertisement recalling device based on alliance flow as claimed in claim 6 is characterized in that: described channel diversion module is used for the ID of the user in the advertisement request to carry out channel diversion based on hash coding, determine in the recall channel of advertisement, Including: in-app message push and message push form;应用消息推送包括五个通道,分别为矩阵分解与分产品随机融合模型通道、高ECPM策略与分产品随机融合模型通道、新产品通道、分产品随机算法通道、纯随机算法通道;Application message push includes five channels, namely, matrix decomposition and sub-product random fusion model channel, high ECPM strategy and sub-product random fusion model channel, new product channel, sub-product random algorithm channel, and pure random algorithm channel;消息推送形式包括六个通道,分别为高ECPM策略与分产品随机融合模型通道、高ECPM策略与分产品随机融合模型通道、高ECPM策略与分产品随机融合模型通道、新产品通道、分产品随机算法通道、纯随机算法通道。The form of message push includes six channels, namely high ECPM strategy and sub-product random fusion model channel, high ECPM strategy and sub-product random fusion model channel, high ECPM strategy and sub-product random fusion model channel, new product channel, and sub-product random fusion model channel. Algorithm channel, pure random algorithm channel.9.如权利要求6所述的基于联盟流量的广告召回装置,其特征在于:所述文案过滤模块用于,9. The device for recalling advertisements based on alliance traffic as claimed in claim 6, wherein the copywriting filtering module is used for,通过字典对广告的文案进行编码,将文案的文字转换为向量;Encode the copy of the advertisement through a dictionary, and convert the text of the copy into a vector;基于文案向量,计算向量相似度并设置阈值,对于完全一致的文案,按照ECPM排序,保留ECPM最高的50%比例文案;对于相似度大于95%的文案,按照ECPM排序,保留ECPM最高的30%比例文案。Based on the copy vector, calculate the vector similarity and set a threshold. For the completely consistent copy, sort by ECPM, and keep the 50% of the copy with the highest ECPM; for the copy with a similarity greater than 95%, sort by ECPM, and keep the 30% with the highest ECPM Proportional copy.10.如权利要求6所述的基于联盟流量的广告召回装置,其特征在于:所述模型处理模块中,判断分流后的广告信息是否为新的广告信息,若是,则将新的广告信息传输到精排序中包括,10. The device for recalling advertisements based on alliance traffic as claimed in claim 6, wherein in the model processing module, it is judged whether the advertisement information after the diversion is new advertisement information, and if so, the new advertisement information is transmitted Include in the fine ordering,召回系统判断在模型的历史数据中有没有该历史数据,若没有则判断为新的广告信息,对新的广告信息进行广告冷启动策略处理和渠道冷启动策略处理,然后将新的广告信息传输到精排序中。The recall system determines whether there is such historical data in the historical data of the model. If not, it is determined as new advertisement information, and the new advertisement information is processed by the advertisement cold start strategy and the channel cold start strategy, and then the new advertisement information is transmitted. into fine sorting.
CN202111323295.2A2021-11-092021-11-09Advertisement recall method and device based on alliance flowPendingCN114022209A (en)

Priority Applications (1)

Application NumberPriority DateFiling DateTitle
CN202111323295.2ACN114022209A (en)2021-11-092021-11-09Advertisement recall method and device based on alliance flow

Applications Claiming Priority (1)

Application NumberPriority DateFiling DateTitle
CN202111323295.2ACN114022209A (en)2021-11-092021-11-09Advertisement recall method and device based on alliance flow

Publications (1)

Publication NumberPublication Date
CN114022209Atrue CN114022209A (en)2022-02-08

Family

ID=80062705

Family Applications (1)

Application NumberTitlePriority DateFiling Date
CN202111323295.2APendingCN114022209A (en)2021-11-092021-11-09Advertisement recall method and device based on alliance flow

Country Status (1)

CountryLink
CN (1)CN114022209A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN109544201A (en)*2018-10-112019-03-29北京奇虎科技有限公司A kind of generation method of Advertising Copy, device and electronic equipment
CN111178970A (en)*2019-12-302020-05-19微梦创科网络科技(中国)有限公司Advertisement delivery method and device, electronic equipment and computer readable storage medium
CN111597454A (en)*2020-04-022020-08-28微梦创科网络科技(中国)有限公司 Account recommendation method and device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN109544201A (en)*2018-10-112019-03-29北京奇虎科技有限公司A kind of generation method of Advertising Copy, device and electronic equipment
CN111178970A (en)*2019-12-302020-05-19微梦创科网络科技(中国)有限公司Advertisement delivery method and device, electronic equipment and computer readable storage medium
CN111597454A (en)*2020-04-022020-08-28微梦创科网络科技(中国)有限公司 Account recommendation method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王晋升: "广告CTR预估系统的设计与实现", 《中国优秀硕士学位论文全文数据库 信息科技辑》, no. 2021, 15 May 2021 (2021-05-15), pages 1 - 60*

Similar Documents

PublicationPublication DateTitle
CN110263235B (en)Information push object updating method and device and computer equipment
CN111768244B (en) A method and device for recommending advertisement placement
CN102880688B (en)A kind of method for webpage is estimated, device and equipment
CN111626863A (en)Intelligent recommendation method for financial products
US9336212B2 (en)Systems and methods for collection and automatic analysis of opinions on various types of media
CN112801760A (en)Sequencing optimization method and system of content personalized recommendation system
CN114399352B (en)Information recommendation method and device, electronic equipment and storage medium
CN111078859B (en)Author recommendation method based on reference times
CN106303720B (en)A kind of video recommendation method and system
CN111612519B (en)Method, device and storage medium for identifying potential customers of financial products
CN113222326A (en)Method and device for evaluating maturity of scientific and technological resource service platform
CN112967116A (en)Multi-strategy-pool-based off-line retail store commodity recommendation method and system
WO2002025486A1 (en)Method for collecting and processing information
CN111598632B (en)Method and device for determining equity shares and equity share sequence
CN119722254B (en)Recommendation method of financial products
CN111709819B (en)Point-and-shoot-net property right transaction recommendation system and recommendation method based on graph neural network
CN109636184B (en) Method and system for evaluating account assets of a brand
CN114022209A (en)Advertisement recall method and device based on alliance flow
CN112308419A (en) Data processing method, apparatus, equipment and computer storage medium
CN111859146A (en)Information mining method and device and electronic equipment
CN109472455B (en)Activity evaluation method, activity evaluation device, electronic equipment and storage medium
CN116308545B (en)Advertisement online dynamic pushing method combining user behaviors
CN117236465A (en)Information entropy-based federal decision tree information measurement method
Melnyk et al.The Hierarchical Information System for Management of the Targeted Advertising.
CN108595595A (en)A kind of user knowledge requirement acquisition method calculated based on interactive differential evolution

Legal Events

DateCodeTitleDescription
PB01Publication
PB01Publication
SE01Entry into force of request for substantive examination
SE01Entry into force of request for substantive examination

[8]ページ先頭

©2009-2025 Movatter.jp