Disclosure of Invention
The embodiments of the present application provide a weight adjustment method, a weight adjustment device and a storage medium, which decide whether to increase the weight of a recall strategy by evaluating characteristic information of the results recalled by that strategy. This improves the flexibility with which the results of different recall strategies are apportioned in the recall fusion stage, and thereby improves the recommendation effect.
The technical scheme of the embodiment of the application is realized as follows:
the embodiment of the application provides a weight adjusting method, which comprises the following steps:
acquiring multiple recall strategies corresponding to a target type object and multiple current weights corresponding to the multiple recall strategies one by one; each weight in the plurality of current weights represents the application proportion of the result of the corresponding recall strategy recall in the recall fusion stage;
respectively determining a group of characteristic information based on the recall result of each strategy in the plurality of recall strategies to obtain a plurality of groups of characteristic information;
for each strategy in the plurality of recall strategies, determining a corresponding sampling score based on a corresponding group of information in the plurality of groups of characteristic information to obtain a plurality of sampling scores;
based on the plurality of sample scores, at least one weight is selected from the plurality of current weights and adjusted.
In the above method, the determining a set of feature information based on the result of each policy recall in the plurality of recall policies to obtain a plurality of sets of feature information respectively includes:
obtaining a group of evaluation information corresponding to the result of each strategy recall in the plurality of recall strategies to obtain a plurality of groups of evaluation information; each group of information in the multiple groups of evaluation information comprises a smoothed click rate and a normalized single watching time length;
respectively determining a group of positive and negative data based on each group of information in the multiple groups of evaluation information to obtain multiple groups of positive and negative data;
and determining the multiple groups of positive and negative data as the multiple groups of characteristic information.
In the above method, the obtaining a set of evaluation information corresponding to a result of each policy recall in the multiple recall policies to obtain multiple sets of evaluation information includes:
obtaining click rates and single watching duration corresponding to the result recalled by each strategy in the multiple recall strategies to obtain multiple click rates and multiple single watching durations;
smoothing the plurality of click rates to obtain a plurality of smoothed click rates, and normalizing the plurality of single watching durations to obtain a plurality of normalized single watching durations;
and combining each click rate in the plurality of smoothed click rates with the time length of the corresponding recall strategy in the plurality of normalized single watching time lengths to form a group of evaluation information, and obtaining the plurality of groups of evaluation information.
In the above method, the determining a set of positive and negative direction data based on each set of information in the multiple sets of evaluation information to obtain multiple sets of positive and negative direction data includes:
acquiring a preset click rate parameter and a preset viewing time parameter;
and combining each group of information in the multiple groups of evaluation information with the preset click rate parameter and the preset watching duration parameter respectively to construct a group of positive and negative data to obtain multiple groups of positive and negative data.
In the above method, the selecting and adjusting at least one weight from the plurality of current weights based on the plurality of sample scores includes:
sequencing the plurality of sampling scores from large to small to obtain a sampling score sequence;
sequentially selecting at least one sampling score from the sampling score sequence;
obtaining at least one recall strategy corresponding to the at least one sampling score from the plurality of recall strategies;
and acquiring at least one weight corresponding to the at least one recall strategy from the plurality of current weights, and increasing the at least one weight.
In the above method, the increasing the at least one weight comprises:
when the at least one weight is one current weight, increasing the current weight until the recall result of the recall strategy corresponding to the current weight meets a preset stop condition;
when the at least one weight is N current weights and correspondingly the at least one sampling score is N sampling scores, sequentially increasing, in the order of the N sampling scores, the weight in the N current weights that corresponds to the same recall strategy as each sampling score, until the recall result of the recall strategy corresponding to each weight in the N current weights meets the preset stop condition; N is a natural number greater than 1.
In the above method, the preset stop condition is that the exposure ratio is greater than a first preset threshold and the profit is less than a second preset threshold.
The embodiment of the application provides a weight adjusting device, including:
the acquisition module is used for acquiring a plurality of recall strategies corresponding to a target type object and a plurality of current weights corresponding to the plurality of recall strategies one to one; each weight in the plurality of current weights represents the application proportion of the result of the corresponding recall strategy recall in the recall fusion stage;
the determining module is used for respectively determining a group of characteristic information based on the recall result of each strategy in the plurality of recall strategies to obtain a plurality of groups of characteristic information; for each strategy in the plurality of recall strategies, determining a corresponding sampling score based on a corresponding group of information in the plurality of groups of characteristic information to obtain a plurality of sampling scores;
and the adjusting module is used for selecting at least one weight from the plurality of current weights and adjusting the weight based on the plurality of sampling scores.
In the above apparatus, the determining module is specifically configured to obtain a set of evaluation information corresponding to a result of each policy recall in the multiple recall policies, so as to obtain multiple sets of evaluation information; each group of information in the multiple groups of evaluation information comprises a smoothed click rate and a normalized single watching time length; respectively determining a group of positive and negative data based on each group of information in the multiple groups of evaluation information to obtain multiple groups of positive and negative data; and determining the multiple groups of positive and negative data as the multiple groups of characteristic information.
In the above apparatus, the determining module is specifically configured to obtain a click rate and a single viewing duration corresponding to a result of each policy recall in the multiple recall policies, and obtain multiple click rates and multiple single viewing durations; smoothing the plurality of click rates to obtain a plurality of smoothed click rates, and normalizing the plurality of single watching durations to obtain a plurality of normalized single watching durations; and combining each click rate in the plurality of smoothed click rates with the time length of the corresponding recall strategy in the plurality of normalized single watching time lengths to form a group of evaluation information, and obtaining the plurality of groups of evaluation information.
In the device, the determining module is specifically configured to obtain a preset click rate parameter and a preset viewing duration parameter; and combining each group of information in the multiple groups of evaluation information with the preset click rate parameter and the preset watching duration parameter respectively to construct a group of positive and negative data to obtain multiple groups of positive and negative data.
In the above apparatus, the adjusting module is specifically configured to sort the plurality of sampling scores from large to small to obtain a sampling score sequence; sequentially selecting at least one sampling score from the sampling score sequence; obtaining at least one recall strategy corresponding to the at least one sampling score from the plurality of recall strategies; and acquiring at least one weight corresponding to the at least one recall strategy from the plurality of current weights, and increasing the at least one weight.
In the above apparatus, the adjusting module is specifically configured to, when the at least one weight is one current weight, increase the current weight until the recall result of the recall strategy corresponding to the current weight meets a preset stop condition; when the at least one weight is N current weights and correspondingly the at least one sampling score is N sampling scores, sequentially increasing, in the order of the N sampling scores, the weight in the N current weights that corresponds to the same recall strategy as each sampling score, until the recall result of the recall strategy corresponding to each weight in the N current weights meets the preset stop condition; N is a natural number greater than 1.
In the above apparatus, the preset stop condition is that the exposure ratio is greater than a first preset threshold and the profit is less than a second preset threshold.
The embodiment of the application provides a weight adjusting device, which comprises a processor, a memory and a communication bus;
the communication bus is used for realizing communication connection between the processor and the memory;
the processor is configured to execute one or more programs stored in the memory to implement the weight adjustment method.
An embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the weight adjustment method described above.
The embodiment of the application provides a weight adjustment method, a weight adjustment device and a storage medium, wherein the method comprises the following steps: acquiring multiple recall strategies corresponding to the target type object and multiple current weights corresponding to the multiple recall strategies one by one; respectively determining a group of characteristic information based on the recall result of each strategy in the plurality of recall strategies to obtain a plurality of groups of characteristic information; for each strategy in the multiple recalling strategies, determining a corresponding sampling score based on a corresponding group of information in the multiple groups of characteristic information to obtain multiple sampling scores; based on the plurality of sample scores, at least one weight is selected from the plurality of current weights and adjusted. According to the technical scheme, the characteristic information of the recall result of the recall strategy is evaluated to judge whether the weight of the recall is increased or not, so that the flexibility of application proportion distribution in the recall fusion stage of different recall strategies is improved, and the recommendation effect is improved.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Fig. 1 is a schematic view of an exemplary video recommendation process provided in an embodiment of the present application. As shown in fig. 1, in the embodiment of the present application, a video recommendation system needs to first perform video recall from a video source by using a video recall policy to obtain candidate videos, and then perform fusion of the candidate videos, that is, select a certain number of videos from the candidate videos based on the weight of each video recall policy, so as to sort the selected videos, and finally recommend the videos to a user. The weight adjustment method provided by the embodiment of the application can be applied to the video recommendation system shown in fig. 1, and certainly, can also be applied to recommendation systems such as articles, audios, books and the like.
The embodiment of the application provides a weight adjustment method which, specifically, treats the fusion of multiple recalls as a Multi-Armed Bandit (MAB) problem and periodically and dynamically fuses the multiple recalls by means of a bandit algorithm. The details are described below.
Fig. 2 is a schematic flow chart of a weight adjustment method according to an embodiment of the present disclosure. As shown in fig. 2, in the embodiment of the present application, the weight adjustment method mainly includes the following steps:
S101, acquiring multiple recall strategies corresponding to a target type object and multiple current weights corresponding to the multiple recall strategies one by one; each weight in the plurality of current weights represents an application proportion of a result of the corresponding recall policy recall in the recall fusion stage.
In an embodiment of the present application, the weight adjusting apparatus may obtain a plurality of recall policies corresponding to the target type object, and a current weight corresponding to each policy of the plurality of recall policies.
It can be understood that, in the embodiment of the present application, in the recommendation process, a corresponding plurality of recall policies may be preset for different types of users, that is, different types of objects, so that recall is performed with different policies for a large number of users. The target type object is a certain type of specific user, such as a new user or an old user. The specific target type object and the recall policies corresponding to it may be set according to actual application scenarios and requirements, and the embodiments of the present application are not limited.
It should be noted that, in the embodiment of the present application, while the weight adjustment device performs weight adjustment on multiple recall policies corresponding to the target type object, actually, the weight adjustment device may also perform weight adjustment on multiple recall policies corresponding to one or more other type objects different from the target type object at the same time, and the specific weight adjustment methods are completely the same.
It should be noted that, in the embodiment of the present application, the current weight corresponding to each recall policy represents an application ratio of the result recalled by the recall policy in the recall fusion stage.
It should be noted that, in the embodiment of the present application, each of the plurality of recall policies has a corresponding current weight: if weight adjustment has already been performed, the current weight is the most recently adjusted weight; if weight adjustment has not yet been performed, the current weight is the preset initial weight.
It should be noted that, in the embodiment of the present application, the multiple recall policies may be multiple video recall policies, multiple article recall policies, and the like, and the corresponding recall result may be content of a type of video, an article, a book, an audio, and the like, and the embodiment of the present application is not limited.
For example, in the embodiment of the present application, the target type object corresponds to four recall policies, namely an interest recall policy, a collaborative filtering recall policy, a deep learning recall policy, and a vectorization recall policy. The current weight of the interest recall strategy is 0.6, the current weight of the collaborative filtering recall strategy is 0.2, and the current weights of the deep learning recall strategy and the vectorization recall strategy are respectively 0.1.
S102, respectively determining a group of characteristic information based on the recall result of each strategy in the multiple recall strategies to obtain multiple groups of characteristic information.
In an embodiment of the present application, the weight adjusting apparatus may determine a set of feature information based on a result of recalling each of the plurality of kinds of recall policies, so as to obtain a plurality of sets of feature information.
It is understood that, in the embodiment of the present application, each policy in the plurality of recall policies may be used as a recall criterion to implement a corresponding recall, so that the weight adjustment apparatus may determine, for each recall policy, a corresponding set of feature information by using a result of the recall, to obtain a plurality of sets of feature information.
Specifically, in an embodiment of the present application, the weight adjusting apparatus determines a set of feature information based on a result of recalling each of the multiple kinds of recall policies, and obtains multiple sets of feature information, including: acquiring a group of evaluation information corresponding to the result of each strategy recall in a plurality of recall strategies to obtain a plurality of groups of evaluation information; respectively determining a group of positive and negative data based on each group of information in the multiple groups of evaluation information to obtain multiple groups of positive and negative data; and determining the multiple groups of positive and negative data into multiple groups of characteristic information.
Specifically, in an embodiment of the present application, the obtaining, by the weight adjusting device, a set of evaluation information corresponding to a result of recalling each of the multiple kinds of policies to obtain multiple sets of evaluation information includes: obtaining click rates and single watching duration corresponding to the result recalled by each strategy in a plurality of recall strategies to obtain a plurality of click rates and a plurality of single watching durations; smoothing the plurality of click rates to obtain a plurality of smoothed click rates, and normalizing the plurality of single viewing durations to obtain a plurality of normalized single viewing durations; and combining each click rate in the plurality of smoothed click rates with the time length corresponding to the same recall strategy in the plurality of normalized single watching time lengths to form a group of evaluation information to obtain a plurality of groups of evaluation information.
It should be noted that, in the embodiment of the present application, when acquiring a group of evaluation information for each of the multiple recall policies, the weight adjustment device first acquires the click rate and the single viewing duration of the result recalled by the policy. The result recalled by one policy may include multiple sub-results. For example, when the recall policy is a video recall policy, multiple videos that meet the policy may be recalled, and each video is a sub-result. Accordingly, the click rate is the overall click rate over the multiple sub-results, and the single viewing duration is the average viewing duration over the multiple sub-results. The weight adjustment device then smooths the click rate and normalizes the single viewing duration to form a group of evaluation information. The specific smoothing and normalization modes may be set according to actual application scenarios and requirements, and the embodiment of the present application is not limited.
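As an illustration of the aggregation described above, the following minimal sketch computes one click rate and one single viewing duration for a recall policy from its sub-results. The field names (pv, click, watch_seconds, plays) and the choice to average viewing time per play are assumptions made for this example, not details given in this application.

```python
def aggregate_policy_stats(sub_results):
    """Combine per-video statistics into one (ctr, single_duration) pair.

    Each sub-result is a dict with hypothetical keys:
    pv (exposures), click (clicks), watch_seconds (total viewing time), plays (views).
    """
    total_pv = sum(r["pv"] for r in sub_results)
    total_click = sum(r["click"] for r in sub_results)
    total_watch = sum(r["watch_seconds"] for r in sub_results)
    total_plays = sum(r["plays"] for r in sub_results)
    # Guard against empty statistics so the sketch never divides by zero.
    ctr = total_click / total_pv if total_pv else 0.0
    single_duration = total_watch / total_plays if total_plays else 0.0
    return ctr, single_duration


# Two videos recalled by the same (hypothetical) policy.
videos = [
    {"pv": 1000, "click": 80, "watch_seconds": 2400.0, "plays": 80},
    {"pv": 500, "click": 20, "watch_seconds": 400.0, "plays": 20},
]
ctr, single_duration = aggregate_policy_stats(videos)
```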
Specifically, in the embodiment of the present application, the click rate corresponding to the result of one recall strategy is ctr=click/pv, where pv is the exposure number and click is the number of clicks. The weight adjustment device may perform Wilson smoothing on the click rate, specifically using formula (1), where ctr' is the smoothed click rate and z is 1.96, which corresponds to a confidence of 95%.
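Formula (1) is referenced but not reproduced in this text. The following minimal sketch assumes that it is the lower bound of the Wilson score interval, which is a common form of Wilson smoothing for click rates; the function name and the handling of pv = 0 are likewise assumptions of this example.

```python
import math


def wilson_smoothed_ctr(click, pv, z=1.96):
    """Lower bound of the Wilson score interval for click/pv (assumed form of formula (1))."""
    if pv == 0:
        # No exposures yet: treat the smoothed click rate as 0 (an assumption).
        return 0.0
    ctr = click / pv
    centre = ctr + z * z / (2 * pv)
    margin = z * math.sqrt(ctr * (1 - ctr) / pv + z * z / (4 * pv * pv))
    return (centre - margin) / (1 + z * z / pv)


# Same raw click rate (0.5), very different amounts of evidence.
low_volume = wilson_smoothed_ctr(5, 10)
high_volume = wilson_smoothed_ctr(500, 1000)
```

Under this assumption, the result with few exposures is pulled much further below 0.5 than the result with many exposures, so a policy with little data is not over-rewarded.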
In the embodiment of the present application, when smoothing the click rate ctr, the weight adjustment device may alternatively perform Bayesian smoothing, that is, add a constant to each of the click number click and the exposure number pv, as shown in formula (2):
ctr’=(click+a)/(pv+b) (2)
where ctr' is the smoothed click rate, and a and b are preset constants.
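A minimal sketch of the Bayesian smoothing described above, in which one constant is added to the click number and another to the exposure number. The default values of a and b are purely illustrative assumptions; the application only states that they are preset constants.

```python
def bayesian_smoothed_ctr(click, pv, a=1.0, b=10.0):
    """Add preset constants a and b to the click number and exposure number."""
    # b must be positive so that the result is defined even when pv is 0.
    return (click + a) / (pv + b)


few_exposures = bayesian_smoothed_ctr(5, 10)      # (5 + 1) / (10 + 10)
many_exposures = bayesian_smoothed_ctr(500, 1000)  # (500 + 1) / (1000 + 10)
```

With these illustrative constants, the policy with few exposures is pulled strongly toward a/b, while the policy with many exposures stays close to its raw click rate.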
It can be understood that, in the embodiment of the present application, smoothing the click rate addresses the problem that the click rate is only a relative value and ignores absolute values, that is, it does not reflect the size of the exposure number. For example, in one case the click number is 5 and the exposure number is 10, and in another case the click number is 500 and the exposure number is 1000. The click rate is 0.5 in both cases, but from the point of view of confidence the second case is clearly more reliable than the first, so the raw click rate alone is not reliable enough. The smoothed click rate therefore reflects the actual situation more accurately.
Specifically, in the embodiment of the present application, the single viewing duration of the result recalled by one recall policy is duration, and the weight adjustment device may normalize it by using the following formula (3):
duration’=(duration-min(duration))/(max(duration)-min(duration)) (3)
wherein duration' is the normalized single viewing duration, min(duration) is the shortest single viewing duration among the single viewing durations corresponding to the multiple recall policies, and max(duration) is the longest single viewing duration among the single viewing durations corresponding to the multiple recall policies.
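The normalization described above can be sketched as a min-max scaling over the single viewing durations of all recall policies. The policy names and the handling of the degenerate case in which all durations are equal are assumptions of this example.

```python
def normalize_durations(durations):
    """Min-max normalize a mapping of policy name -> single viewing duration."""
    shortest = min(durations.values())
    longest = max(durations.values())
    if longest == shortest:
        # All policies have the same duration; map them to 0.0 (an assumption).
        return {name: 0.0 for name in durations}
    span = longest - shortest
    return {name: (value - shortest) / span for name, value in durations.items()}


normalized = normalize_durations(
    {"interest": 90.0, "collaborative_filtering": 60.0, "deep_learning": 30.0}
)
```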
Specifically, in an embodiment of the present application, the weight adjusting apparatus determines a set of positive and negative direction data respectively based on each set of information in a plurality of sets of evaluation information, and obtains a plurality of sets of positive and negative direction data, including: acquiring a preset click rate parameter and a preset viewing time parameter; and combining each group of information in the multiple groups of evaluation information with a preset click rate parameter and a preset viewing duration parameter respectively to construct a group of positive and negative data to obtain multiple groups of positive and negative data.
It should be noted that, in the embodiment of the present application, a preset click rate parameter and a preset viewing duration parameter may be stored in the weight adjusting device, so that when positive and negative direction data are determined, the positive and negative direction data may be directly obtained and called. The specific preset click rate parameter and the preset viewing duration parameter can be set according to actual requirements, and the embodiment of the application is not limited.
It can be understood that, in the embodiment of the present application, each group of information in the multiple groups of evaluation information includes a smoothed click rate and a normalized single viewing duration, and the weight adjusting device may construct the positive and negative data by combining the preset click rate parameter and the preset viewing duration parameter with the click rate and the duration included in each group, respectively.
Specifically, in the embodiment of the present application, for any group of evaluation information (ctr', duration'), the weight adjustment device uses the following formula (4) and formula (5) to construct a group of positive and negative data (F, G):
F=(α×ctr’+β×duration’)×100 (4)
G=(α×(1-ctr’)+β×(1-duration’))×100 (5)
wherein, F is positive data, G is negative data, alpha is a preset click rate parameter, and beta is a preset viewing duration parameter.
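Formulas (4) and (5) can be sketched directly in code. The default values chosen for alpha and beta are illustrative assumptions; the application leaves both parameters to be set according to actual requirements.

```python
def positive_negative_data(ctr_smoothed, duration_normalized, alpha=0.5, beta=0.5):
    """Build (F, G) from a smoothed click rate and a normalized single viewing duration."""
    f = (alpha * ctr_smoothed + beta * duration_normalized) * 100        # formula (4)
    g = (alpha * (1 - ctr_smoothed) + beta * (1 - duration_normalized)) * 100  # formula (5)
    return f, g


f, g = positive_negative_data(0.4, 0.8)
```

Note that F + G always equals (alpha + beta) × 100, so the two parameters also control how much total evidence each policy contributes in the subsequent sampling step.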
In the embodiment of the present application, the manner of constructing the positive and negative direction data is only an exemplary manner, and of course, other manners may be set according to actual requirements to implement the determination of the positive and negative direction data, and the embodiment of the present application is not limited.
It should be noted that, in the embodiment of the present application, the weight adjustment apparatus may determine a set of positive and negative directional data corresponding to each recall policy as a set of feature information, so as to obtain multiple sets of feature information.
S103, for each strategy in the multiple recall strategies, determining a corresponding sampling score based on a corresponding group of information in the multiple groups of characteristic information to obtain multiple sampling scores.
In an embodiment of the application, after obtaining the multiple sets of feature information corresponding to the multiple recall policies, the weight adjustment apparatus may determine, for each of the multiple recall policies, a corresponding sampling score based on a corresponding set of information in the multiple sets of feature information, so as to obtain multiple sampling scores.
Specifically, in the embodiment of the present application, the group of feature information may be a group of positive and negative data (F, G), and the weight adjustment device may perform Thompson sampling with a Beta distribution to determine the sampling score z, specifically using the following formula (6):
z~Beta(F,G) (6)
wherein z is a value sampled from the Beta distribution with parameters F and G; the larger ctr' and duration' are, the larger z tends to be.
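A minimal sketch of the Thompson sampling step, assuming that the sampling score is a single draw from a Beta distribution whose two parameters are the positive data F and the negative data G. It uses `random.betavariate` from the Python standard library; the small lower bound on the parameters is an assumption added because Beta parameters must be strictly positive.

```python
import random


def sampling_score(f, g, rng=random):
    """Draw one sampling score z from Beta(F, G)."""
    # betavariate requires both parameters to be greater than 0.
    return rng.betavariate(max(f, 1e-6), max(g, 1e-6))


rng = random.Random(0)  # fixed seed only to make the example repeatable
strong_policy_score = sampling_score(80.0, 20.0, rng)
weak_policy_score = sampling_score(20.0, 80.0, rng)
```

Because each score is a random draw, a policy with weaker statistics can occasionally outrank a stronger one, which is what gives the bandit approach its exploration behaviour.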
It should be noted that, in the embodiment of the present application, the weight adjustment device determines the sampling scores by using a group of feature information corresponding to each recall policy, and an adopted algorithm may also be a UCB algorithm, and correspondingly, the feature information determined in step S102 is feature information matched with a corresponding algorithm, which is not limited in the embodiment of the present application.
S104, based on the plurality of sampling scores, selecting at least one weight from the plurality of current weights and adjusting the weight.
In an embodiment of the application, after obtaining the sampling score corresponding to each recall policy, the weight adjusting apparatus may select at least one weight from the plurality of current weights based on the plurality of sampling scores and perform adjustment.
Specifically, in an embodiment of the present application, the weight adjusting apparatus selects at least one weight from the plurality of current weights based on the plurality of sample scores and adjusts the at least one weight, including: sequencing the plurality of sampling scores from large to small to obtain a sampling score sequence; sequentially selecting at least one sampling score from the sampling score sequence; obtaining at least one recall strategy corresponding to at least one sampling score from a plurality of recall strategies; and acquiring at least one weight corresponding to at least one recall strategy from the plurality of current weights, and increasing the at least one weight.
It should be noted that, in the embodiment of the present application, the weight adjustment device may rank the multiple sampling scores to obtain a sampling score sequence, so that it can preferentially select the recall policies with the larger sampling scores and adjust the current weights corresponding to the selected recall policies.
It can be understood that, in the embodiment of the present application, each policy in the plurality of recall policies corresponds to one current weight and one sampling score. Therefore, according to each selected sampling score, the weight adjusting apparatus may directly obtain the recall policy whose weight needs to be adjusted and the current weight corresponding to that recall policy, so as to perform the adjustment.
It should be noted that, in the embodiment of the present application, the weight adjustment apparatus sequentially selects at least one sample score from the sample score sequence, and actually selects one or more previous sample scores from the sample score sequence.
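The ranking and selection described above can be sketched as follows. The policy names, the example scores and the parameter n (how many policies to select) are assumptions made for illustration.

```python
def select_top_policies(sampling_scores, n):
    """Return the names of the n policies with the largest sampling scores, in rank order."""
    ranked = sorted(sampling_scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _score in ranked[:n]]


scores = {
    "interest": 0.62,
    "collaborative_filtering": 0.71,
    "deep_learning": 0.35,
    "vectorization": 0.48,
}
selected = select_top_policies(scores, 2)
```

The returned order matters, because the weights of the selected policies are increased one after another in this order.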
Specifically, in an embodiment of the present application, the weight adjusting device increases at least one weight, including: under the condition that at least one weight is a current weight, increasing the current weight until a recall result of a recall strategy corresponding to the current weight meets a preset stop condition; sequentially increasing the weight corresponding to the same recall strategy in the N current weights according to the sequence of the N sampling scores under the condition that at least one weight is N current weights and correspondingly at least one sampling score is N sampling scores until the recall result of the recall strategy corresponding to each weight in the N current weights meets a preset stop condition; n is a natural number greater than 1.
It can be understood that, in the embodiment of the present application, at least one weight selected by the weight adjusting device may be one current weight or N current weights, where in the case of one current weight, the current weight is directly increased until a result of the recall policy recall corresponding to the current weight meets a preset stop condition, and in the case of N current weights, the weight increase may be sequentially performed according to the ranking of the corresponding sampling scores, for example, the current weight in the top ranking is increased first until the result of the recall policy recall corresponding to the weight in the top ranking meets the preset stop condition, and then the current weight in the second ranking is increased until the result of the recall policy recall corresponding to the weight in the second ranking meets the preset stop condition, and until the increase of the current weight in the last ranking is completed.
Specifically, in the embodiment of the present application, the preset stop condition is that the exposure duty is greater than a first preset threshold, and the profit is less than a second preset threshold.
It can be understood that, in the embodiment of the present application, the weight adjusting apparatus needs to obtain the exposure duty and the profit of the result of the recall policy corresponding to the weight adjusting apparatus in real time during the process of increasing a current weight, so as to stop increasing the current weight when two pieces of information satisfy the preset stop condition, where a specific exposure duty is equal to a ratio of the number of exposures of the result of the recall policy to the total number of exposures of the results of multiple recall policy recalls, as shown in the following formula (7):
It should be noted that, in the embodiment of the present application, the first preset threshold and the second preset threshold may be set according to actual needs and application scenarios, which is not limited in the embodiment of the present application. In addition, the preset stop condition itself may also be set flexibly according to actual requirements and application scenarios, which is likewise not limited.
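By way of illustration only, the exposure ratio and the preset stop condition described above might be computed as in the following Python sketch. The function names, the strategy names, and the threshold values are hypothetical and are not taken from the embodiment; how the profit is measured is not defined here, so it is passed in as a plain number.

```python
from typing import Dict


def exposure_ratio(exposures: Dict[str, int], strategy: str) -> float:
    """Exposures of one strategy's recalled results divided by all exposures."""
    total = sum(exposures.values())
    if total == 0:
        return 0.0  # no exposures observed yet (handling assumed for this sketch)
    return exposures[strategy] / total


def stop_condition_met(ratio: float, profit: float,
                       first_threshold: float,
                       second_threshold: float) -> bool:
    """True when the ratio exceeds the first threshold and the profit is below the second."""
    return ratio > first_threshold and profit < second_threshold


# Hypothetical exposure counts per recall strategy.
exposures = {"hot": 300, "collaborative": 500, "fresh": 200}
ratio = exposure_ratio(exposures, "hot")
print(ratio, stop_condition_met(ratio, 0.01, 0.25, 0.05))
```

Both thresholds are left as parameters because the embodiment allows them to be set per application scenario.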
It can be understood that, in the embodiment of the present application, when increasing a current weight, the weight adjusting device may increase the weight by one unit at a time and then determine whether the preset stop condition is met; if not, it increases the weight by one more unit, and repeats until the preset stop condition is met.
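As a minimal sketch of this step-by-step increase, and of the ranked-order handling of N weights, the following Python example may be considered. It assumes the stop condition can be queried through a callback; the names `increase_until_stop`, `increase_in_ranked_order`, the `max_steps` safety cap, and the toy per-strategy caps are hypothetical. In a real system the callback would consult the exposure ratio and profit observed after each increase.

```python
from typing import Callable, Dict, List


def increase_until_stop(weight: float, unit: float,
                        should_stop: Callable[[float], bool],
                        max_steps: int = 1000) -> float:
    """Add one unit at a time, checking the stop condition after each step."""
    for _ in range(max_steps):  # max_steps is a safety cap assumed for this sketch
        weight += unit
        if should_stop(weight):
            break
    return weight


def increase_in_ranked_order(weights: Dict[str, float],
                             ranked: List[str],
                             unit: float,
                             should_stop: Callable[[str, float], bool]
                             ) -> Dict[str, float]:
    """Finish raising one strategy's weight before moving to the next in the ranking."""
    updated = dict(weights)
    for name in ranked:
        updated[name] = increase_until_stop(
            updated[name], unit, lambda w, n=name: should_stop(n, w))
    return updated


# Toy stop rule standing in for the preset stop condition:
# stop once a weight reaches a per-strategy cap.
caps = {"hot": 0.5, "fresh": 0.375}
weights = {"hot": 0.25, "fresh": 0.25, "similar": 0.25}
adjusted = increase_in_ranked_order(
    weights, ["hot", "fresh"], 0.125, lambda name, w: w >= caps[name])
print(adjusted)
```

Weights of strategies that were not selected, such as `similar` here, are left unchanged.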
It should be noted that, in the embodiment of the present application, each of the plurality of current weights represents the application proportion, in the recall fusion stage, of the result recalled by the corresponding recall strategy. After the weight adjusting device has adjusted the at least one current weight, the results recalled by the multiple recall strategies can be fused based on the adjusted weights and then pushed to the target type object.
It should be noted that, in the embodiment of the present application, the weight adjusting device may adjust the weights of the multiple recall strategies periodically. That is, each time a period of time elapses, the above steps are executed again to adjust the weights dynamically, so as to meet the real-time requirement.
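The fusion procedure itself is not detailed here. One possible reading, given purely as an assumption, is that each recall strategy contributes a share of the fused list proportional to its weight. The Python sketch below illustrates that reading; `fuse_recalls`, the strategy names, and the duplicate handling are hypothetical.

```python
from typing import Dict, List


def fuse_recalls(results: Dict[str, List[str]],
                 weights: Dict[str, float],
                 total: int) -> List[str]:
    """Give each strategy a quota of `total` slots proportional to its weight."""
    weight_sum = sum(weights[name] for name in results)
    if weight_sum <= 0:
        return []
    fused: List[str] = []
    seen = set()
    for name, items in results.items():
        quota = int(total * weights[name] / weight_sum)  # rounded down
        taken = 0
        for item in items:
            if taken >= quota:
                break
            if item not in seen:  # skip items already supplied by another strategy
                seen.add(item)
                fused.append(item)
                taken += 1
    return fused


results = {"hot": ["a", "b", "c", "d"], "fresh": ["c", "e", "f", "g"]}
fused = fuse_recalls(results, {"hot": 3.0, "fresh": 1.0}, total=4)
print(fused)
```

Because quotas are rounded down, the fused list may be shorter than `total`; a production system would need its own rule for filling the remainder.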
Fig. 3 is a schematic diagram comparing video viewing durations according to an embodiment of the present application. As shown in fig. 3, in a video recommendation scenario, the weight adjustment method of the present application is used to periodically adjust the weights corresponding to multiple video recall strategies, and the videos recalled by those strategies are fused according to the adjusted weights and then sorted and recommended. The resulting change in the user's video viewing duration is shown by solid line 1, while the change under the averaging method is shown by dashed line 2. With the method of the present application the user watches videos for longer, that is, the effect on the user is positive.
The embodiment of the application provides a weight adjusting method, which comprises the following steps: acquiring multiple recall strategies corresponding to the target type object and multiple current weights corresponding to the multiple recall strategies one by one, where each of the current weights represents the application proportion, in the recall fusion stage, of the result recalled by the corresponding recall strategy; determining a group of characteristic information based on the result recalled by each of the multiple recall strategies, to obtain multiple groups of characteristic information; for each of the multiple recall strategies, determining a corresponding sampling score based on the corresponding group of characteristic information, to obtain multiple sampling scores; and, based on the multiple sampling scores, selecting at least one weight from the multiple current weights and adjusting it. The weight adjusting method provided by the embodiment of the application evaluates the characteristic information of the result recalled by each recall strategy to decide whether to increase that strategy's weight, which makes the allocation of application proportions among the results of different recall strategies in the recall fusion stage more flexible and improves the recommendation effect.
The embodiment of the application also provides a weight adjusting device. Fig. 4 is a first schematic structural diagram of a weight adjusting apparatus according to an embodiment of the present disclosure. As shown in fig. 4, in an embodiment of the present application, the weight adjusting apparatus includes:
an obtaining module 401, configured to obtain multiple recall policies corresponding to a target type object and multiple current weights corresponding to the multiple recall policies one to one; each weight in the plurality of current weights represents the application proportion of the result of the corresponding recall strategy recall in the recall fusion stage;
a determining module 402, configured to determine a set of feature information based on a result of each policy recall in the multiple recall policies, respectively, to obtain multiple sets of feature information; for each strategy in the plurality of recall strategies, determining a corresponding sampling score based on a corresponding group of information in the plurality of groups of characteristic information to obtain a plurality of sampling scores;
an adjusting module 403, configured to select at least one weight from the plurality of current weights and adjust the weight based on the plurality of sample scores.
In an embodiment of the present application, the determining module 402 is specifically configured to obtain a set of evaluation information corresponding to a result of recalling each of the multiple kinds of recall policies, so as to obtain multiple sets of evaluation information; each group of information in the multiple groups of evaluation information comprises a smoothed click rate and a normalized single watching time length; respectively determining a group of positive and negative data based on each group of information in the multiple groups of evaluation information to obtain multiple groups of positive and negative data; and determining the multiple groups of positive and negative data as the multiple groups of characteristic information.
In an embodiment of the present application, the determining module 402 is specifically configured to obtain a click rate and a single viewing duration corresponding to the result recalled by each of the multiple recall policies, so as to obtain multiple click rates and multiple single viewing durations; smooth the multiple click rates to obtain multiple smoothed click rates, and normalize the multiple single viewing durations to obtain multiple normalized single viewing durations; and combine, for each recall policy, its smoothed click rate with its normalized single viewing duration to form a group of evaluation information, so as to obtain the multiple groups of evaluation information.
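The following Python sketch illustrates how such groups of evaluation information might be assembled. The smoothing and normalization methods used here (additive smoothing with pseudo-counts, and min-max normalization) are assumptions chosen for illustration only and may differ from the formulas of the embodiment; all function names, parameters, and input figures are hypothetical.

```python
from typing import Dict, Tuple


def smooth_click_rate(clicks: int, exposures: int,
                      prior_clicks: float = 1.0,
                      prior_exposures: float = 10.0) -> float:
    """Additive smoothing (assumed): pseudo-counts damp rates from small samples."""
    return (clicks + prior_clicks) / (exposures + prior_exposures)


def min_max_normalize(values: Dict[str, float]) -> Dict[str, float]:
    """Min-max normalization (assumed): rescale durations to the range [0, 1]."""
    low, high = min(values.values()), max(values.values())
    if high == low:
        return {name: 0.0 for name in values}
    return {name: (v - low) / (high - low) for name, v in values.items()}


def build_evaluation_info(
        stats: Dict[str, Tuple[int, int, float]]
) -> Dict[str, Tuple[float, float]]:
    """stats maps strategy -> (clicks, exposures, single viewing duration in seconds)."""
    durations = min_max_normalize({name: s[2] for name, s in stats.items()})
    return {name: (smooth_click_rate(s[0], s[1]), durations[name])
            for name, s in stats.items()}


stats = {"hot": (9, 90, 30.0), "fresh": (4, 40, 60.0)}
info = build_evaluation_info(stats)
print(info)
```

Each value in `info` pairs one strategy's smoothed click rate with its normalized single viewing duration, which corresponds to one group of evaluation information.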
In an embodiment of the present application, the determining module 402 is specifically configured to obtain a preset click rate parameter and a preset viewing duration parameter; and combining each group of information in the multiple groups of evaluation information with the preset click rate parameter and the preset watching duration parameter respectively to construct a group of positive and negative data to obtain multiple groups of positive and negative data.
In an embodiment of the present application, the adjusting module 403 is specifically configured to sort the multiple sample scores from small to large to obtain a sample score sequence; sequentially selecting at least one sampling score from the sampling score sequence; obtaining at least one recall strategy corresponding to the at least one sampling score from the plurality of recall strategies; and acquiring at least one weight corresponding to the at least one recall strategy from the plurality of current weights, and increasing the at least one weight.
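A minimal Python sketch of this selection step is given below. It sorts the sampling scores from small to large as stated; this passage does not say from which end of the sequence selection begins, so the `from_end` flag, like the function name and the sample values, is a hypothetical addition.

```python
from typing import Dict, List, Tuple


def select_weights(scores: Dict[str, float],
                   weights: Dict[str, float],
                   count: int,
                   from_end: bool = False) -> List[Tuple[str, float]]:
    """Sort strategies by sampling score (ascending), pick `count`, return their weights."""
    ordered = sorted(scores, key=lambda name: scores[name])
    if from_end:
        ordered = ordered[::-1]  # hypothetical option: start from the largest score
    chosen = ordered[:count]
    return [(name, weights[name]) for name in chosen]


scores = {"hot": 0.7, "fresh": 0.2, "similar": 0.5}
weights = {"hot": 0.4, "fresh": 0.3, "similar": 0.3}
print(select_weights(scores, weights, count=2))
print(select_weights(scores, weights, count=1, from_end=True))
```

The returned pairs identify which current weights are to be increased; the increase itself is a separate step.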
In an embodiment of the present application, the adjusting module 403 is specifically configured to: when the at least one weight is a single current weight, increase that weight until the result recalled by the corresponding recall policy meets a preset stop condition; and when the at least one weight is N current weights, and correspondingly the at least one sampling score is N sampling scores, increase the N current weights one after another, in the order of the sampling scores of their respective recall policies, until the result recalled by the recall policy corresponding to each of the N current weights meets the preset stop condition; N is a natural number greater than 1.
In an embodiment of the application, the preset stop condition is that the exposure ratio is greater than a first preset threshold and the profit is less than a second preset threshold.
Fig. 5 is a second schematic structural diagram of a weight adjusting apparatus according to an embodiment of the present application. As shown in fig. 5, in the embodiment of the present application, the weight adjusting apparatus includes a processor 501, a memory 502, and a communication bus 503;
the communication bus 503 is used for realizing communication connection between the processor 501 and the memory 502;
the processor 501 is configured to execute one or more programs stored in the memory 502 to implement the above weight adjustment method.
The embodiment of the application provides a weight adjusting device, which acquires multiple recall strategies corresponding to a target type object and multiple current weights corresponding to the multiple recall strategies one by one, where each of the current weights represents the application proportion, in the recall fusion stage, of the result recalled by the corresponding recall strategy; determines a group of characteristic information based on the result recalled by each of the multiple recall strategies, to obtain multiple groups of characteristic information; for each of the multiple recall strategies, determines a corresponding sampling score based on the corresponding group of characteristic information, to obtain multiple sampling scores; and, based on the multiple sampling scores, selects at least one weight from the multiple current weights and adjusts it. The weight adjusting device provided by the embodiment of the application evaluates the characteristic information of the result recalled by each recall strategy to decide whether to increase that strategy's weight, which makes the allocation of application proportions among the results of different recall strategies in the recall fusion stage more flexible and improves the recommendation effect.
An embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the weight adjustment method described above. The computer-readable storage medium may be a volatile Memory (volatile Memory), such as a Random-Access Memory (RAM); or a non-volatile Memory (non-volatile Memory), such as a Read-Only Memory (ROM), a flash Memory (flash Memory), a Hard Disk (Hard Disk Drive, HDD) or a Solid-State Drive (SSD); or may be a respective device, such as a mobile phone, computer, tablet device, personal digital assistant, etc., that includes one or any combination of the above-mentioned memories.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of implementations of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application are included in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.