Content retrieval method, device terminal and storage medium based on user behavior map
Technical Field
The present invention relates to the field of data retrieval technologies, and in particular, to a content retrieval method based on a user behavior graph, an apparatus terminal, and a computer-readable storage medium.
Background
With the popularization of the internet and the development of search technology, each website or APP (Application) is provided with a search function, and a user can search for desired content by inputting keywords. For example, a search website or a search interface provided by APP is used to search for functions, articles, questions and answers related to "car wash" by inputting the keyword "car wash".
In a specific implementation, keyword-based content search can be realized with the ES (Elasticsearch) full-text search engine: the contents of articles, functions, questions and answers, and the like are subjected to ES word segmentation, and an inverted index is then established and stored in a database. When the user searches, the keywords input by the user are subjected to the same word segmentation processing, the segmented keywords are matched against the inverted index to determine related content, and the related content is recalled from the database. The recalled content is then sorted according to business weight to determine the display order, and is returned to the front end, which displays the retrieval result.
However, in the above retrieval scheme, only the matching degree between the input keyword and the content in the database is considered, the current user behavior information or other user behavior information is not considered, and personalized search ranking aiming at the actual needs of the user is not achieved, so that the satisfaction degree of the user on the search result is insufficient, and the corresponding click rate is also low.
Disclosure of Invention
In view of the above, it is necessary to provide a content retrieval method, a device terminal and a computer-readable storage medium based on a user behavior graph in order to solve the above problems.
A content retrieval method based on a user behavior map comprises the following steps:
acquiring an input search keyword;
determining at least one piece of recall content matched with the search keyword in a preset content database according to the search keyword;
respectively calculating an association score corresponding to each recalled content according to a preset association score calculation method based on a user behavior map, wherein the user behavior map is constructed according to the content included in the preset content database and historical click data corresponding to each piece of content, and the historical click data comprises click users and click times;
and sequencing at least one piece of recall content according to the association score, and outputting a sequencing result as a target retrieval result.
Wherein the method further comprises: and aiming at each piece of content contained in the preset content database, determining a content node and a user node corresponding to the content according to the clicking user and the clicking times in the historical clicking data, and constructing the user behavior map according to the content node and the user node.
Wherein, the calculating, based on the user behavior map, the association score corresponding to each recalled content according to a preset association score calculating method further includes: aiming at each recall content, determining click users and click times associated with the recall content according to the user behavior map; and calculating the association score corresponding to the recall content according to the associated click users and the click times.
Wherein the historical click data further comprises click time; the calculating the associated score corresponding to the recalled content according to the associated clicking users and the clicking times further comprises: calculating a time penalty value corresponding to the click time according to a preset time penalty function; and calculating the association score corresponding to the recall content according to the associated click times of the click users and the time penalty score corresponding to the click time.
Wherein the user behavior graph further comprises hot content tags set for one or more pieces of content; the determining at least one piece of recall content matched with the search keyword in a preset content database according to the search keyword further comprises: determining at least one piece of content with a hot content tag in the preset content database as the recall content.
Wherein, the calculating, based on the user behavior map, the association score corresponding to each recalled content according to a preset association score calculating method further includes: and under the condition that the recalling content is provided with a popular content label, calculating the association score corresponding to the recalling content according to a preset penalty weight coefficient and a preset association score calculation method.
Wherein the determining at least one piece of recall content matched with the search keyword in a preset content database according to the search keyword further comprises: matching the search keyword with the inverted index corresponding to the preset content database, and determining at least one piece of recall content according to a matching result; wherein the matching the search keyword with the inverted index corresponding to the preset content database, and determining at least one piece of recall content according to the matching result, further comprises: respectively calculating a matching score between each piece of content contained in the preset content database and the search keyword according to the inverted index; taking the content with the matching score exceeding a preset matching threshold value as the recall content; or, sorting each piece of content contained in the preset content database according to the matching score, and determining the recall content in the preset content database according to the sorting result, wherein the sorting result is stored in a blockchain.
A content retrieval apparatus based on a user behavior map, comprising:
the keyword acquisition module is used for acquiring input search keywords;
the content recall module is used for determining at least one piece of recall content matched with the search keyword in a preset content database according to the search keyword;
the association score calculation module is used for calculating an association score corresponding to each recalled content according to a preset association score calculation method based on a user behavior map, wherein the user behavior map is constructed according to the content included in the preset content database and historical click data corresponding to each content, and the historical click data comprises click users and click times;
and the sequencing module is used for sequencing at least one piece of recall content according to the association score and outputting a sequencing result as a target retrieval result.
A terminal comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of:
acquiring an input search keyword;
determining at least one piece of recall content matched with the search keyword in a preset content database according to the search keyword;
respectively calculating an association score corresponding to each recalled content according to a preset association score calculation method based on a user behavior map, wherein the user behavior map is constructed according to the content included in the preset content database and historical click data corresponding to each piece of content, and the historical click data comprises click users and click times;
and sequencing at least one piece of recall content according to the association score, and outputting a sequencing result as a target retrieval result.
A readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
acquiring an input search keyword;
determining at least one piece of recall content matched with the search keyword in a preset content database according to the search keyword;
respectively calculating an association score corresponding to each recalled content according to a preset association score calculation method based on a user behavior map, wherein the user behavior map is constructed according to the content included in the preset content database and historical click data corresponding to each piece of content, and the historical click data comprises click users and click times;
and sequencing at least one piece of recall content according to the association score, and outputting a sequencing result as a target retrieval result.
The invention has the following beneficial effects:
After the content retrieval method, the device terminal and the computer-readable storage medium based on the user behavior map are adopted, in the process of content retrieval, at least one piece of recall content matched with the retrieval keyword input by the user is collected from a preset content database; an association score corresponding to each piece of recall content is then calculated based on the constructed user behavior map and a preset association score calculation method; and the recall contents are sorted according to the association scores, so that the sorted recall contents are output to the user as the final target retrieval result. That is, the retrieval result obtained according to the input retrieval keyword can be further sorted based on the user behavior map, which improves the effectiveness of sorting and displaying the retrieval result and the subsequent conversion rate of content retrieval.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Wherein:
FIG. 1 is a flow chart of a content retrieval method based on a user behavior graph according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a user behavior graph in one embodiment of the invention;
FIG. 3 is a schematic diagram of a user behavior graph in accordance with an embodiment of the present invention;
FIG. 4 is a flow chart illustrating a content retrieval method based on a user behavior graph according to an embodiment of the present invention;
FIG. 5 is a flow chart illustrating a content retrieval method based on a user behavior graph according to an embodiment of the present invention;
FIG. 6 is a flow chart illustrating a content retrieval method based on a user behavior graph according to an embodiment of the present invention;
FIG. 7 is a flow chart illustrating the determination of recalled content based on search keywords according to one embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a content retrieval apparatus based on a user behavior graph according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a content retrieval apparatus based on a user behavior graph according to an embodiment of the present invention;
FIG. 10 is a block diagram of the association score calculation module 106 according to an embodiment of the invention;
FIG. 11 is a block diagram of the content recall module 104 according to an embodiment of the present invention;
FIG. 12 is a schematic structural diagram of a computer device for executing the above-mentioned content retrieval method based on a user behavior graph according to an embodiment of the present application;
FIG. 13 is a schematic structural diagram of an embodiment of a readable storage medium provided in the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In this embodiment, a content retrieval method based on a user behavior map is provided in order to solve the problems of insufficient user satisfaction with retrieval results and an excessively low click rate, which arise because existing content retrieval schemes do not consider user behavior.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating a content retrieval method based on a user behavior graph according to an embodiment of the present invention.
Specifically, as shown in fig. 1, the content retrieval method based on the user behavior graph provided by the present invention includes steps S102 to S108:
Step S102: acquiring the input search keyword.
In this implementation scenario, a user may input a search keyword through an application or a preset search interface. The search keyword is a keyword determined by the user according to the search requirement (the number of the keyword may be one or more).
In a specific embodiment, the user can input a search keyword through an input function in the application program for searching the related content in the application program. For example, the search keyword input by the user may be "car wash" for searching for content related to "car wash".
Step S104: and determining at least one piece of recall content matched with the search keyword in a preset content database according to the search keyword.
In this embodiment, the content retrieval is implemented based on a preset content database, where the preset content database includes a plurality of pieces of content, for example, one piece of content may be an article, or one function, and the like. The user can search the content database for the content to be viewed through the preset retrieval function.
Specifically, the at least one piece of recall content can be determined by recalling content through an open-source search engine according to the retrieval keyword; for example, the recall of content is achieved through the ES (Elasticsearch) full-text search engine.
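As a minimal sketch of keyword-based recall over an inverted index, the following Python example stands in for the search engine; it is not the Elasticsearch API. The whitespace tokenizer, the scoring by number of matched tokens, and the names `tokenize`, `build_inverted_index` and `recall` are assumptions made for illustration only.

```python
from collections import defaultdict


def tokenize(text):
    # Stand-in for the engine's word segmentation: lowercase, split on whitespace.
    return text.lower().split()


def build_inverted_index(contents):
    # contents: {content_id: text}. Returns {token: set of content_ids}.
    index = defaultdict(set)
    for content_id, text in contents.items():
        for token in tokenize(text):
            index[token].add(content_id)
    return index


def recall(index, query):
    # Score each content by how many distinct query tokens it contains.
    scores = defaultdict(int)
    for token in set(tokenize(query)):
        for content_id in index.get(token, ()):
            scores[content_id] += 1
    # Highest score first; ties broken by content id so the order is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical sample data.
contents = {
    "a1": "car wash coupons near you",
    "a2": "how to wash a car at home",
    "a3": "tire pressure guide",
}
index = build_inverted_index(contents)
recalled = recall(index, "car wash")
```

The query is segmented with the same `tokenize` function used at indexing time, which mirrors the requirement that the keywords receive the same word segmentation as the stored content.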
Step S106: and respectively calculating the association score corresponding to each recalled content according to a preset association score calculation method based on the user behavior map.
In this embodiment, after a plurality of pieces of recall content are determined according to the search keyword, the recall content needs to be further sorted and then returned for display according to the sorting result, so that the recall content displayed near the top of the search results in the search interface better matches the content that the user wants to find.
In this embodiment, the ranking of the recalled content is based on an association score for each piece of recalled content. In this step, the calculation of the association score will be described. The association score is used for describing the association degree of each piece of recall content and the retrieval keyword input by the user, and the higher the association score is, the higher the association degree is, the higher the possibility that the user clicks the piece of recall content is, and the higher the satisfaction degree of the user for retrieval is.
Specifically, in this step, the calculation of the association score is based on the user behavior map. The user behavior map is a data map constructed according to all contents included in a preset content database and historical click data corresponding to each piece of content.
Specifically, the historical click data includes the historical click data corresponding to each piece of content included in the content database, and specifically includes, for each click, the clicked content (content tag), the click user (the tag corresponding to the user), and the number of clicks; in other embodiments, the click time of each click is also included.
In the process of constructing the user behavior map, aiming at each piece of content contained in a preset content database, determining click users, click times and the like in historical click data corresponding to the piece of content, determining corresponding content nodes and user nodes in the user behavior map, and constructing the user behavior map according to the content nodes and the user nodes. Specifically, the data can be quickly stored in the neo4j database through a storm or spark streaming data processing framework according to historical click data or new click data of the user, so that a corresponding user behavior map is constructed, and the content database and the corresponding user behavior map are updated.
Specifically, FIG. 2 gives a schematic diagram of a user behavior graph constructed according to a content node (the label of the content node in FIG. 2 is Content) and a user node (the label of the user node in FIG. 2 is User).
With continued reference to FIG. 2, the nodes included in FIG. 2 are specified in the following table:
The attributes corresponding to the user node (User) include user_id (user identifier) and read_num (the out-degree of the user node, which identifies the number of contents clicked by the user). The attributes corresponding to the content node (Content) include content_id (content identifier) and content_type (content type).
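The node attributes above can be illustrated with a small in-memory sketch in Python. It is not the neo4j storage mentioned earlier; the function name `build_behavior_graph`, the record layout `(user_id, content_id, click_count)` and the sample data are assumptions chosen for illustration.

```python
from collections import defaultdict


def build_behavior_graph(click_records, content_types):
    # click_records: iterable of (user_id, content_id, click_count).
    # content_types: {content_id: content_type}.
    # clicks holds the User -> Content edges with their accumulated click counts.
    clicks = defaultdict(dict)
    for user_id, content_id, count in click_records:
        clicks[user_id][content_id] = clicks[user_id].get(content_id, 0) + count

    # User nodes: read_num is the out-degree, i.e. the number of distinct contents clicked.
    users = {
        uid: {"user_id": uid, "read_num": len(edges)}
        for uid, edges in clicks.items()
    }
    # Content nodes carry an identifier and a type.
    contents = {
        cid: {"content_id": cid, "content_type": ctype}
        for cid, ctype in content_types.items()
    }
    return users, contents, dict(clicks)


# Hypothetical sample data.
records = [("u1", "c1", 2), ("u1", "c2", 1), ("u2", "c1", 5), ("u1", "c1", 1)]
types = {"c1": "article", "c2": "function"}
users, contents, clicks = build_behavior_graph(records, types)
```

Repeated records for the same user and content are accumulated on a single edge, so new click data can be appended to `records` and the graph rebuilt without creating duplicate edges.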
With continued reference to FIG. 2, the user behavior graph of FIG. 2 includes the following relationships:
Based on the user behavior map, a corresponding path pattern can be constructed. Specifically, for any piece of recall content c, let u be the user who currently inputs the search keyword; based on the user behavior graph, a user au who has clicked the recalled content c can be determined, and the corresponding path is determined as follows:
Path=u→(ac:Content)←(au:User)→c,
wherein ac is any piece of content.
In this embodiment, the association score F(c, u) between the recall content c and the user u who currently inputs the search keyword can be determined by calculating the number of paths between the recall content c and the user u, and the association score F(c, u) is taken as the association score F(c) of the recall content c. The number of paths between the recalled content c and the user u is the number of times the recalled content c has been clicked by the users au.
Specifically, the association score F(c) of each recalled content c is calculated as follows: the click users associated with the recalled content c (that is, the users au who have clicked the recalled content c) and the corresponding click times are determined according to the user behavior map; the association score F(c) corresponding to the recalled content c is then calculated according to all the users au and the corresponding click times.
The association score F(c) may be specifically calculated according to the following formula:
$$F(c) = F(c, u) = \sum_{au \in \mathrm{Path}} \left(1 + au.\mathrm{read\_num}\right),$$
where au.read_num represents the number of clicks of the user au on the recall content c.
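The path pattern and the summation can be sketched in Python as follows. This is an illustration under stated assumptions rather than the method of the embodiment: the graph is a plain dictionary `clicks` mapping each user to `{content_id: click_count}`, each user au is counted once even if several contents ac connect au to u, and au.read_num is taken as the click count of au on c. The names `path_users` and `association_score` are hypothetical.

```python
def path_users(clicks, u, c):
    # Users au on a path u -> (ac:Content) <- (au:User) -> c:
    # au has clicked c and shares at least one clicked content ac with u.
    seen_by_u = set(clicks.get(u, {}))
    found = []
    for au, edges in clicks.items():
        if au == u or c not in edges:
            continue
        if seen_by_u & set(edges):
            found.append(au)
    return sorted(found)


def association_score(clicks, u, c):
    # F(c) = sum over au on a path of (1 + au.read_num),
    # with au.read_num read as the number of clicks of au on c.
    return sum(1 + clicks[au][c] for au in path_users(clicks, u, c))


# Hypothetical sample graph: user -> {content: click count}.
clicks = {
    "u": {"c1": 1},
    "a": {"c1": 2, "c9": 3},  # shares c1 with u and clicked c9
    "b": {"c2": 1, "c9": 4},  # clicked c9 but shares nothing with u
    "d": {"c1": 1, "c9": 1},  # shares c1 with u and clicked c9
}
score = association_score(clicks, "u", "c9")
```

Here users a and d lie on a path, contributing (1 + 3) and (1 + 1), while user b is excluded because no content links b to u.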
In this embodiment, in order to avoid the influence of abnormally active users on the calculation result, the association score F(c) may also be calculated not by directly summing the click times but by summing the reciprocal of the logarithm of the click times. For example, assume that among the users au there is a crawler that has crawled most (e.g., 80%) of the content in the preset content database. Although this user is "active", the content it clicks does not reflect genuine interest, so its influence on Path should be far smaller than that of a user who has clicked only a few dozen pieces of content. To reduce the influence of such abnormally active users on Path, an IUF (Inverse User Frequency) factor, namely the reciprocal of the logarithm of user activity, is introduced by analogy with TF-IDF (Term Frequency-Inverse Document Frequency). That is, the reciprocal of the logarithm of a user's click times replaces the simple addition of the user's click times, so that the calculated association score is closer to the actual requirement and the calculation of the association score is more accurate.
Specifically, the association score F(c) is calculated according to the following formula:
Further, in other embodiments, for users who are excessively active, all paths corresponding to such a user may also be deleted directly from Path when calculating the association score, so as to remove the influence of those paths on the calculation.
Specifically, for the user au, the corresponding click times (which may be counted over all the content or over the currently recalled content c) are determined; if the click times exceed a preset click threshold, the user au is determined to be an abnormally active user, and the paths corresponding to the user au are deleted from Path, which improves the accuracy of the calculation of the association score.
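Both ways of damping abnormally active users can be sketched together in Python. The weight 1 / log(1 + n) with a natural logarithm is an assumption consistent with "the reciprocal of the logarithm of the click times"; the exact form, the function name `iuf_score` and the threshold parameter `max_clicks` are hypothetical.

```python
import math


def iuf_score(click_counts, max_clicks=None):
    # click_counts: {au: click count}, every count assumed to be at least 1.
    total = 0.0
    for au, n in click_counts.items():
        if max_clicks is not None and n > max_clicks:
            # Treated as abnormally active: this user's paths are dropped.
            continue
        # Assumed IUF weight: more active users contribute less.
        total += 1.0 / math.log(1 + n)
    return total


# Hypothetical counts: one ordinary user and one crawler-like user.
counts = {"a": 1, "crawler": 100000}
damped = iuf_score(counts)
filtered = iuf_score(counts, max_clicks=1000)
```

With the logarithmic weight the crawler still contributes a small amount; with the threshold it contributes nothing. Which behavior is preferable depends on how reliably abnormal users can be identified.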
Step S108: and sequencing at least one piece of recall content according to the association score, and outputting a sequencing result as a target retrieval result.
After the association score corresponding to each piece of recall content is obtained through calculation, the recall contents can be sorted according to the association score, namely, all the recall contents are sorted in a descending order according to the association score. In the ranking result, the higher the association score is, the earlier the corresponding recalled content is in the ranking result, and the more the recalled content is matched with the retrieval requirement of the user for the current retrieval. Therefore, in the present embodiment, the recall contents are sorted according to the association score, and the sorted recall contents are output as the target retrieval result, so that the user can view or click the recall contents with higher association scores preferentially.
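The descending sort in step S108 amounts to a few lines of Python. The tie-break by content identifier is an assumption added so that equal scores yield a stable order; the name `rank_recalled` is hypothetical.

```python
def rank_recalled(scores):
    # scores: {content_id: association score}.
    # Higher score first; equal scores are ordered by content id.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical scores.
ranking = rank_recalled({"c1": 2.0, "c2": 5.0, "c3": 2.0})
```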
As can be seen from the above description, in the content retrieval process of this embodiment, at least one piece of recall content matched with the retrieval keyword input by the user is collected from a preset content database; then, based on the constructed user behavior map and the preset association score calculation method, the association score corresponding to each piece of recall content is calculated, and the recall contents are sorted according to the association scores, so that the sorted recall contents are output to the user as the final target retrieval result. With this scheme, the retrieval results obtained according to the input retrieval keyword can be further ranked based on the user behavior map, which effectively improves retrieval efficiency.
Further, in another embodiment, in the process of determining the recall content in the preset content database, it is necessary to consider not only whether each piece of content matches the input search keyword but also, in order to expand the coverage of the search result, whether the corresponding content is popular content.
Specifically, in an embodiment, the nodes included in the user behavior graph further include hot content nodes.
As shown in fig. 3, a schematic diagram of a user behavior graph including hot content nodes (HotContent) is provided.
With continued reference to FIG. 3, the hot content node (HotContent) included in FIG. 3 is specified in the following table:
With continued reference to FIG. 3, the user behavior graph shown in FIG. 3 also includes the following relationships:
That is, the content included in the preset content database is also provided with a corresponding hot content tag, and the content identified with the hot content tag is hot content. In the corresponding user behavior map, for hot content, a label corresponding to the hot content tag is further attached to the corresponding content node (Content) to identify that the content node carrying this label is a hot content node.
The user behavior graph supports multiple labels on one node (for example, a content node). In actual storage, therefore, both the Content and the HotContent labels are attached to the same content node (for content nodes that are hot content).
In the case where whether the content is popular content is considered, as shown in FIG. 4, step S104 in the above embodiment (determining at least one piece of recall content matched with the search keyword in a preset content database according to the search keyword) further comprises the following step S1042:
Step S1042: determining at least one piece of content with a hot content tag in the preset content database as recall content.
The hot content is content with a large number of clicks of other users. Generally, popular content is clicked on more than other content. If certain popular content is contained in the determined recall content, the click rate of the user on the retrieval result can be improved, and the coverage range of the recall content is improved.
Specifically, in the present embodiment, in the process of determining the recall content, it is necessary not only to determine, as recall content, the content matching the search keyword among the contents included in the preset content database, but also to select a part of the popular content in the preset content database as recall content. Specifically, the proportion of popular content in the recall content may be set to a preset ratio, for example, 20%. That is, the final recall content is composed of the matched content recalled according to the search keyword and the popular content. Using the popular content as a supplement to the recall content ensures that the recall content is determined effectively.
Further, if the input search keyword is relatively obscure, the number of pieces of content matched with the search keyword is small and the required number of recalled contents cannot be met; in this case, popular content may also be used as a supplement to the recall content to ensure that a certain amount of recall content is determined.
In another embodiment, if the search keyword input by the user is misspelled or there is no matching content, the number of pieces of content matching the search keyword determined in step S104 may be 0 or very small, resulting in a search failure. In this case, the popular content can be used as a fallback source of recall content, and the recall content is filled with popular content to ensure that a certain amount of recall content is determined.
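The three cases above (a fixed share of popular content, too few matches, and no matches at all) can be covered by one small Python sketch. The function name `supplement_with_hot`, the `total` recall size and the integer `hot_percent` parameter are assumptions for illustration; only the 20% figure comes from the example in the text.

```python
def supplement_with_hot(matched, hot, total, hot_percent=20):
    # matched: ids recalled by keyword matching, best match first.
    # hot: ids carrying the hot content tag, hottest first.
    # Skip hot content that keyword matching already recalled.
    hot_only = [cid for cid in hot if cid not in matched]
    # Number of slots reserved for hot content (ceiling of total * hot_percent / 100).
    hot_quota = (total * hot_percent + 99) // 100
    keep_matched = matched[: max(total - hot_quota, 0)]
    # If keyword matching returned too few items, hot content fills the remaining slots.
    need_hot = total - len(keep_matched)
    return keep_matched + hot_only[:need_hot]


# Hypothetical sample data.
matched = ["m1", "m2", "m3", "m4", "m5", "m6"]
hot = ["h1", "m2", "h2", "h3"]
mixed = supplement_with_hot(matched, hot, total=5)
fallback = supplement_with_hot([], hot, total=3)
```

In the first call one of the five slots goes to hot content; in the second call, where nothing matched the keyword, the result consists entirely of hot content.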
Generally, if a piece of content is popular content, its click times will be significantly higher than those of general content, and the corresponding association score will also be higher. In order to prevent an excessively high score for popular content from making the calculated association scores unreasonable, or popular content from dominating the results so that the intended content retrieval effect cannot be achieved, in this embodiment the association score calculated for recalled content that is popular content is multiplied by a penalty factor α smaller than 1, and the product is used as the association score of the popular content.
Specifically, as shown in fig. 5, the step S106: respectively calculating the association score corresponding to each recalled content according to a preset association score calculation method based on the user behavior atlas, and further comprising the following steps:
Step S106a1: under the condition that the recall content is not provided with a popular content label, calculating an association score corresponding to the recall content according to a preset association score calculation method;
Step S106a2: under the condition that the recall content is provided with a popular content label, calculating the association score corresponding to the recall content according to a preset penalty weight coefficient and a preset association score calculation method.
That is, the calculation of the association score of the recalled content c is performed according to the following calculation formula:
$$\mathrm{Score}(c) = \begin{cases} F(c), & \text{if } c \text{ carries no popular content label,} \\ \alpha \cdot F(c), & \text{if } c \text{ carries a popular content label,} \end{cases} \qquad \alpha < 1.$$
That is to say, the influence of the popular content score is reasonably considered in the calculation of the association score, and adding a penalty factor (the penalty weight coefficient α) to the association score of popular content improves the accuracy of the calculation of the association score, thereby improving the effectiveness of the final ranking and display of search results and the subsequent conversion rate.
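Steps S106a1 and S106a2 reduce to a single conditional, sketched below in Python. The default value 0.5 for α and the requirement that α be positive are assumptions; the text only states that α is smaller than 1. The name `penalized_score` is hypothetical.

```python
def penalized_score(base_score, is_hot, alpha=0.5):
    # base_score: association score F(c) from the preset calculation method.
    # is_hot: whether the recalled content carries the popular content label.
    if not 0 < alpha < 1:
        raise ValueError("alpha is assumed to lie strictly between 0 and 1")
    # Popular content is scaled down; other content keeps its score.
    return alpha * base_score if is_hot else base_score


hot_score = penalized_score(10.0, is_hot=True)
plain_score = penalized_score(10.0, is_hot=False)
```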
In this embodiment, whether a piece of content is a hot content or not can be determined by calculating a hot value of the content.
Specifically, for each piece of content, the click times and the recall times corresponding to the piece of content are determined according to the historical click data; the contents are sorted by click times and by recall times respectively, and the heat value corresponding to the piece of content is then calculated according to a first sorting result of the click times and a second sorting result of the recall times. The recall times refer to the number of times the piece of content has been recalled as recall content during retrieval. Both the click times and the recall times can indicate whether a piece of content is popular.
In a specific embodiment, the calculation of the heat value of a piece of content may be based on the sequence number of the piece of content in the first sorting result and the sequence number of the piece of content in the second sorting result. For example, in one embodiment, the heat value of content a is the rank of content a in the first sort result + the rank of content a in the second sort result. In other embodiments, the calculation of the heat value may also be other calculation methods of ranking results ranked according to the number of clicks and the number of recalls.
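The rank-sum heat value can be sketched in Python as follows. The sort direction is an assumption: contents are ordered ascending, so a larger sequence number means more clicks or recalls and a larger heat value means hotter content. The names `rank_positions` and `heat_values` are hypothetical, and both inputs are assumed to cover the same set of contents.

```python
def rank_positions(counts):
    # 1-based sequence number of each content when sorted ascending by count;
    # ties are broken by content id so the result is deterministic.
    ordered = sorted(counts, key=lambda cid: (counts[cid], cid))
    return {cid: pos for pos, cid in enumerate(ordered, start=1)}


def heat_values(click_counts, recall_counts):
    # Heat value = sequence number in the click ranking + sequence number in the recall ranking.
    click_rank = rank_positions(click_counts)
    recall_rank = rank_positions(recall_counts)
    return {cid: click_rank[cid] + recall_rank[cid] for cid in click_counts}


# Hypothetical counts.
click_counts = {"a": 100, "b": 5, "c": 40}
recall_counts = {"a": 30, "b": 50, "c": 10}
heat = heat_values(click_counts, recall_counts)
```

Using ranks rather than raw counts keeps the two signals on the same scale, so a content with an extreme click count cannot dominate the heat value on its own.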
In the actual retrieval process, whether a user wants to retrieve a certain piece of content is also related to the click time of that content in the historical click data, and click behaviors at different times have different influence on the actual interest of the user currently searching. For example, the influence of other users' recent clicks on a piece of content should be significantly higher than that of other users' clicks on the content 1 year or 10 years ago; the user is more concerned with the currently popular content than with the related content of 1 year or 10 years ago. Therefore, in the present embodiment, when calculating the association score of the recalled content, the influence of the corresponding click time needs to be considered in addition to the click times.
Specifically, the historical click data further includes the click time corresponding to each click; as shown in fig. 6, the step of calculating the association score corresponding to the recalled content according to the associated click users and the number of clicks further includes:
step S106B1: calculating a time penalty score corresponding to the click time according to a preset time penalty function;
step S106B2: calculating the association score corresponding to the recalled content according to the number of clicks of the associated click users and the time penalty score corresponding to the click time.
In the process of calculating the association score f(c), au.read_num is originally used as the number of clicks; in the present embodiment, in order to take the click time into account, each click needs to be multiplied by a corresponding penalty weight.
Specifically, in an embodiment, the penalty weight corresponding to a click is the time penalty score corresponding to its click time, which is calculated according to a preset time penalty function.
Specifically, in the association score f(c), au.read_num is replaced by the following quantity, taken over each click p in au.read:
$$\sum_{p \in \mathit{au.read}} \mathit{score\_p\_time} \cdot \mathit{p\_num},$$
which denotes the sum, over the clicks p of user au on the corresponding content, of the product of each click count p_num and its time penalty score score_p_time. This more accurately captures the influence of the click behavior of user au on the interest of the user who is currently performing the retrieval and is related to au. The time penalty function used to calculate the time penalty score score_p_time may be a function negatively correlated with elapsed time.
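By way of illustration only, the time-penalized click count may be sketched as follows. The embodiment only requires the time penalty function to be negatively correlated with time; the exponential half-life decay used here, the constant `HALF_LIFE_DAYS`, and the function names are assumptions made for the sketch and are not prescribed by the embodiment.

```python
from datetime import datetime
from typing import List, Tuple

# Hypothetical parameter: a click loses half of its weight every 30 days.
HALF_LIFE_DAYS = 30.0


def time_penalty(click_time: datetime, now: datetime,
                 half_life_days: float = HALF_LIFE_DAYS) -> float:
    """Assumed time penalty function: exponential decay in the age of the click.

    Returns 1.0 for a click made now and tends to 0 as the click gets older.
    """
    age_days = max((now - click_time).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / half_life_days)


def penalized_clicks(clicks: List[Tuple[datetime, int]], now: datetime) -> float:
    """Sum of score_p_time * p_num over every click record p of one user."""
    return sum(time_penalty(click_time, now) * p_num for click_time, p_num in clicks)


now = datetime(2024, 1, 31)
au_read = [
    (datetime(2024, 1, 31), 2),  # 2 clicks today, penalty 1.0
    (datetime(2024, 1, 1), 4),   # 4 clicks 30 days ago, penalty 0.5
]
weighted = penalized_clicks(au_read, now)
print(weighted)
```

In this example the six raw clicks contribute a weighted count of 4.0, so the older clicks count for less than the recent ones.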
In the present embodiment, the process of determining at least one piece of recalled content in step S104 is described in further detail.
As described above, the process of determining at least one piece of recalled content in the preset content database according to the search keyword may be implemented through an ES (Elasticsearch) full-text search engine. Elasticsearch is a search server based on Lucene that provides a distributed, multi-user-capable full-text search engine with a RESTful web interface; Elasticsearch was developed in the Java language and released as open source under the terms of the Apache license, and is a popular enterprise-level search engine.
In a specific embodiment, after a user inputs a search keyword, the input search keyword is subjected to word segmentation processing, and a word segmentation result corresponding to the search keyword is obtained. And for a preset content database, performing word segmentation processing on each piece of content by the same word segmentation method, then constructing an inverted index for the word segmentation result, and performing retrieval on the preset content database through the inverted index.
The word segmentation result corresponding to the search keyword is matched against the inverted index corresponding to the preset content database, and the recalled content is then determined according to the matching result. The process of determining the recalled content according to the matching result may follow the flow chart given in fig. 7 for determining the recalled content according to the search keyword.
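By way of illustration only, the word segmentation, inverted index construction and matching described above may be sketched in memory as follows. This is not the Elasticsearch implementation: the whitespace tokenizer `tokenize` stands in for the word segmentation step, and the function names and sample contents are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Stand-in for word segmentation: lowercase and split on whitespace."""
    return text.lower().split()


def build_inverted_index(contents: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each term to the set of content ids whose text contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for cid, text in contents.items():
        for term in tokenize(text):
            index[term].add(cid)
    return index


def match(query: str, index: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count, per content id, how many distinct query terms it contains."""
    hits: Dict[str, int] = defaultdict(int)
    for term in set(tokenize(query)):
        for cid in index.get(term, ()):
            hits[cid] += 1
    return dict(hits)


contents = {
    "c1": "car wash coupons near you",
    "c2": "how to wash a car at home",
    "c3": "car insurance questions",
}
index = build_inverted_index(contents)
hits = match("car wash", index)
print(hits)
```

The same tokenizer is applied to the contents and to the search keyword, which is what allows the query terms to be looked up directly in the index.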
Specifically, the step S104 further includes steps S401 to S403:
step S401: calculating, according to the inverted index corresponding to the preset content database, the matching score between each piece of content contained in the preset content database and the search keyword;
step S402: taking the content whose matching score exceeds a preset matching threshold as the recalled content;
or, step S403: sorting each piece of content contained in the preset content database according to the matching score, and determining the recalled content in the preset content database according to the sorting result, wherein the sorting result is stored in a block chain.
It is emphasized that, in order to further ensure the privacy and security of the sorting result, the sorting result may also be stored in a node of a block chain.
According to the inverted index corresponding to the preset content database, and the high-frequency words and low-frequency words in the inverted index, the matching score between each piece of content and the search keyword is calculated. The matching score represents the degree of matching between the piece of content and the search keyword, and whether the piece of content should be recalled can be determined according to this degree of matching.
After the matching score between each piece of content and the retrieval keyword is calculated, the recalled content can be determined according to the matching score. In a specific embodiment, whether a piece of content is recalled is determined based on whether the match score exceeds a preset match threshold; for example, in the case where the matching score exceeds a preset matching threshold, the piece of content is regarded as the recalled content, and in the case where the matching score does not exceed the preset matching threshold, the piece of content is not considered in the process of determining the recalled content.
In another embodiment, whether a piece of content is recalled is determined based on the rank of its matching score among the matching scores of all pieces of content. Specifically, all the contents are sorted in descending order of matching score, and the top N contents in the sorting result are taken as the recalled content, where N is the number of pieces of content to be recalled.
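By way of illustration only, the two alternative recall strategies (steps S402 and S403) may be sketched as follows. The matching scores are taken as given; the function names and sample values are hypothetical.

```python
from typing import Dict, List


def recall_by_threshold(scores: Dict[str, float], threshold: float) -> List[str]:
    """Step S402: keep every content whose matching score exceeds the threshold."""
    return [cid for cid, score in scores.items() if score > threshold]


def recall_top_n(scores: Dict[str, float], n: int) -> List[str]:
    """Step S403: sort by matching score, descending, and keep the first n."""
    ordered = sorted(scores, key=lambda cid: scores[cid], reverse=True)
    return ordered[:n]


# Illustrative matching scores only.
scores = {"c1": 0.9, "c2": 0.4, "c3": 0.7, "c4": 0.1}
above_threshold = recall_by_threshold(scores, threshold=0.5)
top_three = recall_top_n(scores, n=3)
print(above_threshold, top_three)
```

The threshold strategy returns a variable number of contents, whereas the top-N strategy returns a fixed number regardless of how low the Nth matching score is.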
In the present embodiment, in order to ensure the coverage of the retrieval result, the number of pieces of recalled content determined in step S104 also needs to be increased. For example, where 200 pieces of content are recalled in the general case, the number may be increased in the present embodiment to 240 pieces, 300 pieces, or even more. That is, increasing the number of pieces of recalled content increases the coverage of the search results, which improves the effectiveness of the final sorting and display of the search results and increases the subsequent conversion rate.
In one embodiment, as shown in fig. 8, a content retrieval device based on a user behavior graph is also provided. Specifically, the content retrieval device based on the user behavior graph includes:
a keyword obtaining module 102, configured to obtain an input search keyword;
a content recall module 104, configured to determine, according to the search keyword, at least one piece of recalled content that matches the search keyword in a preset content database;
an association score calculation module 106, configured to calculate an association score corresponding to each piece of recalled content according to a preset association score calculation method based on a user behavior graph, where the user behavior graph is constructed according to the content included in the preset content database and the historical click data corresponding to each piece of content, and the historical click data includes click users and numbers of clicks;
and a sorting module 108, configured to sort the at least one piece of recalled content according to the association score, and output the sorting result as a target retrieval result.
In an embodiment, as shown in fig. 9, the content retrieval device based on a user behavior graph further includes a user behavior graph building module 110, configured to, for each piece of content included in the preset content database, determine a content node and a user node corresponding to the content according to the click users and the numbers of clicks in the historical click data, and build the user behavior graph according to the content nodes and the user nodes.
In one embodiment, the association score calculation module 106 is further configured to determine, for each piece of recalled content, the click users and the numbers of clicks associated with the recalled content according to the user behavior graph, and to calculate the association score corresponding to the recalled content according to the associated click users and numbers of clicks.
In one embodiment, the historical click data further includes click time; as shown in fig. 10, the association score calculation module 106 includes a time penalty score calculating unit 1062 and an association score calculating unit 1064, where the time penalty score calculating unit 1062 is configured to calculate a time penalty score corresponding to the click time according to a preset time penalty function, and the association score calculating unit 1064 is configured to calculate the association score corresponding to the recalled content according to the number of clicks of the associated click users and the time penalty score corresponding to the click time.
In one embodiment, the user behavior graph further comprises trending content tags set on one or more pieces of content; the content recall module 104 is further configured to determine, in the preset content database, at least one piece of content with a trending content tag as the recalled content.
In an embodiment, the association score calculation module 106 is further configured to calculate the association score corresponding to the recalled content according to a preset penalty weight coefficient and the preset association score calculation method when the recalled content is provided with a trending content tag.
The content recall module 104 is further configured to match the search keyword with the inverted index corresponding to the preset content database, and determine at least one piece of recalled content according to the matching result.
As shown in fig. 11, the content recall module 104 further includes a matching score calculating subunit 1042 and a content recall sub-module 1044, where the matching score calculating subunit 1042 is configured to calculate, according to the inverted index, the matching score between each piece of content included in the preset content database and the search keyword; the content recall sub-module 1044 is configured to take the content whose matching score exceeds a preset matching threshold as the recalled content, or to sort each piece of content contained in the preset content database according to the matching score and determine the recalled content in the preset content database according to the sorting result, wherein the sorting result is stored in a block chain.
As can be seen from the above description, in this embodiment, the content retrieval device based on the user behavior graph recalls at least one piece of matching recalled content from the preset content database according to the search keyword input by the user, then calculates the association score corresponding to each piece of recalled content based on the constructed user behavior graph and the preset association score calculation method, and sorts the recalled content according to the association score, so as to output the sorted recalled content to the user as the final target retrieval result. That is, with the content retrieval method based on the user behavior graph, the device terminal and the computer-readable storage medium, the retrieval result obtained according to the input search keyword can be further sorted based on the user behavior graph, which improves the effectiveness of sorting and displaying the retrieval result and increases the subsequent conversion rate of content retrieval.
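By way of illustration only, the cooperation of modules 102 to 108 may be sketched as a single function. The recall step and the association score calculation are passed in as callables because their concrete form is defined elsewhere in the description; the names `retrieve`, `recall` and `association_score` and the toy values are hypothetical.

```python
from typing import Callable, Dict, List


def retrieve(
    keyword: str,
    recall: Callable[[str], List[str]],
    association_score: Callable[[str], float],
) -> List[str]:
    """Recall candidates for the keyword, score each one, and sort by score."""
    recalled = recall(keyword)  # role of content recall module 104
    scores: Dict[str, float] = {
        cid: association_score(cid) for cid in recalled  # role of module 106
    }
    # Role of sorting module 108: highest association score first.
    return sorted(recalled, key=lambda cid: scores[cid], reverse=True)


# Toy stand-ins for the recall step and the graph-based association score.
toy_scores = {"c1": 1.0, "c2": 3.5, "c3": 2.0}
result = retrieve(
    "car wash",
    recall=lambda kw: ["c1", "c2", "c3"],
    association_score=toy_scores.__getitem__,
)
print(result)
```

Keeping the recall step and the scoring step as separate callables mirrors the separation between the content recall module and the association score calculation module, so that either can be replaced independently.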
FIG. 12 is a diagram illustrating the internal structure of a computer device in one embodiment. The computer device may be a terminal or a server. As shown in fig. 12, the computer device includes a processor, a memory, and a network interface connected by a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program that, when executed by the processor, causes the processor to implement the content retrieval method based on a user behavior graph. The internal memory may also store a computer program that, when executed by the processor, causes the processor to perform the content retrieval method based on a user behavior graph. Those skilled in the art will appreciate that the architecture shown in fig. 12 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
In one embodiment, a smart terminal is proposed, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of:
acquiring an input search keyword;
determining at least one piece of recall content matched with the search keyword in a preset content database according to the search keyword;
respectively calculating an association score corresponding to each recalled content according to a preset association score calculation method based on a user behavior map, wherein the user behavior map is constructed according to the content included in the preset content database and historical click data corresponding to each piece of content, and the historical click data comprises click users and click times;
and sorting the at least one piece of recalled content according to the association score, and outputting the sorting result as a target retrieval result.
As can be seen from the above description, in this embodiment, the terminal recalls at least one piece of matched recall content from a preset content database according to the search keyword input by the user, then calculates an association score corresponding to each piece of recall content based on the constructed user behavior graph and a preset association score calculation method, and sorts the recall content according to the association score, so as to output the sorted recall content as a final target search result to the user. That is, after the content retrieval method based on the user behavior graph, the device terminal and the computer readable storage medium are adopted, the retrieval result obtained according to the input retrieval keyword can be further sorted based on the user behavior graph, so that the effectiveness of sorting and displaying the retrieval result is improved, and the subsequent conversion rate of content retrieval is improved.
In an embodiment, please refer to fig. 13, which is a schematic structural diagram of an embodiment of a readable storage medium provided in the present invention. The readable storage medium 10 has stored therein at least one computer program 20, the computer program 20 being for execution by a processor to implement the following steps:
acquiring an input search keyword;
determining at least one piece of recall content matched with the search keyword in a preset content database according to the search keyword;
respectively calculating an association score corresponding to each recalled content according to a preset association score calculation method based on a user behavior map, wherein the user behavior map is constructed according to the content included in the preset content database and historical click data corresponding to each piece of content, and the historical click data comprises click users and click times;
and sorting the at least one piece of recalled content according to the association score, and outputting the sorting result as a target retrieval result.
In one embodiment, the readable storage medium 10 may be a memory chip in a terminal, a hard disk, or another readable and writable storage device such as a removable hard disk, a flash drive, or an optical disk, and may also be a server or the like.
As can be seen from the above description, in this embodiment, the computer program in the readable storage medium may recall at least one piece of matching recalled content from the preset content database according to the search keyword input by the user, then calculate the association score corresponding to each piece of recalled content based on the constructed user behavior graph and the preset association score calculation method, and sort the recalled content according to the association score, so as to output the sorted recalled content to the user as the final target retrieval result. That is, with the content retrieval method based on the user behavior graph, the device terminal and the computer-readable storage medium, the retrieval result obtained according to the input search keyword can be further sorted based on the user behavior graph, which improves the effectiveness of sorting and displaying the retrieval result and increases the subsequent conversion rate of content retrieval.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium and which, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to fall within the scope of the present specification.
A block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A block chain (Blockchain) is essentially a decentralized database: a series of data blocks linked to one another by cryptographic methods, where each data block contains information on a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The block chain may include a block chain underlying platform, a platform product service layer, an application service layer, and the like.
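By way of illustration only, the way data blocks are linked by a cryptographic method, and how that link allows stored sorting results to be verified, may be sketched with a single-process hash chain. This is not a block chain platform: it has no distribution, consensus mechanism or network layer, and the function names and block layout are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index: int, prev_hash: str, payload: Any) -> str:
    """SHA-256 over a canonical JSON encoding of the block's fields."""
    body = json.dumps(
        {"index": index, "prev_hash": prev_hash, "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain: List[Dict[str, Any]], payload: Any) -> None:
    """Append a block whose hash covers its payload and the previous block's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append({
        "index": index,
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": block_hash(index, prev_hash, payload),
    })


def is_valid(chain: List[Dict[str, Any]]) -> bool:
    """Recompute every hash and check that each block points at its predecessor."""
    prev_hash = GENESIS_HASH
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(i, prev_hash, block["payload"]):
            return False
        prev_hash = block["hash"]
    return True


chain: List[Dict[str, Any]] = []
append_block(chain, ["c2", "c3", "c1"])  # a stored sorting result
append_block(chain, ["c7", "c5"])
valid_before = is_valid(chain)
chain[0]["payload"] = ["c1", "c2", "c3"]  # tamper with the first sorting result
valid_after = is_valid(chain)
print(valid_before, valid_after)
```

Because each block's hash covers the previous block's hash, altering an earlier sorting result is detected when the chain is re-verified.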
The above-mentioned embodiments express only several embodiments of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.