Method and device for generating brand derivative words
Technical Field
The invention relates to the technical field of computers, in particular to a method and a device for generating brand derivative words.
Background
With advances in technology and growth in data volume, users expect more and more from search. In particular, when searching for data related to brand words, a user expects to reach the corresponding brand or shop quickly.
The prior art recognizes a brand word on the principle that the search word must match it completely. For example, for the brand word "hua yi", the brand is hit only when the search word input by the user is exactly that brand word, and is not hit when the user inputs any variant of it.
In the process of implementing the invention, the inventor found that the prior art has at least the following problem: because brand words are identified on the principle of complete matching of search words, the corresponding brands and shops are difficult to identify when users input diversified search words, and the search experience is poor.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for generating a brand derivative word, so as to improve user search experience.
To achieve the above object, according to one aspect of an embodiment of the present invention, there is provided a method of generating a brand-derived word.
The method for generating the brand derivative word in the embodiment of the invention comprises the following steps: extracting search terms according to the obtained user search data, wherein the user search data comprises browsing behavior data based on the search terms and clicking behavior data based on the search terms; extracting brand words of clicked commodities according to the click behavior data based on the search words; and calculating the correlation degree of the search word and the brand word of the clicked commodity, and selecting the search word with the correlation degree larger than a preset threshold value as a brand derivative word.
Optionally, before extracting the search terms according to the acquired user search data, the method further includes: filtering the user search data; and extracting the search terms according to the acquired user search data comprises: extracting the search terms in the user search data, and carrying out unification processing on the search terms.
Optionally, calculating the correlation degree of the search term and the brand term of the clicked commodity includes: calculating a click score of the search term and the brand term of the clicked commodity; calculating a text similarity score of the search term and the brand term of the clicked commodity; and calculating the correlation degree of the search term and the brand term of the clicked commodity based on the click score and the text similarity score.
Optionally, the click score of the search word $w_j$ and the brand word $b$ of the clicked commodity is calculated by a formula over the following quantities: the number of times the search word $w_j$ corresponds to the brand word $b$ of the clicked commodity; the number of searches of the search word $w_j$ in a unit time; and $\mathrm{AvgSearchTimes}_b$, the exposure data of the brand word $b$ of the clicked commodity in unit time, wherein the exposure data of the brand word $b$ of the clicked commodity represents the number of times the brand word $b$ of the clicked commodity is shown.
Optionally, before calculating the text similarity score of the search term and the brand term of the clicked item, the method further comprises: judging whether the text of the search term includes the brand term of the clicked item, and if so, calculating the text similarity score of the search term and the brand term of the clicked item.
Optionally, the text type of the search term includes Chinese text and/or English text, and if the text type of a search term includes both Chinese text and English text, word segmentation processing is performed on the search term according to the different text types, and then the text similarity scores of the search term and the brand term of the clicked commodity are calculated.
Optionally, calculating the text similarity score of the search term and the brand term of the clicked item includes: when the text type of the search word is Chinese text, the calculation formula of the text similarity score is as follows:
$$\mathrm{TextScore}(w_j, b) = \alpha_1 \times \frac{L_b}{L_{w_j}}$$
wherein $\mathrm{TextScore}(w_j, b)$ is the text similarity score of the search word $w_j$ and the brand word $b$ of the clicked item, $\alpha_1$ is the Chinese text similarity calculation factor, $L_{w_j}$ is the text length of the search word $w_j$, and $L_b$ is the text length of the brand word $b$ of the clicked commodity; when the text type of the search word is English text, the calculation formula of the text similarity score is as follows:
$$\mathrm{TextScore}(w_j, b) = \alpha_2 \times \frac{L_b'}{L_{w_j}'}$$
wherein $\mathrm{TextScore}(w_j, b)$ is the text similarity score of the search word $w_j$ and the brand word $b$ of the clicked item, $\alpha_2$ is the English text similarity calculation factor, $L_{w_j}'$ is the character length of the search word $w_j$, and $L_b'$ is the character length of the prefix word of the search word $w_j$.
Optionally, before calculating the correlation degree of the search term and the brand term of the clicked item, the method includes: respectively carrying out normalization calculation on the click score and the text similarity score.
Optionally, the correlation degree of the search word $w_j$ and the brand word $b$ of the clicked commodity is calculated by a formula over the following quantities: $\beta$, the correlation calculation factor of the click score; $\gamma$, the correlation calculation factor of the text similarity score; the normalized click score of the search word $w_j$ and the brand word $b$ of the clicked commodity; and the normalized text similarity score of the search word $w_j$ and the brand word $b$ of the clicked commodity.
To achieve the above object, according to another aspect of the embodiments of the present invention, there is provided an apparatus for generating a brand-derived word.
The device for generating the brand derivative word in the embodiment of the invention comprises the following components: the search word extraction module is used for extracting search words according to the obtained user search data, and the user search data comprises browsing behavior data based on the search words and clicking behavior data based on the search words; the brand word extraction module is used for extracting the brand words of the clicked commodities according to the click behavior data based on the search words; and the calculation module is used for calculating the correlation degree of the search terms and the brand terms of the clicked commodities and selecting the search terms with the correlation degree larger than a preset threshold value as the brand derivative terms.
To achieve the above object, according to still another aspect of embodiments of the present invention, there is provided an electronic apparatus.
An electronic device of an embodiment of the present invention includes: one or more processors; a storage device to store one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the method of generating brand derivatives of embodiments of the present invention.
To achieve the above object, according to still another aspect of an embodiment of the present invention, there is provided a computer-readable medium.
A computer-readable medium of an embodiment of the present invention has a computer program stored thereon, where the computer program, when executed by a processor, implements the method for generating a brand-derived word of an embodiment of the present invention.
One embodiment of the above invention has the following advantages or benefits. By calculating the correlation degree of the extracted search words and the brand words of the clicked commodities, and selecting the search words with a correlation degree larger than a preset threshold as brand derivative words, a brand derivative word bank is generated; when a user uses a word in the brand derivative word bank as a search word, the user can reach brand shops and commodities directly and quickly, which improves the user search experience. By filtering the user search data, illegal user data can be filtered out, which improves the accuracy of the generated brand derivative words. By unifying the search words, the format of the search words can be standardized. Because the calculation formula of the click score of the search word and the brand word of the clicked commodity counts the number of searches of the search word in unit time and the exposure data of the brand word of the clicked commodity in unit time, data updates are taken into account and outdated data can be discarded. Because the calculation formula of the text similarity score provides different calculation methods for Chinese text and English text, the accuracy of the text similarity score can be improved. Because the search words with a correlation degree larger than the preset threshold are selected as brand derivative words, the preset threshold can be set according to application requirements, which improves the flexibility of the generated brand derivative words.
Further effects of the above optional implementations are described below in connection with the specific embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. In the drawings:
FIG. 1 is a schematic diagram of the main steps of a method of generating brand-derived words, according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a main flow of a method of generating brand-derived words, according to an embodiment of the invention;
FIG. 3 is a schematic diagram of the main flow of calculating the text similarity score of a search term and the brand term of a clicked item in a method of generating brand-derived words, according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the major modules of an apparatus for generating brand-derived words, according to an embodiment of the present invention;
FIG. 5 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
FIG. 6 is a schematic block diagram of a computer system suitable for use in implementing a terminal device or server of an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the prior art, brand words are identified based on a principle that search words are completely matched, when a user inputs diversified search words, corresponding brands and shops are difficult to identify, and the search experience of the user is poor. In the embodiment of the invention, the brand derivative word bank is generated by calculating the correlation degree of the extracted search word and the brand word of the clicked commodity and selecting the search word with the correlation degree larger than the preset threshold value as the brand derivative word, so that the requirement that a user can directly reach brand shops and commodities quickly can be met when the user uses the word in the brand derivative word bank as the search word, and the user search experience is improved.
The following is a detailed explanation of technical terms involved in the examples of the present invention:
A brand derivative word is a word derived from a canonical brand word; for example, "Nike shoe" is a brand derivative word of the brand word "Nike".
PV sorting means that the behavior data of the user is accumulated and counted, and then sorted according to the accumulated times. The behavior data of the user here includes data of browsing and clicking of goods by the user through the search term.
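The accumulation and ordering described for PV sorting can be sketched in Python as follows. This is only a minimal sketch: the event layout (a search word paired with an action name) and the function name `pv_sort` are assumptions made for illustration and are not part of the embodiment.

```python
from collections import Counter


def pv_sort(events):
    """Count behavior events per search word and sort by accumulated count.

    `events` is an iterable of (search_word, action) pairs, where the action
    is a browse or a click performed through that search word (assumed layout).
    """
    counts = Counter(search_word for search_word, _action in events)
    # most_common() returns (search_word, count) pairs, highest count first.
    return counts.most_common()


if __name__ == "__main__":
    sample = [("nike shoe", "browse"), ("nike shoe", "click"), ("nike", "click")]
    print(pv_sort(sample))
```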
FIG. 1 is a schematic diagram of the main steps of a method for generating brand derivative words according to an embodiment of the present invention. As shown in FIG. 1, the method mainly includes the following steps:
Step S101: extracting search terms according to the acquired user search data. The user search data comprises the browsing behavior data and click behavior data generated by users on the e-commerce platform through search terms. After the user search data is acquired, the search terms in the user search data are extracted.
Step S102: extracting brand words of the clicked commodities according to the click behavior data based on the search terms. That is, the brand words of the clicked commodities are extracted from the click behavior data based on the search terms.
Step S103: calculating the correlation degree of the search terms and the brand words of the clicked commodities, and selecting the search terms with a correlation degree larger than a preset threshold as brand derivative words. After the search terms and the brand words of the clicked commodities are extracted in steps S101 and S102, the correlation degree between them is calculated and compared with the preset threshold, and the search terms with a correlation degree larger than the preset threshold are selected as brand derivative words according to the comparison result.
In an embodiment of the present invention, before extracting the search terms according to the obtained user search data, the method for generating brand derivative words may further include: filtering the user search data. By filtering the user search data before extracting the search terms, illegal user search data can be filtered out, and the legality of the user search data is guaranteed. The illegal user search data may include: data identified by a proportion threshold m (in the embodiment of the present invention, the value of m may be set according to the actual situation, for example, but not limited to, 1%), data without a user ID, data from an unknown source, data of users with an excessive amount of data per day, and blacklisted-IP data.
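The filtering step can be illustrated with a short Python sketch. It covers only the four categories named explicitly above (no user ID, unknown source, excessive data per day, blacklisted IP). The record fields `user_id`, `date`, `ip` and `source`, the function name, and the `max_per_day` limit are hypothetical names chosen for the example.

```python
from collections import Counter


def filter_search_records(records, blacklist_ips, known_sources, max_per_day):
    """Return only the records that pass all four checks.

    Each record is a dict with the assumed keys "user_id", "date", "ip"
    and "source"; these field names are illustrative assumptions.
    """
    # Count how many records each user produced on each day.
    per_user_day = Counter((r.get("user_id"), r.get("date")) for r in records)
    kept = []
    for r in records:
        if not r.get("user_id"):
            continue  # data without a user ID
        if r.get("source") not in known_sources:
            continue  # data with an unknown source
        if r.get("ip") in blacklist_ips:
            continue  # blacklisted-IP data
        if per_user_day[(r["user_id"], r.get("date"))] > max_per_day:
            continue  # user with excessive data on that day
        kept.append(r)
    return kept
```

Counting per user and day before the loop keeps the check for excessive daily data independent of record order.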
In the embodiment of the present invention, extracting search terms according to the obtained user search data may include: extracting the search terms in the user search data, and carrying out unification processing on the search terms. Unifying the search terms after extraction normalizes their format. Unifying the extracted search terms may include: removing blank characters before and after the search term; changing multiple consecutive spaces within the search term into one space; removing invisible characters; unifying English in the search terms into uppercase or lowercase; and unifying Chinese characters in the search terms into traditional characters or simplified characters.
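A minimal Python sketch of the unification processing is given below. The function name `unify_search_word` and its `lowercase` switch are assumptions for illustration. Conversion between traditional and simplified Chinese characters is left out because the Python standard library has no such conversion; a real system would need a dedicated mapping for that step.

```python
import re


def unify_search_word(word: str, lowercase: bool = True) -> str:
    """Normalize the format of a search word (illustrative sketch)."""
    # Treat every whitespace character (tab, full-width space, ...) as a space.
    spaced = "".join(" " if ch.isspace() else ch for ch in word)
    # Drop the remaining invisible characters (control and format characters).
    visible = "".join(ch for ch in spaced if ch.isprintable())
    # Collapse runs of spaces into one space and trim both ends.
    collapsed = re.sub(r" +", " ", visible).strip()
    # Unify English letters into lowercase or uppercase.
    return collapsed.lower() if lowercase else collapsed.upper()
```

For instance, `unify_search_word("  Nike\u200b   Shoe\t")` yields `"nike shoe"`: the zero-width character is removed, the spaces are collapsed, and the letters are lowercased.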
In the embodiment of the present invention, calculating the correlation degree between the search term and the brand term of the clicked item may include: calculating the click score of the search term and the brand term of the clicked commodity; calculating the text similarity score of the search term and the brand term of the clicked commodity; and calculating the correlation degree between the search term and the brand term of the clicked commodity based on the click score and the text similarity score. After the search terms and the brand terms of the clicked commodities are extracted, the click score and the text similarity score of each search term and brand term are calculated separately, the correlation degree is then calculated from them and compared with the preset threshold, and finally the search terms with a correlation degree larger than the preset threshold are selected as brand derivative words according to the comparison result.
In the embodiment of the present invention, the click score of the search word $w_j$ and the brand word $b$ of the clicked commodity may be calculated by a formula over the following quantities: the number of times the search word $w_j$ corresponds to the brand word $b$ of the clicked commodity; the number of searches of the search word $w_j$ in a unit time (in the embodiment of the present invention, the value of the unit time may be set according to the actual situation, for example, but not limited to, the last 15 days); and $\mathrm{AvgSearchTimes}_b$, the exposure data of the brand word $b$ of the clicked commodity in unit time, wherein the exposure data of the brand word $b$ of the clicked commodity represents the number of times the brand word $b$ of the clicked commodity is shown.
In this embodiment of the present invention, before calculating the text similarity score between the search term and the brand term of the clicked item, the method for generating brand derivative words may further include: judging whether the text of the search term includes the brand term of the clicked item, and if so, calculating the text similarity score of the search term and the brand term of the clicked item. That is, the check is performed before the text similarity score is calculated: when the text of the search term includes the brand term of the clicked item, the text similarity score is calculated; when it does not, the text similarity score of the search term and the brand term of the clicked item is zero.
In the embodiment of the present invention, the text type of the search word may include Chinese text and/or English text. The text similarity score is calculated only between texts of the same type: either between a Chinese search word and the Chinese name of the brand word of the clicked commodity, or between an English search word and the English name of the brand word of the clicked commodity. If the text type of a search word includes both Chinese text and English text, word segmentation processing is first carried out on the search word according to the different text types, and then the text similarity scores of the search word and the brand word of the clicked commodity are calculated. In other words, the composition of the text types of the search word is considered before the text similarity score is calculated: when a search word consists of both Chinese text and English text, it is split into a Chinese part and an English part, and the text similarity scores of the Chinese part and the English part are calculated separately.
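Splitting a mixed search word into its Chinese part and its English part might look like the following Python sketch. It assumes that "Chinese text" means characters in the CJK Unified Ideographs block (U+4E00 to U+9FFF) and that everything else belongs to the English part; the function name and the sample strings are illustrative only.

```python
import re

# Runs of characters in the CJK Unified Ideographs block (assumed definition
# of "Chinese text" for this sketch).
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def split_by_text_type(search_word: str):
    """Split a search word into (chinese_part, english_part)."""
    # Concatenate all Chinese runs, even when they are not adjacent.
    chinese_part = "".join(CJK_RUN.findall(search_word))
    # Replace Chinese runs with spaces, then tidy the remaining text.
    english_part = " ".join(CJK_RUN.sub(" ", search_word).split())
    return chinese_part, english_part
```

A search word of a single text type comes back with the other part empty, so the same function can also be used to detect whether splitting is needed.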
In the embodiment of the present invention, calculating the text similarity score of the search term and the brand term of the clicked item may include: when the text type of the search word is Chinese text, the calculation formula of the text similarity score is as follows:
$$\mathrm{TextScore}(w_j, b) = \alpha_1 \times \frac{L_b}{L_{w_j}}$$
wherein $\mathrm{TextScore}(w_j, b)$ is the text similarity score of the search word $w_j$ and the brand word $b$ of the clicked item, $\alpha_1$ is the Chinese text similarity calculation factor (in the embodiment of the present invention, the value of $\alpha_1$ may be set according to the actual situation, for example, but not limited to, 90), $L_{w_j}$ is the text length of the search word $w_j$, and $L_b$ is the text length of the brand word $b$ of the clicked commodity. For example, when the brand word is "华为" (Huawei), the search word is "华为手机" (Huawei mobile phone), and the Chinese text similarity calculation factor $\alpha_1$ is 90, the text similarity score is calculated as 90 × 2/4 = 45. When the text type of the search word is English text, the calculation formula of the text similarity score is as follows:
$$\mathrm{TextScore}(w_j, b) = \alpha_2 \times \frac{L_b'}{L_{w_j}'}$$
wherein $\mathrm{TextScore}(w_j, b)$ is the text similarity score of the search word $w_j$ and the brand word $b$ of the clicked item, $\alpha_2$ is the English text similarity calculation factor (in the embodiment of the present invention, the value of $\alpha_2$ may be set according to the actual situation, for example, but not limited to, 40), $L_{w_j}'$ is the character length of the search word $w_j$, and $L_b'$ is the character length of the prefix word of the search word $w_j$. The text length of the brand word of the clicked commodity refers to the text length of the Chinese name of the brand word of the clicked commodity, and the character length of the brand word of the clicked commodity refers to the character length of the English name of the brand word of the clicked commodity. For example, when the prefix word has a character length of 4, the search word has a character length of 10, and $\alpha_2$ is 40, the text similarity score is calculated as 40 × 4/10 = 16.
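The two text similarity calculations can be sketched in Python, following the worked calculations given in the text (90 × 2/4 = 45 and 40 × 4/10 = 16). The function names are hypothetical. Treating the "prefix word" as the English brand name found at the start of the search word is an assumption of this sketch, and the sample strings in the usage note are illustrative inputs chosen to match those lengths.

```python
def chinese_text_score(search_word: str, brand_word: str, alpha1: float = 90.0) -> float:
    """alpha1 * (text length of brand word) / (text length of search word)."""
    # The score is zero when the search word does not include the brand word.
    if not search_word or brand_word not in search_word:
        return 0.0
    return alpha1 * len(brand_word) / len(search_word)


def english_text_score(search_word: str, brand_word: str, alpha2: float = 40.0) -> float:
    """alpha2 * (character length of prefix word) / (character length of search word)."""
    # Assumption: the prefix word is the brand's English name at the very
    # start of the search word; otherwise the score is zero.
    if not search_word or not search_word.startswith(brand_word):
        return 0.0
    return alpha2 * len(brand_word) / len(search_word)
```

With these definitions, `chinese_text_score("华为手机", "华为")` gives 45.0, and `english_text_score("nike shoes", "nike")` gives 16.0, since "nike shoes" has 10 characters including the space.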
In an embodiment of the present invention, before calculating the correlation degree between the search term and the brand term of the clicked item, the method for generating brand derivative words may include: respectively carrying out normalization calculation on the click score and the text similarity score. The normalized click score of the search word $w_j$ and the brand word $b$ of the clicked item may be calculated by a normalization formula in which $n$ represents the total number of all search words corresponding to the brand word $b$.
After the click score and the text similarity score of the search word and the brand word of the clicked commodity are calculated, the click score and the text similarity score are each normalized, and then the correlation degree of the search word and the brand word of the clicked commodity is calculated.
In the embodiment of the present invention, the correlation degree of the search word $w_j$ and the brand word $b$ of the clicked commodity may be calculated by a formula over the following quantities: the correlation calculation factors $\beta$ and $\gamma$ (in the embodiment of the present invention, the values of $\beta$ and $\gamma$ may be set according to the actual situation; for example, but not limited to, $\beta$ may be set to 0.7 and $\gamma$ may be set to 0.3); the normalized click score of the search word $w_j$ and the brand word $b$ of the clicked commodity; and the normalized text similarity score of the search word $w_j$ and the brand word $b$ of the clicked commodity. When the correlation degree is greater than a preset threshold (in the embodiment of the present invention, the preset threshold may be set according to the actual situation, for example, but not limited to, 0.5), the search word is selected as a brand derivative word.
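The selection by threshold can be sketched in Python as below. The way the two normalized scores are combined is not spelled out in the text; the weighted sum used here (β times the normalized click score plus γ times the normalized text similarity score) is an assumption of this sketch, as are the function names. The default values 0.7, 0.3 and 0.5 are the example values mentioned in the embodiment.

```python
def correlation(norm_click: float, norm_text: float,
                beta: float = 0.7, gamma: float = 0.3) -> float:
    """Assumed combination: a weighted sum of the two normalized scores."""
    return beta * norm_click + gamma * norm_text


def select_brand_derivatives(scores, threshold: float = 0.5):
    """Keep the search words whose correlation exceeds the threshold.

    `scores` maps a search word to (normalized click score,
    normalized text similarity score) for one brand word.
    """
    return [
        word
        for word, (norm_click, norm_text) in scores.items()
        if correlation(norm_click, norm_text) > threshold
    ]
```

Raising or lowering `threshold` trades precision against coverage of the brand derivative word bank, which is the flexibility the embodiment attributes to the preset threshold.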
FIG. 2 is a schematic diagram of a main flow of a method of generating brand-derived words, according to an embodiment of the invention. As shown in FIG. 2, the main flow of the method for generating brand derivative words according to the embodiment of the present invention includes: filtering user search data; extracting search terms according to the filtered user search data; extracting brand words of clicked commodities based on click behavior data of the search terms; calculating the click score of the search terms and the brand words of the clicked commodities; calculating the text similarity score of the search terms and the brand words of the clicked commodities; respectively carrying out normalization calculation on the click score and the text similarity score; calculating the correlation degree of the search term and the brand word of the clicked commodity according to the normalized click score and the normalized text similarity score; and judging whether the correlation degree is greater than a preset threshold, and if so, selecting the search term as a brand derivative word.
In the step of filtering the user search data, the filtered user search data includes browsing behavior data based on a search term and clicking behavior data based on the search term.
In the method for generating brand derivative words according to the embodiment of the present invention, the order of extracting the search terms and extracting the brand terms of the clicked item may be, but not limited to, the order shown in fig. 2, and may also be set in combination with specific different service requirements.
FIG. 3 is a schematic diagram of the main flow of calculating the text similarity score of a search term and the brand term of a clicked commodity in the method for generating brand derivative words according to an embodiment of the present invention. As shown in FIG. 3, the main flow includes: judging whether the text of the search term includes the brand term of the clicked commodity, and if it does not, the text similarity score of the search term and the brand term of the clicked commodity is zero; judging whether the text type of the search term includes both Chinese text and English text, and if so, performing word segmentation processing on the search term according to the different text types; and calculating the text similarity score of the search term and the brand term of the clicked commodity according to the text similarity score calculation formula.
In the method for generating a brand derivative word according to the embodiment of the present invention, the order of calculating the click score and calculating the text similarity score may be, but not limited to, the order shown in fig. 2, and may also be set in combination with specific different service requirements.
In the step of respectively carrying out normalization calculation on the click score and the text similarity score, the two scores are normalized separately. The normalized click score of the search word $w_j$ and the brand word $b$ of the clicked commodity may be calculated by a normalization formula in which $n$ represents the total number of all search words corresponding to the brand word $b$. The normalized text similarity score of the search word $w_j$ and the brand word $b$ of the clicked commodity may likewise be calculated by a normalization formula in which $n$ represents the total number of all search words corresponding to the brand word $b$.
In the embodiment of the present invention, the formulas for normalizing the click score and the text similarity score may be, but are not limited to, the formulas described above; other normalization formulas may also be set according to the actual scenario.
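Since the normalization formula may be chosen according to the actual scenario, one possible choice is sketched below in Python: each score is divided by the sum of the scores of all n search words that correspond to the same brand word. This particular formula and the function name are assumptions of the sketch, not the formula of the embodiment.

```python
def normalize_scores(scores):
    """Normalize the scores of the n search words of one brand word.

    `scores` maps each search word to its raw score (click score or text
    similarity score). Assumed formula: score divided by the sum of all
    n scores, so the normalized values add up to 1.
    """
    total = sum(scores.values())
    if total == 0:
        # No signal for this brand word: every normalized score is zero.
        return {word: 0.0 for word in scores}
    return {word: score / total for word, score in scores.items()}
```

The same function can be applied once to the click scores and once to the text similarity scores, which puts both on a comparable 0 to 1 scale before they are combined.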
According to the technical scheme for generating brand derivative words of the embodiment of the present invention, the correlation degree between the extracted search words and the brand words of the clicked commodities is calculated, the search words with a correlation degree larger than the preset threshold are selected as brand derivative words, and a brand derivative word bank is generated; when a user uses a word in the brand derivative word bank as a search word, the user can reach brand shops and commodities directly and quickly, which improves the user search experience. By filtering the user search data, illegal user data can be filtered out, which improves the accuracy of the generated brand derivative words. By unifying the search words, the format of the search words can be standardized. Because the calculation formula of the click score of the search word and the brand word of the clicked commodity counts the number of searches of the search word in unit time and the exposure data of the brand word of the clicked commodity in unit time, data updates are taken into account and outdated data can be discarded. Because the calculation formula of the text similarity score provides different calculation methods for Chinese text and English text, the accuracy of the text similarity score can be improved. Because the search words with a correlation degree larger than the preset threshold are selected as brand derivative words, the preset threshold can be set according to application requirements, which improves the flexibility of the generated brand derivative words.
FIG. 4 is a schematic diagram of the major modules of an apparatus for generating brand-derived words, according to an embodiment of the present invention. As shown in FIG. 4, the apparatus 400 for generating brand derivative words of the embodiment of the present invention mainly comprises the following modules: a search term extraction module 401, a brand word extraction module 402, and a calculation module 403.
The search term extraction module 401 may be configured to extract search terms according to the obtained user search data. The user search data comprises the browsing behavior data and click behavior data generated by users on the e-commerce platform through search terms. The brand word extraction module 402 may be configured to extract the brand words of the clicked commodities according to the click behavior data based on the search terms. After the search term extraction module 401 extracts the search terms and the brand word extraction module 402 extracts the brand words of the clicked commodities, the calculation module 403 may be configured to calculate the correlation degree between the search terms and the brand words of the clicked commodities, and select the search terms with a correlation degree greater than a preset threshold as brand derivative words.
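The division of work among the three modules can be pictured with the following Python sketch. The class names mirror modules 401 to 403, but the record fields (`query`, `action`, `brand`), the method names, and the pluggable `relevance` callable are all assumptions made for illustration; the actual scoring is deliberately left to the caller.

```python
class SearchWordExtractionModule:
    """Module 401: extracts the distinct search words from user search data."""

    def extract(self, records):
        words = []
        for record in records:
            word = record["query"].strip()
            if word and word not in words:
                words.append(word)
        return words


class BrandWordExtractionModule:
    """Module 402: pairs each search word with the brand words it led to via clicks."""

    def extract(self, records):
        pairs = set()
        for record in records:
            if record["action"] == "click":
                pairs.add((record["query"].strip(), record["brand"]))
        return pairs


class CalculationModule:
    """Module 403: scores each (search word, brand word) pair and applies the threshold."""

    def __init__(self, relevance, threshold=0.5):
        self.relevance = relevance  # callable: (search_word, brand_word) -> float
        self.threshold = threshold

    def select(self, pairs):
        # Keep pairs whose correlation is strictly greater than the threshold.
        return sorted(p for p in pairs if self.relevance(*p) > self.threshold)
```

Passing the relevance function in from outside keeps the module wiring independent of whichever click score, text similarity and correlation formulas are configured.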
In this embodiment of the present invention, the searchterm extraction module 401 may further be configured to: filtering the user search data before extracting the search terms according to the acquired user search data. Before extracting the search terms, the searchterm extraction module 401 filters the user search data, so that illegal user search data can be filtered out, and the legality of the user search data is guaranteed.
In this embodiment of the present invention, the searchterm extraction module 401 may further be configured to: and extracting search words in the user search data, and carrying out unification processing on the search words. After the search term is extracted, the searchterm extraction module 401 is configured to perform normalization processing on the extracted search term, and may normalize the extracted search term.
In this embodiment of the present invention, the calculation module 403 may further be configured to: calculate the click score of the search term and the brand word of the clicked commodity; calculate the text similarity score of the search term and the brand word of the clicked commodity; and calculate the correlation between the search term and the brand word of the clicked commodity based on the click score and the text similarity score. After the search term extraction module 401 extracts the search term and the brand term extraction module 402 extracts the brand word of the clicked commodity, the calculation module 403 calculates the click score and the text similarity score of the search term and the brand word of the clicked commodity, then calculates the correlation between them, then compares the correlation with a preset threshold, and finally, according to the comparison result, selects the search terms whose correlation is greater than the preset threshold as brand derivative words.
In the embodiment of the present invention, the calculation formula of the click score may be:
[formula not reproduced]
wherein the formula relates the following quantities: the click score of the search term w_j and the brand word b of the clicked commodity; the number of times the brand word b of the clicked commodity corresponds to the search term w_j; the number of searches of the search term w_j in unit time; and AvgSearchTimes_b, the exposure data of the brand word b of the clicked commodity in unit time, where the exposure data of the brand word b of the clicked commodity represents the number of times the brand word b of the clicked commodity is shown.
In this embodiment of the present invention, the calculation module 403 may further be configured to: before calculating the text similarity score of the search term and the brand word of the clicked commodity, judge whether the text of the search term contains the brand word of the clicked commodity, and, if so, calculate the text similarity score of the search term and the brand word of the clicked commodity.
In embodiments of the present invention, the text type of the search term may include Chinese text and/or English text, and the calculation module 403 may also be used to: if the text type of a search term includes both Chinese text and English text, first perform word segmentation on the search term according to the different text types, and then calculate the text similarity score of the search term and the brand word of the clicked commodity.
In this embodiment of the present invention, the calculation module 403 may further be configured such that, when the text type of the search term is Chinese text, the calculation formula of the text similarity score is:
[formula not reproduced]
wherein the formula relates the following quantities: the text similarity score of the search term w_j and the brand word b of the clicked commodity; α_1, the Chinese text similarity calculation factor; the text length of the search term w_j; and L_b, the text length of the brand word b of the clicked commodity.
When the text type of the search term is English text, the calculation formula of the text similarity score is:
[formula not reproduced]
wherein the formula relates the following quantities: the text similarity score of the search term w_j and the brand word b of the clicked commodity; α_2, the English text similarity calculation factor; the character length of the search term w_j; and L_b', the character length of the prefix word of the search term w_j.
In this embodiment of the present invention, the calculation module 403 may further be configured to: before the correlation between the search term and the brand word of the clicked commodity is calculated, normalize the click score and the text similarity score separately.
In the embodiment of the present invention, the formula for calculating the correlation may be:
[formula not reproduced]
wherein the formula relates the following quantities: the correlation between the search term w_j and the brand word b of the clicked commodity; β, the correlation calculation factor of the click score; γ, the correlation calculation factor of the text similarity score; the normalized click score of the search term w_j and the brand word b of the clicked commodity; and the normalized text similarity score of the search term w_j and the brand word b of the clicked commodity.
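The normalization, combination and threshold steps can be sketched as follows. Because the formulas are not reproduced in this text, two assumptions are made here and should not be read as the formulas of the embodiment: normalization is done by min-max scaling, and the correlation is a weighted sum of the two normalized scores with the factors beta and gamma. The function names and the example scores are hypothetical.

```python
def min_max_normalize(scores):
    """Scale scores into [0, 1] (assumed normalization method)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 0.0 for key in scores}  # all scores equal: no spread to scale
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def select_derivatives(click_scores, text_scores, beta, gamma, threshold):
    """Return the search terms whose correlation exceeds the preset threshold.

    Both score dicts are keyed by (search term, brand word). The weighted sum
    below is an assumed form of the correlation formula.
    """
    click_norm = min_max_normalize(click_scores)
    text_norm = min_max_normalize(text_scores)
    selected = []
    for pair, click in click_norm.items():
        correlation = beta * click + gamma * text_norm.get(pair, 0.0)
        if correlation > threshold:
            selected.append(pair[0])
    return selected


click_scores = {("hua wei", "huawei"): 10, ("huawei p", "huawei"): 5, ("phone", "huawei"): 0}
text_scores = {("hua wei", "huawei"): 0.8, ("huawei p", "huawei"): 0.4, ("phone", "huawei"): 0.0}
derivatives = select_derivatives(click_scores, text_scores, beta=0.5, gamma=0.5, threshold=0.6)
```

Raising or lowering the threshold argument changes how many search terms are admitted to the brand derivative word library, which is the adjustment the embodiment leaves to application requirements.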
As can be seen from the above description, the correlation between each extracted search term and the brand word of the clicked commodity is calculated, the search terms whose correlation is greater than the preset threshold are selected as brand derivative words, and a brand derivative word library is generated. When a user searches with a word from the brand derivative word library, the user can reach the brand shop and its commodities directly and quickly, which improves the user search experience. In the embodiment of the invention, filtering the user search data removes illegal user data and improves the accuracy of the generated brand derivative words. Unifying the search terms standardizes their format. The calculation formula of the click score counts the number of searches of the search term in unit time and the exposure data of the brand word of the clicked commodity in unit time, so that data updates are taken into account and outdated data can be discarded. The calculation formula of the text similarity score provides different calculation methods for Chinese text and English text, which improves the accuracy of the text similarity score. Because the search terms whose correlation is greater than the preset threshold are selected as brand derivative words, the preset threshold can be set according to application requirements, which improves the flexibility of generating brand derivative words.
FIG. 5 illustrates an exemplary system architecture 500 to which the method of generating brand derivative words or the apparatus for generating brand derivative words of embodiments of the present invention may be applied.
As shown in FIG. 5, the system architecture 500 may include terminal devices 501, 502, 503, a network 504, and a server 505. The network 504 serves to provide a medium for communication links between the terminal devices 501, 502, 503 and the server 505. The network 504 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
The user may use the terminal devices 501, 502, 503 to interact with the server 505 over the network 504 to receive or send messages and the like. The terminal devices 501, 502, 503 may have various communication client applications installed thereon, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, and social platform software (by way of example only).
The terminal devices 501, 502, 503 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop computers, desktop computers, and the like.
The server 505 may be a server providing various services, such as a background management server (for example only) that supports shopping websites browsed by users using the terminal devices 501, 502, 503. The background management server may analyze and otherwise process received data such as a product information query request, and feed back a processing result (for example, target push information or product information, by way of example only) to the terminal device.
It should be noted that the method for generating brand derivative words provided by the embodiment of the present invention is generally executed by the server 505, and accordingly, the apparatus for generating brand derivative words is generally disposed in the server 505.
It should be understood that the number of terminal devices, networks, and servers in fig. 5 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 6, a block diagram of a computer system 600 suitable for use with a terminal device implementing an embodiment of the invention is shown. The terminal device shown in FIG. 6 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiments of the present invention.
As shown in FIG. 6, the computer system 600 includes a Central Processing Unit (CPU) 601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the system 600. The CPU 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as necessary, so that a computer program read out therefrom is installed into the storage section 608 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. When executed by the Central Processing Unit (CPU) 601, the computer program performs the above-described functions defined in the system of the present invention.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor comprises a search term extraction module, a brand term extraction module and a calculation module. The names of these modules do not constitute a limitation to the module itself in some cases, and for example, the search term extraction module may also be described as a "module for extracting a search term from acquired user search data".
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer readable medium carries one or more programs which, when executed by a device, cause the device to perform the following: extracting search terms according to the acquired user search data, wherein the user search data comprises browsing behavior data based on the search terms and click behavior data based on the search terms; extracting brand words of clicked commodities according to the click behavior data based on the search terms; and calculating the correlation between the search terms and the brand words of the clicked commodities, and selecting the search terms whose correlation is greater than a preset threshold as brand derivative words.
According to the technical scheme of the embodiment of the invention, the correlation between each extracted search term and the brand word of the clicked commodity is calculated, the search terms whose correlation is greater than the preset threshold are selected as brand derivative words, and a brand derivative word library is generated. When a user searches with a word from the brand derivative word library, the user can reach the brand shop and its commodities directly and quickly, which improves the user search experience. In the embodiment of the invention, filtering the user search data removes illegal user data and improves the accuracy of the generated brand derivative words. Unifying the search terms standardizes their format. The calculation formula of the click score counts the number of searches of the search term in unit time and the exposure data of the brand word of the clicked commodity in unit time, so that data updates are taken into account and outdated data can be discarded. The calculation formula of the text similarity score provides different calculation methods for Chinese text and English text, which improves the accuracy of the text similarity score. Because the search terms whose correlation is greater than the preset threshold are selected as brand derivative words, the preset threshold can be set according to application requirements, which improves the flexibility of generating brand derivative words.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.