HK1176431A1 - Method and system for information distribution in a website

Info

Publication number
HK1176431A1
Authority
HK
Hong Kong
Prior art keywords
query
items
server
information
keywords
Prior art date
Application number
HK13103670.6A
Other languages
Chinese (zh)
Other versions
HK1176431B (en)
Inventor
張祝玉
张祝玉
黃鵬
黄鹏
林鋒
林锋
馮炯
冯炯
Original Assignee
阿里巴巴集团控股有限公司
Application filed by 阿里巴巴集团控股有限公司
Publication of HK1176431A1
Publication of HK1176431B

Abstract

The invention provides a method and a system for publishing information on a website. The method includes: a server at the information publishing end receives, through a client, topic information that a user has entered for the information to be published; the server looks up query entries related to the topic information in a memory that stores historical query entries; the server sends the query entries found to the client as keywords for the information to be published; and the server receives the published information obtained after keywords are selected through the client. This addresses the technical problem of how to further improve retrieval recall without occupying additional storage space in the website database: the recall rate of published information is increased without occupying extra storage space in the website database.

Description

Information publishing method and system in website
Technical Field
The application relates to the technical field of internet, in particular to an information publishing method and system in a website.
Background
The basic process of information retrieval is as follows: 1) a user formulates a query term that expresses the search intent and submits it to a search engine; 2) the search engine finds the web pages that match the query term; 3) the search engine ranks the retrieved web pages under a predetermined policy, based on information in the pages themselves or on the relationships between pages.
One difference between vertical search engines, such as e-commerce search engines, and general search engines is that an e-commerce search engine places more emphasis on the accuracy of the matching results and requires a complete match with the user's input. For example, users often query for products with a particular attribute, model, or brand, and it is common practice for e-commerce search engines to ensure that every part of the query term (apart from normalization) appears in the query results. This largely guarantees the accuracy of the retrieval results, but the recall ratio (the ratio of the number of relevant documents retrieved to the number of all relevant documents in the document library, a measure of the recall of the retrieval system) suffers correspondingly: because the product release information (offer) published by a user may not contain keyword descriptions of certain specific attributes, models, brands and the like, relevant offers are not retrieved, and the query experience of the querying user is ultimately degraded.
One of the methods for improving the retrieval recall rate may be to require the user to fill in more complete information description when the user of the website publishes information, such as completely filling in keywords of specific attributes, models or brands of goods one by one, and uploading the keywords to the website server to be stored in the website database. Therefore, in the process of information retrieval, more query results can be matched with the query keywords input by the query user. However, the biggest technical problem encountered by this method is that the amount of data in the information published by the user increases due to the increase of the information content filled in when the user publishes the information, and for a very large website, the database storage capacity of the website is challenged, so that the website must add more database servers to store the data information additionally filled in when the user publishes the information.
For the above problems in the related art, no technical solution for further improving the recall rate of the search without occupying additional storage space of the website database has been proposed.
Disclosure of Invention
The present application mainly aims to provide an information publishing method and system in a website, so as to at least solve the technical problem in the prior art of how to further improve the recall rate of retrieval without additionally occupying the storage space of a website database.
According to one aspect of the application, an information publishing method in a website is provided, comprising the following steps: a server at the information publishing end receives, through a client, topic information entered by a user for the information to be published; the server queries a memory for query entries related to the topic information, wherein the memory stores historical query entries; the server sends the query entries found to the client as keywords for the information to be published; and the server receives the published information obtained by selecting the keywords through the client.
Further, the topic information includes: the title and category of the information to be published.
Further, the step in which the server queries the memory for query entries related to the topic information comprises: the server divides the title into M independent keywords and selects N keywords from the M keywords, wherein M and N are natural numbers and M is greater than or equal to N; the server queries whether query entries comprising the N keywords exist in the memory; if such query entries exist, the server judges whether the number of the queried query entries belonging to the category is greater than or equal to P, and if so, takes the first P queried query entries belonging to the category as the query entries related to the topic information, wherein P is a preset natural number.
Further, if the server determines that the number of the queried query entries belonging to the category is less than P, the step in which the server queries the memory for query entries related to the topic information further comprises: repeatedly executing the following step until the number of the queried query entries belonging to the category is greater than or equal to P: the server sets N = N-1 and performs the querying step in the memory again.
Further, the step in which the server queries the memory for query entries related to the topic information comprises: the server selects the query entries belonging to the category from the memory; the server divides the title into M independent keywords and selects N keywords from the M keywords, wherein M and N are natural numbers and M is greater than or equal to N; the server searches whether more than Q query entries comprising the N keywords exist among the selected query entries belonging to the category, wherein Q is a preset natural number; and if so, the first Q queried query entries belonging to the category are taken as the query entries related to the topic information.
Further, if the server determines that the number of the queried query entries belonging to the category is less than Q, the step in which the server queries the memory for query entries related to the topic information further comprises: repeatedly executing the following step until the number of the queried query entries belonging to the category is greater than or equal to Q: the server sets N = N-1 and performs the querying step in the memory again.
Further, the step of the server sending the queried query item as a keyword to the client comprises: the server judges whether the number of the on-line query results of each query item in the queried query items is larger than a preset threshold value or not; the server records the query items with the number of the online query results larger than a preset threshold value as a first group of query items, and records the query items with the number of the online query results smaller than or equal to the preset threshold value as a second group of query items; the server sends the first set of query entries and the second set of query entries as keywords to the client.
Further, the step of recording, by the server, query entries whose number of online query results is greater than a predetermined threshold as a first group of query entries includes: calculating, for each query entry whose number of online query results is greater than the predetermined threshold, the relevance between that query entry and the title; and recording these query entries in the first group of query entries in descending order of relevance. The step of recording, by the server, query entries whose number of online query results is less than or equal to the predetermined threshold as a second group of query entries includes: calculating, for each query entry whose number of online query results is less than or equal to the predetermined threshold, the relevance between that query entry and the title; and recording these query entries in the second group of query entries in descending order of relevance.
Further, before the server queries the memory for the query entry related to the subject information, the method further includes: the server updates the historical query entries stored in memory.
According to another aspect of the present application, there is provided an information distribution system in a website, including: the system comprises a server and a client of an information publishing terminal, wherein the client is used for sending subject information of information to be published, which is input by a user, to the server, and the subject information comprises a title and a category of the information to be published; the server of the information publishing terminal is used for receiving the theme information sent by the client; inquiring a memory for inquiry items related to the subject information, wherein the memory stores the inquiry items of the history records; and sending the inquired inquiry items to the client as keywords of the information to be issued, and receiving the issued information obtained by selecting the keywords through the client.
Further, the server includes: the first title processing unit is used for dividing the title into M independent keywords and selecting N keywords from the M keywords when inquiring inquiry items related to the topic information from the memory, wherein M and N are natural numbers, and M is more than or equal to N; a first query unit configured to query whether a query entry including N keywords exists from a memory; the first judging unit is used for judging whether the number of the query items belonging to the category in the queried query items is larger than or equal to P when the query items comprising N key words exist, and if the number of the query items belonging to the category is larger than or equal to P, the former P queried query items belonging to the category are taken as query items related to the subject information, wherein P is a preset natural number.
Further, the server is further configured to, when the first judging unit determines that the number of the queried query entries belonging to the category is less than P, repeatedly perform the following steps until that number is greater than or equal to P: the server sets N = N-1; notifies the first title processing unit to select N keywords from the M keywords; notifies the first query unit to query whether a query entry including the N keywords exists in the memory; and notifies the first judging unit to judge, when query entries including the N keywords exist, whether the number of the queried query entries belonging to the category is greater than or equal to P, and if so, to take the first P queried query entries belonging to the category as the query entries related to the topic information.
Further, the server includes: a selection unit for selecting the query item belonging to the category from the memory; the second title processing unit is used for dividing the title into M independent keywords and selecting N keywords from the M keywords, wherein M and N are natural numbers, and M is more than or equal to N; the second query unit is used for searching whether more than Q query items comprising N keywords exist in the selected query items belonging to the category, wherein Q is a preset natural number; and if so, taking the top Q inquired inquiry items belonging to the category as inquiry items related to the subject information.
Further, the server is further configured to, when the number of the queried query entries belonging to the category, which are found by the second querying unit, is less than Q, repeatedly perform the following steps until the number of the queried query entries belonging to the category is greater than or equal to Q: the server makes N equal to N-1; informing a second title processing unit to select N keywords from the M keywords; and informing the second query unit to search whether more than Q query items comprising N key words exist in the selected query items belonging to the category, and if so, taking the top Q queried query items belonging to the category as query items related to the subject information.
Further, the server includes: the second judging unit is used for judging whether the number of the online inquiry results of each inquiry item in the inquired inquiry items is larger than a preset threshold value or not when the server sends the inquired inquiry items to the client as the keywords of the information to be issued; the recording unit is used for recording the query items with the number of the online query results larger than a preset threshold value as a first group of query items, and recording the query items with the number of the online query results smaller than or equal to the preset threshold value as a second group of query items; and the sending unit is used for sending the first group of query items and the second group of query items to the client as key words.
Further, the recording unit includes: the first recording unit is used for recording the query items of which the number of the query results is greater than a preset threshold value through the following steps: calculating the correlation degree between each query item and the title in the query items of which the number of the online query results is greater than a preset threshold value; recording query items with the number of query results larger than a preset threshold in a first group of query items according to the sequence of the relevance from large to small; the second recording unit is used for recording the query items of which the number of the query results is less than or equal to a preset threshold through the following steps: calculating the correlation degree between each query item and the title in the query items of which the number of the online query results is less than or equal to a preset threshold value; and recording the query items of which the number of the query results is less than or equal to a preset threshold in the second group of query items according to the sequence from the large degree to the small degree of correlation.
Further, the server includes: and the updating unit is used for updating the query items of the history records stored in the memory before the server queries the query items related to the subject information from the memory.
The application realizes the following technical effects through the technical scheme:
1) by sending historical query entries to the client as keywords for the information to be published, the server effectively recommends the query tendencies of buyer users to the seller user through the client. The seller user therefore does not need to fill in a large amount of descriptive content, the recall rate of the product information published by the user can be improved without occupying additional storage space on the website database server, and the number of query terms with zero or few results is ultimately reduced. Preferably, the buyer's experience on the e-commerce website can be improved, which can further increase buyers' enthusiasm for transactions;
2) the server selects the query items related to the subject information input by the seller user in the history record, and the query items come from different inputs of various buyers, so that the problem of generating a single keyword is avoided, and the query result of the seller can be returned according to the product information issued by the seller even when the buyer user inputs different query terms;
3) the server dynamically updates the query items of the historical records, so that the problems of limited quantity and serious homogenization of the generated keywords can be solved, and the keywords reflecting the query tendency of the buyer user can be recommended to the seller user in real time.
Of course, it is not necessary for any product to achieve all of the above-described advantages at the same time for the practice of the present application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a block diagram of a preferred architecture of an information distribution system in a web site according to an embodiment of the present application;
FIG. 2 is a block diagram of a preferred architecture of a server in an information distribution system in a website according to an embodiment of the present application;
FIG. 3 is a block diagram of another preferred structure of a server in an information distribution system in a website according to an embodiment of the present application;
FIG. 4 is a block diagram of another configuration of an information distribution system in a website according to an embodiment of the present application;
FIG. 5 is a preferred flow chart of a method for posting information in a website according to an embodiment of the present application;
FIG. 6 is another preferred flow chart of a method of posting information in a website according to an embodiment of the present application;
FIG. 7 is a further preferred flowchart of an information publishing method in a website according to an embodiment of the present application.
Detailed Description
The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments. The embodiment of the application will be described by taking product information release and retrieval in an e-commerce website as an example, and certainly, a person skilled in the art can popularize and apply the technical scheme of the application to websites such as video websites and internet forums without creative labor. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
First, several terms to which this application relates are explained:
1) Query: the query terms entered by the user during the search process.
2) Product: in the field of electronic commerce, the goods sold by merchants.
3) Category: the category to which a product belongs in the field of e-commerce.
4) Keywords: several words that correctly describe the commodity information and are used to index the commodity information at the retrieval end.
5) Blue-sea words: in the field of electronic commerce, query terms that users search for many times but that return few retrieval results.
6) Hot words: in the field of electronic commerce, query terms that users search for many times and that return many retrieval results.
7) Recall ratio: the ratio of the number of relevant documents retrieved to the number of all relevant documents in the document library; it measures the recall of the retrieval system.
8) querylog: a log of queries made by the user at the e-commerce web site.
9) Product exposure rate: the rate at which an item is presented to a searching user in a historical query of an e-commerce web site.
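The recall ratio defined in item 7 can be written as a short function. This is a minimal sketch for illustration only: the function name `recall_ratio` and the representation of documents as sets of identifiers are assumptions, not part of the application.

```python
def recall_ratio(retrieved, relevant):
    """Share of all relevant documents that the search actually returned.

    Both arguments are iterables of document identifiers (an assumption
    made for this sketch).
    """
    relevant = set(relevant)
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant documents")
    return len(set(retrieved) & relevant) / len(relevant)


# Two of the four relevant offers are retrieved; "x" is an irrelevant hit
# and does not affect recall.
example = recall_ratio({"a", "b", "x"}, {"a", "b", "c", "d"})
```

An offer whose description lacks the keyword a buyer typed falls into the "relevant but not retrieved" part of the denominator, which is the loss of recall the background section describes.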
Before describing further details of embodiments of the present application, one suitable computing architecture that may be used to implement the principles of the present application will be described with reference to FIG. 1. In the following description, embodiments of the present application will be described with reference to acts and symbolic representations of operations that are performed by one or more computers, unless indicated otherwise. As such, it will be understood that such acts and operations, which are at times referred to as being computer-executed, include the manipulation by the processing unit of the computer of electrical signals representing data in a structured form. This manipulation transforms the data or maintains it at locations in the computer's memory system, which reconfigures or otherwise alters the operation of the computer in a manner well understood by those skilled in the art. The data structures that maintain the data are physical locations of the memory that have particular properties defined by the format of the data. However, while the present application is described in the foregoing context, it is not meant to be limiting, as those of skill in the art will appreciate that aspects of the acts and operations described hereinafter may also be implemented in hardware.
Turning to the drawings, wherein like reference numerals refer to like elements, the principles of the present application are illustrated as being implemented in a suitable computing environment. The following description is based on illustrated embodiments of the application and should not be taken as limiting the application with regard to alternative embodiments that are not explicitly described herein.
FIG. 1 shows a schematic diagram of one example computer architecture that may be used for these devices. For descriptive purposes, the architecture portrayed is only one example of a suitable environment and is not intended to suggest any limitation as to the scope of use or functionality of the application. Neither should the computing system be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in FIG. 1.
The principles of the present application may be implemented using other general purpose or special purpose computing or communication environments or configurations. Examples of well known computing systems, environments, and configurations that may be suitable for use with the application include, but are not limited to, personal computers, servers, multiprocessor systems, microprocessor-based systems, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.
In its most basic configuration, FIG. 1 shows an information distribution system in a website, comprising a server 102 at the information publishing end and one or more clients 104. The server 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA), a storage device for storing data, and a transmission device for communicating with the client 104. The client 104 may include a microprocessor (MCU), a transmission device for communicating with the server, and a display device for interacting with the user. In this description and in the claims, an "information distribution system in a website" may also be defined as any hardware component or combination of hardware components capable of executing software, firmware, or microcode to achieve the functionality. The information distribution system in the website may even be distributed to achieve distributed functionality.
Example 1
As shown in FIG. 1, the information distribution system in the website includes a server 102 at the information publishing end and a client 104, which are connected to each other.
In operation, the client 104 sends the topic information that the user has entered for the information to be published to the server 102; in the preferred embodiment of the present application, the topic information includes, but is not limited to, the title and the category of the information to be published. After receiving the topic information sent by the client 104, the server 102 at the information publishing end queries a memory for query entries related to the topic information, where the memory stores historical query entries. The server 102 sends the query entries found to the client 104 as keywords for the information to be published, and receives the published information obtained by selecting the keywords through the client 104. A query entry is historical query information used by buyer users; it reflects the search habits and search interests of users who search for information of the kind to be published.
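The exchange described above can be outlined as two server-side calls. This is only an illustrative sketch: the class name `PublishingServer`, its method names, and the `lookup` callable that stands in for the query against the memory are all hypothetical and are not taken from the application.

```python
class PublishingServer:
    """Hypothetical server-side outline of the publishing exchange."""

    def __init__(self, lookup):
        # lookup(title, category) -> list of historical query strings.
        # It stands in for the query against the memory (assumption).
        self.lookup = lookup
        self.published = []

    def suggest_keywords(self, title, category):
        # Receive the topic information, query the memory, and return
        # the query entries to the client as candidate keywords.
        return list(self.lookup(title, category))

    def receive_publication(self, title, category, selected_keywords):
        # Receive the published information built from the client's selection.
        record = {"title": title, "category": category,
                  "keywords": list(selected_keywords)}
        self.published.append(record)
        return record


# Usage with a stubbed lookup: the seller keeps only the first suggestion.
server = PublishingServer(lambda title, category: ["apple phone 3g", "apple mobile"])
suggested = server.suggest_keywords("apple mobile phone", "3G network")
record = server.receive_publication("apple mobile phone", "3G network", suggested[:1])
```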
The information to be published in this embodiment may be product information to be published in an e-commerce website, or video information to be published in a video website, or the like.
In the preferred embodiment, the server sends the query entries of the history record to the client as the keywords of the information to be published, and effectively recommends the user query tendency of the search information to the information publishing user through the client, so that the recall rate of the information published by the user can be improved, and the purpose of reducing the number of zero/few result query words is finally achieved. In addition, the method and the system effectively recommend the tendency information of the user query of the search information to the user issuing the information through the client, so that the user issuing the information does not need to fill in a large amount of information description content when issuing the information, and the database server storing the information to be issued does not need to store a large amount of information to be issued, namely, the recall rate of the information issued by the user issuing the information is improved under the condition that the storage space of the website database server is not additionally occupied. Preferably, when the user who issues the information is a seller of the e-commerce website, the experience of the buyer on the e-commerce website can be improved, and the transaction enthusiasm of the buyer can be further improved.
In various embodiments of the present application, the memory may be disposed on a server at an information distribution end, or may be disposed on another server, and the present application is not limited thereto.
In order to enable the server to obtain the query item related to the subject information in the storage, the present application provides two different ways, and the following describes in detail a process of obtaining the query item related to the subject information in the storage by taking the information to be published as the product information to be published by the seller on the e-commerce website as an example, with reference to the accompanying drawings.
(1) Firstly judging the title of the product to be released, and then judging the category of the product to be released
In this query mode, the server shown in FIG. 1 may include the specific structure in FIG. 2. As shown in FIG. 2, the server 202 includes a first title processing unit 2021, a first query unit 2022, and a first judging unit 2023, which are connected in sequence. When the memory is queried for query entries related to the topic information (here, the memory stores historical query entries and may be located on the server, on another background device, or on an independent storage device), the first title processing unit 2021 divides the title into M independent keywords and selects N keywords from the M keywords, where M and N are both natural numbers and M is greater than or equal to N; the first query unit 2022 queries the memory for query entries that include the N keywords; and the first judging unit 2023, when query entries including the N keywords exist, judges whether the number of the queried query entries belonging to the category is greater than or equal to P, and if so, takes the queried query entries belonging to the category as the query entries related to the topic information, where P is a preset natural number. Preferably, only a part of the query entries may be used as the query entries related to the topic information; for example, if the number of queried query entries belonging to the category is greater than or equal to P, the first P queried query entries belonging to the category may be used as the query entries related to the topic information.
When the first judging unit 2023 judges that the number of the queried query entries belonging to the category is less than P, the following step is repeated until that number is greater than or equal to P: the server 202 sets N = N-1 and notifies the first query unit 2022 to perform the querying step in the memory again. In the preferred embodiment, the query parameters are dynamically adjusted, so that the required query result can be obtained quickly and accurately.
The above-described determination process is further described below by way of example. Assume that the title in the topic information is divided into 2 independent keywords, "apple" and "mobile phone", that the category of the topic information is "3G network", and that P is 30. When entries related to the topic information are queried in the memory, query entries that include both keywords "apple" and "mobile phone" are queried first. If 100 such query entries are found, it is then judged which of these 100 query entries belong to the "3G network" category; if more than 30 of them belong to that category, the first 30 of those are taken as the query entries related to the topic information.
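The title-first procedure, including the N = N-1 relaxation, can be sketched as follows. This is a minimal illustration under several assumptions that the application does not specify: the `QueryEntry` record and its fields are hypothetical, keyword matching is done by substring test, "selecting N keywords from the M keywords" is read as "the entry contains at least N of the M title keywords", N starts at M, and the loop stops at N = 1 even if fewer than P entries were found.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryEntry:
    text: str      # historical query string entered by a buyer
    category: str  # category recorded for that query (hypothetical field)


def title_first(history, title_keywords, category, p):
    """Match on title keywords first, then on category; relax N until P hits."""
    n = len(title_keywords)  # start with N = M (assumption)
    in_category = []
    while n >= 1:
        # Entries containing at least N of the M title keywords.
        by_title = [e for e in history
                    if sum(kw in e.text for kw in title_keywords) >= n]
        in_category = [e for e in by_title if e.category == category]
        if len(in_category) >= p:
            return in_category[:p]  # the first P entries in the category
        n -= 1  # N = N-1, then query again
    return in_category  # fewer than P found even at N = 1


history = [
    QueryEntry("apple mobile phone 16g", "3G network"),
    QueryEntry("apple mobile phone white", "3G network"),
    QueryEntry("apple juice", "food"),
    QueryEntry("cheap mobile phone", "3G network"),
    QueryEntry("apple mobile phone", "accessories"),
]
strict = title_first(history, ["apple", "mobile phone"], "3G network", 2)
relaxed = title_first(history, ["apple", "mobile phone"], "3G network", 3)
```

With P = 2 both keywords are required; with P = 3 the sketch has to drop to N = 1 before enough entries in the category are found.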
(2) Firstly judging the category of the product to be released, and then judging the title of the product to be released
In this query mode, the server shown in fig. 1 may have the specific structure shown in fig. 3. As shown in fig. 3, the server 302 includes a selection unit 3021, a second title processing unit 3022, and a second query unit 3023 connected in sequence. The selection unit 3021 selects from the memory the query entries belonging to the category. The second title processing unit 3022 divides the title into M independent keywords and selects N keywords from the M keywords, where M and N are both natural numbers and M is greater than or equal to N. The second query unit 3023 checks whether, among the selected query entries belonging to the category, there are more than Q query entries that include the N keywords, where Q is a preset natural number; if so, the query entries found that belong to the category are taken as the query entries related to the subject information. Preferably, only some of them are used: for example, if more than Q of the selected query entries belonging to the category include the N keywords, the first Q such entries may be taken as the query entries related to the subject information.
When the number of query entries belonging to the category found by the second query unit 3023 is less than Q, the following step is repeated until that number is greater than or equal to Q: the server 302 sets N = N-1 and notifies the second query unit 3023 to perform the query step in the memory again. In this preferred embodiment, the query parameters are adjusted dynamically, so that the required query result can be obtained quickly and accurately.
The above determination process is further described below by way of example. Assume that the title in the subject information is divided into 2 independent keywords, "apple" and "mobile phone", that the category of the subject information is "3G network", and that Q is 30. When querying the memory for entries related to the subject information, the server first looks for query entries belonging to the "3G network" category. If 100 such query entries are found, the server then determines which of these 100 entries contain both keywords "apple" and "mobile phone"; if more than 30 of them do, the first 30 of those entries are taken as the query entries related to the subject information.
With the servers shown in fig. 2 and fig. 3, the query entries related to the subject information input by the seller user are selected from the history record. Because these query entries come from the varied inputs of many buyers, the problem of generated keywords lacking variety is avoided, and the product information published by the seller can match the diversity of the buyer users' query words.
On the basis of the above embodiments, in order to send the query entries found to the client as keywords of the product to be published, the server may further have the specific structure shown in fig. 4. As shown in fig. 4, the server 402 includes a second judging unit 4021, a recording unit 4022, and a sending unit 4023 connected in sequence. When the server sends the query entries found to the client as keywords of the product to be published, the second judging unit 4021 judges, for each of those query entries, whether its number of online query results is greater than a predetermined threshold. The recording unit 4022 records the query entries whose number of online query results is greater than the predetermined threshold as a first group of query entries, and records the query entries whose number of online query results is less than or equal to the predetermined threshold as a second group of query entries. The sending unit 4023 sends the first and second groups of query entries to the client 404 as keywords.
For example, when the predetermined threshold is 100, the query entries whose number of online query results is less than or equal to 100 may be recorded as the second group of query entries, and this group may be regarded as blue-sea terms (of relatively high value). Preferably, this group is sent to the client first, so that the client can show the more valuable blue-sea terms to the seller user first, which reflects the current tendency of user queries more effectively. The query entries whose number of online query results is greater than 100 may be recorded as the first group of query entries, and this group may be regarded as popular terms (of relatively low value). Preferably, the first group, as popular terms, is sent to the client after the second group, as blue-sea terms, has been sent. That is, the server recommends keywords reflecting the buyer users' query tendencies to the seller user through the client, blue-sea terms first and popular terms afterwards. With this display and recording scheme, the server can recommend such keywords to the seller user in order of value, which makes the seller user's selection more efficient.
The recording unit 4022 includes a first recording unit 40221 and a second recording unit 40222. The first recording unit 40221 records the query entries whose number of query results is greater than the predetermined threshold as follows: it calculates the relevance between the title and each query entry whose number of online query results is greater than the predetermined threshold, and records these query entries in the first group in descending order of relevance. The second recording unit 40222 records the query entries whose number of query results is less than or equal to the predetermined threshold as follows: it calculates the relevance between the title and each query entry whose number of online query results is less than or equal to the predetermined threshold, and records these query entries in the second group in descending order of relevance.
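The grouping and ordering performed by the recording unit can be illustrated with a minimal sketch. The function name `group_keywords`, the dictionary inputs, and the sample values are hypothetical and do not come from the application; only the threshold rule and the descending-relevance ordering follow the description above.

```python
from typing import Dict, List, Tuple


def group_keywords(entries: List[str],
                   result_counts: Dict[str, int],
                   relevance: Dict[str, float],
                   threshold: int = 100) -> Tuple[List[str], List[str]]:
    """Split query entries into (first_group, second_group).

    first_group:  entries with more than `threshold` online results (popular terms)
    second_group: entries with at most `threshold` online results (blue-sea terms)
    Each group is ordered by relevance to the title, highest first.
    """
    first_group = [e for e in entries if result_counts[e] > threshold]
    second_group = [e for e in entries if result_counts[e] <= threshold]
    first_group.sort(key=lambda e: relevance[e], reverse=True)
    second_group.sort(key=lambda e: relevance[e], reverse=True)
    return first_group, second_group


def sending_order(first_group: List[str], second_group: List[str]) -> List[str]:
    """Blue-sea terms (second group) are sent before popular terms (first group)."""
    return second_group + first_group
```

In this sketch an entry with exactly 100 results falls into the second group, matching the "less than or equal to" wording of the description.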
On the basis of the foregoing embodiments, in order to update the query entries in the history record dynamically, the server may further include an updating unit 406 shown in fig. 4. The updating unit 406 is connected to the memory 405 and updates the query entries of the history record stored in the memory before the server queries the memory 405 for query entries related to the subject information. In this preferred embodiment, the server dynamically updates the query entries of the history record, which addresses the problems of a limited number of generated keywords and severe homogeneity, and allows keywords reflecting the buyer users' query tendencies to be recommended to the seller user in real time.
Example 2
On the basis of the information publishing system in the website shown in fig. 1-4, the present application further provides an information publishing method in the website, as shown in fig. 5, the information publishing method in the website includes the following steps:
S502: the server at the information publishing end receives, through a client, the subject information of the information to be published that is input by a user;
S504: the server queries the memory for query entries related to the subject information, where the memory stores the query entries of the history record;
S506: the server sends the query entries found to the client as keywords of the information to be published;
S508: the server receives the published information obtained after keywords are selected through the client.
In this preferred embodiment, the server sends the query entries of the history record to the client as keywords of the information to be published and, through the client, effectively conveys the query tendencies of searching users to the user publishing the information. This improves the recall rate of the published information and ultimately reduces the number of query words with zero or few results. In addition, because these query tendencies are recommended to the publishing user through the client, the publishing user does not need to fill in a large amount of descriptive content, and the database server storing the information to be published does not need to store a large amount of such content; that is, the recall rate of the published information is improved without occupying additional storage space on the website database server. Preferably, when the publishing user is a seller on an e-commerce website, the buyers' experience on the website, and in turn their willingness to trade, can be improved.
Preferably, the subject information includes: the title and category of the information to be published.
To enable the server to obtain from the memory the query entries related to the subject information, the present application provides two different modes. The process is described in detail below with reference to the accompanying drawings, taking as an example the case where the information to be published is product information to be published by a seller on an e-commerce website.
(1) First evaluating the title of the product to be published, then its category
The server may query the memory for query entries related to the subject information as follows: the server divides the title into M independent keywords and selects N keywords from the M keywords, where M and N are natural numbers and M is greater than or equal to N; the server checks whether the memory contains query entries that include the N keywords; if so, the server judges whether the number of those query entries that belong to the category is greater than or equal to P, and if it is, the query entries found that belong to the category are taken as the query entries related to the subject information, where P is a preset natural number. Preferably, only some of them are used: for example, if the number of query entries found that belong to the category is greater than or equal to P, the first P such entries may be taken as the query entries related to the subject information.
If the server judges that the number of query entries found that belong to the category is less than P, the step in which the server queries the memory for query entries related to the subject information further includes repeating the following step until that number is greater than or equal to P: the server sets N = N-1 and performs the query step in the memory again. In this preferred embodiment, the query parameters are adjusted dynamically, so that the required query result can be obtained quickly and accurately.
(2) First evaluating the category of the product to be published, then its title
The server may also query the memory for query entries related to the subject information as follows: the server selects from the memory the query entries belonging to the category; the server divides the title into M independent keywords and selects N keywords from the M keywords, where M and N are natural numbers and M is greater than or equal to N; the server checks whether, among the selected query entries belonging to the category, there are more than Q query entries that include the N keywords, where Q is a preset natural number; if so, the query entries found that belong to the category are taken as the query entries related to the subject information. Preferably, only some of them are used: for example, if more than Q of the selected query entries belonging to the category include the N keywords, the first Q such entries may be taken as the query entries related to the subject information.
If the server judges that the number of query entries found that belong to the category is less than Q, the step in which the server queries the memory for query entries related to the subject information further includes repeating the following step until that number is greater than or equal to Q: the server sets N = N-1 and performs the query step in the memory again. In this preferred embodiment, the query parameters are adjusted dynamically, so that the required query result can be obtained quickly and accurately.
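The two query modes and the step of reducing N can be sketched together as follows. This is only an illustration under stated assumptions: the `QueryEntry` type, the function name `find_related_entries`, substring matching, choosing the first N keywords in the given order, and returning the last partial result when even a single keyword yields too few entries are all hypothetical choices that the application does not specify.

```python
from typing import List, NamedTuple, Sequence


class QueryEntry(NamedTuple):
    text: str      # historical query as typed by a buyer
    category: str  # category associated with that query


def find_related_entries(history: Sequence[QueryEntry],
                         keywords: Sequence[str],
                         category: str,
                         limit: int,
                         category_first: bool = False) -> List[QueryEntry]:
    """Return up to `limit` history entries matching the keywords and the category.

    `limit` plays the role of P (title first) or Q (category first).
    """
    # Category-first mode narrows the pool before any keyword matching.
    if category_first:
        pool = [e for e in history if e.category == category]
    else:
        pool = list(history)

    hits: List[QueryEntry] = []
    # Start with all keywords (N = M) and relax with N = N - 1 until enough hits.
    for n in range(len(keywords), 0, -1):
        required = keywords[:n]  # assumption: keep the first n keywords
        hits = [e for e in pool
                if all(k in e.text for k in required) and e.category == category]
        if len(hits) >= limit:
            return hits[:limit]
    return hits  # assumption: best effort when the limit is never reached
```

Both modes select the same entries in this sketch; they differ only in which filter is applied to the stored history first, which may matter for how the memory is indexed.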
In both query modes, the server selects from the history record the query entries related to the subject information input by the seller user. Because these query entries come from the varied inputs of many buyers, the problem of generated keywords lacking variety is avoided, and the product information published by the seller can match the diversity of the buyer users' query words.
On the basis of the above embodiments, the server sends the query entries found to the client as keywords of the product to be published through the following steps:
S1: the server judges, for each of the query entries found, whether its number of online query results is greater than a predetermined threshold;
S2: the server records the query entries whose number of online query results is greater than the predetermined threshold as a first group of query entries, and records the query entries whose number of online query results is less than or equal to the predetermined threshold as a second group of query entries;
For example, when the predetermined threshold is 100, the query entries whose number of online query results is less than or equal to 100 may be recorded as the second group of query entries, and this group may be regarded as blue-sea terms (of relatively high value). Preferably, this group is sent to the client first, so that the client can show the more valuable blue-sea terms to the seller user first, which reflects the current tendency of user queries more effectively. The query entries whose number of online query results is greater than 100 may be recorded as the first group of query entries, and this group may be regarded as popular terms (of relatively low value). Preferably, the first group, as popular terms, is sent to the client after the second group, as blue-sea terms, has been sent. That is, the server recommends keywords reflecting the buyer users' query tendencies to the seller user through the client, blue-sea terms first and popular terms afterwards. With this display and recording scheme, the server can recommend such keywords to the seller user in order of value, which makes the seller user's selection more efficient.
Further, the query entries that are blue-sea terms can be recorded and displayed according to how closely they match the title. Specifically, the server calculates the relevance between the title and each query entry whose number of online query results is less than or equal to the predetermined threshold, and records these query entries in the second group in descending order of relevance.
Further, the query entries that are popular terms can likewise be recorded and displayed according to how closely they match the title. Specifically, the server calculates the relevance between the title and each query entry whose number of online query results is greater than the predetermined threshold, and records these query entries in the first group in descending order of relevance.
S3: the server sends the first set of query entries and the second set of query entries as keywords to the client.
Through the display and recording scheme, the keywords reflecting the query tendency of the buyer can be recommended to the seller according to the value, so that the selection efficiency of the seller is improved.
On the basis of the above embodiments, the server may also update the query entries in the history record dynamically. Specifically, the server updates the historical query entries stored in the memory before it queries the memory for query entries related to the subject information. In this preferred embodiment, the server dynamically updates the query entries of the history record, which addresses the problems of a limited number of generated keywords and severe homogeneity, and allows keywords reflecting the buyer users' query tendencies to be recommended to the seller user in real time.
Specific examples are described in detail below with reference to the drawings and the product information distribution system and method in electronic commerce described above.
As shown in fig. 6, when publishing product information, the seller user selects a category and fills in the title, keywords, and other information on the server at the product publishing end, and the information is then stored in the data warehouse (which may also be understood as being stored in the database); it is then indexed by the Build indexing machine. Accordingly, when a buyer user enters keywords in the search engine, the corresponding product can be retrieved. It follows that the keywords filled in by the seller user at the product publishing end are an important factor in whether the product can be retrieved. In practice, however, the seller does not know the buyers' search habits or current search hot spots, and therefore cannot match the users' search query terms precisely when filling in the product keywords.
In view of the above, the present application provides a product information publishing method as shown in fig. 7. When a seller user publishes product information, keywords reflecting buyers' search habits and search hot spots are recommended to the seller user by means of machine learning and data mining, so that the published product information corresponds to the current buyers' search habits and hot spots. This increases the exposure rate of the product on the e-commerce website, reduces the overall proportion of query terms with zero or few results, and improves the search experience of querying users on the e-commerce website.
Referring to fig. 7, the system is divided into a background data mining module 702 and a foreground automatic keyword recommendation module 704. The background data mining module 702 mainly establishes association relationships between query terms (queries) through the products (offers) under each category. For example, associations between queries may be established according to the click rate and exposure rate of the offers: if two queries are both related to the same offer, the two queries are considered to have a certain association. Then, by iteratively calculating the degree of association between candidate queries, synonym relationships between queries can be mined, and combined synonyms and complete synonyms can further be mined from the query synonyms. In addition, the system includes a background query log (querylog) processing module, which mainly performs data cleaning on the queries (including normalization, forbidden-word filtering, invalid-word filtering, spelling-correction filtering, and keyword-length filtering), cat_compute (category calculation), update_data (daily data update), merge_data (data merging), and building of the inverted index.
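The idea that two queries are associated when they relate to the same offer can be illustrated with a small sketch. It only counts shared offers per query pair; the iterative association-degree calculation and synonym mining mentioned above are not specified in the text and are not reproduced here. The function name `associate_queries` and the `(query, offer)` log format are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Tuple


def associate_queries(click_log: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], int]:
    """Count, for each pair of queries, how many offers both queries led to.

    click_log: (query, offer_id) pairs, e.g. taken from a click or exposure log.
    Returns a mapping from an alphabetically ordered query pair to its shared-offer count.
    """
    queries_per_offer = defaultdict(set)
    for query, offer in click_log:
        queries_per_offer[offer].add(query)

    shared = defaultdict(int)
    for queries in queries_per_offer.values():
        # Every pair of queries that reached the same offer gains one point.
        for a, b in combinations(sorted(queries), 2):
            shared[(a, b)] += 1
    return dict(shared)
```

A pair with a high count would be a candidate for the later synonym analysis; how such counts are weighted by click rate or exposure rate is left open here.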
The processing flow of the foreground keyword automatic recommendation module 704 includes:
S1: Extract the central information from the product's title information (title) input to the module and from the category to which it belongs. Specifically, a series of information processing steps is first applied to the title: tokenization (dividing the title into independent English keywords) to obtain each token and its part of speech, and simple syntactic analysis to extract the noun phrases (NP, Noun Phrase) of the title. Then statistical and machine learning techniques are applied to extract information such as the central NP (expanded) and the central words of the title.
S2: Scan and locate the synonyms in the title according to the synonym information output by the background data mining module 702.
S3: Score each keyword. Specifically, each NP is extracted and each keyword is scored.
S4-S5: Perform word dropping, combination, and retrieval according to the scores from step S3 to obtain a candidate recommended word set. Specifically, the keywords in each NP are sorted by score, words are dropped in order of score, and the inverted index is queried. Here, dropping words in order of score means setting N = N-1, that is, selecting the N-1 highest-scoring keywords as the candidate keywords corresponding to the title of the published product. The initial value of N may be set in advance, for example to a value from 10 to 50.
S6: Filter the retrieval results by category. Specifically, determine whether each of the above candidate keywords corresponding to the title of the published product belongs to the category of the published product.
S7: If the number of candidate keywords that match the category reaches the maximum search number, go to S8; otherwise, go to S4.
S8-S9: Perform synonym replacement and retrieval to obtain a candidate recommended word set. That is, synonyms of the candidate keywords determined in step S5 are retrieved and recommended as follows: the keyword is replaced with its current synonym, a retrieval is run with the replaced term, the retrieved results are sorted, and several top-ranked ones are selected as the candidate recommendation set. The method adopted may include:
1) for the complete synonym contained in the title and close to the core word, directly using the synonym to search the inverted index;
2) for a combined synonym contained in a title and close to a core word, the synonym is combined with other core words, and then the inverted index is retrieved.
S10: Filter the retrieval results by category. Specifically, determine whether each of the above candidate keywords corresponding to the title of the published product belongs to the category of the published product.
S11: If the number of candidate keywords that match the category reaches the maximum search number, go to S12; otherwise, go to S8.
S12: Divide the keywords into blue-sea words and hot words. Recommended keywords serve two purposes, as hot words (popular words) and as blue-sea words, and the criterion for dividing the two is whether the number of results is greater than a predetermined threshold (e.g., 100): a blue-sea word has fewer results than a hot word.
S13-S14: Sort by similarity and recommend the sorted keywords to the seller user. The keywords may be ordered as follows:
1) Blue-sea words are placed first, followed by popular words, because the value of blue-sea words is greater than that of popular words.
2) Within the blue-sea words and within the popular words, keywords are ranked by score and number of searches, specifically:
i) first, sort by score;
ii) then, if two scores are equal or differ by no more than 0.01, rank by number of searches.
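The ordering rules of S13-S14 can be sketched as a comparison function. The `Candidate` type and `rank_keywords` name are hypothetical, and so is the assumption that a higher search count ranks first among near-tied scores, which the text does not state explicitly. Note also that a pairwise "within 0.01" rule is not transitive, so for long chains of closely spaced scores the result depends on the sort algorithm; the sketch makes no attempt to resolve that.

```python
from functools import cmp_to_key
from typing import List, NamedTuple


class Candidate(NamedTuple):
    keyword: str
    score: float     # normalized total score of the keyword
    searches: int    # number of times buyers searched for it
    blue_sea: bool   # True if its result count is at or below the threshold


def rank_keywords(candidates: List[Candidate], tolerance: float = 0.01) -> List[Candidate]:
    """Order candidates: blue-sea words first, then by score, near-ties by searches."""
    def compare(a: Candidate, b: Candidate) -> int:
        if a.blue_sea != b.blue_sea:
            return -1 if a.blue_sea else 1        # rule 1: blue-sea before popular
        if abs(a.score - b.score) <= tolerance:
            return b.searches - a.searches        # rule 2 ii: more searches first
        return -1 if a.score > b.score else 1     # rule 2 i: higher score first

    return sorted(candidates, key=cmp_to_key(compare))
```

For instance, two blue-sea words scored 0.80 and 0.795 would be ordered by their search counts rather than by the 0.005 score gap.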
The calculation of a keyword's score is described below: the matching relevance between the title and the keyword (query) is calculated, and the matching relevance, category relevance, and competition degree are normalized to obtain the keyword's score (score).
1) Matching relevance calculation (match_relevance):
The title (title) and the query word (query) are treated as two vectors X and Y, with every distinct word in the title and the query word serving as one dimension: X = [x1, x2, ..., xn], Y = [y1, y2, ..., yn], where x1 to xn and y1 to yn are the scores of each word in the two vectors (if a word does not appear in the title or in the query word, the score for that dimension is 0).
At the same time, the query terms that are completely contained by the title need to be filtered, because such query terms do not help to improve the recall rate of the search.
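The vector construction and the containment filter can be sketched as follows. The text above does not state which similarity measure is applied to X and Y, so cosine similarity is used here purely as an assumption; the per-word scores (defaulting to 1.0) and the function names are likewise hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def match_relevance(title_words: Sequence[str],
                    query_words: Sequence[str],
                    word_score: Optional[Dict[str, float]] = None) -> float:
    """Cosine similarity (an assumed measure) between title and query word vectors."""
    word_score = word_score or {}
    # One dimension per distinct word appearing in either the title or the query.
    vocab = sorted(set(title_words) | set(query_words))
    x = [word_score.get(w, 1.0) if w in title_words else 0.0 for w in vocab]
    y = [word_score.get(w, 1.0) if w in query_words else 0.0 for w in vocab]
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def is_contained_in_title(title_words: Sequence[str], query_words: Sequence[str]) -> bool:
    """True if every query word already occurs in the title; such queries are filtered out."""
    return set(query_words) <= set(title_words)
```

A query that passes `is_contained_in_title` would add no new searchable words to the offer, which is why the description discards it before recommendation.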
2) Category relevance (cat_relevance):
Offline, a category calculation tool computes the probability that the query word belongs to each category;
online, supposing the category selected by the seller when publishing the product is category i, the category relevance of the query word is the probability that the query word belongs to category i.
3) Normalized correlation score (relevance)
relevance = (match_relevance × text match relevance weight + cat_relevance × category match relevance weight)/(text match relevance weight + category match relevance weight).
4) Competition degree score (competition)
The competition degree takes into account the number of searches (search_cnt) and the number of search results (result_num). Because the number of search results is usually large, it is converted into a number of result pages according to the number of results that can be displayed on one page (page_num); result counts beyond 20 pages are not distinguished, that is, the maximum number of result pages is 20. The number of result pages is multiplied by a penalty value (page_penalty) to calculate result_rank. The competition score is proportional to the number of searches and inversely proportional to the number of pages of search results.
The calculation formula is as follows:
result_rank=(result_num/page_num)×page_penalty+1.0
competition=log10(search_cnt/result_rank)/4.0+0.3
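The two formulas translate directly into code. In the sketch below the function name is hypothetical, the 20-page cap is taken from the description and applied to result_num/page_num before the penalty, and any concrete values chosen for `page_num` and `page_penalty` are assumptions, since the text gives none.

```python
import math


def competition_score(search_cnt: int, result_num: int,
                      page_num: int, page_penalty: float,
                      max_pages: int = 20) -> float:
    """competition = log10(search_cnt / result_rank) / 4.0 + 0.3

    search_cnt must be positive; page_num is the number of results shown per page.
    """
    pages = min(result_num / page_num, max_pages)   # results beyond 20 pages are not distinguished
    result_rank = pages * page_penalty + 1.0
    return math.log10(search_cnt / result_rank) / 4.0 + 0.3
```

With these formulas, a query searched 1000 times that returns no results has result_rank = 1.0 and a competition score of 3/4.0 + 0.3 = 1.05, and the score falls as the number of result pages grows.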
5) Normalized total score (score)
score = (relevance × correlation score weight + competition × competition score weight)/(correlation score weight + competition score weight).
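Reading the normalized relevance score and the normalized total score as weighted averages, the final combination can be sketched as below. The function names and all weight values are hypothetical; the application does not give concrete weights.

```python
def weighted_average(a: float, weight_a: float, b: float, weight_b: float) -> float:
    """(a * weight_a + b * weight_b) / (weight_a + weight_b)"""
    return (a * weight_a + b * weight_b) / (weight_a + weight_b)


def keyword_score(match_relevance: float, cat_relevance: float, competition: float,
                  text_weight: float, category_weight: float,
                  relevance_weight: float, competition_weight: float) -> float:
    """Combine text match, category relevance, and competition into one score."""
    # Step 3): normalized relevance from text match and category relevance.
    relevance = weighted_average(match_relevance, text_weight,
                                 cat_relevance, category_weight)
    # Step 5): normalized total score from relevance and competition.
    return weighted_average(relevance, relevance_weight,
                            competition, competition_weight)
```

As a worked example with assumed weights, match_relevance = 0.8 and cat_relevance = 0.6 with equal weights give relevance = 0.7; with a relevance weight of 3, a competition weight of 1, and competition = 0.5, the total is (0.7 × 3 + 0.5)/4 = 0.65.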
By the product information publishing system and method described in the above embodiments, the query log and the click log of the user reflect the query intention of the user to a great extent, a mapping model between offer and query terms can be established by the related field technologies such as machine learning and information processing, and technical support is provided for providing keyword recommendation for the offer publishing end.
The present application has high commercial value. In e-commerce search, query words with zero or few search results account for a large proportion of queries, which seriously affects the website experience of querying users. The main reasons for zero or few search results are as follows: the query words entered by the user do not accurately reflect the user's search intention; the seller does not fill in rich information when publishing product information, in particular the points users care about, such as attributes and models; or the seller has not published the goods the user wants. Previous research has focused on the first case, with techniques such as query rewriting and query expansion. The present application focuses on the seller-side situation, and its main idea is to recommend, at the product publishing end, keywords that attract high user attention and have zero or few search results, and to guide the seller to fill in the recommended keywords, thereby improving the overall recall rate of query terms.
In this preferred embodiment, recommended words are mined dynamically from the users' query logs and click relationships, and the ordering of the recommended words effectively reflects the users' tendencies when entering query words. The relevance between keywords and an offer is calculated by analyzing the keywords together with the title and category of the offer filled in by the seller, and the recommendation set is updated regularly, which improves the accuracy and timeliness of the recommended words. In addition, the generated keywords are diverse, which increases the coverage of product keywords among users' query words.
It will be apparent to those skilled in the art that the modules or steps of the present application described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present application is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (17)

12. The system according to claim 11, wherein the server is further configured to, when the first determining unit determines that the number of query entries belonging to the category among the queried query entries is less than P, repeatedly execute the following steps until the number of query entries belonging to the category among the queried query entries is greater than or equal to P: the server sets N equal to N-1; notifies the first title processing unit to select N keywords from the M keywords; notifies the first query unit to check whether a query entry including the N keywords exists in the memory; and notifies the first determining unit to determine, when query entries including the N keywords exist, whether the number of query entries belonging to the category among the queried query entries is greater than or equal to P, and if so, to take the first P queried query entries belonging to the category as the query entries related to the subject information.
HK13103670.6A    2013-03-25    Method and system for information distribution in a website    HK1176431B (en)

Applications Claiming Priority (1)

Application Number    Priority Date    Filing Date    Title
CN201110221386.5A (CN102915312B (en))    2011-08-03    2011-08-03    Information issuing method in website and system

Publications (2)

Publication Number | Publication Date
HK1176431A1 (en) | 2013-07-26
HK1176431B (en) | 2017-08-18

Also Published As

Publication Number | Publication Date
CN102915312A (en) | 2013-02-06
CN102915312B (en) | 2016-08-24

Similar Documents

Publication | Publication Date | Title
US10459989B1 (en) | | Providing result-based query suggestions
US11294970B1 (en) | | Associating an entity with a search query
CN112632359B (en) | | Information recommendation method, device, electronic equipment and storage medium
CN103699700B (en) | | A kind of generation method of search index, system and associated server
CN102725759B (en) | | Semantic directory for search results
CN103377232B (en) | | Headline keyword recommendation method and system
KR101215791B1 (en) | | Using reputation measures to improve search relevance
JP4859892B2 (en) | | Product advertisement distribution device, product advertisement distribution method, and product advertisement distribution control program
US7870135B1 (en) | | System and method for providing tag feedback
US20070083507A1 (en) | | Identifying the items most relevant to a current query based on items selected in connection with similar queries
US20130282702A1 (en) | | Method and system for search assistance
US20050125240A9 (en) | | Product recommendation in a network-based commerce system
US20120150861A1 (en) | | Highlighting known answers in search results
CN111654714B (en) | | Information processing method, apparatus, electronic device and storage medium
JP5084673B2 (en) | | Product information retrieval apparatus, method and system
KR20080024208A (en) | | System and method for providing search results
US20110307504A1 (en) | | Combining attribute refinements and textual queries
US20150215271A1 (en) | | Generating suggested domain names by locking slds, tokens and tlds
US20150169576A1 (en) | | Dynamic Search Results
US9690858B1 (en) | | Predicting categorized completions of a partial search term
JP2002215659A (en) | | Information retrieval support method and information retrieval support system
JP2016131045A (en) | | Search method, apparatus and server for online trading platform
US20150347423A1 (en) | | Methods for completing a user search
US11055335B2 (en) | | Contextual based image search results
US20190065502A1 (en) | | Providing information related to a table of a document in response to a search query

Legal Events

Date | Code | Title | Description
 | PC | Patent ceased (i.e. patent has lapsed due to the failure to pay the renewal fee) | Effective date: 20240731

