CN101158971A - Method and device for sorting search results based on search engine - Google Patents


Info

Publication number
CN101158971A
Authority
CN (China)
Prior art keywords
word, keyword, search, sorted, network resource
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2007101872765A
Other languages
Chinese (zh)
Other versions
CN100557612C (en)
Inventor
刘汉洲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhigu Ruituo Technology Services Co Ltd
Original Assignee
Shenzhen Xunlei Networking Technologies Co Ltd
Application filed by Shenzhen Xunlei Networking Technologies Co Ltd
Priority to CNB2007101872765A
Publication of CN101158971A
Application granted
Publication of CN100557612C
Legal status: Active


Abstract

The invention discloses a method and device for sorting search results based on a search engine. It relates to the field of search engines and brings the sorted results closer to users' needs. The method includes: segmenting the search term entered by the user into words; looking up each resulting segment in a keyword index to determine the keyword weight of the search term in each network resource to be sorted; determining the total weight of the search term in each network resource to be sorted; and sorting the network resources by total weight and presenting them to the user. The device includes a word segmentation unit, a keyword weight determination unit, a total weight determination unit, a sorting unit, and a presentation unit.

Figure 200710187276

Description

Method and device for sorting search results based on search engine

Technical Field

The invention relates to the field of search engines, and in particular to a method and device for sorting search results based on a search engine.

Background Art

As search engine technology and information processing technology continue to advance, demand for search engines has broadened and the types of search engine have diversified. At present, mainstream search engines fall into three categories: full-text search engines, directory search engines, and meta-search engines. Vertical search engines have also recently begun to attract attention.

In the search engine field, an important criterion for judging a search engine is whether it lets users find the information they need, that is, information relevant to the topic of their search, as quickly as possible.

In recent years, the major search engines have all optimized the relevance ranking of their search results. Relevance here means the degree to which a page relates to the user's search term, and it is usually an important basis for ranking. The main methods for computing page relevance include Google's PageRank, Bharat's HillTop, and Baidu's hyperlink analysis. Their basic principle is to rank web pages according to how they are cited.

However, Chinese search engines face the problem of word segmentation, and the dictionary is the foundation of any search engine that takes search terms as its queries. The quality of the dictionary determines, to a certain extent, how well search results are sorted. If the dictionary is too small, too much irrelevant information appears; if it is too large, the search results for some terms may cover too few topics. How to define the dictionary and add new extended dictionary sets, so that search results become more accurate and more user-friendly, has therefore become a problem of considerable interest.

Summary of the Invention

Embodiments of the present invention provide a method and device for sorting search results based on a search engine, so that the sorted results are closer to users' needs.

A method for sorting search results based on a search engine according to an embodiment of the invention includes the following steps: segmenting the search term entered by the user; looking up each segment obtained from segmentation in a keyword index to determine the keyword weight of the search term in each network resource to be sorted; determining the total weight of the search term in each network resource to be sorted; and sorting the network resources to be sorted by total weight and presenting them to the user.

A device for sorting search results based on a search engine according to an embodiment of the invention includes: a word segmentation unit, for segmenting the search term entered by the user; a keyword weight determination unit, for looking up each segment obtained from segmentation in the keyword index to determine the keyword weight of the search term in each network resource to be sorted; a total weight determination unit, for determining the total weight of the search term in each network resource to be sorted; a sorting unit, for sorting the network resources to be sorted by total weight; and a presentation unit, for presenting the sorted results to the user.

In summary, embodiments of the invention segment the search term entered by the user, look up each resulting segment in the keyword index to determine the keyword weight of the search term in each network resource to be sorted, and determine the total weight of the search term in each of those resources. Because the total weight takes into account how well the search term matches the keywords, sorting the network resources by total weight and presenting them to the user brings the results closer to the user's needs.

Brief Description of the Drawings

Fig. 1 is a flowchart of the method steps of an embodiment of the invention;

Fig. 2 is a schematic diagram of the device structure of an embodiment of the invention;

Fig. 3 is a schematic diagram of the optimized device structure of an embodiment of the invention;

Fig. 4 is a schematic diagram of the indexes of an embodiment of the invention;

Fig. 5 is a schematic diagram of determining the network resources to be sorted in an embodiment of the invention;

Fig. 6 is a schematic diagram of looking up segment weights in an embodiment of the invention.

Detailed Description

To bring the sorted results closer to users' needs, embodiments of the invention provide a method and device for sorting search results based on a search engine. Each is briefly outlined below.

In the method provided by an embodiment of the invention, after certain presets have been made, the user has entered a search term, and the network resources to be sorted have been determined, the following main steps are performed (see Fig. 1):

S1. Segment the search term entered by the user (this step may also be performed before the network resources to be sorted are determined).

S2. Look up each segment obtained from segmentation in the keyword index to determine the keyword weight of the search term in each network resource to be sorted (network resources include, but are not limited to, web page resources and download resources; this is not repeated below).

S3. Determine the total weight of the search term in each network resource to be sorted.

S4. Sort the network resources to be sorted by total weight and present them to the user.

Before the user enters a search term, the following presetting steps are performed:

Customizing the keyword dictionary: the basic structure is a word together with its attributes. The customized keyword dictionary contains every valid word with its attributes and every invalid word with its attributes. The set of invalid words and the set of valid words are mutually exclusive, and the characters of an invalid word cover the characters of some valid word. A word's attributes are represented as a character-type number, each bit of which represents one attribute of the word.

Extracting keywords: using the keyword dictionary, the subject information of each network resource is segmented according to the maximum matching principle; the resulting segments are then filtered by their attributes to extract the keywords of that resource's subject information. The subject information may be the title of a web page, information extracted from the content of the web page, or information describing a download resource, among others.

Building the keyword index: each keyword of each network resource's subject information is segmented with a basic segmentation dictionary, and a keyword index is built from each segment of the keyword to the network resources.

Building the resource index: the subject information of each network resource is segmented with the basic segmentation dictionary, and a resource index is built from each segment of the subject information to the network resources.

Configuring weights: each segment of a keyword is assigned a segment weight according to the ratio of the segment's length to the keyword's length. Alternatively, in addition to these segment weights, each network resource is assigned a static weight according to information about the resource (including, but not limited to, the number of times it has been viewed, how it is cited, the number of times it has been downloaded, and its file format; this is not repeated below). The configured weights can be recorded in the resource index and the keyword index described above.

Once the weights are configured, S2 can look up each segment of the search term in the keyword index to determine that segment's weight within the keywords of each candidate resource's subject information, and then add up the segment weights that fall within the same resource's subject information to obtain the keyword weight of the search term in that resource.

In S3, the keyword weight of the search term in the current network resource can be used directly as the total weight. Alternatively, the static weight configured for the current resource can be combined with the keyword weight to form the total weight, or other relevant weights can be combined with the keyword weight to form the total weight.
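The options for S3 and the sort in S4 can be sketched as follows. The text does not say how a static weight and a keyword weight are combined, so the weighted sum below, the mixing factor `alpha`, the function names, and the sample weights are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def total_weight(keyword_w: float, static_w: Optional[float] = None,
                 alpha: float = 0.5) -> float:
    """Combine the weights of one resource (S3).

    With no static weight, the keyword weight alone is the total weight.
    Otherwise a weighted sum is used; alpha is an assumed mixing factor.
    """
    if static_w is None:
        return keyword_w
    return alpha * keyword_w + (1.0 - alpha) * static_w


def rank(keyword_ws: Dict[str, float],
         static_ws: Optional[Dict[str, float]] = None) -> List[str]:
    """Sort resource ids by descending total weight (S4)."""
    def score(doc_id: str) -> float:
        static_w = static_ws.get(doc_id) if static_ws is not None else None
        return total_weight(keyword_ws[doc_id], static_w)
    return sorted(keyword_ws, key=score, reverse=True)


# Hypothetical weights for three candidate resources.
keyword_ws = {"doc1": 1 / 3, "doc2": 1 / 3, "doc6": 1.0}
static_ws = {"doc1": 0.9, "doc2": 0.2, "doc6": 0.1}
print(rank(keyword_ws))             # ['doc6', 'doc1', 'doc2']
print(rank(keyword_ws, static_ws))  # ['doc1', 'doc6', 'doc2']
```

With keyword weights alone, the resource whose keyword matches the query exactly comes first; adding a static weight can reorder the list, which is why the choice of combination rule matters in practice.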

After the user enters a search term, the network resources to be sorted are determined as follows: each segment obtained by segmenting the search term is looked up in the resource index to determine the set of network resources to which that segment belongs, and the intersection of these sets is taken as the network resources to be sorted.

An embodiment of the invention also provides a device for sorting search results based on a search engine. As shown in Fig. 2, it includes a word segmentation unit, a keyword weight determination unit, a total weight determination unit, a sorting unit, and a presentation unit.

The word segmentation unit segments the search term entered by the user.

The keyword weight determination unit looks up each segment obtained from segmentation in the keyword index to determine the keyword weight of the search term in each network resource to be sorted.

The total weight determination unit determines the total weight of the search term in each network resource to be sorted.

The sorting unit sorts the network resources to be sorted by total weight.

The presentation unit presents the sorted results to the user.

To supply the information these units need, the device further includes, as shown in Fig. 3, a customization unit, an extraction unit, a keyword index building unit, a resource index building unit, a determination unit, and a configuration unit.

The customization unit customizes the keyword dictionary, using a word together with its attributes as the basic structure. The customized keyword dictionary contains every valid word with its attributes and every invalid word with its attributes.

The extraction unit segments the subject information of each network resource with the keyword dictionary according to the maximum matching principle, and filters the resulting segments by their attributes to extract the keywords of each resource's subject information.

The keyword index building unit segments each keyword of each network resource's subject information with the basic segmentation dictionary and builds a keyword index from each segment of the keyword to the network resources, for use by the keyword weight determination unit.

The resource index building unit segments the subject information of each network resource with the basic segmentation dictionary and builds a resource index from each segment of the subject information to the network resources.

The determination unit looks up each segment of the search term in the resource index to determine the set of network resources to which that segment belongs, and takes the intersection of these sets as the network resources to be sorted.

The configuration unit assigns each segment of a keyword a segment weight according to the ratio of the segment's length to the keyword's length; alternatively, it assigns each network resource a static weight according to information about the resource and also assigns the segment weights in the same way.

After the configuration unit has configured the weights, the keyword weight determination unit can look up each segment of the search term in the keyword index to determine that segment's weight within the keywords of each candidate resource's subject information, and then add up the segment weights that fall within the same resource's subject information to obtain the keyword weight of the search term in that resource.

The total weight determination unit can use the keyword weight of the search term in the current network resource as the total weight. Alternatively, it can combine the static weight configured for the current resource with the keyword weight to form the total weight, or combine other relevant weights with the keyword weight to form the total weight.

This completes the overview of the method and device of the embodiments of the invention. The invention is described in further detail below through one embodiment.

Embodiment 1. This embodiment includes a setup step, a step of determining the network resources to be sorted, a step of calculating weights, a sorting step, and a presentation step. The setup step includes the following sub-steps: customizing the keyword dictionary, extracting keywords, building the keyword index, building the resource index, and configuring weights.

101. Customizing the keyword dictionary.

A keyword is a term that identifies the subject information of a network resource (a web page resource or a download resource). For example, search engine users often enter phrases such as a software name plus "download" or a film title plus "high definition"; the software name and the film title can be defined as the keywords of these phrases.

To extract the keywords of a network resource's subject information effectively, a keyword dictionary must first be built. Statistics on users' everyday search habits show that in film and television search engines, music search engines, and general search engines, users often enter film and television titles, song titles, singer names, and similar terms as search terms. A keyword dictionary can therefore be built from information about currently popular films, TV dramas, songs, singers, actors, and so on. The basic structure of the dictionary is (word, attribute), where the attribute describes the validity and category of the word, for example whether it is valid and whether it is a film title, song title, or software name.

This embodiment describes attributes in the following way (though it is not limited to it): a one-byte character-type number describes the attribute information bit by bit, 8 bits in total. Each bit stands for one attribute of the word, with 1 meaning the word has the attribute and 0 meaning it does not. For example, "英雄" (Hero) can be both a film title and a TV drama title, so its attribute can be written as 11100000. Table 1 lists the meaning of each bit:

Bit        7         6     5         4           3       2         1      0
Attribute  Validity  Film  TV drama  Song title  Singer  Director  Actor  Software name

Table 1

The highest bit (bit 7 in Table 1) is defined as follows: it records whether a word in the keyword dictionary is valid, and the invalid word set and the valid word set are mutually exclusive. A word A in the invalid word set literally contains some word B in the valid word set. For example, if the film title "东" (East) is a valid word, then "东方" (Orient) and "东门" (East Gate) are invalid words. An invalid word is identified primarily by these criteria: it literally contains a valid word, it does not belong to the valid word set, and it is not itself a film title, song title, or other term that could serve as a keyword.
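The bit layout of Table 1 maps naturally onto a flag type. The sketch below is one possible encoding in Python; the class name `WordAttr`, the member names, and the helper `parse_attr` are illustrative choices, not names from the patent.

```python
from enum import IntFlag


class WordAttr(IntFlag):
    """Bit layout of Table 1; bit 7 is the most significant bit."""
    VALID = 1 << 7
    FILM = 1 << 6
    TV_DRAMA = 1 << 5
    SONG_TITLE = 1 << 4
    SINGER = 1 << 3
    DIRECTOR = 1 << 2
    ACTOR = 1 << 1
    SOFTWARE = 1 << 0


def parse_attr(bits: str) -> WordAttr:
    """Parse an 8-digit binary string such as '11100000' or '1110 0000'."""
    cleaned = bits.replace(" ", "")
    if len(cleaned) != 8 or set(cleaned) - {"0", "1"}:
        raise ValueError(f"expected 8 binary digits, got {bits!r}")
    return WordAttr(int(cleaned, 2))


hero = parse_attr("11100000")  # the "英雄" example: valid, film, TV drama
print(WordAttr.VALID in hero, WordAttr.FILM in hero, WordAttr.SINGER in hero)
# True True False
```

Storing the attributes as a single byte keeps dictionary entries compact, and a membership test such as `WordAttr.VALID in hero` is a single bitwise operation.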

102. Extracting keywords.

For every network resource in the search engine's database, the corresponding keywords must be extracted from its subject information.

First, the keyword dictionary is used to segment the subject information of the network resource according to the maximum matching principle, and the resulting segments are filtered by their attributes. Words whose attribute marks them as invalid are removed, words marked as valid are kept, and the kept words serve as the keywords of the resource's subject information.

For example, suppose the keyword dictionary contains the following words:

东 (East)    1100 0000

东方 (Orient)    0000 0000

东游记 (Journey to the East)    1010 0000

东北 (Northeast)    0000 0000

The extraction results for the following web page titles are then:

影片东的花絮 (Highlights of the film 东)    ------    东

东游记高清晰版 (Journey to the East, high-definition version)    ------    东游记

东北的小路 (A path in the Northeast)    ------    (no keyword)

For a vertical search engine, such as a film and television search engine, the final keywords can be filtered further by the other attributes of the extracted words. For example, the keywords extracted from the web page title "龙虎门甄子丹主演" (Dragon Tiger Gate, starring Donnie Yen) are "龙虎门" (Dragon Tiger Gate) and "甄子丹" (Donnie Yen). Because "甄子丹" is a person's name rather than a film or television title, it should be filtered out in this case. The filtering rule can be chosen according to the specific search category of the search engine.
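The extraction example above can be reproduced with a short sketch. The text says only "maximum matching"; the forward (left-to-right) variant used here is an assumption, as are the function names. The dictionary contents and titles come from the example.

```python
from typing import Dict, List

VALID_BIT = 0b1000_0000  # bit 7 of the attribute byte marks a valid word

# Keyword dictionary from the example: word -> attribute byte.
KEYWORD_DICT: Dict[str, int] = {
    "东": 0b1100_0000,
    "东方": 0b0000_0000,
    "东游记": 0b1010_0000,
    "东北": 0b0000_0000,
}


def max_match(text: str, dictionary: Dict[str, int]) -> List[str]:
    """Forward maximum matching: at each position take the longest dictionary word."""
    longest = max(map(len, dictionary), default=0)
    found, i = [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                found.append(candidate)
                i += size
                break
        else:
            i += 1  # no dictionary word starts here; skip one character
    return found


def extract_keywords(title: str, dictionary: Dict[str, int]) -> List[str]:
    """Keep only the matched words whose validity bit is set."""
    return [w for w in max_match(title, dictionary) if dictionary[w] & VALID_BIT]


for title in ["影片东的花絮", "东游记高清晰版", "东北的小路"]:
    print(title, "->", extract_keywords(title, KEYWORD_DICT))
```

The invalid entry "东北" is what prevents the third title from yielding the keyword "东": the longer invalid word is matched first and then discarded by the validity filter.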

103. Building the keyword index.

Using the basic segmentation dictionary (though not limited to it), each keyword of each network resource's subject information is segmented, and a keyword index is built from each segment of the keyword to the network resources.

For example, suppose a batch of network resources has the following subject information:

Doc1: 不能说的秘密全集中文字幕 (The Secret That Cannot Be Told, complete, Chinese subtitles);

Doc2: 不能说的秘密全集 (The Secret That Cannot Be Told, complete);

Doc3: 铁三角DVD中文字幕 (Iron Triangle, DVD, Chinese subtitles);

Doc4: 铁三角全集 (Iron Triangle, complete);

Doc5: 铁三角(主演任达华) (Iron Triangle, starring Simon Yam);

Doc6: 秘密全集 (Secret, complete).

Their keywords are:

Doc1: 不能说的秘密;

Doc2: 不能说的秘密;

Doc3: 铁三角;

Doc4: 铁三角;

Doc5: 铁三角;

Doc6: 秘密.

Segmenting each keyword yields the following segments: 不能 (cannot), 说 (tell), 的 (a particle), 秘密 (secret), 铁三角 (Iron Triangle).

The keyword index is built as follows:

"不能" maps to Doc1 and Doc2; "说" maps to Doc1 and Doc2; "的" maps to Doc1 and Doc2; "秘密" maps to Doc1, Doc2, and Doc6; "铁三角" maps to Doc3, Doc4, and Doc5.
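The keyword index above is an inverted index from keyword segments to resources. A minimal sketch follows; it assumes each resource has a single keyword and that the segmentation of each keyword is already known (here hard-coded from the example), since the basic segmentation dictionary itself is not given. The names `DOC_KEYWORDS`, `SEGMENTATION`, and `build_keyword_index` are illustrative.

```python
from collections import defaultdict
from typing import Dict, List

# Keyword of each resource, as listed in the example.
DOC_KEYWORDS: Dict[str, str] = {
    "Doc1": "不能说的秘密", "Doc2": "不能说的秘密",
    "Doc3": "铁三角", "Doc4": "铁三角", "Doc5": "铁三角",
    "Doc6": "秘密",
}

# Assumed output of the basic segmentation dictionary for each keyword.
SEGMENTATION: Dict[str, List[str]] = {
    "不能说的秘密": ["不能", "说", "的", "秘密"],
    "铁三角": ["铁三角"],
    "秘密": ["秘密"],
}


def build_keyword_index(doc_keywords: Dict[str, str],
                        segmentation: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each keyword segment to the resources whose keyword contains it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for doc_id, keyword in doc_keywords.items():
        for segment in segmentation[keyword]:
            if doc_id not in index[segment]:
                index[segment].append(doc_id)
    return dict(index)


keyword_index = build_keyword_index(DOC_KEYWORDS, SEGMENTATION)
print(keyword_index["秘密"])    # ['Doc1', 'Doc2', 'Doc6']
print(keyword_index["铁三角"])  # ['Doc3', 'Doc4', 'Doc5']
```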

104. Building the resource index (this may be done before or after building the keyword index).

Using the basic segmentation dictionary (though not limited to it), the subject information of each network resource is segmented, and a resource index is built from each segment of the subject information to the network resources.

For example, suppose a batch of network resources has the following subject information:

Doc1: 不能说的秘密全集中文字幕 (The Secret That Cannot Be Told, complete, Chinese subtitles);

Doc2: 不能说的秘密全集 (The Secret That Cannot Be Told, complete);

Doc3: 铁三角DVD中文字幕 (Iron Triangle, DVD, Chinese subtitles);

Doc4: 铁三角全集 (Iron Triangle, complete);

Doc5: 铁三角(主演 任达华) (Iron Triangle, starring Simon Yam);

Doc6: 秘密全集 (Secret, complete).

After segmentation, the resource index is built as follows:

"不能" maps to Doc1, Doc2; "说" maps to Doc1, Doc2; "的" maps to Doc1, Doc2; "秘密" maps to Doc1, Doc2, Doc6; "全集" (complete) maps to Doc1, Doc2, Doc4, Doc6; "中文" (Chinese) maps to Doc1, Doc3; "字幕" (subtitles) maps to Doc1, Doc3; "铁三角" maps to Doc3, Doc4, Doc5; "DVD" maps to Doc3; "主演" (starring) maps to Doc5; "任达华" (Simon Yam) maps to Doc5.

105. Configuring weights.

Weight configuration has two parts: configuring the static weights of the network resources, and configuring the weight of each segment within a keyword.

The static weight of a web page resource is determined by information such as the number of times the page has been viewed, its source, and how it is cited. The static weight of a download resource is determined by information such as the number of times it has been downloaded, its file size, and its file format. For example, for a download resource docid1, the static weight W1 can be determined from information such as the number of times docid1 has been downloaded and its size.

The weight of each segment within a keyword is configured as follows. First, the keyword is segmented with the basic segmentation dictionary (though not limited to it). For example, the keyword "不能说的秘密" is split into four words: 不能, 说, 的, 秘密. Next, assume every keyword has weight = 1. Then word1 "不能" has weight W11, word2 "说" has weight W21, word3 "的" has weight W31, and word4 "秘密" has weight W41, with W11=W41=1/3 and W21=W31=1/4; that is, each segment's weight is determined by the ratio of the segment's length to the keyword's length.
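The length-proportional rule can be written in a few lines. This is a sketch of the stated rule only, with an illustrative function name. Note that under a strictly length-proportional rule the one-character segments of this six-character keyword come out at 1/6 rather than the 1/4 listed in the text, so the exact values below should be read as an assumption of the sketch.

```python
from fractions import Fraction
from typing import Dict, List


def segment_weights(segments: List[str], keyword_weight: int = 1) -> Dict[str, Fraction]:
    """Weight each segment by its share of the keyword's character length."""
    total_len = sum(len(s) for s in segments)
    return {s: Fraction(len(s) * keyword_weight, total_len) for s in segments}


weights = segment_weights(["不能", "说", "的", "秘密"])
print(weights["不能"], weights["说"], sum(weights.values()))  # 1/3 1/6 1
```

Using `Fraction` keeps the weights exact, so the segment weights of one keyword add up to exactly the keyword's weight.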

The configured static weights and segment weights can be added to the resource index and keyword index described above. As shown in FIG. 4, in a specific implementation the static weights of all network resources are recorded together and indexed by the docid of each network resource. The entries Word1, Word2, ..., Wordn each record that word's segment weight in the keywords of the subject information of each network resource, indexed by the docid of the subject information to which the keyword belongs.
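One possible in-memory layout for these two records is sketched below. The names `static_weights` and `build_keyword_index` and all numeric values are hypothetical, and the sketch assumes one keyword per resource with segment weights set by the length-ratio rule; FIG. 4 itself is not reproduced here.

```python
from fractions import Fraction

# Static weights, recorded together and indexed by docid (values are made up).
static_weights = {"docid1": 0.6, "docid2": 0.4}


def build_keyword_index(keywords_by_docid):
    """Map each keyword segment to {docid: segment weight}.

    `keywords_by_docid` maps a docid to its keyword, given as a list of
    segments. One keyword per resource is assumed to keep the sketch small.
    """
    index = {}
    for docid, segments in keywords_by_docid.items():
        total_length = sum(len(segment) for segment in segments)
        for segment in segments:
            # Length-ratio rule: share of the keyword's character length.
            weight = Fraction(len(segment), total_length)
            index.setdefault(segment, {})[docid] = weight
    return index


keyword_index = build_keyword_index({
    "docid1": ["秘密"],
    "docid2": ["不能", "说", "的", "秘密"],
})
```

With this layout, looking up a word returns every docid whose keyword contains it together with the word's weight in that keyword, which is the lookup the weight calculation step needs.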

106. Determine the network resources to be sorted.

As shown in FIG. 5, when a user enters a word as the search term, the search term is first segmented with the basic word segmentation dictionary, yielding the segment sequence word1, word2, ..., wordn. The resource index shown in FIG. 4 is then searched for the docid sequence corresponding to each segment wordk (k = 1, 2, ..., n), and the intersection of these docid sequences is taken, for example docid2, docid4, and docid5. The network resources corresponding to that intersection are the network resources to be sorted.
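The candidate-selection step amounts to intersecting posting lists. The sketch below assumes the search term has already been segmented and uses a few of the word-to-document associations listed earlier in this description as the resource index. The function name `candidate_resources` is hypothetical.

```python
def candidate_resources(query_segments, resource_index):
    """Return the docids that contain every segment of the search term."""
    posting_sets = [set(resource_index.get(segment, ())) for segment in query_segments]
    if not posting_sets:
        return set()
    return set.intersection(*posting_sets)


# A subset of the word-to-document associations listed earlier.
resource_index = {
    "秘密": ["Doc1", "Doc2", "Doc6"],
    "全集": ["Doc1", "Doc2", "Doc4", "Doc6"],
    "铁三角": ["Doc3", "Doc4", "Doc5"],
}

print(sorted(candidate_resources(["铁三角", "全集"], resource_index)))
print(sorted(candidate_resources(["秘密", "全集"], resource_index)))
```

A segment that is missing from the index contributes an empty set, so the intersection is empty and no resource qualifies.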

107. Calculate the weights.

The total weight of each network resource to be sorted is calculated. docid2 is used as the example below.

As shown in FIG. 6, the keyword index (see FIG. 4) is searched for the segment weights of word1, word2, ..., wordn in the subject information of the network resource to be sorted that corresponds to docid2. The segment weights W12, W22, ..., Wn2 are retrieved and summed to give the keyword weight of the search term in that subject information, that is, Wk(docid2) = W12 + W22 + ... + Wn2, where Wk(docid) denotes the keyword weight of the resource identified by docid. If the docids associated with some segment wordk do not include docid2, the corresponding segment weight is Wk2 = 0, meaning that the word is not a segment of any keyword in the subject information of the network resource corresponding to docid2.
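The summation can be sketched as follows. The index contents and the function name `keyword_weight` are hypothetical; the weights assume that the keyword of docid2 is "不能说的秘密" and the keyword of docid4 is "秘密", each with weight 1.

```python
from fractions import Fraction

# Keyword index: segment -> {docid: segment weight} (illustrative values).
keyword_index = {
    "不能": {"docid2": Fraction(1, 3)},
    "说": {"docid2": Fraction(1, 6)},
    "的": {"docid2": Fraction(1, 6)},
    "秘密": {"docid2": Fraction(1, 3), "docid4": Fraction(1)},
}


def keyword_weight(query_segments, keyword_index, docid):
    """Sum the segment weights of the search term's segments for one resource.

    A segment that is not a keyword segment of `docid` contributes 0.
    """
    return sum(
        (keyword_index.get(segment, {}).get(docid, Fraction(0))
         for segment in query_segments),
        Fraction(0),
    )


print(keyword_weight(["不能", "秘密"], keyword_index, "docid2"))
print(keyword_weight(["不能", "秘密"], keyword_index, "docid4"))
```

For docid4 the segment "不能" is absent from the keyword, so only "秘密" contributes to the sum.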

The static weight Ws(docid) of the network resource corresponding to docid2 is also retrieved from the resource index shown in FIG. 4.

Finally, the total weight W(docid) of the network resource corresponding to docid2 is calculated. The proportions that Ws(docid) and Wk(docid) contribute to W(docid) can be set as the situation requires. For example, if Ws(docid) contributes a proportion q1 and Wk(docid) contributes a proportion q2, then W(docid) = q1*Ws(docid) + q2*Wk(docid).
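As a small illustration of the formula, the sketch below combines a static weight and a keyword weight. The function name `total_weight` and all numeric values are hypothetical, chosen only to show the arithmetic.

```python
import math


def total_weight(static_weight, keyword_weight, q1, q2):
    """W(docid) = q1 * Ws(docid) + q2 * Wk(docid)."""
    return q1 * static_weight + q2 * keyword_weight


# Illustrative values: Ws = 0.8, Wk = 0.4, with q1 = 0.25 and q2 = 0.75.
w = total_weight(static_weight=0.8, keyword_weight=0.4, q1=0.25, q2=0.75)
print(math.isclose(w, 0.5))
```

Here the static part contributes 0.25 * 0.8 = 0.2 and the keyword part contributes 0.75 * 0.4 = 0.3.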

108. Sorting.

After the total weight of each network resource to be sorted has been calculated, the resources are sorted in descending order of total weight.

Sorting search results in this way gives good results. For example, suppose a user searches for "秘密预告片" ("Secret trailer") and the results include web page title 1, "秘密预告片", and web page title 2, "不能说的秘密预告片" ("trailer for the secret that cannot be told"). Title 1 then receives a greater weight than title 2. This is because the keyword of "秘密预告片" is "秘密", the keyword of "不能说的秘密预告片" is "不能说的秘密", and "预告片" (trailer) is an invalid keyword. When the keywords are segmented, "不能说的秘密" is split into four words: "不能", "说", "的", and "秘密". In the keyword index, the weight of "秘密" is weight in the keyword of web page title 1 and weight/3 in the keyword of web page title 2.
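The comparison of the two titles can be worked through end to end. In the sketch below the segment weights 1 and 1/3 come from the example above, while the equal static weights, the values q1 = q2 = 1/2, and the name `rank` are assumptions made only for illustration.

```python
from fractions import Fraction

# "秘密" has weight 1 in title 1's keyword and 1/3 in title 2's keyword.
# "预告片" is an invalid keyword, so it has no entry in the keyword index.
keyword_index = {"秘密": {"title1": Fraction(1), "title2": Fraction(1, 3)}}

# Assumed: both pages have the same static weight, and q1 = q2 = 1/2.
static_weights = {"title1": Fraction(1, 2), "title2": Fraction(1, 2)}
Q1 = Q2 = Fraction(1, 2)


def rank(query_segments, docids):
    """Score each docid with q1*Ws + q2*Wk and sort by descending score."""
    scored = []
    for docid in docids:
        wk = sum(
            (keyword_index.get(segment, {}).get(docid, Fraction(0))
             for segment in query_segments),
            Fraction(0),
        )
        scored.append((docid, Q1 * static_weights[docid] + Q2 * wk))
    return sorted(scored, key=lambda item: item[1], reverse=True)


ranking = rank(["秘密", "预告片"], ["title2", "title1"])
for docid, score in ranking:
    print(docid, score)
```

With equal static weights the ordering is decided entirely by the keyword weight, so title 1 (total 3/4) is placed ahead of title 2 (total 5/12).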

109. Present the sorted results to the user.

The network resource with the highest actual total weight is ranked first, so that the ordering more closely matches what the user needs.

As Embodiment 1 shows, q1 and q2 are adjustable. In special cases, because of the way keywords are extracted, when a user enters a single character that is itself a film title, such as "东" (East), many of the results may have "东" as their keyword. The search results then become too uniform: the whole page shows films related to "东", which may differ from what the user actually wants. For this special case, q2 can be lowered and q1 raised.
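The effect of this adjustment can be seen with two made-up resources. All names and numbers below are hypothetical: one resource matches the keyword exactly but has a low static weight, and the other matches only partially but has a high static weight.

```python
def rank(resources, q1, q2):
    """Order resource names by q1*Ws + q2*Wk, highest first."""
    ordered = sorted(resources, key=lambda r: q1 * r[1] + q2 * r[2], reverse=True)
    return [name for name, _static, _keyword in ordered]


# (name, static weight Ws, keyword weight Wk), illustrative values only.
resources = [
    ("exact_keyword_match", 0.3, 1.0),
    ("popular_partial_match", 0.9, 0.5),
]

keyword_heavy = rank(resources, q1=0.2, q2=0.8)
static_heavy = rank(resources, q1=0.8, q2=0.2)
print(keyword_heavy)
print(static_heavy)
```

With q2 dominant the exact keyword match comes first. After q2 is lowered and q1 raised, the resource with the higher static weight moves to the top, which breaks up a page of results that all share the same keyword.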

In summary, in the embodiments of the present invention the search term entered by the user is segmented, and each resulting segment is looked up in the keyword index to determine the keyword weight of the search term in each network resource to be sorted; the total weight of the search term in each network resource to be sorted is then determined. Because the total weight takes into account how well the search term matches the keywords, sorting the network resources by total weight and presenting them to the user in that order brings the results closer to what the user needs.

Further, the embodiments of the present invention provide specific implementations of the setup step, the step of determining the network resources to be sorted, the weight calculation step, the sorting step, and the presentation step. The setup step includes the sub-steps of customizing the keyword dictionary, extracting keywords, building the keyword index, building the resource index, and configuring weights. These implementations provide better support for the invention.

Further, q1 and q2 in Embodiment 1 of the present invention are adjustable, so they can be tuned to the specific situation to meet users' various needs.

Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.

Claims (24)

Translated from Chinese

1. A method for sorting search results based on a search engine, characterized in that it comprises the following steps:
performing word segmentation on a search term entered by a user;
looking up each segment obtained by the word segmentation in a keyword index to determine the keyword weight of the search term in each network resource to be sorted;
determining the total weight of the search term in each network resource to be sorted; and
sorting the network resources to be sorted according to total weight and presenting them to the user.

2. The method according to claim 1, characterized in that, before the user enters the search term to search, the method further comprises a step of customizing a keyword dictionary with words and word attributes as its basic structure; the customized keyword dictionary includes each valid word with its corresponding attributes and each invalid word with its corresponding attributes.

3. The method according to claim 2, characterized in that the set of invalid words and the set of valid words are mutually exclusive.

4. The method according to claim 3, characterized in that the characters contained in one invalid word cover the characters contained in one valid word.

5. The method according to claim 2, characterized in that the attributes of a word are represented by a character-type number, each character of which represents one attribute of the word.

6. The method according to claim 2, characterized in that, before the user enters the search term to search, the method further comprises: segmenting the subject information of each network resource according to the keyword dictionary and the maximum matching principle; and filtering the resulting segments according to their attributes to extract the keywords of the subject information of each network resource.

7. The method according to claim 1, characterized in that, before the user enters the search term to search, the method further comprises:
segmenting each keyword of the subject information of each network resource; and
building a keyword index from the segments of the keywords to the network resources.

8. The method according to claim 7, characterized in that it further comprises a step of configuring weights, which includes:
configuring a segment weight for each segment according to the ratio of the segment's length to the length of the keyword; or
configuring a static weight for a network resource according to information about the network resource, and configuring a segment weight for each segment according to the ratio of the segment's length to the length of the keyword.

9. The method according to claim 8, characterized in that the information about the network resource includes: the number of times it has been viewed, and/or how it is cited, and/or the number of times it has been downloaded, and/or its file format, and/or its file size.

10. The method according to claim 1, characterized in that each segment obtained by the word segmentation is looked up in the keyword index to determine the segment weight of each segment in the keywords of the subject information of each network resource to be sorted; and
the segment weights of the segments in the subject information of the same network resource to be sorted are added together to give the keyword weight of the search term in that network resource.

11. The method according to claim 10, characterized in that the total weight includes at least the keyword weight of the search term in the network resource to be sorted.

12. The method according to claim 10, characterized in that determining the total weight of the search term in each network resource to be sorted comprises the following steps:
taking the static weight configured according to the information about the current network resource to be sorted;
taking the keyword weight of the search term in the current network resource to be sorted; and
combining the static weight and the keyword weight of the current network resource to be sorted into its total weight.

13. The method according to claim 12, characterized in that the total weight of the current network resource to be sorted is W(docid) = q1*Ws(docid) + q2*Wk(docid),
where docid denotes the current network resource to be sorted;
q1 denotes the proportion of the total weight contributed by the static weight;
Ws(docid) denotes the static weight;
q2 denotes the proportion of the total weight contributed by the keyword weight; and
Wk(docid) denotes the keyword weight.

14. The method according to claim 1, characterized in that, before the user enters the search term to search, the method further comprises:
segmenting the subject information of the network resources according to a basic word segmentation dictionary; and
building a resource index from the segments of the network resources to the network resources.

15. The method according to claim 14, characterized in that determining the network resources to be sorted comprises the following steps:
looking up each segment obtained by segmenting the search term in the resource index to determine the set of network resources to which each segment belongs; and
taking the intersection of these sets as the network resources to be sorted.

16. The method according to claim 1, characterized in that the network resources to be sorted are sorted in descending order of total weight, and the sorted results are presented to the user in forward order.

17. A device for sorting search results based on a search engine, characterized in that it comprises:
a word segmentation unit, configured to perform word segmentation on a search term entered by a user;
a keyword weight determination unit, configured to look up each segment obtained by the word segmentation in a keyword index to determine the keyword weight of the search term in each network resource to be sorted;
a total weight determination unit, configured to determine the total weight of the search term in each network resource to be sorted;
a sorting unit, configured to sort the network resources to be sorted according to total weight; and
a presentation unit, configured to present the sorted results to the user.

18. The device according to claim 17, characterized in that it further comprises:
a customization unit, configured to customize a keyword dictionary with words and word attributes as its basic structure; the customized keyword dictionary includes each valid word with its corresponding attributes and each invalid word with its corresponding attributes.

19. The device according to claim 18, characterized in that it further comprises:
an extraction unit, configured to segment the subject information of each network resource according to the keyword dictionary and the maximum matching principle, and to filter the resulting segments according to their attributes to extract the keywords of the subject information of each network resource.

20. The device according to claim 17, characterized in that it further comprises:
a keyword index building unit, configured to segment each keyword of the subject information of each network resource and to build a keyword index from the segments of the keywords to the network resources, for use by the keyword weight determination unit.

21. The device according to claim 20, characterized in that it further comprises:
a configuration unit, configured to configure a segment weight for each segment according to the ratio of the segment's length to the length of the keyword; or
to configure a static weight for a network resource according to information about the network resource, and to configure a segment weight for each segment according to the ratio of the segment's length to the length of the keyword.

22. The device according to claim 17, characterized in that it further comprises:
a resource index building unit, configured to segment the subject information of the network resources according to a basic word segmentation dictionary and to build a resource index from the segments of the network resources to the network resources.

23. The device according to claim 22, characterized in that it further comprises:
a determination unit, configured to look up each segment obtained by segmenting the search term in the resource index to determine the set of network resources to which each segment belongs, and to take the intersection of these sets as the network resources to be sorted.

24. The device according to claim 23, characterized in that the sorting unit sorts the network resources to be sorted in descending order of total weight, and the presentation unit presents the sorted results to the user in forward order.
Application number: CNB2007101872765A | Priority date: 2007-11-15 | Filing date: 2007-11-15 | Title: Method and device for sorting search results based on search engine | Status: Active | Publication: CN100557612C (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CNB2007101872765A (CN100557612C (en)) | 2007-11-15 | 2007-11-15 | Method and device for sorting search results based on search engine

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CNB2007101872765A (CN100557612C (en)) | 2007-11-15 | 2007-11-15 | Method and device for sorting search results based on search engine

Publications (2)

Publication Number | Publication Date
CN101158971A (en) | 2008-04-09
CN100557612C (en) | 2009-11-04

Family

ID=39307073

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CNB2007101872765A (Active, CN100557612C (en)) | Method and device for sorting search results based on search engine | 2007-11-15 | 2007-11-15

Country Status (1)

Country | Link
CN (1) | CN100557612C (en)


Also Published As

Publication number | Publication date
CN100557612C (en) | 2009-11-04

Similar Documents

Publication | Publication Date | Title
CN101158971A (en) |  | Method and device for sorting search results based on search engine
US12271420B1 (en) |  | Video segments for a video related to a task
CN102929873B (en) |  | Method and device for extracting searching value terms based on context search
CN109800352B (en) |  | Method, system and terminal device for information push based on clipboard
CN102708100B (en) |  | Method and device for digging relation keyword of relevant entity word and application thereof
US7844594B1 (en) |  | Information search, retrieval and distillation into knowledge objects
CN109684647B (en) |  | Movie comment sentiment analysis method and device
CN102737039B (en) |  | Index building method, searching method and searching result sorting method and corresponding device
CN101968819B (en) |  | Audio and video intelligent cataloging information acquisition method facing wide area network
CN103106287B (en) |  | A kind of processing method and system of user search sentence
US20100274667A1 (en) |  | Multimedia access
US20080059453A1 (en) |  | System and method for enhancing the result of a query
KR101134701B1 (en) |  | The Method and System for Automatically Constructing Positive/Negative Feature-Predicate Dictionary for Polarity Classification of Product Reviews
CN118193850B (en) |  | A method for recommending public opinion information based on knowledge graph
Chang et al. |  | AppGrouper: Knowledge-based interactive clustering tool for app search results
WO2010014082A1 (en) |  | Method and apparatus for relating datasets by using semantic vectors and keyword analyses
CN110750995A (en) |  | File management method based on user-defined map
WO2015081909A1 (en) |  | File recommendation method and device
CN106294797B (en) |  | A method and device for generating a video gene
TWI290687B (en) |  | System and method for search information based on classifications of synonymous words
Wu et al. |  | News filtering and summarization on the web
CN103870489A (en) |  | Chinese name self-extension recognition method based on search logs
CN118690048A (en) |  | A method and device for predicting user behavior
JP2004362121A (en) |  | Information search device, portable information terminal device, information search method, information search program, and recording medium
CN110555202A (en) |  | Method and device for generating abstract broadcast

Legal Events

Date | Code | Title | Description
 | C06 | Publication | 
 | PB01 | Publication | 
 | C10 | Entry into substantive examination | 
 | SE01 | Entry into force of request for substantive examination | 
 | C14 | Grant of patent or utility model | 
 | GR01 | Patent grant | 
 | ASS | Succession or assignment of patent right | Owner name: BEIJING Z-GOOD RUITUO TECHNOLOGY SERVICE CO., LTD.; Free format text: FORMER OWNER: XUNLEI NETWORK TECHNOLOGY CO., LTD., SHENZHEN; Effective date: 20131030
 | C41 | Transfer of patent application or patent right or utility model | 
 | COR | Change of bibliographic data | Free format text: CORRECT: ADDRESS; FROM: 518057 SHENZHEN, GUANGDONG PROVINCE TO: 100085 HAIDIAN, BEIJING
 | TR01 | Transfer of patent right | Effective date of registration: 20131030; Address after: 100085 Beijing city Haidian District No. 33 Xiaoying Road 1 1F05 room; Patentee after: Beijing Zhigu Ruituo Technology Service Co., Ltd.; Address before: 518057 Guangdong, Shenzhen, Nanshan District science and technology in the road, Shenzhen, No. 11, software park, building 7, level 8, two; Patentee before: Xunlei Network Technology Co., Ltd., Shenzhen

