CN116401255A - Search method and computing device - Google Patents

Search method and computing device

Info

Publication number
CN116401255A
Authority
CN
China
Prior art keywords
word
document
chapter
search
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310416807.2A
Other languages
Chinese (zh)
Inventor
冯浩霖
孟效轲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
XFusion Digital Technologies Co Ltd
Original Assignee
XFusion Digital Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by XFusion Digital Technologies Co Ltd
Priority to CN202310416807.2A
Publication of CN116401255A
Legal status: Pending


Abstract

Embodiments of the present application provide a search method and a computing device. The method includes: receiving a search term entered by a user for a document; matching the search term against the content of the document to determine a first matching result; matching the search term against tag words associated with chapter content of the document to determine a second matching result, where a tag word is a word that is not included in the chapter content and has the same meaning as at least one word in the chapter content; and determining a search result based on the first matching result and the second matching result. Adding searchable tag words to chapters expands each chapter's searchable vocabulary, which improves how well document chapters match the search terms users enter and can therefore improve search accuracy.

Description

Translated from Chinese
Search method and computing device

Technical field

The present application relates to the field of server technology, and in particular to a search method and a computing device.

Background

A search engine's index is the concrete data structure that implements the "word-document matrix", which records which words each document contains and which documents each word appears in. The usual way to implement this data structure is an inverted index. An inverted index looks up records by attribute value. An inverted index table stores the locations at which a word occurs in a document or a set of documents, generally expressed as document numbers, so the numbers of the documents that contain a word can be found through the inverted index.

Chapter retrieval for a document is typically implemented as follows. The document is first stored in a search engine (such as Elasticsearch), which segments the document's text into words and builds an inverted index. The inverted index records the chapter number within the document (Doc_ID), the number of times a word occurs in that chapter (term frequency, TF), the positions at which the word occurs in the chapter, and similar information. When a user enters a search term, the search engine queries the inverted index to find the chapters that contain the term, ranks those chapters by the term's TF and positions as stored in the index, and returns the ranked result.
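As an illustration only, the chapter-level inverted index described above can be sketched in Python as follows. The whitespace tokenizer, the function names, and the sample chapters are assumptions made for this sketch; a real search engine uses a proper word segmenter and a more elaborate ranking formula.

```python
from collections import defaultdict

def tokenize(text):
    # Assumption: lowercased whitespace splitting stands in for real word segmentation.
    return text.lower().split()

def build_inverted_index(chapters):
    """Map each word to {doc_id: [positions]}; TF is the length of the position list."""
    index = defaultdict(dict)
    for doc_id, text in chapters.items():
        for position, word in enumerate(tokenize(text)):
            index[word].setdefault(doc_id, []).append(position)
    return index

def search(index, term):
    """Return (doc_id, tf, positions) tuples, highest TF first."""
    postings = index.get(term.lower(), {})
    hits = [(doc_id, len(positions), positions) for doc_id, positions in postings.items()]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))

# Hypothetical chapters keyed by Doc_ID.
chapters = {
    1: "upload the firmware file then upload the license",
    2: "download the log file",
}
index = build_inverted_index(chapters)
print(search(index, "upload"))
```

A term that never occurs in any chapter returns an empty list, which is exactly the failure case the next paragraph describes.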

However, documents are written in a technical, formal, and standardized style, whereas users' search terms may be colloquial expressions, aliases, or even misspellings. Such terms match the written wording poorly or not at all, so searches often fail to return useful results, users cannot reach the document information they need in time, and their time cost rises.

The information disclosed in this Background section is intended only to improve understanding of the general background of the invention and should not be taken as an acknowledgment or any form of suggestion that this information constitutes prior art already known to a person of ordinary skill in the art.

Summary of the invention

Embodiments of the present application provide a search method and a computing device. Adding searchable tag words to chapters expands each chapter's searchable vocabulary, which improves how well document chapters match the search terms users enter, can improve search accuracy, and lets users conveniently find, filter, and locate chapter content. The tag library can be maintained dynamically in real time without changing the document itself, so the approach is minimally intrusive to the business, responds quickly, and rapidly improves the user's search experience. Because documents of the same type have similar chapter content, the tag library is extensible and general-purpose and can be shared across multiple documents: it is maintained in one place and usable everywhere.

In a first aspect, an embodiment of the present application provides a search method applied to a first device, including: receiving a search term entered by a user for a document; matching the search term against the content of the document to determine a first matching result; matching the search term against tag words associated with chapter content of the document to determine a second matching result, where a tag word is a word that is not included in the chapter content and has the same meaning as at least one word in the chapter content; and determining a search result based on the first matching result and the second matching result.

In this solution, adding searchable tag words to chapters expands each chapter's searchable vocabulary, which improves how well document chapters match the search terms users enter, can improve search accuracy, and lets users conveniently find, filter, and locate chapter content. The tag library can be maintained dynamically in real time without changing the document itself, so the approach is minimally intrusive to the business, responds quickly, and rapidly improves the user's search experience. Because documents of the same type have similar chapter content, the tag library is extensible and general-purpose and can be shared across multiple documents: it is maintained in one place and usable everywhere.
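The two matching steps of the first aspect can be sketched as follows. This is a minimal illustration, not the claimed implementation: substring matching, the union used to combine the two results, and all names and sample data are assumptions introduced here.

```python
def match_content(chapters, term):
    """First matching result: chapters whose own text contains the search term."""
    return {doc_id for doc_id, text in chapters.items() if term in text}

def match_tags(chapter_tags, term):
    """Second matching result: chapters that have the search term as an associated tag word."""
    return {doc_id for doc_id, tags in chapter_tags.items() if term in tags}

def search_document(chapters, chapter_tags, term):
    # Assumption: the search result is the union of both matching results.
    first = match_content(chapters, term)
    second = match_tags(chapter_tags, term)
    return sorted(first | second)

# Hypothetical chapters and tag words; "bmc" and "hdd" do not occur in the text.
chapters = {
    1: "configure the baseboard management controller",
    2: "replace a hard disk",
}
chapter_tags = {1: {"bmc"}, 2: {"hdd"}}

print(search_document(chapters, chapter_tags, "bmc"))   # found through a tag word
print(search_document(chapters, chapter_tags, "disk"))  # found through chapter content
```

Because the tag words live in a separate mapping, the chapter text itself is never modified, which mirrors the point that the tag library is maintained independently of the document.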

In a possible implementation, the first device is a computing device, and specifically may be a server.

In a possible implementation, before matching the search term against the tag words associated with the chapter content of the document, the method further includes: obtaining a word group, where the word group includes a tag word and a reference word associated with the tag word, and the reference word is a word in the content of a chapter of the document and has the same meaning as the tag word; and determining, based on the word group and the document, the tag words associated with the chapter content of the document.

In one example of this implementation, determining the tag words associated with the chapter content of the document based on the word group and the document includes: searching the document using the reference word; and, when the reference word matches document content, associating the tag words associated with that reference word with the chapter in which the matched content is located.

In this solution, chapter content is matched against reference words, and each chapter that matches a reference word is associated with tag words that extend its vocabulary. This enlarges the set of words under which the chapter can be found, improves how well document chapters match the search terms users enter, can improve search accuracy, and lets users conveniently find, filter, and locate chapter content.
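A minimal sketch of this association step is shown below. The dictionary layout of a word group (`reference` and `tags` keys), substring matching, and the sample data are assumptions for illustration.

```python
def associate_tags(chapters, word_groups):
    """Attach each word group's tag words to every chapter containing its reference word."""
    chapter_tags = {doc_id: set() for doc_id in chapters}
    for group in word_groups:
        for doc_id, text in chapters.items():
            if group["reference"] in text:
                # A tag word is, by definition, a word the chapter does not already contain.
                chapter_tags[doc_id].update(
                    tag for tag in group["tags"] if tag not in text
                )
    return chapter_tags

# Hypothetical chapters and one hypothetical word group.
chapters = {
    1: "log in to the baseboard management controller",
    2: "replace a hard disk",
}
word_groups = [
    {"reference": "baseboard management controller",
     "tags": {"bmc", "out-of-band management"}},
]

chapter_tags = associate_tags(chapters, word_groups)
print(chapter_tags)
```

Since the association is derived entirely from the word groups and the document, the same word groups can be applied to any other document of the same type by calling the function again with that document's chapters.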

In one example of this implementation, the method further includes: obtaining historical search terms for the document; and determining the tag words based on the historical search terms.

In this solution, taking the search history into account makes it possible to determine fairly accurately which tag words can extend a chapter's vocabulary.

For example, a tag word is a historical search term that did not match the content of the document.
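One way to derive candidate tag words from the search history is sketched below. The `min_count` threshold and the substring test are assumptions added for this illustration; the source only states that tag words are historical search terms that did not match the document. Pairing each candidate with a reference word of the same meaning is a separate step that is not shown.

```python
from collections import Counter

def candidate_tag_words(chapters, history, min_count=2):
    """Return historical search terms that matched no chapter, most frequent first."""
    misses = Counter(
        term for term in history
        if not any(term in text for text in chapters.values())
    )
    # Assumption: ignore terms searched fewer than min_count times to filter noise.
    return [term for term, count in misses.most_common() if count >= min_count]

# Hypothetical chapters and search log.
chapters = {1: "replace a hard disk", 2: "upgrade the firmware"}
history = ["hdd", "firmware", "hdd", "flash", "hdd"]

print(candidate_tag_words(chapters, history))
```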

In one example of this implementation, the method further includes: updating the tag words in the word group; and updating, based on the updated word group and the document, the tag words associated with the chapters of the document.

In a possible implementation, the method further includes: updating the tag words in the word group; and updating, based on the updated word group and the document, the tag words associated with the chapters of the document.

In this solution, updating the tag words in the word group allows flexible adaptation to actual scenarios.

In a possible implementation, the first matching result includes a first matching chapter and a first matching score, and the second matching result includes a second matching chapter and a second matching score. Determining the search result based on the first matching result and the second matching result includes: determining the search result based on the first matching score and the second matching score, where the search result is the first matching chapter and/or the second matching chapter.

In one example of this implementation, determining the search result based on the first matching score and the second matching score includes: determining, as the search result, the chapter corresponding to the higher of the first matching score and the second matching score.

In one example of this implementation, each chapter of the document is associated with a third matching score, which indicates how well the tag words associated with the chapter match the chapter. The second matching score of the second matching chapter is the third matching score associated with that chapter.
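The score-based selection can be sketched as follows. How the first matching score is computed is not specified here, so the sketch takes it as given; the per-chapter `tag_scores` mapping stands in for the third matching score, and all names and numbers are hypothetical.

```python
def second_match(chapter_tags, tag_scores, term):
    """Return (doc_id, score) for the best tag match, or None.

    The second matching score is the chapter's stored third matching score.
    """
    hits = [
        (doc_id, tag_scores[doc_id])
        for doc_id, tags in chapter_tags.items()
        if term in tags
    ]
    return max(hits, key=lambda hit: hit[1]) if hits else None

def pick_result(first, second):
    """Return the doc_id of whichever result has the higher score, or None."""
    candidates = [result for result in (first, second) if result is not None]
    if not candidates:
        return None
    return max(candidates, key=lambda result: result[1])[0]

# Hypothetical tag words and third matching scores per chapter.
chapter_tags = {1: {"bmc"}, 2: {"hdd"}}
tag_scores = {1: 0.8, 2: 0.6}

first = (2, 0.3)                                      # assumed content match and score
second = second_match(chapter_tags, tag_scores, "bmc")
print(pick_result(first, second))
```

Storing the tag score per chapter means the second matching score needs no computation at query time; it is simply looked up once a tag word matches.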

In a possible implementation, matching the search term against the content of the document to determine the first matching result includes: obtaining the inverted index corresponding to the document, where the inverted index indicates the words obtained by segmenting the document and the chapters of the document in which those words are located; and determining the first matching result based on the inverted index and the search term.

In a second aspect, an embodiment of the present application provides a search method applied to a second device, including:

displaying a document and obtaining a search term entered by a user for the document; sending the search term to a first device, so that the first device receives the search term entered by the user for the document; receiving a search result sent by the first device, where the search result is determined by the first device performing any method of the first aspect; and displaying the chapter title of the chapter of the document indicated by the search result.

In a possible implementation, the method includes: obtaining a chapter title selected by the user; and displaying the chapter content of the document corresponding to the chapter title selected by the user.
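The second-device flow can be sketched as below. The class, the in-process callable that stands in for the network round trip to the first device, and the sample document are all assumptions for illustration; a real terminal would send the search term over the network.

```python
class SecondDevice:
    def __init__(self, document, send_to_first_device):
        self.document = document            # {doc_id: {"title": ..., "content": ...}}
        self.send = send_to_first_device    # callable: search term -> list of doc_ids

    def search(self, term):
        """Send the term to the first device and return the chapter titles to display."""
        doc_ids = self.send(term)
        return [self.document[doc_id]["title"] for doc_id in doc_ids]

    def open(self, title):
        """Return the content of the chapter whose title the user selected."""
        for chapter in self.document.values():
            if chapter["title"] == title:
                return chapter["content"]
        return None

# Hypothetical document and tag words.
document = {
    1: {"title": "BMC login", "content": "log in to the baseboard management controller"},
    2: {"title": "Disk replacement", "content": "replace a hard disk"},
}
tags = {1: {"bmc"}, 2: {"hdd"}}

def fake_first_device(term):
    # Stand-in for the first device: content match or tag-word match.
    return sorted(
        doc_id for doc_id, chapter in document.items()
        if term in chapter["content"] or term in tags[doc_id]
    )

device = SecondDevice(document, fake_first_device)
titles = device.search("hdd")
print(titles)
print(device.open(titles[0]))
```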

In a possible implementation, the second device is a terminal device, for example a computer.

In a third aspect, an embodiment of the present application provides a search system, including a first device and a second device.

The second device is configured to display a document and obtain a search term entered by a user for the document.

The first device is configured to: match the search term against the content of the document to determine a first matching result; match the search term against tag words associated with chapter content of the document to determine a second matching result, where a tag word is a word that is not included in the chapter content and has the same meaning as at least one word in the chapter content; and determine a search result based on the first matching result and the second matching result.

In this solution, adding searchable tag words to chapters expands each chapter's searchable vocabulary, which improves how well document chapters match the search terms users enter, can improve search accuracy, and lets users conveniently find, filter, and locate chapter content. The tag library can be maintained dynamically in real time without changing the document itself, so the approach is minimally intrusive to the business, responds quickly, and rapidly improves the user's search experience. Because documents of the same type have similar chapter content, the tag library is extensible and general-purpose and can be shared across multiple documents: it is maintained in one place and usable everywhere.

In a possible implementation, before matching the search term against the tag words associated with the chapter content of the document, the first device is further configured to: obtain a word group, where the word group includes a tag word and a reference word associated with the tag word, and the reference word is a word in the content of a chapter of the document and has the same meaning as the tag word; and determine, based on the word group and the document, the tag words associated with the chapter content of the document.

In this solution, chapter content is matched against reference words, and each chapter that matches a reference word is associated with tag words that extend its vocabulary. This enlarges the set of words under which the chapter can be found, improves how well document chapters match the search terms users enter, can improve search accuracy, and lets users conveniently find, filter, and locate chapter content.

In one example of this implementation, the first device is configured to: search the document using the reference word; and, when the reference word matches document content, associate the tag words associated with that reference word with the chapter in which the matched content is located.

In one example of this implementation, the first device is further configured to: obtain historical search terms for the document; and determine the tag words based on the historical search terms.

In this solution, taking the search history into account makes it possible to determine fairly accurately which tag words can extend a chapter's vocabulary.

For example, a tag word is a historical search term that did not match the content of the document.

In one example of this implementation, the first device is further configured to: update the tag words in the word group; and update, based on the updated word group and the document, the tag words associated with the chapters of the document.

In this solution, adding or deleting words in the word group provides good extensibility and allows flexible adaptation to actual scenarios.

In a possible implementation, the first matching result includes a first matching chapter and a first matching score, and the second matching result includes a second matching chapter and a second matching score. The first device is configured to determine the search result based on the first matching score and the second matching score, where the search result is the first matching chapter and/or the second matching chapter.

In one example of this implementation, the first device is configured to determine, as the search result, the chapter corresponding to the higher of the first matching score and the second matching score.

In one example of this implementation, each chapter of the document is also associated with a third matching score, which indicates how well the tag words associated with the chapter match the chapter. The second matching score of the second matching chapter is the third matching score associated with that chapter.

In a possible implementation, the first device is configured to: obtain the inverted index corresponding to the document, where the inverted index indicates the words obtained by segmenting the document and the chapters of the document in which those words are located; and determine the first matching result based on the inverted index and the search term.

In a possible implementation, the second device is further configured to display the chapter title of the chapter of the document indicated by the search result.

In a possible implementation, the second device is further configured to: obtain a chapter title selected by the user; and display the chapter content of the document corresponding to the chapter title selected by the user.

In a fourth aspect, an embodiment of the present application provides a search apparatus, including: at least one memory configured to store a program; and at least one processor configured to execute the program stored in the memory. When the program stored in the memory is executed, the processor performs the method provided in the first aspect or the method provided in the second aspect.

In a fifth aspect, an embodiment of the present application provides a search apparatus, characterized in that the apparatus runs computer program instructions to perform the method provided in the first aspect or the method provided in the second aspect. For example, the apparatus may be a chip or a processor.

In one example, the apparatus may include a processor that can be coupled to a memory, read instructions from the memory, and, according to those instructions, perform the method provided in the first aspect or the method provided in the second aspect. The memory may be integrated into the chip or processor, or may be separate from it.

In a sixth aspect, an embodiment of the present application provides a computer storage medium that stores instructions. When the instructions are run on a computer, they cause the computer to perform the method provided in the first aspect or the method provided in the second aspect.

In a seventh aspect, an embodiment of the present application provides a computer program product containing instructions. When the instructions are run on a computer, they cause the computer to perform the method provided in the first aspect or the method provided in the second aspect.

Brief description of the drawings

FIG. 1 is a system architecture diagram of a search system provided by an embodiment of the present application;

FIG. 2 is a schematic diagram of chapter search in the related art;

FIG. 3 is a first schematic flowchart of the search method provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of a document interface provided by an embodiment of the present application;

FIG. 5 is a second schematic flowchart of the search method provided by an embodiment of the present application;

FIG. 6A is a schematic diagram of document processing in the search method provided by an embodiment of the present application;

FIG. 6B is a schematic diagram of searching in the search method provided by an embodiment of the present application;

FIG. 7A is a schematic diagram of the interface display of a technical document provided by an embodiment of the present application;

FIG. 7B is a schematic diagram of the search results obtained by searching for "upload" in FIG. 7A;

FIG. 8 is a first schematic structural diagram of the search apparatus provided by an embodiment of the present application;

FIG. 9 is a second schematic structural diagram of the search apparatus provided by an embodiment of the present application;

FIG. 10 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.

Detailed description

To make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described below with reference to the accompanying drawings.

In the description of the embodiments of the present application, words such as "exemplary", "for example", and "for instance" are used to indicate an example, illustration, or explanation. Any embodiment or design described as "exemplary", "for example", or "for instance" should not be construed as preferred or more advantageous than other embodiments or designs. Rather, these words are intended to present the related concepts in a concrete manner.

In the description of the embodiments of the present application, the term "and/or" merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" can mean: A alone, B alone, or both A and B. In addition, unless otherwise stated, "multiple" means two or more. For example, multiple systems means two or more systems, and multiple terminals means two or more terminals.

Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly specifying the indicated technical features. A feature qualified by "first" or "second" may therefore explicitly or implicitly include one or more such features. The terms "include", "comprise", "have", and their variants all mean "including but not limited to", unless specifically stated otherwise.

Some terms used in this embodiment are explained below. These explanations are intended to help those skilled in the art understand the embodiments and do not limit the scope of protection claimed by the present invention.

Search engine: a retrieval technology that, according to user needs and certain algorithms, applies specific strategies to retrieve specified information from the Internet and return it to the user.

Tag library: a "warehouse that organizes, stores, and manages data according to a data structure". It is a large, organized, shareable, and centrally managed collection of data stored in a computer over the long term.

Identifier: marks the target, category, or content of a product.

Inverted index: also often called a reverse index, postings file, or inverted file. It is an indexing method used in full-text search to store a mapping from a word to its locations in a document or a set of documents, and it is the most commonly used data structure in document retrieval systems. With an inverted index, the list of documents containing a given word can be obtained quickly from the word itself. An inverted index consists mainly of two parts: the "word dictionary" and the "inverted file".

Next, a search system to which the search method provided by the embodiments of the present invention may be applied is introduced. FIG. 1 shows an example architecture of a search system provided by an embodiment of the present invention. The search method provided by the embodiments of the present invention can be applied to the system architecture shown in FIG. 1. As shown in FIG. 1, the search system includes a terminal device 110 and a server 120.

The terminal device 110 may be, but is not limited to, any of various personal computers, notebook computers, smartphones, and tablet computers. Exemplary embodiments of the terminal device in this solution include, but are not limited to, electronic devices running iOS, Android, Windows, HarmonyOS, or another operating system. The embodiments of the present invention do not specifically limit the type of electronic device.

The server 120 may be a server cluster. In one example, the server in this solution may be used to provide cloud services, and may be a server or a super terminal that can establish communication connections with other devices and provide computing and/or storage functions for them. The server in this solution is a hardware server, which may be a rack server, a high-density server, or a whole-cabinet server.

The terminal device 110 communicates with the server 120 over a network, which may be wired or wireless. For example, the wired network may be a cable network, an optical fiber network, a digital data network (DDN), or the like, and the wireless network may be a telecommunication network, an intranet, the Internet, a local area network (LAN), a wide area network (WAN), a wireless local area network (WLAN), a metropolitan area network (MAN), the public switched telephone network (PSTN), a Bluetooth network, a ZigBee network, a Global System for Mobile Communications (GSM) network, a code division multiple access (CDMA) network, a general packet radio service (GPRS) network, or the like, or any combination thereof. The network may use any known network communication protocol to implement communication between different client layers and gateways. The protocol may be any of various wired or wireless communication protocols, such as Ethernet, universal serial bus (USB), FireWire, GSM, GPRS, CDMA, wideband code division multiple access (WCDMA), time division-synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), new radio (NR), Bluetooth, and wireless fidelity (Wi-Fi).

The terminal device 110 can display a document. When the document is long, the user can enter a search term to search its chapters and jump quickly to the desired chapter according to the search result. For example, the user loads and displays the document online through a browser installed on the terminal device 110 and then enters a search term on the page that displays the document. The server 120 searches the document content based on the search term and sends the search result to the terminal device 110, which then jumps to the corresponding chapter. Here, a document is a storage object that exists in text form; for example, the collection of web pages in a website developed by a company can be called a document. In some possible implementations the document is a technical document, where technical documentation refers to any type of document that describes the operation, function, and architecture of a technical product, a product under development, or its use. In some possible cases a technical document includes multiple web pages that serve as chapters, and each chapter has a chapter identifier (Doc_ID) that distinguishes it from the other chapters.

In the related art, the server 120 usually retrieves chapters of a document with a general-purpose search engine (such as Elasticsearch) or with tag library (database) technology: the search term is segmented into words (called keywords for ease of description and distinction), and the keywords are then matched against the document content.

In the related art, document content retrieval is mainly implemented in the following two ways.

Implementation 1: document content retrieval through an inverted index. For any document, the inverted index is built as follows. The document is first stored in a search engine (such as Elasticsearch), which segments the document text into words and generates the inverted index. The index records the chapter identifier (Doc_ID) of each chapter in the document and the words obtained by segmentation (the terms in FIG. 2); it may further record the number of times a word occurs in a chapter (TF) and the positions at which it occurs. Later, when the user enters a search term, the search engine segments it to obtain a segmentation list, queries the inverted index to find the chapters that contain words (keywords) from the segmentation list, and ranks those chapters by each keyword's occurrence count (TF) and positions stored in the index to produce the search result.
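The inverted-index flow of Implementation 1 can be illustrated with a minimal Python sketch. It is not the behavior of any particular search engine: whitespace tokenization stands in for real word segmentation, ranking uses only the summed term frequency, and the names `tokenize`, `build_inverted_index`, and `search` as well as the sample chapters are hypothetical.

```python
from collections import defaultdict

def tokenize(text):
    # Assumption: whitespace splitting stands in for a real word segmenter.
    return text.lower().split()

def build_inverted_index(chapters):
    """Map each term to {doc_id: [positions]}; TF is the length of the position list."""
    index = defaultdict(dict)
    for doc_id, text in chapters.items():
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index

def search(index, query):
    """Rank chapters by the summed term frequency of the query keywords."""
    scores = defaultdict(int)
    for keyword in tokenize(query):
        for doc_id, positions in index.get(keyword, {}).items():
            scores[doc_id] += len(positions)
    # Highest score first; ties are broken by chapter identifier.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical chapters keyed by chapter identifier (Doc_ID).
chapters = {
    "ch1": "install the tool then run the tool",
    "ch2": "upload a file with the tool",
}
index = build_inverted_index(chapters)
```

With this data, `search(index, "install tool")` ranks `ch1` ahead of `ch2`, while `search(index, "installation steps")` returns an empty list because neither word occurs literally in any chapter.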

Implementation 2: the document content is stored in a relational tag library (a relational database such as MySQL), and a specific search statement matches the search term against the chapter content exactly or fuzzily to find the matching chapters.
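As a rough illustration of Implementation 2, the sketch below performs fuzzy matching with an SQL `LIKE` query. It swaps in Python's built-in `sqlite3` module for MySQL so that it runs without a server; the table name `chapters`, its columns, and the sample rows are assumptions made for the example.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the relational store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chapters (doc_id TEXT PRIMARY KEY, content TEXT)")
conn.executemany(
    "INSERT INTO chapters VALUES (?, ?)",
    [
        ("ch1", "Installation process for the tool"),
        ("ch2", "Using WinSCP to transfer files"),
    ],
)

def fuzzy_search(term):
    """Return chapter IDs whose content contains the term as a substring."""
    # LIKE with wildcards gives a yes/no match only; there is no relevance score.
    rows = conn.execute(
        "SELECT doc_id FROM chapters WHERE content LIKE ? ORDER BY doc_id",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]
```

Here `fuzzy_search("transfer files")` finds `ch2`, but `fuzzy_search("upload")` finds nothing even though the chapter is about uploading.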

Implementation 1 has the following disadvantages.

1. It cannot match metadata hidden in the document; only words that are present and displayable in the text can be matched.

2. Documents are written in a technical, formal, and standardized style, whereas users may search with colloquial words, aliases, or even misspellings. Such keywords match the written wording poorly or not at all, so search responsiveness is low, users cannot reach the document information promptly, and their time and learning costs increase.

3. Adding keywords or summaries to the introduction or chapters of a document can widen the range of searchable words and improve search accuracy, but it is hard to maintain and is not real-time: the document must be modified and republished, and it grows increasingly redundant and bloated.

Implementation 2 has the following disadvantages.

1. A search implemented on the tag library (database) cannot quantify the search term or how well it matches the body text, so search quality is poor.

2. It also has the same disadvantages as Implementation 1.

In summary, the main problem in the related art is that the search term a user enters may be close in meaning to the written wording of a chapter but use different words, in which case the chapter cannot be found.

Based on this, the embodiments of the present invention propose the following search method.

In this method, a tag library is established that stores multiple phrase groups. The words in each phrase group have the same or similar semantics, so a phrase group can be understood as a set of synonyms or near-synonyms. Each chapter of a document can then be associated with phrase groups in the tag library, which enlarges the searchable word domain of the chapters and improves how well the chapters fit the search terms entered by users. The final search result combines a first matching result, obtained by matching the search term against the chapter content, with a second matching result, obtained by matching the search term against the phrase groups associated with the chapters. This solves the technical problem that a chapter cannot be found when the user's search term is close in meaning to the written wording of the chapter content but uses different words. In summary, the embodiments of this application offer the following advantages. First, adding searchable tag words to chapters expands their word domain, which improves the fit between document chapters and user search terms, raises search accuracy, and lets users conveniently find, filter, and locate chapter content. Second, the tag library can be maintained dynamically in real time without changing the document itself, so the approach is minimally intrusive to the business, responds quickly, and rapidly improves the user's search experience. Third, documents of the same type have similar chapter content, so the tag library is extensible and general enough to be shared by multiple documents: it is maintained in one place and usable everywhere.

This is only a brief outline of the method; its details are described below.

Next, a search method provided by an embodiment of the present invention is described in detail with reference to the search system described above. This application uses the server 120 and the terminal device 110 as examples to explain the solution provided by the embodiments. The terminal device 110 and the server 120 are only examples and do not constitute a specific limitation. In practice, the server 120 may include one or more service devices, any one of which may be called a first device, and the terminal device 110 may be called a second device.

FIG. 3 is a schematic flowchart of the search method provided by an embodiment of this application. As shown in FIG. 3, the method includes at least the following steps:

Step 301: The server 120 builds a tag library. The tag library includes phrase groups, and each phrase group includes a tag word and reference words that have the same meaning. The tag word is a word not included in the chapter content, and it has the same semantics as at least one word in the chapter content.

The tag library maintains a number of phrase groups, each of which is a set of synonymous words.

For example, a phrase group includes "installation steps", "installation flow", "steps", and "process".

For example, a phrase group includes "installation method" and "how to install".

For example, a phrase group includes "upload", "transfer file", "file transfer", and "SSH".

In one example, each phrase group is divided into a tag word and several corresponding reference words. Optionally, a word may be selected at random from the phrase group as the tag word, or a word in the phrase group may be designated manually. The tag word uniquely identifies a phrase group and can serve as its identifier. Note that reference words are generally the more specialized keywords that appear in the chapters of the document to be processed, whereas tag words are search terms that users are likely to use, that match users' habits, and that do not appear in the chapters. A phrase group may contain one tag word or several; the embodiments of this application are described with one tag word per phrase group.

For example, the tag word is "installation steps", and the reference words may include "installation flow", "steps", and "process".

For example, the tag word is "installation method", and the reference words may include "how to install".

For example, the tag word is "upload", and the reference words may include "transfer file", "file transfer", and "SSH".

In the embodiments of this application, tag words can be determined in several ways.

In one example, tag words are designed and configured manually.

In one example, tag words are determined from past search terms (called historical search terms for ease of description and distinction). For example, tag words are built as follows: obtain the historical search terms that users entered for the document and match them against the document content; if matching fails, use the historical search term as a tag word. When several historical search terms fail to match the document content, each of them can serve as a tag word, and different tag words belong to different phrase groups.
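The selection of tag words from historical search terms can be sketched as follows. This is a minimal illustration under stated assumptions: a search "fails" when none of its whitespace-separated words occurs in any chapter, and the function name `candidate_tag_words` and the sample data are hypothetical.

```python
def tokenize(text):
    # Assumption: whitespace splitting stands in for a real word segmenter.
    return text.lower().split()

def candidate_tag_words(history, chapters):
    """Return historical search terms that match no chapter content."""
    vocabulary = set()
    for text in chapters.values():
        vocabulary.update(tokenize(text))
    tags = []
    for term in history:
        matched = any(word in vocabulary for word in tokenize(term))
        # Keep each failed term once, in the order it was first seen.
        if not matched and term not in tags:
            tags.append(term)
    return tags

chapters = {
    "ch1": "installation process",
    "ch2": "using winscp to transfer files",
}
history = ["upload", "transfer files", "setup", "upload"]
tags = candidate_tag_words(history, chapters)
```

With this data, "upload" and "setup" become candidate tag words, each of which would seed its own phrase group, while "transfer files" is excluded because it already matches the document.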

In the embodiments of this application, phrase groups can be built in several ways.

In one example, phrase groups are designed and configured manually.

In one example, phrase groups are determined from historical search terms. For example, a phrase group is built as follows: obtain multiple historical search terms that users entered for the document, determine a phrase group based on them, and then add the phrase group to the tag library.

In one instance of this example, when there are many search terms, a phrase group may be determined from the historical search terms as follows: determine multiple candidate phrase groups based on the historical search terms, then determine the phrase group based on the candidate phrase groups.

Optionally, the candidate phrase groups are determined as follows: segment each historical search term into words and cluster the resulting words to obtain multiple candidate phrase groups. Specifically, the words are first represented as vectors and then clustered by their word vectors; the words whose vectors lie within a certain distance of one another form one candidate phrase group.
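The text does not name a clustering algorithm, so the sketch below uses a simple greedy scheme as one possible reading of "within a certain distance": a word joins the first group whose seed vector is within `max_dist`, otherwise it starts a new group. The two-dimensional vectors are made-up values; a real system would obtain word vectors from an embedding model.

```python
import math

def cluster_by_distance(vectors, max_dist):
    """Greedy clustering: a word joins the first group whose seed is within max_dist."""
    groups = []  # list of (seed_vector, [words])
    for word, vec in vectors.items():
        for seed, members in groups:
            if math.dist(seed, vec) <= max_dist:
                members.append(word)
                break
        else:
            # No existing group is close enough, so this word seeds a new one.
            groups.append((vec, [word]))
    return [members for _, members in groups]

# Hypothetical word vectors for words taken from historical search terms.
vectors = {
    "upload": (0.9, 0.1),
    "transfer": (0.8, 0.2),
    "install": (0.1, 0.9),
    "setup": (0.2, 0.8),
}
groups = cluster_by_distance(vectors, max_dist=0.3)
```

The resulting groups are candidates only; as the text goes on to say, a person reviews them before they enter the tag library.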

Optionally, a phrase group is determined from a candidate phrase group as follows: a person reviews the words in the candidate phrase group, deleting or adding words as needed; the reviewed candidate phrase group is then used as a phrase group and can be added to the tag library.

Afterwards, each word of a phrase group in the tag library is classified: a word that matches the document content can serve as a reference word, and a word that does not match the document content can serve as a tag word.
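The split of a phrase group into reference words and tag words can be sketched as below. Case-insensitive substring matching is an assumption standing in for whatever matching the search engine performs, and the name `split_group` and the sample text are hypothetical.

```python
def split_group(words, document_text):
    """Words found in the document become reference words; the rest become tag words."""
    text = document_text.lower()
    reference = [w for w in words if w.lower() in text]
    tag = [w for w in words if w.lower() not in text]
    return {"tag_words": tag, "reference_words": reference}

document_text = "Using WinSCP to transfer files over SSH"
group = split_group(["upload", "transfer files", "SSH"], document_text)
```

For this sample text, "upload" ends up as the tag word and "transfer files" and "SSH" as reference words.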

Step 302: The server 120 determines a tag word-chapter association library based on the tag library and the document. The tag word-chapter association library indicates at least the tag words associated with the chapters of the document.

For example, the document may be a technical document. In practice, the user can load, display, and browse the technical document online through the terminal device 110. The technical document is only an example and does not constitute a specific limitation; the document in the embodiments of this application may be any document that can be browsed and searched online.

A chapter may be associated with one tag word or with several. Note that "associated" can be understood as either "corresponding" or "related".

Note that, in practice, a tag word-chapter association library may be built for each document, or a single one may be built for all documents. The embodiments of this application do not limit this; it can be designed according to actual requirements.

For example, the tag word-chapter association library may store the tag words associated with the chapters of a document as association records. In other words, the library stores the association record corresponding to each chapter of the document. An association record describes a chapter of the document and the tag words that the chapter matches. A chapter may match several tag words; one association record may record a single matched tag word or all of the tag words that the chapter matches. The embodiments of this application are described with one association record that records all tag words matched by a chapter.

In one feasible implementation, the association record of a chapter includes the identifier of the document, the identifier of the chapter, the tag words of several phrase groups, and the identifiers of those phrase groups. In some scenarios phrase groups are numbered, and the identifier of a phrase group may then be its number. The identifier of the document may be the document name, and the identifier of the chapter may be the chapter title. For example, the phrase group includes "upload", "transfer file", "file transfer", and "SSH"; the chapter identifier is "Using WinSCP to Transfer Files"; and the document identifier is "FusionServer Tools 2.3.0 User Guide 02S".

In one possible case, association records are stored as key-value pairs; for example, the key may be the chapter identifier (DocId), and the value may be the document identifier, the identifiers of several phrase groups (TagId), and the tag words.

In another possible case, association records are stored in a data table whose fields are the document identifier, the identifiers (Id) of several phrase groups, the identifier of the chapter in the document (DocId), and the tag words. In practice, each association record is stored as one row of the data table.
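The key-value form of an association record might look like the following sketch. The class name `AssociationRecord`, its field names, and the phrase-group number `3` are hypothetical; only the kinds of fields (document identifier, chapter identifier, phrase-group identifiers, tag words) come from the text.

```python
from dataclasses import dataclass, field

@dataclass
class AssociationRecord:
    document_id: str                                # document name
    chapter_id: str                                 # chapter title (DocId)
    tag_ids: list = field(default_factory=list)     # phrase-group numbers (TagId)
    tag_words: list = field(default_factory=list)   # tag words of those groups

# Key-value store: the key is the chapter identifier, the value holds the rest.
store = {}
record = AssociationRecord(
    document_id="FusionServer Tools 2.3.0 User Guide 02S",
    chapter_id="Using WinSCP to Transfer Files",
    tag_ids=[3],  # hypothetical phrase-group number
    tag_words=["upload"],
)
store[record.chapter_id] = record
```

The data-table variant would hold the same four fields as columns, with one row per record.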

In this implementation, the association record of each chapter of the document can be obtained as follows:

Obtain the tag library. For each chapter of the document, match the chapter content against the reference words of the phrase groups in the tag library; when a match is found, associate the tag word of that phrase group with the chapter and determine the association record of the chapter. By matching chapter content against reference words, each chapter that matches a reference word is associated with a tag word that extends its word domain. This enlarges the searchable word domain of the chapter, improves the fit between document chapters and user search terms, raises search accuracy, and lets users conveniently find, filter, and locate chapter content.

Specifically, the process may be as follows: store the document content in the search engine (Es) with each chapter as one unit of data. For each chapter of the document, read the phrase groups in the tag library in order; the search engine (Es) searches the chapter content with the reference words of the phrase group that was read, and when the chapter content is matched, the tag word of that phrase group is determined to be a tag word that matches the chapter. After all phrase groups in the tag library have been traversed, the association record is determined.
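The traversal described above can be sketched in plain Python. Case-insensitive substring matching stands in for the search engine query, each phrase group is assumed to carry a single tag word, and the dictionary layout, function name, and sample data are hypothetical.

```python
def build_associations(chapters, tag_library):
    """For each chapter, collect the tag words of groups whose reference words occur in it."""
    associations = {}
    for chapter_id, content in chapters.items():
        text = content.lower()
        matched = []
        for group in tag_library:  # read the phrase groups in order
            if any(ref.lower() in text for ref in group["reference_words"]):
                matched.append(group["tag_word"])
        if matched:
            associations[chapter_id] = matched
    return associations

tag_library = [
    {"tag_word": "installation steps",
     "reference_words": ["installation flow", "steps", "process"]},
    {"tag_word": "upload",
     "reference_words": ["transfer file", "file transfer", "SSH"]},
]
chapters = {
    "ch1": "Installation process for the tool",
    "ch2": "Using WinSCP to transfer files",
    "ch3": "Release notes",
}
associations = build_associations(chapters, tag_library)
```

Chapters that match no reference word, such as `ch3` here, receive no association record in this sketch.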

Note that the tag library and the tag word-chapter association library may be different databases, in which case phrase groups and association records are stored separately, or the same database, in which case they are stored together. The embodiments of this application do not limit how phrase groups and association records are stored; this can be designed according to the actual situation.

Step 303: The terminal device 110 displays the document.

For example, the document may be a document in a website accessed online through a browser, and it includes the content of multiple chapters. As shown in FIG. 4, the document in the current web page may include multiple chapter titles and one chapter content display area; by clicking different chapter titles, the user displays the corresponding chapter content in the display area.

Step 304: The terminal device 110 obtains the search term that the user entered for the document.

The search term may be in any of several languages, such as Chinese or English, and may take several forms, such as a URL, digits, or characters of various types. In short, the form and language of the search term are not limited, and they may be combined.

In a preferred embodiment of the present invention, the user displays and browses the document through the terminal device 110 and may enter a search term while browsing; the terminal device 110 sends the obtained search term to the server 120.

The terminal device 110 can obtain the search term entered by the user in several ways.

In one example, the search term is obtained through keyboard input. The keyboard may take different forms on different terminal devices 110. For example, when the terminal device 110 is a computer (including but not limited to a desktop or notebook computer), the keyboard may be a physical keyboard; when the terminal device 110 is a tablet or mobile phone, the keyboard may be a virtual keyboard generated by software. This application does not limit the specific form of the keyboard.

In one example, the search term is obtained through voice input. Specifically, the voice input may be captured by a microphone connected to the terminal device 110. In some applications this is a microphone built into the terminal device 110; in others it is an external microphone connected to the terminal device 110. For example, the terminal device 110 may be a computer (including but not limited to a desktop or notebook computer) and the external microphone may be located on an intelligent robot, in which case the robot captures the search term spoken by the user and sends it to the computer.

For example, the terminal device 110 is a computer (including but not limited to a desktop or notebook computer). After obtaining the search term, for instance through the keyboard input or voice input described above, the computer sends it to the server 120.

In practice, the ways of receiving a search term described above are only examples and do not constitute a specific limitation; the search term in step 303 above may also be received in other ways, which are not listed here.

In this embodiment, the search term entered by the user may be received through an application installed on the terminal device 110.

For example, when the search term is obtained through keyboard input, an application is installed on the mobile phone or computer (terminal device 110) and provides an input box on the relevant page; the user enters the search term in the input box to search.

For example, when the search term is obtained through voice input, an application is installed on the mobile phone or tablet (terminal device 110) and provides an input box and a voice input button on the relevant page; the user presses and holds the voice input button, moves close to the phone or tablet, and speaks the search term, which is then entered into the input box for searching.

Further, the application receives the entered search term and sends it to the corresponding server 120.

For example, the application may be a browser, and the document may be a document in a website accessed online through the browser.

Step 305: The terminal device 110 sends the search term that the user entered for the document to the server 120.

Step 306: The server 120 receives the search term that the user entered for the document.

Step 307: The server 120 matches the search term against the chapter content of the multiple chapters of the document and determines a first matching result.

Optionally, the matching proceeds as follows: segment the search term to obtain a first segmentation result (such as a segmentation list); match each word (keyword) in the first segmentation result against the chapter content of each chapter of the document; if content is matched, take the chapter containing it as a matching chapter (called a first matching chapter for ease of description and distinction); after all chapters of the document have been traversed, determine the first matching result. Note that if the chapter content of a chapter contains a word (keyword) from the first segmentation result, the chapter is considered a matching chapter, that is, a first matching chapter.

The words (keywords) in the first segmentation result can be matched against chapter content through the inverted index of the document. The inverted index is built as follows: segment the document content to obtain a second segmentation result, then determine the chapter in which each word of the second segmentation result (called a document word for ease of description and distinction) is located; the number of occurrences and the positions of each document word in a chapter may also be recorded. Matching then proceeds as follows: based on the inverted index of the document, determine the document words in the index that match the first segmentation result of the search term, then determine the chapters in which those matched document words are located. These are the chapters that match the search term, that is, the first matching chapters.
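The first matching can be sketched as a lookup in a prebuilt inverted index. The index layout (term to chapter to positions) and the name `first_match` are assumptions for illustration, whitespace splitting stands in for word segmentation, and the sketch only collects matching chapters without ranking them.

```python
def first_match(query, inverted_index):
    """Return the IDs of chapters containing any keyword of the segmented query."""
    matched = set()
    for keyword in query.lower().split():  # stand-in for word segmentation
        # The keys of the posting dict are the chapters containing the keyword.
        matched.update(inverted_index.get(keyword, {}))
    return sorted(matched)

# Hypothetical inverted index: term -> {chapter_id: [positions]}.
inverted_index = {
    "winscp": {"ch2": [1]},
    "transfer": {"ch2": [3]},
    "installation": {"ch1": [0]},
}
```

For instance, `first_match("WinSCP installation", inverted_index)` returns both chapters, whereas `first_match("upload", inverted_index)` returns an empty list, which corresponds to an empty first matching result.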

In practice, step 306 may be implemented by a search engine.

The first matching result may be empty, meaning that no chapter was matched, or it may indicate the matching chapters (first matching chapters). In the latter case, for example, the first matching result may include the identifier of each matched chapter. The identifier may be, for example, the chapter number or the chapter title.

步骤308、服务器120将搜索词与标签词章节关联库中的标签词进行匹配,确定第二匹配结果。Step 308, theserver 120 matches the search word with the tag word in the tag word chapter association library, and determines the second matching result.

Optionally, the specific matching process is as follows: segment the search term into words to determine the first word segmentation result; for each chapter, match each word (keyword) in the first word segmentation result of the search term against the tag words associated with that chapter of the document in the tag word chapter association library and, if there is a match, take the chapter associated with the matched tag word as a matched chapter (for ease of description and distinction, referred to as a second matching chapter); after every chapter in the document has been matched, determine the second matching result. It should be noted that if a tag word in the tag word chapter association library is a word (keyword) in the first word segmentation result of the search term, and the chapter to which the tag word belongs is a chapter in the document, that chapter may be regarded as a matched chapter, that is, a second matching chapter.
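As a minimal sketch of the tag word matching just described, the association library may be modelled as a mapping from chapter identifier to the tag words associated with that chapter. This layout, the tokenizer, and all names below are assumptions for illustration.

```python
def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for real word segmentation.
    return text.lower().split()


def match_tags(tag_chapter_library, search_term):
    """Return ids of chapters whose associated tag words contain any search keyword."""
    keywords = set(tokenize(search_term))
    matched = []
    for chapter_id, tag_words in tag_chapter_library.items():
        if keywords & set(tag_words):
            matched.append(chapter_id)
    return matched


# Hypothetical association library: chapter id -> tag words associated with it.
tag_chapter_library = {"3.1": ["upload"], "3.2": ["sign-in"]}
second_matching_chapters = match_tags(tag_chapter_library, "upload")
empty_result = match_tags(tag_chapter_library, "download")
```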

The second matching result may be empty, that is, no chapter is matched, or it may indicate the matched chapters (second matching chapters). Where the second matching result indicates matched chapters (second matching chapters), the second matching result includes, for example, the respective identifiers of the several matched chapters (second matching chapters). For example, an identifier may be the number of a chapter or the chapter title of a chapter.

It should be understood that the chapter identifiers in the first matching result and the second matching result are of the same type; for example, both are numbers, or both are chapter titles.

It should be noted that the embodiments of this application are not intended to limit the execution order of step 307 and step 308. Step 307 may be performed before step 308, step 308 may be performed before step 307, or, as shown in FIG. 3, step 307 and step 308 may be performed in parallel.
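Because the two matching steps do not depend on each other, the parallel variant could, for instance, be sketched with a thread pool. The two matcher functions and the sample data below are simplified stand-ins assumed for this illustration only.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: chapter contents and chapter-associated tag words.
CHAPTERS = {"3.1": "use winscp to transfer files", "3.2": "log in to the virtual console"}
TAGS = {"3.1": {"upload"}}


def match_content(search_term):
    # Stand-in for step 307: match keywords against chapter content.
    words = set(search_term.lower().split())
    return [cid for cid, text in CHAPTERS.items() if words & set(text.split())]


def match_tags(search_term):
    # Stand-in for step 308: match keywords against chapter tag words.
    words = set(search_term.lower().split())
    return [cid for cid, tags in TAGS.items() if words & tags]


def run_both(search_term):
    # Submit both matchers, then wait for both results.
    with ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(match_content, search_term)
        second = pool.submit(match_tags, search_term)
        return first.result(), second.result()


first_result, second_result = run_both("upload console")
```

Running the two functions one after the other, in either order, yields the same pair of results, which is why the order of the two steps need not be fixed.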

Step 309: The server 120 determines a search result based on the first matching result and the second matching result.

According to one feasible implementation, the search result may include the identifier of each matched chapter, for example the chapter number or the chapter title.

Step 310: The server 120 sends the search result to the terminal device 110.

In summary, in the embodiments of this application, in one aspect, searchable tag words are added to chapters to expand the word domain of the chapters, which improves the fit between document chapters and the search term entered by the user and can improve search accuracy, so that the user can conveniently find, filter, and locate chapter content. In another aspect, the tag library can be maintained dynamically in real time without modifying the document itself, so intrusion on the service is small, response is fast, and the user's search experience improves quickly. In yet another aspect, documents of the same type have similar chapter content, so the tag library is highly extensible and general and can be shared by multiple documents: it is maintained in one place and usable everywhere.

In some possible scenarios, for a tag word matched by a chapter, if the phrase containing the matched tag word includes a reference word that does not match the content of the chapter, the chapter may correspondingly also be associated with that reference word. Correspondingly, the association record of the chapter may also include the reference words in the phrase that do not match the content of the chapter; for example, the association record may record the reference words in the phrase that do not match the content of the chapter. The embodiments of this application take as an example an association record that records all reference words in the tag library that do not match the content of the chapter.

In this scenario, the association record of each chapter in the document may be obtained as follows:

Obtain the tag library; for each of the multiple chapters in the document, match the chapter content of the chapter against each phrase in the tag library to determine the association record corresponding to the chapter. Here, the chapter is additionally associated with the reference words that the chapter content does not match in the phrase containing the tag word associated with the chapter. This enlarges the searchable word domain of the chapter, improves the fit between document chapters and the search term entered by the user, and can improve search accuracy, so that the user can conveniently find, filter, and locate chapter content.

A specific implementation process may be as follows: store the content of the document in the search engine (Es) with each chapter as one unit of data; for each chapter in the document, read the phrases in the tag library in order; for each phrase read, if the search engine (Es) finds that a reference word in the phrase matches the chapter content of the chapter, the tag word in the phrase is associated with the chapter; on this basis, if the phrase contains a reference word that does not match the chapter content of the chapter, that reference word is also associated with the chapter; after the phrases in the tag library have been traversed, determine the association record.
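The traversal above may be sketched as follows. In this sketch a plain case-insensitive substring test replaces the search engine's matching, and the phrase layout (`tag_words`, `reference_words`) and record layout are assumptions chosen for illustration.

```python
def build_association_records(chapters, tag_library):
    """For each chapter, collect tag words of matching phrases plus unmatched reference words."""
    records = {}
    for chapter_id, content in chapters.items():
        text = content.lower()
        tags, extra_refs = set(), set()
        for phrase in tag_library:
            # Assumption: substring containment stands in for search-engine matching.
            hits = [ref for ref in phrase["reference_words"] if ref.lower() in text]
            if not hits:
                continue  # no reference word of this phrase matches the chapter
            tags.update(phrase["tag_words"])
            # Reference words of the same phrase that the chapter does not contain.
            extra_refs.update(ref for ref in phrase["reference_words"] if ref not in hits)
        if tags:
            records[chapter_id] = {
                "tag_words": sorted(tags),
                "reference_words": sorted(extra_refs),
            }
    return records


# Hypothetical tag library and chapters.
tag_library = [{"tag_words": ["upload"], "reference_words": ["transfer files", "WinSCP", "SSH"]}]
chapters = {
    "3.1": "Use WinSCP to transfer files to the server",
    "3.2": "Log in to the virtual console",
}
records = build_association_records(chapters, tag_library)
```

With these records, a search for "SSH" could also reach chapter 3.1, even though that word appears neither in the chapter content nor as a tag word.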

According to one possible case, the tag word chapter association library also includes a score for each chapter in the document (for ease of description and distinction, referred to as a third matching score). In the scenario where the tag word chapter association library includes association records, the association record corresponding to a chapter also includes a first score. It should be noted that the third matching score indicates the degree of matching between the tag words matched by the chapter and that chapter.

The process of determining the third matching score of any chapter in the document (for ease of description and distinction, referred to as a target chapter) is as follows:

First, the different words are vectorized by a first model. Specifically, each word in the tag library is vectorized by the first model to obtain a word vector. The chapter content of the target chapter is segmented to obtain a third word segmentation result, and each word in the third word segmentation result is then vectorized by the first model to obtain multiple word vectors (for ease of description and distinction, referred to as chapter word vectors). In this way, identical words have identical word vectors. It should be noted that, if the inverted index approach is used, the second word segmentation result of the document includes the third word segmentation result of the chapter content in the document.

Next, for each phrase in the tag library, the degree of matching of the phrase for the target chapter is determined based on the word vectors of the reference words in the phrase and the chapter word vectors of the target chapter. The degree of matching for the target chapter may be determined in various ways.

In one example, for each reference word in the phrase, the similarities between the word vector of the reference word and the multiple chapter word vectors of the target chapter are calculated; the degree of matching is determined based on the similarities greater than a preset threshold, for example by taking the mean of the similarities greater than the preset threshold as the degree of matching.
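A minimal sketch of this example follows, assuming cosine similarity as the similarity measure and a threshold of 0.5; the source fixes neither, and the toy two-dimensional vectors are purely illustrative.

```python
import math


def cosine(u, v):
    # Cosine similarity of two equal-length vectors; 0.0 if either is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def phrase_match_degree(reference_vectors, chapter_vectors, threshold=0.5):
    """Mean of all reference-word/chapter-word similarities above the threshold."""
    kept = []
    for ref in reference_vectors:
        for chap in chapter_vectors:
            similarity = cosine(ref, chap)
            if similarity > threshold:
                kept.append(similarity)
    return sum(kept) / len(kept) if kept else 0.0


# Toy vectors: similarities to the single reference vector are 1.0, 0.0 and 0.6.
reference_vectors = [[1.0, 0.0]]
chapter_vectors = [[1.0, 0.0], [0.0, 1.0], [3.0, 4.0]]
degree = phrase_match_degree(reference_vectors, chapter_vectors)
```

Here the similarity of 0.0 falls below the threshold and is discarded, so the degree of matching is the mean of 1.0 and 0.6.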

In one example, for each reference word in the phrase, the number and positions of the words in the target chapter that are identical to the reference word are calculated based on the word vector of the reference word and the chapter word vectors of the target chapter; the degree of matching is determined based on the number and positions of the words in the target chapter identical to each reference word in the phrase. For example, the more identical words there are and the earlier they appear, the higher the degree of matching.
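One hypothetical way to turn "more occurrences and earlier positions give a higher degree" into a number is to add a weight of 1 / (1 + position) per occurrence. This weighting is an assumption of the sketch, not a formula given in the source; words are compared directly because identical words have identical vectors.

```python
def count_position_degree(reference_words, chapter_words):
    """Sum a position-decayed weight for every occurrence of a reference word."""
    score = 0.0
    for ref in reference_words:
        for position, word in enumerate(chapter_words):
            if word == ref:
                # Earlier positions contribute more; each occurrence adds to the score.
                score += 1.0 / (1 + position)
    return score


# Hypothetical segmented chapter contents.
early_and_repeated = count_position_degree(["winscp"], ["winscp", "transfers", "files", "winscp"])
late_and_single = count_position_degree(["winscp"], ["use", "winscp"])
absent = count_position_degree(["ssh"], ["use", "winscp"])
```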

Finally, the third matching score corresponding to the target chapter is determined based on the degree of matching between each phrase matched by the target chapter and the target chapter. For example, the mean of the degrees of matching between each phrase matched by the target chapter and the target chapter is taken as the third matching score.

Correspondingly, the first matching result includes a score corresponding to each matched chapter (first matching chapter) (for ease of description and distinction, referred to as a first matching score); the first matching score indicates the degree of matching between the corresponding chapter and the search term.

For example, if the inverted index approach is used, the first matching score corresponding to a first matching chapter may be calculated as follows: determine the number of occurrences and the positions, within the chapter content, of the document words matched by the first word segmentation result of the search term, and determine a second score based on the number of occurrences and the positions.

For example, the first matching score corresponding to a first matching chapter may be calculated as follows: segment the search term to determine the first word segmentation result; for each chapter in the document, determine whether each word (keyword) in the first word segmentation result matches the content of the chapter and, if so, take the chapter as a first matching chapter and calculate the degree of matching between the first matching chapter and each word in the first word segmentation result, thereby determining the first matching score.

For example, the process of determining the first matching score of a first matching chapter is as follows:

First, each word in the first word segmentation result and the content of the first matching chapter are vectorized to obtain the word vector of each word in the first word segmentation result and the word vector of each word in the first matching chapter.

For each word in the first word segmentation result, the degree of matching between that word and the first matching chapter is determined based on the word vector of the word and the word vector of each word in the first matching chapter. The degree of matching between the word and the first matching chapter may be determined in various ways.

In one example, for the word, the similarities between the word vector of the word and the vector of each word in the first matching chapter are calculated; the degree of matching is determined based on the similarities greater than a preset threshold, for example by taking the mean of the similarities greater than the preset threshold as the degree of matching.

In one example, for the word, the number and positions of the words identical to that word are calculated based on the word vector of the word and the vector of each word in the first matching chapter; the degree of matching is determined based on the number and positions of the words in the first matching chapter identical to that word. For example, the more identical words there are and the earlier they appear, the higher the degree of matching.

Finally, the first matching score of the first matching chapter is determined based on the degree of matching between each word in the first word segmentation result and the first matching chapter. For example, the mean of the degrees of matching between each word in the first word segmentation result and the first matching chapter is taken as the first matching score.

Correspondingly, the second matching result includes the score of the chapter (second matching chapter) corresponding to the matched tag word (for ease of description and distinction, referred to as a second matching score). The second matching score is the third matching score associated with the second matching chapter in the tag word chapter association library.

In step 309, the search result may be determined according to the first matching score and the second matching score. The search result may include the first matching chapter and/or the second matching chapter.

For example, assuming that the first matching scores of the first matching chapters are all greater than or equal to the second matching scores, the search result may correspondingly be the first matching chapters arranged in descending order of first matching score.

For example, assuming that the second matching scores of the second matching chapters are all greater than or equal to the first matching scores, the search result may correspondingly be the second matching chapters arranged in descending order of second matching score.

For example, the search result may be the first matching chapters and the second matching chapters arranged in descending order of second matching score and first matching score. In a specific implementation, the first matching chapters and the second matching chapters are sorted based on the first matching scores and the second matching scores to obtain the search result.

For example, the search result may indicate the chapter corresponding to the higher of the first matching score and the second matching score. Correspondingly, the search result may be a first matching chapter or a second matching chapter.
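The merged ordering and the "highest score only" variant above may be sketched as follows. The source does not say how a chapter that appears in both results is handled; keeping the higher of its two scores is an assumption of this sketch, as are the function names and sample scores.

```python
def rank_chapters(first_scores, second_scores):
    """Merge both score maps and return chapter ids in descending order of score."""
    best = dict(first_scores)
    for chapter_id, score in second_scores.items():
        # Assumption: a chapter found by both matchers keeps its higher score.
        best[chapter_id] = max(score, best.get(chapter_id, score))
    # Ties are broken by chapter id so the order is deterministic.
    return sorted(best, key=lambda cid: (-best[cid], cid))


def top_chapter(first_scores, second_scores):
    """Return only the chapter with the highest score, or None if nothing matched."""
    ranked = rank_chapters(first_scores, second_scores)
    return ranked[0] if ranked else None


# Hypothetical first and second matching scores keyed by chapter id.
first_scores = {"2.1": 0.4, "3.1": 0.7}
second_scores = {"3.1": 0.9, "3.3": 0.5}
ranking = rank_chapters(first_scores, second_scores)
best = top_chapter(first_scores, second_scores)
```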

Further, in the embodiments of this application, the phrases are highly extensible: the tag words in a phrase can be updated according to actual needs. After the tag words in a phrase are updated, the tag words associated with the chapters in the document need to be updated accordingly based on the updated phrase.
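As a rough sketch of this update, the chapter associations can simply be recomputed from the updated phrase. The phrase layout, the substring matching, and the full recomputation strategy are all assumptions made for illustration; an incremental update would be equally consistent with the text.

```python
def associate_tags(chapters, tag_library):
    """Associate each chapter with the tag words of every phrase whose reference words it contains."""
    associations = {}
    for chapter_id, content in chapters.items():
        text = content.lower()
        tags = {
            tag
            for phrase in tag_library
            if any(ref.lower() in text for ref in phrase["reference_words"])
            for tag in phrase["tag_words"]
        }
        associations[chapter_id] = sorted(tags)
    return associations


def update_phrase_tags(tag_library, phrase_index, new_tag_words, chapters):
    """Replace the tag words of one phrase, then recompute the chapter associations."""
    tag_library[phrase_index]["tag_words"] = list(new_tag_words)
    return associate_tags(chapters, tag_library)


# Hypothetical data.
chapters = {"3.1": "Use WinSCP to transfer files"}
tag_library = [{"tag_words": ["upload"], "reference_words": ["WinSCP"]}]
before = associate_tags(chapters, tag_library)
after = update_phrase_tags(tag_library, 0, ["upload", "send"], chapters)
```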

FIG. 5 is a schematic flowchart of another search method provided by an embodiment of this application.

As shown in FIG. 5, on the basis of steps 301 to 309 shown in FIG. 3, this embodiment of the application further includes at least the following steps:

Step 501: The terminal device 110 displays the chapter titles, in the document, of the chapters indicated by the search result.

In one possible scenario, the chapter titles in the search result are arranged in descending order of score.

Step 502: The terminal device 110 obtains the chapter title selected by the user.

Step 503: The terminal device 110 displays the chapter content corresponding to the chapter title in the document.

The above processing of the search result by the terminal device 110 is merely an example; the embodiments of this application are not intended to limit how the terminal device 110 processes the search result after obtaining it. In some possible scenarios, the terminal device 110 may also mark the chapter titles indicated by the search result, for example by enclosing them in a rectangular box, so that the user can select a chapter title based on the marked chapter titles. For example, the user determines the chapter to view based on the marked chapter titles and the position of the marked area in the interface.

Based on the search method provided above, a specific application of the search method is described. FIG. 6A and FIG. 6B are schematic flowcharts of a specific application of a search method provided by an embodiment of the present invention. As shown in FIG. 6A and FIG. 6B, the application specifically includes the following:

First, as shown in FIG. 6A, the server 120 invokes the search engine to segment the imported document into words, obtaining a word list. On the one hand, the tag word chapter association library corresponding to the document can be obtained in combination with the tag library. The tag library includes multiple phrases, and each phrase includes a tag word and multiple reference words corresponding to the tag word; the tag word chapter association library includes the identifiers of one or more chapters of the document and, for each chapter identifier, a corresponding first score and several tag words, where different chapters may have the same tag word. On the other hand, an inverted index can be obtained. The inverted index includes the words of the second word segmentation result obtained by segmenting the document (document words, shown as term in FIG. 6), the identifier (DocId) of the chapter in which each word (document word, term) is located, and a second score of each word (document word, term) for the chapter (not shown in the figure).

Next, as shown in FIG. 6B, the terminal device 110 displays the document. The web page has an input box in which the user can enter a search term, and the terminal device 110 sends the search term to the server 120.

The server 120 invokes the search engine to segment the search term, obtaining the first word segmentation result, and then matches the words (keywords) in the first word segmentation result against the inverted index to obtain the first matching result. The server 120 then, based on the search engine, matches the words (keywords) in the first word segmentation result of the search term against the tag word chapter association library of the document to obtain the second matching result. Finally, the first matching result and the second matching result are fed back to the terminal device 110 as the search result.

In one possible scenario, the tag library contains the tag word "upload" together with the reference words corresponding to "upload", such as "transfer files", "file transfer", "WinSCP", and "SSH". Referring to FIG. 7B, the document is the FusionServer Tools 2.3.0 User Guide 02S. The content of the document is shown in FIG. 7A: the document is a document on a website accessed online through a browser, and it includes an input box 701 in which a search term can be entered and multiple chapter titles: Preface, Introduction, Installation Process, Selecting an Installation Mode, Direct Installation, Installation by Loading Drivers, Installation by Making an Installation Source, Installing and Upgrading Drivers and Firmware, Common Operations, Connecting to the Server Locally, Logging In to the Virtual Console, Using WinSCP to Transfer Files, Using a Local Folder to Transfer Files, Locating OS Faults, How to Obtain Help, Appendix, and so on. The user wants to view the chapter content under the chapter title "Using WinSCP to Transfer Files"; neither the chapter title "Using WinSCP to Transfer Files" nor its corresponding chapter content contains the word "upload".

The user enters the search term "upload" in the input box 701. When the search is performed using the inverted index approach of the prior art, the search result is empty. When the search is performed using the method provided by the embodiments of this application, the chapter title "Using WinSCP to Transfer Files" and its corresponding chapter content, as shown in FIG. 7B, can be found. It should be noted that although the chapter title and chapter content of "Using WinSCP to Transfer Files" do not include "upload", they do include reference words such as "transfer files" and "WinSCP", so the chapter can be retrieved.
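The "upload" scenario can be reproduced in miniature as follows. The chapter bodies are invented placeholders (only the titles and the tag and reference words come from the example above), and the whitespace tokenizer and substring matching are simplifying assumptions.

```python
def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for real word segmentation.
    return text.lower().split()


# Titles follow the example; the body text is a hypothetical placeholder.
chapters = {
    "Using WinSCP to Transfer Files": "Use WinSCP to transfer files between the local PC and the server",
    "Logging In to the Virtual Console": "Log in to the virtual console of the server",
}
tag_library = [{"tag_words": ["upload"], "reference_words": ["transfer files", "WinSCP", "SSH"]}]


def content_search(search_term):
    """Plain content matching: a chapter matches only if it contains a keyword."""
    keywords = set(tokenize(search_term))
    return [title for title, body in chapters.items()
            if keywords & set(tokenize(title + " " + body))]


def tag_search(search_term):
    """Tag matching: a chapter matches if a phrase tied to it has a matching tag word."""
    keywords = set(tokenize(search_term))
    hits = []
    for title, body in chapters.items():
        text = (title + " " + body).lower()
        for phrase in tag_library:
            has_reference = any(ref.lower() in text for ref in phrase["reference_words"])
            if has_reference and keywords & set(phrase["tag_words"]):
                hits.append(title)
                break
    return hits


plain = content_search("upload")
with_tags = tag_search("upload")
```

Content matching alone returns nothing for "upload", while tag matching returns the WinSCP chapter because that chapter contains the reference words "transfer files" and "WinSCP".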

Based on the same concept as the method embodiments of the present invention, an embodiment of the present invention further provides a search apparatus. The search apparatus is configured to perform the search method performed by the server 120 provided by the embodiments of the present invention. The search apparatus includes several modules, each configured to perform a step of the search method provided by the embodiments of the present invention; the division into modules is not limited here. A person skilled in the art can clearly understand that, in practical applications, the steps of the search method provided by the embodiments of the present invention may be allocated to different modules as needed, that is, the internal structure of the apparatus may be divided into different modules to complete all or part of the functions described above. The modules in the embodiments may be integrated into one processing unit, each unit may exist physically on its own, or two or more modules may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the modules serve only to distinguish them from one another and are not intended to limit the protection scope of the present invention. For the specific working process of the modules in the above apparatus, reference may be made to the corresponding process in the foregoing method embodiments; details are not repeated here.

For example, FIG. 8 is a schematic structural diagram of a search apparatus provided by an embodiment of this application. As shown in FIG. 8, the search apparatus 800 provided by this embodiment of the application includes:

a first obtaining module 801, configured to receive a search term entered by a user for a document;

a first matching module 802, configured to match the search term against the content of the document to determine a first matching result;

a second matching module 803, configured to match the search term against the tag words associated with the chapter content of the document to determine a second matching result, where a tag word is a word not included in the chapter content and has the same meaning as at least one word in the chapter content; and

a result determining module 804, configured to determine a search result based on the first matching result and the second matching result.

In one possible implementation, the apparatus further includes:

a second obtaining module 803, configured to obtain a phrase, where the phrase includes a tag word and a reference word associated with the tag word, and the reference word is a word in the content of a chapter in the document and has the same meaning as the tag word; and to determine, according to the phrase and the document, the tag words associated with the chapter content of the document.

In one example of this implementation, the second obtaining module 803 is configured to search the document based on the reference word and, where the reference word matches content of the document, associate the tag word associated with the reference word with the chapter containing that content.

In one example of this implementation, the second obtaining module 803 is configured to obtain historical search terms for the document and determine the tag word based on the historical search terms.

In one example of this implementation, the reference words include a first reference word and a second reference word, where the first reference word matches the chapter content of the document, the second reference word does not match the chapter content of the document, and the chapter is also associated with the second reference word; the second obtaining module 803 is configured to match the search term against the tag words and the second reference word associated with the chapter content of the document to determine the second matching result.

In one example of this implementation, the second obtaining module 803 is configured to update the tag words in the phrase and, based on the updated phrase and the document, update the tag words associated with the chapters in the document.

In one possible implementation, the first matching result includes a first matching chapter and a first matching score, and the second matching result includes a second matching chapter and a second matching score; the result determining module 804 is configured to determine the search result based on the first matching score and the second matching score, where the search result is the first matching chapter and/or the second matching chapter.

In one example of this implementation, the result determining module 804 is configured to determine, as the search result, the chapter corresponding to the higher of the first matching score and the second matching score.

In one example of this implementation, a chapter in the document is also associated with a third matching score; the third matching score indicates the degree of matching between the tag words associated with the chapter and the chapter; and the second matching score of the second matching chapter is the third matching score associated with the second matching chapter.

In one possible implementation, the first matching module 802 is configured to obtain an inverted index corresponding to the document, where the inverted index indicates the words obtained by segmenting the document and the chapters of the document in which those words are located; and to determine the first matching result based on the inverted index and the search term.

For example, the server 120 includes the search apparatus 800; in other words, the search apparatus 800 is deployed on the server 120, for example on the first device.

基于与本发明方法实施例相同的构思,本发明实施例还提供了另一种搜索装置。该搜索装置用于执行本发明实施例提供的终端设备110执行的搜索方法。搜索装置包括若干个模块,各个模块用于执行本发明实施例提供的搜索方法中的各个步骤,关于模块的划分在此不做限制。Based on the same idea as the method embodiment of the present invention, the embodiment of the present invention also provides another search device. The search apparatus is used to execute the search method performed by theterminal device 110 provided in the embodiment of the present invention. The search device includes several modules, and each module is used to execute each step in the search method provided by the embodiment of the present invention, and there is no limitation on the division of the modules here.

By way of example, FIG. 9 is a schematic structural diagram of a search apparatus provided by an embodiment of the present application. As shown in FIG. 9, the search apparatus 900 provided by the embodiment of the present application includes:

a document display module, configured to display a document and obtain a search term entered by a user for the document;

a sending module, configured to send the search term to a first device, so that the first device obtains the search term entered by the user for the document;

a receiving module, configured to receive a search result sent by the first device, where the search result is determined by the first device based on a first matching result, obtained by matching the search term against the content of the document, and a second matching result, obtained by matching the search term against tag words associated with chapters of the document; a tag word is a word that does not appear in the chapter content and that has the same meaning as at least one word in the chapter content;

a title display module, configured to display the chapter title of the chapter in the document indicated by the search result.

In a possible implementation, the apparatus further includes:

a selection module, configured to obtain a chapter title selected by the user;

a content display module, configured to display the chapter content corresponding to the chapter title selected by the user in the document.
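The terminal-side flow formed by these modules might look roughly like the sketch below. It is an assumption-laden illustration: the class `SearchClient`, its method names, the document layout, and the callable standing in for the first device are all hypothetical, and the transport between terminal and first device is replaced by a plain function call.

```python
from typing import Callable, Dict, List


class SearchClient:
    """Hypothetical terminal-side wrapper around the modules listed above."""

    def __init__(self,
                 document: Dict[str, Dict[str, str]],
                 send_to_first_device: Callable[[str], List[str]]) -> None:
        # document maps chapter_id -> {"title": ..., "content": ...}
        self._document = document
        # Stand-in for the sending and receiving modules: takes the search
        # term and returns the chapter ids in the search result.
        self._send = send_to_first_device

    def search(self, term: str) -> List[str]:
        """Send the term, receive the result, and return the chapter titles to display."""
        chapter_ids = self._send(term)
        return [self._document[cid]["title"]
                for cid in chapter_ids if cid in self._document]

    def open_chapter(self, title: str) -> str:
        """Return the content of the chapter whose title the user selected."""
        for chapter in self._document.values():
            if chapter["title"] == title:
                return chapter["content"]
        raise KeyError(title)


document = {
    "1": {"title": "Power on", "content": "Press the power button."},
    "2": {"title": "Restart", "content": "Hold the button to restart."},
}
# Fake first device: "reboot" is treated as a tag word associated with chapter 2.
client = SearchClient(document, lambda term: ["2"] if term == "reboot" else [])
titles = client.search("reboot")
content = client.open_chapter(titles[0])
```

In the example, the word "reboot" does not occur in chapter 2, yet the chapter is returned because the stand-in first device treats it as a tag word for that chapter, which is the effect the tag words are meant to achieve.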

Exemplarily, the terminal device 110 includes the search apparatus 900; in other words, the search apparatus 900 is deployed on the terminal device 110. The first device may be the server 120; when the server includes multiple service devices, the first device may be any one of them.

Based on the same concept as the method embodiments of the present invention, an embodiment of the present invention further provides an electronic device. The electronic device may be the server 120 or the terminal device 110.

FIG. 10 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.

As shown in FIG. 10, the electronic device 1000 includes a processor 1001, a memory 1002, and a network interface 1003.

The processor 1001 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.

The memory 1002 may be a volatile memory or a non-volatile memory, or may include both. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which serves as an external cache. By way of example and not limitation, many forms of RAM are available.

By way of example, the memory 1002 may store a computer program, and the processor 1001, when executing the computer program, implements the steps of the foregoing search method embodiments, for example at least some of the steps performed by the server 120, or the steps performed by the terminal device 110. Alternatively, the processor 1001, when executing the computer program, implements the functions of the modules in the foregoing apparatus embodiments. Exemplarily, the computer program may be divided into one or more modules/units, which are stored in the memory 1002 and executed by the processor 1001 to carry out the present invention. For example, the computer program may be divided into the modules shown in FIG. 8 or FIG. 9; for the specific function of each module, see the description above.

The network interface 1003 is configured to send and receive data, for example to send data processed by the processor 1001 to other electronic devices, or to receive data sent by other electronic devices.

Of course, for simplicity, FIG. 10 shows only some of the components of the electronic device 1000 that are relevant to the present application, and omits components such as a bus and input/output interfaces. In addition, depending on the specific application, the electronic device 1000 may include any other appropriate components. Those skilled in the art will understand that FIG. 10 is merely an example of the electronic device 1000 and does not limit it; the electronic device may include more or fewer components than shown, combine certain components, or use different components.

In addition to the foregoing method, apparatus, and electronic device, an embodiment of the present application may further provide a computer program product including computer program instructions that, when run by a processor, cause the processor to perform the steps of the search method according to the various embodiments of the present application described in the "Method" part of this specification. The computer program code for performing the operations of the embodiments of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ and conventional procedural programming languages such as the "C" language or similar languages. The computer program code may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server.

In addition, an embodiment of the present application may further provide a computer-readable storage medium storing computer program instructions that, when run by a processor, cause the processor to perform the steps of the search method according to the various embodiments of the present disclosure described in the "Method" part of this specification. The computer-readable storage medium may be any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. It should be noted that the content covered by the computer-readable medium may be expanded or reduced as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.

In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts not detailed or recorded in one embodiment, refer to the related descriptions of the other embodiments.

It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution. The order in which the processes are executed should be determined by their functions and internal logic, and the numbering should not limit the implementation of the embodiments of the present invention in any way.

The basic principles of the present application have been described above with reference to specific embodiments. It should be noted, however, that the advantages, benefits, and effects mentioned in the present application are merely examples and not limitations, and should not be regarded as required of every embodiment of the present disclosure. In addition, the specific details disclosed above are provided only for illustration and ease of understanding, not for limitation; they do not require that the present disclosure be implemented using those specific details.

The block diagrams of components, apparatuses, devices, and systems in the present disclosure are illustrative examples only and are not intended to require or imply that connection, arrangement, or configuration must follow the manner shown in the block diagrams. As those skilled in the art will recognize, these components, apparatuses, devices, and systems may be connected, arranged, and configured in any manner. Words such as "including", "comprising", and "having" are open-ended terms that mean "including but not limited to" and may be used interchangeably with it. The words "or" and "and" as used here mean "and/or" and may be used interchangeably with it, unless the context clearly indicates otherwise. The phrase "such as" as used here means "such as but not limited to" and may be used interchangeably with it.

It should also be noted that, in the apparatuses, devices, and methods of the present disclosure, the components or steps may be decomposed and/or recombined. Such decompositions and/or recombinations should be regarded as equivalents of the present disclosure.

The foregoing description has been presented for purposes of illustration and description. It is not intended to limit the embodiments of the present disclosure to the forms disclosed herein. Although a number of example aspects and embodiments have been discussed above, those skilled in the art will recognize certain variations, modifications, changes, additions, and sub-combinations thereof.

It can be understood that the various numerical designations used in the embodiments of the present application are only distinctions made for ease of description and are not intended to limit the scope of the embodiments of the present application.

Claims (10)

Application number: CN202310416807.2A
Publication number: CN116401255A (en)
Title: Search method and computing device
Priority date: 2023-04-17
Filing date: 2023-04-17
Publication date: 2023-07-07
Legal status: Pending
Family ID: 87015829
Country status: CN (1), CN116401255A (en)



Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
