Disclosure of Invention
In view of the above, the embodiment of the invention provides a retrieval optimization method based on a hierarchical expert routing model and CoT reasoning, which at least partially solves the problems of poor retrieval efficiency and adaptability in the prior art.
The embodiment of the invention provides a retrieval optimization method based on a hierarchical expert routing model and CoT reasoning, which comprises the following steps:
step 1, dividing an original document hierarchically, and creating a hierarchical knowledge base;
step 2, extracting and clustering each segment abstract of each layer in the hierarchical knowledge base, and creating a hierarchical semantic base;
step 3, constructing a hierarchical index library by using each central abstract in the hierarchical semantic library and forming a hierarchical expert routing model by combining an intelligent routing mechanism;
step 4, determining a target level index matched with the problem in the hierarchical index library by using the hierarchical expert routing model;
step 5, carrying out similarity calculation on the vectorized problem and the target level index;
step 6, returning the segments of the original document under the hierarchical semantic library with the highest similarity;
step 7, driving hierarchical switching by utilizing CoT reasoning, and finally returning the merged segments to the large model to generate answers corresponding to the questions.
According to a specific implementation manner of the embodiment of the present invention, the step 1 specifically includes:
step 1.1, preprocessing an original document, tokenizing the text at sentence level by adopting the sent_tokenize method of NLTK, and identifying common sentence terminators and other punctuation to obtain a segmented sentence set;
step 1.2, dividing the sentence set into a top-level fragment set, a middle-level fragment set and a bottom-level fragment set by adopting a top-down layer-by-layer division strategy.
According to a specific implementation manner of the embodiment of the present invention, the step 1.2 specifically includes:
Step 1.2.1, dividing the sentence set according to the number of tokens of the sentence set and a first threshold value to obtain a top-level fragment set;
step 1.2.2, dividing the top layer segment set according to a second threshold value to obtain a middle layer segment set;
step 1.2.3, dividing the middle segment set according to a third threshold value to obtain a bottom segment set;
step 1.2.4, establishing a parent-child mapping relation among the top layer fragment set, the middle layer fragment set and the bottom layer fragment set according to the data structures of the top layer fragment set, the middle layer fragment set and the bottom layer fragment set.
According to a specific implementation manner of the embodiment of the invention, the first threshold is greater than the second threshold, and the second threshold is greater than the third threshold;
the data structures of the top layer fragment set, the middle layer fragment set and the bottom layer fragment set all comprise fragment ids, fragment contents, fragment abstracts, fragment vectors, level identifiers and upper layer fragments.
According to a specific implementation manner of the embodiment of the present invention, the step 2 specifically includes:
step 2.1, generating abstracts corresponding to all fragments of each layer in the hierarchical knowledge base by using a large language model;
Step 2.2, vectorizing each abstract to obtain an embedded vector of each abstract;
step 2.3, clustering the vectorized abstracts of each layer by using the DBSCAN algorithm to construct a hierarchical semantic library.
According to a specific implementation manner of the embodiment of the present invention, the step 3 specifically includes:
matching the central summary of each cluster to similar summaries by using vector indexes, so that the clustered summaries form a hierarchical index library, and combining an intelligent routing mechanism to form a hierarchical expert routing model, wherein each index comprises a cluster center abstract and associated document fragments.
According to a specific implementation manner of the embodiment of the present invention, the step 4 specifically includes:
step 4.1, obtaining a problem input by a user;
Step 4.2, part-of-speech tagging is carried out through a spaCy model, and keywords are extracted;
step 4.3, matching the keywords with a complexity keyword list and calculating a first Boolean value according to the matching;
step 4.4, extracting other entities in the problem and calculating a second Boolean value according to the extracted other entities;
Step 4.5, calculating a complexity score of the problem according to the first Boolean value and the second Boolean value;
step 4.6, comparing the complexity score with a complexity threshold corresponding to each level index in the level index library, and distributing, by the intelligent routing mechanism, the problem to the most suitable level index in the level index library as the target level index.
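As an illustrative, non-limiting sketch of steps 4.3 to 4.6, the routine below combines the two Boolean values into a complexity score and assigns the problem to a level index. The keyword list, the equal weighting of the two Boolean values, and the per-level thresholds are assumptions introduced for illustration, not values prescribed by the embodiment; keyword extraction itself (spaCy part-of-speech tagging, step 4.2) is taken as already done.

```python
# Hypothetical keyword list; a real deployment would curate its own.
COMPLEXITY_KEYWORDS = {"compare", "explain", "why", "relationship", "process"}

def complexity_score(keywords, entities):
    """Steps 4.3-4.5: two Boolean signals combined into one score."""
    b1 = any(k in COMPLEXITY_KEYWORDS for k in keywords)  # first Boolean value
    b2 = len(entities) > 0                                # second Boolean value
    return 0.5 * b1 + 0.5 * b2                            # illustrative weighting

def route(score, level_thresholds=(("low", 0.0), ("middle", 0.5), ("top", 1.0))):
    """Step 4.6: assign the problem to the highest level whose threshold it reaches."""
    target = "low"
    for level, threshold in level_thresholds:
        if score >= threshold:
            target = level
    return target
```

For example, a question with one complexity keyword but no entities scores 0.5 and is routed to the middle-level index under these assumed thresholds.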
According to a specific implementation manner of the embodiment of the present invention, the step 5 specifically includes:
step 5.1, converting the problem into a high-dimensional vectorized representation:
v_q = BERT(q);
where BERT represents the pre-trained model used for problem vectorization and q represents the problem;
step 5.2, converting each center summary in the target level index into a high-dimensional vectorized representation:
v_{c_i} = BERT(c_i);
where c_i represents the central abstract of the i-th semantic library;
step 5.3, calculating the similarity between the vectorized problem and each vectorized center abstract:
sim(v_q, v_{c_i}) = (v_q · v_{c_i}) / (‖v_q‖ ‖v_{c_i}‖).
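The similarity calculation of step 5 can be sketched as follows. In an actual implementation v_q and each v_{c_i} would be BERT embeddings of the problem and the center abstracts; the short vectors used in the test stand in for those high-dimensional embeddings, and the cosine form of the similarity is an assumption consistent with common practice for embedding comparison.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def most_similar(v_q, center_vectors):
    """Return the index of the center abstract whose vector is most similar to v_q."""
    return max(range(len(center_vectors)),
               key=lambda i: cosine_similarity(v_q, center_vectors[i]))
```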
According to a specific implementation manner of the embodiment of the present invention, the step 6 specifically includes:
selecting the semantic library corresponding to the highest similarity, and returning, through the index, all relevant document fragments under that library.
According to a specific implementation manner of the embodiment of the present invention, the step 7 specifically includes:
step 7.1, analyzing the relevance among the document fragments by CoT reasoning and deleting irrelevant document fragments;
step 7.2, judging whether all the current document fragments can output answers corresponding to the questions, if so, executing the step 7.4, and if not, executing the step 7.3;
Step 7.3, if the information provided by all the current document fragments is missing, automatically searching the document fragments at the upper layer according to the parent-child mapping relation of the fragments, and repeating the step 7.2 until all the current document fragments can output answers corresponding to the questions;
step 7.4, merging all the document fragments, removing repeated information, and generating answers corresponding to the questions.
The retrieval optimization scheme based on the hierarchical expert routing model and CoT reasoning comprises: step 1, creating a hierarchical knowledge base by hierarchically segmenting an original document; step 2, extracting and clustering each segment abstract of each layer in the hierarchical knowledge base to create a hierarchical semantic base; step 3, constructing a hierarchical index base by using each center abstract in the hierarchical semantic base and combining an intelligent routing mechanism to form a hierarchical expert routing model; step 4, determining a target hierarchical index matched with the problem in the hierarchical index base by means of the hierarchical expert routing model; step 5, calculating the similarity of the vectorized problem and the target hierarchical index; step 6, returning segments of the original document under the hierarchical semantic base with the highest similarity; step 7, driving hierarchical switching by means of CoT reasoning, and finally merging the segments and returning them to the large model to generate an answer corresponding to the problem.
The embodiment of the invention has the following beneficial effects: through the scheme of the invention, the document is hierarchically segmented and a hierarchical semantic library and hierarchical expert indexes are constructed, realizing multi-level information integration from fine granularity to global background, so that the system can flexibly cope with queries of different complexity. Specifically, an intelligent routing mechanism is designed to dynamically optimize the retrieval path according to the complexity of the query. In addition, a novel Chain-of-Thought (CoT) reasoning mechanism is introduced, which simplifies the traditional CoT reasoning process by judging whether to merge fragments or perform hierarchical switching, thereby reducing unnecessary computational overhead when processing simple queries, improving the capability of the system when processing complex multi-step problems, and achieving a significant improvement in retrieval efficiency and generation quality. The scheme not only effectively solves the RAG problems of information loss and insufficient context understanding, but also achieves reasonable allocation and efficient utilization of resources across query tasks of different complexity.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Other advantages and effects of the present invention will become apparent to those skilled in the art from the following disclosure, which describes embodiments of the present invention with reference to specific examples. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. The invention may be practiced or carried out in other, different embodiments, and the details of the present description may be modified or varied in various respects without departing from the spirit and scope of the present invention. It should be noted that the following embodiments and the features in the embodiments may be combined with each other without conflict. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without inventive effort, are intended to be within the scope of the invention.
It is noted that various aspects of the embodiments are described below within the scope of the following claims. It should be apparent that the aspects described herein may be embodied in a wide variety of forms and that any specific structure and/or function described herein is merely illustrative. Based on the present disclosure, one skilled in the art will appreciate that one aspect described herein may be implemented independently of any other aspect, and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method practiced using any number of the aspects set forth herein. In addition, such apparatus may be implemented and/or such methods practiced using other structure and/or functionality in addition to one or more of the aspects set forth herein.
It should also be noted that the illustrations provided in the following embodiments merely illustrate the basic concept of the present invention by way of illustration, and only the components related to the present invention are shown in the drawings and are not drawn according to the number, shape and size of the components in actual implementation, and the form, number and proportion of the components in actual implementation may be arbitrarily changed, and the layout of the components may be more complicated.
In addition, in the following description, specific details are provided in order to provide a thorough understanding of the examples. However, it will be understood by those skilled in the art that the aspects may be practiced without these specific details.
Currently, RAG methods are broadly divided into two categories: single-step retrieval and multi-step retrieval. For simple tasks, or cases where the user's information needs are explicit, single-step retrieval is relatively efficient. However, for complex tasks or tasks involving lengthy text generation, such as long-form question answering, multi-hop reasoning and chain-of-thought reasoning, retrieval relying only on the user's initial input may not fully cover all the external knowledge required by the model. Multi-step retrieval methods can better cope with complex tasks by alternating retrieval and reasoning, but incur large computational overhead and low efficiency when processing simple queries.
IRCoT (Interleaving Retrieval with Chain-of-Thought Reasoning) proposes a strategy in which retrieval and reasoning are performed alternately, aiming to improve the ability to solve complex tasks by interleaving stepwise reasoning with information retrieval. The method enables the system to better cope with problems that require multi-step reasoning by retrieving relevant information again after each reasoning step. Although IRCoT performs well on complex reasoning tasks, its frequent retrieval incurs high computational costs, and the excess retrieval rounds are especially inefficient when facing simple queries.
RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval) handles long text and complex tasks by building a hierarchical tree structure. The method divides the document into a plurality of layers, and gradually generates abstract abstracts of each layer so as to help the model to process long text. RAPTOR is excellent in processing multi-step tasks that need to span multiple documents and levels, and can significantly improve the accuracy of multi-document questions and answers. However, the multi-level abstract generation process of RAPTOR lacks flexibility in the face of the change of query complexity, which may result in unnecessary multi-level processing, thereby increasing the computational burden of the system.
Adaptive-RAG proposes an adaptive framework for dynamically selecting the optimal retrieval strategy according to the complexity of the query. By training a complexity classifier, the method can flexibly switch among single-step retrieval, no retrieval and multi-step retrieval according to query complexity, effectively balancing the processing efficiency of complex and simple queries. This adaptive strategy significantly reduces the computational overhead of processing simple queries while maintaining efficient retrieval and generation capabilities for complex queries. However, Adaptive-RAG still has room for improvement in classifier accuracy; in particular, when complexity classification boundaries are not obvious, strategy selection may be inaccurate, affecting the overall performance of the model.
The embodiment of the invention provides a retrieval optimization method based on a hierarchical expert routing model and CoT reasoning, which can be applied to a large language model retrieval process of an Internet scene.
Referring to fig. 1, a flow diagram of a search optimization method based on a hierarchical expert routing model and CoT reasoning is provided in an embodiment of the present invention. As shown in fig. 1, the method mainly comprises the following steps:
step 1, dividing an original document in a layering way, and creating a layering knowledge base;
further, the step 1 specifically includes:
step 1.1, preprocessing an original document, tokenizing the text at sentence level by adopting the sent_tokenize method of NLTK, and identifying common sentence terminators and other punctuation to obtain a segmented sentence set;
step 1.2, dividing the sentence set into a top-level fragment set, a middle-level fragment set and a bottom-level fragment set by adopting a top-down layer-by-layer division strategy.
Further, the step 1.2 specifically includes:
Step 1.2.1, dividing the sentence set according to the number of tokens of the sentence set and a first threshold value to obtain a top-level fragment set;
step 1.2.2, dividing the top layer segment set according to a second threshold value to obtain a middle layer segment set;
step 1.2.3, dividing the middle segment set according to a third threshold value to obtain a bottom segment set;
step 1.2.4, establishing a parent-child mapping relation among the top layer fragment set, the middle layer fragment set and the bottom layer fragment set according to the data structures of the top layer fragment set, the middle layer fragment set and the bottom layer fragment set.
Further, the first threshold is greater than the second threshold, and the second threshold is greater than the third threshold;
the data structures of the top layer fragment set, the middle layer fragment set and the bottom layer fragment set all comprise fragment ids, fragment contents, fragment abstracts, fragment vectors, level identifiers and upper layer fragments.
In particular, the overall architecture of the method of the invention is mainly divided into three layers: a data preparation and preprocessing layer, a data retrieval layer and a large language model generation layer. At the same time, a novel integration of the hierarchical semantic library, the hierarchical expert routing model and Chain-of-Thought (CoT) reasoning is introduced to improve the retrieval process in the RAG system, as shown in fig. 2. In the data preparation and preprocessing layer, hierarchical document segmentation is a key step: a hierarchical knowledge base is created by dividing the document into a top layer, a middle layer and a bottom layer, so that key information can be accurately captured at different granularities during subsequent retrieval and retrieval efficiency is optimized. In the data retrieval layer, the hierarchical expert routing model dynamically selects the optimal hierarchical expert index through an intelligent routing mechanism, so that query analysis can select the most appropriate retrieval path according to the complexity of the problem. Finally, in the large language model generation layer, the fragments returned by the data retrieval layer are recursively optimized using the CoT reasoning mechanism, and answers with high precision and integrity are generated by gradually enriching the query context. The architecture provides a more efficient, flexible and intelligent solution for document retrieval in a multi-level information environment and helps improve retrieval performance in complex scenarios. The specific steps are as follows:
the hierarchical semantic library is the basis for constructing a hierarchical expert routing model. By partitioning the document into top, middle and bottom layers, the core content of each individual piece of each layer is condensed into digests, clustered, and thus built into a hierarchical library structure, as shown in FIG. 3.
Constructing a hierarchical knowledge base:
Hierarchical segmentation is a key step in building a hierarchical knowledge base. We propose a top-down hierarchical text segmentation method that ensures that the document maintains sentence and paragraph integrity during segmentation and segments the text into top, middle and bottom segments by setting multi-level token restrictions. Each segment is cut according to predefined token limits, ensuring that the top layer contains global information, the middle layer refines the context, and the bottom layer captures local detail information. By adopting the layer-by-layer refinement mode, the information fineness is improved, the system can flexibly select a proper granularity level for searching according to the complexity and the requirement of the query, and the problems that the document fragment length is difficult to determine, the information is lost and the searching flexibility is poor in the traditional RAG system are solved.
This approach progressively partitions the document top-down, ensuring that the segments of each hierarchy contain the entire contents of its lower layers, thereby preserving the integrity and traceability of the hierarchy. The specific process is as follows:
(1) Sentence boundary identification
Maintaining sentence integrity when hierarchically segmenting documents is critical to ensuring information readability and context consistency. First, text preprocessing is carried out: redundant spaces, special characters and the like are removed so that subsequent analysis proceeds smoothly. We use the sent_tokenize method of NLTK to tokenize the text at sentence level, identifying common sentence terminators (e.g., periods ".", question marks "?", exclamation marks "!") and other punctuation (e.g., quotation marks, brackets, etc.) to confirm the end of a sentence.
A sentence set S is generated for subsequent hierarchical processing:
S = {s_1, s_2, …, s_n};
where S represents the set of sentences after segmentation and s_i represents each sentence.
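As a self-contained illustration of the sentence boundary identification above, the sketch below approximates NLTK's sent_tokenize with a regular expression; this substitution is an assumption made only so the example requires no model download, and an actual implementation would call nltk.tokenize.sent_tokenize directly.

```python
import re

def split_sentences(text):
    """Tidy redundant whitespace, then split after common terminators (. ? !)."""
    text = re.sub(r"\s+", " ", text).strip()
    return [s for s in re.split(r"(?<=[.?!])\s+", text) if s]
```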
(2) Top-down split logic
In order to ensure that the segmentation logic of top, middle and bottom segments is strict and that the hierarchy maintains containment relationships (i.e., top contains middle and middle contains bottom), a top-down layer-by-layer segmentation strategy can be adopted, and the segmentation of each layer is refined according to the structure of the previous layer.
An initialization is first performed: an empty list is set up to store the current segment, and the current token count is initialized to zero. Each sentence in the text is traversed and its number of tokens is checked, and the token count of the sentence with the largest number of tokens is taken as the first threshold. If the total number of tokens would exceed the limit after the current sentence is added, the current sentence is moved to the next segment and a new segment is started; otherwise, the current sentence is added to the current segment and the token count is updated.
By dynamically judging the total number of tokens after each sentence is added to a segment, segments are divided reasonably, the information flow between segments remains natural, and sentence structure is not destroyed, thereby providing coherent context support for subsequent retrieval and analysis. Each layer of segmentation is described below:
① Top layer segmentation
The main goal of the segmentation of the top-level segments is to obtain the most global information, each segment does not exceed a first threshold, and a larger context span is maintained. The top-level segmentation is the starting point of the whole layering process, with the result that segments covering more global information.
C_top = Split(S, T_1);
where C_top represents the collection of top-level segments and T_1 represents the token threshold for each top-level segment.
② Middle layer segmentation
The goal of middle-layer segment segmentation is to further refine the top-layer information. The number of tokens of each middle-layer segment does not exceed a second threshold. After the top-level cut is completed, middle-level segments are subdivided from the content within the top-level segments, ensuring that each middle-level segment is a medium-granularity segment that retains more context.
C_mid = Split(C_top, T_2);
where C_mid represents the collection of middle-level segments and T_2 represents the token threshold for each middle-level segment.
③ Underlying segmentation
The segmentation of the underlying segments is the finest granularity hierarchy, focusing mainly on local specific information. The fragment length of the bottom layer does not exceed a third threshold, ensuring that the most detailed content is captured. The bottom layer segmentation is performed on the basis of the middle layer segments, which are again subdivided into more specific segments.
C_low = Split(C_mid, T_3);
where C_low represents the collection of bottom-level segments and T_3 represents the token threshold for each bottom-level segment.
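The top-down splitting described in ① to ③ can be sketched as follows. Counting tokens by whitespace splitting and representing segments as lists of sentences are simplifying assumptions, and the thresholds in the test are illustrative rather than values prescribed by the embodiment; what the sketch preserves is the key property that each layer refines the layer above it, so the containment relation (top contains middle, middle contains bottom) holds by construction.

```python
def tokens(text):
    # Simplifying assumption: one whitespace-delimited word = one token.
    return len(text.split())

def split_by_threshold(sentences, limit):
    """Greedily pack whole sentences into segments of at most `limit` tokens.

    Sentences are never broken; the limit is assumed to be at least as large
    as the longest single sentence."""
    segments, current, count = [], [], 0
    for s in sentences:
        n = tokens(s)
        if current and count + n > limit:
            segments.append(current)      # current sentence starts a new segment
            current, count = [], 0
        current.append(s)
        count += n
    if current:
        segments.append(current)
    return segments

def hierarchical_split(sentences, t1, t2, t3):
    """Top-down, layer-by-layer split with thresholds t1 > t2 > t3."""
    top = split_by_threshold(sentences, t1)
    middle = [split_by_threshold(seg, t2) for seg in top]
    bottom = [[split_by_threshold(m, t3) for m in mids] for mids in middle]
    return top, middle, bottom
```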
The data structure of each fragment is as follows:
① Fragment id (id): uniquely identifies each fragment, e.g., (top_1, middle_1, low_1).
② Fragment content (content): the original text content; stores the fragment information of the current hierarchy.
③ Fragment summary (summary): a short summary of the fragment, for fast reference during hierarchical switching.
④ Fragment vector (vector): the vectorized representation of the fragment summary, used for similarity calculation.
⑤ Hierarchy identifier (level): indicates the hierarchy (top, middle, low) to which the fragment belongs.
⑥ Upper-layer fragment (parent_id): indicates the upper-layer fragment to which the fragment belongs.
This naming scheme ensures that each fragment has a unique identifier and that the hierarchy and document it belongs to can be located quickly. Each fragment is connected to its upper-layer fragment through the parent_id field, so that higher-level information can be obtained: each bottom segment points to a middle segment, which in turn points to a top segment. Through this parent-child mapping relation, the system can gradually trace back from fine-grained information to higher-level information.
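The per-fragment data structure and the parent-child traceback described above can be sketched as follows; the field names mirror the six items listed (id, content, summary, vector, level, parent_id), while the helper name trace_up and the sample ids are illustrative.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Fragment:
    id: str                                       # e.g. "top_1", "middle_1", "low_1"
    content: str                                  # original text of this hierarchy level
    summary: str = ""                             # short summary for hierarchical switching
    vector: list = field(default_factory=list)    # vectorized summary, for similarity
    level: str = "low"                            # "top" | "middle" | "low"
    parent_id: Optional[str] = None               # upper-layer fragment

def trace_up(frag_id, index):
    """Follow parent_id links from a fragment back to its top-level ancestor."""
    chain = []
    while frag_id is not None:
        frag = index[frag_id]
        chain.append(frag.id)
        frag_id = frag.parent_id
    return chain
```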
Step 2, extracting and clustering each segment abstract of each layer in the hierarchical knowledge base, and creating a hierarchical semantic base;
On the basis of the above embodiment, the step 2 specifically includes:
step 2.1, generating abstracts corresponding to all fragments of each layer in the hierarchical knowledge base by using a large language model;
Step 2.2, vectorizing each abstract to obtain an embedded vector of each abstract;
step 2.3, clustering the vectorized abstracts of each layer by using the DBSCAN algorithm to construct a hierarchical semantic library.
In a specific implementation, as shown in fig. 4, the construction of the hierarchical semantic library not only simplifies the query process but also increases the overall concentration of information: by clustering the summaries of each layer, the system can aggregate information on similar subjects at different hierarchies into a more structured knowledge system, enabling it to respond quickly to user demands. The design of this structure aims to ensure that a user can quickly obtain relevant information during retrieval while avoiding information overload. The specific steps are as follows:
1. Digest generation
For each segment of each layer in the hierarchical knowledge base, gpt-3.5-turbo is used to generate a segment abstract, allowing the model to convert a large block of text into a concise and coherent summary of the selected segment. This provides representative abstract information at the different layers and guarantees the concentration and accuracy of the information.
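A hedged sketch of the summary-generation call is given below. Only the request payload is constructed: the model name gpt-3.5-turbo comes from the text above, while the prompt wording and the max_tokens value are illustrative assumptions, and actually sending the request (e.g., via an OpenAI client) is omitted so the sketch stays self-contained.

```python
def build_summary_request(fragment_text, max_tokens=128):
    """Build a chat-completion payload asking the model to summarize one fragment."""
    return {
        "model": "gpt-3.5-turbo",
        "messages": [
            {"role": "system",
             "content": "Condense the given passage into a concise, coherent summary."},
            {"role": "user", "content": fragment_text},
        ],
        "max_tokens": max_tokens,
    }
```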
2. Abstract clustering
After generating the abstracts of the fragments of each layer, clustering is carried out next to form a higher-level semantic knowledge base, and the core semantic information of the text is captured. The purpose of clustering is to classify similar summaries into the same group, thereby improving the organization and retrieval efficiency of information, and the clustering process comprises the following steps:
(1) Abstract vectorization
Before clustering, each abstract first needs to be vectorized to obtain its embedded representation. This step converts each digest into vector form using a pre-trained BERT embedding model. The BERT model learns deep semantic information from context through a bidirectional Transformer architecture and is suitable for capturing the complex meanings of text.
(2) Clustering algorithm selection
Suitable clustering algorithms, such as K-Means, hierarchical clustering or DBSCAN, are selected according to specific requirements and data characteristics.
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a typical density-based spatial clustering algorithm. In contrast to K-Means and BIRCH, which are generally applicable only to clustering convex sample sets, DBSCAN is applicable to both convex and non-convex sample sets. The algorithm divides regions of sufficient density into clusters and finds arbitrarily shaped clusters in a noisy spatial database, defining a cluster as the largest set of density-connected points.
The vectorized abstracts of each layer are clustered by using a DBSCAN algorithm to generate a plurality of semantic knowledge bases, such as a bottom semantic base_1, a bottom semantic base_2, a top semantic base_1 and a top semantic base_2. Each cluster represents a theme or semantic scene, a hierarchical semantic library is formed, and the abstract of the cluster center is reserved as the representative abstract of each semantic library, so that relevant information can be positioned and matched quickly in the retrieval process.
The core of the DBSCAN algorithm is to define density and neighborhood, and key parameters include:
Neighborhood definition: the ε-neighborhood of every point is calculated; for each point P in the data set, determine how many neighbors lie within its ε-neighborhood:
N(P) = {Q ∈ D | dist(P, Q) ≤ ε};
where |N(P)| is the number of points satisfying the condition and dist(P, Q) is the distance between points P and Q (e.g., the Euclidean distance).
Core point: the threshold for the number of neighbors is defined by the parameter MinPts; if |N(P)| is greater than or equal to MinPts, P is a core point and starts to form a new cluster, as shown below:
|N(P)| ≥ MinPts ⇒ P is a core point.
After clustering is completed, each cluster has a centroid, which is the average of all abstract vectors in the cluster. For each cluster C_k, the centroid μ_k is calculated as:
μ_k = (1 / |C_k|) Σ_{x_i ∈ C_k} x_i;
where |C_k| is the number of points in cluster C_k and x_i is each point in the cluster.
For each point x_i in cluster C_k, the distance d_i to the centroid μ_k is calculated:
d_i = dist(x_i, μ_k);
where the Euclidean distance is generally used:
d_i = sqrt( Σ_{m=1}^{n} (x_{i,m} − μ_{k,m})² );
where n is the dimension of the features, and x_{i,m} and μ_{k,m} are the values of x_i and the centroid on the m-th feature, respectively.
The distance between each abstract vector in the cluster and the centroid is calculated, and the abstract with the smallest distance is selected as the center abstract:
i* = argmin_{x_i ∈ C_k} d_i;
where i* is the index of the point closest to the centroid. The finally selected center abstract is:
c_k = x_{i*};
which means that, from cluster C_k, the summary x_{i*} closest to the centroid is selected as the "central abstract" of this cluster.
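The clustering formulas above can be exercised end to end with the compact, self-contained rendition below: ε-neighborhoods, the MinPts core-point test, a minimal DBSCAN, and selection of the member closest to each cluster centroid. Real digest vectors would come from BERT; the low-dimensional points in the test are stand-ins, and a production system would more likely use an off-the-shelf implementation such as scikit-learn's DBSCAN.

```python
import math

def dist(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def neighbors(points, i, eps):
    """Indices of all points within the eps-neighborhood of points[i]."""
    return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN; returns a cluster label per point (-1 = noise)."""
    labels = [None] * len(points)          # None = unvisited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(points, i, eps)
        if len(seeds) < min_pts:           # |N(P)| < MinPts: provisionally noise
            labels[i] = -1
            continue
        cluster += 1                       # P is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster        # noise reachable from a core: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = neighbors(points, j, eps)
            if len(j_seeds) >= min_pts:    # expand only through core points
                queue.extend(j_seeds)
    return labels

def center_summary(points, members):
    """Index of the member vector closest to the cluster centroid (the 'center abstract')."""
    centroid = [sum(points[i][m] for i in members) / len(members)
                for m in range(len(points[0]))]
    return min(members, key=lambda i: dist(points[i], centroid))
```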
Step 3, constructing a hierarchical index library by using each central abstract in the hierarchical semantic library and forming a hierarchical expert routing model by combining an intelligent routing mechanism;
On the basis of the above embodiment, the step 3 specifically includes:
matching the central summary of each cluster to similar summaries by using vector indexes, so that the clustered summaries form a hierarchical index library, and combining an intelligent routing mechanism to form a hierarchical expert routing model, wherein each index comprises a cluster center abstract and associated document fragments.
In a specific implementation, traditional information retrieval methods generally rely on a single-level indexing mechanism, and user queries cannot always be clearly classified as simple or complex, so such methods struggle to adapt to the different semantic granularities present in documents, especially in complex fields where the hierarchical characteristics of document information are more pronounced.
To this end, we propose a hierarchical expert routing model, whose overall architecture is shown in fig. 5. Its core idea is to combine hierarchical expert indexes with an intelligent routing mechanism to dynamically select the most suitable semantic hierarchy of segments for retrieval according to the complexity of the user's problem. The advantage of this method is that it can quickly locate the level relevant to the problem's semantics while using fewer computing resources, greatly improving retrieval efficiency and accuracy.
The hierarchical expert indexes are independently built in semantic libraries of different levels, so that the system can acquire detailed contents from refined information and can quickly capture macroscopic contexts.
The center digest of each cluster is quickly matched to similar digests using vector indexes (e.g., FAISS, Annoy, etc.). The clustered summaries form semantic library indexes of different levels as follows:
Top-level expert index: contains high-level abstracts of the whole document and is used for coarse-granularity retrieval; it is suitable for queries about the overall background or summary of the document, or for answering broad questions.
Middle-level expert index: contains more detailed abstracts and provides medium-granularity retrieval; it is suitable for answering questions that require context and support from multiple details, such as queries about technical processes, application scenarios, or multidimensional comparisons.
Bottom-level expert index: contains the finest-granularity fragment abstracts and provides precise document content; it is suitable for fine-grained, exact-match questions, such as specific technical parameters or simple fact queries.
Each index contains the following:
Cluster center abstract: the most representative abstract in each cluster.
Associated document fragments: pointers to the original text of all abstract fragments in the semantic library.
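A minimal in-memory sketch of this index layout; the field names (`center`, `fragments`) and the fragment labels are illustrative, not taken from the disclosure:

```python
# Hypothetical hierarchical index library: one entry per cluster,
# holding the cluster's central abstract and its associated fragments.
hier_index = {
    "top":    [{"center": "document-level overview", "fragments": ["f1", "f2"]}],
    "middle": [{"center": "process-level summary",   "fragments": ["f3"]}],
    "bottom": [{"center": "parameter details",       "fragments": ["f4", "f5"]}],
}

def fragments_for(level, cluster_id):
    """Resolve a (level, cluster) pair to its associated original fragments."""
    return hier_index[level][cluster_id]["fragments"]
```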
Step 4, determining a target level index matched with the problem in a level index library by using a level expert routing model;
On the basis of the above embodiment, the step 4 specifically includes:
step 4.1, obtaining a problem input by a user;
Step 4.2, part-of-speech tagging is carried out through a spaCy model, and keywords are extracted;
step 4.3, matching the keywords with a complexity keyword list and calculating a first Boolean value according to the matching;
step 4.4, extracting other entities in the problem and calculating a second Boolean value according to the extracted other entities;
Step 4.5, calculating a complexity score of the problem according to the first Boolean value and the second Boolean value;
step 4.6, comparing the complexity score with the complexity threshold corresponding to each level index in the level index library, and letting the intelligent routing mechanism assign the question to the most suitable level index in the level index library as the target level index.
In practice, as shown in fig. 6, in a large-scale knowledge base, computation and storage costs are often the key factors limiting system performance. General search methods typically traverse the entire knowledge base, often requiring significant time and memory resources. The intelligent routing mechanism preferentially selects relevant levels according to the characteristics and complexity of the question, reducing unnecessary computation and time overhead and lowering the overall resource requirements of the system, making it more efficient and scalable when processing large-scale data. In this way, the system can respond to user queries faster, improving the user experience.
1. Complexity determination
The core of the intelligent routing mechanism is context demand identification, which determines whether a user's question requires more context information by analyzing the complexity of the question. Complex questions are routed preferentially to hierarchical experts that provide more context information, while simple questions can obtain answers quickly at lower levels. Therefore, we propose a comprehensive complexity evaluation method: through keyword extraction, entity recognition, and similar techniques, input questions are scored for complexity, and the intelligent router can preferentially select a suitable hierarchical expert index according to similarity calculation, thereby completing information retrieval efficiently and accurately.
1. Keyword extraction
In the intelligent routing mechanism, keyword extraction is a core link for judging the complexity of the problem. By identifying important words in the user's question, the amount and complexity of contextual information that is required can be inferred. The accuracy of this step directly affects whether the system can select the most appropriate hierarchical expert index for retrieval and processing. Based on Natural Language Processing (NLP) technology, keyword extraction not only can automatically identify core words in a problem, but also can provide further complexity assessment by matching with a complexity keyword table.
In order to automatically judge the complexity of the problem, a part-of-speech tagging technology and a predefined common complexity keyword list are adopted for matching. This approach evaluates the complexity of the problem by identifying core words in the problem and then matching these words to common patterns in the complexity key word list.
(1) Part of speech tagging
Part-of-speech tagging is a technique for automatically labeling the grammatical role (e.g., verb, noun) of each word in a sentence. During keyword extraction, verbs and nouns often reflect the complexity of the question, especially for explanatory and inferential questions. Such questions typically contain words like "how" (indicating that the question requires a solution) or "why" (indicating that the question involves causal relationships), and nouns such as "cause" (indicating that a deep explanation is required) or "solution" (indicating that a specific solution is required).
① Loading spaCy model
First, language processing is performed using a pre-trained spaCy model. A common model such as zh_core_web_sm includes the tools necessary for part-of-speech tagging.
② Text processing and part-of-speech tagging
The question input by the user is passed into the model for part-of-speech tagging. spaCy automatically identifies each word in the sentence and assigns it a part-of-speech tag (e.g., verb, noun).
③ Extracting core vocabulary
In this step, we are not limited to extracting verbs and nouns, but also recognize other core words related to the complexity of the problem, such as adverbs, pronouns, and other information. These vocabularies can provide richer contextual cues to help the system more fully understand the structure and complexity of the problem. Nouns generally refer to specific objects or concepts related to a problem, adverbs often refer to the background or explanatory needs of the problem, and pronouns are often used to ask for specific details or background. By comprehensively extracting the core words, the system can obtain more dimensional information, and the understanding capability of the problem is further enhanced.
(2) Common complexity keyword table matching
To further determine the complexity of the question, we introduce a common complexity keyword table. The table contains a common set of complexity keywords such as "why", "how", "explain", and "analyze". Keywords in the table typically indicate that the question requires more contextual information or background interpretation.
After part-of-speech tagging is completed, the extracted core verbs, nouns, adverbs, and pronouns are matched against the keyword table. Keywords that match successfully are given higher complexity weights. For example, for the question "summarize the cause of the system failure", the system will recognize "summarize" (verb) and "cause" (noun) and match them to "summarize" and "cause" in the keyword table, thereby judging that the question has higher complexity.
Keyword representation example:
Pronouns: which, who, where, etc.
Verbs: solve, generate, analyze, generalize, summarize, introduce, etc.
Nouns: cause, scenario, logic, problem, background, theory, time, etc.
Adverbs: when, why, how, mainly, approximately, etc.
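Assuming part-of-speech tagging has already run upstream (e.g., via spaCy), the table match itself reduces to a set lookup. The bucket contents below are an abbreviated stand-in for the full keyword table:

```python
# Abbreviated complexity keyword table, keyed by part of speech.
KEYWORD_TABLE = {
    "pronoun": {"which", "who", "where"},
    "verb":    {"solve", "analyze", "summarize", "introduce"},
    "noun":    {"cause", "scenario", "background", "theory"},
    "adverb":  {"when", "why", "how"},
}

def matched_keywords(tokens):
    """Return the extracted core words that appear in any bucket of the table
    (the Boolean match: 1 if the keyword is in the table, 0 otherwise)."""
    table = set().union(*KEYWORD_TABLE.values())
    return [t for t in tokens if t in table]
```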
In order to perform keyword matching and complexity calculation, a Boolean matching method is adopted, that is, whether each extracted keyword appears in the keyword table is judged. The corresponding formula is:
$$\mathrm{match}(k_i) = \begin{cases} 1, & k_i \in K \\ 0, & k_i \notin K \end{cases}$$
wherein $\mathrm{match}(k_i)$ is a Boolean value indicating whether keyword $k_i$ appears in the keyword table $K$: its value is 1 when the keyword is present and 0 when it is not.
2. Other entity extraction
In addition to the basic keywords described above, extracting other entities (e.g., person names, organizations, specific times, places, etc.) is also important in Natural Language Processing (NLP) tasks. In particular, when a question includes person names, organizations, specific times, or places, these entities generally mean that the question has a clear context and does not require additional background information to answer, so the system preferentially retrieves from the fine-grained bottom-level segments. Let an extracted entity be $e_j$ and the corresponding entity set be $E$; the matching formula is:
$$\mathrm{match}(e_j) = \begin{cases} 1, & e_j \in E \\ 0, & e_j \notin E \end{cases}$$
wherein $e_j$ is an extracted entity.
3. Complexity calculation
For the complexity calculation of each question, combining the matching of the part-of-speech-tagged keywords against the keyword table with the extracted entities, an effective complexity calculation formula is given:
$$C = \sum_{i=1}^{n} w_i \cdot \mathrm{match}(k_i) - \sum_{j=1}^{m} w_e \cdot \mathrm{match}(e_j)$$
Through this formula, the system can comprehensively consider the different factors, accurately judge the complexity of the question, and select the optimal processing strategy accordingly. Wherein $w_i$ is the weight corresponding to keyword $k_i$, indicating that keyword's contribution to the complexity, $n$ is the total number of extracted keywords, $w_e$ is the entity weight, $e_j$ is an extracted entity, and $m$ is the total number of entities.
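Combining the keyword and entity matches, the complexity score might be computed as follows. The weights and the sign convention (entities lowering the score, since explicit entities suggest a self-contained question routed to the bottom level) are illustrative assumptions, not values from the disclosure:

```python
# Assumed per-keyword weights w_i and a uniform entity weight w_e.
KEYWORD_WEIGHTS = {"why": 1.0, "how": 1.0, "analyze": 0.8, "explain": 0.8, "cause": 0.6}
ENTITY_WEIGHT = 0.5

def complexity_score(keywords, entities):
    """C = sum_i w_i * match(k_i) - sum_j w_e * match(e_j): matched complexity
    keywords raise the score; named entities (clear context) lower it."""
    kw_part = sum(KEYWORD_WEIGHTS.get(k, 0.0) for k in keywords)
    return kw_part - ENTITY_WEIGHT * len(entities)

high = complexity_score(["why", "cause"], [])            # broad causal question
low = complexity_score([], ["ACME Corp", "2021-06-01"])  # entity-anchored fact lookup
```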
2. Routing decisions
In the intelligent routing mechanism, according to the complexity score of the problem, the system can select corresponding hierarchical expert indexes according to different complexities of the problem, so that the retrieval efficiency and accuracy are improved. The following are routing strategies based on complexity optimization:
1. Bottom expert index priority:
For low complexity queries, particularly queries containing explicit entities such as precise time, place, name, or organization, the system will look up preferentially from the underlying expert index. These underlying expert indexes contain the finest granularity of semantic segments, enabling processing of highly accurate matching queries. This strategy works well for those problems that do not require additional context or background interpretation.
2. Middle expert index priority:
When the complexity of the query is high, including complexity keywords (e.g., "cause," "interpret," "analyze," etc.), or context support is required, the system will choose a middle level expert index. The middle expert index contains more semantic relationships and background information, is suitable for processing questions requiring wider background or multi-angle interpretation, and provides medium-granularity answers.
3. Top-level expert index priority:
For high complexity queries, such as those involving keywords such as "why", "how", etc., and the problem requires the system to provide more background or extensive information, the system will first access the top-level expert index. The top-level expert index provides overall overview and global information of the document, can provide answers for extensive or complex high-level queries, and is suitable for processing problems involving macroscopic backgrounds, overall summaries, or multi-step reasoning.
The complexity score $C$ is the core in determining the question's routing hierarchy. Based on this score, the system can intelligently select the most appropriate hierarchical expert index, with the following specific rule:
$$\mathrm{Route}(C) = \begin{cases} \text{Bottom}, & C < \theta_{1} \\ \text{Middle}, & \theta_{1} \le C < \theta_{2} \\ \text{Top}, & C \ge \theta_{2} \end{cases}$$
wherein $\theta_{1}$ and $\theta_{2}$ are the complexity thresholds of the middle-level and top-level expert indexes respectively, and $\mathrm{Route}(C)$ represents the corresponding hierarchical expert index.
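The routing rule is then a pair of threshold comparisons; the threshold values below are placeholders, not values from the disclosure:

```python
THETA_MID, THETA_TOP = 1.0, 2.0  # assumed complexity thresholds

def route(score):
    """Map a complexity score to the expert index that should handle it."""
    if score >= THETA_TOP:
        return "top"
    if score >= THETA_MID:
        return "middle"
    return "bottom"
```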
Step 5, similarity calculation is carried out on the vectorized problem and the target level index;
On the basis of the above embodiment, the step 5 specifically includes:
step 5.1, converting the question into a high-dimensional vectorized representation using a pre-trained language model:
$$v_q = \mathrm{BERT}(q)$$
where BERT represents a pre-trained model for question vectorization and $q$ represents the question;
step 5.2, converting the central abstract in the target level index into a high-dimensional vectorized representation:
$$v_{s_i} = \mathrm{BERT}(s_i)$$
wherein $s_i$ represents the central abstract of the $i$-th semantic library;
step 5.3, calculating the similarity between the question vector and each central abstract vector:
$$\mathrm{sim}(v_q, v_{s_i}) = \frac{v_q \cdot v_{s_i}}{\|v_q\| \, \|v_{s_i}\|}$$
In practice, once the user's question has been routed, after complexity evaluation by the intelligent router in the hierarchical expert routing model, to the corresponding hierarchical expert index, the system enters the preliminary retrieval stage. The core objective of this process is to vectorize the user's question and perform similarity calculation against the hierarchical expert index selected by the intelligent router; the semantic knowledge base corresponding to the central abstract vector with the highest similarity is the target semantic knowledge base. This step ensures that the system can accurately match the most relevant knowledge base, thereby returning all relevant original document fragments under the target semantic knowledge base.
1. User problem vectorization
In order to perform similarity calculation with the hierarchical expert index, the user's question must first be vectorized. We use the pre-trained language model BERT to convert the natural language question into a high-dimensional vector representation. Let the user's question be $q$; its vectorization is expressed as:
$$v_q = \mathrm{BERT}(q)$$
where BERT represents a pre-trained model for question vectorization. BERT converts the input tokens of question $q$ into the output semantic vector $v_q$, which captures the core semantic information of the question.
2. Central abstract vector representation of semantic knowledge base
The semantic knowledge base of each hierarchy is constructed from abstract clusters, with the central abstract of each cluster representing the core semantic content of the knowledge base. In the preliminary retrieval process, the vector of the user's question is compared for similarity with the vector of each cluster's central abstract. Let the central abstract of the $i$-th semantic knowledge base be $s_i$; its vectorization is expressed as:
$$v_{s_i} = \mathrm{BERT}(s_i)$$
BERT is likewise used to convert the abstract $s_i$ into the vector $v_{s_i}$, which captures the semantic information of the knowledge base's central abstract.
3. Similarity calculation
In order to calculate the similarity between the user's question and the central abstract of each semantic knowledge base, we use the cosine similarity measure. The calculation formula of $\mathrm{sim}(v_q, v_{s_i})$ is:
$$\mathrm{sim}(v_q, v_{s_i}) = \frac{v_q \cdot v_{s_i}}{\|v_q\| \, \|v_{s_i}\|}$$
4. selecting semantic knowledge base with highest similarity
The system calculates the similarity between the user question vector $v_q$ and each central abstract vector $v_{s_i}$ indexed at the selected hierarchy, and selects the semantic knowledge base with the highest similarity. If the hierarchy has $N$ semantic knowledge bases, the knowledge base with the highest similarity is selected as:
$$i^{*} = \arg\max_{i \in \{1, \dots, N\}} \mathrm{sim}(v_q, v_{s_i})$$
The system will select the semantic knowledge base with the highest similarity as the target semantic knowledge base around which subsequent searches will be performed.
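Replacing the BERT embeddings with toy vectors, steps 5.1 through the final argmax reduce to a few lines of Python; the vectors below are illustrative stand-ins:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity: sim(v_q, v_si) = (v_q . v_si) / (||v_q|| ||v_si||)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def best_knowledge_base(q_vec, center_vecs):
    """Return i* = argmax_i sim(v_q, v_si) over the hierarchy's center vectors."""
    return max(range(len(center_vecs)), key=lambda i: cosine(q_vec, center_vecs[i]))

# Toy stand-ins for BERT(q) and three central-abstract vectors.
q_vec = (1.0, 0.0)
centers = [(0.0, 1.0), (1.0, 0.2), (0.5, 0.5)]
# center 1 is most aligned with the question vector
```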
Step 6, returning the segment of the original document under the hierarchical semantic library with the highest similarity;
Further, the step 6 specifically includes:
A semantic library corresponding to the highest similarity is selected, and all relevant document fragments under that knowledge base are returned through the index.
In particular, once the best-matching semantic knowledge base is determined, the system returns all relevant document fragments under that knowledge base via the index. These document fragments may originate from different parents at the same hierarchy level, but because they are semantically highly similar they are grouped into the same cluster; the system can thus aggregate scattered fragments in a similar semantic space, ensuring that the returned information is as complete as possible and relevant to the user's question.
The flexible retrieval mechanism can ensure that the system dynamically balances local detail and global information according to the complexity of the problem. By introducing a multi-level indexed structure, the system not only improves the response capability to accurate queries, but also can address a broad range of problems requiring more context.
And 7, driving the hierarchical switching by utilizing CoT reasoning, and finally returning the combined fragments to the large model to generate answers corresponding to the questions.
On the basis of the above embodiment, the step 7 specifically includes:
step 7.1, analyzing the relevance among the document fragments by CoT reasoning and deleting irrelevant document fragments;
step 7.2, judging whether all the current document fragments can output answers corresponding to the questions, if so, executing the step 7.4, and if not, executing the step 7.3;
Step 7.3, if the information provided by all the current document fragments is missing, automatically searching the document fragments at the upper layer according to the parent-child mapping relation of the fragments, and repeating the step 7.2 until all the current document fragments can output answers corresponding to the questions;
And 7.4, merging all the document fragments, removing the repeated information and generating answers corresponding to the questions.
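Steps 7.1 through 7.4 can be sketched as a loop that climbs the parent-child mapping until the fragments suffice. Here `can_answer` stands in for the large model's CoT completeness judgment, and all names and fragment labels are illustrative:

```python
def cot_retrieve(fragments, parent_of, can_answer, max_hops=3):
    """Climb the hierarchy while the current fragments cannot answer the
    question (steps 7.2-7.3), then merge the survivors with duplicates
    removed (step 7.4)."""
    current = list(fragments)
    for _ in range(max_hops):
        if can_answer(current):
            break
        # Step 7.3: replace fragments by their parents via the mapping.
        current = sorted({parent_of.get(f, f) for f in current})
    seen, merged = set(), []
    for f in current:  # Step 7.4: merge, dropping repeats in order.
        if f not in seen:
            seen.add(f)
            merged.append(f)
    return merged

# Two bottom-level fragments share the middle-level parent "m1".
parents = {"b1": "m1", "b2": "m1"}
answer_frags = cot_retrieve(["b1", "b2"], parents, lambda fr: "m1" in fr)
```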
In practice, in the field of artificial intelligence, especially Natural Language Processing (NLP), Prompt design and optimization are critical to improving model performance. In recent years, Chain of Thought (CoT) has, as an emerging Prompt design strategy, gradually attracted extensive attention from researchers and developers.
Most chain-of-thought approaches break a complex problem down into a number of simple sub-problems or steps by introducing a series of stepwise reasoning steps into the Prompt, guiding the model to think along a logic chain. Although this improves the accuracy and interpretability of the output to some extent, the complex reasoning process brings its own problems: in step-by-step reasoning, errors at each step may accumulate in subsequent steps; for simple logical questions, the traditional CoT mode may cause unnecessary computational waste, generating excessive intermediate steps that affect overall efficiency; and on devices with limited computing resources, the complex reasoning process may require more resources, causing performance degradation.
Therefore, we propose a novel CoT that introduces a CoT mechanism over all original fragments under the hierarchical semantic library selected by the hierarchical expert routing model, simplifying the reasoning process. Through repeated merging and hierarchical switching, the system's ability to solve complex problems is greatly enhanced; the large model evaluates fragment integrity, mergeability, and the retrieval path through CoT reasoning, so that the system can more quickly generate complete answers that meet the user's needs, reducing the risk of error accumulation, achieving efficient and concise information processing, and lowering the consumption of computing resources.
1. Prompt design
The Prompt design in our CoT reasoning process ensures that the system can evaluate fragment integrity step by step and decide whether to merge or switch hierarchy. The following is the Prompt for this method:
You are an expert in the retrieval system. According to the user query, judge whether the current fragments can completely answer the question "{user query}". Decide whether another step is needed or whether you are ready to give the final answer. Respond in JSON format: {"title": "...", "content": "...", "next_action": "..."}, where the value of "next_action" is either "continue" or "final_answer".
The following steps are needed to be completed:
1. if there are multiple relevant segments, please analyze the relevance of the segments to the user query to determine if they need to be merged. If necessary, the relevant segments are combined and duplicate information is removed.
2. It is determined whether the current segment is sufficiently complete. If incomplete or scattered, the tag needs to retrieve upper layer information. The upper layer segment { parent_id } is queried and this process is repeated until the top layer is reached or a complete answer is obtained.
3. Finally, interpret your reasoning and indicate if higher level information is needed.
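The JSON contract in the Prompt above makes the control loop straightforward to drive programmatically; a sketch of parsing one model reply (the reply string is fabricated for illustration):

```python
import json

def next_action(model_reply):
    """Parse the JSON reply mandated by the Prompt and return the control
    decision: "continue" (retrieve more) or "final_answer" (stop)."""
    obj = json.loads(model_reply)
    action = obj["next_action"]
    if action not in ("continue", "final_answer"):
        raise ValueError(f"unexpected next_action: {action}")
    return action

reply = '{"title": "Fragment integrity", "content": "...", "next_action": "continue"}'
```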
2. Specific steps of CoT reasoning search
As shown in fig. 7, the system can effectively process complex queries, automatically merge relevant fragments, progressively retrieve higher-level information, and finally generate consistent and accurate answers based on fragment merging and hierarchical retrieval of CoT. The method not only improves the retrieval efficiency, but also remarkably improves the query experience of the user, and the specific steps are as follows:
1. Fragment correlation analysis and merging
The system analyzes the relevance between the fragments through CoT reasoning, and focuses on evaluating whether the fragments can provide relevant information for the user problem. If the content of the plurality of segments has strong semantic relevance and can jointly interpret the user's query, the system will determine that the segments can be combined and delete redundant segments. The integrity of the information is improved by intelligent merging of the relevant fragments. This allows the CoT to be not just a simple reasoning, but rather to construct a more complete answer.
2. Fragment integrity determination
Because some pieces of information are scattered or lack context, it may not be possible to answer the user's question directly. In this step, the system evaluates whether the current segment content can completely answer the user's query and judges whether more global background support is needed. For example, if the bottom-level segments contain detail information but lack the corresponding context, the CoT inference will conclude that upper-level global information is needed to provide the associations between the segments.
3. Progressive hierarchical retrieval
If the system detects that certain segments only provide partial information (e.g., the background information is insufficient or the interpretation is unclear), the system will automatically retrieve the upper level document segments based on the parent node information of those segments, and repeat step 2.
By such recursive search, more context information can be gradually supplemented. The system will progressively search up the hierarchy until the top level is reached or a segment is found that can answer the question in its entirety. Such a progressive search strategy may ensure that the upper layer information is fully utilized when needed, rather than simply relying on the underlying fine-grained segments.
4. Final answer generation
After the system acquires enough fragments through hierarchical switching, all the fragment contents are finally combined, repeated information is removed, and a complete answer is generated.
3. CoT reasoning example
Let us assume that the user asks: "What caused the system failure and how can it be solved?"
1. First step, preliminary search
The system finds 3 relevant fragments in the middle-level semantic library, but their content is relatively scattered. Through correlation analysis, the fragments are confirmed to be interrelated.
Title: "Fragment correlation analysis and merging"
Content: "There is a strong correlation between segments 1 and 2, both of which relate to the system failure; the system attempts to merge the segments."
Next_action: "continue"
2. Second step, segment integrity determination
Segment 1 explains part of the system failure but lacks a solution; segment 2 details the solution. Through analysis, the segments lack global context information.
Title: "Fragment integrity determination"
Content: "Fragments 1 and 2, when combined, provide part of the context and solution information, but lack an analysis of the overall failure cause."
Next_action: "continue"
3. Third step, searching to the upper layer
Because the middle-level segments cannot fully answer the question, the system retrieves from the upper level. The upper-layer segment contains a more comprehensive explanation of the cause of the system failure, and a complete answer is obtained after merging.
Title: "Retrieval to the upper layer"
Content: "By retrieving the top-level fragment, the system supplements a comprehensive analysis of the cause of the fault. The merged fragments can answer the question completely."
Next_action: "final_answer"
4. Final result
The system determines that the merged segments can now fully answer the user's question and returns the final answer; the CoT reasoning example is plotted in fig. 8.
According to the retrieval optimization method based on the hierarchical expert routing model and the CoT reasoning, the document is subjected to hierarchical segmentation to construct a hierarchical semantic library and a hierarchical expert index, so that multi-level information integration from fine granularity to global background is realized, and the system can flexibly cope with queries with different complexity. Specifically, an intelligent routing mechanism is designed to dynamically optimize a query retrieval path according to the complexity of the query. In addition, a novel Chain-of-Thought (CoT) reasoning mechanism is introduced, and the traditional CoT reasoning process is simplified by judging whether to merge fragments or perform hierarchical switching, so that unnecessary calculation expenditure is reduced when simple inquiry is processed, the capability of the system when the complex multi-step problem is processed is improved, and the obvious improvement of the retrieval efficiency and the generation quality is realized. The RAG not only can effectively solve the problems of information loss and insufficient context understanding, but also can realize reasonable allocation and efficient utilization of resources in query tasks with different complexity.
Specifically, the hierarchical semantic library construction method based on hierarchical document processing is provided, documents are divided into three layers of a top layer, a middle layer and a bottom layer according to granularity requirements, semantic information of different layers is effectively organized, a system can intelligently select different granularity layers to search according to query requirements, the problem that the traditional RAG segmentation length is difficult to flexibly adjust is solved, and response speed and accuracy of user query are improved.
The hierarchical expert routing model is provided, the type analysis is carried out on the query through a natural language processing technology, the system can intelligently select proper hierarchical experts as initial retrieval according to analysis results, the inefficient operation of indifferently traversing all document fragments in the traditional method is avoided, and the query path is optimized.
The method and the device introduce Chain-of-Thought (CoT) reasoning into the retrieval process, dynamically adjust the retrieval depth by utilizing the reasoning capability of the CoT, and switch efficiently between different levels, so that they can not only merge the refined information of the bottom layer but also trace back to a higher level to provide comprehensive context support, realizing self-adaptive adjustment of the retrieval path and improving the processing capability for complex queries.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the scope of the present invention should be included in the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.