Disclosure of Invention
The embodiment of the invention provides a document generation method, a document generation system and a document generation medium based on a large language model, so as to improve document generation efficiency.
In order to achieve the above object, an embodiment of the present invention provides a document generation method based on a large language model, including:
Acquiring document input, parsing the document input, extracting a first question and first knowledge, and obtaining first document data;
according to the first document data, performing query indexing in a knowledge base to obtain a plurality of pieces of information to be processed, wherein the knowledge base is an information resource base comprising historical document knowledge and technical document knowledge;
according to the information to be processed, asking questions of the large language model to obtain a first response;
and generating a first document according to the first document data, the plurality of pieces of information to be processed and the first response, wherein the first document comprises a product requirement document and a software requirement document.
Compared with the prior art, the embodiment of the invention has the beneficial effects that:
The method and the device acquire and parse document input to realize automatic processing of the document, which lightens the burden of processing documents manually. By searching and indexing the knowledge base with the first document data, information related to the document is effectively screened out from a large amount of information, improving the efficiency of information retrieval. By integrating the first document data, the information to be processed and the first response, information acquired from different sources is processed in a unified manner, facilitating comprehensive analysis and the generation of new documents. By interacting with the large language model to ask questions about the information to be processed, deeper and more complex responses are acquired, which facilitates the handling of complex problems and provides more detailed information.
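For orientation only, the four claimed steps may be sketched in Python. Every helper below is a toy placeholder, not the claimed implementation; a real system would substitute a document parser, a vector index and an actual large language model:

```python
# Toy end-to-end outline of the four claimed steps (all helpers are placeholders).
def parse(document_input):
    """Step 1 (toy): text before '?' is the first question, the rest is knowledge."""
    question, _, knowledge = document_input.partition("?")
    return question.strip() + "?", knowledge.strip()

def query_index(first_knowledge, knowledge_base):
    """Step 2 (toy): keep entries sharing any word with the first knowledge."""
    words = set(first_knowledge.lower().split())
    return [e for e in knowledge_base if words & set(e.lower().split())]

def build_prompt(pending, first_question):
    """Step 3 (toy): combine pending information and the first question."""
    return "Context: " + "; ".join(pending) + "\nQ: " + first_question

def generate_document(document_input, knowledge_base, llm):
    first_question, first_knowledge = parse(document_input)      # step 1
    pending = query_index(first_knowledge, knowledge_base)       # step 2
    first_response = llm(build_prompt(pending, first_question))  # step 3
    # Step 4: assemble the first document from the three information sources.
    return "# Requirement\n%s\n\n# Answer\n%s" % (first_question, first_response)

doc = generate_document(
    "What should login do? password entry and lockout",
    ["password rules for login", "billing export format"],
    llm=lambda prompt: "Login requires password entry with lockout.",
)
```

The stub `llm` callable stands in for the large language model; the later optional features (vector retrieval, templated questioning, response scoring) refine steps 2 and 3.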
Optionally, the query indexing is performed in the knowledge base according to the first document data to obtain a plurality of pieces of information to be processed, specifically:
Performing feature extraction and text embedding on the first knowledge in the first document data to obtain a first vector;
and carrying out query indexing in the knowledge base according to the first vector to obtain a plurality of pieces of information to be processed.
By implementing this alternative, the document data can be expressed, through feature extraction and text embedding, as a vector carrying more semantic information, so that query accuracy is improved and the obtained information to be processed is ensured to better match the content and context of the document.
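As a minimal illustration of this alternative, the first knowledge may be embedded into a normalized vector. The bag-of-words scheme and vocabulary below are deliberately simple stand-ins for a real embedding model:

```python
import math

def embed_text(text, vocab):
    """Toy bag-of-words embedding: one dimension per vocabulary term,
    normalized to unit length so cosine comparison reduces to a dot product."""
    tokens = text.lower().split()
    vec = [float(tokens.count(term)) for term in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec

# Hypothetical vocabulary; a learned embedding model would replace this.
vocab = ["document", "requirement", "login", "user", "export"]
first_vector = embed_text("user login requirement document", vocab)
```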
Optionally, the query indexing is performed in the knowledge base according to the first vector to obtain a plurality of pieces of information to be processed, specifically:
Calculating the cosine of the angle between the first vector and each piece of knowledge base information in the knowledge base to perform cosine similarity comparison, so as to obtain a plurality of first scores;
and extracting knowledge base information with the corresponding first score larger than or equal to a preset first threshold value to obtain a plurality of pieces of information to be processed.
Implementing this alternative allows finer information filtering through cosine similarity comparison, ensuring that only knowledge base information sufficiently similar to the document data is extracted, which helps reduce mismatches and improves the accuracy of information retrieval. Setting a preset threshold enables personalized adjustment of the matching degree; the threshold can be flexibly adjusted according to requirements so as to balance the accuracy of the retrieval results.
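A minimal sketch of this cosine-similarity filtering follows; the example knowledge base entries and the threshold value are illustrative assumptions:

```python
import math

def cosine_similarity(a, b):
    """First score: cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(first_vector, knowledge_base, first_threshold):
    """Keep knowledge base entries whose first score meets the preset first threshold."""
    return [(text, cosine_similarity(first_vector, vec))
            for text, vec in knowledge_base
            if cosine_similarity(first_vector, vec) >= first_threshold]

# Illustrative knowledge base: (text, embedding vector) pairs.
kb = [("login requirements", [1.0, 0.9, 0.0]),
      ("billing module",     [0.0, 0.1, 1.0])]
pending = retrieve([1.0, 1.0, 0.0], kb, first_threshold=0.5)
```

Raising `first_threshold` trades recall for precision, which is the "personalized adjustment of matching degree" described above.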
Optionally, asking the large language model according to the plurality of pieces of information to be processed to obtain a first response, which specifically includes:
Filling a preset template to be processed according to the plurality of pieces of information to be processed and the first question in the first document data to obtain a first question template, wherein the template to be processed is a template frame for asking questions of the large language model;
and asking questions of the large language model according to the first question template to obtain a first response, wherein the first response is an answer output by the large language model according to the first question template.
By implementing this alternative, the preset template to be processed provides a structured method for generating questions, so that the form of the questions is more standard and easier to process, ensuring the consistency and interpretability of the questions.
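The template-filling step may be sketched as follows. The template text and the `ask_llm` stub are illustrative placeholders, not the claimed implementation:

```python
# Illustrative preset template; the wording is an assumption, not the claimed frame.
TEMPLATE = ("Background knowledge:\n{background}\n\n"
            "Question: {question}\n"
            "Answer using only the background knowledge above.")

def build_first_question_template(pending_info, first_question):
    """Fill the preset template with retrieved information and the first question."""
    background = "\n".join("- " + item for item in pending_info)
    return TEMPLATE.format(background=background, question=first_question)

def ask_llm(prompt):
    """Stub standing in for a real large language model call."""
    return "[model answer to a prompt of %d characters]" % len(prompt)

prompt = build_first_question_template(
    ["historical login spec", "technical auth note"],
    "What are the login requirements?")
first_response = ask_llm(prompt)
```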
Optionally, after the asking the large language model to obtain the first response, the method further includes:
calculating the score of the first response according to the first response and the knowledge base information to obtain a second score;
updating the first response according to the second score until the second score corresponding to the first response is greater than or equal to a preset second threshold value, so as to obtain an updated first response;
and adding the updated first response to the knowledge base to update the knowledge base.
By implementing the present alternative, the quality of the first response may be more accurately assessed by introducing the first evaluation criterion, helping to filter out inaccurate or irrelevant information.
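A toy sketch of this score-and-regenerate loop follows; scoring by keyword overlap is a stand-in for whatever evaluation criterion is actually used, and the regeneration callable is a placeholder for re-asking the large language model:

```python
# Toy scoring and regeneration loop (evaluation criterion is illustrative).
def score_response(response, kb_texts):
    """Second score (toy): fraction of knowledge base terms echoed in the response."""
    terms = {w for text in kb_texts for w in text.lower().split()}
    hits = sum(1 for w in terms if w in response.lower())
    return hits / len(terms) if terms else 0.0

def refine(initial_response, kb_texts, regenerate, second_threshold=0.5,
           max_rounds=5):
    """Regenerate until the second score reaches the preset second threshold."""
    response = initial_response
    for _ in range(max_rounds):
        if score_response(response, kb_texts) >= second_threshold:
            break
        response = regenerate(response)  # e.g. re-ask the LLM with feedback
    return response

kb_texts = ["login requires password"]
final = refine("draft", kb_texts,
               regenerate=lambda r: r + " login requires password")
```

The `max_rounds` cap is an assumption added so the loop always terminates even if the threshold is never reached.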
Optionally, after parsing the document input, extracting the first question and the first knowledge, and obtaining the first document data, the method further includes:
and updating the knowledge base according to the first document data.
By implementing this alternative, the knowledge base can be synchronized in real time by updating it according to the newly parsed document data, which helps ensure that the information in the knowledge base is always consistent with the latest document data.
Correspondingly, the embodiment of the invention also provides a document generation system based on the large language model, which comprises an analysis module, an index module, a questioning module and a text forming module;
the analysis module is used for acquiring document input, parsing the document input, extracting a first question and first knowledge, and obtaining first document data;
The index module is used for performing query indexing in a knowledge base according to the first document data to obtain a plurality of pieces of information to be processed, wherein the knowledge base is an information resource base comprising historical document knowledge and technical document knowledge;
The questioning module is used for questioning the large language model according to the plurality of pieces of information to be processed to obtain a first response;
The text forming module is used for generating a first document according to the first document data, the plurality of pieces of information to be processed and the first response, wherein the first document comprises a product requirement document and a software requirement document.
Optionally, the index module further comprises a preprocessing unit and a query unit;
the preprocessing unit is used for performing feature extraction and text embedding on the first knowledge in the first document data to obtain a first vector;
and the query unit is used for performing query indexing in the knowledge base according to the first vector to obtain a plurality of pieces of information to be processed.
Optionally, the query unit further comprises a comparison subunit and an extraction subunit;
the comparison subunit is configured to compare the first vector with each piece of knowledge base information in the knowledge base by calculating the cosine of the angle between them, so as to obtain a plurality of first scores;
The extraction subunit is used for extracting knowledge base information with a corresponding first score greater than or equal to a preset first threshold value to obtain a plurality of pieces of information to be processed.
Optionally, the questioning module further comprises a filling unit and a response unit;
The filling unit is used for filling a preset template to be processed according to the information to be processed and the first question in the first document data to obtain a first question template, wherein the template to be processed is a template frame used for asking questions of the large language model;
The response unit is used for asking questions of the large language model according to the first question template to obtain a first response, wherein the first response is an answer output by the large language model according to the first question template.
Optionally, the questioning module further comprises a scoring unit, a correction unit and an updating unit;
The scoring unit is used for calculating the score of the first response according to the first response and the knowledge base information to obtain a second score;
The correction unit is used for updating the first response according to the second score until the second score corresponding to the first response is greater than or equal to a preset second threshold value, so as to obtain an updated first response;
The updating unit is used for adding the updated first response into the knowledge base so as to update the knowledge base.
Optionally, after the analysis module parses the document input, extracts a first question and first knowledge, and obtains first document data, the knowledge base is further updated according to the first document data.
Correspondingly, the embodiment of the invention also provides a computer readable storage medium comprising a stored computer program, wherein, when running, the computer program controls a device where the computer readable storage medium is located to execute the large language model-based document generation method described above.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be understood that the step numbers used herein are for convenience of description only and are not limiting as to the order in which the steps are performed.
It is to be understood that the terminology used in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
The terms "comprises" and "comprising" indicate the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The term "and/or" refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
The document generation method, system and medium based on the large language model provided by the embodiments of the invention can be used for intelligent document generation, in which a corresponding document is generated by the large language model according to the input document content and related information. They can also be used for knowledge base management, continuously learning and improving the knowledge base by parsing new document input and updating the knowledge base; for building an automatic question-and-answer system that generates corresponding answers using the knowledge base and the language model; for text similarity matching scenarios, in which feature extraction and text embedding are performed on document data and cosine similarity comparison is used to find knowledge base information related to the input document; and for building an automatic evaluation system that scores the first response and updates it based on preset evaluation criteria.
Embodiment one:
referring to fig. 1, an embodiment of the present invention provides a document generating method based on a large language model, including:
S10, acquiring document input, parsing the document input, extracting a first question and first knowledge, and obtaining first document data.
In a specific implementation, after parsing the document input and extracting a first question and first knowledge to obtain first document data, the method further comprises updating a knowledge base according to the first document data, wherein the knowledge base is an information resource base comprising historical document knowledge and technical document knowledge. By updating the knowledge base according to the newly parsed document data, real-time synchronization of the knowledge base can be realized, which helps ensure that information in the knowledge base is always consistent with the latest document data.
S20, performing query indexing in the knowledge base according to the first document data to obtain a plurality of pieces of information to be processed.
In a specific implementation, the query indexing is performed in the knowledge base according to the first document data to obtain a plurality of pieces of information to be processed. Specifically, feature extraction and text embedding are performed on the first knowledge in the first document data to obtain a first vector, and the query indexing is performed in the knowledge base according to the first vector to obtain the plurality of pieces of information to be processed. Through feature extraction and text embedding, the document data can be expressed as a vector carrying more semantic information, so that query accuracy is improved and the obtained information to be processed is ensured to better match the content and context of the document.
In a specific implementation, the query indexing is performed in the knowledge base according to the first vector to obtain a plurality of pieces of information to be processed. Specifically, the cosine of the angle between the first vector and each piece of knowledge base information in the knowledge base is calculated to perform cosine similarity comparison, so as to obtain a plurality of first scores, and the knowledge base information whose corresponding first score is greater than or equal to a preset first threshold value is extracted to obtain the plurality of pieces of information to be processed. Cosine similarity comparison allows finer information filtering and ensures that only knowledge base information sufficiently similar to the document data is extracted, which helps reduce mismatches and improve the accuracy of information retrieval. Setting a preset threshold enables personalized adjustment of the matching degree; the threshold can be flexibly adjusted according to requirements so as to balance the accuracy of the retrieval results.
S30, asking the large language model questions according to the information to be processed to obtain a first response, wherein the first response is an answer output by the large language model according to a first question template.
In a specific implementation, the large language model is questioned according to the plurality of pieces of information to be processed to obtain a first response. Specifically, a preset template to be processed is filled according to the plurality of pieces of information to be processed and the first question in the first document data to obtain a first question template, wherein the template to be processed is a template frame used for asking questions of the large language model; the large language model is then questioned according to the first question template to obtain the first response, which is the answer output by the large language model according to the first question template. The preset template to be processed provides a structured method for generating questions, so that the form of the questions is more standard and easier to process, ensuring the consistency and interpretability of the questions.
In a specific implementation, after the large language model is questioned to obtain a first response, the method further comprises calculating a score of the first response according to the first response and each piece of knowledge base information to obtain a second score, updating the first response according to the second score until the second score corresponding to the first response is greater than or equal to a preset second threshold value to obtain an updated first response, and adding the updated first response to the knowledge base to update the knowledge base. By introducing this evaluation criterion, the quality of the first response can be assessed more accurately, helping to filter out inaccurate or irrelevant information.
S40, generating a first document according to the first document data, the information to be processed and the first response, wherein the first document comprises a product requirement document and a software requirement document.
According to the document generation method based on the large language model provided by the embodiment of the invention, information to be processed is obtained by query indexing in the knowledge base, so that relevant content can be screened out from a large amount of information, improving the accuracy and efficiency of document generation. Feature extraction and text embedding yield a first vector that better represents the semantic information of the document, which helps improve the accuracy and relevance of the query. Calculating the cosine of the angle between the first vector and each piece of knowledge base information for cosine similarity comparison measures the similarity between the first vector and the knowledge base and effectively filters relevant information, further improving the quality of the generated documents. Filling the template with the information to be processed and the first document data generates a question template, which helps question the large language model more accurately and obtain more targeted responses. By introducing a mechanism of updating the knowledge base according to the scores, the method can be continuously optimized and keeps learning during operation, so that the knowledge base always reflects the latest document input and the generated documents are more complete and accurate.
Referring to fig. 4, a specific implementation step of a document generating method based on a large language model provided in an embodiment of the present invention may be used to generate a PRS document and an SRS document, including:
S110, storing first document data obtained by parsing a PRS document or an SRS document into the knowledge base for URS generation and data storage;
S120, performing embedding processing on the first document data to convert it into vector form and obtain a first vector; performing text embedding on each piece of document information in the knowledge base to obtain a plurality of pieces of knowledge base information; and comparing the first vector with the plurality of pieces of knowledge base information through cosine similarity comparison to obtain a plurality of comparison results, wherein the comparison results lie in the closed interval from zero to one;
S130, after sorting the comparison results in a list, selecting the first N most relevant pieces of knowledge base information as background knowledge according to a preset first parameter; combining the first document data and the template to be processed to obtain a question-and-answer template and inputting it into the large language model to obtain a first response; calculating the score of the first response according to the first response and the knowledge base information to obtain a second score; updating the first response according to the second score until the second score corresponding to the first response is greater than or equal to a preset second threshold value, so as to obtain an updated first response; updating the knowledge base according to the updated first response; and generating a first document according to the first document data, the plurality of pieces of information to be processed and the first response, wherein the first document comprises a product requirement document and a software requirement document.
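The top-N selection at the start of step S130 can be sketched as follows; the preset first parameter `n` and the sample scores are illustrative:

```python
# Toy sketch of step S130's top-N selection: sort the comparison results and
# keep the N most relevant knowledge base entries as background knowledge.
def top_n_background(scored_entries, n):
    """scored_entries: list of (knowledge_text, similarity in [0, 1]) pairs."""
    ranked = sorted(scored_entries, key=lambda e: e[1], reverse=True)
    return [text for text, _ in ranked[:n]]

scored = [("spec A", 0.31), ("spec B", 0.88), ("spec C", 0.64)]
background = top_n_background(scored, n=2)  # preset first parameter N = 2
```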
According to the large language model-based document generation method, the product requirement document and the software requirement document are generated automatically by parsing the PRS document and the SRS document with the large language model, which reduces the workload of manual document writing. Besides product requirement documents and software requirement documents, other types of documents can be generated according to actual needs, so the method has a certain degree of universality and flexibility.
Embodiment two:
referring to FIG. 5, the embodiment of the invention provides a document generation system based on a large language model, which comprises an analysis module 1, an index module 2, a questioning module 3 and a text forming module 4;
The analysis module 1 is used for acquiring document input, parsing the document input, extracting a first question and first knowledge, and obtaining first document data.
In a specific implementation, after the analysis module 1 parses the document input and extracts a first question and first knowledge to obtain first document data, the knowledge base is further updated according to the first document data, wherein the knowledge base is an information resource base comprising historical document knowledge and technical document knowledge.
The index module 2 is configured to perform query indexing in a knowledge base according to the first document data, so as to obtain a plurality of pieces of information to be processed.
In a specific implementation, the index module further comprises a preprocessing unit 5 and a query unit 6. The preprocessing unit 5 is used for performing feature extraction and text embedding on the first knowledge in the first document data to obtain a first vector, and the query unit 6 is used for performing query indexing in the knowledge base according to the first vector to obtain a plurality of pieces of information to be processed.
In a specific implementation, the query unit 6 further includes a comparison subunit 7 and an extraction subunit 8. The comparison subunit 7 is configured to compare the first vector with each piece of knowledge base information in the knowledge base by calculating the cosine of the angle between them to perform cosine similarity comparison, so as to obtain a plurality of first scores; the extraction subunit 8 is configured to extract the knowledge base information whose corresponding first score is greater than or equal to a preset first threshold, so as to obtain a plurality of pieces of information to be processed.
The questioning module 3 is configured to ask questions of the large language model according to the plurality of pieces of information to be processed to obtain a first response, where the first response is an answer output by the large language model according to a first question template.
In a specific implementation, the questioning module 3 further comprises a filling unit 9 and a response unit 10. The filling unit 9 is used for filling a preset template to be processed according to the plurality of pieces of information to be processed and the first question in the first document data to obtain a first question template, wherein the template to be processed is a template frame used for questioning the large language model; the response unit 10 is used for questioning the large language model according to the first question template to obtain a first response, which is the answer output by the large language model according to the first question template.
In a specific implementation, the questioning module 3 further includes a scoring unit 11, a correction unit 12 and an updating unit 13. The scoring unit 11 is configured to calculate the score of the first response according to the first response and the knowledge base information to obtain a second score; the correction unit 12 is configured to update the first response according to the second score until the second score corresponding to the first response is greater than or equal to a preset second threshold, so as to obtain an updated first response; and the updating unit 13 is configured to add the updated first response to the knowledge base to update the knowledge base.
The text forming module 4 is configured to generate a first document according to the first document data, the plurality of pieces of information to be processed and the first response, where the first document includes a product requirement document and a software requirement document.
Referring to fig. 2, as a detailed description of a large language model-based document generating system, a specific block diagram of the large language model-based document generating system according to an embodiment of the present invention includes a document parsing module 110, a knowledge base 120, a data processing module 130, a knowledge searching module 140, a model outputting module 150, an engineer feedback module 160, and a document generating module 170;
The document parsing module 110 is configured to parse a word file or a PRS file uploaded via a web page, extract a first question and first knowledge, and obtain first document data;
the knowledge base 120 is an information resource base, including historical document knowledge and technical document knowledge;
The data processing module 130 is configured to perform feature extraction and text embedding on the first document data to obtain a first vector through word segmentation, stop-word removal, word stem extraction, punctuation mark removal, entity recognition and vectorization, wherein stop-word removal refers to excluding words that appear frequently in the text but carry no actual meaning, so as to reduce the noise level of the text;
The knowledge search module 140 is configured to perform text embedding on each piece of document information in the knowledge base 120 to obtain a plurality of pieces of knowledge base information, calculate the cosine of the angle between the first vector and each piece of knowledge base information to perform cosine similarity comparison and obtain a plurality of first scores, and extract the knowledge base information whose corresponding first score is greater than or equal to a preset first threshold to obtain a plurality of pieces of information to be processed, wherein the knowledge base 120 is an information resource base comprising historical document knowledge and technical document knowledge;
The model output module 150 is configured to fill a preset template to be processed according to the plurality of pieces of information to be processed and the first question in the first document data to obtain a first question template, and to input the first question template into the large language model to obtain a first response, wherein the template to be processed is a template frame for asking questions of the large language model;
The engineer feedback module 160 is configured to calculate the score of the first response according to the first response and the knowledge base 120 information to obtain a second score, and to update the first response according to the second score until the second score corresponding to the first response is greater than or equal to a preset second threshold value, so as to obtain an updated first response;
The document generation module 170 is configured to generate a first document according to the first document data, the plurality of pieces of information to be processed, and the first response, where the first document includes a product requirement document and a software requirement document.
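The text cleanup attributed to the data processing module 130 above (word segmentation, stop-word removal, punctuation removal and stemming) can be sketched as follows; the stop-word list and the plural-stripping rule are illustrative placeholders, not the claimed preprocessing:

```python
import re

# Illustrative stop-word list; a real system would use a fuller, language-specific list.
STOP_WORDS = {"the", "a", "an", "of", "to", "is"}

def preprocess(text):
    """Tokenize, drop punctuation and stop words, and crudely stem plurals."""
    # re.findall on the lowercased text does segmentation and punctuation removal.
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    # Toy stemming: strip a trailing 's' from longer words.
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in kept]

tokens = preprocess("The user logs in to the system, using a password.")
```

Entity recognition and vectorization would follow these steps before the embedding described earlier.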
Referring to FIG. 3, an embodiment of the present invention provides another specific structure diagram of a document generation system based on a large language model, including a CPU, a GPU, a Main Memory, a Cache, and a Device Memory;
In this structure, the CPU (central processing unit) is connected to the GPU (graphics processing unit) through PCI Express; the CPU exchanges data with the Main Memory through a Cache, and the GPU exchanges data with the Device Memory through another Cache. This structure can be used to implement the document generation method based on the large language model described above.
For a more detailed step flow and working principle of this embodiment, reference may be made to, but is not limited to, the relevant description of Embodiment One.
The document generation system based on the large language model provided by the embodiment of the invention obtains information to be processed by query indexing in the knowledge base, so that relevant content can be screened out from a large amount of information, improving the accuracy and efficiency of document generation. Performing feature extraction and text embedding on the first document data to obtain a first vector better represents the semantic information of the document, which helps improve the accuracy and relevance of the query. Calculating the cosine of the angle between the first vector and each piece of knowledge base information for cosine similarity comparison measures the similarity between the first vector and the knowledge base and effectively filters relevant information, further improving the quality of the generated documents. Filling the template with the information to be processed and the first document data generates a question template, which helps question the large language model more accurately and obtain more targeted responses. By introducing a mechanism of updating the knowledge base according to the scores, the system can be continuously optimized and keeps learning during operation, so that the knowledge base always reflects the latest document input and the generated documents are more complete and accurate.
Embodiment III:
A computer readable storage medium comprising a stored computer program, wherein the computer program when run controls a device in which the computer readable storage medium resides to perform a large language model based document generation method as described in any one of the above.
The computer readable storage medium provided by the embodiment of the invention obtains information to be processed by query indexing in the knowledge base, so that relevant content can be screened out from a large amount of information, improving the accuracy and efficiency of document generation. Performing feature extraction and text embedding on the first document data to obtain a first vector better represents the semantic information of the document, which helps improve the accuracy and relevance of the query. Calculating the cosine of the angle between the first vector and each piece of knowledge base information for cosine similarity comparison measures the similarity between them and effectively filters relevant information, further improving the quality of the generated documents. Filling the template with the information to be processed and the first document data generates a question template, which helps question the large language model more accurately and obtain more targeted responses. By introducing a mechanism of updating the knowledge base according to the scores and updating the knowledge base with the first document data, the method can be continuously optimized during operation and the information in the knowledge base stays consistent with the latest document input, so that the generated documents are more complete and accurate.
The foregoing embodiments have been provided for the purpose of illustrating the general principles of the present invention, and are not to be construed as limiting the scope of the invention. It should be noted that any modifications, equivalent substitutions, improvements, etc. made by those skilled in the art without departing from the spirit and principles of the present invention are intended to be included in the scope of the present invention.