Disclosure of Invention
The invention aims to solve the problems existing in the prior art and provides a question-answering method and system based on a general table large model and a data agent.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
In a first aspect, the present invention provides a question-answering method based on a general table large model and a data agent, comprising the following steps:
S1, acquiring and processing open-source datasets of different types of table tasks, generating a table-instruction dataset, and performing table-instruction fine-tuning of a base large model on the table-instruction dataset to obtain a table-instruction-fine-tuned general table large model;
S2, generating a data-agent instruction dataset from the open-source datasets of some of the table tasks, and performing agent fine-tuning of the table-instruction-fine-tuned general table large model on the data-agent instruction dataset to obtain an agent-fine-tuned general table large model;
S3, inputting the question to be answered, the table file, and a pre-constructed prompt into the agent-fine-tuned general table large model; when no tool call by the agent is required, the agent-fine-tuned general table large model outputs the answer to the question directly; when a tool call is required, the agent-fine-tuned general table large model interacts with the agent to finally obtain the answer to the question.
On the basis of the above scheme, each step may preferably be implemented in the following specific manners.
As a preference of the first aspect, in step S1 the table tasks are specifically table question answering, natural-language-to-SQL generation, table-to-text generation, data analysis, and table entity alignment.
As a preference of the above first aspect, the specific flow of step S1 is as follows:
S11, processing the open-source annotated data in the open-source datasets, and generating table-instruction data from the processed annotated data according to a specific sample format, wherein the fields of the specific sample format are [instruction; table; question; answer];
S12, rewriting the instructions in the table-instruction data with a large language model, and performing data augmentation on the tables, questions, and answers in the table-instruction data with the large language model;
S13, assembling the processed table-instruction data into a table-instruction dataset, and performing table-instruction fine-tuning of the base large model on the table-instruction dataset via the LoRA method to obtain the table-instruction-fine-tuned general table large model.
As a preference of the above first aspect, the specific flow of step S2 is as follows:
S21, generating agent trajectory data from the open-source datasets of some of the table tasks by a prompt-learning method, based on a large language model and the ReAct reasoning framework;
S22, evaluating whether the answer generated by the large language model is correct; if not, filtering the trajectory out; if so, further checking whether the problem-analysis process generated by the large language model contains repeated or illogical tool-call steps, filtering the trajectory out if it does and retaining it otherwise, thereby finally obtaining the processed agent trajectory data;
S23, assembling the processed agent trajectory data into a data-agent instruction dataset, and performing agent fine-tuning of the table-instruction-fine-tuned general table large model on the data-agent instruction dataset via the LoRA method to obtain the agent-fine-tuned general table large model.
As a preference of the first aspect, the specific process of generating the agent trajectory data in step S21 is as follows: the table-instruction data is filled into pre-constructed ReAct example prompts; the prompt-learning method feeds the filled ReAct example prompts to the large language model, which generates a problem-analysis process together with the tool to be called and its parameters; the agent calls the relevant tool and returns the tool-call result to the large language model; the large language model regenerates the problem-analysis process until an answer corresponding to the question in the table-instruction data is produced; and the problem-analysis processes and the answer generated by the large language model constitute the agent trajectory data.
As a preference of the above first aspect, the specific procedure of step S3 is as follows:
S31, inputting the question to be answered, the table file, and a pre-constructed prompt into the agent-fine-tuned general table large model, with the question type designated by the user, wherein the table file is an Excel file or a database file, and the question type is specifically table question answering, table reasoning, database question answering, table operation, or chart generation;
S32, judging, by the agent-fine-tuned general table large model and according to the question type designated by the user, whether a tool call by the agent is required; if no tool call is required, the agent-fine-tuned general table large model outputs the answer to the question; if a tool call is required, the agent-fine-tuned general table large model outputs the type of tool to call together with a problem-analysis process, the agent is configured with the different tools, the agent calls the corresponding tools based on the model's output, the tool-call results are returned to the agent-fine-tuned general table large model, which regenerates the problem-analysis process and finally outputs the answer to the question.
As a preference of the first aspect, in step S32 the tools callable by the agent include a Python interpreter, an SQL executor, an Excel column-name query tool, a database column-name query tool, a line-chart generation tool, and a histogram generation tool.
As a preference of the first aspect, LLaMA is used as the base large model.
In a second aspect, the present invention provides a question-answering system based on a general table large model and a data agent, comprising:
a table-instruction fine-tuning module, configured to acquire and process open-source datasets of different types of table tasks, generate a table-instruction dataset, and perform table-instruction fine-tuning of a base large model on the table-instruction dataset to obtain a table-instruction-fine-tuned general table large model;
an agent fine-tuning module, configured to generate a data-agent instruction dataset from the open-source datasets of some of the table tasks, and perform agent fine-tuning of the table-instruction-fine-tuned general table large model on the data-agent instruction dataset to obtain an agent-fine-tuned general table large model;
a result acquisition module, configured to input the question to be answered, the table file, and a pre-constructed prompt into the agent-fine-tuned general table large model, output the answer to the question by the agent-fine-tuned general table large model when no tool call by the agent is required, and have the agent-fine-tuned general table large model interact with the agent when a tool call is required, so as to finally obtain the answer to the question.
In a third aspect, the present invention provides a computer electronic device comprising a memory and a processor;
The memory is used for storing a computer program;
the processor is configured to, when executing the computer program, implement the question-answering method based on the general table large model and the data agent according to any one of the schemes of the first aspect.
Compared with the prior art, the invention has the following beneficial effects:
The general table large model enables the large language model to fully understand table information and to call relevant tools to handle table tasks. Compared with traditional methods, the invention solves diverse problems in the table domain with only one general table large model, has stronger generalization capability, and is closer to practical deployment.
Detailed Description
In order that the above objects, features, and advantages of the invention may be readily understood, a more particular description of the invention is given below with reference to the appended drawings. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the present invention. The invention may, however, be embodied in many forms other than those described herein, and those skilled in the art may make similar modifications without departing from its spirit; the invention is therefore not limited to the specific embodiments disclosed below. The technical features of the embodiments of the invention may be combined with one another provided there is no conflict between them.
In the description of the present invention, it should be understood that the terms "first" and "second" are used solely to distinguish between descriptions and are not to be construed as indicating or implying relative importance or implicitly indicating the number of features indicated. Thus, a feature defined by "first" or "second" may explicitly or implicitly include at least one such feature.
As shown in FIG. 1, in a preferred implementation of the present invention, the question-answering method based on the general table large model and the data agent mainly comprises three subtasks, namely table-instruction fine-tuning, agent fine-tuning, and table-task processing by the data agent, and includes the following steps S1 to S3. The specific implementation procedure is described below.
1. Table-instruction fine-tuning
S1, acquiring and processing open-source datasets of different types of table tasks, generating a table-instruction dataset, and performing table-instruction fine-tuning of the base large model on the table-instruction dataset to obtain a table-instruction-fine-tuned general table large model.
It should be noted that, in step S1 of the embodiment of the present invention, the table tasks mainly include table question answering, natural-language-to-SQL generation, table-to-text generation, data analysis, table entity alignment, and the like.
It should be noted that, in step S1 of the embodiment of the present invention, the open-source datasets include FeTaQA, Spider, DART, ToTTo, HybridQA, TabFact, WikiTableQuestions, DS-1000, BIRD-SQL, TURL, SQA, and the like. As shown in FIG. 2, the specific flow of step S1 of the present invention is as follows:
S11, processing the open-source annotated data in the open-source datasets, and generating table-instruction data from the processed annotated data according to a specific sample format, wherein the fields of the specific sample format are [instruction; table; question; answer];
S12, rewriting the instructions in the table-instruction data with a large language model, and performing data augmentation on the tables, questions, and answers in the table-instruction data with the large language model;
In step S12 of the embodiment of the present invention, the rewriting of the instructions and the data augmentation of the tables, questions, and answers are performed with a large model such as GPT-4, which has the advantage of increasing the diversity of the table-instruction data and enhancing the generalization capability of the general table large model.
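For illustration, the sample construction of step S11 and the instruction rewriting of step S12 can be sketched as follows. The pipe-separated table serialization and the paraphrase-prompt wording are assumptions not fixed by this description; `make_rewrite_prompt` only builds the text that would be sent to a model such as GPT-4.

```python
import json

def make_instruction_sample(instruction, table, question, answer):
    """Assemble one sample in the [instruction; table; question; answer] format."""
    # Serializing the table as pipe-separated rows is an assumption;
    # the description does not specify the serialization.
    table_text = "\n".join(" | ".join(str(cell) for cell in row) for row in table)
    return {"instruction": instruction, "table": table_text,
            "question": question, "answer": answer}

def make_rewrite_prompt(sample):
    """Build a paraphrase prompt for LLM-based instruction rewriting (hypothetical wording)."""
    return ("Rewrite the following instruction in a different phrasing "
            "without changing its meaning:\n" + sample["instruction"])

sample = make_instruction_sample(
    "Answer the question using the given table.",
    [["name", "status"], ["P1", "granted"]],
    "What is the status of P1?",
    "granted",
)
print(json.dumps(sample, indent=2))
```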
S13, assembling the processed table-instruction data into a table-instruction dataset, and performing table-instruction fine-tuning of the base large model on the table-instruction dataset via the LoRA method to obtain the table-instruction-fine-tuned general table large model.
In step S13 of the embodiment of the present invention, LLaMA is specifically used as the base large model, and LoRA fine-tuning of the base large model yields the table-instruction-fine-tuned general table large model (that is, general table large model v1 in FIG. 2).
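As a minimal numerical sketch of the idea behind the LoRA method used here: the frozen base weight W0 is augmented with a trainable low-rank update (alpha/r)·B·A, so only the small factors A and B are tuned. The matrices below are toy values; an actual fine-tuning run would use a LoRA implementation such as the PEFT library.

```python
def matmul(A, B):
    # Plain list-of-lists matrix multiplication.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def lora_update(W0, A, B, alpha, r):
    """Effective weight after LoRA: W = W0 + (alpha / r) * B @ A.
    W0 stays frozen; only the low-rank factors A (r x k) and B (d x r) are trained."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W0, delta)]

# 2x2 frozen weight with a rank-1 adapter
W0 = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]   # r x k = 1 x 2
B = [[0.5], [0.0]] # d x r = 2 x 1
W = lora_update(W0, A, B, alpha=2, r=1)
print(W)  # [[2.0, 2.0], [0.0, 1.0]]
```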
2. Agent fine-tuning
S2, generating a data-agent instruction dataset from the open-source datasets of some of the table tasks, and performing agent fine-tuning of the table-instruction-fine-tuned general table large model on the data-agent instruction dataset, so that the general table large model acquires tool-calling capability, thereby obtaining an agent-fine-tuned general table large model.
As shown in fig. 2, the specific flow of step S2 of the present invention is as follows:
S21, generating agent trajectory data from the open-source datasets of some of the table tasks by a prompt-learning method, based on a large language model and the ReAct reasoning framework.
It should be noted that the ReAct reasoning framework mentioned in step S21 of the present invention is a reasoning framework based on few-shot learning and in-context learning; its core is writing a small number of question-answer exemplars, and its implementation belongs to the prior art. Within the ReAct framework, "Thought", "Action", and "Observation" are the core operational steps for efficiently carrying out the reasoning tasks of the large model.
In step S21 of the embodiment of the present invention, a large model such as GPT-4 is specifically used as the large language model.
In step S21 of the embodiment of the present invention, the agent trajectory data is generated in the following specific manner: the table-instruction data is filled into the pre-constructed ReAct example prompt; the prompt-learning method feeds the filled ReAct example prompt to the large language model, so that the large language model generates a problem-analysis process together with the tool to be called and its parameters; the agent calls the relevant tool and returns the tool-call result to the large language model; the large language model regenerates the problem-analysis process until an answer corresponding to the question in the table-instruction data is produced; and the problem-analysis processes and the answer generated by the large language model constitute the agent trajectory data.
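The trajectory-generation loop described above can be sketched as follows, with the large language model and the tool replaced by stubs; the step-dictionary format is an assumption made for illustration only.

```python
def generate_trajectory(llm, tools, prompt, max_steps=5):
    """ReAct-style loop: the LLM emits a step that is either a tool call
    (action + input) or a final answer; each tool observation is appended
    to the prompt until an answer is produced."""
    trajectory = []
    for _ in range(max_steps):
        step = llm(prompt)
        trajectory.append(step)
        if "final_answer" in step:
            break
        observation = tools[step["action"]](step["input"])
        trajectory.append({"observation": observation})
        prompt += "\nObservation: " + observation
    return trajectory

# Stubbed LLM: first asks for the column names, then gives the answer.
responses = iter([
    {"thought": "Check the columns first.", "action": "QueryColumns", "input": "patents.xlsx"},
    {"thought": "I now know the final answer.", "final_answer": "4"},
])
llm = lambda prompt: next(responses)
tools = {"QueryColumns": lambda path: "['serial number', 'legal status']"}
traj = generate_trajectory(llm, tools, "Question: how many patents are under examination?")
print(traj[-1]["final_answer"])  # 4
```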
S22, evaluating whether the answer generated by the large language model is correct; if not, filtering the trajectory out; if so, further checking whether the problem-analysis process generated by the large language model contains repeated or illogical tool-call steps, filtering the trajectory out if it does and retaining it otherwise, thereby finally obtaining the processed agent trajectory data;
S23, assembling the processed agent trajectory data into a data-agent instruction dataset, and performing agent fine-tuning of the table-instruction-fine-tuned general table large model on the data-agent instruction dataset via the LoRA method to obtain the agent-fine-tuned general table large model (that is, general table large model v2 in FIG. 2).
3. Table-task processing by the data agent
S3, inputting the question to be answered, the table file, and a pre-constructed prompt into the agent-fine-tuned general table large model; when no tool call by the agent is required, the agent-fine-tuned general table large model outputs the answer to the question directly; when a tool call is required, the agent-fine-tuned general table large model interacts with the agent to finally obtain the answer to the question.
As shown in fig. 3, the specific flow of step S3 of the present invention is as follows:
S31, inputting the question to be answered, the table file, and a pre-constructed prompt into the agent-fine-tuned general table large model, with the question type designated by the user, wherein the table file is an Excel file or a database file, and the question type is specifically table question answering, table reasoning, database question answering, table operation, or chart generation;
S32, judging, by the agent-fine-tuned general table large model and according to the question type designated by the user, whether a tool call by the agent is required; if no tool call is required, the agent-fine-tuned general table large model outputs the answer to the question; if a tool call is required, the agent-fine-tuned general table large model outputs the type of tool to call together with a problem-analysis process, the agent is configured with the different tools, the agent calls the corresponding tools based on the model's output, the tool-call results are returned to the agent-fine-tuned general table large model, which regenerates the problem-analysis process and finally outputs the answer to the question, wherein the callable tools include a Python interpreter, an SQL executor, an Excel column-name query tool, a database column-name query tool, a line-chart generation tool, and a histogram generation tool.
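At inference time, the interaction in step S32 amounts to parsing the model's output for a tool call and dispatching it through a tool registry. The sketch below uses placeholder tools and an assumed "Action / Action Input" output format matching the ReAct-style prompt; the regex and registry entries are illustrative, not the actual implementation.

```python
import re

# Hypothetical tool registry mirroring two of the tool types listed above.
TOOLS = {
    "python_repl": lambda code: "<python output>",
    "QueryColumns": lambda path: "<column names>",
}

def dispatch(model_output):
    """Parse an 'Action: ... Action Input: ...' block emitted by the model
    and invoke the matching tool; return None when there is no tool call
    (i.e., the output already contains the final answer)."""
    m = re.search(r"Action:\s*(\w+)\s*Action Input:\s*(.+)", model_output, re.S)
    if m is None:
        return None
    name, arg = m.group(1), m.group(2).strip()
    return TOOLS[name](arg)

print(dispatch('Action: QueryColumns Action Input: "patents.xlsx"'))  # <column names>
print(dispatch("Final Answer: 4"))  # None
```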
In order to better demonstrate the specific implementation and technical effects of the invention, the question-answering method based on the general table large model and the data agent of steps S1 to S3 in the preferred implementation above is applied to a specific example.
Examples
The specific implementation process of the question-answering method based on the general table large model and the data agent adopted in this embodiment is as described above and is not repeated here.
In this embodiment, a concrete example of the data agent solving a table-reasoning problem is given for the question "How many patents have a legal status of 'under examination'?". The table involved is patent information.xlsx.
1. Prompt generation
The question and the table path are filled into a prompt template to generate the complete prompt, which is provided to the general table large model as follows:
"please try to answer the following questions. You can use the following tools, python_reply (code: str) - > str-a Python interpreter. To execute Python commands. The input should be a valid Python command. If you want to see the output of a certain value you should print out using print (..). QueryColumns (file_path: str) - > str-query columns in Excel file. The following format was used:
Questions input questions you have to answer
Thinking-you should always consider what to do
Action to be taken is one of [ python_reply, queryColumns ]
Action input-input of action
Observation of results of actions
.. (This thinking/action input/observation can be repeated N times)
Thinking i now know the final answer
Final answer to original input question
Start-!
The problem is to solve a question-answering task according to a given data file.
Data File Path patent information xlsx
Question is how much is there a patent with legal status "under examination?
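Filling the template can be sketched as a plain string-formatting step; the `TEMPLATE` text below is a shortened, hypothetical stand-in for the full prompt shown above.

```python
# Hypothetical shortened template; the actual wording follows the prompt above.
TEMPLATE = """Please try to answer the following question. You can use the following tools:
{tools}
Question: solve the question-answering task according to the given data file.
Data file path: {file_path}
Question: {question}"""

def build_prompt(tools, file_path, question):
    """Fill the question and table path into the prompt template."""
    return TEMPLATE.format(tools="\n".join(tools), file_path=file_path, question=question)

prompt = build_prompt(
    ["python_repl(code: str) -> str", "QueryColumns(file_path: str) -> str"],
    "patent information.xlsx",
    "How many patents have a legal status of 'under examination'?",
)
print(prompt)
```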
2. Data agent reasoning
Given this prompt, the general table large model outputs the <Thought> "I need to query the relevant column names in the table.", and then outputs the <Action> "Action: QueryColumns Action Input: 'patent information.xlsx'", where Action is the tool name and Action Input is the tool parameter; the tool here queries the Excel column names.
Executing the tool yields the <Observation> "['serial number', 'patent name', 'patent type', 'legal status', 'application number', 'application date', 'publication (bulletin) number', 'publication (bulletin) date']", which is the list of table column names.
Continuing the iteration, the model generates the <Thought> "I need to use the 'legal status' column to count the number of patents under examination." and outputs the <Action> "Action: python_repl
Action Input:
import pandas as pd
data = pd.read_excel("patent information.xlsx")
count = data[data['legal status'] == 'under examination']['serial number'].count()
print(count)"
The python_repl tool is then called to execute the code, yielding the <Observation> "4".
Continuing the iteration, the model generates the <Thought> "I now know the number of patents in the data file whose legal status is 'under examination'." and obtains the <Final Answer> "There are 4 patents whose legal status is 'under examination'."
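The counting step executed through the Python interpreter can be reproduced with a small in-memory stand-in for patent information.xlsx (plain Python instead of pandas, since the actual file is not available here; the rows are illustrative only).

```python
# Toy rows standing in for patent information.xlsx; values are illustrative.
rows = [
    {"serial number": 1, "legal status": "under examination"},
    {"serial number": 2, "legal status": "granted"},
    {"serial number": 3, "legal status": "under examination"},
    {"serial number": 4, "legal status": "under examination"},
    {"serial number": 5, "legal status": "under examination"},
]
# Same logic as the pandas snippet: filter on 'legal status', count rows.
count = sum(1 for r in rows if r["legal status"] == "under examination")
print(count)  # 4
```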
In summary, when solving a table-reasoning problem, the data agent finally resolves the question by iteratively generating thoughts, calling tools, observing results, and invoking the general table large model.
In addition, the question-answering method based on the general table large model and the data agent in the above embodiment may essentially be executed by a computer program or module. Therefore, based on the same inventive concept, another preferred embodiment of the present invention further provides a question-answering system based on a general table large model and a data agent, corresponding to the question-answering method provided in the above embodiment; as shown in FIG. 4, it comprises:
a table-instruction fine-tuning module, configured to acquire and process open-source datasets of different types of table tasks, generate a table-instruction dataset, and perform table-instruction fine-tuning of a base large model on the table-instruction dataset to obtain a table-instruction-fine-tuned general table large model;
an agent fine-tuning module, configured to generate a data-agent instruction dataset from the open-source datasets of some of the table tasks, and perform agent fine-tuning of the table-instruction-fine-tuned general table large model on the data-agent instruction dataset to obtain an agent-fine-tuned general table large model;
a result acquisition module, configured to input the question to be answered, the table file, and a pre-constructed prompt into the agent-fine-tuned general table large model, output the answer to the question by the agent-fine-tuned general table large model when no tool call by the agent is required, and have the agent-fine-tuned general table large model interact with the agent when a tool call is required, so as to finally obtain the answer to the question.
Similarly, based on the same inventive concept, another preferred embodiment of the present invention further provides a computer electronic device corresponding to the question-answering method based on the general table large model and the data agent provided in the above embodiment, which includes a memory and a processor;
The memory is used for storing a computer program;
The processor is configured to, when executing the computer program, implement the question-answering method based on the general table large model and the data agent of the above embodiment.
Further, the logic instructions in the memory described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention.
It is to be appreciated that the processor described above may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; or a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
It should be further noted that, for convenience and brevity of description, specific working processes of the system described above may refer to corresponding processes in the foregoing method embodiments, which are not described herein again. In the embodiments of the present application, the division of steps or modules in the system and the method is only one logic function division, and other division manners may be implemented in actual implementation, for example, multiple modules or steps may be combined or may be integrated together, and one module or step may also be split.
The above embodiment is only a preferred embodiment of the present invention, but it is not intended to limit the present invention. Various changes and modifications may be made by one of ordinary skill in the pertinent art without departing from the spirit and scope of the present invention. Therefore, all the technical schemes obtained by adopting the equivalent substitution or equivalent transformation are within the protection scope of the invention.