Technical Field
The present invention relates to the field of artificial intelligence, and more specifically to a dynamic task planning method combining a large language model with the Agent concept.
Background
Large models rely on cloud computing for iterative model upgrades, and their computing power is typically scaled either by directly stockpiling GPUs or by building scenario-specific DSA (Domain Specific Architecture) chips. In the course of domestic large-model development, technical innovations continue to emerge that shrink model size and improve model performance. To address efficiency problems in large-model training, researchers have proposed a variety of optimization algorithms and parallel computing techniques that significantly improve training speed and efficiency. For example, Baidu released two AI instances, built respectively on its in-house Kunlun chip and on Huawei Ascend, and upgraded its heterogeneous AI computing platform Baige 3.0 to support mainstream domestic and foreign AI chips, measures that greatly increased effective training time. Meanwhile, to address the inference speed of large models, techniques such as model compression and model distillation reduce model size and computational requirements, improving deployment speed and real-time performance. Such techniques will continue to complement one another, and related innovations will further advance the development of large language models.
Recently, the survey "AGENT AI: SURVEYING THE HORIZONS OF MULTIMODAL INTERACTION", published in early 2024 by Fei-Fei Li's team at Stanford University, reported that Agent AI technology has made significant progress, and that its application in embodied systems opens new possibilities for interacting with agents in more immersive, dynamic, and engaging ways. To accelerate progress and reduce the tedious work in Agent AI development, researchers are planning the next generation of AI-enabled agent interaction pipelines. They are developing human-machine collaboration systems in which people and machines can communicate and interact meaningfully. Such a system can use the conversational abilities and broad response behaviors of an LLM (large language model) or VLM (vision-language model) to converse with human players and identify their needs, and then perform appropriate actions based on those requests to assist the human player.
When serving a human-machine collaboration system, the LLM/VLM often acts as a black box that produces unpredictable outputs. This uncertainty becomes critical with physical devices, such as robots operating in the real world. One way to address the problem is to constrain the focus of the LLM/VLM through prompt engineering. For example, when planning robot tasks from instructions, prompts that include environmental information have been reported to produce more stable outputs than prompts that rely on text alone. This view is supported by Minsky's frame theory of AI, which holds that the problem space the LLM/VLM must solve is defined by the prompt it is given. Another approach is to design prompts that have the LLM/VLM include explanatory text, so that users can understand what the model is attending to or recognizing. In addition, adding a higher-level stage for human verification and modification before execution makes systems operating under such guidance easier to control.
This approach has proven particularly effective in a robot teaching system built on ChatGPT. The system's workflow includes task planning, in which ChatGPT plans the robot's tasks from instructions and environmental information, and demonstration, in which the user presents the action sequence visually. All steps are subject to user review; if any step fails or proves insufficient, earlier steps can be revisited as needed. In addition, a web application allows users to upload demonstration data and interact with ChatGPT.
Traditional intent classification methods include Bayesian classifiers, decision trees, KNN, support vector machines (SVM), neural networks, and LIST VotedClassification. Except for the decision tree method, these approaches pursue high classification accuracy but find it difficult to extract intent classification rules that people can easily understand; rule extraction remains a hard problem in intent classification, and even rule-extraction-based intent classification techniques struggle to produce comprehensible rules. For example, rough-set-based text classification rule extraction has obvious defects: the decision table is very large, so discretization and rough-set attribute reduction are extremely expensive, and if the classification rules contain feature terms with real-valued weights, the rules are hard to understand and cannot be used directly during classification, losing the outstanding data-analysis efficiency that decision trees offer, an advantage other methods cannot match. Decision trees, however, have their own weaknesses: when the intent feature dimension is very high and the data volume is very large, building the tree is time-consuming, classification accuracy drops, and errors become more likely when there are many categories. At present, the most common text representations in intent classification are BoW and TF-IDF. Some recent studies rely on the strong language-understanding ability of large language models and use techniques such as model distillation and model quantization to fine-tune intent classification models for complex domain scenarios. Models trained in this way are still constrained by two major bottlenecks, hardware resources and high-quality corpus datasets, which makes them hard to deploy in complex, multi-source real-world application scenarios.
Summary of the Invention
The purpose of the present invention is to overcome the shortcomings of the prior art and to provide a dynamic task planning method combining a large language model with the Agent concept. Aiming at two problems common to large language models, namely uncontrollable output and prompt engineering (Prompt Engineering) being limited by the model's context token restrictions, a dynamic task planning solution combining a large language model with the Agent concept is proposed.
The object of the present invention is achieved through the following solutions:
A dynamic task planning method combining a large language model with the Agent concept includes the following steps:
S1: collect scene dialogue content from open or vertical domains, and use the existing scene dialogue data to abstract and construct a thinking space network for real-world natural interactive language;
S2: at each node of the thinking space network, design a task output function based on the Agent concept, used for task decomposition at the current node and for integrating material data;
inside each task output function of the thinking space network, a prompt specification is written with prompt engineering, task-decomposition prompts are designed for the current task and, relying on the large language model's own comprehension ability, the explicit function of the next task node is produced, after which the method recurses into a deeper level of reasoning until the large language model judges that a satisfactory answer has been found;
inside each task output function of the thinking space network, the prompt engineering design combines a random walk algorithm with inertial thinking and random thinking, dynamically forming prompts with different logic and outputting the dynamic task planning result;
inside each task output function of the thinking space network, for the process in which each task function recursively enters the next level of task splitting, a strategy of timely stop-loss and dynamic backtracking adjustment is adopted to prevent task planning from falling into an infinite loop during dynamic task splitting (a minimal illustrative sketch of this recursive loop is given after this list).
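For illustration, the following minimal Python sketch shows the shape of this recursive planning loop. All names (plan_task, the registry, the result dictionary with "final", "answer", and "next_node" keys, the depth and time limits) are assumptions introduced for the example and are not the claimed implementation; the concrete node functions, prompts, and stop-loss policy are defined in the steps below.

```python
import time

MAX_DEPTH = 26            # illustrative recursion cap (one hop per A-Z candidate)
TIME_BUDGET_S = 30        # overall time budget, matching the stop-loss limit below

def plan_task(query, node_name, registry, started=None, depth=0):
    """Recursively walk the thinking space network starting at node_name.

    registry maps node names such as "TaskName_A" to their task output
    functions; each function decomposes the current task and returns either
    a final answer or the name of the next node to visit.
    """
    started = started or time.monotonic()
    # timely stop-loss: give up gracefully instead of looping forever
    if depth > MAX_DEPTH or time.monotonic() - started > TIME_BUDGET_S:
        return "Sorry, I cannot answer this question; please try another one."

    result = registry[node_name](query)      # task decomposition at this node
    if result.get("final"):                  # the model judges the answer complete
        return result["answer"]
    return plan_task(query, result["next_node"], registry, started, depth + 1)
```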
Further, in step S1, collecting scene dialogue content from open or vertical domains and using the existing scene dialogue data to abstract and construct the thinking space network for real-world natural interactive language specifically includes the following sub-steps:
For simple scenarios, based on a selected vertical-domain dataset, user question-and-answer records from that vertical domain are collected and summarized into 12 broad categories of questions. A thinking space network is modeled on these 12 categories to form a first-generation thinking space network, which consists of one root node, i.e., the entry node, and 12 leaf nodes, i.e., the 12 categories of questions. The root node handles the natural language of human-computer interaction, and each leaf node produces, in the question-answering scenario, the answer to a question based on the thinking space network. The network is expressed as:
T = {t0, t11, t12, …, tij};  i = 1, 1 ≤ j ≤ 12
where T denotes the thinking space network, t0 denotes the entry node, and tij denotes a leaf node, i.e., an answer output node.
Further, in step S1, collecting scene dialogue content from open or vertical domains and using the existing scene dialogue data to abstract and construct the thinking space network for real-world natural interactive language specifically includes the following sub-steps: when facing a complex scenario, the thinking space network is constructed according to the following sub-steps:
Intermediate nodes, i.e., dynamic task planning nodes, are constructed, with interconnections designed between them. The function name of any dynamic task planning node follows this convention:
TaskName_*(),  * ∈ {A, B, C, …, Z}
where TaskName is the task name of this dynamic task planning node and * is an uppercase letter from A to Z. This also restricts the task output candidate set of each dynamic task planning node to these 26 letters, so a classification problem never exceeds 26 classes; if it would, a further level of abstract sub-classification is performed so that the classification problem at each layer of dynamic task planning nodes remains within 26 classes.
Further, in step S2, designing the dynamic task planning output function at each node of the thinking space network based on the Agent concept, for task decomposition at the current node and for material data integration, specifically includes the following sub-step: inside each dynamic task planning function of the thinking space network, strict prompts are written to perform dynamic task planning and credibility control for the current task; the goal is to find, through the current node, the next dynamic task planning node in order to search for credible full-modal material that satisfies the current task, thereby steering the output of the large language model toward credible and controllable evidence for the question.
Further, the format of the strict prompt includes:
Prompt = "You are a TaskName classifier. Classify {query} into one of the following categories. Options:\nA. Asking about privacy or background information\nB. Asking about the weather\n...Z. Asking for a dietary recipe based on age\nNote: output only the option letter, with no other text.\n"
where Prompt is the strict prompt and query is the user question.
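As an illustration only, the following minimal sketch shows how such a strict prompt might be assembled and how its single-letter answer might be parsed; the helper names, the options dictionary, and the regular expression are assumptions introduced for the example, not part of the claimed prompt format.

```python
import re

def build_strict_prompt(task_name: str, query: str, options: dict) -> str:
    """Assemble the strict classification prompt for one planning node.

    options maps a letter ("A".."Z") to a short category description.
    """
    lines = [f"You are a {task_name} classifier. Classify {{query}} into one of "
             "the following categories. Options:"]
    for letter, desc in sorted(options.items()):
        lines.append(f"{letter}. {desc}")
    lines.append("Note: output only the option letter, with no other text.")
    return "\n".join(lines).replace("{query}", query)

def parse_option(raw_output: str, options: dict) -> str | None:
    """Extract the single option letter from the model output, if valid."""
    match = re.search(r"\b([A-Z])\b", raw_output.strip())
    return match.group(1) if match and match.group(1) in options else None
```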
Further, the prompt engineering design, which combines a random walk algorithm with inertial thinking and random thinking to dynamically form prompts with different logic and output the dynamic task planning result, specifically includes the following sub-steps:
S21: design the following inertial-thinking prompt:
Inertial_Thinking = "You are a TaskName classifier. Classify {query} into one of the following categories. Options:\nA. Asking about privacy or background information; select the tools [findName_A, findPhoneNumber_S, findBackGround_B]\nB. Asking about the weather; select the tools [xxx_A, xxx_S, xxx_B]\n...Z. Asking for a dietary recipe based on age; select the tools [xxx_S, xxx_H]\n
Output only the JSON data format of the tool name set, as follows:\n
{"func": ["findName_A", "findPhoneNumber_S", "findBackGround_B"]}\nDo not output any text other than the JSON format.\n"
where findName, findPhoneNumber, and findBackGround are English representations of TaskName, and [findName_A, findPhoneNumber_S, findBackGround_B] means that executing these three intermediate nodes in sequence completes the question-answering for the current scenario, with material integration and answer production performed at the current node;
Design the following random-thinking prompt:
Random_Thinking = "You are a TaskName classifier. Classify {query} into one of the following categories. Options:\nA. Asking about privacy or background information; available tool set [findName_A, findPhoneNumber_S, findBackGround_B]\nB. Asking about the weather; available tool set [xxx_A, xxx_S, xxx_B]\n...Z. Asking for a dietary recipe based on age; available tool set [xxx_S, xxx_H]\n
For any option, select exactly one tool, chosen arbitrarily from its available tool set, as the dynamic task planning output of this task.\n
Output only the JSON data format of the tool name set, as follows:\n
{"func": ["findName_A"]}\n
Do not output any text other than the JSON format.\n
If the special scenario xxx is encountered, select both findName_A from the tool set of option A and xxx_S from option Z as the result of this dynamic task planning; the output format is:\n
{"func": ["findName_A", "xxx_S"]}\n"
where the special scenario xxx is any special scenario encountered while designing the current dynamic task planning function node; all such scenarios can be added to the notes of the prompt through this kind of standardized constraint;
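A minimal sketch, assuming the model follows the JSON-only instruction above, of how the returned tool list might be parsed defensively; the function name and the empty-list fallback are illustrative assumptions:

```python
import json

def parse_tool_plan(raw_output: str) -> list[str]:
    """Parse the {"func": [...]} answer produced by either thinking-mode prompt.

    Returns the ordered list of planning-node names, or an empty list if the
    output does not conform, so the caller can trigger a retry or mode switch.
    """
    try:
        # tolerate stray text around the JSON object by locating the braces
        start, end = raw_output.index("{"), raw_output.rindex("}") + 1
        data = json.loads(raw_output[start:end])
        funcs = data.get("func", [])
        return [f for f in funcs if isinstance(f, str)]
    except (ValueError, json.JSONDecodeError):
        return []
```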
S22: use a single-neuron architecture from neural networks, i.e., a logistic regression function, to compute which thinking mode should be selected for the current dynamic task planning; the formula is the standard logistic function
p = 1 / (1 + e^(-y));
where y is a multiple linear regression function given by:
y = w0,0 + w0,1·x0,1 + w0,2·x0,2 + … + wi,j·xi,j;
where wi,j are regression coefficients, i and j are function names, and xi,j denotes the probability of jumping from function i to function j; xi,j is computed by the following random-walk procedure:
In the current dynamic task planning function, when a dynamic task recommendation is to be made for the current intermediate node u, the relevance of every intermediate node to u must be computed. Starting from the current node j, a random walk begins; at every node the walk stops with probability 1-d and restarts from u, or continues with probability d, choosing uniformly at random one of the nodes the current node points to and walking onward. After many rounds of walking, the probability that each intermediate node related to the current node is visited converges to a stable value, from which the probability xj,i of walking from node j to node i is obtained;
The multiple linear regression coefficients wi,j are solved by minimizing the mean squared error MSE, i.e., sum((y_actual - y_estimate)^2), and the parameters wi,j are optimized by methods such as gradient descent so that the MSE is minimal. The wi,j are initialized randomly in the range 0 to 1; later, based on human feedback given as thumbs-up or thumbs-down, the thinking path is traced back and the wi,j weights are dynamically updated by backpropagation;
S23: with the above computations, for the current task planning node, the probabilities of adopting the inertial thinking mode and the random thinking mode are computed from the most recent history data; the thinking mode with the larger probability is then retained to complete the dynamic task planning of the current node and to compute the node to visit next, i.e.:
TaskName_# = TaskName_*(query),  *, # ∈ {A, B, C, …, Z}
where TaskName_* is the function name of the current dynamic task planning node and TaskName_# is the function name of the next dynamic task planning node.
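For illustration, a minimal sketch of this selection and dispatch step, under the assumption that the two mode probabilities have already been computed as in S22 and that node functions are registered by name; the signatures are illustrative:

```python
def select_next_node(query: str,
                     current: str,
                     registry: dict,
                     p_inertial: float,
                     p_random: float) -> str:
    """Run the current node with the higher-probability thinking mode and
    return the name of the next dynamic task planning node (TaskName_#)."""
    mode = "Inertial_Thinking" if p_inertial >= p_random else "Random_Thinking"
    node_fn = registry[current]              # e.g. registry["TaskName_A"]
    return node_fn(query, mode=mode)         # returns e.g. "TaskName_C"
```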
Further, within each task output function in the thinking space network, for the process in which each task function recursively enters the next level of task splitting, the strategy of timely stop-loss and dynamic backtracking adjustment adopted to prevent task planning from falling into an infinite loop during dynamic task splitting specifically includes the following sub-step:
Inside the dynamic task planning node function, an observer model is designed before the next dynamic task planning node is output. Only when the conditions of the observer model are satisfied does planning proceed to the next dynamic task planning node; otherwise the search task of the current task planning node is terminated early and feedback is given to the user promptly.
Further, giving feedback to the user promptly specifically includes the following sub-step: responding, in a relatively tactful tone, that the system cannot answer the question, i.e.:
FlagBool = LookUp(TaskName_A, …, TaskName_*),  * ∈ {A, B, C, …, Z}
where LookUp() is the observer function and FlagBool defaults to True. It is responsible for checking the plan obtained at the current dynamic task planning node through the Inertial_Thinking or Random_Thinking mode: for [findName_A, findPhoneNumber_S, findBackGround_B], it verifies in turn whether each node has found the data it is responsible for. If one or more items are not found, FlagBool = False, and task reallocation is attempted through two routes: switching the thinking mode and backtracking to the previous task planning node.
Further, the task reallocation attempt specifically includes the following sub-step: each type of attempt is limited to a set number of times and the total attempt time is limited to a set duration; if the system is still attempting after the set duration, the attempt at the current dynamic task planning node is terminated early, the system acknowledges its limitation, and the user is promptly asked to try a different question.
Further, the set number of times includes 3 and the set duration includes 30 seconds.
The beneficial effects of the present invention include:
To meet the requirement that large language model output be credible and controllable, the embodiments of the present invention, building on a domestic open-source large language model with 72 billion parameters, combine graph-algorithm models such as trees of thought, graphs, and forests to dynamically abstract and model the types of questions that may arise in real question-answering scenarios, construct a thinking space network from shallow to deep, and design a credible and controllable task output function for each node of the network, forming a dynamic task planning method that combines a large language model with the Agent concept. The authenticity of the content produced by each node's output function is precisely controlled through strict prompts. At the same time, the instability of large language model output is exploited: inside the task output function, a task decision mode that mixes an inertial thinking model and a random thinking model is used for dynamic task planning. This keeps the output credible and controllable while greatly reducing the need to fine-tune the large language model for complex scenarios, making the approach feasible for small and medium-sized enterprises that do not have large numbers of compute GPUs. In terms of effect, it satisfies the requirement for credible and controllable large language model output on the one hand, and on the other hand largely removes the need for fine-tuning: only the dynamic task planning needs credible, controllable supervision, and the remaining work can be handed to specialized models.
To address the limit on context tokens in open-source large language models, the embodiments, building on the domestic open-source 72-billion-parameter model that supports at most a 32K context, draw on the prompt engineering of large language models, the Agent concept, and classic graph algorithms such as depth-first search, breadth-first search, and random walks to construct the thinking space network. Real-world dialogue problems are abstracted, layer by layer, into a mode of solving problems on the thinking space network, which resolves the problem of prompts that, in complex scenarios, become too long for the model's context token limit. In terms of beneficial effects, on the one hand, the context-token limitation of open-source large language models is overcome: through the dynamic task planning design in the thinking space network, the method is fully decoupled from the model's 32K context limit, so that both the strengths and the weaknesses of the model's uncontrollable output can be fully exploited, and the incremental learning content in the prompt specification greatly improves the flexibility and scalability of the thinking space network. On the other hand, the design pattern of the thinking space network better matches the human way of thinking, with both deep and shallow cognition and both local and global cognition.
The text classification method of the present invention is clearly structured and highly practicable, and the improvement is particularly noticeable when the question-answering scenario is complex.
Brief Description of the Drawings
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a flow chart of the method steps of an embodiment of the present invention.
Detailed Description of the Embodiments
All features disclosed in all embodiments of this specification, and all steps in any method or process implicitly disclosed, may be combined, extended, or replaced in any manner, except for mutually exclusive features and/or steps.
As shown in FIG. 1, according to the concept of the present invention, a dynamic task planning method combining a large language model with the Agent concept is provided and implemented through the following steps:
Step S1: collect complex scene dialogue content from open or vertical domains and, using the existing complex scene dialogue data, abstract and construct a simple thinking space network for real-world natural interactive language. Specifically, for the various complex question-answering scenarios that may be encountered in open and vertical domains, real-world natural language is abstracted with graph structures such as trees of thought, graphs, and forests, forming, from shallow to deep, a flexible, dynamic, and extensible thinking space network.
Step S2: at each node of the simple thinking space network, design the dynamic task planning output function based on the Agent concept. Specifically, at each intermediate node of the thinking space network, the task function of the current node is designed in combination with the Agent concept; it is dedicated to task decomposition at the current node and to integrating material data.
Step S3: inside each task output function of the thinking space network, combine prompt engineering techniques to make full use of both the strengths and the weaknesses of the large language model's uncontrollable answers, and strictly write the prompt specification so that, on the basis of the domestic open-source Qianwen 72B large language model, the output task content and related material are credible and controllable. Specifically, each task function internally applies prompt engineering (Prompt Engineering) to design task-decomposition prompts for the current task; relying on the model's own comprehension ability, the explicit function of the next task node is produced, and the method recurses into a deeper level of reasoning until the large language model judges that a satisfactory answer has been found.
Step S4: inside each task output function of the thinking space network, the prompt engineering design outputs the dynamic task planning result using a design pattern that combines a random walk algorithm with inertial thinking and random thinking. Specifically, the prompts designed inside each task function follow a design that combines the inertial thinking mode with the random thinking mode, dynamically forming prompts with different logic.
Step S5: inside each task output function of the thinking space network, for the process in which task splitting recursively enters the next level of task splitting, adopt a strategy of timely stop-loss and dynamic backtracking adjustment to prevent the task planning algorithm from falling into unexpected situations such as infinite loops during dynamic task splitting, much as a person may get stuck on a dead-end line of thought.
Following the above steps, the design is applied to every task output function in the thinking space network to form a complete dynamic task planning method. More specifically, the above design pattern is reused for every task planning node in the thinking space network, forming a complete and dynamically extensible dynamic task planning method.
Step 1: while collecting the corpus dataset, this embodiment, based on a dataset from a certain vertical domain, collects user question-and-answer records from that domain and summarizes them into 12 broad categories of questions. A simple thinking space network is modeled on these 12 categories to form a first-generation thinking space network. This simple network consists of one root node (the entry node) and 12 leaf nodes (the 12 categories of questions); the root node handles the natural language of human-computer interaction, and each leaf node produces, in the question-answering scenario, the answer to a question based on the thinking space network;
T = {t0, t11, t12, …, tij};  i = 1, 1 ≤ j ≤ 12
where T denotes the simple thinking space network, t0 denotes the entry node, and tij denotes a leaf node, i.e., an answer output node.
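For illustration only, the first-generation network can be written down as a small directed structure; the container names and the category labels below are placeholders, not the 12 categories actually used in the embodiment:

```python
# Hypothetical sketch of the first-generation thinking space network:
# one entry (root) node t0 and 12 leaf (answer output) nodes t1j.
THINKING_SPACE = {
    "t0": [f"t1_{j}" for j in range(1, 13)],   # the root points to the 12 leaves
}
LEAF_CATEGORIES = {
    "t1_1": "asking about privacy or background information",
    "t1_2": "asking about the weather",
    # ...placeholders for the remaining categories...
    "t1_12": "asking for a dietary recipe based on age",
}
```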
Step 2: at each node of the thinking space network, design the dynamic task planning output function based on the Agent concept, used for task decomposition at the current node and for integrating material data. The simple thinking space network of step 1 is a multi-way tree of height 2 and is suitable only for simple task scenarios in which a question enters at the entry root node and an answer can be given in a single step. Moreover, if the 12 categories of questions grow into 12,000 categories, the 32K context-token limit of the open-source 72B model cannot be overcome, and with many categories semantic divergence, overlap, and ambiguity arise easily. Therefore, modeling the thinking space network for complex scenarios requires constructing intermediate nodes, i.e., dynamic task planning nodes, which are designed to be interconnected; at this point the structure departs from the multi-way tree concept and is lifted from a two-dimensional space into a multi-dimensional space, so as to build a complex thinking space network for problem solving. The function name of any dynamic task planning node follows this convention:
TaskName_*(),  * ∈ {A, B, C, …, Z}
where TaskName is the English representation of the task name of this dynamic task planning node and * is an uppercase letter from A to Z, which also restricts the task output candidate set of each dynamic task planning node to these 26 letters; a classification problem therefore never exceeds 26 classes, and if it would, a further level of abstract sub-classification is performed so that the classification problem at each layer of dynamic task planning nodes remains within 26 classes.
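A minimal sketch, assuming the node functions are collected in a registry keyed by this naming convention; the helper names and the regular expression are illustrative assumptions:

```python
import re

NODE_NAME = re.compile(r"^[A-Za-z]+_[A-Z]$")   # e.g. "findName_A"

def register_node(registry: dict, name: str, fn) -> None:
    """Register one dynamic task planning node under the TaskName_* convention."""
    if not NODE_NAME.match(name):
        raise ValueError(f"node name {name!r} does not follow TaskName_*")
    registry[name] = fn

def check_candidate_set(task_name: str, categories: list[str]) -> None:
    """Enforce the 26-class limit for one planning node; if it is exceeded,
    the categories must be abstracted and sub-classified again."""
    if len(categories) > 26:
        raise ValueError(f"{task_name}: {len(categories)} categories exceed "
                         "the A-Z candidate set; sub-classify further")
```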
Step 3: inside each dynamic task planning function of the above thinking space network, prompts are strictly written to perform dynamic task planning and credibility control for the current task; the goal is to find, through the current node, the next dynamic task planning node in order to search for credible full-modal material that satisfies the current task. The prompt format is roughly as follows:
Prompt = "You are a TaskName classifier. Classify {query} into one of the following categories. Options:\nA. Asking about privacy or background information\nB. Asking about the weather\n...Z. Asking for a dietary recipe based on age\nNote: output only the option letter, with no other text.\n"
where Prompt is the prompt, TaskName is the Chinese representation of the task name of the current task output node, and query is the user question. By writing strict prompts, the output of the large language model is steered toward credible and controllable evidence for the question.
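For illustration, a minimal sketch of how a node might submit this strict prompt and validate the answer; the chat(prompt) callable is a hypothetical stand-in for whatever interface serves the Qianwen 72B model and is not an actual API of that model:

```python
def classify_with_strict_prompt(chat, task_name: str, query: str,
                                options: dict) -> str | None:
    """Ask the model for one option letter and accept only a valid letter.

    chat is a hypothetical callable: prompt string in, answer string out.
    Returns the letter, or None so the caller can retry or stop early.
    """
    prompt = build_strict_prompt(task_name, query, options)  # see the earlier sketch
    answer = chat(prompt).strip()
    letter = answer[:1].upper()
    return letter if letter in options else None
```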
Step 4: inside each dynamic task planning function of the above thinking space network, the next intermediate dynamic task planning node or leaf node is computed by combining random walks with the inertial thinking and random thinking modes. The specific steps are as follows. Inertial-thinking prompt design:
Inertial_Thinking = "You are a TaskName classifier. Classify {query} into one of the following categories. Options:\nA. Asking about privacy or background information; select the tools [findName_A, findPhoneNumber_S, findBackGround_B]\nB. Asking about the weather; select the tools [xxx_A, xxx_S, xxx_B]\n...Z. Asking for a dietary recipe based on age; select the tools [xxx_S, xxx_H]\n
Note: output only the JSON data format of the tool name set, as follows:\n
{"func": ["findName_A", "findPhoneNumber_S", "findBackGround_B"]}\nDo not output any text other than the JSON format.\n"
where findName, findPhoneNumber, and findBackGround are English representations of TaskName, and [findName_A, findPhoneNumber_S, findBackGround_B] means that executing these three intermediate nodes in sequence completes the question-answering for the current scenario, with material integration and answer production performed at the current node. The output produced under this sample prompt makes full use of the strong general instruction-understanding ability of the large language model base and simulates the inertial thinking mode of humans.
Random-thinking prompt design:
Random_Thinking = "You are a TaskName classifier. Classify {query} into one of the following categories. Options:\nA. Asking about privacy or background information; available tool set [findName_A, findPhoneNumber_S, findBackGround_B]\nB. Asking about the weather; available tool set [xxx_A, xxx_S, xxx_B]\n...Z. Asking for a dietary recipe based on age; available tool set [xxx_S, xxx_H]\n
Note:
1. For any option, exactly one tool may be selected, chosen arbitrarily from its available tool set, as the dynamic task planning output of this task.\n
2. Output only the JSON data format of the tool name set, as follows:\n
{"func": ["findName_A"]}\n
3. Do not output any text other than the JSON format.\n
4. If the special scenario xxx is encountered, findName_A from the tool set of option A and xxx_S from option Z may be selected together as the result of this dynamic task planning; the output format is:\n
{"func": ["findName_A", "xxx_S"]}\n"
where the special scenario xxx mentioned in note 4, i.e., any special scenario encountered while designing the current dynamic task planning function node, can be added to the notes of the prompt through this kind of standardized constraint. The output produced under this sample prompt makes full use of the uncontrollable nature of large language model output and simulates the random thinking mode of humans.
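For illustration, a minimal sketch of executing a planned chain such as [findName_A, findPhoneNumber_S, findBackGround_B] in sequence and gathering the material for answer production at the current node; the registry and the returned structure are assumptions:

```python
def execute_plan(registry: dict, plan: list[str], query: str) -> dict:
    """Run the planned intermediate nodes in order and collect their material.

    Each node function is assumed to return a piece of material (text, table,
    image reference, etc.); the current node then integrates everything into
    one answer payload.
    """
    material = {}
    for name in plan:                         # e.g. "findName_A", "findPhoneNumber_S"
        node_fn = registry.get(name)
        material[name] = node_fn(query) if node_fn else None
    missing = [n for n, m in material.items() if m is None]
    return {"material": material, "missing": missing}   # checked by LookUp() later
```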
A single-neuron architecture from neural networks, i.e., a logistic regression function, is used to compute which thinking mode should be selected for the current dynamic task planning; the formula is the standard logistic function
p = 1 / (1 + e^(-y));
where y is a multiple linear regression function given by:
y = w0,0 + w0,1·x0,1 + w0,2·x0,2 + … + wi,j·xi,j
where wi,j are regression coefficients; here, i and j are function names such as findName_A and findName_B, and xi,j denotes the probability of jumping from function i to function j;
xi,j is computed following the ideas of PageRank and PersonalRank, using the recurrence
PR(i) = d · Σ_{j ∈ in(i)} PR(j) / |out(j)| + (1 - d) · r_i,  with r_u = 1 and r_i = 0 for i ≠ u, where in(i) denotes the nodes pointing to i and out(j) the nodes that j points to;
where it is assumed that, inside the current dynamic task planning function, a dynamic task recommendation is to be made for the current intermediate node u, so the relevance of every intermediate node to u must be computed. The formula expresses a random walk starting from the current node j: at every node the walk stops with probability 1-d and restarts from u, or continues with probability d, choosing uniformly at random one of the nodes the current node points to and walking onward. After many rounds of walking, the probability that each intermediate node related to the current node is visited converges to a stable value, from which the probability xj,i of walking from node j to node i can be computed. The multiple linear regression coefficients wi,j are solved by minimizing the mean squared error (MSE), i.e., sum((y_actual - y_estimate)^2); the parameters wi,j can be optimized by methods such as gradient descent so that the MSE is minimal. The wi,j are initialized randomly in the range 0 to 1; later, based on human feedback given as thumbs-up or thumbs-down, the thinking path is traced back and the wi,j weights are dynamically updated by backpropagation.
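A minimal sketch of this computation, assuming the sub-network is given as an adjacency dictionary; the damping value, iteration count, and helper names are illustrative assumptions rather than parameters fixed by the method:

```python
import math

def personal_rank(graph: dict, u: str, d: float = 0.85, iters: int = 50) -> dict:
    """Visit probabilities for a random walk that restarts at node u.

    graph maps a node name to the list of nodes it points to; the converged
    value for node i plays the role of x_{j,i} relative to the restart node u.
    """
    nodes = set(graph) | {i for outs in graph.values() for i in outs}
    rank = {n: 0.0 for n in nodes}
    rank[u] = 1.0
    for _ in range(iters):
        new = {n: 0.0 for n in nodes}
        new[u] += 1.0 - d                         # stop with probability 1-d, restart at u
        for j, outs in graph.items():
            if outs:
                share = d * rank[j] / len(outs)   # continue with probability d
                for i in outs:
                    new[i] += share               # uniform choice among out-neighbours
        rank = new
    return rank

def mode_score(weights: dict, x: dict, bias: float = 0.0) -> float:
    """Logistic score of the linear combination y = bias + sum(w_ij * x_ij)."""
    y = bias + sum(w * x.get(key, 0.0) for key, w in weights.items())
    return 1.0 / (1.0 + math.exp(-y))
```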
With the above computations, for the current task planning node, the probabilities of adopting the inertial thinking mode and the random thinking mode are computed from the most recent history data; the thinking mode with the larger probability is then retained to complete the dynamic task planning of the current node, and the node to visit next is computed:
TaskName_# = TaskName_*(query),  *, # ∈ {A, B, C, …, Z}
where TaskName_* is the function name of the current dynamic task planning node and TaskName_# is the function name of the next dynamic task planning node.
Step 5: inside the dynamic task planning node function, an observer model is designed before the next dynamic task planning node is output. Only when the conditions of the observer model are satisfied does planning proceed to the next dynamic task planning node; otherwise the search task of the current task planning node is terminated early and the user is informed promptly, in a relatively tactful tone, that the system cannot answer the question.
FlagBool = LookUp(TaskName_A, …, TaskName_*),  * ∈ {A, B, C, …, Z}
where LookUp() is the observer function and FlagBool defaults to True. It checks the plan obtained at the current dynamic task planning node through the Inertial_Thinking or Random_Thinking mode, such as [findName_A, findPhoneNumber_S, findBackGround_B], verifying in turn whether each node has found the data it is responsible for. If one or more items are not found, FlagBool = False, and task reallocation is attempted through two routes: switching the thinking mode and backtracking to the previous task planning node. Each type of attempt is made no more than 3 times and the total attempt time does not exceed 30 seconds; if the system is still attempting after 30 seconds, the attempt at the current dynamic task planning node is terminated early, the system acknowledges its limitation, and the user is promptly asked to try a different question.
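For illustration, a minimal sketch of this observer and stop-loss logic; the ordering of the recovery routes and the run_plan / switch_mode / backtrack callables are assumptions consistent with the 3-attempt and 30-second limits above:

```python
import time

def look_up(material: dict) -> bool:
    """Observer function: True only if every planned node found its data."""
    return all(value is not None for value in material.values())

def plan_with_stop_loss(run_plan, switch_mode, backtrack,
                        max_attempts: int = 3, budget_s: float = 30.0):
    """Try the current plan; on failure, switch thinking mode or backtrack,
    each route at most max_attempts times and within an overall time budget."""
    start = time.monotonic()
    for recover in (switch_mode, backtrack):
        for _ in range(max_attempts):
            if time.monotonic() - start > budget_s:
                return "Sorry, I cannot answer this question; please try another one."
            material = run_plan()
            if look_up(material):            # FlagBool stays True: proceed
                return material
            recover()                        # FlagBool is False: adjust and retry
    return "Sorry, I cannot answer this question; please try another one."
```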
It should be noted that, within the scope of protection defined by the claims of the present invention, the following embodiments may be combined, extended, or replaced in any logical manner on the basis of the above detailed description, for example its disclosed technical principles and its disclosed or implicitly disclosed technical features.
Other embodiments include, but are not limited to, the following:
Embodiment 1
A dynamic task planning method combining a large language model with the Agent concept includes the following steps:
Step 1: collect scene dialogue content from open or vertical domains, and use the existing scene dialogue data to abstract and construct a thinking space network for real-world natural interactive language;
Step 2: at each node of the thinking space network, design a task output function based on the Agent concept, used for task decomposition at the current node and for integrating material data;
inside each task output function of the thinking space network, a prompt specification is written with prompt engineering, task-decomposition prompts are designed for the current task and, relying on the large language model's own comprehension ability, the explicit function of the next task node is produced, after which the method recurses into a deeper level of reasoning until the large language model judges that a satisfactory answer has been found;
inside each task output function of the thinking space network, the prompt engineering design combines a random walk algorithm with inertial thinking and random thinking, dynamically forming prompts with different logic and outputting the dynamic task planning result;
inside each task output function of the thinking space network, for the process in which each task function recursively enters the next level of task splitting, a strategy of timely stop-loss and dynamic backtracking adjustment is adopted to prevent task planning from falling into an infinite loop during dynamic task splitting.
Embodiment 2
Based on Embodiment 1, in step 1, collecting scene dialogue content from open or vertical domains and using the existing scene dialogue data to abstract and construct the thinking space network for real-world natural interactive language specifically includes the following sub-steps:
For simple scenarios, based on a selected vertical-domain dataset, user question-and-answer records from that vertical domain are collected and summarized into 12 broad categories of questions. A thinking space network is modeled on these 12 categories to form a first-generation thinking space network, which consists of one root node, i.e., the entry node, and 12 leaf nodes, i.e., the 12 categories of questions. The root node handles the natural language of human-computer interaction, and each leaf node produces, in the question-answering scenario, the answer to a question based on the thinking space network. The network is expressed as:
T = {t0, t11, t12, …, tij};  i = 1, 1 ≤ j ≤ 12
where T denotes the thinking space network, t0 denotes the entry node, and tij denotes a leaf node, i.e., an answer output node.
Embodiment 3
Based on Embodiment 2, in step 1, collecting scene dialogue content from open or vertical domains and using the existing scene dialogue data to abstract and construct the thinking space network for real-world natural interactive language specifically includes the following sub-steps: when facing a complex scenario, the thinking space network is constructed according to the following sub-steps:
Intermediate nodes, i.e., dynamic task planning nodes, are constructed, with interconnections designed between them. The function name of any dynamic task planning node follows this convention:
TaskName_*(),  * ∈ {A, B, C, …, Z}
where TaskName is the task name of this dynamic task planning node and * is an uppercase letter from A to Z. This also restricts the task output candidate set of each dynamic task planning node to these 26 letters, so a classification problem never exceeds 26 classes; if it would, a further level of abstract sub-classification is performed so that the classification problem at each layer of dynamic task planning nodes remains within 26 classes.
Embodiment 4
Based on Embodiment 2 or Embodiment 3, in step 2, designing the dynamic task planning output function at each node of the thinking space network based on the Agent concept, for task decomposition at the current node and for material data integration, specifically includes the following sub-step: inside each dynamic task planning function of the thinking space network, strict prompts are written to perform dynamic task planning and credibility control for the current task; the goal is to find, through the current node, the next dynamic task planning node in order to search for credible full-modal material that satisfies the current task, thereby steering the output of the large language model toward credible and controllable evidence for the question.
Embodiment 5
基于实施例4,所述严格提示词的格式包括:Based on Example 4, the format of the strict prompt word includes:
Prompt=“你是一个TaskName分类器,请将{query}分类到以下类别中;选项:\nA.询问隐私,背景信息\nB.询问天气情况\n...Z.依据年龄询问饮食配方\n注意:只需输出选项编号,无需其他文字输出;\n”Prompt = "You are a TaskName classifier, please classify {query} into the following categories; options: \nA. Ask for privacy, background information\nB. Ask for weather conditions\n...Z. Ask for dietary recipes based on age\nNote: Only the option number needs to be output, no other text output is required; \n"
其中,Prompt为严格提示词,query为用户问题。Here, Prompt is the strict prompt and query is the user's question.
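The strict prompt can be assembled and its single-letter answer parsed roughly as sketched below; `ask_llm` is a hypothetical callable standing in for the large-language-model call, and the option texts are abbreviated from the example above.

```python
# Sketch: fill the strict classification prompt with the user query, then accept only a
# single option letter A..Z from the model output (a simple credibility check).

import re

STRICT_PROMPT_TEMPLATE = (
    "你是一个TaskName分类器,请将{query}分类到以下类别中;选项:\n"
    "A.询问隐私,背景信息\n"
    "B.询问天气情况\n"
    "...\n"
    "Z.依据年龄询问饮食配方\n"
    "注意:只需输出选项编号,无需其他文字输出;\n"
)


def classify(query: str, ask_llm) -> str:
    """Return the single option letter chosen by the model for this query."""
    prompt = STRICT_PROMPT_TEMPLATE.format(query=query)
    raw = ask_llm(prompt)
    match = re.search(r"[A-Z]", raw)          # keep only the option letter, discard anything else
    if match is None:
        raise ValueError(f"unexpected classifier output: {raw!r}")
    return match.group(0)
```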
实施例6Example 6
基于实施例1,所述提示工程设计基于随机游走算法、惯性思维和随机思维相结合,动态形成不同逻辑的提示词,输出任务动态规划结果,具体包括如下子步骤:Based on Example 1, the prompt engineering design combines a random-walk algorithm with inertial thinking and random thinking, dynamically forms prompts with different logic, and outputs task dynamic planning results, specifically including the following sub-steps:
步骤21,设计如下惯性思维模式提示工程:Step 21: design the following inertial-thinking-mode prompt engineering:
Inertial_Thinking="你是一个TaskName分类器,请将{query}分类到以下类别中;选项:\nA.询问隐私,背景信息,选择工具[findName_A,findPhoneNumber_S,findBackGround_B]\n B.询问天气情况,选择工具[xxx_A,xxx_S,xxx_B]\n...Z.依据年龄询问饮食配方,选择工具[xxx_S,xxx_H]\nInertial_Thinking="You are a TaskName classifier, please classify {query} into the following categories; options:\nA. Ask for privacy, background information, select tools [findName_A, findPhoneNumber_S, findBackGround_B]\n B. Ask for weather conditions, select tools [xxx_A, xxx_S, xxx_B]\n...Z. Ask for dietary formula based on age, select tools [xxx_S, xxx_H]\n
只需输出工具名称集合的JSON数据格式,如下所示:\nJust output the JSON data format of the tool name collection, as follows:\n
{"func":["findName_A","findPhoneNumber_S","findBackGround_B"]}\n无需输出除JSON格式外的其他文字信息\n"{"func":["findName_A","findPhoneNumber_S","findBackGround_B"]}\nNo need to output other text information except JSON format\n"
其中,findName、findPhoneNumber和findBackGround均为TaskName的英文表示,[findName_A,findPhoneNumber_S,findBackGround_B]表示依次执行这三个中间节点,即可完成当前场景问答,并在当前节点进行素材整合,生产答案内容;Here, findName, findPhoneNumber and findBackGround are all English representations of TaskName; [findName_A, findPhoneNumber_S, findBackGround_B] means that executing these three intermediate nodes in sequence completes the question answering for the current scenario, after which the materials are integrated at the current node to produce the answer content;
设计如下随机思维模式提示工程:Design the following random-thinking-mode prompt engineering:
Random_Thinking="你是一个TaskName分类器,请将{query}分类到以下类别中;选项:\nA.询问隐私,背景信息,可选择工具集[findName_A,findPhoneNumber_S,findBackGround_B]\n B.询问天气情况,可选择工具集为[xxx_A,xxx_S,xxx_B]\n...Z.依据年龄询问饮食配方,可选择工具集为[xxx_S,xxx_H]\nRandom_Thinking="You are a TaskName classifier, please classify {query} into the following categories; options:\nA. Ask for privacy, background information, select the tool set [findName_A, findPhoneNumber_S, findBackGround_B]\n B. Ask for weather conditions, select the tool set [xxx_A, xxx_S, xxx_B]\n...Z. Ask for dietary formula based on age, select the tool set [xxx_S, xxx_H]\n
任何一个选项,在可选工具集中任意选出一个,且只能选择一个工具,作为本次任务动态规划输出\nAny option, select any one of the optional tools, and only one tool can be selected as the dynamic planning output of this task\n
只需输出工具名称集合的JSON数据格式,如下所示:\nJust output the JSON data format of the tool name collection, as follows:\n
{"func":["findName_A"]}\n{"func":["findName_A"]}\n
无需输出除JSON格式外的其他文字信息\nNo need to output other text information except JSON format\n
如果遇到xxx特殊场景,同时选择A选项工具集中的findName_A,和Z选项中的xxx_S作为本次任务动态规划的结果;输出格式为:\nIf you encounter a special scenario like xxx, select findName_A in the A option tool set and xxx_S in the Z option as the result of the dynamic planning of this task; the output format is:\n
{"func":["findName_A","xxx_S"]}\n"{"func":["findName_A","xxx_S"]}\n"
其中,xxx特殊场景即在设计当前任务动态规划函数节点过程中遇到的特殊场景,均可通过这样的规范约束方式,添加到注意选项中去;Here, an xxx special scenario is any special scenario encountered while designing the current task dynamic planning function node; all such scenarios can be added to the attention options through this kind of standardized constraint;
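A hedged sketch of how the output of the Inertial_Thinking or Random_Thinking prompt above could be consumed: the model is expected to return only a JSON object of the form {"func": [...]}, which is parsed into the ordered list of intermediate nodes to execute. `ask_llm` and `run_node` are hypothetical helpers standing in for the model call and for executing one task dynamic planning node.

```python
# Sketch: run one thinking-mode prompt, parse the JSON tool list, and execute the
# recommended intermediate nodes in sequence, collecting their materials.

import json


def plan_and_execute(query: str, thinking_prompt: str, ask_llm, run_node) -> list:
    # .replace is used instead of .format because the prompt also contains literal braces
    # in the JSON example it shows to the model.
    prompt = thinking_prompt.replace("{query}", query)
    raw = ask_llm(prompt).strip()
    plan = json.loads(raw)                      # e.g. {"func": ["findName_A", "findPhoneNumber_S"]}
    materials = []
    for node_name in plan.get("func", []):      # execute the recommended nodes in order
        materials.append(run_node(node_name, query))
    return materials                            # integrated at the current node to produce the answer
```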
步骤22,利用神经网络中的单神经元架构,逻辑回归函数来计算当前动态任务规划需要选择的思维模式,公式如下:Step 22: use the single-neuron architecture of a neural network, i.e., the logistic regression function, to compute which thinking mode the current dynamic task planning should select, with the following formula:
P = sigmoid(y) = 1 / (1 + e^(−y));
其中,y是多元线性回归函数,公式如下:where P is the probability of selecting a given thinking mode and y is the multiple linear regression function, given by:
y = w0,0 + w0,1·x0,1 + w0,2·x0,2 + … + wi,j·xi,j;
其中,wi,j为回归系数,i,j均是函数名称,xi,j表示从函数i跳转到函数j的概率;其中,xi,j的计算过程采用如下公式:Here, wi,j are regression coefficients, i and j are both function names, and xi,j denotes the probability of jumping from function i to function j; xi,j is calculated with the following formula:
设在当前任务动态规划函数中给当前中间节点u进行动态任务推荐,需计算所有中间节点对于当前节点u的相关度,采用带重启的随机游走:Suppose that, in the current task dynamic planning function, a dynamic task recommendation is to be made for the current intermediate node u; the relevance of every intermediate node to the current node u must then be computed with a random walk with restart:
PR(i) = (1 − d)·r_u(i) + d·Σ_{j→i} PR(j) / |out(j)|, where r_u(i) = 1 if i = u and 0 otherwise;
该公式表示从当前j节点出发,开始随机游走,每到一个节点都以1-d的概率停止游走并从u重新开始,或者以d的概率继续游走,从当前节点指向的节点中按照均匀分布随机选择一个节点往下游走,这样经过多轮游走后,每个与当前节点相关的中间节点被访问的概率也会收敛趋于稳定,此时从概率上计算出由节点j走到节点i的xj,i概率;That is, starting from the current node j, a random walk begins; at every node reached, the walk stops with probability 1 − d and restarts from u, or continues with probability d by choosing, uniformly at random, one of the nodes the current node points to and walking onward. After many rounds of walking, the probability of visiting each intermediate node related to the current node converges and stabilizes, and the probability xj,i of walking from node j to node i is then obtained from these visit probabilities;
关于wi,j多元线性回归系数求解,具体采用最小化均方误差MSE,即sum(y_actual-y_estimate)^2,通过梯度下降等方法来优化参数wi,j使得MSE最小;wi,j初始数值为随机初始化0到1范围内参数,后期经过人类反馈结果,以踩、赞的行为方式回溯思维路径,反向传播动态更新wi,j权重;The multiple linear regression coefficients wi,j are solved by minimizing the mean squared error MSE, i.e., sum(y_actual − y_estimate)^2, using gradient descent or similar methods to optimize the parameters wi,j so that the MSE is minimized; wi,j are randomly initialized in the range 0 to 1, and are later dynamically updated by backpropagation based on human feedback, in which thumbs-up and thumbs-down behavior is used to trace back along the thinking path;
步骤23,通过以上计算,分别计算出当前任务规划节点,基于上一次历史记录数据计算出采用惯性思维模式和随机思维模式的概率,对比延续采用概率较大的思维模式方案完成当前节点的任务动态规划,计算出下一步该去的节点位置,即:Step 23: with the above calculations, the probabilities of adopting the inertial thinking mode and the random thinking mode at the current task planning node are computed from the most recent history data; the two are compared and the thinking mode with the higher probability is adopted to complete the task dynamic planning of the current node and to compute the node to visit next, namely:
TaskName_# = TaskName_*(query)  (*, # ∈ {A, B, C, …, Z})
其中,TaskName_*表示当前任务动态规划节点的函数名称,TaskName_#表示下一任务动态规划节点函数名称。Here, TaskName_* denotes the function name of the current task dynamic planning node, and TaskName_# denotes the function name of the next task dynamic planning node.
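The mode-selection mechanism of steps 21 to 23 could be sketched as follows, under assumptions: the graph of intermediate nodes is given as an adjacency dict, x_{j,i} is estimated by simulating the random walk with restart, the logistic function turns the weighted sum into a probability per thinking mode, and thumbs-up/thumbs-down feedback drives a gradient update of the weights. All helper names, the feedback encoding, and the learning rate are illustrative and not taken from the patent.

```python
# Sketch of steps 21-23: random walk with restart -> x values, logistic regression over
# w * x -> probability per thinking mode, comparison -> chosen mode, feedback -> weight update.

import math
import random
from collections import defaultdict


def random_walk_with_restart(graph: dict, u: str, d: float = 0.85, rounds: int = 10000) -> dict:
    """Estimate, for restarts at u, the stationary visit probability of every node."""
    visits = defaultdict(int)
    node = u
    for _ in range(rounds):
        visits[node] += 1
        if random.random() > d or not graph.get(node):
            node = u                              # stop with probability 1 - d and restart from u
        else:
            node = random.choice(graph[node])     # continue along a uniformly chosen out-edge
    total = sum(visits.values())
    return {n: c / total for n, c in visits.items()}


def mode_probability(weights: dict, x: dict, bias: float = 0.0) -> float:
    """P = sigmoid(y), with y = bias + sum of w_{i,j} * x_{i,j}."""
    y = bias + sum(weights.get(k, 0.0) * v for k, v in x.items())
    return 1.0 / (1.0 + math.exp(-y))


def choose_mode(p_inertial: float, p_random: float) -> str:
    """Continue with whichever thinking mode has the higher probability."""
    return "Inertial_Thinking" if p_inertial >= p_random else "Random_Thinking"


def update_weights(weights: dict, x: dict, feedback: int, p: float, lr: float = 0.1) -> None:
    """One gradient step from human feedback (+1 thumbs-up, -1 thumbs-down), sketching
    the backpropagation-style update of w_{i,j}; weights start randomly in [0, 1)."""
    target = 1.0 if feedback > 0 else 0.0
    error = p - target                            # gradient of the loss with respect to y (sketch)
    for k, v in x.items():
        weights[k] = weights.get(k, random.random()) - lr * error * v
```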
实施例7Example 7
基于实施例6,所述在思维空间网络中每一任务输出函数内部,针对每个任务函数经任务拆分递归进入下一层任务拆分的过程,采用及时止损、动态回溯调整的策略,防止任务规划在动态拆分任务过程中,陷入死循环情况,具体包括如下子步骤:Based on Example 6, inside every task output function of the thinking space network, for the process in which each task function recursively enters the next layer of task splitting, a strategy of timely stop-loss and dynamic backtracking adjustment is adopted to prevent task planning from falling into an infinite loop while tasks are being split dynamically, specifically including the following sub-steps:
在任务动态规划节点函数内部,输出下一任务动态规划节点之前设计观察者模型,只有当满足观察者模型条件时,才进入下一任务动态规划节点,否则提前终止当前任务规划节点的搜索任务,及时反馈用户。Inside the task dynamic planning node function, an observer model is designed before the next task dynamic planning node is output; only when the conditions of the observer model are satisfied does execution enter the next task dynamic planning node, otherwise the search task of the current task planning node is terminated early and the user is given timely feedback.
实施例8Example 8
基于实施例7,所述及时反馈用户,具体包括如下子步骤:采用比较委婉的语气回应自己无法回答相关问题,即:Based on Example 7, the timely feedback to the user specifically includes the following sub-steps: respond in a relatively tactful tone that the system is unable to answer the question, namely:
FlagBool = LookUp(TaskName_A, …, TaskName_*)  (* ∈ {A, B, C, …, Z})
其中,LookUp()为观察者函数,FlagBool默认值为True,负责对当前任务动态规划节点经过Inertial_Thinking或Random_Thinking思考模式得到的任务进行规划;[findName_A,findPhoneNumber_S,findBackGround_B]进行依次检验每个节点是否找到相关负责数据,如果有一项或者几项没有找到,则FlagBool=False,采用切换思维模式和回溯上一个任务规划节点两种途径进行任务再分配尝试。Here, LookUp() is the observer function and FlagBool defaults to True; it is responsible for checking the tasks obtained by the current task dynamic planning node through the Inertial_Thinking or Random_Thinking mode. For [findName_A, findPhoneNumber_S, findBackGround_B], each node is checked in turn for whether the data it is responsible for has been found; if one or more items are not found, FlagBool = False, and a task re-allocation attempt is made in two ways: switching the thinking mode, or backtracking to the previous task planning node.
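A sketch of the observer check and the two recovery paths described above; the material store and the `switch_mode` and `backtrack` callables are hypothetical stand-ins for the actual mechanisms.

```python
def look_up(planned_nodes: list, found_materials: dict) -> bool:
    """Observer function: return FlagBool, True only if every planned node found the data
    it is responsible for."""
    return all(found_materials.get(node) is not None for node in planned_nodes)


def try_reassign(planned_nodes, found_materials, switch_mode, backtrack) -> bool:
    """If any material is missing, attempt re-allocation: first switch the thinking mode,
    then backtrack to the previous task planning node."""
    flag_bool = look_up(planned_nodes, found_materials)
    if not flag_bool:
        if not switch_mode():
            backtrack()
    return flag_bool
```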
实施例9Embodiment 9
基于实施例8,所述任务再分配尝试,具体包括如下子步骤:设置每种尝试不超过设定次数,总尝试时间不超过设定时间,当设定时间后仍处于尝试状态,提前终止当前任务动态规划节点尝试,承认自己的不足并且及时反馈用户请换个问题试试。Based on Example 8, the task re-allocation attempt specifically includes the following sub-steps: each kind of attempt is limited to a set number of times, and the total attempt time is limited to a set duration; if the system is still attempting after the set time has elapsed, the current task dynamic planning node attempt is terminated early, the system acknowledges its limitation, and the user is promptly asked to try a different question.
实施例10Example 10
基于实施例9,所述设定次数包括3次,设定时间包括30秒。Based on Example 9, the set number of times includes 3 times, and the set time includes 30 seconds.
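Combining Examples 9 and 10, the stop-loss limits could look roughly like the sketch below; the loop structure, the strategy callables, and the exact wording of the fallback reply are assumptions made for illustration.

```python
# Sketch of the stop-loss limits: at most 3 attempts per re-allocation strategy and a
# 30-second total budget, after which the node admits failure and asks for a new question.

import time

MAX_ATTEMPTS_PER_STRATEGY = 3     # set number of times (Example 10)
MAX_TOTAL_SECONDS = 30            # set time in seconds (Example 10)


def attempt_with_stop_loss(strategies: list, query: str) -> str:
    """Try each re-allocation strategy within the limits, then fall back tactfully."""
    fallback = "抱歉,这个问题我暂时回答不好,请换个问题试试。"  # tactful reply (wording assumed)
    start = time.monotonic()
    for strategy in strategies:                   # e.g. [switch_thinking_mode, backtrack_to_previous_node]
        for _ in range(MAX_ATTEMPTS_PER_STRATEGY):
            if time.monotonic() - start > MAX_TOTAL_SECONDS:
                return fallback                   # still attempting after the set time: stop early
            result = strategy(query)
            if result is not None:
                return result
    return fallback
```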
描述于本发明实施例中所涉及到的单元可以通过软件的方式实现,也可以通过硬件的方式来实现,所描述的单元也可以设置在处理器中。其中,这些单元的名称在某种情况下并不构成对该单元本身的限定。The units involved in the embodiments of the present invention may be implemented by software or hardware, and the units described may also be arranged in a processor. The names of these units do not, in some cases, limit the units themselves.
根据本发明实施例的一个方面,提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机指令,该计算机指令存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该计算机设备执行上述各种可选实现方式中提供的方法。According to one aspect of an embodiment of the present invention, a computer program product or a computer program is provided, the computer program product or the computer program includes a computer instruction, and the computer instruction is stored in a computer-readable storage medium. A processor of a computer device reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the computer device executes the method provided in the above various optional implementations.
作为另一方面,本发明实施例还提供了一种计算机可读介质,该计算机可读介质可以是上述实施例中描述的电子设备中所包含的;也可以是单独存在,而未装配入该电子设备中。上述计算机可读介质承载有一个或者多个程序,当上述一个或者多个程序被一个该电子设备执行时,使得该电子设备实现上述实施例中所述的方法。As another aspect, an embodiment of the present invention further provides a computer-readable medium, which may be included in the electronic device described in the above embodiment; or may exist independently without being assembled into the electronic device. The above computer-readable medium carries one or more programs, and when the above one or more programs are executed by an electronic device, the electronic device implements the method described in the above embodiment.
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410482023.4A (CN118332120A) | 2024-04-22 | 2024-04-22 | A task dynamic planning method combining large language model and agent thinking |
| Publication Number | Publication Date |
|---|---|
| CN118332120A | 2024-07-12 |
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |