CN118606438A


Info

Publication number: CN118606438A
Application number: CN202410864955.5A
Authority: CN
Inventors: 郑嘉伟; 官俊; 沈阳超; 贾栩杰
Original assignee: Zhongdian Jinxin Digital Technology Group Co ltd
Current assignee: Zhongdian Jinxin Digital Technology Group Co ltd
Priority date: 2024-06-28
Filing date: 2024-06-28
Publication date: 2024-09-06

Abstract


The present application relates to a data analysis method, apparatus, computer device, computer-readable storage medium, and computer program product. The method comprises: obtaining a question text of a data query type; performing similarity matching between the question text and first metadata in a metadata repository, and determining, from the first metadata and according to the similarity matching result, second metadata corresponding to the question text; constructing an initial context prompt text according to the question text, the second metadata, and a context prompt template; verifying the second metadata according to chain verification logic and a large language model, and adjusting the initial context prompt text based on the verification result to obtain a target context prompt text; and obtaining a data query statement corresponding to the question text based on the target context prompt text and the large language model, and performing retrieval in a business database based on the data query statement to obtain a data analysis result. The method can improve the accuracy of data analysis.

Description


Data analysis method, apparatus, computer device, readable storage medium, and program product

Technical Field

The present application relates to the field of artificial intelligence, and in particular to a data analysis method, apparatus, computer device, computer-readable storage medium, and computer program product.

Background Art

As enterprises grow, the volume of their business data keeps increasing. Business data is generally stored in the databases of enterprise business systems, and data analysis and aggregation tasks require querying and analyzing that data.

In conventional approaches, data analysts manually define business indicators based on the physical structure of the underlying data warehouse (for example, the instances, tables, and fields stored in the database) and on the business meaning of those instances, tables, and fields. Relying on their own experience, they then convert the indicators into executable SQL statements and run those statements against the business database to obtain data analysis results.

However, because the relationship between the physical structure of the underlying data warehouse and the business is complex, SQL statements written from experience tend to be inaccurate, which in turn makes the data analysis results inaccurate.

Summary of the Invention

In view of the above technical problem, it is necessary to provide a data analysis method, apparatus, computer device, computer-readable storage medium, and computer program product.

In a first aspect, the present application provides a data analysis method, comprising:

obtaining a question text of a data query type;

performing similarity matching between the question text and first metadata in a metadata repository, and determining, from the first metadata and according to the similarity matching result, second metadata corresponding to the question text;

constructing an initial context prompt text according to the question text, the second metadata, and a context prompt template, the initial context prompt text containing chain verification logic;

verifying the second metadata according to the chain verification logic and a large language model, and adjusting the initial context prompt text based on the verification result to obtain a target context prompt text; and

obtaining a data query statement corresponding to the question text based on the target context prompt text and the large language model, and performing retrieval in a business database based on the data query statement to obtain a data analysis result.

In one embodiment, verifying the second metadata according to the chain verification logic and the large language model, and adjusting the initial context prompt text based on the verification result to obtain the target context prompt text, comprises:

verifying the second metadata according to the chain verification logic and the large language model, and screening and adjusting the second metadata based on the verification result to obtain target metadata; and

adjusting the initial context prompt text based on the target metadata to obtain the target context prompt text.

In one embodiment, the second metadata includes data tables and fields, and verifying the second metadata according to the chain verification logic and the large language model, and screening and adjusting the second metadata based on the verification result to obtain the target metadata, comprises:

executing the chain verification logic and using the large language model to analyze the correlation between the fields and the question intent of the question text, to obtain a correlation result;

determining target fields among the fields based on the correlation result; and

determining the data tables that contain the target fields as target data tables, the target fields and the target data tables constituting the target metadata.

In one embodiment, performing similarity matching between the question text and the first metadata in the metadata repository, and determining the second metadata corresponding to the question text from the first metadata according to the similarity matching result, comprises:

performing word segmentation on the question text to obtain question keywords, and performing word embedding on the question keywords to obtain a question vector;

calculating the similarity between the question vector and the vector representation of each item of first metadata in the metadata repository; and

determining the first metadata whose vector representation has a similarity greater than a preset threshold as the second metadata corresponding to the question text.

In one embodiment, constructing the initial context prompt text according to the question text, the second metadata, and the context prompt template comprises:

performing semantic recognition on the question text by means of a natural language conversion microservice to obtain the question intent corresponding to the question text; and

constructing the initial context prompt text according to the question intent, the second metadata, and the chain verification logic in the context prompt template.

In one embodiment, before obtaining the question text of the data query type, the method further comprises:

extracting database, schema, table, field, and remark information from the business database to obtain the first metadata; and

performing word-vector embedding on the first metadata to obtain a vector representation of the first metadata, and constructing the metadata repository based on that vector representation.
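The repository-building step above can be sketched as follows. This is only an illustration under stated assumptions: it uses an in-memory SQLite database as a stand-in for the business database (SQLite exposes table and field names but no separate schema or remark information), the table `sale_detail` is a hypothetical example, and `toy_embed` is a hash-based placeholder for whatever word-embedding model an implementation would actually use, which the text does not name.

```python
import hashlib
import math
import sqlite3

DIM = 64  # size of the toy embedding; an arbitrary illustrative choice


def toy_embed(text: str) -> list[float]:
    """Hash character bigrams into a fixed-size, L2-normalised vector.

    Placeholder for a real word-embedding model (an assumption of this sketch).
    """
    vec = [0.0] * DIM
    for i in range(len(text) - 1):
        digest = hashlib.md5(text[i:i + 2].encode("utf-8")).digest()
        vec[digest[0] % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def extract_first_metadata(conn: sqlite3.Connection) -> list[dict]:
    """Collect one record per column: table name, field name, declared type."""
    records = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        for _cid, field, ftype, *_rest in conn.execute(f'PRAGMA table_info("{table}")'):
            records.append({"table": table, "field": field, "type": ftype})
    return records


def build_metadata_repository(conn: sqlite3.Connection) -> list[dict]:
    """Attach a vector representation to every extracted metadata record."""
    repository = []
    for rec in extract_first_metadata(conn):
        text = f"{rec['table']} {rec['field']}"
        repository.append({**rec, "vector": toy_embed(text)})
    return repository


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sale_detail (product TEXT, amount REAL, sale_date TEXT, area TEXT)"
)
repository = build_metadata_repository(conn)
```

In a production setting the vectors would normally be stored in a vector index rather than a Python list; the list is used here only to keep the sketch self-contained.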

In a second aspect, the present application further provides a data analysis apparatus, comprising:

an acquisition module, configured to obtain a question text of a data query type;

a matching module, configured to perform similarity matching between the question text and first metadata in a metadata repository, and to determine, from the first metadata and according to the similarity matching result, second metadata corresponding to the question text;

a first construction module, configured to construct an initial context prompt text according to the question text, the second metadata, and a context prompt template, the initial context prompt text containing chain verification logic;

a verification module, configured to verify the second metadata according to the chain verification logic and a large language model, and to adjust the initial context prompt text based on the verification result to obtain a target context prompt text; and

a query module, configured to obtain a data query statement corresponding to the question text based on the target context prompt text and the large language model, and to perform retrieval in a business database based on the data query statement to obtain a data analysis result.

In one embodiment, the verification module is specifically configured to verify the second metadata according to the chain verification logic and the large language model, and to screen and adjust the second metadata based on the verification result to obtain target metadata;

In one embodiment, the second metadata includes data tables and fields; the verification module is specifically configured to execute the chain verification logic and use the large language model to analyze the correlation between the fields and the question intent of the question text, to obtain a correlation result;

In one embodiment, the matching module is specifically configured to perform word segmentation on the question text to obtain question keywords, and to perform word embedding on the question keywords to obtain a question vector;

In one embodiment, the first construction module is specifically configured to perform semantic recognition on the question text by means of a natural language conversion microservice to obtain the question intent corresponding to the question text;

In one embodiment, the apparatus further comprises:

an extraction module, configured to extract database, schema, table, field, and remark information from the business database to obtain the first metadata; and

a second construction module, configured to perform word-vector embedding on the first metadata to obtain a vector representation of the first metadata, and to construct the metadata repository based on that vector representation.

In a third aspect, the present application further provides a computer device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the following steps when executing the computer program:

In a fourth aspect, the present application further provides a computer-readable storage medium storing a computer program, wherein the computer program implements the following steps when executed by a processor:

In a fifth aspect, the present application further provides a computer program product, comprising a computer program that implements the following steps when executed by a processor:

With the above data analysis method, apparatus, computer device, computer-readable storage medium, and computer program product, a question text of a data query type is obtained; similarity matching is performed between the question text and first metadata in a metadata repository, and second metadata corresponding to the question text is determined from the first metadata according to the similarity matching result; an initial context prompt text containing chain verification logic is constructed according to the question text, the second metadata, and a context prompt template; the second metadata is verified according to the chain verification logic and a large language model, and the initial context prompt text is adjusted based on the verification result to obtain a target context prompt text; a data query statement corresponding to the question text is obtained based on the target context prompt text and the large language model, and retrieval is performed in a business database based on the data query statement to obtain a data analysis result. Because the chain verification logic guides the large language model in verifying the second metadata, the accuracy of the target context prompt text can be ensured. The large language model then processes the target context prompt text, which improves the accuracy of the data query statement and, in turn, of the data analysis result.

Brief Description of the Drawings

To illustrate the technical solutions in the embodiments of the present application or in the related art more clearly, the drawings needed for describing them are briefly introduced below. The drawings described below show only some embodiments of the present application; a person of ordinary skill in the art can derive other related drawings from them without creative effort.

FIG. 1 is a schematic flowchart of a data analysis method in one embodiment;

FIG. 2 is a schematic flowchart of verifying and adjusting the initial context prompt text in one embodiment;

FIG. 3 is a schematic flowchart of determining target metadata in one embodiment;

FIG. 4 is a schematic flowchart of determining second metadata in one embodiment;

FIG. 5 is a schematic flowchart of constructing an initial context prompt text in one embodiment;

FIG. 6 is a schematic flowchart of a data analysis method in a specific embodiment;

FIG. 7 is a schematic flowchart of constructing a metadata repository in one embodiment;

FIG. 8 is a structural block diagram of a data analysis apparatus in one embodiment;

FIG. 9 is a diagram of the internal structure of a computer device in one embodiment.

Detailed Description

To make the purpose, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present application and do not limit it.

In one embodiment, as shown in FIG. 1, a data analysis method is provided. This embodiment is described with the method applied to a terminal. It is understood that the method can also be applied to a server, or to a system comprising a terminal and a server and implemented through interaction between the two. In this embodiment, the method includes the following steps:

Step 102: obtain a question text of a data query type.

In the embodiment of the present application, the terminal obtains the question text input by the user. The question text is of the data query type, that is, it expresses in natural language a data query requirement that is to be converted into an SQL (Structured Query Language) statement. For example, the question text input by the user may be "query the sales of goods in the southwest region in April".

Optionally, the terminal performs text understanding on the initial question text input by the user. The initial question text is treated as insufficiently clear and specific when the similarity between the question type that the terminal determines through semantic understanding and a predefined question template or predefined question type is below a preset threshold, or when its similarities to multiple predefined question types are all below the preset threshold. In that case, the terminal can use the large language model to present a clarifying question to the user. The terminal can also use predefined pattern-matching rules or a pre-designed rule engine to analyze the grammatical structure of the user input and common errors in it, and thereby identify and report possible problems or unclear points. Based on the user's reply to the clarifying question, the terminal enhances the initial question text to obtain a target question text, and performs subsequent processing based on the target question text.
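The optional clarification check can be illustrated with a small sketch. Everything named here is hypothetical: the similarity scores are assumed to come from some upstream semantic-understanding component, the threshold value `0.6` is arbitrary, and `enhance_question` simply appends the user's reply, which is only one of many ways the question could be enhanced.

```python
def needs_clarification(type_similarities: dict[str, float], threshold: float = 0.6) -> bool:
    """Return True when no predefined question type is similar enough to the input."""
    return all(score < threshold for score in type_similarities.values())


def enhance_question(initial_question: str, clarifying_answer: str) -> str:
    """Combine the initial question with the user's reply to the clarifying question."""
    return f"{initial_question}; {clarifying_answer}"


# Hypothetical similarities between the input and two predefined question types.
scores = {"sales_query": 0.41, "inventory_query": 0.38}

if needs_clarification(scores):
    # In a real system the reply would come from the user; it is hard-coded here.
    target_question = enhance_question(
        "query April sales", "southwest region, electronic products"
    )
else:
    target_question = "query April sales"
```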

Step 104: perform similarity matching between the question text and the first metadata in the metadata repository, and determine, from the first metadata and according to the similarity matching result, the second metadata corresponding to the question text.

In the embodiment of the present application, if a large language model is used directly to process the question text input by the user, the field names and table names in the query statement it outputs may differ from those actually used in the business database, because naming rules for fields and tables vary across domains and enterprises. The query statement output by the large language model is then inaccurate.

In this case, the terminal can determine the first metadata involved in the question text from a pre-built metadata repository. The metadata repository typically stores first metadata such as the table names and field names of the business database, and this first metadata may be stored in vectorized form. The terminal uses a similarity measure, for example cosine similarity, to compare the question text with the first metadata and, according to the matching result, determines the first metadata whose similarity exceeds a preset threshold as the second metadata, ensuring that the selected second metadata matches the question text sufficiently well.
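The threshold-based matching can be sketched as below. The three-dimensional vectors, the metadata names, and the threshold `0.8` are made-up illustrative values; a real system would obtain the vectors from an embedding model and tune the threshold empirically.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_second_metadata(
    question_vector: list[float],
    repository: dict[str, list[float]],
    threshold: float = 0.8,
) -> list[str]:
    """Keep the first-metadata entries whose similarity exceeds the threshold."""
    return [
        name
        for name, vector in repository.items()
        if cosine_similarity(question_vector, vector) > threshold
    ]


# Hypothetical vectorised first metadata and question vector.
repository = {
    "sale_detail.area": [1.0, 0.0, 0.9],
    "goods_flow.dealer": [0.0, 1.0, 0.0],
}
question_vector = [1.0, 0.0, 1.0]
second_metadata = select_second_metadata(question_vector, repository)
```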

Step 106: construct an initial context prompt text according to the question text, the second metadata, and the context prompt template.

The initial context prompt text contains chain verification logic.

In the embodiment of the present application, chain verification logic for the data query type may be pre-stored in the context prompt template. The terminal fills the template with the user's natural-language question text and the matched second metadata, combining them according to the template's format, to construct the initial context prompt text, so that the initial context prompt text clearly expresses the user's data query requirement.

In an optional embodiment, the terminal can generate chain verification logic matching the data query type of the question text based on data type verification, constraint checking, or other business-logic verification. When there is only a single data query type, the terminal can also integrate the chain verification logic into the context prompt template and verify the validity and accuracy of the second metadata for that query type under a standalone requirement.
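Filling a context prompt template might look like the sketch below. The template wording, the field layout, and the table and field names are assumptions made for illustration; the three verification steps paraphrase the exemplary chain verification logic given later in the description, and the actual template used by an implementation is not specified in the text.

```python
from string import Template

# Paraphrase of the exemplary three-step chain verification logic.
CHAIN_VERIFICATION_LOGIC = (
    "Step 1: restate the requirement as a precise data query description.\n"
    "Step 2: analyze the listed tables and fields, keep the necessary ones, "
    "and exclude the rest.\n"
    "Step 3: output the names of the data tables involved."
)

# Hypothetical template layout.
CONTEXT_PROMPT_TEMPLATE = Template(
    "Question: $question\n"
    "Candidate metadata:\n$metadata\n"
    "Verification steps:\n$chain_logic\n"
)


def build_initial_prompt(question: str, second_metadata: dict[str, list[str]]) -> str:
    """Fill the template with the question text and the matched second metadata."""
    metadata_lines = "\n".join(
        f"- {table}: {', '.join(fields)}" for table, fields in second_metadata.items()
    )
    return CONTEXT_PROMPT_TEMPLATE.substitute(
        question=question,
        metadata=metadata_lines,
        chain_logic=CHAIN_VERIFICATION_LOGIC,
    )


initial_prompt = build_initial_prompt(
    "query the sales of electronic products in the southwest region in April",
    {
        "commodity_order": ["area", "time", "product_type"],
        "commodity_circulation": ["production_place", "distributor"],
    },
)
```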

Step 108: verify the second metadata according to the chain verification logic and the large language model, and adjust the initial context prompt text based on the verification result to obtain the target context prompt text.

In the embodiment of the present application, the terminal uses predefined chain verification logic to guide the large language model in verifying the second metadata, ensuring that the second metadata conforms to the expected format, content, and business logic. Irrelevant second metadata is excluded, and the genuine, valid second metadata associated with the question text is retained.

In an exemplary embodiment, the chain verification logic for the data query type may be: step 1, convert the requirement into a professional data query expression; step 2, analyze the existing database tables, obtain the necessary tables and fields, and exclude unneeded tables; step 3, obtain the names of the data tables involved. When the question text input by the user is "query the sales of electronic products in the southwest region in April", the terminal first instructs the large language model to restate the user's requirement as a professional data query expression: query the sales of goods in the southwest region (area = 'southwest'), limited to data from April (sale_date LIKE '%-04-%'). Based on this requirement, it further obtains the data query expression "SELECT sd.product, sd.amount, sd.sale_date, sd.area FROM sail_detail sd WHERE sd.area = 'southwest' AND sd.sale_date LIKE '%-04-%'". On this basis, the large language model analyzes the table names and field names contained in the second metadata, obtains the table names that meet the requirement, and outputs them.

Step 110: obtain a data query statement corresponding to the question text based on the target context prompt text and the large language model, and perform retrieval in the business database based on the data query statement to obtain a data analysis result.

In the embodiment of the present application, guided by the target context prompt text and using the second metadata in it that has been filtered by the chain verification logic, the large language model can output an accurate data query statement, that is, the SQL statement corresponding to the user's question text. The terminal sends the data query statement generated by the large language model to the business database for execution and obtains the data analysis result returned by the business database.
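The execution step can be illustrated with an in-memory SQLite database standing in for the business database. The table name `sale_detail`, its rows, and the `generated_sql` string are hypothetical; the string is modeled on the kind of query described for the "southwest region in April" example and is hard-coded here rather than produced by a model. The read-only check is an extra precaution assumed for this sketch, not something the text prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sale_detail (product TEXT, amount REAL, sale_date TEXT, area TEXT)"
)
conn.executemany(
    "INSERT INTO sale_detail VALUES (?, ?, ?, ?)",
    [
        ("phone", 1200.0, "2024-04-03", "southwest"),
        ("laptop", 5300.0, "2024-04-18", "southwest"),
        ("phone", 900.0, "2024-05-02", "southwest"),
        ("tablet", 2100.0, "2024-04-09", "north"),
    ],
)

# Stand-in for the data query statement returned by the large language model.
generated_sql = (
    "SELECT sd.product, sd.amount, sd.sale_date, sd.area "
    "FROM sale_detail sd "
    "WHERE sd.area = 'southwest' AND sd.sale_date LIKE '%-04-%'"
)


def run_query(connection: sqlite3.Connection, sql: str) -> list[tuple]:
    """Execute a model-generated statement, refusing anything that is not a SELECT."""
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("only SELECT statements are executed in this sketch")
    return connection.execute(sql).fetchall()


rows = run_query(conn, generated_sql)
total_sales = sum(amount for _product, amount, _date, _area in rows)
```

Note that the `LIKE '%-04-%'` pattern assumes dates stored as `YYYY-MM-DD` text; with a native date column, a month comparison would be the more robust choice.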

In the above data analysis method, the chain verification logic guides the large language model in verifying the second metadata, which ensures the accuracy of the target context prompt text. The large language model then processes the target context prompt text, which improves the accuracy of the data query statement and, in turn, of the data analysis result.

In an exemplary embodiment, as shown in FIG. 2, step 108 includes steps 202 to 204:

Step 202: verify the second metadata according to the chain verification logic and the large language model, and screen and adjust the second metadata based on the verification result to obtain target metadata.

In the embodiment of the present application, the terminal inputs the initial context prompt text into the large language model. The initial context prompt text contains the chain verification logic, which guides the large language model to verify the second metadata step by step. During processing by the large language model, verification is performed at the granularity of individual metadata items: the specific attributes or conditions of the second metadata are checked against the names or descriptions of the metadata in the data warehouse, and the verification result characterizes the correlation between each metadata item and the question text. Through the large language model, the terminal marks each item of second metadata as having failed or passed verification, and discards the items that failed, to obtain the target metadata.
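The mark-and-discard behavior can be sketched with a pluggable model callable. The function `ask_llm` is a hypothetical interface, and `fake_llm` is a keyword-matching stub used only so the example runs without a model; a real large language model would judge relevance from the prompt, and the PASS/FAIL protocol is an assumption of this sketch.

```python
from typing import Callable


def verify_second_metadata(
    question: str,
    second_metadata: list[str],
    ask_llm: Callable[[str], str],
) -> tuple[dict[str, bool], list[str]]:
    """Mark each metadata item as passed or failed, then keep only the passed ones."""
    marks: dict[str, bool] = {}
    for item in second_metadata:
        prompt = (
            f"Question: {question}\n"
            f"Metadata item: {item}\n"
            "Reply PASS if the item is needed to answer the question, else FAIL."
        )
        marks[item] = ask_llm(prompt).strip().upper() == "PASS"
    target_metadata = [item for item, passed in marks.items() if passed]
    return marks, target_metadata


def fake_llm(prompt: str) -> str:
    """Stub standing in for a large language model (keyword matching only)."""
    relevant_keywords = ("area", "sale_date", "amount")
    item_line = next(
        line for line in prompt.splitlines() if line.startswith("Metadata item:")
    )
    return "PASS" if any(word in item_line for word in relevant_keywords) else "FAIL"


marks, target_metadata = verify_second_metadata(
    "query the sales of electronic products in the southwest region in April",
    ["sale_detail.area", "sale_detail.sale_date", "goods_flow.dealer"],
    fake_llm,
)
```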

Step 204: adjust the initial context prompt text based on the target metadata to obtain the target context prompt text.

In the embodiment of the present application, the terminal adjusts the initial context prompt text according to the target metadata so that the prompt text accurately reflects the characteristics and requirements of the target data. One way to adjust it is to replace the second metadata in the initial context prompt text with the target metadata. Alternatively, the terminal can reorganize the text structure of the initial context according to the attributes of the target metadata so that the information reads smoothly and clearly; add a more detailed description of the target metadata to the initial context, for example the units of the metadata, to make the information more complete and clear; and adjust the logical relationships in the initial context according to the attributes of the target metadata so that the context text stays closely tied to the target data query requirement.

In this embodiment, the metadata attributes are verified step by step to ensure that the data is accurate and valid, the target metadata is then screened and adjusted according to the verification result, and finally the context prompt text is adjusted based on the target metadata, ensuring that the generated data query statement is complete and accurate.

In an exemplary embodiment, the second metadata includes data tables and fields and, as shown in FIG. 3, step 202 includes steps 302 to 306:

Step 302: execute the chain verification logic and use the large language model to analyze the correlation between the fields and the question intent of the question text, to obtain a correlation result.

In the embodiment of the present application, the terminal inputs the initial context prompt text into the large language model, and the large language model executes the chain verification logic contained in it. The model analyzes and verifies the question intent in the question text and, using natural language processing, analyzes the correlation between that intent and the fields in the metadata. This yields a correlation result between the fields and the question intent, which determines the fields most relevant to the question text.

Step 304: determine a target field among the fields based on the correlation result.

In this embodiment of the present application, the terminal uses the correlation result to determine which fields are most relevant to the question intent and takes those fields as target fields. Specifically, the terminal determines each field whose correlation result is greater than a preset threshold to be a target field. The target fields serve as the key fields of the data query and are used to identify the data content related to the question text.

Step 306: determine the data table that contains the target field as the target data table.

The target field and the target data table together constitute the target metadata.

In this embodiment of the present application, the terminal identifies, from the information of the target field, the data table that contains that field and takes it as the target data table for subsequent data query and analysis. Non-target fields, and data tables that contain no target field, are discarded.

In an exemplary embodiment, the question text is "Query the sales revenue of electronic products in the southwest region in April", and the second metadata in the initial context prompt text contains a commodity order table and a commodity circulation table. The commodity order table contains fields such as region, time and product type, and the commodity circulation table contains fields such as place of production and distributor. Here the place-of-production and distributor fields are unrelated to the question text. The large language model can therefore determine the region, time and product type fields as target fields and the commodity order table as the target data table, and discard the commodity circulation table together with its place-of-production and distributor fields, which yields the target metadata.

In this embodiment, executing the chain verification logic guides the large language model to analyze the correlation between the question intent and the fields, determine the target fields, and determine the data tables containing those fields as target data tables. This helps locate the target fields and data tables precisely from the question intent and yields target metadata that is more strongly correlated with the question text, thereby improving the accuracy of the data query.
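Steps 304 and 306 can be sketched with the example above. In the application the correlation results come from the large language model; the scores, the threshold of 0.5, and the identifier names below are hypothetical values chosen only to illustrate the filtering.

```python
def select_target_metadata(relevance, threshold):
    """Keep fields scoring above the threshold, and only tables that keep a field.

    relevance maps table name -> {field name: correlation score}.
    """
    target = {}
    for table, fields in relevance.items():
        kept = [name for name, score in fields.items() if score > threshold]
        if kept:  # tables without any target field are discarded
            target[table] = kept
    return target


# Hypothetical correlation results for the "southwest region, April,
# electronic products" question.
relevance = {
    "commodity_order": {"region": 0.92, "time": 0.88, "product_type": 0.90},
    "commodity_circulation": {"production_place": 0.15, "distributor": 0.10},
}
target_metadata = select_target_metadata(relevance, threshold=0.5)
print(target_metadata)
# → {'commodity_order': ['region', 'time', 'product_type']}
```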

In an exemplary embodiment, as shown in FIG. 4, step 104 includes steps 402 to 406, in which:

Step 402: perform word segmentation on the question text to obtain question keywords, and perform word embedding on the question keywords to obtain a question vector.

In this embodiment of the present application, the terminal first performs word segmentation on the question text, splitting it into a set or sequence of individual words or phrases, and extracts the question keywords from the segmentation result; these keywords represent the semantic content of the question. The terminal then performs word embedding on the set or sequence of question keywords, mapping each keyword to a vector in a high-dimensional space, to form the question vector.

Step 404: calculate the similarity between the question vector and the vector representation of each first metadata in the metadata warehouse.

In this embodiment of the present application, the similarity may be calculated with cosine similarity, Euclidean distance, or a similar measure. The terminal calculates the similarity between the question vector and the vector representation of each first metadata in the metadata warehouse, obtaining one similarity per first metadata.
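For reference, the two measures named above have the following standard definitions, where q is the question vector, m_i is the vector representation of the i-th first metadata, and n is the vector dimension (these symbols are introduced here for illustration and do not appear in the application):

```latex
\[
\operatorname{sim}_{\cos}(\mathbf{q}, \mathbf{m}_i)
  = \frac{\mathbf{q} \cdot \mathbf{m}_i}
         {\lVert \mathbf{q} \rVert \, \lVert \mathbf{m}_i \rVert},
\qquad
d_{\mathrm{euc}}(\mathbf{q}, \mathbf{m}_i)
  = \sqrt{\sum_{k=1}^{n} \left( q_k - m_{i,k} \right)^{2}}
\]
```

Cosine similarity grows as two vectors become more alike, whereas Euclidean distance shrinks. A "greater than a preset threshold" test would therefore presumably require the distance to be converted into a similarity first, for example 1 / (1 + d); that conversion is an assumption, not something the application specifies.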

Step 406: determine each first metadata whose vector representation has a similarity greater than a preset threshold as second metadata corresponding to the question text.

In this embodiment of the present application, the terminal holds a preset similarity threshold. Each first metadata whose vector representation has a similarity to the question vector greater than this threshold is determined as second metadata corresponding to the question text. This gives a preliminary set of second metadata that has some correlation with the question text.

In this embodiment, the question text is represented as a question vector, and the second metadata most relevant to the question text is determined by calculating the similarity between the question vector and the vector representation of each first metadata in the metadata warehouse. This maps a natural-language question text to the corresponding second metadata and provides a reliable basis for the subsequent generation of the data query statement.
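Steps 402 to 406 can be sketched end to end as below. The application calls for a word-embedding model; this sketch substitutes a plain bag-of-words count vector so that it runs with the standard library only. The regex tokenizer (which suits English, not Chinese, text), the metadata descriptions, and the threshold of 0.3 are likewise assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Very rough word segmentation: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(tokens, vocab):
    """Stand-in embedding: token counts over a shared vocabulary."""
    counts = Counter(tokens)
    return [float(counts[word]) for word in vocab]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_metadata(question, first_metadata, threshold):
    """Return the names of first metadata whose similarity exceeds the threshold."""
    docs = {name: tokenize(text) for name, text in first_metadata.items()}
    question_tokens = tokenize(question)
    vocab = sorted(set(question_tokens).union(*docs.values()))
    question_vector = embed(question_tokens, vocab)
    return [
        name
        for name, tokens in docs.items()
        if cosine(question_vector, embed(tokens, vocab)) > threshold
    ]


first_metadata = {
    "commodity_order": "order region time product type sales amount",
    "commodity_circulation": "production place distributor",
    "employee": "employee name department salary",
}
second_metadata = match_metadata(
    "sales amount of electronic product by region and time",
    first_metadata,
    threshold=0.3,
)
print(second_metadata)
# → ['commodity_order']
```

With a real embedding model, semantically related but lexically different terms would also match, which a count vector cannot capture.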

In an exemplary embodiment, as shown in FIG. 5, step 106 includes steps 502 to 504, in which:

Step 502: perform semantic recognition on the question text based on a natural language conversion microservice to obtain the question intent corresponding to the question text.

In this embodiment of the present application, as shown in FIG. 6, the large language model may be codegemma (a large language model for code generation and understanding), deepseek-coder (an artificial intelligence coding assistant), or the like, and the natural language conversion microservice is an NL2SQL (Natural Language to SQL, conversion of natural language into SQL statements) microservice. The terminal may input the question text into the natural language conversion microservice, which parses the user's intent and returns the question intent corresponding to the question text.

Step 504: construct the initial context prompt text according to the question intent, the second metadata, and the chain verification logic in the context prompt template.

In this embodiment of the present application, after the question intent is obtained, as shown in FIG. 6, the terminal uses the large-model management middleware to combine the question intent, the second metadata, and the chain verification logic in the context prompt template according to the format of the context prompt template, thereby constructing the initial context prompt text.
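A minimal sketch of this combination step follows, using the standard-library `string.Template`. The template wording, the three verification steps, and the table and field names are invented for illustration; the application does not disclose the actual template text.

```python
from string import Template

# Hypothetical context prompt template with three placeholders.
PROMPT_TEMPLATE = Template(
    "You are generating a SQL query.\n"
    "Question intent: $intent\n"
    "Candidate metadata:\n$metadata\n"
    "Verification steps:\n$verification\n"
)

# Hypothetical chain verification logic, written as ordered instructions.
CHAIN_VERIFICATION_STEPS = [
    "For each field, rate how relevant it is to the question intent.",
    "Keep only the fields whose relevance exceeds the threshold.",
    "Keep only the tables that contain at least one kept field.",
]


def build_initial_prompt(intent, metadata):
    """Fill the template with the intent, the second metadata and the steps."""
    metadata_lines = "\n".join(
        f"- {table}({', '.join(fields)})" for table, fields in metadata.items()
    )
    step_lines = "\n".join(
        f"{number}. {step}"
        for number, step in enumerate(CHAIN_VERIFICATION_STEPS, start=1)
    )
    return PROMPT_TEMPLATE.substitute(
        intent=intent, metadata=metadata_lines, verification=step_lines
    )


initial_prompt = build_initial_prompt(
    "total sales revenue filtered by region, month and product type",
    {
        "commodity_order": ["region", "time", "product_type"],
        "commodity_circulation": ["production_place", "distributor"],
    },
)
print(initial_prompt)
```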

Before this, the terminal performs the second-metadata matching of step 104 through the large-model management middleware. After the initial context prompt text has been constructed, the terminal inputs it into the large language model through the middleware, and the large language model verifies the second metadata in the initial context prompt text and outputs a verification result. Using the middleware and this verification result, the terminal further adjusts the initial context prompt text to obtain the target context prompt text, and then inputs the target context prompt text into the large language model to obtain the data query statement. Passing messages to the large language model through the large-model management middleware enables asynchronous communication between the user and the large model as well as message routing among large-model interfaces, which improves the response speed and concurrency of data analysis.
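The two-pass flow described above (verify first, then generate) can be sketched as a function that receives each stage as a callable. All names are hypothetical, the model is replaced by a stub, and the sketch is synchronous for brevity, whereas the middleware in the application is described as supporting asynchronous communication.

```python
def run_pipeline(question, match, build_prompt, adjust_prompt, llm, execute):
    """Two-pass flow: verify the metadata first, then generate the query."""
    second_meta = match(question)                           # step 104
    initial_prompt = build_prompt(question, second_meta)    # step 106
    verification = llm(initial_prompt)                      # first model call
    target_prompt = adjust_prompt(initial_prompt, verification)
    query = llm(target_prompt)                              # second model call
    return execute(query)


calls = []


def fake_llm(prompt):
    """Stub standing in for the large language model."""
    calls.append(prompt)
    if prompt.startswith("VERIFY"):
        return "commodity_order"
    return "SELECT SUM(amount) FROM commodity_order"


result = run_pipeline(
    "April sales in the southwest",
    match=lambda q: ["commodity_order", "commodity_circulation"],
    build_prompt=lambda q, meta: f"VERIFY {meta} for: {q}",
    adjust_prompt=lambda prompt, verified: f"GENERATE SQL using {verified}",
    llm=fake_llm,
    execute=lambda sql: {"sql": sql, "rows": []},
)
print(result["sql"])
```

Injecting the stages as parameters keeps the orchestration independent of any particular model or database client, which is roughly the role the text assigns to the middleware.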

In this embodiment, the natural language conversion microservice performs semantic recognition to identify the question intent of the user's question text. The initial context prompt text is then constructed dynamically from the question intent, the second metadata and the chain verification logic, which provides the basis for the large language model to verify the second metadata.

In an exemplary embodiment, a complete metadata warehouse needs to be built before step 102. As shown in FIG. 7, the method further includes steps 702 to 704, in which:

Step 702: extract database, schema, table, field and comment information from the business database to obtain the first metadata.

In this embodiment of the present application, the terminal first connects to the business database through a database connection tool or an API provided by a database management terminal. The terminal then extracts the required metadata from the business database, including the database name, schemas, tables, fields (columns) and the associated comments. For each database object (such as a table or a field), the terminal can obtain its name, its structure definition and any available descriptive information or comments, for use in the subsequent word-vector embedding and construction of the metadata warehouse.
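The extraction step can be illustrated with Python's built-in `sqlite3` module. The application does not name a database engine, so SQLite is an assumption here; SQLite also has no column-comment feature, so this sketch collects only table names, column names and declared types. The example table is hypothetical.

```python
import sqlite3


def extract_metadata(conn):
    """Collect one record per column from every user table in the database."""
    records = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        if table.startswith("sqlite_"):  # skip SQLite's internal tables
            continue
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
        for _cid, column, col_type, _notnull, _default, _pk in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            records.append({"table": table, "column": column, "type": col_type})
    return records


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE commodity_order (region TEXT, order_time TEXT, amount REAL)"
)
records = extract_metadata(conn)
for record in records:
    print(record)
```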

Step 704: perform word-vector embedding on the first metadata to obtain a vector representation of the first metadata, and build the metadata warehouse based on that vector representation.

In this embodiment of the present application, the terminal preprocesses each extracted database object, including text cleaning (for example, removing special characters and punctuation) and word segmentation.

The terminal may use a pre-trained word-vector model or embedding model to convert each database object (for example, a table name, field name or comment) into a vector representation. The vector representation is a high-dimensional numerical vector that captures the semantic and syntactic relationships among objects.

The terminal stores the word-vector representation of each database object in the metadata warehouse as one entry for that object of the business database, that is, as one data record in the metadata warehouse, for use in the subsequent similarity calculation between the question text and the metadata. The terminal also establishes an indexing mechanism and a management policy so that the word-vector data in the metadata warehouse can be retrieved and updated quickly.

In this embodiment, the first metadata is extracted from the business database and embedded as word vectors, yielding a metadata warehouse that contains the vector representations. Converting the first metadata into numerical vectors and organizing them in a metadata warehouse provides the basis for obtaining the second metadata during data analysis. It also helps ensure that the field names, table names and so on in the generated data query statement match those of the business database, and thus that the data query statement is accurate.
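One way to picture the warehouse is as a keyed store with one record per database object, each holding its source text and its vector. In the sketch below the pre-trained embedding model is replaced by a deterministic hashing-trick vector so that the example runs without external dependencies; the class name, the dimension of 64 and the record layout are assumptions.

```python
import hashlib
import re

DIM = 64  # hypothetical vector dimension


def hash_embed(text, dim=DIM):
    """Stand-in embedding: count tokens into hash buckets (hashing trick)."""
    vector = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vector[bucket] += 1.0
    return vector


class MetadataWarehouse:
    """Stores one record per (table, column), keyed for fast lookup and update."""

    def __init__(self):
        self._records = {}

    def upsert(self, table, column, comment=""):
        """Insert a record, or overwrite the existing one for the same key."""
        text = f"{table} {column} {comment}".strip()
        self._records[(table, column)] = {"text": text, "vector": hash_embed(text)}

    def get(self, table, column):
        return self._records[(table, column)]

    def __len__(self):
        return len(self._records)


warehouse = MetadataWarehouse()
warehouse.upsert("commodity_order", "region", "sales region")
warehouse.upsert("commodity_order", "amount", "sales amount")
# Re-inserting the same key updates the record instead of duplicating it.
warehouse.upsert("commodity_order", "region", "sales region of the order")
print(len(warehouse), warehouse.get("commodity_order", "region")["text"])
```

A production system would more likely keep the vectors in a vector index so that nearest-neighbour search does not require scanning every record.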

It should be understood that, although the steps in the flowcharts of the above embodiments are shown in the sequence indicated by the arrows, they are not necessarily executed in that sequence. Unless explicitly stated herein, there is no strict restriction on the order of execution, and the steps may be executed in other orders. Moreover, at least some of the steps in these flowcharts may comprise multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment and may be executed at different times; nor are they necessarily executed one after another, and they may instead be executed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.

Based on the same inventive concept, an embodiment of the present application further provides a data analysis apparatus for implementing the data analysis method described above. The solution provided by the apparatus is similar to that described for the method, so for the specific limitations in the one or more data analysis apparatus embodiments below, reference may be made to the limitations of the data analysis method above, which are not repeated here.

In an exemplary embodiment, as shown in FIG. 8, a data analysis apparatus 800 is provided, comprising an acquisition module 801, a matching module 802, a first construction module 803, a verification module 804 and a query module 805, wherein:

the acquisition module 801 is configured to acquire a question text of a data query type;

the matching module 802 is configured to perform similarity matching between the question text and the first metadata in the metadata warehouse and, according to the similarity matching result, determine among the first metadata the second metadata corresponding to the question text;

the first construction module 803 is configured to construct an initial context prompt text according to the question text, the second metadata and a context prompt template, the initial context prompt text containing chain verification logic;

the verification module 804 is configured to verify the second metadata according to the chain verification logic and the large language model, and to adjust the initial context prompt text based on the verification result to obtain a target context prompt text;

the query module 805 is configured to obtain the data query statement corresponding to the question text based on the target context prompt text and the large language model, and to perform retrieval in the business database based on the data query statement to obtain a data analysis result.

In one embodiment, the verification module 804 is specifically configured to: verify the second metadata according to the chain verification logic and the large language model, and screen and adjust the second metadata based on the verification result to obtain target metadata;

and adjust the initial context prompt text based on the target metadata to obtain the target context prompt text.

In one embodiment, the second metadata includes data tables and fields, and the verification module 804 is specifically configured to: execute the chain verification logic and use the large language model to analyze the correlation between each field and the question intent of the question text, obtaining a correlation result;

determine a target field among the fields based on the correlation result;

and determine the data table that contains the target field as the target data table, the target field and the target data table constituting the target metadata.

In one embodiment, the matching module 802 is specifically configured to: perform word segmentation on the question text to obtain question keywords, and perform word embedding on the question keywords to obtain a question vector;

calculate the similarity between the question vector and the vector representation of each first metadata in the metadata warehouse;

and determine each first metadata whose vector representation has a similarity greater than a preset threshold as second metadata corresponding to the question text.

In one embodiment, the first construction module 803 is specifically configured to: perform semantic recognition on the question text based on the natural language conversion microservice to obtain the question intent corresponding to the question text;

and construct the initial context prompt text according to the question intent, the second metadata, and the chain verification logic in the context prompt template.

In one embodiment, the apparatus 800 further comprises:

a second construction module, configured to perform word-vector embedding on the first metadata to obtain a vector representation of the first metadata, and to build the metadata warehouse based on that vector representation.

Each module of the above data analysis apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke them and perform the operations corresponding to each module.

In an exemplary embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 9. The computer device includes a processor, a memory, an input/output (I/O) interface and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores the metadata warehouse and the data analysis results. The input/output interface of the computer device is used to exchange information between the processor and external devices. The communication interface of the computer device is used to communicate with an external terminal over a network connection. When executed by the processor, the computer program implements a data analysis method.

Those skilled in the art will understand that the structure shown in FIG. 9 is only a block diagram of the part of the structure relevant to the solution of the present application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.

In an exemplary embodiment, a computer device is provided, comprising a memory and a processor. The memory stores a computer program, and the processor implements the following steps when executing the computer program:

performing similarity matching between the question text and the first metadata in the metadata warehouse and, according to the similarity matching result, determining among the first metadata the second metadata corresponding to the question text;

constructing an initial context prompt text according to the question text, the second metadata and a context prompt template, the initial context prompt text containing chain verification logic;

verifying the second metadata according to the chain verification logic and the large language model, and adjusting the initial context prompt text based on the verification result to obtain a target context prompt text;

obtaining the data query statement corresponding to the question text based on the target context prompt text and the large language model, and performing retrieval in the business database based on the data query statement to obtain a data analysis result.

In one embodiment, the processor further implements the following steps when executing the computer program:

verifying the second metadata according to the chain verification logic and the large language model, and screening and adjusting the second metadata based on the verification result to obtain target metadata;

executing the chain verification logic and using the large language model to analyze the correlation between each field and the question intent of the question text, obtaining a correlation result;

performing word segmentation on the question text to obtain question keywords, and performing word embedding on the question keywords to obtain a question vector;

performing semantic recognition on the question text based on the natural language conversion microservice to obtain the question intent corresponding to the question text;

performing word-vector embedding on the first metadata to obtain a vector representation of the first metadata, and building the metadata warehouse based on that vector representation.

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the steps of the above method embodiments.

In one embodiment, a computer program product is provided, comprising a computer program that, when executed by a processor, implements the steps of the above method embodiments.

It should be noted that the user information (including but not limited to user device information and user personal information) and data (including but not limited to data used for analysis, stored data and displayed data) involved in the present application are all information and data authorized by the user or fully authorized by all parties, and the collection, use and processing of such data must comply with the relevant regulations.

A person of ordinary skill in the art will understand that all or part of the processes of the above method embodiments can be carried out by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to a memory, database or other medium in the embodiments provided in the present application may include at least one of non-volatile memory and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases involved in the embodiments provided in the present application may include at least one of a relational database and a non-relational database. Non-relational databases may include, without limitation, blockchain-based distributed databases. The processors involved in the embodiments provided in the present application may be, without limitation, general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, artificial intelligence (AI) processors, and the like.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of these technical features is described. However, as long as a combination of these technical features involves no contradiction, it should be considered to fall within the scope of this specification.

The above embodiments represent only several implementations of the present application. Although they are described specifically and in detail, they should not be construed as limiting the patent scope of the present application. It should be noted that a person of ordinary skill in the art may make various modifications and improvements without departing from the concept of the present application, all of which fall within its scope of protection. Accordingly, the scope of protection of the present application shall be determined by the appended claims.