CN118211028A - Automated data element product review method and system based on large language model - Google Patents

Automated data element product review method and system based on large language model
Publication number
CN118211028A
Authority
CN
China
Prior art keywords
language model
data
llama
large language
compliance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410384836.XA
Other languages
Chinese (zh)
Inventor
李源
单震
谢传家
邱阳
唐婧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chaozhou Zhuoshu Big Data Industry Development Co Ltd
Original Assignee
Chaozhou Zhuoshu Big Data Industry Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chaozhou Zhuoshu Big Data Industry Development Co Ltd
Priority to CN202410384836.XA
Publication of CN118211028A

Abstract

The invention discloses an automated data element product review method and system based on a large language model, belonging to the technical field of large language models and intelligent agents. The technical problem to be solved is how to automate the review of data products and improve review efficiency and accuracy. The technical solution is as follows: the method is carried out cooperatively by an AI Agent and a LLaMA-2 large language model, and comprises the following steps: data collection and preprocessing; feature extraction and representation learning; deep learning analysis and compliance assessment; report generation and feedback mechanism; continuous monitoring and model updating.

Description

Translated from Chinese

Automated data element product review method and system based on large language model

Technical Field

The present invention relates to the technical field of large language models and intelligent agents, and specifically to an automated data element product review method and system based on a large language model.

Background

Compliance review of data products is a complex and important process. It covers multiple aspects to ensure that data products comply with laws, regulations, industry guidelines, and standards while protecting user privacy and data security. The key aspects of data product compliance review are as follows.

First, the product's business model must be mapped out in terms of its functions and content, its customers (users), and the ways and channels through which it is provided. This includes identifying the qualifications required to conduct the business, judging whether the business model meets legal and regulatory requirements, and determining whether there is a risk of violating public order and good morals or social ethics, or of adversely affecting the enterprise or others.

Second, data compliance review focuses on product network security, data security, and personal information protection. This involves compliance requirements for the collection, storage, processing, transmission, and sharing of data. For example, data collection must be lawful and transparent and must follow the principle of user consent; data storage must guarantee the integrity and availability of the data and prevent illegal access or tampering; data processing and transmission must follow the relevant encryption and security protocols so that data is not stolen or tampered with in transit.

In addition, data compliance review must take into account the requirements of treaties, laws and regulations, industry guidelines, and the enterprise's own articles of association and internal rules. Enterprises and their employees must therefore comply with the relevant laws, regulations, and industry guidelines when processing data.

Finally, establishing an effective data compliance management system is the key to carrying out data product compliance review smoothly. This includes building the organizational system, the institutional system, the operational system, and the safeguard system. Measures such as clarifying responsibilities, formulating detailed operating procedures, strengthening internal supervision, and establishing an emergency response mechanism ensure that data products remain compliant throughout the processing and use of data.

With the rapid development of big data, compliance review of data products has become a key link in ensuring enterprise data security and regulatory compliance. Traditional data product review methods rely mainly on manual work and basic automation tools; they are inefficient and adapt poorly to changing compliance requirements.

Therefore, how to automate the review of data products and improve review efficiency and accuracy is a technical problem that urgently needs to be solved.

Summary of the Invention

The technical task of the present invention is to provide an automated data element product review method and system based on a large language model, so as to solve the problem of how to automate the review of data products and improve review efficiency and accuracy.

The technical task of the present invention is achieved as follows: an automated data element product review method based on a large language model, carried out cooperatively by an AI Agent and a LLaMA-2 large language model, comprising the following steps:

Data collection and preprocessing;

Feature extraction and representation learning;

Deep learning analysis and compliance assessment;

Report generation and feedback mechanism;

Continuous monitoring and model updating.

Preferably, data collection and preprocessing are as follows:

The AI Agent automatically collects data related to the data product from multiple data sources, and uses the LLaMA-2 large language model to perform natural language processing, interpret the data, and standardize the data format;

The AI Agent uses the semantic understanding results provided by the LLaMA-2 large language model to clean and deduplicate the data.

Preferably, feature extraction and representation learning are as follows:

Using the natural language processing capability of the LLaMA-2 large language model, the AI Agent extracts, from the preprocessed data, the key features related to data product compliance;

Using a representation learning method, the AI Agent converts the extracted features into vector representations and optimizes them based on the deep semantic understanding of the LLaMA-2 large language model.

Preferably, deep learning analysis and compliance assessment are as follows:

The AI Agent builds a deep learning model and uses the LLaMA-2 large language model to perform advanced analysis on the converted vector representations, learning the intrinsic patterns and compliance patterns of data products;

The AI Agent combines the preset compliance rules with the in-depth analysis results of the LLaMA-2 large language model to perform a comprehensive compliance assessment of the data.

Preferably, the report generation and feedback mechanism is as follows:

Based on the natural language generation capability of the LLaMA-2 large language model, the AI Agent automatically generates a clear and accurate compliance report that lists in detail the compliance risk points, risk levels, and recommended measures;

A feedback mechanism is established: the AI Agent improves the accuracy and readability of the report based on the results of manual review and the semantic understanding of the LLaMA-2 large language model.

More preferably, continuous monitoring and model updating are as follows:

The AI Agent continuously monitors new data and changes, and uses the LLaMA-2 large language model for real-time semantic analysis to ensure real-time compliance checking of the data;

Based on new compliance rules and cases, the AI Agent periodically updates the deep learning model and optimizes its parameters based on the deep semantic understanding of the LLaMA-2 large language model.

An automated data element product review system based on a large language model comprises an AI Agent and a LLaMA-2 large language model that work cooperatively. The AI Agent automatically performs data collection, preprocessing, feature extraction, deep learning analysis, compliance assessment, report generation, and continuous monitoring. The LLaMA-2 large language model uses deep semantic understanding to provide natural language processing of the data, feature extraction optimization, support for deep learning analysis, and natural language report generation.

Preferably, the AI Agent comprises:

A data collection module, configured to collect data related to the data product from multiple data sources, and to use the LLaMA-2 large language model to perform natural language processing, interpret the data, and standardize the data format;

A preprocessing module, configured to clean and deduplicate the data using the semantic understanding results provided by the LLaMA-2 large language model;

A feature extraction module, configured to use the natural language processing capability of the LLaMA-2 large language model to extract, from the preprocessed data, the key features related to data product compliance, to convert the extracted features into vector representations using a representation learning method, and to optimize them based on the deep semantic understanding of the LLaMA-2 large language model;

A deep learning analysis module, configured to build a deep learning model and to use the LLaMA-2 large language model to perform advanced analysis on the converted vector representations, learning the intrinsic patterns and compliance patterns of data products;

A compliance assessment module, configured to combine the preset compliance rules with the in-depth analysis results of the LLaMA-2 large language model to perform a comprehensive compliance assessment of the data;

A report generation module, configured to automatically generate, based on the natural language generation capability of the LLaMA-2 large language model, a clear and accurate compliance report that lists in detail the compliance risk points, risk levels, and recommended measures;

A feedback mechanism module, configured to improve the accuracy and readability of the report based on the results of manual review and the semantic understanding of the LLaMA-2 large language model;

A continuous monitoring module, configured to continuously monitor new data and changes, and to use the LLaMA-2 large language model for real-time semantic analysis to ensure real-time compliance checking of the data;

A model update module, configured to periodically update the deep learning model based on new compliance rules and cases, and to optimize the parameters of the deep learning model based on the deep semantic understanding of the LLaMA-2 large language model.

An electronic device, comprising: a memory and at least one processor;

Wherein the memory stores a computer program;

The at least one processor executes the computer program stored in the memory, so that the at least one processor performs the automated data element product review method based on a large language model described above.

A computer-readable storage medium storing a computer program, wherein the computer program can be executed by a processor to implement the automated data element product review method based on a large language model described above.

The automated data element product review method and system based on a large language model of the present invention have the following advantages:

(1) By combining the deep semantic understanding of the LLaMA-2 large language model with the autonomous decision-making capability of the AI Agent, the present invention automates the review of data products, improves review efficiency and accuracy, and ensures the compliance of data products;

(2) The AI Agent is one of the core components of the present invention. It automatically performs data collection, preprocessing, feature extraction, deep learning analysis, compliance assessment, report generation, and continuous monitoring. It has autonomous decision-making capability and can adjust its operating strategy based on the semantic understanding results of the LLaMA-2 large language model, which improves review efficiency and accuracy;

(3) The LLaMA-2 large language model plays a key role in the present invention. It uses deep semantic understanding to provide natural language processing of the data, feature extraction optimization, support for deep learning analysis, and natural language report generation. Its semantic processing capability allows the AI Agent to understand the data content more accurately, which improves the precision and efficiency of the review;

(4) By combining the technical strengths of the LLaMA-2 large language model and the AI Agent, the present invention automates the review of data products and significantly improves review efficiency and accuracy. It can also adapt to changing compliance requirements, reduce the compliance risks faced by enterprises, and promote a culture of compliance within enterprises.

Brief Description of the Drawings

The present invention is further described below with reference to the accompanying drawings.

Figure 1 is a schematic diagram of the automated data element product review method based on a large language model.

Detailed Description

The automated data element product review method and system based on a large language model of the present invention are described in detail below with reference to the accompanying drawings and specific embodiments.

Embodiment 1:

As shown in Figure 1, this embodiment provides an automated data element product review method based on a large language model. The method is carried out cooperatively by an AI Agent and a LLaMA-2 large language model, and comprises the following steps:

S1. Data collection and preprocessing;

S2. Feature extraction and representation learning;

S3. Deep learning analysis and compliance assessment;

S4. Report generation and feedback mechanism;

S5. Continuous monitoring and model updating.

Data collection and preprocessing in step S1 of this embodiment are as follows:

S101. The AI Agent automatically collects data related to the data product from multiple data sources, and uses the LLaMA-2 large language model to perform natural language processing, interpret the data, and standardize the data format;

S102. The AI Agent uses the semantic understanding results provided by the LLaMA-2 large language model to clean and deduplicate the data.
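The cleaning and deduplication in S102 can be illustrated with a minimal sketch. The patent does not specify how the LLaMA-2 output is used here, so the sketch assumes a pluggable `normalize` callable that maps each record to a canonical form; `simple_normalize` is a hypothetical placeholder (lowercasing and whitespace collapsing) standing in for the model's semantic normalization, and `clean_and_deduplicate` is an illustrative name, not one from the source.

```python
import hashlib
import re
from typing import Callable, Iterable


def simple_normalize(text: str) -> str:
    """Hypothetical stand-in for LLM-based normalization:
    collapse whitespace and lowercase the record."""
    return re.sub(r"\s+", " ", text).strip().lower()


def clean_and_deduplicate(
    records: Iterable[str],
    normalize: Callable[[str], str] = simple_normalize,
) -> list[str]:
    """Drop empty records and records whose canonical form was already seen."""
    seen: set[str] = set()
    kept: list[str] = []
    for raw in records:
        canonical = normalize(raw)
        if not canonical:
            continue  # cleaning: discard records that are empty after normalization
        digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # deduplication: same canonical content already kept
        seen.add(digest)
        kept.append(canonical)
    return kept
```

In a real deployment the `normalize` argument would be replaced by a call into the language model; keeping it as a parameter lets the deduplication logic be tested without the model.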

Feature extraction and representation learning in step S2 of this embodiment are as follows:

S201. Using the natural language processing capability of the LLaMA-2 large language model, the AI Agent extracts, from the preprocessed data, the key features related to data product compliance;

S202. Using a representation learning method, the AI Agent converts the extracted features into vector representations and optimizes them based on the deep semantic understanding of the LLaMA-2 large language model.
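To make the "features to vector" conversion of S202 concrete, here is a minimal sketch. The patent does not name a representation learning method, so this example swaps in a simple feature-hashing scheme with L2 normalization as a stand-in; it is not a learned embedding and not the patent's method. The names `feature_vector` and `cosine_similarity`, and the dimension of 16, are assumptions made for illustration.

```python
import hashlib
import math


def feature_vector(features: list[str], dim: int = 16) -> list[float]:
    """Map a list of feature strings to a fixed-length, L2-normalized vector
    using the hashing trick (a stand-in for a learned representation)."""
    vec = [0.0] * dim
    for feature in features:
        digest = hashlib.md5(feature.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two already-normalized vectors (a dot product)."""
    return sum(x * y for x, y in zip(a, b))
```

Fixed-length vectors of this kind can be compared or fed to a downstream model; a production system would more plausibly use embeddings produced by the language model itself.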

Deep learning analysis and compliance assessment in step S3 of this embodiment are as follows:

S301. The AI Agent builds a deep learning model and uses the LLaMA-2 large language model to perform advanced analysis on the converted vector representations, learning the intrinsic patterns and compliance patterns of data products;

S302. The AI Agent combines the preset compliance rules with the in-depth analysis results of the LLaMA-2 large language model to perform a comprehensive compliance assessment of the data.
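One way to read S302 is as a combination of hard rule checks with a model-derived risk score. The sketch below assumes that the model analysis is reduced to a single `model_risk` value in [0, 1] and that a product is compliant only if no preset rule is violated and the risk stays below a threshold. The combination policy, the threshold of 0.5, and the two example rules (user consent and encrypted transfer, echoing the Background section) are all illustrative assumptions, not details given in the patent.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    violated: Callable[[dict], bool]  # returns True when the product breaks the rule


@dataclass
class Assessment:
    compliant: bool
    violated_rules: list[str]
    model_risk: float


# Hypothetical preset rules; field names are invented for this example.
EXAMPLE_RULES = [
    Rule("user_consent", lambda p: not p.get("has_user_consent", False)),
    Rule("encrypted_transfer", lambda p: not p.get("encrypted", False)),
]


def assess(product: dict, rules: list[Rule], model_risk: float,
           risk_threshold: float = 0.5) -> Assessment:
    """Combine rule violations with a model risk score in [0, 1]."""
    if not 0.0 <= model_risk <= 1.0:
        raise ValueError("model_risk must be in [0, 1]")
    violated = [rule.name for rule in rules if rule.violated(product)]
    compliant = not violated and model_risk < risk_threshold
    return Assessment(compliant, violated, model_risk)
```

Keeping the rules as data rather than code paths means new compliance rules can be added without touching the assessment logic, which fits the later model and rule update step.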

The report generation and feedback mechanism in step S4 of this embodiment is as follows:

S401. Based on the natural language generation capability of the LLaMA-2 large language model, the AI Agent automatically generates a clear and accurate compliance report that lists in detail the compliance risk points, risk levels, and recommended measures;

S402. A feedback mechanism is established: the AI Agent improves the accuracy and readability of the report based on the results of manual review and the semantic understanding of the LLaMA-2 large language model.
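The report contents named in S401 (risk points, risk levels, recommended measures) suggest a simple structured record per finding. The sketch below renders such findings with a fixed template, sorted by severity; it is a template-based stand-in for the natural language generation the patent attributes to LLaMA-2. The `Finding` fields, the three level names, and `render_report` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    risk_point: str
    risk_level: str  # assumed to be one of "high", "medium", "low"
    recommendation: str


_LEVEL_ORDER = {"high": 0, "medium": 1, "low": 2}


def render_report(product_name: str, findings: list[Finding]) -> str:
    """Render findings as plain text, most severe first.
    Raises KeyError if a finding uses an unknown risk level."""
    lines = [f"Compliance report: {product_name}"]
    if not findings:
        lines.append("No compliance risks identified.")
        return "\n".join(lines)
    ordered = sorted(findings, key=lambda f: _LEVEL_ORDER[f.risk_level])
    for index, finding in enumerate(ordered, start=1):
        lines.append(f"{index}. [{finding.risk_level.upper()}] {finding.risk_point}")
        lines.append(f"   Recommended measure: {finding.recommendation}")
    return "\n".join(lines)
```

A structured intermediate like `Finding` also gives the manual review in S402 something concrete to correct, since reviewers can adjust a level or recommendation without rewriting free text.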

Continuous monitoring and model updating in step S5 of this embodiment are as follows:

S501. The AI Agent continuously monitors new data and changes, and uses the LLaMA-2 large language model for real-time semantic analysis to ensure real-time compliance checking of the data;

S502. Based on new compliance rules and cases, the AI Agent periodically updates the deep learning model and optimizes its parameters based on the deep semantic understanding of the LLaMA-2 large language model.
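The monitoring in S501 requires some way to tell which records are new or changed so that only those are re-analyzed. The patent does not describe this mechanism; a minimal sketch, assuming each record has a stable key and that change is detected by comparing content hashes between polling rounds, could look as follows. `fingerprint` and `detect_changes` are hypothetical names.

```python
import hashlib


def fingerprint(content: str) -> str:
    """Content hash used to detect whether a record changed."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def detect_changes(
    previous: dict[str, str],
    current: dict[str, str],
) -> tuple[list[str], dict[str, str]]:
    """Compare the stored fingerprints (`previous`) with the current record
    contents (`current`). Returns the sorted keys of new or modified records
    and the new fingerprint state to store for the next round."""
    new_state = {key: fingerprint(value) for key, value in current.items()}
    changed = sorted(
        key for key, digest in new_state.items() if previous.get(key) != digest
    )
    return changed, new_state
```

Each polling round would pass the state returned by the previous round, then send only the changed records to the semantic analysis step.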

Embodiment 2:

This embodiment provides an automated data element product review system based on a large language model. The system comprises an AI Agent and a LLaMA-2 large language model that work cooperatively. The AI Agent automatically performs data collection, preprocessing, feature extraction, deep learning analysis, compliance assessment, report generation, and continuous monitoring. The LLaMA-2 large language model uses deep semantic understanding to provide natural language processing of the data, feature extraction optimization, support for deep learning analysis, and natural language report generation.

The AI Agent in this embodiment comprises:

A data collection module, configured to collect data related to the data product from multiple data sources, and to use the LLaMA-2 large language model to perform natural language processing, interpret the data, and standardize the data format;

A preprocessing module, configured to clean and deduplicate the data using the semantic understanding results provided by the LLaMA-2 large language model;

A feature extraction module, configured to use the natural language processing capability of the LLaMA-2 large language model to extract, from the preprocessed data, the key features related to data product compliance, to convert the extracted features into vector representations using a representation learning method, and to optimize them based on the deep semantic understanding of the LLaMA-2 large language model;

A deep learning analysis module, configured to build a deep learning model and to use the LLaMA-2 large language model to perform advanced analysis on the converted vector representations, learning the intrinsic patterns and compliance patterns of data products;

A compliance assessment module, configured to combine the preset compliance rules with the in-depth analysis results of the LLaMA-2 large language model to perform a comprehensive compliance assessment of the data;

A report generation module, configured to automatically generate, based on the natural language generation capability of the LLaMA-2 large language model, a clear and accurate compliance report that lists in detail the compliance risk points, risk levels, and recommended measures;

A feedback mechanism module, configured to improve the accuracy and readability of the report based on the results of manual review and the semantic understanding of the LLaMA-2 large language model;

A continuous monitoring module, configured to continuously monitor new data and changes, and to use the LLaMA-2 large language model for real-time semantic analysis to ensure real-time compliance checking of the data;

A model update module, configured to periodically update the deep learning model based on new compliance rules and cases, and to optimize the parameters of the deep learning model based on the deep semantic understanding of the LLaMA-2 large language model.
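The module layout above can be sketched as an agent that runs a sequence of stages, each of which may call the language model. This is only one possible arrangement under stated assumptions: the `LanguageModel` protocol with a single `complete` method, the `StubModel` that returns a fixed answer so the example runs offline, and the four reduced stages are all hypothetical and are not an interface defined by the patent or by LLaMA-2.

```python
from typing import Callable, Protocol


class LanguageModel(Protocol):
    """Assumed minimal interface to the language model."""
    def complete(self, prompt: str) -> str: ...


class StubModel:
    """Offline stand-in for LLaMA-2; always returns the same answer."""
    def complete(self, prompt: str) -> str:
        return "no issues found"


Stage = Callable[[dict, LanguageModel], dict]


def collect(ctx: dict, llm: LanguageModel) -> dict:
    # Data collection module: here the sources are already in-memory strings.
    return {**ctx, "records": list(ctx["sources"])}


def preprocess(ctx: dict, llm: LanguageModel) -> dict:
    # Preprocessing module: strip, drop empties, deduplicate preserving order.
    cleaned = (r.strip() for r in ctx["records"])
    return {**ctx, "records": list(dict.fromkeys(r for r in cleaned if r))}


def evaluate(ctx: dict, llm: LanguageModel) -> dict:
    # Compliance assessment module: ask the model about each record.
    opinions = [llm.complete(f"Review for compliance: {r}") for r in ctx["records"]]
    return {**ctx, "opinions": opinions}


def report(ctx: dict, llm: LanguageModel) -> dict:
    # Report generation module: one line per record.
    lines = [f"{r}: {o}" for r, o in zip(ctx["records"], ctx["opinions"])]
    return {**ctx, "report": "\n".join(lines)}


class ReviewAgent:
    """Runs the configured stages in order, passing a shared context dict."""
    def __init__(self, llm: LanguageModel, stages: list[Stage]) -> None:
        self.llm = llm
        self.stages = stages

    def run(self, sources: list[str]) -> dict:
        ctx: dict = {"sources": sources}
        for stage in self.stages:
            ctx = stage(ctx, self.llm)
        return ctx
```

For example, `ReviewAgent(StubModel(), [collect, preprocess, evaluate, report]).run([" a ", "a", "b"])` produces a context whose `"report"` entry has one line per unique record. Treating each module as a stage with the same signature keeps the agent loop trivial and lets modules such as feature extraction or monitoring be added without changing `ReviewAgent`.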

Embodiment 3:

This embodiment further provides an electronic device, comprising: a memory and at least one processor;

Wherein the memory stores computer-executable instructions;

The at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor performs the automated data element product review method based on a large language model of any embodiment of the present invention.

Embodiment 4:

This embodiment further provides a computer-readable storage medium storing a plurality of instructions. The instructions are loaded by a processor and cause the processor to perform the automated data element product review method based on a large language model of any embodiment of the present invention. Specifically, a system or apparatus equipped with a storage medium may be provided, where the storage medium stores software program code that implements the functions of any of the above embodiments, and a computer (or CPU or MPU) of the system or apparatus reads and executes the program code stored in the storage medium.

In this case, the program code read from the storage medium itself implements the functions of any of the above embodiments, so the program code and the storage medium storing it form part of the present invention.

Examples of storage media for providing the program code include floppy disks, hard disks, magneto-optical disks, optical disks (such as CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), magnetic tape, non-volatile memory cards, and ROM. Alternatively, the program code may be downloaded from a server computer over a communication network.

It should also be clear that the functions of any of the above embodiments may be implemented not only by executing the program code read by the computer, but also by having an operating system or the like running on the computer perform part or all of the actual operations based on the instructions of the program code.

It should further be understood that the program code read from the storage medium may be written to a memory provided on an expansion board inserted into the computer, or to a memory provided in an expansion unit connected to the computer, after which a CPU or the like mounted on the expansion board or expansion unit performs part or all of the actual operations based on the instructions of the program code, thereby implementing the functions of any of the above embodiments.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, and some or all of their technical features may be replaced with equivalents; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

Translated from Chinese

1. An automated data element product review method based on a large language model, characterized in that the method is carried out cooperatively by an AI Agent and a LLaMA-2 large language model, and comprises the following steps: data collection and preprocessing; feature extraction and representation learning; deep learning analysis and compliance assessment; report generation and feedback mechanism; continuous monitoring and model updating.

2. The automated data element product review method based on a large language model according to claim 1, characterized in that data collection and preprocessing are as follows: the AI Agent automatically collects data related to the data product from multiple data sources, and uses the LLaMA-2 large language model to perform natural language processing, interpret the data, and standardize the data format; the AI Agent uses the semantic understanding results provided by the LLaMA-2 large language model to clean and deduplicate the data.

3. The automated data element product review method based on a large language model according to claim 1, characterized in that feature extraction and representation learning are as follows: using the natural language processing capability of the LLaMA-2 large language model, the AI Agent extracts, from the preprocessed data, the key features related to data product compliance; using a representation learning method, the AI Agent converts the extracted features into vector representations and optimizes them based on the deep semantic understanding of the LLaMA-2 large language model.

4. The automated data element product review method based on a large language model according to claim 1, characterized in that deep learning analysis and compliance assessment are as follows: the AI Agent builds a deep learning model and uses the LLaMA-2 large language model to perform advanced analysis on the converted vector representations, learning the intrinsic patterns and compliance patterns of data products; the AI Agent combines the preset compliance rules with the in-depth analysis results of the LLaMA-2 large language model to perform a comprehensive compliance assessment of the data.

5. The automated data element product review method based on a large language model according to claim 1, characterized in that the report generation and feedback mechanism is as follows: based on the natural language generation capability of the LLaMA-2 large language model, the AI Agent automatically generates a clear and accurate compliance report that lists in detail the compliance risk points, risk levels, and recommended measures; a feedback mechanism is established, in which the AI Agent improves the accuracy and readability of the report based on the results of manual review and the semantic understanding of the LLaMA-2 large language model.

6. The automated data element product review method based on a large language model according to any one of claims 1 to 5, characterized in that continuous monitoring and model updating are as follows: the AI Agent continuously monitors new data and changes, and uses the LLaMA-2 large language model for real-time semantic analysis to ensure real-time compliance checking of the data; based on new compliance rules and cases, the AI Agent periodically updates the deep learning model and optimizes its parameters based on the deep semantic understanding of the LLaMA-2 large language model.

7. An automated data element product review system based on a large language model, characterized in that the system comprises an AI Agent and a LLaMA-2 large language model that work cooperatively; the AI Agent automatically performs data collection, preprocessing, feature extraction, deep learning analysis, compliance assessment, report generation, and continuous monitoring; the LLaMA-2 large language model uses deep semantic understanding to provide natural language processing of the data, feature extraction optimization, support for deep learning analysis, and natural language report generation.

8. The automated data element product review system based on a large language model according to claim 7, characterized in that the AI Agent comprises: a data collection module, configured to collect data related to the data product from multiple data sources, and to use the LLaMA-2 large language model to perform natural language processing, interpret the data, and standardize the data format; a preprocessing module, configured to clean and deduplicate the data using the semantic understanding results provided by the LLaMA-2 large language model; a feature extraction module, configured to use the natural language processing capability of the LLaMA-2 large language model to extract, from the preprocessed data, the key features related to data product compliance, to convert the extracted features into vector representations using a representation learning method, and to optimize them based on the deep semantic understanding of the LLaMA-2 large language model;
The AI Agent uses a representation learning method to convert the extracted features into vector representations and optimizes them based on the deep semantic understanding of the LLaMA-2 large language model.深度学习分析模块,用于构建深度学习模型,并利用LLaMA-2大语言模型对转换后的向量表示进行高级分析,学习数据产品的内在规律和合规性模式;The deep learning analysis module is used to build deep learning models and use the LLaMA-2 large language model to perform advanced analysis on the converted vector representations to learn the inherent laws and compliance patterns of data products;合规性评估模块,用于结合预设的合规性规则和LLaMA-2大语言模型的深度分析结果,对数据进行综合合规性评估;The compliance assessment module is used to conduct a comprehensive compliance assessment of the data by combining the preset compliance rules and the in-depth analysis results of the LLaMA-2 large language model;报告生成模块,用于基于LLaMA-2大语言模型的自然语言生成功能,智能体AIAgent自动生成清晰且准确的合规性报告,并详细列出合规性风险点、风险等级级建议措施;Report generation module, which is used for natural language generation based on the LLaMA-2 large language model. The AIAgent automatically generates clear and accurate compliance reports, and lists compliance risk points, risk levels and recommended measures in detail.反馈机制建立模块,用于根据人工审核结果和LLaMA-2大语言模型的语音理解,优化报告的准确性和可读性;A feedback mechanism to build a module for optimizing report accuracy and readability based on manual review results and speech understanding from the LLaMA-2 large language model;持续监控模块,用于持续监控新的数据和变化,利用LLaMA-2大语言模型进行实时语义分析,确保数据的实时合规性检测;The continuous monitoring module is used to continuously monitor new data and changes, and use the LLaMA-2 large language model to perform real-time semantic analysis to ensure real-time compliance detection of data;模型更新模块,用于根据新的合规性规则和案例,智能体AI Agent定期更新深度学习模型,并基于LLaMA-2大语言模型的深度语义理解优化深度学习模型的参数。The model update module is used to periodically update the deep learning model based on new compliance rules and cases, and optimize the parameters of the deep learning model based on the deep semantic understanding of the LLaMA-2 large language 
model.9.一种电子设备,其特征在于,包括:存储器和至少一个处理器;9. An electronic device, comprising: a memory and at least one processor;其中,所述存储器上存储有计算机程序;Wherein, the memory stores a computer program;所述至少一个处理器执行所述存储器存储的计算机程序,使得所述至少一个处理器执行如权利要求1至6任一项所述的基于大语言模型的自动化数据要素产品审查方法。The at least one processor executes the computer program stored in the memory, so that the at least one processor performs the automated data element product review method based on a large language model as described in any one of claims 1 to 6.10.一种计算机可读存储介质,其特征在于,所述计算机可读存储介质中存储有计算机程序,所述计算机程序可被处理器执行以实现如权利要求1至6中任一项所述的基于大语言模型的自动化数据要素产品审查方法。10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, and the computer program can be executed by a processor to implement the automated data element product review method based on a large language model as described in any one of claims 1 to 6.
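
The flow recited in the method claims (preprocessing, compliance assessment that combines preset rules with model output, and report generation) can be illustrated with a minimal Python sketch. It is not the applicant's implementation. Every name in it (`ModelScorer`, `Finding`, `preprocess`, `assess`, `review`, `render_report`, `toy_scorer`) is hypothetical, the LLaMA-2 model is replaced by a placeholder callable that returns a risk score, and the keyword rules and the 0.5 threshold are assumptions chosen only for illustration. Feature vectors, the deep learning model, feedback, and model updating are left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for the LLaMA-2 model: any callable that maps a text
# record to a risk score in [0, 1]. The claims do not define this interface.
ModelScorer = Callable[[str], float]


@dataclass
class Finding:
    record: str
    rule_hits: List[str]
    model_score: float
    risk_level: str


def preprocess(records: List[str]) -> List[str]:
    """Clean and deduplicate records (whitespace and case normalization only)."""
    seen = set()
    cleaned = []
    for raw in records:
        text = " ".join(raw.split())  # collapse runs of whitespace
        key = text.lower()
        if text and key not in seen:
            seen.add(key)
            cleaned.append(text)
    return cleaned


def assess(record: str, rules: Dict[str, str], scorer: ModelScorer) -> Finding:
    """Combine preset keyword rules with the model score (assumed 0.5 threshold)."""
    lowered = record.lower()
    hits = [name for name, keyword in rules.items() if keyword.lower() in lowered]
    score = scorer(record)
    if hits and score >= 0.5:
        level = "high"
    elif hits or score >= 0.5:
        level = "medium"
    else:
        level = "low"
    return Finding(record, hits, score, level)


def review(records: List[str], rules: Dict[str, str], scorer: ModelScorer) -> List[Finding]:
    """Run preprocessing, then assess every remaining record."""
    return [assess(r, rules, scorer) for r in preprocess(records)]


def render_report(findings: List[Finding]) -> str:
    """Render one plain-text report line per finding."""
    lines = []
    for f in findings:
        hits = ", ".join(f.rule_hits) or "none"
        lines.append(
            f"[{f.risk_level}] {f.record} (rules: {hits}; model score: {f.model_score:.2f})"
        )
    return "\n".join(lines)


def toy_scorer(text: str) -> float:
    """Placeholder scorer used only to exercise the sketch; not a real model."""
    return 0.9 if "id number" in text.lower() else 0.1
```

Calling `review(records, {"personal_data": "ID number"}, toy_scorer)` and passing the result to `render_report` yields one line per deduplicated record. In a real system the scorer would wrap an actual model call, and the rule set would come from the applicable compliance regulations.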
CN202410384836.XA | 2024-04-01 | 2024-04-01 | Automated data element product review method and system based on large language model | Pending | CN118211028A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202410384836.XA / CN118211028A (en) | 2024-04-01 | 2024-04-01 | Automated data element product review method and system based on large language model

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202410384836.XA / CN118211028A (en) | 2024-04-01 | 2024-04-01 | Automated data element product review method and system based on large language model

Publications (1)

Publication Number | Publication Date
CN118211028A (en) | 2024-06-18

Family

ID=91452089

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202410384836.XA (Pending) | CN118211028A (en) Automated data element product review method and system based on large language model | 2024-04-01 | 2024-04-01

Country Status (1)

Country | Link
CN (1) | CN118211028A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN116150349A (en) * | 2021-11-18 | 2023-05-23 | 上海数据交易中心有限公司 | Data product security compliance checking method, device and server
CN116720515A (en) * | 2023-06-05 | 2023-09-08 | 上海识装信息科技有限公司 | Sensitive word review methods, storage media and electronic devices based on large language models
CN117291515A (en) * | 2023-07-11 | 2023-12-26 | 红塔烟草(集团)有限责任公司 | Contract compliance auditing method and system based on generated language model
CN117710160A (en) * | 2024-01-02 | 2024-03-15 | 安徽省招标集团股份有限公司 | Intelligent auditing method and system for construction engineering construction contract

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Armin Berger et al.: "Towards Automated Regulatory Compliance Verification in Financial Auditing with Large Language Models", 2023 IEEE International Conference on Big Data (BigData), 22 January 2024 (2024-01-22), pages 4626-4635 *

Similar Documents

Publication | Publication Date | Title
CN115375146B (en) |  | Digital construction integrated platform
CN113706101B (en) |  | Architecture and method of intelligent system for power grid project management
CN1755732A (en) |  | Method for approval and processing of bank personal credit business data
CN110175741B (en) |  | Information value chain construction method based on power industry and storage medium
CN111461644A (en) |  | Audit information management and control platform
CN107423035B (en) |  | Product data management system in software development process
CN115392805B (en) |  | Transaction type contract compliance risk diagnosis method and system
CN115309913A (en) |  | Deep learning-based financial data risk identification method and system
CN114254908A (en) |  | A risk perception and supervision platform for regional financial non-bank financial institutions
CN116739408A (en) |  | Power grid dispatching safety monitoring method and system based on data tag and electronic equipment
CN119379305A (en) |  | Method for constructing a customer complaint handling system to deal with regulatory complaints
CN114581211A (en) |  | A method and system for debt collection based on machine learning
CN117540965A (en) |  | A metadata-based data asset inventory method and system
CN116342067A (en) |  | A credit review system based on legal digital employees
Jia et al. |  | Development model of enterprise green marketing based on cloud computing
CN118863334B (en) |  | A sulfur hexafluoride full life cycle management and control platform based on Internet of Things technology
CN118917748A (en) |  | Enterprise ESG information disclosure optimization method and rating maintenance system
CN102194156A (en) |  | Method and system for sci-tech novelty retrieval
CN118898234A (en) |  | A system and method for automatically labeling government big data
CN118211028A (en) |  | Automated data element product review method and system based on large language model
CN118195329A (en) |  | Multi-dimensional fusion processing method and system for intelligent risk identification of coal mine safety production
CN107230031A (en) |  | Eco industrial park Third Party Reverse Logistics system network platform
CN117827792A (en) |  | Data asset management method and system
CN113420996A (en) |  | Digital logistics data information management system
Han et al. |  | Artificial Intelligence Technology Development and Audit Innovation

Legal Events

Date | Code | Title | Description
 | PB01 | Publication | 
 | SE01 | Entry into force of request for substantive examination | 
