Disclosure of Invention
Aiming at the defects of the prior art, the application provides a knowledge construction method and system based on a large model and RAG technology.
In a first aspect, the application provides a knowledge construction method based on a large model and RAG technology, which comprises knowledge cleaning and knowledge construction stages:
The knowledge cleaning stage comprises the following steps:
obtaining text data from various sources, cleaning, converting formats, verifying the text data and storing the text data into a text database;
Setting parameters and splitting a large text in a text database into text data blocks with semantic meanings according to the priority of separators;
selecting an embedding model to convert the text data block into a text vector, carrying out normalization processing, and storing the text vector into a vector database;
Establishing a vector index, adopting the combination of similarity retrieval and full-text retrieval, and optimizing a retrieval strategy through mixed search;
The knowledge construction stage comprises:
extracting local knowledge by using a large model, and extracting entities, attributes and relations from the text data block;
constructing related entity pairs across the text data blocks by adopting a mutual information method, and carrying out global knowledge extraction by combining RAG technology and a retrieval strategy to generate entity relations;
judging whether two entities point to the same physical object through a multidimensional coreference resolution method, merging identical entities to form a knowledge base, and storing the constructed knowledge.
In some embodiments, the step of performing a cleaning operation, a format conversion operation, and a verification operation on the text data and then storing the text data in a text database includes:
the cleaning operation is to remove HTML tags, emoticons, garbled characters and redundant spaces, while retaining technical terms, numbers and date information;
The format conversion operation is that the cleaned text is uniformly coded into UTF-8 or GBK format and converted into plain text or rich text format;
And the verification operation is to verify the data after the cleaning and conversion are finished; the verification content includes text length, key information retention and encoding correctness, and if any abnormality is found, the cleaning operation and the format conversion operation are performed again.
In some embodiments, the setting parameters and splitting the large text in the text database into text data blocks with semantic meaning according to the separator priority comprises:
Setting a maximum block length threshold value, namely a chunk_size and an overlap reservation length, namely an overlap_size, and creating temporary storage blocks and a final set for storing the segmented text data blocks;
Reading each character in sequence, adding the character to a temporary storage block, checking in real time whether the length of the temporary storage block exceeds chunk_size, and processing according to the separator priority, wherein the separator priority is: line feed character > sentence-ending punctuation > semicolon/comma;
When encountering a line feed character, immediately adding the current temporary storage block to the final set, emptying the current temporary storage block, and retaining the last overlap_size characters as the starting content of the new temporary storage block;
when the length of the temporary storage block is greater than or equal to chunk_size, searching forward for the nearest sentence-ending punctuation; if found, splitting after the punctuation, adding the front segment to the final set, retaining the rear segment as the new temporary storage block, and reserving overlap_size characters as overlap;
when the length of the temporary storage block is greater than or equal to 1.2 × chunk_size, searching forward for the nearest semicolon/comma; if found, splitting after the punctuation, adding the front segment to the final set, retaining the rear segment as the new temporary storage block, and reserving overlap_size characters as overlap;
when the length of the temporary storage block is greater than or equal to 1.5 × chunk_size and no separator exists, forcibly splitting the temporary storage block at chunk_size, adding the front segment to the final set, and retaining the rear segment as the new temporary storage block;
after the traversal is finished, if the current temporary storage block is not empty, it is added directly to the final set, and if its length exceeds 2 × chunk_size, a warning log is recorded.
In some embodiments, the selecting of an embedding model to convert the text data block into a text vector, normalize the text vector, and store the text vector in a vector database includes:
Taking the segmented text data blocks as input, each text data block is represented as $T=\{w_1, w_2, \dots, w_n\}$,
wherein $w_i$ represents the $i$-th word or character;
for each word $w_i$, generating a corresponding word vector $v_i$ through a word embedding model, the word embedding model mapping words to a low-dimensional continuous vector space so that semantically similar words are close in distance in the vector space;
for the entire text data block $T$, generating a text vector $V_T$ by means of text vectorization;
normalizing the generated text vector $V_T$ and storing the normalized vector in a vector database.
In some embodiments, the establishing the vector index, using a combination of similarity search and full text search and optimizing the search strategy by hybrid search, includes:
The vectorized data are stored in a vector database, and indexes are built for quick retrieval; the vector database supports similarity retrieval, full-text retrieval or a hybrid search strategy for knowledge retrieval, and the index building process is as follows:
Vector storage, namely storing the vector of each text data block into a database, and adding metadata for the vector, wherein the metadata comprises text content, a source and a time stamp;
Index construction, namely selecting a corresponding index structure according to the characteristics of a vector database;
The similarity retrieval is that the record with the highest score is returned by calculating the similarity score of the query vector and all vectors in the database;
establishing an inverted index through keywords, and searching the full text through the keywords during searching to find out corresponding records;
defining a plurality of retrievers, respectively inquiring by using similarity retrieval and full text retrieval to obtain respective retrieval results, and reordering the retrieval results by using an RRF algorithm to obtain a final record.
In some embodiments, the extracting entities, attributes, and relationships from the text data blocks using the large model for local knowledge extraction includes:
Setting a prompt example in the prompt word, and guiding a large model to extract entities, attributes and relations from the text data block through the prompt example, wherein setting elements of the prompt example comprise role definition instructions, input and output specifications, term extraction rules and relation generation constraints.
In some embodiments, the constructing related entity pairs across the text data blocks by using a mutual information method and performing global knowledge extraction in combination with a RAG technology and a retrieval policy to generate entity relationships includes:
Calculating mutual information among entities in different text blocks, wherein the formula is as follows:
$I(X;Y)=\sum_{x\in X}\sum_{y\in Y} p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)}$
wherein $X$ and $Y$ represent the sets of entities in the two text blocks, $p(x,y)$ represents the probability that entities $x$ and $y$ occur simultaneously, and $p(x)$ and $p(y)$ represent the probabilities that entities $x$ and $y$ occur individually; whether the entities in different text blocks are related is judged by calculating this value;
retrieving context knowledge related to the entity pairs from a vector database using a RAG system;
Based on the retrieved knowledge segments, the relation among the entities is generated through a pre-trained large language model by combining with the specific requirements of the task, and global knowledge extraction is completed.
In some embodiments, the determining whether two entities point to the same physical object through the multidimensional coreference resolution method, merging identical entities to form a knowledge base, and storing the constructed knowledge includes:
Text similarity calculation, namely calculating the text similarity of entities in different text blocks through cosine similarity or edit distance;
structural similarity calculation, namely analyzing the structural relationships of the entities in the knowledge graph and judging whether the entities have similar structures;
semantic similarity calculation, namely calculating the semantic similarity of the entities through the large model and judging whether the entities have the same semantics;
entity alignment and merging, namely judging whether the two entities point to the same physical object according to the text, structural and semantic similarity, and merging them if so, thereby obtaining an entity alignment result.
In a second aspect, the application provides a knowledge construction system based on a large model and RAG technology, which comprises a data cleaning module, a text segmentation module, a vector conversion module, a vector indexing module, a local knowledge extraction module, a global knowledge extraction module and a coreference resolution module;
The data cleaning module is used for acquiring text data from various sources, performing cleaning, format conversion and verification on the text data, and storing the text data in a text database;
the text segmentation module is used for setting parameters and splitting a large text in the text database into text data blocks with semantic meaning according to the separator priority;
The vector conversion module is used for selecting the embedding model to convert the text data block into a text vector, carrying out normalization processing and storing the text vector into the vector database;
The vector index module is used for establishing a vector index, adopting the combination of similarity retrieval and full text retrieval and optimizing a retrieval strategy through mixed search;
the local knowledge extraction module is used for extracting local knowledge by using a large model and extracting entities, attributes and relations from the text data block;
The global knowledge extraction module is used for constructing related entity pairs across the text data blocks by adopting a mutual information method, and carrying out global knowledge extraction by combining RAG technology and a retrieval strategy to generate entity relations;
The coreference resolution module is used for judging whether two entities point to the same physical object through a multidimensional coreference resolution method, merging identical entities to form a knowledge base, and storing the constructed knowledge.
In a third aspect the application proposes an electronic device comprising a memory, a processor and a computer program stored in said memory and executable on said processor, said processor implementing the steps of the method as described above when said computer program is executed.
In a fourth aspect the application proposes a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of a method as described above.
The invention has the beneficial effects that:
Through cross-text-block mutual information calculation and RAG-enhanced reasoning, the system can quantify the statistical correlation among entities and, in combination with the context information of an external knowledge base, mine potential association relations across paragraphs or documents. A dynamic semantic segmentation strategy ensures that the internal semantics of each text block remain consistent, and the reserved overlapping parts maintain context continuity. A multidimensional coreference resolution mechanism comprehensively judges multiple expressions of the same entity and reduces the erroneous resolution rate. The combination of dynamic segmentation and hybrid retrieval alleviates the input length limitation of large models, and the complementation of semantic retrieval and keyword matching remarkably alleviates key problems such as semantic fragmentation, missing cross-text associations and low entity alignment precision, thereby improving recall.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While the exemplary embodiments of the present invention have been illustrated in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein, but rather, these embodiments are provided so that the present invention will be more thoroughly understood and will fully convey the scope of the invention to those skilled in the art.
In a first aspect, the present application proposes a knowledge construction method based on a large model and a RAG technology, as shown in fig. 1, including knowledge cleaning and knowledge construction stages:
The knowledge cleaning stage comprises the following steps:
S100, acquiring text data from various sources, cleaning, converting formats, verifying the text data and storing the text data into a text database;
In some embodiments, the step of performing a cleaning operation, a format conversion operation, and a verification operation on the text data and then storing the text data in a text database includes:
the cleaning operation is to remove HTML tags, emoticons, garbled characters and redundant spaces, while retaining technical terms, numbers and date information;
The format conversion operation is that the cleaned text is uniformly coded into UTF-8 or GBK format and converted into plain text or rich text format;
And the verification operation is to verify the data after the cleaning and conversion are finished; the verification content includes text length, key information retention and encoding correctness, and if any abnormality is found, the cleaning operation and the format conversion operation are performed again.
Data preprocessing is the first step of knowledge construction, and aims to acquire required text data from various sources (such as webpages, emails, social media posts and the like captured by web crawlers), and clean and standardize the data to ensure the integrity and accuracy of the data. The method comprises the following specific steps:
Text data is obtained from a variety of sources including, but not limited to, web pages, emails, social media posts, news stories, contract text, and the like. These data may contain different formats and encodings and therefore require uniform processing;
Converting the cleaned text into a unified format and coding. Common text formats include plain text (.txt), rich text (.rtf), etc., and coding modes can be selected from UTF-8, GBK, etc. Proper text codes are selected, so that the problems of messy codes and incompatibility are reduced, and subsequent data analysis and mining are facilitated;
After the cleaning and conversion are completed, the data are verified to ensure their integrity and accuracy. The verification content includes text length, whether key information is retained, whether the encoding is correct, etc. If any abnormality is found, the cleaning and conversion need to be carried out again;
The obtained text is cleaned, and special symbols (such as HTML tags, emoticons and garbled characters), redundant spaces, line breaks and the like are removed so that the text is more standardized. Key information in the text, such as technical terms, numbers and dates, must be retained during cleaning so that this content is not deleted or split by mistake;
And storing the processed text data into a database, so that the subsequent knowledge extraction and knowledge graph construction are facilitated. Common databases include relational databases (e.g., mySQL, postgreSQL) and non-relational databases (e.g., mongoDB, elasticsearch).
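A minimal sketch of these preprocessing steps is given below for illustration; the function names, regular expressions and the UTF-8 target encoding are assumptions rather than part of the claimed method:

import re

def clean_text(raw: str) -> str:
    """Remove HTML tags, emoticons and redundant whitespace while keeping
    technical terms, numbers and dates untouched."""
    text = re.sub(r"<[^>]+>", " ", raw)                  # strip HTML tags
    text = re.sub("[\U0001F300-\U0001FAFF]", "", text)   # strip a common emoji range
    text = re.sub(r"[ \t]+", " ", text)                  # collapse redundant spaces
    return text.strip()

def convert_format(text: str, encoding: str = "utf-8") -> bytes:
    """Unify the encoding of the cleaned text (UTF-8 here; GBK is also possible)."""
    return text.encode(encoding, errors="replace")

def verify(original: str, processed: bytes, min_length: int = 10) -> bool:
    """Check text length, key-information (date) retention and encoding correctness."""
    decoded = processed.decode("utf-8")
    length_ok = len(decoded) >= min_length
    dates_kept = set(re.findall(r"\d{4}-\d{2}-\d{2}", original)) <= set(re.findall(r"\d{4}-\d{2}-\d{2}", decoded))
    return length_ok and dates_kept

raw = "<p>Contract signed on 2024-05-01   with Company X</p>"
cleaned = clean_text(raw)
encoded = convert_format(cleaned)
if not verify(raw, encoded):   # on abnormality, re-run cleaning and conversion
    encoded = convert_format(clean_text(raw))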
S200, setting parameters and splitting a large text in a text database into text data blocks with semantic meaning according to the separator priority;
in some embodiments, the setting parameters and splitting the large text in the text database into text data blocks with semantic meaning according to the separator priority comprises:
Setting a maximum block length threshold value, namely a chunk_size and an overlap reservation length, namely an overlap_size, and creating temporary storage blocks and a final set for storing the segmented text data blocks;
Reading each character in sequence, adding the character to a temporary storage block, checking in real time whether the length of the temporary storage block exceeds chunk_size, and processing according to the separator priority, wherein the separator priority is: line feed character > sentence-ending punctuation > semicolon/comma;
When encountering a line feed character, immediately adding the current temporary storage block to the final set, emptying the current temporary storage block, and retaining the last overlap_size characters as the starting content of the new temporary storage block;
when the length of the temporary storage block is greater than or equal to chunk_size, searching forward for the nearest sentence-ending punctuation; if found, splitting after the punctuation, adding the front segment to the final set, retaining the rear segment as the new temporary storage block, and reserving overlap_size characters as overlap;
when the length of the temporary storage block is greater than or equal to 1.2 × chunk_size, searching forward for the nearest semicolon/comma; if found, splitting after the punctuation, adding the front segment to the final set, retaining the rear segment as the new temporary storage block, and reserving overlap_size characters as overlap;
when the length of the temporary storage block is greater than or equal to 1.5 × chunk_size and no separator exists, forcibly splitting the temporary storage block at chunk_size, adding the front segment to the final set, and retaining the rear segment as the new temporary storage block;
after the traversal is finished, if the current temporary storage block is not empty, it is added directly to the final set, and if its length exceeds 2 × chunk_size, a warning log is recorded.
Text segmentation is the process of splitting large pieces of text into smaller, semantically meaningful blocks for subsequent vectorization and knowledge extraction. The method comprises the following specific steps:
The chunk_size (maximum block length threshold) and the overlap_size (overlap reserve length, 10-20% of chunk_size is recommended) are set. Creating a temporary storage block and a final set for storing the segmented text blocks;
each character is read in sequence, the character is added into the temporary storage block, and whether the length of the temporary storage block exceeds the length of the chunk_size is checked in real time. Processing according to the delimiter priority;
When a line feed character is encountered, the temporary storage block (including the line feed character) is immediately added to the final set, the temporary storage block is emptied, and the last overlap_size characters are retained as the starting content of the new temporary storage block;
when the length of the temporary storage block is greater than or equal to chunk_size, the nearest sentence-ending punctuation is searched for forward; if found, the block is split after the punctuation, the front segment is added to the final set, the rear segment is retained as the new temporary storage block, and overlap_size characters are reserved as overlap;
When the length of the temporary storage block is greater than or equal to 1.2 × chunk_size (moderate over-length is allowed), the nearest semicolon/comma is searched for forward; if found, the block is split after the punctuation, the front segment is added to the final set, the rear segment is retained as the new temporary storage block, and overlap_size characters are reserved as overlap;
When the length of the temporary storage block is greater than or equal to 1.5 × chunk_size and no separator exists, the block is forcibly split at chunk_size, the front segment is added to the final set, and the rear segment is retained as the new temporary storage block (no overlap is reserved);
and after the traversal is finished, if the temporary storage block is not empty, it is added directly to the final set. If its length exceeds 2 × chunk_size, a warning log is recorded;
For paired symbols such as quotation marks and brackets, integrity must be maintained; indivisible content such as technical terms, numbers and dates must be kept intact as a whole; and when Chinese and Western text are mixed, the length is uniformly calculated in terms of Chinese characters.
Assume that there is a text segment containing business contract terms, 5000 characters in length. First, the chunk_size is set to 1000 characters, and the overlap_size is set to 150 characters. Then, traversing the text character by character, sequentially reading each character, adding the characters into the temporary storage block, and checking whether the length of the temporary storage block exceeds 1000 characters in real time. When a line feed is encountered, a scratch block (containing the line feed) is added immediately to the final set, the scratch block is emptied, and the last 150 characters are reserved as the starting content of the new scratch block. When the length of the temporary storage block is more than or equal to 1000 characters, searching the nearest punctuation of the sentence forward, if the nearest punctuation is found, splitting after punctuation, adding a final set into the front section, reserving the rear section as a new temporary storage block, and reserving 150 characters as overlapping. If the end punctuation of the sentence is not found, checking the semicolon/comma, when the length of the temporary storage block is more than or equal to 1200 characters, searching the nearest semicolon/comma forwards, if the end punctuation is found, splitting after punctuation, adding the final set into the front section, reserving the rear section as a new temporary storage block, and reserving 150 characters as overlapping. If the temporary storage block length is more than or equal to 1500 characters and no separator exists, the temporary storage block is forcedly split at 1000 characters, the front section is added into the final set, and the rear section is reserved as a new temporary storage block (without overlapping). And after the traversal is finished, if the temporary storage block is not empty, directly adding the final set. If the length exceeds 2000 characters, a warning log is recorded. Through the steps, the 5000-character contract clause text is successfully divided into a plurality of blocks with semantic meaning, so that subsequent vectorization and knowledge extraction are facilitated.
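A minimal sketch of the splitting procedure described above is given below; it is a simplified illustration of the separator-priority rules, and names such as split_text are illustrative:

import logging

def split_text(text: str, chunk_size: int = 1000, overlap_size: int = 150) -> list[str]:
    """Split text into semantically meaningful blocks by separator priority:
    line feed > sentence-ending punctuation > semicolon/comma, keeping overlap."""
    chunks, buf = [], ""                 # final set and temporary storage block
    enders, minors = "。.!?！？", "；;，,"

    def flush(cut: int, keep_overlap: bool = True) -> None:
        nonlocal buf
        chunks.append(buf[:cut])
        overlap = buf[max(0, cut - overlap_size):cut] if keep_overlap else ""
        buf = overlap + buf[cut:]        # rear segment becomes the new block

    for ch in text:
        buf += ch
        if ch == "\n":                                      # highest priority: line feed
            flush(len(buf))
        elif len(buf) >= chunk_size and any(p in buf for p in enders):
            flush(max(buf.rfind(p) for p in enders) + 1)    # split after sentence end
        elif len(buf) >= 1.2 * chunk_size and any(p in buf for p in minors):
            flush(max(buf.rfind(p) for p in minors) + 1)    # fall back to semicolon/comma
        elif len(buf) >= 1.5 * chunk_size:
            flush(chunk_size, keep_overlap=False)           # forced split, no overlap

    if buf:                                                 # remainder after traversal
        if len(buf) > 2 * chunk_size:
            logging.warning("oversized final chunk: %d characters", len(buf))
        chunks.append(buf)
    return chunks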
S300, selecting an embedding model to convert a text data block into a text vector, carrying out normalization processing, and storing the text vector into a vector database;
in some embodiments, the selecting of an embedding model to convert the text data block into a text vector, normalize the text vector, and store the text vector in a vector database includes:
Taking the segmented text data blocks as input, each text data block is represented as $T=\{w_1, w_2, \dots, w_n\}$,
wherein $w_i$ represents the $i$-th word or character;
for each word $w_i$, generating a corresponding word vector $v_i$ through a word embedding model, the word embedding model mapping words to a low-dimensional continuous vector space so that semantically similar words are close in distance in the vector space;
for the entire text data block $T$, generating a text vector $V_T$ by means of text vectorization;
normalizing the generated text vector $V_T$ and storing the normalized vector in a vector database.
A suitable embedding model is selected according to the specific application scenario. Models with excellent performance, such as the bge-large or E5 series, may be selected with reference to the MTEB (Massive Text Embedding Benchmark) leaderboard. For a particular domain, an open-source model may be selected for fine-tuning or training from scratch to accommodate domain-specific semantic features.
Text preprocessing, namely, performing further preprocessing on the text before vectorization, wherein the preprocessing comprises the steps of removing stop words, extracting word stems, restoring word shapes and the like. For Chinese text, word segmentation is also required. The aim of the pretreatment is to reduce noise and improve vectorization accuracy.
Text embedding, namely inputting the preprocessed text into the embedding model to generate corresponding vector representations. The specific process is as follows:
Text input: taking the segmented text blocks as input, each text block can be expressed as $T=\{w_1, w_2, \dots, w_n\}$, where $w_i$ represents the $i$-th word or character.
Word embedding: for each word $w_i$, a corresponding word vector $v_i$ is generated through a word embedding model (such as Word2Vec, GloVe, BERT and the like). The word embedding model maps words to a low-dimensional continuous vector space such that semantically similar words are closely spaced in the vector space.
Text vectorization: for an entire text block $T$, its vector representation $V_T$ may be generated in a variety of ways. Common methods include:
Average pooling, namely averaging the word vectors of all words in the text to obtain the text vector $V_T=\frac{1}{n}\sum_{i=1}^{n} v_i$.
Max pooling, namely taking the maximum value of each dimension to obtain the text vector $V_T=\max(v_1, v_2, \dots, v_n)$.
Attention-based pooling, namely calculating the weight of each word by an attention mechanism and then taking the weighted sum to obtain the text vector $V_T=\sum_{i=1}^{n}\alpha_i v_i$, where $\alpha_i$ represents the attention weight of the $i$-th word.
Vector normalization, namely normalizing the generated text vector so that the vectors are distributed on a unit sphere, which facilitates subsequent similarity calculation. The normalization formula is $\hat{V}_T = V_T / \|V_T\|_2$, where $\|V_T\|_2$ represents the Euclidean norm of the vector $V_T$.
The generated text vectors are stored in a vector database to facilitate subsequent retrieval and comparison. Common vector databases include FAISS, ChromaDB, Elasticsearch, Milvus, etc. When stored, metadata such as the content, source and timestamp of the text block may be added to each vector for subsequent querying and analysis.
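A minimal sketch of vectorization, normalization and storage is given below; it assumes the sentence-transformers and FAISS libraries, and the model name, sample texts and metadata fields are illustrative:

import faiss
from sentence_transformers import SentenceTransformer

chunks = ["Party A shall deliver the goods before 2024-05-01.",
          "Party B shall pay within 30 days of delivery."]

# Embed each text block; normalization places the vectors on the unit sphere
model = SentenceTransformer("BAAI/bge-large-zh-v1.5")
vectors = model.encode(chunks, normalize_embeddings=True).astype("float32")

# Store the vectors; with normalized vectors, inner product equals cosine similarity
index = faiss.IndexFlatIP(vectors.shape[1])
index.add(vectors)

# Keep metadata (content, source, timestamp) aligned with the vector ids
metadata = [{"text": c, "source": "contract.txt", "timestamp": "2024-05-01"} for c in chunks]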
S400, establishing a vector index, adopting the combination of similarity retrieval and full-text retrieval, and optimizing a retrieval strategy through mixed search;
In some embodiments, the establishing the vector index, using a combination of similarity search and full text search and optimizing the search strategy by hybrid search, includes:
The vectorized data are stored in a vector database, and indexes are built for quick retrieval; the vector database supports similarity retrieval, full-text retrieval or a hybrid search strategy for knowledge retrieval, and the index building process is as follows:
Vector storage, namely storing the vector of each text data block into a database, and adding metadata for the vector, wherein the metadata comprises text content, a source and a time stamp;
Index construction, namely selecting a corresponding index structure according to the characteristics of a vector database;
The similarity retrieval is that the record with the highest score is returned by calculating the similarity score of the query vector and all vectors in the database;
establishing an inverted index through keywords, and searching the full text through the keywords during searching to find out corresponding records;
defining a plurality of retrievers, respectively inquiring by using similarity retrieval and full text retrieval to obtain respective retrieval results, and reordering the retrieval results by using an RRF algorithm to obtain a final record.
Vector retrieval is a key step in knowledge-graph construction, and aims to quickly find the vector most similar to the query vector from a vector database by an efficient retrieval method. The following is a detailed implementation step of vector retrieval:
Vector index establishment:
Indexing the vectorized data for quick retrieval. Common indexing methods include inverted indexing, HNSW, and the like. HNSW is a graph-based indexing method that can efficiently process high-dimensional vector data.
The index building formula is as follows:
$I=\mathrm{BuildIndex}(V, M)$
wherein $V$ represents the set of vectors, $M$ represents the indexing method, and $I$ represents the established vector index.
Similarity retrieval:
The similarity between the query vector and the vectors in the database is calculated using cosine similarity. Cosine similarity measures how similar two vectors are in direction; its value lies in the range [-1, 1], and the larger the value, the higher the similarity.
The cosine similarity formula is as follows:
$\cos(A, B)=\frac{A\cdot B}{\|A\|\,\|B\|}$
wherein $A$ and $B$ represent the two vectors, $A\cdot B$ represents the vector dot product, and $\|A\|$ and $\|B\|$ represent the lengths of the vectors.
In order to improve the retrieval efficiency, an approximate nearest neighbor search algorithm can be adopted, and the retrieval speed can be greatly improved on the premise of ensuring the retrieval precision.
Full text retrieval:
On the basis of vector retrieval, the recall rate is further improved by combining a full-text retrieval method. The full text search finds documents relevant to the query from the text data by means of keyword matching.
The full-text retrieval formula is as follows:
$\mathrm{Score}(D, Q)=\sum_{t\in Q}\mathrm{TFIDF}(t, D)$
wherein $D$ represents a document, $Q$ represents the query, $t$ represents a term in the query, and $\mathrm{TFIDF}(t, D)$ represents the TF-IDF value of term $t$ in document $D$.
Hybrid search:
The similarity retrieval and the full text retrieval are combined, and a mixed retrieval method is adopted, so that the retrieval precision and recall rate are improved. And obtaining a final search result by weighting and fusing the results of the two search methods in the mixed search.
The hybrid search formula is as follows:
$\mathrm{Score}_{\mathrm{hybrid}}=w_{1}\cdot\mathrm{Score}_{\mathrm{sim}}+w_{2}\cdot\mathrm{Score}_{\mathrm{full}}$
wherein $w_{1}$ represents the weight of similarity retrieval, $w_{2}$ represents the weight of full-text retrieval, $\mathrm{Score}_{\mathrm{sim}}$ represents the score of similarity retrieval, and $\mathrm{Score}_{\mathrm{full}}$ represents the score of full-text retrieval.
Search result ranking:
The search results are ranked so that the user can quickly find the most relevant results. The ranking method can comprehensively rank according to various factors such as search scores, time stamps, user preferences and the like.
The ranking formula is as follows:
$\mathrm{Rank}(D)=f\big(\mathrm{Score}(D),\ \mathrm{timestamp}(D),\ \mathrm{preference}(D)\big)$
wherein $\mathrm{Rank}(D)$ represents the final ranking score of document $D$, which can be adjusted according to factors such as the importance of the document and user preference.
Suppose there is a query vector $q$ and we need to retrieve the most similar vectors from the vector database. First, a vector index is established using the HNSW index method. Then, the cosine similarity between the query vector $q$ and each vector $v_i$ in the database is calculated to obtain a similarity score. Meanwhile, the full-text retrieval method is used, and the correlation between the query keywords and each document is calculated through TF-IDF to obtain a full-text retrieval score. Then, the results of the similarity retrieval and the full-text retrieval are weighted and fused to obtain a final retrieval score. Finally, the results are sorted according to the retrieval scores, and the most relevant retrieval results are output. Through this series of operations, the vectors most similar to the query can be efficiently retrieved from the vector database, and retrieval precision and recall are improved.
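A minimal sketch of fusing the two retrievers with the RRF algorithm mentioned in step S400 is given below; the document ids and the constant k=60 are illustrative:

from collections import defaultdict

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: each document scores the sum of 1/(k + rank) over all retrievers."""
    scores: dict[str, float] = defaultdict(float)
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

similarity_hits = ["d3", "d1", "d7", "d2"]   # ids returned by vector similarity retrieval
fulltext_hits = ["d1", "d5", "d3", "d9"]     # ids returned by keyword (inverted index) retrieval
final_ranking = rrf_fuse([similarity_hits, fulltext_hits])   # e.g. ['d1', 'd3', 'd5', ...]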
The knowledge construction stage comprises:
s500, extracting local knowledge by using a large model, and extracting entities, attributes and relations from the text data block;
In some embodiments, the extracting entities, attributes, and relationships from the text data blocks using the large model for local knowledge extraction includes:
Setting a prompt example in the prompt word, and guiding a large model to extract entities, attributes and relations from the text data block through the prompt example, wherein setting elements of the prompt example comprise role definition instructions, input and output specifications, term extraction rules and relation generation constraints.
By setting some examples in the prompt, the large model can better understand which information is to be extracted, or can be guided to reason step by step during extraction, thereby improving the accuracy of the extraction result. A sample prompt is given below:
"you are a network diagram producer, extracting terms and their relationships from a given context. "
"Provide you with one context block (separated by \n), your task is to extract the ontology. "
"In the given context. These terms should represent key concepts depending on the context. N'
Step 1. Key terms mentioned therein are considered when traversing each sentence. N'
The term "\t" may include an object, entity, location, organization, person, \n'
"\T conditions, acronyms, documents, services, concepts, etc. N'
The "\t term should be atomized as much as possible. N/n \ "
Step 2, thinking about how these terms have a one-to-one relationship with other terms. N'
Terms mentioned in "\t and a sentence or a paragraph are generally interrelated. N'
The "\t term may relate to many other terms \n\n \'
And step 3, finding out the relation between each pair of related terms. N/n \ "
"Formatting the output as a json list, each element in the list contains a pair of terms. "
"And the relationship between them, as shown in \n"
"[\n"
"{\n"
' Node_1 ' concept extracted from the extracted ontology \n '
' Node_2 ' the related concepts extracted from the extracted ontology ", \n '
'Edge' is the relationship between the two concepts of node_1 and node_2 in one or two sentences. "\n'
"}, {...}\n"
"]"
S600, constructing related entity pairs across the text data blocks by adopting a mutual information method, and carrying out global knowledge extraction by combining RAG technology and a retrieval strategy to generate entity relations;
in some embodiments, the constructing related entity pairs across the text data blocks by using a mutual information method and performing global knowledge extraction in combination with a RAG technology and a retrieval policy to generate entity relationships includes:
Calculating mutual information among entities in different text blocks, wherein the formula is as follows:
$I(X;Y)=\sum_{x\in X}\sum_{y\in Y} p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)}$
wherein $X$ and $Y$ represent the sets of entities in the two text blocks, $p(x,y)$ represents the probability that entities $x$ and $y$ occur simultaneously, and $p(x)$ and $p(y)$ represent the probabilities that entities $x$ and $y$ occur individually; whether the entities in different text blocks are related is judged by calculating this value;
retrieving context knowledge related to the entity pairs from a vector database using a RAG system;
Based on the retrieved knowledge segments, the relation among the entities is generated through a pre-trained large language model by combining with the specific requirements of the task, and global knowledge extraction is completed.
Mutual information is a measure from information theory used to quantify the degree of interdependence between two variables: the larger the mutual information, the stronger the correlation between the two variables. Consider an example of weather and whether to carry an umbrella. Suppose there are two random variables X and Y, where X represents the weather (0 for a sunny day, 1 for a rainy day) and Y represents whether an umbrella is carried (0 for not carried, 1 for carried). Suppose 10 days of data were observed, giving the following joint frequency distribution:
when x=0 (sunny), y=0 has 4 days and y=1 has 1 day;
When x=1 (rainy days), y=0 has 1 day, and y=1 has 4 days.
Thus, there was a total of 10 days. Then the joint probability distribution can be calculated as:
P(X=0,Y=0)=4/10=0.4;
P(X=0,Y=1)=1/10=0.1;
P(X=1,Y=0)=1/10=0.1;
P(X=1,Y=1)=4/10=0.4;
Next, edge probabilities are calculated:
P(X=0)=0.4+0.1=0.5;
P(X=1)=0.1+0.4=0.5;
P(Y=0)=0.4+0.1=0.5;
P(Y=1)=0.1+0.4=0.5;
Then each (x, y) term is calculated according to the mutual information formula, and the terms are added.
Now, calculate each term:
For X=0, Y=0:
P(x,y)=0.4, P(x)P(y)=0.5*0.5=0.25;
log2(0.4/0.25)=log2(1.6)≈0.678;
the term is 0.4*0.678≈0.2712.
For X=0, Y=1:
P(x,y)=0.1, P(x)P(y)=0.25;
0.1/0.25=0.4, log2(0.4)≈-1.3219;
the term is 0.1*(-1.3219)≈-0.1322.
For X=1, Y=0: likewise P(x,y)=0.1 and P(x)P(y)=0.25, so as above the term is -0.1322.
For X=1, Y=1: P(x,y)=0.4 and P(x)P(y)=0.25, so as in the case X=0, Y=0, the term is 0.2712.
Adding these terms: 0.2712 - 0.1322 - 0.1322 + 0.2712 = 0.5424 - 0.2644 ≈ 0.278.
The mutual information in the above example is therefore about 0.278 bits. This means there is a certain correlation between X and Y; in this example the joint probability is higher when X and Y agree, so the mutual information is positive.
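A minimal sketch of this mutual information calculation, reproducing the umbrella example above, is given below; the helper name mutual_information is illustrative:

import math

def mutual_information(joint_counts: dict[tuple[int, int], int]) -> float:
    """I(X;Y) = sum over (x, y) of p(x,y) * log2( p(x,y) / (p(x) * p(y)) )."""
    total = sum(joint_counts.values())
    p_xy = {xy: c / total for xy, c in joint_counts.items()}
    p_x, p_y = {}, {}
    for (x, y), p in p_xy.items():
        p_x[x] = p_x.get(x, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    return sum(p * math.log2(p / (p_x[x] * p_y[y]))
               for (x, y), p in p_xy.items() if p > 0)

counts = {(0, 0): 4, (0, 1): 1, (1, 0): 1, (1, 1): 4}   # the 10 observed days
print(round(mutual_information(counts), 3))             # ≈ 0.278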
1. RAG-enhanced reasoning process, namely relation prediction combining retrieval and generation; the process is designed as follows:
step 1, searching across text block entities:
Entity pairs with high relevance are filtered by mutual information (even if they are not in the same text block); for example, the MI value of entity A and entity B is calculated, and if it exceeds the threshold, retrieval is triggered.
Step 2, context enhancement retrieval:
for pairs of entities with high MI values, all relevant text blocks (e.g., paragraphs, documents) containing these entities are retrieved from the knowledge base to form an enhanced context.
Step 3, prompt template design and generation constraints; the prompt template is:
Analyze the potential relationship between entities [entity A] and [entity B] based on the following context:
[ retrieved text Block 1]
[ Retrieved text Block 2]
Possible types of associations include investment, collaboration, competition, relatives, etc.
The output format is {"relation": "<type>", "evaluation": "<key sentence>"}
Generating a constraint:
Limiting LLM to only output JSON format, and selecting relationship type from predefined list;
Knowledge extraction is performed in combination with the RAG technology; an example is shown below:
Input:
Text block 1 (news A): "Company X announced its entry into the new energy vehicle field."
Text block 2 (financial report B): "The CEO of Company Y led the battery technology development project."
The flow is as follows:
1. The MI value of "Company X" and "Company Y" is calculated and is assumed to exceed the threshold.
2. The contexts containing both are retrieved, and it is found that "Company Y's battery technology" and "Company X's new energy vehicles" may be related in the supply chain.
3. The LLM generates the result {"relation": "supply chain cooperation", "evaluation": "Company X's new energy vehicles may depend on Company Y's battery technology"};
Through the steps, the relation among the entities can be extracted from the text blocks, and the more accurate and rich relation can be generated by combining the information in the external knowledge base, so that the content and depth of the knowledge graph are improved, and the global knowledge extraction is completed.
Mutual information is used to construct entity pairs across text blocks, and RAG technology is used to perform semantic reasoning on the entity pairs that meet the set threshold. Meanwhile, the large model has strong reasoning capability: by analyzing the relations among entities it can perform reasoning and inference, which helps to discover more association relations and implicit knowledge and enriches the content and depth of the knowledge graph. These advantages make the large model play an important role in the construction and application of the knowledge graph.
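A minimal sketch of this cross-text-block flow (MI filtering, context-enhanced retrieval, constrained generation) is given below; the retrieve and llm callables, the threshold and the prompt wording are illustrative assumptions:

import json

RELATION_PROMPT = (
    "Analyze the potential relationship between entities [{a}] and [{b}] "
    "based on the following context:\n{context}\n"
    "Possible relation types include investment, collaboration, competition, relatives, etc.\n"
    'Output format: {{"relation": "<type>", "evaluation": "<key sentence>"}}'
)

def global_extract(entity_a, entity_b, mi_value, retrieve, llm, mi_threshold=0.2):
    """Cross-block relation generation: MI filtering, retrieval of supporting context,
    then constrained JSON generation with a large language model."""
    if mi_value < mi_threshold:          # step 1: keep only strongly related entity pairs
        return None
    blocks = retrieve(f"{entity_a} {entity_b}", top_k=3)   # step 2: context-enhanced retrieval
    prompt = RELATION_PROMPT.format(a=entity_a, b=entity_b, context="\n".join(blocks))
    return json.loads(llm(prompt))       # step 3: relation generated as constrained JSON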
And S700, judging whether the two entities point to the same physical object through a multidimensional coreference resolution method, merging identical entities to form a knowledge base, and storing the constructed knowledge.
In some embodiments, the determining whether two entities point to the same physical object through the multidimensional coreference resolution method, merging identical entities to form a knowledge base, and storing the constructed knowledge includes:
Text similarity calculation, namely calculating the text similarity of entities in different text blocks through cosine similarity or edit distance;
structural similarity calculation, namely analyzing the structural relationships of the entities in the knowledge graph and judging whether the entities have similar structures;
semantic similarity calculation, namely calculating the semantic similarity of the entities through the large model and judging whether the entities have the same semantics;
entity alignment and merging, namely judging whether the two entities point to the same physical object according to the text, structural and semantic similarity, and merging them if so, thereby obtaining an entity alignment result.
Text similarity calculation:
Text similarity is used to measure how similar two entities are in text expression. A common calculation method includes an edit distance that measures similarity by calculating the minimum number of edit operations required to convert one string to another. The smaller the edit distance, the more similar the text expressions representing the two entities.
The edit distance formula is as follows:
$d(t_1, t_2)=n_{\mathrm{ins}}+n_{\mathrm{del}}+n_{\mathrm{sub}}$
wherein $t_1$ and $t_2$ represent the texts of the two entities, and $n_{\mathrm{ins}}$, $n_{\mathrm{del}}$ and $n_{\mathrm{sub}}$ indicate the numbers of insert, delete and replace operations, respectively.
Structural similarity calculation:
the structural similarity is used for measuring whether the structural relationship of two entities in the knowledge graph is similar. A common calculation method includes Jaccard coefficients, which measure structural similarity by calculating the ratio of the intersection to the union of a set of neighbors of two entities.
The Jaccard coefficient formula is as follows:
$J(e_1, e_2)=\frac{|N(e_1)\cap N(e_2)|}{|N(e_1)\cup N(e_2)|}$
wherein $N(e_1)$ and $N(e_2)$ represent the neighbor sets of the two entities, respectively.
Semantic similarity calculation:
semantic similarity is used to measure how similar two entities are in terms of semantics. The usual calculation method includes a RAG-based knowledge extraction method that calculates the semantic similarity of two entities by retrieving relevant information from an external knowledge base and generating semantic representations in combination with a pre-trained large language model.
The semantic similarity formula is as follows:
$\mathrm{SemSim}(e_1, e_2)=\cos\big(h(e_1), h(e_2)\big)$
wherein $e_1$ and $e_2$ represent the two entities, $h(\cdot)$ represents the entity semantic representation generated by the RAG technique, and $\cos(\cdot,\cdot)$ represents cosine similarity.
Multidimensional similarity fusion:
and carrying out weighted fusion on the text similarity, the structural similarity and the semantic similarity to obtain a comprehensive similarity score, and judging whether the two entities point to the same physical object.
The comprehensive similarity formula is as follows:
$\mathrm{Sim}(e_1, e_2)=\lambda_1\,\mathrm{Sim}_{\mathrm{text}}+\lambda_2\,\mathrm{Sim}_{\mathrm{struct}}+\lambda_3\,\mathrm{Sim}_{\mathrm{sem}}$
wherein $\lambda_1$, $\lambda_2$ and $\lambda_3$ represent the weights of the text similarity, the structural similarity and the semantic similarity, respectively, and are generally adjusted according to the specific application scenario.
Suppose there are two entities $e_1$ and $e_2$, and it is necessary to determine whether they point to the same physical object. First, their text similarity is calculated using the edit distance formula to obtain a text similarity score. Then, their structural similarity is calculated using the Jaccard coefficient formula to obtain a structural similarity score. Next, their semantic similarity is calculated: semantic representations are generated using the RAG-based knowledge extraction method, and a semantic similarity score is calculated through the cosine similarity formula. Finally, the text, structural and semantic similarities are weighted and fused to obtain a comprehensive similarity score; if the comprehensive similarity score exceeds a preset threshold, $e_1$ and $e_2$ are considered to point to the same physical object.
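A minimal sketch of this multidimensional fusion is given below; the weights, the 0.8 threshold and the example values are illustrative, and the semantic score is assumed to come from the RAG-based method described above:

def edit_distance_similarity(a: str, b: str) -> float:
    """Text similarity normalized from the Levenshtein edit distance."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1,
                           dp[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return 1 - dp[m][n] / max(m, n, 1)

def jaccard(neigh_a: set, neigh_b: set) -> float:
    """Structural similarity: overlap of the two entities' neighbor sets in the graph."""
    return len(neigh_a & neigh_b) / max(len(neigh_a | neigh_b), 1)

def same_entity(text_sim: float, struct_sim: float, sem_sim: float,
                weights=(0.3, 0.3, 0.4), threshold: float = 0.8) -> bool:
    """Weighted fusion of text, structural and semantic similarity for entity alignment."""
    score = sum(w * s for w, s in zip(weights, (text_sim, struct_sim, sem_sim)))
    return score >= threshold

t = edit_distance_similarity("Company X Ltd.", "Company X Limited")
s = jaccard({"battery", "new energy vehicle"}, {"battery", "CEO"})
print(same_entity(t, s, sem_sim=0.95))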
In a second aspect, the application provides a knowledge construction system based on a large model and a RAG technology, as shown in fig. 2, which comprises a data cleaning module, a text segmentation module, a vector conversion module, a vector indexing module, a local knowledge extraction module, a global knowledge extraction module and a coreference resolution module;
The data cleaning module is used for acquiring text data from various sources, performing cleaning, format conversion and verification on the text data, and storing the text data in a text database;
the text segmentation module is used for setting parameters and splitting a large text in the text database into text data blocks with semantic meaning according to the separator priority;
The vector conversion module is used for selecting the embedding model to convert the text data block into a text vector, carrying out normalization processing and storing the text vector into the vector database;
The vector index module is used for establishing a vector index, adopting the combination of similarity retrieval and full text retrieval and optimizing a retrieval strategy through mixed search;
the local knowledge extraction module is used for extracting local knowledge by using a large model and extracting entities, attributes and relations from the text data block;
The global knowledge extraction module is used for constructing related entity pairs across the text data blocks by adopting a mutual information method, and carrying out global knowledge extraction by combining RAG technology and a retrieval strategy to generate entity relations;
The coreference resolution module is used for judging whether two entities point to the same physical object through a multidimensional coreference resolution method, merging identical entities to form a knowledge base, and storing the constructed knowledge.
In a third aspect the application proposes an electronic device comprising a memory, a processor and a computer program stored in said memory and executable on said processor, said processor implementing the steps of the method as described above when said computer program is executed.
In a fourth aspect the application proposes a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of a method as described above.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts not described or detailed in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
In the embodiments provided in the present disclosure, it should be understood that the disclosed apparatus/computer device and method may be implemented in other manners. For example, the apparatus/computer device embodiments described above are merely illustrative, e.g., the division of modules or elements is merely a logical functional division, and there may be additional divisions of actual implementations, multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present disclosure may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present disclosure may implement all or part of the flow of the method of the above-described embodiments, or may be implemented by a computer program to instruct related hardware, and the computer program may be stored in a computer readable storage medium, where the computer program, when executed by a processor, may implement the steps of the method embodiments described above. The computer program may comprise computer program code, which may be in source code form, object code form, executable file or in some intermediate form, etc. The computer readable medium can include any entity or device capable of carrying computer program code, recording medium, USB flash disk, removable hard disk, magnetic disk, optical disk, computer Memory, read-Only Memory (ROM), random access Memory (Random Access Memory, RAM), electrical carrier signals, telecommunications signals, and software distribution media, among others. It should be noted that the content of the computer readable medium can be appropriately increased or decreased according to the requirements of the jurisdiction's jurisdiction and the patent practice, for example, in some jurisdictions, the computer readable medium does not include electrical carrier signals and telecommunication signals according to the jurisdiction and the patent practice.
The foregoing is merely a preferred embodiment of the present invention, and it should be noted that modifications and improvements made by those skilled in the art without departing from the present technical solution shall be considered as falling within the scope of the present technical solution claimed.