CN111507090B

Movatterモバイル変換

Info

Publication number: CN111507090B
Application number: CN202010125189.2A
Authority: CN
Inventors: 郑立颖; 徐亮; 阮晓雯
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2020-02-27
Filing date: 2020-02-27
Publication date: 2024-11-15
Anticipated expiration: 2040-02-27
Also published as: CN111507090A; WO2021169217A1

Abstract

Translated fromChinese

本申请提供一种摘要提取方法、装置、设备及计算机可读存储介质，该方法包括：计算语句集中每两个语句之间的句子相似度，并基于TextRank算法和句子相似度，从语句集中筛选出第一摘要候选集；计算语句集中每两个语句之间的余弦相似度，并基于TextRank算法和余弦相似度，从语句集中筛选出第二摘要候选集；基于MMR算法和预设语句个数，分别从第一摘要候选集和第二摘要候选集中筛选出第三摘要候选集第四摘要候选集；分别四个摘要候选集中选择预设摘要语句数量的语句，以形成融合摘要候选集；统计融合摘要候选集中各语句的出现次数，并根据各语句的出现次数，从融合摘要候选集中筛选出目标文本的摘要结果集。本申请涉及数据处理，可以提高摘要提取的准确性。

The present application provides a summary extraction method, device, equipment and computer-readable storage medium, the method comprising: calculating the sentence similarity between every two sentences in a sentence set, and filtering out a first summary candidate set from the sentence set based on the TextRank algorithm and the sentence similarity; calculating the cosine similarity between every two sentences in the sentence set, and filtering out a second summary candidate set from the sentence set based on the TextRank algorithm and the cosine similarity; filtering out a third summary candidate set and a fourth summary candidate set from the first summary candidate set and the second summary candidate set based on the MMR algorithm and the preset number of sentences; selecting sentences with a preset number of summary sentences from the four summary candidate sets to form a fused summary candidate set; counting the number of occurrences of each sentence in the fused summary candidate set, and filtering out a summary result set of the target text from the fused summary candidate set according to the number of occurrences of each sentence. The present application relates to data processing, which can improve the accuracy of summary extraction.