CN116521939A - Cross-modal retrieval method for judicial image-text data - Google Patents

Cross-modal retrieval method for judicial image-text data

Publication number
CN116521939A
Authority
CN
China
Prior art keywords
data
court trial
original
trial
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310551276.8A
Other languages
Chinese (zh)
Inventor
王斌
宋志鹏
蒋婕
李廷超
曹又潮
迟鹭璎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shuangzhibo Shenyang Electromechanical Equipment Manufacturing Co ltd
Original Assignee
Shuangzhibo Shenyang Electromechanical Equipment Manufacturing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2023-05-16
Filing date
2023-05-16
Publication date
2023-08-01
Application filed by Shuangzhibo Shenyang Electromechanical Equipment Manufacturing Co ltd
Priority to CN202310551276.8A
Publication of CN116521939A
Legal status: Pending

Abstract

This application relates to the field of multimodal data retrieval, and in particular to a cross-modal retrieval method for judicial image-text data. The method includes: determining the court trial items to be constructed; obtaining original video data, original audio data and original text data; performing feature extraction on each to obtain extracted video data, extracted audio data and extracted text data; training a standard multimodal data retrieval network on the extracted data to obtain a multimodal court trial data retrieval network; and inputting a court trial item to be retrieved into the multimodal court trial data retrieval network to obtain the corresponding original video data, original audio data or original text data. By building the multimodal court trial data retrieval network from the extracted video, audio and text data of the court trial items to be constructed, this application stores the multimodal data of those items in a unified space, which makes the items easy to retrieve.

Description

Cross-modal retrieval method for judicial image-text data

Technical field

This application relates to the field of multimodal data retrieval, and in particular to a cross-modal retrieval method for judicial image-text data.

Background

As technology has developed, the number of cases tried in court has gradually increased, and so has the volume of court records. The most common and long-standing recording method is to document the entire trial on paper. Paper records have several drawbacks. First, the number of paper files grows year after year and takes up more and more storage space. Second, paper is not environmentally friendly: trials are long, the records are generally voluminous, and printing one complete court record takes a large amount of paper. Third, paper records are hard to search. Finding the files for a given year or a specific case file is relatively easy, but locating specific text within a case file requires someone to read the whole file, which is laborious and inefficient.

The data to be searched is often not only text. Court trial records also exist as video and audio, and locating useful information in video is equally time-consuming and labor-intensive. Without an accompanying text record, the entire video file has to be watched to find the relevant passage.

Courts currently need retrieval across three modalities of trial records: text, audio and video. Existing cross-media retrieval approaches typically use a multi-task network, but they are not designed for court trial scenarios, so they lack domain specialization and do not fit this application.

Summary of the invention

This application provides a cross-modal retrieval method for judicial image-text data, which addresses the problem that existing court document retrieval methods are not designed specifically for court case trials.

The technical solution of this application is a cross-modal retrieval method for judicial image-text data, including:

S1: Determine several court trial items to be constructed and, for each of them, obtain the corresponding original video data, original audio data and original text data;

S2: For each court trial item to be constructed, perform feature extraction separately on the original video data, original audio data and original text data, obtaining extracted video data, extracted audio data and extracted text data that are stored in the same storage form;

S3: Using the extracted video data, extracted audio data and extracted text data of the several court trial items to be constructed, train a standard multimodal data retrieval network to obtain a multimodal court trial data retrieval network;

S4: Obtain a court trial item to be retrieved and input it into the multimodal court trial data retrieval network to obtain the original video data, original audio data or original text data corresponding to that item.

Optionally, step S2 includes:

S21: Segment the original video data to obtain segmented video data, and perform feature extraction on the segmented video data with MovieNet to obtain the extracted video data;

S22: Segment the original audio data to obtain segmented audio data, and perform feature extraction on the segmented audio data with AudioNet to obtain the extracted audio data;

S23: Perform feature extraction on the original text data with BERT to obtain extracted text data comprising several word vectors.

Optionally, step S3 includes:

S31: Using the extracted video data, extracted audio data and extracted text data of the several court trial items to be constructed, train the standard multimodal data retrieval network to obtain a preliminary multimodal court trial data network;

S32: Determine test court trial items and input them into the multimodal court trial data retrieval network; evaluate the network using the mAP curve, the PR curve and top-N accuracy to obtain an evaluation result;

Judge whether the evaluation result meets the preset evaluation standard. If it does, use the preliminary multimodal court trial data network as the multimodal court trial data retrieval network;

If it does not, optimize the preliminary multimodal court trial data network according to the evaluation result to obtain the multimodal court trial data retrieval network.

Beneficial effects:

This application builds a multimodal court trial data retrieval network from the extracted video data, extracted audio data and extracted text data of the court trial items to be constructed, and stores the multimodal data of those items in a unified space so that they are easy to retrieve. It therefore addresses the problem that existing court document retrieval methods are not designed specifically for court case trials.

Description of drawings

To explain the technical solution of this application more clearly, the drawing used in the embodiments is briefly introduced below. A person of ordinary skill in the art can derive other drawings from it without creative effort.

FIG. 1 is a schematic flowchart of the cross-modal retrieval method for judicial image-text data in an embodiment of this application.

Detailed description

The embodiments are described in detail below, with examples illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following embodiments do not represent all implementations consistent with this application; they are merely examples of systems and methods consistent with some aspects of this application as detailed in the claims.

This application provides a cross-modal retrieval method for judicial image-text data. As shown in FIG. 1, which is a schematic flowchart of the method in an embodiment of this application, the method includes:

S1: Determine several court trial items to be constructed and, for each of them, obtain the corresponding original video data, original audio data and original text data.

Specifically, the dataset is preprocessed. The original video and audio recordings of a court trial are often long, so annotations need to be added to the original video data and original audio data to facilitate subsequent feature extraction.

S2: For each court trial item to be constructed, perform feature extraction separately on the original video data, original audio data and original text data, obtaining extracted video data, extracted audio data and extracted text data that are stored in the same storage form.

Step S2 includes:

S21: Segment the original video data to obtain segmented video data, and perform feature extraction on the segmented video data with MovieNet to obtain the extracted video data;

S22: Segment the original audio data to obtain segmented audio data, and perform feature extraction on the segmented audio data with AudioNet to obtain the extracted audio data;

S23: Perform feature extraction on the original text data with BERT to obtain extracted text data comprising several word vectors.

Specifically, for the video modality: first segment the video, then use MovieNet to extract the features of each video segment.

For the audio modality: first segment the audio, then use AudioNet to extract the audio features and convert them into corresponding text data.

For the text modality: use the BERT model to extract features from the text data. The representation of the whole text is the set of all its word vectors, denoted {t_i, ..., t_j}.
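The segmentation step that precedes feature extraction in S21 and S22 can be sketched as follows. This is only an illustration under stated assumptions: the source does not say how segments are chosen, so fixed-length windows are assumed here, and `segment_bounds`, `extract_segment_features` and the `extractor` callable are hypothetical names standing in for whatever calls MovieNet or AudioNet in a real system.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds)


def segment_bounds(duration_s: float, segment_s: float) -> List[Segment]:
    """Split a recording into consecutive windows of at most `segment_s` seconds.

    Fixed-length windows are an assumption; the source only states that the
    recording is segmented before feature extraction.
    """
    duration_s = float(duration_s)
    segment_s = float(segment_s)
    if duration_s < 0 or segment_s <= 0:
        raise ValueError("duration_s must be >= 0 and segment_s must be > 0")
    bounds: List[Segment] = []
    start = 0.0
    while start < duration_s:
        end = min(start + segment_s, duration_s)
        bounds.append((start, end))
        start = end
    return bounds


def extract_segment_features(
    bounds: Sequence[Segment],
    extractor: Callable[[float, float], Sequence[float]],
) -> List[Sequence[float]]:
    """Apply a per-segment feature extractor (hypothetical stand-in for a model)."""
    return [extractor(start, end) for start, end in bounds]


if __name__ == "__main__":
    windows = segment_bounds(duration_s=10, segment_s=4)
    # Toy extractor: the "feature" is just the segment length in seconds.
    features = extract_segment_features(windows, lambda s, e: [e - s])
    print(windows)
    print(features)
```

The last window is allowed to be shorter than the others so that no part of the recording is dropped.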

S3: Using the extracted video data, extracted audio data and extracted text data of the several court trial items to be constructed, train a standard multimodal data retrieval network to obtain a multimodal court trial data retrieval network.

Step S3 includes:

S31: Using the extracted video data, extracted audio data and extracted text data of the several court trial items to be constructed, train the standard multimodal data retrieval network to obtain a preliminary multimodal court trial data network.

Specifically, the data of the three modalities are transformed and mapped into the same subspace, in which the different modalities share the same storage form.

All three modalities of data are obtained repeatedly for different court trial items to be constructed and mapped into the same subspace. The multimodal court trial data network is trained on them and its structure is optimized, so that it can eliminate the heterogeneity between modalities and support retrieval of multimodal court trial data.
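As a rough illustration of mapping different modalities into one subspace, the sketch below projects feature vectors of different lengths into a shared space of fixed dimension and normalizes them so they can be compared directly. The linear projection, the hand-written weight matrices and the names `project` and `dot` are all assumptions made for this example; the source does not specify the mapping, and in a trained network the weights would be learned.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def project(features: Vector, weights: Matrix) -> List[float]:
    """Linearly map a modality-specific vector into the shared subspace.

    `weights` has one row per shared dimension and one column per input
    feature. The result is L2-normalized so every modality is stored in the
    same form: a unit vector of the same length.
    """
    if any(len(row) != len(features) for row in weights):
        raise ValueError("each weight row must match the feature length")
    projected = [sum(w * x for w, x in zip(row, features)) for row in weights]
    norm = math.sqrt(sum(v * v for v in projected))
    return [v / norm for v in projected] if norm > 0 else projected


def dot(a: Vector, b: Vector) -> float:
    """Similarity of two unit vectors in the shared subspace."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    # Hypothetical weights: video features have 3 dimensions, text features 2.
    w_video = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    w_text = [[1.0, 0.0], [0.0, 1.0]]
    video_vec = project([3.0, 4.0, 0.0], w_video)
    text_vec = project([6.0, 8.0], w_text)
    print(video_vec, text_vec, dot(video_vec, text_vec))
```

Once every item is a unit vector of the same length, a query from one modality can be scored against stored items from any other modality with the same similarity function.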

S32: Determine test court trial items and input them into the multimodal court trial data retrieval network; evaluate the network using the mAP curve, the PR curve and top-N accuracy to obtain an evaluation result;

Judge whether the evaluation result meets the preset evaluation standard. If it does, use the preliminary multimodal court trial data network as the multimodal court trial data retrieval network; if it does not, optimize the preliminary multimodal court trial data network according to the evaluation result to obtain the multimodal court trial data retrieval network.
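The accept-or-optimize decision described in S32 might be organized as in the following sketch. The metric names, the threshold values and the `optimize` callback are hypothetical; the source only says that the evaluation result is compared against a preset standard and that the preliminary network is optimized if the standard is not met.

```python
from typing import Callable, Dict, TypeVar

Network = TypeVar("Network")


def meets_standard(results: Dict[str, float], thresholds: Dict[str, float]) -> bool:
    """Return True when every required metric reaches its preset minimum.

    A metric missing from `results` is treated as failing.
    """
    return all(
        name in results and results[name] >= minimum
        for name, minimum in thresholds.items()
    )


def select_network(
    preliminary: Network,
    results: Dict[str, float],
    thresholds: Dict[str, float],
    optimize: Callable[[Network, Dict[str, float]], Network],
) -> Network:
    """Keep the preliminary network if it passes, otherwise optimize it."""
    if meets_standard(results, thresholds):
        return preliminary
    return optimize(preliminary, results)


if __name__ == "__main__":
    # Hypothetical thresholds; the source does not give numeric values.
    thresholds = {"mAP": 0.70, "top_n_accuracy": 0.80}
    passing = {"mAP": 0.75, "top_n_accuracy": 0.90}
    failing = {"mAP": 0.60, "top_n_accuracy": 0.90}
    retrain = lambda net, res: net + "-optimized"
    print(select_network("preliminary", passing, thresholds, retrain))
    print(select_network("preliminary", failing, thresholds, retrain))
```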

Specifically, precision and recall are calculated first. Precision indicates how many of the samples predicted as positive are truly positive, and recall indicates what fraction of all positive samples are predicted correctly. Precision is calculated as follows:

$$\text{Precision} = \frac{tp}{tp + fp}$$

where tp is the number of correctly predicted positive samples among the retrieved samples, and fp is the number of negative samples predicted as positive. Precision reflects the proportion of "correctly predicted positive samples" among all samples "predicted as positive", and is used to judge the accuracy of retrieval.

Recall is calculated as follows:

$$\text{Recall} = \frac{tp}{tp + fn}$$

where tp is the number of correctly predicted positive samples among the retrieved samples, and fn is the number of positive samples predicted as negative. Recall reflects the proportion of "correctly predicted positive samples" among all "positive samples", and is used to further judge the accuracy of retrieval.
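The metrics named in S32 can be computed as in the sketch below. Precision and recall follow the definitions in the text. The average precision and top-N accuracy functions use common definitions that the source does not spell out, so they should be read as assumptions: average precision is averaged over the relevant items that appear in the ranked list, and top-N accuracy is the fraction of queries with at least one relevant item among the first N results.

```python
from typing import Sequence


def precision(tp: int, fp: int) -> float:
    """Share of items predicted positive that are truly positive."""
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of truly positive items that were predicted positive."""
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """Average of precision@k taken at each rank k that holds a relevant item.

    Assumes the ranked list contains every relevant item for the query.
    """
    hits = 0
    total = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(queries: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-query average precision (mAP)."""
    if not queries:
        return 0.0
    return sum(average_precision(q) for q in queries) / len(queries)


def top_n_accuracy(queries: Sequence[Sequence[bool]], n: int) -> float:
    """Fraction of queries with a relevant item among the first n results."""
    if not queries:
        return 0.0
    return sum(1 for q in queries if any(q[:n])) / len(queries)


if __name__ == "__main__":
    print(precision(tp=8, fp=2))   # 0.8
    print(recall(tp=8, fn=8))      # 0.5
    rankings = [[True], [False, True]]
    print(mean_average_precision(rankings))  # 0.75
    print(top_n_accuracy(rankings, n=1))     # 0.5
```

Sweeping a score threshold and recording the resulting (recall, precision) pairs would give the points of a PR curve.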

S4: Obtain a court trial item to be retrieved and input it into the multimodal court trial data retrieval network to obtain the original video data, original audio data or original text data corresponding to that item.
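One way to picture the retrieval in S4 is a nearest-neighbor lookup over items already embedded in the shared subspace, where each stored item points back to its original data. The sketch below is an assumption about how such a lookup could work, not the method of the application: the `IndexedItem` fields, the file paths, the cosine ranking and the `retrieve` function are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class IndexedItem:
    trial_id: str                 # identifier of the court trial item
    modality: str                 # "video", "audio" or "text"
    source_path: str              # hypothetical location of the original data
    embedding: Tuple[float, ...]  # vector in the shared subspace


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def retrieve(
    query: Sequence[float],
    index: Sequence[IndexedItem],
    top_k: int = 3,
    modality: Optional[str] = None,
) -> List[IndexedItem]:
    """Return the top_k stored items most similar to the query embedding.

    If `modality` is given, only items of that modality are considered.
    """
    candidates = [i for i in index if modality is None or i.modality == modality]
    ranked = sorted(candidates, key=lambda i: cosine(query, i.embedding), reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    index = [
        IndexedItem("trial-001", "video", "videos/trial-001.mp4", (0.9, 0.1)),
        IndexedItem("trial-002", "audio", "audio/trial-002.wav", (0.0, 1.0)),
        IndexedItem("trial-001", "text", "text/trial-001.txt", (1.0, 0.0)),
    ]
    for item in retrieve((1.0, 0.0), index, top_k=2):
        print(item.trial_id, item.modality, item.source_path)
```

Because every stored item carries a reference to its original file, a query in one modality can return original video, audio or text data, as S4 describes.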

The embodiments of this application have been described in detail above, but they are only preferred embodiments and should not be regarded as limiting the scope of implementation of this application. All equivalent changes and improvements made within the scope of this application remain covered by this patent.

Claims (3)

CN202310551276.8A | 2023-05-16 | 2023-05-16 | Cross-modal retrieval method for judicial image-text data | Pending | CN116521939A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202310551276.8A (CN116521939A (en)) | 2023-05-16 | 2023-05-16 | Cross-modal retrieval method for judicial image-text data

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202310551276.8A (CN116521939A (en)) | 2023-05-16 | 2023-05-16 | Cross-modal retrieval method for judicial image-text data

Publications (1)

Publication Number | Publication Date
CN116521939A (en) | 2023-08-01

Family

ID=87399305

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202310551276.8A (Pending; published as CN116521939A (en)) | Cross-modal retrieval method for judicial image-text data | 2023-05-16 | 2023-05-16

Country Status (1)

Country | Link
CN (1) | CN116521939A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN112015923A (en)* | 2020-09-04 | 2020-12-01 | 平安科技(深圳)有限公司 | Multi-mode data retrieval method, system, terminal and storage medium
CN113806482A (en)* | 2021-09-17 | 2021-12-17 | 中国电信集团系统集成有限责任公司 | Cross-modal retrieval method and device for video text, storage medium and equipment
CN114241279A (en)* | 2021-12-30 | 2022-03-25 | 中科讯飞互联(北京)信息科技有限公司 | Image-text joint error correction method, device, storage medium and computer equipment
WO2022247562A1 (en)* | 2021-05-25 | 2022-12-01 | 北京有竹居网络技术有限公司 | Multi-modal data retrieval method and apparatus, and medium and electronic device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
秦峰;: "远程审判高清科技法庭系统的建设实施", 计算机产品与流通, no. 07, 19 May 2020 (2020-05-19)*

Similar Documents

Publication | Title
CN109697233B (en) | Knowledge graph system construction method
CN109815364B (en) | Method and system for extracting, storing and retrieving mass video features
CN108038183B (en) | Structured entity recording method, device, server and storage medium
US8347206B2 (en) | Interactive image tagging
CN102414680B (en) | Semantic Event Detection Using Cross-Domain Knowledge
CN104573130B (en) | The entity resolution method and device calculated based on colony
CN102508923A (en) | Automatic video annotation method based on automatic classification and keyword marking
CN105843841A (en) | Small file storage method and system
CN105426426A (en) | KNN text classification method based on improved K-Medoids
CN104199965A (en) | Semantic information retrieval method
CN111753535A (en) | A method and device for generating patent application text
CN103440233A (en) | Automatic sScientific paper standardization automatic detecting and editing system
CN103226547A (en) | Method and device for producing verse for picture
CN109947971B (en) | Image retrieval method, image retrieval device, electronic equipment and storage medium
CN107357765A (en) | Word document flaking method and device
CN111353055A (en) | Cataloging method and system for extended metadata based on smart tags
CN107291949A (en) | Information search method and device
CN118708790A (en) | Archive information retrieval method, device, computer equipment and readable storage medium
CN116975363A (en) | Video tag generation method and device, electronic equipment and storage medium
CN114708445B (en) | Trademark similarity recognition method and device, electronic equipment and storage medium
CN114691907B (en) | Cross-modal retrieval method, device and medium
CN118779458B (en) | A sensitive information analysis and identification method, system, device and readable storage medium
CN105701227A (en) | Cross-media similarity measure method and search method based on local association graph
CN116521939A (en) | Cross-modal retrieval method for judicial image-text data
CN112417220A (en) | Heterogeneous data integration method

Legal Events

Code | Title | Description
PB01 | Publication | Publication
SE01 | Entry into force of request for substantive examination | Entry into force of request for substantive examination
