CN116453023B


Info

Publication number: CN116453023B
Application number: CN202310437286.9A
Authority: CN
Inventors: 沈浩; 黄海量; 吴东进; 韩松乔; 吴优
Original assignee: Shanghai Zhixun Information Technology Co ltd
Current assignee: Shanghai Zhixun Information Technology Co ltd
Priority date: 2023-04-23
Filing date: 2023-04-23
Publication date: 2024-01-26
Anticipated expiration: 2043-04-23
Also published as: CN116453023A; WO2024221710A1

Abstract

The embodiments of the invention disclose a video summary system, method, electronic device and medium for 5G rich media information. The video summary method for 5G rich media information comprises the following steps: acquiring a video message information set Y in a 5G rich media message, and constructing a training set based on the video message information set Y; constructing a video summary model, wherein the video summary model comprises a temporal decoder, a perceptron and a Transformer module connected in sequence; training the video summary model with the training set to obtain a trained video summary model; and inputting a video to be identified into the trained video summary model to obtain a video sampling picture set y' of the video to be identified. The method solves a problem in the prior art: video content is difficult and slow to identify, which makes it unsuitable for high-concurrency short message sending scenarios.

Description


Video summary system, method, electronic device and medium for 5G rich media information

Technical field

The invention relates to the field of computer technology, and in particular to a video summary system, method, electronic device and medium for 5G rich media information.

Background art

5G rich media messaging is a major leap forward in the communication capabilities of the short message industry. Compared with traditional text messages, 5G rich media messages support more media formats and richer forms of expression. They can carry rich media content such as long texts, pictures, voice and video. They also offer user interaction and feedback capabilities such as official accounts and mini programs. This greatly broadens the application scenarios, content quality and scope of use of 5G rich media messages.

A 5G rich media message includes three sets:

- a text message information set X(x₁, x₂, ...) of multiple text messages;
- a video message information set Y(y₁, y₂, ...) of multiple videos;
- a picture message information set Z(z₁, z₂, ...) of multiple pictures.

The video messages carry a large amount of video content. This content is difficult to recognize and takes a long time to identify, so it is not suitable for high-concurrency short message sending scenarios.

Therefore, there is an urgent need for a method of identifying video picture summaries of 5G rich media messages that is suitable for high-concurrency short message sending scenarios.

Summary of the invention

The purpose of the embodiments of the present invention is to provide a video summary system, method, electronic device and medium for 5G rich media information. They solve a problem in the prior art: video content is difficult to recognize and requires a long recognition time, which makes it unsuitable for high-concurrency short message sending scenarios.

To achieve the above objective, an embodiment of the present invention provides a video summary method for 5G rich media information. The method specifically includes:

Obtain the video message information set Y in the 5G rich media message, and build a training set based on the video message information set Y;

Construct a video summary model, wherein the video summary model includes a temporal decoder, a perceptron and a Transformer module connected in sequence;

Train the video summary model with the training set to obtain a trained video summary model;

Input the video to be identified into the trained video summary model to obtain a video sampling picture set y' of the video to be identified.

On the basis of the above technical solution, the present invention can be further improved as follows:

Further, constructing the video summary model includes the following steps. The video summary model includes a temporal decoder, a perceptron and a Transformer module connected in sequence.

Perform time-sequencing on the video message information set Y with the temporal decoder;

Generate the corresponding segmentation sequences with a two-layer perceptron;

Perform vector analysis on each segmentation sequence with the Transformer module to obtain a sequence feature set R for each segmentation sequence. Calculate the tolerance rate between every pair of sequence features in the sequence feature set R. Based on the tolerance rates, obtain the segmentation sequence set with the largest tolerance rate;

Randomly select n pictures from each segmentation sequence in the segmentation sequence set to form the video sampling picture set y' of the video to be identified.
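The segmentation-and-sampling step above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the patented implementation:

- the boolean boundary flags are a hypothetical stand-in for the output of the two-layer perceptron;
- the tolerance-rate selection by the Transformer module is not modelled;
- the function names are invented for illustration.

```python
import random
from typing import List, Optional, Sequence, TypeVar

Frame = TypeVar("Frame")


def split_by_boundaries(frames: Sequence[Frame],
                        boundaries: Sequence[bool]) -> List[List[Frame]]:
    """Cut time-ordered frames into segmentation sequences.

    boundaries[i] is True when a new segment starts at frame i; these flags are a
    hypothetical stand-in for the two-layer perceptron's per-frame output.
    """
    if len(frames) != len(boundaries):
        raise ValueError("exactly one boundary flag per frame is required")
    segments: List[List[Frame]] = []
    for frame, starts_new in zip(frames, boundaries):
        if starts_new or not segments:
            segments.append([])
        segments[-1].append(frame)
    return segments


def sample_pictures(segments: Sequence[Sequence[Frame]], n: int,
                    seed: Optional[int] = None) -> List[Frame]:
    """Randomly pick up to n frames from every segment to form y'."""
    rng = random.Random(seed)
    summary: List[Frame] = []
    for segment in segments:
        k = min(n, len(segment))  # short segments contribute all their frames
        picks = sorted(rng.sample(range(len(segment)), k))  # keep temporal order
        summary.extend(segment[i] for i in picks)
    return summary


segments = split_by_boundaries(list(range(10)),
                               [True, False, False, True, False,
                                False, False, True, False, False])
print(segments)  # [[0, 1, 2], [3, 4, 5, 6], [7, 8, 9]]
print(len(sample_pictures(segments, 2, seed=0)))  # 6
```

Sorting the sampled indices keeps the selected pictures in their original temporal order. This is a design choice of the sketch; the text does not specify it.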

Further, the video summary method for 5G rich media information also includes:

Obtain the picture messages in the 5G rich media message, and construct a picture message information set Z based on the picture messages and the video sampling picture set y';

Construct a feature extraction model and a bad picture classification model;

Perform feature extraction on the picture message information set Z with the feature extraction model to obtain a picture depth feature set z;

Input the pictures in the picture depth feature set z into the bad picture classification model one by one to determine whether all pictures in the picture depth feature set z are compliant.

Construct a speech-to-text model;

Convert the video message information set Y into a video text set y with the speech-to-text model.

Obtain the text messages in the 5G rich media message;

Construct a text message information set X based on the text messages and the video text set y;

Construct a sensitive word variant recognition model;

Input the text messages in the text message information set X into the sensitive word variant recognition model one by one to determine whether all text messages in the text message information set X are compliant.

The 5G rich media message can be sent normally when two conditions hold: all text messages in the text message information set X are compliant, and all pictures in the picture depth feature set z are compliant.
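The sending decision above is a conjunction over both sets. The following is a minimal Python sketch under two assumptions. First, the two compliance checks are supplied as hypothetical callables, standing in for the sensitive word variant recognition model and the bad picture classification model. Second, the sketch also records which items failed, so that they could be routed to manual review.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class Verdict:
    sendable: bool
    flagged_texts: List[int] = field(default_factory=list)     # indices into X
    flagged_pictures: List[int] = field(default_factory=list)  # indices into z


def review_message(texts: Sequence[str], pictures: Sequence[object],
                   text_ok: Callable[[str], bool],
                   picture_ok: Callable[[object], bool]) -> Verdict:
    """The message may be sent only if every text and every picture passes."""
    bad_texts = [i for i, t in enumerate(texts) if not text_ok(t)]
    bad_pictures = [i for i, p in enumerate(pictures) if not picture_ok(p)]
    return Verdict(not bad_texts and not bad_pictures, bad_texts, bad_pictures)


# Hypothetical checkers for illustration only.
verdict = review_message(["hello", "buy badword now"], ["pic-1"],
                         text_ok=lambda t: "badword" not in t,
                         picture_ok=lambda p: True)
print(verdict.sendable, verdict.flagged_texts)  # False [1]
```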

Further, training the video summary model with the training set to obtain a trained video summary model includes:

Divide the video message information set Y into a training set, a test set and a validation set;

Train the video summary model on the training set;

Verify the performance of the video summary model on the validation set, and save the improved CTC model that meets the performance conditions;

Evaluate the recognition performance of the video summary model on the test set.
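A three-way split of the video message information set Y could look like the sketch below. The 80/10/10 ratios, the fixed seed and the function name are assumptions for illustration; the text does not specify them.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(items: Sequence[T], train: float = 0.8, val: float = 0.1,
                  seed: int = 42) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle and split into (training, validation, test); the remainder is test."""
    if not (0 < train < 1 and 0 <= val < 1 and train + val <= 1):
        raise ValueError("invalid split ratios")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


train_set, val_set, test_set = split_dataset(range(100))
print(len(train_set), len(val_set), len(test_set))  # 80 10 10
```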

A video summary system for 5G rich media information, including:

an acquisition module, configured to acquire the video message information set Y in a 5G rich media message and build a training set based on the video message information set Y;

a construction module, configured to construct a video summary model, wherein the video summary model includes a temporal decoder, a perceptron and a Transformer module connected in sequence;

a training module, configured to train the video summary model with the training set to obtain a trained video summary model;

An electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the steps of the above method.

A non-transitory computer-readable medium storing a computer program. When executed by a processor, the computer program implements the steps of the above method.

The embodiments of the present invention have the following advantages:

The video summary method for 5G rich media information of the present invention works as follows:

- it obtains the video message information set Y in a 5G rich media message and builds a training set based on it;
- it constructs a video summary model comprising a temporal decoder, a perceptron and a Transformer module connected in sequence;
- it trains the video summary model with the training set to obtain a trained video summary model;
- it inputs the video to be identified into the trained model to obtain the video sampling picture set y' of that video.

This solves the problem in the prior art that video content is difficult and slow to recognize, which makes it unsuitable for high-concurrency short message sending scenarios.

Brief description of the drawings

The drawings used in describing the embodiments or the prior art are briefly introduced below, to explain more clearly the embodiments of the present invention and the technical solutions in the prior art. The drawings in the following description are merely exemplary. Those of ordinary skill in the art can derive other drawings from them without creative effort.

The structures, proportions, sizes, etc. shown in this specification accompany the content disclosed in the specification, for the understanding and reading of those familiar with this technology. They do not limit the conditions under which the invention can be implemented and therefore have no substantive technical significance. Any structural modification, change in proportion or adjustment in size still falls within the scope of the technical content disclosed herein, provided it does not affect the effects or purposes achievable by the present invention.

Figure 1 is a flow chart of the video summary method for 5G rich media information according to the present invention;

Figure 2 is a first architecture diagram of the video summary system for 5G rich media information according to the present invention;

Figure 3 is a second architecture diagram of the video summary system for 5G rich media information according to the present invention;

Figure 4 is a flow chart of generating 5G rich media video picture summaries according to the present invention;

Figure 5 is a schematic diagram of the physical structure of the electronic device provided by the present invention.

Reference numerals in the drawings:

acquisition module 10, construction module 20, training module 30, video summary model 40, speech-to-text model 50, sensitive word variant recognition model 60, feature extraction model 70, bad picture classification model 80, electronic device 90, processor 901, memory 902, bus 903.

Detailed description of the embodiments

The implementation of the present invention is illustrated below by specific embodiments. From the content disclosed in this specification, persons familiar with this technology can readily understand other advantages and effects of the present invention. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.

Embodiment

Figure 1 is a flow chart of an embodiment of the video summary method for 5G rich media information according to the present invention. As shown in Figure 1, the video summary method for 5G rich media information provided by an embodiment of the present invention includes the following steps:

S101: obtain the video message information set Y in the 5G rich media message, and build a training set based on the video message information set Y;

Specifically, a 5G rich media message is obtained. The maximum capacity of a single 5G rich media message is 3 MB, so one message can contain multiple text messages, multiple pictures and multiple video/audio clips. A 5G rich media message can be expressed as T_xyz. T may contain:

- a text message information set X(x₁, x₂, ...) of multiple text messages;
- a video message information set Y(y₁, y₂, ...) of multiple videos;
- a picture message information set Z(z₁, z₂, ...) of multiple pictures.
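The message structure T_xyz can be represented as a simple container. This hypothetical Python sketch rests on three assumptions:

- the class and field names are invented;
- "3M" is taken to mean 3 MiB;
- only raw content bytes are counted, and any protocol overhead is ignored.

```python
from dataclasses import dataclass, field
from typing import List

MAX_MESSAGE_BYTES = 3 * 1024 * 1024  # assumption: "3M" read as 3 MiB


@dataclass
class RichMediaMessage:
    """Hypothetical container for one message T_xyz."""
    texts: List[str] = field(default_factory=list)       # X = (x1, x2, ...)
    videos: List[bytes] = field(default_factory=list)    # Y = (y1, y2, ...)
    pictures: List[bytes] = field(default_factory=list)  # Z = (z1, z2, ...)

    def payload_size(self) -> int:
        """Raw content size in bytes (texts counted as UTF-8)."""
        return (sum(len(t.encode("utf-8")) for t in self.texts)
                + sum(len(v) for v in self.videos)
                + sum(len(p) for p in self.pictures))

    def fits_limit(self) -> bool:
        return self.payload_size() <= MAX_MESSAGE_BYTES


msg = RichMediaMessage(texts=["你好"], videos=[b"\x00" * 10], pictures=[b"\x00" * 5])
print(msg.payload_size(), msg.fits_limit())  # 21 True
```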

The video message information set Y contains both video content and audio content, and both must be reviewed during the security compliance review. The video message information set Y(y₁, y₂, ...) is therefore converted into two sets: a video text set y and a video sampling picture set y'. A training set is built based on the video message information set Y.

S102: construct a video summary model, wherein the video summary model includes a temporal decoder, a perceptron and a Transformer module connected in sequence;

Specifically, the videos in the video message information set Y are time-sequenced by the temporal decoder;

S103: train the video summary model with the training set to obtain a trained video summary model;

Specifically, the video message information set Y is divided into a training set, a test set and a validation set;

the video summary model 40 is trained on the training set;

the performance of the video summary model 40 is verified on the validation set, and the improved CTC model that meets the performance conditions is saved;

the recognition performance of the video summary model 40 is evaluated on the test set.

S104: input the video to be identified into the trained video summary model to obtain the video sampling picture set y' of the video to be identified.

The video summary method for 5G rich media information further includes:

Construct a speech-to-text model 50;

Convert the video message information set Y into a video text set y with the speech-to-text model 50;

Preferably, the speech-to-text model 50 is a CTC model. A maximum entropy function is introduced into the CTC model to improve its original CTC loss function. The improved CTC model is trained with the training set, and the trained improved CTC model then converts the video message information set Y into the video text set y.

The original CTC loss function is improved by Formula 1;

In Formula 1, the two loss terms denote the loss function of the improved CTC model and the original CTC loss function, respectively. α is the coefficient of the maximum conditional entropy regularization. H(p(π|l, X)) is the entropy of the feasible paths given the input sequence and the target sequence.
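Formula 1 itself is not reproduced in this text. Assume it follows the usual maximum conditional entropy regularization of CTC. In that form, the weighted path entropy is subtracted from the CTC loss, so minimizing the loss favours a higher-entropy path distribution. Formula 1 would then read as below. The symbols $\mathcal{L}_{\mathrm{improved}}$ and $\mathcal{L}_{\mathrm{CTC}}$ are hypothetical names for the two loss functions. $p(l \mid X)$ is the total probability of the real output $l$ given $X$.

```latex
\mathcal{L}_{\mathrm{improved}} = \mathcal{L}_{\mathrm{CTC}} - \alpha \, H\big(p(\pi \mid l, X)\big),
\qquad
\mathcal{L}_{\mathrm{CTC}} = -\log p(l \mid X)
```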

H(p(π|l, X)) is obtained by Formula 2;

In the formula, p(π|l, X) is the conditional probability of a feasible path π given the 5G voice information X and the real output l;

log p(π|X) is the logarithm of the conditional probability of a feasible path π given the 5G voice information X. The summation term is the sum of the conditional probabilities of all outputs of the 5G voice information X, whether or not the real output l is given.
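Formula 2 is likewise not reproduced. Under the same maximum-entropy assumption, let $\mathcal{B}^{-1}(l)$ denote the set of feasible paths that collapse to $l$. The path entropy can then be expanded using $p(\pi \mid l, X) = p(\pi \mid X) / p(l \mid X)$. The final form contains the terms described above: $p(\pi \mid l, X)$, $\log p(\pi \mid X)$, and the sum of path probabilities $p(l \mid X)$.

```latex
H\big(p(\pi \mid l, X)\big)
  = -\sum_{\pi \in \mathcal{B}^{-1}(l)} p(\pi \mid l, X) \log p(\pi \mid l, X)
  = -\frac{1}{p(l \mid X)} \sum_{\pi \in \mathcal{B}^{-1}(l)} p(\pi \mid X) \log p(\pi \mid X) + \log p(l \mid X),
\qquad
p(l \mid X) = \sum_{\pi \in \mathcal{B}^{-1}(l)} p(\pi \mid X)
```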

The loss function used in the present invention may be L1Loss, MSELoss, CrossEntropyLoss, etc.; the choice makes little difference to the final performance of the improved CTC model.
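For intuition, the maximum-entropy-regularized CTC objective can be evaluated on a toy example that lists every feasible path probability p(π|X) explicitly. This Python sketch assumes the loss is −log p(l|X) − α·H, with H computed through p(l|X) as the sum over feasible paths. That form is an assumption, not confirmed by the text. Real implementations compute these quantities with the CTC forward-backward recursion, because the number of feasible paths grows exponentially with sequence length.

```python
import math
from typing import Sequence


def path_entropy(path_probs: Sequence[float]) -> float:
    """H(p(pi|l,X)) from the probabilities p(pi|X) of all feasible paths of l.

    Uses H = -(1/p(l|X)) * sum p(pi|X) log p(pi|X) + log p(l|X).
    """
    p_l = sum(path_probs)  # p(l|X): sum over feasible paths
    if p_l <= 0:
        raise ValueError("at least one feasible path needs positive probability")
    weighted = sum(p * math.log(p) for p in path_probs if p > 0)
    return -weighted / p_l + math.log(p_l)


def regularized_ctc_loss(path_probs: Sequence[float], alpha: float) -> float:
    """Assumed form: CTC loss (-log p(l|X)) minus alpha times the path entropy."""
    return -math.log(sum(path_probs)) - alpha * path_entropy(path_probs)


# Four equally likely feasible paths: the entropy is log 4.
print(round(path_entropy([0.1, 0.1, 0.1, 0.1]), 4))  # 1.3863
```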

Obtain the text messages in the 5G rich media message; construct the text message information set X based on the text messages and the video text set y;

Construct a sensitive word variant recognition model 60. Preferably, the sensitive word variant recognition model 60 is a Text CNN model. Bad short-text recognition based on Text CNN models is already fairly mature in short message text review.

The sensitive word variant recognition model 60 used in the present invention need not be a Text CNN model. Models such as CRNN or LSTM+CTC can replace it without a significant difference in recognition performance.

First, the 5G rich media message to be processed undergoes the following preprocessing:

- digit character normalization;
- English character normalization;
- traditional-to-simplified Chinese conversion;
- handling of symbols with special meanings;
- removal of interspersed symbol noise;
- unified representation of consecutive digits in payment content;
- string segmentation.
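Several of these preprocessing steps can be approximated with the Python standard library, as in the sketch below. The noise-symbol set and the `<NUM>` placeholder are assumptions chosen for illustration. Traditional-to-simplified conversion needs a conversion table, for example from a library such as OpenCC, and is omitted.

```python
import re
import unicodedata

# Assumed set of noise symbols that are often interspersed to evade keyword filters.
NOISE = re.compile(r"[\s*#\-_~·.,，。!！?？/\\|]+")
DIGITS = re.compile(r"\d+")


def normalize_text(text: str) -> str:
    """Rough approximation of several of the preprocessing steps named above."""
    text = unicodedata.normalize("NFKC", text)  # full-width digits/letters -> ASCII
    text = text.lower()                          # English character normalization
    text = NOISE.sub("", text)                   # drop interspersed symbol noise
    return DIGITS.sub("<NUM>", text)             # unify runs of consecutive digits


print(normalize_text("加微*信 ＶＸ１２３４５６"))  # 加微信vx<NUM>
```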

Next, the short text is vectorized with word2vec and processed as follows:

- the convolutional layer applies high-dimensional convolution and expansion to the text vectors;
- the pooling layer and fully connected layer activate the vectors of sensitive words;
- the SoftMax function computes the hit probability of each sensitive word.

The SoftMax function expression selected here is as follows:

where x denotes the word vector.
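The SoftMax expression itself is not reproduced here. Assuming the standard form softmax(x)_i = exp(x_i) / Σ_j exp(x_j), the Text CNN scoring pipeline described above can be sketched in plain Python. Two parts of the sketch are assumptions:

- the toy embedding table stands in for word2vec vectors;
- the single-filter, two-class setup is hypothetical.

A production model would use a deep learning framework and trained weights.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(x: Sequence[float]) -> List[float]:
    """softmax(x)_i = exp(x_i) / sum_j exp(x_j), shifted by max(x) for stability."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def textcnn_forward(tokens: Sequence[str],
                    embeddings: Dict[str, List[float]],
                    filters: Sequence[Tuple[int, List[List[float]]]],
                    fc_weights: Sequence[Sequence[float]],
                    fc_bias: Sequence[float]) -> List[float]:
    """One forward pass: embed -> convolve -> ReLU -> max-pool -> dense -> softmax."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]  # skip unknown words
    pooled = []
    for width, kernel in filters:  # kernel is a width x dim matrix
        scores = []
        for start in range(len(vecs) - width + 1):
            s = sum(kernel[i][d] * vecs[start + i][d]
                    for i in range(width) for d in range(len(kernel[i])))
            scores.append(max(0.0, s))  # ReLU
        pooled.append(max(scores) if scores else 0.0)  # max-over-time pooling
    logits = [sum(w * p for w, p in zip(row, pooled)) + b
              for row, b in zip(fc_weights, fc_bias)]
    return softmax(logits)


# Toy 2-d embeddings and one width-2 filter; class 0 = "sensitive" (hypothetical).
emb = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
probs = textcnn_forward(["a", "b", "a"], emb, [(2, [[1.0, 0.0], [0.0, 1.0]])],
                        fc_weights=[[1.0], [-1.0]], fc_bias=[0.0, 0.0])
print([round(p, 3) for p in probs])  # [0.982, 0.018]
```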

Finally, the text messages in the text message information set X are input one by one into the sensitive word variant recognition model 60, which determines whether all text messages in X are compliant. If a text message is judged non-compliant, it is referred for manual review or an early warning is issued. If it is judged compliant, it proceeds to the subsequent determination steps.

Construct the bad picture classification model 80 and the feature extraction model 70;

Perform feature extraction on the picture message information set Z to obtain the picture depth feature set z. Preferably, the raw picture feature extraction methods used in the present invention are LBP, HOG and SIFT. Other similar feature extraction algorithms may be substituted without much effect on the final performance of the bad picture classification model 80.
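LBP (local binary pattern) is one of the raw features named above. A basic 8-neighbour LBP histogram can be computed as follows. This textbook variant uses no uniform patterns and no rotation invariance, and takes a grayscale image given as nested lists. It only illustrates the technique; it is not the patent's exact configuration.

```python
from typing import List, Sequence

# 8 neighbours, clockwise from the top-left; bit k is set for OFFSETS[k].
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_histogram(img: Sequence[Sequence[int]]) -> List[int]:
    """256-bin histogram of basic LBP codes over the interior pixels of img."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            center = img[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(OFFSETS):
                if img[r + dr][c + dc] >= center:  # neighbour >= centre -> 1
                    code |= 1 << bit
            hist[code] += 1
    return hist


flat = [[5] * 4 for _ in range(4)]
print(lbp_histogram(flat)[255])  # 4: every interior pixel has all neighbours >= centre
```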

Input the pictures in the picture depth feature set z one by one into the bad picture classification model 80, which determines whether all pictures in z are compliant. A picture is judged non-compliant if the picture, or any piece of feature information in it, is judged non-compliant. A picture is judged compliant only if the picture and every piece of feature information in it are judged compliant.
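The picture-level rule above is an "any flagged feature fails the picture" check. Below is a minimal sketch, assuming a hypothetical per-feature predicate `feature_ok` that stands in for the bad picture classification model 80.

```python
from typing import Callable, Iterable, Sequence


def picture_compliant(features: Iterable[object],
                      feature_ok: Callable[[object], bool]) -> bool:
    """Fails as soon as any single feature is flagged (all() short-circuits)."""
    return all(feature_ok(f) for f in features)


def all_pictures_compliant(pictures: Sequence[Iterable[object]],
                           feature_ok: Callable[[object], bool]) -> bool:
    """True only if every picture in z passes."""
    return all(picture_compliant(feats, feature_ok) for feats in pictures)


def ok(score: float) -> bool:
    """Hypothetical per-feature risk check with a 0.5 threshold."""
    return score < 0.5


print(picture_compliant([0.1, 0.2], ok), picture_compliant([0.1, 0.9], ok))  # True False
```

Under this rule, a picture with no extracted features passes, because `all()` over an empty sequence is `True`. A deployment might prefer to treat that case as suspicious.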

This video summary method for 5G rich media information works as follows:

- it obtains the video message information set Y in a 5G rich media message and builds a training set based on it;
- it constructs the video summary model 40, which includes a temporal decoder, a perceptron and a Transformer module connected in sequence;
- it trains the video summary model 40 with the training set to obtain a trained video summary model 40;
- it inputs the video to be identified into the trained video summary model 40 to obtain the video sampling picture set y' of that video.

This solves the problem in the prior art that video content is difficult and slow to recognize, which makes it unsuitable for high-concurrency short message sending scenarios.

Figures 2 and 3 are architecture diagrams of an embodiment of the video summary system for 5G rich media information according to the present invention. As shown in Figures 2 and 3, the video summary system for 5G rich media information provided by an embodiment of the present invention includes:

the acquisition module 10, configured to acquire the video message information set Y in the 5G rich media message and build a training set based on the video message information set Y;

the construction module 20, configured to construct the video summary model 40, wherein the video summary model 40 includes a temporal decoder, a perceptron and a Transformer module connected in sequence;

the videos in the video message information set Y are time-sequenced by the temporal decoder;

the training module 30, configured to train the video summary model 40 with the training set to obtain a trained video summary model 40;

the video to be identified is input into the trained video summary model 40 to obtain the video sampling picture set y' of the video to be identified.

The acquisition module 10 further uses:

the feature extraction model 70, which performs feature extraction on the picture message information set Z to obtain the picture depth feature set z;

the bad picture classification model 80, into which the pictures in the picture depth feature set z are input one by one to determine whether all pictures in z are compliant;

the speech-to-text model 50, which converts the video message information set Y into the video text set y;

the sensitive word variant recognition model 60, into which the text messages in the text message information set X are input one by one to determine whether all text messages in X are compliant.

The video summary system for 5G rich media information of the present invention works as follows:

- the acquisition module 10 obtains the video message information set Y in the 5G rich media message and builds a training set based on it;
- the construction module 20 constructs the video summary model 40, which includes a temporal decoder, a perceptron and a Transformer module connected in sequence;
- the training module 30 trains the video summary model 40 with the training set to obtain a trained video summary model 40;
- the video to be identified is input into the trained video summary model 40 to obtain its video sampling picture set y'.

The system thus solves the problem in the prior art that video content is difficult and slow to recognize, which makes it unsuitable for high-concurrency short message sending scenarios.

Figure 5 is a schematic diagram of the physical structure of an electronic device provided by an embodiment of the present invention. As shown in Figure 5, the electronic device 90 includes a processor 901, a memory 902 and a bus 903;

the processor 901 and the memory 902 communicate with each other via the bus 903;

the processor 901 is configured to call program instructions in the memory 902 to execute the methods provided by the above method embodiments, for example:

- obtaining the video message information set Y in a 5G rich media message and building a training set based on it;
- constructing the video summary model 40, which includes a temporal decoder, a perceptron and a Transformer module connected in sequence;
- training the video summary model 40 with the training set to obtain a trained video summary model 40;
- inputting the video to be identified into the trained video summary model 40 to obtain the video sampling picture set y' of that video.

This embodiment provides a non-transitory computer-readable medium storing computer instructions. The instructions cause a computer to execute the methods provided by the above method embodiments, for example:

- obtaining the video message information set Y in a 5G rich media message and building a training set based on it;
- constructing the video summary model 40, which includes a temporal decoder, a perceptron and a Transformer module connected in sequence;
- training the video summary model 40 with the training set to obtain a trained video summary model 40;
- inputting the video to be identified into the trained video summary model 40 to obtain the video sampling picture set y' of that video.

Those of ordinary skill in the art can understand that all or some of the steps of the above method embodiments can be completed by hardware instructed by a program. The program can be stored in a computer-readable medium and, when executed, performs the steps of the above method embodiments. The medium includes ROM, RAM, magnetic disks, optical disks and other media that can store program code.

The device embodiments described above are only illustrative. Units described as separate components may or may not be physically separated, and components shown as units may or may not be physical units. That is, they may be located in one place or distributed over multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment. Persons of ordinary skill in the art can understand and implement this without creative effort.

From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, or by hardware. On this understanding, the above technical solutions, in essence or in the part that contributes to the prior art, can be embodied as a software product. The computer software product can be stored in a computer-readable medium such as ROM/RAM, a magnetic disk or an optical disk. It includes instructions that cause a computer device (a personal computer, a server, a network device, etc.) to execute the methods of the embodiments or of certain parts of them.

The present invention has been described in detail above with general descriptions and specific embodiments. It will be obvious to those skilled in the art that modifications or improvements can be made on this basis. Such modifications or improvements made without departing from the spirit of the present invention therefore fall within the scope of protection claimed by the present invention.

Claims

1.一种5G富媒体信息的视频摘要方法，其特征在于，所述方法具体包括：1. A video summary method for 5G rich media information, characterized in that the method specifically includes:

从所述分割序列集合每个分割序列中随机抽取n张图片，形成待识别视频的视频抽样图片集合y’；randomly selecting n pictures from each segmented sequence in the segmented sequence set to form a video sampling picture set y' of the video to be identified;
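The sampling step above draws n pictures at random from every segmented sequence and pools them into y'. The sketch below illustrates that step and rests on several assumptions the claim does not state. It assumes each segment is a Python list of frames, that sampling is without replacement, that the picked frames keep their temporal order within a segment, and that a segment shorter than n contributes all of its frames. The function name `sample_frames` and the `seed` parameter are hypothetical.

```python
import random
from typing import List, Optional, Sequence, TypeVar

Frame = TypeVar("Frame")


def sample_frames(
    segments: Sequence[Sequence[Frame]],
    n: int,
    seed: Optional[int] = None,
) -> List[Frame]:
    """Hypothetical sketch: draw n random frames from each segment and pool them into y'."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    rng = random.Random(seed)  # seeded generator so runs are reproducible
    y_prime: List[Frame] = []
    for segment in segments:
        # Assumption: a segment shorter than n contributes all of its frames.
        k = min(n, len(segment))
        # Sample indices without replacement, then sort them so the picked
        # frames stay in their original temporal order (an assumption).
        picks = sorted(rng.sample(range(len(segment)), k))
        y_prime.extend(segment[i] for i in picks)
    return y_prime
```

For example, `sample_frames([frames_a, frames_b], n=3, seed=0)` returns six frames: three from `frames_a` followed by three from `frames_b`. Sampling indices rather than frames keeps the function independent of the frame type, whether image arrays, file paths, or timestamps.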

2.根据权利要求1所述5G富媒体信息的视频摘要方法，其特征在于，所述5G富媒体信息的视频摘要方法还包括：2. The video summary method of 5G rich media information according to claim 1, characterized in that the video summary method of 5G rich media information further includes:

3.根据权利要求1所述5G富媒体信息的视频摘要方法，其特征在于，所述5G富媒体信息的视频摘要方法还包括：3. The video summary method of 5G rich media information according to claim 1, characterized in that the video summary method of 5G rich media information further includes:

构建语音转文本模型；constructing a speech-to-text model;

4.根据权利要求3所述5G富媒体信息的视频摘要方法，其特征在于，所述5G富媒体信息的视频摘要方法还包括：4. The video summary method of 5G rich media information according to claim 3, characterized in that the video summary method of 5G rich media information further includes:

5.根据权利要求4所述5G富媒体信息的视频摘要方法，其特征在于，所述5G富媒体信息的视频摘要方法还包括：5. The video summary method of 5G rich media information according to claim 4, characterized in that the video summary method of 5G rich media information further includes:

6.根据权利要求1所述5G富媒体信息的视频摘要方法，其特征在于，所述通过所述训练集对所述视频摘要模型进行训练，得到训练好的视频摘要模型，包括：6. The video summary method of 5G rich media information according to claim 1, characterized in that training the video summary model with the training set to obtain the trained video summary model comprises:

基于所述验证集对所述视频摘要模型进行性能验证，保存满足性能条件的视频摘要模型；validating the performance of the video summary model on the validation set, and saving the video summary model that meets the performance condition;
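The validation step above scores models on the validation set and keeps those that meet a performance condition. The claim does not define that condition. The sketch below assumes it is a single scalar validation score compared against a threshold. It also assumes that candidate models are supplied as named checkpoints and that persistence is delegated to a caller-supplied `save` callback. The names `keep_qualified_models`, `evaluate`, `threshold`, and `save` are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple


def keep_qualified_models(
    checkpoints: Dict[str, Any],
    evaluate: Callable[[Any], float],
    threshold: float,
    save: Callable[[str, Any], None],
) -> List[Tuple[str, float]]:
    """Hypothetical sketch: score each checkpoint on the validation set and
    save those whose score meets the (assumed) threshold condition."""
    kept: List[Tuple[str, float]] = []
    for name, model in checkpoints.items():
        score = evaluate(model)  # e.g. a validation metric; higher is better (assumption)
        if score >= threshold:
            save(name, model)  # persist only models meeting the performance condition
            kept.append((name, score))
    # Report the qualifying checkpoints best-first.
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept
```

Passing `evaluate` and `save` as callables keeps the sketch independent of any particular training framework or storage format.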

7.一种5G富媒体信息的视频摘要系统，其特征在于，包括：7. A video summary system for 5G rich media information, characterized by comprising:

所述视频摘要模型用于：wherein the video summary model is configured to:

8.一种电子设备，包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序，其特征在于，所述处理器执行所述计算机程序时实现如权利要求1至6中的任一项所述的方法的步骤。8. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 6.

9.一种非暂态计算机可读介质，其上存储有计算机程序，其特征在于，所述计算机程序被处理器执行时实现如权利要求1至6中的任一项所述的方法的步骤。9. A non-transitory computer-readable medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.