CN109005451A


Info

Publication number: CN109005451A
Application number: CN201810701351.3A
Authority: CN
Inventors: 倪攀; 姜子琛; 彭梅; 刘睿; 刘宜飞
Original assignee: Hangzhou Star Technology Co Ltd
Current assignee: Hangzhou Star Technology Co Ltd
Priority date: 2018-06-29
Filing date: 2018-06-29
Publication date: 2018-12-14
Anticipated expiration: 2038-06-29
Also published as: CN109005451B

Abstract

The invention discloses a deep-learning-based video splitting method comprising the following steps. Step 1: initialize the video data. Step 2: perform face detection using face recognition technology, and take the time segments in which similar faces appear continuously as candidate splitting segments. Step 3: extract sound features from the candidate splitting segments. Step 4: refine the splitting time points of the candidate splitting segments using sound recognition technology and the sound features, to obtain the final splitting time points. The invention uses deep learning algorithms to recognize two features, faces and sound, which improves splitting accuracy, and it can perform face and sound recognition on multiple video clips simultaneously, which makes it very fast. In addition, the deep learning algorithms split videos intelligently, reducing manual labor.

Description


Video splitting method based on deep learning

Technical field

The present invention relates to the technical field of media asset management, and more specifically to a deep-learning-based video splitting method.

Background art

With the digitization, networking and informatization of the entire TV production workflow and the continuing development of TV programming, a large volume of multimedia data has accumulated. Because these massive multimedia resources could not be deeply exploited, and because China's regulatory requirements for TV programs keep rising, video splitting (stripping) technology emerged. Meanwhile, the growth of the Internet has caused an explosion in the amount of video material: live streams, short videos, Internet TV programs, and mobile multimedia are often not broadcast as complete programs but must be split or condensed into short videos. As users' demand for fragmented Internet content grows, splitting is used more and more widely in new media.

The traditional splitting method is manual: an operator previews the video frame by frame and cuts it by hand, which demands a great deal of labor and is very inefficient. Existing cloud-based splitting methods are more efficient than the traditional approach and offer clear advantages in the timeliness of content output and in software cost, but they still require substantial manual input and do not free people from large amounts of low-value repetitive work.

Summary of the invention

In view of this, the present invention provides a deep-learning-based video splitting method that reduces the manual labor involved in splitting, addressing the problem that the prior art requires a large amount of manual input.

The invention provides a deep-learning-based video splitting method comprising the following steps:

Step 1: initialize the video data;

Step 2: detect faces using face recognition technology, and take the time segments in which similar faces appear continuously as candidate splitting segments;

Step 3: extract sound features from the candidate splitting segments;

Step 4: refine the splitting time points of the candidate splitting segments using sound recognition technology and the sound features, to obtain the final splitting time points.
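Step 3 does not say which sound features are extracted. As a hedged stand-in, the sketch below computes two classic frame-level descriptors, short-time RMS energy and zero-crossing rate, over a mono waveform; a deep-learning system would more likely use a learned audio embedding instead. The function name `frame_features` and the 25 ms window / 10 ms hop are assumptions, not taken from the patent.

```python
import math
from typing import List, Sequence, Tuple


def frame_features(samples: Sequence[float], sample_rate: int,
                   win_s: float = 0.025,
                   hop_s: float = 0.010) -> List[Tuple[float, float, float]]:
    """Hypothetical Step 3 stand-in: per-window (start_time_s, rms, zcr).

    samples: mono waveform in [-1, 1]; windows that would run past the end are dropped.
    """
    win = max(2, round(win_s * sample_rate))   # samples per analysis window
    hop = max(1, round(hop_s * sample_rate))   # samples between window starts
    feats = []
    for start in range(0, len(samples) - win + 1, hop):
        w = samples[start:start + win]
        # Short-time energy: root mean square of the window.
        rms = math.sqrt(sum(x * x for x in w) / win)
        # Zero-crossing rate: fraction of adjacent sample pairs whose sign differs.
        crossings = sum(1 for a, b in zip(w, w[1:]) if (a >= 0) != (b >= 0))
        feats.append((start / sample_rate, rms, crossings / (win - 1)))
    return feats
```

Windows with near-zero RMS energy, for example, mark pauses, which are natural candidates for a cut.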

Optionally, the video data initialization in Step 1 includes acquiring the audio waveform data and the image data from the video data.
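Step 1 only says that the audio waveform and image data are obtained from the video. One common way to do this, assumed here rather than stated in the patent, is to have the ffmpeg command-line tool write a mono 16-bit PCM track and a sequence of frames sampled at a fixed rate. The hypothetical helper `frame_to_audio_span` then maps a sampled frame to the audio samples it covers, which is what lets face-based boundaries (found on frames) later be refined with sound (measured on samples). The 16 kHz sample rate and 5 fps frame rate are illustrative choices.

```python
import os
import subprocess
from typing import List, Tuple


def init_commands(video: str, audio_out: str, frame_pattern: str,
                  sample_rate: int = 16000, fps: float = 5.0) -> List[List[str]]:
    """Build ffmpeg commands that extract a mono PCM waveform and sampled frames."""
    audio_cmd = ["ffmpeg", "-y", "-i", video,
                 "-vn",                      # drop the video stream
                 "-ac", "1",                 # downmix to mono
                 "-ar", str(sample_rate),    # resample the audio
                 "-f", "s16le", audio_out]   # raw signed 16-bit little-endian PCM
    frame_cmd = ["ffmpeg", "-y", "-i", video,
                 "-vf", f"fps={fps:g}",      # keep `fps` frames per second
                 frame_pattern]              # e.g. frames/%06d.jpg (numbered from 1)
    return [audio_cmd, frame_cmd]


def frame_to_audio_span(frame_idx: int, fps: float, sample_rate: int) -> Tuple[int, int]:
    """Approximate sample range [start, end) covered by the 0-based frame_idx-th frame."""
    start = round(frame_idx * sample_rate / fps)
    end = round((frame_idx + 1) * sample_rate / fps)
    return start, end


def run_init(video: str, out_dir: str = "work") -> None:
    """Run the extraction; requires ffmpeg on PATH."""
    frames_dir = os.path.join(out_dir, "frames")
    os.makedirs(frames_dir, exist_ok=True)   # ffmpeg does not create output directories
    for cmd in init_commands(video, os.path.join(out_dir, "audio.raw"),
                             os.path.join(frames_dir, "%06d.jpg")):
        subprocess.run(cmd, check=True)
```

Because ffmpeg numbers image files from 1, file N corresponds to `frame_idx = N - 1` in this sketch.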

Optionally, the face recognition technology in Step 2 includes using a deep learning algorithm to encode faces and comparing the similarity of the faces in the image frames of the video data.
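The patent does not specify how face codes are compared. A common choice, assumed here, is cosine similarity between face embeddings; the minimal sketch below computes it for one pair and for every pair of consecutive frames. The embeddings are taken as given (produced by whatever face-encoding model is used), and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two face embeddings (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction; treat it as dissimilar to everything.
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def adjacent_similarities(embeddings: List[Sequence[float]]) -> List[float]:
    """Similarity of each frame's face embedding to the previous frame's."""
    return [cosine_similarity(p, q) for p, q in zip(embeddings, embeddings[1:])]
```

A run of high values suggests the same person stays on screen; a sharp drop marks a candidate shot or speaker change.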

Optionally, the sound recognition technology in Step 4 includes using a deep learning algorithm to search, within a certain range before and after the splitting time point of a candidate splitting segment, for sounds whose features are similar to the extracted sound features.
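The patent leaves open what the nearby sound is matched against. One reading, assumed here, is that a reference sound feature (for example, a vector extracted in Step 3 from a characteristic sound such as a program jingle or a speaker's voice) is searched for within a fixed radius of the face-based split point, and the split point moves to the best match only if it is similar enough. The names, the 2 s radius, and the 0.9 threshold are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def refine_split_point(t0: float, reference: Sequence[float],
                       timeline: List[Tuple[float, Sequence[float]]],
                       radius_s: float = 2.0, min_sim: float = 0.9) -> float:
    """Move candidate split time t0 to the most similar sound within +/- radius_s.

    timeline: (time_s, sound_feature) pairs, e.g. one per analysis window.
    Returns t0 unchanged if nothing in the search window reaches min_sim.
    """
    best_t, best_sim = t0, -1.0
    for t, feat in timeline:
        if abs(t - t0) <= radius_s:          # only search near the candidate point
            sim = _cosine(reference, feat)
            if sim > best_sim:
                best_t, best_sim = t, sim
    return best_t if best_sim >= min_sim else t0
```

Keeping `t0` when no match clears the threshold means the sound step can only sharpen a face-based boundary, never discard it.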

Optionally, encoding faces with a deep learning algorithm includes:

training a deep neural network model so that it can extract features from an input face;

inputting the image data of the video data into the deep neural network model to extract high-dimensional face features from the image data;

encoding, i.e., mapping the high-dimensional face features to low-dimensional vectors;

determining, from the low-dimensional vectors, whether faces in the video data are similar or different.
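The encoding step maps a high-dimensional face feature to a compact vector in which distance reflects identity, in the spirit of embedding approaches such as FaceNet (the patent names no specific network). The sketch below uses a fixed linear projection as a hypothetical stand-in for the trained network's final layer, L2-normalizes the result, and treats two faces as the same person when the Euclidean distance between their codes is at most a threshold. The projection matrix and the 0.6 threshold are illustrative assumptions.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def encode(features: Sequence[float], projection: Matrix) -> List[float]:
    """Map a high-dimensional face feature to a unit-length low-dimensional code."""
    # Linear projection: one output dimension per row of the (hypothetical) matrix.
    code = [sum(w * f for w, f in zip(row, features)) for row in projection]
    norm = math.sqrt(sum(c * c for c in code))
    return [c / norm for c in code] if norm > 0 else code


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_same_face(feat_a: Sequence[float], feat_b: Sequence[float],
                 projection: Matrix, max_dist: float = 0.6) -> bool:
    """Same person if the encoded faces lie within max_dist of each other."""
    return euclidean(encode(feat_a, projection), encode(feat_b, projection)) <= max_dist
```

For unit-length codes, $\lVert a-b\rVert^2 = 2 - 2\cos\theta$, so a distance threshold is equivalent to a cosine-similarity threshold.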

Compared with the prior art, the present invention has the following advantages: it uses deep learning algorithms to recognize two features, faces and sound, which improves splitting accuracy, and it can perform face and sound recognition on multiple video clips simultaneously, which is very fast. In addition, the deep learning algorithms split videos intelligently, reducing manual labor.

Description of drawings

Fig. 1 is a flowchart of the deep-learning-based video splitting method of the present invention.

Detailed description of the embodiments

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, but the invention is not limited to these embodiments. The invention covers any alternatives, modifications, equivalent methods, and schemes made within its spirit and scope.

To give the public a thorough understanding of the invention, specific details are set forth in the following preferred embodiments; however, those skilled in the art can fully understand the invention without these details.

In the following paragraphs the invention is described more specifically, by way of example, with reference to the accompanying drawings. Note that the drawings are in simplified form and are not drawn to precise scale; they serve only to conveniently and clearly illustrate the embodiments of the invention.

The present invention provides a deep-learning-based video splitting method which, as shown in Fig. 1, comprises the following steps:

Step 1: initialize the video data;

The video data initialization in Step 1 includes acquiring the audio waveform data and the image data from the video data.

The face recognition technology in Step 2 includes using a deep learning algorithm to encode faces and comparing the similarity of the faces in each image frame of the video data. Each continuous time segment in which similar faces appear is treated as one splitting segment, so multiple splitting segments can be obtained.
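Grouping consecutive frames into candidate segments can be sketched as a single pass over per-frame face embeddings. Assumptions, not from the patent: each sampled frame carries at most one (dominant) face embedding, or `None` when no face is detected; a frame joins the current run if its cosine similarity to the run's first face is at least a threshold; and runs shorter than `min_len_s` can be discarded. All names and defaults are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Frame = Tuple[float, Optional[Sequence[float]]]  # (timestamp_s, face embedding or None)


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def face_segments(frames: List[Frame], threshold: float = 0.8,
                  min_len_s: float = 0.0) -> List[Tuple[float, float]]:
    """Return (first_ts, last_ts) of each run of consecutive frames showing a similar face."""
    segments: List[Tuple[float, float]] = []
    ref = None              # embedding of the first frame in the current run
    start = last = 0.0      # timestamps of the run's first and latest frames

    def close_run() -> None:
        if ref is not None and last - start >= min_len_s:
            segments.append((start, last))

    for t, emb in frames:
        if emb is not None and ref is not None and _cosine(ref, emb) >= threshold:
            last = t        # same face continues: extend the run
            continue
        close_run()         # face changed or disappeared: end the current run
        if emb is not None:
            ref, start, last = emb, t, t    # start a new run at this frame
        else:
            ref = None
    close_run()
    return segments
```

Comparing against the run's first face, rather than the previous frame, keeps slow drift from chaining different people into one run. Each segment ends at its last matching frame's timestamp, so the true end lies up to one frame interval later.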

The sound recognition technology in Step 4 includes using a deep learning algorithm to search, within a certain range before and after the splitting time point of a candidate splitting segment, for sounds whose features are similar to the extracted sound features.

Encoding faces with a deep learning algorithm includes:

mapping the information of multiple face images to low-dimensional vectors, from which the model can tell whether two faces are similar or the same.

In practice, the video can first be analyzed and processed with a distributed algorithm: it is divided into several segments at a granularity of a specified number of seconds (for example, 10 seconds). These segments are then assigned to available servers, which perform face and sound detection in parallel. This is very fast and allows short videos to be produced within seconds.
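The distributed step can be sketched by cutting the timeline into fixed-length chunks (10 s, as in the example) and analyzing them concurrently. Here a local thread pool from Python's standard `concurrent.futures` module stands in for the pool of servers; in a real deployment each chunk would be dispatched to a worker machine. `analyze` is a hypothetical callback that would run face and sound detection on one chunk.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple, TypeVar

R = TypeVar("R")


def make_chunks(duration_s: float, chunk_s: float = 10.0) -> List[Tuple[float, float]]:
    """Split [0, duration_s) into consecutive chunks of at most chunk_s seconds."""
    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)   # last chunk may be shorter
        chunks.append((start, end))
        start = end
    return chunks


def process_in_parallel(duration_s: float, analyze: Callable[[float, float], R],
                        chunk_s: float = 10.0, workers: int = 4) -> List[R]:
    """Run analyze(start, end) on every chunk concurrently; results keep chunk order."""
    chunks = make_chunks(duration_s, chunk_s)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda c: analyze(*c), chunks))
```

Because chunks are cut at fixed times, a face run that crosses a chunk boundary comes back as two pieces; a real system would need to stitch such runs together, which this sketch does not attempt. For CPU-bound work, a `ProcessPoolExecutor` would be the closer local analogue, but the lambda wrapper would then have to become a picklable module-level function.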

The embodiments described above do not limit the scope of protection of this technical solution. Any modification, equivalent replacement, or improvement made within the spirit and principles of the above embodiments shall fall within the scope of protection of this technical solution.