CN109005451A - Video stripping method based on deep learning - Google Patents

Video stripping method based on deep learning

Info

Publication number
CN109005451A
Authority
CN
China
Prior art keywords
stripping
face
deep learning
video
segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810701351.3A
Other languages
Chinese (zh)
Other versions
CN109005451B (en)
Inventor
倪攀
姜子琛
彭梅
刘睿
刘宜飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Star Technology Co Ltd
Original Assignee
Hangzhou Star Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Star Technology Co Ltd
Priority to CN201810701351.3A
Publication of CN109005451A
Application granted
Publication of CN109005451B
Status: Active
Anticipated expiration

Abstract

The invention discloses a video stripping method based on deep learning, comprising the following steps. Step 1: initialize the video data. Step 2: perform face detection using face recognition technology, and take the time segments in which similar faces appear continuously as candidate stripping segments. Step 3: extract sound features from the candidate stripping segments. Step 4: refine the stripping time points of the candidate stripping segments using sound recognition technology and the sound features, to obtain the final stripping time points. The invention uses deep learning algorithms to recognize two features, face and sound, which improves the accuracy of stripping. Face and sound recognition can be run on multiple video segments at the same time, so processing is fast. In addition, the deep learning algorithms can strip video intelligently, reducing the manpower required.

Description

Translated from Chinese
Video stripping method based on deep learning

Technical field

The present invention relates to the technical field of media asset management, and more specifically to a video stripping method based on deep learning.

Background

As the whole process of television program production has become digitized, networked and information-based, and as television programming has continued to develop, a large amount of multimedia data has accumulated. Stripping technology emerged in response to two pressures: massive multimedia resources could not be developed and used in depth, and China's regulatory requirements for television programs kept rising. Meanwhile, the continuing growth of the Internet has caused the volume of video material to explode. Live streams, short videos, online television programs and mobile multimedia are often not broadcast as complete programs but need to be split or condensed into short clips. As users' demand for fragmented Internet content keeps increasing, stripping is also being applied more and more widely in new media.

The traditional approach is manual stripping: an operator previews the video frame by frame and cuts it by hand, which requires a great deal of labor and is very inefficient. The existing technology is a stripping method based on a cloud architecture. It is more efficient than the traditional approach and has clear advantages in the timeliness of content output and in software cost, but it still requires a great deal of labor and does not free staff from large amounts of low-value, repetitive work.

Summary of the invention

In view of this, the present invention provides a video stripping method based on deep learning that reduces the manpower needed for stripping work, in order to solve the problem in the prior art that a large amount of manpower is required.

The present invention provides a video stripping method based on deep learning, comprising the following steps:

Step 1: initialize the video data;

Step 2: perform face detection using face recognition technology, and take the time segments in which similar faces appear continuously as candidate stripping segments;

Step 3: extract sound features from the candidate stripping segments;

Step 4: refine the stripping time points of the candidate stripping segments using sound recognition technology and the sound features, to obtain the final stripping time points.
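The data flow of the four steps can be sketched as a small driver function. This is only an illustration of how the steps chain together: the `VideoData` container, the `strip_video` function and the three stage callables are hypothetical names that do not appear in the patent, and the actual face and sound models are left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds)


@dataclass
class VideoData:
    """Result of step 1: sampled image frames plus the audio waveform."""
    frames: Sequence[object]   # one entry per sampled image frame
    audio: Sequence[float]     # mono waveform samples
    fps: float                 # sampled frames per second
    sample_rate: int           # audio samples per second


def strip_video(
    video: VideoData,
    find_face_segments: Callable[[VideoData], List[Segment]],
    extract_voice: Callable[[VideoData, Segment], object],
    refine_segment: Callable[[VideoData, Segment, object], Segment],
) -> List[Segment]:
    """Run steps 2 to 4 on video data that step 1 has already prepared."""
    results = []
    for candidate in find_face_segments(video):                   # step 2
        voice = extract_voice(video, candidate)                   # step 3
        results.append(refine_segment(video, candidate, voice))   # step 4
    return results
```

Passing the stages in as callables keeps the sketch independent of any particular face or speaker model.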

Optionally, the video data initialization in step 1 includes obtaining the audio waveform data and the image data from the video data.

Optionally, the face recognition technology in step 2 includes: encoding faces with a deep learning algorithm and comparing the similarity of the faces across the image frames of the video data.

Optionally, the sound recognition technology in step 4 includes: using a deep learning algorithm to search, within a certain range before and after each stripping time point of a candidate stripping segment, for sound whose features are similar to the extracted sound features.

Optionally, the process of encoding faces with a deep learning algorithm includes:

training a deep neural network model so that it can extract features from an input face;

feeding the image data of the video data into the deep neural network model and extracting high-dimensional face features from the image data;

encoding, that is, mapping the high-dimensional face features to low-dimensional vectors;

deciding, from the low-dimensional vectors, whether faces in the video data are similar or different.
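The last item, deciding similarity from low-dimensional vectors, can be illustrated with a distance test. The patent does not specify a metric or a threshold, so the Euclidean distance on L2-normalized vectors and the value `0.6` below are assumptions chosen only for the sketch; `same_face` is a hypothetical helper name.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so distances are comparable."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def same_face(a: Sequence[float], b: Sequence[float], threshold: float = 0.6) -> bool:
    """Treat two face encodings as the same person if they lie close together.

    The threshold is an assumed value, not one taken from the patent.
    """
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    na, nb = l2_normalize(a), l2_normalize(b)
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(na, nb)))
    return dist < threshold
```

In a real system the vectors would come from the trained network, and the threshold would be tuned on labeled face pairs.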

Compared with the prior art, the present invention has the following advantages. It uses deep learning algorithms to recognize two features, face and sound, which improves the accuracy of stripping, and it can run face and sound recognition on multiple video segments at the same time, which makes it fast. In addition, the deep learning algorithms can strip video intelligently, reducing the manpower required.

Description of drawings

Fig. 1 is a flowchart of the video stripping method based on deep learning of the present invention.

Detailed description

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawing, but the present invention is not limited to these embodiments. The present invention covers any alternatives, modifications, equivalent methods and schemes made within its spirit and scope.

To give the public a thorough understanding of the present invention, specific details are set out in the following preferred embodiments; those skilled in the art can fully understand the present invention even without the description of these details.

The following paragraphs describe the invention more specifically, by way of example, with reference to the accompanying drawing. Note that the drawing is in simplified form and not to exact scale; it serves only to illustrate the embodiments of the present invention conveniently and clearly.

The present invention provides a video stripping method based on deep learning which, as shown in Fig. 1, comprises the following steps:

Step 1: initialize the video data;

Step 2: perform face detection using face recognition technology, and take the time segments in which similar faces appear continuously as candidate stripping segments;

Step 3: extract sound features from the candidate stripping segments;

Step 4: refine the stripping time points of the candidate stripping segments using sound recognition technology and the sound features, to obtain the final stripping time points.

The video data initialization in step 1 includes obtaining the audio waveform data and the image data from the video data.
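One common way to obtain the audio waveform and the image frames is the `ffmpeg` command-line tool. The patent does not name any tool, so using `ffmpeg` is an assumption, as are the helper names, the 16 kHz mono audio format and the one-frame-per-second sampling rate. The functions below only build the argument lists; they do not run anything.

```python
from typing import List


def audio_extract_cmd(video_path: str, wav_path: str, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg command that writes the audio track as mono WAV.

    -vn drops the video stream, -ac 1 mixes down to one channel,
    -ar sets the audio sample rate.
    """
    return ["ffmpeg", "-i", video_path, "-vn", "-ac", "1",
            "-ar", str(sample_rate), wav_path]


def frame_extract_cmd(video_path: str, frame_pattern: str, fps: float = 1.0) -> List[str]:
    """Build an ffmpeg command that samples image frames at a fixed rate.

    The fps video filter selects how many frames per second are written;
    frame_pattern is a numbered output pattern such as frames/%06d.jpg.
    """
    return ["ffmpeg", "-i", video_path, "-vf", f"fps={fps}", frame_pattern]
```

Each list can be handed to `subprocess.run(cmd, check=True)` on a machine where `ffmpeg` is installed.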

The face recognition technology in step 2 includes: encoding faces with a deep learning algorithm and comparing the similarity of the faces across the image frames of the video data. Each continuous time segment in which similar faces appear is treated as one stripping segment, so multiple stripping segments can be obtained.
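Grouping consecutive frames with similar faces into segments can be sketched as a single pass over per-frame encodings. The sketch assumes that each sampled frame has already been reduced to one face vector, or `None` when no face was found, and that similarity is a Euclidean distance below an assumed threshold; none of these details are fixed by the patent, and `candidate_segments` is a hypothetical name.

```python
import math
from typing import List, Optional, Sequence, Tuple

Embedding = Sequence[float]


def distance(a: Embedding, b: Embedding) -> float:
    """Euclidean distance between two face encodings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def candidate_segments(
    embeddings: Sequence[Optional[Embedding]],
    fps: float,
    threshold: float = 0.6,
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) runs of consecutive frames showing a similar face."""
    segments = []
    start = None       # index of the first frame of the current run
    reference = None   # encoding of the face that opened the current run
    for i, emb in enumerate(embeddings):
        continues = (
            emb is not None
            and reference is not None
            and distance(emb, reference) < threshold
        )
        if continues:
            continue
        # The run (if any) ends here: either no face or a different face.
        if start is not None:
            segments.append((start / fps, i / fps))
        start, reference = (i, emb) if emb is not None else (None, None)
    if start is not None:
        segments.append((start / fps, len(embeddings) / fps))
    return segments
```

Comparing against the first face of the run is one simple choice; comparing against the previous frame or a running average would be equally plausible readings.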

The sound recognition technology in step 4 includes: using a deep learning algorithm to search, within a certain range before and after each stripping time point of a candidate stripping segment, for sound whose features are similar to the extracted sound features.
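The search around a stripping time point can be sketched for the end boundary of a segment. The sketch assumes the audio has been cut into fixed-length windows with one feature vector each, and it uses cosine similarity with an assumed minimum of `0.8`; the window length, search radius, metric and the name `refine_end` are all illustrative choices, not values from the patent.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def refine_end(
    window_features: Sequence[Sequence[float]],
    hop: float,
    reference: Sequence[float],
    t_end: float,
    radius: float = 2.0,
    min_sim: float = 0.8,
) -> float:
    """Move a candidate end time to the end of the last matching audio window.

    Only windows within `radius` seconds before or after `t_end` are checked.
    If none matches the reference voice, the candidate time is kept.
    """
    first = max(0, int((t_end - radius) / hop))
    last = min(len(window_features), int(math.ceil((t_end + radius) / hop)))
    refined = t_end
    for i in range(first, last):
        if cosine(window_features[i], reference) >= min_sim:
            refined = (i + 1) * hop   # end time of window i
    return refined
```

The start boundary could be handled symmetrically by taking the start time of the first matching window in the search range.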

The process of encoding faces with a deep learning algorithm includes:

training a deep neural network model so that it can extract features from an input face;

feeding the image data of the video data into the deep neural network model and extracting high-dimensional face features from the image data;

encoding, that is, mapping the high-dimensional face features to low-dimensional vectors;

deciding, from the low-dimensional vectors, whether faces in the video data are similar or different.

By mapping the information of multiple face images to low-dimensional vectors, the model can tell whether two faces are similar or the same.

In practice, a distributed algorithm can first be used to analyze and process the video, dividing it into segments at a granularity of a specified number of seconds (for example, 10 seconds). These segments are then assigned to the available servers, which run face and sound detection on them simultaneously. This is fast and enables short-video production within seconds.
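The split-then-dispatch idea can be sketched on a single machine. A local thread pool stands in for the "available servers", which is an assumption made so the example is self-contained; `split_into_chunks` and `analyze_in_parallel` are hypothetical names, and the per-chunk `analyze` callable represents the face and sound detection.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple, TypeVar

Chunk = Tuple[float, float]  # (start_seconds, end_seconds)
T = TypeVar("T")


def split_into_chunks(duration: float, chunk_seconds: float = 10.0) -> List[Chunk]:
    """Cut [0, duration) into consecutive chunks of at most chunk_seconds."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    chunks = []
    start = 0.0
    while start < duration:
        end = min(start + chunk_seconds, duration)
        chunks.append((start, end))
        start = end
    return chunks


def analyze_in_parallel(
    chunks: List[Chunk],
    analyze: Callable[[Chunk], T],
    max_workers: int = 4,
) -> List[T]:
    """Run `analyze` on every chunk concurrently, keeping chunk order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(analyze, chunks))
```

A face run that crosses a chunk boundary would presumably have to be merged after the per-chunk results come back; the patent text does not describe that step.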

The embodiments described above do not limit the scope of protection of this technical solution. Any modification, equivalent replacement or improvement made within the spirit and principles of the above embodiments shall fall within the scope of protection of this technical solution.

Claims (5)

CN201810701351.3A | 2018-06-29 | 2018-06-29 | Video stripping method based on deep learning | Active | CN109005451B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201810701351.3A CN109005451B (en) | 2018-06-29 | 2018-06-29 | Video stripping method based on deep learning

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201810701351.3A CN109005451B (en) | 2018-06-29 | 2018-06-29 | Video stripping method based on deep learning

Publications (2)

Publication Number | Publication Date
CN109005451A (en) | 2018-12-14
CN109005451B (en) | 2021-07-30

Family

ID=64601854

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201810701351.3A (Active, CN109005451B (en)) | Video stripping method based on deep learning | 2018-06-29 | 2018-06-29

Country Status (1)

Country | Link
CN (1) | CN109005451B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN110267061A (en) * | 2019-04-30 | 2019-09-20 | 新华智云科技有限公司 | A kind of news demolition method and system
CN111222499A (en) * | 2020-04-22 | 2020-06-02 | 成都索贝数码科技股份有限公司 | News automatic bar-splitting conditional random field algorithm prediction result back-flow training method
CN111586494A (en) * | 2020-04-30 | 2020-08-25 | 杭州慧川智能科技有限公司 | Intelligent strip splitting method based on audio and video separation
CN112565885A (en) * | 2020-11-30 | 2021-03-26 | 清华珠三角研究院 | Video segmentation method, system, device and storage medium
CN113810782A (en) * | 2020-06-12 | 2021-12-17 | 阿里巴巴集团控股有限公司 | Video processing method and device, server and electronic device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20070091203A1 (en) * | 2005-10-25 | 2007-04-26 | Peker Kadir A | Method and system for segmenting videos using face detection
CN101616264A (en) * | 2008-06-27 | 2009-12-30 | 中国科学院自动化研究所 | News Video Cataloging Method and System
WO2013097101A1 (en) * | 2011-12-28 | 2013-07-04 | 华为技术有限公司 | Method and device for analysing video file
CN103546667A (en) * | 2013-10-24 | 2014-01-29 | 中国科学院自动化研究所 | An automatic news stripping method for mass broadcast and television supervision
CN105931633A (en) * | 2016-05-30 | 2016-09-07 | 深圳市鼎盛智能科技有限公司 | Speech recognition method and system
CN106228142A (en) * | 2016-07-29 | 2016-12-14 | 西安电子科技大学 | Face verification method based on convolutional neural networks and Bayesian decision

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN110267061A (en) * | 2019-04-30 | 2019-09-20 | 新华智云科技有限公司 | A kind of news demolition method and system
CN111222499A (en) * | 2020-04-22 | 2020-06-02 | 成都索贝数码科技股份有限公司 | News automatic bar-splitting conditional random field algorithm prediction result back-flow training method
CN111222499B (en) * | 2020-04-22 | 2020-08-14 | 成都索贝数码科技股份有限公司 | News automatic bar-splitting conditional random field algorithm prediction result back-flow training method
CN111586494A (en) * | 2020-04-30 | 2020-08-25 | 杭州慧川智能科技有限公司 | Intelligent strip splitting method based on audio and video separation
CN111586494B (en) * | 2020-04-30 | 2022-03-11 | 腾讯科技(深圳)有限公司 | Intelligent strip splitting method based on audio and video separation
CN113810782A (en) * | 2020-06-12 | 2021-12-17 | 阿里巴巴集团控股有限公司 | Video processing method and device, server and electronic device
CN113810782B (en) * | 2020-06-12 | 2022-09-27 | 阿里巴巴集团控股有限公司 | Video processing method and device, server and electronic device
CN112565885A (en) * | 2020-11-30 | 2021-03-26 | 清华珠三角研究院 | Video segmentation method, system, device and storage medium
CN112565885B (en) * | 2020-11-30 | 2023-01-06 | 清华珠三角研究院 | Video segmentation method, system, device and storage medium

Also Published As

Publication number | Publication date
CN109005451B (en) | 2021-07-30

Similar Documents

Publication | Title
CN109005451A (en) | Video stripping method based on deep learning
CN102075695B (en) | New generation intelligent cataloging system and method facing large amount of broadcast television programs
CN110012349B (en) | An End-to-End Structured Method for News Programs
CN104063706B (en) | Video fingerprint extraction method based on SURF algorithm
CN106601243B (en) | Video file identification method and device
WO2019228267A1 (en) | Short video synthesis method and apparatus, and device and storage medium
CN112329604B (en) | A Multimodal Sentiment Analysis Method Based on Multidimensional Low-Rank Decomposition
WO2023197979A1 (en) | Data processing method and apparatus, and computer device and storage medium
CN106878632A (en) | A kind for the treatment of method and apparatus of video data
CN115460462B (en) | A method for automatically cropping audio-visual datasets containing anchors in Cantonese news videos
US12277766B2 (en) | Information generation method and apparatus
CN107247919A (en) | The acquisition methods and system of a kind of video feeling content
CN114299418B (en) | A Cantonese lip reading recognition method, device and storage medium
CN114333062B (en) | Pedestrian re-recognition model training method based on heterogeneous dual networks and feature consistency
CN104063701B (en) | Fast electric television stations TV station symbol recognition system and its implementation based on SURF words trees and template matches
CN119646874A (en) | A multimodal safety fence method based on vector matching
CN118779492A (en) | A multimodal large model driven video understanding and retrieval method
CN114022923A (en) | Intelligent collecting and editing system
CN104504406A (en) | Rapid and high-efficiency near-duplicate image matching method
CN116166125A (en) | Virtual image construction method, device, equipment and storage medium
CN102004795B (en) | Hand language searching method
CN108764258B (en) | Optimal image set selection method for group image insertion
CN106611043B (en) | Video searching method and system
CN116506699A (en) | System and method for producing audio-visual content
CN113113043B (en) | Method and device for converting voice into image

Legal Events

Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
PE01 | Entry into force of the registration of the contract for pledge of patent right | Denomination of invention: A Video Stripping Method Based on Deep Learning; Granted publication date: 20210730; Pledgee: Guotou Taikang Trust Co.,Ltd.; Pledgor: HANGZHOU XINGXI TECHNOLOGY Co.,Ltd.; Registration number: Y2024980020954
PC01 | Cancellation of the registration of the contract for pledge of patent right | Granted publication date: 20210730; Pledgee: Guotou Taikang Trust Co.,Ltd.; Pledgor: HANGZHOU XINGXI TECHNOLOGY Co.,Ltd.; Registration number: Y2024980020954
