CN109376603A - Video recognition method, apparatus, computer device and storage medium - Google Patents

Video recognition method, apparatus, computer device and storage medium

Info

Publication number
CN109376603A
CN109376603A (application CN201811113391.2A)
Authority
CN
China
Prior art keywords
video
recognition result
subfile
identification
key frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811113391.2A
Other languages
Chinese (zh)
Inventor
程成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhou Tong Technology Co Ltd
Original Assignee
Beijing Zhou Tong Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhou Tong Technology Co Ltd
Priority to CN201811113391.2A
Publication of CN109376603A
Status: Pending

Links

Classifications

Landscapes

Abstract

The embodiments of the present invention disclose a video recognition method, apparatus, computer device and storage medium. The method includes: obtaining a video-only subfile and an audio-only subfile corresponding to a video file to be recognized, and obtaining a key frame set and a video clip set corresponding to the video-only subfile; performing multi-modal image recognition on the key frame set to obtain a first recognition result, and performing video recognition on the video clip set to obtain a second recognition result; performing audio recognition on the audio-only subfile to obtain a third recognition result; and obtaining an integrated recognition result corresponding to the video file according to the first recognition result, the second recognition result and the third recognition result. The technical solution of the embodiments of the present invention improves the richness, accuracy, efficiency and real-time performance of video recognition while reducing recognition cost.

Description

Video recognition method, apparatus, computer device and storage medium
Technical field
The embodiments of the present invention relate to the technical field of video processing, and in particular to a video recognition method, apparatus, computer device and storage medium.
Background art
With the spread of the global Internet and of communications, people all over the world can exchange and transmit multimedia information online with a variety of communication devices. People upload pictures, text, voice, video and the like to network platforms to share their status, moods and scenery. Video, with its rich content, lets people understand content more intuitively and clearly, and is transmitted and stored on network platforms in large quantities. However, among the videos people upload there are many that local laws and morals do not allow, such as pornographic, gambling-related, gory, vulgar, violent/terrorist and extremist religious videos. When users download and spread such videos, they are easily affected mentally, especially teenagers. Manually reviewing the massive amount of video on the Internet is time-consuming, laborious and impractical. Video review technology emerged in this context.
Early video review technology generally used traditional machine learning methods. Such methods rely on hand-crafted features targeted at a specific library and lack generality (performance degrades once the library to which they apply changes). Later, manual review combined with traditional video review technology was used, with 7*24-hour uninterrupted human-eye plus machine-assisted review, reducing the appearance of illegal and non-compliant video content. In recent years, deep learning has developed rapidly in fields such as video, image and speech. Machine-intelligence review based on deep learning, image recognition and cloud technology has therefore become the main development trend; it can substantially reduce the cost enterprises invest in manual review and yield better video review results. Domestically, technology companies such as Baidu, NetEase, Tupu and SenseTime have launched their own video review systems, and abroad Google, Facebook, Amazon, Valossa and others have also launched video review systems with their own characteristics.
In the process of implementing the present invention, the inventor found that the prior art has the following defects:
Although machine learning methods can identify some non-compliant content, they cannot achieve accurate content recognition for short videos, live video and similar content, and when facing massive amounts of video the algorithms cannot recognize video content well. Manual review combined with traditional video review technology requires a huge review team, which must be expanded further when the accuracy of machine review is not high. Meanwhile, continuous manual review causes fatigue, which in turn leads to missed and false detections of some videos. Enterprises also need to spend a great deal of time training reviewers, so the cost of manual review far exceeds the cost of machine learning algorithms. Existing machine-intelligence review technology based on deep learning, image recognition and cloud technology cannot detect well the large number of vulgar, worthless videos on today's networks; the content such systems recognize is relatively simple, the recognition range is small, and once the recognition dimensions increase the amount of computation grows exponentially, placing excessive demands on computing power.
Summary of the invention
The embodiments of the present invention provide a video recognition method, apparatus, computer device and storage medium, to improve the richness, accuracy, efficiency and real-time performance of video recognition while reducing recognition cost.
In a first aspect, an embodiment of the present invention provides a video recognition method, including:
obtaining a video-only subfile and an audio-only subfile corresponding to a video file to be recognized, and obtaining a key frame set and a video clip set corresponding to the video-only subfile;
performing multi-modal image recognition on the key frame set to obtain a first recognition result, and performing video recognition on the video clip set to obtain a second recognition result;
performing audio recognition on the audio-only subfile to obtain a third recognition result;
obtaining an integrated recognition result corresponding to the video file according to the first recognition result, the second recognition result and the third recognition result.
In a second aspect, an embodiment of the present invention further provides a video recognition apparatus, including:
a subfile obtaining module, configured to obtain a video-only subfile and an audio-only subfile corresponding to a video file to be recognized, and to obtain a key frame set and a video clip set corresponding to the video-only subfile;
a first recognition module, configured to perform multi-modal image recognition on the key frame set to obtain a first recognition result, and to perform video recognition on the video clip set to obtain a second recognition result;
a second recognition module, configured to perform audio recognition on the audio-only subfile to obtain a third recognition result;
a recognition result obtaining module, configured to obtain an integrated recognition result corresponding to the video file according to the first recognition result, the second recognition result and the third recognition result.
In a third aspect, an embodiment of the present invention further provides a computer device, including:
one or more processors;
a storage device for storing one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the video recognition method provided by any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer storage medium on which a computer program is stored, and the program, when executed by a processor, implements the video recognition method provided by any embodiment of the present invention.
The embodiments of the present invention obtain a video-only subfile and an audio-only subfile corresponding to a video file to be recognized, together with a key frame set and a video clip set corresponding to the video-only subfile; perform multi-modal image recognition on the key frame set to obtain a first recognition result, perform video recognition on the video clip set to obtain a second recognition result, and perform audio recognition on the audio-only subfile to obtain a third recognition result; and finally integrate the first, second and third recognition results into an integrated recognition result for the video file. This solves the problems of existing video review technology, namely that the recognized content is limited and the recognition range is small, achieves rich recognition types, refined recognition content and multi-dimensional recognition of video content, and improves the richness, accuracy, efficiency and real-time performance of video recognition while reducing recognition cost.
Brief description of the drawings
Fig. 1 is a flowchart of a video recognition method provided by Embodiment 1 of the present invention;
Fig. 2 is a flowchart of a video recognition method provided by Embodiment 2 of the present invention;
Fig. 3a is a flowchart of a video recognition method provided by Embodiment 3 of the present invention;
Fig. 3b is a schematic diagram of bounding box size and position prediction provided by Embodiment 3 of the present invention;
Fig. 3c is a schematic diagram of face detection results provided by Embodiment 3 of the present invention;
Fig. 3d is a schematic diagram of face key point localization results provided by Embodiment 3 of the present invention;
Fig. 3e is a flowchart of a video recognition method provided by Embodiment 3 of the present invention;
Fig. 3f is a schematic diagram of a video recognition system provided by Embodiment 3 of the present invention;
Fig. 3g is a flowchart of a video recognition method provided by Embodiment 3 of the present invention;
Fig. 3h is a schematic diagram of log-Mel spectrum features provided by Embodiment 3 of the present invention;
Fig. 3i is a schematic diagram of a video recognition algorithm architecture provided by Embodiment 3 of the present invention;
Fig. 4 is a schematic diagram of a video recognition apparatus provided by Embodiment 4 of the present invention;
Fig. 5 is a schematic structural diagram of a computer device provided by Embodiment 5 of the present invention.
Specific embodiments
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention and not to limit it.
It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the full content. It should be mentioned that, before the exemplary embodiments are discussed in more detail, some of them are described as processes or methods depicted as flowcharts. Although a flowchart describes the operations (or steps) as sequential processing, many of the operations can be implemented in parallel, concurrently or simultaneously. In addition, the order of the operations can be rearranged. The processing may be terminated when its operations are completed, but may also have additional steps not included in the drawings. The processing may correspond to a method, a function, a procedure, a subroutine, a subprogram, and the like.
Embodiment 1
Fig. 1 is a flowchart of a video recognition method provided by Embodiment 1 of the present invention. This embodiment is applicable to the case of accurately and quickly recognizing a video file. The method can be executed by a video recognition apparatus, which can be implemented in software and/or hardware and can generally be integrated in a computer device. Accordingly, as shown in Fig. 1, the method includes the following operations:
S110. Obtain a video-only subfile and an audio-only subfile corresponding to a video file to be recognized, and obtain a key frame set and a video clip set corresponding to the video-only subfile.
Here, the video file to be recognized may include both video and audio data resources. The video-only subfile may be a file that contains only the video resource; similarly, the audio-only subfile may be a file that contains only the audio resource. The key frame set may be used to store the key frames of the video-only subfile, where a key frame may be the most representative video frame in the video-only subfile. Representative here means that the frame represents the semantic content of a video clip, with complete content and clear semantics. The video clip set may be used to store the video clips of the video-only subfile.
In the embodiments of the present invention, after the video file to be recognized is obtained, audio-video separation may be performed on it to obtain the video-only subfile and the audio-only subfile. When recognizing the video file, the video-only subfile and the audio-only subfile may be recognized separately. Specifically, when recognizing the video-only subfile, two recognition schemes, image recognition and video recognition, may be applied: for image recognition, each key frame in the key frame set corresponding to the video-only subfile is recognized; for video recognition, each video clip in the video clip set corresponding to the video-only subfile is recognized.
S120. Perform multi-modal image recognition on the key frame set to obtain a first recognition result, and perform video recognition on the video clip set to obtain a second recognition result.
Here, multi-modal image recognition may integrate or fuse two or more kinds of image recognition features. The first recognition result may be an image recognition result, and the second recognition result may be a video recognition result.
In the embodiments of the present invention, when recognizing the key frames in the key frame set corresponding to the video-only subfile, image recognition may be performed on the key frames in a multi-modal manner to obtain the first recognition result. Recognizing each video clip in the video clip set corresponding to the video-only subfile yields the second recognition result.
It should be noted that, in the embodiments of the present invention, the processes of obtaining the first recognition result and the second recognition result are mutually independent and do not affect each other; that is, multi-modal image recognition and video recognition of video clips are mutually independent steps.
S130. Perform audio recognition on the audio-only subfile to obtain a third recognition result.
Here, the third recognition result may be an audio recognition result.
Accordingly, audio recognition is performed on the audio-only subfile to obtain the corresponding third recognition result.
S140. Obtain an integrated recognition result corresponding to the video file according to the first recognition result, the second recognition result and the third recognition result.
Here, the integrated recognition result may be a recognition result obtained by integrating the first, second and third recognition results according to a set rule.
In the embodiments of the present invention, after the first, second and third recognition results are obtained, the three recognition results may be integrated to obtain the integrated recognition result corresponding to the video file. Optionally, the union of the first, second and third recognition results may be taken directly as the integrated recognition result.
The video recognition method provided by the embodiments of the present invention can be used to review whether a video contains content not allowed by laws and morals, and to perform intelligent video review of non-compliant content in short videos, live video and long videos, so as to build a healthy Internet transmission and storage environment. It can well solve the problems currently faced by short-video and live-streaming platforms and substantially reduce enterprises' investment in manual review. Meanwhile, the method has strong customization capability and high flexibility, and can be customized according to user behaviour to meet user needs. It can also support advertisement placement strongly associated with video content, thereby improving advertising effectiveness.
The embodiments of the present invention obtain a video-only subfile and an audio-only subfile corresponding to a video file to be recognized, together with a key frame set and a video clip set corresponding to the video-only subfile; perform multi-modal image recognition on the key frame set to obtain a first recognition result, perform video recognition on the video clip set to obtain a second recognition result, and perform audio recognition on the audio-only subfile to obtain a third recognition result; and finally integrate the three recognition results into an integrated recognition result for the video file. This solves the problems of existing video review technology, namely limited recognized content and small recognition range, achieves rich recognition types, refined recognition content and multi-dimensional recognition of video content, and improves the richness, accuracy, efficiency and real-time performance of video recognition while reducing recognition cost.
Embodiment 2
Fig. 2 is a flowchart of a video recognition method provided by Embodiment 2 of the present invention. This embodiment is a refinement of the above embodiment and gives a specific implementation of obtaining the key frame set and the video clip set corresponding to the video-only subfile. Accordingly, as shown in Fig. 2, the method of this embodiment may include:
S210. Obtain a video-only subfile and an audio-only subfile corresponding to a video file to be recognized, and obtain a key frame set and a video clip set corresponding to the video-only subfile.
Accordingly, S210 may specifically include:
S211. Filter the video-only subfile using a coarse video frame filtering technique to obtain a filtered video frame set.
Here, the filtered video frame set may be used to store the video frames obtained after the video-only subfile is filtered.
It can be understood that processing every frame image in an entire video stream is very time-consuming and wastes computing resources. Common video processing systems generally sub-sample the video stream at uniform time intervals to reduce the number of video frames, but this approach easily loses certain key frames.
To improve the accuracy of key frame extraction, the embodiments of the present invention first filter the video-only subfile using a coarse video frame filtering technique, which effectively reduces the number of video frames. Specifically, dark frames, blurred frames and low-quality frames in the video-only subfile may be filtered out, so that most of the remaining frames are of good overall quality; from the resulting filtered video frame set, clear, bright, high-quality video frames are selected as key frames.
Specifically, dark frames can be filtered out with the following formula:
Luminance(I_rgb) = 0.2126·I_r + 0.7152·I_g + 0.0722·I_b
where Luminance(·) denotes image brightness, I_rgb denotes the three-channel RGB natural image, I_r denotes the red channel image, I_g denotes the green channel image, r denotes the red channel, g denotes the green channel, b denotes the blue channel, and rgb denotes the three channels. After the brightness of each video frame in the filtered video frame set is computed with the above formula, video frames whose brightness does not meet a set threshold can be filtered out.
Blurred frames can be filtered by computing an image sharpness measure Sharpness(I_gray) from the gradients of the grayscale image, where Sharpness(·) denotes image sharpness, I_gray denotes the grayscale image, Δ_x denotes the horizontal gradient, Δ_y denotes the vertical gradient, x denotes the horizontal direction, and y denotes the vertical direction. After the sharpness of each video frame in the filtered video frame set is computed, video frames whose sharpness does not meet a set threshold can be filtered out.
Low-quality frames can be filtered by computing an image quality score δ over all pixels of a frame, where δ denotes image quality, M denotes the number of horizontal pixels, N denotes the number of vertical pixels, i denotes the horizontal coordinate, j denotes the vertical coordinate, P(·) denotes the pixel value, and μ denotes a threshold. After the image quality of each video frame in the filtered video frame set is computed, video frames whose quality does not meet a set threshold can be filtered out.
In addition, a large number of blurred frames may appear during shot changes in a video, so unqualified video frames can be further filtered out using shot edge detection.
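As an illustration of the coarse filtering step, the following sketch (an assumption of this description, not code from the patent) computes brightness with the luminance formula above and a gradient-based sharpness score, and keeps only frames above illustrative thresholds; the threshold values and the helper name coarse_filter are hypothetical.

```python
import cv2
import numpy as np

def luminance(frame_bgr):
    # Luminance from the formula above; OpenCV loads frames as BGR.
    b, g, r = cv2.split(frame_bgr.astype(np.float32))
    return float(np.mean(0.2126 * r + 0.7152 * g + 0.0722 * b))

def sharpness(frame_bgr):
    # Mean gradient magnitude of the grayscale image as a blur measure.
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    return float(np.mean(np.sqrt(gx ** 2 + gy ** 2)))

def coarse_filter(frames, lum_thresh=40.0, sharp_thresh=8.0):
    # Keep frames that are bright enough and sharp enough.
    return [f for f in frames
            if luminance(f) > lum_thresh and sharpness(f) > sharp_thresh]
```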
S212. Compute a feature vector corresponding to each filtered video frame in the filtered video frame set, and perform clustering on the filtered video frames according to the feature vectors to obtain at least two clusters, where each cluster contains at least one filtered video frame.
The most common key frame extraction method is clustering, which computes the visual similarity between video frames and selects, from each cluster, the video frame closest to the cluster centre as a key frame. In the embodiments of the present invention, key frames can be extracted according to the feature vectors of the filtered video frames. Specifically, the feature vector of each filtered video frame is computed, the filtered video frames are clustered according to their feature vectors, and multiple clusters are obtained, each containing at least one filtered video frame.
In an optional embodiment of the present invention, computing the feature vectors of the filtered video frames may include: performing feature extraction on each filtered video frame using a convolutional neural network model; or performing feature extraction on each filtered video frame using local binary patterns (LBP), and forming a statistical histogram from the extraction results as the LBP feature vector of each filtered video frame.
In the embodiments of the present invention, a convolutional neural network (CNN) model may be used to extract features from each filtered video frame in the filtered video frame set. Specifically, a classical CNN architecture such as AlexNet, VGGNet or Inception may be selected to obtain a high-dimensional feature vector representation of the video frame.
It can be understood that many frames in a video are highly similar, so features that are easy to compute, such as colour and edge histogram features or LBP (Local Binary Pattern) features, can effectively distinguish the similarity between different video frames. Optionally, in the embodiments of the present invention, LBP features may be used as the feature descriptor of a video frame. The LBP transform matrix is computed first, and its statistical histogram is then used as the feature vector of the video frame. To take location information into account, the video frame is divided into several small regions, a histogram is computed in each region (counting the number of pixels belonging to each pattern in that region), and finally the histograms of all regions are concatenated into one feature vector for the next processing stage.
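A minimal sketch of the region-wise LBP histogram descriptor described above, assuming scikit-image is available; the grid size and LBP parameters are illustrative choices, not values from the patent.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_feature(gray, grid=(4, 4), points=8, radius=1):
    # Uniform LBP has points + 2 distinct codes per pixel.
    codes = local_binary_pattern(gray, points, radius, method="uniform")
    n_bins = points + 2
    h, w = gray.shape
    cell_h, cell_w = h // grid[0], w // grid[1]
    hists = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            cell = codes[i * cell_h:(i + 1) * cell_h,
                         j * cell_w:(j + 1) * cell_w]
            hist, _ = np.histogram(cell, bins=n_bins, range=(0, n_bins))
            hists.append(hist / max(hist.sum(), 1))   # normalise per region
    return np.concatenate(hists)   # concatenated histogram = frame descriptor
```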
S213. From each cluster, select the filtered video frame with the highest static degree to form the key frame set.
In the embodiments of the present invention, optionally, key frames are extracted from the different clusters according to the static degree of the picture. Because the motion compensation used in video compression causes blurring artefacts, pictures with high motion energy are usually more blurred; selecting pictures with low motion energy therefore ensures a higher quality for the extracted key frames. Specifically, the feature vectors of the extracted video frames may first be clustered with the K-means algorithm, and the number of clusters can be set to the number of shots in the video to obtain a better clustering result. Frames in the same cluster share the same subset ID, and the static degree of each picture is computed separately. The static degree is the reciprocal of the sum of squared pixel differences between adjacent pictures, and the picture with the highest static degree in each cluster can be selected as the key frame of that cluster.
It should be noted that, in the embodiments of the present invention, the purpose of selecting the filtered video frame with the highest static degree is to select the most representative video frame in each cluster as the key frame. Besides using the static degree, other methods that can select the most representative video frame within a cluster can also be used for key frame extraction; the embodiments of the present invention do not limit the method used.
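The selection rule can be sketched as follows, assuming scikit-learn's KMeans and the frame descriptors from the previous step; the static degree is the reciprocal of the sum of squared pixel differences to the temporal neighbour, as described above.

```python
import numpy as np
from sklearn.cluster import KMeans

def static_degree(frames, idx):
    # Reciprocal of the squared pixel difference to the previous frame.
    prev = frames[max(idx - 1, 0)].astype(np.float32)
    cur = frames[idx].astype(np.float32)
    return 1.0 / (np.sum((cur - prev) ** 2) + 1e-6)

def select_key_frames(frames, features, n_shots):
    labels = KMeans(n_clusters=n_shots, n_init=10).fit_predict(np.asarray(features))
    key_frames = []
    for c in range(n_shots):
        members = np.where(labels == c)[0]
        best = max(members, key=lambda i: static_degree(frames, i))
        key_frames.append(frames[best])   # most static frame of this cluster
    return key_frames, labels
```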
S214. Determine the start time and duration corresponding to each cluster according to the time parameters of the filtered video frames contained in the cluster within the video-only subfile, and slice the video-only subfile according to the start times and durations to obtain the video clip set.
Accordingly, after clustering the video frames, not only can key frames be extracted, but the video can also be divided into different segments by cluster. The start time and duration of each video clip can be obtained from the cluster boundaries and the number of video frames in each cluster, so that the video-only subfile can be decomposed into short video segments with specific characteristics, completing the slicing process and yielding the video clip set, which can be used for video recognition.
S220. Perform multi-modal image recognition on the key frame set to obtain a first recognition result, and perform video recognition on the video clip set to obtain a second recognition result.
S230. Perform audio recognition on the audio-only subfile to obtain a third recognition result.
S240. Obtain an integrated recognition result corresponding to the video file according to the first recognition result, the second recognition result and the third recognition result.
With the above technical solution, key frames are extracted through coarse video frame filtering, video frame feature extraction and key frame extraction, which ensures that the key frames meet performance indicators highly relevant to the video content; slicing the video-only subfile yields the video clip set used for video recognition, thereby realising multi-dimensional recognition of video content.
Embodiment 3
Fig. 3a is a flowchart of a video recognition method provided by Embodiment 3 of the present invention; Fig. 3b is a schematic diagram of bounding box size and position prediction provided by Embodiment 3; Fig. 3c is a schematic diagram of face detection results provided by Embodiment 3; Fig. 3e and Fig. 3g are flowcharts of the video recognition method provided by Embodiment 3. This embodiment is a refinement of the above embodiments and gives a specific implementation for obtaining each recognition result. Accordingly, as shown in Fig. 3a, the method of this embodiment may include:
S310. Obtain a video-only subfile and an audio-only subfile corresponding to a video file to be recognized, and obtain a key frame set and a video clip set corresponding to the video-only subfile.
Here, the first and second recognition results are obtained by recognizing the video-only subfile of the video file to be recognized, and the third recognition result is obtained by recognizing the audio-only subfile.
Accordingly, performing multi-modal image recognition on the key frame set to obtain the first recognition result may specifically include the following two kinds of operations:
S320. Perform picture classification on each key frame in the key frame set using a preset picture classification model, and use the classification result as the first recognition result.
Here, the preset picture classification model may be a network model trained in advance to classify key frame pictures.
In the embodiments of the present invention, the training data for the preset picture classification model mainly come from two sources: a background database containing more than 20,000 classes of self-labelled data, and public datasets such as ImageNet. Because pictures are rich and varied in content, it is difficult to discriminate all categories accurately with a single model. Therefore, the embodiments of the present invention can use a multi-level classification model for precise recognition: the first level separates major classes, such as quotations, sports and dishes; the second level performs finer classification, for example subdividing sports into basketball and football; and, depending on the actual situation, a third level may be applied, for example identifying which two teams are playing in a basketball game. Each level of classifier can, according to the actual situation, use CNN-based classification, object detection or OCR (Optical Character Recognition) to complete the classification. According to the actual image content, a series of processes such as task formulation, picture crawling, picture labelling and quality inspection are completed to build the training dataset and guarantee recognition quality.
A ResNet network uses residual learning to alleviate the degradation problem; the content to be learned by residual learning is relatively small, so the learning difficulty is low and good results are easy to obtain. Experiments have shown that, as its depth increases, ResNet performs much better than earlier traditional networks. ResNet performs very well not only on ImageNet but also on datasets such as COCO, showing that it can serve as a general model. Therefore, in the embodiments of the present invention, ResNet may be used as the CNN network model for picture classification; further, a ResNet-34 may be trained as the base model.
S330. Input each key frame in the key frame set into a pre-trained YOLOv3 model, and obtain, as the first recognition result, the output of the YOLOv3 model: the target object label corresponding to each key frame and the position coordinates of the target object in the key frame.
Here, a target object may be an object other than a face, such as an animal, a car or a knife. The target object label may be a label in the label list for picture or video recognition. Illustratively, the label list includes but is not limited to: (1) pornography: real-person pornography, real-person sexiness, animation pornography, animation sexiness and some special poses; (2) gore and violence: promotion of violent/terrorist organisations, gory scenes, violence and fighting; (3) political sensitivity: politically sensitive figures and scenes; (4) prohibited goods: drug trafficking, controlled knives, and military/police articles; (5) vulgar and worthless content: bare upper bodies, smoking, vulgar venues and tattoos. In the embodiments of the present invention, the label list can be updated. Completely training the entire video recognition model usually takes several weeks, and because the label list is updated frequently, retraining the model for every new label list is clearly very time-consuming. To shorten the training time, transfer learning can be used to iterate the model: part of the neural network layers of a fully trained model are fine-tuned so that new categories can be recognized, greatly saving training time and resources. The specific steps are as follows (see the sketch below): (1) change the number of nodes in the softmax layer to the new number of labels, leaving the rest of the network structure unchanged; (2) load the weights of the previously trained model; (3) retrain the model, which substantially reduces the training time.
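A hedged sketch of this head-swap idea in PyTorch (the patent does not name a framework, and a ResNet classifier stands in here for the full detection model): replace the final classification layer to match the new label count, load the previously trained weights for the rest of the network, and fine-tune only the new layer.

```python
import torch
import torch.nn as nn
from torchvision import models

def rebuild_for_new_labels(num_new_labels, old_weights_path=None):
    model = models.resnet34(weights=None)
    if old_weights_path:                              # (2) load previously trained weights
        state = torch.load(old_weights_path, map_location="cpu")
        state = {k: v for k, v in state.items() if not k.startswith("fc.")}
        model.load_state_dict(state, strict=False)
    model.fc = nn.Linear(model.fc.in_features, num_new_labels)  # (1) new output size
    for name, p in model.named_parameters():          # (3) fine-tune only the new head
        p.requires_grad = name.startswith("fc.")
    return model
```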
Target detection locates and classifies multiple objects in a picture: localization marks where each object is in the picture, and classification gives the category of each object. Target detection handles multiple targets, improving the speed and accuracy of video recognition. In the embodiments of the present invention, a pre-trained YOLOv3 model may be used as the target detection network to recognize the target objects in each key frame. YOLOv3 is an end-to-end detector that needs no region proposals and combines target discrimination and target recognition in one step, which substantially improves recognition performance. After the pre-trained YOLOv3 model recognizes the target objects in each key frame, the objects can be labelled with the labels in the label list, and the recognition result of a key frame's target objects can be matched, according to its position in the key frame, with the corresponding video clip in the video clip set. Illustratively, suppose the YOLOv3 model recognizes a controlled knife in the 3rd key frame and determines that this key frame belongs to the 2nd video clip; then the recognition result of the 3rd key frame can be matched to the 2nd video clip.
The YOLOv3 model used in the embodiments of the present invention introduces a residual structure and builds a new Darknet-53; it performs detection repeatedly, with three different anchors set on feature maps of three different scales; and it replaces softmax with per-class classifiers trained with cross-entropy loss, so that a box is not restricted to a single class. The main aspects of target detection with the YOLOv3 model are as follows:
(1) Bounding box prediction
The input picture is divided into S*S grid cells, and fixed anchor boxes are obtained by clustering; four coordinate values (t_x, t_y, t_w, t_h) are then predicted for each bounding box. For the predicting cell, the bounding box can be predicted from the offset (c_x, c_y) of the cell from the top-left corner of the image and the prior width p_w and height p_h of the bounding box. The YOLOv3 model can use a mean squared error loss function when training these coordinate values, and predicts an objectness score for each bounding box by logistic regression. If the predicted bounding box overlaps the ground-truth box more than any other prediction, its objectness score is 1. If the overlap does not reach a threshold (which can be set to 0.5), the predicted bounding box is ignored and contributes no loss.
Fig. 3b is a schematic diagram of bounding box size and position prediction provided by Embodiment 3 of the present invention. Referring to Fig. 3b, the bounding box can be predicted with the following formulas:
b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = p_w · e^(t_w)
b_h = p_h · e^(t_h)
where b_x denotes the bounding box top-left x coordinate, b_y denotes the bounding box top-left y coordinate, b_w denotes the bounding box width, b_h denotes the bounding box height, t_x, t_y, t_w and t_h denote the four coordinate values predicted by the network that generates the bounding box, c_x and c_y denote the offsets, p_w and p_h denote the width and height of the prior bounding box, and σ(·) denotes the activation function.
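As a worked form of the four formulas above, the following sketch decodes one predicted box with NumPy (an illustration, not code from the patent); σ is the sigmoid function and the stride used to return to pixel units is an illustrative value.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph, stride=32):
    # b_x = sigma(t_x) + c_x, b_y = sigma(t_y) + c_y (in grid-cell units)
    bx = sigmoid(tx) + cx
    by = sigmoid(ty) + cy
    # b_w = p_w * e^{t_w}, b_h = p_h * e^{t_h} (prior sizes in pixels)
    bw = pw * np.exp(tw)
    bh = ph * np.exp(th)
    # Multiply the grid-cell coordinates by the stride to return to pixels.
    return bx * stride, by * stride, bw, bh

print(decode_box(0.2, -0.1, 0.5, 0.3, cx=4, cy=7, pw=116, ph=90))
```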
(2) Class prediction
Each bounding box is classified with multiple labels. The single-label multi-class softmax layer is therefore replaced by logistic regression layers for multi-label classification, with a simple logistic regression layer performing binary classification for each category. The logistic regression layer mainly uses the sigmoid function, which constrains its input to the range 0 to 1; when the output of a picture for a certain class, after feature extraction and the sigmoid constraint, is greater than 0.5, the picture is considered to belong to that class.
(3) Cross-scale prediction
The YOLOv3 model predicts by fusing multiple scales, predicting boxes at three different scales. The bounding box priors are obtained by clustering: 9 clusters and 3 scales are selected, and the 9 clusters are distributed evenly over these scales. The feature extraction model is modified with an FPN (Feature Pyramid Network), and the prediction finally yields a 3-D tensor containing bounding box information, objectness information and the prediction information for multiple classes. In this way the YOLOv3 model can obtain more semantic information.
(4) Feature extraction
In the embodiments of the present invention, the YOLOv3 model uses the DarkNet-53 network as the feature extraction layer. On one hand it is basically fully convolutional and downsamples the feature maps with convolutional layers; on the other hand it introduces residual structures, which reduce the training difficulty and allow the network to reach 53 layers, using multiple 3*3 and 1*1 convolutional layers to improve network accuracy.
The YOLOv3 model in the embodiments of the present invention can improve the recognition of multi-target, multi-label and small-target cases.
(5) Training
In the embodiments of the present invention, various methods, such as data augmentation, can be used when training the YOLOv3 model.
S340. Perform face recognition on each key frame of the key frame set to obtain the first recognition result.
Accordingly, S340 may specifically include:
S341. Perform face detection on each key frame in the key frame set using the S3FD algorithm.
It can be understood that face detection is the first step of face recognition and is very important to it. Traditional face detection algorithms include face detection based on geometric features, on eigenfaces, on elastic graph matching, and on SVM (Support Vector Machine). Although these methods can detect faces, they suffer from many false and missed detections, perform poorly against complex backgrounds, and do not adapt to changes in illumination, angle and so on. To solve these problems, the embodiments of the present invention use the deep-learning-based S3FD (Single Shot Scale-invariant Face Detector) algorithm, which is particularly suitable for detecting small faces.
Specifically, the S3FD algorithm uses the different receptive fields of different convolutional layers to detect faces of different scales. Its base network is VGG16, and a VGG16 pre-trained model can be loaded to accelerate training. To detect faces at more scales, the S3FD algorithm adds 6 convolutional layers on top of VGG16, which are ultimately used for face detection. S3FD makes two main improvements: 1) based on the difference between the theoretical and the effective receptive field, it improves the anchor proposal scheme; 2) to detect small faces better, it adds more layers and scales. The embodiments of the present invention train the VGG16 network on the open-source face dataset WIDER FACE and on face data collected according to our own needs; the detection results are shown in Fig. 3c, which shows that the face detection of the embodiments works well.
S342. Perform face key point localization on the detected faces with the MTCNN algorithm to obtain face key points.
Face key point localization is the key to face alignment; the positions of the left and right eyes, the left and right mouth corners, and the nose need to be located, and the accuracy of key point localization greatly influences the effect of face feature extraction. Traditional face key point localization methods are mostly based on local facial features; their localization results are unsatisfactory, their generalization ability is poor, and they do not adapt to changes in factors such as angle and illumination. To solve these problems, the embodiments of the present invention use the MTCNN (Multi-task Cascaded Convolutional Networks) algorithm for face key point localization. MTCNN is a cascaded convolutional neural network consisting of three parts: P-Net, R-Net and O-Net. It can be regarded as three independent convolutional neural networks in series; the three networks complete the same task and differ only slightly in structure. The main idea of MTCNN is to cascade multiple networks and continuously optimize the same task: P-Net produces a rough result, R-Net refines it, and O-Net refines the result of R-Net again, so that the key point localization becomes increasingly accurate. The embodiments of the present invention train the MTCNN network on the open-source datasets WIDER FACE and CelebA and on face data collected according to our own needs. Fig. 3d shows the key point localization results of the MTCNN algorithm, from which it can be seen that the face key point localization of the embodiments works well.
S343. Perform feature extraction on the face image with the ArcFace algorithm according to the face key points.
Face features are extracted primarily for face comparison: features of the same person should be very similar, and features of different faces should have very low similarity. Because the extracted features are all based on the broad category of faces, the key to feature extraction is to make features of the same person as similar as possible and features of different people as discriminative as possible. To increase the discrimination between different faces, the embodiments of the present invention use the ArcFace algorithm to improve the loss function of the classification network (a deep neural network), increasing the discrimination between different classes so that good classification performance is maintained even with unbalanced samples and many classes.
The ArcFace algorithm maximizes the classification decision boundary directly in the angular space, i.e. it modifies the original softmax loss of the classification network so that the classification loss is expressed in the angular space. The original softmax loss is:
L_softmax = -(1/m) Σ_{i=1..m} log( e^(W_{y_i}^T·x_i + b_{y_i}) / Σ_j e^(W_j^T·x_i + b_j) )
The loss after the ArcFace improvement is:
L_1 = -(1/m) Σ_{i=1..m} log( e^(s·cos(θ_{y_i} + t)) / ( e^(s·cos(θ_{y_i} + t)) + Σ_{j≠y_i} e^(s·cos θ_j) ) )
where L_1 denotes the loss function, m denotes the batch size, i and j are natural numbers, W_{y_i} denotes the y_i-th column of the last fully connected layer for the i-th sample, x and y denote the feature vector and the class, x_i denotes the deep feature of the i-th sample and y_i denotes the class of the i-th sample, T denotes the transpose operation, b_{y_i} denotes the y_i-th element of the bias term of the last fully connected layer, b denotes the bias term of the last fully connected layer, W_j denotes the j-th column of the last fully connected layer, b_j denotes the j-th element of the bias term, s denotes the scale of the normalized ||x||, t denotes the additive angular margin, θ_{y_i} denotes the angle between W_{y_i} and x_i, and θ_j denotes the angle between W_j and x_i.
Compared with the original softmax loss, the features extracted with the ArcFace algorithm perform better and have larger inter-class distances, and they retain good discrimination even when the number of classes is large.
S344. Match the extracted face features against the features in a feature library, identify the person information corresponding to each key frame according to the matching result, and use the identified face information as the first recognition result.
In the embodiments of the present invention, after the face features in the key frame pictures are extracted, a face feature library of the people to be identified needs to be built, and the person information is stored together with the corresponding face features. In the recognition stage, the features extracted from the face to be detected are matched against the face features in the library, and the recognition result is given according to the matching similarity. Generally, feature vector similarity is measured with either the Euclidean distance or the cosine distance. Because the Euclidean distance fluctuates over a wide range, it is difficult to set a fixed threshold to define similarity, so the embodiments of the present invention use the cosine distance to describe feature similarity; its range of [-1, 1] makes it easy to determine a decision threshold. For matching, the embodiments of the present invention use a nearest-match algorithm combined with a threshold: the similarity between the feature to be identified and the features in the library is computed first, the person class of the most similar feature is taken as the class of the feature to be identified, and it is then judged whether this similarity exceeds a set threshold; if it does, the feature is identified as that person class, and if it does not, the feature is judged not to belong to any person class in the library.
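A minimal sketch of the nearest-match-plus-threshold rule, assuming the gallery is a dictionary from person name to ArcFace feature vector; the threshold value is illustrative.

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def identify(query_feature, gallery, threshold=0.5):
    # gallery: {person_name: feature_vector}
    best_name, best_sim = None, -1.0
    for name, feat in gallery.items():
        sim = cosine_similarity(query_feature, feat)
        if sim > best_sim:
            best_name, best_sim = name, sim
    # Accept the nearest match only if it clears the threshold.
    return (best_name, best_sim) if best_sim >= threshold else (None, best_sim)
```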
It should be noted that before carrying out multimodal recognition to key frame picture, it is also necessary to the key frame figure of inputPiece is pre-processed.Pretreatment refers mainly to be standardized picture and that picture is zoomed to same size is defeated as modelEnter, the first order is introduced into the classification that different classes of model carries out level-one label, and the second level is directed to a certain major class or a few againMajor class carries out more fine identification, and such classification framework is very easy to extension, and target detection mould can be used in part labelsType and human face recognition model are assisted in identifying.
It should be noted that Fig. 3a is only a schematic implementation; there is no required order between S320 and S330. S320 can be implemented before S330, S330 before S320, the two can be implemented in parallel, or only one of them can be implemented.
Accordingly, as shown in Fig. 3e, obtaining the second recognition result may specifically include the following operations:
S350. Perform video recognition on the video clip set to obtain the second recognition result.
Specifically, S350 may include the following operations:
S351. Perform temporal downsampling on each video clip in the video clip set to obtain a sampled video frame set corresponding to each video clip.
Here, the sampled video frame set may be used to store the video frames sampled according to a set rule.
Fig. 3f is a schematic diagram of a video recognition system provided by Embodiment 3 of the present invention. As shown in Fig. 3f, in the embodiments of the present invention, 3D convolutional neural network (3D CNN) technology is used to accurately recognize different action types in video clips. The recognized actions may include more than 20 bad or vulgar actions such as fighting, smoking, drinking, "social shake" and the seaweed dance, and may also include more than 100 common actions such as eating, rock climbing, jumping, playing football and kissing. Different actions may require different resolutions to distinguish: dancing is more likely a global action, while smoking is more likely a local action. To meet different resolution requirements, the embodiments of the present invention can build one high-resolution 3D CNN and one low-resolution 3D CNN. There are usually two ways to use temporal information: one is to use the original image frames directly as the 3D CNN input, and the other is to extract the x gradient, y gradient and optical flow features between image frames as the 3D CNN input. It should be noted that, when training the 3D CNN for a multi-class problem, a multi-class cross-entropy loss can be used.
A video sequence is a set of time-correlated images. In the temporal domain the interval between adjacent frames is very small, especially at higher frame rates such as 25 fps, 30 fps, 50 fps and 60 fps, so the correlation between adjacent frames is very high, and each input sample of the 3D CNN also requires a fixed number of frames. In the embodiments of the present invention, the temporal length can be set to 16 frames, which provides the prerequisite for converting between different frame rates. Specifically, as shown in Fig. 3f, the M1 module of the video recognition system of the embodiments of the present invention can use the following two sampling modes (a code sketch of both follows mode (2) below):
Mode (1): Assume the frame rate of the original video sequence is Q and the frame rate after sampling is P. Downsampling can be performed based on temporal distance with the conversion formula σ_i = λ·θ_{k+1} + (1-λ)·θ_k, where σ_i denotes the i-th video frame after downsampling, λ denotes a weighting parameter determined by the temporal distance between the resampled position and the two neighbouring original frames, θ_{k+1} denotes the (k+1)-th frame image of the original video, θ_k denotes the k-th frame image of the original video, i is the frame index after downsampling, and k and k+1 are two adjacent frames of the original video sequence. The downsampled video frame sequence is σ = [σ_1, σ_2 ... σ_M], where M is 16.
Mode (2): 16 consecutive frames of the original video are taken, with an overlap of 8 frames between two adjacent 16-frame segments; that is, the original video clip can be divided into multiple 16-frame segments that overlap each other by 8 frames.
Mode (1) guarantees the global coverage of the video clip, and mode (2) guarantees the locality and information integrity of the video clip. Samples generated in either way can take the label of the original video clip, which facilitates training.
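Both sampling modes can be sketched as follows (helper names are assumptions; the interpolation weight in mode (1) follows the time-distance rule above, and mode (2) uses 16-frame windows with a stride of 8).

```python
import numpy as np

def resample_time_distance(frames, q_fps, p_fps, m=16):
    # Mode (1): sigma_i = lambda * theta_{k+1} + (1 - lambda) * theta_k
    out = []
    for i in range(m):
        pos = i * q_fps / p_fps              # position of sample i among original frames
        k = min(int(np.floor(pos)), len(frames) - 2)
        lam = pos - k                        # weight from the temporal distance
        blended = (lam * frames[k + 1].astype(np.float32)
                   + (1.0 - lam) * frames[k].astype(np.float32))
        out.append(blended.astype(np.uint8))
    return out

def sliding_windows(frames, win=16, stride=8):
    # Mode (2): consecutive 16-frame segments overlapping by 8 frames.
    return [frames[s:s + win] for s in range(0, len(frames) - win + 1, stride)]
```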
S352. Perform set processing operations on the sampled video frames in the spatial and temporal domains to obtain at least two types of input images, where the set processing operations include scaling, optical flow extraction and edge image extraction, and the types of input images include high-resolution images, low-resolution images, optical flow images and edge images.
Accordingly, as shown in Fig. 3f, the M2 module of the video recognition system of the embodiments of the present invention is responsible for the set processing of the video frames in the sampled video frame set. In the embodiments of the present invention, the M2 module provides three processing modes: scaling, optical flow extraction and edge image extraction. Scaling further includes two modes: to meet the demand for high resolution, the original image is resized in the spatial domain to 224*224*3, while the low-resolution image is resized to 112*112*3. Optical flow is the significant information of object motion in the temporal domain; it is the correspondence between the previous frame and the current frame found from the temporal changes of pixels in the image sequence and the correlation between adjacent frames, and this correspondence is regarded as the motion information of the object. Optionally, the embodiments of the present invention compute optical flow with OpenCV's cv2.calcOpticalFlowPyrLK() function. Edge images carry the significant spatial information of image structure and object motion. Optionally, the embodiments of the present invention use the Canny operator to extract edge images and compute edge features separately for the three RGB channels. The Canny computation process is: 1) filter out noise with a Gaussian filter to smooth the image; 2) compute the gradient magnitude and direction of each pixel; 3) apply non-maximum suppression to eliminate spurious responses brought by edge detection; 4) apply double-threshold detection to determine true and potential edges; 5) finally complete edge detection by suppressing isolated weak edges.
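An OpenCV sketch of the three M2 operations (an assumption of this description; parameter values are illustrative): spatial resizing, sparse Lucas-Kanade optical flow with cv2.calcOpticalFlowPyrLK(), and per-channel Canny edges.

```python
import cv2
import numpy as np

def resize_pair(frame):
    # High- and low-resolution spatial inputs.
    return cv2.resize(frame, (224, 224)), cv2.resize(frame, (112, 112))

def optical_flow_lk(prev_frame, next_frame, max_corners=200):
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)
    pts = cv2.goodFeaturesToTrack(prev_gray, max_corners, 0.01, 7)
    if pts is None:
        return np.zeros((0, 2), np.float32)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None)
    good = status.reshape(-1) == 1
    return (nxt[good] - pts[good]).reshape(-1, 2)   # per-point motion vectors

def canny_per_channel(frame, low=100, high=200):
    # Edge map computed separately for each of the three colour channels.
    return np.dstack([cv2.Canny(frame[:, :, c], low, high) for c in range(3)])
```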
S353: each type of input image is separately fed into the corresponding 3DCNN network, the 3DCNN network is used to identify the input images, and the output of the 3DCNN network, namely the output probability values of the video tags corresponding to the input images, is obtained.
Correspondingly, after the four types of input image have been obtained, each type may be separately fed into the corresponding 3DCNN network. As shown in Fig. 3f, the 3DCNN network may include modules such as M3, M4 and M5. The M3 module is the backbone network of the 3DCNN; the inputs of 3DCNN1, 3DCNN2, 3DCNN3 and 3DCNN4 are, respectively, the high-resolution image, the low-resolution image, the optical-flow image and the edge image. For the convolution kernels of the 3DCNN network, spatial sizes of 3*3 and 5*5 may be chosen, cascading multiple small kernels. Max pooling is used; the temporal pooling size starts at 1 and is then gradually increased, taking the values 2, 3 and 4 in turn. This setting prevents temporal information from being fused prematurely. The M4 module is a fully connected layer (FC); to prevent the number of network parameters from becoming excessive, the embodiment of the present invention uses only one fully connected layer. The M5 module is essentially also a fully connected layer, but its number of nodes equals the number of classes, and it predicts the output probability value of each class for the type of input image handled by the current 3DCNN.
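The following PyTorch sketch is purely illustrative: the patent fixes only the general shape of a branch (3*3 or 5*5 spatial kernels, max pooling with a temporal size that starts at 1 and grows, one fully connected layer, a class-sized output layer), so the channel widths, the exact pooling schedule and the adaptive pooling used here to keep the dimensions valid for 16-frame clips are assumptions.

```python
import torch
import torch.nn as nn

class Branch3DCNN(nn.Module):
    """One illustrative 3D-CNN branch for 16-frame, 112x112 inputs."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),           # temporal size 1: keep time early on
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(kernel_size=(2, 2, 2)),           # temporal size grows afterwards
            nn.Conv3d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(kernel_size=(2, 2, 2)),
            nn.Conv3d(128, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),                       # collapse the remaining T, H, W
        )
        self.fc = nn.Linear(128, 256)                      # single fully connected layer (M4)
        self.classifier = nn.Linear(256, num_classes)      # class-probability layer (M5)

    def forward(self, x):                                  # x: (B, 3, 16, 112, 112)
        h = self.features(x).flatten(1)
        h = torch.relu(self.fc(h))
        return torch.softmax(self.classifier(h), dim=1)    # per-class output probabilities
```

Four such branches, one per input type, produce the per-class probability values that are fused in S354.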
S354: the output probability values of each video tag are fused according to a set fusion mode, and the video tags obtained after fusion are combined into the second recognition result.
Correspondingly, as shown in Fig. 3f, the Output module fuses the output probability values of the four preceding 3DCNNs: the probability values that the four 3DCNN networks assign to the same class are multiplied, the result serves as the fused output probability value of each class for the input images, and at prediction time the output of this layer is taken as the result.
Illustratively, assume the classification model has two classes, and the output probability values of the four 3DCNNs are, respectively: dancing: 0.9, smoking: 0.1; dancing: 0.8, smoking: 0.2; dancing: 0.9, smoking: 0.1; dancing: 0.7, smoking: 0.3. The weight corresponding to the output probability values of each 3DCNN defaults to 0.25, so the second recognition result corresponding to the video clip may be: dancing: 0.825, computed as 0.9*0.25+0.8*0.25+0.9*0.25+0.7*0.25=0.825, and smoking: 0.175, computed as 0.1*0.25+0.2*0.25+0.1*0.25+0.3*0.25=0.175.
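The preceding paragraph states that the per-class probabilities are multiplied, while the worked example combines them with equal weights of 0.25; the sketch below (illustrative only, function name assumed) follows the worked example and reproduces its numbers.

```python
import numpy as np

def fuse_branch_outputs(branch_probs, weights=None):
    """Fuse per-class probabilities from the 3D-CNN branches by a weighted average.
    branch_probs: (num_branches, num_classes); weights default to equal (0.25 each for four branches)."""
    branch_probs = np.asarray(branch_probs, dtype=np.float64)
    if weights is None:
        weights = np.full(len(branch_probs), 1.0 / len(branch_probs))
    return np.average(branch_probs, axis=0, weights=weights)

# Worked example from the text: classes (dancing, smoking), four branches.
print(fuse_branch_outputs([[0.9, 0.1], [0.8, 0.2], [0.9, 0.1], [0.7, 0.3]]))
# -> [0.825 0.175]
```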
It should be noted that, in the embodiment of the present invention, a video clip is a segment of the simple video subfile whose content is relatively homogeneous after slicing. A long video segment may, for example, show smoking in its first 10 seconds and football in the following 10 seconds; such a video is cut into two video clips. For a video in which smoking and football both occur within the same 10 seconds, no cut is needed, and the two labels 'smoking' and 'playing football' together serve as the second recognition result of that video.
Correspondingly, as shown in Fig. 3g, obtaining the third recognition result may specifically include the following operations:
S360: audio identification is performed on the simple audio subfile to obtain the third recognition result.
Specifically, S360 may include the following operations:
S361: the simple audio subfile is pre-processed and then a fast Fourier transform is applied to obtain the frequency-domain information of the simple audio subfile.
In the embodiment of the present invention, when audio identification is performed on the simple audio subfile, the simple audio subfile may first be pre-processed (including audio-signal pre-emphasis, signal windowing, etc.), after which the fast Fourier transform is used to obtain the frequency-domain information of the simple audio subfile. The fast Fourier transform is a fast algorithm for the discrete Fourier transform; it optimizes the computation by exploiting the parity properties of the discrete Fourier transform, reducing the complexity from O(n^2) to O(n log n). The fast Fourier transform formula may be expressed as:
X(k) = Σ_{n=0}^{N−1} x(n) · e^{−j2πkn/N}, k = 0, 1, …, N−1
where X(k) denotes the signal sequence after the Fourier transform, x(n) denotes the discrete audio sequence after sampling, n indexes the audio sequence, k indexes the frequency-domain sequence, and N denotes the length of the Fourier transform interval.
S362: the energy spectrum corresponding to the frequency-domain information of each frame of the simple audio subfile is calculated.
Correspondingly, after the frequency-domain information of the simple audio subfile has been obtained, a modulus-square operation may be applied to the frequency-domain information to calculate the energy spectrum of each frame of the signal, and the signal is then filtered with a Mel filter bank to calculate the Mel-filtered energy. Expressing the signal in complex form as
X(k) = a·e^{jθ_k} = a·cos θ_k + j·a·sin θ_k = a_k + j·b_k
the signal energy spectrum is expressed as
E(k) = |X(k)|^2 = a_k^2 + b_k^2
where E(k) denotes the frame energy at the k-th frequency bin, which is subsequently passed through the Mel filter bank to obtain the Mel-filtered energy.
S363: the logarithmic Mel spectrum energy is obtained from the energy spectrum.
Logarithmic spectral features retain high-frequency information well and make audio identification more stable in complex scenes. Therefore, in the embodiment of the present invention, audio identification may be performed on the simple audio subfile based on the logarithmic Mel spectrum energy.
Specifically, the logarithmic Mel spectrum energy may be calculated by the following formula:
E(n) = log( Σ_k C(k) · H_n(k) )
where E(n) denotes the logarithmic Mel spectrum energy corresponding to the n-th Mel filter, C(k) denotes the energy of the k-th section of the audio signal, and H_n(k) denotes the frequency response of the n-th Mel filter.
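A compact NumPy/Librosa sketch of S361 to S363 for a single frame is shown below; it is illustrative only, the function name and the small additive constant are assumptions, and librosa.filters.mel is used merely as a convenient way to obtain the filter responses H_n(k).

```python
import numpy as np
import librosa

def log_mel_energies(frame, sr=44100, n_mels=32):
    """Per-frame pipeline: windowed FFT -> energy spectrum -> Mel filter bank -> log."""
    frame = np.asarray(frame, dtype=np.float64)
    n_fft = len(frame)
    spectrum = np.fft.rfft(frame * np.hamming(n_fft))                 # X(k), frequency-domain info
    energy = np.abs(spectrum) ** 2                                    # C(k) = |X(k)|^2
    mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)   # H_n(k), one row per filter
    return np.log(mel_fb @ energy + 1e-10)                            # E(n) = log(sum_k C(k) H_n(k))
```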
S364: the logarithmic Mel spectrum features are extracted from the logarithmic Mel spectrum energy.
Correspondingly, in the embodiment of the present invention, in the feature-processing stage of the logarithmic Mel spectrum energy, the Librosa library may be used to extract the audio features. The sample rate is set to 44100 Hz, the frame length to 30 ms and the pre-emphasis coefficient to 0.89, with a Hamming window function. The Mel spectrum feature coefficients have 32 dimensions, and their first-order and second-order difference features are computed, forming a 96-dimensional feature vector in total.
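For illustration, the Librosa-based sketch below assembles these 96-dimensional vectors (32 log-Mel dimensions plus first- and second-order differences). The function name and the hop length of half a frame are assumptions not stated in the text; the sample rate, frame length, pre-emphasis coefficient, window and Mel dimensionality follow the values above.

```python
import numpy as np
import librosa

def audio_features(path, sr=44100):
    """32-dim log-Mel spectrum + first- and second-order deltas = 96-dim frame vectors."""
    y, sr = librosa.load(path, sr=sr)
    y = librosa.effects.preemphasis(y, coef=0.89)          # pre-emphasis coefficient 0.89
    frame = int(0.030 * sr)                                # 30 ms frame length
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=frame,
                                         hop_length=frame // 2,   # assumed 50% hop
                                         n_mels=32, window='hamming')
    logmel = librosa.power_to_db(mel)                      # (32, n_frames) log-Mel spectrum
    d1 = librosa.feature.delta(logmel, order=1)            # first-order difference
    d2 = librosa.feature.delta(logmel, order=2)            # second-order difference
    return np.vstack([logmel, d1, d2])                     # (96, n_frames) 2-D feature map
```

Stacking the three blocks already yields the two-dimensional (feature-dimension * frame) layout that S365 describes.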
S365: the logarithmic Mel spectrum features are reconstructed to obtain two-dimensional audio features.
Fig. 3h is a schematic diagram of a logarithmic Mel spectrum feature provided by the third embodiment of the present invention. As shown in Fig. 3h, in the embodiment of the present invention, after the logarithmic Mel spectrum features are obtained, the resulting one-dimensional logarithmic Mel spectrum audio features are reconstructed to obtain a two-dimensional audio feature distribution, where the dimensions of the feature map are (number of frequency bands * audio frame length).
The operations of S361-S365 above belong to audio-feature pre-processing, while the operation of S366 below belongs to deep-learning-based data classification.
S366: feature extraction and audio classification are performed on the two-dimensional audio features through basic CNN structural units.
In the embodiment of the present invention, basic CNN structural units may be used to implement further feature extraction and audio classification. Through local connections and parameter sharing, the convolutional layers not only reduce the computational cost but also preserve the spatial distribution pattern of the data. The basic structural unit mainly comprises a convolutional layer, a pooling layer and an activation function; the classification layer uses the Softmax function, the loss function is the cross-entropy loss, the initial learning rate is set to 0.001, the batch size is 128, and the stochastic gradient descent optimization algorithm is used.
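A PyTorch sketch of such a classifier and its training configuration is given below for illustration; the channel widths and network depth are assumptions, while the loss, optimizer, learning rate and batch size follow the values stated above.

```python
import torch
import torch.nn as nn

class AudioCNN(nn.Module):
    """Basic structural units: convolution + pooling + activation, then a class-sized output layer."""
    def __init__(self, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, num_classes),
        )

    def forward(self, x):               # x: (batch, 1, 96, n_frames) two-dimensional audio features
        return self.net(x)              # logits; Softmax is applied inside the cross-entropy loss

model = AudioCNN(num_classes=2)
criterion = nn.CrossEntropyLoss()                            # cross-entropy loss function
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)    # SGD, initial learning rate 0.001
batch_size = 128
```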
In an optional embodiment of the present invention, when the basic CNN structural unit uses multiple classifiers, the simple audio subfile is identified by means of voting.
Illustratively, when three classifiers are used, suppose the obtained recognition results are, respectively: label 1: smoking, confidence: 0.9, weight: 0.8; label 2: smoking, confidence: 0.8, weight: 0.1; label 3: dancing, confidence: 0.5, weight: 0.1. The voting may then proceed as follows: the identical label shared by label 1 and label 2, smoking, is taken, and the corresponding final recognition result may be: label: smoking, confidence: 0.8*0.9+0.1*0.8+0.1*0=0.8.
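The small helper below (illustrative only, function name assumed) implements this weighted vote and reproduces the example: the label with the largest total classifier weight wins, and its confidence is the weight-weighted sum of the confidences of the classifiers that chose it (classifiers that chose another label contribute zero).

```python
from collections import defaultdict

def vote(predictions):
    """predictions: list of (label, confidence, weight) tuples, one per classifier."""
    weight_sum = defaultdict(float)   # total weight behind each label
    conf_sum = defaultdict(float)     # weight-weighted confidence for each label
    for label, conf, weight in predictions:
        weight_sum[label] += weight
        conf_sum[label] += weight * conf
    winner = max(weight_sum, key=weight_sum.get)
    return winner, conf_sum[winner]

# Example from the text: three classifiers.
print(vote([("smoking", 0.9, 0.8), ("smoking", 0.8, 0.1), ("dancing", 0.5, 0.1)]))
# -> ('smoking', 0.8)
```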
Fig. 3i is a schematic diagram of a video recognition algorithm architecture provided by the third embodiment of the present invention. In a specific example, as shown in Fig. 3i, after the video file to be identified is obtained, it is separated into a simple video subfile and a simple audio subfile, where the simple video subfile is a sequence of consecutive frames. The simple video subfile is processed by slicing and key-frame extraction to form video clips and key frames. Multi-modal recognition may be performed on the key-frame pictures to obtain the first recognition result corresponding to picture recognition; the multi-modal recognition may include picture classification, target detection, face detection and the like, where picture classification may be realized using OCR or NLP (Natural Language Processing) methods. For the simple video subfile, video identification may be performed with the 3DCNN network to obtain the video classification, i.e. the second recognition result corresponding to video identification. When audio identification is performed on the simple audio subfile, audio classification may also be carried out: the text information in the simple audio subfile can be extracted and identified with NLP methods, and non-text information can be identified as speech audio or non-speech audio, etc., so as to obtain the third recognition result corresponding to audio identification. Finally, the post-processing module may integrate the three recognition results to form the integrated recognition result, which includes the finally determined labels and confidence information, among other items.
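The patent does not specify the merging rule used by the post-processing module, so the sketch below is only one plausible illustration (function name and max-rule assumed): each branch contributes a label-to-confidence map, and the integrated result keeps, for every label, the highest confidence reported by any branch.

```python
def integrate_results(picture_result, video_result, audio_result):
    """Merge the first (picture), second (video) and third (audio) recognition results
    into one label -> confidence map (assumed max-confidence merging rule)."""
    integrated = {}
    for result in (picture_result, video_result, audio_result):
        for label, confidence in result.items():
            integrated[label] = max(integrated.get(label, 0.0), confidence)
    return integrated

print(integrate_results({"smoking": 0.7},
                        {"dancing": 0.825, "smoking": 0.175},
                        {"smoking": 0.8}))
# -> {'smoking': 0.8, 'dancing': 0.825}
```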
By obtaining each recognition result in the above manner, the technical solution solves the problems of single identification content and narrow identification range in existing video audit technology; it enriches the identification types, refines the identification content and identifies video content in multiple dimensions, improving the richness, accuracy, efficiency and real-time performance of video identification technology while reducing identification cost.
Embodiment four
Fig. 4 is a schematic diagram of a video identification device provided by the fourth embodiment of the present invention. As shown in Fig. 4, the device includes a subfile obtaining module 410, a first identification module 420, a second identification module 430 and a recognition result obtaining module 440, wherein:
the subfile obtaining module 410 is configured to obtain a simple video subfile and a simple audio subfile corresponding to the video file to be identified, and to obtain a key frame set and a video clip set corresponding to the simple video subfile;
the first identification module 420 is configured to perform multi-modal picture recognition on the key frame set to obtain the first recognition result, and to perform video identification on the video clip set to obtain the second recognition result;
the second identification module 430 is configured to perform audio identification on the simple audio subfile to obtain the third recognition result;
the recognition result obtaining module 440 is configured to obtain, according to the first recognition result, the second recognition result and the third recognition result, the integrated recognition result corresponding to the video file.
The embodiment of the present invention obtains a simple video subfile and a simple audio subfile corresponding to the video file to be identified, together with a key frame set and a video clip set corresponding to the simple video subfile; performs multi-modal picture recognition on the key frame set to obtain the first recognition result; performs video identification on the video clip set to obtain the second recognition result; performs audio identification on the simple audio subfile to obtain the third recognition result; and finally integrates the first recognition result, the second recognition result and the third recognition result to obtain the integrated recognition result of the video file. This solves the problems of single identification content and narrow identification range in existing video audit technology, enriches the identification types, refines the identification content, identifies video content in multiple dimensions, and improves the richness, accuracy, efficiency and real-time performance of video identification technology while reducing identification cost.
Optionally, the subfile obtaining module 410 comprises: a filtered video frame set obtaining unit, configured to filter the simple video subfile with a coarse video-frame filtering technique to obtain a filtered video frame set; a cluster obtaining unit, configured to calculate the feature vector corresponding to each filtered video frame in the filtered video frame set and to cluster the filtered video frames in the filtered video frame set according to the feature vectors to obtain at least two clusters, wherein each cluster contains at least one filtered video frame; a key frame set forming unit, configured to obtain, from each cluster, the filtered video frame with the highest static score to form the key frame set; and a video clip set obtaining unit, configured to determine, according to the time parameters of the filtered video frames contained in each cluster within the simple video subfile, the start time and duration corresponding to each cluster, and to slice the simple video subfile according to the start times and durations to obtain the video clip set.
Optionally, the cluster obtaining unit is specifically configured to perform feature extraction on each filtered video frame in the filtered video frame set using a convolutional neural network model; or to perform feature extraction on each filtered video frame in the filtered video frame set using local binary patterns (LBP) and to process each feature extraction result into a statistical histogram serving as the LBP feature vector corresponding to each filtered video frame.
Optionally, the first identification module 420 is specifically configured to perform picture classification on each key frame in the key frame set using a preset picture classification model and to take the classification results as the first recognition result;
and/or
to separately input each key frame in the key frame set into a pre-trained YOLOv3 model, and to obtain the output of the YOLOv3 model, namely the target object labels corresponding to each key frame and the position coordinates of the target objects within the key frames, as the first recognition result;
and/or
to perform face detection on each key frame in the key frame set using the S3FD algorithm; to perform face key point localization on the detected faces through the MTCNN algorithm to obtain the face key points; to perform feature extraction on the face images through the Arcface algorithm according to the face key points; to match the extracted face features against the features in a feature library, to identify the person information corresponding to each key frame according to the matching results, and to take the identification results of the face information as the first recognition result.
Optionally, the first identification module 420 is further configured to perform temporal down-sampling on each video clip in the video clip set to obtain the sampled video frame set corresponding to each video clip;
to perform the setting processing operations on the sampled video frame set in the spatial and temporal domains to obtain at least two types of input image, where the setting processing operations include scaling, optical-flow extraction and edge-image extraction, and the types of input image include high-resolution images, low-resolution images, optical-flow images and edge images; to separately input each type of input image into the corresponding 3DCNN network, to identify the input images using the 3DCNN network, and to obtain the output of the 3DCNN network, namely the output probability values of the video tags corresponding to the input images; and to fuse the output probability values of each video tag according to the set fusion mode and combine the fused video tags into the second recognition result.
Optionally, the second identification module 430 is specifically configured to pre-process the simple audio subfile and then apply a fast Fourier transform to obtain the frequency-domain information of the simple audio subfile; to calculate the energy spectrum corresponding to the frequency-domain information of each frame of the simple audio subfile; to obtain the logarithmic Mel spectrum energy from the energy spectrum; to extract the logarithmic Mel spectrum features from the logarithmic Mel spectrum energy; to reconstruct the logarithmic Mel spectrum features into two-dimensional audio features; and to perform feature extraction and audio classification on the two-dimensional audio features through basic CNN structural units.
Optionally, when the basic CNN structural unit uses multiple classifiers, the simple audio subfile is identified by means of voting.
The video identification device described above can execute the video identification method provided by any embodiment of the present invention and has the functional modules and beneficial effects corresponding to executing that method. For technical details not described in detail in this embodiment, reference may be made to the video identification method provided by any embodiment of the present invention.
Embodiment five
Fig. 5 is a structural schematic diagram of a computer device provided by the fifth embodiment of the present invention. Fig. 5 shows a block diagram of a computer device 512 suitable for implementing embodiments of the present invention. The computer device 512 shown in Fig. 5 is only an example and should not impose any restriction on the functions and scope of use of the embodiments of the present invention.
As shown in Fig. 5, the computer device 512 takes the form of a general-purpose computing device. The components of the computer device 512 may include, but are not limited to, one or more processors 516, a storage device 528 and a bus 518 connecting the different system components (including the storage device 528 and the processors 516).
The bus 518 represents one or more of several classes of bus structure, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus and the Peripheral Component Interconnect (PCI) bus.
The computer device 512 typically comprises a variety of computer-system-readable media. These media may be any usable media that can be accessed by the computer device 512, including volatile and non-volatile media, and removable and non-removable media.
The storage device 528 may include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 530 and/or a cache memory 532. The computer device 512 may further include other removable/non-removable, volatile/non-volatile computer-system storage media. By way of example only, the storage system 534 may be used to read and write non-removable, non-volatile magnetic media (not shown in Fig. 5, commonly referred to as a 'hard disk drive'). Although not shown in Fig. 5, a disk drive for reading and writing removable non-volatile magnetic disks (e.g. 'floppy disks') and an optical disc drive for reading and writing removable non-volatile optical discs (such as CD-ROM, DVD-ROM or other optical media) may be provided. In these cases, each drive may be connected to the bus 518 through one or more data media interfaces. The storage device 528 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the various embodiments of the present invention.
A program 536 having a set of (at least one) program modules 526 may be stored, for example, in the storage device 528. Such program modules 526 include, but are not limited to, an operating system, one or more application programs, other program modules and program data, and each or some combination of these examples may include an implementation of a network environment. The program modules 526 generally perform the functions and/or methods of the embodiments described in the present invention.
The computer device 512 may also communicate with one or more external devices 514 (such as a keyboard, pointing device, camera, display 524, etc.), with one or more devices that enable a user to interact with the computer device 512, and/or with any device (such as a network card, modem, etc.) that enables the computer device 512 to communicate with one or more other computing devices. Such communication may take place through input/output (I/O) interfaces 522. Moreover, the computer device 512 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through a network adapter 520. As shown, the network adapter 520 communicates with the other modules of the computer device 512 through the bus 518. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the computer device 512, including but not limited to microcode, device drivers, redundant processing units, external disk drive arrays, Redundant Arrays of Independent Disks (RAID) systems, tape drives and data backup storage systems, etc.
The processor 516 runs the programs stored in the storage device 528 to execute various functional applications and data processing, for example to implement the video identification method provided by the above embodiments of the present invention.
That is, the processing unit, when executing the program, implements: obtaining a simple video subfile and a simple audio subfile corresponding to the video file to be identified, and obtaining a key frame set and a video clip set corresponding to the simple video subfile; performing multi-modal picture recognition on the key frame set to obtain the first recognition result, and performing video identification on the video clip set to obtain the second recognition result; performing audio identification on the simple audio subfile to obtain the third recognition result; and obtaining, according to the first recognition result, the second recognition result and the third recognition result, the integrated recognition result corresponding to the video file.
The computer device obtains a simple video subfile and a simple audio subfile corresponding to the video file to be identified, together with a key frame set and a video clip set corresponding to the simple video subfile; performs multi-modal picture recognition on the key frame set to obtain the first recognition result; performs video identification on the video clip set to obtain the second recognition result; performs audio identification on the simple audio subfile to obtain the third recognition result; and finally integrates the first recognition result, the second recognition result and the third recognition result to obtain the integrated recognition result of the video file. This solves the problems of single identification content and narrow identification range in existing video audit technology, enriches the identification types, refines the identification content, identifies video content in multiple dimensions, and improves the richness, accuracy, efficiency and real-time performance of video identification technology while reducing identification cost.
Embodiment six
Embodiment six of the present invention also provides a computer storage medium storing a computer program which, when executed by a computer processor, performs the video identification method of any of the above embodiments of the present invention: obtaining a simple video subfile and a simple audio subfile corresponding to the video file to be identified, and obtaining a key frame set and a video clip set corresponding to the simple video subfile; performing multi-modal picture recognition on the key frame set to obtain the first recognition result, and performing video identification on the video clip set to obtain the second recognition result; performing audio identification on the simple audio subfile to obtain the third recognition result; and obtaining, according to the first recognition result, the second recognition result and the third recognition result, the integrated recognition result corresponding to the video file.
The computer storage medium of the embodiment of the present invention may adopt any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more conductors, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by, or in connection with, an instruction execution system, apparatus or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by, or in connection with, an instruction execution system, apparatus or device.
The program code contained on a computer-readable medium may be transmitted over any appropriate medium, including but not limited to wireless, wire, optical cable, radio frequency (RF), etc., or any suitable combination of the above.
Computer program code for carrying out the operations of the present invention may be written in one or more programming languages or combinations thereof. The programming languages include object-oriented programming languages such as Java, Smalltalk, C++ or Python, and also conventional procedural programming languages such as the 'C' language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will appreciate that the invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, it is not limited to the above embodiments; without departing from the inventive concept, it may also include more other equivalent embodiments, and the scope of the invention is determined by the scope of the appended claims.

Claims (10)

CN201811113391.2A2018-09-252018-09-25A kind of video frequency identifying method, device, computer equipment and storage mediumPendingCN109376603A (en)

Priority Applications (1)

Application NumberPriority DateFiling DateTitle
CN201811113391.2ACN109376603A (en)2018-09-252018-09-25A kind of video frequency identifying method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application NumberPriority DateFiling DateTitle
CN201811113391.2ACN109376603A (en)2018-09-252018-09-25A kind of video frequency identifying method, device, computer equipment and storage medium

Publications (1)

Publication NumberPublication Date
CN109376603Atrue CN109376603A (en)2019-02-22

Family

ID=65401655

Family Applications (1)

Application NumberTitlePriority DateFiling Date
CN201811113391.2APendingCN109376603A (en)2018-09-252018-09-25A kind of video frequency identifying method, device, computer equipment and storage medium

Country Status (1)

CountryLink
CN (1)CN109376603A (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
US20100284617A1 (en)*2006-06-092010-11-11Sony Ericsson Mobile Communications AbIdentification of an object in media and of related media objects
WO2013122675A2 (en)*2011-12-162013-08-22The Research Foundation For The State University Of New YorkMethods of recognizing activity in video
CN103854014A (en)*2014-02-252014-06-11中国科学院自动化研究所 A horror video recognition method and device based on context sparse representation
CN104021544A (en)*2014-05-072014-09-03中国农业大学Greenhouse vegetable disease surveillance video key frame extracting method and extracting system
CN105550699A (en)*2015-12-082016-05-04北京工业大学CNN-based video identification and classification method through time-space significant information fusion
CN107590420A (en)*2016-07-072018-01-16北京新岸线网络技术有限公司Scene extraction method of key frame and device in video analysis
CN106599789A (en)*2016-07-292017-04-26北京市商汤科技开发有限公司Video class identification method and device, data processing device and electronic device
CN106407960A (en)*2016-11-092017-02-15浙江师范大学Multi-feature-based classification method and system for music genres
CN107247919A (en)*2017-04-282017-10-13深圳大学The acquisition methods and system of a kind of video feeling content
CN107609497A (en)*2017-08-312018-01-19武汉世纪金桥安全技术有限公司The real-time video face identification method and system of view-based access control model tracking technique
CN108053838A (en)*2017-12-012018-05-18上海壹账通金融科技有限公司With reference to audio analysis and fraud recognition methods, device and the storage medium of video analysis

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG, Shen: "Research on Multi-Feature Fusion Key Frame Extraction Technology Based on Clustering Algorithm", China Master's Theses Full-text Database, Information Science and Technology Series*

Cited By (108)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN110147711A (en)*2019-02-272019-08-20腾讯科技(深圳)有限公司Video scene recognition methods, device, storage medium and electronic device
CN110147711B (en)*2019-02-272023-11-14腾讯科技(深圳)有限公司Video scene recognition method and device, storage medium and electronic device
CN109829067A (en)*2019-03-052019-05-31北京达佳互联信息技术有限公司Audio data processing method, device, electronic equipment and storage medium
CN109886241A (en)*2019-03-052019-06-14天津工业大学 Driver fatigue detection based on long short-term memory network
CN111724810B (en)*2019-03-192023-11-24杭州海康威视数字技术股份有限公司Audio classification method and device
CN111724810A (en)*2019-03-192020-09-29杭州海康威视数字技术股份有限公司Audio classification method and device
CN109862394A (en)*2019-03-272019-06-07北京周同科技有限公司Checking method, device, equipment and the storage medium of video content
CN110110846A (en)*2019-04-242019-08-09重庆邮电大学Auxiliary driver's vehicle exchange method based on convolutional neural networks
CN110176027B (en)*2019-05-272023-03-14腾讯科技(深圳)有限公司Video target tracking method, device, equipment and storage medium
CN110176027A (en)*2019-05-272019-08-27腾讯科技(深圳)有限公司Video target tracking method, device, equipment and storage medium
CN110334602A (en)*2019-06-062019-10-15武汉市公安局视频侦查支队A kind of people flow rate statistical method based on convolutional neural networks
CN110210430A (en)*2019-06-062019-09-06中国石油大学(华东)A kind of Activity recognition method and device
CN110334602B (en)*2019-06-062021-10-26武汉市公安局视频侦查支队People flow statistical method based on convolutional neural network
CN110298291A (en)*2019-06-252019-10-01吉林大学Ox face and ox face critical point detection method based on Mask-RCNN
CN110298291B (en)*2019-06-252022-09-23吉林大学Mask-RCNN-based cow face and cow face key point detection method
CN112149463B (en)*2019-06-272024-04-23京东方科技集团股份有限公司 Image processing method and device
CN112149463A (en)*2019-06-272020-12-29京东方科技集团股份有限公司Image processing method and device
CN112241673A (en)*2019-07-192021-01-19浙江商汤科技开发有限公司Video method and device, electronic equipment and storage medium
CN111783507A (en)*2019-07-242020-10-16北京京东尚科信息技术有限公司 Target search method, apparatus, and computer-readable storage medium
CN110490098A (en)*2019-07-312019-11-22恒大智慧科技有限公司Smoking behavior automatic testing method, equipment and the readable storage medium storing program for executing of community user
CN112347821A (en)*2019-08-092021-02-09飞思达技术(北京)有限公司Method for extracting IPTV (Internet protocol television) and OTT (over the top) video features based on convolutional neural network
CN110750770A (en)*2019-08-182020-02-04浙江好络维医疗技术有限公司Method for unlocking electronic equipment based on electrocardiogram
CN110750770B (en)*2019-08-182023-10-03浙江好络维医疗技术有限公司Electrocardiogram-based method for unlocking electronic equipment
CN110647831A (en)*2019-09-122020-01-03华宇(大连)信息服务有限公司Court trial patrol method and system
CN110851148A (en)*2019-09-232020-02-28上海意略明数字科技股份有限公司Analysis system and method for recognizing user behavior data based on intelligent image
CN110717428A (en)*2019-09-272020-01-21上海依图网络科技有限公司Identity recognition method, device, system, medium and equipment fusing multiple features
CN110853636B (en)*2019-10-152022-04-15北京雷石天地电子技术有限公司System and method for generating word-by-word lyric file based on K nearest neighbor algorithm
CN110853636A (en)*2019-10-152020-02-28北京雷石天地电子技术有限公司 A system and method for generating verbatim lyrics files based on K-nearest neighbor algorithm
WO2021082941A1 (en)*2019-10-282021-05-06Oppo广东移动通信有限公司Video figure recognition method and apparatus, and storage medium and electronic device
CN110909613A (en)*2019-10-282020-03-24Oppo广东移动通信有限公司 Video person recognition method, device, storage medium and electronic device
CN110909613B (en)*2019-10-282024-05-31Oppo广东移动通信有限公司Video character recognition method and device, storage medium and electronic equipment
CN111031330A (en)*2019-10-292020-04-17中国科学院大学 A method for content analysis of webcast based on multimodal fusion
CN110991246A (en)*2019-10-312020-04-10天津市国瑞数码安全系统股份有限公司Video detection method and system
CN110798703A (en)*2019-11-042020-02-14云目未来科技(北京)有限公司Method and device for detecting illegal video content and storage medium
CN110755108A (en)*2019-11-042020-02-07合肥望闻健康科技有限公司Heart sound classification method, system and device based on intelligent stethoscope and readable storage medium
CN110852231A (en)*2019-11-042020-02-28云目未来科技(北京)有限公司Illegal video detection method and device and storage medium
CN110942011B (en)*2019-11-182021-02-02上海极链网络科技有限公司Video event identification method, system, electronic equipment and medium
CN110942011A (en)*2019-11-182020-03-31上海极链网络科技有限公司Video event identification method, system, electronic equipment and medium
CN110879985A (en)*2019-11-182020-03-13西南交通大学 A face recognition model training method for anti-noise data
CN110956108B (en)*2019-11-222023-04-18华南理工大学Small frequency scale detection method based on characteristic pyramid
CN110956108A (en)*2019-11-222020-04-03华南理工大学 A Small Frequency Standard Detection Method Based on Feature Pyramid
CN110996123B (en)*2019-12-182022-01-11广州市百果园信息技术有限公司Video processing method, device, equipment and medium
CN110996123A (en)*2019-12-182020-04-10广州市百果园信息技术有限公司Video processing method, device, equipment and medium
CN111191207A (en)*2019-12-232020-05-22深圳壹账通智能科技有限公司Electronic file control method and device, computer equipment and storage medium
CN111047879A (en)*2019-12-242020-04-21苏州奥易克斯汽车电子有限公司Vehicle overspeed detection method
CN113055666A (en)*2019-12-262021-06-29武汉Tcl集团工业研究院有限公司Video quality evaluation method and device
CN113055666B (en)*2019-12-262022-08-09武汉Tcl集团工业研究院有限公司Video quality evaluation method and device
CN111157007A (en)*2020-01-162020-05-15深圳市守行智能科技有限公司Indoor positioning method using cross vision
CN111356014A (en)*2020-02-182020-06-30南京中新赛克科技有限责任公司Youtube video identification and matching method based on automatic learning
CN111428591A (en)*2020-03-112020-07-17天津华来科技有限公司AI face image processing method, device, equipment and storage medium
CN111414496A (en)*2020-03-272020-07-14腾讯科技(深圳)有限公司Artificial intelligence-based multimedia file detection method and device
CN111414496B (en)*2020-03-272023-04-07腾讯科技(深圳)有限公司Artificial intelligence-based multimedia file detection method and device
CN111541940A (en)*2020-04-302020-08-14深圳创维-Rgb电子有限公司Motion compensation method and device for display equipment, television and storage medium
CN111541940B (en)*2020-04-302022-04-08深圳创维-Rgb电子有限公司 Motion compensation method, device, television and storage medium for display device
CN111563551A (en)*2020-04-302020-08-21支付宝(杭州)信息技术有限公司Multi-mode information fusion method and device and electronic equipment
CN111753762A (en)*2020-06-282020-10-09北京百度网讯科技有限公司 Recognition method, device, device and storage medium for key identification in video
CN111753762B (en)*2020-06-282024-03-15北京百度网讯科技有限公司Method, device, equipment and storage medium for identifying key identification in video
CN111860222A (en)*2020-06-302020-10-30东南大学 Video action recognition method, system, computer equipment and storage medium based on dense-segmented frame sampling
CN111783718A (en)*2020-07-102020-10-16浙江大华技术股份有限公司Target object state identification method and device, storage medium and electronic device
CN111563488A (en)*2020-07-142020-08-21成都市映潮科技股份有限公司Video subject content identification method, system and storage medium
CN111985345A (en)*2020-07-272020-11-24腾讯科技(深圳)有限公司Play data processing method and medium
CN111914759A (en)*2020-08-042020-11-10苏州市职业大学Pedestrian re-identification method, device, equipment and medium based on video clip
CN111914759B (en)*2020-08-042024-02-13苏州市职业大学Pedestrian re-identification method, device, equipment and medium based on video clips
CN112052441B (en)*2020-08-242021-09-28深圳市芯汇群微电子技术有限公司Data decryption method of solid state disk based on face recognition and electronic equipment
CN112052441A (en)*2020-08-242020-12-08深圳市芯汇群微电子技术有限公司Data decryption method of solid state disk based on face recognition and electronic equipment
CN111741356A (en)*2020-08-252020-10-02腾讯科技(深圳)有限公司 Quality inspection method, device, device and readable storage medium for double-recording video
CN111741356B (en)*2020-08-252020-12-08腾讯科技(深圳)有限公司 Quality inspection method, device, device and readable storage medium for double-recording video
CN114155454B (en)*2020-09-072025-04-04中国移动通信有限公司研究院 Video processing method, device and storage medium
CN114155454A (en)*2020-09-072022-03-08中国移动通信有限公司研究院Video processing method, device and storage medium
CN112150431A (en) * 2020-09-21 2020-12-29 京东数字科技控股股份有限公司 UI visual walkthrough method and device, storage medium and electronic device
CN112052911A (en) * 2020-09-23 2020-12-08 恒安嘉新(北京)科技股份公司 Method and device for identifying riot and terrorist content in image, electronic equipment and storage medium
CN112231497B (en) * 2020-10-19 2024-04-09 腾讯科技(深圳)有限公司 Information classification method and device, storage medium and electronic equipment
CN112231497A (en) * 2020-10-19 2021-01-15 腾讯科技(深圳)有限公司 Information classification method and device, storage medium and electronic equipment
CN112581438A (en) * 2020-12-10 2021-03-30 腾讯科技(深圳)有限公司 Slice image recognition method and device, storage medium and electronic equipment
CN112581438B (en) * 2020-12-10 2022-11-08 腾讯医疗健康(深圳)有限公司 Slice image recognition method and device, storage medium and electronic equipment
CN112995666A (en) * 2021-02-22 2021-06-18 天翼爱音乐文化科技有限公司 Video horizontal and vertical screen conversion method and device combined with scene switching detection
CN113077470B (en) * 2021-03-26 2022-01-18 天翼爱音乐文化科技有限公司 Method, system, device and medium for cutting horizontal and vertical screen conversion picture
CN113077470A (en) * 2021-03-26 2021-07-06 天翼爱音乐文化科技有限公司 Method, system, device and medium for cutting horizontal and vertical screen conversion picture
CN113705563A (en) * 2021-04-13 2021-11-26 腾讯科技(深圳)有限公司 Data processing method, device, equipment and storage medium
CN113076566B (en) * 2021-04-26 2024-02-27 深圳市三旺通信股份有限公司 Display content detection method, apparatus, computer program product, and storage medium
CN113076566A (en) * 2021-04-26 2021-07-06 深圳市三旺通信股份有限公司 Display content detection method, device, computer program product and storage medium
CN113283515A (en) * 2021-05-31 2021-08-20 广州宸祺出行科技有限公司 Detection method and system for illegal passenger carrying for online taxi appointment
CN113283515B (en) * 2021-05-31 2024-02-02 广州宸祺出行科技有限公司 Detection method and system for illegal passenger carrying of network appointment vehicle
CN113220941A (en) * 2021-06-01 2021-08-06 平安科技(深圳)有限公司 Video type obtaining method and device based on multiple models and electronic equipment
CN113220941B (en) * 2021-06-01 2022-08-02 平安科技(深圳)有限公司 Video type obtaining method and device based on multiple models and electronic equipment
CN113435443B (en) * 2021-06-28 2023-04-18 中国兵器装备集团自动化研究所有限公司 Method for automatically identifying landmark from video
CN113435443A (en) * 2021-06-28 2021-09-24 中国兵器装备集团自动化研究所有限公司 Method for automatically identifying landmark from video
CN113821675B (en) * 2021-06-30 2024-06-07 腾讯科技(北京)有限公司 Video identification method, device, electronic equipment and computer readable storage medium
CN113821675A (en) * 2021-06-30 2021-12-21 腾讯科技(北京)有限公司 Video identification method and device, electronic equipment and computer readable storage medium
CN113923472A (en) * 2021-09-01 2022-01-11 北京奇艺世纪科技有限公司 Video content analysis method and device, electronic equipment and storage medium
CN113923472B (en) * 2021-09-01 2023-09-01 北京奇艺世纪科技有限公司 Video content analysis method, device, electronic equipment and storage medium
CN113779308B (en) * 2021-11-12 2022-02-25 冠传网络科技(南京)有限公司 Short video detection and multi-classification method, device and storage medium
CN113779308A (en) * 2021-11-12 2021-12-10 冠传网络科技(南京)有限公司 Short video detection and multi-classification method, device and storage medium
CN114189708A (en) * 2021-12-07 2022-03-15 国网电商科技有限公司 A kind of video content identification method and related device
CN115529475A (en) * 2021-12-29 2022-12-27 北京智美互联科技有限公司 Method and system for detecting video flow content and controlling wind
CN114639164B (en) * 2022-03-10 2024-07-19 平安科技(深圳)有限公司 Behavior recognition method, device equipment and storage medium based on voting mechanism
CN114639164A (en) * 2022-03-10 2022-06-17 平安科技(深圳)有限公司 Behavior recognition method, device and equipment based on voting mechanism and storage medium
CN114821401A (en) * 2022-04-07 2022-07-29 腾讯科技(深圳)有限公司 Video auditing method, device, equipment, storage medium and program product
CN114465737A (en) * 2022-04-13 2022-05-10 腾讯科技(深圳)有限公司 Data processing method and device, computer equipment and storage medium
CN115049953A (en) * 2022-05-09 2022-09-13 中移(杭州)信息技术有限公司 Video processing method, device, equipment and computer readable storage medium
CN114626024A (en) * 2022-05-12 2022-06-14 北京吉道尔科技有限公司 Internet infringement video low-consumption detection method and system based on block chain
CN114821272A (en) * 2022-06-28 2022-07-29 上海蜜度信息技术有限公司 Image recognition method, image recognition system, image recognition medium, electronic device, and target detection model
CN115062186A (en) * 2022-08-05 2022-09-16 北京远鉴信息技术有限公司 Video content retrieval method, device, equipment and storage medium
CN115908280A (en) * 2022-11-03 2023-04-04 广东科力新材料有限公司 Data processing-based performance determination method and system for PVC calcium zinc stabilizer
CN115755059A (en) * 2022-11-23 2023-03-07 中国船舶重工集团公司第七一五研究所 Passive high-resolution processing method based on multi-scale deep convolution neural regression network
CN117173608A (en) * 2023-08-23 2023-12-05 山东新一代信息产业技术研究院有限公司 Video content review methods and systems
CN117319749A (en) * 2023-10-27 2023-12-29 深圳金语科技有限公司 Video data transmission method, device, equipment and storage medium
CN120550406A (en) * 2025-07-31 2025-08-29 赛力斯汽车有限公司 Game interactive control method, device, computer equipment and storage medium

Similar Documents

Publication | Publication Date | Title
CN109376603A (en) | A kind of video frequency identifying method, device, computer equipment and storage medium
Zhang | Deepfake generation and detection, a survey
Pan et al. | Deepfake detection through deep learning
Liu et al. | Learning human pose models from synthesized data for robust RGB-D action recognition
Chen et al. | Chinesefoodnet: A large-scale image dataset for chinese food recognition
Deng et al. | Image aesthetic assessment: An experimental survey
CN113762138B (en) | Identification method, device, computer equipment and storage medium for fake face pictures
CN113569895B (en) | Image processing model training method, processing method, device, equipment and medium
CN104246656B (en) | It is recommended that video editing automatic detection
US8391617B2 (en) | Event recognition using image and location information
Ben Tamou et al. | Multi-stream fish detection in unconstrained underwater videos by the fusion of two convolutional neural network detectors
Xu et al. | Saliency prediction on omnidirectional image with generative adversarial imitation learning
CN114360073B (en) | Image recognition method and related device
Hou et al. | Text-aware single image specular highlight removal
Park et al. | Performance comparison and visualization of ai-generated-image detection methods
CN104376308B (en) | A kind of human motion recognition method based on multi-task learning
Alfarano et al. | A novel convmixer transformer based architecture for violent behavior detection
Roy et al. | Unmasking deepfake visual content with generative AI
Daniilidis et al. | Computer Vision--ECCV 2010: 11th European Conference on Computer Vision, Heraklion, Crete, Greece, September 5-11, 2010, Proceedings, Part V
CN116896654B (en) | Video processing method and related device
CN119172634A (en) | A panoramic video navigation method driven by user subjective preference
Khedkar et al. | Exploiting spatiotemporal inconsistencies to detect deepfake videos in the wild
CN116310589A (en) | An image classification method and device
Lahrache et al. | A survey on image memorability prediction: From traditional to deep learning models
CN117156078B (en) | Video data processing method and device, electronic equipment and storage medium

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190222
