Detailed description of embodiments
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
It should be understood that, when used in this specification and the appended claims, the terms "include" and "comprise" indicate the presence of the described features, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or combinations thereof.
It should also be understood that the terminology used in this description of the invention is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in the description of the invention and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" used in the description of the invention and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Please refer to Fig. 1 and Fig. 2. Fig. 1 is a schematic diagram of an application scenario of a video data detection method provided by an embodiment of the present invention; Fig. 2 is a schematic flowchart of the video data detection method. The video data detection method is applied in a server, and the method is executed by application software installed in the server.
As shown in Fig. 2, the method comprises steps S110–S140.
S110: receiving a video to be searched, decomposing the video to be searched into multiple frames of pictures to be searched through video decomposition, and randomly obtaining one or more pictures to be searched from the multiple frames of pictures to form a picture set to be searched.
In this embodiment, in order to understand the usage scenario of the technical solution more clearly, the terminals involved are introduced below. In this application, the technical solution is described from the perspective of the server.
The first is the server, in which real-time verification can be performed on real-time video data uploaded by a user terminal (such as video data uploaded in real time during an online live broadcast) or on non-real-time video data (such as a movie with a complete playing duration), to judge whether pornographic elements are present.
The second is the user terminal, which is used to upload real-time video data or non-real-time video data to the server.
When the user terminal selects or captures a video to be searched in real time and uploads it to the server, the server first receives the video to be searched and then performs video decomposition on it. The purpose is to disassemble the short video into multiple frames of pictures, so that a feature vector can be extracted for each frame of picture.
From the process of video formation, it is known that a video is composed of multiple frames of pictures; for example, each second of video can be converted into 24–30 pictures. Each video to be searched can be decomposed with a video disassembling tool (such as Adobe Premiere: in Adobe Premiere, clip out the segment for which a sequence bitmap needs to be exported, and export it as a sequence bitmap from the File menu), obtaining its multiple frames of pictures to be searched.
For example, if the server is a short-video sharing platform, then in order to prevent the upload of pornographic videos to the server (a video to be searched is generally a 10–15 second short video, that is, it contains 240–450 frames of pictures), and in order to reduce the workload of converting pictures into picture feature vectors, one frame can be randomly selected from the 24–30 frames corresponding to each second among these 240–450 frames, so that finally only 10–15 pictures in total are chosen for the next step of analysis.
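The per-second random sampling described above can be sketched as follows; this is a minimal illustration assuming the decomposed frames arrive as an ordered list with a known frame rate (the function name and parameters are illustrative, not part of the embodiment):

```python
import random

def sample_one_frame_per_second(frames, fps):
    """Randomly pick one frame out of every `fps` consecutive frames,
    so a 10-15 s clip at 24-30 fps yields only 10-15 candidate pictures."""
    picked = []
    for start in range(0, len(frames), fps):
        one_second = frames[start:start + fps]  # the 24-30 frames of one second
        picked.append(random.choice(one_second))
    return picked

# A 12-second clip at 25 fps: 300 decomposed frames reduce to 12 sampled pictures.
frames = list(range(300))
sampled = sample_one_frame_per_second(frames, 25)
print(len(sampled))  # → 12
```

Only the sampled pictures are passed on to the feature-extraction step, which keeps the per-video workload roughly constant regardless of frame rate.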
S120: performing feature extraction on each picture to be searched in the picture set through a convolutional neural network, to obtain picture feature vectors in one-to-one correspondence with the pictures to be searched in the picture set, so as to form a picture feature vector set.
In this embodiment, picture feature vectors can be extracted rapidly through a convolutional neural network. A convolutional neural network usually comprises the following layers:
A) Convolutional layer: each convolutional layer in a convolutional neural network consists of several convolution units, and the parameters of each convolution unit are optimized by a back-propagation algorithm. The purpose of the convolution operation is to extract different features of the input. The first convolutional layer may only extract some low-level features such as edges, lines and corners, while deeper networks can iteratively extract more complex features from the low-level features.
B) Rectified linear units layer (ReLU layer): this layer uses the rectified linear unit (ReLU) as its activation function.
C) Pooling layer: after the convolutional layer, features of very large dimension are usually obtained. The features are cut into several regions and their maximum or average value is taken, yielding new features of smaller dimension.
D) Fully-connected layer: combines all local features into global features, which are used to calculate the final score of each class.
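The four layers above can be illustrated with a tiny forward pass; this is a didactic sketch in plain NumPy under assumed shapes and random weights, not the trained model of the embodiment:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D cross-correlation of a single-channel image (convolutional layer)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)  # line rectification layer

def max_pool(x, size=2):
    """Cut the feature map into size x size regions and keep each maximum."""
    h, w = x.shape
    h, w = h - h % size, w - w % size
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

def feature_vector(img, kernels, fc_weights):
    """conv -> ReLU -> pool per kernel, then a fully-connected projection."""
    maps = [max_pool(relu(conv2d(img, k))) for k in kernels]
    flat = np.concatenate([m.ravel() for m in maps])  # local -> global features
    return fc_weights @ flat

rng = np.random.default_rng(42)
img = rng.random((8, 8))                 # one preprocessed picture pixel matrix
kernels = [rng.random((3, 3)) for _ in range(2)]  # two convolution units
fc = rng.random((4, 2 * 3 * 3))          # projects to a 4-dim picture feature vector
print(feature_vector(img, kernels, fc).shape)  # → (4,)
```

A real deployment would use a trained deep network; the sketch only shows how each layer transforms its input.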
In one embodiment, as shown in Fig. 3, step S120 comprises:
S121: preprocessing each picture to be searched to obtain a preprocessed picture and a picture pixel matrix corresponding to the preprocessed picture; wherein preprocessing each picture to be searched consists of sequentially performing grayscale conversion, edge detection and binarization on the picture to be searched;
S122: inputting the picture pixel matrix corresponding to the preprocessed picture into the input layer of a convolutional neural network model, to obtain multiple feature maps;
S123: inputting each feature map into the pooling layer of the convolutional neural network model, to obtain a one-dimensional vector corresponding to each feature map;
S124: inputting the one-dimensional vector corresponding to each feature map into the fully-connected layer of the convolutional neural network model, to obtain a picture feature vector corresponding to each feature map, so as to form a picture feature vector set.
In this embodiment, grayscale conversion, edge detection and binarization are performed in sequence on each picture to be searched, to obtain the preprocessed pictures.
A color image contains more information, but if a color image is processed directly, the execution speed of the system decreases and the required storage space increases. Grayscale conversion of color images is a basic method of image processing and is widely used in the field of pattern recognition. Reasonable grayscale conversion is very helpful for the extraction of image information and for subsequent processing: it saves storage space and speeds up processing.
Edge detection examines how the pixel grayscale changes within a neighborhood of the image, and marks the points in a digital image where the brightness changes sharply. Edge detection of an image can significantly reduce the amount of data, reject irrelevant information and preserve the important structural attributes of the image. There are many edge detection operators; besides the Sobel operator, commonly used ones include the Laplacian edge detection operator and the Canny edge detection operator.
In order to reduce the influence of noise, binarization needs to be performed on the image after edge detection. Binarization is a type of thresholding applied to an image. According to how the threshold is selected, binarization methods can be divided into global thresholding, dynamic thresholding and local thresholding; the maximum between-class variance method (also known as the Otsu algorithm) is commonly used for thresholding, so as to reject pixels with small gradient values. After binarization, the pixel values of the image are either 0 or 255. At this point, the preprocessed picture and the picture pixel matrix corresponding to the preprocessed picture are obtained.
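The three-stage preprocessing (grayscale conversion, Sobel edge detection, Otsu binarization) can be sketched as below; this is a self-contained NumPy illustration, not the embodiment's production code, and the luminance weights and 3×3 Sobel kernels are the textbook choices:

```python
import numpy as np

def to_gray(rgb):
    """Luminance grayscale conversion of an H x W x 3 color image."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def sobel_magnitude(gray):
    """Edge strength from 3x3 Sobel gradients (valid region only)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    h, w = gray.shape
    gx = np.zeros((h - 2, w - 2)); gy = np.zeros_like(gx)
    for i in range(h - 2):
        for j in range(w - 2):
            patch = gray[i:i + 3, j:j + 3]
            gx[i, j] = (patch * kx).sum(); gy[i, j] = (patch * ky).sum()
    return np.hypot(gx, gy)

def otsu_binarize(img):
    """Maximum between-class variance (Otsu) threshold; output pixels are 0 or 255."""
    best_t, best_var = img.min(), -1.0
    for t in np.linspace(img.min(), img.max(), 256)[1:-1]:
        fg, bg = img[img > t], img[img <= t]
        if fg.size == 0 or bg.size == 0:
            continue
        w0, w1 = bg.size / img.size, fg.size / img.size
        var = w0 * w1 * (bg.mean() - fg.mean()) ** 2  # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return np.where(img > best_t, 255, 0)

rng = np.random.default_rng(0)
color = rng.random((10, 10, 3))
binary = otsu_binarize(sobel_magnitude(to_gray(color)))
print(np.unique(binary))  # pixel values are only 0 or 255
```

In practice a library implementation (e.g. OpenCV's `cv2.threshold` with `THRESH_OTSU`) would replace the explicit loops.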
When obtaining the feature vectors of the pictures, the picture pixel matrix corresponding to each preprocessed frame is first obtained. The picture pixel matrix corresponding to each preprocessed frame is then used as the input of the input layer of the convolutional neural network model, yielding multiple feature maps. The feature maps are next input into the pooling layer, obtaining the one-dimensional vectors corresponding to the maximum values of each feature map. Finally, the one-dimensional vectors corresponding to the maximum values of each feature map are input into the fully-connected layer, obtaining a picture feature vector corresponding to each preprocessed frame, so as to form the picture feature vector set.
S130: obtaining the similarity between each picture feature vector in the picture feature vector set and the feature vector of each picture in a pre-constructed target picture library, and taking, as a recognition result, the pictures in the target picture library whose degree of approximation to the picture feature vectors in the picture feature vector set exceeds a preset approximation threshold.
In this embodiment, after obtaining the picture feature vector corresponding to each preprocessed frame, the similarity between it and the feature vector of each picture in the target picture library needs to be calculated, to judge whether the target picture library contains a picture approximate to the preprocessed picture. Since what is stored in the target picture library are pictures containing pornographic elements collected in advance, if the picture library contains a picture approximate to a preprocessed picture, it indicates that the preprocessed picture is suspected of being a pornographic picture, and the upload or live broadcast of the video to be searched must then be prohibited.
In one embodiment, before step S130, the method further comprises:
reducing the dimension of each picture feature vector in the picture feature vector set by principal component analysis, to obtain picture principal vectors to be searched in one-to-one correspondence with the picture feature vectors.
In this embodiment, the purpose of the PCA (principal component analysis) algorithm is to transform high-dimensional data into a low-dimensional space with relatively little loss of "information". By extracting the principal components that exhibit the largest individual differences, it can also be used to reduce the number of variables in regression analysis and cluster analysis, thereby reducing the amount of computation. PCA is a fairly common dimensionality reduction technique. Its idea is to map the original features onto a smaller set of completely new orthogonal features; these new features are called principal components and are reconstructed features. In PCA, the data is transformed from the original coordinate system to a new coordinate system, and the choice of the new coordinate system is closely related to the data itself. The first new coordinate axis is chosen as the direction of the largest variance in the original data; the second new coordinate axis is chosen as the direction orthogonal to the first axis with the largest variance. This process repeats, and the number of repetitions equals the number of features in the original data. Most of the variance is contained in the first few new coordinate axes; the remaining coordinate axes can therefore be ignored, and the data is thereby dimension-reduced.
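The coordinate-axis construction just described can be carried out with an eigendecomposition of the covariance matrix; a minimal sketch, assuming the picture feature vectors are stacked as the rows of a matrix (the function name and dimensions are illustrative):

```python
import numpy as np

def pca_reduce(X, k):
    """Project the rows of X onto the k orthogonal directions of largest variance."""
    Xc = X - X.mean(axis=0)                # center the data
    cov = Xc.T @ Xc / (len(X) - 1)         # covariance matrix
    vals, vecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    components = vecs[:, ::-1][:, :k]      # top-k principal axes
    return Xc @ components                 # k-dim principal vectors

rng = np.random.default_rng(1)
X = rng.random((50, 8))      # 50 picture feature vectors, 8-dimensional
Z = pca_reduce(X, 3)         # 3-dimensional picture principal vectors
print(Z.shape)  # → (50, 3)
```

Because the axes are ordered by eigenvalue, the first principal vector component always carries at least as much variance as the second, the second at least as much as the third, and so on.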
In one embodiment, step S130 comprises:
performing a Pearson similarity calculation between each picture principal vector to be searched and the principal vector of each picture in the target picture library, to obtain the similarity between each picture principal vector to be searched and the principal vector of each picture in the target picture library.
In this embodiment, after the picture principal vectors to be searched are obtained by reducing the dimension of each picture feature vector in the picture feature vector set through principal component analysis, the Pearson similarity can be used as the reference standard when calculating the degree of similarity between a picture principal vector to be searched and the principal vector of each picture in the target picture library. The value range of the Pearson similarity is 0 to 1: a value closer to 1 indicates a higher degree of similarity, and a value closer to 0 indicates a lower degree of similarity.
The Pearson similarity between any two vectors X and Y can be calculated by the following formula:
ρX,Y = cov(X, Y) / (σXσY) = E[(X − μX)(Y − μY)] / (σXσY)
wherein E denotes the mathematical expectation, μX and μY denote the means of X and Y, and σX and σY denote their standard deviations;
the value range of ρX,Y is (0, 1): the closer ρX,Y is to 1, the higher the degree of similarity of the two column vectors; the closer ρX,Y is to 0, the lower the degree of similarity of the two vectors.
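The formula above can be evaluated directly; a small sketch in plain Python. Because the text compares the similarity against a threshold on a 0–1 scale, this sketch folds the coefficient with an absolute value — that mapping is an assumption, not stated in the embodiment:

```python
from math import sqrt

def pearson_similarity(x, y):
    """|rho_{X,Y}| = |E[(X - mu_X)(Y - mu_Y)]| / (sigma_X * sigma_Y),
    folded onto the 0-1 scale used by the threshold comparison (an assumption)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = sqrt(sum((b - my) ** 2 for b in y) / n)
    return abs(cov / (sx * sy))

print(round(pearson_similarity([1, 2, 3], [2, 4, 6]), 6))  # → 1.0 (same shape)
print(round(pearson_similarity([1, 2, 3], [3, 1, 2]), 6))  # → 0.5
```

A library routine such as `scipy.stats.pearsonr` would normally be used instead; the explicit form is shown only to mirror the formula term by term.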
S140: if the number of pictures corresponding to the recognition result is greater than or equal to one, marking the video to be searched as failing the verification, and intercepting the upload of the video to be searched or interrupting its live broadcast.
In this embodiment, the picture feature vector (or the picture principal vector) corresponding to each picture in the target picture library is stored, and a large number of manually or automatically screened pornographic pictures are stored in the target picture library. Thus, when the degree of approximation between a picture feature vector in the picture feature vector set corresponding to the video to be searched and the feature vector of a picture in the target picture library exceeds the preset approximation threshold, it indicates that the multiple frames of pictures obtained from the video to be searched are suspected of containing pornographic pictures. At this point, the video to be searched needs to be marked as failing the verification, so as to prevent it from being uploaded to the server for sharing, or the online live broadcast is interrupted to prevent pornographic live streaming.
In one embodiment, after step S140, the method further comprises:
if the number of pictures corresponding to the recognition result is less than one, performing audio extraction on the video to be searched to obtain an audio extraction result, and obtaining the text of the audio extraction result through a speech recognition model to obtain a text recognition result;
segmenting the text recognition result to obtain a word segmentation result;
if the word segmentation result includes one or more keywords identical to keywords in a preset sensitive keyword set, intercepting the upload of the video to be searched or interrupting its live broadcast.
In this embodiment, if the number of pictures corresponding to the recognition result is less than one, it indicates that there is no pornographic picture among the pictures of the video to be searched. However, in order to prevent audio containing pornographic elements from appearing, audio extraction is performed on the video to be searched to obtain an audio extraction result, and the text of the audio extraction result is obtained through a speech recognition model to obtain a text recognition result.
Removing the video channel information from the video to be searched yields the audio extraction result. The audio extraction result is then recognized by the speech recognition model: the text of the audio extraction result is obtained through the speech recognition model, yielding the text recognition result.
When the text recognition result is subsequently segmented, word segmentation is performed by a segmentation method based on a probabilistic statistical model. For example, let C = C1C2...Cm be the Chinese character string to be segmented, let W = W1W2...Wn be a segmentation result, and let Wa, Wb, ..., Wk be all possible segmentation schemes of C. Then the probabilistic statistical segmentation model aims to find the target word string W such that W satisfies: P(W|C) = MAX(P(Wa|C), P(Wb|C), ..., P(Wk|C)). The word string W obtained by the above segmentation model is the word string with the maximum estimated probability.
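The maximization over all segmentation schemes need not be enumerated explicitly; it can be solved by dynamic programming over log-probabilities. A minimal sketch, assuming an independent-word model with a toy lexicon (the lexicon and its probabilities are hypothetical):

```python
from math import log

def max_prob_segment(text, word_probs):
    """Find the word string W maximizing P(W|C) over all segmentations of C,
    via dynamic programming on log-probabilities."""
    n = len(text)
    best = [float("-inf")] * (n + 1)   # best[i]: best log-prob of text[:i]
    back = [0] * (n + 1)               # back[i]: start index of the last word
    best[0] = 0.0
    for i in range(1, n + 1):
        for j in range(i):
            word = text[j:i]
            if word in word_probs and best[j] + log(word_probs[word]) > best[i]:
                best[i] = best[j] + log(word_probs[word])
                back[i] = j
    words, i = [], n
    while i > 0:                       # recover the winning segmentation
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]

lexicon = {"新": 0.05, "闻": 0.05, "新闻": 0.4, "联": 0.05, "播": 0.05, "联播": 0.3}
print(max_prob_segment("新闻联播", lexicon))  # → ['新闻', '联播']
```

The two-word segmentation wins because 0.4 × 0.3 exceeds the probability of any character-by-character scheme, exactly the MAX criterion stated above.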
If the word segmentation result includes one or more keywords identical to keywords in the preset sensitive keyword set, it indicates that sensitive words (usually pornographic vocabulary) exist in the word segmentation result, and in this case the upload of the video to be searched also needs to be intercepted or its live broadcast interrupted.
In one embodiment, the step of obtaining the text of the audio extraction result through a speech recognition model to obtain a text recognition result comprises:
recognizing the audio extraction result through an N-gram model to obtain the recognition result.
In this embodiment, when the speech to be recognized is recognized by the N-gram model, what is recognized is a whole sentence, such as "Hello, what are you doing". The speech to be recognized can be effectively recognized by the N-gram model, and the sentence with the maximum recognition probability is taken as the recognition result.
The N-gram model is obtained by inputting a training set corpus into an initial N-gram model for training. The training set corpus is a general-purpose corpus: the vocabulary in it is not biased toward any specific field, but covers vocabulary from all fields.
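How an N-gram model picks "the sentence with the maximum recognition probability" can be illustrated with a bigram scorer; the counts below stand in for a general-purpose training corpus and are purely hypothetical:

```python
from math import log

def bigram_logprob(sentence, bigram_counts, unigram_counts, vocab_size):
    """Log P(sentence) under a bigram model with add-one smoothing."""
    words = ["<s>"] + sentence.split()
    score = 0.0
    for prev, cur in zip(words, words[1:]):
        num = bigram_counts.get((prev, cur), 0) + 1
        den = unigram_counts.get(prev, 0) + vocab_size
        score += log(num / den)
    return score

# Toy counts standing in for the general-purpose training corpus.
unigrams = {"<s>": 2, "hello": 2, "what": 1, "are": 1, "you": 1, "doing": 1}
bigrams = {("<s>", "hello"): 2, ("hello", "what"): 1, ("what", "are"): 1,
           ("are", "you"): 1, ("you", "doing"): 1}
candidates = ["hello what are you doing", "doing you are what hello"]
best = max(candidates, key=lambda s: bigram_logprob(s, bigrams, unigrams, 6))
print(best)  # → hello what are you doing
```

The word-order-scrambled candidate contains only unseen bigrams, so the fluent sentence scores higher and is returned as the recognition result.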
In one embodiment, after step S140, the method further comprises:
if the number of pictures corresponding to the recognition result is less than one, extracting the video name and the video tags of the video to be searched, to obtain a tag extraction result;
if the tag extraction result includes one or more keywords identical to keywords in the preset sensitive keyword set, intercepting the upload of the video to be searched or interrupting its live broadcast.
In this embodiment, when a user uploads a video to be searched, video tags are generally set for the video. In order to prevent the user from setting video tags that contain sensitive vocabulary, the video name and the video tags of the video to be searched are also extracted, obtaining a tag extraction result. If the tag extraction result includes one or more keywords identical to keywords in the preset sensitive keyword set, it indicates that sensitive vocabulary exists in the video tags of the video to be searched, and in this case the upload of the video to be searched also needs to be intercepted or its live broadcast interrupted.
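Both the word-segmentation check and the tag check reduce to the same set-membership test; a one-function sketch (the keyword set shown is a hypothetical placeholder for the preset sensitive keyword set):

```python
SENSITIVE_KEYWORDS = {"badword1", "badword2"}  # hypothetical preset set

def should_block(tokens, sensitive=SENSITIVE_KEYWORDS):
    """Return True when the segmented text or extracted tags contain any preset
    keyword, in which case the upload is intercepted or the broadcast interrupted."""
    return any(token in sensitive for token in tokens)

print(should_block(["nice", "video", "badword1"]))  # → True
print(should_block(["nice", "video"]))              # → False
```

Using a set makes each membership test O(1), so the check stays cheap even for a large sensitive vocabulary.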
By means of the similarity between the picture feature vector set corresponding to the video to be searched and the feature vectors stored in the target picture library, the method can quickly judge whether the video to be searched is a pornographic video or a pornographic live broadcast, thereby realizing automatic inspection.
An embodiment of the present invention also provides a video data detection device, which is used to execute any embodiment of the aforementioned video data detection method. Specifically, please refer to Fig. 4, which is a schematic block diagram of a video data detection device provided by an embodiment of the present invention. The video data detection device 100 can be configured in a server.
As shown in Fig. 4, the video data detection device 100 comprises a video splitting unit 110, a feature vector acquiring unit 120, a picture recognition unit 130 and a video processing unit 140.
The video splitting unit 110 is used for receiving a video to be searched, decomposing the video to be searched into multiple frames of pictures to be searched through video decomposition, and randomly obtaining one or more pictures to be searched from the multiple frames of pictures to form a picture set to be searched.
In this embodiment, when the user terminal selects or captures a video to be searched in real time and uploads it to the server, the server first receives the video to be searched and then performs video decomposition on it. The purpose is to disassemble the short video into multiple frames of pictures, so that a feature vector can be extracted for each frame of picture.
From the process of video formation, it is known that a video is composed of multiple frames of pictures; for example, each second of video can be converted into 24–30 pictures. Each video to be searched can be decomposed with a video disassembling tool (such as Adobe Premiere: in Adobe Premiere, clip out the segment for which a sequence bitmap needs to be exported, and export it as a sequence bitmap from the File menu), obtaining its multiple frames of pictures to be searched.
For example, if the server is a short-video sharing platform, then in order to prevent the upload of pornographic videos to the server (a video to be searched is generally a 10–15 second short video, that is, it contains 240–450 frames of pictures), and in order to reduce the workload of converting pictures into picture feature vectors, one frame can be randomly selected from the 24–30 frames corresponding to each second among these 240–450 frames, so that finally only 10–15 pictures in total are chosen for the next step of analysis.
The feature vector acquiring unit 120 is used for performing feature extraction on each picture to be searched in the picture set through a convolutional neural network, to obtain picture feature vectors in one-to-one correspondence with the pictures to be searched in the picture set, so as to form a picture feature vector set.
In this embodiment, picture feature vectors can be extracted rapidly through a convolutional neural network. A convolutional neural network usually comprises the following layers:
A) Convolutional layer: each convolutional layer in a convolutional neural network consists of several convolution units, and the parameters of each convolution unit are optimized by a back-propagation algorithm. The purpose of the convolution operation is to extract different features of the input. The first convolutional layer may only extract some low-level features such as edges, lines and corners, while deeper networks can iteratively extract more complex features from the low-level features.
B) Rectified linear units layer (ReLU layer): this layer uses the rectified linear unit (ReLU) as its activation function.
C) Pooling layer: after the convolutional layer, features of very large dimension are usually obtained. The features are cut into several regions and their maximum or average value is taken, yielding new features of smaller dimension.
D) Fully-connected layer: combines all local features into global features, which are used to calculate the final score of each class.
In one embodiment, as shown in Fig. 5, the feature vector acquiring unit 120 comprises:
a preprocessing unit 121, used for preprocessing each picture to be searched to obtain a preprocessed picture and a picture pixel matrix corresponding to the preprocessed picture; wherein preprocessing each picture to be searched consists of sequentially performing grayscale conversion, edge detection and binarization on the picture to be searched;
a convolution unit 122, used for inputting the picture pixel matrix corresponding to the preprocessed picture into the input layer of a convolutional neural network model, to obtain multiple feature maps;
a pooling unit 123, used for inputting each feature map into the pooling layer of the convolutional neural network model, to obtain a one-dimensional vector corresponding to each feature map;
a full connection unit 124, used for inputting the one-dimensional vector corresponding to each feature map into the fully-connected layer of the convolutional neural network model, to obtain a picture feature vector corresponding to each feature map, so as to form a picture feature vector set.
In this embodiment, grayscale conversion, edge detection and binarization are performed in sequence on each picture to be searched, to obtain the preprocessed pictures.
A color image contains more information, but if a color image is processed directly, the execution speed of the system decreases and the required storage space increases. Grayscale conversion of color images is a basic method of image processing and is widely used in the field of pattern recognition. Reasonable grayscale conversion is very helpful for the extraction of image information and for subsequent processing: it saves storage space and speeds up processing.
Edge detection examines how the pixel grayscale changes within a neighborhood of the image, and marks the points in a digital image where the brightness changes sharply. Edge detection of an image can significantly reduce the amount of data, reject irrelevant information and preserve the important structural attributes of the image. There are many edge detection operators; besides the Sobel operator, commonly used ones include the Laplacian edge detection operator and the Canny edge detection operator.
In order to reduce the influence of noise, binarization needs to be performed on the image after edge detection. Binarization is a type of thresholding applied to an image. According to how the threshold is selected, binarization methods can be divided into global thresholding, dynamic thresholding and local thresholding; the maximum between-class variance method (also known as the Otsu algorithm) is commonly used for thresholding, so as to reject pixels with small gradient values. After binarization, the pixel values of the image are either 0 or 255. At this point, the preprocessed picture and the picture pixel matrix corresponding to the preprocessed picture are obtained.
When obtaining the feature vectors of the pictures, the picture pixel matrix corresponding to each preprocessed frame is first obtained. The picture pixel matrix corresponding to each preprocessed frame is then used as the input of the input layer of the convolutional neural network model, yielding multiple feature maps. The feature maps are next input into the pooling layer, obtaining the one-dimensional vectors corresponding to the maximum values of each feature map. Finally, the one-dimensional vectors corresponding to the maximum values of each feature map are input into the fully-connected layer, obtaining a picture feature vector corresponding to each preprocessed frame, so as to form the picture feature vector set.
The picture recognition unit 130 is used for obtaining the similarity between each picture feature vector in the picture feature vector set and the feature vector of each picture in a pre-constructed target picture library, and taking, as a recognition result, the pictures in the target picture library whose degree of approximation to the picture feature vectors in the picture feature vector set exceeds a preset approximation threshold.
In this embodiment, after obtaining the picture feature vector corresponding to each preprocessed frame, the similarity between it and the feature vector of each picture in the target picture library needs to be calculated, to judge whether the target picture library contains a picture approximate to the preprocessed picture. Since what is stored in the target picture library are pictures containing pornographic elements collected in advance, if the picture library contains a picture approximate to a preprocessed picture, it indicates that the preprocessed picture is suspected of being a pornographic picture, and the upload or live broadcast of the video to be searched must then be prohibited.
In one embodiment, the video data detection device 100 further comprises:
a dimensionality reduction unit, used for reducing the dimension of each picture feature vector in the picture feature vector set by principal component analysis, to obtain picture principal vectors to be searched in one-to-one correspondence with the picture feature vectors.
In this embodiment, the purpose of the PCA (principal component analysis) algorithm is to transform high-dimensional data into a low-dimensional space with relatively little loss of "information". By extracting the principal components that exhibit the largest individual differences, it can also be used to reduce the number of variables in regression analysis and cluster analysis, thereby reducing the amount of computation. PCA is a fairly common dimensionality reduction technique. Its idea is to map the original features onto a smaller set of completely new orthogonal features; these new features are called principal components and are reconstructed features. In PCA, the data is transformed from the original coordinate system to a new coordinate system, and the choice of the new coordinate system is closely related to the data itself. The first new coordinate axis is chosen as the direction of the largest variance in the original data; the second new coordinate axis is chosen as the direction orthogonal to the first axis with the largest variance. This process repeats, and the number of repetitions equals the number of features in the original data. Most of the variance is contained in the first few new coordinate axes; the remaining coordinate axes can therefore be ignored, and the data is thereby dimension-reduced.
In one embodiment, the picture recognition unit 130 is further used for:
performing a Pearson similarity calculation between each picture principal vector to be searched and the principal vector of each picture in the target picture library, to obtain the similarity between each picture principal vector to be searched and the principal vector of each picture in the target picture library.
In this embodiment, after the picture principal vectors to be searched are obtained by reducing the dimension of each picture feature vector in the picture feature vector set through principal component analysis, the Pearson similarity can be used as the reference standard when calculating the degree of similarity between a picture principal vector to be searched and the principal vector of each picture in the target picture library. The value range of the Pearson similarity is 0 to 1: a value closer to 1 indicates a higher degree of similarity, and a value closer to 0 indicates a lower degree of similarity.
The Pearson similarity between any two vectors X and Y can be calculated by the following formula:
ρX,Y = cov(X, Y) / (σXσY) = E[(X − μX)(Y − μY)] / (σXσY)
wherein E denotes the mathematical expectation, μX and μY denote the means of X and Y, and σX and σY denote their standard deviations;
the value range of ρX,Y is (0, 1): the closer ρX,Y is to 1, the higher the degree of similarity of the two column vectors; the closer ρX,Y is to 0, the lower the degree of similarity of the two vectors.
The video processing unit 140 is configured to, if the number of pictures corresponding to the recognition result is greater than or equal to one, mark the to-be-searched video as not verified, and perform upload interception or live-streaming interruption on the to-be-searched video.
In this embodiment, the target picture library stores the picture feature vector corresponding to each picture (or the principal vector of each picture), and the pictures in the target picture library are a large number of manually or automatically screened pornographic pictures. In this way, when the degree of approximation between a picture feature vector in the picture feature vector set corresponding to the to-be-searched video and a feature vector of a picture in the target picture library exceeds a preset approximation threshold, it indicates that a pornographic picture is suspected to exist in the multiple frames of pictures acquired from the to-be-searched video. At this time, the to-be-searched video needs to be marked as unverified, to prevent it from being uploaded to the server for sharing, or the online live streaming is interrupted to prevent pornographic live streaming.
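The threshold decision above can be sketched as a small helper. The function name flag_video and the threshold value 0.8 are assumptions for illustration; the embodiment only specifies that a preset approximation threshold exists and that one or more matching pictures trigger interception:

```python
def flag_video(similarities, threshold=0.8):
    """Count the pictures whose similarity to the target library
    exceeds the preset threshold; flag the video when at least one
    frame matches (i.e. picture count >= 1)."""
    hits = sum(1 for s in similarities if s >= threshold)
    # True -> mark unverified, intercept the upload or stop the stream.
    return hits >= 1
```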
In one embodiment, the video data detection device 100 further includes:
An audio identification unit, configured to, if the number of pictures corresponding to the recognition result is less than one, perform audio extraction on the to-be-searched video to obtain an audio extraction result, and obtain the text of the audio extraction result through a speech recognition model to obtain a text recognition result;
A word segmentation unit, configured to segment the text recognition result to obtain a word segmentation result;
A first sensitive-word judging unit, configured to, if the word segmentation result contains one or more keywords identical to those in a preset sensitive keyword set, perform upload interception or live-streaming interruption on the to-be-searched video.
In this embodiment, if the number of pictures corresponding to the recognition result is less than one, it indicates that no pornographic picture exists in the pictures of the to-be-searched video. However, to guard against audio containing pornographic elements, audio extraction is performed on the to-be-searched video at this time to obtain an audio extraction result, and the text of the audio extraction result is obtained through the speech recognition model to obtain a text recognition result.
The audio extraction result can be obtained by removing the video channel information from the to-be-searched video. The audio extraction result is then recognized by the speech recognition model to obtain a recognition result; that is, the text of the audio extraction result is obtained through the speech recognition model to obtain the text recognition result.
When the text recognition result is subsequently segmented, segmentation is performed by a word segmentation method based on a probability-statistics model. For example, let C = C1C2...Cm be the Chinese character string to be segmented, let W = W1W2...Wn be a segmentation result, and let Wa, Wb, ..., Wk be all possible segmentation schemes of C. Then, the segmentation model based on probability statistics is to find the target word string W such that W satisfies: P(W|C) = MAX(P(Wa|C), P(Wb|C), ..., P(Wk|C)). The word string W obtained by the above segmentation model is the word string with the maximum estimated probability.
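The maximum-probability criterion above can be solved efficiently by dynamic programming rather than enumerating every cutting scheme. The sketch below assumes a unigram word-probability table (an assumption for illustration; the embodiment does not specify how P(W|C) is estimated), so that P(W|C) is proportional to the product of the word probabilities:

```python
import math

def best_segmentation(text, word_prob):
    """Find the cutting W of character string C maximising P(W|C),
    modelled here as the product of unigram word probabilities.

    best[i] holds the best (log-probability, word list) covering text[:i].
    """
    n = len(text)
    best = [(0.0, [])] + [(-math.inf, []) for _ in range(n)]
    for i in range(1, n + 1):
        for j in range(max(0, i - 8), i):      # candidate words up to 8 chars
            w = text[j:i]
            if w in word_prob:
                score = best[j][0] + math.log(word_prob[w])
                if score > best[i][0]:
                    best[i] = (score, best[j][1] + [w])
    return best[n][1]
```

For example, with a toy lexicon the string "研究生命起源" is cut as "研究 / 生命 / 起源" rather than "研究生 / 命 / 起源", because the former cutting has the higher estimated probability.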
If the word segmentation result contains one or more keywords identical to those in the preset sensitive keyword set, it indicates that sensitive words (usually pornographic vocabulary) exist in the word segmentation result. At this time, upload interception or live-streaming interruption also needs to be performed on the to-be-searched video.
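The keyword match itself reduces to a set intersection between the segmented tokens and the preset sensitive keyword set; the helper name contains_sensitive below is an assumption for illustration:

```python
def contains_sensitive(tokens, sensitive_words):
    """Return the tokens that also appear in the preset sensitive
    keyword set; a non-empty result triggers interception."""
    return set(tokens) & set(sensitive_words)
```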
In one embodiment, the audio identification unit is further configured to:
Recognize the audio extraction result through an N-gram model to obtain a recognition result.
In this embodiment, when the to-be-recognized speech is recognized by the N-gram model, what is recognized is a whole sentence, such as "Hello, what are you doing". The to-be-recognized speech can be effectively recognized by the N-gram model, and the sentence with the maximum recognition probability is obtained as the recognition result.
The N-gram model is obtained by inputting a training set corpus into an initial N-gram model for training. The training set corpus is a general-purpose corpus: the vocabulary in the general-purpose corpus is not biased toward any specific field, but covers the vocabulary of every field.
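A minimal sketch of such a model, assuming N = 2 (a bigram model) and add-one smoothing — both assumptions for illustration, since the embodiment does not fix N or the smoothing scheme — trains on a tokenised corpus and scores candidate sentences, so that the candidate with the maximum probability can be chosen as the recognition result:

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Train a bigram (N=2) model with add-one smoothing on a list of
    tokenised sentences; return a function scoring a sentence's
    log-probability."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>"] + sent                  # sentence-start marker
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    vocab = len(unigrams)

    def log_prob(sent):
        toks = ["<s>"] + sent
        # P(b|a) ~ (count(a,b) + 1) / (count(a) + |V|), summed in log space.
        return sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
                   for a, b in zip(toks, toks[1:]))
    return log_prob
```

In decoding, the acoustic front end proposes several candidate sentences and the language model picks the one with the highest score.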
In one embodiment, the video data detection device 100 further includes:
A tag extraction unit, configured to, if the number of pictures corresponding to the recognition result is less than one, extract the video tag of the to-be-searched video from the video name of the to-be-searched video to obtain a tag extraction result;
A second sensitive-word judging unit, configured to, if the tag extraction result contains one or more keywords identical to those in the preset sensitive keyword set, perform upload interception or live-streaming interruption on the to-be-searched video.
In this embodiment, when a user uploads a to-be-searched video, a video tag is generally set for the to-be-searched video. To prevent the user from setting a video tag that contains sensitive vocabulary, the video tag of the to-be-searched video is also extracted from the video name of the to-be-searched video to obtain a tag extraction result. If the tag extraction result contains one or more keywords identical to those in the preset sensitive keyword set, it indicates that sensitive vocabulary exists in the video tag of the to-be-searched video; at this time, upload interception or live-streaming interruption also needs to be performed on the to-be-searched video.
Based on the similarity between the picture feature vector set corresponding to the to-be-searched video and the feature vectors stored in the target picture library, the device can quickly judge whether the to-be-searched video is a pornographic video or pornographic live streaming, thereby realizing automatic detection.
The above video data detection device may be implemented in the form of a computer program, and the computer program may be run on a computer device as shown in Fig. 6.
Referring to Fig. 6, Fig. 6 is a schematic block diagram of a computer device provided by an embodiment of the present invention. The computer device 500 is a server; the server may be an independent server, or a server cluster composed of multiple servers.
Referring to Fig. 6, the computer device 500 includes a processor 502, a memory, and a network interface 505 connected through a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032. When the computer program 5032 is executed, the processor 502 can be caused to execute the video data detection method.
The processor 502 is used to provide computing and control capabilities to support the operation of the entire computer device 500.
The internal memory 504 provides an environment for the running of the computer program 5032 in the non-volatile storage medium 503. When the computer program 5032 is executed by the processor 502, the processor 502 can be caused to execute the video data detection method.
The network interface 505 is used for network communication, such as providing transmission of data information. Those skilled in the art can understand that the structure shown in Fig. 6 is only a block diagram of part of the structure related to the solution of the present invention and does not constitute a limitation on the computer device 500 to which the solution of the present invention is applied; a specific computer device 500 may include more or fewer components than shown in the figure, may combine certain components, or may have a different component arrangement.
The processor 502 is used to run the computer program 5032 stored in the memory, so as to implement the video data detection method in the embodiment of the present invention.
Those skilled in the art will understand that the embodiment of the computer device shown in Fig. 6 does not constitute a limitation on the specific composition of the computer device. In other embodiments, the computer device may include more or fewer components than shown, may combine certain components, or may have a different component arrangement. For example, in some embodiments, the computer device may include only a memory and a processor; in such embodiments, the structures and functions of the memory and the processor are consistent with the embodiment shown in Fig. 6 and are not repeated here.
It should be understood that, in embodiments of the present invention, the processor 502 may be a central processing unit (Central Processing Unit, CPU), and the processor may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
A computer-readable storage medium is provided in another embodiment of the present invention. The computer-readable storage medium may be a non-volatile computer-readable storage medium. The computer-readable storage medium stores a computer program, where the computer program, when executed by a processor, implements the video data detection method in the embodiment of the present invention.
It is clear to those skilled in the art that, for convenience and brevity of description, for the specific working processes of the above-described devices, apparatuses, and units, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here. Those of ordinary skill in the art may realize that the units and algorithm steps described in connection with the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally by function. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Professional technicians may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the present invention.
In the several embodiments provided by the present invention, it should be understood that the disclosed devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative. For example, the division of the units is only a division by logical function; in actual implementation there may be other division manners: units with the same function may be combined into one unit, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may also be electrical, mechanical, or other forms of connection.
The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or may be distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiments of the present invention.
In addition, the functional units in the various embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the various embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a magnetic disk, or an optical disc.
The above is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any person familiar with the technical field can readily conceive of various equivalent modifications or replacements within the technical scope disclosed by the present invention, and these modifications or replacements shall all be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.