Video pornography detection method, terminal and medium

Technical Field
The invention belongs to the technical field of computers, and particularly relates to a video pornography detection method, a terminal and a medium.
Background
In the information age of a rapidly developing internet, the network carries information of all kinds, good and bad, and online obscene and pornographic information has existed since the network's birth. It spreads to all corners of the world through various carriers and directly or indirectly harms internet users. Because the network is such a convenient carrier, obscene and pornographic information survives in the cracks despite various laws, regulations and countermeasures, and even shows a growing trend, so identifying it is an urgent task.
In the current era of information explosion, the amount of information published on the network is enormous, and identifying obscene pornographic information manually in this sea of information accomplishes little. Relevant information mining and identification technologies must therefore be applied for screening and identification, supplemented by manual review.
At present, no technology can completely identify online obscene and pornographic information, and as transmission channels and methods keep changing, technology updates often cannot keep pace with the changes in transmission modes. Moreover, because the technical means are implemented by algorithms, their discrimination rate is far below that of the human brain; erroneous and missed judgments cannot be avoided, final judgment still falls to human reviewers, and the labor cost is high.
The existing technology for identifying online obscene videos is mainly the key-frame technique, the most primitive of the pornography-identification techniques. It intercepts random frames from a video and then identifies whether each screenshot is a pornographic image. A video is given multiple detection points distributed over its whole length rather than concentrated in a single time period, so that pornographic segments confined to one interval are not missed. The technique therefore ultimately depends on pornographic-image recognition. Current key-frame techniques extract frames on a fixed time window, i.e., once every fixed interval, so the extracted frames are evenly distributed over the video. If the time window is too long, key frames are easily lost; if it is too short, too many frames are extracted, extraction takes too long, and more interference factors are introduced into the identification.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a video pornography detection method, a terminal and a medium, which improve the accuracy of identifying online obscene pornographic videos.
In a first aspect, a method for detecting pornography of a video includes the following steps:
constructing an AI pornography-identification model from historical identification data;
acquiring a video to be detected in a task flow;
detecting whether the video to be detected is pornographic by using the AI pornography-identification model; if not, executing the task flow; if so, checking whether the task flow needs to be terminated, terminating the task flow if it does, and executing the task flow if it does not.
Preferably, the constructing of the AI pornography-identification model from historical identification data specifically includes:
setting a model structure;
building the AI pornography-identification model according to the model structure by using a deep neural network;
and training the built AI pornography-identification model with the historical identification data.
Preferably, the detecting of whether the video to be detected is pornographic by using the AI pornography-identification model specifically includes:
intercepting a plurality of video frames from the video to be detected;
detecting, with the AI pornography-identification model, whether the image corresponding to each video frame is pornographic, so as to obtain a frame detection result;
and judging whether the video to be detected is pornographic according to the frame detection results of all video frames in the video to be detected.
Preferably, when the model structure is a regression model, detecting whether the image corresponding to each video frame is pornographic with the AI pornography-identification model to obtain a frame detection result, and judging whether the video to be detected is pornographic according to the frame detection results of all its video frames, specifically includes:
scoring the image corresponding to each video frame with the AI pornography-identification model;
defining the highest score among all video frames in the video to be detected as the score of the video, so as to obtain the confidence of the video to be detected;
and obtaining the pornography grade of the video to be detected from its confidence and preset confidence grades.
Preferably, when the model structure is a classification model, detecting whether the image corresponding to each video frame is pornographic with the AI pornography-identification model to obtain a frame detection result, and judging whether the video to be detected is pornographic according to the frame detection results of all its video frames, specifically includes:
classifying the image corresponding to each video frame into one of two categories, pornographic and non-pornographic, with the AI pornography-identification model;
when none of the images corresponding to the video frames in the video to be detected is pornographic, defining the video to be detected as non-pornographic; otherwise, defining the video to be detected as pornographic.
Preferably, the method by which the AI pornography-identification model identifies the image corresponding to each video frame includes the following steps:
when the image is a color image, judging whether the image is pornographic by detecting skin color and texture in the image;
and when the image is a black-and-white image, detecting whether a preset sensitive part is present in the image, and if so, defining the image as pornographic.
Preferably, when the model structure comprises both a classification model and a regression model, detecting whether the image corresponding to each video frame is pornographic with the AI pornography-identification model to obtain a frame detection result, and judging whether the video to be detected is pornographic according to the frame detection results of all its video frames, specifically includes:
scoring the image corresponding to each video frame with the regression model;
classifying the image corresponding to each video frame with the classification model;
calculating a weighted score from the scores and categories of all video frames in the video to be detected according to a preset weighting rule;
and classifying the video to be detected into one of two categories, pornographic and non-pornographic, according to the weighted score.
Preferably, the intercepting of a plurality of video frames from the video to be detected specifically includes:
when a video frame is intercepted from the video to be detected, calculating the similarity between the image of the current video frame and the previously intercepted images using a perceptual hash algorithm;
and when the similarity is higher than a preset threshold, discarding the image of the current video frame.
In a second aspect, a terminal comprises a processor, an input device, an output device, and a memory, the processor, the input device, the output device, and the memory being interconnected, wherein the memory is configured to store a computer program, the computer program comprising program instructions, and the processor is configured to invoke the program instructions to perform the method of the first aspect.
In a third aspect, a computer-readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of the first aspect.
According to the above technical scheme, the video pornography detection method, terminal and medium provided by the invention identify video frames one by one with an AI pornography-identification model and finally aggregate all identification results to detect whether the video to be detected is pornographic. The method also uses OpenCV to read video frames at fixed intervals, uses perceptual hashing to discard duplicate frames, and determines the pornography grade of the video to be detected according to preset grade intervals, thereby improving the accuracy of identifying online obscene videos.
Drawings
In order to more clearly illustrate the detailed description of the invention or the technical solutions in the prior art, the drawings that are needed in the detailed description of the invention or the prior art will be briefly described below. Throughout the drawings, like elements or portions are generally identified by like reference numerals. In the drawings, elements or portions are not necessarily drawn to scale.
Fig. 1 is a flowchart of the video pornography detection method according to embodiment one of the present invention.
Fig. 2 is a flowchart of the method for constructing the AI pornography-identification model according to embodiment one of the present invention.
Fig. 3 is a flowchart of the detection method using the AI pornography-identification model according to embodiment two of the present invention.
Fig. 4 is a flowchart of the regression-model detection method according to embodiment two of the present invention.
Fig. 5 is a flowchart of the classification-model detection method according to embodiment two of the present invention.
Fig. 6 is a flowchart of the detection method using both models according to embodiment three of the present invention.
Fig. 7 is a flowchart of the video frame interception method according to embodiment three of the present invention.
Fig. 8 is a block diagram of the terminal according to embodiment four of the present invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and therefore are only examples, and the protection scope of the present invention is not limited thereby. It is to be noted that, unless otherwise specified, technical or scientific terms used herein shall have the ordinary meaning as understood by those skilled in the art to which the invention pertains.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
Embodiment one:
A video pornography detection method, referring to fig. 1, includes the following steps:
constructing an AI pornography-identification model from historical identification data;
acquiring a video to be detected in a task flow;
detecting whether the video to be detected is pornographic by using the AI pornography-identification model; if not, executing the task flow; if so, checking whether the task flow needs to be terminated, terminating the task flow if it does, and executing the task flow if it does not.
Specifically, the method first trains an AI pornography-identification model with historical identification data (i.e., historical images together with the judgments of whether each image is pornographic), and then uses the trained model to screen each new video to be detected. If the video to be detected is not pornographic (i.e., contains no obscene pornographic information), the task flow is executed and subsequent operations such as transcoding, watermarking or screenshotting are performed on the video. If the video is pornographic (contains obscene pornographic information) and the task flow needs to be terminated, the task flow is terminated; otherwise, the task flow is executed and the subsequent transcoding, watermarking or screenshotting operations are performed.
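As a concrete illustration, this dispatch logic can be sketched in a few lines of Python; the three helper functions are hypothetical stubs (not part of the invention) standing in for the model inference, the termination policy and the task-flow operations.

```python
# A minimal sketch of the dispatch logic, with hypothetical stubs.
def detect_pornography(video) -> bool:      # stub: AI model inference
    return False

def must_terminate(task_flow) -> bool:      # stub: termination policy check
    return True

def run_task_flow(task_flow, video):        # stub: transcode, watermark, screenshot, ...
    print("executing task flow for", video)

def process(video, task_flow):
    if not detect_pornography(video):       # not pornographic: proceed normally
        run_task_flow(task_flow, video)
    elif must_terminate(task_flow):         # pornographic and flow must stop
        print("task flow terminated")
    else:                                   # pornographic but flow may continue
        run_task_flow(task_flow, video)
```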
In practical applications, when the AI pornography-identification model cannot reliably judge whether the video to be detected is pornographic, the video needs to be marked and reviewed manually. The method identifies video frames one by one with the AI pornography-identification model and finally aggregates all identification results to detect whether the video to be detected is pornographic, thereby improving the accuracy of identifying online obscene videos.
Referring to fig. 2, the constructing of the AI pornography-identification model from historical identification data specifically includes:
setting a model structure;
building the AI pornography-identification model according to the model structure by using a deep neural network;
and training the built AI pornography-identification model with the historical identification data.
Specifically, the model structure may be set according to the requirements of the user's own service. The historical identification data may come from an open-source data set, or the user may collect data better suited to the service. Because the AI pornography-identification model can be saved in various file formats, the format best suited to the service flow of the method should be chosen when the built model is saved, so that the model integrates well into the service.
When the AI pornography-identification model is constructed, a model structure is designed first, the model is then built according to the designed structure using a deep neural network, and the built model is finally trained with a large amount of historical identification data.
In the image recognition process, the method may adopt the following network models: convolutional neural networks and image semantic segmentation algorithms.
A convolutional neural network is a neural network composed of several types of layers; computation through these layers reduces the dimensionality of massive image data and ultimately achieves image recognition. For an input image, determining whether it is pornographic takes three steps. First, a convolution kernel of suitable size (which can also be regarded as a filter) is selected to perform the convolution operation on the image, filtering each small region, so that the feature values of the filtered regions are learned during training. Second, pooling (i.e., downsampling) repeatedly shrinks the image to a small size, producing a feature map and reducing the data dimensionality. Finally, upsampling (deconvolution) is similar to convolution, using multiply-and-add operations, and aims to enlarge and restore the image. In this way, pornographic images can be filtered out automatically.
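The three steps can be made concrete with a minimal PyTorch sketch; the channel counts, kernel sizes and the 224x224 input are illustrative assumptions rather than values from the invention.

```python
import torch
import torch.nn as nn

# Step 1: convolution filters each small region; step 2: pooling downsamples
# to a feature map; step 3: a transposed convolution ("deconvolution")
# upsamples and restores the spatial size.
class TinyConvNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)        # convolution
        self.pool = nn.MaxPool2d(2)                                   # pooling
        self.up = nn.ConvTranspose2d(16, 3, kernel_size=2, stride=2)  # upsampling

    def forward(self, x):
        x = torch.relu(self.conv(x))
        x = self.pool(x)                      # 224x224 -> 112x112
        return self.up(x)                     # 112x112 -> 224x224

net = TinyConvNet()
out = net(torch.randn(1, 3, 224, 224))        # output shape: (1, 3, 224, 224)
```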
The image semantic segmentation algorithm works like matting in PS: the boundary of every object must be followed as closely as possible, so after pixel-level understanding of the image is achieved, every pixel in the image is labeled and classified. Therefore, for pornographic images transmitted over the network, semantic segmentation is performed to complete the segmentation, labeling and understanding of the image. Image semantic segmentation is mainly implemented by threshold methods, pixel clustering methods and image edge segmentation methods; applied to a pornographic image, these methods yield the foreground, the background, texture information and edge information, finally producing the expected segmentation result. Image recognition is then performed using statistical methods, structural methods, neural network methods, template matching methods and set transformation methods.
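Two of the segmentation approaches named above, the threshold method and edge segmentation, can be sketched with OpenCV as follows; the file name and the Canny thresholds are assumptions.

```python
import cv2

# "frame.jpg" is a placeholder for an intercepted video frame.
img = cv2.imread("frame.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Threshold method: Otsu's threshold separates foreground from background.
_, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Edge segmentation: Canny recovers the edge information of the image.
edges = cv2.Canny(gray, 100, 200)
```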
Embodiment two:
Embodiment two adds the following content on the basis of embodiment one:
referring to fig. 3, the detecting whether the video to be detected is yellow-related by using the AI yellow identification model specifically includes:
intercepting a plurality of video frames from the video to be detected;
detecting whether an image corresponding to each video frame is yellow-related or not by using the AI yellow-identifying model so as to obtain a frame detection result;
and judging whether the video to be detected is yellow-related or not according to the frame detection results of all video frames in the video to be detected.
Specifically, when the AI pornography-identification model is applied, a plurality of video frames are first intercepted from the video to be detected (either at random or at fixed intervals), the model then judges whether each frame image is pornographic, and whether the video is pornographic is decided from the frame detection results of all intercepted frames. In this way, each video frame is identified in turn by the AI pornography-identification model.
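A minimal sketch of fixed-interval frame interception with OpenCV (the library the summary above names for frame reading); the one-frame-per-second rate and the file name are assumptions.

```python
import cv2

def grab_frames(path: str, every_sec: float = 1.0):
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0   # fall back if FPS is unreported
    step = max(1, int(fps * every_sec))       # frames between two captures
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:                   # keep one frame per interval
            frames.append(frame)
        idx += 1
    cap.release()
    return frames

frames = grab_frames("input.mp4")             # placeholder path
```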
Model structures fall into two categories, classification models and regression models, which are used differently in practice.
1. Regression model.
Referring to fig. 4, when the model structure is a regression model, detecting whether the image corresponding to each video frame is pornographic with the AI pornography-identification model to obtain a frame detection result, and judging whether the video to be detected is pornographic according to the frame detection results of all its video frames, specifically includes:
scoring the image corresponding to each video frame with the AI pornography-identification model;
defining the highest score among all video frames in the video to be detected as the score of the video, so as to obtain the confidence of the video to be detected;
and obtaining the pornography grade of the video to be detected from its confidence and preset confidence grades.
Specifically, if the model structure is a regression model, the AI pornography-identification model scores each image; the higher the score, the more likely the image is pornographic, i.e., the higher the confidence that it is pornographic. After every image has been scored, the highest score among all images is defined as the score of the whole video, i.e., the confidence that the video is pornographic, and the pornography grade of the video is finally determined from preset confidence intervals. The confidence intervals must be chosen experimentally: different intervals are used for grading in experiments, the grading outcomes are compared statistically, and the interval with the best outcome is selected to determine the pornography grade of the video.
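The aggregation rule can be sketched as follows; the grade boundaries are purely illustrative, since the specification says the confidence intervals must be selected experimentally.

```python
# Highest frame score = video confidence; preset intervals map it to a grade.
def video_grade(frame_scores,
                bands=((0.9, "severe"), (0.6, "suspect"), (0.0, "clean"))):
    confidence = max(frame_scores)        # highest-scoring frame defines the video
    for lower_bound, grade in bands:      # bands are checked from highest down
        if confidence >= lower_bound:
            return confidence, grade

print(video_grade([0.05, 0.32, 0.71]))    # -> (0.71, 'suspect')
```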
2. Classification model.
Referring to fig. 5, when the model structure is a classification model, detecting whether the image corresponding to each video frame is pornographic with the AI pornography-identification model to obtain a frame detection result, and judging whether the video to be detected is pornographic according to the frame detection results of all its video frames, specifically includes:
classifying the image corresponding to each video frame into one of two categories, pornographic and non-pornographic, with the AI pornography-identification model;
when none of the images corresponding to the video frames in the video to be detected is pornographic, defining the video to be detected as non-pornographic; otherwise, defining the video to be detected as pornographic.
Specifically, if the model structure is a classification model, the AI pornography-identification model directly classifies each image into one of two categories, pornographic or non-pornographic. If none of the intercepted video frames contains a pornographic image, the video to be detected is not pornographic; if at least one intercepted video frame is pornographic, the video to be detected is pornographic.
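This rule amounts to a single any() over the per-frame labels, as in the sketch below; classify_frame is a hypothetical stub for the model's binary per-frame prediction.

```python
def classify_frame(frame) -> bool:        # stub: True means pornographic
    return False

def video_is_pornographic(frames) -> bool:
    # The video is flagged as soon as any single frame is pornographic.
    return any(classify_frame(f) for f in frames)
```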
For brevity, where details of the method provided by this embodiment are not mentioned, refer to the corresponding content in the foregoing method embodiments.
Embodiment three:
Embodiment three adds the following content on the basis of the other embodiments:
The method by which the AI pornography-identification model identifies the image corresponding to each video frame includes the following steps:
when the image is a color image, judging whether the image is pornographic by detecting skin color and texture in the image;
and when the image is a black-and-white image, detecting whether a preset sensitive part is present in the image, and if so, defining the image as pornographic.
Specifically, the identification of pornographic images also depends on skin-color recognition and texture recognition. In the skin-color and texture detection method, a recognizer is trained with optical-flow context histograms: each frame of the video to be detected is intercepted, optical-flow features are extracted from the intercepted image, the extracted feature points are distributed over concentric circles according to magnitude and direction, and the optical-flow histogram is then computed from the optical-flow feature points. When both kinds of detection report that the video to be detected is pornographic, it can be broadly determined to be a pornographic video.
Since images divide into color and black-and-white (or grayscale) images, skin-color detection is ineffective on black-and-white (or grayscale) images, because there is no color. For cartoons, the drawn characters have no skin texture, so texture recognition also fails.
Therefore, when an image is checked for pornography, its color must be analyzed first, using an RGB color model or an HSV color model. If the image is a color image, whether it is pornographic is judged by detecting skin color and texture in the image. If the image is a black-and-white (or grayscale) image, whether it is pornographic is judged by detecting sensitive parts; if a sensitive part appears, the image is reviewed manually, and many such samples can be used to train the model, improving identification efficiency.
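A rough sketch of this color/grayscale branch follows; the HSV skin band, the color tolerance and the 0.3 skin-ratio threshold are common illustrative values rather than values from the invention, and detect_sensitive_parts is a hypothetical stub for a trained detector.

```python
import cv2
import numpy as np

def is_color(img_bgr, tol=10):
    # A grayscale image stored as BGR has (near-)equal channels everywhere.
    b = img_bgr[:, :, 0].astype(int)
    g = img_bgr[:, :, 1].astype(int)
    r = img_bgr[:, :, 2].astype(int)
    return bool(np.abs(b - g).max() > tol or np.abs(g - r).max() > tol)

def skin_ratio(img_bgr):
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (0, 40, 60), (25, 255, 255))  # rough skin-tone band
    return mask.mean() / 255.0            # fraction of skin-colored pixels

def detect_sensitive_parts(img_bgr) -> bool:
    return False                          # stub for a trained detector

def check_image(img_bgr) -> bool:
    if is_color(img_bgr):
        return skin_ratio(img_bgr) > 0.3  # texture checks would follow here
    return detect_sensitive_parts(img_bgr)
```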
Referring to fig. 6, when the model structure comprises both a classification model and a regression model, detecting whether the image corresponding to each video frame is pornographic with the AI pornography-identification model to obtain a frame detection result, and judging whether the video to be detected is pornographic according to the frame detection results of all its video frames, specifically includes:
scoring the image corresponding to each video frame with the regression model;
classifying the image corresponding to each video frame with the classification model;
calculating a weighted score from the scores and categories of all video frames in the video to be detected according to a preset weighting rule;
and classifying the video to be detected into one of two categories, pornographic and non-pornographic, according to the weighted score.
Specifically, when the method classifies the video to be detected using a regression model and a classification model at the same time, each video frame image obtains a category and a score from the two models respectively. After all video frames have been analyzed, a weighted score is calculated from the categories and scores of all images, taken as the confidence that the video is pornographic, and the video is finally classified according to this weighted score.
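One possible weighting rule is sketched below; the 0.6/0.4 weights and the 0.5 decision threshold are assumptions, since the specification leaves the weighting rule to be preset by the user.

```python
def fuse(frame_scores, frame_labels, w_reg=0.6, w_cls=0.4, threshold=0.5):
    reg = max(frame_scores)                        # regression: peak frame score
    cls = sum(frame_labels) / len(frame_labels)    # classification: share of flagged frames
    weighted = w_reg * reg + w_cls * cls           # weighted confidence of the video
    return "pornographic" if weighted >= threshold else "non-pornographic"

print(fuse([0.2, 0.8, 0.4], [0, 1, 0]))            # -> pornographic (0.613 >= 0.5)
```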
Referring to fig. 7, the intercepting of a plurality of video frames from the video to be detected specifically includes:
when a video frame is intercepted from the video to be detected, calculating the similarity between the image of the current video frame and the previously intercepted images using a perceptual hash algorithm;
and when the similarity is higher than a preset threshold, discarding the image of the current video frame.
Specifically, when a video frame is intercepted, the method computes the similarity between the current frame image and the previously intercepted images with a perceptual hash algorithm. If the similarity is above the threshold, a similar image has already been intercepted and the current frame is discarded; if the similarity is below the threshold, no similar image exists yet and the current frame image is kept.
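A minimal sketch of this de-duplication step using an average hash, a simple member of the perceptual-hash family; the 8x8 hash size and the 0.9 similarity threshold are assumptions.

```python
import cv2
import numpy as np

def ahash(img_bgr):
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    small = cv2.resize(gray, (8, 8))               # shrink away detail
    return (small > small.mean()).flatten()        # 64-bit binary fingerprint

def similarity(h1, h2):
    return float((h1 == h2).mean())                # share of matching hash bits

def dedup(frames, threshold=0.9):
    kept, hashes = [], []
    for frame in frames:
        h = ahash(frame)
        if any(similarity(h, prev) > threshold for prev in hashes):
            continue                               # too similar: discard the frame
        kept.append(frame)
        hashes.append(h)
    return kept
```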
For brevity, where details of the method provided by this embodiment are not mentioned, refer to the corresponding content in the foregoing method embodiments.
Embodiment four:
A terminal, see fig. 8, comprising a processor 801, an input device 802, an output device 803 and a memory 804, the processor 801, the input device 802, the output device 803 and the memory 804 being interconnected via a bus 805, wherein the memory 804 is adapted to store a computer program comprising program instructions, and the processor 801 is configured to invoke the program instructions to perform the method described above.
It should be understood that in this embodiment, the processor 801 may be a Central Processing Unit (CPU); the processor may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The input device 802 may include a touch pad, a fingerprint sensor (for collecting a user's fingerprint information and fingerprint direction information), a microphone, etc., and the output device 803 may include a display (LCD, etc.), a speaker, etc.
The memory 804 may include both read-only memory and random access memory, and provides instructions and data to the processor 801. A portion of the memory 804 may also include non-volatile random access memory. For example, the memory 804 may also store device type information.
For brevity, where details of this embodiment are not mentioned, refer to the corresponding content in the foregoing method embodiments.
Embodiment five:
a computer-readable storage medium, in which a computer program is stored, the computer program comprising program instructions which, when executed by a processor, cause the processor to carry out the above-mentioned method.
The computer readable storage medium may be an internal storage unit of the terminal according to any of the foregoing embodiments, for example, a hard disk or a memory of the terminal. The computer readable storage medium may also be an external storage device of the terminal, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the terminal. Further, the computer-readable storage medium may also include both an internal storage unit and an external storage device of the terminal. The computer-readable storage medium is used for storing the computer program and other programs and data required by the terminal. The computer readable storage medium may also be used to temporarily store data that has been output or is to be output.
For brevity, where details of the medium provided by this embodiment are not mentioned, refer to the corresponding content in the foregoing method embodiments.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present invention and should be construed as falling within the scope of the claims and the description.