CN114241374B

Movatterモバイル変換

Info

Publication number: CN114241374B
Application number: CN202111530235.8A
Authority: CN
Inventors: 宋腾飞; 邢浩强; 邓天生; 于天宝; 贠挺; 陈国庆; 林赛群
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2021-12-14
Filing date: 2021-12-14
Publication date: 2022-12-13
Anticipated expiration: 2041-12-14
Also published as: CN114241374A

Abstract

The disclosure provides a live broadcast processing model training method and device and a live broadcast processing method and device, and relates to the field of artificial intelligence, in particular to the field of computer vision and deep learning. The specific implementation scheme is as follows: extracting candidate sample images from the live images; processing the candidate sample image by adopting a live broadcast processing model to obtain a processing result of the candidate sample image; the live broadcast processing model comprises a live broadcast detection model and a live broadcast classification model, and the processing result of the candidate sample image comprises the detection result and the classification result of the candidate sample image; and determining a target sample image from the candidate sample images according to the processing result of the candidate sample images, and training the live broadcast processing model by adopting the target sample image. The present disclosure can improve training efficiency and recognition accuracy of live-cast processing models.

Description

Training method of live broadcast processing model, live broadcast processing method, device and equipment

Technical Field

The present disclosure relates to the field of artificial intelligence, and more particularly to the field of computer vision and deep learning.

Background

With the continuous development of the internet and communication networks, live webcasting is more and more popular, and various requirements of users on entertainment, shopping and the like are met. Due to the irregularity of the network live content, various bad and harmful live broadcasts often occur, and great physical and psychological influences are brought to viewers.

Therefore, how to timely discover and process the illegal live broadcasts is an important problem in the industry.

Disclosure of Invention

The present disclosure provides a live broadcast processing model training method, a live broadcast processing device, and a live broadcast processing apparatus, so as to improve the training efficiency and the recognition accuracy of the live broadcast processing model.

According to an aspect of the present disclosure, there is provided a method for training a live broadcast processing model, including:

extracting candidate sample images from the live images;

processing the candidate sample image by adopting a live broadcast processing model to obtain a processing result of the candidate sample image; the live broadcast processing model comprises a live broadcast detection model and a live broadcast classification model, and the processing result of the candidate sample image comprises the detection result and the classification result of the candidate sample image;

and determining a target sample image from the candidate sample images according to the processing result of the candidate sample images, and training the live broadcast processing model by adopting the target sample image.

According to another aspect of the present disclosure, there is provided a live broadcast processing method, including:

extracting a target image to be detected from the live image;

detecting the target image by adopting a live broadcast detection model in a live broadcast processing model to obtain a detection result of the target image, and classifying the target image by adopting a live broadcast classification model in the live broadcast processing model to obtain a classification result of the target image;

determining a live broadcast processing result of the target image according to the detection result of the target image and/or the classification result of the target image;

the live broadcast processing model is obtained by training through the training method of the live broadcast processing model provided by any embodiment of the disclosure.

According to another aspect of the present disclosure, there is provided a training apparatus for broadcasting a process model, the apparatus including:

the candidate sample image extraction module is used for extracting candidate sample images from the live broadcast images;

the candidate sample image processing module is used for processing the candidate sample image by adopting a live broadcast processing model to obtain a processing result of the candidate sample image; the live broadcast processing model comprises a live broadcast detection model and a live broadcast classification model, and the processing result of the candidate sample image comprises the detection result and the classification result of the candidate sample image;

and the model training module is used for determining a target sample image from the candidate sample images according to the processing result of the candidate sample images and training the live broadcast processing model by adopting the target sample image.

According to another aspect of the present disclosure, there is provided a live broadcast processing apparatus, the apparatus including:

the target image extraction module is used for extracting a target image to be detected from the live broadcast image;

the target image processing module is used for detecting the target image by adopting a live broadcast detection model in a live broadcast processing model to obtain a detection result of the target image, and classifying the target image by adopting a live broadcast classification model in the live broadcast processing model to obtain a classification result of the target image;

the processing result determining module is used for determining a live broadcast processing result of the target image according to the detection result of the target image and/or the classification result of the target image;

the live broadcast processing model is determined by a training device of the live broadcast processing model provided by any embodiment of the disclosure.

According to another aspect of the present disclosure, there is provided an electronic apparatus including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a live processing method or a training method of a live processing model provided by any embodiment of the disclosure.

According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform a live processing method or a training method of a live processing model provided in any of the embodiments of the present disclosure.

According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a live processing method or a training method of a live processing model provided in any of the embodiments of the present disclosure.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

fig. 1 is a schematic diagram of a training method of a live broadcast processing model according to an embodiment of the present disclosure;

fig. 2 is a schematic diagram of a training method of a live broadcast processing model according to yet another embodiment of the present disclosure;

fig. 3 is a schematic diagram of a live broadcast processing method according to yet another embodiment of the present disclosure;

fig. 4 is a schematic diagram of a live broadcast processing method according to another embodiment of the present disclosure;

fig. 5 is a device diagram of a training device of a live broadcast processing model according to yet another embodiment of the present disclosure;

fig. 6 is a device diagram of a live broadcast processing device according to yet another embodiment of the present disclosure;

FIG. 7 is a block diagram of an electronic device for implementing the methods described in embodiments of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Fig. 1 is a flowchart of training of a live broadcast processing model according to an embodiment of the present disclosure, where the present embodiment is applicable to a case where a model is trained using a sample image, and the method may be performed by a live broadcast processing model training apparatus, which may be implemented in software and/or hardware. The device can be configured in an electronic device with corresponding data processing capability, and the method specifically comprises the following steps:

and S110, extracting candidate sample images from the live images.

Wherein the live image may be an image that determines that a bad image frame already exists. The candidate sample image may be a frame image corresponding to the live view.

Specifically, the live broadcast image is framed based on a preset frame interval or the frame rate of the live broadcast image, and a plurality of obtained image frame candidate sample images are obtained. It should be noted that the proportion of the bad image frame in the candidate sample image may be one hundred percent, or may be other proportions, and the bad image frame is determined according to a specific live video, and other healthy image frames besides the live video may also be used as candidate sample images to verify the processing capability after the live processing model is trained.

S120, processing the candidate sample image by adopting a live broadcast processing model to obtain a processing result of the candidate sample image; the live broadcast processing model comprises a live broadcast detection model and a live broadcast classification model, and the processing result of the candidate sample image comprises the detection result and the classification result of the candidate sample image.

The live broadcast detection model can be constructed based on a neural network detection algorithm, and by extracting texture and position features in the image and then adopting a classification and regression mode, the regions in the picture are classified and detected, so that whether the exposure of the preset human body adverse region exists in the current picture is identified. The live broadcast classification model can be constructed based on a neural network classification algorithm, and the harmful category of the current image pair is judged by extracting texture and semantic features in the image and then classifying the features by using a feature classifier.

Specifically, the extracted candidate sample images are labeled through a labeling tool, and during labeling, the category of the whole candidate sample image is labeled for training a classification model; target frames and frame category labels of specific contents in the candidate sample images are also provided for training the detection model. For example, for a candidate image frame which is completely normal for the clothing but whose specific parts are clearly uncovered, a target box is marked on the specific parts uncovered in the candidate image frame as a detection data set; and for candidate image frames which have no obvious physical characteristics but have low semantics and are used for tempting postures and actions, marking the whole image as a classification data set. After a classification data set corresponding to the live broadcast classification model and a detection data set corresponding to the detection model are obtained, the data sets are input into the corresponding models for processing, and two model output results of each candidate sample image are obtained. For the output result of the classification model, each candidate sample image can have three confidence scores, namely a normal harmful confidence score, a high-risk harmful confidence score and a low-risk harmful confidence score, and the sum of the three confidence scores is fixed; as for the output result of the inspection model, each of the candidate sample images may exist in different categories on account of the difference in exposure of the defective region, and further, on the basis of the defective category, the confidence level of the current sample image in at least one harmful category is given, based on the difference in exposure of the defective region.

S130, determining a target sample image from the candidate sample images according to the processing result of the candidate sample images, and training the live broadcast processing model by adopting the target sample image.

The target sample image is a sample image of which the classification result and the detection result meet preset conditions in the candidate sample image.

Specifically, the images that are not detected by both models are considered as normal images and can be discarded from the corresponding data sets, and if the candidate sample images respectively detected by both models are greater than a certain confidence criterion, the candidate sample images are considered as target sample images, and the candidate sample images simultaneously detected by both models are also considered as target sample images. The candidate sample images are detected and classified through the processing model, the target sample images with certain recognition accuracy are obtained, in the process, the images are marked and recognized through the classification and detection model, the target samples in the candidate samples are determined, manual operation in the process of determining the target samples is reduced, and model training efficiency is improved.

According to the method, after the candidate sample images are extracted from the live broadcast images, the candidate image categories are identified and labeled in a model iteration mode based on the live broadcast detection model and the live broadcast classification model, so that the target sample images required by model training are obtained, the model training efficiency and the identification accuracy are improved, and a large amount of human resources are saved.

On the basis of the above disclosure, optionally, the determining, according to the processing result of the candidate sample image, a target sample image from the candidate sample image, and training the live broadcast processing model by using the target sample image includes:

under the condition that the processing result of any candidate sample image is wrong, adopting the candidate sample image and the processing result of the candidate sample image to construct a negative sample; and training the live broadcast processing model by adopting the negative sample.

The negative sample may be a false detection image that should be determined as the target sample image but is actually false-judged as the non-target sample image and discarded, or a false detection image that should be determined as the non-target sample image but is actually false-judged as the target sample image and retained.

Specifically, in the model learning process, a situation may occur in which a positive sample is predicted as a negative sample, and a negative sample is predicted as a positive sample. Therefore, in the method, the false detection image is analyzed in the model iteration process, and corresponding negative sample data or positive sample data is added in a targeted mode aiming at the false detection image, so that the situation of partial false detection can be effectively reduced. The positive and negative sample data added in a directional way come from the images of the corresponding detected or false-detected classes of the detection or classification model, and can be added into the training set after certain adjustment. By adopting the false detection image as the negative sample, the false detection rate of the model is effectively reduced.

Referring to fig. 2, the method includes:

s210, extracting candidate sample images from the live images.

S220, processing the candidate sample image by adopting a live broadcast processing model to obtain a processing result of the candidate sample image; the live broadcast processing model comprises a live broadcast detection model and a live broadcast classification model, and the processing result of the candidate sample image comprises the detection result and the classification result of the candidate sample image.

S230A, in the detection result of any candidate sample image, when the confidence of the bad category to which the bad region belongs in the candidate sample image is greater than a first confidence threshold, and the confidence of the bad category of the candidate sample image in the classification result of the candidate sample image is greater than a second confidence threshold, the candidate sample image is taken as the target sample image.

Wherein the poor region is a body region which causes an adverse effect when exposed, such as sex organs and the like. The harmful category may determine a degree of adverse effect according to a body part to which the body region belongs, an exposure degree of the body region, identity information of a person to which the body region belongs, and the like, and divide the harmful region into a low-risk type harmful category and a high-risk type harmful category according to the degree of adverse effect. For example, 14 bad categories may be preset, of which 7 may be classified into high-risk bad categories, and the other 7 may be classified into low-risk bad categories.

Specifically, the candidate sample image is detected based on the detection model to obtain a naked poor region of the current candidate sample image, and then the type and the confidence coefficient of the poor region in the current candidate sample image are determined according to the position and the size of the poor region, it should be noted that the confidence coefficient score is related to the position, the size and the human body characteristics of the poor region at the same time. The harmful category confidence greater than the second confidence threshold may be that the confidence threshold corresponding to any category of the low-risk harmful confidence or the high-risk harmful confidence is greater than the second confidence threshold, or the sum of the confidence of the low-risk harmful confidence and the confidence of the high-risk harmful confidence is greater than the second confidence threshold, and the specific setting mode may be determined according to the requirement. By combining the confidence coefficient of the bad categories in the detection result and the confidence coefficient of the harmful results in the classification result, the accurate screening of the target sample image is realized.

Specifically, in order to improve the target sample image determination efficiency, if the confidence of the bad type in the detection result is greater than the third confidence threshold, the candidate sample image which is the target sample image can be directly determined, and auxiliary judgment in combination with the classification result is not required. It should be noted that the third confidence threshold should be greater than the first confidence threshold that needs to be further determined by combining the classification result, so that the screening accuracy is ensured while the determination efficiency of the target sample image is improved. And when the confidence of the bad category to which the bad region belongs in the candidate sample image is smaller than the third confidence threshold but larger than the fourth confidence threshold, the confidence of the bad category is at a middle position, and the bad region is directly determined as the target sample image or discarded, which may be greatly different from the actual situation, so that the candidate sample image satisfying the confidence interval needs to be further subjected to supplementary annotation verification, and after the annotation verification is completed, whether the candidate sample image is the target sample image or not is determined. By reasonably setting the third confidence coefficient threshold value and the fourth confidence coefficient threshold value and comparing the third confidence coefficient threshold value and the fourth confidence coefficient threshold value with the candidate sample image, the normal candidate sample image is prevented from being wrongly identified as the illegal target sample image on the basis of ensuring that the illegal image can be normally identified.

For example, for the annotation verification, the candidate sample image corresponding to the question may be input into the live broadcast detection model again, and if the confidence of the poor type of the second detection result is still less than the third confidence threshold and greater than the fourth confidence threshold, it is indicated that the candidate image cannot be normally identified, and the annotation verification may be performed on the current candidate sample image manually. Before the manual labeling verification, a set of the current suspicious candidate sample images can be tested, the recall accuracy of the set is determined, and the labeling verification of the current candidate sample images in the set can be triggered under the condition that the accuracy is equal to or greater than an accuracy threshold; in the case that the accuracy is less than the accuracy threshold, the set may be discarded without performing a labeling verification. The candidate sample images for the marking verification are screened according to the accuracy, so that the workload of the manual marking verification can be further reduced, and the success rate of the manual marking verification is improved.

In this disclosure, optionally, after the candidate sample image is taken as the target sample image, the method further includes:

matching the bad categories to which the bad areas in the candidate sample image belong with a preset incidence relation, and determining the bad categories to which the candidate sample image belongs; the preset association relationship is an association relationship between a candidate bad category of the live broadcast detection model and a candidate harmful category of the live broadcast classification model; and taking the defective region in the candidate sample image and the defective category to which the defective region belongs as detection marking data of the candidate sample image, and taking the defective category to which the candidate sample image belongs as classification marking data of the candidate sample image.

Specifically, the target sample image is determined from the candidate sample image directly based on the bad type and the bad confidence, so that the harmful type of the current target sample image in the classification result cannot be determined, the subsequent target sample image cannot effectively train the classification model, and only the detection model can be effectively trained. The method comprises the steps of establishing a certain incidence relation between candidate adverse categories of a live broadcast detection model and candidate adverse categories of a live broadcast classification model in advance, for example, exposing the adverse categories to some adverse areas with larger adverse effects, once determining the adverse categories of a candidate sample image as the categories and directly determining the candidate sample image as a target sample image only according to a detection result, determining the adverse categories corresponding to the sample image as high-risk types; similarly, when the defective region having a small adverse effect is exposed to the defective category, the defective category is determined to be of the low-risk type. After bad and harmful categories of the target image are obtained, detection marking data and classification identification data are respectively generated and stored in association with the current target sample image, so that the target sample image can be used for training a live broadcast detection model and a live broadcast classification model at the same time, sample data is expanded, and the processing efficiency of the corresponding model is improved.

S240, training the live broadcast processing model by adopting the target sample image.

According to the method and the device, whether the candidate sample image is the target sample image or not is further determined according to the size relation between the confidence coefficient of the bad category to which the bad region belongs in the candidate sample image and the preset multiple confidence coefficient thresholds and the harmful category associated with the candidate sample image, the accurate distinguishing of the candidate target image is achieved, and the accuracy and the effectiveness of determining the target sample image from the candidate sample image are improved.

Fig. 3 is a flowchart of a live broadcast processing method according to yet another embodiment of the present disclosure, where this embodiment may be applied to a case where a live broadcast is processed by using a live broadcast processing model, and the method may be performed by a training apparatus of the live broadcast processing model, and the apparatus may be implemented in software and/or hardware. The device can be configured in an electronic device with corresponding data processing capability, and the method specifically comprises the following steps:

and S310, extracting a target image to be detected from the live image.

Specifically, frame extraction is carried out on the live broadcast images according to a preset time interval, the obtained frame images are uniformly zoomed according to a first preset size proportion, and the obtained frame images with the same size are used as target images to be detected after normalization processing.

S320, detecting the target image by adopting a live broadcast detection model in the live broadcast processing model to obtain a detection result of the target image, and classifying the target image by adopting a live broadcast classification model in the live broadcast processing model to obtain a classification result of the target image.

Specifically, the live broadcast detection model and the live broadcast classification model have different working principles, and the size requirements of the live broadcast detection model and the live broadcast classification model on the input target picture may be different. The first preset size set when extracting the target image may actually be one of the model image sizes that fits the two models most. Therefore, before the target image is processed by using the other model, the target image needs to be uniformly scaled to a second preset size in the same manner, and then input into the other model for processing after normalization, so that the two models can be ensured to operate in the optimal scene.

S330, determining a live broadcast processing result of the target image according to the detection result of the target image and/or the classification result of the target image; the live broadcast processing model is obtained by training by adopting the training method of the live broadcast processing model disclosed by the disclosed embodiment.

Specifically, the live broadcast processing result is determined based on at least one of the detection result and the classification result, so that the target image containing at least one of semantic information and a characteristic position can be effectively identified, and the current live broadcast processing result is determined according to the identification result of the target image. If the current target image is determined to be a harmful image, warning processing is carried out on the anchor, and stopping or ending processing or permanent number sealing processing can also be carried out on the anchor; if the image is a normal image, no processing is needed; if the target image cannot be effectively identified based on the classification result and the detection result, the current live broadcast processing result can be determined by manual participation.

This is disclosed handles the target image that comes from the live broadcast image through the live broadcast processing model that utilizes the preselection training, combines live broadcast classification model and live broadcast detection model to carry out the preliminary treatment to the server, has higher calling to right, and lower false retrieval, robustness are good, can effectively replace manual examination and check, use manpower sparingly.

Fig. 4 is a flowchart of a live broadcast processing method according to still another embodiment of the present disclosure, in this embodiment, on the basis of the foregoing embodiments, the "determining, according to the detection result of the target image and/or the classification result of the target image, the live broadcast processing result of the target image" is refined into "that, in the detection result of the target image, the bad category to which the bad region belongs is unique, and the bad category is a preset high-risk bad category, and in a case that the bad category is a preset high-risk bad category, the high-risk harmful confidence of the target image and the low-risk harmful confidence of the target image are extracted from the classification result of the target image; under the condition that the high-risk harmful confidence of the target image is greater than a fifth confidence threshold or the low-risk harmful confidence of the target image is greater than a sixth confidence threshold, determining that the target image belongs to a harmful image, and taking the harmful image as a live broadcast processing result of the target image or extracting the high-risk harmful confidence of the target image and the low-risk harmful confidence of the target image from a classification result of the target image; and under the condition that the sum of the high-risk harmful confidence coefficient and the low-risk harmful confidence coefficient of the target image is greater than a seventh confidence coefficient threshold value, determining that the target image belongs to a harmful image, and taking the target image as a live broadcast processing result of the target image or determining live broadcast processing results of the target image according to at least two bad categories under the condition that a bad region exists in a detection result of the target image and the bad region belongs to the at least two bad categories.

Referring to fig. 4, the method includes:

and S410, extracting a target image to be detected from the live image.

S420, detecting the target image by adopting a live broadcast detection model in the live broadcast processing model to obtain a detection result of the target image, and classifying the target image by adopting a live broadcast classification model in the live broadcast processing model to obtain a classification result of the target image.

S430A, under the condition that the bad category of the bad region in the detection result of the target image is unique and is a preset high-risk bad category, extracting a high-risk harmful confidence coefficient of the target image and a low-risk harmful confidence coefficient of the target image from the classification result of the target image; and under the condition that the high-risk harmful confidence coefficient of the target image is greater than a fifth confidence coefficient threshold value or the low-risk harmful confidence coefficient of the target image is greater than a sixth confidence coefficient threshold value, determining that the target image belongs to a harmful image and taking the harmful image as a live broadcast processing result of the target image.

Whether the adverse types are high-risk types or not can be determined based on the adverse area positions corresponding to the types, the exposure degree determination and the human body characteristics determination, such as the same exposure area size, and the adverse types corresponding to the adverse types and the human body characteristics are different in harm degrees at different positions, such as genitals and shoulders, wherein the adverse types can be high-risk types, and the adverse types can be non-high-risk types; the same upper body is completely uncovered, and the corresponding adverse category harm degrees of the male anchor and the female anchor are different due to different human characteristics.

Specifically, if the current bad type is determined to be a high-risk type and the bad type is single, it is determined that the target image is a potentially harmful image, and at this time, a classification result of the live broadcast classification model is obtained to assist in determination. Because the classification result of each target image has high-risk harmful confidence and low-risk harmful confidence, the target image can be rapidly qualified by combining the detection result and one of the two. It should be noted that the fifth confidence threshold should be set to be lower than the sixth confidence threshold, when the harmful type is a high-risk type, a lower confidence may be triggered, and when the harmful type is a low-risk type, a higher confidence may be triggered, and the relationship between the fifth confidence threshold and the sixth confidence threshold is set reasonably, so that the accuracy of determining the final harmful image can be ensured.

S430B, extracting a high-risk harmful confidence coefficient of the target image and a low-risk harmful confidence coefficient of the target image from the classification result of the target image; and under the condition that the sum of the high-risk harmful confidence coefficient and the low-risk harmful confidence coefficient of the target image is greater than a seventh confidence coefficient threshold value, determining that the target image belongs to a harmful image and taking the harmful image as a live broadcast processing result of the target image.

Specifically, similar to determining the target sample image based on the detection result only in the model training process, the harmful category of the target image may also be determined based on the classification result only in the present disclosure. And acquiring the sum of the high-risk harmful confidence and the low-risk harmful confidence of the current target image in the classification result, and if the sum of the high-risk harmful confidence and the low-risk harmful confidence is greater than a seventh confidence threshold, indicating that the semantically maximum probability of the target image is a harmful image. And determining a corresponding live broadcast processing result by taking the target image as a harmful image. Whether the target image is an illegal image or not is determined only according to the classification result, any bad area can be completely uncovered, but the target image with semantic violation in posture, action and the like can be effectively identified, and the identification effectiveness of various illegal conditions is improved.

And S440C, determining a live broadcast processing result of the target image according to the at least two bad categories when the bad areas exist in the detection result of the target image and the bad areas belong to the at least two bad categories.

Specifically, it is different from the above-described determination of the live broadcast processing result based on both the detection result and the classification result, only the classification result. The present disclosure provides a third approach: and determining the live broadcast processing result of the target image according to the detection result. If the current image has bad areas which belong to different bad categories, the target image is shown to have at least two different bad areas exposed, the current image can be directly judged to be an illegal image, and then a live broadcast processing result is determined. By directly qualifying the target images with a plurality of different types of bad areas, the determination efficiency of the illegal target images is improved without further auxiliary determination according to the classification result.

In addition, it should be noted that, the steps S430A, S430B, and S430C do not have a fixed execution order and priority, and one or more of them may be used according to specific requirements, which is not limited in this disclosure.

The method and the device have the advantages that the classification result and the detection result can be simultaneously utilized, or only one of the classification result and the detection result is utilized to identify the target image, so that even the target image with only harmful semantics and no exposed bad areas can be effectively identified, and the accuracy and the universality of live broadcast of different violation conditions are improved; by directly qualifying the target images with a plurality of different types of bad areas, the determination efficiency of the illegal target images is improved without further auxiliary determination according to the classification result.

Fig. 5 is a schematic diagram of a training apparatus for a live broadcast processing model according to yet another embodiment of the present disclosure, where the apparatus is configured in an electronic device with corresponding data processing capability, and is configured to implement a live broadcast processing model training method according to any embodiment of the present disclosure.

Referring to fig. 5, the apparatus includes:

a candidate sampleimage extraction module 510, configured to extract a candidate sample image from the live image;

a candidate sampleimage processing module 520, configured to process the candidate sample image by using a live broadcast processing model to obtain a processing result of the candidate sample image; the live broadcast processing model comprises a live broadcast detection model and a live broadcast classification model, and the processing result of the candidate sample image comprises the detection result and the classification result of the candidate sample image;

amodel training module 530, configured to determine a target sample image from the candidate sample images according to the processing result of the candidate sample images, and train the live broadcast processing model by using the target sample image.

The device and the module can execute the training method of the live broadcast processing model provided by any embodiment of the disclosure, and have the corresponding functional modules and beneficial effects of the execution method. Optionally, the model training module includes a first target sample image determining unit 531;

the first sample image determining unit 531 is configured to, in a detection result of any one of the candidate sample images, regard the candidate sample image as a target sample image when a confidence degree of a poor category to which a poor region belongs in the candidate sample image is greater than a first confidence degree threshold value, and a confidence degree of a poor category of the candidate sample image in a classification result of the candidate sample image is greater than a second confidence degree threshold value.

Optionally, the model training module includes a second target sample image determining unit 532 and a third target sample image determining unit 533;

the second target sample image determining unit 532 is configured to, in a detection result of any candidate sample image, take the candidate sample image as a target sample image when a confidence of a poor category to which a poor region belongs in the candidate sample image is greater than a third confidence threshold;

the third target sample image determining unit 533 is configured to generate a labeling verification task for any candidate sample image when, in a detection result of the candidate sample image, a confidence of a poor category to which a poor region belongs in the candidate sample image is smaller than the third confidence threshold and larger than a fourth confidence threshold; and determining whether the candidate sample image is the target sample image according to the labeling verification information of the candidate sample image.

Optionally, the model training module further includes an association determination unit 534 and a data annotation unit 535;

the association determining unit 534 is configured to match a bad category to which a bad region in the candidate sample image belongs with a preset association relationship, and determine a bad category to which the candidate sample image belongs; the preset association relationship is an association relationship between a candidate bad category of the live broadcast detection model and a candidate harmful category of the live broadcast classification model;

the data labeling unit 535 is configured to use the defective region and the defective category to which the defective region belongs in the candidate sample image as the detection labeling data of the candidate sample image, and use the defective category to which the candidate sample image belongs as the classification labeling data of the candidate sample image.

Optionally, the model training module further includes a sample expansion unit 536;

the sample expansion unit 536 is configured to, in a case where a processing result of any one of the candidate sample images is incorrect, construct a negative sample by using the candidate sample image and the processing result of the candidate sample image; and training the live broadcast processing model by adopting the negative sample.

The further described devices, modules and units may execute the live broadcast processing model training method provided by any embodiment of the present disclosure, and have corresponding functional modules and beneficial effects of the execution method.

Fig. 6 is a schematic diagram of a live broadcast processing apparatus according to yet another embodiment of the present disclosure, where the embodiment of the present disclosure is applicable to a situation where a live broadcast is processed by using a live broadcast processing model, and the apparatus is configured in an electronic device with corresponding data processing capability, and can implement a live broadcast processing method according to any embodiment of the present disclosure.

Referring to fig. 6, the apparatus includes:

a targetimage extraction module 610, configured to extract a target image to be detected from a live image;

a targetimage processing module 620, configured to detect the target image by using a live broadcast detection model in a live broadcast processing model to obtain a detection result of the target image, and classify the target image by using a live broadcast classification model in the live broadcast processing model to obtain a classification result of the target image;

a processingresult determining module 630, configured to determine a live broadcast processing result of the target image according to the detection result of the target image and/or the classification result of the target image; the live broadcast processing model is obtained by training by adopting the training method of the live broadcast processing model disclosed by the disclosed embodiment.

The device and the module can execute the live broadcast processing method provided by any embodiment of the disclosure, and have the corresponding functional modules and beneficial effects of the execution method. Optionally, the detection image processing module includes a first result determining unit 621;

the first result determining unit 621 is configured to, under the condition that the poor category to which the poor region belongs is unique in the detection result of the target image and the poor category is a preset high-risk poor category, extract a high-risk harmful confidence coefficient of the target image and a low-risk harmful confidence coefficient of the target image from the classification result of the target image;

and under the condition that the high-risk harmful confidence coefficient of the target image is greater than a fifth confidence coefficient threshold value or the low-risk harmful confidence coefficient of the target image is greater than a sixth confidence coefficient threshold value, determining that the target image belongs to a harmful image and taking the harmful image as a live broadcast processing result of the target image.

Optionally, the processing result determining module includes a second result determining unit 622;

the second result determining unit 622 is configured to extract a high-risk harmful confidence of the target image and a low-risk harmful confidence of the target image from the classification result of the target image;

and under the condition that the sum of the high-risk harmful confidence coefficient and the low-risk harmful confidence coefficient of the target image is greater than a seventh confidence coefficient threshold value, determining that the target image belongs to a harmful image and taking the harmful image as a live broadcast processing result of the target image.

The processing result determination module includes a third result determination unit 623;

the third result determining unit 623 is configured to determine a live broadcast processing result of the target image according to the at least two bad categories when a bad area exists in the detection result of the target image and the bad area belongs to the at least two bad categories.

The further described devices, modules and units can execute the live broadcast processing method provided by any embodiment of the present disclosure, and have corresponding functional modules and beneficial effects of the execution method.

In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure and the like of the personal information of the related user all conform to the regulations of related laws and regulations, and do not violate the good custom of the public order.

The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.

FIG. 7 shows a schematic block diagram of an exampleelectronic device 700 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 7, thedevice 700 comprises acomputing unit 701, which may perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from astorage unit 708 into a Random Access Memory (RAM) 703. In the RAM703, various programs and data required for the operation of thedevice 700 can also be stored. Thecomputing unit 701, theROM 702, and the RAM703 are connected to each other by abus 704. An input/output (I/O)interface 705 is also connected tobus 704.

Various components in thedevice 700 are connected to the I/O interface 705, including: aninput unit 706 such as a keyboard, a mouse, or the like; anoutput unit 707 such as various types of displays, speakers, and the like; astorage unit 708 such as a magnetic disk, optical disk, or the like; and acommunication unit 709 such as a network card, modem, wireless communication transceiver, etc. Thecommunication unit 709 allows thedevice 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.

Computing unit 701 may be a variety of general purpose and/or special purpose processing components with processing and computing capabilities. Some examples of thecomputing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. Thecalculation unit 701 performs the respective methods and processes described above, such as a training method of a live broadcast processing model and/or a live broadcast processing method. For example, in some embodiments, the training method of the live processing model and/or the live processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such asstorage unit 707. In some embodiments, part or all of a computer program may be loaded onto and/or installed ontodevice 700 viaROM 702 and/orcommunications unit 709. When loaded into the RAM703 and executed by thecomputing unit 701, the computer program may perform one or more steps of the live processing model and/or the live processing method described above. Alternatively, in other embodiments, thecomputing unit 701 may be configured as a training method of a live processing model and/or a live processing method X by any other suitable means (e.g., by means of firmware).

Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, field Programmable Gate Arrays (FPGAs), application Specific Integrated Circuits (ASICs), application Specific Standard Products (ASSPs), system on a chip (SOCs), complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), blockchain networks, and the internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical host and VPS service are overcome.

It should be understood that various forms of the flows shown above, reordering, adding or deleting steps, may be used. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims

1. A training method of a live broadcast processing model comprises the following steps:

extracting candidate sample images from the live images;

determining a target sample image from the candidate sample images according to the processing result of the candidate sample images, and training the live broadcast processing model by adopting the target sample image;

wherein the determining the target sample image from the candidate sample images according to the processing result of the candidate sample images comprises:

taking any candidate sample image as a target sample image under the condition that the confidence coefficient of the poor category to which the poor region belongs in the candidate sample image in the detection result of the candidate sample image is greater than a third confidence coefficient threshold value;

correspondingly, after determining the target sample image from the candidate sample images, the method further includes:

matching the bad category to which the bad area in the candidate sample image belongs with a preset incidence relation, and determining the bad category to which the candidate sample image belongs; the preset association relationship is an association relationship between a candidate bad category of the live broadcast detection model and a candidate harmful category of the live broadcast classification model;

taking the defective region in the candidate sample image and the defective category to which the defective region belongs as detection labeling data of the candidate sample image, and taking the defective category to which the candidate sample image belongs as classification labeling data of the candidate sample image;

and the detection labeling data and the classification labeling data are stored in association with the current target sample image.

2. The method of claim 1, wherein the determining a target sample image from the candidate sample images according to the processing results of the candidate sample images comprises:

and under the condition that the confidence coefficient of the poor category of the poor region in any candidate sample image in the detection result of the candidate sample image is greater than a first confidence coefficient threshold value, and the confidence coefficient of the poor category of the candidate sample image in the classification result of the candidate sample image is greater than a second confidence coefficient threshold value, taking the candidate sample image as a target sample image.

3. The method of claim 1, wherein the determining the target sample image from the candidate sample images according to the processing result of the candidate sample images comprises:

generating a labeling verification task for the candidate sample image under the condition that the confidence coefficient of the poor category to which the poor region belongs in the candidate sample image in the detection result of any candidate sample image is smaller than the third confidence coefficient threshold value and larger than a fourth confidence coefficient threshold value;

and determining whether the candidate sample image is the target sample image according to the labeling verification information of the candidate sample image.

4. The method of claim 1, wherein the determining a target sample image from the candidate sample images according to the processing result of the candidate sample images and training the live processing model with the target sample image comprises:

under the condition that the processing result of any candidate sample image is wrong, adopting the candidate sample image and the processing result of the candidate sample image to construct a negative sample;

and training the live broadcast processing model by adopting the negative sample.

5. A live broadcast processing method, comprising:

extracting a target image to be detected from the live image;

the live broadcast processing model is obtained by training by adopting the training method of the live broadcast processing model as claimed in any one of claims 1-4.

6. The method according to claim 5, wherein the determining a live broadcast processing result of the target image according to the detection result of the target image and/or the classification result of the target image comprises:

under the condition that the bad category of the bad region in the detection result of the target image is unique and is a preset high-risk bad category, extracting a high-risk harmful confidence coefficient of the target image and a low-risk harmful confidence coefficient of the target image from the classification result of the target image;

7. The method of claim 5, wherein determining a result of live processing of a target image according to the classification result of the target image comprises:

extracting a high-risk harmful confidence coefficient of the target image and a low-risk harmful confidence coefficient of the target image from the classification result of the target image;

8. The method of claim 5, wherein determining a live processing result of a target image according to the detection result of the target image comprises:

and determining a live broadcast processing result of the target image according to the at least two bad categories when the bad area exists in the detection result of the target image and the bad area belongs to the at least two bad categories.

9. A training apparatus for a live broadcast processing model, comprising:

the model training module is used for determining a target sample image from the candidate sample images according to the processing result of the candidate sample images and training the live broadcast processing model by adopting the target sample image;

the model training module comprises a second target sample image determining unit, a judging unit and a judging unit, wherein the second target sample image determining unit is used for taking any candidate sample image as a target sample image under the condition that the confidence coefficient of the poor category to which the poor region belongs in the candidate sample image in the detection result of the candidate sample image is greater than a third confidence coefficient threshold value;

correspondingly, the model training module further comprises an association determination unit and a data labeling unit;

the association determining unit is used for matching the bad category to which the bad area belongs in the candidate sample image with a preset association relation, and determining the harmful category to which the candidate sample image belongs; the preset association relationship is an association relationship between a candidate bad category of the live broadcast detection model and a candidate harmful category of the live broadcast classification model;

the data labeling unit is used for taking the defective area in the candidate sample image and the defective category to which the defective area belongs as the detection labeling data of the candidate sample image, and taking the harmful category to which the candidate sample image belongs as the classification labeling data of the candidate sample image;

and the detection marking data and the classification marking data are stored in association with the current target sample image.

10. The apparatus of claim 9, wherein the model training module comprises a first target sample image determination unit;

the first target sample image determining unit is configured to, in a detection result of any one candidate sample image, regard the candidate sample image as a target sample image when a confidence degree of a bad category to which a bad region in the candidate sample image belongs is greater than a first confidence degree threshold value and a confidence degree of a bad category of the candidate sample image in a classification result of the candidate sample image is greater than a second confidence degree threshold value.

11. The apparatus of claim 9, wherein the model training module comprises a third target sample image determination unit;

the third target sample image determining unit is configured to generate a labeling verification task for any candidate sample image when the confidence of the poor category to which the poor region belongs in the candidate sample image in the detection result of the candidate sample image is smaller than the third confidence threshold and larger than a fourth confidence threshold; and determining whether the candidate sample image is the target sample image according to the labeling verification information of the candidate sample image.

12. The apparatus of claim 9, wherein the model training module further comprises a sample expansion unit;

the sample expansion unit is used for constructing a negative sample by adopting the candidate sample image and the processing result of the candidate sample image under the condition that the processing result of any candidate sample image is wrong; and training the live broadcast processing model by adopting the negative sample.

13. A live processing apparatus comprising:

wherein the live processing model is determined by a training device of the live processing model of any of claims 9-12.

14. The apparatus of claim 13, wherein the target image processing module comprises a first result determination unit; the first result determination unit is specifically configured to:

under the condition that the bad type of the bad region in the detection result of the target image is unique and the bad type is a preset high-risk bad type, extracting a high-risk harmful confidence coefficient of the target image and a low-risk harmful confidence coefficient of the target image from the classification result of the target image;

15. The apparatus of claim 13, wherein the processing result determination module comprises a second result determination unit; the second result determining unit is specifically configured to:

the second result determining unit is used for extracting the high-risk harmful confidence coefficient of the target image and the low-risk harmful confidence coefficient of the target image from the classification result of the target image;

16. The apparatus of claim 13, wherein the processing result determination module comprises a third result determination unit;

and the third result determining unit is used for determining the live broadcast processing result of the target image according to the at least two bad categories under the condition that the bad area exists in the detection result of the target image and the bad area belongs to the at least two bad categories.

17. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of training a live processing model as claimed in any one of claims 1-7.

18. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of training a live processing model of any of claims 1-7.

19. A computer program product comprising a computer program which, when executed by a processor, implements a live processing method according to any of claims 1-7.