CN114339049A - A video processing method, apparatus, computer equipment and storage medium - Google Patents


Info

Publication number
CN114339049A
Authority
CN
China
Prior art keywords
target, video, processing, image, information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111672470.9A
Other languages
Chinese (zh)
Other versions
CN114339049B (en)
Inventor
钟华平
何聪辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Sensetime Technology Co Ltd
Original Assignee
Shenzhen Sensetime Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Sensetime Technology Co Ltd
Priority to CN202111672470.9A
Publication of CN114339049A
Priority to PCT/CN2022/103095 (WO2023123981A1)
Application granted
Publication of CN114339049B
Legal status: Active
Anticipated expiration


Abstract

The present disclosure provides a video processing method, apparatus, computer device, and storage medium. The method includes: acquiring a video to be processed; identifying at least one type of target object in each frame of the video to be processed, where each type of target object relates to personal information; blurring the identified target objects in the video images to obtain target images; and generating a blurred target video based on the target images corresponding to the respective video frames.

Description

A video processing method, apparatus, computer device, and storage medium

Technical Field

The present disclosure relates to the field of image processing technology, and in particular to a video processing method, apparatus, computer device, and storage medium.

Background

With the rapid development of network technology, people can obtain a wide variety of information, such as videos and pictures, through many channels and in many ways. This diversification of acquisition channels increases the convenience of obtaining information, but it also increases the risk of information leakage; for example, the leakage of sensitive information such as license plates and faces in driving videos compromises data security.

How to prevent the leakage of sensitive information in videos and ensure data security, while preserving the convenience of information acquisition, has become an urgent problem.

Summary of the Invention

Embodiments of the present disclosure provide at least a video processing method, apparatus, computer device, and storage medium.

In a first aspect, an embodiment of the present disclosure provides a video processing method, including:

acquiring a video to be processed;

identifying at least one type of target object in each frame of the video to be processed, where each type of target object relates to personal information;

blurring the identified target objects in the video images to obtain target images; and

generating a blurred target video based on the target images corresponding to the respective video frames.

Here, the target objects related to personal information are typically the sensitive content of the video images. By identifying every type of personal-information-related target object in each video frame and blurring all of them, the sensitive information can be removed frame by frame, yielding a target video stripped of its sensitive information, that is, a desensitized target video, which effectively improves the security of the video data.
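As an illustrative sketch only (the patent does not specify code), the four claimed steps might look as follows in numpy, with frames as grayscale arrays. `detect_personal_info` is a hypothetical stub standing in for the trained detector described later, and each detected region is simply filled with its mean value as the simplest possible blurring step:

```python
import numpy as np

def detect_personal_info(frame):
    """Hypothetical stub for the trained detector: returns (x, y, w, h)
    boxes around personal-information objects such as faces or plates."""
    h, w = frame.shape[:2]
    return [(w // 4, h // 4, w // 2, h // 2)]  # one dummy box for illustration

def blur_region(frame, box):
    """Simplest possible blurring: fill the box with its mean value."""
    x, y, w, h = box
    roi = frame[y:y + h, x:x + w]
    roi[...] = roi.mean()
    return frame

def desensitize_video(frames):
    """Identify targets in every frame, blur them, and assemble the
    blurred frames into the target video (a list of frames here)."""
    target_frames = []
    for frame in frames:
        target = frame.astype(float).copy()
        for box in detect_personal_info(target):
            blur_region(target, box)
        target_frames.append(target)
    return target_frames
```

A real pipeline would decode frames from a container format and re-encode the result; the list of arrays stands in for both ends here.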

In a possible implementation, blurring the identified target objects in a video image to obtain a target image includes:

in response to identifying multiple target objects in the video image, cropping from the video image an initial sub-image corresponding to each target object;

blurring each initial sub-image to obtain a corresponding target sub-image; and

replacing the corresponding initial sub-images in the video image with the target sub-images to obtain the target image.

In this implementation, cropping out the initial sub-image of each target object and blurring only those sub-images improves the overall blurring efficiency compared with blurring the entire video image directly.
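A minimal numpy sketch of this crop-blur-paste scheme, assuming grayscale images; the naive box blur is an illustrative stand-in for whatever kernel an implementation would actually use, and only the cropped sub-images are ever filtered:

```python
import numpy as np

def box_blur(img, k=5):
    """Naive k*k box blur via shifted sums; borders use replicated edge
    pixels. Illustrative only -- a real implementation would likely call
    a library filter instead."""
    pad = k // 2
    padded = np.pad(img.astype(float), pad, mode='edge')
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def blur_targets(image, boxes, k=5):
    """Crop the initial sub-image of each target, blur it, and paste the
    resulting target sub-image back over the original region."""
    result = image.astype(float).copy()
    for x, y, w, h in boxes:
        result[y:y + h, x:x + w] = box_blur(result[y:y + h, x:x + w], k)
    return result
```

The cost of the blur scales with the sub-image areas rather than the full frame, which is the efficiency point made above.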

In a possible implementation, for any initial sub-image, the blurring is performed according to the following steps:

dividing the initial sub-image into multiple processing regions;

determining a target pixel value for each processing region based on the pixel values of the pixels in that region; and

replacing the pixel value of each pixel in each processing region with the determined target pixel value to obtain the target sub-image corresponding to the initial sub-image.

In this implementation, the pixels in different processing regions have different values, so the target pixel values determined for the regions also differ. Replacing the pixel values of the regions with these different target values leaves differences in pixel value, that is, variations in color, between the regions of the resulting image, which makes the blurred target image look more natural.
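The three region steps amount to classic mosaicking and can be sketched as follows (numpy, grayscale assumed; the `reducer` parameter is an assumption for illustration -- `np.mean` gives the mean-value variant described next, while `np.max` or `np.min` would give the extreme-value variant):

```python
import numpy as np

def mosaic(sub_image, region=8, reducer=np.mean):
    """Divide the initial sub-image into region*region processing areas,
    compute one target pixel value per area with `reducer`, and write
    that value over every pixel of the area."""
    out = sub_image.astype(float).copy()
    h, w = out.shape[:2]
    for y in range(0, h, region):
        for x in range(0, w, region):
            area = out[y:y + region, x:x + region]
            area[...] = reducer(area, axis=(0, 1), keepdims=True)
    return out
```

Because each area gets its own target value, the result keeps coarse color variation across the sub-image rather than collapsing to a single flat patch.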

In a possible implementation, determining the target pixel value of each processing region based on the pixel values of the pixels in that region includes:

determining the mean pixel value of each processing region based on the pixel values of the pixels in that region, and using that mean as the target pixel value of the region.

In this implementation, using the pixel mean as the target pixel value lets the target value reflect the central tendency of the pixels in the corresponding region, so that pixel values are more balanced across the regions after replacement.

In a possible implementation, determining the target pixel value of each processing region based on the pixel values of the pixels in that region includes:

determining the extreme pixel value of each processing region based on the pixel values of the pixels in that region, and using that extreme value as the target pixel value of the region.

In this implementation, using a pixel extreme as the target pixel value makes the differences in pixel value between the regions more pronounced after replacement.

In a possible implementation, when the video to be processed was captured in a road environment, the object types of the target objects include a face type and a license plate type.

In this implementation, both face-type and license-plate-type target objects constitute personal information, and blurring these two types of target objects effectively improves data security.

In a possible implementation, the at least one type of target object in each frame of the video to be processed is identified with a pre-trained target neural network, which is trained on multiple sample images and can identify multiple types of target objects.

In this implementation, because a target neural network trained on multiple sample images can recognize multiple types of target objects with high accuracy, processing each frame of the video with the trained network accurately identifies every type of target object included in the frame, that is, every target object involving personal information.

In a possible implementation, the target neural network includes a shared network and multiple branch networks, each branch network being used to identify one type of target object;

identifying, with the pre-trained target neural network, at least one type of target object in each frame of the video to be processed includes:

decoding the video with the shared network of the target neural network to obtain each video frame, and, for each frame, performing multiple successive downsampling operations followed by multiple successive upsampling operations; and

determining, through the multiple branch networks of the target neural network and based on the sampling results, at least one type of target object included in the video image.

In this implementation, the successive up- and downsampling operations fully extract the feature information of the video image, and processing the extracted information through the multiple branch networks accurately yields the target objects of every type in the image.

In a possible implementation, performing multiple successive downsampling and upsampling operations on each video frame includes:

for each video frame, performing multiple successive downsampling operations on the frame to obtain the image feature information of each operation, where the input of each downsampling operation after the first is the image feature information produced by the previous one, and the input of the first downsampling operation is the video frame itself; and

performing multiple successive upsampling operations on the image feature information produced by the last downsampling operation, obtaining for each upsampling operation corresponding initial category information and the initial detection box information corresponding to that category information.

In this implementation, performing multiple downsampling operations followed by multiple upsampling operations fully extracts the feature information of the video image, yielding accurate initial category information and initial detection box information.
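Under the (purely illustrative) assumption that the learned convolutional stages are replaced by plain 2x average pooling and 2x nearest-neighbour upsampling, the claimed sampling chain might be sketched as follows, showing how each downsampling step consumes the previous step's features and how the upsampling chain starts from the last feature map:

```python
import numpy as np

def downsample2(x):
    """2x average pooling (assumes even spatial dims); stands in for a
    learned strided convolution."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2(x):
    """2x nearest-neighbour upsampling; stands in for a learned
    transposed convolution."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def shared_network(frame, n=3):
    """Sketch of the claimed scheme: the first downsampling step takes
    the frame itself, each later step takes the previous step's feature
    map, and the upsampling steps start from the last feature map."""
    feats, x = [], frame
    for _ in range(n):
        x = downsample2(x)
        feats.append(x)
    ups, y = [], feats[-1]
    for _ in range(n):
        y = upsample2(y)
        ups.append(y)
    return feats, ups
```

In the patented network each upsampling stage would additionally emit initial category and detection box information; only the resolution bookkeeping is shown here.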

In a possible implementation, determining, through the multiple branch networks of the target neural network and based on the sampling results, at least one type of target object included in the video image includes:

using the branch network among the multiple branch networks that matches the initial category information to perform multiple successive feature extractions on the initial category information of each upsampling operation, obtaining target category information, and to perform multiple successive feature extractions on the corresponding initial detection box information, obtaining target detection box information; and

determining at least one type of target object included in the video image based on the target category information and target detection box information obtained for each upsampling operation.

In this implementation, the branch network matched to the initial category information is the one suited to processing it. Performing multiple successive feature extractions on the initial category information with that branch network fully extracts the information, yielding accurate target category information; likewise, performing multiple successive feature extractions on the initial detection box information with the matched branch network accurately yields the target detection box information of the objects matching the target category information, where the detection box information reflects the position of a target object. Based on the target category information and target detection box information obtained for each upsampling operation, every target object of every type included in the video image, together with its position, can thus be obtained.

In a possible implementation, determining at least one type of target object included in the video image based on the target category information and target detection box information obtained for each upsampling operation includes:

determining the position information of each target object of each upsampling operation based on the target category information and target detection box information corresponding to that operation;

determining, based on the position information, whether the positions of target objects from the multiple upsampling operations overlap;

in response to overlapping positions, determining the confidence of each of the target objects whose positions overlap, and taking the target object with the highest confidence as the final target object; and

taking each determined final target object, together with each target object whose position does not overlap with another, as the target objects in the video image.

Because different upsampling operations may each produce initial detection box information for the same target object, further processing the target category information and target detection box information of every upsampling operation inevitably yields target objects with overlapping positions, and such overlapping objects are very likely the same object. Moreover, since the information produced by each upsampling operation differs, even the same object may be detected with different confidences. Taking the highest-confidence object among the overlapping ones as the final target object therefore improves the accuracy of the finally recognized target objects.
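This overlap-resolution step is essentially non-maximum suppression, and can be sketched in pure Python; boxes are assumed to be (x1, y1, x2, y2) tuples and "overlapping" is taken to mean an IoU above an assumed threshold:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def merge_detections(dets, thresh=0.5):
    """Keep the highest-confidence box of each overlapping group and every
    non-overlapping box, as in the claim; dets are (box, confidence)."""
    dets = sorted(dets, key=lambda d: d[1], reverse=True)
    kept = []
    for box, conf in dets:
        if all(iou(box, k[0]) < thresh for k in kept):
            kept.append((box, conf))
    return kept
```

Sorting by confidence first guarantees that whenever two detections of the same object overlap, the lower-confidence one is the one discarded.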

In a second aspect, an embodiment of the present disclosure further provides a video processing apparatus, including:

an acquisition module, configured to acquire a video to be processed;

an identification module, configured to identify at least one type of target object in each frame of the video to be processed, where each type of target object relates to personal information;

a processing module, configured to blur the identified target objects in the video images to obtain target images; and

a generation module, configured to generate a blurred target video based on the target images corresponding to the respective video frames.

In a possible implementation, the processing module is configured to: in response to identifying multiple target objects in a video image, crop from the video image an initial sub-image corresponding to each target object;

blur each initial sub-image to obtain a corresponding target sub-image; and

replace the corresponding initial sub-images in the video image with the target sub-images to obtain the target image.

In a possible implementation, the processing module is configured to blur any initial sub-image according to the following steps:

dividing the initial sub-image into multiple processing regions;

determining a target pixel value for each processing region based on the pixel values of the pixels in that region; and

replacing the pixel value of each pixel in each processing region with the determined target pixel value to obtain the target sub-image corresponding to the initial sub-image.

In a possible implementation, the processing module is configured to determine the mean pixel value of each processing region based on the pixel values of the pixels in that region, and to use that mean as the target pixel value of the region.

In a possible implementation, the processing module is configured to determine the extreme pixel value of each processing region based on the pixel values of the pixels in that region, and to use that extreme value as the target pixel value of the region.

In a possible implementation, when the video to be processed was captured in a road environment, the object types of the target objects include a face type and a license plate type.

In a possible implementation, the at least one type of target object in each frame of the video to be processed is identified with a pre-trained target neural network, which is trained on multiple sample images and can identify multiple types of target objects.

In a possible implementation, the target neural network includes a shared network and multiple branch networks, each branch network being used to identify one type of target object;

the identification module is configured to identify, with the pre-trained target neural network, at least one type of target object in each frame of the video to be processed, including:

decoding the video with the shared network of the target neural network to obtain each video frame, and, for each frame, performing multiple successive downsampling operations followed by multiple successive upsampling operations; and

determining, through the multiple branch networks of the target neural network and based on the sampling results, at least one type of target object included in the video image.

In a possible implementation, the identification module is configured to: for each video frame, perform multiple successive downsampling operations on the frame to obtain the image feature information of each operation, where the input of each downsampling operation after the first is the image feature information produced by the previous one, and the input of the first downsampling operation is the video frame itself;

and to perform multiple successive upsampling operations on the image feature information produced by the last downsampling operation, obtaining for each upsampling operation corresponding initial category information and the initial detection box information corresponding to that category information.

In a possible implementation, the identification module is configured to: use the branch network among the multiple branch networks that matches the initial category information to perform multiple successive feature extractions on the initial category information of each upsampling operation, obtaining target category information, and perform multiple successive feature extractions on the corresponding initial detection box information, obtaining target detection box information; and

determine at least one type of target object included in the video image based on the target category information and target detection box information obtained for each upsampling operation.

In a possible implementation, the identification module is configured to: determine the position information of each target object of each upsampling operation based on the target category information and target detection box information corresponding to that operation;

determine, based on the position information, whether the positions of target objects from the multiple upsampling operations overlap;

in response to overlapping positions, determine the confidence of each of the target objects whose positions overlap, and take the target object with the highest confidence as the final target object; and

take each determined final target object, together with each target object whose position does not overlap with another, as the target objects in the video image.

In a third aspect, an optional implementation of the present disclosure further provides a computer device, including a processor and a memory, where the memory stores machine-readable instructions executable by the processor and the processor is configured to execute the machine-readable instructions stored in the memory; when the machine-readable instructions are executed by the processor, the steps of the first aspect, or of any possible implementation of the first aspect, are performed.

In a fourth aspect, an optional implementation of the present disclosure further provides a computer-readable storage medium storing a computer program which, when run, performs the steps of the first aspect or of any possible implementation of the first aspect.

For a description of the effects of the above video processing apparatus, computer device, and computer-readable storage medium, reference may be made to the description of the video processing method; it is not repeated here.

To make the above objects, features, and advantages of the present disclosure clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.

Brief Description of the Drawings

To explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings required by the embodiments are briefly introduced below. The drawings are incorporated into and form part of this specification; they illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain its technical solutions. It should be understood that the following drawings show only some embodiments of the present disclosure and should therefore not be regarded as limiting its scope; a person of ordinary skill in the art may derive other related drawings from them without creative effort.

FIG. 1 shows a flowchart of a video processing method provided by an embodiment of the present disclosure;

FIG. 2 shows a schematic structural diagram of a shared network provided by an embodiment of the present disclosure;

FIG. 3 shows a schematic diagram of processing one frame of a video to be processed, provided by an embodiment of the present disclosure;

FIG. 4 shows a schematic comparison of an initial sub-image and a target sub-image provided by an embodiment of the present disclosure;

FIG. 5 shows a schematic diagram of a video processing apparatus provided by an embodiment of the present disclosure;

FIG. 6 shows a schematic structural diagram of a computer device provided by an embodiment of the present disclosure.

Detailed Description

To make the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The components of the embodiments, as generally described and illustrated here, may be arranged and designed in a variety of configurations. The following detailed description of the embodiments is therefore not intended to limit the scope of the claimed disclosure but merely represents selected embodiments. All other embodiments obtained by those skilled in the art without creative effort, based on the embodiments of the present disclosure, fall within the protection scope of the present disclosure.

另外,本公开实施例中的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换,以便这里描述的实施例能够以除了在这里图示或描述的内容以外的顺序实施。在本文中提及的“多个或者若干个”是指两个或两个以上。“和/或”,描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。字符“/”一般表示前后关联对象是一种“或”的关系。In addition, the terms "first", "second" and the like in the description and claims in the embodiments of the present disclosure and the above-mentioned drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It is to be understood that data so used may be interchanged under appropriate circumstances so that the embodiments described herein can be practiced in sequences other than those illustrated or described herein. Reference herein to "a plurality or several" means two or more. "And/or", which describes the association relationship of the associated objects, means that there can be three kinds of relationships, for example, A and/or B, which can mean that A exists alone, A and B exist at the same time, and B exists alone. The character "/" generally indicates that the associated objects are an "or" relationship.

Research has found that, to improve the security of video data, sensitive information in a video usually needs to be anonymized. In the prior art, however, anonymization of sensitive information in a video is implemented with a video tracking algorithm: after the sensitive information in the first frame of the video is determined, the video tracking algorithm is used to track and locate that sensitive information, and anonymization is then performed. Limited by the tracking accuracy of the video tracking algorithm, tracking of the sensitive information often fails, or a large tracking error exists, so the anonymization of the sensitive information is poor and the security of the video data cannot be guaranteed.

Based on the above research, the present disclosure provides a video processing method, apparatus, computer device, and storage medium. By identifying the various types of target objects related to personal information in each frame of video image, and then blurring the target objects in each frame, the sensitive information in each frame of video image can be removed, thereby obtaining a target video from which various kinds of sensitive information have been removed, that is, a desensitized target video, which effectively improves the security of the video data.

The defects of the above solutions are all results obtained by the inventors through practice and careful study. Therefore, the process of discovering the above problems, and the solutions proposed below in the present disclosure for the above problems, should all be regarded as the inventors' contributions to the present disclosure.

It should be noted that like numerals and letters denote like items in the following figures; therefore, once an item is defined in one figure, it does not require further definition or explanation in subsequent figures.

To facilitate understanding of this embodiment, a video processing method disclosed in an embodiment of the present disclosure is first introduced in detail. The execution subject of the video processing method provided by the embodiments of the present disclosure is generally a computer device with certain computing capability. In some possible implementations, the video processing method may be implemented by a processor invoking computer-readable instructions stored in a memory.

The video processing method provided by the embodiments of the present disclosure is described below by taking a computer device as the execution subject.

As shown in FIG. 1, a flowchart of a video processing method provided by an embodiment of the present disclosure may include the following steps:

S101: Acquire a video to be processed.

Here, the video to be processed may be a video captured by any shooting device that includes one or more types of target objects, and each type may include one or more target objects. For example, the video to be processed may be a video of the road environment ahead captured by a dashboard camera; in this case, personal information such as license plate information and the faces of drivers or passengers may appear in the video to be processed.

Each type of target object is related to personal information, and personal information is the sensitive information in an image, for example, face information or license plate information. A target object may be an object in the video that is related to personal information; for example, the target objects may be the faces, license plate numbers, and phone numbers appearing in the video.

S102: Identify at least one type of target object in each frame of video image of the video to be processed, where each type of target object is related to personal information.

In one embodiment, identifying at least one type of target object in each frame of video image of the video to be processed may be performed with a pre-trained target neural network, where the target neural network may be a neural network trained on multiple sample images and capable of recognizing multiple types of target objects.

The above target neural network is a pre-trained neural network capable of recognizing multiple types of target objects in a video. Each type is the object type corresponding to one kind of personal information; for example, the object type corresponding to face information is the face type, the object type corresponding to phone number information is the phone number type, and the object type corresponding to text information is the text type.

In the process of pre-training the target neural network, large-scale sample images from different scenarios may be used. Each sample image may include one or more types of sample objects, and the number of sample objects included in each sample image may differ; even among multiple sample images that include the same number of objects of the same types, the scenario corresponding to each sample image may be different. By training the target neural network with a large number of sample images, a network with reliable recognition accuracy can be obtained that is suitable for the recognition processing of videos including various types of target objects in various scenarios.

The types of target objects that the trained target neural network can recognize may be set according to the actual application scenario, and are not specifically limited in the present disclosure. In a specific application, it is only necessary to select sample images that include any type of sample object to be recognized, and to train the target neural network with them, to obtain a target neural network capable of recognizing that type of target object.

In a specific implementation of this step, after the video to be processed is acquired, it may be input into the pre-trained target neural network, which first performs video decoding and frame-splitting processing on it to obtain each frame of video image included in the video to be processed. Then, for each frame of video image, recognition processing may be performed on that frame to determine each target object of each type included in it. For example, the convolutional layers in the target neural network may be used to perform multiple successive convolutions on the video image to extract its image feature information; based on the image feature information, the feature information belonging to each object type and the feature information belonging to each target object are then determined, and in turn the types of target objects included in the video image, and the target objects of each type, are determined.

In this way, based on the recognition processing of each frame of video image, each target object of each type included in each frame can be accurately determined; that is, each target object of each type related to personal information included in the video to be processed can be accurately determined.
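The per-frame recognition flow of S102 can be sketched as follows. This is a minimal illustration, not the disclosed network: `detect_objects` is a hypothetical stand-in for the trained target neural network, and frames are represented abstractly rather than decoded from an actual video.

```python
# Sketch of S102: run recognition on every frame independently (no cross-frame
# tracking), which is the key difference from tracking-based anonymization.

def detect_objects(frame):
    """Hypothetical stand-in for the target neural network: returns a list of
    (object_type, bounding_box) pairs found in one frame."""
    # Toy rule for illustration only: each frame dict lists its own objects.
    return frame.get("objects", [])

def recognize_video(frames):
    """Apply recognition to each decoded frame of the video to be processed."""
    return [detect_objects(frame) for frame in frames]

frames = [
    {"objects": [("face", (10, 10, 50, 50))]},
    {"objects": [("face", (12, 11, 52, 51)),
                 ("license_plate", (100, 80, 160, 100))]},
]
per_frame = recognize_video(frames)
assert len(per_frame) == len(frames)          # one result list per frame
assert per_frame[1][1][0] == "license_plate"  # types are identified per frame
```

Because every frame is recognized directly, an object that would be lost by a tracker in a later frame is still detected there.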

S103: Blur the identified target objects in the video images to obtain target images.

The above target image is an image obtained by desensitizing each target object of each type included in a video image.

In a specific implementation, after each target object of each type included in each frame of video image is identified based on the above S102, blurring processing may be performed on each target object in each frame of video image respectively.

The blurring processing may include, for example, at least one of the following:

Method 1: Cover or replace the target object with a preset fill pattern.

Method 2: Replace the pixel values of the pixels corresponding to the target object with a preset pixel value.

Method 3: Replace the pixel values of the pixels corresponding to some key regions of the target object (such as the facial features of a face, or all or part of a license plate number) with a target pixel value.

Method 4: Determine, based on the type of the target object and a preset association between types and blurring methods (which may include, for example, the above three methods), the target processing method corresponding to each identified type of target object, and then blur the target object with that processing method.

Method 5: Crop the image region corresponding to the target object out of the video image, and then paste an object schematic diagram of the same proportion and shape (such as a contour map) back into the image region corresponding to the target object.

Further, after each target object included in each frame of video image is blurred with any of the above methods, the target image corresponding to each frame of video image can be obtained.
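Methods 2 and 4 above can be sketched together in a few lines. This is an assumption-laden toy: an image is modeled as a 2D list of grayscale values, a detection is `(object_type, (x0, y0, x1, y1))`, and the preset values and the type-to-method mapping are invented for illustration only.

```python
# Sketch of blurring Method 2 (preset pixel replacement) dispatched per object
# type as in Method 4. Real systems would operate on decoded image buffers.

def replace_pixels(image, box, preset=0):
    """Method 2: overwrite every pixel inside the detection box with a preset value."""
    x0, y0, x1, y1 = box
    for y in range(y0, y1):
        for x in range(x0, x1):
            image[y][x] = preset
    return image

# Method 4: a preset association between object types and blurring methods
# (hypothetical presets chosen for the example).
BLUR_BY_TYPE = {
    "face": lambda img, box: replace_pixels(img, box, preset=0),
    "license_plate": lambda img, box: replace_pixels(img, box, preset=255),
}

def blur_frame(image, detections):
    """Apply the type-specific blurring method to each detected target object."""
    for obj_type, box in detections:
        BLUR_BY_TYPE[obj_type](image, box)
    return image

image = [[128] * 6 for _ in range(4)]
target = blur_frame(image, [("face", (0, 0, 2, 2)), ("license_plate", (3, 2, 5, 4))])
assert target[0][0] == 0      # face region replaced with the face preset
assert target[2][3] == 255    # plate region replaced with the plate preset
assert target[0][5] == 128    # pixels outside detections are untouched
```

The dictionary dispatch is what makes Method 4 extensible: adding a new sensitive type only requires registering another entry.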

S104: Generate a blurred target video based on the target images respectively corresponding to the frames of video images.

After the target image corresponding to each frame of video image is obtained, the target images may be merged according to the order of their corresponding frames in the video to be processed, to obtain the blurred target video corresponding to the video to be processed. The target video includes each target object after blurring.
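The merge step of S104 reduces to ordering the target images by their original frame index. In this sketch a "video" is simply an ordered list of frames; a real implementation would re-encode the ordered frames with a video codec, which is omitted here.

```python
# Sketch of S104: reassemble per-frame target images into the target video
# according to each frame's position in the original video to be processed.

def merge_frames(indexed_targets):
    """indexed_targets: iterable of (frame_index, target_image), possibly unordered
    (e.g. when frames were blurred in parallel). Returns frames in video order."""
    return [img for _, img in sorted(indexed_targets, key=lambda p: p[0])]

done = [(2, "t2"), (0, "t0"), (1, "t1")]  # blurring finished out of order
assert merge_frames(done) == ["t0", "t1", "t2"]
```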

In this way, by identifying the various types of target objects related to personal information in each frame of video image, and then blurring the target objects in each frame, the sensitive information in each frame of video image can be removed, thereby obtaining a target video from which various kinds of sensitive information have been removed, that is, a desensitized target video, which effectively improves the security of the video data.

In one embodiment, the above target neural network includes a shared network and multiple branch networks, where each branch network is used to recognize one type of target object. The step of identifying, with the pre-trained target neural network, at least one type of target object in each frame of video image of the video to be processed may be implemented as follows:

S102-1: Perform video decoding processing on the video to be processed through the shared network in the target neural network to obtain each frame of video image corresponding to the video to be processed; and, for each frame of video image, perform multiple successive downsampling operations and upsampling operations on the video image.

Here, the shared network is a network that processes the video to be processed and each frame of video image included in it, and is used to perform a preliminary extraction of the feature information in each frame. After any video to be processed is acquired, it first needs to be processed by the shared network, and the branch networks in the target neural network are then used to further process the output of the shared network.

The shared network may include two parts: a downsampling network and an upsampling network, where the downsampling network is used to downsample the video image, and the upsampling network is used to upsample the sampling result of the downsampling network. Both the upsampling network and the downsampling network may include multiple sampling layers, and the number of sampling layers in the upsampling network is the same as in the downsampling network; the depth, accuracy, and completeness of the feature information collected by different sampling layers differ. In a specific implementation, the downsampling network may be a residual network (ResNet), and the upsampling network may be a Feature Pyramid Network (FPN). Alternatively, the multiple sampling layers in the downsampling network and the upsampling network may be multiple convolutional layers with different convolution kernels.
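How the two halves of the shared network change resolution can be illustrated with a toy model. This is an assumption-heavy sketch: 1-D feature lists stand in for real feature maps, average pooling stands in for the ResNet-style encoder layers, and nearest-neighbour duplication stands in for the FPN-style decoder layers.

```python
# Toy sketch of the shared network's two parts: each downsampling layer halves
# the resolution, and each upsampling layer doubles it back.

def downsample(feat):
    """One downsampling layer: average pooling with stride 2."""
    return [(feat[i] + feat[i + 1]) / 2 for i in range(0, len(feat) - 1, 2)]

def upsample(feat):
    """One upsampling layer: nearest-neighbour duplication."""
    return [v for v in feat for _ in range(2)]

x = [1.0, 3.0, 5.0, 7.0, 2.0, 4.0, 6.0, 8.0]   # one row of a "video image"
d1 = downsample(x)         # first downsampling layer
d2 = downsample(d1)        # second downsampling layer (deeper, coarser)
u1 = upsample(d2)          # decoder restores resolution step by step
assert d1 == [2.0, 6.0, 3.0, 7.0]
assert d2 == [4.0, 5.0]
assert len(u1) == len(d1)  # each upsampling undoes one downsampling's resolution change
```

The equal number of layers on both sides is what allows the decoder to restore, step by step, the resolution reduced by the encoder.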

Different branch networks are used to recognize different types of target objects; one branch network can recognize only one type of target object. The number of branch networks included in the target neural network may be determined by the number of object types that the target neural network needs to recognize. For example, when the target neural network needs to recognize two types of target objects, the face type and the license plate type, the branch networks included in the target neural network may be a face recognition branch network and a license plate recognition branch network.

In a specific implementation, after the video to be processed is obtained, the shared network may be used to perform video decoding processing on it to obtain each frame of video image corresponding to the video to be processed. Then, for each frame of video image, the multiple sampling layers in the downsampling network may first be used to perform multiple successive downsampling operations on it, and the multiple sampling layers in the upsampling network may then be used to perform multiple successive upsampling operations on the result of the last downsampling, obtaining the sampled result corresponding to the video image.

S102-2: Determine, through the multiple branch networks in the target neural network and based on the sampled result, at least one type of target object included in the video image.

In this step, each branch network may be used to process the sampled result separately to obtain the output of each branch network, and the outputs of the branch networks are taken as the target objects included in the video image.

Here, since different branch networks can output target objects of different object types, if the video image does not include information related to an object type that a certain branch network can recognize, then when that branch network processes the sampled result corresponding to the video image, it may output no information or output an error message. Therefore, the outputs of the branch networks can be used directly to determine the target objects in the video image, and the finally obtained target objects may include at least one type.

Alternatively, for S102-2, after the sampled result is obtained, at least one target branch network for processing the result may first be selected from the multiple branch networks based on the sampled result, and the determined target branch networks are then used to process the sampled result to determine at least one type of target object included in the video image.

In a specific implementation, the target branch networks may be determined in the following ways:

Way 1: Based on the processed result, determine the object types respectively corresponding to the target objects included in the video image, and select the branch networks respectively corresponding to those object types as the target branch networks.

Way 2: In the process of performing the above S102-1, for each frame of video image, the multiple sampling layers in the downsampling network may also be used to perform multiple successive downsampling operations on the video image to obtain a first result corresponding to the downsampling processing, and the multiple sampling layers in the upsampling network may be used to perform multiple successive upsampling operations on the video image to obtain a second result corresponding to the upsampling processing. Then, at least one target branch network may be selected from the multiple branch networks based on the first result, and at least one further target branch network may be selected from the multiple branch networks based on the second result; the target branch networks corresponding to the first result and the target branch networks corresponding to the second result are then taken as the finally determined target branch networks.

Way 3: In the process of performing the above S102-1, for each frame of video image, after the downsampling network is used for processing to obtain the above first result, the first result and the video image are merged and input into the upsampling network, which performs multiple successive upsampling operations on the merged first result and video image to obtain the final sampled result; finally, at least one target branch network is selected from the multiple branch networks based on that sampled result.

After the target branch networks are determined, the sampled result may be input into each target branch network respectively, and each target branch network processes the sampled result to output the target objects matching the recognition type corresponding to that branch network, thereby obtaining each target object of each type included in the video image.

Alternatively, after the target branch networks are determined, for each target branch network, the part of the sampled result related to the recognition type corresponding to that target branch network may be input into it and processed, to obtain the target objects output by that target branch network that match its corresponding recognition type. Finally, each target object of each type included in the video image is determined based on the target objects of each type respectively output by the target branch networks.

In one embodiment, the step in the above S102-1 of performing multiple successive downsampling and upsampling operations on each frame of video image may be implemented as follows:

S102-1-1: For each frame of video image, perform multiple successive downsampling operations on the video image to obtain the image feature information corresponding to each downsampling operation.

S102-1-2: Perform multiple successive upsampling operations on the image feature information obtained by the last downsampling operation, to obtain the initial category information corresponding to each upsampling operation and the initial detection box information corresponding to the initial category information.

Here, the input of each subsequent downsampling operation among the multiple successive downsampling operations is the image feature information obtained by the previous downsampling operation, and the input of the first downsampling operation is the video image. The total number of successive upsampling operations is the same as the total number of successive downsampling operations. The input of each subsequent upsampling operation is the initial category information and initial detection box information obtained by the previous upsampling operation, together with the image feature information obtained by the downsampling operation that matches that upsampling operation, where a downsampling operation matching an upsampling operation means: the sum of the sampling order position corresponding to the upsampling operation and the sampling order position corresponding to the downsampling operation equals the total number plus 1.
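The matching rule above can be checked numerically. With a total of N sampling operations on each side (1-indexed order positions), upsampling step u is fed the features of downsampling step d whenever u + d = N + 1; the helper below simply solves that equation for d.

```python
# Skip-connection pairing implied by the matching rule: up_order + down_order
# equals the total number of operations plus 1.

def matching_downsample(up_order, total):
    """Return the order position of the downsampling operation whose features
    are fused into the upsampling operation at position up_order."""
    return total + 1 - up_order

N = 3  # the three-layer example described for FIG. 2
pairs = {u: matching_downsample(u, N) for u in range(1, N + 1)}
# The first upsampling consumes the deepest (last) downsampling, and so on.
assert pairs == {1: 3, 2: 2, 3: 1}
```

This is exactly the pairing walked through below for FIG. 2: the first upsampling layer takes the third downsampling layer's features, the second takes the second's, and the third takes the first's.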

The sampled result in the above S102-1 is the initial category information corresponding to each upsampling operation, and the initial detection box information corresponding to the initial category information.

The initial category information is used to represent the object types of the target objects in the video image recognized by the shared network; the initial category information corresponding to each upsampling operation may be the same or different. The initial detection box information is used to reflect the initially predicted position information of the target objects corresponding to the initial category information. For example, the initial category information may include two target objects of the face type and one target object of the license plate type, and the initial detection box information may be the initial detection boxes respectively corresponding to the two face-type target objects and the initial detection box corresponding to the license-plate-type target object.

As shown in FIG. 2, a schematic structural diagram of a shared network provided by an embodiment of the present disclosure, the downsampling network in the shared network of FIG. 2 includes three downsampling layers: a first downsampling layer corresponding to the first sampling order position, a second downsampling layer corresponding to the second sampling order position, and a third downsampling layer corresponding to the third sampling order position. The upsampling network likewise includes three upsampling layers: a first upsampling layer corresponding to the first sampling order position, a second upsampling layer corresponding to the second sampling order position, and a third upsampling layer corresponding to the third sampling order position. FIG. 2 is only an example, however; the number of sampling layers included in the downsampling and upsampling networks may be set according to actual sampling needs and is not limited here.

The above S102-1-1 and S102-1-2 are described in detail below with reference to the shared network shown in FIG. 2:

In a specific implementation, for each frame of video image, the frame may first be input into the first downsampling layer of the downsampling network, which downsamples the frame and extracts the image feature information of the video image corresponding to the first downsampling layer, such as color feature information and texture feature information.

Then, the image feature information extracted by the first downsampling layer is input into the second downsampling layer, which further downsamples the image feature information to extract the image feature information of the video image corresponding to the second downsampling layer.

Then, the image feature information extracted by the second downsampling layer is input into the third downsampling layer, which further downsamples the image feature information to extract the image feature information of the video image corresponding to the third downsampling layer. That is, three successive downsampling operations are performed on the video image, and the total number of downsampling operations is 3.

Further, the image feature information obtained by the last downsampling operation, that is, the extracted image feature information of the video image corresponding to the third downsampling layer at the third sampling order position, may be input into the first upsampling layer at the first sampling order position of the upsampling network, which upsamples the image feature information to obtain the initial category information corresponding to the first upsampling operation and the initial detection box information corresponding to that initial category information.

Then, the initial category information corresponding to the first upsampling operation, the initial detection box information corresponding to that initial category information, and the image feature information obtained by the downsampling operation matching the second upsampling operation at the second sampling order position (that is, the image feature information sampled by the second downsampling layer at the second sampling order position) are fused, and the fused result is input into the second upsampling layer at the second sampling order position, which upsamples the fused result to obtain the initial category information corresponding to the second upsampling operation and the initial detection box information corresponding to that initial category information.

Then, the initial category information corresponding to the second upsampling operation, the initial detection box information corresponding to that initial category information, and the image feature information obtained by the downsampling operation matching the third upsampling operation at the third sampling order position (that is, the image feature information sampled by the first downsampling layer at the first sampling order position) are fused, and the fused result is input into the third upsampling layer at the third sampling order position, which upsamples the fused result to obtain the initial category information corresponding to the third upsampling operation and the initial detection box information corresponding to that initial category information. Here, the total number of upsampling operations is also 3.

Further, regarding the above step of selecting the target branch network: after the initial category information of each upsampling step is obtained, the object types corresponding to that initial category information can be determined, and based on those object types, the target branch networks capable of recognizing each object type are selected from the multiple branch networks. For example, if the object types included in the initial category information of one upsampling step are the face type and the vehicle type, the selected target branch networks may be the face recognition branch network and the license plate recognition branch network; if the only object type included is the face type, the selected target branch network may be the face recognition branch network.
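The selection rule above amounts to a simple lookup from object type to branch network. A minimal sketch, where the `BRANCHES` mapping and the branch names are illustrative placeholders, not names from the disclosure:

```python
# Illustrative mapping from recognizable object type to branch network.
BRANCHES = {
    "face": "face_recognition_branch",
    "vehicle": "license_plate_recognition_branch",
}

def select_target_branches(initial_category_types):
    # initial_category_types: the object types found in the initial
    # category information of one upsampling step. Returns the branch
    # networks able to recognize those types.
    return [BRANCHES[t] for t in initial_category_types if t in BRANCHES]
```

With this rule, an upsampling step whose category information contains both faces and vehicles is routed to both branch networks, matching the example in the text.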

Further, based on the initial category information corresponding to each upsampling step, the target branch network corresponding to that upsampling step can be determined.

In one embodiment, the above S102-2 may be implemented through the following steps:

S102-2-1: Using the target branch network, among the multiple branch networks, that matches the initial category information, perform several consecutive rounds of feature extraction on the initial category information of each upsampling step to obtain target category information, and perform several consecutive rounds of feature extraction on the corresponding initial detection frame information to obtain target detection frame information.

Here, the target branch network matching the initial category information is each target branch network determined in the manner provided in the above embodiment.

The target category information is output by the target branch network. It represents the details of the target objects in the video image that match the object types the branch network can recognize, and it can also indicate whether any such target object is present in the video image at all. For example, when the target branch network is the face recognition branch network, the target category information may be the details of each face output by that network, such as the exact pixel positions of the face contour.

The target detection frame information is the final detection frame of each target object corresponding to the target category information.

Each branch network can be divided into two parts: a category information extraction network for processing the initial category information, and a detection frame information extraction network for processing the initial detection frame information. Both parts are networks consisting of multiple convolutional layers, and the two parts contain the same number of convolutional layers.

In a concrete implementation, for the initial category information and initial detection frame information obtained from each upsampling step, the target branch networks determined for that step can be used to process the initial category information and the initial detection frame information respectively. For example, if the target branch networks for a given upsampling step are two networks, namely the face recognition branch network and the license plate recognition branch network, the initial category information and initial detection frame information of that step can be input into both the face recognition branch network and the license plate recognition branch network; the face recognition branch network processes them to obtain its target category information and target detection frame information, and the license plate recognition branch network processes them to obtain its own target category information and target detection frame information.

In this way, for the initial category information and initial detection frame information obtained from each upsampling step, the target branch networks corresponding to that step can be used to determine the target category information and target detection frame information of that step.

The process of determining the target category information and target detection frame information with each target branch network may specifically be as follows:

For the input initial category information, the first of the multiple convolutional layers in the category information extraction network of the target branch network performs feature extraction on it; the result of the first extraction is fed into the second convolutional layer to obtain the second extraction result, the second result is fed into the next convolutional layer to obtain a new extraction result, and so on, until the output of the last convolutional layer is obtained; that output is taken as the target category information. Likewise, for the input initial detection frame information, the first of the multiple convolutional layers in the detection frame information extraction network of the target branch network performs feature extraction on it, each layer's output is fed into the next layer in sequence, and the output of the last convolutional layer is taken as the target detection frame information.
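The layer-by-layer chaining described above can be sketched as a loop over a stack of layers. Here `np.convolve` is only a stand-in for a real convolutional layer; the function names and toy data are illustrative assumptions:

```python
import numpy as np

def conv_stack(x, kernels):
    # Feed the input through each "convolutional layer" in order:
    # the output of layer i becomes the input of layer i+1, and the
    # last layer's output is the extracted target information.
    for k in kernels:
        x = np.convolve(x, k, mode="same")  # stand-in for a conv layer
    return x

features = np.array([0.0, 1.0, 0.0, 0.0])     # toy initial category features
kernels = [np.array([1.0, 1.0]),              # "first convolutional layer"
           np.array([0.5, 0.5])]              # "second convolutional layer"
out = conv_stack(features, kernels)
```

The same loop applies to the detection frame branch; the two stacks simply hold different (but equally many) layers.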

S102-2-2: Determine at least one type of target object included in the video image based on the target category information and target detection frame information obtained for each upsampling step.

In a concrete implementation, the target category information and target detection frame information of all upsampling steps can be merged to obtain the final target category information and target detection frame information of the video image. The object types of the final target category information are taken as the object types of the target objects included in the video image, and the objects inside the detection frames of the final target detection frame information are taken as the target objects included in the video image.

In one embodiment, S102-2-2 may be implemented through the following steps:

S102-2-2-1: Based on the target category information and target detection frame information of each upsampling step, determine the position information of each target object corresponding to that step.

Here, the target detection frame information is the detection frame itself, which may specifically be a rectangular detection frame. The position information of a target object may include the pixel position of the top-left vertex of its detection frame in the video image together with the frame's width and height, or alternatively the pixel positions in the video image of all vertices of the detection frame.

In a concrete implementation, for the target category information and target detection frame information of each upsampling step, the number of object types included in the target category information can be used to determine the number of target objects of that step; the detection frame of each target object of that step is then determined from the details included in the target category information; and the pixel positions of the vertices of each detection frame, together with its width and height, are taken as the position information of each target object of that step.
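The two position representations mentioned above (top-left vertex plus width/height) can be derived directly from the corner coordinates of a rectangular detection frame. A minimal sketch, with an illustrative `(x1, y1, x2, y2)` corner convention assumed:

```python
def box_to_position_info(box):
    # box: (x1, y1, x2, y2) pixel coordinates of a rectangular detection
    # frame. Returns the top-left vertex plus width and height, one of
    # the position-information representations described in the text.
    x1, y1, x2, y2 = box
    return {"top_left": (x1, y1), "width": x2 - x1, "height": y2 - y1}
```

The all-vertices representation is recoverable from the same four numbers, so either form carries the same information for a rectangle.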

S102-2-2-2: Based on the position information, determine whether the positions of the multiple target objects from the multiple upsampling steps overlap.

Here, the position information determined in S102-2-2-1 includes the position information of each target object from every upsampling step. Because the target category information and target detection frame information of different upsampling steps may partially overlap, that is, they may correspond to the same target object, after the position information of each target object is obtained, it can further be determined from the positions whether the target objects from the multiple upsampling steps overlap.

S102-2-2-3: In response to overlapping positions, determine the confidence of each of the target objects with overlapping positions, and take the target object with the highest confidence as the final target object.

In a concrete implementation, the target objects with overlapping positions are identified from the position information, and a confidence is first determined for each of them. The confidence may be the confidence of the target category information of the object, or the confidence of its target detection frame information, or the mean of those two confidences.

Finally, among the target objects with overlapping positions, the one with the highest confidence can be taken as the final target object at that overlapping position, and the other target objects are discarded.
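Steps S102-2-2-2 and S102-2-2-3 together resemble a simple non-maximum-suppression pass. A minimal sketch under illustrative assumptions: boxes are `(x1, y1, x2, y2)` rectangles, and "overlap" is taken as any intersection (a real system would more likely use an IoU threshold):

```python
def boxes_overlap(a, b):
    # Axis-aligned intersection test on (x1, y1, x2, y2) boxes.
    return not (a[2] <= b[0] or b[2] <= a[0] or a[3] <= b[1] or b[3] <= a[1])

def resolve_overlaps(detections):
    # detections: list of (box, confidence) pairs from all upsampling
    # steps. For each group of overlapping boxes, keep only the
    # highest-confidence one; non-overlapping boxes are kept as-is.
    kept = []
    for box, conf in sorted(detections, key=lambda d: d[1], reverse=True):
        if not any(boxes_overlap(box, k[0]) for k in kept):
            kept.append((box, conf))
    return kept
```

Processing detections in descending confidence order guarantees that, within any overlapping group, the first (highest-confidence) box is the one retained.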

S102-2-2-4: Take each determined final target object, together with each target object whose position does not overlap any other, as the at least one type of target object included in the video image.

In a concrete implementation, each determined final target object, together with each target object from every upsampling step whose position overlaps no other, can directly serve as the at least one type of target object included in the video image.

Fig. 3 is a schematic diagram, provided by an embodiment of the present disclosure, of processing one frame of video image in the video to be processed. Fig. 3 contains two branch networks, a face recognition branch network and a vehicle recognition branch network, which have the same number of convolutional layers; the ×4 in Fig. 3 indicates passing through four convolutional layers (merely an example). After the shared network in the target neural network outputs the initial category information and initial detection frame information of each upsampling step, they can be input into different branch networks. As shown in Fig. 3, the initial category information 1 and initial detection frame information 1 of the first upsampling step are input into the face recognition branch network; the initial category information 2 and initial detection frame information 2 of the second upsampling step are input into the license plate recognition branch network; and the initial category information 3 and initial detection frame information 3 of the third upsampling step are input into both the face recognition branch network and the license plate recognition branch network. Then the category information extraction network and the detection frame information extraction network in each recognition branch network process the input initial category information and initial detection frame information respectively. Here, since the initial category information and initial detection frame information input into the face recognition branch network come from the first and third upsampling steps, the target category information output by the face recognition branch network in Fig. 3 includes target category information 1 corresponding to initial category information 1 of the first step and target category information 3 corresponding to initial category information 3 of the third step; likewise, the target detection frame information it outputs includes target detection frame information 1 of the first step and target detection frame information 3 of the third step. The license plate recognition branch network in Fig. 3 outputs only target category information 2 and target detection frame information 2 of the second upsampling step. Finally, at least one type of target object included in the video image can be determined based on the output target category information 1, 2 and 3 and target detection frame information 1, 2 and 3.

In one implementation, the target neural network mentioned in the video processing method provided by the embodiments of the present disclosure may also comprise multiple networks, each of which is used to recognize one type of target object in the video to be processed. For example, a face neural network recognizes the faces in each frame of video image, and a license plate neural network recognizes the license plates in each frame of video image. In a concrete implementation, according to the object type of the target objects that need to be blurred, the video to be processed can be input into the target neural network corresponding to that object type, and the target objects recognized by that network are then blurred, yielding the blurred target video. For the specific recognition process of the target neural network for each type of target object, reference may be made to the recognition process of the corresponding branch network in the above embodiments, which is not repeated here.

In one embodiment, S103 may be implemented through the following steps:

S103-1: In response to multiple target objects being identified in the video image, crop the initial sub-image corresponding to each target object from the video image.

In a concrete implementation, for each frame of video image, when multiple target objects are identified in that frame, an object identifier (ID) can be assigned to each target object in response to the identification; then, according to the position information of each target object, the initial sub-image corresponding to each target object is cropped from the video image, and the object identifier of each target object is used as the image identifier of its corresponding initial sub-image.

In one implementation, regarding the step of cropping the initial sub-image of each target object from the video image according to its position information, the image region covered by the target object's detection frame can be cropped out directly as the initial sub-image; alternatively, the detection frame can be scaled by a preset scaling ratio, and the image region covered by the scaled detection frame is cropped out as the initial sub-image.
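The scaled-crop variant above can be sketched as follows. The `(x1, y1, x2, y2)` corner convention and center-based scaling are illustrative assumptions; the result is clamped to the image bounds so an enlarged frame never leaves the image:

```python
def crop_box(image_w, image_h, box, scale=1.0):
    # Optionally enlarge the detection frame around its center by
    # `scale` (scale=1.0 reproduces the direct crop), then clamp the
    # result to the image bounds. Returns the crop rectangle.
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    w, h = (x2 - x1) * scale, (y2 - y1) * scale
    nx1 = max(0, int(cx - w / 2))
    ny1 = max(0, int(cy - h / 2))
    nx2 = min(image_w, int(cx + w / 2))
    ny2 = min(image_h, int(cy + h / 2))
    return nx1, ny1, nx2, ny2
```

A margin (scale > 1) is a common choice when blurring, since it hides pixels just outside the detected frame that may still reveal the object.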

In one implementation, even if only one target object is identified in the video image, the initial sub-image corresponding to that target object can still be cropped from the video image based on its position information.

S103-2: Blur each initial sub-image to obtain the target sub-image corresponding to each initial sub-image.

In a concrete implementation, each initial sub-image cropped from the video image can be blurred to obtain the target sub-image corresponding to that initial sub-image, and the image identifier of each initial sub-image is used as the image identifier of the corresponding target sub-image.

S103-3: Replace the corresponding initial sub-images in the video image with the target sub-images to obtain the target image.

In a concrete implementation, the target object corresponding to each target sub-image can be determined from the image identifier of the target sub-image and the object identifier of each target object, and the target sub-image then replaces the initial sub-image of that target object in the video image. In this way, based on the image identifier of the initial sub-image corresponding to each target sub-image and the object identifier of the target object, the initial sub-image of every target object in the video image is replaced, thereby obtaining the target image.
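The ID-keyed paste-back in S103-3 can be sketched with plain Python lists. The dictionary layout (`obj_id -> region`, `obj_id -> blurred patch`) is an illustrative way of carrying the object identifier from crop time to replacement time:

```python
def paste_blurred(image, regions, blurred):
    # image: 2D list of pixel values (row-major).
    # regions: {obj_id: (x1, y1, x2, y2)} crop rectangles per object.
    # blurred: {obj_id: blurred sub-image as a list of rows}.
    # The object ID assigned at crop time links each blurred patch
    # back to the region it came from.
    for obj_id, (x1, y1, x2, y2) in regions.items():
        patch = blurred[obj_id]
        for r in range(y1, y2):
            for c in range(x1, x2):
                image[r][c] = patch[r - y1][c - x1]
    return image
```

Because every patch is keyed by the same ID as its source region, patches cannot be pasted onto the wrong object even when several targets were cropped from one frame.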

In one embodiment, any initial sub-image is blurred according to the following steps:

Step 1: Divide the initial sub-image into multiple processing regions.

In this step, the initial sub-image can be divided evenly into multiple processing regions of equal size according to its image size. For example, if the image size of the initial sub-image is 100*80, it can be divided into 100 processing regions, each of size 10*8. Alternatively, the initial sub-image can be divided into a preset number of processing regions. Alternatively, the division method can be determined by the object type of the target object: if the object type of the target object is the face type, the initial sub-image can be divided into multiple processing regions according to the positions of the facial features; if the object type of the target object is the license plate type, the initial sub-image can be divided evenly into multiple processing regions of equal size.

Step 2: Determine the target pixel value of each processing region based on the pixel values of the pixels in that region.

In one embodiment, for each processing region obtained by the division, the mean pixel value of the region can be determined from the pixel values of its pixels, and this mean is taken as the target pixel value of the region.

In another embodiment, for each processing region obtained by the division, an extreme pixel value of the region can be determined from the pixel values of its pixels, and this extreme value is taken as the target pixel value of the region.

Alternatively, for each processing region obtained by the division, one pixel value can be randomly selected from the pixel values of its pixels as the target pixel value.

Step 3: Replace the pixel value of every pixel in each processing region with the determined target pixel value, obtaining the target sub-image corresponding to the initial sub-image.

In a concrete implementation, for each processing region, the pixel value of every pixel in the region can be replaced with the target pixel value determined for that region; after the replacement is completed for every processing region, the target sub-image is obtained. Fig. 4 is a schematic comparison of an initial sub-image and a target sub-image provided by an embodiment of the present disclosure.
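Steps 1 to 3 together describe a mosaic (pixelation) blur. A minimal numpy sketch using equal-size regions and the mean-value variant; the `reducer` parameter is an illustrative way to swap in the extreme-value or random-pixel variants:

```python
import numpy as np

def mosaic_blur(img, block_h, block_w, reducer=np.mean):
    # Step 1: divide the sub-image into equal-size processing regions.
    # Step 2: derive one target pixel value per region (mean here; a
    #         min/max or a randomly chosen pixel also works).
    # Step 3: replace every pixel in the region with that value.
    out = img.copy().astype(float)
    h, w = img.shape[:2]
    for y in range(0, h, block_h):
        for x in range(0, w, block_w):
            block = img[y:y + block_h, x:x + block_w]
            out[y:y + block_h, x:x + block_w] = reducer(block)
    return out
```

Passing `reducer=np.max`, `reducer=np.min`, or a function that picks a random element reproduces the other two embodiments without changing the region logic.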

In one embodiment, when the video to be processed is a video shot in a road environment, the target objects appearing in such a video are mostly the faces of pedestrians or the license plates of other vehicles, and faces and license plates are sensitive information that requires desensitization. Therefore, for videos shot in a road environment, the object types of the target objects include the face type and the license plate type.

Exemplarily, the video to be processed may be a driving video uploaded by a user and captured by a dashboard camera, and the object types of the target objects may include the face type and the license plate type.

Of course, besides the above object types, the object types may also include various types corresponding to target objects that may appear in a video, such as a target animal type, a target building type, or a target physical-object type; the embodiments of the present disclosure do not specifically limit the object types.

In addition, an embodiment of the present disclosure further provides a method for training the target neural network with multiple sample images. Specifically, multiple sample images need to be obtained first, each of which contains at least one type of sample object; different sample images may contain different sample objects and different numbers of sample objects.

Then, a sample image can be input into the target neural network to be trained, and the shared network in the target neural network to be trained performs recognition processing on the sample image to determine, for each upsampling step, the initial predicted category information of the sample image and the corresponding initial predicted detection frame information.

After that, for the initial predicted category information of each upsampling step, at least one target branch network matching that information can be selected from the multiple branch networks, and the selected target branch networks process the initial predicted category information and the corresponding initial predicted detection frame information to determine the target predicted category information and target predicted detection frame information of that upsampling step. The predicted sample objects contained in the sample image can then be determined from the target predicted category information and target predicted detection frame information of each upsampling step.

Finally, a first loss value of the shared network can be determined from the initial predicted category information and initial predicted detection frame information output by the shared network, together with the standard initial category information and standard initial detection frame information. A second loss value of each branch network is determined from the target predicted category information and target predicted detection frame information output by that branch network when acting as a target branch network, together with the standard target category information and standard target detection frame information. The shared network is then iteratively trained with the first loss value, and each branch network is iteratively trained with its second loss value, so as to adjust the network parameter values of the shared network and each branch network until a preset training stop condition is met, yielding the trained shared network and branch networks. The preset training stop condition may include the number of training iterations reaching a preset number and/or the prediction accuracy of the trained network reaching a preset accuracy.
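The first and per-branch second loss values above can be sketched with a placeholder loss. The disclosure does not specify the loss functions, so squared error here is purely an illustrative stand-in, as are the function and branch names:

```python
def squared_error(pred, target):
    # Placeholder loss; the actual loss function is not specified
    # in the disclosure.
    return (pred - target) ** 2

def training_losses(shared_pred, shared_target, branch_preds, branch_targets):
    # First loss: shared-network initial predictions vs. standard
    # (ground-truth) initial information.
    # Second losses: one per branch network, comparing its target
    # predictions with the standard target information.
    first = squared_error(shared_pred, shared_target)
    second = {name: squared_error(p, branch_targets[name])
              for name, p in branch_preds.items()}
    return first, second
```

The first loss drives the shared network's updates while each entry of `second` drives only its own branch, which matches the separate iterative training described in the text.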

Moreover, a third loss value can also be determined from the standard sample objects of each sample image and the predicted sample objects output by the target neural network, and the target neural network to be trained is iteratively trained with the third loss value as well. Furthermore, besides determining the second loss value of each branch network, a fourth loss value of each branch network can be determined from the predicted objects output by that branch network and the corresponding standard objects, and each branch network is iteratively trained with its fourth loss value as well.

Finally, when it is determined that both the shared network and each branch network have completed training, the trained target neural network is obtained.

Those skilled in the art can understand that, in the above methods of the specific implementations, the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.

Based on the same inventive concept, the embodiments of the present disclosure also provide a video processing apparatus corresponding to the video processing method. Since the principle by which the apparatus in the embodiments of the present disclosure solves the problem is similar to that of the above video processing method, the implementation of the apparatus may refer to the implementation of the method, and repeated descriptions are omitted.

As shown in FIG. 5, which is a schematic diagram of a video processing apparatus provided by an embodiment of the present disclosure, the apparatus includes:

an acquisition module 501, configured to acquire a video to be processed;

an identification module 502, configured to identify at least one type of target object in each frame of video image of the video to be processed, where each type of target object is related to personal information;

a processing module 503, configured to perform blurring processing on the identified target object in the video image to obtain a target image; and

a generating module 504, configured to generate a blurred target video based on the target images respectively corresponding to each frame of the video image.

In a possible implementation, the processing module 503 is configured to, in response to identifying a plurality of the target objects in the video image, extract an initial sub-image corresponding to each of the target objects from the video image;

perform blurring processing on each of the initial sub-images to obtain a target sub-image corresponding to each initial sub-image; and

replace the corresponding initial sub-images in the video image with the respective target sub-images to obtain the target image.
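The cut-out, blur, and paste-back flow above can be sketched in a few lines of pure Python on a 2-D grayscale pixel grid. The function name, box layout, and the zero-fill "blur" stand-in are all illustrative assumptions, not details from the patent:

```python
def replace_region(image, box, transform):
    """Cut the initial sub-image given by box = (x1, y1, x2, y2) out of a
    2-D grayscale pixel grid, apply `transform` (e.g. a blur) to it, and
    paste the resulting target sub-image back over the original pixels."""
    x1, y1, x2, y2 = box
    sub = [row[x1:x2] for row in image[y1:y2]]      # initial sub-image
    sub = transform(sub)                            # target sub-image
    out = [row[:] for row in image]                 # copy of the frame
    for dy, sub_row in enumerate(sub):
        out[y1 + dy][x1:x2] = sub_row
    return out

frame = [[1, 1, 1, 1],
         [1, 9, 9, 1],
         [1, 9, 9, 1],
         [1, 1, 1, 1]]
# zero out a detected 2x2 region as a stand-in for real blurring
masked = replace_region(frame, (1, 1, 3, 3), lambda s: [[0] * len(r) for r in s])
```

The original frame is left untouched and only the copied frame is modified, so the same source image could be processed once per detected target object.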

In a possible implementation, the processing module 503 is configured to, for any of the initial sub-images, perform blurring processing on the initial sub-image according to the following steps:

dividing the initial sub-image into a plurality of processing regions;

determining, based on the pixel values of the pixels in each processing region, a target pixel value corresponding to each processing region; and

replacing the pixel value of each pixel in each processing region with the determined target pixel value, to obtain the target sub-image corresponding to the initial sub-image.

In a possible implementation, the processing module 503 is configured to determine, based on the pixel values of the pixels in each processing region, a pixel value mean corresponding to each processing region, and to use the pixel value mean corresponding to each processing region as the target pixel value for that processing region.

In a possible implementation, the processing module 503 is configured to determine, based on the pixel values of the pixels in each processing region, a pixel value extremum corresponding to each processing region, and to use the pixel value extremum corresponding to each processing region as the target pixel value for that processing region.
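A minimal sketch of the region-based blurring described above, using the mean-value variant on a grayscale pixel grid (function and variable names are illustrative, not from the patent):

```python
def pixelate(sub_image, block=2):
    """Mosaic blur for one initial sub-image: divide it into block x block
    processing regions and replace every pixel in a region with the
    region's mean pixel value (the target pixel value).  Substituting
    max() for the mean would give the extremum variant."""
    h, w = len(sub_image), len(sub_image[0])
    out = [row[:] for row in sub_image]
    for y0 in range(0, h, block):
        for x0 in range(0, w, block):
            region = [sub_image[y][x]
                      for y in range(y0, min(y0 + block, h))
                      for x in range(x0, min(x0 + block, w))]
            target = sum(region) // len(region)   # mean as the target value
            for y in range(y0, min(y0 + block, h)):
                for x in range(x0, min(x0 + block, w)):
                    out[y][x] = target
    return out

sub = [[0, 2, 10, 10],
       [2, 0, 10, 10],
       [5, 5, 8, 8],
       [5, 5, 8, 8]]
blurred = pixelate(sub, 2)
```

Larger `block` sizes discard more detail per region, so the block size effectively controls how strongly personal information is obscured.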

In a possible implementation, when the video to be processed is a video captured in a road environment, the object types of the target objects include a face type and a license plate type.

In a possible implementation, the at least one type of target object in each frame of video image of the video to be processed is identified by using a pre-trained target neural network, where the target neural network is a neural network trained with a plurality of sample images and capable of identifying multiple types of target objects.

In a possible implementation, the target neural network includes a shared network and a plurality of branch networks, each branch network being used to identify one type of target object;

the identification module 502 is configured to identify, by using the pre-trained target neural network, at least one type of target object in each frame of video image of the video to be processed, including:

performing, through the shared network in the target neural network, video decoding processing on the video to be processed to obtain each frame of video image corresponding to the video to be processed, and, for each frame of the video image, performing successive down-sampling processing and up-sampling processing on the video image; and

determining, through the plurality of branch networks in the target neural network and based on the result of the sampling processing, at least one type of target object included in the video image.

In a possible implementation, the identification module 502 is configured to, for each frame of the video image, perform successive down-sampling processing on the video image to obtain image feature information corresponding to each down-sampling pass; in the successive down-sampling passes, the input of each subsequent pass is the image feature information obtained by the previous pass, and the input of the first pass is the video image itself;

and is configured to perform successive up-sampling processing on the image feature information obtained by the last down-sampling pass, to obtain initial category information corresponding to each up-sampling pass and initial detection frame information corresponding to that initial category information.
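The chaining of successive down-sampling and up-sampling passes, where each pass consumes the previous pass's output and up-sampling starts from the last down-sampled features, can be illustrated on a single 1-D feature row. Pairwise averaging and nearest-neighbour repetition are stand-ins for the real network layers; names are illustrative:

```python
def downsample(feats):
    """Halve a 1-D feature row by averaging adjacent pairs (a stand-in
    for a real down-sampling layer)."""
    return [(feats[i] + feats[i + 1]) / 2.0 for i in range(0, len(feats) - 1, 2)]

def upsample(feats):
    """Double a 1-D feature row by nearest-neighbour repetition (a
    stand-in for a real up-sampling layer)."""
    return [v for f in feats for v in (f, f)]

row = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0]  # one "video image" row
d1 = downsample(row)  # first pass: the input is the image itself
d2 = downsample(d1)   # later passes: input is the previous pass's features
u1 = upsample(d2)     # up-sampling starts from the last down-sampled map
u2 = upsample(u1)
```

Each down-sampling pass halves the resolution while each up-sampling pass restores it, which is why intermediate outputs at several scales (`u1`, `u2`) are available to feed the branch networks.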

In a possible implementation, the identification module 502 is configured to use, among the plurality of branch networks, the target branch network matching the initial category information to perform successive feature extraction on the initial category information corresponding to each up-sampling pass to obtain target category information, and to perform successive feature extraction on the initial detection frame information corresponding to that up-sampling pass to obtain target detection frame information;

and to determine, based on the obtained target category information and target detection frame information corresponding to each up-sampling pass, at least one type of target object included in the video image.

In a possible implementation, the identification module 502 is configured to determine, based on the target category information and target detection frame information corresponding to each up-sampling pass, position information of each target object corresponding to each up-sampling pass;

determine, based on the position information, whether the positions of the multiple target objects corresponding to the multiple up-sampling passes overlap;

in response to the positions overlapping, determine the confidence levels respectively corresponding to the target objects whose positions overlap, and take the target object with the highest confidence as the final target object; and

take each determined final target object, together with each target object whose position does not overlap with others, as the target objects in the video image.
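One standard way to realize the overlap check and highest-confidence selection described above is greedy suppression based on intersection-over-union. The sketch below is an assumption about the mechanism, with an illustrative overlap threshold and invented names:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / float(union)

def resolve_overlaps(detections, overlap_thresh=0.5):
    """Among detections whose positions overlap, keep only the one with
    the highest confidence; non-overlapping detections are kept as-is.
    `detections` is a list of (box, confidence) pairs pooled from the
    different up-sampling passes."""
    ordered = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    for box, conf in ordered:
        if all(iou(box, k[0]) < overlap_thresh for k in kept):
            kept.append((box, conf))
    return kept

dets = [((0, 0, 10, 10), 0.9),    # overlaps the next box
        ((1, 1, 11, 11), 0.6),    # lower confidence, suppressed
        ((50, 50, 60, 60), 0.8)]  # isolated, kept as-is
kept = resolve_overlaps(dets)
```

Processing detections in decreasing confidence order guarantees that, within any group of mutually overlapping boxes, the survivor is the highest-confidence one, matching the selection rule in the text.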

For descriptions of the processing flow of each module in the apparatus and the interaction flow between the modules, reference may be made to the relevant descriptions in the foregoing method embodiments, which will not be detailed here.

An embodiment of the present disclosure further provides a computer device. As shown in FIG. 6, which is a schematic structural diagram of a computer device provided by an embodiment of the present disclosure, the computer device includes:

a processor 61 and a memory 62; the memory 62 stores machine-readable instructions executable by the processor 61, and the processor 61 is configured to execute the machine-readable instructions stored in the memory 62. When the machine-readable instructions are executed by the processor 61, the processor 61 performs the following steps: S101: acquiring a video to be processed; S102: identifying at least one type of target object in each frame of video image of the video to be processed, each type of target object being related to personal information; S103: performing blurring processing on the identified target object in the video image to obtain a target image; and S104: generating a blurred target video based on the target images respectively corresponding to each frame of video image.

The memory 62 includes an internal memory 621 and an external memory 622. The internal memory 621, also called main memory, is used to temporarily store operation data in the processor 61 and data exchanged with the external memory 622 such as a hard disk; the processor 61 exchanges data with the external memory 622 through the internal memory 621.

For the specific execution process of the above instructions, reference may be made to the steps of the video processing method described in the embodiments of the present disclosure, which will not be repeated here.

An embodiment of the present disclosure further provides a computer-readable storage medium, on which a computer program is stored; when the computer program is run by a processor, the steps of the video processing method described in the foregoing method embodiments are executed. The storage medium may be a volatile or non-volatile computer-readable storage medium.

The computer program product of the video processing method provided by the embodiments of the present disclosure includes a computer-readable storage medium storing program code; the instructions included in the program code can be used to execute the steps of the video processing method described in the foregoing method embodiments. For details, refer to the foregoing method embodiments, which will not be repeated here.

The computer program product may be implemented by hardware, software, or a combination thereof. In an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK) or the like.

Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working process of the apparatus described above, reference may be made to the corresponding process in the foregoing method embodiments, which will not be repeated here. In the several embodiments provided in the present disclosure, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the division of the units is only a division by logical function, and there may be other division manners in actual implementation; for another example, multiple units or components may be combined, or some features may be ignored or not implemented. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some communication interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit.

If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solutions of the present disclosure, in essence, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Finally, it should be noted that the above embodiments are merely specific implementations of the present disclosure, used to illustrate rather than limit its technical solutions, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that any person familiar with the technical field may still, within the technical scope disclosed herein, modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent replacements of some of the technical features; such modifications, changes, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure, and shall all be covered within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (14)

CN202111672470.9A | 2021-12-31 (priority) | 2021-12-31 (filed) | A video processing method, apparatus, computer equipment and storage medium | Active | CN114339049B (en)

Priority Applications (2)

Application Number | Publication | Priority Date | Filing Date | Title
CN202111672470.9A | CN114339049B (en) | 2021-12-31 | 2021-12-31 | Video processing method, device, computer equipment and storage medium
PCT/CN2022/103095 | WO2023123981A1 (en) | 2021-12-31 | 2022-06-30 | Video processing method and apparatus, computer device and storage medium

Applications Claiming Priority (1)

Application Number | Publication | Priority Date | Filing Date | Title
CN202111672470.9A | CN114339049B (en) | 2021-12-31 | 2021-12-31 | Video processing method, device, computer equipment and storage medium

Publications (2)

Publication Number | Publication Date
CN114339049A | 2022-04-12
CN114339049B | 2025-01-07

Family

ID=81020055

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN202111672470.9A | Active | CN114339049B (en) | 2021-12-31 | 2021-12-31 | Video processing method, device, computer equipment and storage medium

Country Status (2)

Country | Link
CN (1) | CN114339049B (en)
WO (1) | WO2023123981A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN115208649A (en)* | 2022-07-05 | 2022-10-18 | 上海仪电(集团)有限公司中央研究院 | Virtual machine remote security access method and system based on image recognition
CN116186770A (en)* | 2023-02-14 | 2023-05-30 | 重庆长安汽车股份有限公司 | Image desensitization method, device, electronic equipment and storage medium
WO2023123981A1 (en)* | 2021-12-31 | 2023-07-06 | 上海商汤智能科技有限公司 | Video processing method and apparatus, computer device and storage medium
CN117853980A (en)* | 2024-01-09 | 2024-04-09 | 广州虎牙科技有限公司 | Video processing method, apparatus, electronic device, and computer-readable storage medium
CN118381998A (en)* | 2024-04-17 | 2024-07-23 | 天翼爱音乐文化科技有限公司 | Video high definition processing method, system, equipment and medium based on image replacement

Families Citing this family (1)

Publication number | Priority date | Publication date | Assignee | Title
CN118674794A (en)* | 2024-06-26 | 2024-09-20 | 深圳市迪派乐智图科技有限公司 | Data processing method, apparatus, storage medium, and program product

Citations (21)

Publication number | Priority date | Publication date | Assignee | Title
KR20100045713A (en)* | 2008-10-24 | 2010-05-04 | 삼성테크윈 주식회사 | Apparatus and method for masking privacy area in digital image processing device
CN102819733A (en)* | 2012-08-09 | 2012-12-12 | 中国科学院自动化研究所 | Rapid detection fuzzy method of face in street view image
US20160050341A1 (en)* | 2013-04-22 | 2016-02-18 | Sony Corporation | Security feature for digital imaging
US20180285692A1 (en)* | 2017-03-28 | 2018-10-04 | Ulsee Inc. | Target Tracking with Inter-Supervised Convolutional Networks
CN109447169A (en)* | 2018-11-02 | 2019-03-08 | 北京旷视科技有限公司 | The training method of image processing method and its model, device and electronic system
CN109558808A (en)* | 2018-11-09 | 2019-04-02 | 同济大学 | A kind of road Edge Detection based on deep learning
CN110267097A (en)* | 2019-06-26 | 2019-09-20 | 北京字节跳动网络技术有限公司 | Video pushing method, device and electronic equipment based on characteristic of division
CN110363814A (en)* | 2019-07-25 | 2019-10-22 | Oppo(重庆)智能科技有限公司 | A kind of method for processing video frequency, device, electronic device and storage medium
CN111178253A (en)* | 2019-12-27 | 2020-05-19 | 深圳佑驾创新科技有限公司 | Visual perception method and device for automatic driving, computer equipment and storage medium
CN111476306A (en)* | 2020-04-10 | 2020-07-31 | 腾讯科技(深圳)有限公司 | Object detection method, device, equipment and storage medium based on artificial intelligence
CN111693064A (en)* | 2020-06-16 | 2020-09-22 | 百度在线网络技术(北京)有限公司 | Road condition information processing method, device, equipment and medium
CN111862067A (en)* | 2020-07-28 | 2020-10-30 | 中山佳维电子有限公司 | Welding defect detection method and device, electronic equipment and storage medium
CN112016559A (en)* | 2020-08-26 | 2020-12-01 | 北京推想科技有限公司 | Example segmentation model training method and device and image processing method and device
CN112418278A (en)* | 2020-11-05 | 2021-02-26 | 中保车服科技服务股份有限公司 | Multi-class object detection method, terminal device and storage medium
CN112418236A (en)* | 2020-11-24 | 2021-02-26 | 重庆邮电大学 | Automobile drivable area planning method based on multitask neural network
CN112650882A (en)* | 2019-10-11 | 2021-04-13 | 杭州海康威视数字技术股份有限公司 | Video acquisition method, device and system
CN113259721A (en)* | 2021-06-18 | 2021-08-13 | 长视科技股份有限公司 | Video data sending method and electronic equipment
CN113392960A (en)* | 2021-06-10 | 2021-09-14 | 电子科技大学 | Target detection network and method based on mixed hole convolution pyramid
CN113591810A (en)* | 2021-09-28 | 2021-11-02 | 湖南大学 | Vehicle target pose detection method and device based on boundary tight constraint network and storage medium
WO2021218786A1 (en)* | 2020-04-30 | 2021-11-04 | 华为技术有限公司 | Data processing system, object detection method and apparatus thereof
CN113762079A (en)* | 2021-08-03 | 2021-12-07 | 浙江吉利控股集团有限公司 | Environmental data processing method, device, equipment and storage medium

Family Cites Families (2)

Publication number | Priority date | Publication date | Assignee | Title
CN109670383B (en)* | 2017-10-16 | 2021-01-29 | 杭州海康威视数字技术股份有限公司 | Video shielding area selection method and device, electronic equipment and system
CN114339049B (en) | 2021-12-31 | 2025-01-07 | 深圳市商汤科技有限公司 | Video processing method, device, computer equipment and storage medium

Non-Patent Citations (3)

Title
JIANG, WEI, ET AL.: "MTFFNet: a Multi-task Feature Fusion Framework for Chinese Painting Classification", Web of Science, 10 September 2021 (2021-09-10) *
XINGJIA PAN, ET AL.: "Self-Supervised Feature Augmentation for Large Image Object Detection", IEEE Transactions on Image Processing, 14 May 2020 (2020-05-14) *
齐永锋等: "基于多任务学习的番茄叶片图像病害程度分类" [Classification of tomato leaf image disease severity based on multi-task learning], 《光电子·激光》 [Journal of Optoelectronics · Laser], 15 August 2021 (2021-08-15) *


Also Published As

Publication number | Publication date
WO2023123981A1 (en) | 2023-07-06
CN114339049B (en) | 2025-01-07

Similar Documents

Publication | Title
CN114339049A (en) | A video processing method, apparatus, computer equipment and storage medium
CN111476737B (en) | Image processing method, intelligent device and computer readable storage medium
Zhou et al. | Two-stream neural networks for tampered face detection
Rahmouni et al. | Distinguishing computer graphics from natural images using convolution neural networks
EP3161728B1 (en) | Hierarchical interlinked multi-scale convolutional network for image parsing
US11393100B2 (en) | Automatically generating a trimap segmentation for a digital image by utilizing a trimap generation neural network
US20230274400A1 (en) | Automatically removing moving objects from video streams
CN111680690B (en) | Character recognition method and device
CN108389224B (en) | Image processing method and device, electronic equipment and storage medium
CN110827371B (en) | Certificate generation method and device, electronic equipment and storage medium
CN112052834B (en) | Face recognition method, device and equipment based on privacy protection
US20220044366A1 (en) | Generating an image mask for a digital image by utilizing a multi-branch masking pipeline with neural networks
CN112906794A (en) | Target detection method, device, storage medium and terminal
JP7419080B2 (en) | Computer systems and programs
CN110942456B (en) | Tamper image detection method, device, equipment and storage medium
Thajeel et al. | A Novel Approach for Detection of Copy Move Forgery using Completed Robust Local Binary Pattern
CN115239672A (en) | Defect detection method and device, equipment and storage medium
CN110766027A (en) | Image area positioning method and training method of target area positioning model
CN113486856A (en) | Driver irregular behavior detection method based on semantic segmentation and convolutional neural network
CN114238904A (en) | Identity recognition method, and training method and device of two-channel hyper-resolution model
CN114067431A (en) | Image processing method, image processing device, computer equipment and storage medium
Han et al. | Two-stream neural networks for tampered face detection
CN114758145A (en) | Image desensitization method and device, electronic equipment and storage medium
CN119540124A (en) | System and method for detecting computer generated images
Thakur et al. | Design of semantic segmentation algorithm to classify forged pixels

Legal Events

Code | Title | Description
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 40063415; Country of ref document: HK
GR01 | Patent grant |
