CN115205161B


Info

Publication number: CN115205161B
Application number: CN202210994225.8A
Authority: CN
Inventors: 王国毅; 刘小伟; 周俊伟
Original assignee: Honor Device Co Ltd
Current assignee: Honor Device Co Ltd
Priority date: 2022-08-18
Filing date: 2022-08-18
Publication date: 2023-02-21
Anticipated expiration: 2042-08-18
Also published as: CN115205161A

Abstract

Embodiments of the present application disclose an image processing method and device. The method comprises: performing panoramic segmentation on an image to be processed to obtain a plurality of objects and contour masks corresponding to the plurality of objects; removing a target object from the image to be processed to obtain a first image, where the target object is one of the plurality of objects; and performing image completion on a first object in the first image based on the contour mask of the first object, where the first object is an object, among the plurality of objects, that is occluded by the target object. With the method described in the present application, the target object can be removed from the image to be processed and the vacancy left by the removal can be filled by image completion, thereby improving the aesthetics of the image.

Description

Translated from Chinese

An image processing method and device

Technical field

The present application relates to the field of electronic technology, and in particular to an image processing method and device.

Background

As times change, more and more people like to take photos as keepsakes. When taking photos, however, irrelevant things often appear in the image and detract from its aesthetics. For example, when a user takes a photo in a public place, where foot traffic is heavy, unrelated passers-by or unrelated objects can easily enter the frame and interfere with the shot.

Summary of the invention

Embodiments of the present application provide an image processing method and device that can remove a target object from an image to be processed and perform image completion on the vacancy left after the target object is removed, thereby improving the aesthetics of the image.

In a first aspect, an embodiment of the present application provides an image processing method. The method includes: performing panoramic segmentation on an image to be processed to obtain a plurality of objects and contour masks corresponding to the plurality of objects; removing a target object from the image to be processed to obtain a first image, where the target object is one of the plurality of objects; and performing image completion on a first object in the first image based on the contour mask of the first object, where the first object is an object, among the plurality of objects, that is occluded by the target object.

With the method described in the first aspect, the target image that the user wishes to remove can be removed, and image compensation can be performed on the vacancy left after the target image is removed, which helps improve the aesthetics of the image.
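The three steps of the first aspect can be sketched as follows. This is a minimal illustration in Python, not the implementation described in the application: the `SegmentedObject` class, the helper names, the representation of a contour mask as a set of (row, column) coordinates, and the flat-color fill that stands in for an image completion model are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class SegmentedObject:
    """One object returned by panoramic segmentation (hypothetical structure)."""
    name: str
    contour_mask: set   # assumed: (row, col) pixels of the full outline, hidden parts included
    color: int          # assumed: flat fill value standing in for real texture


def remove_target(image, target):
    """Step 2: blank the target's pixels (None marks the vacancy) to get the first image."""
    first_image = [row[:] for row in image]
    for r, c in target.contour_mask:
        first_image[r][c] = None
    return first_image


def occluded_by(objects, target):
    """First objects: objects whose contour mask overlaps the removed target."""
    return [o for o in objects
            if o is not target and o.contour_mask & target.contour_mask]


def complete(first_image, obj):
    """Step 3: fill vacant pixels inside the object's contour mask.

    A real system would call an image completion model here; the flat
    color fill is only a placeholder.
    """
    out = [row[:] for row in first_image]
    for r, c in obj.contour_mask:
        if out[r][c] is None:
            out[r][c] = obj.color
    return out


# Toy 1x4 image: a wall (value 1) partly hidden by a passer-by (value 9).
wall = SegmentedObject("wall", {(0, 0), (0, 1), (0, 2)}, color=1)
passer_by = SegmentedObject("passer-by", {(0, 2)}, color=9)
image = [[1, 1, 9, 0]]

first_image = remove_target(image, passer_by)
result = first_image
for first_object in occluded_by([wall, passer_by], passer_by):
    result = complete(result, first_object)
```

In this toy case the wall is the only first object, and its contour mask tells the completion step which vacant pixel belongs to it.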

In a possible implementation, the number of first objects is greater than 1, and performing image completion on the first object in the first image based on the contour mask of the first object specifically includes: determining depth values of the plurality of first objects; calculating, based on the depth values of the plurality of first objects, depth values of the pixels in the plurality of first objects that are filled in by image completion; and performing image completion on the plurality of first objects in the first image based on the contour masks of the plurality of first objects and the depth values of the pixels filled in by image completion, where the layer of a first object with a higher depth value is below the layer of a first object with a lower depth value. This helps improve the effect of image completion and thereby the aesthetics of the image.
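The layering rule above (a deeper object sits on a lower layer) can be illustrated with a small sketch. The dictionary layout, the function name `complete_by_depth`, and the flat-color fill are assumptions for the example; the only point taken from the text is that, where the contour masks of several first objects overlap in the vacancy, the object with the lower depth value ends up on top.

```python
def complete_by_depth(first_image, first_objects):
    """Fill vacant pixels (None) object by object, farthest object first.

    Painting in order of decreasing depth means nearer objects overwrite
    farther ones, so the far object ends up on the lower layer.
    Returns the completed image and the depth assigned to each filled pixel.
    """
    vacant = {(r, c)
              for r, row in enumerate(first_image)
              for c, value in enumerate(row) if value is None}
    out = [row[:] for row in first_image]
    filled_depth = {}
    for obj in sorted(first_objects, key=lambda o: o["depth"], reverse=True):
        for r, c in obj["contour_mask"] & vacant:
            out[r][c] = obj["color"]          # placeholder for model output
            filled_depth[(r, c)] = obj["depth"]
    return out, filled_depth


# Two occluded objects whose contour masks overlap at pixel (0, 1).
tree = {"contour_mask": {(0, 0), (0, 1)}, "depth": 5.0, "color": 3}
person = {"contour_mask": {(0, 1)}, "depth": 2.0, "color": 7}

completed, depths = complete_by_depth([[None, None]], [person, tree])
```

Here the person (depth 2.0) is nearer than the tree (depth 5.0), so the overlapping pixel takes the person's value and depth.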

In a possible implementation, performing panoramic segmentation on the image to be processed to obtain the plurality of objects and the contour masks corresponding to the plurality of objects specifically includes: inputting the image to be processed into a panoramic segmentation model to obtain the plurality of objects and the contour masks corresponding to the plurality of objects. Based on this implementation, an image completion model with a better image completion effect can be trained, which helps improve the aesthetics of the completed image.

In a possible implementation, the method further includes: acquiring a first training image and at least one second object included in the first training image; inputting the first training image into the panoramic segmentation model to obtain at least one third object; and adjusting parameters of the panoramic segmentation model based on the at least one second object and the at least one third object. This implementation helps obtain a panoramic segmentation model with strong panoramic segmentation capability, thereby improving the effect of panoramic processing.

In a possible implementation, performing image completion on the first object in the first image based on the contour mask of the first object specifically includes: inputting the contour mask of the first object and the first object into an image completion model to perform image completion.

In a possible implementation, the method further includes: acquiring a second training image and the contour mask of the object in the second training image, where the second training image includes one object; randomly removing a region of arbitrary size from the second training image to obtain an input image; inputting the input image and the contour mask of the object in the second training image into the image completion model to obtain a completed image; calculating a contour mask loss value based on the completed image and the second training image; and adjusting parameters of the image completion model based on the contour mask loss value. Based on this implementation, an image completion model with a better image completion effect can be trained, which helps improve the aesthetics of the completed image.
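The data preparation and loss computation in this training procedure can be sketched as below. The passage does not give the form of the contour mask loss, so the mean absolute error restricted to pixels inside the contour mask is an assumption chosen only for illustration, as are the function names, the rectangular shape of the removed region, and the use of 0.0 to mark removed pixels.

```python
import random


def remove_random_region(image, rng):
    """Zero out a randomly placed rectangle of random size (assumed region shape).

    Returns the damaged copy and the rectangle as (top, left, height, width).
    """
    h, w = len(image), len(image[0])
    rh, rw = rng.randint(1, h), rng.randint(1, w)
    top, left = rng.randint(0, h - rh), rng.randint(0, w - rw)
    damaged = [row[:] for row in image]
    for r in range(top, top + rh):
        for c in range(left, left + rw):
            damaged[r][c] = 0.0
    return damaged, (top, left, rh, rw)


def contour_mask_loss(completed, target, contour_mask):
    """Assumed loss: mean absolute error over pixels where contour_mask is 1."""
    pixels = [(r, c)
              for r, row in enumerate(contour_mask)
              for c, inside in enumerate(row) if inside]
    if not pixels:
        return 0.0
    return sum(abs(completed[r][c] - target[r][c]) for r, c in pixels) / len(pixels)


rng = random.Random(0)                       # seeded only to make the sketch repeatable
training_image = [[1.0] * 4 for _ in range(3)]
input_image, box = remove_random_region(training_image, rng)

# A stand-in "completed image" compared with the ground truth inside the mask.
loss = contour_mask_loss([[1.0, 0.0]], [[1.0, 1.0]], [[1, 1]])
```

In a real training loop the loss value would drive a gradient update of the image completion model; that step depends on the chosen framework and is omitted here.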

In a possible implementation, the image completion model includes a coarse inpainting network and a fine inpainting network. Inputting the input image and the contour mask of the object in the second training image into the image completion model to obtain the completed image specifically includes: inputting the input image and the contour mask of the object in the second training image into the coarse inpainting network to obtain a coarse inpainted image; and inputting the coarse inpainted image, the input image, and the contour mask of the object in the second training image into the fine inpainting network to obtain the completed image. Calculating the contour mask loss value based on the completed image and the second training image specifically includes: calculating the contour mask loss value based on the completed image, the coarse inpainted image, and the second training image.
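The two-stage data flow and a loss that depends on both stage outputs can be sketched as follows. The networks are represented by plain Python callables, and the loss is assumed to be a weighted sum of a masked mean absolute error for each stage; the passage only states which images enter the loss, so the formula, the `coarse_weight` parameter, and all function names are hypothetical.

```python
def masked_l1(pred, target, contour_mask):
    """Mean absolute error over pixels where contour_mask is 1 (assumed metric)."""
    pixels = [(r, c)
              for r, row in enumerate(contour_mask)
              for c, inside in enumerate(row) if inside]
    if not pixels:
        return 0.0
    return sum(abs(pred[r][c] - target[r][c]) for r, c in pixels) / len(pixels)


def run_two_stage(input_image, contour_mask, coarse_net, fine_net):
    """Coarse network sees (input, mask); fine network sees (coarse, input, mask)."""
    coarse = coarse_net(input_image, contour_mask)
    completed = fine_net(coarse, input_image, contour_mask)
    return coarse, completed


def two_stage_loss(coarse, completed, target, contour_mask, coarse_weight=1.0):
    """Assumed combination: weighted coarse-stage error plus fine-stage error."""
    return (coarse_weight * masked_l1(coarse, target, contour_mask)
            + masked_l1(completed, target, contour_mask))


# Toy stand-ins for the two networks: fixed outputs instead of learned ones.
toy_coarse_net = lambda image, mask: [[0.0, 1.0]]
toy_fine_net = lambda coarse, image, mask: [[0.5, 1.0]]

mask = [[1, 1]]
target = [[1.0, 1.0]]
coarse, completed = run_two_stage([[0.0, 1.0]], mask, toy_coarse_net, toy_fine_net)
total = two_stage_loss(coarse, completed, target, mask)
```

Including the coarse output in the loss lets the first stage receive a training signal of its own rather than only through the second stage.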

In a possible implementation, determining the depth values of the plurality of first objects specifically includes: inputting the image to be processed into a depth estimation model to obtain a depth value of each pixel in the image to be processed; and determining the depth value of a first object based on the depth value of each pixel in the object to be processed.
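One way to reduce a per-pixel depth map to a single depth value per object is to average the depth over the object's pixels. The passage does not say which statistic is used, so the mean, the function name `object_depth`, and the set-of-coordinates mask below are assumptions for illustration.

```python
def object_depth(depth_map, object_mask):
    """Average the per-pixel depth over the pixels belonging to one object.

    depth_map: 2D list of depth values, as a depth estimation model might output.
    object_mask: set of (row, col) pixels of the object (assumed representation).
    """
    if not object_mask:
        raise ValueError("object mask is empty")
    return sum(depth_map[r][c] for r, c in object_mask) / len(object_mask)


depth_map = [[1.0, 3.0],
             [5.0, 7.0]]
near_object = object_depth(depth_map, {(0, 0), (0, 1)})
far_object = object_depth(depth_map, {(1, 0), (1, 1)})
```

A median would be a reasonable alternative if the depth map is noisy near object boundaries.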

In a possible implementation, the method further includes: acquiring a third training image and the true depth values corresponding to the third training image; inputting the third training image into the depth estimation model to obtain training depth values; and adjusting parameters of the depth estimation model based on the error between the training depth values and the true depth values. Based on this implementation, a depth estimation model with a better depth estimation effect can be trained, which helps improve the aesthetics of the completed image.

In a possible implementation, before removing the target object from the image to be processed to obtain the first image, the method further includes: displaying the plurality of objects included in the image to be processed and identifiers of the plurality of objects; receiving a selection operation performed by the user on an object in the image to be processed; and determining the object selected by the selection operation as the target object. This implementation allows the user to flexibly control which target object is removed.

In a possible implementation, determining the object selected by the selection operation as the target object specifically includes: determining the semantic type of the object selected by the selection operation; and determining, as target objects, the objects among the plurality of objects whose semantic type is the same as that of the selected object. Based on this implementation, all objects in the image to be processed that have the semantic type selected by the user can be removed directly, which simplifies the user's operation.
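Expanding a single selection to every object of the same semantic type amounts to a simple filter. The object records, the field names `id` and `semantic_type`, and the function name are hypothetical; they only illustrate the rule stated above.

```python
def targets_by_semantic_type(objects, selected_id):
    """Return every object whose semantic type matches the selected object's type."""
    selected = next(o for o in objects if o["id"] == selected_id)
    return [o for o in objects if o["semantic_type"] == selected["semantic_type"]]


segmented = [
    {"id": 1, "semantic_type": "person"},
    {"id": 2, "semantic_type": "car"},
    {"id": 3, "semantic_type": "person"},
    {"id": 4, "semantic_type": "tree"},
]

# The user taps object 3 (a person); both persons become target objects.
targets = targets_by_semantic_type(segmented, selected_id=3)
```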

In a possible implementation, the semantic type of the target object is a preset semantic type. Based on this implementation, objects of the semantic type preset by the user can be removed directly, which simplifies the user's operation and makes image processing more convenient.

In a possible implementation, the plurality of objects include a plurality of fourth objects whose semantic type is the preset semantic type; before removing the target object from the image to be processed to obtain the first image, the method further includes: determining a fifth object based on a preset image and the plurality of fourth objects; and determining the fourth objects other than the fifth object as target objects. Based on this implementation, without any manual operation by the user, the object that the user wishes to keep is retained by means of the preset image and the objects that the user wishes to remove are removed by means of the preset semantic type, which makes image processing more intelligent and simplifies the user's operation.
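The split between the object to keep (the fifth object) and the objects to remove can be sketched as a best-match search. The passage does not state how the preset image is compared with the fourth objects, so the feature vectors, the cosine similarity, and the names below are assumptions made purely for illustration; a real system might, for example, compare features of a stored portrait with each detected person.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def split_keep_and_remove(preset_feature, fourth_objects):
    """Pick the fourth object most similar to the preset image as the fifth object.

    All remaining fourth objects become target objects to be removed.
    """
    fifth = max(fourth_objects, key=lambda o: cosine(preset_feature, o["feature"]))
    targets = [o for o in fourth_objects if o is not fifth]
    return fifth, targets


# Hypothetical features for three detected persons and for the preset image.
persons = [
    {"id": "a", "feature": [0.9, 0.1]},
    {"id": "b", "feature": [0.0, 1.0]},
    {"id": "c", "feature": [0.5, 0.5]},
]
fifth, targets = split_keep_and_remove([1.0, 0.0], persons)
```

Here person "a" best matches the preset image and is kept, while "b" and "c" become target objects.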

In a second aspect, the present application provides an image processing device including a panoramic segmentation unit, a removal unit, and an image completion unit, where: the panoramic segmentation unit is configured to perform panoramic segmentation on an image to be processed to obtain a plurality of objects and contour masks corresponding to the plurality of objects; the removal unit is configured to remove a target object from the image to be processed to obtain a first image, where the target object is one of the plurality of objects; and the image completion unit is configured to perform image completion on a first object in the first image based on the contour mask of the first object, where the first object is an object, among the plurality of objects, that is occluded by the target object.

In a possible implementation, the number of first objects is greater than 1, and when performing image completion on the first object in the first image based on the contour mask of the first object, the image completion unit is specifically configured to: determine depth values of the plurality of first objects; calculate, based on the depth values of the plurality of first objects, depth values of the pixels in the plurality of first objects that are filled in by image completion; and perform image completion on the plurality of first objects in the first image based on the contour masks of the plurality of first objects and the depth values of the pixels filled in by image completion, where the layer of a first object with a higher depth value is below the layer of a first object with a lower depth value.

In a possible implementation, when performing panoramic segmentation on the image to be processed to obtain the plurality of objects and the contour masks corresponding to the plurality of objects, the panoramic segmentation unit is specifically configured to: input the image to be processed into a panoramic segmentation model to obtain the plurality of objects and the contour masks corresponding to the plurality of objects.

In a possible implementation, the image processing device includes a training unit configured to: acquire a first training image and at least one second object included in the first training image; input the first training image into the panoramic segmentation model to obtain at least one third object; and adjust parameters of the panoramic segmentation model based on the at least one second object and the at least one third object.

In a possible implementation, when performing image completion on the first object in the first image based on the contour mask of the first object, the image completion unit is specifically configured to: input the contour mask of the first object and the first object into an image completion model to perform image completion.

In a possible implementation, the training unit is further configured to: acquire a second training image and the contour mask of the object in the second training image, where the second training image includes one object; randomly remove a region of arbitrary size from the second training image to obtain an input image; input the input image and the contour mask of the object in the second training image into the image completion model to obtain a completed image; calculate a contour mask loss value based on the completed image and the second training image; and adjust parameters of the image completion model based on the contour mask loss value.

In a possible implementation, the image completion model includes a coarse inpainting network and a fine inpainting network. When inputting the input image and the contour mask of the object in the second training image into the image completion model to obtain the completed image, the training unit is specifically configured to: input the input image and the contour mask of the object in the second training image into the coarse inpainting network to obtain a coarse inpainted image; and input the coarse inpainted image, the input image, and the contour mask of the object in the second training image into the fine inpainting network to obtain the completed image. When calculating the contour mask loss value based on the completed image and the second training image, the training unit is specifically configured to: calculate the contour mask loss value based on the completed image, the coarse inpainted image, and the second training image.

In a possible implementation, when determining the depth values of the plurality of first objects, the panoramic segmentation unit is specifically configured to: input the image to be processed into a depth estimation model to obtain a depth value of each pixel in the image to be processed; and determine the depth value of a first object based on the depth value of each pixel in the object to be processed.

In a possible implementation, the training unit is further configured to: acquire a third training image and the true depth values corresponding to the third training image; input the third training image into the depth estimation model to obtain training depth values; and adjust parameters of the depth estimation model based on the error between the training depth values and the true depth values.

In a possible implementation, the image processing device further includes an interaction unit. Before the target object is removed from the image to be processed to obtain the first image, the interaction unit is configured to: display the plurality of objects included in the image to be processed and identifiers of the plurality of objects; receive a selection operation performed by the user on an object in the image to be processed; and determine the object selected by the selection operation as the target object.

In a possible implementation, when determining the object selected by the selection operation as the target object, the interaction unit is specifically configured to: determine the semantic type of the object selected by the selection operation; and determine, as target objects, the objects among the plurality of objects whose semantic type is the same as that of the selected object.

In a possible implementation, the semantic type of the target object is a preset semantic type.

In a possible implementation, the plurality of objects include a plurality of fourth objects whose semantic type is the preset semantic type; before the target object is removed from the image to be processed to obtain the first image, the interaction unit is further configured to: determine a fifth object based on a preset image and the plurality of fourth objects; and determine the fourth objects other than the fifth object as target objects.

In a third aspect, an embodiment of the present application provides an electronic device including a memory and at least one processor. The memory is coupled to the one or more processors and is configured to store computer program code; the computer program code includes computer instructions, and when the one or more processors execute the computer instructions, the electronic device is caused to perform the method described in the first aspect or in any possible implementation of the first aspect.

In a fourth aspect, an embodiment of the present application provides a computer storage medium including computer instructions; when the computer instructions run on an electronic device, the electronic device is caused to perform the method described in the first aspect or in any possible implementation of the first aspect.

In a fifth aspect, an embodiment of the present application provides a computer program product; when the computer program product runs on a computer, the computer is caused to perform the method described in the first aspect or in any possible implementation of the first aspect.

Brief description of the drawings

FIG. 1 is a schematic structural diagram of an electronic device provided by an embodiment of the present application;

FIG. 2 is a block diagram of the software structure of an electronic device provided by an embodiment of the present application;

FIG. 3 is a schematic flowchart of an image processing method provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of panoramic segmentation provided by an embodiment of the present application;

FIG. 5 is a schematic structural diagram of a panoramic segmentation model provided by an embodiment of the present application;

FIG. 6 is a schematic diagram of target image removal provided by an embodiment of the present application;

FIG. 7 is a schematic diagram of image completion provided by an embodiment of the present application;

FIG. 8 is a schematic diagram of randomly removing an image region provided by an embodiment of the present application;

FIG. 9 is a schematic structural diagram of image completion provided by an embodiment of the present application;

FIG. 10 is a schematic diagram of the processing performed by a CSA layer provided by an embodiment of the present application;

FIG. 11 is a schematic diagram of depth estimation provided by an embodiment of the present application;

FIG. 12 is a schematic diagram of human-computer interaction provided by an embodiment of the present application.

Detailed description

Specific embodiments of the present application are described in further detail below with reference to the accompanying drawings. In the description of the embodiments of the present application, unless otherwise specified, "/" means "or"; for example, A/B may mean A or B. "And/or" in the text merely describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean: A exists alone, both A and B exist, or B exists alone. In addition, in the description of the embodiments of the present application, "a plurality of" means two or more.

Hereinafter, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or as implicitly specifying the number of the indicated technical features. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments of the present application, unless otherwise specified, "a plurality of" means two or more.

As times change, more and more people like to take photos as keepsakes. When taking photos, however, irrelevant things often appear in the image and detract from its aesthetics. To remove a target object from an image and thereby improve the aesthetics of the image, an embodiment of the present application provides an image processing method that can be applied in an electronic device. The image processing method may broadly include: performing panoramic segmentation on an image to be processed to obtain a plurality of objects and contour masks corresponding to the plurality of objects; removing a target object from the image to be processed to obtain a first image, where the target object is one of the plurality of objects; and performing image completion on a first object in the first image based on the contour mask of the first object, where the first object is an object, among the plurality of objects, that is occluded by the target object. With the method described in the present application, the target image that the user wishes to remove can be removed, and image compensation can be performed on the vacancy left after the target image is removed, which helps improve the aesthetics of the image.

The electronic device described above may be a terminal device, such as a smartphone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, or a smart in-vehicle device, but is not limited thereto. The electronic device may also be a server, for example an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a content delivery network (CDN), and big data and artificial intelligence platforms. The embodiments of the present application do not limit the type of the electronic device.

Refer to FIG. 1, which shows a schematic structural diagram of an electronic device 100. The embodiments are described in detail below using the electronic device 100 as an example. It should be understood that the electronic device 100 may have more or fewer components than shown in the figure, may combine two or more components, or may have a different component configuration. The components shown in the figure may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application-specific integrated circuits.

The electronic device 100 may include: a processor 110, a memory 120, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, and a display screen 194.

It can be understood that the structure illustrated in the embodiment of the present invention does not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may include more or fewer components than shown, combine some components, split some components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.

The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent devices or may be integrated into one or more processors.

The controller may be the nerve center and command center of the electronic device 100. The controller can generate operation control signals according to instruction opcodes and timing signals to control instruction fetching and instruction execution.

A memory 120 may also be provided in the processor 110 to store instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. This memory can hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, it can fetch them directly from this memory. This avoids repeated accesses and reduces the waiting time of the processor 110, thereby improving system efficiency.

在一些实施例中，处理器110可以包括一个或多个接口。接口可以包括集成电路(inter-integrated circuit，I2C)接口，集成电路内置音频(inter-integrated circuitsound，I2S)接口，脉冲编码调制(pulse code modulation，PCM)接口，通用异步收发传输器(universal asynchronous receiver/transmitter，UART)接口，移动产业处理器接口(mobile industry processor interface，MIPI)，通用输入输出(general-purposeinput/output，GPIO)接口，用户标识模块(subscriber identity module，SIM)接口，和/或通用串行总线(universal serial bus，USB)接口等。In some embodiments,processor 110 may include one or more interfaces. The interface may include an integrated circuit (inter-integrated circuit, I2C) interface, an integrated circuit built-in audio (inter-integrated circuitsound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, a universal asynchronous receiver (universal asynchronous receiver) /transmitter, UART) interface, mobile industry processor interface (mobile industry processor interface, MIPI), general-purpose input and output (general-purpose input/output, GPIO) interface, subscriber identity module (subscriber identity module, SIM) interface, and/or A universal serial bus (universal serial bus, USB) interface, etc.

The display screen 194 is used to display images, videos, and the like. The display screen 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini LED, a Micro LED, a Micro OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the electronic device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.

Antenna 1 and antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in the terminal device may cover one or more communication frequency bands. Different antennas may also be multiplexed to improve antenna utilization. For example, antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, an antenna may be used in combination with a tuning switch.

The mobile communication module 150 can provide wireless communication solutions applied on the terminal device, including 2G/3G/4G/5G. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), and the like. The mobile communication module 150 can receive electromagnetic waves through antenna 1, filter and amplify the received electromagnetic waves, and send them to the modem processor for demodulation. The mobile communication module 150 can also amplify the signal modulated by the modem processor and radiate it as electromagnetic waves through antenna 1. In some embodiments, at least some functional modules of the mobile communication module 150 may be provided in the processor 110. In some embodiments, at least some functional modules of the mobile communication module 150 and at least some modules of the processor 110 may be provided in the same device.

The modem processor may include a modulator and a demodulator. The modulator modulates the low-frequency baseband signal to be sent into a medium- or high-frequency signal. The demodulator demodulates the received electromagnetic wave signal into a low-frequency baseband signal and then sends the demodulated low-frequency baseband signal to the baseband processor. After being processed by the baseband processor, the low-frequency baseband signal is passed to the application processor. The application processor outputs a sound signal through an audio device, or displays an image or video through the display screen 194. In some embodiments, the modem processor may be an independent device. In other embodiments, the modem processor may be independent of the processor 110 and provided in the same device as the mobile communication module 150 or other functional modules.

The wireless communication module 160 can provide wireless communication solutions applied on the terminal device, including wireless local area networks (WLAN) (such as Wi-Fi networks), Bluetooth (BT), BLE broadcasting, global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR). The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via antenna 2, frequency-modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110. The wireless communication module 160 can also receive a signal to be sent from the processor 110, frequency-modulate and amplify it, and radiate it as electromagnetic waves through antenna 2.

In some embodiments, antenna 1 of the terminal device is coupled to the mobile communication module 150, and antenna 2 is coupled to the wireless communication module 160, so that the terminal device can communicate with networks and other devices through wireless communication technologies. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).

FIG. 2 is a block diagram of the software structure of the electronic device 100 according to an embodiment of the present invention. The layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers, which are, from top to bottom, the application layer, the application framework layer, the Android runtime and system libraries, and the kernel layer.

The application layer may include a series of application packages.

As shown in FIG. 2, the application packages may include applications such as camera, gallery, calendar, phone calls, maps, navigation, WLAN, Bluetooth, music, video, and short messages.

The application framework layer provides an application programming interface (API) and a programming framework for the applications in the application layer. The application framework layer includes some predefined functions.

As shown in FIG. 2, the application framework layer may include a window manager, content providers, a view system, a phone manager, a resource manager, a notification manager, and the like.

The window manager is used to manage window programs. The window manager can obtain the size of the display screen, determine whether there is a status bar, lock the screen, capture the screen, and so on.

Content providers are used to store and retrieve data and make the data accessible to applications. The data may include videos, images, audio, calls made and received, browsing history and bookmarks, the phone book, and the like.

The view system includes visual controls, such as controls for displaying text and controls for displaying pictures. The view system can be used to build applications. A display interface may consist of one or more views. For example, a display interface that includes a short message notification icon may include a view for displaying text and a view for displaying pictures.

The phone manager is used to provide the communication functions of the electronic device 100, for example, management of the call state (including connected, hung up, and so on).

The resource manager provides various resources for applications, such as localized strings, icons, pictures, layout files, and video files.

The notification manager enables an application to display notification information in the status bar. It can be used to convey informational messages that disappear automatically after a short stay without user interaction. For example, the notification manager is used to announce that a download is complete or to deliver message reminders. A notification may also appear in the status bar at the top of the system in the form of a chart or scrolling text, such as a notification from an application running in the background, or appear on the screen in the form of a dialog. Examples include prompting text in the status bar, playing a prompt sound, vibrating the electronic device, and flashing the indicator light.

The Android runtime includes core libraries and a virtual machine. The Android runtime is responsible for the scheduling and management of the Android system.

The core libraries consist of two parts: one part is the functions that the Java language needs to call, and the other part is the core libraries of Android.

The application layer and the application framework layer run in the virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files. The virtual machine performs functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.

The system libraries may include multiple functional modules, for example: a surface manager, media libraries, a three-dimensional graphics processing library (for example, OpenGL ES), and a 2D graphics engine (for example, SGL).

The surface manager manages the display subsystem and provides the fusion of 2D and 3D layers for multiple applications.

The media libraries support playback and recording of many commonly used audio and video formats, as well as still image files. The media libraries can support a variety of audio, video, and image encoding formats, such as MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG.

The three-dimensional graphics processing library is used to implement three-dimensional graphics drawing, image rendering, compositing, and layer processing.

The 2D graphics engine is a drawing engine for 2D drawing.

The kernel layer is the layer between hardware and software. The kernel layer includes at least a display driver, a camera driver, an audio driver, and a sensor driver.

The image processing method provided by the embodiments of the present application is described in further detail below.

Please refer to FIG. 3, which is a schematic flowchart of an image processing method provided by an embodiment of the present application. The method shown in FIG. 3 may be executed by an electronic device or by a chip in the electronic device. FIG. 3 is described using the electronic device as the executing entity. The same applies to the image processing methods shown in the other drawings of the embodiments of the present application, which will not be repeated below. The image processing method shown in FIG. 3 includes steps 301 to 303, wherein:

301. Perform panoptic segmentation on the image to be processed to obtain multiple objects and the contour masks corresponding to the multiple objects.

In the embodiments of the present application, the image to be processed is the image that the user needs to have processed. The image to be processed may be an image stored locally on the electronic device. For example, the electronic device is a terminal device, the user captures the image with the camera of the terminal device, and the image is stored in the local storage space of the terminal device. The image to be processed may also be sent to the electronic device by another device. For example, the electronic device is a server, and the user sends the image to be processed to the server through a terminal device to request that the server process it. The embodiments of the present application do not limit how the image to be processed is obtained.

Panoptic segmentation includes semantic segmentation and instance segmentation. Semantic segmentation determines the semantics corresponding to each pixel in the image, but does not distinguish between objects of the same category. It is used to determine one or more semantic objects contained in the image. Instance segmentation determines the instance corresponding to each pixel and classifies specific object instances. It is used to determine one or more instance objects contained in the image. For example, the image to be processed shown in FIG. 4 contains two vehicles and a tree. After the semantic segmentation part of panoptic segmentation, it can be determined that the image contains two semantic objects: the pixels in image region 401 and image region 402 correspond to one semantic object whose category is vehicle, and the pixels in image region 403 correspond to one semantic object whose category is tree. After instance segmentation, it can be determined that the image contains three instance objects: image region 401 corresponds to the instance object vehicle 1, image region 402 corresponds to the instance object vehicle 2, and image region 403 corresponds to the instance object tree.

The multiple objects described in the embodiments of the present application include semantic objects and instance objects. If the instance object corresponding to a pixel of the image to be processed can be determined through instance segmentation, the object corresponding to that pixel is an instance object, for example a tree, a vehicle, or a person. If the instance object corresponding to a pixel cannot be determined through instance segmentation and only the semantic object can be determined through semantic segmentation, the object corresponding to that pixel is a semantic object, for example the sky or the ground. It should also be noted that the contour mask mainly indicates the contour of the corresponding object. Through panoptic segmentation, the electronic device can identify one or more objects in the image to be processed, so that the object to be removed can be processed subsequently.
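As a minimal sketch of what "multiple objects and their contour masks" could look like in practice, the following assumes the segmentation result is a 2D array of integer object ids (a hypothetical encoding; the source does not specify one). The helper names `object_masks` and `boundary` are illustrative only. Each object gets a boolean mask, and the contour is taken as the mask pixels that have a 4-neighbour outside the mask.

```python
import numpy as np

def object_masks(panoptic_ids: np.ndarray) -> dict:
    """Return one boolean mask per object id found in the label map."""
    return {int(i): panoptic_ids == i for i in np.unique(panoptic_ids)}

def boundary(mask: np.ndarray) -> np.ndarray:
    """Pixels of `mask` with at least one 4-neighbour outside the mask.

    The image border is treated as "outside" (padding with False).
    """
    padded = np.pad(mask, 1, constant_values=False)
    interior = (
        padded[:-2, 1:-1] & padded[2:, 1:-1]    # up and down neighbours
        & padded[1:-1, :-2] & padded[1:-1, 2:]  # left and right neighbours
    )
    return mask & ~interior

# Toy label map loosely following FIG. 4 (ids are assumptions):
# 0 = ground (semantic object), 1 and 2 = two vehicles, 3 = tree.
toy = np.array([
    [0, 0, 0, 0, 3, 3],
    [1, 1, 0, 2, 3, 3],
    [1, 1, 0, 2, 2, 0],
    [0, 0, 0, 0, 0, 0],
])
masks = object_masks(toy)
vehicle1_contour = boundary(masks[1])
```

Whether the contour mask in the source is a filled region or only its outline is not stated; the sketch keeps both forms available.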

In a possible implementation, performing panoptic segmentation on the image to be processed to obtain multiple objects and the corresponding contour masks may be implemented as follows: input the image to be processed into a panoptic segmentation model to obtain the multiple objects and the contour masks corresponding to the multiple objects.

The panoptic segmentation model is mainly used to perform panoptic segmentation on images. Optionally, the panoptic segmentation model may be trained as follows: acquire a first training image and at least one second object included in the first training image; input the first training image into the panoptic segmentation model to obtain at least one third object; adjust the parameters of the panoptic segmentation model based on the at least one second object and the at least one third object. The device that trains the panoptic segmentation model may be the electronic device or another device, which is not limited in the embodiments of the present application. There may be one or more first training images, which is likewise not limited.

The panoptic segmentation model may be as shown in FIG. 5. Image features are first extracted by a feature extraction module, which may be the deep residual network ResNet-50. The image features then pass through semantic segmentation and instance segmentation to obtain semantic objects and instance objects, and the two results are fused. The fusion rule is as follows: for a given pixel, if the pixel has a corresponding instance object, the pixel belongs to that instance object (the instance segmentation result); if the pixel has no corresponding instance object, the pixel belongs to the corresponding semantic object (the semantic segmentation result).

Both the second object and the third object may be semantic objects and/or instance objects. The second object is an object included in the first training image, for example an object that has already been segmented manually. The third object is an object obtained by processing the first training image with the panoptic segmentation model. The closer the third object is to the second object, the stronger the panoptic segmentation capability of the model. The error between the second object and the third object can be computed pixel by pixel, and the network parameters can be adjusted with the backpropagation algorithm to reduce the error. This implementation helps to obtain a panoptic segmentation model with strong segmentation capability, which improves the effect of panoptic segmentation.
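The per-pixel fusion rule can be sketched as follows. This is an illustration under assumptions, not the model's actual implementation: the semantic result is taken to be a 2D array of class ids, the instance result a 2D array of instance ids with 0 meaning "no instance", and `offset` is a hypothetical value used only to keep instance ids from colliding with class ids.

```python
import numpy as np

def fuse(semantic: np.ndarray, instance: np.ndarray, offset: int = 1000) -> np.ndarray:
    """Fuse semantic and instance results into one label map.

    Where a pixel has an instance (instance > 0), the instance result wins;
    otherwise the pixel keeps its semantic class id.
    """
    return np.where(instance > 0, instance + offset, semantic)

# Toy example: class 0 = ground, class 1 = vehicle; two vehicle instances.
semantic = np.array([[0, 0, 1, 1],
                     [0, 0, 1, 1]])
instance = np.array([[0, 0, 1, 2],
                     [0, 0, 1, 2]])
fused = fuse(semantic, instance)
```

In the toy example the ground pixels keep class id 0, while the two vehicles receive distinct ids 1001 and 1002.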

Optionally, after the panoptic segmentation model has been trained, its effect can be evaluated with panoptic quality (PQ). The higher the PQ value, the better the panoptic segmentation model; the lower the PQ value, the worse the model. PQ is calculated as follows:

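$$\mathrm{PQ} = \frac{\sum_{(p,g)\in TP} \mathrm{IoU}(p,g)}{|TP| + \frac{1}{2}|FP| + \frac{1}{2}|FN|} \qquad (1)$$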

Here, TP (true positive) denotes positive samples that the model predicts as positive, FP (false positive) denotes negative samples that the model predicts as positive, FN (false negative) denotes positive samples that the model predicts as negative, p denotes a predicted value, and g denotes the ground truth (GT). IoU (intersection over union) is a standard measure of how accurately the corresponding objects are detected in a given dataset.
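A small sketch of a PQ computation may make the quantities concrete. It uses the standard definition of panoptic quality (the sum of IoU over matched prediction/ground-truth pairs divided by |TP| + ½|FP| + ½|FN|), which is assumed here to be what formula (1) denotes. Two further assumptions are not taken from the source: a pair counts as a true positive when its IoU exceeds 0.5, and all segments belong to one class so no class check is performed.

```python
import numpy as np

def iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over union of two boolean masks."""
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0

def panoptic_quality(pred_masks, gt_masks, thresh: float = 0.5) -> float:
    """PQ over lists of boolean masks (single class, assumed matching rule)."""
    matched_gt = set()
    iou_sum, tp = 0.0, 0
    for p in pred_masks:
        for j, g in enumerate(gt_masks):
            if j in matched_gt:
                continue
            value = iou(p, g)
            if value > thresh:          # matched pair: a true positive
                matched_gt.add(j)
                iou_sum += value
                tp += 1
                break
    fp = len(pred_masks) - tp           # predictions with no match
    fn = len(gt_masks) - tp             # ground-truth segments that were missed
    denom = tp + 0.5 * fp + 0.5 * fn
    return iou_sum / denom if denom else 0.0

# One good match (IoU 0.75), one false positive, one false negative.
g1 = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)
g2 = np.array([0, 0, 0, 0, 1, 1, 0, 0], dtype=bool)
p1 = np.array([1, 1, 1, 0, 0, 0, 0, 0], dtype=bool)
p2 = np.array([0, 0, 0, 0, 0, 0, 1, 1], dtype=bool)
pq = panoptic_quality([p1, p2], [g1, g2])
```

With these toy masks the result is 0.75 / (1 + 0.5 + 0.5) = 0.375.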

302. Remove the target object from the image to be processed to obtain a first image, where the target object is one of the multiple objects.

In the embodiments of the present application, the first image is the image to be processed without the pixels corresponding to the target object. For example, as shown in FIG. 6, vehicle 601 in the image to be processed is the target object. After the electronic device removes vehicle 601 from the image to be processed, the first image is obtained, in which the pixel position 602 formerly occupied by vehicle 601 is blank and contains no content.
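Step 302 can be sketched as a masked assignment. The sketch assumes the image is an H x W x 3 array and the target object is given as an H x W boolean mask; the function name `remove_object` and the blank value `fill=0` are illustrative choices, since the source only says the region becomes blank.

```python
import numpy as np

def remove_object(image: np.ndarray, target_mask: np.ndarray, fill: int = 0) -> np.ndarray:
    """Return a copy of `image` with the target object's pixels blanked."""
    first_image = image.copy()          # leave the original image untouched
    first_image[target_mask] = fill     # boolean mask selects the object's pixels
    return first_image

image = np.full((2, 3, 3), 200, dtype=np.uint8)
target_mask = np.zeros((2, 3), dtype=bool)
target_mask[0, 1] = True                # the "vehicle" occupies one pixel here
first_image = remove_object(image, target_mask)
```

The target mask is kept alongside the first image, because the later completion step needs to know which pixels are empty.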

303. Perform image completion on a first object in the first image based on the contour mask of the first object, where the first object is an object, among the multiple objects, that is occluded by the target object.

In the embodiments of the present application, the first object is an object among the multiple objects that is occluded by the target object. For example, as shown in FIG. 7, before image completion, pixel region 701 in the first image is the position of the removed target object. The target object corresponding to pixel region 701 occluded object 702, whose semantic type is ground. The image shows that part of the ground corresponding to object 702 is missing because it was occluded by the target object, so object 702 can be determined to be the first object.

The electronic device performs image completion on the first object in the first image based on the contour mask of the first object. The contour mask of the first object represents the shape of the first object. Because the first object is occluded by the target object, its contour is incomplete. The electronic device can therefore compare the complete contour expected for the semantic type of the first object with the contour mask of the first object to determine the missing part, and then complete it. For example, as shown in FIG. 7, object 702 is of type ground. Because it was occluded by the target object, the contour mask of object 702 can be used to determine that the part of the ground located in pixel region 701 is missing, so image completion can be performed on pixel region 701 based on the contour mask of object 702 and the first object.
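The source does not say how occluded objects are located. One simple possibility, shown here purely as an assumed adjacency heuristic, is to treat every object that touches the removed region as a candidate first object. The label-map encoding and the names `dilate` and `occluded_candidates` are hypothetical.

```python
import numpy as np

def dilate(mask: np.ndarray) -> np.ndarray:
    """Grow a boolean mask by one pixel in the four axis directions."""
    padded = np.pad(mask, 1, constant_values=False)
    return (
        padded[1:-1, 1:-1]
        | padded[:-2, 1:-1] | padded[2:, 1:-1]
        | padded[1:-1, :-2] | padded[1:-1, 2:]
    )

def occluded_candidates(panoptic_ids: np.ndarray, target_id: int) -> list:
    """Ids of objects that border the region left empty by the target object."""
    hole = panoptic_ids == target_id
    ring = dilate(hole) & ~hole         # one-pixel band around the hole
    return sorted(int(i) for i in np.unique(panoptic_ids[ring]))

# 5 = sky, 0 = ground, 7 = the removed vehicle.
ids = np.array([[5, 5, 5, 5],
                [0, 7, 7, 0],
                [0, 0, 0, 0]])
candidates = occluded_candidates(ids, target_id=7)
```

Here both the sky (5) and the ground (0) border the hole, so both would be passed on as candidates; a real system would need a further rule to decide how the hole is divided between them.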

In a possible implementation, performing image completion on the first object in the first image based on the contour mask of the first object is implemented as follows: input the contour mask of the first object and the first object into an image completion model to perform image completion.

The image completion model is mainly used to perform image completion on the first object in the first image. Optionally, the image completion model may be trained as follows: acquire a second training image and the contour mask of the object in the second training image, where the second training image includes one object; randomly remove a region of arbitrary size from the second training image to obtain an input image; input the input image and the contour mask of the object in the second training image into the image completion model to obtain a completed image; calculate a contour mask loss value based on the completed image and the second training image; adjust the parameters of the image completion model based on the contour mask loss value. The device that trains the image completion model may be the electronic device or another device, which is not limited in the embodiments of the present application. There may be one or more second training images, which is likewise not limited.

The second training image includes only one object. The second training images may be obtained from public panoptic segmentation datasets or from a self-collected dataset. A self-collected dataset is annotated as follows: objects are segmented with irregular contour lines. Where instance objects can be separated, such as tables, chairs, and people, the contour of each instance object is annotated. Where instance objects cannot be separated, the semantic objects are distinguished instead, such as the ground and the sky, and the contour of each semantic object is annotated. Processing the dataset in this way yields multiple objects, which are split into images that each contain only one object; each such image is a second training image. For a semantic object, the largest complete rectangular region of the semantic object may be cut out as the second training image. If an instance object is an incomplete sample, it can be discarded and not used for training. For example, if the instance object is a car but the segmented vehicle is only part of the real vehicle, with a completeness of less than 80%, the instance object is discarded.
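Cutting out "the largest complete rectangular region" of a semantic object can be read as finding the largest axis-aligned rectangle that lies entirely inside the object's mask. The sketch below uses the classic histogram-and-stack method for that problem; this choice of algorithm is an assumption, as the source does not name one, and `largest_rectangle` is an illustrative name.

```python
import numpy as np

def largest_rectangle(mask: np.ndarray):
    """Return (top, left, height, width) of the largest all-True rectangle."""
    rows, cols = mask.shape
    heights = [0] * cols                # run length of True cells ending at this row
    best, best_area = (0, 0, 0, 0), 0
    for r in range(rows):
        for c in range(cols):
            heights[c] = heights[c] + 1 if mask[r, c] else 0
        stack = []                      # column indices with increasing heights
        for c in range(cols + 1):
            h = heights[c] if c < cols else 0   # sentinel flushes the stack
            while stack and heights[stack[-1]] >= h:
                top_h = heights[stack.pop()]
                left = stack[-1] + 1 if stack else 0
                width = c - left
                if top_h * width > best_area:
                    best_area = top_h * width
                    best = (r - top_h + 1, left, top_h, width)
            stack.append(c)
    return best

ground = np.array([[1, 1, 0, 1],
                   [1, 1, 1, 1],
                   [0, 1, 1, 1]], dtype=bool)
top, left, height, width = largest_rectangle(ground)
crop = ground[top:top + height, left:left + width]
```

For the toy mask the result is the 2 x 3 block starting at row 1, column 1, and the same slice applied to the image would give the training crop.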

It should also be added that the step of randomly removing a region of arbitrary size from the second training image to obtain the input image can be repeated multiple times. It mainly simulates real scenes in which an object is occluded. For example, as shown in FIG. 8, the second training image contains one instance object, a car. A region of arbitrary size or arbitrary shape is randomly removed from the second training image to obtain the input image. FIG. 8 shows that part of the car in the input image has been removed.
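The random removal step can be sketched as below. For simplicity the sketch removes a single rectangle of random size and position, whereas the source also allows arbitrary shapes; the function name `random_erase` and the blank value are assumptions.

```python
import numpy as np

def random_erase(image: np.ndarray, rng: np.random.Generator, fill: int = 0):
    """Blank a random rectangle; return the input image and the hole mask."""
    h, w = image.shape[:2]
    eh = int(rng.integers(1, h + 1))            # hole height in [1, h]
    ew = int(rng.integers(1, w + 1))            # hole width in [1, w]
    top = int(rng.integers(0, h - eh + 1))
    left = int(rng.integers(0, w - ew + 1))
    hole = np.zeros((h, w), dtype=bool)
    hole[top:top + eh, left:left + ew] = True
    erased = image.copy()
    erased[hole] = fill
    return erased, hole

rng = np.random.default_rng(0)
second_training_image = np.full((8, 8, 3), 255, dtype=np.uint8)
input_image, hole = random_erase(second_training_image, rng)
```

Calling the function several times on the same training image yields several differently occluded inputs, matching the note that the step can be repeated.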

轮廓掩码损失值是基于该图像补全模型对输入图像进行图像补全处理时该图像补全模型中的部分参数得到，其中，轮廓掩码损失值越小则说明该图像补全模型的图像补全效果越好。进一步可选地，图像补全模型包括粗修复网络和精修复网络；将输入图像和第二训练图像中的对象的轮廓掩码输入至图像补全模型，得到补全图像，具体实现方式为：将输入图像和第二训练图像中的对象的轮廓掩码输入至粗修复网络得到粗修复图像；将粗修复图像、输入图像和第二训练图像中的对象的轮廓掩码输入至精修复网络得到补全图像；基于补全图像和第二训练图像计算轮廓掩码损失值，具体实现方式为：基于补全图像、粗修复图像和第二训练图像计算轮廓掩码损失值。基于该实现方式，能够训练出图像补全效果更好的图像补全模型，从而有利于提升图像补全后的美观度。The contour mask loss value is obtained based on some parameters in the image completion model when the image completion model performs image completion processing on the input image. The smaller the contour mask loss value is, the better the image completion model is. The better the complement. Further optionally, the image completion model includes a rough repair network and a fine repair network; the input image and the contour mask of the object in the second training image are input to the image completion model to obtain a completed image, and the specific implementation method is: Input the contour mask of the object in the input image and the second training image to the rough repair network to obtain a rough repair image; input the rough repair image, the input image and the contour mask of the object in the second training image to the fine repair network to obtain Completing the image; calculating a contour mask loss value based on the complementary image and the second training image, the specific implementation method is: calculating the contour mask loss value based on the complementary image, the rough repair image and the second training image. Based on this implementation method, an image completion model with better image completion effect can be trained, which is beneficial to improve the aesthetics of the image completion.

The network structure of the image completion model is shown in Figure 9. The image completion model includes a coarse repair network and a fine repair network: the coarse repair network roughly generates the missing part, and the fine repair network refines the result to achieve a better repair effect. The input of the coarse repair network includes the input image and a contour mask, where the contour mask is the contour mask of the object contained in the second training image corresponding to the input image and indicates the complete contour of the object contained in the input image. The coarse repair network outputs the coarse repaired image. The input of the fine repair network includes the input image, the contour mask, and the coarse repaired image obtained from the coarse repair network, and the fine repair network produces the final output image.

It should also be added that the contour mask loss value L_m can be calculated by the following formula (2):

(2)

Where I_r denotes the completed image, I_p denotes the coarse repaired image, and I'_gt denotes the contour mask of the object in the second training image.

Further optionally, the electronic device calculates the contour mask loss value based on the completed image and the second training image, and adjusts the parameters in the image completion model based on the contour mask loss value, specifically as follows: a total loss value L is calculated from the consistency loss value L_c, the reconstruction loss value L_re, the discriminator loss value D_R, and the contour mask loss value L_m, and the relevant parameters in the image completion model are adjusted based on the total loss value L. The total loss value L can be calculated by the following formula (3):

(3)

Where λ_r denotes the weight corresponding to L_re, λ_m denotes the weight corresponding to L_m, λ_c denotes the weight corresponding to L_c, and λ_R denotes the weight corresponding to D_R.
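As a minimal sketch of how the four loss values and their weights might be combined, the snippet below assumes that the total loss is a plain weighted sum. That form is inferred from the weight definitions only, since formula (3) is not reproduced in this text. The function name `total_loss` and the default weights of 1.0 are hypothetical.

```python
def total_loss(l_re, l_m, l_c, d_r,
               lambda_r=1.0, lambda_m=1.0, lambda_c=1.0, lambda_R=1.0):
    """Weighted sum of the four loss values (assumed form of the total loss L)."""
    return (lambda_r * l_re    # reconstruction loss L_re
            + lambda_m * l_m   # contour mask loss L_m
            + lambda_c * l_c   # consistency loss L_c
            + lambda_R * d_r)  # discriminator loss D_R

example = total_loss(1.0, 2.0, 3.0, 4.0,
                     lambda_r=2.0, lambda_m=0.5, lambda_c=1.0, lambda_R=0.25)
```

Under this assumption, raising λ_m relative to the other weights makes the training emphasize agreement with the object contour mask.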

L_c can be calculated according to formula (4):

(4)

Where I_p denotes the coarse repaired image, I_gt denotes the second training image, CSA denotes the feature vector output after the residual channel-spatial attention (CSA) layer in the fine repair network, Φ_n denotes the feature vector of layer 4-3 of the Visual Geometry Group (VGG) network for the second training image, and CSA_d denotes the feature vector at the corresponding position in the decoder.

The CSA layer is used to ensure that, during image repair, the generated filled region is coherent with the intact part of the image. Specifically, the CSA process is divided into two steps: search and generation. As shown in Figure 10, M denotes the missing region of the image, that is, the region that needs to be filled, and the remainder of the image is the intact region. In the search stage, the similarity D_max between each image block m_i in M and the intact region is calculated according to the following formula (5), and the image block of the intact region with the highest similarity is taken as the initial value for m_i.

(5)
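The search stage can be illustrated with the sketch below. Formula (5) is not reproduced in this text, so the similarity measure used here, cosine similarity between flattened block feature vectors, is an assumption, as are the function name `search_most_similar` and the array layout (one block per row).

```python
import numpy as np

def search_most_similar(hole_blocks, known_blocks, eps=1e-8):
    """For each block in the missing region, find the most similar intact block.

    Cosine similarity is an assumed stand-in for the similarity of formula (5).
    Returns the index of the best intact block and the similarity, per hole block.
    """
    a = hole_blocks / (np.linalg.norm(hole_blocks, axis=1, keepdims=True) + eps)
    b = known_blocks / (np.linalg.norm(known_blocks, axis=1, keepdims=True) + eps)
    similarity = a @ b.T                  # shape: (num_hole, num_known)
    best_index = similarity.argmax(axis=1)
    d_max = similarity[np.arange(len(best_index)), best_index]
    return best_index, d_max

known_blocks = np.array([[1.0, 0.0], [0.0, 1.0]])
hole_blocks = np.array([[0.1, 0.9], [2.0, 0.1]])
best_index, d_max = search_most_similar(hole_blocks, known_blocks)
```

The selected intact block then serves as the starting point that the generation stage refines using neighboring blocks.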

In the generation stage, the similarity D_ad between each image block m_i and its adjacent image block is calculated according to formula (6). The value of the adjacent image block is then used to refine the current initial value, as shown in formula (7), and the final value obtained is used as the pixel value of the current image block. Since m₁ has no adjacent image block, the similarity D_ad corresponding to m₁ is 0.

(6)

(7)

L_re can be calculated according to formula (8):

(8)

D_R can be calculated according to formula (9):

(9)

Where D denotes the discriminator: the completed image and the coarse repaired image are input into the discriminator, and the network is responsible for distinguishing real images from fake images. Of the two expectation terms in formula (9), one denotes the average over the second repaired images in a batch, and the other denotes the average over the completed images in a batch.

In a possible implementation, there are multiple first objects, and the electronic device performs image completion on the first objects in the first image based on the contour masks of the first objects, specifically as follows: the depth values of the multiple first objects are determined; based on these depth values, the depth values of the pixels of the multiple first objects that are filled in by image completion are calculated; based on the contour masks of the multiple first objects and the depth values of the pixels filled in by image completion, image completion is performed on the multiple first objects in the first image, where the layer of a first object with a higher depth value is below the layer of a first object with a lower depth value.

When the target object occludes multiple first objects, the completed first objects may overlap one another after image completion. To obtain a better image completion effect, the present application proposes a depth estimation method, that is, calculating the depth values of the multiple first objects. The depth value can be understood as the distance between an object and the camera when the image was captured: the higher the depth value, the farther the object is from the camera, and the lower the depth value, the closer the object is to the camera. Therefore, the depth values of two overlapping first objects in the overlapping part are calculated, and the layer of the first object with the higher depth value is set below the layer of the first object with the lower depth value, which helps improve the image completion effect and thus the aesthetics of the image. For example, as shown in Figure 11, assume that vehicle 1101 is the target object, and tree 1102 and tree 1103 are first objects occluded by the target object. After vehicle 1101 is removed and image completion is performed on tree 1102 and tree 1103, an overlapping pixel area 1104 is found between tree 1102 and tree 1103. The depth values of tree 1102 and tree 1103 in pixel area 1104 are calculated. If the calculation determines that the depth value of tree 1102 in pixel area 1104 is lower than that of tree 1103, the layer of tree 1102 is set above the layer of tree 1103 to obtain the final image.
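The layer ordering can be sketched as a simple back-to-front compositing step. This is an illustrative assumption rather than the procedure of the application: each completed object is reduced to a single depth value for the overlapping area, and the names `composite_by_depth` and `overlap_depth` are hypothetical.

```python
import numpy as np

def composite_by_depth(canvas, objects):
    """Paint objects from farthest to nearest so nearer objects end up on top."""
    out = canvas.copy()
    for obj in sorted(objects, key=lambda o: o["overlap_depth"], reverse=True):
        out[obj["mask"]] = obj["pixels"][obj["mask"]]
    return out

canvas = np.zeros((4, 4), dtype=np.uint8)

mask_1102 = np.zeros((4, 4), dtype=bool)
mask_1102[:, 0:3] = True   # tree 1102 covers columns 0..2
mask_1103 = np.zeros((4, 4), dtype=bool)
mask_1103[:, 1:4] = True   # tree 1103 covers columns 1..3 (overlap: columns 1..2)

objects = [
    {"name": "tree 1102", "mask": mask_1102,
     "pixels": np.full((4, 4), 10, dtype=np.uint8), "overlap_depth": 3.0},
    {"name": "tree 1103", "mask": mask_1103,
     "pixels": np.full((4, 4), 20, dtype=np.uint8), "overlap_depth": 5.0},
]
final_image = composite_by_depth(canvas, objects)
```

Because tree 1102 has the lower depth value in this toy setup, its pixels occupy the overlapping columns of `final_image`, which matches the layer order described for Figure 11.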

The depth values of the pixels of the multiple first objects that are filled in by image completion may be calculated as follows: determine the coordinates (x coordinate, y coordinate) of the first pixels in the first object, the coordinates (x coordinate, y coordinate) of the second pixels, and the real depth values of the second pixels, where a first pixel is a pixel in the first object that requires image completion, and a second pixel is any other pixel in the first object. A mathematical model is then established by a machine learning algorithm. For example, assuming that the highest power is 3, the polynomial corresponding to the target depth can be assumed to be:

(10)

Where x and y are coordinate values and a to j are constant terms. The constants are first randomly initialized to arbitrary values. The coordinates of the second pixels are input to calculate the target depth values, the error between the target depth values and the real depth values is computed, the constants are adjusted by back propagation, and the process is iterated. When the error between the target depth values and the real depth values is smaller than a threshold, reasonable constants are considered to have been obtained, the iteration stops, the mathematical model is complete, and the final polynomial corresponding to the target depth value is determined. Substituting the coordinates of a first pixel into the polynomial gives the depth value corresponding to that first pixel.
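The fitting procedure can be sketched with plain gradient descent on a cubic polynomial in x and y. Several details below are assumptions for illustration: the order in which the ten constants a to j are assigned to the ten monomials (formula (10) is not reproduced in this text), the use of half the mean squared error as the error measure, coordinates normalized to [0, 1], and the specific learning rate, threshold, and function names.

```python
import numpy as np

def cubic_features(x, y):
    """The ten monomials of degree <= 3 in x and y (term order is assumed)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.stack([x**3, y**3, x**2 * y, x * y**2,
                     x**2, y**2, x * y, x, y, np.ones_like(x)], axis=-1)

def fit_depth_polynomial(x, y, depth, lr=0.1, threshold=1e-2,
                         max_iter=5000, rng=None):
    """Iteratively adjust the constants until the error drops below threshold."""
    rng = np.random.default_rng() if rng is None else rng
    phi = cubic_features(x, y)
    coeffs = rng.normal(scale=0.01, size=phi.shape[1])  # random initialization
    for _ in range(max_iter):
        err = phi @ coeffs - depth
        if 0.5 * np.mean(err**2) < threshold:
            break
        coeffs -= lr * (phi.T @ err) / len(depth)       # gradient step
    err = phi @ coeffs - depth
    return coeffs, 0.5 * float(np.mean(err**2))

# Second pixels: known coordinates (normalized to [0, 1]) and real depth values.
rng = np.random.default_rng(0)
x2, y2 = rng.random(200), rng.random(200)
depth2 = 2.0 + 0.5 * x2 + 0.3 * y2   # synthetic depths, for illustration only

coeffs, loss = fit_depth_polynomial(x2, y2, depth2, rng=rng)

# First pixel: a pixel filled in by image completion, depth read off the polynomial.
first_pixel_depth = float(cubic_features(0.5, 0.5) @ coeffs)
```

Normalizing the coordinates keeps every monomial in [0, 1], so a fixed step size of 0.1 remains stable; with raw pixel coordinates the cubic terms would be very large and the step size would need to be far smaller.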

Optionally, the depth values of the multiple first objects are determined as follows: the image to be processed is input into a depth estimation model to obtain the depth value of each pixel in the image to be processed; the depth values of the first objects are determined based on the depth value of each pixel in the image to be processed.

The depth estimation model is mainly used to calculate the depth value of each pixel in an image. Further optionally, the depth estimation model may be trained as follows: a third training image and the real depth values corresponding to the third training image are obtained; the third training image is input into the depth estimation model to obtain training depth values; the parameters in the depth estimation model are adjusted based on the error between the training depth values and the real depth values. The device that trains the depth estimation model may be the electronic device or a device other than the electronic device, which is not limited in the embodiments of the present application. There may be one or more third training images, which is not limited in the embodiments of the present application.

The third training image can be collected by an RGBD data acquisition device, in which case the collected third training image also includes the real depth value of each pixel. The training depth values corresponding to the third training image are calculated by the depth estimation model, the error between the training depth values and the real depth values is calculated pixel by pixel, and back propagation is used to adjust the network parameters of the depth estimation model and reduce the error. This implementation helps obtain a depth estimation model with a better depth estimation effect, which improves the image completion effect and thus the aesthetics of the image.

In a possible implementation, before the target object in the image to be processed is removed to obtain the first image, the method further includes: displaying the multiple objects included in the image to be processed and the identifiers of the multiple objects; receiving a user's selection operation on an object in the image to be processed; and determining the object selected by the selection operation as the target object. For example, as shown in Figure 12, the electronic device displays the image to be processed together with the multiple objects it contains and their corresponding identifiers, and the user can select the target object to be removed by tapping or a similar operation. If the user wants to remove tree 2, the user taps the area corresponding to tree 2, and the electronic device determines tree 2 as the target object from the user's selection operation. This implementation lets the user flexibly control which target object is removed.

Optionally, the object selected by the selection operation is determined as the target object as follows: the semantic type of the object selected by the selection operation is determined; among the multiple objects, every object whose semantic type is the same as that of the selected object is determined as a target object. For example, as shown in Figure 12, when the user selects tree 2 in the image to be processed by tapping, the electronic device can recognize that the semantic type of tree 2 is tree and that the image to be processed also includes tree 1, an object with the same semantic type as tree 2. With this method, the user only needs to select one of the trees, and the electronic device recognizes both tree 1 and tree 2 as target objects and can directly remove all objects of the selected semantic type from the image to be processed, which simplifies the user's operation.
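Selecting all objects that share the semantic type of the tapped object can be sketched as follows. The data layout (a list of dictionaries with `id` and `type` keys) and the function name are hypothetical and only illustrate the selection rule.

```python
def select_targets_by_semantic_type(objects, selected_id):
    """Return the ids of all objects sharing the selected object's semantic type."""
    selected_type = next(o["type"] for o in objects if o["id"] == selected_id)
    return [o["id"] for o in objects if o["type"] == selected_type]

# Objects as they might come out of panoramic segmentation (illustrative values).
objects = [
    {"id": "tree 1", "type": "tree"},
    {"id": "tree 2", "type": "tree"},
    {"id": "car 1", "type": "vehicle"},
]
targets = select_targets_by_semantic_type(objects, "tree 2")
```

Here tapping tree 2 yields both trees as target objects, while the vehicle is left untouched.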

In another possible implementation, the semantic type of the target object is a preset semantic type. The user can set the preset semantic type on the electronic device in advance. For example, if the user sets the semantic type of the target object to vehicle, the electronic device can directly treat all vehicles in the image to be processed as target objects when processing that image. With this implementation, objects of the semantic type preset by the user can be removed directly, which simplifies the user's operation and makes image processing more convenient.

Optionally, the multiple objects include multiple fourth objects whose semantic type is the preset semantic type. Before the target object in the image to be processed is removed to obtain the first image, the method further includes: determining a fifth object based on a preset image and the multiple fourth objects; and determining the fourth objects other than the fifth object as target objects. For example, assume that the preset image is a selfie uploaded by the user, the preset semantic type is person, and the fourth objects contained in the image to be processed are the user and some tourists. The fifth object among the fourth objects is the user, and the other fourth objects are tourists. The electronic device can therefore compare the fourth objects with the preset image to determine the fifth object, determine the objects other than the fifth object as target objects, and remove them. With this implementation, no manual operation is required: the preset image identifies the object the user wants to keep, and the preset semantic type identifies the objects the user wants to remove, which makes image processing more intelligent and simplifies the user's operation.
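One way to picture the determination of the fifth object is to compare a feature vector of the preset image with a feature vector of each fourth object and keep the best match. The text does not specify how the comparison is made, so the use of embedding vectors and cosine similarity here is an assumption, and the function names and numeric values are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0

def split_keep_and_remove(preset_embedding, fourth_objects):
    """Keep the object that best matches the preset image; the rest are targets."""
    keep = max(fourth_objects,
               key=lambda k: cosine_similarity(preset_embedding, fourth_objects[k]))
    targets = [k for k in fourth_objects if k != keep]
    return keep, targets

# Hypothetical embeddings: one for the user's selfie, one per detected person.
preset_embedding = [1.0, 0.0, 0.2]
fourth_objects = {
    "person A": [0.9, 0.1, 0.3],
    "person B": [0.0, 1.0, 0.1],
    "person C": [0.1, 0.2, 1.0],
}
fifth_object, targets = split_keep_and_remove(preset_embedding, fourth_objects)
```

In this toy data, person A is closest to the preset image and is kept as the fifth object, while persons B and C become the target objects to remove.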

An embodiment of the present application further provides a computer-readable storage medium storing instructions which, when run on a computer or a processor, cause the computer or the processor to perform one or more steps of any one of the above methods.

An embodiment of the present application further provides a computer program product containing instructions. When the computer program product runs on a computer or a processor, the computer or the processor is caused to perform one or more steps of any one of the above methods.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted through the computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).

Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing related hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The aforementioned storage medium includes various media that can store program code, such as a ROM, a random access memory (RAM), a magnetic disk, or an optical disc. In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments. Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application and not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be equivalently replaced, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.