Technical Field

The present disclosure relates to the field of image processing technology, and in particular to an image processing method, an image processing apparatus, a computer-readable storage medium, and an electronic device.
Background

With the proliferation of electronic devices such as mobile phones and tablet computers, and the continuous improvement of camera module configurations, users' expectations for photo quality keep rising.

At present, photos taken with electronic devices tend to look flat and lack a sense of depth; in edutainment scenarios in particular, such photos cannot fully serve their illustrative purpose.
Summary of the Invention

The present disclosure provides an image processing method, an image processing apparatus, a computer-readable storage medium, and an electronic device, thereby overcoming, at least to some extent, the problem that captured photos lack a stereoscopic effect.
According to a first aspect of the present disclosure, an image processing method is provided, including: acquiring a two-dimensional image, performing semantic segmentation on the two-dimensional image, and determining a foreground region and a background region of the two-dimensional image; determining depth information of the background region; determining pixel information and depth information of an occluded region by using pixel information and the depth information of the background region, where the position of the occluded region corresponds to the position of the foreground region on the two-dimensional image; and generating a three-dimensional image corresponding to the two-dimensional image by combining the pixel information and the depth information of the occluded region.
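The operations of the first aspect can be sketched end to end on a toy grayscale grid. The function bodies below (fixed-threshold segmentation, neighbor-average inpainting) are simplified stand-ins chosen for illustration only; the disclosure itself uses semantic segmentation and a neural network for the occluded region, not these toy rules:

```python
# Illustrative sketch of the claimed steps on a tiny grayscale "image".
# All function bodies are toy stand-ins, not the disclosed techniques.

def semantic_segmentation(image, threshold=128):
    """Step 1 (stand-in): label each pixel foreground (True) or background (False)."""
    return [[pixel >= threshold for pixel in row] for row in image]

def fill_occluded_region(values, foreground_mask):
    """Steps 2-3 (stand-in): estimate pixel/depth values behind the foreground
    by averaging the known background neighbors of each occluded cell."""
    h, w = len(values), len(values[0])
    filled = [row[:] for row in values]
    for y in range(h):
        for x in range(w):
            if foreground_mask[y][x]:
                neighbors = [values[j][i]
                             for j, i in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                             if 0 <= j < h and 0 <= i < w
                             and not foreground_mask[j][i]]
                filled[y][x] = sum(neighbors) / len(neighbors) if neighbors else 0
    return filled

def generate_3d_image(image, depth_map, foreground_mask):
    """Step 4: combine pixels and depths into a per-pixel (value, depth)
    representation in which the occluded background has been filled in."""
    bg_pixels = fill_occluded_region(image, foreground_mask)
    bg_depths = fill_occluded_region(depth_map, foreground_mask)
    return [[(p, d) for p, d in zip(p_row, d_row)]
            for p_row, d_row in zip(bg_pixels, bg_depths)]

image = [[10,  20, 30],
         [15, 200, 35],    # the 200 pixel is a foreground object
         [12,  22, 32]]
depth = [[5.0, 5.0, 5.0],
         [5.0, 1.0, 5.0],  # foreground is much closer to the camera
         [5.0, 5.0, 5.0]]

mask = semantic_segmentation(image)
layered = generate_3d_image(image, depth, mask)
print(layered[1][1])  # occluded cell: averaged background pixel and depth -> (23.0, 5.0)
```

The key point the sketch demonstrates is that the occluded cell receives pixel and depth values derived purely from the surrounding background, which is what later allows the foreground and background to be treated as separate layers.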
According to a second aspect of the present disclosure, an image processing apparatus is provided, including: a semantic segmentation module configured to acquire a two-dimensional image, perform semantic segmentation on the two-dimensional image, and determine a foreground region and a background region of the two-dimensional image; a depth determination module configured to determine depth information of the background region; an occlusion information determination module configured to determine pixel information and depth information of an occluded region by using pixel information and the depth information of the background region, where the position of the occluded region corresponds to the position of the foreground region on the two-dimensional image; and a three-dimensional image generation module configured to generate a three-dimensional image corresponding to the two-dimensional image by combining the pixel information and the depth information of the occluded region.
According to a third aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, and the program, when executed by a processor, implements the image processing method described above.

According to a fourth aspect of the present disclosure, an electronic device is provided, including a processor and a memory configured to store one or more programs, where the one or more programs, when executed by the processor, cause the processor to implement the image processing method described above.
In the technical solutions provided by some embodiments of the present disclosure, semantic segmentation is performed on a two-dimensional image to obtain a foreground region and a background region, depth information of the background region is determined, the pixel information and depth information of the background region are used to determine pixel information and depth information of an occluded region, and the pixel information and depth information of the occluded region are then used to generate a three-dimensional image. On the one hand, the present disclosure can convert a two-dimensional image into a three-dimensional image, enhancing the stereoscopic quality of image display and improving the visual effect. On the other hand, in edutainment scenarios, the present disclosure can fully present the information in an image, making its content easier for users to understand. In yet another aspect, this solution can be applied to augmented reality or virtual reality technology to build various types of application scenarios, improving user perception and engagement.
It should be understood that the foregoing general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.

Brief Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure. Obviously, the drawings described below show only some embodiments of the present disclosure, and those of ordinary skill in the art may derive other drawings from them without creative effort. In the drawings:
Figure 1 shows a schematic diagram of an exemplary system architecture for an image processing solution according to an embodiment of the present disclosure;

Figure 2 shows a schematic structural diagram of an electronic device suitable for implementing embodiments of the present disclosure;

Figure 3 schematically shows a flowchart of an image processing method according to an exemplary embodiment of the present disclosure;

Figure 4 schematically shows the effect of semantic segmentation according to an embodiment of the present disclosure;

Figure 5 schematically shows the structure of a neural network used to determine the pixel information and depth information of an occluded region;

Figure 6 schematically shows a flowchart of the entire image processing procedure according to an embodiment of the present disclosure;

Figure 7 schematically shows a block diagram of an image processing apparatus according to an exemplary embodiment of the present disclosure;

Figure 8 schematically shows a block diagram of an image processing apparatus according to another exemplary embodiment of the present disclosure;

Figure 9 schematically shows a block diagram of an image processing apparatus according to yet another exemplary embodiment of the present disclosure.
Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. However, example embodiments can be implemented in various forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that the present disclosure will be more thorough and complete, and will fully convey the concepts of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of the embodiments of the present disclosure. However, those skilled in the art will appreciate that the technical solutions of the present disclosure may be practiced with one or more of the specific details omitted, or with other methods, components, apparatuses, steps, and the like. In other instances, well-known technical solutions are not shown or described in detail to avoid obscuring aspects of the present disclosure.

Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and their repeated description will therefore be omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor apparatuses and/or microcontroller apparatuses.

The flowcharts shown in the drawings are merely illustrative and do not necessarily include all steps. For example, some steps may be decomposed, while others may be merged or partially merged, so the actual order of execution may change depending on the actual situation.
With the development of terminal and camera technology, users have ever-higher requirements for images. The two-dimensional images in a photo album look flat and lack a sense of depth; converting them into three-dimensional images not only enriches their content but also improves entertainment value and user experience.

In the exemplary embodiments of the present disclosure, semantic segmentation technology and depth estimation technology are combined to convert a two-dimensional image into a three-dimensional image, and in turn to convert a two-dimensional photo album into a three-dimensional one. In some scenarios, the three-dimensional images can also be used to create animations for edutainment purposes. In addition, the generated three-dimensional images can be applied to augmented reality or virtual reality scenarios; the present disclosure places no limit on their scope of application.
Figure 1 shows a schematic diagram of an exemplary system architecture for an image processing solution according to an embodiment of the present disclosure.

As shown in Figure 1, the system architecture 1000 may include one or more of terminal devices 1001, 1002, and 1003, a network 1004, and a server 1005. The network 1004 is the medium that provides communication links between the terminal devices 1001, 1002, and 1003 and the server 1005, and may include various connection types, such as wired links, wireless communication links, or fiber-optic cables.

It should be understood that the numbers of terminal devices, networks, and servers in Figure 1 are merely illustrative. There may be any number of terminal devices, networks, and servers, as required by the implementation. For example, the server 1005 may be a server cluster composed of multiple servers.

Users may use the terminal devices 1001, 1002, and 1003 to interact with the server 1005 through the network 1004 to receive or send messages and the like. The terminal devices 1001, 1002, and 1003 may be various electronic devices with display screens, including but not limited to smartphones, tablet computers, portable computers, and desktop computers.
In an example in which the image processing solution of the exemplary embodiments of the present disclosure is implemented using only the terminal devices 1001, 1002, and 1003, when a terminal device 1001, 1002, or 1003 determines a two-dimensional image to be converted into a three-dimensional image, first, semantic segmentation may be performed on the two-dimensional image to determine its foreground region and background region, and depth estimation may be performed on the two-dimensional image to obtain the depth information of each pixel, from which the depth information of the background region is determined; next, the pixel information and depth information of the background region may be used to determine the pixel information and depth information of an occluded region, where the position of the occluded region corresponds to the position of the foreground region on the two-dimensional image; then, the pixel information and depth information of the occluded region are combined to generate a three-dimensional image corresponding to the two-dimensional image.
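The reason the occluded region must be filled before generating the three-dimensional image is that a shifted viewpoint exposes background that was hidden behind the foreground. The one-row sketch below illustrates this with a toy parallax model (shift proportional to 1/depth); this rendering rule and all the sample values are assumptions for illustration, not the rendering method specified by the disclosure:

```python
# Why the filled occluded region matters: nearer pixels move farther
# across the image when the viewpoint shifts, revealing background
# that was hidden behind the foreground. Toy parallax model only.

def render_shifted_view(fg_pixels, fg_depth, bg_pixels, bg_depth, k=4):
    """Re-render a one-row 'image' from a horizontally shifted viewpoint.

    fg_pixels[x] is None where there is no foreground object;
    bg_pixels must already contain plausible values behind the
    foreground (i.e., the filled occluded region)."""
    width = len(bg_pixels)
    out = [None] * width
    # Draw the background layer first, then the foreground layer on top.
    for layer_px, layer_depth in ((bg_pixels, bg_depth), (fg_pixels, fg_depth)):
        for x in range(width):
            if layer_px[x] is None:
                continue
            shift = round(k / layer_depth[x])  # nearer -> larger parallax
            nx = x + shift
            if 0 <= nx < width:
                out[nx] = layer_px[x]
    return out

# One row: background value 7 everywhere (occluded part already filled),
# and a foreground object of value 9 at positions 2 and 3.
bg = [7, 7, 7, 7, 7, 7]
bg_d = [10.0] * 6                               # far away -> no shift
fg = [None, None, 9, 9, None, None]
fg_d = [None, None, 2.0, 2.0, None, None]       # close -> shift of 2

view = render_shifted_view(fg, fg_d, bg, bg_d)
print(view)  # -> [7, 7, 7, 7, 9, 9]
```

In the output, the foreground object has moved two positions to the right, and positions 2 and 3 now show the background values that the occlusion-filling step supplied; without them, those pixels would be holes.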
In this case, the image processing apparatus described below may be configured in the terminal devices 1001, 1002, and 1003.
The image processing solution described in the present disclosure may also be executed by the server 1005. First, the server 1005 obtains a two-dimensional image from the terminal devices 1001, 1002, and 1003 via the network 1004, or from another server or storage device; next, the server 1005 may perform semantic segmentation on the two-dimensional image to determine its foreground region and background region, and may also perform depth estimation on the two-dimensional image and determine the depth information of the background region from the depth estimation result; subsequently, the server 1005 may use the pixel information and depth information of the background region to determine the pixel information and depth information of the occluded region, and combine the pixel information and depth information of the occluded region to generate a three-dimensional image corresponding to the two-dimensional image. In addition, the server 1005 may use the three-dimensional images to generate a three-dimensional photo album, and/or send the three-dimensional images to the terminal devices 1001, 1002, and 1003.

In this case, the image processing apparatus described below may be configured in the server 1005.
Figure 2 shows a schematic diagram of an electronic device suitable for implementing the exemplary embodiments of the present disclosure; the above-mentioned terminal devices may be configured in the form of the electronic device shown in Figure 2. It should also be noted that the electronic device shown in Figure 2 is merely an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.

The electronic device of the present disclosure includes at least a processor and a memory. The memory is configured to store one or more programs which, when executed by the processor, cause the processor to implement the image processing method of the exemplary embodiments of the present disclosure.
Specifically, as shown in Figure 2, the electronic device 200 may include: a processor 210, an internal memory 221, an external memory interface 222, a Universal Serial Bus (USB) interface 230, a charging management module 240, a power management module 241, a battery 242, an antenna 1, an antenna 2, a mobile communication module 250, a wireless communication module 260, an audio module 270, a speaker 271, a receiver 272, a microphone 273, a headphone jack 274, a sensor module 280, a display screen 290, a camera module 291, an indicator 292, a motor 293, buttons 294, a Subscriber Identification Module (SIM) card interface 295, and so on. The sensor module 280 may include a depth sensor, a pressure sensor, a gyroscope sensor, a barometric pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity light sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, and the like.
It can be understood that the structure illustrated in this embodiment of the present application does not constitute a specific limitation on the electronic device 200. In other embodiments of the present application, the electronic device 200 may include more or fewer components than shown, combine certain components, split certain components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.

The processor 210 may include one or more processing units. For example, the processor 210 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a Neural-network Processing Unit (NPU), among others. The different processing units may be independent components or may be integrated into one or more processors. In addition, the processor 210 may also be provided with a memory for storing instructions and data.
The USB interface 230 is an interface that complies with the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like. The USB interface 230 may be used to connect a charger to charge the electronic device 200, to transfer data between the electronic device 200 and peripheral devices, or to connect headphones and play audio through them. The interface may also be used to connect other electronic devices, such as AR devices.

The charging management module 240 is configured to receive charging input from a charger, which may be a wireless or a wired charger. The power management module 241 is configured to connect the battery 242, the charging management module 240, and the processor 210. The power management module 241 receives input from the battery 242 and/or the charging management module 240, and supplies power to the processor 210, the internal memory 221, the display screen 290, the camera module 291, the wireless communication module 260, and the like.
The wireless communication function of the electronic device 200 may be implemented through the antenna 1, the antenna 2, the mobile communication module 250, the wireless communication module 260, the modem processor, the baseband processor, and the like.

The mobile communication module 250 can provide wireless communication solutions applied to the electronic device 200, including 2G/3G/4G/5G.

The wireless communication module 260 can provide wireless communication solutions applied to the electronic device 200, including Wireless Local Area Networks (WLAN) (such as Wireless Fidelity (Wi-Fi) networks), Bluetooth (BT), Global Navigation Satellite System (GNSS), Frequency Modulation (FM), Near Field Communication (NFC), and Infrared (IR) technology.
The electronic device 200 implements its display function through the GPU, the display screen 290, the application processor, and the like. The GPU is a microprocessor for image processing and connects the display screen 290 with the application processor. The GPU performs mathematical and geometric calculations for graphics rendering. The processor 210 may include one or more GPUs that execute program instructions to generate or change display information.

The electronic device 200 can implement its shooting function through the ISP, the camera module 291, the video codec, the GPU, the display screen 290, the application processor, and the like. In some embodiments, the electronic device 200 may include 1 or N camera modules 291, where N is a positive integer greater than 1; if the electronic device 200 includes N cameras, one of the N cameras is the main camera.
The internal memory 221 may be used to store computer-executable program code, which includes instructions. The internal memory 221 may include a program storage area and a data storage area. The external memory interface 222 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 200.

The electronic device 200 can implement audio functions, such as music playback and recording, through the audio module 270, the speaker 271, the receiver 272, the microphone 273, the headphone jack 274, the application processor, and the like.
The audio module 270 is configured to convert digital audio information into an analog audio signal for output, and to convert analog audio input into a digital audio signal. The audio module 270 may also be used to encode and decode audio signals. In some embodiments, the audio module 270, or some of its functional modules, may be provided in the processor 210.

The speaker 271, also called a "loudspeaker", is used to convert audio electrical signals into sound signals. The electronic device 200 can play music or conduct hands-free calls through the speaker 271. The receiver 272, also called an "earpiece", is used to convert audio electrical signals into sound signals; when the electronic device 200 answers a call or a voice message, the voice can be heard by placing the receiver 272 close to the ear. The microphone 273, also called a "mic", is used to convert sound signals into electrical signals; when making a call or sending a voice message, the user can speak close to the microphone 273 to input a sound signal. The electronic device 200 may be provided with at least one microphone 273. The headphone jack 274 is used to connect wired headphones.
As for the sensors that the sensor module 280 of the electronic device 200 may include: the depth sensor is used to obtain depth information of a scene; the pressure sensor is used to sense pressure signals and can convert them into electrical signals; the gyroscope sensor may be used to determine the motion posture of the electronic device 200; the barometric pressure sensor is used to measure air pressure; the magnetic sensor includes a Hall sensor, which the electronic device 200 can use to detect the opening and closing of a flip cover; the acceleration sensor can detect the magnitude of the acceleration of the electronic device 200 in various directions (generally along three axes); the distance sensor is used to measure distance; the proximity light sensor may include, for example, a light-emitting diode (LED) and a light detector such as a photodiode; the fingerprint sensor is used to collect fingerprints; the temperature sensor is used to detect temperature; the touch sensor can pass a detected touch operation to the application processor to determine the type of touch event, with visual output related to the touch operation provided through the display screen 290; the ambient light sensor is used to sense ambient light brightness; and the bone conduction sensor can obtain vibration signals.
The buttons 294 include a power button, volume buttons, and the like, and may be mechanical or touch-sensitive. The motor 293 can generate vibration alerts, which may be used for incoming-call vibration or for touch vibration feedback. The indicator 292 may be an indicator light used to indicate charging status and battery changes, or to indicate messages, missed calls, notifications, and the like. The SIM card interface 295 is used to connect a SIM card. The electronic device 200 interacts with the network through the SIM card to implement functions such as calls and data communication.
The present application also provides a computer-readable storage medium, which may be included in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device.

The computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.

The computer-readable storage medium may send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wired, optical cable, RF, or any suitable combination of the above.

The computer-readable storage medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the methods described in the following embodiments.
附图中的流程图和框图,图示了按照本公开各种实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段、或代码的一部分,上述模块、程序段、或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个接连地表示的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图或流程图中的每个方框、以及框图或流程图中的方框的组合,可以用执行规定的功能或操作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code that contains one or more logic functions that implement the specified executable instructions. It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown one after another may actually execute substantially in parallel, or they may sometimes execute in the reverse order, depending on the functionality involved. It will also be noted that each block in the block diagram or flowchart illustration, and combinations of blocks in the block diagram or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or operations, or may be implemented by special purpose hardware-based systems that perform the specified functions or operations. Achieved by a combination of specialized hardware and computer instructions.
描述于本公开实施例中所涉及到的单元可以通过软件的方式实现,也可以通过硬件的方式来实现,所描述的单元也可以设置在处理器中。其中,这些单元的名称在某种情况下并不构成对该单元本身的限定。The units described in the embodiments of the present disclosure may be implemented in software or in hardware, and the described units may also be provided in a processor. In some cases, the names of these units do not constitute a limitation on the units themselves.
下面将以终端设备执行本公开图像处理方案为例进行说明。The following description takes a terminal device executing the image processing solution of the present disclosure as an example.
图3示意性示出了本公开的示例性实施方式的图像处理方法的流程图。参考图3,所述图像处理方法可以包括以下步骤:FIG. 3 schematically shows a flowchart of an image processing method according to an exemplary embodiment of the present disclosure. Referring to Figure 3, the image processing method may include the following steps:
S32.获取二维图像,对二维图像进行语义分割,确定二维图像的前景区域和背景区域。S32. Obtain a two-dimensional image, perform semantic segmentation on the two-dimensional image, and determine the foreground area and background area of the two-dimensional image.
在本公开的示例性实施方式中,二维图像可以是通过终端设备的摄像模组拍摄出的图像,也可以是从其他设备或服务器上获取到的图像,本公开对二维图像的格式、尺寸、来源等均不做限制。In an exemplary embodiment of the present disclosure, the two-dimensional image may be an image captured by a camera module of a terminal device, or may be an image acquired from other devices or servers. The present disclosure does not impose any restrictions on the format, size, source, etc. of the two-dimensional image.
二维图像可以存储在二维相册中,用户可以从中挑选出要进行三维转换的二维图像执行本公开方案的步骤。终端设备也可以按时间顺序、拍摄地点等对相册中的二维图像进行分类,按类别执行本公开二维图像转换三维图像的方案。The two-dimensional images can be stored in a two-dimensional photo album, and the user can select the two-dimensional images to be converted into three-dimensional images to perform the steps of the disclosed solution. The terminal device can also classify the two-dimensional images in the album in chronological order, shooting location, etc., and execute the disclosed solution of converting two-dimensional images into three-dimensional images by category.
在另一些实施例中,每当终端设备拍摄一张二维图像时,终端设备均会执行本公开方案,以得到对应的三维图像。In other embodiments, whenever the terminal device captures a two-dimensional image, the terminal device will execute the disclosed solution to obtain the corresponding three-dimensional image.
终端设备在获取到待进行三维转换的二维图像后,可以对二维图像进行语义分割。所谓语义分割,指的是在像素级别上的分类,将属于同一类的像素归为一类。After acquiring the two-dimensional image to be converted to three dimensions, the terminal device can perform semantic segmentation on the two-dimensional image. Semantic segmentation refers to classification at the pixel level, grouping pixels that belong to the same category into one class.
根据本公开一些实施例,可以采用语义分割模型来实现对二维图像的语义分割,该语义分割模型可以基于深度神经网络来实现。首先,可以利用训练数据集对语义分割模型进行训练,接下来,将二维图像输入训练后的语义分割模型,根据模型的输出,即可得到二维图像的前景区域和背景区域。例如,前景区域包含的对象可以是人、动物、汽车等与用户兴趣点对应的对象,而背景区域对应于人、动物、汽车等所处的背景,例如,草地、树木、天空等。According to some embodiments of the present disclosure, a semantic segmentation model can be used to implement semantic segmentation of two-dimensional images, and the semantic segmentation model can be implemented based on a deep neural network. First, the training data set can be used to train the semantic segmentation model. Next, the two-dimensional image is input into the trained semantic segmentation model. According to the output of the model, the foreground area and background area of the two-dimensional image can be obtained. For example, the objects contained in the foreground area may be people, animals, cars, etc. that correspond to the user's points of interest, while the background area corresponds to the background where people, animals, cars, etc. are located, such as grass, trees, sky, etc.
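The split of a segmentation result into foreground and background can be pictured with the following minimal numpy sketch. It assumes the segmentation model outputs a per-pixel class-ID map; the class IDs and the set of "foreground" classes below are illustrative assumptions, not values from the disclosure.

```python
import numpy as np

# Toy class map as a semantic segmentation model might output, one class
# ID per pixel. The IDs (0 = sky, 1 = grass, 2 = dog) are made up here.
labels = np.array([
    [0, 0, 0, 0],
    [1, 2, 2, 1],
    [1, 2, 2, 1],
    [1, 1, 1, 1],
])

# Hypothetical set of classes treated as foreground (person, animal, car, ...).
FOREGROUND_CLASSES = {2}

foreground_mask = np.isin(labels, list(FOREGROUND_CLASSES))
background_mask = ~foreground_mask

print(int(foreground_mask.sum()), int(background_mask.sum()))
```

Every pixel falls into exactly one of the two masks, matching the foreground/background partition described above.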
本公开对语义分割的实现方式不作具体限制,然而,应当注意的是,将语义分割的构思应用到二维图像转换至三维图像中的方案均属于本公开的内容。This disclosure does not place specific limitations on the implementation of semantic segmentation. However, it should be noted that any solution that applies the concept of semantic segmentation to converting two-dimensional images into three-dimensional images is within the scope of this disclosure.
图4示意性示出了根据本公开实施例的语义分割的效果图。参考图4,对二维图像40进行语义分割后,可以得到背景区域41和前景区域42。Figure 4 schematically shows an effect diagram of semantic segmentation according to an embodiment of the present disclosure. Referring to Figure 4, after performing semantic segmentation on the two-dimensional image 40, the background area 41 and the foreground area 42 can be obtained.
S34.确定背景区域的深度信息。S34. Determine the depth information of the background area.
在终端设备获取二维图像后,还可以对二维图像进行深度估计。所谓深度估计,指的是确定二维图像上各像素点的深度信息。After the terminal device acquires the two-dimensional image, it can also perform depth estimation on the two-dimensional image. The so-called depth estimation refers to determining the depth information of each pixel on the two-dimensional image.
根据本公开一些实施例,可以采用深度估计模型来实现对二维图像的深度估计,该深度估计模型也可以基于神经网络来实现。首先,可以利用大量带有像素级别深度标签的图像对深度估计模型进行训练,得到训练后的深度估计模型;接下来,可以将二维图像输入训练后的深度估计模型,根据模型的输出,可以得到二维图像深度估计的结果,即二维图像的深度信息。According to some embodiments of the present disclosure, a depth estimation model can be used to achieve depth estimation of a two-dimensional image, and the depth estimation model can also be implemented based on a neural network. First, a large number of images with pixel-level depth labels can be used to train the depth estimation model to obtain a trained depth estimation model. Next, the two-dimensional image can be input into the trained depth estimation model; from the model's output, the depth estimation result of the two-dimensional image, that is, the depth information of the two-dimensional image, can be obtained.
应当注意的是,本公开对执行深度估计的过程和步骤S32执行语义分割的过程的先后顺序不做限制。也就是说,可以先执行语义分割的过程后执行深度估计的过程,也可以先执行深度估计的过程后执行语义分割的过程,还可以同时执行语义分割和深度估计的过程。It should be noted that this disclosure does not limit the sequence of the process of performing depth estimation and the process of performing semantic segmentation in step S32. That is to say, the semantic segmentation process can be performed first and then the depth estimation process, or the depth estimation process can be performed first and then the semantic segmentation process, or the semantic segmentation and depth estimation processes can be performed simultaneously.
对二维图像进行深度估计后,可以基于深度估计的结果确定背景区域的深度信息。After depth estimation is performed on the two-dimensional image, the depth information of the background area can be determined based on the depth estimation result.
例如,在确定出二维图像的背景区域后,即可获得背景区域的坐标。接下来,利用背景区域的坐标可以从二维图像的深度信息中确定出背景区域的深度信息。For example, after the background area of the two-dimensional image is determined, the coordinates of the background area can be obtained. Next, using the coordinates of the background area, the depth information of the background area can be determined from the depth information of the two-dimensional image.
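Picking out the background's depth from the full-image depth map amounts to indexing the depth map at the background coordinates. The sketch below uses toy values for both the depth map and the mask:

```python
import numpy as np

# Toy full-image depth map (meters), e.g. from the depth estimation model.
depth = np.array([
    [5.0, 5.0, 5.0],
    [4.0, 1.2, 4.0],
    [4.0, 1.1, 4.0],
])

# Background mask from the segmentation step (True = background pixel).
background_mask = np.array([
    [True,  True,  True],
    [True,  False, True],
    [True,  False, True],
])

# Sampling the depth map at the background coordinates yields the
# background area's depth information.
ys, xs = np.nonzero(background_mask)
background_depth = depth[ys, xs]

print(background_depth.size)
```

The foreground area's depth information can be obtained the same way with the complementary mask.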
类似地,终端设备还可以确定出前景区域的深度信息。Similarly, the terminal device can also determine the depth information of the foreground area.
此外,在本公开的另一些实施例中,在对二维图像进行深度估计之前,还可以识别前景区域内是否包含目标对象。在前景区域包含目标对象的情况下,对二维图像进行深度估计;在前景区域不包含目标对象的情况下,则不对二维图像进行处理。In addition, in other embodiments of the present disclosure, before performing depth estimation on the two-dimensional image, whether the foreground area contains a target object may also be identified. If the foreground area contains the target object, depth estimation is performed on the two-dimensional image; if the foreground area does not contain the target object, the two-dimensional image is not processed.
该目标对象可以由用户提前设置,例如,在用户期望仅对二维相册中包含人物(或特定人物,如本人)的图像进行三维图像转换的情况下,用户可以设置目标对象为人物。具体的,可以在相册中配置该设置功能,用户可以通过滑动、点击、勾选等方式进行目标对象的设置。通过在相册中添加此设置功能,可以满足不同用户的需求。The target object can be set by the user in advance. For example, if the user wishes to convert only images containing people (or specific people, such as the user himself) in a two-dimensional album into three-dimensional images, the user can set the target object to be the person. Specifically, the setting function can be configured in the album, and the user can set the target object by sliding, clicking, checking, etc. By adding this setting function in the album, the needs of different users can be met.
具体的,针对识别前景区域是否包含目标对象的过程,在语义分割算法可以直接确定出分割出的区域所包含对象类型的情况下,可以直接根据语义分割的结果确定出前景区域是否包含目标对象。Specifically, for the process of identifying whether the foreground area contains the target object, if the semantic segmentation algorithm can directly determine the type of objects contained in the segmented area, it can be directly determined based on the results of semantic segmentation whether the foreground area contains the target object.
在语义分割算法不能直接确定出分割出的区域所包含对象类型的情况下,可以另外执行对前景区域的识别操作,以得到前景区域是否包含目标对象的结果。对前景区域进行图像识别的过程也可以采用神经网络的方式实现,本公开对此不做限制。When the semantic segmentation algorithm cannot directly determine the type of object contained in the segmented area, an additional identification operation on the foreground area can be performed to obtain the result of whether the foreground area contains the target object. The process of image recognition of the foreground area can also be implemented using a neural network, and this disclosure does not limit this.
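The target-object gate described above can be sketched as a simple set check. The class names and the helper function below are illustrative assumptions; the disclosure does not fix a particular API.

```python
# Minimal sketch of the target-object gate: depth estimation only runs
# when the foreground contains a class the user configured as a target.

def contains_target(foreground_classes, target_objects):
    """True if any class segmented in the foreground is a configured target."""
    return bool(set(foreground_classes) & set(target_objects))

user_targets = {"person"}  # e.g. configured via the album's settings UI
print(contains_target(["dog", "person"], user_targets))  # proceed with 3D conversion
print(contains_target(["dog"], user_targets))            # skip this image
```

When the segmentation algorithm already reports per-region class labels, `foreground_classes` comes directly from its output; otherwise it comes from the additional recognition step described above.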
上面以对二维图像进行深度估计为例,说明了确定背景区域深度信息的过程。然而,在本公开的另一些实施例中,终端设备上可以配置有深度传感器,在拍摄二维图像时,可以直接通过深度传感器获取二维图像的深度信息,进而可以直接确定出背景区域的深度信息。The above takes the depth estimation of a two-dimensional image as an example to illustrate the process of determining the depth information of the background area. However, in other embodiments of the present disclosure, a depth sensor may be configured on the terminal device, and when shooting a two-dimensional image, the depth information of the two-dimensional image may be directly obtained through the depth sensor, and the depth information of the background area may be directly determined.
S36.利用背景区域的像素信息和深度信息,确定遮挡区域的像素信息和深度信息;其中,遮挡区域的位置与前景区域在二维图像上的位置对应。S36. Use the pixel information and depth information of the background area to determine the pixel information and depth information of the occlusion area; where the position of the occlusion area corresponds to the position of the foreground area on the two-dimensional image.
在本公开的示例性实施方式中,遮挡区域指的是,前景区域遮挡背景的区域。遮挡区域的位置与前景区域在二维图像上的位置对应,也就是说,遮挡区域可以是二维图像中剔除前景区域后,二维图像中缺失的图像区域,也即所处位置是前景区域对应的位置。参考图4,遮挡区域为小狗遮挡的区域。In an exemplary embodiment of the present disclosure, the occlusion area refers to the area where the foreground area blocks the background. The position of the occlusion area corresponds to the position of the foreground area on the two-dimensional image. That is to say, the occlusion area may be the image area missing from the two-dimensional image after the foreground area is removed from it, i.e., the area located at the position corresponding to the foreground area. Referring to Figure 4, the occlusion area is the area blocked by the puppy.
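The relationship between foreground and occlusion area can be made concrete with a toy example: removing the foreground pixels leaves a hole in the background at exactly the foreground's footprint, and that hole is what must be predicted. All values here are toy data, not from the disclosure.

```python
import numpy as np

image = np.ones((3, 3, 3))                # toy H x W x RGB image
foreground_mask = np.zeros((3, 3), dtype=bool)
foreground_mask[1, 1] = True              # e.g. the puppy's footprint

background_with_hole = image.copy()
background_with_hole[foreground_mask] = np.nan  # missing pixels to predict

print(int(np.isnan(background_with_hole).sum()))
```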
在移动终端确定出背景区域的像素信息和深度信息的情况下,可以对遮挡区域的像素信息和深度信息进行预测。When the mobile terminal determines the pixel information and depth information of the background area, it can predict the pixel information and depth information of the occlusion area.
首先,可以对背景区域的像素信息和深度信息进行特征提取,生成中间信息。接下来,一方面,可以对中间信息执行像素信息预测过程,以确定遮挡区域的像素信息;另一方面,可以对中间信息执行深度信息预测过程,以确定遮挡区域的深度信息。First, feature extraction can be performed on the pixel information and depth information of the background area to generate intermediate information. Next, on the one hand, a pixel information prediction process can be performed on the intermediate information to determine the pixel information of the occlusion area; on the other hand, a depth information prediction process can be performed on the intermediate information to determine the depth information of the occlusion area.
具体的,可以通过一个卷积神经网络(Convolutional Neural Networks,CNN)来实现像素信息预测过程,并通过另一个卷积神经网络来实现深度信息预测过程。Specifically, the pixel information prediction process can be realized through a convolutional neural network (Convolutional Neural Networks, CNN), and the depth information prediction process can be realized through another convolutional neural network.
参考图5,首先,可以将背景区域的像素信息和深度信息输入第一神经网络51进行特征提取,生成中间信息。具体的,可以利用VGG16网络来配置第一神经网络51,也可以利用一个CNN网络来配置第一神经网络51,本公开对此不做限制。Referring to FIG. 5 , first, the pixel information and depth information of the background area can be input into the first neural network 51 for feature extraction to generate intermediate information. Specifically, the VGG16 network can be used to configure the first neural network 51, or a CNN network can be used to configure the first neural network 51. This disclosure does not limit this.
接下来,一方面,可以将中间信息输入第二神经网络52,该第二神经网络52可以是CNN网络,以对遮挡区域的像素信息进行预测,输出遮挡区域的像素信息。Next, on the one hand, the intermediate information can be input into the second neural network 52, which can be a CNN network, to predict the pixel information of the occluded area and output the pixel information of the occluded area.
另一方面,可以将中间信息输入第三神经网络53,该第三神经网络可以是另一CNN网络,以对遮挡区域的深度信息进行预测,输出遮挡区域的深度信息。On the other hand, the intermediate information can be input into the third neural network 53, which can be another CNN network, to predict the depth information of the occluded area and output the depth information of the occluded area.
本公开对图5所涉神经网络的网络结构及训练过程不做限制。The present disclosure does not limit the network structure and training process of the neural network involved in FIG5 .
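The shared-trunk, two-head structure of Fig. 5 can be sketched as follows. Plain linear maps stand in for the CNNs (the first, second, and third networks) purely to show the data flow; all shapes and weights are arbitrary toy values, not the disclosure's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the three networks of Fig. 5.
W_feature = rng.standard_normal((16, 8))     # first network: feature extraction
W_pixel_head = rng.standard_normal((8, 3))   # second network: pixel (RGB) head
W_depth_head = rng.standard_normal((8, 1))   # third network: depth head

def predict_occlusion(background_info):
    intermediate = background_info @ W_feature   # shared intermediate information
    pixel_pred = intermediate @ W_pixel_head     # occlusion-area pixel prediction
    depth_pred = intermediate @ W_depth_head     # occlusion-area depth prediction
    return pixel_pred, depth_pred

# One feature vector of concatenated background pixel + depth information.
x = rng.standard_normal((1, 16))
pixels, depth = predict_occlusion(x)
print(pixels.shape, depth.shape)
```

The key point the sketch preserves is that both predictions branch from the same intermediate information produced by the feature-extraction stage.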
此外,考虑到一些二维图像的前景区域与背景区域的深度差较小,没有必要耗费资源进行三维转换。因此,在确定遮挡区域的像素信息和深度信息之前,还可以包括确定前景区域与背景区域之间深度差的过程。In addition, considering that the depth difference between the foreground area and the background area of some two-dimensional images is small, there is no need to waste resources for three-dimensional conversion. Therefore, before determining the pixel information and depth information of the occlusion area, a process of determining the depth difference between the foreground area and the background area may also be included.
首先,终端设备可以确定前景区域的深度信息;接下来,基于前景区域的深度信息和背景区域的深度信息,确定前景区域与背景区域的深度差;随后,将该深度差与深度阈值进行比较。其中,可以预先设置该深度阈值,例如,将其设置为10cm、0.5m等。First, the terminal device can determine the depth information of the foreground area; next, based on the depth information of the foreground area and the depth information of the background area, determine the depth difference between the foreground area and the background area; and then, compare the depth difference with a depth threshold. Among them, the depth threshold can be set in advance, for example, set to 10cm, 0.5m, etc.
如果该深度差大于深度阈值,则执行确定遮挡区域的像素信息和深度信息的过程。如果该深度差不大于深度阈值,则停止本方案的处理过程,还可以向用户反馈“由于深度差较小,不建议进行转换”的提示。If the depth difference is greater than the depth threshold, the process of determining the pixel information and depth information of the occluded area is performed. If the depth difference is not greater than the depth threshold, the processing of this solution is stopped, and a prompt "conversion is not recommended due to the small depth difference" may be fed back to the user.
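The depth-difference check can be sketched like this. Using the mean depth of each region as the per-region statistic is an assumption made here for illustration; the disclosure specifies only that a depth difference is compared against a preset threshold.

```python
import numpy as np

DEPTH_THRESHOLD = 0.5  # meters; the disclosure gives 10 cm and 0.5 m as examples

def should_convert(foreground_depth, background_depth, threshold=DEPTH_THRESHOLD):
    """Gate the 3D conversion on the foreground/background depth gap."""
    gap = abs(float(np.mean(background_depth)) - float(np.mean(foreground_depth)))
    return gap > threshold

fg = np.array([1.0, 1.2, 1.1])   # toy foreground depths (meters)
bg = np.array([4.0, 5.0, 4.5])   # toy background depths

print(should_convert(fg, bg))
```

When the function returns false, processing stops and the user can be shown the "conversion not recommended" prompt.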
S38.结合遮挡区域的像素信息和深度信息,生成与二维图像对应的三维图像。S38. Combine the pixel information and depth information of the occluded area to generate a three-dimensional image corresponding to the two-dimensional image.
首先,可以基于深度估计的结果确定前景区域的深度信息,并获取前景区域的像素信息;接下来,可以结合遮挡区域的像素信息和深度信息以及前景区域的像素信息和深度信息,生成与二维图像对应的三维图像。First, the depth information of the foreground area can be determined based on the depth estimation result, and the pixel information of the foreground area can be obtained. Next, the pixel information and depth information of the occlusion area and the pixel information and depth information of the foreground area can be combined to generate the three-dimensional image corresponding to the two-dimensional image.
在本公开的一些实施例中,本公开所述三维图像可以是在二维平面上与二维图像尺寸相同的图像。In some embodiments of the present disclosure, the three-dimensional image described in the present disclosure may be an image having the same size as the two-dimensional image on the two-dimensional plane.
在这种情况下,生成三维图像的过程,除需要遮挡区域的像素信息和深度信息以及前景区域的像素信息和深度信息外,还需要利用背景区域的像素信息和深度信息。In this case, the process of generating a three-dimensional image requires the use of pixel information and depth information of the background area in addition to the pixel information and depth information of the occluded area and the pixel information and depth information of the foreground area.
在本公开的另一些实施例中,本公开所述三维图像可以是仅针对前景区域的三维图像。就如图4所示的二维图像而言,生成的三维图像可以是仅包括小狗而不包含背景区域的三维图像。In other embodiments of the present disclosure, the three-dimensional image described in the present disclosure may be a three-dimensional image only for the foreground area. As for the two-dimensional image as shown in Figure 4, the generated three-dimensional image may be a three-dimensional image that only includes the puppy and does not include the background area.
具体的,可以利用遮挡区域的像素信息和深度信息以及前景区域的像素信息和深度信息,生成前景区域对应对象的三维图像,作为二维图像对应的三维图像。Specifically, the pixel information and depth information of the occlusion area and the pixel information and depth information of the foreground area can be used to generate a three-dimensional image of the object corresponding to the foreground area as a three-dimensional image corresponding to the two-dimensional image.
应当理解的是,生成三维图像的过程包括三维渲染的过程。另外,由于是三维图像,因此,根据观看视角的不同,可以映射出图像中各对象(物体)之间的遮挡关系,并根据该遮挡关系得到不同视角下的观看效果。在此基础上,可以生成三维动画,以便用户观看到不同角度下的三维图像。It should be understood that the process of generating a three-dimensional image includes a process of three-dimensional rendering. In addition, since the image is three-dimensional, the occlusion relationships between the objects in the image can be mapped according to the viewing angle, and the viewing effects at different viewing angles can be obtained based on these occlusion relationships. On this basis, a three-dimensional animation can be generated so that the user can view the three-dimensional image from different angles.
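Rendering per-pixel color plus depth from new viewpoints typically starts by lifting each pixel to a 3D point. The pinhole back-projection below is a generic sketch of that step, not the disclosure's specific rendering pipeline; the intrinsics `fx`, `fy`, `cx`, `cy` are illustrative defaults.

```python
import numpy as np

def backproject(depth, fx=500.0, fy=500.0, cx=None, cy=None):
    """Lift a depth map into 3D points with a pinhole camera model.

    A real implementation would take fx, fy, cx, cy from the camera
    module's calibration rather than these placeholder defaults.
    """
    h, w = depth.shape
    cx = (w - 1) / 2.0 if cx is None else cx
    cy = (h - 1) / 2.0 if cy is None else cy
    ys, xs = np.mgrid[0:h, 0:w]
    z = depth
    x = (xs - cx) * z / fx
    y = (ys - cy) * z / fy
    return np.stack([x, y, z], axis=-1)  # one (x, y, z) point per pixel

points = backproject(np.full((4, 4), 2.0))  # flat surface 2 m from the camera
print(points.shape)
```

Given such a point cloud for the foreground (and, where needed, the filled-in occlusion area and background), a renderer can resolve which points occlude which at each viewing angle.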
下面将参考图6对本公开实施例的整个图像处理过程进行说明。The entire image processing process of the embodiment of the present disclosure will be described below with reference to FIG. 6 .
在步骤S602中,终端设备可以获取二维图像;在步骤S604中,终端设备可以对二维图像进行语义分割;在步骤S606中,终端设备可以对二维图像进行深度估计。In step S602, the terminal device can acquire a two-dimensional image; in step S604, the terminal device can perform semantic segmentation on the two-dimensional image; in step S606, the terminal device can perform depth estimation on the two-dimensional image.
基于步骤S604语义分割的结果,在步骤S608中,可以确定出前景区域,在步骤S610中,可以确定出背景区域。基于步骤S606深度估计的结果,在步骤S612中,可以确定出二维图像上各像素的深度值(即深度信息)。Based on the result of semantic segmentation in step S604, the foreground area can be determined in step S608, and the background area can be determined in step S610. Based on the result of depth estimation in step S606, the depth value (i.e., depth information) of each pixel on the two-dimensional image can be determined in step S612.
在步骤S614中,可以根据背景区域的像素信息和背景区域的深度信息,对遮挡部分进行像素估计和深度估计。In step S614, pixel estimation and depth estimation may be performed on the occluded portion according to the pixel information of the background area and the depth information of the background area.
针对基于神经网络的像素估计过程,在步骤S616中,可以确定出遮挡部分的像素信息。For the pixel estimation process based on the neural network, in step S616, the pixel information of the occluded part can be determined.
针对基于另一神经网络的深度估计过程,在步骤S618中,可以确定出遮挡部分的深度信息。For the depth estimation process based on another neural network, in step S618, the depth information of the occluded part may be determined.
在步骤S620中,结合遮挡部分的深度信息和前景区域的信息,进行三维渲染。In step S620, three-dimensional rendering is performed by combining the depth information of the occluded part and the information of the foreground area.
在步骤S622中,终端设备可以将渲染得到三维图像输出。另外,可以生成三维动画进行展示,还可以基于三维图像生成三维相册,具体的,可以将该三维相册配置为云相册,以节省终端设备的存储空间。In step S622, the terminal device may output the rendered three-dimensional image. In addition, a three-dimensional animation can be generated for display, and a three-dimensional photo album can be generated based on the three-dimensional image. Specifically, the three-dimensional photo album can be configured as a cloud photo album to save storage space on the terminal device.
应当注意,尽管在附图中以特定顺序描述了本公开中方法的各个步骤,但是,这并非要求或者暗示必须按照该特定顺序来执行这些步骤,或是必须执行全部所示的步骤才能实现期望的结果。附加的或备选的,可以省略某些步骤,将多个步骤合并为一个步骤执行,以及/或者将一个步骤分解为多个步骤执行等。It should be noted that although the various steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these steps must be performed in that particular order, or that all of the illustrated steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step for execution, and/or one step may be decomposed into multiple steps for execution, etc.
进一步的,本示例实施方式中还提供了一种图像处理装置。Furthermore, this exemplary embodiment also provides an image processing device.
图7示意性示出了本公开的示例性实施方式的图像处理装置的方框图。参考图7,根据本公开的示例性实施方式的图像处理装置7可以包括语义分割模块71、深度确定模块73、遮挡信息确定模块75和三维图像生成模块77。FIG. 7 schematically shows a block diagram of an image processing apparatus according to an exemplary embodiment of the present disclosure. Referring to FIG. 7 , the image processing device 7 according to an exemplary embodiment of the present disclosure may include a semantic segmentation module 71 , a depth determination module 73 , an occlusion information determination module 75 , and a three-dimensional image generation module 77 .
具体的,语义分割模块71可以用于获取二维图像,对二维图像进行语义分割,确定二维图像的前景区域和背景区域;深度确定模块73可以用于确定背景区域的深度信息;遮挡信息确定模块75可以用于利用背景区域的像素信息和深度信息,确定遮挡区域的像素信息和深度信息;其中,遮挡区域的位置与前景区域在二维图像上的位置对应;三维图像生成模块77可以用于结合遮挡区域的像素信息和深度信息,生成与二维图像对应的三维图像。Specifically, the semantic segmentation module 71 can be used to acquire a two-dimensional image, perform semantic segmentation on the two-dimensional image, and determine the foreground area and background area of the two-dimensional image; the depth determination module 73 can be used to determine the depth information of the background area; the occlusion information determination module 75 can be used to determine the pixel information and depth information of the occlusion area using the pixel information and depth information of the background area, wherein the position of the occlusion area corresponds to the position of the foreground area on the two-dimensional image; and the three-dimensional image generation module 77 can be used to generate a three-dimensional image corresponding to the two-dimensional image in combination with the pixel information and depth information of the occlusion area.
基于本公开示例性实施方式的图像处理装置,一方面,本公开可以将二维图像转换为三维图像,提高了图像展示的立体感,视觉效果得到了提升;另一方面,针对寓教于乐的场景,本公开可以充分展示图像的信息,使用户更易了解图像的内容;再一方面,可以将本方案应用于增强现实技术或虚拟现实技术中,构建不同类型的应用场景,以提高用户的感知程度和参与度。Based on the image processing device of the exemplary embodiment of the present disclosure, on the one hand, the present disclosure can convert a two-dimensional image into a three-dimensional image, thereby improving the stereoscopic sense of the image display and enhancing the visual effect; on the other hand, for entertaining and educational scenarios, the present disclosure can fully display the information of the image, making it easier for users to understand the content of the image; on another hand, the present solution can be applied to augmented reality technology or virtual reality technology to construct different types of application scenarios to improve the user's perception and participation.
根据本公开的示例性实施例,遮挡信息确定模块75可以被配置为执行:对背景区域的像素信息和深度信息进行特征提取,生成中间信息;对中间信息执行像素信息预测过程,以确定遮挡区域的像素信息;对中间信息执行深度信息预测过程,以确定遮挡区域的深度信息。According to an exemplary embodiment of the present disclosure, the occlusion information determination module 75 may be configured to: perform feature extraction on pixel information and depth information of the background area to generate intermediate information; perform a pixel information prediction process on the intermediate information to determine the occlusion area pixel information; perform a depth information prediction process on the intermediate information to determine the depth information of the occlusion area.
根据本公开的示例性实施例,参考图8,相比于图像处理装置7,图像处理装置8还可以包括深度差比较模块81。According to an exemplary embodiment of the present disclosure, referring to FIG. 8 , compared with the image processing device 7 , the image processing device 8 may further include a depth difference comparison module 81 .
具体的,深度差比较模块81可以被配置为执行:确定前景区域的深度信息;基于前景区域的深度信息和背景区域的深度信息,确定前景区域与背景区域的深度差;将深度差与深度阈值进行比较;其中,如果深度差大于深度阈值,则控制遮挡信息确定模块75执行确定遮挡区域的像素信息和深度信息的过程。Specifically, the depth difference comparison module 81 can be configured to perform: determining the depth information of the foreground area; determining the depth difference between the foreground area and the background area based on the depth information of the foreground area and the depth information of the background area; comparing the depth difference with the depth threshold; wherein, if the depth difference is greater than the depth threshold, controlling the occlusion information determination module 75 to execute the process of determining the pixel information and depth information of the occlusion area.
根据本公开的示例性实施例,三维图像生成模块77可以被配置为执行:获取前景区域的像素信息和深度信息;结合遮挡区域的像素信息和深度信息以及前景区域的像素信息和深度信息,生成与二维图像对应的三维图像。According to an exemplary embodiment of the present disclosure, the three-dimensional image generation module 77 may be configured to: obtain the pixel information and depth information of the foreground area; and combine the pixel information and depth information of the occlusion area with the pixel information and depth information of the foreground area to generate a three-dimensional image corresponding to the two-dimensional image.
根据本公开的示例性实施例,三维图像生成模块77生成三维图像的过程可以被配置为执行:利用遮挡区域的像素信息和深度信息以及前景区域的像素信息和深度信息,生成前景区域对应对象的三维图像,作为与二维图像对应的三维图像。According to an exemplary embodiment of the present disclosure, the process of generating a three-dimensional image by the three-dimensional image generation module 77 may be configured to: use the pixel information and depth information of the occlusion area and the pixel information and depth information of the foreground area to generate a three-dimensional image of the object corresponding to the foreground area as the three-dimensional image corresponding to the two-dimensional image.
根据本公开的示例性实施例,三维图像生成模块77生成三维图像的过程还可以被配置为执行:利用遮挡区域的像素信息和深度信息、前景区域的像素信息和深度信息以及背景区域的像素信息和深度信息,生成与二维图像对应的三维图像。According to an exemplary embodiment of the present disclosure, the process of generating a three-dimensional image by the three-dimensional image generation module 77 may also be configured to: use the pixel information and depth information of the occlusion area, the pixel information and depth information of the foreground area, and the pixel information and depth information of the background area to generate a three-dimensional image corresponding to the two-dimensional image.
根据本公开的示例性实施例,深度确定模块73可以被配置为执行:对二维图像进行深度估计,并基于深度估计的结果确定背景区域的深度信息。According to an exemplary embodiment of the present disclosure, the depth determination module 73 may be configured to perform depth estimation on the two-dimensional image and determine depth information of the background area based on a result of the depth estimation.
根据本公开的示例性实施例,参考图9,相比于图像处理装置7,图像处理装置9还可以包括对象识别模块91。According to an exemplary embodiment of the present disclosure, referring to FIG. 9 , compared with the image processing device 7 , the image processing device 9 may further include an object recognition module 91 .
具体的,对象识别模块91可以被配置为执行:识别前景区域内是否包含目标对象;其中,如果前景区域包含目标对象,则控制深度确定模块73执行对二维图像进行深度估计的过程。Specifically, the object recognition module 91 may be configured to: identify whether the foreground area contains a target object; wherein, if the foreground area contains the target object, control the depth determination module 73 to perform a depth estimation process on the two-dimensional image.
应当理解的是,对象识别模块91还可以配置于上述图像处理装置8中。类似地,图像处理装置8中包括的深度差比较模块81还可以配置于图像处理装置9中。It should be understood that the object recognition module 91 may also be configured in the above-mentioned image processing device 8 . Similarly, the depth difference comparison module 81 included in the image processing device 8 may also be configured in the image processing device 9 .
由于本公开实施方式的图像处理装置的各个功能模块与上述方法实施方式中相同,因此在此不再赘述。Since each functional module of the image processing device in the embodiment of the present disclosure is the same as that in the above method embodiment, details will not be described again here.
通过以上的实施方式的描述,本领域的技术人员易于理解,这里描述的示例实施方式可以通过软件实现,也可以通过软件结合必要的硬件的方式来实现。因此,根据本公开实施方式的技术方案可以以软件产品的形式体现出来,该软件产品可以存储在一个非易失性存储介质(可以是CD-ROM,U盘,移动硬盘等)中或网络上,包括若干指令以使得一台计算设备(可以是个人计算机、服务器、终端装置、或者网络设备等)执行根据本公开实施方式的方法。Through the description of the above implementation, it is easy for those skilled in the art to understand that the example implementation described here can be implemented by software, or by software combined with necessary hardware. Therefore, the technical solution according to the implementation of the present disclosure can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB flash drive, a mobile hard disk, etc.) or on a network, including several instructions to enable a computing device (which can be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the implementation of the present disclosure.
此外,上述附图仅是根据本公开示例性实施例的方法所包括的处理的示意性说明,而不是限制目的。易于理解,上述附图所示的处理并不表明或限制这些处理的时间顺序。另外,也易于理解,这些处理可以是例如在多个模块中同步或异步执行的。In addition, the above-mentioned drawings are only schematic illustrations of processes included in the methods according to the exemplary embodiments of the present disclosure, and are not intended to be limiting. It is readily understood that the processes shown in the above figures do not indicate or limit the temporal sequence of these processes. In addition, it is also easy to understand that these processes may be executed synchronously or asynchronously in multiple modules, for example.
应当注意,尽管在上文详细描述中提及了用于动作执行的设备的若干模块或者单元,但是这种划分并非强制性的。实际上,根据本公开的实施方式,上文描述的两个或更多模块或者单元的特征和功能可以在一个模块或者单元中具体化。反之,上文描述的一个模块或者单元的特征和功能可以进一步划分为由多个模块或者单元来具体化。It should be noted that although several modules or units of equipment for action execution are mentioned in the above detailed description, this division is not mandatory. In fact, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided into being embodied by multiple modules or units.
本领域技术人员在考虑说明书及实践这里公开的内容后,将容易想到本公开的其他实施例。本申请旨在涵盖本公开的任何变型、用途或者适应性变化,这些变型、用途或者适应性变化遵循本公开的一般性原理并包括本公开未公开的本技术领域中的公知常识或惯用技术手段。说明书和实施例仅被视为示例性的,本公开的真正范围和精神由权利要求指出。Other embodiments of the present disclosure will be readily apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be considered exemplary only, with the true scope and spirit of the present disclosure being indicated by the claims.
应当理解的是,本公开并不局限于上面已经描述并在附图中示出的精确结构,并且可以在不脱离其范围的情况下进行各种修改和改变。本公开的范围仅由所附的权利要求来限定。It is to be understood that the present disclosure is not limited to the precise structures described above and illustrated in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010796552.3A | 2020-08-10 | 2020-08-10 | Image processing method and device, computer-readable storage medium and electronic equipment |
| Publication Number | Publication Date |
|---|---|
| CN111815666A (en) | 2020-10-23 |
| CN111815666B (en) | 2024-04-02 |
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202010796552.3A (Active, CN111815666B) | Image processing method and device, computer-readable storage medium and electronic equipment | 2020-08-10 | 2020-08-10 |
| Country | Link |
|---|---|
| CN (1) | CN111815666B (en) |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112272295B (en) | 2020-10-26 | 2022-06-10 | 腾讯科技(深圳)有限公司 | Method for generating video with three-dimensional effect, method for playing video, device and equipment |
| CN112785492A (en)* | 2021-01-20 | 2021-05-11 | 北京百度网讯科技有限公司 | Image processing method, image processing device, electronic equipment and storage medium |
| CN112950641B (en)* | 2021-02-24 | 2024-06-25 | Oppo广东移动通信有限公司 | Image processing method and device, computer readable storage medium and electronic equipment |
| CN113570702A (en)* | 2021-07-14 | 2021-10-29 | Oppo广东移动通信有限公司 | 3D photo generation method and device, terminal and readable storage medium |
| CN113837978B (en)* | 2021-09-28 | 2024-04-05 | 北京奇艺世纪科技有限公司 | Image synthesis method, device, terminal equipment and readable storage medium |
| US11810256B2 (en)* | 2021-11-11 | 2023-11-07 | Qualcomm Incorporated | Image modification techniques |
| CN114677426B (en)* | 2022-04-02 | 2025-07-04 | 北京达佳互联信息技术有限公司 | Image processing method, device, electronic device and storage medium |
| CN116109802A (en)* | 2023-02-15 | 2023-05-12 | 北京字跳网络技术有限公司 | Image processing method, device, terminal and storage medium |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102360489A (en)* | 2011-09-26 | 2012-02-22 | 盛乐信息技术(上海)有限公司 | Method and device for realizing conversion from two-dimensional image to three-dimensional image |
| WO2020059575A1 (en)* | 2018-09-21 | 2020-03-26 | 富士フイルム株式会社 | Three-dimensional image generation device, three-dimensional image generation method, and program |
| CN111447428A (en)* | 2020-03-12 | 2020-07-24 | 黄胜海 | Method and device for converting plane image into three-dimensional image, computer readable storage medium and equipment |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20150379720A1 (en)* | 2013-01-31 | 2015-12-31 | Threevolution Llc | Methods for converting two-dimensional images into three-dimensional images |
| Title |
|---|
| Wang Yuan. Research on binocular vision three-dimensional reconstruction technology for intelligent video surveillance systems. China Master's Theses Full-text Database, Information Science and Technology, No. 11, 2018-11-15, pp. I136-413.* |
| Publication | Title |
|---|---|
| CN111815666B (en) | Image processing method and device, computer-readable storage medium and electronic equipment |
| CN111325842B (en) | Map construction method, repositioning method and device, storage medium and electronic equipment |
| US20220392104A1 (en) | Augmented reality processing method, storage medium, and electronic device |
| CN110139028B (en) | Image processing method and head-mounted display device |
| CN112270754B (en) | Local grid map construction method and device, readable medium and electronic equipment |
| CN111179435B (en) | Augmented reality processing method, device, system, storage medium and electronic equipment |
| CN111935486B (en) | Image processing method and device, computer-readable storage medium and electronic device |
| CN111311758A (en) | Augmented reality processing method and device, storage medium and electronic equipment |
| CN111243105B (en) | Augmented reality processing method and device, storage medium and electronic equipment |
| US11989400B2 (en) | Data sharing method and device |
| CN111476911A (en) | Virtual image implementation method and device, storage medium and terminal equipment |
| CN113743517B (en) | Model training method, image depth prediction method, device, equipment and medium |
| CN111641829B (en) | Video processing method, device and system, storage medium and electronic equipment |
| JP2019220848A (en) | Data processing apparatus, data processing method and program |
| CN110706339B (en) | Three-dimensional face reconstruction method and device, electronic equipment and storage medium |
| CN111766606A (en) | Image processing method, apparatus and device for TOF depth images, and storage medium |
| CN112565598B (en) | Focusing method and apparatus, terminal, computer-readable storage medium and electronic device |
| CN112581358A (en) | Training method of image processing model, image processing method and device |
| US20250056179A1 (en) | Audio playing method and related apparatus |
| CN111462335B (en) | Device control method and apparatus based on virtual object interaction, medium and device |
| WO2021129444A1 (en) | File clustering method and apparatus, storage medium and electronic device |
| CN112672076A (en) | Image display method and electronic equipment |
| CN114040108B (en) | Auxiliary shooting method, device, terminal and computer-readable storage medium |
| KR102844638B1 (en) | Facial expression editing method and electronic device |
| CN111310701B (en) | Gesture recognition method, device, equipment and storage medium |
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |