CN113850709B - Image transformation method and device

Image transformation method and device

Info

Publication number
CN113850709B
Authority
CN
China
Prior art keywords
image
face
target person
target
distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010600182.1A
Other languages
Chinese (zh)
Other versions
CN113850709A (en)
Inventor
杨晨
王小娥
苏忱
肖朝蕾
田晶铎
郑士胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202010600182.1A
Publication of CN113850709A
Application granted
Publication of CN113850709B
Status: Active
Anticipated expiration

Abstract

(Translated from Chinese)

The present application provides an image transformation method and device. The image transformation method of the present application includes: acquiring a first image of a target scene through a front camera, the target scene including a face of a target person; acquiring a target distance between the face of the target person and the front camera; and, when the target distance is less than a preset threshold, performing first processing on the first image to obtain a second image, the first processing including distortion correction of the first image according to the target distance; wherein the face of the target person in the second image is closer to the real appearance of the face of the target person than the face of the target person in the first image. The present application can improve the imaging quality in a selfie scenario.

Description

Image transformation method and device
Technical Field
The present application relates to image processing technology, and in particular, to an image transformation method and apparatus.
Background
Photography has become an important medium for recording daily life, and in recent years the front camera of the mobile phone has become increasingly popular. Because the distance from the camera to the face is short during self-photographing, the "near objects look large, far objects look small" perspective distortion of the face becomes increasingly noticeable: when a person is photographed at close range, the different distances from different parts of the face to the camera typically make the nose appear too large and the face appear elongated, which degrades the subjective quality of the portrait. It is therefore necessary to eliminate this perspective distortion and improve the aesthetics of the portrait through image transformation processing, for example transformation of the distance, pose, and position of the target in the image, while ensuring that the realism of the portrait is not affected.
Camera imaging produces a two-dimensional image of a three-dimensional object. Common image processing algorithms operate on this two-dimensional image, but a transformation computed purely in two dimensions can hardly reproduce the effect of a genuine transformation of the three-dimensional object.
Disclosure of Invention
The application provides an image transformation method and device for improving the imaging quality in a selfie scenario.
The application provides an image transformation method, including: acquiring a first image of a target scene through a front camera, the target scene including a face of a target person; acquiring a target distance between the face of the target person and the front camera; and, when the target distance is smaller than a preset threshold, performing first processing on the first image to obtain a second image, the first processing including distortion correction of the first image according to the target distance; wherein the face of the target person in the second image is closer to the real appearance of the face of the target person than the face of the target person in the first image.
The first image is acquired by the front camera of the terminal in a scenario where the user takes a selfie. Two cases are possible: before the user triggers the shutter, the first image is only a raw picture captured by the camera that has not yet been formed as an image on the image sensor; after the user triggers the shutter, the first image is the original image formed on the image sensor. Correspondingly, the second image also covers two cases: in the former case it is the corrected image obtained by applying distortion correction to the raw picture captured by the camera, which is not formed on the image sensor and serves only as a preview image shown to the user; in the latter case it is the corrected image obtained by applying distortion correction to the original image formed on the image sensor, and the terminal may store this corrected image in the picture library.
According to the application, a three-dimensional model is used to achieve a true three-dimensional transformation effect in the image: perspective distortion correction is applied to the face of the target person in an image captured at close range, so that the relative proportion and relative position of the corrected facial features are closer to the target person's real facial features, which can significantly improve the imaging quality in a selfie scenario.
In one possible implementation, the target distance includes a distance between a front-most portion on a face of the target person and the front-facing camera, or a distance between a designated portion on the face of the target person and the front-facing camera, or a distance between a center position on the face of the target person and the front-facing camera.
The target distance between the face of the target person and the camera may be the distance between the foremost position on the face (for example, the nose) and the camera; or the distance between a designated part of the face (for example, the eyes, mouth, or nose) and the camera; or the distance between the center position of the face (for example, the nose for a frontal face, or the cheekbone for a profile) and the camera. Which definition of the target distance is used may be determined according to the specific situation of the first image, and is not particularly limited by the present application.
In one possible implementation manner, the obtaining the target distance between the face of the target person and the front camera includes obtaining a screen ratio of the face of the target person in the first image, and obtaining the target distance according to the screen ratio and a field angle FOV of the front camera.
In one possible implementation, the acquiring the target distance between the face of the target person and the front-facing camera includes acquiring the target distance by a distance sensor including a time-of-flight ranging TOF sensor, a structured light sensor, or a binocular sensor.
The application may obtain the target distance by calculating the screen ratio of the face: first the screen ratio of the target person's face in the first image (the ratio of the pixel area of the face to the pixel area of the first image) is obtained, and the target distance is then derived from this screen ratio and the FOV of the front camera. The target distance may also be measured by a distance sensor, or obtained in other manners, which the present application does not particularly limit.
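As a rough, non-authoritative illustration of the screen-ratio idea (a simplified sketch, not the patented implementation), the following Python snippet estimates the target distance from the fraction of the image width that the face occupies and the horizontal FOV, assuming a pinhole camera model and a nominal real face width of about 16 cm; the patent describes an area ratio, so this width-based variant is an assumption made for brevity.

```python
import math

def estimate_face_distance(face_width_px: float,
                           image_width_px: float,
                           horizontal_fov_deg: float,
                           real_face_width_m: float = 0.16) -> float:
    """Rough pinhole-model estimate of the face-to-camera distance.

    Assumes the face spans `face_width_px` of an image that is
    `image_width_px` wide, captured with the given horizontal FOV.
    `real_face_width_m` is a nominal average face width (an assumption).
    """
    # Focal length expressed in pixels under the pinhole model.
    focal_px = (image_width_px / 2.0) / math.tan(math.radians(horizontal_fov_deg) / 2.0)
    # Perspective projection: face_width_px ~= focal_px * real_width / distance.
    return focal_px * real_face_width_m / face_width_px

# Example: a face spanning 60% of a 3000-pixel-wide selfie taken with an 80-degree FOV
# comes out at roughly 0.16 m, i.e. well inside the distortion range discussed below.
print(round(estimate_face_distance(0.6 * 3000, 3000, 80.0), 2), "m")
```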
In one possible implementation, the preset threshold is less than 80 cm.
In one possible implementation, the preset threshold is 50 cm.
The application sets a threshold: when the target distance between the face of the target person and the front camera is smaller than this threshold, the acquired first image containing the face is considered distorted and needs distortion correction. The preset threshold is within 80 cm, and may optionally be set to 50 cm. Note that the specific value of the preset threshold may depend on the performance of the front camera, the shooting illumination, and so on, which the present application does not particularly limit.
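A minimal sketch of this decision flow, reusing the hypothetical estimate_face_distance helper from the previous example; correct_fn stands in for the first processing (perspective distortion correction), whose implementation is not shown, and the 0.50 m value is the optional threshold mentioned above.

```python
PRESET_THRESHOLD_M = 0.50  # optional value from the text; the threshold is stated to stay within 0.80 m

def process_selfie_frame(first_image, face_width_px, image_width_px, fov_deg, correct_fn):
    """Return the second image: corrected when the face is too close, otherwise unchanged.

    `correct_fn(image, target_distance)` is a placeholder for the distortion
    correction step described in the text; it is not implemented here.
    """
    target_distance = estimate_face_distance(face_width_px, image_width_px, fov_deg)
    if target_distance < PRESET_THRESHOLD_M:
        return correct_fn(first_image, target_distance)
    return first_image
```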
In one possible implementation, the second image includes a preview image or an image obtained after triggering a shutter.
The second image may be a preview image obtained through the front camera: before the shutter is triggered, the front camera acquires a first image of the target scene, the terminal applies perspective distortion correction to it using the above method to obtain the second image, and displays the second image on the screen; the second image the user sees is then a perspective-corrected preview image. The second image may also be an image obtained after the shutter is triggered: the first image is the image formed on the terminal's image sensor, the terminal applies perspective distortion correction to it using the above method to obtain the second image, and stores and displays it on the screen; the second image the user sees is then a perspective-corrected image saved in the picture library.
In one possible implementation, the face of the target person in the second image being closer to the real appearance of the face of the target person than the face in the first image includes: the relative proportion of the target person's facial features in the second image being closer to the relative proportion of the facial features of the real face than the relative proportion of the facial features in the first image, and/or the relative position of the target person's facial features in the second image being closer to the relative position of the facial features of the real face than the relative position of the facial features in the first image.
When the first image is acquired with the target distance between the face and the front camera below the preset threshold, the "near objects look large, far objects look small" perspective distortion of the front camera is likely to change the size of, or stretch, the facial features in the first image, so that their relative proportion and relative position deviate from those of the target person's real face. The second image obtained by perspective distortion correction of the first image removes such size changes and stretching, so that the relative proportion and relative position of the facial features in the second image approach, or even restore, those of the real face.
In one possible implementation manner, the correcting the distortion of the first image according to the target distance includes fitting the face of the target person in the first image with a standard face model according to the target distance to obtain depth information of the face of the target person, and correcting the perspective distortion of the first image according to the depth information to obtain the second image.
In one possible implementation manner, the perspective distortion correction is performed on the first image according to the depth information to obtain the second image, and the perspective distortion correction comprises the steps of establishing a first three-dimensional model of the face of the target person, transforming the pose and/or the shape of the first three-dimensional model to obtain a second three-dimensional model of the face of the target person, obtaining a pixel displacement vector field of the face of the target person according to the depth information, the first three-dimensional model and the second three-dimensional model, and obtaining the second image according to the pixel displacement vector field of the face of the target person.
According to the application, a three-dimensional model is built from the face of the target person, and the three-dimensional transformation effect is achieved by means of this model: the pixel displacement vector field of the face is obtained from the correspondence of sampling points between the model before transformation and the model after transformation, and the transformed two-dimensional image is then derived from it. Perspective distortion correction can thus be applied to a face captured at close range so that the relative proportion and relative position of the corrected facial features are closer to those of the real face, significantly improving the imaging quality in a selfie scenario.
In one possible implementation manner, obtaining the pixel displacement vector field of the face of the target person according to the depth information, the first three-dimensional model and the second three-dimensional model includes: performing perspective projection on the first three-dimensional model according to the depth information to obtain a first coordinate set, the first coordinate set including coordinate values corresponding to a plurality of pixels in the first three-dimensional model; performing perspective projection on the second three-dimensional model according to the depth information to obtain a second coordinate set, the second coordinate set including coordinate values corresponding to a plurality of pixels in the second three-dimensional model; and calculating coordinate differences between first coordinate values and second coordinate values to obtain the pixel displacement vector field of the target object, where the first coordinate values are the coordinate values corresponding to a first pixel in the first coordinate set, the second coordinate values are the coordinate values corresponding to the first pixel in the second coordinate set, and the first pixel is any one of the same pixels contained in both the first three-dimensional model and the second three-dimensional model.
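The projection-and-difference step can be illustrated with a small numpy sketch, under the simplifying assumption that the first and second three-dimensional models are given as matched arrays of vertex coordinates in the camera frame (the association of model sampling points with image pixels described in the text is omitted here).

```python
import numpy as np

def perspective_project(points_3d: np.ndarray, focal_px: float,
                        cx: float, cy: float) -> np.ndarray:
    """Project Nx3 camera-frame points to Nx2 pixel coordinates (pinhole model)."""
    x, y, z = points_3d[:, 0], points_3d[:, 1], points_3d[:, 2]
    u = focal_px * x / z + cx
    v = focal_px * y / z + cy
    return np.stack([u, v], axis=1)

def pixel_displacement_field(model_before: np.ndarray, model_after: np.ndarray,
                             focal_px: float, cx: float, cy: float) -> np.ndarray:
    """Displacement vectors between corresponding points of the two face models.

    `model_before` and `model_after` must contain the same points in the same
    order (the first and second three-dimensional models of the face).
    """
    coords_before = perspective_project(model_before, focal_px, cx, cy)  # first coordinate set
    coords_after = perspective_project(model_after, focal_px, cx, cy)    # second coordinate set
    return coords_after - coords_before  # per-point pixel displacement vectors
```

In a full pipeline this sparse displacement field would then be densified and used to warp the first image into the second image; that warping step is not shown here.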
The application further provides an image transformation method, including: acquiring a first image, the first image including a face of a target person, the face of the target person in the first image being distorted; displaying a distortion correction function menu; acquiring transformation parameters input by a user on the distortion correction function menu, the transformation parameters at least including an equivalent simulated shooting distance used to simulate the distance between the face of the target person and the camera when the capturing terminal photographed the face; and performing first processing on the first image to obtain a second image, the first processing including distortion correction of the first image according to the transformation parameters; wherein the face of the target person in the second image is closer to the real appearance of the face of the target person than the face of the target person in the first image.
According to the application, perspective distortion correction is applied to the first image according to the transformation parameters to obtain the second image, and the face of the target person in the second image is closer to the real appearance of the target person's face than the face in the first image; that is, the relative proportion and relative position of the target person's facial features in the second image are closer to those of the real face than the relative proportion and relative position of the facial features in the first image.
In one possible implementation manner, the distortion of the face of the target person in the first image is caused by the fact that, when the second terminal captured the first image, the target distance between the face of the target person and the second terminal was smaller than a first preset threshold, wherein the target distance includes the distance between the foremost position on the face of the target person and the front camera, or the distance between a designated part of the face of the target person and the front camera, or the distance between the center position of the face of the target person and the front camera.
In one possible implementation, the target distance is obtained by a screen ratio of a face of the target person in the first image and a FOV of a camera of the second terminal, or by an equivalent focal length in exchangeable image file format EXIF information of the first image.
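As a hedged illustration of the EXIF route, the horizontal field of view implied by a 35 mm-equivalent focal length can be computed as follows (assuming the equivalent focal length has already been read from the image's EXIF data, and taking the standard 36 mm full-frame sensor width); this is a sketch of the underlying geometry, not the patent's exact procedure.

```python
import math

def fov_from_equivalent_focal_length(f_equiv_mm: float) -> float:
    """Horizontal field of view (degrees) implied by a 35 mm-equivalent focal length."""
    full_frame_width_mm = 36.0
    return math.degrees(2.0 * math.atan(full_frame_width_mm / (2.0 * f_equiv_mm)))

# Example: a typical ~26 mm-equivalent phone front camera gives roughly a 69-70 degree FOV.
print(round(fov_from_equivalent_focal_length(26.0), 1))
```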
In one possible implementation manner, the distortion correction function menu comprises an option for adjusting the equivalent simulated shooting distance, and the obtaining of the transformation parameters input by the user on the distortion correction function menu comprises obtaining the equivalent simulated shooting distance according to an instruction triggered by the operation of a control or a sliding block in the option for adjusting the equivalent simulated shooting distance by the user.
In one possible implementation manner, the value of the equivalent simulated shooting distance in the option of adjusting the equivalent simulated shooting distance includes a default value or a pre-calculated value when the distortion correction function menu is initially displayed.
In a possible implementation manner, before the distortion correction function menu is displayed, a popup window is displayed when the face of the target person is distorted, the popup window providing a selection control for whether to perform distortion correction; when the user clicks the control for performing distortion correction on the popup window, the terminal responds to the instruction generated by the user operation.
In a possible implementation manner, before the distortion correction function menu is displayed, the method further includes displaying a distortion correction control when the face of the target person is distorted, the distortion correction control being used to open the distortion correction function menu; when the user clicks the distortion correction control, the terminal responds to the instruction generated by the user operation.
In one possible implementation, the distortion of the face of the target person in the first image is caused by the fact that when the second terminal captures the first image, the field angle FOV of the camera is greater than a second preset threshold, and the pixel distance between the face of the target person and the edge of the FOV is less than a third preset threshold, wherein the pixel distance comprises the number of pixels between the foremost end position on the face of the target person and the edge of the FOV, or the number of pixels between a designated part on the face of the target person and the edge of the FOV, or the number of pixels between the center position on the face of the target person and the edge of the FOV.
In one possible implementation, the FOV is derived from EXIF information of the first image.
In one possible implementation, the second preset threshold is 90 °, and the third preset threshold is one fourth of the length or width of the first image.
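A small sketch of this wide-angle edge check, using the example thresholds stated above (FOV greater than 90° and a pixel distance below one quarter of the image dimension, here interpreted as the smaller of width and height); the chosen face reference point (foremost position, designated part, or face center) is passed in as a pixel coordinate. This is an assumption-laden illustration, not the patented test itself.

```python
def is_edge_distortion_likely(fov_deg: float, face_point_px: tuple,
                              image_width: int, image_height: int) -> bool:
    """True if the face sits close enough to the FOV edge of a wide lens to distort."""
    if fov_deg <= 90.0:                      # second preset threshold from the text
        return False
    x, y = face_point_px
    # Pixel distance from the chosen face point to the nearest image (FOV) edge.
    edge_distance = min(x, y, image_width - x, image_height - y)
    third_threshold = min(image_width, image_height) / 4.0  # example threshold from the text
    return edge_distance < third_threshold
```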
In one possible implementation, the distortion correction function menu comprises an option for adjusting the displacement distance, and the obtaining of the transformation parameters input by the user on the distortion correction function menu comprises obtaining the adjustment direction and the displacement distance according to an instruction triggered by the operation of a control or a sliding block in the option for adjusting the displacement distance.
In one possible implementation, the distortion correction function menu includes an option for adjusting the relative position and/or relative proportion of the facial features, and obtaining the transformation parameters input by the user on the distortion correction function menu includes obtaining the adjustment direction, the displacement distance and/or the size of the facial features according to an instruction triggered by the user operating a control or a slider in the option for adjusting the relative position and/or relative proportion of the facial features.
In one possible implementation manner, the distortion correction function menu comprises an angle adjustment option, the obtaining of transformation parameters input by a user on the distortion correction function menu comprises obtaining adjustment directions and adjustment angles according to instructions triggered by operation of a control or a slider in the angle adjustment option, or the distortion correction function menu comprises an expression adjustment option, the obtaining of transformation parameters input by the user on the distortion correction function menu comprises obtaining new expression templates according to instructions triggered by operation of the control or the slider in the expression adjustment option, or the distortion correction function menu comprises an action adjustment option, and the obtaining of transformation parameters input by the user on the distortion correction function menu comprises obtaining of new action templates according to instructions triggered by operation of the control or the slider in the action adjustment option.
In one possible implementation, the face of the target person in the second image being closer to the real appearance of the face of the target person than the face in the first image includes: the relative proportion of the target person's facial features in the second image being closer to the relative proportion of the facial features of the real face than the relative proportion of the facial features in the first image, and/or the relative position of the target person's facial features in the second image being closer to the relative position of the facial features of the real face than the relative position of the facial features in the first image.
In one possible implementation manner, the correcting the distortion of the first image according to the transformation parameters includes fitting the face of the target person in the first image with a standard face model according to the target distance to obtain depth information of the face of the target person, and correcting the perspective distortion of the first image according to the depth information and the transformation parameters to obtain the second image.
In one possible implementation manner, the perspective distortion correction is performed on the first image according to the depth information and the transformation parameters to obtain the second image, and the method comprises the steps of establishing a first three-dimensional model of a face of the target person, transforming the pose and/or the shape of the first three-dimensional model according to the transformation parameters to obtain a second three-dimensional model of the face of the target person, obtaining a pixel displacement vector field of the face of the target person according to the depth information, the first three-dimensional model and the second three-dimensional model, and obtaining the second image according to the pixel displacement vector field of the face of the target person.
In one possible implementation manner, the method for obtaining the pixel displacement vector field of the face of the target person according to the depth information, the first three-dimensional model and the second three-dimensional model comprises the steps of performing perspective projection on the first three-dimensional model according to the depth information to obtain a first coordinate set, wherein the first coordinate set comprises coordinate values corresponding to a plurality of pixels in the first three-dimensional model, performing perspective projection on the second three-dimensional model according to the depth information to obtain a second coordinate set, wherein the second coordinate set comprises coordinate values corresponding to a plurality of pixels in the second three-dimensional model, calculating coordinate differences between the first coordinate values and the second coordinate values to obtain the pixel displacement vector field of the target object, wherein the first coordinate values comprise coordinate values corresponding to a first pixel in the first coordinate set, and the second coordinate values comprise coordinate values corresponding to the first pixel in the second coordinate set, and the first pixel comprises any one of the same pixels contained in the first three-dimensional model and the second three-dimensional model.
The application further provides an image transformation method applied to a first terminal, including: acquiring a first image, the first image including a face of a target person, the face of the target person in the first image being distorted; displaying a distortion correction function menu on a screen, the distortion correction function menu including one or more sliders and/or one or more controls; receiving a distortion correction instruction, the distortion correction instruction including transformation parameters generated when a user performs a touch operation on the one or more sliders and/or the one or more controls, the transformation parameters at least including an equivalent simulated shooting distance used to simulate the distance between the face of the target person and the camera when the capturing terminal photographed the face; and performing first processing on the first image according to the transformation parameters to obtain a second image, the first processing including distortion correction of the first image; wherein the face of the target person in the second image is closer to the real appearance of the face of the target person than the face of the target person in the first image.
The application provides an image conversion device, which comprises an acquisition module and a processing module. The acquisition module is configured to acquire a first image of a target scene through a front camera, the target scene including a face of a target person, and to acquire a target distance between the face of the target person and the front camera. The processing module is configured to perform first processing on the first image to obtain a second image when the target distance is smaller than a preset threshold, the first processing including distortion correction of the first image according to the target distance, where the face of the target person in the second image is closer to the real appearance of the face of the target person than the face of the target person in the first image.
In one possible implementation, the target distance includes a distance between a front-most portion on a face of the target person and the front-facing camera, or a distance between a designated portion on the face of the target person and the front-facing camera, or a distance between a center position on the face of the target person and the front-facing camera.
In one possible implementation manner, the acquiring module is specifically configured to acquire a screen ratio of a face of the target person in the first image, and obtain the target distance according to the screen ratio and a field angle FOV of the front camera.
In one possible implementation, the acquisition module is specifically configured to acquire the target distance by a distance sensor, where the distance sensor includes a time-of-flight ranging TOF sensor, a structured light sensor, or a binocular sensor.
In one possible implementation, the preset threshold is less than 80 cm.
In one possible implementation, the second image includes a preview image or an image obtained after triggering a shutter.
In one possible implementation, the face of the target person in the second image being closer to the real appearance of the face of the target person than the face in the first image includes: the relative proportion of the target person's facial features in the second image being closer to the relative proportion of the facial features of the real face than the relative proportion of the facial features in the first image, and/or the relative position of the target person's facial features in the second image being closer to the relative position of the facial features of the real face than the relative position of the facial features in the first image.
In one possible implementation manner, the processing module is specifically configured to fit a face of the target person in the first image with a standard face model according to the target distance to obtain depth information of the face of the target person, and perform perspective distortion correction on the first image according to the depth information to obtain the second image.
In one possible implementation manner, the processing module is specifically configured to establish a first three-dimensional model of a face of the target person, transform a pose and/or a shape of the first three-dimensional model to obtain a second three-dimensional model of the face of the target person, obtain a pixel displacement vector field of the face of the target person according to the depth information, the first three-dimensional model and the second three-dimensional model, and obtain the second image according to the pixel displacement vector field of the face of the target person.
In one possible implementation manner, the processing module is specifically configured to perform perspective projection on the first three-dimensional model according to the depth information to obtain a first coordinate set, where the first coordinate set includes coordinate values corresponding to a plurality of pixels in the first three-dimensional model, perform perspective projection on the second three-dimensional model according to the depth information to obtain a second coordinate set, where the second coordinate set includes coordinate values corresponding to a plurality of pixels in the second three-dimensional model, calculate a coordinate difference between a first coordinate value and a second coordinate value to obtain a pixel displacement vector field of the target object, where the first coordinate value includes coordinate values corresponding to a first pixel in the first coordinate set, and the second coordinate value includes coordinate values corresponding to a first pixel in the second coordinate set, where the first pixel includes any one of the same plurality of pixels included in the first three-dimensional model and the second three-dimensional model.
The application provides an image conversion device, which comprises an acquisition module, a display module and a processing module, wherein the acquisition module is used for acquiring a first image, the first image comprises a face of a target person, distortion exists in the face of the target person in the first image, the display module is used for displaying a distortion correction function menu, the acquisition module is also used for acquiring conversion parameters input by a user on the distortion correction function menu, the conversion parameters at least comprise equivalent simulation shooting distances, the equivalent simulation shooting distances are used for simulating distances between the face of the target person and a camera when a shooting terminal shoots the face of the target person, the processing module is used for carrying out first processing on the first image to obtain a second image, the first processing comprises correcting the distortion of the first image according to the conversion parameters, and the face of the target person in the second image is closer to the real appearance of the face of the target person than the face of the target person in the first image.
In one possible implementation manner, the distortion of the face of the target person in the first image is caused by the fact that, when the second terminal captured the first image, the target distance between the face of the target person and the second terminal was smaller than a first preset threshold, wherein the target distance includes the distance between the foremost position on the face of the target person and the front camera, or the distance between a designated part of the face of the target person and the front camera, or the distance between the center position of the face of the target person and the front camera.
In one possible implementation, the target distance is obtained by a screen ratio of a face of the target person in the first image and a FOV of a camera of the second terminal, or by an equivalent focal length in exchangeable image file format EXIF information of the first image.
In one possible implementation manner, the distortion correction function menu includes an option for adjusting an equivalent simulated shooting distance, and the obtaining module is specifically configured to obtain the equivalent simulated shooting distance according to an instruction triggered by a user operating a control or a slider in the option for adjusting the equivalent simulated shooting distance.
In one possible implementation manner, the value of the equivalent simulated shooting distance in the option of adjusting the equivalent simulated shooting distance includes a default value or a pre-calculated value when the distortion correction function menu is initially displayed.
In one possible implementation manner, the display module is further configured to display a popup window when the face of the target person has distortion, where the popup window is used to provide a selection control for correcting distortion, and when a user clicks the control for correcting distortion on the popup window, respond to an instruction generated by a user operation.
In one possible implementation manner, the display module is further configured to display a distortion correction control when a face of the target person has distortion, where the distortion correction control is used to open the distortion correction function menu, and when a user clicks the distortion correction control, respond to an instruction generated by a user operation.
In one possible implementation, the distortion of the face of the target person in the first image is caused by the fact that when the second terminal captures the first image, the field angle FOV of the camera is greater than a second preset threshold, and the pixel distance between the face of the target person and the edge of the FOV is less than a third preset threshold, wherein the pixel distance comprises the number of pixels between the foremost end position on the face of the target person and the edge of the FOV, or the number of pixels between a designated part on the face of the target person and the edge of the FOV, or the number of pixels between the center position on the face of the target person and the edge of the FOV.
In one possible implementation, the FOV is derived from EXIF information of the first image.
In one possible implementation, the second preset threshold is 90 °, and the third preset threshold is one fourth of the length or width of the first image.
In one possible implementation manner, the distortion correction function menu comprises an option for adjusting the displacement distance, and the acquisition module is further used for acquiring the adjustment direction and the displacement distance according to an instruction triggered by the operation of a control or a sliding block in the option for adjusting the displacement distance by a user.
In one possible implementation manner, the distortion correction function menu includes an option for adjusting the relative position and/or relative proportion of the facial features, and the acquisition module is further configured to acquire the adjustment direction, the displacement distance and/or the size of the facial features according to an instruction triggered by the user operating a control or a slider in the option for adjusting the relative position and/or relative proportion of the facial features.
In one possible implementation manner, the distortion correction function menu comprises an angle adjustment option, the acquisition module is further used for acquiring an adjustment direction and an adjustment angle according to an instruction triggered by operation of a control or a sliding block in the angle adjustment option by a user, or the distortion correction function menu comprises an expression adjustment option, the acquisition module is further used for acquiring a new expression template according to an instruction triggered by operation of the control or the sliding block in the expression adjustment option by the user, or the distortion correction function menu comprises an action adjustment option, and the acquisition module is further used for acquiring a new action template according to an instruction triggered by operation of the control or the sliding block in the action adjustment option by the user.
In one possible implementation, the face of the target person in the second image being closer to the real appearance of the face of the target person than the face in the first image includes: the relative proportion of the target person's facial features in the second image being closer to the relative proportion of the facial features of the real face than the relative proportion of the facial features in the first image, and/or the relative position of the target person's facial features in the second image being closer to the relative position of the facial features of the real face than the relative position of the facial features in the first image.
In one possible implementation manner, the processing module is specifically configured to fit a face of the target person in the first image with a standard face model according to the target distance to obtain depth information of the face of the target person, and perform perspective distortion correction on the first image according to the depth information and the transformation parameter to obtain the second image.
In one possible implementation manner, the processing module is specifically configured to establish a first three-dimensional model of a face of the target person, transform a pose and/or a shape of the first three-dimensional model according to the transformation parameters to obtain a second three-dimensional model of the face of the target person, obtain a pixel displacement vector field of the face of the target person according to the depth information, the first three-dimensional model and the second three-dimensional model, and obtain the second image according to the pixel displacement vector field of the face of the target person.
In one possible implementation manner, the processing module is specifically configured to perform perspective projection on the first three-dimensional model according to the depth information to obtain a first coordinate set, where the first coordinate set includes coordinate values corresponding to a plurality of pixels in the first three-dimensional model, perform perspective projection on the second three-dimensional model according to the depth information to obtain a second coordinate set, where the second coordinate set includes coordinate values corresponding to a plurality of pixels in the second three-dimensional model, calculate a coordinate difference between a first coordinate value and a second coordinate value to obtain a pixel displacement vector field of the target object, where the first coordinate value includes coordinate values corresponding to a first pixel in the first coordinate set, and the second coordinate value includes coordinate values corresponding to a first pixel in the second coordinate set, where the first pixel includes any one of the same plurality of pixels included in the first three-dimensional model and the second three-dimensional model.
In one possible implementation manner, the device further comprises a recording module; the acquisition module is further configured to acquire a recording instruction according to a trigger operation of the user on a recording control, and the recording module is configured to start recording the acquisition process of the second image according to the recording instruction until a stop-recording instruction generated by a trigger operation of the user on a stop-recording control is received.
The application provides an image conversion device, which comprises an acquisition module, a display module and a processing module. The acquisition module is configured to acquire a first image, the first image including a face of a target person, the face of the target person in the first image being distorted. The display module is configured to display a distortion correction function menu on a screen, the distortion correction function menu including one or more sliders and/or one or more controls. The acquisition module is further configured to receive a distortion correction instruction, the distortion correction instruction including transformation parameters generated when a user performs a touch operation on the one or more sliders and/or the one or more controls, the transformation parameters at least including an equivalent simulated shooting distance used to simulate the distance between the face of the target person and the camera when the capturing terminal photographed the face. The processing module is configured to perform first processing on the first image according to the transformation parameters to obtain a second image, the first processing including distortion correction of the first image, where the face of the target person in the second image is closer to the real appearance of the face of the target person than the face of the target person in the first image.
In a seventh aspect, the present application provides an apparatus comprising one or more processors, a memory for storing one or more programs, which when executed by the one or more processors, cause the one or more processors to implement the method as described in any of the first to third aspects above.
In an eighth aspect, the present application provides a computer readable storage medium comprising a computer program which, when executed on a computer, causes the computer to perform the method of any one of the first to third aspects above.
In a ninth aspect, the present application also provides a computer program product comprising computer program code which, when run on a computer, causes the computer to perform the method of any one of the first to third aspects.
Drawings
FIG. 1 is a schematic diagram of an exemplary application architecture to which the image transformation method of the present application is applicable;
Fig. 2 shows an exemplary structural schematic diagram of a terminal 200;
FIG. 3 is a flowchart of an embodiment of an image transformation method according to the present application;
FIG. 4 illustrates an exemplary schematic diagram of a distance acquisition method;
FIG. 5 illustrates an exemplary schematic diagram of a face three-dimensional model creation process;
FIG. 6 illustrates an exemplary schematic diagram of the angular change of position movement;
FIG. 7 illustrates an exemplary schematic diagram of perspective projection;
FIGS. 8a and 8b schematically illustrate the effect of face projection at 30cm and 55cm object distances, respectively;
FIG. 9 illustrates an exemplary schematic diagram of a pixel displacement vector dilation method;
FIG. 10 is a flowchart of a second embodiment of an image transformation method according to the present application;
FIG. 11 illustrates an exemplary schematic diagram of a menu of distortion correction functions;
FIGS. 12 a-12 f schematically illustrate a process of distortion correction of a terminal in a self-timer scenario;
FIGS. 13 a-13 h illustrate an exemplary process for distortion correction of images in a picture library;
FIG. 14 illustrates other examples of a menu of distortion correction functions;
FIG. 15 is a schematic structural diagram of an embodiment of an image conversion device according to the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms "first," "second," and the like in the description and in the claims and drawings are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion, such as a series of steps or elements. The method, system, article, or apparatus is not necessarily limited to those explicitly listed but may include other steps or elements not explicitly listed or inherent to such process, method, article, or apparatus.
It should be understood that in the present application, "at least one (item)" means one or more, and "a plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean that only A exists, only B exists, or both A and B exist, where A and B may be singular or plural. The character "/" generally indicates that the associated objects are in an "or" relationship. "At least one of" the following items or the like means any combination of these items, including any combination of single items or plural items. For example, at least one of a, b or c may represent a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b and c may each be single or plural.
The application provides an image transformation method, which enables a transformed image, in particular to a target object in the image, to realize the effect of real transformation in a three-dimensional space.
Fig. 1 shows an exemplary schematic diagram of an application architecture to which the image transformation method of the present application is applicable. As shown in Fig. 1, the framework includes an image capturing module, an image processing module and a display module. The image capturing module is used to capture or acquire the image to be processed and may be, for example, a camera or a video camera. The image processing module is used to transform the image to be processed so as to implement distortion correction, and may be any device with image processing capability, for example a terminal or a picture server, or any chip with image processing capability, for example a graphics processing unit (GPU) chip. The display module is used to display the image and may be, for example, a display, the screen of a terminal, a television, or a projector.
In the application, the image acquisition module, the image processing module and the display module may be integrated in the same device, in which case the processor of that device acts as the control module that drives the three modules to perform their respective functions. The three modules may also be separate devices. For example, the image acquisition module may be a camera or video camera while the image processing module and the display module are integrated in one device; the processor of the integrated device then acts as the control module for the image processing and display modules, and the integrated device may receive the image to be processed from the image acquisition module over a wireless or wired link, or obtain it through an input interface. As another example, the image acquisition module may be a camera or video camera, the image processing module a device with image processing capability such as a mobile phone, tablet computer or computer, and the display module a screen or television, the three being connected wirelessly or by wire to transfer the image data. As yet another example, the image acquisition module and the image processing module may be integrated in one device that has both image acquisition and image processing capabilities, such as a mobile phone or tablet computer; the processor of that device acts as the control module for the two modules, and the device may transmit images to the display module over a wireless or wired link or through an output interface.
It should be noted that, the application architecture may also use other hardware and/or software implementations, which are not limited in particular by the present application.
The image processing module may be a terminal (such as a mobile phone, a tablet computer (pad), etc.), a wearable device with a wireless communication function (such as a smart watch), a computer with a wireless transceiver function, a Virtual Reality (VR) device, an augmented reality (augmented reality, AR) device, etc., which are not limited in this application.
Fig. 2 shows a schematic structure of the terminal 200.
The terminal 200 may include a processor 210, an external memory interface 220, an internal memory 221, a universal serial bus (universal serial bus, USB) interface 230, a charge management module 240, a power management module 241, a battery 242, an antenna 1, an antenna 2, a mobile communication module 250, a wireless communication module 260, an audio module 270, a speaker 270A, a receiver 270B, a microphone 270C, an earphone interface 270D, a sensor module 280, keys 290, a motor 291, an indicator 292, a camera 293, a display 294, and a subscriber identity module (subscriber identification module, SIM) card interface 295, etc. The sensor modules 280 may include, among other things, pressure sensor 280A, gyroscope sensor 280B, barometric sensor 280C, magnetic sensor 280D, acceleration sensor 280E, distance sensor 280F, proximity sensor 280G, fingerprint sensor 280H, temperature sensor 280J, touch sensor 280K, ambient light sensor 280L, bone conduction sensor 280M, time of flight (TOF) sensor 280N, structured light sensor 280O, binocular sensor 280P, etc.
It should be understood that the structure illustrated in the embodiments of the present application does not constitute a specific limitation on the terminal 200. In other embodiments of the application, terminal 200 may include more or less components than illustrated, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 210 may include one or more processing units. For example, processor 210 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), among others. The different processing units may be separate devices or may be integrated in one or more processors.
The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 210 for storing instructions and data. In some embodiments, the memory in processor 210 includes a cache memory. The memory may hold instructions or data that the processor 210 has just used or recycled. If the processor 210 needs to reuse the instruction or data, it may be called directly from the memory. Repeated accesses are avoided and the latency of the processor 210 is reduced, thereby improving the efficiency of the system.
In some embodiments, processor 210 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, among others.
The I2C interface is a bidirectional synchronous serial bus comprising a serial data line (SDA) and a serial clock line (SCL). In some embodiments, the processor 210 may contain multiple sets of I2C buses. The processor 210 may be coupled to the touch sensor 280K, the charger, the flash, the camera 293, etc., through different I2C bus interfaces. For example, the processor 210 may be coupled to the touch sensor 280K through an I2C interface, so that the processor 210 communicates with the touch sensor 280K through the I2C bus interface to implement the touch function of the terminal 200.
The I2S interface may be used for audio communication. In some embodiments, the processor 210 may contain multiple sets of I2S buses. The processor 210 may be coupled to the audio module 270 via an I2S bus to enable communication between the processor 210 and the audio module 270. In some embodiments, the audio module 270 may communicate audio signals to the wireless communication module 260 through the I2S interface to implement a function of answering a call through a bluetooth headset.
PCM interfaces may also be used for audio communication to sample, quantize and encode analog signals. In some embodiments, the audio module 270 and the wireless communication module 260 may be coupled by a PCM bus interface. In some embodiments, the audio module 270 may also transmit audio signals to the wireless communication module 260 through the PCM interface to implement a function of answering a call through the bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication.
The UART interface is a universal serial data bus for asynchronous communications. The bus may comprise a bi-directional communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, a UART interface is typically used to connect the processor 210 with the wireless communication module 260. For example, the processor 210 communicates with a bluetooth module in the wireless communication module 260 through a UART interface to implement bluetooth functions. In some embodiments, the audio module 270 may transmit an audio signal to the wireless communication module 260 through a UART interface, implementing a function of playing music through a bluetooth headset.
The MIPI interface may be used to connect the processor 210 to peripheral devices such as the display 294, the camera 293, and the like. The MIPI interfaces include a camera serial interface (camera serial interface, CSI), a display serial interface (display serial interface, DSI), and the like. In some embodiments, processor 210 and camera 293 communicate through a CSI interface to implement the photographing function of terminal 200. The processor 210 and the display 294 communicate through a DSI interface to implement the display function of the terminal 200.
The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal or as a data signal. In some embodiments, a GPIO interface may be used to connect the processor 210 with the camera 293, display 294, wireless communication module 260, audio module 270, sensor module 280, and the like. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, an MIPI interface, etc.
The USB interface 230 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, or the like. The USB interface 230 may be used to connect a charger to charge the terminal 200, or may be used to transfer data between the terminal 200 and a peripheral device. And can also be used for connecting with a headset, and playing audio through the headset. The interface may also be used to connect other terminals, such as AR devices, etc.
It should be understood that the interfacing relationship between the modules illustrated in the embodiment of the present application is only illustrative, and does not limit the structure of the terminal 200. In other embodiments of the present application, the terminal 200 may also use different interfacing manners, or a combination of multiple interfacing manners in the above embodiments.
The charge management module 240 is configured to receive a charge input from a charger. The charger can be a wireless charger or a wired charger. In some wired charging embodiments, the charge management module 240 may receive a charging input of a wired charger through the USB interface 230. In some wireless charging embodiments, the charge management module 240 may receive wireless charging input through a wireless charging coil of the terminal 200. The charging management module 240 may also supply power to the terminal through the power management module 241 while charging the battery 242.
The power management module 241 is used for connecting the battery 242, and the charge management module 240 and the processor 210. The power management module 241 receives input from the battery 242 and/or the charge management module 240 and provides power to the processor 210, the internal memory 221, the display 294, the camera 293, the wireless communication module 260, and the like. The power management module 241 may also be configured to monitor battery capacity, battery cycle times, battery health (leakage, impedance), and other parameters. In other embodiments, the power management module 241 may also be disposed in the processor 210. In other embodiments, the power management module 241 and the charge management module 240 may be disposed in the same device.
The wireless communication function of the terminal 200 can be implemented by the antenna 1, the antenna 2, the mobile communication module 250, the wireless communication module 260, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in terminal 200 may be configured to cover a single or multiple communication bands. Different antennas may also be multiplexed to improve the utilization of the antennas. For example, the antenna 1 may be multiplexed into a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 250 may provide a solution including 2G/3G/4G/5G wireless communication applied on the terminal 200. The mobile communication module 250 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA), or the like. The mobile communication module 250 may receive electromagnetic waves from the antenna 1, perform processes such as filtering, amplifying, and the like on the received electromagnetic waves, and transmit the processed electromagnetic waves to the modem processor for demodulation. The mobile communication module 250 can amplify the signal modulated by the modem processor, and convert the signal into electromagnetic waves through the antenna 1 to radiate. In some embodiments, at least some of the functional modules of the mobile communication module 250 may be disposed in the processor 210. In some embodiments, at least some of the functional modules of the mobile communication module 250 may be provided in the same device as at least some of the modules of the processor 210.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low frequency baseband signal to the baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs sound signals through an audio device (not limited to speaker 270A, receiver 270B, etc.), or displays images or video through display screen 294. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communication module 250 or other functional module, independent of the processor 210.
The wireless communication module 260 may provide solutions for wireless communication including wireless local area networks (wireless local area networks, WLAN) (e.g., a wireless fidelity (wireless fidelity, Wi-Fi) network), Bluetooth (BT), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field communication (near field communication, NFC), infrared (IR), etc., applied on the terminal 200. The wireless communication module 260 may be one or more devices that integrate at least one communication processing module. The wireless communication module 260 receives electromagnetic waves via the antenna 2, modulates the electromagnetic wave signals, filters the electromagnetic wave signals, and transmits the processed signals to the processor 210. The wireless communication module 260 may also receive a signal to be transmitted from the processor 210, frequency modulate it, amplify it, and convert it to electromagnetic waves for radiation via the antenna 2.
In some embodiments, antenna 1 and mobile communication module 250 of terminal 200 are coupled, and antenna 2 and wireless communication module 260 are coupled, so that terminal 200 may communicate with a network and other devices through wireless communication techniques. The wireless communication techniques can include the global system for mobile communications (global system for mobile communications, GSM), general packet radio service (general packet radio service, GPRS), code division multiple access (code division multiple access, CDMA), wideband code division multiple access (wideband code division multiple access, WCDMA), time-division code division multiple access (time-division code division multiple access, TD-SCDMA), long term evolution (long term evolution, LTE), BT, GNSS, WLAN, NFC, FM, and/or IR techniques, among others. The GNSS may include a global positioning system (global positioning system, GPS), a global navigation satellite system (global navigation satellite system, GLONASS), a BeiDou navigation satellite system (BeiDou navigation satellite system, BDS), a quasi-zenith satellite system (quasi-zenith satellite system, QZSS) and/or a satellite based augmentation system (satellite based augmentation systems, SBAS).
Terminal 200 implements display functions through a GPU, display screen 294, application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display screen 294 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 210 may include one or more GPUs that execute program instructions to generate or change display information.
The display 294 is used to display images, videos, and the like. The display 294 includes a display panel. The display panel may employ a liquid crystal display (liquid crystal display, LCD), an organic light-emitting diode (organic light-emitting diode, OLED), an active-matrix organic light-emitting diode (active-matrix organic light-emitting diode, AMOLED), a flexible light-emitting diode (flexible light-emitting diode, FLED), a Mini LED, a Micro LED, a Micro OLED, a quantum dot light-emitting diode (quantum dot light-emitting diode, QLED), or the like. In some embodiments, terminal 200 may include 1 or N displays 294, N being a positive integer greater than 1.
The terminal 200 may implement a photographing function through an ISP, a camera 293, a video codec, a GPU, a display 294, an application processor, and the like.
The ISP is used to process the data fed back by the camera 293. For example, when photographing, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electric signal, and the camera photosensitive element transmits the electric signal to the ISP for processing and is converted into an image visible to naked eyes. ISP can also optimize the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in the camera 293.
The camera 293 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image onto the photosensitive element. The photosensitive element may be a charge coupled device (charge coupled device, CCD) or a complementary metal oxide semiconductor (complementary metal oxide semiconductor, CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then transferred to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard RGB, YUV, or the like format. In some embodiments, terminal 200 may include 1 or N cameras 293, N being a positive integer greater than 1. One or more cameras 293 may be disposed on the front side of the terminal 200, for example, in the middle of the top of the screen; these may be understood as front cameras of the terminal, and a device equipped with the binocular sensor may have two front cameras. Alternatively, one or more cameras 293 may be disposed on the back side of the terminal 200, for example, in the upper left corner of the back of the terminal; these may be understood as rear cameras of the terminal.
The digital signal processor is used for processing digital signals, and can process other digital signals besides digital image signals. For example, when the terminal 200 selects a frequency bin, the digital signal processor is used to fourier transform the frequency bin energy, etc.
Video codecs are used to compress or decompress digital video. The terminal 200 may support one or more video codecs. In this way, the terminal 200 can play or record video in various encoding formats, such as moving picture experts group (moving picture experts group, MPEG) 1, MPEG2, MPEG3, MPEG4, etc.
The NPU is a neural-network (NN) computing processor, and can rapidly process input information by referencing a biological neural network structure, for example, referencing a transmission mode between human brain neurons, and can also continuously perform self-learning. Applications such as intelligent recognition of the terminal 200, for example, image recognition, face recognition, voice recognition, text understanding, etc., can be realized through the NPU.
The external memory interface 220 may be used to connect an external memory card, such as a Micro SD card, to realize the memory capability of the extension terminal 200. The external memory card communicates with the processor 210 through an external memory interface 220 to implement data storage functions. For example, files such as music, video, etc. are stored in an external memory card.
Internal memory 221 may be used to store computer executable program code that includes instructions. The internal memory 221 may include a storage program area and a storage data area. The storage program area may store an application program (such as a sound playing function, an image playing function, etc.) required for at least one function of the operating system, etc. The storage data area may store data (e.g., audio data, phonebook, etc.) created during use of the terminal 200, etc. In addition, the internal memory 221 may include a high-speed random access memory, and may further include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (universal flash storage, UFS), and the like. The processor 210 performs various functional applications of the terminal 200 and data processing by executing instructions stored in the internal memory 221 and/or instructions stored in a memory provided in the processor.
The terminal 200 may implement audio functions through an audio module 270, a speaker 270A, a receiver 270B, a microphone 270C, an earphone interface 270D, an application processor, and the like. Such as music playing, recording, etc.
The audio module 270 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio module 270 may also be used to encode and decode audio signals. In some embodiments, the audio module 270 may be disposed in the processor 210, or some functional modules of the audio module 270 may be disposed in the processor 210.
Speaker 270A, also referred to as a "horn," is used to convert audio electrical signals into sound signals. The terminal 200 can listen to music through the speaker 270A or listen to hands-free calls.
A receiver 270B, also referred to as a "earpiece", is used to convert the audio electrical signal into a sound signal. When terminal 200 is answering a telephone call or voice message, voice can be received by placing receiver 270B close to the human ear.
Microphone 270C, also referred to as a "mic" or "sound transducer," is used to convert sound signals into electrical signals. When making a call or transmitting voice information, the user can speak near the microphone 270C, inputting a sound signal to the microphone 270C. The terminal 200 may be provided with at least one microphone 270C. In other embodiments, the terminal 200 may be provided with two microphones 270C, which may implement a noise reduction function in addition to collecting sound signals. In other embodiments, the terminal 200 may be further provided with three, four or more microphones 270C to collect sound signals, reduce noise, identify the source of sound, implement directional recording functions, etc.
The earphone interface 270D is for connecting a wired earphone. Earphone interface 270D may be USB interface 230, or a 3.5 mm open mobile electronic device platform (open mobile terminal platform, OMTP) standard interface, or a cellular telecommunications industry association of the USA (cellular telecommunications industry association of the USA, CTIA) standard interface.
The pressure sensor 280A is used to sense a pressure signal, and may convert the pressure signal into an electrical signal. In some embodiments, pressure sensor 280A may be disposed on display 294. There are many types of pressure sensors 280A, such as resistive pressure sensors, inductive pressure sensors, capacitive pressure sensors, and the like. A capacitive pressure sensor may comprise at least two parallel plates with conductive material. When a force is applied to the pressure sensor 280A, the capacitance between the electrodes changes. The terminal 200 determines the strength of the pressure according to the change of the capacitance. When a touch operation is applied to the display 294, the terminal 200 detects the intensity of the touch operation according to the pressure sensor 280A. The terminal 200 may also calculate the location of the touch based on the detection signal of the pressure sensor 280A. In some embodiments, touch operations that act on the same touch location, but at different touch operation strengths, may correspond to different operation instructions. For example, when a touch operation with a touch operation intensity smaller than a first pressure threshold is applied to the short message application icon, an instruction to view the short message is executed. When a touch operation with a touch operation intensity greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed.
The gyro sensor 280B may be used to determine a motion gesture of the terminal 200. In some embodiments, the angular velocity of terminal 200 about three axes (i.e., x, y, and z axes) may be determined by gyro sensor 280B. The gyro sensor 280B may be used for photographing anti-shake. Illustratively, when the shutter is pressed, the gyro sensor 280B detects the shake angle of the terminal 200, calculates the distance to be compensated by the lens module according to the angle, and makes the lens counteract the shake of the terminal 200 through the reverse motion, thereby realizing anti-shake. The gyro sensor 280B may also be used for navigating, somatosensory game scenes.
The air pressure sensor 280C is used to measure air pressure. In some embodiments, the terminal 200 calculates altitude from barometric pressure values measured by the barometric pressure sensor 280C, aiding in positioning and navigation.
The magnetic sensor 280D includes a Hall sensor. The terminal 200 may detect the opening and closing of a flip cover using the magnetic sensor 280D. In some embodiments, when the terminal 200 is a flip phone, the terminal 200 may detect the opening and closing of the flip cover according to the magnetic sensor 280D. Characteristics such as automatic unlocking upon flip opening can then be set according to the detected opening and closing state of the holster or of the flip cover.
The acceleration sensor 280E may detect the magnitude of acceleration of the terminal 200 in various directions (typically three axes). The magnitude and direction of gravity may be detected when the terminal 200 is stationary. The acceleration sensor may also be used to recognize the posture of the terminal, and is applied to landscape/portrait screen switching, pedometers, and the like.
A distance sensor 280F for measuring distance. The terminal 200 may measure the distance by infrared or laser. In some embodiments, the terminal 200 may range using the distance sensor 280F to achieve fast focusing.
Proximity light sensor 280G may include, for example, a Light Emitting Diode (LED) and a light detector, such as a photodiode. The light emitting diode may be an infrared light emitting diode. The terminal 200 emits infrared light outward through the light emitting diode. The terminal 200 detects infrared reflected light from nearby objects using a photodiode. When sufficient reflected light is detected, it may be determined that there is an object near the terminal 200. When insufficient reflected light is detected, the terminal 200 may determine that there is no object in the vicinity of the terminal 200. The terminal 200 can detect that the user holds the terminal 200 close to the ear by using the proximity light sensor 280G, so as to automatically extinguish the screen for the purpose of saving power. The proximity light sensor 280G may also be used in holster mode, pocket mode to automatically unlock and lock the screen.
The ambient light sensor 280L is used to sense ambient light level. The terminal 200 may adaptively adjust the brightness of the display 294 according to the perceived ambient light level. The ambient light sensor 280L may also be used to automatically adjust white balance during photographing. The ambient light sensor 280L may also cooperate with the proximity light sensor 280G to detect whether the terminal 200 is in a pocket to prevent false touches.
The fingerprint sensor 280H is used to collect a fingerprint. The terminal 200 can utilize the collected fingerprint characteristics to realize fingerprint unlocking, access the application lock, fingerprint photographing, fingerprint incoming call answering and the like.
The temperature sensor 280J is used to detect temperature. In some embodiments, the terminal 200 executes a temperature processing strategy using the temperature detected by the temperature sensor 280J. For example, when the temperature reported by temperature sensor 280J exceeds a threshold, the terminal 200 reduces the performance of a processor located in the vicinity of temperature sensor 280J in order to reduce power consumption for thermal protection. In other embodiments, when the temperature is below another threshold, the terminal 200 heats the battery 242 to avoid an abnormal shutdown of the terminal 200 caused by low temperature. In other embodiments, when the temperature is below a further threshold, the terminal 200 boosts the output voltage of the battery 242 to avoid an abnormal shutdown caused by low temperature.
The touch sensor 280K is also referred to as a "touch device". The touch sensor 280K may be disposed on the display screen 294, and the touch sensor 280K and the display screen 294 form a touch screen, which is also referred to as a "touch screen". The touch sensor 280K is used to detect a touch operation acting on or near it. The touch sensor may communicate the detected touch operation to the application processor to determine the touch event type. Visual output related to the touch operation may be provided through the display 294. In other embodiments, the touch sensor 280K may also be disposed on a surface of the terminal 200 at a location different from that of the display 294.
Bone conduction sensor 280M may acquire a vibration signal. In some embodiments, the bone conduction sensor 280M may acquire the vibration signal of the vibrating bone of the human vocal part. The bone conduction sensor 280M may also contact the pulse of the human body to receive the blood pressure pulsation signal. In some embodiments, the bone conduction sensor 280M may also be provided in a headset, combined into a bone conduction headset. The audio module 270 may analyze the voice signal based on the vibration signal of the vibrating bone of the vocal part obtained by the bone conduction sensor 280M, so as to implement the voice function. The application processor can analyze heart rate information based on the blood pressure pulsation signal acquired by the bone conduction sensor 280M, so as to implement a heart rate detection function.
Keys 290 include a power on key, a volume key, etc. The keys 290 may be mechanical keys. Or may be a touch key. The terminal 200 may receive key inputs, generating key signal inputs related to user settings and function controls of the terminal 200.
The motor 291 may generate a vibration alert. The motor 291 may be used for incoming call vibration alerting or for touch vibration feedback. For example, touch operations acting on different applications (e.g., photographing, audio playing, etc.) may correspond to different vibration feedback effects. The motor 291 may also correspond to different vibration feedback effects by touch operations applied to different areas of the display 294. Different application scenarios (such as time reminding, receiving information, alarm clock, game, etc.) can also correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization.
The indicator 292 may be an indicator light, which may be used to indicate a state of charge, a change in power, a message indicating a missed call, a notification, etc.
The SIM card interface 295 is for interfacing with a SIM card. The SIM card may be inserted into the SIM card interface 295 or withdrawn from the SIM card interface 295 to enable contact and separation with the terminal 200. The terminal 200 may support 1 or N SIM card interfaces, N being a positive integer greater than 1. The SIM card interface 295 may support Nano SIM cards, micro SIM cards, and the like. The same SIM card interface 295 may be used to insert multiple cards simultaneously. The types of the plurality of cards may be the same or different. The SIM card interface 295 may also be compatible with different types of SIM cards. The SIM card interface 295 may also be compatible with external memory cards. The terminal 200 interacts with the network through the SIM card to realize functions such as communication and data communication. In some embodiments, the terminal 200 employs an eSIM, i.e., an embedded SIM card. The eSIM card may be embedded in the terminal 200 and cannot be separated from the terminal 200.
Fig. 3 is a flowchart of an embodiment of an image transformation method according to the present application, as shown in fig. 3, the method of the present embodiment may be applied to the application architecture shown in fig. 1, and the execution body may be the terminal shown in fig. 2. The image transformation method may include:
step 301, a first image is acquired for a target scene through a front-facing camera.
The first image is acquired by a front-facing camera of the terminal under the scene of self-shooting by a user. Typically, the FOV of the front camera is set to 70 ° to 110 °, preferably 90 °. The area facing the front camera is a target scene, and the target scene comprises the face of a target person (namely a user).
Alternatively, the first image may be a preview image acquired by a front camera of the terminal and displayed on the screen; in this case the shutter has not been triggered and the first image has not yet been imaged on the image sensor. The first image may also be an image acquired by the terminal but not displayed on the screen; in this case the shutter likewise has not been triggered and the first image is not imaged on the sensor. Alternatively, the first image may be an image acquired by the terminal and imaged on the image sensor after the shutter is triggered.
It should be noted that, the first image may also be obtained through a rear camera of the terminal, which is not limited in particular.
Step 302, obtaining a target distance between a face of a target person and a front camera.
The target distance between the face of the target person and the front camera may be the distance between the front-most position on the face of the target person (e.g., the nose) and the front camera. Alternatively, it may be the distance between a designated part of the face of the target person (e.g., the eyes, mouth, or nose) and the front camera, or the distance between the center position of the face of the target person (e.g., the nose on a frontal face, or the cheekbone position on a profile face) and the front camera. The definition of the target distance may be determined according to the specific situation of the first image, which is not particularly limited by the present application.
In one possible implementation manner, the terminal may acquire the target distance by calculating the face screen ratio, that is, first acquiring the screen ratio of the face of the target person in the first image (the ratio of the pixel area of the face to the pixel area of the first image), and then deriving the distance from the screen ratio and the field of view (FOV) of the front camera. Fig. 4 shows an exemplary schematic diagram of a distance acquisition method. As shown in Fig. 4, assuming that the average face is 20 cm long and 15 cm wide, the real area S = 20 × 15 / P cm² covered by the entire field of view of the first image at the target distance D between the face of the target person and the front camera can be estimated from the screen ratio P of the face. The diagonal length L of the first image may be obtained from the aspect ratio of the first image. For example, when the aspect ratio of the first image is 1:1, the diagonal length is L = S^0.5. According to the imaging relationship in the figure, the target distance D = L / (2 × tan(0.5 × FOV)).
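For illustration, the following sketch computes this estimate under the assumptions stated above (an average face of about 20 cm × 15 cm and a 1:1 aspect ratio); the function and parameter names are hypothetical:

```python
import math

def estimate_target_distance_cm(face_screen_ratio, fov_deg, face_area_cm2=20.0 * 15.0):
    # S: real-world area covered by the whole field of view at the target distance.
    scene_area = face_area_cm2 / face_screen_ratio        # S = 20 * 15 / P, in cm^2
    # For a 1:1 image the text takes the diagonal length L as S^0.5.
    diagonal = math.sqrt(scene_area)
    # Imaging relation: D = L / (2 * tan(0.5 * FOV)).
    return diagonal / (2.0 * math.tan(math.radians(fov_deg) / 2.0))

# Example: a face occupying 12% of the frame with a 90-degree front camera.
print(round(estimate_target_distance_cm(0.12, 90.0), 1))  # distance in cm
```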
In one possible implementation manner, the terminal may also obtain the target distance through a distance sensor, that is, the distance between the front camera and the face of the front target person may be measured through the distance sensor on the terminal during self-shooting of the user. The distance sensor may include, for example, a time of flight (TOF) sensor, a structured light sensor, or a binocular sensor, among others.
And 303, when the target distance is smaller than a preset threshold value, performing first processing on the first image to obtain a second image, wherein the first processing comprises distortion correction on the first image according to the target distance.
The distance from the face to the front camera is usually short during self-shooting, so the problem of "near-large-far-small" face perspective distortion exists. For example, when the distance between the face of the target person and the front camera is too short, the nose in the image may be enlarged and the face lengthened, because different parts of the face are at different distances from the camera. When the distance between the face of the target person and the front camera is large, the above problem is weakened. Therefore, the application sets a threshold value: when the distance between the face of the target person and the front camera is smaller than the threshold value, the obtained first image containing the face of the target person is considered distorted, and the first image needs distortion correction. The preset threshold value is within 80 cm, and optionally, the preset threshold value can be set to 50 cm. It should be noted that the specific value of the preset threshold may depend on the performance of the front camera, the shooting illumination, and the like, which is not particularly limited in the present application.
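As a minimal sketch of this gating step, assuming a 50 cm threshold and a correction routine defined elsewhere (all names hypothetical):

```python
DISTORTION_THRESHOLD_CM = 50.0  # assumed preset threshold, within the 80 cm range above

def process_first_image(first_image, target_distance_cm, correct_distortion):
    # Apply the first processing (distortion correction) only for close-range faces.
    if target_distance_cm < DISTORTION_THRESHOLD_CM:
        return correct_distortion(first_image, target_distance_cm)
    return first_image
```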
And the terminal performs perspective distortion correction on the first image to obtain a second image, wherein the face of the target person in the second image is closer to the real appearance of the face of the target person than the face of the target person in the first image, namely the relative proportion and the relative position of the five sense organs of the target person in the second image are closer to the relative proportion and the relative position of the five sense organs of the face of the target person than the relative proportion and the relative position of the five sense organs of the target person in the first image.
As described above, when the first image is acquired, the distance between the face of the target person and the front camera is smaller than the preset threshold, so the facial features of the face of the target person in the first image are very likely to undergo size changes, stretching, and the like due to the "near-large-far-small" perspective distortion of the front camera; their relative proportion and relative position deviate from those of the actual appearance of the face of the target person. The second image obtained by perspective distortion correction of the first image can eliminate the size changes, stretching, and other effects on the five sense organs of the face of the target person, so that the relative proportion and the relative position of the five sense organs of the face of the target person in the second image approximate to, and may even restore, the relative proportion and the relative position of the real appearance of the face of the target person.
Optionally, the second image may be a preview image acquired by the front camera. That is, before the shutter is triggered, the front camera may acquire a first image of the target scene area (the first image is displayed on the screen as a preview image, or does not appear on the screen at all); the terminal performs perspective distortion correction on the first image by using the above method to obtain a second image, and displays the second image on the screen of the terminal. In this case, the second image seen by the user is a preview image that has undergone perspective distortion correction; the shutter has not been triggered, and the second image has not yet been imaged on the image sensor. Alternatively, the second image can be an image obtained after the shutter is triggered. That is, the first image is an image imaged on the image sensor of the terminal after the shutter is triggered; the terminal obtains the second image after performing perspective distortion correction on the first image by the above method, and stores and displays the second image on the screen of the terminal. In this case, the second image seen by the user is an image that has undergone perspective distortion correction and is stored in the picture library.
The process of obtaining the second image by perspective distortion correction of the first image can comprise the steps of fitting the face of a target person in the first image with a standard face model according to the target distance to obtain depth information of the face of the target person, and then obtaining the second image by perspective distortion correction of the first image according to the depth information. The standard face model is a face model including five sense organs created in advance, which is provided with shape conversion coefficients, expression conversion coefficients, and the like, and the expression or shape of the face model can be changed by adjusting these coefficient values. D represents the value of the target distance, and the terminal assumes that a standard three-dimensional face model is placed right in front of the camera, the target distance between the standard three-dimensional face model and the camera is D, and a specified point (for example, a nose tip, a center point of the face model, etc.) on the standard three-dimensional face model corresponds to the origin O of the three-dimensional coordinate system. Obtaining a two-dimensional projection point set A of the characteristic points on the standard three-dimensional face model through perspective projection, obtaining a point set B of the two-dimensional characteristic points of the face of the target person in the first image, wherein each point in the point set A has a corresponding unique matching point in the point set B, and calculating to obtain the sum F of the two-dimensional coordinate distance differences of all the matching points. In order to obtain the real three-dimensional model of the face of the target person, the plane position (i.e. the specified points move up and down and left and right away from the origin O), the shape, the relative proportion of the five sense organs and the like of the standard three-dimensional face model can be adjusted for multiple times, so that the sum F of the distance differences reaches minimum or even approaches 0, i.e. the point set A and the point set B approach to completely coincide one by one. Based on the method, the real three-dimensional model of the face of the target person corresponding to the first image can be obtained. And obtaining the coordinates (x, y, z+D) of any pixel point on the real three-dimensional model of the face of the target person relative to the camera according to the coordinates (x, y, z) of any pixel point on the real three-dimensional model relative to the origin O and the target distance D between the origin O and the camera. At this time, depth information of any pixel point of the face of the target person in the first image can be obtained through perspective projection. Alternatively, a TOF sensor, a structured light sensor, etc. may be used to directly obtain depth information.
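A minimal sketch of the fitting objective F and the resulting depth, assuming a pinhole projection with focal length f and, for brevity, treating only translation and scale of the standard model as free parameters (a real fit would also adjust shape and expression coefficients; all names are hypothetical):

```python
import numpy as np

def fit_residual(params, model_points_3d, image_points_2d, target_distance, focal):
    """Sum F of 2D distances between projected model feature points (point set A)
    and detected face feature points in the first image (point set B)."""
    tx, ty, scale = params
    # Place the adjusted standard model at depth D in front of the camera.
    pts = model_points_3d * scale + np.array([tx, ty, target_distance])
    proj = focal * pts[:, :2] / pts[:, 2:3]           # perspective projection: set A
    return np.linalg.norm(proj - image_points_2d, axis=1).sum()

def depth_relative_to_camera(model_points_3d, target_distance):
    """Depth of every fitted model point relative to the camera: z + D."""
    return model_points_3d[:, 2] + target_distance
```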
Based on the fitting process, the terminal can establish a first three-dimensional model of the face of the target person, and the fitting process can adopt modes such as face feature point fitting, depth learning fitting, TOF camera depth fitting and the like. The first three-dimensional model may be presented in a three-dimensional point cloud corresponding to a two-dimensional image of the face of the target person, presenting the same facial expression, relative proportions and relative positions of facial features, and the like.
Illustratively, a three-dimensional model of the face of the target person is established by face feature point fitting. Fig. 5 shows an exemplary schematic diagram of a process of creating a three-dimensional model of a face. As shown in Fig. 5, feature points of the facial features, contour, and other parts of the face are first obtained from the input image. Then, using a deformable basic three-dimensional face model, the real face form is fitted according to the fitting parameters (target distance, FOV coefficient, etc.) and the fitting optimization terms (face rotation/translation/scaling parameters, face model deformation coefficients, etc.), based on the correspondence between the 3D key points of the face model and the obtained feature points. Because the positions on the three-dimensional face model corresponding to the contour feature points of the face change with the rotation angle of the face, the correspondence under different rotation angles can be selected, and multiple iterative fittings are adopted to ensure fitting accuracy. The fitting finally outputs the three-dimensional space coordinates of each point on the three-dimensional point cloud fitted to the face of the target person. In other words, the distance between the camera and the nose is determined, a standard model is placed at that distance and projected, and the projection is repeated while the coordinate distance difference between the two images (the photograph and the two-dimensional projection) is reduced to a minimum, thereby obtaining the three-dimensional face model; the depth information is a vector. The fitting is not only a fitting of shape, but also a fitting along the x-axis and the y-axis.
And the terminal transforms the pose and/or the shape of the first three-dimensional model of the face of the target person to obtain a second three-dimensional model of the face of the target person.
The pose and/or shape of the first three-dimensional model are/is adjusted according to the distance between the face of the target person and the front camera, and the first three-dimensional model can be moved backwards to obtain a second three-dimensional model of the face of the target person because the distance between the face of the target person and the front camera in the first image is smaller than a preset threshold value. The adjustment based on the three-dimensional model may simulate the adjustment of the face of the target person in the real world, so that the adjusted second three-dimensional model may exhibit a result of a backward movement of the face of the target person compared to the front camera.
In one possible implementation, the second three-dimensional model is angle-compensated when the second three-dimensional model is obtained by a position shift of the first three-dimensional model.
In the real world, when the face of the target person moves backward relative to the front camera, the face of the target person may change in angle relative to the camera. At this time, if the angle change caused by the backward movement is not desired to be retained, angle compensation may be performed. Fig. 6 shows an exemplary schematic diagram of the angular change caused by the position movement. As shown in Fig. 6, the face of the target person is initially at a distance from the front camera smaller than the preset threshold; the angle relative to the front camera is α, the vertical distance from the front camera is tz1, and the horizontal distance from the front camera is tx. When the face of the target person moves backward, the angle relative to the front camera becomes β, the vertical distance from the front camera becomes tz2, and the horizontal distance from the front camera is still tx. Thus the changing angle of the face of the target person relative to the front camera is Δθ = α − β = arctan(tx / tz1) − arctan(tx / tz2). If the first three-dimensional model is merely moved backward, the two-dimensional image obtained based on the second three-dimensional model exhibits the angle change caused by the backward movement; therefore the second three-dimensional model needs angle compensation, that is, the second three-dimensional model is rotated by the angle Δθ, so that the finally obtained two-dimensional image and the corresponding original two-dimensional image (namely, the face of the target person in the original image) keep the same posture.
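A small sketch of the compensation angle implied by Fig. 6, using the quantities tx, tz1 and tz2 defined above:

```python
import math

def compensation_angle(tx, tz1, tz2):
    # Angle to the camera before (alpha) and after (beta) the backward move.
    alpha = math.atan2(tx, tz1)
    beta = math.atan2(tx, tz2)
    # Rotate the second model back by this delta so its rendered pose matches the original.
    return alpha - beta
```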
And the terminal acquires a pixel displacement vector field of the target object according to the depth information, the first three-dimensional model and the second three-dimensional model.
After two three-dimensional models (the first three-dimensional model and the second three-dimensional model) before and after transformation are obtained, perspective projection is carried out on the first three-dimensional model according to depth information to obtain a first coordinate set, and the first coordinate set comprises two-dimensional coordinate values obtained by projection corresponding to a plurality of sampling points in the first three-dimensional model. And performing perspective projection on the second three-dimensional model according to the depth information to obtain a second coordinate set, wherein the second coordinate set comprises two-dimensional coordinate values obtained by projection corresponding to a plurality of sampling points in the second three-dimensional model.
Projection is a method of converting three-dimensional coordinates into two-dimensional coordinates, and common projection methods include orthogonal projection, perspective projection, and the like. Taking perspective projection as an example, the basic perspective projection model is composed of a view point E and a view plane P, wherein the view point E is not on the view plane P. Viewpoint E may be considered the position of the camera. The view plane P is a two-dimensional plane that renders a perspective view of the three-dimensional target object. Fig. 7 shows an exemplary schematic view of perspective projection, as shown in fig. 7, for any point X in the real world, a ray is constructed that starts at the point E and passes through the point X, and the intersection Xp of the ray and the view plane P is the perspective projection of the point X. The object of the three-dimensional world can be regarded as being formed by the set of points Xi, such that rays Ri starting from the viewpoint E and passing through the Xi points are respectively constructed, and the set of intersection points of these rays Ri with the viewing plane P is a two-dimensional projection of the object of the three-dimensional world at the viewpoint E.
Based on the principle, each sampling point of the three-dimensional model is respectively projected to obtain corresponding pixel points on the two-dimensional plane, and the pixel points on the two-dimensional plane can be represented by a coordinate value in the two-dimensional plane, so that a coordinate set corresponding to the sampling point of the three-dimensional model can be obtained. The method and the device can obtain the first coordinate set corresponding to the first three-dimensional model and the second coordinate set corresponding to the second three-dimensional model.
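A minimal sketch of this projection for points given in camera space, assuming the view plane lies at a hypothetical focal distance f from the viewpoint E:

```python
import numpy as np

def perspective_project(points_3d, focal):
    """Project Nx3 camera-space points onto the view plane as Nx2 image coordinates."""
    points_3d = np.asarray(points_3d, dtype=float)
    return focal * points_3d[:, :2] / points_3d[:, 2:3]
```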
And calculating a coordinate difference between a first coordinate value and a second coordinate value to obtain a pixel displacement vector field of the face of the target person, wherein the first coordinate value is a coordinate value corresponding to a first sampling point in a first coordinate set, the second coordinate value is a coordinate value corresponding to the first sampling point in a second coordinate set, and the first sampling point is any one point in a plurality of same point clouds contained in the first three-dimensional model and the second three-dimensional model.
The second three-dimensional model is obtained by transforming the pose and/or shape of the first three-dimensional model, so the first three-dimensional model and the second three-dimensional model contain a large number of identical sampling points; the sampling points contained in the two models may even be exactly the same. Therefore, multiple groups of coordinate values in the first coordinate set and the second coordinate set correspond to the same sampling points, that is, a given sampling point corresponds to one coordinate value in the first coordinate set and to one coordinate value in the second coordinate set. Calculating the coordinate difference between the first coordinate value and the second coordinate value means calculating the coordinate difference on the x-axis and the coordinate difference on the y-axis between the first coordinate value and the second coordinate value, respectively, to obtain the coordinate difference of the first sampling point. The coordinate differences of all the same sampling points contained in the first three-dimensional model and the second three-dimensional model are calculated to obtain the pixel displacement vector field of the face of the target person, which consists of the coordinate differences of all the sampling points.
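A self-contained sketch of computing the pixel displacement vector field from the two models, assuming both hold the same sample points in the same order (names are hypothetical):

```python
import numpy as np

def pixel_displacement_field(model_before, model_after, focal):
    def project(pts):
        pts = np.asarray(pts, dtype=float)
        return focal * pts[:, :2] / pts[:, 2:3]   # perspective projection
    coords1 = project(model_before)   # first coordinate set
    coords2 = project(model_after)    # second coordinate set
    return coords2 - coords1          # pixel displacement vector field (dx, dy)
```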
In one possible implementation, when a coordinate difference between a coordinate value corresponding to a pixel of an edge position of a face of the target person in the image in the second coordinate set and a coordinate value of a pixel of a surrounding area is greater than a preset threshold, the coordinate value in the second coordinate set is subjected to translation or scaling adjustment, and the surrounding area is adjacent to the face of the target person.
In order to keep the size and the position of the face of the target person consistent, when the edge position of the face of the target person is excessively displaced (can be measured by a preset threshold value) to cause the distortion of the background, a proper alignment point and a proper zoom scale can be selected to carry out translation or zoom adjustment on the coordinate values in the second coordinate set. The principle of the panning or zooming adjustment may be to make the displacement of the edge of the face of the target person as small as possible with respect to the surrounding area, which includes the background area or the field of view edge area, for example, when the lowest point of the first coordinate set is located at the boundary of the field of view edge area, and the lowest point of the second coordinate set is excessively deviated from the boundary, the second coordinate set is panned so that the lowest point coincides with the boundary of the field of view edge area. Fig. 8a and 8b schematically show the effect of face projection at an object distance of 30cm and 55cm, respectively.
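One possible form of this alignment, sketched under the lowest-point example above (the threshold and the choice of anchor point are assumptions):

```python
import numpy as np

def align_second_coordinate_set(coords1, coords2, shift_threshold):
    # Lowest point of each coordinate set (largest y in image coordinates).
    low1 = coords1[np.argmax(coords1[:, 1])]
    low2 = coords2[np.argmax(coords2[:, 1])]
    # If the face edge drifted too far from the field-of-view boundary, translate
    # the whole second coordinate set so the lowest points coincide again.
    if np.linalg.norm(low2 - low1) > shift_threshold:
        coords2 = coords2 + (low1 - low2)
    return coords2
```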
The terminal obtains a transformed image according to the pixel displacement vector field of the face of the target person.
In one possible implementation manner, algorithm constraint correction is performed on the face of the target person, a field of view edge area and a background area according to a pixel displacement vector field of the face of the target person, so as to obtain a transformed image, wherein the field of view edge area is a strip area positioned at the edge of the image, and the background area is other areas except the face of the target person and the field of view edge area in the image.
The image is divided into three regions, one being the region occupied by the face of the target person, the other being the edge region of the image (i.e., the field of view edge region), and the third being the background region (i.e., the portion of the background outside the face of the target person that does not include the edge region of the image).
The method comprises: determining initial image matrices respectively corresponding to the face of the target person, the field-of-view edge area, and the background area according to the pixel displacement vector field of the face of the target person; constructing constraint terms respectively corresponding to the face of the target person, the field-of-view edge area, and the background area, and constructing a regular constraint term for the image; obtaining pixel displacement matrices respectively corresponding to the face of the target person, the field-of-view edge area, and the background area according to the constraint terms respectively corresponding to these regions and the weight coefficients corresponding to the constraint terms; and obtaining the transformed image through color mapping according to the initial image matrices respectively corresponding to the face of the target person, the field-of-view edge area, and the background area and the pixel displacement matrices respectively corresponding to these regions.
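Conceptually, the weighted constraint terms are combined into a single objective over the unknown pixel displacements; the following sketch illustrates this with a generic optimizer (in practice a sparse linear solver would typically be used, and all names here are hypothetical):

```python
import numpy as np
from scipy.optimize import minimize

def solve_pixel_displacements(initial_displacements, constraint_terms, weights):
    """constraint_terms: callables mapping a flattened displacement field to a scalar
    cost (face/mask term, field-of-view edge term, background term, regular term)."""
    def energy(d):
        return sum(w * term(d) for term, w in zip(constraint_terms, weights))
    result = minimize(energy, np.ravel(initial_displacements), method="L-BFGS-B")
    return result.x.reshape(np.shape(initial_displacements))
```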
In one possible implementation, the pixel displacement vector field of the face of the target person is expanded through an interpolation algorithm to obtain a pixel displacement vector field of a mask region, the mask region comprises the face of the target person, and algorithm constraint correction is performed on the mask region, the field of view edge region and the background region according to the pixel displacement vector field of the mask region to obtain a transformed image.
The interpolation algorithm may include assigning a pixel displacement vector of a first sampling point to a second sampling point, which is any one of sampling points located outside a face region of the target person and within a mask region, as a pixel displacement vector of the second sampling point, the first sampling point being a pixel point closest to the second pixel point on a boundary contour of a face of the target person.
Fig. 9 is a schematic diagram of an exemplary pixel displacement vector expansion method, where, as shown in fig. 9, the target area is a face of a person, and the mask area is an area of a head of the person, where the mask area includes the face of the person, and the pixel displacement vector field of the face of the person is expanded to the whole mask area by using the interpolation algorithm, so as to obtain the pixel displacement vector field of the mask area.
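A minimal sketch of this nearest-boundary-point interpolation (the array layouts are assumptions):

```python
import numpy as np

def expand_to_mask(face_boundary_points, boundary_displacements, mask_points):
    """Assign each mask point outside the face the displacement vector of the
    closest point on the face boundary contour."""
    expanded = np.empty((len(mask_points), 2))
    for i, point in enumerate(mask_points):
        nearest = np.argmin(np.linalg.norm(face_boundary_points - point, axis=1))
        expanded[i] = boundary_displacements[nearest]
    return expanded
```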
The image is divided into four areas, one is the area occupied by the face, and the other is a partial area related to the face, and the partial area can be correspondingly transformed along with the pose and/or shape transformation of the target object, for example, when the face rotates, the head of the corresponding person must also rotate. The face and the partial regions form a mask region. The third is the edge region of the image (i.e., the field of view edge region), and the fourth is the background region (i.e., the portion of the background outside the object that does not include the edge region of the image).
The method can determine initial image matrices respectively corresponding to the mask area, the field-of-view edge area, and the background area according to the pixel displacement vector field of the mask area; construct constraint terms respectively corresponding to the mask area, the field-of-view edge area, and the background area, and construct a regular constraint term for the image; obtain pixel displacement matrices respectively corresponding to the mask area, the field-of-view edge area, and the background area according to the constraint terms respectively corresponding to these regions and the weight coefficients corresponding to the constraint terms; and obtain the transformed image through color mapping according to the initial image matrices respectively corresponding to the mask area, the field-of-view edge area, and the background area and the pixel displacement matrices respectively corresponding to these regions.
The constraint terms for the mask region, the background region, and the field-of-view edge region, and the regular constraint terms for image global are described below, respectively.
(1) And the constraint item corresponding to the mask region is used for constraining the target image matrix corresponding to the mask region in the image to approach to an image matrix after geometric transformation is carried out on the pixel displacement vector field of the mask region in the previous step so as to correct the distortion of the mask region. The geometric transformation represents a spatial mapping, i.e. a mapping of a pixel displacement vector field into another image matrix by transformation. The geometric transformation in the present application may be at least one of image Translation transformation (transformation), image scaling transformation (Scale), image Rotation transformation (Rotation).
For convenience of description, the constraint terms corresponding to the mask region may be simply referred to as mask constraint terms, and when a plurality of target objects exist in the image, different target objects may correspond to different mask constraint terms.
Mask constraint terms can be written as Term1, the expression for Term1 is as follows:
Term1(i,j) = SUM_{(i,j)∈HeadRegionk} || M0(i,j) + Dt(i,j) − Func1k[M1(i,j)] ||
For the image matrix M0(i,j) of a pixel point located in a head region (i.e., (i,j)∈HeadRegionk), the coordinate values of the target object after shape-preserving processing are M1(i,j) = [u1(i,j), v1(i,j)]^T, Dt(i,j) represents the displacement matrix corresponding to M0(i,j), k denotes the kth mask region of the image, Func1k denotes the geometric transformation function corresponding to the kth mask region, and ||·|| denotes a norm.
The mask constraint term Term1(i,j) requires that, under the action of the displacement matrix Dt(i,j), the image matrix M0(i,j) tends toward an appropriate geometric transformation of M1(i,j), where the geometric transformation includes at least one of image rotation, image translation and image scaling.
The geometric transformation function Func1k corresponding to the kth mask region indicates that all points in the kth mask region share the same geometric transformation function Func1k, and different mask regions correspond to different geometric transformation functions. The geometric transformation function Func1k can be expressed in particular as:
Func1k[M1(i,j)] = ρ1k·R(θ1k)·M1(i,j) + [TX1k, TY1k]^T

where R(θ1k) is the two-dimensional rotation matrix of angle θ1k, ρ1k represents the scaling factor of the kth mask region, θ1k represents the rotation angle of the kth mask region, and TX1k and TY1k represent the lateral displacement and the longitudinal displacement of the kth mask region, respectively.
Term1 (i, j) may be specifically expressed as:
Where du (i, j) and dv (i, j) are unknowns that need to be solved, and the constraint equation needs to be solved later to ensure that the term is as small as possible.
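The following sketch illustrates, under the assumption that Func1k is the two-dimensional similarity transform parameterized by ρ1k, θ1k, TX1k and TY1k as defined above, how the mask constraint residual for one region could be evaluated; the names and the exact residual form are illustrative.

```python
import numpy as np

def term1_residual(M0, Dt, M1, rho, theta, tx, ty):
    """Mask constraint residual for one mask region (illustrative sketch).

    M0, Dt, M1: arrays of shape N x 2 holding [u, v] per pixel of the region.
    rho, theta, tx, ty: similarity-transform parameters shared by the region.
    """
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    target = rho * (M1 @ R.T) + np.array([tx, ty])     # Func1_k[M1(i, j)]
    return np.linalg.norm(M0 + Dt - target, axis=1).sum()
```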
(2) The constraint item corresponding to the field of view edge region is used for constraining the pixel points in the initial image matrix corresponding to the field of view edge region in the image to displace along the edge of the image or to the outside of the image so as to maintain or enlarge the field of view edge region.
For convenience of description, the constraint Term corresponding to the field edge region may be simply referred to as a field edge constraint Term, and the field edge constraint Term may be denoted as Term3, where the expression of Term3 is as follows:
Term3(i,j) = SUM_{(i,j)∈EdgeRegion} || M0(i,j) + Dt(i,j) − Func3(i,j)[M0(i,j)] ||
where M0(i,j) represents the image coordinates of a pixel point located in the field-of-view edge region (i.e., (i,j)∈EdgeRegion), Dt(i,j) represents the displacement matrix corresponding to this M0(i,j), Func3(i,j) represents the displacement function of M0(i,j), and ||·|| denotes a norm.
The field-of-view edge constraint term needs to ensure that, under the action of the displacement matrix Dt(i,j), the image matrix M0(i,j) tends toward a proper displacement of the coordinate value M0(i,j). The displacement rule is that points may move only along the edge area or toward the outer side of the edge area, and are prevented from moving toward the inner side of the edge area. The benefit of this is that the loss of image information caused by the subsequent rectangular cropping is reduced as much as possible, and the image content of the field-of-view edge area may even be gained.
Assume that a pixel point A located in the field-of-view edge region of the image has image coordinates [u0, v0]^T, the tangential vector of point A along the field-of-view boundary is denoted as y(u0, v0), and the normal vector pointing to the outside of the image is denoted as x(u0, v0); when the boundary region is known, x(u0, v0) and y(u0, v0) are also known. Func3(i,j) may be expressed specifically as:

Func3(i,j)[M0(i,j)] = M0(i,j) + α(u0(i,j), v0(i,j))·x(u0(i,j), v0(i,j)) + β(u0(i,j), v0(i,j))·y(u0(i,j), v0(i,j))
where α(u0(i,j), v0(i,j)) needs to be limited to not less than 0 to ensure that the point does not shift toward the inside of the field-of-view edge, while the sign of β(u0(i,j), v0(i,j)) need not be limited. α(u0(i,j), v0(i,j)) and β(u0(i,j), v0(i,j)) are intermediate unknowns that do not need to be solved explicitly.
Term3 (i, j) may be expressed specifically as:
Where du (i, j) and dv (i, j) are unknowns that need to be solved, and the constraint equation needs to be solved later to ensure that the term is as small as possible.
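A sketch of how the edge constraint could be evaluated is given below, assuming each edge pixel carries precomputed unit outward-normal x(u0,v0) and tangential y(u0,v0) vectors; the decomposition with a clamped non-negative outward component follows the rule described above, and the names are illustrative.

```python
import numpy as np

def term3_residual(M0, Dt, normal, tangent):
    """Field-of-view edge constraint residual (illustrative sketch).

    M0, Dt:          N x 2 coordinates and displacements of edge pixels.
    normal, tangent: N x 2 unit outward-normal x(u0,v0) and tangential y(u0,v0) vectors.
    The displacement is projected onto the two directions; the outward component
    alpha is clamped to be non-negative so points never move toward the image interior.
    """
    alpha = np.maximum(np.sum(Dt * normal, axis=1), 0.0)   # alpha >= 0
    beta = np.sum(Dt * tangent, axis=1)                    # sign unrestricted
    allowed = alpha[:, None] * normal + beta[:, None] * tangent
    # Penalise the part of the displacement that is not an allowed edge movement.
    return np.linalg.norm((M0 + Dt) - (M0 + allowed), axis=1).sum()
```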
(3) The constraint term corresponding to the background area is used for constraining the displacement of the pixel points in the image matrix corresponding to the background area so that, for each pixel point, a first vector before the displacement and a second vector after the displacement are kept as parallel as possible. This makes the image content in the background area smooth and continuous, and makes the image content that passes behind the portrait in the background area appear continuous and consistent to the human eye. Here the first vector is the vector between the pixel point before the displacement and its corresponding neighborhood pixel point before the displacement, and the second vector is the vector between the pixel point after the displacement and its corresponding neighborhood pixel point after the displacement.
For convenience of description, the constraint Term corresponding to the background region may be simply referred to as a background constraint Term, and the background constraint Term may be denoted as Term4, where the expression of Term4 is as follows:
Term4(i,j) = SUM_{(i,j)∈BkgRegion} { Func4(i,j)( M0(i,j), M0(i,j) + Dt(i,j) ) }
where M0(i,j) represents the image coordinates of a pixel point located in the background area (i.e., (i,j)∈BkgRegion), Dt(i,j) represents the displacement matrix corresponding to this M0(i,j), and Func4(i,j) represents the displacement function of M0(i,j).
The background constraint term is to ensure that, under the action of the displacement matrix Dt(i,j), the coordinate value M0(i,j) tends toward an appropriate displacement of M0(i,j). In the application, the pixel points in the background area can be assigned to different control domains, and the application does not limit the size, shape and quantity of the control domains. In particular, for a background pixel located at the boundary between the target object and the background, its control domain needs to extend across the target object to the other end of the target object. Assume a background pixel point A with a control-domain pixel point set {Bi}: the control domain is a neighborhood of point A, and the control domain of point A spans the intermediate mask area and extends to the other end of the mask area. Bi denotes a neighborhood pixel of A; after displacement, A and {Bi} are shifted to A' and {B'i} respectively. The displacement rule is that the background constraint term restricts the vector ABi and the vector A'B'i to remain as parallel in direction as possible. The advantage of this is that a smooth transition between the target object and the background area can be ensured, the image content that passes behind the portrait in the background area remains visually continuous and consistent, and phenomena such as distortion, holes or smearing of the background image are avoided. Func4(i,j) may be expressed specifically as:
wherein:
where angle[·] represents the angle between the two vectors, vec1 represents the vector between the point [i,j]^T before correction and a point in its control domain, vec2 represents the vector between the point [i,j]^T after correction and the corresponding point in its control domain, and SUM_{(i+di,j+dj)∈CtrlRegion} represents the sum of all such vector angles over the control domain.
Term4 (i, j) may be expressed specifically as:
Term4(i,j) = SUM_{(i,j)∈BkgRegion} { SUM_{(i+di,j+dj)∈CtrlRegion} { angle[vec1, vec2] } }
Where du (i, j) and dv (i, j) are unknowns that need to be solved, and the constraint equation needs to be solved later to ensure that the term is as small as possible.
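The following sketch evaluates the background constraint for a single background pixel and its control domain, summing the angles between corresponding vectors before and after displacement; it is an illustration of the rule above, not the patent's exact formulation.

```python
import numpy as np

def term4_residual(A, A_new, ctrl_pts, ctrl_pts_new):
    """Background constraint residual for one background pixel (illustrative sketch).

    A, A_new:               [u, v] of the background pixel before / after displacement.
    ctrl_pts, ctrl_pts_new: K x 2 control-domain pixels before / after displacement.
    Sums the angles between corresponding vectors A->Bi and A'->B'i; driving the sum
    toward zero keeps the two vectors as parallel as possible.
    """
    vec1 = ctrl_pts - A
    vec2 = ctrl_pts_new - A_new
    cos = np.sum(vec1 * vec2, axis=1) / (
        np.linalg.norm(vec1, axis=1) * np.linalg.norm(vec2, axis=1) + 1e-12)
    return np.arccos(np.clip(cos, -1.0, 1.0)).sum()
```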
(4) The regular constraint item is used for constraining the difference value of the displacement matrixes of any two adjacent pixel points in the displacement matrixes respectively corresponding to the background area, the mask area and the field-of-view edge area in the image to be smaller than a preset threshold value so as to enable the overall image content of the image to be smooth and continuous.
The regular constraint term can be denoted as Term5, and the expression of Term5 is as follows:
Term5(i,j) = SUM_{(i,j)∈AllRegion} { Func5(i,j)( Dt(i,j) ) }
For the pixel points M0(i,j) in the whole image range (i.e., (i,j)∈AllRegion), the regular constraint term should ensure that the displacement matrices Dt(i,j) of neighboring pixel points are smooth and continuous, so as to avoid locally excessive changes. The limiting principle is that the difference between the displacement at point [i,j]^T and the displacement of its neighboring point (i+di, j+dj) should be as small as possible (i.e., less than a certain threshold value). Func5(i,j) can be expressed specifically as:

Func5(i,j)( Dt(i,j) ) = SUM_{(i+di,j+dj)∈Neighborhood} || Dt(i,j) − Dt(i+di, j+dj) ||
Term5(i,j) may be expressed specifically as:
Where du (i, j) and dv (i, j) are unknowns that need to be solved, and the constraint equation needs to be solved later to ensure that the term is as small as possible.
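A sketch of the smoothness (regular) constraint over right and bottom neighbours is given below; the neighbourhood choice is an assumption for illustration.

```python
import numpy as np

def term5_residual(Dt):
    """Regular (smoothness) constraint residual (illustrative sketch).

    Dt: H x W x 2 displacement field. Penalises the difference between the
    displacement of each pixel and that of its right and bottom neighbours.
    """
    dx = Dt[:, 1:, :] - Dt[:, :-1, :]   # horizontal neighbour differences
    dy = Dt[1:, :, :] - Dt[:-1, :, :]   # vertical neighbour differences
    return np.linalg.norm(dx, axis=2).sum() + np.linalg.norm(dy, axis=2).sum()
```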
Displacement matrices respectively corresponding to the areas are then obtained according to the constraint terms and the weight coefficients corresponding to the constraint terms.
Specifically, weight coefficients can be set for constraint items of a mask region, a view field edge region and a background region and regular constraint items, a constraint equation is established according to each constraint item and the corresponding weight coefficient, and the constraint equation is solved, so that the offset of each position point in each region can be obtained.
Assume that the algorithm constrains the coordinate matrix of the rectified image (which may also be referred to as the target image matrix) to be Mt(i,j) = [ut(i,j), vt(i,j)]^T, and that the displacement matrix relative to the image matrix M0(i,j) is Dt(i,j) = [du(i,j), dv(i,j)]^T, that is:
Mt(i,j)=M0(i,j)+Dt(i,j)
ut(i,j)=u0(i,j)+du(i,j)
vt(i,j)=v0(i,j)+dv(i,j)
Weight coefficients are allocated to each constraint term, and a constraint equation is constructed as follows:
Dt(i,j) = [du(i,j), dv(i,j)]^T = arg min( α1(i,j)×Term1(i,j) + α2(i,j)×Term2(i,j) + α3(i,j)×Term3(i,j) + α4(i,j)×Term4(i,j) + α5(i,j)×Term5(i,j) )
where α1(i,j) to α5(i,j) are the weight coefficients (weight matrices) corresponding to Term1 to Term5, respectively.
The constraint equation is solved by using the least squares method, a gradient descent method or various improved algorithms, finally obtaining the displacement matrix Dt(i,j) of each pixel point of the image. Based on the displacement matrix Dt(i,j), the transformed image can be obtained.
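As an illustration of the solving step, the sketch below feeds the weighted sum of the constraint terms into a generic solver from the gradient-descent family; the objective callable and its interface are assumptions, and in practice a sparse least-squares or hand-derived gradient solver would usually be preferred for speed.

```python
import numpy as np
from scipy.optimize import minimize

def solve_displacement(objective, shape):
    """Solve for the displacement field Dt minimising the weighted constraint sum.

    objective: callable taking an H x W x 2 displacement field and returning the
               scalar alpha1*Term1 + ... + alpha5*Term5 (illustrative interface).
    shape:     (H, W) of the image.
    """
    h, w = shape
    x0 = np.zeros(h * w * 2)                      # start from zero displacement

    def fun(x):
        return objective(x.reshape(h, w, 2))

    res = minimize(fun, x0, method="L-BFGS-B")    # quasi-Newton / gradient-descent family
    return res.x.reshape(h, w, 2)                 # Dt(i, j) = [du, dv]
```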
According to the application, the three-dimensional transformation effect of the image is realized by means of the three-dimensional model, perspective distortion correction is carried out on the face of the target person in the image shot at a short distance, so that the relative proportion and the relative position of the facial features of the corrected target person are closer to those of the target person, and the shooting imaging effect under a self-shooting scene can be remarkably improved.
In a possible implementation manner, for a scene of a recorded video, the terminal may perform distortion correction processing on a plurality of image frames in the recorded video by using the method in the embodiment shown in fig. 3, so as to obtain a video after distortion correction. The terminal may directly play the video after distortion correction on the screen, or the terminal may display the video before distortion correction in a part of the area on the screen and display the video after distortion correction in another part of the area in a split-screen manner, see fig. 12f.
Fig. 10 is a flowchart of a second embodiment of an image transformation method according to the present application, as shown in fig. 10, the method of this embodiment may be applied to the application architecture shown in fig. 1, and the execution body may be the terminal shown in fig. 2. The image transformation method may include:
Step 1001, acquiring a first image.
In the application, the first image is already stored in the picture library of the second terminal, and the first image can be a photo taken by the second terminal or a frame of image in a video taken by the second terminal, and the application is not limited in particular to the acquisition mode of the first image.
The first terminal and the second terminal that currently perform distortion correction processing on the first image may be the same device or different devices.
In one possible implementation manner, the second terminal may be any device having a shooting function, for example, a camera, a video camera, etc., as the device for acquiring the first image, where the second terminal stores the shot image in a local or cloud. The first terminal may be any device having an image processing function, such as a mobile phone, a computer, a tablet pc, or the like, as a processing device for performing distortion correction on the first image. The first terminal can receive the first image from the second terminal or the cloud end in a wired or wireless communication mode, and the first terminal can acquire the first image shot by the second terminal through a storage medium (such as a USB flash disk).
In one possible implementation, the first terminal has both a shooting function and an image processing function, for example, a mobile phone, a tablet computer, etc., and the first terminal acquires a first image from a local picture library, or the first terminal shoots and acquires the first image based on an instruction triggered by the shutter being pressed.
In one possible implementation, the first image includes a face of the target person, and the distortion of the face of the target person in the first image is caused by the second terminal having a target distance between the face of the target person and the second terminal that is less than a first preset threshold when the first image is captured. Namely, when the second terminal shoots the first image, the target distance between the face of the target person and the camera is smaller. When the front camera is used for self-shooting, the distance between the face and the camera is smaller, and the problem of 'near-far-small' face perspective distortion exists, for example, when the target distance between the face of a target person and the front camera is too small, the nose in an image is possibly enlarged, the face is lengthened and other effects can be caused due to the fact that the distances from different parts in the face to the camera are different. And when the distance between the face of the target person and the front camera is large, the above problem may be weakened. Therefore, the application sets a threshold value, and when the distance between the face of the target person and the front camera is considered to be smaller than the threshold value, the obtained first image containing the face of the target person is distorted, and the first image needs to be distorted and corrected. The preset threshold value is within 80 cm, and optionally, the preset threshold value can be set to be 50 cm. It should be noted that, the specific value of the preset threshold may depend on the performance of the front camera, the shooting illumination, and the like, which is not particularly limited in the present application.
The target distance between the face of the target person and the camera may be the distance between the foremost end position (e.g., nose) on the face of the target person and the camera, or the target distance between the face of the target person and the camera may be the distance between a specified portion (e.g., eyes, mouth, nose, etc.) on the face of the target person and the camera, or the target distance between the face of the target person and the camera may be the distance between the center position (e.g., nose on the front of the target person, or cheekbone position on the side of the target person, etc.) on the face of the target person and the camera. The definition of the target distance may be determined according to the specific situation of the first image, which is not particularly limited by the present application.
The terminal may obtain the target distance according to the screen ratio of the face of the target person in the first image and the FOV of the camera of the second terminal. When the second terminal includes multiple cameras, information about the camera that captured the first image is recorded in the exchangeable image file format (EXIF) information after the first image is captured; the FOV of the second terminal therefore refers to the FOV of the camera recorded in the EXIF information. The principle is the same as in step 302 above and is not repeated here. The FOV can be read directly from the EXIF information of the first image or calculated from the equivalent focal length in the EXIF information, for example FOV = 2.0×atan(43.27/2f), where 43.27 is the diagonal length (in mm) of 135 film and f represents the equivalent focal length. The terminal may also obtain the target distance according to the target shooting distance stored in the EXIF information of the first image. EXIF is defined specifically for digital camera photos and records the attribute information and shooting data of a digital photo; the terminal can directly read data such as the target shooting distance, the FOV or the equivalent focal length at the time the first image was captured from the EXIF information, and thus obtain the target distance. The principle of this method can also refer to step 302 above and is not repeated here.
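The following sketch illustrates the two quantities mentioned above: computing the FOV from the equivalent focal length in EXIF using FOV = 2·atan(43.27/2f), and a rough distance estimate from the screen ratio of the face and the FOV; the assumed average physical face width is an illustrative constant, not a value from the original.

```python
import math

def fov_from_equivalent_focal_length(f_mm):
    """FOV in radians from the 35 mm equivalent focal length stored in EXIF,
    using the 43.27 mm diagonal of a 135 film frame as in the text."""
    return 2.0 * math.atan(43.27 / (2.0 * f_mm))

def estimate_target_distance(face_width_ratio, fov_rad, real_face_width_m=0.16):
    """Rough target-distance estimate from the screen ratio of the face and the FOV.

    face_width_ratio:  face width divided by image width in the first image.
    real_face_width_m: assumed average physical face width (illustrative value).
    """
    scene_width_at_face = real_face_width_m / face_width_ratio   # scene extent at the face plane
    return scene_width_at_face / (2.0 * math.tan(fov_rad / 2.0))
```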
In one possible implementation, the first image includes a face of the target person, and the distortion of the face of the target person in the first image is caused by the FOV of the camera being greater than a second preset threshold and the pixel distance between the face of the target person and an edge of the FOV being less than a third preset threshold when the second terminal captures the first image. If the camera of the terminal is a wide-angle camera, distortion can occur when the face of the target person is at the edge of the FOV of the camera, and the distortion can be reduced or even eliminated when the face of the target person is in the middle of the FOV of the camera. Therefore, the application sets two thresholds: when the FOV of the camera of the terminal is larger than the corresponding threshold and the pixel distance between the face of the target person and the edge of the FOV is smaller than the corresponding threshold, the acquired first image containing the face of the target person is considered to be distorted and needs distortion correction. The FOV corresponds to a threshold of 90° and the pixel distance corresponds to a threshold of one quarter of the length or width of the first image. The specific values of the thresholds may depend on the performance of the camera, the shooting illumination, and the like, which is not particularly limited in the present application.
The pixel distance may be the number of pixels between the foremost end position on the face of the target person and the boundary of the first image, or the pixel distance may be the number of pixels between a specified portion on the face of the target person and the boundary of the first image, or the pixel distance may be the number of pixels between the center position on the face of the target person and the boundary of the first image.
Step 1002, displaying a distortion correction function menu.
When the face of the target person has distortion, the terminal displays a popup window for providing a selection control for whether to perform distortion correction, for example, as shown in fig. 13c, or displays a distortion correction control for opening a distortion correction function menu, for example, as shown in fig. 12 d. When the user clicks the "yes" control or clicks the distortion correction control, the distortion correction function menu is displayed in response to an instruction generated by the user operation.
The present application provides a distortion correction function menu including options for changing transformation parameters, such as an option for adjusting the equivalent simulated shooting distance, an option for adjusting a displacement distance, an option for adjusting the relative position and/or relative proportion of the five sense organs, and the like. Each option adjusts the corresponding transformation parameter by means of a slider or a control: the value of a transformation parameter can be changed by adjusting the position of a slider, or a selected value of the transformation parameter can be set by triggering a control. When the distortion correction function menu is initially displayed, the transformation parameter values in the options on the menu can be default values or pre-calculated values. For example, the slider corresponding to the equivalent simulated shooting distance may initially be located at the value 0 as shown in fig. 11, or the terminal may obtain an adjustment amount of the equivalent simulated shooting distance according to the image transformation algorithm and display the slider at the value corresponding to that adjustment amount as shown in fig. 13e.
Fig. 11 shows an exemplary schematic view of a distortion correction function menu. As shown in fig. 11, the function menu includes four areas. The left part of the function menu is used to display the first image, which is placed under a coordinate system including the x, y and z axes. The upper right part of the function menu includes two controls, one used to save the picture and the other used to start recording the transformation process of the image. The lower right part of the function menu includes an expression control and an action control. The expression templates selectable under the expression control include six expressions: keep unchanged, happy, sad, grimace, surprise and laugh. The user clicks the control of the corresponding expression to select that expression template, and the terminal can then transform the relative proportion and the relative position of the five sense organs of the face in the image in the left part according to the selected expression template; for example, the eyes on the face can be narrowed and the corners of the mouth bent upwards, the eyes and the mouth can be opened wide, or the eyes and the corners of the mouth can be bent downwards. The action templates selectable under the action control include six actions: no action, nodding, shaking the head left and right, the Xinjiang-style head shake, blinking and laughing. The user selects an action template by clicking the control of the corresponding action, and the terminal transforms the five sense organs of the face in the left part image several times in succession according to the selected action template; for example, a nod comprises the two actions of lowering and raising the head, a left-right head shake comprises the two actions of turning the head to the left and to the right, and a blink comprises the two actions of closing and opening the eyes. The lower part of the function menu includes four sliders, corresponding respectively to the distance, left-right head turning, up-down nodding and clockwise rotation: moving the distance slider adjusts the distance between the face and the camera; moving the head-turning slider adjusts the direction (left or right) and angle by which the face turns; moving the nodding slider adjusts the direction (upward or downward) and angle by which the face tilts; and moving the rotation slider adjusts the angle by which the face rotates clockwise or counterclockwise. The user can adjust the corresponding transformation parameter values by controlling the one or more sliders or controls, and the terminal obtains the corresponding transformation parameter values after detecting the operation of the user.
Step 1003, obtaining transformation parameters input by a user on the distortion correction function menu.
The transformation parameters are obtained from user operations on options contained on the menu of distortion correction functions (e.g., from user drag operations on sliders associated with the transformation parameters; and/or from user trigger operations on controls associated with the transformation parameters).
In one possible implementation, the distortion correction function menu comprises an option for adjusting the displacement distance, and the obtaining of the transformation parameters input by the user on the distortion correction function menu comprises the step of obtaining the adjustment direction and the displacement distance according to an instruction triggered by the operation of a control or a sliding block in the option for adjusting the displacement distance.
In one possible implementation, the distortion correction function menu comprises an option for adjusting the relative position and/or relative proportion of the five sense organs, and the obtaining of the transformation parameters input by the user on the distortion correction function menu comprises the steps of obtaining the adjustment direction, the displacement distance and/or the size of the five sense organs according to an instruction triggered by the operation of a control or a sliding block in the option for adjusting the relative position and/or relative proportion of the five sense organs by the user.
In one possible implementation, the distortion correction function menu comprises an angle adjustment option, and the obtaining of the transformation parameters input by the user on the distortion correction function menu comprises obtaining the adjustment direction and the adjustment angle according to an instruction triggered by the operation of a control or a sliding block in the angle adjustment option.
In one possible implementation, the distortion correction function menu comprises an option for adjusting the expression, and the obtaining of the transformation parameters input by the user on the distortion correction function menu comprises the step of obtaining a new expression template according to an instruction triggered by the operation of a control or a sliding block in the option for adjusting the expression by the user.
In one possible implementation, the distortion correction function menu comprises an option of adjusting the action, and the obtaining of the transformation parameters input by the user on the distortion correction function menu comprises obtaining a new action template according to an instruction triggered by the operation of a control or a sliding block in the option of adjusting the action by the user.
Step 1004, performing first processing on the first image to obtain a second image, where the first processing includes performing distortion correction on the first image according to the transformation parameters.
And the terminal performs perspective distortion correction on the first image according to the transformation parameters to obtain a second image, wherein the face of the target person in the second image is closer to the real appearance of the face of the target person than the face of the target person in the first image, namely, the relative proportion and the relative position of the five sense organs of the target person in the second image are closer to the relative proportion and the relative position of the five sense organs of the face of the target person than the relative proportion and the relative position of the five sense organs of the target person in the first image.
The second image obtained by perspective distortion correction of the first image can eliminate the conditions of size change, stretching and the like of the five sense organs of the face of the target person, so that the relative proportion and the relative position of the five sense organs of the face of the target person in the second image are approximate to, and even restore to, the relative proportion and the relative position of the real appearance of the face of the target person.
In a possible implementation manner, the terminal may further obtain a recording instruction according to a triggering operation of the user on the start recording control, and then start recording the second image in the process of obtaining the second image, where the start recording control becomes a stop recording control, the user triggers the stop recording control, and the terminal may receive the stop recording instruction and further stop recording.
The recording process in the application can comprise at least two of the following scenes:
One scenario is that, for a first image, before distortion correction starts, a user clicks a control for starting recording, and a terminal starts a screen recording function in response to an instruction generated by the operation, that is, records a screen of the terminal. At the moment, the user clicks a control on the distortion correction function menu or drags a sliding block on the distortion correction function menu, the transformation parameters are set, and along with the operation of the user, the terminal performs distortion correction processing or other transformation processing on the first image based on the transformation parameters, so that the processed second image is displayed on the screen. The method comprises the steps that from the process that a control or a sliding block on a distortion correction function menu is operated, to the process that a first image is converted into a second image, the process is recorded by a recording function of a terminal, when a user clicks the control for stopping recording, the terminal closes the recording function, and at the moment, the terminal acquires videos of the process.
Another scenario is that, for a video segment, before distortion correction starts, a user clicks a control for starting recording, and the terminal starts a screen recording function in response to an instruction generated by the operation, that is, records a screen of the terminal. At this time, the user clicks a control on the distortion correction function menu or drags a slider on the distortion correction function menu, the transformation parameters are set, and as the user operates, the terminal performs distortion correction processing or other transformation processing on a plurality of image frames in the video based on the transformation parameters, so that the processed video is played on the screen. From the process that the control or the sliding block on the distortion correction function menu is operated to the process that the video after the processing is played is recorded by the recording function of the terminal, when a user clicks the control for stopping recording, the terminal closes the recording function, and at the moment, the terminal acquires the video of the process.
In one possible implementation manner, the terminal may further obtain a storage instruction according to a triggering operation of the user on the picture saving control, and then store the currently obtained image in the picture library.
In the present application, the procedure of performing perspective distortion correction on the first image to obtain the second image may refer to the description of step 303, which is different in that the present application may perform image transformation for the back shift of the face of the target person, and may perform image transformation for other transformations of the face of the target person, such as panning, changing expression, changing motion, etc. On the basis, besides the distortion correction of the first image, the first image can be arbitrarily transformed, even if the first image is not distorted, the face of the target person can be transformed according to the transformation parameters input by the user on the distortion correction function menu, so that the face of the target person in the transformed image is more similar to the appearance of the transformed parameters selected by the user.
In one possible implementation, in step 1001, in addition to acquiring the first image including the face of the target person, annotation data of the first image, such as an image segmentation mask, annotation frame positions, feature point positions, and the like, may also be acquired. In step 1004, after the transformed second image is obtained, the annotation data may be transformed with the same transformation as applied in step 1004, to obtain a new annotation file.
Therefore, the image library with the annotation data can be augmented, namely, one image is utilized to generate various transformation images, the transformed images do not need to be annotated again manually, the effects of the transformed images are natural and real, the differences between the transformed images and the original images are large, and the deep learning training is facilitated.
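A minimal sketch of propagating point-type annotation data (feature points, box corners) with the same displacement field used for the image is shown below; bilinear sampling and mask warping are omitted for brevity, and the interface is an assumption.

```python
import numpy as np

def transform_annotations(points, Dt):
    """Apply the image's pixel displacement field to annotation points so that
    labels stay aligned with the transformed image.

    points: N x 2 array of [row, col] annotation coordinates in the first image.
    Dt:     H x W x 2 displacement field produced by the image transformation.
    """
    rows = np.clip(points[:, 0].astype(int), 0, Dt.shape[0] - 1)
    cols = np.clip(points[:, 1].astype(int), 0, Dt.shape[1] - 1)
    return points + Dt[rows, cols]   # displaced annotation coordinates
```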
In one possible implementation, in step 1003, a transformation of the first three-dimensional model is specified, for example, shifting the first three-dimensional model left or right by about 6 cm (approximately the distance between the left and right eyes of a face). After the transformed second image is obtained, the first image and the second image are used as the input of the left eye and the right eye of a VR device respectively, so that a three-dimensional display effect of the target subject can be realized.
Therefore, the common two-dimensional image or video can be converted into the VR input film source, the three-dimensional display effect of the face of the target person is realized, and the depth of the three-dimensional model is considered, so that the three-dimensional effect of the whole target object can be realized.
According to the application, a three-dimensional transformation effect of the image is realized by means of the three-dimensional model, and perspective distortion correction is performed on the face of the target person in an image in the picture library that was shot at close range, so that the relative proportion and the relative position of the facial features of the corrected target person are closer to those of the real target person, improving the imaging effect. Further, the face of the target person is correspondingly transformed according to the transformation parameters input by the user, so that diverse transformations of the face and virtual transformation of the image can be realized.
In the above embodiment, a three-dimensional model is built for a face of a target person in an image, so as to implement distortion correction for the face. The image transformation method provided by the application can be also suitable for distortion correction of any other target object, and is different in that the object for building the three-dimensional model is changed from the face of the target person to the target object. The present application is not particularly limited to the object of image transformation.
The image transformation method of the present application is described below in two specific embodiments.
Fig. 12a to 12f illustrate a process of distortion correction of a terminal in a self-timer scene.
As shown in fig. 12a, the user clicks a camera icon on the desktop of the terminal, turning on the camera.
As shown in fig. 12b, the camera opens with the rear camera enabled by default, and the image of the target scene acquired by the rear camera is displayed on the screen of the terminal. The user clicks the camera switching control on the photographing function menu to switch to the front camera.
As shown in fig. 12c, a first image of a target scene acquired by a front camera is displayed on a screen of the terminal, where the target scene includes a face of a user.
When the distance between the face of the target person in the first image and the front camera is less than the preset threshold,
In one case, the terminal performs distortion correction on the first image by using the image transformation method in the embodiment shown in fig. 3 to obtain a second image, and the second image is displayed on the screen of the terminal as a preview image because the shutter is not triggered. As shown in fig. 12d, at this time, the terminal displays the "distortion correction" word on the screen, and displays the close control, and if the user does not need to display the corrected preview image, the user may click on the close control, and after receiving the corresponding instruction, the terminal displays the first image as the preview image on the screen.
In another case, as shown in fig. 12e, the user clicks the shutter control on the photographing function menu, and the terminal performs distortion correction on the photographing image (the first image) by using the method in the embodiment shown in fig. 3 to obtain a second image, where the second image is stored in the picture library.
In the third case, as shown in fig. 12f, the terminal corrects the distortion of the first image to obtain the second image by using the method in the embodiment shown in fig. 3, and since the shutter is not triggered, the terminal displays both the first image and the second image as preview images on the screen of the terminal, so that the user can intuitively see the difference between the first image and the second image.
Fig. 13a to 13h illustrate a process of correcting distortion of an image in a picture library.
As shown in fig. 13a, the user clicks a picture library icon on the desktop of the terminal, and opens the picture library.
As shown in fig. 13b, the user selects a first image to be subjected to distortion correction in the image library, and when the first image is shot, the distance between the face of the target person and the camera is smaller than a preset threshold. The foregoing distance information may be obtained by the distance obtaining method in the embodiment of the method shown in fig. 10, which is not described herein.
As shown in fig. 13c, when the terminal detects that the face of the target person in the currently displayed image is distorted, a popup window can be displayed on the screen. The popup window shows a prompt asking whether the image is distorted and whether distortion correction should be performed, with "yes" and "no" controls displayed below it; when the user clicks "yes", the distortion correction function menu is displayed. It should be noted that fig. 13c provides an example of a popup window for letting the user select whether to perform distortion correction, but this does not limit the interface or display content of the popup window; for example, the text content, text size and font displayed on the popup window, and the content of the two controls corresponding to "yes" and "no", may be implemented in other manners, and the implementation of the popup window is not specifically limited in the present application.
When the terminal detects that the face of the target person in the currently displayed image has distortion, a control can be displayed on the screen as shown in fig. 12d, and the control is used for opening or closing a distortion correction function menu. It should be noted that fig. 12d provides an example of a trigger control of the distortion correction function menu, but this does not limit the interface or display content of the pop-up window, for example, the position of the control, the content on the control, etc. may be implemented in other manners, and the implementation manner of the control is not specifically limited in the present application.
The distortion correction function menu may be as shown in fig. 11 or 14.
When the distortion correction function menu is initially displayed, the transformation parameter values in each option on the menu can be default values or pre-calculated values. For example, the slider corresponding to the distance may be initially located at a value of 0 as shown in fig. 11, or the terminal obtains an adjustment amount of the distance according to the image conversion algorithm, and displays the slider at a position corresponding to the adjustment amount as shown in fig. 13 d.
As shown in fig. 13e, the user can adjust the distance by dragging the slider corresponding to the distance. As shown in fig. 13f, the user can also select the expression (happiness) that he wishes to transform by triggering the control. This process may refer to the method in the embodiment shown in fig. 10, and will not be described here again.
As shown in fig. 13g, when the user is satisfied with the result of the transformation, the control for saving the picture can be triggered, and when the terminal receives an instruction for saving the picture, the currently obtained image is saved in the picture library.
As shown in fig. 13h, before the user selects the transformation parameter, the user may trigger a start recording control, and after receiving an instruction to start recording, the terminal starts a recording screen to record the second image acquisition process. In the process, the starting recording control is changed into the stopping recording control, and when the user triggers the stopping recording control, the terminal stops recording the screen.
Embodiment three, fig. 14 exemplarily shows other examples of the distortion correction function menu.
As shown in fig. 14, the distortion correction function menu includes a control for selecting and adjusting the position of the five sense organs, the user selects the nose (the oval control in front of the nose is black), then clicks the control for showing the enlargement, and the terminal enlarges the nose of the target person according to the set step length by adopting the method in the embodiment shown in fig. 10.
The embodiments shown in fig. 11, fig. 12a to fig. 12f, fig. 13a to fig. 13h, and fig. 14 are examples, but the embodiments of the present application are not limited to the configuration of the desktop, the distortion correction function menu, the photographing interface, and the like of the terminal, and the present application is not particularly limited to this.
Fig. 15 is a schematic structural diagram of an embodiment of an image transformation device according to the present application. As shown in fig. 15, the device of this embodiment may be applied to the terminal shown in fig. 2. The image transformation device includes an acquisition module 1501, a processing module 1502, a recording module 1503, and a display module 1504.
In a self-shooting scene, an acquisition module 1501 is configured to acquire a first image for a target scene through a front camera, where the target scene includes a face of a target person, acquire a target distance between the face of the target person and the front camera, and a processing module 1502 is configured to perform a first process on the first image to obtain a second image when the target distance is less than a preset threshold, where the first process includes performing distortion correction on the first image according to the target distance, where the face of the target person in the second image is closer to a real appearance of the face of the target person than the face of the target person in the first image.
In one possible implementation, the target distance includes a distance between a front-most portion on a face of the target person and the front-facing camera, or a distance between a designated portion on the face of the target person and the front-facing camera, or a distance between a center position on the face of the target person and the front-facing camera.
In one possible implementation manner, the obtaining module 1501 is specifically configured to obtain a screen ratio of a face of the target person in the first image, and obtain the target distance according to the screen ratio and the field angle FOV of the front camera.
In one possible implementation, the acquiring module 1501 is specifically configured to acquire the target distance by a distance sensor, where the distance sensor includes a time-of-flight ranging TOF sensor, a structured light sensor, or a binocular sensor.
In one possible implementation, the preset threshold is less than 80 cm.
In one possible implementation, the second image includes a preview image or an image obtained after triggering a shutter.
In one possible implementation, the face of the target person in the second image is closer to the real appearance of the face of the target person than the face of the target person in the first image, including that the relative proportion of the five sense organs of the target person in the second image is closer to the relative proportion of the five sense organs of the face of the target person than the relative proportion of the five sense organs of the target person in the first image, and/or that the relative position of the five sense organs of the target person in the second image is closer to the relative position of the five sense organs of the face of the target person than the relative position of the five sense organs of the target person in the first image.
In a possible implementation manner, the processing module 1502 is specifically configured to fit, according to the target distance, a face of the target person in the first image to a standard face model to obtain depth information of the face of the target person, and perform perspective distortion correction on the first image according to the depth information to obtain the second image.
In one possible implementation manner, the processing module 1502 is specifically configured to establish a first three-dimensional model of a face of the target person, transform a pose and/or a shape of the first three-dimensional model to obtain a second three-dimensional model of the face of the target person, obtain a pixel displacement vector field of the face of the target person according to the depth information, the first three-dimensional model and the second three-dimensional model, and obtain the second image according to the pixel displacement vector field of the face of the target person.
In a possible implementation manner, the processing module 1502 is specifically configured to perform perspective projection on the first three-dimensional model according to the depth information to obtain a first coordinate set, where the first coordinate set includes coordinate values corresponding to a plurality of pixels in the first three-dimensional model, perform perspective projection on the second three-dimensional model according to the depth information to obtain a second coordinate set, where the second coordinate set includes coordinate values corresponding to a plurality of pixels in the second three-dimensional model, calculate a coordinate difference between a first coordinate value and a second coordinate value to obtain a pixel displacement vector field of the target object, where the first coordinate value includes coordinate values corresponding to a first pixel in the first coordinate set, and the second coordinate value includes coordinate values corresponding to a first pixel in the second coordinate set, where the first pixel includes any one of a plurality of identical pixels included in the first three-dimensional model and the second three-dimensional model.
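As a sketch of this projection-difference computation, the following assumes a simple pinhole camera model with illustrative intrinsics (the original does not specify the projection parameters):

```python
import numpy as np

def pixel_displacement_field(model1_pts, model2_pts, fx, fy, cx, cy):
    """Project matching vertices of the first and second three-dimensional models
    with a pinhole perspective model and take the coordinate difference as the
    pixel displacement vector of each shared vertex (illustrative sketch).

    model1_pts, model2_pts: N x 3 arrays of the same vertices before / after the
                            pose or shape transformation (camera coordinates).
    fx, fy, cx, cy:         assumed pinhole intrinsics.
    """
    def project(pts):
        u = fx * pts[:, 0] / pts[:, 2] + cx
        v = fy * pts[:, 1] / pts[:, 2] + cy
        return np.stack([u, v], axis=1)

    coords1 = project(model1_pts)   # first coordinate set
    coords2 = project(model2_pts)   # second coordinate set
    return coords2 - coords1        # pixel displacement vectors of the shared pixels
```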
In another scenario, the acquisition module 1501 is configured to acquire a first image, where the first image includes a face of a target person and the face of the target person in the first image is distorted; the display module 1504 is configured to display a distortion correction function menu; the acquisition module 1501 is further configured to acquire transformation parameters input by a user on the distortion correction function menu, where the transformation parameters at least include an equivalent simulated shooting distance, and the equivalent simulated shooting distance is used to simulate the distance between the face of the target person and the camera when the shooting terminal shoots the face of the target person; the processing module 1502 is configured to perform first processing on the first image to obtain a second image, where the first processing includes performing distortion correction on the first image according to the transformation parameters, and the face of the target person in the second image is closer to the real appearance of the face of the target person than the face of the target person in the first image.
In one possible implementation manner, the distortion of the face of the target person in the first image is caused by that a target distance between the face of the target person and the second terminal when the second terminal captures the first image is smaller than a first preset threshold, wherein the target distance comprises a distance between a front-most end position on the face of the target person and the front camera, or a distance between a designated part on the face of the target person and the front camera, or a distance between a center position on the face of the target person and the front camera.
In one possible implementation, the target distance is obtained by a screen ratio of a face of the target person in the first image and a FOV of a camera of the second terminal, or by an equivalent focal length in exchangeable image file format EXIF information of the first image.
In one possible implementation manner, the distortion correction function menu includes an option for adjusting an equivalent simulated shooting distance, and the obtaining module 1501 is specifically configured to obtain the equivalent simulated shooting distance according to an instruction triggered by a user operating a control or a slider in the option for adjusting the equivalent simulated shooting distance.
In one possible implementation manner, the value of the equivalent simulated shooting distance in the option of adjusting the equivalent simulated shooting distance includes a default value or a pre-calculated value when the distortion correction function menu is initially displayed.
In a possible implementation manner, the display module 1504 is further configured to display a pop-up window when there is distortion in the face of the target person, where the pop-up window is used to provide a selection control for correcting distortion, and respond to an instruction generated by a user operation when the user clicks the control for correcting distortion on the pop-up window.
In a possible implementation manner, the display module 1504 is further configured to display a distortion correction control when there is distortion in the face of the target person, where the distortion correction control is used to open the distortion correction function menu, and when the user clicks the distortion correction control, respond to an instruction generated by the user operation.
In one possible implementation, the distortion of the face of the target person in the first image is caused by the fact that when the second terminal captures the first image, the field angle FOV of the camera is greater than a second preset threshold, and the pixel distance between the face of the target person and the edge of the FOV is less than a third preset threshold, wherein the pixel distance comprises the number of pixels between the foremost end position on the face of the target person and the edge of the FOV, or the number of pixels between a designated part on the face of the target person and the edge of the FOV, or the number of pixels between the center position on the face of the target person and the edge of the FOV.
In one possible implementation, the FOV is derived from EXIF information of the first image.
In one possible implementation, the second preset threshold is 90 °, and the third preset threshold is one fourth of the length or width of the first image.
In one possible implementation manner, the distortion correction function menu includes an option for adjusting the displacement distance, and the obtaining module 1501 is further configured to obtain the adjustment direction and the displacement distance according to an instruction triggered by the operation of a control or a slider in the option for adjusting the displacement distance by a user.
In one possible implementation manner, the distortion correction function menu includes an option for adjusting the relative position and/or relative proportion of the five sense organs, and the obtaining module 1501 is further configured to obtain the adjustment direction, the displacement distance and/or the size of the five sense organs according to an instruction triggered by the operation of the control or the slider in the option for adjusting the relative position and/or relative proportion of the five sense organs by the user.
In one possible implementation manner, the distortion correction function menu includes an angle adjustment option, the obtaining module 1501 is further configured to obtain an adjustment direction and an adjustment angle according to an instruction triggered by an operation of a control or a slider in the angle adjustment option by a user, or the distortion correction function menu includes an expression adjustment option, the obtaining module 1501 is further configured to obtain a new expression template according to an instruction triggered by an operation of a control or a slider in the expression adjustment option by a user, or the distortion correction function menu includes an adjustment action option, and the obtaining module 1501 is further configured to obtain a new action template according to an instruction triggered by an operation of a control or a slider in the adjustment action option by a user.
In one possible implementation, the face of the target person in the second image is closer to the real appearance of the face of the target person than the face of the target person in the first image, including that the relative proportion of the five sense organs of the target person in the second image is closer to the relative proportion of the five sense organs of the face of the target person than the relative proportion of the five sense organs of the target person in the first image, and/or that the relative position of the five sense organs of the target person in the second image is closer to the relative position of the five sense organs of the face of the target person than the relative position of the five sense organs of the target person in the first image.
In a possible implementation manner, the processing module 1502 is specifically configured to fit, according to the target distance, a face of the target person in the first image to a standard face model to obtain depth information of the face of the target person, and perform perspective distortion correction on the first image according to the depth information and the transformation parameter to obtain the second image.
In one possible implementation manner, the processing module 1502 is specifically configured to establish a first three-dimensional model of a face of the target person, transform a pose and/or a shape of the first three-dimensional model according to the transformation parameters to obtain a second three-dimensional model of the face of the target person, obtain a pixel displacement vector field of the face of the target person according to the depth information, the first three-dimensional model and the second three-dimensional model, and obtain the second image according to the pixel displacement vector field of the face of the target person.
In a possible implementation manner, the processing module 1502 is specifically configured to perform perspective projection on the first three-dimensional model according to the depth information to obtain a first coordinate set, where the first coordinate set includes coordinate values corresponding to a plurality of pixels in the first three-dimensional model, perform perspective projection on the second three-dimensional model according to the depth information to obtain a second coordinate set, where the second coordinate set includes coordinate values corresponding to a plurality of pixels in the second three-dimensional model, calculate a coordinate difference between a first coordinate value and a second coordinate value to obtain a pixel displacement vector field of the target object, where the first coordinate value includes coordinate values corresponding to a first pixel in the first coordinate set, and the second coordinate value includes coordinate values corresponding to a first pixel in the second coordinate set, where the first pixel includes any one of a plurality of identical pixels included in the first three-dimensional model and the second three-dimensional model.
In one possible implementation manner, the apparatus further includes a recording module 1503. The acquisition module is further configured to acquire a recording instruction according to a triggering operation performed by a user on a recording control, and the recording module 1503 is configured to start recording the process of obtaining the second image according to the recording instruction, until a recording stop instruction generated by a triggering operation performed by the user on a stop-recording control is received.
In one possible implementation manner, the display module 1504 is configured to display a distortion correction function menu on a screen, where the distortion correction function menu includes one or more sliders and/or one or more controls; receive a distortion correction instruction, where the distortion correction instruction includes a transformation parameter generated when a user performs a touch operation on the one or more sliders and/or the one or more controls, and the transformation parameter includes at least an equivalent simulated shooting distance, the equivalent simulated shooting distance being used to simulate the distance between the face of the target person and the camera when the shooting terminal shoots the face of the target person; and perform first processing on the first image according to the transformation parameter to obtain a second image, where the first processing includes correcting the first image, and the face of the target person in the second image is closer to the real appearance of the face of the target person than the face of the target person in the first image.
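The mapping from a slider position to the equivalent simulated shooting distance and to the final warp is not specified; the sketch below is one illustrative possibility, assuming a precomputed dense displacement field `dense_flow`, a linear slider-to-distance mapping, and the 0.3 m to 1.5 m range as placeholder values (none of these are taken from the patent).

```python
import cv2
import numpy as np

def apply_correction(first_image, dense_flow, slider_value,
                     min_distance_m=0.3, max_distance_m=1.5):
    """dense_flow: (H, W, 2) per-pixel displacement (in pixels) at full correction.
    slider_value in [0, 1] selects the equivalent simulated shooting distance and,
    in this simple sketch, also scales the correction strength."""
    simulated_distance_m = min_distance_m + slider_value * (max_distance_m - min_distance_m)
    h, w = first_image.shape[:2]
    grid_x, grid_y = np.meshgrid(np.arange(w, dtype=np.float32),
                                 np.arange(h, dtype=np.float32))
    # Backward mapping: the corrected (second) image samples the first image
    # at the displaced positions.
    map_x = grid_x - slider_value * dense_flow[..., 0]
    map_y = grid_y - slider_value * dense_flow[..., 1]
    second_image = cv2.remap(first_image, map_x, map_y, cv2.INTER_LINEAR,
                             borderMode=cv2.BORDER_REPLICATE)
    return second_image, simulated_distance_m
```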
The device of the present embodiment may be used to implement the technical solution of the method embodiment shown in fig. 3 or fig. 10, and its implementation principle and technical effects are similar, and are not described herein again.
In implementation, the steps of the foregoing method embodiments may be completed by hardware integrated logic circuits in a processor or by instructions in the form of software. The processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in the embodiments of the present application may be directly performed by a hardware coding processor, or performed by a combination of hardware and software modules in the coding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads information from the memory and completes the steps of the foregoing methods in combination with its hardware.
The memory mentioned in the foregoing embodiments may be a volatile memory or a nonvolatile memory, or may include both a volatile memory and a nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example but not limitation, many forms of RAM are available, such as a static random access memory (SRAM), a dynamic random access memory (DRAM), a synchronous dynamic random access memory (SDRAM), a double data rate synchronous dynamic random access memory (DDR SDRAM), an enhanced synchronous dynamic random access memory (ESDRAM), a synchlink dynamic random access memory (SLDRAM), and a direct rambus random access memory (DR RAM). It should be noted that the memory of the systems and methods described herein is intended to include, but is not limited to, these and any other suitable types of memory.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
When the functions are implemented in the form of a software functional unit and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the present application essentially, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of the present application, and the protection scope of the present application is not limited thereto. Any variation or substitution readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (21)

Translated from Chinese

1. An image transformation method, comprising:
acquiring a first image of a target scene through a front camera, the target scene comprising a face of a target person;
acquiring a target distance between the face of the target person and the front camera; and
when the target distance is less than a preset threshold, performing first processing on the first image to obtain a second image, the first processing comprising performing distortion correction on the first image according to the target distance, wherein the face of the target person in the second image is closer to the real appearance of the face of the target person than the face of the target person in the first image;
wherein the performing distortion correction on the first image according to the target distance comprises:
fitting the face of the target person in the first image to a standard face model according to different rotation angles of the face of the target person and the target distance, to obtain depth information of the face of the target person; and
performing perspective distortion correction on the first image according to the depth information to obtain the second image.

2. The method according to claim 1, wherein the target distance comprises a distance between a frontmost part of the face of the target person and the front camera; or a distance between a specified part of the face of the target person and the front camera; or a distance between a center position of the face of the target person and the front camera.

3. The method according to claim 1 or 2, wherein the acquiring the target distance between the face of the target person and the front camera comprises:
acquiring a screen ratio of the face of the target person in the first image; and
obtaining the target distance according to the screen ratio and a field of view (FOV) of the front camera.

4. The method according to claim 1 or 2, wherein the acquiring the target distance between the face of the target person and the front camera comprises:
acquiring the target distance through a distance sensor, the distance sensor comprising a time-of-flight (TOF) sensor, a structured light sensor, or a binocular sensor.

5. The method according to any one of claims 1 to 4, wherein the preset threshold is less than 80 centimeters.

6. The method according to any one of claims 1 to 5, wherein the second image comprises a preview image or an image obtained after a shutter is triggered.

7. The method according to any one of claims 1 to 6, wherein the face of the target person in the second image being closer to the real appearance of the face of the target person than the face of the target person in the first image comprises:
the relative proportions of the facial features of the target person in the second image are closer to the relative proportions of the facial features of the real face of the target person than the relative proportions of the facial features of the target person in the first image; and/or
the relative positions of the facial features of the target person in the second image are closer to the relative positions of the facial features of the real face of the target person than the relative positions of the facial features of the target person in the first image.

8. The method according to any one of claims 1 to 7, wherein the performing perspective distortion correction on the first image according to the depth information to obtain the second image comprises:
establishing a first three-dimensional model of the face of the target person;
transforming a pose and/or a shape of the first three-dimensional model to obtain a second three-dimensional model of the face of the target person;
acquiring a pixel displacement vector field of the face of the target person according to the depth information, the first three-dimensional model, and the second three-dimensional model; and
obtaining the second image according to the pixel displacement vector field of the face of the target person.

9. The method according to claim 8, wherein the acquiring the pixel displacement vector field of the face of the target person according to the depth information, the first three-dimensional model, and the second three-dimensional model comprises:
performing perspective projection on the first three-dimensional model according to the depth information to obtain a first coordinate set, the first coordinate set comprising coordinate values corresponding to a plurality of pixels in the first three-dimensional model;
performing perspective projection on the second three-dimensional model according to the depth information to obtain a second coordinate set, the second coordinate set comprising coordinate values corresponding to a plurality of pixels in the second three-dimensional model; and
calculating a coordinate difference between a first coordinate value and a second coordinate value to obtain the pixel displacement vector field of the face of the target person, wherein the first coordinate value comprises a coordinate value corresponding to a first pixel in the first coordinate set, the second coordinate value comprises a coordinate value corresponding to the first pixel in the second coordinate set, and the first pixel is any one of a plurality of identical pixels included in both the first three-dimensional model and the second three-dimensional model.

10. An image transformation apparatus, comprising:
an acquisition module, configured to acquire a first image of a target scene through a front camera, the target scene comprising a face of a target person, and to acquire a target distance between the face of the target person and the front camera; and
a processing module, configured to, when the target distance is less than a preset threshold, perform first processing on the first image to obtain a second image, the first processing comprising performing distortion correction on the first image according to the target distance, wherein the face of the target person in the second image is closer to the real appearance of the face of the target person than the face of the target person in the first image;
wherein the processing module is specifically configured to fit the face of the target person in the first image to a standard face model according to different rotation angles of the face of the target person and the target distance to obtain depth information of the face of the target person, and to perform perspective distortion correction on the first image according to the depth information to obtain the second image.

11. The apparatus according to claim 10, wherein the target distance comprises a distance between a frontmost part of the face of the target person and the front camera; or a distance between a specified part of the face of the target person and the front camera; or a distance between a center position of the face of the target person and the front camera.

12. The apparatus according to claim 10 or 11, wherein the acquisition module is specifically configured to acquire a screen ratio of the face of the target person in the first image, and to obtain the target distance according to the screen ratio and a field of view (FOV) of the front camera.

13. The apparatus according to claim 10 or 11, wherein the acquisition module is specifically configured to acquire the target distance through a distance sensor, the distance sensor comprising a time-of-flight (TOF) sensor, a structured light sensor, or a binocular sensor.

14. The apparatus according to any one of claims 10 to 13, wherein the preset threshold is less than 80 centimeters.

15. The apparatus according to any one of claims 10 to 14, wherein the second image comprises a preview image or an image obtained after a shutter is triggered.

16. The apparatus according to any one of claims 10 to 15, wherein the face of the target person in the second image being closer to the real appearance of the face of the target person than the face of the target person in the first image comprises:
the relative proportions of the facial features of the target person in the second image are closer to the relative proportions of the facial features of the real face of the target person than the relative proportions of the facial features of the target person in the first image; and/or
the relative positions of the facial features of the target person in the second image are closer to the relative positions of the facial features of the real face of the target person than the relative positions of the facial features of the target person in the first image.

17. The apparatus according to any one of claims 10 to 16, wherein the processing module is specifically configured to establish a first three-dimensional model of the face of the target person; transform a pose and/or a shape of the first three-dimensional model to obtain a second three-dimensional model of the face of the target person; acquire a pixel displacement vector field of the face of the target person according to the depth information, the first three-dimensional model, and the second three-dimensional model; and obtain the second image according to the pixel displacement vector field of the face of the target person.

18. The apparatus according to claim 17, wherein the processing module is specifically configured to perform perspective projection on the first three-dimensional model according to the depth information to obtain a first coordinate set, the first coordinate set comprising coordinate values corresponding to a plurality of pixels in the first three-dimensional model; perform perspective projection on the second three-dimensional model according to the depth information to obtain a second coordinate set, the second coordinate set comprising coordinate values corresponding to a plurality of pixels in the second three-dimensional model; and calculate a coordinate difference between a first coordinate value and a second coordinate value to obtain the pixel displacement vector field of the face of the target person, wherein the first coordinate value comprises a coordinate value corresponding to a first pixel in the first coordinate set, the second coordinate value comprises a coordinate value corresponding to the first pixel in the second coordinate set, and the first pixel is any one of a plurality of identical pixels included in both the first three-dimensional model and the second three-dimensional model.

19. A device, comprising:
one or more processors; and
a memory, configured to store one or more programs,
wherein when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method according to any one of claims 1 to 9.

20. A computer-readable storage medium, comprising a computer program, wherein when the computer program is executed on a computer, the computer is caused to perform the method according to any one of claims 1 to 9.

21. A computer program product, comprising computer program code, wherein when the computer program code runs on a computer or a processor, the computer or the processor is caused to perform the method according to any one of claims 1 to 9.
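Claims 3 and 12 derive the target distance from the face's screen ratio and the front camera's field of view without spelling out a formula. The sketch below is one plausible reading under a pinhole model; the 0.16 m average face width and the function name are illustrative assumptions, not values from the patent.

```python
import math

def estimate_target_distance(face_width_px, image_width_px, horizontal_fov_deg,
                             assumed_face_width_m=0.16):
    """Estimate the face-to-camera distance from the face's screen ratio and the
    front camera's horizontal field of view."""
    screen_ratio = face_width_px / image_width_px          # fraction of the frame spanned by the face
    half_fov = math.radians(horizontal_fov_deg) / 2.0
    # At distance d the camera sees a plane of width 2 * d * tan(half_fov);
    # a face of known physical width occupies `screen_ratio` of that plane,
    # so d = face_width / (2 * screen_ratio * tan(half_fov)).
    return assumed_face_width_m / (2.0 * screen_ratio * math.tan(half_fov))

# Example: a face spanning 60% of a 1080-pixel-wide frame seen by an 80-degree
# front camera gives roughly 0.16 / (2 * 0.6 * tan(40 deg)) ~= 0.16 m,
# well below an 80 cm threshold.
```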
CN202010600182.1A (priority date and filing date: 2020-06-28), Image transformation method and device, status: Active, granted publication: CN113850709B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202010600182.1A (CN113850709B (en)) | 2020-06-28 | 2020-06-28 | Image transformation method and device

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202010600182.1A (CN113850709B (en)) | 2020-06-28 | 2020-06-28 | Image transformation method and device

Publications (2)

Publication Number | Publication Date
CN113850709A (en) | 2021-12-28
CN113850709B (en) | 2025-07-15

Family

ID=78972743

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202010600182.1A (Active, CN113850709B (en)) | Image transformation method and device | 2020-06-28 | 2020-06-28

Country Status (1)

CountryLink
CN (1)CN113850709B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN114359837A (en)* | 2022-01-12 | 2022-04-15 | 杭州登虹科技有限公司 | Face image correction method in security scene
CN116739908A (en)* | 2022-03-02 | 2023-09-12 | 华为技术有限公司 | Image processing methods, devices and equipment
CN115239576B (en)* | 2022-06-15 | 2023-08-04 | 荣耀终端有限公司 | Photo optimization method, electronic equipment and storage medium
CN115376203B (en)* | 2022-07-20 | 2025-01-21 | 华为技术有限公司 | A data processing method and device thereof

Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN105046246A (en)* | 2015-08-31 | 2015-11-11 | 广州市幸福网络技术有限公司 | Identification photo camera capable of performing human image posture photography prompting and human image posture detection method
CN111080545A (en)* | 2019-12-09 | 2020-04-28 | Oppo广东移动通信有限公司 | Face distortion correction method and device, terminal equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US8577118B2 (en)* | 2008-01-18 | 2013-11-05 | Mitek Systems | Systems for mobile image capture and remittance processing
JP5694060B2 (en)* | 2011-06-06 | 2015-04-01 | シャープ株式会社 | Image processing apparatus, image processing method, program, imaging apparatus, and television receiver
JP2015095857A (en)* | 2013-11-14 | 2015-05-18 | キヤノン株式会社 | Imaging apparatus
CN111027474B (en)* | 2019-12-09 | 2024-03-15 | Oppo广东移动通信有限公司 | Face area acquisition method, device, terminal equipment and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN105046246A (en)* | 2015-08-31 | 2015-11-11 | 广州市幸福网络技术有限公司 | Identification photo camera capable of performing human image posture photography prompting and human image posture detection method
CN111080545A (en)* | 2019-12-09 | 2020-04-28 | Oppo广东移动通信有限公司 | Face distortion correction method and device, terminal equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Ohad Fried et al., "Perspective-aware Manipulation of Portrait Photos", SIGGRAPH '16 Technical Papers, 2016-07-28, pp. 1-10.*

Also Published As

Publication number | Publication date
CN113850709A (en) | 2021-12-28

Similar Documents

Publication | Title
CN110445978B (en) | A shooting method and equipment
CN110072070B (en) | A kind of multi-channel video recording method and equipment, medium
US12299859B2 (en) | Image transformation method and apparatus
CN113850709B (en) | Image transformation method and device
CN113747050B (en) | Shooting method and equipment
CN112614057B (en) | Image blurring processing method and electronic equipment
CN114205515B (en) | Anti-shake processing method for video and electronic equipment
WO2020192458A1 (en) | Image processing method and head-mounted display device
WO2021078001A1 (en) | Image enhancement method and apparatus
CN114092364A (en) | Image processing method and related equipment
CN113741681B (en) | Image correction method and electronic device
CN112700377A (en) | Image floodlight processing method and device and storage medium
CN113572956A (en) | Focusing method and related equipment
CN113711123B (en) | Focusing method and device and electronic equipment
US12382163B2 (en) | Shooting method and related device
WO2021185374A1 (en) | Image capturing method and electronic device
WO2020249076A1 (en) | Face calibration method and electronic device
CN113572957A (en) | A shooting focusing method and related equipment
WO2022033344A1 (en) | Video stabilization method, and terminal device and computer-readable storage medium
CN115150542B (en) | Video anti-shake method and related equipment
CN114302063A (en) | Shooting method and equipment
CN113472996B (en) | Picture transmission method and device
CN117221722A (en) | Video anti-shake method and electronic equipment
WO2022218216A1 (en) | Image processing method and terminal device
CN117714849B (en) | Image shooting method and related equipment

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
