CN107077719B - Perspective correction based on depth map in digital photos - Google Patents

Perspective correction based on depth map in digital photos

Info

Publication number
CN107077719B
CN107077719B · CN201580057165.5A
Authority
CN
China
Prior art keywords
photograph
pixel
camera
depth
new
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201580057165.5A
Other languages
Chinese (zh)
Other versions
CN107077719A (en)
Inventor
T·斯沃特达尔
P·克雷恩
N·塔拉隆
J·迪马雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Polarite Corp
Original Assignee
Polarite Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Polarite Corp
Publication of CN107077719A
Application granted
Publication of CN107077719B
Status: Active
Anticipated expiration

Abstract

The present invention relates to post-processing of digital photographs to correct perspective distortion in the photograph. The correction technique uses a digital photograph of a scene and a depth map associated with the photograph, the depth map holding a depth for each pixel in the photograph, the depth being the distance between the part of the scene represented by the pixel and the position of the camera at the time the photograph was taken. The correction is performed locally, so that the correction of any pixel in the photograph depends on the depth of that pixel. The correction technique may be implemented by converting each pixel of the original photograph to a new position in the corrected photograph. Pixel values must then be calculated for the pixels of the corrected photograph using the original pixel values and their new positions. The invention is particularly relevant for photographs of objects or scenes involving significant magnification differences, such as self-photographs, close-up photographs and photographs in which the extension of a large object is not orthogonal to the optical axis of the camera (low/high-angle shots).

Description

Perspective correction based on depth map in digital photos
Technical Field
The present invention relates to post-processing of digital photographs, and more particularly to a method, a digital storage holding software, an integrated circuit, and a handheld device with a camera for performing such post-processing.
Background
Distortion of perspective
Photography provides a 2D representation of the 3D world. This 3D to 2D conversion is achieved by projecting a 3D scene onto a 2D sensor with a lens.
At great distances, and if the optical axis of the camera is perpendicular to the extension of the object, the perspective projection provides a photograph that is "pleasing to the eye" and looks "as expected". However, if the distances from the camera to the closest and farthest parts of the object differ greatly, the near part will appear at a different magnification than the far part, and this difference in magnification causes perspective distortion in the picture of the object.
There are many well-known perspective distortion effects. When the extension of a larger object is not orthogonal to the optical axis, parallel lines in the object are not parallel in the photograph, since the near end is magnified more than the far end. This is the case, for example, when shooting skyscrapers at a low angle. Another frequently encountered effect is that when the distance between the camera and the object is of the same order of magnitude as the depth of the topology of the object, the near parts appear disproportionate to the far parts. This is the case, for example, in a self-photograph (a picture the subject takes of himself or herself with a hand-held camera), where the arm's-length distance (30 to 50 cm) between the camera and the subject's head and the distance between the nose and the ears are of roughly the same order of magnitude, making the nose look unnaturally large.
Thus, perspective distortion can affect any photograph in which an object or scene involves a high magnification difference.
Perspective correction
The problem of perspective correction has been partially solved in some specific applications, most of the time using tools that require user interaction. Existing tools allow correction of photographs taken at low and high angles, but these corrections are based on knowledge of the camera position and orientation or on an overall correction of the assumed geometry of the scene (e.g. parallel lines in a building).
Most currently available correction schemes only correct optical distortions caused by the camera optics, such as fisheye, barrel or pincushion distortion. In these cases, the optical distortion is modeled and a global correction is applied.
DxO ViewPoint 2 (http://www.dxo.com/intl/photography/dxo-viewpoint/wide-angle-lens-software) represents the current state of the art in perspective distortion correction. The DxO ViewPoint application allows correction of perspective distortion caused by the camera optics when using a wide-angle lens. The application also allows correction of vanishing lines by performing different projections, but this correction is independent of the distance between the camera and the object or scene and cannot correct near and far distortions simultaneously. The applied correction is global and is applied independently of the topology of the scene or objects in the photograph.
The global correction is based on a smooth correction function determined by a few parameters relating to camera-extrinsic and camera-intrinsic parameters and/or to "prior" data or user-defined data. For example, Figs. 1A-1D illustrate image correction by DxO ViewPoint, applying the same correction technique to a photograph and a checkerboard pattern. Fig. 1A shows the original photograph and pattern; lens distortion correction is applied in Fig. 1B, perspective correction (natural mode) in Fig. 1C, and perspective correction (complete mode) in Fig. 1D. A problem with global correction techniques such as those in DxO ViewPoint is that they are applied over the entire image without regard to the subject. The correction may be adjusted to a given object in a given plane at a given location through user interaction, but if the scene includes objects that are not equidistant from the camera, the correction cannot be satisfactory for all of them.
There are smartphone applications such as SkinneePix and Facetune that allow users to improve the appearance of self-photographs. SkinneePix simply warps the picture geometrically, relying on the picture containing only one face at its center, i.e. a known geometry. Facetune allows local changes of shape, but is essentially a simplified Photoshop tool dedicated to facial photographs that lets the user control local distortion. Facetune is not a perspective correction tool and does not rely on image depth information.
Software exists to create 3D models from multiple cameras (e.g. RGB + TOF camera, stereo camera, or structured-light systems such as Microsoft Kinect). Another example is the paper "Kinect-Variety Fusion: A Novel Hybrid Approach for Artifacts-free 3DTV Content Generation" by Mansi Sharma et al., which describes the extraction of depth information by combining multiple sensors, using a Microsoft Kinect camera and structured-light technology to improve depth map extraction. The second part of the paper (Section B, page 2277) relates to content generation for 3DTV, generating new views of a scene from multiple images captured by cameras moving relative to the scene. These stereoscopic techniques do not provide a method to automatically correct perspective distortion in a single photograph. 3D graphics libraries such as OpenGL likewise provide no method for perspective correction using depth information.
To date, the tools available for perspective correction apply either global corrections, which presume some geometry of the object (straight lines, a face in the center) or recording conditions (low angle, close range), or local corrections, which require user interaction and the user's knowledge of the natural or desired appearance of the scene or object (commonly known as photoshopping).
Disclosure of Invention
It is an object of the present invention to provide a method, a digital storage holding software, an integrated circuit and a handheld device with a camera for eliminating or compensating local magnification differences throughout a photo by performing perspective correction using depth information for the photo. Another object is to perform such perspective correction automatically, without requiring a priori knowledge or user input about the natural appearance or topology of the scene or object in the photograph, or about the position or orientation of the camera.
To perform the perspective correction of the present invention, a photograph alone is not sufficient. The distance between the camera and the scene for each pixel in the photograph, i.e. a depth map, is required. Once the photograph and depth map are obtained, a photograph as taken by a virtual camera from a different perspective (a different angle, a farther distance) with the same field of view can be reconstructed using the image and appropriate processing, thus eliminating or compensating any perspective distortion. Instead of a global conversion applied to the entire image, a local correction of each pixel is performed according to the pixel's distance taken from the associated depth map.
Accordingly, in a first aspect, the present invention provides a method for performing perspective correction in a digital photograph of a scene recorded by a camera, the method comprising the steps of:
providing a digital photograph of the scene represented by pixel values P_(x,y) for an array of pixels (x, y), the photograph suffering from perspective distortion effects;
providing a depth map relating to the photograph and comprising a depth d_(x,y) for each pixel in the array of pixels, the depth d_(x,y) being the distance between the part of the scene represented by the pixel and the acquisition position of the camera when the photograph was taken; and
performing perspective correction using a photograph of the scene and its associated depth map, the correction of any pixel or region in the photograph depending on the depth of the pixel or the depth of the pixels in the region.
In this context, correction is to be understood as a process of making a change to solve a problem or produce a desired effect, and is not limited to a change made to make something accurate or correct according to a standard. The perspective correction may therefore also be a correction made to achieve a specific effect in the photograph. Also, the originally obtained photograph with perspective distortion, i.e. the original or source photograph, is often referred to as the "captured photograph", while the final or target photograph, on which the perspective correction has been performed, is generally referred to as the "processed photograph". In a more detailed definition, the depth is the distance between an object and the object principal plane of the camera lens. For a mobile phone, this plane is very close to the protection window of the camera. It should be noted that the depth map may have a different (and typically lower) resolution than the photograph. In this case, the depth map may be upscaled so that it has the same resolution and valid depth information is obtained for each pixel value.
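Where the depth map has a lower resolution than the photograph, the upscaling step might be sketched as follows (illustrative Python/OpenCV code, not part of the patent text; nearest-neighbour interpolation is an assumption chosen to avoid mixing depths across object edges):

import cv2

def upscale_depth_map(depth_map, photo_shape):
    # Resize the depth map to the photo resolution so that a valid
    # depth d(x, y) is available for every pixel (x, y).
    h, w = photo_shape[:2]
    return cv2.resize(depth_map, (w, h), interpolation=cv2.INTER_NEAREST)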
The perspective correction of the present invention is advantageous in that it can be performed without user interaction, so it can be implemented as a fully automatic correction that provides an immediate improvement of the photograph. Another advantage is that the perspective correction of the present invention is local, so that the local perspective distortion of each pixel or region in the photograph can be corrected. The correction can thus simultaneously and independently correct perspective distortion of different parts of the scene lying at different depths.
In a preferred embodiment, the step of performing perspective correction comprises: performing perspective correction of the photograph using the photograph of the scene and its associated depth map, i.e. for each pixel (x, y) in the photograph, determining a new position D_proc in the image plane of a virtual camera position at least from the depth d_acq(x, y) of the pixel, its position D_acq(x, y) in the image plane of the acquisition position, and a displacement C between the virtual camera position and the acquisition position. Preferably, the new position is calculated so as to maintain the magnification in a selected plane of the depth map.
In this relationship, the displacement is the difference between the final and original position (movement in the x, y, z direction) and direction (rotation about the x, y, z axis), i.e. here the difference between the camera position at the time of taking the picture and the virtual camera position. Furthermore, the magnification is the ratio between the real size of the object and the size of the object in the photograph, which for each pixel is equal to the ratio between the depth of the pixel and the effective focal length.
In an alternative, the step of performing perspective correction using the at least one photograph of the scene and its associated depth map preferably comprises the steps of:
using the depth of each pixel (x, y) in the photograph to calculate the magnification of that pixel; and
using the calculated magnification to calculate a new position (x', y') for each pixel in the photograph, such that all new positions have the same magnification.
In another embodiment, the step of performing perspective correction comprises determining the new position as follows:

D_proc = D_acq * d_acq/(d_acq + C)
Moreover, the step of performing perspective correction may further comprise using the depth d_acq_ref of a reference plane, selected so as to maintain the magnification within the selected plane of the depth map, to adjust the magnification of the new position. This embodiment may be implemented by determining the new position as follows:

D_proc = D_acq * d_acq * (d_acq_ref + C)/(d_acq_ref * (d_acq + C))
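As a minimal sketch of this per-pixel conversion (illustrative Python/NumPy code, not part of the patent text; the optical axis is assumed to pass through the image centre, and the focal length cancels out of the position ratio):

import numpy as np

def new_positions(depth, C, d_ref):
    # depth: depth map d_acq(x, y) in metres, upscaled to photo resolution.
    # C: displacement of the virtual camera along the optical axis (+ away).
    # d_ref: depth of the reference plane whose magnification is maintained.
    h, w = depth.shape
    y, x = np.mgrid[0:h, 0:w].astype(np.float64)
    xc, yc = x - w / 2.0, y - h / 2.0   # coordinates relative to the optical axis
    # Radial scaling per the conversion: D_proc = D_acq * d*(d_ref + C)/(d_ref*(d + C))
    scale = depth * (d_ref + C) / (d_ref * (depth + C))
    return xc * scale + w / 2.0, yc * scale + h / 2.0   # new (decimal) positions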
If a new position has been determined, the step of performing perspective correction preferably further comprises: using the pixel values P_(x,y) and the new positions (x', y') to determine new pixel values P_(i,j) for the array of pixels (i, j); the corrected photograph is rendered by, for each new position (x', y'), adding the corresponding pixel value P_(x,y) to the new pixel values P_(i,j) of the pixels (i, j) surrounding the new position, wherein the pixel value P_(x,y) is weighted by a weighting factor that is a function of the new position (x', y') and the relative position of each pixel (i, j).
If a new pixel value is generated by the addition of weighted values of one or more pixel values in the acquired picture, the perspective correction preferably further comprises the subsequent steps of: i.e. each new pixel value P(i,j)Divided by a normalization factor. In areas where the photograph is "stretched," some new pixels may not be near any new locations and thus may not have any pixel values assigned to them. Therefore, the method preferably further comprises the subsequent steps of: for having an indeterminate value P(i,j)According to the determined value P in the corrected picture(i,j)The interpolated pixel value is calculated.
In a preferred implementation, the displacement between the virtual camera position and the acquisition position is a linear displacement along the optical axis of the camera. It may also be preferred to place the virtual camera position at an infinite distance, so that the magnification becomes equal for all depths.
The depth map may be generated by a multi-camera or a single-camera setup. In a preferred embodiment, the steps of providing a photograph and providing the related depth map involve only a single camera with a single acquisition position. The single camera preferably has an adjustable lens or any other optical element with adjustable optical magnification.
The step of providing a depth map may involve generating the depth map using focus-based depth map estimation, such as depth from focus (DFF, also known as shape from focus) or depth from defocus (DFD). By definition, DFF or DFD provides a depth map that is fully consistent with a photograph taken by the same camera at the same position, thus offering the advantage of reduced processing complexity and eliminating any calibration between the photograph and the depth map.
In an alternative embodiment, the steps of providing a picture and providing a depth map involve the use of multiple cameras, for example:
using separate image and depth-map cameras;
generating the image and depth map using a stereo or array camera; or
generating the image and depth map using multiple cameras and from different perspectives.
Since not all photographs are perspective distorted to an extent where perspective correction is needed or preferred, and since perspective correction requires some processing power, it may be preferable to select the photographs for which perspective correction is needed. Thus, in a preferred embodiment, the steps of providing a photograph and providing a depth map may comprise providing a series of photographs and their associated depth maps, wherein the method further comprises the step of detecting and evaluating perspective distortion in the photographs in order to select photographs that would benefit from perspective correction. The evaluation of perspective distortion may be based, for example, on the distance to the nearest object in the scene and/or on an analysis of vanishing lines.
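A hedged sketch of one such selection heuristic, based on the nearest-object distance mentioned above (the limits are illustrative assumptions, not values from the patent):

def needs_correction(depth, near_limit=0.8, ratio_limit=1.5):
    # Flag a photo when the nearest object is close to the camera (metres)
    # and the near/far depth span within the scene is large.
    d_near = float(depth.min())
    d_far = float(depth.max())
    return d_near < near_limit and d_far > ratio_limit * d_near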
In a further embodiment, the same conversion used to correct perspective distortion in the photograph may be applied to the depth map itself, to generate a depth map associated with the corrected photograph. Here, the method of the first aspect further comprises performing perspective correction also on the depth map, i.e. for each pixel (x, y) in the depth map, determining a new position D_proc in the image plane of the virtual camera position at least from the depth d_acq(x, y) of the pixel, its position D_acq(x, y) in the image plane of the acquisition position, and a displacement C between the virtual camera position and the acquisition position. In this respect, a depth map is simply a single-channel image (all pixel values are depths), while a color photograph is typically a three-channel image (pixel values in one of three color channels, e.g. red, green and blue). The processing is equivalent; the only difference is that only the depth map itself is required, since the depth of each pixel is inherent to it. Preferably, the displacement C is added to each depth (pixel value) of the corrected depth map.
The perspective correction according to the first aspect of the invention may be performed immediately in the camera, or as post-processing in the camera or in other devices.
The invention may be implemented in hardware, software, firmware or any combination of these. Thus, in a second aspect, the invention provides a digital storage holding software which, when executed by one or more digital processors, performs the method of the first aspect. The digital storage may be any one or more readable media that can store digital code, such as a diskette, hard drive, RAM or ROM, and the software may reside on a single medium (e.g. the memory of a device with a camera) or be distributed over several media, for example the hard drives of different servers connected via a network, or other types of electronic storage.
In a third aspect, the invention provides an integrated circuit configured to perform the method of the first aspect.
Similarly, in a fourth aspect, the invention provides a hand-held or portable device with a camera comprising the data memory of the second aspect or the integrated circuit of the third aspect.
Drawings
The invention will be described in more detail with reference to the accompanying drawings. The drawings illustrate one way of carrying out the invention and are not intended to limit other possible embodiments within the scope of the appended claims.
Fig. 1A-D show the image correction applied by DxO ViewPoint 2, the correction applied to the photo and checkerboard pattern being the same.
Fig. 2 and 3 show an arrangement for explaining an applicable algebraic derivation according to an embodiment of the invention.
Fig. 4 shows the way in which the new pixel value is calculated.
Fig. 5A-C illustrate the use of an adaptive kernel for interpolating pixel values for pixels having undetermined pixel values.
FIG. 6 is a schematic system diagram illustrating an embodiment of the method of the present invention and a schematic representation of an overview of the operation of the computer program product of the present invention.
Detailed Description
The following description focuses mainly on the perspective distortion that occurs when the ratio between the distances from the camera to the farthest and closest parts of the scene is high and generates severe distortion. This occurs mainly in close-up or low-angle photography. However, the perspective correction of the present invention can be applied to any scene topology.
The perspective correction method of the present invention converts a photograph from the camera point of view (POV) at acquisition to a virtual POV, such that in the processed photograph the perspective distortion effect is weakened, negligible or absent. A further object is to convert to a new POV (distant or at infinity) while maintaining the same object size.
Therefore, for each pixel (x, y) in the acquired photograph, the new position (x', y') at which its pixel value P_(x,y) appears in the processed photograph must be calculated. The calculated new position is a function of the pixel position (x, y) in the acquired photograph and the distance d_(x,y) between the camera and the part of the scene seen in that pixel.
The following describes an embodiment in which the displacement between the camera position at the time of taking the photograph (also referred to as the original camera position) and the camera position of the virtual POV is a linear displacement along the camera's optical axis (here the Z-axis). More complex displacements (displacements in the x/y directions and rotations) can be applied, but the algebra for such displacements, although readily derived, is quite extensive.
Fig. 2 shows the setup with the object, the camera position at which the photograph is taken, and the virtual camera position; the camera has a lens with focal length f.
The following symbols will be used in the description and are shown in fig. 2 and 3:
d: depth of a pixel
D: distance of a pixel from the sensor center/optical axis
C: displacement along the Z-axis between the acquisition position and the virtual position; +: away from the scene; -: closer to the scene
Subscript "acq": refers to the acquired photograph
Subscript "proc": refers to the processed photograph from the virtual camera position
Coordinates/index (x, y): integer position of a pixel in the acquired photograph
P: pixel value, e.g. RGB or another color space
Coordinates/index (x', y'): new (decimal) position of the pixel value of pixel (x, y) after conversion
Coordinates/index (i, j): integer position of a pixel in the processed photograph
The following geometrical relationships can be taken from fig. 2:
d_acq/D = f/D_acq
d_proc/D = f/D_proc
d_proc = d_acq + C

=> D_proc/D_acq = d_acq/d_proc
=> D_proc = D_acq * d_acq/d_proc
=> D_proc = D_acq * d_acq/(d_acq + C)    (1)
As previously mentioned, the magnification is the ratio between the real size of the object and the size of the object in the photograph. With reference to Fig. 2, the magnification in the acquired photograph may be expressed as D/D_acq = d_acq/f, and the magnification in the processed photograph as D/D_proc = d_proc/f = (d_acq + C)/f.
The conversion (1) changes the magnification of the whole picture. If we want a reference plane in which the magnification factor is one, we need to calculate the magnification factor at that distance. This process is illustrated in Fig. 3. The reference plane is preferably selected close to the center of the object in the direction towards the camera. For example, for a face, the plane of the face contour (hair/ears) may be chosen as the reference plane, maintaining the head size while correcting the nose distortion.
The magnification on the reference plane is:
D_proc_ref/D_acq_ref = d_acq_ref/(d_acq_ref + C)    (2)

Substituting the reference magnification (2) into the conversion (1) yields:

D_proc = D_acq * d_acq * (d_acq_ref + C)/(d_acq_ref * (d_acq + C))    (3)
If C is infinite (same magnification for all objects), we get:

D_proc = D_acq * d_acq/d_acq_ref    (4)
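As a quick numeric illustration of (4) (with assumed values, not from the patent): in a selfie with the nose at d_acq = 0.30 m and the reference plane at the ears, d_acq_ref = 0.45 m, moving the virtual camera to infinity scales the radial positions of nose pixels by 0.30/0.45 ≈ 0.67 while leaving the reference plane unchanged, shrinking the rendered nose by about a third relative to the head.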
Because of the axial symmetry around the optical axis (Z-axis), the conversion expressed in (3) is naturally written in polar coordinates: D is the radial coordinate, and the angular coordinate φ is unaffected by the conversion. It should be noted that other expressions with the same or similar results as conversion (3) may be developed using, for example, other coordinate systems, other camera optics, or other conditions. An important feature of conversion (3) is that the conversion, and thus the correction, of any pixel in the photograph depends on the depth d_acq of that pixel. Therefore, if a photograph and an associated depth map are obtained with perspective distortion, and a virtual camera position is selected for which the perspective distortion is significantly reduced or absent, this perspective correction is in principle done.
The conversion (3) can be used to calculate a pixel having a pixel value P(x,y)If the photograph is taken with a camera at a virtual location, the pixel will be at that location (x ', y'). As can be inferred from (3), the new position is the pixel depth d(x,y)Position D and the displacement C between the virtual camera position and the position at which the photograph was taken. The new position maintains the magnification of the selected plane of the depth map by substituting the reference magnification in (2) into (3).
In a preferred embodiment, the conversion (3) is used as a forward conversion: the new position (x', y') of the pixel value P_(x,y) is computed from the original pixel position (x, y). The forward conversion involves some complications, as several pixels of the acquired photograph may contribute to a single pixel in the processed photograph, while some pixels in the processed photograph may receive no contribution at all. An inverse conversion could also be used, but its computational requirements are higher for perspective correction, so a forward mapping is preferably implemented.
In the forward conversion, the acquired photograph is scanned and new coordinates are calculated for each point. However, in order to represent the converted photograph in a standard digital format with a regular array of pixels of the same size, each holding a single value in some color space, the converted photograph requires more processing. Merely repositioning each pixel (x, y) to its new position (x', y') would create a picture with overlapping points (multi-source pixels) and with points holding no pixel value (black holes).
For each source pixel P_(x,y), x and y are integer values, while the coordinates x' and y' in the processed photograph are decimal values. These coordinates may be represented as an integer part (i, j) plus a fractional part (δx, δy):

x' = i + δx
y' = j + δy
First, a pixel value P(x,y)Are assigned to pixels within the pixel array in the processed photograph as explained in connection with fig. 4. P(x,y)'s is the pixel value in the target picture, and X's is the pixel value P in the processed picture(i,j)Is measured in the center of the pixel (i, j). For each calculated new position, the corresponding pixel value will act on the pixel values of nearby pixels in the processed picture. In a preferred embodiment, this is achieved as follows.
For each pixel in the acquired photograph, a new pixel value is determined by adding the corresponding pixel value P_(x,y) to the new pixel values P_(i,j) of the four pixels closest to the new position in the corrected photograph. When added, the pixel value P_(x,y) is weighted by a weighting factor that is a function of the new position (x', y') and the relative position of the pixel (i, j), resulting in a bilinear interpolation, as shown in Fig. 4:

P_(i,j) → P_(i,j) + P_(x,y) * (1 - δx) * (1 - δy)
P_(i+1,j) → P_(i+1,j) + P_(x,y) * δx * (1 - δy)
P_(i,j+1) → P_(i,j+1) + P_(x,y) * (1 - δx) * δy
P_(i+1,j+1) → P_(i+1,j+1) + P_(x,y) * δx * δy
Thus, in the processed photograph, weighted values of the original pixel values are accumulated in each pixel. To normalize the new pixel values in the processed photograph, each pixel value is subsequently divided by a normalization factor.
In practical applications, a "photo accumulation buffer" is created. First it is filled with zeros, and each time a pixel value P_(x,y) contributes to a pixel P_(i,j) of the processed photograph, the weighted value of P_(x,y) is accumulated in the buffer. At the same time, the sum of the weighting factors (e.g. (1 - δx)*(1 - δy)) is accumulated in a "weighting buffer". Once the forward mapping is complete, each pixel of the photo accumulation buffer is divided by the corresponding weighting factor of the weighting buffer to generate a "weighted perspective corrected photograph". This solves the problem that some pixels in the processed photograph receive information from several pixels of the acquired photograph through the forward mapping.
In the above, the acquired pixel values are accumulated in the surrounding pixels using bilinear interpolation and weighting factors such as δx*δy. However, other ways of distributing the acquired pixel values, and thus other weighting factors, are possible, and other interpolation methods such as bicubic or spline interpolation may also be used.
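A minimal sketch of this forward mapping with the two buffers (illustrative Python/NumPy code, looping for clarity; xp and yp are the new-position maps as returned by the new_positions sketch above):

import numpy as np

def forward_map(photo, xp, yp):
    # photo: (h, w, 3) acquired photo; xp, yp: decimal positions (x', y').
    h, w = photo.shape[:2]
    acc = np.zeros((h, w, 3), dtype=np.float64)  # photo accumulation buffer
    wgt = np.zeros((h, w), dtype=np.float64)     # weighting buffer
    for y in range(h):
        for x in range(w):
            i, j = int(np.floor(xp[y, x])), int(np.floor(yp[y, x]))
            dx, dy = xp[y, x] - i, yp[y, x] - j
            for ii, jj, wf in ((i, j, (1 - dx) * (1 - dy)),
                               (i + 1, j, dx * (1 - dy)),
                               (i, j + 1, (1 - dx) * dy),
                               (i + 1, j + 1, dx * dy)):
                if 0 <= ii < w and 0 <= jj < h:
                    acc[jj, ii] += wf * photo[y, x]  # accumulate weighted value
                    wgt[jj, ii] += wf                # accumulate weighting factor
    valid = wgt > 0
    out = np.zeros_like(acc)
    out[valid] = acc[valid] / wgt[valid][:, None]    # divide by the weighting buffer
    # The depth map (a single-channel image) can be warped the same way,
    # adding C to each warped depth value afterwards.
    return out, ~valid                               # corrected photo and hole mask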
The weighted perspective corrected photograph may contain "holes", i.e. pixels to which no pixel of the acquired photograph contributed and whose value is therefore undetermined. To fill the holes, interpolated pixel values are calculated for pixels with undetermined values P_(i,j) from the surrounding pixels of the corrected photograph that have determined values P_(i,j). In the preferred embodiment, the weighted perspective corrected photograph is scanned, and each time a hole is found, an interpolated value for that pixel is calculated by Inverse Distance Weighting (IDW) from the valid pixels. More information about IDW can be found at, e.g., http://en.wikipedia.org/wiki/Inverse_distance_weighting.
Since the size of a hole is unknown, the distance to the nearest determined pixel value is also unknown. To ensure fast processing and avoid unfilled holes, an adaptive kernel size may be used, as shown in Figs. 5A-C: in Figs. 5A and 5B a 3×3 kernel is used. In Fig. 5A, the value of the active pixel, i.e. of the hole, can be calculated by IDW from the 6 surrounding pixels that have determined pixel values. In Fig. 5B, however, the surrounding pixels provide no data for the IDW, so no determined value is available for the active pixel/hole, and the kernel size must be increased, for example to the 5×5 kernel of Fig. 5C. Here, the value of the active pixel/hole can be calculated by IDW from the 10 surrounding pixels that have determined values.
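A minimal sketch of the adaptive-kernel IDW hole filling (illustrative code; the kernel growth strategy and maximum radius are assumptions):

import numpy as np

def fill_holes_idw(img, holes, max_radius=7):
    # img: (h, w, 3) weighted perspective corrected photo; holes: boolean mask
    # of pixels with undetermined values.
    h, w = img.shape[:2]
    out = img.copy()
    for y, x in zip(*np.nonzero(holes)):
        r = 1                                   # start with a 3x3 kernel
        while r <= max_radius:
            y0, y1 = max(0, y - r), min(h, y + r + 1)
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            yy, xx = np.mgrid[y0:y1, x0:x1]
            ok = ~holes[y0:y1, x0:x1]           # determined neighbours only
            if ok.any():
                d = np.hypot(yy[ok] - y, xx[ok] - x)
                wts = 1.0 / d                   # inverse distance weights
                out[y, x] = (wts[:, None] * img[y0:y1, x0:x1][ok]).sum(0) / wts.sum()
                break
            r += 1                              # grow the kernel: 5x5, 7x7, ...
    return out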
If a hole is too large for pixel interpolation, it can instead be filled by texture mapping. For example, in a selfie, the side of the nose may be an area where holes grow as the virtual distance increases; in that case, skin texture mapping may be used.
Fig. 6 shows the overall process of the perspective correction method.
As mentioned before, the depth map itself is a single-channel image whose pixel values are all depths, and it can be corrected by the same conversion process and steps as the photograph of the scene. The displacement C between the actual and virtual camera positions may be added to each depth/pixel value after the conversion. The corrected depth map is then associated with the corrected photograph and provides the distances from the different parts of the scene to the image plane of the virtual camera position.
Self-photographing application
The preferred embodiment of the present invention relates to perspective correction in self-photographs. A self-photograph is, by its very nature, a photograph taken at close range, for example with the built-in camera of a mobile phone, laptop or tablet (the largest possible distance usually being an arm's length). These close-range photographs most often show perspective distortion: the face is close to the camera, and the ratio between the distance from the camera to the farthest part (ears) and the distance to the closest part (nose) differs strongly from unity.
Thus, self-photographs are typically suited for an automatic perspective correction technique, and in order to pick out photographs that would benefit from the aforementioned perspective correction, automatic detection and evaluation of perspective distortion can be combined with pattern recognition that detects, in the depth map, the depth differences characteristic of a human face.
Also, in self-photography, the background (the part of the scene behind the photographed person) can be determined using the depth map information. Thus, in a preferred embodiment, the method of the invention involves detecting one or more foreground objects and the background in the photograph by analysing the depth map, identifying regions in which the depth changes rapidly and regions at least partially outlined by such regions; regions with a smaller average depth are identified as foreground objects and regions with a larger average depth as the background. In an alternative embodiment, the method may involve detecting one or more foreground objects and the background by analysing the depth map so that regions with a depth of less than 300 cm (e.g. less than 200 cm, 150 cm or 100 cm) are identified as foreground objects and regions with greater depth as the background; a sketch of this variant is given below.
After detecting the background and foreground parts of the photograph, the background can be replaced with other image content (a photograph, a painting, graphics, or any combination thereof) while the foreground objects are retained.
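A minimal sketch of the threshold-based variant and the background replacement (illustrative code; the depth map is assumed to be given in centimetres):

import numpy as np

def split_foreground(depth_cm, threshold=300.0):
    # Regions closer than the threshold become foreground, the rest background.
    fg = depth_cm < threshold
    return fg, ~fg

def replace_background(photo, new_bg, fg_mask):
    # Keep the foreground objects, substitute other image content behind them.
    return np.where(fg_mask[..., None], photo, new_bg)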
Technical implementation
The invention may be implemented in hardware, software, firmware or any combination thereof. The invention, or some of its features, may also be implemented as software running on one or more data processors and/or digital signal processors. Fig. 6 can also be regarded as a schematic system block diagram depicting an overview of the operation of an embodiment of the computer program product according to the second aspect of the present invention. The individual elements of a hardware embodiment of the invention may be physically, functionally and logically implemented in any suitable way, e.g. in a single unit, in a plurality of units, or as parts of separate functional units. The invention may be implemented in a single unit, or be physically and functionally distributed between different units and processors.
The integrated circuit of the third aspect of the invention may be a general-purpose ISP, a microprocessor or an application-specific integrated circuit (ASIC), or some part thereof. This may be advantageous especially for hand-held or portable devices with cameras as described in the fourth aspect of the invention, for which low cost, power consumption, weight, volume, heat generation, etc. are important. Handheld devices with cameras include digital cameras, mobile phones, tablet computers, mp3 players and the like. Portable devices with cameras include, for example, laptop computers.
While the invention has been described in connection with specific embodiments, the invention should not be construed as being limited in any way to the examples presented. The scope of the invention is to be construed in accordance with the substance defined by the following claims. In the context of the claims, the term "comprising" or "comprises" does not exclude other possible elements or steps. Furthermore, references to words such as "a" or "an" should not be construed as excluding the plural. The use of reference signs in the claims with respect to elements illustrated in the figures shall not be construed as limiting the scope of the invention. Furthermore, advantageous combinations of features may be obtained from the individual features mentioned in different claims, and the mentioning of these features in different claims does not exclude that a combination of features is possible and advantageous.

Claims (11)

1. A method of automatically performing perspective correction in a digital photograph of a scene recorded with a camera, the method comprising the steps of:
using only a single camera with a single acquisition position,
providing a digital photograph of the scene represented by pixel values P_(x,y) for an array of pixels (x, y), said photograph suffering from perspective distortion effects;
providing a depth map relating to the photograph and comprising a depth d_(x,y) which, for each pixel in the array of pixels, refers to the distance between the part of the scene represented by the pixel and the acquisition position of the camera at the time the photograph was taken; and
using the photograph of the scene from the single camera and acquisition position and its associated depth map as the only input recorded by the camera to perform a perspective correction of the photograph, i.e. for each pixel (x, y) in the photograph, determining a new position D_proc in the image plane of a virtual camera position at least from the depth d_acq(x, y) of the pixel, its position D_acq(x, y) in the image plane of the acquisition position, and a linear displacement C along the camera optical axis between the virtual camera position and said acquisition position, as follows:

D_proc = D_acq * d_acq/(d_acq + C)
2. The method of claim 1, wherein the step of performing perspective correction further comprises: also using the depth d_acq_ref of a reference plane to adjust the magnification of the new position, the reference plane being selected so as to maintain the magnification of the selected plane of the depth map.
3. The method of claim 2, wherein the step of performing perspective correction comprises determining the new position as follows:

D_proc = D_acq * d_acq * (d_acq_ref + C)/(d_acq_ref * (d_acq + C))
4. The method of claim 1, wherein the step of performing perspective correction further comprises: using said pixel values P_(x,y) and said new positions (x', y') to determine new pixel values P_(i,j) for the array of pixels (i, j), by, for each new position (x', y'), adding the corresponding pixel value P_(x,y) to the new pixel values P_(i,j) of the pixels (i, j) surrounding the new position, so as to render a corrected photograph, wherein the pixel value P_(x,y) is weighted by a weighting factor that is a function of the new position (x', y') and the relative position of each pixel (i, j).
5. The method of claim 4, further comprising: subsequently dividing each new pixel value P_(i,j) by a normalization factor.
6. The method of claim 4 or 5, further comprising: subsequently, for pixels having an undetermined value P_(i,j), calculating an interpolated pixel value from the surrounding pixels in the corrected photograph having determined values P_(i,j).
7. The method of claim 1, wherein the steps of providing a photograph and providing a depth map comprise providing a series of photographs and their associated depth maps; wherein the method further comprises the steps of: detecting and evaluating perspective distortion in the photographs based on the distance of the nearest object in the scene or on an analysis of vanishing lines; selecting photographs with perspective distortion that would benefit from perspective correction; and automatically performing perspective correction in the selected photographs.
8. The method of claim 1, further comprising: performing perspective correction also on the depth map, i.e. for each pixel (x, y) in the depth map, determining a new position D_proc in the image plane of the virtual camera position at least from the depth d_acq(x, y) of the pixel, its position D_acq(x, y) in the image plane of the acquisition position, and a displacement C between the virtual camera position and the acquisition position.
9. A computer-readable storage medium, on which a computer program is stored, the computer program being executable by a processor for performing the method of claim 1.
10. An integrated circuit configured to perform the method of any of claims 1-8.
11. A hand-held or portable device with a camera comprising a computer readable storage medium according to claim 9 or an integrated circuit according to claim 10.

Applications Claiming Priority (3)

Application Number · Priority Date · Filing Date · Title
EP14183766.6 · 2014-09-05
EP14183766 · 2014-09-05
PCT/EP2015/070246 (WO2016034709A1) · 2014-09-05 · 2015-09-04 · Depth map based perspective correction in digital photos

Publications (2)

Publication Number · Publication Date
CN107077719A (en) · 2017-08-18
CN107077719B (en) · 2020-11-13

Family

ID=51518566

Family Applications (1)

Application Number · Title · Priority Date · Filing Date
CN201580057165.5A (Active) · Perspective correction based on depth map in digital photos · 2014-09-05 · 2015-09-04

Country Status (5)

Country · Link
US: US10154241B2 (en)
EP: EP3189493B1 (en)
CN: CN107077719B (en)
DK: DK3189493T3 (en)
WO: WO2016034709A1 (en)

Also Published As

Publication number · Publication date
EP3189493B1 (en) · 2018-11-07
US10154241B2 (en) · 2018-12-11
US20170289516A1 (en) · 2017-10-05
EP3189493A1 (en) · 2017-07-12
CN107077719A (en) · 2017-08-18
WO2016034709A1 (en) · 2016-03-10
DK3189493T3 (en) · 2019-03-04

Legal Events

Date · Code · Title · Description
PB01 · Publication
SE01 · Entry into force of request for substantive examination
GR01 · Patent grant
