CROSS-REFERENCE TO RELATED APPLICATION(S)
This application claims priority under 35 U.S.C. §119(e) to U.S. Provisional Patent Application Ser. No. 62/121,203, “Dual-Aperture Depth Map Using Adaptive PSF Sizing,” filed Feb. 26, 2015. The subject matter of all of the foregoing is incorporated herein by reference in its entirety.
BACKGROUND
1. Field of the Invention
This invention relates to a multi-aperture imaging system that uses multiple apertures of different f-numbers to estimate the depth of an object.
2. Description of Related Art
A dual-aperture camera has two apertures. A narrow aperture, typically at one spectral range such as infrared (IR), produces relatively sharp images over a long depth of focus. A wider aperture, typically at another spectral range such as RGB, produces images that may be blurred for out-of-focus objects. The pairs of images captured using the two different apertures can be processed to generate distance information for an object, for example as described in U.S. patent application Ser. No. 13/579,568, which is incorporated herein by reference. However, conventional processing methods can be computationally expensive.
Therefore, there is a need for improved approaches to depth map generation.
SUMMARY
Embodiments relate to different methods for reducing the computation used to estimate depth information. One aspect relates to scaling the size of the blur kernels used in the depth processing. The distance range is divided into sub-ranges, and a bank of blur kernels is used for each sub-range to estimate distance. For different sub-ranges, the blur kernels and captured images are down-sampled by different factors. In this way, although the original blur kernels may span a large range of sizes, the down-sampled blur kernels will be more limited in size, which reduces computation.
In another aspect, processing of images takes advantage of edges in the images. The same edge in different images may first be normalized to phase match and/or equalize the energies of the edges in the two images. The edges may also be binarized. Binarized edges can be used to reduce computationally expensive convolutions to simpler summing operations.
In another aspect, rather than using full blur kernels, only partial blur kernels are used. For example, single-sided blur kernels may be used in order to accommodate edges caused by occlusions, where the two sides of the edge are at different depths.
In yet another aspect, frequency filtering is used to reduce energy and noise at frequencies that are not useful to distinguish between different blur kernels.
Other aspects include components, devices, systems, improvements, methods, processes, applications, computer readable mediums, and other technologies related to any of the above.
BRIEF DESCRIPTION OF THE DRAWINGS
Embodiments of the disclosure have other advantages and features which will be more readily apparent from the following detailed description and the appended claims, when taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a block diagram of a multi-aperture, shared sensor imaging system according to one embodiment of the invention.
FIG. 2A is a graph illustrating the spectral responses of a digital camera.
FIG. 2B is a graph illustrating the spectral sensitivity of silicon.
FIGS. 3A-3C depict operation of a multi-aperture imaging system according to one embodiment of the invention.
FIGS. 3D-3E depict operation of an adjustable multi-aperture imaging system according to one embodiment of the invention.
FIG. 4 is a plot of the blur spot sizes Bvis and Bir of visible and infrared images, as a function of object distance s.
FIG. 5 is a table of blur spot and blur kernel as a function of object distance s.
FIG. 6A is a diagram illustrating one approach to estimating object distance s.
FIG. 6B is a graph of error e as a function of kernel number k for the architecture of FIG. 6A.
FIG. 7A is a diagram illustrating another approach to estimating object distance s.
FIGS. 7B-7D are graphs of error e as a function of kernel number k for the architecture of FIG. 7A.
FIG. 8 is a diagram illustrating normalization of edges.
FIGS. 9A-9E illustrate a simplified approach for convolution of binarized edges.
FIG. 10 is a diagram illustrating the effect of occlusion.
FIG. 11 is a diagram illustrating a set of single-sided blur kernels with different edge orientations.
FIG. 12 is a frequency diagram illustrating the effect of frequency filtering.
The figures depict various embodiments for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles described herein.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
FIG. 1 is a block diagram of a multi-aperture, shared sensor imaging system 100 according to one embodiment of the invention. The imaging system may be part of a digital camera or integrated in a mobile phone, a webcam, a biometric sensor, an image scanner or any other multimedia device requiring image-capturing functionality. The system depicted in FIG. 1 includes imaging optics 110 (e.g., a lens and/or mirror system), a multi-aperture system 120 and an image sensor 130. The imaging optics 110 images objects 150 from a scene onto the image sensor. In FIG. 1, the object 150 is in focus, so that the corresponding image 160 is located at the plane of the sensor 130. As described below, this will not always be the case. Objects that are located at other depths will be out of focus at the image sensor 130.
The multi-aperture system 120 includes at least two apertures, shown in FIG. 1 as apertures 122 and 124. In this example, aperture 122 is the aperture that limits the propagation of visible light, and aperture 124 limits the propagation of infrared or other non-visible light. In this example, the two apertures 122, 124 are placed together, but they could also be separated. This type of multi-aperture system 120 may be implemented by wavelength-selective optical components, such as wavelength filters. As used in this disclosure, terms such as “light,” “optics” and “optical” are not meant to be limited to the visible part of the electromagnetic spectrum but to also include other parts of the electromagnetic spectrum where imaging may occur, including wavelengths that are shorter than visible (e.g., ultraviolet) and wavelengths that are longer than visible (e.g., infrared).
The sensor 130 detects both the visible image corresponding to aperture 122 and the infrared image corresponding to aperture 124. In effect, there are two imaging systems that share a single sensor array 130: a visible imaging system using optics 110, aperture 122 and sensor 130; and an infrared imaging system using optics 110, aperture 124 and sensor 130. The imaging optics 110 in this example is fully shared by the two imaging systems, but this is not required. In addition, the two imaging systems do not have to be visible and infrared. They could be other spectral combinations: red and green, or infrared and white (i.e., visible but without color), for example.
The exposure of the image sensor 130 to electromagnetic radiation is typically controlled by a shutter 170 and the apertures of the multi-aperture system 120. When the shutter 170 is opened, the aperture system controls the amount of light and the degree of collimation of the light exposing the image sensor 130. The shutter 170 may be a mechanical shutter or, alternatively, the shutter may be an electronic shutter integrated in the image sensor. The image sensor 130 typically includes rows and columns of photosensitive sites (pixels) forming a two-dimensional pixel array. The image sensor may be a CMOS (complementary metal oxide semiconductor) active pixel sensor or a CCD (charge coupled device) image sensor. Alternatively, the image sensor may be based on other Si (e.g. a-Si), III-V (e.g. GaAs) or conductive polymer image sensor structures.
When the light is projected by the imaging optics 110 onto the image sensor 130, each pixel produces an electrical signal, which is indicative of the electromagnetic radiation (energy) incident on that pixel. In order to obtain color information and to separate the color components of an image which is projected onto the imaging plane of the image sensor, typically a color filter array 132 is interposed between the imaging optics 110 and the image sensor 130. The color filter array 132 may be integrated with the image sensor 130 such that each pixel of the image sensor has a corresponding pixel filter. Each color filter is adapted to pass light of a predetermined color band onto the pixel. Usually a combination of red, green and blue (RGB) filters is used. However, other filter schemes are also possible, e.g. CYGM (cyan, yellow, green, magenta), RGBE (red, green, blue, emerald), etc. Alternatively, the image sensor may have a stacked design where red, green and blue sensor elements are stacked on top of each other rather than relying on individual pixel filters.
Each pixel of the exposed image sensor 130 produces an electrical signal proportional to the electromagnetic radiation passed through the color filter 132 associated with the pixel. The array of pixels thus generates image data (a frame) representing the spatial distribution of the electromagnetic energy (radiation) passed through the color filter array 132. The signals received from the pixels may be amplified using one or more on-chip amplifiers. In one embodiment, each color channel of the image sensor may be amplified using a separate amplifier, thereby allowing the ISO speed to be controlled separately for different colors.
Further, pixel signals may be sampled, quantized and transformed into words of a digital format using one or more analog to digital (A/D) converters 140, which may be integrated on the chip of the image sensor 130. The digitized image data are processed by a processor 180, such as a digital signal processor (DSP) coupled to the image sensor, which is configured to perform well known signal processing functions such as interpolation, filtering, white balance, brightness correction, and/or data compression techniques (e.g. MPEG or JPEG type techniques).
The processor 180 may include signal processing functions 184 for obtaining depth information associated with an image captured by the multi-aperture imaging system. These signal processing functions may provide a multi-aperture imaging system with extended imaging functionality, including variable depth of focus, focus control and stereoscopic 3D image viewing capabilities. The details and advantages associated with these signal processing functions are discussed in more detail below.
The processor 180 may also be coupled to additional compute resources, such as additional processors, storage memory for storing captured images and program memory for storing software programs. A controller 190 may also be used to control and coordinate operation of the components in imaging system 100. Functions described as performed by the processor 180 may instead be allocated among the processor 180, the controller 190 and additional compute resources.
As described above, the sensitivity of the imaging system 100 is extended by using infrared imaging functionality. To that end, the imaging optics 110 may be configured to allow both visible light and infrared light or at least part of the infrared spectrum to enter the imaging system. Filters located at the entrance aperture of the imaging optics 110 are configured to allow at least part of the infrared spectrum to enter the imaging system. In particular, imaging system 100 typically would not use infrared blocking filters, usually referred to as hot-mirror filters, which are used in conventional color imaging cameras for blocking infrared light from entering the camera. Hence, the light entering the multi-aperture imaging system may include both visible light and infrared light, thereby allowing extension of the photo-response of the image sensor to the infrared spectrum. In cases where the multi-aperture imaging system is based on spectral combinations other than visible and infrared, corresponding wavelength filters would be used.
FIGS. 2A and 2B are graphs showing the spectral responses of a digital camera. In FIG. 2A, curve 202 represents a typical color response of a digital camera without an infrared blocking filter (hot mirror filter). As can be seen, some infrared light passes through the color pixel filters. FIG. 2A shows the photo-responses of a conventional blue pixel filter 204, green pixel filter 206 and red pixel filter 208. The color pixel filters, in particular the red pixel filter, may transmit infrared light so that a part of the pixel signal may be attributed to the infrared. FIG. 2B depicts the response 220 of silicon (i.e. the main semiconductor component of an image sensor used in digital cameras). The sensitivity of a silicon image sensor to infrared radiation is approximately four times higher than its sensitivity to visible light.
In order to take advantage of the spectral sensitivity provided by the image sensor as illustrated by FIGS. 2A and 2B, the image sensor 130 in the imaging system of FIG. 1 may be a conventional image sensor. In a conventional RGB sensor, the infrared light is mainly sensed by the red pixels. In that case, the DSP 180 may process the red pixel signals in order to extract the low-noise infrared information. Alternatively, the image sensor may be especially configured for imaging at least part of the infrared spectrum. The image sensor may include, for example, one or more infrared (I) pixels in addition to the color pixels, thereby allowing the image sensor to produce an RGB color image and a relatively low-noise infrared image.
An infrared pixel may be realized by covering a pixel with a filter material which substantially blocks visible light and substantially transmits infrared light, preferably infrared light within the range of approximately 700 to 1100 nm. The infrared transmissive pixel filter may be provided in an infrared/color filter array (ICFA) and may be realized using well known filter materials having a high transmittance for wavelengths in the infrared band of the spectrum, for example a black polyimide material sold by Brewer Science under the trademark “DARC 400”.
Such filters are described in more detail in US2009/0159799, “Color infrared light sensor, camera and method for capturing images,” which is incorporated herein by reference. In one design, an ICFA contains blocks of pixels, e.g. a block of 2×2 pixels, where each block comprises a red, green, blue and infrared pixel. When exposed, such an ICFA image sensor produces a raw mosaic image that includes both RGB color information and infrared information. After processing the raw mosaic image, an RGB color image and an infrared image may be obtained. The sensitivity of such an ICFA image sensor to infrared light may be increased by increasing the number of infrared pixels in a block. In one configuration (not shown), the image sensor filter array uses blocks of sixteen pixels, with four color pixels (RGGB) and twelve infrared pixels.
Instead of an ICFA image sensor (where color pixels are implemented by using color filters for individual sensor pixels), in a different approach, the image sensor 130 may use an architecture where each photo-site includes a number of stacked photodiodes. Preferably, the stack contains four stacked photodiodes responsive to the primary colors RGB and infrared, respectively. These stacked photodiodes may be integrated into the silicon substrate of the image sensor.
The multi-aperture system, e.g. a multi-aperture diaphragm, may be used to improve the depth of field (DOF) or other depth aspects of the camera. The DOF determines the range of distances from the camera that are in focus when the image is captured. Within this range the object is acceptably sharp. For moderate to large distances and a given image format, DOF is determined by the focal length of the imaging optics, the f-number associated with the lens opening (the aperture), and the object-to-camera distance s. The wider the aperture (the more light received), the more limited the DOF. DOF aspects of a multi-aperture imaging system are illustrated in FIGS. 3A-3C.
Consider first FIG. 3B, which shows the imaging of an object 150 onto the image sensor 330. Visible and infrared light may enter the imaging system via the multi-aperture system 320. In one embodiment, the multi-aperture system 320 may be a filter-coated transparent substrate. One filter coating 324 may have a central circular hole of diameter D1. The filter coating 324 transmits visible light and reflects and/or absorbs infrared light. An opaque cover 322 has a larger circular opening with a diameter D2. The cover 322 does not transmit either visible or infrared light. It may be a thin-film coating which reflects both infrared and visible light or, alternatively, the cover may be part of an opaque holder for holding and positioning the substrate in the optical system. This way, the multi-aperture system 320 acts as a circular aperture of diameter D2 for visible light and as a circular aperture of smaller diameter D1 for infrared light. The visible light system has a larger aperture and faster f-number than the infrared light system. Visible and infrared light passing the aperture system are projected by the imaging optics 310 onto the image sensor 330.
The pixels of the image sensor may thus receive a wider-aperture optical image signal 352B for visible light, overlaying a second narrower-aperture optical image signal 354B for infrared light. The wider-aperture visible image signal 352B will have a shorter DOF, while the narrower-aperture infrared image signal 354B will have a longer DOF. In FIG. 3B, the object 150B is located at the plane of focus N, so that the corresponding image 160B is in focus at the image sensor 330.
Objects 150 close to the plane of focus N of the lens are projected onto the image sensor plane 330 with relatively small defocus blur. Objects away from the plane of focus N are projected onto image planes that are in front of or behind the image sensor 330. Thus, the image captured by the image sensor 330 is blurred. Because the visible light 352B has a faster f-number than the infrared light 354B, the visible image will blur more quickly than the infrared image as the object 150 moves away from the plane of focus N. This is shown by FIGS. 3A and 3C and by the blur diagrams at the right of each figure.
Most of FIG. 3B shows the propagation of rays from object 150B to the image sensor 330. The righthand side of FIG. 3B also includes a blur diagram 335, which shows the blurs resulting from imaging of visible light and of infrared light from an on-axis point 152 of the object. In FIG. 3B, the on-axis point 152 produces a visible blur 332B and an infrared blur 334B that are both relatively small. That is because, in FIG. 3B, the object is in focus.
FIGS. 3A and 3C show the effects of defocus. In FIG. 3A, the object 150A is located to one side of the nominal plane of focus N. As a result, the corresponding image 160A is formed at a location in front of the image sensor 330. The light travels the additional distance to the image sensor 330, thus producing larger blur spots than in FIG. 3B. Because the visible light 352A is a faster f-number, it diverges more quickly and produces a larger blur spot 332A. The infrared light 354A is a slower f-number, so it produces a blur spot 334A that is not much larger than in FIG. 3B. If the f-number is slow enough, the infrared blur spot may be assumed to be constant in size across the range of depths that are of interest.
FIG. 3C shows the same effect, but in the opposite direction. Here, the object 150C produces an image 160C that would fall behind the image sensor 330. The image sensor 330 captures the light before it reaches the actual image plane, resulting in blurring. The visible blur spot 332C is larger due to the faster f-number. The infrared blur spot 334C grows more slowly with defocus, due to the slower f-number.
The DSP 180 may be configured to process and combine the captured color and infrared images. Improvements in the DOF and the ISO speed provided by a multi-aperture imaging system are described in more detail in U.S. application Ser. No. 13/144,499, “Improving the depth of field in an imaging system”; U.S. application Ser. No. 13/392,101, “Reducing noise in a color image”; U.S. application Ser. No. 13/579,568, “Processing multi-aperture image data”; U.S. application Ser. No. 13/579,569, “Processing multi-aperture image data”; and U.S. application Ser. No. 13/810,227, “Flash system for multi-aperture imaging.” All of the foregoing are incorporated by reference herein in their entirety.
In one example, the multi-aperture imaging system allows a simple mobile phone camera with a typical f-number of 2 (e.g., focal length of 3 mm and a diameter of 1.5 mm) to improve its DOF via a second aperture with an f-number varying, e.g., between 6 for a diameter of 0.5 mm up to 15 or more for diameters equal to or less than 0.2 mm. The f-number is defined as the ratio of the focal length f to the effective diameter of the aperture. Preferable implementations include optical systems with an f-number for the visible aperture of approximately 2 to 4 for increasing the sharpness of near objects, in combination with an f-number for the infrared aperture of approximately 16 to 22 for increasing the sharpness of distant objects.
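By way of illustration only, the f-numbers quoted in the example above follow directly from this definition. The short Python snippet below simply evaluates the ratio for the example focal length and aperture diameters:

```python
focal_length_mm = 3.0
for diameter_mm in (1.5, 0.5, 0.2):
    f_number = focal_length_mm / diameter_mm  # f-number = focal length / aperture diameter
    print(f"D = {diameter_mm} mm  ->  f/{f_number:g}")
# D = 1.5 mm -> f/2,  D = 0.5 mm -> f/6,  D = 0.2 mm -> f/15
```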
The multi-aperture imaging system may also be used for generating depth information for the captured image. The DSP 180 of the multi-aperture imaging system may include at least one depth function, which typically depends on the parameters of the optical system and which in one embodiment may be determined in advance by the manufacturer and stored in the memory of the camera for use in digital image processing functions.
If the multi-aperture imaging system is adjustable (e.g., a zoom lens), then the depth function typically will also include the dependence on the adjustment. For example, a fixed lens camera may implement the depth function as a lookup table, and a zoom lens camera may have multiple lookup tables corresponding to different focal lengths, possibly interpolating between the lookup tables for intermediate focal lengths. Alternately, it may store a single lookup table for a specific focal length but use an algorithm to scale the lookup table for different focal lengths. A similar approach may be used for other types of adjustments, such as an adjustable aperture. In various embodiments, when determining the distance or change of distance of an object from the camera, a lookup table or a formula provides an estimate of the distance based on one or more of the following parameters: the blur kernel providing the best match between IR and RGB image data; the f-number or aperture size for the IR imaging; the f-number or aperture size for the RGB imaging; and the focal length. In some imaging systems, the physical aperture is constrained in size, so that as the focal length of the lens changes, the f-number changes. In this case, the diameter of the aperture remains unchanged but the f-number changes. The formula or lookup table could also take this effect into account.
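By way of example only, the following Python sketch shows one way such a lookup could be implemented for an adjustable lens, interpolating across kernel index within a table and across focal length between stored calibration tables. The table values, focal lengths and function names are hypothetical placeholders, not data from this disclosure:

```python
import numpy as np

# Hypothetical calibration: for each calibrated focal length (mm), the object
# distance (m) associated with each blur kernel index k.  In practice these
# tables would be determined in advance and stored in camera memory.
CALIBRATION = {
    3.0: np.array([0.3, 0.5, 0.8, 1.2, 2.0, 3.0, 5.0, 10.0, 100.0]),
    4.5: np.array([0.4, 0.7, 1.1, 1.7, 2.8, 4.2, 7.0, 14.0, 100.0]),
}

def lookup_distance(best_kernel: float, focal_length_mm: float) -> float:
    """Map the best-matching kernel index to an object distance, interpolating
    between the calibration tables of neighboring focal lengths (e.g., for a
    zoom lens at an intermediate setting)."""
    focals = sorted(CALIBRATION)
    ks = np.arange(len(CALIBRATION[focals[0]]))
    per_focal = [np.interp(best_kernel, ks, CALIBRATION[f]) for f in focals]
    return float(np.interp(focal_length_mm, focals, per_focal))

# Example: best-matching kernel index 4.3 at a 3.8 mm zoom setting
print(lookup_distance(4.3, 3.8))
```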
In certain situations, it is desirable to control the relative size of the IR aperture and the RGB aperture. This may be desirable for various reasons. For example, adjusting the relative size of the two apertures may be used to compensate for different lighting conditions. In some cases, it may be desirable to turn off the multi-aperture aspect. As another example, different ratios may be preferable for different object depths, focal lengths or accuracy requirements. Having the ability to adjust the ratio of IR to RGB provides an additional degree of freedom in these situations.
FIG. 3D is a diagram illustrating adjustment of the relative sizes of an IR aperture 324 and a visible aperture 322. In this diagram, the hashed annulus is a mechanical shutter 370. On the lefthand side, the mechanical shutter 370 is fully open, so that the visible aperture 322 has maximum area. On the righthand side, the shutter 370 is stopped down, so that the visible aperture 322 has less area while the IR aperture 324 is unchanged. The ratio between visible and IR can thus be adjusted by adjusting the mechanical shutter 370. In FIG. 3E, the IR aperture 324 is located near the edge of the visible aperture 322. Stopping down the mechanical shutter 370 reduces the size (and changes the shape) of the IR aperture 324, and the dual-aperture mode can be eliminated by stopping the shutter 370 to the point where the IR aperture 324 is entirely covered. Similar effects can be implemented by other mechanisms, such as adjusting electronic shuttering or exposure time.
As described above in FIGS. 3A-3C, a scene may contain different objects located at different distances from the camera lens so that objects closer to the focal plane of the camera will be sharper than objects further away from the focal plane. A depth function may relate sharpness information for different objects located in different areas of the scene to the depth or distance of those objects from the camera. In one embodiment, a depth function is based on the sharpness of the color image components relative to the sharpness of the infrared image components.
Here, the sharpness parameter may relate to the circle of confusion, which corresponds to the blur spot diameter measured by the image sensor. As described above in FIGS. 3A-3C, the blur spot diameter representing the defocus blur is small (approaching zero) for objects that are in focus and grows larger when moving toward the foreground or background in object space. As long as the blur disk is smaller than the maximum acceptable circle of confusion, it is considered sufficiently sharp and part of the DOF range. From the known DOF formulas it follows that there is a direct relation between the depth of an object, e.g. its distance s from the camera, and the amount of blur or sharpness of the captured image of that object. Furthermore, this direct relation is different for the color image than it is for the infrared image, due to the difference in apertures and f-numbers.
Hence, in a multi-aperture imaging system, the increase or decrease in sharpness of the RGB components of a color image relative to the sharpness of the IR components in the infrared image is a function of the distance to the object. For example, if the lens is focused at 3 meters, the sharpness of both the RGB components and the IR components may be the same. In contrast, for objects at a distance of 1 meter, the sharpness of the RGB components may be significantly less than that of the infrared components, due to the small aperture used for the infrared image. This dependence may be used to estimate the distances of objects from the camera.
In one approach, the imaging system is set to a large (“infinite”) focus point. That is, the imaging system is designed so that objects at infinity are in focus. This point is referred to as the hyperfocal distance H of the multi-aperture imaging system. The system may then determine the points in an image where the color and the infrared components are equally sharp. These points in the image correspond to objects that are in focus, which in this example means that they are located at a relatively large distance (typically the background) from the camera. For objects located away from the hyperfocal distance H (i.e., closer to the camera), the relative difference in sharpness between the infrared components and the color components will change as a function of the distance s between the object and the lens.
The sharpness may be obtained empirically by measuring the sharpness (or, equivalently, the blurriness) for one or more test objects at different distances s from the camera lens. It may also be calculated based on models of the imaging system. In one embodiment, sharpness is measured by the absolute value of the high-frequency infrared components in an image. In another approach, blurriness is measured by the blur size or point spread function (PSF) of the imaging system.
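As a non-limiting illustration, a sharpness measure of this kind could be computed as the mean absolute value of the high-frequency content of an image window; the Laplacian filter used below is an illustrative choice rather than a requirement of this disclosure:

```python
import numpy as np
from scipy import ndimage

def high_frequency_sharpness(window: np.ndarray) -> float:
    """Sharpness of an image window, taken as the mean absolute value of its
    high-frequency components (approximated here with a Laplacian filter)."""
    high_freq = ndimage.laplace(window.astype(float))
    return float(np.mean(np.abs(high_freq)))
```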
FIG. 4 is a plot of the blur spot sizes Bvis and Bir of the visible and infrared images, as a function of object distance s. FIG. 4 shows that around the focal distance N, which in this example is the hyperfocal distance, the blur spots are the smallest. Away from the focal distance N, the color components experience rapid blurring and rapid increase in the blur spot size Bvis. In contrast, as a result of the relatively small infrared aperture, the infrared components do not blur as quickly and, if the f-number is slow enough, the blur spot size Bir may be approximated as constant in size over the range of depths considered.
Now consider the object distance sx. At this object distance, the infrared image is produced with a blur spot 410 and the visible image is produced with a blur spot 420. Conversely, if the blur spot sizes were known, or the ratio of the blur spot sizes were known, this information could be used to estimate the object distance sx. Recall that the blur spot, also referred to as the point spread function, is the image produced by a single point source. If the object were a single point source, then the infrared image would be a blur spot of size 410 and the corresponding visible image would be a blur spot of size 420.
FIG. 5 illustrates one approach to estimating the object distance based on the color and infrared blur spots. FIG. 5 is a table of blur spot as a function of object distance s. For each object distance sk, there is shown a corresponding IR blur spot (PSFir) and color blur spot (PSFvis). The IR image Iir is the convolution of an ideal image Iideal with PSFir, and the color image Ivis is the convolution of the ideal image Iideal with PSFvis.
Iir = Iideal * PSFir   (1)
Ivis = Iideal * PSFvis   (2)
where * is the convolution operator. Manipulating these two equations yields
Ivis = Iir * B   (3)
where B is a blur kernel that accounts for deblurring by the IR blur spot followed by blurring by the visible blur spot. The blur kernels B can be calculated in advance or empirically measured as a function of object depth s, producing a table as shown in FIG. 5.
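For illustration only, the blur kernel B of Eq. (3) may be pre-computed from the two point spread functions, for example by a regularized frequency-domain division; the small constant eps below is an illustrative regularization choice, not a parameter specified by this disclosure:

```python
import numpy as np

def blur_kernel_from_psfs(psf_ir: np.ndarray, psf_vis: np.ndarray,
                          eps: float = 1e-3) -> np.ndarray:
    """Compute a kernel B such that Ivis ~ Iir * B (Eq. 3), i.e. a kernel that
    undoes PSFir and applies PSFvis, via a Wiener-style frequency-domain
    division regularized by eps."""
    shape = tuple(np.maximum(psf_ir.shape, psf_vis.shape))
    f_ir = np.fft.rfft2(psf_ir, shape)
    f_vis = np.fft.rfft2(psf_vis, shape)
    f_b = f_vis * np.conj(f_ir) / (np.abs(f_ir) ** 2 + eps)
    return np.fft.fftshift(np.fft.irfft2(f_b, shape))
```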
In FIG. 5, the blur kernel B is shown as similar in size to the visible blur spot PSFvis. Under certain circumstances, the IR blur spot PSFir may be neglected or otherwise accounted for. For example, if the IR blur spot is small relative to the visible blur spot PSFvis, then the error introduced by neglecting the IR blur may be negligible. As another example, if the IR blur spot does not vary significantly with object distance, then it may be neglected for purposes of calculating the blur kernel B, but accounted for by a systematic adjustment of the results.
FIG. 6A is a diagram illustrating a method for producing an estimate s* of the object distance s using a bank 610 of blur kernels Bk. The infrared image Iir is blurred by each of the blur kernels Bk in the bank. In this example, the blurring is accomplished by convolution, although faster approaches will be discussed below. This results in estimated visible images I*vis.
Each of these estimated images I*vis is compared 620 to the actual visible image Ivis. In this example, the comparison is a sum squared error ek between the two images.
FIG. 6B is a graph of error e as a function of kernel number k for the architecture of FIG. 6A. Recall that each kernel number k corresponds to a specific object distance s. The error metrics e are processed 630 to yield an estimate s* of the object distance. In one approach, the minimum error ek is identified, and the estimated object distance s* is the object depth sk corresponding to the minimum error ek. Other approaches can also be used. For example, the functional pairs (sk, ek) can be interpolated to find the value of s that yields the minimum e.
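One possible implementation of the processing of FIGS. 6A-6B is sketched below in Python. The sum squared error matches the description above; the parabolic refinement of the minimum is an illustrative choice and not the only possibility:

```python
import numpy as np
from scipy.signal import fftconvolve

def estimate_distance(i_ir, i_vis, kernels, distances):
    """Blur the IR window with each blur kernel Bk in the bank, compare each
    result to the visible window with a sum squared error ek, and return the
    object distance corresponding to the minimum error (optionally refined by
    parabolic interpolation of the neighboring samples)."""
    errors = np.array([np.sum((fftconvolve(i_ir, b, mode="same") - i_vis) ** 2)
                       for b in kernels])
    k = int(np.argmin(errors))
    if 0 < k < len(errors) - 1:
        e0, e1, e2 = errors[k - 1], errors[k], errors[k + 1]
        denom = e0 - 2.0 * e1 + e2
        if denom > 0:
            shift = 0.5 * (e0 - e2) / denom          # parabolic refinement
            return float(np.interp(k + shift, np.arange(len(distances)), distances))
    return float(distances[k])
```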
The infrared image Iir and visible image Ivis in FIG. 6A typically are not the entire captured images. Rather, the approach of FIG. 6A can be applied to different windows within the image in order to estimate the depth of the objects in the window. In this way, a depth map of the entire image can be produced.
The approach of FIG. 6A includes a convolution for each blur kernel. If the window and blur kernel Bk are each large, the convolution can be computationally expensive. The blur kernels Bk by definition will vary in size. For example, the smallest blur kernel may be 3×3 while the largest may be 25×25 or larger. In order to accommodate the largest blur kernels, the window should be at least the same size as the largest blur kernel, which means a large window size is required for a bank that includes a large blur kernel. Furthermore, the same window should be used for all blur kernels in order to allow direct comparison of the calculated error metrics. Therefore, if the bank includes a large blur kernel, a large window will be used for all blur kernels, which can lead to computationally expensive convolutions.
FIG. 7A is a diagram illustrating a variation of FIG. 6A that addresses this issue. Rather than using a single bank of blur kernels, as in FIG. 6A, the approach of FIG. 7A uses multiple banks 710a-m of blur kernels. Each bank contains multiple blur kernels. However, each bank 710 is down-sampled by a different down-sampling factor. For example, bank 710a may use the smallest blur kernels and the original images without down-sampling, bank 710b may use the next smallest set of kernels but with down-sampling of 2×, and so on. In FIG. 7A, bank 710m uses down-sampling of m×. The visible image and the infrared image are also down-sampled by m×, as indicated by the boxes marked “/m”. Bank 710m uses blur kernels J to (J+K), each of which is also down-sampled by m×, as indicated by the “/m” in “*BJ/m”. Each bank 710 produces a result, for example an estimated object distance sm*, and these are combined 730 into an overall depth estimate s*.
One advantage of this approach is that down-sampled blur kernels are smaller and therefore require less computation for convolution and other operations. The table below shows a set of 9 blur kernels, ranging in size from 3×3 for blur kernel 1 to 25×25 for blur kernel 9. In the approach of FIG. 6A, blur kernel 9 would be 25×25, with a corresponding number of multiply-accumulates used to implement convolution. In contrast, in the table below, all blur kernels are down-sampled so that no convolution uses a kernel larger than 5×5.
TABLE 1

  Kernel number (k)   Size of blur kernel   Down-sampling factor
  1                   3 × 3                 1x
  2                   5 × 5                 2x
  3                   8 × 8                 2x
  4                   11 × 11               3x
  5                   14 × 14               3x
  6                   17 × 17               4x
  7                   20 × 20               4x
  8                   23 × 23               5x
  9                   25 × 25               5x
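A minimal sketch of the multi-bank processing of FIG. 7A and Table 1 is shown below. The block-averaging down-sampler is an illustrative choice; the disclosure does not mandate a particular down-sampling filter:

```python
import numpy as np
from scipy.signal import fftconvolve

def downsample(img: np.ndarray, m: int) -> np.ndarray:
    """Down-sample by an integer factor m using block averaging."""
    if m == 1:
        return img.astype(float)
    h, w = (img.shape[0] // m) * m, (img.shape[1] // m) * m
    return img[:h, :w].reshape(h // m, m, w // m, m).mean(axis=(1, 3))

def banked_errors(i_ir, i_vis, kernels, factors):
    """For each blur kernel, down-sample the images and the kernel by that
    kernel's assigned factor (e.g., Table 1), renormalize the kernel so it
    preserves energy, and compute the sum squared error of the blurred IR
    window against the down-sampled visible window."""
    errors = []
    for b, m in zip(kernels, factors):
        ir_m, vis_m = downsample(i_ir, m), downsample(i_vis, m)
        b_m = downsample(b, m)
        b_m = b_m / b_m.sum()                      # energy-preserving kernel
        errors.append(np.sum((fftconvolve(ir_m, b_m, mode="same") - vis_m) ** 2))
    return np.array(errors)
```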
FIGS. 7B and 7C are graphs of error as a function of blur kernel number k for the architecture of FIG. 7A. If the down-sampling is performed without normalizing energies, then the error curve may exhibit discontinuities when transitioning from one bank to the next. FIG. 7B shows an error curve using five banks. Each piece of the curve corresponds to one of the banks. Each curve is continuous because the same down-sampling factor is used for all blur kernels in that bank. However, the down-sampling factor changes from one bank to the next, so the different pieces of the curve may not align correctly. Nevertheless, the minimum error can still be determined. In this example, curve 750c is the only curve that has an internal minimum. The other four curves are either monotonically increasing or monotonically decreasing. Therefore, the minimum error occurs within curve 750c. More sophisticated approaches may also be used. For example, differentials across the entire range of curves may be analyzed to predict the point of minimum error. This approach can be used to avoid local minima, which may be caused by noise or other effects.
In FIG. 7B, the curves are shown as continuous within each bank. However, there may be only a limited number of samples for each bank. FIG. 7C is the same as FIG. 7B, except that there are only three samples for each bank. In FIG. 7C, the dashed ovals identify each of the banks. Each of the banks can be classified as monotonically increasing, monotonically decreasing or containing an extremum. In this example, banks 750a and 750b are monotonically decreasing, bank 750c contains an extremum, and banks 750d and 750e are monotonically increasing. Based on these classifications, the minimum error e occurs somewhere within bank 750c. Finer resolution sampling within bank 750c can then be performed to locate the minimum more accurately.
In FIG. 7D, banks 750a and 750b are monotonically decreasing, and banks 750c and 750d are monotonically increasing. There is no bank that exhibits an internal extremum based on the samples shown. However, based on the gradients for the banks, the minimum lies in the range covered by banks 750b and 750c. In this case, another bank can be constructed that spans the gap between banks 750b and 750c. That bank will then have an internal minimum.
These figures effectively illustrate different sampling approaches to find the extremum of the error function e(k). As another variation, the error function e(k) may be coarsely sampled at first in order to narrow the range of k where the minimum error e exists. Finer and finer sampling may be used as the range is narrowed. Other sampling approaches can be used to find the value of kernel number k (and the corresponding object distance) where the extremum of the error function e(k) occurs.
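The bank classification of FIGS. 7C-7D can be expressed compactly. The sketch below (with hypothetical function names) classifies each bank's error samples and identifies where finer sampling should be performed:

```python
import numpy as np

def classify_bank(errors: np.ndarray) -> str:
    """Label a bank's error samples as monotonically 'increasing',
    monotonically 'decreasing', or containing an internal 'extremum'."""
    diffs = np.diff(errors)
    if np.all(diffs >= 0):
        return "increasing"
    if np.all(diffs <= 0):
        return "decreasing"
    return "extremum"

def bank_containing_minimum(per_bank_errors):
    """Return the bank index expected to contain the minimum error: a bank
    with an internal extremum (FIG. 7C), or the decreasing-to-increasing
    transition where a gap-spanning bank should be constructed (FIG. 7D)."""
    labels = [classify_bank(e) for e in per_bank_errors]
    for i, label in enumerate(labels):
        if label == "extremum":
            return i
    for i in range(len(labels) - 1):
        if labels[i] == "decreasing" and labels[i + 1] == "increasing":
            return i          # minimum lies between banks i and i+1
    return int(np.argmin([np.min(e) for e in per_bank_errors]))
```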
Down-sampling can be implemented in other ways. For example, the visible images may be down-sampled first. The blur kernels are then down-sampled to match the down-sampling of the visible images. The down-sampled blur kernels are applied to the full resolution IR images. The result is an intermediate form which retains the full resolution of the IR image but is then down-sampled to match the resolution of the down-sampled visible images. This method is not as efficient as fully down-sampling the IR but is more efficient than not using down-sampling at all. This approach may be beneficial for reducing computation while still maintaining a finer resolution.
Another aspect is that the approach of FIG. 6A depends on the content of the window. For example, a window for which the only object is a single point source (e.g., a window containing a single star surrounded entirely by black night sky) will yield a good result, because that image is a direct measure of the underlying point spread functions. Similarly, a window that contains the image of only an edge will also yield a good result, because that image is a direct measure of the underlying point spread functions, albeit only along one direction. At the other extreme, a window that is constant and has no features will not yield any estimate, because every estimated visible image will also be a constant, so there is no way to distinguish the different blur kernels. Other images may be somewhere between these extremes. Features will help distinguish the different blur kernels. Featureless areas will not and typically will also add unwanted noise.
In one approach, the windows are selected to include edges. Edge identification can be accomplished using known algorithms. Once identified, edges preferably are processed to normalize variations between the different captured images. FIG. 8 shows one example. In this example, the green component Igm of the color image is the fast f-number image and the IR image Iir is the slow f-number image. The left column of FIG. 8 shows processing of the green image while the right column shows processing of the IR image. The top row shows the same edge appearing in both images. The object is not in focus, so the green edge is blurred relative to the IR edge. Also note that the edge has a different phase in the two images. The green edge transitions from high to low amplitude, while the IR edge transitions from low to high amplitude. FIG. 8 shows one approach to normalize these edges to allow comparisons using blur kernels as described above.
The second row of FIG. 8 shows both edges after differentiation 810. The absolute value 820 of the derivatives is then taken, yielding the third row of FIG. 8. This effectively removes the phase mismatch between the two edges, yielding two phase-matched edges. The two edges are then scaled 830, resulting in the bottom row of FIG. 8. In this example, the IR image is binarized to take on only the values 0 or 1, and the green image is scaled in amplitude to have equal energy as the IR image. The blur kernels are also scaled in amplitude so that, although a blur kernel might spread the energy in an image over a certain area, it does not increase or decrease the total energy. This then allows a direct comparison between the actual green edge and the estimated green edges calculated by applying the blur kernels to the IR edge.
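One possible implementation of this normalization, applied to a one-dimensional profile taken across the edge, is sketched below; the binarization threshold (half the peak derivative) is an illustrative assumption, not a value specified by this disclosure:

```python
import numpy as np

def normalize_edges(green_profile: np.ndarray, ir_profile: np.ndarray):
    """Differentiate both edge profiles (810), take absolute values (820) to
    remove the phase mismatch, binarize the IR edge to values 0 or 1, and
    scale the green edge (830) to the same total energy as the binarized
    IR edge."""
    g = np.abs(np.diff(green_profile.astype(float)))
    ir = np.abs(np.diff(ir_profile.astype(float)))
    ir_bin = (ir > 0.5 * ir.max()).astype(float)          # binarized IR edge
    g_scaled = g * (ir_bin.sum() / max(g.sum(), 1e-12))   # equalize energies
    return g_scaled, ir_bin
```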
Note that the IR edge looks like a line source. This is not uncommon, since the IR point spread function is small and fairly constant over a range of depths, compared to the color point spread function. Also recall that in FIG. 6A, the IR image is convolved with many different blur kernels. The convolution can be simplified as follows. First, the IR edge is binarized, so that the IR image is a binary image taking on only the values of 0 or 1. (In step 830 above, the color image is then scaled in amplitude to have equal energy as the binary IR image.) Convolution generally requires multiplies and adds. However, when the image only takes values of 0 or 1, the multiplies are simplified. Multiplying by 0 yields all 0's, so pixels with 0 value can be ignored. Multiplying by 1 yields the blur kernel, so no actual multiplication is required. Rather, any pixel with value 1 causes an accumulation of the blur kernel centered on that pixel.
FIGS. 9A-9E illustrate this concept. FIG. 9A shows a 4×4 window with a binarized edge, where the pixels are either 1 or 0. FIG. 9B shows a 3×3 blur kernel to be convolved with the window. FIGS. 9C-9E show progression of the convolution using only adds and no multiplies. In these figures, the lefthand side shows the binarized edge of FIG. 9A and the righthand side shows progression of the convolution. In FIG. 9C, pixel 910 has been processed, meaning that the blur kernel centered on pixel 910 has been added to the moving sum on the right. In FIG. 9D, the next pixel along the edge 911 has been processed. The blur kernel centered on pixel 911 is added to the moving sum, which already contains the effect of pixel 910. The result is shown on the right. This continues for all pixels with a value of 1. FIG. 9E shows the final result after all four edge pixels have been processed. This is the estimated green edge, which can then be compared to the actual green edge. If the two match well, then the blur kernel shown in FIG. 9B is the correct blur kernel for this window and can be used to estimate the object distance for this edge.
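A compact version of the add-only convolution of FIGS. 9A-9E is shown below for a two-dimensional window; every pixel equal to 1 simply accumulates a copy of the blur kernel, so no multiplications are needed:

```python
import numpy as np

def convolve_binarized_edge(edge: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Convolve a binarized edge (values 0 or 1) with a blur kernel using only
    additions: for each pixel with value 1, add the kernel centered on that
    pixel.  The kernel is assumed to have odd dimensions."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    out = np.zeros((edge.shape[0] + 2 * ph, edge.shape[1] + 2 * pw))
    for r, c in zip(*np.nonzero(edge)):       # only pixels with value 1
        out[r:r + kh, c:c + kw] += kernel     # accumulate the shifted kernel
    return out[ph:ph + edge.shape[0], pw:pw + edge.shape[1]]
```

The result corresponds to the estimated green edge, which can then be compared to the actual (energy-normalized) green edge as described above.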
Edges in an image may be caused by a sharp transition within an object, for example the border between black and white squares on a checkerboard. In that case, the approach shown in FIGS. 9A-9E may be implemented using entire blur kernels. However, edges may also be caused by occlusion, when a closer object partially blocks a more distant object. In FIG. 10, the sign 1010 in the foreground partially blocks the house 1020 in the background. This creates an edge 1030 in the image. However, the left side of the edge is the sign 1010, which is at a closer object distance, and the right side of the edge is the house 1020, which is at a farther object distance. The two different object distances correspond to different blur kernels. Applying a single blur kernel to the edge will not give good results, because when one side is matched to the blur kernel, the other side will not be.
Single-sided blur kernels can be used instead. A single-sided blur kernel is half a blur kernel instead of an entire blur kernel. FIG. 11 shows a set of eight single-sided blur kernels with different edge orientations, based on the 3×3 blur kernel of FIG. 9B. The full 3×3 blur kernel is reproduced in the center of FIG. 11. Note that different single-sided blur kernels can be derived from the same full blur kernel, depending on the orientation of the edge. In FIG. 11, the solid line 1110 represents the edge. These single-sided blur kernels can be applied to binarized edges, as described above, to yield different depth estimates for each side of the edge.
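For illustration, a single-sided kernel for a vertical edge can be obtained by zeroing one half of a full blur kernel and renormalizing; masks for the other edge orientations of FIG. 11 are constructed analogously. This is a sketch of one possible construction, not the only one:

```python
import numpy as np

def single_sided_kernel(kernel: np.ndarray, keep: str = "left") -> np.ndarray:
    """Zero out one half of a full blur kernel, split along a vertical edge
    through its center column, and renormalize so the kernel remains
    energy-preserving.  keep selects which side of the edge is retained."""
    half = kernel.astype(float).copy()
    mid = kernel.shape[1] // 2
    if keep == "left":
        half[:, mid + 1:] = 0.0   # discard everything to the right of the edge
    else:
        half[:, :mid] = 0.0       # discard everything to the left of the edge
    return half / half.sum()
```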
FIG. 12 illustrates another aspect of the approach described above. As described above, a bank of blur kernels of varying sizes is used to estimate the object depth. Blur kernels effectively act as low pass filters. Larger blur kernels cause more blurring and therefore have lower cutoff frequencies compared to smaller blur kernels. FIG. 12 shows a generalized frequency response for a bank of blur kernels. Blur kernel 1210A is the low pass filter with the lowest cutoff frequency in the bank, which corresponds to the blur kernel with the largest blur size. Blur kernel 1210B is the second largest blur kernel, and so on to blur kernel 1210D, which has the highest cutoff frequency and smallest blur size. The IR image is blurred by each of these blur kernels, and the results are compared to determine which blur kernel corresponds to the object depth.
However, note that the blur kernels 1210A-D differ only within the frequency range 1220. Outside this frequency range 1220, all of the blur kernels 1210A-D in the bank have the same behavior. Therefore, content outside the frequency range 1220 will not distinguish between the different blur kernels 1210A-D. However, that content will add to background noise. Therefore, in one approach, frequency filtering is added to reduce energy and noise from outside the frequency range 1220. In one approach, the original images are frequency filtered. In another approach, the blur kernels themselves may be frequency filtered. The frequency filtering may be low pass filtering to reduce frequency content above frequency 1220B, high pass filtering to reduce frequency content below frequency 1220A, or bandpass filtering to reduce both the low frequency and high frequency content. The filtering may take different forms and may be performed regardless of whether down-sampling is also used. When it is used, down-sampling acts as a type of low pass filtering.
The filtering may also be applied to fewer than or more than all the blur kernels in a bank. For example, a narrower bandpass filter may be used if it is desired to distinguish only blur kernels 1210A and 1210B (i.e., to determine the error gradient between blur kernels 1210A and 1210B). Most of the difference between those two blur kernels occurs in the frequency band 1230, so a bandpass filter that primarily passes frequencies within that range and rejects frequencies outside that range will increase the relative signal available for distinguishing the two blur kernels 1210A and 1210B.
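As an illustrative sketch, a bandpass pre-filter of this kind could be applied to an image window in the frequency domain before the kernel matching; the cutoff frequencies correspond to 1220A and 1220B (or to the band 1230) and are parameters to be chosen for the particular bank:

```python
import numpy as np

def bandpass_window(window: np.ndarray, f_low: float, f_high: float) -> np.ndarray:
    """Suppress spatial-frequency content outside [f_low, f_high] (in
    cycles/pixel) so that energy that cannot distinguish the blur kernels
    does not add noise to the error metric."""
    spectrum = np.fft.fft2(window.astype(float))
    fy = np.fft.fftfreq(window.shape[0])[:, None]
    fx = np.fft.fftfreq(window.shape[1])[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    mask = (radius >= f_low) & (radius <= f_high)
    return np.real(np.fft.ifft2(spectrum * mask))
```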
Window sizes and locations preferably are selected based on the above considerations, and the window size may be selected independently of the blur kernel size. For example, the window size may be selected to be large enough to contain features such as edges, small enough to avoid interfering features such as closely spaced parallel edges, and generally only large enough to allow processing of features, since larger windows will add more noise. The size of the blur kernel may be selected to reduce computation (e.g., by down-sampling) and also possibly to provide sufficient resolution for the depth estimation. As a result, the window size may be different from (typically, larger than) the size of the blur kernels.
The number of windows and window locations may also be selected to contain features such as edges, and to reduce computation. A judicious choice of windows can reduce power consumption by having fewer pixels to power up and to read out, which in turn can be used to increase the frame rate. A higher frame rate may be advantageous for many reasons, for example in enabling finer control of gesture tracking.
Embodiments of the invention may be implemented as a program product for use with a computer system. The program(s) of the program product define functions of the embodiments (including the methods described herein) and can be contained on a variety of computer-readable storage media. Illustrative computer-readable storage media include, but are not limited to: (i) non-writable storage media (e.g., read-only memory devices within a computer such as CD-ROM disks readable by a CD-ROM drive, flash memory, ROM chips or any type of solid-state non-volatile semiconductor memory) on which information is permanently stored; and (ii) writable storage media (e.g., floppy disks within a diskette drive or hard-disk drive or any type of solid-state random-access semiconductor memory) on which alterable information is stored.
It is to be understood that any feature described in relation to any one embodiment may be used alone, or in combination with other features described, and may also be used in combination with one or more features of any other of the embodiments, or any combination of any other of the embodiments. Moreover, the invention is not limited to the embodiments described above, which may be varied within the scope of the accompanying claims. For example, aspects of this technology have been described with respect to different f-number images captured by a multi-aperture imaging system. However, these approaches are not limited to multi-aperture imaging systems. They can also be used in other systems that estimate depth based on differences in blurring, regardless of whether a multi-aperture imaging system is used to capture the images. For example, two images may be captured in time sequence, but at different f-number settings. Another method is to capture two or more images of the same scene but with different focus settings, or to rely on differences in aberrations (e.g., chromatic aberrations) or other phenomena that cause the blurring of the two or more images to vary differently as a function of depth, so that these variations can be used to estimate the depth.