BACKGROUND
Current video cameras lack sufficient dynamic range to capture many types of scenes. As a result, an object of interest in a scene may be underexposed or overexposed, producing an image in which the object of interest is either too dark or too bright. In many images, the object of interest may be near the center of the frame, so a mask may be used to determine the location of the object of interest, and the exposure may then be adjusted based on the current exposure of the area represented by the mask.
SUMMARY
The following presents a simplified summary of the disclosure in order to provide a basic understanding to the reader. This summary is not an extensive overview of the disclosure and it does not identify key/critical elements of the invention or delineate the scope of the invention. Its sole purpose is to present some concepts disclosed herein in a simplified form as a prelude to the more detailed description that is presented later.
The present example provides a system and method for controlling an automatic gain control and/or an automatic exposure control for a camera. A region of interest may be determined using any or all of sound source localization, multi-person detection, and active speaker detection. An image mean may be determined using the region of interest and a set of backlight weight regions, or using only the set of backlight weight regions if no region of interest could be found. The image mean may then be compared to a target value to determine whether it falls within a predetermined threshold of that value. If the image mean is above or below that range, the gain and the exposure may be increased or decreased accordingly.
Many of the attendant features will be more readily appreciated as the same becomes better understood by reference to the following detailed description considered in connection with the accompanying drawings.
DESCRIPTION OF THE DRAWINGS
The present description will be better understood from the following detailed description read in light of the accompanying drawings, wherein:
FIG. 1 shows an example of a computing device for implementing one or more embodiments of the invention.
FIG. 2 shows an example system for controlling an automatic gain control and an automatic exposure control using an example implementation of a gain/exposure control component.
FIG. 3 shows an exemplary digital image including an example detected region of interest.
FIG. 4 shows further detail of an example gain/exposure control component.
FIG. 5 shows an exemplary digital image including exemplary backlight regions.
FIG. 6 shows an example method for adjusting image gain and image exposure.
Like reference numerals are used to designate like parts in the accompanying drawings.
DETAILED DESCRIPTION
The detailed description provided below in connection with the appended drawings is intended as a description of the present examples and is not intended to represent the only forms in which the present example may be constructed or utilized. The description sets forth the functions of the example and the sequence of steps for constructing and operating the example. However, the same or equivalent functions and sequences may be accomplished by different examples.
Although the present examples are described and illustrated herein as being implemented in a system for controlling an automatic gain control and an automatic exposure control using region of interest detection, the system described is provided as an example and not a limitation. As those skilled in the art will appreciate, the present examples are suitable for application in a variety of different types of systems for controlling an automatic gain control and an automatic exposure control using region of interest detection.
FIG. 1 and the following discussion are intended to provide a brief, general description of a suitable computing environment to implement embodiments of the invention. The operating environment of FIG. 1 is only one example of a suitable operating environment and is not intended to suggest any limitation as to the scope of use or functionality of the operating environment. Other well-known computing devices, environments, and/or configurations that may be suitable for use with embodiments described herein include, but are not limited to, personal computers, server computers, hand-held or laptop devices, mobile devices (such as mobile phones, Personal Digital Assistants (PDAs), media players, and the like), multiprocessor systems, consumer electronics, mini computers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
Although not required, embodiments of the invention will be described in the general context of “computer readable instructions” being executed by one or more computing devices. Computer readable instructions may be distributed via computer readable media (discussed below). Computer readable instructions may be implemented as program modules, such as functions, objects, Application Programming Interfaces (APIs), data structures, and the like, that perform particular tasks or implement particular abstract data types. Typically, the functionality of the computer readable instructions may be combined or distributed as desired in various environments.
FIG. 1 shows an example of a computing device 100 for implementing one or more embodiments of the invention. In one configuration, computing device 100 includes at least one processing unit 102 and memory 104. Depending on the exact configuration and type of computing device, memory 104 may be volatile (such as RAM), non-volatile (such as ROM, flash memory, etc.), or some combination of the two. This configuration is illustrated in FIG. 1 by dashed line 106.
In other embodiments, device 100 may include additional features and/or functionality. For example, device 100 may also include additional storage (e.g., removable and/or non-removable) including, but not limited to, magnetic storage, optical storage, and the like. Such additional storage is illustrated in FIG. 1 by storage 108. In one embodiment, computer readable instructions to implement embodiments of the invention may be stored in storage 108. Storage 108 may also store other computer readable instructions to implement an operating system, an application program, and the like.
The term “computer readable media” as used herein includes computer storage media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions or other data. Memory 104 and storage 108 are examples of computer storage media. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVDs) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by device 100. Any such computer storage media may be part of device 100.
Device 100 may also include communication connection(s) 112 that allow device 100 to communicate with other devices. Communication connection(s) 112 may include, but is not limited to, a modem, a Network Interface Card (NIC), or other interfaces for connecting computing device 100 to other computing devices. Communication connection(s) 112 may include a wired connection or a wireless connection. Communication connection(s) 112 may transmit and/or receive communication media.
Communication media typically embodies computer readable instructions or other data in a “modulated data signal” such as a carrier wave or other transport mechanism and includes any information delivery media. The term “computer readable media” may include communication media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency, infrared, and other wireless media.
Device 100 may include input device(s) 114 such as a keyboard, mouse, pen, voice input device, touch input device, infra-red cameras, video input devices, and/or any other input device. Output device(s) 116 such as one or more displays, speakers, printers, and/or any other output device may also be included in device 100. Input device(s) 114 and output device(s) 116 may be connected to device 100 via a wired connection, wireless connection, or any combination thereof. In one embodiment, an input device or an output device from another computing device may be used as input device(s) 114 or output device(s) 116 for computing device 100.
Components of computing device 100 may be connected by various interconnects, such as a bus. Such interconnects may include a Peripheral Component Interconnect (PCI), such as PCI Express, a Universal Serial Bus (USB), FireWire (IEEE 1394), an optical bus structure, and the like. In another embodiment, components of computing device 100 may be interconnected by a network. For example, memory 104 may comprise multiple physical memory units located in different physical locations interconnected by a network.
Those skilled in the art will realize that storage devices utilized to store computer readable instructions may be distributed across a network. For example, a computing device 130 accessible via network 120 may store computer readable instructions to implement one or more embodiments of the invention. Computing device 100 may access computing device 130 and download a part or all of the computer readable instructions for execution. Alternatively, computing device 100 may download pieces of the computer readable instructions, as needed, or some instructions may be executed at computing device 100 and some at computing device 130. Those skilled in the art will also realize that all or a portion of the computer readable instructions may be carried out by a dedicated circuit, such as a Digital Signal Processor (DSP), programmable logic array, and the like.
FIG. 2 shows an example system 200 for controlling an automatic gain control 240 and an automatic exposure control 250 using an example implementation of a gain/exposure control component 220. In the exemplary system 200, the gain/exposure control component further implements an example gain/exposure adjustment method 230. Finally, the example system 200 includes one or more cameras 205, one or more microphones 207, and an audio/video capture component 210.
The audio/video capture component 210 is communicatively coupled to the gain/exposure control component 220. The gain/exposure control component 220 is in turn communicatively coupled to each of the automatic gain control 240 and the automatic exposure control 250. Each of the automatic gain control 240 and the automatic exposure control 250 is further communicatively coupled to the one or more cameras 205. The one or more cameras 205 and the one or more microphones 207 are communicatively coupled to the audio/video capture component 210.
Referring now to the functioning of the example system 200, the audio/video capture component 210 includes functionality to digitally capture video, audio, and still images to produce digital video, audio, and still image files from the one or more cameras 205. It is to be appreciated that the one or more cameras 205 may produce a panoramic image either when disposed as an array of multiple cameras or when disposed as a single-sensor camera. The one or more cameras 205 may each include a camera lens, at least one image sensor, a control for controlling gain, a control for controlling exposure, and the like. The one or more microphones 207 may be disposed in an array such that the one or more microphones may be used for sound source localization (SSL).
The one or more cameras 205 send digital image information to the audio/video capture component 210. Similarly, the one or more microphones 207 send analog audio information to the audio/video capture component 210. The audio/video capture component 210 may include an analog-to-digital converter for converting sound waves received by the one or more microphones 207 into a digital audio signal, digital storage for storing the digitized audio and video, and the like. In particular, a digital video signal may comprise a series of sequential static digital video images that produce the illusion of motion when played back in order.
The automatic gain control (AGC) 240 is an adaptive system that amplifies the signal representing a captured digital image. Amplifying the signal increases the values of all pixels in the captured digital image; because the amplification applies equally to any noise in the signal, increasing the gain may add noise, or unwanted information, to the captured digital image. The automatic gain control 240 may incrementally increase or decrease the gain to bring the image mean within a range of a predetermined target value offset above and below by a threshold value.
The automatic exposure control 250 controls the amount of light that falls upon the image sensor used by the audio/video capture component 210. Increasing the amount of light that falls upon the sensor increases the strength of the signal representing a digital image, and decreasing the amount of light decreases the strength of that signal. The automatic exposure control 250 functions similarly to the automatic gain control 240 in that it may incrementally increase or decrease the exposure to bring the image mean within a range of a predetermined target value offset above and below by a threshold value.
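As an illustration of this shared converge-to-range behavior, the following is a minimal sketch of one incremental adjustment step; the function name, step size, and setting limits are illustrative assumptions and are not taken from the example system.

    # A minimal sketch (name, step size, and limits are assumed): one
    # incremental step of the adjust-toward-range logic shared by the
    # automatic gain control 240 and the automatic exposure control 250.
    def step_toward_target(setting, image_mean, target, threshold,
                           step=1, minimum=0, maximum=255):
        """Nudge a gain or exposure setting one increment toward the target range.

        The setting changes only while the measured image mean falls outside
        [target - threshold, target + threshold], and is clamped to
        [minimum, maximum].
        """
        if image_mean > target + threshold:   # image too bright: back the setting off
            return max(minimum, setting - step)
        if image_mean < target - threshold:   # image too dark: push the setting up
            return min(maximum, setting + step)
        return setting                        # already within the target range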
Once the digital video and audio have been captured by the audio/video capture component 210, the gain/exposure control component 220 analyzes all or a portion of the captured audio and video to determine whether to increase or decrease the gain and/or exposure. The gain/exposure control component 220 implements the gain/exposure adjustment method 230 to perform such an analysis. The gain/exposure adjustment method 230 further implements several methods of detecting a region of interest within the captured image.
Once a region of interest has been detected, the gain/exposure adjustment method 230 calculates the image mean using the region of interest before adjusting the gain and exposure in accordance with the image mean. If no region of interest is detected, pixels in different regions of the captured digital image may have a weight applied depending on expected physical characteristics of the environment in which the image was captured. In the case where the one or more cameras 205 includes multiple cameras, the sensor included with each camera is adjusted using the portion of a panoramic image captured by the corresponding camera sensor. In the case where the one or more cameras 205 includes only a single camera, the sensor of the single camera is adjusted using the entire panoramic image captured by the single camera.
FIG. 3 shows an exemplary digital image 300 including an example detected region of interest 320. For example, the exemplary digital image 300 may be a single captured digital image in a sequence of digital images that collectively comprise a digital video. In this example, the digital image 300 includes a human speaker 310. The region of interest 320 is a region or regions in the digital image 300 that contains all or a portion of the human speaker 310, or any other object of interest.
The region of interest 320 is shown as including the head of the human speaker 310; however, the region of interest 320 may include a smaller region of the human speaker 310, such as the mouth of the human speaker 310, or a larger region of the human speaker 310, such as the entire body of the human speaker 310. The region of interest 320 is illustrated as a polygon in this example; however, the shape of the region of interest 320 may be any type of shape, including an outline of the human speaker 310, or the like. Furthermore, the region of interest 320 may be an array of pixels within the digital image 300 bounded by the edges of the region of interest 320.
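One convenient representation, sketched below under assumed names, treats the region of interest as a boolean mask over the frame, so that any shape (box, polygon, or outline) yields the same kind of pixel array:

    import numpy as np

    # Hypothetical sketch: an ROI of any shape can be stored as a boolean
    # mask over the frame; the ROI "array of pixels" is then the masked pixels.
    def roi_mask_from_box(frame_shape, top, left, bottom, right):
        mask = np.zeros(frame_shape[:2], dtype=bool)
        mask[top:bottom, left:right] = True   # rectangular here; any shape works
        return mask

    def roi_pixels(image, mask):
        return image[mask]                    # flat array of the ROI's pixel values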
Turning now to FIG. 4, FIG. 4 shows further detail of an example gain/exposure control component 220. The example gain/exposure control component 220 includes the gain/exposure adjustment method 230 (from FIG. 2), a region of interest detection component 420, a collection of one or more backlight weighted regions 430 for use by the gain/exposure adjustment method 230, and a tunable parameter 440 also for use by the gain/exposure adjustment method 230.
The region of interest detection component 420 includes sound source localization 422, multi-person detection 424, and active speaker detection 426. Active speaker detection 426 and sound source localization 422 are described in U.S. patent application Ser. No. 11/425,967, filed Jun. 22, 2006, titled “Identification of People Using Multiple Types of Input”. Multi-person detection 424 is described in U.S. Pat. No. 7,050,607, filed Dec. 8, 2001, titled “A System and Method for Multi-View Face Detection”.
Sound source localization 422 provides functionality for locating a speaker in a captured panoramic digital image by estimating the times of arrival of the sound at a microphone array and using the geometry of the array to estimate the location of the sound source. Active speaker detection 426 identifies a pool of candidate features from audio and video input and uses a classifier that selects a subset of those features to identify regions of a captured digital image where people or a speaker may be located. Multi-person detection 424 provides a system and methods for identifying one or more areas of a digital image in which one or more human faces are likely to be. Multi-person detection 424 may also utilize information related to motion in a digital video file to identify an area in which one or more human faces are likely to be.
The region of interest detection component 420 may make use of any or all of sound source localization 422, multi-person detection 424, and active speaker detection 426 to determine a region of interest 320 (from FIG. 3). It is to be appreciated that sound source localization 422, multi-person detection 424, and active speaker detection 426 may be used in any combination to best determine the region of interest 320. For example, sound source localization 422 may be used to determine the likely locations of one or more human speakers in a captured digital image, and those locations may be combined with the likely locations of human speakers identified by multi-person detection 424, resulting in a more accurate detection of the human speakers. One plausible fusion scheme is sketched below.
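The following sketch averages per-location likelihoods produced by whichever detectors are available and thresholds the result; the combination rule and the threshold are assumptions for illustration and are not the specific methods of the referenced applications.

    import numpy as np

    # Hedged sketch of detector fusion (the averaging rule and threshold are
    # illustrative assumptions): each available detector supplies a likelihood
    # array over the panorama, and the ROI is taken wherever the combined
    # likelihood is strong.
    def combine_detections(ssl=None, mpd=None, asd=None, threshold=0.5):
        scores = [s for s in (ssl, mpd, asd) if s is not None]  # any subset works
        if not scores:
            return None                       # no detector found anything
        combined = np.mean(scores, axis=0)
        return combined >= threshold          # boolean ROI over the panorama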
The gain/exposure adjustment method 230 utilizes the region of interest 320 determined by the region of interest detection component 420 to compute the image mean of a digital image under analysis. In the case where the region of interest detection component 420 does not determine a region of interest 320, the gain/exposure adjustment method 230 may use the backlight weighted regions 430 instead. The backlight weighted regions 430 will be discussed in more detail in the discussion of FIG. 5.
In addition, the gain/exposure adjustment method 230 may make use of a tunable parameter 440 to allow finer control of the gain/exposure adjustment method 230. The tunable parameter 440 may further expose an interface allowing a user of the gain/exposure control component 220 to modify the value associated with the tunable parameter 440. Changing the value of the tunable parameter 440 may increase or decrease the likelihood that the gain/exposure adjustment method 230 will increase or decrease the gain or exposure.
Turning now to FIG. 5, FIG. 5 shows an exemplary digital image 500 including an example human speaker 510. The exemplary digital image 500 includes a first weighted region 520, a second weighted region 530, and a third weighted region 540. While three weighted regions are illustrated in the digital image 500, it is to be appreciated that any number of different weighted regions may be included in the digital image 500. In an alternative embodiment, such a digital image 500 may instead be a panoramic image taken by a panoramic digital camera.
As previously discussed with respect to FIG. 3 and the detected region of interest 320, the weighted region most likely to include the human speaker 510 is given the highest weight. The next most likely region to include a human speaker is given a lower weight, the region after that a lower weight still, and so on.
For example, the second weighted region 530 may be most likely to include the human speaker 510 in the case where the digital image 500 is taken in a conference room setting including a table 512. In this example, the pixels included in the second weighted region 530 may each have a weight of 1.38. The first weighted region 520 may be the next most likely region to include a human speaker, as a person may be standing to speak. In this example, the pixels included in the first weighted region 520 may each have a weight of 0.92. The third weighted region 540 may be the least likely region to include a human speaker, as the third weighted region 540 is most likely to include the table 512 in a conference room setting. In this example, the pixels included in the third weighted region 540 may each have a weight of 0.46.
As previously discussed, it is to be appreciated that the digital image 500 may include any number of weighted regions, and the pixels included in such weighted regions may be given any appropriate weight. Furthermore, such a digital image 500 including the first weighted region 520, second weighted region 530, and third weighted region 540 may be used to determine an image mean in an example method for adjusting image gain and image exposure, as sketched below.
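To make the arithmetic concrete, the following sketch builds a per-pixel weight map from horizontal bands using the example weights above; the band heights (as fractions of the frame) are assumptions chosen for illustration, not values from the example.

    import numpy as np

    # Sketch of a backlight weight map built from horizontal bands. The
    # weights 0.92, 1.38, and 0.46 are the example values given above; the
    # band-height fractions are illustrative assumptions.
    def backlight_weights(height, width,
                          bands=((0.40, 0.92),    # first weighted region 520
                                 (0.45, 1.38),    # second weighted region 530
                                 (0.15, 0.46))):  # third weighted region 540
        weights = np.empty((height, width))
        row = 0
        for fraction, w in bands:
            n = int(round(fraction * height))
            weights[row:row + n, :] = w
            row += n
        weights[row:, :] = bands[-1][1]       # fill any rounding remainder
        return weights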
Turning now to FIG. 6, FIG. 6 shows an example method 230 (from FIG. 2) for adjusting image gain and image exposure. Block 602 refers to an operation in which a frame of digital video and corresponding audio is captured and stored in memory.
Block 604 refers to an operation to determine one or more regions of interest within the frame of digital video captured at block 602. A region of interest may be determined in accordance with any or all of sound source localization 422 (from FIG. 4), multi-person detection 424 (from FIG. 4), or active speaker detection 426 (from FIG. 4) as described earlier. Each of sound source localization 422, multi-person detection 424, and active speaker detection 426 may be used individually or in combination to determine an area of pixels within the digital image captured at block 602 where a human speaker or human speakers are likely to be located.
Block 608 refers to an operation to determine if the region of interest determined at block 604 is empty. If sound source localization 422, multi-person detection 424, and active speaker detection 426 are all unable to determine a region of interest likely to include a human speaker or human speakers, the region of interest will be empty. In response to a negative determination, flow continues on to block 610. In response to a positive determination, flow continues on to block 612.
Block 610 refers to an operation in which an image mean is determined using the region of interest determined at block 604 and the backlighting image weights in accordance with the discussion of FIG. 5. In an example implementation, the image mean may be determined by implementing the following function in computer executable form:
image_mean = A*mean(image*backlight) + (1−A)*mean(ROI)
Where image_mean represents the weighted mean of the digital image captured at block 602; image represents the pixels, and any information associated with the pixels such as color information, included in the digital image captured at block 602; backlight includes pixels and any weights associated with the pixels in accordance with the backlight regions discussed in FIG. 5; ROI represents the pixels, and any information associated with the pixels such as color information, included in the region of interest determined at block 604; and A represents a tunable parameter 440 (from FIG. 4). It is to be appreciated that the region of interest may include one or more areas assigned different weights depending on the classification of each area. For example, the region of interest may include an area identified, using active speaker detection or sound source localization, as containing a human who is speaking. Such an area may be weighted more heavily than an area determined, using multi-person detection, to include a human who is not speaking. The weighted mean function is a typical mean function intended to determine the average value of the pixels, and any information associated with the pixels, in the array of pixels passed to it.
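A direct transcription of this weighted mean into executable form might look like the following sketch, assuming a grayscale (luminance) image, a mask-style region of interest as in FIG. 3, and a weight map as in FIG. 5; the default value for the tunable parameter A is an illustrative guess.

    import numpy as np

    # Sketch of the weighted image mean. `image` is assumed to be a 2-D array
    # of luminance values, `backlight` a same-shaped weight map (see FIG. 5),
    # and `roi_mask` a non-empty boolean mask; A stands in for tunable
    # parameter 440, and its default here is an assumption.
    def weighted_image_mean(image, backlight, roi_mask, A=0.5):
        backlit_mean = np.mean(image * backlight)
        roi_mean = np.mean(image[roi_mask])
        return A * backlit_mean + (1.0 - A) * roi_mean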
Flow continues on to both block 614 and block 620.
Block 612 refers to an operation in which the image mean is determined using only the backlighting image weights in accordance with the discussion of FIG. 5. In an example implementation, the image mean may be determined by implementing the following function in computer executable form:
image_mean = mean(image*backlight)
Where image_mean represents the mean of the digital image captured at block 602; image represents the pixels, and any information associated with the pixels such as color information, included in the digital image captured at block 602; and backlight includes pixels and any weights associated with the pixels in accordance with the backlight regions discussed in FIG. 5. Such an image mean may also be referred to as an exposure/gain metric. The mean function is a typical mean function intended to determine the average value of the pixels, and any information associated with the pixels, in the array of pixels passed to it.
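In code, this fallback is simply the previous computation with the region-of-interest term dropped (equivalently, A = 1 in the sketch above), continuing with the same assumed inputs:

    # Fallback exposure/gain metric for when no region of interest was found.
    def backlight_image_mean(image, backlight):
        return float(np.mean(image * backlight))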
Flow continues on to both block 614 and block 620.
Block 614 refers to a determination as to whether the image mean determined at either block 610 or block 612 is greater than a predetermined image mean plus a predetermined threshold. For example, the predetermined image mean may be a value selected in accordance with one or more characteristics of the device used to capture the digital video image at block 602. The predetermined threshold value may also be selected in accordance with one or more characteristics of the device used to capture the digital video image at block 602.
A positive determination at block 614 may indicate that the gain and/or exposure of the device that captured the digital image at block 602 are each too high. In response to a positive determination, flow continues on to block 616. In response to a negative determination, flow ends at block 626.
Block 620 refers to a determination as to whether the image mean determined at either block 610 or block 612 is less than the predetermined image mean minus the predetermined threshold. For example, the predetermined image mean may be a value selected in accordance with one or more characteristics of the device used to capture the digital video image at block 602. The predetermined threshold value may also be selected in accordance with one or more characteristics of the device used to capture the digital video image at block 602.
A positive determination at block 620 may indicate that the gain and/or exposure of the device that captured the digital image at block 602 are each too low. In response to a positive determination, flow continues on to block 622. In response to a negative determination, flow ends at block 626.
Block 616 refers to an operation in which the gain associated with the device that captured the digital image at block 602 is decreased. It is to be appreciated that if the gain is already at a minimum setting, it may not be decreased further. Flow continues on to block 617.
Block 617 refers to an operation in which it is determined whether the exposure setting is greater than zero. In response to a negative determination, flow continues on to block 626. In response to a positive determination, flow continues on to block 618.
Block 618 refers to an operation in which the exposure associated with the device that captured the digital image at block 602 is decreased. It is to be appreciated that flow may return to block 602 so that the gain and exposure are decreased incrementally until they reach a level at which block 614 results in a negative determination and flow ends.
Block 622 refers to an operation in which the exposure associated with the device that captured the digital image at block 602 is increased. Flow continues on to block 624.
Block 624 refers to an operation in which the gain associated with the device that captured the digital image at block 602 is increased. It is to be appreciated that flow may return to block 602 so that the gain and exposure are increased incrementally until they reach a level at which block 620 results in a negative determination and flow ends.
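Putting the decision blocks together, one pass through the FIG. 6 flow might be sketched as follows; the camera-control object and its attribute names are hypothetical stand-ins for the gain and exposure controls of the one or more cameras 205, not an interface taken from the example system.

    # Hedged end-to-end sketch of one pass through the FIG. 6 decision logic.
    # The `camera` object and its attributes are hypothetical stand-ins.
    def adjust_gain_exposure(camera, image_mean, target, threshold):
        if image_mean > target + threshold:        # block 614: image too bright
            if camera.gain > camera.min_gain:
                camera.gain -= 1                   # block 616: decrease gain
            if camera.exposure > 0:                # block 617: exposure above zero?
                camera.exposure -= 1               # block 618: decrease exposure
        elif image_mean < target - threshold:      # block 620: image too dark
            if camera.exposure < camera.max_exposure:
                camera.exposure += 1               # block 622: increase exposure
            if camera.gain < camera.max_gain:
                camera.gain += 1                   # block 624: increase gain
        # block 626: otherwise the image mean is within range and flow ends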