CROSS REFERENCE TO RELATED APPLICATION
This application is a divisional of co-pending, commonly owned U.S. patent application Ser. No. 12/560,658, filed Sep. 16, 2009, entitled “TEXTUAL ATTRIBUTE-BASED IMAGE CATEGORIZATION AND SEARCH,” the entirety of which is herein incorporated by reference.
BACKGROUND
A photographer is presented with many challenges when attempting to photograph a scene that includes moving content. In particular, when photographing in outdoor settings, the photographer must contend with uncontrollable factors that may influence the resulting photograph. For example, the amount of sunlight, clouds, and other naturally occurring attributes in a photograph may be difficult to manage, particularly when a photographer has a limited amount of time by which to conduct the photography, such as during a vacation with a limited duration.
Often, photographers rely on digital imaging software (DIS) to modify photographs that are stored as digital images, such as to make corrections to the images. For example, a photographer may use DIS to correct or modify lighting, saturation, contrast, or other aspects of an image. In some instances, an image may be modified to add or remove content (e.g., an object, etc.) that is shown in the image. Content modification is typically a user-intensive operation that is performed using existing DIS features. Content (e.g., objects, landscape, etc.) that is added to an image sometimes lacks a desirable level of realism that may be detectable by a viewer with a trained eye when modifications are not performed carefully. Thus, modification of images using DIS can require a large amount of human interaction to produce a realistic looking image when modifications take place.
In addition, it is often difficult, time consuming, and cumbersome for a user to search a repository of images when looking for specific content or objects, or portions thereof, to add to another image. For example, a search for “lighthouse” may produce drawings, paintings, and real images of lighthouses in a variety of different orientations and settings. A user may have to search through hundreds of images before finding an image that meets the user's requirements.
SUMMARY
Techniques and systems for providing textual attribute-based image categorization and search are disclosed herein. In some aspects, images may be analyzed to identify a category of an image, or portion thereof. A classifier may be trained for each category and configured to generate a classifier score. The classifier scores for an image may be used to associate a category with the image. In various aspects, the categories may be types of sky sceneries.
In some aspects, the images may be analyzed to determine additional textual attributes. The additional textual attributes for sky sceneries may include one or more of a layout, a horizon position, a position of the sun, and a richness factor based on a number of detected edges in the image.
In further aspects, a user interface may provide an intuitive arrangement of the categorized images for user navigation and selection. The user interface may also provide a simplified presentation and facilitate a search of the categorized images. An image that is selected from the user interface may be used to replace or modify features of an existing target image using a feature replacement function that may be accessible via the user interface.
BRIEF DESCRIPTION OF THE DRAWINGS
The detailed description is described with reference to the examples shown in the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same reference numbers in different figures indicate similar or identical items.
FIG. 1 is a schematic diagram of an illustrative environment where a trained categorizer categorizes images for use by a client device in accordance with embodiments of textual attribute-based image categorization and utilization.
FIG. 2 is a flow diagram of an illustrative process of implementing components of a system that include training, categorizing, and imaging in accordance with embodiments of the disclosure.
FIG. 3 is a flow diagram of an illustrative training process in accordance with embodiments of the disclosure.
FIG. 4 is a pictorial flow diagram of an illustrative training process to train a codebook to perform categorization of images in accordance with some embodiments of the disclosure.
FIG. 5 is a flow diagram of an illustrative categorization process in accordance with various embodiments of the disclosure.
FIG. 6 is a pictorial flow diagram of an illustrative process of using the codebook of FIG. 4 to categorize images in accordance with one or more embodiments of the disclosure.
FIG. 7 is a flow diagram of an illustrative process of scoring images to determine a category for the image in accordance with various embodiments of the disclosure.
FIG. 8 is a pictorial flow diagram of an illustrative process of determining textual attributes of images in accordance with some embodiments of the disclosure.
FIG. 9 is a flow diagram of an illustrative process of ranking images based on color signature in accordance with some embodiments of the disclosure.
FIG. 10 is an illustrative user interface to enable user selection and manipulation of images based on various user-selected attributes in accordance with one or more embodiments of the disclosure.
FIG. 11 is a flow diagram of an illustrative process of identifying a chain of similar images in accordance with various embodiments of the disclosure.
FIG. 12 is a pictorial flow diagram of an illustrative process of image modification by merging aspects of a source image into a target image in accordance with one or more embodiments of the disclosure.
DETAILED DESCRIPTION
Overview
As discussed above, image manipulation using digital imaging software (DIS) may enable photographers to enhance an image when an original image lacks desired content or objects, such as a colorful sky or other content or objects. Current DIS tools often require considerable amounts of human interaction (i.e., multiple steps and thorough knowledge of DIS) to replace objects in an image. Further, it may be difficult to search through a collection of images to identify an object with predetermined attributes (e.g., color, orientation, object location, etc.).
In various embodiments, an image categorizer may be trained to classify images based on content in the images. For example and without limitation, the image categorizer may categorize images of the sky into various categories (e.g., blue sky, cloudy, etc.). Attributes such as a horizon location, position of the sun, and color information may be extracted from the image and associated with the image to enable a user search of the images. For example, the attributes and category could be stored as metadata for an image.
In some embodiments, a user may search a categorized collection of the sky images based on user-determined attributes to find a desired (“source”) image. The user may then insert a portion of the source image into a target image to enhance the target image, such as to replace the sky of the target image with the sky in a source image. The source image and/or the target image may be further modified to produce a modified image that has attributes from both the target image and the source image.
As an example, a user may photograph a monument that includes a grey sky background to produce a target image. The user may desire to replace the grey sky in the target image with a more vibrant blue sky background that includes various dispersed clouds. The user may search a categorized collection of the sky images to locate a source image. Next, the user may, via an imaging application, merge part of the source image into the target image to replace the grey sky with the vibrant blue sky. Finally, the imaging application may adjust the foreground or other aspects of the target image or the portion of the source image that is transferred to the target image to enhance the realism of a modification to the target image.
The description below may expand upon the sky replacement example discussed above to illustrate various embodiments. However, embodiments are not limited to sky images or sky replacement, but may be used with other objects that are common in images such as buildings, landscapes, people, and so forth. Thus, although illustrative examples describe sky replacement, other types of image attribute replacements may be performed using the processes and systems described below.
The processes and systems described herein may be implemented in a number of ways. Example implementations are provided below with reference to the following figures.
Illustrative Environment
FIG. 1 is a schematic diagram of an illustrative environment 100 where a trained categorizer may categorize images for use by a client device in accordance with embodiments of textual attribute-based image categorization and utilization. The environment 100 includes servers 102 that are configured to categorize images based on various textual attributes of the images. Textual attributes are attributes of an image that may be used to describe the image in common language (e.g., during a search) and for other functions (e.g., sorting images, classification, etc.). As discussed herein, attributes are textual attributes unless otherwise stated. In some embodiments, the servers 102 may be in communication with one or more client device(s) 104 via a network 106. The client device(s) 104 may enable a user 108 to access a source image 110 from the servers 102 and manipulate a target image 112 with a portion of a selected one of the categorized images. The client device(s) 104 may be a server, a desktop computer, a tablet, a mobile computer, a mobile telephone, a gaming console, or a music player, among other possible client devices. As described or referenced herein, the source image 110 is an image that is categorized (indexed) by one or more of the servers 102, and may also be interchangeably referred to herein as a categorized image or an indexed image. The target image 112 is an image that the user 108 intends to manipulate and may also be interchangeably referred to herein as a user image.
In an example configuration, servers 102 may include one or more processors (“processors”) 114 and system memory 116. Depending on the exact configuration and type of server, system memory 116 may be volatile (such as RAM), non-volatile (such as ROM, flash memory, etc.) or some combination of the two. System memory 116 may include an image categorization application 118 to perform the functions described herein.
The servers 102 may be in communication with various storage devices (SDs) such as a raw image SD 120 and a searchable image SD 122. The raw image SD 120 may include a collection of uncategorized images which may be obtained from a variety of locations, such as dispersed storage locations across the Internet, photography hosting sites, archived sites, and so forth. In some embodiments, the image categorization application 118 may crawl the Internet and/or other networks of computers to locate and copy images from an original location to the raw image SD 120. The searchable image SD 122 may be used to store the source images 110 after they have been categorized by the various modules of the image categorization application 118.
In accordance with one or more embodiments, the image categorization application 118 may include a training module 124 and a categorization module 126, among other possible modules. The training module 124 may be used to train the image categorization application 118 to categorize images by textual attributes that are identified in an image. For example, the training module 124 may be trained to categorize an image as having a blue sky, a cloudy sky, or a sunset, among many possibilities, as described further below. In addition, the training module 124 may be trained to identify and select a sky region of an image (as distinguished from objects, landscape, or other non-sky features).
In some embodiments, the categorization module 126 may categorize an image based on trained attribute detection by the training module 124. For example, the categorization module 126 may select an image from the raw image SD 120, analyze the image based on one or more trained classifier algorithms via a process 128, and then associate textual attributes with the image, or portion thereof, which may then be stored as the source image 110 in the searchable image SD 122.
The client device(s) 104 may also include one or more processors (“processors”) 130 and system memory 132. The system memory 132 may be volatile (such as RAM), non-volatile (such as ROM, flash memory, etc.) or some combination of the two. System memory 132 may include an image application 134 to perform the functions described herein. In accordance with one or more embodiments, the image application 134 may include a search module 136 and a feature replacement module 138, among other possible modules.
In some embodiments, the search module 136 may enable the user 108 to select desired attributes via a user interface and then receive images in a search result that match one or more attributes of the search. For example, the search module 136 may enable the user 108 to specify sky replacement attributes that include a category (e.g., blue sky, cloudy, sunset, etc.), horizon, sun location, a richness level, layout, and/or other possible attributes. The attributes may be unique to the type of image that is categorized; thus, attributes associated with images of people may be different than attributes associated with sky replacement images. In various embodiments the search module 136 may enable a user to refine a search and/or navigate through search results to locate an image that is similar to another identified image. In some embodiments, all or a portion of the search module 136 may be stored by or executed by the one or more servers 102.
The feature replacement module 138 may enable a user to replace a feature in the target image 112 with a feature from the source image 110. For example, the feature replacement module 138 may replace the sky from the target image 112 with the sky in the source image 110. In some embodiments, the feature replacement module 138 may perform various modifications to the target image 112 to enhance the realism of the target image. For example, the feature replacement module 138 may adjust colorization, tone, contrast, or other aspects of the target image upon or after modification of the target image 112 with portions of the source image 110.
As an example, the target image 112 may undergo a sky replacement process 140 by the feature replacement module 138 to replace the sky. The user 108 may then replace a relatively featureless blue sky in the target image with a cloudy sky, rich blue sky, or sunset sky, among various other feature replacement possibilities that are in the source images 110.
Illustrative Operation
FIG. 2 is a flow diagram of an illustrative process 200 of implementing components of a system that include training, categorizing, and imaging in accordance with embodiments of the disclosure. The process 200 is illustrated as a collection of blocks in a logical flow graph, which represents a sequence of operations that can be implemented in hardware, software, or a combination thereof. In the context of software, the blocks represent computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform the recited operations. Generally, computer-executable instructions include routines, programs, objects, components, data structures, and the like that perform particular functions or implement particular abstract data types. The order in which the operations are described is not intended to be construed as a limitation, and any number of the described blocks can be combined in any order and/or in parallel to implement the process. Other processes described throughout this disclosure, in addition to process 200, shall be interpreted accordingly. The process 200 will be discussed with reference to the environment 100 of FIG. 1.
At 202, the image categorization application 118 may acquire images for categorization (indexing) and/or for use by the image application 134. In some embodiments, the images may be obtained by crawling a network, such as the Internet, to obtain images. For example, a list of sites that include public licensed images (i.e., copyright free) may be used as a source of the images and may include sites that enable photographers to post images. The acquired images may be stored in the raw image SD 120.
At 204, the image categorization application 118 may implement a training phase via the training module 124, which may execute sub-operations 204(1)-204(3) to accomplish training of a classifier to categorize images based on textual attributes. At 204(1), the training module 124 may prepare test data for the training process. For example, images may be selected as representative samples for various attributes, which may be labeled for the training phase. At 204(2), an image portion to be categorized may be identified by the training module 124. For example, in the sky replacement scenario, a sky region of an image may be labeled as “cloudy” at 204(1), and selected at 204(2). At 204(3), the training module 124 may be trained to execute a region segmentation process that identifies attributes of the image portions using a trained codebook.
In accordance with various embodiments, at 206 the image categorization application 118 may implement a categorization phase having sub-operations 206(1)-206(3). At 206(1), the categorization module 126 may perform region segmentation, as trained at 204(3), to identify one or more attributes of the image. At 206(2), the categorization module 126 may cut out the image portion, such as by identifying the image portion boundary and removing extraneous data. At 206(3), the categorization module 126 may identify additional textual attributes. In the sky replacement example, the additional attributes may include a horizon line, a position of the sun, a richness factor, and a layout factor, among other possible textual attributes.
In some embodiments, at 208 the image application 134 may perform an imaging phase, which may execute sub-operations 208(1)-208(3) to accomplish selection of an image and modification of the target image 112. At 208(1), the search module 136 may query images from the searchable image SD 122 based on user-selected attributes. At 208(2), the search module 136 may receive an image selection from the user 108 as the source image 110. At 208(3), the feature replacement module 138 may modify the target image 112 with the source image 110 that is selected at the operation 208(2).
FIGS. 3-12 provide further details on the training phase operation 204, the categorization phase operation 206, and the imaging phase operation 208, which are divided into the following sections of Illustrative Training, Illustrative Categorizing, and Illustrative Imaging, respectively. Each of the FIGS. 3-12 is discussed with reference to the environment 100 of FIG. 1 and/or any preceding figures in this description.
Illustrative Training
FIG. 3 is a flow diagram of an illustrative training process 300 in accordance with embodiments of the disclosure. The training process 300 may be performed by the training module 124 and/or other appropriate modules stored on the servers 102 or the client devices 104.
At 302, the training module 124 may receive training images for training the categorization module 126. The training images may be samples of each category or unique instance of an attribute, category, or the like. For example, when training a sky replacement application, the training images may include a number of images of each category that may include, without limitation, blue sky images, cloudy sky images, and sunset images.
At 304, the training module 124 may receive one or more labels for the training images. For example, the labels may indicate the category (e.g., blue sky, cloudy, etc.) and/or other textual attribute information (e.g., location of sun, horizon position, sky boundary, etc.). In another illustrative example that involves a building replacement application, the labels may indicate a type of building (e.g., office tower, house, parking garage, etc.) and/or other attribute information (e.g., number of floors, color, etc.).
At 306, the training module 124 may identify a focus portion of an image (e.g., the sky region, etc.). For example, the focus portion may be selected using any available image cutout tool known in the art to separate the focus portion from the image (e.g., sky and non-sky regions).
At 308, the image size may be reduced to expedite processing by the one or more processors 114 when training the categorization module 126. In some embodiments, the training image may be resized such that the width and height do not exceed about 400 pixels.
At 310, the training module 124 may perform region segmentation that includes sub-processes of category training at 310(1) and layout training at 310(2), each being discussed in turn.
In some embodiments, the training module 124 may train multiple classifiers (e.g., one classifier for each category) at 310(1). In some embodiments, the training module 124 may train a support vector machine (SVM) classifier for each category. For example, a blue sky classifier may be trained using blue sky training images as positive examples and the other training images (cloudy, sunset, etc.) as negative examples. For each image, the training module 124 may apply each of the classifiers, which output an SVM score for each category. Thus, the blue sky classifier would have a blue sky score and so forth for each classifier.
The SVM scores describe a degree of membership of an image to the categories. In some embodiments, the scores may be used to arrange the images in a user interface to enable simplified user-selection and navigation of the images. For example, when three categories are used, then three SVM scores are generated and may be used to arrange the images in a triangular chart representing three-dimensional (3D) space based on the three SVM scores. Similarly, when two categories are used, the images may include two scores that enable the images to be plotted on a two-dimensional (2D) chart or a line chart. Representation of the images is described below with reference to a user interface shown inFIG. 10. Ultimately, the SVM scores may be used as a category attribute.
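Purely as an illustration of the per-category scoring described above, a minimal sketch in Python might look like the following; the use of scikit-learn's LinearSVC, the category names, and the bag-of-words histogram input are assumptions made for this sketch rather than a description of the actual embodiments.

```python
# Minimal sketch of one-vs-rest SVM scoring per category (illustrative only).
# Assumes each image is already represented as a bag-of-words histogram.
import numpy as np
from sklearn.svm import LinearSVC

CATEGORIES = ["blue_sky", "cloudy", "sunset"]  # assumed category names

def train_category_classifiers(histograms, labels):
    """Train one linear SVM per category; positives are that category's images,
    negatives are all other training images."""
    classifiers = {}
    for category in CATEGORIES:
        y = np.array([1 if label == category else 0 for label in labels])
        clf = LinearSVC(C=1.0)
        clf.fit(histograms, y)
        classifiers[category] = clf
    return classifiers

def score_image(classifiers, histogram):
    """Return one SVM score per category for a single image histogram."""
    h = np.asarray(histogram).reshape(1, -1)
    return {name: float(clf.decision_function(h)[0])
            for name, clf in classifiers.items()}
```

The per-category scores returned by score_image correspond to the membership scores that may later be used to arrange images in a user interface.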
At 310(2), the training module 124 may determine a layout of the image. The training module 124 may individually train a sky/non-sky pixel classifier for each of the categories using the training images of 302 with the focus portions of 306. Each classifier may use a same visual descriptor as used in the categorization training of 310(1) and the patches from the sky/non-sky region within the category as the positive/negative examples. The classifier may be implemented via Random Forests® (or another comparable algorithm) to output a soft label ([0,1]) for each pixel. After the pixel level classification, obtained soft labels may be used as the data terms in a graph cut based segmentation to produce a binary focus portion map. After the segmentation, the training module 124 may discard images where the focus portion is less than a predetermined percentage of the entire image (e.g., <30% of the image area).
In accordance with some embodiments, the training module 124 may extract layout attributes as follows. First, using the focus portion map, a line of the horizon may be estimated by moving a horizontal line upwards from the bottom of the image until a number of sky pixels below the line is greater than a predetermined percent (e.g., 5%, etc.). Next, the focus portion may be categorized into a predetermined layout type. In the context of sky replacement, the layout types may include: full-sky (e.g., the sky region covers the whole image); object-in-sky (e.g., the sky region covers the whole image except for one or more portions that may be due to an extraneous object such as a bird); landscape (e.g., approximately 95%-100% of the pixels above the horizon are sky pixels); normal-sky (e.g., approximately 75%-95% of the pixels above the horizon are sky pixels); and others (e.g., the remaining images). The layout types may be different for other types of images, such that a building replacement categorizer may use building layout types that are different from the sky replacement examples listed above.
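A minimal sketch of the horizon and layout heuristics just described, assuming a binary sky mask is already available, could be written as follows; the object-in-sky cutoff and the helper names are illustrative assumptions.

```python
# Illustrative horizon estimation and layout typing from a binary sky mask
# (True/1 = sky pixel). Thresholds follow the percentages given in the text,
# except where noted as assumptions.
import numpy as np

def estimate_horizon_row(sky_mask, below_frac=0.05):
    """Move a horizontal line upward from the bottom until the fraction of
    sky pixels below the line exceeds below_frac (e.g., 5%)."""
    height = sky_mask.shape[0]
    for row in range(height - 1, -1, -1):
        if sky_mask[row:, :].mean() > below_frac:
            return row
    return 0

def layout_type(sky_mask, horizon_row, object_in_sky_frac=0.98):
    """Classify the layout; the 0.98 object-in-sky cutoff is an assumed value."""
    above = sky_mask[:horizon_row, :]
    frac_above = float(above.mean()) if above.size else 0.0
    if sky_mask.all():
        return "full-sky"
    if sky_mask.mean() >= object_in_sky_frac:
        return "object-in-sky"
    if frac_above >= 0.95:
        return "landscape"
    if frac_above >= 0.75:
        return "normal-sky"
    return "others"
```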
FIG. 4 is a pictorial flow diagram of an illustrative training process 400 to train a codebook to perform categorization of images in accordance with some embodiments of the disclosure. The training process 400 may be performed by the training module 124 and/or other appropriate modules stored on one or more of the servers 102 or one or more of the client devices 104.
At 402, an image is segmented into patches 404. The image may be evenly divided to create a number of the patches 404 for sampling, such as 16×16 patches. Selection of smaller patches (i.e., a larger number of the patches 404) may enable a classifier to obtain more detailed information on an image while selection of larger patches may increase performance of the process 400.
At 406, the training module 124 may perform feature quantization using Random Forests® (or another comparable algorithm) to create a codebook. The training module 124 may represent each of the training images as a “bag-of-words”, which is a collection of evenly sampled patches (e.g., 16×16 patches, etc.). For example, 16×16 patches of an image resized to 400 pixels may be sampled at 8-pixel intervals. Each patch may be assigned to a nearest codeword in a visual codebook. The patch may be represented by the concatenation of a scale-invariant feature transform (SIFT) descriptor (histogram) and mean hue, saturation and value (HSV) color. In some embodiments, a codebook with about 2,500 codewords may be trained on about 250,000 patches that are randomly sampled from all training images, such as by using Random Forests® (or another comparable algorithm). Each patch may be assigned to one codeword in one tree 408. The image may be represented by a histogram of visual words, such as to describe the category of the image (e.g., blue sky visual words, etc.).
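The bag-of-words idea can be sketched, in simplified form, as follows; this sketch assumes OpenCV and scikit-learn, keeps only the mean-HSV part of the patch descriptor, and substitutes a k-means codebook for the SIFT-plus-HSV descriptor and Random Forests® codebook named above.

```python
# Simplified bag-of-words sketch (illustrative; not the described SIFT + HSV +
# Random Forests pipeline). Each patch is reduced to its mean HSV color and
# quantized against a k-means codebook.
import numpy as np
import cv2
from sklearn.cluster import KMeans

def patch_descriptors(bgr_image, patch=16, step=8):
    """Mean HSV color of every patch sampled on a regular grid."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    descriptors = []
    for y in range(0, hsv.shape[0] - patch + 1, step):
        for x in range(0, hsv.shape[1] - patch + 1, step):
            block = hsv[y:y + patch, x:x + patch].reshape(-1, 3)
            descriptors.append(block.mean(axis=0))
    return np.array(descriptors, dtype=np.float32)

def train_codebook(all_descriptors, n_codewords=2500):
    """Stand-in codebook trained on descriptors pooled from all training images."""
    return KMeans(n_clusters=n_codewords, n_init=4, random_state=0).fit(all_descriptors)

def image_histogram(codebook, descriptors):
    """Histogram of codeword assignments that represents one image."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(np.float32)
    return hist / max(hist.sum(), 1.0)
```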
At 410, the training module may store the feature space partition 412 for the codebook. The feature space partition may enable categorization of the patches of images, and thus the images, into the categories that are defined for the image (e.g., blue sky, cloudy sky, sunset, or other types of categories).
Illustrative Categorization
FIG. 5 is a flow diagram of an illustrative categorization process 500 in accordance with various embodiments of the disclosure. The categorization process 500 may be performed by the categorization module 126 and/or other appropriate modules stored on one or more of the servers 102 or one or more of the client devices 104.
At 502, the categorization module 126 may select images for categorization. The images may be the source images 110 that are stored in the raw image SD 120.
At 504, the categorization module 126 may identify focus portions. For example, the focus portions for a sky replacement image categorization application may include a sky portion as distinguished from a non-sky portion.
At 506, the categorization module 126 may crop (or cut out) the focus portion of the image for categorization. For example, the sky portion may be isolated to enable further processing of patches of the sky portion for classification.
At 508, the image may be optionally reduced in size to enable further processing. For example, the image size may be reduced to 400 pixels in each direction of length and width, or another pixel value, percentage, etc., that is less than the original image size.
At 510, the categorization module 126 may identify attributes of the image via various sub-processes 510(1)-510(3), each discussed in turn.
At 510(1), the categorization module 126 may determine a category of the image (e.g., blue sky, cloudy, sunset, etc.) based on the codebook score. The patches of each image may be processed by each classifier (for each category) and a score may be computed that is representative of the category of the image.
At 510(2), the categorization module 126 may determine a layout of the image. As discussed above, the layout may be determined by identifying an approximate horizon line (or other distinguishing feature) and then categorizing the layout accordingly. In the sky replacement example, the layout may be categorized as one of full-sky, object-in-sky, landscape, normal-sky, or other.
At 510(3), the categorization module 126 may determine other attributes of the image. For example, the categorization module 126 may determine a horizon line, a position of the sun, a richness value, or other textual attributes of the image. The textual attributes may be used by a user interface to enable a user to select attributes during a search, which may enable retrieval of source images having the selected attributes. Further details on the other textual attributes are discussed with reference to FIG. 8.
FIG. 6 is a pictorial flow diagram of an illustrative process 600 of using the codebook that is described in the process 400 of FIG. 4 to categorize images in accordance with one or more embodiments of the disclosure.
At 602, the categorization module 126 may segment an image 604 (e.g., the source image 110) into patches. In some embodiments, the patches may be equally sized.
At 606, the categorization module 126 may perform feature quantization on each patch 608 to determine a score from the codebook that was created in the process 400.
At 610, the categorization module 126 may assign each patch to a codeword in a tree 612.
At 614, the categorization module 126 may represent each image (via the patches) with a histogram that aggregates all of the patches. The histogram may be a unique identifier of the image and enable classification of the image, such as one of blue sky, cloudy sky, or sunset. In some embodiments, the histogram may enable arrangement of the image relative to other images by having a score for each category, which is discussed next.
FIG. 7 is a flow diagram of an illustrative process 700 of scoring images to determine a category for the image in accordance with various embodiments of the disclosure.
At 702, the categorization module 126 may calculate a score for each category. For example, an image may have a blue sky score, a cloudy sky score, and a sunset score. The scores may be calculated by classifiers for each category (e.g., blue sky classifier, etc.).
At 704, the categorization module 126 may compare the category scores to category attributes. The combination of these scores may enable an arrangement of the images with respect to other images (having different histograms/scores) in a user interface. A user interface is discussed below with reference to FIG. 10.
At 706, the categorization module 126 may determine a category for the image. For example, the image may be labeled as one of blue sky, cloudy sky, or sunset, among other possible categories.
FIG. 8 is a pictorial flow diagram of an illustrative process 800 of determining textual attributes of images in accordance with some embodiments of the disclosure.
At 802, the categorization module 126 may determine the horizon of the image. The horizon of the image may be determined by moving a horizontal line upwards from the bottom of an image 804 until a number of sky pixels below the line is greater than a predetermined percent (e.g., 5%, etc.). A horizon line 806 may be identified and recorded for later use, such as for a searchable textual attribute in a user interface.
At 808, the categorization module 126 may determine whether a sun is included in the image, and if so, the location of the sun in an image 810. An intensity difference between sun and clouds is larger in the CMYK color space for sunset and cloudy skies than in other color spaces, which may be analyzed at 812 by each color channel. When the image is categorized in the sunset category, the categorization module 126 may detect a largest connected component whose brightness is greater than a threshold (e.g., 245 by default) in the magenta (M) channel. When the image is categorized in the cloudy sky category, the categorization module 126 may perform a similar detection in the black (K) channel. If the aspect ratio of the detected region is within a predetermined range (e.g., [0.4, 2.5]) and the ratio of the region's area to the area of the region's bounding box is greater than 0.5 (an empirical description of the shape of a visible sun), a sun is detected at 814. The existence and location of the sun may be identified and recorded for later use, such as for a searchable textual attribute in a user interface.
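One way the connected-component sun test might be sketched is shown below; it assumes the relevant CMYK channel (M for sunset images, K for cloudy images) has already been extracted and scaled to 0-255, and the SciPy calls and helper name are illustrative assumptions rather than the actual implementation.

```python
# Illustrative sun test on a single CMYK channel (values 0-255), following the
# thresholds given in the text at face value.
import numpy as np
from scipy import ndimage

def detect_sun(channel, brightness_thresh=245,
               aspect_range=(0.4, 2.5), fill_thresh=0.5):
    labels, count = ndimage.label(channel > brightness_thresh)
    if count == 0:
        return None                                   # no bright region at all
    sizes = ndimage.sum(np.ones(labels.shape), labels, index=range(1, count + 1))
    region = labels == (int(np.argmax(sizes)) + 1)    # largest connected component
    ys, xs = np.nonzero(region)
    height = ys.max() - ys.min() + 1
    width = xs.max() - xs.min() + 1
    aspect = width / float(height)
    fill = region.sum() / float(height * width)       # region area vs. bounding box
    if aspect_range[0] <= aspect <= aspect_range[1] and fill > fill_thresh:
        return float(ys.mean()), float(xs.mean())     # approximate sun location
    return None
```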
At 816, the categorization module 126 may measure a richness value of an image 818. The richness value may be a measure of the “activity” in the image, which may be determined by the number of edges located in a sky region 820 of the image. In some embodiments, the richness of the sky or clouds may be characterized by the amount of image edges. An adaptive linear combination of the edge numbers detected at 822 by a Sobel detector and a Canny detector in the sky region may be used to assess the edges, since the Canny detector is good at detecting small scale edges while the Sobel detector is more suitable for middle and large scale edges. In one implementation, the categorization module 126 may use the Sobel detector and the Canny detector with n_s and n_c as the detected edge numbers. The edge number n of the image is defined by equation 1.
n = κ·n_s·s(−(n_s−1000)/100) + n_c·s((n_s−1000)/100)   Equ. 1
where s(x) = 1/(1+exp(−x)) is a sigmoid function and κ is a constant parameter to make the edge numbers of the Sobel and Canny detectors comparable (empirically set to 8). The equation indicates that if the edge number by the Sobel detector is small, more weight is given to the Canny detector and vice versa. The categorization module 126 may then quantize the edge number into various intervals (e.g., five intervals) so that the set of images in the database is evenly divided. Finally, each image may be assigned to one of the intervals of richness (e.g., 1 through 5, etc.). The richness factor may be identified and recorded for later use, such as for a searchable textual attribute in a user interface.
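A sketch of the richness measure, implementing Equation 1 as reconstructed above, might look like the following; the OpenCV edge detectors, the Sobel magnitude and Canny thresholds, and the percentile-based quantization are assumptions made for illustration.

```python
# Illustrative richness computation within the sky region (8-bit grayscale and
# a binary sky mask are assumed). Edge-count thresholds are assumptions.
import numpy as np
import cv2

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def richness_edge_number(gray, sky_mask, kappa=8.0):
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1)
    sobel_edges = np.sqrt(gx ** 2 + gy ** 2) > 100          # assumed threshold
    n_s = int(np.count_nonzero(sobel_edges & (sky_mask > 0)))
    canny_edges = cv2.Canny(gray, 50, 150) > 0               # assumed thresholds
    n_c = int(np.count_nonzero(canny_edges & (sky_mask > 0)))
    # Equation 1 as reconstructed from the text.
    return kappa * n_s * sigmoid(-(n_s - 1000) / 100.0) + \
           n_c * sigmoid((n_s - 1000) / 100.0)

def quantize_richness(edge_numbers, levels=5):
    """Assign each image to one of `levels` intervals so the database is
    divided approximately evenly."""
    edges = np.asarray(edge_numbers, dtype=float)
    bounds = np.percentile(edges, np.linspace(0, 100, levels + 1)[1:-1])
    return np.digitize(edges, bounds) + 1                    # richness 1..levels
```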
FIG. 9 is a flow diagram of an illustrative process 900 of ranking images based on color signature in accordance with some embodiments of the disclosure. The users 108 may find color to be an important characteristic in an image source, particularly when trying to enhance an image in a particular manner.
At 902, the categorization module 126 may determine a color signature. The color of each sky image may be defined by a color signature as shown in equation 2.
s = {w_k, c_k}, k = 1, . . . , K   Equ. 2
where w_k is a weight, c_k is a color in LAB space, and K (=3) is the number of color components. The color signature may be obtained offline by clustering all pixels in the sky region using the K-means algorithm.
At 904, color signatures may be compared across images. At any time during the search, the user 108 may select an image of interest and find more similar results in terms of the color using the process 900.
At 906, the images may be ranked based on the color signature similarity as compared to a selected image. A similarity between two signatures may be measured by the Earth Mover's Distance (EMD). The results may be ranked based on the EMD distance. The ranking of the images may be useful in a user interface having a visual display of the source images, as described in the next section.
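A possible sketch of the color-signature ranking, assuming scikit-learn k-means with K=3 LAB components as in Equation 2 and OpenCV's EMD for the comparison, is shown below; the function names and data layout are illustrative.

```python
# Illustrative color signature (Equation 2) and EMD-based ranking.
import numpy as np
import cv2
from sklearn.cluster import KMeans

def color_signature(lab_sky_pixels, k=3):
    """Cluster sky pixels (N x 3, LAB) into k weighted color components and
    pack them in OpenCV's EMD signature layout: [weight, L, a, b] per row."""
    km = KMeans(n_clusters=k, n_init=4, random_state=0).fit(lab_sky_pixels)
    weights = np.bincount(km.labels_, minlength=k) / float(len(km.labels_))
    return np.hstack([weights[:, None], km.cluster_centers_]).astype(np.float32)

def rank_by_color(query_signature, candidate_signatures):
    """Rank candidates by Earth Mover's Distance to the query (nearest first)."""
    distances = [cv2.EMD(query_signature, sig, cv2.DIST_L2)[0]
                 for sig in candidate_signatures]
    return np.argsort(distances)
```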
Illustrative Imaging
FIG. 10 is an illustrative user interface 1000 to enable user selection and manipulation of images based on various user-selected attributes in accordance with one or more embodiments of the disclosure. A received user selection may be used by the search module 136 to locate images stored in the searchable image SD 122 that have been categorized by the categorization module 126.
In some embodiments, the user interface 1000 may include a category triangle 1002 that may enable user-selection of an image based on the predetermined categories (e.g., blue sky, cloudy sky, sunset). The category triangle 1002 is representative of one possible shape that may provide an intuitive display of images that are categorized into three categories. Other shapes, including lines, squares, circles, hexagons, etc. may be better suited when other numbers of categories are used by the categorization module 126.
As discussed above with reference to FIG. 7, each image may have three category (SVM) scores, which can be viewed as a point in a 3D space. A set of points from the images in the searchable image SD 122 may lie approximately on a flat “triangle” in three dimensions (3D). To provide a simple interface selection, the search module 136 may project these points into a 2D triangle using principal components analysis.
The category triangle 1002 may be presented as an equilateral triangle that provides a semantic organization of the source images. When a user traverses the triangle from a blue sky vertex 1004 to a sunset vertex 1006, the images may gradually change from blue sky with white clouds in daytime, to sky with red clouds and a dark foreground at sunset. The images between the blue sky vertex 1004 and the sunset vertex 1006 tend to have skies before sunset. Similarly, clouds may gradually change from white to grey when the user 108 traverses from the blue-sky vertex 1004 to a cloudy-sky vertex 1008. The images in the center of the triangle may include a mixture of the three categories.
The user interface 1000 may enable the user 108 to place and move a 2D reference point 1010 in the triangle. Images within a range 1012 surrounding the reference point 1010 may be retrieved and ranked in terms of their 2D distance to the reference point. The user may also specify a radius of the range 1012 to limit the number of retrieved images.
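The projection and range query described above might be sketched as follows, assuming scikit-learn PCA over the three per-category SVM scores; this is an illustrative sketch rather than the user-interface implementation.

```python
# Illustrative projection of per-image category scores to the 2D triangle and
# retrieval of images near a user-placed reference point.
import numpy as np
from sklearn.decomposition import PCA

def project_to_triangle(svm_scores):
    """svm_scores: (N, 3) array of per-category scores -> (N, 2) coordinates."""
    return PCA(n_components=2).fit_transform(np.asarray(svm_scores))

def query_near_point(points_2d, reference_point, radius):
    """Indices of images within `radius` of the reference point, ranked by
    2D distance (nearest first)."""
    distances = np.linalg.norm(points_2d - np.asarray(reference_point), axis=1)
    inside = np.nonzero(distances <= radius)[0]
    return inside[np.argsort(distances[inside])]
```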
The user interface 1000 may include a sun and horizon selector 1014. The user 108 may be enabled to intuitively select (e.g., draw, drag, etc.) positions of a sun 1016 and a horizon 1018, or omit (e.g., delete or omit from the selector) the sun and/or horizon. The user selections in the sun and horizon selector 1014 may enable a search by the search module 136 of images with attributes that substantially match the information (e.g., sun position, horizon position) that may be selected by the user, which is categorized via the operations 802 and 808 of the process 800.
In some embodiments, the user interface 1000 may also include a layout selector 1020. A received layout selection may correspond to searchable attributes of images which are categorized at the operation 510(2) of the process 500.
The user interface 1000 may include a richness selector 1022. A received richness selection may correspond to searchable attributes of images which are categorized at the operation 816 of the process 800. The layout selector 1020 and/or the richness selector 1022 may be implemented as drop-down selectors or via other techniques to enable user selection of one of many possible layout features or richness levels.
The user interface 1000 may include a search results display section 1024 to display results that match any user selections made using the various controls described above including the category triangle 1002, the sun and horizon selector 1014, the layout selector 1020, the richness selector 1022, and/or other controls that may be implemented in the user interface. In accordance with embodiments, the display section 1024 may be actively updated when the user changes selections, such as by refreshing the images in the display section when a user drags a mouse across the category triangle 1002 to move the reference point 1010.
The user interface may include a target image selection portion 1026 to select (e.g., via a browse button 1028, etc.) and display the target image 112 that may be modified by replacing one or more portions of the target image with a selected image 1030 from the display section 1024. A replace command 1032 may initiate a feature replacement and generate a modified image 1034 for display to the user.
In some embodiments, the user interface 1000 may also include a path search selector 1036 to initiate the search module 136 to perform a path selection as described below.
FIG. 11 is a flow diagram of an illustrative process 1100 of identifying a path of similar images for a path selection. The user 108 may desire to find a number of sky images that are similar to a selection in a search result (e.g., displayed in the user interface 1000). The search module 136 may identify a number of intermediate images that are “between” two selected images such as to continue a color and attribute spectrum from one image to another image. The search module 136 may organize images from the searchable image SD 122 to present the attribute spectrum via the user interface 1000.
At 1102, the search module 136 may combine the category triangle 1002 and the richness factor determined at the operation 816 to obtain a group of visually similar images. However, other attributes may be selected to create the path.
At 1104, the search module 136 may rank the images identified at the operation 1102 by color to reduce a number of images to a predetermined quantity (e.g., 10, 20, etc.). In some embodiments, the search module 136 may establish an edge between two nodes when the nodes are similar for category, richness, and color attributes. A color similarity (based on EMD distance) may be used as a weight of the edge.
At 1106, the search module 136 may compute a min-max cost for a shortest path between the two nodes to identify a smooth path. For example, p(ε) = {e_0, e_1, . . . , e_s, . . . } may be a shortest path whose max-transition-cost max{e_0, e_1, . . . , e_s, . . . } is not greater than a value ε, where e_s is an edge weight on the path. The search module 136 may find the shortest path with minimal max-transition-cost according to equation 3:
p* = argmin_{p(ε)∈P} max_{e_s∈p(ε)} e_s   Equ. 3
where P = {p(ε) | ε > 0} contains all shortest paths for various values of ε. Because the range of edge weights is limited in this problem, the search module 136 may segment the value ε into various levels (e.g., 16 levels, etc.) within the range [0, 10] and perform a binary search to obtain an approximate solution.
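A rough sketch of the min-max path search is given below; it binary-searches the threshold ε over a fixed number of levels and uses a breadth-first search as a generic stand-in for the shortest-path computation, and the graph representation is an assumption made for illustration.

```python
# Illustrative min-max ("smoothest") path search between two selected images.
from collections import deque

def path_under_threshold(edges, n_nodes, src, dst, eps):
    """BFS over the subgraph that keeps only edges with weight <= eps;
    edges is a list of (u, v, weight) tuples."""
    adjacency = [[] for _ in range(n_nodes)]
    for u, v, w in edges:
        if w <= eps:
            adjacency[u].append(v)
            adjacency[v].append(u)
    previous, queue = {src: None}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in adjacency[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

def min_max_path(edges, n_nodes, src, dst, levels=16, max_eps=10.0):
    """Binary-search the smallest epsilon level whose subgraph still connects
    the two nodes, approximating the path with minimal max transition cost."""
    lo, hi, best = 0, levels - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        eps = max_eps * (mid + 1) / levels
        path = path_under_threshold(edges, n_nodes, src, dst, eps)
        if path is not None:
            best, hi = path, mid - 1      # try a smaller max transition cost
        else:
            lo = mid + 1
    return best
```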
At 1108, the search module 136 may enable presentation of the result of the min-max cost shortest path. For example, the results may be displayed in the search results display section 1024 of the user interface 1000 to enable the user 108 to select an image.
FIG. 12 is a pictorial flow diagram of an illustrative process 1200 of image modification by merging aspects of a source image into a target image in accordance with one or more embodiments of the disclosure. The operations of the process 1200 may be performed by the feature replacement module (FRM) 138 to modify the target image 112.
At 1202, the FRM 138 may receive the selected image 1030 (one of the source images 110), such as from the user interface 1000. In some embodiments, the FRM 138 may enforce a weak geometric constraint during a user search when the target image 112 is identified before the source image 110 is selected. For example, Q and R may be regions above a horizon in the target image 112 and the source image 110, respectively. The FRM 138 may require that an overlap ratio Q∩R/Q∪R be not less than a predetermined value (e.g., 0.75, etc.).
At 1204, the FRM 138 may insert the selected image 1030 to replace a feature of the target image 112. To replace the feature (e.g., the sky, etc.), the FRM 138 may replace the sky region of the target image 112 with the sky region of the selected image 1030 by aligning their horizons to create a modified image 1206.
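A toy sketch of the horizon-aligned sky swap and the overlap-ratio constraint from the preceding paragraphs might look like the following; equal image sizes, a vertical-shift-only alignment, and the mask conventions are simplifying assumptions.

```python
# Toy horizon-aligned sky replacement (illustrative only).
import numpy as np

def overlap_ratio(mask_q, mask_r):
    """|Q ∩ R| / |Q ∪ R| for two boolean above-horizon masks."""
    union = np.logical_or(mask_q, mask_r).sum()
    if union == 0:
        return 0.0
    return np.logical_and(mask_q, mask_r).sum() / float(union)

def replace_sky(target, target_sky_mask, target_horizon, source, source_horizon):
    """Copy source pixels into the target's sky region after vertically
    shifting the source so the two horizon rows coincide."""
    shift = target_horizon - source_horizon
    aligned_source = np.roll(source, shift, axis=0)
    result = target.copy()
    result[target_sky_mask] = aligned_source[target_sky_mask]
    return result
```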
At 1208, the FRM 138 may determine whether the target image 112 needs (or would benefit from) a modification (e.g., color modification, darkening, lightening, etc.). When a modification is selected (by a user or the FRM 138), the process 1200 may proceed to 1210. In some embodiments, the modification may be selected at 1208 when the target image 112 is of the blue sky or cloudy sky category.
At 1210, the FRM 138 may determine attributes of the selected image 1030. To obtain visually plausible results, the FRM 138 may adapt a brightness and color of a foreground in the selected image 1030 to the foreground of the target image 112. In some embodiments, the FRM 138 may apply a category-specific color transfer in HSV space.
At 1212, the FRM 138 may compute color transfer variables (shift and variance multiplier) between two sky regions 1214 and then apply the variables to the non-sky region to create a modified foreground image 1216.
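The shift-and-variance-multiplier transfer could be sketched as below, assuming OpenCV HSV conversion and boolean sky masks; treating the three HSV channels uniformly and clipping to 0-255 are simplifications, and the category-specific variants mentioned above are not reproduced.

```python
# Illustrative shift-and-scale color transfer applied to the target foreground.
import numpy as np
import cv2

def transfer_foreground_color(target_bgr, target_sky_mask,
                              source_bgr, source_sky_mask):
    t_hsv = cv2.cvtColor(target_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    s_hsv = cv2.cvtColor(source_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    t_sky = t_hsv[target_sky_mask]
    s_sky = s_hsv[source_sky_mask]
    t_mean, s_mean = t_sky.mean(axis=0), s_sky.mean(axis=0)
    scale = (s_sky.std(axis=0) + 1e-6) / (t_sky.std(axis=0) + 1e-6)
    foreground = ~target_sky_mask
    # Shift the foreground toward the source statistics and scale its variance.
    t_hsv[foreground] = (t_hsv[foreground] - t_mean) * scale + s_mean
    t_hsv = np.clip(t_hsv, 0, 255)   # HSV ranges treated uniformly for brevity
    return cv2.cvtColor(t_hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)
```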
At 1218, the final image may be outputted, which may be the modified image 1206 or the modified foreground image 1216. When the retrieved image is within the sunset category, the FRM 138 may directly transfer colors of the source non-sky region in the retrieved image to the target non-sky region at 1218.
Thus, the process 1200 may enable insertion of a replacement object, such as a sky, into a target image. In addition, the process 1200 may make adjustments to other aspects of the target image (e.g., the foreground) to enhance a perceived realism of the modified target image.
CONCLUSION
Although the techniques have been described in language specific to structural features and/or methodological acts, it is to be understood that the appended claims are not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as exemplary forms of implementing such techniques.