RELATED APPLICATIONS
The present application claims priority to U.S. Patent Application Ser. No. 60/946,956, Attorney Docket Number MS1-3567USP1, entitled “Video Collage”, to Mei et al., filed on Jun. 28, 2007, which is incorporated by reference herein for all that it teaches and discloses.
TECHNICAL FIELD
The subject matter relates generally to video representation, and more specifically, to presenting a video collage from a video sequence for efficient video browsing.
BACKGROUND
Representing multimedia in different formats presents many challenges. For instance, the quantity of multimedia data has increased dramatically in recent years with the popularity of digital capturing devices. As online delivery of video content has surged to an unprecedented level, users now face an enormous number of videos. One problem is how to effectively and efficiently represent the important information encoded in video data while removing redundancy. Another problem is how to represent video content for efficient browsing, whether the video is an unedited home video, a professional video program, or an online video clip.
Various techniques have been attempted to present video content. One technique is a video booklet system that selects a set of thumbnails from an original video and prints the thumbnails on a predefined set of templates in a variety of forms. However, the predefined booklet templates usually lack a compact layout, since the focus of the video booklet is to support artistic templates and personalized delivery. Another technique is a stained-glass video summary, in which key-frame areas of interest are packed together and visualized as irregular shapes, like stained glass. The drawback is that the stained-glass visualization is not visually pleasing because of the irregular shapes and the unsmooth transitions between them.
Two further techniques exist for presenting video content. One is a pictorial summary of video content, which arranges video posters along a timeline to tell an underlying story. The other is a video snapshot, which is a complete solution for compact static video summarization. Both techniques lack a satisfying presentation layout. Therefore, it is desirable to find ways to construct a collage from a video sequence that helps a viewer understand the video content.
SUMMARY
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In view of the above, this disclosure describes various exemplary methods, computer program products, and user interfaces for providing a compact synthesized video collage for efficient video browsing. The video collage is constructed from a video sequence of video content by selecting representative images from the video content, and by extracting and resizing regions of interest (ROI) from those representative images. The described techniques arrange regions of interest on a canvas and preserve a temporal structure of the video content in terms of a layout in the video collage. The video collage offers viewing advantages and convenience to a user of a computing device. The video collage is efficient for browsing large amounts of data in a video presentation while preserving a storyline.
Also, this disclosure illustrates formulating an energy equation that maximizes representativeness of the video content and minimizes transition to address regions of interest for extraction and blending. Furthermore, this disclosure improves a user interface experience by automatically constructing a compact and visually appealing synthesized collage from a video sequence for efficient video browsing. The user may browse video content in a variety of more efficient ways such as in a one dimensional collage, a two dimensional collage, a dynamic or a static collage, key frames, video clips and video content corresponding to the video collage. Thus, the techniques for the video collage offer browsing advantages and convenience to the user of the computing device while preserving a storyline.
BRIEF DESCRIPTION OF THE DRAWINGS
The Detailed Description is set forth with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in different figures indicates similar or identical items.
FIG. 1 is a block diagram of an exemplary system for a video collage.
FIG. 2 is an overview flowchart showing an exemplary process for the video collage of FIG. 1.
FIG. 3 is a block diagram showing an exemplary video collage with blending edges.
FIG. 4 is a block diagram showing the exemplary video collage of FIG. 3 without seams and in a compact layout.
FIG. 5 is a block diagram showing an exemplary user interface for the video collage.
FIG. 6 is a block diagram of an exemplary system for the video collage of FIG. 1.
DETAILED DESCRIPTION
Overview
This disclosure is directed to various exemplary methods, computer program products, and user interfaces for generating a video presentation scheme by combining regions of interest (ROI) into a video collage. Traditional techniques for video presentation cannot be readily applied to constructing a video collage, since those techniques typically lack a compact layout and produce irregular shapes with unsmooth transitions between them. Techniques for creating a picture collage from a collection of images cannot be applied directly either, because photos and video differ: video is an information-intensive medium with more redundancy and with better-organized temporal structures, such as scenes and shots. Thus, the techniques described for generating a video collage allow automatic construction of a compact and visually appealing synthesized video collage from the video content.
In one aspect, the disclosure is directed towards constructing a video collage from images from a photo collection. The method includes extracting and resizing the images from the photo collection and arranging the images on a canvas according to a timestamp.
In another aspect, the techniques for creating the video collage formulate an energy minimization equation that maximizes representativeness of the video content by extracting the regions of interest (ROI) and minimizes transitions between the regions of interest by blending these regions. Thus, the techniques extract and blend the regions of interest independently in order for optimization to occur.
In another aspect, a user interface presents a compact and visually appealing synthesized collage from a video sequence for efficient video browsing. The user may browse video content in a variety of more efficient ways, such as a one dimensional collage, a two dimensional collage, a dynamic or a static collage, key frames, video clips, and video content corresponding to the video collage. Thus, the interface for the video collage offers browsing advantages and a variety of browsing modes to the user.
The described techniques for creating the video collage help improve efficiency and provide convenience for the user by constructing a compact and visually appealing synthesized video collage for efficient video browsing. Furthermore, the video collage supports browsing modes that enable the user to view the video collage and to view the corresponding video content, a corresponding video clip, or corresponding key frames. By way of example and not limitation, the video collage described herein may be applied to many contexts and environments, and may be implemented on web search engines, video-sharing sites, video search services, content websites, content blogs, movie sites, media centers, and the like. Furthermore, the video collage may be implemented as a kind of online video service which provides a compact and visually appealing tool for browsing and sharing the video content on the Internet.
Illustrative Environment
FIG. 1 is an overview block diagram of an exemplary system 100 for generating a compact and visually appealing synthesized video collage, which is broadly applicable to any situation in which it is desirable to construct a video collage from video content. Shown is a computing device 102. Computing devices 102 that are suitable for use with the system 100 include, but are not limited to, a personal computer, a laptop computer, a desktop computer, a digital camera, a personal digital assistant, a cellular phone, a video player, and other types of image sources. The computing device 102 may include a monitor 104 to display an exemplary compact synthesized video collage for purposes including, but not limited to, browsing.
The system 100 may implement creation of the video collage as, for example, but not limited to, a tool, a method, a solver, software, an application program, a service, technology resources which include access to the Internet, and the like. Here, the video collage is implemented as an application program 106.
Implementation of the video collage application program 106 includes, but is not limited to, selecting key frames that are representative images of video content 108 and that are also of high quality. The video collage application program 106 makes use of the video content 108 by extracting regions of interest (ROI) from the key-frames and packing them efficiently. The video collage application program 106 enlarges the most salient regions of interest to emphasize the meaningful highlights. A salient region is a relevant part of an image that is a main focus of attention for a typical viewer. The video collage application program 106 arranges the regions of interest without seams and provides visually smooth transitions between them.
When creating the video collage, the video collage application program 106 preserves the temporal structure of the video content 108 in the layout of the resulting collage. The video collage application program 106 selects images from the video content 108 and extracts and resizes the regions of interest (ROI) to construct the exemplary video collage 110, which is shown on the display monitor 104. The video collage 110 offers an efficient video browsing system 112.
The video collage application program 106 generates the exemplary video collage 110, which is applicable to video browsing 112. Here, the video collage application program 106 provides a one dimensional collage, a two dimensional collage, a dynamic or a static collage, key frames, video clips, and video content corresponding to the video collage 110. The disclosure offers browsing advantages and convenience to the user. The display monitor 104 shows a user interface that allows the user of the computing device to browse through the exemplary video collage 110 and the corresponding video clips, video content, and key frames.
Implementation of the Video Collage Program
Illustrated in FIG. 2 is an overview exemplary flowchart of a process 200 for implementing the video collage application program 106 to provide a benefit to users by automatically constructing a visually appealing video collage 110. For ease of understanding, the method 200 is delineated as separate steps represented as independent blocks in FIG. 2. However, these separately delineated steps should not be construed as necessarily order dependent in their performance. The order in which the process is described is not intended to be construed as a limitation, and any number of the described process blocks may be combined in any order to implement the method, or an alternate method. Moreover, it is also possible that one or more of the provided steps will be omitted. The flowchart for the video collage process 200 provides an example of the video collage application program 106 of FIG. 1.
Block 202 of FIG. 2 represents utilizing a video sequence of the video content 108 in the video collage application program 106. In order to provide efficient browsing of video data, the video collage application program 106 presents the main story of the video, such as an effective summarization of the video content. For example, the process 200 preserves the temporal structure of the video content, which makes for efficient browsing and understanding of the whole video content.
Block 204 illustrates selecting key frames that are representative images of the video content 108 and that are also of high quality. Selecting representative images consists of two parts: optimization-based sub-shot selection and key-frame selection. For example, let $\Omega = \{SS_i\}$ $(i = 1, \ldots, N_{SS})$ denote all the sub-shots in a video, and let $\Theta$ denote a subset of $\Omega$ with $N$ sub-shots. The video collage application program 106 then treats selecting representative sub-shots as finding an optimal $\Theta$ which minimizes an energy function. Shown below is an equation for finding the optimal $\Theta$ which minimizes the energy function
where the three parameters $(\alpha, \beta, \gamma)$ have the same constraint as in this equation for representativeness energy: $E_{rep}(\lambda) = -(\alpha A(\lambda) + \beta Q(\lambda) + \gamma D(\lambda))$. The terms $A(SS_i)$, $Q(SS_i)$ and $D(\Theta)$ have the same meanings as in the representativeness equation and can be computed by rewriting the representativeness equation as:
except that the key-frame of each sub-shot is used instead of $I_i$. The video collage application program 106 solves this problem with a heuristic search for a sub-shot selection. The algorithm is shown as:
Input: N, Ω = {SS_i}
Output: Θ
while (n ≤ N) do
    find the sub-shot SS_i with max{A(SS_i) + Q(SS_i)} in Ω
    for each SS_k in the shot to which SS_i belongs do
        A(SS_k) = A(SS_k) - 1, Q(SS_k) = Q(SS_k) - 1; Ω = Ω - {SS_k}
    end for
    Θ = Θ + {SS_i}
    n++
end while
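The heuristic above can be sketched in Python as follows. This is a minimal illustration, not the disclosed implementation: the `SubShot` container, its field names, and `select_sub_shots` are hypothetical. It also assumes one particular reading of the listing, namely that after each pick the scores of the remaining sub-shots in the same shot are lowered by 1 (so later picks spread across shots) and that only the picked sub-shot leaves the candidate pool.

```python
from dataclasses import dataclass


@dataclass
class SubShot:
    """Hypothetical container for one sub-shot; field names are illustrative."""
    name: str
    shot_id: int      # index of the shot this sub-shot belongs to
    attention: float  # A(SS_i), saliency score
    quality: float    # Q(SS_i), quality score


def select_sub_shots(candidates, n_select):
    """Greedily pick up to n_select sub-shots with the largest A + Q.

    Assumption: siblings of a picked sub-shot (same shot) are penalized by 1
    on both scores, and only the picked sub-shot is removed from the pool.
    """
    # Work on copies so the caller's scores are not modified.
    pool = [SubShot(s.name, s.shot_id, s.attention, s.quality) for s in candidates]
    selected = []
    while len(selected) < n_select and pool:
        best = max(pool, key=lambda s: s.attention + s.quality)
        pool.remove(best)
        for other in pool:
            if other.shot_id == best.shot_id:
                other.attention -= 1.0
                other.quality -= 1.0
        selected.append(best.name)
    return selected


demo = [
    SubShot("a", shot_id=0, attention=0.9, quality=0.9),
    SubShot("b", shot_id=0, attention=0.8, quality=0.8),
    SubShot("c", shot_id=1, attention=0.5, quality=0.5),
]
picked = select_sub_shots(demo, 2)
```

In this toy input, "b" initially outscores "c", but the penalty applied after picking "a" from the same shot lets "c" be chosen second, which is the diversity effect the decrement step appears intended to produce.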
In key-frame selection, the number of key-frames to be selected from each sub-shot is decided according to the camera motion in the sub-shot. The video collage application program 106 classifies camera motions into four types: static, pan, tilt, and zoom. Although more than one image may be selected from a pan or tilt sub-shot, these images are blended into one region of interest in the final video collage 110.
Video or photo presentation can be classified into two paradigms: frame-based or region-of-interest (ROI) based. The frame-based paradigm extracts a set of representative key-frames and then arranges these key-frames into a synthesized image according to a temporal structure. The ROI-based paradigm extracts salient regions from the key-frames and then arranges them in a static or a dynamic manner. A salient region is a relevant part of an image that is a main focus of attention for a typical viewer. The process 200 enlarges the most salient regions of interest (ROI) to emphasize the meaningful highlights.
In block 206, the process 200 extracts regions of interest (ROI) from the representative key-frames in the video sequence and resizes the regions of interest according to their saliency. The regions of interest may be fixed to a shape, including but not limited to a rectangle, a square, a triangle, and the like, and are arranged in a predefined temporal order.
In another implementation, the regions of interest may not be fixed to any particular shape, but may take a free form shape without any defined temporal order. The free form shape supports arbitrary shapes of regions of interest (ROI). For example, the free form shape includes ROI arrangement schemes that include but are not limited to a book, a diagonal, and a spiral. Furthermore, the spiral order and any other order may include but is not limited to a circle, a heart, a fan, an ellipse, and a Mickey Mouse shape. Based on the collage style chosen for the free form shape, the process may order the pixels in the video collage in sequence, or order the ROI according to temporal information or saliency. The video collage application program 106 provides as much informative content as possible and as little background information as possible for the video collage 110. For example, the video collage application program 106 supplies the parts of each key-frame that attract the attention of the user and provide useful information.
Saliency refers to the “importance” or “attractiveness” of the visual information embedded in an image. A salient region may describe a relevant part of an image that is a main focus of a typical viewer's attention. A static image attention model may be adopted to extract ROI based on the saliency map. Then each ROI is resized (block 206) according to its saliency to emphasize the meaningful highlights.
In an exemplary implementation of the video collage application program 106, an energy minimization is formulated. In this implementation, there is a video sequence $V$ containing $M$ frames (images) $\{I_i\}$ $(i = 1, \ldots, M)$ and their corresponding ROI maps $\{R_i\}$ $(i = 1, \ldots, M)$. The video collage application program 106 selects $N$ ($N \ll M$) representative images from $V$ and arranges the ROI of these images on a video collage $C$ (video collage 110). For this implementation, $\lambda$ represents a feasible solution where $\lambda = \{I_i, R_i\}$ $(i = 1, \ldots, M)$.
In an exemplary implementation of the video collage application program 106, each ROI $R_i$ has a set of state variables $R_i = \{l_i, p_i, s_i\}$, where $l_i$ is the label of $R_i$ indicating whether $I_i$ is selected ($l_i = 1$) or not ($l_i = 0$) in $C$, $p_i$ is the spatial position of $R_i$ in $C$, and $s_i$ is the size of $R_i$ after being resized according to its saliency. By the triplet $(l_i, p_i, s_i)$, the video collage application program 106 determines whether $I_i$ appears in $C$ and how the corresponding $R_i$ is presented in $C$ (i.e. the position and size).
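As a rough illustration of the state variables described above, the triplet of label, position, and size could be held in a small record per ROI. The class and field names below are hypothetical, and representing position and size as integer pixel pairs is an assumption made only for this sketch.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RoiState:
    """Hypothetical record for one ROI's state variables."""
    selected: bool             # l_i: True if the image appears in the collage
    position: Tuple[int, int]  # p_i: assumed (x, y) of the top-left corner
    size: Tuple[int, int]      # s_i: assumed (width, height) after resizing


def placed_rois(states: List[RoiState]):
    """Return (index, position, size) for every ROI that is selected."""
    return [(i, s.position, s.size) for i, s in enumerate(states) if s.selected]


states = [
    RoiState(True, (0, 0), (4, 3)),
    RoiState(False, (0, 0), (0, 0)),
    RoiState(True, (4, 0), (2, 2)),
]
visible = placed_rois(states)
```

Filtering on the label alone decides whether an image appears, while the position and size fields describe how its ROI is presented, mirroring the roles given to the three state variables.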
Block 208 represents the video collage application program 106 incorporating several desired properties. In particular, two measurements, representativeness and transition, are used to address the regions of interest by extracting and blending them separately for optimization.
Block 208 represents maximizing representativeness and minimizing transition, in which the video collage application program 106 creates an energy minimization equation to find the best $\lambda$ that minimizes an energy or cost $E(\lambda)$. The energy minimization equation is:
$$E(\lambda) = \omega_1 E_{rep}(\lambda) + \omega_2 E_{trans}(\lambda)$$
subject to
$$\sum_{i=1}^{M} \lambda_i = N$$
where $E_{rep}(\lambda)$ denotes the cost from representativeness of $\lambda$, $E_{trans}(\lambda)$ denotes the cost of any transition that is not visually smooth, and $\omega_1$ and $\omega_2$ are two predefined weights controlling the relative strength of each energy term.
Representativeness Cost $E_{rep}(\lambda)$
The representativeness cost is associated with how well the selected images represent the video content. The saliency, quality, and distribution of the selected image set should all be taken into account in measuring representativeness. Therefore, the representativeness energy is defined as a combination of these terms as follows:
$$E_{rep}(\lambda) = -(\alpha A(\lambda) + \beta Q(\lambda) + \gamma D(\lambda))$$
where $\alpha + \beta + \gamma = 1$ and $0 \le \alpha, \beta, \gamma \le 1$. $A(\lambda)$, $Q(\lambda)$ and $D(\lambda)$ measure the saliency, the quality, and the distribution of the selected images, respectively. In order to incorporate the resizing strategy for each ROI (block 206), the equation for representativeness energy is rewritten in more detail as follows:
where $A(I_i, R_i)$ measures the saliency or importance of $I_i$ and can be computed by an image attention model; the quality of $I_i$, i.e. $Q(I_i, R_i)$, is derived from color contrast $C(I_i, R_i)$ and blurring degree $B(I_i, R_i)$; $A_{max}$ is the maximal saliency in $\lambda$; and $\varepsilon$ ($1 \le \varepsilon \le 2$) is a constant that controls the resizing of the ROI of $I_i$. $D(\lambda)$ measures the temporal distribution of $\lambda$, in the sense that the selected images should be uniformly distributed so that as much of the content as possible is preserved. Thus, $D(\lambda)$ can be defined as:
where $p(I_i, R_i)$ = (interval between $I_i$ and $I_{i+1}$) / (total duration of the video). Intuitively, the larger $D(\lambda)$ is, the more uniform the distribution of $\lambda$ is.
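To make the combination of saliency, quality, and distribution concrete, the sketch below evaluates a representativeness energy for a set of selected frames. Several parts are assumptions: the distribution term is taken to be an entropy over the normalized intervals $p$ (which grows as the intervals become more uniform, matching the intuition stated above), the saliency and quality terms are simple averages, and the resizing constant $\varepsilon$ is left out. The function names and default weights are hypothetical.

```python
import math


def temporal_distribution(timestamps, duration):
    """Assumed form of D: entropy of the normalized gaps between selected frames."""
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    probs = [gap / duration for gap in gaps if gap > 0]
    return -sum(p * math.log(p) for p in probs)


def representativeness_energy(saliency, quality, timestamps, duration,
                              alpha=0.4, beta=0.3, gamma=0.3):
    """E_rep = -(alpha*A + beta*Q + gamma*D), with alpha + beta + gamma = 1."""
    if abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("alpha + beta + gamma must equal 1")
    a_term = sum(saliency) / len(saliency)   # assumed: mean saliency
    q_term = sum(quality) / len(quality)     # assumed: mean quality
    d_term = temporal_distribution(timestamps, duration)
    return -(alpha * a_term + beta * q_term + gamma * d_term)


uniform = temporal_distribution([0, 10, 20, 30], 30)
clustered = temporal_distribution([0, 1, 2, 30], 30)
e_uniform = representativeness_energy([0.5] * 4, [0.5] * 4, [0, 10, 20, 30], 30)
e_clustered = representativeness_energy([0.5] * 4, [0.5] * 4, [0, 1, 2, 30], 30)
```

With equal saliency and quality, the evenly spaced selection yields the larger distribution term and therefore the lower energy, so a minimizer would prefer it.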
Transition Cost $E_{trans}(\lambda)$
The video collage application program 106 seeks a compact and seamless layout of $\lambda$ in $C$ by minimizing the transition energy term $E_{trans}(\lambda)$. Given the selected collection of ROI $\{R_i\}$ $(i = 1, \ldots, M)$ and collage $C$, the arrangement of ROI in the collage is expressed as finding an optimal ROI for each pixel $p$ in $C$, so that each $p$ comes from one of the ROI in $\lambda$. The mapping between pixels and source ROI is known as a labeling, and the label for each pixel is denoted $L(p)$, where $L(p) \in \{1, 2, \ldots, M\}$. The video collage application program 106 detects a seam between two neighboring pixels $p$, $q$ in $C$ if $L(p) \ne L(q)$. Given the spatial layout of the selected ROI in $C$, the video collage application program 106 resizes each ROI in the final collage by bilinear interpolation according to its saliency. The video collage application program 106 measures the transition cost as the sum of color differences across the seams of the resized neighboring ROI:
where $R'_{L(p)}(q)$ denotes the color of pixel $q$ ($q \in C$) in the resized ROI $R'_{L(p)}$.
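The seam test and the idea of summing color differences across seams can be sketched as below. This is a simplification under stated assumptions: the collage is a small grid of grayscale values, each neighboring pair is compared using the composed collage colors rather than by evaluating each resized ROI at both pixels, and the function names and weights are hypothetical. A helper also shows the weighted sum of the two energy terms.

```python
def transition_cost(labels, colors):
    """Sum absolute color differences across seams in a labelled collage.

    labels[y][x] is the ROI index L(p) for pixel p; colors[y][x] is an
    assumed grayscale intensity of the composed collage at that pixel.
    """
    height, width = len(labels), len(labels[0])
    cost = 0.0
    for y in range(height):
        for x in range(width):
            # Visit the right and lower neighbor so each pair is counted once.
            for dy, dx in ((0, 1), (1, 0)):
                ny, nx = y + dy, x + dx
                if ny < height and nx < width and labels[y][x] != labels[ny][nx]:
                    cost += abs(colors[y][x] - colors[ny][nx])
    return cost


def total_energy(e_rep, e_trans, w1=1.0, w2=1.0):
    """E = w1 * E_rep + w2 * E_trans, with predefined weights w1 and w2."""
    return w1 * e_rep + w2 * e_trans


labels = [[1, 1, 2, 2],
          [1, 1, 2, 2]]
colors = [[10, 10, 50, 50],
          [10, 12, 40, 40]]
seam_cost = transition_cost(labels, colors)
energy = total_energy(-0.5, seam_cost, w1=1.0, w2=0.01)
```

Only the two pixel pairs straddling the boundary between ROI 1 and ROI 2 contribute here, so pixels inside a single ROI never add to the cost regardless of their contrast.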
If the conditions for maximizing representativeness and minimizing transition are not satisfied, then the process flow 200 takes a NO branch to block 210, which does not include or use these images in constructing the video collage 110.
Returning to block 208, if the conditions for maximizing representativeness of the regions of interest and minimizing transition between the ROI are satisfied, then the process flow 200 takes a YES branch to block 212, which includes or uses these regions of interest in constructing the video collage.
From block 208, the process may proceed to block 212 for blending. Based on the above ROI selection and resizing operations, an optimal set of ROI is obtained which minimizes $E_{rep}(\lambda)$. To construct a video collage with a compact and visually appealing form, the selected ROI should be seamlessly blended to minimize $E_{trans}(\lambda)$, with the following properties:
- (1) the spatial layout should be consistent with the temporal order of the selected ROI. Thus, the temporal structure of ROI in the spatial layout is preserved “left to right” and “top to down”;
- (2) the ROI within the same sub-shot should be blended according to the camera motion. Thus, the ROI within the same sub-shot represents the pan by horizontally blending and tilt by vertically blending the images from the same sub-shot;
- (3) all of the ROI should not be overlapped; and
- (4) all of the neighboring ROI should satisfy the seamless transition.
Two of these conditions, that the ROI should not overlap and that all neighboring ROI should satisfy the seamless transition, can be met as follows. The ROI are first placed onto the video collage 110 compactly according to the criteria that the spatial layout should be consistent with the temporal order of the selected ROI and that the ROI should not overlap. Then the transition between neighboring ROI is represented by low-order statistics, namely a spatial mean and covariance, which is interpreted as a Gaussian model.
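One simple way to satisfy the temporal-order and non-overlap criteria is a row-by-row placement, sketched below. This is an illustrative stand-in rather than the disclosed layout procedure: `pack_rows` is a hypothetical helper, ROI are assumed to be axis-aligned rectangles given in temporal order, and the result can leave gaps that a truly compact layout would close.

```python
def pack_rows(sizes, canvas_width):
    """Place rectangles left to right, wrapping to a new row when a row is full.

    sizes is a list of (width, height) pairs in temporal order; the return
    value is the list of top-left (x, y) positions in the same order.
    """
    positions = []
    x = y = row_height = 0
    for width, height in sizes:
        # Start a new row if this rectangle would cross the right edge.
        if x > 0 and x + width > canvas_width:
            x = 0
            y += row_height
            row_height = 0
        positions.append((x, y))
        x += width
        row_height = max(row_height, height)
    return positions


layout = pack_rows([(4, 3), (4, 2), (4, 3), (2, 2)], canvas_width=10)
```

Because each new row starts below the tallest rectangle of the previous row, no two rectangles overlap, and reading the positions row by row recovers the original temporal order.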
An image may contain seams. For neighboring pixels $p$ and $q$, if $L(p) \ne L(q)$, a seam exists between them. Let $S$ and $T$ be two small blending areas (i.e. the areas within a distance of less than 20 pixels of the seam) close to the seam between two neighboring ROI $R_i$ and $R_j$; the ROI blending is performed on $S$ and $T$. To be exact, for a pixel $p$ in $S$ or $T$, the probability densities $f_S(p)$ and $f_T(p)$ according to a Gaussian distribution are:
where $\mu_S$ and $\mu_T$ are the means of the neighboring area of $p$ in $S$ or $T$, and $a$ and $b$ are the edges of $S$ and $T$. Then, for a pixel $p$ in $S$ or $T$ to be blended, the value after blending $I(p)$ can be computed as follows:
where $I_S(p)$ and $I_T(p)$ denote the value of $p$ in $S$ and $T$ before blending, respectively.
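The blending step can be illustrated in one dimension across a vertical seam. The exact densities and blending formula are not reproduced in this text, so the sketch assumes one plausible reading: each side contributes a Gaussian weight centered on the spatial mean of its blending area, and the blended value is the weight-normalized average of the two source values. The function names, the shared `sigma`, and the sample numbers are all hypothetical.

```python
import math


def gaussian_weight(x, mean, sigma):
    """Unnormalized Gaussian weight of position x around a spatial mean."""
    return math.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2))


def blend_value(value_s, value_t, x, mean_s, mean_t, sigma):
    """Assumed blend: average of the S and T values weighted by f_S and f_T.

    Intended for positions inside the narrow blending band near the seam,
    where at least one of the two weights is comfortably above zero.
    """
    w_s = gaussian_weight(x, mean_s, sigma)
    w_t = gaussian_weight(x, mean_t, sigma)
    return (w_s * value_s + w_t * value_t) / (w_s + w_t)


# Seam at x = 0; S lies to the left (mean -10), T to the right (mean +10).
at_seam = blend_value(100.0, 200.0, 0, -10, 10, 10)
in_s = blend_value(100.0, 200.0, -10, -10, 10, 10)
in_t = blend_value(100.0, 200.0, 10, -10, 10, 10)
```

Under this reading the blended value moves smoothly from the S value toward the T value as the pixel crosses the seam, which is the visually smooth transition the blending is meant to achieve.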
Exemplary Video Collage
FIGS. 3 and 4 illustrate exemplary video collages. FIG. 3 illustrates a two dimensional video collage 300 of a home video with blending edges, and FIG. 4 illustrates the exemplary video collage of FIG. 3 without any blending edges.
FIG. 3 shows an exemplary two dimensional video collage 300 of a home video sequence with ROI blending edges. The ROI are excerpted from the representative key-frames which are selected from the original video, resized according to their saliency, and then arranged without any seams in the video collage 300. In an exemplary implementation, the video may include but is not limited to thirty video sequences with 3k shots and 50k sub-shots, and the number of ROI may range from, but is not limited to, ten to thirty. The temporal structure of the video content is preserved in the order of a “left to right” layout 302 and a “top to down” layout 304 as shown in the two dimensional video collage 300.
FIG. 4 shows the exemplary two dimensional video collage 400 of the home video sequence. The two dimensional video collage 400 corresponds to the two dimensional video collage 300 shown in FIG. 3, but is shown without any blending edges. The temporal structure of the video content is preserved in the order of a “left to right” layout 402 and a “top to down” layout 404 as shown in the two dimensional video collage 400.
Exemplary Video Collage Interface
FIG. 5 illustrates an exemplary video collage user interface 500 for the video collage application program 106. FIG. 5 shows a novel video browsing system with a user interface 500. The user interface may include but is not limited to four separate panels, shown as panel A at 502, panel B at 504, panel C at 506, and panel D at 508. The user can change the collage resolution (i.e., the number of ROI in the video collage) by moving the marker 510 vertically on the slide bar (i.e., the bar between panel A at 502 and panel B at 504) to view the video collage content at different resolutions.
In one aspect, the video collage user interface 500 supports a two dimensional static collage. For example, the two dimensional collage may be shown in panel A at 502. By left-clicking on a specific ROI, the user may access the corresponding video content shown in panel B at 504.
In another aspect, the video collage user interface 500 supports a two dimensional dynamic collage. For example, the two dimensional collage may be shown in panel A at 502. By right-clicking on a specific ROI, the user may choose from a pop-up menu to play the corresponding video clip in panel A at 502 or to play all of the clips in panel A at 502. Each thumbnail corresponds to a short video clip. Advantages of this representation are that the video collage 110 is composed of ROI, which makes the collage more compact, that the thumbnails in the collage are resized according to their saliency, and that the video collage is designed for a single video.
In another aspect, the video collage user interface 500 supports a one dimensional static collage. For example, the one dimensional collage may be shown in panel C at 506. By left-clicking on a specific ROI, the user may access the corresponding video content shown in panel B at 504.
In another aspect, the video collage user interface 500 supports a one dimensional dynamic collage. For example, the one dimensional collage may be shown in panel C at 506. By right-clicking on a specific ROI, the user may choose from a pop-up menu to play the corresponding video clip in panel A at 502 or to play all of the clips in panel A at 502.
In another implementation, the video collage user interface 500 supports key-frames. For example, the user may view key-frames in panel D at 508 and click on a specific key-frame to access the corresponding video content in panel B at 504. Through these different methods on the video collage user interface 500, the user can browse the video content very efficiently.
Video Collage System
FIG. 6 is a schematic block diagram of an exemplary general operating system 600. The system 600 may be configured as any suitable system capable of implementing the video collage application program 106. In one exemplary configuration, the system comprises at least one processor 602 and memory 604. The processing unit 602 may be implemented as appropriate in hardware, software, firmware, or combinations thereof. Software or firmware implementations of the processing unit 602 may include computer- or machine-executable instructions written in any suitable programming language to perform the various functions described.
Memory 604 may store programs of instructions that are loadable and executable on the processor 602, as well as data generated during the execution of these programs. Depending on the configuration and type of computing device, memory 604 may be volatile (such as RAM) and/or non-volatile (such as ROM, flash memory, etc.). The system may also include additional removable storage 606 and/or non-removable storage 608 including, but not limited to, magnetic storage, optical disks, and/or tape storage. The disk drives and their associated computer-readable media may provide non-volatile storage of computer readable instructions, data structures, program modules, and other data for the communication devices.
Memory 604, removable storage 606, and non-removable storage 608 are all examples of computer storage media. Additional types of computer storage media that may be present include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computing device 102.
Turning to the contents of the memory 604 in more detail, the memory may include an operating system 610 and one or more video collage application programs 106 for implementing all or a part of the video collage method. For example, the system 600 illustrates an architecture in which these components reside on one system or one server. Alternatively, these components may reside in multiple other locations, servers, or systems. For instance, all of the components may exist on a client side. Furthermore, two or more of the illustrated components may combine to form a single component at a single location.
In one implementation, the memory 604 includes the video collage application program 106, a data management module 612, and an automatic module 614. The data management module 612 stores and manages storage of information, such as images, ROI, equations, and the like, and may communicate with one or more local and/or remote databases or services. The automatic module 614 allows the process to operate without human intervention. For example, the automatic module 614, in an exemplary implementation, may allow the video collage application program 106 to automatically construct a compact synthesized collage from a video sequence, and the like.
The system 600 may also contain communications connection(s) 616 that allow the processor 602 to communicate with servers, the user terminals, and/or other devices on a network. Communications connection(s) 616 is an example of a communication medium. A communication medium typically embodies computer readable instructions, data structures, and program modules. By way of example, and not limitation, communication media include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. The term computer readable medium as used herein includes both storage media and communication media.
The system 600 may also include input device(s) 618 such as a keyboard, mouse, pen, voice input device, touch input device, etc., and output device(s) 620, such as a display, speakers, printer, etc. The system 600 may include a database hosted on the processor 602. All these devices are well known in the art and need not be discussed at length here.
The subject matter described above can be implemented in hardware, or software, or in both hardware and software. Although embodiments of the video collage have been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts are disclosed as exemplary forms of implementing the video collage. For example, the methodological acts need not be performed in the order or combinations described herein, and may be performed in any combination of one or more acts.