US7035435B2 - Scalable video summarization and navigation system and method

Info

Publication number: US7035435B2 (application US10/140,511)
Authority: US (United States)
Prior art keywords: keyframes, shot, scene, frame, importance value
Legal status: Expired - Fee Related, expires (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: US10/140,511
Other versions: US20030210886A1 (en)
Inventors: Ying Li, Tong Zhang, Daniel R. Tretter
Current assignee: Hewlett Packard Development Co LP (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original assignee: Hewlett Packard Development Co LP
Priority date: 2002-05-07 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2002-05-07
Publication date: 2006-04-25

Legal events:
- Application filed by Hewlett Packard Development Co LP
- Priority to US10/140,511 (US7035435B2)
- Assigned to HEWLETT-PACKARD COMPANY. Assignment of assignors interest (see document for details). Assignors: TRETTET, DANIEL; ZHANG, TONG; LI, YING
- Assigned to HEWLETT-PACKARD COMPANY. Corrective assignment to correct the assignee, filed on 10/25/2002, recorded on reel 013444 frame 0347; assignor hereby confirms the assignment of the entire interest. Assignors: TRETTER, DANIEL R.; ZHANG, TONG; LI, YING
- Priority to AU2003230369A (AU2003230369A1)
- Priority to JP2004504147A (JP4426966B2)
- Priority to EP03724542A (EP1502210A2)
- Priority to PCT/US2003/014709 (WO2003096229A2)
- Assigned to HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P. Assignment of assignors interest (see document for details). Assignors: HEWLETT-PACKARD COMPANY
- Publication of US20030210886A1
- Publication of US7035435B2
- Application granted
- Adjusted expiration
- Status: Expired - Fee Related (current)


Abstract

A method and system for automatically summarizing a video document. The video document is decomposed into scenes, shots and frames, and an importance value is assigned to each scene, shot and frame. Keyframes are allocated among the shots based on the importance value of each shot. The allocated number of keyframes are then selected from each shot. The number of keyframes may be altered for greater or lesser detail in response to user input.

Description

THE FIELD OF THE INVENTION
The present invention generally relates to summarizing and browsing of video material, and more particularly to automating and customizing the summarizing and browsing process.
BACKGROUND OF THE INVENTION
Digital video is a rapidly growing element of the computer and telecommunication industries. Many companies, universities and even families already have large repositories of videos both in analog and digital formats. Examples include video used in broadcast news, training and education videos, security monitoring videos, and home videos. The fast evolution of digital video is changing the way many people capture and interact with multimedia, and in the process, it has brought about many new needs and applications.
Consequently, research and development of new technologies that lower the costs of video archiving, cataloging and indexing, as well as improve the efficiency, usability and accessibility of stored videos are greatly needed. One important topic is how to enable a user to quickly browse a large collection of video data, and how to achieve efficient access and representation of the video content while enabling quick browsing of the video data. To address these issues, video abstraction techniques have emerged and have been attracting more research interest in recent years.
Video abstraction, as the name implies, is a short summary of the content of a longer video document which provides users concise information about the content of the video document, while the essential message of the original is well preserved. Theoretically, a video abstract can be generated manually or automatically. However, due to the huge volumes of video data already in existence and the ever increasing amount of new video data being created, it is increasingly difficult to generate video abstracts manually. Thus, it is becoming more and more important to develop fully automated video analysis and processing tools so as to reduce the human involvement in the video abstraction process.
There are two fundamentally different kinds of video abstracts: still-image abstracts and moving-image abstracts. The still-image abstract, also called a video summary, is a small collection of salient images (known as keyframes) extracted or generated from the underlying video source. The moving-image abstract, also called video skimming, consists of a collection of image sequences, together with the corresponding audio abstract extracted from the original sequence, and is thus itself a video clip of considerably shorter length. Generally, a video summary can be built much faster than a video skim, since only visual information is utilized and no handling of audio or textual information is necessary. Consequently, a video summary can be displayed more easily, since there are no timing or synchronization issues. Furthermore, the temporal order of all extracted representative frames can be displayed in a spatial order so that users are able to grasp the video content more quickly. Finally, when needed, all extracted still images in a video summary may be printed out very easily.
As a general approach to video summarization, the entire video sequence is often first segmented into a series of shots; one or more keyframes are then extracted from each shot, either by uniform sampling or by adaptive schemes that depend on the underlying video content complexity, based on a variety of features including color and motion. A typical output of these systems is a static storyboard with all extracted keyframes displayed in their temporal order. There are two major drawbacks to these approaches. First, while these efforts attempt to reduce the amount of data, they often only present the video content “as is” rather than summarizing it. Since different shots may be of different importance to users, it is preferable to assign more keyframes to important shots than to less important ones. Second, a static storyboard cannot provide users the ability to obtain a scalable video summary, which is a useful feature in a practical summarization system. For example, sometimes the user may want to take a detailed look at certain scenes or shots, which requires more keyframes, and sometimes the user may only need a very coarse summarization, which requires fewer keyframes.
What is needed is a system and method to automatically and intelligently generate a scalable video summary of a video document that offers users the flexibility to summarize and navigate the video content to their own desired level of detail.
SUMMARY OF THE INVENTION
The invention described herein provides a system and method for automatically summarizing a video document. The video document is decomposed into scenes, shots and frames, and an importance value is assigned to each scene, shot and frame. A number of keyframes are allocated among the shots based on the importance value of each shot. The allocated number of keyframes are then selected from each shot. The number of keyframes may be altered for greater or lesser detail in response to user input.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1a is a schematic illustration of a hierarchical video structure.
FIG. 1b is a flowchart illustrating one process for creating a video summary according to the invention.
FIG. 1c is a flowchart illustrating one embodiment of the computation of importance values according to the invention.
FIG. 1d is a flowchart illustrating one embodiment of the removal of keyframes according to the invention.
FIG. 2 illustrates the optical flow fields for camera panning, tilting and zooming.
FIG. 3 illustrates the eight directions into which camera motion is quantized.
FIG. 4 illustrates an example of an MPEG Group of Pictures structure.
FIGS. 5a and 5b are examples of histograms showing camera right panning and zooming, respectively.
FIGS. 6a and 6b are graphs of the statistics r and AvgMag, showing a video shot containing a camera panning sequence and a shot without camera motion, respectively.
FIGS. 7a and 7b show the region occupied by skin colors in CbCr color space.
FIG. 8 shows an example where skin color detection recognizes one face while another face is neglected.
FIGS. 9a and 9b show an example where face detection fails to detect a face in one frame while detecting a face in another frame.
FIGS. 10a–10d show examples of applying a vertical edge operator to select a well-focused keyframe.
FIG. 11 is a graph of the computed standard deviation of edge energies for an example video shot.
FIG. 12 shows the importance curve for an example video shot.
FIGS. 13a and 13b illustrate keyframe selection with and without consideration of the frame edge energy.
FIGS. 14a and 14b show an example of the scalable video summary described herein.
FIG. 15 is a schematic representation of a computer system usable to create a video summary according to the invention.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
In the following detailed description of the preferred embodiments, reference is made to the accompanying drawings which form a part hereof, and in which is shown by way of illustration specific embodiments in which the invention may be practiced. It is to be understood that other embodiments may be utilized and structural or logical changes may be made without departing from the scope of the present invention. The following detailed description, therefore, is not to be taken in a limiting sense, and the scope of the present invention is defined by the appended claims.
According to the present invention, a video sequence 20 is first represented as a hierarchical tree based on the detected scene 22 and shot 24 structure, as shown in FIG. 1a. As used herein, a shot 24 is defined as a video segment captured during a continuous shooting period, and a scene 22 is composed of a set of semantically related shots 24. A shot 24 is composed of a series of individual frames 26. A variety of algorithms exist for shot and scene detection, any one of which may be suitable for use in decomposing a video sequence 20 into its scene 22, shot 24 and frame 26 structure. Depending upon the type of video sequence 20 being analyzed, one algorithm may be preferred over another. For example, an approach particularly suitable for use with home video is presented in U.S. patent application Ser. No. 10/020,255, filed Dec. 14, 2001, commonly assigned herewith and incorporated herein by reference.
One process for creating a video summary 80 according to the invention is illustrated in FIG. 1b. After decomposing the video sequence 20 into scenes 22, shots 24 and frames 26, each component scene 22, shot 24 and frame 26 is assigned an importance value based on measurements which are explained in greater detail below. One process for computing importance values according to the invention is illustrated in FIG. 1c. Next, given a desired number of keyframes N, keyframes are distributed among the underlying scenes 22 based on the importance value of the scenes 22, where the more important the scene 22, the more keyframes it is assigned. Then, within each scene 22, the assigned number of keyframes is further distributed among the scene's component shots 24, based on the importance value of the shots 24. Finally, the designated number of keyframes is selected from the underlying video frames 26 of each shot 24 according to the importance value of the frames 26. Using this process, a scalable video summarization 80 which assigns more keyframes to important scenes, shots and frames is achieved. Moreover, to meet users' needs for flexible video content browsing, the specified number of keyframes may be allowed to grow and shrink as a user navigates the video content.
It becomes readily apparent that the performance of the invention described herein depends heavily on the definition and computation of the three categories of importance measurement with respect to the scenes 22, shots 24 and frames 26, respectively. The computation of the importance values is described in greater detail below.
Scene Importance Computation
Three factors are considered in determining the importance of a scene 22 according to one embodiment of the invention: 1) the scene length, in terms of the number of frames in the scene; 2) the activity level of the scene; and 3) the number of component shots contained in the scene. The underlying rationale for these considerations is that longer scenes, higher activity, and more shots are all indicative of an important scene. For example, if an interesting subject attracts the videographer's attention, more time will usually be spent capturing video than if the subject is uninteresting. Also, when the underlying video content of a scene 22 is highly dynamic (usually characterized by a lot of camera motion, many object activities and a large number of contained shots), the scene 22 has complex contents and thus deserves more keyframes.
Determining the scene length and the number of component shots 24 contained in a scene 22 is a straightforward process, requiring little more than counting the number of frames 26 and shots 24 in a scene 22. Any suitable method of counting frames 26 and shots 24 may be employed to that end. The more difficult aspect is quantifying the activity level of a scene 22.
To quantify the activity level of a scene 22, the frame-to-frame color histogram difference for each consecutive frame pair within the scene is computed, and the average of these differences is used as the scene's activity level indicator. Although the histogram difference is not a very accurate motion indicator, it is fast to compute and the results are sufficient at the scene level. More accurate, but also more time-consuming, motion vector computation could also be used, as described below with respect to computing the importance values of individual shots 24. Assuming the video sequence 20 being analyzed contains a total of SN scenes, and denoting scene i's importance by IM_i, the scene importance is computed as:
$$IM_i = \alpha_1 \times \frac{l_i}{\sum_{j=1}^{SN} l_j} + \beta_1 \times \frac{HD_i}{\sum_{j=1}^{SN} HD_j} + \gamma_1 \times \frac{SH_i}{\sum_{j=1}^{SN} SH_j}$$
where α1, β1, and γ1 are weighting coefficients that sum to 1, l_i is scene i's length, HD_i is its average histogram difference, and SH_i is the number of its contained shots. If the total number of desired keyframes is N, then the number of keyframes N_i assigned to scene i is N_i = IM_i × N.
The values of α1, β1, and γ1 are empirically determined. According to one embodiment of the invention, the value of α1 may range from 0.10 to 0.20, the value of β1 may range from 0.30 to 0.50, and the value of γ1 may range from 0.40 to 0.60.
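For illustration, the following is a minimal Python sketch of this computation; it is not taken from the patent, and the input measurements and the specific weights (chosen from the ranges above) are assumptions.

```python
import numpy as np

def scene_importances(lengths, hist_diffs, shot_counts,
                      alpha1=0.15, beta1=0.40, gamma1=0.45):
    """IM_i = a1*l_i/sum(l) + b1*HD_i/sum(HD) + g1*SH_i/sum(SH)."""
    l = np.asarray(lengths, dtype=float)
    hd = np.asarray(hist_diffs, dtype=float)
    sh = np.asarray(shot_counts, dtype=float)
    return alpha1 * l / l.sum() + beta1 * hd / hd.sum() + gamma1 * sh / sh.sum()

# Allocate N keyframes among scenes: N_i = IM_i * N (rounded here).
N = 30
im = scene_importances([900, 450, 1350], [12.5, 30.1, 22.4], [4, 7, 9])
keyframes_per_scene = [int(round(x * N)) for x in im]
```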
Shot Importance Computation
Three factors are considered in determining the importance of a shot 24 according to one embodiment of the invention: 1) the shot length, in terms of the number of frames in the shot; 2) the activity level of the shot; and 3) the detected camera motion, where camera panning is primarily considered. The rationale for considering shot length and activity level is similar to that described above with respect to determining the importance of a scene 22. The reason for including camera motion detection is that shot content tends to be more complex when certain camera motion exists, and thus the shot 24 deserves more keyframes.
As with determining the importance of a scene 22, shot length may be determined by counting the number of frames 26 in a shot 24 using any suitable counting method.
To compute the activity level of a shot 24, the amount of motion between every referenced frame pair within the shot is computed, and the average is used to indicate the activity level of the shot 24. Since many video sequences 20 (especially home videos) are digitized and compressed into H.26X or MPEG-X format, motion vector information may be obtained directly from the original bit stream, which includes predictively coded frames 26. In particular, given a predictively coded frame, say a P-frame, the magnitude of the motion vector mv for every macroblock is first computed; the average magnitude of all motion vectors mv in the P-frame is then used to indicate the activity level. If the video data does not include predictively coded frames 26, the video sequence 20 may either be transcoded to a format which does include predictively coded frames 26, or other methods known in the art may be used to determine and quantify motion vector mv information.
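As a sketch (not from the patent), the per-shot activity measure can be computed from decoded motion vectors as follows; the decoder hook supplying one vector array per P-frame is an assumption.

```python
import numpy as np

def shot_activity(p_frame_vectors):
    """Average motion-vector magnitude over a shot.

    p_frame_vectors: list with one (num_macroblocks, 2) array of (dx, dy)
    motion vectors per P-frame, e.g. as extracted from an MPEG bit stream.
    """
    frame_means = [np.linalg.norm(mv, axis=1).mean()
                   for mv in p_frame_vectors if len(mv)]
    return float(np.mean(frame_means)) if frame_means else 0.0
```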
Camera motion detection has been explored in the prior art, and most existing methods are based on analyzing the optical flow computed between consecutive images. Basically, all camera motion analysis work can be categorized into two classes: 1) algorithms that define an affine model for representing camera motion and estimate the model parameters from the computed optical flow; and 2) algorithms that directly analyze the observed optical flow pattern, without any motion model, by using the angular distribution or the power of the optical flow vectors. (See, for example, J. Kim, H. S. Chang, J. Kim and H. M. Kim, “Efficient camera motion characterization for MPEG video indexing”, ICME2000, New York, 2000.) Although either class of algorithms may be used, the first class is sensitive to camera shaking and jerky motion, so for some applications, such as home video, the second class is preferred. FIG. 2 shows the ideal optical flow patterns for three typical types of camera motion: panning, tilting and zooming.
Since estimation of the optical flow is usually based on gradient methods or block matching methods using raw video data, it can be very computationally expensive. According to one embodiment of the present invention, it is contemplated that ready-to-use motion vector mv information is embedded in the video data bit stream (as in MPEG-X or H.26X formats). This information may be used as an alternative to estimating the optical flow, to reduce the computational load. Camera motion may be detected by analyzing the layout pattern of the extracted motion vectors mv. As stated above with respect to determining the shot 24 activity level, if the video data does not include predictively coded frames 26 containing motion vector mv information, the video may either be transcoded to a format which does include predictively coded frames, or other methods known in the art may be used to determine and quantify motion vector information.
As shown in FIG. 3, camera motion is quantized into eight directions. Each direction includes the nearest sub-region along the counter-clockwise direction; for example, if the motion vector mv sits in the region of 0–45 degrees, it is indexed as direction 1. The discussion below focuses on the detection of camera panning, as this is the major camera motion observed in home video. Ideally, during a camera panning, all motion vectors mv should unanimously point in direction 1 or 5, as shown in FIG. 3. However, due to camera shake, there will also be motion vectors mv sitting in the regions of directions 1 and 8, or directions 4 and 5.
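A hedged sketch of this quantization (direction 1 covering 0–45 degrees, direction 2 covering 45–90 degrees, and so on):

```python
import math

def quantize_direction(dx, dy):
    """Map a motion vector to one of the eight directions of FIG. 3."""
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    return int(angle // 45) + 1          # directions 1..8

assert quantize_direction(1.0, 0.2) == 1   # near-horizontal rightward motion
assert quantize_direction(-1.0, 0.0) == 5  # leftward motion
```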
In a typical MPEG video sequence, there are three frame types: I, P and B frames. All I-frames are intra-coded without motion estimation and compensation, while P-frames are predictively coded from either a previous I-frame or P-frame. Each macroblock within a P-frame may be intra-coded, forward-predicted or simply skipped. To further improve the compression ratio, a B-frame is defined, which may be bi-directionally predictively coded from previous and future I- and P-frames. FIG. 4 shows a typical MPEG GOP (Group of Pictures) structure 30, which contains fifteen frames with the pattern IBBPBBPBB. . . . Since a B-frame may contain both forward- and backward-predicted motion vectors mv, which would likely confuse camera motion detection, all B-frames are discarded and only P-frames are used. This is acceptable since, at a typical rate of 29.97 frames per second, there may be eight P-frames within a second, and a typical camera motion will usually last longer than one second.
Three major steps are involved in the method described herein for detecting camera motion. In step 1, the motion vectors mv of each P-frame are categorized into the eight directions described above, and a direction histogram 32 is computed. If a frame 26 does belong to a camera motion sequence, say a right camera panning, then a majority of its motion vectors mv should lie in directions 1 and 8; otherwise, the motion vectors mv may be scattered without a major direction represented. Moreover, a continuous series of P-frames is required to present a similar motion pattern before a camera motion is declared to be detected. This is illustrated in FIGS. 5a and 5b. FIG. 5a shows a histogram 32 of 8 P-frames within a right panning sequence, and FIG. 5b shows a histogram 32′ of 8 P-frames within a camera zooming sequence. It is readily apparent that almost all P-frames in FIG. 5a present a similar pattern, with the major direction pointing to the right. In FIG. 5b, the motion vectors mv are distributed almost equally along each direction, which characterizes a zooming sequence.
In step 2 of the method for detecting camera motion, the directional motion ratio r and the average magnitude of the directional motion vectors (AvgMag) of the P-frame are computed. In the case of a right camera panning, r is the ratio of the number of motion vectors mv along directions 1 and 8 to the total number of motion vectors mv contained in the frame 26. If r is larger than a certain threshold, say 0.6, the frame 26 is indexed as a candidate. The threshold on r is selected empirically, and in one embodiment of the invention may have a value in the range of 0.5 to 0.7. AvgMag is simply the average of the magnitudes of all motion vectors mv in directions 1 and 8.
In step 3 of the method for detecting camera motion, the above calculations are repeated for every P-frame within a given shot. If a sequence 40 of candidates of sufficient length is observed and its average AvgMag is larger than a preset threshold, a camera panning sequence is declared to be detected; otherwise, no camera motion exists. FIGS. 6a and 6b each show graphs of the statistics r and AvgMag, where FIG. 6a is a shot containing a camera panning sequence and FIG. 6b is a shot without camera motion. Careful observation shows that considering only r, without taking AvgMag into account, may lead to incorrect detection of camera motion. For example, considering only r would likely lead to a wrong motion detection decision on the shot sequence of FIG. 6b, since that shot also contains a long candidate sequence caused by a minor shaky motion of the camera. Finally, if more accurate detection results are desired, the standard deviation of the directional motion vectors' magnitudes, StdMag, may also be considered. For example, if there is a continuous camera panning sequence, the StdMag value should be quite small due to the motion consistency.
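Putting the three steps together, a minimal sketch of the panning detector might look as follows; the specific thresholds (r > 0.6, a run of at least 8 candidate P-frames, and the AvgMag floor) are illustrative assumptions within the ranges discussed above.

```python
import numpy as np

def detect_right_panning(p_frames, r_thresh=0.6, min_run=8, mag_floor=2.0):
    """p_frames: list of (directions, magnitudes) array pairs per P-frame,
    with directions quantized to 1..8 as in the sketch above."""
    run_mags = []                                # AvgMag of current candidate run
    for dirs, mags in p_frames:
        dirs, mags = np.asarray(dirs), np.asarray(mags)
        pan = (dirs == 1) | (dirs == 8)          # right-panning directions
        r = pan.mean() if dirs.size else 0.0     # directional motion ratio
        if r > r_thresh:
            run_mags.append(mags[pan].mean())    # frame is a candidate
            if len(run_mags) >= min_run and np.mean(run_mags) > mag_floor:
                return True
        else:
            run_mags = []                        # candidate run is broken
    return False
```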
Now, assume there are a total of SH shots within scene i. Then shot i's importance IMS_i may be computed as
$$IMS_i = \alpha_2 \times \frac{ls_i}{\sum_{j=1}^{SH} ls_j} + \beta_2 \times \frac{Act_i}{\sum_{j=1}^{SH} Act_j} + \gamma_2 \times \frac{Cam_i}{\sum_{j=1}^{SH} Cam_j}$$
where α2, β2, and γ2 are weighting coefficients that sum to 1, ls_i is shot i's length, Act_i is its average motion vector magnitude, and Cam_i is the binary camera motion detection result. Now, given scene i's assigned number of keyframes N_i, the number of keyframes NS_i assigned to shot i is NS_i = IMS_i × N_i. If NS_i is less than 1, the value of NS_i may be set to 1 if it is desired that at least one keyframe be extracted from each shot. Alternately, the value of NS_i may be set to 0 if it is not desired that at least one keyframe be extracted from each shot.
The values of α2, β2, and γ2 are empirically determined. According to one embodiment of the invention, the value of α2 may range from 0.3 to 0.5, the value of β2 may range from 0.4 to 0.6, and the value of γ2 may range from 0.0 to 0.2.
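A sketch of the shot-level allocation under the same caveats as before; the weights are illustrative values from the stated ranges, Cam_i is the binary panning result, and the guard against an all-zero Cam column is an assumption the patent does not spell out.

```python
import numpy as np

def shot_keyframe_quotas(lengths, activities, cam_flags, n_scene_keyframes,
                         alpha2=0.4, beta2=0.5, gamma2=0.1, at_least_one=True):
    ls = np.asarray(lengths, dtype=float)
    act = np.asarray(activities, dtype=float)
    cam = np.asarray(cam_flags, dtype=float)
    ims = alpha2 * ls / ls.sum() + beta2 * act / act.sum()
    if cam.sum() > 0:                        # skip the term if no shot pans
        ims = ims + gamma2 * cam / cam.sum()
    floor = 1 if at_least_one else 0
    return [max(floor, int(q)) for q in ims * n_scene_keyframes]
```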
Frame Importance Computation
Four factors are considered in determining the importance of a frame 26 according to one embodiment of the invention: 1) the percentage of skin-colored pixels in the frame; 2) the number of detected human faces in the frame; 3) the distribution of the frame's edge energy; and 4) the amount of motion activity contained in the frame. The reason for including the first two factors is that, generally speaking, a frame 26 that contains a human face will be more informative than, for example, a landscape frame. If a face is missed by the face detection algorithm, skin-color detection can compensate for the missed detection. The last two factors are used to ensure that the extracted keyframe is a well-focused, clear image and not a blurry one, such as is caused by fast camera motion, fast object movement or bad camera focus. For example, the still image taken after a camera panning is preferred over an image taken during the panning, which may be blurred or unstable. Thus, in the case of frame importance, frames 26 which contain less activity are preferred.
Skin-color detection has been explored extensively in both the face detection and face recognition areas. Primarily two models have been evaluated and used: the YCbCr model, which is naturally related to MPEG and JPEG coding, and the HSV (Hue, Saturation, Value) model, which is mainly used in computer graphics. To approximate the skin color subspaces, skin-color patches have been used to define skin tone regions in both models. Also, since the intensity value Y is observed to have little influence on the distribution of skin color, some work has performed the skin color classification directly in the chrominance plane (CbCr) without taking Y into account. (See, for example, H. Wang and S.-F. Chang, “A highly efficient system for automatic face region detection in MPEG video”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 7, no. 4, pp. 615–628, 1997.) FIG. 7a shows the hues in CbCr color space, while FIG. 7b shows the corresponding region occupied by skin color, obtained from data consisting of various still images that cover a large range of skin color appearance (different races, different lighting conditions, etc.). It can be seen that the skin color samples actually form a single, quite compact cluster in the CbCr color space. Based on this observation, the following rule in terms of RGB is used to classify a color as skin color:
If (Y>=32) and (G<0.8*R) and (B<G) and (B>0.7*G), then the color is classified as skin color.
The reason for including the Y criterion is to exclude regions that are too dark. FIG. 8a (showing a video frame 26) and FIG. 8b (a skin color reduction of the video frame in FIG. 8a) show an example where skin color detection allows the man's face to be well recognized, while the woman's face in the background is neglected since it is in shadow.
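The rule transcribes directly into code. In the sketch below, computing Y from R, G and B with the ITU-R BT.601 luma weights is an assumption, since the patent states the rule in terms of RGB but does not give the Y conversion.

```python
def is_skin_color(r, g, b):
    """Classify an RGB pixel (0..255 per channel) using the rule above."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # assumed BT.601 luma
    return y >= 32 and g < 0.8 * r and b < g and b > 0.7 * g

def skin_ratio(pixels):
    """Fraction of skin-colored pixels, later used as PS_i per frame."""
    pixels = list(pixels)
    return sum(is_skin_color(r, g, b) for r, g, b in pixels) / max(1, len(pixels))
```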
Face detection and recognition algorithms are known in the art, and any suitable algorithm may be used in the implementation of the invention described herein. Consideration may be given to the sensitivity of an algorithm and the computational load it requires. FIGS. 9a and 9b show instances where a face is not detected (FIG. 9a) and where a face is detected (FIG. 9b).
Due to the casual nature of most home video photography, one can easily find many blurry and badly focused video frames 26. Generally speaking, preferred keyframes will be well-focused, clear images with distinct edges. This preference is applied to help identify appropriate keyframe candidates. Specifically, for a given frame, an edge operator is used to find all the edges of the frame, and the standard deviation of the edge energy is then computed. If the standard deviation of the edge energy is larger than a preset value, the frame is declared to be well focused and qualifies as a keyframe candidate; otherwise, the frame is discarded as a keyframe candidate. FIGS. 10a through 10d show examples of applying a vertical edge operator to two images, where one is blurry (FIG. 10a) and the other clear (FIG. 10c). It can easily be seen that the clear image of FIG. 10c has very distinct edges (FIG. 10d), while the blurry image of FIG. 10a has edges which are barely discernible, if at all (FIG. 10b).
Edges may be detected in any of several suitable ways. One is the “Prewitt” edge operator, which uses the following two gradients:
$$G_R = \frac{1}{3}\begin{bmatrix} 1 & 0 & -1 \\ 1 & 0 & -1 \\ 1 & 0 & -1 \end{bmatrix} \qquad\text{and}\qquad G_C = \frac{1}{3}\begin{bmatrix} -1 & -1 & -1 \\ 0 & 0 & 0 \\ 1 & 1 & 1 \end{bmatrix}$$
where G_R and G_C are the row and column gradients, respectively. FIG. 11 shows the computed standard deviations 60 of the edge energies for all frames 26 within one particular shot 24. As can be seen, most of the frames 26 in the shot 24 have small values, except for a few frames 26 with larger values. In fact, this shot 24 contains many blurry portions and only a few well-focused frames 26.
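As an illustrative sketch of this focus test (the threshold value is an assumption; the patent only says it is preset):

```python
import numpy as np
from scipy.ndimage import convolve

PREWITT_R = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]]) / 3.0  # G_R
PREWITT_C = np.array([[-1, -1, -1], [0, 0, 0], [1, 1, 1]]) / 3.0  # G_C

def edge_energy_std(gray):
    """Standard deviation of the Prewitt edge energy of a grayscale frame."""
    g = gray.astype(float)
    gr, gc = convolve(g, PREWITT_R), convolve(g, PREWITT_C)
    return float(np.sqrt(gr ** 2 + gc ** 2).std())

def is_well_focused(gray, std_thresh=10.0):   # threshold chosen for illustration
    return edge_energy_std(gray) > std_thresh
```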
Keyframe Selection
Assume there are a total of F frames within shot i. Then frame i's importance IMF_i may be computed as
$$IMF_i = \alpha_3 \times \frac{PS_i}{\sum_{j=1}^{F} PS_j} + \beta_3 \times \frac{NF_i}{\sum_{j=1}^{F} NF_j} + \gamma_3 \times \frac{EStd_i}{\sum_{j=1}^{F} EStd_j} + \delta \times \left(1 - \frac{FAct_i}{\sum_{j=1}^{F} FAct_j}\right)$$
where α3, β3, γ3, and δ are weighting coefficients that sum to 1, PS_i is the percentage of skin-colored pixels, NF_i is the number of detected faces, EStd_i is the standard deviation of the computed edge energy, and FAct_i is the motion activity contained in frame i. It should be noted that since face and edge detection are both very time-consuming, it is unnecessary to repeat them for every single frame. Instead, a small set of neighboring frames can usually share the same face and edge detection results, due to the continuity of the video content.
The values of α3, β3, γ3 and δ are empirically determined. According to one embodiment of the invention, the value of α3 may range from 0.1 to 0.3, the value of β3 may range from 0.1 to 0.3, the value of γ3 may range from 0.1 to 0.3, and the value of δ may range from 0.3 to 0.5.
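A sketch of the frame-level formula, with illustrative weights drawn from the ranges above and the four per-frame measurements assumed to be precomputed (for example, with the skin-ratio and edge-energy helpers sketched earlier):

```python
import numpy as np

def frame_importances(ps, nf, estd, fact,
                      alpha3=0.2, beta3=0.2, gamma3=0.2, delta=0.4):
    """ps, nf, estd, fact: per-frame skin ratio, face count,
    edge-energy standard deviation and motion activity."""
    def norm(x):
        x = np.asarray(x, dtype=float)
        return x / x.sum() if x.sum() > 0 else np.zeros_like(x)
    return (alpha3 * norm(ps) + beta3 * norm(nf)
            + gamma3 * norm(estd) + delta * (1.0 - norm(fact)))
```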
After the frame importance has been determined, the NS_i keyframes assigned to each shot must be selected from all F frames in the shot. If all F frames are sorted in descending order of their importance values, theoretically the top NS_i frames should be chosen as the keyframes, since those frames are the most important ones. However, if one frame has a large importance value, many of its neighboring frames will also have large importance values, due to the visual and motion continuity of the video content. Thus, all or many of the allocated keyframes may be taken from the same temporal region of the shot, and may not provide a good representation of the video content.
To provide a better representation of the video content of a shot, either time-constrained keyframe selection or importance-adapted keyframe selection may be used. In time-constrained keyframe selection, two additional rules are enforced. First, keyframes should be visually different from each other: each newly extracted keyframe should be visually different from all previously extracted keyframes, which may be checked with color histogram comparisons. Second, keyframes should be temporally separated from each other: all extracted keyframes should be distributed as uniformly as possible across the shot so as to cover the entire video content. A set of well-spread keyframes will usually represent the underlying video content better than a set of temporally clustered keyframes.
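A minimal sketch of time-constrained selection under stated assumptions: frames are visited in descending importance, and the minimum temporal gap, the histogram-distance function and its threshold are all illustrative stand-ins.

```python
def select_time_constrained(importances, hist_diff, n_keyframes,
                            min_gap=15, min_visual_diff=0.2):
    """hist_diff(i, j): assumed color-histogram distance in [0, 1]."""
    order = sorted(range(len(importances)),
                   key=lambda i: importances[i], reverse=True)
    chosen = []
    for i in order:
        if len(chosen) == n_keyframes:
            break
        if all(abs(i - j) >= min_gap and hist_diff(i, j) >= min_visual_diff
               for j in chosen):
            chosen.append(i)
    return sorted(chosen)   # may hold fewer than n_keyframes for flat content
```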
In importance-adapted keyframe selection, keyframes are selected by adapting to the underlying importance curve. In particular, the importance values of all frames within the shot are first normalized to form a curve whose underlying area equals one. FIG. 12 shows the importance curve 70 of one particular shot, where the lower (normalized) curve 70′ is obtained from the upper raw curve by using a 3×1 mean filter. Next, the entire temporal axis of the shot is partitioned into NS_i segments 72 (only one segment 72 is shown in the figure) such that the sum of the importance values inside each segment 72 (i.e., the area under the curve) equals 1/NS_i. The frame 26 having the highest importance within each segment 72 is then chosen as the representative frame 26. To ensure that all extracted keyframes are well spread out along the timeline, a time restriction rule like that used in time-constrained keyframe selection may be used.
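A sketch of the importance-adapted variant; the 3×1 mean filter and the equal-area partition follow the description above, while the array handling and boundary placement via the cumulative sum are implementation assumptions.

```python
import numpy as np

def select_importance_adapted(importances, n_keyframes):
    imp = np.asarray(importances, dtype=float)
    imp = np.convolve(imp, np.ones(3) / 3.0, mode='same')   # 3x1 mean filter
    curve = imp / imp.sum()                                 # unit area
    # Cut where cumulative importance crosses k/NS_i, k = 1..NS_i-1.
    bounds = np.searchsorted(np.cumsum(curve),
                             np.arange(1, n_keyframes) / n_keyframes)
    segments = np.split(np.arange(len(curve)), bounds)
    return [int(seg[np.argmax(curve[seg])]) for seg in segments if seg.size]
```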
Based on experimental results, time-constrained keyframe selection generates slightly better results than importance-adapted keyframe selection, though at the cost of lower speed due to the color histogram computation and comparison. Finally, in time-constrained keyframe selection, the number of extracted keyframes may be less than NS_i if the underlying shot has flat content.
Experimental results verify the effectiveness of the factors used to determine a frame's importance. FIGS. 9a and 9b show two frames, where FIG. 9b is the keyframe extracted after including face detection and FIG. 9a is the original candidate. The image of FIG. 9b is clearly the better choice, as it allows easier identification of the person in the image. FIGS. 13a and 13b show another two frames, where the initially selected keyframe shown in FIG. 13a is replaced by the frame shown in FIG. 13b when the edge-energy restriction factor is enforced. The image of FIG. 13b is more clearly focused and visually more pleasant than the fuzzy image of FIG. 13a.
After generating the initial keyframe set N, as described above, a new keyframe set N′ may be constructed based on the shot-and-scene structure. A new keyframe set N′ may be desired if the user wants either more keyframes and greater detail (N′>N) or fewer keyframes and less detail (N′<N) in the video summary.
When more keyframes are needed, the additional keyframes must be extracted from the underlying video content. Given ND (ND = N′ − N) additional keyframes to extract, the extra keyframes are assigned to all underlying scenes 22 and shots 24 based on their respective importance values, in a manner like that described above for generating the initial keyframe set N. The basic assignment rule is that more important scenes 22 and shots 24 get more keyframes. A keyframe extraction process similar to that described above may be applied after each shot 24 receives its new designated number of keyframes.
When fewer keyframes are needed, the excess keyframes are removed from the initial keyframe set N. As illustrated in FIG. 1d, given ND (ND = N − N′) keyframes that must be removed from the original set N, the ND keyframes are distributed among the underlying scenes 22, where the number of keyframes to be removed from a scene 22 is inversely proportional to the importance of the scene 22. Assuming R keyframes need to be removed from scene j's keyframe set, the procedure is as follows. Starting from the least important shot 24 within the scene 22, check each shot 24: if it contains more than one keyframe, remove the least important keyframe and decrease R by 1. If R equals 0, stop; otherwise, continue with the next shot 24. If the last shot 24 is reached, start over again with the least important shot 24. If only one keyframe is left for every shot 24 and R is still greater than 0, then, starting from the least important shot 24, remove its last keyframe.
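A sketch of this removal loop for a single scene, under the assumption that each shot's keyframes are held as a list of importance values and that the shots are pre-sorted from least to most important:

```python
def remove_keyframes(shots, r):
    """shots: per-shot lists of keyframe importance values, ordered from
    least to most important shot; removes r keyframes in place."""
    while r > 0 and any(len(kf) > 1 for kf in shots):
        for kf in shots:                      # round-robin over the shots
            if r > 0 and len(kf) > 1:
                kf.remove(min(kf))            # drop least important keyframe
                r -= 1
    for kf in shots:                          # every shot is down to one frame
        if r > 0 and kf:
            kf.pop()                          # remove the shot's last keyframe
            r -= 1
```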
Thus, a scalable video summary 80 based on the user's preference may be achieved. Moreover, real-time video summarization may be achieved if the initial keyframe set N is generated offline. The value of a scalable video summary can be seen in the example where a user wants to navigate along the hierarchical (scene-shot-frame) video tree. For instance, if the user wants a detailed summary of certain scenes 22 or shots 24 while only wanting a brief review of others, the invention described herein can easily achieve this by using a predefined but tunable scale factor. Specifically, based on the initial keyframe assignment quota, the scale factor may be used to calculate the currently desired number of keyframes for the desired shot, scene, or even the whole sequence. The keyframes are then extracted or removed using the scheme discussed above. If the user is unhappy with the default navigation scale, he can easily tune it to his own satisfaction.
The video summarization and navigation system described herein may be implemented on a variety of platforms, such as a home computer 100, so long as the chosen platform possesses a processor 102 with sufficient computing power, a data storage system 104 for storing the video summary, and an interface 106 for allowing the user to alter the level of detail of the video summary 80. The data storage system 104 may be a hard disk drive or other persistent storage device, or the random access memory of the chosen platform. The video summary 80 may be displayed on a display device 108, for example a video monitor, or on a hard copy generated by a printer.
Experimental Results
FIGS. 14a and 14b show an example of the scalable video summary 80 described herein, where FIG. 14a shows three initially generated keyframes for a particular shot, and FIG. 14b shows two more keyframes extracted when the user requires a more detailed look at the underlying content. This shot actually contains a long camera panning sequence introducing all of the guests present, and it can be seen that the two additional keyframes give the user a better understanding of the shot.
Although specific embodiments have been illustrated and described herein for purposes of description of the preferred embodiment, it will be appreciated by those of ordinary skill in the art that a wide variety of alternate and/or equivalent implementations may be substituted for the specific embodiments shown and described without departing from the scope of the present invention. Those with skill in the computer and electrical arts will readily appreciate that the present invention may be implemented in a very wide variety of embodiments. This application is intended to cover any adaptations or variations of the preferred embodiments discussed herein. Therefore, it is manifestly intended that this invention be limited only by the claims and the equivalents thereof.

Claims (31)

US10/140,511 (US7035435B2), priority date 2002-05-07, filing date 2002-05-07: Scalable video summarization and navigation system and method. Status: Expired - Fee Related.

Priority Applications (5)

Application Number | Priority Date | Filing Date | Title
US10/140,511 (US7035435B2) | 2002-05-07 | 2002-05-07 | Scalable video summarization and navigation system and method
PCT/US2003/014709 (WO2003096229A2) | 2002-05-07 | 2003-05-07 | Scalable video summarization and navigation system and method
EP03724542A (EP1502210A2) | 2002-05-07 | 2003-05-07 | Scalable video summarization and navigation system and method
JP2004504147A (JP4426966B2) | 2002-05-07 | 2003-05-07 | Scalable video summarization and navigation system and method
AU2003230369A (AU2003230369A1) | 2002-05-07 | 2003-05-07 | Scalable video summarization and navigation system and method

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
US10/140,511 (US7035435B2) | 2002-05-07 | 2002-05-07 | Scalable video summarization and navigation system and method

Publications (2)

Publication Number | Publication Date
US20030210886A1 | 2003-11-13
US7035435B2 | 2006-04-25

Family

Family ID: 29399443

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
US10/140,511 (US7035435B2, Expired - Fee Related) | Scalable video summarization and navigation system and method | 2002-05-07 | 2002-05-07

Country Status (5)

Country | Link
US | US7035435B2 (en)
EP | EP1502210A2 (en)
JP | JP4426966B2 (en)
AU | AU2003230369A1 (en)
WO | WO2003096229A2 (en)




Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication Number | Priority Date | Publication Date | Assignee | Title
JPH10232884A* | 1996-11-29 | 1998-09-02 | Media Rinku Syst:Kk | Video software processing method and video software processing device
US6424789B1* | 1999-08-17 | 2002-07-23 | Koninklijke Philips Electronics N.V. | System and method for performing fast forward and slow motion speed changes in a video stream based on video content

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication Number | Priority Date | Publication Date | Assignee | Title
US5635982A | 1994-06-27 | 1997-06-03 | Zhang; Hong J. | System for automatic video segmentation and key frame extraction for video sequences having both sharp and gradual transitions
US5708767A | 1995-02-03 | 1998-01-13 | The Trustees Of Princeton University | Method and apparatus for video browsing based on content and structure
US5933549A | 1996-06-07 | 1999-08-03 | Matsushita Electric Industrial Co., Ltd. | Method and apparatus for image editing using key frame image control data
US6738100B2* | 1996-06-07 | 2004-05-18 | Virage, Inc. | Method for detecting scene changes in a digital video stream
US6340971B1 | 1997-02-03 | 2002-01-22 | U.S. Philips Corporation | Method and device for keyframe-based video displaying using a video cursor frame in a multikeyframe screen
US5995095A | 1997-12-19 | 1999-11-30 | Sharp Laboratories Of America, Inc. | Method for hierarchical summarization and browsing of digital video
US6320669B1 | 1998-04-08 | 2001-11-20 | Eastman Kodak Company | Method and apparatus for obtaining consumer video segments for the purpose of creating motion sequence cards
US6252975B1 | 1998-12-17 | 2001-06-26 | Xerox Corporation | Method and system for real time feature based motion analysis for key frame selection from a video
US6342904B1 | 1998-12-17 | 2002-01-29 | Newstakes, Inc. | Creating a slide presentation from full motion video
US6535639B1* | 1999-03-12 | 2003-03-18 | Fuji Xerox Co., Ltd. | Automatic video summarization using a measure of shot importance and a frame-packing method
EP1045316A2 | 1999-04-13 | 2000-10-18 | Canon Kabushiki Kaisha | Image processing method and apparatus
US20040170321A1* | 1999-11-24 | 2004-09-02 | Nec Corporation | Method and system for segmentation, classification, and summarization of video images
US6549643B1* | 1999-11-30 | 2003-04-15 | Siemens Corporate Research, Inc. | System and method for selecting key-frames of video data
US20040125877A1* | 2000-07-17 | 2004-07-01 | Shin-Fu Chang | Method and system for indexing and content-based adaptive streaming of digital video content

Non-Patent Citations (11)

* Cited by examiner, † Cited by third party
Corridoni, J.M., et al., "Structured Representation and Automatic Indexing of Movie Information Content," Pattern Recognition, vol. 31, no. 12, Dec. 1998, pp. 2027-2045, XP004139537.
Dufaux, F., "Key Frame Selection to Represent a Video," Proc. IEEE International Conference on Image Processing (ICIP 2000), vol. 2, Sep. 2000, pp. 275-278, XP010529977.
Girgensohn, A., et al., "Time-Constrained Keyframe Selection Technique," IEEE, 1999, pp. 756-761.
Kim, J., et al., "Efficient Camera Motion Characterization for MPEG Video Indexing," IEEE, 2000, pp. 1171-1174.
Lagendijk, R.L., et al., "Visual Search in a SMASH System," Proc. IEEE International Conference on Image Processing (ICIP), Lausanne, Sep. 16-19, 1996, vol. 1, pp. 671-674, XP010202483.
Masumitsu, K., et al., "Video Summarization Using Reinforcement Learning in Eigenspace," Proc. IEEE Int. Conf. on Image Processing, vol. II, Sep. 2000, pp. 267-270.*
Pfeiffer, S., et al., "Abstracting Digital Movies Automatically," Journal of Visual Communication and Image Representation, vol. 7, no. 4, Dec. 1996, pp. 345-353.
Uchihashi, S., et al., "Summarizing Video Using a Shot Importance Measure and a Frame-Packing Algorithm," Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, Mar. 1999, pp. 3041-3044.*
Wang, H., et al., "A Highly Efficient System for Automatic Face Region Detection in MPEG Video," IEEE Transactions on Circuits and Systems for Video Technology, vol. 7, no. 4, Aug. 1997, pp. 615-628.
Zhuang, Y., et al., "Adaptive Key Frame Extraction Using Unsupervised Clustering," Proc. 1998 International Conference on Image Processing (ICIP 98), Chicago, IL, Oct. 4-7, 1998, pp. 866-870, XP010308833.
Zhang, T., "Using Background Audio Change Detection for Segmenting Video," PDNO 10018002-1.

Cited By (105)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US8264616B2 (en) * | 2002-09-30 | 2012-09-11 | Kddi R&D Laboratories, Inc. | Scene classification apparatus of video
US20040223052A1 (en) * | 2002-09-30 | 2004-11-11 | Kddi R&D Laboratories, Inc. | Scene classification apparatus of video
US20060044446A1 (en) * | 2002-11-29 | 2006-03-02 | Porter Robert M S | Media handling system
US7739598B2 (en) * | 2002-11-29 | 2010-06-15 | Sony United Kingdom Limited | Media handling system
US20050002647A1 (en) * | 2003-07-02 | 2005-01-06 | Fuji Xerox Co., Ltd. | Systems and methods for generating multi-level hypervideo summaries
US20090162025A1 (en) * | 2003-07-02 | 2009-06-25 | Fuji Xerox Co., Ltd. | Systems and methods for generating multi-level hypervideo summaries
US8606083B2 (en) * | 2003-07-02 | 2013-12-10 | Fuji Xerox Co., Ltd. | Systems and methods for generating multi-level hypervideo summaries
US7480442B2 (en) * | 2003-07-02 | 2009-01-20 | Fuji Xerox Co., Ltd. | Systems and methods for generating multi-level hypervideo summaries
US20050163346A1 (en) * | 2003-12-03 | 2005-07-28 | Safehouse International Limited | Monitoring an output from a camera
US7664292B2 (en) * | 2003-12-03 | 2010-02-16 | Safehouse International, Inc. | Monitoring an output from a camera
US20090148133A1 (en) * | 2004-01-30 | 2009-06-11 | Kazuhiro Nomura | Content playback apparatus
US8081863B2 (en) * | 2004-01-30 | 2011-12-20 | Panasonic Corporation | Content playback apparatus
US7916171B2 (en) | 2004-03-05 | 2011-03-29 | Kddi R&D Laboratories, Inc. | Classification apparatus for sport videos and method thereof
US20050195331A1 (en) * | 2004-03-05 | 2005-09-08 | Kddi R&D Laboratories, Inc. | Classification apparatus for sport videos and method thereof
US20050232588A1 (en) * | 2004-03-23 | 2005-10-20 | Tatsuya Hosoda | Video processing device
US7606462B2 (en) * | 2004-03-23 | 2009-10-20 | Seiko Epson Corporation | Video processing device and method for producing digest video data
US7711210B2 (en) * | 2004-03-26 | 2010-05-04 | Seiko Epson Corporation | Selection of images for image processing
US20050234719A1 (en) * | 2004-03-26 | 2005-10-20 | Tatsuya Hosoda | Selection of images for image processing
US20050220345A1 (en) * | 2004-03-31 | 2005-10-06 | Fuji Xerox Co., Ltd. | Generating a highly condensed visual summary
US7697785B2 (en) * | 2004-03-31 | 2010-04-13 | Fuji Xerox Co., Ltd. | Generating a highly condensed visual summary
US9535991B2 (en) * | 2004-12-09 | 2017-01-03 | Sony Europe Limited | Video display for displaying a series of representative images for video
US11531457B2 (en) | 2004-12-09 | 2022-12-20 | Sony Europe B.V. | Video display for displaying a series of representative images for video
US20060256131A1 (en) * | 2004-12-09 | 2006-11-16 | Sony United Kingdom Limited | Video display
US20060284978A1 (en) * | 2005-06-17 | 2006-12-21 | Fuji Xerox Co., Ltd. | Method and system for analyzing fixed-camera video via the selection, visualization, and interaction with storyboard keyframes
US8089563B2 (en) * | 2005-06-17 | 2012-01-03 | Fuji Xerox Co., Ltd. | Method and system for analyzing fixed-camera video via the selection, visualization, and interaction with storyboard keyframes
US7904455B2 (en) | 2005-11-03 | 2011-03-08 | Fuji Xerox Co., Ltd. | Cascading cluster collages: visualization of image search results on small displays
US20070098266A1 (en) * | 2005-11-03 | 2007-05-03 | Fuji Xerox Co., Ltd. | Cascading cluster collages: visualization of image search results on small displays
US8938153B2 (en) * | 2006-02-08 | 2015-01-20 | Nec Corporation | Representative image or representative image group display system, representative image or representative image group display method, and program therefor
US20090066838A1 (en) * | 2006-02-08 | 2009-03-12 | Nec Corporation | Representative image or representative image group display system, representative image or representative image group display method, and program therefor
US8392183B2 (en) | 2006-04-25 | 2013-03-05 | Frank Elmo Weber | Character-based automated media summarization
US8605113B2 (en) * | 2006-09-01 | 2013-12-10 | Thomson Licensing | Method and device for adaptive video presentation
EP2057531A4 (en) | 2006-09-01 | 2017-10-25 | Thomson Licensing | Method and device for adaptive video presentation
US20090244093A1 (en) * | 2006-09-01 | 2009-10-01 | Zhi Bo Chen | Method and device for adaptive video presentation
US20080112684A1 (en) * | 2006-11-14 | 2008-05-15 | Microsoft Corporation | Space-Time Video Montage
US8000533B2 (en) | 2006-11-14 | 2011-08-16 | Microsoft Corporation | Space-time video montage
US8363960B2 (en) * | 2007-03-22 | 2013-01-29 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Method and device for selection of key-frames for retrieving picture contents, and method and device for temporal segmentation of a sequence of successive video pictures or a shot
US20080232687A1 (en) * | 2007-03-22 | 2008-09-25 | Christian Petersohn | Method and device for selection of key-frames for retrieving picture contents, and method and device for temporal segmentation of a sequence of successive video pictures or a shot
US9047374B2 (en) * | 2007-06-08 | 2015-06-02 | Apple Inc. | Assembling video content
US20080304807A1 (en) * | 2007-06-08 | 2008-12-11 | Gary Johnson | Assembling Video Content
US8224087B2 (en) | 2007-07-16 | 2012-07-17 | Michael Bronstein | Method and apparatus for video digest generation
US20090025039A1 (en) * | 2007-07-16 | 2009-01-22 | Michael Bronstein | Method and apparatus for video digest generation
US20100095239A1 (en) * | 2008-10-15 | 2010-04-15 | Mccommons Jordan | Scrollable Preview of Content
US8788963B2 (en) | 2008-10-15 | 2014-07-22 | Apple Inc. | Scrollable preview of content
US20100109108A1 (en) * | 2008-11-05 | 2010-05-06 | Seagate Technology Llc | Stram with composite free magnetic element
US8359537B2 (en) | 2009-04-30 | 2013-01-22 | Apple Inc. | Tool for navigating a composite presentation
US20100278504A1 (en) * | 2009-04-30 | 2010-11-04 | Charles Lyons | Tool for Grouping Media Clips for a Media Editing Application
US20100281382A1 (en) * | 2009-04-30 | 2010-11-04 | Brian Meaney | Media Editing With a Segmented Timeline
US20100281371A1 (en) * | 2009-04-30 | 2010-11-04 | Peter Warner | Navigation Tool for Video Presentations
US9032299B2 (en) | 2009-04-30 | 2015-05-12 | Apple Inc. | Tool for grouping media clips for a media editing application
US20100281381A1 (en) * | 2009-04-30 | 2010-11-04 | Brian Meaney | Graphical User Interface for a Media-Editing Application With a Segmented Timeline
US8533598B2 (en) | 2009-04-30 | 2013-09-10 | Apple Inc. | Media editing with a segmented timeline
US20100281383A1 (en) * | 2009-04-30 | 2010-11-04 | Brian Meaney | Segmented Timeline for a Media-Editing Application
US9317172B2 (en) | 2009-04-30 | 2016-04-19 | Apple Inc. | Tool for navigating a composite presentation
US20100281372A1 (en) * | 2009-04-30 | 2010-11-04 | Charles Lyons | Tool for Navigating a Composite Presentation
US8769421B2 (en) | 2009-04-30 | 2014-07-01 | Apple Inc. | Graphical user interface for a media-editing application with a segmented timeline
US8631326B2 (en) | 2009-04-30 | 2014-01-14 | Apple Inc. | Segmented timeline for a media-editing application
US8571330B2 (en) * | 2009-09-17 | 2013-10-29 | Hewlett-Packard Development Company, L.P. | Video thumbnail selection
US20110064318A1 (en) * | 2009-09-17 | 2011-03-17 | Yuli Gao | Video thumbnail selection
US9124860B2 (en) | 2010-05-25 | 2015-09-01 | Intellectual Ventures Fund 83 Llc | Storing a video summary as metadata
US8432965B2 (en) | 2010-05-25 | 2013-04-30 | Intellectual Ventures Fund 83 Llc | Efficient method for assembling key video snippets to form a video summary
US8520088B2 (en) | 2010-05-25 | 2013-08-27 | Intellectual Ventures Fund 83 Llc | Storing a video summary as metadata
US8446490B2 (en) | 2010-05-25 | 2013-05-21 | Intellectual Ventures Fund 83 Llc | Video capture system producing a video summary
US8605221B2 (en) | 2010-05-25 | 2013-12-10 | Intellectual Ventures Fund 83 Llc | Determining key video snippets using selection criteria to form a video summary
US8619150B2 (en) | 2010-05-25 | 2013-12-31 | Intellectual Ventures Fund 83 Llc | Ranking key video frames using camera fixation
US8599316B2 (en) | 2010-05-25 | 2013-12-03 | Intellectual Ventures Fund 83 Llc | Method for determining key video frames
US9600164B2 (en) | 2010-07-15 | 2017-03-21 | Apple Inc. | Media-editing application with anchored timeline
US8875025B2 (en) | 2010-07-15 | 2014-10-28 | Apple Inc. | Media-editing application with media clips grouping capabilities
US8910046B2 (en) | 2010-07-15 | 2014-12-09 | Apple Inc. | Media-editing application with anchored timeline
US9171578B2 (en) * | 2010-08-06 | 2015-10-27 | Futurewei Technologies, Inc. | Video skimming methods and systems
US20120033949A1 (en) * | 2010-08-06 | 2012-02-09 | Futurewei Technologies, Inc. | Video Skimming Methods and Systems
US10153001B2 (en) | 2010-08-06 | 2018-12-11 | Vid Scale, Inc. | Video skimming methods and systems
US8797354B2 (en) * | 2011-01-06 | 2014-08-05 | Nintendo Co., Ltd. | Computer-readable storage medium having image processing program stored therein, image processing apparatus, image processing system, and image processing method
US20120176409A1 (en) * | 2011-01-06 | 2012-07-12 | Hal Laboratory Inc. | Computer-Readable Storage Medium Having Image Processing Program Stored Therein, Image Processing Apparatus, Image Processing System, and Image Processing Method
US9099161B2 (en) | 2011-01-28 | 2015-08-04 | Apple Inc. | Media-editing application with multiple resolution modes
US8745499B2 (en) | 2011-01-28 | 2014-06-03 | Apple Inc. | Timeline search and index
US8954477B2 (en) | 2011-01-28 | 2015-02-10 | Apple Inc. | Data structures for a media-editing application
US9870802B2 (en) | 2011-01-28 | 2018-01-16 | Apple Inc. | Media clip management
US8886015B2 (en) | 2011-01-28 | 2014-11-11 | Apple Inc. | Efficient media import
US8775480B2 (en) | 2011-01-28 | 2014-07-08 | Apple Inc. | Media clip management
US9251855B2 (en) | 2011-01-28 | 2016-02-02 | Apple Inc. | Efficient media processing
US11747972B2 (en) | 2011-02-16 | 2023-09-05 | Apple Inc. | Media-editing application with novel editing tools
US8966367B2 (en) | 2011-02-16 | 2015-02-24 | Apple Inc. | Anchor override for a media-editing application with an anchored timeline
US9026909B2 (en) | 2011-02-16 | 2015-05-05 | Apple Inc. | Keyword list view
US11157154B2 (en) | 2011-02-16 | 2021-10-26 | Apple Inc. | Media-editing application with novel editing tools
US10324605B2 (en) | 2011-02-16 | 2019-06-18 | Apple Inc. | Media-editing application with novel editing tools
US9997196B2 (en) | 2011-02-16 | 2018-06-12 | Apple Inc. | Retiming media presentations
WO2012158588A1 (en) | 2011-05-18 | 2012-11-22 | Eastman Kodak Company | Video summary including a particular person
US8665345B2 (en) | 2011-05-18 | 2014-03-04 | Intellectual Ventures Fund 83 Llc | Video summary including a feature of interest
US9013604B2 (en) | 2011-05-18 | 2015-04-21 | Intellectual Ventures Fund 83 Llc | Video summary including a particular person
US8643746B2 (en) | 2011-05-18 | 2014-02-04 | Intellectual Ventures Fund 83 Llc | Video summary including a particular person
US9536564B2 (en) | 2011-09-20 | 2017-01-03 | Apple Inc. | Role-facilitated editing operations
US9402114B2 (en) | 2012-07-18 | 2016-07-26 | Cisco Technology, Inc. | System and method for providing randomization in adaptive bitrate streaming environments
US9516078B2 (en) * | 2012-10-26 | 2016-12-06 | Cisco Technology, Inc. | System and method for providing intelligent chunk duration
US20140119428A1 (en) * | 2012-10-26 | 2014-05-01 | Cisco Technology, Inc. | System and method for providing intelligent chunk duration
US20170017857A1 (en) * | 2014-03-07 | 2017-01-19 | Lior Wolf | System and method for the detection and counting of repetitions of repetitive activity via a trained network
US10922577B2 (en) * | 2014-03-07 | 2021-02-16 | Lior Wolf | System and method for the detection and counting of repetitions of repetitive activity via a trained network
US20210166055A1 (en) * | 2014-03-07 | 2021-06-03 | Lior Wolf | System and method for the detection and counting of repetitions of repetitive activity via a trained network
US20200065608A1 (en) * | 2014-03-07 | 2020-02-27 | Lior Wolf | System and method for the detection and counting of repetitions of repetitive activity via a trained network
US11727725B2 (en) * | 2014-03-07 | 2023-08-15 | Lior Wolf | System and method for the detection and counting of repetitions of repetitive activity via a trained network
US10460194B2 (en) * | 2014-03-07 | 2019-10-29 | Lior Wolf | System and method for the detection and counting of repetitions of repetitive activity via a trained network
US10812948B2 (en) | 2015-07-22 | 2020-10-20 | At&T Intellectual Property I, L.P. | Providing a summary of media content to a communication device
US10158983B2 (en) | 2015-07-22 | 2018-12-18 | At&T Intellectual Property I, L.P. | Providing a summary of media content to a communication device
US11388561B2 (en) | 2015-07-22 | 2022-07-12 | At&T Intellectual Property I, L.P. | Providing a summary of media content to a communication device
US11268838B2 (en) | 2015-08-28 | 2022-03-08 | Crisi Medical Systems, Inc. | Flow sensor system including transmissive connection
US11754428B2 (en) | 2015-08-28 | 2023-09-12 | Crisi Medical Systems, Inc. | Flow sensor system including transmissive connection having bonding adhesive between the transducers and the fittings

Also Published As

Publication number | Publication date
JP4426966B2 (en) | 2010-03-03
AU2003230369A1 (en) | 2003-11-11
WO2003096229A3 (en) | 2004-04-01
JP2005525034A (en) | 2005-08-18
WO2003096229A2 (en) | 2003-11-20
EP1502210A2 (en) | 2005-02-02
US20030210886A1 (en) | 2003-11-13
US20030210886A1 (en)2003-11-13

Similar Documents

Publication | Publication Date | Title
US7035435B2 (en) | Scalable video summarization and navigation system and method
US6807306B1 (en) | Time-constrained keyframe selection method
Wolf | Key frame selection by motion analysis
Zhang et al. | Video parsing, retrieval and browsing: an integrated and content-based solution
Zabih et al. | A feature-based algorithm for detecting and classifying production effects
US7177470B2 (en) | Method of and system for detecting uniform color segments
US8316301B2 (en) | Apparatus, medium, and method segmenting video sequences based on topic
US7889794B2 (en) | Extracting key frame candidates from video clip
US8031775B2 (en) | Analyzing camera captured video for key frames
JP5005154B2 (en) | Apparatus for reproducing an information signal stored on a storage medium
US8306334B2 (en) | Methods of representing and analysing images
US6940910B2 (en) | Method of detecting dissolve/fade in MPEG-compressed video environment
JP2005276220A (en) | Extraction of intelligent key frames from video
US8320664B2 (en) | Methods of representing and analysing images
Lian | Automatic video temporal segmentation based on multiple features
Panchal et al. | Scene detection and retrieval of video using motion vector and occurrence rate of shot boundaries
KR20050033075A (en) | Unit for and method of detection a content property in a sequence of video images
Latecki et al. | Extraction of key frames from videos by optimal color composition matching and polygon simplification
Yeo et al. | A framework for sub-window shot detection
Farag et al. | A new paradigm for analysis of MPEG compressed videos
EP2426620A1 (en) | Feature extraction and automatic annotation of flash illuminated video data in unconstrained video streams
Ford | Fuzzy logic methods for video shot boundary detection and classification
Li et al. | Scene-based scalable video summarization
Lee | Video analysis and abstraction in the compressed domain
Li et al. | Scene-Based Movie Summarization

Legal Events

Date | Code | Title | Description

AS | Assignment

Owner name: HEWLETT-PACKARD COMPANY, COLORADO

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LI, YING;ZHANG, TONG;TRETTET, DANIEL;REEL/FRAME:013444/0347;SIGNING DATES FROM 20020410 TO 20020416

AS | Assignment

Owner name: HEWLETT-PACKARD COMPANY, COLORADO

Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE ASSIGNEE, FILED ON 10/25/2002, RECORDED ON REEL 013444 FRAME 0347;ASSIGNORS:LI, YING;ZHANG, TONG;TRETTER, DANIEL R.;REEL/FRAME:014026/0536;SIGNING DATES FROM 20020410 TO 20020416

AS | Assignment

Owner name: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P., COLORADO

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HEWLETT-PACKARD COMPANY;REEL/FRAME:013776/0928

Effective date: 20030131


FPAY | Fee payment

Year of fee payment: 4

REMI | Maintenance fee reminder mailed
LAPS | Lapse for failure to pay maintenance fees
STCH | Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP | Lapsed due to failure to pay maintenance fee

Effective date: 20140425

