FIELD OF THE DISCLOSURE
The present disclosure relates generally to video processing, and more particularly to video encoding and transcoding.
BACKGROUND
Video encoding or transcoding frequently is used to reduce the amount of video data to be stored or transmitted or to convert a video signal from one format to another. Effective transcoding often relies on the accurate detection of features present in the video content, such as blank screens, scene changes, black borders, and the like. Conventional techniques for identifying these features, such as by detecting a change in sound level for identifying a scene change or by determining the number of pixels having a certain color for identifying a black border, often are inefficient or ineffective at identifying the corresponding feature.
BRIEF DESCRIPTION OF THE DRAWINGS
The present disclosure may be better understood, and its numerous features and advantages made apparent to those skilled in the art, by referencing the accompanying drawings.
FIG. 1 is a block diagram illustrating a video processing system having video feature detection based on sum of variances metrics in accordance with at least one embodiment of the present disclosure.
FIG. 2 is a diagram illustrating various approaches for calculating a sum of variances metric for an image of a video signal in accordance with at least one embodiment of the present disclosure.
FIG. 3 is a flow diagram illustrating a method for transcoding a video signal based on video features identified through a sum of variances analysis in accordance with at least one embodiment of the present disclosure.
FIG. 4 is a flow diagram illustrating a method for detecting a scene change based on a sum of variances analysis in accordance with at least one embodiment of the present disclosure.
FIG. 5 is a diagram illustrating an example implementation of the method of FIG. 4.
FIG. 6 is a flow diagram illustrating a method for detecting a black border region or caption region in an image based on a sum of variances analysis in accordance with at least one embodiment of the present disclosure.
FIG. 7 is a diagram illustrating example transcoding operations for an image based on detection of a black border region or caption region in accordance with the method of FIG. 6.
FIG. 8 is a flow diagram illustrating a method for determining a complexity of an image based on a sum of variances analysis and adjusting a transcoding of the image based on the complexity in accordance with at least one embodiment of the present disclosure.
DETAILED DESCRIPTION
FIGS. 1-8 illustrate techniques for encoding or transcoding a video signal based on video features detected using an analysis of one or more sum of variances metrics for images represented by the video signal. During a typical transcoding process, the variance (often referred to as “VAR”) of the pixels within a macroblock or other block of pixels is calculated and used for motion detection or motion estimation. The variance of a pixel block represents the relative similarity or dissimilarity of the pixels within the pixel block. However, this variance, when considered across a region of pixel blocks, also can be used to identify certain video features for which the variance is expected to be relatively low for the region (e.g., within a black border region due to the constant black pixels) or relatively high for the region (e.g., within a caption region due to the high contrast between the white pixels of the caption text and the black pixels of the background). Moreover, a significant difference between the variances for a region of one image and the variances for the same region of the next image can be indicative of a scene change between the images. Accordingly, in at least one embodiment the variances of pixel blocks for a specified region are summed and a metric representing this sum of variances (SVAR) is used to determine whether certain types of video features are present in the image. One or more encoding or transcoding operations then may be performed in response to the identification of the video feature in the image. As one example, the sum of variances of a number or all of the pixel blocks of an image may be used as a metric of the complexity of the image, and this complexity feature of the image may be used for bit allocation for a rate control process. 
As another example, the sum of variances metric for an image may be compared with a metric representing the sum of variances for one or more preceding images to determine whether the image represents a scene change. In response to detecting the image represents a scene change, the transcoding process can implement the transcoded version of the image as an Intra-coded frame (I-frame) in a new group of pictures (GOP). As yet another example, the sum of variance metrics for one or more columns of pixel blocks or rows of pixel blocks may be used to detect the presence of a black border region or a caption region and the transcoding process adapted accordingly, such as by omitting the black border region/caption region from the corresponding image in the transcoded signal, by determining a true resolution of the active image region surrounded by the black border region or adjacent to the caption region (which can prove useful in setting the threshold for scene change detection), by detecting a scene change based on a presence or change in the black border region, or by assigning a different bit budget to the black border region/caption region while transcoding the image.
FIG. 1 illustrates, in block diagram form, a video processing system 100 in accordance with at least one embodiment of the present disclosure. The video processing system 100 includes a video source 102, a transcoding system 104, and a video destination 106. The video source 102 transmits or otherwise provides one or more video signals in analog or digital form. For example, the video source 102 can comprise a receiver for a satellite or cable transmission system, a storage element (e.g., a hard drive), a server streaming video content over the Internet or other network, a digital versatile disc (DVD) or Blu-Ray™ disc, and the like. The video destination 106 can comprise any of a variety of intermediate or final destinations of a transcoded video signal, such as a storage element, a networked computer, set-top box, or television, and the like. The transcoding system 104 transcodes a video signal 108 received from the video source 102 to generate a transcoded video signal 110 for output to the video destination 106. For example, the transcoding system 104 may be implemented as a system-on-a-chip (SOC) or other component of a set-top box, personal video recorder (PVR), media gateway, or network attached storage (NAS). The video signal 108 and the transcoded video signal 110 each can be encoded in accordance with a digital video format such as H.264 (MPEG-4 Part 10 Advanced Video Coding, or AVC), a Moving Picture Experts Group (MPEG) format (such as MPEG-1, MPEG-2, or MPEG-4), QuickTime format, Real Media format, Windows Media Video (WMV), Audio Video Interleave (AVI), or another digital video format, either standard or proprietary.
The video processing system 100 can represent any of a variety of video systems in which encoding or transcoding can be advantageously used. For example, in one embodiment, the video processing system 100 comprises a satellite or cable television system whereby video content is streamed from a broadcaster to a set-top box at a customer's premises. In this example, the video destination 106 can include, for example, a non-volatile memory at the set-top box and the transcoding system 104 can include a SOC at the set-top box for use in transcoding the video content and providing the transcoded video content to the non-volatile memory. As another example, the video processing system 100 can comprise a video content server system, whereby the video source 102 comprises a hard drive storing original video content, the video destination 106 is a remote computer system connected to the video content server via a network, and the transcoding system 104 is used to transcode the video content responsive to current network conditions before the transcoded video content is transmitted to the remote computer system via the network.
In the illustrated embodiment, the transcoding system 104 includes interfaces 112 and 114, decoder 116, encoder 118, and a feature detection module 120. The interfaces 112 and 114 include interfaces used to communicate signaling with the video source 102 and the video destination 106, respectively. Examples of the interfaces 112 and 114 include input/output (I/O) interfaces, such as Peripheral Component Interconnect Express (PCIE), Universal Serial Bus (USB), or Serial Advanced Technology Attachment (SATA); wired network interfaces, such as Ethernet; and wireless network interfaces, such as IEEE 802.11x, Bluetooth™, or a wireless cellular interface such as a 3GPP, 4G, or LTE cellular data standard. The decoder 116, encoder 118, and feature detection module 120 each may be implemented entirely in hardware, entirely as software stored in a memory 122 and executed by a processor 124, or as a combination of hardware logic and software-executed functionality. To illustrate, in one embodiment, the transcoding system 104 is implemented as a SOC whereby portions of the decoder 116, the encoder 118, and the feature detection module 120 are implemented as hardware logic, and other portions are implemented via firmware stored at the SOC and executed by a processor of the SOC. The hardware of the transcoding system 104 can be implemented using a single processing device or a plurality of processing devices. Such processing devices can include a central processing unit (CPU), a graphics processing unit (GPU), a microcontroller, a digital signal processor, a field programmable gate array, programmable logic device, state machine, logic circuitry, analog circuitry, digital circuitry, or any device that manipulates signals (analog and/or digital) based on operational instructions that are stored in a memory, such as memory 122. Memory 122 may be a single memory device or a plurality of memory devices. 
Such memory devices can include a hard disk drive or other disk drive, read-only memory, random access memory, volatile memory, non-volatile memory, static memory, dynamic memory, flash memory, cache memory, and/or any device that stores digital information. Note that when the processing module implements one or more of its functions via a state machine, analog circuitry, digital circuitry, and/or logic circuitry, the memory storing the corresponding operational instructions may be embedded within, or external to, the circuitry comprising the state machine, analog circuitry, digital circuitry, and/or logic circuitry.
The decoder 116 operates to receive the video signal 108 via the interface 112 and partially or fully decode the video signal 108 to create a decoded data stream 126, which can include pixel information, motion estimation/detection information, timing information, and other video parameters. The encoder 118 receives the decoded data stream 126 and uses the video parameters represented by the decoded data stream to generate the transcoded video signal 110, which comprises a transcoded representation of the video content of the original video signal 108. The transcoding process implemented by the encoder 118 can include, for example, a stream format change (e.g., conversion from an MPEG-2 format to an AVC format), a resolution change, a frame rate change, a bit rate change, and the like.
The feature detection module 120 receives pixel information 128 from the decoder 116 as it decodes the video signal 108 and modifies or otherwise controls, via control signaling 130, various encoding operations of the encoder 118 based on a variance analysis of this pixel information. For example, during the motion estimation/detection process, the variances of some or all of the pixel blocks of an image being analyzed are provided as part of the pixel information 128, and the feature detection module 120 uses a sum of variances for pixel blocks within one or more regions of the image to detect the presence of one or more video features. To illustrate, the sum of variances for the pixel blocks of the entire image (or a substantial portion thereof) can be used by the feature detection module 120 to detect a scene change, and in response, instruct the encoder 118 to start a new group-of-pictures (GOP) and encode the image as an intra-frame (I-frame). As another example, the feature detection module 120 can use the sum of variances for the pixel blocks of one or more regions of the image to detect a black border region at the periphery of an active image region or a region of the image used to display caption information, and in response can control the encoder 118 so as to remove the detected black border/caption region from the resulting encoded image, to allocate a lower bit rate to the detected black border and a higher bit rate to the active image region of the image (that is, the region of the image bordered by the black border region or adjacent to the caption region and which contains non-caption/non-black border image content), to allocate a higher bit rate to the detected caption region so as to reduce the potential for subjective artifacts, or to determine a scene change based on the presence of, or a change in, the black border region. 
As yet another example, the feature detection module 120 can use the sum of variances for the pixels of one or more regions of an image to determine the complexity of the image, and then adjust the rate control parameters applied by the encoder 118 to the frame based on this determined complexity.
FIG. 2 provides an example context to illustrate various terms used herein with respect to variance. The illustrated image 200 (I-frame) is composed of a matrix of pixel blocks 202 arranged in a plurality of rows and columns. The image 200 can be a field of a video frame (e.g., an odd field or an even field in an interlaced implementation) or the complete video frame (e.g., both the odd field and even field combined in a progressive implementation). A typical video image can comprise hundreds, or even thousands, of rows and columns of these pixel blocks 202. However, for ease of illustration, the image 200 is a simplified image comprising sixty-four pixel blocks 202 arranged in eight horizontal rows (labeled 1-8) and eight columns (labeled A-H). Each pixel block 202 in turn comprises a matrix of pixels (not shown) arranged in rows and columns. Each pixel block 202 can comprise one or more macroblocks. To illustrate, in one embodiment each of the pixel blocks 202 is a 16×16 macroblock in accordance with the H.263 or H.264 standards, and thus represents 256 pixels arranged in sixteen rows and sixteen columns. In another embodiment, each of the pixel blocks 202 comprises a matrix of macroblocks, such as a 4×4 matrix of 16×16 macroblocks. In yet another embodiment, each of the pixel blocks 202 comprises only a portion of a macroblock, such as a 4×4 partition of a 16×16 macroblock.
A variance, often denoted as “VAR”, can be determined for some or all of the pixel blocks 202 of the image 200, as either part of a motion estimation or motion detection process or as a separate process. This variance typically is determined from the luminance values of the pixels of the pixel block 202, although in other embodiments the color information may be used to determine the variance. Techniques for calculating the variance for a block of pixels are well known in the art, and one such technique for variance calculation is described in U.S. Pat. No. 6,223,193. For ease of reference, the variance for a pixel block 202 at row i and column j is denoted as VARi,j. Thus, the variance for the pixel block 202 at row 1 and column A is denoted as VAR1A and the variance for the pixel block 202 at row 1 and column F is denoted as VAR1F.
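As a minimal sketch of the per-block variance discussed above, the following computes the population variance of the luminance samples in one pixel block. This is the textbook definition and is not necessarily the technique of the referenced patent; the function name block_variance and the two sample blocks are hypothetical and chosen only for illustration.

```python
def block_variance(luma_block):
    """Population variance of the luminance samples in one pixel block.

    luma_block is a list of rows, each a list of luma values (e.g. 0-255).
    """
    samples = [value for row in luma_block for value in row]
    if not samples:
        raise ValueError("pixel block must contain at least one sample")
    mean = sum(samples) / len(samples)
    return sum((s - mean) ** 2 for s in samples) / len(samples)


# Hypothetical 16x16 blocks: one uniformly dark, one alternating black/white.
flat_block = [[16] * 16 for _ in range(16)]
checker_block = [
    [0 if (r + c) % 2 == 0 else 255 for c in range(16)] for r in range(16)
]

flat_var = block_variance(flat_block)        # no variation between pixels
checker_var = block_variance(checker_block)  # strong variation between pixels
```

The uniform block yields a variance of zero, which is the behavior expected of a black border block, while the high-contrast block yields a large variance, as would be expected of caption text on a dark background.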
The metrics pertaining to the sum of variances (SVAR) for one or more regions of an image can prove useful in identifying certain characteristics of the image. For example, as described in greater detail herein, the sum of variances for some or all of the pixel blocks 202 of the image 200 can prove useful in determining whether the image 200 represents a scene change or in determining the relative complexity of the image. As another example, the sum of variances for certain regions of pixel blocks 202 can prove useful in identifying black border regions or caption regions in the image 200.
As used herein, the sum of variances metric representing the summation of variances for pixel blocks along a row i of pixel blocks is called a variance row projection and is denoted SVARR[i]. Similarly, the sum of variances metric representing the summation of variances for pixel blocks along a column j of pixel blocks is called a variance column projection and is denoted SVARC[j]. Sum of variances metrics also can be calculated for regions of the image 200 that comprise multiple rows or multiple columns. For example, a sum of variances metric representing a region X composed of rows 1 and 2 can be calculated and denoted as SVARRegX, or a sum of variances metric representing a region Y composed of columns A and B can be calculated and denoted as SVARRegY. Further, sum of variances metrics can be calculated for regions of the image 200 that comprise only portions of rows or only portions of columns. To illustrate, a sum of variances metric may be calculated for an expected active image region composed of those pixel blocks 202 that are, for example, both in rows 3-6 and in columns C-F, or a sum of variances metric may be calculated for an expected black border region composed of those pixel blocks 202 that are, for example, both in one of rows 1, 2, 7, or 8 and in one of columns A, B, G, or H. Further, a sum of variances metric may be calculated for all of the pixel blocks 202 of the image and denoted as SVARI.
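The projections and region metrics defined above can be sketched as follows, assuming the per-block variances are already available as a two-dimensional grid. The function names and the sample variance values are hypothetical, and indices are zero-based, so index 0 corresponds to row 1 or column A of the image 200.

```python
def row_projection(var, i):
    """SVARR[i]: sum of the block variances along row i."""
    return sum(var[i])


def column_projection(var, j):
    """SVARC[j]: sum of the block variances along column j."""
    return sum(row[j] for row in var)


def region_svar(var, rows, cols):
    """Sum of variances over blocks that lie in both `rows` and `cols`."""
    return sum(var[i][j] for i in rows for j in cols)


def image_svar(var):
    """SVARI: sum of the variances of every block in the image."""
    return sum(sum(row) for row in var)


# Hypothetical 8x8 grid: low variance (1) in a two-block-wide border,
# higher variance (50) in the central 4x4 area.
BORDER = (0, 1, 6, 7)
var_grid = [
    [1 if (i in BORDER or j in BORDER) else 50 for j in range(8)]
    for i in range(8)
]

svar_row_1 = row_projection(var_grid, 0)         # row 1: border blocks only
svar_row_3 = row_projection(var_grid, 2)         # row 3: crosses the center
svar_col_a = column_projection(var_grid, 0)      # column A: border blocks only
svar_reg_x = region_svar(var_grid, (0, 1), range(8))          # rows 1 and 2
svar_active = region_svar(var_grid, range(2, 6), range(2, 6))  # rows 3-6, C-F
svar_image = image_svar(var_grid)
```

With these sample values, the projections through the border stay small while the projection through row 3 and the active-region sum are dominated by the central blocks.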
FIG. 3 illustrates an example method 300 of operation of the transcoding system 104 of FIG. 1. At block 302, the transcoding system 104 receives the video signal 108 from the video source 102, wherein the video signal 108 can be received from a remote source over a wired or wireless network, received from a local storage device via an I/O interface, and the like. At block 304, the transcoding system 104 transcodes the video signal 108 to generate the transcoded video signal 110. This transcoding process can include one or more of an encoding format conversion, a resolution conversion, a bit rate conversion, a frame rate conversion, and the like. At block 306, the transcoding system 104 transmits the transcoded video signal to the video destination 106, wherein the transcoded video signal can be transmitted to a remote destination via a wired or wireless network, transmitted to a local storage device via an I/O interface, and the like.
As part of the transcoding process of block 304, the feature detection module 120 processes certain images represented in the video signal 108 (or the decoded version thereof) so as to identify certain characteristics in the images and modify the transcoding process in response to identifying the characteristics. The images processed by the feature detection module 120 typically include, for example, the I-frames of the video signal 108 (or a subset of the I-frames), although predicted frames (P-frames) and bi-predicted frames (B-frames) also may be analyzed. The processing of an image by the feature detection module 120 includes determining one or more sum of variances metrics for the image, or for one or more regions of the image, at block 308. For example, the sum of variances metrics determined for the image can include the sum of variances for the entire image (SVARI), the variance row projections for one or more rows (SVARR[i]), the variance column projections for one or more columns (SVARC[j]), or the sum of variances for other regions of the image.
At block 310, the feature detection module 120 uses the one or more SVAR metrics determined at block 308 to detect one or more video characteristics associated with the image. Examples of the video characteristics which may be detected include, but are not limited to, a scene change, a repeat picture, a fade-in or fade-out, the presence and location of a caption region used to provide closed captioning or subtitles, the presence and location of a black border region and an active image region, the relative complexity of the image, and the like. At block 312, the feature detection module 120 controls the encoder 118 (FIG. 1) via control signaling 130 so as to perform at least one transcoding operation (which can include modifying at least one transcoding operation) based on the detected characteristics of the image. If the detected characteristic is that the image represents a scene change, the transcoding operation performed in response can include, for example, encoding the image as an I-frame at the start of a new GOP, or allocating a higher bit rate to the image during transcoding. If the detected characteristic is the presence of a black border region or caption region, the transcoding operation performed in response can include, for example, encoding the image so as to omit the black border region or caption region from the corresponding encoded image, omitting the black border region or caption region from the analysis of the image for scene change detection, or encoding the image so as to assign a different bit allocation to the black border region/caption region than the bit allocation assigned to the active image region. If the detected characteristic is the complexity of the image, the transcoding operation performed in response can include, for example, setting the rate control or quantization parameter. FIGS. 4-8 illustrate various examples of the processes performed at blocks 308-312.
FIGS. 4 and 5 illustrate a method 400 (FIG. 4) for detecting scene changes in a video signal 500 (FIG. 5) based on SVAR metrics in accordance with at least one embodiment of the present disclosure. The method 400 initiates at block 402 whereby the feature detection module 120 receives or selects for scene change analysis an image 502 from the video signal 500 being processed for encoding or transcoding by the transcoding system 104 (FIG. 1). At block 404, the feature detection module 120 determines a current SVAR metric for the image 502 currently being analyzed for scene change detection. In one embodiment, the current SVAR metric comprises the SVAR metric for the entire image 502 (SVARI). In another embodiment, the current SVAR metric can comprise a SVAR metric for a selected region of the image 502, such as a sum of a certain number of the row projections SVARR[X] or a certain number of column projections SVARC[X] at a center of the image 502, or alternatively, at the sides of the image. An effective approach can include using the current SVAR metric for the active image region of the image 502 with any black border or caption regions removed or disregarded (e.g., due to the significant VAR fluctuations introduced by the caption region). As another example, the current SVAR metric can include a SVAR metric for side regions of the image 502, as the appearance or disappearance of a black border often is a reliable indicator of a scene change. In the event that detection of a fade-in or fade-out is sought, the current SVAR metric of the active image region can be evaluated for a gradual increase or decrease, thereby indicating a fade-in or fade-out. As yet another example, the current SVAR metric calculated at block 404 can include the SVAR metric SVARReg[X] for the blocks at a defined center region of the image 502.
At block 406, the feature detection module 120 determines or accesses a previous SVAR metric for one or more preceding images in the video signal 500. For example, the feature detection module 120 can determine the previous SVAR metric as the corresponding SVAR metric for the immediately preceding image 504. As another example, the feature detection module 120 can determine the previous SVAR metric as an average or other representation of the corresponding SVAR metrics for a sliding window of preceding images, such as a three-image sliding window 505 that includes preceding images 504, 506, and 508 with respect to the current image 502. The previous SVAR metric calculated for the sliding window 505 can be an unweighted average of the SVAR metrics for the images in the sliding window 505 (that is, the SVAR for each image in the sliding window 505 is weighted equally), or the previous SVAR metric can be calculated as a weighted average, whereby the SVAR metric for the image most proximate to the current image under analysis (e.g., image 504 relative to current image 502) is most heavily weighted. The previous SVAR metric typically is calculated from the same region of the preceding image(s) as the region used to calculate the current SVAR metric for image 502. For example, if the current SVAR metric is the entire-image SVAR metric, then the previous SVAR metric is calculated from the entire-image SVAR metric of each of the one or more images in the sliding window 505.
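The sliding-window calculation of the previous SVAR metric can be sketched as follows. The function name previous_svar, the sample SVAR values, and the weights (1, 2, 3) are assumptions made for illustration only; the disclosure does not specify particular weights.

```python
def previous_svar(window, weights=None):
    """Average the SVAR metrics of the preceding images.

    `window` is ordered from the oldest image to the most recent one.
    With `weights` omitted, every image counts equally; otherwise each
    SVAR metric is scaled by its weight and the result is normalized.
    """
    if not window:
        raise ValueError("sliding window must contain at least one image")
    if weights is None:
        return sum(window) / len(window)
    if len(weights) != len(window):
        raise ValueError("weights and window must have the same length")
    return sum(w * s for w, s in zip(weights, window)) / sum(weights)


# Hypothetical entire-image SVAR metrics for images 508, 506, and 504,
# listed oldest first so that image 504 (nearest to image 502) is last.
window_505 = [900.0, 1000.0, 1100.0]

unweighted = previous_svar(window_505)
# Assumed weights that favor the image closest to the current image.
weighted = previous_svar(window_505, weights=(1, 2, 3))
```

Because the most recent image carries the largest weight, the weighted result is pulled toward the SVAR metric of image 504 relative to the unweighted average.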
A statistically significant change in SVAR metrics between the current image and one or more of the preceding images in a video signal is a strong indicator that the current image represents a scene change. Accordingly, at block 408 the feature detection module 120 determines a difference between the current SVAR metric calculated at block 404 and the previous SVAR metric calculated at block 406. The feature detection module 120 then compares this difference with a predetermined threshold to identify whether the image 502 represents a scene change (i.e., there is a statistically significant difference between the SVAR metric of the current image 502 and one or more preceding images). In one embodiment, the predetermined threshold is a relative threshold, such as a percentage change. To illustrate, the threshold may be set as a +/−20% change from the previous SVAR metric. Thus, the current SVAR metric would exceed the threshold if the current SVAR metric were more than 20% higher or more than 20% lower than the previous SVAR metric. Alternatively, the predetermined threshold may represent an absolute change, rather than a relative change. Further, in one embodiment, the threshold may include both a relative threshold component and an absolute threshold component such that the threshold is exceeded only when both the relative threshold component and the absolute threshold component are exceeded. The threshold may be determined empirically, through simulation or modeling, and the like. Further, while the threshold may be static in some implementations, in other implementations the threshold may be dynamically changed based on feedback during the transcoding process.
In the event that the difference between the current SVAR metric and the previous SVAR metric does not exceed the predetermined threshold, at block 410 the feature detection module 120 identifies the image 502 as not representing a scene change and signals the encoder 118 (FIG. 1) accordingly. Conversely, in response to determining the difference exceeds the threshold, at block 412 the feature detection module 120 identifies the image 502 as representing a scene change and signals the encoder 118 accordingly. In one embodiment, rather than basing scene change detection solely on the SVAR metric, the transcoding system 104 uses the SVAR metric in conjunction with one or more other scene change indicators. For example, the transcoding system 104 could use a combination of the SVAR metric and a volume change detection to determine whether a scene change has occurred.
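The threshold comparison of blocks 408-412 can be sketched as follows for the relative-threshold case and for the combined relative-and-absolute case. The function name, the default of 20% (taken from the illustration above), the handling of a zero previous metric, and the sample values are assumptions rather than details specified by the disclosure.

```python
def is_scene_change(current, previous, rel_threshold=0.20, abs_threshold=None):
    """Decide whether the change in SVAR indicates a scene change.

    The relative component is exceeded when the current SVAR metric is
    more than `rel_threshold` (as a fraction) above or below the previous
    SVAR metric. When `abs_threshold` is given, the absolute difference
    must exceed it as well, so both components must be exceeded.
    """
    diff = abs(current - previous)
    if previous > 0:
        relative_exceeded = diff > rel_threshold * previous
    else:
        # Assumption: with no prior variance, any change counts as exceeding.
        relative_exceeded = diff > 0
    if abs_threshold is None:
        return relative_exceeded
    return relative_exceeded and diff > abs_threshold


# Hypothetical SVAR values.
rise = is_scene_change(1300.0, 1000.0)    # 30% higher than the previous metric
drop = is_scene_change(700.0, 1000.0)     # 30% lower than the previous metric
steady = is_scene_change(1100.0, 1000.0)  # only 10% higher
# A 30% change on a tiny SVAR fails the assumed absolute component of 100.
tiny = is_scene_change(13.0, 10.0, abs_threshold=100.0)
```

Requiring the absolute component in addition to the relative one keeps small fluctuations in very low-variance images from being flagged as scene changes.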
At block 414, the encoder 118 encodes a portion of the video signal 500 corresponding to the image 502 based on whether the image 502 was determined to represent a scene change at blocks 410 and 412. For example, in response to the feature detection module 120 signaling via control signaling 130 that the image 502 is a scene change, the encoder 118 can perform a transcoding operation that generates a new GOP with the image 502 as the first I-frame of the new GOP, or otherwise changes the transcoding parameters (such as the rate control parameters and references of the image 502) so as to improve the quality of the transcoded representation of the image 502. In parallel, the method 400 returns to block 402 for the next image in the video signal 500, at which point the image 502 becomes a preceding image relative to the next image. The process of method 400 may continue until the video signal 500 is encoded or transcoded.
FIGS. 6 and 7 illustrate a method 600 (FIG. 6) for detecting a black border region or a caption region in an image 700 (FIG. 7) based on SVAR metrics in accordance with at least one embodiment of the present disclosure. As illustrated by image 700 of FIG. 7, images of a video signal may include a black border comprising one or more black border regions, such as a top horizontal black bar 702, a bottom horizontal black bar 704, a left vertical black bar 706, and a right vertical black bar 708. These black border bars (also commonly referred to as “mattes”) typically are introduced during a video encoding process or video formatting process in which the aspect ratio or resolution is changed between the original video content and the resulting video content. To illustrate, the conversion of video from a 16:9 aspect ratio common to the ATSC format to the 4:3 aspect ratio of NTSC television often results in the introduction of the horizontal black bars 702 and 704 if the entirety of the original content is to be maintained. Further, a caption region 710 often may be present in the image 700 (often in the bottom horizontal black bar 704) for the purpose of displaying closed captioning text or subtitles. The presence of these black border regions or caption regions in images conventionally leads to a sub-optimal encoding of the images, as encoding resources are unnecessarily allocated to encoding the black border region or the caption region at the same fidelity as the active image region (i.e., the region of the image in which non-border or non-caption content is displayed).
Referring back to FIG. 6, the method 600 illustrates a process for detecting these black border regions or caption regions in images so as to adjust the encoding resources allocated to these regions accordingly. The method 600 initiates at block 602, whereby the feature detection module 120 receives the image 700 for analysis and determines SVAR metrics for one or more variance row projections and/or one or more variance column projections in one or more border regions of the image 700. To illustrate, if it is assumed that a black border, if present at all, would reside within the columns of pixel blocks within vertical border regions 712 and 714 and within the rows of pixel blocks within horizontal border regions 716 and 718, the feature detection module 120 can determine the variance column projections SVARC[X] for those columns X of pixel blocks that fall within vertical border regions 712 and 714 and the variance row projections SVARR[Y] for those rows Y of pixel blocks that fall within horizontal border regions 716 and 718. Also under this assumption, the feature detection module 120 would bypass SVAR metric computation for the pixel blocks that fall within both vertical region 720 and horizontal region 722 under the expectation that these pixel blocks would constitute the active image region 724 of the image 700.
The feature detection module 120 can detect a black border region 726 from these SVAR projections in a number of ways. In one embodiment, the feature detection module 120 can detect each black bar, or matte, of the black border individually. For example, the feature detection module 120 can sum or average the variance row projections in horizontal border region 716 to determine a region SVAR metric for detecting the presence of the top horizontal black bar 702. Likewise, the feature detection module 120 can sum or average the variance row projections in horizontal border region 718 to determine a region SVAR metric for detecting the presence of the bottom horizontal black bar 704. Similarly, the feature detection module 120 can sum or average the variance column projections in vertical border region 712 to determine a region SVAR metric for detecting the presence of the left vertical black bar 706 and sum or average the variance column projections in vertical border region 714 to determine a region SVAR metric for detecting the presence of the right vertical black bar 708.
Alternatively, the feature detection module 120 can detect the black bars in pairs (top and bottom black bars 702 and 704, or left and right black bars 706 and 708), such as by summing or averaging the variance row projections in both border regions 716 and 718 or by summing or averaging the variance column projections in both border regions 712 and 714. In another embodiment, the presence of the black border region 726 as a whole can be detected by, for example, summing or averaging the variance row projections from horizontal border regions 716 and 718 and the variance column projections from vertical border regions 712 and 714 together.
At block 604, the feature detection module 120 uses the one or more SVAR metrics determined at block 602 to detect whether the black border region 726, or one or more black bars thereof, is present in the image 700. Generally, the SVAR metrics for those regions of an image in which a black border or border bar is present are relatively low, as there would be little variance between the pixels of each pixel block. Accordingly, the feature detection module 120 can use a predetermined threshold corresponding to this expected low variance as a trigger for detecting the black border region 726. In the event that the SVAR metric from block 602 falls below the threshold, the feature detection module 120 identifies the image 700 as containing the black border region 726 (or corresponding bar component) and signals the encoder 118 (FIG. 1) accordingly. To illustrate, it may be determined from empirical analysis or modeling that the average of the variance row projections in region 716 of an image having a black border is less than K. Thus, the threshold for detecting the presence of the top black bar 702 may be set to K such that when the average of the variance row projections for the pixel blocks in region 716 of the image 700 falls below K, the feature detection module 120 identifies the image 700 as having the top black bar 702 at horizontal border region 716. While the threshold may be static in some implementations, in other implementations the threshold may be changed dynamically based on feedback during the transcoding process.
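The per-bar detection described above may be sketched as follows. This is a minimal example under stated assumptions: the projections are already available as lists indexed by row or column of pixel blocks, each border region is given as a list of indices, and the region metric is the average of the projections. The function names, the region layout, and the value chosen for the threshold K are hypothetical.

```python
def region_metric(projections, indices, mode="average"):
    """Sum or average the projections that fall inside one border region."""
    values = [projections[i] for i in indices]
    total = sum(values)
    return total / len(values) if mode == "average" else total

def detect_black_bars(row_proj, col_proj, regions, threshold_k):
    """Flag a bar as present when its region metric falls below threshold_k.

    regions maps a bar name to ("row" | "col", list of block indices).
    """
    detected = {}
    for name, (axis, indices) in regions.items():
        projections = row_proj if axis == "row" else col_proj
        detected[name] = region_metric(projections, indices) < threshold_k
    return detected

# Assumed projections for a letterboxed image: flat top and bottom rows,
# textured columns throughout.
row_proj = [0.5, 1.0, 900.0, 1200.0, 1100.0, 950.0, 0.8, 0.2]
col_proj = [800.0, 900.0, 1000.0, 950.0, 870.0, 910.0, 990.0, 860.0]
regions = {
    "top": ("row", [0, 1]),
    "bottom": ("row", [6, 7]),
    "left": ("col", [0, 1]),
    "right": ("col", [6, 7]),
}
bars = detect_black_bars(row_proj, col_proj, regions, threshold_k=10.0)
print(bars)
```

Pairwise detection can be expressed with the same helper by passing the indices of both regions of a pair in a single list, for example rows 0, 1, 6, and 7 for the top and bottom bars together.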
In certain implementations, the extent of the border bars may not be reliably predicted. For example, the transcoding system 104 may not be aware of any aspect ratio changes made in the received video signal 108. Accordingly, rather than rely on predefined regions 712, 714, 716, and 718, the feature detection module 120 can instead detect the transition from a black bar to the active image region 724 (that is, the edge of the black bar) by detecting a statistically significant change between variance row projections of adjacent rows of pixel blocks or between variance column projections of adjacent columns of pixel blocks. For example, the feature detection module 120 may identify as the edge of the left vertical black bar 706 the line dividing a column of pixel blocks with a variance column projection below a predetermined threshold and a column of pixel blocks with a variance column projection above the predetermined threshold. Alternatively, the feature detection module 120 may identify as the edge of the left vertical black bar 706 the first column of pixel blocks having a variance column projection that is at least, for example, 20% greater than the variance column projection of the laterally adjacent column of pixel blocks.
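Both edge-finding approaches for the left vertical black bar can be sketched as a left-to-right scan over the variance column projections. The sketch below assumes the projections are held in a list ordered from the left edge of the image; the function names are hypothetical, and the optional min_value floor is an added assumption intended to keep small fluctuations inside a nearly flat bar from satisfying the relative test.

```python
def edge_by_threshold(col_proj, threshold):
    """Index of the first column whose projection reaches the threshold.

    The bar edge is the line between this column and the one before it.
    Returns None when no column reaches the threshold.
    """
    for x, value in enumerate(col_proj):
        if value >= threshold:
            return x
    return None

def edge_by_relative_jump(col_proj, ratio=0.20, min_value=0.0):
    """Index of the first column at least `ratio` greater than its left neighbor."""
    for x in range(1, len(col_proj)):
        previous, current = col_proj[x - 1], col_proj[x]
        if (current > previous
                and current >= previous * (1.0 + ratio)
                and current >= min_value):
            return x
    return None

# Assumed projections: three nearly flat columns followed by active content.
col_proj = [2.0, 2.1, 2.0, 850.0, 900.0, 870.0]
print(edge_by_threshold(col_proj, 50.0))
print(edge_by_relative_jump(col_proj))
```

Scanning the reversed list locates the right bar in the same way, and the same functions apply to variance row projections for the top and bottom bars.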
The transcoding system 104 may perform any of a variety of transcoding operations in response to detecting the black border region 726 (or components thereof) in the image 700. To illustrate, at block 606 the transcoding system 104 could crop the image 700 for the resulting transcoded video signal such that the image content in the black border region 726 is omitted in the corresponding transcoded image, and thus only the image content in the active image region 724 is included in the corresponding transcoded image. Alternatively, at block 608 the detected black border region 726 (or components thereof) may continue to be represented in the corresponding transcoded image, but at a lower fidelity. In this instance, the encoder 118 may allocate a higher bit rate or bit budget to the active image region 724 and allocate a lower bit rate or bit budget to the detected black border region 726, thereby allowing the active image region 724 to have an improved fidelity for a given bit allocation for the overall image.
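The two responses at blocks 606 and 608 may be illustrated with the following sketch. It assumes the bar extents are known in pixels and that the bit budget is divided in proportion to per-block weights, with border blocks weighted lower than active blocks. The function names and the border_weight parameter are hypothetical; the disclosure does not prescribe a particular allocation rule.

```python
def crop_active_region(image, top, bottom, left, right):
    """Drop the detected bars (extents in pixels) and keep the active image region."""
    height, width = len(image), len(image[0])
    return [row[left:width - right] for row in image[top:height - bottom]]

def split_bit_budget(total_bits, active_blocks, border_blocks, border_weight=0.25):
    """Split a per-image bit budget between the active region and the border.

    Each active block has weight 1.0 and each border block has border_weight,
    so a weight below 1.0 shifts bits toward the active region.
    """
    weighted = active_blocks + border_weight * border_blocks
    if weighted <= 0:
        raise ValueError("image must contain at least one weighted block")
    active_bits = round(total_bits * active_blocks / weighted)
    return active_bits, total_bits - active_bits

# Assumed 6x8 image with a one-pixel bar above and below and two-pixel bars at the sides.
image = [[r * 10 + c for c in range(8)] for r in range(6)]
cropped = crop_active_region(image, top=1, bottom=1, left=2, right=2)
budget = split_bit_budget(100000, active_blocks=60, border_blocks=40)
print(len(cropped), len(cropped[0]), budget)
```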
The caption region 710 may be detected and handled by the transcoding system 104 in a similar manner. For example, at block 602 the feature detection module 120 may determine an SVAR metric by, for example, summing or averaging the variance row projections for the caption region 710. However, unlike a black border region, a caption region 710 is expected to have a relatively high SVAR due to the sharp contrast (and thus variance) between the typically white pixels representing the characters in the caption region 710 and the typically black pixels representing the background in the caption region 710. Accordingly, at block 604 the feature detection module 120 would detect the presence of the caption region 710 by determining that the SVAR metric for the expected caption region 710 falls above a predetermined threshold, which may be determined empirically or through modeling, and which may be static or may be updated dynamically based on feedback during the transcoding process. In the event that the caption region 710 is identified, the caption region 710 may be cropped from the resulting transcoded image at block 606 or may be allocated a higher bit budget for very low bit-rate transcoding to improve subjective quality, as detailed above.
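A short numerical sketch shows why a caption region tends to produce a high SVAR while a black region produces a low one. It assumes each pixel block is given as a flat list of luma values, that the region metric is the average block variance, and that caption text is near-white (235) on near-black (16); these values, the threshold, and the function names are illustrative assumptions.

```python
from statistics import pvariance

def region_svar_average(blocks):
    """Average of the per-block pixel variances over a region."""
    return sum(pvariance(block) for block in blocks) / len(blocks)

def is_caption_region(blocks, threshold):
    """Flag the region as a caption when its metric falls above the threshold."""
    return region_svar_average(blocks) > threshold

# Assumed 4x4 blocks: half text pixels and half background, versus uniform black.
caption_block = [16] * 8 + [235] * 8
black_block = [16] * 16

print(region_svar_average([caption_block] * 4))
print(is_caption_region([caption_block] * 4, threshold=1000.0))
print(is_caption_region([black_block] * 4, threshold=1000.0))
```

With half the pixels at each level, every pixel lies 109.5 from the block mean, giving a block variance of 109.5 squared, or 11990.25, whereas the uniform block has zero variance.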
FIG. 8 illustrates an example method 800 for using an SVAR metric for allocation of bits to an image for bit rate control in accordance with at least one embodiment of the present disclosure. The method 800 initiates at block 802, whereby the feature detection module 120 receives an image from the video signal 108 (FIG. 1) for analysis and determines the SVAR metric for the entire image (SVARI). This SVAR metric represents the complexity of the image in that the higher the SVARI, the more complex the image. At block 804, the encoder 118 modifies the implemented rate control based on the indicated complexity of the image. To illustrate, for a constant quality system, an image having a higher SVARI, and thus a higher complexity, may be allocated a higher bit budget or lower quantization parameter than an image having a lower SVARI, and thus a lower complexity. For a constant bit rate system, an image having a higher SVARI, and thus a higher complexity, may call for a higher quantization parameter than an image having a lower SVARI, and thus a lower complexity, under the same bit budget limitation.
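The direction of the rate control adjustments at block 804 can be sketched as follows. The mappings are hypothetical heuristics chosen only to illustrate the relationships stated above: the constant quality case scales the bit budget in proportion to SVARI relative to a reference image, and the constant bit rate case raises the quantization parameter by 6 for each doubling of SVARI. The 0 to 51 range and the step of 6 assume an H.264-style quantization parameter scale; none of these specifics are taken from the disclosure.

```python
import math

def constant_quality_bits(svar_i, reference_svar, reference_bits):
    """Constant quality: a more complex image receives a larger bit budget."""
    return round(reference_bits * svar_i / reference_svar)

def constant_bitrate_qp(svar_i, reference_svar, reference_qp, qp_min=0, qp_max=51):
    """Constant bit rate: a more complex image receives a higher QP.

    Assumed heuristic: +6 QP per doubling of SVARI, clamped to [qp_min, qp_max].
    """
    if svar_i <= 0:
        return qp_min  # a flat image needs no coarser quantization
    qp = reference_qp + 6 * math.log2(svar_i / reference_svar)
    return max(qp_min, min(qp_max, round(qp)))

print(constant_quality_bits(2000.0, 1000.0, 50000))
print(constant_bitrate_qp(2000.0, 1000.0, 26))
print(constant_bitrate_qp(500.0, 1000.0, 26))
```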
Although FIGS. 3-8 illustrate example transcoding operations that may be implemented or modified based on determined SVAR metrics, any of a variety of transcoding operations may be implemented or modified in a similar manner without departing from the scope of the present disclosure.
In this document, relational terms such as “first” and “second”, and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual relationship or order between such entities or actions or any actual relationship or order between such entities and claimed elements. The term “another”, as used herein, is defined as at least a second or more. The terms “including”, “having”, or any variation thereof, as used herein, are defined as comprising.
Other embodiments, uses, and advantages of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. The specification and drawings should be considered as examples only, and the scope of the disclosure is accordingly intended to be limited only by the following claims and equivalents thereof.
Note that not all of the activities or elements described above in the general description are required, that a portion of a specific activity or device may not be required, and that one or more further activities may be performed, or elements included, in addition to those described. Still further, the order in which activities are listed is not necessarily the order in which they are performed.
Also, the concepts have been described with reference to specific embodiments. However, one of ordinary skill in the art appreciates that various modifications and changes can be made without departing from the scope of the present disclosure as set forth in the claims below. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of the present disclosure.
Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments. However, the benefits, advantages, solutions to problems, and any feature(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature of any or all the claims.