
Visual information fidelity

From Wikipedia, the free encyclopedia

Objective full-reference image quality assessment

Visual information fidelity (VIF) is a full-reference image quality assessment index based on natural scene statistics and the notion of image information extracted by the human visual system.[1] It was developed by Hamid R. Sheikh and Alan Bovik at the Laboratory for Image and Video Engineering (LIVE) at the University of Texas at Austin in 2006. It is deployed at the core of Netflix's VMAF video quality monitoring system, which controls the picture quality of all encoded videos streamed by Netflix.

Model overview

Images and videos of three-dimensional visual environments come from a common class: the class of natural scenes. Natural scenes form a tiny subspace in the space of all possible signals, and researchers have developed sophisticated models to characterize their statistics. Most real-world distortion processes disturb these statistics and make the image or video signals unnatural. The VIF index employs natural scene statistics (NSS) models in conjunction with a distortion (channel) model to quantify the information shared between the test and the reference images. Further, the VIF index is based on the hypothesis that this shared information is an aspect of fidelity that correlates well with visual quality. In contrast to prior approaches based on human visual system (HVS) error sensitivity and measurement of structure,[2] this statistical approach, used in an information-theoretic setting, yields a full-reference (FR) quality assessment (QA) method that does not rely on any HVS or viewing-geometry parameter, nor on any constants requiring optimization, and yet is competitive with state-of-the-art QA methods.[3]

Specifically, the reference image is modeled as the output of a stochastic "natural" source that passes through the HVS channel and is subsequently processed by the brain. The information content of the reference image is quantified as the mutual information between the input and output of the HVS channel; this is the information that the brain could ideally extract from the output of the HVS. The same measure is then quantified in the presence of an image distortion channel that distorts the output of the natural source before it passes through the HVS channel, thereby measuring the information that the brain could ideally extract from the test image. This is shown pictorially in Figure 1. The two information measures are then combined, as a ratio, to form a visual information fidelity measure that relates visual quality to relative image information.
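As a toy illustration of the two information measures (a deliberate simplification, not part of the published model), consider a single scalar coefficient with Gaussian variance `var_c`. A zero-mean Gaussian input of variance P observed in independent Gaussian noise of variance Q carries $\tfrac{1}{2}\log_2(1+P/Q)$ bits, so the reference path and the distorted path can each be scored in one line. The function names below are hypothetical.

```python
import math


def gaussian_channel_bits(signal_var: float, noise_var: float) -> float:
    """Mutual information (bits) between a zero-mean Gaussian input with
    variance signal_var and that input plus independent zero-mean Gaussian
    noise with variance noise_var."""
    return 0.5 * math.log2(1.0 + signal_var / noise_var)


def scalar_information_ratio(var_c: float, g: float, var_v: float, var_n: float) -> float:
    """Hypothetical one-coefficient analogue of the VIF idea.

    Reference path: C -> E = C + N             (HVS noise only)
    Test path:      C -> F = g*C + V + N'      (distortion, then HVS noise)
    Returns the test-path information divided by the reference-path information.
    """
    ref_bits = gaussian_channel_bits(var_c, var_n)
    test_bits = gaussian_channel_bits(g * g * var_c, var_v + var_n)
    return test_bits / ref_bits
```

In this toy setting an undistorted path (`g=1`, `var_v=0`) gives a ratio of exactly 1, added noise lowers it, and a gain above 1 with no added noise raises it above 1.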

System model

Source model

A Gaussian scale mixture (GSM) is used to statistically model the wavelet coefficients of a steerable pyramid decomposition of an image.[4] The model is described below for a given subband of the multi-scale, multi-orientation decomposition and extends to the other subbands in the same way. Let the wavelet coefficients in a given subband be $\mathcal{C}=\{\bar{C}_{i}:i\in\mathcal{I}\}$, where $\mathcal{I}$ denotes the set of spatial indices across the subband and each $\bar{C}_{i}$ is an $M$-dimensional vector. The subband is partitioned into non-overlapping blocks of $M$ coefficients each, and each block corresponds to one $\bar{C}_{i}$. According to the GSM model, $\mathcal{C}=\mathcal{S}\cdot\mathcal{U}=\{S_{i}\bar{U}_{i}:i\in\mathcal{I}\}$, where $S_{i}$ is a positive scalar and $\bar{U}_{i}$ is a zero-mean Gaussian vector with covariance $\mathbf{C}_{U}$. Further, the non-overlapping blocks are assumed to be mutually independent, and the random field $\mathcal{S}$ is assumed to be independent of $\mathcal{U}$.
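The block partition and the GSM model can be sketched in plain Python. This is a minimal illustration under stated assumptions: the subband is given as a 2-D list (the steerable pyramid that would produce it is not shown), blocks are square with side `b` so that $M=b^2$, and $\bar{U}_i$ is drawn through a lower-triangular Cholesky factor of $\mathbf{C}_U$. The helper names are hypothetical.

```python
import random
from typing import List, Sequence


def partition_blocks(subband: Sequence[Sequence[float]], b: int) -> List[List[float]]:
    """Hypothetical helper: split a 2-D subband into non-overlapping b x b
    blocks and flatten each one (row-major) into an M = b*b vector C_i.
    Trailing rows/columns that do not fill a whole block are dropped."""
    rows, cols = len(subband), len(subband[0])
    blocks = []
    for r in range(0, rows - b + 1, b):
        for c in range(0, cols - b + 1, b):
            blocks.append([subband[r + dr][c + dc] for dr in range(b) for dc in range(b)])
    return blocks


def sample_gsm_vector(s: float, chol_u: Sequence[Sequence[float]], rng: random.Random) -> List[float]:
    """Hypothetical helper: draw one GSM vector C_i = s * U_i with U_i ~ N(0, C_U),
    where C_U = L L^T and chol_u is the lower-triangular factor L (an assumption
    about how C_U is supplied)."""
    m = len(chol_u)
    z = [rng.gauss(0.0, 1.0) for _ in range(m)]  # z ~ N(0, I)
    # u = L z has covariance L L^T = C_U; only the lower triangle of L is used
    u = [sum(chol_u[k][j] * z[j] for j in range(k + 1)) for k in range(m)]
    return [s * uk for uk in u]  # scale by the positive GSM multiplier s
```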

Distortion model

The distortion process is modeled as a combination of signal attenuation and additive noise in the wavelet domain. Mathematically, let $\mathcal{D}=\{\bar{D}_{i}:i\in\mathcal{I}\}$ denote the random field from a given subband of the distorted image, let $\mathcal{G}=\{g_{i}:i\in\mathcal{I}\}$ be a deterministic scalar gain field, and let $\mathcal{V}=\{\bar{V}_{i}:i\in\mathcal{I}\}$, where each $\bar{V}_{i}$ is a zero-mean Gaussian vector with covariance $\mathbf{C}_{V}=\sigma_{v}^{2}\mathbf{I}$. Then

$$\mathcal{D}=\mathcal{G}\mathcal{C}+\mathcal{V},\quad\text{that is,}\quad\bar{D}_{i}=g_{i}\bar{C}_{i}+\bar{V}_{i}.$$

Further, ${\mathcal {V}}$ is modeled to be independent of ${\mathcal {S}}$ and ${\mathcal {U}}$ .
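The attenuation-plus-noise channel can be simulated directly. The model itself does not say how $g_i$ and $\sigma_v^2$ are obtained for a real reference/test pair; the sketch below assumes a simple least-squares fit of the test coefficients on the reference coefficients over one block (or local neighbourhood), which is one plausible choice and not a statement of the original method. Function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def distort_block(c: Sequence[float], g: float, var_v: float, rng: random.Random) -> List[float]:
    """Hypothetical helper: apply D_i = g_i * C_i + V_i with V_i ~ N(0, var_v * I)."""
    sd = var_v ** 0.5
    return [g * ck + rng.gauss(0.0, sd) for ck in c]


def estimate_gain_and_noise(c: Sequence[float], d: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical helper (assumed estimator): least-squares fit of d ~ g*c + v
    over the coefficients of one block or neighbourhood.

    g     = cov(c, d) / var(c)
    var_v = var(d) - g * cov(c, d)   (residual variance of the fit)
    """
    n = len(c)
    mc, md = sum(c) / n, sum(d) / n
    var_c = sum((x - mc) ** 2 for x in c) / n
    var_d = sum((y - md) ** 2 for y in d) / n
    cov = sum((x - mc) * (y - md) for x, y in zip(c, d)) / n
    g = cov / var_c if var_c > 0 else 0.0  # no usable signal: treat as fully attenuated
    var_v = max(var_d - g * cov, 0.0)  # clip tiny negative values caused by rounding
    return g, var_v
```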

HVS model

The duality of HVS models and NSS implies that several aspects of the HVS have already been accounted for in the source model. Here, the HVS is additionally modeled based on the hypothesis that uncertainty in the perception of visual signals limits the amount of information that can be extracted from the source and distorted images. This source of uncertainty can be modeled as visual noise in the HVS model. In particular, the HVS noise in a given subband of the wavelet decomposition is modeled as additive white Gaussian noise. Let $\mathcal{N}=\{\bar{N}_{i}:i\in\mathcal{I}\}$ and $\mathcal{N}'=\{\bar{N}_{i}':i\in\mathcal{I}\}$ be random fields, where $\bar{N}_{i}$ and $\bar{N}_{i}'$ are zero-mean Gaussian vectors with covariances $\mathbf{C}_{N}$ and $\mathbf{C}_{N}'$; both are taken to be $\sigma_{n}^{2}\mathbf{I}$ in the expressions for the VIF index. Further, let $\mathcal{E}$ and $\mathcal{F}$ denote the visual signals at the output of the HVS for the reference and test images, respectively. Mathematically, $\mathcal{E}=\mathcal{C}+\mathcal{N}$ and $\mathcal{F}=\mathcal{D}+\mathcal{N}'$. Note that $\mathcal{N}$ and $\mathcal{N}'$ are random fields that are independent of $\mathcal{S}$, $\mathcal{U}$ and $\mathcal{V}$.
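Given $S_i=s_i$, every quantity in the model is zero-mean Gaussian, so $\bar{E}_i$ and $\bar{F}_i$ have covariances $s_i^2\mathbf{C}_U+\sigma_n^2\mathbf{I}$ and $g_i^2s_i^2\mathbf{C}_U+(\sigma_v^2+\sigma_n^2)\mathbf{I}$. Adding a multiple of the identity only shifts eigenvalues, so in the eigenbasis of $\mathbf{C}_U$ both covariances are diagonal. The sketch below assumes $\mathbf{C}_N=\mathbf{C}_N'=\sigma_n^2\mathbf{I}$ and takes the eigenvalues of $\mathbf{C}_U$ as already computed; the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def conditional_variances(s: float, g: float, var_v: float, var_n: float,
                          eig_u: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: per-eigendirection variances of E_i = C_i + N_i and
    F_i = D_i + N'_i given S_i = s, when C_U has eigenvalues eig_u and both HVS
    noises have covariance var_n * I (assumption).

    Cov(E_i | s) = s^2 C_U + var_n I               -> s^2 * lam + var_n
    Cov(F_i | s) = g^2 s^2 C_U + (var_v + var_n) I -> g^2 * s^2 * lam + var_v + var_n
    """
    e_vars = [s * s * lam + var_n for lam in eig_u]
    f_vars = [g * g * s * s * lam + var_v + var_n for lam in eig_u]
    return e_vars, f_vars
```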

VIF index

Let $\bar{C}^{N}=(\bar{C}_{1},\bar{C}_{2},\ldots,\bar{C}_{N})$ denote the vector of all blocks from a given subband. Let $S^{N}$, $\bar{D}^{N}$, $\bar{E}^{N}$ and $\bar{F}^{N}$ be defined similarly. Let $s^{N}$ denote the maximum likelihood estimate of $S^{N}$ given $\bar{C}^{N}$ and $\mathbf{C}_{U}$. The amount of information extracted from the reference image is obtained as

$$I(\bar{C}^{N};\bar{E}^{N}\mid S^{N}=s^{N})=\frac{1}{2}\sum_{i=1}^{N}\log_{2}\left(\frac{|s_{i}^{2}\mathbf{C}_{U}+\sigma_{n}^{2}\mathbf{I}|}{|\sigma_{n}^{2}\mathbf{I}|}\right),$$

while the amount of information extracted from the test image is given as

$$I(\bar{C}^{N};\bar{F}^{N}\mid S^{N}=s^{N})=\frac{1}{2}\sum_{i=1}^{N}\log_{2}\left(\frac{|g_{i}^{2}s_{i}^{2}\mathbf{C}_{U}+(\sigma_{v}^{2}+\sigma_{n}^{2})\mathbf{I}|}{|(\sigma_{v}^{2}+\sigma_{n}^{2})\mathbf{I}|}\right).$$
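Both information terms reduce to sums over the eigenvalues $\lambda_k$ of $\mathbf{C}_U$, because $|s_i^2\mathbf{C}_U+\sigma^2\mathbf{I}|/|\sigma^2\mathbf{I}|=\prod_k(1+s_i^2\lambda_k/\sigma^2)$, where $|\cdot|$ is the determinant. The estimate $s_i^2$ follows from maximizing the Gaussian log-likelihood of $\bar{C}_i$, which up to terms independent of $s^2$ is $-\tfrac{M}{2}\log s^2-\bar{C}_i^{T}\mathbf{C}_U^{-1}\bar{C}_i/(2s^2)$; setting its derivative to zero gives $s_i^2=\bar{C}_i^{T}\mathbf{C}_U^{-1}\bar{C}_i/M$. The sketch assumes the caller supplies $\mathbf{C}_U^{-1}$ and the eigenvalues of $\mathbf{C}_U$; helper names are hypothetical.

```python
import math
from typing import Sequence


def ml_scale_sq(c: Sequence[float], cu_inv: Sequence[Sequence[float]]) -> float:
    """ML estimate of s_i^2 for one block C_i ~ N(0, s^2 C_U):
    s_i^2 = C_i^T C_U^{-1} C_i / M  (cu_inv = C_U^{-1}, supplied by the caller)."""
    m = len(c)
    quad = sum(c[a] * cu_inv[a][b] * c[b] for a in range(m) for b in range(m))
    return quad / m


def reference_info(s_sq: Sequence[float], eig_u: Sequence[float], var_n: float) -> float:
    """I(C^N; E^N | S^N = s^N) in bits, using
    |s^2 C_U + var_n I| / |var_n I| = prod_k (1 + s^2 * lam_k / var_n)."""
    return 0.5 * sum(math.log2(1.0 + s2 * lam / var_n) for s2 in s_sq for lam in eig_u)


def distorted_info(s_sq: Sequence[float], g: Sequence[float], var_v: float,
                   eig_u: Sequence[float], var_n: float) -> float:
    """I(C^N; F^N | S^N = s^N) in bits, with per-block gains g_i and a single
    distortion-noise variance var_v, matching the expression above."""
    total = 0.0
    for s2, gi in zip(s_sq, g):
        for lam in eig_u:
            total += math.log2(1.0 + gi * gi * s2 * lam / (var_v + var_n))
    return 0.5 * total
```

With $g_i=1$ and $\sigma_v^2=0$ the two functions return the same value, which is a quick sanity check of the sketch.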

In both expressions, $|\cdot|$ denotes the matrix determinant. Denoting the $N$ blocks in subband $j$ of the wavelet decomposition by $\bar{C}^{N,j}$, and similarly for the other variables, the VIF index is defined as

$$\textrm{VIF}=\frac{\sum_{j\in\textrm{subbands}}I(\bar{C}^{N,j};\bar{F}^{N,j}\mid S^{N,j}=s^{N,j})}{\sum_{j\in\textrm{subbands}}I(\bar{C}^{N,j};\bar{E}^{N,j}\mid S^{N,j}=s^{N,j})}.$$
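The index is a ratio of sums over subbands, not an average of per-subband ratios, so subbands that carry more reference information weigh more. A minimal sketch with a hypothetical function name, taking one information value (in bits) per subband for the test and reference paths:

```python
from typing import Sequence


def vif_index(test_bits_per_subband: Sequence[float],
              ref_bits_per_subband: Sequence[float]) -> float:
    """Hypothetical helper:
    VIF = sum_j I(C^{N,j}; F^{N,j} | s^{N,j}) / sum_j I(C^{N,j}; E^{N,j} | s^{N,j})."""
    if len(test_bits_per_subband) != len(ref_bits_per_subband):
        raise ValueError("need one test and one reference value per subband")
    denom = sum(ref_bits_per_subband)
    if denom <= 0.0:
        raise ValueError("total reference information must be positive")
    return sum(test_bits_per_subband) / denom
```

For example, per-subband values (1, 1) against reference values (1, 3) give 2/4 = 0.5, whereas averaging the per-subband ratios would give about 0.67.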

Performance

Spearman's rank-order correlation coefficient (SROCC) between the VIF scores of the distorted images in the LIVE Image Quality Assessment Database and the corresponding human opinion scores has been reported as 0.96.[citation needed]
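SROCC is the Pearson correlation of the ranks of two score lists, so it measures how monotonically objective scores track human opinion scores without assuming a linear relation. The stdlib sketch below (tied values receive averaged ranks) is illustrative only; it is not the evaluation code behind the reported figure, and `scipy.stats.spearmanr` provides an equivalent library routine.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def srocc(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank-order correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0.0 or vy == 0.0:
        raise ValueError("SROCC is undefined when all scores in a list are equal")
    return cov / (vx * vy) ** 0.5
```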

References

[1] Sheikh, Hamid; Bovik, Alan (2006). "Image Information and Visual Quality". IEEE Transactions on Image Processing. 15 (2): 430–444. Bibcode:2006ITIP...15..430S. doi:10.1109/tip.2005.859378. PMID 16479813.
[2] Wang, Zhou; Bovik, Alan; Sheikh, Hamid; Simoncelli, Eero (2004). "Image quality assessment: From error visibility to structural similarity". IEEE Transactions on Image Processing. 13 (4): 600–612. Bibcode:2004ITIP...13..600W. doi:10.1109/tip.2003.819861. PMID 15376593. S2CID 207761262.
[3] Sheikh, Hamid R. (2006). "Image Information and Visual Quality". IEEE Transactions on Image Processing. 15 (2): 430–444. Bibcode:2006ITIP...15..430S. doi:10.1109/tip.2005.859378. PMID 16479813. Retrieved 15 April 2024.
[4] Simoncelli, Eero; Freeman, William (1995). "The steerable pyramid: A flexible architecture for multi-scale derivative computation". Proceedings., International Conference on Image Processing. Vol. 3. pp. 444–447. doi:10.1109/ICIP.1995.537667. ISBN 0-7803-3122-2. S2CID 1099364.

External links

Laboratory for Image and Video Engineering at the University of Texas
An implementation of the VIF index
LIVE Image Quality Assessment Database

Retrieved from "https://en.wikipedia.org/w/index.php?title=Visual_information_fidelity&oldid=1259011360"
