Pyramid (image processing)

From Wikipedia, the free encyclopedia

Type of multi-scale signal representation

Visual representation of an image pyramid with 5 levels

Pyramid, or pyramid representation, is a type of multi-scale signal representation developed by the computer vision, image processing and signal processing communities, in which a signal or an image is subject to repeated smoothing and subsampling. Pyramid representation is a predecessor to scale-space representation and multiresolution analysis.

Pyramid generation

There are two main types of pyramids: lowpass and bandpass.

A lowpass pyramid is made by smoothing the image with an appropriate smoothing filter and then subsampling the smoothed image, usually by a factor of 2 along each coordinate direction. The resulting image is then subjected to the same procedure, and the cycle is repeated multiple times. Each cycle of this process results in a smaller image with increased smoothing, but with decreased spatial sampling density (that is, decreased image resolution). If illustrated graphically, the entire multi-scale representation will look like a pyramid, with the original image on the bottom and each cycle's resulting smaller image stacked one atop the other.
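
The smooth-then-subsample cycle can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the names `smooth` and `lowpass_pyramid`, the (1/4, 1/2, 1/4) kernel, and the edge-replicating border handling are all choices made for the example.

```python
import numpy as np

def smooth(image, kernel=(0.25, 0.5, 0.25)):
    """Separable smoothing along rows and columns.

    Assumption: borders are handled by replicating edge pixels.
    """
    k = np.asarray(kernel, dtype=float)
    pad = len(k) // 2
    out = np.asarray(image, dtype=float)
    for axis in (0, 1):
        widths = [(pad, pad) if a == axis else (0, 0) for a in (0, 1)]
        padded = np.pad(out, widths, mode="edge")
        # Filter every 1-D slice along the current axis.
        out = np.apply_along_axis(np.convolve, axis, padded, k, mode="valid")
    return out

def lowpass_pyramid(image, levels):
    """Return [original, level 1, ...]; each level is smoothed, then halved."""
    pyramid = [np.asarray(image, dtype=float)]
    for _ in range(levels - 1):
        # Keep every second sample along each coordinate direction.
        pyramid.append(smooth(pyramid[-1])[::2, ::2])
    return pyramid

image = np.random.default_rng(0).random((64, 48))
pyramid = lowpass_pyramid(image, levels=4)
shapes = [level.shape for level in pyramid]
print(shapes)  # [(64, 48), (32, 24), (16, 12), (8, 6)]
```

Each level has half the sampling density of the one below it, so the whole stack needs only about one third more storage than the original image.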

A bandpass pyramid is made by forming the difference between images at adjacent levels of the pyramid. Because adjacent levels have different resolutions, the coarser image is first interpolated to the resolution of the finer one so that pixelwise differences can be computed.^[1]

Pyramid generation kernels

A variety of different smoothing kernels have been proposed for generating pyramids.^[2]^[3]^[4]^[5]^[6]^[7] Among the suggestions that have been given, the binomial kernels arising from the binomial coefficients stand out as a particularly useful and theoretically well-founded class.^[3]^[8]^[9]^[10]^[11]^[12] Thus, given a two-dimensional image, we may apply the (normalized) binomial filter (1/4, 1/2, 1/4), typically twice or more along each spatial dimension, and then subsample the image by a factor of two. This operation may then be repeated as many times as desired, leading to a compact and efficient multi-scale representation. If motivated by specific requirements, intermediate scale levels may also be generated where the subsampling stage is sometimes left out, leading to an oversampled or hybrid pyramid.^[11] With the increasing computational efficiency of CPUs available today, it is in some situations also feasible to use Gaussian filters with wider support as smoothing kernels in the pyramid generation steps.
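
As a small worked example, the sketch below shows what applying the (1/4, 1/2, 1/4) filter twice amounts to, and then performs one smooth-and-subsample step on a 1-D signal. The helper names `binomial_kernel` and `variance` are hypothetical, and the edge-replicating padding is an assumption of the example.

```python
import numpy as np

def binomial_kernel(passes):
    """Kernel equivalent to `passes` applications of (1/4, 1/2, 1/4)."""
    base = np.array([0.25, 0.5, 0.25])
    kernel = np.array([1.0])
    for _ in range(passes):
        kernel = np.convolve(kernel, base)
    return kernel

def variance(kernel):
    """Spatial variance of a normalized, symmetric 1-D kernel."""
    offsets = np.arange(len(kernel)) - (len(kernel) - 1) / 2
    return float(np.sum(kernel * offsets ** 2))

twice = binomial_kernel(2)
print(twice * 16)       # [1. 4. 6. 4. 1.]
print(variance(twice))  # 1.0

# One reduction step on a 1-D signal: smooth, then keep every second sample.
# Assumption: borders are padded by replicating the end samples.
signal = np.arange(16, dtype=float)
padded = np.pad(signal, len(twice) // 2, mode="edge")
reduced = np.convolve(padded, twice, mode="valid")[::2]
print(reduced.shape)    # (8,)
```

Each pass of (1/4, 1/2, 1/4) adds a variance of 1/2, so two passes behave roughly like a Gaussian with unit standard deviation; for a 2-D image the same 1-D kernel is applied along each dimension in turn.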

Gaussian pyramid

In a Gaussian pyramid, each successive image is smoothed with a Gaussian average (Gaussian blur) and scaled down. Each pixel holds a local average that corresponds to a neighborhood of pixels on the level below it in the pyramid. This technique is used especially in texture synthesis.

Laplacian pyramid

A Laplacian pyramid is very similar to a Gaussian pyramid, but it stores the difference image between the blurred versions at each pair of adjacent levels. Only the smallest level is not a difference image; it is kept so that the high-resolution image can be reconstructed by successively adding back the difference images. This technique can be used in image compression.^[13]
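
The construction and the reconstruction can be sketched as follows. This is an illustration under stated assumptions rather than the published algorithm: the function names are hypothetical, smoothing uses the (1/4, 1/2, 1/4) binomial filter, and upsampling uses plain pixel replication in place of a smoother interpolation filter.

```python
import numpy as np

def blur(image):
    """Separable (1/4, 1/2, 1/4) smoothing; borders replicated (assumption)."""
    k = np.array([0.25, 0.5, 0.25])
    out = np.asarray(image, dtype=float)
    for axis in (0, 1):
        widths = [(1, 1) if a == axis else (0, 0) for a in (0, 1)]
        padded = np.pad(out, widths, mode="edge")
        out = np.apply_along_axis(np.convolve, axis, padded, k, mode="valid")
    return out

def shrink(image):
    """Smooth, then subsample by a factor of 2 in each direction."""
    return blur(image)[::2, ::2]

def expand(image, shape):
    """Upsample to `shape`.

    Assumption: nearest-neighbour replication stands in for interpolation.
    """
    up = np.repeat(np.repeat(image, 2, axis=0), 2, axis=1)
    return up[:shape[0], :shape[1]]

def laplacian_pyramid(image, levels):
    """Return difference images (finest first) plus the smallest lowpass level."""
    gaussian = [np.asarray(image, dtype=float)]
    for _ in range(levels - 1):
        gaussian.append(shrink(gaussian[-1]))
    bands = [fine - expand(coarse, fine.shape)
             for fine, coarse in zip(gaussian, gaussian[1:])]
    return bands + [gaussian[-1]]

def reconstruct(pyramid):
    """Start from the smallest level and add back each difference image."""
    image = pyramid[-1]
    for band in reversed(pyramid[:-1]):
        image = expand(image, band.shape) + band
    return image

image = np.random.default_rng(0).random((33, 20))
pyramid = laplacian_pyramid(image, levels=4)
restored = reconstruct(pyramid)
print(np.allclose(restored, image))  # True
```

Reconstruction is exact up to floating-point rounding regardless of the filters chosen, because the same `expand` step is used when forming and when undoing each difference.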

Steerable pyramid

A steerable pyramid, developed by Simoncelli and others, is an implementation of a multi-scale, multi-orientation band-pass filter bank used for applications including image compression, texture synthesis, and object recognition. It can be thought of as an orientation-selective version of a Laplacian pyramid, in which a bank of steerable filters is used at each level of the pyramid instead of a single Laplacian or Gaussian filter.^[14]^[15]^[16]

Applications of pyramids

Alternative representation

In the early days of computer vision, pyramids were used as the main type of multi-scale representation for computing multi-scale image features from real-world image data. More recent techniques include scale-space representation, which has been popular among some researchers due to its theoretical foundation, the ability to decouple the subsampling stage from the multi-scale representation, the more powerful tools for theoretical analysis, and the ability to compute a representation at any desired scale, thus avoiding the algorithmic problems of relating image representations at different resolutions. Nevertheless, pyramids are still frequently used for expressing computationally efficient approximations to scale-space representation.^[11]^[17]^[18]

Detail manipulation

Levels of a Laplacian pyramid can be added to or removed from the original image to amplify or reduce detail at different scales. However, detail manipulation of this form is known to produce halo artifacts in many cases, leading to the development of alternatives such as the bilateral filter.
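
A minimal sketch of this kind of detail manipulation multiplies each band-pass level by a gain before the image is rebuilt. The function `scale_detail` and its `gains` parameter are hypothetical names, and the binomial smoothing and pixel-replication upsampling are assumptions of the example.

```python
import numpy as np

def blur(image):
    """Separable (1/4, 1/2, 1/4) smoothing; borders replicated (assumption)."""
    k = np.array([0.25, 0.5, 0.25])
    out = np.asarray(image, dtype=float)
    for axis in (0, 1):
        widths = [(1, 1) if a == axis else (0, 0) for a in (0, 1)]
        padded = np.pad(out, widths, mode="edge")
        out = np.apply_along_axis(np.convolve, axis, padded, k, mode="valid")
    return out

def expand(image, shape):
    """Upsample by pixel replication (assumption), then crop to `shape`."""
    up = np.repeat(np.repeat(image, 2, axis=0), 2, axis=1)
    return up[:shape[0], :shape[1]]

def scale_detail(image, gains):
    """Scale each band-pass level by a gain (finest level first) and rebuild."""
    gaussian = [np.asarray(image, dtype=float)]
    for _ in gains:
        gaussian.append(blur(gaussian[-1])[::2, ::2])
    bands = [fine - expand(coarse, fine.shape)
             for fine, coarse in zip(gaussian, gaussian[1:])]
    result = gaussian[-1]
    for band, gain in zip(reversed(bands), reversed(gains)):
        result = expand(result, band.shape) + gain * band
    return result

image = np.random.default_rng(1).random((32, 32))
unchanged = scale_detail(image, gains=[1.0, 1.0, 1.0])
sharpened = scale_detail(image, gains=[2.0, 1.0, 1.0])  # amplify finest detail
softened = scale_detail(image, gains=[0.0, 1.0, 1.0])   # remove finest detail
print(np.allclose(unchanged, image))  # True
```

A gain of 1 leaves a level untouched, a gain above 1 amplifies detail at that scale, and a gain below 1 suppresses it. Large gains applied near strong edges are where halo artifacts would be expected to show up.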

Some image compression file formats use the Adam7 algorithm or some other interlacing technique. These can be seen as a kind of image pyramid. Because those file formats store the "large-scale" features first and fine-grain details later in the file, a particular viewer displaying a small "thumbnail" or running on a small screen can quickly download just enough of the image to display it in the available pixels. One file can therefore support many viewer resolutions, rather than having to store or generate a different file for each resolution.
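
To make the pyramid analogy concrete, the sketch below splits an image into the seven Adam7 passes, assuming the pass offsets and strides used by PNG interlacing. The names `ADAM7_PASSES` and `adam7_passes` are hypothetical.

```python
import numpy as np

# (x_start, y_start, x_step, y_step) for the seven passes.
# Assumption: offsets and strides follow PNG's Adam7 interlacing scheme.
ADAM7_PASSES = [
    (0, 0, 8, 8), (4, 0, 8, 8), (0, 4, 4, 8), (2, 0, 4, 4),
    (0, 2, 2, 4), (1, 0, 2, 2), (0, 1, 1, 2),
]

def adam7_passes(image):
    """Split a 2-D array into the seven sub-images, in transmission order."""
    return [image[y0::dy, x0::dx] for x0, y0, dx, dy in ADAM7_PASSES]

image = np.arange(16 * 16).reshape(16, 16)
passes = adam7_passes(image)
print([p.shape for p in passes])
# [(2, 2), (2, 2), (2, 4), (4, 4), (4, 8), (8, 8), (8, 16)]
```

The first pass alone is a 1/8-scale subsampling of the image, and each later pass fills in a finer sampling grid. Unlike the lowpass pyramids described above, no smoothing is applied before subsampling, and every pixel is stored exactly once.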

References

1. E. H. Adelson, C. H. Anderson, J. R. Bergen, P. J. Burt and J. M. Ogden. "Pyramid methods in image processing". 1984.
2. Burt, P. J. (May 1981). "Fast filter transform for image processing". Computer Graphics and Image Processing. 16: 20–51. doi:10.1016/0146-664X(81)90092-7.
3. Crowley, James L. (November 1981). "A representation for visual information". Interim Report Carnegie-Mellon Univ. Carnegie-Mellon University, Robotics Institute. Bibcode:1981cmu..reptR....C. Tech. report CMU-RI-TR-82-07.
4. Burt, Peter; Adelson, Ted (1983). "The Laplacian Pyramid as a Compact Image Code" (PDF). IEEE Transactions on Communications. 31 (4): 532–540. CiteSeerX 10.1.1.54.299. doi:10.1109/TCOM.1983.1095851. S2CID 8018433.
5. Crowley, J. L.; Parker, A. C. (March 1984). "A representation for shape based on peaks and ridges in the difference of low-pass transform". IEEE Transactions on Pattern Analysis and Machine Intelligence. 6 (2): 156–170. CiteSeerX 10.1.1.161.3102. doi:10.1109/TPAMI.1984.4767500. PMID 21869180. S2CID 14348919.
6. Crowley, J. L.; Sanderson, A. C. (1987). "Multiple resolution representation and probabilistic matching of 2-D gray-scale shape" (PDF). IEEE Transactions on Pattern Analysis and Machine Intelligence. 9 (1): 113–121. CiteSeerX 10.1.1.1015.9294. doi:10.1109/tpami.1987.4767876. PMID 21869381. S2CID 14999508.
7. Meer, P.; Baugher, E. S.; Rosenfeld, A. (1987). "Frequency domain analysis and synthesis of image generating kernels". IEEE Transactions on Pattern Analysis and Machine Intelligence. 9 (4): 512–522. doi:10.1109/tpami.1987.4767939. PMID 21869409. S2CID 5978760.
8. Lindeberg, Tony. "Scale-space for discrete signals". PAMI(12), No. 3, March 1990, pp. 234–254.
9. Haddad, R. A.; Akansu, A. N. (March 1991). "A Class of Fast Gaussian Binomial Filters for Speech and Image Processing" (PDF). IEEE Transactions on Signal Processing. 39 (3): 723–727. Bibcode:1991ITSP...39..723H. doi:10.1109/78.80892.
10. Lindeberg, Tony. Scale-Space Theory in Computer Vision, Kluwer Academic Publishers, 1994, ISBN 0-7923-9418-6 (see specifically Chapter 2 for an overview of Gaussian and Laplacian image pyramids and Chapter 3 for theory about generalized binomial kernels and discrete Gaussian kernels).
11. Lindeberg, T. and Bretzner, L. Real-time scale selection in hybrid multi-scale representations, Proc. Scale-Space'03, Isle of Skye, Scotland, Springer Lecture Notes in Computer Science, volume 2695, pages 148–163, 2003.
12. See the article on multi-scale approaches for a very brief theoretical statement.
13. Burt, Peter J.; Adelson, Edward H. (1983). "The Laplacian Pyramid as a Compact Image Code" (PDF). IEEE Transactions on Communications. 31 (4): 532–540. CiteSeerX 10.1.1.54.299. doi:10.1109/TCOM.1983.1095851. S2CID 8018433.
14. Simoncelli, Eero. "The Steerable Pyramid". cns.nyu.edu.
15. Manduchi, Roberto; Perona, Pietro; Shy, Doug (1997). "Efficient Deformable Filter Banks" (PDF). California Institute of Technology/University of Padua. Also in: Manduchi, R.; Perona, P.; Shy, D. (1998). "Efficient Deformable Filter Banks". IEEE Transactions on Signal Processing. 46 (4): 1168–1173. Bibcode:1998ITSP...46.1168M. CiteSeerX 10.1.1.5.3102. doi:10.1109/78.668570.
16. Klein, Stanley A.; Carney, Thom; Barghout-Stein, Lauren; Tyler, Christopher W. (1997). "Seven models of masking". In Rogowitz, Bernice E.; Pappas, Thrasyvoulos N. (eds.). Human Vision and Electronic Imaging II. Vol. 3016. pp. 13–24. doi:10.1117/12.274510. S2CID 8366504.
17. Crowley, J.; Riff, O. Fast computation of scale normalised Gaussian receptive fields, Proc. Scale-Space'03, Isle of Skye, Scotland, Springer Lecture Notes in Computer Science, volume 2695, 2003.
18. Lowe, D. G. (2004). "Distinctive image features from scale-invariant keypoints". International Journal of Computer Vision. 60 (2): 91–110. CiteSeerX 10.1.1.73.2924. doi:10.1023/B:VISI.0000029664.99615.94. S2CID 221242327.