Blob detection

From Wikipedia, the free encyclopedia
Particular task in computer vision

In computer vision and image processing, blob detection methods are aimed at detecting regions in a digital image that differ in properties, such as brightness or color, compared to surrounding regions. Informally, a blob is a region of an image in which some properties are constant or approximately constant; all the points in a blob can be considered in some sense to be similar to each other. The most common methods for blob detection are based on convolution.

Given some property of interest expressed as a function of position on the image, there are two main classes of blob detectors: (i) differential methods, which are based on derivatives of the function with respect to position, and (ii) methods based on local extrema, which are based on finding the local maxima and minima of the function. With the more recent terminology used in the field, these detectors can also be referred to as interest point operators, or alternatively interest region operators (see also interest point detection and corner detection).

There are several motivations for studying and developing blob detectors. One main reason is to provide complementary information about regions, which is not obtained from edge detectors or corner detectors. In early work in the area, blob detection was used to obtain regions of interest for further processing. These regions could signal the presence of objects or parts of objects in the image domain with application to object recognition and/or object tracking. In other domains, such as histogram analysis, blob descriptors can also be used for peak detection with application to segmentation. Another common use of blob descriptors is as main primitives for texture analysis and texture recognition. In more recent work, blob descriptors have found increasingly popular use as interest points for wide baseline stereo matching and to signal the presence of informative image features for appearance-based object recognition based on local image statistics. There is also the related notion of ridge detection to signal the presence of elongated objects.

The Laplacian of Gaussian

One of the first and also most common blob detectors is based on the Laplacian of the Gaussian (LoG). Given an input image $f(x,y)$, this image is convolved by a Gaussian kernel

$$g(x,y,t) = \frac{1}{2\pi t}\, e^{-\frac{x^2+y^2}{2t}}$$

at a certain scale $t$ to give a scale-space representation $L(x,y;t) = g(x,y,t) * f(x,y)$. Then, the result of applying the Laplacian operator

$$\nabla^2 L = L_{xx} + L_{yy}$$

is computed, which usually results in strong positive responses for dark blobs of radius $r$ with $r^2 = 2t$ (for a two-dimensional image; $r^2 = dt$ for a $d$-dimensional image) and strong negative responses for bright blobs of similar size. A main problem when applying this operator at a single scale, however, is that the operator response is strongly dependent on the relationship between the size of the blob structures in the image domain and the size of the Gaussian kernel used for pre-smoothing. In order to automatically capture blobs of different (unknown) size in the image domain, a multi-scale approach is therefore necessary.
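The sign convention can be checked numerically at a single scale. The following Python sketch is an illustration, not a reference implementation. It assumes NumPy, a sampled Gaussian kernel truncated at four standard deviations, reflecting image borders and a five-point finite-difference Laplacian. The helper names `smooth` and `laplacian` and the synthetic test image are hypothetical choices made for this example.

```python
import numpy as np


def smooth(f: np.ndarray, t: float) -> np.ndarray:
    """Gaussian smoothing at scale (variance) t with a sampled kernel truncated at 4 standard deviations."""
    r = max(1, int(np.ceil(4.0 * np.sqrt(t))))
    x = np.arange(-r, r + 1, dtype=float)
    k = np.exp(-x ** 2 / (2.0 * t))
    k /= k.sum()  # normalise so that constant images are preserved
    p = np.pad(f, r, mode="reflect")  # reflecting borders (an arbitrary choice)
    rows = np.apply_along_axis(np.convolve, 1, p, k, mode="valid")  # smooth along x
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")  # then along y


def laplacian(L: np.ndarray) -> np.ndarray:
    """Five-point finite-difference Laplacian Lxx + Lyy with replicated borders."""
    p = np.pad(L, 1, mode="edge")
    return p[1:-1, 2:] + p[1:-1, :-2] + p[2:, 1:-1] + p[:-2, 1:-1] - 4.0 * L


# Hypothetical test pattern: a bright Gaussian-shaped blob and its dark counterpart.
n = 65
c = n // 2
y, x = np.mgrid[0:n, 0:n]
bright = np.exp(-((x - c) ** 2 + (y - c) ** 2) / (2.0 * 9.0))
dark = -bright

t = 9.0  # one fixed smoothing scale
resp_bright = laplacian(smooth(bright, t))[c, c]
resp_dark = laplacian(smooth(dark, t))[c, c]
print(resp_bright < 0, resp_dark > 0)  # True True
```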

A straightforward way to obtain a multi-scale blob detector with automatic scale selection is to consider the scale-normalized Laplacian operator

$$\nabla^2_{\mathrm{norm}} L = t\,(L_{xx} + L_{yy})$$

and to detect scale-space maxima/minima, that is, points that are simultaneously local maxima/minima of $\nabla^2_{\mathrm{norm}} L$ with respect to both space and scale (Lindeberg 1994, 1998). Thus, given a discrete two-dimensional input image $f(x,y)$, a three-dimensional discrete scale-space volume $L(x,y,t)$ is computed. A point is regarded as a dark (bright) blob if the value of $\nabla^2_{\mathrm{norm}} L$ at this point is greater (smaller) than its value at all 26 neighbours in space and scale. Accordingly, simultaneous selection of interest points $(\hat{x},\hat{y})$ and scales $\hat{t}$ is performed according to

$$(\hat{x},\hat{y};\hat{t}) = \operatorname{argmaxminlocal}_{(x,y;t)}\left((\nabla^2_{\mathrm{norm}} L)(x,y;t)\right).$$
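A minimal sketch of this detector follows, assuming NumPy. The scale levels are sampled geometrically. The helpers `smooth`, `laplacian` and `detect_log_blobs` and the `threshold` parameter are hypothetical. The brute-force comparison with the 26 neighbours favours readability over speed. Positive extrema are reported as dark blobs and negative extrema as bright blobs, in line with the sign convention of the Laplacian.

```python
import numpy as np


def smooth(f: np.ndarray, t: float) -> np.ndarray:
    """Gaussian smoothing at scale (variance) t with a sampled kernel truncated at 4 standard deviations."""
    r = max(1, int(np.ceil(4.0 * np.sqrt(t))))
    x = np.arange(-r, r + 1, dtype=float)
    k = np.exp(-x ** 2 / (2.0 * t))
    k /= k.sum()
    p = np.pad(f, r, mode="reflect")
    rows = np.apply_along_axis(np.convolve, 1, p, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")


def laplacian(L: np.ndarray) -> np.ndarray:
    """Five-point finite-difference Laplacian Lxx + Lyy with replicated borders."""
    p = np.pad(L, 1, mode="edge")
    return p[1:-1, 2:] + p[1:-1, :-2] + p[2:, 1:-1] + p[:-2, 1:-1] - 4.0 * L


def detect_log_blobs(f, scales, threshold):
    """Scale-space extrema of t * (Lxx + Lyy), returned as (x, y, t, value) tuples.

    Positive extrema correspond to dark blobs, negative extrema to bright blobs.
    `threshold` is an illustrative extra parameter that discards weak responses."""
    volume = np.stack([t * laplacian(smooth(f, t)) for t in scales])  # axes (scale, y, x)
    n_t, h, w = volume.shape
    blobs = []
    for k in range(1, n_t - 1):
        for i in range(1, h - 1):
            for j in range(1, w - 1):
                v = volume[k, i, j]
                if abs(v) <= threshold:
                    continue
                # The 26 neighbours in space and scale (flat index 13 is the centre itself).
                cube = volume[k - 1:k + 2, i - 1:i + 2, j - 1:j + 2].ravel()
                neighbours = np.delete(cube, 13)
                if v > neighbours.max() or v < neighbours.min():
                    blobs.append((j, i, scales[k], float(v)))
    return blobs


n, t0 = 49, 8.0
c = n // 2
y, x = np.mgrid[0:n, 0:n]
f = np.exp(-((x - c) ** 2 + (y - c) ** 2) / (2.0 * t0))  # hypothetical bright blob of variance t0
scales = [2.0 * 2 ** (k / 2) for k in range(9)]  # 2, 2.83, 4, ..., 32
blobs = detect_log_blobs(f, scales, threshold=1e-3)
x_hat, y_hat, t_hat, _ = min(blobs, key=lambda b: b[3])  # strongest bright (negative) response
print(x_hat, y_hat, t_hat)
```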

Note that this notion of blob provides a concise and mathematically precise operational definition of the notion of "blob", which directly leads to an efficient and robust algorithm for blob detection. A basic property of blobs defined from scale-space maxima of the normalized Laplacian operator is that the responses are covariant with translations, rotations and rescalings in the image domain. Thus, if a scale-space maximum is assumed at a point $(x_0,y_0;t_0)$, then under a rescaling of the image by a scale factor $s$, there will be a scale-space maximum at $(s x_0, s y_0; s^2 t_0)$ in the rescaled image (Lindeberg 1998). This property, which is highly useful in practice, implies that, beyond the specific topic of Laplacian blob detection, local maxima/minima of the scale-normalized Laplacian are also used for scale selection in other contexts. Examples include corner detection, scale-adaptive feature tracking (Bretzner and Lindeberg 1998), the scale-invariant feature transform (Lowe 2004), and other image descriptors for image matching and object recognition.
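For a rotationally symmetric Gaussian blob, the selected scale can be worked out in closed form. Assume $f(x,y) = g(x,y;t_0)$ and use the semigroup property of Gaussian smoothing. The response at the blob centre is then maximal in magnitude at $\hat{t} = t_0$:

```latex
\begin{aligned}
f(x,y) &= g(x,y;t_0) \quad\Rightarrow\quad L(x,y;t) = g(x,y;t_0+t),\\
\nabla^2_{\mathrm{norm}} L(0,0;t) &= t\,\bigl(L_{xx}+L_{yy}\bigr)(0,0;t) = -\frac{t}{\pi\,(t_0+t)^2},\\
\frac{\partial}{\partial t}\left(\frac{t}{(t_0+t)^2}\right) &= \frac{t_0-t}{(t_0+t)^3} = 0 \quad\Rightarrow\quad \hat{t} = t_0 .
\end{aligned}
```

Rescaling the blob by a factor $s$ replaces $t_0$ by $s^2 t_0$, so the selected scale becomes $s^2 t_0$. This is consistent with the covariance property $(s x_0, s y_0; s^2 t_0)$.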

The scale selection properties of the Laplacian operator and other closely related scale-space interest point detectors are analyzed in detail in (Lindeberg 2013a).[1] In (Lindeberg 2013b, 2015),[2][3] it is shown that there exist other scale-space interest point detectors, such as the determinant of the Hessian operator, that perform better than the Laplacian operator or its difference-of-Gaussians approximation for image-based matching using local SIFT-like image descriptors.

The difference of Gaussians approach

Main article: Difference of Gaussians

From the fact that the scale-space representation $L(x,y,t)$ satisfies the diffusion equation

$$\partial_t L = \frac{1}{2}\nabla^2 L$$

it follows that the Laplacian of the Gaussian operator $\nabla^2 L(x,y,t)$ can also be computed as the limit case of the difference between two Gaussian-smoothed images (scale-space representations). Since $\nabla^2 L = 2\,\partial_t L$, the scale-normalized Laplacian is approximated by

$$\nabla^2_{\mathrm{norm}} L(x,y;t) \approx \frac{2t}{\Delta t}\left(L(x,y;t+\Delta t) - L(x,y;t)\right).$$

In the computer vision literature, this approach is referred to as the difference of Gaussians (DoG) approach. Apart from minor technicalities, however, this operator is in essence similar to the Laplacian and can be seen as an approximation of the Laplacian operator. In a similar fashion as for the Laplacian blob detector, blobs can be detected from scale-space extrema of differences of Gaussians. See (Lindeberg 2012, 2015)[3][4] for the explicit relation between the difference-of-Gaussians operator and the scale-normalized Laplacian operator. This approach is used, for instance, in the scale-invariant feature transform (SIFT) algorithm (Lowe 2004).
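The approximation can be checked numerically at the centre of a synthetic blob. This Python sketch assumes NumPy and defines hypothetical helpers `smooth` (sampled Gaussian smoothing) and `laplacian` (five-point stencil). It scales the difference of the two smoothed images by $2t/\Delta t$, which follows from the diffusion equation $\partial_t L = \frac{1}{2}\nabla^2 L$. The test image and the values of $t$ and $\Delta t$ are arbitrary.

```python
import numpy as np


def smooth(f: np.ndarray, t: float) -> np.ndarray:
    """Gaussian smoothing at scale (variance) t with a sampled kernel truncated at 4 standard deviations."""
    r = max(1, int(np.ceil(4.0 * np.sqrt(t))))
    x = np.arange(-r, r + 1, dtype=float)
    k = np.exp(-x ** 2 / (2.0 * t))
    k /= k.sum()
    p = np.pad(f, r, mode="reflect")
    rows = np.apply_along_axis(np.convolve, 1, p, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")


def laplacian(L: np.ndarray) -> np.ndarray:
    """Five-point finite-difference Laplacian Lxx + Lyy with replicated borders."""
    p = np.pad(L, 1, mode="edge")
    return p[1:-1, 2:] + p[1:-1, :-2] + p[2:, 1:-1] + p[:-2, 1:-1] - 4.0 * L


n, t0 = 65, 6.0
c = n // 2
y, x = np.mgrid[0:n, 0:n]
f = np.exp(-((x - c) ** 2 + (y - c) ** 2) / (2.0 * t0))  # hypothetical bright blob

t, dt = 8.0, 0.25
norm_lap = t * laplacian(smooth(f, t))
# Factor 2: for a Gaussian kernel of variance t, dL/dt = (1/2) * (Lxx + Lyy).
dog = (2.0 * t / dt) * (smooth(f, t + dt) - smooth(f, t))
rel_err = abs(dog[c, c] - norm_lap[c, c]) / abs(norm_lap[c, c])
print(f"normalized Laplacian {norm_lap[c, c]:.4f}, DoG {dog[c, c]:.4f}, relative difference {rel_err:.3f}")
```

Omitting the factor 2 would make the DoG response about half as large as the normalized Laplacian.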

The determinant of the Hessian

By considering the scale-normalized determinant of the Hessian, also referred to as the Monge–Ampère operator,

$$\det H_{\mathrm{norm}} L = t^2 \left(L_{xx} L_{yy} - L_{xy}^2\right)$$

where $HL$ denotes the Hessian matrix of the scale-space representation $L$, and then detecting scale-space maxima of this operator, one obtains another straightforward differential blob detector with automatic scale selection, which also responds to saddles (Lindeberg 1994, 1998)

$$(\hat{x},\hat{y};\hat{t}) = \operatorname{argmaxlocal}_{(x,y;t)}\left((\det H_{\mathrm{norm}} L)(x,y;t)\right).$$

The blob points $(\hat{x},\hat{y})$ and scales $\hat{t}$ are also defined from an operational differential geometric definition that leads to blob descriptors that are covariant with translations, rotations and rescalings in the image domain. In terms of scale selection, blobs defined from scale-space extrema of the determinant of the Hessian (DoH) also have slightly better scale selection properties under non-Euclidean affine transformations than the more commonly used Laplacian operator (Lindeberg 1994, 1998, 2015).[3] In simplified form, the scale-normalized determinant of the Hessian computed from Haar wavelets is used as the basic interest point operator in the SURF descriptor (Bay et al. 2006) for image matching and object recognition.
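A sketch of the determinant-of-the-Hessian detector follows, assuming NumPy and central finite differences for $L_{xx}$, $L_{yy}$ and $L_{xy}$. The function names, the scale sampling and the `threshold` parameter are hypothetical. The operator is quadratic in the image intensities, so the example uses a dark blob to show that it responds to dark and bright blobs alike.

```python
import numpy as np


def smooth(f: np.ndarray, t: float) -> np.ndarray:
    """Gaussian smoothing at scale (variance) t with a sampled kernel truncated at 4 standard deviations."""
    r = max(1, int(np.ceil(4.0 * np.sqrt(t))))
    x = np.arange(-r, r + 1, dtype=float)
    k = np.exp(-x ** 2 / (2.0 * t))
    k /= k.sum()
    p = np.pad(f, r, mode="reflect")
    rows = np.apply_along_axis(np.convolve, 1, p, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")


def hessian(L: np.ndarray):
    """Central-difference second derivatives (Lxx, Lyy, Lxy); rows are y, columns are x."""
    p = np.pad(L, 1, mode="edge")
    lxx = p[1:-1, 2:] - 2.0 * L + p[1:-1, :-2]
    lyy = p[2:, 1:-1] - 2.0 * L + p[:-2, 1:-1]
    lxy = (p[2:, 2:] - p[2:, :-2] - p[:-2, 2:] + p[:-2, :-2]) / 4.0
    return lxx, lyy, lxy


def detect_doh_blobs(f, scales, threshold):
    """Scale-space maxima of t^2 (Lxx Lyy - Lxy^2) over the 26-neighbourhood, as (x, y, t, value)."""
    layers = []
    for t in scales:
        lxx, lyy, lxy = hessian(smooth(f, t))
        layers.append(t ** 2 * (lxx * lyy - lxy ** 2))
    vol = np.stack(layers)  # axes (scale, y, x)
    n_t, h, w = vol.shape
    blobs = []
    for k in range(1, n_t - 1):
        for i in range(1, h - 1):
            for j in range(1, w - 1):
                v = vol[k, i, j]
                if v <= threshold:
                    continue
                neighbours = np.delete(vol[k - 1:k + 2, i - 1:i + 2, j - 1:j + 2].ravel(), 13)
                if v > neighbours.max():
                    blobs.append((j, i, scales[k], float(v)))
    return blobs


n, t0 = 49, 8.0
c = n // 2
y, x = np.mgrid[0:n, 0:n]
f = -np.exp(-((x - c) ** 2 + (y - c) ** 2) / (2.0 * t0))  # hypothetical dark blob of variance t0
scales = [2.0 * 2 ** (k / 2) for k in range(9)]
blobs = detect_doh_blobs(f, scales, threshold=1e-3)
x_hat, y_hat, t_hat, _ = max(blobs, key=lambda b: b[3])
print(x_hat, y_hat, t_hat)
```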

A detailed analysis of the scale selection properties of the determinant of the Hessian operator and other closely related scale-space interest point detectors is given in (Lindeberg 2013a).[1] This analysis shows that the determinant of the Hessian operator has better scale selection properties under affine image transformations than the Laplacian operator. In (Lindeberg 2013b, 2015),[2][3] it is shown that the determinant of the Hessian operator performs significantly better than the Laplacian operator or its difference-of-Gaussians approximation, as well as better than the Harris or Harris-Laplace operators, for image-based matching using local SIFT-like or SURF-like image descriptors, leading to higher efficiency values and lower 1-precision scores.

The hybrid Laplacian and determinant of the Hessian operator (Hessian-Laplace)

A hybrid operator between the Laplacian and the determinant of the Hessian blob detectors has also been proposed, where spatial selection is done by the determinant of the Hessian and scale selection is performed with the scale-normalized Laplacian (Mikolajczyk and Schmid 2004):

$$(\hat{x},\hat{y}) = \operatorname{argmaxlocal}_{(x,y)}\left((\det HL)(x,y;t)\right)$$
$$\hat{t} = \operatorname{argmaxminlocal}_{t}\left((\nabla^2_{\mathrm{norm}} L)(\hat{x},\hat{y};t)\right)$$

This operator has been used for image matching, object recognition as well as texture analysis.
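The two-stage selection can be sketched as follows, assuming NumPy. This is a simplified, non-iterative reading of the two formulas. Spatial maxima of $\det HL$ are found separately at each scale level. A point is kept only if $\nabla^2_{\mathrm{norm}} L$ at that position is an extremum with respect to the neighbouring scale levels. The names and the `threshold` value are hypothetical, and the published method (Mikolajczyk and Schmid 2004) may differ in details.

```python
import numpy as np


def smooth(f: np.ndarray, t: float) -> np.ndarray:
    """Gaussian smoothing at scale (variance) t with a sampled kernel truncated at 4 standard deviations."""
    r = max(1, int(np.ceil(4.0 * np.sqrt(t))))
    x = np.arange(-r, r + 1, dtype=float)
    k = np.exp(-x ** 2 / (2.0 * t))
    k /= k.sum()
    p = np.pad(f, r, mode="reflect")
    rows = np.apply_along_axis(np.convolve, 1, p, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")


def laplacian(L: np.ndarray) -> np.ndarray:
    """Five-point finite-difference Laplacian Lxx + Lyy with replicated borders."""
    p = np.pad(L, 1, mode="edge")
    return p[1:-1, 2:] + p[1:-1, :-2] + p[2:, 1:-1] + p[:-2, 1:-1] - 4.0 * L


def hessian(L: np.ndarray):
    """Central-difference second derivatives (Lxx, Lyy, Lxy); rows are y, columns are x."""
    p = np.pad(L, 1, mode="edge")
    lxx = p[1:-1, 2:] - 2.0 * L + p[1:-1, :-2]
    lyy = p[2:, 1:-1] - 2.0 * L + p[:-2, 1:-1]
    lxy = (p[2:, 2:] - p[2:, :-2] - p[:-2, 2:] + p[:-2, :-2]) / 4.0
    return lxx, lyy, lxy


def hessian_laplace(f, scales, threshold):
    """Simplified Hessian-Laplace: spatial maxima of det(HL) per scale, kept only if
    t * (Lxx + Lyy) is an extremum over the neighbouring scales at the same position."""
    smoothed = [smooth(f, t) for t in scales]
    norm_lap = [t * laplacian(L) for t, L in zip(scales, smoothed)]
    points = []
    h, w = f.shape
    for k in range(1, len(scales) - 1):
        lxx, lyy, lxy = hessian(smoothed[k])
        det = lxx * lyy - lxy ** 2
        for i in range(1, h - 1):
            for j in range(1, w - 1):
                v = det[i, j]
                if v <= threshold:
                    continue
                # Spatial selection: strict maximum over the 8 spatial neighbours.
                if v <= np.delete(det[i - 1:i + 2, j - 1:j + 2].ravel(), 4).max():
                    continue
                # Scale selection: extremum of the normalized Laplacian over t at this position.
                prev, cur, nxt = norm_lap[k - 1][i, j], norm_lap[k][i, j], norm_lap[k + 1][i, j]
                if (cur > prev and cur > nxt) or (cur < prev and cur < nxt):
                    points.append((j, i, scales[k]))
    return points


n, t0 = 49, 8.0
c = n // 2
y, x = np.mgrid[0:n, 0:n]
f = np.exp(-((x - c) ** 2 + (y - c) ** 2) / (2.0 * t0))  # hypothetical bright blob
scales = [2.0 * 2 ** (k / 2) for k in range(9)]
points = hessian_laplace(f, scales, threshold=1e-8)
print((c, c, 8.0) in points)
```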

Affine-adapted differential blob detectors

The blob descriptors obtained from these blob detectors with automatic scale selection are invariant to translations, rotations and uniform rescalings in the spatial domain. The images that constitute the input to a computer vision system are, however, also subject to perspective distortions. To obtain blob descriptors that are more robust to perspective transformations, a natural approach is to devise a blob detector that is invariant to affine transformations. In practice, affine invariant interest points can be obtained by applying affine shape adaptation to a blob descriptor. Here, the shape of the smoothing kernel is iteratively warped to match the local image structure around the blob, or equivalently a local image patch is iteratively warped while the shape of the smoothing kernel remains rotationally symmetric (Lindeberg and Gårding 1997; Baumberg 2000; Mikolajczyk and Schmid 2004; Lindeberg 2008). In this way, affine-adapted versions of the Laplacian/difference-of-Gaussians operator, the determinant of the Hessian and the Hessian-Laplace operator can be defined (see also Harris-Affine and Hessian-Affine).

Spatio-temporal blob detectors

The determinant of the Hessian operator has been extended to joint space-time by Willems et al.[5] and Lindeberg,[6] leading to the following scale-normalized differential expression:

$$\det(H_{(x,y,t),\mathrm{norm}} L) = s^{2\gamma_s}\,\tau^{\gamma_\tau}\left(L_{xx}L_{yy}L_{tt} + 2L_{xy}L_{xt}L_{yt} - L_{xx}L_{yt}^2 - L_{yy}L_{xt}^2 - L_{tt}L_{xy}^2\right).$$

Here, $s$ denotes the spatial scale parameter and $\tau$ the temporal scale parameter. In the work by Willems et al.,[5] a simpler expression corresponding to $\gamma_s = 1$ and $\gamma_\tau = 1$ was used. In Lindeberg,[6] it was shown that $\gamma_s = 5/4$ and $\gamma_\tau = 5/4$ imply better scale selection properties. Specifically, the selected scale levels obtained from a spatio-temporal Gaussian blob with spatial extent $s = s_0$ and temporal extent $\tau = \tau_0$ will perfectly match the spatial extent and the temporal duration of the blob, with scale selection performed by detecting spatio-temporal scale-space extrema of the differential expression.
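The expanded expression is the determinant of the symmetric $3 \times 3$ Hessian matrix with respect to $(x,y,t)$, multiplied by the normalization factor. The following NumPy sketch assumes that the derivatives of the spatio-temporal scale-space representation are already available; how they are computed is left open. The function name and the derivative values in the check are hypothetical.

```python
import numpy as np


def st_det_hessian_norm(lxx, lyy, ltt, lxy, lxt, lyt, s, tau, gamma_s=1.25, gamma_tau=1.25):
    """Scale-normalized spatio-temporal determinant of the Hessian.

    Derivatives may be scalars or NumPy arrays; s is the spatial and tau the temporal
    scale parameter. The defaults are gamma_s = gamma_tau = 5/4; passing 1 for both
    gives the simpler normalization."""
    expansion = (lxx * lyy * ltt + 2.0 * lxy * lxt * lyt
                 - lxx * lyt ** 2 - lyy * lxt ** 2 - ltt * lxy ** 2)
    return s ** (2.0 * gamma_s) * tau ** gamma_tau * expansion


# Hypothetical derivative values at one point of a video volume.
d = dict(lxx=-2.0, lyy=-1.5, ltt=-0.5, lxy=0.3, lxt=0.1, lyt=-0.2)
H = np.array([[d["lxx"], d["lxy"], d["lxt"]],
              [d["lxy"], d["lyy"], d["lyt"]],
              [d["lxt"], d["lyt"], d["ltt"]]])
# With s = tau = 1 the normalization factor is 1, so the expansion equals det(H).
print(np.isclose(st_det_hessian_norm(**d, s=1.0, tau=1.0), np.linalg.det(H)))  # True
```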

The Laplacian operator has been extended to spatio-temporal video data by Lindeberg,[6] leading to the following two spatio-temporal operators, which also constitute models of receptive fields of non-lagged vs. lagged neurons in the lateral geniculate nucleus (LGN):

$$\partial_{t,\mathrm{norm}}\left(\nabla^2_{(x,y),\mathrm{norm}} L\right) = s^{\gamma_s}\,\tau^{\gamma_\tau/2}\,(L_{xxt} + L_{yyt}),$$
$$\partial_{tt,\mathrm{norm}}\left(\nabla^2_{(x,y),\mathrm{norm}} L\right) = s^{\gamma_s}\,\tau^{\gamma_\tau}\,(L_{xxtt} + L_{yytt}).$$

For the first operator, scale selection properties call for using $\gamma_s = 1$ and $\gamma_\tau = 1/2$ if we want this operator to assume its maximum value over spatio-temporal scales at a spatio-temporal scale level reflecting the spatial extent and the temporal duration of an onset Gaussian blob. For the second operator, scale selection properties call for using $\gamma_s = 1$ and $\gamma_\tau = 3/4$ if we want this operator to assume its maximum value over spatio-temporal scales at a spatio-temporal scale level reflecting the spatial extent and the temporal duration of a blinking Gaussian blob.
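A sketch of the two operators follows. It assumes NumPy, a video volume `L` that has already been smoothed spatially and temporally (the smoothing kernels are not specified here), unit grid spacing, and derivatives approximated by `np.gradient`. The default exponents follow the values stated for onset and blinking blobs. The function name and the polynomial test volume are hypothetical.

```python
import numpy as np


def st_laplacian_operators(L, s, tau, gamma_s=1.0, gamma_tau_1=0.5, gamma_tau_2=0.75):
    """Scale-normalized spatio-temporal Laplacian operators on a smoothed volume with axes (t, y, x).

    Returns (first, second) with
        first  = s^gamma_s * tau^(gamma_tau_1 / 2) * (Lxxt + Lyyt)
        second = s^gamma_s * tau^gamma_tau_2       * (Lxxtt + Lyytt)
    Derivatives are central differences (one-sided at the borders) via np.gradient."""
    def spatial_laplacian(v):
        vxx = np.gradient(np.gradient(v, axis=2), axis=2)
        vyy = np.gradient(np.gradient(v, axis=1), axis=1)
        return vxx + vyy

    lt = np.gradient(L, axis=0)
    ltt = np.gradient(lt, axis=0)
    first = s ** gamma_s * tau ** (gamma_tau_1 / 2.0) * spatial_laplacian(lt)
    second = s ** gamma_s * tau ** gamma_tau_2 * spatial_laplacian(ltt)
    return first, second


# Hypothetical polynomial "video" L = (x^2 + y^2) t, for which Lxxt + Lyyt = 4 and Lxxtt + Lyytt = 0.
t, y, x = np.mgrid[0:9, 0:9, 0:9].astype(float)
first, second = st_laplacian_operators((x ** 2 + y ** 2) * t, s=1.0, tau=1.0)
print(first[4, 4, 4], second[4, 4, 4])  # 4.0 0.0
```

Only interior points at least two samples from the border are exact here, because `np.gradient` falls back to one-sided differences at the edges.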

Grey-level blobs, grey-level blob trees and scale-space blobs

A natural approach to detect blobs is to associate a bright (dark) blob with each local maximum (minimum) in the intensity landscape. A main problem with such an approach, however, is that local extrema are very sensitive to noise. To address this problem, Lindeberg (1993, 1994) studied the problem of detecting local maxima with extent at multiple scales in scale space. A region with spatial extent defined from a watershed analogy was associated with each local maximum, as well as a local contrast defined from a so-called delimiting saddle point. A local extremum with extent defined in this way was referred to as a grey-level blob. Moreover, by proceeding with the watershed analogy beyond the delimiting saddle point, a grey-level blob tree was defined to capture the nested topological structure of level sets in the intensity landscape. This structure is invariant to affine deformations in the image domain and to monotone intensity transformations. By studying how these structures evolve with increasing scale, the notion of scale-space blobs was introduced. Beyond local contrast and extent, these scale-space blobs also measured how stable image structures are in scale space, by measuring their scale-space lifetime.

It was proposed that regions of interest and scale descriptors obtained in this way could be used for guiding other early visual processing. The associated scale levels were defined from the scales at which normalized measures of blob strength assumed their maxima over scales. An early prototype of simplified vision systems was developed in which such regions of interest and scale descriptors were used for directing the focus of attention of an active vision system. The specific technique used in these prototypes can be substantially improved with current knowledge in computer vision. The overall approach is nevertheless still valid, for example in the way that local extrema over scales of the scale-normalized Laplacian operator are nowadays used for providing scale information to other visual processes.

Lindeberg's watershed-based grey-level blob detection algorithm

For the purpose of detecting grey-level blobs (local extrema with extent) from a watershed analogy, Lindeberg developed an algorithm based on pre-sorting the pixels (alternatively, connected regions having the same intensity) in decreasing order of the intensity values. Then, comparisons were made between nearest neighbours of either pixels or connected regions.

For simplicity, consider the case of detecting bright grey-level blobs, and let the notation "higher neighbour" stand for "neighbour pixel having a higher grey-level value". Then, at any stage of the algorithm (carried out in decreasing order of intensity values), the current region is classified according to the following rules:

  1. If a region has no higher neighbour, then it is a local maximum and will be the seed of a blob. Set a flag which allows the blob to grow.
  2. Else, if it has at least one higher neighbour, which is background, then it cannot be part of any blob and must be background.
  3. Else, if it has more than one higher neighbour and if those higher neighbours are parts of different blobs, then it cannot be a part of any blob, and must be background. If any of the higher neighbours are still allowed to grow, clear their flag which allows them to grow.
  4. Else, it has one or more higher neighbours, which are all parts of the same blob. If that blob is still allowed to grow then the current region should be included as a part of that blob. Otherwise the region should be set to background.
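The four rules can be sketched in pure Python. This sketch assumes that all grey values are distinct, so that a neighbour already visited in the sorted order is exactly a higher neighbour. It uses 4-connectivity and processes single pixels rather than connected regions of equal intensity. The function name and the label convention (`BACKGROUND = -1`) are hypothetical.

```python
from typing import List, Optional

BACKGROUND = -1  # hypothetical label for background pixels


def grey_level_blobs(image: List[List[float]]) -> List[List[Optional[int]]]:
    """Label bright grey-level blobs; each pixel gets a blob index (0, 1, ...) or BACKGROUND."""
    h, w = len(image), len(image[0])
    label: List[List[Optional[int]]] = [[None] * w for _ in range(h)]
    can_grow: List[bool] = []  # one "allowed to grow" flag per blob
    # Visit pixels in decreasing order of intensity.
    order = sorted(((image[i][j], i, j) for i in range(h) for j in range(w)), reverse=True)
    for _, i, j in order:
        # Already visited 4-neighbours are exactly the higher neighbours (distinct values assumed).
        higher = [label[a][b] for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                  if 0 <= a < h and 0 <= b < w and label[a][b] is not None]
        if not higher:                  # Rule 1: a local maximum seeds a new blob.
            label[i][j] = len(can_grow)
            can_grow.append(True)
        elif BACKGROUND in higher:      # Rule 2: a higher neighbour is background.
            label[i][j] = BACKGROUND
        elif len(set(higher)) > 1:      # Rule 3: higher neighbours belong to different blobs.
            label[i][j] = BACKGROUND
            for b in set(higher):
                can_grow[b] = False
        else:                           # Rule 4: all higher neighbours lie in one blob.
            b = higher[0]
            label[i][j] = b if can_grow[b] else BACKGROUND
    return label


print(grey_level_blobs([[3, 9, 7, 2, 8, 6, 1]]))  # [[0, 0, 0, -1, 1, 1, -1]]
```

In the one-row example, the pixel with value 2 is the delimiting saddle point between the maxima 9 and 8. It is classified as background and stops both blobs from growing, so the pixel with value 1 also ends up as background.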

Compared to other watershed methods, the flooding in this algorithm stops once the intensity level falls below the intensity value of the so-called delimiting saddle point associated with the local maximum. However, it is rather straightforward to extend this approach to other types of watershed constructions. For example, by proceeding beyond the first delimiting saddle point, a "grey-level blob tree" can be constructed. Moreover, the grey-level blob detection method was embedded in a scale-space representation and performed at all levels of scale, resulting in a representation called the scale-space primal sketch.

This algorithm and its applications in computer vision are described in more detail in Lindeberg's thesis[7] as well as in the monograph on scale-space theory[8] partially based on that work. Earlier presentations of this algorithm can also be found in [9][10]. More detailed treatments of applications of grey-level blob detection and the scale-space primal sketch to computer vision and medical image analysis are given in [11][12][13].

Maximally stable extremal regions (MSER)

Main article: Maximally stable extremal regions

Matas et al. (2002) were interested in defining image descriptors that are robust under perspective transformations. They studied level sets in the intensity landscape and measured how stable these were along the intensity dimension. Based on this idea, they defined a notion of maximally stable extremal regions and showed how these image descriptors can be used as image features for stereo matching.

There are close relations between this notion and the above-mentioned notion of grey-level blob tree. The maximally stable extremal regions can be seen as making a specific subset of the grey-level blob tree explicit for further processing.
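The stability idea can be illustrated with a deliberately simple Python sketch. Following a common formulation of the stability measure, the sketch scores a bright region $Q(l) = \{I \ge l\}$ by its relative area change between thresholds $l - \Delta$ and $l + \Delta$. It makes several simplifications. It follows only the chain of nested regions that contain a single seed pixel. It recomputes each region by flood fill. It returns the globally most stable level rather than all local minima. The function names, the seed-based simplification and the test image are assumptions made for this example; practical implementations typically enumerate all extremal regions efficiently with union-find.

```python
from collections import deque
from typing import List, Optional, Sequence, Tuple


def component_area(image: List[List[int]], seed: Tuple[int, int], level: float) -> int:
    """Area of the 4-connected component of {image >= level} containing seed (0 if seed is below level)."""
    h, w = len(image), len(image[0])
    if image[seed[0]][seed[1]] < level:
        return 0
    seen = {seed}
    queue = deque([seed])
    while queue:
        i, j = queue.popleft()
        for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if 0 <= a < h and 0 <= b < w and (a, b) not in seen and image[a][b] >= level:
                seen.add((a, b))
                queue.append((a, b))
    return len(seen)


def most_stable_level(image, seed, levels: Sequence[float], delta: float) -> Optional[Tuple[float, int, float]]:
    """Return (level, area, q) minimising q(l) = (A(l - delta) - A(l + delta)) / A(l) along the seed's chain."""
    best = None
    for level in levels:
        area = component_area(image, seed, level)
        if area == 0:
            continue
        q = (component_area(image, seed, level - delta) - component_area(image, seed, level + delta)) / area
        if best is None or q < best[2]:
            best = (level, area, q)
    return best


# Hypothetical test image: a 5x5 bright square (100) with a brighter centre pixel (120) on a background of 10.
image = [[10] * 9 for _ in range(9)]
for i in range(2, 7):
    for j in range(2, 7):
        image[i][j] = 100
image[4][4] = 120
print(most_stable_level(image, seed=(4, 4), levels=range(5, 126, 5), delta=10))  # (25, 25, 0.0)
```

The square keeps the same area over a wide range of thresholds, so it is selected as the maximally stable region along this chain.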

References

  1. Lindeberg, Tony (June 2013). "Scale Selection Properties of Generalized Scale-Space Interest Point Detectors". Journal of Mathematical Imaging and Vision. 46 (2): 177–210. Bibcode:2013JMIV...46..177L. doi:10.1007/s10851-012-0378-3. ISSN 0924-9907.
  2. Lindeberg, Tony (2013). "Image Matching Using Generalized Scale-Space Interest Points". In Kuijper, Arjan; Bredies, Kristian; Pock, Thomas; Bischof, Horst (eds.). Scale Space and Variational Methods in Computer Vision. Lecture Notes in Computer Science. Vol. 7893. Berlin, Heidelberg: Springer. pp. 355–367. doi:10.1007/978-3-642-38267-3_30. ISBN 978-3-642-38266-6. Retrieved 2025-07-09.
  3. Lindeberg, Tony (May 2015). "Image Matching Using Generalized Scale-Space Interest Points". Journal of Mathematical Imaging and Vision. 52 (1): 3–36. Bibcode:2015JMIV...52....3L. doi:10.1007/s10851-014-0541-0. ISSN 0924-9907.
  4. Lindeberg, Tony (2012). "Scale invariant feature transform". Scholarpedia. 7 (5): 10491.
  5. Willems, Geert; Tuytelaars, Tinne; van Gool, Luc (2008). "An efficient dense and scale-invariant spatio-temporal interest point detector". European Conference on Computer Vision. Lecture Notes in Computer Science. Vol. 5303. Springer. pp. 650–663. doi:10.1007/978-3-540-88688-4_48.
  6. Lindeberg, Tony (2018). "Spatio-temporal scale selection in video data". Journal of Mathematical Imaging and Vision. 60 (4): 525–562. Bibcode:2018JMIV...60..525L. doi:10.1007/s10851-017-0766-9. S2CID 4430109.
  7. Lindeberg, Tony (May 1991). Discrete Scale-Space Theory and the Scale-Space Primal Sketch (PhD thesis). KTH Royal Institute of Technology. Retrieved 2025-07-10.
  8. Lindeberg, Tony (1994). Scale-Space Theory in Computer Vision. Kluwer International Series in Engineering and Computer Science. Boston: Kluwer Academic. ISBN 978-0-7923-9418-1.
  9. Lindeberg, T.; Eklundh, J.-O. (1990). "Scale detection and region extraction from a scale-space primal sketch". Third International Conference on Computer Vision. Osaka: IEEE Comput. Soc. Press. pp. 416–426. doi:10.1109/ICCV.1990.139563. ISBN 978-0-8186-2057-7.
  10. Lindeberg, T.; Eklundh, J.-O. (March 1991). "On the computation of a scale-space primal sketch". Journal of Visual Communication and Image Representation. 2: 55–78.
  11. Lindeberg, Tony (December 1993). "Detecting salient blob-like image structures and their scales with a scale-space primal sketch: A method for focus-of-attention". International Journal of Computer Vision. 11 (3): 283–318. doi:10.1007/BF01469346. ISSN 0920-5691.
  12. Lindeberg, Tony; Lidberg, Pär; Roland, Per E. (1999). "Analysis of brain activation patterns using a 3-D scale-space primal sketch". Human Brain Mapping. 7 (3): 166–194. doi:10.1002/(SICI)1097-0193(1999)7:3<166::AID-HBM3>3.0.CO;2-I. ISSN 1065-9471. PMC 6873316. PMID 10194618.
  13. Mangin, J.-F.; Rivière, D.; Coulon, O.; Poupon, C.; Cachia, A.; Cointepas, Y.; Poline, J.-B.; Le Bihan, D.; Régis, J.; Papadopoulos-Orfanos, D. (February 2004). "Coordinate-based versus structural approaches to brain image analysis". Artificial Intelligence in Medicine. 30 (2): 177–197. doi:10.1016/S0933-3657(03)00064-2. PMID 14992763.
