Movatterモバイル変換

Vantage-point tree

From Wikipedia, the free encyclopedia

(Redirected fromMVP tree)

Avantage-point tree (orVP tree) is ametric tree that segregates data in ametric space by choosing a position in the space (the "vantage point") and partitioning the data points into two parts: those points that are nearer to the vantage point than a threshold, and those points that are not. By recursively applying this procedure to partition the data into smaller and smaller sets, atree data structure is created where neighbors in the tree are likely to be neighbors in the space.^[1]

One generalization is called amulti-vantage-point tree (orMVP tree): adata structure for indexing objects from largemetric spaces forsimilarity search queries. It uses more than one point to partition each level.^[2]^[3]

History

[edit]

Peter Yianilos claimed that the vantage-point tree was discovered independently by him (Peter Yianilos) and byJeffrey Uhlmann.^[1]Yet,Uhlmann published this method before Yianilos in 1991.^[4]Uhlmann called the data structure ametric tree, the name VP-tree was proposed by Yianilos. Vantage-point trees have been generalized to non-metric spaces usingBregman divergence by Nielsen et al.^[5]

This iterative partitioning process is similar to that of ak-d tree, but uses circular (or spherical, hyperspherical, etc.) rather than rectilinear partitions. In two-dimensional Euclidean space, this can be visualized as a series of circles segregating the data.

The vantage-point tree is particularly useful in dividing data in a non-standard metric space into a metric tree.

Understanding a vantage-point tree

[edit]

The way a vantage-point tree stores data can be represented by a circle.^[6] First, understand that eachnode of thistree contains an input point and a radius. All the left children of a givennode are the points inside the circle and all the right children of a givennode are outside of the circle. Thetree itself does not need to know any other information about what is being stored. All it needs is the distance function that satisfies the properties of themetric space.^[6]

Searching through a vantage-point tree

[edit]

A vantage-point tree can be used to find the nearest neighbor of a pointx. The search algorithm is recursive. At any given step we are working with a node of the tree that has a vantage pointv and a threshold distancet. The point of interestx will be some distance from the vantage pointv. If that distanced is less thant then use the algorithm recursively to search the subtree of the node that contains the points closer to the vantage point than the thresholdt; otherwise recurse to the subtree of the node that contains the points that are farther than the vantage point than the thresholdt. If the recursive use of the algorithm finds a neighboring pointn with distance tox that is less than|t −d| then it cannot help to search the other subtree of this node; the discovered noden is returned. Otherwise, the other subtree also needs to be searched recursively.

A similar approach works for finding thek nearest neighbors of a pointx. In the recursion, the other subtree is searched fork −k′ nearest neighbors of the pointx whenever onlyk′ (<k) of the nearest neighbors found so far have distance that is less than|t −d|.

Advantages of a vantage-point tree

[edit]

Instead of inferring multidimensional points for domain before the index being built, we build the index directly based on the distance.^[6] Doing this, avoids pre-processing steps.
Updating a vantage-point tree is relatively easy compared to the FastMap approach. For FastMap, after inserting or deleting data, there will come a time when FastMap will have to rescan itself. That takes up too much time and it is unclear to know when the rescanning will start.^[6]
Distance based methods are flexible. It is “able to index objects that are represented as feature vectors of a fixed number of dimensions."^[6]

Complexity

[edit]

The time cost to build a vantage-point tree is approximatelyO(n logn). For each element, the tree is descended bylogn levels to find its placement. However there is a constant factork wherek is the number of vantage points per tree node.^[3]

The time cost to search a vantage-point tree to find a single nearest neighbor isO(logn). There arelogn levels, each involvingk distance calculations, wherek is the number of vantage points (elements) at that position in the tree.

The time cost to search a vantage-point tree for a range, which may be the most important attribute, can vary greatly depending on the specifics of the algorithm used and parameters. Brin's paper^[3] gives the result of experiments with several vantage point algorithms with various parameters to investigate the cost, measured in number of distance calculations.

The space cost for a vantage-point tree is approximatelyn. Each element is stored, and each tree element in each non-leaf node requires a pointer to its descendant nodes. (See Brin for details on one implementation choice. The parameter for number of elements at each node plays a factor.)

Withn points there areO(n²) pairwise distances between points. However, the creation of a vantage-point tree requires that onlyO(n logn) distances be calculated explicitly, and a search requires onlyO(logn) distance calculations. For example, ifx andy are points and it is known that the distanced(x,y) is small then any pointz that is far fromx will also necessarily be almost as far fromy because the metric space'striangle inequality givesd(y,z) ≥d(x,z) −d(x,y).

References

[edit]

^^a ^bYianilos (1993).Data structures and algorithms for nearest neighbor search in general metric spaces. Fourth annual ACM-SIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics Philadelphia, PA, USA. pp. 311–321.
^Bozkaya, Tolga; Ozsoyoglu, Meral (September 1999)."Indexing Large Metric Spaces for Similarity Search Queries".ACM Trans. Database Syst.24 (3):361–404.doi:10.1145/328939.328959.ISSN 0362-5915.S2CID 6486308.
^^a ^b ^cBrin, Sergey (Sep 1995)."Near Neighbor Search in Large Metric Spaces".VLDB '95 Proceedings of the 21th [sic] International Conference on Very Large Data Bases. Zurich, Switzerland: Morgan Kaufmann Publishers Inc.:574–584.ISBN 9781558603790.
^Uhlmann, Jeffrey (1991). "Satisfying General Proximity/Similarity Queries with Metric Trees".Information Processing Letters.40 (4):175–179.doi:10.1016/0020-0190(91)90074-r.
^Nielsen, Frank (2009). "Bregman vantage point trees for efficient nearest Neighbor Queries".Proceedings of Multimedia and Exp (ICME). IEEE. pp. 878–881.
^^a ^b ^c ^d ^eFu, Ada Wai-chee; Polly Mei-shuen Chan; Yin-Ling Cheung; Yiu Sang Moon (2000)."Dynamic vp-tree indexing forn-nearest neighbor search given pair-wise distances".The VLDB Journal — The International Journal on Very Large Data Bases. Springer-Verlag New York, Inc. Secaucus, NJ, USA. pp. 154–173. vp. Retrieved2012-10-02.

External links

[edit]

Understanding VP Trees

v t e Tree data structures
Search trees (dynamic sets/associative arrays)	2–3 2–3–4 AA (a,b) AVL B B+ B* B^x (Optimal) Binary search Dancing HTree Interval Order statistic Palindrome (Left-leaning) Red–black Scapegoat Splay T Treap UB Weight-balanced
Heaps	Binary Binomial Brodal d-ary Fibonacci Leftist Pairing Skew binomial Skew van Emde Boas Weak
Tries	Ctrie C-trie (compressed ADT) Hash Radix Suffix Ternary search X-fast Y-fast
Spatial data partitioning trees	Ball BK BSP Cartesian Hilbert R k-d (implicitk-d) M Metric MVP Octree PH Priority R Quad R R+ R* Segment VP X
Other trees	Cover Exponential Fenwick Finger Fractal tree index Fusion Hash calendar iDistance K-ary Left-child right-sibling Link/cut Log-structured merge Merkle PQ Range SPQR Top

Retrieved from "https://en.wikipedia.org/w/index.php?title=Vantage-point_tree&oldid=1250107370"

Categories:

[8]ページ先頭