Movatterモバイル変換


[0]ホーム

URL:


Jump to content
WikipediaThe Free Encyclopedia
Search

M-tree

From Wikipedia, the free encyclopedia
Tree data structure
This articlemay be too technical for most readers to understand. Pleasehelp improve it tomake it understandable to non-experts, without removing the technical details.(October 2021) (Learn how and when to remove this message)

Incomputer science,M-trees aretree data structures that are similar toR-trees andB-trees. It is constructed using ametric and relies on thetriangle inequality for efficient range andk-nearest neighbor (k-NN) queries.While M-trees can perform well in many conditions, the tree can also have large overlap and there is no clear strategy on how to best avoid overlap. In addition, it can only be used fordistance functions that satisfy the triangle inequality, while many advanced dissimilarity functions used ininformation retrieval do not satisfy this.[1]

Overview

[edit]
2D M-Tree visualized usingELKI. Every blue sphere (leaf) is contained in a red sphere (directory nodes). Leaves overlap, but not too much; directory nodes overlap much more here.

As in any tree-based data structure, the M-tree is composed of nodes and leaves. In each node there is a data object that identifies it uniquely and a pointer to a sub-tree where its children reside. Every leaf has several data objects. For each node there is a radiusr{\displaystyle r} that defines a Ball in the desired metric space. Thus, every noden{\displaystyle n} and leafl{\displaystyle l} residing in a particular nodeN{\displaystyle N} is at most distancer{\displaystyle r} fromN{\displaystyle N}, and every noden{\displaystyle n} and leafl{\displaystyle l} with node parentN{\displaystyle N} keep the distance from it.

M-tree construction

[edit]

Components

[edit]

An M-tree has these components and sub-components:

  1. Non-leaf nodes
    1. A set of routing objects NRO.
    2. Pointer to Node's parent object Op.
  2. Leaf nodes
    1. A set of objects NO.
    2. Pointer to Node's parent object Op.
  3. Routing Object
    1. (Feature value of) routing object Or.
    2. Covering radius r(Or).
    3. Pointer to covering tree T(Or).
    4. Distance of Or from its parent object d(Or,P(Or))
  4. Object
    1. (Feature value of the) object Oj.
    2. Object identifier oid(Oj).
    3. Distance of Oj from its parent object d(Oj,P(Oj))

Insert

[edit]

The main idea is first to find a leaf nodeN where the new objectO belongs. IfN is not full then just attach it toN. IfN is full then invoke a method to splitN. The algorithm is as follows:

Algorithm Insert  Input: NodeN of M-TreeMT,EntryOn{\displaystyle O_{n}}  Output: A new instance ofMT containing all entries in originalMT plusOn{\displaystyle O_{n}}
NeN{\displaystyle N_{e}\gets N}'s routing objects or objectsifN is not a leafthen  {       /* Look for entries that the new object fits into */letNin{\displaystyle N_{in}} be routing objects fromNe{\displaystyle N_{e}}'s set of routing objectsNRO{\displaystyle N_{RO}}such thatd(Or,On)r(Or){\displaystyle d(O_{r},O_{n})\leq r(O_{r})}ifNin{\displaystyle N_{in}} is not emptythen       {          /* If there are one or more entry, then look for an entry such that is closer to the new object */OrminOrNind(Or,On){\displaystyle O_{r}^{*}\gets \min _{O_{r}\in N_{in}}d(O_{r},O_{n})}       }else       {          /* If there are no such entry, then look for an object with minimal distance from */           /* its covering radius's edge to the new object */OrminOrNind(Or,On)r(Or){\displaystyle O_{r}^{*}\gets \min _{O_{r}\in N_{in}}d(O_{r},O_{n})-r(O_{r})}          /* Upgrade the new radii of the entry */r(Or)d(Or,On){\displaystyle r(O_{r}^{*})\gets d(O_{r}^{*},O_{n})}       }       /* Continue inserting in the next level */return insert(T(Or),On{\displaystyle T(O_{r}^{*}),O_{n}});else  {       /* If the node has capacity then just insert the new object */ifN is not fullthen       {store(N,On{\displaystyle N,O_{n}}) }       /* The node is at full capacity, then it is needed to do a new split in this level */else       {split(N,On{\displaystyle N,O_{n}}) }  }
  • "←" denotesassignment. For instance, "largestitem" means that the value oflargest changes to the value ofitem.
  • "return" terminates the algorithm and outputs the following value.

Split

[edit]

If the split method arrives to the root of the tree, then it choose two routing objects fromN, and creates two new nodes containing all the objects in originalN, and store them into the new root. If split methods arrives to a nodeN that is not the root of the tree, the method choose two new routing objects fromN, re-arrange every routing object inN in two new nodesN1{\displaystyle N_{1}} andN2{\displaystyle N_{2}}, and store these new nodes in the parent nodeNp{\displaystyle N_{p}} of originalN. The split must be repeated ifNp{\displaystyle N_{p}} has not enough capacity to storeN2{\displaystyle N_{2}}. The algorithm is as follow:

Algorithm Split  Input: NodeN of M-TreeMT,EntryOn{\displaystyle O_{n}}  Output: A new instance ofMT containing a new partition.
  /* The new routing objects are now all those in the node plus the new routing object */  let beNN entries ofNO{\displaystyle N\cup O}ifN is not the rootthen  {     /*Get the parent node and the parent routing object*/letOp{\displaystyle O_{p}} be the parent routing object ofNletNp{\displaystyle N_{p}} be the parent node ofN  }  /* This node will contain part of the objects of the node to be split */  Create a new nodeN'  /* Promote two routing objects from the node to be split, to be new routing objects */  Create new objectsOp1{\displaystyle O_{p1}} andOp2{\displaystyle O_{p2}}.  Promote(N,Op1,Op2{\displaystyle N,O_{p1},O_{p2}})  /* Choose which objects from the node being split will act as new routing objects */  Partition(N,Op1,Op2,N1,N2{\displaystyle N,O_{p1},O_{p2},N_{1},N_{2}})  /* Store entries in each new routing object */StoreN1{\displaystyle N_{1}}'s entries inN andN2{\displaystyle N_{2}}'s entries inN'ifN is the current rootthen  {      /* Create a new node and set it as new root and store the new routing objects */Create a new root nodeNp{\displaystyle N_{p}}StoreOp1{\displaystyle O_{p1}} andOp2{\displaystyle O_{p2}} inNp{\displaystyle N_{p}}  }else  {      /* Now use the parent routing object to store one of the new objects */Replace entryOp{\displaystyle O_{p}} with entryOp1{\displaystyle O_{p1}} inNp{\displaystyle N_{p}}ifNp{\displaystyle N_{p}} is no fullthen      {          /* The second routing object is stored in the parent only if it has free capacity */StoreOp2{\displaystyle O_{p2}} inNp{\displaystyle N_{p}}      }else      {           /*If there is no free capacity then split the level up*/split(Np,Op2{\displaystyle N_{p},O_{p2}})      }  }
  • "←" denotesassignment. For instance, "largestitem" means that the value oflargest changes to the value ofitem.
  • "return" terminates the algorithm and outputs the following value.

M-tree queries

[edit]

Range query

[edit]

A range query is where a minimum similarity/maximum distance value is specified. For a given query objectQD{\displaystyle Q\in D} and a maximum search distancer(Q){\displaystyle r(Q)}, the range queryrange(Q, r(Q)) selects all the indexed objectsOj{\displaystyle O_{j}} such thatd(Oj,Q)r(Q){\displaystyle d(O_{j},Q)\leq r(Q)}.[2]

Algorithm RangeSearch starts from the root node and recursively traverses all the paths which cannot be excluded from leading to qualifying objects.

Algorithm RangeSearchInput: NodeN of M-Tree MT,Q: query object,r(Q){\displaystyle r(Q)}: search radius
Output: all the DB objectssuch thatd(Oj,Q)r(Q){\displaystyle d(Oj,Q)\leq r(Q)}
{letOp{\displaystyle O_{p}}be the parent object of nodeN;ifNis not a leafthen {for eachentry(Or{\displaystyle O_{r}})inNdo {if|d(Op,Q)d(Or,Op)|r(Q)+r(Or){\displaystyle |d(O_{p},Q)-d(O_{r},O_{p})|\leq r(Q)+r(O_{r})}then {Computed(Or,Q){\displaystyle d(O_{r},Q)};ifd(Or,Q)r(Q)+r(Or){\displaystyle d(O_{r},Q)\leq r(Q)+r(O_{r})}thenRangeSearch(*ptr(T(Or{\displaystyle T(O_{r}})),Q,r(Q){\displaystyle r(Q)});           }    }  }else {for eachentry(Oj{\displaystyle O_{j}})inNdo {if|d(Op,Q)d(Oj,Op)|r(Q){\displaystyle |d(O_{p},Q)-d(O_{j},O_{p})|\leq r(Q)}then {Computed(Oj,Q){\displaystyle d(O_{j},Q)};ifd(Oj,Q){\displaystyle d(O_{j},Q)}r(Q){\displaystyle r(Q)}thenaddoid(Oj){\displaystyle oid(O_{j})} to the result;          }    }  }}
  • "←" denotesassignment. For instance, "largestitem" means that the value oflargest changes to the value ofitem.
  • "return" terminates the algorithm and outputs the following value.

k-NN queries

[edit]

k-nearest neighbor (k-NN) query takes the cardinality of the input set as an input parameter. For a given query object Q ∈ D and anintegerk ≥ 1, thek-NN query NN(Q, k) selects thek indexed objects which have the shortest distance from Q, according to the distance function d.[2]

See also

[edit]

References

[edit]
  1. ^Ciaccia, Paolo; Patella, Marco; Zezula, Pavel (1997)."M-tree An Efficient Access Method for Similarity Search in Metric Spaces"(PDF).Proceedings of the 23rd VLDB Conference Athens, Greece, 1997. IBM Almaden Research Center: Very Large Databases Endowment Inc. pp. 426–435. p426. Retrieved2010-09-07.
  2. ^abP. Ciaccia; M. Patella; F. Rabitti; P. Zezula."Indexing Metric Spaces with M-tree"(PDF).Department of Computer Science and Engineering. University of Bologna. p. 3. Retrieved19 November 2013.
Search trees
(dynamic sets,
associative arrays)
Heaps
Tries
Spatial data
partitioning trees
Other trees
Portals:
Retrieved from "https://en.wikipedia.org/w/index.php?title=M-tree&oldid=1294402108"
Categories:
Hidden categories:

[8]ページ先頭

©2009-2026 Movatter.jp