![]() | This article has multiple issues. Please helpimprove it or discuss these issues on thetalk page.(Learn how and when to remove these messages) (Learn how and when to remove this message)
|
B+ tree | ||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Type | Tree (data structure) | |||||||||||||||||||||||
|
AB+ tree is anm-ary tree with a variable but often large number of children per node. A B+ tree consists of a root, internal nodes and leaves.[1] The root may be either a leaf or a node with two or more children.
A B+ tree can be viewed as aB-tree in which each node contains only keys (not key–value pairs), and to which an additional level is added at the bottom with linked leaves.
The primary value of a B+ tree is in storing data for efficient retrieval in ablock-oriented storage context—in particular,filesystems. This is primarily because unlikebinary search trees, B+ trees have very high fanout (number of pointers to child nodes in a node,[1] typically on the order of 100 or more), which reduces the number of I/O operations required to find an element in the tree.
There is no single paper introducing the B+ tree concept. Instead, the notion of maintaining all data in leaf nodes is repeatedly brought up as an interesting variant of the B-tree, which was introduced by R. Bayer and E. McCreight.[2]Douglas Comer notes in an early survey of B-trees (which also covers B+ trees) that the B+ tree was used in IBM'sVSAM data access software, and refers to an IBM published article from 1973.[3]
As with other trees, B+ trees can be represented as a collection of three types of nodes:root,internal (a.k.a. interior), andleaf. In B+ trees, the following properties are maintained for these nodes:
The pointer properties of nodes are summarized in the tables below:
when exists | when and exist | when exists, and does not exist | when and do not exist |
---|---|---|---|
Points to subtree in which all search keys are less than. | Points to subtree in which all search keys are greater than or equal to and are less than. | Points to subtree in which all search keys are greater than or equal to. | Here, is empty. |
when exists | when does not exist and | |
---|---|---|
Points to a record with a value equal to. | Here, is empty. | Points to the next leaf in the tree. |
The node bounds are summarized in the table below:[4][5]
Node Type | Min Number of Keys | Max Number of Keys | Min Number of Child Nodes | Max Number of Child Nodes |
---|---|---|---|---|
Root Node (when it is a leaf node) | 0 | 0 | 0 | |
Root Node (when it is an internal node) | 1 | 2[1] | ||
Internal Node | ||||
Leaf Node | 0 | 0 |
By definition, each value contained within the B+ tree is a key contained in exactly one leaf node. Each key is required to be directlycomparable with every other key, which forms atotal order.[6] This enables each leaf node to keep all of its keys sorted at all times, which then enables each internal node to construct an ordered collection ofintervals representing the contiguous extent of values contained in a given leaf. Internal nodes higher in the tree can then construct their own intervals, which recursively aggregate the intervals contained in their own child internal nodes. Eventually, the root of a B+ Tree represents the whole range of values in the tree, where every internal node represents a subinterval.
For this recursive interval information to be retained, internal nodes must additionally contain copies of keys for representing the least element within the interval covered by the child with indexi (which may itself be an internal node, or a leaf). Wherem represents theactual number of children for a given internal node.
Theorder orbranching factorb of a B+ tree measures the capacity of interior nodes, i.e. their maximum allowed number of direct child nodes. This value is constant over the entire tree. For ab-order B+ tree withh levels of index:[citation needed]
We are looking for a valuek in the B+ Tree. This means that starting from the root, we are looking for the leaf which may contain the valuek. At each node, we figure out which internal node we should follow. An internal B+ Tree node has at most children, where every one of them represents a different sub-interval. We select the corresponding child via a linear search of them entries, then when we finally get to a leaf, we do a linear search of itsn elements for the desired key. Because we only traverse one branch of all the children at each rung of the tree, we achieve runtime, whereN is the total number of keys stored in the leaves of the B+ tree.[4]
function search(k,root)islet leaf = leaf_search(k, root)for leaf_keyin leaf.keys():if k = leaf_key:return truereturn false
function leaf_search(k,node)isif node is a leaf:return nodelet p = node.children()let l = node.left_sided_intervals()assertlet m = p.len()for ifrom 1to m - 1:if:return leaf_search(k, p[i])return leaf_search(k, p[m])
Note that thispseudocode uses 1-based array indexing.
B+ trees grow at the root and not at the leaves.[1]
Given a collection of data records, we want to create a B+ tree index on some key field. One approach is to insert each record into an empty tree. However, it is quite expensive, because each entry requires us to start from the root and go down to the appropriate leaf page. An efficient alternative is to use bulk-loading.
Note:
The purpose of the delete algorithm is to remove the desired entry node from the tree structure. Werecursively call the delete algorithm on the appropriate node until no node is found. For each function call, we traverse along, using the index to navigate until we find the node, remove it, and then work back up to the root.
At entry L that we wish to remove:
Note: merge could propagate to root, which means decreasing height.[10]
The leaves (the bottom-most index blocks) of the B+ tree are often linked to one another in a linked list; this makes range queries or an (ordered) iteration through the blocks simpler and more efficient (though the aforementioned upper bound can be achieved even without this addition). This does not substantially increase space consumption or maintenance on the tree. This illustrates one of the significant advantages of a B+tree over a B-tree; in a B-tree, since not all keys are present in the leaves, such an ordered linked list cannot be constructed. A B+tree is thus particularly useful as adatabase system index, where the data typically resides on disk, as it allows the B+tree to actually provide an efficient structure for housing the data itself (this is described in[11]: 238 as index structure "Alternative 1").
If a storage system has a block size of B bytes, and the keys to be stored have a size of k, arguably the most efficient B+ tree is one where. Although theoretically the one-off is unnecessary, in practice there is often a little extra space taken up by the index blocks (for example, the linked list references in the leaf blocks). Having an index block which is slightly larger than the storage system's actual block represents a significant performance decrease; therefore erring on the side of caution is preferable.
If nodes of the B+ tree are organized as arrays of elements, then it may take a considerable time to insert or delete an element as half of the array will need to be shifted on average. To overcome this problem, elements inside a node can be organized in a binary tree or a B+ tree instead of an array.
B+ trees can also be used for data stored in RAM. In this case a reasonable choice for block size would be the size of processor's cache line.
Space efficiency of B+ trees can be improved by using some compression techniques. One possibility is to usedelta encoding to compress keys stored into each block. For internal blocks, space saving can be achieved by either compressing keys or pointers. For string keys, space can be saved by using the following technique: Normally thei-th entry of an internal block contains the first key of block. Instead of storing the full key, we could store the shortest prefix of the first key of block that is strictly greater (in lexicographic order) than last key of blocki. There is also a simple way to compress pointers: if we suppose that some consecutive blocks are stored contiguously, then it will suffice to store only a pointer to the first block and the count of consecutive blocks.
All the above compression techniques have some drawbacks. First, a full block must be decompressed to extract a single element. One technique to overcome this problem is to divide each block into sub-blocks and compress them separately. In this case searching or inserting an element will only need to decompress or compress a sub-block instead of a full block. Another drawback of compression techniques is that the number of stored elements may vary considerably from a block to another depending on how well the elements are compressed inside each block.
TheReiserFS,NSS,XFS,JFS,ReFS, andBFS filesystems all use this type of tree for metadata indexing; BFS also uses B+ trees for storing directories.NTFS uses B+ trees for directory and security-related metadata indexing. EXT4 uses extent trees (a modified B+ tree data structure) for file extent indexing.[12]APFS uses B+ trees to store mappings from filesystem object IDs to their locations on disk, and to store filesystem records (including directories), though these trees' leaf nodes lack sibling pointers.[13]
Relational database management systems such asIBM Db2,[11]Informix,[11]Microsoft SQL Server,[11]Oracle 8,[11]Sybase ASE,[11] andSQLite[14][full citation needed] support this type of tree for table indices, though each such system implements the basic B+ tree structure with variations and extensions. ManyNoSQL database management systems such asCouchDB[15][full citation needed][a] andTokyo Cabinet[16] also support this type of tree for data access and storage.
Finding objects in ahigh-dimensional database that are comparable to a particular query object is one of the most often utilized and yet expensive procedures in such systems.[citation needed] In such situations, finding the closest neighbor using a B+ tree is productive.[17][full citation needed]
B+ tree is efficiently used to construct an indexed search method called iDistance. iDistance searches for k nearest neighbors (kNN) in high-dimension metric spaces. The data in those high-dimension spaces is divided based on space or partition strategies, and each partition has an index value that is close with the respect to the partition. From here, those points can be efficiently implemented using B+ tree, thus, the queries are mapped to single dimensions ranged search. In other words, the iDistance technique can be viewed as a way of accelerating the sequential scan. Instead of scanning records from the beginning to the end of the data file, the iDistance starts the scan from spots where the nearest neighbors can be obtained early with a very high probability.[18]
Nonvolatile random-access memory (NVRAM) has been using B+ tree structure as the main memory access technique for the Internet Of Things (IoT) system because of its non static power consumption and high solidity of cell memory. B+ can regulate the trafficking of data to memory efficiently. Moreover, with advanced strategies on frequencies of some highly used leaf or reference point, the B+ tree shows significant results in increasing the endurance of database systems.[19]
{{cite book}}
: CS1 maint: location missing publisher (link)![]() | This article'suse ofexternal links may not follow Wikipedia's policies or guidelines. Pleaseimprove this article by removingexcessive orinappropriate external links, and converting useful links where appropriate intofootnote references.(January 2019) (Learn how and when to remove this message) |