BACKGROUNDBackgroundTree structures can be used to store data in an ordered fashion. For instance, one kind of tree structure is a balanced tree or “b-tree.” B-trees comprise a number of nodes organized along a parent-child relationship. In general, parent nodes will store data (or pointers and/or links to data) having a particular value and link to a number of “children” nodes that also store data (or, again, links to data) having a particular value. At leaf level, the nodes will store data. Typically, a given parent node will have a “left” child that stores values less than the smallest value stored by the parent and a number of “right” children, each corresponding to a subset of values in parent, that store data having values greater than the greatest value in that particular subset in the parent. Consider, for instance, a simple b-tree having three nodes. If the parent node stores data with a value of2, then the left child node might store data with a value of 1 and the right child node might store data with a value of 3. When a tree has both its left arm and right arm populated (and any associated sub-arms) the tree is said to be “balanced.”
Existing algorithms to build b-trees require comparing a data element (e.g., a key) to be inserted with the data elements that have already been inserted to find the correct position to insert the data element. Additionally, the b-tree needs to be balanced and/or rebalanced when any individual node gets over-filled. As part of this balancing, data elements stored in the various nodes must be moved to other nodes. All of these operations can incur both time and resource costs. It is, therefore, better to minimize these operations as much as possible.
BRIEF DESCRIPTION OF THE DRAWINGSThe accompanying drawings are incorporated herein and form a part of the specification.
FIG. 1 is an example diagram depicting the structure of a b-tree according to various embodiments.
FIGS. 2A-2F are example diagrams conceptually depicting a b-tree at various points during its construction according to embodiments.
FIG. 3 is a flowchart depicting a method of constructing a b-tree according to various embodiments.
FIG. 4 is a flowchart depicting a method of populating a b-tree according to various embodiments.
FIGS. 5A-5C are example diagrams conceptually depicting a b-tree at various points during its construction according to embodiments.
FIG. 6 is a flowchart depicting a method finalizing a b-tree according to various embodiments.
FIG. 7 is a flowchart depicting a method of constructing a b-tree according to various embodiments.
FIG. 8 is a flowchart depicting a method of adding elements to an extant b-tree according to various embodiments.
FIG. 9 is a flowchart depicting a method of undoing finalization of a b-tree according to various embodiments.
FIG. 10 is an example computer system useful for implementing various embodiments.
In the drawings, like reference numbers generally indicate identical or similar elements. Additionally, generally, the left-most digit(s) of a reference number identifies the drawing in which the reference number first appears.
DETAILED DESCRIPTIONProvided herein are system, method and/or computer program product embodiments, and/or combinations and sub-combinations thereof, for storing data in a database using a tiered index architecture.
FIG. 1 depicts a simple generalpurpose tree structure100 according to various embodiments. As shown, thetree structure100 comprises a number of different nodes including aroot node102,intermediate nodes104aand104b(collectively referred to herein as “intermediate nodes104”), and several leaf nodes106a-f(collectively referred to herein as “leaf nodes106”). Each of the nodes may be associated with a block of data that is configured to store a number of keys or data elements.
Tree structures such astree structure100 can be organized as balanced trees (also known as “b-trees”). B-trees are ordered in a specific way. For instance, iftree structure100 were organized as a b-tree,root node102 would comprise a block of data storing one or more data elements. Each of the data elements may be a pair of data comprising a value associated with that data element and a so-called “right link” to another node and/or sub-tree. Additionally, each node may include a “left link” that links to another node and/or sub-tree.
In general the nodes and/or sub-trees linked to by the right link of a particular data element contain values that are greater than the value of the particular data element. For instance, as shown inFIG. 1, data element D30 contains a right link tointermediate node104b.Thus, it can be assumed that the values stored in the block associated withnode104b(and all of its associated sub-nodes) are greater than the value associated with data element D30.
In contrast, a node's left link will link to a node and/or sub-tree having values that are less than the smallest value stored in a particular node. For instance,FIG. 1 depictsroot node102 having a left link tointermediate node104a.It may, therefore, be assumed that the values stored innode104a(i.e., the values associated with D10 and D20) are less than the value associated with D30, which is the smallest data element stored inroot node102.
The relationship between nodes continues as you traverse down thetree100. For instance,node104acontains a left link tonode106a.Node106acontains data elements with values that are less than the value of data element D10. Similarly, data element D10 contains a right link tonode106a.As shown,node106acontains values that are greater than the value of the data element D10.
As briefly discussed above, existing algorithms to build balanced b-trees can prove a drain on resources in terms of time and in terms of hardware requirements. For instance, in order to determine where a particular data element (e.g., a key) should be inserted into a b-tree, it should be first determined where it fits into the established order. This is done by traversing the tree and comparing the value of the data element to the value of the data elements stored in the tree. For large b-trees, this can be rather time consuming.
Another step that must be performed in order to manage b-trees is to balance the b-tree when any node gets over-filled. For instance, some schemes might require nodes to remain at least half-full. To accomplish this balancing, data elements are moved from node to node and new nodes are created. Again, this can be rather time consuming and resource intensive. Additionally, some standard balancing algorithms result in a b-tree with nodes that are at least half full, but not entirely full. This is not the most efficient use of memory.
A better way to manage b-trees may be to eliminate or, at the very least, to reduce or even minimize these time-consuming steps.
One way the time-consuming steps can be eliminated and/or reduced is to institute one or more additional policies such as:
- (1) pre-sort the data to be inserted into the b-tree so that it is in a particular order (e.g., ascending). In this way, any new data element is guaranteed to be “next” in line from the data elements already in the b-tree; and/or
- (2) relax (or eliminate) any requirement that nodes be at least half full.
With such additional policies, the b-tree can be built using a “bottom-up” approach. To use the bottom-up approach, the nodes of the b-tree are prefilled to full capacity in a bottom up manner.
This begins by inserting data elements into an initial leaf node until it is full. When a leaf node is full, the next data element can be added to the leaf node's parent node. If the parent node does not exist, then a new node can be created as the leaf node's parent and the next data element inserted therein.
Once a data element is added to any level other than the leaf, a new node should be created at the leaf node level. This new leaf node can then be filled with the subsequent data elements. This process can continue until the data elements in the input are filled into a node.
Once the data elements are filled into the appropriate nodes, the b-tree can be balanced. This is done by determining whether there are any non-leaf nodes that do not have a right child. If they do not, then a right child or children can be created using data elements evicted from full nodes and inserted into the new child or children. This action can create almost empty nodes that contain one data element (key) and that are used for balancing the b-tree.
When new data elements need to be inserted into the b-tree following balancing, the balancing process must be reversed. To do so, the data elements used for balancing the b-tree are removed from the b-tree until a node containing more than one data element is reached. Once a node containing more than one data element is reached, the removed keys can then be inserted into the b-tree in the manner described above. After the removed data elements are re-inserted, the new data elements can be added in the same way described above. After the new data elements are added, the b-tree can be re-balanced.
There are several benefits to building a b-tree this way. First, the process may not involve any comparisons between data elements or keys while inserting keys into the b-tree. This makes the process of building the b-tree faster. Second, no b-tree nodes need to be split in order to balance a tree. Accordingly, there is not unnecessary data movement as a part of intermediate balancing. Finally, data in all but the right most arm of the b-tree is densely packed achieving maximum compression, and hence reducing input/output cost incurred during querying the b-tree.
FIGS. 2A-2E are a diagrams conceptually depicting a b-tree200 at various points during its construction according to embodiments. It is noted that embodiments are not limited to examples ofFIGS. 2A-2E.
As shown, inFIG. 2A, the b-tree can initially begin as asingle node202.Node202 may be associated with a particular data block that has the capacity to store a number of data elements. For instance, in the examples that follow, leaf nodes are depicted as having a capacity to store10 data elements, but the capacity need not be so limited. Indeed, in practice, nodes might have the capacity to store thousands of data elements.FIG. 2A also depicts adata queue204 containing data elements D1 to DK, which have been pre-sorted in ascending order, however the b-tree need not be so limited. In fact, according to various embodiments the data elements to be stored in b-tree200 may be sorted in any order (e.g., descending, etc.).
FIG. 2B depicts the b-tree200 afternode202 has been filled to capacity with 10 data elements (i.e., data elements D0 to D9). As shown, the data elements can be added to thenode202 in sequential order such that the lowest value data element (D0 in this example) is in the “left” position of thenode202 and the largest value data element (D9 in this example) is added to the right-most position in thenode202. Since, as discussed above,node202 has a capacity to hold10 data elements, thenode202 is now at capacity. Accordingly, at this point, there is no room to store the next data element D11 innode202. A new node (e.g., node206) needs to be created to store the next data element D11. This is shown inFIG. 2C.
As shown inFIG. 2C, anew node206 has been created as an intermediate node. Additionally, aleft link220 has been created linkingnode206 to theinitial leaf node202. According to various embodiments, intermediate nodes (that is, non-leaf nodes) such asnode206 may have the capacity to store fewer data elements than the leaf nodes. However, this certainly need not be the case. Indeed, in some embodiments, nodes have the capacity to store the same number of data elements and in other embodiments, intermediate nodes can store more data elements than the leaf nodes. However, for the purposes of the example processes described with respect toFIGS. 2A-2F, leaf nodes will have a capacity to store 10 data elements and intermediate nodes will have the capacity to store only two data elements. Oncenode206 is created, the next data element (here, data element D10) can be is inserted into the left most position of thenew node206.
After data element D10 is inserted into the left-most position ofnode206, a new leaf node is created to begin storing the next data elements. This process is depicted inFIG. 2D. As shown inFIG. 2D,new node208 is been created as a child/leaf node ofnode206. Additionally,node208 is right linked to data element D10 vialink222. The next data element from thedata queue204 can now be stored in thenew node208 beginning at the left position. As shown inFIG. 2D, this begins with data element D11.
As shown inFIG. 2E, thenew node208 has been filed with data elements D11 to D20 bringing it to capacity. Accordingly, the next data element (in this case D21) must be added to new node and the process can continue.
FIG. 2F depicts b-tree200 later in the process of its construction. As shown inFIG. 2F, anadditional node210 has been added as a right link to data element D21 and subsequently filled with data elements D22-D31. Accordingly, the next data element (D32) cannot be stored innode210. However, since in this exampleintermediate node206 can only hold two data elements, the next data element cannot be stored in that node either. Thus, anew node212 must be created at the next level. Data element D32 can then be insertednode212. After data element D32 has been inserted intonode212, anew leaf node214 can be created to store the next data element (D33). In general, any time a data element is inserted into a non-leaf node, a new leaf node is the next node created. Afternode214 is created, the process can continue as discussed with respect toFIGS. 2A-2F until the data elements indata queue204 have been stored in the b-tree200.
FIG. 3 is a flowchart depicting amethod300 of populating a b-tree200 according to various embodiments. For ease of explanation,method300 will be described with reference to b-table200 depicted inFIGS. 2A-2F, however it need not be so limited.
According to the method afirst leaf node202 is created atstep302. According to various embodiments, thefirst leaf node202 could have a capacity to hold a number of data elements. For instance,leaf node202 is depicted inFIGS. 2A-F as having the capacity to store ten data elements. However, it should be understood that each node could have any capacity and still fall within the spirit and scope of the present description. Indeed, in practice, leaf nodes may have the capacity to store hundreds or thousands of data elements. Atstep304, theleaf node202 is filled with a number of data elements (e.g., D0 to D9) until it is full.
When thefirst leaf node202 is full atstep304, aparent node206 is created atstep306. The parent can left link220 with the leaf node at this point thereby establishing the parent/child relationship between the twonodes202 and206. Atstep308, the next data element (e.g., D10) is added to the lowest-value (i.e., “left-most”) spot ofparent node206. If necessary, a new leaf node can then be created atstep210. In general, after adding a data element to a non-leaf node, a new leaf node is created.
Atstep310, thenew leaf node208 is created and associated with data element D10 stored inparent node206. The next data element and or elements can then be added to thenew leaf node208 atstep412. For instance, as shown inFIG. 2D, thenew leaf node208 can contain data elements (e.g., D11-D20) with values larger than the data element (D11) stored innode206. This process may continue iteratively until the data elements in thedata queue204 are added to the b-tree200 or until theparent node206 is full. When the parent node is full, a new node needs to be added at a level up from the parent node to facilitate additional data elements being added to the b-tree200. This process is described with respect toFIG. 4.
FIG. 4 depicts aprocess400 of adding data elements to a b-tree200 when a leaf node and its parent node are both filled to capacity. For ease of explanation,method400 will be described with reference to b-table200 depicted inFIGS. 2A-2F, however it need not be so limited.
Themethod400 begins atstep402 with a determination that the current leaf node (e.g., node210) is filled to capacity. Atstep404, themethod400 also determines that the parent node (e.g., node206) is also filled to capacity. At this point anew node212 can be created that is one level up from theparent node206 atstep406. Thenew node212 can be left linked withnode206. Atstep408, the next data element D32 can be stored in thenew node212. Once the next data element D32 has been stored in thenew node212, anew leaf node214 can be created to the right ofnode212. In general, any time a new data element is stored in a non-leaf node (e.g.,nodes206 and212) a new leaf node can be created according to various embodiments. Atstep412, the next data element and/or elements can be stored in the newly createdleaf node214. New data elements may then be added to the b-tree200 in the manner consistent withFIGS. 2A-2F andmethod300 ofFIG. 3. This process can be repeated each time a new non-leaf node needs to be created.
Trees constructed according to the methods outlined above may have several problems with them once they are constructed. Namely, they may be unbalanced. Consider, for instance, the alternate example representation of b-tree500 depicted inFIG. 5A. B-tree500 was constructed in accordance with the processes described above with respect toFIGS. 2A-4.
As shown inFIG. 5A, b-tree500 comprises a number of nodes at three different levels.
Each of the nodes is labeled according to the convention L[m]N[n] where “m” is the level number and “n” is the node number in that level. Accordingly, for instance, nod L1N1 is the first node inlevel1 and L3N1 is the first node inlevel3. Additionally, the shaded nodes (i.e., L1N2, L1N3, L1N4, and L1N5) are shaded to indicate that they are right linked with the data element in the node directly above them. For instance, node L1N2 is right linked (i.e., contains larger values than) with data element D10, which is stored in node L2N1 and node L1N3 is right linked with data element D20, etc.
As can be seen inFIG. 5A, b-tree500 is not balanced. To be balanced, b-tree500 should be finalized. The problems with b-tree500 specifically, are that (a) node L1N6 is not linked with any other node and (b) node L3N1 has no right pointer.
To fix this problem, a balancing binary tree can be constructed from node L1N6 by “evicting” the data elements one at a time. For instance, as shown inFIG. 5B, such a balanced b-tree500acan be built by creating a new node L1N6aand the largest data element from L1N6, which in this case is D57. At this point, another node L1N6bcan be created as alevel2 node (i.e., a level up from L1N6a) and the next highest data element can be evicted from node L1N6 and stored in thenew level2 node L1N6b.This balanced b-tree500acan then be right linked with data element D50 as shown inFIG. 5C so that b-tree500 is balanced. In general, the balanced b-tree should have a height one less than the node that of the node containing a data element that needs a right link. For instance, inFIG. 5A, node L3N1 is alevel3 node, accordingly, balancing binary500ashould have a height of 2 levels.
FIG. 6 depicts amethod600 of finalizing an unbalanced b-tree (e.g., b-tree500) according to various embodiments. For ease of explanation, themethod600 will be described with reference toFIGS. 5A-5C. However, it should be understood that themethod600 is not limited to the particular embodiments depicted inFIGS. 5A-5C.
As shown inFIG. 6,method600 begins atstep602 with a determination that the b-tree (e.g., b-tree500) has a non-leaf node without a right link. For example, in the example depicted inFIG. 5A, themethod600 could determine that node L3N1 does not have a right link. Atstep604, data elements can be evicted from the node containing the last data element in the b-tree500 in order to construct a balancing binary tree502a.For instance, in the example shown inFIGS. 5A-5C, node L1N6 contains the last data elements (D51-D57). Accordingly, data elements can be evicted from node L1N6 to create single element nodes L1N6aand L1N6b.These nodes can, in turn, be used to construct the balancing binary b-tree500adepicted inFIG. 5B. Once the balancingbinary tree500ais constructed atstep604, it can be linked with the b-tree500. For instance, the root of the balancing binary tree (i.e., node L1N6binFIG. 5B) can be right linked with the node lacking a right link (i.e., node L3N1). Once this is done, the b-tree500 will be balanced. It should be noted that in some instances, a b-tree may have more than one nodes without right links.Process600 can be repeated iteratively until the b-tree is properly finalized. To summarize, as a part of finalizing the b-tree, we build a binary tree to be linked to the highest node missing a right link. The elements of this binary tree are obtained by evicting data from the already filled nodes in the b-tree.
FIG. 7 is a flowchart depicting amethod700 of constructing and finalizing a b-tree (e.g., b-tree500) according to various embodiments. Themethod700 begins atstep702 with the creation of the b-tree500 by filing it with data elements. The b-tree500 can be constructed, for instance, according tomethods300 and400 depicted above with respect toFIGS. 3 and 4. Atstep704, themethod700 determines whether it is necessary to finalize the b-tree500. Theprocess700 can determine whether a b-tree needs to be finalized by determining, for instance, if it contains nodes that have no right links (e.g., node L3N1 inFIG. 5A). In some instances, the b-tree500 may contain no such nodes and may, therefore, already be balanced. If this is the case, then no finalization is necessary and the process can finish atstep708. However, if it is determined that the b-tree500 contains nodes with no right links, then the b-tree500 will need to be finalized and themethod700 can move to step706, where the finalization process is performed. According to various embodiments, the finalization process may be performed usingmethod600 depicted inFIG. 6, above. Afterstep706, themethod700 loops back to704 to determine whether there is any further need for finalization. If not, then the process ends atstep708. However, if there is, then the finalization process (e.g., method600) is performed again.
At some points, it may be necessary to add additional data elements to an already-created b-tree that has been created in accordance with the methods described above.FIG. 8 is a flowchart depicting amethod800 that can be used to add additional data elements to an extant b-tree according to various embodiments.
As shown inFIG. 8, theprocess800 may begin by determining that additional data elements need to be added to the b-tree (e.g., b-tree500). Once it is determined that additional data elements need to be added to the b-tree, then an undo finalize procedure is performed atstep804. The additional data element or elements can then be added to the b-tree atstep806. For instance, according to various embodiments, the processes outlined above with respect toFIGS. 2A-4 can be used to add the additional data element or elements to the b-tree. Atstep808, the b-tree500 is again finalized. Step808 may be accomplished using the finalization procedures outlined above with respect toFIGS. 5A-6.
FIG. 9 is a flowchart depicting amethod900 of undoing the finalization of a b-tree according to various embodiments. For instance,method900 could be usedperform step804 ofmethod800 inFIG. 8, above. For ease of explanation,method600 will be described with reference to b-table200 depicted inFIGS. 5A-5C, however it need not be so limited.
Method900 begins atstep902 by removing the data elements used for balancing the b-tree500. For instance, in the embodiment depicted inFIGS. 5A-5C, D57 and D56. can be removed from the B tree. This is done by iteratively removing data elements from the b-tree until we reach a node having more than one data element. Atstep904, the removed data elements D57 and D56 are re-inserted into the b-tree200 as described above with respect toFIGS. 2A-4. In this example, this would result in an un-balanced b-tree500 as shown inFIG. 5A. That is, the b-tree at this point would have a node L3N1 without any right links. The new data elements (e.g., D58, etc.) can then be added to the B tree atstep906 according to the processes described above with respect toFIGS. 2A-4. Atstep908, the b-tree500 can be re-balanced if necessary.
Various embodiments can be implemented, for example, using one or more well-known computer systems, such ascomputer system1000 shown inFIG. 10.Computer system1000 can be any well-known computer capable of performing the functions described herein, such as computers available from International Business Machines, Apple, Sun, HP, Dell, Sony, Toshiba, etc.
Computer system1000 includes one or more processors (also called central processing units, or CPUs), such as aprocessor1004.Processor1004 is connected to a communication infrastructure orbus1006.
Computer system1000 also includes user input/output device(s)1003, such as monitors, keyboards, pointing devices, etc., which communicate withcommunication infrastructure1006 through user input/output interface(s)1002.
Computer system1000 also includes a main orprimary memory1008, such as random access memory (RAM).Main memory1008 may include one or more levels of cache.Main memory1008 has stored therein control logic (i.e., computer software) and/or data.
Computer system1000 may also include one or more secondary storage devices ormemory1010.Secondary memory1010 may include, for example, ahard disk drive1012 and/or a removable storage device or drive1014.Removable storage drive1014 may be a floppy disk drive, a magnetic tape drive, a compact disk drive, an optical storage device, tape backup device, and/or any other storage device/drive.
Removable storage drive1014 may interact with aremovable storage unit1018.Removable storage unit1018 includes a computer usable or readable storage device having stored thereon computer software (control logic) and/or data.Removable storage unit1018 may be a floppy disk, magnetic tape, compact disk, DVD, optical storage disk, and/any other computer data storage device.Removable storage drive1014 reads from and/or writes toremovable storage unit1018 in a well-known manner.
According to an exemplary embodiment,secondary memory1010 may include other means, instrumentalities or other approaches for allowing computer programs and/or other instructions and/or data to be accessed bycomputer system1000. Such means, instrumentalities or other approaches may include, for example, aremovable storage unit1022 and an interface1020. Examples of theremovable storage unit1022 and the interface1020 may include a program cartridge and cartridge interface (such as that found in video game devices), a removable memory chip (such as an EPROM or PROM) and associated socket, a memory stick and USB port, a memory card and associated memory card slot, and/or any other removable storage unit and associated interface.
Computer system1000 may further include a communication ornetwork interface1024.Communication interface1024 enablescomputer system1000 to communicate and interact with any combination of remote devices, remote networks, remote entities, etc. (individually and collectively referenced by reference number1028). For example,communication interface1024 may allowcomputer system1000 to communicate with remote devices1028 over communications path1026, which may be wired and/or wireless, and which may include any combination of LANs, WANs, the Internet, etc. Control logic and/or data may be transmitted to and fromcomputer system1000 via communication path1026.
In an embodiment, a tangible apparatus or article of manufacture comprising a tangible computer useable or readable medium having control logic (software) stored thereon is also referred to herein as a computer program product or program storage device. This includes, but is not limited to,computer system1000,main memory1008,secondary memory1010, andremovable storage units1018 and1022, as well as tangible articles of manufacture embodying any combination of the foregoing. Such control logic, when executed by one or more data processing devices (such as computer system1000), causes such data processing devices to operate as described herein.
Based on the teachings contained in this disclosure, it will be apparent to persons skilled in the relevant art(s) how to make and use the invention using data processing devices, computer systems and/or computer architectures other than that shown inFIG. 10. In particular, embodiments may operate with software, hardware, and/or operating system implementations other than those described herein.
It is to be appreciated that the Detailed Description section, and not the Summary and
Abstract sections (if any), is intended to be used to interpret the claims. The Summary and Abstract sections (if any) may set forth one or more but not all exemplary embodiments of the invention as contemplated by the inventor(s), and thus, are not intended to limit the invention or the appended claims in any way.
While the invention has been described herein with reference to exemplary embodiments for exemplary fields and applications, it should be understood that the invention is not limited thereto. Other embodiments and modifications thereto are possible, and are within the scope and spirit of the invention. For example, and without limiting the generality of this paragraph, embodiments are not limited to the software, hardware, firmware, and/or entities illustrated in the figures and/or described herein. Further, embodiments (whether or not explicitly described herein) have significant utility to fields and applications beyond the examples described herein.
Embodiments have been described herein with the aid of functional building blocks illustrating the implementation of specified functions and relationships thereof. The boundaries of these functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternate boundaries can be defined as long as the specified functions and relationships (or equivalents thereof) are appropriately performed. Also, alternative embodiments may perform functional blocks, steps, operations, methods, etc. using orderings different than those described herein.
References herein to “one embodiment,” “an embodiment,” “an example embodiment,” or similar phrases, indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it would be within the knowledge of persons skilled in the relevant art(s) to incorporate such feature, structure, or characteristic into other embodiments whether or not explicitly mentioned or described herein.
The breadth and scope of the invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.