Detailed Description
In order to make the objects, technical solutions and effects of the present application clearer and more specific, the present application will be further described in detail below with reference to the accompanying drawings and examples.
The application provides a data updating method, a device and a storage device based on a B+ tree index, wherein a CoW mechanism is used during data updating, the effect of no conflict between reading and writing is achieved, a mapping table is introduced during construction of the B+ tree index, the mapping table maps physical nodes to virtual nodes, atomicity of modification of multiple references of pages can be ensured, cascade operation of the CoW B+ tree during updating and modification can be eliminated, and cost is reduced.
Referring to fig. 2, fig. 2 is a flowchart illustrating a first embodiment of a data updating method based on b+ tree index according to the present application; in this embodiment, the method comprises the steps of:
s201: nodes requiring updated data are acquired.
The nodes comprise physical nodes and virtual nodes, wherein the physical nodes represent a physical memory address, and the virtual nodes represent a B+ tree node.
Specifically, a b+ tree is a balanced search tree designed for disk or other direct access secondary storage devices, preferably in terms of reducing disk I/O operands, and comprises a root node, an internal node, and leaf nodes, which may be a leaf node, or a node comprising two or more leaf nodes, as is commonly used in database and operating system file systems.
S202: and copying the data to be updated into a new physical memory, and modifying the data in the new physical memory to form a new physical memory address.
The method adopts the copy-on-write technology to modify and update the data, and can achieve the effect of no conflict between reading and writing through the copy-on-write technology. Specifically, the basic idea of copy-on-write is that if multiple callers (callers) request the same resource at the same time, they will commonly get the same index to point to the same resource, and until a caller (caller) attempts to modify the resource, the system will actually copy a copy (private copy) to the caller, so as to avoid the modified resource being directly perceived, which is transparent to other callers.
In other words, different processes in the virtual space (virtual node) point to the same physical space (physical node, memory area), when the behavior of changing the corresponding segment occurs in the process, the physical space is allocated to the corresponding segment of the process, and then the data is modified in the new physical space.
S203: modifying the mapping table to enable the virtual node to point to a new physical memory address to form a new physical node; wherein the mapping table comprises a mapping of virtual nodes to physical nodes.
The internal nodes of the B+ tree have no pointer pointing to specific information of the keywords, so that the internal nodes are smaller, if keywords of all the same internal nodes are stored in the same disk block, the more the number of the keywords which can be contained in the disk block is, the more the keywords which need to be searched in the memory are read at one time, and the relatively lower the IO read-write times, so that the disk read-write cost of the B+ tree is lower. Meanwhile, since the non-terminal node of the B+ tree is not the node which finally points to the file content, but is only the index of the key words in the leaf nodes, any key word search must take a path from the root node to the leaf nodes, and the path length of all key word queries is the same, so that the query efficiency of each data is equivalent, and the query efficiency of the B+ tree is more stable.
However, the copy-on-write b+ tree index is not friendly to the update operation because the CoW action can propagate up from the leaf node to the root node, and in addition, the random IO cost-effectiveness incurred in this process is large. In order to overcome the defects of the existing CoW B+ tree, a Mapping Table (Mapping Table) is added in the B+ tree index, and the Mapping Table comprises Mapping from virtual nodes to physical nodes, so that when the physical address of a data page is changed, only the corresponding entry in the Mapping Table needs to be changed, and only the Mapping relation between the virtual nodes and the physical nodes in the Mapping Table needs to be modified. The influence of data page change can be limited on a specific page by using the Mapping table, the cascade CoW phenomenon is avoided, the updating operation is accelerated, and the random IO is reduced.
In one embodiment, the method includes creating a tree index, where the tree index includes a mapping table and a b+ tree; the mapping table comprises the mapping from the virtual node to the physical node; the physical nodes represent a physical memory address and the virtual nodes are constructed to form a b+ tree. Because the mapping table isolates the physical address and the B+ tree node, the modification of the physical node does not affect the virtual node, so that the modification of the physical page caused by the CoW does not affect other pages, is atomic, and does not need to propagate the modification of the leaf to the root of the tree. However, the physical address Ptr corresponding to the page ID in the Mapping table changes with the change of the page.
In one embodiment, virtual nodes are linked by a logical link in the b+ tree index of the method. This is because the introduction of the mapping table eliminates the defect that the CoW action propagates along the path to the root node, affecting the entire index. Therefore, the leaf nodes can be linked together, and range finding and merging and splitting operations are better supported.
In one embodiment, the method can be used for modifying index page data, and can also be used for modifying a data structure of a tree, such as splitting a node or merging nodes. These modification operations all employ CoW technology.
Wherein, in one embodiment, the old physical node is reclaimed after modification is completed. Specifically, the Latch-free environment does not allow exclusive access to a shared data structure, that is, pages can be accessed even if they are being modified. We do not want to free a memory that is still being accessed. For example, at the time of solidification, threads swap old pages and new page states and reclaim the new states, but release when threads no longer access old pages. The reclamation mechanism is to protect previously accessed and released objects.
Referring to fig. 3, fig. 3 is a flowchart illustrating a second embodiment of a data updating method based on b+ tree index according to the present application; in one application scenario, the method comprises the steps of:
and establishing a tree index, wherein the tree index comprises a mapping table and a B+ tree, the mapping table comprises mapping (solid line pointer in the figure) from a virtual node (ID) to a physical node (Ptr), the physical node (Ptr) represents a physical memory address, and the virtual node (ID) is constructed to form the B+ tree. A B+ tree contains root nodes, internal nodes and leaf nodes, each node is composed of a combination of Pointer (P) data pairs (dotted pointers in the figure), key (Key, K) and file addresses represented by the Key, and a node is stored on a disk block.
When the page data modification or the tree structure modification is needed, the original page is copied and copied, then the modification is carried out on a new page, and after the modification, the new address is put into the page physical address column of the mapping table to form a new physical address.
When a CoW transaction needs to modify a record, it first looks for Leaf nodes that are located to the grey box. Specifically, from the root node, the root node information is read, a given keyword is searched in keywords K1, … … and Kn contained in the root node, and if the keyword which is equal to the given value is found, the search is successful; otherwise, it can be determined that the keyword to be searched is between Ki and Kj, and at this time, the node pointed by the pointer Pi is taken to continue searching (Pi is the pointer pointing to the root node of the subtree) until the keyword is found.
Then the transaction copies the corresponding leaf node to be output and carries out corresponding modification; after the modification is completed, the physical address Ptr is pointed to the internal memory address of the page where the modification is completed; thus, the subsequent operations access the page after the modification is completed. While gray pages wait to be reclaimed after all access operations are completed. In the embodiment, the cascade CoW problem of the original Copy-on-Write B+ tree can be eliminated by a layer of indirect layer Mapping table device, and the problems of Write amplification and random IO are reduced.
Based on the foregoing method, the present application further provides a data updating device based on a b+ tree index, referring to fig. 4, fig. 4 is a schematic structural diagram of a first embodiment of the data updating device based on a b+ tree index, in this embodiment, the device modifies data by using a copy-on-write manner, the device 40 includes a processor 401, the processor 401 is configured to obtain nodes of data to be updated, the nodes include physical nodes and virtual nodes, the physical nodes represent a physical memory address, and the virtual nodes represent a b+ tree node. The processor 401 is further configured to copy the data to be updated to a new physical memory, and modify the data in the new physical memory to form a new physical memory address. The processor 401 is further configured to modify the mapping table to enable the virtual node to point to a new physical memory address, thereby forming a new physical node; wherein the mapping table comprises a mapping of virtual nodes to physical nodes.
Wherein, in an embodiment, the processor 401 is further configured to build a tree index, where the tree index includes a mapping table and a b+ tree; the mapping table comprises the mapping from the virtual node to the physical node; the physical node represents a physical memory address and the virtual node is constructed to form the b+ tree.
In one embodiment, the processor 401 is further configured to associate virtual nodes through a logical link.
In one embodiment, the processor 401 is further configured to change the index page data or change the index tree structure.
In one embodiment, the processor 401 is further configured to perform node splitting or node merging on the tree structure.
Wherein in an embodiment the processor 401 is further configured to reclaim old physical nodes.
The data updating device based on the b+ tree index can be used for executing the data updating method based on the b+ tree index, and has the corresponding beneficial effects, and the specific process is described in the above embodiments and is not repeated herein. The device may be a stand-alone device independent of the background server, or may be a module or a processing unit in the server.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a second embodiment of a b+ tree index-based data updating apparatus according to the present application, in this embodiment, the apparatus modifies data by using a copy-on-write method, and the apparatus 50 is a module in a processor, and specifically includes an obtaining module 501, an updating module 502, and a modifying mapping table module 503.
The acquiring module 501 is configured to acquire nodes of data to be updated, where the nodes include physical nodes and virtual nodes, the physical nodes represent a physical memory address, and the virtual nodes represent a b+ tree node.
The update module 502 is configured to copy the data to be updated into a new physical memory, and modify the data in the new physical memory to form a new physical memory address.
The modification mapping table module 503 is configured to modify the mapping table, so that the virtual node points to a new physical memory address to form a new physical node; wherein the mapping table comprises a mapping of virtual nodes to physical nodes.
Wherein, in an embodiment, the device further comprises a creating module, configured to create a tree index, where the tree index includes a mapping table and a b+ tree; the mapping table comprises the mapping from the virtual node to the physical node; the physical node represents a physical memory address and the virtual node is constructed to form the b+ tree.
In one embodiment, the update module 502 is further configured to change the index page data or change the index tree structure. The apparatus may be used to execute the above data updating method based on the b+ tree index, and the specific process is described in the above embodiment, and will not be described herein.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a first embodiment of the device with storage function. In this embodiment, the storage device 60 stores a program 601, and the program 601 implements the data update method based on the b+ tree index described above when executed. The specific working process is identical to that of the above method embodiment, so that the detailed description thereof will be omitted herein, and the detailed description of the corresponding method steps will be referred to above. The device having the storage function may be a portable storage medium such as a U-disk, an optical disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or the like, or may be a terminal, a server, or the like.
According to the scheme, the data updating method based on the B+ tree index is provided, and the mapping table is introduced when the B+ tree index is constructed, so that the physical nodes are mapped to the virtual nodes by the mapping table, the atomicity of modification of a plurality of references of a page can be ensured, the cascade CoW operation of the CoW B+ tree can be eliminated, and the cost is reduced.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be embodied in essence or a part contributing to the prior art or all or part of the technical solution, in the form of a software product stored in a storage medium, including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor (processor) to perform all or part of the steps of the methods described in the embodiments of the present application.
The foregoing description is only of embodiments of the present application, and is not intended to limit the scope of the patent application, and all equivalent structures or equivalent processes using the descriptions and the drawings of the present application or direct or indirect application in other related technical fields are included in the scope of the present application.