Detailed Description
It should be noted that, in the absence of conflict, the embodiments of the present application and the features of those embodiments may be combined with each other. The application is described in detail below with reference to the drawings and in connection with the embodiments.
In order that those skilled in the art may better understand the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort shall fall within the scope of protection of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of the present application and in the above figures are used to distinguish between similar objects and not necessarily to describe a particular sequential or chronological order. It is to be understood that terms so used are interchangeable where appropriate, so that the embodiments of the application described herein can be implemented in orders other than those illustrated. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For convenience of description, the following will describe some terms or terminology involved in the embodiments of the present application:
Distributed block storage system: a system that stores data in a scattered manner across a plurality of storage servers, assembles the scattered storage resources into a virtual storage device, and externally provides block semantics (reading and writing by position).
An LSM tree (Log-Structured Merge Tree) is a data structure for storing and managing data, typically used to implement key-value stores or similar database systems. The LSM tree is characterized in that data is written to disk in log form, and merge and compaction operations are carried out in the background, so that write performance is improved and read cost is reduced. LSM trees are typically composed of multiple levels, including a memory layer, a disk layer, and possibly a compaction layer, each with different performance characteristics and purposes.
The mapping relationship between the logical position of a block and the physical position on the underlying storage server generally takes two forms: fixed-length key-value pairs and variable-length key-value pairs.
The adaptive radix tree is a compressed prefix tree: a more space-efficient prefix tree and multi-way search tree. The intermediate nodes of the tree do not hold leaves but point to child nodes; leaves are held only in the leaf nodes of the last layer. Each node comprises a fixed number of slots, the number being a power of 2 called the base (radix) of the adaptive radix tree. Its main characteristic is that a group of strings is stored according to character order, and space is saved by sharing common prefixes. The structure is a multi-way tree in which each node represents a character: starting from the root node, each node has a plurality of child nodes, each representing the next character.
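The space saving from prefix sharing can be illustrated with a minimal sketch. The helper below is hypothetical (it is not part of the described system); it simply computes the prefix that a radix tree would store once in an inner node rather than repeating in every leaf.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: the longest prefix shared by a group of keys.
// A radix/prefix tree stores this prefix once in an inner node instead of
// repeating it in every leaf, which is where the space saving comes from.
std::string common_prefix(const std::vector<std::string>& keys) {
    if (keys.empty()) return "";
    std::string prefix = keys.front();
    for (const auto& k : keys) {
        std::size_t i = 0;
        while (i < prefix.size() && i < k.size() && prefix[i] == k[i]) ++i;
        prefix.resize(i);
    }
    return prefix;
}

// Bytes saved by factoring the shared prefix out of n keys:
// it is stored once instead of n times.
std::size_t bytes_saved(const std::vector<std::string>& keys) {
    if (keys.empty()) return 0;
    return common_prefix(keys).size() * (keys.size() - 1);
}
```

For the keys used later in this document (0x1234xx), the shared prefix 0x1234 would be stored once for the whole group of leaves.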
It should be noted that the related information (including, but not limited to, user equipment information, user personal information, etc.) and data (including, but not limited to, data for presentation, analyzed data, etc.) involved in the present disclosure are information and data authorized by the user or fully authorized by each party. For example, an interface is provided between the system and the relevant user or institution; before acquiring the relevant information, the system sends an acquisition request to the user or institution through the interface and acquires the relevant information only after receiving consent fed back by the user or institution.
The present application will be described with reference to preferred implementation steps. Fig. 1 is a flowchart of a data dump method according to an embodiment of the present application; as shown in fig. 1, the method includes the following steps:
Step S101: if the current state of the target system meets a preset condition for a full data dump, scan a plurality of first adaptive radix trees to obtain first data information, wherein each first adaptive radix tree is constructed from first metadata information corresponding to one data slice, the first metadata information is a fixed-length key-value pair, and the first data information at least comprises the metadata information corresponding to the leaf nodes in the plurality of first adaptive radix trees.
Alternatively, the target system may be a distributed block storage system. For fixed-length metadata information representing the mapping between the logical location of a block and the physical location on the underlying storage server, an adaptive radix tree (ART tree) is used as the memory structure. The adaptive radix tree is a trie (prefix tree) that splits keys (i.e., the logical addresses in the metadata information described above) and stores the sub-sequences in root-to-leaf order. A root-to-leaf sequence is called a path, a complete path forms the key of a leaf, and if multiple keys share the same prefix, the prefix portion is stored only once. The last-layer node where a leaf resides does not use child pointers but directly embeds the value (i.e., the physical address mentioned above) in the ART tree, thereby reducing memory usage. Finally, the adaptive radix tree corresponding to a data slice is constructed from these leaf nodes.
It should be noted that the concept of a slice (shard) is introduced to split the data blocks externally presented by the distributed block storage system into smaller units inside the system, referred to herein as slices; that is, each data block is sliced, and an adaptive radix tree corresponding to each data slice is finally constructed. One data slice corresponds to one adaptive radix tree.
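The block-to-slice split can be sketched as a simple offset calculation. The shard size constant below is an assumption for illustration only (the document does not fix a value here; 32 GiB is used later in the ART tree memory estimate), and the names are illustrative.

```cpp
#include <cstdint>

// Assumed slice size for illustration (the document's ART example uses 32 GiB).
constexpr uint64_t kShardSize = 32ULL << 30;  // 32 GiB per data slice

// Map a logical offset inside a block to the slice that owns it and the
// offset within that slice. One slice later maps to one adaptive radix tree.
struct ShardPos {
    uint64_t shard_index;
    uint64_t offset_in_shard;
};

ShardPos locate_shard(uint64_t logical_offset) {
    return { logical_offset / kShardSize, logical_offset % kShardSize };
}
```

With this split, every write lands in exactly one slice, so each slice's fixed-length kv can be kept in its own ART tree.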
It should be noted that, to reduce the total io write latency, the index information (i.e., the metadata information of the blocks) and the metadata information of the index information may be combined into one io written to the persistent medium (meta chunk). Meta chunks are classified as full chunks or incremental chunks (full or delta). The data layout of the index information and its metadata in the persistent medium is shown in fig. 2. The layout of a meta chunk includes three parts. The header part occupies 64 bytes in total, including reserved space, and its data structure is as follows:
Table 1: Meaning of the fields in the header
Each entry includes two parts, meta and elements set. The elements set is the specific index information; the meta is the metadata information of the index information, has a fixed length, and includes the following data structure:
Table 2: Meaning of the data structure of each entry
The elements set holds the (optionally compressed) index data. If there are multiple records, they are stored as op_type + key_len (key length) + key + value_len (value length) + value, followed by the next <key, value>.
The footer part occupies 64 bytes in total, including reserved space, and its data structure is as follows:
Table 3: Meaning of the footer data structure
By constructing this meta chunk layout, a plurality of index data and their metadata information, such as crc, elements_count, and version, can be packaged into one io transaction, thereby avoiding the complex scenarios and logic introduced by multiple io operations, such as increased latency and rollback after operation failure.
The index cache module (used for managing the metadata information of the blocks) contains two types of index data: fixed-length kv, namely the mapping from logical offsets within a storage slice to offsets in the underlying physical storage space, and variable-length kv, namely kv used by other modules within a storage slice, whose keys and values are of variable length.
Since the chunk (underlying physical storage space) used by the index cache module only supports append-only writes, multiple write operations to the same intra-slice logical offset generate multiple copies of data and the same number of index records (i.e., the metadata information mentioned above). Only the last copy of the index data is valid; the others are invalid data, also called garbage data. Meanwhile, the index data is stored in the incremental chunks of the index cache, and as time passes, more and more previously written index data becomes invalid, which occupies underlying physical storage space and also lengthens the time needed to restore the index cache's ART tree and variable-length kv memory structures after a process restart (because all incremental chunks must be traversed). Therefore, the invalid index data needs to be deleted, so as to release storage space and reduce the number of incremental chunks that must be read during data recovery.
By performing a full data dump, namely writing all the current latest fixed-length and variable-length data into a full chunk, all index data in the incremental chunks becomes equivalent to invalid data, thereby making it possible to delete those chunks to release storage space and to reduce the number of chunks to be read during data recovery.
Therefore, on the premise of adopting the adaptive radix tree as the memory structure for fixed-length key-value pairs, if the current state of the target system meets the preset condition for a full data dump, the existing first adaptive radix trees are directly scanned to obtain first data information that at least comprises the metadata information corresponding to the leaf nodes in the first adaptive radix trees.
When the full data dump is performed, a version number, called the epoch, is allocated to the full data dump. The version number increases monotonically, and each dump operation (dump op) in the full data dump needs to carry this version number, both as its identity information and as verification information for data recovery.
The current committed version of the index cache module is used as the epoch. The index cache module internally maintains two versions: a submitted version, used to allocate a sequence number to each front-end fixed-length or variable-length kv update operation written to an incremental chunk, and a committed version, which records the largest consecutively completed sequence number among those update operations. For example, if submitted version = 100, the next update operation will be assigned sequence number 100; if committed version = 98, operation #98 is the largest consecutively completed operation. Operation #99 has not necessarily returned from the underlying physical store (otherwise committed version would be 99), and #100 may or may not have returned; even if #100 returns before #99, committed version is not updated, because the preceding operation #99 has not returned. When #99 returns, the index cache module scans a queue of completed operations to find the largest consecutive completed sequence number as the new committed version. The full data dump can be realized accurately through this version sequence number.
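The committed-version rule above can be sketched in a few lines. This is a minimal sketch, not the system's actual implementation: `completed` plays the role of the operation-completion queue, and the function names are illustrative.

```cpp
#include <cstdint>
#include <set>

// Advance the committed version: it becomes the largest sequence number N
// such that every operation up to and including N has completed.
// `completed` holds the sequence numbers of operations that have returned
// from the underlying physical store.
uint64_t advance_committed(uint64_t committed,
                           const std::set<uint64_t>& completed) {
    while (completed.count(committed + 1)) ++committed;
    return committed;
}
```

This reproduces the example in the text: with #100 completed but #99 outstanding, committed stays at 98; once #99 completes, committed jumps straight to 100.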
Step S102: scan a plurality of first association containers to obtain second data information, wherein each first association container stores the second metadata information of one data slice, the second data information at least comprises the second metadata information, and the second metadata information is a variable-length key-value pair.
Alternatively, the variable-length kv (i.e., the second metadata information described above) is mainly kv that organizes information for the data slices, and only a limited amount of variable-length kv exists for one data slice. Thus, the variable-length kv uses the std::map container (i.e., the first association container described above) provided by the C++ STL (Standard Template Library), which is implemented on a red-black tree data structure. The elements in a std::map are ordered by key, the keys are unique, and each key maps to exactly one value. It should be noted that one data slice corresponds to one first association container; that is, the variable-length kv of each data slice is stored separately.
When the full data dump is performed, the traversal method provided by the map container is used: traversal starts from the smallest key of the map, and each dump is encoded with kv as the granularity. The encoded variable-length-kv data layout shown in fig. 3, i.e., the data layout of the second data information obtained by scanning the plurality of first association containers, comprises the metadata information and a meta (for example, kv1 comprises meta, key1, and value1; kv2 comprises meta, key2, and value2), and the meta consists of the length of the key (key length) and the length of the value (value length).
It should be noted that the amount of metadata information obtained by one scan is limited; for example, one dump op dumps at most 4 MiB of kv information from the std::map. The next dump continues traversing the std::map container from the position where the previous dump finished, until the whole container has been traversed. Each dump op needs to be allocated an op sequence number within the epoch, encoded starting from 0 and persisted into the version field of the entry meta written to the full chunk. Meanwhile, the last dump op needs to set the completion flag to 1 in its entry meta, so that whether the full data dump is complete can be determined when the process restarts to restore data.
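The budgeted, resumable std::map traversal can be sketched as follows. This is a simplified illustration: the 4 MiB budget is passed in as a parameter, the meta is encoded as two 4-byte little-endian lengths (an assumed width), and the resume cursor is the first key not yet dumped.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// One variable-length-kv dump op: traverse the map in key order starting
// from where the previous dump op stopped, encode key_len + value_len
// (the per-kv meta) followed by key and value, and stop once the size
// budget for a single dump op would be exceeded.
// Returns the first key not yet dumped ("" when traversal is complete).
std::string dump_op(const std::map<std::string, std::string>& kvs,
                    const std::string& resume_key, std::size_t budget,
                    std::vector<uint8_t>& out) {
    for (auto it = kvs.lower_bound(resume_key); it != kvs.end(); ++it) {
        std::size_t record = 8 + it->first.size() + it->second.size();
        if (out.size() + record > budget) return it->first;  // resume here
        uint32_t kl = static_cast<uint32_t>(it->first.size());
        uint32_t vl = static_cast<uint32_t>(it->second.size());
        for (int i = 0; i < 4; ++i) out.push_back(static_cast<uint8_t>(kl >> (8 * i)));
        for (int i = 0; i < 4; ++i) out.push_back(static_cast<uint8_t>(vl >> (8 * i)));
        out.insert(out.end(), it->first.begin(), it->first.end());
        out.insert(out.end(), it->second.begin(), it->second.end());
    }
    return "";
}
```

Because std::map keeps keys ordered, `lower_bound` on the returned cursor resumes exactly where the previous dump op stopped.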
Step S103, dumping the first data information and the second data information to the target storage medium.
Optionally, the scanned first data information and second data information are dumped to the target storage medium.
In summary, the adaptive radix tree is used as the memory structure for fixed-length metadata information, and the association container is used as the memory structure for variable-length metadata information, so that when a full data dump is performed, the adaptive radix trees and association containers allow a fast full data scan without searching the metadata information layer by layer from top to bottom, thereby improving the efficiency of the data dump.
Optionally, in the data dump method provided by the embodiment of the application, scanning the plurality of first adaptive radix trees to obtain the first data information comprises: scanning the plurality of first adaptive radix trees to obtain the metadata information corresponding to the leaf nodes in the plurality of first adaptive radix trees, the node types of the leaf nodes, and the common prefix information in the upper nodes corresponding to the leaf nodes; determining attribute information of the metadata information corresponding to the leaf nodes according to the node types, leaf counts, and common prefix information of the leaf nodes; and obtaining the first data information according to the attribute information and the metadata information corresponding to the leaf nodes.
In an alternative embodiment, scanning the plurality of first adaptive radix trees proceeds as follows. For example, with a data slice size of 32 GiB, a logical space of 32 GiB uses at most 32 GiB / 4 KiB = 8 Mi leaves; at 5 bytes per leaf value, the leaves occupy 40 MiB of memory in total. Meanwhile, the total memory used by the intermediate nodes (inner nodes) of the ART tree does not exceed 1 MiB, because the ART tree of a 32 GiB slice has only 3 layers: the first layer is the root node, a single inner Node256; the second layer has at most 256 inner Node256 nodes; and each inner Node256 stores 256 pointers to lower nodes, each pointer occupying 8 bytes. Therefore, the first and second layers together occupy (1 + 256) x 256 x 8 = 514 KiB (1 KiB = 1024 bytes) of memory.
When dumping, only the information of the leaf nodes is encoded, together with the prefix information (namely the common prefix information) of all layers before the leaf node. For example, if the key of a certain leaf is 0x123400, when dumping that leaf node only the common prefix 0x1234 shared by its leaves is recorded. Each dump is encoded with a leaf node as the granularity, and one dump op dumps at most 4 MiB of encoded ART leaf-node information and its prefixes. The ART tree shown in fig. 4 includes root, inner node1, inner node2, and leaf nodes 1-5. The last layer of leaf nodes (leaf node in fig. 4; the scan direction is indicated by a dotted line) is scanned sequentially starting from the leftmost leaf node (leaf node 1) of the ART tree. Inner node1 has a prefix of 0x12 and inner node2 a prefix of 0x77. Leaf node1 has a prefix of 0x1234 and holds two leaves: the key of the first leaf is 0x123400, and the keys of the subsequent leaves lie in the range (0x123400, 0x1234ff) as shown in fig. 4. Leaf node2 has a prefix of 0x1256, and leaf node3 a prefix of 0x1278.
The data layout of the first data information is shown in fig. 5 and includes leaf node1 and leaf node2. A leaf node (leaf node1 or leaf node2) consists of a meta and its leaves, and the meta includes the node type, leaf count, and common prefix information (key prefix). The encoded data of an ART tree may exceed 4 MiB, in which case the encoding of all leaf nodes cannot be completed in one dump op, and multiple ART tree dump operations are required.
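The leaf encoding of fig. 5 can be sketched as a size calculation over a leaf group. The struct layout below is illustrative only: it assumes a 1-byte node type, a 2-byte leaf count, and one trailing key byte plus the 5-byte value per leaf (the 5-byte leaf value comes from the memory estimate above; the other widths are assumptions).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// One encoded leaf node from fig. 5: a meta (node type, leaf count,
// common key prefix) followed by the leaves, so the shared prefix is
// persisted once instead of once per leaf.
struct LeafGroup {
    uint8_t node_type;
    std::string key_prefix;  // e.g. bytes 0x12 0x34 for keys 0x1234xx
    std::vector<std::pair<uint8_t, uint64_t>> leaves;  // (last key byte, value)
};

std::size_t encoded_size(const LeafGroup& g) {
    // meta: 1 byte node type + 2 bytes leaf count + prefix bytes,
    // then 1 trailing key byte + 5-byte value per leaf (see text above).
    return 1 + 2 + g.key_prefix.size() + g.leaves.size() * (1 + 5);
}
```

Summing `encoded_size` over the scanned leaf nodes is what lets a dump op stop before crossing the 4 MiB per-op limit.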
The corresponding fixed-length kv can be obtained quickly by scanning the ART tree, improving the efficiency of the data dump.
Optionally, in the data dump method provided by the embodiment of the application, before dumping the first data information and the second data information to the target storage medium, the method further comprises: scanning an encoding table in the target system and a bitmap corresponding to the encoding table to obtain third data information, wherein the encoding table is used to compression-encode the metadata information when constructing the plurality of first adaptive radix trees, and the bitmap expresses the usage of the array in the encoding table; and dumping the third data information to the target storage medium.
In an alternative embodiment, the metadata information is compression-encoded to reduce memory usage when constructing the plurality of first adaptive radix trees. As shown in fig. 6, the values 2065, 3089, and 1010 in the boxes represent the metadata information before encoding, and the index numbers of the encoding array (numbered from 0, e.g., 0, 1, 2, 3, 4, 5, 6) serve as the encoded values. In addition, black indicates that an array item is occupied and white that it is unoccupied; this occupancy information is stored in the bitmap and represented by 1 bit per item. The information of the encoding table that needs to be persisted comprises two parts, the encoding array and the bitmap, which occupy contiguous logical memory totalling less than 200 KiB, so the encoding can be completed in one dump operation. It should be noted that one first adaptive radix tree corresponds to one bitmap and one encoding table.
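The encoding-table idea can be sketched as follows. This is a minimal illustration, not the system's implementation: new values are appended to the next free slot (a real table would reuse freed slots, which is what the bitmap makes possible), and the reverse index is an in-memory convenience that would not be persisted.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// A value in the metadata (e.g. 2065) is replaced by the index of an
// array slot; the bitmap records which slots are occupied. Only the
// slot array and the bitmap need to be persisted.
struct EncodingTable {
    std::vector<uint32_t> slots;   // encoded value -> original value
    std::vector<bool> bitmap;      // 1 bit per slot: occupied?
    std::unordered_map<uint32_t, uint32_t> index;  // original -> code

    uint32_t encode(uint32_t value) {
        auto it = index.find(value);
        if (it != index.end()) return it->second;
        uint32_t code = static_cast<uint32_t>(slots.size());
        slots.push_back(value);
        bitmap.push_back(true);
        index.emplace(value, code);
        return code;
    }

    uint32_t decode(uint32_t code) const { return slots[code]; }
};
```

Because the ART tree then stores small slot indices instead of the raw values, leaf values stay compact (the 5-byte leaf value mentioned earlier).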
The memory usage of metadata information can be effectively reduced through the bitmap and the encoding table.
Optionally, in the data dump method provided by the embodiment of the application, after the first data information and the second data information are dumped to the target storage medium, the method further comprises: judging whether the full data dump is finished through a completion identifier corresponding to the full data dump; if the full data dump is finished, reading the first data information in the target storage medium and constructing a plurality of second adaptive radix trees according to the first data information; and reading the second data information in the target storage medium and constructing a plurality of second association containers according to the second data information.
In an alternative embodiment, after the first data information and the second data information, together with the encoding table and its corresponding bitmap, are dumped to the target storage medium, the method further includes the following. At the beginning of each full data dump, the index cache module records the starting position and length of each dump op written into the full chunk; this information is called the full chunk checkpoint. When the process restarts for data recovery, it starts from the latest full data dump and judges whether that full data dump is complete, that is, whether the completion flag (namely the completion identifier) in the entry of the last checkpoint is true. If the full data dump is complete, the ART trees (i.e., the second adaptive radix trees described above) and the variable-length kv memory data structures (i.e., the second association containers described above) are reconstructed based on the first data information and the second data information in the target storage medium.
Rapid reconstruction of the ART trees and the variable-length kv memory data structures can thus be achieved from the first data information and the second data information.
Optionally, after constructing the plurality of second association containers according to the second data information, the method further includes: reading newly added metadata information in an incremental chunk to obtain first newly added metadata information; obtaining a first version number corresponding to the first newly added metadata information, wherein the first version number was generated when the first newly added metadata information was added; determining whether the first version number is a first expected version number according to a target version number, wherein the target version number is the version number corresponding to the full data dump; and, if the first version number is the first expected version number, updating the plurality of second adaptive radix trees according to the first newly added metadata information to obtain an updated plurality of second adaptive radix trees, or updating the plurality of second association containers according to the first newly added metadata information to obtain an updated plurality of second association containers.
In an alternative embodiment, while the full data dump is performed, new index data may still be written into the incremental chunks (delta blocks); the incremental chunks holding this new index data are not released, and the new index data needs to be read during data recovery and applied to the ART tree and variable-length kv memory structures. Specifically, this includes reading the newly added metadata information in the incremental chunks (i.e., the delta blocks). It should be noted that, assuming the full data dump with epoch 99 is the most recent full data dump, each entry in the incremental chunks with submitted version > 99 (i.e., the newly added metadata information described above) needs to be read sequentially, after which the new index data is applied to the ART tree and variable-length kv memory data structures. Entries with submitted version <= 99 do not need to be read, because their results have already been completely written into the full chunk corresponding to the full data dump with epoch 99.
It should be noted that whether there is a hole or disorder is determined according to the versions in the incremental chunks, that is, whether the read first newly added metadata information is the metadata information corresponding to the expected version. Specifically, whether the first version number is the first expected version number is determined according to the epoch of the full data dump (i.e., the target version number mentioned above). Note that the first expected version number is epoch + 1. If the first version number is the first expected version number, the plurality of second adaptive radix trees or the plurality of second association containers are updated directly according to the first newly added metadata information.
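The recovery-time version filter described in the two paragraphs above reduces to two small predicates. A minimal sketch with illustrative names:

```cpp
#include <cstdint>

// Entries whose submitted version is at or below the epoch of the last
// complete full dump are already contained in the full chunk and can be
// skipped during recovery.
bool already_in_full_dump(uint64_t entry_version, uint64_t epoch) {
    return entry_version <= epoch;
}

// The first entry expected from the incremental chunks is epoch + 1.
uint64_t first_expected_version(uint64_t epoch) {
    return epoch + 1;
}
```

With epoch 99, entry #99 is skipped and replay starts by looking for entry #100.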
The version-number check ensures the accuracy of metadata updates and avoids data errors in the updated adaptive radix trees or association containers caused by ordering problems.
Optionally, in the data dump method provided by the embodiment of the application, after determining whether the first version number is the first expected version number according to the target version number, the method further comprises: storing the first newly added metadata information into a target queue if the first version number is not the first expected version number; reading the newly added metadata information in the incremental chunk again to obtain second newly added metadata information; and, if the second version number corresponding to the second newly added metadata information is the first expected version number, updating the plurality of second adaptive radix trees according to the second newly added metadata information to obtain an updated plurality of second adaptive radix trees, or updating the plurality of second association containers according to the second newly added metadata information to obtain an updated plurality of second association containers.
In an alternative embodiment, if the first version number is not the first expected version number, an order-preserving queue (i.e., the target queue described above) is constructed; this queue must support lookup by entry version. Reading of the incremental chunk entries then continues: if a read entry's version is greater than the expected version (i.e., its version number is not the first expected version number), the entry data is stored in the order-preserving queue and reading of the incremental chunk continues; if an entry matching the expected version (i.e., the second newly added metadata information) is read, the plurality of second adaptive radix trees are updated according to the second newly added metadata information to obtain an updated plurality of second adaptive radix trees, or the plurality of second association containers are updated according to the second newly added metadata information to obtain an updated plurality of second association containers.
If the second version number corresponding to the second newly added metadata information is not the first expected version number, the incremental chunk needs to be read again until an entry matching the expected version is read.
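The order-preserving replay described above can be sketched compactly. This is a minimal illustration with illustrative names: out-of-order entries are parked in a version-indexed map (standing in for the order-preserving queue), and parked entries are drained the moment the expected version arrives; anything still parked at the end sits after a hole.

```cpp
#include <cstdint>
#include <map>
#include <vector>

struct Replayer {
    uint64_t expected;               // next version to apply
    std::map<uint64_t, int> parked;  // order-preserving queue: version -> payload
    std::vector<int> applied;        // stands in for the ART / variable-kv update

    void feed(uint64_t version, int payload) {
        if (version != expected) {   // out of order: park it
            parked.emplace(version, payload);
            return;
        }
        applied.push_back(payload);  // matches expected version: apply
        ++expected;
        // drain any parked entries that are now consecutive
        auto it = parked.find(expected);
        while (it != parked.end()) {
            applied.push_back(it->second);
            parked.erase(it);
            ++expected;
            it = parked.find(expected);
        }
    }
};
```

This mirrors the #98/#99/#100 example later in the text: #100 arriving before #99 is parked, then applied immediately after #99 fills the gap; if #99 never arrives, #100 stays parked and is discarded.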
Accurate updating of metadata information can be achieved through the order-preserving queue.
Optionally, in the data dump method provided by the embodiment of the application, after updating the plurality of second adaptive radix trees according to the second newly added metadata information to obtain the updated plurality of second adaptive radix trees, or updating the plurality of second association containers according to the second newly added metadata information to obtain the updated plurality of second association containers, the method further comprises: judging whether third newly added metadata information exists in the target queue, wherein the version number corresponding to the third newly added metadata information is a second expected version number; if the third newly added metadata information does not exist, repeatedly executing the step of reading the newly added metadata information in the incremental chunk until all the newly added metadata information in the incremental chunk has been read; and, if the third newly added metadata information exists, updating the updated plurality of second adaptive radix trees according to the third newly added metadata information, or updating the updated plurality of second association containers according to the third newly added metadata information, until all the newly added metadata information in the incremental chunk has been read.
In an alternative embodiment, after the update is performed, it is also determined whether an entry matching the expected version exists in the order-preserving queue, that is, whether the target queue contains the third newly added metadata information. If so, the entry is taken out of the order-preserving queue, decoded, and used to update the ART tree and variable-length kv memory structures. It should be noted that if the epoch of the full data dump is 99 and the second newly added metadata information with version number 100 has already been applied, the second expected version number is 101. The order-preserving queue is then repeatedly checked for entries matching the expected version until the queue is empty; if no matching entry is found, reading of the incremental chunks continues.
It should be noted that, after all incremental chunks have been scanned and read, entries may still remain in the order-preserving queue. These entries are discontinuous with the entry versions already decoded and applied to the ART tree and variable-length kv memory structures, i.e., there is a hole, so they cannot be used to recover the index data. Because of the hole, the subsequent entries, even though written to the incremental chunks, were never acknowledged to the front-end user, so discarding these read entries causes no data inconsistency. For example, entry #98 completes its write to the incremental chunk and success is returned to the front-end user; entry #100 completes its write to the incremental chunk but is not acknowledged to the front-end user, because entry #99 is guaranteed to be acknowledged first, and at that moment entry #99 may not have been successfully written to the incremental chunk due to network or underlying storage conditions. If the whole process then restarts, only entry #98 and entry #100 can be read, and a hole exists. Since neither entry #99 nor entry #100 returned success to the front-end user, entry #100 is not decoded and is not applied to the ART tree and variable-length kv memory structures, and no case arises of a write that was acknowledged as successful but cannot be read back.
Optionally, in the data dump method provided by the embodiment of the application, before scanning the plurality of first adaptive radix trees to obtain the first data information, the method further comprises: obtaining historical time information of the last completed full data dump of the target system; determining a time difference according to the historical time information and the current time information; and judging whether the current state of the target system meets the preset condition for a full data dump according to the time difference and a time threshold condition in the preset condition for the full data dump.
In an alternative embodiment, the following steps are adopted to determine whether the current state of the target system meets the preset conditions of the full data dump: a timing-triggered time threshold (i.e. the time threshold condition described above) is set, with a default value of 1 day; the time point T1 at which the last full data dump was completed (i.e. the historical time information described above) and the current time information are recorded; and whether the current state of the target system meets the preset conditions of the full data dump is determined according to the time difference between the historical time information and the current time information and the time threshold.
Judging whether a full data dump is needed through a time threshold effectively avoids the problem that a huge number of INCREMENTAL CHUNKs consume the underlying physical storage space.
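A minimal sketch of the timing trigger, for illustration only; the function and constant names are hypothetical, and the 1-day default comes from the embodiment above.

```python
import time

FULL_DUMP_INTERVAL_SECONDS = 24 * 60 * 60  # default time threshold: 1 day

def time_condition_met(last_full_dump_ts, now=None):
    """Return True when the elapsed time since the last completed full data
    dump (the time point T1) reaches the configured time threshold."""
    now = time.time() if now is None else now
    return (now - last_full_dump_ts) >= FULL_DUMP_INTERVAL_SECONDS
```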
Optionally, in the data dumping method provided by the embodiment of the application, judging whether the current state of the target system meets the preset conditions of the full data dump according to the time difference value and the time threshold condition in the preset conditions of the full data dump comprises: judging whether the time difference value is larger than or equal to the time threshold in the time threshold condition; if the time difference value is larger than the time threshold, calculating a first total value of metadata information in the incremental blocks in a target time period, wherein the target time period is from the time point corresponding to the historical time information to the time point of the current time information; acquiring a second total value of metadata information counted by a metadata management module in the target system; and judging whether the current state of the target system meets the preset conditions of the full data dump according to the data relation between the first total value and the second total value and the data relation condition in the preset conditions of the full data dump.
In an alternative embodiment, judging whether the current state of the target system meets the preset conditions of the full data dump according to the time difference value and the time threshold condition in the preset conditions of the full data dump includes: if the time difference value exceeds the set time threshold, calculating the total amount S1 of index data written to the INCREMENTAL CHUNKs since the last full data dump (i.e. the first total value) and the total amount S2 of metadata information counted by the metadata management module (i.e. the index cache module), that is, the second total value; and judging whether the current state of the target system meets the preset conditions of the full data dump according to the data relation between the first total value and the second total value and the data relation condition in the preset conditions of the full data dump.
For example, if S1 < 2 × S2, that is, the amount of changed data in the INCREMENTAL CHUNKs is less than twice the amount of data needed for the full data dump, the full data dump is not triggered (the rationale here is that a full data dump is considered to consume extra space and CPU, so a certain space-reclamation yield is required before triggering); otherwise, the full data dump is triggered and the relevant statistical information is reset.
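The S1/S2 trigger condition above can be sketched as follows; the function name is hypothetical and the factor of 2 is the yield threshold from the example.

```python
def should_trigger_full_dump(s1_bytes, s2_bytes):
    """S1: index data written to the INCREMENTAL CHUNKs since the last full
    data dump; S2: total metadata counted by the metadata management module.
    The dump is skipped while S1 < 2 * S2, i.e. until reclaiming the
    incremental space pays back the extra space and CPU the dump costs."""
    return s1_bytes >= 2 * s2_bytes
```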
Through the above judging process, full data dumps can be carried out more reasonably and unnecessary resource waste is avoided.
Optionally, in the data dumping method provided by the embodiment of the application, before the plurality of first adaptive radix trees are scanned to obtain the first data information, the method further comprises: obtaining the number of requests for processing metadata information received by the target system, and judging whether the current state of the target system meets the preset conditions of the full data dump according to the number of requests and a request threshold condition in the preset conditions of the full data dump.
In an alternative embodiment, the following considerations determine whether the current state of the target system meets the preset conditions of the full data dump. An INCREMENTAL CHUNK ENTRY records the fixed-length and variable-length kv submitted by the front-end user, so a larger number of entries indicates more user modifications. When the modifications accumulate to a certain number, if a full data dump is not performed in time, a very large number of INCREMENTAL CHUNKs must be read when the process restarts to restore data, which slows data restoration and further increases the waiting time before service can be provided. In addition, the vast number of INCREMENTAL CHUNKs also consumes underlying physical storage space, which can easily overload cluster capacity usage when the cluster is not large. However, if the full data dump operation is performed too frequently, it occupies too many service threads, so that the iops of the foreground user decreases and the processing delay of io increases. Thus, a mechanism is needed that trades off triggering among many factors, such as overload consumption of physical storage capacity, excessive data recovery time, and impact on cluster traffic. The request threshold condition is set to 128,000 requests in the embodiment of the present application. This value is set based on the following considerations:
Through statistics from actual running tests, the average INCREMENTAL CHUNK ENTRY size is about 25 KiB, so 128,000 requests amount to about 128,000 × 25 KiB = 3.2 GiB of data in total, including the fixed-length and variable-length portions, whose total average size is about 0.5 GiB. The worst case requires reading 3.2 GiB from the INCREMENTAL CHUNKs to recover the index data; taking the NVMe disk P4610 as an example, its large-block sequential read bandwidth is 3,200 MB per second, so reading 3.2 GiB takes about 1 second. Considering that multiple INCREMENTAL CHUNKs may be distributed on different NVMe disks, that network transmission follows the read, and that operations such as entry out-of-order identification, order-preserving processing, and updating the ART tree and variable-length kv memory data are performed after local decoding, the whole process is estimated to take 2 seconds; in the average case 1.6 GiB of data is read, so the average expected recovery time is about 1 second. At the same time, a space-recovery yield of 3.2 GiB / 0.5 GiB = 6.4 times is also acceptable.
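The arithmetic behind the 128,000-request threshold can be checked as follows. The figures come from the paragraph above; note the text rounds 128,000 × 25 KiB (3,200,000 KiB) to "3.2 GiB", and the bandwidth is taken as 3,200 MB/s.

```python
# Rough estimate behind the 128,000-request threshold (figures from the text).
entry_kib = 25
requests = 128_000
total_bytes = requests * entry_kib * 1024      # 3,276,800,000 bytes (~"3.2 GiB")
read_bw = 3200 * 1000 * 1000                   # ~3,200 MB/s sequential read
worst_case_s = total_bytes / read_bw           # ~1.02 s worst-case raw read time
avg_case_s = worst_case_s / 2                  # ~0.51 s for the 1.6 GiB average case
space_yield = 3.2 / 0.5                        # incremental data vs full dump size
```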
In an alternative embodiment, once a full data dump is completed, the index data stored in the INCREMENTAL CHUNKs may be cleaned. Because the index data stores metadata information of the user data, and the metadata occupies a far lower proportion of the underlying physical storage space than the user data itself, the cleaning policy used for user data, namely moving part of the valid data to other chunks in order to release the physical space of a whole chunk, is not adopted; instead, a chunk is deleted only when all of the index data in the whole chunk can be cleaned.
As shown in fig. 7, consider the entries in INCREMENTAL CHUNK1 (including a header, entry_1 (entry 1), entry_2, ..., entry_n and a footer) and INCREMENTAL CHUNK2 (including a header, entry_n+1 (entry n+1), entry_n+2, ..., entry_n+m and a footer). When the full data dump with epoch 99 is completed, all entries with sub version <= 99, i.e. entry_1 (entry 1), entry_2, ..., entry_n in INCREMENTAL CHUNK1 and entry_n+1 (entry n+1, corresponding sub version = 98) and entry_n+2 (entry n+2, corresponding sub version = 99) in INCREMENTAL CHUNK2, have been dumped into FULL CHUNK1 (i.e. entry_99_1, entry_99_2, ..., entry_99_n in fig. 7). At this time, the INCREMENTAL CHUNK ENTRYs with sub version <= 99 can be deleted; that is, as can be seen from fig. 7, the entry with sub version = 99 is stored in INCREMENTAL CHUNK2, and the older INCREMENTAL CHUNK1 can be deleted completely.
At the same time, the FULL CHUNKs also need to be cleaned. As shown in fig. 7, for FULL CHUNK1 (including a header, entry_99_1, entry_99_2, ..., entry_99_n and a footer) and FULL CHUNK2 (including a header, entry_377_2, entry_666_0, ...), after the full data dump with epoch 666 (i.e. completion = true, epoch = 666) is completed in FULL CHUNK2, the index data of the previous full data dump can be deleted; as can be seen from fig. 7, FULL CHUNK1 can be deleted completely.
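The whole-chunk deletion policy above can be sketched as follows, for illustration only; the function name and the mapping of chunk id to entry sub versions are assumed shapes, not part of the embodiment.

```python
def removable_incremental_chunks(chunks, dump_epoch):
    """After a full data dump with the given epoch completes, an INCREMENTAL
    CHUNK may be deleted only when every entry in it has sub version <= epoch;
    valid entries are never moved out to free a partially cleanable chunk.

    chunks: mapping of chunk id -> list of entry sub versions (hypothetical).
    Returns the ids of chunks that can be deleted completely.
    """
    return [cid for cid, versions in chunks.items()
            if versions and max(versions) <= dump_epoch]
```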
In an alternative embodiment, various statistics are recorded, and the information requiring statistics includes:
1) Memory usage:
A. The number of each kind of node in the ART tree, including inner nodes and leaf nodes, from which the memory capacity used by the ART tree as a whole can be calculated.
B. The memory capacity used by the variable-length kv memory structure.
2) Storage medium usage:
A. The number of INCREMENTAL CHUNKs used, and the chunk info of each chunk, including the chunk id, the total chunk size and the actually written size.
B. Information of the FULL CHUNKs, in the same form as for the INCREMENTAL CHUNKs.
3) Other runtime information:
A. The sub version and the committed version.
B. Whether a full data dump is currently being performed, and information about the full data dump in progress, including the epoch and the number of completed dump ops.
C. Information about historically triggered full data dumps, including the epoch, the number of dump ops, and whether it was a complete full data dump.
Recording these various statistics facilitates subsequent anomaly localization, analysis, and the like.
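For illustration only, the statistics enumerated above could be grouped into a structure like the following; all field names are hypothetical and simply mirror the list.

```python
from dataclasses import dataclass, field

@dataclass
class ChunkInfo:
    """Per-chunk info: chunk id, total size, and the actually written size."""
    chunk_id: int
    total_size: int
    written_size: int

@dataclass
class RuntimeStats:
    """Statistics recorded for anomaly localization (names are illustrative)."""
    # 1) memory usage
    art_inner_nodes: int = 0
    art_leaf_nodes: int = 0
    varlen_kv_bytes: int = 0
    # 2) storage medium usage
    incremental_chunks: list = field(default_factory=list)  # list[ChunkInfo]
    full_chunks: list = field(default_factory=list)         # list[ChunkInfo]
    # 3) other runtime information
    sub_version: int = 0
    committed_version: int = 0
    dump_in_progress: bool = False
    dump_epoch: int = 0
    dump_ops_done: int = 0
```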
In an alternative embodiment, accurate writing of the index cache information (i.e. the metadata information of the block) may be implemented based on the above-mentioned fixed-length kv memory structure, as shown in fig. 8, and the specific flow is described below:
Suppose the front end submits 3 write ios in sequence. When the three write ios are submitted to the index cache module, the index cache module allocates each of them a version number that is unique within the data slice, and the version is persisted to the storage medium. In the INCREMENTAL CHUNK, entry_1, entry_2, and entry_n correspond to the 3 write ios.
When the INCREMENTAL CHUNK persistence is completed, the corresponding write io proceeds to the second step, updating the ART tree. The ART tree shown in fig. 8 includes a root, inner node1, inner node2 and leaf nodes 1 to 5. As shown in fig. 8, entry_1, entry_2, and entry_n correspond to leaf_1, leaf_2, and leaf_3 in the ART tree. The update operations are carried out strictly in the order of the assigned versions, so that the index data in the ART tree is updated strictly in the operation order submitted by the user, preventing new data submitted later from being overwritten by old data submitted earlier.
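The two-step flow above (allocate a version at submission, then apply updates strictly in version order once persistence completes) can be sketched as follows. The class and method names are hypothetical, and a plain dict stands in for the real ART tree.

```python
import bisect
import itertools

class IndexCache:
    """Sketch of the write path: allocate a per-slice version at submission,
    then, as persistence completions arrive (possibly out of order), apply
    updates to the index strictly in version order."""

    def __init__(self):
        self._next_version = itertools.count(1)
        self._applied = 0          # highest version applied to the index
        self._pending = []         # persisted but not yet applied, sorted
        self.art = {}              # stand-in for the ART tree

    def submit(self, key, value):
        version = next(self._next_version)  # unique within the data slice
        # ... here (key, value, version) would be persisted to an
        # INCREMENTAL CHUNK as an entry ...
        return version

    def on_persisted(self, version, key, value):
        bisect.insort(self._pending, (version, key, value))
        # apply strictly in version order so that old data submitted earlier
        # can never overwrite new data submitted later
        while self._pending and self._pending[0][0] == self._applied + 1:
            _, k, v = self._pending.pop(0)
            self.art[k] = v
            self._applied += 1
```

Even if the persistence completion for a later version arrives first, its update is held in `_pending` until every earlier version has been applied.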
The data dumping method provided by the embodiment of the application comprises: if the current state of a target system meets the preset conditions of a full data dump, scanning a plurality of first adaptive radix trees to obtain first data information, wherein each first adaptive radix tree is constructed from the first metadata information corresponding to one data slice, the first metadata information is a fixed-length key value pair, and the first data information at least comprises the metadata information corresponding to the leaf nodes in the plurality of first adaptive radix trees; scanning a plurality of first associated containers to obtain second data information, wherein each first associated container stores the second metadata information of one data slice, the second data information at least comprises the second metadata information, and the second metadata information is a variable-length key value pair; and dumping the first data information and the second data information to a target storage medium. The problem of low data dumping efficiency caused by sequentially searching the metadata information in each layer from top to bottom during data dumping is thereby solved. In this scheme, the adaptive radix tree is used as the memory structure of the fixed-length metadata information, and the associated container is used as the memory structure of the variable-length metadata information; when a full data dump is carried out, the adaptive radix trees and the associated containers enable rapid full data scanning without searching the metadata information in each layer from top to bottom in sequence, thereby achieving the effect of improving the efficiency of data dumping.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system, such as with a set of computer-executable instructions, and that, although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.
The embodiment of the application also provides a data dump device, and the data dump device of the embodiment of the application can be used for executing the data dump method provided by the embodiment of the application. The following describes a data dump device provided by an embodiment of the present application.
FIG. 9 is a schematic diagram of a data dump device according to an embodiment of the application. As shown in fig. 9, the apparatus includes a first scanning unit 901, a second scanning unit 902, and a first storage unit 903.
The first scanning unit 901 is configured to scan a plurality of first adaptive radix trees to obtain first data information if a current state of the target system meets a preset condition of full-scale data dump, where each first adaptive radix tree is constructed by first metadata information corresponding to one data slice, the first metadata information is a fixed-length key value pair, and the first data information at least includes metadata information corresponding to leaf nodes in the plurality of first adaptive radix trees;
the second scanning unit 902 is configured to scan the plurality of first associated containers to obtain second data information, where each first associated container stores second metadata information of one data slice, the second data information at least includes second metadata information, and the second metadata information is a variable-length key value pair;
A first storage unit 903 for dumping the first data information and the second data information to the target storage medium.
According to the data dumping device provided by the embodiment of the application, if the current state of a target system meets the preset conditions of a full data dump, the first scanning unit 901 scans a plurality of first adaptive radix trees to obtain first data information, wherein each first adaptive radix tree is constructed from the first metadata information corresponding to one data slice, the first metadata information is a fixed-length key value pair, and the first data information at least comprises the metadata information corresponding to the leaf nodes in the plurality of first adaptive radix trees; the second scanning unit 902 scans a plurality of first associated containers to obtain second data information, wherein each first associated container stores the second metadata information of one data slice, the second data information at least comprises the second metadata information, and the second metadata information is a variable-length key value pair; and the first storage unit 903 dumps the first data information and the second data information to a target storage medium. The problem of low data dumping efficiency caused by sequentially searching the metadata information in each layer from top to bottom during dumping is thereby solved. In this scheme, the adaptive radix tree is used as the memory structure of the fixed-length metadata information, and the associated container is used as the memory structure of the variable-length metadata information; when a full data dump is carried out, the adaptive radix trees and the associated containers enable rapid full data scanning without searching the metadata information in each layer from top to bottom in sequence, thereby achieving the effect of improving the efficiency of data dumping.
Optionally, in the data dumping device provided by the embodiment of the application, the first scanning unit comprises a scanning subunit, a determining subunit and a processing subunit, wherein the scanning subunit is used for scanning the plurality of first self-adaptive radix trees to obtain metadata information corresponding to leaf nodes in the plurality of first self-adaptive radix trees, node types of the leaf nodes and common prefix information in upper nodes corresponding to the leaf nodes, the determining subunit is used for determining attribute information of the metadata information corresponding to the leaf nodes according to the node types, the leaf numbers and the common prefix information of the leaf nodes, and the processing subunit is used for obtaining the first data information according to the attribute information and the metadata information corresponding to the leaf nodes.
Optionally, in the data dumping device provided by the embodiment of the application, the device further comprises a third scanning unit, a second dumping unit and a third dumping unit, wherein the third scanning unit is used for scanning a coding table in a target system and a bitmap corresponding to the coding table to obtain third data information before dumping the first data information and the second data information to the target storage medium, the coding table is used for performing compression coding processing on metadata information when constructing a plurality of first adaptive radix trees, the bitmap is used for expressing the use condition of an array in the coding table, and the second dumping unit is used for dumping the third data information to the target storage medium.
Optionally, in the data dumping device provided by the embodiment of the application, the device further comprises a first judging unit, a first reading unit and a second reading unit, wherein the first judging unit is used for judging whether the full data dump is finished through a finish identifier corresponding to the full data dump after the first data information and the second data information are dumped to the target storage medium, the first reading unit is used for reading the first data information in the target storage medium and constructing a plurality of second adaptive radix trees according to the first data information if the full data dump is finished, and the second reading unit is used for reading the second data information in the target storage medium and constructing a plurality of second associated containers according to the second data information.
Optionally, in the data dumping device provided by the embodiment of the application, the device further comprises a third reading unit for reading the newly added metadata information in the added block to obtain first newly added metadata information, a first obtaining unit for obtaining a first version number corresponding to the first newly added metadata information, wherein the first version number is generated when the first newly added metadata information is newly added, a first determining unit for determining whether the first version number is a first expected version number according to a target version number, wherein the target version number is a version number corresponding to the full data dumping, and a first updating unit for updating the plurality of second adaptive base trees according to the first newly added metadata information to obtain updated plurality of second adaptive base trees or updating the plurality of second associated containers according to the first newly added metadata information to obtain updated plurality of second associated containers.
Optionally, in the data dumping device provided by the embodiment of the application, the device further comprises a storing unit, a fourth reading unit and a second updating unit, wherein the storing unit is used for storing the first newly-added metadata information into the target queue if the target version number is not the first expected version number after determining whether the first version number is the first expected version number according to the target version number, the fourth reading unit is used for reading the newly-added metadata information in the added block again to obtain second newly-added metadata information, and the second updating unit is used for updating the plurality of second adaptive base trees according to the second newly-added metadata information if the second version number corresponding to the second newly-added metadata information is the first expected version number to obtain updated plurality of second adaptive base trees or updating the plurality of second associated containers according to the second newly-added metadata information to obtain updated plurality of second associated containers.
Optionally, in the data dumping device provided by the embodiment of the application, the device further comprises a second judging unit, a first executing unit and a third updating unit, wherein the second judging unit is used for judging, after the plurality of second adaptive radix trees are updated according to the second newly added metadata information to obtain the updated plurality of second adaptive radix trees, or the plurality of second associated containers are updated according to the second newly added metadata information to obtain the updated plurality of second associated containers, whether third newly added metadata information exists in the target queue, wherein the version number corresponding to the third newly added metadata information is a second expected version number; the first executing unit is used for, if the third newly added metadata information does not exist, repeatedly executing the step of reading the newly added metadata information in the added block until all newly added metadata information in the added block has been read; and the third updating unit is used for, if the third newly added metadata information exists, updating the updated plurality of second adaptive radix trees according to the third newly added metadata information, or updating the updated plurality of second associated containers according to the third newly added metadata information, until all newly added metadata information in the added block has been read.
Optionally, in the data dumping device provided by the embodiment of the application, the device further comprises a second acquisition unit, a second determination unit and a third judgment unit, wherein the second acquisition unit is used for acquiring historical time information of the last completion of the full data dumping of the target system before scanning the plurality of first self-adaptive radix trees to obtain first data information, the second determination unit is used for determining a time difference value according to the historical time information and the current time information, and the third judgment unit is used for judging whether the current state of the target system meets the preset condition of the full data dumping according to the time difference value and a time threshold condition in the preset condition of the full data dumping.
Optionally, in the data dumping device provided by the embodiment of the application, the third judging unit comprises a first judging subunit, a calculating subunit, an obtaining subunit and a second judging subunit, wherein the first judging subunit is used for judging whether the time difference value is larger than or equal to the time threshold in the time threshold condition; the calculating subunit is used for calculating a first total value of metadata information in the incremental blocks in a target time period if the time difference value is larger than the time threshold, wherein the target time period is from the time point corresponding to the historical time information to the time point of the current time information; the obtaining subunit is used for obtaining a second total value of metadata information counted by a metadata management module in the target system; and the second judging subunit is used for judging whether the current state of the target system meets the preset conditions of the full data dump according to the data relation between the first total value and the second total value and the data relation condition in the preset conditions of the full data dump.
Optionally, in the data dumping device provided by the embodiment of the application, the device further comprises a third acquisition unit and a fourth judgment unit, wherein the third acquisition unit is used for acquiring the number of requests for processing metadata information received by the target system before the plurality of first adaptive radix trees are scanned to obtain the first data information, and the fourth judgment unit is used for judging whether the current state of the target system meets the preset conditions of the full data dump according to the number of requests and the request threshold condition in the preset conditions of the full data dump.
The data dump device includes a processor and a memory, where the first scanning unit 901, the second scanning unit 902, the first storage unit 903, and the like are stored as program units, and the processor executes the program units stored in the memory to implement the corresponding functions.
The processor includes a kernel, and the kernel fetches the corresponding program unit from the memory. One or more kernels may be provided, and rapid full data dumping of the metadata information is realized by adjusting kernel parameters.
The memory may include such forms in computer-readable media as volatile memory, random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
The embodiment of the invention provides a computer readable storage medium, on which a program is stored, which when executed by a processor, implements a data dump method.
The embodiment of the invention provides a processor, which is used for running a program, wherein the program executes a data dump method when running.
As shown in FIG. 10, the embodiment of the invention provides an electronic device, which comprises a processor, a memory and a program stored in the memory and capable of running on the processor, wherein when the processor executes the program, the processor realizes the following steps of scanning a plurality of first self-adaptive radix trees to obtain first data information if the current state of a target system meets the preset condition of full data dumping, wherein each first self-adaptive radix tree is constructed by first metadata information corresponding to one data slice, the first metadata information is a fixed-length key value pair, the first data information at least comprises metadata information corresponding to leaf nodes in the plurality of first self-adaptive radix trees, scanning a plurality of first associated containers to obtain second data information, each first associated container stores second metadata information of one data slice, the second data information at least comprises the second metadata information, the second metadata information is a variable-length key value pair, and dumping the first data information and the second data information to the target storage medium.
Optionally, scanning the plurality of first self-adaptive radix trees to obtain first data information, wherein the scanning of the plurality of first self-adaptive radix trees to obtain metadata information corresponding to leaf nodes in the plurality of first self-adaptive radix trees, node types of the leaf nodes and common prefix information in upper nodes corresponding to the leaf nodes, determining attribute information of the metadata information corresponding to the leaf nodes according to the node types, the leaf numbers and the common prefix information of the leaf nodes, and obtaining the first data information according to the attribute information and the metadata information corresponding to the leaf nodes.
Optionally, before the first data information and the second data information are dumped to the target storage medium, the method further comprises the steps of scanning a coding table in the target system and a bitmap corresponding to the coding table to obtain third data information, wherein the coding table is used for carrying out compression coding processing on the metadata information when a plurality of first adaptive radix trees are constructed, the bitmap is used for expressing the use condition of an array in the coding table, and dumping the third data information to the target storage medium.
Optionally, after the first data information and the second data information are dumped to the target storage medium, the method further comprises: judging whether the full data dump is finished through a finish identifier corresponding to the full data dump; and if the full data dump is finished, reading the first data information in the target storage medium, constructing a plurality of second adaptive radix trees according to the first data information, reading the second data information in the target storage medium, and constructing a plurality of second associated containers according to the second data information.
Optionally, after constructing a plurality of second association containers according to the first data information, the method further includes reading newly added metadata information in the added block to obtain first newly added metadata information, obtaining a first version number corresponding to the first newly added metadata information, wherein the first version number is generated when the first newly added metadata information is newly added, determining whether the first version number is a first expected version number according to a target version number, wherein the target version number is a version number corresponding to the full data dump, and if the first version number is the first expected version number, updating the plurality of second adaptive radix trees according to the first newly added metadata information to obtain updated plurality of second adaptive radix trees, or updating the plurality of second association containers according to the first newly added metadata information to obtain updated plurality of second association containers.
Optionally, after determining whether the first version number is the first expected version number according to the target version number, the method further includes storing first newly added metadata information into the target queue if the target version number is not the first expected version number, reading the newly added metadata information in the added block again to obtain second newly added metadata information, and updating the plurality of second adaptive radix trees according to the second newly added metadata information if the second version number corresponding to the second newly added metadata information is the first expected version number to obtain updated plurality of second adaptive radix trees or updating the plurality of second associated containers according to the second newly added metadata information to obtain updated plurality of second associated containers.
Optionally, after updating the plurality of second adaptive radix trees according to the second newly added metadata information to obtain the updated plurality of second adaptive radix trees, or updating the plurality of second associated containers according to the second newly added metadata information to obtain the updated plurality of second associated containers, the method further comprises judging whether third newly added metadata information exists in the target queue, wherein the version number corresponding to the third newly added metadata information is a second expected version number; if the third newly added metadata information does not exist, the step of reading the newly added metadata information in the added block is repeated until all newly added metadata information in the added block has been read; and if the third newly added metadata information exists, the updated plurality of second adaptive radix trees are updated according to the third newly added metadata information, or the updated plurality of second associated containers are updated according to the third newly added metadata information, until all newly added metadata information in the added block has been read.
Optionally, before the first data information is obtained by scanning the plurality of first adaptive radix trees, the method further includes: obtaining historical time information of the last completed full data dump of the target system; determining a time difference according to the historical time information and current time information; and judging whether the current state of the target system meets the preset condition of the full data dump according to the time difference and a time threshold condition in the preset condition of the full data dump.
Optionally, judging whether the current state of the target system meets the preset condition of the full data dump according to the time difference and the time threshold condition in the preset condition of the full data dump includes: judging whether the time difference is greater than or equal to a time threshold in the time threshold condition; if the time difference is greater than or equal to the time threshold, calculating a first total value of the metadata information in an incremental block within a target time period, where the target time period extends from a time point corresponding to the historical time information to a time point of the current time information; acquiring a second total value of the metadata information counted by a metadata management module in the target system; and judging whether the current state of the target system meets the preset condition of the full data dump according to a data relation between the first total value and the second total value and a data relation condition in the preset condition of the full data dump.
Optionally, before scanning the plurality of first adaptive radix trees to obtain the first data information, the method further includes: obtaining the number of requests for processing the metadata information received by the target system; and judging whether the current state of the target system meets the preset condition of the full data dump according to the number of requests and a request threshold condition in the preset condition of the full data dump.
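One way the three preset conditions above (the time threshold, the data relation between the incremental and total metadata counts, and the request threshold) could be combined into a single trigger check is sketched below. The concrete thresholds, the ratio-based data relation, and all parameter names are assumptions for illustration only:

```python
def should_full_dump(last_dump_ts: float,
                     now_ts: float,
                     incremental_count: int,
                     total_count: int,
                     request_count: int,
                     time_threshold: float = 3600.0,
                     ratio_threshold: float = 0.2,
                     request_threshold: int = 10000) -> bool:
    """Return True if the target system's current state meets the preset
    conditions of the full data dump (illustrative combination)."""
    # Time threshold condition: the time difference between the last
    # completed full dump and now must reach the threshold.
    if now_ts - last_dump_ts < time_threshold:
        return False
    # Data relation condition (assumed here to be a ratio): the metadata
    # accumulated in the incremental block since the last dump must be a
    # significant fraction of the total counted by the metadata manager.
    if total_count > 0 and incremental_count / total_count < ratio_threshold:
        return False
    # Request threshold condition: enough metadata-processing requests
    # have been received to make a new baseline worthwhile.
    return request_count >= request_threshold
```

Each guard maps to one of the optional judgments above; a system could equally evaluate them independently, which is why the combination here is hedged as a sketch.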
The device herein may be a server, a PC, a tablet computer (PAD), a mobile phone, or the like.
The application also provides a computer program product adapted to execute, when run on data processing equipment, a program initialized with the following method steps: if the current state of a target system meets a preset condition of a full data dump, scanning a plurality of first adaptive radix trees to obtain first data information, where each first adaptive radix tree is constructed from first metadata information corresponding to one data fragment, the first metadata information is a fixed-length key-value pair, and the first data information includes at least the metadata information corresponding to leaf nodes in the plurality of first adaptive radix trees; scanning a plurality of first association containers to obtain second data information, where each first association container stores second metadata information of one data fragment, the second data information includes at least the second metadata information, and the second metadata information is a variable-length key-value pair; and dumping the first data information and the second data information to a target storage medium.
Optionally, scanning the plurality of first adaptive radix trees to obtain the first data information includes: scanning the plurality of first adaptive radix trees to obtain the metadata information corresponding to the leaf nodes in the plurality of first adaptive radix trees, node types of the leaf nodes, and common prefix information in upper-level nodes corresponding to the leaf nodes; determining attribute information of the metadata information corresponding to the leaf nodes according to the node types, the leaf node numbers, and the common prefix information of the leaf nodes; and obtaining the first data information according to the attribute information and the metadata information corresponding to the leaf nodes.
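A much-simplified scan in the spirit of the step above can be illustrated with a toy radix tree: inner nodes carry a common prefix, leaves carry the remaining key portion and the metadata, and reconstructing each full key from the accumulated prefixes yields the (key, metadata) pairs that make up the dumped data. The dict-based node layout is an assumption for illustration, not the patent's actual adaptive radix tree layout:

```python
def scan(node, prefix=""):
    """Depth-first scan; yields (full_key, metadata) for every leaf,
    rebuilding each key from the common prefixes along the path."""
    prefix = prefix + node.get("prefix", "")
    if "value" in node:                      # leaf node
        yield prefix, node["value"]
        return
    for child in node.get("children", []):   # inner node
        yield from scan(child, prefix)

# Toy tree: "obj" is a common prefix shared by all keys below it.
tree = {
    "prefix": "obj",
    "children": [
        {"prefix": "/a", "value": "meta-a"},
        {"prefix": "/b", "children": [
            {"prefix": "/1", "value": "meta-b1"},
        ]},
    ],
}
```

In a real adaptive radix tree the node type (Node4/Node16/Node48/Node256 in the usual ART formulation) would additionally tell the scanner how the children are laid out, which is the role of the "node types" in the step above.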
Optionally, before the first data information and the second data information are dumped to the target storage medium, the method further includes: scanning a coding table in the target system and a bitmap corresponding to the coding table to obtain third data information, where the coding table is used for performing compression coding on the metadata information when the plurality of first adaptive radix trees are constructed, and the bitmap is used for representing the usage of an array in the coding table; and dumping the third data information to the target storage medium.
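The coding-table-plus-bitmap idea can be sketched as follows: the table maps repeated strings to small integer codes used while building the trees, and the bitmap records which slots of the table's backing array are in use, so both can be dumped alongside the data as the third data information. Every detail here (slot allocation, the dump layout, all names) is assumed for illustration:

```python
class CodingTable:
    def __init__(self, capacity: int = 64):
        self.slots = [None] * capacity   # backing array of decoded strings
        self.bitmap = 0                  # bit i set <=> slots[i] is in use
        self.codes = {}                  # string -> slot index (the code)

    def encode(self, s: str) -> int:
        """Return the compact code for s, allocating a slot if needed."""
        if s in self.codes:
            return self.codes[s]
        i = self.slots.index(None)       # first free slot in the array
        self.slots[i] = s
        self.bitmap |= 1 << i            # mark the slot as used
        self.codes[s] = i
        return i

    def dump(self) -> dict:
        # Third data information: the used entries plus the usage bitmap,
        # enough to rebuild the table on recovery.
        return {"entries": {i: s for i, s in enumerate(self.slots)
                            if self.bitmap >> i & 1},
                "bitmap": self.bitmap}
```

Dumping the bitmap together with the entries lets recovery distinguish genuinely empty slots from slots that merely happen to decode to an empty value.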
Optionally, after the first data information and the second data information are dumped to the target storage medium, the method further includes: judging, by a completion identifier corresponding to the full data dump, whether the full data dump has finished; if the full data dump has finished, reading the first data information in the target storage medium and constructing a plurality of second adaptive radix trees according to the first data information; and reading the second data information in the target storage medium and constructing a plurality of second association containers according to the second data information.
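The recovery step above (only a dump bearing its completion identifier is read back to rebuild the in-memory structures) can be sketched minimally, with plain dicts standing in for the second adaptive radix trees and second association containers; the dump layout and all names are assumptions:

```python
def recover(dump: dict):
    """Rebuild in-memory structures from a dumped snapshot, but only if
    the dump carries its completion identifier."""
    if not dump.get("complete"):
        # Completion identifier absent: the dump may be partial and
        # cannot be trusted as a baseline.
        return None
    trees = dict(dump["first_data"])        # fixed-length key-value pairs
    containers = dict(dump["second_data"])  # variable-length key-value pairs
    return trees, containers
```

Writing the completion identifier last makes the check safe against a crash mid-dump: an interrupted dump never presents itself as complete.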
Optionally, after constructing the plurality of second association containers according to the second data information, the method further includes: reading newly added metadata information in an added block to obtain first newly added metadata information; obtaining a first version number corresponding to the first newly added metadata information, where the first version number is generated when the first newly added metadata information is added; determining whether the first version number is a first expected version number according to a target version number, where the target version number is a version number corresponding to the full data dump; and, if the first version number is the first expected version number, updating the plurality of second adaptive radix trees according to the first newly added metadata information to obtain an updated plurality of second adaptive radix trees, or updating the plurality of second association containers according to the first newly added metadata information to obtain an updated plurality of second association containers.
Optionally, after determining whether the first version number is the first expected version number according to the target version number, the method further includes: storing the first newly added metadata information into a target queue if the first version number is not the first expected version number; reading the newly added metadata information in the added block again to obtain second newly added metadata information; and, if a second version number corresponding to the second newly added metadata information is the first expected version number, updating the plurality of second adaptive radix trees according to the second newly added metadata information to obtain an updated plurality of second adaptive radix trees, or updating the plurality of second association containers according to the second newly added metadata information to obtain an updated plurality of second association containers.
Optionally, after updating the plurality of second adaptive radix trees according to the second newly added metadata information to obtain the updated plurality of second adaptive radix trees, or updating the plurality of second association containers according to the second newly added metadata information to obtain the updated plurality of second association containers, the method further includes: judging whether third newly added metadata information exists in the target queue, where a version number corresponding to the third newly added metadata information is a second expected version number; if the third newly added metadata information does not exist, repeating the step of reading the newly added metadata information in the added block until all the newly added metadata information in the added block has been read; and if the third newly added metadata information exists, updating the updated plurality of second adaptive radix trees according to the third newly added metadata information, or updating the updated plurality of second association containers according to the third newly added metadata information, and repeating the step of reading the newly added metadata information in the added block until all the newly added metadata information in the added block has been read.
Optionally, before the first data information is obtained by scanning the plurality of first adaptive radix trees, the method further includes: obtaining historical time information of the last completed full data dump of the target system; determining a time difference according to the historical time information and current time information; and judging whether the current state of the target system meets the preset condition of the full data dump according to the time difference and a time threshold condition in the preset condition of the full data dump.
Optionally, judging whether the current state of the target system meets the preset condition of the full data dump according to the time difference and the time threshold condition in the preset condition of the full data dump includes: judging whether the time difference is greater than or equal to a time threshold in the time threshold condition; if the time difference is greater than or equal to the time threshold, calculating a first total value of the metadata information in an incremental block within a target time period, where the target time period extends from a time point corresponding to the historical time information to a time point of the current time information; acquiring a second total value of the metadata information counted by a metadata management module in the target system; and judging whether the current state of the target system meets the preset condition of the full data dump according to a data relation between the first total value and the second total value and a data relation condition in the preset condition of the full data dump.
Optionally, before scanning the plurality of first adaptive radix trees to obtain the first data information, the method further includes: obtaining the number of requests for processing the metadata information received by the target system; and judging whether the current state of the target system meets the preset condition of the full data dump according to the number of requests and a request threshold condition in the preset condition of the full data dump.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or nonvolatile memory, such as Read-Only Memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, Phase-Change Memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other memory technology, Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises that element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.