Disclosure of Invention
Embodiments of the present specification provide a cache cleaning method for random small IOs that aims to address one or more of the above problems and other potential problems.
In order to achieve the above object, the following technical solutions are provided:
According to a first aspect of the present disclosure, there is provided a method for cleaning a cache for random small IOs, the method comprising:
obtaining the dirty data percentages of the cache nodes under a plurality of blocks, wherein the blocks are divided by a back-end device, the blocks are managed by a plurality of block groups, and the number of block groups is smaller than the number of blocks;
moving each block under the block group corresponding to its dirty data threshold range based on the block's dirty data percentage, such that the dirty data percentage of each block falls within the dirty data threshold range of the block group managing it;
traversing the block groups in order of dirty data threshold range from high to low and searching for cleanable blocks to perform cache cleaning, wherein during cleaning it is judged whether the cache nodes under a found block need hole filling, and if so, hole filling is performed before the cleaning process starts.
According to the cache cleaning method for random small IOs of the embodiments of this specification, the blocks are organized and ordered based on the amount of dirty data in the cache nodes under each block, so that relatively contiguous dirty data to be cleaned can be found efficiently and quickly, in order of dirty data amount from high to low. This shortens the addressing distance of IOs during cleaning and reduces disk seek time. Meanwhile, cache nodes are hole-filled when filling is needed, which reduces the number of IOs and avoids the disk small-write (read-modify-write) penalty, so cache cleaning is more efficient, the cache is less likely to be exhausted, and the IO performance of the whole system is guaranteed.
In some embodiments, traversing the block groups in order of dirty data threshold range from high to low and searching for cleanable blocks to perform cache cleaning comprises:
traversing the block groups in order of dirty data threshold range from high to low to find cleanable blocks;
searching for cache nodes to be cleaned in a found block, and when the number of cache nodes to be cleaned is greater than the single maximum cleaning number, cleaning the dirty data of those cache nodes and continuing to search for other cleanable blocks after cleaning until the cleaning is finished, wherein a cleaned block is moved under the corresponding block group based on its new dirty data percentage;
and when the number of cache nodes to be cleaned is less than or equal to the single maximum cleaning number, continuing to search for other cleanable blocks, and repeating the cleaning process until the traversal is finished.
In some embodiments, after the traversal is finished, it is judged whether any cache node to be cleaned remains; if so, the dirty data of that cache node is cleaned; otherwise, the cleaning process ends.
In some embodiments, the traversal and cleaning process is started on a timer.
In some embodiments, judging whether a cache node under a found block needs hole filling includes: calculating the hole value and hole number of the cache node; calculating the IO time consumption of the cache node before and after hole filling; taking the IO count at which the IO time consumption after hole filling becomes shorter as the hole filling threshold; and determining that the cache node needs hole filling when the IO count of the cache node is greater than or equal to the hole filling threshold.
In some embodiments, in a redundancy-free array structure formed by $n$ disks, the IO time consumption $S$ before hole filling, for $m$ small IOs, is given by:

$$S = m\,(T_s + T_n + T_d + T_t)$$

where $T_s$ denotes the average seek time, $T_n$ denotes the synchronization overhead time of the $n$ disks, $T_d$ denotes the average delay time, and $T_t$ denotes the data transmission time.
In some embodiments, in a RAID5 array structure formed by $n+1$ disks, the IO time consumption before hole filling is given by:

$$S_{\min} = 2(m+1)\,(T_s + T_n + T_d + T_{4K}) + 2m\,T_{xor}$$

$$S_{\max} = 4m\,(T_s + T_n + T_d + T_{4K}) + 2m\,T_{xor}$$

where $S_{\min}$ denotes the minimum IO time consumption before hole filling, i.e., the IO time consumption under the optimal distribution of the $m$ IOs; $S_{\max}$ denotes the maximum IO time consumption before hole filling, i.e., the IO time consumption under the worst distribution of the $m$ IOs; $m$ denotes the IO count; $T_{4K}$ denotes the transmission time of 4 kilobytes of data; and $T_{xor}$ denotes the time consumed by an exclusive-or calculation.
In some embodiments, the IO time consumption after hole filling is given by:

$$S' = T_s + T_n + T_d + T_{\frac{1M}{n}} + T_{xor}$$

where $S'$ denotes the time consumption of the 1 MB IO after hole filling, $T_s$ denotes the average seek time, $T_n$ denotes the synchronization overhead time of the $n$ disks, $T_d$ denotes the average delay time, $T_{\frac{1M}{n}}$ denotes the transmission time of $\frac{1M}{n}$ bytes of data, and $T_{xor}$ denotes the time consumed by an exclusive-or calculation.
In some embodiments, the hole filling includes reading the data in the hole space of the cache node into memory.
Detailed Description
Preferred embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present description are shown in the drawings, it should be understood that the present description may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The term "comprising" and variations thereof as used herein means open ended, i.e., "including but not limited to. The term "or" means "and/or" unless specifically stated otherwise. The term "based on" means "based at least in part on". The terms "one example embodiment" and "one embodiment" mean "at least one example embodiment. The term "another embodiment" means "at least one additional embodiment". The terms "upper," "lower," "front," "rear," and the like, as used herein, refer to a position or a positional relationship based on the orientation or positional relationship shown in the drawings, and are merely for convenience in describing the principles of the present specification, and do not indicate or imply that the elements referred to must have a particular orientation, be configured or operated in a particular orientation, and thus should not be construed as limiting the present specification.
Caching is a technique that meets the access requirements of slow storage devices by storing data on a device that can be accessed faster than the original data source. When the system needs to read data, it first checks whether the data exists in the cache. If it does (a cache hit), the data is read directly from the cache, which is faster than reading from the original data source. If it does not (a cache miss), the system reads the data from the original data source and stores it in the cache for future use. However, the storage capacity of the cache is small compared with that of the back-end original data source. In a random small-IO environment in particular, data on the storage medium is accessed in random order, future IO requests are hard to predict and cache effectively, the cache hit rate is low, and the cache capacity is easily exhausted. Small IOs access relatively small data blocks and require frequent addressing and data transfers, so the number of IO requests grows rapidly, further accelerating cache exhaustion; and once the cache is exhausted, the IO performance of the whole system is limited. Therefore, the embodiments of the present disclosure provide an efficient cache cleaning method, and the cache cleaning method for random small IOs in the embodiments of the present disclosure is described in detail below with reference to the accompanying drawings.
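By way of illustration, the read path just described can be sketched in a few lines of Python. This is a minimal toy model, and all names (read, cache, data_source) are chosen here for illustration only:

```python
def read(key, cache, data_source):
    if key in cache:          # cache hit: served at memory speed
        return cache[key]
    value = data_source[key]  # cache miss: go to the slow back-end source
    cache[key] = value        # keep a copy for future accesses
    return value

# Usage: a dict stands in for both tiers in this toy example.
backend = {"lba-42": b"payload"}
cache = {}
assert read("lba-42", cache, backend) == b"payload"  # miss, then populated
assert "lba-42" in cache                             # next read is a hit
```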
The method of the embodiment of the specification comprises the following steps:
S1, acquiring the dirty data percentages of the cache nodes under a plurality of blocks;
S3, moving each block under the block group corresponding to its dirty data threshold range based on the block's dirty data percentage;
S5, traversing the block groups in order of dirty data threshold range from high to low, and searching for cleanable blocks to perform cache cleaning.
In the embodiments of the present disclosure, the block is used as the storage management unit. The blocks in step S1 are divided by the back-end device according to a certain size; for example, with 64 MB as one block, each block consists of 64 cache nodes of 1 MB. Each block corresponds to a piece of block metadata, which records the dirty data condition of the cache nodes under that block. Dirty data in a cache node is data stored in the cache node that is inconsistent with the actual data in the original data source, or that is expired, erroneous, or meaningless, and that must be cleaned first. The dirty data percentage is the percentage of the amount of dirty data in a block relative to the total data storage of the block, and reflects the block's dirty data condition. The blocks are managed by a plurality of block groups; specifically, each block group manages its blocks through a block chain, so that the blocks mounted on a block chain are arranged in time order. The number of block groups is smaller than the number of blocks, i.e., the number of block chains is smaller than the number of blocks, and each block chain corresponds to a different dirty data threshold range. For example, as shown in fig. 2, 11 block groups are set, whose block chains correspond respectively to the dirty data threshold ranges [0%,0%], [0%,10%], [10%,20%], and so on, each subsequent range increasing by 10%, until the dirty data threshold range reaches [90%,100%].
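The organization described above can be sketched as follows. This is a minimal illustrative model, not the embodiment's actual implementation; it reuses the chunk_list name from the example below, while the remaining identifiers (Block, group_for) and the chain indexing are assumptions made here for clarity:

```python
from collections import deque
from dataclasses import dataclass, field

NODE_SIZE = 1 << 20        # 1 MB cache nodes
NODES_PER_BLOCK = 64       # 64 nodes per block => 64 MB blocks

@dataclass(eq=False)       # identity comparison, so a chain can remove a block
class Block:
    # per-node dirty byte counts stand in for the block metadata
    node_dirty: list = field(default_factory=lambda: [0] * NODES_PER_BLOCK)

    def dirty_percentage(self) -> float:
        return 100.0 * sum(self.node_dirty) / (NODE_SIZE * NODES_PER_BLOCK)

# 11 block groups, each managing its blocks through one block chain;
# there are far fewer chains than blocks, as the text requires.
chunk_list = [deque() for _ in range(11)]

def group_for(dirty_pct: float) -> int:
    # One plausible indexing matching the example of fig. 2: a just-cleaned
    # block (0% dirty) sits on chunk_list[0], a block whose dirty data
    # percentage has risen into 10%-20% sits on chunk_list[1], and so on.
    return min(10, int(dirty_pct // 10))
```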
In step S3, each block is mounted under a different block chain based on its dirty data percentage. As the cache is used and cleaned, the dirty data percentage of a block changes, and the block is then moved to the corresponding block chain based on its new dirty data percentage, so that all blocks remain grouped according to their dirty data condition. For example, a just-cleaned block is mounted on the block chain chunk_list[0]; as the cache in the block is used, its dirty data percentage gradually increases, and when it rises into the 10%-20% range the block is moved and mounted on the block chain chunk_list[1]; as the dirty data percentage continues to increase, the block keeps moving to the block chain whose dirty data threshold range contains it.
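Continuing the sketch above, moving a block as its dirty data percentage changes is a remove-and-append on the block chains; appending at the tail keeps each chain in time order, as described earlier. The requeue name and return convention are invented here:

```python
def requeue(block: Block, current_group: int) -> int:
    """Move `block` onto the chain whose dirty data threshold range
    contains its current dirty data percentage."""
    target = group_for(block.dirty_percentage())
    if target != current_group:
        chunk_list[current_group].remove(block)
        chunk_list[target].append(block)
    return target

# Usage: a freshly cleaned block starts on chunk_list[0]; as writes dirty
# ten of its 1 MB cache nodes (~15.6% dirty), it migrates to chunk_list[1].
blk = Block()
chunk_list[0].append(blk)
blk.node_dirty[:10] = [NODE_SIZE] * 10
assert requeue(blk, 0) == 1
```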
The blocks are thus organized according to their dirty data condition, so blocks with more dirty data can be cleaned preferentially during cache cleaning, which shortens the disk addressing distance and seek time during cleaning and speeds up cache cleaning. The cleaning process of step S5 includes the following steps (a sketch in code follows the list):
S51, traversing the block groups in order of dirty data threshold range from high to low to find cleanable blocks;
S53, searching for cache nodes to be cleaned in a found block; when the number of cache nodes to be cleaned is greater than the single maximum cleaning number, cleaning the dirty data of those cache nodes and then continuing to search for other cleanable blocks; when the number of cache nodes to be cleaned is less than or equal to the single maximum cleaning number, continuing to search for other cleanable blocks directly; and repeating this cleaning process until the traversal is finished.
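Again continuing the sketch, steps S51 and S53, together with the post-traversal check of step S55 described below, might look as follows. The single maximum cleaning number and the flush itself are illustrative stand-ins:

```python
MAX_CLEAN_PER_PASS = 16    # the "single maximum cleaning number" (assumed value)

def clean_node(block: Block, i: int):
    block.node_dirty[i] = 0            # stand-in for flushing node i's dirty data

def traverse_and_clean():
    deferred = []                      # nodes found but below the cleaning bar
    # S51: walk the chains from the highest dirty data threshold range down.
    for group in range(len(chunk_list) - 1, -1, -1):
        for block in list(chunk_list[group]):      # copy: blocks get requeued
            nodes = [i for i, d in enumerate(block.node_dirty) if d > 0]
            # S53: clean only when enough nodes are pending, reserving the
            # rest to keep steady pressure on the back-end device.
            if len(nodes) > MAX_CLEAN_PER_PASS:
                for i in nodes:
                    clean_node(block, i)
                requeue(block, group)  # re-mount under its new dirty group
            else:
                deferred.extend((block, i) for i in nodes)
    # S55: after the traversal, clean any cache nodes still to be cleaned,
    # so that none are missed.
    for block, i in deferred:
        clean_node(block, i)
```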
A block's dirty data may include data that has already been cleaned or does not need cleaning, and even a block with a high dirty data percentage may contain cache nodes with little dirty data, so the cache nodes to be cleaned must first be identified in step S53. Requiring the number of cache nodes to be cleaned to exceed the single maximum cleaning number reserves a portion of uncleaned cache nodes, which keeps a certain pressure on the back-end device and lets it deliver its full performance. A cleaned block is moved and mounted under the corresponding block chain based on its new dirty data percentage.
After the traversal ends, step S55 judges whether any cache node to be cleaned remains, to prevent some cache nodes from being missed; if so, the dirty data of those cache nodes is cleaned, otherwise the cleaning process ends. The traversal and cleaning process above is started on a timer, and the traversal may also cover only the block groups with higher dirty data threshold ranges, to improve the efficiency of a single pass.
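The timed start can be as simple as a re-arming timer; the interval below is purely illustrative:

```python
import threading

CLEAN_INTERVAL_SECONDS = 30.0   # assumed period for the timed start

def start_periodic_cleaning():
    traverse_and_clean()        # one full traversal-and-clean pass
    timer = threading.Timer(CLEAN_INTERVAL_SECONDS, start_periodic_cleaning)
    timer.daemon = True         # re-arm so the process starts at regular times
    timer.start()
```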
In step S53, when the dirty data of a cache node to be cleaned is cleaned, it is first judged whether that cache node needs hole filling; if so, hole filling is performed before the cleaning process starts. Hole filling means reading the data of the space corresponding to a hole from the back-end device into memory, so that every data block in the whole cache node space has cached data corresponding to it. After hole filling, many small IOs can be assembled into one large IO, which reduces the number of IOs, avoids the disk small-write (read-modify-write) penalty, and avoids the situation where, because holes exist, dirty data can only be written back as several small IOs, hurting cache cleaning efficiency. Once the number of IOs reaches a certain value, the IO time consumption after hole filling is smaller than that before hole filling, so IO performance improves.
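A sketch of the filling itself, under the 4 KB data-block layout described in step S531 below; read_backend and write_backend are hypothetical callables standing in for the back-end device:

```python
BLOCK_SIZE = 4096              # each data block within a cache node is 4 KB
BLOCKS_PER_NODE = 256          # 256 * 4 KB = one 1 MB cache node

def fill_holes_and_flush(node_data, valid, read_backend, write_backend):
    """Read every hole (invalid 4 KB data block) from the back end into
    memory, then flush the whole node as one large contiguous IO instead
    of many small ones subject to the small-write penalty. Each entry of
    node_data is assumed to hold BLOCK_SIZE bytes once valid."""
    for i in range(BLOCKS_PER_NODE):
        if not valid[i]:                    # a hole: no cached data here
            node_data[i] = read_backend(i)  # fill it from the back-end device
            valid[i] = True
    write_backend(b"".join(node_data))      # a single 1 MB write
```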
Specifically, determining whether a cache node to be cleaned needs hole filling includes the following steps (a sketch follows the list):
S531, calculating the hole value and the hole number of the cache node;
S533, calculating the IO time consumption of the cache node before and after hole filling, and taking the IO count at which the IO time consumption after hole filling becomes shorter as the hole filling threshold;
S535, when the IO count of the cache node is greater than or equal to the hole filling threshold, determining that the cache node needs hole filling.
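Put together, the decision reduces to a small predicate over the node's validity bitmap. The run-counting convention here (an IO is a maximal run of consecutive valid data blocks) is an assumption made for illustration:

```python
def needs_hole_filling(valid, fill_threshold: int) -> bool:
    """Steps S531-S535 as one predicate: count the IOs in the node and
    compare against the hole filling threshold (80 in the 4+1 example)."""
    io_count = sum(1 for i, v in enumerate(valid)
                   if v and (i == 0 or not valid[i - 1]))
    has_hole = not all(valid)           # at least one invalid data block
    return has_hole and io_count >= fill_threshold
```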
In step S531, as shown in fig. 3, each cache node corresponds to a contiguous 1 MB space of the back-end device and is divided into 256 data blocks of 4 KB. During caching, data is mapped in units of data blocks, and each mapped data block is allocated memory in one-to-one correspondence. A cache node comprises metadata and block indexes; the metadata comprises a status bitmap representing the status of the block indexes and a node index, and each block index records the memory page address corresponding to its data block. Each data block indicates through a valid status bit whether it is valid, i.e., whether there is cached data in the data block's space. Based on the valid bit of each data block, and the valid bit and dirty bit of each sector in the bitmap, the IO count and the hole condition in each cache node can be calculated.
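A bitmap scan of that kind might be implemented as below; hole_statistics and its return convention (IO count, hole count, and total invalid blocks as the "hole value") are illustrative names, not the embodiment's:

```python
def hole_statistics(valid):
    """Scan one cache node's 256-entry validity bitmap and return
    (io_count, hole_count, hole_blocks): the number of maximal runs of
    valid data blocks (IOs), of invalid runs (holes), and the total
    number of invalid 4 KB blocks (the hole value)."""
    io_count = hole_count = hole_blocks = 0
    prev = None
    for v in valid:
        if v and prev is not True:      # a new run of valid blocks begins
            io_count += 1
        if not v:
            hole_blocks += 1
            if prev is not False:       # a new hole begins
                hole_count += 1
        prev = bool(v)
    return io_count, hole_count, hole_blocks

# Example: [data][hole of 2][data...] -> 2 IOs, 1 hole spanning 2 blocks.
bitmap = [True, False, False, True] + [True] * 252
assert hole_statistics(bitmap) == (2, 1, 2)
```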
The time consumption of a disk IO consists mainly of seek time, delay time, and transmission time; for small IOs the transmission time is negligible, so the seek time and delay time dominate the calculation. For example, in a redundancy-free array structure formed by $n$ disks, the IO time consumption $S$ before hole filling, for $m$ small IOs, is given by:

$$S = m\,(T_s + T_n + T_d + T_t)$$

where $T_s$ denotes the average seek time, $T_n$ denotes the synchronization overhead time of the $n$ disks, $T_d$ denotes the average delay time, and $T_t$ denotes the data transmission time.
For a RAID5 array formed by $n+1$ disks, $m$ small-write operations incur the read-modify-write penalty. Under the optimal distribution of the $m$ IOs, the host needs to perform $m+1$ read operations, $m+1$ write operations, and $2m$ exclusive-or operations; under the worst distribution, the host needs to perform $2m$ read operations, $2m$ write operations, and $2m$ exclusive-or operations. Therefore, in a RAID5 array structure formed by $n+1$ disks, if the $m$ IOs are uniformly distributed over the data disks of the array and each IO is no larger than the minimum stripe processing unit of 4 KB, the IO time consumption before hole filling is given by:

$$S_{\min} = 2(m+1)\,(T_s + T_n + T_d + T_{4K}) + 2m\,T_{xor}$$

$$S_{\max} = 4m\,(T_s + T_n + T_d + T_{4K}) + 2m\,T_{xor}$$

where $S_{\min}$ denotes the minimum IO time consumption before hole filling, i.e., the IO time consumption under the optimal distribution of the $m$ IOs; $S_{\max}$ denotes the maximum IO time consumption before hole filling, i.e., the IO time consumption under the worst distribution; $m$ denotes the IO count; $T_{4K}$ denotes the transmission time of 4 kilobytes of data; and $T_{xor}$ denotes the time consumed by an exclusive-or calculation.
The IO time consumption after hole filling is given by:

$$S' = T_s + T_n + T_d + T_{\frac{1M}{n}} + T_{xor}$$

where $S'$ denotes the time consumption of the 1 MB IO after hole filling, $T_s$ denotes the average seek time, $T_n$ denotes the synchronization overhead time of the $n$ disks, $T_d$ denotes the average delay time, $T_{\frac{1M}{n}}$ denotes the transmission time of $\frac{1M}{n}$ bytes of data, and $T_{xor}$ denotes the time consumed by an exclusive-or calculation.
The hole filling threshold in step S533 is calculated as follows. Assume that the IO after hole filling consumes less time, namely:

$$S' \le S_{\min}$$

Expanding both sides gives the decomposition:

$$T_s + T_n + T_d + T_{\frac{1M}{n}} + T_{xor} \le 2(m+1)\,(T_s + T_n + T_d + T_{4K}) + 2m\,T_{xor}$$

Because the $m$ IOs are distributed within the same 1 MB space, they require only one seek, so the average seek time can be ignored, and the inequality can be solved for $m$. In a common array of 4+1 disks, with the corresponding values of $T_{4K}$ and $T_{xor}$, the inequality is satisfied once the IO count reaches 80; that is, the hole filling threshold is 80, and when the IO count is greater than or equal to 80, the IO time consumption after hole filling is guaranteed to be shorter than that before hole filling.
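Mechanically, the threshold is the smallest $m$ satisfying the inequality, and the helper below searches for it under whatever timing parameters one assumes. Both the formula reconstruction and any timings fed in are assumptions; the embodiment's own parameter values, which yield 80 for a 4+1 array, are not stated here:

```python
def hole_fill_threshold(t_n, t_d, t_4k, t_1m_over_n, t_xor):
    """Smallest m for which one filled 1 MB flush (S') is no slower than
    m RAID5 small writes under the best-case formula (S_min). Seek time
    is ignored on both sides, since the m IOs lie within one 1 MB region.
    All timing arguments are assumed, illustrative values."""
    after = t_n + t_d + t_1m_over_n + t_xor            # S' without T_s
    for m in range(1, 257):                            # at most 256 IOs per node
        before = 2 * (m + 1) * (t_n + t_d + t_4k) + 2 * m * t_xor
        if after <= before:
            return m                                   # hole filling pays off here
    return None
```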
While several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present description. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims.
The embodiments of the present specification have been described above. The above description is illustrative, not exhaustive, and is not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the embodiments described. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or technical improvements over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.