Background
Fundamentally, data storage in NAND flash memory is achieved by retaining a certain number of electrons in the most basic memory cells, flash cells (R.Bez, E.Camerlenghi, A.Modelli, and A.Viscon. introduction to flash memory of the IEEE,91(4): 489-materials 502, 2003). There are three primary operations in flash memory: read, write, and erase, where read and write are in page units (a page is made up of multiple flash cells), and erase operations are in block units (each block contains multiple pages). Since the Flash Memory cells storing information cannot be rewritten again, a remote update method (k.zhou, s.hu, p.huang, and y.zhao.lx-SSD: Enhancing the life span of NAND Flash-Based Memory video Recycling Invalid pages. in proc.of IEEE MSST,2017) is adopted in the Flash Memory to write the newly updated data into a new free page, and the page storing the old data is set as Invalid. Since invalid pages cannot be directly rewritten, when the number of invalid pages in the flash memory increases and the available free space falls below a given threshold, the flash memory controller triggers a garbage collection operation to free up the space occupied by the invalid pages. The garbage recycling operation is mainly divided into three steps: firstly, selecting a recovered target block; secondly, the effective page data in the target block is migrated to other positions; and thirdly, erasing the target block to obtain a new free block.
The conventional flat flash memory is limited by the process technology, and the capacity of the flash memory can be increased by further reducing the characteristic size of the flash memory or storing more data information in the flash memory cells, but the increase of the data error rate caused by the reduction is difficult to compensate. Instead, 3D flash Memory opens up a new path for breaking through the scaling obstacles of flat flash Memory by vertically stacking multiple flash Memory layers (j.kim, a.j.hong, s.m.kim, k.s.shin, e.b.song, y.hwang, f.xiu, k.galatsis, c.o.chui, r.n.candler, et al.a Stacked Memory Device on Logic 3D Technology for Ultra-High-Density Data storage. However, 3D flash memory, while enabling greater storage capacity, also poses a serious "big block" problem. The delay due to garbage recycling mainly comprises two parts: when the block capacity in the 3D flash memory increases, the target block may contain a large number of valid pages, which increases the overhead of page migration and deteriorates the efficiency of garbage collection.
To reduce garbage collection delay, prior studies have proposed methods for Sub-Block erasure (M.A.d.' Absolute Block Erase for A Three Dimensional (3D) Memory, May 192015. US Patent 9,036,428; E.C.Oh and J.Kong.nonvolatile Memory Device and Sub-Block organizing Method of Feb.242015. US Patent 8,964,481). The method has the main idea that one block is divided into a plurality of subblocks with the same size, the subblocks are used as basic units for erasing, and the time consumed for erasing is the same, so that only effective pages in target subblocks need to be migrated in the garbage recycling process, and the migration overhead of the pages is greatly reduced.
The current research is mainly aimed at two aspects: (i) electronic implementation of sub-block erasure; (ii) selection of a suitable target sub-block. For example, PB (t. -y.chen, y. -h.chang, c. -c.ho, and s. -h.chen.energy sub blocks Erase Management to Boost the Performance of 3D NAND Flash memory.in proc.of ACM DAC,2016) divides sub-blocks by means of hardware isolation and selects a target sub-block according to the number of invalid sub-blocks and the time loss of releasing a unit space; GSM (C. -y.Liu, J.Kotra, M.Jung, and M.Kandemir.PEN: Design and Evaluation of Partial-Erase for 3D NAND-Based High sensitivity SSDs. in Proc. of USENIX FAST,2018) divides subblocks by means of software isolation, while taking the benefit of recovery (invalid space of recovery per unit time) as a measure for selecting target subblocks. However, how to utilize data update characteristics to improve sub-block erase efficiency remains largely unexplored.
The motivation for the present invention is the highly skewed access feature exhibited in storage systems. By directing the hot-write data (which is frequently updated for a short period of time) into the sub-block that is about to be erased, the present invention can speculatively reduce the number of valid pages in the target sub-block, thus shortening the data migration process.
Disclosure of Invention
The invention aims to solve the problems that the I/O performance of the whole flash memory is damaged due to the serious large block problem in the 3D flash memory in the prior art, and provides a method for accelerating the sub-block erasing in the 3D NAND flash memory, which further reduces the number of effective page migration on the basis of the existing sub-block erasing, thereby shortening the execution process of garbage collection and relieving the I/O blocking condition. In the actual I/O processing process, the invention concentrates the distribution of invalid pages in the flash memory by analyzing the characteristics of different logical pages in the workload and writing the invalid pages into the blocks of different groups based on the heat degree of the different logical pages, thereby improving the recovery benefit of the selected blocks in the garbage recovery process.
The invention comprises the following steps:
1) data writing, including dividing the hot degree of a logical page, and writing the data with different hot degrees into blocks of different groups in a step-by-step manner;
2) garbage collection, including the selection of a target block, the selection of a target sub-block and the migration of an effective page;
3) and based on mode design, determining a metric index under different subblock division modes, and writing data into the subblock with high recovery benefit.
In step 1), the specific steps of writing data may be:
1.1 selecting a plurality of proper thresholds to distinguish the logic pages with different updating frequencies according to the access frequency of the requested logic pages from the host in a short time;
1.2 dividing all blocks into different groups according to the recovery benefit of each block, wherein the number of the groups is the same as the total number of the data heat, and concentrating the blocks with high recovery benefit; the calculation of the recovery benefit of the block comprises the number of invalid sub-blocks in the block or the space released in unit time in the garbage recovery process and the like;
1.3 writing each logical page into a block in a group corresponding to the heat degree through the recorded heat degree of the logical page; writing data with high heat into a block with high recovery efficiency, speculatively increasing the number of invalid pages in the block through quick updating of the data, and increasing the number of invalid pages in the block in a stepped manner through different heat divisions;
1.4 selecting the appropriate target sub-block based on the page distribution in all sub-blocks in the block, further facilitating the sub-block erasure method applied by optimizing the spatial distribution of sub-blocks in the block with high recycling efficiency.
In step 2), the garbage recycling specifically comprises the following steps:
2.1 when the free space is insufficient, triggering garbage collection operation, and selecting a block with the highest collection benefit from the groups with the highest collection benefit as a target block;
2.2 after determining the target block, taking all sub-blocks in the block for calculating the highest recovery benefit as target sub-blocks, wherein the space occupied by the target sub-blocks is released in the garbage recovery process;
2.3 in the effective page migration stage, migrating the effective page into the block in the corresponding group according to the heat degree of the data, and simultaneously selecting the sub-blocks according to the write-in rule of the hot data; if the data is hot data, the data is preferentially migrated to the grouping corresponding to the hot degree, otherwise, the data is migrated to the randomly selected block.
In step 3), the specific steps of the pattern-based design include:
3.1 based on the different subblock division modes, the specific implementation strategy is different. Under soft isolation, the space released by unit time is used as an index of recovery benefit; under hard isolation, taking the number of invalid sub-blocks in a block as an index;
3.2 if the sub-block with high recovery efficiency contains the idle page, writing the data into the sub-block with high recovery efficiency, otherwise, based on the difference of the two isolation modes, writing into other sub-blocks which are closer to the sub-block with high recovery efficiency under soft isolation, and writing into the sub-blocks which contain the idle page and have higher recovery efficiency under hard isolation.
Compared with the prior art, the invention has the following outstanding advantages:
1. the invention distinguishes different heat degrees of data, writes the data of different heat degrees into the blocks in different groups in a stepped manner, and increases the number of invalid pages in the blocks speculatively.
2. The invention provides a pervasive strategy based on data access characteristic analysis and utilization, which combines data access characteristics to an advanced subblock erasing method, applies workload characteristics to subblock erasing design, and accelerates recovery of invalid pages in a flash memory by writing high-heat data into subblocks to be recovered; invalid pages in the flash memory can be distributed in partial sub-blocks more intensively, and garbage recycling benefits are greatly improved.
3. The invention can effectively improve the garbage recovery performance, reduce the number of effective page migration in the garbage recovery process, effectively reduce the migration delay, optimize the read-write performance of the flash memory to a certain extent, reduce the write amplification, shorten the execution process of the garbage recovery and reduce the I/O blocking condition.
4. The method of the invention has good performance, and can reduce the read delay of 23.4 percent, the write delay of 19.3 percent and the write amplification factor of 11.8 percent on average compared with the soft isolation and benchmark results. Compared with a benchmark method and a hard isolation method, the method can reduce the read delay by 0.4-56.9%, the write delay by 0.5-45.1% and the write amplification factor by 0.3-21.7%.
Detailed Description
The invention will be further explained with reference to the drawings.
The embodiment of the invention comprises the following steps:
step 1: the method comprises the following steps of logic page heat degree division and data writing:
1.1 selecting a plurality of proper thresholds to distinguish the logic pages with different write heat according to the access frequency of the requested logic pages from the host in a short time;
1.2 dividing all blocks into different groups according to the recovery benefit of each block, wherein the number of the groups is the same as the total number of the data heat, and concentrating the blocks with high recovery benefit; the calculation of the recovery benefit of a block varies in different ways, e.g. the number of invalid sub-blocks in the block or the space freed per unit time during garbage recovery;
1.3 writing each logical page into a block in a group corresponding to the heat degree through the recorded heat degree of the logical page; writing data with high heat into a block with high recovery benefit, speculatively increasing the number of invalid pages in the block by quickly updating the hot-written data, and increasing the number of invalid pages in the block in a stepped manner by different heat divisions;
1.4 selecting the appropriate target sub-block based on the page distribution in all sub-blocks in the block, further facilitating the sub-block erasure method applied by optimizing the spatial distribution of sub-blocks in the block with high recycling efficiency.
Step 2: the garbage recycling stage mainly comprises the steps of selecting a target block, selecting a target sub-block and migrating an effective page, and the method comprises the following specific steps:
2.1 when the free space is insufficient, triggering garbage collection operation, and selecting a block with the highest collection benefit from the groups with the highest collection benefit as a target block;
2.2 after determining the target block, taking all sub-blocks in the block for calculating the highest recovery benefit as target sub-blocks, wherein the space occupied by the target sub-blocks is released in the garbage recovery process;
2.3 in the effective page migration stage, migrating the effective page into the block in the corresponding group according to the heat degree of the data, and simultaneously selecting the sub-blocks according to the write-in rule of the hot data;
and step 3: the implementation under two different sub-block erasing methods comprises a soft isolation method and a hard isolation method, and the specific steps are as follows:
3.1 in the soft isolation method, the space released by unit time is used as an index of recovery benefit, and in the hard isolation method, the number of invalid sub-blocks in a block is used as an index;
3.2 when the sub-block with high recycling benefit contains a free page, preferentially writing data into the sub-block with high recycling benefit, otherwise writing data into other sub-blocks closer to the high recycling benefit under soft isolation and writing data into the sub-blocks containing the free page and having higher recycling benefit under hard isolation based on the difference of two isolation modes (soft isolation tends to recycle a plurality of continuous sub-blocks, and hard isolation tends to select the sub-block with high recycling benefit).
The core of the invention is to analyze the hot data characteristics in the working load on the prior advanced subblock erasing method in the flash memory, and further improve the garbage recovery efficiency with low cost, thereby avoiding unnecessary read-write operation, relieving the problem of write amplification, and simultaneously reducing the time for blocking the read-write request by the garbage recovery operation. The specific implementation mainly comprises the following modules:
1. a heat identification module: the module realizes high-efficiency data access frequency counting of the memory by using h independent hash functions and a multidimensional hash table. When an update request arrives, the Logical Page Number (LPN) of the update request is hashed into h table entries in the hash table by h hash functions, and the counter in the hit hash table entry is incremented by 1. For a given LPN, the module takes the minimum of the h entries corresponding to it as the access frequency of the LPN. Meanwhile, in order to avoid the continuous accumulation of the counters of the table entries, the values of all the table entries in the hash table are halved every time certain write operation is carried out. By giving g thresholds t1,t2,…tgWhen the access frequency d of the LPN satisfies ti<d<ti+1Or tgIf d is less than d, the heat degree of the logic page is considered to be i or g.
2. A block grouping module: all blocks are divided into g packets based on the recycling benefit, corresponding to g hotness of the data. Assuming a total of n blocks, the number of blocks in each packet cannot exceed
Therefore, each packet is ensured to contain at least one block, and the block is prevented from being concentrated in at least part of packets. Meanwhile, m is more than or equal to 2g-3, and n blocks can be placed in the grouping. In the initial stage of the flash memory, all blocks have the same reclamation benefit, so they are randomly distributed into g packets. As update requests are continuously executed, the reclamation benefit of the blocks changes, and the block grouping module needs to dynamically adjust the position of each block to ensure that the number of blocks in each group does not exceed m. As shown in fig. 2, for packet BG
iThe minimum recovery benefit among all blocks in (A) is r
i*The updated block is denoted by B, and r' is the recovery benefit after update. The method comprises the steps of firstly finding a packet where an updated block B originally locates, moving the updated block B into a corresponding packet according to updated recovery benefit, and when the number of blocks in a target packet reaches the maximum value, selecting a block with the maximum/minimum recovery benefit from the target packet and moving the block into a packet with a higher/lower level. When the original recovery benefit of the block B is larger than the updated recovery benefit, the block B is migrated to the high level, otherwise, the block B is migrated to the low level.
3. A heat update module: after the hot differentiation of data and the grouping of blocks, the hot update module designs an update write strategy for hot data, which aims to reduce the number of valid pages in the recycled sub-blocks. Assuming that the LPN with a heat level i is updated, the module preferentially updates the data to BGiOn this basis, the module mainly considers two problems: selecting BGiWhich block in the block and which sub-block in the block. And for the LPN with the identified heat degree, updating the data into one block in a corresponding group, and writing the data into the sub-block if the sub-block with high recovery efficiency in the block contains free pages, or else writing into any block containing the free pages in the group. In a special case, when all blocks in the corresponding packet do not contain free pages, the block containing free pages is preferentially written into any block in the packet adjacent to the corresponding packet. FIG. 3 depicts the process of writing data by first identifying that the logical page has 5 accesses and determining that it is hot 2 based on the threshold, and thus writing the data preferentiallyTo packet BGiIn (1).
4. An active page migration module: when the number of idle pages is lower than a given threshold, a garbage collection operation is triggered to collect the sub-blocks with high collection efficiency in the selected block. The migration module adopts a heat-aware migration strategy, and the main idea is to relocate the effective page according to the heat so as to further inhibit data migration in the next garbage collection operation. The specific method comprises the following steps: for a number of sub-blocks to be reclaimed within a given block, each valid page is scanned and its heat is analyzed by a data heat identification module. If the data is hot data, the data is preferentially migrated to the grouping corresponding to the hot degree, otherwise, the data is migrated to the randomly selected block.
When the flash memory receives a request from the host interface, the heat identification module updates and acquires the heat of the corresponding logical page. If the request is a write request, triggering a hot degree updating module, acquiring a block containing an idle page in a group corresponding to the hot degree of the data, and writing the data into the block; when the free space is enough, new data is written, which causes the effective page in the block to increase, and the block grouping module recalculates the recycling benefit and moves the block among the groups to maintain the grouping principle; if the free space is not enough (the free page is lower than a certain threshold), garbage collection is triggered, firstly, an effective page migration module is operated to migrate the effective page in the block, and then the block is erased and the space is released.
The system structure model realized by the invention is shown in figure 1, and the system model design mainly comprises two layers: a flash translation layer and a memory technology device layer. The flash translation layer comprises a heat identification module and a space distribution module, and writes the requested logic page number into the heat identification module through the host interface. The space allocation module is used for allocating physical space addresses for actual writing of data, and the heat updating module is contained in the space allocation module. The memory technology layer mainly comprises a garbage collector and a block grouping module, wherein the garbage collector comprises an effective page migration module. When the flash memory receives a write request from a host, the heat of the write request is analyzed by the LPN and the heat identification module of the request, the space allocator allocates an actual physical space according to the heat of data and the grouping condition of blocks, and the block grouping module is triggered to renew grouping of the blocks due to the fact that the recovery benefit of the blocks can be changed after the data is actually written. When there is not enough space allocation, a garbage collector is triggered, and in the valid page migration phase, the block grouping module is also triggered because the extra data writing may also cause the recovery benefit of the block to change.
The performance tests of the present invention are given below:
the workload used for testing by the present invention is obtained from an advanced resource repository that collects I/O records for the enterprise virtual desktop infrastructure. Each trace file records the detailed information of the read-write request from the host, which mainly includes the time of request reception, the type of the request (read/write), the logical address of the requested data and the size of the request. All performance testing and analysis processes are performed on an advanced SSD simulator. The performance of the system design is evaluated from three levels, firstly, the overall performance of the system model design is comprehensively evaluated, and the overall performance comprises the test garbage recycling delay, the average response delay of read-write requests and the write amplification factor; secondly, sensitivity analysis, namely observing performance change by adjusting experiment parameters, and mainly testing the performance under different subblock numbers and after changing the garbage recovery threshold; third, generalized analysis (universality), tests the performance of the system design when applied to different sub-block erasure methods. The performance test is carried out by a comparison experiment mode, mainly comprises a hard isolation method and a soft isolation method, and in addition, the performance of a benchmark (a subblock erasing method is not adopted) is compared. The soft isolation GSM method is to take the sub-block adjacent to the sub-block to be recovered as an isolation interval, and in the garbage recovery process, not only the effective pages in the sub-block to be recovered are migrated, but also the effective pages in the sub-block adjacent to the sub-block are migrated, so that the interference on the effective pages in the adjacent sub-block during erasing is avoided; the hard isolation method of PB is used to isolate interference by placing isolation layers (applying a blocking voltage) between adjacent sub-blocks. Both methods bring extra overhead, soft isolation brings extra overhead of effective page migration, and hard isolation inevitably causes capacity loss.
A. Test of overall Performance
The main parameters of the experimental tests refer to the configuration of a 64-layer 3D flash memory, which includes two channels (channels), each channel being contended by two flash chips (chips), each chip consisting of 5748 blocks, each block containing 768 pages, with a page size of 16 KB. The delay of the flash basic operation is configured as a 700 microsecond write delay, a 60 microsecond read delay, and an erase delay of 3500 microseconds. The number of hash functions in the heat identification module is set to be 7, and the total heat level of the data is also set to be 7. Furthermore, unless otherwise specified, the test procedure divides the block into 16 sub-blocks, with a garbage reclamation threshold of 20%.
A.1, delaying garbage recovery:
according to the size of the data written by the I/O request, 14 traces are selected from the resource library through the experiment, and the experiment result is shown in FIG. 4 (a). The present invention when applied on soft isolation can reduce the recovery delay by 67.8% and 61.6%, respectively, compared to the delay results under baseline and soft isolation.
A.2 average read latency:
fig. 4(b) shows the average read latency results. The results show that the delay can be reduced by 25.9% and 17.6% compared to the baseline and soft isolation methods, respectively. In addition, there are also several traces whose optimization of results is not obvious (e.g., 2016021719-LUN0 and 2016021619-LUN2), mainly because the interval between consecutive read requests is long and the garbage collection process is not as blocked.
A.3 average write latency:
fig. 4(c) shows the average write delay results. Since the read-write latency variation is mainly caused by garbage collection process blocking, the result of interpreting the average write latency is similar to the read latency.
A.4 write amplification factor:
the write amplification factor is determined by dividing the amount of actual writes inside the flash memory by the amount of write data from the host. The greater the write amplification factor, the greater the endurance penalty to the flash memory. Fig. 4(d) shows the write amplification test result, the system model designed by the invention hardly brings extra write operation, and is a design with a flash memory life-friendly effect.
The invention has the main design advantages of effectively improving the garbage recycling performance and reducing the number of effective page migration in the garbage recycling process. The delay of the garbage recovery process mainly consists of two parts: the erase delay and the effective page migration delay are reduced, so that the effective page migration can be effectively reduced. Reducing the garbage collection delay can optimize the read and write performance of the flash memory to some extent, since an excessively long garbage collection delay may cause an extended time for the read and write requests to be blocked. Migration of valid pages inevitably brings extra read and write operations, and less valid page migration can reduce write amplification.
B. Sensitivity test
Since the number of sub-blocks is an important factor affecting the performance of the sub-block erasing method, the garbage collection threshold value has a large influence on the performance of garbage collection. To assess the sensitivity of the system design, the experiments were further analyzed by varying the number of sub-blocks and the garbage recovery threshold.
B.1 influence of the number of sub-blocks:
fig. 5 shows a comparison of read and write performance and write amplification for the soft isolation method and the present invention at 8, 16 and 32 sub-blocks, respectively. Firstly, as the number of the subblocks increases, the subblocks of the soft isolation method during garbage recovery are more abundantly selected, so that the performance of the soft isolation method is improved, the method of the invention achieves the best performance under 8 subblocks, and the income caused by the increase of the number of the subblocks cannot make up for the extra time overhead caused by the increase of the number of the subblocks; secondly, when the number of the sub-blocks is increased, the soft isolation and the method of the invention can reduce the write amplification, because the granularity of erasure is thinner along with the increase of the number of the sub-blocks, and unnecessary migration overhead can be avoided; finally, the method of the present invention has the best performance, with an average 23.4% read latency reduction, 19.3% write latency reduction, and 11.8% write amplification reduction compared to the soft isolation and baseline results.
B.2 influence of garbage recovery threshold:
the test evaluates the impact of different garbage collection thresholds, including 10%, 20% and 30% thresholds. FIG. 6 shows that the method of the present invention has the best performance, and it is found that the read/write performance tends to decrease with the increase of the threshold, because the increase of the threshold results in more effective page migration.
C. General test
C.1 performance testing under hard isolation:
figure 7 shows the implementation of the invention under hard isolation and the performance comparison with the hard isolation method. The results show that the present invention can reduce the read delay by 0.4% to 56.9%, the write delay by 0.5% to 45.1%, and the write amplification factor by 0.3% to 21.7% compared to the baseline and hard isolation methods.
Aiming at the current situation that the overall I/O performance of the flash memory is damaged due to the serious large block problem in the 3D flash memory, the invention innovatively provides a universal strategy based on data access characteristic analysis and utilization. The existing method for solving the problem of large blocks mainly aims at designing a sub-block erasing method, and is still blank in the aspect of proposing a strategy by combining data access characteristics and a sub-block structure. The invention firstly proposes to combine the data access characteristic with the advanced subblock erasing method, so that the invalid pages in the flash memory can be distributed in partial subblocks more intensively, and the garbage recovery benefit is greatly improved.