Detailed Description
Reference will now be made in detail to the exemplary embodiments of the present invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings and the description to refer to the same or like parts.
FIG. 1 is a schematic diagram of a memory storage device according to an embodiment of the invention. Referring to fig. 1, a memory storage system 10 includes a host system 11 and a memory storage device 12. Host system 11 may be any type of computer system. For example. Host system 11 may be a notebook computer, desktop computer, smart phone, tablet computer, industrial computer, or the like. The memory storage device 12 is used to store data from the host system 11. For example, the memory storage 12 may include a solid state disk, a U disk, or other type of non-volatile storage. Host system 11 may be electrically connected to memory storage device 12 via a serial advanced technology bus attachment (Serial Advanced Technology Attachment, SATA) interface, peripheral component interconnect Express (Peripheral Component Interconnect Express, PCI Express), universal serial bus (Universal Serial Bus, USB), or other type of connection interface. Thus, host system 11 may store data to memory storage device 12 and/or read data from memory storage device 12.
Memory storage device 12 may include a connection interface 121, a memory module 122, and a memory controller 123. The connection interface 121 is used to connect the memory storage device 12 to the host system 11. For example, connection interface 121 may support connection interface standards such as SATA, PCI Express, or USB. Memory storage 12 may communicate with host system 11 via connection interface 121.
The memory module 122 is used for storing data. The memory module 122 may include a rewritable nonvolatile memory module. The memory module 122 includes an array of memory cells. The memory cells in the memory module 122 store data in the form of voltages. For example, the memory module 122 may include a single Level Cell (Single Level Cell, SLC) NAND-type flash memory module, a Multi Level Cell (MLC) NAND-type flash memory module, a third Level Cell (Triple Level Cell, TLC) NAND-type flash memory module, a Quad Level Cell (QLC) NAND-type flash memory module, a three-dimensional NAND-type flash memory module (3D NAND flash memory module) (which may have a plurality of third or fourth Level memory cells), or other memory modules having similar characteristics. The memory cells in the memory module 122 are arranged in an array.
The memory controller 123 is connected to the connection interface 121 and the memory module 122. The memory controller 123 may be used to control the memory storage device 12. For example, the memory controller 123 may control the connection interface 121 and the memory module 122 for data access and data management. For example, the memory controller 123 may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or other programmable general purpose or special purpose microprocessor, digital signal processor (Digital Signal Processor, DSP), programmable controller, application specific integrated circuit (Application Specific Integrated Circuits, ASIC), programmable logic device (Programmable Logic Device, PLD), or other similar device or combination of devices.
In one embodiment, memory controller 123 is also referred to as a flash memory controller. In one embodiment, the memory module 122 is also referred to as a flash memory module. The memory module 122 may receive a sequence of instructions from the memory controller 123 and access data stored in the memory unit according to the sequence of instructions.
FIG. 2 is a schematic diagram of a memory controller according to an embodiment of the invention. Referring to fig. 1 and 2, the memory controller 123 includes a host interface 21, a memory interface 22, a memory control circuit 23 and an error checking and correcting circuit 24.
The host interface 21 is used to connect to the host system 11 via the connection interface 121 to communicate with the host system 11. In the present example embodiment, host interface 504 is compliant with the SATA standard. However, it should be understood that the present invention is not limited thereto, and the host interface 504 may also be compatible with PATA standards, IEEE1394 standards, PCI Express standards, USB standards, SD standards, UHS-I standards, UHS-II standards, MS standards, MMC standards, eMMC standards, UFS standards, CF standards, IDE standards, or other suitable data transfer standards.
The memory interface 22 is configured to connect to the memory module 122 to communicate with the memory module 122. That is, the data to be written into the memory module 122 is converted into a format acceptable to the memory module 122 through the memory interface 22.
The memory control circuit 23 is connected to the host interface 21 and the memory interface 22. The memory control circuit 23 can communicate with the host system 11 via the host interface 21 and access the memory module 122 via the memory interface 22. The memory control circuit 23 may also be regarded as a control core of the memory controller 123, which has a plurality of control instructions that are executed to perform operations such as writing, reading, erasing, etc. of data when the memory storage device 12 is operated. In various embodiments, the control instructions of the memory control circuit 23 may be implemented in firmware, code or hardware, and the invention is not limited thereto. In the following embodiment, the explanation of the memory control circuit 23 is equivalent to the explanation of the memory controller 123. In addition, the memory control circuit 23 may further include one or more buffer memories for temporarily storing data.
The error checking and correcting circuit 24 is electrically connected to the memory control circuit 23 and is used for executing an error checking and correcting program to ensure the correctness of the data. When the memory control circuit 23 receives a write command from the host system 11, the error checking and correcting circuit 24 generates an error checking and correcting Code (Error Checking and Correcting Code, ECC Code) corresponding to the write command, and the memory control circuit 23 writes the data and the corresponding error checking and correcting Code into the memory module 122. When the memory control circuit 23 reads data from the memory module 122, the error checking and correcting codes are read simultaneously. And the memory control circuit 23 performs an error checking and correcting process on the read data according to the read error checking and correcting code.
FIG. 3 is a schematic diagram illustrating managing memory modules according to an embodiment of the invention.
Referring to fig. 1 and 3, the memory module 122 includes a plurality of physical programming units, such as physical units 301 (0) to 301 (C) shown in fig. 3. The memory module 122 is managed on a physical programming unit basis, and each physical programming unit includes a plurality of memory units and is used for non-volatile storage of data. Multiple physical program units may constitute a physical erase unit (e.g., a physical block). Multiple physical program units (or memory cells) in a physical erase unit may be erased simultaneously. In addition, the memory control circuit 23 may configure the plurality of logic units 311 (0) to 311 (D) to map at least a portion of the physical programming units. For example, a logical unit may be composed of one or more logical addresses. The mapping relationship between the logic unit and the physical programming unit can be recorded in the logic-to-physical mapping table.
In one embodiment, the memory control circuit 23 logically divides the memory module 122 into a data area 310, an idle area 320 and a system area 330. The physical units 301 (1) to 301 (a) logically belonging to the data area 310 store data (also referred to as user data) from the host system 11. The physical programmer in the data area 310 is marked and then associated with the idle area 320. In other words, all the physical units 301 (A+1) -301 (B) in the spare area 320 are marked and do not store valid data. Wherein, the plurality of physical program units in the idle area 320 may constitute an idle physical erase unit.
The physical units 301 (b+1) -301 (C) logically belonging to the system area 330 are used for recording system data. For example, the system data includes information about the manufacturer and model of the memory module, the number of physical erased cells of the memory module, the number of physical programmed cells per physical erased cell, etc. In particular, the number of physical Programming units in the data area 310, the idle area 320, and the system area 330 may vary according to different memory specifications.
In an embodiment where a memory cell can store multiple bits (e.g., MLC or TLC NAND-type flash memory), physical programming units belonging to the same word line (or the same word line layer) can be categorized into at least a lower physical programming unit and an upper physical programming unit. For example, in an MLC NAND type flash memory, the least significant bit (Least Significant Bit, LSB) of a memory cell belongs to the lower physical program cell, and the most significant bit (Most Significant Bit, MSB) of the memory cell belongs to the upper physical program cell. The reliability of the lower physical programming unit is generally higher than that of the upper physical programming unit. In an exemplary embodiment, the lower physical programming unit is also referred to as a fast page (fast page), and the upper physical programming unit is also referred to as a slow page (slow page). In other words, taking an MLC NAND type flash memory as an example, in an MLC NAND type flash memory, several memory cells arranged on the same word line may constitute 2 physical programming units, including a lower physical programming unit composed of the least significant bits (also referred to as the first significant bits) of the memory cells and an upper physical programming unit composed of the most significant bits (also referred to as the second significant bits) of the memory cells.
In addition, in the TLC NAND type flash memory, the least significant bit (Least Significant Bit, LSB) of a memory cell belongs to the lower physical program cell, the middle significant bit (Center Significant Bit, CSB) of the memory cell belongs to the middle physical program cell, and the most significant bit (Most Significant Bit, MSB) of the memory cell belongs to the upper physical program cell.
In addition, in the present exemplary embodiment, the error checking and correcting circuit 24 may perform single-frame (single-frame) encoding for the data stored in the same physical program unit, or may perform multi-frame (multi-frame) encoding for the data stored in a plurality of physical program units. The single frame coding and the multi-frame coding may employ at least one of coding algorithms such as low density parity check correction codes (low density parity code, LDPC), BCH codes, convolutional codes (convolutional code), or turbo codes (turbo codes), respectively. Alternatively, in an example embodiment, the multi-frame encoding may also employ a Reed-solomon codes (RS codes) algorithm or an exclusive-or (XOR) algorithm. In addition, in another exemplary embodiment, more encoding algorithms not listed above may be used, and will not be described here. Depending on the encoding algorithm employed, the error checking and correction circuit 24 may encode the data to be protected to generate a corresponding error correcting code and/or error checking code. For convenience of explanation, the error correction codes and/or error check codes generated by encoding will be collectively referred to as encoded data hereinafter.
FIG. 4 is a schematic diagram of a multi-frame encoding shown according to an embodiment of the present invention;
referring to FIG. 4, taking the example of encoding the data stored in the physical Programming units 810 (0) to 810 (E) to generate corresponding encoded data 820, each of the physical Programming units 810 (0) to 810 (E) is exemplifiedAt least a portion of the data stored by the user may be considered a frame. In multi-frame encoding, the data in the physical programming units 810 (0) to 810 (E) is encoded based on the location of each bit (or byte). For example, bit b at position 801 (1)11 、b21 、…、bp1 Will be encoded as bit b in encoded data 820o1 Bit b at position 801 (2)12 、b22 、…、bp2 Will be encoded as bit b in encoded data 820o2 The method comprises the steps of carrying out a first treatment on the surface of the Similarly, bit b at position 801 (r)1r 、b2r 、…、bpr Will be encoded as bit b in encoded data 820or . The data read from the physical programming units 810 (0) to 810 (E) can then be decoded based on the encoded data 820 in an attempt to correct errors that may exist in the read data.
In addition, in another example embodiment of fig. 4, the data used to generate the encoded data 820 may also include redundancy bits (redundancy bits) corresponding to data bits (data bits) in the data stored by the physical programming units 810 (0) to 810 (E). Taking the data stored in the physical programming unit 810 (0) as an example, the redundancy bits are, for example, encoded data generated by single frame encoding the data bits stored in the physical programming unit 810 (0). In the present exemplary embodiment, it is assumed that when reading data in the physical programming unit 810 (0), the data read from the physical programming unit 810 (0) can be decoded by using redundancy bits (e.g., encoded data generated by single frame encoding) in the physical programming unit 810 (0) to perform error detection and correction on the read data. However, when decoding using the redundancy bits in the physical program unit 810 (0) fails (e.g., the number of errors in the data stored in the physical program unit 810 (0) after decoding is greater than a threshold), a re-Read (Retry-Read) mechanism may be used to select and use other Read voltages to attempt to Read the correct data from the physical program unit 810 (0). When the correct data cannot be Read from the physical programming unit 810 (0) by the re-Read mechanism, the encoded data 820 and the data of the physical programming units 810 (1) to 810 (E) can be Read, and decoding is performed according to the encoded data 820 and the data of the physical programming units 810 (1) to 810 (E), so as to attempt to correct errors in the data stored in the physical programming unit 810 (0). That is, in the present exemplary embodiment, when decoding of encoded data generated using single frame encoding fails and reading of encoded data generated using a re-Read (Retry-Read) mechanism fails, decoding is performed using encoded data generated using multi-frame encoding.
It should be noted that, in the following embodiments, the error physical programming unit (error page) is that the number of bits (bits) in error in the physical programming unit exceeds the correction capability of the error checking and correcting code (ECC), so that the data stored in the error physical programming unit cannot be read or cannot be read correctly.
Fig. 5 is a flow chart illustrating a data recovery method according to an embodiment of the present invention.
Referring to fig. 5, the method of the present embodiment is applicable to the memory storage device 12 in the above embodiment, and the detailed steps of the present embodiment are described below together with the elements in the memory storage device 12. It should be noted that each step in fig. 5 may be implemented as a plurality of codes or circuits, which is not a limitation of the present invention. In addition, the method of fig. 5 may be used with the following exemplary embodiments, or may be used alone, and the present invention is not limited thereto.
In step S501, the memory controller 123 writes the data received from the host system 11 into the plurality of physical programming units of the memory module 122.
In step S502, when the memory controller 123 reads the data in the physical program unit, the data written in the physical program unit cannot be decoded by using the error checking and correcting operation.
In step S503, the memory controller 123 obtains an error probability value of each physical programming unit, where the error probability value of each physical programming unit is expressed as:
f (x, y, z) =h formula 1
Wherein x is the number of program/erase (P/E) times of the physical erase unit, y is the ECC value of the physical erase unit, z is the number of times the data in the physical erase unit is read, and h is the error probability value of the physical erase unit. The error probability value for each physical programmer is h1, h2., hn, where 1, 2 … n are the number of physical programmer and n is the total number of physical programmer.
For example, x can be searched from the P/E erasure count table corresponding to each physical erasure cell, such as TLC NAND flash memory, in which the P/E erasure count of the physical erasure cell is 3000-5000 times, and the error probability value of the physical programming cell in the physical erasure cell with P/E erasure count greater than 2500 times is relatively large.
Similarly, y can be searched from a bad block table corresponding to each physical erasing unit, the bad block table records the ECC value corresponding to each physical programming unit, the ECC value in the flash memory can be set to be 60, and the physical programming units with ECC values of 0-7,8-15, 16-23, 24-31 and 32-47 are classified and managed. Generally, flash memory uses physically programmed cells with relatively small ECC values to store data. Whereas the error probability value is larger for physically programmed cells with larger ECC values, e.g., 48-60.
Likewise, z is queried from the read number table corresponding to each physical programmer. The greater the number of reads of the physical programming unit, the greater its error probability value.
In one embodiment, the physical programming units are classified into at least a first type of physical programming unit and a second type of physical programming unit according to the error probability value. The physical program units of the first type may be good physical program units or normal or high-quality physical program units, for example, physical program units having P/E erase times smaller than 2500 times or having ECC values smaller or having read times may be classified as the physical program units of the first type. In contrast, the second type of physical program unit may be a bad physical program unit or a potentially faulty or low-quality physical program unit, for example, the physical program unit having a P/E erase count greater than 2500 times or a larger ECC value or a larger read count may be classified as the second type of physical program unit.
In step S504, the memory controller 123 shuffles the plurality of physical programming units in the memory module 122 according to the shuffling algorithm; and then, the disturbed entity programming units are grouped into a plurality of first groups, wherein each first group comprises a plurality of first entity programming units and a plurality of second entity programming units, the number of the second entity programming units in each first group is the same, and the entity programming units in the same first group belong to different entity erasing units.
The following describes how to shuffle the plurality of physical programming units in the memory module 122 according to an out-of-order algorithm with reference to fig. 6-7.
FIG. 6 is a schematic diagram of a sort-based disorder algorithm, according to one embodiment of the present invention. As shown in fig. 6, in one embodiment, the memory controller 123 randomly generates a random number for each physical programming unit (P0, P1, P2, P3, P4, P5, P6, P7, P8) and then sorts the random numbers (for example, in this embodiment, the random numbers may be sorted from large to small, or from small to large in another embodiment, the invention is not limited thereto), and the physical programming units corresponding to the random numbers in the first group are moved while the random numbers are sorted, so that the physical programming units in the first group may be disturbed after the sorting of the physical programming units is completed, and the second group (P8, P2, P7, P6, P0, P5, P1, P4, P3) shown in fig. 6 is obtained.
Fig. 7 is a schematic diagram of a swap-based out-of-order algorithm, according to one embodiment of the present invention. As shown in fig. 7, in one embodiment, a first group (P0, P1, P2, P3, P4, P5, P6, P7, P8) of 9 physical programming units generates a random number (the random number is greater than 1) between the 1 st physical programming unit P0 and the 6 th physical programming unit P5, and in this embodiment, the random number is 4, and the positions of the physical programming unit P5 and the physical programming unit P3 in the first group are exchanged, so as to obtain a second group (P0, P1, P2, P5, P4, P3, P6, P7, P8) shown in fig. 7. Obviously, the random number may be greater than 6 and only need to be smaller than N, where N is the number of physical programming units in the first group.
In step S505, the memory controller 123 groups the physical programming units in each first group into a plurality of subgroups, wherein each subgroup comprises a plurality of physical programming units, and each subgroup comprises at most one physical programming unit of the second type.
Fig. 8 is a specific flowchart of step S505 shown according to an embodiment of the present invention.
Step S505 includes the following steps S801 to S805. Specifically, in step S801, the memory controller 123 groups the first physical programming units in the first group into a plurality of subgroups.
In step S802, the memory controller 123 calculates an H value of each subgroup according to the error probability value of the first physical programming unit, wherein the H value is a mathematical expectation of the error probability value of the physical programming units in each subgroup (i.e. the probability of having 2 or 2 physical programming units in error) [ here, the H value is equal to the error probability value H of the first physical programming unit, because each subgroup only comprises the first physical programming units.
In step S803, the memory controller 123 groups the second physical programming units in the first group into a plurality of subgroups, and groups the second physical programming units with the smallest error probability value into a subgroup with the largest H value in the plurality of subgroups.
In step S804, the memory controller 123 recalculates the H values of the plurality of subgroups.
In step S805, the memory controller 123 groups the third physical programming units in the first group into a plurality of subgroups and groups the third physical programming units with the smallest error probability value into a subgroup with the largest H value until the physical programming units in the first group are grouped into a plurality of subgroups and the H values of the plurality of subgroups are equal or close.
In the following, it is described how to group each of the physical programming units of the first group into a plurality of subgroups, such that each subgroup includes at most one physical programming unit of the second type.
In one embodiment, the memory module 122 includes 260000 (26W) physical program units, and the 26W physical program units are shuffled and grouped into 84 first groups (i.e., the first group G0, the first group G1 …, the first group G83) by an out-of-order algorithm. Wherein each first group includes 3000 physical programming units. The 3000 physical program units in each first group are further grouped into 12 subgroups (i.e., subgroup g0, subgroup g1 … subgroup g 11) such that 250 physical program units are included in each subgroup.
Wherein the H values corresponding to the subgroups g0, g1, g2, g3, g4, g5, g6, g7, g8, g9, g10, g11 are H0, H1, H2, H3, H4, H5, H6, H7, H8, H9, H10, H11, respectively, wherein the H values are mathematical expectations of error probability values for the physical programming units in the subgroup.
For example, the physical programming units P0 to P249 are subjected to grouping operation. As shown in fig. 9, the physical programming units P0 to P11 (i.e., the first physical programming unit) are first grouped, the physical programming units P0 are respectively grouped into the subgroups g0, the physical programming units P1 are grouped into the subgroups g1, and the physical programming units P11 are grouped into the subgroups g11, so as to calculate H values corresponding to the subgroups g0 to g11, which are respectively H0 to H11.
If the error probability H of the physical program unit P7 is 0.3, i.e. the H7 corresponding to the subgroup g7 is 0.3, the physical program units in the other subgroups (subgroups g0 to g6, g8 to g 11) are all the first type of physical program units, i.e. the H values of the other subgroups are 0 (i.e. less than 0.3). Thus, the subgroup g7 has the largest H value relative to the other 11 subgroups.
Next, the grouping operation is continued for the physical programming units in the first group, i.e. the physical programming units P12 to P23 (i.e. the second physical programming unit), as shown in fig. 10, since the subgroup g7 to which the physical programming unit P7 belongs has the largest H value (H7); therefore, in this round of grouping operation, the physical programming units P13 in which the h value is the smallest are selected to be grouped into the subgroup g7. And (3) recalculating the H values of the 12 subgroups to obtain an H value of 0.2 of a subgroup g7, comparing the H value with H values of other subgroups, judging whether the H value of the subgroup g7 is still maximum, and if the H value of the subgroup g7 is still maximum, continuing to group the entity programming units with the minimum H value into the subgroup g7. For example, in the grouping operation of the physical programming units P24 to P35 (i.e., the third physical programming unit), the physical programming unit P30 with the smallest h value is continuously grouped into the subgroup g7. In this way, if the H value of subgroup g3 is the largest in the next grouping operation, then in the grouping operation of physical program units P36 to P47, physical program unit P38 with the smallest H value is grouped into subgroup g3 until physical program units P0 to P249 are all grouped into subgroups g0 to g 11. Thus, each subgroup comprises 249 first-type entity programming units and 1 second-type entity programming unit, and the potentially dense second-type entity programming units can be distributed into each subgroup, so that the condition that the second-type entity programming units in the same subgroup are too many and exceed the capability of the RS algorithm for recovering data is avoided.
In step S506, the memory controller 123 performs a logic operation on the data written into the physical program unit to generate an error checking and correcting code, wherein the error checking and correcting code is used for performing an error checking and correcting operation on the data in the physical program unit to recover the data in the physical program unit.
Based on this, the data recovery method, the memory storage device and the memory controller provided by the invention can disperse the potentially dense bad entity programming units into each subgroup, so as to avoid the situation that the data cannot be recovered due to the fact that the number of the wrong entity programming units in the same entity group is too large to exceed the capacity of the RS algorithm for recovering the data when the RS algorithm is used for decoding the read data.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.