CN104461781A

Movatterモバイル変換

Info

Publication number: CN104461781A
Application number: CN201410717059.2A
Authority: CN
Inventors: 冯丹; 柳青; 施展; 李剑; 欧阳梦云
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2014-12-01
Filing date: 2014-12-01
Publication date: 2015-03-25
Anticipated expiration: 2034-12-01
Also published as: CN104461781B

Abstract

一种基于纠删码的数据块重建方法，属于计算机存储技术领域，解决现有数据块修复方法需要传输大量数据的问题，以减少重建数据的传输量。本发明包括数据分块步骤、构造生成矩阵G步骤、生成校验块步骤、检查数据块状态步骤、构造修复矩阵步骤和修复数据块步骤。本发明将原始文件分为k个数据块，将每个数据块继续等分为r个数据片；k个数据块编码为m个校验块，每个校验块也包含r个校验片。重建任意一个数据块时，从剩余的每个数据块的r个数据片和校验块的r个校验片中取r/m片(该方法保证r被m整除)，从而重建一个数据块只需要总量(m+k-1)r/m的数据片，相对里德-所罗门编码重建一个数据块的数据量，有了明显的减少。

A data block reconstruction method based on an erasure code belongs to the technical field of computer storage, and solves the problem that a large amount of data needs to be transmitted in existing data block repair methods, so as to reduce the transmission amount of reconstructed data. The invention includes the steps of dividing data into blocks, constructing a generating matrix G, generating check blocks, checking the status of data blocks, constructing a repair matrix and repairing data blocks. The present invention divides the original file into k data blocks, and divides each data block into r data slices equally; k data blocks are coded into m check blocks, and each check block also includes r check slices . When rebuilding any data block, r/m slices are taken from the r data slices of each remaining data block and the r check slices of the check block (this method ensures that r is divisible by m), thus rebuilding a data block Only the total amount of (m+k-1)r/m data slices is needed, which is significantly reduced compared to the amount of data required to reconstruct a data block by Reed-Solomon encoding.

Description

Translated fromChinese

一种基于纠删码的数据块重建方法A Data Block Reconstruction Method Based on Erasure Code

技术领域technical field

本发明属于计算机存储技术领域，具体涉及一种基于纠删码的数据块重建方法，可减少重建数据传输。The invention belongs to the technical field of computer storage, and in particular relates to a data block reconstruction method based on an erasure code, which can reduce reconstruction data transmission.

背景技术Background technique

从RAID(独立硬盘冗余阵列)到分布式存储系统，纠删码广泛应用于存储系统中，用于防止部分数据丢失或数据服务器宕机导致的数据不可访问。From RAID (Redundant Array of Independent Hard Disks) to distributed storage systems, erasure codes are widely used in storage systems to prevent partial data loss or data inaccessibility caused by data server downtime.

纠删码是一种数据保护的编码方法，它首先将原始数据分为等大的数据块，然后再将数据块编码为校验块。当若干个数据块或校验块丢失时，纠删码技术保证原始数据仍然可以恢复。Erasure coding is a coding method for data protection. It first divides the original data into data blocks of equal size, and then encodes the data blocks into check blocks. When several data blocks or check blocks are lost, the erasure code technology ensures that the original data can still be recovered.

传统的编码方法将原始数据等分为k份数据块，编码生成m份校验块并将它们存储在k+m个不同的存储节点。存储节点是存储设备的逻辑抽象，既可以是一个磁盘也可以是一个存储服务器。所有(k+m)块数据块和校验块中的任意k块都可以重建出k个数据块。但是这类纠删码也面临着一个修复带宽问题：重建一个数据块需要该数据块大小的k倍磁盘I/O和网络流量，占用大量的存储资源和网络资源。The traditional encoding method divides the original data into k data blocks, encodes m check blocks and stores them in k+m different storage nodes. A storage node is a logical abstraction of a storage device, which can be either a disk or a storage server. Any k blocks in all (k+m) data blocks and check blocks can reconstruct k data blocks. However, this type of erasure code also faces a repair bandwidth problem: rebuilding a data block requires k times the size of the data block in disk I/O and network traffic, occupying a large amount of storage resources and network resources.

以(n，k)里德-所罗门编码为例，n是数据块和校验块的总个数，k是数据块个数，m＝n-k是校验块个数。当使用(n，k)里德-所罗门编码对数据量为M的文件进行编码时，首先将文件等分为k个数据块：D₀、D₁、...、D_k-1，每个数据块大小为M/k，接着计算生成矩阵和k个数据块的乘积(计算基于有限域)，得到m个校验块C₀、C₁、…、C_m-1，每个校验块大小也是M/k。(n，k)里德-所罗门编码的生成矩阵是一个基于有限域GF(2^w)的m行k列的矩阵，该矩阵可以是变换后的范德尔蒙德矩阵(Vandermonde matrix)，也可以是柯西矩阵(Cauchy matrix)。当一个数据块或校验块失效时，需要重建数据块或校验块以保证可靠性。如果失效的是校验块，利用k块数据块可以重新编码得到；如果失效的是数据块，利用剩余n-1块数据块和校验块中的任意k块可重建该数据块。无论是失效的是一块数据块还是一块校验块，都需要k块整块数据块或校验块进行重建，所需要的数据量是M。Taking (n, k) Reed-Solomon encoding as an example, n is the total number of data blocks and check blocks, k is the number of data blocks, and m=nk is the number of check blocks. When using (n, k) Reed-Solomon encoding to encode a file with a data volume of M, the file is first divided into k data blocks: D₀ , D₁ , ..., D_k-1 , each The size of each data block is M/k, and then calculate the product of the generator matrix and k data blocks (the calculation is based on the finite field), and obtain m check blocks C₀ , C₁ , ..., C_m-1 , each check The block size is also M/k. The generator matrix of (n, k) Reed-Solomon coding is a matrix with m rows and k columns based on finite field GF(2^w ), which can be a transformed Vandermonde matrix, or is the Cauchy matrix. When a data block or check block fails, the data block or check block needs to be rebuilt to ensure reliability. If the invalidation is a check block, k data blocks can be used to re-encode it; if the invalidation is a data block, the data block can be reconstructed by using any k of the remaining n-1 data blocks and check blocks. Regardless of whether it is a data block or a check block that fails, k entire data blocks or check blocks are required to rebuild, and the required data volume is M.

现有减少单数据节点失效所需重建数据量的编码方法往往牺牲了最优的存储效率或者最大距离可分(MDS)性质。简单再生码(Simple RegeneratingCodes，见论文："Simple regenerating codes：Network coding for cloud storage."INFOCOM，2012Proceedings IEEE.IEEE，2012.)在每个存储节点多存储1/f的数据量，f越大，额外的存储开销越小，但相应的修复带宽也越大，反之亦然。局部重建码(Local Reconstruction Codes，见论文："Erasure Coding inWindows Azure Storage."USENIX Annual Technical Conference.2012.)将数据块分组，不仅在组内进行编码，并在组间进行编码，这样可以将大部分单点失效的重建局限在组内进行，从而减少了重建数据量，但局部重建码需要额外的存储开销，且不满足MDS性质。功能性最小存储再生码(FunctionalMinimum Storage Regenerating codes，见论文：NCCloud：ANetwork-Coding-Based Storage System in a Cloud-of-Clouds[J].2013.)虽然存储和修复单个节点失效所需要的数据量都是最优的，但存在计算开销大，且保存的不是原来的数据，因此读取数据时需要解码等缺点。Existing encoding methods to reduce the amount of reconstructed data required for a single data node failure often sacrifice optimal storage efficiency or maximum distance separability (MDS) properties. Simple Regenerating Codes (Simple Regenerating Codes, see the paper: "Simple regenerating codes: Network coding for cloud storage." INFOCOM, 2012 Proceedings IEEE.IEEE, 2012.) stores 1/f more data in each storage node, the larger f is, The smaller the additional storage overhead, the larger the corresponding repair bandwidth, and vice versa. Local Reconstruction Codes (Local Reconstruction Codes, see the paper: "Erasure Coding in Windows Azure Storage." USENIX Annual Technical Conference.2012.) group data blocks, code not only within the group, but also between groups, so that large The reconstruction of partial single-point failure is limited to the group, thereby reducing the amount of reconstruction data, but the local reconstruction code requires additional storage overhead and does not satisfy the MDS property. Functional Minimum Storage Regenerating codes (FunctionalMinimum Storage Regenerating codes, see the paper: NCCloud: ANetwork-Coding-Based Storage System in a Cloud-of-Clouds[J].2013.) Although the amount of data required to store and repair a single node failure Both are optimal, but there are disadvantages such as high computational overhead, and the storage is not the original data, so it needs to be decoded when reading the data.

发明内容Contents of the invention

本发明提供一种基于纠删码的数据块重建方法，解决现有数据块修复方法需要传输大量数据的问题，以减少重建数据的传输量。The invention provides a data block reconstruction method based on an erasure code, which solves the problem that a large amount of data needs to be transmitted in the existing data block repair method, so as to reduce the transmission amount of reconstruction data.

本发明所提供的一种基于纠删码的数据块重建方法，包括数据分块步骤、构造生成矩阵G步骤、生成校验块步骤、检查数据块状态步骤、构造修复矩阵步骤和修复数据块步骤，其特征在于：A data block reconstruction method based on an erasure code provided by the present invention includes the steps of dividing data into blocks, constructing a generation matrix G, generating a check block, checking the state of a data block, constructing a repair matrix, and repairing a data block. , characterized by:

(1)数据分块步骤：(1) Data block steps:

将数据量为M的原始文件等分为k个数据块D_j，j＝0、...、k-1，再将k个数据块分别保存在k个数据节点上，进而将各数据节点上的数据块D_j等分为r个数据片D_j，p，p＝0、...、r-1，r＝m^k-1，k≥2，m≥2；等分过程中不足部分用0补齐并记录不足数据块或数据片的长度；Divide the original file with a data volume of M into k data blocks D_j , j=0,...,k-1, and then store the k data blocks on k data nodes respectively, and then divide each data node The data block D_j above is equally divided into r pieces of data D_{j, p} , p=0,..., r-1, r=m^k-1 , k≥2, m≥2; Partially fill with 0 and record the length of the insufficient data block or data slice;

对所有数据片赋予序号，数据片D_j，p为第j×r+p+1个数据片；Assign serial numbers to all data slices, data slice D_{j, p} is the j×r+p+1th data slice;

(2)构造生成矩阵G步骤：(2) Steps of constructing generator matrix G:

生成矩阵G是m行、k列的分块矩阵，包括m×k个子矩阵G_i，j：The generator matrix G is a block matrix with m rows and k columns, including m×k sub-matrices G_{i, j} :

$[\begin{matrix} {G G}_{0,0 0,0} & {G G}_{0,1 0,1} & . . . . . . & {G G}_{00,, k k - - 11} \\ {G G}_{1,0 1,0} & {G G}_{1,1 1,1} & . . . . . . & {G G}_{11,, k k - - 11} \\ . . . . . . & . . . . . . & . . . . . . & . . . . . . \\ {G G}_{m m - - 1,0 1,0} & {G G}_{m m - - 11,, 11} & . . . . . . & {G G}_{m m - - 11,, k k - - 11} \end{matrix}],,$

其中，每个子矩阵G_i，j为一个r行、r列的方阵，满足下面等式：Among them, each sub-matrix G_i,j is a square matrix with r rows and r columns, which satisfies the following equation:

${G G}_{i i,, j j} = = {(({I I}_{m m}^{i i + +}))}^{&CircleTimes; &CircleTimes; j j} &CircleTimes; &CircleTimes; {(({I I}_{m m}))}^{&CircleTimes; &CircleTimes; ((k k - - 11 - - j j))} \cdot \cdot {α α}_{i i,, j j},, i i = = 00 ~ ~ m m - - 11,, j j = = 00 ~ ~ k k - - 11;;$

其中，表示矩阵的张量乘(也称为Kronecker乘)，I_m表示m行、m列的单位矩阵，表示单位矩阵I_m所有元素循环左移i位后的结果，当i＝0时，表示j个连续张量乘的结果，α_i，j是(m+k，k)-里德-所罗门编码生成矩阵中第i行第j列元素；in, Represents the tensor multiplication of matrices (also known as Kronecker multiplication), I_m represents the identity matrix of m rows and m columns, Indicates the result of all elements of the identity matrix_Im being cyclically shifted to the left by i bits, when i=0, means j The result of continuous tensor multiplication, α_{i, j} is the (m+k, k)-Reed-Solomon encoding generation matrix element in row i and column j;

(3)生成校验块步骤：(3) Generate check block steps:

分别计算生成矩阵G中各行子矩阵和所有数据块的乘积，得到m个校验块C_i，i＝0～m-1，再将m个校验块分别保存在m个数据节点上，第i个校验块C_i为生成矩阵G的第i行子矩阵与k个数据块的乘积：Calculate the product of each row sub-matrix in the generator matrix G and all data blocks respectively to obtain m check blocks C_i , i=0~m-1, and then store the m check blocks on m data nodes respectively, the first The i check block C_i is the product of the i-th row sub-matrix of the generator matrix G and the k data blocks:

${C C}_{i i} = = [\begin{matrix} {C C}_{i i,, 00} \\ {C C}_{i i,, 11} \\ . . \\ . . \\ . . \\ {C C}_{i i,, r r - - 11} \end{matrix}] = = [\begin{matrix} {G G}_{i i,, 00} & {G G}_{i i,, 11} & . . . . . . & {G G}_{i i,, k k - - 11} \end{matrix}] \cdot &Center Dot; [\begin{matrix} {D D.}_{00} \\ {D D.}_{11} \\ . . \\ . . \\ . . \\ {D D.}_{k k - - 11} \end{matrix}];;$

校验块C_i再等分为r个校验片C_i，p，p＝0～r-1；对所有校验片赋予序号，校验片C_i，p为第i×r+p+1个校验片；The check block C_i is divided into r check slices C_{i, p} , p=0~r-1; all check slices are assigned serial numbers, and the check slice C_{i, p} is the i×r+p+ 1 check sheet;

(4)检查数据块状态步骤：(4) Check the data block status steps:

定期依次检查各数据节点上的数据块是否出错或丢失，是则转步骤(5)；否则不作处理；Regularly check whether the data blocks on each data node are wrong or lost, if so, turn to step (5); otherwise, do not process;

(5)构造修复矩阵步骤，包括下述子步骤：(5) The step of constructing the repair matrix includes the following sub-steps:

(5.1)当第i个数据节点上的数据块D_i出错或丢失，将生成矩阵G中第i列的子矩阵全设置为0，构成第一中间矩阵G_A；(5.1) When the data block D_i on the i-th data node is wrong or lost, all the sub-matrixes in the i-th column in the generation matrix G are set to 0 to form the first intermediate matrix G_A ;

(5.2)在第一中间矩阵G_A中选取任意一个非零子矩阵，在该非零子矩阵中选取任意一个非零矩阵元素作为种子，在第一中间矩阵G_A中标记该非零矩阵元素所在的行向量与列向量；(5.2) Select any non-zero sub-matrix in the first intermediate matrix G_A , select any non-zero matrix element in this non-zero sub-matrix as a seed, and mark this non-zero matrix element in the first intermediate matrix G_A The row vector and column vector where it is located;

(5.3)对所标记的行向量与列向量中的每个非零矩阵元素，在第一中间矩阵G_A中标记该非零矩阵元素所在的行向量和列向量；(5.3) for each non-zero matrix element in the marked row vector and the column vector, mark the row vector and the column vector where the non-zero matrix element is located in the first intermediate matrix_GA ;

(5.4)判断是否有新的行向量和列向量被标记，是则转子步骤(5.3)，否则进行子步骤(5.5)；(5.4) Judging whether there are new row vectors and column vectors to be marked, if yes, the rotor step (5.3), otherwise proceed to the sub-step (5.5);

(5.5)构成修复矩阵M_r：(5.5) Constitute the repair matrix M_r :

首先生成一个k×r行，k×r列的单位矩阵G_B；First generate an identity matrix G_B with k×r rows and k×r columns;

将第一中间矩阵G_A标记的列向量序号作为行向量序号，从单位矩阵G_B中选取对应的行向量，作为第一行向量组；The column vector sequence number of the first intermediate matrix G_A mark is used as the row vector sequence number, and the corresponding row vector is selected from the identity matrix G_B as the first row vector group;

将第一中间矩阵G_A标记的行向量序号作为行向量序号，从生成矩阵G中选取对应的行向量，作为第二行向量组；The row vector sequence number marked by the first intermediate matrix G_A is used as the row vector sequence number, and the corresponding row vector is selected from the generator matrix G as the second row vector group;

将所述第一行向量组置于第二行向量组之上，构成第二中间矩阵G_c，将第二中间矩阵G_c中全零列删除得到正方矩阵G_d，然后从正方矩阵G_d的逆矩阵中选择从i×r/m开始的r个行向量，构成r行、(k+m-1)×r/m列的修复矩阵M_r；Place the first row vector group on the second row vector group to form a second intermediate matrix G_c , delete all zero columns in the second intermediate matrix G_c to obtain a square matrix G_d , and then obtain a square matrix G d from the square matrix G_d the inverse matrix of Select r row vectors starting from i×r/m to form a repair matrix M_r with r rows and (k+m-1)×r/m columns;

(6)修复数据块步骤：(6) Repair data block steps:

选取第一中间矩阵G_A标记的列向量的列序号作为数据片序号，其所对应的各数据片作为数据片序列SD_r，数据片数量为(k-1)×r/m，选取第一中间矩阵G_A标记的行向量的行序号作为校验片序号，其所对应的各校验片作为校验片序列SC_r，校验片数量为m×r/m；Select the column number of the column vector marked by the first intermediate matrix G_A as the data piece number, and the corresponding data pieces as the data piece sequence SD_r , the number of data pieces is (k-1)×r/m, select the first The row number of the row vector marked by the intermediate matrix G_A is used as the check slice number, and the corresponding check slices are used as the check slice sequence SC_r , and the number of check slices is m×r/m;

通过计算M_r和数据片序列SD_r和校验片序列SC_r的乘积重建数据块D_i：Reconstruct data block D_i by calculating the product of M_r and data slice sequence SD_r and check slice sequence SC_r :

${D D.}_{i i} = = {M m}_{r r} \times \times [\begin{matrix} {SD SD}_{r r} \\ {SC SC}_{r r} \end{matrix}] . .$

本发明将原始文件分为k个数据块，将每个数据块继续等分为r个数据片；k个数据块编码为m个校验块，每个校验块也包含r个校验片。重建任意一个数据块时，从剩余的每个数据块的r个数据片和校验块的r个校验片中取r/m片(该方法保证r被m整除)，从而重建一个数据块只需要总量(m+k-1)r/m的数据片，相对里德-所罗门编码重建一个数据块的数据量，有了明显的减少。The present invention divides the original file into k data blocks, and divides each data block into r data slices equally; k data blocks are coded into m check blocks, and each check block also includes r check slices . When rebuilding any data block, take r/m slices from the r data slices of each remaining data block and the r check slices of the check block (this method guarantees that r is divisible by m), thus rebuilding a data block Only the total amount of (m+k-1)r/m data slices is needed, which is significantly reduced compared to the amount of data required to reconstruct a data block by Reed-Solomon encoding.

本发明要求k≥2，且m≥2，适用于重建存储数据块的数据节点，而不适用于重建存储校验块的校验节点。The present invention requires k≥2 and m≥2, which is suitable for rebuilding data nodes storing data blocks, but not suitable for rebuilding check nodes storing check blocks.

附图说明Description of drawings

图1为本发明的流程示意图；Fig. 1 is a schematic flow sheet of the present invention;

图2为构造修复矩阵步骤的流程示意图。Fig. 2 is a schematic flowchart of the steps of constructing a repair matrix.

具体实施方式Detailed ways

如图1所示，本发明的实施例，结合一个m＝2，k＝3的实例，包括数据分块步骤、构造生成矩阵G步骤、生成校验块步骤、检查步骤、构造修复矩阵步骤和修复数据块步骤：As shown in Figure 1, the embodiment of the present invention, in conjunction with an example of m=2, k=3, includes a data block step, a step of constructing a generator matrix G, a step of generating a check block, a step of checking, a step of constructing a repair matrix and Repair data block steps:

(1)数据分块步骤：(1) Data block steps:

将数据量为66MB的原始文件等分为3个22MB的数据块D_j，j＝0、1、2，再将3个数据块分别保存在3个数据节点N₀，N₁，N₂上，进而将各数据节点上的数据块D_j等分为4个数据片D_j，p，p＝0、1、2、3；Divide the original file with a data volume of 66MB into three 22MB data blocks D_j , j=0, 1, 2, and then store the three data blocks in three data nodes N₀ , N₁ , and N₂ respectively , and further divide the data block D_j on each data node into four data slices D_j,p , p=0, 1, 2, 3;

对所有数据片赋予序号，数据片D_j，p为第j×4+p+1个数据片；Assign serial numbers to all data slices, data slice D_{j, p} is the j×4+p+1th data slice;

(2)构造生成矩阵G步骤：(2) Steps of constructing generator matrix G:

生成矩阵G是2行，3列的分块矩阵，包括2×3个子矩阵：The generator matrix G is a block matrix with 2 rows and 3 columns, including 2×3 sub-matrices:

$G G = = [\begin{matrix} {G G}_{0,0 0,0} & {G G}_{0,1 0,1} & {G G}_{0,2 0,2} \\ {G G}_{1,0 1,0} & {G G}_{1,1 1,1} & {G G}_{1,2 1,2} \end{matrix}],,$

其中，每个子矩阵为一个4行、4列的方阵，满足下面等式：Among them, each sub-matrix is a square matrix with 4 rows and 4 columns, which satisfies the following equation:

${G G}_{i i,, j j} = = {(({I I}_{m m}^{i i + +}))}^{&CircleTimes; &CircleTimes; j j} &CircleTimes; &CircleTimes; {(({I I}_{m m}))}^{&CircleTimes; &CircleTimes; ((k k - - 11 - - j j))} \cdot &Center Dot; {α α}_{i i,, j j},, i i = = 00 ~ ~ 11,, j j = = 00 ~ ~ 22;;$

${G G}_{0,0 0,0} = = {G G}_{0,1 0,1} = = {G G}_{0,2 0,2} = = [\begin{matrix} 11 & 00 \\ 00 & 11 \end{matrix}] &CircleTimes; &CircleTimes; [\begin{matrix} 11 & 00 \\ 00 & 11 \end{matrix}] \cdot &Center Dot; 11 = = [\begin{matrix} 11 & 00 & 00 & 00 \\ 00 & 11 & 00 & 00 \\ 00 & 00 & 11 & 00 \\ 00 & 00 & 00 & 11 \end{matrix}],,$

${G G}_{11,, 00} = = [\begin{matrix} 11 & 00 \\ 00 & 11 \end{matrix}] &CircleTimes; &CircleTimes; [\begin{matrix} 11 & 00 \\ 00 & 11 \end{matrix}] \cdot &Center Dot; f f = = [\begin{matrix} f f & 00 & 00 & 00 \\ 00 & f f & 00 & 00 \\ 00 & 00 & f f & 00 \\ 00 & 00 & 00 & f f \end{matrix}],,$

${G G}_{11,, 11} = = [\begin{matrix} 00 & 11 \\ 11 & 00 \end{matrix}] &CircleTimes; &CircleTimes; [\begin{matrix} 11 & 00 \\ 00 & 11 \end{matrix}] \cdot &Center Dot; 88 = = [\begin{matrix} 00 & 00 & 88 & 00 \\ 00 & 00 & 00 & 88 \\ 88 & 00 & 00 & 00 \\ 00 & 88 & 00 & 00 \end{matrix}],,$

${G G}_{11,, 22} = = [\begin{matrix} 00 & 11 \\ 11 & 00 \end{matrix}] &CircleTimes; &CircleTimes; [\begin{matrix} 00 & 11 \\ 11 & 00 \end{matrix}] \cdot &Center Dot; 66 = = [\begin{matrix} 00 & 00 & 00 & 66 \\ 00 & 00 & 66 & 00 \\ 00 & 66 & 00 & 00 \\ 66 & 00 & 00 & 00 \end{matrix}];;$

矩阵中元素均以16进制表示；相应的(5，3)-里德-所罗门编码生成矩阵为2×3的矩阵：The elements in the matrix are all expressed in hexadecimal; the corresponding (5,3)-Reed-Solomon encoding generation matrix is a 2×3 matrix:

$[\begin{matrix} 11 & 11 & 11 \\ f f & 88 & 66 \end{matrix}];;$

α_i，j是(5，3)-里德-所罗门编码生成矩阵中第i行第j列元素；α_i,j is the (5,3)-Reed-Solomon encoding generation matrix element in row i and column j;

(3)生成校验块步骤：(3) Generate check block steps:

分别计算生成矩阵G中各行子矩阵和所有数据块D₀、D₁、D₂的乘积，得到2个校验块C₀和C₁，再将2个校验块分别保存在2个数据节点N₃，N₄上：Calculate the product of each row sub-matrix in the generator matrix G and all data blocks D₀ , D₁ , and D₂ to obtain two check blocks C₀ and C₁ , and then store the two check blocks in two data nodes N₃ , N₄ on:

$[\begin{matrix} {C C}_{00} \\ {C C}_{11} \end{matrix}] = = [\begin{matrix} {G G}_{0,0 0,0} & {G G}_{0,1 0,1} & {G G}_{0,2 0,2} \\ {G G}_{1,0 1,0} & {G G}_{1,1 1,1} & {G G}_{1,2 1,2} \end{matrix}] \cdot &Center Dot; [\begin{matrix} {D D.}_{00} \\ {D D.}_{11} \\ {D D.}_{22} \end{matrix}]$

每个校验块22MB，每个校验块可以分为4个校验片，记作C_i，p，以C_0，0和C_1，1为例，其计算如下：Each check block is 22MB, each check block can be divided into 4 check slices, denoted as C_i,p , taking C_0,0 and C_1,1 as an example, the calculation is as follows:

${C C}_{0,0 0,0} = = {D D.}_{0,0 0,0} &CirclePlus; &CirclePlus; {D D.}_{1,0 1,0} &CirclePlus; &CirclePlus; {D D.}_{2,0 2,0};;$

${C C}_{1,1 1,1} = = 1515 \cdot &Center Dot; {D D.}_{0,1 0,1} &CirclePlus; &CirclePlus; 88 \cdot &Center Dot; {D D.}_{1,3 1,3} &CirclePlus; &CirclePlus; 66 \cdot &Center Dot; {D D.}_{2,2 2,2};;$

其中表示异或运算，C_i，p是所有校验片中编号第i×4+p+1个校验片，p＝0～3；in Indicates XOR operation, C_{i, p} is the number i×4+p+1 check piece in all check pieces, p=0～3;

(4)检查数据块状态步骤：(4) Check the data block status steps:

(5)构造修复矩阵步骤，如图2所示，包括下述子步骤：(5) construct repair matrix step, as shown in Figure 2, comprise following sub-step:

(5.1)当第0个数据节点上的数据块D₀出错或丢失，其四个数据片均出错或丢失，将生成矩阵G中第0列的子矩阵全设置为0，构成第一中间矩阵G_A：(5.1) When the data block D₀ on the 0th data node is wrong or lost, and its four data pieces are all wrong or lost, all the sub-matrixes in the 0th column in the generation matrix G are set to 0 to form the first intermediate matrix G_A :

${G G}_{A A} = = [\begin{matrix} 00 & 00 & 00 & 00 & 11 & 00 & 00 & 00 & 11 & 00 & 00 & 00 \\ 00 & 00 & 00 & 00 & 00 & 11 & 00 & 00 & 00 & 11 & 00 & 00 \\ 00 & 00 & 00 & 00 & 00 & 00 & 11 & 00 & 00 & 00 & 11 & 00 \\ 00 & 00 & 00 & 00 & 00 & 00 & 00 & 11 & 00 & 00 & 00 & 11 \\ 00 & 00 & 00 & 00 & 00 & 00 & 88 & 00 & 00 & 00 & 00 & 66 \\ 00 & 00 & 00 & 00 & 00 & 00 & 00 & 88 & 00 & 00 & 66 & 00 \\ 00 & 00 & 00 & 00 & 88 & 00 & 00 & 00 & 00 & 66 & 00 & 00 \\ 00 & 00 & 00 & 00 & 00 & 88 & 00 & 00 & 66 & 00 & 00 & 00 \end{matrix}];;$

(5.2)在第一中间矩阵G_A中选取非零子矩阵G_0，1，在子矩阵G_0，1中选取矩阵元素1(G_A的第一行第五列)作为种子，在第一中间矩阵G_A中标记该非零矩阵元素所在的行向量(第一行各矩阵元素)与列向量(第五列各矩阵元素)；(5.2) Select a non-zero sub-matrix G_0,1 in the first intermediate matrix G_A , select matrix element 1 (the first row and fifth column of G_A ) in the sub-matrix G_0,1 as the seed, in the first The row vector (each matrix element in the first row) and the column vector (each matrix element in the fifth column) where the non-zero matrix element is marked in the intermediate matrix G_A ;

(5.4)判断是否有新的行向量和列向量被标记，是则转子步骤(5.3)，否则进行子步骤(5.5)；共计标记了四个行向量第1、2、7、8行向量和四个列向量第5、6、9、10列向量；(5.4) Judging whether there are new row vectors and column vectors to be marked, if so, then the rotor step (5.3), otherwise proceed to sub-step (5.5); a total of four row vectors 1st, 2nd, 7th, 8th row vectors and The 5th, 6th, 9th, and 10th column vectors of four column vectors;

(5.5)构成修复矩阵M_r：(5.5) Constitute the repair matrix M_r :

首先生成一个12行，12列大小的单位矩阵G_B；First generate a 12-row, 12-column identity matrix G_B ;

将第一中间矩阵G_A标记的列向量序号第5、6、9、10作为行向量序号，从单位矩阵G_B中选取对应的行向量，作为第一行向量组；The column vector number 5, 6, 9, and 10 marked by the first intermediate matrix G_A is used as the row vector number, and the corresponding row vector is selected from the unit matrix G_B as the first row vector group;

$[\begin{matrix} 00 & 00 & 00 & 00 & 11 & 00 & 00 & 00 & 00 & 00 & 00 & 00 \\ 00 & 00 & 00 & 00 & 00 & 11 & 00 & 00 & 00 & 00 & 00 & 00 \\ 00 & 00 & 00 & 00 & 00 & 00 & 00 & 00 & 11 & 00 & 00 & 00 \\ 00 & 00 & 00 & 00 & 00 & 00 & 00 & 00 & 00 & 11 & 00 & 00 \end{matrix}],,$

将第一中间矩阵G_A标记的行向量序号第1、2、7、8作为行向量序号，从生成矩阵G中选取对应的行向量，作为第二行向量组；Using the row vector numbers 1, 2, 7, and 8 marked by the first intermediate matrix G_A as the row vector numbers, select the corresponding row vector from the generator matrix G as the second row vector group;

$[\begin{matrix} 11 & 00 & 00 & 00 & 11 & 00 & 00 & 00 & 11 & 00 & 00 & 00 \\ 00 & 11 & 00 & 00 & 00 & 11 & 00 & 00 & 00 & 11 & 00 & 00 \\ 00 & 00 & f f & 00 & 88 & 00 & 00 & 00 & 00 & 66 & 00 & 00 \\ 00 & 00 & 00 & f f & 00 & 88 & 00 & 00 & 66 & 00 & 00 & 00 \end{matrix}],,$

将所述第一行向量组置于第二行向量组之上，构成第二中间矩阵G_c：Put the first row vector group on the second row vector group to form the second intermediate matrix G_c :

${G G}_{c c} = = [\begin{matrix} 00 & 00 & 00 & 00 & 11 & 00 & 00 & 00 & 00 & 00 & 00 & 00 \\ 00 & 00 & 00 & 00 & 00 & 11 & 00 & 00 & 00 & 00 & 00 & 00 \\ 00 & 00 & 00 & 00 & 00 & 00 & 00 & 00 & 11 & 00 & 00 & 00 \\ 00 & 00 & 00 & 00 & 00 & 00 & 00 & 00 & 00 & 11 & 00 & 00 \\ 11 & 00 & 00 & 00 & 11 & 00 & 00 & 00 & 11 & 00 & 00 & 00 \\ 00 & 11 & 00 & 00 & 00 & 11 & 00 & 00 & 00 & 11 & 00 & 00 \\ 00 & 00 & f f & 00 & 88 & 00 & 00 & 00 & 00 & 66 & 00 & 00 \\ 00 & 00 & 00 & f f & 00 & 88 & 00 & 00 & 66 & 00 & 00 & 00 \end{matrix}],,$

将第二中间矩阵G_c中全零列删除得到正方矩阵G_d：Delete all zero columns in the second intermediate matrix G_c to get a square matrix G_d :

${G G}_{d d} = = [\begin{matrix} 00 & 00 & 00 & 00 & 11 & 00 & 00 & 00 \\ 00 & 00 & 00 & 00 & 00 & 11 & 00 & 00 \\ 00 & 00 & 00 & 00 & 00 & 00 & 11 & 00 \\ 00 & 00 & 00 & 00 & 00 & 00 & 00 & 11 \\ 11 & 00 & 00 & 00 & 11 & 00 & 11 & 00 \\ 00 & 11 & 00 & 00 & 00 & 11 & 00 & 11 \\ 00 & 00 & f f & 00 & 88 & 00 & 00 & 66 \\ 00 & 00 & 00 & f f & 00 & 88 & 66 & 00 \end{matrix}],,$

G_d的逆矩阵Inverse matrix of G_d

${G G}_{d d}^{- - 11} = = [\begin{matrix} 11 & 00 & 00 & 00 & 11 & 00 & 00 & 00 \\ 11 & 11 & 00 & 00 & 11 & 11 & 00 & 00 \\ 5252 & 9696 & 00 & 5353 & 9696 & 9696 & 9696 & 00 \\ 00 & c c 44 & 5353 & 00 & 00 & 00 & 00 & 9696 \\ 11 & 00 & 00 & 00 & 00 & 00 & 00 & 00 \\ 00 & 11 & 00 & 00 & 00 & 00 & 00 & 00 \\ 00 & 00 & 11 & 00 & 00 & 00 & 00 & 00 \\ 00 & 00 & 00 & 11 & 00 & 00 & 00 & 00 \end{matrix}],,$

然后从正方矩阵G_d的逆矩阵中选择从0开始的4个行向量，构成4行、8列的修复矩阵M_r：Then from the inverse matrix of the square matrix G_d Select 4 row vectors starting from 0 to form a repair matrix M_r with 4 rows and 8 columns:

${M m}_{r r} = = [\begin{matrix} 11 & 00 & 00 & 00 & 11 & 00 & 00 & 00 \\ 11 & 11 & 00 & 00 & 11 & 11 & 00 & 00 \\ 5252 & 9696 & 00 & 5353 & 9696 & 9696 & 9696 & 00 \\ 00 & c c 44 & 5353 & 00 & 00 & 00 & 00 & 9696 \end{matrix}];;$

(6)修复数据块步骤：(6) Repair data block steps:

选取第一中间矩阵G_A标记的列向量的列序号第5、6、9、10作为数据片序号，其所对应的各数据片作为数据片序列SD_r，数据片数量为4，分别为数据片D_1，0，D_1，1，D_2，0，D_2，1，选取第一中间矩阵G_A标记的行向量的行序号第1、2、7、8作为校验片序号，其所对应的各校验片作为校验片序列SC_r，校验片数量为4，分别为校验片C_0，0，C_0，1，C_1，2，C_1，3；Select the column numbers 5, 6, 9, and 10 of the column vector marked by the first intermediate matrix G_A as the data piece number, and the corresponding data pieces as the data piece sequence SD_r , the number of data pieces is 4, respectively Slices D_1,0 , D_1,1 , D_2,0 , D_2,1 , select the row numbers 1, 2, 7, and 8 of the row vector marked by the first intermediate matrix G_A as the verification slice numbers, and The corresponding check slices are used as the check slice sequence SC_r , and the number of check slices is 4, which are check slices C_0,0 , C_0,1 , C_1,2 , C_1,3 ;

通过计算M_r和数据片序列SD_r和校验片序列SC_r的乘积重建数据块D₀：Reconstruct data block D₀ by calculating the product of M_r and data slice sequence SD_r and check slice sequence SC_r :

${D D.}_{00} = = [\begin{matrix} {D D.}_{0,0 0,0} \\ {D D.}_{0,1 0,1} \\ {D D.}_{0,2 0,2} \\ {D D.}_{0,3 0,3} \end{matrix}] = = [\begin{matrix} 11 & 00 & 00 & 00 & 11 & 00 & 00 & 00 \\ 11 & 11 & 00 & 00 & 11 & 11 & 00 & 00 \\ 5252 & 9696 & 00 & 5353 & 9696 & 9696 & 9696 & 00 \\ 00 & c c 44 & 5353 & 00 & 00 & 00 & 00 & 9696 \end{matrix}] \cdot &Center Dot; [\begin{matrix} {D D.}_{1,0 1,0} \\ {D D.}_{1,1 1,1} \\ {D D.}_{2,0 2,0} \\ {D D.}_{2,1 2,1} \\ {D D.}_{00,, 00} \\ {D D.}_{0,1 0,1} \\ {D D.}_{1,2 1,2} \\ {D D.}_{1,3 1,3} \end{matrix}] . .$

Claims

Translated fromChinese

1.一种基于纠删码的数据块重建方法，包括数据分块步骤、构造生成矩阵G步骤、生成校验块步骤、检查数据块状态步骤、构造修复矩阵步骤和修复数据块步骤，其特征在于：1. A data block reconstruction method based on erasure codes, comprising a data block step, constructing a generation matrix G step, generating a check block step, checking a data block state step, constructing a repair matrix step and repairing a data block step, its characteristics in:

(1)数据分块步骤：(1) Data block steps:

(2)构造生成矩阵G步骤：(2) Steps of constructing generator matrix G:

[\begin{matrix} {G G}_{0,0 0,0} & {G G}_{0,1 0,1} & . . . . . . & {G G}_{00,, k k - - 11} \\ {G G}_{1,0 1,0} & {G G}_{1,1 1,1} & . . . . . . & {G G}_{11,, k k - - 11} \\ . . . . . . & . . . . . . & . . . . . . & . . . . . . \\ {G G}_{m m - - 1,0 1,0} & {G G}_{m m - - 1,1 1,1} & . . . . . . & {G G}_{m m - - 11,, k k - - 11} \end{matrix}],,

G_{i, j} = {(I_{m}^{i +})}^{&CircleTimes; j} &CircleTimes; {(I_{m})}^{&CircleTimes; (k - 1 - j)} \cdot α_{i, j},

i＝0～m-1，j＝0～k-1；

G_{i, j} = {(I_{m}^{i +})}^{&CircleTimes; j} &CircleTimes; {(I_{m})}^{&CircleTimes; (k - 1 - j)} \cdot α_{i, j},

i=0～m-1, j=0～k-1;

(3)生成校验块步骤：(3) Steps to generate check block:

{C C}_{i i} = = [\begin{matrix} {C C}_{i i,, 00} \\ {C C}_{i i,, 11} \\ . . \\ . . \\ . . \\ {C C}_{i i,, r r - - 11} \end{matrix}] = = [\begin{matrix} {G G}_{i i,, 00} & {G G}_{i i,, 11} & . . . . . . & {G G}_{i i,, k k - - 11} \end{matrix}] \cdot &Center Dot; [\begin{matrix} {D D.}_{00} \\ {D D.}_{11} \\ . . \\ . . \\ . . \\ {D D.}_{k k - - 11} \end{matrix}];;

校验块C_i再等分为r个校验片C_j，p，p＝0～r-1；对所有校验片赋予序号，校验片C_j，p为第i×r+p+1个校验片；The check block C_i is divided into r check slices C_{j, p} , p=0~r-1; all check slices are assigned serial numbers, and the check slice C_{j, p} is the i×r+p+ 1 check sheet;

(4)检查数据块状态步骤：(4) Check the data block status steps:

(5.5)构成修复矩阵M_r：(5.5) Constitute the repair matrix M_r :

(6)修复数据块步骤：(6) Repair data block steps:

{D D.}_{i i} = = {M m}_{r r} \times \times [\begin{matrix} {SD SD}_{r r} \\ {SC SC}_{r r} \end{matrix}] . .