CN103139300A


Info

Publication number: CN103139300A
Application number: CN2013100465409A
Authority: CN
Inventors: 张纪林; 韩书婷; 万健; 朱宝金; 周丽; 任永坚
Original assignee: Hangzhou Dianzi University
Current assignee: Hangzhou Dianzi University
Priority date: 2013-02-05
Filing date: 2013-02-05
Publication date: 2013-06-05

Abstract

The invention discloses an optimization method for virtual machine image management based on data deduplication. When a virtual machine image is uploaded, the client splits the image file into data blocks using fixed-size chunking, with the block size equal to the cluster size of the local file system. The client then computes the MD5 value of each block and sends this fingerprint to the server over a socket connection. The server looks up the fingerprint and returns the result, and the client uses the result to decide whether to send the data block, which saves network resources. During fingerprint lookup, the server uses a fingerprint filter and a fingerprint store to reduce memory usage and disk accesses. When a data block is saved, it is written directly into one complete cluster, which avoids the repeated work of reassembling and re-splitting the image. The invention implements a kernel-mode file system with online data deduplication, reducing disk storage and network consumption.

Description

Translated from Chinese

An Optimization Method for Virtual Machine Image Management Based on Data Deduplication

Technical Field

The invention relates to a method for managing virtual machine images on a cloud computing platform, and in particular to an optimization method for virtual machine image management based on online data deduplication.

Background

Cloud computing IaaS systems (Infrastructure as a Service; hereinafter "cloud computing systems") deliver computing services to users in the form of virtual machines. User requirements are varied and changeable, so a cloud computing system must offer virtual machines in many configurations, for example operating systems of different families, or 32-bit and 64-bit systems. To bring up virtual machines quickly, cloud computing systems manage them by storing virtual machine images. A single user may own several differently configured images, and an image file is usually larger than 2 GB. As cloud computing systems grow, both the transfer and the storage of virtual machine images put increasing pressure on system management costs.

In traditional virtual machine image management, an image file is uploaded directly from the client to the image server. This approach is simple to implement but consumes large amounts of network and storage resources, because it ignores the fact that different virtual machine image files contain a great deal of duplicate data. As demand for virtual machine services increases, the amount of duplicate data keeps growing, the resource utilization of the cloud computing system drops, and costs rise.

In recent years, with the development of cloud computing technology, more and more enterprises have begun to deploy public and private clouds. As enterprises expand and user demand grows, clouds become larger, forcing enterprises to increase their investment. Yet the utilization of existing resources is not high. The management of virtual machine image files in particular consumes most of the network and storage resources, and the traditional way of managing these files shows severe limitations.

Summary of the Invention

The object of the invention is to address the above problems of virtual machine image file management in traditional cloud computing systems by providing a method that makes full use of computer performance and the hardware platform to optimize the transfer and storage of virtual machine image files. When a virtual machine image is uploaded, the method reduces the volume of network traffic, which improves transfer efficiency and yields higher I/O throughput. Data deduplication eliminates large amounts of duplicate data, avoids unnecessary storage, improves the resource utilization of the storage system, lowers the cost of storing image data, and makes image management more flexible. This solves the problem that image management in traditional cloud computing systems consumes large amounts of network and storage resources.

The technical scheme adopted by the invention is as follows:

When the client needs to upload a newly generated virtual machine image file, it first registers the image in the image database, and an inode is allocated for the image file on the image server. The client splits the image file into blocks using fixed-size chunking, with the block size equal to the cluster size of the server's file system. It computes the MD5 hash of each data block, which serves as the block's fingerprint, and sends the fingerprints to the server one by one. When the server daemon receives a fingerprint, it consults the fingerprint filter and the fingerprint store to determine whether the fingerprint already exists, and returns the lookup result to the client. If the result shows that the fingerprint is already in the fingerprint store, the corresponding data block already exists and is not sent; the server only updates the pointer in the image's inode so that it points to the existing identical block, and updates that block's reference count. Otherwise the client sends the data block. When the block is saved, it is written directly into one complete cluster, and the inode pointer is set to the newly stored block. This process repeats until the whole image file has been sent, after which the image file is registered as available in the image database.
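The client-side chunking and fingerprinting step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the 4096-byte cluster size, the function name `fingerprint_blocks`, and the zero-padding of a short final block are all assumptions, since the text does not specify them.

```python
import hashlib
import io

CLUSTER_SIZE = 4096  # assumed cluster size in bytes; the patent does not fix a value


def fingerprint_blocks(stream, block_size=CLUSTER_SIZE):
    """Yield (fingerprint, block) pairs for fixed-size blocks read from stream.

    A short final block is zero-padded to a full cluster. The padding is an
    assumption made here so that every block fills exactly one cluster.
    """
    while True:
        block = stream.read(block_size)
        if not block:
            break
        block = block.ljust(block_size, b"\x00")
        # The MD5 digest of the block is used as its fingerprint.
        yield hashlib.md5(block).digest(), block


# Three blocks, the first and third identical, so their fingerprints match.
image = io.BytesIO(b"A" * 4096 + b"B" * 4096 + b"A" * 4096)
fingerprints = [fp for fp, _ in fingerprint_blocks(image)]
print(len(fingerprints), fingerprints[0] == fingerprints[2])  # 3 True
```

Identical blocks always produce identical fingerprints, which is what allows the server to recognize a block it already holds from the 16-byte digest alone.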

Beneficial effects of the invention:

1. The invention uses a hash function to compute the fingerprint of each data block and sends the fingerprint to the server ahead of the data. Relying on the collision resistance of the hash function, the server compares fingerprints to determine whether a block is already stored, which avoids retransmitting data blocks and improves network transfer efficiency.

2. The data blocks produced by the invention are equal in size to a file system cluster. The client splits the file and sends the blocks to the server, which stores each block directly in one cluster of the file system and adds the block's fingerprint directly to the system fingerprint library. This spares the server the repeated work of reassembling the file, splitting it, and computing fingerprints.

3. The invention exploits spatial locality. Only the partial fingerprints used for indexing are kept in memory, while the complete fingerprints are kept on disk. Fingerprints are placed in the same block group as their data blocks, which takes the underlying disk layout into account and greatly reduces the disk access overhead of reading fingerprints from disk.

4. The invention uses fingerprint prefetching: the fingerprints of the data blocks in a block group are read ahead into the page cache, which reduces the seek time needed to update on-disk fingerprints and reference counts and trades a small memory overhead for higher I/O performance.

5. The invention optimizes the overall performance of virtual machine image management in a cloud computing system. Both the bandwidth utilization when transferring new images and the utilization of disk space improve considerably, which greatly reduces the cost of deploying a cloud computing system.

6. The invention can be used for virtual machine image management on a variety of cloud computing platforms.

Description of the Drawings

Figure 1 is a layout diagram of the fingerprint store.

Figure 2 is a design diagram of the fingerprint filter.

Figure 3 is a flow chart of image management based on data deduplication according to the invention.

Detailed Description

Figure 1 is a layout diagram of the fingerprint store within a disk block group. The inode area stores the inodes of files and directories. The fingerprint store occupies the disk space immediately after the inode area and records the fingerprints and reference information of all data blocks in that block group. The disk space after the fingerprint store holds the data blocks.

Figure 2 is a design diagram of the in-memory fingerprint filter. It is a two-level filter: the first level maps the first n bits of a fingerprint, called the index key, and the second level maps the following k bits, called the bucket key. The index table is an array of 2^n elements. The array subscript represents the first n bits of a fingerprint, and each element stores the address of the fingerprints whose first n bits equal that subscript. The filter stores fingerprints that share the same first n bits together in one disk data block, called a bucket. A bucket holds the bucket keys and, for each one, the disk block number of the data block corresponding to that fingerprint. The bucket key is the [n+1, n+1+k] field of a fingerprint; all fingerprints in a bucket share the same first n bits. Bucket keys are sorted in ascending order. When a bucket is full, a new data block is allocated as an additional bucket and linked to the previous one as a linked list.
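A minimal sketch of such a two-level filter is shown below. The widths `N_BITS` and `K_BITS`, the bucket capacity, and the class and function names are illustrative assumptions. Buckets are modeled as in-memory Python lists rather than disk data blocks, and bucket chains as lists of buckets rather than linked lists.

```python
import bisect

N_BITS = 8           # assumed width of the index key
K_BITS = 16          # assumed width of the bucket key
BUCKET_CAPACITY = 4  # assumed entries per bucket; kept tiny to show chaining


def split_fingerprint(fp: bytes):
    """Return (index_key, bucket_key): the first n bits and the next k bits."""
    value = int.from_bytes(fp, "big")
    total = len(fp) * 8
    index_key = value >> (total - N_BITS)
    bucket_key = (value >> (total - N_BITS - K_BITS)) & ((1 << K_BITS) - 1)
    return index_key, bucket_key


class FingerprintFilter:
    def __init__(self):
        # Index table with 2**n slots; each slot holds a chain of buckets.
        self.index = [[] for _ in range(1 << N_BITS)]

    def insert(self, fp, block_no):
        index_key, bucket_key = split_fingerprint(fp)
        chain = self.index[index_key]
        if not chain or len(chain[-1]) >= BUCKET_CAPACITY:
            chain.append([])  # last bucket is full: open a new one
        # Keep (bucket_key, block_no) entries sorted in ascending order.
        bisect.insort(chain[-1], (bucket_key, block_no))

    def lookup(self, fp):
        """Return the block numbers of all entries matching the n+k bit prefix."""
        index_key, bucket_key = split_fingerprint(fp)
        hits = []
        for bucket in self.index[index_key]:
            start = bisect.bisect_left(bucket, (bucket_key, -1))
            for key, block_no in bucket[start:]:
                if key != bucket_key:
                    break
                hits.append(block_no)
        return hits


fp_a = bytes([0x12, 0x34, 0x56]) + bytes(13)
fp_b = bytes([0x12, 0x34, 0x56]) + b"\x01" * 13  # same n+k prefix as fp_a
fp_c = bytes([0x12, 0x34, 0x57]) + bytes(13)     # different bucket key
flt = FingerprintFilter()
flt.insert(fp_a, 7)
flt.insert(fp_b, 9)
print(flt.lookup(fp_a), flt.lookup(fp_c))  # [7, 9] []
```

Because only n+k bits of each fingerprint are kept, the filter can report a block for a fingerprint that merely shares a prefix with it, as `fp_a` and `fp_b` do here. A match is therefore only a candidate until the full fingerprint has been compared.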

The method transfers and stores a virtual machine image in the following steps:

(1) Image file transfer process

The client splits the file into data blocks using fixed-size chunking, computes the MD5 hash of each block, and sends the hash to the server. The server then searches the fingerprint library. If the fingerprint is already in the library, the corresponding data block already exists on the server, and the server tells the client not to send it. If the fingerprint is not in the library, the server does not hold the corresponding data block, and the server tells the client to send it.

The client proceeds as follows:

Step 1: Read the data file, split it with the fixed-length chunking algorithm, and compute the hash of each resulting data block.

Step 2: Send the hash to the storage server and wait for the server's reply.

Step 3: The server replies to the client, indicating whether the hash fingerprint exists. If it does not exist, send the corresponding file data block; if it exists, return to Step 2 and send the next fingerprint.

Step 4: Repeat until the whole file has been read.

The server proceeds as follows:

Step 1: After receiving a hash fingerprint from the client, look it up through the in-memory fingerprint filter and the fingerprint store. If it is found, update the data block's index entry and send a "found" message to the client; otherwise send a "not found" message.

Step 2: If the fingerprint was not found, start receiving data: accept the file data sent by the client, write the data block to disk, and adjust the pointer. Update the in-memory filter and the fingerprint store, and send the processing result to the client.

Step 3: On receiving the end flag from the client, update the image database, mark the image as active, and exit.
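The client and server flows above can be simulated in memory as follows. This sketch replaces the socket exchange with direct method calls and the fingerprint filter and store with a plain dictionary. The `Server` class, its method names, and the `upload` function are hypothetical, chosen only to mirror the numbered steps.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed cluster size


class Server:
    """In-memory stand-in for the server daemon (hypothetical class)."""

    def __init__(self):
        self.blocks = {}  # fingerprint -> stored block
        self.images = {}  # image name -> state

    def query(self, fp):
        # Server Step 1: report whether the fingerprint is already known.
        return fp in self.blocks

    def receive_block(self, fp, block):
        # Server Step 2: store a block whose fingerprint was not found.
        self.blocks[fp] = block

    def finish(self, name):
        # Server Step 3: mark the image active once the end flag arrives.
        self.images[name] = "active"


def upload(server, name, data):
    """Client Steps 1-4; returns the number of blocks actually sent."""
    sent = 0
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        fp = hashlib.md5(block).digest()
        if not server.query(fp):           # fingerprint goes first
            server.receive_block(fp, block)  # data only when needed
            sent += 1
    server.finish(name)
    return sent


server = Server()
base = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE
print(upload(server, "img-1", base))                      # 2
print(upload(server, "img-2", base + b"C" * BLOCK_SIZE))  # 1
```

The second image shares two of its three blocks with the first, so only one block crosses the (simulated) network; the other two cost a 16-byte fingerprint each.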

(2) Fingerprint library lookup process

When the fingerprint filter is initialized, an index table is allocated first. The fingerprint store resides on disk, and its design follows the file system's disk layout. The file system organizes data blocks into block groups; to exploit spatial locality, a fingerprint store is placed in each block group as well. Figure 1 shows how the fingerprint store is laid out within a block group. Each block group is allocated one fingerprint store, which is an array whose elements each consist of a fingerprint and a reference count. After a data block is written to disk, the fingerprint and reference count for that block must be updated.
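One way to picture the per-block-group fingerprint array is as fixed-size records indexed by a block's position within its group. The sketch below assumes a 16-byte MD5 fingerprint followed by a 32-bit reference count and 32768 blocks per group; neither the record layout nor the group size is given in the text, and the function names are hypothetical.

```python
import struct

BLOCKS_PER_GROUP = 32768        # assumed; the patent gives no figure
ENTRY = struct.Struct("<16sI")  # assumed record: 16-byte MD5 + 32-bit reference count


def locate(block_no):
    """Map a disk block number to (block group, slot in that group's fingerprint store)."""
    return divmod(block_no, BLOCKS_PER_GROUP)


def write_entry(store, slot, fingerprint, refcount):
    """Write one (fingerprint, refcount) record into a group's fingerprint store."""
    ENTRY.pack_into(store, slot * ENTRY.size, fingerprint, refcount)


def read_entry(store, slot):
    """Read back the (fingerprint, refcount) record at the given slot."""
    return ENTRY.unpack_from(store, slot * ENTRY.size)


# One block group's fingerprint store, modeled as a byte array.
store = bytearray(ENTRY.size * BLOCKS_PER_GROUP)
group, slot = locate(32769)
write_entry(store, slot, b"\x11" * 16, 1)
print(group, slot, read_entry(store, slot)[1])  # 1 1 1
```

With fixed-size records, the entry for a block is found by arithmetic on its block number, so no separate on-disk index is needed to go from a block to its fingerprint.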

When a fingerprint arrives at the file system, the fingerprint filter is searched first. If the fingerprint matches an "n+k"-bit prefix in the filter, the filter returns the corresponding physical disk block number. Several fingerprints may share the same "n+k"-bit prefix, so the filter may return the same block number more than once. To eliminate false positives, the file system uses the returned block number to find the corresponding fingerprint store and verifies that the arriving fingerprint matches the complete fingerprint stored there.
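The verification step can be sketched as a second check on whatever the filter returns. Here the filter is reduced to a dictionary keyed by a 3-byte prefix, and the fingerprint store to a dictionary from block number to (full fingerprint, reference count). Both are simplifying assumptions, as are the names `find_block` and `candidates_for`.

```python
def find_block(fp, candidates_for, fingerprint_store):
    """Return the number of the block whose full fingerprint equals fp, or None.

    candidates_for(fp) plays the role of the filter and may return false
    positives; fingerprint_store maps block number -> (full fingerprint, refcount).
    """
    for block_no in candidates_for(fp):
        stored_fp, _ = fingerprint_store[block_no]
        if stored_fp == fp:  # compare the complete fingerprint
            return block_no
    return None


prefix = b"\xaa\xbb\xcc"
fingerprint_store = {
    10: (prefix + b"\x01" * 13, 1),
    11: (prefix + b"\x02" * 13, 3),
}
prefix_index = {prefix: [10, 11]}  # toy filter: knows prefixes only


def candidates_for(fp):
    return prefix_index.get(fp[:3], [])


print(find_block(prefix + b"\x02" * 13, candidates_for, fingerprint_store))  # 11
print(find_block(prefix + b"\x03" * 13, candidates_for, fingerprint_store))  # None
```

The second query matches the filter's prefix but no stored fingerprint, so it is correctly treated as a new block rather than a duplicate.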

The file system implements a fingerprint prefetching mechanism. When the file system accesses the fingerprint store of a block group, it does not access only the target element; it also reads the block group's entire fingerprint filter ahead into the page cache. Subsequent sequential writes within the same block group can then update the fingerprint filter directly in the page cache, which reduces the number of disk seeks.
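The effect of prefetching can be illustrated with a small cache that loads a block group's whole fingerprint array on first access and serves later accesses from memory. The class `PrefetchingStore`, the dictionary standing in for the disk, and the `disk_reads` counter are assumptions for illustration; writing cached changes back to disk is omitted.

```python
class PrefetchingStore:
    """Caches a block group's whole fingerprint array on first access (sketch)."""

    def __init__(self, disk, blocks_per_group):
        self.disk = disk  # block group -> list of (fingerprint, refcount); stands in for on-disk arrays
        self.blocks_per_group = blocks_per_group
        self.page_cache = {}
        self.disk_reads = 0

    def _group(self, block_no):
        group, slot = divmod(block_no, self.blocks_per_group)
        if group not in self.page_cache:
            # One read brings in every entry of the group, not just the target.
            self.page_cache[group] = list(self.disk[group])
            self.disk_reads += 1
        return self.page_cache[group], slot

    def get(self, block_no):
        entries, slot = self._group(block_no)
        return entries[slot]

    def bump(self, block_no):
        """Increment a reference count in the cached copy."""
        entries, slot = self._group(block_no)
        fp, count = entries[slot]
        entries[slot] = (fp, count + 1)


disk = {0: [(b"fp0", 1), (b"fp1", 1), (b"fp2", 1), (b"fp3", 1)]}
store = PrefetchingStore(disk, blocks_per_group=4)
store.get(0)
store.bump(1)
store.bump(2)
print(store.disk_reads, store.get(1))  # 1 (b'fp1', 2)
```

Three accesses to the same block group cost a single simulated disk read, which is the saving the mechanism aims for when an image's blocks are written sequentially.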

(3) Image file storage process

Writes to the file system fall into two cases. In the first case, the fingerprint sent by the client already exists on the server. The reference count for that fingerprint in the fingerprint store is updated, and the disk logical block number associated with the file logical block number in the image file's inode is modified. In the second case, the fingerprint sent by the client does not exist on the server. The fingerprint is first written to the fingerprint store and its reference count is updated. The client is asked to send the corresponding data block, and once the server receives the block, it writes it to disk. In this method the data block size equals the file system cluster size, so one data block is stored in one cluster of the file system, and the inode is then modified so that the file pointer points to that cluster.
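The two write cases can be sketched with an inode modeled as a list of cluster numbers, one per file logical block. `DedupStore` and its fields are hypothetical names, and the fingerprint store and the disk are reduced to in-memory dictionaries and lists.

```python
class DedupStore:
    """Toy model of the server-side write path (hypothetical names)."""

    def __init__(self):
        self.clusters = []        # cluster number -> block data
        self.by_fingerprint = {}  # fingerprint -> cluster number
        self.refcount = {}        # cluster number -> reference count

    def write(self, inode, fp, block=None):
        """Append one file block to inode and return its cluster number."""
        if fp in self.by_fingerprint:
            # Case 1: fingerprint exists; bump the reference count only.
            cluster = self.by_fingerprint[fp]
            self.refcount[cluster] += 1
        else:
            # Case 2: new fingerprint; record it and store the block
            # in one whole cluster.
            cluster = len(self.clusters)
            self.clusters.append(block)
            self.by_fingerprint[fp] = cluster
            self.refcount[cluster] = 1
        inode.append(cluster)  # point the file block at the cluster
        return cluster


store = DedupStore()
inode_a, inode_b = [], []
store.write(inode_a, "f1", b"block-1")
store.write(inode_a, "f2", b"block-2")
store.write(inode_b, "f1")             # duplicate: no data needed
store.write(inode_b, "f3", b"block-3")
print(inode_a, inode_b, store.refcount[0], len(store.clusters))  # [0, 1] [0, 2] 2 3
```

Both images reference cluster 0, so four file blocks occupy three clusters. The reference count is what would later tell the file system whether a cluster can be freed when an image is deleted, although deletion is not described in the text.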

The invention is described in further detail below with reference to the drawings and an embodiment.

The implementation of the invention is explained by following the steps shown in Figure 3:

(1) Image file transfer process

After the image is generated, the client sends a write-file command to the server, as described in step 1, and the server executes step 2 to process the write command. Step 3 then sends the image file's metadata to the server. The metadata includes the file type, access permissions, owner, timestamps, size, data block pointers, and so on. In step 4 the server processes the image file metadata sent by the client, calls the open() system call with the file's name to open the file, and prepares to write to it.

Step 5 invokes the image file chunking routine and passes control to step 6, which checks whether the current file pointer is at the end of the file. If it is, step 15 sends an end signal to inform the server that this image transfer is complete. Otherwise step 7 splits the image file using static chunking.

In step 7, the MD5 hash of the block is computed as its fingerprint and sent to the server. After the client receives the result of the server's fingerprint library query, it decides whether the data block corresponding to the fingerprint must be sent. If so, it executes step 17 and sends the data block; otherwise it executes step 11 and moves on to the next block. Control then returns to step 6.

(2) Fingerprint library lookup process

The server executes step 8 to query the fingerprint library. The query proceeds as follows:

The fingerprint filter is searched first. If the fingerprint matches an "n+k"-bit prefix in the filter, the filter returns the corresponding physical disk block number. The returned block number is then used to find the corresponding fingerprint store, and the arriving fingerprint is checked against the complete fingerprint stored there. If the arriving fingerprint already exists in the fingerprint library, step 10 updates the fingerprint's reference count. If no record of the fingerprint is found, the fingerprint is not in the library; step 12 saves the fingerprint and updates its reference count, and the client is told to send the corresponding data block.

(3) Image file storage process

Data storage has two parts: storing fingerprints and their reference counts, and storing data blocks. Fingerprints are kept in the fingerprint store, whose layout follows the file system's disk layout as shown in Figure 1. Because the fingerprint store is placed on disk according to the file system layout, disk seek time during deduplication is further reduced. If the query result shows that the arriving fingerprint exists in the fingerprint library, step 10 updates the fingerprint's reference count. Otherwise step 12 saves the received fingerprint and updates its reference count.

After the server receives the data block sent by the client, it executes step 14 and writes the data block to disk.

Claims

1. An optimization method for virtual machine image management based on data deduplication, characterized in that the method is specifically as follows:

When the client needs to upload a newly generated virtual machine image file, it first registers the image in the image database, and an inode is allocated for the image file on the image server; on the client, the image file is split into blocks using fixed-size chunking, the data block size being identical to the cluster size of the server's file system; the MD5 hash of each data block is computed, this hash being the fingerprint, and the fingerprints are then sent to the server one by one; after the server daemon receives a fingerprint, it determines whether the received fingerprint already exists by accessing the fingerprint filter and the fingerprint store, and the server returns the lookup result to the client; according to the returned lookup result, if the fingerprint already exists in the fingerprint store, the data block corresponding to the fingerprint already exists and the data block is not sent, the pointer in the inode on the image server merely being modified to point to the identical stored data block and the reference count of that data block being updated; otherwise the data block is sent, and when the data block is saved it is stored directly in one complete cluster and the inode pointer is set to point to the newly stored data block; the above process is repeated until the whole image file has been sent, and the image file is registered as available in the image database.