Data deduplication system and method based on inter-network secure IP tunnel
Technical Field
The invention belongs to the field of network systems, and particularly relates to a data deduplication system and method based on an inter-network secure IP tunnel.
Background
With the continuous development of internet technology and the steady growth in the number of internet users, the global data volume has increased greatly. Managing data storage economically and efficiently has therefore become one of the most challenging tasks for mass storage systems in the big data era. At the same time, limited by the storage capacity of a single device, individuals and organizations often rely on cloud storage service providers for low-cost storage, transmission and backup of ever-increasing data. To further improve storage efficiency and reduce storage cost, cloud service providers typically adopt data deduplication technology to avoid storing duplicate data more than once, which reduces storage overhead and also reduces the upload bandwidth required of users.
Meanwhile, because a real network environment contains a large number of malicious attackers, plaintext data usually must be encrypted before transmission to avoid disclosing users' private data. However, when different users upload the same data, each user encrypts with a different private key, so the same plaintext becomes different ciphertexts and the data storage server cannot identify the duplication.
Current research offers two conventional solutions: a method based on convergent encryption (Convergent Encryption) and a method based on SSL or TLS. For a data plaintext M, the convergent encryption method uses a hash function to compute the hash value H(M) and encrypts the plaintext with that hash value as the key, obtaining E(H(M), M). To ensure that the user can still decrypt the data when downloading the file, the data center must additionally store the hash value H(M) that serves as the encryption key. For security reasons, H(M) is itself encrypted with each user's private key Ka to obtain E(Ka, H(M)). The SSL or TLS-based method encrypts data with the SSL or TLS protocol during transmission, and the data storage server decrypts it on receipt before performing subsequent operations. Each method has a drawback. With convergent encryption, because the encrypted hash key must be stored in addition to the data, deduplication is practical only at the file level or with relatively large blocks (KB level); when the block size is very small, the additional storage overhead is no longer negligible, so the redundancy within files cannot be fully exploited and the deduplication effect is poor. With SSL or TLS, the data server works on the plaintext when detecting duplicates, but storing users' private data directly in plaintext form is a serious security risk; the data must therefore be encrypted before each store and decrypted on each download, which increases the CPU overhead of the server.
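The convergent encryption scheme described above can be illustrated with a minimal sketch. It is not taken from the source: SHA-256 is assumed as the hash function, and `keystream_xor` is a hypothetical toy cipher standing in for a real cipher such as AES. The function names and user keys are likewise illustrative.

```python
import hashlib


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR data with a SHA-256-derived keystream.

    Stand-in for a real cipher such as AES. NOT secure; illustration only.
    Applying it twice with the same key returns the original data.
    """
    out = bytearray()
    for counter, offset in enumerate(range(0, len(data), 32)):
        pad = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def convergent_encrypt(plaintext: bytes, user_key: bytes):
    """Return (E(H(M), M), E(Ka, H(M))) for plaintext M and user key Ka."""
    content_key = hashlib.sha256(plaintext).digest()    # H(M)
    ciphertext = keystream_xor(content_key, plaintext)  # E(H(M), M)
    wrapped_key = keystream_xor(user_key, content_key)  # E(Ka, H(M))
    return ciphertext, wrapped_key


def convergent_decrypt(ciphertext: bytes, wrapped_key: bytes, user_key: bytes) -> bytes:
    """Unwrap H(M) with the user key, then decrypt the data with H(M)."""
    content_key = keystream_xor(user_key, wrapped_key)
    return keystream_xor(content_key, ciphertext)


# Two users upload the same 64-byte block with different private keys.
block = b"the same 64-byte block uploaded by two different users".ljust(64, b".")
ct_a, wk_a = convergent_encrypt(block, b"key-of-user-A")
ct_b, wk_b = convergent_encrypt(block, b"key-of-user-B")
```

Both users obtain the same ciphertext, so the server can deduplicate it, but each user needs a separately stored wrapped key. Under the SHA-256 assumption that is 32 extra bytes per block per user: half the size of a 64-byte block, but under 1% of a 4 KB block, which is why small block sizes are unattractive for this method.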
Therefore, those skilled in the art need a system and method that can deduplicate with a smaller block size and offload the server's expensive encryption operations to in-network computing, thereby greatly increasing the deduplication efficiency of the data storage system and significantly reducing the CPU overhead of the server.
Disclosure of Invention
Existing deduplication schemes for file data sent to a storage server cannot achieve both high deduplication storage efficiency and low CPU overhead on the storage server. To address this defect, the invention provides a data deduplication system and method based on an inter-network secure IP tunnel. By establishing an encrypted inter-network IP tunnel, the invention ensures safe and reliable transmission of data packets between a local area network and a data center network, protecting users' private data in an untrusted transmission environment containing malicious nodes. At the same time, the plaintext of the user data is available inside the data center network, which facilitates subsequent analysis and processing without introducing additional storage overhead. The encryption and decryption of stored data are offloaded to an edge programmable switch. The invention thereby greatly improves deduplication efficiency and improves server CPU utilization, saving overhead and cost for the data center storage servers.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
In a first aspect, the invention provides a data deduplication system based on an inter-network secure IP tunnel, which comprises the inter-network secure IP tunnel, an edge switch and a centralized controller;
The inter-network secure IP tunnel is established based on programmable switches and is located between a local area network and a data center; it encrypts, as whole packets, the deduplication-related data packets leaving the local area network or the data center, and decrypts, as whole packets, the deduplication-related data packets entering the local area network or the data center;
The edge switch corresponds to a storage server of the data center; it checks whether each data block of an uploaded file is already stored by the storage server, performs AES encryption on plaintext data blocks that are not yet stored, and performs AES decryption on the encrypted data blocks of a downloaded file;
The centralized controller is located in the data center and is responsible for key negotiation with the controller in the local area network; when a file is uploaded and a new file data block is stored, it adds a new entry to the hash-index table of the corresponding storage server, and when a file is deleted and a stored file data block is no longer used by any other file, it deletes the old entry from the hash-index table of the corresponding storage server.
In a second aspect, the present invention provides a local area network client file uploading method using the data deduplication system according to the first aspect, including the following steps:
Step one, when uploading a file, the local area network client calculates the hash value of each data block and assembles the hash values in order into a plurality of query data packets, wherein each query data packet contains the hash values of m data blocks and is sent to the corresponding storage server through the inter-network secure IP tunnel;
Step two, the query is completed as each query data packet sent by the local area network client passes through the edge switch corresponding to the storage server, and if a data block is already stored, the index information of that data block on the storage server is recorded;
Step three, the storage server integrates the multiple query result data packets it receives and returns a complete query result data packet;
Step four, after receiving the complete query result data packet, the local area network client sends the missing data blocks to the storage server;
Step five, as each missing data block passes through the edge switch, the switch calculates the hash value of the plaintext, encrypts the plaintext with AES, and transmits the resulting ciphertext data block to the storage server;
Step six, the storage server receives the data block encrypted by the edge switch, updates the data block counter that it maintains, and sends the stored hash value and index to the centralized controller so as to update the hash-index table of the corresponding edge switch.
Optionally, in the third step, the complete query result data packet contains a bit string of 0s and 1s, in which a 0 at a given position indicates that the corresponding data block is not stored and a 1 indicates that the corresponding data block is already stored on the server.
Optionally, in the fourth step, after receiving the complete query result data packet, the local area network client sends the missing data blocks corresponding to the 0 positions to the storage server.
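The client side of the upload method (hashing the data blocks, packing m hashes per query data packet, and reading the returned bit string) can be sketched as follows. This is only an illustration under stated assumptions: the source fixes neither the hash function, the block size, nor the value of m, so SHA-256, `BLOCK_SIZE = 64` and `HASHES_PER_PACKET = 4` are hypothetical choices, as are the function names.

```python
import hashlib

BLOCK_SIZE = 64        # hypothetical block size in bytes
HASHES_PER_PACKET = 4  # hypothetical value of m


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Cut the file into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def build_query_packets(blocks: list, m: int = HASHES_PER_PACKET) -> list:
    """Hash every block and group the hashes, in order, m per query packet."""
    hashes = [hashlib.sha256(b).digest() for b in blocks]
    return [hashes[i:i + m] for i in range(0, len(hashes), m)]


def select_missing(blocks: list, bit_string: str) -> list:
    """Return (position, block) pairs for every '0' in the query result."""
    if len(bit_string) != len(blocks):
        raise ValueError("bit string length must equal the number of blocks")
    return [(i, blk) for i, (blk, bit) in enumerate(zip(blocks, bit_string))
            if bit == "0"]


file_data = b"A" * 64 + b"B" * 64 + b"C" * 64 + b"D" * 64 + b"E" * 30
blocks = split_blocks(file_data)
packets = build_query_packets(blocks)
# Suppose the server answers "10110": blocks 1 and 4 are not stored yet.
to_send = select_missing(blocks, "10110")
```

Here the five blocks produce two query data packets (four hashes, then one), and only the blocks at positions 1 and 4 are uploaded.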
In a third aspect, the present invention provides a method for downloading a local area network client file using the data deduplication system according to the first aspect, including the steps of:
Step one, the local area network client sends the ID of the file to be downloaded;
Step two, the storage server sends the encrypted ciphertext to the local area network client through the inter-network secure IP tunnel according to the index list of the corresponding file;
Step three, as the encrypted ciphertext passes through the corresponding edge switch, it is decrypted with AES to recover the original plaintext of the data.
In a fourth aspect, the present invention provides a method for deleting a local area network client file using the data deduplication system according to the first aspect, including the following steps:
Step one, the local area network client sends the ID of the file to be deleted;
Step two, the storage server decrements the corresponding data block counters by 1 in order according to the index list of the corresponding file, and if a counter reaches 0, the centralized controller is notified to update the hash-index table in the edge switch.
The beneficial effects of the invention are as follows. By establishing an inter-network secure IP tunnel and performing data deduplication in network programmable switches, the system and method can deduplicate at a finer data block granularity and greatly improve deduplication efficiency, thereby better identifying the redundancy among files uploaded by multiple users and reducing the storage cost of the storage server. Because the secure IP tunnel lets the programmable switch analyze and process plaintext data, no extra stored data needs to be introduced, unlike the existing convergent encryption method. The invention also offloads hash-index table lookup and AES encryption and decryption to the edge programmable switch, which saves CPU overhead on the corresponding storage server and improves CPU resource utilization.
Drawings
Fig. 1 is a system framework diagram of a data deduplication system based on an inter-network secure IP tunnel.
Detailed Description
The invention will now be described in further detail with reference to the accompanying drawings.
Example 1
This embodiment provides a data deduplication system based on an inter-network secure IP tunnel; Fig. 1 is a schematic diagram of the overall framework of the system. As shown in the figure, the system mainly comprises two parts, the local area network clients and the data center. The sender side comprises a plurality of clients and a port programmable switch, and the receiver side comprises a core programmable switch, an aggregation programmable switch, an edge programmable switch and a plurality of storage servers. The local area network clients upload, download and delete files; a secure IP tunnel is established between the port switch of the local area network and the core switch of the data center; and the edge switch performs the hash-index lookup and the AES operations.
The data deduplication system based on the inter-network secure IP tunnel specifically comprises an inter-network secure IP tunnel established between a local area network and a data center based on a programmable switch, an edge switch corresponding to a storage server and used for completing on-network calculation of data deduplication and encryption, and a centralized controller in the data center network.
The inter-network secure IP tunnel module encrypts, as whole packets, the deduplication-related data packets leaving the local area network or the data center, and decrypts, as whole packets, the deduplication-related data packets entering the local area network or the data center.
The edge switch checks whether each data block of an uploaded file is already stored by the storage server, performs AES encryption on plaintext data blocks that are not yet stored, and performs AES decryption on the encrypted data blocks of a downloaded file.
The centralized controller is responsible for key negotiation with the controller in the local area network. When a file is uploaded and a new file block is stored, a new entry must be added to the hash-index table of the corresponding storage server; when a file is deleted and a stored file block is no longer used by any other file, the old entry must be deleted from the hash-index table of the corresponding storage server.
The system establishes a secure IP tunnel between the endpoint programmable switches of the local area network and the data center. Each endpoint switch decrypts ciphertext data packets received from the external network and encrypts plaintext data packets sent to the external network, which ensures that data packets are in plaintext form inside the local area network and the data center. The main deduplication function is offloaded to the edge programmable switch in the data center network. Before sending the actual plaintext data, the local area network client calculates the corresponding hash values and sends query data packets. When a query data packet reaches the edge switch of the data center, the switch checks whether a corresponding entry exists in its hash-index table and thereby determines whether the data block is already stored on the corresponding storage server. Only when a data block is found to be missing does the client subsequently upload it, and the block is AES-encrypted by the programmable switch in the data center network. When a client in the local area network downloads a file, the stored ciphertext sent by the storage server is decrypted by the edge programmable switch in the data center network. The system can therefore support a finer deduplication granularity, significantly improving deduplication efficiency at an acceptable additional storage cost while reducing server CPU consumption.
Example 2
This embodiment provides a method for uploading, downloading and deleting files using the data deduplication system based on the inter-network secure IP tunnel of Example 1, which specifically comprises the following steps.
For file upload operations:
Firstly, when uploading a file, the local area network client calculates the hash value of each data block and assembles the hash values in order into a plurality of query data packets, wherein each query data packet contains the hash values of m data blocks and is sent to the corresponding storage server through the secure IP tunnel;
Secondly, the query is completed as each query data packet sent by the local area network client passes through the edge switch corresponding to the storage server, and if a data block is already stored, the index information of that data block on the storage server is recorded;
Thirdly, the storage server integrates the query result data packets it receives and returns a complete query result data packet, which contains a bit string of 0s and 1s; a 0 at a given position indicates that the corresponding data block is not stored, and a 1 indicates that the corresponding data block is already stored on the server;
Fourthly, after receiving the complete query result data packet, the local area network client sends the missing data blocks corresponding to the 0 positions to the storage server;
Fifthly, as each missing data block passes through the edge switch, the switch calculates the hash value of the plaintext, encrypts the plaintext with AES, and transmits the encrypted data block to the storage server;
Sixthly, the storage server receives the data block encrypted by the edge switch, updates the data block counter that it maintains, and sends the stored hash value and index to the controller of the data center, so that the hash-index table of the corresponding programmable switch is updated.
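The six upload steps can be simulated end to end in a few lines. The sketch below is an illustration, not the patented implementation: the classes `EdgeSwitch` and `StorageServer`, the function `upload`, and the key are hypothetical; SHA-256 is an assumed hash function; and `toy_encrypt` is an insecure stand-in for AES. It also assumes that the counter of an already stored block is incremented when a new file references it, which the source implies but does not state.

```python
import hashlib


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """XOR with a SHA-256 keystream. Stand-in for AES; NOT secure."""
    out = bytearray()
    for counter, offset in enumerate(range(0, len(data), 32)):
        pad = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class StorageServer:
    def __init__(self):
        self.blocks = []    # index -> stored ciphertext block
        self.counters = []  # index -> number of file references
        self.files = {}     # file ID -> index list

    def store(self, ciphertext: bytes) -> int:
        self.blocks.append(ciphertext)
        self.counters.append(1)
        return len(self.blocks) - 1


class EdgeSwitch:
    def __init__(self, key: bytes):
        self.key = key
        self.hash_index = {}  # hash -> index; entries come from the controller

    def query(self, hashes: list) -> str:
        """Answer a query with '1' for stored blocks and '0' for missing ones."""
        return "".join("1" if h in self.hash_index else "0" for h in hashes)

    def hash_and_encrypt(self, plaintext: bytes):
        return hashlib.sha256(plaintext).digest(), toy_encrypt(self.key, plaintext)


def upload(file_id: str, blocks: list, switch: EdgeSwitch, server: StorageServer) -> str:
    hashes = [hashlib.sha256(b).digest() for b in blocks]        # step 1
    bits = switch.query(hashes)                                  # steps 2-3
    index_list = []
    for block, h, bit in zip(blocks, hashes, bits):
        # A block repeated inside one upload is stored only the first time.
        if bit == "0" and h not in switch.hash_index:            # step 4
            digest, ciphertext = switch.hash_and_encrypt(block)  # step 5
            idx = server.store(ciphertext)                       # step 6
            switch.hash_index[digest] = idx  # controller adds the table entry
        else:
            idx = switch.hash_index[h]
            server.counters[idx] += 1        # assumed reference counting
        index_list.append(idx)
    server.files[file_id] = index_list
    return bits


switch = EdgeSwitch(key=b"hypothetical-aes-key")
server = StorageServer()
a, b, c = b"A" * 64, b"B" * 64, b"C" * 64
bits_1 = upload("file-1", [a, b, a], switch, server)
bits_2 = upload("file-2", [b, c], switch, server)
```

After both uploads the server holds three ciphertext blocks for five logical blocks: the second upload sees the bit string `10` and transfers only block `c`.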
For file download operations:
Firstly, the local area network client sends the ID of the file to be downloaded;
Secondly, the storage server sends the encrypted ciphertext to the local area network client through the secure IP tunnel according to the index list of the corresponding file;
Thirdly, as the encrypted ciphertext passes through the corresponding edge switch, it is decrypted with AES to recover the original plaintext of the data.
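The download path can be sketched in the same spirit. The names `download`, `files` and `stored_blocks` and the key are hypothetical, and `toy_cipher` is an insecure, self-inverse stand-in for AES encryption and decryption at the edge switch.

```python
import hashlib


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Self-inverse XOR keystream cipher. Stand-in for AES; NOT secure."""
    out = bytearray()
    for counter, offset in enumerate(range(0, len(data), 32)):
        pad = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def download(file_id: str, files: dict, stored_blocks: list, switch_key: bytes) -> bytes:
    """Follow the file's index list and decrypt each block at the edge switch."""
    plaintext_blocks = []
    for idx in files[file_id]:
        ciphertext = stored_blocks[idx]  # sent by the storage server
        plaintext_blocks.append(toy_cipher(switch_key, ciphertext))  # edge switch
    return b"".join(plaintext_blocks)


key = b"hypothetical-aes-key"
a, b = b"A" * 64, b"B" * 64
stored_blocks = [toy_cipher(key, a), toy_cipher(key, b)]  # deduplicated store
files = {"file-1": [0, 1, 0]}                             # block 0 is used twice
restored = download("file-1", files, stored_blocks, key)
```

The index list may reference the same stored block more than once, so a file of three blocks is rebuilt from only two stored ciphertexts.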
For file delete operations:
Firstly, the local area network client sends the ID of the file to be deleted;
Secondly, the storage server decrements the corresponding data block counters by 1 in order according to the index list of the corresponding file, and if a counter reaches 0, the storage server notifies the controller to update the hash-index table in the edge switch.
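The delete operation amounts to reference counting. A minimal sketch follows, with hypothetical names (`delete_file`, `files`, `counters`, `hash_index`) and plain dictionaries standing in for the storage server's state and the edge switch's hash-index table.

```python
def delete_file(file_id: str, files: dict, counters: dict, hash_index: dict) -> list:
    """Decrement the counter of every block in the file's index list.

    Returns the indices whose counter reached 0; for those, the entry is
    also removed from the hash-index table, which models the controller
    updating the edge switch.
    """
    released = []
    for idx in files.pop(file_id):
        counters[idx] -= 1
        if counters[idx] == 0:
            del counters[idx]
            for h in [h for h, i in hash_index.items() if i == idx]:
                del hash_index[h]
            released.append(idx)
    return released


files = {"file-1": [0, 1, 0], "file-2": [1, 2]}
counters = {0: 2, 1: 2, 2: 1}  # block index -> number of references
hash_index = {b"hash-a": 0, b"hash-b": 1, b"hash-c": 2}
released_1 = delete_file("file-1", files, counters, hash_index)
```

Deleting `file-1` releases only block 0, because block 1 is still referenced by `file-2`; a later deletion of `file-2` would release blocks 1 and 2 and leave the hash-index table empty.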
The innovation of the system and method is that, for the first time, the inter-network secure IP tunnel is implemented with in-network programmable switches, and the hash-index table lookup for data deduplication, the AES encryption of plaintext and the AES decryption of ciphertext are offloaded to programmable switches inside the data center. This guarantees encrypted transmission of data in the insecure external network environment, while plaintext analysis and processing inside the local area network and the data center can use a smaller data block size more efficiently without introducing additional storage overhead. The invention also offloads computing tasks of the storage server to the corresponding edge switch and uses the line-rate processing speed of the programmable switch to reduce the CPU overhead of the storage server.
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above examples, and all technical solutions belonging to the concept of the present invention belong to the protection scope of the present invention. It should be noted that modifications and adaptations to the invention without departing from the principles thereof are intended to be within the scope of the invention as set forth in the following claims.