



Technical Field
The present application relates to the field of display technology, and in particular to a data deduplication method and system, an electronic device, and a storage medium.
Background
Both cloud storage systems and traditional data storage systems hold large amounts of redundant data; in some systems the data duplication rate is as high as 70% to 90%. Deduplicating storage systems is therefore both urgent and necessary. Deduplication technology removes redundant data from a storage system, which reduces storage space usage, saves network bandwidth, and lowers the storage cost and daily energy consumption of data centers. However, traditional deduplication technology faces great challenges when deduplicating big data in cloud storage systems.
Summary of the Invention
Embodiments of the present application provide a data deduplication method and system, an electronic device, and a storage medium, which are used to deduplicate data and save storage space.
A data deduplication method provided by an embodiment of the present application includes:
a client splits target data into multiple data slices according to a preset rule and sends the data slices to a global deduplication module;
the global deduplication module calculates first fingerprint information of a data slice and queries whether metadata mapping information of the first fingerprint information exists in a fingerprint database;
when the metadata mapping information of the first fingerprint information exists in the fingerprint database, the client deletes the data slice corresponding to the first fingerprint information.
In some embodiments, splitting the target data into multiple data slices according to the preset rule and sending the data slices to the global deduplication module specifically includes:
splitting the target data into multiple data slices corresponding to deduplication nodes, and sending each data slice to its corresponding deduplication node.
In some embodiments, the global deduplication module calculating the first fingerprint information of the data slice and querying whether the first fingerprint information exists in the data fingerprint database corresponding to the deduplication node specifically includes:
the deduplication node calculates the first fingerprint information of the received data slice and queries whether the first fingerprint information exists in the fingerprint database corresponding to that deduplication node.
In some embodiments, when the first fingerprint information exists in the data fingerprint database, the method further includes:
determining whether the identity information of the first fingerprint information is consistent with the identity information corresponding to the metadata mapping information in the fingerprint database;
if the identity information of the first fingerprint information is inconsistent with the identity information corresponding to the metadata mapping information in the fingerprint database, the global deduplication module reads second fingerprint information that is stored in a storage module and corresponds to the metadata mapping information, takes, according to fingerprint hot/cold information, the hot information among the first fingerprint information and the second fingerprint information as new second fingerprint information, and updates the second fingerprint information stored in the storage module to the new second fingerprint information.
In some embodiments, the method further includes:
when the first fingerprint information does not exist in the fingerprint database, the client sends a write-to-disk request for the data slice to the storage module;
the storage module sends write-to-disk permission information to the client, the permission information including the memory address of the data slice;
the client sends a storage request to the global deduplication module, the storage request including the first fingerprint information and the memory address of the data slice;
in response to the storage request, the global deduplication module stores the metadata mapping information of the first fingerprint information in a cache queue of the storage module;
whether to write the metadata mapping information to disk is determined according to a preset caching rule.
In some embodiments, determining, according to the preset caching rule, whether to store the metadata mapping information in the storage module specifically includes:
determining whether the amount of data stored in the cache queue of the storage module has reached a preset threshold;
if so, the global deduplication module deletes the metadata mapping information from the cache queue;
if not, the metadata mapping information is written to disk according to the preset write-to-disk order of the cache queue.
A data deduplication system provided by an embodiment of the present application includes:
a client, configured to split target data into multiple data slices according to a preset rule and send the data slices to a global deduplication module;
the global deduplication module, configured to calculate first fingerprint information of a data slice and query whether the first fingerprint information exists in a fingerprint database;
the client is further configured to delete the data slice corresponding to the first fingerprint information when the first fingerprint information exists in the fingerprint database.
A computer device provided by an embodiment of the present application includes:
one or more processors;
a storage device, configured to store one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the data deduplication method provided by the embodiments of the present application.
An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the data deduplication method provided by the embodiments of the present application.
An embodiment of the present application provides a computer program product including a computer program; when executed by a processor, the computer program implements the data deduplication method provided by the embodiments of the present application.
In the data deduplication method and system, electronic device, and storage medium provided by the embodiments of the present application, the global deduplication module calculates the first fingerprint information of a data slice. When the metadata mapping information of the first fingerprint information exists in the fingerprint database, the data slice is regarded as duplicate data and is deleted, which reduces the amount of data stored in the data storage system and saves storage space and storage cost.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a data deduplication method provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of another data deduplication method provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a data deduplication system provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
Detailed Description
To make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present application. Where no conflict arises, the embodiments of the present application and the features in them may be combined with one another. All other embodiments obtained by a person of ordinary skill in the art from the described embodiments without creative effort fall within the protection scope of the present application.
Unless otherwise defined, the technical or scientific terms used in the present application have the ordinary meaning understood by a person of ordinary skill in the field to which the present application belongs. The words "first", "second", and the like used in the present application do not denote any order, quantity, or importance; they are used only to distinguish different components. Words such as "include" or "comprise" mean that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. Words such as "connected" or "coupled" are not limited to physical or mechanical connections and may include electrical connections, whether direct or indirect.
Note that the sizes and shapes of the figures in the drawings do not reflect true scale and are intended only to illustrate the content of the present application. Throughout, the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions.
An embodiment of the present application provides a data deduplication method. As shown in FIG. 1, the method includes:
S101: the client splits target data into multiple data slices according to a preset rule and sends the data slices to the global deduplication module;
S102: the global deduplication module calculates first fingerprint information of a data slice and queries whether metadata mapping information of the first fingerprint information exists in the fingerprint database;
S103: when the metadata mapping information of the first fingerprint information exists in the fingerprint database, the client deletes the data slice corresponding to the first fingerprint information.
Note that data deduplication refers to the deletion of duplicate data.
In the data deduplication method provided by the embodiments of the present application, the global deduplication module calculates the first fingerprint information of a data slice. When the metadata mapping information of the first fingerprint information exists in the fingerprint database, the data slice is regarded as duplicate data and is deleted, which reduces the amount of data stored in the data storage system and saves storage space, storage system energy consumption, and storage cost. In addition, the global deduplication module can delete duplicate data across multiple clients and deduplicate across the entire storage system, achieving global data deduplication for a ceph distributed storage system and making it easier to scale the performance of the storage system.
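Steps S101 to S103 can be sketched in a few lines of Python. This is a minimal, single-process illustration under stated assumptions, not the claimed implementation: the fixed slice size, the use of SHA-256 as the fingerprint, the in-memory `set` standing in for the fingerprint database, and the function names are all hypothetical. Recording a new fingerprint immediately on a miss is a simplification of the write path described later.

```python
import hashlib


def split_into_slices(data: bytes, slice_size: int) -> list[bytes]:
    """S101 (assumed rule): cut the target data into fixed-size slices."""
    return [data[i:i + slice_size] for i in range(0, len(data), slice_size)]


def fingerprint(piece: bytes) -> str:
    """S102: compute a fingerprint; SHA-256 is an assumed choice of hash."""
    return hashlib.sha256(piece).hexdigest()


def deduplicate(data: bytes, slice_size: int, fingerprint_db: set[str]) -> list[bytes]:
    """Return the slices that still need to be stored; duplicates are dropped."""
    kept = []
    for piece in split_into_slices(data, slice_size):
        fp = fingerprint(piece)
        if fp in fingerprint_db:
            continue  # S103: fingerprint already known, the slice is deleted
        fingerprint_db.add(fp)  # simplification: record the new fingerprint at once
        kept.append(piece)
    return kept
```

For example, deduplicating `b"aaaabbbbaaaa"` with a slice size of 4 against an empty database keeps the first `aaaa` slice and the `bbbb` slice and drops the repeated `aaaa` slice.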
In some embodiments, the data deduplication method provided by the embodiments of the present application is applied to a ceph distributed storage system; that is, the ceph distributed storage system includes the client and the global deduplication module.
In some embodiments, the target data and the data slices are input/output (IO) data.
In some embodiments, the fingerprint database is a first fingerprint information table. The first fingerprint information table includes: the identity information of the data, Poolid; the metadata mapping information of the first fingerprint information, FingerPrint; the data number, Ref; and the memory address of the data, addr.
In some embodiments, the global deduplication module uses a preset algorithm to calculate the first fingerprint information of the data slice. The preset algorithm may be, for example, a hash algorithm.
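The fingerprint table and hash-based fingerprint described above might be modelled as follows. This is a hedged sketch: the field names mirror Poolid, FingerPrint, Ref, and addr from the text, but their Python types, the choice of SHA-256, and the dictionary-backed `FingerprintTable` class are assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class FingerprintEntry:
    pool_id: int       # Poolid: identity information of the data
    fingerprint: str   # FingerPrint: key used to look the entry up
    ref: int           # Ref: data number
    addr: int          # addr: memory address of the data


def compute_fingerprint(piece: bytes) -> str:
    """Hash-based fingerprint; SHA-256 is an assumed example of a hash algorithm."""
    return hashlib.sha256(piece).hexdigest()


class FingerprintTable:
    """Hypothetical in-memory stand-in for the first fingerprint information table."""

    def __init__(self) -> None:
        self._entries: dict[str, FingerprintEntry] = {}

    def insert(self, entry: FingerprintEntry) -> None:
        self._entries[entry.fingerprint] = entry

    def lookup(self, fp: str) -> Optional[FingerprintEntry]:
        # Returns None when the fingerprint is not in the table.
        return self._entries.get(fp)
```

A lookup that returns `None` corresponds to the "fingerprint does not exist" branch of the method; a returned entry carries the pool identity that later embodiments compare.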
In some embodiments, the global deduplication module includes at least one deduplication node. Splitting the target data into multiple data slices according to the preset rule and sending the data slices to the global deduplication module specifically includes:
splitting the target data, according to the preset rule, into multiple data slices corresponding to the deduplication nodes, and sending each data slice to its corresponding deduplication node.
In some embodiments, when the global deduplication module includes multiple deduplication nodes, the deduplication nodes in the global deduplication module may be deployed as follows:
The range of IO data that each deduplication node can process, range, is calculated using the following formula:
range = s ÷ scope % dedup_nr;
where s = lba ÷ obj_size; lba is the data size of the IO data; s is the slice size of the data slice, that is, the size after the data is split according to the preset size obj_size; dedup_nr is the serial number of the deduplication node in the global deduplication module; and scope is the data range that each deduplication node can process. The larger the value of scope, the better the deduplication effect. Taking a global deduplication module that includes three deduplication nodes as an example, the ranges of IO data that the deduplication nodes can process are, in order: [m, m+n], [m+n+1, m+2n+1], [m+2n+2, m+3n+1], and so on, where m and n are determined according to dedup_nr and range.
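The formula can be evaluated with a short helper. The text does not state operator precedence or whether the divisions are integer divisions, so this sketch assumes left-to-right evaluation with integer division, that is, (s ÷ scope) % dedup_nr, and treats dedup_nr as a positive integer modulus. The function name `node_range` is hypothetical.

```python
def node_range(lba: int, obj_size: int, scope: int, dedup_nr: int) -> int:
    """Evaluate range = s / scope % dedup_nr with s = lba / obj_size.

    Assumptions (not stated in the text): integer division throughout,
    left-to-right evaluation, and dedup_nr > 0.
    """
    if obj_size <= 0 or scope <= 0 or dedup_nr <= 0:
        raise ValueError("obj_size, scope and dedup_nr must be positive")
    s = lba // obj_size
    return (s // scope) % dedup_nr
```

Under these assumptions, with `obj_size=4096`, `scope=4`, and `dedup_nr=3`, an `lba` of `40960` gives s = 10 and a result of 2, and an `lba` of `49152` gives s = 12 and wraps around to 0.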
In a specific implementation, splitting the target data into multiple data slices according to the preset rule includes: using a consistent hashing algorithm to split the target data into multiple data slices according to the range of IO data that each deduplication node can process.
In a specific implementation, each deduplication node in the global deduplication module has a corresponding node IP address. After splitting the target data into multiple data slices, the client looks up the node IP of the deduplication node corresponding to each data slice and sends the data slice to that deduplication node according to the node IP.
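The lookup from data slice to node IP might look like the sketch below. It does not implement a consistent-hash ring; it only illustrates range-based routing, reusing the assumed integer reading of the range formula with the number of nodes as the modulus. The IP addresses, the use of byte offsets to identify slices, and the function name are hypothetical.

```python
def route_slices(offsets: list[int], obj_size: int, scope: int,
                 node_ips: list[str]) -> dict[str, list[int]]:
    """Group slice offsets by the IP of the deduplication node that handles them.

    Assumption: the node index is (offset // obj_size // scope) % len(node_ips).
    """
    batches: dict[str, list[int]] = {ip: [] for ip in node_ips}
    for offset in offsets:
        index = (offset // obj_size // scope) % len(node_ips)
        batches[node_ips[index]].append(offset)
    return batches


# Hypothetical deployment with three deduplication nodes.
NODE_IPS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
```

Each batch can then be sent to its node in one request. Because `scope` consecutive slices map to the same node, a larger `scope` keeps neighbouring data on one node.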
In some embodiments, the global deduplication module calculating the first fingerprint information of the data slice and querying whether the first fingerprint information exists in the data fingerprint database corresponding to the deduplication node specifically includes:
the deduplication node calculates the first fingerprint information of the received data slice and queries whether the first fingerprint information exists in the fingerprint database corresponding to that deduplication node.
In some embodiments, when the first fingerprint information exists in the data fingerprint database, the method further includes:
determining whether the identity information of the first fingerprint information is consistent with the identity information corresponding to the metadata mapping information in the fingerprint database;
if the identity information of the first fingerprint information is inconsistent with the identity information corresponding to the metadata mapping information in the fingerprint database, the global deduplication module reads the second fingerprint information that is stored in the storage module and corresponds to the metadata mapping information, takes, according to fingerprint hot/cold information, the hot information among the first fingerprint information and the second fingerprint information as new second fingerprint information, and updates the second fingerprint information stored in the storage module to the new second fingerprint information.
In some embodiments, the storage module includes an Object Storage Device (OSD) storage cluster. The OSD storage cluster stores the fingerprint information of the data and the metadata mapping information corresponding to the fingerprint information. In a specific implementation, the metadata mapping information stored in the OSD storage cluster includes fingerprint information flushed from memory to disk. The OSD storage cluster includes a metadata storage pool, Pool 1, which stores the metadata mapping information, and a data storage pool, Pool 2, which stores the IO data. The metadata storage pool Pool 1 may be deployed on a high-performance storage pool, which benefits data deduplication and fingerprint retrieval. In a specific implementation, the global deduplication module reads the second fingerprint information from the data storage pool Pool 2, and the new second fingerprint information is stored in the data storage pool Pool 2.
In some embodiments, the method further includes:
when the first fingerprint information does not exist in the fingerprint database, the client sends a write-to-disk request for the data slice to the storage module;
the storage module sends write-to-disk permission information to the client, the permission information including the memory address of the data slice;
the client sends a storage request to the global deduplication module, the storage request including the first fingerprint information and the memory address of the data slice;
in response to the storage request, the global deduplication module stores the metadata mapping information of the first fingerprint information in the cache queue of the storage module;
whether to write the metadata mapping information to disk is determined according to a preset caching rule.
Note that when the first fingerprint information does not exist in the fingerprint database, the data slice has not been stored before, is not duplicate data, and needs to be written to disk. The client sends a write-to-disk request to the storage module to write three replicas; after the three replicas are written successfully, the storage module sends the write-to-disk permission information to the client.
Note that the fingerprint information carried in the storage request that the client sends to the global deduplication module is the first fingerprint information, that is, the fingerprint information already calculated by the deduplication node. The fingerprint of the data slice therefore does not need to be calculated again, which shortens the data storage flow and avoids sending the data a second time.
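The write path for a slice whose fingerprint is not yet known can be simulated in one process as below. This is an illustrative sketch only: the three dictionaries standing in for the replicas, the sequential address allocation, and the class and function names are assumptions, and the network messages are replaced by direct method calls.

```python
class StorageModule:
    """Hypothetical stand-in for the storage module."""

    REPLICAS = 3  # the text describes writing three replicas

    def __init__(self) -> None:
        self.replicas: list[dict[int, bytes]] = [{} for _ in range(self.REPLICAS)]
        self.cache_queue: list[tuple[str, int]] = []
        self._next_addr = 0

    def write_slice(self, piece: bytes) -> int:
        """Handle a write-to-disk request; the returned address models the
        permission information sent back once all replicas are written."""
        addr = self._next_addr
        for replica in self.replicas:
            replica[addr] = piece
        self._next_addr += 1
        return addr


class DedupModule:
    """Hypothetical stand-in for the global deduplication module."""

    def handle_storage_request(self, storage: StorageModule,
                               fingerprint: str, addr: int) -> None:
        # Queue the (fingerprint, address) mapping in the storage module's cache queue.
        storage.cache_queue.append((fingerprint, addr))


def write_new_slice(piece: bytes, fingerprint: str,
                    storage: StorageModule, dedup: DedupModule) -> int:
    """Client side: write the slice, then report fingerprint and address."""
    addr = storage.write_slice(piece)
    # The fingerprint computed earlier is reused; the slice itself is not resent.
    dedup.handle_storage_request(storage, fingerprint, addr)
    return addr
```

The storage request carries only the fingerprint and the address, which reflects the point made above that neither the hash computation nor the data transfer is repeated.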
In some embodiments, determining, according to the preset caching rule, whether to store the metadata in the storage module specifically includes:
determining whether the amount of data stored in the cache queue of the storage module has reached a preset threshold;
if so, the global deduplication module deletes the metadata mapping information from the cache queue;
if not, the metadata mapping information is written to disk according to the preset write-to-disk order of the cache queue.
In a specific implementation, it is determined whether the amount of data stored in the cache queue of the storage module has reached the preset threshold. When it has, the deduplication module can use a background thread to evict the metadata mapping information corresponding to the fingerprint information from the queue and write it to disk, which prevents the memory pool from overflowing.
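The threshold check can be sketched as a bounded queue that evicts entries to disk once it fills up. Several details here are assumptions rather than part of the described method: eviction is oldest-first (the text only mentions a preset order), it runs synchronously instead of on a background thread, and a Python list stands in for the disk.

```python
from collections import deque


class MetadataCache:
    """Hypothetical cache queue for metadata mapping entries."""

    def __init__(self, threshold: int) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.queue: deque = deque()
        self.disk: list = []  # stand-in for entries written to disk

    def put(self, entry) -> None:
        self.queue.append(entry)
        # Once the queue reaches the threshold, evict the oldest entries
        # to disk until it is below the threshold again.
        while len(self.queue) >= self.threshold:
            self.disk.append(self.queue.popleft())
```

With a threshold of 3, the queue never holds more than two entries after `put` returns, so memory use stays bounded regardless of how many new fingerprints arrive.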
The flow of the data deduplication method provided by the embodiments of the present application is illustrated next by way of example. As shown in FIG. 2, the data deduplication method includes the following steps:
S201: split the target data, according to the preset rule, into multiple data slices corresponding to the deduplication nodes, and send each data slice to its corresponding deduplication node;
S202: the deduplication node calculates the first fingerprint information of the received data slice;
S203: query whether the first fingerprint information exists in the fingerprint database corresponding to the deduplication node; if so, return the operation code F_EXIST to the client, which performs step S206, and proceed to step S204; otherwise, return the operation code F_NO_EXIST to the client and proceed to step S207;
S204: determine whether the identity information of the first fingerprint information is consistent with the identity information corresponding to the metadata mapping information in the fingerprint database; if not, perform step S205;
S205: the global deduplication module reads the second fingerprint information that is stored in the storage module and corresponds to the metadata mapping information, takes, according to fingerprint hot/cold information, the hot information among the first fingerprint information and the second fingerprint information as new second fingerprint information, and updates the second fingerprint information stored in the storage module to the new second fingerprint information;
S206: the client deletes the data slice corresponding to the first fingerprint information;
S207: the client sends a write-to-disk request for the data slice to the storage module;
S208: the storage module sends write-to-disk permission information to the client, the permission information including the memory address of the data slice;
S209: the client sends a storage request to the global deduplication module, the storage request including the first fingerprint information and the memory address of the data slice;
S210: in response to the storage request, the global deduplication module stores the metadata mapping information of the first fingerprint information in the cache queue of the storage module;
S211: determine whether the amount of data stored in the cache queue of the storage module has reached the preset threshold; if so, perform step S212; otherwise, perform step S213;
S212: delete the metadata mapping information from the cache queue;
S213: write the metadata mapping information to disk promptly.
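The branching in steps S201 to S213 can be traced with a small helper that returns the step labels visited for a given outcome. This is a reading aid rather than an implementation; the function name and boolean parameters are hypothetical, and listing S206 after S204/S205 is an assumed ordering, since the client-side deletion and the module-side identity check are triggered by the same F_EXIST result.

```python
def trace_steps(fingerprint_exists: bool, identity_matches: bool,
                queue_full: bool) -> list[str]:
    """Return the sequence of step labels taken for one data slice."""
    steps = ["S201", "S202", "S203"]
    if fingerprint_exists:
        # F_EXIST branch: identity check, optional hot/cold update, slice deletion.
        steps.append("S204")
        if not identity_matches:
            steps.append("S205")
        steps.append("S206")
    else:
        # F_NO_EXIST branch: write the slice, then queue its metadata mapping.
        steps += ["S207", "S208", "S209", "S210", "S211"]
        steps.append("S212" if queue_full else "S213")
    return steps
```

For a duplicate slice from a different pool, `trace_steps(True, False, False)` walks through S204, S205, and S206; for a new slice with room in the cache queue, `trace_steps(False, True, False)` ends at S213.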
Based on the same inventive concept, an embodiment of the present application further provides a data deduplication system. As shown in FIG. 3, the data deduplication system includes:
a client 101, configured to split target data into multiple data slices according to a preset rule and send the data slices to a global deduplication module;
the global deduplication module 102, configured to calculate first fingerprint information of a data slice and query whether the first fingerprint information exists in a fingerprint database;
the client is further configured to delete the data slice corresponding to the first fingerprint information when the first fingerprint information exists in the fingerprint database.
In some embodiments, the data deduplication system is a ceph distributed storage system.
In some embodiments, the global deduplication module includes at least one deduplication node, and the deduplication node is configured to calculate the first fingerprint information of the received data slice and query whether the first fingerprint information exists in the fingerprint database corresponding to that deduplication node.
In some embodiments, the client includes:
a data splitting module, configured to split the target data, according to the preset rule, into multiple data slices corresponding to the deduplication nodes;
a first sending and receiving module, configured to send each data slice to its corresponding deduplication node.
In a specific implementation, the first sending and receiving module is further configured to look up the node IP of the deduplication node corresponding to a data slice, so that it can send the data slice to that deduplication node according to the node IP.
在一些实施例中,如图3所示,数据重删系统还包括:In some embodiments, as shown in Figure 3, the data deduplication system further includes:
存储模块103,用于存储第二指纹信息以及元数据映射信息。The
在一些实施例中,全局重删模块还用于:当数据指纹库中存在第一指纹信息时,判断第一指纹信息的身份信息与指纹库中元数据映射信息对应的身份信息是否一致;若第一指纹信息的身份信息与指纹库中元数据映射信息对应的身份信息不一致,全局重删模块读取存储模块存储的与元数据映射信息对应的第二指纹信息,根据指纹冷热信息将第一指纹信息与第二指纹信息中的热信息作为新的第二指纹信息;In some embodiments, the global deduplication module is further configured to: when the first fingerprint information exists in the data fingerprint database, determine whether the identity information of the first fingerprint information is consistent with the identity information corresponding to the metadata mapping information in the fingerprint database; if The identity information of the first fingerprint information is inconsistent with the identity information corresponding to the metadata mapping information in the fingerprint database, the global deduplication module reads the second fingerprint information corresponding to the metadata mapping information stored in the storage module, The thermal information in one fingerprint information and the second fingerprint information is used as the new second fingerprint information;
存储模块还用于将存储的第二指纹信息更新为新的第二指纹信息。The storage module is further configured to update the stored second fingerprint information to new second fingerprint information.
在一些实施例中,当指纹库中不存在第一指纹信息时,客户端的第一发送接收单元还用于向存储模块发送数据片下盘请求;In some embodiments, when the first fingerprint information does not exist in the fingerprint database, the first sending and receiving unit of the client terminal is further configured to send a data chip download request to the storage module;
存储模块还包括第二发送接收模块,用于向客户端发送下盘允许信息;下盘允许信息包括数据片的内存地址;The storage module further includes a second sending and receiving module, which is used to send the downloading disk permission information to the client; the downloading disk permission information includes the memory address of the data slice;
客户端的第一发送接收单元还用于:响应于下盘允许信息,在向全局重删模块发送存储请求,存储请求包括数据片的第一指纹信息以及内存地址;The first sending and receiving unit of the client is further configured to: in response to the disk permission information, send a storage request to the global deduplication module, where the storage request includes the first fingerprint information and the memory address of the data slice;
全局重删模块还包括:The global deduplication module also includes:
第三发送接收单元,用于响应于存储请求,将第一指纹信息的元数据映射信息存储至存储模块的缓存队列;a third sending and receiving unit, configured to store the metadata mapping information of the first fingerprint information in the cache queue of the storage module in response to the storage request;
阈值检测单元,用于根据预设缓存规则,判断是否将元数据映射信息下盘。The threshold detection unit is configured to determine whether to download the metadata mapping information to the disk according to the preset caching rules.
In some embodiments, the threshold detection unit being configured to determine, according to the preset caching rule, whether to store the metadata in the storage module specifically includes:
determining whether the amount of data stored in the cache queue of the storage module has reached a preset threshold;
if so, deleting the metadata mapping information from the cache queue;
if not, flushing the metadata mapping information to disk according to the preset flush order of the cache queue.
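A literal reading of this caching rule can be sketched as follows. The branch order follows the text exactly as written. Measuring the queue by entry count, treating the preset flush order as first-in-first-out, and modeling the disk as a plain list are assumptions made only for illustration.

```python
from collections import deque


def apply_cache_rule(cache_queue: deque, mapping, threshold: int, disk: list) -> str:
    """Apply the preset caching rule to one queued metadata mapping.

    Assumes `mapping` is already present in `cache_queue`.
    """
    if len(cache_queue) >= threshold:
        # The queue has reached the threshold: delete the mapping from the queue.
        cache_queue.remove(mapping)
        return "deleted"
    # Below the threshold: flush entries in queue order (FIFO assumed)
    # until the given mapping has been written to disk.
    while cache_queue:
        entry = cache_queue.popleft()
        disk.append(entry)
        if entry == mapping:
            break
    return "flushed"
```

For example, with a threshold of 4 and two queued mappings, calling `apply_cache_rule` on the second mapping flushes both entries in order; with a threshold of 2 it removes only that mapping and writes nothing.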
Based on the same inventive concept, an embodiment of the present application further provides a computer device, the device including:
one or more processors;
a storage device, configured to store one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the data deduplication method provided by the embodiments of the present application.
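The electronic device may specifically be a desktop computer, a portable computer, a smartphone, a tablet computer, a personal digital assistant (PDA), a server, or the like. In some embodiments, the storage device is, for example, a memory. As shown in FIG. 4, the electronic device may include a processor 201 and a memory 202.
The processor 201 may be a general-purpose processor, such as a central processing unit (CPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in connection with the embodiments of the present application may be executed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
The memory 202, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory may include at least one type of storage medium, for example flash memory, a hard disk, a multimedia card, card-type memory, random access memory (RAM), static random access memory (SRAM), programmable read-only memory (PROM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), magnetic memory, a magnetic disk, an optical disc, and so on. The memory is any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 202 in the embodiments of the present application may also be a circuit or any other apparatus capable of implementing a storage function, and is used to store program instructions and/or data.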
Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The computer storage medium may be any available medium or data storage device accessible to a computer, including but not limited to: removable storage devices, random access memory (RAM), magnetic storage (for example floppy disks, hard disks, magnetic tape, and magneto-optical (MO) disks), optical storage (for example CD, DVD, BD, and HVD), semiconductor memory (for example ROM, EPROM, EEPROM, non-volatile memory (NAND flash), and solid-state drives (SSD)), and other media capable of storing program code.
Alternatively, if the integrated units of the present application are implemented as software functional modules and sold or used as independent products, they may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence or in the part that contributes to the prior art, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods described in the embodiments of the present application. The storage medium includes: removable storage devices, random access memory (RAM), magnetic storage (for example floppy disks, hard disks, magnetic tape, and magneto-optical (MO) disks), optical storage (for example CD, DVD, BD, and HVD), semiconductor memory (for example ROM, EPROM, EEPROM, non-volatile memory (NAND flash), and solid-state drives (SSD)), and other media capable of storing program code.
In some possible implementations, aspects of the method provided by the present disclosure may also be implemented in the form of a program product that includes program code. When the program product runs on a computer device, the program code causes the computer device to execute the steps of the methods according to the various exemplary embodiments of the present disclosure described above in this specification; for example, the computer device may execute the data deduplication method described in the embodiments of the present disclosure. The program product may employ any combination of one or more readable media.
In summary, in the data deduplication method and system, electronic device, and storage medium provided by the embodiments of the present application, the global deduplication module calculates the first fingerprint information of a data slice. When metadata mapping information of the first fingerprint information exists in the fingerprint database, the data slice is regarded as duplicate data and is deleted, which reduces the amount of data stored in the data storage system and saves storage space and storage cost.
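The overall flow summarized above can be sketched end to end as follows. The application does not specify the splitting rule or the fingerprint function, so fixed-size slicing and SHA-256 are assumptions here, and the in-memory dictionary standing in for the fingerprint database and the shape of the metadata mapping are hypothetical.

```python
import hashlib


def split_into_slices(data: bytes, slice_size: int = 4) -> list:
    """Split the target data into fixed-size slices (assumed preset rule)."""
    return [data[i:i + slice_size] for i in range(0, len(data), slice_size)]


def deduplicate(data: bytes, fingerprint_db: dict, slice_size: int = 4):
    """Return the slices that must be stored and the number of duplicates dropped."""
    kept, dropped = [], 0
    for index, piece in enumerate(split_into_slices(data, slice_size)):
        fingerprint = hashlib.sha256(piece).hexdigest()  # assumed fingerprint function
        if fingerprint in fingerprint_db:
            dropped += 1  # mapping already exists: treat the slice as duplicate data
        else:
            fingerprint_db[fingerprint] = {"index": index}  # hypothetical metadata mapping
            kept.append(piece)
    return kept, dropped
```

Calling `deduplicate(b"AAAABBBBAAAA", {})` keeps the slices `b"AAAA"` and `b"BBBB"` and drops the repeated third slice; a second call with the same database drops all three.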
Those skilled in the art will appreciate that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) that contain computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, and the instruction means implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art may make further changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the present application without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present application and their technical equivalents, the present application is also intended to include them.

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111593136.4A | 2021-12-23 | 2021-12-23 | Data deduplication method and system, electronic device and storage medium |

| Publication Number | Publication Date |
|---|---|
| CN114442931A | 2022-05-06 |

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20220506 |