CN110147203A - A file management method, device, electronic device and storage medium - Google Patents

A file management method, device, electronic device and storage medium

Info

Publication number
CN110147203A
Authority
CN
China
Prior art keywords
storage
file
data
storage region
read
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910411298.8A
Other languages
Chinese (zh)
Other versions
CN110147203B (en)
Inventor
尹滔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Cloud Network Technology Co Ltd
Beijing Kingsoft Cloud Technology Co Ltd
Original Assignee
Beijing Kingsoft Cloud Network Technology Co Ltd
Beijing Kingsoft Cloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Cloud Network Technology Co Ltd, Beijing Kingsoft Cloud Technology Co Ltd
Priority to CN201910411298.8A
Publication of CN110147203A
Application granted
Publication of CN110147203B
Active
Anticipated expiration

Abstract

Embodiments of the present invention provide a file management method, apparatus, electronic device and storage medium. The file management method includes: obtaining a merged file to be written; storing the merged file in a first storage area and obtaining index data for each small file in the merged file; and storing the index data in a second storage area, where the data read/write performance of the second storage area is higher than that of the first storage area. Embodiments of the present invention can improve the data read performance of a cloud storage system while also saving storage space.

Description

Translated from Chinese
A file management method, device, electronic device and storage medium

Technical field

The present invention relates to the technical field of data storage, and in particular to a file management method, apparatus, electronic device and storage medium.

Background

Cloud storage is a technology for storing data in the cloud. A cloud storage system receives data sent by clients over the network and stores it.

In a cloud storage system, data is usually stored in the form of files. Specifically, a storage server in the cloud storage system splits a file into blocks of a fixed size, each called a data block, and stores each data block in a storage space block that has been partitioned in advance on the storage medium. That is, one storage space block stores one data block. A small file whose size is below this fixed size must still occupy a whole storage space block of its own, so when there are too many small files, storage space is wasted.

The prior art generally avoids this waste as follows: a large number of small files are merged into one large file, and the large file and the index information of each small file are each stored on the mechanical hard disk of the storage server. To read a small file, its index information is read first, and the data is then read from the large file based on that index information.

However, because the number of small files is large and the prior art must read the index information before reading the data on every small-file read, the mechanical hard disk is accessed twice per read. The query load on a single storage server can therefore exceed 1000 QPS (Queries Per Second), whereas the maximum QPS a mechanical hard disk can provide is generally only 90. The available read capacity cannot meet the read demand, which results in poor data read performance on the storage server.
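The gap described above can be made concrete with a short calculation. This is a rough sketch only: it uses the two figures quoted in the text (a demand of at least 1000 QPS and a disk limit of 90 QPS) and assumes, for illustration, a single mechanical hard disk on which each of the two reads costs exactly one disk query. The function and constant names are hypothetical.

```python
# Figures quoted in the background section.
DEMAND_QPS = 1000          # small-file reads per second a storage server may need to serve
HDD_MAX_QPS = 90           # typical maximum queries per second of a mechanical hard disk
READS_PER_SMALL_FILE = 2   # prior art: one index read plus one data read, both on the disk


def small_file_reads_per_second(disk_qps: float, disk_reads_per_file: int) -> float:
    """Small-file reads per second one disk can sustain (assumes one query per disk read)."""
    return disk_qps / disk_reads_per_file


# Prior art: both the index and the data are read from the mechanical hard disk.
prior_art = small_file_reads_per_second(HDD_MAX_QPS, READS_PER_SMALL_FILE)

# How many times larger the demand is than what a single disk can deliver.
shortfall = DEMAND_QPS / prior_art

print(f"prior art sustains about {prior_art:.0f} small-file reads/s")
print(f"demand exceeds that by a factor of about {shortfall:.1f}")
```

Under these assumptions, every index lookup that can be served from somewhere other than the mechanical hard disk frees disk queries for data reads.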

Summary of the invention

The purpose of the embodiments of the present invention is to provide a file management method, apparatus, electronic device and storage medium that improve the data read performance of a storage server while saving storage space. The specific technical solutions are as follows:

In a first aspect, an embodiment of the present invention provides a file management method applied to a management server in a cloud storage system, where the management server is used to manage multiple storage servers in the cloud storage system and the storage servers are used to store data. The method includes:

obtaining a merged file to be written, where the merged file is a file obtained by merging multiple small files, and a small file is a file whose size is below a preset threshold;

storing the merged file in a first storage area, and obtaining index data for each small file in the merged file, where each piece of index data carries the storage location information of one small file;

storing the index data in a second storage area, where the data read/write performance of the second storage area is higher than that of the first storage area.

Optionally, the step of storing the merged file in the first storage area includes:

storing the merged file in a storage area corresponding to a first storage medium;

and the step of storing the index data in the second storage area includes:

storing the index data in a storage area corresponding to a second storage medium, where the data read/write performance of the second storage medium is higher than that of the first storage medium.

Optionally, the first storage medium is a mechanical hard disk, and the second storage medium is a solid state disk (SSD).

Optionally, the step of storing the merged file in the first storage area includes:

storing the merged file in the first storage area of a first storage server, where the first storage server is one of the multiple storage servers;

and the step of storing the index data in the second storage area includes:

storing the index data in the second storage area of the first storage server.

Optionally, the first storage area and the second storage area are divided into storage space blocks of the same size, and each storage space block is used to store one data block. A data block includes multiple storage directories and a storage-location designation directory. The storage directories are used to store data; a preset value is recorded under the storage-location designation directory, and different preset values designate different storage directories as the place to store data;

the step of storing the merged file in the first storage area includes:

if the size of the merged file is not larger than the size of a storage space block, storing the merged file under the designated storage directory of the data block corresponding to one of the storage space blocks of the first storage area;

if the size of the merged file is larger than the size of a storage space block, storing the merged file under the designated storage directories of the data blocks corresponding to multiple storage space blocks of the first storage area.
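The size check in the two branches above can be sketched as follows. The text does not say how a merged file larger than one storage space block is divided among blocks, so this sketch assumes consecutive fixed-size pieces; `split_into_blocks` and `block_size` are hypothetical names, and the designated storage directory inside each data block is not modelled.

```python
from typing import List


def split_into_blocks(merged: bytes, block_size: int) -> List[bytes]:
    """Return the pieces of a merged file, one piece per storage space block.

    Assumption: an oversized merged file is cut into consecutive pieces of at
    most block_size bytes; the source only says that multiple blocks are used.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if len(merged) <= block_size:
        # Not larger than one storage space block: a single block is enough.
        return [merged]
    # Larger than one storage space block: spread the file over several blocks.
    return [merged[i:i + block_size] for i in range(0, len(merged), block_size)]


print(len(split_into_blocks(b"abcdef", 8)))      # fits in one block
print(len(split_into_blocks(b"abcdefghij", 4)))  # needs several blocks
```

Each returned piece would then be written under the designated storage directory of the data block that corresponds to its storage space block.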

Optionally, the method further includes:

reading, from the second storage area, the index data of a small file that is to be read from the first storage area;

locating, according to the index data, the storage location of the small file's data in the first storage area;

reading, according to the located storage location, the small file from the merged file stored in the first storage area.
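The three read steps above can be sketched with in-memory stand-ins for the two storage areas. This is an illustration under assumptions, not the patented implementation: the source only states that index data carries storage location information, so the `IndexEntry` fields (`merged_file_id`, `offset`, `length`) and the dictionary-based storage areas are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class IndexEntry:
    """Hypothetical index record: where one small file sits inside a merged file."""
    merged_file_id: str
    offset: int
    length: int


def read_small_file(
    name: str,
    second_area: Dict[str, IndexEntry],  # stand-in for the faster storage area holding index data
    first_area: Dict[str, bytes],        # stand-in for the slower storage area holding merged files
) -> bytes:
    # Step 1: read the small file's index data from the second storage area.
    entry = second_area[name]
    # Step 2: use the index data to locate the small file's data in the first storage area.
    merged = first_area[entry.merged_file_id]
    # Step 3: read the small file out of the merged file at the located position.
    return merged[entry.offset:entry.offset + entry.length]


first_area = {"m1": b"helloworld"}
second_area = {"a": IndexEntry("m1", 0, 5), "b": IndexEntry("m1", 5, 5)}
print(read_small_file("b", second_area, first_area))
```

Only step 3 touches the slower storage area, which is the point of keeping the index data elsewhere.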

Optionally, the step of reading, from the second storage area, the index data of the small file to be read from the first storage area includes:

obtaining, from the storage area corresponding to the second storage medium, the index data of the small file to be read from the storage area corresponding to the first storage medium;

and the step of reading the small file from the merged file stored in the first storage area includes:

reading the small file from the merged file stored in the storage area corresponding to the first storage medium.

In a second aspect, an embodiment of the present invention provides a file management apparatus applied to a management server in a cloud storage system, where the management server is used to manage multiple storage servers in the cloud storage system and the storage servers are used to store data. The apparatus includes:

an obtaining module, configured to obtain a merged file to be written, where the merged file is a file obtained by merging multiple small files, and a small file is a file whose size is below a preset threshold;

a first storage module, configured to store the merged file in a first storage area and obtain index data for each small file in the merged file, where each piece of index data carries the storage location information of one small file;

a second storage module, configured to store the index data in a second storage area, where the data read/write performance of the second storage area is higher than that of the first storage area.

Optionally, the first storage module is specifically configured to:

store the merged file in a storage area corresponding to a first storage medium;

and the second storage module is specifically configured to:

store the index data in a storage area corresponding to a second storage medium, where the data read/write performance of the second storage medium is higher than that of the first storage medium.

Optionally, the first storage medium is a mechanical hard disk, and the second storage medium is a solid state disk (SSD).

Optionally, the first storage module is specifically configured to:

store the merged file in the first storage area of a first storage server, where the first storage server is one of the multiple storage servers;

and the second storage module is specifically configured to:

store the index data in the second storage area of the first storage server.

Optionally, the first storage area and the second storage area are divided into storage space blocks of the same size, and each storage space block is used to store one data block. A data block includes multiple storage directories and a storage-location designation directory. The storage directories are used to store data; a preset value is recorded under the storage-location designation directory, and different preset values designate different storage directories as the place to store data;

the first storage module includes:

a first storage submodule, configured to, if the size of the merged file is not larger than the size of a storage space block, store the merged file under the designated storage directory of the data block corresponding to one of the storage space blocks of the first storage area;

a second storage submodule, configured to, if the size of the merged file is larger than the size of a storage space block, store the merged file under the designated storage directories of the data blocks corresponding to multiple storage space blocks of the first storage area.

Optionally, the apparatus further includes:

a first reading module, configured to read, from the second storage area, the index data of a small file that is to be read from the first storage area;

a locating module, configured to locate, according to the index data, the storage location of the small file's data in the first storage area;

a second reading module, configured to read, according to the located storage location, the small file from the merged file stored in the first storage area.

Optionally, the first reading module is specifically configured to:

obtain, from the storage area corresponding to the second storage medium, the index data of the small file to be read from the storage area corresponding to the first storage medium;

and the second reading module is specifically configured to:

read the small file from the merged file stored in the storage area corresponding to the first storage medium.

In a third aspect, an embodiment of the present invention provides an electronic device including a processor and a machine-readable storage medium. The machine-readable storage medium stores machine-executable instructions that can be executed by the processor, and the processor executes the machine-executable instructions to implement the method steps of the file management method provided in the first aspect.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium in which a computer program is stored. When executed by a processor, the computer program implements the method steps of the file management method provided in the first aspect.

In a fifth aspect, an embodiment of the present invention further provides a computer program product containing instructions which, when run on a computer, cause the computer to execute the method steps of the file management method provided in the first aspect.

In a sixth aspect, an embodiment of the present invention further provides a computer program which, when run on a computer, causes the computer to execute the method steps of the file management method provided in the first aspect.

With the file management method, apparatus, electronic device and storage medium provided by the embodiments of the present invention, after the merged file to be written is obtained, the merged file is stored in the first storage area, index data for each small file in the merged file is obtained, and the index data is then stored in the second storage area. Because the data read/write performance of the second storage area is higher than that of the first storage area, reading a small file can use the second storage area's higher performance to read the small file's index data more quickly, and then read the small file from the merged file stored in the first storage area according to that index data. This avoids the poor data read performance of existing file management methods and improves the data read performance of the cloud storage system while saving storage space. Of course, a product or method implementing the present invention does not necessarily need to achieve all of the above advantages at the same time.

Description of drawings

To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of a file management method provided by an embodiment of the present invention;

FIG. 2 is a schematic diagram of the data management topology in a cloud storage system according to an embodiment of the present invention;

FIG. 3 is a schematic flowchart of another file management method provided by an embodiment of the present invention;

FIG. 4 is a schematic flowchart of yet another file management method provided by an embodiment of the present invention;

FIG. 5 is a schematic flowchart of a fourth file management method provided by an embodiment of the present invention;

FIG. 6 is a schematic diagram of the storage topology of a storage server in an embodiment of the present invention;

FIG. 7 is a schematic diagram of the stored content of a data block in an embodiment of the present invention;

FIG. 8 is a schematic flowchart of a fifth file management method provided by an embodiment of the present invention;

FIG. 9 is a schematic structural diagram of a file management apparatus provided by an embodiment of the present invention;

FIG. 10 is a schematic structural diagram of the first storage module in an embodiment of the present invention;

FIG. 11 is a schematic structural diagram of another file management apparatus provided by an embodiment of the present invention;

FIG. 12 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

Existing file management methods also commonly use the following schemes:

For a file to be written, determine whether the file size is larger than a preset threshold. If the file is not larger than the threshold, it can be stored directly on a storage server based on SSD (Solid State Disk); if the file is larger than the threshold, it can be stored on a storage server based on mechanical hard disks. However, this scheme requires all small files to be stored on SSD, and because SSDs are currently expensive, it leads to high operating costs for the cloud storage system.

Alternatively, if the file is not larger than the threshold, a cache of a preset size can be allocated for it in memory and the file written into that cache; if the file is larger than the threshold, it can be stored on a storage server based on mechanical hard disks. However, because small files are kept in memory under this scheme, data is lost when the machine fails.

Method Embodiment 1

In view of this, as shown in FIG. 1, an embodiment of the present invention first provides a file management method. The method can be applied to a management server in a cloud storage system, where the management server can be used to manage multiple storage servers in the cloud storage system and the storage servers are used to store data. The method may include the following steps:

S101: obtain a merged file to be written.

In this embodiment of the present invention, FIG. 2 is a schematic diagram of the data management topology in a cloud storage system. A cloud storage system may include multiple storage clusters, and each storage cluster is provided with one management server (the nameserver) and several storage servers (dataservers; N servers, with N greater than 2). The management server manages these storage servers in a unified manner, and the storage servers are used to store data. For example, after data to be written is obtained, the management server can direct a storage server to store that data.

The merged file is a large file obtained by merging multiple small files. The merged file is larger than each of the small files it contains, and because the small files are stored together as one file, the situation where a single small file fails to fill a storage space block is avoided, which avoids wasting storage space.

In addition, the data of the small files can be saved into the merged file sequentially, which suits traditional mechanical hard disks better (their random read performance is far lower than their sequential read performance).

Because the merged file has not yet been stored on a storage server at this point, it is referred to as the merged file to be written.

A small file is a file whose size is below a preset threshold, for example a file whose data size is below 2 KB (kilobytes). Developers of the cloud storage system can set this threshold flexibly and reasonably according to actual business needs; the embodiments of the present invention do not limit its specific value. Multiple small files can be merged into one merged file using an existing file merging method, and the specific process is not described further here.
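The source leaves the merging step to existing methods, so the following is only one simple possibility: sequential concatenation of the files below the threshold, recording where each one lands. The 2 KB threshold is the example value from the text; the function name `merge_small_files` and the `(offset, length)` layout are assumptions made for illustration.

```python
from typing import Dict, Tuple

SMALL_FILE_THRESHOLD = 2 * 1024  # example threshold from the text: 2 KB


def merge_small_files(
    files: Dict[str, bytes],
    threshold: int = SMALL_FILE_THRESHOLD,
) -> Tuple[bytes, Dict[str, Tuple[int, int]]]:
    """Concatenate files smaller than `threshold` into one merged file.

    Returns the merged bytes and a layout mapping each small file's name to
    its (offset, length) inside the merged file. Larger files are skipped.
    """
    merged = bytearray()
    layout: Dict[str, Tuple[int, int]] = {}
    for name, data in files.items():
        if len(data) >= threshold:
            continue  # not a small file; it would be stored on its own
        layout[name] = (len(merged), len(data))
        merged += data  # small files are laid out one after another
    return bytes(merged), layout


merged, layout = merge_small_files({"a": b"xx", "big": b"y" * 4096, "b": b"zzz"})
print(len(merged), layout)
```

Laying the small files out back to back is what lets a later read fetch one of them with a single contiguous access to the merged file.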

S102: store the merged file in the first storage area, and obtain index data for each small file in the merged file.

A storage cluster can be provided with different storage areas, and different storage areas can have different data read/write performance. The embodiment of the present invention can therefore store the merged file in the first storage area and, after storing it, obtain the index data of each small file in the merged file. Each piece of index data carries the storage location information of one small file; that is, one piece of index data corresponds to the storage location information of one small file. The storage location information records where the small file is stored in the first storage area, for example the identification number of the storage space block that holds it, and is used to locate the small file when it is read.

The index data of each small file can be obtained using an existing method for generating file index information, which is not described further here.

S103: store the index data in the second storage area, where the data read/write performance of the second storage area is higher than that of the first storage area.

After the index data is obtained, the embodiment of the present invention can store it in the second storage area. The second storage area is a different area from the first storage area, and its data read/write performance is higher than that of the first storage area. With the index data stored in the faster second storage area, reading a large number of small files can use that area's higher read/write performance to read each small file's index data more quickly, and then read each small file from the merged file according to the index data. This avoids the poor data read performance caused by the performance limits of traditional mechanical hard disks. In addition, because index data needs less storage space than file data, storing only the index data in the high-performance second storage area saves cost.

With the file management method provided by this embodiment of the present invention, after the merged file to be written is obtained, the merged file is stored in the first storage area, index data for each small file in the merged file is obtained, and the index data is then stored in the second storage area. Because the data read/write performance of the second storage area is higher than that of the first storage area, reading a small file can use the second storage area's higher performance to read the small file's index data more quickly, and then read the small file from the merged file stored in the first storage area according to that index data. This avoids the poor data read performance of existing file management methods and improves the data read performance of the cloud storage system while saving storage space.
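Steps S102 and S103 can be sketched as a single write routine. This is a minimal sketch under stated assumptions: two directories stand in for the first and second storage areas, the index data is serialized as JSON, and the index fields (`merged_file`, `offset`, `length`), the file suffixes and the function name `write_merged_file` are all hypothetical, since the source does not specify an index format.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Tuple


def write_merged_file(
    merged_id: str,
    merged: bytes,
    layout: Dict[str, Tuple[int, int]],  # small-file name -> (offset, length) in `merged`
    first_area: Path,                    # stand-in for the slower storage area
    second_area: Path,                   # stand-in for the faster storage area
) -> Path:
    # S102: store the merged file in the first storage area ...
    data_path = first_area / f"{merged_id}.dat"
    data_path.write_bytes(merged)
    # ... and build one index record per small file carrying its storage location.
    index = {
        name: {"merged_file": data_path.name, "offset": offset, "length": length}
        for name, (offset, length) in layout.items()
    }
    # S103: store the index data in the second storage area.
    index_path = second_area / f"{merged_id}.idx.json"
    index_path.write_text(json.dumps(index))
    return index_path


with tempfile.TemporaryDirectory() as root:
    first_area = Path(root) / "first_area"
    second_area = Path(root) / "second_area"
    first_area.mkdir()
    second_area.mkdir()
    index_path = write_merged_file(
        "m1", b"xxzzz", {"a": (0, 2), "b": (2, 3)}, first_area, second_area
    )
    stored_index = json.loads(index_path.read_text())
    stored_data = (first_area / "m1.dat").read_bytes()

print(stored_index["b"])
```

In a real deployment the two directories would be mount points on media with different performance, so that only the small index file occupies the more expensive storage.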

方法实施例2Method Example 2

如图3所示,本发明实施例还提供了一种文件管理方法,该方法可以包括以下步骤:As shown in FIG. 3 , an embodiment of the present invention also provides a file management method, which may include the following steps:

S201,获得待写入的合并文件。S201, obtaining a merged file to be written.

该步骤与方法实施例1的步骤S101相同,本发明实施例在此不再赘述。This step is the same as step S101 in Method Embodiment 1, and details are not described herein again in this embodiment of the present invention.

S202,将合并文件存储至第一存储介质对应的存储区域中,并获得针对合并文件中各小文件的索引数据。S202: Store the merged file in a storage area corresponding to the first storage medium, and obtain index data for each small file in the merged file.

该步骤与方法实施例1的步骤S102相似,不同处在于,本发明实施例可以将合并文件存储至第一存储介质对应的存储区域中,即,第一存储区域具体可以为第一存储介质对应的存储区域。This step is similar to step S102 in Method Embodiment 1, except that in this embodiment of the present invention, the merged file may be stored in a storage area corresponding to the first storage medium, that is, the first storage area may specifically be a storage area corresponding to the first storage medium. storage area.

作为本发明实施例一种可选的实施方式,上述第一存储介质例如可以为机械硬盘,则上述将合并文件存储至第一存储介质对应的存储区域中的步骤,即,可以为将合并文件存储至机械硬盘对应的存储区域中。As an optional implementation manner of the embodiment of the present invention, the above-mentioned first storage medium may be, for example, a mechanical hard disk, and the above-mentioned step of storing the merged file in the storage area corresponding to the first storage medium may be: Store it in the storage area corresponding to the mechanical hard disk.

S203,将索引数据存储至第二存储介质对应的存储区域中。S203: Store the index data in a storage area corresponding to the second storage medium.

This step is similar to step S103 in Method Embodiment 1. The difference is that in this embodiment the index data may be stored in a storage area corresponding to a second storage medium; that is, the second storage area may specifically be the storage area corresponding to the second storage medium.

As an optional implementation of this embodiment, the second storage medium may be, for example, an SSD, in which case the step of storing the index data in the storage area corresponding to the second storage medium may be storing the index data in the storage area corresponding to the SSD.

可以理解,第二存储介质的数据读写性能高于第一存储介质的数据读写性能,因此在读取小文件时,本发明实施例能够更快速地读取各小文件的索引数据。It can be understood that the data reading and writing performance of the second storage medium is higher than that of the first storage medium. Therefore, when reading small files, the embodiment of the present invention can read the index data of each small file more quickly.
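The write path of steps S201 to S203 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: two temporary directories stand in for the storage areas of the mechanical hard disk and the SSD, the function name `store_merged_file` is hypothetical, and the index is serialized as JSON purely for readability (the text only says index data may be saved in binary format).

```python
import json
import tempfile
from pathlib import Path


def store_merged_file(merged: bytes, index: dict, slow_dir: Path,
                      fast_dir: Path, name: str) -> None:
    """Write file data to the slower area and index data to the faster area."""
    # S202: the merged file goes to the first (slower, cheaper) storage area.
    (slow_dir / f"{name}.dat").write_bytes(merged)
    # S203: the index data goes to the second (faster) storage area.
    (fast_dir / f"{name}.idx").write_text(json.dumps(index))


# Assumption: temporary directories stand in for HDD- and SSD-backed areas.
slow_dir = Path(tempfile.mkdtemp(prefix="hdd_"))
fast_dir = Path(tempfile.mkdtemp(prefix="ssd_"))

# Hypothetical index layout: small-file name -> [offset, length] in the merged file.
index = {"a.txt": [0, 5], "b.txt": [5, 3]}
store_merged_file(b"helloabc", index, slow_dir, fast_dir, "merged0")
```

Keeping the two writes separate means only the small index payload has to live on the faster medium.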

In the file management method provided by this embodiment of the present invention, because the data read/write performance of the second storage medium is higher than that of the first storage medium, the index data of each small file can be read more quickly when small files are read, and each small file can then be read from the merged file stored in the first storage area according to that index data, which further improves the data read performance of the cloud storage system.

Method Embodiment 3

如图4所示,本发明实施例还提供了一种文件管理方法,该方法可以包括以下步骤:As shown in FIG. 4 , an embodiment of the present invention also provides a file management method, which may include the following steps:

S301,获得待写入的合并文件。S301, obtaining a merged file to be written.

该步骤与方法实施例1的步骤S101相同,本发明实施例在此不再赘述。This step is the same as step S101 in Method Embodiment 1, and details are not described herein again in this embodiment of the present invention.

S302,将合并文件存储至第一存储服务器的第一存储区域。S302: Store the merged file in the first storage area of the first storage server.

This step is similar to step S102 in Method Embodiment 1. The difference is that in this embodiment the management server may store the merged file on one of the storage servers in a storage cluster. Specifically, the merged file may be stored in the first storage area of a first storage server, where the first storage server is any one of the storage servers in the storage cluster.

可选的,可以将合并文件存储至第一存储服务器的第一存储介质对应的存储区域,上述第一存储区域具体可以为第一存储服务器上的第一存储介质所对应的存储区域。Optionally, the combined file may be stored in a storage area corresponding to the first storage medium of the first storage server, and the first storage area may specifically be a storage area corresponding to the first storage medium on the first storage server.

可选的,第一存储介质可以为机械硬盘。Optionally, the first storage medium may be a mechanical hard disk.

S303,将索引数据存储至第一存储服务器的第二存储区域。S303: Store the index data in the second storage area of the first storage server.

This step is similar to step S103 in Method Embodiment 1. The difference is that in this embodiment the index data may be stored on one of the storage servers in a storage cluster. Specifically, the index data may be stored in the second storage area of the first storage server, where the first storage server is any one of the storage servers in the storage cluster, and the data read/write performance of the second storage area of that storage server is higher than that of its first storage area.

可选的,可以将索引数据存储至第一存储服务器的第二存储介质对应的存储区域,上述第二存储区域具体可以为第一存储服务器上的第二存储介质所对应的存储区域。Optionally, the index data may be stored in a storage area corresponding to the second storage medium of the first storage server, and the second storage area may specifically be a storage area corresponding to the second storage medium on the first storage server.

可选的,第二存储介质可以为SSD。Optionally, the second storage medium may be an SSD.

Compared with prior-art approaches that store all small files on SSDs, or in a distributed storage system built from SSDs, this embodiment of the present invention costs less, because an SSD is more expensive than a mechanical hard disk of the same capacity. If all the disks in a storage server were replaced with SSDs, the storage cost would rise by at least 100 times compared with the file management method of this embodiment.

作为本发明实施例一种可选的实施方式,在图4所示流程的基础上,如图5所示,在步骤S302之前,本发明实施例的文件管理方法还可以包括:As an optional implementation manner of the embodiment of the present invention, on the basis of the process shown in FIG. 4, as shown in FIG. 5, before step S302, the file management method of the embodiment of the present invention may further include:

S301’,判断合并文件的大小是否大于存储空间块的大小。S301', judging whether the size of the merged file is larger than the size of the storage space block.

则上述步骤S302具体可以包括:The above step S302 may specifically include:

S3021,如果合并文件的大小不大于存储空间块的大小,则将合并文件存储至第一存储区域的其中一个存储空间块所对应数据块的指定存储目录下。S3021, if the size of the merged file is not greater than the size of the storage space block, store the merged file in the specified storage directory of the data block corresponding to one of the storage space blocks in the first storage area.

本发明实施例中,如果合并文件的大小小于或等于存储空间块的大小,表明只需要一个存储空间块所提供的存储空间,便可以满足该合并文件需要的存储空间。In the embodiment of the present invention, if the size of the combined file is less than or equal to the size of the storage space block, it indicates that only the storage space provided by one storage space block is required to satisfy the storage space required by the combined file.

示例性地,可以将上述合并文件存储在一个存储空间块所对应数据块的存储目录0下,或者存储目录1下。Exemplarily, the above-mentioned merged file may be stored in the storage directory 0 or the storage directory 1 of the data block corresponding to one storage space block.

S3022,如果合并文件的大小大于存储空间块的大小,则将合并文件存储至第一存储区域的多个存储空间块各自对应的数据块的指定存储目录下。S3022 , if the size of the combined file is larger than the size of the storage space block, store the combined file in the designated storage directory of the data blocks corresponding to the multiple storage space blocks in the first storage area.

本发明实施例中,如果合并文件的大小大于存储空间块的大小,表明一个存储空间块所提供的存储空间,无法满足该合并文件所需要的存储空间,因此可以使用多个连续的存储空间块对该合并文件进行存储。In this embodiment of the present invention, if the size of the merged file is larger than the size of the storage space block, it indicates that the storage space provided by one storage space block cannot satisfy the storage space required by the merged file, so multiple consecutive storage space blocks can be used Store the merged file.

同样地,可以将数据存储至上述多个存储空间块各自对应的数据块的指定存储目录下。Similarly, the data may be stored in the designated storage directory of the data blocks corresponding to each of the above-mentioned multiple storage space blocks.
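The branch between steps S3021 and S3022 comes down to how many storage space blocks a merged file occupies. The helper below is a small sketch of that decision; the function name `blocks_needed` and the byte sizes used in the usage lines are illustrative assumptions, since the text does not state a block size.

```python
def blocks_needed(merged_size: int, block_size: int) -> int:
    """Return how many storage space blocks a merged file of merged_size bytes occupies."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if merged_size < 0:
        raise ValueError("merged_size must not be negative")
    # Ceiling division; a file no larger than one block still takes one block (S3021).
    return max(1, -(-merged_size // block_size))


# Assumed sizes for illustration only.
one_block = blocks_needed(64, 64)      # not larger than a block: single block
several_blocks = blocks_needed(65, 64)  # larger than a block: multiple blocks (S3022)
```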

FIG. 6 is a schematic diagram of the storage topology of a storage server in an embodiment of the present invention. A storage server is usually provided with multiple disks that form a disk array. The storage area of each disk may be divided, according to a preset size, into multiple storage space blocks of the same size (N blocks, where N is greater than 2), and each storage space block may be used to store data.

In this embodiment, by way of example, the path at which a file to be written is stored on disk may be /data/vols/vol1/phenix_data/00000000000000709106501, where data/, vols/, vol1/, and phenix_data/ are successive levels of the storage path. In the string of digits after phenix_data/, the first digit indicates the type of the data block, with 0 denoting the three-replica type. The next 20 digits are the identification number of the storage space block, that is, its unique identifier within a storage cluster. After data (for example, data saved in idx format or in dat format) is stored in the data block corresponding to a storage space block, the management server may record the identification number of that storage space block and thereby generate index data for later lookup and reading. The last two digits are the unique identification number of this replica among the three replicas; the value is usually one of 0 to 2 and may be greater than 2 if there are more than three replicas.
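The naming scheme described above (one type digit, twenty block-identifier digits, two replica digits) can be decoded with a short helper. This is a sketch based only on the field widths given in the text; the names `BlockDirName` and `parse_block_dir_name` are hypothetical.

```python
from typing import NamedTuple


class BlockDirName(NamedTuple):
    block_type: int   # first digit; 0 denotes the three-replica type
    block_id: str     # next 20 digits; unique within a storage cluster
    replica_id: int   # last 2 digits; usually 0 to 2


def parse_block_dir_name(name: str) -> BlockDirName:
    """Split a 23-digit data block directory name into its three fields."""
    if len(name) != 23 or not (name.isascii() and name.isdigit()):
        raise ValueError(f"expected 23 ASCII digits, got {name!r}")
    return BlockDirName(int(name[0]), name[1:21], int(name[21:]))


# Build a name with the same layout as the example path in the text.
name = "0" + "7091065".zfill(20) + "01"
parsed = parse_block_dir_name(name)
```

The block identifier is kept as a string so that its leading zeros survive and it can be reused directly as a path component.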

如图7所示,为本发明实施例中数据块的存储内容示意图,每一个数据块目录下可以包括不同的目录,这些目录可以为:第一存储目录,以数字0表示该目录的名称;第二存储目录,以数字1表示该目录的名称;以及一个存储位置指定目录,以英文CURRENT表示该目录的名称。As shown in Figure 7, it is a schematic diagram of the storage content of the data block in the embodiment of the present invention, each data block directory may include different directories, and these directories may be: the first storage directory, and the number 0 represents the name of the directory; The second storage directory, the number 1 represents the name of the directory; and a storage location designation directory, the English CURRENT represents the name of the directory.

其中,存储目录(即存储目录0和存储目录1)可以用于存储数据,每个存储目录中可以保存索引数据,即idx格式的数据,还可以保存文件数据,即dat格式的数据。The storage directories (ie, storage directory 0 and storage directory 1) can be used to store data, and each storage directory can store index data, that is, data in idx format, and file data, that is, data in dat format.

The storage location designation directory (that is, the CURRENT directory) specifies which storage directory data is to be stored in, that is, whether data is stored in storage directory 0 or in storage directory 1.

存储位置指定目录的指定原理为:在存储位置指定目录下可以保存数字0或1,当存储位置指定目录中保存的数值为0时,表示存储目录0为当前的存储目录,可以将当前的数据存储至存储目录0下;当存储位置指定目录中保存的数值为1时,表示存储目录1为当前的存储目录,可以将当前的数据存储至存储目录1下。The designation principle of the storage location specified directory is: the number 0 or 1 can be saved in the storage location specified directory. When the value stored in the storage location specified directory is 0, it means that the storage directory 0 is the current storage directory, and the current data can be stored. Store it in storage directory 0; when the value stored in the specified directory of the storage location is 1, it means that storage directory 1 is the current storage directory, and the current data can be stored in storage directory 1.

In this embodiment, by way of example, the path at which index data is stored on disk may be /data/phenix_idx/{0..35}/00000000000000709106501/0/idx, where data/ and phenix_idx/ are successive levels of the storage path, and {0..35} is the serial number of the disk that holds the file data corresponding to the index data. In the string of digits that follows, the first digit indicates the type of the data block, with 0 denoting the three-replica type. The next 20 digits are the identification number of the storage space block that holds the corresponding dat data, that is, the unique identifier of that storage space block within a storage cluster. The last two digits are the unique identification number of this replica among the three replicas; the value is usually one of 0 to 2 and may be greater than 2 if there are more than three replicas.
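Following the example path above, the location of an index file can be assembled from the disk serial number, the data block directory name, and the storage directory. The sketch below assumes the 36-disk layout used in the example and reads the "0" before "idx" as the storage directory name; the function name `index_path` is hypothetical.

```python
from pathlib import PurePosixPath

DISK_COUNT = 36  # assumption: disk serial numbers run from 0 to 35, as in the example


def index_path(disk_no: int, block_dir_name: str, storage_dir: int) -> PurePosixPath:
    """Build the path of the idx file for the data block on a given disk."""
    if not 0 <= disk_no < DISK_COUNT:
        raise ValueError("disk_no out of range")
    if storage_dir not in (0, 1):
        raise ValueError("storage_dir must be 0 or 1")
    return (PurePosixPath("/data/phenix_idx") / str(disk_no)
            / block_dir_name / str(storage_dir) / "idx")


# Same field layout as the example: type digit, 20-digit block id, 2-digit replica id.
name = "0" + "7091065".zfill(20) + "01"
p = index_path(3, name, 0)
```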

上述磁盘阵列,例如可以为,由36块机械硬盘和1块SSD组成的磁盘阵列,其中,36块机械硬盘用于存储dat格式的文件,1块SSD用于存储idx格式的文件。由于本发明实施例将索引数据与文件数据分别存储,因此,对于机械硬盘中各数据块对应的存储目录,可以只保存文件数据,对于SSD中各数据块对应的存储目录,可以只保存索引数据,可选的,上述索引数据和文件数据均可以以二进制格式保存。The above-mentioned disk array can be, for example, a disk array composed of 36 mechanical hard disks and 1 SSD, wherein 36 mechanical hard disks are used for storing files in dat format, and 1 SSD is used for storing files in idx format. Since the embodiment of the present invention stores the index data and the file data separately, therefore, for the storage directory corresponding to each data block in the mechanical hard disk, only the file data may be stored, and for the storage directory corresponding to each data block in the SSD, only the index data may be stored , optionally, the above index data and file data can be saved in binary format.

根据上述存储位置指定目录的指定原理,在对当前存储目录下的数据进行压缩操作时,可以在压缩操作完后,将压缩好的数据存放在另一个存储目录下,同时修改存储位置指定目录中的数值,从而指向另一个存储目录,因此不会影响当前存储目录下的数据。According to the above-mentioned designation principle of the storage location designation directory, when the data in the current storage directory is compressed, the compressed data can be stored in another storage directory after the compression operation, and the storage location designation directory can be modified at the same time. The value of , thus points to another storage directory, so it will not affect the data in the current storage directory.
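The switch between storage directory 0 and storage directory 1 can be sketched as below. Several details are assumptions made for illustration: CURRENT is modeled as a small text file holding "0" or "1", the data file is named `dat`, and `zlib.compress` stands in for whatever compression operation the system actually performs.

```python
import tempfile
import zlib
from pathlib import Path


def active_dir(block: Path) -> str:
    """Return the name ("0" or "1") of the storage directory CURRENT points at."""
    return (block / "CURRENT").read_text().strip()


def compact(block: Path) -> None:
    """Write compressed data to the inactive directory, then flip CURRENT."""
    active = active_dir(block)
    other = "1" if active == "0" else "0"
    (block / other).mkdir(exist_ok=True)
    raw = (block / active / "dat").read_bytes()
    (block / other / "dat").write_bytes(zlib.compress(raw))
    # Flip the pointer only after the new copy is fully written,
    # so the data in the previously active directory is never modified.
    (block / "CURRENT").write_text(other)


# A temporary directory stands in for one data block.
block = Path(tempfile.mkdtemp(prefix="block_"))
(block / "0").mkdir()
(block / "0" / "dat").write_bytes(b"abc" * 100)
(block / "CURRENT").write_text("0")
compact(block)
```

Because the old directory is left untouched until the pointer changes, readers that resolved CURRENT before the switch still see consistent data.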

本发明实施例提供的一种文件管理方法,在获得待写入的合并文件后,通过将合并文件存储至云存储系统中的其中一台存储服务器的第一存储区域,并获得针对合并文件中各小文件的索引数据,再将索引数据存储至该存储服务器的第二存储区域,由于该存储服务器的第二存储区域的数据读写性能高于第一存储区域的数据读写性能,因此在读取小文件时,能够利用该存储服务器第二存储区域的高读写性能更加快速地读取各小文件的索引数据,进而根据所读取的索引数据,从第一存储区域所存储的合并文件中读取各小文件,从而避免现有文件管理方法存在的数据读取性能低下的问题,在节约存储空间的同时还能够提高云存储系统的数据读取性能。In a file management method provided by an embodiment of the present invention, after obtaining a merged file to be written, the merged file is stored in the first storage area of one of the storage servers in the cloud storage system, and the information for the merged file is obtained. The index data of each small file is stored in the second storage area of the storage server. Since the data read and write performance of the second storage area of the storage server is higher than the data read and write performance of the first storage area, the When reading small files, the high read-write performance of the second storage area of the storage server can be used to read the index data of each small file more quickly, and then according to the read index data, the merged data stored in the first storage area can be merged. Each small file is read from the file, thereby avoiding the problem of low data reading performance existing in the existing file management method, and can also improve the data reading performance of the cloud storage system while saving storage space.

Method Embodiment 4

本发明实施例还提供了一种文件管理方法,在上述任一实施例的基础上,如图8所示,该方法可以包括以下步骤:An embodiment of the present invention also provides a file management method. On the basis of any of the foregoing embodiments, as shown in FIG. 8 , the method may include the following steps:

S401: Read, from the second storage area, the index data of the small file that is to be read from the first storage area.

根据前述实施例可知,第一存储区域存储有合并文件,第二存储区域存储有小文件的索引数据,因此,本发明实施例中,对于待从第一存储区域读取的某一个小文件,可以先从第二存储区域读取该小文件的索引数据,以获取该小文件的存储位置信息。According to the foregoing embodiments, the first storage area stores merged files, and the second storage area stores index data of small files. Therefore, in this embodiment of the present invention, for a certain small file to be read from the first storage area, The index data of the small file may be read from the second storage area first to obtain the storage location information of the small file.

作为本发明实施例一种可选的实施方式,可以从第二存储介质对应的存储区域中,获得待从第一存储介质对应的存储区域中读取的小文件的索引数据,第二存储介质可以为SSD。As an optional implementation of the embodiment of the present invention, the index data of the small file to be read from the storage area corresponding to the first storage medium may be obtained from the storage area corresponding to the second storage medium, and the second storage medium Can be SSD.

S402,根据索引数据,定位小文件的数据在第一存储区域的存储位置。S402, according to the index data, locate the storage location of the data of the small file in the first storage area.

在读取索引数据后,即可根据其中的存储位置信息,定位小文件的数据在第一存储区域的存储位置。After reading the index data, the storage location of the data of the small file in the first storage area can be located according to the storage location information therein.

具体地,根据前述实施例可知,索引数据中可以记录有存储空间块的标识号,因此,当确定存储空间块的标识号后,即可根据标识号定位小文件的文件数据存储在哪个存储空间块对应的数据块中。Specifically, according to the foregoing embodiments, the index data may record the identification number of the storage space block. Therefore, after determining the identification number of the storage space block, it is possible to locate the storage space in which the file data of the small file is stored according to the identification number. in the corresponding data block of the block.

作为本发明实施例一种可选的实施方式,可以从第一存储介质对应的存储区域所存储的合并文件中,读取小文件,可选的,第一存储介质可以为机械硬盘。As an optional implementation manner of the embodiment of the present invention, a small file may be read from a combined file stored in a storage area corresponding to the first storage medium. Optionally, the first storage medium may be a mechanical hard disk.

S403,根据所定位的存储位置,从第一存储区域所存储的合并文件中读取小文件。S403, according to the located storage location, read the small file from the merged file stored in the first storage area.

The merged file stored in the first storage area is a file obtained by merging multiple small files, so the merged file contains the file data of each small file. Once the data block (that is, the one corresponding to a particular storage space block) that holds the file data of the small file has been located, the file data of the small file can be read from that data block; in other words, the small file is read from the merged file stored in the first storage area.
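Steps S401 to S403 can be sketched as a lookup followed by a ranged read. The text states that the index data records the identification number of the storage space block; the offset and length fields used here, the `dat` file name, and the function name `read_small_file` are assumptions added so that the example is self-contained.

```python
import tempfile
from pathlib import Path


def read_small_file(name: str, index: dict, slow_root: Path) -> bytes:
    """Locate a small file through its index entry and read it from the merged file."""
    # S401: look up the index entry (held in the faster storage area).
    block_id, offset, length = index[name]
    # S402: the block identifier locates the data block in the slower area.
    merged_path = slow_root / block_id / "dat"
    # S403: read only the byte range that belongs to this small file.
    with open(merged_path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# A temporary directory stands in for the first storage area.
slow_root = Path(tempfile.mkdtemp(prefix="hdd_"))
block_id = "7091065".zfill(20)
(slow_root / block_id).mkdir()
(slow_root / block_id / "dat").write_bytes(b"helloabc")

# Hypothetical index entries: name -> (block id, offset, length).
index = {"a.txt": (block_id, 0, 5), "b.txt": (block_id, 5, 3)}
```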

本发明实施例提供的一种文件管理方法,通过将合并文件存储至第一存储区域,将索引数据存储至第二存储区域,由于第二存储区域的数据读写性能高于第一存储区域的数据读写性能,因此在读取小文件时,能够利用第二存储区域的高读写性能更加快速地读取各小文件的索引数据,进而根据所读取的索引数据,从第一存储区域所存储的合并文件中读取各小文件,从而避免现有文件管理方法存在的数据读取性能低下的问题,在节约存储空间的同时还能够提高云存储系统的数据读取性能。In a file management method provided by an embodiment of the present invention, the combined file is stored in the first storage area, and the index data is stored in the second storage area. Since the data read and write performance of the second storage area is higher than that of the first storage area Data read and write performance, so when reading small files, the high read and write performance of the second storage area can be used to read the index data of each small file more quickly, and then according to the read index data, from the first storage area Each small file is read from the stored merged file, thereby avoiding the problem of low data reading performance existing in the existing file management method, and can also improve the data reading performance of the cloud storage system while saving storage space.

相应于上面的方法实施例,本发明实施例还提供了相应的装置实施例。Corresponding to the above method embodiments, the embodiments of the present invention further provide corresponding apparatus embodiments.

Apparatus Embodiment 1

As shown in FIG. 9, an embodiment of the present invention provides a file management apparatus that can be applied to a management server in a cloud storage system. The management server manages multiple storage servers in the cloud storage system, and the storage servers store data. A cloud storage system may include multiple storage clusters, each provided with one management server and several storage servers that the management server manages in a unified manner. By way of example, after data to be written is obtained, the management server may control the storage servers to store that data. The apparatus includes:

获得模块501,用于获得待写入的合并文件,合并文件为多个小文件经合并后得到的文件,小文件为大小低于预设阈值的文件。The obtaining module 501 is configured to obtain a merged file to be written, where the merged file is a file obtained by merging multiple small files, and the small file is a file whose size is lower than a preset threshold.

本发明实施例中的合并文件,可以是指将多个小文件经合并处理后得到的一个大文件,可以理解,所得到的合并文件的大小相比于各小文件更大,由于可以以一个文件的形式进行存储,因此能够避免一个小文件存不满一个存储空间块的情况发生,即,避免存储空间浪费的问题。The merged file in the embodiment of the present invention may refer to a large file obtained by merging multiple small files. It can be understood that the size of the obtained merged file is larger than that of each small file. It is stored in the form of a file, so it can avoid the situation that a small file does not fit into one storage space block, that is, avoid the problem of wasting storage space.

并且,合并文件中的各小文件的数据可以是按顺序保存至合并文件中的,因此更有利于传统机械硬盘读写(传统机械硬盘的随机读取性能远低于连续读取性能)。In addition, the data of each small file in the merged file can be stored in the merged file in sequence, which is more favorable for reading and writing of traditional mechanical hard disks (the random read performance of traditional mechanical hard disks is much lower than the continuous reading performance).

并且,由于此时合并文件还没有存储至存储服务器中,因此可以将上述合并文件称为待写入的合并文件。Moreover, since the merged file has not been stored in the storage server at this time, the aforementioned merged file may be referred to as the merged file to be written.

其中的小文件,可以是指大小低于预设阈值的文件,例如,数据大小低于2KB(Kilobyte,千字节)的文件。当然,云存储系统的开发人员可以根据实际业务需求灵活合理地设置上述阈值,本发明实施例对上述阈值的具体数值不做限定。将多个小文件合并为一个合并文件的过程可以通过现有的文件合并方法得到,其具体过程本发明实施例不再赘述。The small file may refer to a file whose size is lower than a preset threshold, for example, a file whose data size is lower than 2KB (Kilobyte, kilobyte). Of course, the developer of the cloud storage system can set the above threshold flexibly and reasonably according to actual business requirements, and the specific value of the above threshold is not limited in this embodiment of the present invention. The process of merging multiple small files into one merged file can be obtained through an existing file merging method, and the specific process thereof will not be repeated in this embodiment of the present invention.
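The text leaves the merging procedure to existing methods. One simple possibility, shown here only as an illustration, is to append each small file to a buffer in order and record where it landed. The 2 KB threshold comes from the example above; the function name `merge_small_files` and the (offset, length) index layout are assumptions.

```python
SMALL_FILE_THRESHOLD = 2 * 1024  # 2 KB, the example threshold mentioned in the text


def merge_small_files(files: dict, threshold: int = SMALL_FILE_THRESHOLD):
    """Concatenate files smaller than threshold and record each one's position.

    Returns (merged bytes, index of name -> (offset, length), names skipped
    because they are not below the threshold).
    """
    merged = bytearray()
    index = {}
    skipped = []
    for name, data in files.items():
        if len(data) >= threshold:
            skipped.append(name)  # not a small file; handled elsewhere
            continue
        index[name] = (len(merged), len(data))
        merged.extend(data)  # sequential layout suits mechanical hard disks
    return bytes(merged), index, skipped


merged, index, skipped = merge_small_files(
    {"a": b"xx", "b": b"yyy", "big": b"z" * 4096}
)
```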

第一存储模块502,用于将合并文件存储至第一存储区域,并获得针对合并文件中各小文件的索引数据,其中,每条索引数据中均携带一个小文件的存储位置信息。The first storage module 502 is configured to store the merged file in the first storage area, and obtain index data for each small file in the merged file, wherein each piece of index data carries storage location information of a small file.

A storage cluster may be provided with different storage areas, and different storage areas may have different data read/write performance. Therefore, in this embodiment the merged file may be stored in the first storage area, and after it is stored, the index data of each small file in the merged file is obtained. Each piece of index data carries the storage location information of one small file; that is, one piece of index data corresponds to the storage location information of one small file. The storage location information records where the small file is stored in the first storage area, for example the identification number of the storage space block that holds it, and is used to locate the small file when it is read.

上述得到各小文件的索引数据的过程,可以通过现有的文件索引信息生成方法得到,本发明实施例不再赘述。The above process of obtaining the index data of each small file may be obtained through an existing method for generating file index information, which is not repeated in this embodiment of the present invention.

第二存储模块503,用于将索引数据存储至第二存储区域,其中,第二存储区域的数据读写性能高于第一存储区域的数据读写性能。The second storage module 503 is configured to store the index data in a second storage area, wherein the data read/write performance of the second storage area is higher than the data read/write performance of the first storage area.

在得到索引数据后,本发明实施例可以将索引数据存储至第二存储区域,第二存储区域是与第一存储区域不同的区域,并且,第二存储区域的数据读写性能高于第一存储区域的数据读写性能。通过将索引数据存储到读写性能更高的第二存储区域中,这样在读取大量小文件时,可以利用第二存储区域的高读写性能,更加快速地读取各小文件的索引数据,进而根据所读取的索引数据进一步从合并文件中读取各小文件,能够避免由于传统机械硬盘性能所限导致的数据读取性能低下的问题。另外,由于索引数据相对于文件数据所需存储空间更小,因此,仅将索引数据存储在数据读写性能高的第二存储区域可以节约成本。After the index data is obtained, the embodiment of the present invention may store the index data in a second storage area, where the second storage area is a different area from the first storage area, and the data read/write performance of the second storage area is higher than that of the first storage area Data read and write performance of the storage area. By storing the index data in the second storage area with higher read and write performance, when reading a large number of small files, the high read and write performance of the second storage area can be used to read the index data of each small file more quickly , and further read each small file from the merged file according to the read index data, which can avoid the problem of low data reading performance due to the performance limitation of the traditional mechanical hard disk. In addition, since the index data requires less storage space than the file data, only storing the index data in the second storage area with high data read/write performance can save costs.

作为本发明实施例一种可选的实施方式,第一存储模块,具体可以用于:As an optional implementation manner of the embodiment of the present invention, the first storage module can be specifically used for:

将合并文件存储至第一存储介质对应的存储区域中;storing the merged file in the storage area corresponding to the first storage medium;

第二存储模块,具体可以用于:The second storage module can be specifically used for:

将索引数据存储至第二存储介质对应的存储区域中,其中,第二存储介质的数据读写性能高于第一存储介质的数据读写性能。The index data is stored in the storage area corresponding to the second storage medium, wherein the data read and write performance of the second storage medium is higher than the data read and write performance of the first storage medium.

作为本发明实施例一种可选的实施方式,第一存储介质为机械硬盘,第二存储介质为固态硬盘SSD。As an optional implementation manner of the embodiment of the present invention, the first storage medium is a mechanical hard disk, and the second storage medium is a solid-state disk (SSD).

作为本发明实施例一种可选的实施方式,第一存储模块,具体可以用于:As an optional implementation manner of the embodiment of the present invention, the first storage module can be specifically used for:

将合并文件存储至第一存储服务器的第一存储区域,第一存储服务器为多台存储服务器中的其中一台;storing the merged file in a first storage area of a first storage server, where the first storage server is one of multiple storage servers;

第二存储模块,具体可以用于:The second storage module can be specifically used for:

将索引数据存储至第一存储服务器的第二存储区域。The index data is stored in the second storage area of the first storage server.

As an optional implementation of this embodiment, the first storage area and the second storage area are divided into storage space blocks of the same size, and each storage space block is used to store one data block. A data block includes multiple storage directories and a storage location designation directory. The storage directories are used to store data, and a preset value is recorded in the storage location designation directory, with different preset values designating different storage directories in which data is to be stored. As shown in FIG. 10, the first storage module may include:

第一存储子模块5021,用于如果合并文件的大小不大于存储空间块的大小,则将合并文件存储至第一存储区域的其中一个存储空间块所对应数据块的指定存储目录下。The first storage sub-module 5021 is configured to store the combined file in the specified storage directory of the data block corresponding to one of the storage space blocks in the first storage area if the size of the combined file is not greater than the size of the storage space block.

第二存储子模块5022,用于如果合并文件的大小大于存储空间块的大小,则将合并文件存储至第一存储区域的多个存储空间块各自对应的数据块的指定存储目录下。The second storage sub-module 5022 is configured to store the combined file in the specified storage directory of the data blocks corresponding to the multiple storage space blocks in the first storage area if the size of the combined file is greater than the size of the storage space block.

本发明实施例提供的一种文件管理装置,在获得待写入的合并文件后,通过将合并文件存储至第一存储区域,并获得针对合并文件中各小文件的索引数据,再将索引数据存储至第二存储区域,由于第二存储区域的数据读写性能高于第一存储区域的数据读写性能,因此在读取小文件时,能够利用第二存储区域的高读写性能更加快速地读取各小文件的索引数据,进而根据所读取的索引数据,从第一存储区域所存储的合并文件中读取各小文件,从而避免现有文件管理方法存在的数据读取性能低下的问题,在节约存储空间的同时还能够提高云存储系统的数据读取性能。In a file management device provided by an embodiment of the present invention, after obtaining a merged file to be written, the merged file is stored in a first storage area, and index data for each small file in the merged file is obtained, and then the index data is stored in the merged file. Store to the second storage area. Since the data read and write performance of the second storage area is higher than that of the first storage area, when reading small files, the high read and write performance of the second storage area can be used faster. The index data of each small file is read, and then according to the read index data, each small file is read from the merged file stored in the first storage area, thereby avoiding the low data reading performance of the existing file management method. In addition to saving storage space, it can also improve the data read performance of the cloud storage system.

Apparatus Embodiment 2

本发明实施例还提供了一种文件管理装置,在图9所示实施例的基础上,如图11所示,还可以包括:An embodiment of the present invention further provides a file management device, which, based on the embodiment shown in FIG. 9 , as shown in FIG. 11 , may further include:

第一读取模块601,用于从第二存储区域,读取待从第一存储区域读取的小文件的索引数据。The first reading module 601 is configured to read, from the second storage area, the index data of the small file to be read from the first storage area.

定位模块602,用于根据索引数据,定位小文件的数据在第一存储区域的存储位置。The positioning module 602 is configured to locate the storage position of the data of the small file in the first storage area according to the index data.

第二读取模块603,用于根据所定位的存储位置,从第一存储区域所存储的合并文件中读取小文件。The second reading module 603 is configured to read the small file from the combined file stored in the first storage area according to the located storage location.

As an optional implementation of this embodiment of the present invention, the first reading module may be specifically configured to:

obtain, from the storage area corresponding to the second storage medium, the index data of the small file to be read from the storage area corresponding to the first storage medium;

The second reading module may be specifically configured to:

read the small file from the merged file stored in the storage area corresponding to the first storage medium.

With the file management device provided by this embodiment of the present invention, the merged file is stored in the first storage area and the index data is stored in the second storage area. Because the data read/write performance of the second storage area is higher than that of the first storage area, the index data of each small file can be read more quickly when a small file is read, and each small file is then read from the merged file stored in the first storage area according to that index data. This avoids the low data read performance of existing file management methods and improves the data read performance of the cloud storage system while saving storage space.
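The read path handled by modules 601 to 603 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patent's implementation: the `IndexEntry` record, the function names, the in-memory dictionary standing in for the second storage area, and the temporary directory standing in for the first storage area are all hypothetical.

```python
import os
import tempfile
from typing import Dict, NamedTuple, Tuple


class IndexEntry(NamedTuple):
    """Hypothetical index record: where one small file sits inside a merged file."""
    merged_path: str
    offset: int
    length: int


def read_index(second_area: Dict[str, IndexEntry], name: str) -> IndexEntry:
    """Role of the first reading module: fetch the index record from the fast area."""
    return second_area[name]


def locate(entry: IndexEntry) -> Tuple[int, int]:
    """Role of the positioning module: turn an index record into a byte range."""
    return entry.offset, entry.length


def read_small_file(entry: IndexEntry) -> bytes:
    """Role of the second reading module: read only the located byte range."""
    offset, length = locate(entry)
    with open(entry.merged_path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def demo() -> bytes:
    # A temporary directory stands in for the first (slower) storage area.
    with tempfile.TemporaryDirectory() as first_area:
        merged_path = os.path.join(first_area, "merged.bin")
        with open(merged_path, "wb") as f:
            f.write(b"alphabetagamma")  # three small files packed back to back
        # A plain dict stands in for the second (faster) storage area.
        second_area = {
            "a.txt": IndexEntry(merged_path, 0, 5),
            "b.txt": IndexEntry(merged_path, 5, 4),
            "c.txt": IndexEntry(merged_path, 9, 5),
        }
        return read_small_file(read_index(second_area, "b.txt"))
```

Calling `demo()` looks up `b.txt` in the index and reads only its four bytes from the merged file, so the slower area is touched once, at a known offset.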

An embodiment of the present invention further provides an electronic device, which may specifically be a server. As shown in FIG. 12, the device 700 includes a processor 701 and a machine-readable storage medium 702. The machine-readable storage medium stores machine-executable instructions that can be executed by the processor, and the processor executes the machine-executable instructions to implement the following steps:

obtaining a merged file to be written, where the merged file is a file obtained by merging multiple small files, and a small file is a file whose size is below a preset threshold;

storing the merged file in a first storage area and obtaining index data for each small file in the merged file, where each piece of index data carries the storage location information of one small file;

storing the index data in a second storage area, where the data read/write performance of the second storage area is higher than that of the first storage area.
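The three steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the claimed implementation: the threshold value, the function names, the `(path, offset, length)` index layout, the dictionary used as the second storage area, and the temporary directory used as the first storage area are all hypothetical, and the merge itself is included only so the example is self-contained.

```python
import os
import tempfile
from typing import Dict, Tuple

SMALL_FILE_THRESHOLD = 1024  # bytes; hypothetical value for the preset threshold


def merge_small_files(files: Dict[str, bytes]) -> Tuple[bytes, Dict[str, Tuple[int, int]]]:
    """Concatenate files below the threshold and record (offset, length) for each."""
    merged = bytearray()
    index: Dict[str, Tuple[int, int]] = {}
    for name, data in files.items():
        if len(data) >= SMALL_FILE_THRESHOLD:
            continue  # not a small file; it would be stored by some other path
        index[name] = (len(merged), len(data))
        merged += data
    return bytes(merged), index


def write_merged(first_area_dir: str,
                 second_area: Dict[str, Tuple[str, int, int]],
                 merged_name: str,
                 files: Dict[str, bytes]) -> str:
    """Store the merged file in the first area, then its index data in the second."""
    merged, index = merge_small_files(files)
    path = os.path.join(first_area_dir, merged_name)
    with open(path, "wb") as f:
        f.write(merged)
    # Each index record carries the storage location of exactly one small file.
    for name, (offset, length) in index.items():
        second_area[name] = (path, offset, length)
    return path


def demo() -> Tuple[int, Dict[str, Tuple[int, int]]]:
    second_area: Dict[str, Tuple[str, int, int]] = {}
    with tempfile.TemporaryDirectory() as first_area:
        path = write_merged(
            first_area, second_area, "merged-0001.bin",
            {"a.txt": b"hello", "b.txt": b"world!", "big.bin": b"x" * 2048},
        )
        size = os.path.getsize(path)
    # Drop the temporary path so the result only shows (offset, length).
    return size, {name: rec[1:] for name, rec in second_area.items()}
```

In this sketch `big.bin` exceeds the threshold and is skipped, so the merged file holds only the two small files and the index holds one record per small file.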

The machine-readable storage medium may include random access memory (RAM), and may also include non-volatile memory, for example at least one disk storage device. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.

The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

With the electronic device provided by this embodiment of the present invention, after a merged file to be written is obtained, the merged file is stored in the first storage area, index data for each small file in the merged file is obtained, and the index data is then stored in the second storage area. Because the data read/write performance of the second storage area is higher than that of the first storage area, the index data of each small file can be read more quickly when a small file is read, and each small file is then read from the merged file stored in the first storage area according to that index data. This avoids the low data read performance of existing file management methods and improves the data read performance of the cloud storage system while saving storage space.

An embodiment of the present invention further provides a computer-readable storage medium in which a computer program is stored. When executed by a processor, the computer program performs the following steps:

obtaining a merged file to be written, where the merged file is a file obtained by merging multiple small files, and a small file is a file whose size is below a preset threshold;

storing the merged file in a first storage area and obtaining index data for each small file in the merged file, where each piece of index data carries the storage location information of one small file;

storing the index data in a second storage area, where the data read/write performance of the second storage area is higher than that of the first storage area.

With the computer-readable storage medium provided by this embodiment of the present invention, after a merged file to be written is obtained, the merged file is stored in the first storage area, index data for each small file in the merged file is obtained, and the index data is then stored in the second storage area. Because the data read/write performance of the second storage area is higher than that of the first storage area, the index data of each small file can be read more quickly when a small file is read, and each small file is then read from the merged file stored in the first storage area according to that index data. This avoids the low data read performance of existing file management methods and improves the data read performance of the cloud storage system while saving storage space.

An embodiment of the present invention further provides a computer program product containing instructions which, when run on a computer, causes the computer to perform the following steps:

obtaining a merged file to be written, where the merged file is a file obtained by merging multiple small files, and a small file is a file whose size is below a preset threshold;

storing the merged file in a first storage area and obtaining index data for each small file in the merged file, where each piece of index data carries the storage location information of one small file;

storing the index data in a second storage area, where the data read/write performance of the second storage area is higher than that of the first storage area.

With the computer program product containing instructions provided by this embodiment of the present invention, after a merged file to be written is obtained, the merged file is stored in the first storage area, index data for each small file in the merged file is obtained, and the index data is then stored in the second storage area. Because the data read/write performance of the second storage area is higher than that of the first storage area, the index data of each small file can be read more quickly when a small file is read, and each small file is then read from the merged file stored in the first storage area according to that index data. This avoids the low data read performance of existing file management methods and improves the data read performance of the cloud storage system while saving storage space.

An embodiment of the present invention further provides a computer program which, when run on a computer, causes the computer to perform the following steps:

obtaining a merged file to be written, where the merged file is a file obtained by merging multiple small files, and a small file is a file whose size is below a preset threshold;

storing the merged file in a first storage area and obtaining index data for each small file in the merged file, where each piece of index data carries the storage location information of one small file;

storing the index data in a second storage area, where the data read/write performance of the second storage area is higher than that of the first storage area.

With the computer program containing instructions provided by this embodiment of the present invention, after a merged file to be written is obtained, the merged file is stored in the first storage area, index data for each small file in the merged file is obtained, and the index data is then stored in the second storage area. Because the data read/write performance of the second storage area is higher than that of the first storage area, the index data of each small file can be read more quickly when a small file is read, and each small file is then read from the merged file stored in the first storage area according to that index data. This avoids the low data read performance of existing file management methods and improves the data read performance of the cloud storage system while saving storage space.

Because the device, electronic device, and storage medium embodiments are basically similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the corresponding description of the method embodiments.

It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not preclude the presence of additional identical elements in the process, method, article, or device that includes that element.

The embodiments in this specification are described in a related manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiments are basically similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the corresponding description of the method embodiments.

The above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention falls within the scope of protection of the present invention.

Claims (16)

Application CN201910411298.8A, priority date 2019-05-16, filing date 2019-05-16: File management method and device, electronic equipment and storage medium. Status: Active. Publication: CN110147203B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201910411298.8A, CN110147203B (en) | 2019-05-16 | 2019-05-16 | File management method and device, electronic equipment and storage medium

Publications (2)

Publication Number | Publication Date
CN110147203A (en) | 2019-08-20
CN110147203B (en) | 2022-11-04

Family

ID=67595693

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201910411298.8A (Active, CN110147203B (en)) | File management method and device, electronic equipment and storage medium | 2019-05-16 | 2019-05-16

Country Status (1)

Country | Link
CN (1) | CN110147203B (en)

Patent Citations (22)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US5454103A (en) * | 1993-02-01 | 1995-09-26 | Lsc, Inc. | Method and apparatus for file storage allocation for secondary storage using large and small file blocks
CA2596434A1 (en) * | 2006-08-25 | 2008-02-25 | Dan Dodge | File system having variable logical storage block size
US20130111182A1 (en) * | 2011-10-26 | 2013-05-02 | International Business Machines Corporation | Storing a small file with a reduced storage and memory footprint
US9286261B1 (en) * | 2011-11-14 | 2016-03-15 | Emc Corporation | Architecture and method for a burst buffer using flash technology
CN104572670A (en) * | 2013-10-15 | 2015-04-29 | 方正国际软件(北京)有限公司 | Small file storage, query and deletion method and system
CN103577123A (en) * | 2013-11-12 | 2014-02-12 | 河海大学 | Small file optimization storage method based on HDFS
CN103605726A (en) * | 2013-11-15 | 2014-02-26 | 中安消技术有限公司 | Method and system for accessing small files, control node and storage node
CN103678579A (en) * | 2013-12-12 | 2014-03-26 | 浪潮电子信息产业股份有限公司 | Optimizing method for small-file storage efficiency
CN104536959A (en) * | 2014-10-16 | 2015-04-22 | 南京邮电大学 | Optimized method for accessing lots of small files for Hadoop
CN104462563A (en) * | 2014-12-26 | 2015-03-25 | 浙江宇视科技有限公司 | File storage method and system
CN105095421A (en) * | 2015-07-14 | 2015-11-25 | 南京国电南自美卓控制系统有限公司 | Distributed storage method for real-time database
CN105069048A (en) * | 2015-07-23 | 2015-11-18 | 东方网力科技股份有限公司 | Small file storage method, query method and device
CN105138571A (en) * | 2015-07-24 | 2015-12-09 | 四川长虹电器股份有限公司 | Distributed file system and method for storing lots of small files
CN104991747A (en) * | 2015-07-30 | 2015-10-21 | 湖南亿谷科技发展股份有限公司 | Method and system for data management
CN105868286A (en) * | 2016-03-23 | 2016-08-17 | 中国科学院计算技术研究所 | Parallel adding method and system for merging small files on basis of distributed file system
CN105956183A (en) * | 2016-05-30 | 2016-09-21 | 广东电网有限责任公司电力调度控制中心 | Method and system for multi-stage optimization storage of a lot of small files in distributed database
CN107247714A (en) * | 2016-06-01 | 2017-10-13 | 国家电网公司 | A kind of small documents access system and method based on distributed storage technology
CN106021585A (en) * | 2016-06-02 | 2016-10-12 | 同济大学 | Traffic incident video access method and system based on time-space characteristics
CN106294603A (en) * | 2016-07-29 | 2017-01-04 | 北京奇虎科技有限公司 | File memory method and device
CN107766374A (en) * | 2016-08-19 | 2018-03-06 | 上海凯翔信息科技有限公司 | The optimization method and system that a kind of mass small documents storage is read
CN106775446A (en) * | 2016-11-11 | 2017-05-31 | 中国人民解放军国防科学技术大学 | Based on the distributed file system small documents access method that solid state hard disc accelerates
CN108234594A (en) * | 2017-11-28 | 2018-06-29 | 北京市商汤科技开发有限公司 | File memory method and device, electronic equipment, program and medium

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN113495681A (en) * | 2020-04-07 | 2021-10-12 | 杭州萤石软件有限公司 | NAND FLASH file data access method, device and storage medium
US12182434B2 (en) | 2020-04-07 | 2024-12-31 | Hangzhou Ezviz Software Co., Ltd. | Method and apparatus for data access of NAND FLASH file, and storage medium
WO2022083287A1 (en) * | 2020-10-20 | 2022-04-28 | 百果园技术(新加坡)有限公司 | Storage space management method and apparatus, device, and storage medium
CN112416880A (en) * | 2021-01-22 | 2021-02-26 | 南京群顶科技有限公司 | Method and device for optimizing storage performance of mass small files based on real-time merging
CN113420025A (en) * | 2021-06-11 | 2021-09-21 | 广联达科技股份有限公司 | Component data processing method and device and electronic equipment
CN114218161A (en) * | 2021-12-29 | 2022-03-22 | 北京百度网讯科技有限公司 | Index storage method and device, retrieval engine, electronic equipment and storage medium
CN114218161B (en) * | 2021-12-29 | 2025-10-17 | 北京百度网讯科技有限公司 | Index storage method, index storage device, search engine, electronic equipment and storage medium
CN117632039A (en) * | 2024-01-25 | 2024-03-01 | 合肥兆芯电子有限公司 | Memory management method, memory storage device and memory control circuit unit
CN117632039B (en) * | 2024-01-25 | 2024-05-03 | 合肥兆芯电子有限公司 | Memory management method, memory storage device and memory control circuit unit

Also Published As

Publication number | Publication date
CN110147203B (en) | 2022-11-04

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
