CN104111804A


Info

Publication number: CN104111804A
Application number: CN201410295985.5A
Authority: CN
Inventors: 官全龙; 胡舜; 罗伟其; 翁健
Original assignee: Jinan University
Current assignee: Jinan University
Priority date: 2014-06-27
Filing date: 2014-06-27
Publication date: 2014-10-22
Anticipated expiration: 2034-06-27
Also published as: CN104111804B

Abstract

The invention relates to a distributed file system comprising the following components:
- Large file storage servers store the data blocks into which large files are split. A large file is a file larger than a preset size.
- Large file metadata management servers store the metadata of large files and the mapping information of the large file data blocks on the large file storage servers. They also manage the large file namespace and process user requests.
- Cache servers store small files and their metadata, and cache some frequently accessed large files. A small file is a file smaller than or equal to the preset size.

Large files are stored in blocks on the large file storage servers, and small files are stored on the cache servers. By storing the two kinds of files separately, the invention effectively improves read and write efficiency for both.

Description


A Distributed File System

Technical Field

The present invention relates to the field of computer storage technology, and more specifically to a distributed file system.

Background

As cloud computing spreads and matures, more and more users store personal or enterprise data in the cloud. This data includes both large files and small files. It has three characteristics: large volume, reads that are more frequent than writes, and a need for fast retrieval.

At present, the file systems used by cloud service providers fall into two main categories: network file systems (Network File System, NFS) and distributed file systems, of which the Hadoop Distributed File System (HDFS) is representative.

In a network file system, the cloud service provider creates a virtual partition on a server and allocates a portion of its disk space to the user for file storage. Each time the user reads or writes a file, the user must first log in to the remote virtual server and then perform the operation on the virtual disk. The drawback of this type of system is that all user data is stored on a single server. A server failure therefore seriously disrupts users' normal operations.

In a distributed file system, the service provider uses a cluster of servers to store data jointly. To read or write a file, the user sends a request, and back-end servers process it and return the result.

The most widely used distributed file system today is HDFS. It has two main defects:
- It cannot store large numbers of small files efficiently.
- A single NameNode performs all global management.

Later researchers proposed various file systems to address these defects, but each has shortcomings of its own.

TFS (Taobao File System) was designed for storing massive numbers of small files. It merges many small files into one large file stored on a data server. Compared with HDFS, this brings no significant improvement: it merely packs small files into a large data block on the data server and adds a standby name server. The standby name server does not participate directly in request processing. It takes over only when the primary name server goes down. This design has two weaknesses:
- The name server bears the main responsibility for handling user requests, yet its storage space is fixed. As data volume grows, its performance becomes the bottleneck limiting TFS.
- When a serious name server failure causes data loss, the standby name server must synchronize data from the name server while also responding to user requests, which overloads it.

The MapR file system stores file data blocks and metadata together on its nodes, which removes the single-name-server bottleneck. However, it also stores large and small files together, which wastes storage resources and makes management inconvenient.

Existing distributed file systems cannot store small files effectively and do not solve the problems of a single management node. User file data is diverse and varies widely in size. The storage efficiency of the cloud-side file system is therefore critical, and it directly affects how quickly the system responds to and recovers from failures. A well-designed distributed file system that can quickly recover from failures during file storage is of great significance and practical value.

Summary of the Invention

To overcome at least one of the defects (shortcomings) of the prior art described above, the present invention provides a distributed file system that can store small files effectively.

To solve the above technical problems, the present invention adopts the following technical solution:

A distributed file system, comprising:

a large file storage server, configured to store the data blocks into which large files are split, wherein a large file is a file larger than a preset size;

a large file metadata management server, configured to:
- store the metadata of large files;
- store the mapping information of the large file data blocks on the large file storage server;
- manage the namespace of large files; and
- process user requests; and

a cache server, configured to store small files and their metadata and to cache some frequently accessed large files, wherein a small file is a file smaller than or equal to the preset size.

In the above solution, the system includes:
- a plurality of large file storage servers;
- at least three large file metadata management servers; and
- at least three cache servers.

In the above solution, the at least three large file metadata management servers work adaptively together. They store the large file metadata and the mapping information of the large file data blocks on the large file storage servers, and they share the processing of user requests;

the at least three cache servers use an adaptive, dynamically adjusted approach to store data and process user requests.

In the above solution, every large file's metadata is stored on at least two large file metadata management servers. The same holds for the mapping information of its data blocks on the large file storage servers.

In the above solution, each cache server is provided with three areas:
- a metadata storage area, which stores the metadata of small files and of the large files cached on that server;
- a small file storage area, which stores small files; and
- a large file cache area, which caches some frequently accessed large files.

In the above solution, each cache server is provided with a counter that implements a large file access classification mechanism, as follows: each time a user requests to read or write a large file through the cache server, the access count of that large file is incremented by 1;

an access count threshold is set;

a large file whose access count exceeds the threshold is called a frequently accessed large file;

the cache server sorts the frequently accessed large files by access count in descending order.
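The classification steps above can be sketched as a per-server counter. This is a minimal illustration only, not the patent's implementation. The class name `LargeFileAccessCounter`, the string file identifiers and the tie-breaking by file name are assumptions. Following the text, "higher than the threshold" is taken as strictly greater.

```python
from __future__ import annotations

from collections import Counter


class LargeFileAccessCounter:
    """Hypothetical per-cache-server counter for large-file accesses."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold  # access count threshold set by the administrator
        self.counts = Counter()     # file_id -> number of accesses through this cache server

    def record_access(self, file_id: str) -> None:
        # Each read or write request for a large file through this server adds 1.
        self.counts[file_id] += 1

    def frequently_accessed(self) -> list[str]:
        # Keep files whose count is strictly above the threshold,
        # sorted from most to least accessed (ties broken by file id).
        hot = [(fid, n) for fid, n in self.counts.items() if n > self.threshold]
        hot.sort(key=lambda item: (-item[1], item[0]))
        return [fid for fid, _ in hot]
```

For example, with a threshold of 2, accesses `a, b, a, c, a, b, b, b` give counts a=3, b=4 and c=1. The frequently accessed list is therefore `["b", "a"]`.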

In the above solution, the cache server caches some frequently accessed large files as follows:

when the large file cache area of the cache server has enough space, the cache server adds the new large file directly to the large file cache area and adds its metadata to the metadata storage area;

when the large file cache area lacks space and the cache server needs to add a new frequently accessed large file, it repeatedly deletes the frequently accessed large file with the lowest access count until there is enough space. It then adds the new large file to the large file cache area.
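The two cases above amount to a least-accessed eviction policy. The sketch below is illustrative only. The class name `LargeFileCache`, byte-based capacity and the rejection of files larger than the whole cache area are assumptions not stated in the text. The text also does not say whether a new file should displace a cached file with a higher access count, so this sketch always admits it.

```python
from __future__ import annotations


class LargeFileCache:
    """Hypothetical large file cache area that evicts the least-accessed file first."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity              # bytes available in the large file cache area
        self.files: dict[str, int] = {}       # file_id -> size in bytes
        self.metadata: dict[str, dict] = {}   # stands in for entries in the metadata storage area
        self.access: dict[str, int] = {}      # file_id -> access count

    def used(self) -> int:
        return sum(self.files.values())

    def add(self, file_id: str, size: int, access_count: int) -> bool:
        if size > self.capacity:
            return False  # can never fit; the file stays only on the large file storage servers
        # Evict the cached file with the lowest access count until the new file fits.
        while self.capacity - self.used() < size:
            victim = min(self.files, key=lambda f: self.access[f])
            del self.files[victim], self.metadata[victim], self.access[victim]
        # Add the file and its metadata entry.
        self.files[file_id] = size
        self.metadata[file_id] = {"size": size}
        self.access[file_id] = access_count
        return True
```

Take a 100-byte cache that already holds `a` (40 bytes, 5 accesses) and `b` (40 bytes, 3 accesses). Adding `c` (50 bytes) evicts `b`, leaving `a` and `c`.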

In the above solution, the cache server stores small file metadata permanently and stores small files permanently in log form. It keeps the metadata of frequently accessed large files only while those files are cached, updating it as the cache contents change.

In the above solution, when one of the large file metadata management servers fails, the system immediately redirects user requests to the other large file metadata management servers until the failed server returns to normal;

if the recovered large file metadata management server is empty, the other large file metadata management servers resynchronize it. They copy to it the large file metadata and large file data block mapping information that it held before the failure.

In the above solution, when one of the cache servers fails, the system immediately redirects user requests to the other cache servers until the failed cache server returns to normal;

if the recovered cache server is empty, the other cache servers resynchronize it. They copy to it the small files and small file metadata that it held before the failure.

Compared with the prior art, the technical solution of the present invention has the following beneficial effects:

(1) The distributed file system of the present invention stores large and small files separately. Large files are stored in blocks on the large file storage servers, while small files are stored on the cache servers.
- Small files: the user accesses a cache server directly and performs the operation there. This is far more efficient than the conventional approach of first querying a metadata management server and then accessing a data storage server.
- Large files: the user first accesses a large file metadata management server to obtain the block location information, then accesses the corresponding large file storage servers.

The system thus stores both large and small files effectively and improves file read and write efficiency.

(2) The system of the present invention uses at least three cache servers and at least three large file metadata management servers, and the servers at each tier are interconnected. This removes the bottleneck of a traditional single management server. When many users access a small amount of data at the same time, the system balances load across the servers through adaptive, dynamic adjustment. No server is left with a workload beyond its processing and storage capacity, so none goes down for that reason. This effectively solves the problems caused by a single management node.

(3) When a large file metadata management server and/or a cache server fails, the system promptly redirects user requests to other servers of the same type. Users' normal operations on their stored files are therefore not affected. Data lost because of the failure is restored from the other servers through a synchronization mechanism, which effectively improves failure recovery efficiency.

Description of Drawings

FIG. 1 is a schematic diagram of the connections between the large file storage servers and the large file metadata management servers in a specific embodiment of the distributed file system of the present invention.

FIG. 2 is a schematic diagram of the connections between the large file metadata management servers and the cache servers in a specific embodiment of the distributed file system of the present invention.

FIG. 3 is a schematic diagram of the internal partitions of a cache server in a specific embodiment of the distributed file system of the present invention.

Detailed Description of Embodiments

The drawings are for illustrative purposes only and shall not be construed as limiting this patent;

to better illustrate this embodiment, some components in the drawings are omitted, enlarged, or reduced, and they do not represent the dimensions of an actual product;

those skilled in the art will understand that some well-known structures and their descriptions may be omitted from the drawings.

In the description of the present invention, the terms "first" and "second" are used for descriptive purposes only. They shall not be construed as indicating or implying relative importance, or as implicitly indicating the number of the technical features referred to. Features defined as "first" or "second" may therefore explicitly or implicitly include one or more such features. Unless otherwise specified, "a plurality of" means two or more.

In the description of the present invention, unless otherwise expressly specified and limited, the terms "installed" and "connected" shall be understood broadly. A connection may be any of the following:
- a fixed, detachable, or integral connection;
- a mechanical or electrical connection;
- a direct connection, or an indirect connection through an intermediate medium;
- internal communication between two elements.

Those of ordinary skill in the art can understand the specific meanings of these terms in the present invention according to the specific circumstances.

The technical solutions of the present invention are further described below with reference to the accompanying drawings and embodiments.

Embodiment 1

FIGS. 1 and 2 are architecture diagrams of a specific embodiment of the distributed file system of the present invention. Referring to FIGS. 1 and 2, the distributed file system of this embodiment comprises large file storage servers, large file metadata management servers and cache servers, connected in that order.

The cache servers face the users: they receive user requests, process them, and return the results. When a cache server cannot handle a request, it typically forwards the request to a large file metadata management server for processing. The result is returned to the user through the cache server. In this case the cache server does not process the request or the result in any way and acts purely as a forwarding device.
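The request path described above can be sketched in memory as follows. This is a simplification under stated assumptions, not the patent's protocol. The class and field names and the dictionary-based "servers" are hypothetical. The text leaves open which component fetches the blocks after the metadata lookup. Here the cache server reassembles them itself while passing the bytes through unchanged.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StorageServer:
    blocks: dict[str, bytes] = field(default_factory=dict)  # block_id -> block bytes


@dataclass
class MetadataServer:
    # file_id -> ordered list of (storage_server_id, block_id)
    block_map: dict[str, list[tuple[int, str]]] = field(default_factory=dict)


@dataclass
class CacheServer:
    small_files: dict[str, bytes] = field(default_factory=dict)  # small file storage area
    large_cache: dict[str, bytes] = field(default_factory=dict)  # large file cache area

    def read(self, file_id: str, mds: MetadataServer,
             storage: dict[int, StorageServer]) -> bytes:
        # Served locally: small files and cached, frequently accessed large files.
        if file_id in self.small_files:
            return self.small_files[file_id]
        if file_id in self.large_cache:
            return self.large_cache[file_id]
        # Otherwise forward: the metadata server resolves the block locations,
        # and the blocks are fetched in order and concatenated.
        locations = mds.block_map[file_id]
        return b"".join(storage[sid].blocks[bid] for sid, bid in locations)
```

In use, a small file such as `note.txt` is answered directly from the cache server. A large file such as `video.mp4`, split across storage servers 0 and 1, costs one metadata lookup plus one fetch per block.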

The large file storage servers store the data blocks into which large files are split, a large file being a file larger than a preset size.

The preset size is usually a threshold specified by the administrator. In practice, and by reference to other file systems, it is typically 1 MB. A file larger than the threshold is a large file, and a file smaller than or equal to it is a small file. Large file data blocks usually have a fixed size, which the administrator can also preset.

After storing data, a large file storage server usually sends the metadata of the stored large file data blocks to the large file metadata management servers;

the large file metadata management servers perform four tasks:
- store and manage the mapping of the large file data blocks stored on the large file storage servers;
- store the metadata of those data blocks;
- process user requests;
- manage the file namespace.
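The size classification and fixed-size block splitting above can be sketched in a few lines. The 1 MB threshold comes from the text. `BLOCK_SIZE` is a hypothetical administrator setting chosen only for illustration, and the function names are assumptions.

```python
from __future__ import annotations

THRESHOLD = 1 * 1024 * 1024   # preset size; the text cites 1 MB as typical
BLOCK_SIZE = 4 * 1024 * 1024  # hypothetical fixed block size chosen by the administrator


def is_large(size_in_bytes: int) -> bool:
    # Strictly greater than the threshold -> large file; equal or smaller -> small file.
    return size_in_bytes > THRESHOLD


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    # Fixed-size blocks; only the last block may be shorter.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]
```

Under these assumed sizes, a 10 MB file is classified as large and split into three blocks (4 MB, 4 MB and 2 MB). A file of exactly 1 MB stays a small file on a cache server.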

With the scheme of this embodiment, the distributed file system stores large files in blocks on the large file storage servers and small files on the cache servers. Small files are accessed through the cache servers. Large files are accessed through the large file metadata management servers and the large file storage servers. Storing large and small files separately has two benefits:
- Small files are stored effectively.
- Unlike prior-art methods that store large and small files together, it saves storage resources and greatly simplifies file management.

In a specific implementation, this embodiment solves the single-management-node problem by equipping the distributed file system with:
- a plurality of large file storage servers;
- at least three large file metadata management servers, all interconnected;
- at least three cache servers, also all interconnected.

In the architecture shown in FIGS. 1 and 2, the system has 12 large file storage servers, 3 large file metadata management servers and 3 cache servers. Servers of the same type usually have the same performance, while servers of different types may differ.

The at least three large file metadata management servers work together in an adaptive, dynamically adjusted manner. They store the large file metadata and the mapping information of large file data blocks on the large file storage servers, and they share the processing of user requests. Specifically:
- Servers with relatively higher performance and storage capacity store more data block metadata and mapping information and take on more user requests.
- Servers with relatively lower performance and storage capacity store less and take on fewer requests.


Likewise, the at least three cache servers store data and process user requests in an adaptive, dynamically adjusted manner. Cache servers with relatively higher performance and storage capacity store more data and handle more user requests. Those with relatively lower performance and storage capacity store less data and handle fewer requests.

With this scheme, the system balances load across servers through adaptive, dynamic adjustment when storing both large and small files. Data storage and request processing are thus divided reasonably between servers of higher and lower performance and storage capacity. This removes the bottleneck of a traditional single management server. It also keeps any server from going down because its workload exceeds its processing and storage capacity.
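The text does not specify the adjustment algorithm. As a stand-in, the sketch below uses weighted least-loaded assignment. Each unit of work (a request, or a piece of metadata to store) goes to the server whose current load divided by its capacity weight is smallest. A server with twice the weight then ends up with about twice the work. The function name and the numeric weights are assumptions. "Dynamic adjustment" corresponds to changing the weights between rounds.

```python
from __future__ import annotations


def assign(tasks: int, weights: dict[str, float]) -> dict[str, int]:
    """Hypothetical weighted least-loaded assignment of `tasks` work units."""
    load = {server: 0 for server in weights}
    for _ in range(tasks):
        # Pick the server with the lowest load relative to its capacity weight;
        # ties are broken by server name so the result is deterministic.
        target = min(weights, key=lambda s: (load[s] / weights[s], s))
        load[target] += 1
    return load
```

For example, `assign(6, {"mds1": 2.0, "mds2": 1.0})` gives `{"mds1": 4, "mds2": 2}`, a 2:1 split that matches the weights.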

In a specific implementation, this embodiment keeps user operations working during server failures by applying redundant backup across the large file metadata management servers. Every large file's metadata, and the mapping information of its data blocks on the large file storage servers, is stored on at least two large file metadata management servers. If one of them fails, the information can still be obtained from another.

In a specific implementation, as shown in FIG. 3, each cache server is divided into three areas:
- The metadata storage area holds the metadata of the small files stored on the cache server and of the large files cached on it. Small file metadata is kept permanently unless the small file is deleted by the user or lost to a system failure. Large file metadata is deleted and replaced as the cache is updated.
- The small file storage area stores small files in log form. This data is also kept permanently unless it is deleted by the user or lost to a system failure.
- The large file cache area stores large files with high user access counts. This data is replaced as the cache is updated rather than kept permanently.
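The text says only that small files are stored "in log form". One common reading is an append-only log plus an index of (offset, length) entries, sketched here with an in-memory buffer. The class name, the index layout and the choice not to reclaim space after deletion are all assumptions made for illustration.

```python
from __future__ import annotations

import io


class SmallFileArea:
    """Hypothetical append-only log for small files plus a metadata index."""

    def __init__(self) -> None:
        self.log = io.BytesIO()                      # small file storage area (the log)
        self.index: dict[str, tuple[int, int]] = {}  # metadata area: name -> (offset, length)

    def put(self, name: str, data: bytes) -> None:
        offset = self.log.seek(0, io.SEEK_END)  # always append; never overwrite in place
        self.log.write(data)
        self.index[name] = (offset, len(data))  # the newest record for a name wins

    def get(self, name: str) -> bytes:
        offset, length = self.index[name]
        self.log.seek(offset)
        return self.log.read(length)

    def delete(self, name: str) -> None:
        # Entries persist until the user deletes the file; the old log bytes are
        # left in place (this sketch does no compaction).
        del self.index[name]
```

Rewriting `a.txt` appends a new record and repoints the index entry. The earlier bytes stay in the log until some compaction step, which this sketch does not model.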

In a specific implementation, a counter is set on each cache server to implement the large file access classification mechanism, as follows:
- Each time a user accesses a large file through the cache server, that file's access count is incremented by 1.
- An access count threshold is preset for large files.
- A large file whose access count exceeds the threshold is called a frequently accessed large file.
- The cache server sorts frequently accessed large files by access count in descending order.

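In a preferred implementation, the cache server caches some frequently accessed large files in the manner described in the summary above:
- When the large file cache area has enough space, the new large file is added directly and its metadata is added to the metadata storage area.
- Otherwise, the cached frequently accessed large file with the lowest access count is deleted, repeatedly, until there is enough space. The new large file is then added.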

In the above solution, the users of the distributed file system of the present invention may be any users. They are typically users with no cache or only a very small cache area, such as wearable smart devices like smart watches.

In a specific implementation, the distributed file system of the present invention provides a failure recovery mechanism. It covers two cases, large file metadata management server failure and cache server failure:

when one of the large file metadata management servers fails, the system immediately redirects user requests to the other large file metadata management servers until the failed server returns to normal;

when one of the cache servers fails, the system immediately redirects user requests to the other cache servers until the failed cache server returns to normal.
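The redirection rule can be sketched as picking a healthy server that holds a replica of the user's data. The placement table below reproduces the cache-server assignment given in the FIG. 2 example. The function name and the "lowest-numbered healthy replica" choice are assumptions. A real deployment would more likely combine this with the load balancing described earlier.

```python
from __future__ import annotations

# Small-file placement from the FIG. 2 example: cache server -> users whose small files it stores.
CACHE_PLACEMENT: dict[int, set[int]] = {
    1: {1, 2, 3, 4, 5, 6, 9, 10},
    2: {1, 2, 5, 6, 7, 8, 11, 12},
    3: {3, 4, 7, 8, 9, 10, 11, 12},
}


def route_request(user: int, healthy: set[int],
                  placement: dict[int, set[int]] = CACHE_PLACEMENT) -> int:
    """Return the lowest-numbered healthy cache server that holds this user's small files."""
    candidates = [s for s in sorted(placement) if s in healthy and user in placement[s]]
    if not candidates:
        raise RuntimeError(f"no healthy cache server holds user {user}'s data")
    return candidates[0]
```

With all servers up, user 1 is served by cache server 1. If cache server 1 fails (`healthy={2, 3}`), user 1 is redirected to cache server 2 and user 9 to cache server 3.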

The distributed file system of the present invention is described in detail below with a concrete example.

As shown in FIG. 1, the distributed file system comprises 12 large file storage servers (1-12), 3 cache servers (1-3) and 3 large file metadata management servers (1-3). Each metadata management server stores the metadata of the information held on a subset of the storage servers:
- Server 1: large file storage servers 1, 2, 3, 4, 5, 6, 9 and 10.
- Server 2: large file storage servers 1, 2, 5, 6, 7, 8, 11 and 12.
- Server 3: large file storage servers 3, 4, 7, 8, 9, 10, 11 and 12.

Each large file storage server's metadata is therefore held by exactly two metadata management servers;

As shown in FIG. 2, assume the system stores the small files of 12 users. The three cache servers store each user's small files and their metadata as follows:
- Cache server 1: users 1, 2, 3, 4, 5, 6, 9 and 10.
- Cache server 2: users 1, 2, 5, 6, 7, 8, 11 and 12.
- Cache server 3: users 3, 4, 7, 8, 9, 10, 11 and 12.

In addition, each of the three cache servers sets aside an area for caching large files that users access frequently.

As shown in FIG. 2, cache server failure recovery proceeds as follows:
1. When cache server 1 fails, user requests are distributed to cache servers 2 and 3.
2. Suppose cache server 1 has lost its data for some reason when it recovers.
3. Cache server 2 then synchronizes the small files and metadata of users 1, 2, 5 and 6 to cache server 1.
4. Cache server 3 synchronizes those of users 3, 4, 9 and 10.

As shown in FIG. 1, large file metadata management server failure recovery proceeds as follows:
1. When metadata management server 1 fails, user requests are distributed to metadata management servers 2 and 3.
2. Suppose metadata management server 1 has lost its data for some reason when it recovers.
3. Metadata management server 2 then synchronizes the data mappings and metadata of large file storage servers 1, 2, 5 and 6 to server 1.
4. Metadata management server 3 synchronizes those of large file storage servers 3, 4, 9 and 10.
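The recovery assignments above can be derived mechanically from the placement table. The sketch checks that every item is replicated at least twice. It then builds a sync plan in which each item of the failed server is sent by exactly one surviving peer that holds it. The placement data is taken from the FIG. 1 example. The function names and the "lowest-numbered peer" rule are assumptions, which happen to reproduce the split described in the text.

```python
from __future__ import annotations

# FIG. 1 example: metadata management server -> storage servers whose metadata it holds.
MDS_PLACEMENT: dict[int, set[int]] = {
    1: {1, 2, 3, 4, 5, 6, 9, 10},
    2: {1, 2, 5, 6, 7, 8, 11, 12},
    3: {3, 4, 7, 8, 9, 10, 11, 12},
}


def replica_counts(placement: dict[int, set[int]]) -> dict[int, int]:
    # How many servers hold each item (the text requires at least 2).
    counts: dict[int, int] = {}
    for items in placement.values():
        for item in items:
            counts[item] = counts.get(item, 0) + 1
    return counts


def recovery_plan(failed: int, placement: dict[int, set[int]]) -> dict[int, set[int]]:
    # For each item the failed server held, pick the lowest-numbered surviving peer holding it.
    plan: dict[int, set[int]] = {peer: set() for peer in placement if peer != failed}
    for item in sorted(placement[failed]):
        source = next((p for p in sorted(plan) if item in placement[p]), None)
        if source is None:
            raise RuntimeError(f"item {item} has no surviving replica")
        plan[source].add(item)
    return plan
```

`recovery_plan(1, MDS_PLACEMENT)` yields `{2: {1, 2, 5, 6}, 3: {3, 4, 9, 10}}`, matching the recovery described above. The same functions apply unchanged to the cache-server placement of FIG. 2, reproducing the user sets given there.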

相同或相似的标号对应相同或相似的部件；The same or similar reference numerals correspond to the same or similar components;

附图中描述位置关系的用语仅用于示例性说明，不能理解为对本专利的限制；Terms describing positional relationships in the drawings are for illustrative purposes only and shall not be construed as limiting this patent;

显然，本发明的上述实施例仅仅是为清楚地说明本发明所作的举例，而并非是对本发明的实施方式的限定。对于所属领域的普通技术人员来说，在上述说明的基础上还可以做出其它不同形式的变化或变动。这里无需也无法对所有的实施方式予以穷举。凡在本发明的精神和原则之内所作的任何修改、等同替换和改进等，均应包含在本发明权利要求的保护范围之内。Clearly, the above embodiments of the present invention are merely examples given to illustrate the invention clearly and do not limit its implementations. Those of ordinary skill in the art can make other variations or modifications in different forms on the basis of the above description. It is neither necessary nor possible to list every implementation exhaustively here. Any modification, equivalent replacement, improvement or the like made within the spirit and principles of the present invention shall fall within the protection scope of the claims of the present invention.

Claims


1.一种分布式文件系统，其特征在于，包括：1. A distributed file system, characterized in that it comprises:

2.根据权利要求1所述的分布式文件系统，其特征在于，所述大文件存储服务器包括若干台，大文件元数据管理服务器包括至少三台，缓存服务器包括至少三台。2. The distributed file system according to claim 1, wherein there are a plurality of large file storage servers, at least three large file metadata management servers, and at least three cache servers.

3.根据权利要求2所述的分布式文件系统，其特征在于，至少三台大文件元数据管理服务器之间采用自适应、动态调整的方式存储大文件元数据和大文件存储服务器上大文件数据块的映射信息以及承担用户请求的处理任务；3. The distributed file system according to claim 2, wherein the at least three large file metadata management servers store the large file metadata and the mapping information of large file data blocks on the large file storage servers, and share the processing of user requests, in an adaptive, dynamically adjusted manner;

4.根据权利要求1所述的分布式文件系统，其特征在于，任何大文件元数据及大文件存储服务器上大文件数据块的映射信息存储在至少2台大文件元数据管理服务器上。4. The distributed file system according to claim 1, wherein the metadata of any large file, and the mapping information of its data blocks on the large file storage servers, are stored on at least two large file metadata management servers.

5.根据权利要求4所述的分布式文件系统，其特征在于，缓存服务器上设置有用于存储小文件元数据以及缓存服务器上所存储大文件的元数据的元数据保存区、用于存储小文件的小文件保存区以及用于缓存部分访问量大的大文件的大文件缓存区。5. The distributed file system according to claim 4, wherein each cache server is provided with a metadata storage area for storing small-file metadata and the metadata of large files stored on that cache server, a small-file storage area for storing small files, and a large-file cache area for caching frequently accessed large files.

6.根据权利要求5所述的分布式文件系统，其特征在于，缓存服务器中设有计数器，用于实现大文件访问分类机制，具体实现过程为：当用户通过该缓存服务器请求读写某个大文件时，该大文件访问量加1；6. The distributed file system according to claim 5, wherein each cache server is provided with a counter that implements a large-file access classification mechanism, as follows: each time a user requests to read or write a given large file through that cache server, the access count of that large file is incremented by 1;

设置访问量阈值；An access-count threshold is set;

7.根据权利要求6所述的分布式文件系统，其特征在于，缓存服务器中存储部分访问量大的大文件的存储方式为：7. The distributed file system according to claim 6, wherein the cache server stores frequently accessed large files as follows:

8.根据权利要求1至7任一项所述的分布式文件系统，其特征在于，所述缓存服务器以永久性方式保存小文件元数据，以日志形式永久存储小文件，以更新方式保存经常访问大文件元数据。8. The distributed file system according to any one of claims 1 to 7, wherein the cache server stores small-file metadata permanently, stores small files permanently in log form, and stores the metadata of frequently accessed large files in an updating manner.

9.根据权利要求2所述的分布式文件系统，其特征在于，当其中一台大文件元数据管理服务器故障后，系统立刻引导用户请求到其他大文件元数据管理服务器进行处理，直到故障的大文件元数据管理服务器恢复正常；9. The distributed file system according to claim 2, wherein when one of the large file metadata management servers fails, the system immediately redirects user requests to the other large file metadata management servers for processing until the failed large file metadata management server returns to normal;

10.根据权利要求2所述的分布式文件系统，其特征在于，当其中一台缓存服务器故障后，系统立刻引导用户请求到其他缓存服务器处理，直到故障的缓存服务器恢复正常；10. The distributed file system according to claim 2, wherein when one of the cache servers fails, the system immediately redirects user requests to the other cache servers for processing until the failed cache server returns to normal;