Movatterモバイル変換

[0]ホーム

Jump to content

GFS2

Edit links

From Wikipedia, the free encyclopedia

Shared-disk file system for Linux computer clusters

"Global File System" redirects here. For the general concept, seeGlobal file system.

GFS2
Structures
Developer(s)	Red Hat
Full name	Global File System 2
Introduced	2005 withLinux 2.6.19
Directory contents	Hashed (small directories stuffed into inode)
File allocation	bitmap (resource groups)
Bad blocks	No
Limits
Maxno. of files	Variable
Max filename length	255 bytes
Allowed filename characters	All exceptNUL
Features
Dates recorded	attribute modification (ctime), modification (mtime), access (atime)
Date resolution	Nanosecond
Attributes	No-atime, journaled data (regular files only), inherit journaled data (directories only), synchronous-write, append-only, immutable, exhash (dirs only, read only)
File system permissions	Unix permissions,ACLs and arbitrary security attributes
Transparent compression	No
Transparent encryption	No
Data deduplication	across nodes only
Other
Supported operating systems	Linux

GFS
Structures
Developer(s)	Red Hat (formerly,Sistina Software)
Full name	Global File System
Introduced	1996 withIRIX (1996),Linux (1997)
Directory contents	Hashed (small directories stuffed into inode)
File allocation	bitmap (resource groups)
Bad blocks	No
Limits
Maxno. of files	Variable
Max filename length	255 bytes
Allowed filename characters	All except NUL
Features
Dates recorded	attribute modification (ctime), modification (mtime), access (atime)
Date resolution	1s
Attributes	No-atime, journaled data (regular files only), inherit journaled data (directories only), synchronous-write, append-only, immutable, exhash (dirs only, read only)
File system permissions	Unix permissions,ACLs
Transparent compression	No
Transparent encryption	No
Data deduplication	across nodes only
Other
Supported operating systems	IRIX (now obsolete),FreeBSD (now obsolete),Linux

Incomputing, theGlobal File System 2 (GFS2) is ashared-disk file system forLinux computer clusters. GFS2 allows all members of a cluster to have direct concurrent access to the same sharedblock storage, in contrast todistributed file systems which distribute data throughout the cluster. GFS2 can also be used as a local file system on a single computer.

GFS2 has no disconnected operating-mode, and no client or server roles. All nodes in a GFS2 cluster function as peers. Using GFS2 in a cluster requireshardware to allow access to the shared storage, and a lock manager to control access to the storage.The lock manager operates as a separate module: thus GFS2 can use thedistributed lock manager (DLM) forcluster configurations and the "nolock" lock manager for local filesystems. Older versions of GFS also support GULM, a server-based lock manager which implements redundancy via failover.

GFS and GFS2 arefree software, distributed under the terms of theGNU General Public License.^[1]^[2]

History

[edit]

Development of GFS began in 1995 and was originally developed byUniversity of Minnesota professor Matthew O'Keefe and a group of students.^[3] It was originally written forSGI'sIRIX operating system, but in 1998 it was ported toLinux (2.4)^[4] since theopen source code provided a more convenient development platform. In late 1999/early 2000 it made its way toSistina Software, where it lived for a time as anopen-source project. In 2001, Sistina made the choice to make GFS a proprietary product.

Developers forked OpenGFS from the last public release of GFS and then further enhanced it to include updates allowing it to work with OpenDLM. But OpenGFS and OpenDLM became defunct, sinceRed Hat purchased Sistina in December 2003 and released GFS and many cluster-infrastructure pieces under theGPL in late June 2004.

Red Hat subsequently financed further development geared towards bug-fixing and stabilization. A further development,GFS2^[5]^[6] derives from GFS and was included along with itsdistributed lock manager (shared with GFS) in Linux 2.6.19. Red Hat Enterprise Linux 5.2 included GFS2 as a kernel module for evaluation purposes. With the 5.3 update, GFS2 became part of the kernel package.

GFS2 forms part of theFedora,Red Hat Enterprise Linux and associatedCentOS Linux distributions. Users can purchase commercial support to run GFS2 fully supported on top ofRed Hat Enterprise Linux. As of Red Hat Enterprise Linux 8.3, GFS2 is supported incloud computing environments in which shared storage devices are available.^[7]

The following list summarizes some version numbers and major features introduced:

v1.0 (1996)SGI IRIX only
v3.0 Linux port
v4journaling
v5 Redundant lock manager
v6.1 (2005)Distributed lock manager
Linux 2.6.19 -GFS2 and DLM merged into Linux kernel
Red Hat Enterprise Linux 5.3 releases the first fully supported GFS2

Hardware

[edit]

The design of GFS and of GFS2 targetsstorage area network (SAN)-like environments. Although it is possible to use them as a single node filesystem, the full feature-set requires a SAN. This can take the form ofiSCSI,Fibre Channel,ATA over Ethernet (AoE), or any other device which can be presented underLinux as a block device shared by a number of nodes, for example aDRBD device.

Thedistributed lock manager (DLM) requires anIP based network over which to communicate. This is normally justEthernet, but again, there are many other possible solutions. Depending upon the choice of SAN, it may be possible to combine this, but normal practice^{[citation needed]} involves separate networks for the DLM and storage.

The GFS requires afencing mechanism of some kind. This is a requirement of the cluster infrastructure, rather than GFS/GFS2 itself, but it is required for all multi-node clusters. The usual options include power switches and remote access controllers (e.g.DRAC,IPMI, orILO). Virtual and hypervisor-based fencing mechanisms can also be used.Fencing is used to ensure that a node which the cluster believes to be failed cannot suddenly start working again while another node is recovering the journal for the failed node. It can also optionally restart the failed node automatically once the recovery is complete.

Differences from a local filesystem

[edit]

Although the designers of GFS/GFS2 aimed to emulate a local filesystem closely, there are a number of differences to be aware of. Some of these are due to the existing filesystem interfaces not allowing the passing of information relating to the cluster. Some stem from the difficulty of implementing those features efficiently in a clustered manner. For example:

Theflock() system call on GFS/GFS2 is not interruptible bysignals.
Thefcntl() F_GETLK system call returns a PID of any blocking lock. Since this is a cluster filesystem, that PID might refer to a process on any of the nodes which have the filesystem mounted. Since the purpose of this interface is to allow a signal to be sent to the blocking process, this is no longer possible.
Leases are not supported with the lock_dlm (cluster) lock module, but they are supported when used as a local filesystem
dnotify will work on a "same node" basis, but its use with GFS/GFS2 is not recommended
inotify will also work on a "same node" basis, and is also not recommended (but it may become supported in the future)
splice is supported on GFS2 only

The other main difference, and one that is shared by all similar cluster filesystems, is that the cache control mechanism, known as glocks (pronounced Gee-locks) for GFS/GFS2, has an effect across the whole cluster. Eachinode on the filesystem has two glocks associated with it. One (called the iopen glock) keeps track of which processes have the inode open. The other (the inode glock) controls the cache relating to that inode. A glock has four states, UN (unlocked), SH (shared – a read lock), DF (deferred – a read lock incompatible with SH) and EX (exclusive). Each of the four modes maps directly to aDLM lock mode.

When in EX mode, an inode is allowed to cache data andmetadata (which might be "dirty", i.e. waiting for write back to the filesystem). In SH mode, the inode can cache data and metadata, but it must not be dirty. In DF mode, the inode is allowed to cache metadata only, and again it must not be dirty. The DF mode is used only for direct I/O. In UN mode, the inode must not cache any metadata.

In order that operations which change an inode's data or metadata do not interfere with each other, an EX lock is used. This means that certain operations, such as create/unlink of files from thesame directory and writes to thesame file should be, in general, restricted to one node in the cluster. Of course, doing these operations from multiple nodes will work as expected, but due to the requirement to flush caches frequently, it will not be very efficient.

The single most frequently asked question about GFS/GFS2 performance is why the performance can be poor with email servers. The solution is to break up the mail spool into separate directories and to try to keep (so far as is possible) each node reading and writing to a private set of directories.

Journaling

[edit]

GFS and GFS2 are bothjournaled file systems; and GFS2 supports a similar set of journaling modes asext3. Indata=writeback mode, only metadata is journaled. This is the only mode supported by GFS, however it is possible to turn on journaling on individual data-files, but only when they are of zero size. Journaled files in GFS have a number of restrictions placed upon them, such asno support for themmap or sendfile system calls, they also use a different on-disk format from regular files. There is also an "inherit-journal" attribute which when set on a directory causes all files (and sub-directories) created within that directory to have the journal (or inherit-journal, respectively) flag set. This can be used instead of thedata=journal mount option whichext3 supports (and GFS/GFS2 does not).

GFS2 also supportsdata=ordered mode which is similar todata=writeback except that dirty data is synced before each journal flush is completed. This ensures that blocks which have been added to an inode will have their content synced back to disk before the metadata is updated to record the new size and thus prevents uninitialised blocks appearing in a file under node failure conditions. The default journaling mode isdata=ordered, to matchext3's default.

As of 2010^[update], GFS2 does not yet supportdata=journal mode, but it does (unlike GFS) use the same on-disk format for both regular and journaled files, and it also supports the same journaled and inherit-journal attributes. GFS2 also relaxes the restrictions on when a file may have its journaled attribute changed to any time that the file is not open (also the same asext3).

For performance reasons, each node in GFS and GFS2 has its own journal. In GFS the journals are disk extents, in GFS2 the journals are just regular files. The number of nodes which may mount the filesystem at any one time is limited by the number of available journals.

Features of GFS2 compared with GFS

[edit]

GFS2 adds a number of new features which are not in GFS. Here is a summary of those features not already mentioned in the boxes to the right of this page:

The metadata filesystem (really a different root) – seeCompatibility and the GFS2 meta filesystem below
GFS2 specific trace points have been available since kernel 2.6.32
The XFS-style quota interface has been available in GFS2 since kernel 2.6.33
Caching ACLs have been available in GFS2 since 2.6.33
GFS2 supports the generation of "discard" requests for thin provisioning/SCSI TRIM requests
GFS2 supports I/O barriers (on by default, assuming underlying device supports it. Configurable from kernel 2.6.33 and up)
FIEMAP ioctl (to query mappings of inodes on disk)
Splice (system call) support
mmap/splice support for journaled files (enabled by using the same on disk format as for regular files)
Far fewer tunables (making set-up less complicated)
Ordered write mode (as per ext3, GFS only has writeback mode)

Compatibility and the GFS2 meta filesystem

[edit]

GFS2 was designed so that upgrading from GFS would be a simple procedure. To this end, most of the on-disk structure has remained the same as GFS, including thebig-endian byte ordering. There are a few differences though:

GFS2 has a "meta filesystem" through which processes access system files
GFS2 uses the same on-disk format for journaled files as for regular files
GFS2 uses regular (system) files for journals, whereas GFS uses special extents
GFS2 has some other "per_node" system files
The layout of theinode is (very slightly) different
The layout of indirect blocks differs slightly

The journaling systems of GFS and GFS2 are not compatible with each other. Upgrading is possible by means of a tool (gfs2_convert) which is run with the filesystem off-line to update the metadata. Some spare blocks in the GFS journals are used to create the (very small)per_node files required by GFS2 during the update process. Most of the data remains in place.

The GFS2 "meta filesystem" is not a filesystem in its own right, but an alternateroot of the main filesystem. Although it behaves like a "normal" filesystem, its contents are the various system files used by GFS2, and normally users do not need to ever look at it. The GFS2 utilitiesmount and unmount the meta filesystem as required, behind the scenes.

References

[edit]

^Teigland, David (29 June 2004)."Symmetric Cluster Architecture and Component Technical Specifications"(PDF). Red Hat Inc. Retrieved2007-08-03.{{cite journal}}:Cite journal requires|journal= (help)
^Soltis, Steven R.; Erickson, Grant M.; Preslan, Kenneth W. (1997)."The Global File System: A File System for Shared Disk Storage"(PDF).IEEE Transactions on Parallel and Distributed Systems. Archived fromthe original(PDF) on 2004-04-15.
^OpenGFS Data sharing with a GFS storage cluster
^Daniel Robbins (2001-09-01)."Common threads: Advanced filesystem implementor's guide, Part 3". IBM DeveloperWorks. Archived fromthe original on 2012-02-03. Retrieved2013-02-15.
^Whitehouse, Steven (27–30 June 2007)."The GFS2 Filesystem"(PDF).Proceedings of the Linux Symposium 2007.Ottawa, Ontario, Canada. pp. 253–259.
^Whitehouse, Steven (13–17 July 2009)."Testing and verification of cluster filesystems"(PDF).Proceedings of the Linux Symposium 2009.Montreal, Quebec, Canada. pp. 311–317.
^"Bringing Red Hat Resilient Storage to the public cloud".www.redhat.com. Retrieved19 February 2021.

External links

[edit]

Red HatRed Hat Enterprise Linux 6 - Global File System 2
Red HatCluster Suite and GFS Documentation Page
GFS Project Page Archived 2006-03-15 at theWayback Machine
OpenGFS Project Page (obsolete)
The GFS2 development git tree
The GFS2 utilities development git tree

v t e Red Hat
Major products	Enterprise Linux Cluster Suite GFS2 WildFly Virtual Machine Manager Ceph Satellite Simple Protocol for Independent Computing Environments Ansible CentOS
Services	Network Virtualization OpenShift
Projects	Fedora Project FreeOTP
Defunct	Red Hat Linux Mugshot Spacewalk
People	Jim Whitehurst Matthew Szulik Bob Young Marc Ewing Michael Tiemann Paul Cormier
Mergers and acquisitions	C2Net Cygnus Solutions Gluster Inktank Storage MetaMatrix Qumranet
Related	IBM Fedora Linux Certification Program Enterprise Linux derivatives Tower

File systems

Disk and
non-rotating

ADFS
AdvFS
Amiga FFS
Amiga OFS
APFS
AthFS
bcachefs
BFS
- Be File System
- Boot File System
- Byte File System (z/VM)
Btrfs
CVFS
CXFS
DFS
EFS
- Encrypting File System
- Extent File System
Episode
ext
- ext2
- ext3
- ext3cow
- ext4
FAT
- exFAT
Files-11
Fossil
GPFS
HAMMER
- HAMMER2
HFS (Classic Mac OS)
HFS (MVS)
HFS+
HPFS
HTFS
JFS
LFS
MFS
- Macintosh File System
- TiVo Media File System
MINIX
NetWare File System
Next3
NILFS
- NILFS2
NSS
NTFS
OneFS
OpenZFS
PFS
QFS
QNX4FS
ReFS
ReiserFS
- Reiser4
Reliance
Reliance Nitro
RFS
SFS
- Shared File System (VM)
- Smart File System
SNFS
Soup (Apple)
Tux3
UBIFS
UFS/UFS2
- soft updates
- WAPBL
VxFS
WAFL
Xiafs
XFS
Xsan
zFS (z/OS)
ZFS (Sun)

Optical disc

Flash memory andSSD

host-sidewear leveling	CHFS JFFS JFFS2 LogFS NILFS NILFS2 YAFFS UBIFS

Distributed parallel

NAS

Specialized

Aufs AXFS Boot File System Compact Disc File System cramfs Davfs2 EROFS FTPFS FUSE Lnfs LTFS NOVA MVFS SquashFS UMSDOS OverlayFS UnionFS
Pseudo	configfs devfs debugfs kernfs procfs specfs sysfs tmpfs WinFS
Encrypted	eCryptfs EncFS EFS Rubberhose SSHFS ZFS

Types

Features

Case preservation Copy-on-write Data deduplication Data scrubbing Execute in place Extent File attribute Extended file attributes File change log Fork Inode Links Hard Symbolic
Access control	Access-control list Filesystem-level encryption Permissions Modes Sticky bit