Movatterモバイル変換

[0]ホーム

Jump to content

Clustered file system

Edit links

From Wikipedia, the free encyclopedia

(Redirected fromDistributed file system)

Type of decentralized filesystem

Not to be confused withData cluster.

"Network filesystem" and "Parallel file system" redirect here. For the Sun NFS protocol, seeNetwork File System.

This articleneeds additional citations forverification. Please helpimprove this article byadding citations to reliable sources. Unsourced material may be challenged and removed.
Find sources: "Clustered file system" – news ·newspapers ·books ·scholar ·JSTOR(December 2015) (Learn how and when to remove this message)

Aclustered file system (CFS) is afile system which is shared by being simultaneouslymounted on multipleservers. There are several approaches toclustering, most of which do not employ a clustered file system (onlydirect attached storage for each node). Clustered file systems can provide features like location-independent addressing and redundancy which improve reliability or reduce the complexity of the other parts of the cluster.Parallel file systems are a type of clustered file system that spread data across multiple storage nodes, usually for redundancy or performance.^[1]

Shared-disk file system

[edit]

Ashared-disk file system uses astorage area network (SAN) to allow multiple computers to gain direct disk access at theblock level. Access control and translation from file-level operations that applications use to block-level operations used by the SAN must take place on the client node. The most common type of clustered file system, the shared-disk file system – by adding mechanisms forconcurrency control – provides a consistent andserializable view of the file system, avoiding corruption and unintendeddata loss even when multiple clients try to access the same files at the same time. Shared-disk file-systems commonly employ some sort offencing mechanism to prevent data corruption in case of node failures, because an unfenced device can cause data corruption if it loses communication with its sister nodes and tries to access the same information other nodes are accessing.

The underlying storage area network may use any of a number of block-level protocols, includingSCSI,iSCSI,HyperSCSI,ATA over Ethernet (AoE),Fibre Channel,network block device, andInfiniBand.

There are different architectural approaches to a shared-disk filesystem. Some distribute file information across all the servers in a cluster (fully distributed).^[2]

Examples

[edit]

Blue Whale Clustered file system (BWFS)
Silicon Graphics (SGI) clustered file system (CXFS)
Veritas Cluster File System
MicrosoftCluster Shared Volumes (CSV)
DataPlowNasan File System
IBM General Parallel File System (GPFS)
Oracle Cluster File System (OCFS)
OpenVMS Files-11 File System
PolyServe storage solutions
Quantum StorNext File System (SNFS), ex ADIC, ex CentraVision File System (CVFS)
Red HatGlobal File System (GFS2)
SunQFS
TerraScale Technologies TerraFS
Veritas CFS (Cluster FS: Clustered VxFS)
Versity VSM (SAM-QFS ported to Linux), ScoutFS
VMware VMFS
WekaFS
AppleXsan
DragonFly BSD HAMMER2

Distributed file systems

[edit]

Distributed file systems do not shareblock level access to the same storage but use a networkprotocol.^[3]^[4] These are commonly known as network file systems, even though they are not the only file systems that use the network to send data.^[5] Distributed file systems can restrict access to the file system depending onaccess lists orcapabilities on both the servers and the clients, depending on how the protocol is designed.

The difference between a distributed file system and adistributed data store is that a distributed file system allows files to be accessed using the same interfaces and semantics as local files – for example, mounting/unmounting, listing directories, read/write at byte boundaries, system's native permission model. Distributed data stores, by contrast, require using a different API or library and have different semantics (most often those of a database).^[6]

Design goals

[edit]

Distributed file systems may aim for "transparency" in a number of aspects. That is, they aim to be "invisible" to client programs, which "see" a system which is similar to a local file system. Behind the scenes, the distributed file system handles locating files, transporting data, and potentially providing other features listed below.

Access transparency: clients are unaware that files are distributed and can access them in the same way as local files are accessed.
Location transparency: a consistent namespace exists encompassing local as well as remote files. The name of a file does not give its location.
Concurrency transparency: all clients have the same view of the state of the file system. This means that if one process is modifying a file, any other processes on the same system or remote systems that are accessing the files will see the modifications in a coherent manner.
Failure transparency: the client and client programs should operate correctly after a server failure.
Heterogeneity: file service should be provided across different hardware and operating system platforms.
Scalability: the file system should work well in small environments (1 machine, a dozen machines) and also scale gracefully to bigger ones (hundreds through tens of thousands of systems).
Replication transparency: Clients should not have to be aware of the file replication performed across multiple servers to support scalability.
Migration transparency: files should be able to move between different servers without the client's knowledge.

History

[edit]

TheIncompatible Timesharing System used virtual devices for transparent inter-machine file system access in the 1960s. More file servers were developed in the 1970s. In 1976,Digital Equipment Corporation created theFile Access Listener (FAL), an implementation of theData Access Protocol as part ofDECnet Phase II which became the first widely used network file system. In 1984,Sun Microsystems created the file system called "Network File System" (NFS) which became the first widely usedInternet Protocol based network file system.^[4] Other notable network file systems areAndrew File System (AFS),Apple Filing Protocol (AFP),NetWare Core Protocol (NCP), andServer Message Block (SMB) which is also known as Common Internet File System (CIFS).

In 1986,IBM announced client and server support for Distributed Data Management Architecture (DDM) for theSystem/36,System/38, and IBM mainframe computers runningCICS. This was followed by the support forIBM Personal Computer,AS/400, IBM mainframe computers under theMVS andVSE operating systems, andFlexOS. DDM also became the foundation forDistributed Relational Database Architecture, also known as DRDA.

There are manypeer-to-peer network protocols for open-sourcedistributed file systems for cloud or closed-source clustered file systems, e. g.:9P,AFS,Coda,CIFS/SMB,DCE/DFS, WekaFS,^[7]Lustre, PanFS,^[8]Google File System,Mnet,Chord Project.

Examples

[edit]

Main article:List of distributed file systems

Alluxio
BeeGFS (Fraunhofer)
CephFS (Inktank, Red Hat, SUSE)
Windows Distributed File System (DFS) (Microsoft)
Infinit (acquired by Docker)
GfarmFS
GlusterFS (Red Hat)
GFS (Google Inc.)
GPFS (IBM)
HDFS (Apache Software Foundation)
IPFS (Inter Planetary File System)
iRODS
LizardFS (Skytechnology)
Lustre
MapR FS
MooseFS (Core Technology / Gemius)
ObjectiveFS
OneFS (EMC Isilon)
OrangeFS (Clemson University, Omnibond Systems), formerlyParallel Virtual File System
PanFS (Panasas)
Parallel Virtual File System (Clemson University, Argonne National Laboratory, Ohio Supercomputer Center)
RozoFS (Rozo Systems)
SMB/CIFS
Torus (CoreOS)
WekaFS (WekaIO)
XtreemFS

Network-attached storage

[edit]

Main article:Network-attached storage

Network-attached storage (NAS) provides both storage and a file system, like a shared disk file system on top of a storage area network (SAN). NAS typically uses file-based protocols (as opposed to block-based protocols a SAN would use) such asNFS (popular onUNIX systems), SMB/CIFS (Server Message Block/Common Internet File System) (used with MS Windows systems),AFP (used withApple Macintosh computers), orNCP (used withOES andNovell NetWare).

Design considerations

[edit]

Avoiding single point of failure

[edit]

The failure of disk hardware or a given storage node in a cluster can create asingle point of failure that can result indata loss or unavailability.Fault tolerance and high availability can be provided throughdata replication of one sort or another, so that data remains intact and available despite the failure of any single piece of equipment. For examples, see the lists ofdistributed fault-tolerant file systems anddistributed parallel fault-tolerant file systems.

Performance

[edit]

A commonperformance measurement of a clustered file system is the amount of time needed to satisfy service requests. In conventional systems, this time consists of a disk-access time and a small amount ofCPU-processing time. But in a clustered file system, a remote access has additional overhead due to the distributed structure. This includes the time to deliver the request to a server, the time to deliver the response to the client, and for each direction, a CPU overhead of running thecommunication protocol software.

Concurrency

[edit]

Concurrency control becomes an issue when more than one person or client is accessing the same file or block and want to update it. Hence updates to the file from one client should not interfere with access and updates from other clients. This problem is more complex with file systems due to concurrent overlapping writes, where different writers write to overlapping regions of the file concurrently.^[9] This problem is usually handled byconcurrency control orlocking which may either be built into the file system or provided by an add-on protocol.

History

[edit]

IBM mainframes in the 1970s could share physical disks and file systems if each machine had its own channel connection to the drives' control units. In the 1980s,Digital Equipment Corporation'sTOPS-20 andOpenVMS clusters (VAX/ALPHA/IA64) included shared disk file systems.^[10]

References

[edit]

^Saify, Amina; Kochhar, Garima; Hsieh, Jenwei; Celebioglu, Onur (May 2005)."Enhancing High-Performance Computing Clusters with Parallel File Systems"(PDF).Dell Power Solutions. Dell Inc. Retrieved6 March 2019.
^Mokadem, Riad; Litwin, Witold; Schwarz, Thomas (2006)."Disk Backup Through Algebraic Signatures in Scalable Distributed Data Structures"(PDF). DEXA 2006 Springer. Retrieved8 June 2006.
^Silberschatz, Abraham; Galvin, Peter; Gagne, Greg (2009)."Operating System Concepts, 8th Edition"(PDF).University of Babylon. John Wiley & Sons, Inc. pp. 705–725. Archived fromthe original(PDF) on 6 March 2019. Retrieved4 March 2019.
^^a ^bArpaci-Dusseau, Remzi H.; Arpaci-Dusseau, Andrea C. (2014),Sun's Network File System(PDF), Arpaci-Dusseau Books
^Sandberg, Russel (1986)."The Sun Network Filesystem: Design, Implementation and Experience"(PDF).Proceedings of the Summer 1986 USENIX Technical Conference and Exhibition. Sun Microsystems, Inc. Retrieved6 March 2019.NFS was designed to simplify the sharing of filesystem resources in a network of non-homogeneousmachines.
^Sobh, Tarek (2008).Advances in Computer and Information Sciences and Engineering. Springer Science & Business Media. pp. 423–440.Bibcode:2008acis.book.....S.
^"Weka Distributed File Systems (DFS)".weka.io. 2021-04-27. Retrieved2023-10-12.
^"PanFS Parallel File System".panasas.com. Retrieved2023-10-12.
^Pessach, Yaniv (2013).Distributed Storage: Concepts, Algorithms, and Implementations.ISBN 978-1482561043.
^Murphy, Dan (1996)."Origins and Development of TOPS-20". Dan Murphy. Ambitious Plans for Jupiter. Retrieved6 March 2019.Ultimately, both VMS and TOPS-20 shipped this kind of capability.

Movatterモバイル変換

Clustered file system

Shared-disk file system

Examples

Distributed file systems

Design goals

History

Examples

Network-attached storage

Design considerations

Avoiding single point of failure

Performance

Concurrency

History

See also

References

Further reading