In computing, a failover cluster refers to a group of independent servers that work together to maintain high availability of applications and services. If one of the servers fails, another node in the cluster can take over its workload with little or no downtime. This process is known as failover.
In a failover cluster, the clustered servers can collectively improve the availability and scalability of applications and services. The servers work together to provide either continuous availability or at least high availability (HA) through the failover process. Failover clusters can comprise either physical servers or virtual machines (VMs).
Each server is called a node and is connected to the other servers in the cluster by physical cables and software. A failover cluster consists of at least two nodes that transfer data, along with software that processes the data sent over the cables. In addition, a cluster uses one or more technologies for load balancing, storage, parallel processing and other functions.
The applications and services in a failover cluster are sometimes known as clustered roles. The cluster keeps these roles operational if a server fails. At the same time, each role is proactively monitored to verify that it is working properly. If a role is not, it might be restarted in place or moved to another node.
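This monitoring loop can be pictured in a few lines of Python. The sketch below is a minimal illustration, assuming a simple two-step policy (restart in place a limited number of times, then fail the role over); the Node class, the supervise function and the restart threshold are hypothetical, not any vendor's actual API.

```python
# Minimal sketch of supervising a clustered role: probe its health,
# retry a restart in place, then move it to a standby node.
# All names and thresholds here are illustrative assumptions.

class Node:
    def __init__(self, name, online=True):
        self.name = name
        self.online = online

    def probe(self, role):      # health check; a real cluster would RPC here
        return self.online

    def start(self, role):
        print(f"starting {role} on {self.name}")

MAX_LOCAL_RESTARTS = 2          # assumed policy: two restarts before failover

def supervise(role, host, standbys, restarts=0):
    """One monitoring pass for a clustered role."""
    if host.probe(role):
        return host, restarts                 # healthy: nothing to do
    if restarts < MAX_LOCAL_RESTARTS:
        host.start(role)                      # try to recover in place
        return host, restarts + 1
    target = next(n for n in standbys if n.online)
    target.start(role)                        # fail the role over
    return target, 0

nodes = [Node("node1"), Node("node2")]
host, retries = supervise("sql-server", nodes[0], nodes[1:])
```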
Server failure can lead to application downtime, which can result in operational disruptions for users. By providing continuous availability or high availability, failover clusters enable users to keep using the applications and services they need without experiencing outages, even if a server fails. There might be a brief service interruption with high-availability clusters during the failover process. However, the system usually recovers quickly with little or no data loss and minimal downtime.
Failover clusters play a vital role in ensuring the ongoing availability of mission-critical applications and systems, such as online transaction processing (OLTP) systems that demand very high -- near 100% -- availability. Database replication and disaster recovery (DR) also rely on failover clusters. These clusters provide geographic replication so that if a server in one location goes down, the data is still available on failover servers at other sites.
In a high-availability failover cluster, groups of independent servers share data and resources, including storage. At any time, at least one node is active and at least one is passive. These clusters include a monitoring connection that lets each server check the health of the other servers. In a two-node cluster (the simplest possible configuration), if a node fails, the other node recognizes the failure via the monitoring connection and configures itself as the active node. Larger configurations usually use dedicated servers that determine whether any nodes are failing and then direct another node to assume the load and participate in the failover process.
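The monitoring connection is commonly implemented as a heartbeat. The toy Python sketch below shows the core idea for a two-node cluster, assuming a fixed timeout; it deliberately omits the quorum logic real clusters use to avoid split-brain situations.

```python
# Toy two-node heartbeat: the passive node declares the active node
# failed once heartbeats stop arriving, then promotes itself.
# The timeout and class names are illustrative assumptions.

import time

HEARTBEAT_TIMEOUT = 10.0   # seconds of silence before declaring failure

class ClusterNode:
    def __init__(self, name, active=False):
        self.name = name
        self.active = active
        self.last_heartbeat = time.monotonic()

    def heartbeat(self):                    # called when a heartbeat arrives
        self.last_heartbeat = time.monotonic()

    def peer_failed(self, peer):
        return time.monotonic() - peer.last_heartbeat > HEARTBEAT_TIMEOUT

def watch(passive, active):
    """Run on the passive node: promote itself if the active peer goes silent."""
    if passive.peer_failed(active):
        active.active = False
        passive.active = True               # take over the workload
        print(f"{passive.name} promoted to active after missed heartbeats")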
In high-availability failover clusters using VMs, the VMs are pooled into a cluster along with the physical servers they reside on. If there is a failure, the VMs on the failed host will be restarted on alternate hosts.
In a continuous-availability failover cluster -- also known as a fault-tolerant cluster -- multiple systems share one copy of a computer's operating system (OS). Thus, the commands issued by one system are simultaneously executed on the other systems as well. The cluster requires a continuously available and near-exact copy of a machine running the service (physical or virtual). This redundancy model is known as 2N. It can automatically detect failures of hard drives, power supplies, network components and CPUs. When the cluster identifies a failure point, a backup component -- or procedure -- takes its place instantaneously without service interruption.
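The 2N model can be illustrated with a toy example in which every command is applied to a primary and an identical standby before being acknowledged, so promotion loses no state. The Replica interface below is an assumption for illustration, not how any fault-tolerant OS actually mirrors execution.

```python
# Toy 2N redundancy: commands execute on both replicas in lockstep,
# so the standby can take over instantly with identical state.

class Replica:
    def __init__(self, name):
        self.name = name
        self.state = {}

    def apply(self, key, value):
        self.state[key] = value

class FaultTolerantPair:
    def __init__(self):
        self.primary = Replica("primary")
        self.standby = Replica("standby")   # near-exact copy (2N redundancy)

    def execute(self, key, value):
        # Each command is executed on both replicas before acknowledging.
        self.primary.apply(key, value)
        self.standby.apply(key, value)

    def fail_over(self):
        # The standby already holds identical state; promotion is instant.
        self.primary, self.standby = self.standby, self.primary

pair = FaultTolerantPair()
pair.execute("order-42", "confirmed")
pair.fail_over()
assert pair.primary.state["order-42"] == "confirmed"
```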
High-availability clusters aim for 99.999% availability (five 9s). These clusters are generally adequate for most applications and services. However, for services or applications that require 100% availability, continuous-availability failover clusters are required. These include mission-critical applications such as electronic stock trading, ATM banking, order management, employee time clock systems and reservation systems. Many applications in manufacturing, logistics and e-commerce also require continuous-availability failover clusters.
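The arithmetic behind these figures is straightforward. The short calculation below shows the annual downtime each availability level permits; five 9s works out to roughly 5.26 minutes per year.

```python
# Annual downtime allowed at a given availability percentage.

MINUTES_PER_YEAR = 365.25 * 24 * 60   # ~525,960 minutes

for availability in (99.9, 99.99, 99.999, 100.0):
    downtime_min = MINUTES_PER_YEAR * (1 - availability / 100)
    print(f"{availability}% availability -> {downtime_min:.2f} min downtime/year")

# 99.999% -> ~5.26 minutes of downtime per year; only a
# continuous-availability (fault-tolerant) design targets zero.
```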
Failover clustering is a popular feature in Windows Server and Azure Stack HCI. With those OSes, organizations can create highly available or continuously available file share storage for applications such as Microsoft SQL Server and Hyper-V VMs. Another approach is to create highly available clustered roles on physical servers or VMs that are installed on servers running Hyper-V.
Failover clusters in Windows provide Cluster Shared Volume (CSV) functionality. CSV provides a general-purpose, clustered file system that's layered above NTFS or ReFS. Multiple clustered nodes can read from or write to the same LUN (part of a drive or collection of drives) that is provisioned as an NTFS volume. CSVs provide a consistent distributed namespace that can be used by clustered roles to access shared storage from all nodes and simplify the management of a large number of LUNs in a failover cluster.
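What the consistent namespace buys can be pictured with a small sketch: every node resolves the same path to the same LUN, so a role's storage does not need to be re-pointed when it moves. The path-to-LUN mapping below is illustrative, though C:\ClusterStorage\VolumeN is the namespace Windows actually exposes on every node.

```python
# Simplified view of a CSV-style namespace: identical paths resolve to
# identical LUNs on every node. The LUN identifiers are made up.

CSV_NAMESPACE = {
    r"C:\ClusterStorage\Volume1": "LUN-0001",  # e.g., SQL Server data files
    r"C:\ClusterStorage\Volume2": "LUN-0002",  # e.g., Hyper-V VM disks
}

def resolve(path):
    """Any node in the cluster resolves the same path to the same LUN."""
    return CSV_NAMESPACE[path]

# Whether a role runs on node1 or node3, the path and the LUN behind it
# are the same, which keeps failover transparent to storage.
assert resolve(r"C:\ClusterStorage\Volume1") == "LUN-0001"
```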
Failover clusters are also available in VMware, SQL Server and Red Hat Enterprise Linux. VMware offers many virtualization tools for VM failover clusters. For example, vSphere Fault Tolerance maintains a live shadow copy of a VM on another host to provide continuous availability, while vSphere High Availability pools VMs and their hosts into a cluster for automatic failover and high availability.
SQL Server 2017 comes with a high-availability failover clustering solution called Always On that registers SQL Server components as cluster resources and enables automatic failover to a different node if one node fails. Red Hat Enterprise Linux also provides a failover cluster mechanism, which allows users to create high-availability failover clusters with the Red Hat Global File System (GFS/GFS2).
Affinity refers to a rule a user sets up to establish a relationship between two or more roles to keep them together. These roles could be VMs, resource groups or other entities. Anti-affinity is also a rule, albeit one used to keep the specified roles apart.
These rules are required because a failover cluster can host many roles that move and run among its nodes. However, when certain roles should not run on the same node, they must be kept apart. Affinity and anti-affinity rules help avoid the performance problems that can occur when roles running on the same node consume more resources or memory than they should.
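A placement check based on such rules can be sketched in a few lines. The sketch assumes simple hard rules (pairs of roles that must, or must not, share a node); real cluster managers also support weights and soft preferences, and the role and node names below are made up.

```python
# Minimal affinity/anti-affinity placement check.

AFFINITY = [("web-frontend", "cache")]             # keep together
ANTI_AFFINITY = [("sql-primary", "sql-replica")]   # keep apart

def placement_ok(placement):
    """placement maps role name -> node name."""
    for a, b in AFFINITY:
        if placement[a] != placement[b]:
            return False
    for a, b in ANTI_AFFINITY:
        if placement[a] == placement[b]:
            return False
    return True

print(placement_ok({"web-frontend": "node1", "cache": "node1",
                    "sql-primary": "node1", "sql-replica": "node2"}))  # True
print(placement_ok({"web-frontend": "node1", "cache": "node2",
                    "sql-primary": "node1", "sql-replica": "node1"}))  # False
```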


