Single point of failure

From Wikipedia, the free encyclopedia

Part whose failure will disrupt the entire system

In this diagram the router is a single point of failure for the communication network between computers.

A single point of failure (SPOF) is a part of a system that would stop the entire system from working if it were to fail.^[1] The term single point of failure implies that there is no backup or redundant option that would enable the system to continue to function without it. SPOFs are undesirable in any system with a goal of high availability or reliability, be it a business practice, software application, or other industrial system. If a SPOF is present in a system, its failure causes an interruption that is substantially more disruptive than an error elsewhere in the system would.

Overview

Systems can be made robust by adding redundancy at every potential SPOF. Redundancy can be achieved at various levels.

The assessment of a potential SPOF involves identifying the critical components of a complex system that would provoke a total system failure in case of malfunction.^[2] Highly reliable systems should not rely on any such individual component.
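
Take a communication network like the one in the router diagram. One way to carry out this assessment is to model it as an undirected graph and look for articulation points (cut vertices). These are nodes whose removal disconnects the remaining nodes. The minimal sketch below uses the standard depth-first search with low-link values (Hopcroft and Tarjan). It assumes a simple graph without duplicate links and uses hypothetical node names. It also detects only connectivity SPOFs, not capacity or software ones.

```python
from collections import defaultdict

def articulation_points(edges):
    """Return the nodes whose removal disconnects an undirected graph.

    Sketch only: assumes a simple graph (no duplicate links) and uses
    recursion, so it suits small illustrative networks.
    """
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)

    disc = {}    # DFS discovery time of each node
    low = {}     # earliest discovery time reachable from the node's subtree
    points = set()
    timer = [0]

    def dfs(node, parent):
        disc[node] = low[node] = timer[0]
        timer[0] += 1
        children = 0
        for nxt in graph[node]:
            if nxt == parent:
                continue
            if nxt in disc:
                # Already visited: a back edge can lower this node's low-link.
                low[node] = min(low[node], disc[nxt])
            else:
                children += 1
                dfs(nxt, node)
                low[node] = min(low[node], low[nxt])
                # A non-root node is a cut vertex if the child's subtree
                # cannot reach any node above it except through it.
                if parent is not None and low[nxt] >= disc[node]:
                    points.add(node)
        # The DFS root is a cut vertex only if it has two or more children.
        if parent is None and children >= 2:
            points.add(node)

    for node in list(graph):
        if node not in disc:
            dfs(node, None)
    return points
```

For a star-shaped network, `articulation_points([("router", "pc1"), ("router", "pc2"), ("router", "pc3")])` returns `{"router"}`. Adding a second router linked to every computer leaves no cut vertex, and the function returns an empty set.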

For instance, the owner of a small tree care company may own only one wood chipper. If the chipper breaks, they may be unable to complete their current job and may have to cancel future jobs until they can obtain a replacement. The owner could prepare for this in multiple ways. At the lowest level, the owner may keep spare parts ready to repair the wood chipper in case it fails. At a higher level, they may have a second wood chipper that they can bring to the job site. Finally, at the highest level, they may have enough equipment available to completely replace everything at the work site in the case of multiple failures.

Possible SPOFs in a simple setup
Using redundancy to avoid some SPOFs
Completely redundant system without SPOFs (note: assumes generator and grid sources are each rated at N, each UPS is rated at N, and "A/C" and "Electrical" are in and of themselves completely fault tolerant systems)

Computing

A fault-tolerant computer system can be achieved at the internal component level, at the system level (multiple machines), or at the site level (replication).

One would normally deploy a load balancer to ensure high availability for a server cluster at the system level.^[3] In a high-availability server cluster, each individual server may attain internal component redundancy by having multiple power supplies, hard drives, and other components. System-level redundancy could be obtained by having spare servers waiting to take on the work of another server if it fails.
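
The following is a minimal sketch of system-level redundancy behind a load balancer. It assumes a simple round-robin policy. A hypothetical `mark_down`/`mark_up` interface stands in for the health checks a real balancer would perform. Requests rotate across the servers and skip any server marked unhealthy, so the failure of one server does not stop the cluster.

```python
from itertools import cycle

class RoundRobinBalancer:
    """Round-robin load balancer that skips servers marked unhealthy.

    Sketch only: server names and the mark_down/mark_up methods are
    hypothetical stand-ins for real health probes.
    """

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._ring = cycle(self.servers)  # endless rotation over all servers

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        # Visit each server at most once per request before giving up.
        for _ in range(len(self.servers)):
            server = next(self._ring)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers: the whole cluster is down")
```

With servers `["web1", "web2", "web3"]` and `web2` marked down, four calls to `pick()` return `web1`, `web3`, `web1`, `web3`. The balancer itself becomes a SPOF unless it is also made redundant, which is why balancers are often deployed redundantly as well.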

Since a data center is often a support center for other operations such as business logic, it represents a potential SPOF in itself. Thus, at the site level, the entire cluster may be replicated at another location, where it can be accessed if the primary location becomes unavailable. This is typically addressed as part of an IT disaster recovery program. While the solution to this SPOF was previously the physical duplication of clusters, the high demand for such duplication led multiple businesses to outsource it to third parties using cloud computing. Arguably, however, doing so simply moves the SPOF and may even increase the likelihood of a failure or cyberattack.^[4]

Paul Baran and Donald Davies developed packet switching, a key part of "survivable communications networks". Such networks, including ARPANET and the Internet, are designed to have no single point of failure. Multiple paths between any two points on the network allow those points to continue communicating with each other, with packets "routing around" damage, even after the failure of any single path or intermediate node.
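
The idea of routing around damage can be illustrated with a breadth-first search that finds a path while ignoring failed nodes. This is only a sketch under simplifying assumptions: a hypothetical four-node mesh, with hop count as the only cost. Real Internet routing relies on distributed protocols such as BGP and OSPF rather than a central search.

```python
from collections import deque

def route(links, src, dst, failed=frozenset()):
    """Return a fewest-hop path from src to dst that avoids failed nodes,
    or None if every path passes through a failure."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    if src in failed or dst in failed:
        return None
    prev = {src: None}          # predecessor of each reached node
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk predecessors back to the source, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in prev and nxt not in failed:
                prev[nxt] = node
                queue.append(nxt)
    return None
```

In a mesh with links A-B, B-D, A-C and C-D, a failed node `B` still leaves the route `["A", "C", "D"]`. Only the simultaneous failure of both `B` and `C` cuts A off from D.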

Software engineering

In software engineering, a bottleneck occurs when the capacity of an application or a computer system is limited by a single component. The bottleneck has the lowest throughput of all parts of the transaction path. A common example is when the programming language used is capable of parallel processing, but a given snippet of code runs several independent processes sequentially rather than simultaneously.
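
The sketch below illustrates this kind of bottleneck. It assumes a hypothetical `fetch` task that simulates independent I/O with a short sleep. Run one after another, the calls take roughly the sum of their individual waits. Submitted to a thread pool from Python's standard `concurrent.futures` module, the independent calls overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(item):
    """Hypothetical independent I/O-bound task, simulated with a sleep."""
    time.sleep(0.1)
    return item * 2

def run_sequential(items):
    # Each call waits for the previous one: total time is about the sum.
    return [fetch(i) for i in items]

def run_concurrent(items, workers=4):
    # Independent calls overlap: with enough workers, total time is about
    # the duration of the slowest single call.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, items))

if __name__ == "__main__":
    items = list(range(4))
    for runner in (run_sequential, run_concurrent):
        start = time.perf_counter()
        result = runner(items)
        print(runner.__name__, result, f"{time.perf_counter() - start:.2f}s")
```

Both runners return the same results, and only the elapsed time differs. In the standard CPython build, threads help mainly with I/O-bound work. CPU-bound tasks generally need `ProcessPoolExecutor` to run in parallel because of the global interpreter lock.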

Performance engineering

Tracking down bottlenecks (sometimes known as hot spots: the sections of code that execute most frequently, i.e., that have the highest execution count) is called performance analysis. Reduction is usually achieved with the help of specialized tools, known as performance analyzers or profilers. The objective is to make those particular sections of code perform as fast as possible to improve overall algorithmic efficiency.
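
As one concrete example of such a tool, Python's standard library ships a profiler, `cProfile`. It reports how many times each function was called (`ncalls`) and how much time was spent inside it (`tottime`). The sketch below profiles a deliberately inefficient workload and formats the costliest entries with `pstats`. The function names are hypothetical and for illustration only.

```python
import cProfile
import io
import pstats

def slow_sum_of_squares(n):
    # Deliberately wasteful hot spot: calls sum() on a one-element list
    # once per loop iteration instead of accumulating directly.
    total = 0
    for i in range(n):
        total += sum([i * i])
    return total

def workload():
    # Hypothetical job that calls the hot spot repeatedly.
    return [slow_sum_of_squares(20_000) for _ in range(5)]

def profile_top(func, limit=5):
    """Run func under cProfile; return its result and a text report of
    the `limit` entries with the most time spent inside them."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("tottime").print_stats(limit)
    return result, stream.getvalue()

if __name__ == "__main__":
    _, report = profile_top(workload)
    # builtins.sum should show 100000 calls (5 x 20,000) in the ncalls column.
    print(report)
```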

Computer security

A vulnerability or security exploit in just one component can compromise an entire system. One of the largest concerns in computer security is attempting to eliminate SPOFs without sacrificing too much convenience to the user. With the invention and popularization of the Internet, several systems became connected to the broader world through many difficult-to-secure connections.^[4] While companies have developed a number of solutions to this, the most consistent SPOF in complex systems tends to remain user error, either through accidental mishandling by an operator or through outside interference such as phishing attacks.^[5]

Other fields

The concept of a single point of failure has also been applied to fields outside of engineering, computers, and networking, such as corporate supply chain management^[6] and transportation management.^[7]

Design structures that create single points of failure include bottlenecks and series circuits (in contrast to parallel circuits).
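
The difference between series and parallel structures can be quantified with the standard reliability-block formulas. These assume that components fail independently and that each component's availability (the fraction of time it works) is known. A series chain works only if every part works, so its availability is the product of the parts' availabilities. A redundant parallel group fails only if every member fails at once. The values used below are hypothetical.

```python
from math import prod

def series_availability(avails):
    """Every component must work, so availabilities multiply."""
    return prod(avails)

def parallel_availability(avails):
    """The group fails only if every redundant member fails at once."""
    return 1 - prod(1 - a for a in avails)

if __name__ == "__main__":
    print(round(series_availability([0.99, 0.99, 0.99]), 4))   # 0.9703
    print(round(parallel_availability([0.99, 0.99]), 4))       # 0.9999
    # One unreplicated component in series with a redundant pair.
    mixed = series_availability([0.99, parallel_availability([0.99, 0.99])])
    print(round(mixed, 4))                                      # 0.9899
```

Three 0.99 components in series drop to about 0.9703, while a redundant pair at 0.99 rises to about 0.9999. Placing that pair in series with one remaining 0.99 component yields about 0.9899, which shows that the single remaining SPOF dominates the result.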

In transportation, some noted recent examples of the concept's application have included the Nipigon River Bridge in Canada, where a partial bridge failure in January 2016 entirely severed road traffic between Eastern Canada and Western Canada for several days because it is located along a portion of the Trans-Canada Highway where there is no alternate detour route for vehicles to take;^[8] and the Norwalk River Railroad Bridge in Norwalk, Connecticut, an aging swing bridge that sometimes gets stuck when opening or closing, disrupting rail traffic on the Northeast Corridor line.^[7]

The concept of a single point of failure has also been applied to the intelligence field and the processes of intelligence. Edward Snowden talked of the dangers of being what he described as "the single point of failure", meaning the sole repository of information.^[9]