In information technology and computer science, especially in the fields of computer programming, operating systems, multiprocessors, and databases, concurrency control ensures that correct results for concurrent operations are generated, while getting those results as quickly as possible.
Computer systems, both software and hardware, consist of modules, or components. Each component is designed to operate correctly, i.e., to obey or to meet certain consistency rules. When components that operate concurrently interact by messaging or by sharing accessed data (in memory or storage), a certain component's consistency may be violated by another component. The general area of concurrency control provides rules, methods, design methodologies, and theories to maintain the consistency of components operating concurrently while interacting, and thus the consistency and correctness of the whole system. Introducing concurrency control into a system means applying operation constraints which typically result in some performance reduction. Operation consistency and correctness should be achieved with the best possible efficiency, without reducing performance below reasonable levels. Concurrency control can require significant additional complexity and overhead in a concurrent algorithm compared to the simpler sequential algorithm.
For example, a failure in concurrency control can result in data corruption from torn read or write operations.
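As one concrete, language-level illustration, the following minimal Java sketch shows how a torn read can arise; the class and field names are invented for the example, and the tear may or may not actually manifest on a given JVM and hardware:

```java
// Sketch of a torn write/read: under the Java Language Specification
// (§17.7), a write to a non-volatile long may be performed as two separate
// 32-bit writes, so a reader may observe a value that was never written.
class TornWriteDemo {
    static long shared;  // deliberately not volatile

    public static void main(String[] args) {
        new Thread(() -> {
            for (int i = 0; i < 1_000_000; i++) {
                // Alternate two values whose upper and lower 32 bits differ.
                shared = (i % 2 == 0) ? 0L : -1L;  // -1L is 0xFFFFFFFFFFFFFFFF
            }
        }).start();
        new Thread(() -> {
            for (int i = 0; i < 1_000_000; i++) {
                long v = shared;
                if (v != 0L && v != -1L) {  // half of one write, half of another
                    System.out.println("torn value: 0x" + Long.toHexString(v));
                }
            }
        }).start();
    }
}
```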
Concurrency control in database management systems (DBMS; e.g., Bernstein et al. 1987, Weikum and Vossen 2001), other transactional objects, and related distributed applications (e.g., Grid computing and Cloud computing) ensures that database transactions are performed concurrently without violating the data integrity of the respective databases. Thus concurrency control is an essential element for correctness in any system where two or more database transactions, executed with time overlap, can access the same data, as is the case in virtually any general-purpose database system. Consequently, a vast body of related research has accumulated since database systems emerged in the early 1970s. A well-established concurrency control theory for database systems is outlined in the references mentioned above: serializability theory, which allows the effective design and analysis of concurrency control methods and mechanisms. An alternative theory for concurrency control of atomic transactions over abstract data types is presented in (Lynch et al. 1993), and is not utilized below. This theory is more refined and complex, has a wider scope, and has been less utilized in the database literature than the classical theory above. Each theory has its pros and cons, emphasis, and insight. To some extent they are complementary, and their merging may be useful.
To ensure correctness, a DBMS usually guarantees that only serializable transaction schedules are generated, unless serializability is intentionally relaxed to increase performance, but only in cases where application correctness is not harmed. For maintaining correctness in cases of failed (aborted) transactions (which can always happen for many reasons), schedules also need to have the recoverability (from abort) property. A DBMS also guarantees that no effect of committed transactions is lost, and no effect of aborted (rolled back) transactions remains in the related database. Overall transaction characterization is usually summarized by the ACID rules below. As databases have become distributed, or needed to cooperate in distributed environments (e.g., Federated databases in the early 1990s, and Cloud computing currently), the effective distribution of concurrency control mechanisms has received special attention.
The concept of a database transaction (or atomic transaction) has evolved in order to enable both a well-understood database system behavior in a faulty environment where crashes can happen at any time, and recovery from a crash to a well-understood database state. A database transaction is a unit of work, typically encapsulating a number of operations over a database (e.g., reading a database object, writing, acquiring a lock, etc.), an abstraction supported in database systems and other systems as well. Each transaction has well-defined boundaries in terms of which program/code executions are included in that transaction (determined by the transaction's programmer via special transaction commands). Every database transaction obeys the following rules (by support in the database system; i.e., a database system is designed to guarantee them for the transactions it runs):

Atomicity - Either the effects of all or none of its operations remain ("all or nothing" semantics) when a transaction ends (whether by commit or by abort, respectively).
Consistency - Every transaction leaves the database in a consistent (correct) state, i.e., maps the database from one consistent state to another.
Isolation - Transactions cannot interfere with each other; usually the effects of an incomplete transaction are not visible to other transactions.
Durability - The effects of committed transactions must persist through crashes (typically by recording the effects and the commit event in non-volatile memory).
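As a rough illustration of such transaction boundaries, the following sketch uses the standard JDBC API (java.sql), in which disabling auto-commit marks the start of a transaction and commit or rollback marks its end; the connection URL, table, and column names are hypothetical:

```java
// Sketch of transaction boundaries using the standard JDBC API (java.sql).
// The table ("accounts") and columns ("balance", "id") are hypothetical.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class TransferExample {
    static void transfer(String url, int from, int to, long amount) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url)) {
            conn.setAutoCommit(false);  // marks the start of the transaction
            try (PreparedStatement debit = conn.prepareStatement(
                     "UPDATE accounts SET balance = balance - ? WHERE id = ?");
                 PreparedStatement credit = conn.prepareStatement(
                     "UPDATE accounts SET balance = balance + ? WHERE id = ?")) {
                debit.setLong(1, amount);
                debit.setInt(2, from);
                debit.executeUpdate();
                credit.setLong(1, amount);
                credit.setInt(2, to);
                credit.executeUpdate();
                conn.commit();          // both updates take effect together
            } catch (SQLException e) {
                conn.rollback();        // no effect of the failed transaction remains
                throw e;
            }
        }
    }
}
```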
The concept of atomic transaction has been extended over the years to what have become business transactions, which actually implement types of workflow and are not atomic. However, even such enhanced transactions typically utilize atomic transactions as components.
If transactions are executed serially, i.e., sequentially with no overlap in time, no transaction concurrency exists. However, if concurrent transactions with interleaving operations are allowed in an uncontrolled manner, some unexpected, undesirable results may occur, such as the lost update, dirty read, and incorrect summary problems.
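The lost update problem, for instance, can be reproduced by two uncontrolled read-modify-write sequences, as in this minimal Java sketch (all names invented for the example):

```java
// Sketch of the lost update problem: two concurrent, uncontrolled
// read-modify-write sequences on the same value.
class LostUpdateDemo {
    static int balance = 100;

    public static void main(String[] args) throws InterruptedException {
        Runnable deposit50 = () -> {
            int read = balance;    // both threads may read 100 here...
            balance = read + 50;   // ...and both write 150: one update is lost
        };
        Thread t1 = new Thread(deposit50);
        Thread t2 = new Thread(deposit50);
        t1.start(); t2.start();
        t1.join();  t2.join();
        // The correct (serial) result is 200; an unlucky interleaving gives 150.
        System.out.println("balance = " + balance);
    }
}
```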
Most high-performance transactional systems need to run transactions concurrently to meet their performance requirements. Thus, without concurrency control such systems can neither provide correct results nor maintain their databases consistently.
The main categories of concurrency control mechanisms are optimistic (delay the checking of a transaction's integrity rules until it ends, and abort it if the rules would be violated), pessimistic (block operations that could cause a violation until the possibility of violation disappears), and semi-optimistic (a hybrid that blocks operations only in some situations).
Different categories provide different performance, i.e., different average transaction completion rates (throughput), depending on the mix of transaction types, the level of computing parallelism, and other factors. If selection and knowledge about trade-offs are available, then the category and method should be chosen to provide the highest performance.
Mutual blocking between two or more transactions (where each blocks another) results in a deadlock, where the transactions involved are stalled and cannot reach completion. Most non-optimistic mechanisms (with blocking) are prone to deadlocks, which are resolved by an intentional abort of a stalled transaction (releasing the other transactions in that deadlock), followed by its immediate restart and re-execution. The likelihood of a deadlock is typically low.
Blocking, deadlocks, and aborts all result in performance reduction, and hence the trade-offs between the categories.
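The following minimal Java sketch illustrates both the deadlock-prone pattern and the abort-and-restart remedy described above, using a lock timeout as a simple stand-in for a DBMS's deadlock detection; the timeout value is an arbitrary illustrative choice:

```java
// Sketch: a deadlock-prone locking pattern and a simple "abort and restart"
// remedy, using ReentrantLock from java.util.concurrent.locks.
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class DeadlockDemo {
    static final ReentrantLock lockA = new ReentrantLock();
    static final ReentrantLock lockB = new ReentrantLock();

    // Deadlock-prone pattern: one task calls runTransaction(lockA, lockB)
    // while another calls runTransaction(lockB, lockA); each may end up
    // holding its first lock while waiting forever for the other's.
    static void runTransaction(ReentrantLock first, ReentrantLock second)
            throws InterruptedException {
        while (true) {
            first.lock();
            try {
                // tryLock with a timeout instead of lock(): if the second
                // lock cannot be acquired, "abort" by releasing everything
                // and restarting, which breaks the deadlock.
                if (second.tryLock(50, TimeUnit.MILLISECONDS)) {
                    try {
                        return; // both locks held: do the work here
                    } finally {
                        second.unlock();
                    }
                }
            } finally {
                first.unlock();
            }
        }
    }
}
```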
Many methods for concurrency control exist. Most of them can be implemented within either main category above. The major methods,[1] which each have many variants, and in some cases may overlap or be combined, are locking (e.g., two-phase locking, 2PL), which controls access to data by locks assigned to the data; serialization graph checking, which checks for cycles in the schedule's precedence graph and breaks them by aborts; timestamp ordering, which assigns timestamps to transactions and controls or checks access to data by timestamp order; and commitment ordering, which controls or checks transactions' chronological order of commit events.
Other major concurrency control types that are utilized in conjunction with the methods above include multiversion concurrency control (MVCC), which increases concurrency and performance by generating a new version of a database object each time the object is written, and allows transactions to read several recent relevant versions of each object, and index concurrency control, which synchronizes access operations to indexes rather than to user data.
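To make the optimistic category above concrete, the following sketch validates a version number at commit time and retries on conflict; it is an illustration under invented names (VersionedRecord, tryCommit), not a real DBMS interface:

```java
// Sketch of optimistic concurrency control on a single record: read a
// version number, compute, then validate at commit time that no other
// writer has changed the version; on conflict, abort and retry.
import java.util.concurrent.atomic.AtomicLong;

class VersionedRecord {
    private volatile long value;
    private final AtomicLong version = new AtomicLong();

    long readVersion() { return version.get(); }
    long readValue()   { return value; }

    // Validation + write phase, atomic with respect to other committers.
    synchronized boolean tryCommit(long expectedVersion, long newValue) {
        if (version.get() != expectedVersion) {
            return false;            // another transaction committed first: abort
        }
        value = newValue;
        version.incrementAndGet();
        return true;
    }
}

class OptimisticDemo {
    static void addOnePercentInterest(VersionedRecord account) {
        while (true) {                            // retry loop on validation failure
            long v = account.readVersion();       // read phase
            long balance = account.readValue();
            long newBalance = balance + balance / 100;
            if (account.tryCommit(v, newBalance)) return;
        }
    }
}
```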
The most common mechanism type in database systems since their early days in the 1970s has been strong strict two-phase locking (SS2PL; also called rigorous scheduling or rigorous 2PL), which is a special case (variant) of two-phase locking (2PL). It is pessimistic. In spite of its long name (for historical reasons) the idea of the SS2PL mechanism is simple: "Release all locks applied by a transaction only after the transaction has ended." SS2PL (or rigorousness) is also the name of the set of all schedules that can be generated by this mechanism, i.e., these SS2PL (or rigorous) schedules have the SS2PL (or rigorousness) property.
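That one idea can be sketched in a few lines of Java. The sketch below is not a real lock manager: it uses exclusive locks only (an actual implementation would also distinguish shared read locks), and the Transaction class is hypothetical. Its sole point is that no lock is released before the transaction ends:

```java
// SS2PL sketch: every lock acquired by a transaction is held until the
// transaction ends, then all are released together.
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantLock;

class Transaction {
    private final Deque<ReentrantLock> held = new ArrayDeque<>();

    void read(ReentrantLock itemLock)  { acquire(itemLock); /* ...read the item... */ }
    void write(ReentrantLock itemLock) { acquire(itemLock); /* ...write the item... */ }

    private void acquire(ReentrantLock lock) {
        if (!lock.isHeldByCurrentThread()) {
            lock.lock();   // may block until the current holder's transaction ends
            held.push(lock);
        }
    }

    // The defining SS2PL step: no lock, for reads or writes, is released earlier.
    void commitOrAbort() {
        while (!held.isEmpty()) {
            held.pop().unlock();
        }
    }
}
```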
Concurrency control mechanisms firstly need to operate correctly, i.e., to maintain each transaction's integrity rules (as related to concurrency; application-specific integrity rules are out of scope here) while transactions are running concurrently, and thus the integrity of the entire transactional system. Correctness needs to be achieved with as good performance as possible. In addition, a need increasingly exists to operate effectively while transactions are distributed over processes, computers, and computer networks. Other subjects that may affect concurrency control are recovery and replication.
For correctness, a common major goal of most concurrency control mechanisms is generating schedules with the serializability property. Without serializability, undesirable phenomena may occur, e.g., money may disappear from accounts, or be generated from nowhere. Serializability of a schedule means equivalence (in the resulting database values) to some serial schedule with the same transactions (i.e., one in which transactions are sequential with no overlap in time, and thus completely isolated from each other: no concurrent access by any two transactions to the same data is possible). Serializability is considered the highest level of isolation among database transactions, and the major correctness criterion for concurrent transactions. In some cases compromised, relaxed forms of serializability are allowed for better performance (e.g., the popular snapshot isolation mechanism) or to meet availability requirements in highly distributed systems (see eventual consistency), but only if the application's correctness is not violated by the relaxation (e.g., no relaxation is allowed for money transactions, since by relaxation money can disappear, or appear from nowhere).
Almost all implemented concurrency control mechanisms achieve serializability by providing conflict serializability, a broad special case of serializability (i.e., it covers and enables most serializable schedules, and does not impose significant additional delay-causing constraints) which can be implemented efficiently.
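Conflict serializability also has a well-known operational test: build a precedence graph with an edge from Ti to Tj for every pair of conflicting operations in which Ti's operation comes first, and check for cycles; the schedule is conflict serializable exactly when the graph is acyclic. A sketch of that test (with invented names) follows:

```java
// Sketch: test conflict serializability by building a precedence graph and
// checking it for cycles. Operations conflict when they come from different
// transactions, touch the same item, and at least one is a write.
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PrecedenceGraph {
    record Op(int tx, char kind, String item) {} // kind is 'r' (read) or 'w' (write)

    static boolean isConflictSerializable(List<Op> schedule) {
        // Edge tx(a) -> tx(b) for every conflict where operation a precedes b.
        Map<Integer, Set<Integer>> edges = new HashMap<>();
        for (int i = 0; i < schedule.size(); i++) {
            for (int j = i + 1; j < schedule.size(); j++) {
                Op a = schedule.get(i), b = schedule.get(j);
                if (a.tx() != b.tx() && a.item().equals(b.item())
                        && (a.kind() == 'w' || b.kind() == 'w')) {
                    edges.computeIfAbsent(a.tx(), k -> new HashSet<>()).add(b.tx());
                }
            }
        }
        // The schedule is conflict serializable iff the graph is acyclic.
        Set<Integer> done = new HashSet<>(), path = new HashSet<>();
        for (Integer tx : edges.keySet()) {
            if (hasCycle(tx, edges, done, path)) return false;
        }
        return true;
    }

    private static boolean hasCycle(int tx, Map<Integer, Set<Integer>> edges,
                                    Set<Integer> done, Set<Integer> path) {
        if (path.contains(tx)) return true;   // back edge: cycle found
        if (done.contains(tx)) return false;  // already fully explored
        path.add(tx);
        for (int next : edges.getOrDefault(tx, Set.of())) {
            if (hasCycle(next, edges, done, path)) return true;
        }
        path.remove(tx);
        done.add(tx);
        return false;
    }
}
```

For example, the schedule r1(x) w2(x) w1(x) produces the edges T1→T2 and T2→T1, a cycle, so it is not conflict serializable.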
Concurrency control typically also ensures the recoverability property of schedules, for maintaining correctness in cases of aborted transactions (which can always happen for many reasons). Recoverability (from abort) means that no committed transaction in a schedule has read data written by an aborted transaction. Such data disappear from the database (upon the abort) and are part of an incorrect database state; reading such data violates the consistency rule of ACID. For example, if transaction T2 reads a value written by transaction T1, T2 commits, and T1 then aborts, the schedule is unrecoverable: T2's committed result depends on data that were never validly in the database. Unlike serializability, recoverability cannot be compromised or relaxed in any case, since any relaxation results in quick database integrity violation upon aborts. The major methods listed above provide serializability mechanisms. None of them in its general form automatically provides recoverability, and special considerations and mechanism enhancements are needed to support recoverability. A commonly utilized special case of recoverability is strictness, which allows efficient database recovery from failure (but excludes optimistic implementations).
With the fast technological development of computing, the difference between local and distributed computing over low-latency networks or buses is blurring. Thus the quite effective utilization of local techniques in such distributed environments is common, e.g., in computer clusters and multi-core processors. However, local techniques have their limitations and use multi-processes (or threads) supported by multi-processors (or multi-cores) to scale. This often turns transactions into distributed ones, if they themselves need to span multi-processes. In these cases most local concurrency control techniques do not scale well.
All systems are prone to failures, and handling recovery from failure is a must. The properties of the generated schedules, which are dictated by the concurrency control mechanism, may affect the effectiveness and efficiency of recovery. For example, the strictness property (mentioned in the section Recoverability above) is often desirable for an efficient recovery.
For high availability, database objects are often replicated. Updates of replicas of the same database object need to be kept synchronized. This may affect the way concurrency control is done (e.g., Gray et al. 1996[2]).
Multitasking operating systems, especially real-time operating systems, need to maintain the illusion that all tasks running on top of them are running at the same time, even though only one or a few tasks really are running at any given moment due to the limitations of the hardware the operating system is running on. Such multitasking is fairly simple when all tasks are independent from each other. However, when several tasks try to use the same resource, or when tasks try to share information, it can lead to confusion and inconsistency. The task of concurrent computing is to solve that problem. Some solutions involve "locks" similar to the locks used in databases, but they risk causing problems of their own, such as deadlock. Other solutions are non-blocking algorithms and read-copy-update.
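As a minimal sketch of the non-blocking approach, the following counter uses the compare-and-set primitive from java.util.concurrent.atomic instead of a lock (names invented for the example):

```java
// Minimal sketch of a non-blocking (lock-free) update: instead of taking a
// lock, retry a compare-and-set (CAS) until no other task interfered.
import java.util.concurrent.atomic.AtomicInteger;

class NonBlockingCounter {
    private final AtomicInteger count = new AtomicInteger();

    int increment() {
        while (true) {
            int current = count.get();
            int next = current + 1;
            // CAS succeeds only if no concurrent task changed the value
            // since we read it; otherwise loop and retry. No task ever blocks.
            if (count.compareAndSet(current, next)) return next;
        }
    }
}
```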