NotificationsYou must be signed in to change notification settings
Fork5
Star27

Commita3290f6

committed

Minor editing for README-SSI.

Fix some grammatical issues, try to clarify a couple of proofs, make theterminology more consistent.

1 parente2a0cb1 commita3290f6Copy full SHA for a3290f6

File tree

1 file changed

+65

-51

lines changed

src/backend/storage/lmgr
- README-SSI

1 file changed

+65

-51

lines changed

`‎src/backend/storage/lmgr/README-SSI‎`

Lines changed: 65 additions & 51 deletions

Original file line number	Diff line number	Diff line change
`@@ -3,11 +3,11 @@ src/backend/storage/lmgr/README-SSI`
`3`	`3`	`Serializable Snapshot Isolation (SSI) and Predicate Locking`
`4`	`4`	`===========================================================`
`5`	`5`
`6`		`-This iscurrently sittingin the lmgr directory because about 90% of`
`7`		`-the code is animplementation of predicate locking, which is required`
`8`		`-for SSI,rather than being directly related to SSI itself. When`
`9`		`-another usefor predicate locking justifies the effort to tease these`
`10`		`-two thingsapart, this README file should probably be split.`
	`6`	`+Thiscodeis in the lmgr directory because about 90% of it is an`
	`7`	`+implementation of predicate locking, which is required for SSI,`
	`8`	`+rather than being directly related to SSI itself. When another use`
	`9`	`+for predicate locking justifies the effort to tease these two things`
	`10`	`+apart, this README file should probably be split.`
`11`	`11`
`12`	`12`
`13`	`13`	`Credits`
`@@ -151,11 +151,11 @@ transactions.`
`151`	`151`	`SSI Algorithm`
`152`	`152`	`-------------`
`153`	`153`
`154`		`-Serializable transaction in PostgreSQL are implemented using`
	`154`	`+As of 9.1, serializable transactions in PostgreSQL are implemented using`
`155`	`155`	`Serializable Snapshot Isolation (SSI), based on the work of Cahill`
`156`	`156`	`et al. Fundamentally, this allows snapshot isolation to run as it`
`157`		`-has, while monitoring for conditions which could create a serialization`
`158`		`-anomaly.`
	`157`	`+previously did, while monitoring for conditions which could create a`
	`158`	`+serializationanomaly.`
`159`	`159`
`160`	`160`	`SSI is based on the observation [2] that each snapshot isolation`
`161`	`161`	`anomaly corresponds to a cycle that contains a "dangerous structure"`
`@@ -168,8 +168,10 @@ SSI works by watching for this dangerous structure, and rolling`
`168`	`168`	`back a transaction when needed to prevent any anomaly. This means it`
`169`	`169`	`only needs to track rw-conflicts between concurrent transactions, not`
`170`	`170`	`wr- and ww-dependencies. It also means there is a risk of false`
`171`		`-positives, because not every dangerous structure corresponds to an`
`172`		`-actual serialization failure.`
	`171`	`+positives, because not every dangerous structure is embedded in an`
	`172`	`+actual cycle. The number of false positives is low in practice, so`
	`173`	`+this represents an acceptable tradeoff for keeping the detection`
	`174`	`+overhead low.`
`173`	`175`
`174`	`176`	`The PostgreSQL implementation uses two additional optimizations:`
`175`	`177`
`@@ -182,11 +184,12 @@ The PostgreSQL implementation uses two additional optimizations:`
`182`	`184`	`one. Proof:`
`183`	`185`
`184`	`186`	`- Because there is a cycle, there must be some transaction T0 that`
`185`		`- precedes Tin in theserial order. (T0 might be the same as Tout).`
	`187`	`+ precedes Tin in thecycle. (T0 might be the same as Tout.)`
`186`	`188`
`187`		`- - Thedependency between T0 and Tin can't be a rw-conflict,`
	`189`	`+ - Theedge between T0 and Tin can't be a rw-conflict or ww-dependency,`
`188`	`190`	`because Tin was read-only, so it must be a wr-dependency.`
`189`		`- Those can only occur if T0 committed before Tin started.`
	`191`	`+ Those can only occur if T0 committed before Tin took its snapshot,`
	`192`	`+ else Tin would have ignored T0's output.`
`190`	`193`
`191`	`194`	`- Because Tout must commit before any other transaction in the`
`192`	`195`	`cycle, it must commit before T0 commits -- and thus before Tin`
`@@ -258,8 +261,8 @@ full serializable transactions under either strategy. Practical`
`258`	`261`	`implementations of predicate locking generally involve acquiring`
`259`	`262`	`locks against data as it is accessed, using multiple granularities`
`260`	`263`	`(tuple, page, table, etc.) with escalation as needed to keep the lock`
`261`		`-count to a number which can be tracked within RAM structures, and`
`262`		`-this was used in PostgreSQL. Coarse granularities can cause some`
	`264`	`+count to a number which can be tracked within RAM structures. This`
	`265`	`+approach was used in PostgreSQL. Coarse granularities can cause some`
`263`	`266`	`false positive indications of conflict. The number of false positives`
`264`	`267`	`can be influenced by plan choice.`
`265`	`268`
`@@ -276,18 +279,18 @@ Hellerstein, Stonebraker and Hamilton paper [3], along with the`
`276`	`279`	`locking papers referenced from that and the Cahill papers.`
`277`	`280`
`278`	`281`	`Because the SIREAD locks don't block, traditional locking techniques`
`279`		`-were be modified. Intent locking (locking higher level objects`
	`282`	`+have to be modified. Intent locking (locking higher level objects`
`280`	`283`	`before locking lower level objects) doesn't work with non-blocking`
`281`	`284`	`"locks" (which are, in some respects, more like flags than locks).`
`282`	`285`
`283`	`286`	`A configurable amount of shared memory is reserved at postmaster`
`284`	`287`	`start-up to track predicate locks. This size cannot be changed`
`285`	`288`	`without a restart.`
`286`	`289`
`287`		`- *To prevent resource exhaustion, multiple fine-grained locks may`
	`290`	`+To prevent resource exhaustion, multiple fine-grained locks may`
`288`	`291`	`be promoted to a single coarser-grained lock as needed.`
`289`	`292`
`290`		`- *An attempt to acquire an SIREAD lock on a tuple when the same`
	`293`	`+An attempt to acquire an SIREAD lock on a tuple when the same`
`291`	`294`	`transaction already holds an SIREAD lock on the page or the relation`
`292`	`295`	`will be ignored. Likewise, an attempt to lock a page when the`
`293`	`296`	`relation is locked will be ignored, and the acquisition of a coarser`
`@@ -306,8 +309,8 @@ Predicate locks will be acquired for the heap based on the following:`
`306`	`309`	`will be locked, whether or not it meets selection criteria; except`
`307`	`310`	`that there is no need to acquire an SIREAD lock on a tuple when the`
`308`	`311`	`transaction already holds a write lock on any tuple representing the`
`309`		`-row, since a rw-dependency would also create a ww-dependency which`
`310`		`-has more aggressive enforcement andwillthus prevent any anomaly.`
	`312`	`+row, since a rw-conflict would also create a ww-dependency which`
	`313`	`+has more aggressive enforcement and thus will prevent any anomaly.`
`311`	`314`
`312`	`315`	`* Modifying a heap tuple creates a rw-conflict with any transaction`
`313`	`316`	`that holds a SIREAD lock on that tuple, or on the page or relation`
`@@ -341,13 +344,13 @@ need not generate a conflict, although an update which "moves" a row`
`341`	`344`	`into the scan must generate a conflict. While correctness allows`
`342`	`345`	`false positives, they should be minimized for performance reasons.`
`343`	`346`
`344`		`-Several optimizations are possible, though not all implemented yet:`
	`347`	`+Several optimizations are possible, though not allareimplemented yet:`
`345`	`348`
`346`	`349`	`* An index scan which is just finding the right position for an`
`347`		`-index insertion or deletionneeds not acquire a predicate lock.`
	`350`	`+index insertion or deletionneed not acquire a predicate lock.`
`348`	`351`
`349`	`352`	`* An index scan which is comparing for equality on the entire key`
`350`		`-for a unique indexneeds not acquire a predicate lock as long as a key`
	`353`	`+for a unique indexneed not acquire a predicate lock as long as a key`
`351`	`354`	`is found corresponding to a visible tuple which has not been modified`
`352`	`355`	`by another transaction -- there are no "between or around" gaps to`
`353`	`356`	`cover.`
`@@ -362,6 +365,9 @@ x = 1 AND x = 2), then no predicate lock is needed.`
`362`	`365`
`363`	`366`	`Other index AM implementation considerations:`
`364`	`367`
	`368`	`+ * For an index AM that doesn't have support for predicate locking,`
	`369`	`+we just acquire a predicate lock on the whole index for any search.`
	`370`	`+`
`365`	`371`	`* B-tree index searches acquire predicate locks only on the`
`366`	`372`	`index leaf pages needed to lock the appropriate index range. If,`
`367`	`373`	`however, a search discovers that no root page has yet been created, a`
`@@ -395,8 +401,8 @@ tracking SIREAD locks.`
`395`	`401`	`any length of time; lock information is written to the tuples`
`396`	`402`	`involved in the transactions.`
`397`	`403`	`* In PostgreSQL, existing lock structures have pointers to`
`398`		`-memory which is related to aconnection. SIREAD locks need to persist`
`399`		`-past the end of the originating transaction and even theconnection`
	`404`	`+memory which is related to asession. SIREAD locks need to persist`
	`405`	`+past the end of the originating transaction and even thesession`
`400`	`406`	`which ran it.`
`401`	`407`	`* PostgreSQL needs to be able to tolerate a large number of`
`402`	`408`	`transactions executing while one long-running transaction stays open`
`@@ -411,7 +417,8 @@ isolation level distinct from snapshot isolation.`
`411`	`417`	`in the papers.`
`412`	`418`
`413`	`419`	`5. PostgreSQL doesn't assign a transaction number to a database`
`414`		`-transaction until and unless necessary.`
	`420`	`+transaction until and unless necessary (normally, when the transaction`
	`421`	`+attempts to modify data).`
`415`	`422`
`416`	`423`	`6. PostgreSQL has pluggable data types with user-definable`
`417`	`424`	`operators, as well as pluggable index types, not all of which are`
`@@ -453,42 +460,46 @@ versions of the row, based on the following proof that any additional`
`453`	`460`	`serialization failures we would get from that would be false`
`454`	`461`	`positives:`
`455`	`462`
`456`		`- o If transaction T1 reads a row (thus acquiring a predicate`
`457`		`-lock on it) and a second transaction T2 updates that row, must a`
`458`		`-third transaction T3 which updates the new version of the row have a`
`459`		`-rw-conflict in from T1 to prevent anomalies? In other words, does it`
`460`		`-matter whether this edge T1 -> T3 is there?`
	`463`	`+ o If transaction T1 reads a row version (thus acquiring a`
	`464`	`+predicate lock on it) and a second transaction T2 updates that row`
	`465`	`+version (thus creating a rw-conflict graph edge from T1 to T2), must a`
	`466`	`+third transaction T3 which re-updates the new version of the row also`
	`467`	`+have a rw-conflict in from T1 to prevent anomalies? In other words,`
	`468`	`+does it matter whether we recognize the edge T1 -> T3?`
`461`	`469`
`462`	`470`	`o If T1 has a conflict in, it certainly doesn't. Adding the`
`463`	`471`	`edge T1 -> T3 would create a dangerous structure, but we already had`
`464`		`-one from the edge T1 -> T2, so we would have aborted something`
`465`		`-anyway.`
	`472`	`+one from the edge T1 -> T2, so we would have aborted something anyway.`
	`473`	`+(T2 has already committed, else T3 could not have updated its output;`
	`474`	`+but we would have aborted either T1 or T1's predecessor(s). Hence`
	`475`	`+no cycle involving T1 and T3 can survive.)`
`466`	`476`
`467`	`477`	`o Now let's consider the case where T1 doesn't have a`
`468`		`-conflict in. If that's the case, for this edge T1 -> T3 to make a`
`469`		`-difference, T3 must have a rw-conflict out that induces a cycle in`
`470`		`-thedependency graph, i.e. a conflict out to some transaction`
`471`		`-preceding T1in theserial order. (A conflict out to T1 wouldwork`
`472`		`-too,but that would mean T1 has a conflict in andwewould have`
`473`		`-rolled back.)`
	`478`	`+rw-conflict in. If that's the case, for this edge T1 -> T3 to make a`
	`479`	`+difference, T3 must have a rw-conflict out that induces a cycle in the`
	`480`	`+dependency graph, i.e. a conflict out to some transaction preceding T1`
	`481`	`+in thegraph. (A conflict out to T1itselfwouldbe problematic too,`
	`482`	`+but that would mean T1 has a conflict in, the casewealready`
	`483`	`+eliminated.)`
`474`	`484`
`475`	`485`	`o So now we're trying to figure out if there can be an`
`476`	`486`	`rw-conflict edge T3 -> T0, where T0 is some transaction that precedes`
`477`		`-T1. For T0 to precede T1, there has to be has to be some edge, or`
`478`		`-sequence of edges, from T0 to T1. At least the last edge has to be a`
`479`		`-wr-dependency or ww-dependency rather than a rw-conflict, because T1`
`480`		`-doesn't have a rw-conflict in. And that gives us enough information`
`481`		`-about the order of transactions to see that T3 can't have a`
`482`		`-rw-dependency to T0:`
	`487`	`+T1. For T0 to precede T1, there has to be some edge, or sequence of`
	`488`	`+edges, from T0 to T1. At least the last edge has to be a wr-dependency`
	`489`	`+or ww-dependency rather than a rw-conflict, because T1 doesn't have a`
	`490`	`+rw-conflict in. And that gives us enough information about the order`
	`491`	`+of transactions to see that T3 can't have a rw-conflict to T0:`
`483`	`492`	`- T0 committed before T1 started (the wr/ww-dependency implies this)`
`484`	`493`	`- T1 started before T2 committed (the T1->T2 rw-conflict implies this)`
`485`		`- - T2 committed before T3 started (otherwise, T3 wouldbe aborted`
	`494`	`+ - T2 committed before T3 started (otherwise, T3 wouldget aborted`
`486`	`495`	`because of an update conflict)`
`487`	`496`
`488`	`497`	`o That means T0 committed before T3 started, and therefore`
`489`	`498`	`there can't be a rw-conflict from T3 to T0.`
`490`	`499`
`491`		`- o In both cases, we didn't need the T1 -> T3 edge.`
	`500`	`+ o So in all cases, we don't need the T1 -> T3 edge to`
	`501`	`+recognize cycles. Therefore it's not necessary for T1's SIREAD lock`
	`502`	`+on the original tuple version to cover later versions as well.`
`492`	`503`
`493`	`504`	`* Predicate locking in PostgreSQL starts at the tuple level`
`494`	`505`	`when possible. Multiple fine-grained locks are promoted to a single`
`@@ -520,17 +531,20 @@ NULL to indicate no conflict and a self-reference to indicate`
`520`	`531`	`multiple conflicts or conflicts with committed transactions, we use a`
`521`	`532`	`list of rw-conflicts. With the more complete information, false`
`522`	`533`	`positives are reduced and we have sufficient data for more aggressive`
`523`		`-clean-up and other optimizations.`
	`534`	`+clean-up and other optimizations:`
	`535`	`+`
`524`	`536`	`o We can avoid ever rolling back a transaction until and`
`525`	`537`	`unless there is a pivot where a transaction on the conflict out`
`526`	`538`	`side of the pivot committed before either of the other transactions.`
	`539`	`+`
`527`	`540`	`o We can avoid ever rolling back a transaction when the`
`528`	`541`	`transaction on the conflict in side of the pivot is explicitly or`
`529`	`542`	`implicitly READ ONLY unless the transaction on the conflict out`
`530`	`543`	`side of the pivot committed before the READ ONLY transaction acquired`
`531`	`544`	`its snapshot. (An implicit READ ONLY transaction is one which`
`532`	`545`	`committed without writing, even though it was not explicitly declared`
`533`	`546`	`to be READ ONLY.)`
	`547`	`+`
`534`	`548`	`o We can more aggressively clean up conflicts, predicate`
`535`	`549`	`locks, and SSI transaction information.`
`536`	`550`
`@@ -543,7 +557,7 @@ overlapping transaction dependencies.`
`543`	`557`	`until the conditions are right for it to start in the "opt out" state`
`544`	`558`	`described above. We add a DEFERRABLE state to transactions, which is`
`545`	`559`	`specified and maintained in a way similar to READ ONLY. It is`
`546`		`-ignored for transactionswhich are not SERIALIZABLE and READ ONLY.`
	`560`	`+ignored for transactionsthat are not SERIALIZABLE and READ ONLY.`
`547`	`561`
`548`	`562`	`* When a transaction must be rolled back, we pick among the`
`549`	`563`	`active transactions such that an immediate retry will not fail again`
`@@ -593,8 +607,8 @@ might never be touched, or should we keep adding returned items to`
`593`	`607`	`the end of the available list?`
`594`	`608`
`595`	`609`
`596`		`-Footnotes`
`597`		`----------`
	`610`	`+References`
	`611`	`+----------`
`598`	`612`
`599`	`613`	`[1] http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt`
`600`	`614`	`Search for serial execution to find the relevant section.`

0 commit comments

Comments

(0)

Movatterモバイル変換

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Commita3290f6

File tree

1 file changed

1 file changed

`‎src/backend/storage/lmgr/README-SSI‎`

0 commit comments