|
1 |
| -$PostgreSQL: pgsql/src/backend/access/nbtree/README,v 1.13 2006/07/25 19:13:00 tgl Exp $ |
| 1 | +$PostgreSQL: pgsql/src/backend/access/nbtree/README,v 1.14 2006/11/01 19:43:17 tgl Exp $ |
2 | 2 |
|
3 | 3 | This directory contains a correct implementation of Lehman and Yao's
|
4 | 4 | high-concurrency B-tree management algorithm (P. Lehman and S. Yao,
|
@@ -201,26 +201,25 @@ When we delete the last remaining child of a parent page, we mark the
|
201 | 201 | parent page "half-dead" as part of the atomic update that deletes the
|
202 | 202 | child page. This implicitly transfers the parent's key space to its right
|
203 | 203 | sibling (which it must have, since we never delete the overall-rightmost
|
204 |
| -page of a level). No future insertions into the parent level are allowed |
205 |
| -to insert keys into the half-dead page --- they must move right to its |
206 |
| -sibling, instead. The parent remains empty and can be deleted in a |
207 |
| -separate atomic action. (However, if it's the rightmost child of its own |
208 |
| -parent, it might have to stay half-dead for awhile, until it's also the |
209 |
| -only child.) |
210 |
| - |
211 |
| -Note that an empty leaf page is a valid tree state, but an empty interior |
212 |
| -page is not legal (an interior page must have children to delegate its |
213 |
| -key space to). So an interior page *must* be marked half-dead as soon |
214 |
| -as its last child is deleted. |
| 204 | +page of a level). Searches ignore the half-dead page and immediately move |
| 205 | +right. We need not worry about insertions into a half-dead page --- insertions |
| 206 | +into upper tree levels happen only as a result of splits of child pages, and |
| 207 | +the half-dead page no longer has any children that could split. Therefore |
| 208 | +the page stays empty even when we don't have lock on it, and we can complete |
| 209 | +its deletion in a second atomic action. |
215 | 210 |
|
216 | 211 | The notion of a half-dead page means that the key space relationship between
|
217 | 212 | the half-dead page's level and its parent's level may be a little out of
|
218 | 213 | whack: key space that appears to belong to the half-dead page's parent on the
|
219 |
| -parent level may really belong to its right sibling. We can tolerate this, |
220 |
| -however, because insertions and deletions on upper tree levels are always |
221 |
| -done by reference to child page numbers, not keys. The only cost is that |
222 |
| -searches may sometimes descend to the half-dead page and then have to move |
223 |
| -right, rather than going directly to the sibling page. |
| 214 | +parent level may really belong to its right sibling. To prevent any possible |
| 215 | +problems, we hold lock on the deleted child page until we have finished |
| 216 | +deleting any now-half-dead parent page(s). This prevents any insertions into |
| 217 | +the transferred keyspace until the operation is complete. The reason for |
| 218 | +doing this is that a sufficiently large number of insertions into the |
| 219 | +transferred keyspace, resulting in multiple page splits, could propagate keys |
| 220 | +from that keyspace into the parent level, resulting in transiently |
| 221 | +out-of-order keys in that level. It is thought that that wouldn't cause any |
| 222 | +serious problem, but it seems too risky to allow. |
224 | 223 |
|
225 | 224 | A deleted page cannot be reclaimed immediately, since there may be other
|
226 | 225 | processes waiting to reference it (ie, search processes that just left the
|
|