|
13 | 13 | * See Knuth, volume 3, for more than you want to know about the external
|
14 | 14 | * sorting algorithm. Historically, we divided the input into sorted runs
|
15 | 15 | * using replacement selection, in the form of a priority tree implemented
|
16 |
| - * as a heap (essentially his Algorithm 5.2.3H -- although that strategy is |
17 |
| - *often avoided altogether), but that can now only happen first the first |
18 |
| - *run. We merge the runs using polyphase merge, Knuth's Algorithm |
| 16 | + * as a heap (essentially his Algorithm 5.2.3H), but now we only do that |
| 17 | + *for the first run, and only if the run would otherwise end up being very |
| 18 | + *short. We merge the runs using polyphase merge, Knuth's Algorithm |
19 | 19 | * 5.4.2D. The logical "tapes" used by Algorithm D are implemented by
|
20 | 20 | * logtape.c, which avoids space wastage by recycling disk space as soon
|
21 | 21 | * as each block is read from its "tape".
|
22 | 22 | *
|
23 |
| - * Wenever form the initial runs usingKnuth's recommendedreplacement |
24 |
| - *selection data structure (Algorithm 5.4.1R), because it uses a fixed |
25 |
| - *number of recordsin memory at all times. Since we are dealing with |
26 |
| - *tuples that may varyconsiderably in size, we want to be able to vary |
27 |
| - *the number of recordskept in memory to ensure full utilization of the |
28 |
| - *allowed sort memoryspace. So, we keep the tuples in a variable-size |
29 |
| - *heap, with the nextrecord to go out at the top of the heap. Like |
30 |
| - *Algorithm 5.4.1R, eachrecord is stored with the run number that it |
31 |
| - *must go into, and we use(run number, key) as the ordering key for the |
32 |
| - *heap. When the run numberat the top of the heap changes, we know that |
33 |
| - *no more records of the priorrun are left in the heap. Note that there |
34 |
| - *are in practice only ever twodistinct run numbers,due to the greatly |
35 |
| - *reduced use ofreplacement selectionin PostgreSQL 9.6. |
| 23 | + * Wedo not useKnuth's recommendeddata structure (Algorithm 5.4.1R) for |
| 24 | + *the replacement selection, because it uses a fixed number of records |
| 25 | + * in memory at all times. Since we are dealing with tuples that may vary |
| 26 | + * considerably in size, we want to be able to vary the number of records |
| 27 | + * kept in memory to ensure full utilization of the allowed sort memory |
| 28 | + * space. So, we keep the tuples in a variable-size heap, with the next |
| 29 | + * record to go out at the top of the heap. Like Algorithm 5.4.1R, each |
| 30 | + * record is stored with the run number that it must go into, and we use |
| 31 | + * (run number, key) as the ordering key for the heap. When the run number |
| 32 | + * at the top of the heap changes, we know that no more records of the prior |
| 33 | + * run are left in the heap. Note that there are in practice only ever two |
| 34 | + * distinct run numbers,because since PostgreSQL 9.6, we only use |
| 35 | + * replacement selectionto form the first run. |
36 | 36 | *
|
37 | 37 | * In PostgreSQL 9.6, a heap (based on Knuth's Algorithm H, with some small
|
38 | 38 | * customizations) is only used with the aim of producing just one run,
|
|