|
1 | | -$Header: /cvsroot/pgsql/src/backend/utils/mmgr/README,v 1.3 2001/02/15 21:38:26 tgl Exp $ |
| 1 | +$Header: /cvsroot/pgsql/src/backend/utils/mmgr/README,v 1.4 2003/04/30 19:04:12 tgl Exp $ |
2 | 2 |
|
3 | 3 | Notes about memory allocation redesign |
4 | 4 | -------------------------------------- |
@@ -110,109 +110,121 @@ children of a given context, but don't reset or delete that context |
110 | 110 | itself". |
111 | 111 |
|
112 | 112 |
|
113 | | -Top-level contexts |
114 | | ------------------- |
| 113 | +Globally known contexts |
| 114 | +----------------------- |
115 | 115 |
|
116 | | -There will be several top-level contexts --- these contexts have no parent
117 | | -and will be referenced by global variables. At any instant the system may
| 116 | +There will be several widely-known contexts that will typically be
| 117 | +referenced through global variables. At any instant the system may
118 | 118 | contain many additional contexts, but all other contexts should be direct |
119 | | -or indirect children of one of the top-level contexts to ensure they are |
120 | | -not leaked in event of an error. I presently envision these top-level |
121 | | -contexts: |
122 | | - |
123 | | -TopMemoryContext --- allocating here is essentially the same as "malloc", |
124 | | -because this context will never be reset or deleted. This is for stuff |
125 | | -that should live forever, or for stuff that you know you will delete |
126 | | -at the appropriate time. An example is fd.c's tables of open files, |
127 | | -as well as the context management nodes for memory contexts themselves. |
128 | | -Avoid allocating stuff here unless really necessary, and especially |
129 | | -avoid running with CurrentMemoryContext pointing here. |
| 119 | +or indirect children of one of these contexts to ensure they are not |
| 120 | +leaked in event of an error. |
| 121 | + |
| 122 | +TopMemoryContext --- this is the actual top level of the context tree; |
| 123 | +every other context is a direct or indirect child of this one. Allocating |
| 124 | +here is essentially the same as "malloc", because this context will never |
| 125 | +be reset or deleted. This is for stuff that should live forever, or for |
| 126 | +stuff that the controlling module will take care of deleting at the |
| 127 | +appropriate time. An example is fd.c's tables of open files, as well as |
| 128 | +the context management nodes for memory contexts themselves. Avoid |
| 129 | +allocating stuff here unless really necessary, and especially avoid |
| 130 | +running with CurrentMemoryContext pointing here. |
130 | 131 |
|
131 | 132 | PostmasterContext --- this is the postmaster's normal working context. |
132 | 133 | After a backend is spawned, it can delete PostmasterContext to free its |
133 | 134 | copy of memory the postmaster was using that it doesn't need. (Anything |
134 | 135 | that has to be passed from postmaster to backends will be passed in |
135 | | -TopMemoryContext. The postmaster will probably have only TopMemoryContext,
136 | | -PostmasterContext, and possibly ErrorContext --- the remaining top-level
137 | | -contexts will be set up in each backend during startup.)
| 136 | +TopMemoryContext. The postmaster will have only TopMemoryContext, |
| 137 | +PostmasterContext, and ErrorContext --- the remaining top-level contexts |
| 138 | +will be set up in each backend during startup.) |
138 | 139 |
|
139 | 140 | CacheMemoryContext --- permanent storage for relcache, catcache, and |
140 | 141 | related modules. This will never be reset or deleted, either, so it's |
141 | 142 | not truly necessary to distinguish it from TopMemoryContext. But it |
142 | 143 | seems worthwhile to maintain the distinction for debugging purposes. |
143 | | -(Note: CacheMemoryContext may well have child-contexts with shorter |
144 | | -lifespans. For example, a child context seems like the best place to |
145 | | -keep the subsidiary storage associated with a relcache entry; that way |
146 | | -we can free rule parsetrees and so forth easily, without having to depend |
147 | | -on constructing a reliable version of freeObject().) |
148 | | - |
149 | | -QueryContext --- this is where the storage holding a received query string |
150 | | -is kept, as well as storage that should live as long as the query string, |
151 | | -notably the parsetree constructed from it. This context will be reset at |
152 | | -the top of each cycle of the outer loop of PostgresMain, thereby freeing |
153 | | -the old query and parsetree. We must keep this separate from |
154 | | -TopTransactionContext because a query string might need to live either a |
155 | | -longer or shorter time than a transaction, depending on whether it |
156 | | -contains begin/end commands or not. (This'll also fix the nasty bug that |
157 | | -"vacuum; anything else" crashes if submitted as a single query string, |
158 | | -because vacuum's xact commit frees the memory holding the parsetree...) |
| 144 | +(Note: CacheMemoryContext will have child-contexts with shorter lifespans. |
| 145 | +For example, a child context is the best place to keep the subsidiary |
| 146 | +storage associated with a relcache entry; that way we can free rule |
| 147 | +parsetrees and so forth easily, without having to depend on constructing |
| 148 | +a reliable version of freeObject().) |
| 149 | + |
| 150 | +MessageContext --- this context holds the current command message from the |
| 151 | +frontend, as well as any derived storage that need only live as long as |
| 152 | +the current message (for example, in simple-Query mode the parse and plan |
| 153 | +trees can live here). This context will be reset, and any children |
| 154 | +deleted, at the top of each cycle of the outer loop of PostgresMain. This |
| 155 | +is kept separate from per-transaction and per-portal contexts because a |
| 156 | +query string might need to live either a longer or shorter time than any |
| 157 | +single transaction or portal. |
159 | 158 |
|
160 | 159 | TopTransactionContext --- this holds everything that lives until end of |
161 | 160 | transaction (longer than one statement within a transaction!). An example |
162 | 161 | of what has to be here is the list of pending NOTIFY messages to be sent |
163 | 162 | at xact commit. This context will be reset, and all its children deleted, |
164 | | -at conclusion of each transaction cycle. Note: presently I envision that |
165 | | -this context will NOT be cleared immediately upon error; its contents |
166 | | -will survive anyway until the transaction block is exited by |
167 | | -COMMIT/ROLLBACK. This seems appropriate since we want to move in the |
168 | | -direction of allowing a transaction to continue processing after an error. |
169 | | - |
170 | | -TransactionCommandContext --- this is really a child of |
171 | | -TopTransactionContext, not a top-level context, but we'll probably store a |
172 | | -link to it in a global variable anyway for convenience. All the memory |
173 | | -allocated during planning and execution lives here or in a child context. |
174 | | -This context is deleted at statement completion, whether normal completion |
175 | | -or error abort. |
176 | | - |
177 | | -ErrorContext --- this permanent context will be switched into |
178 | | -for error recovery processing, and then reset on completion of recovery. |
179 | | -We'll arrange to have, say, 8K of memory available in it at all times. |
180 | | -In this way, we can ensure that some memory is available for error |
181 | | -recovery even if the backend has run out of memory otherwise. This should |
182 | | -allow out-of-memory to be treated as a normal ERROR condition, not a FATAL |
183 | | -error. |
184 | | - |
185 | | -If we ever implement nested transactions, there may need to be some |
186 | | -additional levels of transaction-local contexts between |
187 | | -TopTransactionContext and TransactionCommandContext, but that's beyond |
188 | | -the scope of this proposal. |
| 163 | +at conclusion of each transaction cycle. Note: this context is NOT |
| 164 | +cleared immediately upon error; its contents will survive until the |
| 165 | +transaction block is exited by COMMIT/ROLLBACK. |
| 166 | +(If we ever implement nested transactions, TopTransactionContext may need |
| 167 | +to be split into a true "top" pointer and a "current transaction" pointer.) |
| 168 | + |
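
(Editorial illustration, not part of the diff.) The "reset, and all its
children deleted" behavior described for TopTransactionContext can be
sketched with a toy context tree. This is a minimal model under stated
assumptions: every name here (ToyContext, toy_create, toy_alloc, toy_reset,
toy_delete, toy_live_contexts) is hypothetical and is not PostgreSQL's
actual API; it only shows why hanging short-lived contexts off a
longer-lived parent keeps them from leaking.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only: hypothetical names, not PostgreSQL's memory manager. */
typedef struct Chunk { struct Chunk *next; } Chunk;

typedef struct ToyContext {
    struct ToyContext *parent;
    struct ToyContext *first_child;
    struct ToyContext *next_sibling;
    Chunk *chunks;   /* allocations owned by this context */
    int nchunks;     /* how many chunks are currently live */
} ToyContext;

int toy_live_contexts = 0;   /* bookkeeping for the demonstration */

/* Create a context; a non-NULL parent links it into the tree. */
ToyContext *toy_create(ToyContext *parent) {
    ToyContext *cx = calloc(1, sizeof(*cx));
    assert(cx != NULL);
    cx->parent = parent;
    if (parent) {
        cx->next_sibling = parent->first_child;
        parent->first_child = cx;
    }
    toy_live_contexts++;
    return cx;
}

/* Allocate memory that is owned by (and freed with) the context. */
void *toy_alloc(ToyContext *cx, size_t size) {
    Chunk *c = malloc(sizeof(Chunk) + size);
    assert(c != NULL);
    c->next = cx->chunks;
    cx->chunks = c;
    cx->nchunks++;
    return (void *) (c + 1);
}

void toy_delete(ToyContext *cx);

/* Reset: delete all children, then free this context's own chunks. */
void toy_reset(ToyContext *cx) {
    while (cx->first_child)
        toy_delete(cx->first_child);   /* unlinks itself from cx */
    while (cx->chunks) {
        Chunk *next = cx->chunks->next;
        free(cx->chunks);
        cx->chunks = next;
    }
    cx->nchunks = 0;
}

/* Delete: reset, unlink from the parent, and free the context node. */
void toy_delete(ToyContext *cx) {
    toy_reset(cx);
    if (cx->parent) {
        ToyContext **link = &cx->parent->first_child;
        while (*link != cx)
            link = &(*link)->next_sibling;
        *link = cx->next_sibling;
    }
    free(cx);
    toy_live_contexts--;
}
```

In this model, resetting a per-transaction context also removes any
per-statement child created under it, while the context itself stays
usable for the next cycle.
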
| 169 | +QueryContext --- this is not actually a separate context, but a global |
| 170 | +variable pointing to the context that holds the current command's parse |
| 171 | +and plan trees. (In simple-Query mode this points to MessageContext; |
| 172 | +when executing a prepared statement it will point at the prepared |
| 173 | +statement's private context.) Generally it is not appropriate for any |
| 174 | +code to use QueryContext as an allocation target --- from the point of |
| 175 | +view of any code that would be referencing the QueryContext variable, |
| 176 | +it's a read-only context. |
| 177 | + |
| 178 | +PortalContext --- this is not actually a separate context either, but a |
| 179 | +global variable pointing to the per-portal context of the currently active |
| 180 | +execution portal. This can be used if it's necessary to allocate storage |
| 181 | +that will live just as long as the execution of the current portal requires. |
| 182 | + |
| 183 | +ErrorContext --- this permanent context will be switched into for error |
| 184 | +recovery processing, and then reset on completion of recovery. We'll |
| 185 | +arrange to have, say, 8K of memory available in it at all times. In this |
| 186 | +way, we can ensure that some memory is available for error recovery even |
| 187 | +if the backend has run out of memory otherwise. This allows out-of-memory |
| 188 | +to be treated as a normal ERROR condition, not a FATAL error. |
| 189 | + |
| 190 | + |
| 191 | +Contexts for prepared statements and portals |
| 192 | +-------------------------------------------- |
| 193 | + |
| 194 | +A prepared-statement object has an associated private context, in which |
| 195 | +the parse and plan trees for its query are stored. Because these trees |
| 196 | +are read-only to the executor, the prepared statement can be re-used many |
| 197 | +times without further copying of these trees. QueryContext points at this |
| 198 | +private context while executing any portal built from the prepared |
| 199 | +statement. |
| 200 | + |
| 201 | +An execution-portal object has a private context that is referenced by |
| 202 | +PortalContext when the portal is active. In the case of a portal created |
| 203 | +by DECLARE CURSOR, this private context contains the query parse and plan |
| 204 | +trees (there being no other object that can hold them). Portals created |
| 205 | +from prepared statements simply reference the prepared statements' trees, |
| 206 | +and won't actually need any storage allocated in their private contexts. |
189 | 207 |
|
190 | 208 |
|
191 | 209 | Transient contexts during execution |
192 | 210 | ----------------------------------- |
193 | 211 |
|
194 | | -The planner will probably have a transient context in which it stores |
195 | | -pathnodes; this will allow it to release the bulk of its temporary space |
196 | | -usage (which can be a lot, for large joins) at completion of planning. |
197 | | -The completed plan tree will be in TransactionCommandContext. |
| 212 | +When creating a prepared statement, the parse and plan trees will be built |
| 213 | +in a temporary context that's a child of MessageContext (so that it will |
| 214 | +go away automatically upon error). On success, the finished plan is |
| 215 | +copied to the prepared statement's private context, and the temp context |
| 216 | +is released; this allows planner temporary space to be recovered before |
| 217 | +execution begins. (In simple-Query mode we'll not bother with the extra |
| 218 | +copy step, so the planner temp space stays around till end of query.) |
198 | 219 |
|
199 | 220 | The top-level executor routines, as well as most of the "plan node" |
200 | | -execution code, will normally run in a context with command lifetime. |
201 | | -(This will be TransactionCommandContext for normal queries, but when |
202 | | -executing a cursor, it will be a context associated with the cursor.) |
203 | | -Most of the memory allocated in these routines is intended to live until |
204 | | -end of query, so this is appropriate for those purposes. We already have |
205 | | -a mechanism --- "tuple table slots" --- for avoiding leakage of tuples, |
206 | | -which is the major kind of short-lived data handled by these routines. |
207 | | -This still leaves a certain amount of explicit pfree'ing needed by plan |
208 | | -node code, but that code largely exists already and is probably not worth |
209 | | -trying to remove. I looked at the possibility of running in a shorter- |
210 | | -lived context (such as a context that gets reset per-tuple), but this |
211 | | -seems fairly impractical. The biggest problem with it is that code in |
212 | | -the index access routines, as well as some other complex algorithms like |
213 | | -tuplesort.c, assumes that palloc'd storage will live across tuples. |
214 | | -For example, rtree uses a palloc'd state stack to keep track of an index |
215 | | -scan. |
| 221 | +execution code, will normally run in a context that is created by |
| 222 | +ExecutorStart and destroyed by ExecutorEnd; this context also holds the |
| 223 | +"plan state" tree built during ExecutorStart. Most of the memory |
| 224 | +allocated in these routines is intended to live until end of query, |
| 225 | +so this is appropriate for those purposes. The executor's top context |
| 226 | +is a child of PortalContext, that is, the per-portal context of the |
| 227 | +portal that represents the query's execution. |
216 | 228 |
|
217 | 229 | The main improvement needed in the executor is that expression evaluation |
218 | 230 | --- both for qual testing and for computation of targetlist entries --- |
@@ -277,7 +289,7 @@ be released on error. Currently it does that through a "portal", |
277 | 289 | which is essentially a child context of TopMemoryContext. While that |
278 | 290 | way still works, it's ugly since xact abort needs special processing |
279 | 291 | to delete the portal. Better would be to use a context that's a child |
280 | | -of QueryContext and hence is certain to go away as part of normal
| 292 | +of PortalContext and hence is certain to go away as part of normal
281 | 293 | processing. (Eventually we might have an even better solution from |
282 | 294 | nested transactions, but this'll do fine for now.) |
283 | 295 |
|
@@ -371,12 +383,14 @@ the relcache's per-relation contexts). |
371 | 383 | Also, it will be possible to specify a minimum context size. If this |
372 | 384 | value is greater than zero then a block of that size will be grabbed |
373 | 385 | immediately upon context creation, and cleared but not released during |
374 | | -context resets. This feature is needed for ErrorContext (see above). |
375 | | -It is also useful for per-tuple contexts, which will be reset frequently |
376 | | -and typically will not allocate very much space per tuple cycle. We can |
377 | | -save a lot of unnecessary malloc traffic if these contexts hang onto one |
378 | | -allocation block rather than releasing and reacquiring the block on |
379 | | -each tuple cycle. |
| 386 | +context resets. This feature is needed for ErrorContext (see above), |
| 387 | +but will most likely not be used for other contexts. |
| 388 | + |
| 389 | +We expect that per-tuple contexts will be reset frequently and typically |
| 390 | +will not allocate very much space per tuple cycle. To make this usage |
| 391 | +pattern cheap, the first block allocated in a context is not given |
| 392 | +back to malloc() during reset, but just cleared. This avoids malloc |
| 393 | +thrashing. |
380 | 394 |
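
(Editorial illustration, not part of the diff.) The "first block is not
given back to malloc() during reset" rule can be sketched with a toy
block allocator. This is only a model under stated assumptions: the names
(ToyArena, ToyBlock, toy_arena_alloc, toy_arena_reset, the mallocs counter)
and the fixed 1024-byte block size are hypothetical and do not reflect the
real allocator's data structures. Here "cleared" is modeled as marking the
block empty, not as zeroing it.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_BLOCK_SIZE 1024   /* hypothetical fixed block size */

typedef struct ToyBlock {
    struct ToyBlock *next;
    size_t used;                          /* bytes handed out so far */
    unsigned char data[TOY_BLOCK_SIZE];
} ToyBlock;

typedef struct ToyArena {
    ToyBlock *keeper;    /* first block; survives every reset */
    ToyBlock *current;   /* block currently being filled */
    int mallocs;         /* number of blocks obtained from malloc() */
} ToyArena;

void toy_arena_init(ToyArena *a) {
    a->keeper = a->current = NULL;
    a->mallocs = 0;
}

static ToyBlock *toy_new_block(ToyArena *a) {
    ToyBlock *b = malloc(sizeof(*b));
    assert(b != NULL);
    b->next = NULL;
    b->used = 0;
    a->mallocs++;
    return b;
}

/* Bump-allocate from the current block, adding a block when it is full. */
void *toy_arena_alloc(ToyArena *a, size_t size) {
    assert(size <= TOY_BLOCK_SIZE);
    size = (size + 7) & ~(size_t) 7;      /* round up to 8 bytes */
    if (a->current == NULL) {
        a->keeper = a->current = toy_new_block(a);
    } else if (a->current->used + size > TOY_BLOCK_SIZE) {
        ToyBlock *b = toy_new_block(a);
        a->current->next = b;
        a->current = b;
    }
    void *p = a->current->data + a->current->used;
    a->current->used += size;
    return p;
}

/* Reset: free every block except the first, which is just marked empty. */
void toy_arena_reset(ToyArena *a) {
    if (a->keeper == NULL)
        return;
    ToyBlock *b = a->keeper->next;
    while (b) {
        ToyBlock *next = b->next;
        free(b);
        b = next;
    }
    a->keeper->next = NULL;
    a->keeper->used = 0;
    a->current = a->keeper;
}

/* Destroy: release everything, including the keeper block. */
void toy_arena_destroy(ToyArena *a) {
    toy_arena_reset(a);
    free(a->keeper);
    toy_arena_init(a);
}
```

With this arrangement, a loop that allocates a little and resets once per
tuple touches malloc() only on the first cycle; later cycles reuse the
retained block.
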
|
381 | 395 |
|
382 | 396 | Other notes |
|