forked frompostgres/postgres
- Notifications
You must be signed in to change notification settings - Fork6
Commit0fc1af1
committed
Improve memory management in regex compiler.
The previous logic here created a separate pool of arcs for eachstate, so that the out-arcs of each state were physically storedwithin it. Perhaps this choice was driven by trying to not includea "from" pointer within each arc; but Spencer gave up on that idealong ago, and it's hard to see what the value is now. The approachturns out to be fairly disastrous in terms of memory consumption,though. In the first place, NFAs built by this engine seem to haveabout 4 arcs per state on average, with a majority having only oneor two out-arcs. So pre-allocating 10 out-arcs for each state isalready cause for a factor of two or more bloat. Worse, the NFAoptimization phase moves arcs around with abandon. In a large NFA,some of the states will have hundreds of out-arcs, so towards theend of the optimization phase we have a significant number of stateswhose arc pools have room for hundreds of arcs each, even though onlya few of those arcs are in use. We have seen real-world regexes inwhich this effect bloats the memory requirement by 25X or even more.Hence, get rid of the per-state arc pools in favor of a single arcpool for the whole NFA, with variable-sized allocation batchesinstead of always asking for 10 at a time. While we're at it,let's batch the allocations of state structs too, to further reducethe malloc traffic.This incidentally allows moveouts() to be optimized in a similarway to moveins(): when moving an arc to another state, it's nowvalid to just re-link the same arc struct into a different outchain,where before the code invariants required us to make a physicallynew arc and then free the old one.These changes reduce the regex compiler's typical space consumptionfor average-size regexes by about a factor of two, and much more forlarge or complicated regexes. In a large test set of real-worldregexes, we formerly had half a dozen cases that failed with "regularexpression too complex" due to exceeding the REG_MAX_COMPILE_SPACElimit (about 150MB); we would have had to raise that limit tosomething close to 400MB to make them work with the old code. Now,none of those cases need more than 13MB to compile. Furthermore,the test set is about 10% faster overall due to less malloc traffic.Discussion:https://postgr.es/m/168861.1614298592@sss.pgh.pa.us1 parentb3a9e98 commit0fc1af1
File tree
3 files changed
+174
-119
lines changed- src
- backend/regex
- include/regex
3 files changed
+174
-119
lines changedLines changed: 137 additions & 99 deletions
Original file line number | Diff line number | Diff line change | |
---|---|---|---|
| |||
57 | 57 |
| |
58 | 58 |
| |
59 | 59 |
| |
| 60 | + | |
60 | 61 |
| |
61 | 62 |
| |
62 |
| - | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
63 | 69 |
| |
64 | 70 |
| |
65 | 71 |
| |
| |||
68 | 74 |
| |
69 | 75 |
| |
70 | 76 |
| |
| 77 | + | |
| 78 | + | |
71 | 79 |
| |
72 | 80 |
| |
73 |
| - | |
74 | 81 |
| |
75 | 82 |
| |
76 | 83 |
| |
| |||
99 | 106 |
| |
100 | 107 |
| |
101 | 108 |
| |
102 |
| - | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
| 112 | + | |
103 | 113 |
| |
104 |
| - | |
| 114 | + | |
105 | 115 |
| |
106 |
| - | |
107 |
| - | |
| 116 | + | |
| 117 | + | |
| 118 | + | |
108 | 119 |
| |
109 |
| - | |
| 120 | + | |
| 121 | + | |
110 | 122 |
| |
111 |
| - | |
112 |
| - | |
| 123 | + | |
| 124 | + | |
| 125 | + | |
113 | 126 |
| |
| 127 | + | |
114 | 128 |
| |
115 |
| - | |
116 | 129 |
| |
117 |
| - | |
118 |
| - | |
119 | 130 |
| |
120 | 131 |
| |
121 | 132 |
| |
| |||
138 | 149 |
| |
139 | 150 |
| |
140 | 151 |
| |
141 |
| - | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
| 157 | + | |
| 158 | + | |
| 159 | + | |
142 | 160 |
| |
143 |
| - | |
144 |
| - | |
| 161 | + | |
145 | 162 |
| |
| 163 | + | |
146 | 164 |
| |
147 | 165 |
| |
| 166 | + | |
| 167 | + | |
| 168 | + | |
148 | 169 |
| |
149 | 170 |
| |
150 | 171 |
| |
151 | 172 |
| |
152 | 173 |
| |
153 |
| - | |
154 |
| - | |
| 174 | + | |
| 175 | + | |
| 176 | + | |
| 177 | + | |
| 178 | + | |
155 | 179 |
| |
156 | 180 |
| |
157 | 181 |
| |
158 | 182 |
| |
159 |
| - | |
160 |
| - | |
161 |
| - | |
162 |
| - | |
| 183 | + | |
| 184 | + | |
| 185 | + | |
| 186 | + | |
| 187 | + | |
| 188 | + | |
163 | 189 |
| |
164 | 190 |
| |
165 | 191 |
| |
| |||
240 | 266 |
| |
241 | 267 |
| |
242 | 268 |
| |
243 |
| - | |
244 |
| - | |
245 |
| - | |
246 |
| - | |
247 |
| - | |
248 |
| - | |
249 |
| - | |
250 |
| - | |
251 |
| - | |
252 |
| - | |
253 |
| - | |
254 |
| - | |
255 |
| - | |
256 |
| - | |
257 |
| - | |
258 |
| - | |
259 |
| - | |
260 |
| - | |
261 |
| - | |
262 |
| - | |
263 |
| - | |
264 |
| - | |
265 |
| - | |
266 |
| - | |
267 |
| - | |
268 |
| - | |
| 269 | + | |
| 270 | + | |
269 | 271 |
| |
270 | 272 |
| |
271 | 273 |
| |
| |||
334 | 336 |
| |
335 | 337 |
| |
336 | 338 |
| |
337 |
| - | |
338 |
| - | |
| 339 | + | |
339 | 340 |
| |
340 | 341 |
| |
341 | 342 |
| |
| |||
369 | 370 |
| |
370 | 371 |
| |
371 | 372 |
| |
372 |
| - | |
| 373 | + | |
373 | 374 |
| |
374 | 375 |
| |
375 |
| - | |
376 |
| - | |
| 376 | + | |
377 | 377 |
| |
378 | 378 |
| |
379 | 379 |
| |
380 |
| - | |
381 |
| - | |
| 380 | + | |
| 381 | + | |
382 | 382 |
| |
383 |
| - | |
384 |
| - | |
385 |
| - | |
| 383 | + | |
| 384 | + | |
386 | 385 |
| |
387 |
| - | |
388 |
| - | |
389 |
| - | |
| 386 | + | |
| 387 | + | |
| 388 | + | |
| 389 | + | |
| 390 | + | |
| 391 | + | |
| 392 | + | |
390 | 393 |
| |
391 | 394 |
| |
392 |
| - | |
| 395 | + | |
393 | 396 |
| |
394 | 397 |
| |
395 | 398 |
| |
396 | 399 |
| |
397 | 400 |
| |
398 | 401 |
| |
399 |
| - | |
| 402 | + | |
| 403 | + | |
| 404 | + | |
| 405 | + | |
400 | 406 |
| |
401 | 407 |
| |
402 | 408 |
| |
403 | 409 |
| |
404 | 410 |
| |
405 |
| - | |
406 |
| - | |
407 |
| - | |
408 |
| - | |
409 |
| - | |
410 |
| - | |
411 |
| - | |
412 |
| - | |
413 |
| - | |
414 |
| - | |
415 |
| - | |
| 411 | + | |
| 412 | + | |
| 413 | + | |
| 414 | + | |
| 415 | + | |
| 416 | + | |
416 | 417 |
| |
417 |
| - | |
418 | 418 |
| |
419 |
| - | |
420 |
| - | |
421 | 419 |
| |
422 | 420 |
| |
423 | 421 |
| |
| |||
478 | 476 |
| |
479 | 477 |
| |
480 | 478 |
| |
481 |
| - | |
| 479 | + | |
482 | 480 |
| |
483 | 481 |
| |
484 | 482 |
| |
485 | 483 |
| |
486 | 484 |
| |
487 | 485 |
| |
488 | 486 |
| |
489 |
| - | |
490 |
| - | |
| 487 | + | |
| 488 | + | |
491 | 489 |
| |
492 | 490 |
| |
493 | 491 |
| |
494 |
| - | |
| 492 | + | |
495 | 493 |
| |
496 | 494 |
| |
| 495 | + | |
| 496 | + | |
| 497 | + | |
| 498 | + | |
| 499 | + | |
| 500 | + | |
| 501 | + | |
| 502 | + | |
| 503 | + | |
| 504 | + | |
| 505 | + | |
| 506 | + | |
| 507 | + | |
| 508 | + | |
| 509 | + | |
| 510 | + | |
| 511 | + | |
| 512 | + | |
| 513 | + | |
| 514 | + | |
| 515 | + | |
| 516 | + | |
| 517 | + | |
| 518 | + | |
| 519 | + | |
| 520 | + | |
| 521 | + | |
| 522 | + | |
| 523 | + | |
| 524 | + | |
| 525 | + | |
| 526 | + | |
| 527 | + | |
| 528 | + | |
| 529 | + | |
| 530 | + | |
| 531 | + | |
| 532 | + | |
| 533 | + | |
| 534 | + | |
| 535 | + | |
| 536 | + | |
497 | 537 |
| |
498 |
| - | |
499 |
| - | |
| 538 | + | |
500 | 539 |
| |
501 | 540 |
| |
502 | 541 |
| |
| |||
1009 | 1048 |
| |
1010 | 1049 |
| |
1011 | 1050 |
| |
| 1051 | + | |
| 1052 | + | |
1012 | 1053 |
| |
1013 | 1054 |
| |
1014 | 1055 |
| |
| |||
1031 | 1072 |
| |
1032 | 1073 |
| |
1033 | 1074 |
| |
1034 |
| - | |
1035 |
| - | |
1036 |
| - | |
| 1075 | + | |
| 1076 | + | |
| 1077 | + | |
1037 | 1078 |
| |
1038 | 1079 |
| |
1039 | 1080 |
| |
| |||
1063 | 1104 |
| |
1064 | 1105 |
| |
1065 | 1106 |
| |
1066 |
| - | |
1067 |
| - | |
| 1107 | + | |
| 1108 | + | |
| 1109 | + | |
| 1110 | + | |
| 1111 | + | |
| 1112 | + | |
1068 | 1113 |
| |
1069 | 1114 |
| |
1070 | 1115 |
| |
| |||
1087 | 1132 |
| |
1088 | 1133 |
| |
1089 | 1134 |
| |
1090 |
| - | |
1091 |
| - | |
| 1135 | + | |
1092 | 1136 |
| |
1093 | 1137 |
| |
1094 | 1138 |
| |
| |||
3413 | 3457 |
| |
3414 | 3458 |
| |
3415 | 3459 |
| |
3416 |
| - | |
3417 | 3460 |
| |
3418 | 3461 |
| |
3419 | 3462 |
| |
| |||
3451 | 3494 |
| |
3452 | 3495 |
| |
3453 | 3496 |
| |
3454 |
| - | |
3455 |
| - | |
3456 |
| - | |
3457 |
| - | |
3458 |
| - | |
3459 |
| - | |
| 3497 | + | |
| 3498 | + | |
3460 | 3499 |
| |
3461 |
| - | |
3462 |
| - | |
3463 |
| - | |
| 3500 | + | |
| 3501 | + | |
3464 | 3502 |
| |
3465 | 3503 |
| |
3466 | 3504 |
| |
|
0 commit comments
Comments
(0)