forked frompostgres/postgres
- Notifications
You must be signed in to change notification settings - Fork6
Commitcebc1d3
committed
Fix regex engine to suppress useless concatenation sub-REs.
The comment for parsebranch() claims that it avoids generatingunnecessary concatenation nodes in the "subre" tree, but it missedsome significant cases. Once we've decided that a given atom is"messy" and can't be bundled with the preceding atom(s) of thecurrent regex branch, parseqatom() always generated two new concatnodes, one to concat the messy atom to what follows it in the branch,and an upper node to concatenate the preceding part of the branchto that one. But one or both of these could be unnecessary, if themessy atom is the first, last, or only one in the branch. Improvethe code to suppress such useless concat nodes, along with theno-op child nodes representing empty chunks of a branch.Reducing the number of subre tree nodes offers significant savingsnot only at execution but during compilation, because each subre nodehas its own NFA that has to be separately optimized. (Maybe somedaywe'll figure out how to share the optimization work across multipletree nodes, but it doesn't look easy.) Eliminating upper tree nodesis especially useful because they tend to have larger NFAs.This is part of a patch series that in total reduces the regex engine'sruntime by about a factor of four on a large corpus of real-world regexes.Patch by me, reviewed by Joel JacobsonDiscussion:https://postgr.es/m/1340281.1613018383@sss.pgh.pa.us1 parent824bf71 commitcebc1d3
1 file changed
+76
-7
lines changedLines changed: 76 additions & 7 deletions
Original file line number | Diff line number | Diff line change | |
---|---|---|---|
| |||
1123 | 1123 |
| |
1124 | 1124 |
| |
1125 | 1125 |
| |
1126 |
| - | |
| 1126 | + | |
| 1127 | + | |
| 1128 | + | |
| 1129 | + | |
1127 | 1130 |
| |
1128 |
| - | |
| 1131 | + | |
| 1132 | + | |
| 1133 | + | |
| 1134 | + | |
| 1135 | + | |
| 1136 | + | |
| 1137 | + | |
1129 | 1138 |
| |
1130 | 1139 |
| |
1131 | 1140 |
| |
1132 | 1141 |
| |
1133 | 1142 |
| |
| 1143 | + | |
1134 | 1144 |
| |
1135 | 1145 |
| |
1136 | 1146 |
| |
| |||
1220 | 1230 |
| |
1221 | 1231 |
| |
1222 | 1232 |
| |
| 1233 | + | |
| 1234 | + | |
1223 | 1235 |
| |
| 1236 | + | |
| 1237 | + | |
| 1238 | + | |
| 1239 | + | |
| 1240 | + | |
| 1241 | + | |
| 1242 | + | |
| 1243 | + | |
| 1244 | + | |
| 1245 | + | |
| 1246 | + | |
| 1247 | + | |
| 1248 | + | |
| 1249 | + | |
| 1250 | + | |
| 1251 | + | |
| 1252 | + | |
| 1253 | + | |
| 1254 | + | |
| 1255 | + | |
| 1256 | + | |
| 1257 | + | |
| 1258 | + | |
| 1259 | + | |
| 1260 | + | |
| 1261 | + | |
| 1262 | + | |
| 1263 | + | |
| 1264 | + | |
1224 | 1265 |
| |
1225 | 1266 |
| |
| 1267 | + | |
| 1268 | + | |
| 1269 | + | |
| 1270 | + | |
1226 | 1271 |
| |
1227 |
| - | |
| 1272 | + | |
| 1273 | + | |
| 1274 | + | |
| 1275 | + | |
| 1276 | + | |
| 1277 | + | |
| 1278 | + | |
| 1279 | + | |
| 1280 | + | |
| 1281 | + | |
| 1282 | + | |
| 1283 | + | |
| 1284 | + | |
| 1285 | + | |
| 1286 | + | |
| 1287 | + | |
| 1288 | + | |
| 1289 | + | |
| 1290 | + | |
| 1291 | + | |
| 1292 | + | |
| 1293 | + | |
| 1294 | + | |
| 1295 | + | |
| 1296 | + | |
| 1297 | + | |
| 1298 | + | |
| 1299 | + | |
| 1300 | + | |
1228 | 1301 |
| |
1229 |
| - | |
1230 |
| - | |
1231 |
| - | |
1232 |
| - | |
1233 | 1302 |
| |
1234 | 1303 |
| |
1235 | 1304 |
| |
|
0 commit comments
Comments
(0)