Commit 1058555
In the Snowball dictionary, don't try to stem excessively-long words.
If the input word exceeds 1000 bytes, don't pass it to the stemmer; just return it as-is after case folding. Such an input is surely not a word in any human language, so whatever the stemmer might do to it would be pretty dubious in the first place. Adding this restriction protects us against a known recursion-to-stack-overflow problem in the Turkish stemmer, and it seems like good insurance against any other safety or performance issues that may exist in the Snowball stemmers. (I note, for example, that they contain no CHECK_FOR_INTERRUPTS calls, so we really don't want them running for a long time.) The threshold of 1000 bytes is arbitrary.

An alternative definition could have been to treat such words as stopwords, but that seems like a bigger break from the old behavior.

Per report from Egor Chindyaskin and Alexander Lakhin. Thanks to Olly Betts for the recommendation to fix it this way.

Discussion: https://postgr.es/m/1661334672.728714027@f473.i.mail.ru

1 parent: 0101f77
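The fix is essentially a length guard placed in front of the stemmer. Below is a minimal C sketch of that idea under stated assumptions, not the code from this commit: `lexize_word`, `toy_stem`, and `MAX_STEM_INPUT_BYTES` are hypothetical names rather than PostgreSQL or Snowball APIs, `toy_stem` is a trivial placeholder for a real Snowball stemmer, and ASCII `tolower()` stands in for PostgreSQL's locale-aware case folding.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical constant mirroring the commit's arbitrary 1000-byte threshold. */
#define MAX_STEM_INPUT_BYTES 1000

/*
 * Hypothetical stand-in for a Snowball stemmer: it only strips a trailing
 * "s". Real stemmers are far more complex. Caller frees the result.
 */
static char *toy_stem(const char *word)
{
    size_t len = strlen(word);
    char *out = malloc(len + 1);

    if (out == NULL)
        return NULL;
    memcpy(out, word, len + 1);
    if (len > 1 && out[len - 1] == 's')
        out[len - 1] = '\0';
    return out;
}

/*
 * Case-fold the input, then stem it only if it is short enough.
 * Length is measured in bytes (strlen), as in the commit message.
 * ASCII-only tolower() is a simplification of locale-aware folding.
 * Caller frees the result.
 */
static char *lexize_word(const char *input)
{
    size_t len = strlen(input);
    char *folded = malloc(len + 1);

    if (folded == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++)
        folded[i] = (char) tolower((unsigned char) input[i]);
    folded[len] = '\0';

    /* Overlong input: skip the stemmer and return the folded word as-is. */
    if (len > MAX_STEM_INPUT_BYTES)
        return folded;

    char *stemmed = toy_stem(folded);
    free(folded);
    return stemmed;
}
```

With this sketch, `lexize_word("Cats")` yields `"cat"`, while a 1001-byte token comes back lowercased but otherwise unchanged. Because overlong input is still returned (rather than discarded), it continues to be indexed as a term instead of being treated as a stopword, which is the alternative the commit message considered and rejected.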
1 file changed: 17 additions, 1 deletion.
Diff (code text not recoverable from this page): one hunk covering original lines 275-282 and new lines 275-298. Original line 278 was replaced by new lines 278-293, and new line 295 was added.