Forked from postgres/postgres
Commit c9b0c67
Fix default text search parser's ts_headline code for phrase queries.
This code could produce very poor results when asked to highlight a string based on a query using phrase-match operators. The root cause is that hlCover(), which is supposed to find a minimal substring that matches the query, was written assuming that word position is not significant. I'm only 95% convinced that its algorithm was correct even for plain AND/OR queries; but it definitely fails completely for phrase matches, causing it to possibly not identify a cover string at all. Hence, rewrite hlCover() with a less-tense algorithm that just tries all the possible substrings, earlier and shorter ones first. (This is not as bad as it sounds performance-wise, because all of the string matching has been done already: the repeated tsquery match checks boil down to pointer comparisons.)

Unfortunately, since that approach produces more candidate cover strings than before, it also exposes that there were bugs in the heuristics in mark_hl_words() for selecting a best cover string. Fixes there include:

* Do not apply the ShortWord filter to words that appear in the query.

* Remove a misguided optimization for quickly rejecting a cover.

* Fix order-of-operation bug that could cause computation of a wrong figure of merit (poslen) when shortening a cover.

* Change the preference rule so that candidate headlines that do not include their whole cover string (after MaxWords trimming) are lowest priority, since they may not actually satisfy the user's query.

This results in some changes in existing regression test cases, but they all seem reasonable. Note in particular that the tests involving strings like "1 2 3" were previously being affected by the ShortWord filter, masking the normal matching behavior.

Per bug #16345 from Augustinas Jokubauskas; the new test cases are based on that example. Back-patch to 9.6, where phrase search was added to tsquery.

Discussion: https://postgr.es/m/16345-2e0cf5cddbdcd3b4@postgresql.org

1 parent b10f8bb
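The brute-force cover search described above can be sketched as follows. This is a simplified illustration, not the PostgreSQL source. `Word`, `QueryTerm`, `phrase_in_range()` and `find_cover()` are hypothetical names that stand in for the real parsed-headline and `TS_execute()` machinery. The query is restricted to a single phrase (`t1 <-> t2 <-> ...`), and a word's position is assumed to equal its array index. As the commit message notes, matching words to query terms happens once, up front, so every later check is just a pointer comparison.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a tsquery operand. */
typedef struct { const char *lexeme; } QueryTerm;

/* A parsed word remembers which query term it matched (NULL if none).
 * Because matching was done once, later checks are pointer compares. */
typedef struct { const char *text; const QueryTerm *item; } Word;

/* Does words[lo..hi] contain terms[0] <-> terms[1] <-> ... at
 * consecutive positions? Position matters, unlike a plain AND. */
static bool
phrase_in_range(const Word *words, int lo, int hi,
                const QueryTerm *const *terms, int nterms)
{
    for (int i = lo; i + nterms - 1 <= hi; i++)
    {
        int j = 0;

        while (j < nterms && words[i + j].item == terms[j])
            j++;
        if (j == nterms)
            return true;
    }
    return false;
}

/* Try every substring starting at or after *p: earlier starts first and,
 * for each start, shorter substrings first. Both endpoints must be query
 * words. On success, store the candidate cover in *p..*q. */
static bool
find_cover(const Word *words, int nwords,
           const QueryTerm *const *terms, int nterms, int *p, int *q)
{
    for (int lo = *p; lo < nwords; lo++)
    {
        if (words[lo].item == NULL)
            continue;           /* a cover must start on a query word */
        for (int hi = lo; hi < nwords; hi++)
        {
            if (words[hi].item == NULL)
                continue;       /* ...and end on one */
            if (phrase_in_range(words, lo, hi, terms, nterms))
            {
                *p = lo;
                *q = hi;
                return true;
            }
        }
    }
    return false;
}
```

In this sketch, calling `find_cover()` again with `*p` advanced past the previous start enumerates further candidates. For the words `the fox quick brown quick fox` and the phrase `quick <-> fox`, successive calls yield the ranges 1..5, 2..5 and 4..5. Only the last of these is tight, which is why the selection heuristics that choose among candidates matter more once all substrings are tried.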
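Two of the `mark_hl_words()` rule changes listed above can be expressed as small predicates. The first exempts query words from the ShortWord filter. The second demotes candidates that lose part of their cover to MaxWords trimming. The following is an illustration under assumptions, not the PostgreSQL source. The types and function names are hypothetical, and word length is measured in bytes with `strlen()`. Using a larger `poslen` as the tie-breaker among candidates that keep their whole cover is also an assumption about how the figure of merit is compared.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for a query operand and a parsed word. */
typedef struct { const char *lexeme; } QueryTerm;
typedef struct { const char *text; const QueryTerm *item; } Word;

/* ShortWord rule after the fix: a short word may be trimmed from the
 * edge of a headline only if it is NOT a query word. Before the fix,
 * a query word such as "1" (length 1) could be dropped as well. */
static bool
trimmable_short_word(const Word *w, size_t short_word)
{
    return w->item == NULL && strlen(w->text) <= short_word;
}

/* Hypothetical summary of one candidate headline. */
typedef struct
{
    int  poslen;            /* figure of merit (assumed: higher is better) */
    bool includes_cover;    /* did the whole cover survive MaxWords trimming? */
} Candidate;

/* Is candidate a preferable to candidate b? A candidate that lost part
 * of its cover may not satisfy the query, so it ranks lowest; otherwise
 * compare the figure of merit. */
static bool
candidate_better(const Candidate *a, const Candidate *b)
{
    if (a->includes_cover != b->includes_cover)
        return a->includes_cover;
    return a->poslen > b->poslen;
}
```

Putting the cover-inclusion check ahead of `poslen` means a headline with a high score can never win if it may not actually satisfy the user's query.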
3 files changed, +141 -86 lines changed
- src
  - backend/tsearch
  - test/regress
    - expected
    - sql
Lines changed: 99 additions & 74 deletions
Lines changed: 31 additions & 12 deletions
Lines changed: 11 additions & 0 deletions