Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commitbd371ea

Browse files
committed
Fix logical errors in tsquery selectivity estimation for prefix queries.
I made multiple errors in commit97532f7,stemming mostly from failure to think about the available frequency dataas being element frequencies not value frequencies (so that occurrences ofdifferent elements are not mutually exclusive). This led to sillinessessuch as estimating that "word" would match more rows than "word:*".The choice to clamp to a minimum estimate of DEFAULT_TS_MATCH_SEL alsoseems pretty ill-considered in hindsight, as it would frequently result inan estimate much larger than the available data suggests. We do need somesort of clamp, since a pattern not matching any of the MCELEMs probablystill needs a selectivity estimate of more than zero. I chose instead toclamp to at least what a non-MCELEM word would be estimated as, preservingthe property that "word:*" doesn't get an estimate less than plain "word",whether or not the word appears in MCELEM.Per investigation of a gripe from Bill Martin, though I suspect that hisexample case actually isn't even reaching the erroneous code.Back-patch to 9.1 where this code was introduced.
1 parentef06dca commitbd371ea

File tree

1 file changed

+28
-16
lines changed

1 file changed

+28
-16
lines changed

‎src/backend/tsearch/ts_selfuncs.c

Lines changed: 28 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -299,23 +299,29 @@ tsquery_opr_selec(QueryItem *item, char *operand,
299299
{
300300
/* Prefix match, ie the query item is lexeme:* */
301301
Selectivitymatched,
302-
allmcvs;
303-
inti;
302+
allmces;
303+
inti,
304+
n_matched;
304305

305306
/*
306-
* Our strategy is to scan through the MCV list and add up the
307-
* frequencies of the ones that match the prefix, thereby assuming
308-
* that the MCVs are representative of the whole lexeme population
309-
* in this respect. Compare histogram_selectivity().
307+
* Our strategy is to scan through the MCELEM list and combine the
308+
* frequencies of the ones that match the prefix. We then
309+
* extrapolate the fraction of matching MCELEMs to the remaining
310+
* rows, assuming that the MCELEMs are representative of the whole
311+
* lexeme population in this respect. (Compare
312+
* histogram_selectivity().) Note that these are most common
313+
* elements not most common values, so they're not mutually
314+
* exclusive. We treat occurrences as independent events.
310315
*
311316
* This is only a good plan if we have a pretty fair number of
312-
*MCVs available; we set the threshold at 100. If no stats or
317+
*MCELEMs available; we set the threshold at 100. If no stats or
313318
* insufficient stats, arbitrarily use DEFAULT_TS_MATCH_SEL*4.
314319
*/
315320
if (lookup==NULL||length<100)
316321
return (Selectivity) (DEFAULT_TS_MATCH_SEL*4);
317322

318-
matched=allmcvs=0;
323+
matched=allmces=0;
324+
n_matched=0;
319325
for (i=0;i<length;i++)
320326
{
321327
TextFreq*t=lookup+i;
@@ -324,20 +330,26 @@ tsquery_opr_selec(QueryItem *item, char *operand,
324330
if (tlen >=key.length&&
325331
strncmp(key.lexeme,VARDATA_ANY(t->element),
326332
key.length)==0)
327-
matched+=t->frequency;
328-
allmcvs+=t->frequency;
333+
{
334+
matched+=t->frequency-matched*t->frequency;
335+
n_matched++;
336+
}
337+
allmces+=t->frequency-allmces*t->frequency;
329338
}
330339

331-
if (allmcvs>0)/* paranoia about zero divide */
332-
selec=matched /allmcvs;
333-
else
334-
selec= (Selectivity) (DEFAULT_TS_MATCH_SEL*4);
340+
/* Clamp to ensure sanity in the face of roundoff error */
341+
CLAMP_PROBABILITY(matched);
342+
CLAMP_PROBABILITY(allmces);
343+
344+
selec=matched+ (1.0-allmces)* ((double)n_matched /length);
335345

336346
/*
337347
* In any case, never believe that a prefix match has selectivity
338-
* less than DEFAULT_TS_MATCH_SEL.
348+
* less than we would assign for a non-MCELEM lexeme. This
349+
* preserves the property that "word:*" should be estimated to
350+
* match at least as many rows as "word" would be.
339351
*/
340-
selec=Max(DEFAULT_TS_MATCH_SEL,selec);
352+
selec=Max(Min(DEFAULT_TS_MATCH_SEL,minfreq /2),selec);
341353
}
342354
else
343355
{

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp