Commit 52b6053

Fix tsmatchsel() to account properly for null rows.
ts_typanalyze.c computes MCE statistics as fractions of the non-null rows, which seems fairly reasonable, and anyway changing it in released versions wouldn't be a good idea. But then ts_selfuncs.c has to account for that. Failure to do so results in overestimates in columns with a significant fraction of null documents. Back-patch to 8.4 where this stuff was introduced.

Jesper Krogh
1 parent de623f3, commit 52b6053
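
A worked example may help show the size of the overestimate described above. The symbols are illustrative and do not appear in the commit: f is an MCE frequency as stored (the fraction of non-null rows containing the element) and n is stanullfrac. The sketch assumes a query consisting of a single lexeme that is found among the MCEs, so that the uncorrected estimate is simply f. A null document can never match, so only the non-null share of the table can contribute:

```latex
\[
\text{selectivity over all rows} = f \cdot (1 - n),
\qquad
\frac{\text{uncorrected estimate}}{\text{corrected estimate}}
  = \frac{f}{f\,(1 - n)} = \frac{1}{1 - n}.
\]
\[
f = 0.5,\quad n = 0.4:\qquad 0.5 \cdot (1 - 0.4) = 0.3
\ \text{rather than}\ 0.5 .
\]
```

Under these assumptions the overestimate grows with the null fraction: at n = 0.4 the uncorrected figure is too high by a factor of about 1.67.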

2 files changed: 8 additions, 0 deletions

src/backend/tsearch/ts_selfuncs.c

Lines changed: 6 additions & 0 deletions

@@ -189,11 +189,17 @@ tsquerysel(VariableStatData *vardata, Datum constval)
             /* No most-common-elements info, so do without */
             selec = tsquery_opr_selec_no_stats(query);
         }
+
+        /*
+         * MCE stats count only non-null rows, so adjust for null rows.
+         */
+        selec *= (1.0 - stats->stanullfrac);
     }
     else
     {
         /* No stats at all, so do without */
         selec = tsquery_opr_selec_no_stats(query);
+        /* we assume no nulls here, so no stanullfrac correction */
     }
 
     return selec;
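
The branch structure of the hunk above can be sketched in isolation. This is a minimal illustration, not PostgreSQL code: `SketchStats` and `adjust_for_nulls` are hypothetical names, and only the `stanullfrac` field name is taken from the patch. Passing `NULL` models the "no stats at all" branch, where the patch assumes no nulls and applies no correction.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the one pg_statistic field used by the fix.
 * Only the name stanullfrac comes from the patch; the struct is invented
 * for this sketch.
 */
typedef struct SketchStats
{
    double      stanullfrac;    /* fraction of rows whose column is NULL */
} SketchStats;

/*
 * Convert a selectivity expressed relative to non-null rows into one
 * expressed relative to all rows. With stats == NULL there is no null
 * fraction to apply, so the input is returned unchanged.
 */
double
adjust_for_nulls(const SketchStats *stats, double nonnull_selec)
{
    if (stats == NULL)
        return nonnull_selec;

    assert(stats->stanullfrac >= 0.0 && stats->stanullfrac <= 1.0);
    return nonnull_selec * (1.0 - stats->stanullfrac);
}
```

For instance, with `stanullfrac` set to 0.4 and a non-null selectivity of 0.5, the function returns approximately 0.3, while the `NULL` case returns 0.5 untouched.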

src/include/catalog/pg_statistic.h

Lines changed: 2 additions & 0 deletions

@@ -246,6 +246,8 @@ typedef FormData_pg_statistic *Form_pg_statistic;
  * type with identifiable elements (for instance, tsvector). staop contains
  * the equality operator appropriate to the element type. stavalues contains
  * the most common element values, and stanumbers their frequencies. Unlike
+ * MCV slots, frequencies are measured as the fraction of non-null rows the
+ * element value appears in, not the frequency of all rows. Also unlike
  * MCV slots, the values are sorted into order (to support binary search
  * for a particular value). Since this puts the minimum and maximum
  * frequencies at unpredictable spots in stanumbers, there are two extra
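
The two properties this comment documents (sorted values that allow binary search, and frequencies relative to non-null rows) can be illustrated together. The sketch below is a simplified model under stated assumptions, not the PostgreSQL implementation: `SketchMceSlot`, `cmp_lexeme`, and `mce_all_rows_freq` are hypothetical names, elements are plain C strings ordered by `strcmp` (the real ordering may differ), and the extra trailing entries in stanumbers that the comment mentions are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical, simplified picture of a most-common-elements slot:
 * values sorted with strcmp, plus a parallel array of frequencies
 * measured over non-null rows only.
 */
typedef struct SketchMceSlot
{
    const char *const *values;  /* sorted element values */
    const double *freqs;        /* freqs[i] belongs to values[i] */
    size_t      nvalues;
} SketchMceSlot;

/* bsearch comparator: key is a C string, elem points at an array entry */
static int
cmp_lexeme(const void *key, const void *elem)
{
    return strcmp((const char *) key, *(const char *const *) elem);
}

/*
 * Return the fraction of ALL rows containing lexeme, or -1.0 if the
 * lexeme is not among the most common elements.
 */
double
mce_all_rows_freq(const SketchMceSlot *slot, double stanullfrac,
                  const char *lexeme)
{
    const char *const *hit;

    assert(stanullfrac >= 0.0 && stanullfrac <= 1.0);
    hit = bsearch(lexeme, slot->values, slot->nvalues,
                  sizeof(slot->values[0]), cmp_lexeme);
    if (hit == NULL)
        return -1.0;

    /* stored frequency is per non-null row; rescale to all rows */
    return slot->freqs[hit - slot->values] * (1.0 - stanullfrac);
}
```

The lookup only works because the values are kept sorted; the rescaling step is the same (1.0 - stanullfrac) factor that the ts_selfuncs.c change applies to the final selectivity.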
