Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commit5e72412

Browse files
committed
Fix recently-introduced performance problem in ts_headline().
The new hlCover() algorithm that I introduced in commitc9b0c67turns out to potentially take O(N^2) or worse time on long documents,if there are many occurrences of individual query words but few or nosubstrings that actually satisfy the query. (One way to hit thisbehavior is with a "common_word & rare_word" type of query.) Thisseems unavoidable given the original goal of checking every substringof the document, so we have to back off that idea. Fortunately, itseems unlikely that anyone would really want headlines spanning all ofa long document, so we can avoid the worse-than-linear behavior byimposing a maximum length of substring that we'll consider.For now, just hard-wire that maximum length as a multiple of max_wordstimes max_fragments. Perhaps at some point somebody will argue forexposing it as a ts_headline parameter, but I'm hesitant to make sucha feature addition in a back-patched bug fix.I also noted that the hlFirstIndex() function I'd added in thatcommit was unnecessarily stupid: it really only needs to check whethera HeadlineWordEntry's item pointer is null or not. This wouldn't makeall that much difference in typical cases with queries having justa few terms, but a cycle shaved is a cycle earned.In addition, add a CHECK_FOR_INTERRUPTS call in TS_execute_recurse.This ensures that hlCover's loop is cancellable if it manages to takea long time, and it may protect some other TS_execute callers as well.Back-patch to 9.6 as the previous commit was. I also chose to add theCHECK_FOR_INTERRUPTS call to 9.5. The old hlCover() algorithm seemsto avoid the O(N^2) behavior, at least on the test case I tried, butnonetheless it's not very quick on a long document.Per report from Stephen Frost.Discussion:https://postgr.es/m/20200724160535.GW12375@tamriel.snowman.net
1 parenta43d087 commit5e72412

File tree

2 files changed

+35
-25
lines changed

2 files changed

+35
-25
lines changed

‎src/backend/tsearch/wparser_def.c

Lines changed: 32 additions & 25 deletions
Original file line numberDiff line numberDiff line change
@@ -2107,33 +2107,29 @@ checkcondition_HL(void *opaque, QueryOperand *val, ExecPhraseData *data)
21072107
* Returns -1 if no such index
21082108
*/
21092109
staticint
2110-
hlFirstIndex(HeadlineParsedText*prs,TSQueryquery,intpos)
2110+
hlFirstIndex(HeadlineParsedText*prs,intpos)
21112111
{
21122112
inti;
21132113

2114-
/* For each word ... */
21152114
for (i=pos;i<prs->curwords;i++)
21162115
{
2117-
/* ... scan the query to see if this word matches any operand */
2118-
QueryItem*item=GETQUERY(query);
2119-
intj;
2120-
2121-
for (j=0;j<query->size;j++)
2122-
{
2123-
if (item->type==QI_VAL&&
2124-
prs->words[i].item==&item->qoperand)
2125-
returni;
2126-
item++;
2127-
}
2116+
if (prs->words[i].item!=NULL)
2117+
returni;
21282118
}
21292119
return-1;
21302120
}
21312121

21322122
/*
21332123
* hlCover: try to find a substring of prs' word list that satisfies query
21342124
*
2135-
* At entry, *p must be the first word index to consider (initialize this to
2136-
* zero, or to the next index after a previous successful search).
2125+
* At entry, *p must be the first word index to consider (initialize this
2126+
* to zero, or to the next index after a previous successful search).
2127+
* We will consider all substrings starting at or after that word, and
2128+
* containing no more than max_cover words. (We need a length limit to
2129+
* keep this from taking O(N^2) time for a long document with many query
2130+
* words but few complete matches. Actually, since checkcondition_HL is
2131+
* roughly O(N) in the length of the substring being checked, it's even
2132+
* worse than that.)
21372133
*
21382134
* On success, sets *p to first word index and *q to last word index of the
21392135
* cover substring, and returns true.
@@ -2142,7 +2138,8 @@ hlFirstIndex(HeadlineParsedText *prs, TSQuery query, int pos)
21422138
* words used in the query.
21432139
*/
21442140
staticbool
2145-
hlCover(HeadlineParsedText*prs,TSQueryquery,int*p,int*q)
2141+
hlCover(HeadlineParsedText*prs,TSQueryquery,intmax_cover,
2142+
int*p,int*q)
21462143
{
21472144
intpmin,
21482145
pmax,
@@ -2156,7 +2153,7 @@ hlCover(HeadlineParsedText *prs, TSQuery query, int *p, int *q)
21562153
* appearing in the query; there's no point in trying endpoints in between
21572154
* such points.
21582155
*/
2159-
pmin=hlFirstIndex(prs,query,*p);
2156+
pmin=hlFirstIndex(prs,*p);
21602157
while (pmin >=0)
21612158
{
21622159
/* This useless assignment just keeps stupider compilers quiet */
@@ -2177,7 +2174,7 @@ hlCover(HeadlineParsedText *prs, TSQuery query, int *p, int *q)
21772174
return true;
21782175
}
21792176
/* Nope, so advance pmax to next feasible endpoint */
2180-
nextpmax=hlFirstIndex(prs,query,pmax+1);
2177+
nextpmax=hlFirstIndex(prs,pmax+1);
21812178

21822179
/*
21832180
* If this is our first advance past pmin, then the result is also
@@ -2188,7 +2185,7 @@ hlCover(HeadlineParsedText *prs, TSQuery query, int *p, int *q)
21882185
nextpmin=nextpmax;
21892186
pmax=nextpmax;
21902187
}
2191-
while (pmax >=0);
2188+
while (pmax >=0&&pmax-pmin<max_cover);
21922189
/* No luck here, so try next feasible startpoint */
21932190
pmin=nextpmin;
21942191
}
@@ -2290,7 +2287,7 @@ get_next_fragment(HeadlineParsedText *prs, int *startpos, int *endpos,
22902287
staticvoid
22912288
mark_hl_fragments(HeadlineParsedText*prs,TSQueryquery,boolhighlightall,
22922289
intshortword,intmin_words,
2293-
intmax_words,intmax_fragments)
2290+
intmax_words,intmax_fragments,intmax_cover)
22942291
{
22952292
int32poslen,
22962293
curlen,
@@ -2317,7 +2314,7 @@ mark_hl_fragments(HeadlineParsedText *prs, TSQuery query, bool highlightall,
23172314
covers=palloc(maxcovers*sizeof(CoverPos));
23182315

23192316
/* get all covers */
2320-
while (hlCover(prs,query,&p,&q))
2317+
while (hlCover(prs,query,max_cover,&p,&q))
23212318
{
23222319
startpos=p;
23232320
endpos=q;
@@ -2472,7 +2469,7 @@ mark_hl_fragments(HeadlineParsedText *prs, TSQuery query, bool highlightall,
24722469
*/
24732470
staticvoid
24742471
mark_hl_words(HeadlineParsedText*prs,TSQueryquery,boolhighlightall,
2475-
intshortword,intmin_words,intmax_words)
2472+
intshortword,intmin_words,intmax_words,intmax_cover)
24762473
{
24772474
intp=0,
24782475
q=0;
@@ -2490,7 +2487,7 @@ mark_hl_words(HeadlineParsedText *prs, TSQuery query, bool highlightall,
24902487
if (!highlightall)
24912488
{
24922489
/* examine all covers, select a headline using the best one */
2493-
while (hlCover(prs,query,&p,&q))
2490+
while (hlCover(prs,query,max_cover,&p,&q))
24942491
{
24952492
/*
24962493
* Count words (curlen) and interesting words (poslen) within
@@ -2646,6 +2643,7 @@ prsd_headline(PG_FUNCTION_ARGS)
26462643
intshortword=3;
26472644
intmax_fragments=0;
26482645
boolhighlightall= false;
2646+
intmax_cover;
26492647
ListCell*l;
26502648

26512649
/* Extract configuration option values */
@@ -2685,6 +2683,15 @@ prsd_headline(PG_FUNCTION_ARGS)
26852683
defel->defname)));
26862684
}
26872685

2686+
/*
2687+
* We might eventually make max_cover a user-settable parameter, but for
2688+
* now, just compute a reasonable value based on max_words and
2689+
* max_fragments.
2690+
*/
2691+
max_cover=Max(max_words*10,100);
2692+
if (max_fragments>0)
2693+
max_cover *=max_fragments;
2694+
26882695
/* in HighlightAll mode these parameters are ignored */
26892696
if (!highlightall)
26902697
{
@@ -2709,10 +2716,10 @@ prsd_headline(PG_FUNCTION_ARGS)
27092716
/* Apply appropriate headline selector */
27102717
if (max_fragments==0)
27112718
mark_hl_words(prs,query,highlightall,shortword,
2712-
min_words,max_words);
2719+
min_words,max_words,max_cover);
27132720
else
27142721
mark_hl_fragments(prs,query,highlightall,shortword,
2715-
min_words,max_words,max_fragments);
2722+
min_words,max_words,max_fragments,max_cover);
27162723

27172724
/* Fill in default values for string options */
27182725
if (!prs->startsel)

‎src/backend/utils/adt/tsvector_op.c

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1868,6 +1868,9 @@ TS_execute_recurse(QueryItem *curitem, void *arg, uint32 flags,
18681868
/* since this function recurses, it could be driven to stack overflow */
18691869
check_stack_depth();
18701870

1871+
/* ... and let's check for query cancel while we're at it */
1872+
CHECK_FOR_INTERRUPTS();
1873+
18711874
if (curitem->type==QI_VAL)
18721875
returnchkcond(arg, (QueryOperand*)curitem,
18731876
NULL/* don't need position info */ ) ?TS_YES :TS_NO;

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp