Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commit4e2477b

Browse files
committed
Fix strange behavior (and possible crashes) in full text phrase search.
In an attempt to simplify the tsquery matching engine, the originalphrase search patch invented rewrite rules that would rearrange atsquery so that no AND/OR/NOT operator appeared below a PHRASE operator.But this approach had numerous problems. The rearrangement step wasmissed by ts_rewrite (and perhaps other places), allowing tsqueriesto be created that would cause Assert failures or perhaps crashes atexecution, as reported by Andreas Seltenreich. The rewrite ruleseffectively defined semantics for operators underneath PHRASE that werebuggy, or at least unintuitive. And because rewriting was done intsqueryin() rather than at execution, the rearrangement was user-visible,which is not very desirable --- for example, it might cause unexpectedmatches or failures to match in ts_rewrite.As a somewhat independent problem, the behavior of nested PHRASE operatorswas only sane for left-deep trees; queries like "x <-> (y <-> z)" did notbehave intuitively at all.To fix, get rid of the rewrite logic altogether, and instead teach thetsquery execution engine to manage AND/OR/NOT below a PHRASE operatorby explicitly computing the match location(s) and match widths for theseoperators.This requires introducing some additional fields into the publicly visibleExecPhraseData struct; but since there's no way for third-party code topass such a struct to TS_phrase_execute, it shouldn't create an ABI problemas long as we don't move the offsets of the existing fields.Another related problem was that index searches supposed that "!x <-> y"could be lossily approximated as "!x & y", which isn't correct becausethe latter will reject, say, "x q y" which the query itself accepts.This required some tweaking in TS_execute_ternary along with the maintsquery engine.Back-patch to 9.6 where phrase operators were introduced. While thiscould be argued to change behavior more than we'd like in a stable branch,we have to do something about the crash hazards and index-vs-seqscaninconsistency, and it doesn't seem desirable to let the unintuitivebehaviors induced by the rewriting implementation stand as precedent.Discussion:https://postgr.es/m/28215.1481999808@sss.pgh.pa.usDiscussion:https://postgr.es/m/26706.1482087250@sss.pgh.pa.us
1 parenteaac6c7 commit4e2477b

File tree

14 files changed

+673
-599
lines changed

14 files changed

+673
-599
lines changed

‎doc/src/sgml/datatype.sgml

Lines changed: 0 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -3959,15 +3959,7 @@ SELECT 'fat &amp; rat &amp; ! cat'::tsquery;
39593959
tsquery
39603960
------------------------
39613961
'fat' &amp; 'rat' &amp; !'cat'
3962-
3963-
SELECT '(fat | rat) &lt;-&gt; cat'::tsquery;
3964-
tsquery
3965-
-----------------------------------
3966-
'fat' &lt;-&gt; 'cat' | 'rat' &lt;-&gt; 'cat'
39673962
</programlisting>
3968-
3969-
The last example demonstrates that <type>tsquery</type> sometimes
3970-
rearranges nested operators into a logically equivalent formulation.
39713963
</para>
39723964

39733965
<para>

‎doc/src/sgml/textsearch.sgml

Lines changed: 25 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -264,7 +264,7 @@ SELECT 'fat &amp; cow'::tsquery @@ 'a fat cat sat on a mat and ate a fat rat'::t
264264
text, any more than a <type>tsvector</type> is. A <type>tsquery</type>
265265
contains search terms, which must be already-normalized lexemes, and
266266
may combine multiple terms using AND, OR, NOT, and FOLLOWED BY operators.
267-
(For details see <xref linkend="datatype-tsquery">.) There are
267+
(Forsyntaxdetails see <xref linkend="datatype-tsquery">.) There are
268268
functions <function>to_tsquery</>, <function>plainto_tsquery</>,
269269
and <function>phraseto_tsquery</>
270270
that are helpful in converting user-written text into a proper
@@ -323,6 +323,8 @@ text @@ text
323323
at least one of its arguments must appear, while the <literal>!</> (NOT)
324324
operator specifies that its argument must <emphasis>not</> appear in
325325
order to have a match.
326+
For example, the query <literal>fat &amp; ! rat</> matches documents that
327+
contain <literal>fat</> but not <literal>rat</>.
326328
</para>
327329

328330
<para>
@@ -377,6 +379,28 @@ SELECT phraseto_tsquery('the cats ate the rats');
377379
then <literal>&amp;</literal>, then <literal>&lt;-&gt;</literal>,
378380
and <literal>!</literal> most tightly.
379381
</para>
382+
383+
<para>
384+
It's worth noticing that the AND/OR/NOT operators mean something subtly
385+
different when they are within the arguments of a FOLLOWED BY operator
386+
than when they are not, because within FOLLOWED BY the exact position of
387+
the match is significant. For example, normally <literal>!x</> matches
388+
only documents that do not contain <literal>x</> anywhere.
389+
But <literal>!x &lt;-&gt; y</> matches <literal>y</> if it is not
390+
immediately after an <literal>x</>; an occurrence of <literal>x</>
391+
elsewhere in the document does not prevent a match. Another example is
392+
that <literal>x &amp; y</> normally only requires that <literal>x</>
393+
and <literal>y</> both appear somewhere in the document, but
394+
<literal>(x &amp; y) &lt;-&gt; z</> requires <literal>x</>
395+
and <literal>y</> to match at the same place, immediately before
396+
a <literal>z</>. Thus this query behaves differently from
397+
<literal>x &lt;-&gt; z &amp; y &lt;-&gt; z</>, which will match a
398+
document containing two separate sequences <literal>x z</> and
399+
<literal>y z</>. (This specific query is useless as written,
400+
since <literal>x</> and <literal>y</> could not match at the same place;
401+
but with more complex situations such as prefix-match patterns, a query
402+
of this form could be useful.)
403+
</para>
380404
</sect2>
381405

382406
<sect2 id="textsearch-intro-configurations">

‎src/backend/utils/adt/tsginidx.c

Lines changed: 18 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -212,7 +212,7 @@ checkcondition_gin(void *checkval, QueryOperand *val, ExecPhraseData *data)
212212
* Evaluate tsquery boolean expression using ternary logic.
213213
*/
214214
staticGinTernaryValue
215-
TS_execute_ternary(GinChkVal*gcv,QueryItem*curitem)
215+
TS_execute_ternary(GinChkVal*gcv,QueryItem*curitem,boolin_phrase)
216216
{
217217
GinTernaryValueval1,
218218
val2,
@@ -230,25 +230,32 @@ TS_execute_ternary(GinChkVal *gcv, QueryItem *curitem)
230230
switch (curitem->qoperator.oper)
231231
{
232232
caseOP_NOT:
233-
result=TS_execute_ternary(gcv,curitem+1);
233+
/* In phrase search, always return MAYBE since we lack positions */
234+
if (in_phrase)
235+
returnGIN_MAYBE;
236+
result=TS_execute_ternary(gcv,curitem+1,in_phrase);
234237
if (result==GIN_MAYBE)
235238
returnresult;
236239
return !result;
237240

238241
caseOP_PHRASE:
239242

240243
/*
241-
* GIN doesn't contain any information about positions, treat
244+
* GIN doesn't contain any information about positions,sotreat
242245
* OP_PHRASE as OP_AND with recheck requirement
243246
*/
244-
*gcv->need_recheck= true;
247+
*(gcv->need_recheck)= true;
248+
/* Pass down in_phrase == true in case there's a NOT below */
249+
in_phrase= true;
250+
245251
/* FALL THRU */
246252

247253
caseOP_AND:
248-
val1=TS_execute_ternary(gcv,curitem+curitem->qoperator.left);
254+
val1=TS_execute_ternary(gcv,curitem+curitem->qoperator.left,
255+
in_phrase);
249256
if (val1==GIN_FALSE)
250257
returnGIN_FALSE;
251-
val2=TS_execute_ternary(gcv,curitem+1);
258+
val2=TS_execute_ternary(gcv,curitem+1,in_phrase);
252259
if (val2==GIN_FALSE)
253260
returnGIN_FALSE;
254261
if (val1==GIN_TRUE&&val2==GIN_TRUE)
@@ -257,10 +264,11 @@ TS_execute_ternary(GinChkVal *gcv, QueryItem *curitem)
257264
returnGIN_MAYBE;
258265

259266
caseOP_OR:
260-
val1=TS_execute_ternary(gcv,curitem+curitem->qoperator.left);
267+
val1=TS_execute_ternary(gcv,curitem+curitem->qoperator.left,
268+
in_phrase);
261269
if (val1==GIN_TRUE)
262270
returnGIN_TRUE;
263-
val2=TS_execute_ternary(gcv,curitem+1);
271+
val2=TS_execute_ternary(gcv,curitem+1,in_phrase);
264272
if (val2==GIN_TRUE)
265273
returnGIN_TRUE;
266274
if (val1==GIN_FALSE&&val2==GIN_FALSE)
@@ -307,7 +315,7 @@ gin_tsquery_consistent(PG_FUNCTION_ARGS)
307315

308316
res=TS_execute(GETQUERY(query),
309317
&gcv,
310-
TS_EXEC_CALC_NOT |TS_EXEC_PHRASE_AS_AND,
318+
TS_EXEC_CALC_NOT |TS_EXEC_PHRASE_NO_POS,
311319
checkcondition_gin);
312320
}
313321

@@ -343,7 +351,7 @@ gin_tsquery_triconsistent(PG_FUNCTION_ARGS)
343351
gcv.map_item_operand= (int*) (extra_data[0]);
344352
gcv.need_recheck=&recheck;
345353

346-
res=TS_execute_ternary(&gcv,GETQUERY(query));
354+
res=TS_execute_ternary(&gcv,GETQUERY(query), false);
347355

348356
if (res==GIN_TRUE&&recheck)
349357
res=GIN_MAYBE;

‎src/backend/utils/adt/tsgistidx.c

Lines changed: 7 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -359,25 +359,22 @@ gtsvector_consistent(PG_FUNCTION_ARGS)
359359
if (ISALLTRUE(key))
360360
PG_RETURN_BOOL(true);
361361

362-
PG_RETURN_BOOL(TS_execute(
363-
GETQUERY(query),
362+
/* since signature is lossy, cannot specify CALC_NOT here */
363+
PG_RETURN_BOOL(TS_execute(GETQUERY(query),
364364
(void*)GETSIGN(key),
365-
TS_EXEC_PHRASE_AS_AND,
366-
checkcondition_bit
367-
));
365+
TS_EXEC_PHRASE_NO_POS,
366+
checkcondition_bit));
368367
}
369368
else
370369
{/* only leaf pages */
371370
CHKVALchkval;
372371

373372
chkval.arrb=GETARR(key);
374373
chkval.arre=chkval.arrb+ARRNELEM(key);
375-
PG_RETURN_BOOL(TS_execute(
376-
GETQUERY(query),
374+
PG_RETURN_BOOL(TS_execute(GETQUERY(query),
377375
(void*)&chkval,
378-
TS_EXEC_PHRASE_AS_AND |TS_EXEC_CALC_NOT,
379-
checkcondition_arr
380-
));
376+
TS_EXEC_PHRASE_NO_POS |TS_EXEC_CALC_NOT,
377+
checkcondition_arr));
381378
}
382379
}
383380

‎src/backend/utils/adt/tsquery.c

Lines changed: 13 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -557,13 +557,11 @@ findoprnd_recurse(QueryItem *ptr, uint32 *pos, int nnodes, bool *needcleanup)
557557
curitem->oper==OP_OR||
558558
curitem->oper==OP_PHRASE);
559559

560-
if (curitem->oper==OP_PHRASE)
561-
*needcleanup= true;/* push OP_PHRASE down later */
562-
563560
(*pos)++;
564561

565562
/* process RIGHT argument */
566563
findoprnd_recurse(ptr,pos,nnodes,needcleanup);
564+
567565
curitem->left=*pos-tmp;/* set LEFT arg's offset */
568566

569567
/* process LEFT argument */
@@ -574,8 +572,9 @@ findoprnd_recurse(QueryItem *ptr, uint32 *pos, int nnodes, bool *needcleanup)
574572

575573

576574
/*
577-
* Fills in the left-fields previously left unfilled. The input
578-
* QueryItems must be in polish (prefix) notation.
575+
* Fill in the left-fields previously left unfilled.
576+
* The input QueryItems must be in polish (prefix) notation.
577+
* Also, set *needcleanup to true if there are any QI_VALSTOP nodes.
579578
*/
580579
staticvoid
581580
findoprnd(QueryItem*ptr,intsize,bool*needcleanup)
@@ -687,15 +686,17 @@ parse_tsquery(char *buf,
687686
memcpy((void*)GETOPERAND(query), (void*)state.op,state.sumlen);
688687
pfree(state.op);
689688

690-
/* Set left operand pointers for every operator. */
689+
/*
690+
* Set left operand pointers for every operator. While we're at it,
691+
* detect whether there are any QI_VALSTOP nodes.
692+
*/
691693
findoprnd(ptr,query->size,&needcleanup);
692694

693695
/*
694-
* QI_VALSTOP nodes should be cleaned and OP_PHRASE should be pushed
695-
* down
696+
* If there are QI_VALSTOP nodes, delete them and simplify the tree.
696697
*/
697698
if (needcleanup)
698-
returncleanup_fakeval_and_phrase(query);
699+
query=cleanup_tsquery_stopwords(query);
699700

700701
returnquery;
701702
}
@@ -1088,6 +1089,9 @@ tsqueryrecv(PG_FUNCTION_ARGS)
10881089
*/
10891090
findoprnd(item,size,&needcleanup);
10901091

1092+
/* Can't have found any QI_VALSTOP nodes */
1093+
Assert(!needcleanup);
1094+
10911095
/* Copy operands to output struct */
10921096
for (i=0;i<size;i++)
10931097
{
@@ -1105,9 +1109,6 @@ tsqueryrecv(PG_FUNCTION_ARGS)
11051109

11061110
SET_VARSIZE(query,len+datalen);
11071111

1108-
if (needcleanup)
1109-
PG_RETURN_TSQUERY(cleanup_fakeval_and_phrase(query));
1110-
11111112
PG_RETURN_TSQUERY(query);
11121113
}
11131114

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp