Commit 7844608

Get rid of USE_WIDE_UPPER_LOWER dependency in trigram construction.

contrib/pg_trgm's make_trigrams() was coded to ignore multibyte character
boundaries and just make trigrams from bytes if USE_WIDE_UPPER_LOWER wasn't
defined. This is a bit odd, since there's no obvious reason why trigram
compaction rules should depend on the presence of towlower() and friends.
What's more, there was an Assert() that would fail if that code path was
fed any multibyte characters.

We need to do something about this since the pending regex-indexing patch
has an assumption that you get just one "trgm" from any three characters.
The best solution seems to be to remove the USE_WIDE_UPPER_LOWER
dependency, which shouldn't really have been there in the first place.
The second loop in make_trigrams() is now just a fast path and not a
potentially incompatible algorithm.

If there is anybody still using Postgres on machines without wcstombs() or
towlower(), and they have non-ASCII data indexed by pg_trgm, they'll need
to REINDEX those indexes after pg_upgrade to 9.3, else searches may fail
incorrectly. It seems likely that there are no such installations, though.

In passing, rename cnt_trigram to compact_trigram, which seems to better
describe its functionality, and improve make_trigrams' test for whether it
has to use the slow path or not (per a suggestion from Alexander Korotkov).
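
The compaction rule described above (three single-byte characters are kept as-is, anything longer is hashed down to three bytes) can be illustrated with a small standalone sketch. This is not the pg_trgm code: the type trgm3, the function compact_trigram_sketch, and the FNV-1a hash are all assumptions made for illustration. The diff only shows that the real function feeds a variable named crc to CPTRGM, so the actual hash differs from the stand-in used here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for pg_trgm's trgm type: always exactly 3 bytes. */
typedef struct
{
	unsigned char b[3];
} trgm3;

/*
 * Stand-in hash (32-bit FNV-1a). Chosen only to keep the sketch
 * self-contained; it is NOT the hash pg_trgm uses.
 */
static uint32_t
fnv1a32(const unsigned char *p, size_t len)
{
	uint32_t	h = 2166136261u;
	size_t		i;

	for (i = 0; i < len; i++)
	{
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/*
 * Reduce three characters occupying bytelen bytes to three bytes.
 * bytelen == 3 means three single-byte characters: copy them unchanged.
 * Otherwise at least one character is multibyte: store 3 bytes of a hash.
 */
static void
compact_trigram_sketch(trgm3 *out, const char *str, int bytelen)
{
	assert(bytelen >= 3);

	if (bytelen == 3)
	{
		memcpy(out->b, str, 3);
	}
	else
	{
		uint32_t	h = fnv1a32((const unsigned char *) str, (size_t) bytelen);

		/* Extract bytes by shifting so the result is endian-independent. */
		out->b[0] = (unsigned char) (h & 0xFF);
		out->b[1] = (unsigned char) ((h >> 8) & 0xFF);
		out->b[2] = (unsigned char) ((h >> 16) & 0xFF);
	}
}
```

Either way the caller gets exactly one three-byte value per three characters, which is the property the commit message says the regex-indexing patch relies on.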
1 parent faf4726, commit 7844608

1 file changed: 10 additions and 7 deletions

contrib/pg_trgm/trgm_op.c

@@ -109,9 +109,13 @@ find_word(char *str, int lenstr, char **endword, int *charlen)
 	return beginword;
 }
 
-#ifdef USE_WIDE_UPPER_LOWER
+/*
+ * Reduce a trigram (three possibly multi-byte characters) to a trgm,
+ * which is always exactly three bytes. If we have three single-byte
+ * characters, we just use them as-is; otherwise we form a hash value.
+ */
 static void
-cnt_trigram(trgm *tptr, char *str, int bytelen)
+compact_trigram(trgm *tptr, char *str, int bytelen)
 {
 	if (bytelen == 3)
 	{
@@ -131,7 +135,6 @@ cnt_trigram(trgm *tptr, char *str, int bytelen)
 		CPTRGM(tptr, &crc);
 	}
 }
-#endif
 
 /*
  * Adds trigrams from words (already padded).
@@ -144,16 +147,16 @@ make_trigrams(trgm *tptr, char *str, int bytelen, int charlen)
 	if (charlen < 3)
 		return tptr;
 
-#ifdef USE_WIDE_UPPER_LOWER
-	if (pg_database_encoding_max_length() > 1)
+	if (bytelen > charlen)
 	{
+		/* Find multibyte character boundaries and apply compact_trigram */
 		int			lenfirst = pg_mblen(str),
 					lenmiddle = pg_mblen(str + lenfirst),
 					lenlast = pg_mblen(str + lenfirst + lenmiddle);
 
 		while ((ptr - str) + lenfirst + lenmiddle + lenlast <= bytelen)
 		{
-			cnt_trigram(tptr, ptr, lenfirst + lenmiddle + lenlast);
+			compact_trigram(tptr, ptr, lenfirst + lenmiddle + lenlast);
 
 			ptr += lenfirst;
 			tptr++;
@@ -164,8 +167,8 @@ make_trigrams(trgm *tptr, char *str, int bytelen, int charlen)
 		}
 	}
 	else
-#endif
 	{
+		/* Fast path when there are no multibyte characters */
 		Assert(bytelen == charlen);
 
 		while (ptr - str < bytelen - 2 /* number of trigrams = strlen - 2 */ )
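
To see how the new `bytelen > charlen` test picks between the two loops, here is a rough standalone sketch of the sliding window. It records the byte span of each trigram instead of building trgm values. Several things here are assumptions, not part of the patch: utf8_mblen is a hypothetical stand-in for pg_mblen() that only handles valid UTF-8, trigram_span and trigram_spans are made-up names, the window-advance steps are guessed because they fall between the hunks shown above, and the bounds check before reading the next character is an extra guard added for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for pg_mblen(); assumes valid UTF-8 input. */
static int
utf8_mblen(const char *s)
{
	unsigned char c = (unsigned char) *s;

	if (c < 0x80)
		return 1;
	if ((c & 0xE0) == 0xC0)
		return 2;
	if ((c & 0xF0) == 0xE0)
		return 3;
	return 4;
}

/* Byte offset and byte length of one three-character window. */
typedef struct
{
	int			offset;
	int			bytelen;
} trigram_span;

/* Fills out[] with one span per trigram and returns the trigram count. */
static int
trigram_spans(const char *str, int bytelen, int charlen, trigram_span *out)
{
	const char *ptr = str;
	int			n = 0;

	if (charlen < 3)
		return 0;

	if (bytelen > charlen)
	{
		/* Slow path: at least one multibyte character is present. */
		int			lenfirst = utf8_mblen(str);
		int			lenmiddle = utf8_mblen(str + lenfirst);
		int			lenlast = utf8_mblen(str + lenfirst + lenmiddle);

		while ((ptr - str) + lenfirst + lenmiddle + lenlast <= bytelen)
		{
			const char *next;

			out[n].offset = (int) (ptr - str);
			out[n].bytelen = lenfirst + lenmiddle + lenlast;
			n++;

			/* Slide the window forward by one character. */
			ptr += lenfirst;
			lenfirst = lenmiddle;
			lenmiddle = lenlast;
			next = ptr + lenfirst + lenmiddle;
			/* Guard (sketch only): do not read past the end of the input. */
			lenlast = (next - str < bytelen) ? utf8_mblen(next) : 1;
		}
	}
	else
	{
		/* Fast path: every character is one byte, so step byte by byte. */
		assert(bytelen == charlen);

		while (ptr - str < bytelen - 2)
		{
			out[n].offset = (int) (ptr - str);
			out[n].bytelen = 3;
			n++;
			ptr++;
		}
	}
	return n;
}
```

For the five-byte, four-character string "aébc" the slow path yields two spans of four bytes each, at byte offsets 0 and 1. For "abcd" the fast path yields two spans of three bytes each. In both cases the count is charlen - 2, so the two loops agree on how many trigrams a word produces.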
