Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commit2c05560

Browse files
committed
Indexing support for pattern matching operations via separate operator
class when lc_collate is not C.
1 parent2a2f6cf commit2c05560

File tree

20 files changed

+488
-207
lines changed

20 files changed

+488
-207
lines changed

‎doc/src/sgml/charset.sgml

Lines changed: 3 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
<!-- $Header: /cvsroot/pgsql/doc/src/sgml/charset.sgml,v 2.35 2003/04/1513:26:54 petere Exp $ -->
1+
<!-- $Header: /cvsroot/pgsql/doc/src/sgml/charset.sgml,v 2.36 2003/05/1515:50:18 petere Exp $ -->
22

33
<chapter id="charset">
44
<title>Localization</>
@@ -213,23 +213,13 @@ initdb --locale=sv_SE
213213
The <function>to_char</> family of functions
214214
</para>
215215
</listitem>
216-
217-
<listitem>
218-
<para>
219-
The <literal>LIKE</> and <literal>~</> operators for pattern
220-
matching
221-
</para>
222-
</listitem>
223216
</itemizedlist>
224217
</para>
225218

226219
<para>
227220
The only severe drawback of using the locale support in
228-
<productname>PostgreSQL</> is its speed. So use locales only if you
229-
actually need it. It should be noted in particular that selecting
230-
a non-C locale disables index optimizations for <literal>LIKE</> and
231-
<literal>~</> operators, which can make a huge difference in the
232-
speed of searches that use those operators.
221+
<productname>PostgreSQL</> is its speed. So use locales only if
222+
you actually need them.
233223
</para>
234224
</sect2>
235225

‎doc/src/sgml/indices.sgml

Lines changed: 44 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
<!-- $Header: /cvsroot/pgsql/doc/src/sgml/indices.sgml,v 1.40 2003/03/25 16:15:36 petere Exp $ -->
1+
<!-- $Header: /cvsroot/pgsql/doc/src/sgml/indices.sgml,v 1.41 2003/05/1515:50:18 petere Exp $ -->
22

33
<chapter id="indexes">
44
<title id="indexes-title">Indexes</title>
@@ -132,6 +132,19 @@ CREATE INDEX test1_id_index ON test1 (id);
132132
</simplelist>
133133
</para>
134134

135+
<para>
136+
The optimizer can also use a B-tree index for queries involving the
137+
pattern matching operators <literal>LIKE</>,
138+
<literal>ILIKE</literal>, <literal>~</literal>, and
139+
<literal>~*</literal>, <emphasis>if</emphasis> the pattern is
140+
anchored to the beginning of the string, e.g., <literal>col LIKE
141+
'foo%'</literal> or <literal>col ~ '^foo'</literal>, but not
142+
<literal>col LIKE '%bar'</literal>. However, if your server does
143+
not use the C locale you will need to create the index with a
144+
special operator class. See <xref linkend="indexes-opclass">
145+
below.
146+
</para>
147+
135148
<para>
136149
<indexterm>
137150
<primary>indexes</primary>
@@ -405,6 +418,36 @@ CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable>
405418
<literal>bigbox_ops</literal>.
406419
</para>
407420
</listitem>
421+
422+
<listitem>
423+
<para>
424+
The operator classes <literal>text_pattern_ops</literal>,
425+
<literal>varchar_pattern_ops</literal>,
426+
<literal>bpchar_pattern_ops</literal>, and
427+
<literal>name_pattern_ops</literal> support B-tree indexes on
428+
the types <type>text</type>, <type>varchar</type>,
429+
<type>char</type>, and <type>name</type>, respectively. The
430+
difference to the ordinary operator classes is that the values
431+
are compared strictly character by character rather than
432+
according to the locale-specific collation rules. This makes
433+
these operator classes suitable for use by queries involving
434+
pattern matching expressions (<literal>LIKE</literal> or POSIX
435+
regular expressions) if the server does not use the standard
436+
<quote>C</quote> locale. As an example, to index a
437+
<type>varchar</type> column like this:
438+
<programlisting>
439+
CREATE INDEX test_index ON test_table (col varchar_pattern_ops);
440+
</programlisting>
441+
If you do use the C locale, you should instead create an index
442+
with the default operator class. Also note that you should
443+
create an index with the default operator class if you want
444+
queries involving ordinary comparisons to use an index. Such
445+
queries cannot use the
446+
<literal><replaceable>xxx</replaceable>_pattern_ops</literal>
447+
operator classes. It is possible, however, to create multiple
448+
indexes on the same column with different operator classes.
449+
</para>
450+
</listitem>
408451
</itemizedlist>
409452
</para>
410453

‎doc/src/sgml/release.sgml

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
<!--
2-
$Header: /cvsroot/pgsql/doc/src/sgml/release.sgml,v 1.187 2003/05/14 03:25:59 tgl Exp $
2+
$Header: /cvsroot/pgsql/doc/src/sgml/release.sgml,v 1.188 2003/05/15 15:50:18 petere Exp $
33
-->
44

55
<appendix id="release">
@@ -24,6 +24,7 @@ CDATA means the content is "SGML-free", so you can write without
2424
worries about funny characters.
2525
-->
2626
<literallayout><![CDATA[
27+
Pattern matching operations can use indexes regardless of locale
2728
New frontend/backend protocol supports many long-requested features
2829
SET AUTOCOMMIT TO OFF is no longer supported
2930
Reimplementation of NUMERIC datatype for more speed

‎doc/src/sgml/runtime.sgml

Lines changed: 8 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
<!--
2-
$Header: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v 1.179 2003/05/14 03:26:00 tgl Exp $
2+
$Header: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v 1.180 2003/05/15 15:50:18 petere Exp $
33
-->
44

55
<Chapter Id="runtime">
@@ -133,26 +133,13 @@ postgres$ <userinput>initdb -D /usr/local/pgsql/data</userinput>
133133
</para>
134134

135135
<para>
136-
<command>initdb</command> also initializes the default locale<indexterm><primary>locale</></> for
137-
the database cluster. Normally, it will just take the locale
138-
settings in the environment and apply them to the initialized
139-
database. It is possible to specify a different locale for the
140-
database; more information about that can be found in <xref
141-
linkend="locale">. One surprise you might encounter while running
142-
<command>initdb</command> is a notice similar to this:
143-
<screen>
144-
The database cluster will be initialized with locale de_DE.
145-
This locale setting will prevent the use of indexes for pattern matching
146-
operations. If that is a concern, rerun initdb with the collation order
147-
set to "C". For more information see the documentation.
148-
</screen>
149-
This is intended to warn you that the currently selected locale
150-
will cause indexes to be sorted in an order that prevents them from
151-
being used for <literal>LIKE</> and regular-expression searches. If you need
152-
good performance in such searches, you should set your current
153-
locale to <literal>C</> and re-run <command>initdb</command>, e.g.,
154-
by running <literal>initdb --lc-collate=C</literal>. The sort
155-
order used within a particular database cluster is set by
136+
<command>initdb</command> also initializes the default
137+
locale<indexterm><primary>locale</></> for the database cluster.
138+
Normally, it will just take the locale settings in the environment
139+
and apply them to the initialized database. It is possible to
140+
specify a different locale for the database; more information about
141+
that can be found in <xref linkend="locale">. The sort order used
142+
within a particular database cluster is set by
156143
<command>initdb</command> and cannot be changed later, short of
157144
dumping all data, rerunning <command>initdb</command>, and
158145
reloading the data. So it's important to make this choice correctly

‎src/backend/access/nbtree/nbtcompare.c

Lines changed: 10 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@
88
*
99
*
1010
* IDENTIFICATION
11-
* $Header: /cvsroot/pgsql/src/backend/access/nbtree/nbtcompare.c,v 1.44 2002/06/20 20:29:25 momjian Exp $
11+
* $Header: /cvsroot/pgsql/src/backend/access/nbtree/nbtcompare.c,v 1.45 2003/05/15 15:50:18 petere Exp $
1212
*
1313
* NOTES
1414
*
@@ -149,3 +149,12 @@ btnamecmp(PG_FUNCTION_ARGS)
149149

150150
PG_RETURN_INT32(strncmp(NameStr(*a),NameStr(*b),NAMEDATALEN));
151151
}
152+
153+
Datum
154+
btname_pattern_cmp(PG_FUNCTION_ARGS)
155+
{
156+
Namea=PG_GETARG_NAME(0);
157+
Nameb=PG_GETARG_NAME(1);
158+
159+
PG_RETURN_INT32(memcmp(NameStr(*a),NameStr(*b),NAMEDATALEN));
160+
}

‎src/backend/optimizer/path/indxpath.c

Lines changed: 46 additions & 35 deletions
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,7 @@
99
*
1010
*
1111
* IDENTIFICATION
12-
* $Header: /cvsroot/pgsql/src/backend/optimizer/path/indxpath.c,v 1.137 2003/05/13 04:38:58 tgl Exp $
12+
* $Header: /cvsroot/pgsql/src/backend/optimizer/path/indxpath.c,v 1.138 2003/05/15 15:50:18 petere Exp $
1313
*
1414
*-------------------------------------------------------------------------
1515
*/
@@ -1797,44 +1797,40 @@ match_special_index_operator(Expr *clause, Oid opclass,
17971797
caseOID_VARCHAR_LIKE_OP:
17981798
caseOID_NAME_LIKE_OP:
17991799
/* the right-hand const is type text for all of these */
1800-
if (locale_is_like_safe())
1801-
isIndexable=pattern_fixed_prefix(patt,Pattern_Type_Like,
1802-
&prefix,&rest)!=Pattern_Prefix_None;
1800+
isIndexable=pattern_fixed_prefix(patt,Pattern_Type_Like,
1801+
&prefix,&rest)!=Pattern_Prefix_None;
18031802
break;
18041803

18051804
caseOID_BYTEA_LIKE_OP:
18061805
isIndexable=pattern_fixed_prefix(patt,Pattern_Type_Like,
1807-
&prefix,&rest)!=Pattern_Prefix_None;
1806+
&prefix,&rest)!=Pattern_Prefix_None;
18081807
break;
18091808

18101809
caseOID_TEXT_ICLIKE_OP:
18111810
caseOID_BPCHAR_ICLIKE_OP:
18121811
caseOID_VARCHAR_ICLIKE_OP:
18131812
caseOID_NAME_ICLIKE_OP:
18141813
/* the right-hand const is type text for all of these */
1815-
if (locale_is_like_safe())
1816-
isIndexable=pattern_fixed_prefix(patt,Pattern_Type_Like_IC,
1817-
&prefix,&rest)!=Pattern_Prefix_None;
1814+
isIndexable=pattern_fixed_prefix(patt,Pattern_Type_Like_IC,
1815+
&prefix,&rest)!=Pattern_Prefix_None;
18181816
break;
18191817

18201818
caseOID_TEXT_REGEXEQ_OP:
18211819
caseOID_BPCHAR_REGEXEQ_OP:
18221820
caseOID_VARCHAR_REGEXEQ_OP:
18231821
caseOID_NAME_REGEXEQ_OP:
18241822
/* the right-hand const is type text for all of these */
1825-
if (locale_is_like_safe())
1826-
isIndexable=pattern_fixed_prefix(patt,Pattern_Type_Regex,
1827-
&prefix,&rest)!=Pattern_Prefix_None;
1823+
isIndexable=pattern_fixed_prefix(patt,Pattern_Type_Regex,
1824+
&prefix,&rest)!=Pattern_Prefix_None;
18281825
break;
18291826

18301827
caseOID_TEXT_ICREGEXEQ_OP:
18311828
caseOID_BPCHAR_ICREGEXEQ_OP:
18321829
caseOID_VARCHAR_ICREGEXEQ_OP:
18331830
caseOID_NAME_ICREGEXEQ_OP:
18341831
/* the right-hand const is type text for all of these */
1835-
if (locale_is_like_safe())
1836-
isIndexable=pattern_fixed_prefix(patt,Pattern_Type_Regex_IC,
1837-
&prefix,&rest)!=Pattern_Prefix_None;
1832+
isIndexable=pattern_fixed_prefix(patt,Pattern_Type_Regex_IC,
1833+
&prefix,&rest)!=Pattern_Prefix_None;
18381834
break;
18391835

18401836
caseOID_INET_SUB_OP:
@@ -1867,42 +1863,53 @@ match_special_index_operator(Expr *clause, Oid opclass,
18671863
caseOID_TEXT_ICLIKE_OP:
18681864
caseOID_TEXT_REGEXEQ_OP:
18691865
caseOID_TEXT_ICREGEXEQ_OP:
1870-
if (!op_in_opclass(find_operator(">=",TEXTOID),opclass)||
1871-
!op_in_opclass(find_operator("<",TEXTOID),opclass))
1872-
isIndexable= false;
1866+
if (lc_collate_is_c())
1867+
isIndexable= (op_in_opclass(find_operator(">=",TEXTOID),opclass)
1868+
&&op_in_opclass(find_operator("<",TEXTOID),opclass));
1869+
else
1870+
isIndexable= (op_in_opclass(find_operator("~>=~",TEXTOID),opclass)
1871+
&&op_in_opclass(find_operator("~<~",TEXTOID),opclass));
18731872
break;
18741873

18751874
caseOID_BYTEA_LIKE_OP:
1876-
if (!op_in_opclass(find_operator(">=",BYTEAOID),opclass)||
1877-
!op_in_opclass(find_operator("<",BYTEAOID),opclass))
1878-
isIndexable= false;
1875+
isIndexable= (op_in_opclass(find_operator(">=",BYTEAOID),opclass)
1876+
&&op_in_opclass(find_operator("<",BYTEAOID),opclass));
18791877
break;
18801878

18811879
caseOID_BPCHAR_LIKE_OP:
18821880
caseOID_BPCHAR_ICLIKE_OP:
18831881
caseOID_BPCHAR_REGEXEQ_OP:
18841882
caseOID_BPCHAR_ICREGEXEQ_OP:
1885-
if (!op_in_opclass(find_operator(">=",BPCHAROID),opclass)||
1886-
!op_in_opclass(find_operator("<",BPCHAROID),opclass))
1887-
isIndexable= false;
1883+
if (lc_collate_is_c())
1884+
isIndexable= (op_in_opclass(find_operator(">=",BPCHAROID),opclass)
1885+
&&op_in_opclass(find_operator("<",BPCHAROID),opclass));
1886+
else
1887+
isIndexable= (op_in_opclass(find_operator("~>=~",BPCHAROID),opclass)
1888+
&&op_in_opclass(find_operator("~<~",BPCHAROID),opclass));
18881889
break;
18891890

18901891
caseOID_VARCHAR_LIKE_OP:
18911892
caseOID_VARCHAR_ICLIKE_OP:
18921893
caseOID_VARCHAR_REGEXEQ_OP:
18931894
caseOID_VARCHAR_ICREGEXEQ_OP:
1894-
if (!op_in_opclass(find_operator(">=",VARCHAROID),opclass)||
1895-
!op_in_opclass(find_operator("<",VARCHAROID),opclass))
1896-
isIndexable= false;
1895+
if (lc_collate_is_c())
1896+
isIndexable= (op_in_opclass(find_operator(">=",VARCHAROID),opclass)
1897+
&&op_in_opclass(find_operator("<",VARCHAROID),opclass));
1898+
else
1899+
isIndexable= (op_in_opclass(find_operator("~>=~",VARCHAROID),opclass)
1900+
&&op_in_opclass(find_operator("~<~",VARCHAROID),opclass));
18971901
break;
18981902

18991903
caseOID_NAME_LIKE_OP:
19001904
caseOID_NAME_ICLIKE_OP:
19011905
caseOID_NAME_REGEXEQ_OP:
19021906
caseOID_NAME_ICREGEXEQ_OP:
1903-
if (!op_in_opclass(find_operator(">=",NAMEOID),opclass)||
1904-
!op_in_opclass(find_operator("<",NAMEOID),opclass))
1905-
isIndexable= false;
1907+
if (lc_collate_is_c())
1908+
isIndexable= (op_in_opclass(find_operator(">=",NAMEOID),opclass)
1909+
&&op_in_opclass(find_operator("<",NAMEOID),opclass));
1910+
else
1911+
isIndexable= (op_in_opclass(find_operator("~>=~",NAMEOID),opclass)
1912+
&&op_in_opclass(find_operator("~<~",NAMEOID),opclass));
19061913
break;
19071914

19081915
caseOID_INET_SUB_OP:
@@ -2039,6 +2046,7 @@ prefix_quals(Node *leftop, Oid expr_op,
20392046
List*result;
20402047
Oiddatatype;
20412048
Oidoproid;
2049+
constchar*oprname;
20422050
char*prefix;
20432051
Const*con;
20442052
Expr*expr;
@@ -2098,9 +2106,10 @@ prefix_quals(Node *leftop, Oid expr_op,
20982106
*/
20992107
if (pstatus==Pattern_Prefix_Exact)
21002108
{
2101-
oproid=find_operator("=",datatype);
2109+
oprname= (datatype==BYTEAOID||lc_collate_is_c() ?"=" :"~=~");
2110+
oproid=find_operator(oprname,datatype);
21022111
if (oproid==InvalidOid)
2103-
elog(ERROR,"prefix_quals: no=operator for type %u",datatype);
2112+
elog(ERROR,"prefix_quals: no operator%sfor type %u",oprname,datatype);
21042113
con=string_to_const(prefix,datatype);
21052114
expr=make_opclause(oproid,BOOLOID, false,
21062115
(Expr*)leftop, (Expr*)con);
@@ -2113,9 +2122,10 @@ prefix_quals(Node *leftop, Oid expr_op,
21132122
*
21142123
* We can always say "x >= prefix".
21152124
*/
2116-
oproid=find_operator(">=",datatype);
2125+
oprname= (datatype==BYTEAOID||lc_collate_is_c() ?">=" :"~>=~");
2126+
oproid=find_operator(oprname,datatype);
21172127
if (oproid==InvalidOid)
2118-
elog(ERROR,"prefix_quals: no>=operator for type %u",datatype);
2128+
elog(ERROR,"prefix_quals: no operator%sfor type %u",oprname,datatype);
21192129
con=string_to_const(prefix,datatype);
21202130
expr=make_opclause(oproid,BOOLOID, false,
21212131
(Expr*)leftop, (Expr*)con);
@@ -2129,9 +2139,10 @@ prefix_quals(Node *leftop, Oid expr_op,
21292139
greaterstr=make_greater_string(con);
21302140
if (greaterstr)
21312141
{
2132-
oproid=find_operator("<",datatype);
2142+
oprname= (datatype==BYTEAOID||lc_collate_is_c() ?"<" :"~<~");
2143+
oproid=find_operator(oprname,datatype);
21332144
if (oproid==InvalidOid)
2134-
elog(ERROR,"prefix_quals: no<operator for type %u",datatype);
2145+
elog(ERROR,"prefix_quals: no operator%sfor type %u",oprname,datatype);
21352146
expr=make_opclause(oproid,BOOLOID, false,
21362147
(Expr*)leftop, (Expr*)greaterstr);
21372148
result=lappend(result,expr);

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp