Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commitafb0d07

Browse files
committed
Replace the data structure used for keyword lookup.
Previously, ScanKeywordLookup was passed an array of string pointers.This had some performance deficiencies: the strings themselves mightbe scattered all over the place depending on the compiler (and somequick checking shows that at least with gcc-on-Linux, they indeedweren't reliably close together). That led to very cache-unfriendlybehavior as the binary search touched strings in many different pages.Also, depending on the platform, the string pointers might need tobe adjusted at program start, so that they couldn't be simple constantdata. And the ScanKeyword struct had been designed with an eye to32-bit machines originally; on 64-bit it requires 16 bytes perkeyword, making it even more cache-unfriendly.Redesign so that the keyword strings themselves are allocatedconsecutively (as part of one big char-string constant), therebyeliminating the touch-lots-of-unrelated-pages syndrome. And getrid of the ScanKeyword array in favor of three separate arrays:uint16 offsets into the keyword array, uint16 token codes, anduint8 keyword categories. That reduces the overhead per keywordto 5 bytes instead of 16 (even less in programs that only needone of the token codes and categories); moreover, the binary searchonly touches the offsets array, further reducing its cache footprint.This also lets us put the token codes somewhere else than thekeyword strings are, which avoids some unpleasant build dependencies.While we're at it, wrap the data used by ScanKeywordLookup intoa struct that can be treated as an opaque type by most callers.That doesn't change things much right now, but it will make itless painful to switch to a hash-based lookup method, as is beingdiscussed in the mailing list thread.Most of the change here is associated with adding a generatorscript that can build the new data structure from the samelist-of-PG_KEYWORD header representation we used before.The PG_KEYWORD lists that plpgsql and ecpg used to embed intheir scanner .c files have to be moved into headers, and theMakefiles have to be taught to invoke the generator script.This work is also necessary if we're to consider hash-based lookup,since the generator script is what would be responsible forconstructing a hash table.Aside from saving a few kilobytes in each program that includesthe keyword table, this seems to speed up raw parsing (flex+bison)by a few percent. So it's worth doing even as it stands, thoughwe think we can gain even more with a follow-on patch to switchto hash-based lookup.John Naylor, with further hacking by meDiscussion:https://postgr.es/m/CAJVSVGXdFVU2sgym89XPL=Lv1zOS5=EHHQ8XWNzFL=mTXkKMLw@mail.gmail.com
1 parentc5c7fa2 commitafb0d07

32 files changed

+843
-439
lines changed

‎contrib/pg_stat_statements/pg_stat_statements.c

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -3075,8 +3075,8 @@ fill_in_constant_lengths(pgssJumbleState *jstate, const char *query,
30753075
/* initialize the flex scanner --- should match raw_parser() */
30763076
yyscanner=scanner_init(query,
30773077
&yyextra,
3078-
ScanKeywords,
3079-
NumScanKeywords);
3078+
&ScanKeywords,
3079+
ScanKeywordTokens);
30803080

30813081
/* we don't want to re-emit any escape string warnings */
30823082
yyextra.escape_string_warning= false;

‎src/backend/parser/parser.c

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -41,7 +41,7 @@ raw_parser(const char *str)
4141

4242
/* initialize the flex scanner */
4343
yyscanner=scanner_init(str,&yyextra.core_yy_extra,
44-
ScanKeywords,NumScanKeywords);
44+
&ScanKeywords,ScanKeywordTokens);
4545

4646
/* base_yylex() only needs this much initialization */
4747
yyextra.have_lookahead= false;

‎src/backend/parser/scan.l

Lines changed: 33 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -66,6 +66,21 @@ intbackslash_quote = BACKSLASH_QUOTE_SAFE_ENCODING;
6666
boolescape_string_warning =true;
6767
boolstandard_conforming_strings =true;
6868

69+
/*
70+
* Constant data exported from this file. This array maps from the
71+
* zero-based keyword numbers returned by ScanKeywordLookup to the
72+
* Bison token numbers needed by gram.y. This is exported because
73+
* callers need to pass it to scanner_init, if they are using the
74+
* standard keyword list ScanKeywords.
75+
*/
76+
#definePG_KEYWORD(kwname, value, category) value,
77+
78+
const uint16 ScanKeywordTokens[] = {
79+
#include"parser/kwlist.h"
80+
};
81+
82+
#undef PG_KEYWORD
83+
6984
/*
7085
* Set the type of YYSTYPE.
7186
*/
@@ -504,18 +519,18 @@ other.
504519
* We will pass this along as a normal character string,
505520
* but preceded with an internally-generated "NCHAR".
506521
*/
507-
const ScanKeyword *keyword;
522+
intkwnum;
508523

509524
SET_YYLLOC();
510525
yyless(1);/* eat only 'n' this time */
511526

512-
keyword =ScanKeywordLookup("nchar",
513-
yyextra->keywords,
514-
yyextra->num_keywords);
515-
if (keyword !=NULL)
527+
kwnum =ScanKeywordLookup("nchar",
528+
yyextra->keywordlist);
529+
if (kwnum >=0)
516530
{
517-
yylval->keyword = keyword->name;
518-
return keyword->value;
531+
yylval->keyword =GetScanKeyword(kwnum,
532+
yyextra->keywordlist);
533+
return yyextra->keyword_tokens[kwnum];
519534
}
520535
else
521536
{
@@ -1021,19 +1036,19 @@ other.
10211036

10221037

10231038
{identifier}{
1024-
const ScanKeyword *keyword;
1039+
intkwnum;
10251040
char *ident;
10261041

10271042
SET_YYLLOC();
10281043

10291044
/* Is it a keyword? */
1030-
keyword =ScanKeywordLookup(yytext,
1031-
yyextra->keywords,
1032-
yyextra->num_keywords);
1033-
if (keyword !=NULL)
1045+
kwnum =ScanKeywordLookup(yytext,
1046+
yyextra->keywordlist);
1047+
if (kwnum >=0)
10341048
{
1035-
yylval->keyword = keyword->name;
1036-
return keyword->value;
1049+
yylval->keyword =GetScanKeyword(kwnum,
1050+
yyextra->keywordlist);
1051+
return yyextra->keyword_tokens[kwnum];
10371052
}
10381053

10391054
/*
@@ -1142,8 +1157,8 @@ scanner_yyerror(const char *message, core_yyscan_t yyscanner)
11421157
core_yyscan_t
11431158
scanner_init(constchar *str,
11441159
core_yy_extra_type *yyext,
1145-
constScanKeyword *keywords,
1146-
int num_keywords)
1160+
constScanKeywordList *keywordlist,
1161+
const uint16 *keyword_tokens)
11471162
{
11481163
Sizeslen =strlen(str);
11491164
yyscan_tscanner;
@@ -1153,8 +1168,8 @@ scanner_init(const char *str,
11531168

11541169
core_yyset_extra(yyext, scanner);
11551170

1156-
yyext->keywords =keywords;
1157-
yyext->num_keywords =num_keywords;
1171+
yyext->keywordlist =keywordlist;
1172+
yyext->keyword_tokens =keyword_tokens;
11581173

11591174
yyext->backslash_quote = backslash_quote;
11601175
yyext->escape_string_warning = escape_string_warning;

‎src/backend/utils/adt/misc.c

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -417,15 +417,17 @@ pg_get_keywords(PG_FUNCTION_ARGS)
417417

418418
funcctx=SRF_PERCALL_SETUP();
419419

420-
if (funcctx->call_cntr<NumScanKeywords)
420+
if (funcctx->call_cntr<ScanKeywords.num_keywords)
421421
{
422422
char*values[3];
423423
HeapTupletuple;
424424

425425
/* cast-away-const is ugly but alternatives aren't much better */
426-
values[0]=unconstify(char*,ScanKeywords[funcctx->call_cntr].name);
426+
values[0]=unconstify(char*,
427+
GetScanKeyword(funcctx->call_cntr,
428+
&ScanKeywords));
427429

428-
switch (ScanKeywords[funcctx->call_cntr].category)
430+
switch (ScanKeywordCategories[funcctx->call_cntr])
429431
{
430432
caseUNRESERVED_KEYWORD:
431433
values[1]="U";

‎src/backend/utils/adt/ruleutils.c

Lines changed: 2 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -10601,11 +10601,9 @@ quote_identifier(const char *ident)
1060110601
* Note: ScanKeywordLookup() does case-insensitive comparison, but
1060210602
* that's fine, since we already know we have all-lower-case.
1060310603
*/
10604-
constScanKeyword*keyword=ScanKeywordLookup(ident,
10605-
ScanKeywords,
10606-
NumScanKeywords);
10604+
intkwnum=ScanKeywordLookup(ident,&ScanKeywords);
1060710605

10608-
if (keyword!=NULL&&keyword->category!=UNRESERVED_KEYWORD)
10606+
if (kwnum >=0&&ScanKeywordCategories[kwnum]!=UNRESERVED_KEYWORD)
1060910607
safe= false;
1061010608
}
1061110609

‎src/common/.gitignore

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
/kwlist_d.h

‎src/common/Makefile

Lines changed: 15 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -41,11 +41,11 @@ override CPPFLAGS += -DVAL_LDFLAGS_EX="\"$(LDFLAGS_EX)\""
4141
overrideCPPFLAGS += -DVAL_LDFLAGS_SL="\"$(LDFLAGS_SL)\""
4242
overrideCPPFLAGS += -DVAL_LIBS="\"$(LIBS)\""
4343

44-
overrideCPPFLAGS := -DFRONTEND$(CPPFLAGS)
44+
overrideCPPFLAGS := -DFRONTEND-I. -I$(top_srcdir)/src/common$(CPPFLAGS)
4545
LIBS +=$(PTHREAD_LIBS)
4646

4747
OBJS_COMMON = base64.o config_info.o controldata_utils.o exec.o file_perm.o\
48-
ip.o keywords.o link-canary.o md5.o pg_lzcompress.o\
48+
ip.o keywords.okwlookup.olink-canary.o md5.o pg_lzcompress.o\
4949
pgfnames.o psprintf.o relpath.o\
5050
rmtree.o saslprep.o scram-common.o string.o unicode_norm.o\
5151
username.o wait_error.o
@@ -65,6 +65,8 @@ OBJS_SRV = $(OBJS_COMMON:%.o=%_srv.o)
6565

6666
all: libpgcommon.a libpgcommon_shlib.a libpgcommon_srv.a
6767

68+
distprep: kwlist_d.h
69+
6870
# libpgcommon is needed by some contrib
6971
install: all installdirs
7072
$(INSTALL_STLIB) libpgcommon.a'$(DESTDIR)$(libdir)/libpgcommon.a'
@@ -115,16 +117,18 @@ libpgcommon_srv.a: $(OBJS_SRV)
115117
%_srv.o:%.c%.o
116118
$(CC)$(CFLAGS)$(subst -DFRONTEND,,$(CPPFLAGS)) -c$< -o$@
117119

118-
# Dependencies of keywords.o need to be managed explicitly to make sure
119-
# that you don't get broken parsing code, even in a non-enable-depend build.
120-
# Note that gram.h isn't required for the frontend versions of keywords.o.
121-
$(top_builddir)/src/include/parser/gram.h:$(top_srcdir)/src/backend/parser/gram.y
122-
$(MAKE) -C$(top_builddir)/src/backend$(top_builddir)/src/include/parser/gram.h
120+
# generate SQL keyword lookup table to be included into keywords*.o.
121+
kwlist_d.h:$(top_srcdir)/src/include/parser/kwlist.h$(top_srcdir)/src/tools/gen_keywordlist.pl
122+
$(PERL)$(top_srcdir)/src/tools/gen_keywordlist.pl --extern$<
123123

124-
keywords.o:$(top_srcdir)/src/include/parser/kwlist.h
125-
keywords_shlib.o:$(top_srcdir)/src/include/parser/kwlist.h
126-
keywords_srv.o:$(top_builddir)/src/include/parser/gram.h$(top_srcdir)/src/include/parser/kwlist.h
124+
# Dependencies ofkeywords*.o need to be managed explicitly to make sure
125+
# that you don't get broken parsing code, even in a non-enable-depend build.
126+
keywords.okeywords_shlib.okeywords_srv.o: kwlist_d.h
127127

128-
cleandistcleanmaintainer-clean:
128+
# kwlist_d.h is in the distribution tarball, so it is not cleaned here.
129+
cleandistclean:
129130
rm -f libpgcommon.a libpgcommon_shlib.a libpgcommon_srv.a
130131
rm -f$(OBJS_FRONTEND)$(OBJS_SHLIB)$(OBJS_SRV)
132+
133+
maintainer-clean: distclean
134+
rm -f kwlist_d.h

‎src/common/keywords.c

Lines changed: 9 additions & 90 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
/*-------------------------------------------------------------------------
22
*
33
* keywords.c
4-
*lexical token lookup for key words in PostgreSQL
4+
*PostgreSQL's list of SQL keywords
55
*
66
*
77
* Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
@@ -13,102 +13,21 @@
1313
*
1414
*-------------------------------------------------------------------------
1515
*/
16-
#ifndefFRONTEND
17-
#include"postgres.h"
18-
#else
19-
#include"postgres_fe.h"
20-
#endif
16+
#include"c.h"
2117

22-
#ifndefFRONTEND
23-
24-
#include"parser/gramparse.h"
18+
#include"common/keywords.h"
2519

26-
#definePG_KEYWORD(a,b,c) {a,b,c},
2720

28-
#else
21+
/* ScanKeywordList lookup data for SQL keywords */
2922

30-
#include"common/keywords.h"
31-
32-
/*
33-
* We don't need the token number for frontend uses, so leave it out to avoid
34-
* requiring backend headers that won't compile cleanly here.
35-
*/
36-
#definePG_KEYWORD(a,b,c) {a,0,c},
23+
#include"kwlist_d.h"
3724

38-
#endif/*FRONTEND */
25+
/*Keyword categories for SQL keywords */
3926

27+
#definePG_KEYWORD(kwname,value,category) category,
4028

41-
constScanKeywordScanKeywords[]= {
29+
constuint8ScanKeywordCategories[SCANKEYWORDS_NUM_KEYWORDS]= {
4230
#include"parser/kwlist.h"
4331
};
4432

45-
constintNumScanKeywords=lengthof(ScanKeywords);
46-
47-
48-
/*
49-
* ScanKeywordLookup - see if a given word is a keyword
50-
*
51-
* The table to be searched is passed explicitly, so that this can be used
52-
* to search keyword lists other than the standard list appearing above.
53-
*
54-
* Returns a pointer to the ScanKeyword table entry, or NULL if no match.
55-
*
56-
* The match is done case-insensitively. Note that we deliberately use a
57-
* dumbed-down case conversion that will only translate 'A'-'Z' into 'a'-'z',
58-
* even if we are in a locale where tolower() would produce more or different
59-
* translations. This is to conform to the SQL99 spec, which says that
60-
* keywords are to be matched in this way even though non-keyword identifiers
61-
* receive a different case-normalization mapping.
62-
*/
63-
constScanKeyword*
64-
ScanKeywordLookup(constchar*text,
65-
constScanKeyword*keywords,
66-
intnum_keywords)
67-
{
68-
intlen,
69-
i;
70-
charword[NAMEDATALEN];
71-
constScanKeyword*low;
72-
constScanKeyword*high;
73-
74-
len=strlen(text);
75-
/* We assume all keywords are shorter than NAMEDATALEN. */
76-
if (len >=NAMEDATALEN)
77-
returnNULL;
78-
79-
/*
80-
* Apply an ASCII-only downcasing. We must not use tolower() since it may
81-
* produce the wrong translation in some locales (eg, Turkish).
82-
*/
83-
for (i=0;i<len;i++)
84-
{
85-
charch=text[i];
86-
87-
if (ch >='A'&&ch <='Z')
88-
ch+='a'-'A';
89-
word[i]=ch;
90-
}
91-
word[len]='\0';
92-
93-
/*
94-
* Now do a binary search using plain strcmp() comparison.
95-
*/
96-
low=keywords;
97-
high=keywords+ (num_keywords-1);
98-
while (low <=high)
99-
{
100-
constScanKeyword*middle;
101-
intdifference;
102-
103-
middle=low+ (high-low) /2;
104-
difference=strcmp(middle->name,word);
105-
if (difference==0)
106-
returnmiddle;
107-
elseif (difference<0)
108-
low=middle+1;
109-
else
110-
high=middle-1;
111-
}
112-
113-
returnNULL;
114-
}
33+
#undef PG_KEYWORD

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp