NotificationsYou must be signed in to change notification settings
Fork6
Star31

Commitafb0d07

committed

Replace the data structure used for keyword lookup.

Previously, ScanKeywordLookup was passed an array of string pointers.This had some performance deficiencies: the strings themselves mightbe scattered all over the place depending on the compiler (and somequick checking shows that at least with gcc-on-Linux, they indeedweren't reliably close together). That led to very cache-unfriendlybehavior as the binary search touched strings in many different pages.Also, depending on the platform, the string pointers might need tobe adjusted at program start, so that they couldn't be simple constantdata. And the ScanKeyword struct had been designed with an eye to32-bit machines originally; on 64-bit it requires 16 bytes perkeyword, making it even more cache-unfriendly.Redesign so that the keyword strings themselves are allocatedconsecutively (as part of one big char-string constant), therebyeliminating the touch-lots-of-unrelated-pages syndrome. And getrid of the ScanKeyword array in favor of three separate arrays:uint16 offsets into the keyword array, uint16 token codes, anduint8 keyword categories. That reduces the overhead per keywordto 5 bytes instead of 16 (even less in programs that only needone of the token codes and categories); moreover, the binary searchonly touches the offsets array, further reducing its cache footprint.This also lets us put the token codes somewhere else than thekeyword strings are, which avoids some unpleasant build dependencies.While we're at it, wrap the data used by ScanKeywordLookup intoa struct that can be treated as an opaque type by most callers.That doesn't change things much right now, but it will make itless painful to switch to a hash-based lookup method, as is beingdiscussed in the mailing list thread.Most of the change here is associated with adding a generatorscript that can build the new data structure from the samelist-of-PG_KEYWORD header representation we used before.The PG_KEYWORD lists that plpgsql and ecpg used to embed intheir scanner .c files have to be moved into headers, and theMakefiles have to be taught to invoke the generator script.This work is also necessary if we're to consider hash-based lookup,since the generator script is what would be responsible forconstructing a hash table.Aside from saving a few kilobytes in each program that includesthe keyword table, this seems to speed up raw parsing (flex+bison)by a few percent. So it's worth doing even as it stands, thoughwe think we can gain even more with a follow-on patch to switchto hash-based lookup.John Naylor, with further hacking by meDiscussion:https://postgr.es/m/CAJVSVGXdFVU2sgym89XPL=Lv1zOS5=EHHQ8XWNzFL=mTXkKMLw@mail.gmail.com

1 parentc5c7fa2 commitafb0d07Copy full SHA for afb0d07

File tree

32 files changed

+843

-439

lines changed

contrib/pg_stat_statements
- pg_stat_statements.c
src
- backend
  - parser
    - parser.c
    - scan.l
  - utils/adt
    - misc.c
    - ruleutils.c
- common
- fe_utils
  - string_utils.c
- include
  - common
    - keywords.h
    - kwlookup.h
  - parser
    - kwlist.h
    - scanner.h
- interfaces/ecpg/preproc
- pl/plpgsql/src
- tools
  - gen_keywordlist.pl
  - msvc

32 files changed

+843

-439

lines changed

`‎contrib/pg_stat_statements/pg_stat_statements.c`

Lines changed: 2 additions & 2 deletions

Original file line number	Diff line number	Diff line change
`@@ -3075,8 +3075,8 @@ fill_in_constant_lengths(pgssJumbleState jstate, const char query,`
`3075`	`3075`	`/* initialize the flex scanner --- should match raw_parser() */`
`3076`	`3076`	`yyscanner=scanner_init(query,`
`3077`	`3077`	`&yyextra,`
`3078`		`-ScanKeywords,`
`3079`		`-NumScanKeywords);`
	`3078`	`+&ScanKeywords,`
	`3079`	`+ScanKeywordTokens);`
`3080`	`3080`
`3081`	`3081`	`/* we don't want to re-emit any escape string warnings */`
`3082`	`3082`	`yyextra.escape_string_warning= false;`

`‎src/backend/parser/parser.c`

Lines changed: 1 addition & 1 deletion

Original file line number	Diff line number	Diff line change
`@@ -41,7 +41,7 @@ raw_parser(const char *str)`
`41`	`41`
`42`	`42`	`/* initialize the flex scanner */`
`43`	`43`	`yyscanner=scanner_init(str,&yyextra.core_yy_extra,`
`44`		`-ScanKeywords,NumScanKeywords);`
	`44`	`+&ScanKeywords,ScanKeywordTokens);`
`45`	`45`
`46`	`46`	`/* base_yylex() only needs this much initialization */`
`47`	`47`	`yyextra.have_lookahead= false;`

`‎src/backend/parser/scan.l`

Lines changed: 33 additions & 18 deletions

Original file line number	Diff line number	Diff line change
`@@ -66,6 +66,21 @@ intbackslash_quote = BACKSLASH_QUOTE_SAFE_ENCODING;`
`66`	`66`	`boolescape_string_warning =true;`
`67`	`67`	`boolstandard_conforming_strings =true;`
`68`	`68`
	`69`	`+/*`
	`70`	`+ * Constant data exported from this file. This array maps from the`
	`71`	`+ * zero-based keyword numbers returned by ScanKeywordLookup to the`
	`72`	`+ * Bison token numbers needed by gram.y. This is exported because`
	`73`	`+ * callers need to pass it to scanner_init, if they are using the`
	`74`	`+ * standard keyword list ScanKeywords.`
	`75`	`+*/`
	`76`	`+#definePG_KEYWORD(kwname, value, category) value,`
	`77`	`+`
	`78`	`+const uint16 ScanKeywordTokens[] = {`
	`79`	`+#include"parser/kwlist.h"`
	`80`	`+};`
	`81`	`+`
	`82`	`+#undef PG_KEYWORD`
	`83`	`+`
`69`	`84`	`/*`
`70`	`85`	`* Set the type of YYSTYPE.`
`71`	`86`	`*/`
`@@ -504,18 +519,18 @@ other.`
`504`	`519`	`* We will pass this along as a normal character string,`
`505`	`520`	`* but preceded with an internally-generated "NCHAR".`
`506`	`521`	`*/`
`507`		`-const ScanKeyword *keyword;`
	`522`	`+intkwnum;`
`508`	`523`
`509`	`524`	`SET_YYLLOC();`
`510`	`525`	`yyless(1);/* eat only 'n' this time */`
`511`	`526`
`512`		`-keyword =ScanKeywordLookup("nchar",`
`513`		`-yyextra->keywords,`
`514`		`-yyextra->num_keywords);`
`515`		`-if (keyword !=NULL)`
	`527`	`+kwnum =ScanKeywordLookup("nchar",`
	`528`	`+ yyextra->keywordlist);`
	`529`	`+if (kwnum >=0)`
`516`	`530`	`{`
`517`		`-yylval->keyword = keyword->name;`
`518`		`-return keyword->value;`
	`531`	`+yylval->keyword =GetScanKeyword(kwnum,`
	`532`	`+ yyextra->keywordlist);`
	`533`	`+return yyextra->keyword_tokens[kwnum];`
`519`	`534`	`}`
`520`	`535`	`else`
`521`	`536`	`{`
`@@ -1021,19 +1036,19 @@ other.`
`1021`	`1036`
`1022`	`1037`
`1023`	`1038`	`{identifier}{`
`1024`		`-const ScanKeyword *keyword;`
	`1039`	`+intkwnum;`
`1025`	`1040`	`char *ident;`
`1026`	`1041`
`1027`	`1042`	`SET_YYLLOC();`
`1028`	`1043`
`1029`	`1044`	`/* Is it a keyword? */`
`1030`		`-keyword =ScanKeywordLookup(yytext,`
`1031`		`-yyextra->keywords,`
`1032`		`-yyextra->num_keywords);`
`1033`		`-if (keyword !=NULL)`
	`1045`	`+kwnum =ScanKeywordLookup(yytext,`
	`1046`	`+ yyextra->keywordlist);`
	`1047`	`+if (kwnum >=0)`
`1034`	`1048`	`{`
`1035`		`-yylval->keyword = keyword->name;`
`1036`		`-return keyword->value;`
	`1049`	`+yylval->keyword =GetScanKeyword(kwnum,`
	`1050`	`+ yyextra->keywordlist);`
	`1051`	`+return yyextra->keyword_tokens[kwnum];`
`1037`	`1052`	`}`
`1038`	`1053`
`1039`	`1054`	`/*`
`@@ -1142,8 +1157,8 @@ scanner_yyerror(const char *message, core_yyscan_t yyscanner)`
`1142`	`1157`	`core_yyscan_t`
`1143`	`1158`	`scanner_init(constchar *str,`
`1144`	`1159`	`core_yy_extra_type *yyext,`
`1145`		`-constScanKeyword *keywords,`
`1146`		`-int num_keywords)`
	`1160`	`+constScanKeywordList *keywordlist,`
	`1161`	`+const uint16 *keyword_tokens)`
`1147`	`1162`	`{`
`1148`	`1163`	`Sizeslen =strlen(str);`
`1149`	`1164`	`yyscan_tscanner;`
`@@ -1153,8 +1168,8 @@ scanner_init(const char *str,`
`1153`	`1168`
`1154`	`1169`	`core_yyset_extra(yyext, scanner);`
`1155`	`1170`
`1156`		`-yyext->keywords =keywords;`
`1157`		`-yyext->num_keywords =num_keywords;`
	`1171`	`+yyext->keywordlist =keywordlist;`
	`1172`	`+yyext->keyword_tokens =keyword_tokens;`
`1158`	`1173`
`1159`	`1174`	`yyext->backslash_quote = backslash_quote;`
`1160`	`1175`	`yyext->escape_string_warning = escape_string_warning;`

`‎src/backend/utils/adt/misc.c`

Lines changed: 5 additions & 3 deletions

Original file line number	Diff line number	Diff line change
`@@ -417,15 +417,17 @@ pg_get_keywords(PG_FUNCTION_ARGS)`
`417`	`417`
`418`	`418`	`funcctx=SRF_PERCALL_SETUP();`
`419`	`419`
`420`		`-if (funcctx->call_cntr<NumScanKeywords)`
	`420`	`+if (funcctx->call_cntr<ScanKeywords.num_keywords)`
`421`	`421`	`{`
`422`	`422`	`char*values[3];`
`423`	`423`	`HeapTupletuple;`
`424`	`424`
`425`	`425`	`/* cast-away-const is ugly but alternatives aren't much better */`
`426`		`-values[0]=unconstify(char*,ScanKeywords[funcctx->call_cntr].name);`
	`426`	`+values[0]=unconstify(char*,`
	`427`	`+GetScanKeyword(funcctx->call_cntr,`
	`428`	`+&ScanKeywords));`
`427`	`429`
`428`		`-switch (ScanKeywords[funcctx->call_cntr].category)`
	`430`	`+switch (ScanKeywordCategories[funcctx->call_cntr])`
`429`	`431`	`{`
`430`	`432`	`caseUNRESERVED_KEYWORD:`
`431`	`433`	`values[1]="U";`

`‎src/backend/utils/adt/ruleutils.c`

Lines changed: 2 additions & 4 deletions

Original file line number	Diff line number	Diff line change
`@@ -10601,11 +10601,9 @@ quote_identifier(const char *ident)`
`10601`	`10601`	`* Note: ScanKeywordLookup() does case-insensitive comparison, but`
`10602`	`10602`	`* that's fine, since we already know we have all-lower-case.`
`10603`	`10603`	`*/`
`10604`		`-constScanKeyword*keyword=ScanKeywordLookup(ident,`
`10605`		`-ScanKeywords,`
`10606`		`-NumScanKeywords);`
	`10604`	`+intkwnum=ScanKeywordLookup(ident,&ScanKeywords);`
`10607`	`10605`
`10608`		`-if (keyword!=NULL&&keyword->category!=UNRESERVED_KEYWORD)`
	`10606`	`+if (kwnum >=0&&ScanKeywordCategories[kwnum]!=UNRESERVED_KEYWORD)`
`10609`	`10607`	`safe= false;`
`10610`	`10608`	`}`
`10611`	`10609`

`‎src/common/.gitignore`

Lines changed: 1 addition & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -0,0 +1 @@`
	`1`	`+/kwlist_d.h`

`‎src/common/Makefile`

Lines changed: 15 additions & 11 deletions

Original file line number	Diff line number	Diff line change
`@@ -41,11 +41,11 @@ override CPPFLAGS += -DVAL_LDFLAGS_EX="\"$(LDFLAGS_EX)\""`
`41`	`41`	`overrideCPPFLAGS += -DVAL_LDFLAGS_SL="\"$(LDFLAGS_SL)\""`
`42`	`42`	`overrideCPPFLAGS += -DVAL_LIBS="\"$(LIBS)\""`
`43`	`43`
`44`		`-overrideCPPFLAGS := -DFRONTEND$(CPPFLAGS)`
	`44`	`+overrideCPPFLAGS := -DFRONTEND-I. -I$(top_srcdir)/src/common$(CPPFLAGS)`
`45`	`45`	`LIBS +=$(PTHREAD_LIBS)`
`46`	`46`
`47`	`47`	`OBJS_COMMON = base64.o config_info.o controldata_utils.o exec.o file_perm.o\`
`48`		`-ip.o keywords.o link-canary.o md5.o pg_lzcompress.o\`
	`48`	`+ip.o keywords.okwlookup.olink-canary.o md5.o pg_lzcompress.o\`
`49`	`49`	`pgfnames.o psprintf.o relpath.o\`
`50`	`50`	`rmtree.o saslprep.o scram-common.o string.o unicode_norm.o\`
`51`	`51`	`username.o wait_error.o`
`@@ -65,6 +65,8 @@ OBJS_SRV = $(OBJS_COMMON:%.o=%_srv.o)`
`65`	`65`
`66`	`66`	`all: libpgcommon.a libpgcommon_shlib.a libpgcommon_srv.a`
`67`	`67`
	`68`	`+distprep: kwlist_d.h`
	`69`	`+`
`68`	`70`	`# libpgcommon is needed by some contrib`
`69`	`71`	`install: all installdirs`
`70`	`72`	`$(INSTALL_STLIB) libpgcommon.a'$(DESTDIR)$(libdir)/libpgcommon.a'`
`@@ -115,16 +117,18 @@ libpgcommon_srv.a: $(OBJS_SRV)`
`115`	`117`	`%_srv.o:%.c%.o`
`116`	`118`	`$(CC)$(CFLAGS)$(subst -DFRONTEND,,$(CPPFLAGS)) -c$< -o$@`
`117`	`119`
`118`		`-# Dependencies of keywords.o need to be managed explicitly to make sure`
`119`		`-# that you don't get broken parsing code, even in a non-enable-depend build.`
`120`		`-# Note that gram.h isn't required for the frontend versions of keywords.o.`
`121`		`-$(top_builddir)/src/include/parser/gram.h:$(top_srcdir)/src/backend/parser/gram.y`
`122`		`-$(MAKE) -C$(top_builddir)/src/backend$(top_builddir)/src/include/parser/gram.h`
	`120`	`+# generate SQL keyword lookup table to be included into keywords*.o.`
	`121`	`+kwlist_d.h:$(top_srcdir)/src/include/parser/kwlist.h$(top_srcdir)/src/tools/gen_keywordlist.pl`
	`122`	`+$(PERL)$(top_srcdir)/src/tools/gen_keywordlist.pl --extern$<`
`123`	`123`
`124`		`-keywords.o:$(top_srcdir)/src/include/parser/kwlist.h`
`125`		`-keywords_shlib.o:$(top_srcdir)/src/include/parser/kwlist.h`
`126`		`-keywords_srv.o:$(top_builddir)/src/include/parser/gram.h$(top_srcdir)/src/include/parser/kwlist.h`
	`124`	`+# Dependencies ofkeywords*.o need to be managed explicitly to make sure`
	`125`	`+# that you don't get broken parsing code, even in a non-enable-depend build.`
	`126`	`+keywords.okeywords_shlib.okeywords_srv.o: kwlist_d.h`
`127`	`127`
`128`		`-cleandistcleanmaintainer-clean:`
	`128`	`+# kwlist_d.h is in the distribution tarball, so it is not cleaned here.`
	`129`	`+cleandistclean:`
`129`	`130`	`rm -f libpgcommon.a libpgcommon_shlib.a libpgcommon_srv.a`
`130`	`131`	`rm -f$(OBJS_FRONTEND)$(OBJS_SHLIB)$(OBJS_SRV)`
	`132`	`+`
	`133`	`+maintainer-clean: distclean`
	`134`	`+rm -f kwlist_d.h`

`‎src/common/keywords.c`

Lines changed: 9 additions & 90 deletions

Original file line number	Diff line number	Diff line change
`@@ -1,7 +1,7 @@`
`1`	`1`	`/*-------------------------------------------------------------------------`
`2`	`2`	`*`
`3`	`3`	`* keywords.c`
`4`		`- *lexical token lookup for key words in PostgreSQL`
	`4`	`+ *PostgreSQL's list of SQL keywords`
`5`	`5`	`*`
`6`	`6`	`*`
`7`	`7`	`* Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group`
`@@ -13,102 +13,21 @@`
`13`	`13`	`*`
`14`	`14`	`*-------------------------------------------------------------------------`
`15`	`15`	`*/`
`16`		`-#ifndefFRONTEND`
`17`		`-#include"postgres.h"`
`18`		`-#else`
`19`		`-#include"postgres_fe.h"`
`20`		`-#endif`
	`16`	`+#include"c.h"`
`21`	`17`
`22`		`-#ifndefFRONTEND`
`23`		`-`
`24`		`-#include"parser/gramparse.h"`
	`18`	`+#include"common/keywords.h"`
`25`	`19`
`26`		`-#definePG_KEYWORD(a,b,c) {a,b,c},`
`27`	`20`
`28`		`-#else`
	`21`	`+/* ScanKeywordList lookup data for SQL keywords */`
`29`	`22`
`30`		`-#include"common/keywords.h"`
`31`		`-`
`32`		`-/*`
`33`		`- * We don't need the token number for frontend uses, so leave it out to avoid`
`34`		`- * requiring backend headers that won't compile cleanly here.`
`35`		`- */`
`36`		`-#definePG_KEYWORD(a,b,c) {a,0,c},`
	`23`	`+#include"kwlist_d.h"`
`37`	`24`
`38`		`-#endif/FRONTEND /`
	`25`	`+/Keyword categories for SQL keywords /`
`39`	`26`
	`27`	`+#definePG_KEYWORD(kwname,value,category) category,`
`40`	`28`
`41`		`-constScanKeywordScanKeywords[]= {`
	`29`	`+constuint8ScanKeywordCategories[SCANKEYWORDS_NUM_KEYWORDS]= {`
`42`	`30`	`#include"parser/kwlist.h"`
`43`	`31`	`};`
`44`	`32`
`45`		`-constintNumScanKeywords=lengthof(ScanKeywords);`
`46`		`-`
`47`		`-`
`48`		`-/*`
`49`		`- * ScanKeywordLookup - see if a given word is a keyword`
`50`		`- *`
`51`		`- * The table to be searched is passed explicitly, so that this can be used`
`52`		`- * to search keyword lists other than the standard list appearing above.`
`53`		`- *`
`54`		`- * Returns a pointer to the ScanKeyword table entry, or NULL if no match.`
`55`		`- *`
`56`		`- * The match is done case-insensitively. Note that we deliberately use a`
`57`		`- * dumbed-down case conversion that will only translate 'A'-'Z' into 'a'-'z',`
`58`		`- * even if we are in a locale where tolower() would produce more or different`
`59`		`- * translations. This is to conform to the SQL99 spec, which says that`
`60`		`- * keywords are to be matched in this way even though non-keyword identifiers`
`61`		`- * receive a different case-normalization mapping.`
`62`		`- */`
`63`		`-constScanKeyword*`
`64`		`-ScanKeywordLookup(constchar*text,`
`65`		`-constScanKeyword*keywords,`
`66`		`-intnum_keywords)`
`67`		`-{`
`68`		`-intlen,`
`69`		`-i;`
`70`		`-charword[NAMEDATALEN];`
`71`		`-constScanKeyword*low;`
`72`		`-constScanKeyword*high;`
`73`		`-`
`74`		`-len=strlen(text);`
`75`		`-/* We assume all keywords are shorter than NAMEDATALEN. */`
`76`		`-if (len >=NAMEDATALEN)`
`77`		`-returnNULL;`
`78`		`-`
`79`		`-/*`
`80`		`- * Apply an ASCII-only downcasing. We must not use tolower() since it may`
`81`		`- * produce the wrong translation in some locales (eg, Turkish).`
`82`		`- */`
`83`		`-for (i=0;i<len;i++)`
`84`		`-{`
`85`		`-charch=text[i];`
`86`		`-`
`87`		`-if (ch >='A'&&ch <='Z')`
`88`		`-ch+='a'-'A';`
`89`		`-word[i]=ch;`
`90`		`-}`
`91`		`-word[len]='\0';`
`92`		`-`
`93`		`-/*`
`94`		`- * Now do a binary search using plain strcmp() comparison.`
`95`		`- */`
`96`		`-low=keywords;`
`97`		`-high=keywords+ (num_keywords-1);`
`98`		`-while (low <=high)`
`99`		`-{`
`100`		`-constScanKeyword*middle;`
`101`		`-intdifference;`
`102`		`-`
`103`		`-middle=low+ (high-low) /2;`
`104`		`-difference=strcmp(middle->name,word);`
`105`		`-if (difference==0)`
`106`		`-returnmiddle;`
`107`		`-elseif (difference<0)`
`108`		`-low=middle+1;`
`109`		`-else`
`110`		`-high=middle-1;`
`111`		`-}`
`112`		`-`
`113`		`-returnNULL;`
`114`		`-}`
	`33`	`+#undef PG_KEYWORD`

0 commit comments

Comments

(0)

Movatterモバイル変換

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Commitafb0d07

File tree

32 files changed

32 files changed

`‎contrib/pg_stat_statements/pg_stat_statements.c`

`‎src/backend/parser/parser.c`

`‎src/backend/parser/scan.l`

`‎src/backend/utils/adt/misc.c`

`‎src/backend/utils/adt/ruleutils.c`

`‎src/common/.gitignore`

`‎src/common/Makefile`

`‎src/common/keywords.c`

0 commit comments