NotificationsYou must be signed in to change notification settings
Fork28
Star153

Commit73912e7

committed

Fix GIN to support null keys, empty and null items, and full index scans.

Per my recent proposal(s). Null key datums can now be returned byextractValue and extractQuery functions, and will be stored in the index.Also, placeholder entries are made for indexable items that are NULL orcontain no keys according to extractValue. This means that the index isnow always complete, having at least one entry for every indexed heap TID,and so we can get rid of the prohibition on full-index scans. A full-indexscan is implemented much the same way as partial-match scans were already:we build a bitmap representing all the TIDs found in the index, and thendrive the results off that.Also, introduce a concept of a "search mode" that can be requested byextractQuery when the operator requires matching to empty items (this isjust as cheap as matching to a single key) or requires a full index scan(which is not so cheap, but it sure beats failing or giving wrong answers).The behavior remains backward compatible for opclasses that don't returnany null keys or request a non-default search mode.Using these features, we can now make the GIN index opclass for anyarraybehave in a way that matches the actual anyarray operators for &&, <@, @>,and = ... which it failed to do before in assorted corner cases.This commit fixes the core GIN code and ginarrayprocs.c, updates thedocumentation, and adds some simple regression test cases for the newbehaviors using the array operators. The tsearch and contrib GIN opclasssupport functions still need to be looked over and probably fixed.Another thing I intend to fix separately is that this is pretty inefficientfor cases where more than one scan condition needs a full-index search:we'll run duplicate GinScanEntrys, each one of which builds a large bitmap.There is some existing logic to merge duplicate GinScanEntrys but it needsrefactoring to make it work for entries belonging to different scan keys.Note that most of gin.h has been split out into a new file gin_private.h,so that gin.h doesn't export anything that's not supposed to be used by GINopclasses or the rest of the backend. I did quite a bit of other codebeautification work as well, mostly fixing comments and choosing moreappropriate names for things.

1 parent9b4271d commit73912e7Copy full SHA for 73912e7

File tree

23 files changed

+3311

-1855

lines changed

contrib/hstore
- hstore_gin.c
doc/src/sgml
- gin.sgml
src
- backend/access/gin
- include
  - access
    - gin.h
    - gin_private.h
  - utils
    - builtins.h
- test/regress
  - data
    - array.data
  - expected
    - arrays.out
    - create_index.out
  - sql
    - arrays.sql
    - create_index.sql

23 files changed

+3311

-1855

lines changed

`‎contrib/hstore/hstore_gin.c‎`

Lines changed: 1 addition & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -4,6 +4,7 @@`
`4`	`4`	`#include"postgres.h"`
`5`	`5`
`6`	`6`	`#include"access/gin.h"`
	`7`	`+#include"access/skey.h"`
`7`	`8`	`#include"catalog/pg_type.h"`
`8`	`9`
`9`	`10`	`#include"hstore.h"`

`‎doc/src/sgml/gin.sgml‎`

Lines changed: 175 additions & 87 deletions

Large diffs are not rendered by default.

`‎src/backend/access/gin/README‎`

Lines changed: 148 additions & 32 deletions

Original file line number	Diff line number	Diff line change
`@@ -15,23 +15,23 @@ is similar to GiST and differs from btree indices, which have predefined,`
`15`	`15`	`comparison-based operations.`
`16`	`16`
`17`	`17`	`An inverted index is an index structure storing a set of (key, posting list)`
`18`		`-pairs, where 'posting list' is a set ofdocuments in which the key occurs.`
	`18`	`+pairs, where 'posting list' is a set ofheap rows in which the key occurs.`
`19`	`19`	`(A text document would usually contain many keys.) The primary goal of`
`20`	`20`	`Gin indices is support for highly scalable, full-text search in PostgreSQL.`
`21`	`21`
`22`		`-Gin consists of a B-tree index constructed overentries (ET, entries tree),`
`23`		`-where eachentry is an element ofthe indexedvalue (element of array, lexeme`
`24`		`-for tsvector) and where each tuple in a leaf pageis either a pointer to a`
`25`		`-B-tree over item pointers (PT,posting tree), or a list of item pointers`
`26`		`-(PL,posting list) if thetuple is small enough.`
	`22`	`+AGinindexconsists of a B-tree index constructed overkey values,`
	`23`	`+where eachkey is an element ofsome indexeditems (element of array, lexeme`
	`24`	`+for tsvector) and where each tuple in a leaf pagecontains either a pointer to`
	`25`	`+aB-tree over item pointers (posting tree), or a simple list of item pointers`
	`26`	`+(posting list) if thelist is small enough.`
`27`	`27`
`28`		`-Note: There is no delete operation for ET. The reason for this is that in`
`29`		`-our experience, the set of distinct words in a large corpus changes very`
`30`		`-rarely. This greatly simplifies the code and concurrency algorithms.`
	`28`	`+Note: There is no delete operation in the key (entry) tree. The reason for`
	`29`	`+this is that in our experience, the set of distinct words in a large corpus`
	`30`	`+changes very slowly. This greatly simplifies the code and concurrency`
	`31`	`+algorithms.`
`31`	`32`
`32`		`-Gin comes with built-in support for one-dimensional arrays (eg. integer[],`
`33`		`-text[]), but no support for NULL elements. The following operations are`
`34`		`-available:`
	`33`	`+Core PostgreSQL includes built-in Gin support for one-dimensional arrays`
	`34`	`+(eg. integer[], text[]). The following operations are available:`
`35`	`35`
`36`	`36`	`* contains: value_array @> query_array`
`37`	`37`	`* overlaps: value_array && query_array`
`@@ -71,7 +71,7 @@ limit).`
`71`	`71`	`If a non-zero search limit is set, then the returned set is a subset of the`
`72`	`72`	`whole result set, chosen at random.`
`73`	`73`
`74`		`-"Soft" means that the actual number of returned results couldslightlydiffer`
	`74`	`+"Soft" means that the actual number of returned results could differ`
`75`	`75`	`from the specified limit, depending on the query and the quality of the`
`76`	`76`	`system's random number generator.`
`77`	`77`
`@@ -80,40 +80,156 @@ From experience, a value of 'gin_fuzzy_search_limit' in the thousands`
`80`	`80`	`have no effect for queries returning a result set with less tuples than this`
`81`	`81`	`number.`
`82`	`82`
`83`		`-Limitations`
`84`		`------------`
`85`		`-`
`86`		`- * No support for multicolumn indices`
`87`		`- * Gin doesn't uses scan->kill_prior_tuple & scan->ignore_killed_tuples`
`88`		`- * Gin searches entries only by equality matching. This may be improved in`
`89`		`- future.`
`90`		`- * Gin doesn't support full scans of indices.`
`91`		`- * Gin doesn't index NULL values.`
	`83`	`+Index structure`
	`84`	`+---------------`
`92`	`85`
`93`		`-Open Items`
`94`		`-----------`
	`86`	`+The "items" that a GIN index indexes are composite values that contain`
	`87`	`+zero or more "keys". For example, an item might be an integer array, and`
	`88`	`+then the keys would be the individual integer values. The index actually`
	`89`	`+stores and searches for the key values, not the items per se. In the`
	`90`	`+pg_opclass entry for a GIN opclass, the opcintype is the data type of the`
	`91`	`+items, and the opckeytype is the data type of the keys. GIN is optimized`
	`92`	`+for cases where items contain many keys and the same key values appear`
	`93`	`+in many different items.`
	`94`	`+`
	`95`	`+A GIN index contains a metapage, a btree of key entries, and possibly`
	`96`	`+"posting tree" pages, which hold the overflow when a key entry acquires`
	`97`	`+too many heap tuple pointers to fit in a btree page. Additionally, if the`
	`98`	`+fast-update feature is enabled, there can be "list pages" holding "pending"`
	`99`	`+key entries that haven't yet been merged into the main btree. The list`
	`100`	`+pages have to be scanned linearly when doing a search, so the pending`
	`101`	`+entries should be merged into the main btree before there get to be too`
	`102`	`+many of them. The advantage of the pending list is that bulk insertion of`
	`103`	`+a few thousand entries can be much faster than retail insertion. (The win`
	`104`	`+comes mainly from not having to do multiple searches/insertions when the`
	`105`	`+same key appears in multiple new heap tuples.)`
	`106`	`+`
	`107`	`+Key entries are nominally of the same IndexEntry format as used in other`
	`108`	`+index types, but since a leaf key entry typically refers to multiple heap`
	`109`	`+tuples, there are significant differences. (See GinFormTuple, which works`
	`110`	`+by building a "normal" index tuple and then modifying it.) The points to`
	`111`	`+know are:`
	`112`	`+`
	`113`	`+* In a single-column index, a key tuple just contains the key datum, but`
	`114`	`+in a multi-column index, a key tuple contains the pair (column number,`
	`115`	`+key datum) where the column number is stored as an int2. This is needed`
	`116`	`+to support different key data types in different columns. This much of`
	`117`	`+the tuple is built by index_form_tuple according to the usual rules.`
	`118`	`+The column number (if present) can never be null, but the key datum can`
	`119`	`+be, in which case a null bitmap is present as usual. (As usual for index`
	`120`	`+tuples, the size of the null bitmap is fixed at INDEX_MAX_KEYS.)`
	`121`	`+`
	`122`	`+* If the key datum is null (ie, IndexTupleHasNulls() is true), then`
	`123`	`+just after the nominal index data (ie, at offset IndexInfoFindDataOffset`
	`124`	`+or IndexInfoFindDataOffset + sizeof(int2)) there is a byte indicating`
	`125`	`+the "category" of the null entry. These are the possible categories:`
	`126`	`+1 = ordinary null key value extracted from an indexable item`
	`127`	`+2 = placeholder for zero-key indexable item`
	`128`	`+3 = placeholder for null indexable item`
	`129`	`+Placeholder null entries are inserted into the index because otherwise`
	`130`	`+there would be no index entry at all for an empty or null indexable item,`
	`131`	`+which would mean that full index scans couldn't be done and various corner`
	`132`	`+cases would give wrong answers. The different categories of null entries`
	`133`	`+are treated as distinct keys by the btree, but heap itempointers for the`
	`134`	`+same category of null entry are merged into one index entry just as happens`
	`135`	`+with ordinary key entries.`
	`136`	`+`
	`137`	`+* In a key entry at the btree leaf level, at the next SHORTALIGN boundary,`
	`138`	`+there is an array of zero or more ItemPointers, which store the heap tuple`
	`139`	`+TIDs for which the indexable items contain this key. This is called the`
	`140`	`+"posting list". The TIDs in a posting list must appear in sorted order.`
	`141`	`+If the list would be too big for the index tuple to fit on an index page,`
	`142`	`+the ItemPointers are pushed out to a separate posting page or pages, and`
	`143`	`+none appear in the key entry itself. The separate pages are called a`
	`144`	`+"posting tree"; they are organized as a btree of ItemPointer values.`
	`145`	`+Note that in either case, the ItemPointers associated with a key can`
	`146`	`+easily be read out in sorted order; this is relied on by the scan`
	`147`	`+algorithms.`
	`148`	`+`
	`149`	`+* The index tuple header fields of a leaf key entry are abused as follows:`
	`150`	`+`
	`151`	`+1) Posting list case:`
	`152`	`+`
	`153`	`+* ItemPointerGetBlockNumber(&itup->t_tid) contains the offset from index`
	`154`	`+ tuple start to the posting list.`
	`155`	`+ Access macros: GinGetPostingOffset(itup) / GinSetPostingOffset(itup,n)`
	`156`	`+`
	`157`	`+* ItemPointerGetOffsetNumber(&itup->t_tid) contains the number of elements`
	`158`	`+ in the posting list (number of heap itempointers).`
	`159`	`+ Access macros: GinGetNPosting(itup) / GinSetNPosting(itup,n)`
	`160`	`+`
	`161`	`+* If IndexTupleHasNulls(itup) is true, the null category byte can be`
	`162`	`+ accessed/set with GinGetNullCategory(itup,gs) / GinSetNullCategory(itup,gs,c)`
	`163`	`+`
	`164`	`+* The posting list can be accessed with GinGetPosting(itup)`
	`165`	`+`
	`166`	`+2) Posting tree case:`
	`167`	`+`
	`168`	`+* ItemPointerGetBlockNumber(&itup->t_tid) contains the index block number`
	`169`	`+ of the root of the posting tree.`
	`170`	`+ Access macros: GinGetPostingTree(itup) / GinSetPostingTree(itup, blkno)`
	`171`	`+`
	`172`	`+* ItemPointerGetOffsetNumber(&itup->t_tid) contains the magic number`
	`173`	`+ GIN_TREE_POSTING, which distinguishes this from the posting-list case`
	`174`	`+ (it's large enough that that many heap itempointers couldn't possibly`
	`175`	`+ fit on an index page). This value is inserted automatically by the`
	`176`	`+ GinSetPostingTree macro.`
	`177`	`+`
	`178`	`+* If IndexTupleHasNulls(itup) is true, the null category byte can be`
	`179`	`+ accessed/set with GinGetNullCategory(itup) / GinSetNullCategory(itup,c)`
	`180`	`+`
	`181`	`+* The posting list is not present and must not be accessed.`
	`182`	`+`
	`183`	`+Use the macro GinIsPostingTree(itup) to determine which case applies.`
	`184`	`+`
	`185`	`+In both cases, itup->t_info & INDEX_SIZE_MASK contains actual total size of`
	`186`	`+tuple, and the INDEX_VAR_MASK and INDEX_NULL_MASK bits have their normal`
	`187`	`+meanings as set by index_form_tuple.`
	`188`	`+`
	`189`	`+Index tuples in non-leaf levels of the btree contain the optional column`
	`190`	`+number, key datum, and null category byte as above. They do not contain`
	`191`	`+a posting list. ItemPointerGetBlockNumber(&itup->t_tid) is the downlink`
	`192`	`+to the next lower btree level, and ItemPointerGetOffsetNumber(&itup->t_tid)`
	`193`	`+is InvalidOffsetNumber. Use the access macros GinGetDownlink/GinSetDownlink`
	`194`	`+to get/set the downlink.`
	`195`	`+`
	`196`	`+Index entries that appear in "pending list" pages work a tad differently as`
	`197`	`+well. The optional column number, key datum, and null category byte are as`
	`198`	`+for other GIN index entries. However, there is always exactly one heap`
	`199`	`+itempointer associated with a pending entry, and it is stored in the t_tid`
	`200`	`+header field just as in non-GIN indexes. There is no posting list.`
	`201`	`+Furthermore, the code that searches the pending list assumes that all`
	`202`	`+entries for a given heap tuple appear consecutively in the pending list and`
	`203`	`+are sorted by the column-number-plus-key-datum. The GIN_LIST_FULLROW page`
	`204`	`+flag bit tells whether entries for a given heap tuple are spread across`
	`205`	`+multiple pending-list pages. If GIN_LIST_FULLROW is set, the page contains`
	`206`	`+all the entries for one or more heap tuples. If GIN_LIST_FULLROW is clear,`
	`207`	`+the page contains entries for only one heap tuple, and they are not all`
	`208`	`+the entries for that tuple. (Thus, a heap tuple whose entries do not all`
	`209`	`+fit on one pending-list page must have those pages to itself, even if this`
	`210`	`+results in wasting much of the space on the preceding page and the last`
	`211`	`+page for the tuple.)`
`95`	`212`
`96`		`-We appreciate any comments, help and suggestions.`
	`213`	`+Limitations`
	`214`	`+-----------`
`97`	`215`
`98`		`- *Teach optimizer/executor that GIN is intrinsically clustered. i.e., it`
`99`		`- always returns ItemPointer in ascending order.`
`100`		`-* Tweak gincostestimate.`
	`216`	`+ *Gin doesn't use scan->kill_prior_tuple & scan->ignore_killed_tuples`
	`217`	`+* Gin searches entries only by equality matching, or simple range`
	`218`	`+ matching using the "partial match" feature.`
`101`	`219`
`102`	`220`	`TODO`
`103`	`221`	`----`
`104`	`222`
`105`	`223`	`Nearest future:`
`106`	`224`
`107`		`- * Opclasses forall types (no programming, just many catalog changes).`
	`225`	`+ * Opclasses formore types (no programming, just many catalog changes)`
`108`	`226`
`109`	`227`	`Distant future:`
`110`	`228`
`111`	`229`	`* Replace B-tree of entries to something like GiST`
`112`		`- * Add multicolumn support`
`113`		`- * Optimize insert operations (background index insertion)`
`114`	`230`
`115`	`231`	`Authors`
`116`	`232`	`-------`
`117`	`233`
`118`		`-All work was done by Teodor Sigaev (teodor@sigaev.ru) and Oleg Bartunov`
	`234`	`+Original work was done by Teodor Sigaev (teodor@sigaev.ru) and Oleg Bartunov`
`119`	`235`	`(oleg@sai.msu.su).`

0 commit comments

Comments

(0)

Movatterモバイル変換

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Commit73912e7

File tree

23 files changed

23 files changed

`‎contrib/hstore/hstore_gin.c‎`

`‎doc/src/sgml/gin.sgml‎`

`‎src/backend/access/gin/README‎`

0 commit comments