F.31. ltree — hierarchical tree-like data type | ||||
---|---|---|---|---|
Prev | Up | Appendix F. Additional Supplied Modules and Extensions Shipped inpostgrespro-std-17-contrib | Home | Next |
F.31. ltree — hierarchical tree-like data type#
This module implements a data typeltree
for representing labels of data stored in a hierarchical tree-like structure. Extensive facilities for searching through label trees are provided.
This module is considered“trusted”, that is, it can be installed by non-superusers who haveCREATE
privilege on the current database.
F.31.1. Definitions#
Alabel is a sequence of alphanumeric characters, underscores, and hyphens. Valid alphanumeric character ranges are dependent on the database locale. For example, in C locale, the charactersA-Za-z0-9_-
are allowed. Labels must be no more than 1000 characters long.
Examples:42
,Personal_Services
Alabel path is a sequence of zero or more labels separated by dots, for exampleL1.L2.L3
, representing a path from the root of a hierarchical tree to a particular node. The length of a label path cannot exceed 65535 labels.
Example:Top.Countries.Europe.Russia
Theltree
module provides several data types:
ltree
stores a label path.lquery
represents a regular-expression-like pattern for matchingltree
values. A simple word matches that label within a path. A star symbol (*
) matches zero or more labels. These can be joined with dots to form a pattern that must match the whole label path. For example:fooMatch the exact label path
foo
*.foo.*Match any label path containing the labelfoo
*.fooMatch any label path whose last label isfoo
Both star symbols and simple words can be quantified to restrict how many labels they can match:
*{
n
}Match exactlyn
labels*{n
,}Match at leastn
labels*{n
,m
}Match at leastn
but not more thanm
labels*{,m
}Match at mostm
labels — same as*{0,m
}foo{n
,m
}Match at leastn
but not more thanm
occurrences offoo
foo{,}Match any number of occurrences offoo
, including zeroIn the absence of any explicit quantifier, the default for a star symbol is to match any number of labels (that is,
{,}
) while the default for a non-star item is to match exactly once (that is,{1}
).There are several modifiers that can be put at the end of a non-star
lquery
item to make it match more than just the exact match:@Match case-insensitively, for example
a@
matchesA
*Match any label with this prefix, for examplefoo*
matchesfoobar
%Match initial underscore-separated wordsThe behavior of
%
is a bit complicated. It tries to match words rather than the entire label. For examplefoo_bar%
matchesfoo_bar_baz
but notfoo_barbaz
. If combined with*
, prefix matching applies to each word separately, for examplefoo_bar%*
matchesfoo1_bar2_baz
but notfoo1_br2_baz
.Also, you can write several possibly-modified non-star items separated with
|
(OR) to match any of those items, and you can put!
(NOT) at the start of a non-star group to match any label that doesn't match any of the alternatives. A quantifier, if any, goes at the end of the group; it means some number of matches for the group as a whole (that is, some number of labels matching or not matching any of the alternatives).Here's an annotated example of
lquery
:Top.*{0,2}.sport*@.!football|tennis{1,}.Russ*|Spaina. b. c. d. e.
This query will match any label path that:
begins with the label
Top
and next has zero to two labels before
a label beginning with the case-insensitive prefix
sport
then has one or more labels, none of which match
football
nortennis
and then ends with a label beginning with
Russ
or exactly matchingSpain
.
ltxtquery
represents a full-text-search-like pattern for matchingltree
values. Anltxtquery
value contains words, possibly with the modifiers@
,*
,%
at the end; the modifiers have the same meanings as inlquery
. Words can be combined with&
(AND),|
(OR),!
(NOT), and parentheses. The key difference fromlquery
is thatltxtquery
matches words without regard to their position in the label path.Here's an example
ltxtquery
:Europe & Russia*@ & !Transportation
This will match paths that contain the label
Europe
and any label beginning withRussia
(case-insensitive), but not paths containing the labelTransportation
. The location of these words within the path is not important. Also, when%
is used, the word can be matched to any underscore-separated word within a label, regardless of position.
Note:ltxtquery
allows whitespace between symbols, butltree
andlquery
do not.
F.31.2. Operators and Functions#
Typeltree
has the usual comparison operators=
,<>
,<
,>
,<=
,>=
. Comparison sorts in the order of a tree traversal, with the children of a node sorted by label text. In addition, the specialized operators shown inTable F.17 are available.
Table F.17. ltree
Operators
Operator Description |
---|
Is left argument an ancestor of right (or equal)? |
Is left argument a descendant of right (or equal)? |
Does |
Does |
Does |
Concatenates |
Converts text to |
Does array contain an ancestor of |
Does array contain a descendant of |
Does array contain any path matching |
Does |
Does array contain any path matching |
Returns first array entry that is an ancestor of |
Returns first array entry that is a descendant of |
Returns first array entry that matches |
Returns first array entry that matches |
The operators<@
,@>
,@
and~
have analogues^<@
,^@>
,^@
,^~
, which are the same except they do not use indexes. These are useful only for testing purposes.
The available functions are shown inTable F.18.
Table F.18. ltree
Functions
F.31.3. Indexes#
ltree
supports several types of indexes that can speed up the indicated operators:
B-tree index over
ltree
:<
,<=
,=
,>=
,>
Hash index over
ltree
:=
GiST index over
ltree
(gist_ltree_ops
opclass):<
,<=
,=
,>=
,>
,@>
,<@
,@
,~
,?
gist_ltree_ops
GiST opclass approximates a set of path labels as a bitmap signature. Its optional integer parametersiglen
determines the signature length in bytes. The default signature length is 8 bytes. The length must be a positive multiple ofint
alignment (4 bytes on most machines)) up to 2024. Longer signatures lead to a more precise search (scanning a smaller fraction of the index and fewer heap pages), at the cost of a larger index.Example of creating such an index with the default signature length of 8 bytes:
CREATE INDEX path_gist_idx ON test USING GIST (path);
Example of creating such an index with a signature length of 100 bytes:
CREATE INDEX path_gist_idx ON test USING GIST (path gist_ltree_ops(siglen=100));
GiST index over
ltree[]
(gist__ltree_ops
opclass):ltree[] <@ ltree
,ltree @> ltree[]
,@
,~
,?
gist__ltree_ops
GiST opclass works similarly togist_ltree_ops
and also takes signature length as a parameter. The default value ofsiglen
ingist__ltree_ops
is 28 bytes.Example of creating such an index with the default signature length of 28 bytes:
CREATE INDEX path_gist_idx ON test USING GIST (array_path);
Example of creating such an index with a signature length of 100 bytes:
CREATE INDEX path_gist_idx ON test USING GIST (array_path gist__ltree_ops(siglen=100));
Note: This index type is lossy.
F.31.4. Example#
This example uses the following data (also available in filecontrib/ltree/ltreetest.sql
in the source distribution):
CREATE TABLE test (path ltree);INSERT INTO test VALUES ('Top');INSERT INTO test VALUES ('Top.Science');INSERT INTO test VALUES ('Top.Science.Astronomy');INSERT INTO test VALUES ('Top.Science.Astronomy.Astrophysics');INSERT INTO test VALUES ('Top.Science.Astronomy.Cosmology');INSERT INTO test VALUES ('Top.Hobbies');INSERT INTO test VALUES ('Top.Hobbies.Amateurs_Astronomy');INSERT INTO test VALUES ('Top.Collections');INSERT INTO test VALUES ('Top.Collections.Pictures');INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy');INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Stars');INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Galaxies');INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Astronauts');CREATE INDEX path_gist_idx ON test USING GIST (path);CREATE INDEX path_idx ON test USING BTREE (path);CREATE INDEX path_hash_idx ON test USING HASH (path);
Now, we have a tabletest
populated with data describing the hierarchy shown below:
Top / | \ Science Hobbies Collections / | \ Astronomy Amateurs_Astronomy Pictures / \ |Astrophysics Cosmology Astronomy / | \ Galaxies Stars Astronauts
We can do inheritance:
ltreetest=> SELECT path FROM test WHERE path <@ 'Top.Science'; path------------------------------------ Top.Science Top.Science.Astronomy Top.Science.Astronomy.Astrophysics Top.Science.Astronomy.Cosmology(4 rows)
Here are some examples of path matching:
ltreetest=> SELECT path FROM test WHERE path ~ '*.Astronomy.*'; path----------------------------------------------- Top.Science.Astronomy Top.Science.Astronomy.Astrophysics Top.Science.Astronomy.Cosmology Top.Collections.Pictures.Astronomy Top.Collections.Pictures.Astronomy.Stars Top.Collections.Pictures.Astronomy.Galaxies Top.Collections.Pictures.Astronomy.Astronauts(7 rows)ltreetest=> SELECT path FROM test WHERE path ~ '*.!pictures@.Astronomy.*'; path------------------------------------ Top.Science.Astronomy Top.Science.Astronomy.Astrophysics Top.Science.Astronomy.Cosmology(3 rows)
Here are some examples of full text search:
ltreetest=> SELECT path FROM test WHERE path @ 'Astro*% & !pictures@'; path------------------------------------ Top.Science.Astronomy Top.Science.Astronomy.Astrophysics Top.Science.Astronomy.Cosmology Top.Hobbies.Amateurs_Astronomy(4 rows)ltreetest=> SELECT path FROM test WHERE path @ 'Astro* & !pictures@'; path------------------------------------ Top.Science.Astronomy Top.Science.Astronomy.Astrophysics Top.Science.Astronomy.Cosmology(3 rows)
Path construction using functions:
ltreetest=> SELECT subpath(path,0,2)||'Space'||subpath(path,2) FROM test WHERE path <@ 'Top.Science.Astronomy'; ?column?------------------------------------------ Top.Science.Space.Astronomy Top.Science.Space.Astronomy.Astrophysics Top.Science.Space.Astronomy.Cosmology(3 rows)
We could simplify this by creating an SQL function that inserts a label at a specified position in a path:
CREATE FUNCTION ins_label(ltree, int, text) RETURNS ltree AS 'select subpath($1,0,$2) || $3 || subpath($1,$2);' LANGUAGE SQL IMMUTABLE;ltreetest=> SELECT ins_label(path,2,'Space') FROM test WHERE path <@ 'Top.Science.Astronomy'; ins_label------------------------------------------ Top.Science.Space.Astronomy Top.Science.Space.Astronomy.Astrophysics Top.Science.Space.Astronomy.Cosmology(3 rows)
F.31.5. Transforms#
Theltree_plpython3u
extension implements transforms for theltree
type for PL/Python. If installed and specified when creating a function,ltree
values are mapped to Python lists. (The reverse is currently not supported, however.)
Caution
It is strongly recommended that the transform extension be installed in the same schema asltree
. Otherwise there are installation-time security hazards if a transform extension's schema contains objects defined by a hostile user.
F.31.6. Authors#
All work was done by Teodor Sigaev (<teodor@stack.net>
) and Oleg Bartunov (<oleg@sai.msu.su>
). Seehttp://www.sai.msu.su/~megera/postgres/gist/ for additional information. Authors would like to thank Eugeny Rodichev for helpful discussions. Comments and bug reports are welcome.