Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commitcd38d4a

Browse files
committed
Merge commit '1111fce351cb161ef71ed8f18bac80211466de75' into PGPROEE9_6
2 parentsa5990c0 +1111fce commitcd38d4a

38 files changed

+2509
-107
lines changed

‎contrib/rum/Makefile

Lines changed: 7 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,17 +1,18 @@
11
# contrib/rum/Makefile
22

33
MODULE_big = rum
4-
OBJS = rumsort.o rum_ts_utils.o rumtsquery.o\
5-
rumbtree.o rumbulk.o rumdatapage.o\
6-
rumentrypage.o rumget.o ruminsert.o\
7-
rumscan.o rumutil.o rumvacuum.o rumvalidate.o\
8-
rum_timestamp.o$(WIN32RES)
4+
OBJS =src/rumsort.osrc/rum_ts_utils.osrc/rumtsquery.o\
5+
src/rumbtree.osrc/rumbulk.osrc/rumdatapage.o\
6+
src/rumentrypage.osrc/rumget.osrc/ruminsert.o\
7+
src/rumscan.osrc/rumutil.osrc/rumvacuum.osrc/rumvalidate.o\
8+
src/rum_timestamp.o$(WIN32RES)
99

1010
EXTENSION = rum
1111
DATA = rum--1.0.sql
1212
PGFILEDESC = "RUM index access method"
1313

14-
REGRESS = rum ruminv timestamp orderby altorder
14+
REGRESS = rum rum_hash ruminv timestamp orderby orderby_hash altorder\
15+
altorder_hash
1516

1617
LDFLAGS_SL +=$(filter -lm,$(LIBS))
1718

‎contrib/rum/README.md

Lines changed: 167 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,32 @@
1+
[![Postgres Professional](img/PGpro-logo.png)](https://postgrespro.com/)
2+
13
#RUM - RUM access method
24

35
##Introduction
46

5-
The**rum** module provides access method to work with RUM index. It is based
6-
on the GIN access methods code.
7+
The**rum** module provides access method to work with`RUM` index. It is based
8+
on the`GIN` access methods code.
9+
10+
`GIN` index allows to perform fast full text search using`tsvector` and
11+
`tsquery` types. But full text search with GIN index has several problems:
12+
13+
- Slow ranking. It is need position information about lexems to ranking.`GIN`
14+
index doesn't store positions of lexems. So after index scan we need additional
15+
heap scan to retreive lexems positions.
16+
- Slow phrase search with`GIN` index. This problem relates with previous
17+
problem. It is need position information to perform phrase search.
18+
- Slow ordering by timestamp.`GIN` index can't store some related information
19+
in index with lexemes. So it is necessary to perform additional heap scan.
20+
21+
`RUM` solves this problems by storing additional information in posting tree.
22+
For example, positional information of lexemes or timestamps. You can get an
23+
idea of`RUM` by the following picture:
24+
25+
![How RUM stores additional information](img/gin_rum.png)
26+
27+
Drawback of`RUM` is that it has slower build and insert time than`GIN`.
28+
It is because we need to store additional information besides keys and because
29+
`RUM` uses generic WAL.
730

831
##License
932

@@ -14,7 +37,7 @@ This module available under the same license as
1437

1538
Before build and install**rum** you should ensure following:
1639

17-
* PostgreSQL version is 9.6.
40+
* PostgreSQL version is 9.6+.
1841

1942
Typical installation procedure may look like this:
2043

@@ -25,18 +48,27 @@ Typical installation procedure may look like this:
2548
$ make USE_PGXS=1 installcheck
2649
$ psql DB -c "CREATE EXTENSION rum;"
2750

28-
##New access method and operator class
29-
30-
The**rum** module provides the access method**rum** and the operator class
31-
**rum_tsvector_ops**.
51+
##Common operators and functions
3252

33-
The module providesnew operators.
53+
**rum** module providesnext operators.
3454

3555
| Operator | Returns | Description
3656
| -------------------- | ------- | ----------------------------------------------
3757
| tsvector<=> tsquery | float4 | Returns distance between tsvector and tsquery.
58+
| timestamp<=> timestamp | float8 | Returns distance between two timestamps.
59+
| timestamp<=| timestamp | float8 | Returns distance only for left timestamps.
60+
| timestamp|=> timestamp | float8 | Returns distance only for right timestamps.
3861

39-
##Examples
62+
##Operator classes
63+
64+
**rum** provides next operator classes.
65+
66+
###rum_tsvector_ops
67+
68+
For type:`tsvector`
69+
70+
This operator class stores`tsvector` lexemes with positional information. Supports
71+
ordering by`<=>` operator and prefix search. There is the example.
4072

4173
Let us assume we have the table:
4274

@@ -67,28 +99,146 @@ CREATE INDEX rumidx ON test_rum USING rum (a rum_tsvector_ops);
6799
And we can execute the following queries:
68100

69101
```sql
70-
=#SELECT t, a <=> to_tsquery('english', 'beautiful | place') AS rank
71-
FROM test_rum
72-
WHERE a @@ to_tsquery('english','beautiful | place')
73-
ORDER BY a<=> to_tsquery('english','beautiful | place');
102+
SELECT t, a<=> to_tsquery('english','beautiful | place')AS rank
103+
FROM test_rum
104+
WHERE a @@ to_tsquery('english','beautiful | place')
105+
ORDER BY a<=> to_tsquery('english','beautiful | place');
74106
t | rank
75107
---------------------------------+-----------
76108
The situation is most beautiful |0.0303964
77109
It is a beautiful |0.0303964
78110
It lookslike a beautiful place |0.0607927
79111
(3 rows)
80112

81-
=#SELECT t, a <=> to_tsquery('english', 'place | situation') AS rank
82-
FROM test_rum
83-
WHERE a @@ to_tsquery('english','place | situation')
84-
ORDER BY a<=> to_tsquery('english','place | situation');
113+
SELECT t, a<=> to_tsquery('english','place | situation')AS rank
114+
FROM test_rum
115+
WHERE a @@ to_tsquery('english','place | situation')
116+
ORDER BY a<=> to_tsquery('english','place | situation');
85117
t | rank
86118
---------------------------------+-----------
87119
The situation is most beautiful |0.0303964
88120
It lookslike a beautiful place |0.0303964
89121
(2 rows)
90122
```
91123

124+
###rum_tsvector_hash_ops
125+
126+
For type:`tsvector`
127+
128+
This operator class stores hash of`tsvector` lexemes with positional information.
129+
Supports ordering by`<=>` operator. But**doesn't** support prefix search.
130+
131+
###rum_timestamp_ops
132+
133+
For type:`timestamp`
134+
135+
Operator class provides fast search and ordering by timestamp fields. Supports
136+
ordering by`<=>`,`<=|` and`|=>` operators. Can be used with
137+
`rum_tsvector_timestamp_ops` operator class.
138+
139+
###rum_timestamptz_ops
140+
141+
For type:`timestamptz`
142+
143+
Operator class provides fast search and ordering by timestamptz fields. Supports
144+
ordering by`<=>`,`<=|` and`|=>` operators. Can be used with
145+
`rum_tsvector_timestamptz_ops` operator class.
146+
147+
###rum_tsvector_timestamp_ops
148+
149+
For type:`tsvector`
150+
151+
This operator class stores`tsvector` lexems with timestamp field. There is the example.
152+
153+
Let us assume we have the table:
154+
```sql
155+
CREATETABLEtsts (idint, t tsvector, dtimestamp);
156+
157+
\copy tstsfrom'rum/data/tsts.data'
158+
159+
CREATEINDEXtsts_idxON tsts USING rum (t rum_tsvector_timestamp_ops, d)
160+
WITH (attach='d', to='t');
161+
```
162+
163+
Now we can execute the following queries:
164+
```sql
165+
EXPLAIN (costs off)
166+
SELECT id, d, d<=>'2016-05-16 14:21:25'FROM tstsWHERE t @@'wr&qh'ORDER BY d<=>'2016-05-16 14:21:25'LIMIT5;
167+
QUERY PLAN
168+
-----------------------------------------------------------------------------------
169+
Limit
170+
-> Index Scan using tsts_idxon tsts
171+
Index Cond: (t @@'''wr'' &''qh'''::tsquery)
172+
Order By: (d<=>'Mon May 16 14:21:25 2016'::timestamp without time zone)
173+
(4 rows)
174+
175+
SELECT id, d, d<=>'2016-05-16 14:21:25'FROM tstsWHERE t @@'wr&qh'ORDER BY d<=>'2016-05-16 14:21:25'LIMIT5;
176+
id | d | ?column?
177+
-----+---------------------------------+---------------
178+
355 | Mon May1614:21:22.3267242016 |2.673276
179+
354 | Mon May1613:21:22.3267242016 |3602.673276
180+
371 | Tue May1706:21:22.3267242016 |57597.326724
181+
406 | Wed May1817:21:22.3267242016 |183597.326724
182+
415 | Thu May1902:21:22.3267242016 |215997.326724
183+
(5 rows)
184+
```
185+
186+
###rum_tsvector_timestamptz_ops
187+
188+
For type:`tsvector`
189+
190+
See comments for`rum_tsvector_timestamp_ops` operator class.
191+
192+
###rum_tsvector_hash_timestamp_ops
193+
194+
For type:`tsvector`
195+
196+
This operator class stores hash of`tsvector` lexems with timestamp field.
197+
**Doesn't** support prefix search.
198+
199+
###rum_tsvector_hash_timestamptz_ops
200+
201+
For type:`tsvector`
202+
203+
This operator class stores hash of`tsvector` lexems with timestamptz field.
204+
**Doesn't** support prefix search.
205+
206+
###rum_tsquery_ops
207+
208+
For type:`tsquery`
209+
210+
Stores branches of query tree in additional information. For example we have the table:
211+
```sql
212+
CREATETABLEquery (q tsquery, tagtext);
213+
214+
INSERT INTO queryVALUES ('supernova & star','sn'),
215+
('black','color'),
216+
('big & bang & black & hole','bang'),
217+
('spiral & galaxy','shape'),
218+
('black & hole','color');
219+
220+
CREATEINDEXquery_idxON query USING rum(q);
221+
```
222+
223+
Now we can execute the following fast query:
224+
```sql
225+
SELECT*FROM query
226+
WHERE to_tsvector('black holes never exists before we think about them') @@ q;
227+
q | tag
228+
------------------+-------
229+
'black' | color
230+
'black' &'hole' | color
231+
(2 rows)
232+
```
233+
234+
##Todo
235+
236+
- Allow multiple additional information (lexemes positions + timestamp).
237+
- Add support for arrays.
238+
- Improve ranking function to support TF/IDF.
239+
- Improve insert time.
240+
- Improve GENERIC WAL to support shift (PostgreSQL core changes).
241+
92242
##Authors
93243

94244
Alexander Korotkov <a.korotkov@postgrespro.ru> Postgres Professional Ltd., Russia

‎contrib/rum/expected/altorder.out

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -323,3 +323,4 @@ SELECT id, d FROM atsts WHERE t @@ 'wr&qh' AND d >= '2016-05-16 14:21:25' ORDER
323323
496 | Sun May 22 11:21:22.326724 2016
324324
(8 rows)
325325

326+
DROP TABLE atsts CASCADE;

‎contrib/rum/expected/altorder_1.out

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -322,3 +322,4 @@ SELECT id, d FROM atsts WHERE t @@ 'wr&qh' AND d >= '2016-05-16 14:21:25' ORDER
322322
496 | Sun May 22 11:21:22.326724 2016
323323
(8 rows)
324324

325+
DROP TABLE atsts CASCADE;

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp