- Notifications
You must be signed in to change notification settings - Fork57
Description
Below are two cases showing wrong results returned from RUM index withorder_by_attach=TRUE
. In both test cases, iforder_by_attach=FALSE
, correct results will be returned.
Tested with PostgreSQL 12.2 & newest commite34375a.
CASE 1:
CREATE TABLE test (id bigint NOT NULL,folder bigint NOT NULL,time bigint NOT NULL,tsv tsvector NOT NULL)
INSERT INTO test (id, folder, time, tsv) VALUES (1, 10, 100, to_tsvector('wordA'));INSERT INTO test (id, folder, time, tsv) VALUES (2, 20, 200, to_tsvector('wordB'));INSERT INTO test (id, folder, time, tsv) VALUES (3, 10, 300, to_tsvector('wordA'));INSERT INTO test (id, folder, time, tsv) VALUES (4, 20, 400, to_tsvector('wordB'));
Below shows expected results when select without index:
SELECT * FROM test WHERE tsv @@ (to_tsquery('wordA')) AND (folder = 10::bigint)
Returns rows with id 1 and 3.
SELECT * FROM test WHERE tsv @@ (to_tsquery('wordA')) AND (folder = 10::bigint) ORDER BY time <=| 500::bigint
Returns rows with id 3 and 1.
After creating the following index, wrong results will be returned:
CREATE INDEX test_idx ON test USING rum(folder, tsv rum_tsvector_addon_ops, time) with (attach = 'time', to = 'tsv', order_by_attach=TRUE);
SET enable_seqscan = OFF;SELECT * FROM test WHERE tsv @@ (to_tsquery('wordA')) AND (folder = 10::bigint)
Wrong result: Returns only row with id 1. (Expects 1 and 3)
SET enable_seqscan = OFF;SELECT * FROM test WHERE tsv @@ (to_tsquery('wordA')) AND (folder = 10::bigint) ORDER BY time <=| 500::bigint
Wrong result: Returns nothing. (Expects rows with id 3 and 1)
CASE 2:
CREATE TABLE test2 (id bigint NOT NULL,time bigint NOT NULL,tsv tsvector NOT NULL)
CREATE OR REPLACE PROCEDURE test2_init()AS $$DECLAREcounter INTEGER := 1; str TEXT;time BIGINT;BEGINWHILE counter <= 1000 LOOPIF counter % 10 = 0 THENstr := 'wordA wordB';ELSEIF counter % 11 = 0 THENstr := 'wordA wordB wordC';ELSEstr := 'wordA wordD';END IF;-- insert rows with alternating timeIF counter % 2 = 0 THENtime := counter;ELSEtime := -counter;END IF;INSERT INTO test2 (id, time, tsv) VALUES (counter, time, to_tsvector(str));counter := counter + 1;END LOOP;END$$LANGUAGE plpgsql;CALL test2_init();
Below shows expected results when select without index:
SELECT * from test2 where tsv @@ (to_tsquery('wordA & wordB')) ORDER BY time <=| 1001::bigint
Returned 181 rows.
After creating the following index, wrong results will be returned:
CREATE INDEX test2_idx ON test2 USING rum(tsv rum_tsvector_addon_ops, time) with (attach = 'time', to = 'tsv', order_by_attach=TRUE);
SET enable_seqscan = OFF;SELECT * from test2 where tsv @@ (to_tsquery('wordA & wordB')) ORDER BY time <=| 1001::bigint
Wrong result: Returns only 153 rows. (Expects 181 rows)
If the above query changes fromORDER BY time <=| 1001::bigint
toORDER BY <=> 1001::bigint
, then correct number of rows will be returned.