NotificationsYou must be signed in to change notification settings
Fork68
Star588

Available optimizations

Dmitry Ivanov edited this pageOct 9, 2017 ·11 revisions

Effective caching

Since the beginning, pg_pathman aims to improve performance of queries involving partitioned tables with significant amount of partitions (i.e. 1K+). Although PostgreSQL 10 features built-in support for partitioning, it still uses an exhaustive search in order to select suitable partitions, which gets slower and slower as the total amount increases.

Our per-relation cache (partition dispatch cache) stores an ordered array of structs containing:

Partition'sOid (a unique identifier of this partition);
Left & right bounds (RANGE partitioning) ORhash value (HASH partitioning);

This allows us to perform a binary search (tables partitioned by RANGE), and use hash value moduloamount of partitions as an index of the corresponding partition (tables partitioned by HASH). Furthermore, we strive to remove redundant WHERE clauses (for instance, we don't have to recheckkey < 20 for rows of partition covering[0, 10)).

You can examine cache stats using the following query:

select*from pathman_cache_stats;         context          |  size  | used  | entries--------------------------+--------+-------+--------- maintenance              |0 |0 |0 partition dispatch cache |9216 |8496 |2-- N partitioned tables partition parents cache  |122880 |70912 |10-- cached pairs (child, parent) partition bounds cache   |122880 |71936 |10-- cached bounds (L + R)(4 rows)

Custom Nodes

pg_pathman provides a couple ofcustom plan nodes which aim to reduce execution time, namely:

RuntimeAppend (overridesAppend plan node)
RuntimeMergeAppend (overridesMergeAppend plan node)
PartitionFilter (drop-in replacement for INSERT triggers)

PartitionFilter acts as aproxy node for INSERT's child scan, which means it can redirect output tuples to the corresponding partition:

EXPLAIN (COSTS OFF)INSERT INTO partitioned_tableSELECT generate_series(1,10), random();               QUERY PLAN----------------------------------------- Inserton partitioned_table->  Custom Scan (PartitionFilter)->  Subquery Scanon"*SELECT*"->  Result(4 rows)

RuntimeAppend andRuntimeMergeAppend have much in common: they come in handy in a case when WHERE condition takes form of:

VARIABLE OP PARAM

This kind of expressions can no longer be optimized at planning time since the parameter's value is not known until the execution stage takes place. The problem can be solved by embedding theWHERE condition analysis routine into the originalAppend's code, thus making it pick only required scans out of a whole bunch of planned partition scans. This effectively boils down to creation of a custom node capable of performing such a check.

Examples

Here's a few examples of queries with improved plans:

createtablesimple (idintnot null);insert into simplevalues (10), (20);createtableabc (idintnot null);select create_range_partitions('abc','id',1,10,10);/* pg_pathman*/explain (analyze, costs off)select*from abcwhere id<21;                               QUERY PLAN------------------------------------------------------------------------ Append (actualtime=0.034..38.046 rows=190000 loops=1)->  Seq Scanon abc_1 (actualtime=0.032..13.249 rows=90000 loops=1)->  Seq Scanon abc_2 (actualtime=0.006..9.746 rows=100000 loops=1) Planningtime:0.379 ms Executiontime:45.777 ms(5 rows)/* standard inheritance*/explain (analyze, costs off)select*from abcwhere id<21;                               QUERY PLAN------------------------------------------------------------------------- Append (actualtime=0.017..51.047 rows=190001 loops=1)->  Seq Scanon abc (actualtime=0.016..0.018 rows=1 loops=1)         Filter: (id<21)->  Seq Scanon abc_1 (actualtime=0.016..20.356 rows=90000 loops=1)         Filter: (id<21)->  Seq Scanon abc_2 (actualtime=0.011..17.618 rows=100000 loops=1)         Filter: (id<21) Planningtime:0.847 ms Executiontime:57.893 ms(9 rows)/* pg_pathman*/explain (analyze, costs off)update simpleset id=2from abcwheresimple.id=abc.id;                                       QUERY PLAN----------------------------------------------------------------------------------------Updateon simple (actualtime=54.708..54.708 rows=0 loops=1)->  Nested Loop (actualtime=0.098..42.023 rows=20000 loops=1)->  Seq Scanon simple (actualtime=0.017..0.019 rows=2 loops=1)->  Custom Scan (RuntimeAppend) (actualtime=0.022..19.123 rows=10000 loops=2)               Prune by: (simple.id=abc.id)->  Seq Scanon abc_1 (actualtime=0.031..19.920 rows=10000 loops=1)                     Filter: (simple.id= id)                     Rows Removed by Filter:90000->  Seq Scanon abc_2 (actualtime=0.012..16.387 rows=10000 loops=1)                     Filter: (simple.id= id)                     Rows Removed by Filter:90000 Planningtime:3.450 ms Executiontime:54.858 ms(13 rows)/* standard inheritance*/explain (analyze, costs off)update simpleset id=2from abcwheresimple.id=abc.id;                                      QUERY PLAN--------------------------------------------------------------------------------------Updateon simple (actualtime=399.068..399.068 rows=0 loops=1)->  HashJoin (actualtime=0.096..385.563 rows=20000 loops=1)         Hash Cond: (abc.id=simple.id)->  Append (actualtime=0.034..281.405 rows=1000000 loops=1)->  Seq Scanon abc (actualtime=0.006..0.006 rows=0 loops=1)->  Seq Scanon abc_1 (actualtime=0.025..25.897 rows=100000 loops=1)->  Seq Scanon abc_2 (actualtime=0.008..21.762 rows=100000 loops=1)->  Seq Scanon abc_3 (actualtime=0.008..21.162 rows=100000 loops=1)->  Seq Scanon abc_4 (actualtime=0.006..21.217 rows=100000 loops=1)->  Seq Scanon abc_5 (actualtime=0.006..21.300 rows=100000 loops=1)->  Seq Scanon abc_6 (actualtime=0.006..21.507 rows=100000 loops=1)->  Seq Scanon abc_7 (actualtime=0.007..21.285 rows=100000 loops=1)->  Seq Scanon abc_8 (actualtime=0.006..21.182 rows=100000 loops=1)->  Seq Scanon abc_9 (actualtime=0.008..21.253 rows=100000 loops=1)->  Seq Scanon abc_10 (actualtime=0.006..21.666 rows=100000 loops=1)->  Hash (actualtime=0.019..0.019 rows=2 loops=1)               Buckets:1024  Batches:1  Memory Usage: 9kB->  Seq Scanon simple (actualtime=0.006..0.009 rows=2 loops=1) Planningtime:1.405 ms Executiontime:399.338 ms(20 rows)/* pg_pathman*/explain (analyze, costs off)deletefrom abcwhere id=1;                             QUERY PLAN---------------------------------------------------------------------Deleteon abc_1 (actualtime=22.751..22.751 rows=0 loops=1)->  Seq Scanon abc_1 (actualtime=22.749..22.749 rows=0 loops=1)         Filter: (id=1)         Rows Removed by Filter:90000 Planningtime:0.144 ms Executiontime:22.786 ms(6 rows)/* standard inheritance*/explain (analyze, costs off)deletefrom abcwhere id=1;                               QUERY PLAN------------------------------------------------------------------------Deleteon abc (actualtime=36.546..36.546 rows=0 loops=1)Deleteon abcDeleteon abc_1->  Seq Scanon abc (actualtime=0.006..0.006 rows=0 loops=1)         Filter: (id=1)->  Seq Scanon abc_1 (actualtime=0.092..21.588 rows=10000 loops=1)         Filter: (id=1)         Rows Removed by Filter:90000 Planningtime:1.603 ms Executiontime:36.615 ms(10 rows)

Caveats

It's hard to support different releases of PostgreSQL (9.5, 9.6, 10 etc), but we try to do our best. However, there might be some problems with planning that are inherent to the specific release of PostgreSQL, since we use different version-specific pieces of code to make pg_pathman work. For example, wehad to disable some optimizations on PostgreSQL 9.5 in:

SELECT ... FROM partitioned_table sub-queries under UPDATE & DELETE statements;
SELECT ... FROM partitioned_table FOR SHARE/UPDATE/etc.

/* * SELECT sub-query under UPDATE*//* PostgreSQL 9.5, chose all partitions and parent*/explain (costs off)update simpleset id=2where idin (select*from abcwhere id=simple.idand id<20);                           QUERY PLAN----------------------------------------------------------------Updateon simple->  Seq Scanon simple         Filter: (SubPlan1)         SubPlan1->  Append->  Seq Scanon abc                       Filter: ((id<20)AND (id=simple.id))->  Seq Scanon abc_1                       Filter: ((id<20)AND (id=simple.id))->  Seq Scanon abc_2                       Filter: ((id<20)AND (id=simple.id))(11 rows)/* PostgreSQL 9.6+, chose only relevant partitions*/explain (costs off)update simpleset id=2where idin (select*from abcwhere id=simple.idand id<20);                             QUERY PLAN--------------------------------------------------------------------Updateon simple->  Seq Scanon simple         Filter: (SubPlan1)         SubPlan1->  Custom Scan (RuntimeAppend)                 Prune by: ((abc.id<20)AND (abc.id=simple.id))->  Seq Scanon abc_1 abc                       Filter: (id=simple.id)->  Seq Scanon abc_2 abc                       Filter: ((id<20)AND (id=simple.id))(10 rows)/* * SELECT ... FOR UPDATE*//* PostgreSQL 9.5, couldn't exclude parent*/EXPLAIN (COSTS OFF)SELECT*FROMrowmarks.firstORDER BY id FORUPDATE;              QUERY PLAN--------------------------------------- LockRows->  Sort         Sort Key:first.id->  Append->  Seq Scanon first->  Seq Scanon first_0->  Seq Scanon first_1->  Seq Scanon first_2(8 rows)/* PostgreSQL 9.6+, better planning time (hidden)*/EXPLAIN (COSTS OFF)SELECT*FROMrowmarks.firstORDER BY id FORUPDATE;              QUERY PLAN--------------------------------------- LockRows->  Sort         Sort Key:first_0.id->  Append->  Seq Scanon first_0->  Seq Scanon first_1->  Seq Scanon first_2(7 rows)

Movatterモバイル変換

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Available optimizations

Effective caching

Custom Nodes

Examples

Caveats

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally