Postgres Professional fork of PostgreSQL

postgrespro/postgrespro

pg_pathman

The pg_pathman module provides an optimized partitioning mechanism and functions to manage partitions.

The extension is compatible with PostgreSQL 9.5 (9.6 support is coming soon).

Overview

Partitioning means splitting one large table into smaller pieces. Each row in such a table is placed into a single partition according to the partitioning key. PostgreSQL supports partitioning via table inheritance: each partition must be created as a child table with a CHECK constraint. For example:

CREATE TABLE test (id SERIAL PRIMARY KEY, title TEXT);
CREATE TABLE test_1 (CHECK ( id >= 100 AND id < 200 )) INHERITS (test);
CREATE TABLE test_2 (CHECK ( id >= 200 AND id < 300 )) INHERITS (test);

Despite its flexibility, this approach forces the planner to perform an exhaustive search and check the constraints of each partition to determine whether it should be present in the plan. A large number of partitions may result in significant planning overhead.
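As a rough illustration of the inheritance-based approach, the sketch below asks the stock planner which children it keeps for a query on the test table defined above. It uses only standard PostgreSQL features (the constraint_exclusion setting and EXPLAIN); the exact plan shape depends on the server version and on the table contents.

```sql
-- 'partition' is PostgreSQL's default value: CHECK constraints of
-- inheritance children are examined at planning time.
SET constraint_exclusion = partition;

-- The planner tests every child's CHECK constraint against the WHERE clause.
-- test_2 (id >= 200 AND id < 300) can be excluded; the parent has no such
-- constraint, so it is expected to stay in the plan alongside test_1.
EXPLAIN (COSTS OFF)
SELECT * FROM test WHERE id = 150;
```

With two children the per-partition check is cheap; the planning overhead described above appears as the number of children grows.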

The pg_pathman module features partition management functions and an optimized planning mechanism which utilizes knowledge of the partitions' structure. It stores partitioning configuration in the pathman_config table; each row contains a single entry for a partitioned table (relation name, partitioning column and its type). During the initialization stage the pg_pathman module caches some information about child partitions in shared memory, which is used later for plan construction. Before a SELECT query is executed, pg_pathman traverses the condition tree in search of expressions like:

VARIABLE OP CONST

where VARIABLE is a partitioning key, OP is a comparison operator (supported operators are =, <, <=, >, >=), and CONST is a scalar value. For example:

WHERE id = 150

Based on the partitioning type and the condition's operator, pg_pathman searches for the corresponding partitions and builds the plan. Currently pg_pathman supports two partitioning schemes:

  • RANGE - maps rows to partitions using partitioning key ranges assigned to each partition. Optimization is achieved by using the binary search algorithm;
  • HASH - maps rows to partitions using a generic hash function.

More interesting features are yet to come. Stay tuned!

Roadmap

  • Provide a way to create user-defined partition creation/destruction callbacks (issue #22);
  • Implement LIST partitioning scheme;
  • Optimize hash join (both tables are partitioned by join key).

Installation guide

To install pg_pathman, execute this in the module's directory:

make install USE_PGXS=1

Modify the shared_preload_libraries parameter in postgresql.conf as follows:

shared_preload_libraries = 'pg_pathman'

It is essential to restart the PostgreSQL instance. After that, execute the following query in psql:

CREATE EXTENSION pg_pathman;

Done! Now it's time to set up your partitioning schemes.

Important: Don't forget to set the PG_CONFIG variable in case you want to test pg_pathman on a custom build of PostgreSQL. Read more here.

Available functions

Partition creation

create_hash_partitions(relation         REGCLASS,
                       attribute        TEXT,
                       partitions_count INTEGER,
                       partition_name   TEXT DEFAULT NULL,
                       partition_data   BOOLEAN DEFAULT TRUE)

Performs HASH partitioning for relation by integer key attribute. The partitions_count parameter specifies the number of partitions to create; it cannot be changed afterwards. If partition_data is true, all data is automatically copied from the parent table to the partitions. Note that data migration may take a while to finish, and the table will be locked until the transaction commits. See partition_table_concurrently() for a lock-free way to migrate data. The partition creation callback is invoked for each partition if set beforehand (see set_init_callback()).

create_range_partitions(relation       REGCLASS,
                        attribute      TEXT,
                        start_value    ANYELEMENT,
                        interval       ANYELEMENT,
                        count          INTEGER DEFAULT NULL,
                        partition_data BOOLEAN DEFAULT TRUE)

create_range_partitions(relation       REGCLASS,
                        attribute      TEXT,
                        start_value    ANYELEMENT,
                        interval       INTERVAL,
                        count          INTEGER DEFAULT NULL,
                        partition_data BOOLEAN DEFAULT TRUE)

Performs RANGE partitioning for relation by partitioning key attribute. The start_value argument specifies the initial value, interval sets the range of values in a single partition, and count is the number of premade partitions (if not set, pg_pathman tries to determine it based on the attribute values). The partition creation callback is invoked for each partition if set beforehand.
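A minimal sketch of a call with an integer key, assuming a hypothetical orders table that is not part of this document. Both start_value and interval are passed as integers so that they resolve to the same ANYELEMENT type, and count is given explicitly because the table is empty.

```sql
-- hypothetical table, used only for illustration
CREATE TABLE orders (
    id      INTEGER NOT NULL,
    amount  NUMERIC);

-- start at id = 1, 1000 ids per partition, 10 premade partitions
SELECT create_range_partitions('orders', 'id', 1, 1000, 10);
```

For date or timestamp keys the second form applies, with interval passed as an INTERVAL value such as '1 day'::interval.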

create_partitions_from_range(relation       REGCLASS,
                             attribute      TEXT,
                             start_value    ANYELEMENT,
                             end_value      ANYELEMENT,
                             interval       ANYELEMENT,
                             partition_data BOOLEAN DEFAULT TRUE)

create_partitions_from_range(relation       REGCLASS,
                             attribute      TEXT,
                             start_value    ANYELEMENT,
                             end_value      ANYELEMENT,
                             interval       INTERVAL,
                             partition_data BOOLEAN DEFAULT TRUE)

Performs RANGE partitioning from the specified range for relation by partitioning key attribute. The partition creation callback is invoked for each partition if set beforehand.

Data migration

partition_table_concurrently(relation REGCLASS)

Starts a background worker to move data from parent table to partitions. The worker utilizes short transactions to copy small batches of data (up to 10K rows per transaction) and thus doesn't significantly interfere with user's activity.

stop_concurrent_part_task(relation REGCLASS)

Stops a background worker performing a concurrent partitioning task. Note: the worker will exit after it finishes relocating the current batch.
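The two functions above are typically combined with partition creation that leaves the rows in the parent. The following is a sketch under that assumption, using the items table from the HASH partitioning example in this document and the parameter names from the signatures listed here.

```sql
-- create the partitions but keep all rows in the parent for now
SELECT create_hash_partitions('items', 'id', 100, partition_data := false);

-- start the background worker that moves rows in small batches
SELECT partition_table_concurrently('items');

-- watch the progress of the running task
SELECT relid, processed, status
FROM pathman_concurrent_part_tasks;

-- stop the task early if needed; the worker exits after its current batch
SELECT stop_concurrent_part_task('items');
```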

Triggers

create_hash_update_trigger(parent REGCLASS)

Creates the trigger on UPDATE for HASH partitions. The UPDATE trigger isn't created by default because of the overhead. It's useful in cases when the key attribute might change.

create_range_update_trigger(parent REGCLASS)

Same as above, but for a RANGE-partitioned table.
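A short sketch of how the trigger might be exercised, assuming the journal table from the RANGE partitioning example in this document (partitioned by day on dt). The row is expected to show up in the next day's partition after the update; this has not been verified against a particular pg_pathman version.

```sql
-- create the UPDATE trigger for the RANGE-partitioned table
SELECT create_range_update_trigger('journal');

-- change the partitioning key of a single row
UPDATE journal SET dt = dt + '1 day'::interval WHERE id = 1;

-- tableoid shows which partition holds the row now
SELECT tableoid::regclass AS partition, id, dt
FROM journal
WHERE id = 1;
```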

Post-creation partition management

split_range_partition(partition      REGCLASS,
                      value          ANYELEMENT,
                      partition_name TEXT DEFAULT NULL)

Split a RANGE partition in two by value. The partition creation callback is invoked for the new partition if available.

merge_range_partitions(partition1 REGCLASS, partition2 REGCLASS)

Merge two adjacent RANGE partitions. First, data from partition2 is copied to partition1, then partition2 is removed.

append_range_partition(p_relation     REGCLASS,
                       partition_name TEXT DEFAULT NULL,
                       tablespace     TEXT DEFAULT NULL)

Append a new RANGE partition with pathman_config.range_interval as interval.

prepend_range_partition(p_relation     REGCLASS,
                        partition_name TEXT DEFAULT NULL,
                        tablespace     TEXT DEFAULT NULL)

Prepend a new RANGE partition with pathman_config.range_interval as interval.

add_range_partition(relation       REGCLASS,
                    start_value    ANYELEMENT,
                    end_value      ANYELEMENT,
                    partition_name TEXT DEFAULT NULL,
                    tablespace     TEXT DEFAULT NULL)

Create a new RANGE partition for relation with the specified range bounds.

drop_range_partition(partition TEXT, delete_data BOOLEAN DEFAULT TRUE)

Drop a RANGE partition and all of its data if delete_data is true.

attach_range_partition(relation    REGCLASS,
                       partition   REGCLASS,
                       start_value ANYELEMENT,
                       end_value   ANYELEMENT)

Attach a partition to the existing RANGE-partitioned relation. The attached table must have exactly the same structure as the parent table, including the dropped columns. The partition creation callback is invoked if set (see pathman_config_params).

detach_range_partition(partition REGCLASS)

Detach partition from the existing RANGE-partitioned relation.

disable_pathman_for(relation TEXT)

Permanently disable the pg_pathman partitioning mechanism for the specified parent table and remove the insert trigger if it exists. All partitions and data remain unchanged.

drop_partitions(parent      REGCLASS,
                delete_data BOOLEAN DEFAULT FALSE)

Drop partitions of the parent table (both foreign and local relations). If delete_data is false, the data is copied to the parent table first. Default is false.
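The two modes of drop_partitions() can be sketched as follows, assuming the journal table from the RANGE partitioning example in this document. The second call is commented out because it is an alternative to the first, not a follow-up.

```sql
-- delete_data defaults to false: rows are copied back to the parent
-- before the partitions are dropped
SELECT drop_partitions('journal');

-- alternative, run instead of the call above:
-- drop the partitions together with the rows they contain
-- SELECT drop_partitions('journal', true);

-- after the first variant, all rows should be readable from the parent alone
SELECT count(*) FROM ONLY journal;
```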

Additional parameters

set_enable_parent(relation REGCLASS, value BOOLEAN)

Include/exclude the parent table into/from the query plan. The original PostgreSQL planner always includes the parent table in the query plan even if it is empty, which can lead to additional overhead. You can call set_enable_parent() with value set to false if you are never going to use the parent table as storage. The default value depends on the partition_data parameter that was specified during initial partitioning in the create_range_partitions() or create_partitions_from_range() functions. If partition_data was true, then all data has already been migrated to partitions and the parent table is disabled. Otherwise it is enabled.
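A brief sketch of toggling this parameter, assuming the journal table from the RANGE partitioning example in this document. The stored value is read back from the pathman_config_params table described in the "Views and tables" section.

```sql
-- exclude the parent from query plans
SELECT set_enable_parent('journal', false);

-- the stored setting is visible in pathman_config_params
SELECT partrel, enable_parent
FROM pathman_config_params
WHERE partrel = 'journal'::regclass;

-- include the parent again, e.g. while rows still live in it
SELECT set_enable_parent('journal', true);
```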

set_auto(relation REGCLASS, value BOOLEAN)

Enable/disable auto partition propagation (only for RANGE partitioning). It is enabled by default.

set_init_callback(relation REGCLASS, callback REGPROC DEFAULT 0)

Set partition creation callback to be invoked for each attached or created partition (both HASH and RANGE).

Views and tables

pathman_config --- main config storage

CREATE TABLE IF NOT EXISTS pathman_config (
    partrel         REGCLASS NOT NULL PRIMARY KEY,
    attname         TEXT NOT NULL,
    parttype        INTEGER NOT NULL,
    range_interval  TEXT,

    CHECK (parttype IN (1, 2)) /* check for allowed part types */ );

This table stores a list of partitioned tables.

pathman_config_params --- optional parameters

CREATE TABLE IF NOT EXISTS pathman_config_params (
    partrel        REGCLASS NOT NULL PRIMARY KEY,
    enable_parent  BOOLEAN NOT NULL DEFAULT TRUE,
    auto           BOOLEAN NOT NULL DEFAULT TRUE,
    init_callback  REGPROCEDURE NOT NULL DEFAULT 0);

This table stores optional parameters which override standard behavior.

pathman_concurrent_part_tasks --- currently running partitioning workers

-- helper SRF function
CREATE OR REPLACE FUNCTION show_concurrent_part_tasks()
RETURNS TABLE (
    userid     REGROLE,
    pid        INT,
    dbid       OID,
    relid      REGCLASS,
    processed  INT,
    status     TEXT)
AS 'pg_pathman', 'show_concurrent_part_tasks_internal'
LANGUAGE C STRICT;

CREATE OR REPLACE VIEW pathman_concurrent_part_tasks
AS SELECT * FROM show_concurrent_part_tasks();

This view lists all currently running concurrent partitioning tasks.

pathman_partition_list --- list of all existing partitions

-- helper SRF function
CREATE OR REPLACE FUNCTION show_partition_list()
RETURNS TABLE (
    parent     REGCLASS,
    partition  REGCLASS,
    parttype   INT4,
    partattr   TEXT,
    range_min  TEXT,
    range_max  TEXT)
AS 'pg_pathman', 'show_partition_list_internal'
LANGUAGE C STRICT;

CREATE OR REPLACE VIEW pathman_partition_list
AS SELECT * FROM show_partition_list();

This view lists all existing partitions, as well as their parents and range boundaries (NULL for HASH partitions).
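Two typical queries against this view, sketched with the column names from the definition above and assuming the journal table from the RANGE partitioning example in this document:

```sql
-- partitions of one parent with their bounds (NULL for HASH partitions)
SELECT partition, range_min, range_max
FROM pathman_partition_list
WHERE parent = 'journal'::regclass;

-- number of partitions per partitioned table
SELECT parent, count(*) AS part_count
FROM pathman_partition_list
GROUP BY parent;
```

Note that range_min and range_max are exposed as TEXT, so sorting by them is textual unless they are cast to the key's type.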

Custom plan nodes

pg_pathman provides a couple of custom plan nodes which aim to reduce execution time, namely:

  • RuntimeAppend (overrides Append plan node)
  • RuntimeMergeAppend (overrides MergeAppend plan node)
  • PartitionFilter (drop-in replacement for INSERT triggers)

PartitionFilter acts as a proxy node for INSERT's child scan, which means it can redirect output tuples to the corresponding partition:

EXPLAIN (COSTS OFF)
INSERT INTO partitioned_table
SELECT generate_series(1, 10), random();
               QUERY PLAN
-----------------------------------------
 Insert on partitioned_table
   ->  Custom Scan (PartitionFilter)
         ->  Subquery Scan on "*SELECT*"
               ->  Result
(4 rows)

RuntimeAppend and RuntimeMergeAppend have much in common: they come in handy when the WHERE condition takes the form of:

VARIABLE OP PARAM

This kind of expression cannot be optimized at planning time since the parameter's value is not known until the execution stage. The problem can be solved by embedding the WHERE condition analysis routine into the original Append's code, thus making it pick only the required scans out of the whole set of planned partition scans. This effectively boils down to creating a custom node capable of performing such a check.


There are at least several cases that demonstrate usefulness of these nodes:

/* create table we're going to partition */
CREATE TABLE partitioned_table(id INT NOT NULL, payload REAL);

/* insert some data */
INSERT INTO partitioned_table
SELECT generate_series(1, 1000), random();

/* perform partitioning */
SELECT create_hash_partitions('partitioned_table', 'id', 100);

/* create ordinary table */
CREATE TABLE some_table AS SELECT generate_series(1, 100) AS VAL;

  • id = (select ... limit 1)

EXPLAIN (COSTS OFF, ANALYZE) SELECT * FROM partitioned_table
WHERE id = (SELECT * FROM some_table LIMIT 1);
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Custom Scan (RuntimeAppend) (actual time=0.030..0.033 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit (actual time=0.011..0.011 rows=1 loops=1)
           ->  Seq Scan on some_table (actual time=0.010..0.010 rows=1 loops=1)
   ->  Seq Scan on partitioned_table_70 partitioned_table (actual time=0.004..0.006 rows=1 loops=1)
         Filter: (id = $0)
         Rows Removed by Filter: 9
 Planning time: 1.131 ms
 Execution time: 0.075 ms
(9 rows)

/* disable RuntimeAppend node */
SET pg_pathman.enable_runtimeappend = f;

EXPLAIN (COSTS OFF, ANALYZE) SELECT * FROM partitioned_table
WHERE id = (SELECT * FROM some_table LIMIT 1);
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Append (actual time=0.196..0.274 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit (actual time=0.005..0.005 rows=1 loops=1)
           ->  Seq Scan on some_table (actual time=0.003..0.003 rows=1 loops=1)
   ->  Seq Scan on partitioned_table_0 (actual time=0.014..0.014 rows=0 loops=1)
         Filter: (id = $0)
         Rows Removed by Filter: 6
   ->  Seq Scan on partitioned_table_1 (actual time=0.003..0.003 rows=0 loops=1)
         Filter: (id = $0)
         Rows Removed by Filter: 5
         ... /* more plans follow */
 Planning time: 1.140 ms
 Execution time: 0.855 ms
(306 rows)

  • id = ANY (select ...)

EXPLAIN (COSTS OFF, ANALYZE) SELECT * FROM partitioned_table
WHERE id = any (SELECT * FROM some_table limit 4);
                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Nested Loop (actual time=0.025..0.060 rows=4 loops=1)
   ->  Limit (actual time=0.009..0.011 rows=4 loops=1)
         ->  Seq Scan on some_table (actual time=0.008..0.010 rows=4 loops=1)
   ->  Custom Scan (RuntimeAppend) (actual time=0.002..0.004 rows=1 loops=4)
         ->  Seq Scan on partitioned_table_70 partitioned_table (actual time=0.001..0.001 rows=10 loops=1)
         ->  Seq Scan on partitioned_table_26 partitioned_table (actual time=0.002..0.003 rows=9 loops=1)
         ->  Seq Scan on partitioned_table_27 partitioned_table (actual time=0.001..0.002 rows=20 loops=1)
         ->  Seq Scan on partitioned_table_63 partitioned_table (actual time=0.001..0.002 rows=9 loops=1)
 Planning time: 0.771 ms
 Execution time: 0.101 ms
(10 rows)

/* disable RuntimeAppend node */
SET pg_pathman.enable_runtimeappend = f;

EXPLAIN (COSTS OFF, ANALYZE) SELECT * FROM partitioned_table
WHERE id = any (SELECT * FROM some_table limit 4);
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Nested Loop Semi Join (actual time=0.531..1.526 rows=4 loops=1)
   Join Filter: (partitioned_table.id = some_table.val)
   Rows Removed by Join Filter: 3990
   ->  Append (actual time=0.190..0.470 rows=1000 loops=1)
         ->  Seq Scan on partitioned_table (actual time=0.187..0.187 rows=0 loops=1)
         ->  Seq Scan on partitioned_table_0 (actual time=0.002..0.004 rows=6 loops=1)
         ->  Seq Scan on partitioned_table_1 (actual time=0.001..0.001 rows=5 loops=1)
         ->  Seq Scan on partitioned_table_2 (actual time=0.002..0.004 rows=14 loops=1)
         ... /* 96 scans follow */
   ->  Materialize (actual time=0.000..0.000 rows=4 loops=1000)
         ->  Limit (actual time=0.005..0.006 rows=4 loops=1)
               ->  Seq Scan on some_table (actual time=0.003..0.004 rows=4 loops=1)
 Planning time: 2.169 ms
 Execution time: 2.059 ms
(110 rows)

  • NestLoop involving a partitioned table, which is omitted here since it is already shown above.

In case you're interested, you can read more about custom nodes at Alexander Korotkov's blog.

Examples

Common tips

  • You can easily add a partition column containing the names of the underlying partitions using the system attribute called tableoid:

SELECT tableoid::regclass AS partition, * FROM partitioned_table;

  • Though indices on a parent table aren't particularly useful (since it's supposed to be empty), they act as prototypes for indices on partitions. For each index on the parent table, pg_pathman will create a similar index on every partition.

  • All running concurrent partitioning tasks can be listed using the pathman_concurrent_part_tasks view:

SELECT * FROM pathman_concurrent_part_tasks;
 userid | pid  | dbid  | relid | processed | status
--------+------+-------+-------+-----------+---------
 dmitry | 7367 | 16384 | test  |    472000 | working
(1 row)

HASH partitioning

Consider an example of HASH partitioning. First create a table with some integer column:

CREATE TABLE items (
    id       SERIAL PRIMARY KEY,
    name     TEXT,
    code     BIGINT);

INSERT INTO items (id, name, code)
SELECT g, md5(g::text), random() * 100000
FROM generate_series(1, 100000) as g;

Now run the create_hash_partitions() function with appropriate arguments:

SELECT create_hash_partitions('items','id',100);

This will create new partitions and move the data from parent to partitions.

Here's an example of the query performing filtering by partitioning key:

SELECT * FROM items WHERE id = 1234;
  id  |               name               | code
------+----------------------------------+------
 1234 | 81dc9bdb52d04dc20036dbd8313ed055 | 1855
(1 row)

EXPLAIN SELECT * FROM items WHERE id = 1234;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Append  (cost=0.28..8.29 rows=0 width=0)
   ->  Index Scan using items_34_pkey on items_34  (cost=0.28..8.29 rows=0 width=0)
         Index Cond: (id = 1234)

Notice that the Append node contains only one child scan, which corresponds to the WHERE clause.

Important: pay attention to the fact that pg_pathman excludes the parent table from the query plan.

To access the parent table, use the ONLY modifier:

EXPLAIN SELECT * FROM ONLY items;
                      QUERY PLAN
------------------------------------------------------
 Seq Scan on items  (cost=0.00..0.00 rows=1 width=45)

RANGE partitioning

Consider an example of RANGE partitioning. Let's create a table containing some dummy logs:

CREATE TABLE journal (
    id      SERIAL,
    dt      TIMESTAMP NOT NULL,
    level   INTEGER,
    msg     TEXT);

-- similar index will also be created for each partition
CREATE INDEX ON journal(dt);

-- generate some data
INSERT INTO journal (dt, level, msg)
SELECT g, random() * 6, md5(g::text)
FROM generate_series('2015-01-01'::date, '2015-12-31'::date, '1 minute') as g;

Run the create_range_partitions() function to create partitions so that each partition contains the data for one day:

SELECT create_range_partitions('journal','dt','2015-01-01'::date,'1 day'::interval);

It will create 365 partitions and move the data from parent to partitions.

New partitions are appended automatically by the insert trigger, but this can also be done manually with the following functions:

-- add new partition with specified range
SELECT add_range_partition('journal', '2016-01-01'::date, '2016-01-07'::date);

-- append new partition with default range
SELECT append_range_partition('journal');

The first one creates a partition with the specified range. The second one creates a partition with the default interval and appends it to the partition list. It is also possible to attach an existing table as a partition. For example, we may want to attach an archive table (or even a foreign table from another server) for some outdated data:

CREATE FOREIGN TABLE journal_archive (
    id      INTEGER NOT NULL,
    dt      TIMESTAMP NOT NULL,
    level   INTEGER,
    msg     TEXT)
SERVER archive_server;

SELECT attach_range_partition('journal', 'journal_archive', '2014-01-01'::date, '2015-01-01'::date);

Important: the definition of the attached table must match the one of the existing partitioned table, including the dropped columns.

To merge two adjacent partitions, use the merge_range_partitions() function:

SELECT merge_range_partitions('journal_archive','journal_1');

To split a partition by value, use the split_range_partition() function:

SELECT split_range_partition('journal_366','2016-01-03'::date);

To detach a partition, use the detach_range_partition() function:

SELECT detach_range_partition('journal_archive');

Here's an example of the query performing filtering by partitioning key:

SELECT * FROM journal WHERE dt >= '2015-06-01' AND dt < '2015-06-03';
   id   |         dt          | level |               msg
--------+---------------------+-------+----------------------------------
 217441 | 2015-06-01 00:00:00 |     2 | 15053892d993ce19f580a128f87e3dbf
 217442 | 2015-06-01 00:01:00 |     1 | 3a7c46f18a952d62ce5418ac2056010c
 217443 | 2015-06-01 00:02:00 |     0 | 92c8de8f82faf0b139a3d99f2792311d
 ...
(2880 rows)

EXPLAIN SELECT * FROM journal WHERE dt >= '2015-06-01' AND dt < '2015-06-03';
                            QUERY PLAN
------------------------------------------------------------------
 Append  (cost=0.00..58.80 rows=0 width=0)
   ->  Seq Scan on journal_152  (cost=0.00..29.40 rows=0 width=0)
   ->  Seq Scan on journal_153  (cost=0.00..29.40 rows=0 width=0)
(3 rows)
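To see how the matching rows are spread across partitions, the tableoid tip from "Common tips" can be combined with the same filter. This is only a sketch; with the sample data above (one row per minute), each of the two daily partitions would be expected to account for half of the 2880 rows.

```sql
-- rows per partition for the two-day window used above
SELECT tableoid::regclass AS partition, count(*) AS row_count
FROM journal
WHERE dt >= '2015-06-01' AND dt < '2015-06-03'
GROUP BY tableoid
ORDER BY 1;
```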

Disabling pg_pathman

There are several user-accessible GUC variables designed to toggle the whole module or specific custom nodes on and off:

  • pg_pathman.enable --- disable (or enable) pg_pathman completely
  • pg_pathman.enable_runtimeappend --- toggle RuntimeAppend custom node on/off
  • pg_pathman.enable_runtimemergeappend --- toggle RuntimeMergeAppend custom node on/off
  • pg_pathman.enable_partitionfilter --- toggle PartitionFilter custom node on/off
  • pg_pathman.enable_auto_partition --- toggle automatic partition creation on/off (per session)
  • pg_pathman.insert_into_fdw --- allow INSERTs into various FDWs (disabled | postgres | any_fdw)
  • pg_pathman.override_copy --- toggle COPY statement hooking on/off
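A sketch of how these variables might be toggled within a session, assuming the items table from the HASH partitioning example and assuming that pg_pathman.enable accepts boolean values in the same way as pg_pathman.enable_runtimeappend does in the examples above:

```sql
-- turn the whole module off for the current session and inspect the plan
SET pg_pathman.enable = off;
EXPLAIN (COSTS OFF) SELECT * FROM items WHERE id = 1234;

-- turn it back on and compare
SET pg_pathman.enable = on;
EXPLAIN (COSTS OFF) SELECT * FROM items WHERE id = 1234;

-- individual custom nodes can be switched the same way
SET pg_pathman.enable_runtimeappend = off;
RESET pg_pathman.enable_runtimeappend;
```

Comparing the two plans is a quick way to check whether pg_pathman is responsible for a given plan shape.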

To permanently disable pg_pathman for some previously partitioned table, use the disable_pathman_for() function:

SELECT disable_pathman_for('range_rel');

All partitions and data will remain unchanged and will be handled by the standard PostgreSQL inheritance mechanism.

Feedback

Do not hesitate to post your issues, questions and new ideas at the issues page.

Authors

Ildar Musin i.musin@postgrespro.ru Postgres Professional Ltd., Russia
Alexander Korotkov a.korotkov@postgrespro.ru Postgres Professional Ltd., Russia
Dmitry Ivanov d.ivanov@postgrespro.ru Postgres Professional Ltd., Russia
