Commit 433f116 — Day 56

1 parent ed1cde3

2 files changed: +105 −0

0056_how_to_imitate_production_planner.md

Lines changed: 104 additions & 0 deletions
Originally from: [tweet](https://twitter.com/samokhvalov/status/1726887992340738324), [LinkedIn post]().

---

# How to make the non-production Postgres planner behave like in production

> I post a new PostgreSQL "howto" article every day. Join me in this
> journey – [subscribe](https://twitter.com/samokhvalov/), provide feedback, share!
For query optimization goals – running `EXPLAIN (ANALYZE, BUFFERS)` and verifying various optimization ideas such as a
new index – it is crucial to ensure that the Postgres planner behaves exactly like, or very similarly to, the planner
working with the production database. Fortunately, this is possible regardless of the resources (CPU, RAM, disk IO)
available on the non-production machine.

The technique described here is used in DBLab Engine
[@Database_Lab](https://twitter.com/Database_Lab)
to support fast database cloning/branching for query optimization and database testing in CI/CD.

To achieve prod/non-prod parity of planner behavior, two components are needed:

1) Matching database settings
2) The same or very similar statistics (the content of `pg_statistic`)
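
For context, this is the kind of command whose plan and buffer numbers we want to match between environments (a
trivial illustration; the table and column here are hypothetical):

```sql
explain (analyze, buffers)
select *
from orders                                   -- hypothetical table
where created_at >= now() - interval '1 day'  -- hypothetical filter
order by created_at desc
limit 100;
```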

## Matching database settings

Use this query on the production database to obtain the list of non-default settings that influence the planner's
behavior:
```sql
select
  format(e'%s = %s', name, setting) as configs
from
  pg_settings
where
  source <> 'default'
  and (
    name ~ '(work_mem$|^enable_|_cost$|scan_size$|effective_cache_size|^jit)'
    or name ~ '(^geqo|default_statistics_target|constraint_exclusion|cursor_tuple_fraction)'
    or name ~ '(collapse_limit$|parallel|plan_cache_mode)'
  );
```

Notes:

- The planner's behavior doesn't depend on the actual resources available, such as CPU or RAM. Nor does it depend on
  the OS, the filesystem, or their settings.
- The value of `shared_buffers` doesn't matter(!) – it only affects the executor's behavior and the buffer pool's
  hit/read ratio. What does matter for the planner is `effective_cache_size`, and you can set it to a value that
  significantly exceeds the actual RAM available, "fooling" the planner in a good sense to match the production
  planner's behavior. So you can have, say, 1 TiB of RAM and `shared_buffers = '250GB'` in production with
  `effective_cache_size = '750GB'`, and still be able to effectively analyze and optimize queries on a small 8-GiB
  machine with `shared_buffers = '2GB'` and `effective_cache_size = '750GB'`. The planner will assume you have a lot
  of RAM when choosing the optimal plans.
- Similarly, you can have fast SSDs in production and use `random_page_cost = 1.1` there, and very slow, cheap
  magnetic disks in your testing environments. With `random_page_cost = 1.1` used there as well, the planner will
  consider random access not costly. The execution will still be slow, but the plan chosen and the data volumes
  (actual rows, buffer numbers) will be the same as or very close to production.
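
One possible way to transfer the captured settings to a clone is to generate `ALTER SYSTEM` commands directly on
production and replay them on the non-production instance. A minimal sketch, assuming the same regex as in the query
above; raw values from `pg_settings.setting` are expressed in each parameter's base unit, so they can be replayed
as-is:

```sql
-- Run on production: generate ALTER SYSTEM commands for the clone.
select format('alter system set %I = %L;', name, setting)
from pg_settings
where source <> 'default'
  and (
    name ~ '(work_mem$|^enable_|_cost$|scan_size$|effective_cache_size|^jit)'
    or name ~ '(^geqo|default_statistics_target|constraint_exclusion|cursor_tuple_fraction)'
    or name ~ '(collapse_limit$|parallel|plan_cache_mode)'
  );

-- On the clone, execute the generated statements (superuser required),
-- then reload the configuration; planner-related settings take effect
-- without a restart:
-- select pg_reload_conf();
```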

## Matching statistics

There is a quite interesting and exotic idea: dump the content of `pg_statistic` on production and restore it in much
smaller testing environments that don't hold the actual data. No popular solution implements this idea, but possibly
[pg_dbms_stats](https://github.com/ossc-db/pg_dbms_stats/blob/master/doc/pg_dbms_stats-en.md) can help with it.

> 🎯 **TODO:** Test and see if it works.
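
If you explore this route, the standard human-readable `pg_stats` view is handy for comparing what the planner sees in
each environment (a sketch; the table name is hypothetical):

```sql
-- Run on both production and the clone, then diff the output.
select attname, null_frac, n_distinct, correlation
from pg_stats
where schemaname = 'public'
  and tablename = 'orders';  -- hypothetical table
```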

A more straightforward idea is to have a database of the same size. Two options are possible here:

* **Option 1:** physical copy. It can be done using `pg_basebackup`, by copying `PGDATA` with rsync or an
  alternative, or by restoring from physical backups such as `WAL-G`'s or `pgBackRest`'s.
* **Option 2:** logical copy. It can be done using dump/restore or by creating a logical replica.
Notes:

- The physical method of provisioning gives the best results: not only will "actual rows" match, but so will the
  buffer numbers reported by `EXPLAIN (ANALYZE, BUFFERS)`, because the number of pages (`relpages`) is the same, the
  bloat is preserved, and so on. This is the best method for optimizing and fine-tuning queries.
- The logical method gives matching row counts, but the size of tables and indexes is going to be different –
  `relpages` is smaller in a freshly provisioned node, the bloat is not preserved, and tuples, generally, are stored
  in a different order (we can call production's bloat "good" since we want to have it in testing environments to
  match the production state). This method still enables a quite efficient query optimization workflow, and it adds
  one more reason to keep bloat low in production.
- After dump/restore, you must explicitly run `ANALYZE` (or `vacuumdb --analyze -j <number of workers>`) to initially
  gather statistics in `pg_statistic`, because `pg_restore` (or `psql`) won't run it for you.
- If the database content needs to be changed to remove sensitive data, this most certainly is going to affect planner
  behavior. For some queries, the impact may be quite low, but for others it can be critical, making query
  optimization virtually impossible. These negative effects are generally greater than those caused by dump/restore
  losing bloat, because:
    - dump/restore affects `relpages` (bloat is lost) and the order of tuples, but not the content of `pg_statistic`;
    - removal of sensitive data can not only reorder tuples and produce irrelevant bloat ("bad bloat"), but also lose
      important parts of production's `pg_statistic`.
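
To check how close a clone is to production, you can compare page and row counts on both sides (a sanity-check
sketch; run the same query in both environments and diff the results):

```sql
-- Largest relations by page count; with physical provisioning,
-- relpages should match production, with logical it will be smaller.
select relname, relkind, relpages, reltuples::bigint
from pg_class
where relkind in ('r', 'i')
order by relpages desc
limit 20;
```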

## Summary

To make the non-production Postgres planner behave like in production, perform these two steps:

1) Tune certain non-prod Postgres settings (planner-related ones and `work_mem`) to match production.

2) Copy the database from production if possible, preferring physical provisioning to logical, and – if it is
   possible, of course – avoiding data modifications. For ultra-fast delivery of clones, consider using the DBLab
   Engine [@Database_Lab](https://twitter.com/Database_Lab).

README.md

Lines changed: 1 addition & 0 deletions

@@ -82,6 +82,7 @@ As an example, first 2 rows:
  - 0053 [Index maintenance](./0053_index_maintenance.md)
  - 0054 [How to check btree indexes for corruption (pg_amcheck)](./0054_how_to_check_btree_indexes_for_corruption.md)
  - 0055 [How to drop a column](./0055_how_to_drop_a_column.md)
+ - 0056 [How to make the non-production Postgres planner behave like in production](./0056_how_to_imitate_production_planner.md)
  - ...

 ## Contributors
