Commit d6c08e2

Add hash_mem_multiplier GUC.
Add a GUC that acts as a multiplier on work_mem. It gets applied when sizing executor node hash tables that were previously size constrained using work_mem alone.

The new GUC can be used to preferentially give hash-based nodes more memory than the generic work_mem limit. It is intended to enable admin tuning of the executor's memory usage. Overall system throughput and system responsiveness can be improved by giving hash-based executor nodes more memory (especially over sort-based alternatives, which are often much less sensitive to being memory constrained).

The default value for hash_mem_multiplier is 1.0, which is also the minimum valid value. This means that hash-based nodes continue to apply work_mem in the traditional way by default.

hash_mem_multiplier is generally useful. However, it is being added now due to concerns about hash aggregate performance stability for users that upgrade to Postgres 13 (which added disk-based hash aggregation in commit 1f39bce). While the old hash aggregate behavior risked out-of-memory errors, it is nevertheless likely that many users actually benefited. Hash agg's previous indifference to work_mem during query execution was not just faster; it also accidentally made aggregation resilient to grouping estimate problems (at least in cases where this didn't create destabilizing memory pressure).

hash_mem_multiplier can provide a certain kind of continuity with the behavior of Postgres 12 hash aggregates in cases where the planner incorrectly estimates that all groups (plus related allocations) will fit in work_mem/hash_mem. This seems necessary because hash-based aggregation is usually much slower when only a small fraction of all groups can fit. Even when it isn't possible to totally avoid hash aggregates that spill, giving hash aggregation more memory will reliably improve performance (the same cannot be said for external sort operations, which appear to be almost unaffected by memory availability provided it's at least possible to get a single merge pass).

The PostgreSQL 13 release notes should advise users that increasing hash_mem_multiplier can help with performance regressions associated with hash aggregation. That can be taken care of by a later commit.

Author: Peter Geoghegan
Reviewed-By: Álvaro Herrera, Jeff Davis
Discussion: https://postgr.es/m/20200625203629.7m6yvut7eqblgmfo@alap3.anarazel.de
Discussion: https://postgr.es/m/CAH2-WzmD%2Bi1pG6rc1%2BCjc4V6EaFJ_qSuKCCHVnH%3DoruqD-zqow%40mail.gmail.com
Backpatch: 13-, where disk-based hash aggregation was introduced.
1 parent 6023b7e commit d6c08e2
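
The arithmetic behind the new GUC is simple. As an illustration (the numbers here are hypothetical, not taken from the commit): with work_mem = 4MB and hash_mem_multiplier = 2.0, hash-based nodes may budget 8MB while sort-based nodes still spill at 4MB. A minimal C sketch of that computation, mirroring the get_hash_mem() helper added in nodeHash.c below:

/* Sketch of the effective hash memory budget; settings are illustrative. */
#include <stdio.h>

int
main(void)
{
    int     work_mem = 4096;            /* KB; the 4MB default */
    double  hash_mem_multiplier = 2.0;  /* hypothetical admin-tuned value */

    /* Hash-based executor nodes budget work_mem * hash_mem_multiplier... */
    int     hash_mem = (int) (work_mem * hash_mem_multiplier);

    /* ...while sort-based nodes continue to budget plain work_mem. */
    printf("sorts: %d KB, hash tables: %d KB\n", work_mem, hash_mem);
    return 0;
}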

File tree

19 files changed: +205 −81 lines changed


doc/src/sgml/config.sgml

Lines changed: 51 additions & 9 deletions

@@ -1690,22 +1690,64 @@ include_dir 'conf.d'
       </term>
       <listitem>
        <para>
-        Sets the maximum amount of memory to be used by a query operation
+        Sets the base maximum amount of memory to be used by a query operation
         (such as a sort or hash table) before writing to temporary disk files.
         If this value is specified without units, it is taken as kilobytes.
         The default value is four megabytes (<literal>4MB</literal>).
         Note that for a complex query, several sort or hash operations might be
-        running in parallel; each operation will be allowed to use as much memory
-        as this value specifies before it starts to write data into temporary
-        files.  Also, several running sessions could be doing such operations
-        concurrently.  Therefore, the total memory used could be many
-        times the value of <varname>work_mem</varname>; it is necessary to
-        keep this fact in mind when choosing the value.  Sort operations are
-        used for <literal>ORDER BY</literal>, <literal>DISTINCT</literal>, and
-        merge joins.
+        running in parallel; each operation will generally be allowed
+        to use as much memory as this value specifies before it starts
+        to write data into temporary files.  Also, several running
+        sessions could be doing such operations concurrently.
+        Therefore, the total memory used could be many times the value
+        of <varname>work_mem</varname>; it is necessary to keep this
+        fact in mind when choosing the value.  Sort operations are used
+        for <literal>ORDER BY</literal>, <literal>DISTINCT</literal>,
+        and merge joins.
         Hash tables are used in hash joins, hash-based aggregation, and
         hash-based processing of <literal>IN</literal> subqueries.
        </para>
+       <para>
+        Hash-based operations are generally more sensitive to memory
+        availability than equivalent sort-based operations.  The
+        memory available for hash tables is computed by multiplying
+        <varname>work_mem</varname> by
+        <varname>hash_mem_multiplier</varname>.  This makes it
+        possible for hash-based operations to use an amount of memory
+        that exceeds the usual <varname>work_mem</varname> base
+        amount.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-hash-mem-multiplier" xreflabel="hash_mem_multiplier">
+      <term><varname>hash_mem_multiplier</varname> (<type>floating point</type>)
+      <indexterm>
+       <primary><varname>hash_mem_multiplier</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Used to compute the maximum amount of memory that hash-based
+        operations can use.  The final limit is determined by
+        multiplying <varname>work_mem</varname> by
+        <varname>hash_mem_multiplier</varname>.  The default value is
+        1.0, which makes hash-based operations subject to the same
+        simple <varname>work_mem</varname> maximum as sort-based
+        operations.
+       </para>
+       <para>
+        Consider increasing <varname>hash_mem_multiplier</varname> in
+        environments where spilling by query operations is a regular
+        occurrence, especially when simply increasing
+        <varname>work_mem</varname> results in memory pressure (memory
+        pressure typically takes the form of intermittent out of
+        memory errors).  A setting of 1.5 or 2.0 may be effective with
+        mixed workloads.  Higher settings in the range of 2.0 - 8.0 or
+        more may be effective in environments where
+        <varname>work_mem</varname> has already been increased to 40MB
+        or more.
+       </para>
       </listitem>
      </varlistentry>
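
The tuning advice in the new documentation reduces to one multiplication per scenario. A hedged worked example of the two documented scenarios (the settings are illustrative assumptions, not values prescribed by the documentation):

/* Worked example of the documented tuning scenarios; the settings are
 * illustrative assumptions, not recommendations from the docs. */
#include <stdio.h>

int
main(void)
{
    struct
    {
        int         work_mem;   /* KB */
        double      multiplier;
        const char *note;
    }           s[] = {
        {4096, 2.0, "mixed workload, default work_mem"},
        {40960, 4.0, "work_mem already raised to 40MB"},
    };

    for (int i = 0; i < 2; i++)
    {
        int         hash_mem = (int) (s[i].work_mem * s[i].multiplier);

        printf("%-35s sorts spill at %d KB, hashes at %d KB\n",
               s[i].note, s[i].work_mem, hash_mem);
    }
    return 0;
}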

doc/src/sgml/ref/postgres-ref.sgml

Lines changed: 4 additions & 4 deletions

@@ -338,10 +338,10 @@ PostgreSQL documentation
      <term><option>-S</option> <replaceable class="parameter">work-mem</replaceable></term>
      <listitem>
       <para>
-        Specifies the amount of memory to be used by internal sorts and hashes
-        before resorting to temporary disk files.  See the description of the
-        <varname>work_mem</varname> configuration parameter in <xref
-        linkend="runtime-config-resource-memory"/>.
+        Specifies the base amount of memory to be used by sorts and
+        hash tables before resorting to temporary disk files.  See the
+        description of the <varname>work_mem</varname> configuration
+        parameter in <xref linkend="runtime-config-resource-memory"/>.
       </para>
      </listitem>
     </varlistentry>

doc/src/sgml/runtime.sgml

Lines changed: 6 additions & 4 deletions

@@ -1326,10 +1326,12 @@ Out of Memory: Killed process 12345 (postgres).
    system running out of memory, you can avoid the problem by changing
    your configuration.  In some cases, it may help to lower memory-related
    configuration parameters, particularly
-    <link linkend="guc-shared-buffers"><varname>shared_buffers</varname></link>
-    and <link linkend="guc-work-mem"><varname>work_mem</varname></link>.  In
-    other cases, the problem may be caused by allowing too many connections
-    to the database server itself.  In many cases, it may be better to reduce
+    <link linkend="guc-shared-buffers"><varname>shared_buffers</varname></link>,
+    <link linkend="guc-work-mem"><varname>work_mem</varname></link>, and
+    <link linkend="guc-hash-mem-multiplier"><varname>hash_mem_multiplier</varname></link>.
+    In other cases, the problem may be caused by allowing too many
+    connections to the database server itself.  In many cases, it may
+    be better to reduce
    <link linkend="guc-max-connections"><varname>max_connections</varname></link>
    and instead make use of external connection-pooling software.
   </para>

src/backend/executor/execGrouping.c

Lines changed: 3 additions & 2 deletions

@@ -165,13 +165,14 @@ BuildTupleHashTableExt(PlanState *parent,
 {
     TupleHashTable hashtable;
     Size        entrysize = sizeof(TupleHashEntryData) + additionalsize;
+    int         hash_mem = get_hash_mem();
     MemoryContext oldcontext;
     bool        allow_jit;
 
     Assert(nbuckets > 0);
 
-    /* Limit initial table size request to not more than work_mem */
-    nbuckets = Min(nbuckets, (long) ((work_mem * 1024L) / entrysize));
+    /* Limit initial table size request to not more than hash_mem */
+    nbuckets = Min(nbuckets, (long) ((hash_mem * 1024L) / entrysize));
 
     oldcontext = MemoryContextSwitchTo(metacxt);
 
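
The clamp above bounds the initial bucket request by how many entries of entrysize fit into hash_mem kilobytes. A standalone sketch of that arithmetic, with illustrative numbers that are not taken from the commit:

/* Standalone sketch of the nbuckets clamp in BuildTupleHashTableExt().
 * hash_mem is a work_mem-style KB value; entrysize is the per-entry cost. */
#include <stdio.h>

#define Min(a, b) ((a) < (b) ? (a) : (b))

int
main(void)
{
    int     hash_mem = 8192;    /* KB: e.g. work_mem 4MB * multiplier 2.0 */
    long    entrysize = 64;     /* hypothetical per-entry size in bytes */
    long    nbuckets = 1000000; /* hypothetical requested initial size */

    /* Limit initial table size request to not more than hash_mem */
    nbuckets = Min(nbuckets, (long) ((hash_mem * 1024L) / entrysize));

    printf("clamped nbuckets = %ld\n", nbuckets);   /* prints 131072 */
    return 0;
}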

src/backend/executor/nodeAgg.c

Lines changed: 16 additions & 14 deletions

@@ -203,7 +203,7 @@
  * entries (and initialize new transition states), we instead spill them to
  * disk to be processed later.  The tuples are spilled in a partitioned
  * manner, so that subsequent batches are smaller and less likely to exceed
- * work_mem (if a batch does exceed work_mem, it must be spilled
+ * hash_mem (if a batch does exceed hash_mem, it must be spilled
  * recursively).
  *
  * Spilled data is written to logical tapes. These provide better control
@@ -212,7 +212,7 @@
  *
  * Note that it's possible for transition states to start small but then
  * grow very large; for instance in the case of ARRAY_AGG. In such cases,
- * it's still possible to significantly exceed work_mem. We try to avoid
+ * it's still possible to significantly exceed hash_mem. We try to avoid
  * this situation by estimating what will fit in the available memory, and
  * imposing a limit on the number of groups separately from the amount of
  * memory consumed.
@@ -1516,7 +1516,7 @@ build_hash_table(AggState *aggstate, int setno, long nbuckets)
 
     /*
      * Used to make sure initial hash table allocation does not exceed
-     * work_mem. Note that the estimate does not include space for
+     * hash_mem. Note that the estimate does not include space for
      * pass-by-reference transition data values, nor for the representative
      * tuple of each group.
      */
@@ -1782,7 +1782,7 @@ hashagg_recompile_expressions(AggState *aggstate, bool minslot, bool nullcheck)
 }
 
 /*
- * Set limits that trigger spilling to avoid exceeding work_mem. Consider the
+ * Set limits that trigger spilling to avoid exceeding hash_mem. Consider the
  * number of partitions we expect to create (if we do spill).
  *
  * There are two limits: a memory limit, and also an ngroups limit. The
@@ -1796,13 +1796,14 @@ hash_agg_set_limits(double hashentrysize, double input_groups, int used_bits,
 {
     int         npartitions;
     Size        partition_mem;
+    int         hash_mem = get_hash_mem();
 
-    /* if not expected to spill, use all of work_mem */
-    if (input_groups * hashentrysize < work_mem * 1024L)
+    /* if not expected to spill, use all of hash_mem */
+    if (input_groups * hashentrysize < hash_mem * 1024L)
     {
         if (num_partitions != NULL)
             *num_partitions = 0;
-        *mem_limit = work_mem * 1024L;
+        *mem_limit = hash_mem * 1024L;
         *ngroups_limit = *mem_limit / hashentrysize;
         return;
     }
@@ -1824,14 +1825,14 @@ hash_agg_set_limits(double hashentrysize, double input_groups, int used_bits,
         HASHAGG_WRITE_BUFFER_SIZE * npartitions;
 
     /*
-     * Don't set the limit below 3/4 of work_mem. In that case, we are at the
+     * Don't set the limit below 3/4 of hash_mem. In that case, we are at the
      * minimum number of partitions, so we aren't going to dramatically exceed
      * work mem anyway.
      */
-    if (work_mem * 1024L > 4 * partition_mem)
-        *mem_limit = work_mem * 1024L - partition_mem;
+    if (hash_mem * 1024L > 4 * partition_mem)
+        *mem_limit = hash_mem * 1024L - partition_mem;
     else
-        *mem_limit = work_mem * 1024L * 0.75;
+        *mem_limit = hash_mem * 1024L * 0.75;
 
     if (*mem_limit > hashentrysize)
         *ngroups_limit = *mem_limit / hashentrysize;
@@ -1989,19 +1990,20 @@ hash_choose_num_partitions(double input_groups, double hashentrysize,
     int         partition_limit;
     int         npartitions;
     int         partition_bits;
+    int         hash_mem = get_hash_mem();
 
     /*
      * Avoid creating so many partitions that the memory requirements of the
-     * open partition files are greater than 1/4 of work_mem.
+     * open partition files are greater than 1/4 of hash_mem.
      */
     partition_limit =
-        (work_mem * 1024L * 0.25 - HASHAGG_READ_BUFFER_SIZE) /
+        (hash_mem * 1024L * 0.25 - HASHAGG_READ_BUFFER_SIZE) /
         HASHAGG_WRITE_BUFFER_SIZE;
 
     mem_wanted = HASHAGG_PARTITION_FACTOR * input_groups * hashentrysize;
 
     /* make enough partitions so that each one is likely to fit in memory */
-    npartitions = 1 + (mem_wanted / (work_mem * 1024L));
+    npartitions = 1 + (mem_wanted / (hash_mem * 1024L));
 
     if (npartitions > partition_limit)
         npartitions = partition_limit;
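
To see how the memory limit and the ngroups limit interact, here is a simplified, standalone sketch of the hash_agg_set_limits() arithmetic. The buffer sizes and input figures are assumptions for the example; the real code derives partition_mem from HASHAGG_READ_BUFFER_SIZE, HASHAGG_WRITE_BUFFER_SIZE, and the partition count chosen by hash_choose_num_partitions():

/* Simplified sketch of hash_agg_set_limits(); constants are illustrative. */
#include <stdio.h>

#define READ_BUFFER_SIZE   8192 /* assumed stand-ins for the HASHAGG_* sizes */
#define WRITE_BUFFER_SIZE  8192

int
main(void)
{
    int     hash_mem = 8192;         /* KB: e.g. work_mem 4MB * multiplier 2.0 */
    double  hashentrysize = 128.0;   /* hypothetical bytes per group */
    double  input_groups = 200000.0; /* hypothetical planner group estimate */
    int     npartitions = 32;        /* pretend the partition chooser said so */
    long    partition_mem;
    long    mem_limit;
    long    ngroups_limit;

    if (input_groups * hashentrysize < hash_mem * 1024L)
    {
        /* not expected to spill: use all of hash_mem */
        mem_limit = hash_mem * 1024L;
    }
    else
    {
        partition_mem = READ_BUFFER_SIZE + WRITE_BUFFER_SIZE * npartitions;

        if (hash_mem * 1024L > 4 * partition_mem)
            mem_limit = hash_mem * 1024L - partition_mem;   /* reserve buffers */
        else
            mem_limit = (long) (hash_mem * 1024L * 0.75);   /* 3/4 floor */
    }

    ngroups_limit = (long) (mem_limit / hashentrysize);
    printf("mem_limit=%ld bytes, ngroups_limit=%ld\n", mem_limit, ngroups_limit);
    return 0;
}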

src/backend/executor/nodeHash.c

Lines changed: 61 additions & 19 deletions

@@ -39,6 +39,7 @@
 #include "port/atomics.h"
 #include "port/pg_bitutils.h"
 #include "utils/dynahash.h"
+#include "utils/guc.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/syscache.h"
@@ -506,7 +507,7 @@ ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations,
     hashtable->spaceAllowed = space_allowed;
     hashtable->spaceUsedSkew = 0;
     hashtable->spaceAllowedSkew =
-        hashtable->spaceAllowed * SKEW_WORK_MEM_PERCENT / 100;
+        hashtable->spaceAllowed * SKEW_HASH_MEM_PERCENT / 100;
     hashtable->chunks = NULL;
     hashtable->current_chunk = NULL;
     hashtable->parallel_state = state->parallel_state;
@@ -665,7 +666,7 @@ ExecHashTableCreate(HashState *state, List *hashOperators, List *hashCollations,
 
 void
 ExecChooseHashTableSize(double ntuples, int tupwidth, bool useskew,
-                        bool try_combined_work_mem,
+                        bool try_combined_hash_mem,
                         int parallel_workers,
                         size_t *space_allowed,
                         int *numbuckets,
@@ -682,6 +683,7 @@ ExecChooseHashTableSize(double ntuples, int tupwidth, bool useskew,
     int         nbatch = 1;
     int         nbuckets;
     double      dbuckets;
+    int         hash_mem = get_hash_mem();
 
     /* Force a plausible relation size if no info */
     if (ntuples <= 0.0)
@@ -698,16 +700,16 @@ ExecChooseHashTableSize(double ntuples, int tupwidth, bool useskew,
     inner_rel_bytes = ntuples * tupsize;
 
     /*
-     * Target in-memory hashtable size is work_mem kilobytes.
+     * Target in-memory hashtable size is hash_mem kilobytes.
      */
-    hash_table_bytes = work_mem * 1024L;
+    hash_table_bytes = hash_mem * 1024L;
 
     /*
-     * Parallel Hash tries to use the combined work_mem of all workers to
-     * avoid the need to batch.  If that won't work, it falls back to work_mem
+     * Parallel Hash tries to use the combined hash_mem of all workers to
+     * avoid the need to batch.  If that won't work, it falls back to hash_mem
      * per worker and tries to process batches in parallel.
      */
-    if (try_combined_work_mem)
+    if (try_combined_hash_mem)
         hash_table_bytes += hash_table_bytes * parallel_workers;
 
     *space_allowed = hash_table_bytes;
@@ -728,7 +730,7 @@ ExecChooseHashTableSize(double ntuples, int tupwidth, bool useskew,
      */
     if (useskew)
     {
-        skew_table_bytes = hash_table_bytes * SKEW_WORK_MEM_PERCENT / 100;
+        skew_table_bytes = hash_table_bytes * SKEW_HASH_MEM_PERCENT / 100;
 
         /*----------
          * Divisor is:
@@ -751,7 +753,7 @@ ExecChooseHashTableSize(double ntuples, int tupwidth, bool useskew,
     /*
      * Set nbuckets to achieve an average bucket load of NTUP_PER_BUCKET when
      * memory is filled, assuming a single batch; but limit the value so that
-     * the pointer arrays we'll try to allocate do not exceed work_mem nor
+     * the pointer arrays we'll try to allocate do not exceed hash_mem nor
      * MaxAllocSize.
      *
      * Note that both nbuckets and nbatch must be powers of 2 to make
@@ -790,10 +792,10 @@ ExecChooseHashTableSize(double ntuples, int tupwidth, bool useskew,
        long        bucket_size;
 
        /*
-        * If Parallel Hash with combined work_mem would still need multiple
-        * batches, we'll have to fall back to regular work_mem budget.
+        * If Parallel Hash with combined hash_mem would still need multiple
+        * batches, we'll have to fall back to regular hash_mem budget.
         */
-       if (try_combined_work_mem)
+       if (try_combined_hash_mem)
        {
            ExecChooseHashTableSize(ntuples, tupwidth, useskew,
                                    false, parallel_workers,
@@ -805,7 +807,7 @@ ExecChooseHashTableSize(double ntuples, int tupwidth, bool useskew,
        }
 
        /*
-        * Estimate the number of buckets we'll want to have when work_mem is
+        * Estimate the number of buckets we'll want to have when hash_mem is
         * entirely full.  Each bucket will contain a bucket pointer plus
         * NTUP_PER_BUCKET tuples, whose projected size already includes
         * overhead for the hash code, pointer to the next tuple, etc.
@@ -820,8 +822,8 @@ ExecChooseHashTableSize(double ntuples, int tupwidth, bool useskew,
        /*
         * Buckets are simple pointers to hashjoin tuples, while tupsize
         * includes the pointer, hash code, and MinimalTupleData.  So buckets
-        * should never really exceed 25% of work_mem (even for
-        * NTUP_PER_BUCKET=1); except maybe for work_mem values that are not
+        * should never really exceed 25% of hash_mem (even for
+        * NTUP_PER_BUCKET=1); except maybe for hash_mem values that are not
         * 2^N bytes, where we might get more because of doubling.  So let's
         * look for 50% here.
         */
@@ -1095,15 +1097,17 @@ ExecParallelHashIncreaseNumBatches(HashJoinTable hashtable)
                /* Figure out how many batches to use. */
                if (hashtable->nbatch == 1)
                {
+                    int         hash_mem = get_hash_mem();
+
                    /*
                     * We are going from single-batch to multi-batch.  We need
                     * to switch from one large combined memory budget to the
-                    * regular work_mem budget.
+                    * regular hash_mem budget.
                     */
-                    pstate->space_allowed = work_mem * 1024L;
+                    pstate->space_allowed = hash_mem * 1024L;
 
                    /*
-                    * The combined work_mem of all participants wasn't
+                    * The combined hash_mem of all participants wasn't
                     * enough.  Therefore one batch per participant would be
                     * approximately equivalent and would probably also be
                     * insufficient.  So try two batches per participant,
@@ -2855,7 +2859,7 @@ ExecParallelHashTupleAlloc(HashJoinTable hashtable, size_t size,
 
     /*
      * Check if our space limit would be exceeded.  To avoid choking on
-     * very large tuples or very low work_mem setting, we'll always allow
+     * very large tuples or very low hash_mem setting, we'll always allow
      * each backend to allocate at least one chunk.
      */
     if (hashtable->batches[0].at_least_one_chunk &&
@@ -3366,3 +3370,41 @@ ExecParallelHashTuplePrealloc(HashJoinTable hashtable, int batchno, size_t size)
 
     return true;
 }
+
+/*
+ * Get a hash_mem value by multiplying the work_mem GUC's value by the
+ * hash_mem_multiplier GUC's value.
+ *
+ * Returns a work_mem style KB value that hash-based nodes (including but not
+ * limited to hash join) use in place of work_mem.  This is subject to the
+ * same restrictions as work_mem itself.  (There is no such thing as the
+ * hash_mem GUC, but it's convenient for our callers to pretend that there
+ * is.)
+ *
+ * Exported for use by the planner, as well as other hash-based executor
+ * nodes.  This is a rather random place for this, but there is no better
+ * place.
+ */
+int
+get_hash_mem(void)
+{
+    double      hash_mem;
+
+    Assert(hash_mem_multiplier >= 1.0);
+
+    hash_mem = (double) work_mem * hash_mem_multiplier;
+
+    /*
+     * guc.c enforces a MAX_KILOBYTES limitation on work_mem in order to
+     * support the assumption that raw derived byte values can be stored in
+     * 'long' variables.  The returned hash_mem value must also meet this
+     * assumption.
+     *
+     * We clamp the final value rather than throw an error because it should
+     * be possible to set work_mem and hash_mem_multiplier independently.
+     */
+    if (hash_mem < MAX_KILOBYTES)
+        return (int) hash_mem;
+
+    return MAX_KILOBYTES;
+}
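
With get_hash_mem() exported, the substitution pattern across the executor is uniform: call it wherever work_mem was previously read, then convert from KB to bytes the same way. A minimal hypothetical caller, under that assumption (this helper is not part of the commit):

/* Hypothetical caller showing the substitution pattern used throughout
 * this commit: read get_hash_mem() where work_mem was read before. */
static long
hash_table_budget_bytes(void)
{
    /* work_mem KB * hash_mem_multiplier, clamped to MAX_KILOBYTES */
    int     hash_mem = get_hash_mem();

    /* same KB-to-bytes convention the executor applies to work_mem */
    return hash_mem * 1024L;
}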
