Debugging memory leaks in Postgres, jemalloc edition

June 21, 2025

I've been talking about debugging memory leaks for more than a year now, covering Valgrind, AddressSanitizer, memleak, and heaptrack. But there are still a few more tools to explore1 and today we're going to look at jemalloc, the alternative malloc implementation from Meta2.

Alternative malloc implementations are popular and practical. Google has tcmalloc, Microsoft has mimalloc, and Meta has jemalloc3. But jemalloc is the only malloc implementation I've seen so far with decent memory leak detection. This is because AddressSanitizer is not sufficient to detect leaks that, for example, only sometimes trigger the OOM killer but otherwise get cleaned up on exit.

1 gperftools and bytehound are on my list to check out eventually.
2 I can't confidently summarize the history, so read this post if you're curious.
3 Other major jemalloc users include FreeBSD and Apache Arrow.

Scenario

In my last post, we introduced two memory leaks into Postgres and debugged them with heaptrack. In this post we'll introduce those same two memory leaks again1, but we will debug them with jemalloc.

While you can easily use jemalloc on macOS, heap profiling and leak detection aren't supported there. So you'll have to pull out a Linux (virtual) machine.

Although we have been using Postgres as the codebase from which to explore tools for debugging memory leaks, these techniques are relevant for memory leaks in C, C++, and Rust projects in general.

Grab and build Postgres2.

$ git clone https://github.com/postgres/postgres
$ cd postgres
$ git checkout REL_17_STABLE
$ ./configure --without-zlib --without-icu \
    --without-readline --enable-debug --prefix=/usr/local/
$ make -j8 && sudo make install

And grab and build jemalloc.

$ git clone https://github.com/facebook/jemalloc
$ cd jemalloc
$ ./autogen.sh
$ ./configure --enable-prof --enable-prof-frameptr
$ make -j8 && sudo make install

1 Much of the code and text of this post is taken from the previous post, my apologies.
2 I don't normally demonstrate installing globally but I'm running this in a dedicated virtual machine so installing globally doesn't bother me.

A leak in postmaster

Every time a Postgres process starts up, it is scheduled by the postmaster process. Let's introduce a memory leak into postmaster.

$ git diff src/backend/postmaster
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index d032091495b..e0bf8943763 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -3547,6 +3547,13 @@ BackendStartup(ClientSocket *client_sock)
 	Backend    *bn;				/* for backend cleanup */
 	pid_t		pid;
 	BackendStartupData startup_data;
+	MemoryContext old;
+	int *s;
+
+	old = MemoryContextSwitchTo(TopMemoryContext);
+	s = palloc(8321);
+	*s = 12;
+	MemoryContextSwitchTo(old);
 
 	/*
 	 * Create backend data structure.  Better before the fork() so we can

Remember that Postgres allocates memory in nested arenas called MemoryContexts. The top-level arena is called TopMemoryContext, and it is freed as the process exits. Excessive allocations (leaks) in TopMemoryContext would not be caught by Valgrind memcheck or LeakSanitizer, because TopMemoryContext, and everything in it, really is freed as the process exits. But while the process is alive, the above leak is real.

(If we switch from palloc to malloc above, LeakSanitizer does catch this leak. I didn't try Valgrind memcheck but it probably catches this too.)

An easy way to trigger this leak is by executing a ton of separate psql clients that create tons of Postgres client backend processes.

$ for run in {1..100000}; do psql postgres -c 'select 1'; done

With the diff above in place, rebuild and reinstall Postgres.

$ make -j8 && make install

Create a database and run postgres, but with the jemalloc library in LD_PRELOAD.

$ initdb testdb
$ MALLOC_CONF=prof_leak:true,lg_prof_sample:0,prof_final:true \
  LD_PRELOAD=/usr/local/lib/libjemalloc.so \
  postgres -D $(pwd)/testdb
2025-06-21 12:25:07.576 EDT [640443] LOG:  starting PostgreSQL 17.5 on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14+deb12u1) 12.2.0, 64-bit
2025-06-21 12:25:07.577 EDT [640443] LOG:  listening on IPv6 address "::1", port 5432
2025-06-21 12:25:07.577 EDT [640443] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2025-06-21 12:25:07.578 EDT [640443] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2025-06-21 12:25:07.582 EDT [640446] LOG:  database system was shut down at 2025-06-21 12:24:52 EDT
<jemalloc>: Leak approximation summary: ~423600 bytes, ~109 objects, >= 65 contexts
<jemalloc>: Run jeprof on dump output for leak detail
2025-06-21 12:25:07.586 EDT [640443] LOG:  database system is ready to accept connections
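For reference, here is what each knob in that MALLOC_CONF string does, as I understand the jemalloc option documentation:

```
MALLOC_CONF=prof_leak:true,lg_prof_sample:0,prof_final:true
# prof_leak:true    -- print a leak approximation summary when the process exits
# lg_prof_sample:0  -- sample every allocation (2^0 bytes between samples)
#                      instead of the much sparser default sampling
# prof_final:true   -- dump a final heap profile at exit; these are the
#                      jeprof.<pid>.*.f.heap files we inspect below
```

Sampling every allocation is the most accurate and the most expensive setting; on a production system you would likely leave lg_prof_sample at its default.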

In another terminal we'll exercise the leaking workload.

$ for run in {1..100000}; do psql postgres -c 'select 1'; done

If you want to watch the memory usage climb while this workload is running, open top in another terminal.

When that is done we should have leaked a good deal of memory. Hit Control-C on the postgres process and now we can see what jemalloc tells us. We'll look specifically at the heap file for the postmaster process, whose PID was shown above in brackets, 640443.

$ jeprof --lines --inuse_space `which postgres` testdb/jeprof.640443.0.f.heap
Using local file /usr/local/bin/postgres.
Using local file testdb/jeprof.640443.0.f.heap.
Welcome to jeprof!  For help, type 'help'.
(jeprof)

Now run top --cum to see the stack traces with the most cumulative memory in use.

(jeprof) top --cum
Total: 976.9 MB
    0.0   0.0%   0.0%    976.8 100.0% __libc_init_first@@GLIBC_2.17 ??:?
    0.0   0.0%   0.0%    976.8 100.0% __libc_start_main@GLIBC_2.17 ??:?
    0.0   0.0%   0.0%    976.8 100.0% _start ??:?
    0.0   0.0%   0.0%    976.8 100.0% main /home/phil/postgres/src/backend/main/main.c:199
  976.7 100.0% 100.0%    976.7 100.0% AllocSetAllocLarge /home/phil/postgres/src/backend/utils/mmgr/aset.c:715
    0.0   0.0% 100.0%    976.6 100.0% PostmasterMain /home/phil/postgres/src/backend/postmaster/postmaster.c:1374
    0.0   0.0% 100.0%    976.6 100.0% ServerLoop.isra.0 /home/phil/postgres/src/backend/postmaster/postmaster.c:1676
    0.0   0.0% 100.0%    976.6 100.0% BackendStartup (inline) /home/phil/postgres/src/backend/postmaster/postmaster.c:3555
    0.0   0.0% 100.0%      0.1   0.0% PostmasterMain /home/phil/postgres/src/backend/postmaster/postmaster.c:585
    0.0   0.0% 100.0%      0.1   0.0% MemoryContextAllocExtended /home/phil/postgres/src/backend/utils/mmgr/mcxt.c:1250 (discriminator 5)

And immediately we see this huge jump in in-use memory at exactly the line where we started leakily palloc-ing in src/backend/postmaster/postmaster.c. That's perfect!

Let's introduce a leak in another Postgres process and see if we can catch that too.

A leak in a client backend

Let's leak memory in TopMemoryContext in the implementation of random().

$ git diff src/backend/utils/
diff --git a/src/backend/utils/adt/pseudorandomfuncs.c b/src/backend/utils/adt/pseudorandomfuncs.c
index 8e82c7078c5..886efbfaf78 100644
--- a/src/backend/utils/adt/pseudorandomfuncs.c
+++ b/src/backend/utils/adt/pseudorandomfuncs.c
@@ -20,6 +20,7 @@
 #include "utils/fmgrprotos.h"
 #include "utils/numeric.h"
 #include "utils/timestamp.h"
+#include "utils/memutils.h"
 
 /* Shared PRNG state used by all the random functions */
 static pg_prng_state prng_state;
@@ -84,6 +85,13 @@ Datum
 drandom(PG_FUNCTION_ARGS)
 {
 	float8		result;
+	int* s;
+	MemoryContext old;
+
+	old = MemoryContextSwitchTo(TopMemoryContext);
+	s = palloc(100);
+	MemoryContextSwitchTo(old);
+	*s = 90;
 
 	initialize_prng();

We can trigger this leak by executing random() a bunch of times. For example with SELECT sum(random()) FROM generate_series(1, 1_000_000);.

Build and install Postgres with this diff.

$ make -j16 && make install

And start up Postgres again against the testdb we created before.

$ MALLOC_CONF=prof_leak:true,lg_prof_sample:0,prof_final:true \
  LD_PRELOAD=/usr/local/lib/libjemalloc.so \
  postgres -D $(pwd)/testdb
2025-06-21 13:10:39.766 EDT [845169] LOG:  starting PostgreSQL 17.5 on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14+deb12u1) 12.2.0, 64-bit
2025-06-21 13:10:39.767 EDT [845169] LOG:  listening on IPv6 address "::1", port 5432
2025-06-21 13:10:39.767 EDT [845169] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2025-06-21 13:10:39.767 EDT [845169] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2025-06-21 13:10:39.769 EDT [845172] LOG:  database system was shut down at 2025-06-21 13:10:27 EDT
<jemalloc>: Leak approximation summary: ~423600 bytes, ~109 objects, >= 65 contexts
<jemalloc>: Run jeprof on dump output for leak detail
2025-06-21 13:10:39.771 EDT [845169] LOG:  database system is ready to accept connections

In a new terminal, start a psql session and find the corresponding client backend PID with pg_backend_pid().

$ psql postgres
psql (17.5)
Type "help" for help.

postgres=# select pg_backend_pid();
 pg_backend_pid
----------------
         845177
(1 row)

postgres=#

Now run the leaking workload.

postgres=# SELECT sum(random()) FROM generate_series(1, 10_000_000);
        sum
-------------------
 499960.8137393289
(1 row)

Now hit Control-D to exit psql gracefully. And hit Control-C on the postgres process to exit it gracefully too.

Now load jeprof with the profile file corresponding to the backend in which we leaked.

$ jeprof --lines --inuse_space `which postgres` testdb/jeprof.845177.0.f.heap
Using local file /usr/local/bin/postgres.
Using local file testdb/jeprof.845177.0.f.heap.
Welcome to jeprof!  For help, type 'help'.
(jeprof)

Run top --cum like before.

(jeprof) top --cum
Total: 1305.8 MB
    0.0   0.0%   0.0%   1305.7 100.0% __libc_init_first@@GLIBC_2.17 ??:?
    0.0   0.0%   0.0%   1305.7 100.0% __libc_start_main@GLIBC_2.17 ??:?
    0.0   0.0%   0.0%   1305.7 100.0% _start ??:?
    0.0   0.0%   0.0%   1305.7 100.0% main /home/phil/postgres/src/backend/main/main.c:199
    0.0   0.0%   0.0%   1305.5 100.0% PostmasterMain /home/phil/postgres/src/backend/postmaster/postmaster.c:1374
    0.0   0.0%   0.0%   1305.5 100.0% ServerLoop.isra.0 /home/phil/postgres/src/backend/postmaster/postmaster.c:1676
    0.0   0.0%   0.0%   1305.5 100.0% BackendStartup (inline) /home/phil/postgres/src/backend/postmaster/postmaster.c:3603
    0.0   0.0%   0.0%   1305.5 100.0% postmaster_child_launch /home/phil/postgres/src/backend/postmaster/launch_backend.c:277
    0.0   0.0%   0.0%   1305.4 100.0% BackendMain /home/phil/postgres/src/backend/tcop/backend_startup.c:105
 1305.1 100.0% 100.0%   1305.1 100.0% AllocSetAllocFromNewBlock /home/phil/postgres/src/backend/utils/mmgr/aset.c:919

Well, we see some large allocations but not yet enough info. The default top command limits output to 10 lines. We can use top30 --cum to see more.

(jeprof) top30 --cum
Total: 1305.8 MB
    0.0   0.0%   0.0%   1305.7 100.0% __libc_init_first@@GLIBC_2.17 ??:?
    0.0   0.0%   0.0%   1305.7 100.0% __libc_start_main@GLIBC_2.17 ??:?
    0.0   0.0%   0.0%   1305.7 100.0% _start ??:?
    0.0   0.0%   0.0%   1305.7 100.0% main /home/phil/postgres/src/backend/main/main.c:199
    0.0   0.0%   0.0%   1305.5 100.0% PostmasterMain /home/phil/postgres/src/backend/postmaster/postmaster.c:1374
    0.0   0.0%   0.0%   1305.5 100.0% ServerLoop.isra.0 /home/phil/postgres/src/backend/postmaster/postmaster.c:1676
    0.0   0.0%   0.0%   1305.5 100.0% BackendStartup (inline) /home/phil/postgres/src/backend/postmaster/postmaster.c:3603
    0.0   0.0%   0.0%   1305.5 100.0% postmaster_child_launch /home/phil/postgres/src/backend/postmaster/launch_backend.c:277
    0.0   0.0%   0.0%   1305.4 100.0% BackendMain /home/phil/postgres/src/backend/tcop/backend_startup.c:105
 1305.1 100.0% 100.0%   1305.1 100.0% AllocSetAllocFromNewBlock /home/phil/postgres/src/backend/utils/mmgr/aset.c:919
    0.0   0.0% 100.0%   1304.0  99.9% PostgresMain /home/phil/postgres/src/backend/tcop/postgres.c:4767
    0.0   0.0% 100.0%   1304.0  99.9% PortalRun /home/phil/postgres/src/backend/tcop/pquery.c:766
    0.0   0.0% 100.0%   1304.0  99.9% PortalRunSelect /home/phil/postgres/src/backend/tcop/pquery.c:922
    0.0   0.0% 100.0%   1304.0  99.9% exec_simple_query /home/phil/postgres/src/backend/tcop/postgres.c:1278
    0.0   0.0% 100.0%   1304.0  99.9% ExecAgg /home/phil/postgres/src/backend/executor/nodeAgg.c:2179
    0.0   0.0% 100.0%   1304.0  99.9% ExecEvalExprSwitchContext (inline) /home/phil/postgres/src/backend/executor/../../../src/include/executor/executor.h:356
    0.0   0.0% 100.0%   1304.0  99.9% ExecInterpExpr /home/phil/postgres/src/backend/executor/execExprInterp.c:740
    0.0   0.0% 100.0%   1304.0  99.9% ExecProcNode (inline) /home/phil/postgres/src/backend/executor/../../../src/include/executor/executor.h:274
    0.0   0.0% 100.0%   1304.0  99.9% ExecutePlan (inline) /home/phil/postgres/src/backend/executor/execMain.c:1649
    0.0   0.0% 100.0%   1304.0  99.9% advance_aggregates (inline) /home/phil/postgres/src/backend/executor/nodeAgg.c:820
    0.0   0.0% 100.0%   1304.0  99.9% agg_retrieve_direct (inline) /home/phil/postgres/src/backend/executor/nodeAgg.c:2454
    0.0   0.0% 100.0%   1304.0  99.9% drandom /home/phil/postgres/src/backend/utils/adt/pseudorandomfuncs.c:93
    0.0   0.0% 100.0%   1304.0  99.9% standard_ExecutorRun /home/phil/postgres/src/backend/executor/execMain.c:361
    0.0   0.0% 100.0%      1.3   0.1% PostgresMain /home/phil/postgres/src/backend/tcop/postgres.c:4324
    0.0   0.0% 100.0%      0.9   0.1% InitPostgres /home/phil/postgres/src/backend/utils/init/postinit.c:1194 (discriminator 5)
    0.0   0.0% 100.0%      0.9   0.1% InitCatalogCachePhase2 /home/phil/postgres/src/backend/utils/cache/syscache.c:187 (discriminator 3)
    0.0   0.0% 100.0%      0.9   0.1% RelationCacheInitializePhase3 /home/phil/postgres/src/backend/utils/cache/relcache.c:4372
    0.0   0.0% 100.0%      0.6   0.0% RelationBuildDesc /home/phil/postgres/src/backend/utils/cache/relcache.c:1208
    0.0   0.0% 100.0%      0.6   0.0% RelationIdGetRelation /home/phil/postgres/src/backend/utils/cache/relcache.c:2116
    0.0   0.0% 100.0%      0.6   0.0% index_open /home/phil/postgres/src/backend/access/index/indexam.c:137

And we found our leak: drandom at src/backend/utils/adt/pseudorandomfuncs.c:93, exactly where we added the leaky palloc.
