Debugging memory leaks in Postgres, jemalloc edition
I've been talking about debugging memory leaks for more than a year now, covering Valgrind, AddressSanitizer, memleak, and heaptrack. But there are still a few more tools to explore [1] and today we're going to look at jemalloc, the alternative malloc implementation from Meta [2].
Alternative malloc implementations are popular and practical. Google has tcmalloc, Microsoft has mimalloc, and Meta has jemalloc [3]. But jemalloc is the only malloc implementation I've seen so far with decent memory leak detection. That matters because AddressSanitizer is not sufficient to detect leaks that, for example, only sometimes trigger the OOM killer but otherwise get cleaned up on exit.
[1] gperftools and bytehound are on my list to check out eventually.
[2] I can't confidently summarize the history, so read this post if you're curious.
[3] Other major jemalloc users include FreeBSD and Apache Arrow.
Scenario
In my last post, we introduced two memory leaks into Postgres and debugged them with heaptrack. In this post we'll introduce those same two memory leaks again [1] but we will debug them with jemalloc.
While you can easily use jemalloc on macOS, heap profiling and leak detection are not supported on macOS. So you'll have to pull out a Linux (virtual) machine.
Although we have been using Postgres as the codebase from which to explore tools for debugging memory leaks, these techniques are relevant for memory leaks in C, C++, and Rust projects in general.
Grab and build Postgres [2].
$ git clone https://github.com/postgres/postgres
$ cd postgres
$ git checkout REL_17_STABLE
$ ./configure --without-zlib --without-icu \
    --without-readline --enable-debug --prefix=/usr/local/
$ make -j8 && sudo make install
And grab and build jemalloc.
$ git clone https://github.com/facebook/jemalloc
$ cd jemalloc
$ ./autogen.sh
$ ./configure --enable-prof --enable-prof-frameptr
$ make -j8 && sudo make install
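Before touching Postgres, it can be worth sanity-checking that the profiling build works with a tiny standalone program. This is a sketch of my own, not from the rest of this walkthrough; the file name leak_demo.c and the program itself are made up for illustration:

/* leak_demo.c: deliberately leak some malloc'd memory so jemalloc has
 * something to report. Build with frame pointers so backtraces from a
 * --enable-prof-frameptr jemalloc are usable. */
#include <stdlib.h>
#include <string.h>

int main(void)
{
	for (int i = 0; i < 1000; i++)
	{
		char *p = malloc(4096);	/* never freed: an intentional leak */
		if (p)
			memset(p, 1, 4096);	/* touch the memory so it is actually used */
	}
	return 0;					/* prof_leak/prof_final report happens at exit */
}

Compiling it with cc -g -fno-omit-frame-pointer leak_demo.c -o leak_demo and running it with the same MALLOC_CONF and LD_PRELOAD settings we use below should print a <jemalloc>: Leak approximation summary line at exit and write a jeprof.<pid>.0.f.heap file you can open with jeprof the same way we'll open Postgres's dumps.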
[1] Much of the code and text of this post is taken from the previous post; my apologies.
[2] I don't normally demonstrate installing globally, but I'm running this in a dedicated virtual machine so installing globally doesn't bother me.
A leak in postmaster
Every time a Postgres backend process starts up, it is forked by the postmaster process. Let's introduce a memory leak into postmaster.
$ git diff src/backend/postmaster
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index d032091495b..e0bf8943763 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -3547,6 +3547,13 @@ BackendStartup(ClientSocket *client_sock)
 	Backend    *bn;				/* for backend cleanup */
 	pid_t		pid;
 	BackendStartupData startup_data;
+	MemoryContext old;
+	int		   *s;
+
+	old = MemoryContextSwitchTo(TopMemoryContext);
+	s = palloc(8321);
+	*s = 12;
+	MemoryContextSwitchTo(old);
 
 	/*
 	 * Create backend data structure.  Better before the fork() so we can
Remember that Postgres allocates memory in nested arenas called MemoryContexts. The top-level arena is called TopMemoryContext and it is freed as the process exits. Excessive allocations (leaks) in TopMemoryContext would not be caught by Valgrind memcheck or LeakSanitizer, because TopMemoryContext, and every allocation in it, really is freed when the process exits. But while the process is alive, the above leak is real.
(If we switch from palloc to malloc above, LeakSanitizer does catch this leak. I didn't try Valgrind memcheck but it probably catches this too.)
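As an aside, if you want to confirm where memory is accumulating from inside Postgres itself rather than from an external profiler, Postgres can print per-context usage with its real MemoryContextStats() function. A minimal sketch of my own, not part of this post's diffs; dump_top_memory_context is a hypothetical helper:

/* Hypothetical helper: print TopMemoryContext and all of its children,
 * with bytes used per context, to stderr (and thus the server log). */
#include "postgres.h"
#include "utils/memutils.h"

static void
dump_top_memory_context(void)
{
	MemoryContextStats(TopMemoryContext);
}

The same function can also be called from a debugger attached to a running backend, e.g. call MemoryContextStats(TopMemoryContext) from gdb, which is a common way to see which context is bloating.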
An easy way to trigger this leak is by executing a ton of separate psql clients that create tons of Postgres client backend processes.
$ for run in {1..100000}; do psql postgres -c 'select 1'; done
With the diff above in place, rebuild and reinstall Postgres.
$ make -j8 && make install
Create a database and run postgres, but with the jemalloc library in LD_PRELOAD. The MALLOC_CONF options tell jemalloc to report leaks (prof_leak:true), to sample every allocation rather than a subset (lg_prof_sample:0), and to dump a final profile when the process exits (prof_final:true).
$ initdb testdb
$ MALLOC_CONF=prof_leak:true,lg_prof_sample:0,prof_final:true \
    LD_PRELOAD=/usr/local/lib/libjemalloc.so \
    postgres -D $(pwd)/testdb
2025-06-21 12:25:07.576 EDT [640443] LOG:  starting PostgreSQL 17.5 on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14+deb12u1) 12.2.0, 64-bit
2025-06-21 12:25:07.577 EDT [640443] LOG:  listening on IPv6 address "::1", port 5432
2025-06-21 12:25:07.577 EDT [640443] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2025-06-21 12:25:07.578 EDT [640443] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2025-06-21 12:25:07.582 EDT [640446] LOG:  database system was shut down at 2025-06-21 12:24:52 EDT
<jemalloc>: Leak approximation summary: ~423600 bytes, ~109 objects, >= 65 contexts
<jemalloc>: Run jeprof on dump output for leak detail
2025-06-21 12:25:07.586 EDT [640443] LOG:  database system is ready to accept connections
In another terminal we'll exercise the leaking workload.
$ for run in {1..100000}; do psql postgres -c 'select 1'; done
If you want to watch the memory usage climb while this workload is running, open top in another terminal.
When that is done we should have leaked a good deal of memory. Hit Control-C on the postgres process and now we can see what jemalloc tells us. (Each exiting process writes its final profile to a jeprof.<pid>.0.f.heap file in its working directory, which for Postgres is the data directory.) We'll look specifically at the heap file for the postmaster process, whose PID was shown above in brackets: 640443.
$ jeprof --lines --inuse_space `which postgres` testdb/jeprof.640443.0.f.heap
Using local file /usr/local/bin/postgres.
Using local file testdb/jeprof.640443.0.f.heap.
Welcome to jeprof!  For help, type 'help'.
(jeprof)
Now run top --cum to see the stack traces with the most cumulative memory in-use. The first two columns show memory allocated directly in each frame (flat), and the fourth and fifth columns show memory allocated in that frame plus everything it calls (cumulative).
(jeprof) top --cum
Total: 976.9 MB
     0.0   0.0%   0.0%    976.8 100.0% __libc_init_first@@GLIBC_2.17 ??:?
     0.0   0.0%   0.0%    976.8 100.0% __libc_start_main@GLIBC_2.17 ??:?
     0.0   0.0%   0.0%    976.8 100.0% _start ??:?
     0.0   0.0%   0.0%    976.8 100.0% main /home/phil/postgres/src/backend/main/main.c:199
   976.7 100.0% 100.0%    976.7 100.0% AllocSetAllocLarge /home/phil/postgres/src/backend/utils/mmgr/aset.c:715
     0.0   0.0% 100.0%    976.6 100.0% PostmasterMain /home/phil/postgres/src/backend/postmaster/postmaster.c:1374
     0.0   0.0% 100.0%    976.6 100.0% ServerLoop.isra.0 /home/phil/postgres/src/backend/postmaster/postmaster.c:1676
     0.0   0.0% 100.0%    976.6 100.0% BackendStartup (inline) /home/phil/postgres/src/backend/postmaster/postmaster.c:3555
     0.0   0.0% 100.0%      0.1   0.0% PostmasterMain /home/phil/postgres/src/backend/postmaster/postmaster.c:585
     0.0   0.0% 100.0%      0.1   0.0% MemoryContextAllocExtended /home/phil/postgres/src/backend/utils/mmgr/mcxt.c:1250 (discriminator 5)
And immediately we see this huge jump in in-use memory at exactly the line where we started leakily palloc-ing in src/backend/postmaster/postmaster.c. That's perfect!
Let's introduce a leak in another Postgres process and see if we can catch that too.
A leak in a client backend
Let's leak memory in TopMemoryContext in the implementation of random().
$ git diff src/backend/utils/
diff --git a/src/backend/utils/adt/pseudorandomfuncs.c b/src/backend/utils/adt/pseudorandomfuncs.c
index 8e82c7078c5..886efbfaf78 100644
--- a/src/backend/utils/adt/pseudorandomfuncs.c
+++ b/src/backend/utils/adt/pseudorandomfuncs.c
@@ -20,6 +20,7 @@
 #include "utils/fmgrprotos.h"
 #include "utils/numeric.h"
 #include "utils/timestamp.h"
+#include "utils/memutils.h"
 
 /* Shared PRNG state used by all the random functions */
 static pg_prng_state prng_state;
@@ -84,6 +85,13 @@ Datum
 drandom(PG_FUNCTION_ARGS)
 {
 	float8		result;
+	int		   *s;
+	MemoryContext old;
+
+	old = MemoryContextSwitchTo(TopMemoryContext);
+	s = palloc(100);
+	MemoryContextSwitchTo(old);
+	*s = 90;
 
 	initialize_prng();
We can trigger this leak by executing random() a bunch of times. For example, with SELECT sum(random()) FROM generate_series(1, 100_0000);.
Build and install Postgres with this diff.
$ make -j16 && make install
And start up Postgres again against the testdb we created before.
$ MALLOC_CONF=prof_leak:true,lg_prof_sample:0,prof_final:true \
    LD_PRELOAD=/usr/local/lib/libjemalloc.so \
    postgres -D $(pwd)/testdb
2025-06-21 13:10:39.766 EDT [845169] LOG:  starting PostgreSQL 17.5 on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14+deb12u1) 12.2.0, 64-bit
2025-06-21 13:10:39.767 EDT [845169] LOG:  listening on IPv6 address "::1", port 5432
2025-06-21 13:10:39.767 EDT [845169] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2025-06-21 13:10:39.767 EDT [845169] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2025-06-21 13:10:39.769 EDT [845172] LOG:  database system was shut down at 2025-06-21 13:10:27 EDT
<jemalloc>: Leak approximation summary: ~423600 bytes, ~109 objects, >= 65 contexts
<jemalloc>: Run jeprof on dump output for leak detail
2025-06-21 13:10:39.771 EDT [845169] LOG:  database system is ready to accept connections
In a new terminal, start a psql session and find the corresponding client backend PID with pg_backend_pid().
$ psql postgres
psql (17.5)
Type "help" for help.

postgres=# select pg_backend_pid();
 pg_backend_pid
----------------
         845177
(1 row)

postgres=#
Now run the leaking workload.
postgres=# SELECT sum(random()) FROM generate_series(1, 10_000_000);
        sum
-------------------
 499960.8137393289
(1 row)
Now hit Control-D to exit psql gracefully. And hit Control-C on the postgres process to exit it gracefully too.
Now load jeprof with the profile file corresponding to the backend in which we leaked.
$ jeprof --lines --inuse_space `which postgres` testdb/jeprof.845177.0.f.heap
Using local file /usr/local/bin/postgres.
Using local file testdb/jeprof.845177.0.f.heap.
Welcome to jeprof!  For help, type 'help'.
(jeprof)
Run top --cum like before.
(jeprof) top --cum
Total: 1305.8 MB
     0.0   0.0%   0.0%   1305.7 100.0% __libc_init_first@@GLIBC_2.17 ??:?
     0.0   0.0%   0.0%   1305.7 100.0% __libc_start_main@GLIBC_2.17 ??:?
     0.0   0.0%   0.0%   1305.7 100.0% _start ??:?
     0.0   0.0%   0.0%   1305.7 100.0% main /home/phil/postgres/src/backend/main/main.c:199
     0.0   0.0%   0.0%   1305.5 100.0% PostmasterMain /home/phil/postgres/src/backend/postmaster/postmaster.c:1374
     0.0   0.0%   0.0%   1305.5 100.0% ServerLoop.isra.0 /home/phil/postgres/src/backend/postmaster/postmaster.c:1676
     0.0   0.0%   0.0%   1305.5 100.0% BackendStartup (inline) /home/phil/postgres/src/backend/postmaster/postmaster.c:3603
     0.0   0.0%   0.0%   1305.5 100.0% postmaster_child_launch /home/phil/postgres/src/backend/postmaster/launch_backend.c:277
     0.0   0.0%   0.0%   1305.4 100.0% BackendMain /home/phil/postgres/src/backend/tcop/backend_startup.c:105
  1305.1 100.0% 100.0%   1305.1 100.0% AllocSetAllocFromNewBlock /home/phil/postgres/src/backend/utils/mmgr/aset.c:919
Well, we see some large allocations but not yet enough info. The default top command limits output to 10 lines. We can use top30 --cum to see more.
(jeprof) top30 --cum
Total: 1305.8 MB
     0.0   0.0%   0.0%   1305.7 100.0% __libc_init_first@@GLIBC_2.17 ??:?
     0.0   0.0%   0.0%   1305.7 100.0% __libc_start_main@GLIBC_2.17 ??:?
     0.0   0.0%   0.0%   1305.7 100.0% _start ??:?
     0.0   0.0%   0.0%   1305.7 100.0% main /home/phil/postgres/src/backend/main/main.c:199
     0.0   0.0%   0.0%   1305.5 100.0% PostmasterMain /home/phil/postgres/src/backend/postmaster/postmaster.c:1374
     0.0   0.0%   0.0%   1305.5 100.0% ServerLoop.isra.0 /home/phil/postgres/src/backend/postmaster/postmaster.c:1676
     0.0   0.0%   0.0%   1305.5 100.0% BackendStartup (inline) /home/phil/postgres/src/backend/postmaster/postmaster.c:3603
     0.0   0.0%   0.0%   1305.5 100.0% postmaster_child_launch /home/phil/postgres/src/backend/postmaster/launch_backend.c:277
     0.0   0.0%   0.0%   1305.4 100.0% BackendMain /home/phil/postgres/src/backend/tcop/backend_startup.c:105
  1305.1 100.0% 100.0%   1305.1 100.0% AllocSetAllocFromNewBlock /home/phil/postgres/src/backend/utils/mmgr/aset.c:919
     0.0   0.0% 100.0%   1304.0  99.9% PostgresMain /home/phil/postgres/src/backend/tcop/postgres.c:4767
     0.0   0.0% 100.0%   1304.0  99.9% PortalRun /home/phil/postgres/src/backend/tcop/pquery.c:766
     0.0   0.0% 100.0%   1304.0  99.9% PortalRunSelect /home/phil/postgres/src/backend/tcop/pquery.c:922
     0.0   0.0% 100.0%   1304.0  99.9% exec_simple_query /home/phil/postgres/src/backend/tcop/postgres.c:1278
     0.0   0.0% 100.0%   1304.0  99.9% ExecAgg /home/phil/postgres/src/backend/executor/nodeAgg.c:2179
     0.0   0.0% 100.0%   1304.0  99.9% ExecEvalExprSwitchContext (inline) /home/phil/postgres/src/backend/executor/../../../src/include/executor/executor.h:356
     0.0   0.0% 100.0%   1304.0  99.9% ExecInterpExpr /home/phil/postgres/src/backend/executor/execExprInterp.c:740
     0.0   0.0% 100.0%   1304.0  99.9% ExecProcNode (inline) /home/phil/postgres/src/backend/executor/../../../src/include/executor/executor.h:274
     0.0   0.0% 100.0%   1304.0  99.9% ExecutePlan (inline) /home/phil/postgres/src/backend/executor/execMain.c:1649
     0.0   0.0% 100.0%   1304.0  99.9% advance_aggregates (inline) /home/phil/postgres/src/backend/executor/nodeAgg.c:820
     0.0   0.0% 100.0%   1304.0  99.9% agg_retrieve_direct (inline) /home/phil/postgres/src/backend/executor/nodeAgg.c:2454
     0.0   0.0% 100.0%   1304.0  99.9% drandom /home/phil/postgres/src/backend/utils/adt/pseudorandomfuncs.c:93
     0.0   0.0% 100.0%   1304.0  99.9% standard_ExecutorRun /home/phil/postgres/src/backend/executor/execMain.c:361
     0.0   0.0% 100.0%      1.3   0.1% PostgresMain /home/phil/postgres/src/backend/tcop/postgres.c:4324
     0.0   0.0% 100.0%      0.9   0.1% InitPostgres /home/phil/postgres/src/backend/utils/init/postinit.c:1194 (discriminator 5)
     0.0   0.0% 100.0%      0.9   0.1% InitCatalogCachePhase2 /home/phil/postgres/src/backend/utils/cache/syscache.c:187 (discriminator 3)
     0.0   0.0% 100.0%      0.9   0.1% RelationCacheInitializePhase3 /home/phil/postgres/src/backend/utils/cache/relcache.c:4372
     0.0   0.0% 100.0%      0.6   0.0% RelationBuildDesc /home/phil/postgres/src/backend/utils/cache/relcache.c:1208
     0.0   0.0% 100.0%      0.6   0.0% RelationIdGetRelation /home/phil/postgres/src/backend/utils/cache/relcache.c:2116
     0.0   0.0% 100.0%      0.6   0.0% index_open /home/phil/postgres/src/backend/access/index/indexam.c:137
And we found our leak.