Debugging memory leaks in Postgres, heaptrack edition
In this post we'll introduce two memory leaks into Postgres and debug them with heaptrack. Like almost every memory leak tool available to us (including memleak, which I wrote about last time), heaptrack requires you to be on Linux. But a Linux VM on a Mac is fine too (that is where I'm running this code from).
Although we use Postgres as the codebase with which to explore tools like memleak (last time) and heaptrack (this time), these techniques are relevant for memory leaks in C, C++, and Rust projects in general.
Thank you to my coworker Jacob Champion for reviewing a version of this post.
Grab and build Postgres.
$ git clone https://github.com/postgres/postgres
$ cd postgres
$ git checkout REL_17_STABLE
$ ./configure --enable-debug \
    --prefix=$(pwd)/build \
    --libdir=$(pwd)/build/lib
$ make -j16 && make install
A leak in postmaster
Every time a Postgres process starts up it is forked by the postmaster process. Let's introduce a memory leak into postmaster.
$ git diff src/backend/postmaster
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index d032091495b..e0bf8943763 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -3547,6 +3547,13 @@ BackendStartup(ClientSocket *client_sock)
     Backend    *bn;             /* for backend cleanup */
     pid_t       pid;
     BackendStartupData startup_data;
+    MemoryContext old;
+    int *s;
+
+    old = MemoryContextSwitchTo(TopMemoryContext);
+    s = palloc(80);
+    *s = 12;
+    MemoryContextSwitchTo(old);
 
     /*
      * Create backend data structure.  Better before the fork() so we can
Remember that Postgres allocates memory in nested arenas called MemoryContexts. The top-level arena is called TopMemoryContext and it is freed as the process exits. Excessive allocations (leaks) in TopMemoryContext would not be caught by Valgrind memcheck or LeakSanitizer, because that memory is released along with TopMemoryContext when the process exits. But while the process is alive, the above leak is real.
(If we switch from palloc to malloc above, LeakSanitizer does catch this leak. I didn't try Valgrind memcheck but it probably catches this too.)
An easy way to trigger this leak is by executing a ton of separate psql clients that create tons of Postgres client backend processes.
for run in {1..100000}; do psql postgres -c 'select 1'; done
With the diff above in place, rebuild and reinstall Postgres.
$ make -j16 && make install
Create a database and start a Postgres server.
$ ./build/bin/initdb testdb
$ ./build/bin/postgres -D $(pwd)/testdb
2025-05-22 12:05:15.995 EDT [260576] LOG: starting PostgreSQL 17.5 on aarch64-unknown-linux-gnu, compiled by gcc (Debian 14.2.0-19) 14.2.0, 64-bit
2025-05-22 12:05:15.996 EDT [260576] LOG: listening on IPv6 address "::1", port 5432
2025-05-22 12:05:15.996 EDT [260576] LOG: listening on IPv4 address "127.0.0.1", port 5432
2025-05-22 12:05:15.997 EDT [260576] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2025-05-22 12:05:16.001 EDT [260579] LOG: database system was shut down at 2025-05-22 11:37:53 EDT
2025-05-22 12:05:16.004 EDT [260576] LOG: database system is ready to accept connections
The integer in brackets (260576 here) is the PID of the postmaster process. In another terminal attach to the postmaster process with heaptrack.
$ sudo heaptrack -p 260576
The heaptrack process should run until the Postgres server ends.
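If the startup log has scrolled away, the postmaster PID can also be read back from the data directory: a running server keeps a postmaster.pid file there whose first line is the PID. A minimal sketch, where postmaster_pid is a hypothetical helper name and not part of Postgres:

```shell
# Print the postmaster PID recorded in a Postgres data directory.
# The first line of postmaster.pid holds the PID of the running postmaster.
# (postmaster_pid is a hypothetical helper name used for illustration.)
postmaster_pid() {
    datadir=$1
    pidfile="$datadir/postmaster.pid"
    if [ ! -r "$pidfile" ]; then
        echo "no readable postmaster.pid in $datadir" >&2
        return 1
    fi
    head -n 1 "$pidfile"
}
```

With this in place, something like sudo heaptrack -p "$(postmaster_pid "$(pwd)/testdb")" should attach without copying the PID by hand.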
In another terminal we'll exercise the leaking workload.
$ for run in {1..100000}; do ./build/bin/psql postgres -c 'select 1'; done
If you want to watch the memory usage climb while this workload is running, open top in another terminal.
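For a record of the climb rather than a live view, a small loop can sample the resident set size instead. This is a sketch that assumes Linux's /proc/PID/status file and its VmRSS field; rss_kb and watch_rss are hypothetical helper names.

```shell
# Print the resident set size of a process in kB, read from /proc.
# Prints nothing if the process is gone (or has no VmRSS line).
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status" 2>/dev/null
}

# Print a timestamped RSS sample every INTERVAL seconds (default 1)
# until the process can no longer be sampled.
watch_rss() {
    pid=$1
    interval=${2:-1}
    while kb=$(rss_kb "$pid") && [ -n "$kb" ]; do
        printf '%s rss_kb=%s\n' "$(date +%T)" "$kb"
        sleep "$interval"
    done
}
```

Running watch_rss 260576 5 in a spare terminal would print one line every five seconds until the postmaster exits.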
When that is done we should have leaked a good deal of memory. Hit Control-C on the postgres process and now we can see what heaptrack tells us.
$ sudo heaptrack -p 260576
heaptrack output will be written to "/home/phil/postgres/heaptrack.postgres.260645.zst"
injecting heaptrack into application via GDB, this might take some time...
warning: 44 ./nptl/cancellation.c: No such file or directory
injection finished
heaptrack stats:
    allocations: 19
    leaked allocations: 19
    temporary allocations: 0
Heaptrack finished! Now run the following to investigate the data:
    heaptrack --analyze "/home/phil/postgres/heaptrack.postgres.260645.zst"
It uses a different PID in the log file name than the one we gave it to track, but that's OK. Now we run heaptrack_print to format things.
$ heaptrack_print --print-leaks 1 ./heaptrack.postgres.260645.zst > leak.txt
And if we open that file, leak.txt, and search for MEMORY LEAKS we get this:
MEMORY LEAKS
16.77M leaked over 11 calls from
AllocSetAllocFromNewBlock
  at /home/phil/postgres/src/backend/utils/mmgr/aset.c:908
  in /home/phil/postgres/build/bin/postgres
16.77M leaked over 11 calls from:
    ServerLoop::BackendStartup
      at /home/phil/postgres/src/backend/postmaster/postmaster.c:3554
      in /home/phil/postgres/build/bin/postgres
    ServerLoop
      at /home/phil/postgres/src/backend/postmaster/postmaster.c:1676
    PostmasterMain
      at /home/phil/postgres/src/backend/postmaster/postmaster.c:1374
      in /home/phil/postgres/build/bin/postgres
    main
      at /home/phil/postgres/src/backend/main/main.c:199
      in /home/phil/postgres/build/bin/postgres
Which is exactly the leak we introduced.
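The report says 11 calls, not 100,000, because heaptrack tracks malloc rather than palloc: the allocation set behind TopMemoryContext carves small palloc chunks out of larger blocks and only calls malloc when it needs a new block. As a rough check, assuming the default allocation set sizing in aset.c (an 8 kB first block, each later block double the previous one, capped at 8 MB), the first 11 blocks add up to the reported total. The helper name aset_block_total is hypothetical.

```shell
# Sum the sizes of the first N blocks an allocation set would malloc,
# assuming an 8 kB first block that doubles up to an 8 MB cap.
# (Assumed defaults; aset_block_total is a hypothetical helper name.)
aset_block_total() {
    blocks=$1
    size=$((8 * 1024))
    max=$((8 * 1024 * 1024))
    total=0
    i=0
    while [ "$i" -lt "$blocks" ]; do
        total=$((total + size))
        if [ "$size" -lt "$max" ]; then
            size=$((size * 2))
        fi
        i=$((i + 1))
    done
    echo "$total"
}

aset_block_total 11   # prints 16769024
```

16,769,024 bytes is the 16.77M in the report. It is more than the 8 MB that 100,000 allocations of 80 bytes strictly need, which is plausible: each chunk carries some bookkeeping overhead and the most recent block is usually only partly used.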
To make sure we're not hallucinating, comment out the allocation in the diff, rebuild, and rerun heaptrack and this workload. The MEMORY LEAKS section will be empty. In this scenario, leak.txt looks like this:
$ cat leak.txt
reading file "./heaptrack.postgres.504295.zst" - please wait, this might take some time...
Debuggee command was: ./build/bin/postgres -D /home/phil/postgres/testdb
finished reading file, now analyzing data:
MOST CALLS TO ALLOCATION FUNCTIONS
PEAK MEMORY CONSUMERS
MEMORY LEAKS
MOST TEMPORARY ALLOCATIONS
total runtime: 146.664000s.
calls to allocation functions: 0 (0/s)
temporary memory allocations: 0 (0/s)
peak heap memory consumption: 0B
peak RSS (including heaptrack overhead): 21.93M
total memory leaked: 0
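Going by the section order in the listing above, the leak entries sit between the MEMORY LEAKS and MOST TEMPORARY ALLOCATIONS headers, so a short awk filter can pull them out instead of searching by hand. This is a sketch under that assumption about the report layout; leaks_section is a hypothetical helper name.

```shell
# Print only the body of the MEMORY LEAKS section of a heaptrack_print
# report. Assumes the section ends at the "MOST TEMPORARY ALLOCATIONS"
# header, as in the report layout shown in this post.
leaks_section() {
    awk '
        /^MOST TEMPORARY ALLOCATIONS/ { in_leaks = 0 }
        in_leaks                      { print }
        /^MEMORY LEAKS/               { in_leaks = 1 }
    ' "$1"
}
```

An empty result from leaks_section leak.txt then corresponds to the clean run, and any output is a leak worth reading.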
Let's introduce a leak in another Postgres process and see if we can catch that too.
A leak in a client backend
A memory leak in a client backend can be harder to catch since it is ephemeral. The client backend starts up for a single Postgres session and exits when the session exits. But so long as we can grab the PID of the client backend process, we can attach to it with heaptrack.
Let's leak memory in TopMemoryContext in the implementation of random().
$ git diff src/backend/utils/
diff --git a/src/backend/utils/adt/pseudorandomfuncs.c b/src/backend/utils/adt/pseudorandomfuncs.c
index 8e82c7078c5..886efbfaf78 100644
--- a/src/backend/utils/adt/pseudorandomfuncs.c
+++ b/src/backend/utils/adt/pseudorandomfuncs.c
@@ -20,6 +20,7 @@
 #include "utils/fmgrprotos.h"
 #include "utils/numeric.h"
 #include "utils/timestamp.h"
+#include "utils/memutils.h"
 
 /* Shared PRNG state used by all the random functions */
 static pg_prng_state prng_state;
@@ -84,6 +85,13 @@ Datum
 drandom(PG_FUNCTION_ARGS)
 {
     float8      result;
+    int* s;
+    MemoryContext old;
+
+    old = MemoryContextSwitchTo(TopMemoryContext);
+    s = palloc(120);
+    MemoryContextSwitchTo(old);
+    *s = 90;
 
     initialize_prng();
We can trigger this leak by executing random() a bunch of times, for example with:
SELECT sum(random()) FROM generate_series(1, 100_0000);
Build and install Postgres with this diff.
$ make -j16 && make install
And start up Postgres again against the testdb we created before.
$ ./build/bin/postgres -D $(pwd)/testdb
2025-05-22 14:37:11.322 EDT [704381] LOG: starting PostgreSQL 17.5 on aarch64-unknown-linux-gnu, compiled by gcc (Debian 14.2.0-19) 14.2.0, 64-bit
2025-05-22 14:37:11.323 EDT [704381] LOG: listening on IPv6 address "::1", port 5432
2025-05-22 14:37:11.323 EDT [704381] LOG: listening on IPv4 address "127.0.0.1", port 5432
2025-05-22 14:37:11.324 EDT [704381] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2025-05-22 14:37:11.327 EDT [704384] LOG: database system was shut down at 2025-05-22 14:31:00 EDT
2025-05-22 14:37:11.329 EDT [704381] LOG: database system is ready to accept connections
In a new terminal, start a psql session and find the corresponding client backend PID with pg_backend_pid().
$ ./build/bin/psql postgres
psql (17.5)
Type "help" for help.
postgres=# select pg_backend_pid();
 pg_backend_pid
----------------
         704389
(1 row)
postgres=#
Keep this session alive and in a new terminal attach heaptrack to it.
$ sudo heaptrack -p 704389
heaptrack output will be written to "/home/phil/heaptrack.postgres.704409.zst"
injecting heaptrack into application via GDB, this might take some time...
warning: 44 ./nptl/cancellation.c: No such file or directory
injection finished
Back in the psql session for backend 704389, run the leaking workload.
postgres=# SELECT sum(random()) FROM generate_series(1, 10_000_000);
        sum
-------------------
 499960.8137393289
(1 row)
Now hit Control-D to exit psql gracefully.
The heaptrack terminal will tell us where to look.
$ sudo heaptrack -p 704389
heaptrack output will be written to "/home/phil/heaptrack.postgres.704409.zst"
injecting heaptrack into application via GDB, this might take some time...
warning: 44 ./nptl/cancellation.c: No such file or directory
injection finished
heaptrack stats:
    allocations: 206
    leaked allocations: 177
    temporary allocations: 8
removing heaptrack injection via GDB, this might take some time...
ptrace: No such process.
No symbol table is loaded. Use the "file" command.
The program is not being run.
Heaptrack finished! Now run the following to investigate the data:
    heaptrack --analyze "/home/phil/heaptrack.postgres.704409.zst"
Run heaptrack_print like before.
$ heaptrack_print --print-leaks 1 ./heaptrack.postgres.704409.zst > leak.txt
And look in leak.txt for the MEMORY LEAKS section again.
MEMORY LEAKS
1.37G leaked over 180 calls from
AllocSetAllocFromNewBlock
  at /home/phil/postgres/src/backend/utils/mmgr/aset.c:908
  in /home/phil/postgres/build/bin/postgres
1.37G leaked over 170 calls from:
    drandom
      at /home/phil/postgres/src/backend/utils/adt/pseudorandomfuncs.c:92
      in /home/phil/postgres/build/bin/postgres
    ExecInterpExpr
      at /home/phil/postgres/src/backend/executor/execExprInterp.c:740
      in /home/phil/postgres/build/bin/postgres
    ExecAgg::agg_retrieve_direct::advance_aggregates::ExecEvalExprSwitchContext
      at ../../../src/include/executor/executor.h:356
      in /home/phil/postgres/build/bin/postgres
    ExecAgg::agg_retrieve_direct::advance_aggregates
      at /home/phil/postgres/src/backend/executor/nodeAgg.c:820
    ExecAgg::agg_retrieve_direct
      at /home/phil/postgres/src/backend/executor/nodeAgg.c:2454
    ExecAgg
      at /home/phil/postgres/src/backend/executor/nodeAgg.c:2179
    standard_ExecutorRun::ExecutePlan::ExecProcNode
And we found our leak.
Considerations
Unlike memleak, heaptrack requires the process to exit before it will print leak reports. This is unfortunate since sometimes you don't want to let or force a process to end.
On the other hand, heaptrack has been reporting stack traces and line numbers in different environments for me more reliably than memleak has.
Again, both seem to be better options than Valgrind memcheck or LeakSanitizer if your leaked memory will get cleaned up before the program exits.
Both heaptrack and memleak seem to be good tools to have in the chest.