Legacy C++ services don't crash — they slowly bleed memory until someone restarts them at 3 AM.
If you've inherited a 20‑year‑old codebase with mysterious memory growth, this guide is for you.
You can't fix a leak if you can't reproduce it.
This is your complete, production‑focused Valgrind investigation playbook.
It's based on real systems, real leaks, and real debugging pain.
Table of Contents
- The Workflow
- Step 1 — Reproduce the Leak
- Step 2 — Static Analysis
- Step 3 — Compile for Valgrind
- Step 4 — Run Valgrind
- Step 5 — Understand Valgrind's Leak Types
- Step 6 — Capture the Stack Trace
- Step 7 — Optional Regression Test
- Quick Reference
- Real‑World Example
- The Golden Rule
The Workflow
[Leak Trigger] → [Static Analysis] → [Compile Debug Build]
↓
[Run Valgrind] → [Interpret Leak Types] → [Stack Trace]
↓
[Regression Test]
Step 1 — Reproduce the Leak
You cannot find a leak if you cannot trigger it.
Measure memory growth
Use the following script to track your application's total mapped memory over time.
PID=$(pgrep -o your_service)   # -o: oldest matching process, in case several match
while true; do
    echo "$(date): $(pmap "$PID" | grep total | awk '{print $2}')"
    sleep 60
done
Interpretation:
| Pattern | Meaning |
|---|---|
| Linear growth | Per‑operation leak |
| Step‑function growth | Specific trigger |
| No growth | Wrong hypothesis |
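To make the table concrete, here is a minimal sketch that classifies a series of memory samples into the three patterns. The function name `classify_growth` and the "more than half of the growth in one interval" threshold are illustrative assumptions, not calibrated values.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: classify memory samples (in KiB, oldest first).
// "step" means a single interval accounts for more than half of the
// total growth; that threshold is an assumption chosen for illustration.
std::string classify_growth(const std::vector<long>& samples_kib) {
    if (samples_kib.size() < 3) return "insufficient data";
    const long total = samples_kib.back() - samples_kib.front();
    if (total <= 0) return "no growth";
    long largest_step = 0;
    for (std::size_t i = 1; i < samples_kib.size(); ++i) {
        largest_step = std::max(largest_step, samples_kib[i] - samples_kib[i - 1]);
    }
    return (2 * largest_step > total) ? "step" : "linear";
}
```

Feeding it the numbers printed by the monitoring script gives a quick first guess; a noisy allocator can blur the two growth patterns, so treat the result as a hint rather than a verdict.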
Find the minimum trigger
Your goal: reproduce the leak in under 10 minutes.
Why?
- Valgrind slows execution by 20–50×
- A 10‑minute trigger becomes 3–8 hours
- A 1‑hour trigger becomes 20–50 hours (roughly 1–2 days)
Why this matters: If your trigger is too slow, Valgrind becomes unusable.
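The arithmetic behind these estimates is a single multiplication. The sketch below spells it out; `valgrind_minutes` is a hypothetical helper name, and the 20× and 50× factors are the bounds quoted above.

```cpp
// Estimated wall-clock minutes under Valgrind for a trigger that takes
// native_minutes when run natively, given a slowdown factor.
constexpr double valgrind_minutes(double native_minutes, double slowdown) {
    return native_minutes * slowdown;
}
```

A 10‑minute trigger gives `valgrind_minutes(10, 20)` = 200 minutes up to `valgrind_minutes(10, 50)` = 500 minutes, which is about 3.3 to 8.3 hours. A 60‑minute trigger gives 1,200 to 3,000 minutes, which is 20 to 50 hours.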
Step 2 — Static Analysis (5 minutes, zero runtime cost)
Before running anything, let the compiler find the obvious issues.
Clang Static Analyzer
scan-build make
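Look for reports of type "Memory leak". Their message text usually starts with "Potential leak of memory", so do not dismiss them because of the word "Potential".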
clang‑tidy
clang-tidy legacy_file.cpp \
--checks='-*,clang-analyzer-*,cppcoreguidelines-owning-memory'
Finds:
- new without delete
- malloc without free
- Raw owning pointers
Misses:
- Cycles
- Third‑party leaks
- Runtime‑dependent leaks
Why this matters: Static analysis gives you free wins before you even run the program.
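As an illustration, the sketch below shows the kind of path-dependent "new without delete" that path-sensitive analyzers are designed to report. The `Record` type and both function names are made up for this example, and whether a particular tool flags it depends on its version and configuration.

```cpp
struct Record {
    int value;
};

// Deliberately buggy: the early return skips the delete.
int read_value_leaky(int raw) {
    Record* rec = new Record{raw};
    if (rec->value < 0) {
        return -1;  // leak: rec is never deleted on this path
    }
    int v = rec->value;
    delete rec;
    return v;
}

// Same behaviour with a single exit path, so the delete is always reached.
int read_value_fixed(int raw) {
    Record* rec = new Record{raw};
    int v = (rec->value < 0) ? -1 : rec->value;
    delete rec;
    return v;
}
```

The leaky variant only loses memory for negative input, which is exactly why it can survive years of testing with "normal" data.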
Step 3 — Compile for Valgrind
Valgrind is useless without debug symbols, so first compile the whole application with debug flags:
g++ -g3 -O0 -fno-omit-frame-pointer -o your_service your_service.cpp
| Flag | Purpose |
|---|---|
| -g3 | Full debug info |
| -O0 | Clean stack frames |
| -fno-omit-frame-pointer | Reliable backtraces |
Why this matters: Without debug symbols, Valgrind can't show you file/line numbers.
Step 4 — Run Valgrind
Run only the trigger you identified in Step 1.
valgrind --leak-check=full \
--show-leak-kinds=definite,indirect \
--track-origins=yes \
--log-file=valgrind_out.txt \
./your_service --run-trigger
For long‑running services
Use vgdb to inspect leaks mid‑run:
valgrind --vgdb=yes --leak-check=full ./your_service
Then, from a second terminal:
vgdb leak_check full kinds definite,indirect
Why this matters: You don't need to wait hours — you can inspect leaks while running.
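If you can modify and rebuild the service, Memcheck's client requests offer another way to check for leaks mid-run from inside the process. The sketch below assumes the Valgrind development headers are installed, and compiles to a stub returning 0 when they are not. The function name `memcheck_leaked_bytes` is hypothetical. As far as I understand the macro, the "leaked" figure covers definitely and indirectly lost bytes.

```cpp
#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#    define HAVE_MEMCHECK_H 1
#  endif
#endif
#ifndef HAVE_MEMCHECK_H
#  define HAVE_MEMCHECK_H 0
#endif

// Runs a leak check now and returns the bytes Memcheck counts as leaked.
// Outside Valgrind (or without the headers) this returns 0.
unsigned long memcheck_leaked_bytes() {
#if HAVE_MEMCHECK_H
    unsigned long leaked = 0, dubious = 0, reachable = 0, suppressed = 0;
    VALGRIND_DO_LEAK_CHECK;  // the report goes to the Valgrind log
    VALGRIND_COUNT_LEAKS(leaked, dubious, reachable, suppressed);
    (void)dubious; (void)reachable; (void)suppressed;
    return leaked;
#else
    return 0;
#endif
}
```

Calling it before and after a batch of requests and comparing the two values shows whether that batch leaked, without waiting for the process to exit.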
Step 5 — Understand Valgrind's Leak Types
After the run, Valgrind writes a leak report to valgrind_out.txt. Example summary:
definitely lost: 1,024 bytes
indirectly lost: 6,144 bytes
possibly lost: 0 bytes
still reachable: 45,000 bytes
What each type means
Valgrind classifies lost memory into the following types. The type determines your next action.
| Type | Meaning | Action |
|---|---|---|
| definitely lost | Real leak | Fix first |
| indirectly lost | Child of a lost block | Fix parent |
| possibly lost | Only interior pointers to the block remain (pointer arithmetic, offset pointers) | Investigate |
| still reachable | Globals/statics | Ignore unless growing |
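The sketch below shows one way each category can arise. All names are hypothetical, and the classifications in the comments assume an unoptimized build (-O0) and Memcheck's default settings; its heuristics can classify some interior pointers differently.

```cpp
struct Node {
    Node* child = nullptr;
};

Node* g_registry = nullptr;  // global: whatever it points to is reachable at exit
char* g_interior = nullptr;  // global pointer into the middle of a block

// The only pointer to the parent goes out of scope:
// parent -> definitely lost, parent->child -> indirectly lost.
void lose_parent_and_child() {
    Node* parent = new Node;
    parent->child = new Node;
}

// Never freed, but a global still points to the start of the block:
// reported as still reachable.
void keep_in_global() {
    if (g_registry == nullptr) {
        g_registry = new Node;
    }
}

// Only a pointer into the middle of the block survives:
// reported as possibly lost.
void keep_interior_pointer_only() {
    char* buffer = new char[64];
    g_interior = buffer + 16;
}
```

Fixing the parent in `lose_parent_and_child` (for example by deleting the child and then the parent) clears both the definite and the indirect record, which is why the table says to fix the parent.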
If "still reachable" grows
Use Massif:
valgrind --tool=massif ./your_trigger
ms_print massif.out.<pid>   # Massif names its output file after the process ID
Why this matters: "Still reachable" is not a leak — unless it grows.
Step 6 — Capture the Stack Trace
A real leak looks like the example below. With debug symbols, each frame of the stack trace shows the exact source file and line number, so you can see where the memory was allocated. To fix the leak, work out why that memory was never released. A common cause is that delete is called on only one code path, and your trigger exercises a different one.
1,024 bytes in 1 blocks are definitely lost
at operator new
by DatabaseConnection::ExecuteQuery (db_connection.cpp:67)
by CustomerLoader::FetchCustomer (customer_loader.cpp:89)
Extract only leak blocks:
grep -A10 "definitely lost" valgrind_out.txt
Why this matters: The stack trace is the map that leads you to the leak.
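If you want a number rather than a block of text (for example, to fail a CI job), the log can be summed programmatically. This is a minimal sketch that assumes Memcheck's usual "N bytes in M blocks are definitely lost" record format; `sum_definitely_lost` is a hypothetical helper, not part of Valgrind.

```cpp
#include <istream>
#include <regex>
#include <string>

// Sums the byte counts of all "definitely lost" loss records in a log.
// Records of the form "40 (16 direct, 24 indirect) bytes ..." are counted
// with their leading total (40), which includes the indirect bytes.
unsigned long long sum_definitely_lost(std::istream& log) {
    static const std::regex record(
        R"(([0-9,]+) (?:\([^)]*\) )?bytes in [0-9,]+ blocks are definitely lost)");
    unsigned long long total = 0;
    std::string line;
    while (std::getline(log, line)) {
        std::smatch m;
        if (!std::regex_search(line, m, record)) continue;
        std::string digits;
        for (char ch : m[1].str()) {
            if (ch != ',') digits += ch;  // strip thousands separators
        }
        total += std::stoull(digits);
    }
    return total;
}
```

Open valgrind_out.txt with a `std::ifstream`, pass it in, and treat any non-zero result as a failure.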
Step 7 — Optional Regression Test
Useful when multiple developers touch the code.
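TEST(LeakTest, SuspectFunctionDoesNotLeak) {
    // get_current_rss() is a project helper returning resident set size in bytes.
    const size_t before = get_current_rss();
    for (int i = 0; i < 100; i++) {
        suspect_function();
    }
    const size_t after = get_current_rss();
    // Signed difference: RSS can shrink, and unsigned subtraction would wrap around.
    const long long growth = static_cast<long long>(after) - static_cast<long long>(before);
    EXPECT_LT(growth / 100, 1024);  // average growth per call must stay under 1 KiB
}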
Why this matters: Regression tests prevent old leaks from returning.
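The test relies on a `get_current_rss()` helper that is not shown. One possible Linux-only implementation, assuming /proc is mounted, reads the resident page count from /proc/self/statm:

```cpp
#include <cstddef>
#include <fstream>
#include <unistd.h>

// Sketch of get_current_rss() for Linux: resident set size in bytes,
// or 0 if /proc/self/statm cannot be read.
std::size_t get_current_rss() {
    std::ifstream statm("/proc/self/statm");
    std::size_t total_pages = 0;
    std::size_t resident_pages = 0;
    // The first two fields are total program size and resident set size, in pages.
    if (!(statm >> total_pages >> resident_pages)) {
        return 0;
    }
    return resident_pages * static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
}
```

RSS is a coarse signal: the allocator may keep freed memory or return it late, so a per-call threshold with some slack is more robust than expecting zero growth.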
Quick Reference
| Task | Command |
|---|---|
| Basic leak check | valgrind --leak-check=full ./binary |
| Only real leaks | --show-leak-kinds=definite,indirect |
| Save output | --log-file=leak.log |
| Check running service | vgdb leak_check full kinds definite,indirect |
| Heap profiling | valgrind --tool=massif |
| Extract leak | grep -A10 "definitely lost" |
Real‑World Example
Imagine a legacy service that loads customers from a database and caches them.
The bug
// customer_loader.h
#include <string>

struct Customer {
    int id;
    std::string name;
};

class CustomerRepository {
public:
    Customer* LoadCustomer(int id);
};
// customer_loader.cpp
#include "customer_loader.h"
#include "db_connection.h"
Customer* CustomerRepository::LoadCustomer(int id)
{
    DatabaseConnection* conn = DatabaseConnection::Get(); // singleton
    ResultSet* rs = conn->ExecuteQuery("SELECT id, name FROM customers WHERE id = " + std::to_string(id));
    if (!rs->Next()) {
        return nullptr;
    }
    Customer* c = new Customer{};
    c->id = rs->GetInt(0);
    c->name = rs->GetString(1);
    // BUG: ResultSet is never deleted
    // delete rs; // missing
    return c; // caller owns Customer*
}
Caller code:
void ProcessRequest(int customerId)
{
    CustomerRepository repo;
    Customer* c = repo.LoadCustomer(customerId);
    if (!c) {
        return;
    }
    // ... use c ...
    delete c; // correct
}
At first glance, this looks “fine” because Customer is deleted.
But ResultSet is leaked on every call.
Valgrind report
You run your request handler under Valgrind:
valgrind --leak-check=full \
--show-leak-kinds=definite,indirect \
--track-origins=yes \
--log-file=valgrind_leak.log \
./service --handle-request 42
Relevant part of the report:
==12345== 128 bytes in 1 blocks are definitely lost in loss record 3 of 5
==12345== at 0x4C2F1A3: operator new(unsigned long) (vg_replace_malloc.c:422)
==12345== by 0x401F8B: ResultSet::ResultSet(DBHandle*) (result_set.cpp:27)
==12345== by 0x4023D1: DatabaseConnection::ExecuteQuery(std::string const&) (db_connection.cpp:88)
==12345== by 0x4039A4: CustomerRepository::LoadCustomer(int) (customer_loader.cpp:11)
==12345== by 0x40412F: ProcessRequest(int) (request_handler.cpp:25)
==12345== by 0x4043C9: main (main.cpp:17)
Key points:
- “128 bytes in 1 blocks are definitely lost” → real leak
- Allocation happens in ResultSet::ResultSet
- The call chain leads to CustomerRepository::LoadCustomer
You don’t need to know ResultSet internals—only that you allocated it and never freed it.
The fix
Customer* CustomerRepository::LoadCustomer(int id)
{
    DatabaseConnection* conn = DatabaseConnection::Get();
    ResultSet* rs = conn->ExecuteQuery("SELECT id, name FROM customers WHERE id = " + std::to_string(id));
    if (!rs->Next()) {
        delete rs; // ✅ free on early return
        return nullptr;
    }
    Customer* c = new Customer{};
    c->id = rs->GetInt(0);
    c->name = rs->GetString(1);
    delete rs; // ✅ free after use
    return c;
}
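Manual delete on every return path works, but it is easy to miss a path the next time the function changes. If the codebase allows C++14, an alternative is to hand the raw pointer to std::unique_ptr so the destructor runs on every path. The sketch below is one possible shape, not the original service code: `ResultSet` and `DatabaseConnection` are simplified stand-ins (with a `live_count` counter added only to make the effect observable), and `LoadCustomer` is written as a free function returning a smart pointer.

```cpp
#include <memory>
#include <string>

struct Customer {
    int id;
    std::string name;
};

// Stand-in for the real ResultSet; live_count tracks objects still alive.
class ResultSet {
public:
    explicit ResultSet(bool has_row) : has_row_(has_row) { ++live_count; }
    ~ResultSet() { --live_count; }
    bool Next() { bool r = has_row_; has_row_ = false; return r; }
    int GetInt(int) const { return 42; }
    std::string GetString(int) const { return "Ada"; }
    static int live_count;
private:
    bool has_row_;
};
int ResultSet::live_count = 0;

// Stand-in for the real singleton; it pretends that id 0 has no row.
class DatabaseConnection {
public:
    static DatabaseConnection* Get() {
        static DatabaseConnection instance;
        return &instance;
    }
    // Returns a raw owning pointer, like the legacy API in the example.
    ResultSet* ExecuteQuery(const std::string& sql) {
        const bool no_row = sql.size() >= 3 && sql.compare(sql.size() - 3, 3, "= 0") == 0;
        return new ResultSet(!no_row);
    }
};

std::unique_ptr<Customer> LoadCustomer(int id) {
    DatabaseConnection* conn = DatabaseConnection::Get();
    // Take ownership immediately: rs is destroyed on every return path.
    std::unique_ptr<ResultSet> rs(
        conn->ExecuteQuery("SELECT id, name FROM customers WHERE id = " + std::to_string(id)));
    if (!rs->Next()) {
        return nullptr;
    }
    auto c = std::make_unique<Customer>();
    c->id = rs->GetInt(0);
    c->name = rs->GetString(1);
    return c;
}
```

Returning std::unique_ptr<Customer> also removes the "caller owns Customer*" convention, at the cost of changing the function's signature for existing callers.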
Re‑run Valgrind:
==12345== HEAP SUMMARY:
==12345== in use at exit: 0 bytes in 0 blocks
==12345== total heap usage: 1,234 allocs, 1,234 frees, 98,765 bytes allocated
==12345==
==12345== All heap blocks were freed -- no leaks are possible
The Golden Rule
Never start fixing until you can reproduce the leak in under 10 minutes.
The trigger is your truth.
The stack trace is your map.