Wang - C++ Developer

Posted on May 6

Finding Memory Leaks in Legacy C++ Applications with Valgrind

#cpp #legacy #valgrind #clangtidy

Legacy C++ services don't crash — they slowly bleed memory until someone restarts them at 3 AM.

If you've inherited a 20‑year‑old codebase with mysterious memory growth, this guide is for you.

You can't fix a leak you can't reproduce.

This is your complete, production‑focused Valgrind investigation playbook.
It's based on real systems, real leaks, and real debugging pain.

The Workflow
Step 1 — Reproduce the Leak
Step 2 — Static Analysis
Step 3 — Compile for Valgrind
Step 4 — Run Valgrind
Step 5 — Understand Valgrind's Leak Types
Step 6 — Capture the Stack Trace
Step 7 — Optional Regression Test
Quick Reference
Real‑World Example
The Golden Rule

The Workflow

[Leak Trigger] → [Static Analysis] → [Compile Debug Build]
        ↓
   [Run Valgrind] → [Interpret Leak Types] → [Stack Trace]
        ↓
     [Regression Test]

Step 1 — Reproduce the Leak

You cannot find a leak if you cannot trigger it.

Measure memory growth

Use the following script to track your application's total memory usage over time.

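# Pick the oldest matching process in case pgrep finds several.
PID=$(pgrep -o your_service)
while true; do
  # The "total" line of pmap output holds the total mapped size.
  echo "$(date): $(pmap "$PID" | grep total | awk '{print $2}')"
  sleep 60
done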

Interpretation:

Pattern	Meaning
Linear growth	Per‑operation leak
Step‑function growth	Specific trigger
No growth	Wrong hypothesis
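
The table above can be turned into a rough classifier for the samples the script prints. This is only a sketch: `ClassifyGrowth` is a hypothetical helper, and the thresholds (1% for "flat", half of the total growth for "one big jump") are assumptions you would tune for your own service.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Classifies a series of memory samples (e.g. KiB from pmap, one per minute).
// The thresholds below are illustrative assumptions, not tuned values.
std::string ClassifyGrowth(const std::vector<long>& samples_kb) {
    if (samples_kb.size() < 3) return "not enough data";

    const long total = samples_kb.back() - samples_kb.front();
    // At most 1% overall growth: treat as flat.
    if (total <= samples_kb.front() / 100) return "no growth";

    long largest_jump = 0;
    std::size_t growing_steps = 0;
    for (std::size_t i = 1; i < samples_kb.size(); ++i) {
        const long delta = samples_kb[i] - samples_kb[i - 1];
        largest_jump = std::max(largest_jump, delta);
        if (delta > 0) ++growing_steps;
    }

    // One interval accounts for most of the growth: a specific trigger.
    if (largest_jump * 2 > total) return "step-function growth";
    // Growth spread over at least half of the intervals: a per-operation leak.
    if (growing_steps * 2 >= samples_kb.size() - 1) return "linear growth";
    return "irregular growth";
}
```

Feeding it `{1000, 1100, 1200, 1300, 1400}` yields "linear growth", while `{1000, 1000, 1000, 1500, 1500}` yields "step-function growth". Anything it labels "irregular growth" deserves a look at the raw numbers.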

Find the minimum trigger

Your goal: reproduce the leak in under 10 minutes.

Why?

Valgrind slows execution by 20–50×
A 10‑minute trigger becomes 3–8 hours
A 1‑hour trigger becomes 20–50 hours (roughly 1–2 days)

Why this matters: If your trigger is too slow, Valgrind becomes unusable.

Step 2 — Static Analysis (5 minutes, zero runtime cost)

Before running anything, let the compiler find the obvious issues.

Clang Static Analyzer

scan-build make

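In the report, look at findings under the "Memory leak" bug type. The analyzer usually words these as "Potential leak of memory pointed to by ...", so review the reported path before dismissing one as speculative.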

clang‑tidy

clang-tidy legacy_file.cpp \
  --checks='-*,clang-analyzer-*,cppcoreguidelines-owning-memory'

Finds:

new without delete
malloc without free
Raw owning pointers

Misses:

Cycles
Third‑party leaks
Runtime‑dependent leaks

Why this matters: Static analysis gives you free wins before you even run the program.
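
To make the Finds/Misses lists concrete, here is a small sketch with one leak of each kind. `ParseHeader`, `Node`, and `MakeCycle` are hypothetical names invented for illustration. The first function has the early-return pattern that static analyzers are generally good at flagging. The second builds a `std::shared_ptr` cycle, where every allocation looks owned and tools typically stay silent.

```cpp
#include <memory>

// Hypothetical example type; not from any real codebase.
struct Node {
    static int live;               // number of Node objects currently alive
    std::shared_ptr<Node> peer;    // owning link: two of these can form a cycle
    Node() { ++live; }
    ~Node() { --live; }
};
int Node::live = 0;

// The kind of leak static analysis usually catches:
// `buf` is never released on the early-return path.
int ParseHeader(bool bad_input) {
    int* buf = new int[64]();      // zero-initialized
    if (bad_input) {
        return -1;                 // leak: missing delete[] buf
    }
    const int first = buf[0];
    delete[] buf;
    return first;
}

// The kind of leak it usually misses: a shared_ptr cycle.
// Both nodes outlive this function because each keeps the other alive.
// The returned weak_ptr lets the caller observe that without owning anything.
std::weak_ptr<Node> MakeCycle() {
    auto a = std::make_shared<Node>();
    auto b = std::make_shared<Node>();
    a->peer = b;
    b->peer = a;                   // reference counts can no longer reach zero
    return std::weak_ptr<Node>(a);
}
```

After `MakeCycle()` returns, `Node::live` is still 2 and the returned `weak_ptr` has not expired. Resetting one `peer` link breaks the cycle and frees both nodes. Run under Valgrind, a cycle like this would be expected to show up as lost memory even though no `new` lacks a matching owner.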

Step 3 — Compile for Valgrind

Valgrind is far less useful without debug symbols, so the first step is to rebuild the whole application with debug information.

g++ -g3 -O0 -fno-omit-frame-pointer -o your_service your_service.cpp

Flag	Purpose
`-g3`	Full debug info
`-O0`	Clean stack frames
`-fno-omit-frame-pointer`	Reliable backtraces

Why this matters: Without debug symbols, Valgrind can't show you file/line numbers.

Step 4 — Run Valgrind

Run only the trigger you identified in Step 1.

valgrind --leak-check=full \
         --show-leak-kinds=definite,indirect \
         --track-origins=yes \
         --log-file=valgrind_out.txt \
         ./your_service --run-trigger

For long‑running services

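Use vgdb to inspect leaks mid‑run. Start the service under Valgrind with its gdbserver enabled:

valgrind --vgdb=yes --leak-check=full ./your_service

Then, from another terminal, request a leak report while the service keeps running:

vgdb leak_check full kinds definite,indirect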

Why this matters: You don't need to wait hours — you can inspect leaks while running.

Step 5 — Understand Valgrind's Leak Types

After the run, Valgrind writes its leak report to valgrind_out.txt. An example summary:

definitely lost: 1,024 bytes
indirectly lost: 6,144 bytes
possibly lost: 0 bytes
still reachable: 45,000 bytes

What each type means

Valgrind classifies lost memory into the following types. The type determines what you do next.

Type	Meaning	Action
definitely lost	Real leak	Fix first
indirectly lost	Child of a lost block	Fix parent
possibly lost	Only a pointer into the middle of the block remains	Investigate
still reachable	Globals/statics	Ignore unless growing
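
If the categories feel abstract, the following sketch produces one of each on purpose. All names here (`Parent`, `Child`, `g_config`, `g_cursor`) are made up for illustration. The exact classification Valgrind reports can vary with compiler and optimization level, so treat the comments as what you would typically expect from an `-O0` build.

```cpp
#include <cstddef>

// Hypothetical types for illustration only.
struct Child  { char payload[64]; };
struct Parent { Child* child; };

Parent* g_config = nullptr;   // global that lives until exit
char*   g_cursor = nullptr;   // global that points into the middle of a block

// "definitely lost" + "indirectly lost":
// the only pointer to Parent goes out of scope, and Child is reachable
// only through that lost Parent.
std::size_t LoseParentAndChild() {
    Parent* p = new Parent{new Child{}};
    return sizeof(*p) + sizeof(*p->child);   // p is dropped here
}

// "still reachable": never freed, but a global still points at it at exit.
std::size_t KeepReachable() {
    if (g_config == nullptr) g_config = new Parent{nullptr};
    return sizeof(*g_config);
}

// "possibly lost": the start of the block is forgotten and only an
// interior pointer survives.
std::size_t KeepInteriorPointer() {
    char* block = new char[128];
    g_cursor = block + 16;
    return 128;
}
```

Calling all three from `main` and running the binary with `--leak-check=full --show-leak-kinds=all` is a quick way to see how each row of the table appears in a real report.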

If "still reachable" grows

Use Massif:

valgrind --tool=massif ./your_trigger
ms_print massif.out.<pid>

Why this matters: "Still reachable" is not a leak — unless it grows.
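
A typical source of growing "still reachable" memory is a cache that never evicts. The sketch below is a hypothetical example (`CustomerCache`, `LookupName`, and `CacheSize` are invented names): the map is allocated once, stays reachable through a static pointer until exit, and gains an entry for every new id.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical cache, loosely modelled on a service that caches customers.
// The map is heap-allocated and intentionally never destroyed, so at exit
// it is still reachable through the static pointer.
std::map<int, std::string>& CustomerCache() {
    static auto* cache = new std::map<int, std::string>();
    return *cache;
}

// Inserts on a miss and never evicts: memory grows with every new id.
const std::string& LookupName(int id) {
    auto& cache = CustomerCache();
    auto it = cache.find(id);
    if (it == cache.end()) {
        // Stand-in for a database read.
        it = cache.emplace(id, "customer-" + std::to_string(id)).first;
    }
    return it->second;
}

std::size_t CacheSize() { return CustomerCache().size(); }
```

Memcheck would not count this as lost memory, because nothing is unreachable. A heap profile is where the steady climb should become visible, with the allocation attributed to the `emplace` call.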

Step 6 — Capture the Stack Trace

A real leak looks like the example below. With debug symbols, each frame of the stack trace includes the source file and line number. The trace shows where the memory was allocated, not where it should have been freed. To fix the leak, work out why that allocation was never released. A common cause is that delete is called on only one code path, and the trigger exercises a different one.

1,024 bytes in 1 blocks are definitely lost
at operator new
by DatabaseConnection::ExecuteQuery (db_connection.cpp:67)
by CustomerLoader::FetchCustomer (customer_loader.cpp:89)

Extract only leak blocks:

grep -A10 "definitely lost" valgrind_out.txt

Why this matters: The stack trace is the map that leads you to the leak.

Step 7 — Optional Regression Test

Useful when multiple developers touch the code.

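// get_current_rss() and suspect_function() are project-specific helpers.
TEST(LeakTest, SuspectFunctionDoesNotGrowMemory) {
    const size_t before = get_current_rss();
    for (int i = 0; i < 100; i++) {
        suspect_function();
    }
    const size_t after = get_current_rss();
    // Guard against unsigned underflow when RSS shrinks between samples.
    const size_t growth = after > before ? after - before : 0;
    EXPECT_LT(growth / 100, 1024u);  // under 1 KiB of growth per call
}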

Why this matters: Regression tests prevent old leaks from returning.
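
The test relies on a `get_current_rss()` helper that the article does not define. One possible Linux-only implementation, offered as an assumption rather than the original helper, reads the resident page count from `/proc/self/statm`:

```cpp
#include <cstddef>
#include <fstream>
#include <unistd.h>

// Assumed helper: resident set size of the current process, in bytes.
// Linux-only: the second field of /proc/self/statm is the number of
// resident pages. Returns 0 if the file cannot be read.
std::size_t get_current_rss() {
    std::ifstream statm("/proc/self/statm");
    std::size_t total_pages = 0;
    std::size_t resident_pages = 0;
    if (!(statm >> total_pages >> resident_pages)) {
        return 0;
    }
    const long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0) {
        return 0;
    }
    return resident_pages * static_cast<std::size_t>(page_size);
}
```

RSS is a noisy signal: the allocator may keep freed memory instead of returning it to the OS, and values move in whole pages. That is why the test uses a generous per-call threshold and many iterations rather than expecting zero growth.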

Quick Reference

Task	Command
Basic leak check	`valgrind --leak-check=full ./binary`
Only real leaks	`--show-leak-kinds=definite,indirect`
Save output	`--log-file=leak.log`
Check running service	`vgdb leak_check full kinds definite,indirect`
Heap profiling	`valgrind --tool=massif`
Extract leak	`grep -A10 "definitely lost"`

Real‑World Example

Imagine a legacy service that loads customers from a database and caches them.

The bug

// customer_loader.h
#include <string>

struct Customer {
    int id;
    std::string name;
};

class CustomerRepository {
public:
    Customer* LoadCustomer(int id);
};

// customer_loader.cpp
#include "customer_loader.h"
#include "db_connection.h"

Customer* CustomerRepository::LoadCustomer(int id)
{
    DatabaseConnection* conn = DatabaseConnection::Get(); // singleton
    ResultSet* rs = conn->ExecuteQuery("SELECT id, name FROM customers WHERE id = " + std::to_string(id));

    if (!rs->Next()) {
        return nullptr;
    }

    Customer* c = new Customer{};
    c->id = rs->GetInt(0);
    c->name = rs->GetString(1);

    // BUG: ResultSet is never deleted
    // delete rs;  // missing

    return c; // caller owns Customer*
}

Caller code:

void ProcessRequest(int customerId)
{
    CustomerRepository repo;
    Customer* c = repo.LoadCustomer(customerId);

    if (!c) {
        return;
    }

    // ... use c ...

    delete c; // correct
}

At first glance, this looks “fine” because Customer is deleted.

But ResultSet is leaked on every call.

Valgrind report

You run your request handler under Valgrind:

valgrind --leak-check=full \
         --show-leak-kinds=definite,indirect \
         --track-origins=yes \
         --log-file=valgrind_leak.log \
         ./service --handle-request 42

Relevant part of the report:

==12345== 128 bytes in 1 blocks are definitely lost in loss record 3 of 5
==12345==    at 0x4C2F1A3: operator new(unsigned long) (vg_replace_malloc.c:422)
==12345==    by 0x401F8B: ResultSet::ResultSet(DBHandle*) (result_set.cpp:27)
==12345==    by 0x4023D1: DatabaseConnection::ExecuteQuery(std::string const&) (db_connection.cpp:88)
==12345==    by 0x4039A4: CustomerRepository::LoadCustomer(int) (customer_loader.cpp:11)
==12345==    by 0x40412F: ProcessRequest(int) (request_handler.cpp:25)
==12345==    by 0x4043C9: main (main.cpp:17)

Key points:

“128 bytes in 1 blocks are definitely lost” → real leak
Allocation happens in ResultSet::ResultSet
The call chain leads to CustomerRepository::LoadCustomer

You don’t need to know ResultSet internals, only that ExecuteQuery handed you an owning pointer and you never freed it.

The fix

Customer* CustomerRepository::LoadCustomer(int id)
{
    DatabaseConnection* conn = DatabaseConnection::Get();
    ResultSet* rs = conn->ExecuteQuery("SELECT id, name FROM customers WHERE id = " + std::to_string(id));

    if (!rs->Next()) {
        delete rs;          // ✅ free on early return
        return nullptr;
    }

    Customer* c = new Customer{};
    c->id = rs->GetInt(0);
    c->name = rs->GetString(1);

    delete rs;              // ✅ free after use

    return c;
}

Re‑run Valgrind:

==12345== HEAP SUMMARY:
==12345==     in use at exit: 0 bytes in 0 blocks
==12345==   total heap usage: 1,234 allocs, 1,234 frees, 98,765 bytes allocated
==12345== 
==12345== All heap blocks were freed -- no leaks are possible
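
The manual `delete` calls work, but every future early return has to remember them. If you are allowed to touch the function, an alternative is to take ownership with `std::unique_ptr` as soon as the pointer comes back. The sketch below is not the article's code: `ResultSet` and `DatabaseConnection` are minimal stand-ins written so the example compiles on its own, the stub treats negative ids as "no row", and `LoadCustomer` is shown as a free function that returns `std::unique_ptr<Customer>`, which changes the original signature. It needs C++14 for `std::make_unique`.

```cpp
#include <memory>
#include <string>

// Minimal stand-in for the real ResultSet (an assumption for this sketch).
class ResultSet {
public:
    static int live;   // live ResultSet objects, to make leaks observable
    explicit ResultSet(bool has_row) : has_row_(has_row) { ++live; }
    ~ResultSet() { --live; }
    bool Next() { const bool r = has_row_; has_row_ = false; return r; }
    int GetInt(int) const { return 42; }
    std::string GetString(int) const { return "Ada"; }
private:
    bool has_row_;
};
int ResultSet::live = 0;

// Minimal stand-in for the real connection singleton (also an assumption).
class DatabaseConnection {
public:
    static DatabaseConnection* Get() {
        static DatabaseConnection instance;
        return &instance;
    }
    // Returns a raw owning pointer, like the legacy API in the article.
    // Stub behaviour: a query for a negative id ("... = -1") has no rows.
    ResultSet* ExecuteQuery(const std::string& sql) {
        return new ResultSet(sql.find("= -") == std::string::npos);
    }
};

struct Customer {
    int id;
    std::string name;
};

std::unique_ptr<Customer> LoadCustomer(int id) {
    DatabaseConnection* conn = DatabaseConnection::Get();
    // Take ownership immediately: rs is deleted on every exit path.
    std::unique_ptr<ResultSet> rs(conn->ExecuteQuery(
        "SELECT id, name FROM customers WHERE id = " + std::to_string(id)));

    if (!rs || !rs->Next()) {
        return nullptr;            // rs is released here automatically
    }

    auto c = std::make_unique<Customer>();
    c->id = rs->GetInt(0);
    c->name = rs->GetString(1);
    return c;                      // rs is released here as well
}
```

If callers still expect a raw `Customer*`, returning `c.release()` keeps the old contract while the `ResultSet` stays protected by RAII.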

The Golden Rule

Never start fixing until you can reproduce the leak in under 10 minutes.

The trigger is your truth.
The stack trace is your map.

DEV Community