우병수

Posted on Jun 10 • Originally published at techdigestor.com

Unsigned Sizes Bit Me in Production — Here's How I Finally Got My Head Around Them

#rust #productivity #tools #webdev

TL;DR: The vector was empty. That's it.

What's in this article

The Bug That Sent Me Down This Rabbit Hole
Why Sizes Are Unsigned in the First Place
The Three Ways Unsigned Sizes Actually Break Your Code
Catching These Bugs Before They Ship: The Toolchain
The Actual Fixes: Patterns That Work in Practice
The Signed vs Unsigned Size Debate: Where the Industry Actually Lands
What About Other Languages?
Quick Reference: Unsigned Size Fixes at a Glance

The Bug That Sent Me Down This Rabbit Hole

The Loop That Crashed in Production and Left Zero Evidence in CI

The vector was empty. That's it. That's the whole bug. I had a loop that started at container.size() - 1 and counted down, and on an empty vector, size() returns 0 — which is of type size_t, an unsigned type. Subtracting 1 from an unsigned zero doesn't give you -1. It wraps around to 18446744073709551615 — the maximum value of a 64-bit unsigned integer. The loop ran. The index blew past the vector bounds immediately. Memory corruption, segfault, gone. But only in production, only with empty input.

The part that genuinely annoyed me: I compiled with -Wall -Wextra and got nothing. Not a single diagnostic. The compiler watched me hand-craft a perfect underflow and stayed completely silent. I only found it by adding AddressSanitizer (-fsanitize=address) to a local debug build after a prod incident. The code had passed code review, passed CI, passed a test suite that just never fed it an empty container. The crash was 100% deterministic once I knew the trigger — but without that trigger in tests, it was invisible.

// This looks fine. It is not fine.
for (size_t i = container.size() - 1; i >= 0; i--) {
    process(container[i]);
}
// When container is empty: size() returns 0 (unsigned)
// 0 - 1 wraps to 18446744073709551615
// i >= 0 is ALWAYS true for unsigned — infinite loop + OOB access
// Compiler sees no problem here. Neither did the reviewer.

This isn't exotic. I've seen this exact pattern, or variants of it, in codebases written by people who absolutely know what they're doing. The unsigned-integer underflow bug is one of those failure modes that gets past smart reviewers because the code reads naturally. Your brain parses i >= 0 as a sensible lower-bound check, even though it is logically vacuous for an unsigned type. Compiler coverage is uneven: GCC reports the always-true comparison through -Wtype-limits, which -Wextra enables, while Clang leaves its equivalent check off unless you pass -Wtype-limits yourself. Neither compiler says anything about the wrapping subtraction itself.

What I want to work through here is the full picture: why C and C++ made unsigned the default for sizes and counts, the exact places where this decision quietly destroys you beyond just the countdown loop, and the concrete compiler flags, sanitizers, and code patterns that actually catch it before prod does. There's also a real question about whether you should reach for signed integers by default — which the C++ Core Guidelines now answer with a pretty clear opinion — and I'll get into that with actual trade-offs, not just the party line.

Why Sizes Are Unsigned in the First Place

The thing that catches most developers off guard isn't that size_t is unsigned. It's why it was made unsigned in the first place, and how reasonable that decision was given the hardware of the era. When size_t was standardized with ANSI C in the late 1980s, the designers weren't being careless. Many targets had 16-bit or 32-bit address spaces, and squeezing every addressable byte out of them was a real engineering constraint. An unsigned 16-bit integer reaches 65,535 where a signed one stops at 32,767. An unsigned 32-bit integer gives you a range of 0 to ~4.29 billion where a signed one tops out at ~2.14 billion. On a machine whose whole address space is that small, the difference wasn't academic: it decided whether a single object could span more than half of memory. So size_t became unsigned, and the logic was: a size or an index is never negative, so why waste a bit on the sign?

The C++ STL made this permanent. Every standard container reports its size as size_type, which is size_t with the default allocator. std::vector::size() returns it. std::string::length() returns it. operator[] on containers takes it. This wasn't accidental: it was a deliberate choice to match the C memory model. The problem is that this bakes unsigned arithmetic into every loop you write over a container. The moment you do something like vec.size() - 1 on an empty vector, you don't get -1. You get SIZE_MAX, which is 18,446,744,073,709,551,615 on a 64-bit system. The wrap itself is well-defined modular arithmetic, so nothing flags it. The undefined behavior comes one step later, when that value is used as an index and corrupts memory in production.

// This looks innocent. It is not.
std::vector<int> v = {};
for (size_t i = 0; i < v.size() - 1; i++) {
    // v.size() is 0, so v.size() - 1 wraps to SIZE_MAX
    // This loop runs ~18 quintillion times or segfaults
    process(v[i]);
}

// The fix: use a signed cast or restructure the loop
for (int i = 0; i < (int)v.size() - 1; i++) {
    process(v[i]);
}
// Or just use range-based for and avoid the index entirely

Rust made a deliberate call to keep usize unsigned: same reasoning, different outcome. The difference is that in debug builds, Rust panics on integer overflow rather than silently wrapping. So 0usize - 1 crashes your program immediately instead of handing you a garbage value. In release builds it still wraps by default (for performance), but you can use checked_sub to make the intent explicit. Slice and Vec indexing is also bounds-checked in every build mode, so a wrapped index panics instead of reading arbitrary memory, and idiomatic Rust pushes you toward iterators over manual indexing, which sidesteps the problem entirely most of the time. It's not that Rust solved the unsigned size problem. It's that the language and tooling make you confront it early rather than letting it sit dormant in your codebase.

// Rust debug build: this panics immediately
let v: Vec<i32> = vec![];
let x = v.len() - 1; // thread 'main' panicked at 'attempt to subtract with overflow'

// Rust with explicit overflow handling
let x = v.len().checked_sub(1); // returns Option<usize>
match x {
    Some(idx) => println!("Last index: {}", idx),
    None => println!("Empty vector"),
}

The honest take is that unsigned sizes made total sense in 1985, made some sense in 1998 when the STL was standardized, and are mostly legacy baggage today. We're running 64-bit systems where ptrdiff_t can represent offsets up to 9 exabytes. Nobody needs that extra bit of address range anymore. The cost we pay — wrapping arithmetic, compiler warnings about signed/unsigned comparison, the mental overhead of never doing subtraction without thinking twice — isn't worth the benefit. If the STL were designed from scratch today, you'd see a lot of arguments for using ptrdiff_t or a newtype wrapper with explicit overflow semantics. C++20's std::ssize() was added specifically to give you a signed size when you need it, which is basically the committee acknowledging the problem exists without being able to fix the underlying API.
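If you're stuck on a pre-C++20 toolchain, the idea behind std::ssize() is easy to approximate. The sketch below is mine, not something from the standard library: signed_size and last_index are hypothetical names, and the cast assumes the container's size fits in std::ptrdiff_t, which holds for standard containers on mainstream platforms.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for C++20's std::ssize, usable from C++11 onward.
// Assumes the container's size fits in std::ptrdiff_t.
template <class Container>
constexpr std::ptrdiff_t signed_size(const Container& c) {
    return static_cast<std::ptrdiff_t>(c.size());
}

// Index of the last element, or -1 when the container is empty.
// The subtraction happens in a signed type, so it cannot wrap to SIZE_MAX.
template <class Container>
constexpr std::ptrdiff_t last_index(const Container& c) {
    return signed_size(c) - 1;
}
```

The point of the helper is that the conversion happens once, at the boundary, and every subtraction after it behaves the way the code reads.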

The Three Ways Unsigned Sizes Actually Break Your Code

The subtraction underflow case catches almost everyone once. You write what looks like a perfectly normal reverse loop, for (size_t i = v.size() - 1; i >= 0; i--), and it never terminates on its own. The comparison i >= 0 is always true because size_t is unsigned and unsigned values cannot be negative. When i hits 0 and you decrement it, it wraps to SIZE_MAX (18446744073709551615 on typical 64-bit platforms). Your loop counter just became the largest possible number instead of stopping, and the next v[i] is out of bounds. GCC flags the tautology with -Wtype-limits (part of -Wextra), and Clang does the same if you pass -Wtype-limits explicitly, but the warning is easy to miss in a noisy build output. -Wsign-compare does not help here, because the literal 0 is a non-negative constant.

// This loops forever — i wraps to SIZE_MAX when it crosses 0
for (size_t i = v.size() - 1; i >= 0; i--) {
    process(v[i]);
}

// Fix option 1: iterate differently
for (size_t i = v.size(); i-- > 0; ) {
    process(v[i]);
}

// Fix option 2: use ptrdiff_t for any loop that needs to go negative
for (ptrdiff_t i = (ptrdiff_t)v.size() - 1; i >= 0; i--) {
    process(v[i]);
}
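To convince yourself that fix option 1 really is safe on an empty container, it helps to wrap it in a function you can test. This is a minimal sketch with a hypothetical helper name, reversed_copy, that uses the same i-- > 0 idiom shown above.

```cpp
#include <cstddef>
#include <vector>

// Collects the elements of v in reverse order using the "i-- > 0" idiom.
// The comparison uses the value of i before the decrement, so when i is 0
// the loop exits without ever running the body with a wrapped index.
std::vector<int> reversed_copy(const std::vector<int>& v) {
    std::vector<int> out;
    out.reserve(v.size());
    for (std::size_t i = v.size(); i-- > 0; ) {
        out.push_back(v[i]);
    }
    return out;
}
```

Calling it with an empty vector returns an empty vector, and that empty-input case is exactly the test the original loop never got.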

The signed/unsigned comparison trap is nastier because it looks completely benign. You have int n = -1 and you write if (n < v.size()), expecting that -1 is less than any size. It's not, at least not the way the language defines the comparison. The C++ standard says that when you compare an int with a size_t, the signed value gets converted to unsigned first. So -1 becomes SIZE_MAX, and suddenly your check reads "is 18 quintillion less than v.size()?" which is false. This is the kind of bug that passes code review because the intent is obvious to a human but the compiler is doing something different. GCC and Clang both warn about this with -Wsign-compare (GCC enables it for C++ under -Wall, Clang under -Wextra), so if you're ignoring those warnings on a C++ project, this is a good reason to stop.

int n = -1;
std::vector<int> v = {1, 2, 3};

// Bug: n gets converted to unsigned, becomes SIZE_MAX
// This prints nothing, because SIZE_MAX < 3 is false
if (n < v.size()) {
    std::cout << "n is in range\n";
}

// Fix: cast to signed explicitly before comparing
if (n < static_cast<int>(v.size())) {
    std::cout << "n is in range\n"; // now correctly prints
}
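Casting the size to int works when you know the size is small. When you don't want to narrow anything, you can make the comparison itself sign-aware. The helper names below (signed_less, converted_less) are hypothetical; C++20 ships the same idea as std::cmp_less in <utility>, so treat this as a sketch for older standards.

```cpp
#include <cstddef>

// Sign-aware "n < size": a negative n is less than every size,
// and a non-negative n can be converted to size_t without changing its value.
bool signed_less(int n, std::size_t size) {
    return n < 0 || static_cast<std::size_t>(n) < size;
}

// What the plain comparison effectively computes: n is converted to size_t
// first, so -1 turns into SIZE_MAX before the comparison happens.
bool converted_less(int n, std::size_t size) {
    return static_cast<std::size_t>(n) < size;
}
```

With n = -1 and a size of 3, signed_less returns true and converted_less returns false, which is the whole trap in two lines.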

The arithmetic mixing bug is the most dangerous because it can corrupt memory without an obvious crash site. You have a size_t index and an int offset, and you want to step backward through a buffer. When you write size_t pos = base_index + offset and offset is -5, the -5 gets implicitly converted to unsigned before the addition. On a 64-bit system that's adding 18446744073709551611 to your index. As long as base_index is at least 5, modular arithmetic wraps the sum back around to the value you wanted, which is exactly why this survives testing. The moment base_index is smaller than the step, the result is not a small negative number you can test for. It is a value near SIZE_MAX, and using it as an index reads garbage or triggers a segfault far from where the actual logic bug lives. The same conversion also gives wrong answers whenever the mixed expression feeds a comparison or a division. This shows up constantly in audio DSP code and ring buffer implementations where negative offsets are a normal part of the design.

uint8_t buffer[1024];
size_t head = 3;
int lookback = -5; // "go back 5 bytes"

// Bug: the sum is computed as size_t. 3 + (-5) wraps to SIZE_MAX - 1,
// so pos is never negative and a "pos >= 0" guard cannot catch it.
size_t pos = head + lookback;

// Fix: do the arithmetic in a signed type, check the range, then index
ptrdiff_t safe_offset = (ptrdiff_t)head + lookback; // 3 + (-5) = -2
if (safe_offset >= 0 && safe_offset < 1024) {
    uint8_t* ptr = buffer + safe_offset; // skipped here: -2 is out of range
}
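In a real ring buffer I'd rather have that range check in one place than repeated at every call site. Here's a minimal sketch of what that could look like. apply_offset is a hypothetical helper, and it assumes base and size both fit in std::ptrdiff_t and that the offset is small enough not to overflow the signed addition.

```cpp
#include <cstddef>

// Applies a signed offset to an unsigned index.
// Writes the new index to `out` and returns true only when it lies in [0, size).
// Assumes base and size fit in std::ptrdiff_t and base + offset does not overflow.
bool apply_offset(std::size_t base, std::ptrdiff_t offset,
                  std::size_t size, std::size_t& out) {
    const std::ptrdiff_t pos = static_cast<std::ptrdiff_t>(base) + offset;
    if (pos < 0 || pos >= static_cast<std::ptrdiff_t>(size)) {
        return false;  // would land outside the buffer
    }
    out = static_cast<std::size_t>(pos);
    return true;
}
```

The caller gets a size_t back only after the signed result has been proven to be in range, so the unsigned value never carries a hidden negative.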

All three of these share the same root cause: the C/C++ implicit conversion rules are designed around the assumption that you know which operands are signed and unsigned at all times. The actual failure mode in each case is that the code compiles without errors, runs without immediate crashes on most inputs, and only misbehaves on the specific edge case (empty vector, value of -1, backward seek past the start). That's what makes them hard to catch in manual testing. The practical fix is to enable -Wall -Wextra -Wsign-conversion in C++ and treat those warnings as errors in CI. In Rust the second and third traps don't compile at all, because there are no implicit conversions between signed and unsigned integers and every mix needs an explicit cast or conversion. The first trap still exists, but it panics in debug builds, and checked_sub makes the handling explicit.
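You don't have to take the conversion rules on faith. The compiler will confirm them for you at compile time. The following is a small sketch using static_assert. The function name wrapped_zero_minus_one is hypothetical, and the first assertion assumes a mainstream platform where size_t is at least as wide as int.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Trap 1: size_t minus an int literal is still size_t, so the result cannot be negative.
static_assert(std::is_same<decltype(std::size_t{0} - 1), std::size_t>::value,
              "size_t - int yields size_t");

// Traps 2 and 3: converting -1 to size_t produces SIZE_MAX.
static_assert(static_cast<std::size_t>(-1) == SIZE_MAX,
              "-1 converts to the largest size_t value");

// Unsigned wrap is well-defined modular arithmetic, so it is even allowed
// inside a constant expression.
constexpr std::size_t wrapped_zero_minus_one() {
    return std::size_t{0} - 1;
}
static_assert(wrapped_zero_minus_one() == SIZE_MAX, "0 - 1 wraps to SIZE_MAX");
```

If this file compiles, all three statements hold on your platform, and none of them produces a diagnostic on its own.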

Reproducing the Underflow Bug Yourself

The most disorienting part of this bug is that it looks completely reasonable at first glance. You're iterating a vector backwards, starting from the last element. The code compiles clean. Then on an empty container it either hangs forever, segfaults, or silently reads garbage — depending on which language and which build mode you're in.

Here's the minimal C++ repro. Save this as repro.cpp:

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v; // empty — this is the trap

    // size() returns 0, which is size_t (unsigned)
    // 0 - 1 wraps to 18446744073709551615 on 64-bit, so i starts at SIZE_MAX
    // i >= 0 is always true for an unsigned type, so nothing stops the loop
    // The first iteration already reads v[SIZE_MAX], far outside the vector
    for (size_t i = v.size() - 1; i >= 0; i--) {
        std::cout << v[i] << "\n";
    }

    return 0;
}

Build and run it:

g++ -std=c++17 -Wall -Wextra -o repro repro.cpp && ./repro

You'll most likely get a segfault immediately, although out-of-bounds access is undefined behavior, so a hang or garbage output is also possible. Pay attention to what the compiler does and doesn't say. GCC reports that i >= 0 is always true, because -Wextra turns on -Wtype-limits. Clang stays silent with the same flags unless you add -Wtype-limits yourself. Neither of them mentions the wrapping subtraction in the initializer: every operand there is unsigned, so there's no signed/unsigned mismatch to flag, and a forward loop bounded by i <= v.size() - 1 compiles without any diagnostic at all. Add -fsanitize=address,undefined to the flags and the out-of-bounds read gets reported at runtime with a stack trace. That's worth doing during development.
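If you want the failure to be deterministic without a sanitizer, std::vector::at() is the cheapest tool available: it bounds-checks the index and throws std::out_of_range. The sketch below uses two hypothetical helper names (last_element_checked, throws_on_access) to show that the wrapped index is rejected instead of dereferenced.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Reads the element at size() - 1 through at(), which bounds-checks the index.
// On an empty vector the wrapped index is rejected with std::out_of_range
// instead of touching memory outside the vector.
int last_element_checked(const std::vector<int>& v) {
    return v.at(v.size() - 1);
}

// Reports whether the checked access threw for the given vector.
bool throws_on_access(const std::vector<int>& v) {
    try {
        (void)last_element_checked(v);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

This doesn't fix the underflow, it only turns undefined behavior into an exception you can see in a test log. That is still a big step up from a crash that only happens in production.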

The Rust version shows you something more useful: the debug/release behavioral split. Drop this into src/main.rs:

fn main() {
    let v: Vec<i32> = vec![];

    // usize subtraction underflows exactly like C++ size_t
    // but Rust's behavior depends entirely on build mode
    let i: usize = v.len() - 1; // this is the line that matters

    println!("{}", v[i]);
}

Now run it both ways and watch the difference:

# Debug build — panics with an explicit overflow message
cargo run

# Release build: wraps silently to usize::MAX, then panics on the bounds-checked index
# (a wrapped value that happened to land in range would read the wrong element instead)
cargo run --release

In debug mode you get: thread 'main' panicked at 'attempt to subtract with overflow'. That's Rust doing you a favor. Flip to --release and Rust strips those overflow checks for performance — the subtraction wraps to 18446744073709551615 (that's usize::MAX on 64-bit) and only panics when you try to actually index into the empty vec. The failure mode changes entirely based on a build flag, which is the thing that catches people off guard when their tests pass in dev and something explodes in production.

The real lesson here isn't just "unsigned types wrap" — it's that the same logical bug produces three completely different observable behaviors: a compile-time silence in C++, a runtime panic in Rust debug, and a silent wrap-then-crash in Rust release. If you're writing code that iterates backwards over anything, the safest pattern in C++ is to cast explicitly or use a signed loop variable. In Rust, use v.len().checked_sub(1) which returns an Option<usize> and forces you to handle the empty case.

Catching These Bugs Before They Ship: The Toolchain

The most underused safety net I know is already on your machine — you just haven't wired it into your build. Running clang-tidy on a file you know has a signed/unsigned mismatch is genuinely humbling. It catches things that compile cleanly with zero warnings under default g++ settings. The command is simple:

clang-tidy repro.cpp -- -std=c++17

The bugprone-* check family is the place to start. bugprone-too-small-loop-variable flags a loop counter whose type is narrower than the bound it is compared against, and the narrowing-conversion checks flag silently stuffing a size_t into an int, for example in a return statement or an initializer. A condition like i <= container.size() - 1, where size() - 1 wraps to a massive positive number when the container is empty, looks completely innocent in a code review, and whether a given clang-tidy version reports it depends on the checks it ships, so verify against your own repro rather than assuming coverage. Drop this config file in your repo root so everyone on the team gets the same checks automatically:

# .clang-tidy
Checks: 'bugprone-*,cppcoreguidelines-narrowing-conversions'
WarningsAsErrors: 'bugprone-*'
HeaderFilterRegex: '.*'

Setting WarningsAsErrors for the bugprone-* family is opinionated but I'd argue it's correct. If you leave them as warnings, they get ignored in three sprints and you're back where you started. The HeaderFilterRegex: '.*' line matters: by default clang-tidy only shows diagnostics for the file being analyzed, so anything that lives in your own headers stays hidden without it.

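UBSan + AddressSanitizer is your runtime backstop for anything clang-tidy misses statically. Unsigned wrap is technically defined behavior in C++, so -fsanitize=undefined won't report the wrap itself. What the pair will catch is the downstream consequences: out-of-bounds accesses that result from the wrapped index, signed overflow in adjacent code, and misaligned reads. If you build with Clang, -fsanitize=unsigned-integer-overflow is a separate opt-in check that does report unsigned wraps, at the cost of also firing on intentional wraparound such as hash functions. Compile your test builds like this: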

g++ -fsanitize=undefined,address -g -O1 repro.cpp -o repro_san
./repro_san

The -O1 flag is a deliberate choice. Sanitizer instrumentation on top of an unoptimized -O0 build is painfully slow on a real test suite, while -O1 keeps run time reasonable and the stack traces readable. Adding -fno-omit-frame-pointer makes those traces more reliable. Never run this in your release build: the binary is 2-3x slower and the memory overhead from ASan is significant. It belongs in your CI debug job, running against your full test suite, not in what you ship.

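Rust handles this differently and, in my view, more honestly. In debug mode, integer overflow panics by default. But the thing that catches people: release builds silently wrap, same as unsigned arithmetic in C. Setting overflow-checks = true under [profile.release] in Cargo.toml changes that permanently. For a one-off run while you're chasing a production bug involving sizes or counts, this flag gives you the debug-mode panic behavior while keeping release optimizations (expect a full rebuild, since cargo recompiles when RUSTFLAGS changes):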

RUSTFLAGS='-C overflow-checks=on' cargo run --release

Wire the sanitizer step into CI as a separate job, not a build variant of your release pipeline. Here's a minimal GitHub Actions snippet that won't slow down your main build:

jobs:
  sanitizers:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
      - name: Build with UBSan + ASan
        run: |
          # separate build dir so it doesn't pollute release artifacts
          cmake -B build_san -DCMAKE_CXX_FLAGS="-fsanitize=undefined,address -g -O1"
          cmake --build build_san
      - name: Run tests under sanitizers
        run: ctest --test-dir build_san --output-on-failure
        env:
          ASAN_OPTIONS: halt_on_error=1:detect_leaks=1
          UBSAN_OPTIONS: halt_on_error=1:print_stacktrace=1

halt_on_error=1 in the sanitizer options is non-negotiable for CI. ASan already aborts on the first error by default, but UBSan's default is to print the report and keep going, so the process can exit 0, your CI job goes green, and you miss every violation. print_stacktrace=1 for UBSan means you get a useful stack trace instead of a one-liner that just tells you a file and line number with no context.

The Actual Fixes: Patterns That Work in Practice

The thing that bites most people isn't that they don't understand unsigned types — it's that they pick the wrong fix for the wrong context and end up with code that's either unreadable or still subtly broken. I've seen all five of these patterns misapplied, so let me be specific about when each one earns its keep.

Fix 1: ptrdiff_t or ssize_t for loop indices

ptrdiff_t is the signed type designed for pointer differences and index arithmetic. If you're writing a loop that might need to go negative — like walking backward with an offset calculation — reach for it instead of int. The practical reason: int is 32 bits even on 64-bit platforms, so on a container holding more than ~2 billion elements, your index silently overflows. ptrdiff_t matches the platform's pointer width.

// ssize_t is POSIX, not standard C++ — fine for Linux/macOS, not MSVC
#include <sys/types.h>

for (ssize_t i = static_cast<ssize_t>(v.size()) - 1; i >= 0; i--) {
    // i will correctly hit -1 and stop. With size_t this wraps to SIZE_MAX.
    process(v[i]);
}

The gotcha with ssize_t: it's POSIX-only. On Windows with MSVC, you don't have it. Use ptrdiff_t from <cstddef> instead — it's standard C++ and works everywhere.

Fix 2: Cast early, cast once, use int throughout

This is my go-to for legacy code I can't restructure. One cast at the top of the function buys you signed arithmetic everywhere below it, and it makes the intent obvious to the next person reading the code. The alternative — casting inline at every comparison — is where people introduce subtle bugs because they forget one spot.

void process_range(const std::vector<Item>& v, int offset) {
    // Cast once here. Don't scatter static_cast throughout the function.
    const int size = static_cast<int>(v.size());

    for (int i = size - offset; i < size; i++) {
        if (i < 0) continue; // now this actually works — int can be negative
        do_thing(v[i]);
    }
}

The honest trade-off: if your vector can exceed INT_MAX elements (~2.1 billion on most platforms), this truncates. That's basically never a real problem, but if you're writing infrastructure code that handles enormous datasets, use ptrdiff_t instead of int.
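If the truncation risk bothers you but you still want int everywhere below the cast, you can make that single cast a checked one. This is a sketch with a hypothetical helper, size_to_int, written in the spirit of gsl::narrow: it throws instead of silently truncating.

```cpp
#include <climits>
#include <cstddef>
#include <stdexcept>

// Converts a size to int, throwing instead of truncating when it does not fit.
// Hypothetical helper; the exception type is an arbitrary choice for this sketch.
int size_to_int(std::size_t n) {
    if (n > static_cast<std::size_t>(INT_MAX)) {
        throw std::range_error("size does not fit in int");
    }
    return static_cast<int>(n);
}
```

You would then write const int size = size_to_int(v.size()); at the top of the function and keep the rest of Fix 2 unchanged.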

Fix 3: Reverse iterators sidestep the problem entirely

For clean reverse traversal with no index arithmetic, this is the cleanest option regardless of C++ version. No casts, no signed/unsigned tension, no off-by-one risk from a subtraction:

for (auto it = v.rbegin(); it != v.rend(); ++it) {
    process(*it);
}

// Or with range-based for in C++20 using std::views::reverse
#include <ranges>
for (auto& item : v | std::views::reverse) {
    process(item);
}

Where iterators fall short: when you actually need the index value — say, you're computing positions relative to the end, or you need to pass the index to another function. In that case you're back to index arithmetic and you need one of the other fixes. Don't contort your logic to avoid an index when you genuinely need one.

Fix 4: std::ssize() — the right answer if you're on C++20

std::ssize() was added specifically for this problem. It returns a signed ptrdiff_t, which means index arithmetic just works without any ceremony:

#include <iterator> // std::ssize lives here

// This is the cleanest C++20 solution for index-based reverse iteration
for (auto i = std::ssize(v) - 1; i >= 0; i--) {
    process(v[static_cast<size_t>(i)]); // cast back for the subscript operator
}

// Also works for size comparisons — no more -Wsign-compare warnings
if (std::ssize(v) > some_signed_int) { ... }

One small annoyance: the subscript operator on std::vector still takes size_type (unsigned), so you need to cast back when you actually index. Some compilers warn on this. I usually accept it — it's one cast at the usage site, and the comparison arithmetic stays clean throughout.

Fix 5: Rust's checked_sub — make the failure explicit

Rust makes this harder to get wrong by design: usize arithmetic panics in debug builds and wraps in release, so raw subtraction on len() fails loudly the first time a debug build or test feeds it an empty collection, rather than silently in production. The idiomatic fix is checked_sub, which forces you to handle the zero-length case:

// This panics in debug if v is empty: v.len() - 1 underflows
// Don't do this:
let last = v[v.len() - 1];

// Do this instead:
if let Some(last_idx) = v.len().checked_sub(1) {
    process(&v[last_idx]);
}

// Or more idiomatically — just use .last()
if let Some(item) = v.last() {
    process(item);
}

// For a counted-down loop, use a range and reverse it
for i in (0..v.len()).rev() {
    process(&v[i]);
}

The .rev() approach is the Rust equivalent of reverse iterators in C++ — it's idiomatic, compiles to tight code, and doesn't involve any subtraction. checked_sub shines when you're computing an index from external input and wrapping is a correctness issue, not just a performance one.

Picking the right fix for the situation

Reverse traversal, clean code, no index needed: iterators (rbegin/rend) in C++, .rev() in Rust. Zero cognitive overhead.
Index arithmetic in C++20 codebases: std::ssize(). It's in the standard, it's explicit, and it removes the cause of the -Wsign-compare warnings instead of just silencing them.
Subtracting from a usize in Rust: checked_sub when you genuinely might hit zero, .saturating_sub() when zero is a valid fallback, raw subtraction only when you've proven the operand is nonzero.
Legacy C++ you can't restructure: one static_cast<int> at the top of the function, then use int everywhere. It's unglamorous but readable and easy to code-review.
Pointer-width arithmetic or huge containers: ptrdiff_t over int — same signed semantics, platform-appropriate width.

The Signed vs Unsigned Size Debate: Where the Industry Actually Lands

The thing that trips most people up is assuming this debate has a clean answer. It doesn't — but the major authorities do lean heavily in one direction, and the reasoning is worth understanding rather than just copying the conclusion.

The C++ Core Guidelines are unusually direct here. ES.100 (don't mix signed and unsigned arithmetic), ES.102 (use signed types for arithmetic), ES.106 (don't try to avoid negative values by using unsigned), and ES.107 (don't use unsigned for subscripts, prefer gsl::index) all push toward signed arithmetic as the default. The reasoning isn't philosophical. Unsigned types wrap on underflow and silently produce a huge positive number, and because that wrap is defined behavior, compilers and sanitizers can't treat it as an error by default. Signed overflow is undefined, which lets the optimizer assume it never happens and lets UBSan trap it when it does. In practice, mixing int and size_t in arithmetic is where most bugs hide. You subtract two sizes, get a negative conceptual result, and the type system hands you 18 quintillion instead of -1.

Google's public C++ Style Guide lands in the same place on paper: it tells you not to use unsigned types just to assert that a value is never negative. In practice any large codebase still meets size_t at every STL boundary, because fighting every STL API would be exhausting, and sanitizer runs are what catch the wraps. That's the honest version of the approach: the discipline lives in the tooling, not the type declaration. If you're not running UBSan and ASan in your CI, you don't actually have that safety net. The type just looks safe.

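# What that kind of tooling discipline actually requires in CI
clang++ -fsanitize=undefined,address,unsigned-integer-overflow -g -O1 your_code.cpp -o out
# Then run your tests. unsigned-integer-overflow is a Clang-only, opt-in check;
# without it the wrap is legal and only the resulting out-of-bounds access is reported.
# Without this step, size_t gives you false confidence.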

Rust picks a different compromise that I think is the most intellectually honest. usize is the correct type for indexing and sizes, and the language doesn't pretend otherwise, but debug builds panic on overflow by default, and the standard library provides checked_sub, saturating_add, and wrapping_add as explicit choices rather than implicit behavior. You're forced to decide what you want to happen when arithmetic goes out of range. That's a better API design than letting the type silently wrap.

// Rust: the explicit choice pattern
let a: usize = some_length();
let b: usize = some_offset();

// This panics in debug if b > a — catches the bug
let diff = a - b;

// This is the intentional version: you're saying "I know this might fail"
let diff = a.checked_sub(b).expect("offset exceeded length");

// Or saturate to zero if negative conceptually makes no sense
let diff = a.saturating_sub(b);
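Nothing stops you from borrowing the same explicit-choice pattern in C++. The standard library doesn't provide these for size_t as of C++20, so the two functions below are hypothetical helpers that I've named after their Rust counterparts. Treat them as a sketch.

```cpp
#include <cstddef>

// Subtracts b from a, reporting failure instead of wrapping.
// Returns false and leaves `out` untouched when b > a.
bool checked_sub(std::size_t a, std::size_t b, std::size_t& out) {
    if (b > a) {
        return false;
    }
    out = a - b;
    return true;
}

// Subtracts b from a, clamping at zero instead of wrapping.
std::size_t saturating_sub(std::size_t a, std::size_t b) {
    return a > b ? a - b : 0;
}
```

The value of writing v.size() through one of these is the same as in Rust: the call site states what should happen when the subtraction would go below zero.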

The practical middle ground that I've landed on after debugging enough wrapping bugs: use size_t or usize when an API demands it, but convert to a signed type the moment you do any arithmetic that could go below zero. In C++, that means casting to ptrdiff_t or int64_t immediately after receiving the value, not at the point of subtraction. The cast at the operation site is too easy to forget. std::ssize() (added in C++20) gives you a signed size directly from containers, which removes the friction entirely.

// C++20: just use ssize() and stop fighting it
#include <iterator>
#include <vector>

std::vector<int> v = {1, 2, 3, 4, 5};

// size() returns size_t — subtraction here is unsigned and risky
for (auto i = v.size() - 1; i >= 0; --i) { /* infinite loop bug */ }

// ssize() returns ptrdiff_t — signed, subtraction works correctly
for (auto i = std::ssize(v) - 1; i >= 0; --i) { /* works */ }

Setting Up a .clang-tidy Config That Actually Catches This

Most teams either skip clang-tidy entirely or run it with the default config and wonder why it doesn't catch the unsigned wrapping bugs they keep shipping. The default config is nearly useless for this class of problem — you need to explicitly opt into the checks that matter. Here's the minimal .clang-tidy file I use on projects where unsigned size bugs have actually bitten us:

# .clang-tidy
---
Checks: >
  -*, 
  bugprone-too-small-loop-variable,
  bugprone-narrowing-conversions,
  cppcoreguidelines-narrowing-conversions

WarningsAsErrors: >
  bugprone-too-small-loop-variable,
  bugprone-narrowing-conversions

HeaderFilterRegex: '.*'

CheckOptions:
  - key: bugprone-too-small-loop-variable.MagnitudeBitsUpperLimit
    value: '16'

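The -* at the top is intentional: it disables everything first, then re-enables only what you want. If you don't do this, you'll get hundreds of style warnings from unrelated checks and everyone will start ignoring the output. The MagnitudeBitsUpperLimit option for bugprone-too-small-loop-variable controls which loop counter types are considered "too small" relative to the bound: counters with more magnitude bits than the limit are ignored. As I read the clang-tidy documentation, 16 is also the default, so a uint16_t iterating over a size_t-bounded container gets flagged but a 32-bit int counter does not. Raise the value to 31 if you want int counters flagged too. The two narrowing-conversions entries are aliases of the same check, so listing both is harmless but redundant.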

To do a one-shot audit across your whole project, run this from the repo root:

# Requires a compile_commands.json in ./build (cmake -B build -DCMAKE_EXPORT_COMPILE_COMMANDS=ON)
find . -name '*.cpp' -not -path './build/*' \
  | xargs clang-tidy --config-file=.clang-tidy -p build 2>&1 \
  | grep -E 'warning:|error:' \
  | grep -oE '\[[^]]+\]$' \
  | sort | uniq -c | sort -rn \
  | head -20

The pipe chain is the part most people skip. Raw clang-tidy output is a wall of text with file paths, line numbers, and multi-line context, which is unreadable at scale. Keeping only the diagnostic lines, extracting the check name from the trailing brackets, and running sort | uniq -c | sort -rn collapses it down to a frequency-ranked list of which checks fire most. Note that -p build points clang-tidy at the compilation database. Appending -- -std=c++17 to an xargs command would turn every file name into a compiler argument, because xargs adds the file names at the end. The first time I ran this on a 40K-line codebase, the top result was the same narrowing conversion pattern copy-pasted into 23 different places. That's actionable. A raw dump is not.

The gotcha with xargs clang-tidy: if you don't have a compile_commands.json, clang-tidy falls back to guessing compiler flags and will miss half the real issues because it can't resolve your include paths. With CMake, generate it once:

cmake -B build -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
ln -s build/compile_commands.json .

For the pre-commit hook, you want it fast — running clang-tidy on the entire project on every commit will get the hook disabled within a week. The trick is staging-aware filtering. Add this to your .pre-commit-config.yaml:

repos:
  - repo: local
    hooks:
      - id: clang-tidy-staged
        name: clang-tidy (staged cpp files only)
        language: system
        # Only runs on staged .cpp files, not the whole tree
        entry: bash -c 'clang-tidy --config-file=.clang-tidy "$@" -- -std=c++17' --
        types: [c++]
        files: \.cpp$
        pass_filenames: true

The pass_filenames: true combined with types: [c++] means pre-commit automatically filters to only the staged C++ files and passes them as arguments to your entry script. You get real-time feedback on exactly what you're about to commit without waiting 30 seconds for a full project scan. One thing that trips people up: if a staged file includes a header that has the actual bug, clang-tidy will still report it — which is correct behavior, but can feel surprising when the warning points to a file you didn't touch.

What About Other Languages?

Python sidesteps this entire class of bug by making integers arbitrary precision and signed by default. You can't get a len() that wraps around to 4 billion. What you can get is a logic error where some calculation produces a negative number and you pass it to something expecting a non-negative size, like indexing a list with a computed negative index. Python treats a negative index as counting from the end, raises IndexError when it is out of range, and clamps out-of-range slice bounds, so the worst case is quietly reading the wrong element. It never hands you a gigantic unsigned value.

Go made a deliberate choice that I respect: len() and cap() return int, which is signed. The rationale usually given is the one this article keeps running into: unsigned arithmetic causes too many subtle bugs for it to be the default. In practice I've written a lot of Go and almost never hit size-related underflow. The one place it still bites you is if you're calling into a C library via cgo and accepting a uint32 or size_t back. Then you're right back in the same territory as C. But for pure Go code, the designers made the right call.

JavaScript's size bugs are a different flavor entirely. array.length returns a non-negative integer, but all JS numbers are IEEE 754 doubles under the hood. You don't get unsigned wrapping; the failure mode to watch for is floating-point precision loss, since integers are only exact up to 2^53. Array lengths themselves are capped at 2^32 - 1, so length never gets anywhere near that limit, but index math on large computed values can. More practically, you can accidentally assign a negative value to array.length and get a RangeError, or do math that produces NaN and then use that as an index. Neither is the silent-gigantic-number problem from C. They're loud failures or obvious NaN propagation.

Java returns a signed 32-bit int from both array.length and List.size(). Overflow is theoretically possible if you have a collection with more than 2.1 billion elements — but that's a machine-RAM problem before it's a correctness problem. Underflow in the C/C++ sense essentially doesn't happen. The closest analog is when developers subtract two size() results and forget that int subtraction can go negative, then pass that to something that treats it as a count. That's a logic bug, but it's a signed-integer logic bug — the type system at least represents the negative value correctly instead of wrapping it.

The pattern is clear: this specific underflow bug is almost exclusive to C, C++, and Rust because they're the languages that expose the machine-level size_t or usize directly as the primary type for sizes and lengths. That's not accidental — it mirrors how the hardware actually addresses memory, and both C and Rust are deliberately close to the metal. The tradeoff is that you get efficiency and explicit control, but you also inherit the footgun. Every other mainstream language puts a signed abstraction in front of that type, and that single decision eliminates a whole category of bugs without any measurable performance cost.

Quick Reference: Unsigned Size Fixes at a Glance

The Fixes, Side by Side

I keep this list open in a second monitor when reviewing PRs. The mistakes repeat themselves — countdown loops written by developers who learned C in a positive-index-only bubble, mixed arithmetic that silently wraps, and CI configs that would never catch any of it. Here's the exact pattern for each situation.

Countdown Loops by Language Version

C++17 — use ptrdiff_t or a reverse iterator. The trap is writing for (size_t i = n-1; i >= 0; i--) — when i hits 0 and decrements, it wraps to SIZE_MAX and your loop runs forever (or crashes). Two clean options:

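// Option A: ptrdiff_t index (signed; same width as size_t on mainstream targets,
// though the standard doesn't guarantee it)
for (std::ptrdiff_t i = static_cast<std::ptrdiff_t>(vec.size()) - 1; i >= 0; --i) {
    // i is known to be non-negative here, so the cast back is safe
    process(vec[static_cast<std::size_t>(i)]);
}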

// Option B: reverse iterator — no index math at all, prefer this when you don't need i
for (auto it = vec.rbegin(); it != vec.rend(); ++it) {
    process(*it);
}
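
If you want to keep an unsigned index, there is a third pattern that isn't one of the two options above: test the index before decrementing it. The sketch below is only an illustration; countdown_indices is a made-up helper that records which indices the loop body sees, so the behavior on an empty container is easy to check.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration: returns the indices visited, in order,
// by the "i-- > 0" countdown idiom over a container of size n.
std::vector<std::size_t> countdown_indices(std::size_t n) {
    std::vector<std::size_t> visited;
    // The condition compares the old value of i with 0 and then decrements,
    // so the body only ever sees n-1, n-2, ..., 0. For n == 0 it never runs.
    // On exit the final i-- wraps i to SIZE_MAX, which is well defined for
    // unsigned types and harmless because i goes out of scope immediately.
    for (std::size_t i = n; i-- > 0;) {
        visited.push_back(i);
    }
    return visited;
}
```

For a real loop, replace the push_back with process(vec[i]). The index stays a size_t, so there is no cast at the subscript.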

C++20 — std::ssize() makes this cleaner. std::ssize() returns a signed type (typically ptrdiff_t) so the cast is baked in:

// std::ssize() is in <iterator>, no manual cast needed
for (auto i = std::ssize(vec) - 1; i >= 0; --i) {
    process(vec[i]);
}
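
If you're stuck on C++17, a small stand-in gets you the same ergonomics. This is a sketch, not the standard library's implementation: the name ssize17 is made up, and the return type follows the one the standard specifies for std::ssize on containers.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical C++17 stand-in for std::ssize (the name ssize17 is invented).
// Returns the container's size as a signed type at least as wide as ptrdiff_t.
template <class C>
constexpr auto ssize17(const C& c)
    -> std::common_type_t<std::ptrdiff_t,
                          std::make_signed_t<decltype(c.size())>> {
    using R = std::common_type_t<std::ptrdiff_t,
                                 std::make_signed_t<decltype(c.size())>>;
    // Assumes the size fits in R, which holds for standard containers.
    return static_cast<R>(c.size());
}
```

With that in place, ssize17(vec) - 1 on an empty vector is a plain -1, and the i >= 0 loop condition behaves the way it reads.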

Rust — checked_sub() or just reverse the iterator. Rust panics in debug on underflow and wraps in release, so neither is "safe by default" for logic correctness. The idiomatic fix is to never do the subtraction at all:

// Preferred: .rev() on the iterator, so there is no index subtraction to underflow
for item in vec.iter().rev() {
    process(item);
}

// When you genuinely need the index value going downward:
for i in (0..vec.len()).rev() {
    process(&vec[i]);
}

// If you're computing offsets and need an explicit guard:
if let Some(prev) = current_index.checked_sub(1) {
    process(&vec[prev]);
}
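
The same guard is easy to approximate in C++17. The standard library has no checked_sub, so the function below is a hypothetical helper that mirrors the shape of the Rust call using std::optional.

```cpp
#include <cstddef>
#include <optional>

// Hypothetical helper (not a standard library function): mirrors the shape of
// Rust's usize::checked_sub. Returns std::nullopt instead of wrapping around.
constexpr std::optional<std::size_t> checked_sub(std::size_t a, std::size_t b) {
    if (b > a) {
        return std::nullopt;  // a - b would wrap to a huge value
    }
    return a - b;  // safe: b <= a, so the result is in range
}
```

Usage looks much like the Rust version: if (auto prev = checked_sub(current_index, 1)) { process(vec[*prev]); }. The empty case becomes a branch you have to write instead of a wrap you have to remember.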

Mixed Signed/Unsigned Arithmetic — Cast Before, Not After

The single most common mistake I see in code review: someone computes an offset with mixed types, gets a warning, and casts the result to silence it. That's wrong. By the time you cast the result, the unsigned wrap has already happened. Cast to signed before the arithmetic touches unsigned values:

// WRONG: offset is converted to size_t before the addition, so the cast is a no-op
size_t count = vec.size();
int offset = -3;
size_t result = static_cast<size_t>(count + offset); // not UB, but wraps to a huge index when count < 3

// RIGHT — bring everything into signed space first
ptrdiff_t signed_count = static_cast<ptrdiff_t>(vec.size());
ptrdiff_t result = signed_count + offset; // safe, no wrap
if (result >= 0 && result < signed_count) {
    process(vec[result]);
}
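
The RIGHT pattern above can be wrapped in a small function so the bounds check can't be forgotten at a call site. This is a minimal sketch: index_from_end is a hypothetical name, and it assumes the container size fits in ptrdiff_t, which holds for std::vector.

```cpp
#include <cstddef>
#include <optional>

// Hypothetical helper: computes size + offset for a negative offset and
// returns it only if it is a valid index into a container of `size` elements.
std::optional<std::size_t> index_from_end(std::size_t size, std::ptrdiff_t offset) {
    if (offset >= 0) {
        return std::nullopt;  // size + offset would be at or past the end
    }
    // Move into signed space before any arithmetic touches the unsigned value.
    // Assumption: size fits in ptrdiff_t.
    const auto signed_size = static_cast<std::ptrdiff_t>(size);
    const std::ptrdiff_t result = signed_size + offset;  // cannot overflow: offset < 0 <= signed_size
    if (result < 0) {
        return std::nullopt;  // the offset reaches before the first element
    }
    return static_cast<std::size_t>(result);
}
```

A caller then writes if (auto idx = index_from_end(vec.size(), offset)) { process(vec[*idx]); } and the wrapped value never exists in the first place.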

CI Enforcement That Actually Catches This

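Warnings without enforcement are just noise. The combo that works: UBSan in debug builds to catch runtime wraps, and clang-tidy in PR checks to catch the patterns statically before they even run. One caveat: unsigned wraparound is well-defined behavior, so plain -fsanitize=undefined does not report it. With Clang you have to opt in to the unsigned-integer-overflow check explicitly; GCC has no equivalent. Here's a minimal CMake + clang-tidy setup:

# CMakeLists.txt: debug build with UBSan
if(CMAKE_BUILD_TYPE STREQUAL "Debug")
  set(SANITIZERS "undefined")
  # unsigned-integer-overflow is Clang-only and not part of the "undefined" group
  if(CMAKE_CXX_COMPILER_ID MATCHES "Clang")
    string(APPEND SANITIZERS ",unsigned-integer-overflow")
  endif()
  target_compile_options(myapp PRIVATE -fsanitize=${SANITIZERS} -fno-omit-frame-pointer)
  target_link_options(myapp PRIVATE -fsanitize=${SANITIZERS})
endif()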

# .clang-tidy — the checks that catch signed/unsigned issues
Checks: >
  -*,
  bugprone-implicit-widening-of-multiplication-result,
  bugprone-signed-char-misuse,
  cppcoreguidelines-narrowing-conversions,
  performance-no-int-to-ptr
WarningsAsErrors: "*"

The WarningsAsErrors: "*" line is non-negotiable — without it, developers ignore the output after two weeks. Run this in your PR pipeline with something like run-clang-tidy -p build/ -checks=... src/ and fail the build on any hit. UBSan in debug catches the cases that slip through static analysis, especially in template-heavy code where the types only resolve at instantiation time. Together they cover about 90% of the unsigned size bugs I've seen in real codebases.

Originally published on techdigestor.com. Follow for more developer-focused tooling reviews and productivity guides.
