DEV Community: David Álvarez Rosa

One Hundred Thousand Reads

David Álvarez Rosa — Sat, 18 Jul 2026 10:53:50 +0000

This site just passed one hundred thousand reads. It started as a public notebook that almost no one read. The plan hasn't changed: one post a month, no quick takes, only deep dives into things I care about. If you have read even one, thank you. That number is you.

Visitors stay five and a half minutes on average. Reddit and Hacker News send almost nine in ten of them; every search engine put together sends fewer than one in twenty.

The traffic comes in spikes: a post hits a front page, pulls a few thousand reads in a day, then goes quiet. The three most-read posts owe over 41,000 reads to a few such days.¹

Thank you, again, for reading.

Optimizing a Lock-Free Ring Buffer leads with 17,169 reads, followed by the Fundamental Theorem of Calculus (12,575) and Devirtualization and Static Polymorphism (11,961). ↩

Tuning a Server for Benchmarking

David Álvarez Rosa — Fri, 26 Jun 2026 16:50:06 +0000

Optimizing code starts with measuring it, and a measurement is only useful if it is repeatable: a 2% improvement is invisible under 5% of noise. Yet on an untuned machine the same binary can easily run several percent faster or slower between runs. In this post we take a tiny benchmark and tune the machine step by step, re-measuring after every change, until runs become deterministic.
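
To make the "2% under 5% of noise" point concrete, here is a minimal sketch that times a stand-in workload several times and reports the relative spread between the fastest and slowest run. The names `relative_spread`, `workload`, and `measure` are hypothetical, and the busy loop is only a placeholder for the code under test; it is not the benchmark used in the post.

```cpp
#include <algorithm>
#include <chrono>
#include <vector>

// Relative spread of a set of timings: (max - min) / min.
// An improvement smaller than this value is hard to tell apart from noise.
auto relative_spread(const std::vector<double>& samples) -> double {
  const auto extremes = std::minmax_element(samples.begin(), samples.end());
  return (*extremes.second - *extremes.first) / *extremes.first;
}

// Hypothetical workload standing in for the code being measured.
auto workload() -> unsigned long {
  volatile unsigned long sum = 0;  // volatile keeps the loop from being removed
  for (unsigned long i = 0; i < 10000000UL; ++i) sum = sum + i;
  return sum;
}

// Time `runs` executions of the workload and return each duration in ms.
auto measure(int runs) -> std::vector<double> {
  std::vector<double> samples;
  for (int i = 0; i < runs; ++i) {
    const auto start = std::chrono::steady_clock::now();
    workload();
    const auto stop = std::chrono::steady_clock::now();
    samples.push_back(
        std::chrono::duration<double, std::milli>(stop - start).count());
  }
  return samples;
}
```

Calling `relative_spread(measure(20))` before and after each tuning step gives a rough number to watch: if it reports 0.05, a claimed 2% speedup is still inside the noise.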

Self-Hosting on the Dark Web

David Álvarez Rosa — Mon, 01 Jun 2026 10:56:24 +0000

This site is now reachable over Tor as a hidden service, at a .onion address that resolves only inside the Tor network.¹ Tor encrypts your traffic in layers and relays it through a chain of relays picked from thousands of volunteer-run servers, so that no single party can link who you are to what you are doing; a hidden service extends that anonymity to the server itself.

It's built by the nonprofit Tor Project, which advances human rights and freedoms through free software and open networks, so that anyone can use the internet free from tracking, surveillance, and censorship. The network only works because people use it, so consider supporting them or running a relay---your contribution helps millions stay safe and private online every day.

The hidden service

Install Tor and point a hidden service at a local port. Edit /etc/tor/torrc

HiddenServiceDir /var/lib/tor/blog/
HiddenServicePort 80 127.0.0.1:8080

The directory must be a dedicated, Tor-owned path---not your web root.² Restart Tor and read the address it generates

$ sudo systemctl restart tor@default
$ sudo cat /var/lib/tor/blog/hostname
dhevt6e4rtgbtr3jh53xrpwmgtilkah6nyjujocsspssrsexc7omxhid.onion

Serving the site

Tor forwards the onion's port 80 to 127.0.0.1:8080, so the web server just needs to listen there. Add an nginx server block for it---no TLS, no HTTP/2, no QUIC, since Tor speaks plain TCP and provides its own encryption.

server {
  listen 127.0.0.1:8080;
  server_name dhevt6e4rtgbtr3jh53xrpwmgtilkah6nyjujocsspssrsexc7omxhid.onion;

  root /srv/tor.david.alvarezrosa.com;
  index index.html;
  error_page 404 /404/index.html;

  location / {
    try_files $uri $uri/ =404;
  }
}

Reload nginx and the site is live on Tor.

Building for the onion

A static site bakes its base URL into absolute links, so a clearnet build would point visitors back to the clearnet domain even when served over Tor. The fix is to build a second copy with the onion as its base URL

$ hugo --minify --baseURL="http://dhevt6e4rtgbtr3jh53xrpwmgtilkah6nyjujocsspssrsexc7omxhid.onion/"

The deploy pipeline does this automatically: every push builds the site once per target---clearnet and Tor---and rsyncs each to its own web root, so the two stay in sync without any manual work.³

That's it. Read this site over Tor at dhevt6e4rtgbtr3jh53xrpwmgtilkah6nyjujocsspssrsexc7omxhid.onion.

Open it in the Tor Browser. There is no certificate authority, no DNS, and no exposed IP---the address is derived directly from a public key, and the connection is end-to-end encrypted by Tor itself. ↩
Tor stores the service's private key and hostname file here and insists on owning it (chmod 700, user debian-tor). Point it at your site files and Tor refuses to start. ↩
See First Steps on a New Server for the underlying machine; the full configuration lives in my homelab repository, and the site's own repository holds the GitHub Actions workflow that builds and deploys the Tor copy. ↩

[Boost]

David Álvarez Rosa — Sun, 24 May 2026 09:58:02 +0000

Devirtualization and Static Polymorphism

David Álvarez Rosa — Sat, 23 May 2026 21:36:21 +0000

Ever wondered why your "clean" polymorphic design underperforms in benchmarks? Virtual dispatch enables polymorphism, but it comes with hidden overhead: pointer indirection, larger object layouts, and fewer inlining opportunities.

Compilers do their best to devirtualize these calls, but it isn't always possible. On latency-sensitive paths, it's beneficial to manually replace dynamic dispatch with static polymorphism, so calls are resolved at compile time and the abstraction has effectively zero runtime cost.

Virtual dispatch

Runtime polymorphism occurs when a base interface exposes a virtual method that derived classes override. Calls made through a Base& are then dispatched to the appropriate override at runtime. Under the hood, compilers typically create a virtual table (vtable) for each polymorphic class and add a pointer (vptr) to that vtable in each instance.

On a virtual call, the generated code loads the vptr, selects the right slot in the vtable, and performs an indirect call through that function pointer. The drawbacks are that the extra vptr increases object size and that the target of the indirect call is only known at runtime. This blocks inlining, can increase branch mispredictions, and reduces cache efficiency.
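
The size cost and the runtime dispatch can both be checked in a few lines. This is a minimal sketch with hypothetical types (`Plain`, `Poly`, `PolyChild`); the `static_assert` reflects how mainstream ABIs implement virtual functions, not something the C++ standard guarantees.

```cpp
// Two hypothetical types with the same data member; only Poly is polymorphic.
struct Plain {
  int value = 0;
  auto get() const -> int { return value; }
};

struct Poly {
  int value = 0;
  virtual ~Poly() = default;
  virtual auto get() const -> int { return value; }
};

struct PolyChild : Poly {
  auto get() const -> int override { return value + 1; }
};

// Which get() runs depends on the dynamic type of the argument,
// so this call is dispatched through the vtable.
auto read(const Poly& poly) -> int { return poly.get(); }

// Assumption: the implementation stores a hidden vptr in each Poly object,
// as common ABIs do, which makes it larger than its non-virtual twin.
static_assert(sizeof(Poly) > sizeof(Plain), "vptr adds to object size");
```

Passing a `PolyChild` to `read` returns `value + 1`, while passing a `Poly` returns `value`, even though both calls go through the same `const Poly&` parameter.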

The best way to observe this phenomenon is by inspecting the assembly¹ code emitted by the compiler for a minimal example

class Base {
public:
  auto foo() -> int;
};

auto bar(Base* base) -> int {
  return base->foo() + 77;
}

For a non-virtual member function foo like in the example above, the free function bar issues a direct call

bar(Base*):
        sub     rsp, 8
        call    Base::foo()  // Direct call
        add     rsp, 8
        add     eax, 77
        ret

However, declaring foo as virtual changes bar's assembly into an indirect, vtable-based call

bar(Base*):
        sub     rsp, 8
        mov     rax, QWORD PTR [rdi]  // vptr (pointer to vtable)
        call    [QWORD PTR [rax]]     // Virtual call
        add     rsp, 8
        add     eax, 77
        ret

Devirtualization

Sometimes the compiler can statically deduce which override a virtual call will hit. In those cases, it devirtualizes the call and emits a direct call instead (skipping the vtable). For example, devirtualization is straightforward² when the runtime type is clearly fixed

struct Base {
  virtual auto foo() -> int = 0;
};

struct Derived : Base {
  auto foo() -> int override { return 77; }
};

auto bar() -> int {
  Derived derived;
  return derived.foo();  // compiler knows this is Derived::foo
}

The compiler is able to devirtualize even through a base pointer, as long as it can track the allocation and prove there is only one possible concrete type. The problem is that with traditional compilation, object files are created per translation unit (TU)---compiled and optimized in isolation. The linker simply stitches those objects together, so cross-TU optimizations are inherently limited. That's where compiler flags are useful.

-fwhole-program
: tells the compiler "this translation unit is the entire program." If no class derives from Base in this TU, the compiler is free to assume nothing ever does, and can devirtualize calls on Base.

-flto
: link-time optimization. Keeps an intermediate representation in the object files and optimizes across all of them at link time, effectively treating multiple source files as a single TU.
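
The difference between a trackable allocation and an opaque reference can be sketched as follows. The types `Shape` and `Triangle` and both functions are hypothetical; whether a given compiler actually devirtualizes either call depends on its version and flags, so the comments describe what an optimizer is allowed to do, not what it is guaranteed to do.

```cpp
#include <memory>

struct Shape {
  virtual ~Shape() = default;
  virtual auto sides() const -> int = 0;
};

struct Triangle : Shape {
  auto sides() const -> int override { return 3; }
};

// The allocation is visible here, so the dynamic type of *shape is provably
// Triangle and the optimizer may replace the virtual call with a direct one.
auto local_sides() -> int {
  std::unique_ptr<Shape> shape = std::make_unique<Triangle>();
  return shape->sides();
}

// The dynamic type is unknown inside this TU. Without extra information
// (such as -flto, -fwhole-program, or final) the call normally stays virtual.
auto opaque_sides(const Shape& shape) -> int { return shape.sides(); }
```

Comparing the assembly of `opaque_sides` with and without `-fwhole-program` (or across object files with `-flto`) is one way to see whether the extra visibility changes the emitted call.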

On the language side, final is a lightweight way to give the compiler the same guarantee for specific methods

class Base {
public:
  virtual auto foo() -> int;
  virtual auto bar() -> int;
};

class Derived : public Base {
public:
  auto foo() -> int override;  // override
  auto bar() -> int final;     // final
};

auto test(Derived* derived) -> int {
  return derived->foo() + derived->bar();
}

Here, foo() can still be overridden, so derived->foo() remains a virtual call. However, bar() is marked as final, so the compiler emits a direct call even though it's declared virtual in the base

test(Derived*):
        push    rbx
        sub     rsp, 16
        mov     rax, QWORD PTR [rdi]
        mov     QWORD PTR [rsp+8], rdi
        call    [QWORD PTR [rax]]       // Virtual call
        mov     rdi, QWORD PTR [rsp+8]
        mov     ebx, eax
        call    Derived::bar()          // Direct call
        add     rsp, 16
        add     eax, ebx
        pop     rbx
        ret
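
`final` can also be applied to a whole class, which gives the same guarantee for every virtual method at once. A minimal sketch, with hypothetical names `Codec` and `Fast`:

```cpp
struct Codec {
  virtual ~Codec() = default;
  virtual auto id() const -> int { return 0; }
};

// final on the class: nothing can derive from Fast, so a virtual call made
// through a Fast& or Fast* has exactly one possible target.
struct Fast final : Codec {
  auto id() const -> int override { return 7; }
};

// The static type is final, so the compiler may emit a direct call.
auto via_final(const Fast& codec) -> int { return codec.id(); }

// The static type is the open base class, so this normally stays virtual.
auto via_base(const Codec& codec) -> int { return codec.id(); }
```

Both functions return the same value for a `Fast` object; the difference is only in how the call is emitted, which is why the parameter type on hot paths matters.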

Static polymorphism

When the compiler can't devirtualize, one option is to use static polymorphism instead. The canonical tool for this is the Curiously Recurring Template Pattern³ (CRTP). With CRTP, the base class is templated on the derived class, and invokes methods on it via static_cast---no virtual keyword involved

template <typename Derived>
class Base {
public:
  auto foo() -> int {
    return 77 + static_cast<Derived*>(this)->bar();
  }
};

class Derived : public Base<Derived> {
public:
  auto bar() -> int {
    return 88;
  }
};

auto test() -> int {
  Derived derived;
  return derived.foo();
}

With -O3 optimization, the compiler inlines everything and constant-folds the result. No vtable, no vptr, no indirection. Fully optimized⁴ call.

test():
        mov     eax, 165  // 77 + 88
        ret
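
One consequence of CRTP is that every instantiation of the base template is a separate type, so code that should work across several derived types has to be a template itself. A minimal sketch, assuming hypothetical classes `Scaled`, `Small`, and `Large`:

```cpp
template <typename Derived>
class Scaled {
public:
  auto twice() -> int {
    return 2 * static_cast<Derived*>(this)->value();
  }
};

class Small : public Scaled<Small> {
public:
  auto value() -> int { return 1; }
};

class Large : public Scaled<Large> {
public:
  auto value() -> int { return 100; }
};

// Scaled<Small> and Scaled<Large> share no base class, so a function that
// accepts either one is a template instead of taking a common base reference.
template <typename Derived>
auto twice_plus_one(Scaled<Derived>& object) -> int {
  return object.twice() + 1;
}
```

Each call to `twice_plus_one` is resolved at compile time for its argument type, so there is still no vtable involved, at the price of one instantiation per derived class.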

Deducing this. C++23's deducing this keeps the same static-dispatch model but makes it easier to write. Instead of templating the entire class (and writing Base<Derived> everywhere), you template only the member function that needs access to the derived type, and let the compiler deduce self from *this

class Base {
public:
  auto foo(this auto&& self) -> int { return 77 + self.bar(); }
};

class Derived : public Base {
public:
  auto bar() -> int { return 88; }
};

This yields identical optimized code: for a call on a Derived object, self is deduced as a reference to Derived, so the call to bar is resolved statically and inlined.

Assembly generated with gcc at -O3 on x86-64. Similar results were observed with clang on the same platform. ↩
The compiler emits a direct call to Derived::foo (or inlines it), because derived cannot have any other dynamic type. ↩
The curiously recurring template pattern is an idiom where a class X derives from a class template instantiated with X itself as a template argument. More generally, this is known as F-bound polymorphism, a form of F-bounded quantification. ↩
The trade-off is that each Base<Derived> instantiation is a distinct, unrelated type, so there's no common runtime base to upcast to. Any shared functionality that operates across different derived types must itself be templated. ↩