DEV Community: Slim

ClickHouse system tables ate my disk (and the fix)

Slim — Thu, 09 Jul 2026 15:13:13 +0000

TL;DR.

A monitoring alert said I was dropping check results. The disk was 100% full. My actual data was 20 MB. ClickHouse had quietly written 12 GB of logs about itself, mostly system.text_log and system.trace_log. Those same logs also burn CPU at idle. The fix is a few lines of ClickHouse config that disable the noisy logs and slow the metrics collector. Full config is below.

The alert

One morning a critical alert fired: UptimepageResultsLost. Its description is blunt: storage write failures or dropped results, checks run but results are not persisting. Then it did something worse than fire once. It cleared, fired again, cleared, and fired again, over and over.

I run Uptimepage, an uptime monitoring service. "Results not persisting" means the one thing customers pay for, recording whether their sites are up, might be failing. So it had my full attention.

The good news first: no data was lost. The write path retries, and every failed write was caught by a retry. I keep a counter for results that actually get dropped, and it stayed at zero the whole time. But something was clearly wrong underneath.

The real signal: a full disk

The first real signal came from Postgres, which had crashed and restarted a couple of minutes earlier:

FATAL: could not write lock file "postmaster.pid": No space left on device

The disk was 100% full. Postgres could not write its lock file, crashed, and recovered on its own through WAL replay. ClickHouse, where I store raw check results, was rejecting inserts with its own version of the same complaint:

Code: 243. DB::Exception: Cannot reserve 1.00 MiB, not enough space:
While executing WaitForAsyncInsert. (NOT_ENOUGH_SPACE)

So the alert was a symptom. The real problem was a full disk. That reframes the question: I do not store much, so what filled it?

Leak one: old Docker images

docker system df gave the first half of the answer:

88 Docker images, only two of them in use. Every deploy pulls a fresh image, and nothing pruned the old ones, so they piled up for weeks. A docker image prune -af reclaimed about 10 GB and took the disk off the ceiling.

That stopped the bleeding. But 13.8 GB of volumes is a lot for a service whose data I thought was tiny. That number turned out to be the real story.

Leak two: ClickHouse logging about itself

I went into ClickHouse and asked the obvious question, which table is big:

SELECT database, formatReadableSize(sum(bytes_on_disk)) AS size, sum(rows) AS rows
FROM system.parts
WHERE active
GROUP BY database
ORDER BY sum(bytes_on_disk) DESC

The answer stopped me:

Database	Size	Rows
`system`	11.8 GiB	577,340,927
`monitor`	19.7 MiB	1,813,884

monitor is my data: every check result and rollup I keep. About 20 MB. The system database, ClickHouse's own diagnostic tables, was 11.8 GiB. Nearly all of the storage was ClickHouse logging about itself.

Breaking system down by table:

Two tables did most of the damage. system.text_log (5.29 GiB) is a copy of the server's own log output written into a table. system.trace_log (3.02 GiB) is the query profiler, which samples running queries. Both are handy when you are actively debugging ClickHouse. Neither is worth multiple gigabytes when I am not. And text_log is off by default, so something in my setup had switched it on.

The fix

Two parts: reclaim the space now, and stop it coming back.

Reclaim now. The system log tables are throwaway diagnostics, not real data. Truncate them:

TRUNCATE TABLE IF EXISTS system.text_log;
TRUNCATE TABLE IF EXISTS system.trace_log;
-- and the rest of the system.*_log tables

That gave back 12 GB at once and took the disk from 74% down to 41%.

Stop the regrowth. Truncating is a one-time cleanup. Without a cap, the logs fill right back up. The durable fix is ClickHouse config, added under /etc/clickhouse-server/config.d/. I watch ClickHouse through Grafana, not these tables. So I disable almost all of them and keep only query_log and part_log, both bounded, for the rare hands-on debugging session:

<clickhouse>
    <!-- Sample async metrics every 60s, not every second. -->
    <asynchronous_metrics_update_period_s>60</asynchronous_metrics_update_period_s>

    <!-- Disable the log tables. remove="1" on an absent table is a no-op,
         so this list is safe to paste as-is across ClickHouse versions. -->
    <asynchronous_metric_log remove="1"/>
    <asynchronous_insert_log remove="1"/>
    <backup_log remove="1"/>
    <error_log remove="1"/>
    <crash_log remove="1"/>
    <metric_log remove="1"/>
    <query_metric_log remove="1"/>
    <query_thread_log remove="1"/>
    <query_views_log remove="1"/>
    <session_log remove="1"/>
    <text_log remove="1"/>
    <trace_log remove="1"/>
    <opentelemetry_span_log remove="1"/>
    <zookeeper_log remove="1"/>
    <processors_profile_log remove="1"/>
    <latency_log remove="1"/>
    <background_schedule_pool_log remove="1"/>

    <!-- Keep query_log and part_log, bounded, for on-hand debugging. -->
    <query_log>
        <ttl>event_date + INTERVAL 3 DAY DELETE</ttl>
        <max_size_rows>1048576</max_size_rows>
    </query_log>
    <part_log>
        <ttl>event_date + INTERVAL 3 DAY DELETE</ttl>
        <max_size_rows>1048576</max_size_rows>
    </part_log>
</clickhouse>

remove="1" disables a system log completely. The <ttl> element on query_log and part_log adds a TTL and keeps the table's default partitioning, so you do not have to restate the whole engine. ClickHouse picks up the change after a restart. Altinity's "system tables ate my disk" note covers the same ground and is worth a read. If you would rather keep the diagnostics, give every log a short <ttl> instead of disabling it.

Bonus: lower CPU, not just disk

Those log tables are written constantly, and on a small server that steady write load shows up as CPU. A long-running ClickHouse issue tracks the server using noticeable CPU at zero load, #60016. People there report dropping from 40 to 70% CPU down to about 1.5% after disabling the logs and slowing the async-metrics collector. So those two settings, the asynchronous_metrics_update_period_s line and the remove="1" block, pay off twice: less disk and less CPU.

The real lesson: alert on the cause, not the effect

Here is the part that stings. I had an alert for "results are being dropped." I did not have an alert for "the disk is filling up." So a full disk is a slow, predictable problem that builds over days. But it only reached me as a sudden downstream symptom, after Postgres had already crashed once.

A downstream alert like "results lost" is not a substitute for watching the resource that actually runs out. I added the missing one: a plain host disk-space alert on node_filesystem_avail_bytes, firing at 80% and 90% used, well before anything starts failing. That is the alert that should have caught this on day one.

Key takeaways

ClickHouse system log tables are unbounded by default and can dwarf your real data. Mine were 11.8 GB against 20 MB of actual data.

system.text_log and system.trace_log are the usual offenders. text_log is off by default, so check whether something enabled it.

Cap them in config: remove="1" to disable a log, or <ttl> plus <max_size_rows> to bound the ones you keep. Truncate to reclaim space right away.

It is not just disk. The same logs burn CPU at idle on small servers. Disabling them, plus asynchronous_metrics_update_period_s = 60, took reporters in ClickHouse issue #60016 from 40 to 70% CPU down to about 1.5%.

Anything that pulls artifacts on a schedule, Docker images in my case, needs matching cleanup or it becomes a slow disk leak.

Alert on the cause (disk space), not only the effect (dropped writes). The cause gives you days of warning; the effect gives you minutes.

This happened on Uptimepage, the uptime monitor I run and dogfood.

Postgres vs ClickHouse? I use both. 4 tricks from the split.

Slim — Thu, 09 Jul 2026 05:55:03 +0000

TL;DR. My uptime monitor keeps config and incidents in Postgres and check results in ClickHouse. The split is one rule: does a row ever change? Config gets edited, so it wants Postgres transactions and constraints. A check result is written once and never touched again, so it goes to ClickHouse, where the right codec, a careful sort key, and a per-row TTL make billions of rows cheap. Four tricks and a bonus below, useful even if you only ever run one database.

Every monitor I run writes a row every 20 seconds, from every region, and never stops. One monitor is about 4,300 rows a day, per region. Multiply by every monitor on the platform and the rows only ever go up.

That stream would slowly crush a normal Postgres table, and watching it is the whole product. So the check results do not live in Postgres. They live in ClickHouse. Everything else, the monitors and incidents and teams, lives in Postgres. The interesting part is the line between them.

People ask why not one database. "Postgres vs ClickHouse" is the wrong question, because the two are not fighting over the same job. Here is the line I draw, and the one rule under all of it: split your data by how it is written, not by which engine is faster.

1. Does the row change? That picks the database

Forget benchmarks for a second. One question sorts a table into one store or the other: after you write a row, will you ever change it?

one monitor, two stores:

  you edit a monitor  ->  Postgres   (targets, incidents, team)
                          the row changes, has constraints, lives in a transaction

  a probe checks it   ->  ClickHouse (check_results)
                          one row, written once, never updated

A monitor is a row that changes. You toggle it off, edit the URL, change the interval. So it lives in Postgres:

CREATE TABLE targets (
    id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name          TEXT NOT NULL,
    check_spec    JSONB NOT NULL,
    interval_secs INTEGER NOT NULL CHECK (interval_secs >= 10),
    enabled       BOOLEAN NOT NULL DEFAULT true,
    updated_at    TIMESTAMPTZ NOT NULL DEFAULT now()
);

That CHECK means a 3-second interval can never reach the table, no matter which part of the app tried to write it. This is Postgres doing the thing it is best at: a small set of rows that must stay correct while many callers change them at once.

A check result is the opposite. It is written once when a probe finishes, and then it never changes. Nothing ever updates it, so it does not need any of that:

CREATE TABLE check_results (
    org_id      UUID,
    target_id   UUID,
    region      LowCardinality(String),
    timestamp   DateTime('UTC') CODEC(DoubleDelta, ZSTD(1)),
    status      Enum8('up' = 1, 'down' = 2, 'degraded' = 3, 'error' = 4),
    duration_ms UInt32 CODEC(T64, ZSTD(1))
) ENGINE = MergeTree

No updated_at, no foreign key, no transaction. Just an append-only stream that grows forever. That is the shape ClickHouse is built for and the shape that slowly hurts Postgres.

Notice the types too. status is an Enum8, one byte on disk, not the string "up". region is LowCardinality, stored once in a dictionary and referenced by a small id instead of repeating the text on every row. On a table that only ever grows, a byte saved per row is a byte saved times billions.

2. The right codec turns a day of timestamps into almost nothing

Once the results are in a column store, the type on each column is most of the compression, and the default setting wastes a lot of space.

Look at that timestamp again. A monitor checks every 20 seconds, so for one monitor a day of timestamps is a run of 20, 20, 20, 20, over and over. DoubleDelta stores the change in the change. For a steady interval the first delta is a constant 20 and the second delta is zero, so each row after the first packs down to about a bit. The codec is not a ClickHouse invention: it comes from Facebook's Gorilla time-series paper, built for exactly this, measurements taken at a steady rate.

timestamp   DateTime('UTC')  CODEC(DoubleDelta, ZSTD(1)),
duration_ms UInt32           CODEC(T64, ZSTD(1)),
dns_ms      Nullable(UInt16) CODEC(T64, ZSTD(1)),

I also store the timestamp at whole seconds on purpose. The smallest interval is 20 seconds, so no two checks for one monitor ever land in the same second, and the sub-second detail lives in duration_ms where it belongs. Whole seconds that step by a fixed amount compress far harder than milliseconds would.

The latency columns get T64 instead, because they are small integers that stay in a small range, a different shape from a steady clock. T64 crops the unused high bits off a block of values, so a UInt32 that never climbs past a few thousand milliseconds stops paying for all 32 bits. That is the trick worth copying, and it is not the specific codec names. It is that a timestamp ticking by a fixed step and a latency staying in a small range are two different shapes, and telling the database which is which does the work.

3. Put the tenant in the sort key, not the partition

I got this one wrong first, so learn it from my mistake instead of your own.

This is a multi-tenant app, so every row carries an org_id and almost every query filters by it. My first schema partitioned by org, one slot per customer, because it felt tidy. ClickHouse turned that into a flood of tiny parts, the merges could not keep up, and startup got slower the more customers I had. I ripped it out. A partition is not a folder for tidiness. It is a physical unit ClickHouse merges and expires, and you want few large ones, not many small ones.

Here is what it should be:

) ENGINE = MergeTree
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY (org_id, target_id, region, timestamp)

org_id leads the ORDER BY, so a per-org query walks the sorted primary index and reads only that org's slice. But it stays out of PARTITION BY. The rule I follow now: partition by something low-cardinality that you also delete by, here the day, and put the high-cardinality tenant key in the sort order. Sort key answers "find this org fast". Partition answers "drop old data cheaply", which is the next trick.

4. Make retention a number in a row, not a schema change

Different plans keep history for different lengths of time. The clumsy way is a migration or a cleanup job per plan. The clean way is to make the retention window a column:

ttl_days UInt16 DEFAULT 30 CODEC(ZSTD(1))
) ENGINE = MergeTree
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY (org_id, target_id, region, timestamp)
TTL timestamp + toIntervalDay(ttl_days)

Each row carries its own ttl_days, stamped from the org's plan at the moment it is written. A free plan keeps 30 days, a paid plan keeps more, and changing that needs no ALTER, no migration, no backfill. The next write just stores a different number. The daily partitions line up with the TTL, so ClickHouse expires old data by dropping whole parts, close to free, and a whole column of the same 30 compresses away to nothing. The flexibility costs no space.

One more piece makes reading that history cheap. A materialized view rolls raw checks into per-minute and per-hour summaries as they land:

CREATE MATERIALIZED VIEW check_results_1m
ENGINE = AggregatingMergeTree
ORDER BY (org_id, target_id, region, minute)
AS SELECT
    org_id, target_id, region,
    toStartOfMinute(timestamp)  AS minute,
    countState()                AS total_checks,
    countIfState(status = 'up') AS up_checks,
    avgState(duration_ms)       AS avg_duration_ms
FROM check_results
GROUP BY org_id, target_id, region, minute;

So a dashboard showing 30 days across a thousand monitors reads a few thousand minute-buckets, not millions of raw rows. Recent views read the minute rollup, long history reads an hour rollup, and the raw rows underneath expire on their own TTL. The firehose is there when you need to drill into one bad minute, and left alone the rest of the time.

One more, because everyone hits it: make inserts idempotent

Here is the trick I wish someone had told me first. My agents batch check results and send them to ClickHouse. A network blip after the server commits but before my side gets the ack means the batch retries and sends the exact same block again. Without protection, that double-counts every row in it, and your uptime numbers quietly drift.

ClickHouse has a fix built in, but for a plain (non-Replicated) MergeTree it is off until you turn it on:

SETTINGS index_granularity = 8192,
         non_replicated_deduplication_window = 1000;

The server hashes each inserted block and remembers the last 1,000 hashes. A retry sends an identical block, the hash matches, and the server drops it instead of appending it again. The retry becomes safe to do blindly, so the client stays simple: send, and if unsure, send again. Make the window bigger than the most blocks a single retry could resend, and you stop having to worry about duplicate writes.

So, one database or two?

For almost every app, one. If your data fits in Postgres and stays fast, a second database is a cost you should not pay: two schemas, two clients, two things to back up and think about.

Reach for the second store only when one table stops looking like the rest of your tables. For me that table was check_results. It is written once and never edited, it grows without end, and every question I ask it is a summary over a time range. That is a different shape from my monitors and my incidents, so it wanted a different database.

The rule I would give my past self: do not split by "which database is faster". Split by how the data is written. Rows that change and must stay correct want Postgres. An append-only stream you only ever summarize wants a column store like ClickHouse. Most of the tricks above are that one idea pushed down into the schema.

Uptimepage is open source, AGPL-3.0, and both schemas are in the repo: github.com/uptimepage/uptimepage. The probe that writes those rows is its own post, and the wider build story, one binary and two databases, is here. Or start free on the hosted tier and point a check at something.

4 things writing an HTTP prober in Rust taught me

Slim — Tue, 07 Jul 2026 14:07:03 +0000

TL;DR. An uptime prober is a strange HTTP client, so I swapped reqwest for raw hyper. It needs the opposite of what a normal client gives you: no connection pool, so the cold path gets measured instead of hidden; every phase timed, even when connect fails; a hard SSRF block on the URL the user typed; and exact failure reasons instead of one flat error. Four tricks below, each worth copying even if you never build a monitor.

I built an uptime checker in Rust. Like most people, I used reqwest first. Then I dropped it and moved down to raw hyper.

Not because reqwest is slow. It is great, and for app code you should use it. But a prober is a strange kind of HTTP client. It makes one request, and its whole job is to measure that request and report exactly what happened. A normal client is built to do the opposite: make requests fast and hide the messy details. Every trick below is a messy detail I needed to keep.

Here are four things building it taught me. Each one is a habit worth copying, even if you never write a monitor.

1. A connection pool lies to a monitor

The first thing reqwest gives you is a connection pool, and it is the first thing I had to remove.

A pool keeps TCP connections open and reuses them. The second request to a host skips DNS, skips the TCP handshake, skips TLS, and comes back much faster. For app code that is a free speed boost. For a monitor it is a lie. If I reuse a warm connection, the "connect time" I report is near zero, because I did not connect, I borrowed. A user watching a slow TLS handshake slowly rise in one region would see nothing, because the pool hid the handshake.

So the connector makes exactly one request per connection, then drops it:

/// No pooling: the caller drives exactly one request over the returned
/// stream, then drops it.
pub(crate) async fn timed_connect(
    p: &ConnectParams<'_>,
    host: &str,
    port: u16,
    is_https: bool,
) -> Result<TimedConnection, ConnectError> {

Every check pays the full cost of a fresh connection, because the full cost is the thing I am trying to measure. The lesson is general: if you are timing a request, a warm pool is measuring the wrong thing. Reuse is a feature that erases exactly the numbers a monitor exists to show.

2. Put the timings inside the error

Splitting a request into DNS, TCP, TLS, and time to first byte is easy on the happy path. The interesting question is what happens when connect fails.

The naive design loses everything on failure. You get an Err and no numbers, so a hung TLS handshake looks the same as a dead DNS server. Both are just "failed." That is the worst moment to have no data.

The fix is small, and I now use it everywhere: make the error carry the phases that finished before the one that broke.

pub(crate) enum ConnectError {
    Dns(anyhow::Error),
    NoAddrs { dns_ms: u16 },
    Connect { err: io::Error, dns_ms: u16 },
    Tls { err: io::Error, dns_ms: u16, connect_ms: u16 },
}

A TLS failure still reports how long DNS and TCP took. So when a cert handshake hangs, the chart still shows DNS was 12ms, connect was 45ms, and then TLS took the rest. The shape of the error tells you what broke. You are never left guessing which layer stalled, even on the path where everything went wrong. reqwest gives you one flat Error at the very end. Here the error type holds the answer.

one probe, one fresh connection, each phase on its own clock:

  DNS         TCP connect      TLS handshake     request + TTFB
  +-----------+----------------+-----------------+---------------->
   12ms        45ms             (hung)            never reached

  TLS failed. The error still carries dns_ms=12, connect_ms=45,
  so you know exactly which layer stalled, not just "it failed".

3. The SSRF check goes after DNS, not before

This is the one most people building a fetch-any-URL feature get wrong, and it is the scary one.

A checker fetches whatever URL the user typed. That is the feature. It is also a classic SSRF engine. Someone signs up, points a check at http://169.254.169.254/latest/meta-data/, and now your prober is reading cloud credentials from inside your own network and showing them the reply. Point it at http://10.0.0.5:6379 and it is a port scanner for your private subnet.

The trap is thinking you can block this by checking the hostname. You cannot. The classic bypass is DNS rebinding: the name looks public and passes your check, but it resolves to 127.0.0.1. The block has to happen after resolution, on the real IP you are about to dial:

let addrs: Vec<SocketAddr> = p
    .resolver
    .resolve_addrs(host)
    .await
    .map_err(ConnectError::Dns)?
    .into_iter()
    .filter(|ip| p.ssrf_guard.allow(*ip))   // block loopback, private, link-local, metadata
    .map(|ip| SocketAddr::new(ip, port))
    .collect();

Resolve first, filter every resolved IP, then connect only to what survives. And because each redirect hop reconnects through this same guarded path, a Location: header pointing at 169.254.169.254 gets rejected exactly like a directly typed one. The safety sits in the single place all connections pass through, so there is nothing to forget on the redirect path. If you ever fetch a user-supplied URL server-side, this is the check that matters, and the hostname version is not it.

4. Stop grepping your error strings

When a check goes red the user gets one line, and that line has to be right. "Something went wrong" trains people to ignore alerts. "Certificate expired" tells them what to fix in five seconds.

The tempting way to produce that line is to match on error message text. It is a trap: those strings change between library versions and read differently on every OS. The message is not an API. So instead the connector downcasts to the real error and reads the reason straight out of it:

fn tls_reason(io: &io::Error) -> &'static str {
    use rustls::CertificateError;
    let Some(rustls::Error::InvalidCertificate(cert)) =
        io.get_ref().and_then(|e| e.downcast_ref::<rustls::Error>())
    else {
        return "tls";
    };
    match cert {
        CertificateError::Expired => "certificate expired",
        CertificateError::UnknownIssuer => "certificate not trusted",
        CertificateError::NotValidForName => "certificate hostname mismatch",
        CertificateError::Revoked => "certificate revoked",
        _ => "certificate invalid",
    }
}

An expired cert, an untrusted issuer, and a hostname mismatch are three different problems with three different fixes, and the user should never have to guess which one they have. The io::ErrorKind and the rustls error already carry the truth. The trick is to refuse to flatten it into a string and then re-parse the string you just made.

So, reqwest or not?

For your program? Almost certainly reqwest. Going down to hyper means you own redirects, decompression with a size cap, timeout budgeting per hop, and the request-target rules for HTTP/1 versus HTTP/2. That is real code with real edge cases, and every one is a place reqwest would have been correct for free. If I needed to fetch a config file at startup, writing this connector would be a mistake.

It is worth it here for one reason: measuring and guarding the request is the product, not a detail of it. When the request itself is the thing you sell, you want to own every millisecond and every IP it touches.

Uptimepage is open source, AGPL-3.0, and the whole probe path is in the repo: github.com/uptimepage/uptimepage. If you want the wider build story, one binary and two databases, that is a separate post. Or just start free on the hosted tier and point a check at something.

Email bombing through uptime monitoring pages

Slim — Mon, 06 Jul 2026 13:14:34 +0000

Cover photo by Brian J. Tromp on Unsplash.

I build an uptime monitoring tool. Part of my job is to attack it before someone else does. This post is about one attack that most people in this space do not talk about: using monitoring tools to flood someone's inbox with email.

I will show you how it works from the attacker side. Then I will show you how to stop it from the builder side. If you run any tool that sends email to an address a user typed in, this is your problem too.

What email bombing is

Email bombing is simple. You flood one inbox with so many messages that the person cannot use it. Hundreds of emails in a few minutes. When the attacker uses signup forms to do it, people call it subscription bombing or list bombing. Same idea. Anti-abuse groups describe it as a denial-of-service attack on your inbox, and IT teams saw a rise in 2026.

The goal is usually not the flood itself. The flood is a cover.

Say an attacker just stole your card and bought something. Your bank sends an email or a text: "Did you make this payment?" If your inbox has hundreds of new emails in a few minutes, you will never see that one bank message. By the time you dig it out, the money is gone. The flood buys the attacker time and quiet. Security teams have watched attackers use this trick to hide real fraud.

It is not always about money. Some ransomware crews use email bombing as step one. They flood a worker with mail, then call and pretend to be the IT help desk: "We see the spam, let us help you fix it." The panic is the way in. The flood softens the target for the real attack. The US cyber agency CISA documented this play: email bombing plus a fake help-desk chat on Microsoft Teams, used by the Black Basta ransomware crew to get in. Former members kept using it through 2025.

Why uptime pages make a good weapon

Photo by Moritz Erken on Unsplash.

Here is the part that surprised me the first time I looked.

Most uptime and status tools let you add an email address for alerts. You type an address, the tool sends a "confirm your email" message, and you click a link to turn alerts on. Normal stuff.

Now look at it as an attacker. The tool will send an email to any address I type. I do not need to own that address. I type the victim's email and hit save. The confirm email goes to the victim, not to me.

One email is not a bomb. But there are many of these tools. And inside each tool I can make many monitors, each with its own alert email. So I write a small script. Sign up, add the victim's address to fifty monitors, move to the next tool, repeat. The victim gets a wave of "please confirm" emails from services they have never heard of.

The mean part is where these emails come from. They come from real companies with a good name. They pass the checks that tell Gmail a sender is real (SPF and DKIM, if you want the terms). So they land in the main inbox, not in spam. One study in 2026 found that most of these emails were not marked as spam and reached the inbox. A normal spam filter usually will not save the victim, because on paper none of it is spam.

The builder side: how not to be the tool that floods people

I will be honest. When I first tested my own tool, it had this hole. The confirm email had no real limit and no clear owner. So I fixed it. Here is what actually helped, in order.

Rate limits are the floor. Cap how many confirm emails one address can get per day across your whole platform, not just per account. If one attacker opens twenty accounts, a per-account limit does nothing. Count by the target address.

Say who and why. My old confirm email just said "confirm this address." Now it says who added it and to what: "Acme Inc added this address to a monitor." If a stranger gets one, at least they can see what is happening.

Give a fast way out. Put a one-click "this was not me, stop" link in the email. One click kills that address on your tool. No login, no support ticket. There is a standard for this, RFC 8058, so mail apps can show a stop button right in the message.

Block the obvious traps. Do not allow role addresses like postmaster@ or abuse@. They are shared mailboxes, not a person who asked for alerts. And do not allow your own company domains, because a tool that emails itself can start a mail loop.

Skip the email when you can. If the person adding the address is logged in with that same email through Google or GitHub, you already know they own it. Mark it confirmed and send nothing.

None of this is clever. It is boring plumbing. But boring plumbing is what keeps your name off the list of services used in someone's email bomb.

If you are the one getting bombed

Do not delete everything in a panic. That is exactly what the attacker wants. Somewhere in that pile there may be one real email that matters, like a bank alert or a password reset you did not ask for.

First, search your inbox for words like "payment", "login", "password", or your bank's name. Handle the real one first. Call your bank if you see a charge you did not make.

Then use the stop and unsubscribe links in the junk. Most real services put them there. It is slow work, but it cuts the flood at the source.

If it keeps coming, tell your email provider. Gmail and others can filter a flood once they see the pattern. And if someone calls you "from IT" right after the flood starts, be careful. That call can be part of the attack.

One last thing

I like uptime tools. I build one. But any tool that will email a stranger on request can be turned into a small weapon, and most builders never test for it. If you run one of these, spend an afternoon trying to bomb yourself. You will learn more from that hour than from any security checklist, and you will find the holes before someone else does.

More reading: how I built this monitor, and how to report abuse to us.

Building an uptime monitor in Rust: one binary, two databases, 130K checks/sec per core

Slim — Tue, 30 Jun 2026 14:22:09 +0000

I spent the last few months building Uptimepage, an open-source uptime monitor and status page written in Rust. This post is the build story: the decisions that shaped it, the parts I rewrote, and the numbers that came out the other side.

The whole thing ships as one self-contained binary of about 23 MB, plus Postgres and ClickHouse. You can docker compose up and self-host it, or use the hosted tier. Source is AGPL-3.0 on GitHub.

Why one binary

A status page is a small surface with a lot of moving parts behind it: a scheduler, probe workers, an HTTP client, a time-series writer, an incident detector, an alerting fan-out, a web UI, a JSON API. The usual answer is a handful of services and a message bus between them.

I went the other way. One process, one binary, the same image whether it runs the control plane or a remote probe. No queue to operate, no version skew between services, no "which container is wedged" at 3am. The cost is that you have to be careful about what shares a thread and what can stall what. Most of the engineering below is about keeping those boundaries clean inside a single process.

Stack: Rust 1.95 (edition 2024), Tokio, Axum 0.8, Askama for compile-time HTML templates, HTMX 2 for partial swaps so there is no SPA framework to ship. The API stays the single source of truth because every UI mutation hits the same /api/v1/* endpoint a script would.

Two databases, on purpose

Monitors are low-cardinality relational data that gets mutated by API calls: targets, regions, channels, incidents, plans. That is Postgres. Check results are append-only, high-cardinality, and almost always queried by time range. That is ClickHouse.

Trying to force one of those into the other is where uptime monitors usually fall over. Putting billions of check results in Postgres turns every dashboard query into a sequential scan. Putting your relational config in ClickHouse means fighting its update model forever. So: Postgres 18 for the world, ClickHouse 26.3 for the firehose. Both run their migrations at process startup, so there is no separate migrator to forget.

The HTTP client I did not want to write

The first version used a popular high-level HTTP client. It worked, but a monitor is a weird HTTP workload: you connect once per target per interval, you never reuse the connection, and you care about the timing of each phase more than the body.

So I dropped down to hyper and hyper-util with rustls and built a connector that times DNS resolution, TCP connect, and the TLS handshake as separate numbers, then runs the request over hyper::client::conn and aborts the connection task the moment the body is read. Each result carries dns_ms, connect_ms, tls_ms, and ttfb_ms as distinct columns, which is what makes "it got slow but it is the DNS, not your server" a thing the dashboard can actually say.

The rewrite paid for itself. On a single core the client sustains around 130K checks/sec at saturation, roughly 7.7 microseconds per check. That was a 44 to 56 percent throughput gain over the old path. A chunk of it was a url::parse call hiding in the redirect policy that cost 7.5 percent on its own and just vanished. Two cores get to about 153K. Scaling goes sub-linear past four cores because of shared HTTP/2 connection state and the pool mutex, which is a fine problem to have for this workload.

These numbers come from a laptop and a loadtest binary, so I treat them as regression detection, not capacity planning. The headline in-process run pushed 252K requests/sec sustained at p99 393ms. The design goal is around 50K concurrent in-flight checks per node with under 50ms p99 of per-check overhead on top of the network.

One heap, not a timer per target

The naive scheduler spawns a timer task per monitor. That falls apart at fleet size: thousands of tasks, thousands of wakeups, memory that grows with the number of targets.

Instead there is a single driver task that owns one BinaryHeap<Reverse<Due>>, a min-heap keyed by the next due Instant for the whole fleet. Memory stays flat in fleet size. Each target gets a deterministic jitter offset hashed from its UUID so a thousand monitors on a 60s interval do not all fire on the same tick. Generation and sequence counters mean a re-scheduled target supersedes its stale heap entry instead of double-firing. The registry refresh that pulls config from Postgres runs on its own task with exponential backoff, so a Postgres hiccup never stalls dispatch: the scheduler keeps running on what it last knew.

Failure isolation inside one process

Because everything is one process, one bad target cannot be allowed to take down the rest. Three patterns do most of that work. Per-host circuit breakers trip when a host keeps failing, so it fails fast with circuit_open instead of tying up a worker on a timeout, then probes half-open after a cooldown. A per-tenant host throttle bulkhead caps how many checks can be in flight against one host at once; over the cap, a check is recorded as throttled and degraded rather than piling on, and it never pages. And singleflight on RDAP collapses domain-expiry checks for the same domain across many tenants into one upstream probe, with sticky last-good state so a flaky registrar does not flip the monitor red.

The worker pool itself is a task-per-dispatch gated by a semaphore sized to a max-concurrency setting, with an SSRF guard filtering resolved IPs before any connect.

Modeling time series in ClickHouse

The check_results table is a MergeTree ordered by (org_id, target_id, region, timestamp). The org_id leads the sort key so each tenant gets a sparse-index slice, but it is deliberately kept out of the partition key (partition is by day) so we do not end up with millions of partitions.

The columns are where the storage savings live:

Timestamps are DateTime('UTC') with CODEC(DoubleDelta, ZSTD(1)). Check intervals are near-constant, so DoubleDelta crushes the gaps to almost nothing.
Numeric phase columns (duration_ms, dns_ms, connect_ms, tls_ms, ttfb_ms, response code and size) use CODEC(T64, ZSTD(1)).
region and agent_id are LowCardinality(String), status is an Enum8.

Retention is per row. There is a ttl_days column with TTL timestamp + toIntervalDay(ttl_days), stamped from the org's plan at write time. A plan that buys longer history needs zero schema change: new rows just carry a bigger number.

On top of the raw table sit two AggregatingMergeTree materialized views: a per-minute rollup kept 30 days and an hourly rollup kept 13 months, both holding quantilesState for p50/p95/p99 and per-status counts. Reads route by range: anything inside 30 days hits the minute rollup, older ranges hit the hour rollup, and raw reads are capped at a 90-day span. A dashboard asking for "last 24h p99 latency" never touches a raw row.

Regional probes without a second brain

You can run probes in multiple regions, but I did not want each region to be its own little system. So an agent is the same binary in agent mode, running as a stateless probe with no database, no web, no alerting. Adding a region adds execution capacity, never a second control plane.

An agent pulls its region's config from the control plane with ETag/304 handling, serves its last-known config if the control plane blips, and pauses if its token is revoked. It ships results back in batches that reuse one UUID across retries so a lost ack cannot double-count. Region and agent identity are derived server-side from the bearer token, never sent in the payload, so a probe cannot claim to be somewhere it is not. A separate long-poll loop handles interactive "check now" so a button press in the UI runs on a real remote probe within milliseconds.

Region is the partition dimension end to end, so every read can slice by region: per-region latency series, per-region incident scope, "down in Singapore, up in Helsinki."

Incidents as a follower, not a gatekeeper

The incident detector is a background task that follows the check_results stream and writes into the Postgres incidents table. The rule I held to: it never touches the hot write path, never gates check execution, and never produces alerts directly.

The detection itself is boring on purpose: two or more consecutive unhealthy results with no open incident opens one, two or more consecutive healthy results closes it. A small flap threshold absorbs single-result blips. The cross-tenant walk is keyset-paginated so memory stays bounded as tenants grow, and a unique index on open incidents resolves the race when two ticks try to open the same one: only the winner pages.

Opening or resolving fires a non-blocking signal to the escalation engine, which does repeat-until-acknowledged paging on a per-monitor cadence across about fourteen transports (Slack, generic signed webhooks, Telegram, PagerDuty, ntfy, Pushover, Discord, email, and more), with sharded per-incident locks so a reconcile and an inbound signal can never double-page the same episode. Channel secrets are sealed at rest with AES-GCM and never echoed back.

Automation as a first-class surface

Because the API is the single source of truth, the rest came almost for free:

A self-describing OpenAPI 3.1 spec with Swagger UI.
An official Terraform provider so you can manage monitors and notification channels as code.
An MCP server so an LLM client can query your monitors and incidents over OAuth 2.1, scope-gated and audited, with per-action confirmation on the few write tools.

Where it is

Uptimepage is live, free to start with no card, and AGPL-3.0 open source. The core is not paywalled: checks, status pages, subscribers, the API, and every alert channel are in the free tier. It monitors itself and serves its own status badges from the running binary, which is the most honest dogfood I could think of.

Site and hosted tier: https://uptimepage.dev
Source: https://github.com/uptimepage/uptimepage
More build notes: https://uptimepage.dev/blog

Happy to answer anything about the Rust internals in the comments.

99.9% uptime is 43 minutes a month. Do you know your number?

Slim — Wed, 24 Jun 2026 14:45:04 +0000

"Three nines" sounds like the service is basically always up. It is not. 99.9% uptime gives you 43 minutes of downtime every month, and one bad deploy on a Tuesday can spend the whole budget before lunch.

Most people never do the arithmetic, so the percentages stay abstract. Here is every common availability target turned into actual time you are allowed to be down.

99.9% is 43 minutes a month. 99.99% is 4 minutes. Every extra nine costs you roughly 10x the effort.

The nines, in plain time

Availability	Per year	Per month	Per week	Per day
99% (two nines)	3.65 days	7h 12m	1h 41m	14m 24s
99.5%	1.83 days	3h 36m	50m 24s	7m 12s
99.9% (three nines)	8h 46m	43m 12s	10m 5s	1m 26s
99.95%	4h 23m	21m 36s	5m 2s	43s
99.99% (four nines)	52m 34s	4m 19s	60s	8.6s
99.999% (five nines)	5m 15s	26s	6s	0.9s

Rounded, assuming a 365-day year and a 30-day month. The math is just (1 - availability) * length of the window. That is the whole formula. 99.9% means 0.1% of the window is allowed to be downtime, so over a 30-day month that is 0.001 times 43,200 minutes, which lands at 43.2 minutes.

The actual times land harder than the percentages do.

99% looks respectable as a number and is genuinely bad as a service. Three and a half days a year is a long weekend of your product being down. Nobody would tolerate that, yet "99%" reads fine in a slide.

The jump from three nines to four nines is brutal. You go from 43 minutes a month to 4 minutes a month. That last 0.09% costs you redundancy, failover, on-call rotations, and a lot of money. Each extra nine roughly multiplies the engineering effort while shrinking your room for error by 10x.

Five nines is 26 seconds a month. At that point a single slow deploy or one bad health check blows your entire budget. Most teams that claim five nines are either measuring something narrow or not measuring honestly.

The trap: the number you advertise is not the number you measure

There are two different uptime numbers and they almost never match.

The first is the SLA you promise, a yearly figure, the one on the pricing page. The second is what you actually measured this month. They drift apart for reasons that have nothing to do with how reliable your service really was.

Window matters more than the percentage. A single 50-minute outage is a non-event against a yearly 99.9% budget of nearly 9 hours. That exact same outage blows straight through a monthly 99.9% budget of 43 minutes. Same outage, same service, two different verdicts, purely because of which window you measured. Vendors love quoting the yearly number because it hides the months that hurt.

What counts as "down" matters just as much. Was a single failed health check downtime, or noise? Did a 30-second blip in one region count against you? If your monitor records every flap as an outage, your measured uptime will look worse than the service actually was. If it averages everything into a smooth ratio, it will look better. Neither is the real number, and the gap between them is the difference between a monitor you trust and one you have learned to ignore.

Treat your downtime allowance as a budget

Think of your monthly allowance as money you get to spend rather than a target to perfect.

99.9% monthly gives you 43 minutes. That is your error budget, and everything draws it down: planned maintenance, a failed deploy, a dependency that went slow for an afternoon. Frame it that way and the question stops being "how do we never go down" and becomes "what do we want to spend these 43 minutes on," which is a far more honest conversation and the one good SRE teams actually have.

It also kills the instinct to chase more nines than you need. A side project does not need four nines. An internal tool does not need four nines. Adding a nine you will never use is just paying for redundancy nobody asked for. Pick the target your users actually feel, then measure against it on the window they actually live in, which is usually the month.

So what is your number

Find your real measured uptime for last month, not the aspirational figure from the pricing page. Then check it against the table. If you do not know it to the minute, that is the actual thing to fix first, because you cannot spend a budget you never counted.

I build uptimepage, an open source, self-hostable uptime and status page monitor. It computes uptime from confirmed incidents over whatever window you pick, so the number you see is the one your users actually felt, not a ratio of green checks. The source is right there if you want to see how the math is done.

How I got my AI agents to communicate across repos — and shipped SAMP doing it

Slim — Sat, 25 Apr 2026 06:18:01 +0000

Situation. I was working on lumen-argus, a project that runs across three Claude Code sessions in three different repos. They needed to share context — "I just refactored the auth module," "the schema migration landed, pull main." For weeks I was copy-pasting between terminal windows like a caveman.

Task. I wanted cross-session, cross-repo messaging for AI agents. Cheap. Fast. Local. Ideally something I could read with cat if anything went wrong.

Three constraints:

Token cost per message had to be near zero.
No new servers, no daemons, no MCP handshake.
It had to work for any agent — Claude Code today, Cursor or my next project tomorrow.

Action. I surveyed the existing options. mcp_agent_mail, Agent Teams, broker daemons — all of them spin up an HTTP server, register identities, run a SQLite store, and burn tokens on polling hooks. Overkill for three agents swapping a dozen messages a day.

So I borrowed Linus Torvalds' playbook from git:

Per-writer append-only logs. One file per agent: log-<alias>.jsonl. No file ever has two writers, so there's no locking, no interleave, and Syncthing / Dropbox / iCloud can't cause merge conflicts.
Content-addressed IDs. id = sha256({ts, from, to, thread, body})[:16]. Same content → same id, so dedup after sync is automatic.
mtime short-circuit. Before parsing anything, stat the log files; if nothing changed, exit immediately.

I packaged it as SAMP (Simple Agent Message Protocol) — a vendor-neutral spec — and shipped a reference implementation, agent-message: three Claude Code slash commands (/message-send, /message-inbox, /message-reply), a msg shell helper for humans, and a tiny Python wrapper any other agent CLI can spawn.

Result.

1 Bash tool call per send/receive from Claude Code. No MCP init, no ack roundtrip.
0 LLM tokens when humans (or cron, or scripts) read/send via the shell helper. The model is never in the loop.
~30 ms latency — that's python3 startup. Everything else is a file append.
Works offline. Syncs across machines via Syncthing / Dropbox / iCloud with zero conflicts by construction.
One install script. No pip, no npm, no Docker.

Docs: https://slima4.github.io/agent-message/
Repo: https://github.com/slima4/agent-message
Spec: https://github.com/slima4/agent-message/blob/main/SPEC.md

If you're juggling multiple agent sessions and copy-pasting between them, give it a try. PRs and issues welcome.

Tags: ai, claudecode, anthropic, aiagents, SAMP

Sniffing Claude Code's API Calls: What Your IDE Is Really Sending

Slim — Mon, 16 Mar 2026 05:05:11 +0000

Every time you press Enter in Claude Code, something interesting happens behind the scenes. Your full conversation — system prompt, message history, tool definitions, everything — gets packaged into an API call and sent to Anthropic's servers.

But you never get to see those calls. Claude Code logs a JSONL transcript of what it did (tool calls, responses, thinking blocks), but not the raw API traffic that made it happen. The system prompt, HTTP headers, request parameters, latency per call, and one entirely hidden API call — all invisible.

So we built a way to see everything.

The Trick: One Environment Variable

Claude Code officially supports ANTHROPIC_BASE_URL — an environment variable that redirects API traffic to a custom endpoint. It's meant for enterprise proxies, but it works perfectly for local interception:

Claude Code  ──plain HTTP──▶  Sniffer (localhost:7735)  ──HTTPS──▶  api.anthropic.com
                                    │
                                    ▼
                          ~/.claude/api-sniffer/*.jsonl

Start the sniffer in one terminal, launch Claude Code in another:

# Terminal 1
claudetui sniffer

# Terminal 2
claudetui sniff

claudetui sniff auto-detects the sniffer port and launches Claude Code through the proxy. If the sniffer isn't running, it falls back to launching Claude Code directly — so you never get stuck with a ConnectionRefused retry loop.

Every API call now flows through the proxy and gets logged. Claude Code works identically — it doesn't know (or care) that traffic is being captured.

No TLS interception. No certificates. No patching binaries. Just a localhost HTTP server that forwards to the real API over HTTPS.

What You See

The sniffer prints one line per API call as it happens:

  ClaudeTUI API Sniffer — listening on http://127.0.0.1:7735

  Use:  ANTHROPIC_BASE_URL=http://localhost:7735 claude
  Log:  ~/.claude/api-sniffer/sniffer-20260314-103000.jsonl

  #1   POST /v1/messages  opus-4-6  45.2k->1.5k  $0.120  2312ms  740KB/4.2KB  98%c  [Tt]
  #2   POST /v1/messages  opus-4-6  48.1k->0.8k  $0.094  1134ms  741KB/2.1KB  99%c  [TU]  Edit
  #3   POST /v1/messages  opus-4-6  50.3k->52    $0.081  1823ms  742KB/0.3KB  100%c  [U]  Glob,Grep
  #4   POST /v1/messages  opus-4-6  12.3k->2.1k  $0.041  3412ms  42KB/6.8KB   95%c  [Tt]  compaction
  #5   POST /v1/messages  sonnet-4-6  14.3k->2.1k  $0.008  2341ms  42KB/6.8KB  [Tt]  +agent.1

  Summary: 5 requests | $0.344 | 170k in | 5.6k out | 2.3MB sent | 18KB recv | 1 sub-agent

Each line shows the model, input->output tokens, estimated cost, latency, traffic size, cache hit ratio, content block types, tool names, and sub-agent tracking. Compaction events get flagged automatically.

The content blocks tell you what Claude is doing: T = thinking, t = text, U = tool use, S = server-side tool (like WebSearch). Cache ratio (98%c) shows how much you're saving — a 0%c (shown in red) means a cache miss, 12.5x more expensive.

Meanwhile, every request and response is logged as structured JSONL for later analysis.

What Transcripts Don't Tell You

Claude Code's JSONL transcripts are useful, but they omit a lot. Here's what the sniffer captures that transcripts don't:

Data	In Transcript?	In Sniffer?
Token usage (input/output/cache)	Yes	Yes
Raw system prompt	No	Yes
Full conversation history per request	No	Yes
Request parameters (max_tokens, temperature)	No	Yes
HTTP headers (anthropic-beta, version)	No	Yes
Request/response latency	No	Yes
Hidden compaction API call	No	Yes
Error response bodies	Partial	Yes
Streaming SSE events	No	Yes
Tool definitions (full JSON schema)	No	Yes

The most interesting items on this list are the system prompt and the hidden compaction call.

The System Prompt

Claude Code's system prompt is sent on every single API call. It contains:

Claude Code's internal instructions and behavioral guidelines
Tool definitions (Read, Write, Edit, Bash, Glob, Grep, etc.) with full JSON schemas
Your CLAUDE.md project instructions
Memory files, hooks output, and other injected context
Safety and permission guidelines

With --full mode, the sniffer captures the complete system prompt text. In our sessions, it consistently measures ~14k tokens — a fixed tax on every API call.

This is useful for understanding exactly what Claude Code "knows" about your project. Your CLAUDE.md, your hooks output, your memory files — it's all there in the system prompt, and now you can read it.

The Hidden Compaction Call

This is the one we were most curious about.

When Claude Code's context window fills up (~167k of 200k tokens), it triggers compaction. The entire conversation gets compressed into a summary, and the next turn starts fresh with just the system prompt + summary.

But here's the thing: the API call that generates the compaction summary doesn't appear in the transcript. Claude Code makes it, receives the summary, and continues — but the JSONL transcript shows nothing. You see a compact_boundary marker, but not the actual summarization call.

The sniffer catches it because it's just another POST /v1/messages:

  #12  POST /v1/messages  opus-4-6  12.3k->2.1k  $0.041  3412ms  compaction

The sniffer detects compaction by comparing consecutive requests. When the message count drops by more than 50% or the total content size drops by more than 70% compared to the previous request, it flags it as post-compaction. The dramatic shrinkage — from 167k tokens of conversation down to a ~15k summary — is unmistakable.

This is the only way to observe the compaction call's actual cost, latency, and output tokens. In our sessions, compaction summary generation takes 2-4 seconds and produces 11-19k tokens of compressed context.

The Tool Use Loop

When you see a line like this in the sniffer:

  #17  POST /v1/messages  opus-4-6  114.3k->531  $0.217  16047ms  [U]  tool

That's 114k tokens in but only 531 out. Why so few output tokens? Because Claude isn't writing prose — it's calling a tool. The response is just a small JSON block:

{"type": "tool_use", "name": "Read", "input": {"file_path": "/src/app.py"}}

Here's the full cycle for a single tool call:

Claude Code sends the full conversation to the API (114.3k input tokens — system prompt, message history, tool definitions, everything)
API responds with a tool_use block — just the tool name and parameters (531 output tokens)
Claude Code executes the tool locally — reads the file, runs the command, whatever the tool does
Claude Code sends another request with the tool result appended as a tool_result message — now input tokens are higher because the file contents (or command output) are part of the conversation

That's why you see rapid back-to-back requests in the sniffer. A single "read this file and edit it" from the user might generate 5+ API calls:

  #17  POST /v1/messages  opus-4-6  114.3k->531   [U]  tool     ← decide to read file
  #18  POST /v1/messages  opus-4-6  116.1k->204   [U]  tool     ← decide to edit file
  #19  POST /v1/messages  opus-4-6  117.8k->1.2k  [Tt]          ← respond to user

Each round-trip adds the tool result to the conversation, growing the input tokens. This is why context fills up faster than you'd expect — tool results (file contents, command output, search results) are often much larger than the tool call itself.

SSE Streaming Under the Hood

Claude Code uses Server-Sent Events (SSE) for streaming responses. The API returns text/event-stream and sends data in chunks as the model generates tokens.

The sniffer handles this transparently — it forwards each chunk to Claude Code as it arrives (so you don't notice any delay), while capturing the entire stream for logging.

After streaming completes, it reassembles the SSE events to extract structured data: model, usage, stop reason, and content block types (text, thinking, tool_use). This is what makes the one-line terminal output possible — you get clean 45.2k->1.5k $0.120 2312ms instead of raw SSE data.

The key technical detail: we use response.read1(8192) instead of response.read(8192). The read1() method reads whatever data is currently available without waiting for the full buffer to fill — critical for streaming, where you need to forward partial data immediately.

Sub-Agent Tracking

When Claude Code spawns a sub-agent (via the Agent tool), the sub-agent makes its own API calls — often using a different model. The sniffer tracks these by session ID:

  #8   POST /v1/messages  opus-4-6    89.1k->3.2k  $0.182  8234ms  99%c  [TU]  Agent
  #9   POST /v1/messages  sonnet-4-6  14.3k->2.1k  $0.008  2341ms         [Tt]  +agent.1
  #10  POST /v1/messages  sonnet-4-6  16.5k->1.2k  $0.006  1823ms         [TU]  Read  agent.1
  #11  POST /v1/messages  sonnet-4-6  22.8k->0.5k  $0.009  1243ms         [t]   agent.1
  #12  POST /v1/messages  opus-4-6    92.3k->1.5k  $0.152  4312ms  99%c  [Tt]

+agent.1 marks the first request from a new sub-agent. Subsequent requests from the same agent show agent.1. The main session has no label.

This reveals things you can't see from the transcript: sub-agents often use Sonnet (cheaper, faster) for research tasks while the main session runs on Opus. You can see exactly how many API calls each sub-agent makes, their cost, and how they overlap with the main session.

Cache Misses — The Silent Cost Spike

The cache ratio (98%c, 100%c) shows what percentage of input tokens were cache reads. Most of the time it's near 100% — great, you're paying the cheap rate.

But leave your session idle for ~5 minutes and watch what happens:

  #6   POST /v1/messages  opus-4-6  129.4k->15   $0.199   3336ms  100%c  [t]
  #7   POST /v1/messages  opus-4-6  129.5k->428  $2.460  16108ms  0%c    [Tt]
  #8   POST /v1/messages  opus-4-6  130.0k->600  $0.248  18310ms  100%c  [Tt]

Request #7 cost $2.46 — 12.5x more than usual — because the cache expired. All 129.5k tokens went through cache_creation at $18.75/M instead of cache_read at $1.50/M. Same data, same tokens, wildly different price.

The sniffer shows 0%c in red to make these cache misses impossible to miss.

Per-Request Cost Tracking

The sniffer calculates cost per API call using the token breakdown from the response:

{
  "usage": {
    "input_tokens": 3,
    "cache_read_input_tokens": 45000,
    "cache_creation_input_tokens": 800,
    "output_tokens": 1500
  }
}

With model-specific pricing (Opus: $15/$1.50/$18.75/$75 per 1M tokens for input/cache-read/cache-write/output), each line shows the exact cost of that call. No estimation, no averaging — real cost per request.

This revealed something we didn't expect: the variance between calls is huge. A simple response might cost $0.03, while a long code generation can cost $0.50+ — in the same session, same model.

What We Learned

After running the sniffer on real sessions, a few things stood out:

1. The system prompt is remarkably stable. It barely changes between calls within a session. The ~14k tokens are almost entirely cached after the first call, making them cheap ($1.50/M vs $15/M for Opus). But they still consume context window space.

2. Compaction is expensive in latency, not just tokens. The summary generation call takes 2-4 seconds — during which Claude Code is unresponsive. On a long session with 3 compactions, that's 6-12 seconds of dead time.

3. Cache hit rates are extraordinary. In typical sessions, 95-98% of input tokens are cache reads. The stateless-API design sounds expensive, but caching makes it practical.

4. Error responses are more informative than you'd think. When Claude Code encounters a 429 (rate limit) or 529 (overloaded), the response body often includes retry-after headers and detailed error messages. These are swallowed by Claude Code's retry logic and never shown to you.

5. Beta headers reveal feature flags. The anthropic-beta header shows which experimental features are active. Watching this change across Claude Code versions is interesting.

Security Notes

The sniffer is designed to be safe by default:

Localhost only — binds to 127.0.0.1, never 0.0.0.0
API keys redacted — x-api-key and authorization headers stripped from logs by default (use --no-redact to override)
Restricted permissions — log files created with 0o600 (owner read/write only)
Local plaintext — the API key transits in plain text only over the loopback interface, which is standard for local proxy patterns

Try It

The sniffer is part of ClaudeTUI:

# Install
brew tap slima4/claude-tui && brew install claude-tui && claudetui setup

# Or
curl -sSL https://raw.githubusercontent.com/slima4/claude-tui/main/install.sh | bash

# Run
claudetui sniffer              # Terminal 1: start proxy
claudetui sniff                # Terminal 2: launch claude through proxy
claudetui sniff --resume abc   # or resume a session through proxy

Options:

--port PORT     Custom port (default: 7735)
--full          Log complete request/response bodies (warning: large files)
--no-redact     Include API keys in logs (use with caution)
--quiet         Suppress terminal output, log only

Python 3.8+, stdlib only — no external dependencies.

What's Next

The sniffer captures data that was previously invisible. Combined with ClaudeTUI's existing context efficiency analysis, this gives a complete picture of what Claude Code is doing under the hood — from high-level token waste tracking down to raw HTTP traffic.

Some things we're exploring:

Replaying captured sessions for cost modeling ("what would this session cost on Sonnet vs Opus?")
Diffing system prompts across Claude Code versions to track changes
Correlating latency with context size — does response time scale linearly with input tokens?
Analyzing compaction summaries — what gets preserved and what gets lost?

If you're curious about what your Claude Code sessions actually look like at the API level, point the sniffer at a session and watch the data flow.

ClaudeTUI is open source and MIT licensed. Stdlib-only Python, zero external dependencies.

GitHub: github.com/slima4/claude-tui

Where Do Your Claude Code Tokens Actually Go? We Traced Every Single One

Slim — Sat, 14 Mar 2026 06:25:43 +0000

You're paying for 200,000 tokens of context. But how many of those tokens are actually doing useful work?

We built ClaudeTUI — a set of monitoring tools for Claude Code — and dug into the raw JSONL transcript data to trace every token. What we found surprised us: there are four distinct categories of token usage, and only one of them is your actual work.

What Happens When You Press Enter

Here's something most Claude Code users don't realize: every time you press Enter, the entire conversation is sent from scratch.

The Claude API is stateless. It doesn't remember your previous messages. So every single keystroke triggers an API call that includes:

System prompt (~14k tokens) — Claude Code's instructions, tool definitions, your CLAUDE.md
Full conversation history — every message, every tool call, every tool result since the last compaction
Your new message

On turn 1, that's maybe 15k tokens. By turn 15, it's 100k. By turn 30, it's 167k — and then compaction fires.

This is why Claude gets slower and more expensive as your session goes on. Each Enter keystroke processes more tokens than the last. And it's why compaction exists: without it, you'd hit the 200k wall and the session would simply stop.

The good news: Anthropic's prompt caching makes this less painful than it sounds. But it's worth understanding how.

The Cache Lives on Anthropic's Servers

Your machine sends the full conversation on every request — the same bytes go over the network every time. The optimization happens server-side: Anthropic checks "have I seen this exact prefix of tokens recently?" If yes, it skips re-processing them and charges the cheaper cache read rate ($1.50/M instead of $15/M for Opus — a 10x discount).

In a 157-turn session, we measured 98% of all tokens as cache reads. That makes sense: by turn 100, you're re-sending 99 turns of history that are already cached. Only the newest content goes through the expensive cache_creation path.

The cache has a TTL — likely ~5 minutes for conversation content. If you pause too long between turns, the cache expires and the next call pays full input price for everything. This is also why compaction is expensive: it blows away the entire cached conversation and replaces it with a brand new summary that goes through cache_creation from scratch.

One more thing: the tokens still count toward your 200k context window, even when cached. Caching saves money, not space.

Now let's look at what those tokens actually are.

The Four Types of Tokens

Every API call Claude Code makes has a token usage breakdown in its transcript. By parsing thousands of these calls across real sessions, we identified four categories:

1. System Prompt (~14k tokens) — The Constant Tax

Every single API call includes a system prompt: Claude Code's internal instructions, tool definitions, safety guidelines, and your CLAUDE.md file. In our sessions, this was consistently ~14,328 tokens.

This isn't something you can avoid. It's infrastructure. But it means that out of your 200k window, only ~186k is ever available for actual conversation.

We discovered this by looking at cache_read_input_tokens after compaction events. The value resets to exactly 14,328 every time — that's the system prompt floor. During normal operation, cache_read grows from 14k to 167k as your conversation accumulates in the cache.

2. Compaction Summary (~11-19k tokens) — The Rebuild Cost

When compaction fires, Claude Code compresses your entire conversation into a summary. The next API call has to read that summary to reconstruct context. This is the real overhead of compaction.

From a real 3-compaction session:

Compaction	Summary Size	What It Costs
#1	18.8k tokens	$0.47 (Opus)
#2	10.6k tokens	$0.22 (Opus)
#3	17.8k tokens	$0.37 (Opus)

These summaries are lossy. Your 167k of rich context — exact error messages, file contents, code snippets — gets compressed into 11-19k tokens. Details are lost.

3. Useful Work — What You Actually Paid For

This is everything else: your prompts, Claude's responses, tool calls, file reads, code edits, test output. The actual productive content of your session.

4. Headroom (~33k tokens) — The Unused Buffer

Claude Code doesn't wait until 200k to compact. It triggers at roughly 83% capacity (~167k tokens), reserving ~33k tokens as a buffer for the compaction process itself.

That means ~16.5% of your context window is never available for useful work. You're paying for 200k but only getting ~167k.

A Real Session, Dissected

Here's an actual 4-segment session from our monitoring data:

  Seg 1  ▒▒▓▓████████████████████████████████████████████████░░░░░  200.0k
         14.3k system │ 152.7k useful │ 33.0k headroom │ → compacted

  Seg 2  ▒▒▓▓▓████████████████████████████████████░░░░░░░░░░░░░░░  200.0k
         14.3k system │ 18.8k summary │ 114.4k useful │ 52.5k headroom │ → compacted

  Seg 3  ▒▒▓▓▓████████████████████████████████████████████████░░░  200.0k
         14.3k system │ 17.8k summary │ 141.2k useful │ 33.9k headroom │ → compacted

→ Seg 4  ▒▒▓▓██████                                                44.8k
         14.3k system │ 10.6k summary │ 12.7k useful

  Efficiency: 76%  │  Wasted: 166.5k/644.8k

76% efficiency means 76% of the total tokens went to useful work. The other 24% went to compaction summaries and headroom.

Notice how Seg 1 has no summary — it's the first segment, nothing to rebuild from. But starting from Seg 2, every segment pays the summary tax.

The Hidden API Call

One thing we couldn't find in the transcript: the compaction summary generation itself. Claude Code makes a hidden API call that reads your ~167k context and produces the summary, but this call is not logged in the JSONL transcript.

Based on the preTokens metadata we found in compaction events, this hidden call reads the full pre-compaction context (~167k tokens). At Opus pricing ($1.50/M for cached reads), that's roughly $0.25 per compaction just for the summary generation — on top of the rebuild cost.

What This Means for Your Wallet

Let's do the math for a long Opus session with 3 compactions:

Token budget: 644.8k total

Category	Tokens	Cost (Opus)	% of Total
Useful work	490k	~$8.50	76%
Compaction summaries	47k	~$0.85	7%
Headroom (unused)	108k	$0 (not billed)	17%
System prompt (constant)	~43k	~$0.06 (cached)	—
Hidden summary generation	~500k reads	~$0.75	—

The headroom tokens aren't billed directly — they represent capacity you couldn't use. But the summaries and hidden calls are real costs.

With Sonnet 4.6 the same session would be dramatically cheaper. Sonnet supports up to 1M context, so with 644k tokens you'd hit zero compactions:

All tokens are useful work
Efficiency: 100%
Cost: ~$5.50 (vs ~$10+ on Opus)

The System Prompt Discovery

Perhaps the most interesting finding: the system prompt is a constant ~14k tax on every segment.

Before our investigation, we were counting the full post-compaction context as "rebuild waste." A segment showing 33.1k rebuild looked like 33.1k of compaction overhead. But 14.3k of that is system prompt — you'd pay it regardless.

The actual compaction overhead (the summary) is only 33.1k - 14.3k = 18.8k. That's a 43% difference in how you measure waste.

How we detected it:

After compaction #1: cache_read = 14,328  ← system prompt
After compaction #2: cache_read = 14,328  ← same
After compaction #3: cache_read = 14,328  ← same

During normal operation: cache_read grows from 14k → 167k

The cache_read value tells you exactly what's already cached. After compaction, only the system prompt survives in cache — everything else (the compaction summary) goes through cache_creation.

The Compaction Cache Structure

Here's how token caching works across a compaction boundary:

Before compaction (normal operation):

cache_read: 166,575    ← almost everything is cached
cache_creation: 312    ← tiny new content
input_tokens: 3        ← negligible
output_tokens: 126

First call after compaction:

cache_read: 14,328     ← only system prompt survives
cache_creation: 18,793 ← compaction summary, written fresh
input_tokens: 3
output_tokens: 1,249

The cache gets blown away by compaction. Everything that was cached (your conversation, tool results, file contents) is gone. Only the system prompt persists because it's on a separate, longer-lived cache (likely a 1-hour TTL vs the 5-minute conversation cache).

7 Things You Can Do Right Now

1. Use /compact manually at logical breakpoints

Don't wait for auto-compaction at 167k. After finishing a feature or fixing a bug, compact yourself. You can guide what gets preserved:

/compact Preserve all file paths, error messages, and the list of modified files

2. Use /clear between distinct tasks

Switching from implementation to debugging? Starting a new feature? A fresh 186k of clean context beats 80k of stale context with irrelevant history.

3. Delegate verbose work to subagents

Each subagent gets its own isolated 200k context window. Running tests, searching large codebases, or fetching documentation in subagents keeps verbose output from bloating your main session.

4. Read files with line ranges

Instead of reading entire files, specify what you need: "Read lines 40-90 of handler.ts." Especially critical in debugging loops where you might read the same file repeatedly.

5. Disable unused MCP servers

Each MCP server loads its full tool schema into context on every request. A 20-tool server can consume 5-10k tokens just by existing. That's on top of the 14k system prompt.

6. Keep CLAUDE.md under 200 lines

CLAUDE.md is part of that ~14k system prompt. It loads on every single API call and survives all compaction cycles. If it's bloated, you're paying on every call.

7. Monitor your efficiency

Install ClaudeTUI and watch the numbers in real-time. Seeing "Efficiency: 76%" drop to "Efficiency: 68%" after a compaction changes how you think about context management.

How to See This Yourself

Install ClaudeTUI:

# Via Homebrew
brew install slima4/claude-tui/claude-tui && claudetui setup

# Or directly
curl -sSL https://raw.githubusercontent.com/slima4/claude-tui/main/install.sh | bash

Open a second terminal and run:

claudetui monitor    # live dashboard
claudetui chart      # efficiency chart (standalone)

The efficiency chart shows the 4-component breakdown for every segment in your session — updated live as you work. Press w in the monitor to open it, or v to toggle between horizontal and vertical views.

The Bottom Line

Every Claude Code session has four types of token usage:

System prompt (~14k) — constant tax, can't avoid it, but it's cheap (cached)
Compaction summaries (~11-19k each) — the real cost of compaction, lossy compression of your work
Useful work — what you actually paid for
Headroom (~33k) — buffer that's never available for work

In a typical 3-compaction Opus session, about 76% of tokens are useful work. The rest is overhead. Making this visible — and understanding what each component actually is — is the first step to spending tokens more intentionally.

ClaudeTUI is open source and MIT licensed. Stdlib-only Python, zero external dependencies.

GitHub: github.com/slima4/claude-tui

ClaudeTUI v0.3: Claude Code statusline gets a unified CLI, interactive configurator, and a proper splash screen

Slim — Wed, 11 Mar 2026 07:27:08 +0000

A couple of days ago I shared ClaudeTUI — a real-time statusline, live monitor, and session analytics for Claude Code. Since then, a lot has changed. Here's what's new in v0.3.

One command to rule them all

The biggest change: instead of six separate CLI commands for managing the Claude Code statusline, monitor, and analytics, there's now a single claudetui dispatcher.

# before
claude-ui-monitor
claude-stats --days 7
claude-sessions list
claude-ui-mode compact

# after
claudetui monitor
claudetui stats --days 7
claudetui sessions list
claudetui mode compact

Every subcommand passes arguments straight through — claudetui is just a 60-line Python script that resolves the right tool and execs it. No overhead, no framework. If you type claudetui --help, you get the full menu:

claudetui — CLI for ClaudeTUI (Claude Code utilities)

Commands:
  monitor     Live session dashboard (separate terminal)
  stats       Post-session analytics
  sessions    Browse, compare, resume, export sessions
  mode        Switch statusline mode (full/compact/custom)
  setup       Configure statusline, hooks, and commands
  uninstall   Remove ClaudeTUI configuration

The old command names (claude-ui-monitor, claude-stats, etc.) are gone. Clean break, clean namespace.

Interactive statusline configurator

The Claude Code statusline has 20+ components across three lines — context usage, cost, model, git stats, tool trace, and more. Previously you could only choose between "everything" (full mode) and "almost nothing" (compact mode). Now there's a third option:

claudetui mode custom

This opens a curses TUI where you can toggle individual components with arrow keys and spacebar:

Each component shows a live preview of what it looks like — colored progress bars, sparklines, git stats — right in the menu. You can pick from five widget styles for the matrix rain area, or apply presets (all, minimal, focused).

Don't like interactive menus? Use flags:

claudetui mode custom --hide model,cost,session_id
claudetui mode custom -w hex -p focused
claudetui mode custom -l   # show what's currently hidden

Config saves to ~/.claude/claudeui.json and hot-reloads — no restart needed.

Monitor got a facelift

The live monitor (claudetui monitor) picked up several improvements:

Splash screen — the monitor now shows an ASCII art logo while it loads the session in the background. Looks cool, masks the 1-2 second discovery time.

Pinned layout — the header (context, cost, stats) and footer (hotkey bar) are now fixed. Only the log section scrolls. No more hunting for the help bar when the log fills up.

Configurable log size — add "monitor": {"log_lines": 12} to your claudeui.json, or set it to 0 to hide the log entirely. Default is 8. Adjustable from the settings panel too (press c in the monitor).

Responsive footer — the hotkey bar adapts to terminal width. Full labels at 60+ columns, abbreviated at 40+, minimal below that.

The rebrand

You might have noticed: it's ClaudeTUI now, not ClaudeUI. The old name was too generic and clashed with other projects. Everything got renamed — repo, Homebrew tap, slash commands (/ui:* → /tui:*), CLI tools, docs, landing page, and all the screenshots.

The installer handles migration automatically — if you have the old claudeui Homebrew tap, it untaps it and points you to the new one.

brew tap slima4/claude-tui
brew install claude-tui
claudetui setup

Or the one-liner:

curl -sSL https://raw.githubusercontent.com/slima4/claude-tui/main/install.sh | bash

What's next

The transcript parsing approach works surprisingly well, but it's still reverse-engineering an undocumented format. Every Claude Code update is a small gamble. I'd love to see an official API for session metadata — even just a stable JSON schema for the transcript would help.

In the meantime, if you use Claude Code and want a statusline with real-time context tracking, cost monitoring, and session analytics: github.com/slima4/claude-tui

Stars and issues welcome.

I built a real-time dashboard for Claude Code because I kept losing track of my sessions

Slim — Mon, 09 Mar 2026 08:04:57 +0000

Claude Code has a 200k token context window but gives you zero visibility into how much of it you've used — until auto-compaction kicks in and wipes half your context. I got tired of that surprise, so I built ClaudeTUI.

The problem

If you use Claude Code daily, you've probably hit these:

Auto-compaction fires mid-task and you lose context
No idea how much a session is costing you
Can't tell which files Claude has been touching
No way to compare sessions or track patterns over time

Claude Code is a powerful tool, but it's a black box. You type, it works, and you hope for the best.

What ClaudeTUI does

It's a collection of tools that plug into Claude Code and give you full visibility:

Statusline — a real-time status bar that sits at the bottom of Claude Code:

 0110100 Opus 4.6 │ ████████░░░░░░░░░░░░ 42% 65.5k/200.0k │ ~24 turns left │ $2.34 │ 12m │ 0x compact
 1001011 my-project │ main +42 -17 │ 18 turns │ 5 files │ 0 err │ 82% cache │ 4x think │ ~$0.13/turn
 0110010 read config.ts → edit config.ts → bash npm test → edit README.md │ config.ts×2 README.md×1

Context usage, cost, cache ratio, git diff, tool trace, compaction prediction — all live. There's also a compact 1-line mode if you prefer minimal.

Live Monitor — open a second terminal and get a full dashboard:

$ claude-ui-monitor

Context sparkline with compaction history, cost breakdown with cache savings, per-turn activity (tools, files, errors), session-wide stats, and a scrollable log viewer with filters. It even tracks agent spawns and their results. The matrix rain header pauses when Claude is idle.

Hooks — automatic context injected into your sessions:

Session start: shows which files you've been editing recently across sessions
After edit: warns you about reverse dependencies ("4 files import this module")
Before edit: flags high-churn files ("config.ts edited 43 times in 5 sessions — maybe refactor?")

Session Stats — post-session analytics:

$ claude-stats --days 7 -s

Cost breakdown, token sparklines, tool usage charts, file activity heatmaps. See which sessions burned the most tokens and why.

Session Manager — browse and compare sessions:

$ claude-sessions list
$ claude-sessions diff abc123 def456

Side-by-side comparison of cost, duration, tools used, and file activity between any two sessions.

Slash Commands — deep reports without leaving Claude Code:

/ui:session    # full session report
/ui:cost       # cost deep dive
/ui:perf       # tool efficiency analysis
/ui:context    # context window predictions

How it works

Everything runs by parsing Claude Code's transcript JSONL files from ~/.claude/projects/. No API keys, no external services, no dependencies — just Python 3.8+ and the standard library.

The statusline uses Claude Code's statusLine config to run a Python script that reads session metadata from stdin and the transcript file from disk. It does two passes: a reverse pass to find current context size, and a forward pass to accumulate costs and activity.

The hooks use Claude Code's hooks system to run scripts on events like SessionStart, PreToolUse, and PostToolUse.

The monitor watches the transcript file for changes and redraws when it detects new content.

Install

One command:

curl -sSL https://raw.githubusercontent.com/slima4/claude-tui/main/install.sh | bash

It sets up everything — statusline, hooks, slash commands, and CLI tools. The installer asks you to pick full or compact mode for the statusline. You can switch anytime with claude-ui-mode compact or claude-ui-mode full.

To uninstall: claude-ui-uninstall

What I learned

Building tools that parse another tool's internal format is fragile by nature. Claude Code's transcript format isn't documented, so I had to reverse-engineer it by reading JSONL files and figuring out the structure. It works well today, but could break with any Claude Code update.

The other challenge was performance. The statusline runs on every refresh, so it needs to parse the transcript fast. For long sessions with thousands of entries, the reverse-pass-first approach helps — you find the current context size quickly without reading the entire file sequentially.