
Parsing 2 GiB/s of AI token usage with a ccusage-style TUI (Rust + simd-json)

The Problem

I use Claude Code, Codex CLI, and Gemini CLI daily. One day I checked my API bill — it was way higher than expected. But I had no idea where the tokens were going.

Existing tracking tools were too slow. Scanning my 3 GB of session files (9,000+ files across three CLIs) took over 40 seconds. I wanted something instant.

So I built toktrack — a terminal-native token usage tracker that parses everything locally at 2 GiB/s.

The Data

Each AI CLI stores session data differently:

CLI           Location                        Format
Claude Code   ~/.claude/projects/**/*.jsonl   JSONL, per-message usage
Codex CLI     ~/.codex/sessions/**/*.jsonl    JSONL, cumulative counters
Gemini CLI    ~/.gemini/tmp/*/chats/*.json    JSON, includes thinking_tokens

A single Claude Code session file can look like this:

{"timestamp":"2026-01-15T10:00:00Z","message":{"model":"claude-sonnet-4-20250514","usage":{"input_tokens":12000,"output_tokens":3500,"cache_read_input_tokens":8000,"cache_creation_input_tokens":2000}},"costUSD":0.042}

Multiply this by thousands of sessions over months, and you're looking at gigabytes of JSONL to parse.

Why simd-json

Standard serde_json is good. But when you're parsing 3 GB of line-delimited JSON, every microsecond per line adds up.

simd-json is a Rust port of simdjson that uses SIMD instructions (AVX2, SSE4.2, NEON) to parse JSON significantly faster. The key trick: in-place parsing with mutable buffers.

use serde::Deserialize;

#[derive(Deserialize)]
struct ClaudeJsonLine<'a> {
    timestamp: &'a str,                  // borrowed, zero-copy (implicit #[serde(borrow)])
    #[serde(borrow, rename = "requestId")]
    request_id: Option<&'a str>,         // borrowed, zero-copy; borrow must be explicit for Option<&str>
    #[serde(borrow)]
    message: Option<ClaudeMessage<'a>>,
    #[serde(rename = "costUSD")]
    cost_usd: Option<f64>,
}

By using &'a str instead of String, we avoid heap allocations for every field. simd-json parses the JSON in-place on a mutable byte buffer, and our structs just borrow slices from that buffer.
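The nested types follow the same pattern. Here is a sketch with field names taken from the sample line earlier; the real structs in the repo may differ:

use serde::Deserialize;

#[derive(Deserialize)]
struct ClaudeMessage<'a> {
    model: &'a str,          // borrowed straight from the line buffer
    usage: Option<ClaudeUsage>,
}

#[derive(Deserialize)]
struct ClaudeUsage {
    input_tokens: Option<u64>,
    output_tokens: Option<u64>,
    cache_read_input_tokens: Option<u64>,
    cache_creation_input_tokens: Option<u64>,
}

Only the string fields borrow; the token counts are plain integers and cost nothing to copy.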

The one gotcha: simd-json's from_slice requires &mut [u8], so you need to own a mutable copy of each line:

use std::fs::File;
use std::io::{BufRead, BufReader};

let reader = BufReader::new(File::open(path)?);
for line in reader.lines() {
    let line = line?;
    let mut bytes = line.into_bytes();  // owned, mutable: simd-json parses in place
    if let Ok(parsed) = simd_json::from_slice::<ClaudeJsonLine>(&mut bytes) {
        // `parsed` borrows from `bytes`, so copy out what we need here;
        // both are dropped at the end of the iteration
    }
}

This gave a 17-25% throughput improvement over standard serde_json on my dataset.
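Numbers like that are easy to sanity-check with a stopwatch over the whole corpus. A rough sketch, where parse_claude_file stands in for the loop above:

use std::time::Instant;

// Total bytes on disk for the files we are about to parse.
let total_bytes: u64 = files
    .iter()
    .filter_map(|f| std::fs::metadata(f).ok())
    .map(|m| m.len())
    .sum();

let start = Instant::now();
for f in &files {
    parse_claude_file(f)?; // the BufReader + simd-json loop above
}
let secs = start.elapsed().as_secs_f64();
println!("{:.2} MiB/s", total_bytes as f64 / (1024.0 * 1024.0) / secs);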

Adding Parallelism with rayon

A single-threaded parser hit ~1 GiB/s. But with 9,000+ files, parallelizing at the file level is trivial with rayon:

use rayon::prelude::*;

// One file per task: rayon work-steals across its thread pool.
let entries: Vec<UsageEntry> = files
    .par_iter()
    .flat_map(|f| parser.parse_file(f).unwrap_or_default())
    .collect();

That's it. rayon's par_iter() distributes files across threads automatically. Combined with simd-json, this pushed throughput to ~2.0 GiB/s, roughly 2.5x the serde_json baseline:

Stage                   Throughput
serde_json (baseline)   ~800 MiB/s
simd-json (zero-copy)   ~1.0 GiB/s
simd-json + rayon       ~2.0 GiB/s

The Hard Part: Each CLI is Different

The real complexity wasn't parsing speed — it was handling three completely different data formats behind a single trait:

pub trait CLIParser: Send + Sync {
    fn name(&self) -> &str;
    fn data_dir(&self) -> PathBuf;
    fn file_pattern(&self) -> &str;
    fn parse_file(&self, path: &Path) -> Result<Vec<UsageEntry>>;
}
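With that trait in place, the scanner treats all three CLIs uniformly. A sketch of the call site, where ClaudeParser, CodexParser, GeminiParser, and discover_files are stand-ins for the concrete implementations:

use rayon::prelude::*;

let parsers: Vec<Box<dyn CLIParser>> = vec![
    Box::new(ClaudeParser),
    Box::new(CodexParser),
    Box::new(GeminiParser),
];

for parser in &parsers {
    // Expand each CLI's glob (data_dir + file_pattern) into paths.
    let files = discover_files(&parser.data_dir(), parser.file_pattern());
    let entries: Vec<UsageEntry> = files
        .par_iter()
        .flat_map(|f| parser.parse_file(f).unwrap_or_default())
        .collect();
    println!("{}: {} entries", parser.name(), entries.len());
}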

Claude Code is straightforward — each JSONL line with a message.usage field is one API call.

Codex CLI was tricky. Token counts are cumulative — each token_count event reports the running total, not a delta. And the model name is in a separate turn_context line. So parsing is stateful:

line 1: session_meta  → extract session_id
line 2: turn_context  → extract model name
line 3: event_msg     → token_count (cumulative total)
line 4: event_msg     → token_count (larger cumulative total)

You need to keep only the last token_count per session.
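In code, that means carrying state across lines and overwriting the running total. A sketch; the struct below is a hypothetical, flattened view of a Codex session line (the real schema nests these fields), but the state machine is the same:

use serde::Deserialize;
use std::io::BufRead;

#[derive(Deserialize)]
struct CodexLine<'a> {
    #[serde(rename = "type")]
    kind: &'a str,
    #[serde(borrow)]
    model: Option<&'a str>,
    token_count: Option<u64>,
}

let mut model: Option<String> = None;
let mut last_total: Option<u64> = None;

for line in reader.lines() {
    let mut bytes = line?.into_bytes();
    let Ok(event) = simd_json::from_slice::<CodexLine>(&mut bytes) else {
        continue; // skip malformed lines
    };
    match event.kind {
        "turn_context" => model = event.model.map(str::to_owned),
        "event_msg" => {
            // Cumulative counter: the last value per session wins.
            if let Some(total) = event.token_count {
                last_total = Some(total);
            }
        }
        _ => {}
    }
}
// `model` + `last_total` become a single UsageEntry for the session.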

Gemini CLI uses standard JSON (not JSONL) with a unique thinking_tokens field that no other CLI tracks.
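Since it's one JSON document per file, there's no line loop. A sketch with field names assumed from the description above, not the exact schema:

use serde::Deserialize;

#[derive(Deserialize)]
struct GeminiChat {
    messages: Vec<GeminiMessage>,
}

#[derive(Deserialize)]
struct GeminiMessage {
    model: Option<String>,
    input_tokens: Option<u64>,
    output_tokens: Option<u64>,
    thinking_tokens: Option<u64>, // the field no other CLI reports
}

// Read once, parse once per file.
let mut bytes = std::fs::read(path)?;
let chat: GeminiChat = simd_json::from_slice(&mut bytes)?;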

TUI with ratatui

For the dashboard, I used ratatui to build 4 views:

  • Overview — Total tokens/cost with a GitHub-style 52-week heatmap
  • Models — Per-model breakdown with percentage bars
  • Daily — Scrollable table with sparkline charts
  • Stats — Key metrics in a card grid

The heatmap uses 2x2 Unicode block characters to fit 52 weeks of data in a compact space, with percentile-based color intensity.
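The percentile part matters more than the glyphs: bucket each day by its rank among all non-zero days, so one huge day doesn't flatten the rest of the map into the lowest shade. A minimal sketch of that bucketing (not toktrack's exact palette):

/// Percentile thresholds over all non-zero daily totals.
fn thresholds(mut counts: Vec<u64>) -> [u64; 3] {
    counts.retain(|&c| c > 0);
    counts.sort_unstable();
    if counts.is_empty() {
        return [0; 3];
    }
    let p = |q: f64| counts[((counts.len() - 1) as f64 * q) as usize];
    [p(0.25), p(0.50), p(0.75)]
}

/// 0 = no usage; 1..=4 = GitHub-style shades by percentile bucket.
fn intensity(count: u64, t: &[u64; 3]) -> usize {
    if count == 0 {
        return 0;
    }
    1 + t.iter().filter(|&&th| count > th).count()
}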

Results

On my machine (Apple Silicon, 9,000+ files, 3.4 GB total):

Run                     Time
Cold start (no cache)   ~1.2s
Warm start (cached)     ~0.05s

The caching layer stores daily summaries in ~/.toktrack/cache/. Past dates are immutable — only today is recomputed. This means even when Claude Code deletes session files after 30 days, your cost history survives.
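The invalidation rule is easy to state in code. A sketch, assuming chrono for dates; DaySummary values and the load/store helpers over ~/.toktrack/cache/ are hypothetical:

use chrono::Local;

// Past days never change, so only today is ever recomputed.
let today = Local::now().date_naive();
for day in all_days {
    if day < today {
        if let Some(cached) = load_day_summary(&cache_dir, day) {
            summaries.push(cached);
            continue; // cache hit: no re-parsing for this day
        }
    }
    let fresh = summarize_day(&entries, day);
    store_day_summary(&cache_dir, day, &fresh);
    summaries.push(fresh);
}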

Try It

npx toktrack

GitHub: github.com/mag123c/toktrack

If you use Claude Code, Codex CLI, or Gemini CLI and want to know where your tokens are going — give it a try.
