
Parsing 2 GiB/s of AI token logs with Rust + simd-json

The Problem

I use Claude Code, Codex CLI, and Gemini CLI daily. One day I checked my API bill — it was way higher than expected. But I had no idea where the tokens were going.

Existing tracking tools were too slow. Scanning my 3 GB of session files (9,000+ files across three CLIs) took over 40 seconds. I wanted something instant.

So I built toktrack — a terminal-native token usage tracker that parses everything locally at 2 GiB/s.

The Data

Each AI CLI stores session data differently:

CLI          Location                        Format
Claude Code  ~/.claude/projects/**/*.jsonl   JSONL, per-message usage
Codex CLI    ~/.codex/sessions/**/*.jsonl    JSONL, cumulative counters
Gemini CLI   ~/.gemini/tmp/*/chats/*.json    JSON, includes thinking_tokens

A single line from a Claude Code session file looks like this:

{"timestamp":"2026-01-15T10:00:00Z","message":{"model":"claude-sonnet-4-20250514","usage":{"input_tokens":12000,"output_tokens":3500,"cache_read_input_tokens":8000,"cache_creation_input_tokens":2000}},"costUSD":0.042}

Multiply this by thousands of sessions over months, and you're looking at gigabytes of JSONL to parse.
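Internally, everything gets normalized into one record type before aggregation. Here's a rough sketch of what that shared type can look like; the field names are illustrative rather than toktrack's exact definition (a UsageEntry type appears in the real parsing code below):

// Hypothetical normalized record; field names are illustrative.
#[derive(Debug, Default, Clone)]
pub struct UsageEntry {
    pub timestamp: String,          // ISO 8601, e.g. "2026-01-15T10:00:00Z"
    pub model: String,              // e.g. "claude-sonnet-4-20250514"
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub cache_read_tokens: u64,     // Claude: cache_read_input_tokens
    pub cache_creation_tokens: u64, // Claude: cache_creation_input_tokens
    pub thinking_tokens: u64,       // Gemini only; zero elsewhere
    pub cost_usd: Option<f64>,      // Claude logs include costUSD directly
}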

Why simd-json

Standard serde_json is good. But when you're parsing 3 GB of line-delimited JSON, every microsecond per line adds up.

simd-json is a Rust port of simdjson that uses SIMD instructions (AVX2, SSE4.2, NEON) to parse JSON significantly faster. The key trick: in-place parsing with mutable buffers.

use serde::Deserialize;

#[derive(Deserialize)]
struct ClaudeJsonLine<'a> {
    timestamp: &'a str,              // borrowed, zero-copy
    #[serde(rename = "requestId")]
    request_id: Option<&'a str>,     // borrowed, zero-copy
    message: Option<ClaudeMessage<'a>>, // nested model + usage struct (definition elided)
    #[serde(rename = "costUSD")]
    cost_usd: Option<f64>,
}

By using &'a str instead of String, we avoid heap allocations for every field. simd-json parses the JSON in-place on a mutable byte buffer, and our structs just borrow slices from that buffer.

The one gotcha: simd-json's from_slice requires &mut [u8], so you need to own a mutable copy of each line:

use std::fs::File;
use std::io::{BufRead, BufReader};

let reader = BufReader::new(File::open(path)?);
for line in reader.lines() {
    let line = line?;
    let mut bytes = line.into_bytes();  // owned, mutable copy of the line
    if let Ok(parsed) = simd_json::from_slice::<ClaudeJsonLine>(&mut bytes) {
        // extract owned data here; `parsed` borrows from `bytes`
    }
}

This gave a 17-25% throughput improvement over standard serde_json on my dataset.
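If you want to sanity-check numbers like that on your own logs, a rough harness is enough. The sketch below parses every line once with serde_json and once with simd-json into generic values; the simd-json path copies each line into a scratch buffer first, and that copy is timed on purpose since the real parser pays it too.

use std::time::Instant;

fn bench(data: &str) {
    // Pass 1: serde_json, parsing each line into a generic Value.
    let start = Instant::now();
    let mut ok = 0usize;
    for line in data.lines() {
        if serde_json::from_str::<serde_json::Value>(line).is_ok() {
            ok += 1;
        }
    }
    println!("serde_json: {ok} lines in {:?}", start.elapsed());

    // Pass 2: simd-json. It needs &mut [u8], so each line is copied
    // into a reusable scratch buffer before parsing.
    let start = Instant::now();
    let mut ok = 0usize;
    let mut buf: Vec<u8> = Vec::new();
    for line in data.lines() {
        buf.clear();
        buf.extend_from_slice(line.as_bytes());
        if simd_json::to_borrowed_value(&mut buf).is_ok() {
            ok += 1;
        }
    }
    println!("simd-json:  {ok} lines in {:?}", start.elapsed());
}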

Adding Parallelism with rayon

A single-threaded parser hit ~1 GiB/s. But with 9,000+ files, the work parallelizes trivially at the file level using rayon:

use rayon::prelude::*;

let entries: Vec<UsageEntry> = files
    .par_iter()
    .flat_map(|f| parser.parse_file(f).unwrap_or_default())
    .collect();

That's it. rayon's par_iter() distributes files across threads automatically. Combined with simd-json, this pushed throughput to ~2 GiB/s, roughly 2.5x the serde_json baseline.

Stage                  Throughput
serde_json (baseline)  ~800 MiB/s
simd-json (zero-copy)  ~1.0 GiB/s
simd-json + rayon      ~2.0 GiB/s

The Hard Part: Each CLI is Different

The real complexity wasn't parsing speed — it was handling three completely different data formats behind a single trait:

pub trait CLIParser: Send + Sync {
    fn name(&self) -> &str;
    fn data_dir(&self) -> PathBuf;
    fn file_pattern(&self) -> &str;
    fn parse_file(&self, path: &Path) -> Result<Vec<UsageEntry>>;
}

Claude Code is straightforward — each JSONL line with a message.usage field is one API call.
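As a sketch, here's what the Claude implementation can look like, reusing ClaudeJsonLine from earlier. The conversion of message.usage into token counts is elided, and the trait's Result is assumed to be anyhow-style:

use std::fs::File;
use std::io::{BufRead, BufReader};
use std::path::{Path, PathBuf};

struct ClaudeParser;

impl CLIParser for ClaudeParser {
    fn name(&self) -> &str { "claude" }

    fn data_dir(&self) -> PathBuf {
        // Assumes $HOME is set; the real code likely handles this more carefully.
        PathBuf::from(std::env::var("HOME").unwrap_or_default()).join(".claude/projects")
    }

    fn file_pattern(&self) -> &str { "**/*.jsonl" }

    fn parse_file(&self, path: &Path) -> Result<Vec<UsageEntry>> {
        let reader = BufReader::new(File::open(path)?);
        let mut out = Vec::new();
        for line in reader.lines() {
            let mut bytes = line?.into_bytes();
            // Lines that don't match ClaudeJsonLine (tool output etc.) are skipped.
            if let Ok(parsed) = simd_json::from_slice::<ClaudeJsonLine>(&mut bytes) {
                if parsed.message.is_some() {
                    out.push(UsageEntry {
                        timestamp: parsed.timestamp.to_owned(),
                        cost_usd: parsed.cost_usd,
                        // model and token counts come from message.usage (elided)
                        ..Default::default()
                    });
                }
            }
        }
        Ok(out)
    }
}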

Codex CLI was tricky. Token counts are cumulative — each token_count event reports the running total, not a delta. And the model name is in a separate turn_context line. So parsing is stateful:

line 1: session_meta  → extract session_id
line 2: turn_context  → extract model name
line 3: event_msg     → token_count (cumulative total)
line 4: event_msg     → token_count (larger cumulative total)

You need to keep only the last token_count per session.
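In code, that boils down to a small fold over the file. This sketch uses serde_json::Value for brevity, and the field paths ("type", "model", "total_tokens") are assumptions about the shape above, not the exact Codex schema:

use serde_json::Value;

// Illustrative stateful pass over one Codex session file.
fn parse_codex_session(lines: impl Iterator<Item = String>) -> Option<(String, u64)> {
    let mut model: Option<String> = None;
    let mut last_total: Option<u64> = None;
    for line in lines {
        let Ok(v) = serde_json::from_str::<Value>(&line) else { continue };
        match v["type"].as_str() {
            Some("turn_context") => {
                model = v["model"].as_str().map(str::to_owned);
            }
            Some("event_msg") => {
                // Cumulative counter: each event supersedes the previous,
                // so overwrite and keep only the last value seen.
                if let Some(total) = v["total_tokens"].as_u64() {
                    last_total = Some(total);
                }
            }
            _ => {}
        }
    }
    // One entry per session: the model plus the final cumulative total.
    Some((model?, last_total?))
}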

Gemini CLI uses standard JSON (not JSONL) with a unique thinking_tokens field that no other CLI tracks.
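Since it's one JSON document per chat, the whole file can be read and parsed in one shot. The field names in this sketch are guesses based on the thinking_tokens description, not the actual Gemini schema:

use serde::Deserialize;

#[derive(Deserialize)]
struct GeminiChat {
    messages: Vec<GeminiMessage>, // illustrative; the real schema has more fields
}

#[derive(Deserialize)]
struct GeminiMessage {
    model: Option<String>,
    input_tokens: Option<u64>,
    output_tokens: Option<u64>,
    thinking_tokens: Option<u64>, // unique to Gemini CLI
}

// Whole-file parse: standard JSON, not JSONL.
let mut bytes = std::fs::read(path)?;
let chat: GeminiChat = simd_json::from_slice(&mut bytes)?;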

TUI with ratatui

For the dashboard, I used ratatui to build 4 views:

  • Overview — Total tokens/cost with a GitHub-style 52-week heatmap
  • Models — Per-model breakdown with percentage bars
  • Daily — Scrollable table with sparkline charts
  • Stats — Key metrics in a card grid

The heatmap uses 2x2 Unicode block characters to fit 52 weeks of data in a compact space, with percentile-based color intensity.
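For the intensity part, here's a sketch of a percentile mapping; the five-level scale and thresholds are my guesses at a reasonable scheme, not toktrack's exact values:

// Map a day's token count to one of five intensity levels (0 = empty).
// `sorted_days` holds all non-zero daily totals, sorted ascending.
fn intensity(day_tokens: u64, sorted_days: &[u64]) -> u8 {
    if day_tokens == 0 || sorted_days.is_empty() {
        return 0;
    }
    // Rank of this day among all days -> percentile.
    let rank = sorted_days.partition_point(|&t| t <= day_tokens);
    let pct = rank as f64 / sorted_days.len() as f64;
    match pct {
        p if p <= 0.25 => 1,
        p if p <= 0.50 => 2,
        p if p <= 0.75 => 3,
        _ => 4,
    }
}

Percentile buckets rather than a linear scale keep a single heavy day from washing out the rest of the year's colors.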

Results

On my machine (Apple Silicon, 9,000+ files, 3.4 GB total):

Scenario               Time
Cold start (no cache)  ~1.2s
Warm start (cached)    ~0.05s

The caching layer stores daily summaries in ~/.toktrack/cache/. Past dates are immutable — only today is recomputed. This means even when Claude Code deletes session files after 30 days, your cost history survives.
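A sketch of that rule, assuming chrono for dates, one JSON summary file per day, and hypothetical DailySummary / compute_summary helpers:

use chrono::{Local, NaiveDate};
use std::path::Path;

fn load_or_compute(date: NaiveDate, cache_dir: &Path) -> DailySummary {
    // NaiveDate's Display is ISO 8601, so files are named like 2026-01-15.json.
    let path = cache_dir.join(format!("{date}.json"));
    // Past days are immutable: a cached summary can be trusted forever.
    if date < Local::now().date_naive() {
        if let Ok(bytes) = std::fs::read(&path) {
            if let Ok(summary) = serde_json::from_slice(&bytes) {
                return summary;
            }
        }
    }
    // Today (or a cache miss): rescan the raw session files for this day.
    let summary = compute_summary(date); // hypothetical helper
    let _ = std::fs::write(&path, serde_json::to_vec(&summary).unwrap());
    summary
}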

Try It

npx toktrack
# or
cargo install toktrack

GitHub: github.com/mag123c/toktrack

If you use Claude Code, Codex CLI, or Gemini CLI and want to know where your tokens are going — give it a try.
