The Problem
I use Claude Code, Codex CLI, and Gemini CLI daily. One day I checked my API bill — it was way higher than expected. But I had no idea where the tokens were going.
Existing tracking tools were too slow. Scanning my 3 GB of session files (9,000+ files across three CLIs) took over 40 seconds. I wanted something instant.
So I built toktrack — a terminal-native token usage tracker that parses everything locally at 2 GiB/s.
The Data
Each AI CLI stores session data differently:
| CLI | Location | Format |
|---|---|---|
| Claude Code | `~/.claude/projects/**/*.jsonl` | JSONL, per-message usage |
| Codex CLI | `~/.codex/sessions/**/*.jsonl` | JSONL, cumulative counters |
| Gemini CLI | `~/.gemini/tmp/*/chats/*.json` | JSON, includes `thinking_tokens` |
A single Claude Code session file can look like this:
{"timestamp":"2026-01-15T10:00:00Z","message":{"model":"claude-sonnet-4-20250514","usage":{"input_tokens":12000,"output_tokens":3500,"cache_read_input_tokens":8000,"cache_creation_input_tokens":2000}},"costUSD":0.042}
Multiply this by thousands of sessions over months, and you're looking at gigabytes of JSONL to parse.
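Under the hood, all three formats have to collapse into one normalized entry type before anything else happens. Here's a minimal sketch of what that struct could look like (the field names are my guesses, not toktrack's actual definition):

```rust
// Hypothetical normalized entry; toktrack's real struct may differ.
#[derive(Debug, Default, Clone)]
pub struct UsageEntry {
    pub timestamp: String,          // ISO 8601, e.g. "2026-01-15T10:00:00Z"
    pub model: String,              // e.g. "claude-sonnet-4-20250514"
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub cache_read_tokens: u64,     // Claude Code's cache_read_input_tokens
    pub cache_creation_tokens: u64, // Claude Code's cache_creation_input_tokens
    pub thinking_tokens: u64,       // Gemini CLI only; zero elsewhere
    pub cost_usd: Option<f64>,      // Claude Code logs this as costUSD
}
```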
Why simd-json
Standard serde_json is good. But when you're parsing 3 GB of line-delimited JSON, every microsecond per line adds up.
simd-json is a Rust port of simdjson that uses SIMD instructions (AVX2, SSE4.2, NEON) to parse JSON significantly faster. The key trick: in-place parsing with mutable buffers.
use serde::Deserialize;

#[derive(Deserialize)]
struct ClaudeJsonLine<'a> {
    timestamp: &'a str, // borrowed, zero-copy
    #[serde(rename = "requestId", borrow)] // serde only borrows &str implicitly; Option<&str> needs an explicit borrow
    request_id: Option<&'a str>,
    #[serde(borrow)]
    message: Option<ClaudeMessage<'a>>,
    #[serde(rename = "costUSD")]
    cost_usd: Option<f64>,
}
By using &'a str instead of String, we avoid heap allocations for every field. simd-json parses the JSON in-place on a mutable byte buffer, and our structs just borrow slices from that buffer.
The one gotcha: simd-json's from_slice requires &mut [u8], so you need to own a mutable copy of each line:
use std::fs::File;
use std::io::{BufRead, BufReader};

let reader = BufReader::new(File::open(path)?);
for line in reader.lines() {
    let line = line?;
    let mut bytes = line.into_bytes(); // simd-json needs &mut [u8], so take ownership
    if let Ok(parsed) = simd_json::from_slice::<ClaudeJsonLine>(&mut bytes) {
        // extract what we need here; `parsed` borrows from `bytes`
    }
}
This gave a 17-25% throughput improvement over standard serde_json on my dataset.
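One more squeeze worth knowing about: `reader.lines()` allocates a fresh `String` for every line. A reusable-buffer variant with `read_until` avoids that (my variation on the loop above, not necessarily what toktrack ships):

```rust
use std::fs::File;
use std::io::{BufRead, BufReader};

let mut reader = BufReader::new(File::open(path)?);
let mut buf: Vec<u8> = Vec::with_capacity(4096); // reused for every line
loop {
    buf.clear();
    if reader.read_until(b'\n', &mut buf)? == 0 {
        break; // EOF
    }
    // simd-json mutates `buf` in place; anything we keep must be
    // copied out before the next iteration clears the buffer.
    if let Ok(parsed) = simd_json::from_slice::<ClaudeJsonLine>(&mut buf) {
        // extract what we need here
    }
}
```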
Adding Parallelism with rayon
A single-threaded parser hit ~1 GiB/s. But with 9,000+ files, we can parallelize at the file level trivially using rayon:
use rayon::prelude::*;

// One task per file; rayon balances them across a work-stealing thread pool.
let entries: Vec<UsageEntry> = files
    .par_iter()
    .flat_map(|f| parser.parse_file(f).unwrap_or_default())
    .collect();
That's it. rayon's par_iter() distributes files across threads automatically. Combined with simd-json, this pushed throughput to ~2 GiB/s — roughly 2.5x the single-threaded serde_json baseline:
| Stage | Throughput |
|---|---|
| serde_json (baseline) | ~800 MiB/s |
| simd-json (zero-copy) | ~1.0 GiB/s |
| simd-json + rayon | ~2.0 GiB/s |
The Hard Part: Each CLI is Different
The real complexity wasn't parsing speed — it was handling three completely different data formats behind a single trait:
pub trait CLIParser: Send + Sync {
    /// Human-readable name of the CLI.
    fn name(&self) -> &str;
    /// Root directory where this CLI writes its session logs.
    fn data_dir(&self) -> PathBuf;
    /// Glob pattern for session files under data_dir().
    fn file_pattern(&self) -> &str;
    /// Parse one session file into normalized usage entries.
    fn parse_file(&self, path: &Path) -> Result<Vec<UsageEntry>>;
}
Claude Code is straightforward — each JSONL line with a message.usage field is one API call.
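To make that concrete, here's a sketch of how a Claude implementation of the trait could look. The `ClaudeMessage`/usage field shapes and the dirs-based home resolution are my assumptions, and error handling is trimmed:

```rust
use std::fs::File;
use std::io::{BufRead, BufReader};
use std::path::{Path, PathBuf};

struct ClaudeParser;

impl CLIParser for ClaudeParser {
    fn name(&self) -> &str { "claude-code" }

    fn data_dir(&self) -> PathBuf {
        // via the `dirs` crate; toktrack may resolve this differently
        dirs::home_dir().unwrap_or_default().join(".claude/projects")
    }

    fn file_pattern(&self) -> &str { "**/*.jsonl" }

    fn parse_file(&self, path: &Path) -> Result<Vec<UsageEntry>> {
        let reader = BufReader::new(File::open(path)?);
        let mut entries = Vec::new();
        for line in reader.lines() {
            let mut bytes = line?.into_bytes();
            // Lines that don't parse (tool output, metadata) are skipped;
            // parsed lines without message.usage fall through the if-let.
            let Ok(parsed) = simd_json::from_slice::<ClaudeJsonLine>(&mut bytes) else {
                continue;
            };
            if let Some(usage) = parsed.message.and_then(|m| m.usage) {
                entries.push(UsageEntry {
                    timestamp: parsed.timestamp.to_string(),
                    input_tokens: usage.input_tokens,
                    output_tokens: usage.output_tokens,
                    cost_usd: parsed.cost_usd,
                    ..Default::default()
                });
            }
        }
        Ok(entries)
    }
}
```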
Codex CLI was tricky. Token counts are cumulative — each token_count event reports the running total, not a delta. And the model name is in a separate turn_context line. So parsing is stateful:
line 1: session_meta → extract session_id
line 2: turn_context → extract model name
line 3: event_msg → token_count (cumulative total)
line 4: event_msg → token_count (larger cumulative total)
You need to keep only the last token_count per session.
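In code, that statefulness is just a couple of mutable locals per file. A minimal sketch, assuming one session per file and a hypothetical CodexLine struct for the line types traced above:

```rust
// Hypothetical shape of a parsed Codex line; toktrack's real types may differ.
struct CodexLine<'a> {
    line_type: &'a str,        // "session_meta" | "turn_context" | "event_msg"
    model: Option<&'a str>,    // set on turn_context lines
    total_tokens: Option<u64>, // cumulative counter on token_count events
}

fn summarize_codex_session(lines: &[CodexLine]) -> Option<UsageEntry> {
    let mut model = String::new();
    let mut last_total = None;
    for line in lines {
        match line.line_type {
            // The model name arrives on a separate line from the counters...
            "turn_context" => {
                if let Some(m) = line.model {
                    model = m.to_string();
                }
            }
            // ...and counters are cumulative: later events supersede earlier
            // ones, so we overwrite and keep whatever came last.
            "event_msg" => {
                if line.total_tokens.is_some() {
                    last_total = line.total_tokens;
                }
            }
            _ => {}
        }
    }
    last_total.map(|total| UsageEntry {
        model,
        input_tokens: total, // simplification: real parsing splits input/output
        ..Default::default()
    })
}
```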
Gemini CLI uses standard JSON (not JSONL) with a unique thinking_tokens field that no other CLI tracks.
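Since each file is one JSON document, it can be slurped and parsed in a single shot. A sketch of the deserialization side (the field names below are my guesses at the layout, not Gemini CLI's documented schema):

```rust
#[derive(serde::Deserialize)]
struct GeminiChat {
    messages: Vec<GeminiMessage>,
}

#[derive(serde::Deserialize)]
struct GeminiMessage {
    model: Option<String>,
    tokens: Option<GeminiTokens>,
}

#[derive(serde::Deserialize)]
struct GeminiTokens {
    input: u64,
    output: u64,
    // The field unique to Gemini CLI: tokens spent on internal reasoning.
    thinking_tokens: Option<u64>,
}

// Whole-file JSON goes through the same simd-json path as the JSONL parsers:
let mut bytes = std::fs::read(path)?;
let chat: GeminiChat = simd_json::from_slice(&mut bytes)?;
```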
TUI with ratatui
For the dashboard, I used ratatui to build 4 views:
- Overview — Total tokens/cost with a GitHub-style 52-week heatmap
- Models — Per-model breakdown with percentage bars
- Daily — Scrollable table with sparkline charts
- Stats — Key metrics in a card grid
The heatmap uses 2x2 Unicode block characters to fit 52 weeks of data in a compact space, with percentile-based color intensity.
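Percentile bucketing is what keeps one monster day from washing out the rest of the year: with absolute thresholds, a single huge day would push every other cell into the faintest shade. A sketch of the idea (my reconstruction, not toktrack's actual code):

```rust
// `daily` holds tokens-per-day, one value per heatmap cell.
fn intensity_buckets(daily: &[u64]) -> Vec<u8> {
    let mut nonzero: Vec<u64> = daily.iter().copied().filter(|&d| d > 0).collect();
    if nonzero.is_empty() {
        return vec![0; daily.len()];
    }
    nonzero.sort_unstable();
    let pct = |p: f64| nonzero[((nonzero.len() - 1) as f64 * p) as usize];
    let (p25, p50, p75) = (pct(0.25), pct(0.50), pct(0.75));
    daily
        .iter()
        .map(|&d| match d {
            0 => 0,             // no usage: empty cell
            d if d <= p25 => 1, // faintest shade
            d if d <= p50 => 2,
            d if d <= p75 => 3,
            _ => 4,             // brightest shade
        })
        .collect()
}
```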
Results
On my machine (Apple Silicon, 9,000+ files, 3.4 GB total):
| Scenario | Time |
|---|---|
| Cold start (no cache) | ~1.2s |
| Warm start (cached) | ~0.05s |
The caching layer stores daily summaries in ~/.toktrack/cache/. Past dates are immutable — only today is recomputed. This means even when Claude Code deletes session files after 30 days, your cost history survives.
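A sketch of that policy (the file layout, names, and chrono/serde_json choices here are my assumptions, not toktrack's actual code):

```rust
use chrono::NaiveDate;
use serde::{Deserialize, Serialize};
use std::path::PathBuf;

// Hypothetical summary shape; the real cache format may differ.
#[derive(Serialize, Deserialize, Default)]
struct DailySummary {
    tokens: u64,
    cost_usd: f64,
}

fn cache_dir() -> PathBuf {
    dirs::home_dir().unwrap_or_default().join(".toktrack/cache")
}

fn load_or_compute(date: NaiveDate, today: NaiveDate) -> DailySummary {
    let path = cache_dir().join(format!("{date}.json")); // e.g. 2026-01-15.json
    if date < today {
        // Past days never change, so any cache hit is final: it outlives
        // the raw session files the CLI eventually deletes.
        if let Ok(bytes) = std::fs::read(&path) {
            if let Ok(summary) = serde_json::from_slice(&bytes) {
                return summary;
            }
        }
    }
    // Today (or a cache miss) is recomputed from the raw session files.
    let summary = compute_from_sessions(date);
    if let Ok(json) = serde_json::to_vec(&summary) {
        let _ = std::fs::write(&path, json);
    }
    summary
}

fn compute_from_sessions(_date: NaiveDate) -> DailySummary {
    // Scan the parsers' output for entries on this date (elided here).
    DailySummary::default()
}
```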
Try It
npx toktrack
# or
cargo install toktrack
GitHub: github.com/mag123c/toktrack
If you use Claude Code, Codex CLI, or Gemini CLI and want to know where your tokens are going — give it a try.