The Problem
I use Claude Code, Codex CLI, and Gemini CLI daily. One day I checked my API bill — it was way higher than expected. But I had no idea where the tokens were going.
Existing tracking tools were too slow. Scanning my 3 GB of session files (9,000+ files across three CLIs) took over 40 seconds. I wanted something instant.
So I built toktrack — a terminal-native token usage tracker that parses everything locally at 2 GiB/s.
The Data
Each AI CLI stores session data differently:
| CLI | Location | Format |
|---|---|---|
| Claude Code | ~/.claude/projects/**/*.jsonl | JSONL, per-message usage |
| Codex CLI | ~/.codex/sessions/**/*.jsonl | JSONL, cumulative counters |
| Gemini CLI | ~/.gemini/tmp/*/chats/*.json | JSON, includes thinking_tokens |
A single Claude Code session file can look like this:
{"timestamp":"2026-01-15T10:00:00Z","message":{"model":"claude-sonnet-4-20250514","usage":{"input_tokens":12000,"output_tokens":3500,"cache_read_input_tokens":8000,"cache_creation_input_tokens":2000}},"costUSD":0.042}
Multiply this by thousands of sessions over months, and you're looking at gigabytes of JSONL to parse.
Why simd-json
Standard serde_json is good. But when you're parsing 3 GB of line-delimited JSON, every microsecond per line adds up.
simd-json is a Rust port of simdjson that uses SIMD instructions (AVX2, SSE4.2, NEON) to parse JSON significantly faster. The key trick: in-place parsing with mutable buffers.
use serde::Deserialize;

#[derive(Deserialize)]
struct ClaudeJsonLine<'a> {
    timestamp: &'a str, // borrowed, zero-copy
    // Types other than plain &str / &[u8] opt in to borrowing explicitly.
    #[serde(rename = "requestId", borrow)]
    request_id: Option<&'a str>, // borrowed, zero-copy
    #[serde(borrow)]
    message: Option<ClaudeMessage<'a>>,
    #[serde(rename = "costUSD")]
    cost_usd: Option<f64>,
}
By using &'a str instead of String, we avoid a heap allocation for every string field. simd-json parses the JSON in-place on a mutable byte buffer, and our structs just borrow slices from that buffer, so a parsed value cannot outlive the buffer it came from.
The one gotcha: simd-json's from_slice requires &mut [u8], so you need to own a mutable copy of each line:
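use std::fs::File;
use std::io::{BufRead, BufReader};

let reader = BufReader::new(File::open(path)?);
for line in reader.lines() {
    let line = line?;
    let mut bytes = line.into_bytes(); // owned, mutable buffer for this line
    if let Ok(parsed) = simd_json::from_slice::<ClaudeJsonLine>(&mut bytes) {
        // Extract what we need here: `parsed` borrows from `bytes`,
        // which is dropped at the end of this iteration.
    }
}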
This gave a 17-25% throughput improvement over standard serde_json on my dataset.
Adding Parallelism with rayon
A single-threaded parser hit ~1 GiB/s. But with 9,000+ files, we can parallelize at the file level trivially using rayon:
use rayon::prelude::*;

let entries: Vec<UsageEntry> = files
    .par_iter()
    .flat_map(|f| parser.parse_file(f).unwrap_or_default())
    .collect();
That's it. rayon's par_iter() distributes files across threads automatically. Combined with simd-json, this pushed throughput to ~2 GiB/s — a 3.2x improvement over sequential parsing.
| Stage | Throughput |
|---|---|
| serde_json (baseline) | ~800 MiB/s |
| simd-json (zero-copy) | ~1.0 GiB/s |
| simd-json + rayon | ~2.0 GiB/s |
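The table mixes MiB/s and GiB/s, so it helps to put every figure in one unit before comparing stages. Below is a minimal sketch of that arithmetic; `throughput_gib_per_s` and `speedup` are hypothetical helpers for illustration, not functions from toktrack.

```rust
use std::time::Duration;

/// Bytes per GiB (2^30). MiB and GiB are both binary units.
const GIB: f64 = 1024.0 * 1024.0 * 1024.0;

/// Throughput in GiB/s for `bytes` processed in `elapsed`.
/// Hypothetical helper, not taken from toktrack.
fn throughput_gib_per_s(bytes: u64, elapsed: Duration) -> f64 {
    bytes as f64 / GIB / elapsed.as_secs_f64()
}

/// Ratio of a new throughput figure to a baseline, both in the same unit.
fn speedup(new: f64, baseline: f64) -> f64 {
    new / baseline
}
```

For example, ~800 MiB/s is 800 / 1024, or about 0.78 GiB/s, so that is the value to pass as `baseline` when comparing against a figure quoted in GiB/s.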
The Hard Part: Each CLI is Different
The real complexity wasn't parsing speed — it was handling three completely different data formats behind a single trait:
pub trait CLIParser: Send + Sync {
    fn name(&self) -> &str;
    fn data_dir(&self) -> PathBuf;
    fn file_pattern(&self) -> &str;
    fn parse_file(&self, path: &Path) -> Result<Vec<UsageEntry>>;
}
Claude Code is straightforward — each JSONL line with a message.usage field is one API call.
Codex CLI was tricky. Token counts are cumulative — each token_count event reports the running total, not a delta. And the model name is in a separate turn_context line. So parsing is stateful:
line 1: session_meta → extract session_id
line 2: turn_context → extract model name
line 3: event_msg → token_count (cumulative total)
line 4: event_msg → token_count (larger cumulative total)
You need to keep only the last token_count per session.
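The stateful walk described above can be sketched roughly as follows. It assumes the JSONL lines have already been decoded into a small enum. `CodexLine`, `SessionUsage`, and `summarize_session` are illustrative names, and the fields are simplified stand-ins rather than the real Codex schema or toktrack's actual types.

```rust
/// Simplified stand-ins for already-decoded Codex JSONL lines.
/// Variant and field names are illustrative, not the real schema.
enum CodexLine {
    SessionMeta { session_id: String },
    TurnContext { model: String },
    TokenCount { total_tokens: u64 },
}

/// Final usage for one session file.
#[derive(Debug, Default, PartialEq)]
struct SessionUsage {
    session_id: Option<String>,
    model: Option<String>,
    total_tokens: u64,
}

/// Walks the lines in order, carrying state from one line to the next.
/// Each token_count is a running total, so a later value overwrites
/// the earlier one instead of being added to it.
fn summarize_session(lines: &[CodexLine]) -> SessionUsage {
    let mut usage = SessionUsage::default();
    for line in lines {
        match line {
            CodexLine::SessionMeta { session_id } => {
                usage.session_id = Some(session_id.clone());
            }
            CodexLine::TurnContext { model } => {
                usage.model = Some(model.clone());
            }
            CodexLine::TokenCount { total_tokens } => {
                usage.total_tokens = *total_tokens;
            }
        }
    }
    usage
}
```

Summing the `TokenCount` values instead of overwriting would double-count every earlier turn, which is the mistake the cumulative format invites.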
Gemini CLI uses standard JSON (not JSONL) with a unique thinking_tokens field that no other CLI tracks.
TUI with ratatui
For the dashboard, I used ratatui to build 4 views:
- Overview — Total tokens/cost with a GitHub-style 52-week heatmap
- Models — Per-model breakdown with percentage bars
- Daily — Scrollable table with sparkline charts
- Stats — Key metrics in a card grid
The heatmap uses 2x2 Unicode block characters to fit 52 weeks of data in a compact space, with percentile-based color intensity.
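One way percentile-based intensity could work is to rank each day against the other active days rather than scale by the maximum. The sketch below is an assumption about the approach, not toktrack's actual code: `intensity_levels` is a hypothetical function, and the five-level scale (0 for no usage, 1 to 4 for quartiles) is chosen only for illustration.

```rust
/// Maps each day's token count to an intensity level 0..=4.
/// Level 0 means no usage; levels 1..=4 are quartiles of the
/// non-zero days. The five-level scale is an assumption.
fn intensity_levels(daily_tokens: &[u64]) -> Vec<u8> {
    // Sort the non-zero values once so ranks can be found by binary search.
    let mut active: Vec<u64> = daily_tokens.iter().copied().filter(|&t| t > 0).collect();
    active.sort_unstable();

    daily_tokens
        .iter()
        .map(|&tokens| {
            if tokens == 0 {
                return 0;
            }
            // Number of active days with a value <= this one.
            let rank = active.partition_point(|&v| v <= tokens);
            // Percentile in (0, 1], rounded up into 1..=4.
            let percentile = rank as f64 / active.len() as f64;
            (percentile * 4.0).ceil() as u8
        })
        .collect()
}
```

Ranking by percentile keeps a single very heavy day from pushing every other day into the lowest color bucket, which is what linear scaling against the maximum tends to do.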
Results
On my machine (Apple Silicon, 9,000+ files, 3.4 GB total):
| Scenario | Time |
|---|---|
| Cold start (no cache) | ~1.2s |
| Warm start (cached) | ~0.05s |
The caching layer stores daily summaries in ~/.toktrack/cache/. Past dates are immutable — only today is recomputed. This means even when Claude Code deletes session files after 30 days, your cost history survives.
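The "past dates are immutable" rule can be sketched as a merge between cached and freshly parsed daily summaries. This is a minimal illustration under stated assumptions: `DailySummary`, `merge_cache`, and the use of `YYYY-MM-DD` string keys are hypothetical, and the on-disk format under ~/.toktrack/cache/ is not modeled.

```rust
use std::collections::BTreeMap;

/// Per-day totals. The fields are illustrative.
#[derive(Clone, Debug, PartialEq)]
struct DailySummary {
    total_tokens: u64,
    cost_usd: f64,
}

/// Merges freshly parsed days into the cached ones.
/// Dates are "YYYY-MM-DD" strings, so string order is date order.
/// Cached entries before `today` are kept as-is; `today` and any day
/// missing from the cache are taken from the fresh parse.
fn merge_cache(
    cached: &BTreeMap<String, DailySummary>,
    fresh: &BTreeMap<String, DailySummary>,
    today: &str,
) -> BTreeMap<String, DailySummary> {
    let mut merged = cached.clone();
    for (date, summary) in fresh {
        let is_past = date.as_str() < today;
        if is_past && merged.contains_key(date) {
            continue; // past day already cached: treat it as immutable
        }
        merged.insert(date.clone(), summary.clone());
    }
    merged
}
```

Because cached past days are never overwritten or dropped, a day whose session files have since been deleted still appears in the merged result.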
Try It
npx toktrack
GitHub: github.com/mag123c/toktrack
If you use Claude Code, Codex CLI, or Gemini CLI and want to know where your tokens are going — give it a try.
Top comments (2)
The Codex cumulative-counter bit is the part people usually miss. Nice callout on caching daily summaries too, because raw CLI logs disappearing or changing shape can make historical cost analysis surprisingly fragile.