© 2005 gh555.com All Rights Reserved.
Links
- Project discussion and roadmap: https://github.com/gh555com/qqq/discussions/6
- Official download: https://www.gh555.com/qqq
- Source code: https://github.com/gh555com/qqq
- qqq IDE Development Progress since 2026.05.16: https://github.com/gh555com/qqq/discussions/7
Metrics Compass — Complete Manual
==============
Last updated: 2026-05-18
Total metrics: 48 (40 in-panel + 5 out-of-panel + 3 hidden)
Location: chat.html #metrics-panel — below the candlestick chart, above the message area
────────────────────────────────────────────────
I. Turn Zone (Per-turn, one-shot metrics) — 16 items
────────────────────────────────────────────────
[ 1 ] id="mv-in" · Label: "↑" · Name: Turn Input Tokens
──────────────────────────────────
Meaning: Total tokens sent to the AI this turn. Includes system prompt + conversation history + current user message.
Source: agent.js → usage.prompt_tokens (accumulated each turn)
Example values: 12.3K / 456K / 1.2M
Formatting: ≥1M → "1.2M", ≥1K → "12.3K", otherwise raw value
Appearance: White text, default opacity 0.65
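The formatting rule above can be sketched as a small helper. This is a minimal illustration, not the code in agent.js: the name formatTokens is hypothetical, and trimming a trailing ".0" is an assumption made so that 456000 renders as "456K" like the example values.

```javascript
// Hypothetical helper: compact token formatting per the rule above.
function formatTokens(n) {
  // Assumption: drop a trailing ".0" so 456000 shows as "456K", not "456.0K".
  const trim = (x) => x.toFixed(1).replace(/\.0$/, "");
  if (n >= 1e6) return trim(n / 1e6) + "M"; // >= 1M
  if (n >= 1e3) return trim(n / 1e3) + "K"; // >= 1K
  return String(n);                         // raw value
}
```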
[ 2 ] id="mv-out" · Label: "↓" · Name: Turn Output Tokens
──────────────────────────────────
Meaning: Total tokens generated by the AI this turn. Includes response body + tool-call JSON.
Source: agent.js → usage.completion_tokens (accumulated each turn)
Example values: 2.1K / 890 / 15.6K
Formatting: Same as mv-in
Appearance: White text
[ 3 ] id="mv-think" · Label: "🧠" · Name: Deep Thinking Tokens
──────────────────────────────────
Meaning: reasoning_tokens, the tokens consumed by DeepSeek's internal chain-of-thought. Higher = deeper thinking and usually better quality, but higher cost. Non-zero only when a tier with thinking enabled is active.
Source: agent.js → usage.completion_tokens_details.reasoning_tokens
Example values: 0 / 3.2K / 12.8K
Appearance: White text
[ 4 ] id="mv-cache" · Label: "💾" · Name: Prefix Cache Hit Rate
──────────────────────────────────
Meaning: cacheHit / (cacheHit + cacheMiss) × 100%. Higher hit rate = more savings, because cached tokens cost ~1/120th of uncached tokens.
Source: Client-side → Math.round(cacheHitTokens / total * 100)
Example values: 85% / 42% / 0%
Color rules:
≥60% → green mc-hi (high savings efficiency)
20–59% → yellow mc-warn (average)
<20% → red mc-bad (wasteful)
No data → white
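As a rough sketch of the hit-rate formula and color rules above: the function name cacheHitRate and the returned object shape are assumptions for illustration only; the class names come from the text.

```javascript
// Hypothetical helper: hit rate in percent plus the CSS class from the color rules.
function cacheHitRate(cacheHitTokens, cacheMissTokens) {
  const total = cacheHitTokens + cacheMissTokens;
  if (total <= 0) return { pct: null, cls: "" }; // no data: plain white text
  const pct = Math.round((cacheHitTokens / total) * 100);
  const cls = pct >= 60 ? "mc-hi" : pct >= 20 ? "mc-warn" : "mc-bad";
  return { pct, cls };
}
```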
[ 5 ] id="mv-hit" · Label: "Hit" · Name: Cache Hit Token Count
──────────────────────────────────
Meaning: Tokens that hit the prefix cache this turn. Billed at a very low rate (~¥0.025/M tokens).
Source: agent.js → usage.prompt_cache_hit_tokens
Example values: 24.5K / 0 / 890K
Color: Always mc-hi green (good news)
[ 6 ] id="mv-miss" · Label: "Miss" · Name: Cache Miss Token Count
──────────────────────────────────
Meaning: Tokens that did not hit the prefix cache this turn. Billed at full price (~¥3/M tokens) — 120× the cached rate.
Source: agent.js → usage.prompt_cache_miss_tokens
Example values: 1.2K / 89K / 0
Color: Always mc-bad red (reminder: this portion is expensive)
[ 7 ] id="mv-cost" · Label: "💰" · Name: Turn ge Cost
──────────────────────────────────
Meaning: Total ge consumed this turn. Billed server-side, includes all API calls + vision analysis fees. 1 ge ≈ ¥1.
Source: agent.js → _turnCostWge / 10000
Example values: 0.0032 / <0.001 / 0.125
Formatting: ≤0 → "0", <0.001 → "<0.001", <0.01 → 4 decimal places, otherwise 3 decimal places
Appearance: White text
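The cost formatting rule can be sketched as follows. The name formatGe is hypothetical; the division by 10000 follows the Source line, and the thresholds follow the Formatting line as written.

```javascript
// Hypothetical helper: convert the raw _turnCostWge-style value to a ge string.
function formatGe(costWge) {
  const ge = costWge / 10000;          // raw value is in 1/10000 ge
  if (ge <= 0) return "0";
  if (ge < 0.001) return "<0.001";
  if (ge < 0.01) return ge.toFixed(4); // small amounts: 4 decimal places
  return ge.toFixed(3);                // otherwise: 3 decimal places
}
```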
[ 8 ] id="mv-tier" · Label: (none) · Name: Model Tier
──────────────────────────────────
Meaning: Which model configuration is used this turn. Normal messages use Pro+Max (deep thinking); casual greetings use Flash (no thinking, extremely cheap).
Source: agent.js → _metrics.turn.tier ("⚡ Flash" or "🧠 Pro+Max")
Example values: "🧠 Pro+Max" / "⚡ Flash" / "—" (not started)
Appearance: White text, font-weight: normal, opacity: 0.7
[ 9 ] id="mv-tools" · Label: "🔧" · Name: Tool Call Count
──────────────────────────────────
Meaning: How many times the AI called tools this turn (read_file / edit_file / run_command / search_text, etc.).
Source: agent.js → _metrics.turn.toolCount (accumulated each turn)
Example values: 0 / 3 / 12
Appearance: White text
[10] id="mv-time" · Label: "⏱" · Name: Turn Duration
──────────────────────────────────
Meaning: Total time from the user sending a message to the AI completing its full reply (including all tool calls).
Source: agent.js → Date.now() - _turnStart
Example values: 2.3s / 850ms / 1.2m
Formatting: ≥60s → "1.2m", ≥1s → "2.3s", otherwise "850ms"
Appearance: White text
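A minimal sketch of the duration formatting above, assuming a millisecond input. The name formatDuration is hypothetical, and rounding sub-second values to whole milliseconds is an assumption.

```javascript
// Hypothetical helper: format a duration in milliseconds as ms / s / m.
function formatDuration(ms) {
  if (ms >= 60000) return (ms / 60000).toFixed(1) + "m"; // >= 60s
  if (ms >= 1000) return (ms / 1000).toFixed(1) + "s";   // >= 1s
  return Math.round(ms) + "ms";
}
```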
[11] id="mv-tps" · Label: "tok/s" · Name: Output Throughput
──────────────────────────────────
Meaning: completionTokens / duration (seconds). Measures the AI's text generation speed — higher = faster typing.
Source: Client-derived → Math.round(completionTokens / durationMs * 1000)
Example values: 85 / 120 / 0 (displays "—" when no valid data)
Color: mc-hi green
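The throughput derivation might look like the sketch below. The name tokensPerSecond is hypothetical, and treating zero or missing inputs as "no valid data" is an assumption about when the placeholder appears.

```javascript
// Hypothetical helper: output tokens per second, or the placeholder when undefined.
function tokensPerSecond(completionTokens, durationMs) {
  // "\u2014" is the dash placeholder shown when there is no valid data.
  if (!(completionTokens > 0) || !(durationMs > 0)) return "\u2014";
  return String(Math.round((completionTokens / durationMs) * 1000));
}
```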
[12] id="mv-save" · Label: "¥Saved" · Name: Turn Cache Savings
──────────────────────────────────
Meaning: Money saved by cache hits this turn (in CNY). Formula: cacheHitTokens × (¥3 − ¥0.025) / 1M.
Source: Client-derived → cacheHitTokens * 2.975 / 1000000
Example values: ¥0.0731 / ¥<0.001 / — (no cache hits)
Color: mc-hi green (savings are always good)
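To make the arithmetic concrete, here is a sketch of the savings formula with the two approximate prices from the text. The names turnSavingsCny and formatSavings are hypothetical, and the 4-decimal display is an assumption inferred from the example values.

```javascript
const MISS_CNY_PER_M = 3;    // approximate uncached price per 1M tokens (from the text)
const HIT_CNY_PER_M = 0.025; // approximate cached price per 1M tokens (from the text)

// Hypothetical helper: CNY saved by cache hits in one turn.
function turnSavingsCny(cacheHitTokens) {
  return (cacheHitTokens * (MISS_CNY_PER_M - HIT_CNY_PER_M)) / 1e6;
}

// Hypothetical helper: display string (4 decimals is an assumption).
function formatSavings(cny) {
  if (!(cny > 0)) return "\u2014"; // no cache hits
  if (cny < 0.001) return "¥<0.001";
  return "¥" + cny.toFixed(4);
}
```

For instance, roughly 24.5K hit tokens at a ¥2.975/M difference comes to about ¥0.073.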
[13] id="mv-retry" · Label: "retry" · Name: API Retry Count
──────────────────────────────────
Meaning: How many times the API call was automatically retried this turn due to 429/502/503/network errors.
Source: agent.js → _metrics.turn.retries
Example values: 0 / 1 / 3
Color rules:
≥3 → red mc-bad (very poor network)
≥1 → yellow mc-warn (some instability)
0 → white (normal)
[14] id="mv-tavg" · Label: "t̄ool" · Name: Average Tool Duration
──────────────────────────────────
Meaning: Average duration per tool call this turn (ms). Measures tool execution efficiency.
Source: Client-derived → toolTotalMs / toolCount (rounded)
Example values: 45ms / 1.2s / — (no tool calls)
Formatting: ≥1s → "1.2s", otherwise "45ms"
Appearance: White text
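The average above can be sketched like this. The name formatAvgToolDuration is hypothetical; toolTotalMs and toolCount are the fields named in the Source line.

```javascript
// Hypothetical helper: average tool duration, formatted as ms or s.
function formatAvgToolDuration(toolTotalMs, toolCount) {
  if (!(toolCount > 0)) return "\u2014"; // no tool calls this turn
  const avgMs = Math.round(toolTotalMs / toolCount);
  return avgMs >= 1000 ? (avgMs / 1000).toFixed(1) + "s" : avgMs + "ms";
}
```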
[15] id="mv-ttft" · Label: "TTFT" · Name: Time To First Token
──────────────────────────────────
Meaning: Latency from sending the HTTP request to receiving the first valid token. Lower = better network and faster inference startup.
Source: agent.js → Date.now() - _requestStartMs (timestamped on arrival of first content/reasoning/tool_call delta)
Example values: 280ms / 1.5s / 4.2s
Color rules:
≤500ms → green mc-hi (very fast)
501–1500ms → normal (no color)
1501–3000ms → yellow mc-warn (slightly slow)
>3000ms → red mc-bad (severe latency)
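The TTFT bands can be expressed as a small classifier. This is a sketch with a hypothetical name (ttftClass); it treats anything above 3000ms as the red band and returns an empty string for the uncolored band.

```javascript
// Hypothetical helper: map a TTFT in milliseconds to the CSS class from the color rules.
function ttftClass(ttftMs) {
  if (ttftMs <= 500) return "mc-hi";    // very fast
  if (ttftMs <= 1500) return "";        // normal, no color
  if (ttftMs <= 3000) return "mc-warn"; // slightly slow
  return "mc-bad";                      // severe latency
}
```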
[16] id="mv-free" · Label: (none) · Name: Free Billing Window Indicator
──────────────────────────────────
Meaning: Whether this turn falls within a free billing window. ✨ = free, — = billed normally.
Source: agent.js → _lastBillingFreeWindow (free_window field from server-side billing event)
Example values: "✨" / "—"
Appearance: White text
────────────────────────────────────────────────
II. Session Zone (Cumulative for current conversation) — 11 items
────────────────────────────────────────────────
[17] id="mv-sin" · Label: "Σ↑" · Name: Session Cumulative Input Tokens
──────────────────────────────────
Meaning: Total input tokens across all turns from the start of this conversation to now.
Source: agent.js → _metrics.session.promptTokens (adds usage.prompt_tokens each turn)
Values: Cumulative version of mv-in; numbers much larger than single-turn values
[18] id="mv-sout" · Label: "Σ↓" · Name: Session Cumulative Output Tokens
──────────────────────────────────
Meaning: Total output tokens across all turns in this conversation.
Source: agent.js → _metrics.session.completionTokens
[19] id="mv-scache" · Label: "Σ💾" · Name: Session Cumulative Cache Hits
──────────────────────────────────
Meaning: Total prefix-cache-hit token count for this conversation (tokens that saved you money).
Source: agent.js → _metrics.session.cacheHitTokens
[20] id="mv-smiss" · Label: "Σ✗" · Name: Session Cumulative Cache Misses
──────────────────────────────────
Meaning: Total prefix-cache-miss token count for this conversation.
Source: agent.js → _metrics.session.cacheMissTokens
[21] id="mv-scost" · Label: "Σ💰" · Name: Session Cumulative ge Cost
──────────────────────────────────
Meaning: Total ge consumed in this conversation. Formatted same as mv-cost.
Source: agent.js → _metrics.session.costGe
[22] id="mv-turns" · Label: "🔄" · Name: Session Turn Count
──────────────────────────────────
Meaning: How many times the user has sent messages in this conversation (each turn = one user message + one complete AI reply).
Source: agent.js → _metrics.session.turns
[23] id="mv-sretry" · Label: "Σretry" · Name: Session Cumulative Retries
──────────────────────────────────
Meaning: Total retry count across all turns in this conversation. Reflects overall network stability for the session.
Source: agent.js → _metrics.session.retries
[24] id="mv-ssave" · Label: "Σ¥Saved" · Name: Session Cumulative Savings
──────────────────────────────────
Meaning: Total money saved by caching across this entire conversation (CNY). Formatted same as mv-save.
Source: agent.js → _metrics.session.cnySaved
Color: mc-hi green
[25] id="mv-savgtps" · Label: "x̄tok/s" · Name: Session Average Throughput
──────────────────────────────────
Meaning: Session total output tokens / session total duration. Reflects average generation speed over the whole conversation.
Source: Client-derived → sessionCompletionTokens / sessionTotalDurationMs * 1000
Example values: 78 / 0 (displays "—" when no valid data)
[26] id="mv-savgsave" · Label: "x̄¥/turn" · Name: Average Savings Per Turn
──────────────────────────────────
Meaning: Total savings / total turns. Reflects average per-turn cache optimization benefit.
Source: Client-derived → sessionCnySaved / sessionTurns
Example values: ¥0.0412 / ¥<0.001 / — (no data)
Color: mc-hi green
[27] id="mv-savgretry" · Label: "x̄ret/turn" · Name: Average Retries Per Turn
──────────────────────────────────
Meaning: Total retries / total turns. Reflects network stability. >0.5 = very unstable network.
Source: Client-derived → sessionRetries / turns
Example values: 0.00 / 0.33 / 1.50
Color rules:
>0.5 → red mc-bad (unstable network)
>0.1 → yellow mc-warn
≤0.1 → white (normal)
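A sketch of the per-turn retry average and its color bands, assuming the yellow band covers values above 0.1 up to 0.5. The name avgRetriesPerTurn and the returned shape are hypothetical.

```javascript
// Hypothetical helper: average retries per turn with the CSS class for its band.
function avgRetriesPerTurn(sessionRetries, turns) {
  if (!(turns > 0)) return { text: "\u2014", cls: "" }; // no turns yet
  const avg = sessionRetries / turns;
  const cls = avg > 0.5 ? "mc-bad" : avg > 0.1 ? "mc-warn" : "";
  return { text: avg.toFixed(2), cls };
}
```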
────────────────────────────────────────────────
III. Engine Zone (Context Engine) — 5 items
────────────────────────────────────────────────
[28] id="mv-facts" · Label: "📚" · Name: Fact Library Count
──────────────────────────────────
Meaning: Number of structured facts extracted by the context engine from compressed old messages. Each fact contains type/content/keywords, used for subsequent semantic retrieval. More facts = the AI knows more about your project context.
Source: agent.js → _ctx.facts.length
Example values: 0 / 23 / 87 (upper limit: 100)
Appearance: White text
[29] id="mv-narr" · Label: "📖" · Name: Narrative Summary Length
──────────────────────────────────
Meaning: Character count of the global narrative summary — a coherent overview of compressed conversation history, injected into every system prompt to maintain AI context continuity across turns.
Source: agent.js → _ctx.narrative.length
Example values: 0 / 456 / 1.2K
Appearance: White text
[30] id="mv-ctx" · Label: "📊" · Name: Context Window Usage
──────────────────────────────────
Meaning: totalTokens / 800K × 100%. 800K is DeepSeek's context window compression trigger threshold. >50% = system begins auto-compressing old messages; >80% = approaching the limit.
Source: agent.js → Math.min(100, Math.round(totalTokens / 800000 * 100))
Example values: 12% / 48% / 82%
Color rules:
>80% → red mc-bad (near the limit)
51–80% → yellow mc-warn (compression underway)
≤50% → green mc-hi (plenty of room)
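The usage percentage and its color bands can be sketched as below. The expression mirrors the Source line; the names contextUsage and COMPRESSION_THRESHOLD_TOKENS are hypothetical.

```javascript
const COMPRESSION_THRESHOLD_TOKENS = 800000; // the 800K figure from the text

// Hypothetical helper: context usage in percent (capped at 100) plus its CSS class.
function contextUsage(totalTokens) {
  const pct = Math.min(100, Math.round((totalTokens / COMPRESSION_THRESHOLD_TOKENS) * 100));
  const cls = pct > 80 ? "mc-bad" : pct > 50 ? "mc-warn" : "mc-hi";
  return { pct, cls };
}
```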
[31] id="mv-warm" · Label: "warm" · Name: Cache Warmup Status
──────────────────────────────────
Meaning: 500ms after the panel opens, a lightweight request (max_tokens=1) is sent to DeepSeek in the background to pre-build the KV cache for the system prompt. This way the user's first message can hit the cache, saving up to 120× in cost.
Source: agent.js → _warmupStatus
Values:
"⏳" = pending — warmup in progress (panel just opened)
"✅" = ok — warmup succeeded (cache built)
"❌" = fail — warmup failed (network/auth issue)
"—" = none — not started (panel never opened)
Colors:
"✅" → green mc-hi
"⏳" → yellow mc-warn
"❌" → red mc-bad
"—" → white
[32] id="mv-lastcall" · Label: "cache" · Name: Time Since Last API Call
──────────────────────────────────
Meaning: How long since the last API call. Used to determine whether the prefix cache is still within its TTL. DeepSeek cache TTL is approximately 5–10 minutes; after 10 minutes the cache may have expired.
Source: agent.js → _lastCallTs; client calculates Date.now() - _lastCallTs
Example values: "12s ago" / "3m ago" / "15m ago" / "—" (never called)
Color rules:
<1m → green mc-hi (cache definitely still alive)
1–5m → green mc-hi
5–10m → yellow mc-warn (may be expiring soon)
>10m → red mc-bad (cache most likely expired)
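One way to derive the label and color from the timestamp is sketched below. The name lastCallDisplay is hypothetical. Rounding down to whole seconds or minutes, and placing exactly 5 minutes in the yellow band, are assumptions; the text does not say how boundaries are handled.

```javascript
// Hypothetical helper: "time since last call" label plus CSS class.
// `now` is injectable so the function can be exercised without a real clock.
function lastCallDisplay(lastCallTs, now = Date.now()) {
  if (!lastCallTs) return { text: "\u2014", cls: "" }; // never called
  const ageMs = Math.max(0, now - lastCallTs);
  const minutes = ageMs / 60000;
  const text = ageMs < 60000
    ? Math.floor(ageMs / 1000) + "s ago"
    : Math.floor(minutes) + "m ago";
  const cls = minutes < 5 ? "mc-hi" : minutes <= 10 ? "mc-warn" : "mc-bad";
  return { text, cls };
}
```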
────────────────────────────────────────────────
IV. Lifetime Zone (Cumulative across IDE restarts) — 8 items
────────────────────────────────────────────────
These 8 items are prefixed with "∞" to indicate they survive IDE restarts. Data is persisted to VS Code globalState['qqq-ai.lifetimeMetrics'].
[33] id="mv-lturns" · Label: "∞turns" · Name: Lifetime Total Turns
──────────────────────────────────
Meaning: Total number of AI conversation turns you have initiated since installing qqq-ai. A cumulative counter that persists across IDE restarts.
Source: agent.js → _lifetime.turns
Persistence: globalState, 2s debounce flush + deactivate flush
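The "2s debounce flush + deactivate flush" pattern could look roughly like the sketch below. It is not the extension's actual code: createLifetimeStore and its return shape are hypothetical, and memento stands for any object with get(key) and update(key, value), which is the shape of VS Code's globalState.

```javascript
// Hypothetical sketch: debounced persistence of lifetime counters.
function createLifetimeStore(memento, delayMs = 2000) {
  const KEY = "qqq-ai.lifetimeMetrics";
  // Start from whatever was persisted by a previous IDE session.
  const data = { turns: 0, ...(memento.get(KEY) || {}) };
  let timer = null;

  // Write immediately; intended to also be called from deactivate().
  function flush() {
    if (timer !== null) {
      clearTimeout(timer);
      timer = null;
    }
    return memento.update(KEY, { ...data });
  }

  // Bump a counter and (re)start the debounce timer.
  function add(field, amount) {
    data[field] = (data[field] || 0) + amount;
    if (timer !== null) clearTimeout(timer);
    timer = setTimeout(flush, delayMs);
  }

  return { data, add, flush };
}
```

In this sketch, a burst of updates produces a single write once the counters have been quiet for delayMs, and the explicit flush() covers the shutdown case where the timer would not get a chance to fire.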
[34] id="mv-lsess" · Label: "∞sess" · Name: Lifetime Total Sessions
──────────────────────────────────
Meaning: Total number of panel sessions you have started (switching tabs or starting a new chat does not count — only creating a new Agent instance counts).
Source: agent.js → _lifetime.sessions
[35] id="mv-lcost" · Label: "∞¥spent" · Name: Lifetime Total Spend
──────────────────────────────────
Meaning: Total ge you have ever spent on qqq AI (approximately equivalent to CNY). Formatted same as mv-cost.
Source: agent.js → _lifetime.costGe
[36] id="mv-lsave" · Label: "∞¥saved" · Name: Lifetime Total Savings
──────────────────────────────────
Meaning: Total money saved by prefix caching since installation (CNY). Highlighted in green so the cumulative savings are easy to spot.
Source: agent.js → _lifetime.cnySaved
Color: mc-hi green
[37] id="mv-lin" · Label: "∞↑" · Name: Lifetime Total Input Tokens
──────────────────────────────────
Meaning: Total input tokens across all your conversations (total context ever sent to the AI).
Source: agent.js → _lifetime.promptTokens
[38] id="mv-lout" · Label: "∞↓" · Name: Lifetime Total Output Tokens
──────────────────────────────────
Meaning: Total output tokens the AI has ever generated for you.
Source: agent.js → _lifetime.completionTokens
[39] id="mv-lret" · Label: "∞ret" · Name: Lifetime Total Retries
──────────────────────────────────
Meaning: Total number of automatic API retries you have ever experienced. Lower = more stable network.
Source: agent.js → _lifetime.retries
[40] id="mv-ldur" · Label: "∞⏱" · Name: Lifetime Total Duration
──────────────────────────────────
Meaning: Total time you have spent with qqq AI (sum of all turn durations).
Source: agent.js → _lifetime.durationMs
Example values: "23m" / "1.5h" / "3.2h"
Formatting: <60 min → "23m", ≥60 min → "1.5h"
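A minimal sketch of this formatting rule, assuming the stored value is in milliseconds as the field name durationMs suggests. The name formatLifetimeDuration is hypothetical, and rounding minutes down is an assumption.

```javascript
// Hypothetical helper: format a lifetime duration as whole minutes or decimal hours.
function formatLifetimeDuration(durationMs) {
  const minutes = durationMs / 60000;
  if (minutes < 60) return Math.floor(minutes) + "m";
  return (minutes / 60).toFixed(1) + "h";
}
```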
────────────────────────────────────────────────
V. Zone 4 — Token Estimator
────────────────────────────────────────────────
Location: Above the input box — id="input-est-row" → id="input-est"
Function: Real-time estimation of how many tokens your current input will consume.
Algorithm:
- Count characters in the input box
- Detect whether CJK characters are present (Chinese / Japanese / Korean)
- Contains CJK → estimated tokens = character count ÷ 1.5
- Pure English → estimated tokens = character count ÷ 4
- Empty input → display "— tok"
Color rules:
- > 4000 tokens → bad (red #d77) ← this turn will be expensive
- 1001–4000 → warn (yellow #d4a017) ← moderate cost
- ≤1000 → normal grey
- Empty input → default
Display format: "~1.2K tok" or "~890 tok" or "— tok"
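The algorithm, color rules, and display format above can be sketched together. The function names and the CJK character ranges are assumptions for illustration, as are rounding up and the clamp to zero; the text only states the divisors and thresholds.

```javascript
// Assumption: a rough CJK test covering kana, CJK ideographs, and Hangul syllables.
const CJK_RE = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]/;

// Hypothetical helper: estimated token count, or null for empty input.
function estimateInputTokens(text) {
  const chars = text.length;
  if (chars === 0) return null;
  const divisor = CJK_RE.test(text) ? 1.5 : 4;
  // Clamp so the estimate can never be negative.
  return Math.max(0, Math.ceil(chars / divisor));
}

// Hypothetical helper: label and CSS class for the estimator row.
function estimateDisplay(text) {
  const tokens = estimateInputTokens(text);
  if (tokens === null) return { label: "\u2014 tok", cls: "" };
  const label = tokens >= 1000
    ? "~" + (tokens / 1000).toFixed(1) + "K tok"
    : "~" + tokens + " tok";
  const cls = tokens > 4000 ? "bad" : tokens > 1000 ? "warn" : "";
  return { label, cls };
}
```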
About "-100 tot": If you see this, the input token estimator displayed a negative value during an abnormal state. The correct value is always a non-negative integer. This is a boundary bug caused by an abnormal input box value.length (e.g., empty string being misidentified). rAF throttling is now in place and this should no longer occur under normal conditions.
────────────────────────────────────────────────
VI. Quick Reference Table
────────────────────────────────────────────────
Zone  #   id            Label       Name
──────────────────────────────────────────────────
Turn  1   mv-in         ↑           Turn Input Tokens
Turn  2   mv-out        ↓           Turn Output Tokens
Turn  3   mv-think      🧠          Deep Thinking Tokens
Turn  4   mv-cache      💾          Cache Hit Rate
Turn  5   mv-hit        Hit         Cache Hit Token Count
Turn  6   mv-miss       Miss        Cache Miss Token Count
Turn  7   mv-cost       💰          Turn ge Cost
Turn  8   mv-tier       (none)      Model Tier
Turn  9   mv-tools      🔧          Tool Call Count
Turn  10  mv-time       ⏱           Turn Duration
Turn  11  mv-tps        tok/s       Output Throughput
Turn  12  mv-save       ¥Saved      Turn Cache Savings
Turn  13  mv-retry      retry       API Retry Count
Turn  14  mv-tavg       t̄ool        Avg Tool Duration
Turn  15  mv-ttft       TTFT        Time To First Token
Turn  16  mv-free       (none)      Free Billing Indicator
──────────────────────────────────────────────────
Sess  17  mv-sin        Σ↑          Session Input Tokens
Sess  18  mv-sout       Σ↓          Session Output Tokens
Sess  19  mv-scache     Σ💾         Session Cache Hits
Sess  20  mv-smiss      Σ✗          Session Cache Misses
Sess  21  mv-scost      Σ💰         Session ge Cost
Sess  22  mv-turns      🔄          Session Turn Count
Sess  23  mv-sretry     Σretry      Session Retries
Sess  24  mv-ssave      Σ¥Saved     Session Savings
Sess  25  mv-savgtps    x̄tok/s      Avg Throughput
Sess  26  mv-savgsave   x̄¥/turn     Avg Savings Per Turn
Sess  27  mv-savgretry  x̄ret/turn   Avg Retries Per Turn
──────────────────────────────────────────────────
Eng   28  mv-facts      📚          Fact Library Count
Eng   29  mv-narr       📖          Narrative Summary Length
Eng   30  mv-ctx        📊          Context Window Usage
Eng   31  mv-warm       warm        Warmup Status
Eng   32  mv-lastcall   cache       Time Since Last Call
──────────────────────────────────────────────────
Life  33  mv-lturns     ∞turns      Lifetime Total Turns
Life  34  mv-lsess      ∞sess       Lifetime Total Sessions
Life  35  mv-lcost      ∞¥spent     Lifetime Total Spend
Life  36  mv-lsave      ∞¥saved     Lifetime Total Savings
Life  37  mv-lin        ∞↑          Lifetime Total Input
Life  38  mv-lout       ∞↓          Lifetime Total Output
Life  39  mv-lret       ∞ret        Lifetime Total Retries
Life  40  mv-ldur       ∞⏱          Lifetime Total Duration
────────────────────────────────────────────────
VII. Out-of-Panel Metrics (outside #metrics-panel) — 5 items
────────────────────────────────────────────────
These metrics are not inside #metrics-panel but are distributed across other parts of the UI. They are still part of the Compass's observable system.
[41] id="cost-label" · Location: Status bar top-left · Name: Window-level Cumulative ge Cost
──────────────────────────────────
Meaning: Cumulative ge consumed since the panel was opened (window-level accumulator). Adds the current turn's cost each time a cost message arrives.
Source: chat.html → totalCostGe variable; cost handler → += parseFloat(msg.estimate)
Example values: 0.0320 ge / 1.25 ge / 12.80 ge
Formatting: <0.01 → 4 decimal places, otherwise 2 decimal places
Color: No special color, follows vscode-foreground
[42] id="rage-fill" + id="rage-label" · Location: Status bar · Name: Rage Meter
──────────────────────────────────
Meaning: Current rage value = current turn cost / baseline, as a percentage. Red progress bar + flame icon with a number.
Source: agent.js → _emitRageDot() → panel.js → rageDot message
Example values: 🔥 42 (meaning rage = 42%)
Formatting: Integer percentage
Color: Fixed red #f14c4c progress bar
[43] id="ge-bar-fill" · Location: 6px-wide vertical bar on the left side of the message area · Name: ge Fractional Progress Bar
──────────────────────────────────
Meaning: Driven by the fractional part of totalCostGe — cycles from 0 to 1 ge. Gives the user a live sense that "money is being spent."
Source: chat.html → updateGeBar(); frac = totalCostGe % 1
Example values: Height 0% ~ 100% (corresponding to ge decimal 0.00 ~ 0.99)
Formatting: h = max(5%, frac * 100)
Color: Gradient #f59e0b → #ef4444, pulses during animation
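The bar height follows directly from the two expressions above. The name geBarHeightPct is hypothetical, and the sketch only computes the percentage; applying it to the element's style is left out.

```javascript
// Hypothetical helper: bar height in percent from the fractional part of the total cost.
function geBarHeightPct(totalCostGe) {
  const frac = totalCostGe % 1;    // fractional part, cycles 0 to 1 per whole ge
  return Math.max(5, frac * 100);  // never below 5% so the bar stays visible
}
```

Because of the modulo, the bar resets each time the total crosses a whole ge, which is what produces the cycling effect described above.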
[44] id="ctx-bar" · Location: 4px-tall horizontal bar at the bottom of the panel · Name: Context Window HP Bar
──────────────────────────────────
Meaning: totalTokens / 800K × 100%, representing context window usage. Compression of old messages begins above 50%.
Source: agent.js → _updateHp() → panel.js → hp message
Example values: 12% (green) / 65% (yellow) / 92% (red)
Formatting: Integer percentage; shows at least 3% when any usage is present
Color rules: ≤50% green #4ec9b0 / 51–80% yellow #cca700 / >80% red #f14c4c
[45] id="input-est" · Location: Above the input box · Name: Input Token Estimator
──────────────────────────────────
Meaning: Real-time estimate of how many tokens the current input text will consume.
Source: chat.html → inputEl.addEventListener('input') → rAF throttle
Example values: ~890 tok / ~1.2K tok / — tok (empty input)
Formatting: ≥1000 → "~N.NK tok", otherwise "~N tok"
Color rules: >4000 red bad / >1000 yellow warn / normal grey / empty input: no class
────────────────────────────────────────────────
VIII. Hidden Metrics (in agent.js but not rendered in UI) — 3 items
────────────────────────────────────────────────
These three fields exist in the _metrics.turn object in agent.js but are never rendered in HTML.
[46] jsonMode · Name: JSON Mode Flag
──────────────────────────────────
Meaning: Marks whether this turn used JSON Mode (response_format: { type: 'json_object' }). Originally planned to be set to true in scenarios that use JSON Mode (such as plan generation/revision), but in the current codebase it is initialized and never set to true — a "reserved but not implemented" field.
Source: agent.js → _metrics.turn.jsonMode (always false)
Example values: false
Formatting: Boolean
Color: None (not rendered in UI)
[47] maxTokens · Name: Per-turn Max Output Token Limit
──────────────────────────────────
Meaning: The max_tokens parameter in the API request, which limits how many tokens the model can generate in a single turn. Currently fixed at 32768, assigned once when turn metrics are created, never changed thereafter.
Source: agent.js → _metrics.turn.maxTokens = 32768
Example values: 32768
Formatting: Integer
Color: None (not rendered in UI)
[48] toolTotalMs · Name: Cumulative Tool Call Duration
──────────────────────────────────
Meaning: Sum of all tool call durations this turn (ms). Accumulated by adding Date.now() - start after each tool completes. Used to derive toolAvgMs = toolTotalMs / toolCount; not displayed directly.
Source: agent.js → _executeToolCallsParallel → this._metrics.turn.toolTotalMs += (Date.now() - _toolStart)
Example values: 1250 (ms)
Formatting: Integer milliseconds
Color: None (not rendered in UI; derived as toolAvgMs for panel display)