GH Health

Posted on May 22

qqq AI: Metrics Compass — Complete Manual

#ai #productivity #programming #webdev

- qqq IDE Development Progress since 2026.05.16: https://github.com/gh555com/qqq/discussions/7

Metrics Compass — Complete Manual

==============
Last updated: 2026-05-18
Total metrics: 48 (40 in-panel + 5 out-of-panel + 3 hidden)
Location: chat.html #metrics-panel — below the candlestick chart, above the message area

────────────────────────────────────────────────

I. Turn Zone (Per-turn, one-shot metrics) — 16 items

────────────────────────────────────────────────

[ 1 ] id="mv-in" · Label: "↑" · Name: Turn Input Tokens
──────────────────────────────────
Meaning: Total tokens sent to the AI this turn. Includes system prompt + conversation history + current user message.
Source: agent.js → usage.prompt_tokens (accumulated each turn)
Example values: 12.3K / 456K / 1.2M
Formatting: ≥1M → "1.2M", ≥1K → "12.3K", otherwise raw value
Appearance: White text, default opacity 0.65
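
The formatting rule above can be sketched as a small helper. This is an illustration only: the function name formatTokens is hypothetical, and dropping a trailing ".0" (so 456000 renders as "456K", as in the example values) is an assumption rather than confirmed behavior.

```javascript
// Hypothetical helper, not taken from agent.js or chat.html.
// Applies the documented thresholds: >=1M -> "1.2M", >=1K -> "12.3K", else raw.
function formatTokens(n) {
  // Assumption: a trailing ".0" is dropped so 456000 shows as "456K".
  const trim = (x) => x.toFixed(1).replace(/\.0$/, "");
  if (n >= 1e6) return trim(n / 1e6) + "M";
  if (n >= 1e3) return trim(n / 1e3) + "K";
  return String(n);
}

console.log(formatTokens(12300), formatTokens(456000), formatTokens(1200000));
```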

[ 2 ] id="mv-out" · Label: "↓" · Name: Turn Output Tokens
──────────────────────────────────
Meaning: Total tokens generated by the AI this turn. Includes response body + tool-call JSON.
Source: agent.js → usage.completion_tokens (accumulated each turn)
Example values: 2.1K / 890 / 15.6K
Formatting: Same as mv-in
Appearance: White text

[ 3 ] id="mv-think" · Label: "🧠" · Name: Deep Thinking Tokens
──────────────────────────────────
Meaning: reasoning_tokens, the tokens consumed by DeepSeek's internal chain-of-thought. Higher = deeper thinking and usually better quality, but higher cost. Nonzero only when a thinking-enabled tier is active.
Source: agent.js → usage.completion_tokens_details.reasoning_tokens
Example values: 0 / 3.2K / 12.8K
Appearance: White text

[ 4 ] id="mv-cache" · Label: "💾" · Name: Prefix Cache Hit Rate
──────────────────────────────────
Meaning: cacheHit / (cacheHit + cacheMiss) × 100%. Higher hit rate = more savings, because cached tokens cost ~1/120th of uncached tokens.
Source: Client-side → Math.round(cacheHitTokens / total * 100)
Example values: 85% / 42% / 0%
Color rules:
≥60% → green mc-hi (high savings efficiency)
20–59% → yellow mc-warn (average)
<20% → red mc-bad (wasteful)
No data → white
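
A minimal sketch of the hit-rate calculation and its color rules follows. The function name cacheHitRate and the returned object shape are hypothetical; only the formula and the mc-hi / mc-warn / mc-bad thresholds come from the entry above.

```javascript
// Hypothetical helper illustrating the documented formula and color rules.
function cacheHitRate(cacheHitTokens, cacheMissTokens) {
  const total = cacheHitTokens + cacheMissTokens;
  // No data: no percentage and no color class (rendered white).
  if (total <= 0) return { pct: null, cls: "" };
  const pct = Math.round((cacheHitTokens / total) * 100);
  const cls = pct >= 60 ? "mc-hi" : pct >= 20 ? "mc-warn" : "mc-bad";
  return { pct, cls };
}

console.log(cacheHitRate(85, 15)); // hit rate of a turn with 85 hits, 15 misses
```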

[ 5 ] id="mv-hit" · Label: "Hit" · Name: Cache Hit Token Count
──────────────────────────────────
Meaning: Tokens that hit the prefix cache this turn. Billed at a very low rate (~¥0.025/M tokens).
Source: agent.js → usage.prompt_cache_hit_tokens
Example values: 24.5K / 0 / 890K
Color: Always mc-hi green (good news)

[ 6 ] id="mv-miss" · Label: "Miss" · Name: Cache Miss Token Count
──────────────────────────────────
Meaning: Tokens that did not hit the prefix cache this turn. Billed at full price (~¥3/M tokens) — 120× the cached rate.
Source: agent.js → usage.prompt_cache_miss_tokens
Example values: 1.2K / 89K / 0
Color: Always mc-bad red (reminder: this portion is expensive)

[ 7 ] id="mv-cost" · Label: "💰" · Name: Turn ge Cost
──────────────────────────────────
Meaning: Total ge consumed this turn. Billed server-side, includes all API calls + vision analysis fees. 1 ge ≈ ¥1.
Source: agent.js → _turnCostWge / 10000
Example values: 0.0032 / <0.001 / 0.125
Formatting: ≤0 → "0", <0.001 → "<0.001", <0.01 → 4 decimal places, otherwise 3 decimal places
Appearance: White text
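
The cost formatting rule can be sketched as below. The name formatGe is hypothetical; the input is assumed to be the ge value after dividing the server-side counter by 10000, as the Source line describes.

```javascript
// Hypothetical helper following the documented rule:
// <=0 -> "0", <0.001 -> "<0.001", <0.01 -> 4 decimals, otherwise 3 decimals.
function formatGe(ge) {
  if (ge <= 0) return "0";
  if (ge < 0.001) return "<0.001";
  if (ge < 0.01) return ge.toFixed(4);
  return ge.toFixed(3);
}

// Assumed usage: the raw counter is in units of 1/10000 ge.
const turnCostWge = 32;
console.log(formatGe(turnCostWge / 10000));
```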

[ 8 ] id="mv-tier" · Label: (none) · Name: Model Tier
──────────────────────────────────
Meaning: Which model configuration is used this turn. Normal messages use Pro+Max (deep thinking); casual greetings use Flash (no thinking, extremely cheap).
Source: agent.js → _metrics.turn.tier ("⚡ Flash" or "🧠 Pro+Max")
Example values: "🧠 Pro+Max" / "⚡ Flash" / "—" (not started)
Appearance: White text, font-weight: normal, opacity: 0.7

[ 9 ] id="mv-tools" · Label: "🔧" · Name: Tool Call Count
──────────────────────────────────
Meaning: How many times the AI called tools this turn (read_file / edit_file / run_command / search_text, etc.).
Source: agent.js → _metrics.turn.toolCount (accumulated each turn)
Example values: 0 / 3 / 12
Appearance: White text

[10] id="mv-time" · Label: "⏱" · Name: Turn Duration
──────────────────────────────────
Meaning: Total time from the user sending a message to the AI completing its full reply (including all tool calls).
Source: agent.js → Date.now() - _turnStart
Example values: 2.3s / 850ms / 1.2m
Formatting: ≥60s → "1.2m", ≥1s → "2.3s", otherwise "850ms"
Appearance: White text
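
As an illustration of the duration formatting, here is a short sketch. The name formatDuration is hypothetical, and rounding to one decimal place is inferred from the example values.

```javascript
// Hypothetical helper: >=60s -> minutes, >=1s -> seconds, otherwise milliseconds.
function formatDuration(ms) {
  if (ms >= 60000) return (ms / 60000).toFixed(1) + "m";
  if (ms >= 1000) return (ms / 1000).toFixed(1) + "s";
  return Math.round(ms) + "ms";
}

console.log(formatDuration(850), formatDuration(2300), formatDuration(72000));
```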

[11] id="mv-tps" · Label: "tok/s" · Name: Output Throughput
──────────────────────────────────
Meaning: completionTokens / duration (seconds). Measures the AI's text generation speed — higher = faster typing.
Source: Client-derived → Math.round(completionTokens / durationMs * 1000)
Example values: 85 / 120 / 0 (displays "—" when no valid data)
Color: mc-hi green
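
A sketch of the throughput derivation, assuming that "no valid data" means a non-positive token count or duration. The function name tokensPerSecond is hypothetical.

```javascript
// Hypothetical helper mirroring Math.round(completionTokens / durationMs * 1000).
function tokensPerSecond(completionTokens, durationMs) {
  // Assumption: missing or non-positive inputs count as "no valid data".
  if (!(completionTokens > 0) || !(durationMs > 0)) return "—";
  return String(Math.round((completionTokens / durationMs) * 1000));
}

console.log(tokensPerSecond(850, 10000)); // 850 tokens generated over 10 seconds
```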

[12] id="mv-save" · Label: "¥Saved" · Name: Turn Cache Savings
──────────────────────────────────
Meaning: Money saved by cache hits this turn (in CNY). Formula: cacheHitTokens × (¥3 − ¥0.025) / 1M.
Source: Client-derived → cacheHitTokens * 2.975 / 1000000
Example values: ¥0.0731 / ¥<0.001 / — (no cache hits)
Color: mc-hi green (savings are always good)
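
The savings formula works out as follows. The constant and function names are hypothetical; the prices (¥3 and ¥0.025 per million tokens) are the approximate figures quoted in this manual, not an authoritative price list.

```javascript
// Approximate prices quoted in this manual, in CNY per million tokens.
const FULL_PRICE_CNY_PER_M = 3;
const CACHED_PRICE_CNY_PER_M = 0.025;

// Hypothetical helper: each cached token saves the price difference.
function cacheSavingsCny(cacheHitTokens) {
  const diffPerM = FULL_PRICE_CNY_PER_M - CACHED_PRICE_CNY_PER_M; // 2.975
  return (cacheHitTokens * diffPerM) / 1e6;
}

// Roughly 24.5K cached tokens save about ¥0.073.
console.log("¥" + cacheSavingsCny(24571).toFixed(4));
```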

[13] id="mv-retry" · Label: "retry" · Name: API Retry Count
──────────────────────────────────
Meaning: How many times the API call was automatically retried this turn due to 429/502/503/network errors.
Source: agent.js → _metrics.turn.retries
Example values: 0 / 1 / 3
Color rules:
≥3 → red mc-bad (very poor network)
≥1 → yellow mc-warn (some instability)
0 → white (normal)

[14] id="mv-tavg" · Label: "t̄ool" · Name: Average Tool Duration
──────────────────────────────────
Meaning: Average duration per tool call this turn (ms). Measures tool execution efficiency.
Source: Client-derived → toolTotalMs / toolCount (rounded)
Example values: 45ms / 1.2s / — (no tool calls)
Formatting: ≥1s → "1.2s", otherwise "45ms"
Appearance: White text
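
A minimal sketch of how the average could be derived from the hidden toolTotalMs counter. The function name formatAvgTool is hypothetical.

```javascript
// Hypothetical helper: average = toolTotalMs / toolCount, rounded to whole ms.
function formatAvgTool(toolTotalMs, toolCount) {
  if (!(toolCount > 0)) return "—"; // no tool calls this turn
  const avg = Math.round(toolTotalMs / toolCount);
  return avg >= 1000 ? (avg / 1000).toFixed(1) + "s" : avg + "ms";
}

console.log(formatAvgTool(135, 3), formatAvgTool(3600, 3));
```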

[15] id="mv-ttft" · Label: "TTFT" · Name: Time To First Token
──────────────────────────────────
Meaning: Time To First Token — latency from sending the HTTP request to receiving the first valid token. Lower = better network, faster inference startup.
Source: agent.js → Date.now() - _requestStartMs (timestamped on arrival of first content/reasoning/tool_call delta)
Example values: 280ms / 1.5s / 4.2s
Color rules:
≤500ms → green mc-hi (very fast)
501–1500ms → normal (no color)
1501–3000ms → yellow mc-warn (slightly slow)
>3000ms → red mc-bad (severe latency)
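
The TTFT color bands can be expressed as a simple classifier. The name ttftClass is hypothetical, and the last band is read as "above 3000ms".

```javascript
// Hypothetical helper returning the CSS class for a TTFT value in milliseconds.
function ttftClass(ms) {
  if (ms <= 500) return "mc-hi";    // very fast
  if (ms <= 1500) return "";        // normal, no color
  if (ms <= 3000) return "mc-warn"; // slightly slow
  return "mc-bad";                  // severe latency
}

console.log(ttftClass(280), ttftClass(1501), ttftClass(4200));
```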

[16] id="mv-free" · Label: (none) · Name: Free Billing Window Indicator
──────────────────────────────────
Meaning: Whether this turn falls within a free billing window. ✨ = free, — = billed normally.
Source: agent.js → _lastBillingFreeWindow (free_window field from server-side billing event)
Example values: "✨" / "—"
Appearance: White text

────────────────────────────────────────────────

II. Session Zone (Cumulative for current conversation) — 11 items

────────────────────────────────────────────────

[17] id="mv-sin" · Label: "Σ↑" · Name: Session Cumulative Input Tokens
──────────────────────────────────
Meaning: Total input tokens across all turns from the start of this conversation to now.
Source: agent.js → _metrics.session.promptTokens (adds usage.prompt_tokens each turn)
Values: Cumulative version of mv-in; numbers much larger than single-turn values

[18] id="mv-sout" · Label: "Σ↓" · Name: Session Cumulative Output Tokens
──────────────────────────────────
Meaning: Total output tokens across all turns in this conversation.
Source: agent.js → _metrics.session.completionTokens

[19] id="mv-scache" · Label: "Σ💾" · Name: Session Cumulative Cache Hits
──────────────────────────────────
Meaning: Total prefix-cache-hit token count for this conversation (tokens that saved you money).
Source: agent.js → _metrics.session.cacheHitTokens

[20] id="mv-smiss" · Label: "Σ✗" · Name: Session Cumulative Cache Misses
──────────────────────────────────
Meaning: Total prefix-cache-miss token count for this conversation.
Source: agent.js → _metrics.session.cacheMissTokens

[21] id="mv-scost" · Label: "Σ💰" · Name: Session Cumulative ge Cost
──────────────────────────────────
Meaning: Total ge consumed in this conversation. Formatted same as mv-cost.
Source: agent.js → _metrics.session.costGe

[22] id="mv-turns" · Label: "🔄" · Name: Session Turn Count
──────────────────────────────────
Meaning: How many times the user has sent messages in this conversation (each turn = one user message + one complete AI reply).
Source: agent.js → _metrics.session.turns

[23] id="mv-sretry" · Label: "Σretry" · Name: Session Cumulative Retries
──────────────────────────────────
Meaning: Total retry count across all turns in this conversation. Reflects overall network stability for the session.
Source: agent.js → _metrics.session.retries

[24] id="mv-ssave" · Label: "Σ¥Saved" · Name: Session Cumulative Savings
──────────────────────────────────
Meaning: Total money saved by caching across this entire conversation (CNY). Formatted same as mv-save.
Source: agent.js → _metrics.session.cnySaved
Color: mc-hi green

[25] id="mv-savgtps" · Label: "x̄tok/s" · Name: Session Average Throughput
──────────────────────────────────
Meaning: Session total output tokens / session total duration. Reflects average generation speed over the whole conversation.
Source: Client-derived → sessionCompletionTokens / sessionTotalDurationMs * 1000
Example values: 78 / 0 (displays "—" when no valid data)

[26] id="mv-savgsave" · Label: "x̄¥/turn" · Name: Average Savings Per Turn
──────────────────────────────────
Meaning: Total savings / total turns. Reflects average per-turn cache optimization benefit.
Source: Client-derived → sessionCnySaved / sessionTurns
Example values: ¥0.0412 / ¥<0.001 / — (no data)
Color: mc-hi green

[27] id="mv-savgretry" · Label: "x̄ret/turn" · Name: Average Retries Per Turn
──────────────────────────────────
Meaning: Total retries / total turns. Reflects network stability. >0.5 = very unstable network.
Source: Client-derived → sessionRetries / turns
Example values: 0.00 / 0.33 / 1.50
Color rules:
>0.5 → red mc-bad (unstable network)
>0.1 → yellow mc-warn
≤0.1 → white (normal)
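
A sketch of the per-turn retry average, reading the two upper bands as "above 0.5" and "above 0.1". The name avgRetries is hypothetical, and showing "—" when there are no turns yet is an assumption.

```javascript
// Hypothetical helper: average retries per turn with its color class.
function avgRetries(sessionRetries, turns) {
  // Assumption: with zero turns there is nothing to average.
  if (!(turns > 0)) return { text: "—", cls: "" };
  const avg = sessionRetries / turns;
  const cls = avg > 0.5 ? "mc-bad" : avg > 0.1 ? "mc-warn" : "";
  return { text: avg.toFixed(2), cls };
}

console.log(avgRetries(1, 3)); // one retry over three turns
```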

────────────────────────────────────────────────

III. Engine Zone (Context Engine) — 5 items

────────────────────────────────────────────────

[28] id="mv-facts" · Label: "📚" · Name: Fact Library Count
──────────────────────────────────
Meaning: Number of structured facts extracted by the context engine from compressed old messages. Each fact contains type/content/keywords, used for subsequent semantic retrieval. More facts = the AI knows more about your project context.
Source: agent.js → _ctx.facts.length
Example values: 0 / 23 / 87 (upper limit: 100)
Appearance: White text

[29] id="mv-narr" · Label: "📖" · Name: Narrative Summary Length
──────────────────────────────────
Meaning: Character count of the global narrative summary — a coherent overview of compressed conversation history, injected into every system prompt to maintain AI context continuity across turns.
Source: agent.js → _ctx.narrative.length
Example values: 0 / 456 / 1.2K
Appearance: White text

[30] id="mv-ctx" · Label: "📊" · Name: Context Window Usage
──────────────────────────────────
Meaning: totalTokens / 800K × 100%. 800K is DeepSeek's context window compression trigger threshold. >50% = system begins auto-compressing old messages; >80% = approaching the limit.
Source: agent.js → Math.min(100, Math.round(totalTokens / 800000 * 100))
Example values: 12% / 48% / 82%
Color rules:
>80% → red mc-bad (near the limit)
51–80% → yellow mc-warn (compression underway)
≤50% → green mc-hi (plenty of room)
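
The usage percentage and its color bands can be sketched like this, reading the top band as "above 80%". The constant and function names are hypothetical; the 800K divisor and the Math.min cap come from the Source line above.

```javascript
// Threshold quoted in this manual; the constant name is hypothetical.
const CTX_THRESHOLD_TOKENS = 800000;

// Hypothetical helper: capped integer percentage plus color class.
function contextUsage(totalTokens) {
  const pct = Math.min(100, Math.round((totalTokens / CTX_THRESHOLD_TOKENS) * 100));
  const cls = pct > 80 ? "mc-bad" : pct > 50 ? "mc-warn" : "mc-hi";
  return { pct, cls };
}

console.log(contextUsage(384000), contextUsage(656000));
```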

[31] id="mv-warm" · Label: "warm" · Name: Cache Warmup Status
──────────────────────────────────
Meaning: 500ms after the panel opens, a lightweight request (max_tokens=1) is sent to DeepSeek in the background to pre-build the KV cache for the system prompt. The user's first message can then hit the cache, where tokens are billed at roughly 1/120th of the uncached rate.
Source: agent.js → _warmupStatus
Values:
"⏳" = pending — warmup in progress (panel just opened)
"✅" = ok — warmup succeeded (cache built)
"❌" = fail — warmup failed (network/auth issue)
"—" = none — not started (panel never opened)
Colors:
"✅" → green mc-hi
"⏳" → yellow mc-warn
"❌" → red mc-bad
"—" → white

[32] id="mv-lastcall" · Label: "cache" · Name: Time Since Last API Call
──────────────────────────────────
Meaning: How long since the last API call. Used to determine whether the prefix cache is still within its TTL. DeepSeek cache TTL is approximately 5–10 minutes; after 10 minutes the cache may have expired.
Source: agent.js → _lastCallTs; client calculates Date.now() - _lastCallTs
Example values: "12s ago" / "3m ago" / "15m ago" / "—" (never called)
Color rules:
<1m → green mc-hi (cache definitely still alive)
1–5m → green mc-hi
5–10m → yellow mc-warn (may be expiring soon)
>10m → red mc-bad (cache most likely expired)
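
A sketch of the age display and its color bands, reading the last band as "above 10 minutes". The function name lastCallAge is hypothetical, the current time is passed in to keep the example deterministic, and treating exactly 5 minutes as yellow is an assumption because the documented ranges overlap at that boundary.

```javascript
// Hypothetical helper: age text plus color class for the last API call.
function lastCallAge(lastCallTs, now) {
  if (!lastCallTs) return { text: "—", cls: "" }; // never called
  const sec = Math.max(0, Math.floor((now - lastCallTs) / 1000));
  const text = sec < 60 ? sec + "s ago" : Math.floor(sec / 60) + "m ago";
  const min = sec / 60;
  // Assumption: exactly 5 minutes falls in the yellow band.
  const cls = min > 10 ? "mc-bad" : min >= 5 ? "mc-warn" : "mc-hi";
  return { text, cls };
}

const base = 1700000000000; // arbitrary timestamp for the demo
console.log(lastCallAge(base, base + 12000), lastCallAge(base, base + 900000));
```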

────────────────────────────────────────────────

IV. Lifetime Zone (Cumulative across IDE restarts) — 8 items

────────────────────────────────────────────────
These 8 items are prefixed with "∞" to indicate they survive IDE restarts. Data is persisted to VS Code globalState['qqq-ai.lifetimeMetrics'].

[33] id="mv-lturns" · Label: "∞turns" · Name: Lifetime Total Turns
──────────────────────────────────
Meaning: Total number of AI conversation turns you have initiated since installing qqq-ai. A cumulative counter that persists across IDE restarts.
Source: agent.js → _lifetime.turns
Persistence: globalState, 2s debounce flush + deactivate flush

[34] id="mv-lsess" · Label: "∞sess" · Name: Lifetime Total Sessions
──────────────────────────────────
Meaning: Total number of panel sessions you have started (switching tabs or starting a new chat does not count — only creating a new Agent instance counts).
Source: agent.js → _lifetime.sessions

[35] id="mv-lcost" · Label: "∞¥spent" · Name: Lifetime Total Spend
──────────────────────────────────
Meaning: Total ge you have ever spent on qqq AI (approximately equivalent to CNY). Formatted same as mv-cost.
Source: agent.js → _lifetime.costGe

[36] id="mv-lsave" · Label: "∞¥saved" · Name: Lifetime Total Savings
──────────────────────────────────
Meaning: Total money saved by prefix caching over your entire lifetime. Green highlighted — see how much you've saved.
Source: agent.js → _lifetime.cnySaved
Color: mc-hi green

[37] id="mv-lin" · Label: "∞↑" · Name: Lifetime Total Input Tokens
──────────────────────────────────
Meaning: Total input tokens across all your conversations (total context ever sent to the AI).
Source: agent.js → _lifetime.promptTokens

[38] id="mv-lout" · Label: "∞↓" · Name: Lifetime Total Output Tokens
──────────────────────────────────
Meaning: Total output tokens the AI has ever generated for you.
Source: agent.js → _lifetime.completionTokens

[39] id="mv-lret" · Label: "∞ret" · Name: Lifetime Total Retries
──────────────────────────────────
Meaning: Total number of automatic API retries you have ever experienced. Lower = more stable network.
Source: agent.js → _lifetime.retries

[40] id="mv-ldur" · Label: "∞⏱" · Name: Lifetime Total Duration
──────────────────────────────────
Meaning: Total time you have spent with qqq AI (sum of all turn durations).
Source: agent.js → _lifetime.durationMs
Example values: "23m" / "1.5h" / "3.2h"
Formatting: <60 min → "23m", ≥60 min → "1.5h"
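
The lifetime duration format can be sketched as below. The name formatLifetime is hypothetical, and truncating to whole minutes (rather than rounding) is an assumption chosen so the minute form never reads "60m".

```javascript
// Hypothetical helper: under 60 minutes -> "23m", otherwise hours with one decimal.
function formatLifetime(durationMs) {
  const minutes = durationMs / 60000;
  if (minutes < 60) return Math.floor(minutes) + "m"; // assumption: truncate
  return (minutes / 60).toFixed(1) + "h";
}

console.log(formatLifetime(23 * 60000), formatLifetime(90 * 60000));
```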

────────────────────────────────────────────────

V. Zone 4 — Token Estimator

────────────────────────────────────────────────

Location: Above the input box — id="input-est-row" → id="input-est"

Function: Real-time estimation of how many tokens your current input will consume.

Algorithm:

Count characters in the input box
Detect whether CJK characters are present (Chinese / Japanese / Korean)
Contains CJK → estimated tokens = character count ÷ 1.5
Pure English → estimated tokens = character count ÷ 4
Empty input → display "— tok"

Color rules:

> 4000 tokens → bad (red #d77) ← this turn will be expensive
1001–4000 → warn (yellow #d4a017) ← moderate cost
≤1000 → normal grey
Empty input → default

Display format: "~1.2K tok" or "~890 tok" or "— tok"
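
The algorithm, color rules, and display format above can be combined into one sketch. Everything named here is hypothetical: the function estimateTokens, the CJK character ranges in the regular expression, the choice to round up, and the use of an empty class for both normal and empty input are assumptions for illustration.

```javascript
// Assumed CJK ranges: kana, CJK Extension A, CJK Unified Ideographs, Hangul syllables.
const CJK_RE = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]/;

// Hypothetical estimator following the documented divisors (1.5 for CJK, 4 otherwise).
function estimateTokens(text) {
  if (text.length === 0) return { tokens: 0, label: "— tok", cls: "" };
  const divisor = CJK_RE.test(text) ? 1.5 : 4;
  // Clamp at zero so the label can never show a negative count.
  const tokens = Math.max(0, Math.ceil(text.length / divisor));
  const label = tokens >= 1000
    ? "~" + (tokens / 1000).toFixed(1) + "K tok"
    : "~" + tokens + " tok";
  const cls = tokens > 4000 ? "bad" : tokens > 1000 ? "warn" : "";
  return { tokens, label, cls };
}

console.log(estimateTokens("a".repeat(4800)).label);
```

The character-count heuristic is cheap enough to run on every input event, which is presumably why it is used instead of a real tokenizer.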

About "-100 tok": If you see this, the input token estimator displayed a negative value while in an abnormal state. The correct value is always a non-negative integer. The cause was a boundary bug triggered by an abnormal input-box value.length (for example, an empty string being misidentified). rAF throttling is now in place, so this should no longer occur under normal conditions.

────────────────────────────────────────────────

VI. Quick Reference Table

────────────────────────────────────────────────

Zone    #    id              Label        Name
──────────────────────────────────────────────────
Turn    1    mv-in           ↑            Turn Input Tokens
Turn    2    mv-out          ↓            Turn Output Tokens
Turn    3    mv-think        🧠           Deep Thinking Tokens
Turn    4    mv-cache        💾           Cache Hit Rate
Turn    5    mv-hit          Hit          Cache Hit Token Count
Turn    6    mv-miss         Miss         Cache Miss Token Count
Turn    7    mv-cost         💰           Turn ge Cost
Turn    8    mv-tier         —            Model Tier
Turn    9    mv-tools        🔧           Tool Call Count
Turn   10    mv-time         ⏱           Turn Duration
Turn   11    mv-tps          tok/s        Output Throughput
Turn   12    mv-save         ¥Saved       Turn Cache Savings
Turn   13    mv-retry        retry        API Retry Count
Turn   14    mv-tavg         t̄ool        Avg Tool Duration
Turn   15    mv-ttft         TTFT         Time To First Token
Turn   16    mv-free         —            Free Billing Indicator
──────────────────────────────────────────────────
Sess   17    mv-sin          Σ↑           Session Input Tokens
Sess   18    mv-sout         Σ↓           Session Output Tokens
Sess   19    mv-scache       Σ💾          Session Cache Hits
Sess   20    mv-smiss        Σ✗           Session Cache Misses
Sess   21    mv-scost        Σ💰          Session ge Cost
Sess   22    mv-turns        🔄           Session Turn Count
Sess   23    mv-sretry       Σretry       Session Retries
Sess   24    mv-ssave        Σ¥Saved      Session Savings
Sess   25    mv-savgtps      x̄tok/s      Avg Throughput
Sess   26    mv-savgsave     x̄¥/turn     Avg Savings Per Turn
Sess   27    mv-savgretry    x̄ret/turn   Avg Retries Per Turn
──────────────────────────────────────────────────
Eng    28    mv-facts        📚           Fact Library Count
Eng    29    mv-narr         📖           Narrative Summary Length
Eng    30    mv-ctx          📊           Context Window Usage
Eng    31    mv-warm         warm         Warmup Status
Eng    32    mv-lastcall     cache        Time Since Last Call
──────────────────────────────────────────────────
Life   33    mv-lturns       ∞turns       Lifetime Total Turns
Life   34    mv-lsess        ∞sess        Lifetime Total Sessions
Life   35    mv-lcost        ∞¥spent      Lifetime Total Spend
Life   36    mv-lsave        ∞¥saved      Lifetime Total Savings
Life   37    mv-lin          ∞↑           Lifetime Total Input
Life   38    mv-lout         ∞↓           Lifetime Total Output
Life   39    mv-lret         ∞ret         Lifetime Total Retries
Life   40    mv-ldur         ∞⏱          Lifetime Total Duration

────────────────────────────────────────────────

VII. Out-of-Panel Metrics (outside #metrics-panel) — 5 items

────────────────────────────────────────────────

These metrics are not inside #metrics-panel but are distributed across other parts of the UI. They are still part of the Compass's observable system.

[41] id="cost-label" · Location: Status bar top-left · Name: Window-level Cumulative ge Cost
──────────────────────────────────
Meaning: Cumulative ge consumed since the panel was opened (window-level accumulator). Adds the current turn's cost each time a cost message arrives.
Source: chat.html → totalCostGe variable; cost handler → += parseFloat(msg.estimate)
Example values: 0.0320 ge / 1.25 ge / 12.80 ge
Formatting: <0.01 → 4 decimal places, otherwise 2 decimal places
Color: No special color, follows vscode-foreground

[42] id="rage-fill" + id="rage-label" · Location: Status bar · Name: Rage Meter
──────────────────────────────────
Meaning: Current rage value = current turn cost / baseline, as a percentage. Red progress bar + flame icon with a number.
Source: agent.js → _emitRageDot() → panel.js → rageDot message
Example values: 🔥 42 (meaning rage = 42%)
Formatting: Integer percentage
Color: Fixed red #f14c4c progress bar

[43] id="ge-bar-fill" · Location: 6px-wide vertical bar on the left side of the message area · Name: ge Fractional Progress Bar
──────────────────────────────────
Meaning: Driven by the fractional part of totalCostGe — cycles from 0 to 1 ge. Gives the user a live sense that "money is being spent."
Source: chat.html → updateGeBar(); frac = totalCostGe % 1
Example values: Height 0% ~ 100% (corresponding to ge decimal 0.00 ~ 0.99)
Formatting: h = max(5%, frac * 100)
Color: Gradient #f59e0b → #ef4444, pulses during animation
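
The bar height follows directly from the fractional part of the running total. A minimal sketch, with a hypothetical function name and the result expressed as a percentage number:

```javascript
// Hypothetical helper: height in percent, with the documented 5% floor.
function geBarHeightPct(totalCostGe) {
  const frac = totalCostGe % 1; // fractional part, cycles from 0 up to 1
  return Math.max(5, frac * 100);
}

console.log(geBarHeightPct(1.25), geBarHeightPct(12.5));
```

Because of the 5% floor, the bar stays visible even right after the total crosses a whole ge.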

[44] id="ctx-bar" · Location: 4px-tall horizontal bar at the bottom of the panel · Name: Context Window HP Bar
──────────────────────────────────
Meaning: totalTokens / 800K × 100%, representing context window usage. Compression of old messages begins above 50%.
Source: agent.js → _updateHp() → panel.js → hp message
Example values: 12% (green) / 65% (yellow) / 92% (red)
Formatting: Integer percentage; shows at least 3% when any usage is present
Color rules: ≤50% green #4ec9b0 / 51–80% yellow #cca700 / >80% red #f14c4c

[45] id="input-est" · Location: Above the input box · Name: Input Token Estimator
──────────────────────────────────
Meaning: Real-time estimate of how many tokens the current input text will consume.
Source: chat.html → inputEl.addEventListener('input') → rAF throttle
Example values: ~890 tok / ~1.2K tok / — tok (empty input)
Formatting: ≥1000 → "~N.NK tok", otherwise "~N tok"
Color rules: >4000 red bad / >1000 yellow warn / normal grey / empty input: no class

────────────────────────────────────────────────

VIII. Hidden Metrics (in agent.js but not rendered in UI) — 3 items

────────────────────────────────────────────────

These three fields exist in the _metrics.turn object in agent.js but are never rendered in HTML.

[46] jsonMode · Name: JSON Mode Flag
──────────────────────────────────
Meaning: Marks whether this turn used JSON Mode (response_format: { type: 'json_object' }). Originally planned to be set to true in scenarios that use JSON Mode (such as plan generation/revision), but in the current codebase it is initialized and never set to true — a "reserved but not implemented" field.
Source: agent.js → _metrics.turn.jsonMode (always false)
Example values: false
Formatting: Boolean
Color: None (not rendered in UI)

[47] maxTokens · Name: Per-turn Max Output Token Limit
──────────────────────────────────
Meaning: The max_tokens parameter in the API request, which limits how many tokens the model can generate in a single turn. Currently fixed at 32768, assigned once when turn metrics are created, never changed thereafter.
Source: agent.js → _metrics.turn.maxTokens = 32768
Example values: 32768
Formatting: Integer
Color: None (not rendered in UI)

[48] toolTotalMs · Name: Cumulative Tool Call Duration
──────────────────────────────────
Meaning: Sum of all tool call durations this turn (ms). Accumulated by adding Date.now() - start after each tool completes. Used to derive toolAvgMs = toolTotalMs / toolCount; not displayed directly.
Source: agent.js → _executeToolCallsParallel → this._metrics.turn.toolTotalMs += (Date.now() - _toolStart)
Example values: 1250 (ms)
Formatting: Integer milliseconds
Color: None (not rendered in UI; derived as toolAvgMs for panel display)