I've been using Claude Code for a while and realized I had zero visibility into what the agent was doing across sessions: which tools it called, whether it touched files it shouldn't have, and how many calls it made per task.
So I built a hook that scores every session.
What it does
The hook listens to three Claude Code events:
- PostToolUse: records every tool call, checks it against an allowlist, and flags protected path access
- PreToolUse: blocks tool calls that touch sensitive files like .env or SSH keys
- Stop: computes the final trust score and logs it
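A minimal sketch of what the PostToolUse side might do with each event, assuming Claude Code delivers it as JSON with `tool_name` and `tool_input` fields and that file tools put their target in `tool_input.file_path`. The post doesn't show the hook's allowlist format, so `ALLOWED_TOOLS`, `PROTECTED_PATTERNS`, `classify_call`, and `append_record` are all hypothetical.

```python
import fnmatch
import json
import os

# Hypothetical policy: tools the agent may call and file names that are off-limits.
ALLOWED_TOOLS = {"Read", "Edit", "Write", "Glob", "Grep", "Bash"}
PROTECTED_PATTERNS = [".env", ".env.*", "id_rsa*", "id_ed25519*"]


def classify_call(event):
    """Turn one PostToolUse event into a log record with violation flags."""
    tool = event.get("tool_name", "")
    file_path = event.get("tool_input", {}).get("file_path", "")
    name = os.path.basename(file_path)
    # Only file-targeting calls can hit a protected path; others have no file_path.
    protected = bool(name) and any(fnmatch.fnmatch(name, p) for p in PROTECTED_PATTERNS)
    return {
        "tool": tool,
        "file_path": file_path,
        "off_allowlist": tool not in ALLOWED_TOOLS,
        "protected_path": protected,
    }


def append_record(log_path, record):
    """Append one record to a JSON Lines session log."""
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
```

Counting records where either flag is set gives the violation total that the Stop step feeds into the scope score.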
At the end of each session you get:
[authe.me] Trust Score: 92 (reliability=100 | scope=75 | cost=100)
[authe.me] tools=14 violations=1 failed=0
The scoring
Three dimensions, weighted into an overall score:
- Reliability (40%): the percentage of tool calls that succeeded
- Scope (35%): whether the agent stayed within your allowed tools and paths; each violation drops the score by 25 points
- Cost (25%): how many tool calls the session made; under 20 is fine, and over 100 starts getting flagged
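The weighting fits in a few lines of Python. Reliability and scope follow the rules above directly; the post doesn't say how the cost score falls between 20 and 100 calls, so the one-point-per-extra-call curve in `cost_score` is an assumption, as are all the function names.

```python
def reliability_score(total_calls, failed_calls):
    """Percentage of tool calls that succeeded; an empty session counts as 100."""
    if total_calls == 0:
        return 100.0
    return 100.0 * (total_calls - failed_calls) / total_calls


def scope_score(violations):
    """Start at 100 and drop 25 points per violation, never below 0."""
    return max(0, 100 - 25 * violations)


def cost_score(total_calls):
    """Hypothetical curve: full marks up to 20 calls, then 1 point off per extra call."""
    if total_calls <= 20:
        return 100
    return max(0, 100 - (total_calls - 20))


def trust_score(total_calls, failed_calls, violations):
    """Weight the three dimensions 40/35/25 and flag sessions over 100 calls."""
    rel = reliability_score(total_calls, failed_calls)
    scope = scope_score(violations)
    cost = cost_score(total_calls)
    overall = 0.40 * rel + 0.35 * scope + 0.25 * cost
    flagged = total_calls > 100
    return round(overall), rel, scope, cost, flagged


def summary_lines(total_calls, failed_calls, violations):
    """Format a session summary in the same shape as the sample output."""
    overall, rel, scope, cost, _ = trust_score(total_calls, failed_calls, violations)
    return [
        f"[authe.me] Trust Score: {overall} (reliability={rel:.0f} | scope={scope} | cost={cost})",
        f"[authe.me] tools={total_calls} violations={violations} failed={failed_calls}",
    ]


for line in summary_lines(14, 0, 1):
    print(line)
```

For the sample session (14 calls, 0 failures, 1 violation) this computes 0.40 × 100 + 0.35 × 75 + 0.25 × 100 = 91.25 and prints `Trust Score: 91`, not the 92 shown in the sample output, so the real hook may round up or weight slightly differently.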
Hash chaining
Every tool call event is hashed together with the previous event's hash, creating a chain. If someone edits or deletes an entry in the log, every hash after it stops matching when the chain is recomputed. It's the same idea as a blockchain audit trail, just much simpler.
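import hashlib
import json
def compute_hash(prev_hash, data):
    # Hash the previous link together with this event's canonical (key-sorted) JSON
    payload = f"{prev_hash}:{json.dumps(data, sort_keys=True)}"
    return hashlib.sha256(payload.encode()).hexdigest()[:16]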
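Checking the chain means recomputing each link from the start and comparing it with the stored value. A minimal sketch, assuming the log is a list of entries that each hold the event `data` and its stored `hash`; `GENESIS`, `append_event`, and `verify_chain` are hypothetical names, not the hook's actual code.

```python
import hashlib
import json

GENESIS = "0" * 16  # hypothetical starting value for the first link


def compute_hash(prev_hash, data):
    payload = f"{prev_hash}:{json.dumps(data, sort_keys=True)}"
    return hashlib.sha256(payload.encode()).hexdigest()[:16]


def append_event(chain, data):
    """Add an event, linking it to the hash of the previous entry."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"data": data, "hash": compute_hash(prev, data)})


def verify_chain(chain):
    """Recompute every link; return the index of the first bad entry, or None."""
    prev = GENESIS
    for i, entry in enumerate(chain):
        if compute_hash(prev, entry["data"]) != entry["hash"]:
            return i
        prev = entry["hash"]
    return None
```

One design caveat: someone who rewrites the whole log can recompute every link too, so the chain only makes tampering evident if the latest hash is also kept somewhere the log's editor can't reach.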
Protected path blocking
The PreToolUse hook checks whether Claude is about to read or edit a sensitive file. If the path matches a protected pattern, the hook returns a deny decision with a reason that Claude receives as feedback:
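import json
# Deny payload for Claude Code's PreToolUse hook; Claude receives the reason as feedback
result = {
    "hookSpecificOutput": {
        "hookEventName": "PreToolUse",
        "permissionDecision": "deny",
        "permissionDecisionReason": "authe.me policy: .env is a protected path."
    }
}
# The hook hands the decision back to Claude Code by printing it as JSON on stdout
print(json.dumps(result))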
Claude sees the denial reason and adjusts its plan.
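A minimal sketch of the full PreToolUse check, assuming the event arrives as JSON on stdin with the target path in `tool_input.file_path`, which is where Claude Code's Read, Edit, and Write tools put it. The protected patterns and the `protected_reason`/`decide` names are hypothetical.

```python
import fnmatch
import json
import os
import sys

# Hypothetical policy: file-name patterns and directory fragments treated as protected.
PROTECTED_NAMES = [".env", ".env.*", "id_rsa*", "id_ed25519*"]
PROTECTED_DIRS = ["/.ssh/"]


def protected_reason(file_path):
    """Return a denial reason if file_path matches the policy, else None."""
    name = os.path.basename(file_path)
    for pattern in PROTECTED_NAMES:
        if fnmatch.fnmatch(name, pattern):
            return f"authe.me policy: {name} is a protected path."
    normalized = file_path.replace(os.sep, "/")
    for fragment in PROTECTED_DIRS:
        if fragment in normalized:
            return f"authe.me policy: {file_path} is inside a protected directory."
    return None


def decide(event):
    """Build the PreToolUse deny payload for an event dict, or None to allow."""
    file_path = event.get("tool_input", {}).get("file_path")
    if not file_path:
        return None
    reason = protected_reason(file_path)
    if reason is None:
        return None
    return {
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": "deny",
            "permissionDecisionReason": reason,
        }
    }


def main():
    event = json.load(sys.stdin)  # Claude Code passes the hook event as JSON on stdin
    result = decide(event)
    if result is not None:
        print(json.dumps(result))
    # With no output and exit code 0, the call goes through the normal permission flow.
```

A path-only check like this won't catch a Bash call such as `cat .env`, since Bash input carries a `command` string rather than a `file_path`; covering that would mean inspecting the command text as well.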
Install
mkdir -p ~/.claude/hooks ~/.authe
curl -fsSL https://raw.githubusercontent.com/autheme/claude-code-hook/main/authe-hook.py -o ~/.claude/hooks/authe-hook.py
chmod +x ~/.claude/hooks/authe-hook.py
Then add the hook config to ~/.claude/settings.json:
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "AUTHE_HOOK_EVENT=PreToolUse python3 ~/.claude/hooks/authe-hook.py"
          }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "AUTHE_HOOK_EVENT=PostToolUse python3 ~/.claude/hooks/authe-hook.py"
          }
        ]
      }
    ],
    "Stop": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "AUTHE_HOOK_EVENT=Stop python3 ~/.claude/hooks/authe-hook.py"
          }
        ]
      }
    ]
  }
}
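Pasting the config above works for a fresh `~/.claude/settings.json`, but if you already have settings there you may prefer to merge. A minimal sketch; `add_hook` and `install` are hypothetical helpers, and registering PreToolUse alongside the other two events assumes the script selects its PreToolUse behavior from `AUTHE_HOOK_EVENT` the same way the PostToolUse and Stop entries do.

```python
import json
from pathlib import Path

HOOK_SCRIPT = "~/.claude/hooks/authe-hook.py"


def add_hook(settings, event):
    """Register the authe hook for one event, leaving other settings untouched."""
    command = f"AUTHE_HOOK_EVENT={event} python3 {HOOK_SCRIPT}"
    groups = settings.setdefault("hooks", {}).setdefault(event, [])
    for group in groups:
        if any(h.get("command") == command for h in group.get("hooks", [])):
            return settings  # already registered; running twice changes nothing
    groups.append({"matcher": "", "hooks": [{"type": "command", "command": command}]})
    return settings


def install(path=None):
    """Merge the entries into settings.json instead of overwriting the file."""
    path = path or Path.home() / ".claude" / "settings.json"
    settings = json.loads(path.read_text()) if path.exists() else {}
    for event in ("PreToolUse", "PostToolUse", "Stop"):
        add_hook(settings, event)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2) + "\n")
```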
Single file, zero dependencies, pure Python.
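One way a single file can serve all three events is a small dispatcher keyed on the `AUTHE_HOOK_EVENT` variable set in the config, falling back to the `hook_event_name` field that Claude Code includes in the event JSON. A skeleton sketch; the handler names are hypothetical placeholders, not the hook's actual functions.

```python
import json
import os
import sys


def on_pre_tool_use(event):
    """Placeholder: return a deny payload when a protected path is targeted."""
    return None


def on_post_tool_use(event):
    """Placeholder: record the call and check it against the allowlist."""
    return None


def on_stop(event):
    """Placeholder: compute and log the session's trust score."""
    return None


# One script, three entry points, keyed by the hook event name.
HANDLERS = {
    "PreToolUse": on_pre_tool_use,
    "PostToolUse": on_post_tool_use,
    "Stop": on_stop,
}


def dispatch(event, env=os.environ):
    """Route on AUTHE_HOOK_EVENT, falling back to the event's hook_event_name field."""
    name = env.get("AUTHE_HOOK_EVENT") or event.get("hook_event_name", "")
    handler = HANDLERS.get(name)
    return handler(event) if handler else None


def main():
    """Read the event from stdin and print any structured result to stdout."""
    raw = sys.stdin.read()
    event = json.loads(raw) if raw.strip() else {}
    result = dispatch(event)
    if result is not None:
        print(json.dumps(result))

# The real script would end with: if __name__ == "__main__": main()
```

Because everything lives in the standard library, the file can be dropped into `~/.claude/hooks/` with no install step.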
What's next
Working on remote reporting so you can aggregate scores across sessions and agents. Also building an OpenClaw plugin that does the same thing for that ecosystem.
Repo: github.com/autheme/claude-code-hook
Built with Claude. Would love feedback from anyone running Claude Code in production.
Top comments (1)
The three scoring dimensions you landed on (reliability, scope, cost) map cleanly onto where prompt structure actually matters upstream.
Reliability failures usually trace back to an underspecified objective or missing success criteria — the model doesn't know what "done" looks like, so it keeps retrying or diverges. Scope violations often happen when constraints aren't explicit enough in the system prompt — the agent wasn't told those paths were off-limits at the intent layer, so PreToolUse is catching what the prompt should have prevented. Cost spikes often come from a missing chain-of-thought structure — the model is spinning without a reasoning framework.
So the hook is measuring execution quality, but the input to those decisions is the prompt. A structured constraints block in your CLAUDE.md (file access scope, protected paths, disallowed tools) should reduce your violation rate before the hook fires. Then you're using the trust score to validate that the prompt is working, not to compensate for it being vague.
Two layers working together, rather than one catching what the other missed. Worth checking whether low scope scores correlate with underspecified constraint sections in your project instructions.
If you're looking to build those constraint blocks more systematically, flompt.dev lets you do it visually with typed prompt blocks including a dedicated constraints block — might be useful for the agent config side. github.com/Nyrok/flompt