DEV Community

Kazuki Chigita

Mining Hidden Skills from Claude Code Session Logs with Semantic Knowledge Graphs

The Problem

If you use Claude Code (or any LLM-based coding agent) daily, you've probably noticed yourself repeating similar workflows. The natural next step is: Can we turn these into reusable Skills?

The conventional path is: document your knowledge → codify it into a Skill → deploy. But in practice, the first step — articulating tacit knowledge as formal documentation — is where most people get stuck. You know what you do, but writing it down precisely enough for an agent to replicate is surprisingly hard.

Here's the key insight: your session logs already contain that knowledge. Every time you correct the agent, choose a specific tool sequence, or guide a workflow, you're implicitly recording your decision-making process. The question is how to extract it.

crune is the tool I built to answer that question. It analyzes Claude Code JSONL session logs, builds a semantic knowledge graph across sessions, detects recurring workflow patterns, and surfaces Skill candidates — all without requiring you to write documentation first.

crune knowledge graph visualization

Sessions clustered into topics with multi-signal edges

Architecture Overview

crune has two parts:

  1. Data pipeline (scripts/): Reads JSONL session files → extracts features → builds knowledge graph → generates Skill candidates → outputs JSON
  2. Frontend (src/): React SPA with three views — Session List, Session Playback, and Knowledge Graph

The core pipeline follows this flow:

```
Feature Extraction ─┐
  1. TF-IDF          │
  2. Tool-IDF        ├→ Combined Matrix → SVD → Clustering → Topic Nodes → Edges → Louvain → Brandes
  3. Structural      │
────────────────────┘
```

Let me walk through each stage.

Step 1: Multi-Signal Feature Extraction

Each session is represented by three independent feature vectors, each L2-normalized before combination.

Text Features (TF-IDF)

For each session, I concatenate user prompts, assistant responses, edited file paths, and git branch names, then tokenize with:

  • CamelCase splitting (sessionPlayback → session, playback)
  • snake_case / kebab-case splitting
  • File path segment extraction (excluding extensions and short segments)
  • English/Japanese stop word removal
  • Noise filtering (UUIDs, hex strings ≥ 6 chars, pure numbers, tokens > 40 chars)
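The tokenization rules above can be sketched roughly like this (the function name and abbreviated stop list are mine, not crune's actual API):

```python
import re

STOP_WORDS = {"the", "a", "and", "to", "of"}  # abbreviated; the real list covers English and Japanese

def tokenize(text: str) -> list[str]:
    tokens = []
    # split on non-alphanumerics, which also handles snake_case / kebab-case
    for raw in re.split(r"[^A-Za-z0-9]+", text):
        # CamelCase splitting: "sessionPlayback" -> ["session", "Playback"]
        parts = re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", raw)
        tokens.extend(p.lower() for p in parts)
    kept = []
    for t in tokens:
        if t in STOP_WORDS:
            continue
        if t.isdigit():                       # pure numbers
            continue
        if re.fullmatch(r"[0-9a-f]{6,}", t):  # hex strings >= 6 chars (UUID chunks)
            continue
        if len(t) > 40 or len(t) < 2:         # overlong noise / short segments
            continue
        kept.append(t)
    return kept
```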

Vectorization uses sublinear TF-IDF:

$$\text{tf}(t, d) = \log\big(1 + \text{count}(t, d)\big)$$

$$\text{idf}(t) = \log\left(\frac{N}{\text{df}(t)}\right)$$

Vocabulary is filtered to terms appearing in ≥ 2 documents and ≤ 80% of all documents.
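The sublinear TF-IDF with vocabulary filtering can be sketched as follows (illustrative, not crune's implementation):

```python
import math
from collections import Counter

def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Sublinear TF-IDF over tokenized documents, with df-based vocab filtering."""
    N = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    # keep terms appearing in >= 2 documents and <= 80% of all documents
    vocab = {t for t, d in df.items() if 2 <= d <= 0.8 * N}
    vectors = []
    for doc in docs:
        counts = Counter(t for t in doc if t in vocab)
        # tf = log(1 + count), idf = log(N / df)
        vec = {t: math.log(1 + c) * math.log(N / df[t]) for t, c in counts.items()}
        vectors.append(vec)
    return vectors
```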

Tool Usage Features (Tool-IDF)

Tool usage counts across main turns and subagent turns are weighted by IDF to suppress ubiquitous tools:

$$w(\text{tool}, s) = \log(1 + \text{count}) \times \text{idf}(\text{tool})$$

$$\text{idf}(\text{tool}) = \log\left(\frac{N}{\text{df}(\text{tool})}\right)$$

A session that uses Agent 3 times gets a higher weight for that tool than one using Read 100 times — because Agent usage is rarer and more distinctive.

Structural Features (7-dimensional)

A fixed-length vector capturing the shape of each session:

| Dim | Feature | Description |
| --- | --- | --- |
| 0 | userRatio | Proportion of user input turns |
| 1 | assistantRatio | Proportion of assistant response turns |
| 2 | toolCallRatio | Fraction of turns containing tool calls |
| 3 | subagentRatio | Subagent involvement: (Agent turns + subagent count) / total turns |
| 4 | avgToolsPerTurn | log(1 + total_tools / total_turns) |
| 5 | editHeaviness | (Edit + Write) / total tool calls |
| 6 | readHeaviness | (Read + Grep + Glob) / total tool calls |
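A sketch of how such a vector could be computed, assuming a hypothetical per-turn dict shape with role, tools, and subagents fields (crune's real turn representation may differ):

```python
import math

def structural_features(turns: list[dict]) -> list[float]:
    """7-dimensional structural vector for one session (illustrative sketch)."""
    total = len(turns) or 1
    user = sum(1 for t in turns if t["role"] == "user")
    assistant = sum(1 for t in turns if t["role"] == "assistant")
    tool_turns = sum(1 for t in turns if t.get("tools"))
    all_tools = [tool for t in turns for tool in t.get("tools", [])]
    n_tools = len(all_tools) or 1
    agent_turns = sum(1 for t in turns if "Agent" in t.get("tools", []))
    subagents = sum(t.get("subagents", 0) for t in turns)
    edits = sum(1 for x in all_tools if x in ("Edit", "Write"))
    reads = sum(1 for x in all_tools if x in ("Read", "Grep", "Glob"))
    return [
        user / total,                           # 0 userRatio
        assistant / total,                      # 1 assistantRatio
        tool_turns / total,                     # 2 toolCallRatio
        (agent_turns + subagents) / total,      # 3 subagentRatio
        math.log(1 + len(all_tools) / total),   # 4 avgToolsPerTurn
        edits / n_tools,                        # 5 editHeaviness
        reads / n_tools,                        # 6 readHeaviness
    ]
```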

Step 2: Latent Space via Truncated SVD

The three feature vectors are concatenated with sqrt-weighted coefficients:

$$\mathbf{x}_i = \Big[\ \sqrt{0.50}\,\mathbf{t}_i,\ \ \sqrt{0.25}\,\mathbf{u}_i,\ \ \sqrt{0.25}\,\mathbf{s}_i\ \Big]$$

where t, u, s are the TF-IDF, Tool-IDF, and Structural vectors respectively. The √weight trick ensures that in cosine distance, each group contributes proportionally to the specified ratio (50:25:25) after concatenation.

Since the number of sessions m is typically much smaller than the feature dimension n (TF-IDF vocab + tool vocab + 7), I compute SVD efficiently via the Gram matrix:

  1. Compute the Gram matrix (much smaller than the full covariance):
$$G = A A^\top \quad (m \times m)$$
  2. Extract top-k eigenvectors via power iteration + deflation (50 iterations, seed=42 for reproducibility)
  3. Recover singular values and right singular vectors:
$$\sigma_i = \sqrt{\lambda_i}, \qquad V = A^\top U \Sigma^{-1}$$
  4. L2-normalize the session embeddings → k-dimensional dense latent space, with
$$k = \min\big(80,\ \max(20,\ \lfloor m/4 \rfloor)\big)$$

This latent space naturally surfaces cross-signal axes — inspecting columns of V reveals which words and tools contribute to each latent dimension.
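The Gram-matrix trick can be sketched in NumPy like this (an illustrative version, assuming k is at most the rank of A; crune's actual code may differ):

```python
import numpy as np

def gram_svd(A: np.ndarray, k: int, iters: int = 50, seed: int = 42):
    """Top-k SVD of the m x n session-feature matrix A via the m x m Gram matrix,
    using power iteration + deflation (sketch)."""
    rng = np.random.default_rng(seed)
    m = A.shape[0]
    G = A @ A.T                               # m x m Gram matrix (m << n)
    U = np.zeros((m, k))
    lams = np.zeros(k)
    for i in range(k):
        v = rng.standard_normal(m)
        v /= np.linalg.norm(v)
        for _ in range(iters):                # power iteration
            v = G @ v
            v /= np.linalg.norm(v)
        lam = float(v @ G @ v)                # Rayleigh quotient ~ eigenvalue
        G = G - lam * np.outer(v, v)          # deflate the found component
        U[:, i], lams[i] = v, lam
    sigma = np.sqrt(np.clip(lams, 0, None))   # singular values: sigma_i = sqrt(lambda_i)
    V = A.T @ U / sigma                       # right singular vectors: V = A^T U Sigma^-1
    emb = U * sigma                           # session coordinates in latent space
    emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # L2-normalize embeddings
    return emb, sigma, V
```

In crune k is chosen automatically as min(80, max(20, ⌊m/4⌋)); here it is passed explicitly for clarity.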

Step 3: Agglomerative Clustering

Using cosine distance in the SVD latent space, I run average-linkage agglomerative clustering with automatic threshold detection:

  1. Build a cosine distance matrix between all session embeddings
  2. Run average-linkage clustering, recording the merge distance history
  3. Elbow detection: find the merge step where the second derivative (acceleration) of merge distances is maximized → use as the cutoff threshold
    • Fallback: 0.7 if history is too short
    • Clamped to [0.3, 0.9]
  4. Re-cluster with the detected threshold
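The elbow-detection step above can be sketched like this (the function name is mine; the fallback and clamp values follow the text):

```python
def elbow_threshold(merge_distances: list[float],
                    fallback: float = 0.7,
                    lo: float = 0.3, hi: float = 0.9) -> float:
    """Pick a cut threshold from the agglomerative merge-distance history:
    the merge step where the second derivative (acceleration) peaks."""
    if len(merge_distances) < 3:
        return fallback                       # history too short
    best_i, best_acc = None, float("-inf")
    for i in range(1, len(merge_distances) - 1):
        acc = (merge_distances[i + 1] - merge_distances[i]) - \
              (merge_distances[i] - merge_distances[i - 1])
        if acc > best_acc:
            best_acc, best_i = acc, i
    return min(hi, max(lo, merge_distances[best_i]))  # clamp to [0.3, 0.9]
```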

Oversized Cluster Splitting

Clusters containing > 25% of all sessions (minimum 10) are split using a tighter threshold: median(internal distances) × 0.8 (floor: 0.15).

Narrow Cluster Merging

When /insights facets data is available, clusters with ≤ 2 sessions are candidates for merging. Goal categories are normalized from 50+ raw types into ~10 canonical categories (feature, bugfix, refactoring, documentation, review, testing, etc.). Two narrow clusters merge if they share ≥ 1 normalized category AND their average cosine distance < 0.7, up to a maximum merged size of 8.

Step 4: Topic Node Construction

Each cluster becomes a topic node with:

| Field | Method |
| --- | --- |
| Keywords | Top-5 terms from the TF-IDF centroid (mean of cluster member vectors) |
| Label | Shortest underlying_goal from facets data (≤ 80 chars), falling back to top-3 keywords |
| Representative Prompts | Top-3 user prompts ranked by cosine similarity to centroid (deduplicated) |
| Tool Signature | Top-5 tools by log(1 + count) × idf(tool) |
| Dominant Role | subagent-delegated if subagent ratio > 15%, tool-heavy if tool ratio > 60%, else user-driven |

Step 5: Multi-Signal Edge Construction

For every pair of topics, three signals are computed:

| Signal | Weight | Method |
| --- | --- | --- |
| Semantic Similarity | 0.4 | Cosine similarity between topic centroids in SVD space |
| File Overlap | 0.3 | Jaccard coefficient of edited file sets across member sessions |
| Session Overlap | 0.3 | 0.6 if same project & branch, 0.4 if sessions within 1 hour (max taken) |

$$\text{strength} = 0.4\,s_{\text{sem}} + 0.3\,s_{\text{file}} + 0.3\,s_{\text{session}}$$

Edges are created only when strength > 0.2. Each edge is classified:

| Type | Condition |
| --- | --- |
| cross-project-bridge | Source and target belong to entirely different project sets |
| shared-module | File overlap is the dominant weighted signal |
| workflow-continuation | Session overlap is the dominant weighted signal |
| semantic-similarity | Default (none of the above) |
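Putting the strength formula and the classification rules together (a sketch; the function signature and return shape are mine):

```python
def edge(sem: float, file_overlap: float, session_overlap: float,
         src_projects: set[str], dst_projects: set[str]):
    """Combine the three edge signals, apply the 0.2 threshold, and classify."""
    strength = 0.4 * sem + 0.3 * file_overlap + 0.3 * session_overlap
    if strength <= 0.2:
        return None                       # no edge below the threshold
    weighted = {"semantic": 0.4 * sem,
                "file": 0.3 * file_overlap,
                "session": 0.3 * session_overlap}
    if src_projects.isdisjoint(dst_projects):
        kind = "cross-project-bridge"     # entirely different project sets
    else:
        dominant = max(weighted, key=weighted.get)
        kind = {"file": "shared-module",
                "session": "workflow-continuation",
                "semantic": "semantic-similarity"}[dominant]
    return strength, kind
```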

Step 6: Community Detection & Centrality

Louvain community detection groups related topics by maximizing modularity:

$$Q = \frac{1}{2m} \sum_{ij}\left[A_{ij} - \frac{k_i k_j}{2m}\right] \delta(c_i, c_j)$$

Each node starts in its own community. Nodes are moved to whichever neighboring community yields the largest modularity gain ΔQ, repeated until convergence (max 100 iterations).

Brandes betweenness centrality identifies bridge topics — nodes that sit on shortest paths between communities:

  1. BFS from every node, recording shortest path counts σ and predecessors
  2. Back-propagate dependency:
$$\delta(v) \mathrel{+}= \frac{\sigma(v)}{\sigma(w)} \cdot \bigl(1 + \delta(w)\bigr)$$
  3. Normalize for undirected graphs:
$$C_B(v) = \frac{C_B(v)}{(n-1)(n-2)}$$

Topics in the top 10% of betweenness centrality (minimum 1) are flagged as bridge topics — they represent knowledge that connects otherwise separate domains.
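A compact sketch of Brandes' algorithm for an unweighted, undirected graph (the adjacency-dict representation is mine; accumulating over all sources counts each pair twice, which the (n−1)(n−2) normalization above absorbs):

```python
from collections import deque

def betweenness(adj: dict[str, list[str]]) -> dict[str, float]:
    """Brandes betweenness centrality, normalized for undirected graphs (sketch)."""
    cb = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, recording shortest-path counts sigma and predecessors
        sigma = {v: 0 for v in adj}; sigma[s] = 1
        dist = {v: -1 for v in adj}; dist[s] = 0
        preds = {v: [] for v in adj}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # back-propagate dependencies in reverse BFS order
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    n = len(adj)
    if n > 2:
        for v in cb:
            cb[v] /= (n - 1) * (n - 2)  # undirected normalization (doubled count absorbed)
    return cb
```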

Skill Generation Pipeline

Once topics are identified, crune generates Skill candidates through a three-stage pipeline.

Stage 1: Reusability Scoring

Each topic is scored on four to six signals depending on data availability:

| Signal | Weight | Formula |
| --- | --- | --- |
| Frequency | 0.30 | sessionCount / max(sessionCount) |
| Time Cost | 0.20 | avgDuration / max(avgDuration) |
| Cross-Project | 0.20 | (projectCount − 1) / (max(projectCount) − 1) |
| Recency | 0.10 | 1 − daysSinceLastSeen / max(daysSinceLastSeen) |
| Success Rate | 0.10 | Fraction of fully_achieved or mostly_achieved outcomes (from /insights facets) |
| Helpfulness | 0.10 | Normalized claude_helpfulness average (essential=1.0, very_helpful=0.8, ..., unhelpful=0.0) |

The last two signals are only available when /insights facets data exists. Without facets, the base four signals are reweighted (0.35, 0.25, 0.25, 0.15).
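The scoring scheme can be sketched as follows (field names are illustrative, and the facet signals are assumed pre-normalized to [0, 1]):

```python
def reusability_score(topic: dict, maxima: dict, has_facets: bool) -> float:
    """Weighted reusability score for a topic (sketch)."""
    base = {
        "frequency": topic["sessionCount"] / maxima["sessionCount"],
        "timeCost": topic["avgDuration"] / maxima["avgDuration"],
        "crossProject": (topic["projectCount"] - 1) / max(1, maxima["projectCount"] - 1),
        "recency": 1 - topic["daysSinceLastSeen"] / maxima["daysSinceLastSeen"],
    }
    if has_facets:
        # six-signal weighting: 0.30 / 0.20 / 0.20 / 0.10 + 0.10 + 0.10
        weights = {"frequency": 0.30, "timeCost": 0.20,
                   "crossProject": 0.20, "recency": 0.10}
        score = sum(weights[k] * base[k] for k in weights)
        score += 0.10 * topic["successRate"] + 0.10 * topic["helpfulness"]
    else:
        # without facets, the base four signals are reweighted
        weights = {"frequency": 0.35, "timeCost": 0.25,
                   "crossProject": 0.25, "recency": 0.15}
        score = sum(weights[k] * base[k] for k in weights)
    return score
```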

Stage 2: Heuristic Skill Generation

Each topic is converted into a SKILL.md skeleton following the anthropics/skills format:

  • Name: top-3 keywords + project suffix in kebab-case (≤ 40 chars)
  • Description: Pushiness-oriented trigger description — specifying when the skill should activate rather than what it does
  • Body: Overview → When to Use (citing representative prompts) → Workflow steps → Detected tool sequence patterns → Guidelines

Stage 3: LLM Synthesis

The heuristic skeleton is refined via claude -p with an enriched prompt containing topic metadata, representative prompts, tool signatures, enriched tool sequence patterns, and optionally graph context (centrality, connected topics by edge type).

The top-N candidates by reusability score (default: 5) are pre-synthesized at build time. On-demand re-synthesis with full graph context is available through the UI.

Session Playback

Beyond the knowledge graph, crune also provides session-level exploration:

Session list
Session list with summaries, work types, and keywords

Session playback
Full session playback showing the conversation flow

Each session gets a locally-computed summary (no LLM needed): a representative prompt selected via Jaccard centrality with position weighting, extracted keywords, work type classification (investigation / implementation / debugging / planning), and scope from the common directory prefix of edited files.
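The Jaccard-centrality prompt selection might look like this sketch (the position-weighting scheme here is my guess, not crune's exact formula):

```python
def representative_prompt(prompts: list[str]) -> str:
    """Pick the prompt most similar to all others (Jaccard centrality),
    with a mild bonus for earlier prompts. Illustrative only."""
    def jaccard(a: set[str], b: set[str]) -> float:
        return len(a & b) / len(a | b) if a | b else 0.0
    token_sets = [set(p.lower().split()) for p in prompts]
    n = len(prompts)
    best, best_score = prompts[0], float("-inf")
    for i, toks in enumerate(token_sets):
        # sum of pairwise Jaccard similarities to every other prompt
        centrality = sum(jaccard(toks, other)
                         for j, other in enumerate(token_sets) if j != i)
        position_weight = 1.0 - 0.5 * i / max(1, n - 1)  # earlier prompts favored
        score = centrality * position_weight
        if score > best_score:
            best, best_score = prompts[i], score
    return best
```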

The Core Value: Discovery, Not Generation

To be honest, the auto-generated SKILL.md files aren't always production-ready. Heuristic generation tends to produce surface-level procedure listings that lack the why behind each step.

But that's not the point. The real value is Skill candidate discovery:

  • "You've repeated this pattern 15 times"
  • "It averages 25 minutes per session"
  • "It appears across 3 different projects"

These are things you genuinely don't notice from day-to-day work. Patterns you've internalized as "just how I do things" become visible when projected onto a knowledge graph. The cross-project patterns are especially hard to spot — same workflow, different context, different codebase.

Once you see the pattern, adding your intent and judgment to turn it into a quality Skill becomes much easier than writing one from scratch.

Future Direction

The next frontier is multi-user session analysis — analyzing sessions from team members working on the same product to discover shared tacit knowledge and team-wide workflow patterns. This obviously raises privacy concerns around sharing raw prompts, so I'm exploring approaches inspired by federated learning and secure aggregation to aggregate patterns without exposing individual session data.

Getting Started

No clone required — just run via npx:

```shell
npx @chigichan24/crune --dry-run                      # Preview skill candidates
npx @chigichan24/crune --skip-synthesis               # Generate heuristic skills (no LLM)
npx @chigichan24/crune --count 3 --model haiku        # LLM-synthesized skills (requires claude CLI)
npx @chigichan24/crune --output-dir ~/.claude/skills  # Install skills directly
```

Output follows the Claude Code skill format (<name>/SKILL.md), ready to use as /skill-name commands.

The source is at github.com/chigichan24/crune. If this sounds interesting, give it a ⭐ on GitHub — it helps others discover the project. Contributions, issues, and feedback are all welcome.
