How I stopped Claude from cloning entire GitHub repos for a 10-line snippet

#claude #github #productivity #tooling

TL;DR — I wrote a Claude Code skill that prevents Claude from cloning or npm install-ing a repo when I only wanted one function or one idea from it. Raw files only, into /tmp, smallest useful unit, adapted to my project's style. MIT, 70 lines of markdown. Part of jd-skills.

The pattern that finally annoyed me enough

You've done this. I've done this. You're in a Claude Code session, you paste a GitHub URL, and you say something like:

"Look at how this repo handles agent handoffs — can we do something similar?"

And then Claude goes: git clone https://github.com/..., reads 47 files, asks you which __init__.py is interesting, and 90 seconds later you're three levels deep in someone else's repo scaffolding for what should have been a 12-line concept.

Or worse — it adds the whole library to your package.json as a dependency. For one function. You now own its transitive deps, its CVE notifications, and a version pin you'll never upgrade.

The problem isn't Claude being lazy or sloppy. The problem is that "use this library" and "borrow an idea from this library" deserve completely different workflows, and there was no rule telling Claude which one I meant.

The rule

The fix is dumb-simple as a rule and surprisingly effective in practice:

When the user references a GitHub repo for inspiration (not as a dependency), never clone it and never install it. Read the README first. If the README answers the question, stop there. If code is needed, fetch raw files via raw.githubusercontent.com into the OS temp dir, lift the minimum useful unit, adapt it to the user's style, and cite the source commit SHA.

I wrote this up as a Claude Code skill: a SKILL.md whose description triggers auto-invocation when a GitHub URL is dropped in as inspiration. Claude Code picks up the skill's name and description at session start, loads the full instructions when a prompt matches, and it just… does the right thing now.
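The fetch step of the rule is small enough to sketch in Python. This is a rough illustration and not what the skill itself runs (the skill is markdown instructions, and Claude uses curl or WebFetch): the helper names `raw_url`, `scratch_path`, and `fetch_raw` are made up for this example, and the `sge-<repo>-<ref>` folder layout is an assumption modelled on the temp path used later in this post.

```python
import tempfile
import urllib.request
from pathlib import Path

RAW_HOST = "https://raw.githubusercontent.com"


def raw_url(owner: str, repo: str, ref: str, path: str) -> str:
    """Build the raw-file URL for one path at a branch, tag, or commit SHA."""
    return f"{RAW_HOST}/{owner}/{repo}/{ref}/{path.lstrip('/')}"


def scratch_path(repo: str, ref: str, path: str) -> Path:
    """Assumed layout: <OS temp dir>/sge-<repo>-<ref>/<file name>."""
    return Path(tempfile.gettempdir()) / f"sge-{repo}-{ref}" / Path(path).name


def fetch_raw(owner: str, repo: str, ref: str, path: str) -> Path:
    """Download a single file into the temp dir: no clone, no install."""
    dest = scratch_path(repo, ref, path)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(raw_url(owner, repo, ref, path), timeout=30) as resp:
        dest.write_bytes(resp.read())
    return dest
```

Calling `fetch_raw("TauricResearch", "TradingAgents", "main", "README.md")` would leave exactly one file under the OS temp dir, which is the whole point: nothing lands in the project until something is deliberately lifted out of it.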

Walkthrough — the concept-only case

I was working on a YC job-applier and someone pointed me at TauricResearch/TradingAgents. "Can we use this multi-agent pattern?"

Without the skill: Claude would have cloned the whole trading repo, then I'd have spent 20 minutes pruning irrelevant files. The repo is for trading. Almost none of it transfers to job applications.

With the skill: Claude started with raw.githubusercontent.com/TauricResearch/TradingAgents/main/README.md to map the rough shape — specialised role agents (analysts → bull/bear researchers debating → judge → trader → risk team) coordinated by a graph. Then it pulled a handful of source files via raw URLs to see how the pattern was actually wired: the agent prompt templates, the JSON schema each role hands back, and the graph node that routes between them. Four files into /tmp, not a clone.

That was enough to propose an analogue for my use case: JobFitAnalyst + RecruiterPersonaWriter + Critic (a role that argues against applying), orchestrated by my existing extractor pipeline — with the prompt and schema shapes borrowed from the trading agents but rewritten for jobs. We discussed the design before any code got written.
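To make the shape of that proposal concrete, here is a minimal sketch of roles handing back a structured result and a routing step deciding what happens next. Everything in it is hypothetical: the `RoleOutput` fields, the stub rules inside each role, and the `judge` routing are placeholders I'm using for illustration, not code from TradingAgents or from my pipeline. Real roles would call a model with a role-specific prompt.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RoleOutput:
    """Hypothetical shape each role hands back (stand-in for a JSON schema)."""
    role: str
    verdict: str  # "apply" or "skip"
    rationale: str


def job_fit_analyst(posting: dict) -> RoleOutput:
    # Placeholder rule; a real analyst role would prompt a model.
    fits = "python" in posting["skills"]
    return RoleOutput("JobFitAnalyst", "apply" if fits else "skip", "skills overlap check")


def critic(posting: dict) -> RoleOutput:
    # The critic argues against applying, like the bear side of the debate.
    risky = posting.get("remote") is False
    return RoleOutput("Critic", "skip" if risky else "apply", "looks for reasons not to apply")


def judge(outputs: List[RoleOutput]) -> str:
    """Routing step: hand off to the writer only if no role objects."""
    return "write" if all(o.verdict == "apply" for o in outputs) else "drop"


def run_pipeline(posting: dict, roles: List[Callable[[dict], RoleOutput]]) -> str:
    outputs = [role(posting) for role in roles]
    return judge(outputs)
```

A "write" result is where a RecruiterPersonaWriter role would take over. The borrowed idea is only the structure (typed hand-offs plus a router), which is why a README and a few raw files were enough to get it.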

Result: the pattern was useful. The framework around it wasn't. Cloning would have been pure friction.

Walkthrough — the surgical lift case

Different prompt: "There's a clean exponential backoff in litl/backoff — can we just steal it?"

With the skill, Claude:

Pinned the SHA so the source is reproducible:

   gh api repos/litl/backoff/commits/master --jq '.sha' | cut -c1-7
   # → abc1234

Listed the tree via the API instead of cloning:

   gh api repos/litl/backoff/contents/backoff --jq '.[].path'

Identified _wait_gen.py as the target — ~40 lines of generator logic.
Fetched just that file to /tmp/sge-backoff-abc1234/:

   # --create-dirs makes curl create /tmp/sge-backoff-abc1234/ if it doesn't exist yet
   curl -fsSL --create-dirs "https://raw.githubusercontent.com/litl/backoff/abc1234/backoff/_wait_gen.py" \
     -o /tmp/sge-backoff-abc1234/_wait_gen.py

Extracted the 8-line expo generator, rewrote it as a plain function matching my scraper's style (no Pythonic generators, just a list).
Pasted it into utils/retry.py with a provenance comment:

   # adapted from github.com/litl/backoff@abc1234:backoff/_wait_gen.py
   def exponential_backoff(...):
       ...

No pip install backoff. No git submodule. Eight lines, owned by me, traceable to the source.
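For a sense of scale, a list-returning exponential backoff can look roughly like the sketch below. It is an illustration of the "plain function, just a list" style and not the actual contents of my utils/retry.py; the parameter names `retries`, `base`, `factor`, and `max_value` are assumptions for this example.

```python
from typing import List, Optional


# Illustrative sketch only; parameter names are assumptions.
def exponential_backoff(retries: int, base: float = 2.0, factor: float = 1.0,
                        max_value: Optional[float] = None) -> List[float]:
    """Return the wait in seconds before each retry, as a plain list."""
    waits: List[float] = []
    for attempt in range(retries):
        wait = factor * base ** attempt  # factor, factor*base, factor*base^2, ...
        if max_value is not None:
            wait = min(wait, max_value)  # cap so waits stop growing
        waits.append(wait)
    return waits
```

With these defaults, `exponential_backoff(5, max_value=10.0)` gives `[1.0, 2.0, 4.0, 8.0, 10.0]`, and a retry loop just iterates over the list and sleeps for each value between attempts.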

Why a Skill and not an MCP

Quick aside in case you're considering writing your own. People sometimes ask: should this be a Skill or an MCP server?

An MCP server ships new tools to Claude. You'd build a github_raw_fetch server, host it, version it, and wire it into clients.
A Skill ships instructions that shape how Claude uses tools it already has: WebFetch, curl via Bash, gh, Read.

This is purely a discipline layered on existing tools. Skill. Anthropic's own mcp-builder is itself a Skill, which is a hint.

Rule of thumb: if you can write your idea as a paragraph of instructions, it's a Skill. If you need to add a new verb to Claude's vocabulary, it's an MCP.

Install

mkdir -p ~/.claude/skills/surgical-github-extraction
curl -fsSL https://raw.githubusercontent.com/jeet-dhandha/jd-skills/main/skills/surgical-github-extraction/SKILL.md \
  -o ~/.claude/skills/surgical-github-extraction/SKILL.md

Or install it project-scoped by putting the same file under .claude/skills/surgical-github-extraction/ in your repo.

Repo: https://github.com/jeet-dhandha/jd-skills

It's part of jd-skills, a small collection of Claude Code skills I'm building. The sibling skill, code-graft, handles the case where a one-off snippet isn't enough but a runtime dep is too much: vendor only the slice of a library you actually use into your project, trim the rest, and re-sync selectively from upstream. Useful for things like "I want one tokenizer out of Hugging Face transformers without the 2GB."

Issues, prompts that misfired, and "this should also handle X" reports are very welcome. Skills only get sharper with concrete failure cases.

If you write Claude Code skills, I'd love to see them. Drop them in the comments.