DEV Community

Prakhar

I built an MCP server that finds you mergeable open source issues in 30 seconds

The problem

I'm a CS student at IIT Guwahati. A few months ago I decided I wanted to contribute to open source. The advice was always the same: "look for the good first issue label."

So I did. And every single time:

  • The issue was already assigned
  • Someone had opened a PR yesterday
  • The repo hadn't been touched in 8 months
  • It needed a language I knew on paper but not in practice

After three weekends of this loop, I gave up and went back to building side projects.

Then MCP (Model Context Protocol) launched and I realized: this is exactly the kind of problem an AI agent should solve. Not by generating anything — just by filtering GitHub data better than I can scroll through it.

So I built OpenCollab MCP.

What it does

You ask Claude (or Cursor, or any MCP client):

"Find me a Python good-first-issue I can finish this weekend. Make sure nobody's working on it."

OpenCollab exposes 22 tools to the AI. The model picks the right ones and chains them together:

  1. match_me — reads my GitHub, picks my strongest language
  2. find_issues — searches good-first-issues in that language
  3. check_issue_availability — for each candidate, verifies no one's working on it
  4. issue_complexity — rates difficulty 1-10

I get back 5 actually-mergeable issues in under 30 seconds.

Then:

"Plan a PR for issue #456 in owner/repo."

It calls generate_pr_plan which fetches the issue body, comments, CONTRIBUTING.md, repo structure, and default branch — handing the AI everything needed to draft real code.

The 22 tools

🔍 Discovery (6): find_issues, trending_repos, similar_repos, find_mentor_repos, weekend_issues, match_me

📊 Evaluation (7): repo_health, contribution_readiness, impact_estimator, repo_activity_pulse, compare_repos, repo_languages, dependency_check

🎯 Issue Intel (6): check_issue_availability, issue_complexity, stale_issue_finder, label_explorer, recent_prs, generate_pr_plan

👤 Profile (3): analyze_profile, first_timer_score, contributor_leaderboard

Design choices that mattered

It's a data bridge, not an AI

Zero AI inference happens on my end. Your client (Claude/Cursor) does all the reasoning. OpenCollab just feeds it clean GitHub data.

This means: zero cost to me, fully private (runs on your machine), and it inherits whatever model your client uses.

Pydantic for every input

Every tool's input is a Pydantic model with extra="forbid" and str_strip_whitespace=True. LLMs sometimes pass stray fields or whitespace — Pydantic catches it before any logic runs.

```python
from pydantic import BaseModel, ConfigDict, Field

class IssueInput(BaseModel):
    model_config = ConfigDict(str_strip_whitespace=True, extra="forbid")
    owner: str = Field(..., min_length=1)
    repo: str = Field(..., min_length=1)
    issue_number: str = Field(..., min_length=1)
```

The issue_number: str is intentional — clients pass numbers as strings, and a permissive parser handles '#123', ' 123 ', and '123' uniformly.
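The post doesn't show that parser, but a minimal sketch of what a permissive one could look like (the name `parse_issue_number` is my invention, not from the codebase):

```python
def parse_issue_number(raw: str) -> int:
    """Accept '123', ' 123 ', and '#123' uniformly; reject everything else."""
    cleaned = raw.strip().lstrip("#").strip()
    if not cleaned.isdigit():
        raise ValueError(f"not an issue number: {raw!r}")
    return int(cleaned)
```

Normalizing at the boundary like this means every tool downstream can assume a clean `int`.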

Async + parallel for the heavy tools

match_me was originally 3 sequential API calls. Now it's asyncio.gather:

```python
user, repos_raw = await asyncio.gather(
    github_get(f"/users/{username}"),
    github_get(f"/users/{username}/repos", {...}),
)
```

Same trick in repo_health (3 calls), compare_repos (4 calls per repo), dependency_check (8 file lookups), generate_pr_plan (5 endpoints). 3-5x latency improvement on the heavy paths.
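For the per-repo case, the same pattern nests: one coroutine per repo, each fanning out its own endpoint fetches. A sketch of how `compare_repos` could do this — the endpoint paths and result shape are my guesses, and `github_get` is stubbed so the snippet runs standalone:

```python
import asyncio

async def github_get(path: str) -> dict:
    # stub standing in for the real authenticated HTTP helper
    await asyncio.sleep(0)
    return {"path": path}

async def compare_repos(full_names: list[str]) -> list[dict]:
    async def fetch_one(full_name: str) -> dict:
        # 4 endpoint fetches for this repo, all in flight at once
        repo, contributors, commits, issues = await asyncio.gather(
            github_get(f"/repos/{full_name}"),
            github_get(f"/repos/{full_name}/contributors"),
            github_get(f"/repos/{full_name}/commits"),
            github_get(f"/repos/{full_name}/issues"),
        )
        return {"name": full_name, "repo": repo, "contributors": contributors,
                "commits": commits, "issues": issues}

    # every repo's fan-out also runs concurrently with the others
    return await asyncio.gather(*(fetch_one(n) for n in full_names))
```

Total wall time ends up close to the single slowest request instead of the sum of all of them.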

In-memory TTL cache for rate limits

GitHub's unauthenticated rate limit is brutal. Even authenticated, five chained tool calls per question burn through the quota fast. I added a 5-minute in-memory cache:

```python
def _cache_get(key: str) -> Any | None:
    entry = _cache.get(key)
    if not entry:
        return None
    expires_at, value = entry
    if time.monotonic() > expires_at:
        _cache.pop(key, None)
        return None
    return value
```

Hand-rolled because functools.lru_cache doesn't do TTL and I didn't want a cachetools dependency for 30 lines.

MockTransport for fast tests

All 45 tests run in 0.12 seconds. No real network. httpx.MockTransport lets me return arbitrary status codes per path, which mattered for testing the case where GitHub returns 202 while repo stats are still being computed.

```python
def _handler(request: httpx.Request) -> httpx.Response:
    path = request.url.path
    if path not in routes:
        return httpx.Response(404, json={"message": "Not Found (mock)"})
    spec = routes[path]
    status, body = spec if isinstance(spec, tuple) else (200, spec)
    return httpx.Response(status, json=body)
```

Install in 60 seconds

```shell
pip install opencollab-mcp
# or
uvx opencollab-mcp
```

Then add to your Claude Desktop config:

```json
{
  "mcpServers": {
    "opencollab": {
      "command": "uvx",
      "args": ["opencollab-mcp"],
      "env": { "GITHUB_TOKEN": "your_token_here" }
    }
  }
}
```

Restart Claude. Done.

What's next

  • first_pr_generator — one-shot find + plan + draft my first PR
  • track_my_prs — dashboard with staleness nudges
  • skill_gap — compare your skills vs a target repo's stack

If you've been wanting to contribute to open source but couldn't find the right issue, give it a shot. And if it helps you land a PR — a ⭐ on the repo would genuinely make my week.

GitHub: https://github.com/prakhar1605/Opencollab-mcp

PyPI: https://pypi.org/project/opencollab-mcp/
