
Joe Slade


Giving AI Agents a Verdict on Repo Health—Actor #4 in My Apify Portfolio

Your AI agent will recommend a library that hasn't shipped a commit in over a year and never flinch. It can't tell a thriving project from a dying one, so it treats a vibrant repo and an abandoned one as equally safe to build on. That's how stale dependencies sneak into production: not through carelessness, but because nothing in the loop ever asked, "Is this thing still maintained?"

Developers ask it constantly—they just do it by hand, badly. One pattern I kept seeing while researching this: people lean on a single signal and know it's a bad one. As one developer put it on r/selfhosted: "if a github repo hasn't been updated in the past 2 years I tend to consider it abandoned… but there are some projects older than this that I'd still happily use." Last-commit date is a blunt instrument. So are stars. So is open-issue count—"maybe the maintainers don't think they're as important; the issue tracker will often be swamped." Everyone has a heuristic; nobody has a verdict.

So I built GitHub Repo Intelligence MCP, the fourth actor in my Apify portfolio. Point it at any owner/name or GitHub URL and it returns one of four verdicts (Actively maintained, Slowing, At-risk, or Likely abandoned), backed by the metrics and the pinned thresholds that produced it. It runs as an MCP server, so an AI agent can call it mid-task instead of guessing.

Here's how it's built—and the design decision the whole thing rests on.


The Core Design Decision: The Verdict Is the Product

Most GitHub tooling for agents hands the model a pile of API data and leaves the conclusion to guesswork. The official GitHub MCP server will happily fetch you commits, issues, and PRs—but "here are 300 issues" is not an answer to "should I depend on this?" The model still has to invent a judgment, with no consistent rubric, every single time.

I inverted that. The actor's whole job is to do the synthesis and return the conclusion, with the evidence attached:

get_repo_health          — headline verdict + per-dimension breakdown + rationale
get_activity_metrics     — commit cadence, releases, momentum trend
get_issue_health         — maintainer response time + stale-backlog ratio
get_pr_health            — merge rate + oldest open PR
get_contributor_insights — contributor count + bus-factor flag

Four dimensions (activity, issues, PRs, contributors) collapse into one verdict. Crucially, the verdict isn't a black-box score. Every response ships the metrics and the pinned thresholds behind it, so a human (or an agent) can audit why, not just read a number. Judgment, not raw access.

And it's deterministic: the same repo state and the same config always produce the same verdict. No per-call heuristic drift, no "the model felt differently this time." That property matters more for an agent tool than it sounds—it means the verdict is reproducible, testable, and safe to build a workflow on.
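
To make "four dimensions collapse into one verdict" concrete, here is a minimal sketch of one deterministic rule that happens to reproduce the two sample responses shown later in this post. It is not the actor's actual logic: the rating order, the idea that the activity rating selects the headline verdict, the `VERDICT_BY_ACTIVITY` mapping, and the `summarize` function are all assumptions made for illustration.

```python
from typing import Dict

# Ratings ordered best to worst. The labels appear in the sample output;
# the ordering is an assumption.
RATING_ORDER = ["Healthy", "Moderate", "At-Risk", "Likely abandoned"]

# Hypothetical mapping from the activity rating to the headline verdict.
VERDICT_BY_ACTIVITY = {
    "Healthy": "Actively maintained",
    "Moderate": "Slowing",
    "At-Risk": "At-risk",
    "Likely abandoned": "Likely abandoned",
}


def summarize(repo: str, activity: str, issues: str,
              pull_requests: str, contributors: str) -> Dict[str, str]:
    """Pure function: the same ratings always yield the same verdict."""
    supporting = [
        ("issues", issues),
        ("pull requests", pull_requests),
        ("contributors", contributors),
    ]
    # max() keeps the first entry on ties, so the result never depends
    # on anything except the inputs.
    weakest_name, weakest_rating = max(
        supporting, key=lambda pair: RATING_ORDER.index(pair[1])
    )
    return {
        "repo": repo,
        "verdict": VERDICT_BY_ACTIVITY[activity],
        "activity": activity,
        "issues": issues,
        "pullRequests": pull_requests,
        "contributors": contributors,
        "rationale": (
            f"Overall: activity is {activity}; "
            f"weakest dimension is {weakest_name} ({weakest_rating})."
        ),
    }
```

Because the function has no clock, network, or random input, calling it twice with the same ratings returns identical dictionaries, which is the property a workflow can rely on.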


The Architecture: One Fetch, Five Tools

The naive way to build five tools is five sets of API calls. That's slow, and against GitHub's rate limits it's actively hostile.

Instead, a single GitHub GraphQL query pulls everything the five tools need in one round trip. The result is cached and shared, so drilling from the headline verdict (get_repo_health) down into a dimension (get_issue_health) costs zero extra upstream calls—it's reading from the same cached fetch. GraphQL is what makes that possible: one precisely-shaped request instead of a dozen REST round-trips for commits, issues, PRs, releases, and contributors.
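
A rough sketch of the one-fetch, shared-cache idea follows. The query uses fields from GitHub's public GraphQL schema, but which fields the actor really requests is an assumption, and `SnapshotCache`, `open_issue_count`, `open_pr_count`, and `fake_fetch` are hypothetical names. The fetcher is injected so the example runs without a network call; a real cache would also need an expiry policy, which is left out here.

```python
from typing import Callable, Dict, Tuple

# One query shaped to serve several tools at once (illustrative field set).
REPO_QUERY = """
query RepoSnapshot($owner: String!, $name: String!, $since: GitTimestamp!) {
  repository(owner: $owner, name: $name) {
    defaultBranchRef {
      target { ... on Commit { history(since: $since) { totalCount } } }
    }
    issues(states: OPEN) { totalCount }
    pullRequests(states: OPEN) { totalCount }
    releases(first: 20, orderBy: {field: CREATED_AT, direction: DESC}) {
      nodes { publishedAt }
    }
  }
}
"""

Snapshot = dict


class SnapshotCache:
    """Fetch each repo at most once and share the result between tools."""

    def __init__(self, fetch: Callable[[str, str], Snapshot]) -> None:
        self._fetch = fetch
        self._snapshots: Dict[Tuple[str, str], Snapshot] = {}

    def get(self, owner: str, name: str) -> Snapshot:
        key = (owner.lower(), name.lower())
        if key not in self._snapshots:
            self._snapshots[key] = self._fetch(owner, name)
        return self._snapshots[key]


def open_issue_count(cache: SnapshotCache, owner: str, name: str) -> int:
    return cache.get(owner, name)["repository"]["issues"]["totalCount"]


def open_pr_count(cache: SnapshotCache, owner: str, name: str) -> int:
    return cache.get(owner, name)["repository"]["pullRequests"]["totalCount"]


# Demo with a stand-in fetcher so nothing here touches the network.
calls = []


def fake_fetch(owner: str, name: str) -> Snapshot:
    calls.append((owner, name))
    return {"repository": {"issues": {"totalCount": 12},
                           "pullRequests": {"totalCount": 3}}}


cache = SnapshotCache(fake_fetch)
issue_total = open_issue_count(cache, "example-owner", "example-repo")
pr_total = open_pr_count(cache, "example-owner", "example-repo")
print(len(calls))  # two tool reads, one upstream fetch
```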

The verdict logic itself is pure: metrics in, thresholds applied, verdict out. That makes it trivially testable without touching the network—the same discipline I've carried through every actor in this portfolio.
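
As an illustration of "metrics in, thresholds applied, verdict out", here is a small sketch for the activity dimension only. The cutoff values, the `ActivityThresholds` class, and the `rate_activity` function are hypothetical; the post does not publish the actor's real thresholds.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class ActivityThresholds:
    # Hypothetical pinned cutoffs, chosen only for this example.
    healthy_commits_90d: int = 10
    moderate_commits_90d: int = 1
    at_risk_commits_365d: int = 1


def rate_activity(commits_90d: int, commits_365d: int,
                  thresholds: ActivityThresholds = ActivityThresholds()
                  ) -> Dict[str, Any]:
    """Pure function: no network, no clock, no hidden state."""
    if commits_90d >= thresholds.healthy_commits_90d:
        rating = "Healthy"
    elif commits_90d >= thresholds.moderate_commits_90d:
        rating = "Moderate"
    elif commits_365d >= thresholds.at_risk_commits_365d:
        rating = "At-Risk"
    else:
        rating = "Likely abandoned"
    # Return the evidence next to the conclusion so it can be audited.
    return {
        "rating": rating,
        "commits90d": commits_90d,
        "commits365d": commits_365d,
        "thresholds": asdict(thresholds),
    }
```

A unit test for this needs nothing but integers, for example `rate_activity(0, 0)["rating"] == "Likely abandoned"`.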


The Gotcha: Bring a GitHub Token (Here's Why)

The failure most people will hit first isn't in my code. It's the setup step that makes the whole tool fall over if you skip it: GitHub's GraphQL API rejects unauthenticated requests outright. The REST API gives anonymous callers a tiny quota (60 requests per hour per IP); the GraphQL API gives them nothing. No token, no data.

It gets worse on Apify specifically. Actors run from shared cloud egress, so if anonymous requests were allowed, every Apify user would be drawing from the same pooled anonymous IP quota—which burns to zero almost instantly. The symptom an unprepared user hits is a wave of rate-limit errors that seem to come out of nowhere and have nothing to do with their own usage.

So the input schema makes a GitHub token a first-class, walked-through step rather than an afterthought buried in a README. A fine-grained token with public-repo read access is enough (sixty seconds in GitHub settings), and it lifts you to GitHub's authenticated GraphQL limit of 5,000 points per hour, a quota that is yours, not shared. The lesson I keep relearning across this portfolio: the scariest setup failures are the ones that look like the tool is broken when really it's the environment you didn't prime.
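
For readers who want to see what "no token, no data" means at the HTTP level, here is a minimal sketch that builds (but does not send) an authenticated request to GitHub's GraphQL endpoint and fails fast when the token is missing. `MissingTokenError` and `build_graphql_request` are hypothetical names, and how the actor itself reads the token from its input is not shown in this post.

```python
import json
import urllib.request

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"


class MissingTokenError(RuntimeError):
    """Raised before any network call when no token was supplied."""


def build_graphql_request(query: str, variables: dict,
                          token: str) -> urllib.request.Request:
    if not token:
        # Fail with a clear message instead of a confusing upstream error.
        raise MissingTokenError(
            "A GitHub token is required: the GraphQL API rejects "
            "unauthenticated requests."
        )
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        GITHUB_GRAPHQL_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it; checking for the token first turns an environment problem into an error message that names the actual cause.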


A Real Run

I pointed it at two repos that make the contrast obvious—one thriving, one famously frozen—against the live endpoint.

vercel/next.js:

{
  "repo": "vercel/next.js",
  "verdict": "Actively maintained",
  "activity": "Healthy",
  "issues": "Healthy",
  "pullRequests": "Moderate",
  "contributors": "Healthy",
  "rationale": "Overall: activity is Healthy; weakest dimension is pull requests (Moderate)."
}

moment/moment:

{
  "repo": "moment/moment",
  "verdict": "Likely abandoned",
  "activity": "Likely abandoned",
  "issues": "At-Risk",
  "pullRequests": "At-Risk",
  "contributors": "Healthy",
  "rationale": "Overall: activity is Likely abandoned; weakest dimension is issues (At-Risk)."
}

Two repos, two unambiguous verdicts—and the rationale tells you which dimension drove each one. Then drill into the activity behind moment's verdict, no extra fetch:

{
  "repo": "moment/moment",
  "commits90d": 0,
  "commits365d": 0,
  "releaseCount365d": 0,
  "momentumTrend": "steady"
}

Zero commits in a year. Zero releases. Steadily zero.

Now, moment is the honest, interesting case, and it's worth not glossing over. Moment isn't broken. Its maintainers deliberately declared it legacy years ago and pointed people to lighter alternatives. It's still downloaded millions of times a week, and contributors reads Healthy because of its deep historical base. A single-signal tool that only checked "is it popular?" would wave it through. A single-signal tool that only checked "recent commits?" would call it dead without nuance.

This is the whole point of a transparent, multi-dimensional verdict: it flags moment as Likely abandoned on the signals that matter for new adoption (no active development), while the per-dimension breakdown and rationale let you—or your agent—apply judgment. The verdict says "don't start something new on this." That's the right call. And you can see exactly why it said it.

Figure: GitHub Repo Intelligence MCP verdicts for next.js vs moment, returned over MCP.


It Runs as a Standby MCP Server

The actor is a standby actor: a persistent, always-warm HTTP server exposing an MCP endpoint over Streamable HTTP. Practically, that means it drops straight into the MCP configurator flow: add it to your agent, and get_repo_health is a tool the model can reach for the moment it's about to recommend a dependency. No cold-start per query.
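
Under the hood, an MCP tool invocation is a JSON-RPC 2.0 message with the method `tools/call`. The sketch below only builds that message and the headers a Streamable HTTP client sends; it does not contact any server. The argument key `repo` is an assumption (the actor's input schema defines the real one), `build_tool_call` is a hypothetical helper, and in practice an MCP client library handles this along with the `initialize` handshake that precedes it.

```python
import itertools
import json
from typing import Any, Dict

# Headers a Streamable HTTP client sends with each POST.
STREAMABLE_HTTP_HEADERS = {
    "Content-Type": "application/json",
    "Accept": "application/json, text/event-stream",
}

_request_ids = itertools.count(1)


def build_tool_call(tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build the JSON-RPC message for one MCP tool invocation."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "repo" is an assumed argument name, used here for illustration only.
message = build_tool_call("get_repo_health", {"repo": "vercel/next.js"})
payload = json.dumps(message)
```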

Because it's always warm, you can also schedule it—re-check the repos your project depends on and get flagged when one of them starts slowing, before "slowing" becomes "abandoned."
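
One way a scheduled re-check could flag a dependency that is sliding is to compare each run's verdicts against the previous run's. This is a sketch under assumptions: the severity ranking of the four verdicts is inferred from their names, `find_downgrades` is a hypothetical helper, and the repo names are placeholders.

```python
from typing import Dict, List, Tuple

# The four verdicts from this post, ranked from best to worst (assumed order).
VERDICT_SEVERITY = {
    "Actively maintained": 0,
    "Slowing": 1,
    "At-risk": 2,
    "Likely abandoned": 3,
}


def find_downgrades(previous: Dict[str, str],
                    current: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (repo, old_verdict, new_verdict) for every repo that got worse."""
    downgrades = []
    for repo, new_verdict in sorted(current.items()):
        old_verdict = previous.get(repo)
        if old_verdict is None:
            continue  # first time this repo is seen, nothing to compare
        if VERDICT_SEVERITY[new_verdict] > VERDICT_SEVERITY[old_verdict]:
            downgrades.append((repo, old_verdict, new_verdict))
    return downgrades


last_week = {"example/lib-a": "Actively maintained", "example/lib-b": "Slowing"}
this_week = {"example/lib-a": "Slowing", "example/lib-b": "Slowing",
             "example/lib-c": "At-risk"}
flagged = find_downgrades(last_week, this_week)
```

Here only `example/lib-a` is flagged: `lib-b` did not change and `lib-c` has no earlier verdict to compare against.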


What's Next

The verdict logic today is threshold-based: it reads quantifiable signals—cadence, response times, merge rates, contributor depth—and maps them to a verdict with pinned, transparent cutoffs. That's fast, deterministic, and free to run, and it's the right default.

The honest frontier is the gap between signals and context. A repo can be quiet because it's dying—or because it's genuinely finished, like moment. A maintainer can be slow because they've checked out, or because they're mid-rewrite on a branch. Threshold logic gets you a defensible verdict; a contextual layer (an optional semantic pass over release notes, pinned issues, and maintainer statements) is where the next iteration earns its cost.

This actor also pairs with the rest of the portfolio—same philosophy every time: focused, testable, composable. Small actors you wire into a pipeline, each doing one thing with a clear opinion, instead of one monolith that does everything adequately and decides nothing.


Try It

Live on the Apify Store:
👉 apify.com/joeslade/github-repo-intelligence-mcp

The input schema walks you through the GitHub token and the five tools. If you wire it into an agent workflow—or you've got a repo whose verdict surprises you—drop a comment. I'd like to hear which dimension called it.

