Building RepoRank — The Open Source Impact & Funding Readiness Engine

#ai #webdev #opensource #showdev

Treats external APIs as SQL tables

A journey into bridging open-source metrics, cross-source SQL joins, and AI-driven grant matching.

The Open Source Funding Crisis

Every day, developers build and maintain critical pieces of open-source software that power global infrastructure. Yet, one of the biggest challenges in the ecosystem remains funding. Maintainers spend hours writing grant proposals, pitch decks, and sponsorship applications.

To prove a project's impact, you have to answer tough questions:

How many developers are using it? (Downloads on PyPI/npm)
Is the community talking about it? (HackerNews mentions)
What is its current financial state? (Open Collective stats)
How healthy is the repository? (GitHub stars, forks, and issues)

Gathering this data means integrating with four or five different APIs, parsing fragmented JSON structures, writing boilerplate auth code, and manually stitching the pieces together.

For the Pirates of the Coral-bean Hackathon, I built RepoRank to solve this exact problem. RepoRank uses Coral SQL and Hugging Face Qwen 2.5 (72B) to aggregate cross-source signals instantly, calculate an overall impact score, generate a professional funding pitch, and match projects with active grant programs.

The Secret Weapon: Coral SQL

Integrating 4+ external APIs usually takes days of reading docs and writing API wrapper code. Coral changes this completely by letting you query APIs using standard SQL.

With Coral, APIs are treated as SQL tables. The star of the show in RepoRank is this single cross-source JOIN query:

SELECT  g.full_name,
        g.stargazers_count,
        g.forks_count,
        g.open_issues_count,
        p.last_month_downloads AS monthly_downloads,
        h.mention_count        AS hn_mentions_6mo,
        h.top_score            AS hn_top_score
FROM    github.repos_get    g
JOIN    pypi.packages       p  ON  p.name = 'fastapi'
JOIN    (
            SELECT COUNT(*) AS mention_count, MAX(score) AS top_score
            FROM   hackernews.stories
            WHERE  query = 'fastapi'
              AND  time  > NOW() - INTERVAL '180 days'
        )                   h  ON  1 = 1
WHERE   g.owner = 'tiangolo' AND g.repo = 'fastapi'

Coral handles the execution under the hood, fetching the package stats, checking HackerNews for stories matching the repository name over the last 180 days, retrieving GitHub details, and returning a unified, flat SQL row.

Building the Sources We Needed

Coral has a great collection of built-in sources, but to capture the full open-source picture, we needed data from PyPI, npm, HackerNews, and Open Collective.

Instead of hardcoding API requests in Python, we extended Coral! We designed and wrote 4 custom YAML source specifications that map external REST and GraphQL endpoints into SQL-queryable tables.

Here is a snippet of the custom GraphQL source spec we wrote for Open Collective:

dsl_version: 3
name: opencollective
backend: http
base_url: https://api.opencollective.com/graphql/v2
tables:
  - name: collectives
    filters:
      - name: slug
        required: true
    request:
      method: POST
      path: /
      body:
        format: text
        content: |
          {
            "query": "query ($slug: String) { collective(slug: $slug) { slug name stats { totalAmountReceived { value } contributorsCount } } }",
            "variables": { "slug": "{{filter.slug}}" }
          }

By mapping the GraphQL variable to the {{filter.slug}} template, we can query Open Collective using a simple SQL WHERE clause:

SELECT total_amount_received FROM opencollective.collectives WHERE slug = 'fastapi'

Upstream Contributions

To help other developers build on this, I opened 3 upstream Pull Requests to the official withcoral/coral repository:

PR #827: Adds the PyPI packages source spec & docs.
PR #828: Adds the npm packages source spec & docs.
PR #829: Adds the Open Collective GraphQL source spec & docs.

From Raw Data to Actionable Insights

Once the Coral engine executes the query and returns a structured record, RepoRank hands the data off to Hugging Face Qwen 2.5 (72B Instruct).

Using structured JSON schema outputs, Qwen acts as an expert open-source analyst, returning:

Impact Score (0-100): Mapped across metrics.
Impact Narrative: A professional, plain-English summary of the project's real-world value.
Funding Pitch: A punchy, one-sentence elevator pitch ready to copy-paste into sponsorship forms.
Targeted Grants: Recommends 3-5 specific programs (e.g., Google Summer of Code, PSF, MOSS, GitHub Sponsors) matching the language, ecosystem, and project scale.

A Premium User Experience

To match the power of the backend, I built a dark-mode dashboard with:

Interactive SVG Radar Charts: Visually represents project balance across 6 key metrics (Stars, Forks, Downloads, Community Buzz, Financial Backing, and Repository Health).
Session-Persistent GitHub Auth: Connect your GitHub account, load up your repositories in a sidebar, and click any repository to trigger an instant analysis.
Clickable Grant Cards: Hardmapped direct links to known grant applications, with dynamic Google search fallbacks for newer programs.
Inspectable Queries: A collapsible SQL details viewer so developers can see the exact query run by the Coral engine.
Keyboard Navigation: Cmd+K to focus the search bar, Cmd+Enter to run the analysis, and Esc to exit modals.

Key Takeaways

SQL as an API Gateway: Treating APIs as SQL tables makes data orchestration significantly simpler. Writing cross-source joins in SQL is cleaner than writing asynchronous API aggregation loops.
Community Expansion: Extending Coral's capabilities by writing YAML specifications shows how powerful declarative source mappings are.
Developer-centric AI: AI models are most effective when fed structured, pre-filtered data. Combining the data-gathering capabilities of Coral with the synthesis capabilities of LLMs creates an excellent pipeline.

🔗 Links & Resources

GitHub Repository: himanshu748/reporank
Upstream Pull Requests: Coral Pull Requests Page

Built for the Pirates of the Coral-bean Hackathon. Happy sailing! 🏴‍☠️

Top comments (5)

Harjot Singh • May 31

The framing of OSS funding as an evidence problem is spot on, maintainers burn hours re-assembling the same impact story (downloads, mentions, financial state) for every grant because the proof is scattered across PyPI, npm, HN, Open Collective and nobody's joined it. RepoRank is really a cross-source SQL join wearing a funding-engine hat, and that's the right architecture: the value is in correlating signals that individually mean little but together tell a fundable story. Two thoughts on the AI grant-matching layer: ground every claim in the actual numbers so the pitch can't inflate impact (a funder who catches one exaggerated stat discounts the whole application, exactly like a hallucinated citation), and make the matching explainable, "you fit this grant because X" beats a black-box score a maintainer can't act on. Aggregate-then-ground-then-match is a pattern I lean on constantly building Moonshift. Where's the hardest data to get reliably, is it the financial state (Open Collective is patchy) or attributing real usage versus vanity download counts?