The Context: An Open-Source Educational Tool
I currently serve as the Tech Lead for DigitalBoneBox, an open-source educational platform that delivers high-fidelity anatomical study resources to students.
Our architectural constraints are unique: we needed a "database" that was completely open, version-controlled, and accessible to non-technical contributors (like anatomy professors) who might want to fix a typo or add a description without touching a database console.
The solution? "GitHub-as-a-Database." We store our data (JSON files and images) in a specific data branch of our public repository. The application fetches this content via the GitHub Raw API to render the UI.
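Concretely, that means hitting raw.githubusercontent.com for plain JSON files. Here is a minimal sketch of the fetch helper used throughout this post; the org, repo path, branch, and file layout below are illustrative, not our exact structure:

```javascript
// Base URLs for the "GitHub-as-a-Database" data branch.
// The org/repo/branch here are placeholders, not our real layout.
const BONESET_JSON_URL =
  "https://raw.githubusercontent.com/example-org/DigitalBoneBox/data/boneset.json";
const BONES_DIR_URL =
  "https://raw.githubusercontent.com/example-org/DigitalBoneBox/data/bones/";

// Minimal fetch-and-parse helper (assumes Node 18+ global fetch).
// Returns null on a non-2xx response instead of throwing.
async function fetchJSON(url) {
  const res = await fetch(url);
  if (!res.ok) return null;
  return res.json();
}
```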
While this lowered the barrier to entry for contributors, it introduced a critical engineering challenge: The N+1 Fetch Problem.
The Problem: Latency and Rate Limits
In our initial architecture, the client (or a thin server wrapper) would:
- Fetch the "Manifest" file (a list of all bones).
- Iterate through that list.
- Fire a separate HTTP request for each bone to get its metadata (images, sub-bones, descriptions).
For a boneset like the "Bony Pelvis," this resulted in dozens of simultaneous HTTP requests to raw.githubusercontent.com.
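In code, the naive pattern looked roughly like this (a sketch reusing the fetchJSON helper above; the actual client code differed in its details):

```javascript
// The N+1 pattern: one request for the manifest, then one more per bone.
async function loadBoneset() {
  const manifest = await fetchJSON(BONESET_JSON_URL); // request 1
  // Every bone triggers its own round-trip to raw.githubusercontent.com,
  // and this ran on every page load, for every user.
  return Promise.all(
    (manifest.bones || []).map(
      (boneId) => fetchJSON(`${BONES_DIR_URL}${boneId}.json`) // requests 2..N+1
    )
  );
}
```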
The Consequences:
- Latency: The user interface would "pop in" elements slowly as requests completed.
- The Kill Switch: GitHub imposes a strict rate limit on unauthenticated requests (60 requests per hour per IP). A single student clicking through the application aggressively could exhaust this limit in minutes, after which every data request failed with 403 Forbidden errors and the app became unusable.
As the application grew, this architecture became unsustainable. We needed a way to preserve the open-source data model while ensuring high availability.
The Solution: Server-Side "Warm Cache" Strategy
To solve this, I led the refactoring of our Node.js backend to implement an In-Memory Warm Cache. Instead of fetching data on request, we shifted the heavy lifting to the startup phase.
The Architecture Shift
- Cold Start: When the Node.js server boots up, it enters a "Warming" state.
- Bulk Fetching: The server traverses the GitHub data structure once, fetching the manifest and all constituent bone files.
- In-Memory Indexing: These files are parsed and stored in a local JavaScript object (searchCache).
- Serving: All subsequent user requests (Search, Navigation, Dropdowns) are served instantly from the local memory.
This reduces the GitHub API load from N requests per user action to a single warm-up pass (the manifest plus one request per bone) per server deployment.
The Implementation
Here is a simplified look at the logic we implemented in our server.js to handle this caching strategy.
```javascript
// Cache storage
let searchCache = null;

// The "Warming" Logic
async function initializeSearchCache() {
  try {
    console.log("Initializing search cache...");

    // 1. Fetch the master manifest once
    const bonesetData = await fetchJSON(BONESET_JSON_URL);
    const searchData = [];

    // 2. Iterate and fetch details server-side (only happens once at startup)
    for (const boneId of bonesetData.bones || []) {
      const boneData = await fetchJSON(`${BONES_DIR_URL}${boneId}.json`);
      if (boneData) {
        // Flatten the data for efficient searching
        searchData.push({
          id: boneData.id,
          name: boneData.name,
          type: "bone",
          // ... additional metadata
        });
      }
    }

    // 3. Store in memory
    searchCache = searchData;
    console.log(`Cache warmed: ${searchData.length} items ready.`);
  } catch (error) {
    console.error("Critical: Failed to warm cache:", error);
  }
}

// Initialize on server start
initializeSearchCache();
```
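With the cache warmed, request handlers never touch GitHub. Here is a sketch of what serving from memory can look like; the Express route and query parameter are illustrative, not our exact API:

```javascript
const express = require("express");
const app = express();

// Serve search straight from the in-memory cache; no GitHub round-trips.
app.get("/api/search", (req, res) => {
  if (!searchCache) {
    // Startup warming hasn't finished yet; ask clients to retry.
    return res.status(503).json({ error: "Cache is warming, try again shortly." });
  }
  const q = String(req.query.q || "").toLowerCase();
  res.json(searchCache.filter((item) => item.name.toLowerCase().includes(q)));
});

app.listen(3000);
```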
The Results
The impact of this refactoring was immediate:
- Zero Rate-Limit Errors: Since the server only fetches data once upon reboot (or scheduled refresh), we stay well below GitHub's API limits, regardless of how many students are using the app concurrently.
- Sub-100ms Response Times: Search queries that previously waited for network round-trips now return instantly from memory.
- Resilience: If GitHub goes down temporarily, the application continues to function for all connected users because the data is already cached in the server's RAM.
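The "scheduled refresh" mentioned above needs little extra machinery. Because initializeSearchCache only reassigns searchCache after a successful fetch, a failed refresh simply keeps serving the previous data, so re-running it on a timer is safe. A sketch (the six-hour interval is an arbitrary choice, not our real setting):

```javascript
// Re-warm the cache periodically so content edits on GitHub appear
// without a redeploy. Errors are already caught inside the function.
const REFRESH_INTERVAL_MS = 6 * 60 * 60 * 1000; // 6 hours
setInterval(initializeSearchCache, REFRESH_INTERVAL_MS);
```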
Conclusion
In open-source development, we often have to balance contributor experience (using simple files on GitHub) with user experience (performance and reliability). By introducing a caching middleware layer, we maintained the simplicity of our "Git-based database" for our anatomy professors while delivering a robust, production-grade experience for our students.
For developers building similar read-heavy applications on static data sources: don't let your clients fetch directly. Build a caching layer early—your API limits will thank you.