DEV Community

adamday75


Built a Caching Proxy for OpenAI - Saved 40% on API Bills

A maintenance manager's first SaaS. Technical deep-dive + lessons learned.

Hey dev.to! 👋

I'm not a career developer. I supervise industrial mechanics and run a maintenance department. But we needed AI for our CMMS (Computerized Maintenance Management System), and the OpenAI API costs were getting crazy.

So I built a caching proxy. Here's how it works, what I learned, and the actual code.

───

The Problem

We're using AI for:

• Auto-generating work orders
• Predictive maintenance alerts
• Vendor communications
• Training docs

Issue: Same prompts, repeated constantly, paying every time.

User: "Generate work order for HVAC maintenance"
→ Pay $0.002

User: "Generate work order for HVAC maintenance" (same prompt)
→ Pay $0.002 again

User: "Generate work order for HVAC maintenance" (same prompt, 3rd time)
→ Pay $0.002 AGAIN

This adds up FAST at scale.

───

The Solution: Caching Proxy

Intercept OpenAI requests, hash the prompt, cache the response.

Architecture:

Your App → AI Optimizer Proxy → OpenAI API
                ↓
           SQLite Cache
                ↓
        (hash → response)

Flow:

  1. App sends request to proxy
  2. Proxy hashes: sha256(prompt + model + params)
  3. Check cache:
     • Hit: Return cached response (FREE)
     • Miss: Forward to OpenAI, cache response, return
  4. Dashboard tracks hits/misses/savings
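Step 2 is the part that decides whether two requests count as "the same". The handler later in this post calls hashPrompt(req.body) without showing it, so here is a minimal sketch of what such a helper could look like. It assumes the whole parsed request body is the cache key, and the canonicalize helper is my own illustration (not taken from the repo): it sorts object keys so that two bodies with the same fields in a different order produce the same hash.

```javascript
const crypto = require('crypto');

// Serialize a parsed JSON value with object keys sorted, so key order
// in the request body does not change the resulting hash.
function canonicalize(value) {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const entries = Object.keys(value)
      .filter((key) => value[key] !== undefined)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalize(value[key])}`);
    return `{${entries.join(',')}}`;
  }
  return JSON.stringify(value);
}

// Hypothetical hashPrompt: hashes the full body (prompt + model + params).
// Which fields belong in the key is an assumption, not the author's code.
function hashPrompt(body) {
  return crypto.createHash('sha256').update(canonicalize(body)).digest('hex');
}

// Same request, different key order: both calls yield the same cache key.
const first = hashPrompt({
  model: 'example-model',
  temperature: 0,
  messages: [{ role: 'user', content: 'Generate work order for HVAC maintenance' }],
});
const second = hashPrompt({
  messages: [{ content: 'Generate work order for HVAC maintenance', role: 'user' }],
  temperature: 0,
  model: 'example-model',
});
console.log(first === second);
```

Any change to the model, the messages, or a parameter such as temperature produces a different hash, which is what keeps a cached answer from being served for a different request.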

───

The Code (Simplified)

Cache lookup:

// Look up a cached row by prompt hash (no row on a miss)
async function getCacheKey(hash) {
  const db = await getDb();
  return db.get('SELECT * FROM cache WHERE hash = ?', [hash]);
}

// Store the response as a JSON string; ttl is in seconds, expires_at in epoch milliseconds
async function setCacheKey(hash, response, ttl = 86400) {
  const db = await getDb();
  await db.run(
    'INSERT OR REPLACE INTO cache (hash, response, expires_at) VALUES (?, ?, ?)',
    [hash, JSON.stringify(response), Date.now() + ttl * 1000]
  );
}

Request Handler:

app.post('/v1/chat/completions', async (req, res) => {
  const hash = hashPrompt(req.body);
  const cached = await getCacheKey(hash);

  if (cached && cached.expires_at > Date.now()) {
    // Cache HIT: the response column holds a JSON string, so parse it before sending
    analytics.recordHit(true);
    return res.json(JSON.parse(cached.response));
  }

  // Cache MISS - call OpenAI
  const response = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify(req.body)
  });

  const data = await response.json();
  analytics.recordHit(false);

  // Only cache successful responses so errors are not replayed from the cache
  if (response.ok) {
    await setCacheKey(hash, data);
  }

  res.status(response.status).json(data);
});

───

The Results

My workload (1 week):

• Total requests: 1,847
• Cache hits: 1,385
• Cache misses: 462
• Cache hit rate: 75%
• Cost savings: ~40%

Real money saved: If you're spending $100/month on a similarly repetitive workload, expect roughly $40-60 in savings. The more your prompts repeat, the more you save.
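For anyone who wants to check the numbers, here is the arithmetic behind the hit rate as a small sketch. The counts are the ones reported above. The ~40% savings figure is taken at face value as measured; it is not derived from the hit rate, and the function names are just for illustration.

```javascript
// Counts from the one-week sample reported above.
const totalRequests = 1847;
const cacheHits = 1385;
const cacheMisses = 462;

// Fraction of requests served from the cache.
function hitRate(hits, total) {
  return total === 0 ? 0 : hits / total;
}

// Dollars saved per month, given a savings rate as a fraction (0.40 = 40%).
function monthlySavings(monthlySpend, savingsRate) {
  return monthlySpend * savingsRate;
}

// Hits and misses should account for every request.
console.log(cacheHits + cacheMisses === totalRequests);

// 1385 / 1847 rounds to the 75% shown in the results.
console.log(`${(hitRate(cacheHits, totalRequests) * 100).toFixed(1)}%`);

// Savings on a $100/month bill at the reported ~40% rate.
console.log(`$${monthlySavings(100, 0.4).toFixed(2)}`);
```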

───

Technical Challenges

  1. Device Fingerprinting

For license validation, I needed to identify devices without login:

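const os = require('os');
const { createHash } = require('crypto');
const { machineId } = require('node-machine-id');
const sha256 = (text) => createHash('sha256').update(text).digest('hex');
const deviceId = await machineId(); // run inside an async function
const fingerprint = sha256(deviceId + os.hostname());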

  2. Stripe Webhooks

My first payment webhook failed because I handled customer.subscription.created and customer.subscription.updated the same way. Now I route by event.type:

app.post('/stripe', (req, res) => {
  const eventType = req.body.type;

  switch(eventType) {
    case 'customer.subscription.created':
      createLicense(req.body.data.object);
      break;
    case 'customer.subscription.deleted':
      revokeLicense(req.body.data.object);
      break;
    // ... 4 more event types
  }

  // Acknowledge receipt so Stripe does not keep retrying the delivery
  res.sendStatus(200);
});
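As the number of event types grows, the switch can also be written as a lookup table. The sketch below is one possible way to do that, not the code from the app: the in-memory licenses Map and the bodies of createLicense and revokeLicense are hypothetical stand-ins so the example runs on its own.

```javascript
// Hypothetical stand-ins for the real license functions, which would
// write to the license database instead of an in-memory Map.
const licenses = new Map();

function createLicense(subscription) {
  licenses.set(subscription.id, { status: 'active' });
}

function revokeLicense(subscription) {
  licenses.delete(subscription.id);
}

// One entry per Stripe event type the app cares about.
const handlers = new Map([
  ['customer.subscription.created', createLicense],
  ['customer.subscription.deleted', revokeLicense],
]);

// Returns true if the event was handled, false if its type is not registered.
function routeStripeEvent(event) {
  const handler = handlers.get(event.type);
  if (!handler) {
    return false;
  }
  handler(event.data.object);
  return true;
}

// Example with a made-up subscription id.
const handled = routeStripeEvent({
  type: 'customer.subscription.created',
  data: { object: { id: 'sub_example' } },
});
console.log(handled, licenses.has('sub_example'));
```

Unregistered event types fall through and return false, so the route can still acknowledge them without doing any work.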

  3. Email Delivery

Gmail OAuth was a pain. Had to:

• Create Google Cloud project
• Enable Gmail API
• Get OAuth credentials
• Handle refresh tokens
• Deploy secrets to Fly.io

First emails failed with unauthorized_client. Turned out my refresh token was from a different OAuth client. Started fresh, worked immediately.

───

The Stack

| Component   | Tech              |
| ----------- | ----------------- |
| Backend     | Node.js + Express |
| Desktop App | Electron          |
| Database    | SQLite            |
| Hosting     | Fly.io            |
| Payments    | Stripe            |
| Email       | Gmail API         |
| Builds      | electron-builder  |

Total build time: ~1 month (nights/weekends)

───

Lessons Learned

  1. Schema matters: Added device_limit column after deploying. Had to recreate licenses. Check your schema BEFORE launch.

  2. Fresh Stripe signup > DB hacking: When my license broke, creating a new test subscription was faster than debugging the DB.

  3. Ship before perfect: My first build had no stats dashboard. Shipped anyway. Added it later.

  4. Non-devs can build SaaS: I learned enough to ship. You can too.

───

Try It...

Free 14-day trial: https://ai-optimizer-landing.vercel.app

GitHub (open source): https://github.com/adamday75/ai-optimizer-app

Drop-in replacement. Change one env var. See your savings.

───

Questions?

I'm happy to answer anything about:

• The caching strategy
• License system
• Stripe integration
• Electron builds
• Learning Node.js as a non-dev

Drop a comment! 👇

───

Adam Day | Maintenance Manager → Accidental SaaS Founder
