Master modern AI architecture with Node.js, Gemini 2.5, and Cloudflare R2
Introduction: The Problem with "Toy" RAG Apps
Most RAG tutorials skip the hard parts that actually matter in production:
- No security model: Users can access each other's private data.
- Naive file handling: Large uploads crash your Node.js server.
- Expensive infra: AWS egress fees and managed vector DBs drain your wallet.
- Blocking operations: Processing files freezes your entire API.
We are going to solve all of these using a production-proven architecture.
The $0 Tech Stack
Every piece of this stack has a generous free tier:
- Cloudflare R2: S3-compatible storage with zero egress fees.
- Gemini 2.5 Flash: High-performance LLM with a free tier of 15 requests/minute.
- PostgreSQL + pgvector: Battle-tested database with native vector support.
- BullMQ: Redis-backed job queue to handle heavy processing in the background.
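Wiring these services together takes only a handful of environment variables. A minimal sketch (every variable name here is illustrative, not a convention any of these tools require):

```bash
# Illustrative names only; adapt to your own config loader
DATABASE_URL=postgresql://user:pass@localhost:5432/rag
REDIS_URL=redis://localhost:6379
R2_ACCOUNT_ID=<your Cloudflare account id>
R2_ACCESS_KEY_ID=<R2 API token key>
R2_SECRET_ACCESS_KEY=<R2 API token secret>
R2_BUCKET=rag-files
GEMINI_API_KEY=<Google AI Studio key>
```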
Step 1: Understanding the Architecture
We follow a 4-phase workflow designed for scale:
- Direct-to-Cloud Uploads: Browser uploads files directly to R2 using presigned URLs. Your server never touches the raw bytes, preventing memory crashes.
- Asynchronous Ingestion: A BullMQ worker handles the "heavy lifting" (downloading, chunking, and embedding) without blocking your API.
- Hybrid Retrieval: Every search query filters by user ID directly in PostgreSQL, so users only search their own data (plus public files).
- Contextual Generation: Gemini generates answers with smart citations (temporary links to the source files).
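Phase 2 is where BullMQ earns its keep. Here is a minimal sketch of the ingestion worker, assuming a queue named `ingest` and hypothetical `downloadFromR2` and `embedAndStore` helpers; the chunker is a naive fixed-size splitter with overlap (production chunkers usually respect sentence boundaries):

```javascript
// Naive fixed-size chunker with overlap (pure, easy to test)
function chunkText(text, size = 1000, overlap = 200) {
  const chunks = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break;
  }
  return chunks;
}

// BullMQ worker: pulls jobs off the "ingest" queue and runs the pipeline.
// The SDK is loaded lazily so the pure helper above works without Redis.
async function startIngestWorker(connection) {
  const { Worker } = await import("bullmq");
  return new Worker(
    "ingest",
    async (job) => {
      const { fileKey, userId } = job.data;
      const text = await downloadFromR2(fileKey);   // assumed helper
      const chunks = chunkText(text);
      await embedAndStore(chunks, userId, fileKey); // assumed helper
    },
    { connection } // e.g. { host: "localhost", port: 6379 }
  );
}
```

Because the worker runs in its own process, a 200-page PDF can take a minute to embed without a single API request slowing down.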
Step 2: Zero-Cost Storage with Cloudflare R2
Traditional uploads stream data through your server. If 10 users upload 50MB files simultaneously, your server's memory usage spikes by 500MB and the process likely crashes.
The Solution: The Reservation Pattern
We issue a time-limited Presigned URL. The browser sends the file directly to Cloudflare.
```javascript
// Backend: generate the permission
const { signedUrl, fileKey, fileId } = await uploadService.generateSignedUrl(
  fileName,
  fileType,
  fileSize,
  isPublic,
  req.user
);
res.send({ signedUrl, fileKey, fileId });
```
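For reference, here is a sketch of what `generateSignedUrl` might look like under the hood, using the AWS SDK v3 against R2's S3-compatible endpoint. The key layout and the 10-minute expiry are assumptions, not requirements:

```javascript
import { randomUUID } from "node:crypto";

// Pure helper: namespace keys per user so listing and cleanup stay simple
function buildFileKey(userId, fileName) {
  const safeName = fileName.replace(/[^\w.\-]/g, "_");
  return `uploads/${userId}/${randomUUID()}-${safeName}`;
}

// Presign a PUT against R2 (S3-compatible). SDK modules are loaded lazily
// so the pure helper above works without @aws-sdk installed.
async function generateSignedUrl(fileName, fileType, fileSize, userId) {
  const { S3Client, PutObjectCommand } = await import("@aws-sdk/client-s3");
  const { getSignedUrl } = await import("@aws-sdk/s3-request-presigner");

  const s3 = new S3Client({
    region: "auto",
    endpoint: `https://${process.env.R2_ACCOUNT_ID}.r2.cloudflarestorage.com`,
    credentials: {
      accessKeyId: process.env.R2_ACCESS_KEY_ID,
      secretAccessKey: process.env.R2_SECRET_ACCESS_KEY,
    },
  });

  const fileKey = buildFileKey(userId, fileName);
  const command = new PutObjectCommand({
    Bucket: process.env.R2_BUCKET,
    Key: fileKey,
    ContentType: fileType,
    ContentLength: fileSize,
  });
  // 10 minutes: long enough to upload, short enough to limit abuse
  const signedUrl = await getSignedUrl(s3, command, { expiresIn: 600 });
  return { signedUrl, fileKey };
}
```

The browser then does a plain `PUT` to `signedUrl` with the file body, and your API only ever handles a few hundred bytes of JSON.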
Step 3: Contextual Query Rewriting
If a user asks "Who is the CEO of Tesla?" followed by "What about SpaceX?", a naive vector search for "What about SpaceX?" will fail because it lacks context.
We use Gemma 3 12B to rewrite follow-up questions into standalone queries, typically in ~200ms:
```javascript
// User: "What about SpaceX?"
// Gemma rewrites: "Who is the CEO of SpaceX?"
```
This ensures your vector search actually finds the right documents.
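A sketch of the rewrite step: the prompt builder is plain string assembly, while the model call assumes Google's `@google/genai` SDK and the `gemma-3-12b-it` model name, so treat both as placeholders for whatever serving setup you use:

```javascript
// Pure helper: fold chat history + follow-up into a standalone-question prompt
function buildRewritePrompt(history, question) {
  const transcript = history
    .map((m) => `${m.role}: ${m.content}`)
    .join("\n");
  return (
    "Rewrite the final user question as a standalone question, " +
    "using the conversation for context. Reply with the question only.\n\n" +
    `${transcript}\nuser: ${question}`
  );
}

// Model call (assumed SDK and model name; loaded lazily)
async function rewriteQuery(history, question) {
  const { GoogleGenAI } = await import("@google/genai");
  const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
  const res = await ai.models.generateContent({
    model: "gemma-3-12b-it",
    contents: buildRewritePrompt(history, question),
  });
  return res.text.trim();
}
```

The rewritten query, not the raw user input, is what gets embedded and searched.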
Step 4: Hybrid Search with Row-Level Security
Multi-tenancy is the biggest hurdle in RAG. You can't let User A see User B's documents. Instead of filtering in JavaScript (which is slow and buggy), we do it in SQL:
```sql
SELECT d.content, d.metadata, f."originalName",
       (d.embedding <=> ${vectorQuery}::vector) AS distance
FROM "Document" d
LEFT JOIN "File" f ON d."fileId" = f.id
WHERE (d."userId" = ${userId} OR f."isPublic" = true)
ORDER BY distance ASC LIMIT 5;
```
This pushes the security check into the database query itself, so a forgotten filter in application code can't leak another user's documents.
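In application code, that query might be executed like this with Prisma's `$queryRaw` tagged template (the `toVectorLiteral` helper serializes the embedding into pgvector's `[x,y,z]` text format; `embedText` is an assumed helper returning a `number[]`):

```javascript
// Pure helper: pgvector accepts a "[0.1,0.2,...]" text literal
function toVectorLiteral(embedding) {
  return `[${embedding.join(",")}]`;
}

async function searchDocuments(prisma, userId, queryText) {
  const embedding = await embedText(queryText); // assumed helper -> number[]
  const vectorQuery = toVectorLiteral(embedding);
  // Prisma parameterizes every ${...}, so userId cannot be injected
  return prisma.$queryRaw`
    SELECT d.content, d.metadata, f."originalName",
           (d.embedding <=> ${vectorQuery}::vector) AS distance
    FROM "Document" d
    LEFT JOIN "File" f ON d."fileId" = f.id
    WHERE (d."userId" = ${userId} OR f."isPublic" = true)
    ORDER BY distance ASC
    LIMIT 5`;
}
```

The `<=>` operator is pgvector's cosine distance, so the smallest `distance` is the closest match.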
Step 5: Visual RAG - Understanding Images
Traditional RAG is text-only. If you upload a receipt, most systems fail. We use Gemini Vision to describe the image in detail, then embed that description.
| Input | Gemini Vision Output |
|---|---|
| Photo of coffee receipt | "Starbucks receipt, Jan 15, 2026. Grande Latte $5.45. Paid with Visa..." |
Now, when you search "How much did I spend at Starbucks?", the system finds the image because of its semantic description.
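The ingestion worker can branch on MIME type: images get described by Gemini before embedding. A sketch, again assuming the `@google/genai` SDK; the prompt wording and the part-building helper are our own choices, not a fixed API:

```javascript
// Pure helper: wrap base64 image bytes as an inline part for the Gemini API
function toImagePart(base64Data, mimeType) {
  return { inlineData: { mimeType, data: base64Data } };
}

// Describe an image so its text form can be embedded like any other chunk
async function describeImage(base64Data, mimeType) {
  const { GoogleGenAI } = await import("@google/genai");
  const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
  const res = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents: [
      toImagePart(base64Data, mimeType),
      {
        text:
          "Describe this image in detail, including all visible text, " +
          "amounts, dates, and named entities.",
      },
    ],
  });
  return res.text; // this description is what gets chunked and embedded
}
```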
Conclusion: Build vs Buy
Commercial RAG solutions can cost $1,900+/year. By building this architecture, you save that money while gaining skills in:
- Distributed systems (BullMQ)
- Vector Database optimization (pgvector)
- Cloud Security (Presigned URLs)
Want the Full Source Code?
If you want to save 40+ hours of setup, I've packaged this entire production-ready architecture into the Node.js Enterprise Launchpad.
- Standard Price: $20
- Launch Special: $4 (80% OFF)
It includes the RAG pipeline, Auth, RBAC, Socket.io, and Docker configurations.