Master modern AI architecture with Node.js, Gemini 2.5, and Cloudflare R2
Introduction: The Problem with "Toy" RAG Apps
Most RAG tutorials skip the hard parts that actually matter in production:
- No security model: Users can access each other's private data.
- Naive file handling: Large uploads crash your Node.js server.
- Expensive infra: AWS egress fees and managed vector DBs drain your wallet.
- Blocking operations: Processing files freezes your entire API.
We are going to solve all of these using a production-proven architecture.
The $0 Tech Stack
Every piece of this stack has a generous free tier:
- Cloudflare R2: S3-compatible storage with zero egress fees.
- Gemini 2.5 Flash: High-performance LLM with a free tier of 15 requests/minute.
- PostgreSQL + pgvector: Battle-tested database with native vector support.
- BullMQ: Redis-backed job queue to handle heavy processing in the background.
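Wiring these services together takes only a handful of environment variables. A minimal sketch (every variable name here is illustrative, not a convention any of these tools require):

```bash
# Illustrative names only; adapt to your own config loader
DATABASE_URL=postgresql://user:pass@localhost:5432/rag
REDIS_URL=redis://localhost:6379
R2_ACCOUNT_ID=<your Cloudflare account id>
R2_ACCESS_KEY_ID=<R2 API token key>
R2_SECRET_ACCESS_KEY=<R2 API token secret>
R2_BUCKET=rag-files
GEMINI_API_KEY=<Google AI Studio key>
```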
Step 1: Understanding the Architecture
We follow a 4-phase workflow designed for scale:
- Direct-to-Cloud Uploads: Browser uploads files directly to R2 using presigned URLs. Your server never touches the raw bytes, preventing memory crashes.
- Asynchronous Ingestion: A BullMQ worker handles the "heavy lifting" (downloading, chunking, and embedding) without blocking your API.
- Hybrid Retrieval: Every search query filters by user ID directly in PostgreSQL, so users only search their own data (plus public files).
- Contextual Generation: Gemini generates answers with smart citations (temporary links to the source files).
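Phase 2 is where BullMQ earns its keep. Here is a minimal sketch of the ingestion worker, assuming a queue named `ingest` and hypothetical `downloadFromR2` and `embedAndStore` helpers; the chunker is a naive fixed-size splitter with overlap (production chunkers usually respect sentence boundaries):

```javascript
// Naive fixed-size chunker with overlap (pure, easy to test)
function chunkText(text, size = 1000, overlap = 200) {
  const chunks = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break;
  }
  return chunks;
}

// BullMQ worker: pulls jobs off the "ingest" queue and runs the pipeline.
// The SDK is loaded lazily so the pure helper above works without Redis.
async function startIngestWorker(connection) {
  const { Worker } = await import("bullmq");
  return new Worker(
    "ingest",
    async (job) => {
      const { fileKey, userId } = job.data;
      const text = await downloadFromR2(fileKey);   // assumed helper
      const chunks = chunkText(text);
      await embedAndStore(chunks, userId, fileKey); // assumed helper
    },
    { connection } // e.g. { host: "localhost", port: 6379 }
  );
}
```

Because the worker runs in its own process, a 200-page PDF can take a minute to embed without a single API request slowing down.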
Step 2: Zero-Cost Storage with Cloudflare R2
Traditional uploads stream data through your server. If 10 users upload 50MB files simultaneously, your server's memory usage spikes by 500MB and the process likely crashes.
The Solution: The Reservation Pattern
We issue a time-limited Presigned URL. The browser sends the file directly to Cloudflare.
```javascript
// Backend: generate the permission
const { signedUrl, fileKey, fileId } = await uploadService.generateSignedUrl(
  fileName,
  fileType,
  fileSize,
  isPublic,
  req.user
);
res.send({ signedUrl, fileKey, fileId });
```
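For reference, here is a sketch of what `generateSignedUrl` might look like under the hood, using the AWS SDK v3 against R2's S3-compatible endpoint. The key layout and the 10-minute expiry are assumptions, not requirements:

```javascript
import { randomUUID } from "node:crypto";

// Pure helper: namespace keys per user so listing and cleanup stay simple
function buildFileKey(userId, fileName) {
  const safeName = fileName.replace(/[^\w.\-]/g, "_");
  return `uploads/${userId}/${randomUUID()}-${safeName}`;
}

// Presign a PUT against R2 (S3-compatible). SDK modules are loaded lazily
// so the pure helper above works without @aws-sdk installed.
async function generateSignedUrl(fileName, fileType, fileSize, userId) {
  const { S3Client, PutObjectCommand } = await import("@aws-sdk/client-s3");
  const { getSignedUrl } = await import("@aws-sdk/s3-request-presigner");

  const s3 = new S3Client({
    region: "auto",
    endpoint: `https://${process.env.R2_ACCOUNT_ID}.r2.cloudflarestorage.com`,
    credentials: {
      accessKeyId: process.env.R2_ACCESS_KEY_ID,
      secretAccessKey: process.env.R2_SECRET_ACCESS_KEY,
    },
  });

  const fileKey = buildFileKey(userId, fileName);
  const command = new PutObjectCommand({
    Bucket: process.env.R2_BUCKET,
    Key: fileKey,
    ContentType: fileType,
    ContentLength: fileSize,
  });
  // 10 minutes: long enough to upload, short enough to limit abuse
  const signedUrl = await getSignedUrl(s3, command, { expiresIn: 600 });
  return { signedUrl, fileKey };
}
```

The browser then does a plain `PUT` to `signedUrl` with the file body, and your API only ever handles a few hundred bytes of JSON.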
Step 3: Contextual Query Rewriting
If a user asks "Who is the CEO of Tesla?" followed by "What about SpaceX?", a naive vector search for "What about SpaceX?" will fail because it lacks context.
We use Gemma 3 12B to rewrite follow-up questions into standalone queries, typically in ~200ms:
```javascript
// User: "What about SpaceX?"
// Gemma rewrites: "Who is the CEO of SpaceX?"
```
This ensures your vector search actually finds the right documents.
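A sketch of the rewrite step: the prompt builder is plain string assembly, while the model call assumes Google's `@google/genai` SDK and the `gemma-3-12b-it` model name, so treat both as placeholders for whatever serving setup you use:

```javascript
// Pure helper: fold chat history + follow-up into a standalone-question prompt
function buildRewritePrompt(history, question) {
  const transcript = history
    .map((m) => `${m.role}: ${m.content}`)
    .join("\n");
  return (
    "Rewrite the final user question as a standalone question, " +
    "using the conversation for context. Reply with the question only.\n\n" +
    `${transcript}\nuser: ${question}`
  );
}

// Model call (assumed SDK and model name; loaded lazily)
async function rewriteQuery(history, question) {
  const { GoogleGenAI } = await import("@google/genai");
  const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
  const res = await ai.models.generateContent({
    model: "gemma-3-12b-it",
    contents: buildRewritePrompt(history, question),
  });
  return res.text.trim();
}
```

The rewritten query, not the raw user input, is what gets embedded and searched.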
Step 4: Hybrid Search with Row-Level Security
Multi-tenancy is the biggest hurdle in RAG. You can't let User A see User B's documents. Instead of filtering in JavaScript (which is slow and buggy), we do it in SQL:
```sql
SELECT d.content, d.metadata, f."originalName",
       (d.embedding <=> ${vectorQuery}::vector) AS distance
FROM "Document" d
LEFT JOIN "File" f ON d."fileId" = f.id
WHERE (d."userId" = ${userId} OR f."isPublic" = true)
ORDER BY distance ASC LIMIT 5;
```
This pushes the security check into the database query itself, so a forgotten filter in application code can't leak another user's documents.
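In application code, that query might be executed like this with Prisma's `$queryRaw` tagged template (the `toVectorLiteral` helper serializes the embedding into pgvector's `[x,y,z]` text format; `embedText` is an assumed helper returning a `number[]`):

```javascript
// Pure helper: pgvector accepts a "[0.1,0.2,...]" text literal
function toVectorLiteral(embedding) {
  return `[${embedding.join(",")}]`;
}

async function searchDocuments(prisma, userId, queryText) {
  const embedding = await embedText(queryText); // assumed helper -> number[]
  const vectorQuery = toVectorLiteral(embedding);
  // Prisma parameterizes every ${...}, so userId cannot be injected
  return prisma.$queryRaw`
    SELECT d.content, d.metadata, f."originalName",
           (d.embedding <=> ${vectorQuery}::vector) AS distance
    FROM "Document" d
    LEFT JOIN "File" f ON d."fileId" = f.id
    WHERE (d."userId" = ${userId} OR f."isPublic" = true)
    ORDER BY distance ASC
    LIMIT 5`;
}
```

The `<=>` operator is pgvector's cosine distance, so the smallest `distance` is the closest match.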
Step 5: Visual RAG - Understanding Images
Traditional RAG is text-only. If you upload a receipt, most systems fail. We use Gemini Vision to describe the image in detail, then embed that description.
| Input | Gemini Vision Output |
|---|---|
| Photo of coffee receipt | "Starbucks receipt, Jan 15, 2026. Grande Latte $5.45. Paid with Visa..." |
Now, when you search "How much did I spend at Starbucks?", the system finds the image because of its semantic description.
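The ingestion worker can branch on MIME type: images get described by Gemini before embedding. A sketch, again assuming the `@google/genai` SDK; the prompt wording and the part-building helper are our own choices, not a fixed API:

```javascript
// Pure helper: wrap base64 image bytes as an inline part for the Gemini API
function toImagePart(base64Data, mimeType) {
  return { inlineData: { mimeType, data: base64Data } };
}

// Describe an image so its text form can be embedded like any other chunk
async function describeImage(base64Data, mimeType) {
  const { GoogleGenAI } = await import("@google/genai");
  const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
  const res = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents: [
      toImagePart(base64Data, mimeType),
      {
        text:
          "Describe this image in detail, including all visible text, " +
          "amounts, dates, and named entities.",
      },
    ],
  });
  return res.text; // this description is what gets chunked and embedded
}
```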
Conclusion: Build vs Buy
Commercial RAG solutions can cost $1,900+/year. By building this architecture, you save that money while gaining skills in:
- Distributed systems (BullMQ)
- Vector Database optimization (pgvector)
- Cloud Security (Presigned URLs)
Want the Full Source Code?
If you want to save 40+ hours of setup, I've packaged this entire production-ready architecture into the Node.js Enterprise Launchpad.
- Standard Price: $20
- Launch Special: $4 (80% OFF)
It includes the RAG pipeline, Auth, RBAC, Socket.io, and Docker configurations.