DEV Community: Sreeraj Sreenivasan

Demystifying Tokens: How AI Actually Reads Your Code and Prompts

Sreeraj Sreenivasan — Sun, 24 May 2026 12:00:44 +0000

If you’ve been building with Large Language Models (LLMs), integrating APIs, or just messing around with prompt engineering, you’ve hit the word token a million times.

You know it’s the unit you get billed for. You know it’s the thing that fills up your "context window." But how does it actually work under the hood?

If you think LLMs read text word-by-word like humans, or character-by-character like traditional code compilers, think again. Let's pull back the curtain on tokenization and see what’s really going on when you hit "Send."

What Exactly is a Token?

To an AI, a token is the fundamental building block of language.

LLMs don't understand English, Python, or JavaScript directly. Instead, they run raw text through a processing step called tokenization, which chops strings into smaller pieces. A token can be a single character, a part of a word (sub-word), an entire word, or even punctuation and trailing spaces.

Here is a quick rule of thumb for English text:

1 token ≈ 4 characters
1 token ≈ 0.75 words
100 words ≈ 130–140 tokens

But things get weird when you look closer. Let's see how an AI tokenizer actually splits a sentence.

Tokenization in Action

Take a simple sentence like: "Learning AI is fun!"

A typical LLM tokenizer (like OpenAI's cl100k_base used for GPT-4) won't see four distinct words. It breaks them down like this:

Fragment	Token Type	Reason
Learn	Sub-word	The root root of the word
ing	Suffix	Common sub-word ending
AI	Space + Word	The space before a word is grouped with it
is	Space + Word	Grouped together to save space
fun	Space + Word	Grouped together
!	Punctuation	Standard punctuation gets its own token

A 4-word sentence instantly becomes 6 tokens.

The Developer's Gotcha: White Space and Code

Because spaces are often baked into the tokens themselves, formatting matters immensely. In programming languages like Python—where indentation defines scope—tabbing or spacing drastically increases your token count.

# This code block uses more tokens than you think because 
# indentation spaces are processed as distinct token fragments.
def hello_world():
    print("Hello, World!")

Why Don't We Just Use Whole Words?

It seems like an extra step, so why do AI researchers rely on sub-word tokenization instead of a massive dictionary of whole words?

1. The "Out of Vocabulary" (OOV) Problem

If an LLM only recognized whole words, what happens when a user types a typo, a brand new framework name, or internet slang (like rizz)? The model would break down. By using sub-words (like breaking ungettable into un + get + table), the AI can dynamically deduce the meaning of words it has never seen before.

2. Computational Efficiency

The English language has millions of words. Teaching an AI a unique mathematical identity for every single word—plus all its tenses and plural forms—would make the model's architecture massive and sluggish. By using a fixed vocabulary of roughly 50,000 to 100,000 sub-word tokens, the AI can assemble literally any word in existence, acting like a bucket of Lego bricks.

3. Turning Text into Vectors

Computers only process numbers. Tokenization is the bridge. Once text is split into tokens, each unique token is mapped to a specific integer ID.

Learn might be ID 4321
ing might be ID 128

These IDs are then converted into high-dimensional vectors (embeddings) so the LLM can run complex matrix multiplication to predict the next logical token.

The Context Window Budget

Every LLM has a Context Window (e.g., 8k, 32k, or even 1M+ tokens). Think of this as the model's short-term working memory. When you text a chatbot, the entire history of your conversation is bundled up and sent back to the API with every single new prompt. If your conversation history hits 4,000 tokens and the model's limit is 4,000, it cannot generate another word without "forgetting" the very first token at the top of the chat.

As developers, managing this budget is critical. Techniques like vector databases (RAG), text summarization, and aggressive trimming of system prompts are entirely about keeping token costs low and preventing your application from hitting memory ceilings.

Want to Test It Yourself?

If you are writing backend code or optimizing prompts, don't guess your token counts. You can experiment with official tokenizer tools to see exactly how your text is being sliced:

OpenAI Tokenizer: An interactive web tool showing how text translates to token IDs.
Tiktoken (Python): A fast BPE tokenizer library you can integrate into your Python backends to count tokens locally before hitting an API.

import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")
tokens = encoding.encode("Learning AI is fun!")
print(f"Token Count: {len(tokens)}")  # Outputs: 6

Understanding tokens is the first step toward writing more cost-efficient prompts, building better AI apps, and understanding why models behave the way they do.

Over to you: Have you run into any weird bugs or massive cloud bills because of unexpected token usage? Let's talk about it in the comments below!

Your Guide to Vibe Coding with a Local LLM

Sreeraj Sreenivasan — Mon, 18 May 2026 18:47:39 +0000

No API costs. No rate limits. No privacy concerns. Just you, your machine, and a model that thinks at the speed of flow. A complete setup guide for local AI-powered coding.

No API costs. No rate limits. No privacy concerns. Just you, your machine, and a model that thinks at the speed of flow.

The Problem with Cloud AI for Coding

You're deep in a coding session. You're in the zone. Then your AI assistant hits a rate limit, lags for 4 seconds, or you suddenly remember you just pasted a proprietary database schema into a third-party API.

Cloud-based LLMs are incredible — but for vibe coding, that fluid, almost meditative state of rapid prototyping and iterative thinking, they're not always the right tool. Latency breaks flow. Rate limits kill momentum. Privacy is a legitimate concern for professional codebases.

The solution? Run the model locally. This guide sets up your machine as a fully self-contained AI coding environment, for free, forever.

01 — Choosing Your Runner: Why Ollama Wins

Your "runner" is the software that loads model weights and serves them via a local API. The three main contenders are Ollama, LM Studio, and llama.cpp.

Runner	Best for	Tradeoff
Ollama	Integration, automation, IDE plugins	Minimal GUI
LM Studio	Discovering and testing models visually	Heavier, less scriptable
llama.cpp	Maximum performance tuning	Requires more configuration

For vibe coding, Ollama wins. It exposes an OpenAI-compatible API at localhost:11434, which means every IDE plugin and chat UI that supports OpenAI can point straight at your local model — zero code changes required. It installs in one command and runs silently in the background.

02 — The Brain: Best Open-Weights Coding Models

Model choice depends on your hardware. Here's the current state-of-the-art landscape for coding:

Model	Size	Best for	Min VRAM	Speed
Qwen2.5-Coder	7B	Autocomplete, quick edits	8GB	⚡ Fast
DeepSeek-Coder-V2	16B	Architecture, debugging	12GB	⚖️ Balanced
Qwen2.5-Coder	32B	Complex reasoning, refactoring	24GB	🧠 Deep

For most developers on 16–32GB unified memory (Apple Silicon) or a mid-range NVIDIA GPU, DeepSeek-Coder-V2 16B hits the sweet spot — fast enough for conversational flow, smart enough for non-trivial problems.

💡 Apple Silicon tip: Unified memory is a superpower here. A MacBook Pro M3 Max with 64GB can run a 32B model entirely in memory with impressive throughput. No discrete GPU needed.

03 — The Interface: Your Vibe Coding Cockpit

The model running in the background is just the engine. You need a cockpit. Here are the three layers:

Continue.dev (VS Code / JetBrains)

The best open-source AI coding assistant for local LLMs. Inline autocomplete, a chat sidebar, slash commands, and full Ollama support out of the box. This is your primary coding interface.

Open WebUI

A self-hosted, ChatGPT-like web interface that connects to Ollama. Perfect for longer architecture brainstorming sessions, explaining complex problems, or rubber-ducking system design — without leaving your local environment.

Aider (CLI)

A terminal-based AI pair programmer that edits your actual files and is commit-aware. Exceptional for bulk refactoring, large-scale changes across multiple files, and keeping a clean git history of AI-assisted edits.

Recommended combo: Ollama in the background → Continue.dev in VS Code for in-editor flow → Open WebUI in a browser tab for architecture chats.

04 — Step-by-Step Setup Checklist

Step 1 — Install Ollama

Visit ollama.com or run:

# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows: download the installer from ollama.com

Ollama runs as a background service on port 11434.

Step 2 — Pull your first model

# Fast and lightweight (good starting point)
ollama pull qwen2.5-coder:7b

# Balanced power and speed (recommended for most setups)
ollama pull deepseek-coder-v2:16b

# Maximum capability (requires 24GB+ VRAM or unified memory)
ollama pull qwen2.5-coder:32b

Step 3 — Test the model

ollama run qwen2.5-coder:7b
# Type a prompt. If you get a response, your runner is working.

Step 4 — Install Continue.dev in VS Code

Open VS Code → Extensions (Cmd+Shift+X) → search "Continue" → Install.

Continue will auto-detect your running Ollama instance.

Step 5 — Configure Continue

Open ~/.continue/config.json and add your model:

{
  "models": [
    {
      "title": "DeepSeek Coder",
      "provider": "ollama",
      "model": "deepseek-coder-v2:16b",
      "apiBase": "http://localhost:11434"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen2.5 Coder 7B",
    "provider": "ollama",
    "model": "qwen2.5-coder:7b",
    "apiBase": "http://localhost:11434"
  }
}

Restart VS Code and hit Cmd+L (Mac) / Ctrl+L (Windows/Linux) to open the chat.

Step 6 — Install Open WebUI (optional, requires Docker)

docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  ghcr.io/open-webui/open-webui:main

Visit http://localhost:3000 and connect it to your Ollama instance.

Step 7 — Tune for speed

# Maximize GPU offloading (set in your shell profile)
export OLLAMA_NUM_GPU_LAYERS=-1

# Enable flash attention for faster inference (supported hardware)
export OLLAMA_FLASH_ATTENTION=1

On Apple Silicon, GPU offloading is automatic — no configuration needed.

Step 8 — Start vibe coding

Open a project in VS Code. Hit Cmd+L to open Continue. Ask it anything about your codebase. Feel the flow.

05 — Pro Tips for Maximum Performance

Use quantized models. A Q4_K_M quantized 14B model often runs faster than a Q8 7B model with comparable quality. You can specify the quantization level explicitly:

ollama pull qwen2.5-coder:14b-instruct-q4_K_M

Keep context windows tight. Shorter context = faster generation. In Continue, set "contextLength": 8192 unless you genuinely need more. Feeding 128K tokens to every autocomplete request will kill your latency.

Use a dedicated model per task. A small 3B model for tab-completion, a 16B model for chat. Continue supports multiple model configs and you can switch with a keyboard shortcut — this is one of its best features.

Pre-warm your model. On first load, models take a few seconds to initialize. Send a dummy request when your machine starts up to keep the model warm in memory.

The Vibe Is Yours to Own

Once this stack is running, you have a private, unlimited, cost-free AI coding environment that runs entirely on your hardware. No subscriptions. No outages. No one reading your code.

The future of AI-assisted development isn't just in the cloud — it's sitting on your desk, ready to go offline.

The Evolution of AI Coding Styles: From Syntax Warriors to Intent Architects

Sreeraj Sreenivasan — Sun, 17 May 2026 13:04:40 +0000

The way we write code is undergoing a seismic shift. For decades, developers were defined by their mastery of syntax, their ability to debug obscure errors at 2 AM, and their encyclopedic knowledge of standard libraries. Today, AI has fundamentally rewritten the rules of the game.

We're transitioning from a syntax-first era — where writing code line-by-line was the job — to an intent-first era, where expressing what you want to build matters more than remembering how to build it.

This isn't about AI replacing developers. It's about the evolution of coding styles — from autocomplete assistants to autonomous agent swarms — and what that means for how we think, architect, and ship software.

Let's break down the five distinct paradigms of modern AI-assisted development, how they work, and what they demand from developers.

1. Inline Copiloting: The Navigator in Your Editor

What It Is

Inline copiloting is the most familiar AI coding style. Tools like GitHub Copilot, Tabnine, and Amazon CodeWhisperer sit inside your IDE and provide real-time, context-aware code suggestions as you type.

Think of it as pair programming with an AI. You're still the pilot — you write the function signature, name the variables, define the logic — but the AI acts as a highly competent navigator that fills in the repetitive, predictable parts.

How It Works

You type a comment: // fetch user data from API
Copilot suggests the entire function body: API call, error handling, JSON parsing
You accept, reject, or modify the suggestion
You stay in control of architecture, flow, and edge cases

Developer Role: The Pilot

You're driving. The AI is suggesting the next turn, but you decide the route, the destination, and when to override.

Strengths

Speed: Eliminates boilerplate, common patterns, and repetitive loops
Context-aware: Reads your existing code and adapts suggestions
Low friction: Feels like enhanced autocomplete, not a context switch

Weaknesses

No architectural thinking: Copilot won't design your system
Passive assistance: You still write most of the code manually
Quality variance: Suggestions range from brilliant to buggy

Best For

Writing tests, boilerplate, utility functions
Exploring unfamiliar libraries or languages
Developers who want AI help without changing their workflow

2. Prompt Engineering: Code as Context, Prompts as Instructions

What It Is

Prompt engineering treats the AI like a highly skilled contractor. Instead of typing code line-by-line, you write a structured, precise prompt that acts like a specification document. The AI generates the implementation, and you review, refine, and integrate.

This isn't casual ChatGPT usage. It's context-rich, constraint-heavy, version-controlled prompting where the quality of your output is directly proportional to the quality of your prompt.

How It Works

You are a senior backend engineer specializing in FastAPI and async SQLAlchemy.

Task: Build a REST API endpoint for user authentication with the following requirements:
- POST /auth/login
- Accept email and password
- Validate input using Pydantic v2
- Query Users table using async SQLAlchemy
- Hash passwords with bcrypt
- Return JWT token on success
- Return 401 on invalid credentials
- Include error handling for database timeouts

Style: Clean, production-ready, type-hinted Python 3.11+
Return: Only the FastAPI route function and dependencies

The AI generates the code. You review, test, and integrate.

Developer Role: The Analytical Architect

You're not writing code — you're writing instructions for code. Your job is to define constraints, edge cases, design patterns, and quality criteria with surgical precision.

Strengths

High-quality output: Well-structured prompts produce production-grade code
Reusability: Save prompts as templates for similar tasks
Iterative refinement: Debug the prompt, not just the code

Weaknesses

Prompt fragility: Small wording changes can drastically alter output
No execution: The AI doesn't run, test, or debug the code
Context limits: Large codebases require careful chunking

Best For

Generating components, schemas, services, or modules from scratch
Refactoring existing code with specific constraints
Developers comfortable with specification-driven development

3. Vibe Coding: Intent-Driven Development

What It Is

Vibe coding is the most radical departure from traditional development. Instead of writing code or prompts, you describe what you want in natural language or voice, and the AI autonomously builds, debugs, runs, and iterates until it works.

Tools like Cursor, Replit Agent, v0 by Vercel, and bolt.new are purpose-built for this style. You act as a director or product manager, and the AI is the development team.

How It Works

You say (or type):

"Build a React dashboard with a sidebar, a table showing user data from /api/users, and a search filter. Use Tailwind for styling. Make it responsive."

The AI:

Scaffolds the React components
Fetches data from the API
Applies Tailwind classes
Runs the dev server
Debugs errors autonomously
Shows you a working preview

You review, request changes, and the AI iterates.

Developer Role: The Director

You're not coding. You're managing outcomes. You define the goal, provide feedback, and steer the direction. The AI handles implementation, package installation, and debugging.

Strengths

Fastest prototype-to-product loop: Go from idea to working app in minutes
No syntax barriers: Accessible to non-developers or those learning new stacks
Autonomous debugging: AI fixes its own errors and retries

Weaknesses

Loss of control: You don't see every line being written
Black-box risk: Hard to debug when the AI gets stuck
Quality ceiling: Works brilliantly for prototypes, struggles with complex architecture

Best For

Rapid prototyping, MVPs, side projects
Learning new frameworks by observing AI's approach
Developers who want to ship fast and iterate faster

4. Agentic Orchestration: Multi-Agent Swarms

What It Is

Agentic orchestration is the next frontier. Instead of a single AI assistant, you deploy multiple specialized AI agents that collaborate autonomously. Each agent has a distinct role — PM Agent, Dev Agent, QA Agent, DevOps Agent — and they communicate, divide tasks, and execute in parallel.

Tools like AutoGPT, MetaGPT, CrewAI, and LangGraph enable this workflow.

How It Works

You define a high-level goal:

"Build a SaaS app for invoice generation with user authentication, PDF export, and Stripe integration."

The orchestration layer deploys:

PM Agent: Breaks down requirements, defines user stories
Dev Agent: Writes backend (FastAPI), frontend (React), database schema
QA Agent: Writes tests, runs them, reports failures
DevOps Agent: Dockerizes the app, sets up CI/CD

The agents execute autonomously, passing context between each other. You monitor progress and intervene only when needed.

Developer Role: The System Overseer

You're not a coder. You're a systems orchestrator. Your job is to define the goal, configure the agent swarm, review outputs, and handle exceptions.

Strengths

Massive parallelization: Agents work simultaneously on different parts of the system
Separation of concerns: Each agent is optimized for its domain
End-to-end automation: From requirements to deployment

Weaknesses

Complexity: Orchestrating agents requires deep architectural knowledge
Coordination failures: Agents can conflict or duplicate work
Cost: Running multiple agents simultaneously is expensive

Best For

Large, complex projects with well-defined requirements
Teams exploring fully autonomous development pipelines
Developers who want to scale their impact exponentially

5. Forensic / Remedial Coding: Refactoring the Past

What It Is

Forensic coding is the AI-assisted art of analyzing, refactoring, and modernizing legacy, broken, or inefficient codebases. This is the opposite of greenfield development — it's archaeology, surgery, and translation all at once.

AI excels at reading decades-old COBOL, mapping dependencies in spaghetti code, identifying vulnerabilities, and translating legacy systems into modern languages.

How It Works

You feed the AI a legacy codebase:

"This is a 10,000-line COBOL program for payroll processing. Map all dependencies, identify security vulnerabilities, and generate a Python equivalent using modern best practices."

The AI:

Parses the COBOL syntax
Maps data flows and business logic
Flags SQL injection risks, buffer overflows, hardcoded credentials
Generates a Python/FastAPI equivalent with type hints, async support, and tests

Developer Role: The Code Archaeologist

You're not building new features — you're rescuing, refactoring, and modernizing. Your job is to understand the original intent, validate the AI's translation, and ensure nothing breaks.

Strengths

Speed: Refactors in hours what would take weeks manually
Pattern recognition: AI spots anti-patterns humans miss
Cross-language translation: Converts COBOL → Python, PHP → Node.js, etc.

Weaknesses

Context gaps: AI may misinterpret obscure legacy logic
Risk: Automated refactoring can introduce subtle bugs
Validation burden: You must rigorously test the output

Best For

Migrating legacy systems to modern stacks
Security audits of old codebases
Developers maintaining or sunsetting legacy apps

The Great Shift: Syntax-First → Intent-First

Here's how the developer skillset is fundamentally changing:

Old Coding Era (Syntax-First)	Modern AI Coding Era (Intent-First)
Memorizing syntax and standard libraries	Knowing which AI tool fits the task
Writing boilerplate from scratch	Reviewing and refining AI-generated code
Debugging line-by-line manually	Prompting AI to debug and explain errors
Googling Stack Overflow for solutions	Prompting AI with context and constraints
Deep expertise in 1-2 languages	Broad fluency across stacks via AI assistance
Lone-wolf coding sessions	Collaborating with AI agents and tools
Code quality = your skill ceiling	Code quality = your review + verification rigor
Speed = typing speed + recall	Speed = prompt quality + orchestration skill
Architecture in your head	Architecture in prompts, docs, and diagrams
Career defined by what you can build alone	Career defined by what you can build with AI

The Takeaway: Adapt Without Losing Your Edge

The intent-first era doesn't mean traditional coding skills are obsolete. It means they're being abstracted up the stack.

Here's how to thrive:

Master the fundamentals — AI accelerates execution, but it won't architect your system. You still need to understand data structures, algorithms, API design, and software patterns.
Learn to review, not just write — Your new superpower is critical code review. Can you spot the subtle bug in AI-generated code? Do you know when a suggestion is brilliant vs. dangerous?
Become a prompt engineer — Writing precise, constraint-rich prompts is a skill. Treat it like writing tests: specific, deterministic, and version-controlled.
Experiment with all paradigms — Inline copiloting for boilerplate, prompt engineering for components, vibe coding for prototypes, agentic orchestration for complex projects. Use the right tool for the job.
Build verification systems — AI moves fast. You need automated tests, type checkers, linters, and security scanners to catch what AI misses.
Stay curious, not defensive — The developers who resist AI will be left behind. The ones who integrate it strategically will 10x their impact.

The future of coding isn't about writing less code. It's about building better systems, faster, with higher-quality outputs, by orchestrating AI as a force multiplier.

The question isn't whether AI will change how you code.

The question is: How fast can you adapt your coding style to harness it?

What's your current AI coding style? Are you still in the syntax-first era, or have you made the leap to intent-first development? Drop a comment below — I'd love to hear how you're adapting!

The 2026 Developer's Guide to Zero-Cost Full-Stack Hosting: FastAPI, React, and PostgreSQL

Sreeraj Sreenivasan — Tue, 12 May 2026 12:55:20 +0000

From local dev to a production-ready public release — without spending a dollar.

Introduction

Hosting a full-stack application used to mean picking a server, paying a monthly bill, and hoping it didn't fall over at 3am. In 2026, that model is largely obsolete for solo developers and small teams.

The modern zero-cost stack — FastAPI on Render, React on Vercel, PostgreSQL on Neon — gives you serverless databases that scale to zero, edge-delivered frontends with sub-millisecond load times, and Git-integrated CI/CD that deploys on every push. All of it free, all of it production-grade, all of it the same infrastructure that startups run in production at scale.

To see this stack in action, you can visit mobitrendz.vercel.app, a full-stack FastAPI, PostgreSQL, React template I successfully deployed today for zero cost. Please sign up and try it.

But raw hosting is only half the story. The real unlock in 2026 is treating your OpenAPI schema as a living source of truth — a contract that keeps your FastAPI backend and React frontend permanently in sync, automatically, with type-safe generated clients that break the build if the contract drifts.

This guide walks through:

The "Contract-First" architecture that makes this stack production-ready
A detailed review of Vercel, Render, and Neon in their 2026 roles
An honest comparison against the alternatives
A practical deployment checklist you can run today

Let's ship.

Part 1: The "Source of Truth" Architecture

Hosting Is No Longer Just About Files

The old mental model of hosting was simple: put your HTML somewhere, point a domain at it, done. That model broke when applications became stateful, distributed, and AI-integrated.

In 2026, a production full-stack app has to answer harder questions:

Where does your data live relative to your users? Latency from a single-region server is now a measurable UX problem. Edge delivery isn't optional for global audiences.
How does your frontend know what the backend expects? Manual API documentation drifts. Types get out of sync. The frontend sends a field the backend renamed three sprints ago, and you find out from a user complaint.
How does your system behave under load spikes it didn't anticipate? Serverless databases that scale to zero (and back up) handle this elegantly. Fixed-resource servers don't.

The answer to all three is an architecture that treats type safety as infrastructure — not a developer preference, but a build constraint enforced in CI/CD.

The Contract-First Loop

The Contract-First loop is the architectural backbone of this stack. Here's how it works end to end:

┌─────────────────────────────────────────────────┐
│                   THE LOOP                       │
│                                                 │
│  FastAPI (Render)                               │
│  └── exposes /openapi.json                      │
│       └── triggers @hey-api/openapi-ts          │
│            └── generates typed React client     │
│                 └── build fails if schema drift │
│                      └── Vercel deploys only    │
│                           if types pass         │
└─────────────────────────────────────────────────┘

Step 1 — FastAPI as the Schema Authority

FastAPI generates an OpenAPI 3.1 schema automatically from your route decorators and Pydantic models. This isn't documentation you write — it's a machine-readable contract your code produces.

# FastAPI automatically exposes this at /openapi.json
from fastapi import FastAPI
from pydantic import BaseModel, EmailStr

app = FastAPI(
    title="MyApp API",
    version="1.0.0",
    # Explicitly version your schema for client generation stability
    openapi_version="3.1.0",
)

class UserCreate(BaseModel):
    email: EmailStr
    name: str
    role: str = "user"

class UserResponse(BaseModel):
    id: str
    email: EmailStr
    name: str
    role: str
    created_at: str

@app.post("/api/v1/users", response_model=UserResponse, status_code=201)
async def create_user(payload: UserCreate) -> UserResponse:
    ...

Step 2 — Auto-Generating the React Client

@hey-api/openapi-ts consumes your /openapi.json and generates a fully-typed TypeScript client — models, services, request/response types — directly from the schema.

# package.json script
"generate:api": "openapi-ts --input https://your-api.onrender.com/openapi.json --output src/api/generated --client axios"

This produces:

// src/api/generated/services/UsersService.ts (auto-generated — do not edit)
export class UsersService {
  static async createUser(data: UserCreate): Promise<UserResponse> {
    return request(OpenAPI, {
      method: 'POST',
      url: '/api/v1/users',
      body: data,
    });
  }
}

Step 3 — CI/CD as the Contract Enforcer

The loop closes in your CI pipeline. Before Vercel deploys, regenerate the client and run TypeScript's compiler as a type-checker. If the backend schema changed and the frontend code now references a field that no longer exists, tsc --noEmit fails the build.

# .github/workflows/frontend.yml
name: Frontend CI

on:
  push:
    branches: [main]
  pull_request:

jobs:
  type-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install dependencies
        run: npm ci

      - name: Regenerate API client from live schema
        run: npm run generate:api
        env:
          API_URL: ${{ secrets.RENDER_API_URL }}

      - name: TypeScript type check
        run: npx tsc --noEmit

      - name: Run tests
        run: npm test

If tsc --noEmit exits non-zero, the Vercel deployment never triggers. Your frontend cannot ship code that is type-incompatible with your backend. That's the contract.

Part 2: Provider Deep Dive

Vercel — The AI Cloud

Role in the stack: Frontend host, edge runtime, preview environments

Vercel's 2026 positioning is as an "AI Cloud" — a CDN-first platform where your application logic runs as close to the user as physically possible. For a React SPA backed by a FastAPI service, Vercel handles everything the browser touches.

Edge Delivery and Sub-Millisecond Load Times

Vercel's global edge network spans 100+ points of presence. When a user in Singapore requests your app, they're served from Singapore — not from a server in us-east-1. Static assets, cached responses, and edge functions all execute at the node closest to the request origin.

For a React app with code-split routes and optimised bundles, this means:

First Contentful Paint under 800ms globally
Time to Interactive under 1.5s on 4G connections
Automatic HTTP/3 and Brotli compression

Ephemeral Environments for Every Pull Request

Every pull request to your GitHub repository automatically gets a unique preview URL:

https://myapp-git-feature-auth-flow-yourteam.vercel.app

This is a fully functional deployment — not a mock. It connects to your real Neon database branch (more on this below), runs your real frontend code, and is shareable with stakeholders for review before merge.

When the PR closes, the environment tears itself down. No cleanup, no dangling resources, no cost.

Free Tier Highlights (2026):

100 GB bandwidth/month
Unlimited deployments
6,000 build minutes/month
Preview environments on every PR
Edge Functions with 500K invocations/month

The Constraint: Vercel is a frontend platform. Your FastAPI backend does not run on Vercel. API routes (/api/*) can be handled by Vercel Edge Functions for lightweight tasks (auth checks, redirects, header injection), but your primary FastAPI application lives on Render.

Render — The Application Host

Role in the stack: FastAPI runtime, background workers, cron jobs

Render is where your Python application actually runs. It takes a Git repository, detects your runtime, builds your Docker image or uses a managed environment, and deploys.

750 Free Instance Hours

Render's free tier provides 750 instance hours per month — enough for one always-on service, or several services that share the allocation. A single FastAPI service running continuously uses exactly 720 hours in a 30-day month, fitting within the free tier.

# render.yaml — Infrastructure as Code for Render
services:
  - type: web
    name: myapp-api
    env: python
    buildCommand: pip install -r requirements.txt
    startCommand: uvicorn app.main:app --host 0.0.0.0 --port $PORT
    envVars:
      - key: DATABASE_URL
        fromDatabase:
          name: myapp-db
          property: connectionString
      - key: SECRET_KEY
        generateValue: true
      - key: SENTRY_DSN
        sync: false  # Set manually in Render dashboard
    healthCheckPath: /health
    autoDeploy: true

Git-Integrated CI/CD

Push to main, Render builds and deploys. No additional CI configuration required for the basics. Every deploy shows build logs in real time, and failed deploys automatically roll back to the last successful build.

For more control, connect Render to a GitHub Actions workflow:

# Trigger Render deploy after backend tests pass
- name: Deploy to Render
  if: github.ref == 'refs/heads/main'
  run: |
    curl -X POST ${{ secrets.RENDER_DEPLOY_HOOK_URL }}

The Cold Start Reality

Free tier Render instances spin down after 15 minutes of inactivity. The first request after inactivity incurs a cold start — typically 10–30 seconds for a Python service. For a hobby project or internal tool this is acceptable. For a customer-facing API with SLA requirements, upgrade to a paid instance ($7/month) or use a cron job to ping the health endpoint every 10 minutes.

# app/routers/health.py
from fastapi import APIRouter

router = APIRouter()

@router.get("/health", tags=["system"])
async def health_check() -> dict:
    return {"status": "ok", "version": "1.0.0"}

Free Tier Highlights (2026):

750 instance hours/month
Automatic Git-to-deploy on push
Built-in TLS/SSL certificates
DDoS protection
Private networking between services

Neon — Serverless Postgres

Role in the stack: Primary database, branching for preview environments

Neon is PostgreSQL — fully compatible, no proprietary extensions required — running on a serverless architecture that separates storage from compute. When no queries are running, the compute scales to zero. When a query arrives, it spins back up in milliseconds.

The 3 GiB Free Tier

Neon's free tier includes 3 GiB of storage, which is substantial for most applications in early production. A users table with a million rows, JSON metadata, and indexes typically sits well under 500 MB.

More importantly, the serverless billing model means you never pay for idle time. A database that receives one query per hour costs the same as one that receives zero.

Database Branching for Preview Environments

This is Neon's killer feature for the zero-cost stack. Just as Vercel creates a preview environment for every PR, Neon can create a database branch — a copy-on-write snapshot of your schema and data that a preview environment can use safely.

# Using the Neon CLI in CI/CD
- name: Create Neon branch for PR
  run: |
    neon branches create \
      --project-id $NEON_PROJECT_ID \
      --name "preview/pr-${{ github.event.pull_request.number }}" \
      --parent main

The preview Vercel deployment connects to the preview Neon branch. Migrations tested in the preview environment never touch production data. When the PR merges, the branch is deleted automatically.

Connecting FastAPI to Neon

# app/core/database.py
from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession, async_sessionmaker
from app.core.config import settings

# Neon requires sslmode=require — always
engine = create_async_engine(
    settings.DATABASE_URL,
    pool_size=5,
    max_overflow=10,
    pool_pre_ping=True,  # Handles Neon's scale-to-zero reconnection
    connect_args={"ssl": "require"},
)

AsyncSessionLocal = async_sessionmaker(
    engine,
    class_=AsyncSession,
    expire_on_commit=False,
)

pool_pre_ping=True is critical. When Neon scales to zero and back, existing connections become stale. pool_pre_ping sends a lightweight SELECT 1 before each connection checkout, discarding stale connections transparently.

Free Tier Highlights (2026):

3 GiB storage
Scale to zero (no idle compute cost)
Database branching
Point-in-time restore (7 days)
Postgres 16 with pgvector support

Part 3: Stack Comparison — Zero-Cost vs. The Alternatives

Every architectural choice has trade-offs. Here's an honest comparison of the zero-cost stack against the primary alternatives developers choose in 2026.

	Zero-Cost Stack	Option A: PaaS	Option B: Hybrid Cloud	Option C: Budget VPS	Option D: Home Server
Providers	Vercel + Render + Neon	Render / Koyeb alone	Vercel + Neon	Hostinger / DigitalOcean	Self-hosted + Cloudflare
Monthly Cost	$0	$0–7	$0–20	$5–10	~$0 (electricity)
Best For	Side projects, MVPs, OSS templates	Rapid prototyping	Performance + scalability	Full control, no cold starts	Privacy, unlimited data
Cold Starts	Yes (Render free tier)	Yes (free tier)	No	No	No
Edge Delivery	✅ Vercel global CDN	❌ Single region	✅ Vercel global CDN	❌ Single region	⚠️ Via Cloudflare
Git-to-Deploy	✅ Render + Vercel	✅ Native	✅ Vercel	⚠️ Manual setup	❌ Manual
DB Branching	✅ Neon	❌	✅ Neon	❌	❌
Preview Envs	✅ Vercel	❌	✅ Vercel	❌	❌
Scale to Zero	✅ Neon + Render	✅	✅ Neon	❌	❌
Operational Overhead	Low	Very Low	Low	High	Very High
Production Viability	Medium-High	Medium	High	High	Medium

When to choose each:

Zero-Cost Stack — You're building an MVP, an open-source template, or a portfolio project. You want production-grade tooling without a credit card. Accept the Render cold start trade-off.

Option A — PaaS Only (Render/Koyeb) — You want the simplest possible deployment. One platform, one dashboard, one bill. Koyeb offers European region support, which matters for GDPR compliance.

Option B — Hybrid Cloud (Vercel + Neon) — You're scaling and performance is non-negotiable. You've outgrown Render's free tier and moved your backend to a paid Render instance or Railway. Vercel + Neon is the premium tier of this stack.

Option C — Budget VPS — You need consistent response times without cold starts, want root access, and don't mind setting up Nginx, systemd, and a deployment pipeline yourself. $6/month on DigitalOcean buys you a fully dedicated environment.

Option D — Home Linux Server — You're privacy-focused, running large datasets that would be expensive in the cloud, or experimenting with local AI models. Cloudflare Tunnels expose your local server to the internet without port-forwarding. The trade-off is reliability: your uptime depends on your home internet and hardware.

Part 4: The 2026 Deployment Checklist

✅ Secret Syncing — Never Leak Keys in Git

The cardinal rule: environment variables never touch your repository. Not even in .env.example with real values. Not even in a private repo.

The correct pattern:

# .env (local only — must be in .gitignore)
DATABASE_URL=postgresql+asyncpg://user:password@ep-xxx.neon.tech/mydb?sslmode=require
SECRET_KEY=your-local-dev-secret
SENTRY_DSN=https://xxx@sentry.io/xxx

# .env.example (committed to Git — dummy values only)
DATABASE_URL=postgresql+asyncpg://user:password@host/dbname?sslmode=require
SECRET_KEY=generate-with-openssl-rand-hex-32
SENTRY_DSN=https://your-dsn@sentry.io/your-project

Syncing between Render and Vercel:

Both Render and Vercel have environment variable dashboards. Set secrets there — never in code. For variables that both services need (like a shared JWT secret), set them independently in each dashboard.

For team environments, use a secrets manager:

# app/core/config.py
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # Pydantic-settings reads from environment variables automatically
    # On Render/Vercel: set in the dashboard
    # Locally: read from .env file
    DATABASE_URL: str
    SECRET_KEY: str
    SENTRY_DSN: str = ""
    ENVIRONMENT: str = "development"
    CORS_ORIGINS: list[str] = ["http://localhost:5173"]

    model_config = SettingsConfigDict(
        env_file=".env",
        env_file_encoding="utf-8",
        case_sensitive=True,
    )

settings = Settings()

Sharing the backend URL with the frontend:

# In Vercel dashboard — Environment Variables
VITE_API_URL=https://myapp-api.onrender.com

// src/api/client.ts
import { OpenAPI } from './generated';

OpenAPI.BASE = import.meta.env.VITE_API_URL ?? 'http://localhost:8000';

✅ Standardized Error Handling — The `detail` Key Contract

Your frontend should never show a user "Network Error" or "Request failed with status 422." Every error your API returns should carry a human-readable message the UI can display directly.

FastAPI's HTTPException does this via the detail key:

# Backend — app/services/user.py
from fastapi import HTTPException, status

async def create_user(payload: UserCreate, db: AsyncSession) -> User:
    existing = await user_repo.get_by_email(db, payload.email)
    if existing:
        raise HTTPException(
            status_code=status.HTTP_409_CONFLICT,
            detail="A user with this email already exists.",
            # This exact string reaches the React frontend
        )
    ...

FastAPI serialises this as:

{ "detail": "A user with this email already exists." }

Catching it universally in React:

// src/api/interceptors.ts
import { client } from './generated';
import toast from 'react-hot-toast';

client.interceptors.response.use(
  (response) => response,
  (error) => {
    const detail = error.response?.data?.detail;

    if (typeof detail === 'string') {
      // HTTPException with string message: "A user with this email already exists."
      toast.error(detail);
    } else if (Array.isArray(detail)) {
      // Pydantic validation error: array of field-level errors
      const messages = detail.map((e: { msg: string }) => e.msg).join(', ');
      toast.error(`Validation error: ${messages}`);
    } else if (error.response?.status === 429) {
      toast.error('Too many requests. Please wait a moment and try again.');
    } else {
      toast.error('Something went wrong. Please try again.');
    }

    return Promise.reject(error);
  }
);

This single interceptor handles:

409 — business logic conflicts with specific messages
422 — Pydantic validation failures with field-level detail
429 — rate limiting (via SlowAPI's custom handler)
401 / 403 — authentication and authorization failures
500 — unexpected server errors with a safe generic fallback

The user always sees a meaningful message. The frontend never parses raw status codes.

✅ Automated Type Checks — Break the Build on Schema Drift

This is the enforcement mechanism for the Contract-First loop. If the backend changes a field name, removes an endpoint, or alters a response model, the CI pipeline fails before anything ships to production.

Full CI pipeline for a Contract-First monorepo:

# .github/workflows/ci.yml
name: Full Stack CI

on:
  push:
    branches: [main]
  pull_request:

jobs:
  backend:
    name: Backend Tests
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_PASSWORD: test
          POSTGRES_DB: testdb
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: pip install -r requirements.txt
      - run: pytest --cov=app --cov-report=xml
        env:
          DATABASE_URL: postgresql+asyncpg://postgres:test@localhost/testdb
          SECRET_KEY: test-secret-key

  schema-export:
    name: Export OpenAPI Schema
    needs: backend
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: pip install -r requirements.txt
      - name: Export schema to file
        run: |
          python -c "
          import json
          from app.main import app
          schema = app.openapi()
          with open('openapi.json', 'w') as f:
              json.dump(schema, f, indent=2)
          "
      - uses: actions/upload-artifact@v4
        with:
          name: openapi-schema
          path: openapi.json

  frontend:
    name: Frontend Type Check
    needs: schema-export
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
        working-directory: frontend

      - uses: actions/download-artifact@v4
        with:
          name: openapi-schema
          path: frontend/

      - name: Generate API client from schema
        run: npm run generate:api -- --input openapi.json
        working-directory: frontend

      - name: TypeScript type check
        # This fails if any generated type is incompatible with existing frontend code
        run: npx tsc --noEmit
        working-directory: frontend

      - name: Run frontend tests
        run: npm test -- --run
        working-directory: frontend

What this pipeline enforces:

Backend tests must pass before schema export runs
Schema is exported directly from the FastAPI application — not fetched from a live URL — making it reproducible in CI
The exported schema regenerates the TypeScript client
tsc --noEmit validates that existing frontend code is compatible with the new client types
Only after all three jobs pass does Vercel's deployment trigger

If a backend developer renames user_id to id in UserResponse, step 4 fails with a TypeScript error pointing exactly to the frontend component that referenced user_id. The schema drift is caught before any user sees it.

Conclusion: From Local to Production-Ready

The zero-cost stack in 2026 is genuinely production-grade for a wide class of applications. What used to require a DevOps engineer, a cloud budget, and weeks of configuration now fits in a render.yaml, a GitHub Actions workflow, and a Vercel project.

But the real value isn't the hosting — it's the architecture around it.

The Contract-First loop means your frontend and backend evolve together, not independently. The standardised detail key means your users see meaningful error messages instead of raw HTTP codes. The CI/CD type check means schema drift gets caught in a pull request, not a production incident.

Your launch checklist:

[ ] render.yaml committed to the root of your repository
[ ] Environment variables set in Render and Vercel dashboards (never in Git)
[ ] VITE_API_URL pointing to your Render service URL
[ ] generate:api script in package.json pointing to your OpenAPI schema
[ ] GitHub Actions workflow running tsc --noEmit on every PR
[ ] pool_pre_ping=True in your SQLAlchemy engine for Neon reconnection
[ ] Custom 429 handler in SlowAPI returning {"detail": "..."} format
[ ] Sentry before_send hook capturing HTTPException.detail
[ ] Health endpoint at /health for Render uptime monitoring
[ ] .env in .gitignore, .env.example with dummy values committed

The gap between a local dev environment and a publicly releasable GitHub template is exactly this checklist. Run through it once, and you have a template every future project can start from.

Ship with confidence.

Tags: fastapi react postgres vercel render neon devops webdev python typescript

The Resilience & Observability Stack

Sreeraj Sreenivasan — Mon, 11 May 2026 11:45:12 +0000

Building Production-Ready FastAPI in 2026

Your API works. But is it production-ready?

Why This Matters

In 2026, "Contract-First" development means more than an OpenAPI spec. It means three implicit promises to every consumer:

Errors are predictable — every failure returns a structured, documented payload
Health is visible — logs, metrics, and traces tell a coherent story
The system self-heals — transient failures retry; abuse gets throttled

This article covers the two pillars that deliver on those promises:

Observability: Structlog + Prometheus + Sentry + Rich
Resilience: Tenacity + SlowAPI

Pillar 1: Enterprise Observability

Structlog — Centralized JSON Logging

Cloud environments need machine-readable logs. Structlog gives you JSON in production and human-friendly output locally — toggled by a single env var.

# app/core/logging.py
import structlog

def configure_logging() -> None:
    processors = [
        structlog.contextvars.merge_contextvars,
        structlog.stdlib.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.JSONRenderer()  # swap for ConsoleRenderer() locally
        if settings.ENVIRONMENT == "production"
        else structlog.dev.ConsoleRenderer(colors=True),
    ]
    structlog.configure(processors=processors, cache_logger_on_first_use=True)

In production, every log entry is a clean, indexable JSON object. In development, it's colourised and human-readable — no config changes required.

Rich — Beautiful Local Console Output

Install Rich tracebacks globally and your terminal shows full variable state at every frame of an exception — invaluable for debugging async SQLAlchemy sessions.

from rich.traceback import install as install_rich_traceback
install_rich_traceback(show_locals=True, width=120)

Instead of a wall of text, you get colour-coded output with file references and local variable values at the exact line that failed.

Prometheus — Metrics in Two Lines

from prometheus_fastapi_instrumentator import Instrumentator

Instrumentator().instrument(app).expose(app, endpoint="/metrics")

You get request counts, latency histograms, and in-flight connections out of the box — ready for Grafana dashboards and SLO alerting.

Sentry — Error Tracking That Talks to Your Frontend

The critical constraint most templates miss: Sentry by default drops HTTPException.detail — the exact string your React frontend reads to show users a meaningful message like "User already exists".

Fix it with a before_send hook:

from fastapi import HTTPException

def before_send(event: dict, hint: dict) -> dict | None:
    exc_info = hint.get("exc_info")
    if exc_info:
        _, exc_value, _ = exc_info
        if isinstance(exc_value, HTTPException):
            event.setdefault("extra", {})
            event["extra"]["http_exception_detail"] = exc_value.detail
            event["extra"]["status_code"] = exc_value.status_code
            event.setdefault("tags", {})["http_status"] = str(exc_value.status_code)
    return event

sentry_sdk.init(dsn=settings.SENTRY_DSN, before_send=before_send, ...)

Now every 409 Conflict in your Sentry dashboard shows exactly what the user saw. Filter by http_status:409 across your entire project instantly.

Pillar 2: Application Resilience

Tenacity — Retries for Transient Failures

Kubernetes rolling deploys, Aurora cold starts, and flaky network hops all introduce brief connectivity gaps. Without retries, those gaps become 500 errors. With Tenacity, they're invisible.

from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type
from sqlalchemy.exc import OperationalError, DisconnectionError

db_retry = retry(
    retry=retry_if_exception_type((OperationalError, DisconnectionError)),
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=0.5, max=4),
    reraise=True,  # Still raises after exhaustion — Sentry catches it with full context
)

Apply it as a decorator on any database or external HTTP call:

@db_retry
async def get_by_email(self, db: AsyncSession, email: str) -> User | None:
    result = await db.execute(select(User).where(User.email == email))
    return result.scalar_one_or_none()

reraise=True ensures exhausted retries still propagate normally through your exception handlers, keeping Structlog and Sentry integration intact.

SlowAPI — Rate Limiting That Respects Your OpenAPI Contract

The subtle problem with naive rate limiting: the 429 response becomes an undocumented payload that breaks your auto-generated frontend client.

The fix is a custom handler that returns {"detail": "..."} — identical to every other FastAPI error:

from fastapi.responses import JSONResponse
from slowapi.errors import RateLimitExceeded

async def rate_limit_handler(request: Request, exc: RateLimitExceeded) -> JSONResponse:
    return JSONResponse(
        status_code=429,
        content={"detail": f"Rate limit exceeded: {exc.detail}. Please slow down."},
        headers={"Retry-After": "60"},
    )

app.add_exception_handler(RateLimitExceeded, rate_limit_handler)

Apply per-route limits based on risk:

@router.post("/auth/login")
@limiter.limit("10/minute")   # Strict — brute-force bait
async def login(request: Request, payload: LoginRequest): ...

@router.post("/users/")
@limiter.limit("5/minute")    # Tight — prevent account creation spam
async def create_user(request: Request, payload: UserCreate): ...

The Frontend Connection

Every tool above converges on one payoff: your React client always reads the same key.

Scenario	Status	Response
User already exists	`409`	`{"detail": "User already exists"}`
Rate limit hit	`429`	`{"detail": "Rate limit exceeded..."}`
Validation failure	`422`	`{"detail": [...Pydantic errors]}`
Server error	`500`	`{"detail": "Internal server error"}`

One Axios interceptor handles all of them:

client.interceptors.response.use(
  (res) => res,
  (error) => {
    const message = error.response?.data?.detail ?? "An unexpected error occurred.";
    toast.error(typeof message === "string" ? message : JSON.stringify(message));
    return Promise.reject(error);
  }
);

No special-casing. No silent failures. One contract, end to end.

Conclusion

The gap between a demo API and a production API isn't features — it's operational maturity.

Tool	What it solves
`Structlog`	Structured logs for cloud aggregators
`Rich`	Developer-friendly local debugging
`Prometheus`	Latency metrics and SLO visibility
`Sentry + before_send`	Error tracking with frontend-aware payloads
`Tenacity`	Silent recovery from transient failures
`SlowAPI + custom handler`	Rate limiting that honours your OpenAPI contract

Together, these six tools ensure your FastAPI app behaves with consistency and transparency — whether on a single VPS, a Kubernetes cluster, or a globally distributed edge network.

Ship with confidence.

Tags: fastapi python observability sentry prometheus structlog tenacity slowapi backend

Stop Writing Code. Start Managing Agents. (A VSCode vs. Antigravity Story)

Sreeraj Sreenivasan — Wed, 06 May 2026 23:35:47 +0000

Coding in VSCode vs. Google Antigravity: A Developer's Honest Take

Two editors. Two philosophies. One very opinionated comparison.

So you've heard the buzz about Google Antigravity. Maybe you saw the announcement drop alongside Gemini 3 in November 2025 and thought, "Should I actually switch from my trusty VSCode setup?" I had the same thought. Then I spent a few weeks using both — seriously, back to back, on real projects — and here's what I found.

Spoiler: this isn't a simple "X is better" post. It's more complicated than that. And honestly, more interesting.

The Baseline: VSCode Is Still the GOAT of Familiarity

Let's be real — Visual Studio Code has earned its crown. After years of extensions, themes, keybindings, and deeply personal .settings.json files, VSCode feels like home. It's fast, deeply customizable, and the extension ecosystem is genuinely unmatched.

With GitHub Copilot, Copilot Chat, or even a self-hosted Ollama integration, VSCode has gotten really good at AI-assisted coding. Inline completions, chat sidebars, refactoring suggestions — it's all there.

But it still works the way IDEs have always worked: you write code, the AI suggests things, you accept or reject them. You're the pilot. The AI is your co-pilot who occasionally suggests a lane change.

That mental model is comfortable. Predictable. Controllable.

Enter Antigravity: Where the AI Stops Co-Piloting and Starts Flying

Google Antigravity landed in public preview on November 18, 2025 — and calling it "just another AI IDE" would be like calling a helicopter "just another car."

At its core, Antigravity is built around a radically different idea: what if the AI wasn't in the sidebar — but was actually doing the work?

It ships with two primary views:

Editor View — A familiar VS Code-style interface. Tab completions, inline commands, the extension support you're used to. This is where you code hands-on.
Manager View (Mission Control) — This is where things get wild. You describe a task at a high level, and Antigravity spins up a team of autonomous AI agents — a planner, executor agents, a reviewer — and you watch them work in parallel across your editor, terminal, and an embedded Chrome browser. Simultaneously.

Yes, it literally opens a browser, navigates your app, clicks around, and reports back with screenshots.

Head-to-Head: The Real Differences

🧠 Philosophy

	VSCode	Antigravity
You write code, AI assists	✅	✅ (Editor View)
AI writes code, you review	Via Copilot (basic)	✅ (Agent Mode — full)
Multi-agent parallel execution	❌	✅
Built-in browser automation	❌	✅

VSCode's philosophy: You are the developer. AI is a tool.

Antigravity's philosophy: AI is an autonomous developer. You are the manager.

This isn't just a feature difference — it's an entirely different way of thinking about your role.

⚙️ Workflow in Practice

In VSCode, a typical feature implementation looks like:

Open file
Type/describe what you want
Copilot suggests, you accept
Repeat until done
Manually test in terminal/browser

In Antigravity (Manager View):

Describe the feature in plain language
Agents generate a Plan Artifact — a structured implementation plan you can review
Executor agents write code, run terminal commands, and test in the browser
You receive Artifacts — screenshots, recordings, task logs — as verifiable proof of work
Leave comments on the Artifact (like Google Docs) to course-correct without stopping execution

The Artifact system is genuinely clever. Instead of scrolling through raw tool calls trying to figure out what the agent did, you get structured deliverables you can actually review.

🤖 Model Flexibility

VSCode (with Copilot) is largely locked to OpenAI/Microsoft models, though extensions give you some flexibility.

Antigravity gives you model choice out of the box: Gemini 3.1 Pro is the default, but you can also route tasks to Claude Sonnet 4.6, Claude Opus 4.6, or OpenAI models — even per-task if you want. This matters more than it sounds when you're dealing with tasks that different models handle differently.

💰 Pricing

VSCode is free. GitHub Copilot runs ~$10/month for individuals (more for teams).

Antigravity currently offers:

Free tier: Rate-limited, ~20 requests/day with Gemini 3 Flash — enough for light exploration
Pro: $20/month (bundled with Google AI Pro)
Ultra: $249.99/month for heavy agentic workloads

Fair warning: the free tier was very generous during preview, but early adopters reported significant quota tightening post-launch. The "work done" credit metric is opaque, so budget carefully before going all-in.

Where VSCode Still Wins

Let me be honest — VSCode isn't going anywhere for me soon, and here's why:

Stability. Antigravity is a November 2025 public preview. Agent loops get stuck. Multi-agent conflicts produce inconsistent output. Some VS Code extensions break. For production work on a real codebase with tight deadlines, that's a meaningful risk.

Control. When you want precision — a specific refactor, a focused bug fix, a carefully crafted function — VSCode + Copilot is faster and more predictable. You don't need to spin up an agent team to fix a typo.

Speed. For quick, tactical changes, the Editor View overhead of Antigravity's agent initialization can feel like overkill.

Ecosystem. The VSCode extension marketplace is still unmatched. Language servers, debuggers, linters, test runners — the depth is staggering.

Where Antigravity Actually Shines

But there are scenarios where Antigravity makes me feel like I unlocked a cheat code:

Greenfield projects. When I'm spinning up something new and want to go from idea to scaffolded, tested, running app fast — Antigravity is genuinely jaw-dropping. Describe the app, let the agents build it, watch the browser preview update in real time.

UI iteration. "Move the nav to the left, make the cards wider, add a loading state" — Antigravity handles visual feedback loops beautifully. The browser-integrated testing means the agent sees what you see.

Multi-step debugging. Ask it to find why a specific flow is broken. It reads code, runs the app, clicks through the bug, and reports back with a root cause analysis. That's hours of work delegated to minutes.

Complex refactors across many files. The multi-agent architecture can parallelize work that would require serious context management if you tried to do it yourself in Copilot.

The Real Question: What's Your Development Style?

Here's my honest framework for thinking about which to reach for:

Use VSCode when:

You're deep in production code that needs surgical precision
You want full control over every change
You're working in an extension-heavy environment
You need stability above all else

Use Antigravity when:

You're prototyping or building greenfield
You have a complex, multi-step task and want to delegate the execution
You want to experience where IDE tooling is heading
You're doing design-to-code work that benefits from visual browser feedback

Use both — which is genuinely what I do. VSCode as my daily driver for production code, Antigravity when I want to accelerate a specific feature or prototype.

The Bigger Picture

Antigravity isn't just a product launch. It's Google's bet on what software development looks like in 3–5 years — where your job isn't writing code line by line, but managing a team of AI agents that do it for you.

Whether that excites you or terrifies you probably says something about your relationship with coding. For me? Both, honestly.

The "agent-first" paradigm is still rough. It's still a preview. But it's also the most genuinely different thing I've used in years. VSCode + Copilot feels like evolution. Antigravity feels like a mutation — ungainly and strange and sometimes brilliant.

The developers who will thrive in the next few years are probably the ones who get comfortable managing agents and get their hands dirty in the editor. Not one or the other.

So: download Antigravity, play with it, break it a little. Keep your VSCode. Use them as complements, not competitors.

The future of coding isn't replacing you. It's changing what you spend your time on.

Have you tried Antigravity yet? I'd love to hear how it fits (or doesn't) into your workflow — drop a comment below. 👇

Tags: vscode googleantigravity ai productivity webdev

A Practical Project Structure for FastAPI Applications

Sreeraj Sreenivasan — Mon, 04 May 2026 03:39:24 +0000

A short guide to organizing FastAPI apps beyond a single main.py file.

FastAPI makes it easy to start with a single main.py file. That is great for demos, prototypes, and small APIs.

But once your application grows, one file can quickly turn into a mix of routes, database logic, security helpers, settings, and business rules. A clear project structure helps keep the app easier to understand, test, and extend.

Here is a practical FastAPI structure for growing backend applications:

.
├── app/
│   ├── api/
│   │   └── v1/
│   │       ├── endpoints/
│   │       └── router.py
│   ├── core/
│   ├── crud/
│   ├── db/
│   ├── models/
│   ├── services/
│   └── main.py
├── alembic/
├── docs/
├── scripts/
├── tests/
├── .env.example
├── alembic.ini
├── docker-compose.yaml
├── Dockerfile
├── pyproject.toml
└── README.md

app/main.py
This is the application entry point.

Use it to create the FastAPI() app, register routers, configure lifespan events, add middleware, and expose basic endpoints like /health.

Try not to put every route here. If main.py grows every time you add a feature, it is probably doing too much.

app/api/v1/
This folder contains your versioned API.

A common pattern is:

api/
└── v1/
    ├── endpoints/
    │   ├── login.py
    │   └── users.py
    └── router.py

Each file in endpoints/ handles one feature or resource. For example, users.py contains user-related routes.

The router.py file combines those endpoint routers, so main.py only needs to include one versioned router:

app.include_router(api_router, prefix="/api/v1")

Versioning early makes future changes easier.

app/core/
Use core/ for app-wide configuration and security code.

This usually includes:

Environment-based settings
Secret keys
JWT helpers
Password hashing
Authentication utilities
These concerns are used across the app, so keeping them separate avoids duplication.

app/models/
The models/ folder stores your data shapes.

Depending on your stack, this may include SQLModel models, Pydantic schemas, database table models, request models, and response models.

A useful rule: do not assume your database model should also be your API response model. For example, a user table may contain a hashed password, but your API response should not.

app/crud/
Use crud/ for reusable database operations.

Instead of writing database queries directly inside route handlers, keep them in focused functions like:

Create user
Get user by ID
Get user by email
Update user
Delete user
This keeps endpoints cleaner and database behavior easier to test.

app/db/
The db/ folder handles database setup.

It often contains the database engine, session helpers, connection logic, and initial seed data.

This keeps infrastructure concerns separate from API logic.

app/services/
Use services/ for business logic and integrations.

If a route needs to coordinate multiple steps, call an external API, send an email, or apply business rules, that logic usually belongs in a service.

Endpoints should describe the HTTP interface. Services should describe what the application actually does.

alembic/
alembic/ stores database migrations.

Models describe the current shape of your data. Migrations describe how the database changes over time.

For real applications, migrations are essential.

tests/
Your tests should be easy to find and understand.

One simple approach is to mirror your API structure:

tests/
└── api/
    └── v1/
        └── endpoints/
            ├── test_login.py
            └── test_users.py

Start by testing behavior that users and clients depend on: login, user creation, validation, permissions, and health checks.

Supporting files
Files like Dockerfile, docker-compose.yaml, .env.example, pyproject.toml, scripts/, and docs/ are part of a healthy backend project too.

They help with local development, dependency management, deployment, documentation, and onboarding.

Generated folders like .venv/, pycache/, .pytest_cache/, and build outputs should usually stay out of your source structure and be ignored by Git.

Final thoughts
A good FastAPI structure should make the next feature easier to add.

Keep routes thin, move business logic into services, keep database logic reusable, separate config from feature code, and organize tests around behavior.

You do not need a complex architecture on day one. But once your API starts growing, a clean structure gives your project room to breathe.

The Bulletproof FastAPI Stack

Sreeraj Sreenivasan — Sun, 26 Apr 2026 23:13:04 +0000

Building a FastAPI project is exciting—until the code grows, the types get messy, and security vulnerabilities creep in. In a world where Developer Experience (DX) is king, how do you keep your velocity high without sacrificing quality?

The answer is a modern defensive pipeline. Here’s why the combination of Ruff, Mypy, Bandit, and Prek is the ultimate power-up for your FastAPI backend.

1. Ruff: The Speed Demon

Gone are the days of waiting for Flake8 or Black. Ruff is a Rust-powered linting
behemoth. For FastAPI projects, it handles everything from sorting your imports
in main.py to catching unused variables in your route handlers—all in
milliseconds.

• Instant Feedback: Fixes code as you type.
• Unified Tooling: Replaces 5+ tools with one binary.

2. Mypy: The Type Safety Net

FastAPI relies on Python type hints to perform its magic. Mypy ensures those hints are actually correct. It validates that the data flowing from your schemas.py into your crud.py logic is exactly what you expect.

"Mypy catches those silent 'await' bugs in async routes that would
otherwise only surface as mysterious runtime errors in production."

3. Bandit: Your Security Sentinel

When you're building APIs that handle user data, security isn't optional. Bandit scans your code for common security pitfalls, like insecure password hashing or SQL injection risks, ensuring your FastAPI app stays protected from day one.

4. Prek: The Automated Gatekeeper

Why manually run checks when you can automate them? Prek hooks all these tools together. If a commit doesn't pass the Ruff linting or Mypy type checks, it doesn't get into the repo. It's the ultimate "quality firewall."

5. Debugpy: Precision Inspections

When an async request fails, you need more than just log statements. Debugpy allows you to pause time inside your FastAPI endpoints, inspecting Pydantic objects and database states with surgical precision.

Ready to level up your Python workflow? Start small, automate early, and let the tools do the heavy lifting.