Archit Mittal

Posted on Apr 19 • Originally published at architmittal.com

Week in AI Recap: 7 Shifts That Actually Matter for Indian Developers (April 2026)

#agents #ai #automation #news

I spend my weeks watching AI news for one reason: I automate workflows for Indian businesses, and the ground shifts under my feet every Monday. Most of what trended on X this week wasn't useful. A few things were. Here's the signal, filtered for developers in India who ship real work.

1. Agent frameworks are consolidating — pick one and move on

The "which agent framework" debate is finally cooling. Three winners keep showing up in production code reviews: the Claude Agent SDK for operator-style tasks, LangGraph for stateful pipelines, and plain function calling on top of OpenAI / Anthropic / Gemini APIs for the 80% of jobs that don't need a framework at all.

My take: if you're building internal ops tooling (GST reconciliation, vendor onboarding, invoice triage), skip the framework. A 60-line Python script with tool definitions gets you to production about six months ahead of most "agent platforms". Here's the shape I use:

import anthropic

client = anthropic.Anthropic()

TOOLS = [
    {
        "name": "fetch_gst_status",
        "description": "Fetch GST filing status for a GSTIN",
        "input_schema": {
            "type": "object",
            "properties": {"gstin": {"type": "string"}},
            "required": ["gstin"],
        },
    }
]

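def fetch_gst_status(gstin):
    # Placeholder: replace with your real GST lookup
    return f"GSTIN {gstin}: status lookup not implemented"

# Map tool names from TOOLS to the Python functions that implement them
HANDLERS = {"fetch_gst_status": fetch_gst_status}

def run(user_msg):
    messages = [{"role": "user", "content": user_msg}]
    while True:
        resp = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            tools=TOOLS,
            messages=messages,
        )
        if resp.stop_reason != "tool_use":
            # Final answer (or token limit hit): return the text blocks
            return "".join(b.text for b in resp.content if b.type == "text")
        # Echo the assistant turn back, then answer each tool_use block
        messages.append({"role": "assistant", "content": resp.content})
        results = [
            {
                "type": "tool_result",
                "tool_use_id": b.id,
                "content": str(HANDLERS[b.name](**b.input)),
            }
            for b in resp.content
            if b.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})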

Start here. Add complexity only when you can name the specific failure you're fixing.

2. Long context is cheap enough to abuse

Token prices for 200K+ context windows dropped again this week. For Indian teams dealing with PDF-heavy workflows — RERA filings, Income Tax orders, CBDT circulars — this changes the math. You no longer need RAG for a 300-page document. You paste it in, you ask the question, you're done — for roughly ₹3–5 per query on Sonnet-class models.

Does this kill RAG? No. For a 50,000-document corpus it's still essential. But for the common "make sense of this one fat PDF" case, vector databases have been over-engineered for two years.
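If you want that decision in code rather than in your head, a rough token estimate is enough to route between the two. This is a sketch, not a rule: the 4-characters-per-token ratio, the 200,000-token limit, the reserve, and both function names are my assumptions, so swap in your provider's real token counter and limits.

```python
# Rough heuristic (assumption): about 4 characters per token for English text.
# Use your provider's token counter when you need real numbers.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Cheap estimate of the token count for a piece of text."""
    return len(text) // CHARS_PER_TOKEN + 1


def choose_strategy(docs: list[str], context_limit: int = 200_000,
                    reserve: int = 8_000) -> str:
    """Return "long_context" if every document fits in one prompt, else "rag".

    `reserve` leaves room for the question and the model's answer.
    """
    total = sum(estimate_tokens(d) for d in docs)
    return "long_context" if total <= context_limit - reserve else "rag"
```

One fat PDF's extracted text usually lands on the "long_context" side; a large corpus lands on "rag" almost immediately.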

3. Voice agents stopped being demos

Real production deployments I saw this week: a Pune-based clinic chain runs outbound appointment reminders in Marathi and Hindi on a voice stack that costs under ₹2/minute, and an edtech in Bangalore uses real-time voice for spoken English practice, with latency under 400 ms, good enough that kids don't game it.

The stack that shows up most often: Deepgram or Sarvam for ASR (Sarvam wins on Indic languages by a wide margin), a small LLM for routing, ElevenLabs or Bhashini for TTS. If you've been waiting for "voice AI is ready" — it's ready. Not for every use case, but the ones that work, work well.
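The three stages above compose as a plain function pipeline. The sketch below shows only the shape: `handle_turn` and the `fake_*` stages are hypothetical names, and none of this is a real Deepgram, Sarvam, ElevenLabs, or Bhashini SDK call. In practice each callable would wrap whichever vendor client you pick.

```python
from typing import Callable


def handle_turn(
    audio: bytes,
    asr: Callable[[bytes], str],
    route: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> bytes:
    """Run one caller turn: speech -> text -> reply text -> speech."""
    transcript = asr(audio)       # speech recognition stage
    reply = route(transcript)     # small LLM (or rules) picks the reply
    return tts(reply)             # speech synthesis stage


# Stand-in stages so the sketch runs without any vendor SDK.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_route(text: str) -> str:
    return "confirm" if "yes" in text.lower() else "reschedule"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")
```

Keeping the stages injectable means you can swap the ASR vendor for Indic languages without touching the routing logic, and test the whole turn offline.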

4. Coding agents moved from autocomplete to delegation

The shift I watched this week in team Slacks: developers stopped saying "Copilot suggested" and started saying "I gave the agent the ticket." The mental model is different. You hand over a scoped task with acceptance criteria, you go make chai, you come back to a diff.

This isn't free. It works when:

The task has clear input/output contracts
Tests exist or can be written first
The code review discipline is still tight

It falls apart when teams treat agent output as trusted. I've seen two production incidents this quarter caused by unreviewed agent-written DB migrations. Review the diff. Always.
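One cheap way to enforce that is a check that flags risky files in any diff before merge. This is a minimal sketch under my own assumptions: the `migrations` directory name, the `.sql` suffix, and the function name are illustrative, so adjust them to your repo layout.

```python
from pathlib import PurePosixPath

# Assumption: migrations live in a directory named "migrations" or end in .sql.
RISKY_DIRS = {"migrations"}
RISKY_SUFFIXES = {".sql"}


def needs_human_review(changed_paths: list[str]) -> list[str]:
    """Return the changed files that should never merge without a human look."""
    flagged = []
    for raw in changed_paths:
        path = PurePosixPath(raw)
        if RISKY_DIRS & set(path.parts) or path.suffix in RISKY_SUFFIXES:
            flagged.append(raw)
    return flagged
```

Feed it the output of `git diff --name-only` in CI and fail the build when the list is non-empty and no reviewer has approved.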

5. India's public AI stack shipped real things

The DPI + AI crossover keeps quietly shipping. Bhashini added better code-mixed Hinglish handling. ONDC's search layer gained semantic understanding for product descriptions. UPI's fraud detection got a visible accuracy bump — small merchants I talk to noticed fewer false declines this month.

If you build for Bharat, plug into these layers instead of building parallel ones. The cost math on a startup trying to replicate Indic NLP from scratch doesn't work.

6. Evals are the new moat

Every serious AI team I talked to this week was investing in evals, not prompts. The pattern: write 50–200 real examples with expected outputs, run your pipeline, score with a judge model, track drift over deploys.

Minimum viable eval in Python:

import json, anthropic

client = anthropic.Anthropic()
# evals.json: a list of {"input": ..., "expected": ...} objects
with open("evals.json") as f:
    cases = json.load(f)

def judge(expected, got):
    prompt = (
        f"Expected:\n{expected}\n\n"
        f"Got:\n{got}\n\n"
        "Is this correct? Reply PASS or FAIL with a one-line reason."
    )
    r = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=100,
        messages=[{"role": "user", "content": prompt}],
    )
    return r.content[0].text.strip()

passed = 0
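for c in cases:
    # run_your_pipeline is a placeholder: swap in your own entry point
    got = run_your_pipeline(c["input"])
    verdict = judge(c["expected"], got)
    # Case-insensitive so "Pass" or "pass" from the judge still counts
    if verdict.upper().startswith("PASS"):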
        passed += 1
    else:
        print(f"FAIL: {c['input'][:60]}  ->  {verdict}")

print(f"{passed}/{len(cases)} passed")

Run this before every deploy. You'll catch 80% of regressions that would otherwise ship.
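The "track drift over deploys" part needs only a small history file. Here is a sketch, assuming a local JSON file and a 5-point drop threshold: `record_run`, the file layout, and `max_drop` are my own illustrative choices, not a standard.

```python
import json
from pathlib import Path


def record_run(history_path: str, deploy_id: str, passed: int, total: int,
               max_drop: float = 0.05) -> bool:
    """Append this run's pass rate and report whether it is safe to deploy.

    Returns False when the pass rate fell by more than `max_drop`
    compared with the previous recorded run.
    """
    path = Path(history_path)
    history = json.loads(path.read_text()) if path.exists() else []
    rate = passed / total
    previous = history[-1]["pass_rate"] if history else None
    history.append({"deploy": deploy_id, "pass_rate": rate})
    path.write_text(json.dumps(history, indent=2))
    return previous is None or previous - rate <= max_drop
```

Call it with the `passed` and `len(cases)` values from the eval loop, and block the deploy when it returns False.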

7. The "AI doing accounting" story is real, and boring

I'll close on the topic I spend most of my time on. Last week I helped a mid-sized D2C brand cut their monthly reconciliation from 14 person-days to under 2. Nothing glamorous — a Python pipeline that pulls Razorpay settlements, tallies against Shopify orders, flags mismatches, and writes a Tally-compatible CSV. Savings: roughly ₹85,000/month in ops cost.

This is where AI is quietly winning in India. Not chatbots. Not "AI-powered" products. Just ordinary automation where one of the steps happens to call an LLM because regex would have taken three weeks to write.
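The core of a reconciliation pipeline like that one is a keyed comparison. The sketch below is not the client's code: the field names (`order_id`, `amount`, `total`) and the CSV columns are hypothetical, not the actual Razorpay, Shopify, or Tally schemas, and the LLM step is left out.

```python
import csv
import io
from decimal import Decimal


def reconcile(settlements: list[dict], orders: list[dict]) -> list[dict]:
    """Match settlements to orders by order ID and return the mismatches."""
    paid = {s["order_id"]: Decimal(s["amount"]) for s in settlements}
    mismatches = []
    for order in orders:
        expected = Decimal(order["total"])
        settled = paid.pop(order["order_id"], None)
        if settled is None:
            mismatches.append({"order_id": order["order_id"], "issue": "no settlement",
                               "expected": expected, "settled": ""})
        elif settled != expected:
            mismatches.append({"order_id": order["order_id"], "issue": "amount differs",
                               "expected": expected, "settled": settled})
    # Anything left in `paid` was settled but has no matching order.
    for order_id, settled in paid.items():
        mismatches.append({"order_id": order_id, "issue": "no order",
                           "expected": "", "settled": settled})
    return mismatches


def to_csv(rows: list[dict]) -> str:
    """Serialise mismatches to CSV text (column layout is illustrative)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["order_id", "issue", "expected", "settled"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Amounts go through `Decimal` rather than `float` so that rupee values compare exactly. The LLM earns its place only on the messy rows, such as free-text references that don't carry a clean order ID.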

If you're a developer looking for paid work in 2026, this is the niche. Almost every SME in Tier 1 and Tier 2 cities has reconciliation, vendor onboarding, compliance filing, or customer support workflows that are one weekend of focused work away from a 70% time reduction. The businesses don't know it. You can tell them.

What I'm watching next week

Pricing from the newest Indic-language foundation models
Whether updated DPDP rules change what data can flow through foreign-hosted LLMs
Real benchmarks (not vibes) on code-gen for Python-heavy automation

That's the signal. Everything else this week was noise — keep your head down and ship.

I'm Archit Mittal — I automate chaos for businesses. Follow me for daily automation content.
