# "You just let AI write your code?" — Yes. And here's why that makes me more of an engineer, not less.
The term "vibecoder" gets thrown around as an insult lately. It implies you're lazy, that you don't understand what's happening under the hood, that you're just prompting your way to technical debt.
I'm here to tell you something different: vibecoding isn't about replacing engineering. It's about removing friction between idea and revenue.
## What Changed in Late 2024
For years, I built REST APIs the traditional way: Flask/Django, manual CRUD, boilerplate authentication, repetitive validation logic. Solid work. Reliable. Slow.
Then I hit a wall: a client needed a Telegram bot with AI symptom triage, 152-FZ compliance, PWA fallback, and 1C:Medicine integration — in 3 months.
Traditional approach: ~800 hours. Deadline: impossible.
So I adapted. I built workflows. I learned where AI accelerates and where human judgment is non-negotiable.
Result: 420 hours. Production launch. Zero compliance violations.
## The 3-Tier Hybrid Pattern (With Code)
Here's the architecture I now use for every AI-powered system. It's not sexy, but it works in production.
### Tier 1: Keyword Router (0ms, ~78–95% of requests)
```python
# routers/keyword_router.py
import re
from typing import Callable, Optional


class KeywordRouter:
    def __init__(self):
        # Route table: name -> (compiled pattern, handler).
        self.routes: dict[str, tuple[re.Pattern, Callable[[str], str]]] = {
            "symptom_headache": (
                # "headache", "migraine", "cephalgia"
                re.compile(r"(голова болит|мигрень|цефалгия)", re.I),
                self._route_neurology,
            ),
            "booking_cancel": (
                # "cancel my appointment", "cancellation", "reschedule"
                re.compile(r"(отменить запись|отмена|перенести)", re.I),
                self._route_appointment,
            ),
        }

    def route(self, text: str) -> Optional[str]:
        text_norm = text.lower().strip()
        for route_name, (pattern, handler) in self.routes.items():
            if pattern.search(text_norm):
                return handler(text_norm)
        return None  # no match — fall through to Tier 2

    def _route_neurology(self, text: str) -> str:
        return "neurology"  # stub: real handler builds the triage reply

    def _route_appointment(self, text: str) -> str:
        return "appointment"  # stub: real handler manages bookings
```
**Why this matters:** No LLM call. No latency. No API cost. 78% of medical queries and 95% of dating bot conversations are handled here.
### Tier 2: RAG + Cache (~100ms, ~17% of requests)
```python
# services/rag_service.py
import hashlib
import re
from typing import Optional

from chromadb import Client
from sentence_transformers import SentenceTransformer


class RAGService:
    def __init__(self, collection_name: str):
        self.chroma = Client().get_collection(collection_name)
        self.embedder = SentenceTransformer("rubert-tiny2")
        self.cache: dict[str, str] = {}  # exact-match cache over normalized keys

    def _normalize_key(self, query: str) -> str:
        # Collapse case and punctuation so near-duplicate queries share a slot.
        normalized = re.sub(r"[^\w\s]", "", query.lower().strip())
        return hashlib.md5(normalized.encode()).hexdigest()

    def _synthesize(self, document: str) -> str:
        # Simplified: production turns the retrieved doc into a user-facing reply.
        return document

    def get_response(self, query: str) -> Optional[str]:
        key = self._normalize_key(query)
        if key in self.cache:
            return self.cache[key]
        query_vec = self.embedder.encode([query])[0]
        results = self.chroma.query(
            query_embeddings=[query_vec.tolist()],
            n_results=3,
            include=["documents"],
        )
        if results["documents"][0]:
            response = self._synthesize(results["documents"][0][0])
            self.cache[key] = response
            return response
        return None  # fall through to Tier 3
```
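The cache key trick is worth seeing in isolation: punctuation and case variants collapse to one key, so rephrasings and retries hit the cache instead of the embedder. A standalone sketch of the same normalization:

```python
import hashlib
import re


def normalize_key(query: str) -> str:
    # Strip punctuation and case so near-duplicate queries share one cache slot.
    normalized = re.sub(r"[^\w\s]", "", query.lower().strip())
    return hashlib.md5(normalized.encode()).hexdigest()


# Surface variants map to the same key:
print(normalize_key("Отменить запись!") == normalize_key("отменить запись"))  # True
```

Note the trade-off: this is exact-match caching after normalization, so only trivially different queries collapse; semantically similar but differently worded queries still go through the vector search.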
### Tier 3: LLM Fallback (2–6s, ~5% of requests)
```python
# services/llm_service.py
import asyncio

from llama_cpp import Llama


class LocalLLM:
    def __init__(self, model_path: str):
        self.llm = Llama(
            model_path=model_path,
            n_ctx=4096,
            n_gpu_layers=0,  # CPU-only inference
            verbose=False,
        )

    async def generate(self, prompt: str, timeout: float = 15.0) -> str:
        try:
            # llama.cpp inference is blocking — run it off the event loop.
            return await asyncio.wait_for(
                asyncio.to_thread(self._generate_sync, prompt),
                timeout=timeout,
            )
        except asyncio.TimeoutError:
            return "I'm taking too long — let me connect you to a human."

    def _generate_sync(self, prompt: str) -> str:
        output = self.llm(prompt, max_tokens=512, stop=["\n\n"])
        return output["choices"][0]["text"].strip()
```
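Wiring the tiers together is a short fall-through chain: cheapest tier first, and only the residue pays LLM latency. A minimal sketch of the dispatch order, where the tier callables are stand-ins rather than the real classes above:

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def route_message(
    text: str,
    keyword_route: Callable[[str], Optional[str]],
    rag_route: Callable[[str], Optional[str]],
    llm_route: Callable[[str], Awaitable[str]],
) -> str:
    # Tier 1: free and instant, try the keyword router first.
    if (reply := keyword_route(text)) is not None:
        return reply
    # Tier 2: cached RAG lookup, ~100ms.
    if (reply := rag_route(text)) is not None:
        return reply
    # Tier 3: only unmatched queries pay for LLM inference.
    return await llm_route(text)


# Stub tiers for demonstration:
reply = asyncio.run(route_message(
    "hi",
    keyword_route=lambda t: "greeting" if "hi" in t else None,
    rag_route=lambda t: None,
    llm_route=lambda t: asyncio.sleep(0, result="llm answer"),
))
print(reply)  # greeting
```

The design point is that each tier signals "not mine" by returning `None`, so adding or removing a tier never touches its neighbours.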
## Where AI Helps — And Where It Doesn't
**AI accelerates:**
- Boilerplate CRUD endpoints
- Regex pattern generation
- Documentation drafts
- Test data generation
- Error message localization
**Human judgment is non-negotiable:**
- Architecture decisions (3-tier vs monolith)
- Compliance logic (152-FZ, GDPR, 323-FZ)
- Payment flows (idempotency, webhook verification)
- Error handling strategy
- Business logic edge cases
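Payment flows are a good example of why. A minimal sketch of webhook idempotency, assuming the provider sends a unique event id and an HMAC-SHA256 signature (the secret, names, and in-memory store here are all illustrative):

```python
import hashlib
import hmac

SECRET = b"webhook-secret"          # assumption: shared secret from the provider
processed_events: set[str] = set()  # in production this lives in a durable store


def handle_webhook(event_id: str, body: bytes, signature: str) -> str:
    # 1. Verify the signature before trusting anything in the body.
    expected = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature):
        return "rejected"
    # 2. Idempotency: providers retry, so a replayed event must be a no-op.
    if event_id in processed_events:
        return "duplicate"
    processed_events.add(event_id)
    # ... apply the payment side effects exactly once here ...
    return "processed"
```

Note `hmac.compare_digest` instead of `==`: constant-time comparison closes the timing side channel on signature checks. AI-generated payment code routinely gets both this and the retry semantics wrong, which is why a human owns this path.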
## The Irony: More AI = More Engineering Discipline
The more I used AI, the more I realized: if your architecture is weak, AI helps you build broken systems faster.
So I built guardrails:
- No AI-generated code without human review (especially payments/auth)
- Architecture first, code second (sketch on paper before prompting)
- Test critical paths manually (AI can write tests, but I verify business logic)
- Error monitoring from day one (66 error codes catalogued in one project)
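The error catalogue point deserves a concrete shape. A minimal sketch of a code registry that monitoring can count and alert on (the codes here are invented for illustration, not the project's real 66):

```python
from enum import Enum


class ErrorCode(str, Enum):
    # Illustrative entries only; the real catalogue is project-specific.
    RAG_EMPTY_RESULT = "E101"
    LLM_TIMEOUT = "E201"
    PAYMENT_WEBHOOK_BAD_SIG = "E301"


def log_error(code: ErrorCode, detail: str) -> str:
    # One structured line per failure: stable code a log aggregator can
    # group on, plus a human-readable name and detail.
    return f"[{code.value}] {code.name}: {detail}"


print(log_error(ErrorCode.LLM_TIMEOUT, "generation exceeded 15s"))
# [E201] LLM_TIMEOUT: generation exceeded 15s
```

Stable codes matter more than pretty messages: dashboards and alerts key on `E201`, while the wording can change freely.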
## Results: Speed Without Sacrificing Reliability
| Metric | Traditional | AI-Augmented |
|---|---|---|
| Delivery time | 8 weeks | 3.5 weeks |
| Boilerplate hours | ~300 | ~40 |
| Production bugs (first month) | 12 | 3 |
| Uptime (first 6 weeks) | 98.1% | 99.7% |
The projects didn't just ship faster — they shipped better, because I spent time on architecture instead of typing `@app.post("/users")` for the 47th time.
## Vibecoding Is a Philosophy, Not a Shortcut
Call it what you want: AI-augmented engineering, high-velocity development, vibecoding. The label doesn't matter. The outcome does.
I'm not here to win arguments on Twitter. I'm here to ship products that make money — with production-grade reliability, compliance, and maintainability.
*Built AI-powered systems for healthcare, luxury tourism, and social tech. Portfolio: grekcreator.com*