Every time I shipped a side project, I spent 2 hours
writing the same announcement 6 different ways.
Reddit wants casual. Hacker News wants technical.
Product Hunt wants punchy under 60 characters.
IndieHackers wants transparency and honest numbers.
I tried using ChatGPT. The output was generic, got
flagged by Reddit's spam filters immediately, and
sounded nothing like me. So I built OmniLaunch.
Here's exactly how it works technically.
The core problem with AI writing tools
Every AI writing tool has the same flaw — it produces
identical output for everyone. Feed the same prompt to
ChatGPT and you get the same "In today's fast-paced
world" energy regardless of who you are.
The fix isn't a better prompt. It's conditioning the model
on YOUR writing specifically.
How voice cloning actually works
When a user pastes their writing samples, here's what
happens under the hood:
```python
import google.generativeai as genai

async def generate_style_embedding(text: str) -> list[float]:
    result = genai.embed_content(
        model="models/text-embedding-004",
        content=text,
        task_type="SEMANTIC_SIMILARITY",
    )
    return result["embedding"]  # 768-dimensional vector
```
We use Google's text-embedding-004 model to convert
the user's writing into a 768-dimensional vector. This
vector captures the semantic fingerprint of how they
write — not just the words, but the patterns.
That vector gets stored in Supabase using pgvector:
```sql
style_embedding VECTOR(768)
```
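For context, here is a sketch of what that storage and lookup could look like in pgvector. Only the `style_embedding` column type comes from the post; the table name, other columns, and query are my own stand-ins:

```sql
-- Hypothetical schema; only the style_embedding column type is from the post
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE voice_profiles (
    user_id UUID PRIMARY KEY,
    style_embedding VECTOR(768)
);

-- <=> is pgvector's cosine-distance operator: smaller means more similar
SELECT user_id
FROM voice_profiles
ORDER BY style_embedding <=> $1
LIMIT 1;
```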
The LangGraph agent pipeline
The generation pipeline chains its agents in sequence:
1. Context Research Agent
Fetches live platform rules from the database —
Reddit flair requirements, HN's 80 character title
limit, Product Hunt's 60 character tagline limit.
Rules change so we fetch fresh on every generation.
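As an illustration, the fetched rules can be thought of as a per-platform record. The field names below match the ones the drafting prompt reads later in this post, but the values and lookup helper are my own stand-ins, not the real schema:

```python
# Illustrative per-platform constraint records; the field names mirror the
# ones used in the drafting prompt, but values and structure are assumptions.
PLATFORM_CONSTRAINTS = {
    "hackernews": {
        "title_max_chars": 80,
        "forbidden_words": [],
        "prefix": "Show HN:",
    },
    "producthunt": {
        "tagline_max_chars": 60,
        "forbidden_words": ["revolutionary"],
    },
    "reddit": {
        "title_max_chars": 300,
        "requires_flair": True,
    },
}

def get_constraints(platform: str) -> dict:
    """Look up the rules for one platform; raises KeyError if unknown."""
    return PLATFORM_CONSTRAINTS[platform]
```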
2. Drafting Agent
Takes the user's Tone Manifesto (a JSON description
of their writing style) + platform constraints +
product description and generates a first draft via
Gemini.
The system prompt enforces platform rules hard:
```python
system_prompt = f"""
You are writing a {platform} post for a developer.

VOICE PROFILE:
{json.dumps(tone_manifesto, indent=2)}

PLATFORM RULES:
- Title max: {constraints['title_max_chars']} characters
- Forbidden words: {constraints['forbidden_words']}
- Required prefix: {constraints.get('prefix', 'none')}

Write in the user's voice. Match their sentence
length, formality level, and emoji usage exactly.
"""
```
3. Humanizer Agent
This is the most important agent. It does 3 things:
First — removes AI-isms. We maintain a list of 50+
phrases that signal AI-generated content:
```python
AI_ISMS = [
    "in today's fast-paced world",
    "game-changer",
    "revolutionary",
    "unlock your potential",
    "dive into",
    "leverage",
    "seamlessly",
    "it's worth noting",
    "certainly",
    # ... 40+ more
]
```
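As a sketch, the phrase-removal pass might be as simple as a case-insensitive substitution over that list. This helper is illustrative, not from the codebase; the actual agent may do the rewrite through the LLM instead:

```python
import re

# Abbreviated list; the real filter described above has 50+ entries
AI_ISMS = [
    "in today's fast-paced world",
    "game-changer",
    "dive into",
    "leverage",
]

def strip_ai_isms(draft: str) -> str:
    """Remove known AI-sounding phrases case-insensitively,
    then collapse the whitespace left behind."""
    for phrase in AI_ISMS:
        draft = re.sub(re.escape(phrase), "", draft, flags=re.IGNORECASE)
    return re.sub(r"\s{2,}", " ", draft).strip()
```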
Second — scores voice match using cosine similarity:
```python
import numpy as np

def compute_cosine_similarity(
    vec_a: list[float],
    vec_b: list[float],
) -> float:
    a = np.array(vec_a)
    b = np.array(vec_b)
    dot = np.dot(a, b)
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    if norm == 0:
        return 0.0
    return float(dot / norm)

def normalize_cosine_to_score(cosine_sim: float) -> float:
    # Map cosine similarity from [-1, 1] onto a [0, 1] score
    return (cosine_sim + 1.0) / 2.0
```
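To make the normalization concrete, here is a toy run with the two helpers re-declared under shorter names so the snippet stands alone:

```python
import numpy as np

def cosine(vec_a: list[float], vec_b: list[float]) -> float:
    # Same math as compute_cosine_similarity above, condensed
    a, b = np.array(vec_a), np.array(vec_b)
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    return 0.0 if norm == 0 else float(np.dot(a, b) / norm)

def to_score(sim: float) -> float:
    # Same mapping as normalize_cosine_to_score above
    return (sim + 1.0) / 2.0

# Same direction: cosine  1.0 -> score 1.0
# Orthogonal:     cosine  0.0 -> score 0.5
# Opposite:       cosine -1.0 -> score 0.0
print(to_score(cosine([1, 0], [2, 0])))  # 1.0
```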
Third — if the voice match score is below 0.85, it
sends the draft back to the Drafting Agent with
specific feedback. Maximum 3 revision loops.
4. LangGraph routing
The revision loop is handled by a conditional edge:
```python
from langgraph.graph import END

def route_after_humanize(state: PipelineState) -> str:
    if (state["voice_match_score"] < 0.85
            and state["revision_count"] < 3):
        return "drafting"
    return "end"

graph.add_conditional_edges(
    "humanizing",
    route_after_humanize,
    {"drafting": "drafting", "end": END},
)
```
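Stripped of LangGraph, the loop's control flow is easy to simulate in plain Python. Here a scripted list of scores stands in for the real drafting and humanizing agents; `run_revision_loop` is illustrative, not from the codebase:

```python
def route_after_humanize(state: dict) -> str:
    # Same rule as the LangGraph conditional edge: revise while the
    # voice-match score is below 0.85, capped at 3 revisions
    if state["voice_match_score"] < 0.85 and state["revision_count"] < 3:
        return "drafting"
    return "end"

def run_revision_loop(scores: list[float]) -> dict:
    """Drive the draft -> humanize loop with a scripted sequence of
    voice-match scores standing in for the real agents."""
    state = {"voice_match_score": 0.0, "revision_count": 0}
    for score in scores:
        state["voice_match_score"] = score   # humanizer scores the draft
        if route_after_humanize(state) == "end":
            break
        state["revision_count"] += 1         # send back for redrafting
    return state

# Scores improve each pass; the loop stops once 0.85 is reached
# run_revision_loop([0.70, 0.80, 0.90]) ends at score 0.90 after 2 revisions
```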
The MV3 gotcha that killed most tab suspenders
While building Tab Hibernator Pro (a Chrome extension
I shipped before this), I learned the hard lesson about
Manifest V3 service workers.
Most developers use setInterval for background timers:
```javascript
// WRONG in MV3 — service worker goes to sleep
setInterval(checkTabs, 60000);
```
Service workers in MV3 go idle after ~30 seconds of
inactivity. Your setInterval dies silently. The fix:
```javascript
// CORRECT — alarms persist through service worker sleep
chrome.alarms.create('hibernation-check', {
  delayInMinutes: 1,
  periodInMinutes: 1
});

chrome.alarms.onAlarm.addListener((alarm) => {
  if (alarm.name === 'hibernation-check') {
    checkAndHibernateTabs();
  }
});
```
The chrome.alarms API persists independently of the service
worker lifecycle. This is documented but easy to miss, and
nearly impossible to debug because DevTools keeps the
service worker alive, so everything works in testing and
breaks in production.
Real-time generation with SSE
Bundle generation runs as a Celery task. The frontend
watches progress via Server-Sent Events:
```typescript
// useBundleSSE.ts
const eventSource = new EventSource(
  `/api/v1/bundles/${bundleId}/status?token=${token}`
);

eventSource.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.status === 'complete') {
    eventSource.close();
    fetchFinalBundle();
    return;
  }
  if (data.status === 'failed') {
    eventSource.close();
    showError(data.detail);
    return;
  }
  updateProgress(data);
};

// Cleanup on unmount
return () => eventSource.close();
```
Note: EventSource doesn't support custom headers, so the
JWT goes in a query parameter. Make sure your backend
validates it exactly the way it validates header auth.
What I'd do differently
Start with text-paste, not URL scraping
My original spec included scraping the user's Reddit
and HN posts for voice samples. Playwright on a server
is a nightmare — 300MB Chromium install, rate limiting,
bot detection. I replaced it with a simple textarea
asking users to paste 3-5 samples of their writing.
Simpler, faster, works better.
Rate limit LLM endpoints from day one
I added slowapi after the fact. Should have been
there from the start:
```python
from slowapi import Limiter
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)

@router.post("/generate-launch-bundle")
@limiter.limit("5/minute")
async def generate_bundle(request: Request, ...):
    ...
```
One user hammering the endpoint burns your entire
Gemini quota in minutes.
Polling before SSE
I built polling first and SSE second. This was the
right order — polling is simpler to debug, SSE is
better UX. Build the boring version first.
The stack
- Backend: FastAPI + LangGraph + Celery + Redis
- Database: Supabase + pgvector
- AI: Google Gemini + text-embedding-004
- Frontend: Next.js 14 + Tailwind v4 + Framer Motion
- Billing: Razorpay
- Deployment: Vercel (frontend) + Render (backend)
Try it
OmniLaunch is live with a 14-day free trial and
3 free launches forever.
Built this solo in ~3 weeks. Happy to answer any
questions about the LangGraph pipeline, the embedding
approach, or the MV3 chrome.alarms gotcha.