I'm a third-year CS student. Every time I shipped a side
project, I spent 2 hours writing the same announcement
6 different ways.
Reddit wants casual. HN wants technical. Product Hunt
wants punchy under 60 characters. IndieHackers wants
honest numbers and vulnerability.
Every AI tool I tried produced the same generic output
that got flagged by Reddit's spam filters immediately.
So I built OmniLaunch. Here's exactly how it works.
The core problem with AI writing tools
Every AI writing tool has the same flaw.
Feed the same prompt to ChatGPT and you get the same
generic register regardless of who you are. "In today's
fast-paced world, leveraging synergies to unlock
value..." You know the vibe.
The fix isn't a better prompt. It's anchoring generation
to YOUR writing specifically, using embeddings to measure
how close each draft lands to it.
Architecture overview
The system has 5 main components:
- Voice Lab — ingests writing samples, generates embeddings
- LangGraph agent pipeline — 3 agents per platform
- SSE streaming — real-time generation progress
- Validation engine — platform rule enforcement
- Razorpay billing — subscription management
Let me break down each one.
Voice cloning with text-embedding-004
When a user pastes their writing samples, I convert
them into a 768-dimensional vector using Google's
text-embedding-004 model:
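import google.generativeai as genai

async def generate_style_embedding(
    text: str
) -> list[float]:
    # Note: embed_content is a blocking call, so it holds
    # the event loop until the API responds.
    result = genai.embed_content(
        model="models/text-embedding-004",
        content=text,
        task_type="SEMANTIC_SIMILARITY"
    )
    return result['embedding']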
That vector gets stored in Supabase using pgvector:
ALTER TABLE voice_profiles
    ADD COLUMN style_embedding VECTOR(768);
Every generated draft gets embedded using the same
model. Voice match score is cosine similarity between
the draft embedding and the stored profile:
import numpy as np

def compute_cosine_similarity(
    vec_a: list[float],
    vec_b: list[float]
) -> float:
    a = np.array(vec_a)
    b = np.array(vec_b)
    dot = np.dot(a, b)
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    # A zero vector has no direction, so treat it as no match
    if norm == 0:
        return 0.0
    return float(dot / norm)

def normalize_cosine_to_score(
    cosine_sim: float
) -> float:
    # Map [-1, 1] to [0, 1]
    return (cosine_sim + 1.0) / 2.0
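To make the scoring concrete, here is a small worked example using toy 2-dimensional vectors in place of real 768-dimensional embeddings. It is a stdlib-only sketch, not the project's code, and the helper names (`cosine_similarity`, `to_score`, `to_cosine`) are made up for illustration. One thing it shows: if a 0.85 threshold is applied to the normalized score (an assumption, the threshold could also be applied to the raw cosine), that corresponds to a raw cosine of about 0.70.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 0.0 if norm == 0 else dot / norm

def to_score(cosine_sim: float) -> float:
    """Map a cosine in [-1, 1] onto a score in [0, 1]."""
    return (cosine_sim + 1.0) / 2.0

def to_cosine(score: float) -> float:
    """Inverse of to_score: recover the raw cosine from a score."""
    return 2.0 * score - 1.0

# Toy stand-ins for a stored profile and three candidate drafts
profile = [1.0, 0.0]
drafts = {
    "same_direction": [2.0, 0.0],   # cosine  1.0
    "orthogonal": [0.0, 3.0],       # cosine  0.0
    "opposite": [-1.0, 0.0],        # cosine -1.0
}

scores = {
    name: to_score(cosine_similarity(profile, vec))
    for name, vec in drafts.items()
}

# Raw cosine implied by a normalized threshold of 0.85
raw_threshold = to_cosine(0.85)
```

Note that vector length does not matter, only direction: `[2.0, 0.0]` scores the same as `[1.0, 0.0]` would. An unrelated (orthogonal) draft still scores 0.5 after normalization, which is worth keeping in mind when picking a threshold.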
The LangGraph pipeline
Three agents run sequentially per platform:
1. Context Research Agent
Fetches live platform rules from the database on
every generation — not hardcoded. Reddit flair
requirements change. HN's 80 character limit is
strict. PH taglines must be under 60 characters.
async def context_research_node(
    state: PipelineState
) -> dict:
    platform = state["platform"]
    rules = await fetch_platform_rules(platform)
    await progress_callback(
        platform, "researching",
        f"Fetched rules for {platform}"
    )
    return {"platform_constraints": rules}
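Once the rules are fetched, enforcing them can be a plain function over the title. The sketch below is one way that check might look. The `PlatformRules` shape, the `RULES` rows, and the "Show HN: " prefix value are all assumptions for illustration (in the real system the rules come from the database), though the field names mirror the `title_max_chars`, `forbidden_words`, and `prefix` keys used in the drafting prompt.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PlatformRules:
    # Hypothetical shape; field names follow the prompt's constraint keys
    title_max_chars: int
    forbidden_words: list[str] = field(default_factory=list)
    prefix: Optional[str] = None

# Hypothetical rows standing in for what the database would return
RULES = {
    "hackernews": PlatformRules(title_max_chars=80, prefix="Show HN: "),
    "producthunt": PlatformRules(title_max_chars=60),
}

def validate_title(title: str, rules: PlatformRules) -> list[str]:
    """Return a list of human-readable rule violations (empty if valid)."""
    problems = []
    if len(title) > rules.title_max_chars:
        problems.append(
            f"title is {len(title)} chars, max is {rules.title_max_chars}"
        )
    if rules.prefix and not title.startswith(rules.prefix):
        problems.append(f"title must start with {rules.prefix!r}")
    lowered = title.lower()
    for word in rules.forbidden_words:
        if word.lower() in lowered:
            problems.append(f"forbidden word: {word}")
    return problems
```

Returning a list of problems rather than a boolean means the same messages can be fed back to the drafter as revision feedback.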
2. Drafting Agent
Takes the Tone Manifesto + platform constraints +
product description. Generates a first draft via
Gemini with a platform-specific system prompt:
system_prompt = f"""
You are writing a {platform} post for a developer.
VOICE PROFILE:
{json.dumps(tone_manifesto, indent=2)}
PLATFORM RULES:
- Title max: {constraints['title_max_chars']} chars
- Forbidden words: {constraints['forbidden_words']}
- Required prefix: {constraints.get('prefix', 'none')}
- Tone: {constraints['tone']}
Write in the user's voice exactly. Match their
sentence length, formality, emoji usage.
NEVER use: {', '.join(AI_ISMS[:5])}...
"""
3. Humanizer Agent
The most important agent. Three responsibilities:
First — removes AI-isms. 50+ patterns:
AI_ISMS = [
    "in today's fast-paced world",
    "game-changer",
    "revolutionary",
    "unlock your potential",
    "dive into",
    "leverage",
    "seamlessly",
    "it's worth noting",
    "certainly",
    # 40+ more...
]
Second — scores voice match via cosine similarity.
Third — if score < 0.85, sends back to drafter
with specific feedback. Max 3 revision loops.
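A minimal sketch of the first responsibility, detecting AI-isms, assuming a simple case-insensitive whole-phrase match. The function name `find_ai_isms` and the matching strategy are illustrative assumptions, not the project's actual implementation, and the list here is a short excerpt.

```python
import re

AI_ISMS = [
    "in today's fast-paced world",
    "game-changer",
    "leverage",
    "dive into",
    "certainly",
]

def find_ai_isms(text: str, phrases: list[str]) -> list[str]:
    """Return the phrases that occur in text, in list order.

    Matching is case-insensitive and anchored so a phrase is not
    matched inside a longer word ("certainly" vs "uncertainly").
    """
    found = []
    for phrase in phrases:
        # Lookarounds instead of \b so phrases with apostrophes
        # or hyphens are still anchored on both sides
        pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(phrase)
    return found

draft = "This game-changer lets you Dive Into your data."
hits = find_ai_isms(draft, AI_ISMS)
```

Exact-phrase matching misses inflected forms: "leveraging" does not contain "leverage", so variants like that would need their own entries or a stemming step.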
The routing logic
from langgraph.graph import END

def route_after_humanize(
    state: PipelineState
) -> str:
    # Loop back only while the score is low AND revisions remain
    if (state["voice_match_score"] < 0.85
            and state["revision_count"] < 3):
        return "drafting"
    return "end"

graph.add_conditional_edges(
    "humanizing",
    route_after_humanize,
    {"drafting": "drafting", "end": END}
)
LangGraph merges the returned dict into state before
evaluating the conditional edge — so route_after_humanize
always sees the freshly updated score and revision count.
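The merge-then-route order can be emulated in a few lines of plain Python, which is handy for testing the routing logic without LangGraph installed. This is a sketch under assumptions: `run_node` and `fake_humanizer` are hypothetical helpers, the merge is a plain per-key overwrite (no custom reducers), and the humanizer is assumed to be the node that increments `revision_count`.

```python
def route_after_humanize(state: dict) -> str:
    if state["voice_match_score"] < 0.85 and state["revision_count"] < 3:
        return "drafting"
    return "end"

def run_node(state: dict, node, router) -> tuple[dict, str]:
    """Apply a node's partial update, then route on the merged state."""
    update = node(state)            # node returns only the keys it changed
    merged = {**state, **update}    # overwrite-merge into the full state
    return merged, router(merged)   # router sees the fresh values

def fake_humanizer(state: dict) -> dict:
    # Stand-in node: always scores 0.80 and counts one more revision
    return {
        "voice_match_score": 0.80,
        "revision_count": state["revision_count"] + 1,
    }

state = {"platform": "reddit", "voice_match_score": 0.0, "revision_count": 0}
decisions = []
while True:
    state, next_step = run_node(state, fake_humanizer, route_after_humanize)
    decisions.append(next_step)
    if next_step == "end":
        break
```

With a score stuck at 0.80, the router sends the draft back twice and then stops on the third pass, because the revision cap is checked against the already-incremented count.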
Real-time progress with SSE
Bundle generation runs as a Celery task. The frontend
watches progress via Server-Sent Events:
// hooks/useBundleSSE.ts
useEffect(() => {
  if (!bundleId || !token) return;

  const es = new EventSource(
    `/api/v1/bundles/${bundleId}/status?token=${token}`
  );

  es.onmessage = (event) => {
    const data = JSON.parse(event.data);
    if (data.status === 'complete') {
      es.close();
      fetchFinalBundle();
      return;
    }
    if (data.status === 'failed') {
      es.close();
      showError(data.detail);
      return;
    }
    updateProgress(data);
  };

  // Close explicitly so the browser does not auto-reconnect
  es.onerror = () => {
    es.close();
    fallbackToPolling();
  };

  // Cleanup on unmount or when bundleId/token change
  return () => es.close();
}, [bundleId, token]);
One gotcha: EventSource doesn't support custom headers
so the JWT goes as a query parameter. Make sure your
backend validates it identically to header auth:
@router.get("/bundles/{bundle_id}/status")
async def stream_bundle_status(
    bundle_id: str,
    token: str = Query(...),  # JWT from query param
    db: AsyncSession = Depends(get_db)
):
    user = await verify_token(token)  # same as header auth
    # ... SSE logic
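The elided SSE logic ultimately has to emit text in the `text/event-stream` wire format: each message is a `data:` line followed by a blank line. Here is a stdlib-only sketch of that formatting step. The helper names `format_sse` and `stream_events` are hypothetical, and how the real endpoint reads progress from the Celery task is not shown in the post, so the updates are passed in as a plain iterable.

```python
import json
from typing import Iterable, Iterator

def format_sse(payload: dict) -> str:
    """Serialize one message in the text/event-stream wire format.

    json.dumps escapes newlines inside strings, so the payload
    always fits on a single data: line.
    """
    return f"data: {json.dumps(payload)}\n\n"

def stream_events(updates: Iterable[dict]) -> Iterator[str]:
    """Yield formatted events, stopping after a terminal status."""
    for update in updates:
        yield format_sse(update)
        if update.get("status") in ("complete", "failed"):
            break

events = list(stream_events([
    {"status": "researching", "platform": "reddit"},
    {"status": "complete"},
    {"status": "researching"},  # never sent: stream already ended
]))
```

Stopping the generator after `complete` or `failed` matches the frontend hook, which closes the `EventSource` on exactly those two statuses.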
The MV3 chrome.alarms gotcha
While building Tab Hibernator Pro (a Chrome extension
I shipped before this), I hit the most frustrating
MV3 bug.
Most developers use setInterval for background timers:
// BROKEN in MV3 — service worker sleeps
setInterval(checkTabs, 60000)
Service workers in MV3 go idle after ~30 seconds.
Your setInterval dies silently. Everything works in
DevTools (which keeps the worker alive) and breaks
in production.
The fix:
// CORRECT — alarms survive service worker sleep
chrome.alarms.create('hibernation-check', {
  delayInMinutes: 1,
  periodInMinutes: 1
});

chrome.alarms.onAlarm.addListener((alarm) => {
  if (alarm.name === 'hibernation-check') {
    checkAndHibernateTabs();
  }
});
chrome.alarms persists independently of the service
worker lifecycle. This is documented but easy to
miss and impossible to debug until you know it.
Mistakes I made
Started with URL scraping for voice samples
My original spec scraped the user's Reddit and HN
posts using Playwright. 300MB Chromium install, rate
limiting, bot detection, flaky tests. Replaced it
with a simple textarea — paste 3-5 writing samples.
Simpler and works better.
setInterval in the service worker (see above)
Burned 2 days debugging why the extension worked
perfectly in testing and broke immediately in
production.
No rate limiting on LLM endpoints
Added slowapi after the architecture was complete.
Should have been there from day one. One user
hammering generate can burn your entire Gemini
quota in minutes:
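from fastapi import Request
from slowapi import Limiter
from slowapi.util import get_remote_address

# Keyed by client IP; slowapi also expects
# app.state.limiter = limiter on the FastAPI app
limiter = Limiter(key_func=get_remote_address)

@router.post("/generate-launch-bundle")
@limiter.limit("5/minute")
async def generate_bundle(request: Request, ...):
    ...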
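To show what a "5/minute" limit means mechanically, here is a stdlib-only sliding-window limiter. This is an illustrative sketch, not slowapi's implementation (its windowing strategy may differ), and the `SlidingWindowLimiter` class is hypothetical. The key is an arbitrary string, so it could be a user ID instead of an IP address.

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most max_calls per key within any window_seconds span."""

    def __init__(self, max_calls: int, window_seconds: float,
                 clock=time.monotonic):
        self.max_calls = max_calls
        self.window = window_seconds
        self.clock = clock                  # injectable for testing
        self.calls = defaultdict(deque)     # key -> recent call timestamps

    def allow(self, key: str) -> bool:
        now = self.clock()
        recent = self.calls[key]
        # Drop timestamps that have aged out of the window
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_calls:
            return False
        recent.append(now)
        return True

# Usage with a fake clock so the behaviour is deterministic
fake_now = [0.0]
limiter = SlidingWindowLimiter(5, 60.0, clock=lambda: fake_now[0])
burst = [limiter.allow("user-1") for _ in range(6)]
other_user = limiter.allow("user-2")
fake_now[0] = 60.0
after_window = limiter.allow("user-1")
```

The sixth call in the burst is rejected, a different key is unaffected, and the first key is allowed again once the window has passed. An in-memory limiter like this only works for a single process; multiple workers would need shared storage.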
Building LangGraph before the pipeline worked
Spent time learning LangGraph's API while the
basic pipeline wasn't even tested end-to-end.
Should have built a working custom loop first,
then migrated to LangGraph once the logic was
proven.
What I'd do differently
Build the boring version first. Polling before SSE.
Custom loop before LangGraph. Text paste before URL
scraping. The boring version is easier to debug and
gives you something to show people faster.
Rate limit everything that touches an LLM on day one.
Don't scrape when an API or simple text input works.