kefa
I built a “deterministic” LLM text rephraser with a validation pipeline - looking for architectural feedback

Most LLM apps that “rewrite text” are thin wrappers around an API call.
You send text → you get text back.
That works for demos.
It breaks down quickly when you want predictable behavior, quotas, abuse resistance, and quality guarantees without storing user data.
I built a prototype called AI Text Rephrase to explore a question:
Can you make an LLM text transformation service behave like a deterministic backend system instead of a probabilistic chatbot?
This post is about the architecture and trade-offs, not the product.
App: https://app.aitechfuture.net

**The core problem**
LLM rewriting is non-deterministic and unbounded by default:
• Output style drifts
• Sometimes it summarizes instead of rephrasing
• Sometimes it changes meaning
• Sometimes it ignores the requested tone
• Sometimes it returns explanations, lists, or commentary
• Sometimes it fails silently
If you expose this directly as an API, you get:
• inconsistent UX
• hard-to-debug failures
• quota abuse
• unpredictable cost
• no way to enforce “this is a rephrase, not a rewrite”
So instead of trusting the model, I wrapped it in a fixed pipeline with validation.

**The design principle**
The LLM is not trusted. It is treated as an unreliable subsystem whose output must pass validation before it is accepted.
Every request goes through this flow:

  1. Rate limit
  2. Tier identification (anonymous vs authenticated)
  3. Quota check
  4. Input validation (length bounds)
  5. Text preprocessing
  6. LLM inference (temperature = 0, single output)
  7. Semantic validation
  8. Tone adherence validation
  9. Response assembly
  10. Quota increment

If validation fails, inference is retried once; if it fails again, the request fails. No heuristics. No “looks good”. Pure thresholds.
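The retry-once control flow can be sketched roughly like this. `run_inference` and `validate` are hypothetical stand-ins for the actual LLM call and validation layer, not the app's real function names:

```python
def rephrase(text: str, tone: str, run_inference, validate, max_retries: int = 1):
    """Return a validated rephrase or raise; no heuristic fallbacks.

    run_inference and validate are injected stand-ins (assumptions) for
    the real LLM call (temperature=0, single output) and threshold checks.
    """
    last_reason = None
    for _attempt in range(max_retries + 1):
        candidate = run_inference(text, tone)
        ok, reason = validate(text, candidate, tone)  # pure thresholds, no "looks good"
        if ok:
            return candidate
        last_reason = reason
    raise ValueError(f"validation failed after {max_retries + 1} attempts: {last_reason}")
```

The key property: the function either returns output that passed every check or raises, so callers never see unvalidated model text.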

**The interesting part: the validation layer**
After inference, three checks happen:
1) Semantic similarity check

Using sentence embeddings:
cosine_similarity(original, rephrased) ≥ threshold
If meaning drifts → reject.
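A minimal sketch of this check, assuming the vectors come from a sentence-embedding model (e.g. sentence-transformers) and that 0.85 is just an example threshold, not the app's actual setting:

```python
import math

def cosine_similarity(u, v) -> float:
    """Plain cosine similarity over two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def passes_semantic_check(vec_original, vec_rephrased, threshold: float = 0.85) -> bool:
    # vec_original / vec_rephrased: sentence embeddings of the two texts.
    # If meaning drifts, cosine similarity drops below the threshold -> reject.
    return cosine_similarity(vec_original, vec_rephrased) >= threshold
```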
2) Tone adherence check
Simple linguistic heuristics:
• average word length
• formality markers
• structure patterns
If tone is wrong → reject.
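One possible shape for these heuristics; the marker lexicons, weights, and cutoff below are illustrative assumptions, not the app's real values:

```python
# Illustrative formality-marker lexicons (assumed, not the app's actual lists).
FORMAL_MARKERS = {"therefore", "moreover", "regarding", "furthermore"}
INFORMAL_MARKERS = {"gonna", "kinda", "stuff", "hey"}

def tone_score(text: str) -> float:
    """Score text on a crude formality axis: average word length plus
    formality markers up, informality markers down."""
    words = text.lower().split()
    if not words:
        return 0.0
    avg_len = sum(len(w) for w in words) / len(words)
    formal = sum(w.strip(".,!?") in FORMAL_MARKERS for w in words)
    informal = sum(w.strip(".,!?") in INFORMAL_MARKERS for w in words)
    return avg_len + 2.0 * formal - 2.0 * informal

def passes_tone_check(text: str, tone: str, cutoff: float = 5.0) -> bool:
    # If the score lands on the wrong side of the cutoff -> reject.
    return tone_score(text) >= cutoff if tone == "formal" else tone_score(text) < cutoff
```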
3) Output format check
Length ratio must be within bounds.
If the model summarizes or expands too much → reject.
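The format check reduces to a ratio guard; the 0.6–1.5 bounds here are assumed examples:

```python
def passes_length_check(original: str, rephrased: str,
                        lo: float = 0.6, hi: float = 1.5) -> bool:
    """Reject summaries (too short) and expansions (too long).
    Bounds are illustrative, not the app's actual thresholds."""
    if not original:
        return False
    ratio = len(rephrased) / len(original)
    return lo <= ratio <= hi
```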
This turned out to matter more than prompt engineering.

Deterministic constraints (hard rules)
These cannot change at runtime:
• very low temperature
• single output only
• fixed set of tones
• validation always enabled
• no dynamic prompt mutation
• max 1 retry on failure
The goal is to make the system behave predictably across requests.

Why SQLite?
This is controversial.
I intentionally used SQLite because:
• Single-file persistence
• No external DB
• Zero infrastructure overhead
• Prototype constraint: single instance, single writer
The database stores only:
• users
• sessions
• quota counters
• OTPs
It does not store:
• input text
• output text
• history
This forces the system to be stateless regarding content and simplifies privacy concerns.
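A sketch of what such a content-free schema might look like, with column names as illustrative assumptions: only identity, sessions, quotas, and OTPs, and nowhere to put input or output text:

```python
import sqlite3

# Assumed schema, not the app's actual DDL: note there is no column
# anywhere that could hold request or response text.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users    (id INTEGER PRIMARY KEY, email TEXT UNIQUE);
CREATE TABLE IF NOT EXISTS sessions (token TEXT PRIMARY KEY, user_id INTEGER,
                                     expires_at INTEGER);
CREATE TABLE IF NOT EXISTS quotas   (user_id INTEGER PRIMARY KEY, used INTEGER,
                                     period_start INTEGER);
CREATE TABLE IF NOT EXISTS otps     (email TEXT, code TEXT, expires_at INTEGER);
"""

def init_db(path: str = "app.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```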

API gateway before business logic
All cross-cutting concerns live before the pipeline:
• OTP authentication
• quota manager
• sliding window rate limiter
• request routing
The rephrase pipeline never knows who the user is.
It only receives validated input.
This separation made debugging and reasoning about failures much easier.
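A sliding-window limiter for a single instance can be as small as this; the limits are illustrative, and the in-memory dict is exactly what would need to move to shared storage (e.g. Redis) for multi-instance:

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """In-memory sliding-window rate limiter. Single-instance only:
    state lives in a process-local dict. Limits are illustrative."""

    def __init__(self, max_requests: int = 10, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self.hits = {}  # client_id -> deque of request timestamps

    def allow(self, client_id: str, now=None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.hits.setdefault(client_id, deque())
        # Evict timestamps that have slid out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.max_requests:
            return False
        q.append(now)
        return True
```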

Why a minimal frontend?
No framework. No build step.
Because this is not a frontend problem.
The goal was to reduce moving parts and make Docker deployment trivial.

What this design prevents
This architecture prevents:
• prompt injection via user text
• quota exhaustion by bots
• style drift
• meaning drift
• random output shapes
• cost spikes from multi-output retries
• storing user content for debugging
It behaves more like a compiler pipeline than an AI app.

Known limitations (by design)
• SQLite single-writer model
• No horizontal scaling
• In-memory embedding model load at startup
• No streaming responses
• No rephrase history
All intentional for this stage.

What I’m looking for feedback on
I’m not looking for UI or feature feedback.
I’d love input from people who’ve built LLM systems on:

  1. Is semantic + tone validation a reasonable guardrail, or would you do this differently?
  2. Is “retry once then fail” the right trade-off?
  3. Would you move any validation before inference?
  4. Is SQLite acceptable here given the constraints?
  5. Any architectural smell in the pipeline separation?
  6. How would you evolve this toward multi-instance without breaking the design?

You can try to break it here: https://app.aitechfuture.net

Would really appreciate thoughts from folks working on LLM infra and backend systems.
