<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: IWWOMI</title>
    <description>The latest articles on DEV Community by IWWOMI (@iwwomi).</description>
    <link>https://dev.to/iwwomi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F13366%2Feaea4914-7682-4cec-b213-10dc458f6e29.png</url>
      <title>DEV Community: IWWOMI</title>
      <link>https://dev.to/iwwomi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/iwwomi"/>
    <language>en</language>
    <item>
      <title>The Synthesis Wall: Frontier AI Without Sending Your Data</title>
      <dc:creator>mirac kodat</dc:creator>
      <pubDate>Sat, 16 May 2026 11:35:18 +0000</pubDate>
      <link>https://dev.to/iwwomi/the-synthesis-wall-frontier-ai-without-sending-your-data-1k92</link>
      <guid>https://dev.to/iwwomi/the-synthesis-wall-frontier-ai-without-sending-your-data-1k92</guid>
      <description>&lt;p&gt;Every executive team has now had the same uncomfortable meeting. Engineering wants to use Claude for code review. Sales wants GPT-4 to draft proposals. Customer support has been quietly piping tickets into a chatbot through someone's personal API key. Legal walks in, asks one question — &lt;em&gt;"where is that data going?"&lt;/em&gt; — and the whole program freezes.&lt;/p&gt;

&lt;p&gt;The freeze is rational. The frontier models do live on someone else's infrastructure. Your customer records, M&amp;amp;A drafts, source code, and medical histories are exactly the data you cannot ship to a third party. Yet the productivity gap between teams that have integrated AI well and teams that haven't is now the difference between weeks and quarters.&lt;/p&gt;

&lt;p&gt;The usual answer — &lt;em&gt;"self-host an open model"&lt;/em&gt; — costs millions, requires a team you don't have, and ships you a model that benchmarks 30% behind whatever Anthropic released last week.&lt;/p&gt;

&lt;p&gt;There is a third path. You don't bring the AI inside your walls. &lt;strong&gt;You build a wall that stands between your data and the AI.&lt;/strong&gt; This piece is about that wall — what it is, what it costs, how it scales, and how to deploy one in 30 days without disrupting a single existing system.&lt;/p&gt;

&lt;h2&gt;The architecture in one sentence&lt;/h2&gt;

&lt;p&gt;A data sanitization layer is a programmable proxy that sits in the egress path between your applications and any external LLM provider. Outbound: it detects sensitive entities in a prompt, replaces them with reversible tokens, stores the mapping in your vault, and forwards only the tokenized prompt. Inbound: it receives the model's response, restores the original values from the vault, and delivers a complete answer to the user.&lt;/p&gt;

&lt;p&gt;The provider sees structure. You keep substance. The mapping never crosses your trust boundary, so the provider literally cannot leak what it never received — a property that matters enormously when your compliance officer asks for guarantees rather than promises.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key idea.&lt;/strong&gt; This is not a model. It is plumbing. The frontier model still does the thinking; you just changed what it gets to think &lt;em&gt;about&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;Why this is the right primitive&lt;/h2&gt;

&lt;p&gt;There are four common alternatives, and each has a fatal flaw.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Self-hosted open-weight models&lt;/strong&gt; (Llama 3.1 70B, Qwen 2.5, DeepSeek V3) sound appealing until you cost out the GPU bill, the model-ops headcount, and the gap between an open model and the closed frontier. Even the most generous self-hosting plans land at $30k–$120k per month for serious inference traffic, plus two to three MLOps FTEs. For most enterprises this is the worst of both worlds: high cost, lower capability. We dig into this trade-off more in our &lt;a href="https://dev.to/blog/ai-transforming-business"&gt;AI transformation playbook&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Provider data-processing agreements&lt;/strong&gt; (the "we promise we won't train on your data" page) are necessary but insufficient. They are contracts about behavior, not about technical capability. An attacker who breaches the provider, an insider with the wrong access, or a future model that accidentally memorizes your data — none of these are stopped by a DPA. Modern security thinking has moved decisively from &lt;em&gt;promise&lt;/em&gt; to &lt;em&gt;prove&lt;/em&gt;. See OWASP's &lt;a href="https://owasp.org/www-project-top-10-for-large-language-model-applications/" rel="noopener noreferrer"&gt;LLM Top 10&lt;/a&gt; for why provider trust alone is no longer acceptable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pure local redaction in the client&lt;/strong&gt; (regex stripping in the browser or SDK) is the right intuition applied at the wrong layer. Client-side anything is bypassable, inconsistent, and impossible to audit. A central layer enforces a single policy that every team inherits automatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Synthetic-data generation&lt;/strong&gt; sounds elegant — train a small model on synthetic versions of your real data — but it only solves &lt;em&gt;training&lt;/em&gt;. Inference still involves real user data, which is the actual problem.&lt;/p&gt;

&lt;p&gt;The sanitization layer is the only architecture that gives you frontier capability, central enforcement, and a clean audit trail at the same time.&lt;/p&gt;

&lt;h2&gt;What happens in a single request&lt;/h2&gt;

&lt;p&gt;Consider a sales operations analyst asking the AI to draft a follow-up email for a customer who placed a six-figure order. The prompt naturally contains a name, a customer ID, an order amount — the exact data that should never reach a public API in raw form.&lt;/p&gt;

&lt;p&gt;Behind the wall, in milliseconds:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Detection.&lt;/strong&gt; A named-entity recognition model scans the prompt and flags &lt;code&gt;Ahmet Yılmaz&lt;/code&gt; as &lt;code&gt;PERSON&lt;/code&gt;, &lt;code&gt;12345678901&lt;/code&gt; as &lt;code&gt;NATIONAL_ID&lt;/code&gt;, &lt;code&gt;$45,000&lt;/code&gt; as &lt;code&gt;MONETARY_AMOUNT&lt;/code&gt;. Detection runs through three layers: a transformer NER (multilingual, fine-tuned on your domain), regex rules (for things like IBANs, credit cards, IP addresses), and a domain dictionary (your product names, internal project codenames, partner companies).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tokenization.&lt;/strong&gt; Each sensitive value is replaced with a format-preserving placeholder: &lt;code&gt;[PERSON_1]&lt;/code&gt;, &lt;code&gt;[ID_1]&lt;/code&gt;, &lt;code&gt;[AMOUNT_1]&lt;/code&gt;. The original-to-token mapping goes into an encrypted vault inside your environment — typically AES-256 at rest with per-tenant keys via AWS KMS or HashiCorp Vault.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Policy check.&lt;/strong&gt; Before the request leaves your perimeter, the policy engine asks: &lt;em&gt;Is this user allowed to send &lt;code&gt;MONETARY_AMOUNT&lt;/code&gt; data to &lt;code&gt;gpt-4o&lt;/code&gt;?&lt;/em&gt; If yes, forward. If no, block, escalate, or downgrade to a smaller model with stricter constraints.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transmission.&lt;/strong&gt; Only the sanitized prompt goes to the provider. Your egress firewall can be configured to allow LLM provider IPs only via the wall — any direct call from an application becomes a policy violation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generation.&lt;/strong&gt; The model writes the email using tokens. It has no idea who Ahmet is or what he bought.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Restoration.&lt;/strong&gt; The response comes back. The wall walks the response text, replaces each token with its original value from the vault, and delivers the final output.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Logging.&lt;/strong&gt; Request metadata — user, timestamp, entity types involved, model used, policy applied, token count, cost — is written to your SIEM. The actual sensitive payload is never logged.&lt;/li&gt;
&lt;/ul&gt;
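&lt;p&gt;The round trip above can be sketched in a few lines. This is a minimal illustration, not production code: the two regex patterns stand in for the full NER-plus-rules detector, and the plain &lt;code&gt;vault&lt;/code&gt; dict stands in for the encrypted vault.&lt;/p&gt;

```python
import re

# Hypothetical patterns standing in for the NER + regex + dictionary layers.
PATTERNS = {
    "NATIONAL_ID": re.compile(r"\b\d{11}\b"),
    "AMOUNT": re.compile(r"\$[\d,]+"),
}

def sanitize(prompt, vault):
    """Replace each detected entity with a typed token and record the mapping."""
    counters = {}
    def swap_for(label):
        def swap(match):
            counters[label] = counters.get(label, 0) + 1
            token = f"[{label}_{counters[label]}]"
            vault[token] = match.group(0)  # original value never leaves your perimeter
            return token
        return swap
    for label, pattern in PATTERNS.items():
        prompt = pattern.sub(swap_for(label), prompt)
    return prompt

def restore(response, vault):
    """Walk the model's response and put the original values back."""
    for token, original in vault.items():
        response = response.replace(token, original)
    return response

vault = {}
outbound = sanitize("Customer 12345678901 placed a $45,000 order.", vault)
final = restore("Re: your [AMOUNT_1] order, ref [NATIONAL_ID_1].", vault)
```

&lt;p&gt;The provider only ever sees &lt;code&gt;outbound&lt;/code&gt;; the mapping in &lt;code&gt;vault&lt;/code&gt; stays inside your trust boundary.&lt;/p&gt;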

&lt;p&gt;End-to-end latency added by the wall: typically 80–250ms on a warm cache, less than the run-to-run variance in OpenAI's own response times on the same prompt. Detection and tokenization can be parallelized; the vault lookup on restoration is the hot path.&lt;/p&gt;

&lt;h2&gt;The six capabilities, properly scoped&lt;/h2&gt;

&lt;p&gt;A sanitization layer is six tightly coupled services behind one API.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Detection and classification.&lt;/strong&gt; Multilingual NER (we use a fine-tuned XLM-RoBERTa for Turkish/English) plus regex plus dictionaries. Critically: the detector has to be tunable per industry. A bank cares about IBANs and SWIFT codes. A hospital cares about ICD-10 codes and medication names. A law firm cares about case numbers and party names. Out-of-the-box PII detection is the starting point, not the destination.&lt;/p&gt;
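&lt;p&gt;A minimal sketch of how the three layers can merge their findings. The IBAN pattern is simplified, the NER layer is stubbed (the fine-tuned transformer is not reproducible here), and the domain terms are hypothetical.&lt;/p&gt;

```python
import re

# Layer 2: regex rules (Turkish IBAN shown; simplified, illustrative only).
REGEX_RULES = {"IBAN": re.compile(r"\bTR\d{24}\b")}

# Layer 3: domain dictionary -- your codenames, partners, product names.
DOMAIN_TERMS = {"Project Bosphorus": "PROJECT_CODENAME"}

def ner_layer(text):
    """Stub for layer 1, the transformer NER (the fine-tuned XLM-RoBERTa)."""
    return []  # would return (span, label) pairs

def detect(text):
    """Union of all three layers' findings."""
    findings = list(ner_layer(text))
    for label, rule in REGEX_RULES.items():
        findings.extend((m.group(0), label) for m in rule.finditer(text))
    for term, label in DOMAIN_TERMS.items():
        if term in text:
            findings.append((term, label))
    return findings

hits = detect("Wire to TR330006100519786457841326 re: Project Bosphorus.")
```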

&lt;p&gt;&lt;strong&gt;2. Tokenization and masking.&lt;/strong&gt; Format-preserving so the model still reasons correctly. &lt;code&gt;Ahmet Yılmaz&lt;/code&gt; becomes &lt;code&gt;[PERSON_1]&lt;/code&gt; (not &lt;code&gt;[REDACTED]&lt;/code&gt;) so the model knows it's a person and writes "Dear [PERSON_1]," in the right place. Numeric amounts become &lt;code&gt;[AMOUNT_1]&lt;/code&gt; with the right magnitude class so calculations still work. Dates become &lt;code&gt;[DATE_1]&lt;/code&gt; with preserved relative ordering.&lt;/p&gt;
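&lt;p&gt;One way to preserve magnitude is to bucket the amount by order of magnitude and carry the bucket in the token. The class names here are hypothetical, purely illustrative:&lt;/p&gt;

```python
import math

def magnitude_class(amount):
    """Bucket a monetary amount by order of magnitude (hypothetical class names)."""
    order = int(math.log10(max(amount, 1)))
    buckets = {0: "UNITS", 1: "TENS", 2: "HUNDREDS", 3: "THOUSANDS",
               4: "TENS_OF_THOUSANDS", 5: "HUNDREDS_OF_THOUSANDS", 6: "MILLIONS"}
    return buckets[min(order, 6)]

# The model sees the scale, never the figure.
token = f"[AMOUNT_1:{magnitude_class(45000)}]"
```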

&lt;p&gt;&lt;strong&gt;3. Policy engine.&lt;/strong&gt; Plain-English rules over (department, model, data class, action). &lt;em&gt;"Marketing can use gpt-4o for any data except &lt;code&gt;MEDICAL_RECORD&lt;/code&gt;. Engineering can use claude-3.5-sonnet for anything in the &lt;code&gt;PUBLIC_REPO&lt;/code&gt; class but must use the on-prem model for anything in &lt;code&gt;PRIVATE_REPO&lt;/code&gt;."&lt;/em&gt; These rules are versioned, reviewable in Git, and enforced before any external call. The engine ties closely to how we think about &lt;a href="https://dev.to/blog/secure-web-applications"&gt;security at the application layer&lt;/a&gt;.&lt;/p&gt;
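&lt;p&gt;A policy check of this shape reduces to a default-deny lookup over (department, model, data class). The rule set below is hypothetical:&lt;/p&gt;

```python
# Hypothetical rule set; in practice these live in Git-reviewed config.
RULES = [
    {"department": "marketing", "model": "gpt-4o",
     "deny_classes": {"MEDICAL_RECORD"}},
    {"department": "engineering", "model": "claude-3.5-sonnet",
     "deny_classes": {"PRIVATE_REPO"}},
]

def allowed(department, model, data_classes):
    """True only if a rule covers this (department, model) pair and
    none of the request's data classes is denied by it."""
    for rule in RULES:
        if rule["department"] == department and rule["model"] == model:
            return not set(data_classes).intersection(rule["deny_classes"])
    return False  # default-deny: no matching rule means no external call
```

&lt;p&gt;The default-deny return is the important design choice: a request to an unconfigured model is a policy violation, not a pass-through.&lt;/p&gt;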

&lt;p&gt;&lt;strong&gt;4. Audit and compliance.&lt;/strong&gt; Every request, every response, every policy decision — without the sensitive payload. This is what converts AI from a compliance liability into a defensible process under KVKK, GDPR, ISO 27001, and HIPAA. The audit log is what your legal team will demand in year two and never had in year one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Threat protection.&lt;/strong&gt; LLMs have a unique attack surface: prompt injection (embedded instructions in user data), jailbreaks (clever prompts that bypass safety), and exfiltration (asking the model to leak its system prompt or training data). The wall inspects both directions for these patterns — incoming prompts for injection attempts, outgoing responses for leaked secrets or non-compliant content.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Model router.&lt;/strong&gt; Different requests, different models. A simple summarization can go to &lt;code&gt;gpt-4o-mini&lt;/code&gt; at $0.15 per million input tokens. A high-stakes contract review goes to &lt;code&gt;claude-3.5-sonnet&lt;/code&gt; at $3 per million. The router optimizes for cost, latency, and capability per request — and gives you vendor independence as a side effect. We cover the cost-routing pattern in our &lt;a href="https://dev.to/blog/microservices-architecture"&gt;microservices architecture writeup&lt;/a&gt;.&lt;/p&gt;
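&lt;p&gt;A toy version of the router, using the two per-million-token prices quoted above; the task classes are hypothetical:&lt;/p&gt;

```python
# Task classes are hypothetical; prices are the input prices quoted in the text.
CATALOG = {
    "routine": ("gpt-4o-mini", 0.15),          # summarization, triage
    "high_stakes": ("claude-3.5-sonnet", 3.00),  # contract review
}

def route(task_class):
    """Pick a model for a task class; unknown classes fall back to routine."""
    model, _ = CATALOG.get(task_class, CATALOG["routine"])
    return model

def estimated_cost(task_class, input_tokens):
    """Estimated input cost in USD for a request of this size."""
    _, usd_per_million = CATALOG.get(task_class, CATALOG["routine"])
    return input_tokens * usd_per_million / 1_000_000
```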

&lt;h2&gt;How it scales to enterprise volume&lt;/h2&gt;

&lt;p&gt;The naive implementation — single Node process, in-memory vault, sequential detection — works for a pilot but caps out around 200 requests per second. Real enterprise traffic looks more like 5,000–50,000 RPS at peak. Four architectural decisions get you there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stateless detection workers behind a load balancer.&lt;/strong&gt; Detection and tokenization are CPU-bound but stateless once your models are loaded. Run them as a Kubernetes deployment of 8–32 pods, scale horizontally on CPU. Each pod holds the NER model in memory; cold-start is mitigated by readiness probes that wait for model load. We've covered this Kubernetes pattern in our &lt;a href="https://dev.to/blog/devops-best-practices"&gt;DevOps best practices guide&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vault as a managed service.&lt;/strong&gt; Don't build your own. Use Vault Enterprise, AWS Secrets Manager + KMS, or GCP Secret Manager. The vault is the most sensitive component in your architecture; making it bespoke is exactly the wrong place to save engineering time. Token-to-value lookups become a managed problem with audit logs you don't have to write.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cache the model client.&lt;/strong&gt; OpenAI-style HTTP/2 connections benefit hugely from connection pooling. Maintain a warm pool of 10–20 connections per provider per worker; the latency difference between cold-connect and warm is 200ms+ — bigger than your entire detection pipeline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Background restoration for large responses.&lt;/strong&gt; Streaming responses (server-sent events) need streaming restoration. As tokens arrive from the model, restore them on the fly and stream to the user. Do not buffer the full response; buffering forfeits the conversational latency advantage that made LLMs feel magical.&lt;/p&gt;
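&lt;p&gt;Streaming restoration has one subtle requirement: a placeholder can be split across two chunks, so the restorer must hold back any unfinished token prefix at a chunk boundary. A sketch, with a simplified token grammar:&lt;/p&gt;

```python
import re

TOKEN = re.compile(r"\[[A-Z]+_\d+\]")

def stream_restore(chunks, vault):
    """Restore tokens on the fly. A token like [PERSON_1] can arrive split
    across two chunks, so any trailing partial token is buffered."""
    buffer = ""
    def fill(text):
        return TOKEN.sub(lambda m: vault.get(m.group(0), m.group(0)), text)
    for chunk in chunks:
        buffer = buffer + chunk
        cut = buffer.rfind("[")
        if cut != -1 and "]" not in buffer[cut:]:
            # Possible unfinished token at the tail: emit up to it, keep the rest.
            emit, buffer = buffer[:cut], buffer[cut:]
        else:
            emit, buffer = buffer, ""
        yield fill(emit)
    if buffer:
        yield fill(buffer)

vault = {"[PERSON_1]": "Ahmet Yılmaz"}
out = "".join(stream_restore(["Dear [PER", "SON_1], thank you."], vault))
```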

&lt;p&gt;At 50,000 RPS, a properly architected wall adds roughly $0.0001 per request in your own infrastructure (against $0.001–$0.020 in model API cost), uses ~15ms of detection time, and gives you a single auditable choke point for every AI interaction in the organization. The cost ratio is so favorable that the wall pays for itself just on &lt;strong&gt;model cost optimization&lt;/strong&gt; — routing routine requests away from the flagship model is usually a 40–60% spend reduction. Database operations underneath this scale require their own discipline; we cover that in &lt;a href="https://dev.to/blog/database-optimization"&gt;database optimization for high-traffic apps&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;A 30-day deployment plan that actually works&lt;/h2&gt;

&lt;p&gt;Big-bang rollouts of new security layers fail. Here's how to ship a sanitization layer in one month without disrupting anything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Week 1 — Pick one workflow.&lt;/strong&gt; Choose the highest-pain, highest-leverage AI use case currently blocked by data sensitivity. Customer support triage. Contract clause extraction. Internal knowledge search over Confluence or Notion. Code review on private repos. One workflow, one team, one model. Define the entity classes that matter for this workflow and nothing else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Week 2 — Stand up the wall in shadow mode.&lt;/strong&gt; Deploy the layer in front of the chosen workflow but in &lt;em&gt;observe-only&lt;/em&gt; mode. It detects, logs, would-have-tokenized, but does not modify the request. You now have a real dataset showing exactly what sensitive entities your users send, in what frequency, in what context. This data is gold for the next step.&lt;/p&gt;
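&lt;p&gt;Shadow mode can be as simple as a flag that decides whether detection findings modify the request or only get logged. A sketch with stand-in detector and sanitizer callables:&lt;/p&gt;

```python
def process(prompt, mode, detector, sanitizer, log):
    """Shadow mode logs what would change; enforce mode actually changes it.
    detector and sanitizer are stand-ins for the pipeline described earlier."""
    findings = detector(prompt)
    if mode == "observe":
        log.append({"would_tokenize": findings, "modified": False})
        return prompt  # request passes through untouched
    log.append({"tokenized": findings, "modified": bool(findings)})
    return sanitizer(prompt)

log = []
detector = lambda p: ["PERSON"] if "Ahmet" in p else []
sanitizer = lambda p: p.replace("Ahmet", "[PERSON_1]")
passthrough = process("Ahmet placed an order.", "observe", detector, sanitizer, log)
enforced = process("Ahmet placed an order.", "enforce", detector, sanitizer, log)
```

&lt;p&gt;Flipping from week 2 to week 4 is then a one-line config change, with the log schema unchanged.&lt;/p&gt;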

&lt;p&gt;&lt;strong&gt;Week 3 — Tune the detection.&lt;/strong&gt; Based on shadow data, adjust the entity catalog. Add the domain-specific patterns the off-the-shelf model missed. Suppress the false positives (every team has at least one — for us it was repeatedly flagging "Stripe" as a person). Get the legal team to review the catalog: do they agree these are the categories that matter for KVKK Article 9 / GDPR Article 9 / your sector regulation?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Week 4 — Switch to enforce, then expand.&lt;/strong&gt; Flip from observe to enforce on the pilot workflow. Watch error rates for 48 hours. Review the audit log with legal and compliance. Once the pattern is validated, the second workflow plugs in with a fraction of the effort because the layer is already running, the policies are already written, and the team already trusts the audit trail.&lt;/p&gt;

&lt;p&gt;This phased approach is how every enterprise security primitive (WAFs, secrets managers, SIEM) actually rolled out — and how the sanitization layer should roll out too. The same pattern works for moving regulated workloads to the cloud, which we cover in our &lt;a href="https://dev.to/blog/cloud-migration-guide"&gt;cloud migration guide&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;The compliance picture, briefly&lt;/h2&gt;

&lt;p&gt;Under &lt;strong&gt;KVKK&lt;/strong&gt; (Turkish data protection), Article 9 governs cross-border transfer of personal data — which is exactly what happens every time someone sends a customer name to an API hosted in the US. The sanitization layer is the technical control that lets you argue, with audit evidence, that personal data did not cross the border because it never left your perimeter in identified form.&lt;/p&gt;

&lt;p&gt;Under &lt;strong&gt;GDPR&lt;/strong&gt;, the same logic applies via Article 44 (transfers to third countries). Pseudonymization is defined in Article 4(5) and endorsed as a risk-reducing safeguard in Articles 25 and 32. A sanitization layer is, by definition, pseudonymization with a properly secured re-identification key.&lt;/p&gt;

&lt;p&gt;Under &lt;strong&gt;ISO 27001&lt;/strong&gt;, Annex A 8.11 (data masking) and A 8.12 (data leakage prevention) are the technical controls auditors look for, and the wall directly implements both.&lt;/p&gt;

&lt;p&gt;Under &lt;strong&gt;HIPAA&lt;/strong&gt;, the same architecture functions as a de-identification layer consistent with the Safe Harbor method, with the vault holding the identifiers that would otherwise convert PHI exposure into a reportable incident.&lt;/p&gt;

&lt;p&gt;The same wall, configured per-industry, gives you a defensible posture across all four regimes. Your security team writes the policy once; the application teams inherit compliance automatically. This is a major reduction in audit overhead.&lt;/p&gt;

&lt;h2&gt;What this changes for IT&lt;/h2&gt;

&lt;p&gt;For technology leadership, the sanitization layer is more than a privacy tool — it's a &lt;strong&gt;strategic chokepoint&lt;/strong&gt;. Three implications matter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Single point of governance.&lt;/strong&gt; Instead of negotiating data-handling terms with every AI vendor and auditing every integration separately, IT manages one layer with one policy set. Every AI-touching application in the enterprise — from the internal LLM chatbot to the marketing copy generator to the customer support bot we built using &lt;a href="https://dev.to/blog/future-of-ecommerce"&gt;modern web architecture&lt;/a&gt; — inherits those controls automatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Clean separation of concerns.&lt;/strong&gt; Application teams build features. The wall enforces data protection. Security teams audit one boundary instead of dozens. Compliance teams have one log to review.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Observability into AI usage.&lt;/strong&gt; For the first time, IT can answer questions that today's ad-hoc AI use makes impossible: which teams are using AI most, on what data, at what cost, with what risk profile? Per-team token spend, per-model cost trends, policy violation rates — all emerge as a byproduct of doing the primary job.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The strategic frame.&lt;/strong&gt; Most enterprises will eventually have a single AI gateway. The question is whether you design it deliberately as a strategic asset, or accumulate it accidentally as ten different teams build ten different proxies. The first path takes a quarter and pays dividends forever. The second takes years and produces ten different audit liabilities.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;Common objections, briefly&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;"Won't sanitization hurt the model's accuracy?"&lt;/em&gt; In practice, no — modern LLMs reason perfectly well over structured placeholders as long as the placeholder preserves the &lt;em&gt;type&lt;/em&gt; of entity. Where accuracy does suffer is on natively unstructured tasks like sentiment analysis of customer feedback, where the customer's actual words matter. For those tasks you either accept the trade-off or run them through an on-prem model. The router can make this routing automatic.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"What about agents that need to take real actions on real data?"&lt;/em&gt; The wall is for the LLM call, not the tool call. When the model outputs &lt;code&gt;send_email_to([PERSON_1])&lt;/code&gt;, your application layer restores &lt;code&gt;[PERSON_1]&lt;/code&gt; to the real address before invoking the email tool. The agent's reasoning happens on tokens; the agent's actions happen on real data inside your perimeter.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"Can the provider deduce identity from context?"&lt;/em&gt; Possible in theory, mitigated in practice by entity rotation (the same person gets different tokens in different sessions), aggressive minimization (only send the prompt fragments that need to reach the model), and provider-side privacy policies. The threat model here is residual; the alternative is sending everything in clear text.&lt;/p&gt;

&lt;h2&gt;Ready to build one?&lt;/h2&gt;

&lt;p&gt;If your organization is currently sending raw customer data to public LLM APIs — and most are — you are accumulating compliance debt every day. If you're holding back AI adoption entirely because legal said no, you are losing the productivity race.&lt;/p&gt;

&lt;p&gt;The sanitization layer is the architectural primitive that lets you stop both. Your data stays home. The AI thinks anyway. Compliance gets a defensible answer. Engineering gets to ship.&lt;/p&gt;

&lt;p&gt;We've built sanitization layers for regulated industries — finance, healthcare, legal — across both Turkey and Europe. If you want to discuss what one would look like for your stack, your data, and your compliance regime, &lt;a href="https://dev.to/contact"&gt;get in touch&lt;/a&gt;. The first conversation costs you 30 minutes and clarifies whether this is the right primitive for your problem.&lt;/p&gt;

&lt;p&gt;Keep your data. Use the AI. Both can be true at once.&lt;/p&gt;

</description>
      <category>aiml</category>
      <category>datasanitizationforai</category>
      <category>aiprivacygateway</category>
      <category>llmdataleakageprevention</category>
    </item>
    <item>
      <title>From Public Cloud to Self-Hosted PaaS: A Migration Story</title>
      <dc:creator>mirac kodat</dc:creator>
      <pubDate>Sat, 16 May 2026 11:35:17 +0000</pubDate>
      <link>https://dev.to/iwwomi/from-public-cloud-to-self-hosted-paas-a-migration-story-311c</link>
      <guid>https://dev.to/iwwomi/from-public-cloud-to-self-hosted-paas-a-migration-story-311c</guid>
      <description>&lt;p&gt;We just moved a client's production workload off the public cloud and rebuilt their infrastructure from the ground up. The result is the kind of work that doesn't fit in a status update — so here is the full story, the trade-offs we accepted, and what every growing company should ask before they sign their next AWS bill.&lt;/p&gt;

&lt;h2&gt;The problem was simple — and familiar&lt;/h2&gt;

&lt;p&gt;Like many growing companies, our client was hosting their applications on a major cloud provider. Every month the same two questions came up at the leadership table.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why are we paying this much?&lt;/strong&gt; Cloud bills had quietly tripled over eighteen months. Most of the growth wasn't from new features or new customers — it was from "small" line items that nobody was watching: NAT gateway traffic, cross-AZ data transfer, idle managed-service buffers, and replicated storage that nobody had pruned in a year. The team had stopped reading the bill in detail because reading it didn't change anything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why does every customer require a different management process?&lt;/strong&gt; Each tenant had been onboarded as a one-off — a custom VPC, a custom database, a custom set of IAM roles. By customer number twelve, the operational surface was unmanageable. A configuration change for one customer meant a four-hour ticket for the platform team. There was no leverage in growth.&lt;/p&gt;

&lt;p&gt;The first problem was draining the budget. The second was draining the team's time. We saw both, and our recommendation was direct: it's time to leave.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The honest framing.&lt;/strong&gt; Public cloud isn't bad. It's a poor fit for a specific shape of workload — predictable traffic, multi-tenant by design, cost-sensitive, where the elasticity premium isn't paying for itself anymore. That described our client exactly.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;What we built&lt;/h2&gt;

&lt;p&gt;We built a multi-tenant &lt;strong&gt;Platform-as-a-Service&lt;/strong&gt; infrastructure on private VDS (Virtual Dedicated Server) instances, fully under our control. The shape of the system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A single control plane&lt;/strong&gt; that provisions tenants, runs deployments, and handles upgrades.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-tenant isolation&lt;/strong&gt; at the namespace level — each customer gets their own Kubernetes namespace, their own database schema, their own observability scope — but they share the underlying nodes for cost efficiency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Identity and policy&lt;/strong&gt; managed centrally with Keycloak, so the same access model applies whether a tenant has one user or fifty.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-service onboarding&lt;/strong&gt; through an internal portal — picking the right combination of services for a new customer is now a 10-minute form, not a week of platform-team coordination.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Closed management surface&lt;/strong&gt; — the orchestration layer is reachable only from a VPN-gated jump host. There is no public internet path into the things that control everything else.&lt;/li&gt;
&lt;/ul&gt;
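&lt;p&gt;For flavor, namespace-level isolation of the kind described above usually reduces to rendering a small set of per-tenant objects from one template. This sketch is illustrative and is not the client's actual configuration:&lt;/p&gt;

```python
def tenant_manifests(tenant, cpu_limit="4", memory_limit="8Gi"):
    """Render the two core per-tenant Kubernetes objects: a namespace and a
    resource quota. Names and limit values are illustrative defaults."""
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": f"tenant-{tenant}", "labels": {"tenant": tenant}},
    }
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "tenant-quota", "namespace": f"tenant-{tenant}"},
        "spec": {"hard": {"limits.cpu": cpu_limit, "limits.memory": memory_limit}},
    }
    return [namespace, quota]

manifests = tenant_manifests("acme")
```

&lt;p&gt;Because every tenant comes from the same template, the "custom VPC per customer" failure mode described earlier cannot recur.&lt;/p&gt;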

&lt;p&gt;This is closer to how Render, Fly.io, or Heroku built their platforms — except sized for a single company's needs and operated by the people who use it daily.&lt;/p&gt;

&lt;h2&gt;The outcomes that mattered&lt;/h2&gt;

&lt;p&gt;After three months in production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Monthly infrastructure costs dropped significantly.&lt;/strong&gt; We don't quote the exact percentage publicly, but the spend trajectory crossed below the old cloud baseline in the second month and kept going.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;New customer onboarding went from days to ten minutes.&lt;/strong&gt; What used to be a multi-team handoff is now a form on the internal portal.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;All environments became observable from a single point.&lt;/strong&gt; One Grafana, one Loki, one Tempo. The whole platform is legible from one screen.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The management layer was completely closed to public access.&lt;/strong&gt; No more public-internet-facing dashboards. No more "we'll set up SSO later." The reachable surface is dramatically smaller.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vendor lock-in was eliminated.&lt;/strong&gt; The same Helm charts, the same infrastructure-as-code definitions, will run on any provider with a VDS API. If we want to multi-home tomorrow, we can.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These outcomes don't show up on a marketing page. They show up on a finance report and in a platform team's morale.&lt;/p&gt;

&lt;h2&gt;The real message of this project&lt;/h2&gt;

&lt;p&gt;"Digital transformation" is too often discussed as adding new tools. A new dashboard, a new AI integration, a new observability product. Tools matter, but they're the visible 10%. What creates lasting impact is whether the foundation those tools sit on is built right.&lt;/p&gt;

&lt;p&gt;A foundation that will still scale three years from now. A foundation that will still stay secure when a key team member leaves. A foundation that's sustainable on whatever budget you have in 2028 — not just whatever you have today.&lt;/p&gt;

&lt;p&gt;This is the layer most companies under-invest in until it breaks. By the time it breaks, the cost of fixing it is much higher than the cost of building it right the first time.&lt;/p&gt;

&lt;h2&gt;Where our work begins&lt;/h2&gt;

&lt;p&gt;This is where IWWOMI's work starts. We don't just build applications or AI solutions — we design and deploy the entire infrastructure that keeps them running. From the data layer to deployment pipelines, from security to observability. The same discipline that makes our &lt;a href="https://dev.to/blog/synthesis-wall-data-sanitization-for-ai"&gt;AI transformation work&lt;/a&gt; production-grade is what makes our infrastructure work survive contact with growth.&lt;/p&gt;

&lt;p&gt;Some adjacent reading from our team:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/cloud-migration-guide"&gt;Cloud Migration Strategy: A Complete Guide&lt;/a&gt; — the framework we use to decide what moves and what stays.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/devops-best-practices"&gt;DevOps Best Practices for Modern Development Teams&lt;/a&gt; — the operational practices behind self-hosted at scale.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/microservices-architecture"&gt;Microservices Architecture: When and How to Use It&lt;/a&gt; — the architectural shape that makes multi-tenant possible.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/database-optimization"&gt;Database Optimization Techniques for High-Traffic Apps&lt;/a&gt; — what we tune once the database becomes the bottleneck.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;When to consider this&lt;/h2&gt;

&lt;p&gt;You probably should not exit the public cloud if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your traffic is genuinely spiky and you need elasticity that you could not justify building and staffing yourselves.&lt;/li&gt;
&lt;li&gt;You're a small team without operational depth, and one of the founders is on call.&lt;/li&gt;
&lt;li&gt;You depend on managed services (RDS, Aurora, DynamoDB) in ways that would take a year to replicate.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You probably should consider it if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your traffic profile is predictable and your cloud bill is growing faster than your customer base.&lt;/li&gt;
&lt;li&gt;You serve multiple tenants that are structurally similar.&lt;/li&gt;
&lt;li&gt;Compliance (KVKK, GDPR, sector-specific) keeps adding requirements you find hard to satisfy on shared infrastructure.&lt;/li&gt;
&lt;li&gt;A single API price change from a single provider could meaningfully hurt your margins.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The third option — and often the right one — isn't to fully exit. It's to build a self-hosted core for the steady-state workload and keep a small cloud footprint for the burst. The economics of that hybrid shape usually beat either pure cloud or pure on-prem.&lt;/p&gt;

&lt;h2&gt;The technical deep-dive&lt;/h2&gt;

&lt;p&gt;For the full technical write-up — the architecture diagrams, the trade-offs we accepted, what we'd do differently, and the specific tooling choices (Kubernetes, Cilium, Longhorn, Tempo, Loki, Argo CD) — read our team lead Abdullah Taş's piece on Medium:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://abdullahtas.medium.com/s%C4%B1f%C4%B1rdan-productiona-bir-self-hosted-paas-mimarisi-kurma-hikayemiz-434eeecd2cf3" rel="noopener noreferrer"&gt;From Zero to Production: The Story of Building a Self-Hosted PaaS Architecture →&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Ready to talk?&lt;/h2&gt;

&lt;p&gt;If your infrastructure is struggling to keep up with growing workloads, or if your cloud bill has stopped being sustainable, this is exactly the kind of work we do. The first conversation is a 30-minute call where we look at your current setup, your trajectory, and what a different shape would mean for you. No commitment, no slide deck, just an honest read.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/contact"&gt;Get in touch&lt;/a&gt; — we'd love to hear what you're building.&lt;/p&gt;

</description>
      <category>cloud</category>
      <category>selfhostedpaas</category>
      <category>cloudexit</category>
      <category>awstoprivatecloud</category>
    </item>
  </channel>
</rss>
