
Biricik Biricik

The case for self-hosted AI in regulated industries: data sovereignty in 2026

The question every regulated-industry engineering manager is asking in 2026

If you run an engineering team in healthcare, finance, government, or defense, you've had a version of this conversation in the last six months: someone in the org wants to use a generative AI feature, and someone else in the org is asking where, exactly, the data goes when a user clicks "generate." If you can't answer that question with a single country, a single legal jurisdiction, and a single owner, you have a compliance problem you haven't solved yet.

This article is for those engineering managers and the developers who report to them. I'm Cemhan Biricik, founder of ZSky AI. We run a generative image and video platform on hardware we own, in a country we operate in, and we made that choice deliberately. Below is the case for self-hosted AI infrastructure in regulated industries in 2026, what's changed in the last twelve months, and how we built our stack to actually answer the question.

What's actually changed since 2024

Three things, all of them moving at the same time, all of them pointing toward "you'd better know where your data lives."

1. EU-US data transfer is back in court. The Data Privacy Framework that replaced Privacy Shield is facing its second major legal challenge. The arguments are familiar to anyone who lived through Schrems II: U.S. surveillance authority is broader than EU adequacy permits, and remedies for EU citizens are weaker than the Charter of Fundamental Rights requires. Counsel I've talked to are not betting on the DPF surviving the next two years intact. If you've built your AI feature on a multi-region cloud provider that routes EU user prompts through U.S. inference endpoints, you may be back to drafting Standard Contractual Clauses on top of Transfer Impact Assessments by the end of 2026.

2. State-level AI laws shipped in the U.S. Colorado's AI Act, the New York City employment law, the California ADMT regs, and Utah's disclosure requirements are now live or imminent. Each one carves out specific obligations for "high-risk" automated decision systems, and each one expects you to know what model is processing what data and where. The federal floor is still being negotiated, but the state floor is already a ceiling for non-compliant vendors.

3. Enterprise procurement woke up. I run sales conversations with enterprise customers, and the security questionnaires that used to ask about HTTPS and SSO now ask about model provenance, training data jurisdiction, inference geography, fine-tuning isolation, and whether the vendor has the legal authority to refuse a foreign government data request. If you can't answer those questions with one breath each, the deal slows down or dies.

The combined effect is that "we use OpenAI's API" or "we use Google Vertex" is no longer an answer to "where does our data live." It's the start of a 40-page Transfer Impact Assessment. And in regulated industries — healthcare, finance, government, defense — that assessment frequently concludes "you can't, actually."

The cloud-rented competitor problem

Let's name names, generously. The major generative AI APIs you might be tempted to wire into a regulated workflow break down something like this:

  • Multi-region U.S. cloud providers (OpenAI, Google, AWS Bedrock, Azure OpenAI): Your data probably stays in the region you pick, but the vendor's training pipeline, support staff, and "abuse review" subprocessors are global. The vendor's legal entity is U.S., which means a Foreign Intelligence Surveillance Act request can compel disclosure regardless of where the bytes physically sit. You can negotiate enterprise terms, but you can't negotiate FISA away.
  • Chinese-origin video models (Kling, Hailuo, Vidu, and several wrappers): Hosted on infrastructure in mainland China. Subject to the Cybersecurity Law, the Data Security Law, and Article 7 of the National Intelligence Law, which obligates organizations to assist state intelligence work. For any U.S. healthcare, finance, or government workload, this is a non-starter and your CISO will say so in five seconds.
  • European vendors: Generally better on residency, but most of the actual model weights they serve come from U.S. or Chinese labs, and the inference often happens on Microsoft Azure or AWS Frankfurt. The "European-ness" sometimes goes one layer deep.
  • Self-hosted open-weight models on your own hardware: Slowest path to ship, highest engineering cost, complete control over data and geography.

ZSky picked option four, and we did it from day one.

How we built it

We own seven NVIDIA RTX 5090 GPUs. They live in a facility in the United States. They are wired to networking we control. The inference stack is a Python and Go service mesh we wrote ourselves, fronted by a Cloudflare-protected web tier and a queue layer that pins jobs to a specific GPU pool based on workload type. When a user types a prompt into ZSky AI, the prompt goes to our web tier, gets queued, runs on our hardware, and the result comes back. There is no third-party inference API in that path. There is no overseas hop. There is no "we send your prompt to Hugging Face" footnote. The bytes stay on metal we bought.
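The routing idea in that paragraph is simple enough to sketch. Here is a minimal illustration of pinning jobs to a GPU pool by workload type, written in Python since that's part of our stack; the pool names, job shape, and function names are illustrative, not our production code:

```python
from collections import deque

# Illustrative sketch: jobs are pinned to a GPU pool based on workload
# type, so image and video work never contend for the same cards.
GPU_POOLS = {
    "image": deque(),  # queue of pending image jobs
    "video": deque(),  # queue of pending video jobs
}

def enqueue(job: dict) -> str:
    """Route a job to the pool matching its workload type."""
    pool = job.get("workload")
    if pool not in GPU_POOLS:
        raise ValueError(f"unknown workload type: {pool!r}")
    GPU_POOLS[pool].append(job)
    return pool

def next_job(pool: str):
    """A GPU worker in `pool` pulls its next job, FIFO."""
    queue = GPU_POOLS[pool]
    return queue.popleft() if queue else None
```

The real service mesh adds retries, priorities, and health checks, but the invariant is the one shown: a job's workload type fully determines which hardware pool it runs on, and nothing leaves that pool.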

This was not the cheap or easy choice. The math on owning seven RTX 5090s versus paying per-token to a hyperscaler is brutal in the short term. We made the call because we wanted the answer to "where does my data live" to be one sentence long.
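To make "brutal in the short term" concrete, here is a back-of-envelope break-even calculation. Every number below is an assumption for illustration; substitute your own hardware quote, colo costs, and API pricing:

```python
# Back-of-envelope break-even sketch. All figures are assumptions
# for illustration, not ZSky's actual costs.
gpu_count = 7
cost_per_gpu = 2500              # assumed street price per card, USD
infra_overhead = 10_000          # assumed racks, PSUs, networking, cooling, USD
monthly_opex = 1_500             # assumed power + bandwidth + colo, USD

api_cost_per_generation = 0.04   # assumed per-generation API price, USD
generations_per_month = 100_000  # assumed steady-state volume

capex = gpu_count * cost_per_gpu + infra_overhead
api_monthly = generations_per_month * api_cost_per_generation

# Months until owned hardware is cheaper than renting inference:
breakeven_months = capex / (api_monthly - monthly_opex)
print(f"capex ${capex:,}, API ${api_monthly:,.0f}/mo, "
      f"break-even ~{breakeven_months:.1f} months")
```

Under these assumptions you eat the capex up front and break even in roughly a year; below a certain volume, or with cheap enough tokens, you never do. The point of owning the hardware was never the unit economics alone.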

The compliance posture we run today

For the engineering managers reading this, here is what's already done and what's in flight, in the order an enterprise procurement team will ask:

Data residency. All inference happens on U.S.-based hardware we own and operate. Static assets and edge caching ride on Cloudflare's global CDN, which is contractually U.S.-headquartered with documented sub-processor lists. Customer prompts and generations are stored on U.S.-located object storage with at-rest encryption (AES-256) and per-tenant key isolation for paid plans.
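One common way to implement per-tenant key isolation is to derive each tenant's data key from a master key held in a KMS, so no tenant key is ever stored directly. Here is a minimal sketch of that pattern using HMAC-SHA256 from the standard library; it illustrates the general technique, not our actual key-management code:

```python
import hashlib
import hmac

# Assumption: in production the master key comes from a KMS/HSM,
# never a literal in source. This constant is a placeholder.
MASTER_KEY = b"replace-with-a-key-from-your-KMS"

def tenant_data_key(tenant_id: str) -> bytes:
    """Derive a 32-byte (AES-256-sized) key unique to this tenant."""
    return hmac.new(MASTER_KEY, tenant_id.encode(), hashlib.sha256).digest()

k1 = tenant_data_key("tenant-a")
k2 = tenant_data_key("tenant-b")
assert len(k1) == 32 and k1 != k2  # distinct 256-bit keys per tenant
```

Because derivation is deterministic, rotating the master key rotates every tenant key at once, and compromising one tenant's ciphertext tells you nothing about another's.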

GDPR. We honor data subject access requests within the 30-day window. Right to erasure is implemented in our admin tooling — when a user deletes their account, we run a cascading delete across the database, the object store, and the queue logs within 72 hours. We're a data controller for free-tier accounts and a data processor for enterprise accounts, with separate Data Processing Addenda available for each.
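The cascading delete described above is, at its core, a fan-out across every store that might hold the user's data, with a per-store confirmation you can attach to the erasure record. A minimal sketch, with hypothetical store interfaces standing in for the database, object store, and queue logs:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("erasure")

def erase_user(user_id: str, stores: list) -> dict:
    """Run a cascading delete and record which stores confirmed."""
    results = {}
    for store in stores:
        # Each store's delete returns how many records it removed.
        deleted = store.delete_user_data(user_id)
        results[store.name] = deleted
        log.info("erased %s records for %s from %s", deleted, user_id, store.name)
    return results

class FakeStore:
    """Stand-in for the database / object store / queue logs."""
    def __init__(self, name, count):
        self.name, self._count = name, count
    def delete_user_data(self, user_id):
        n, self._count = self._count, 0
        return n

report = erase_user("user-123", [FakeStore("postgres", 12),
                                 FakeStore("object-store", 3),
                                 FakeStore("queue-logs", 7)])
```

The returned report is what makes the 72-hour SLA auditable: you can show, per store, that the delete ran and what it touched.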

CCPA / CPRA. Same backbone as GDPR. The "Do Not Sell or Share" link is on every page footer; we don't sell user data, period, but the link exists because California law says it has to. Verified consumer requests get the 45-day response window the statute requires.

SOC 2 Type II. We're mid-audit. Our auditor is a top-25 firm. Trust Services Criteria covered: Security, Availability, Confidentiality. We expect the report to land in Q3 2026. Until then, we share our SOC 2 Type I report under NDA.

HIPAA / BAA. We execute Business Associate Agreements with healthcare customers who need them. The same hardware that serves consumer traffic can be carved into a tenant pool with audit logging enabled and PHI handling rules turned on. Email enterprise@zsky.ai if you need to start that conversation.

FedRAMP. Not yet. It's on the roadmap, but FedRAMP Moderate is a 12-to-18-month engagement for an organization our size, and we're being honest about the timeline rather than vaporware-ing the badge. Read our security page for the full and current list.

Export control / ITAR. Our hardware is U.S.-based, our staff is U.S.-based, and our model weights are not distributed outside the U.S. without an export classification. For defense customers asking about ITAR-controlled workflows, we can discuss tenant isolation in more detail under NDA.

The full enterprise compliance overview lives at zsky.ai/enterprise.html.

What "self-hosted" actually buys you

I want to be specific about this, because "self-hosted" gets used as a vibe word and not an architecture word.

It buys you a one-sentence data flow diagram. "User prompt enters U.S. web tier, hits U.S. queue, runs on U.S. GPU, returns to user." That's the whole sentence. A privacy lawyer can read that in three seconds and clear it. A privacy lawyer can read OpenAI's data flow in twenty minutes and need a coffee.

It buys you a single legal jurisdiction. When a customer asks "what court would hear a dispute about our data," we can answer with a state and a country. We're not stitching together a U.S. main agreement, an Irish DPA for EU users, a Singapore sub-processor for APAC traffic, and a "we may use other regions for capacity reasons" footnote.

It buys you the ability to refuse a request. If a foreign government asks for our customer prompts, we say no, and the legal basis for that no is "we are a U.S. company with no operations in your jurisdiction." Cloud-rented competitors with global presence have a much harder conversation. It's not theoretical — Microsoft has been litigating cross-border data requests in the U.S. courts for the better part of a decade.

It buys you control over the model lifecycle. We can pin a model version for a customer. We can run an old model alongside a new one for an A/B period. We can roll back. Hyperscaler APIs get silent updates; we get to decide when and how.
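The resolution logic behind pinning is worth spelling out: a customer's pin wins over an A/B assignment, which wins over the current default. A minimal sketch, with illustrative model names and customer IDs:

```python
# Illustrative per-customer model resolution. Names are hypothetical.
DEFAULT_MODEL = "videogen-v3.2"
CUSTOMER_PINS = {"acme-health": "videogen-v3.0"}   # pinned for stability
AB_ASSIGNMENTS = {"globex": "videogen-v3.3-rc1"}   # opted into an A/B run

def resolve_model(customer_id: str) -> str:
    """Pin beats A/B assignment beats default."""
    if customer_id in CUSTOMER_PINS:
        return CUSTOMER_PINS[customer_id]
    if customer_id in AB_ASSIGNMENTS:
        return AB_ASSIGNMENTS[customer_id]
    return DEFAULT_MODEL
```

A hyperscaler API collapses this whole table into "vendor decides"; self-hosting means the table is yours to edit.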

It buys you direct access when things break. We can ssh into the box, look at the logs, and fix it. We're not waiting for a status page to acknowledge an outage that started 40 minutes ago.

What it costs you is capital, calendar, and staff. Owning hardware means racking it, cooling it, monitoring it, and replacing it on a depreciation schedule. It means a 3 a.m. page when a fan dies. It means deciding which model variant to deploy and being responsible if you pick the wrong one. Self-hosted is not free — it's just honest about what you're paying for.

The honest comparison table

Here's the version I'd put in front of an engineering manager:

| Question | Hyperscaler API | Chinese-hosted API | ZSky self-hosted |
| --- | --- | --- | --- |
| Where does the prompt physically run? | Multi-region, vendor decides | Mainland China | One U.S. facility |
| Legal jurisdiction over the data | U.S. + sub-processors globally | PRC | U.S., one entity |
| Foreign government compulsion risk | FISA applies | National Intelligence Law applies | U.S. process only |
| BAA available for healthcare | Some providers, with caveats | No | Yes |
| GDPR data flow | Multi-hop, requires TIA | Effectively non-compliant | Single-hop, simple TIA |
| Model version control | Vendor controls | Vendor controls | Customer can pin |
| Cost at 1M generations/month | Linear per-token | Linear per-token | Fixed amortized |
| Time to ship a feature | Days | Days | Weeks (we did the weeks already) |

The trade-off is real. If you need to ship a chatbot prototype this week and your data is non-sensitive marketing content, hyperscaler APIs are still the right answer. If you need to ship a clinical decision support tool, an internal financial analysis assistant, a government records summarizer, or a defense contractor's intel briefing generator, the trade-off looks very different.

What I'd tell an engineering manager evaluating us

Three things.

One, ask for the data flow diagram in writing. Not the marketing version. The actual one your security team would put in a Transfer Impact Assessment. We'll send it. Every vendor you're evaluating should be able to send one in under 48 hours, and the ones that can't are telling you something.

Two, ask about the sub-processor list. Ours is short and lives on our security page. The shorter the list, the fewer the legal entities you have to evaluate.

Three, run a pilot on a non-sensitive workload first. Generate marketing illustrations, internal training graphics, communication aids. See how our latency and quality compare to whatever you're using now. If the pilot goes well, scale it into the regulated workflow with a BAA in place.

Where to start

If you're an engineer who wants to kick the tires on the platform, head to zsky.ai and use the free tier. 200 credits to start, 100 a day after that, 1080p video with audio, no credit card required. That's enough to run a real evaluation.

If you're an engineering manager who wants to start an enterprise conversation, our enterprise page has the SOC 2 Type I request form and the BAA intake. Email enterprise@zsky.ai with your use case and I or someone on my team will reply within a business day.

We chose the harder build because the easier one didn't actually solve the problem. If your industry has the same constraints, maybe the same choice is right for you.

— Cemhan Biricik, Founder, ZSky AI
