Valeria Bernhardt

Posted on Jul 2

Choosing an EU-Hosted Inference Provider: A 2026 Comparison

#ai #opensource #eu #llm

European teams building with LLMs face a question that did not exist a few years ago: where do you actually run inference? US options fall into two camps, proprietary-model providers like OpenAI, and recently open-source inference platforms like Together AI or Fireworks that serve more affordable open-weight models. Both are fast and competitively priced, but routing sensitive data through US infrastructure raises GDPR and data-residency concerns. A growing set of European providers now offer an alternative for running open-source models inside the EU, but they differ widely in focus, pricing model, and what they actually host.

This article compares the main options for running open-source model inference inside the EU, what each is best at, and where each falls short.

Quick answer:
If you want a one-line version: for serverless, pay-per-token inference on open-source models with EU data residency, the most direct options are Lyceum, Scaleway, IONOS, and STACKIT. For a managed, single-vendor model family, Mistral. The rest of this article explains the trade-offs.

What to look for in an EU inference provider

Before comparing vendors, the criteria that actually matter for an inference workload:
Data residency: Are inference requests processed inside the EU, and is that contractually guaranteed (DPA/AVV)?
Pricing model: Serverless pay-per-token (you pay only for the tokens you process) vs. a dedicated endpoint with reserved capacity at a fixed rate for steady high load.
Model choice: A broad catalog of open-source models vs. a single vendor's own models.
Integration effort: OpenAI-compatible APIs let you switch by changing an endpoint; proprietary APIs require a rewrite.
Scaling: Can you go from a quick test to production volume without changing providers, and is there a dedicated-endpoint option for steady high load?

The providers

1. Lyceum:

EU-hosted inference for open-source models
A Berlin-based inference cloud offering serverless, pay-per-token access to open-source models through an OpenAI-compatible API, with GPU VMs and clusters available for training on the same platform.

Pros:
✅OpenAI-compatible API: migrating from OpenAI or another provider is largely a change of base URL and key, no rewrite
Broad,
✅current open-source model catalog (DeepSeek, GLM, Qwen, Llama, Kimi and others) with transparent per-token pricing (from $0.13/1M tokens)
✅Serverless smart routing: requests are routed to the best available capacity automatically
✅EU data residency for European workloads, GDPR-compliant with DPA/AVV available, plus zero-retention mode (prompts and outputs not stored)
✅Pay-per-token with no base fees, no minimum commitment, scale-to-zero, prompt caching included
✅ can scale into training (GPU VMs and clusters) on the same platform if needed
✅Close, hands-on customer support (direct access to the team, e.g. via a shared Slack channel), rather than ticket queues

Cons:
❌Dedicated inference endpoints are still in beta (generally available planned)
Best for: EU teams that want low-cost, drop-in inference on open-source models, with the option to scale into training, without managing infrastructure.

2. Scaleway:

French cloud with serverless inference
A French provider offering serverless inference (Generative APIs) and dedicated deployments, hosted in its Paris data centers, alongside a broad cloud and GPU portfolio.

Pros:
✅Serverless, pay-per-token inference with an OpenAI-compatible API **(from €0.15/1M for gpt-oss-120b)
✅Reasonably current catalog** including GLM and recent Qwen models, hosted in France, GDPR-compliant, no data retention
✅Low first-token latency (sub-200ms reported in Europe), free tier on the first 1M tokens
✅Backed by a large, established European cloud with GPU instances and clusters

Cons:
❌On a like-for-like model, pricing can run higher than the cheapest options (e.g. Llama 3.3 70B at €0.90/1M in and out); worth comparing per-model rates
❌Inference is one part of a very broad cloud product, which can add complexity
Best for: Teams wanting a serverless EU inference API from an established French cloud, especially if already in the Scaleway ecosystem.

3. IONOS:

German AI Model Hub
The AI Model Hub from IONOS, one of Germany's largest hosters, serves open-source models from German data centers via an OpenAI-compatible API, with integrated RAG and vector-database features.

Pros:
✅All models hosted in Germany, GDPR-compliant, AVV/DPA available, no US CLOUD Act exposure
✅OpenAI-compatible API plus built-in vector database and RAG without extra setup
✅Pay-per-token, no minimum commitment, plus a free ionosGPT chat interface for non-technical users
✅Trusted, established German provider

Cons:
❌Noticeably more expensive per token than the cheapest EU options (e.g. Llama 3.3 70B at €0.65/1M in and out)
❌Small, older model catalog (around six models: Llama 3.1/3.3, Mistral Nemo/Small, gpt-oss-120b); no current GLM, DeepSeek, Qwen or Kimi
Best for: German companies and SMBs that want a trusted, GDPR-compliant API with RAG built in, and don't need the broadest model selection.

4. STACKIT:

Schwarz Group's sovereign cloud
STACKIT AI Model Serving is the inference service of STACKIT, the cloud arm of the Schwarz Group (Lidl, Kaufland). It serves open-source LLMs via an OpenAI-compatible API from German data centers, with a strong data-sovereignty and compliance focus.

Pros:
✅Hosted in German data centers, data not stored or used for training, strong compliance (ISO 27001, C5, SOC 2)
✅OpenAI-compatible API, token-based billing (most models €0.45/1M in, €0.65/1M out)
✅Backed by a large European group, attractive for regulated DACH enterprises

Cons:
❌Limited catalog (around six text models: Qwen3-VL 235B, Qwen3.6 27B, Llama 3.3 70B, Gemma 3 27B, GPT-OSS 120B/20B); no GLM, DeepSeek or Kimi
❌Still relatively young as a service, with a narrower feature set than dedicated inference platforms
❌ Pricing higher than the cheapest options for comparable models
Best for: Regulated DACH enterprises that prioritize sovereignty and compliance certifications over model breadth.

5. Mistral AI:

Europe's model champion
A French AI lab building its own models, offered through its La Plateforme API. Many of its models are also released as open weights (Apache 2.0), so they can be self-hosted.

Pros:
✅Strong, well-regarded in-house models, EU-hosted with European data residency
✅Competitive pricing, especially on output (Large 3 around $2/$6 per 1M in/out, cheaper than GPT/Claude on output)
✅Many models open-weight under Apache 2.0, so you can self-host if needed
✅Free experimentation tier on La Plateforme, polished API and tooling

Cons:
❌You can only run Mistral's own models; no DeepSeek, GLM, Qwen, Kimi etc.
❌128K context window across models, smaller than the 1M offered by some competitors
❌Model vendor rather than a neutral multi-model inference platform
Best for: Teams happy to standardize on Mistral's own models specifically.

Provider	Pricing model	Model choice	Best for
Lyceum	Pay-per-token	Broad open-source	All-in-one: inference, smart routing, dedicated endpoints (beta) and training, EU-hosted
Scaleway	Pay-per-token / dedicated	Good	Serverless EU inference from an established French cloud
IONOS	Pay-per-token	Limited	German SMBs wanting GDPR-compliant API with built-in RAG
STACKIT	Pay-per-token	Very limited	Regulated DACH enterprises prioritizing sovereignty/certifications
Mistral	Pay-per-token	Mistral only	Teams standardizing on Mistral's own models

How to decide

The honest answer is that “best” depends on your workload:
You want a broad, current open-source catalog at a low per-token price, with data in the EU and minimal setup: Lyceum is the most direct fit, a pay-per-token, OpenAI-compatible platform with smart routing across its own capacity, so you get availability without managing a separate router.
You want one vendor's models and don't need flexibility: Mistral.
You're a German SMB prioritizing a trusted brand with built-in RAG: IONOS.

The most important practical step: most of these offer free credits or trials. Pick two or three that match your workload, run your actual prompts through each, and compare real cost and latency on your own data before committing.

Is there a provider you'd recommend that's missing here? And what tips the decision for you, price, model choice, or compliance?

DEV Community