<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Valeria Bernhardt</title>
    <description>The latest articles on DEV Community by Valeria Bernhardt (@valeria_bernhardt_c9473b7).</description>
    <link>https://dev.to/valeria_bernhardt_c9473b7</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4001862%2F893d146f-7dab-46fe-8350-9dd57be4ab00.png</url>
      <title>DEV Community: Valeria Bernhardt</title>
      <link>https://dev.to/valeria_bernhardt_c9473b7</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/valeria_bernhardt_c9473b7"/>
    <language>en</language>
    <item>
      <title>Choosing an EU-Hosted Inference Provider: A 2026 Comparison</title>
      <dc:creator>Valeria Bernhardt</dc:creator>
      <pubDate>Thu, 02 Jul 2026 09:59:26 +0000</pubDate>
      <link>https://dev.to/valeria_bernhardt_c9473b7/choosing-an-eu-hosted-inference-provider-a-2026-comparison-5d5h</link>
      <guid>https://dev.to/valeria_bernhardt_c9473b7/choosing-an-eu-hosted-inference-provider-a-2026-comparison-5d5h</guid>
      <description>&lt;p&gt;&lt;strong&gt;European teams building with LLMs face a question that did not exist a few years ago: where do you actually run inference?&lt;/strong&gt; US options fall into two camps, proprietary-model providers like OpenAI, and recently open-source inference platforms like Together AI or Fireworks that serve more affordable open-weight models. Both are fast and competitively priced, but routing sensitive data through US infrastructure raises GDPR and data-residency concerns. A growing set of European providers now offer an alternative for running open-source models inside the EU, but they differ widely in focus, pricing model, and what they actually host.&lt;/p&gt;

&lt;p&gt;This article &lt;strong&gt;compares the main options&lt;/strong&gt; for running open-source model inference &lt;strong&gt;inside the EU&lt;/strong&gt;, &lt;strong&gt;what each is best at&lt;/strong&gt;, and &lt;strong&gt;where each falls short&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quick answer:&lt;/strong&gt;&lt;br&gt;
If you want a one-line version: for serverless, pay-per-token inference on open-source models with EU data residency, the most direct options are Lyceum, Scaleway, IONOS, and STACKIT. For a managed, single-vendor model family, Mistral. The rest of this article explains the trade-offs.&lt;/p&gt;




&lt;h2&gt;
  
  
  What to look for in an EU inference provider
&lt;/h2&gt;

&lt;p&gt;Before comparing vendors, the criteria that actually matter for an inference workload:&lt;br&gt;
&lt;strong&gt;Data residency&lt;/strong&gt;: Are inference requests processed inside the EU, and is that contractually guaranteed (DPA/AVV)?&lt;br&gt;
&lt;strong&gt;Pricing model&lt;/strong&gt;: Serverless pay-per-token (you pay only for the tokens you process) vs. a dedicated endpoint with reserved capacity at a fixed rate for steady high load.&lt;br&gt;
&lt;strong&gt;Model choice&lt;/strong&gt;: A broad catalog of open-source models vs. a single vendor's own models.&lt;br&gt;
&lt;strong&gt;Integration effort&lt;/strong&gt;: OpenAI-compatible APIs let you switch by changing an endpoint; proprietary APIs require a rewrite.&lt;br&gt;
&lt;strong&gt;Scaling&lt;/strong&gt;: Can you go from a quick test to production volume without changing providers, and is there a dedicated-endpoint option for steady high load?&lt;/p&gt;




&lt;h2&gt;
  
  
  The providers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Lyceum:
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;EU-hosted inference for open-source models&lt;/em&gt;&lt;br&gt;
A Berlin-based inference cloud offering serverless, pay-per-token access to open-source models through an OpenAI-compatible API, with GPU VMs and clusters available for training on the same platform.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;br&gt;
✅&lt;strong&gt;OpenAI-compatible API&lt;/strong&gt;: migrating from OpenAI or another provider is largely a change of base URL and key, no rewrite&lt;br&gt;
Broad, &lt;br&gt;
✅&lt;strong&gt;current open-source model catalog&lt;/strong&gt; (DeepSeek, GLM, Qwen, Llama, Kimi and others) with transparent per-token pricing (from $0.13/1M tokens)&lt;br&gt;
✅&lt;strong&gt;Serverless smart routing&lt;/strong&gt;: requests are routed to the best available capacity automatically&lt;br&gt;
✅&lt;strong&gt;EU data residency&lt;/strong&gt; for European workloads, GDPR-compliant with DPA/AVV available, plus zero-retention mode (prompts and outputs not stored)&lt;br&gt;
✅&lt;strong&gt;Pay-per-token&lt;/strong&gt; with no base fees, no minimum commitment, scale-to-zero, &lt;strong&gt;prompt caching included&lt;/strong&gt;&lt;br&gt;
✅ can &lt;strong&gt;scale into training&lt;/strong&gt; (GPU VMs and clusters) on the same platform if needed&lt;br&gt;
✅Close, &lt;strong&gt;hands-on customer support&lt;/strong&gt; (direct access to the team, e.g. via a shared Slack channel), rather than ticket queues&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;br&gt;
❌&lt;strong&gt;Dedicated inference endpoints&lt;/strong&gt; are still in &lt;strong&gt;beta&lt;/strong&gt; (generally available planned)&lt;br&gt;
&lt;strong&gt;Best for:&lt;/strong&gt; EU teams that want low-cost, drop-in inference on open-source models, with the option to scale into training, without managing infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Scaleway:
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;French cloud with serverless inference&lt;/em&gt;&lt;br&gt;
A French provider offering serverless inference (Generative APIs) and dedicated deployments, hosted in its Paris data centers, alongside a broad cloud and GPU portfolio.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;br&gt;
✅&lt;strong&gt;Serverless, pay-per-token inference with an OpenAI-compatible API **(from €0.15/1M for gpt-oss-120b)&lt;br&gt;
✅&lt;/strong&gt;Reasonably current catalog** including GLM and recent Qwen models, hosted in France, GDPR-compliant, no data retention&lt;br&gt;
✅&lt;strong&gt;Low first-token latency&lt;/strong&gt; (sub-200ms reported in Europe), free tier on the first 1M tokens&lt;br&gt;
✅&lt;strong&gt;Backed by a large, established European cloud&lt;/strong&gt; with GPU instances and clusters&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;br&gt;
❌On a like-for-like model, &lt;strong&gt;pricing can run higher than the cheapest options&lt;/strong&gt; (e.g. Llama 3.3 70B at €0.90/1M in and out); worth comparing per-model rates&lt;br&gt;
❌Inference is one part of a very broad cloud product, which &lt;strong&gt;can add complexity&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Best for:&lt;/strong&gt; Teams wanting a serverless EU inference API from an established French cloud, especially if already in the Scaleway ecosystem.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. IONOS:
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;German AI Model Hub&lt;/em&gt;&lt;br&gt;
The AI Model Hub from IONOS, one of Germany's largest hosters, serves open-source models from German data centers via an OpenAI-compatible API, with integrated RAG and vector-database features.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;br&gt;
✅All &lt;strong&gt;models hosted in Germany&lt;/strong&gt;, GDPR-compliant, AVV/DPA available, no US CLOUD Act exposure&lt;br&gt;
✅&lt;strong&gt;OpenAI-compatible API&lt;/strong&gt; plus built-in vector database and RAG without extra setup&lt;br&gt;
✅&lt;strong&gt;Pay-per-token&lt;/strong&gt;, no minimum commitment, plus a free ionosGPT chat interface for non-technical users&lt;br&gt;
✅&lt;strong&gt;Trusted&lt;/strong&gt;, established German provider&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;br&gt;
❌Noticeably &lt;strong&gt;more expensive per token&lt;/strong&gt; than the cheapest EU options (e.g. Llama 3.3 70B at €0.65/1M in and out)&lt;br&gt;
❌&lt;strong&gt;Small, older model catalog&lt;/strong&gt; (around six models: Llama 3.1/3.3, Mistral Nemo/Small, gpt-oss-120b); no current GLM, DeepSeek, Qwen or Kimi&lt;br&gt;
&lt;strong&gt;Best for:&lt;/strong&gt; German companies and SMBs that want a trusted, GDPR-compliant API with RAG built in, and don't need the broadest model selection.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. STACKIT:
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Schwarz Group's sovereign cloud&lt;/em&gt;&lt;br&gt;
STACKIT AI Model Serving is the inference service of STACKIT, the cloud arm of the Schwarz Group (Lidl, Kaufland). It serves open-source LLMs via an OpenAI-compatible API from German data centers, with a strong data-sovereignty and compliance focus.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;br&gt;
✅&lt;strong&gt;Hosted in German data centers&lt;/strong&gt;, data not stored or used for training, &lt;strong&gt;strong compliance&lt;/strong&gt; (ISO 27001, C5, SOC 2)&lt;br&gt;
✅&lt;strong&gt;OpenAI-compatible API, token-based billing&lt;/strong&gt; (most models €0.45/1M in, €0.65/1M out)&lt;br&gt;
✅&lt;strong&gt;Backed by a large European group&lt;/strong&gt;, attractive for regulated DACH enterprises&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;br&gt;
❌&lt;strong&gt;Limited catalog&lt;/strong&gt; (around six text models: Qwen3-VL 235B, Qwen3.6 27B, Llama 3.3 70B, Gemma 3 27B, GPT-OSS 120B/20B); no GLM, DeepSeek or Kimi&lt;br&gt;
❌Still relatively young as a service, with a &lt;strong&gt;narrower feature set&lt;/strong&gt; than dedicated inference platforms&lt;br&gt;
❌ &lt;strong&gt;Pricing higher&lt;/strong&gt; than the cheapest options for comparable models&lt;br&gt;
&lt;strong&gt;Best for:&lt;/strong&gt; Regulated DACH enterprises that prioritize sovereignty and compliance certifications over model breadth.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Mistral AI:
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Europe's model champion&lt;/em&gt;&lt;br&gt;
A French AI lab building its own models, offered through its La Plateforme API. Many of its models are also released as open weights (Apache 2.0), so they can be self-hosted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;br&gt;
✅Strong, well-regarded &lt;strong&gt;in-house models&lt;/strong&gt;, &lt;strong&gt;EU-hosted&lt;/strong&gt; with European data residency&lt;br&gt;
✅&lt;strong&gt;Competitive pricing&lt;/strong&gt;, especially on output (Large 3 around $2/$6 per 1M in/out, cheaper than GPT/Claude on output)&lt;br&gt;
✅Many models open-weight under Apache 2.0, so you &lt;strong&gt;can self-host if needed&lt;/strong&gt;&lt;br&gt;
✅&lt;strong&gt;Free experimentation&lt;/strong&gt; tier on La Plateforme, polished API and tooling&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;br&gt;
❌You can &lt;strong&gt;only run Mistral's own models&lt;/strong&gt;; no DeepSeek, GLM, Qwen, Kimi etc.&lt;br&gt;
❌&lt;strong&gt;128K context window across models&lt;/strong&gt;, smaller than the 1M offered by some competitors&lt;br&gt;
❌&lt;strong&gt;Model vendor&lt;/strong&gt; rather than a neutral multi-model inference platform&lt;br&gt;
&lt;strong&gt;Best for:&lt;/strong&gt; Teams happy to standardize on Mistral's own models specifically.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Pricing model&lt;/th&gt;
&lt;th&gt;Model choice&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Lyceum&lt;/td&gt;
&lt;td&gt;Pay-per-token&lt;/td&gt;
&lt;td&gt;Broad open-source&lt;/td&gt;
&lt;td&gt;All-in-one: inference, smart routing, dedicated endpoints (beta) and training, EU-hosted&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scaleway&lt;/td&gt;
&lt;td&gt;Pay-per-token / dedicated&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Serverless EU inference from an established French cloud&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IONOS&lt;/td&gt;
&lt;td&gt;Pay-per-token&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;German SMBs wanting GDPR-compliant API with built-in RAG&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;STACKIT&lt;/td&gt;
&lt;td&gt;Pay-per-token&lt;/td&gt;
&lt;td&gt;Very limited&lt;/td&gt;
&lt;td&gt;Regulated DACH enterprises prioritizing sovereignty/certifications&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mistral&lt;/td&gt;
&lt;td&gt;Pay-per-token&lt;/td&gt;
&lt;td&gt;Mistral only&lt;/td&gt;
&lt;td&gt;Teams standardizing on Mistral's own models&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  How to decide
&lt;/h2&gt;

&lt;p&gt;The honest answer is that &lt;strong&gt;“best” depends on your workload&lt;/strong&gt;:&lt;br&gt;
You want a &lt;strong&gt;broad, current open-source catalog at a low per-token price&lt;/strong&gt;, with data in the EU and minimal setup: &lt;strong&gt;Lyceum&lt;/strong&gt; is the most direct fit, a pay-per-token, OpenAI-compatible platform with smart routing across its own capacity, so you get availability without managing a separate router.&lt;br&gt;
You want &lt;strong&gt;one vendor's models&lt;/strong&gt; and don't need flexibility: &lt;strong&gt;Mistral&lt;/strong&gt;.&lt;br&gt;
You're a German SMB prioritizing a &lt;strong&gt;trusted brand&lt;/strong&gt; with built-in RAG: &lt;strong&gt;IONOS&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The most important practical step:&lt;/strong&gt; most of these offer free credits or trials. Pick two or three that match your workload, run your actual prompts through each, and compare real cost and latency on your own data before committing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is there a provider you'd recommend that's missing here? And what tips the decision for you, price, model choice, or compliance?&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>eu</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
