This article was originally published on aifoss.dev
TL;DR: Most open-weight LLMs are not under traditional open-source licenses. Apache 2.0 and MIT are the only safe bets for unrestricted commercial products. The Meta Llama Community License permits commercial use but contains restrictions — including a full EU exclusion for Llama 4 — that require more than a skim.
| MIT | Apache 2.0 | Meta Llama Community | Gemma Terms | |
|---|---|---|---|---|
| License type | Permissive | Permissive + patent grant | Custom / source-available | Proprietary terms |
| Commercial SaaS | ✅ | ✅ | ✅ under 700M MAU | ✅ with use policy |
| EU deployment | ✅ | ✅ | ✅ Llama 3.3 / ❌ Llama 4 | ✅ |
| OSI-certified open source | ✅ | ✅ | ❌ | ❌ |
| Fine-tune and redistribute | ✅ | ✅ | ✅ name must start "Llama" | ✅ |
Honest take: For any commercial product, pick Apache 2.0 or MIT — Mistral Small 4, Qwen3, or Phi-4. Llama is usable in most US-based contexts but carries conditions that require a real legal read, not a skim.
Why the license matters as much as the benchmark
Developers run evals, check SWE-bench scores, and measure tokens per second. The license file gets a glance at most. That's the wrong order for anything commercial.
A model that scores 3% better on HumanEval but carries a problematic license is worse for your product than a slightly weaker alternative under Apache 2.0. License problems surface months into a build — when legal reviews the stack before a fundraise, when you scale past a threshold, or when a customer in Berlin asks for your data processing agreement.
There are five license families that cover virtually every open-weight LLM in production as of June 2026: MIT, Apache 2.0, GPL/AGPL, the Meta Llama Community License, and Google's Gemma Terms of Use. Each has a different risk profile depending on what you're building. The real cost comparison between FOSS AI and SaaS AI changes significantly depending on which tier of license risk you're willing to carry.
MIT: the simplest permissive license
The MIT License is two paragraphs. It lets you do anything with the software — run it, modify it, distribute it, sell products built with it — as long as you preserve the copyright notice. No patents, no copyleft, no usage restrictions.
Models under MIT in 2026:
- DeepSeek V3, V4-Pro, V4-Flash, R1 (all weights on HuggingFace)
- Phi-4 and Phi-4 Multimodal (Microsoft)
The practical gap in MIT is what it doesn't include: no patent grant. If the model creator holds a patent on an architecture technique embedded in the weights, MIT doesn't automatically license you to use that patent. In practice, model architectures are rarely patent-enforced against downstream users — but Apache 2.0 closes this gap explicitly.
Also worth noting: DeepSeek's MIT license ships with an Attachment A — a use-restrictions appendix covering illegal and hazardous applications. This is standard for AI models and doesn't restrict normal commercial or SaaS use.
When MIT is the right call: Solo developer, startup, internal tool, mobile app — anything where you want zero legal overhead. Simplicity is the point.
Apache 2.0: MIT plus patent protection
Apache 2.0 is the license legal teams at larger companies actually prefer over MIT. The substantive difference: contributors grant a royalty-free patent license covering their contributions. If the model creator holds patents on their training techniques or architectural innovations, you're protected.
Models under Apache 2.0 in 2026:
- Mistral Small 4, Mistral 3 (all current open-weight Mistral releases)
- Qwen2.5 (all sizes, 0.5B through 72B) and Qwen3 (all sizes)
- Falcon 40B (Technology Innovation Institute)
- GPT-NeoX-20B (EleutherAI)
Apache 2.0 allows commercial use, modification, redistribution, private use, and sublicensing. Requirements: preserve copyright notices, include the license text if redistributing, note any changes made. Minimal for API-based or self-hosted deployments.
One nuance with Mistral: earlier models (pre-2025) had a modified license with a revenue threshold — companies earning over $20M/month needed a commercial license. The current Mistral 3 generation and Mistral Small 4 dropped that restriction and moved fully to Apache 2.0. If you're running an older Mistral model, verify the specific license on HuggingFace before assuming Apache 2.0.
When Apache 2.0 is the right call: Anything commercial where you want explicit patent protection. Larger team, legal review incoming, SaaS product, enterprise deployment. Apache 2.0 is the safe default.
GPL and AGPL: copyleft and the SaaS trap
GPL and AGPL are designed to keep software free — if you distribute a GPL-licensed program, you must release the source of your derivative under the same license. AGPL extends this to network use: if you run modified AGPL software as a service, you must share your modifications with users even if you never "distribute" the code.
No major open-weight LLM is released under GPL or AGPL. But the inference tooling around LLMs often is:
- llama.cpp: MIT. Safe.
- vLLM: Apache 2.0. Safe for hosted inference.
- Text Generation WebUI (oobabooga): AGPL-3.0. If you run a modified fork as a hosted service, you must publish your modifications.
- GPT4All desktop app: MIT for the app itself, but bundles components that use AGPL-licensed libraries. If you build a product on top of GPT4All's internals, audit the full dependency tree.
The SaaS trap with AGPL is subtle. GPL's traditional loophole let SaaS companies run GPL code on a server without "distributing" it, keeping the source private. AGPL was invented to close that loophole. If your inference layer uses AGPL code, you may be obligated to open-source your entire serving stack.
When this bites you: You fork oobabooga and build a proprietary SaaS on top. You bundle AGPL components into a packaged product. Both are real compliance risks.
Meta Llama Community License: read the actual text
The Meta Llama Community License is the most widely used non-standard license in the LLM space. Llama 3.3 and the Llama 4 family (Scout, Maverick) are all under it. It looks open — weights are public, commercial use is allowed — but it contains restrictions that don't exist in Apache 2.0 or MIT.
What it allows:
- Commercial use (building products, charging customers, SaaS)
- Modification and fine-tuning
- Redistribution with attribution
- Using model outputs to train other AI models (added in Llama 3.1, continued in Llama 4 with attribution requirements)
What it restricts:
700M MAU threshold: If your product reaches 700 million monthly active users, your license terminates and you must request a separate commercial license from Meta before reaching that threshold. For the vast majority of products this is theoretical, but it means the license is technically conditional — a flag in legal reviews.
Derivative model naming: If you fine-tune a Llama model and release the weights publicly, the derivative model name must start with "Llama." Your product is fine being called anything. Your released fine-tune weights on HuggingFace must be named "Llama-Something." This is a branding requirement, not copyleft — you don't have to open-source your fine-tune data or code.
No "Llama" in your product name: Your commercial product or SaaS cannot use "Llama" in its name or branding.
EU exclusion (Llama 4 only): The Llama 4 Community License explicitly excludes persons residing in the EU and companies headquartered in the EU from the license grant. This is the most significant new restriction in 2026. If your team is EU-based, or your product's primary user base is EU, Llama 4 is legally unavailable under the community license. Llama 3.3 does not have this restriction. The practical effect has been a wave of EU teams migrating to Mistral Small 4 and Qwen3.
Not OSI-certified open source: The OSI's definition requires no restriction based on groups of person
Top comments (0)