Self-Hosted AI for Dev Teams 2026: No Subscriptions

#tabby #librechat #anythingllm #selfhosted

This article was originally published on aifoss.dev

TL;DR: A 10-person team spending $190/month on GitHub Copilot Business can replace it with Tabby on a single RTX 4090 workstation and break even in under two years — while also getting private LLM chat (LibreChat) and document RAG (AnythingLLM) that SaaS doesn't give you for that price. The catch: someone owns the server, and someone maintains the stack.

	Tabby	LibreChat	AnythingLLM
Replaces	GitHub Copilot Business	ChatGPT Enterprise / Teams	Notion AI / private doc chat
Multi-user	Yes — LDAP, SSO, API tokens	Yes — LDAP, OIDC, OAuth	Yes — Admin/Manager/Default roles
GPU required	8 GB+ VRAM	No (UI layer, API-backed)	No (connects to Ollama or API)
License	Apache 2.0	MIT	MIT
Latest version	v0.32.0 (Jan 2026)	v0.8.6 (May 31, 2026)	v1.13.0 (May 26, 2026)

Honest take: If your team has even one developer willing to own the infrastructure, the math is not close. Three years of GitHub Copilot Business for 10 developers ($6,840) buys an RTX 4090 workstation and three years of electricity.
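The arithmetic behind that claim is easy to rerun with your own numbers. The sketch below reproduces the $6,840 figure from the $19 per seat implied by $190/month for 10 developers. The hardware price and electricity cost are hypothetical placeholders, not quotes, so swap in what you would actually pay.

```python
import math

# $190/month for 10 developers (from the article) works out to $19 per seat.
SEAT_PRICE_USD = 19.0


def subscription_cost(developers: int, months: int, seat_price: float = SEAT_PRICE_USD) -> float:
    """Total subscription spend over the given number of months."""
    return developers * seat_price * months


def break_even_months(hardware_usd: float, power_usd_per_month: float,
                      developers: int, seat_price: float = SEAT_PRICE_USD):
    """Months until the hardware is paid off by avoided subscription fees.

    Returns None when running the server costs more per month than the seats.
    """
    monthly_saving = developers * seat_price - power_usd_per_month
    if monthly_saving <= 0:
        return None
    return math.ceil(hardware_usd / monthly_saving)


# Three years of seats for 10 developers, as quoted above.
print(subscription_cost(10, 36))        # 6840.0

# Hypothetical inputs: $3,500 workstation, $25/month electricity.
print(break_even_months(3500, 25, 10))  # 22
```

Maintenance time is deliberately left out. If you want to price it in, add it to the monthly running cost and see how far the break-even point moves.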

The Three AI Workloads Teams Actually Use

Most AI tool evaluation at the team level collapses into one question: "What's the one tool that does everything?" There isn't one. What teams actually need breaks into three distinct workloads with different technical requirements:

Code completion — inline autocomplete and chat in the IDE. Latency-sensitive. Every developer uses this every hour. The bottleneck is inference speed, not feature list.
LLM chat — a shared ChatGPT-style interface for prompting, writing, debugging, and general AI work. No special hardware if you're routing to an API; the important things here are multi-user access control and model flexibility.
Document RAG — ingesting internal docs, code, runbooks, and wikis so the LLM can answer questions about your actual codebase and company knowledge. Embedding workloads are batch-able and not latency-sensitive.

The open-source options that actually work in a team context are Tabby for code completion, LibreChat for the chat interface, and AnythingLLM for document RAG. Each solves one workload well. All three run on the same server.

Code Completion: Tabby v0.32.0

Tabby is the closest open-source equivalent to GitHub Copilot's backend. It ships as a single binary or Docker container, provides IDE extensions for VS Code, JetBrains, Vim, Neovim, and Eclipse, and adds team features that Copilot's entry tier doesn't have: named user accounts, per-user API tokens, LDAP/SSO authentication, usage analytics, and repository indexing for context-aware completions.

v0.32.0 (January 2026) introduced generic OAuth support and improved multi-branch codebase indexing. The repository indexing feature matters: it's what lets Tabby complete code that references your internal libraries, not just public code patterns from training data.

Model selection by team size

Tabby runs quantized models. The model you pick determines inference speed and how many concurrent developers you can serve without queuing:

Team size	Recommended model	VRAM needed	Concurrent users (approx.)
3–5 devs	StarCoder2-7B or Qwen2.5-Coder-7B (Q8)	8 GB	3–5
5–15 devs	Qwen2.5-Coder-7B (Q8)	8–10 GB	8–12
5–15 devs (higher quality)	Qwen2.5-Coder-14B (Q4_K_M)	10–12 GB	6–10
15–25 devs	Qwen2.5-Coder-32B (Q4)	20–24 GB	15–20

A single RTX 4090 with 24 GB VRAM handles up to 20 concurrent developers on a 7B model, or 15 on the 32B model with Tabby's built-in request queuing. The 7B sweet spot covers most teams under 15 people without quality tradeoffs that developers notice in daily use.
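For a rough sanity check on the VRAM column, weight memory is approximately parameter count times bits per weight divided by eight. The sketch below uses nominal bit widths (8 for Q8, 4 for Q4), which is an assumption: real quantization formats spend somewhat more than the nominal bits per weight, and the KV cache grows with context length and concurrent requests. The `overhead_gb` default is a hypothetical placeholder, not a measured value.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         vram_gb: float, overhead_gb: float = 2.0) -> bool:
    """True if weights plus a flat overhead allowance fit in the given VRAM.

    overhead_gb stands in for KV cache and runtime buffers; it is a guess
    and should be replaced with what you observe under real load.
    """
    return weight_memory_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


for name, params, bits in [("7B Q8", 7, 8), ("14B Q4", 14, 4), ("32B Q4", 32, 4)]:
    print(name, weight_memory_gb(params, bits), "GB")

print(fits(32, 4, 24))  # True: about 16 GB of weights plus overhead on a 24 GB card
print(fits(32, 8, 24))  # False: about 32 GB of weights alone
```

The estimate is a lower bound. Treat the table's VRAM figures as the planning numbers and use this only to see why a 32B model needs 4-bit quantization to fit on a 24 GB card.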

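# Start Tabby with Qwen2.5-Coder-7B on CUDA.
# --model takes the base (fill-in-the-middle) model for completions;
# --chat-model takes the Instruct variant for IDE chat.
docker run -it \
  --gpus all \
  -p 8080:8080 \
  -v "$HOME/.tabby:/data" \
  tabbyml/tabby \
  serve \
  --model Qwen2.5-Coder-7B \
  --chat-model Qwen2.5-Coder-7B-Instruct \
  --device cuda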

When ready:

INFO  tabby::routes > listening on 0.0.0.0:8080

After the first-run admin setup at http://localhost:8080, create user accounts under Settings → Users, configure LDAP under Settings → Security → LDAP, and index your codebase under Settings → Indexing → Add Repository. Each developer gets their own API token; usage analytics break out per user in the admin panel.
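Per-user tokens also make it easy to smoke-test the server outside an IDE. The sketch below builds (but does not send) a completion request. It assumes Tabby's HTTP API accepts a POST to `/v1/completions` with a `segments` body and a bearer token, so check the path and body shape against the API documentation for your Tabby version. The token value is a made-up placeholder.

```python
import json
import urllib.request


def build_completion_request(base_url: str, token: str, prefix: str,
                             suffix: str = "", language: str = "python") -> urllib.request.Request:
    """Build a code-completion request for a Tabby server.

    The endpoint path and payload shape are assumptions about Tabby's API,
    not something the article specifies.
    """
    payload = {"language": language, "segments": {"prefix": prefix, "suffix": suffix}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder token; use the one shown on the developer's Tabby account page.
req = build_completion_request("http://localhost:8080", "auth_example_token", "def fib(n):\n    ")
print(req.get_method(), req.full_url)

# To send it against a running server:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(json.load(resp))
```

A request that succeeds with one developer's token and fails with a revoked one is a quick way to confirm that account management is wired up before rolling out the IDE extensions.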

Full setup walkthrough: Tabby Team Server Setup 2026.

LLM Chat: LibreChat v0.8.6

LibreChat is what you actually want when someone asks for "a self-hosted ChatGPT for the team." It handles multi-user auth (LDAP, OIDC, OAuth, plain email/password), lets you configure multiple model providers simultaneously, and supports agents and tools on your own infrastructure, which ChatGPT's Team tier cannot offer because it has no on-premise deployment option.

v0.8.6 (May 31, 2026) added Agent Skills and Subagents — packaging reusable instructions, scripts, and tool permissions into portable capabilities that agents can invoke automatically. For a dev team, that means: a "write a JIRA ticket" agent, a "summarize this PR" agent, and a "query our runbooks" agent — all shared across the team, not rebuilt by each developer individually.

What LibreChat adds that matters for teams:

Per-user model permissions — control which users or groups can access which models
LDAP/OIDC auth — users log in with corporate credentials; no separate account management
Multiple providers in one UI — route some tasks to local Ollama, others to OpenAI or Anthropic, from the same interface; developers don't juggle separate tools
Shared agents — build once, shared across the org
No GPU required — LibreChat is a UI and orchestration layer; it calls your Ollama instance or a commercial API, so the GPU budget goes toward Tabby inference, not here

# docker-compose.yml — LibreChat + MongoDB + Meilisearch
version: '3.8'
services:
  librechat:
    image: ghcr.io/danny-avila/librechat:v0.8.6
    ports:
      - "3080:3080"
    env_file:
      - .env
    volumes:
      - ./librechat.yaml:/app/librechat.yaml
    depends_on:
      - mongodb
      - meilisearch

  mongodb:
    image: mongo:7.0
    volumes:
      - mongodb_data:/data/db

  meilisearch:
    image: getmeili/meilisearch:v1.6
    volumes:
      - meilisearch_data:/meili_data

volumes:
  mongodb_data:
  meilisearch_data:

Key .env fields for a team deployment:

# Disable self-registration — require LDAP or invite-only
ALLOW_REGISTRATION=false

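# LDAP (bind values are placeholders; the filter must contain {{username}})
LDAP_URL=ldap://your-ldap:389
LDAP_BIND_DN=cn=readonly,dc=example,dc=com
LDAP_BIND_CREDENTIALS=<bind-password>
LDAP_USER_SEARCH_BASE=ou=people,dc=example,dc=com
LDAP_SEARCH_FILTER=mail={{username}}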

# Model backends
OLLAMA_BASE_URL=http://your-ollama-host:11434
OPENAI_API_KEY=sk-...            # optional — add if team uses cloud models alongside local

# Security
JWT_SECRET=<random-64-char-string>
SESSION_EXPIRY=604800000         # 7 days
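The two security values are simple to generate rather than type by hand. This is a minimal sketch using only the standard library. It assumes the expiry is expressed in milliseconds, which is what the 7-day value above implies.

```python
import secrets
from datetime import timedelta


def make_jwt_secret(chars: int = 64) -> str:
    """Random hex string of the requested (even) length.

    secrets.token_hex(n) returns 2 * n hex characters.
    """
    return secrets.token_hex(chars // 2)


def expiry_ms(days: int) -> int:
    """Convert a number of days to milliseconds."""
    return int(timedelta(days=days).total_seconds() * 1000)


print(f"JWT_SECRET={make_jwt_secret()}")
print(f"SESSION_EXPIRY={expiry_ms(7)}")  # SESSION_EXPIRY=604800000
```

Generate the secret once per deployment and keep it out of version control. Rotating it invalidates existing sessions.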

One known gap: full role-based access control (group-level model permissions managed through the GUI) is still in development on the 2026 roadmap. Per-user permissions exist and work; granular group policies require manual YAML config rather than an admin panel toggle. That is fine for a 10-person team and annoying at 50+.

Full setup: LibreChat Setup Guide 2026.

Document RAG: AnythingLLM v1.13.0

AnythingLLM (v1.13.0, May 26, 2026) handles the "chat with your docs" workload. For a dev team that means: internal wikis, architecture decision records, runbooks, design docs, and the shared knowledge that currently lives in Notion or Confluence and takes 20 minutes to find.
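To make the workload concrete, here is a toy version of the retrieval step that sits at the heart of any document RAG tool: score each document against the question and hand the best match to the LLM as context. It is not how AnythingLLM works internally. It swaps learned embeddings and a vector database for plain word counts and cosine similarity, and the document names and contents are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, docs: dict[str, str], k: int = 1) -> list[str]:
    """Return the names of the k documents most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(docs, key=lambda name: cosine(q, Counter(tokenize(docs[name]))), reverse=True)
    return ranked[:k]


# Invented sample documents standing in for a team's runbooks and ADRs.
docs = {
    "runbook-deploy": "Roll back a failed deploy by re-running the previous release pipeline.",
    "adr-007": "We chose PostgreSQL over MongoDB for the billing service.",
    "onboarding": "New engineers request VPN access through the IT portal.",
}

print(retrieve("How do I roll back a failed deploy?", docs))  # ['runbook-deploy']
```

Real embeddings match on meaning rather than shared words, so a question phrased as "undo a bad release" would still find the runbook. That is the part the production tools add.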

The Docker version enables proper multi-user features: Admin/Manager/Default roles, per-workspace access controls, isolated document libraries per project or team, and embeddable chat widgets for intern