Mārtiņš Veiss

Posted on May 28

Running AI in regulated environments: how AutoBot keeps your documents on-premise

#selfhosted #ai #security #privacy

Most AI productivity tools are asking you to trust a third party with your data. For a solo dev building side projects, that trade-off is fine. For a law firm, a hospital system, or a fintech company — it isn't.

This post is for the second group. I want to walk through exactly how AutoBot handles data in a regulated environment, where the risks actually sit, and what you need to configure to deploy it safely.

The problem with cloud AI in regulated industries

When you send a prompt to GPT-4, Claude, or Gemini, your text crosses the network. That's obvious. What's less obvious is what else goes with it when you're using most AI platforms:

The documents you've uploaded to provide context
The "system prompt" that describes your business or patient workflows
Metadata about what you're working on and when

For HIPAA-covered entities, PHI (Protected Health Information) cannot be processed by a business associate without a signed BAA. Most consumer AI products don't offer BAAs. The ones that do cost enterprise pricing and require legal review cycles.

GDPR's article 28 has similar requirements for data processors. SOC 2 Type II audits will ask where your data goes. ISO 27001 requires you to document and control it.

None of this means you can't use AI. It means you need to choose carefully where your data goes.

AutoBot's data model

AutoBot separates two things that most platforms conflate: the knowledge base and the brain.

The knowledge base is the documents you upload — your patient intake forms, your case files, your customer contracts, your internal codebase. In AutoBot, this data never leaves your machine. The RAG engine (Retrieval-Augmented Generation) indexes your documents locally into ChromaDB, a vector database running on your own hardware. When you ask a question, the relevant chunks are retrieved locally — no external API call has happened yet.

The brain is the LLM that synthesizes the retrieved context into an answer. This is where you have a choice:

Run it locally via Ollama or any OpenAI-compatible server — prompts never leave your network
Route to a cloud model (GPT-4, Claude) — only the synthesized prompt goes out, not your documents

The line is: the brain phones home. Your documents don't.

This is meaningfully different from uploading files to a cloud AI assistant. In AutoBot, a search for "what does our standard NDA say about IP assignment" retrieves the relevant clause locally and sends only the question + retrieved text to the LLM. Your full NDA document never leaves your hardware.

Network isolation in practice

Data model is one layer. Network configuration is another.

AutoBot runs on Docker Compose. The default configuration is suitable for a development environment — it exposes ports on 0.0.0.0, which means anything on your network can reach it. For production in a regulated environment, you need to tighten this.

1. Bind to localhost, not all interfaces

In your docker-compose.override.yml:

services:
  frontend:
    ports:
      - "127.0.0.1:3000:3000"
  backend:
    ports:
      - "127.0.0.1:8000:8000"

This means AutoBot is only reachable from the host machine itself. Put a reverse proxy (nginx, Caddy) in front of it that handles TLS and access control.

2. Internal service network

Keep internal services off the host network entirely:

networks:
  autobot-internal:
    internal: true

services:
  chromadb:
    networks:
      - autobot-internal
  redis:
    networks:
      - autobot-internal
  backend:
    networks:
      - autobot-internal
      - default  # only backend needs external access for LLM calls

ChromaDB and Redis should never be reachable outside the Docker network.

3. Resource limits

One question I keep seeing (shoutout to @clawnewsai.bsky.social for surfacing this): does AutoBot enforce resource limits by default?

Currently no — the deploy.resources stanza in docker-compose.yml is left to the operator. For production, add explicit limits to prevent one runaway process from starving the host:

services:
  backend:
    deploy:
      resources:
        limits:
          cpus: "2.0"
          memory: 4G
        reservations:
          cpus: "0.5"
          memory: 1G

Real-world numbers vary with load. A good starting point for a team of 5-10 users: 4 vCPUs and 8GB RAM for the full stack on a dedicated host.

What goes where: the data flow audit

For compliance documentation, here's exactly what leaves your network in each configuration:

Mode	Documents	Prompts	Answers
Full local (Ollama)	Never	Never	Never
Hybrid (OpenAI/Claude for LLM)	Never	Yes — to LLM provider	Returned from LLM
Cloud-only (no local Ollama)	Never	Yes — to LLM provider	Returned from LLM

In all cases: your documents stay on your hardware. The ChromaDB vector store is local. The retrieval step is local.

If you're running Ollama on the same host, nothing crosses the network boundary at all. This is the right configuration for HIPAA-covered environments until you have a BAA with your LLM provider.

Getting started in a restricted environment

git clone https://github.com/mrveiss/AutoBot-AI.git
cd AutoBot-AI

# Copy and edit the environment file
cp .env.example .env

# For full local operation: install Ollama first
# https://ollama.com/download
ollama pull llama3.2

# Start AutoBot
docker compose up -d

Open http://localhost:3000. Connect Ollama as your LLM provider. Upload your first document.

From that point, nothing has left your machine.

Compliance checklist before production

This is not legal advice. Get a qualified compliance officer to sign off before processing regulated data. But here's the technical baseline:

[ ] Reverse proxy with TLS in front of AutoBot
[ ] Ports bound to 127.0.0.1, not 0.0.0.0
[ ] ChromaDB and Redis on internal Docker network only
[ ] Resource limits set per service
[ ] Ollama running locally if zero data egress is required
[ ] Audit logging enabled on the reverse proxy layer
[ ] Host firewall rules (ufw / firewalld) blocking unexpected inbound
[ ] Regular backups of ChromaDB volume (your knowledge base)
[ ] BAA executed with any cloud LLM provider you route to

The DevOps guide at dev.to/mrveiss/self-hosting-autobot-a-devops-deep-dive covers the reverse proxy and firewall setup in detail.

The overhead question

I keep getting asked: what's the overhead for a small team?

For a team of 5-10: a single machine with 16GB RAM and a modern CPU runs the full stack comfortably in CPU-only mode. GPU is optional — it dramatically accelerates local inference but isn't required. The DevOps guide has sizing tables for different workload profiles.

The operational overhead is similar to running any self-hosted application — you own the uptime, you manage the updates, you back up the data. For regulated industries, that overhead is already priced in. You're not adding new burden; you're moving existing compliance obligations to infrastructure you control.

The bottom line

AutoBot doesn't solve your compliance program. But it removes the data-egress problem from the AI layer entirely.

If you're in a regulated industry and you've been waiting on cloud AI because you can't figure out the data residency question — the answer is to not send the data in the first place.

Your knowledge base stays on your machine. You pick the brain. If you pick a local brain, nothing leaves your network.

That's the architecture. The compliance framework on top is yours to build. But the foundation is solid.

AutoBot is open source at github.com/mrveiss/AutoBot-AI. Full documentation, Docker deployment guides, and community discussions are there. If you're working through a regulated-environment deployment and hit a configuration question, open a discussion — the community has seen most of the edge cases.

DEV Community