In Chapter 2, we reached an uncomfortable conclusion: Terraform can manage Kubernetes, but that doesn't mean it should manage everything in Kubernetes.
We observed that:
- Terraform → Versioning, auditing, reproducibility
- Helm → Simplicity, lifecycle management
- Terraform + K8s Provider directly → Verbose, giant state, no rollbacks
The question that lingers: "Is there a way to have the best of both?"
Changing Abstraction Level
The problem in Chapter 2 wasn't Terraform itself; it was the level of abstraction we chose.
Wrong thinking:
Terraform → Manage Deployments, Services, Ingress, etc
(individual Kubernetes resources)
Right thinking:
Terraform → Manage Helm Releases
(complete applications as units)
It's a subtle but profound change. Instead of Terraform replacing Helm, Terraform orchestrates Helm.
Layered Architecture of Responsibilities
Let's visualize how responsibilities are divided:
┌──────────────────────────────────────────┐
│ You (Developer) │
│ Define desired state in code │
└────────────────┬─────────────────────────┘
│
↓
┌──────────────────────────────────────────┐
│ Terraform (Orchestrator) │
│ • Manages namespaces │
│ • Manages infrastructure secrets │
│ • Manages RBAC │
│ • Manages Helm Releases (pointers) │
└────────────────┬─────────────────────────┘
│
↓
┌──────────────────────────────────────────┐
│ Helm (Package Manager) │
│ • Renders templates │
│ • Applies resources to cluster │
│ • Maintains release history │
│ • Manages rollbacks │
└────────────────┬─────────────────────────┘
│
↓
┌──────────────────────────────────────────┐
│ Kubernetes (Runtime) │
│ • Runs containers │
│ • Manages storage │
│ • Routes traffic │
│ • Self-healing │
└──────────────────────────────────────────┘
Each layer does what it does best. No unnecessary overlap.
Project Structure:
cap3-helm-provider/
├── main.tf # Main configuration
├── variables.tf # Input variables
├── terraform.tfvars # Values (don't commit!)
├── outputs.tf # Useful outputs
├── .gitignore # Secret protection
│
├── values/ # Helm chart values
│ ├── ollama-values.yaml
│ └── librechat-values.yaml
│
└── README.md # Documentation
Benefits of this structure:
- Separation of code and configuration: logic (main.tf) is separated from values (values/), making changes easy to version and review
- Reusable: same structure for dev, staging, and prod; only terraform.tfvars changes
- Secure: .gitignore protects secrets, and values files can have public and private versions
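To make the same configuration reusable across environments, the cluster context itself can be parameterized. A minimal sketch of the idea (the variable name and context values are assumptions, not part of the chapter's code):

```hcl
# variables.tf — hypothetical environment switch
variable "kube_context" {
  description = "kubeconfig context to target (e.g. minikube, prod-cluster)"
  type        = string
  default     = "minikube"
}

# Provider consuming the variable instead of a hardcoded context
provider "kubernetes" {
  config_path    = "~/.kube/config"
  config_context = var.kube_context
}
```

With this in place, dev.tfvars and prod.tfvars would differ only in kube_context and the secret values.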
Part 1: Provider Declaration
# main.tf
terraform {
required_version = ">= 1.0"
required_providers {
kubernetes = {
source = "hashicorp/kubernetes"
version = "~> 2.23"
}
helm = {
source = "hashicorp/helm"
version = "~> 2.11"
}
}
}
provider "kubernetes" {
config_path = "~/.kube/config"
config_context = "minikube"
}
provider "helm" {
kubernetes {
config_path = "~/.kube/config"
config_context = "minikube"
}
}
Helm Provider
Now we have two providers:
- kubernetes: for base infrastructure resources
- helm: for managing application releases
Important: The Helm provider doesn't replace the Kubernetes provider. They work together:
- Kubernetes provider → Creates namespaces, secrets, RBAC
- Helm provider → Deploys applications in those namespaces
Semantic versioning (~> 2.23):
The pessimistic constraint ~> 2.23 means:
- 2.23.0, 2.23.1, 2.24.0 → accepted (any 2.x at or above 2.23)
- 3.0.0 → rejected (potential breaking change)
This picks up bug fixes and security updates without breaking compatibility.
Part 2: Base Infrastructure (Terraform Territory)
# Namespaces managed by Terraform
resource "kubernetes_namespace" "ollama" {
metadata {
name = "ollama"
labels = {
managed-by = "terraform"
}
}
}
resource "kubernetes_namespace" "librechat" {
metadata {
name = "librechat"
labels = {
managed-by = "terraform"
}
}
}
Why does Terraform manage namespaces?
Namespaces are infrastructure, not applications. They:
- Rarely change
- Are prerequisites for everything
- Define security boundaries
- Need to exist before applications
Label managed-by = "terraform":
# Useful for filtering
kubectl get ns -l managed-by=terraform
# Output:
NAME STATUS AGE
ollama Active 10m
librechat Active 10m
Makes it clear these resources shouldn't be edited manually.
Part 3: Infrastructure Secrets
# Secret managed by Terraform (infra-level)
resource "kubernetes_secret" "librechat_credentials" {
metadata {
name = "librechat-credentials-env"
namespace = kubernetes_namespace.librechat.metadata[0].name
}
data = {
JWT_SECRET = var.jwt_secret
JWT_REFRESH_SECRET = var.jwt_refresh_secret
CREDS_KEY = var.creds_key
CREDS_IV = var.creds_iv
MONGO_URI = "mongodb://librechat-mongodb:27017/LibreChat"
MEILI_HOST = "http://librechat-meilisearch:7700"
OLLAMA_BASE_URL = "http://ollama.ollama.svc.cluster.local:11434"
}
type = "Opaque"
}
Design decision: Why does Terraform manage this secret?
This secret contains infrastructure credentials that:
- Need to exist before application deployment
- Don't change frequently
- Are shared between environments (same structure, different values)
- Should have their structure versioned, but never their values (those live in terraform.tfvars)
Important dynamic reference:
namespace = kubernetes_namespace.librechat.metadata[0].name
Terraform ensures the namespace is created first, then the secret. Automatic dependency management!
Variables file (variables.tf):
variable "jwt_secret" {
description = "JWT secret for LibreChat"
type = string
sensitive = true
}
variable "jwt_refresh_secret" {
description = "JWT refresh secret"
type = string
sensitive = true
}
variable "creds_key" {
description = "Credentials encryption key"
type = string
sensitive = true
}
variable "creds_iv" {
description = "Credentials initialization vector"
type = string
sensitive = true
}
Values file (terraform.tfvars - DON'T COMMIT!):
jwt_secret = "abc123def456..." # generated with openssl rand -hex 32
jwt_refresh_secret = "ghi789jkl012..."
creds_key = "mno345pqr678..."
creds_iv = "stu901vwx234..."
.gitignore:
# Terraform
*.tfstate
*.tfstate.*
.terraform/
terraform.tfvars # ← CRITICAL!
# Sensitive files
values/*-secrets.yaml
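The project tree also lists outputs.tf, which this chapter doesn't show. A minimal sketch of what it might expose (the output names and choice of values are assumptions):

```hcl
# outputs.tf — hypothetical outputs for this setup
output "ollama_namespace" {
  description = "Namespace where Ollama is deployed"
  value       = kubernetes_namespace.ollama.metadata[0].name
}

output "librechat_release_status" {
  description = "Status of the LibreChat Helm release"
  value       = helm_release.librechat.status
}
```

Outputs like these are handy after terraform apply (or via terraform output) to confirm where things landed without digging through state.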
Part 4: Helm Releases
This is where the Chapter 3 approach shines.
Ollama deployment:
# Helm Release - Ollama
resource "helm_release" "ollama" {
name = "ollama"
repository = "https://otwld.github.io/ollama-helm/"
chart = "ollama"
namespace = kubernetes_namespace.ollama.metadata[0].name
values = [
file("${path.module}/values/ollama-values.yaml")
]
# Version control
version = "1.41.0" # Pin version for reproducibility
# Deployment settings
create_namespace = false # Already created by Terraform
wait = true # Wait for ready
timeout = 600 # 10 minutes max
# Dependency tracking
depends_on = [
kubernetes_namespace.ollama
]
}
Breaking it down:
1. Chart source:
repository = "https://otwld.github.io/ollama-helm/"
chart = "ollama"
2. Values file:
values = [
file("${path.module}/values/ollama-values.yaml")
]
The file() function reads the YAML file from a path relative to the module (path.module) and passes its raw contents to Helm as a values file.
3. Version pinning:
version = "1.41.0"
Critical for reproducibility! Without it, helm_release installs the latest chart version available, which changes over time.
4. Deployment controls:
wait = true # Don't return until ready
timeout = 600 # 10 min max
Terraform waits for Pods to be healthy before considering deployment successful.
5. Dependencies:
depends_on = [kubernetes_namespace.ollama]
Terraform creates the namespace first, then the release. Strictly speaking, the namespace attribute already references the namespace resource, which creates an implicit dependency; the explicit depends_on documents the intent.
The values file (values/ollama-values.yaml):
ollama:
gpu:
enabled: true
type: nvidia
number: 1
models:
- llama2
- codellama
resources:
requests:
cpu: 2
memory: 8Gi
service:
type: ClusterIP
port: 11434
ingress:
enabled: true
className: nginx
hosts:
- host: ollama.glukas.space
paths:
- path: /
pathType: Prefix
LibreChat deployment:
# Helm Release - LibreChat
resource "helm_release" "librechat" {
name = "librechat"
repository = "oci://ghcr.io/danny-avila/librechat-chart"
chart = "librechat"
namespace = kubernetes_namespace.librechat.metadata[0].name
values = [
file("${path.module}/values/librechat-values.yaml")
]
version = "1.5.0"
create_namespace = false
wait = true
timeout = 900 # 15 min (MongoDB initialization)
depends_on = [
kubernetes_namespace.librechat,
kubernetes_secret.librechat_credentials
]
}
Notice:
- Different repository (OCI registry)
- Longer timeout (MongoDB takes time)
- Depends on secret (must exist first)
LibreChat depends on Ollama. Terraform ensures order:
- Namespace
- Secret
- Ollama
- LibreChat
The values file (values/librechat-values.yaml):
config:
APP_TITLE: "LibreChat + Ollama (via Terraform)"
HOST: "0.0.0.0"
PORT: "3080"
SEARCH: "true"
MONGO_URI: "mongodb://librechat-mongodb:27017/LibreChat"
MEILI_HOST: "http://librechat-meilisearch:7700"
librechat:
configEnv:
ALLOW_REGISTRATION: "true"
configYamlContent: |
version: 1.1.5
cache: true
endpoints:
custom:
- name: "Ollama"
apiKey: "ollama"
baseURL: "http://ollama.ollama.svc.cluster.local:11434/v1"
models:
default:
- "llama2:latest"
fetch: true
titleConvo: true
titleModel: "llama2:latest"
summarize: false
summaryModel: "llama2:latest"
forcePrompt: false
modelDisplayLabel: "Ollama"
addParams:
temperature: 0.7
max_tokens: 2000
extraEnvVarsSecret: "librechat-credentials-env"
ingress:
enabled: true
className: "nginx"
annotations:
nginx.ingress.kubernetes.io/proxy-body-size: "25m"
nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
hosts:
- host: librechat.glukas.space
paths:
- path: /
pathType: Prefix
mongodb:
enabled: true
auth:
enabled: false
image:
repository: bitnami/mongodb
tag: latest
persistence:
enabled: true
size: 8Gi
resources:
requests:
memory: "256Mi"
cpu: "100m"
limits:
memory: "1Gi"
cpu: "500m"
meilisearch:
enabled: true
auth:
enabled: false
environment:
MEILI_NO_ANALYTICS: "true"
MEILI_ENV: "development"
persistence:
enabled: true
size: 1Gi
resources:
requests:
memory: "128Mi"
cpu: "50m"
limits:
memory: "512Mi"
cpu: "250m"
resources:
requests:
memory: "256Mi"
cpu: "100m"
limits:
memory: "1Gi"
cpu: "500m"
persistence:
enabled: true
size: 5Gi
storageClass: "standard"
replicaCount: 1
Deployment Workflow
Now let's see this in action.
Initial Deployment
# Generate secrets
export TF_VAR_jwt_secret=$(openssl rand -hex 32)
export TF_VAR_jwt_refresh_secret=$(openssl rand -hex 32)
export TF_VAR_creds_key=$(openssl rand -hex 32)
export TF_VAR_creds_iv=$(openssl rand -hex 16)
# Or create terraform.tfvars
cat > terraform.tfvars <<EOF
jwt_secret = "$(openssl rand -hex 32)"
jwt_refresh_secret = "$(openssl rand -hex 32)"
creds_key = "$(openssl rand -hex 32)"
creds_iv = "$(openssl rand -hex 16)"
EOF
# 1. Initialize
terraform init
# Downloads both kubernetes and helm providers
# 2. Validate
terraform validate
# Checks HCL syntax
# 3. Plan
terraform plan
Plan output:
Terraform will perform the following actions:
# kubernetes_namespace.ollama will be created
+ resource "kubernetes_namespace" "ollama" {
+ id = (known after apply)
+ metadata {
+ generation = (known after apply)
+ name = "ollama"
+ labels = {
+ "managed-by" = "terraform"
}
}
}
# kubernetes_secret.librechat_credentials will be created
+ resource "kubernetes_secret" "librechat_credentials" {
+ data = (sensitive value)
+ id = (known after apply)
+ type = "Opaque"
+ metadata {
+ name = "librechat-credentials-env"
+ namespace = (known after apply)
}
}
# helm_release.ollama will be created
+ resource "helm_release" "ollama" {
+ id = (known after apply)
+ name = "ollama"
+ namespace = (known after apply)
+ repository = "https://otwld.github.io/ollama-helm/"
+ version = "1.41.0"
+ status = (known after apply)
+ values = [
+ <<-EOT
ollama:
gpu:
enabled: true
...
EOT,
]
}
Plan: 5 to add, 0 to change, 0 to destroy.
Notice: Only 5 resources in the plan!
- 2 namespaces
- 1 secret
- 2 helm_releases
Compare to Chapter 2: would be 50+ individual K8s resources.
# 4. Apply
terraform apply
# Output:
kubernetes_namespace.ollama: Creating...
kubernetes_namespace.librechat: Creating...
kubernetes_namespace.ollama: Creation complete after 1s
kubernetes_namespace.librechat: Creation complete after 1s
kubernetes_secret.librechat_credentials: Creating...
kubernetes_secret.librechat_credentials: Creation complete after 1s
helm_release.ollama: Creating...
helm_release.ollama: Still creating... [10s elapsed]
helm_release.ollama: Still creating... [20s elapsed]
helm_release.ollama: Creation complete after 45s [id=ollama]
helm_release.librechat: Creating...
helm_release.librechat: Still creating... [10s elapsed]
helm_release.librechat: Still creating... [20s elapsed]
...
helm_release.librechat: Creation complete after 2m15s [id=librechat]
Apply complete! Resources: 5 added, 0 changed, 0 destroyed.
What happened behind the scenes:
- Terraform created namespaces
- Terraform created secret
- Terraform told Helm: "Install ollama chart with these values"
- Helm rendered templates and created: Deployment, Service, PVC, Ingress, etc
- Terraform told Helm: "Install librechat chart with these values"
- Helm rendered templates and created: Deployment, Service, MongoDB StatefulSet, MeiliSearch Deployment, Ingress, etc
Terraform state only tracks the 5 high-level resources.
Helm manages all the detailed Kubernetes resources.
Verifying Deployment
# Check Terraform state
terraform state list
# kubernetes_namespace.librechat
# kubernetes_namespace.ollama
# kubernetes_secret.librechat_credentials
# helm_release.librechat
# helm_release.ollama
# Check Helm releases
helm list -A
# NAME NAMESPACE REVISION STATUS CHART APP VERSION
# ollama ollama 1 deployed ollama-1.41.0 0.1.20
# librechat librechat 1 deployed librechat-1.5.0 0.7.0
# Check actual Kubernetes resources
kubectl get all -n ollama
# NAME READY STATUS RESTARTS AGE
# pod/ollama-7d8f9c5b6d-xk2p4 1/1 Running 0 2m
#
# NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
# service/ollama ClusterIP 10.96.245.12 <none> 11434/TCP 2m
#
# NAME READY UP-TO-DATE AVAILABLE AGE
# deployment.apps/ollama 1/1 1 1 2m
kubectl get all -n librechat
# (Shows MongoDB, MeiliSearch, LibreChat deployments/services)
# Testing Ollama
curl http://ollama.glukas.space/api/tags
Output:
{
"models": [
{
"name": "llama2:latest",
"modified_at": "2025-02-07T13:30:00.000Z",
"size": 3826793677,
"digest": "sha256:abc123...",
"details": {
"format": "gguf",
"family": "llama",
"families": ["llama"],
"parameter_size": "7B",
"quantization_level": "Q4_0"
}
}
]
}
Ollama Working!
# Testing LibreChat
curl -I http://librechat.glukas.space
Output:
HTTP/1.1 200 OK
Server: nginx
Content-Type: text/html; charset=utf-8
...
LibreChat Working!
Open in browser: http://librechat.glukas.space
- Sign in
- Select the Ollama model
- Start chatting!
All Good!
Operations
Now let's see how daily operations are different.
Upgrading Chart Version
Scenario: New Ollama chart version available.
# 1. Update version
vim main.tf
resource "helm_release" "ollama" {
# ...
version = "1.42.0" # was 1.41.0
}
# 2. Plan
terraform plan
Output:
~ resource "helm_release" "ollama" {
id = "ollama"
name = "ollama"
~ version = "1.41.0" -> "1.42.0"
# (15 unchanged attributes hidden)
}
Plan: 0 to add, 1 to change, 0 to destroy.
Only 1 change: chart version!
# 3. Apply
terraform apply
Helm does rolling update, zero downtime!
Changing Configuration
Scenario: Add CodeLlama model.
# 1. Edit values
vim values/ollama-values.yaml
ollama:
models:
- llama2
- codellama # ← NEW
# 2. Plan
terraform plan
Output:
~ resource "helm_release" "ollama" {
~ values = [
~ <<-EOT
ollama:
models:
- llama2
+ - codellama
EOT,
]
}
Terraform detects diff in YAML!
# 3. Apply
terraform apply
Helm updates Deployment → Pod restarts → Downloads CodeLlama → Ready!
Rollback
Scenario: New version broke.
Option 1: Via Terraform
# Revert commit in Git
git revert HEAD
# Apply previous version
terraform apply
Option 2: Via Helm (faster)
# View history
helm history ollama -n ollama
# REVISION UPDATED STATUS CHART DESCRIPTION
# 1 Thu Feb 07 10:29:30 2025 superseded ollama-1.41.0 Install complete
# 2 Thu Feb 07 11:15:20 2025 deployed ollama-1.42.0 Upgrade complete
# Rollback
helm rollback ollama -n ollama
# Rollback was a success! Happy Helming!
# Terraform detects on next plan
terraform plan
# (the plan will show drift in the release version; update main.tf to match, or re-apply to roll forward)
Best practice: Always use Terraform, but Helm is available for emergencies.
Comparison: Chapter 2 vs Chapter 3
Let's put side by side to visualize the gain.
Code Required
| Metric | Ch 2 (TF + K8s) | Ch 3 (TF + Helm) | Reduction |
|---|---|---|---|
| Lines of HCL | ~500 | ~100 | 80% ↓ |
| Lines of YAML | 0 | ~100 | - |
| Total code | 500 | 200 | 60% ↓ |
| Files | 1 monolith | 5 organized | - |
State Management
| Aspect | Ch 2 | Ch 3 |
|---|---|---|
| Resources in state | 50+ | 5 |
| State size | 2.3 MB | 15 KB |
| Plan time | 2 minutes | 10 seconds |
| Detectable drift | Partially | Yes (via Helm) |
Operations
| Operation | Ch 2 | Ch 3 |
|---|---|---|
| Initial deploy | terraform apply (5 min) | terraform apply (2 min) |
| Version upgrade | Edit multiple blocks | Change 1 line |
| Rollback | git revert + apply | helm rollback (instant) |
| View status | terraform state list | helm list |
| Debug | kubectl + state inspection | helm status |
Maintainability
| Factor | Ch 2 | Ch 3 |
|---|---|---|
| Learning curve | High (HCL + K8s) | Medium (HCL + familiar YAML) |
| New dev onboarding | Difficult | Reasonable |
| Code review | Complex (many changes) | Simple (clear diff) |
| Reusability | Low | High (public charts) |
The Advantages Scale
Now imagine you don't have 2 applications, but 20:
Chapter 2:
20 applications × 250 lines = 5,000 lines of HCL
20 applications × 50 resources = 1,000 resources in state
terraform plan = 10+ minutes
State file = 50+ MB
Chapter 3:
20 applications × 15 lines = 300 lines of HCL
20 applications × 100 lines YAML = 2,000 lines (familiar)
20 releases in state
terraform plan = 30 seconds
State file = 300 KB
The difference becomes even more dramatic at scale.
Chapter 3 solves 90% of Chapter 2's problems, and in many scenarios that's sufficient.
Terraform + Helm is the sweet spot for managing Kubernetes applications in a reproducible and versioned way.
Recapping what we achieved:
- Total versioning — Everything in Git
- Reproducibility — terraform apply = identical environment
- Separation of responsibilities — Terraform (infra) + Helm (apps)
- Manageable state — Few tracked resources
- Possible rollbacks — Via Helm or Git
- Less code — ~80% less HCL than Chapter 2 (and even more at scale)
- Ecosystem — Thousands of public charts
- Maintainable — Familiar YAML, minimal HCL
But there are still limitations:
1. Deployment is Manual
# Always needs someone executing
terraform apply
There's no real continuous deployment. Git isn't the single source of truth; it's an input that requires manual action.
2. No Continuous Reconciliation
# If someone does this:
kubectl edit deployment ollama -n ollama
# Terraform only detects on next plan
# Until then, there's divergence
There's no automatic process ensuring cluster = code.
3. Limited Auditing
Who deployed version 1.42.0?
git log # Shows commit
# But who executed terraform apply?
# There's no central record
Terraform state has some information, but it's not a complete audit log.
4. Approvals and Gates
How to ensure production deployment:
- Passed automated tests?
- Was approved by PO/PM?
- Has automatic rollback if it fails?
Terraform doesn't have this built-in. You need to build custom pipelines.
5. Complex Multi-Tenancy
How to allow:
- Team A to manage their apps in namespace team-a
- Team B to manage their apps in namespace team-b
- But both use the same cluster?
- Without giving access to the entire Terraform state?
Possible, but requires complex architecture.
What's Missing: GitOps
GitOps principles:
- Declarative: Desired state described declaratively (we have this!)
- Versioned: Everything in Git (we have this!)
- Pull-based: Cluster pulls changes from Git automatically (we don't have this)
- Continuous reconciliation: Agent ensures cluster = Git always (we don't have this)
What would change with GitOps:
Without GitOps (Ch 3):
Developer → commits → Git
Developer → terraform apply → Cluster
(Push-based, manual)
With GitOps (Ch 4):
Developer → commits → Git
ArgoCD (agent in cluster) → polls Git → applies changes
(Pull-based, automatic)
Additional benefits:
- Zero-touch deployment: Commit = automatic deploy
- Auto-healing: Cluster self-corrects if it diverges
- Complete audit: Each deploy is a commit
- Approvals: PR process = deployment approval
- Multi-tenancy: Each team has their repo/branch
Next Chapter:
In Chapter 4, we'll discover how Kubernetes infrastructure is managed at scale:
- ArgoCD: Real continuous deployment
- Application Sets: Deploy to multiple clusters
- Granular RBAC: Secure multi-tenancy
- Sync waves: Dependency orchestration
- Auto-sync: Git → Cluster automatic
- Automatic rollbacks: Integrated health checks