Oracle's free tier gives you 4 ARM cores and 24GB RAM. Forever. Most people waste it on nginx serving a portfolio site that gets 3 visitors a month. Here's what's actually worth running.
1. Self-Hosted AI Inference with Ollama + Mistral 7B
Run a local LLM that you actually own. Ollama turns model management into a docker pull-style workflow, and Mistral 7B fits comfortably in 24GB with room to breathe.
Turns out 24GB is the magic number for 7B models. You get real inference speeds without quantization sacrifices, and ARM's efficiency means idle CPU sits around 2–3% between requests.
Tool: Ollama
- Install: curl -fsSL https://ollama.com/install.sh | sh
- Pull a model: ollama pull mistral
- Expose it via systemd and a Caddy reverse proxy on port 11434 (sketch below)
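On Linux the install script sets up an ollama systemd service for you, so the only extra piece is the proxy. A minimal Caddyfile sketch, assuming Caddy is already installed as a system service; llm.example.com is a placeholder domain, and you'll want auth in front before exposing it publicly:
# Reverse proxy Ollama through Caddy (domain is a placeholder)
sudo tee /etc/caddy/Caddyfile > /dev/null <<'EOF'
llm.example.com {
    reverse_proxy localhost:11434
}
EOF
sudo systemctl reload caddy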
# Quick smoke test
curl http://localhost:11434/api/generate \
-d '{"model": "mistral", "prompt": "Explain ARM64 in one sentence", "stream": false}'
Est. usage: ~18–22GB RAM under load, 3–4 cores pegged during inference, ~0.5 cores idle
2. GitHub Actions Self-Hosted Runner with Earthly Cache
Your CI pipeline is slow because you're paying for shared GitHub runners that throw away your build cache every run. A self-hosted runner on this box fixes that — 4 ARM cores handle parallel jobs fine, and Earthly's cache layer persists locally between runs.
The real win: Docker layer caching survives across PRs. A build that took 8 minutes drops to 90 seconds.
Tool: Earthly + GitHub Actions runner
- Register the runner: ./config.sh --url https://github.com/your/repo --token TOKEN (service install sketch below)
- Install Earthly: brew install earthly/earthly/earthly (or grab the ARM binary directly)
- Add runs-on: self-hosted to your workflow YAML, done
jobs:
  build:
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v4
      - run: earthly +build
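Out of the box the runner only lives as long as the foreground ./run.sh process, so install it as a service too. A sketch, assuming the default actions-runner directory from the registration step:
# Run the runner as a systemd service so it survives reboots and SSH disconnects
cd ~/actions-runner
sudo ./svc.sh install
sudo ./svc.sh start
sudo ./svc.sh status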
Est. usage: 2–4 cores during builds, ~4–8GB RAM per concurrent job, nearly zero between runs
3. Personal Observability Stack: Grafana + Prometheus + Loki
Stop paying Datadog $30/month to monitor a side project. The full Grafana stack — metrics, logs, alerting — runs comfortably in under 6GB RAM on this box. 24GB means you can scrape a dozen services and retain 30 days of logs without sweating.
No seriously, don't sleep on this. You get dashboards, log correlation, and PagerDuty-style alerts for literally $0.
Tool: Grafana OSS stack
- Deploy with Docker Compose (Grafana + Prometheus + Loki + Promtail)
- Point Prometheus at your services; use Node Exporter for host metrics (scrape config sketch below)
- Import dashboard ID 1860 for a solid starting point
# docker-compose snippet
services:
  grafana:
    image: grafana/grafana:latest
    platform: linux/arm64
    ports: ["3000:3000"]
  prometheus:
    image: prom/prometheus:latest
    platform: linux/arm64
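The scrape config from the second bullet is only a few lines. A minimal sketch using Node Exporter's default port (9100); note that from inside a container, Prometheus needs a route to the host, so swap localhost for the instance's private IP or run with network_mode: host:
# prometheus.yml sketch; job names are examples, 9100 is Node Exporter's default port
cat > prometheus.yml <<'EOF'
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: node
    static_configs:
      - targets: ["localhost:9100"]
EOF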
Est. usage: ~3–5GB RAM total for the stack, <0.5 cores idle, spikes to 1 core on dashboard load
4. WebAssembly Edge Function Sandbox with Wasmtime
This one's underrated. WASM sandboxes are perfect for running untrusted user-submitted code — think online judges, plugin systems, or cheap serverless functions. Wasmtime's native ARM64 code generation is genuinely fast, not a gimmick.
The security story is real: each invocation gets an isolated sandbox with explicit capability grants. No container overhead.
Tool: Wasmtime + WAGI
- Install Wasmtime: curl https://wasmtime.dev/install.sh -sSf | bash
- Set up WAGI as an HTTP gateway for WASM modules
- Drop .wasm binaries into a modules directory; WAGI routes by path (modules.toml example below)
# Run a WASM function directly
wasmtime run --dir=. my_function.wasm
# Or via WAGI HTTP gateway
wagi -c modules.toml --listen 0.0.0.0:3000
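The modules.toml that the gateway command points at is just a route-to-binary map. A sketch; the route and module path are made-up examples:
# modules.toml: each [[module]] block maps an HTTP route to a .wasm file
cat > modules.toml <<'EOF'
[[module]]
route = "/hello"
module = "/opt/wasm/modules/hello.wasm"
EOF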
Est. usage: ~512MB–2GB RAM depending on concurrent executions, <1 core idle, scales linearly with load
5. Private LLM Gateway / API Proxy with LiteLLM
You're juggling OpenAI, Anthropic, and your local Ollama instance. LiteLLM unifies them behind one OpenAI-compatible endpoint. Self-host it here and you get: usage logging, per-key rate limiting, cost tracking, and fallback routing — all on metal you control.
This pairs perfectly with use case #1. Route cheap requests to local Mistral, expensive ones to GPT-4.
Tool: LiteLLM Proxy
- Install: pip install 'litellm[proxy]'
- Write a config.yaml with your model list and routing rules
- Run litellm --config config.yaml --port 8000 behind Caddy with auth
# config.yaml
model_list:
  - model_name: fast
    litellm_params:
      model: ollama/mistral
      api_base: http://localhost:11434
  - model_name: smart
    litellm_params:
      model: gpt-4o
      api_key: sk-...
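A quick sanity check that routing works, assuming you haven't set a master key yet; the model names match the config above:
# "fast" should land on local Mistral, "smart" on the hosted model
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "fast", "messages": [{"role": "user", "content": "ping"}]}'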
Est. usage: ~1–2GB RAM, <0.3 cores idle, negligible unless proxying heavy traffic
🏆 Top Pick: Ollama (Use Case #1)
Best spec-fit + practical value. 24GB RAM is exactly what you need for a 7B model to run without quantization compromises. ARM efficiency keeps idle consumption low. And the practical upside — a private, free, zero-latency LLM API — is immediately useful for literally every other project you're running on the same box.
Gotchas Nobody Mentions
ARM-incompatible Docker images are the #1 time sink. Always check for a linux/arm64 variant first; an explicit --platform linux/arm64 on pull will tell you immediately whether the maintainer publishes multi-arch. Sometimes you'll need to build from source.
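You can also check without pulling anything; grafana/grafana here is just an example image:
# Look for arm64 among the architectures in the image's manifest list
docker manifest inspect grafana/grafana:latest | grep -i arm64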
Oracle will nuke your account. Seriously. They've been known to terminate "free" instances citing abuse or inactivity. Snapshot your disk regularly. Don't build anything stateful here without a backup strategy.
No reverse DNS by default. Your IP won't resolve to a hostname. This matters if you're trying to send email or use services that do PTR record checks. Oracle lets you set rDNS in the console, but it's buried.
Egress costs aren't zero. The free tier includes 10TB/month outbound, but it's easy to burn through if you're serving large model weights or running a build cache that syncs artifacts. Watch the bandwidth dashboard.
Security lists ≠ iptables. Oracle has two firewall layers — the VCN security list and the OS-level iptables. Opening a port in the console does nothing if iptables blocks it. Both need to be configured.
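For reference, the OS-side half on Oracle's stock Ubuntu image looks roughly like this (port 3000 is just an example, and rule position 5 is an assumption that usually lands before the default REJECT rule; check sudo iptables -L --line-numbers first):
# Open the port at the OS layer; the VCN security list rule is still needed on top
sudo iptables -I INPUT 5 -p tcp --dport 3000 -m state --state NEW -j ACCEPT
sudo netfilter-persistent save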
What are you actually running on yours? Drop it in the comments — I'm always looking for the next reason to spin up another service on this thing.