Oracle's free tier gives you 4 ARM cores and 24GB RAM. Forever. Most people waste it on nginx serving a portfolio site that gets 3 visitors a month. Here's what's actually worth running.
1. Self-Hosted AI Inference with Ollama + Mistral 7B
Run a local LLM that you actually own. Ollama turns model management into a docker pull-style workflow, and Mistral 7B fits comfortably in 24GB with room to breathe.
Turns out 24GB is the magic number for 7B models. You get real inference speeds without quantization sacrifices, and ARM's efficiency means idle CPU sits around 2–3% between requests.
Tool: Ollama
- Install: curl -fsSL https://ollama.com/install.sh | sh
- Pull a model: ollama pull mistral
- Expose it via systemd and a Caddy reverse proxy on port 11434 (sketch below)
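On Linux the install script sets up an ollama systemd service for you, so the only extra piece is the proxy. A minimal Caddyfile sketch, assuming Caddy is already installed as a system service; llm.example.com is a placeholder domain, and you'll want auth in front before exposing it publicly:
# Reverse proxy Ollama through Caddy (domain is a placeholder)
sudo tee /etc/caddy/Caddyfile > /dev/null <<'EOF'
llm.example.com {
    reverse_proxy localhost:11434
}
EOF
sudo systemctl reload caddy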
# Quick smoke test
curl http://localhost:11434/api/generate \
-d '{"model": "mistral", "prompt": "Explain ARM64 in one sentence", "stream": false}'
Est. usage: ~18–22GB RAM under load, 3–4 cores pegged during inference, ~0.5 cores idle
2. GitHub Actions Self-Hosted Runner with Earthly Cache
Your CI pipeline is slow because you're paying for shared GitHub runners that throw away your build cache every run. A self-hosted runner on this box fixes that — 4 ARM cores handle parallel jobs fine, and Earthly's cache layer persists locally between runs.
The real win: Docker layer caching survives across PRs. A build that took 8 minutes drops to 90 seconds.
Tool: Earthly + GitHub Actions runner
- Register the runner: ./config.sh --url https://github.com/your/repo --token TOKEN (service install sketch below)
- Install Earthly: brew install earthly/earthly/earthly (or grab the ARM binary directly)
- Add runs-on: self-hosted to your workflow YAML, done
jobs:
  build:
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v4
      - run: earthly +build
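Out of the box the runner only lives as long as the foreground ./run.sh process, so install it as a service too. A sketch, assuming the default actions-runner directory from the registration step:
# Run the runner as a systemd service so it survives reboots and SSH disconnects
cd ~/actions-runner
sudo ./svc.sh install
sudo ./svc.sh start
sudo ./svc.sh status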
Est. usage: 2–4 cores during builds, ~4–8GB RAM per concurrent job, nearly zero between runs
3. Personal Observability Stack: Grafana + Prometheus + Loki
Stop paying Datadog $30/month to monitor a side project. The full Grafana stack — metrics, logs, alerting — runs comfortably in under 6GB RAM on this box. 24GB means you can scrape a dozen services and retain 30 days of logs without sweating.
No seriously, don't sleep on this. You get dashboards, log correlation, and PagerDuty-style alerts for literally $0.
Tool: Grafana OSS stack
- Deploy with Docker Compose (Grafana + Prometheus + Loki + Promtail)
- Point Prometheus at your services; use Node Exporter for host metrics (scrape config sketch below)
- Import dashboard ID 1860 for a solid starting point
# docker-compose snippet
services:
  grafana:
    image: grafana/grafana:latest
    platform: linux/arm64
    ports: ["3000:3000"]
  prometheus:
    image: prom/prometheus:latest
    platform: linux/arm64
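The scrape config from the second bullet is only a few lines. A minimal sketch using Node Exporter's default port (9100); note that from inside a container, Prometheus needs a route to the host, so swap localhost for the instance's private IP or run with network_mode: host:
# prometheus.yml sketch; job names are examples, 9100 is Node Exporter's default port
cat > prometheus.yml <<'EOF'
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: node
    static_configs:
      - targets: ["localhost:9100"]
EOF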
Est. usage: ~3–5GB RAM total for the stack, <0.5 cores idle, spikes to 1 core on dashboard load
4. WebAssembly Edge Function Sandbox with Wasmtime
This one's underrated. WASM sandboxes are perfect for running untrusted user-submitted code — think online judges, plugin systems, or cheap serverless functions. Wasmtime's native ARM64 code generation is genuinely fast, not a gimmick.
The security story is real: each invocation gets an isolated sandbox with explicit capability grants. No container overhead.
Tool: Wasmtime + WAGI
- Install Wasmtime: curl https://wasmtime.dev/install.sh -sSf | bash
- Set up WAGI as an HTTP gateway for WASM modules
- Drop .wasm binaries into a modules directory; WAGI routes by path (modules.toml example below)
# Run a WASM function directly
wasmtime run --dir=. my_function.wasm
# Or via WAGI HTTP gateway
wagi -c modules.toml --listen 0.0.0.0:3000
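The modules.toml that the gateway command points at is just a route-to-binary map. A sketch; the route and module path are made-up examples:
# modules.toml: each [[module]] block maps an HTTP route to a .wasm file
cat > modules.toml <<'EOF'
[[module]]
route = "/hello"
module = "/opt/wasm/modules/hello.wasm"
EOF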
Est. usage: ~512MB–2GB RAM depending on concurrent executions, <1 core idle, scales linearly with load
5. Private LLM Gateway / API Proxy with LiteLLM
You're juggling OpenAI, Anthropic, and your local Ollama instance. LiteLLM unifies them behind one OpenAI-compatible endpoint. Self-host it here and you get: usage logging, per-key rate limiting, cost tracking, and fallback routing — all on metal you control.
This pairs perfectly with use case #1. Route cheap requests to local Mistral, expensive ones to GPT-4.
Tool: LiteLLM Proxy
- Install: pip install 'litellm[proxy]'
- Write a config.yaml with your model list and routing rules
- Run litellm --config config.yaml --port 8000 behind Caddy with auth
# config.yaml
model_list:
  - model_name: fast
    litellm_params:
      model: ollama/mistral
      api_base: http://localhost:11434
  - model_name: smart
    litellm_params:
      model: gpt-4o
      api_key: sk-...
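A quick sanity check that routing works, assuming you haven't set a master key yet; the model names match the config above:
# "fast" should land on local Mistral, "smart" on the hosted model
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "fast", "messages": [{"role": "user", "content": "ping"}]}'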
Est. usage: ~1–2GB RAM, <0.3 cores idle, negligible unless proxying heavy traffic
🏆 Top Pick: Ollama (Use Case #1)
Best spec-fit + practical value. 24GB RAM is exactly what you need for a 7B model to run without quantization compromises. ARM efficiency keeps idle consumption low. And the practical upside — a private, free, zero-latency LLM API — is immediately useful for literally every other project you're running on the same box.
Gotchas Nobody Mentions
ARM-incompatible Docker images are the #1 time sink. Always check for a linux/arm64 variant first; an explicit --platform linux/arm64 on pull will tell you immediately whether the maintainer publishes multi-arch. Sometimes you'll need to build from source.
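You can also check without pulling anything; grafana/grafana here is just an example image:
# Look for arm64 among the architectures in the image's manifest list
docker manifest inspect grafana/grafana:latest | grep -i arm64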
Oracle will nuke your account. Seriously. They've been known to terminate "free" instances citing abuse or inactivity. Snapshot your disk regularly. Don't build anything stateful here without a backup strategy.
No reverse DNS by default. Your IP won't resolve to a hostname. This matters if you're trying to send email or use services that do PTR record checks. Oracle lets you set rDNS in the console, but it's buried.
Egress costs aren't zero. The free tier includes 10TB/month outbound, but it's easy to burn through if you're serving large model weights or running a build cache that syncs artifacts. Watch the bandwidth dashboard.
Security lists ≠ iptables. Oracle has two firewall layers — the VCN security list and the OS-level iptables. Opening a port in the console does nothing if iptables blocks it. Both need to be configured.
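For reference, the OS-side half on Oracle's stock Ubuntu image looks roughly like this (port 3000 is just an example, and rule position 5 is an assumption that usually lands before the default REJECT rule; check sudo iptables -L --line-numbers first):
# Open the port at the OS layer; the VCN security list rule is still needed on top
sudo iptables -I INPUT 5 -p tcp --dport 3000 -m state --state NEW -j ACCEPT
sudo netfilter-persistent save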
What are you actually running on yours? Drop it in the comments — I'm always looking for the next reason to spin up another service on this thing.