5 Ways to Run AI Agents on Oracle Cloud Free Tier (4 ARM Cores, 24GB RAM)
Oracle Cloud's Always Free tier includes a single Ampere A1 Compute instance with 4 ARM64 cores and 24 GB RAM -- permanently free. A comparable AWS t4g.xlarge costs ~$110/month. Here are five AI agent use cases you can run on it today, with step-by-step commands, resource usage estimates, and the ARM-specific gotchas that catch most people the first time.
Use Case 1: Vector Database + Embedding Server (Qdrant + sentence-transformers)
Why 4 ARM Cores / 24 GB Fits
Qdrant uses approximately 1.5-2 GB of RAM per 1 million 768-dimensional vectors. With 24 GB total, you can store roughly 8-10 million vectors and still keep ~8-10 GB free for the OS and embedding model. The all-MiniLM-L6-v2 model requires only ~80 MB loaded; all-mpnet-base-v2 needs ~420 MB. Both fit comfortably alongside Qdrant.
4 ARM cores handle embedding inference at roughly 500-800 sentences/second for MiniLM -- more than sufficient for a personal AI research stack.
Setup Commands
sudo apt-get update && sudo apt-get install -y python3-pip python3-venv curl
curl -fsSL https://get.docker.com | sudo sh
sudo usermod -aG docker $USER && newgrp docker
docker pull qdrant/qdrant:latest
docker run -d --name qdrant -p 6333:6333 -p 6334:6334 -v $(pwd)/qdrant_storage:/qdrant/storage qdrant/qdrant
python3 -m venv venv && source venv/bin/activate
pip install sentence-transformers qdrant-client
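With Qdrant and the embedding model installed, a quick smoke test ties them together. A minimal sketch, assuming the Docker container above is running on localhost -- the "notes" collection name and sample texts are illustrative:

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional embeddings
client = QdrantClient(url="http://localhost:6333")

# Create (or reset) a small demo collection sized for MiniLM vectors
client.recreate_collection(
    collection_name="notes",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

docs = ["Oracle's Always Free tier includes 4 ARM cores and 24 GB RAM.",
        "Qdrant stores and searches dense vectors.",
        "MiniLM embeds sentences into 384 dimensions."]
client.upsert(
    collection_name="notes",
    points=[PointStruct(id=i, vector=vec.tolist(), payload={"text": doc})
            for i, (doc, vec) in enumerate(zip(docs, model.encode(docs)))],
)

# Semantic search: the free-tier question should match the first document
hits = client.search(collection_name="notes",
                     query_vector=model.encode("How big is the free tier?").tolist(),
                     limit=1)
print(hits[0].payload["text"])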
Resource Estimates
| Resource | Idle | Under Load |
|---|---|---|
| CPU | 2-5% | 60-80% (4 cores) |
| RAM | ~3 GB | ~6-8 GB |
| Storage | ~2 GB | ~10-15 GB (1M vectors) |
Oracle Gotcha
Security List rules required. By default, Oracle blocks all ingress traffic. You must add ingress rules for ports 6333 and 6334 in your VCN's Security List, or Qdrant will be unreachable. Navigate to: Networking -> Virtual Cloud Networks -> Your VCN -> Security Lists -> Add Ingress Rule.
Use Case 2: Local LLM Inference Server (Ollama + Llama 3.2)
Why 4 ARM Cores / 24 GB Fits
Llama 3.2 3B in Q4_K_M quantisation requires ~2.2 GB of RAM loaded. Llama 3.1 8B (Llama 3.2 itself has no 8B text model) requires ~5 GB. With 24 GB available, you can run the 8B model and still have ~18 GB free for other services.
CPU inference on 4 ARM64 cores achieves approximately 10-15 tokens/second for the 3B model -- adequate for background agent tasks, content generation, and analysis pipelines that are not latency-sensitive.
Setup Commands
curl -fsSL https://ollama.ai/install.sh | sh
sudo systemctl enable ollama && sudo systemctl start ollama
ollama pull llama3.2:3b
ollama run llama3.2:3b "What is GEO content optimisation?"
curl http://localhost:11434/api/generate -d '{"model":"llama3.2:3b","prompt":"Summarise:","stream":false}'
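To sanity-check the tokens/second numbers on your own instance, Ollama's non-streaming API response includes timing metadata. A minimal sketch using Python's requests library (the prompt is arbitrary):

import requests

resp = requests.post(
    "http://127.0.0.1:11434/api/generate",
    json={"model": "llama3.2:3b",
          "prompt": "Explain vector databases in two sentences.",
          "stream": False},
    timeout=300,
)
data = resp.json()

# eval_count = generated tokens; eval_duration = generation time in nanoseconds
print(f"{data['eval_count'] / (data['eval_duration'] / 1e9):.1f} tok/s")
print(data["response"])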
Resource Estimates
| Model | RAM | Tokens/sec | Storage |
|---|---|---|---|
| Llama 3.2 3B (Q4) | ~2.2 GB | 10-15 tok/s | 2.0 GB |
| Llama 3.1 8B (Q4) | ~5.0 GB | 4-7 tok/s | 4.7 GB |
Oracle Gotcha
ARM architecture requires ARM-native Docker images. If you pull an x86_64 binary it will fail with "exec format error". Always verify images support linux/arm64 via docker manifest inspect <image> before pulling. Ollama's native installer handles this automatically.
Use Case 3: AI Agent Orchestration Hub (n8n)
Why 4 ARM Cores / 24 GB Fits
n8n uses approximately 200-400 MB RAM at idle with a small workflow set. Under active execution with 10-20 concurrent workflows it uses ~1-2 GB. This leaves more than 20 GB free for an LLM or vector database running alongside it.
4 ARM cores handle n8n's workflow engine and Node.js backend simultaneously with overhead to spare.
Setup Commands
curl -fsSL https://get.docker.com | sudo sh
mkdir -p ~/.n8n
docker run -d --name n8n -p 5678:5678 -v ~/.n8n:/home/node/.n8n -e N8N_BASIC_AUTH_ACTIVE=true -e N8N_BASIC_AUTH_USER=admin -e N8N_BASIC_AUTH_PASSWORD=changeme --restart unless-stopped docker.n8n.io/n8nio/n8n
echo "n8n running at: http://$(curl -s ifconfig.me):5678"
Resource Estimates
| Service | RAM (idle) | RAM (active) | CPU |
|---|---|---|---|
| n8n | ~300 MB | ~1.5 GB | 5-30% |
Oracle Gotcha
Boot volume is only 47 GB by default. After the OS, Docker images, and workflow data you may hit the limit. Attach a Block Volume (Always Free includes 200 GB of block storage in total, shared with the boot volume) before installing everything:
lsblk  # confirm the new volume's device name (often /dev/sdb)
sudo mkfs.ext4 /dev/sdb
sudo mkdir /data && sudo mount /dev/sdb /data
echo "/dev/sdb /data ext4 defaults,nofail 0 2" | sudo tee -a /etc/fstab
Use Case 4: Web Scraping Pipeline (Playwright + Airflow)
Why 4 ARM Cores / 24 GB Fits
A headless Chromium instance via Playwright uses approximately 300-500 MB RAM per browser context. Running 4-8 parallel workers consumes ~2-4 GB total. Apache Airflow in a minimal SQLite-backed configuration uses ~500 MB. Total pipeline RAM: ~3-5 GB -- leaving 19 GB free.
Setup Commands
sudo apt-get install -y python3-pip python3-venv
python3 -m venv scrape-env && source scrape-env/bin/activate
pip install playwright apache-airflow
playwright install chromium
playwright install-deps chromium
airflow db init
airflow users create --username admin --password admin --firstname Admin --lastname User --role Admin --email admin@local
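A minimal scrape, with the ARM-specific launch flags covered in the gotcha below (the target URL is a placeholder):

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # --no-sandbox and --disable-dev-shm-usage are required on Oracle ARM (see gotcha)
    browser = p.chromium.launch(headless=True,
                                args=["--no-sandbox", "--disable-dev-shm-usage"])
    page = browser.new_page()
    page.goto("https://example.com", wait_until="domcontentloaded")
    print(page.title())
    browser.close()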
Resource Estimates
| Component | RAM | CPU |
|---|---|---|
| Airflow Webserver | ~400 MB | 2-5% |
| Airflow Scheduler | ~200 MB | 1-3% |
| Playwright (4 workers) | ~1.6 GB | 20-60% |
Oracle Gotcha
Playwright on ARM requires the --no-sandbox flag in server environments. Chromium's sandbox fails to initialise on Oracle's default ARM images, so without the flag Playwright hangs silently or throws "Target crashed":
browser = p.chromium.launch(args=["--no-sandbox", "--disable-dev-shm-usage"])
Use Case 5: Multi-Agent Coordination (CrewAI + Ollama)
Why 4 ARM Cores / 24 GB Fits
CrewAI and AutoGen are Python orchestration frameworks -- CPU-light because the heavy lifting is delegated to a local or remote LLM. A 4-agent CrewAI crew uses ~300-500 MB for its Python processes. Combined with Ollama serving the 3B model (Use Case 2), the entire stack runs in roughly 3 GB, leaving over 20 GB free.
Setup Commands
pip install crewai crewai-tools langchain-community
ollama pull llama3.2:3b
python3 - << 'DEMO'
from crewai import Agent, Task, Crew, Process
from langchain_community.llms import Ollama

llm = Ollama(model="llama3.2:3b", base_url="http://127.0.0.1:11434")

researcher = Agent(role="Researcher", goal="Find key facts", llm=llm,
                   backstory="Expert at gathering information.")
writer = Agent(role="Writer", goal="Write clear summaries", llm=llm,
               backstory="Skilled technical writer.")

task1 = Task(description="Research GEO trends 2025.", agent=researcher,
             expected_output="3 bullet points")
task2 = Task(description="Write 100-word summary.", agent=writer,
             expected_output="100-word summary")

crew = Crew(agents=[researcher, writer], tasks=[task1, task2],
            process=Process.sequential)
print(crew.kickoff())
DEMO
Resource Estimates
| Component | RAM | CPU |
|---|---|---|
| CrewAI Python | ~300 MB | 2-5% |
| Ollama (3B) | ~2.2 GB | 50-90% during inference |
| Total | ~2.5 GB | Bursts to ~95% |
Oracle Gotcha
IPv6 is enabled by default on Oracle instances, and "localhost" can resolve to ::1 while Ollama listens only on IPv4. If the Ollama connection fails, bind the server explicitly and use the IPv4 loopback address:
OLLAMA_HOST=0.0.0.0 ollama serve &
And in Python: use base_url="http://127.0.0.1:11434" instead of localhost.
Top Pick Ranking
| Rank | Use Case | Reason |
|---|---|---|
| 1 | Qdrant + Embedding Server | Enables RAG for all other use cases; excellent RAM headroom |
| 2 | Ollama + Llama 3.2 | Eliminates API costs; 10 tok/s viable for agent tasks |
| 3 | CrewAI Multi-Agent | Combines 1+2; orchestrates real agent workflows |
| 4 | n8n Orchestration | Fastest path to useful pipelines for non-coders |
| 5 | Web Scraping Pipeline | Most resource-hungry; best on dedicated instance |
Recommendation: Start with Use Cases 1 and 2 in parallel, then layer CrewAI on top -- a full autonomous AI agent stack for $0/month.
Oracle Cloud Gotchas: 5 Things Nobody Tells You
- Security Lists and Network Security Groups are different -- both can block traffic independently. When a port is unreachable, check both.
- ARM-incompatible images fail silently -- "exec format error" means you pulled an x86 image. Always check linux/arm64 support first.
- Boot volume is 47 GB, not unlimited -- the 200 GB Always Free block storage allowance is shared with the boot volume; additional volumes must be manually attached and mounted.
- IPv6 is default-on, causing subtle issues -- many services bind to ::1 instead of 127.0.0.1. Explicitly bind servers to 0.0.0.0.
- Always Free limits are per-tenancy -- you get 4 OCPUs and 24 GB RAM total across ALL ARM instances. A second instance will fail with a confusing "shape not available" error, not a quota error.