5 Ways to Run AI Agents on Oracle Cloud Free Tier (4 ARM Cores, 24GB RAM)
Oracle Cloud's Always Free tier includes a single Ampere A1 Compute instance with 4 ARM64 cores and 24 GB RAM -- permanently free. A comparable AWS t4g.xlarge costs ~$110/month. Here are five AI agent use cases you can run on it today, with step-by-step commands, resource usage estimates, and the ARM-specific gotchas that catch most people the first time.
Use Case 1: Vector Database + Embedding Server (Qdrant + sentence-transformers)
Why 4 ARM Cores / 24 GB Fits
Qdrant uses approximately 1.5-2 GB of RAM per 1 million 768-dimensional vectors. With 24 GB total, you can store roughly 8-10 million vectors and still keep ~8-10 GB free for the OS and embedding model. The all-MiniLM-L6-v2 model requires only ~80 MB loaded; all-mpnet-base-v2 needs ~420 MB. Both fit comfortably alongside Qdrant.
4 ARM cores handle embedding inference at roughly 500-800 sentences/second for MiniLM -- more than sufficient for a personal AI research stack.
Setup Commands
sudo apt-get update && sudo apt-get install -y python3-pip python3-venv curl
curl -fsSL https://get.docker.com | sudo sh
sudo usermod -aG docker $USER && newgrp docker
docker pull qdrant/qdrant:latest
docker run -d --name qdrant -p 6333:6333 -p 6334:6334 -v $(pwd)/qdrant_storage:/qdrant/storage qdrant/qdrant
python3 -m venv venv && source venv/bin/activate
pip install sentence-transformers qdrant-client
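With Qdrant and the embedding model installed, a quick smoke test ties them together. A minimal sketch, assuming the Docker container above is running on localhost -- the "notes" collection name and sample texts are illustrative:

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional embeddings
client = QdrantClient(url="http://localhost:6333")

# Create (or reset) a small demo collection sized for MiniLM vectors
client.recreate_collection(
    collection_name="notes",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

docs = ["Oracle's Always Free tier includes 4 ARM cores and 24 GB RAM.",
        "Qdrant stores and searches dense vectors.",
        "MiniLM embeds sentences into 384 dimensions."]
client.upsert(
    collection_name="notes",
    points=[PointStruct(id=i, vector=vec.tolist(), payload={"text": doc})
            for i, (doc, vec) in enumerate(zip(docs, model.encode(docs)))],
)

# Semantic search: the free-tier question should match the first document
hits = client.search(collection_name="notes",
                     query_vector=model.encode("How big is the free tier?").tolist(),
                     limit=1)
print(hits[0].payload["text"])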
Resource Estimates
| Resource | Idle | Under Load |
|---|---|---|
| CPU | 2-5% | 60-80% (4 cores) |
| RAM | ~3 GB | ~6-8 GB |
| Storage | ~2 GB | ~10-15 GB (1M vectors) |
Oracle Gotcha
Security List rules required. By default, Oracle blocks all ingress traffic. You must add ingress rules for ports 6333 and 6334 in your VCN's Security List, or Qdrant will be unreachable. Navigate to: Networking -> Virtual Cloud Networks -> Your VCN -> Security Lists -> Add Ingress Rule.
Use Case 2: Local LLM Inference Server (Ollama + Llama 3.2)
Why 4 ARM Cores / 24 GB Fits
Llama 3.2 3B in Q4_K_M quantisation requires ~2.2 GB of RAM loaded. Llama 3.1 8B (Llama 3.2 itself has no 8B text model) requires ~5 GB. With 24 GB available, you can run the 8B model and still have ~18 GB free for other services.
CPU inference on 4 ARM64 cores achieves approximately 10-15 tokens/second for the 3B model -- adequate for background agent tasks, content generation, and analysis pipelines that are not latency-sensitive.
Setup Commands
curl -fsSL https://ollama.ai/install.sh | sh
sudo systemctl enable ollama && sudo systemctl start ollama
ollama pull llama3.2:3b
ollama run llama3.2:3b "What is GEO content optimisation?"
curl http://localhost:11434/api/generate -d '{"model":"llama3.2:3b","prompt":"Summarise:","stream":false}'
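To sanity-check the tokens/second numbers on your own instance, Ollama's non-streaming API response includes timing metadata. A minimal sketch using Python's requests library (the prompt is arbitrary):

import requests

resp = requests.post(
    "http://127.0.0.1:11434/api/generate",
    json={"model": "llama3.2:3b",
          "prompt": "Explain vector databases in two sentences.",
          "stream": False},
    timeout=300,
)
data = resp.json()

# eval_count = generated tokens; eval_duration = generation time in nanoseconds
print(f"{data['eval_count'] / (data['eval_duration'] / 1e9):.1f} tok/s")
print(data["response"])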
Resource Estimates
| Model | RAM | Tokens/sec | Storage |
|---|---|---|---|
| Llama 3.2 3B (Q4) | ~2.2 GB | 10-15 tok/s | 2.0 GB |
| Llama 3.1 8B (Q4) | ~5.0 GB | 4-7 tok/s | 4.7 GB |
Oracle Gotcha
ARM architecture requires ARM-native Docker images. If you pull an x86_64 binary it will fail with "exec format error". Always verify images support linux/arm64 via docker manifest inspect <image> before pulling. Ollama's native installer handles this automatically.
Use Case 3: AI Agent Orchestration Hub (n8n)
Why 4 ARM Cores / 24 GB Fits
n8n uses approximately 200-400 MB RAM at idle with a small workflow set. Under active execution with 10-20 concurrent workflows it uses ~1-2 GB. This leaves more than 20 GB free for an LLM or vector database running alongside it.
4 ARM cores handle n8n's workflow engine and Node.js backend simultaneously with overhead to spare.
Setup Commands
curl -fsSL https://get.docker.com | sudo sh
mkdir -p ~/.n8n
docker run -d --name n8n -p 5678:5678 -v ~/.n8n:/home/node/.n8n -e N8N_BASIC_AUTH_ACTIVE=true -e N8N_BASIC_AUTH_USER=admin -e N8N_BASIC_AUTH_PASSWORD=changeme --restart unless-stopped docker.n8n.io/n8nio/n8n
echo "n8n running at: http://$(curl -s ifconfig.me):5678"
Resource Estimates
| Service | RAM (idle) | RAM (active) | CPU |
|---|---|---|---|
| n8n | ~300 MB | ~1.5 GB | 5-30% |
Oracle Gotcha
Boot volume is only 47 GB by default. After the OS, Docker images, and workflow data you may hit the limit. Attach a Block Volume (Always Free includes 200 GB of block storage in total, shared with the boot volume) before installing everything:
lsblk  # confirm the new volume's device name (often /dev/sdb)
sudo mkfs.ext4 /dev/sdb
sudo mkdir /data && sudo mount /dev/sdb /data
echo "/dev/sdb /data ext4 defaults,nofail 0 2" | sudo tee -a /etc/fstab
Use Case 4: Web Scraping Pipeline (Playwright + Airflow)
Why 4 ARM Cores / 24 GB Fits
A headless Chromium instance via Playwright uses approximately 300-500 MB RAM per browser context. Running 4-8 parallel workers consumes ~2-4 GB total. Apache Airflow in a minimal SQLite-backed configuration uses ~500 MB. Total pipeline RAM: ~3-5 GB -- leaving 19 GB free.
Setup Commands
sudo apt-get install -y python3-pip python3-venv
python3 -m venv scrape-env && source scrape-env/bin/activate
pip install playwright apache-airflow
playwright install chromium
playwright install-deps chromium
airflow db init
airflow users create --username admin --password admin --firstname Admin --lastname User --role Admin --email admin@local
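A minimal scrape, with the ARM-specific launch flags covered in the gotcha below (the target URL is a placeholder):

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # --no-sandbox and --disable-dev-shm-usage are required on Oracle ARM (see gotcha)
    browser = p.chromium.launch(headless=True,
                                args=["--no-sandbox", "--disable-dev-shm-usage"])
    page = browser.new_page()
    page.goto("https://example.com", wait_until="domcontentloaded")
    print(page.title())
    browser.close()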
Resource Estimates
| Component | RAM | CPU |
|---|---|---|
| Airflow Webserver | ~400 MB | 2-5% |
| Airflow Scheduler | ~200 MB | 1-3% |
| Playwright (4 workers) | ~1.6 GB | 20-60% |
Oracle Gotcha
Playwright on ARM requires the --no-sandbox flag in server environments. Chromium's sandbox fails to initialise on Oracle's default ARM images, so without the flag Playwright hangs silently or throws "Target crashed":
browser = p.chromium.launch(args=["--no-sandbox", "--disable-dev-shm-usage"])
Use Case 5: Multi-Agent Coordination (CrewAI + Ollama)
Why 4 ARM Cores / 24 GB Fits
CrewAI and AutoGen are Python orchestration frameworks -- CPU-light because the heavy lifting is delegated to a local or remote LLM. A 4-agent CrewAI crew uses ~300-500 MB for its Python processes. Combined with Ollama serving the 3B model (Use Case 2), the entire stack runs in roughly 3 GB, leaving over 20 GB free.
Setup Commands
pip install crewai crewai-tools langchain-community
ollama pull llama3.2:3b
python3 - << 'DEMO'
from crewai import Agent, Task, Crew, Process
from langchain_community.llms import Ollama

llm = Ollama(model="llama3.2:3b", base_url="http://127.0.0.1:11434")

researcher = Agent(role="Researcher", goal="Find key facts", llm=llm,
                   backstory="Expert at gathering information.")
writer = Agent(role="Writer", goal="Write clear summaries", llm=llm,
               backstory="Skilled technical writer.")

task1 = Task(description="Research GEO trends 2025.", agent=researcher,
             expected_output="3 bullet points")
task2 = Task(description="Write 100-word summary.", agent=writer,
             expected_output="100-word summary")

crew = Crew(agents=[researcher, writer], tasks=[task1, task2],
            process=Process.sequential)
print(crew.kickoff())
DEMO
Resource Estimates
| Component | RAM | CPU |
|---|---|---|
| CrewAI Python | ~300 MB | 2-5% |
| Ollama (3B) | ~2.2 GB | 50-90% during inference |
| Total | ~2.5 GB | Bursts to ~95% |
Oracle Gotcha
IPv6 is enabled by default on Oracle instances, and "localhost" can resolve to ::1 while Ollama listens only on IPv4. If the Ollama connection fails, bind the server explicitly and use the IPv4 loopback address:
OLLAMA_HOST=0.0.0.0 ollama serve &
And in Python: use base_url="http://127.0.0.1:11434" instead of localhost.
Top Pick Ranking
| Rank | Use Case | Reason |
|---|---|---|
| 1 | Qdrant + Embedding Server | Enables RAG for all other use cases; excellent RAM headroom |
| 2 | Ollama + Llama 3.2 | Eliminates API costs; 10 tok/s viable for agent tasks |
| 3 | CrewAI Multi-Agent | Combines 1+2; orchestrates real agent workflows |
| 4 | n8n Orchestration | Fastest path to useful pipelines for non-coders |
| 5 | Web Scraping Pipeline | Most resource-hungry; best on dedicated instance |
Recommendation: Start with Use Cases 1 and 2 in parallel, then layer CrewAI on top -- a full autonomous AI agent stack for $0/month.
Oracle Cloud Gotchas: 5 Things Nobody Tells You
- Security Lists and Network Security Groups are different -- both can block traffic independently. When a port is unreachable, check both.
- ARM-incompatible images fail silently -- "exec format error" means you pulled an x86 image. Always check linux/arm64 support first.
- Boot volume is 47 GB, not unlimited -- the 200 GB Always Free block storage allowance is shared with the boot volume; additional volumes must be manually attached and mounted.
- IPv6 is default-on, causing subtle issues -- many services bind to ::1 instead of 127.0.0.1. Explicitly bind servers to 0.0.0.0.
- Always Free limits are per-tenancy -- you get 4 OCPUs and 24 GB RAM total across ALL ARM instances. A second instance will fail with a confusing "shape not available" error, not a quota error.