Elena Revicheva

Posted on • Originally published at aideazz.hashnode.dev

Oracle Cloud Free Tier for Production AI Agents: What Actually Works

Originally published on AIdeazz — cross-posted here with canonical link.

Most developers dismiss Oracle Cloud's free tier as a student sandbox. They're wrong. I've been running production AI agents on it for months — serving real customers, handling thousands of requests, staying online month after month. The infrastructure is solid, but you need to understand its sharp edges and work within specific constraints.

The Free Tier Reality Check

Oracle gives you 4 ARM cores, 24GB RAM, and 200GB storage split across VMs. Plus an Autonomous Database with 20GB storage and APEX for quick frontends. That's more generous than AWS or GCP free tiers, but the real value isn't the specs — it's the production-grade features they don't gate behind paywalls.

Here's what I actually run on Oracle Cloud free tier AI infrastructure:

  • Multi-agent orchestration system routing between Groq and Claude
  • WhatsApp and Telegram bot endpoints
  • Vector storage for RAG pipelines
  • Real-time monitoring and alerting
  • Backup failover instances

The catch? You get two VMs maximum. That forces architectural decisions most cloud-native developers aren't used to making. You can't spin up a new instance for every microservice. You need to think like you're managing physical servers again.

VM Shapes and Agent Architecture

Oracle's free ARM shapes (VM.Standard.A1.Flex) are surprisingly capable for AI workloads. I run a primary orchestrator on one 2-core/12GB instance and a failover on the second. The orchestrator doesn't run models — it routes requests to external APIs (Groq for speed, Claude for complex reasoning) and manages state.

The architecture looks like this:

```
Primary VM (2 cores, 12GB):
- Nginx reverse proxy
- Node.js orchestrator service
- Redis for session management
- Vector DB (Qdrant) for RAG
- Monitoring stack (Prometheus/Grafana)

Secondary VM (2 cores, 12GB):
- Standby orchestrator
- PostgreSQL for conversation history
- Backup vector DB
- Log aggregation
```

Memory is your constraint, not CPU. Each agent process takes 200-400MB baseline. Vector operations spike to 2GB during indexing. Plan for 8GB working memory on a 12GB instance after OS overhead.

The ARM architecture means some libraries need recompilation. Qdrant ships ARM builds, but I had to build pgvector from source. Docker helps, but adds 10-15% overhead you might not be able to afford.

Autonomous Database for Agent State

Oracle's Always Free Autonomous Database is the sleeper hit. 20GB might seem limiting, but for agent state management it's plenty. I store:

  • Conversation histories (compressed after 7 days)
  • User preferences and context
  • Agent performance metrics
  • Workflow definitions

The built-in REST APIs mean you can query directly from your agents without an ORM. Response times average 15ms for simple queries, 50ms for complex joins. That's faster than self-hosted Postgres on the same free tier VMs.
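As a sketch of what "no ORM" looks like in practice: ORDS exposes a REST-Enabled SQL endpoint you can hit with plain `fetch`. The base URL below is a placeholder, and you should verify the endpoint path and payload shape against your own instance's ORDS docs:

```javascript
// Build a request for ORDS's REST-Enabled SQL service so agents can
// query Autonomous DB without an ORM. The URL is a placeholder;
// verify the path and payload shape against your ORDS instance.
function buildSqlRequest(baseUrl, statement, binds = {}) {
  return {
    url: `${baseUrl}/_/sql`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ statement_text: statement, binds }),
    },
  };
}

// Usage with Node 18+'s built-in fetch (auth omitted for brevity):
// const { url, options } = buildSqlRequest(
//   'https://myadb.adb.us-ashburn-1.oraclecloudapps.com/ords/agents',
//   'SELECT user_id, agent_type FROM conversations WHERE rownum <= :n',
//   { n: 10 }
// );
// const result = await fetch(url, options).then(r => r.json());
```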

More importantly, it's truly autonomous. Automatic backups, patching, and scaling within limits. I've never touched maintenance in six months of operation. Compare that to managing your own Postgres instance — the time saved is worth architectural compromises.

Schema design matters more when you can't just throw storage at problems:

```sql
-- Conversation storage with automatic compression
CREATE TABLE conversations (
    id RAW(16) DEFAULT SYS_GUID() PRIMARY KEY,
    user_id VARCHAR2(255),
    agent_type VARCHAR2(50),
    messages BLOB,
    created_at TIMESTAMP DEFAULT SYSTIMESTAMP,
    compressed NUMBER(1) DEFAULT 0
);

-- Automatic compression job (enabled => TRUE, or the job is created
-- but never runs)
BEGIN
  DBMS_SCHEDULER.CREATE_JOB (
    job_name        => 'COMPRESS_OLD_CONVOS',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN compress_old_conversations; END;',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY; BYHOUR=3',
    enabled         => TRUE
  );
END;
/
```

mTLS and Security Without The Pain

Oracle enforces mTLS for Autonomous Database connections. Initially annoying, but it forced me to implement proper certificate management early. Every connection is encrypted, authenticated, and logged. You can't accidentally expose a database to the internet.

For AI agents, this matters. You're handling user conversations, potentially sensitive data. The default security posture saves you from rookie mistakes. I've seen too many chatbot databases exposed on Shodan because someone forgot to configure firewall rules.

Setting up mTLS for your agents:

```javascript
const oracledb = require('oracledb');

// Wallet path contains all certs
oracledb.initOracleClient({
  libDir: '/opt/oracle/instantclient',
  configDir: '/opt/oracle/wallet'
});

// Connection automatically uses mTLS (run inside an async function)
const connection = await oracledb.getConnection({
  user: process.env.DB_USER,
  password: process.env.DB_PASSWORD,
  connectString: process.env.DB_CONNECT_STRING
});
```

The wallet setup is one-time pain. Download from OCI console, extract to your VM, point your client library at it. After that, it just works.

Keeping Agents Alive in Production

Free tier doesn't include fancy orchestration. No Kubernetes, no managed containers. You need old-school process management. I use PM2 for Node.js agents, systemd for everything else.

Critical lesson: Oracle terminates idle VMs after 7 days. Not just stops — terminates. Your agents must generate enough CPU activity to avoid idle detection. I run a lightweight monitoring heartbeat every hour that spikes CPU to 5% for 30 seconds. Ugly but effective.

My PM2 ecosystem file for production agents:

```javascript
module.exports = {
  apps: [{
    name: 'orchestrator',
    script: './src/orchestrator.js',
    instances: 2,
    exec_mode: 'cluster',
    max_memory_restart: '800M',
    error_file: '/var/log/pm2/orchestrator-error.log',
    out_file: '/var/log/pm2/orchestrator-out.log',
    merge_logs: true,
    env: {
      NODE_ENV: 'production',
      GROQ_API_KEY: process.env.GROQ_API_KEY,
      CLAUDE_API_KEY: process.env.CLAUDE_API_KEY
    }
  }, {
    name: 'heartbeat',
    script: './src/heartbeat.js',
    cron_restart: '0 * * * *',
    autorestart: false
  }]
};
```

The heartbeat script prevents termination:

```javascript
// Prevents Oracle free tier VM termination
const crypto = require('crypto');

function generateActivity() {
  const start = Date.now();
  const duration = 30000; // 30 seconds

  while (Date.now() - start < duration) {
    // CPU-intensive operation
    crypto.pbkdf2Sync('keepalive', 'salt', 100000, 512, 'sha512');
  }

  console.log(`Heartbeat completed at ${new Date().toISOString()}`);
  process.exit(0);
}

generateActivity();
```

Real Constraints and Workarounds

Network limits: 10TB egress monthly. Sounds like plenty until you're proxying image generation or video processing. I route media-heavy operations directly to client devices when possible.

No load balancer: The free tier doesn't include OCI Load Balancer. I use Nginx on the primary VM with health checks to the secondary. During primary failure, I update DNS (60-second TTL) to point to secondary. Not instant failover, but good enough for most agent use cases.
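The failover decision itself needs a debounce: with a 60-second TTL, every DNS flip is sticky, so one failed health check shouldn't trigger it. A minimal sketch of that gate (the threshold and names are my own convention, not an OCI feature):

```javascript
// Debounce DNS failover: only flip to the secondary after several
// consecutive failed health checks, since the 60s TTL makes every
// flip sticky. Threshold and names are illustrative.
const FAIL_THRESHOLD = 3;

function createFailoverGate(threshold = FAIL_THRESHOLD) {
  let consecutiveFailures = 0;
  return function onHealthCheck(healthy) {
    if (healthy) {
      consecutiveFailures = 0;
      return false; // primary recovered, no failover
    }
    consecutiveFailures += 1;
    return consecutiveFailures >= threshold; // flip DNS to secondary
  };
}

// Two failures don't flip; the third does:
const shouldFailover = createFailoverGate();
shouldFailover(false); // false
shouldFailover(false); // false
shouldFailover(false); // true: update the DNS record now
```

Whatever updates the record (your DNS provider's API, a manual runbook) sits behind that boolean.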

Storage performance: The boot volumes are network-attached, not local NVMe. Sequential reads hit 100MB/s, but random I/O struggles. Don't run your vector database on boot volume — use memory or provision block storage (not free).

IPv4 exhaustion: You get 2 public IPs max. I multiplex services with Nginx: separate server_name blocks for different domains, path-based routing within them. Telegram webhooks go to yourdomain.com/telegram, WhatsApp to yourdomain.com/whatsapp, all on the same IP.
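The path dispatch is simple enough to sketch in a few lines of Node (handler names here are illustrative; in production Nginx does this before traffic reaches the orchestrator):

```javascript
// Multiplex several bot webhooks behind one public IP by dispatching
// on URL path: the same job the Nginx config does, sketched in Node.
const routes = {
  '/telegram': 'telegramHandler',
  '/whatsapp': 'whatsappHandler',
};

function resolveHandler(path) {
  // Prefix match so /telegram/webhook still routes correctly.
  const prefix = Object.keys(routes)
    .find(p => path === p || path.startsWith(p + '/'));
  return prefix ? routes[prefix] : null; // null means 404
}

resolveHandler('/telegram/webhook'); // 'telegramHandler'
resolveHandler('/whatsapp');         // 'whatsappHandler'
resolveHandler('/unknown');          // null
```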

Minimal object storage: Oracle's Always Free tier includes only a small Object Storage allowance (about 20GB total), not enough for media archives. I use Cloudflare R2 (generous free tier) for conversation archives and generated media. R2 charges nothing for egress, keeping costs zero.

Production Patterns That Work

After months of operation, these patterns consistently deliver:

Pattern 1: Stateless orchestrators, stateful database
Agents are disposable. All state lives in Autonomous DB or Redis. Agent crashes don't lose conversations. Deployments are zero-downtime by updating standby first.
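Pattern 1 in miniature: every read and write goes through an injected store, so the orchestrator process holds nothing worth losing. In this self-contained sketch a plain Map stands in for Redis or Autonomous DB, and the calls are synchronous for brevity (production stores are async):

```javascript
// Stateless orchestrator, stateful store: a restarted orchestrator
// over the same store sees the same history. Map stands in for
// Redis/Autonomous DB; names are illustrative.
function createOrchestrator(store) {
  return {
    handleMessage(userId, text) {
      const history = store.get(userId) ?? [];
      history.push(text);
      store.set(userId, history);
      return history.length; // e.g. used for context-window trimming
    },
  };
}

const store = new Map();
const agentA = createOrchestrator(store);
agentA.handleMessage('u1', 'hi');          // 1

// Simulate a crash-and-restart: a fresh orchestrator, same store.
const agentB = createOrchestrator(store);
agentB.handleMessage('u1', 'still here');  // 2, state survived
```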

Pattern 2: External model routing
Don't try to run models on free tier VMs. Route to Groq (fast, cheap) for simple queries, Claude (smart, expensive) for complex ones. Decision logic in orchestrator:

```javascript
function selectModel(query, userTier, conversationContext) {
  const complexity = analyzeComplexity(query);

  if (complexity.requiresReasoning || userTier === 'premium') {
    return 'claude-3-opus';
  }

  if (complexity.tokens > 4000 || complexity.multiStep) {
    return 'groq-mixtral';
  }

  return 'groq-llama-70b';
}
```

Pattern 3: Aggressive caching
Cache everything. Model responses for common queries. User preferences. Workflow definitions. RAM is limited but faster than any network call. My Redis instance runs with 2GB max memory, LRU eviction, and serves 90% of repeat queries from cache.

Pattern 4: Monitoring without observability platforms
DataDog and New Relic cost more than your infrastructure. I use Prometheus + Grafana on the secondary VM, scraping metrics every 30 seconds. Alerts go to Telegram via webhook. Total overhead: 200MB RAM, 5GB disk.

When to Graduate Beyond Free Tier

Oracle Cloud free tier AI infrastructure works until:

  • You need true high availability (not 60-second DNS failover)
  • Media processing becomes core (transcription, image generation)
  • You hit 10TB monthly egress
  • Compliance requires specific regions (free tier is limited)
  • You need more than 50 concurrent agent sessions

At that point, the jump to a paid tier is reasonable. You've validated the business model, the architecture is proven, and you know your exact resource requirements. Most importantly, you can keep the free tier as a development/staging environment.

I'm still on free tier after six months, serving hundreds of daily users across WhatsApp and Telegram. The constraints force efficient architecture. The boring infrastructure work — process management, monitoring, backups — is what keeps agents alive, not the AI model you choose.

Oracle built their free tier to hook enterprises. They accidentally created the best platform for bootstrapped AI builders who care more about shipping than architectural purity.

— Elena Revicheva · AIdeazz · Portfolio
