DEV Community

Firew Shafi
# How I Built an AI That Controls Cloudflare WAF via Plain English

I've managed Cloudflare across multiple enterprise accounts for the past two years. I know the dashboard intimately — probably better than I'd like to.

In that time, I've navigated the same mental maze hundreds of times. You know the one: you need to block a badly-behaved IP that just started hammering your login endpoint, so you open the dashboard, find the right account out of several, switch to the right zone, locate Security → WAF → Firewall Rules, click Create a Rule, manually type out a firewall expression in the CF expression language, pick an action, hit save, and finally verify it's active. If you manage five accounts and eight zones, that's not a security task anymore — it's a memory test.

That frustration is why I built Flarite — an AI command engine that lets you manage your entire Cloudflare setup (and eventually your full SaaS stack) through plain English. No dashboard. No expression syntax memorization. Just intent.

This article is about the technical decisions behind making that work — specifically for the Cloudflare WAF and firewall management piece. I'll walk through the architecture, the security model for handling API tokens, and how the AI translates natural language into valid Cloudflare filter expressions.


## The Core Insight: Cloudflare Already Has a Great API

Before writing a single line, I needed to validate the premise. The question wasn't "can Cloudflare be automated?" — it very much can. Their REST API is comprehensive and well-documented. The question was: "can the gap between what a person thinks and what the API understands be bridged reliably by an LLM?"

For WAF in particular, the answer is almost always yes. The reason is that Cloudflare's filter expression language, while powerful, maps one-to-one with concepts a person naturally expresses:

| What you say | What Cloudflare needs |
| --- | --- |
| "Block China" | `(ip.geoip.country eq "CN")` |
| "Block that IP 1.2.3.4" | `(ip.src eq 1.2.3.4)` |
| "Challenge scrapers" | `(cf.threat_score gt 10)` |
| "Block anyone hitting /admin from outside" | `(http.request.uri.path eq "/admin") and (not ip.src in {10.0.0.0/8})` |

(Note that IP literals are unquoted in Cloudflare's rules language, while country codes are quoted strings.)

The semantic distance between human intent and the expression DSL is small. That's the ideal task for an LLM.
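To make that small distance concrete, here's a tiny sketch of the translation layer. These helpers are my own illustration, not Flarite's code:

```typescript
// Illustrative helpers — each returns a Cloudflare filter expression fragment.
// IP literals are unquoted in the rules language; country codes are quoted strings.
const blockCountry = (iso: string): string => `(ip.geoip.country eq "${iso.toUpperCase()}")`;
const blockIp = (ip: string): string => `(ip.src eq ${ip})`;
const all = (...parts: string[]): string => parts.join(' and ');

// "Block Chinese traffic that also looks like a bot"
const expr = all(blockCountry('cn'), '(cf.threat_score gt 5)');
// → (ip.geoip.country eq "CN") and (cf.threat_score gt 5)
```

An LLM doesn't need to invent anything here; it only has to pick the right template and fill in a value it was literally given in the prompt.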


## The Architecture at a Glance

Flarite is built entirely on Cloudflare's own infrastructure — Cloudflare Workers for the backend runtime, D1 for storage, KV for session caching, and Workers AI for the LLM.

```
[Mobile / Web App]
       │
       ▼
[Cloudflare Worker — Hono.js Router]
       │
   ┌───┴────────────────────────────────────┐
   │  Route: /api/ai/command                │
   │  1. Validate session (KV)              │
   │  2. Check prompt quota (D1)            │
   │  3. Decrypt user's CF API token (D1)   │
   │  4. Forward to Workers AI (Llama 3.1)  │
   │  5. AI selects + executes tool         │
   │  6. Return structured result           │
   └────────────────────────────────────────┘
       │
       ▼
[Cloudflare API — api.cloudflare.com]
```

The backend is a single Hono.js application deployed as a Cloudflare Worker. Hono is a lightweight, edge-native framework — perfect for this workload where every millisecond of cold-start time matters.


## Step 1: The Tool System (How the AI Knows What to Call)

The AI doesn't have free-form access to the Cloudflare API. Instead, I built a typed tool registry — a set of explicit, pre-defined function definitions that the model can choose from. This is the same pattern as OpenAI function calling or Claude's tool use, but implemented for Cloudflare Workers AI.

Every tool is a ToolDefinition object:

```typescript
interface ToolDefinition {
    name: string;
    description: string;        // This is what the AI reads
    viewHint: string;           // Tells the frontend how to render the response
    requiredPermissions: string[];
    schema: JSONSchema;         // Input parameters
    run: (args: any, context: ToolContext) => Promise<any>;
}
```

Here's the real create_firewall_rule tool from Flarite's codebase. Notice the description — it's written for the AI, not for the user. It teaches the model how to translate English into CF expression syntax:

```typescript
{
    name: 'create_firewall_rule',
    description: `Create a new firewall rule for a zone. Translates plain English descriptions
into Cloudflare filter expressions.

IMPORTANT: You must convert the user's plain English request into valid Cloudflare
firewall expression syntax before calling this tool.

Common CF expression patterns:
  - Block a specific IP:         (ip.src eq 1.2.3.4)
  - Block a country (ISO code):  (ip.geoip.country eq "CN")
  - Block by user agent:         (http.user_agent contains "scraperbot")
  - Block a path:                (http.request.uri.path eq "/admin")
  - Challenge bot traffic:       (cf.threat_score gt 10)
  - Combine conditions (AND):    (ip.geoip.country eq "CN") and (cf.threat_score gt 5)

Valid actions: block, allow, challenge, js_challenge, managed_challenge, log

Always set a clear description so the user can identify the rule later.`,

    viewHint: 'FIREWALL_RULE',
    requiredPermissions: ['Zone › Firewall Services: Edit'],
    schema: {
        type: 'object',
        properties: {
            zone_id:    { type: 'string', description: 'Zone ID. Optional — omit and the system will resolve it.' },
            expression: { type: 'string', description: 'Cloudflare filter expression (translated from user intent)' },
            action:     { type: 'string', enum: ['block', 'allow', 'challenge', 'js_challenge', 'managed_challenge', 'log'] },
            description:{ type: 'string', description: 'Human-readable description of what this rule does' },
            priority:   { type: 'number', description: 'Rule priority (optional, lower = higher priority)' }
        },
        required: ['expression', 'action', 'description']
    },

    run: async ({ zone_id, expression, action, description, priority }, { apiToken, zoneId: ctxZoneId }) => {
        const id = await ensureZoneId(zone_id, apiToken, ctxZoneId);

        // Cloudflare requires creating a Filter first, then a Rule that references it
        const filterRes = await cfFetch(`/zones/${id}/filters`, {
            method: 'POST',
            body: JSON.stringify([{ expression, description }])
        }, apiToken);

        if (!filterRes.success) return { success: false, errors: filterRes.errors };
        const filter_id = filterRes.result?.[0]?.id;

        const body: any = [{ filter: { id: filter_id }, action, description }];
        if (priority !== undefined) body[0].priority = priority;

        const ruleRes = await cfFetch(`/zones/${id}/firewall/rules`, {
            method: 'POST',
            body: JSON.stringify(body)
        }, apiToken);

        if (ruleRes.success) {
            const r = ruleRes.result?.[0];
            return { success: true, action: 'created', id: r?.id, rule_action: r?.action, description: r?.description, expression };
        }
        return { success: false, errors: ruleRes.errors };
    }
}
```

One thing worth noting: Cloudflare's WAF API is a two-step operation. You create a Filter (the expression) first, then you create a Firewall Rule that references the filter's ID. The AI doesn't need to know any of this — the run function handles the sequencing completely. The model just provides the translated expression and action, and the tool infrastructure handles the API mechanics.
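The two-step sequencing is easy to isolate from HTTP concerns. Here's a sketch of that core with the POST call injected as a function — my own illustration rather than Flarite's code, but it shows why the model never needs to see the filter/rule split:

```typescript
// post(path, body) is any function that performs an authenticated POST and
// returns Cloudflare's standard { success, result, errors } envelope.
type CfEnvelope = { success: boolean; result?: any[]; errors?: any[] };
type PostFn = (path: string, body: unknown) => Promise<CfEnvelope>;

async function createRuleTwoStep(
    post: PostFn,
    zoneId: string,
    expression: string,
    action: string,
    description: string
): Promise<{ success: boolean; id?: string; errors?: any[] }> {
    // Step 1: the Filter holds the expression
    const filterRes = await post(`/zones/${zoneId}/filters`, [{ expression, description }]);
    if (!filterRes.success) return { success: false, errors: filterRes.errors };
    const filterId = filterRes.result?.[0]?.id;

    // Step 2: the Rule references the filter by ID and carries the action
    const ruleRes = await post(`/zones/${zoneId}/firewall/rules`, [
        { filter: { id: filterId }, action, description }
    ]);
    if (!ruleRes.success) return { success: false, errors: ruleRes.errors };
    return { success: true, id: ruleRes.result?.[0]?.id };
}
```

Injecting `post` also makes the ordering trivially unit-testable with a fake, which matters more than usual here because a half-created rule (filter without rule) is invisible in the dashboard's rule list.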


## Step 2: The "Elicitation" Problem (Multi-Account / Multi-Zone)

This is where it gets interesting for anyone who manages Cloudflare at enterprise scale.

When someone types "block China on my zone", the system needs to know which zone. If they have one zone, fine — auto-resolve it. If they have eight zones across three accounts, you need to ask.

But here's the problem: you can't pause an AI inference call mid-flight to ask a follow-up question. The entire AI → tool → API → response cycle needs to complete atomically.

My solution was an Elicitation Protocol. When a tool needs disambiguation, it throws a special exception instead of completing:

```typescript
async function ensureZoneId(
    zoneId: string | undefined,
    apiToken: string,
    contextId?: string
): Promise<string> {
    if (zoneId) return zoneId;
    if (contextId) return contextId;  // User already selected a zone in this session

    const res = await cfFetch('/zones', {}, apiToken);
    const zones = res.result || [];

    if (zones.length === 1) return zones[0].id;  // Auto-resolve single zone
    if (zones.length === 0) throw new Error("No zones found for this token.");

    // Multiple zones — we need to ask the user
    throw new ElicitationRequired({
        type: 'zone_selection',
        data: {
            zones: zones.map((z: any) => ({
                id: z.id,
                name: z.name,
                status: z.status,
                plan: z.plan?.name || 'Free'
            }))
        },
        prompt: "Multiple zones found. Please select which domain you're referring to:"
    });
}
```

The frontend catches ElicitationRequired and renders a selection UI. The user picks their zone, the choice is stored in session context, and the next prompt automatically resolves to the correct zone without asking again.

This is the pattern that makes multi-account management feel natural rather than cumbersome. Once you've set context, it sticks.
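The dispatch side of the protocol is small. A sketch of the pattern — the exception shape mirrors the `ensureZoneId` snippet above, but the dispatcher itself is my illustration:

```typescript
// The tool layer throws this instead of returning a result.
class ElicitationRequired extends Error {
    constructor(public payload: { type: string; data: unknown; prompt: string }) {
        super(payload.prompt);
    }
}

// The dispatcher turns the exception into a renderable response instead of a 500.
async function dispatchTool<T>(run: () => Promise<T>) {
    try {
        return { kind: 'tool_result' as const, result: await run() };
    } catch (err) {
        if (err instanceof ElicitationRequired) {
            // Frontend renders a selection UI from payload.data
            return { kind: 'elicitation' as const, ...err.payload };
        }
        throw err; // real errors still propagate
    }
}
```

The key property: an ambiguous request is not an error. It's a normal, structured response type the frontend knows how to render.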


## Step 3: The Token Security Model

This is the part I spent the most time on, and the part I'd most want scrutinized.

Users have to hand Flarite their Cloudflare API token. That's a significant trust requirement — Cloudflare API tokens can control DNS, WAF rules, Workers, and more. Getting this wrong would be catastrophic.

Here's what I built:

### Ingestion

When a token arrives at the backend (via src/routes/users.ts), it is immediately encrypted and never logged in plaintext:

```typescript
const encryptToken = async (rawToken: string, encryptionKey: string): Promise<string> => {
    // Derive a CryptoKey from the environment secret
    const keyMaterial = await crypto.subtle.importKey(
        'raw',
        new TextEncoder().encode(encryptionKey),
        { name: 'PBKDF2' },
        false,
        ['deriveKey']
    );

    const key = await crypto.subtle.deriveKey(
        { name: 'PBKDF2', salt: new TextEncoder().encode('flarite-salt'), iterations: 210000, hash: 'SHA-256' },
        keyMaterial,
        { name: 'AES-GCM', length: 256 },
        false,
        ['encrypt']
    );

    // A unique IV is generated for every token — same plaintext never produces the same ciphertext
    const iv = crypto.getRandomValues(new Uint8Array(12));
    const ciphertext = await crypto.subtle.encrypt(
        { name: 'AES-GCM', iv },
        key,
        new TextEncoder().encode(rawToken)
    );

    // Store as: base64(iv):base64(ciphertext)
    const ivB64 = btoa(String.fromCharCode(...iv));
    const ctB64 = btoa(String.fromCharCode(...new Uint8Array(ciphertext)));
    return `${ivB64}:${ctB64}`;
};
```

What lands in D1:

```
iv:ciphertext
ABCDEFGHijk...:xyz9876543...
```

The ENCRYPTION_KEY is a Wrangler secret — it never touches the D1 database, never appears in logs, and never leaves the Worker runtime.

### Runtime Decryption

When an AI command comes in and needs to call the Cloudflare API, the proxy layer (src/routes/proxy.ts) decrypts the token on the fly:

```
User sends prompt
      │
      ▼
[Worker validates KV session]
      │
      ▼
[Worker fetches iv:ciphertext from D1]
      │
      ▼
[crypto.subtle.decrypt → raw token in memory only]
      │
      ▼
[fetch → api.cloudflare.com with Bearer token]
      │
      ▼
[Raw token goes out of scope, GC'd]
```

The plaintext token exists in memory for milliseconds and is never persisted in decrypted form. This is as close to zero-knowledge as you can get without fully client-side encryption.

### Password Security

User account passwords are hashed with PBKDF2 at 210,000 iterations (in line with OWASP's password storage guidance), and OTPs are generated exclusively with crypto.getRandomValues() — never Math.random(). Small details, but they matter.


## Step 4: Multi-Tenant Architecture for Teams

Flarite isn't just a personal tool — it's designed for teams managing a shared Cloudflare organization. The database schema reflects this:

```
users (email, password_hash, global_role)
  │
  └─→ organization_members (user_id, org_id, role: owner|admin|member)
                │
                └─→ organizations (tier, prompts_used_this_month)
                          │
                          └─→ user_cf_credentials (org_id, iv, ciphertext)
```

The crucial detail: API tokens are scoped to the organization, not the individual user. An admin on a team can run Cloudflare commands using the organization's stored tokens without ever seeing the raw token value. The credential belongs to the org — team members inherit access through their membership role.

This matches how enterprise Cloudflare accounts actually work. You manage tokens at the account level, not per-person. Teams share a service token, and people come and go without rekeying everything.
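A sketch of the access check this schema implies — the function is my illustration; the role names follow the schema above:

```typescript
type OrgRole = 'owner' | 'admin' | 'member';

interface Membership {
    userId: string;
    orgId: string;
    role: OrgRole;
}

// Credential access flows from membership in the owning org — never from the
// individual user. Revoking a membership revokes access; the stored token
// itself never needs rekeying when people leave.
function canUseOrgCredentials(membership: Membership | null, credentialOrgId: string): boolean {
    return membership !== null && membership.orgId === credentialOrgId;
}
```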


## Step 5: The AI Layer (Llama 3.1 on Workers AI)

The AI inference layer runs on Cloudflare's own Workers AI using @cf/meta/llama-3.1-8b-instruct. Running the model inside the same runtime as the backend has three advantages: no external API round-trip latency, no data leaving Cloudflare's network, and a simple per-token billing model.

The system prompt gives the model its context, the tool descriptions teach it what's available, and the model returns a structured tool call. Here's the simplified flow inside src/routes/ai.ts:

```typescript
// 1. Check quota
const org = await db.prepare(
    'SELECT prompts_used_this_month, tier FROM organizations WHERE id = ?'
).bind(orgId).first();

const limit = TIER_LIMITS[org.tier];
if (org.prompts_used_this_month >= limit) {
    return c.json({ error: 'Monthly prompt limit reached.' }, 429);
}

// 2. Build tool list for the user's active integration (Cloudflare, Stripe, etc.)
const tools = getToolsForProvider(activeIntegration);

// 3. Inference
const aiResponse = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
    messages: [
        { role: 'system', content: buildSystemPrompt(tools) },
        ...conversationHistory,
        { role: 'user', content: userPrompt }
    ],
    tools: tools.map(t => ({ name: t.name, description: t.description, parameters: t.schema }))
});

// 4. Execute tool if called
if (aiResponse.tool_calls?.length > 0) {
    const call = aiResponse.tool_calls[0];
    const tool = tools.find(t => t.name === call.name);
    const result = await tool.run(call.arguments, { apiToken: decryptedToken, zoneId: ctx.zoneId });

    // 5. Atomic: increment counter + log — in a single D1 batch
    await db.batch([
        db.prepare('UPDATE organizations SET prompts_used_this_month = prompts_used_this_month + 1 WHERE id = ?').bind(orgId),
        db.prepare('INSERT INTO prompt_logs (org_id, prompt, tool_called, result_summary) VALUES (?, ?, ?, ?)').bind(orgId, userPrompt, call.name, JSON.stringify(result).slice(0, 500))
    ]);

    return c.json({ type: 'tool_result', tool: call.name, result, viewHint: tool.viewHint });
}
```

The D1 batch for counter+logging is important. If the counter increment succeeded but the log insert failed (or vice versa), you'd have billing drift. The batch makes them atomic.


## What a Real Interaction Looks Like

To make this concrete, here's an actual sequence for "block all traffic from Russia on my main zone":

1. User sends: "block all traffic from Russia on my main zone"

2. AI selects: create_firewall_rule

3. AI translates to:

```json
{
    "expression": "(ip.geoip.country eq \"RU\")",
    "action": "block",
    "description": "Block all traffic originating from Russia"
}
```

4. Tool creates the Cloudflare filter at /zones/{id}/filters

5. Tool creates the firewall rule at /zones/{id}/firewall/rules referencing the filter

6. Returns:

```json
{
    "success": true,
    "action": "created",
    "id": "f2a4b...",
    "rule_action": "block",
    "description": "Block all traffic originating from Russia",
    "expression": "(ip.geoip.country eq \"RU\")"
}
```

7. Frontend renders a FIREWALL_RULE card showing the rule details.

The entire round-trip — from the user typing to a live WAF rule active on Cloudflare's edge — takes under 3 seconds. Without Flarite, that same operation through the dashboard takes 45–90 seconds, assuming you know the expression syntax.


## What the Request Tracer Adds

One feature I'm particularly happy with is the Request Tracer integration. Cloudflare's API has a tracer endpoint (/accounts/{id}/request-tracer/trace) that lets you simulate an HTTP request through your stack and see exactly which WAF rules, firewall policies, and rulesets matched.

With Flarite, you can type: "trace a GET request to https://example.com/admin and tell me what fires" and get back a clean breakdown of every matched rule:

```json
{
    "status_code": 403,
    "matched": [
        {
            "step_name": "firewall_rules",
            "type": "firewall",
            "public_name": "Block /admin access",
            "action": "block",
            "expression": "(http.request.uri.path eq \"/admin\")"
        },
        {
            "step_name": "waf_rule",
            "type": "waf",
            "public_name": "Cloudflare Managed Ruleset",
            "action": "managed_challenge"
        }
    ]
}
```

The tool recursively walks the trace tree, including nested ruleset traces, to surface everything that would apply to a given request. For debugging WAF behavior — especially when rules interact unexpectedly — this is genuinely transformative.
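That recursive walk can be sketched in a few lines. The node shape here is my simplification of the tracer output, not its exact schema:

```typescript
// A simplified trace node: a step may have matched a rule, and may contain a
// nested trace (e.g. a managed ruleset executing its own rules).
interface TraceNode {
    step_name: string;
    matched?: boolean;
    action?: string;
    trace?: TraceNode[];   // nested ruleset trace
}

// Depth-first walk collecting every matched step, including nested ones.
function collectMatches(nodes: TraceNode[]): TraceNode[] {
    const matches: TraceNode[] = [];
    for (const node of nodes) {
        if (node.matched) matches.push(node);
        if (node.trace) matches.push(...collectMatches(node.trace));
    }
    return matches;
}
```

Flattening the tree is what makes the AI's summary useful: the user sees one ordered list of everything that fired, regardless of how deeply the matching rule was nested.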


## Lessons Learned

1. Tool descriptions are your most important code.
The AI doesn't read your TypeScript — it reads your descriptions. I spent more time refining tool descriptions and adding example translations than I did writing the actual API calls. Bad descriptions lead to wrong expressions being generated. Good descriptions lead to reliable, predictable behavior.

2. Multi-account resolution is a first-class problem.
Don't treat it as an edge case. If you're building anything for enterprise Cloudflare users, assume they have multiple accounts and multiple zones. The elicitation pattern — detect ambiguity, surface a selection UI, persist the choice in session — is the right model.

3. Never log decrypted tokens. Ever.
Even in development. Build the habit early. Your D1 tables, your Workers logs (via console.log), your error messages — none of them should ever contain a plaintext API token. I treat any log containing a raw token as a critical security incident, even in a local dev environment.

4. Edge-native is a genuine advantage for this use case.
Running the entire stack (inference, API proxying, session management) inside Cloudflare's network means the decrypted token never crosses a network boundary in plaintext. The Worker decrypts in memory, makes the API call, and the token is gone. That's a security property you can't easily achieve with a traditional server architecture.

5. The Cloudflare API has two-step semantics for WAF.
You create filters and rules separately. The AI doesn't need to know this, but you do. Abstract it properly in your tool implementation so the AI just provides intent and the tool handles sequencing.


## What's Next

Flarite currently supports Cloudflare, Stripe, Supabase, Appwrite, GitHub, and Vercel — all in the same conversational interface. The vision is a single place where an engineer or operator can manage their entire production SaaS stack without switching dashboards.

If you've ever felt the Cloudflare dashboard growing larger and more complex with every feature release — and you've ever wished you could just describe what you want — that's exactly the frustration I built this to fix.

Flarite is live at flarite.com. There's a free trial with 50 prompts — no credit card required. If you manage Cloudflare at scale and want to kick the tires, I'd genuinely love your feedback.


## Technical Stack Summary

| Layer | Technology |
| --- | --- |
| Runtime | Cloudflare Workers (edge-native) |
| Framework | Hono.js |
| Database | Cloudflare D1 (serverless SQLite) |
| Session Cache | Cloudflare KV |
| AI Model | Llama 3.1 8B Instruct (Workers AI) |
| Encryption | Web Crypto API — AES-GCM-256, PBKDF2 (210k iterations) |
| Frontend | React Native (Expo) + Next.js Web |
| Deployment | Cloudflare Pages + Wrangler |

Built by a developer who was tired of clicking through the same dashboards every day. Feedback and questions welcome in the comments.
