Alex Spinov

Cloudflare Workers AI Has a Free API: Run AI Models at the Edge with Zero Infrastructure

What is Workers AI?

Workers AI lets you run AI models on Cloudflare's edge network — text generation, image classification, embeddings, speech-to-text, translation, and more. No GPU provisioning, no model hosting.

Free tier: 10,000 neurons per day, enough for roughly 100 or more requests depending on the model.

Quick Start

npm create cloudflare@latest my-ai-app -- --template worker-typescript
cd my-ai-app
Then enable the AI binding in your Worker's configuration:

# wrangler.toml
[ai]
binding = "AI"

Text Generation (LLM)

export default {
  async fetch(request: Request, env: Env) {
    const response = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
      messages: [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: "Explain WebAssembly in 3 sentences." },
      ],
      max_tokens: 256,
    });

    return Response.json(response);
  },
};

Streaming Responses

const stream = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
  messages: [{ role: "user", content: "Write a poem about coding" }],
  stream: true,
});

return new Response(stream, {
  headers: { "Content-Type": "text/event-stream" },
});
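With `stream: true`, the body is server-sent events: each chunk carries `data:` lines holding a JSON object with a `response` token, and the stream ends with `data: [DONE]`. If you consume the stream yourself (in another Worker or a Node client), a minimal parser might look like this — `extractTokens` is my own helper, not part of any SDK:

```typescript
// Parse a Workers AI SSE chunk into plain text tokens.
// Assumes event lines shaped like: data: {"response":"..."}
// and a terminating line: data: [DONE]
export function extractTokens(sseChunk: string): string[] {
  const tokens: string[] = [];
  for (const line of sseChunk.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue;
    const payload = trimmed.slice("data:".length).trim();
    if (payload === "[DONE]") break;
    try {
      const parsed = JSON.parse(payload) as { response?: string };
      if (typeof parsed.response === "string") tokens.push(parsed.response);
    } catch {
      // A real client would buffer JSON split across chunks; this sketch skips it.
    }
  }
  return tokens;
}
```

A production client would also buffer partial lines across chunk boundaries; this sketch assumes each chunk contains whole lines.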

Text Embeddings

const embeddings = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
  text: ["How to deploy a web app", "Best practices for CI/CD"],
});

console.log(embeddings.data[0].length); // 768 dimensions

Combine with Vectorize for semantic search.
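If you only have a handful of vectors, you don't need a vector database at all: cosine similarity over the 768-dimensional outputs is the same comparison Vectorize performs at scale. A self-contained sketch (the helper name is illustrative, not an SDK API):

```typescript
// Cosine similarity between two embedding vectors, e.g. the 768-dim
// outputs of @cf/baai/bge-base-en-v1.5. Returns a value in [-1, 1];
// closer to 1 means more semantically similar.
export function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

To rank documents against a query, embed everything with the same model, then sort by similarity to the query vector.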

Image Classification

const imageData = await request.arrayBuffer();
const result = await env.AI.run("@cf/microsoft/resnet-50", {
  image: [...new Uint8Array(imageData)],
});

console.log(result); // [{label: "cat", score: 0.95}, ...]
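The result is a list of label/score pairs, so picking the model's best guess is a one-line sort. A small illustrative helper (not part of the Workers AI API):

```typescript
interface Classification {
  label: string;
  score: number; // confidence in [0, 1]
}

// Return the highest-confidence prediction, or undefined for an empty result.
export function topLabel(results: Classification[]): Classification | undefined {
  return [...results].sort((a, b) => b.score - a.score)[0];
}
```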

Image Generation

const image = await env.AI.run("@cf/stabilityai/stable-diffusion-xl-base-1.0", {
  prompt: "A futuristic city skyline at sunset, cyberpunk style",
});

return new Response(image, {
  headers: { "Content-Type": "image/png" },
});

Translation

const translated = await env.AI.run("@cf/meta/m2m100-1.2b", {
  text: "Hello, how are you?",
  source_lang: "english",
  target_lang: "spanish",
});
// {translated_text: "Hola, ¿cómo estás?"}

Speech-to-Text

const audioData = await request.arrayBuffer();
const transcription = await env.AI.run("@cf/openai/whisper", {
  audio: [...new Uint8Array(audioData)],
});
console.log(transcription.text);

Text Summarization

// longArticleText is any long string you want condensed
const summary = await env.AI.run("@cf/facebook/bart-large-cnn", {
  input_text: longArticleText,
  max_length: 150, // target summary length in tokens
});

Available Models

| Category      | Model        | Use Case             |
| ------------- | ------------ | -------------------- |
| LLM           | Llama 3.1 8B | Chat, code gen       |
| Embeddings    | BGE Base     | Semantic search      |
| Image Gen     | SDXL         | Image creation       |
| Vision        | ResNet-50    | Image classification |
| Translation   | M2M-100      | 100+ languages       |
| Speech        | Whisper      | Audio transcription  |
| Summarization | BART         | Text summarization   |

REST API

curl "https://api.cloudflare.com/client/v4/accounts/ACCOUNT_ID/ai/run/@cf/meta/llama-3.1-8b-instruct" \
  -H "Authorization: Bearer $CF_TOKEN" \
  -d '{"messages": [{"role": "user", "content": "Hello"}]}'
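The same endpoint works from any runtime with `fetch`. A minimal client sketch — the function names and error handling are my own, and the token is assumed to be an API token with Workers AI permissions:

```typescript
// Build the Workers AI REST endpoint for a given account and model.
export function aiEndpoint(accountId: string, model: string): string {
  return `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${model}`;
}

// POST an input payload to a model and return the parsed JSON response.
export async function runModel(
  accountId: string,
  token: string,
  model: string,
  input: unknown,
): Promise<unknown> {
  const res = await fetch(aiEndpoint(accountId, model), {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(input),
  });
  if (!res.ok) throw new Error(`Workers AI request failed: ${res.status}`);
  return res.json();
}
```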

Need AI integration or edge computing setup?

📧 spinov001@gmail.com
🔧 My tools on Apify Store

Edge AI or centralized GPU — what's your approach?
