<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: toolfreebie</title>
    <description>The latest articles on DEV Community by toolfreebie (@build996).</description>
    <link>https://dev.to/build996</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3909730%2F6972eddd-4c8f-475b-a284-e5755d0ce323.jpeg</url>
      <title>DEV Community: toolfreebie</title>
      <link>https://dev.to/build996</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/build996"/>
    <language>en</language>
    <item>
      <title>Together AI Free API: Run Llama 3.3, DeepSeek R1, and FLUX Image Generation for Free in 2026</title>
      <dc:creator>toolfreebie</dc:creator>
      <pubDate>Sun, 03 May 2026 15:54:07 +0000</pubDate>
      <link>https://dev.to/build996/together-ai-free-api-run-llama-33-deepseek-r1-and-flux-image-generation-for-free-in-2026-19of</link>
      <guid>https://dev.to/build996/together-ai-free-api-run-llama-33-deepseek-r1-and-flux-image-generation-for-free-in-2026-19of</guid>
      <description>&lt;h2&gt;
  
  
  What Is Together AI?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.together.ai" rel="noopener noreferrer"&gt;Together AI&lt;/a&gt; is an AI inference platform that hosts hundreds of open-source models behind one OpenAI-compatible API. Founded in 2022 and backed by NVIDIA, Salesforce Ventures, and Kleiner Perkins, the company built its reputation around two things developers actually care about: &lt;strong&gt;fast hosted inference for state-of-the-art open models&lt;/strong&gt; (Llama, DeepSeek, Qwen, Mixtral) and a &lt;strong&gt;genuinely free tier&lt;/strong&gt; that exposes a small but useful set of those models with no credit card required.&lt;/p&gt;

&lt;p&gt;What separates Together AI from the long list of “free AI API” providers in 2026 is the breadth of categories you can hit on a single key. One signup gives you free access to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Llama 3.3 70B Instruct Turbo (Free)&lt;/strong&gt; — Meta’s flagship 70B chat model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DeepSeek R1 Distill Llama 70B (Free)&lt;/strong&gt; — open reasoning model with chain-of-thought&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FLUX.1 [schnell] Free&lt;/strong&gt; — Black Forest Labs’ fast image generation model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Llama 3.2 11B Vision Instruct (Free)&lt;/strong&gt; — multimodal image-understanding model&lt;/li&gt;
&lt;li&gt;Plus hundreds of other open models on a $1 trial credit&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’re already evaluating &lt;a href="https://toolfreebie.com/groq-api-the-fastest-free-ai-api-in-2026-300-800-tokens-s/" rel="noopener noreferrer"&gt;Groq&lt;/a&gt;, &lt;a href="https://toolfreebie.com/cerebras-inference-api-fastest-free-ai-api/" rel="noopener noreferrer"&gt;Cerebras&lt;/a&gt;, &lt;a href="https://toolfreebie.com/google-gemini-api-the-best-free-ai-api-in-2026/" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt;, or &lt;a href="https://toolfreebie.com/deepseek-api-free-access-to-r1-reasoning-and-v3-chat-models/" rel="noopener noreferrer"&gt;DeepSeek&lt;/a&gt;, Together AI fills a different gap: a single endpoint that covers chat, reasoning, vision, and image generation on the same key.&lt;/p&gt;

&lt;h2&gt;
  
  
  What’s Actually Free on Together AI
&lt;/h2&gt;

&lt;p&gt;Together AI uses a clear naming convention: any model whose ID ends with the suffix &lt;code&gt;-Free&lt;/code&gt; (lowercase &lt;code&gt;-free&lt;/code&gt; in the DeepSeek distill’s case) can be called without consuming credits. These models are rate-limited and scheduled at lower priority than the paid tiers, so they run slightly slower, but they are functionally complete. Everything else runs against the $1 free trial credit you get at signup.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model ID&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Context&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;meta-llama/Llama-3.3-70B-Instruct-Turbo-Free&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Chat / instruction&lt;/td&gt;
&lt;td&gt;128K tokens&lt;/td&gt;
&lt;td&gt;General assistant, RAG answer generation, code Q&amp;amp;A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;deepseek-ai/DeepSeek-R1-Distill-Llama-70B-free&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Reasoning&lt;/td&gt;
&lt;td&gt;32K tokens&lt;/td&gt;
&lt;td&gt;Math, multi-step logic, agent planning loops&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;meta-llama/Llama-Vision-Free&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Vision (multimodal)&lt;/td&gt;
&lt;td&gt;128K tokens&lt;/td&gt;
&lt;td&gt;Image captioning, OCR, chart and screenshot understanding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;black-forest-labs/FLUX.1-schnell-Free&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Image generation&lt;/td&gt;
&lt;td&gt;1024×1024 default&lt;/td&gt;
&lt;td&gt;Blog cover images, prototypes, social posts&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Beyond the explicitly free tier, the $1 trial credit is enough to exercise dozens of paid models — Mixtral 8x22B, Qwen 2.5 72B, Llama 3.1 405B, audio models like Whisper, embeddings models like BGE and M2-BERT — for tens of thousands of tokens each, which is plenty to test whether the bigger models meaningfully change your results before you commit a card.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: Together AI quietly retires and renames “Free” models from time to time as newer versions land. If a model ID stops working, check the &lt;a href="https://docs.together.ai/docs/serverless-models" rel="noopener noreferrer"&gt;official model list&lt;/a&gt; for the current Free variant.&lt;/em&gt;&lt;/p&gt;
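&lt;p&gt;Because the suffix is the only marker, you can keep your model list current by filtering the catalog programmatically. A minimal sketch — the live call is commented out, and it assumes the SDK exposes a &lt;code&gt;models.list()&lt;/code&gt; method returning objects with an &lt;code&gt;id&lt;/code&gt; attribute (check the SDK docs for the exact shape):&lt;/p&gt;

```python
def free_model_ids(model_ids):
    """Filter a list of Together model IDs down to the free-tier variants.

    The free suffix is usually "-Free" but lowercase "-free" for the
    DeepSeek distill, so the comparison is case-insensitive.
    """
    return sorted(m for m in model_ids if m.lower().endswith("-free"))

# Live usage (requires the together SDK and TOGETHER_API_KEY):
#   from together import Together
#   client = Together()
#   print("\n".join(free_model_ids(m.id for m in client.models.list())))
```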

&lt;h2&gt;
  
  
  How to Get Your Free API Key
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;a href="https://api.together.ai" rel="noopener noreferrer"&gt;api.together.ai&lt;/a&gt; and sign up with email, Google, or GitHub&lt;/li&gt;
&lt;li&gt;Verify your email address&lt;/li&gt;
&lt;li&gt;From the dashboard, navigate to &lt;strong&gt;Settings → API Keys&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Copy your default key (a long hex string with no prefix)&lt;/li&gt;
&lt;li&gt;Set it as an environment variable: &lt;code&gt;export TOGETHER_API_KEY="your_key_here"&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No credit card. No phone number. The $1 free trial credit and access to all &lt;code&gt;-Free&lt;/code&gt; models are activated immediately on signup.&lt;/p&gt;

&lt;h2&gt;
  
  
  curl Quickstart: Your First Request in 30 Seconds
&lt;/h2&gt;

&lt;p&gt;Together AI is fully OpenAI-compatible, so the cleanest way to confirm everything works is a one-shot curl call against the chat completions endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://api.together.xyz/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$TOGETHER_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo-Free",
    "messages": [
      {"role": "user", "content": "Explain pgvector in two sentences."}
    ]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you get back a JSON response with a &lt;code&gt;choices[0].message.content&lt;/code&gt; field, you’re set. The exact same payload shape works against OpenAI — only the base URL and the &lt;code&gt;model&lt;/code&gt; string change.&lt;/p&gt;

&lt;h2&gt;
  
  
  Python Quickstart
&lt;/h2&gt;

&lt;p&gt;The official Python SDK mirrors the OpenAI client’s interface. Install it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;together
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Basic chat completion:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;together&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Together&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Together&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TOGETHER_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta-llama/Llama-3.3-70B-Instruct-Turbo-Free&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a concise senior engineer.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;When should I prefer SQLite over Postgres?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you already have OpenAI SDK code, swapping providers is a two-line change:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TOGETHER_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.together.xyz/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta-llama/Llama-3.3-70B-Instruct-Turbo-Free&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a haiku about caching.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every parameter you’d pass to OpenAI — &lt;code&gt;temperature&lt;/code&gt;, &lt;code&gt;top_p&lt;/code&gt;, &lt;code&gt;stop&lt;/code&gt;, &lt;code&gt;response_format&lt;/code&gt;, &lt;code&gt;tools&lt;/code&gt;, &lt;code&gt;tool_choice&lt;/code&gt; — works identically.&lt;/p&gt;
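&lt;p&gt;For instance, an OpenAI-style function-calling request needs no translation at all. A sketch of the payload — the &lt;code&gt;get_weather&lt;/code&gt; tool name is invented for illustration, and the network call itself is left commented:&lt;/p&gt;

```python
# An OpenAI-format tool definition passes through to Together unchanged.
# "get_weather" is a hypothetical function used only for illustration.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

request_kwargs = dict(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo-Free",
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
    tools=[weather_tool],
    tool_choice="auto",
)
# client.chat.completions.create(**request_kwargs)  # same call on either provider
```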

&lt;h2&gt;
  
  
  Streaming Responses
&lt;/h2&gt;

&lt;p&gt;For chat UIs and agent loops, you almost always want token streaming. Set &lt;code&gt;stream=True&lt;/code&gt; and iterate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta-llama/Llama-3.3-70B-Instruct-Turbo-Free&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Outline a blog post about RAG.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;delta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flush&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Streaming on the Free tier is real streaming, not buffered chunks — you’ll see tokens appear at roughly the model’s true generation rate, which makes it usable for live chat UIs even before you start paying.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reasoning with DeepSeek R1 Distill
&lt;/h2&gt;

&lt;p&gt;The DeepSeek R1 family produces visible chain-of-thought reasoning before its final answer. On Together AI’s Free tier you can call the 70B distilled variant, which keeps most of the reasoning capability of the full R1 model at a fraction of the parameter count:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepseek-ai/DeepSeek-R1-Distill-Llama-70B-free&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A bookstore sold 60 books on Monday, then sales grew &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;12% each day through Friday. How many books did they &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sell in total that week? Show your work.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model’s response will include a &lt;code&gt;&amp;lt;think&amp;gt;&lt;/code&gt;…&lt;code&gt;&amp;lt;/think&amp;gt;&lt;/code&gt; block of internal reasoning followed by the final answer. For agent applications, you can either show the reasoning to the user (transparency) or strip it out (clean output) depending on the surface.&lt;/p&gt;
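&lt;p&gt;Stripping the block is a one-regex job. A minimal sketch, assuming the single leading &lt;code&gt;&amp;lt;think&amp;gt;&lt;/code&gt; block described above:&lt;/p&gt;

```python
import re

def split_reasoning(text):
    """Split a DeepSeek R1 reply into (reasoning, final_answer).

    Assumes at most one <think>...</think> block before the answer;
    returns empty reasoning if the block is absent.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()
```

&lt;p&gt;In a chat UI you might render the first element in a collapsible panel and the second as the visible reply.&lt;/p&gt;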

&lt;h2&gt;
  
  
  Image Generation with FLUX.1 [schnell] Free
&lt;/h2&gt;

&lt;p&gt;FLUX.1 [schnell] is Black Forest Labs’ fast text-to-image model, timestep-distilled to generate in as few as 4 sampling steps and open-sourced under Apache 2.0. Together AI hosts it as a free image-generation endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;black-forest-labs/FLUX.1-schnell-Free&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A clean isometric illustration of an AI agent fetching data from a cloud database, soft pastel colors, no text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;height&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;steps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The returned URL is hosted by Together AI and stays valid long enough to download or pipe into a CDN. For blog covers, social posts, or quick mockups, FLUX.1 [schnell] often beats Stable Diffusion XL on prompt adherence at a fraction of the inference time.&lt;/p&gt;
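&lt;p&gt;Since the URL is temporary, persist the bytes as soon as you get them. A standard-library sketch — the actual fetch is commented out so the helper stays side-effect free:&lt;/p&gt;

```python
import os
import urllib.parse
import urllib.request

def save_image(url, out_dir="."):
    """Derive a local path for a generated image and (optionally) fetch it.

    The filename comes from the URL path; a default name is used when
    the path carries none.
    """
    name = os.path.basename(urllib.parse.urlparse(url).path) or "flux-output.png"
    path = os.path.join(out_dir, name)
    # urllib.request.urlretrieve(url, path)  # uncomment to download for real
    return path
```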

&lt;h2&gt;
  
  
  Vision: Llama 3.2 Vision Free
&lt;/h2&gt;

&lt;p&gt;The Free vision model accepts standard OpenAI-format multimodal messages — text plus image URLs or base64 data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta-llama/Llama-Vision-Free&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What does this dashboard show? List the three highest values.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://example.com/dashboard.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the cheapest path in 2026 to a working “describe this screenshot” or “extract data from this chart” feature without standing up your own vision pipeline. For OCR-heavy workloads on dense documents, a paid vision model will still outperform — but for screenshots, charts, product photos, and general image Q&amp;amp;A, Llama Vision Free is genuinely useful.&lt;/p&gt;
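&lt;p&gt;For local files, building the base64 data URL is a few lines of standard library:&lt;/p&gt;

```python
import base64
import mimetypes

def to_data_url(path):
    """Encode a local image file as a data: URL for the image_url field."""
    mime = mimetypes.guess_type(path)[0] or "image/png"
    with open(path, "rb") as f:
        payload = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{payload}"

# Drops straight into the message shape shown above:
#   {"type": "image_url", "image_url": {"url": to_data_url("chart.png")}}
```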

&lt;h2&gt;
  
  
  Together AI vs Other Free AI APIs
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Free Chat&lt;/th&gt;
&lt;th&gt;Free Reasoning&lt;/th&gt;
&lt;th&gt;Free Vision&lt;/th&gt;
&lt;th&gt;Free Image Gen&lt;/th&gt;
&lt;th&gt;OpenAI Compatible&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Together AI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Llama 3.3 70B&lt;/td&gt;
&lt;td&gt;DeepSeek R1 Distill 70B&lt;/td&gt;
&lt;td&gt;Llama 3.2 Vision 11B&lt;/td&gt;
&lt;td&gt;FLUX.1 schnell&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://toolfreebie.com/groq-api-the-fastest-free-ai-api-in-2026-300-800-tokens-s/" rel="noopener noreferrer"&gt;Groq&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Llama 3.3 70B (very fast)&lt;/td&gt;
&lt;td&gt;DeepSeek R1 Distill&lt;/td&gt;
&lt;td&gt;Llama Vision&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://toolfreebie.com/cerebras-inference-api-fastest-free-ai-api/" rel="noopener noreferrer"&gt;Cerebras&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Llama 3.3 70B (extremely fast)&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://toolfreebie.com/google-gemini-api-the-best-free-ai-api-in-2026/" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Gemini 2.0 Flash&lt;/td&gt;
&lt;td&gt;Gemini 2.0 Flash Thinking&lt;/td&gt;
&lt;td&gt;Built in&lt;/td&gt;
&lt;td&gt;Imagen (limited)&lt;/td&gt;
&lt;td&gt;Via compat layer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://toolfreebie.com/cloudflare-workers-ai-free-edge-ai-inference-with-47-models/" rel="noopener noreferrer"&gt;Cloudflare Workers AI&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Llama 3 / Mistral&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;LLaVA&lt;/td&gt;
&lt;td&gt;SDXL Lightning&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://toolfreebie.com/openrouter-access-300-free-ai-models-with-one-api-key/" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Many free models&lt;/td&gt;
&lt;td&gt;DeepSeek R1 free&lt;/td&gt;
&lt;td&gt;Several&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Where Together AI wins on the free tier:&lt;/strong&gt; coverage. It’s the only provider on this list that offers chat, reasoning, vision, &lt;em&gt;and&lt;/em&gt; image generation under one OpenAI-compatible endpoint, on one key, with no credit card. If you’re prototyping a multimodal product and don’t want to juggle three or four signups, Together AI compresses the entire surface area into one integration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where the others win:&lt;/strong&gt; raw speed (Cerebras and Groq are faster on Llama 3.3 70B), context window (Gemini’s 1M tokens is unmatched), or model variety (OpenRouter aggregates more providers).&lt;/p&gt;

&lt;h2&gt;
  
  
  Rate Limits and Fair Use
&lt;/h2&gt;

&lt;p&gt;Free-tier rate limits on Together AI exist to keep costs predictable. The exact numbers are published in the &lt;a href="https://docs.together.ai/docs/rate-limits" rel="noopener noreferrer"&gt;official rate limits page&lt;/a&gt; and change as the platform scales, but as a working mental model in 2026:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;-Free&lt;/code&gt; chat models:&lt;/strong&gt; low double-digit requests per minute, with smaller per-day caps than paid tiers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;-Free&lt;/code&gt; image models:&lt;/strong&gt; tighter caps (image inference is much more expensive), often a few requests per minute&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Paid models on trial credit:&lt;/strong&gt; the standard tier-1 limits, but capped by your $1 budget — usually thousands of requests before the credit runs out on smaller models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The headline takeaway: Free-tier limits are designed for development and prototyping. They are not designed to support a production user base. If your side project starts getting traction, you’ll need to either move to a paid plan or layer caching in front (request deduplication on prompts is the highest-leverage win).&lt;/p&gt;
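&lt;p&gt;Request deduplication can be as small as a hash-keyed cache sitting in front of the client. A minimal sketch (the &lt;code&gt;CachedChat&lt;/code&gt; wrapper and the injected &lt;code&gt;call_fn&lt;/code&gt; are illustrative, not part of the Together SDK):&lt;/p&gt;

```python
import hashlib
import json

class CachedChat:
    """Deduplicate identical prompts before they reach the API."""

    def __init__(self, call_fn):
        self._call = call_fn  # any function (model, messages) -> completion text
        self._cache = {}

    def complete(self, model, messages):
        # Canonical JSON so logically identical requests share one cache key.
        key = hashlib.sha256(
            json.dumps({"model": model, "messages": messages},
                       sort_keys=True).encode()
        ).hexdigest()
        if key not in self._cache:
            self._cache[key] = self._call(model, messages)
        return self._cache[key]

# Demo with a stub in place of a real Together client:
calls = []
def fake_call(model, messages):
    calls.append(model)
    return "stub reply"

chat = CachedChat(fake_call)
chat.complete("m", [{"role": "user", "content": "hi"}])
chat.complete("m", [{"role": "user", "content": "hi"}])
print(len(calls))  # 1: the second identical prompt never hit the API
```

&lt;p&gt;Swap &lt;code&gt;fake_call&lt;/code&gt; for a thin wrapper around &lt;code&gt;client.chat.completions.create&lt;/code&gt; and every duplicated prompt stops counting against your rate limit.&lt;/p&gt;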

&lt;h2&gt;
  
  
  When to Use Together AI vs Alternatives
&lt;/h2&gt;

&lt;p&gt;A simple decision tree based on what you’re optimizing for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Need everything in one key — chat + reasoning + vision + images?&lt;/strong&gt; → Together AI Free tier&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need the fastest possible chat response (under 1 second to first token)?&lt;/strong&gt; → &lt;a href="https://toolfreebie.com/cerebras-inference-api-fastest-free-ai-api/" rel="noopener noreferrer"&gt;Cerebras&lt;/a&gt; or &lt;a href="https://toolfreebie.com/groq-api-the-fastest-free-ai-api-in-2026-300-800-tokens-s/" rel="noopener noreferrer"&gt;Groq&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need a 1M-token context window for long documents?&lt;/strong&gt; → &lt;a href="https://toolfreebie.com/google-gemini-api-the-best-free-ai-api-in-2026/" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need the widest catalogue of free models from many providers?&lt;/strong&gt; → &lt;a href="https://toolfreebie.com/openrouter-access-300-free-ai-models-with-one-api-key/" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need the best free embedding + reranker for RAG?&lt;/strong&gt; → &lt;a href="https://toolfreebie.com/cohere-free-api-embedding-rerank-rag-2026/" rel="noopener noreferrer"&gt;Cohere&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Building edge functions and want inference inside Cloudflare?&lt;/strong&gt; → &lt;a href="https://toolfreebie.com/cloudflare-workers-ai-free-edge-ai-inference-with-47-models/" rel="noopener noreferrer"&gt;Cloudflare Workers AI&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Together AI is the right answer when your project benefits from a single integration that covers many capabilities, especially for multimodal applications and reasoning-heavy agents that may also need image generation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use Together AI with OpenClaw
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://openclaw.ai" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; is an AI agent platform that orchestrates multiple APIs and tools into automated workflows. Together AI fits well as a &lt;strong&gt;single inference layer&lt;/strong&gt; behind an OpenClaw agent that needs to handle multiple modalities — read a screenshot, reason about what to do next, and produce a generated image as part of the output.&lt;/p&gt;

&lt;p&gt;A working example: an OpenClaw agent receives a customer support ticket that includes a screenshot of an error. The agent uses Llama Vision (Free) to extract the error message from the image, DeepSeek R1 Distill (Free) to reason about which knowledge-base article applies, Llama 3.3 70B (Free) to draft a reply, and FLUX.1 [schnell] (Free) to generate a clean diagram for the customer if a visual explanation helps. All four steps hit the same API key.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;together&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Together&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Together&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TOGETHER_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;support_pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ticket_text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;screenshot_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;A multi-modal support agent step for OpenClaw.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="c1"&gt;# 1. Extract the error from the screenshot
&lt;/span&gt;    &lt;span class="n"&gt;vision&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta-llama/Llama-Vision-Free&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Read the error message in this screenshot and return only the error text.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;screenshot_url&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;error_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vision&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

    &lt;span class="c1"&gt;# 2. Reason about which solution applies
&lt;/span&gt;    &lt;span class="n"&gt;reasoning&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepseek-ai/DeepSeek-R1-Distill-Llama-70B-free&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Ticket: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ticket_text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Error extracted: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;error_text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;What is the most likely root cause?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}],&lt;/span&gt;
        &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;800&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 3. Draft a customer-facing reply
&lt;/span&gt;    &lt;span class="n"&gt;reply&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta-llama/Llama-3.3-70B-Instruct-Turbo-Free&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a senior support engineer. Be concise and friendly.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Ticket: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ticket_text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Root cause analysis: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;reasoning&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Write the reply to the customer.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;error_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;reasoning&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reply&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;reply&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
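&lt;p&gt;The fourth step from the walkthrough, rendering the diagram with FLUX.1 [schnell], is not shown above. A hedged sketch: the prompt wording is illustrative, and it assumes the Together SDK's &lt;code&gt;images.generate&lt;/code&gt; call returns a hosted image URL:&lt;/p&gt;

```python
def generate_diagram(client, description):
    """Step 4: render a supporting diagram on the free image model."""
    result = client.images.generate(
        model="black-forest-labs/FLUX.1-schnell-Free",  # free-tier FLUX
        prompt="Clean technical diagram: " + description,
        steps=4,  # schnell is tuned for very few denoising steps
        n=1,
    )
    return result.data[0].url  # hosted URL of the generated image
```

&lt;p&gt;Called as &lt;code&gt;generate_diagram(client, "how to clear the browser cache")&lt;/code&gt;, it slots in as the last step of &lt;code&gt;support_pipeline&lt;/code&gt; and returns a URL you can embed in the drafted reply.&lt;/p&gt;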



&lt;p&gt;The same pattern fits other OpenClaw use cases: a research agent that reads charts and reasons about them, a content agent that writes a post and generates its cover image, a QA agent that screenshots a UI and verifies what it sees. The single-key, single-SDK shape keeps the agent code small.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pricing When You Outgrow Free
&lt;/h2&gt;

&lt;p&gt;If your application moves beyond prototyping, Together AI’s serverless pricing for the same models is competitive with the rest of the market. Approximate published prices in 2026 for popular models:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Approx Price&lt;/th&gt;
&lt;th&gt;Unit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Llama 3.3 70B Instruct Turbo&lt;/td&gt;
&lt;td&gt;~$0.88&lt;/td&gt;
&lt;td&gt;per 1M tokens (blended)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Llama 3.1 8B Instruct Turbo&lt;/td&gt;
&lt;td&gt;~$0.18&lt;/td&gt;
&lt;td&gt;per 1M tokens (blended)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Llama 3.1 405B Instruct Turbo&lt;/td&gt;
&lt;td&gt;~$3.50&lt;/td&gt;
&lt;td&gt;per 1M tokens (blended)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek R1&lt;/td&gt;
&lt;td&gt;~$3.00 / $7.00&lt;/td&gt;
&lt;td&gt;per 1M input / output tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FLUX.1 [schnell]&lt;/td&gt;
&lt;td&gt;~$0.003&lt;/td&gt;
&lt;td&gt;per image (1024×1024, 4 steps)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BGE / M2-BERT embeddings&lt;/td&gt;
&lt;td&gt;~$0.008 to $0.05&lt;/td&gt;
&lt;td&gt;per 1M tokens (model-dependent)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Two things make this pricing especially friendly for solo builders. First, you only pay for what you use — there’s no monthly minimum. Second, the same key works for both the Free tier and paid models, so there’s no migration cost when you flip from free to paid for a single hot model. Check the &lt;a href="https://www.together.ai/pricing" rel="noopener noreferrer"&gt;official pricing page&lt;/a&gt; for current numbers.&lt;/p&gt;
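&lt;p&gt;A back-of-envelope monthly estimate from the blended prices above (the numbers are approximate and change, so treat the output as an order-of-magnitude guide, not a quote):&lt;/p&gt;

```python
# Approximate blended $/1M tokens, taken from the table above.
PRICE_PER_1M = {
    "llama-3.3-70b-turbo": 0.88,
    "llama-3.1-8b-turbo": 0.18,
}

def monthly_cost(model, requests_per_day, avg_tokens_per_request):
    """Rough 30-day cost at a steady request rate."""
    tokens = requests_per_day * 30 * avg_tokens_per_request
    return tokens / 1_000_000 * PRICE_PER_1M[model]

# 1,000 requests/day at ~2,000 blended tokens each on Llama 3.3 70B:
print(round(monthly_cost("llama-3.3-70b-turbo", 1_000, 2_000), 2))  # 52.8
```

&lt;p&gt;The same workload on the 8B Turbo model comes out to about $10.80/month, which is why routing easy requests to a smaller model is the first optimization worth making.&lt;/p&gt;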

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Is Together AI’s Free tier really free, or is it a trial?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Both. Models with the &lt;code&gt;-Free&lt;/code&gt; suffix are free to call indefinitely (rate-limited but non-expiring). All other models run against a one-time $1 trial credit at signup. Once the trial credit is gone, paid models stop until you add a payment method.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do I need a credit card to sign up?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;No. The default account state has no payment method on file. You only need to add one when you want to spend beyond your trial credit on paid models — Free-tier models keep working either way.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is the API truly OpenAI-compatible?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes for chat completions, streaming, and tool calling. Image generation uses Together AI’s own endpoint shape (which closely mirrors OpenAI’s). Embeddings are also OpenAI-compatible. In practice, you can point any OpenAI SDK at &lt;code&gt;https://api.together.xyz/v1&lt;/code&gt; and most code works without changes.&lt;/p&gt;
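&lt;p&gt;At the HTTP level, that compatibility means a standard OpenAI-shaped request body posted to Together's base URL. A stdlib-only sketch of the raw request (the network call itself is commented out so you can inspect the shape before sending anything):&lt;/p&gt;

```python
import json
import os
import urllib.request

# OpenAI-shaped chat completion payload.
payload = {
    "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo-Free",
    "messages": [{"role": "user", "content": "Say hello"}],
}
req = urllib.request.Request(
    "https://api.together.xyz/v1/chat/completions",  # OpenAI-compatible path
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer " + os.environ.get("TOGETHER_API_KEY", ""),
        "Content-Type": "application/json",
    },
)
# body = json.load(urllib.request.urlopen(req))  # uncomment with a valid key
print(req.full_url)
```

&lt;p&gt;Any OpenAI SDK that lets you override the base URL produces exactly this request, which is why most existing code works unchanged.&lt;/p&gt;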

&lt;p&gt;&lt;strong&gt;What’s the difference between “Turbo” and non-Turbo models?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Turbo variants are quantized (typically FP8) for higher throughput with minimal quality loss. Together AI publishes evaluation numbers showing Turbo variants stay within a fraction of a percent of full-precision quality on standard benchmarks. For nearly all production use cases, prefer Turbo.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I use Together AI for commercial projects?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes — both the Free and paid tiers permit commercial use, subject to each model’s underlying license. Llama models follow Meta’s Llama Community License, FLUX.1 [schnell] is Apache 2.0, and so on. Confirm any specific model’s license on its model card before shipping.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does Together AI store my prompts or completions?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Together AI’s stated policy is that they don’t train on your data and that prompts are not retained beyond what’s needed for abuse prevention. For sensitive workloads, the dedicated/enterprise tiers offer stronger data-handling guarantees. Re-check the current &lt;a href="https://www.together.ai/privacy" rel="noopener noreferrer"&gt;privacy policy&lt;/a&gt; before sending real customer data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does the Free tier compare to running models locally with &lt;a href="https://toolfreebie.com/ollama-run-ai-models-locally-for-free/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt;?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Ollama is unbeatable for offline development and zero-cost long-running tasks, but it’s bounded by the GPU on your laptop — running Llama 3.3 70B locally requires serious hardware. Together AI’s Free tier gives you the same model running on a real datacenter GPU, just with rate limits. The two tools are complements: prototype locally with Ollama on a smaller model, then call Together AI when you need the 70B for the parts that matter.&lt;/p&gt;
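&lt;p&gt;Because both expose OpenAI-compatible endpoints, the local/remote split can be a one-line routing decision. A sketch (the port is Ollama's default and the local model tag is just an example; adjust to your setup):&lt;/p&gt;

```python
def pick_endpoint(need_70b):
    """Route heavy requests to Together AI, everything else to local Ollama."""
    if need_70b:
        return ("https://api.together.xyz/v1",
                "meta-llama/Llama-3.3-70B-Instruct-Turbo-Free")
    # Ollama serves an OpenAI-compatible API on port 11434 by default.
    return ("http://localhost:11434/v1", "llama3.2")

base_url, model = pick_endpoint(need_70b=False)
print(base_url)  # http://localhost:11434/v1
```

&lt;p&gt;Feed the returned pair into whichever OpenAI-compatible client you already use, and the rest of your code never knows which backend answered.&lt;/p&gt;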

&lt;h2&gt;
  
  
  Final Verdict
&lt;/h2&gt;

&lt;p&gt;Together AI’s Free tier is the most underrated entry point in the free-AI-API space because it solves a problem most other free APIs ignore: &lt;strong&gt;multimodal coverage on a single key&lt;/strong&gt;. Every other provider in this category is great at one thing — Cerebras for raw speed, Gemini for context length, Cohere for retrieval, Cloudflare for edge — and forces you to integrate three or four of them if your project needs more than one capability. Together AI’s &lt;code&gt;-Free&lt;/code&gt; models give you chat, reasoning, vision, and image generation behind one HTTPS endpoint, one SDK, and one key, with no credit card.&lt;/p&gt;

&lt;p&gt;For prototyping multimodal agents, building a side project that mixes capabilities, or just keeping one fewer signup form on your “maybe later” list, Together AI’s Free tier earns its place in any serious 2026 free-AI-API stack. Sign up at &lt;a href="https://api.together.ai" rel="noopener noreferrer"&gt;api.together.ai&lt;/a&gt;, copy the key, and your first chat completion is about three minutes away.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Reads
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/10-best-free-ai-apis-in-2026-the-ultimate-comparison/" rel="noopener noreferrer"&gt;10 Best Free AI APIs in 2026: The Ultimate Comparison&lt;/a&gt; — the master list of every free chat API worth your time&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/groq-vs-cerebras-vs-gemini-fastest-free-ai-api-2026/" rel="noopener noreferrer"&gt;Groq vs Cerebras vs Gemini: Which Free AI API Is Actually Fastest in 2026?&lt;/a&gt; — when raw speed is the deciding factor&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/deepseek-api-free-access-to-r1-reasoning-and-v3-chat-models/" rel="noopener noreferrer"&gt;DeepSeek API: Free Access to R1 Reasoning and V3 Chat Models&lt;/a&gt; — for the same R1 reasoning, sourced directly&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/openrouter-access-300-free-ai-models-with-one-api-key/" rel="noopener noreferrer"&gt;OpenRouter: Access 300+ Free AI Models with One API Key&lt;/a&gt; — when model variety matters more than coverage of a single provider&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/cohere-free-api-embedding-rerank-rag-2026/" rel="noopener noreferrer"&gt;Cohere Free API: The Best Free Embedding and Rerank API for RAG in 2026&lt;/a&gt; — pair with Together AI for a complete free RAG stack&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://toolfreebie.com/together-ai-free-api-llama-deepseek-flux-2026/" rel="noopener noreferrer"&gt;toolfreebie.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Cohere Free API: The Best Free Embedding and Rerank API for RAG in 2026</title>
      <dc:creator>toolfreebie</dc:creator>
      <pubDate>Sun, 03 May 2026 15:50:01 +0000</pubDate>
      <link>https://dev.to/build996/cohere-free-api-the-best-free-embedding-and-rerank-api-for-rag-in-2026-5a2e</link>
      <guid>https://dev.to/build996/cohere-free-api-the-best-free-embedding-and-rerank-api-for-rag-in-2026-5a2e</guid>
      <description>&lt;h2&gt;
  
  
  What Is Cohere?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://cohere.com" rel="noopener noreferrer"&gt;Cohere&lt;/a&gt; is a Toronto-based AI company founded in 2019 by Aidan Gomez (one of the original authors of the “Attention Is All You Need” Transformer paper) and a team of ex-Google Brain researchers. Unlike OpenAI or Anthropic, Cohere built its platform from day one around a specific use case: &lt;strong&gt;enterprise retrieval and RAG (Retrieval-Augmented Generation)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That focus shows up in three places where Cohere genuinely leads the field — and where most developers don’t realize they can get it for free:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Embed v3&lt;/strong&gt; — text embeddings that consistently rank near the top of the MTEB benchmark, in both English and 100+ other languages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rerank v3&lt;/strong&gt; — the most-deployed neural reranker in production RAG systems, available via a single API call&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Command R / R+&lt;/strong&gt; — chat models specifically trained for RAG, tool use, and grounded citations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And the part most developers miss: a free Cohere trial key gives you access to &lt;em&gt;all&lt;/em&gt; of these. No credit card, no time limit. The only constraint is per-minute rate limiting, which is fine for prototyping, side projects, and small production workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  What’s Free on Cohere
&lt;/h2&gt;

&lt;p&gt;Cohere has two key types: &lt;strong&gt;Trial keys&lt;/strong&gt; (free) and &lt;strong&gt;Production keys&lt;/strong&gt; (paid). Trial keys never expire — they’re rate-limited but otherwise unrestricted.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Endpoint&lt;/th&gt;
&lt;th&gt;Trial Rate Limit&lt;/th&gt;
&lt;th&gt;Production Rate Limit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Chat (Command R/R+)&lt;/td&gt;
&lt;td&gt;20 calls/min&lt;/td&gt;
&lt;td&gt;500 calls/min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Embed&lt;/td&gt;
&lt;td&gt;100 calls/min&lt;/td&gt;
&lt;td&gt;2,000 calls/min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rerank&lt;/td&gt;
&lt;td&gt;10 calls/min&lt;/td&gt;
&lt;td&gt;1,000 calls/min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Classify&lt;/td&gt;
&lt;td&gt;100 calls/min&lt;/td&gt;
&lt;td&gt;1,000 calls/min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Summarize&lt;/td&gt;
&lt;td&gt;5 calls/min&lt;/td&gt;
&lt;td&gt;500 calls/min&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Notice the Embed limit: &lt;strong&gt;100 calls per minute&lt;/strong&gt; with up to 96 documents per call. That’s effectively 9,600 embeddings per minute on the free tier — more than enough to index a personal knowledge base or a small document corpus from scratch in a few minutes.&lt;/p&gt;
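&lt;p&gt;In practice, indexing a corpus is just a matter of slicing it into 96-document batches, one API call each. A sketch (the actual &lt;code&gt;co.embed&lt;/code&gt; call is omitted so the batching logic stands alone):&lt;/p&gt;

```python
def batches(texts, size=96):
    """Slice a corpus into Embed-sized batches (max 96 texts per call)."""
    for i in range(0, len(texts), size):
        yield texts[i:i + size]

corpus = ["doc %d" % i for i in range(250)]
sizes = [len(b) for b in batches(corpus)]
print(sizes)  # [96, 96, 58] -- three calls, far under 100 calls/min
```

&lt;p&gt;A 250-document corpus needs only three Embed calls, so even a several-thousand-document knowledge base indexes in under a minute on a trial key.&lt;/p&gt;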

&lt;p&gt;&lt;em&gt;Note: Trial keys are not for production traffic, but they are for real development. Cohere’s documentation explicitly encourages building and testing on trial keys before upgrading.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Get Your Free API Key
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;a href="https://dashboard.cohere.com/welcome/register" rel="noopener noreferrer"&gt;dashboard.cohere.com/welcome/register&lt;/a&gt; and sign up with email or Google&lt;/li&gt;
&lt;li&gt;Verify your email address&lt;/li&gt;
&lt;li&gt;From the dashboard, navigate to &lt;strong&gt;API Keys&lt;/strong&gt; in the left sidebar&lt;/li&gt;
&lt;li&gt;Your default Trial key is already there — copy it&lt;/li&gt;
&lt;li&gt;Set it as an environment variable: &lt;code&gt;export COHERE_API_KEY="your_key_here"&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No credit card. No phone number. Two minutes from signup to your first embedding.&lt;/p&gt;

&lt;h2&gt;
  
  
  Python Quickstart: Your First Embedding
&lt;/h2&gt;

&lt;p&gt;Install the official Cohere Python SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;cohere
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Embedding three documents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cohere&lt;/span&gt;

&lt;span class="n"&gt;co&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cohere&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ClientV2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;COHERE_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;co&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cohere makes the best free embedding API for RAG.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OpenClaw is an AI agent platform for orchestrating tools.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Toronto is the headquarters of Cohere.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embed-english-v3.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;input_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_document&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding_types&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;float&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Got &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; embeddings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Each embedding is &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; dimensions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That returns three 1024-dimensional vectors you can drop into any vector database — Pinecone, Weaviate, Chroma, Qdrant, pgvector, or just a NumPy array.&lt;/p&gt;
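&lt;p&gt;If you skip the vector database entirely, nearest-neighbour search over a NumPy array is only a few lines. A minimal sketch with toy 4-dimensional vectors standing in for real 1024-dimensional embeddings (the numbers are illustrative, not model output):&lt;/p&gt;

```python
import numpy as np

# Toy 4-d vectors standing in for 1024-d Cohere embeddings (illustrative only).
doc_vectors = np.array([
    [0.9, 0.1, 0.0, 0.1],   # doc 0
    [0.1, 0.8, 0.2, 0.0],   # doc 1
    [0.0, 0.1, 0.9, 0.2],   # doc 2
])
query_vector = np.array([0.85, 0.15, 0.05, 0.10])

# Normalize rows so a plain dot product equals cosine similarity.
docs_n = doc_vectors / np.linalg.norm(doc_vectors, axis=1, keepdims=True)
query_n = query_vector / np.linalg.norm(query_vector)

scores = docs_n @ query_n
print(int(np.argmax(scores)))  # 0: doc 0 is the nearest neighbour
```

&lt;p&gt;The dedicated vector databases do the same math, just with indexes that scale past what fits in memory.&lt;/p&gt;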

&lt;p&gt;The &lt;code&gt;input_type&lt;/code&gt; parameter is important: Cohere’s embeddings are &lt;strong&gt;asymmetric&lt;/strong&gt;. Use &lt;code&gt;"search_document"&lt;/code&gt; when indexing your corpus, and &lt;code&gt;"search_query"&lt;/code&gt; when embedding the user’s question. Treating them differently gives noticeably better retrieval quality than symmetric embedding APIs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Embedding Models You Get for Free
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model ID&lt;/th&gt;
&lt;th&gt;Dimensions&lt;/th&gt;
&lt;th&gt;Languages&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;embed-english-v3.0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1024&lt;/td&gt;
&lt;td&gt;English&lt;/td&gt;
&lt;td&gt;Highest quality English search and RAG&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;embed-multilingual-v3.0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1024&lt;/td&gt;
&lt;td&gt;100+&lt;/td&gt;
&lt;td&gt;Multilingual search, cross-language RAG&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;embed-english-light-v3.0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;384&lt;/td&gt;
&lt;td&gt;English&lt;/td&gt;
&lt;td&gt;Smaller index, faster queries, low storage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;embed-multilingual-light-v3.0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;384&lt;/td&gt;
&lt;td&gt;100+&lt;/td&gt;
&lt;td&gt;Multilingual on a budget&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For most RAG projects, &lt;code&gt;embed-english-v3.0&lt;/code&gt; at 1024 dimensions is the sweet spot. If you’re storing millions of vectors and storage cost matters, the light variants drop to 384 dimensions — about 62% smaller indexes — with only a small quality drop.&lt;/p&gt;
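&lt;p&gt;The storage difference is easy to quantify. A back-of-envelope estimate for a 10-million-vector index at float32 precision, ignoring metadata and index overhead:&lt;/p&gt;

```python
def index_size_gb(n_vectors: int, dimensions: int, bytes_per_value: int = 4) -> float:
    """Raw float32 vector storage only; real indexes add metadata overhead."""
    return n_vectors * dimensions * bytes_per_value / 1024**3

full = index_size_gb(10_000_000, 1024)   # embed-english-v3.0
light = index_size_gb(10_000_000, 384)   # embed-english-light-v3.0

print(f"{full:.1f} GB vs {light:.1f} GB, {1 - light / full:.1%} smaller")
# 38.1 GB vs 14.3 GB, 62.5% smaller
```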

&lt;h2&gt;
  
  
  Cohere Rerank: The Secret Weapon for RAG Quality
&lt;/h2&gt;

&lt;p&gt;Here is where Cohere genuinely leads: &lt;strong&gt;Rerank&lt;/strong&gt;. After your vector database returns the top 50 or 100 candidate documents, you pass them to Rerank along with the user’s query. Rerank scores each document for actual relevance and reorders them. The top 5 reranked results are almost always dramatically better than the top 5 from raw vector similarity.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cohere&lt;/span&gt;

&lt;span class="n"&gt;co&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cohere&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ClientV2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;COHERE_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How do I add a free embedding API to my chatbot?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cohere offers free embedding API access through trial keys.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Pinecone is a managed vector database service.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OpenAI embeddings cost $0.02 per million tokens.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Use embed-english-v3.0 for the best quality English embeddings.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Vector databases store high-dimensional vectors for similarity search.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;co&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rerank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rerank-english-v3.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;top_n&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Score: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relevance_score&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  |  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That returns the three documents most relevant to the query, with calibrated relevance scores between 0 and 1. In production RAG systems, adding a Rerank step typically boosts answer quality by 15–30% over vector-similarity-only retrieval — which is why it’s one of the most widely deployed neural rerankers in commercial RAG stacks.&lt;/p&gt;

&lt;p&gt;And it’s &lt;strong&gt;free on the trial key&lt;/strong&gt;: 10 calls per minute, with up to 1,000 documents per call.&lt;/p&gt;
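&lt;p&gt;Those two limits interact: a candidate set larger than 1,000 documents has to be split across calls and the per-batch scores merged afterwards. A sketch of that pattern, where &lt;code&gt;rerank_fn&lt;/code&gt; is a hypothetical stand-in for a wrapper around &lt;code&gt;co.rerank(...)&lt;/code&gt; and the toy scorer just counts word overlap:&lt;/p&gt;

```python
MAX_DOCS_PER_CALL = 1000  # per-request document limit on the trial key

def rerank_in_batches(query, documents, rerank_fn, top_n=5,
                      batch_size=MAX_DOCS_PER_CALL):
    """Score documents batch by batch, then merge into a global top_n.

    rerank_fn(query, batch) must return (index_within_batch, score) pairs;
    in production it would wrap co.rerank(...) and read result.index and
    result.relevance_score.
    """
    scored = []
    for start in range(0, len(documents), batch_size):
        batch = documents[start:start + batch_size]
        for idx, score in rerank_fn(query, batch):
            scored.append((start + idx, score))  # map back to global index
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

# Toy scorer for illustration: fraction of query words found in the document.
def word_overlap(query, batch):
    words = set(query.lower().split())
    return [(i, len(words.intersection(d.lower().split())) / len(words))
            for i, d in enumerate(batch)]

docs = [f"filler document {i}" for i in range(25)] + ["free embedding api for rag"]
top = rerank_in_batches("free embedding api", docs, word_overlap,
                        top_n=2, batch_size=10)
print(top[0])  # (25, 1.0): the relevant document wins across batches
```

&lt;p&gt;Relevance scores are comparable across batches, so a simple global sort is enough to merge.&lt;/p&gt;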

&lt;h2&gt;
  
  
  Chat with Command R+: Built for RAG
&lt;/h2&gt;

&lt;p&gt;Cohere’s Command R+ chat model is purpose-built for RAG. Unlike most chat APIs where you stuff retrieved documents into the system prompt, Cohere’s chat endpoint accepts a structured &lt;code&gt;documents&lt;/code&gt; parameter — and the model returns inline citations pointing to which documents each claim came from.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cohere&lt;/span&gt;

&lt;span class="n"&gt;co&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cohere&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ClientV2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;COHERE_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;co&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;command-r-plus&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Which Cohere embedding model should I use for English RAG?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embed-english-v3.0 produces 1024-dimensional embeddings and leads MTEB English benchmarks.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embed-english-light-v3.0 produces 384-dimensional embeddings, optimized for low storage cost.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embed-multilingual-v3.0 supports over 100 languages.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Citations:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;citation&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;citations&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;[]:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  - &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;citation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; from sources: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;citation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sources&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model produces a grounded answer that cites which document each fact came from. For RAG applications where users need to verify the source of every claim — legal, medical, internal knowledge bases — this is significantly more useful than free-text generation.&lt;/p&gt;
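&lt;p&gt;Each citation in the v2 response carries character offsets into the answer text plus the source documents it draws on, which makes rendering the grounded spans straightforward. A sketch using hard-coded offsets in place of a real &lt;code&gt;response.message.citations&lt;/code&gt; list (the offsets and source id here are made up for illustration):&lt;/p&gt;

```python
answer = "Use embed-english-v3.0 for English RAG; it outputs 1024 dimensions."

# (start, end, source_id) triples -- hypothetical stand-ins for a citation's
# start/end offsets and the id of its first source document.
citations = [(4, 22, "doc_0"), (51, 66, "doc_0")]

def annotate(text, spans):
    """Wrap each cited span in [..](source) markers, left to right."""
    out, cursor = [], 0
    for start, end, source in sorted(spans):
        out.append(text[cursor:start])          # uncited text before the span
        out.append(f"[{text[start:end]}]({source})")
        cursor = end
    out.append(text[cursor:])                   # trailing uncited text
    return "".join(out)

print(annotate(answer, citations))
```

&lt;p&gt;This yields the answer with each grounded claim linked back to its source, which is exactly what a legal or medical UI needs to surface.&lt;/p&gt;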

&lt;h2&gt;
  
  
  Free Chat Models on Cohere
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model ID&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Context Window&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;command-r-plus&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;104B&lt;/td&gt;
&lt;td&gt;128k tokens&lt;/td&gt;
&lt;td&gt;Best quality, complex RAG, tool use&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;command-r&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;35B&lt;/td&gt;
&lt;td&gt;128k tokens&lt;/td&gt;
&lt;td&gt;Faster RAG, lower cost on paid tiers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;command-r7b&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;7B&lt;/td&gt;
&lt;td&gt;128k tokens&lt;/td&gt;
&lt;td&gt;Fastest responses, simple Q&amp;amp;A&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All three are available through your free trial key at the same 20-calls-per-minute rate limit. &lt;code&gt;command-r-plus&lt;/code&gt; is the headline model — it scores comparably to GPT-4o on RAG benchmarks while being explicitly trained to follow document citations.&lt;/p&gt;
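&lt;p&gt;At 20 calls per minute, any batch job will eventually hit rate-limit errors unless it throttles itself. A minimal client-side sliding-window limiter you could call before each request (a sketch, not Cohere-provided tooling):&lt;/p&gt;

```python
import time
from collections import deque

class RateLimiter:
    """Sliding-window throttle, e.g. RateLimiter(20) for 20 calls per minute."""

    def __init__(self, max_calls: int, period_s: float = 60.0):
        self.max_calls = max_calls
        self.period_s = period_s
        self.calls = deque()  # monotonic timestamps of recent calls

    def wait(self) -> None:
        now = time.monotonic()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.period_s:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest call in the window expires.
            time.sleep(self.period_s - (now - self.calls[0]))
            self.calls.popleft()
        self.calls.append(time.monotonic())

limiter = RateLimiter(max_calls=20)
# Before every co.chat(...) or co.embed(...) request: limiter.wait()
```

&lt;p&gt;A throttle like this keeps batch jobs under the limit instead of burning requests on 429 retries.&lt;/p&gt;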

&lt;h2&gt;
  
  
  End-to-End RAG Pipeline (All Free)
&lt;/h2&gt;

&lt;p&gt;Here’s a complete RAG pipeline using only Cohere’s free trial key — embed, store, retrieve, rerank, and answer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cohere&lt;/span&gt;

&lt;span class="n"&gt;co&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cohere&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ClientV2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;COHERE_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Your knowledge base
&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OpenClaw is an AI agent platform for orchestrating multiple AI APIs and tools.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cohere Embed v3 produces 1024-dimensional vectors optimized for retrieval.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cohere Rerank v3 reorders candidate documents by true relevance to the query.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Command R+ is a 104B model trained specifically for RAG with citations.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Free trial keys on Cohere have no time limit — only per-minute rate limits.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Index documents
&lt;/span&gt;&lt;span class="n"&gt;doc_embeds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;co&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embed-english-v3.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;input_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_document&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding_types&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;float&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;
&lt;span class="n"&gt;doc_matrix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc_embeds&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Embed the query
&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How do I get free access to Cohere&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s RAG models?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;query_embed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;co&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embed-english-v3.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;input_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding_types&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;float&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# 4. Vector similarity — get top 3 candidates
&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;doc_matrix&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt; &lt;span class="n"&gt;query_embed&lt;/span&gt;
&lt;span class="n"&gt;top_indices&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;argsort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;:][::&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;candidates&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;top_indices&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# 5. Rerank to get best 2
&lt;/span&gt;&lt;span class="n"&gt;reranked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;co&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rerank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rerank-english-v3.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;top_n&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;top_docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;reranked&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# 6. Answer with Command R+ using grounded citations
&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;co&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;command-r-plus&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;top_docs&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s a full production-shape RAG pipeline — embed, retrieve, rerank, generate with citations — running on a free trial key with no credit card on file.&lt;/p&gt;
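&lt;p&gt;One practical tip: under trial rate limits, re-embedding an unchanged corpus is the main bottleneck, so cache vectors between runs. A minimal on-disk cache sketch (the file name, &lt;code&gt;embed_with_cache&lt;/code&gt; helper, and &lt;code&gt;fake_embed&lt;/code&gt; are hypothetical; in production &lt;code&gt;embed_fn&lt;/code&gt; would wrap &lt;code&gt;co.embed(...)&lt;/code&gt;):&lt;/p&gt;

```python
import hashlib
import json
from pathlib import Path

def _key(model: str, text: str) -> str:
    # Cache key depends on both the model and the exact text.
    return hashlib.sha256(f"{model}\x00{text}".encode()).hexdigest()

def embed_with_cache(texts, model, embed_fn, cache_path=Path("embed_cache.json")):
    """Only texts missing from the on-disk cache are sent to embed_fn.

    embed_fn(batch) -> list of vectors; in production, wrap co.embed(...).
    """
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    missing = list(dict.fromkeys(t for t in texts if _key(model, t) not in cache))
    if missing:
        for text, vector in zip(missing, embed_fn(missing)):
            cache[_key(model, text)] = vector
        cache_path.write_text(json.dumps(cache))
    return [cache[_key(model, t)] for t in texts]

# Demo with a fake embedder so the sketch runs without an API key.
calls = []
def fake_embed(batch):
    calls.append(len(batch))
    return [[float(len(t))] for t in batch]

demo_cache = Path("embed_cache.json")
demo_cache.unlink(missing_ok=True)  # start fresh for the demo
vectors = embed_with_cache(["alpha", "beta", "alpha"], "embed-english-v3.0",
                           fake_embed, cache_path=demo_cache)
print(len(vectors), calls)  # 3 [2]: three vectors, one batch of two unique texts
```

&lt;p&gt;On the next run the cache file is hit first, so an unchanged corpus costs zero embed calls.&lt;/p&gt;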

&lt;h2&gt;
  
  
  JavaScript / Node.js Example
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;cohere-ai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;CohereClientV2&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;cohere-ai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;co&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;CohereClientV2&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;token&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;COHERE_API_KEY&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;co&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Cohere is the best free embedding API for RAG.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Toronto is the headquarters of Cohere.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;embed-english-v3.0&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;inputType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;search_document&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;embeddingTypes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;float&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Got &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;float&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; embeddings`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Cohere vs Other Free Embedding Options
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Free Embedding Model&lt;/th&gt;
&lt;th&gt;Dimensions&lt;/th&gt;
&lt;th&gt;Multilingual&lt;/th&gt;
&lt;th&gt;Reranker?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cohere&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;embed-english-v3.0 / multilingual-v3.0&lt;/td&gt;
&lt;td&gt;1024 (light models: 384)&lt;/td&gt;
&lt;td&gt;100+ languages&lt;/td&gt;
&lt;td&gt;Yes (Rerank v3)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Google Gemini&lt;/td&gt;
&lt;td&gt;text-embedding-004&lt;/td&gt;
&lt;td&gt;768&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mistral AI&lt;/td&gt;
&lt;td&gt;mistral-embed&lt;/td&gt;
&lt;td&gt;1024&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloudflare Workers AI&lt;/td&gt;
&lt;td&gt;bge-base-en-v1.5&lt;/td&gt;
&lt;td&gt;768&lt;/td&gt;
&lt;td&gt;English only&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hugging Face Inference&lt;/td&gt;
&lt;td&gt;BGE / E5 family&lt;/td&gt;
&lt;td&gt;varies&lt;/td&gt;
&lt;td&gt;Some multilingual&lt;/td&gt;
&lt;td&gt;No (manual setup)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI (paid only)&lt;/td&gt;
&lt;td&gt;text-embedding-3-large&lt;/td&gt;
&lt;td&gt;3072&lt;/td&gt;
&lt;td&gt;Strong multilingual&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Where Cohere wins on the free tier:&lt;/strong&gt; it is the only provider on this list that ships a hosted neural reranker. For RAG quality, that single feature usually matters more than which embedding model you started with. Combined with asymmetric embeddings (separate &lt;code&gt;search_query&lt;/code&gt; and &lt;code&gt;search_document&lt;/code&gt; modes), Cohere’s free tier is a credible foundation for real retrieval applications — not just a demo toy.&lt;/p&gt;
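&lt;p&gt;The scoring half of that pipeline is plain vector math once Embed has returned the floats. Here is a minimal pure-Python sketch of ranking documents against a query vector by cosine similarity; the helper names are illustrative rather than part of the Cohere SDK, and the 3-dimensional toy vectors stand in for real 1024-dimensional embeddings.&lt;/p&gt;

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank_documents(query_vec, doc_vecs):
    """Return (index, score) pairs sorted best-first."""
    scores = [(i, cosine(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Toy vectors standing in for real embeddings.
query = [1.0, 0.0, 0.0]
docs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
print([i for i, _ in rank_documents(query, docs)])  # [1, 2, 0]
```

&lt;p&gt;In production you would keep the document vectors in a vector store and embed only the query (with &lt;code&gt;search_query&lt;/code&gt; mode) at request time, but for free-tier experiments in-memory scoring like this is often enough.&lt;/p&gt;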

&lt;h2&gt;
  
  
  Use Cohere with OpenClaw
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://openclaw.ai" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; is an AI agent platform that orchestrates multiple APIs and tools into automated workflows. Cohere fits well as the &lt;strong&gt;retrieval and grounding layer&lt;/strong&gt; inside OpenClaw agents — the part that searches your private documents before the agent acts.&lt;/p&gt;

&lt;p&gt;A common pattern: an OpenClaw agent receives a user task (“draft a reply to this customer ticket”), uses Cohere Embed + Rerank to pull the three most relevant past tickets and policies from your knowledge base, then passes those documents to Command R+ to generate a cited reply. Because Cohere returns explicit citations, the agent can attach source links to the draft for human review.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cohere&lt;/span&gt;

&lt;span class="n"&gt;co&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cohere&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ClientV2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;COHERE_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve_and_answer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;knowledge_base&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;A retrieval-then-answer step for use inside an OpenClaw agent.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Rerank handles both retrieval and ranking in one call
&lt;/span&gt;    &lt;span class="n"&gt;reranked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;co&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rerank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rerank-english-v3.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;knowledge_base&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;top_n&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;top_docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;knowledge_base&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;reranked&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;co&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;command-r-plus&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
        &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;top_docs&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sources&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;top_docs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;citations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;citations&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Example use inside an agent step
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;retrieve_and_answer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is our refund policy for digital downloads?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;knowledge_base&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;load_company_kb&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# your own loader
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that when you only have a few hundred candidate documents, you can skip the embedding/vector-DB step entirely and just pass everything to Rerank. The free trial key allows up to 1,000 documents per Rerank call, which covers a surprising number of small-to-medium knowledge bases.&lt;/p&gt;
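&lt;p&gt;If your knowledge base ever grows past that cap, a small chunking helper keeps each call under the limit. This is a sketch under the 1,000-document assumption stated above; &lt;code&gt;batch_for_rerank&lt;/code&gt; is a hypothetical helper, not part of the Cohere SDK.&lt;/p&gt;

```python
def batch_for_rerank(documents, max_docs=1000):
    """Split a document list into chunks that fit under the per-call cap."""
    return [documents[i:i + max_docs] for i in range(0, len(documents), max_docs)]

chunks = batch_for_rerank([f"doc-{n}" for n in range(2500)])
print([len(c) for c in chunks])  # [1000, 1000, 500]
```

&lt;p&gt;Rerank each chunk separately, then merge the results by relevance score and keep the global top N.&lt;/p&gt;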

&lt;h2&gt;
  
  
  Cohere Pricing (When You Need More)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Price&lt;/th&gt;
&lt;th&gt;Unit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Command R+&lt;/td&gt;
&lt;td&gt;$2.50 input / $10.00 output&lt;/td&gt;
&lt;td&gt;per 1M tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Command R&lt;/td&gt;
&lt;td&gt;$0.15 input / $0.60 output&lt;/td&gt;
&lt;td&gt;per 1M tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Command R7B&lt;/td&gt;
&lt;td&gt;$0.0375 input / $0.15 output&lt;/td&gt;
&lt;td&gt;per 1M tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Embed v3 (English / Multilingual)&lt;/td&gt;
&lt;td&gt;$0.10&lt;/td&gt;
&lt;td&gt;per 1M tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rerank v3&lt;/td&gt;
&lt;td&gt;$2.00&lt;/td&gt;
&lt;td&gt;per 1,000 searches&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;When you graduate from a Trial key to a Production key, Command R7B at $0.15 per million output tokens is one of the cheapest production-grade models available. Embed v3 at $0.10 per million tokens is competitive with or cheaper than every comparable hosted embedding API.&lt;/p&gt;
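&lt;p&gt;Budgeting against that table is simple arithmetic. A throwaway sketch with the rates hard-coded from the table above (check the live pricing page before relying on them):&lt;/p&gt;

```python
# (input, output) dollars per million tokens, from the pricing table above.
PRICES = {
    "command-r-plus": (2.50, 10.00),
    "command-r": (0.15, 0.60),
    "command-r7b": (0.0375, 0.15),
}

def monthly_cost(model, input_tokens, output_tokens):
    """Estimated monthly bill for a chat model at the listed rates."""
    rate_in, rate_out = PRICES[model]
    millions = 1_000_000
    return (input_tokens / millions) * rate_in + (output_tokens / millions) * rate_out

# 50M input and 10M output tokens a month on Command R7B:
print(f"${monthly_cost('command-r7b', 50_000_000, 10_000_000):.2f}")  # $3.38
```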

&lt;h2&gt;
  
  
  When to Use Cohere
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Cohere is the right choice when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You’re building a RAG application and want the best free embeddings + reranker combo&lt;/li&gt;
&lt;li&gt;You need multilingual retrieval across 100+ languages without changing models&lt;/li&gt;
&lt;li&gt;Your application requires grounded citations (legal, medical, internal knowledge bases)&lt;/li&gt;
&lt;li&gt;You want asymmetric embeddings (separate query and document modes) for better search quality&lt;/li&gt;
&lt;li&gt;You’re prototyping retrieval pipelines and want generous free per-minute limits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Consider alternatives when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need raw chat throughput more than retrieval quality — use Groq or Cerebras for speed, Gemini Flash for free quota&lt;/li&gt;
&lt;li&gt;You want OpenAI SDK drop-in compatibility — use Mistral AI or DeepSeek&lt;/li&gt;
&lt;li&gt;You need image, audio, or multimodal generation — Cohere is text-only&lt;/li&gt;
&lt;li&gt;You’re building a pure chatbot with no retrieval — Command R+ works, but the model isn’t priced or designed around that use case&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Related Reads
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/groq-vs-cerebras-vs-gemini/" rel="noopener noreferrer"&gt;Groq vs Cerebras vs Gemini: Which Free AI API Is Actually Fastest in 2026?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/cerebras-free-api/" rel="noopener noreferrer"&gt;Cerebras Inference API: The Fastest Free AI API You’ve Never Heard Of&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/mistral-free-api/" rel="noopener noreferrer"&gt;Mistral AI Free API: Call Nemo and Mixtral for Free with Any OpenAI SDK&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/github-models-free-api/" rel="noopener noreferrer"&gt;GitHub Models: Free GPT-4o and Llama API for Every Developer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/cloudflare-workers-ai/" rel="noopener noreferrer"&gt;Cloudflare Workers AI: Free Edge AI Inference with 47+ Models&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Final Verdict
&lt;/h2&gt;

&lt;p&gt;Cohere is the most underrated free AI API for one specific reason: it’s the only provider that ships a complete RAG stack — embeddings, reranker, and a chat model trained for grounded citations — all behind a single free trial key. Most “free AI API” articles skip Cohere because they only compare chat models, where Cohere is fine but not best-in-class. That misses the point of what the company actually built.&lt;/p&gt;

&lt;p&gt;If your project involves search over your own documents, internal knowledge bases, customer tickets, product catalogs, or anything resembling RAG, Cohere’s free tier covers more of the pipeline than any other single provider. Sign up at &lt;a href="https://dashboard.cohere.com/welcome/register" rel="noopener noreferrer"&gt;dashboard.cohere.com&lt;/a&gt;, copy your trial key, and your first reranked retrieval is about ten minutes away.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://toolfreebie.com/cohere-rag-api/" rel="noopener noreferrer"&gt;toolfreebie.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Free AI Video Generators in 2026: Kling vs Pika vs HeyGen Compared</title>
      <dc:creator>toolfreebie</dc:creator>
      <pubDate>Sun, 03 May 2026 15:45:57 +0000</pubDate>
      <link>https://dev.to/build996/free-ai-video-generators-in-2026-kling-vs-pika-vs-heygen-compared-39k6</link>
      <guid>https://dev.to/build996/free-ai-video-generators-in-2026-kling-vs-pika-vs-heygen-compared-39k6</guid>
      <description>&lt;h2&gt;
  
  
  The State of Free AI Video Generation in 2026
&lt;/h2&gt;

&lt;p&gt;Two years ago, generative video was a research demo. You’d see a five-second OpenAI Sora clip on Twitter, a Runway Gen-2 reel that looked like a melted oil painting, and a vague feeling that “real” AI video was still a year or two out. By early 2026 that’s no longer true. There are three tools I now reach for every week — &lt;strong&gt;Kling&lt;/strong&gt;, &lt;strong&gt;Pika&lt;/strong&gt;, and &lt;strong&gt;HeyGen&lt;/strong&gt; — and all three have a free tier you can use without a credit card.&lt;/p&gt;

&lt;p&gt;The three solve different problems. Kling is what you use when you want a cinematic short clip generated from a still image or a text prompt. Pika is what you use when you want to direct a scene with motion brushes, lip-sync, and quick edits. HeyGen is what you use when you want a talking-head video of a fake (or real, with permission) person reading a script you wrote. They are not competitors so much as three slots in the same AI video toolkit.&lt;/p&gt;

&lt;p&gt;This article walks through each tool, what its free tier actually includes in April 2026, where the rough edges are, and how I’ve wired all three into automation built on top of &lt;a href="https://openclaw.ai" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; for batch video generation. If you’re a creator, a developer building media tooling, or a marketer trying to stop paying $150/month for stock video, one or more of these will earn its place in your workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Quick Verdict, Up Front
&lt;/h2&gt;

&lt;p&gt;If you only have ten seconds to read this article:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Kling&lt;/strong&gt; — best for cinematic image-to-video and text-to-video. Free tier gives you ~166 credits/day (about 6-8 short clips) and 1080p output on the standard model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pika&lt;/strong&gt; — best for scene-level direction, motion brushes, and quick edits to existing video. Free tier is 250 credits at signup with limited regeneration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HeyGen&lt;/strong&gt; — best for AI avatar talking-head videos for marketing, training, and tutorials. Free tier is three minutes of video per month with a watermark.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The rest of this article is the long version of why I picked those three over the dozen other contenders, what the actual workflow looks like, and how to chain them together for things like automated short-form video pipelines.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Picked These Three
&lt;/h2&gt;

&lt;p&gt;The free AI video space is crowded. There’s Runway, Luma Dream Machine, Hailuo (MiniMax), Vidu, Kling, Pika, HeyGen, Synthesia, D-ID, Sora when you can get a slot, and a long tail of WeChat-only Chinese tools. To narrow the field, I tested for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A real free tier in April 2026.&lt;/strong&gt; Not a “free trial that needs a card.” Several big-name tools quietly removed credit-card-free signup over the last year.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output quality I’d actually use.&lt;/strong&gt; Not just demo-reel cherry-picks. I generated the same prompt across every candidate and compared the dud rate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Different problem space.&lt;/strong&gt; Three text-to-video tools that do the same thing isn’t a useful roundup. I picked one cinematic generator (Kling), one motion-control editor (Pika), and one talking-head avatar (HeyGen).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API or automation surface.&lt;/strong&gt; At least one of the three needs to be scriptable, because that’s where AI video gets interesting beyond hobby use.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Notable tools that didn’t make this list and why:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Runway Gen-3&lt;/strong&gt; — beautiful output, but the free tier is now 125 one-time credits and that’s it. Once you’ve burned them, you’re paying. Kling and Pika are more sustainable for ongoing free use.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Luma Dream Machine&lt;/strong&gt; — solid quality, but the free tier dropped to 30 generations/month in late 2025. Workable for occasional use but more limited than Kling’s daily refresh.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sora&lt;/strong&gt; — when you can get access through a ChatGPT Plus account it’s stunning, but it’s not really “free” — you’re paying for the Plus subscription.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Synthesia&lt;/strong&gt; — free tier removed in 2024. Fully paid product now.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  1. Kling: The Best Free Cinematic Video Generator
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fna9smhk0p92guu0fjbq8.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fna9smhk0p92guu0fjbq8.jpg" alt="KlingAI 3.0 community page showing the All-New KlingAI 3.0 Series hero with a real generated cinematic clip" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Kling's English-locale community landing — the desert-driving hero is itself a Kling-generated clip.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://klingai.com" rel="noopener noreferrer"&gt;Kling&lt;/a&gt; is built by Kuaishou — the Chinese short-video company with billions of users — and it’s currently my default for “give me a five-second cinematic shot of X.” The model handles motion, light, and camera moves better than anything else available without payment in 2026. Most importantly, the free tier is unusually generous: a daily credit refresh rather than a one-time pool.&lt;/p&gt;

&lt;h3&gt;
  
  
  What the Free Tier Actually Includes
&lt;/h3&gt;

&lt;p&gt;As of April 2026, signing up for Kling with an email gives you 166 credits per day. Each generation costs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Standard text-to-video, 5s, 720p:&lt;/strong&gt; 10 credits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Standard text-to-video, 5s, 1080p:&lt;/strong&gt; 20 credits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Standard image-to-video, 5s, 1080p:&lt;/strong&gt; 20 credits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pro mode (higher quality, 10s):&lt;/strong&gt; 35 credits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lip sync, motion brush, camera control:&lt;/strong&gt; usually +5 to +10 credits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That works out to roughly 6-8 standard 1080p clips per day at no cost, or 3-4 longer Pro clips. The credits don’t roll over, so you have to use them or lose them — but the daily refresh is what makes Kling viable as a long-term free tool rather than a brief trial.&lt;/p&gt;
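&lt;p&gt;The credit arithmetic is worth sanity-checking before you plan a generation day. A throwaway sketch using the April 2026 figures quoted above (these are this article’s numbers, not an official Kling API):&lt;/p&gt;

```python
# Per-generation credit costs quoted in the list above.
COSTS = {"std_720p": 10, "std_1080p": 20, "pro_10s": 35}
DAILY_CREDITS = 166

def clips_per_day(kind):
    """How many clips of one kind fit in a day's free credits."""
    return DAILY_CREDITS // COSTS[kind]

print(clips_per_day("std_1080p"), clips_per_day("pro_10s"))  # 8 4
```

&lt;p&gt;Integer division, because partial clips do not exist; leftover credits simply expire at the daily refresh.&lt;/p&gt;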

&lt;h3&gt;
  
  
  The Standard vs Pro Difference
&lt;/h3&gt;

&lt;p&gt;Kling ships two underlying models. &lt;strong&gt;Standard&lt;/strong&gt; is fast (about 60 seconds per generation) and handles most prompts well. &lt;strong&gt;Pro&lt;/strong&gt; takes longer (3-5 minutes), produces noticeably better motion coherence, and supports the longer 10-second outputs. For text-to-video without a reference image, Pro is worth the credit hit; for image-to-video starting from a strong reference still, Standard is usually fine.&lt;/p&gt;

&lt;h3&gt;
  
  
  A First Generation
&lt;/h3&gt;

&lt;p&gt;The web UI is intentionally simple. Sign in with Google or email, pick text-to-video or image-to-video, type a prompt or upload an image, set duration and resolution, hit Generate. A queue position appears, the clip arrives in your library when ready, and you can download as MP4.&lt;/p&gt;

&lt;p&gt;The single most important Kling-specific tip: &lt;strong&gt;prompts work best when written like a film shot description, not like a Midjourney prompt&lt;/strong&gt;. Compare:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bad: &lt;em&gt;“a cat, cyberpunk, neon, 4k, detailed, cinematic, high quality”&lt;/em&gt; — Kling treats the modifiers as scene elements and produces a confused frame.&lt;/li&gt;
&lt;li&gt;Good: &lt;em&gt;“Wide shot of a black cat walking slowly through a rainy Tokyo alley at night, neon signs reflected in puddles, slight steam rising from grates, camera tracking right at hip height.”&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The good prompt produces something that looks like a real cinematographer made a deliberate choice. The bad prompt produces a beautifully lit cat that doesn’t move convincingly. Tag-spam works for image generators; Kling rewards sentences.&lt;/p&gt;

&lt;h3&gt;
  
  
  Image-to-Video Is Where Kling Shines
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpbtbl73ix628wvqlyb3v.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpbtbl73ix628wvqlyb3v.jpg" alt="Kling templates page showing pre-built image-to-video starter prompts" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Kling ships templated image-to-video recipes — the fastest way to evaluate the model on the free tier.&lt;/p&gt;

&lt;p&gt;If you upload a still image and write a short motion prompt, Kling produces output that’s substantially better than its text-to-video. The reasoning is structural: the model only has to invent motion, not the entire visual world. Workflow I use weekly:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Generate a hero still in Midjourney, Imagen, or Flux. Iterate until the image is exactly what I want.&lt;/li&gt;
&lt;li&gt;Upload that still to Kling, image-to-video mode, 1080p, 5s.&lt;/li&gt;
&lt;li&gt;Prompt with motion only: &lt;em&gt;“Camera slowly pushes in on the subject. Hair moves gently in the wind. Background trees sway.”&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Generate two or three takes (Kling is non-deterministic), pick the best one.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This pipeline costs 40-60 credits and produces output you’d otherwise pay a stock-video site $40 for. It’s the single highest-leverage use of Kling’s free tier.&lt;/p&gt;

&lt;h3&gt;
  
  
  Camera Controls and Motion Brush
&lt;/h3&gt;

&lt;p&gt;Kling’s camera control panel lets you specify pan, tilt, zoom, and orbit moves explicitly rather than hoping the prompt conveys them. Motion brush lets you mask part of the input image and tell the model “move this region in this direction.” Both features cost extra credits but eliminate most of the “the AI didn’t understand what I wanted to move” problem that plagued earlier video generators.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where Kling Falls Short
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Faces drift over longer clips.&lt;/strong&gt; A 10-second Pro clip with a clear human face will sometimes shift facial features halfway through. Workaround: keep clips at 5 seconds and stitch in DaVinci Resolve.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Text in scenes is unreadable.&lt;/strong&gt; Like every video model in 2026, signs and on-screen text are gibberish. Generate clean plates and overlay real text in post.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The free tier UI is in Mandarin by default for some signup regions.&lt;/strong&gt; The English toggle is in the top right, and even before you find it the visual layout makes the Mandarin UI easy to navigate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Daily credits don’t accumulate.&lt;/strong&gt; If you don’t log in for a week, you don’t have 1,162 credits waiting — you have 166. Plan your generation days.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  2. Pika: Scene-Level Direction and Motion Brushes
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqsw0yt0xejrvm38svc2m.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqsw0yt0xejrvm38svc2m.jpg" alt="Pika homepage showing the Pikaformances feature preview alongside the Google/Facebook/Discord/Email sign-in card" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Pika gates everything behind a free account — the modal you see is unavoidable, but signup itself is genuinely free.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://pika.art" rel="noopener noreferrer"&gt;Pika&lt;/a&gt; is the second tool I keep installed. Where Kling is best at “generate me a cinematic shot,” Pika is best at “take this clip and modify it with surgical precision.” It’s the closest thing in the free AI video space to a non-linear editor where the operations are AI primitives rather than transitions.&lt;/p&gt;

&lt;h3&gt;
  
  
  What the Free Tier Actually Includes
&lt;/h3&gt;

&lt;p&gt;Pika’s free tier in April 2026 gives you 250 credits at signup, with no automatic daily refresh — you earn small amounts of additional credits by participating in their Discord challenges or referring users. Each generation costs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pika 2.2 text-to-video, 5s, 1080p:&lt;/strong&gt; 30 credits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image-to-video, 5s, 1080p:&lt;/strong&gt; 30 credits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pikaframes (frame-to-frame interpolation):&lt;/strong&gt; 35 credits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pikaffects (specific transformation effects):&lt;/strong&gt; 25-50 credits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lip sync to audio:&lt;/strong&gt; 30 credits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s roughly 8-10 generations from your initial pool. After that you’re either paying $10/month for the Standard plan (700 credits/mo) or hunting for community credit drops. The free tier is best understood as a generous trial rather than a sustainable daily tool — the opposite shape from Kling.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Pika Is Worth a Slot Anyway
&lt;/h3&gt;

&lt;p&gt;Pika ships features the others don’t. Specifically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pikaffects&lt;/strong&gt; — pre-built transformation primitives. “Inflate” makes the subject puff up, “explode” replaces them with a particle burst, “melt” liquefies them, “crush” smashes them. They’re designed for short-form social video and they look great. No competitor offers this set as one-click effects.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pikaframes&lt;/strong&gt; — give it a starting image and an ending image, get a smooth video between them. Useful for product shots (“from box to assembled”), morphs, and storyboard-to-video.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lip sync&lt;/strong&gt; — upload a video of a person and an audio file, Pika rewrites the mouth to match the new audio. Quality is the best of the free tools I tested for this specific task.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Modify region&lt;/strong&gt; — paint a mask on a frame, prompt the change (“make the shirt red”), Pika regenerates only that region across the clip.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these are headline “generate cinematic video from scratch” features, but together they make Pika the right tool for editing AI video the rest of the way.&lt;/p&gt;

&lt;h3&gt;
  
  
  A Realistic Workflow
&lt;/h3&gt;

&lt;p&gt;The shape of the work I get done with Pika in a week:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Generate a base clip in Kling (uses Kling’s free daily credits).&lt;/li&gt;
&lt;li&gt;Bring it into Pika to apply a Pikaffect or run lip sync against a voiceover I generated in ElevenLabs or Coqui.&lt;/li&gt;
&lt;li&gt;Export and assemble in DaVinci Resolve (also free).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That pipeline produces social-media-ready short-form video without paying for any single tool. Pika’s free credits are limiting if it’s your only generator, but they go a long way when used surgically on top of another one.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where Pika Falls Short
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Initial credit pool runs out fast.&lt;/strong&gt; 250 credits sounds like a lot until you realize a single generation is 30. After your first day of experimentation, expect to be on a slower drip.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No public API on the free tier.&lt;/strong&gt; Pika has an API, but it’s invite-only and paid. Automating the free tier means driving the web UI with a browser-automation tool.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pikaffects are visually distinctive — to a fault.&lt;/strong&gt; If your audience watches a lot of TikTok they’ve seen the inflate/melt/explode effects on a hundred other accounts. Use sparingly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-form text prompts get truncated.&lt;/strong&gt; Keep your prompts under ~40 words for best results.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  3. HeyGen: AI Avatars That Read Your Script
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj55czqx6rqdmqy3ooeus.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj55czqx6rqdmqy3ooeus.jpg" alt="HeyGen templates page with the Transform any idea into a compelling video hero and the live AI Agent prompt UI" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;HeyGen's AI Agent landing — type a prompt, set duration and aspect, and the avatar pipeline kicks off.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.heygen.com" rel="noopener noreferrer"&gt;HeyGen&lt;/a&gt; solves a completely different problem from the other two. Where Kling and Pika generate cinematic or stylized video, HeyGen generates a realistic-looking person speaking words you typed. It’s the tool you reach for when you want a presenter for a tutorial, a marketing video, an e-learning module, or any context where someone needs to look at a camera and explain something.&lt;/p&gt;

&lt;h3&gt;
  
  
  What the Free Tier Actually Includes
&lt;/h3&gt;

&lt;p&gt;The HeyGen free tier in April 2026 gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;3 minutes of video per month&lt;/strong&gt; across all your generations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Access to ~100 stock avatars&lt;/strong&gt; (real people who licensed their likeness)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;~300 voices in 40+ languages&lt;/strong&gt; via the built-in TTS&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;720p export with a HeyGen watermark&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Up to 1-minute video length per generation&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Three minutes a month sounds tight, and it is — but most use cases are 60-90 second explainer videos, so you’re realistically looking at two or three videos per month before you’d need to upgrade. For a side project or a single-person business, that’s often enough.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Killer Feature: Custom Voice Clone
&lt;/h3&gt;

&lt;p&gt;HeyGen’s standout free feature is &lt;strong&gt;Instant Voice Clone&lt;/strong&gt; — upload a 30-second clip of someone speaking (yours, or someone else’s with their permission) and HeyGen creates a TTS voice that sounds like them. You can then use that voice on any avatar in the platform. Free tier limits you to one voice clone, but the quality is genuinely good in English and the major European languages, and decent in Mandarin and Japanese.&lt;/p&gt;

&lt;p&gt;The two-step workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Record yourself reading the HeyGen onboarding paragraph at a normal speaking pace. Upload it.&lt;/li&gt;
&lt;li&gt;Wait ~5 minutes. Pick the new voice from the voice dropdown when generating any video.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Combined with the free avatar library, this gets you a presenter who looks like a paid actor and sounds like you. There’s an obvious ethical line here — only clone your own voice or one you have explicit permission for — but the technical capability is there in the free tier.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Avatar Selection
&lt;/h3&gt;

&lt;p&gt;The 100 free stock avatars cover a wide range of ages, ethnicities, and presentation styles: business-casual person at a desk, casual person against a neutral background, news-anchor framing, etc. They’re filmed people who licensed their image, not generated faces, which means they look genuinely human and don’t fall into the uncanny valley that pure-AI avatars do. Premium tiers unlock more avatars and the ability to create your own custom avatar from a video upload, but the free pool is varied enough for most general-purpose work.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Generation Workflow
&lt;/h3&gt;

&lt;p&gt;HeyGen feels like a slide editor more than a video generator. You add scenes, each scene has a background (color, image, or stock video), an avatar, and a script. You type the script, pick the voice, and generate. The avatar reads the script with synced lip movement, natural-looking head turns, and basic gestures. Total turnaround for a 60-second video is usually 2-3 minutes.&lt;/p&gt;

&lt;p&gt;The most underrated feature: &lt;strong&gt;HeyGen translates and dubs in one click&lt;/strong&gt;. Generate an English video, then use the Translate option to produce a Spanish, French, German, or Mandarin version with the same avatar lip-syncing the new language. Useful for any creator targeting multiple markets without recording multiple takes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where HeyGen Falls Short
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The watermark on the free tier is visible.&lt;/strong&gt; It’s a “Made with HeyGen” badge in the corner. Not subtle. If you’re publishing professionally you’ll want the $24/month Creator plan to remove it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avatars are static-camera talking heads.&lt;/strong&gt; No walking around, no scene changes within the avatar shot, no full-body shots. If you want a presenter doing things, you’re back to filming a real person.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;3 minutes/month adds up fast if you iterate.&lt;/strong&gt; Every generation counts against the quota, including ones you discard. Get the script right in a text editor before generating.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Voice clone needs clean audio.&lt;/strong&gt; A 30-second clip with background noise produces a noisy clone. Record in a quiet room with a decent USB mic.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Side-by-Side Comparison
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftdxstld89ylf5muatphv.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftdxstld89ylf5muatphv.jpg" alt="Comparison table: Kling vs Pika vs HeyGen across free quota, generation time, clip length, resolution, image-to-video, voice clone, watermark, and best use case" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Where each free tier actually lands across the metrics that matter.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Kling&lt;/th&gt;
&lt;th&gt;Pika&lt;/th&gt;
&lt;th&gt;HeyGen&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Primary use case&lt;/td&gt;
&lt;td&gt;Cinematic clips&lt;/td&gt;
&lt;td&gt;Scene editing &amp;amp; effects&lt;/td&gt;
&lt;td&gt;Talking-head avatars&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Text-to-video&lt;/td&gt;
&lt;td&gt;Yes (best of the three)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No (script-to-avatar only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Image-to-video&lt;/td&gt;
&lt;td&gt;Yes (best in class)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Free tier model&lt;/td&gt;
&lt;td&gt;~166 credits/day refresh&lt;/td&gt;
&lt;td&gt;250 credits at signup&lt;/td&gt;
&lt;td&gt;3 minutes/month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Free output resolution&lt;/td&gt;
&lt;td&gt;1080p&lt;/td&gt;
&lt;td&gt;1080p&lt;/td&gt;
&lt;td&gt;720p&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Free output watermark&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max clip length (free)&lt;/td&gt;
&lt;td&gt;10s (Pro) / 5s (Standard)&lt;/td&gt;
&lt;td&gt;5s&lt;/td&gt;
&lt;td&gt;60s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lip sync to audio&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Yes (good)&lt;/td&gt;
&lt;td&gt;Yes (built into avatars)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Camera control&lt;/td&gt;
&lt;td&gt;Yes (explicit panel)&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Motion brush&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes (Modify Region)&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Voice cloning&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes (1 voice on free)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Translation/dubbing&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Public API&lt;/td&gt;
&lt;td&gt;Yes (paid)&lt;/td&gt;
&lt;td&gt;Invite-only (paid)&lt;/td&gt;
&lt;td&gt;Yes (paid tier)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best for&lt;/td&gt;
&lt;td&gt;B-roll, hero shots&lt;/td&gt;
&lt;td&gt;Effects, lip-sync, edits&lt;/td&gt;
&lt;td&gt;Tutorials, training, marketing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  How to Pick — A Decision Tree
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx1l8thp77sz8y2gk1fw7.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx1l8thp77sz8y2gk1fw7.jpg" alt="Decision tree mapping video need to recommended tool: cinematic shot to Kling, social clip to Pika, avatar to HeyGen, all-of-above to the combined pipeline" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you can answer one question — what kind of video — you don't need to read the rest of the comparison.&lt;/p&gt;

&lt;p&gt;Most of the time the choice falls out of one question: &lt;strong&gt;what does the final video need to look like?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cinematic establishing shots, B-roll, or any “make me a beautiful 5-second video” task&lt;/strong&gt; → Kling. The daily credit refresh means you can iterate without blowing through a fixed pool, and image-to-video on a strong reference still consistently produces the best output of the three.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Effects, lip sync to a voiceover, or modifying an existing clip&lt;/strong&gt; → Pika. The Pikaffects library is unique, the lip sync quality is the best of the three for re-dubbing footage you didn’t generate, and the modify-region feature is the only way to do localized edits across an AI-generated clip in any free tool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;An explainer video, tutorial, marketing pitch, or anything where someone needs to talk to camera&lt;/strong&gt; → HeyGen. The avatar quality is genuinely good, the voice clone makes it personal, and the one-click translation lets you reach non-English audiences from a single English script.&lt;/p&gt;

&lt;p&gt;The combination I use most is Kling + HeyGen — Kling for the visuals, HeyGen for any spoken intro or outro by a presenter avatar. Pika comes in when I need a specific Pikaffect or a precise edit Kling can’t make.&lt;/p&gt;

&lt;h2&gt;
  
  
  Combining All Three: A Free Short-Form Video Pipeline
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fec800on3h8nxza9rwq39.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fec800on3h8nxza9rwq39.jpg" alt="Pipeline diagram: a static image flows into Kling (image-to-video), then Pika (restyle); the same image also goes to HeyGen (avatar voiceover); both branches merge in CapCut for the final short" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The free-tier-only pipeline I actually use to produce a 30-second explainer in under five minutes of work.&lt;/p&gt;

&lt;p&gt;The pipeline I built in early 2026 to produce one short-form video per day with zero spend:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Script in any LLM.&lt;/strong&gt; A short 60-second script with a hook, three beats, and a call to action. Claude or DeepSeek for free.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Voiceover in ElevenLabs free tier or Coqui.&lt;/strong&gt; 10,000 characters/month free in ElevenLabs is enough for ~10 short scripts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hero still in Flux Schnell or Imagen 3 free.&lt;/strong&gt; One image that captures the visual concept of the video.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cinematic clip from the still in Kling.&lt;/strong&gt; Image-to-video, 1080p, 5s. Repeat 3-4 times for the different beats of the script.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lip-synced presenter intro in HeyGen.&lt;/strong&gt; 10-15 second avatar talking-head intro using the cloned voice.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edit and assemble in DaVinci Resolve free.&lt;/strong&gt; Trim, color-grade, add captions (which DaVinci’s built-in transcription generates), export to 9:16 for vertical platforms.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Daily cost: $0. Hands-on time: ~30 minutes per video once the workflow is dialed in. The output quality is high enough that most viewers can’t tell this pipeline’s results from a small studio’s work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using Kling, Pika, and HeyGen with OpenClaw
&lt;/h2&gt;

&lt;p&gt;If you’re orchestrating media generation through &lt;a href="https://openclaw.ai" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; agents — which is increasingly the right move for batch content production — the three tools fit different parts of the agent’s toolkit. None of them have a fully open free API, but two have paid APIs that an agent can call when scaled, and the web UIs can be driven via browser automation when the volume is small.&lt;/p&gt;

&lt;p&gt;The pattern I’ve found works:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Agent generates a script and a still-frame prompt&lt;/strong&gt; using a free LLM API like &lt;a href="https://dev.to/p/deepseek-api-free-r1-v3-models"&gt;DeepSeek&lt;/a&gt; or &lt;a href="https://dev.to/p/groq-fastest-free-ai-api-2026"&gt;Groq&lt;/a&gt;. Both give you enough free quota for hundreds of script generations per day.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent calls an image generator&lt;/strong&gt; (Flux, Imagen 3 via the Gemini free tier) for the hero still.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Browser automation step submits the still to Kling&lt;/strong&gt; in image-to-video mode, polls for completion, downloads the MP4. This is the part where, until Kling opens a free API, you’re using Playwright or similar.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent uses HeyGen’s API for the talking-head intro.&lt;/strong&gt; HeyGen’s API is paid but inexpensive — about $0.04 per second of video on the lowest tier — and well-suited to programmatic use. For pure-free workflows you can drive HeyGen’s web UI with browser automation too.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Final assembly&lt;/strong&gt; happens in FFmpeg via the agent’s shell tool. Concat clips, overlay captions, output the final file.&lt;/li&gt;
&lt;/ol&gt;
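
&lt;p&gt;The final-assembly step above can be sketched in plain shell. This is a minimal, hypothetical example — it assumes the agent has already downloaded a HeyGen intro and three Kling clips as MP4s with matching codecs, plus a &lt;code&gt;captions.srt&lt;/code&gt; file; all filenames are illustrative:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Build FFmpeg's concat list: intro first, then the Kling beats.
printf "file '%s'\n" intro.mp4 beat1.mp4 beat2.mp4 beat3.mp4 > concat.txt

if command -v ffmpeg >/dev/null; then
  if [ -f intro.mp4 ]; then
    # Stream-copy concat works when every clip shares codec and resolution;
    # otherwise drop -c copy and let FFmpeg re-encode.
    ffmpeg -y -f concat -safe 0 -i concat.txt -c copy merged.mp4

    # Burn in captions and fit to 9:16 for vertical platforms.
    ffmpeg -y -i merged.mp4 \
      -vf "subtitles=captions.srt,scale=1080:1920:force_original_aspect_ratio=increase,crop=1080:1920" \
      final_vertical.mp4
  fi
fi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Re-encoding (dropping &lt;code&gt;-c copy&lt;/code&gt;) is the safe default when Kling and HeyGen output differ in resolution or frame rate.&lt;/p&gt;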

&lt;p&gt;The advantage of orchestrating through OpenClaw rather than running each tool by hand is that the agent can iterate on rejected outputs. If a Kling generation comes back with the wrong subject framing, the agent retries with a refined prompt. If the HeyGen avatar’s voiceover trips on a technical word, the agent rewrites the script using the speak-friendly equivalent. This is exactly the kind of multi-step, failure-tolerant workflow that AI agents handle better than rigid scripts — and the free tiers make experimentation cheap.&lt;/p&gt;

&lt;p&gt;For more on building agent workflows that call third-party tools, see our walkthrough of &lt;a href="https://dev.to/p/mcp-model-context-protocol-connect-ai-agents"&gt;MCP for connecting AI agents to any tool or API&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Honest Limitations of Free AI Video in 2026
&lt;/h2&gt;

&lt;p&gt;Three things to keep in mind before betting a real production schedule on free AI video:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Daily credit caps mean you can’t burst.&lt;/strong&gt; If a project needs 30 cinematic clips by Friday, the free Kling tier won’t get you there in time — you’d need 5+ days at the daily refresh rate. Plan accordingly or pay for a one-month bump.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output quality is non-deterministic.&lt;/strong&gt; Even the best prompt produces a dud one time in three or four. Budget for regeneration credits.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Faces and hands remain the weak point.&lt;/strong&gt; All three tools handle faces well in close-ups but struggle with subtle facial drift over longer clips. For anything where a viewer will scrutinize a face, Kling’s image-to-video on a strong portrait still is your best chance, and short clips (5s, not 10s) are safer than long ones.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Terms of service vary.&lt;/strong&gt; Kling and Pika both allow free-tier output to be used commercially as of April 2026, but check before publishing — the Chinese-origin tools in particular have updated their commercial-use clauses repeatedly. HeyGen’s free tier output is technically commercial-use-allowed but the watermark makes it impractical for paid client work.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What’s Coming in the Rest of 2026
&lt;/h2&gt;

&lt;p&gt;Three things to watch:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI Sora consumer tier.&lt;/strong&gt; Sora has been API-only and expensive; rumors of a free tier inside ChatGPT Plus could shake up this list overnight.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open-source video models catching up.&lt;/strong&gt; Hunyuan Video, Mochi 1, and CogVideoX are usable open-weight models in 2026 — none yet matches Kling when run on a consumer GPU, but they’re closing the gap fast and let you run unlimited free generation on hardware you already own.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HeyGen-style avatar generators going lower-cost.&lt;/strong&gt; D-ID’s free tier vanished, but new entrants like Hedra and Synthesia’s stripped-down “Studio Free” launched in early 2026 are trying to undercut HeyGen. Worth watching.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This list is current as of April 2026. Free tiers in this space change quarterly — what’s free this week may not be free next week. The pattern of three tools (one cinematic generator, one editor, one talking-head) will outlast any specific provider, even when the names change.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Reads
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/postman-alternatives/" rel="noopener noreferrer"&gt;Postman Alternatives in 2026: Bruno and Hoppscotch — Free, Open-Source API Clients That Don’t Force a Login&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/free-temp-email-services/" rel="noopener noreferrer"&gt;Free Temporary Email Services in 2026: 9 Best Disposable Email Tools for Developer Testing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/herosms-review/" rel="noopener noreferrer"&gt;HeroSMS Review: Receive SMS Verification Codes from 180+ Countries&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/cohere-rag-api/" rel="noopener noreferrer"&gt;Cohere Free API: The Best Free Embedding and Rerank API for RAG in 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/render-hosting-review/" rel="noopener noreferrer"&gt;Render Free Hosting Review 2026: Deploy Web Apps, Databases, and Cron Jobs for Free&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Final Verdict
&lt;/h2&gt;

&lt;p&gt;If you’re going to use one of the three:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use &lt;a href="https://klingai.com" rel="noopener noreferrer"&gt;Kling&lt;/a&gt;&lt;/strong&gt; if you need cinematic clips and want the most generous, sustainable free tier. Daily credit refresh and 1080p output make it the best free general-purpose AI video tool in 2026.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use &lt;a href="https://pika.art" rel="noopener noreferrer"&gt;Pika&lt;/a&gt;&lt;/strong&gt; if you’re editing or transforming existing clips, lip-syncing voiceovers, or applying social-friendly effects. Limited free credits but unique features.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use &lt;a href="https://www.heygen.com" rel="noopener noreferrer"&gt;HeyGen&lt;/a&gt;&lt;/strong&gt; if you need a talking-head presenter for tutorials, marketing, or training. Voice clone and one-click translation are killer features inside the free 3 minutes/month.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want the full pipeline — and you’re willing to invest 30 minutes a day learning the tools — chain all three together. The output rivals what stock-video subscriptions and small studios charge hundreds of dollars per month for, and the cost is zero. That equation didn’t exist a year ago and probably won’t last forever, so it’s worth using while it’s there.&lt;/p&gt;

&lt;p&gt;For more free AI tools that pair well with this video pipeline, see our roundup of &lt;a href="https://dev.to/p/10-best-free-ai-apis-2026-comparison"&gt;the 10 best free AI APIs in 2026&lt;/a&gt; and our guide to &lt;a href="https://dev.to/p/notebooklm-free-ai-research-tool"&gt;Google NotebookLM for free AI research&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://toolfreebie.com/kling-pika-heygen/" rel="noopener noreferrer"&gt;toolfreebie.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>tools</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Postman Alternatives in 2026: Bruno and Hoppscotch — Free, Open-Source API Clients That Don’t Force a Login</title>
      <dc:creator>toolfreebie</dc:creator>
      <pubDate>Sun, 03 May 2026 15:40:20 +0000</pubDate>
      <link>https://dev.to/build996/postman-alternatives-in-2026-bruno-and-hoppscotch-free-open-source-api-clients-that-dont-force-2ohf</link>
      <guid>https://dev.to/build996/postman-alternatives-in-2026-bruno-and-hoppscotch-free-open-source-api-clients-that-dont-force-2ohf</guid>
      <description>&lt;h2&gt;
  
  
  Why Postman Alternatives Suddenly Matter in 2026
&lt;/h2&gt;

&lt;p&gt;For the better part of a decade, Postman was the default API client. You installed it once, double-clicked the desktop icon, and started firing requests. It was free, it was good, and the company didn’t get in your way.&lt;/p&gt;

&lt;p&gt;That’s no longer true. Over the last few years Postman has steadily pushed users toward a cloud-first model: a forced login on launch, collections that sync to Postman’s servers whether you want them to or not, environments that live in the cloud by default, and a free tier that quietly trimmed the number of collection runs, monitors, and shared workspaces. The Scratch Pad — Postman’s local-only mode that never required an account — was deprecated in 2023 and removed not long after. By 2026 the desktop app is essentially a thin shell around the cloud.&lt;/p&gt;

&lt;p&gt;For an individual developer, the new shape of Postman has three problems. You can’t open the app without an internet connection on a fresh machine. You can’t put your collections into your project’s Git repo and review changes in pull requests. And you can’t really use it at work without thinking about whether your auth tokens or staging URLs are syncing to a third-party server.&lt;/p&gt;

&lt;p&gt;This article is a hands-on walk through the two free, open-source API clients I actually use in 2026: &lt;strong&gt;Bruno&lt;/strong&gt; and &lt;strong&gt;Hoppscotch&lt;/strong&gt;. Both are mature, both are fully Postman-shaped (the cURL-paste, the request tabs, the collections, the environments), and both deliberately fix what Postman broke. Below I cover what each one does best, where they overlap, where they diverge, and how to migrate a real Postman workspace into either of them in about ten minutes.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Wanted in a Postman Replacement
&lt;/h2&gt;

&lt;p&gt;Before getting to the tools themselves, here’s the rubric I used. If your priorities differ, your verdict may also differ.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No mandatory account.&lt;/strong&gt; Open the app, hit the API. That’s it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local-first storage.&lt;/strong&gt; My collections are files on my disk, not records in someone else’s database.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Git-friendly.&lt;/strong&gt; A request lives in a plain-text file I can diff, review, and commit alongside the code that calls it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open-source license.&lt;/strong&gt; If the company gets weird in 2027, I can fork it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CLI for CI.&lt;/strong&gt; I want to run the same collection in a build pipeline as I do at my desk.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reasonable scripting.&lt;/strong&gt; Pre-request and post-response hooks for token capture, signature generation, and cleanup.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Imports a Postman v2.1 collection without losing data.&lt;/strong&gt; Switching costs need to be near zero.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bruno and Hoppscotch both satisfy the list. They go about it differently — one is a desktop-first text-file engine, the other is a browser-first PWA — and that difference is most of what this article is about.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bruno: A Git-Native, Local-First API Client
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.usebruno.com" rel="noopener noreferrer"&gt;Bruno&lt;/a&gt; is a desktop API client that does one thing very deliberately: it stores every request as a plain-text file in a folder on your disk. There is no cloud account. There is no signup screen. There isn’t even a sync option. If you want to share a collection with a teammate, you commit it to your project’s Git repo, and they pull. The whole tool is built around that decision.&lt;/p&gt;

&lt;p&gt;It launched in late 2023 explicitly as a reaction to Postman’s cloud direction, and by 2026 it’s the option I reach for first whenever I’m building or testing an API that lives in a code repository.&lt;/p&gt;

&lt;h3&gt;
  
  
  Installing Bruno
&lt;/h3&gt;

&lt;p&gt;Bruno ships native installers for macOS, Windows, and Linux from the project site. There’s also a Homebrew package on macOS:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew &lt;span class="nb"&gt;install &lt;/span&gt;bruno
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And a Snap on Linux:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;snap &lt;span class="nb"&gt;install &lt;/span&gt;bruno
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On first launch Bruno opens to an empty workspace and asks you to either create a new collection or open an existing one. There is no signup screen and no telemetry consent dialog — the binary launches, the window appears, and you can fire your first request inside thirty seconds.&lt;/p&gt;

&lt;h3&gt;
  
  
  Your First Request
&lt;/h3&gt;

&lt;p&gt;Click &lt;strong&gt;New Collection&lt;/strong&gt;, give it a name, and pick a folder on disk. Bruno will create that folder and put a &lt;code&gt;bruno.json&lt;/code&gt; file in it; that file is the collection root. Inside the new collection, click &lt;strong&gt;New Request&lt;/strong&gt;, paste a URL, choose a method, and hit &lt;strong&gt;Send&lt;/strong&gt;. The response panel works exactly the way you’d expect: status code, headers, timing, body with JSON pretty-print.&lt;/p&gt;

&lt;p&gt;If you want to skip the form entirely and paste a cURL command, Bruno parses it on import — the same trick Postman has had for years.&lt;/p&gt;

&lt;h3&gt;
  
  
  The .bru File Format
&lt;/h3&gt;

&lt;p&gt;Here’s where Bruno earns its place on this list. Every request in your collection is a single text file with a &lt;code&gt;.bru&lt;/code&gt; extension. Open one in any editor and you’ll see something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;meta {
  name: Get current user
  type: http
  seq: 1
}

get {
  url: {{base_url}}/users/me
  body: none
  auth: bearer
}

auth:bearer {
  token: {{api_token}}
}

headers {
  Accept: application/json
}

tests {
  test("status is 200", function() {
    expect(res.getStatus()).to.equal(200);
  });
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s it. No JSON, no UUIDs, no import artifacts. The format is purpose-built to read well in a pull request review. When a teammate adds a new endpoint, the diff in your PR shows up as a new &lt;code&gt;.bru&lt;/code&gt; file. When someone changes an auth header, you see exactly which header on which request changed.&lt;/p&gt;

&lt;p&gt;Putting your &lt;code&gt;bruno/&lt;/code&gt; folder into the same repo as your application code is the intended workflow. The API client is now part of the codebase — it lives, breathes, and gets reviewed alongside the routes it tests.&lt;/p&gt;
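&lt;p&gt;As an illustrative layout (everything under &lt;code&gt;bruno/&lt;/code&gt; except &lt;code&gt;bruno.json&lt;/code&gt; is a naming convention, not a requirement):&lt;br&gt;
&lt;/p&gt;

```plaintext
your-project/
├── src/                       # application code
└── bruno/
    ├── bruno.json             # collection root, created by Bruno
    ├── environments/
    │   └── staging.bru
    └── users/
        └── get-current-user.bru   # one plain-text file per request
```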

&lt;h3&gt;
  
  
  Environments and Secrets
&lt;/h3&gt;

&lt;p&gt;Environments in Bruno are also files. They live in an &lt;code&gt;environments/&lt;/code&gt; subfolder of the collection, one &lt;code&gt;.bru&lt;/code&gt; file per environment. Switching between local, staging, and production is a single dropdown in the UI, and the file format is just a flat list of variables.&lt;/p&gt;

&lt;p&gt;Secrets get separate treatment. Bruno supports a &lt;code&gt;.env&lt;/code&gt;-like override file (&lt;code&gt;.env&lt;/code&gt; at the collection root) that’s automatically gitignored. The same variable name in &lt;code&gt;.env&lt;/code&gt; wins over the one in the committed environment file, so you can commit &lt;code&gt;API_BASE_URL&lt;/code&gt; safely while keeping &lt;code&gt;API_TOKEN&lt;/code&gt; out of version control entirely. This is the workflow Postman never quite supported cleanly.&lt;/p&gt;
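&lt;p&gt;As a sketch of how that split looks in practice (the variable names are hypothetical, the &lt;code&gt;#&lt;/code&gt; lines are annotations rather than file content, and the exact environment-file syntax can vary between Bruno versions, so check the docs for yours):&lt;br&gt;
&lt;/p&gt;

```plaintext
# environments/staging.bru  (committed to Git)
vars {
  base_url: https://staging.api.example.com
  api_token: 
}

# .env  (at the collection root, gitignored)
api_token=real-token-goes-here
```

&lt;p&gt;With the override in place, &lt;code&gt;{{api_token}}&lt;/code&gt; resolves to the local value while the committed file stays secret-free.&lt;/p&gt;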

&lt;h3&gt;
  
  
  Scripting with JavaScript
&lt;/h3&gt;

&lt;p&gt;Bruno uses a JavaScript scripting model close to Postman’s. Pre-request scripts run before the request fires, post-response scripts run after, and tests run against the response. You get &lt;code&gt;req&lt;/code&gt; and &lt;code&gt;res&lt;/code&gt; objects, plus a &lt;code&gt;bru&lt;/code&gt; object for variable manipulation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Post-response: capture an auth token for later requests&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getBody&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;access_token&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;bru&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setEnvVar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;api_token&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;access_token&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pre-request scripts are useful for HMAC signing, dynamic timestamps, or generating idempotency keys. Anything you’d write in Postman’s pre-request tab works here with minor renaming.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Bruno CLI
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;bru&lt;/code&gt; CLI runs collections from the terminal. Install it once globally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @usebruno/cli
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then point it at a collection folder and pick an environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;path/to/your/bruno-collection
bru run &lt;span class="nt"&gt;--env&lt;/span&gt; staging
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By default it runs every request and reports pass/fail. You can scope to a single folder, output JUnit XML for your CI dashboard, or stop on the first failure. That’s the same shape Newman gives you for Postman, except the input is the same plain-text files you’ve been editing all day.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where Bruno Falls Short
&lt;/h3&gt;

&lt;p&gt;Bruno is newer software, and it doesn’t hide it. The few rough edges I’ve hit:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GraphQL support exists but is less polished than Postman’s. Schema introspection works; the visual query builder is leaner.&lt;/li&gt;
&lt;li&gt;WebSocket and Server-Sent Events were added in 2024 and 2025 respectively — they work, but they’re not the headline feature.&lt;/li&gt;
&lt;li&gt;Team collaboration features beyond Git are intentionally minimal. There’s a paid &lt;em&gt;Bruno Golden Edition&lt;/em&gt; that adds AI-assisted scripting and visual diffing, but the core sync model is and probably will remain “use Git.”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these are dealbreakers; they’re just trade-offs that come with the local-first design.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hoppscotch: A Browser-Based Postman That Loads in 200ms
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://hoppscotch.io" rel="noopener noreferrer"&gt;Hoppscotch&lt;/a&gt; takes the opposite approach. Instead of a desktop app with files, it’s a Progressive Web App that runs entirely in your browser. Open &lt;code&gt;hoppscotch.io&lt;/code&gt; in any tab and you have a working API client in the time it takes the page to render — no install, no account, no download.&lt;/p&gt;

&lt;p&gt;It’s open-source under the MIT license, the source lives on GitHub, and you can self-host it on your own machine or your own infra in under five minutes. The hosted version at hoppscotch.io is free with no rate limits on the core REST/GraphQL/WebSocket clients.&lt;/p&gt;

&lt;h3&gt;
  
  
  Getting Started in 10 Seconds
&lt;/h3&gt;

&lt;p&gt;Open &lt;a href="https://hoppscotch.io" rel="noopener noreferrer"&gt;hoppscotch.io&lt;/a&gt; in your browser. The default tab is a REST request panel. Type a URL, pick a method, hit send. Done. That’s the whole onboarding.&lt;/p&gt;

&lt;p&gt;If you’d rather keep your data on your own machine, the same UI is available as an Electron desktop app and as a Docker image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; 3000:3000 hoppscotch/hoppscotch
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Visit &lt;code&gt;http://localhost:3000&lt;/code&gt; and you have a private instance with the same UX as the hosted version.&lt;/p&gt;

&lt;h3&gt;
  
  
  Workspaces, Collections, and Local Storage
&lt;/h3&gt;

&lt;p&gt;Without an account, Hoppscotch keeps your collections, environments, and history in browser local storage. That’s enough for solo development on a single machine. If you want sync across devices or share a collection with a teammate, you can either sign in with a free Hoppscotch account (Google, GitHub, email) or self-host the full backend, which adds a Postgres database and lets you run team workspaces with your own access controls.&lt;/p&gt;

&lt;p&gt;The free hosted tier in 2026 includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Unlimited personal requests, collections, and environments&lt;/li&gt;
&lt;li&gt;One personal workspace&lt;/li&gt;
&lt;li&gt;Up to two team members in a free team workspace&lt;/li&gt;
&lt;li&gt;The full REST, GraphQL, WebSocket, SSE, Socket.IO, and MQTT clients&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For most freelance and side-project work that’s plenty. For a real engineering team, self-hosting the open-source server gives you unlimited everything.&lt;/p&gt;

&lt;h3&gt;
  
  
  Protocols Beyond REST
&lt;/h3&gt;

&lt;p&gt;Hoppscotch’s standout feature is breadth. The same UI handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;REST&lt;/strong&gt; — the default, with full support for headers, body types, auth, and pre-request/test scripts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GraphQL&lt;/strong&gt; — schema introspection from a URL, a query editor with autocomplete, and variables/headers panels.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WebSocket&lt;/strong&gt; — connect, send, receive, with a live event log.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Server-Sent Events&lt;/strong&gt; — open a connection to an SSE endpoint and stream the events.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Socket.IO&lt;/strong&gt; — yes, with full Socket.IO protocol support, not just raw WebSocket.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MQTT&lt;/strong&gt; — connect to a broker, subscribe to topics, publish messages.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Postman has been adding these one by one over the years. Hoppscotch shipped them as a coherent set early, and the experience is consistent across all of them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scripting and Tests
&lt;/h3&gt;

&lt;p&gt;Hoppscotch uses a sandboxed JavaScript scripting model, similar in shape to Bruno’s and Postman’s. The objects are slightly different — &lt;code&gt;pw&lt;/code&gt; instead of &lt;code&gt;pm&lt;/code&gt; or &lt;code&gt;bru&lt;/code&gt; — but the patterns translate directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Test script&lt;/span&gt;
&lt;span class="nx"&gt;pw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Status is 200&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;pw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;pw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Response has a token&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;pw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBeType&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Capture for next request&lt;/span&gt;
&lt;span class="nx"&gt;pw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;auth_token&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pre-request scripts work the same way — set headers, generate signatures, mutate the request body before sending.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Hoppscotch CLI
&lt;/h3&gt;

&lt;p&gt;Hoppscotch ships &lt;code&gt;hopp&lt;/code&gt; for running collections in CI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @hoppscotch/cli
hopp &lt;span class="nb"&gt;test &lt;/span&gt;path/to/collection.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It’s lightweight, exits non-zero on failures, and prints a clean per-request summary. Same job as Newman or &lt;code&gt;bru run&lt;/code&gt;, with no surprises.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where Hoppscotch Falls Short
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The collection format is JSON, not the file-per-request layout Bruno uses. You can commit the JSON to Git, but the diffs are noisier than a &lt;code&gt;.bru&lt;/code&gt; file change.&lt;/li&gt;
&lt;li&gt;Browser-based means CORS will catch you off guard the first time you hit a localhost API. The fix is the Hoppscotch browser extension or the desktop app, both of which proxy past CORS.&lt;/li&gt;
&lt;li&gt;Heavy team workflows effectively require self-hosting. The hosted free tier is generous for individuals but designed to push real teams toward running their own server.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Bruno vs Hoppscotch: Side by Side
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Bruno&lt;/th&gt;
&lt;th&gt;Hoppscotch&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Primary form factor&lt;/td&gt;
&lt;td&gt;Desktop app&lt;/td&gt;
&lt;td&gt;Browser PWA + Electron + self-host&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage model&lt;/td&gt;
&lt;td&gt;Plain text files (.bru)&lt;/td&gt;
&lt;td&gt;Browser local storage / Postgres (self-host)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Account required&lt;/td&gt;
&lt;td&gt;Never&lt;/td&gt;
&lt;td&gt;Optional for sync&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;License&lt;/td&gt;
&lt;td&gt;MIT&lt;/td&gt;
&lt;td&gt;MIT (community edition)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Git-native&lt;/td&gt;
&lt;td&gt;Yes, by design&lt;/td&gt;
&lt;td&gt;Possible (commit JSON), less elegant&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Postman v2.1 import&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;cURL paste import&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;REST&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GraphQL&lt;/td&gt;
&lt;td&gt;Yes (basic)&lt;/td&gt;
&lt;td&gt;Yes (full introspection)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WebSocket&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SSE&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Socket.IO&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MQTT&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gRPC&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CLI runner&lt;/td&gt;
&lt;td&gt;&lt;code&gt;bru&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;hopp&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scripting&lt;/td&gt;
&lt;td&gt;JS (bru.* + req/res)&lt;/td&gt;
&lt;td&gt;JS (pw.*)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Self-host&lt;/td&gt;
&lt;td&gt;N/A (no server)&lt;/td&gt;
&lt;td&gt;Yes (Docker, one container)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best for&lt;/td&gt;
&lt;td&gt;Repo-coupled API testing&lt;/td&gt;
&lt;td&gt;Multi-protocol &amp;amp; quick browser use&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  How to Pick Between Them
&lt;/h2&gt;

&lt;p&gt;Most of the time the choice falls out of two questions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is this API client tied to a specific code repository?&lt;/strong&gt; If yes, pick Bruno. The whole point of Bruno is that the collection lives in &lt;code&gt;your-project/bruno/&lt;/code&gt; next to the code that defines the routes. New endpoints arrive in pull requests as new &lt;code&gt;.bru&lt;/code&gt; files. Reviewers can read them. The CI runs them. There is one source of truth, and it’s the same Git repo as the rest of your code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is this API client a personal scratch space across many APIs you don’t own?&lt;/strong&gt; If yes, pick Hoppscotch. You wake up, open a tab, debug a Stripe webhook, debug a Supabase RPC, debug your own staging server, and close the tab. There’s no project to commit to. The browser-resident model is exactly right.&lt;/p&gt;

&lt;p&gt;I personally use both. Bruno is what’s installed on my laptop next to the IDE — every active project has a &lt;code&gt;bruno/&lt;/code&gt; folder. Hoppscotch is bookmarked on my browser bar for one-off cURL replacements and for the protocols (Socket.IO, MQTT) Bruno doesn’t cover.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You Actually Lose vs Postman
&lt;/h2&gt;

&lt;p&gt;The honest part of this article. Postman has invested a decade of engineering hours, and not all of that lands cleanly in either alternative:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Mock servers.&lt;/strong&gt; Postman lets you spin up a hosted mock from any collection in two clicks. Bruno doesn’t have hosted infrastructure to do this. Hoppscotch has a “REST request inspector” but no full-featured mock. If you depend on Postman mocks, expect to replace them with &lt;a href="https://www.mock-server.com" rel="noopener noreferrer"&gt;MockServer&lt;/a&gt;, &lt;a href="https://wiremock.org" rel="noopener noreferrer"&gt;WireMock&lt;/a&gt;, or a tiny Express/Hono app you run yourself.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API monitors.&lt;/strong&gt; Scheduled, hosted runs of a collection. Postman charges for these now anyway, so the right replacement is a GitHub Actions cron job that runs &lt;code&gt;bru run&lt;/code&gt; or &lt;code&gt;hopp test&lt;/code&gt;. Free, transparent, lives next to your code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Visualizers.&lt;/strong&gt; Postman’s HTML/Chart visualizer for response data. Niche feature; both alternatives let you do JSON path inspection and that covers the common cases.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Public API documentation hosting.&lt;/strong&gt; Postman publishes a public docs page for any collection. Bruno has no equivalent; if you need this, generate OpenAPI from your code and host with Redocly or Scalar instead. Hoppscotch has a community “shared collections” feature but it’s not a docs platform.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Postman Flows / AI features.&lt;/strong&gt; Postman’s visual chaining of requests and the newer AI assist. Both alternatives have lighter scripting; rebuild flows by writing the chain in JS.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For 90% of REST API work — write request, send, inspect response, scripted assertion, run in CI — neither alternative loses anything compared to Postman. For the cloud-coupled features in the list above, the answer is usually a free standalone tool that does the one thing better, not an all-in-one replacement.&lt;/p&gt;

&lt;h2&gt;
  
  
  Migrating from Postman in About Ten Minutes
&lt;/h2&gt;

&lt;p&gt;Both Bruno and Hoppscotch import the Postman v2.1 collection JSON natively. Here’s the path I’ve used on a real Postman workspace with about 80 requests across four collections.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Export from Postman.&lt;/strong&gt; Right-click each collection → &lt;em&gt;Export&lt;/em&gt; → choose &lt;em&gt;Collection v2.1&lt;/em&gt;. You’ll get a JSON file per collection. Do the same for environments — Postman gives you a per-environment JSON.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Import into Bruno.&lt;/strong&gt; Open Bruno, click &lt;strong&gt;Import Collection&lt;/strong&gt;, choose &lt;em&gt;Postman Collection&lt;/em&gt;, point at the JSON. Bruno converts the JSON into a folder of &lt;code&gt;.bru&lt;/code&gt; files and asks where to save it. Pick a folder inside your project repo (&lt;code&gt;your-project/bruno/&lt;/code&gt; is the convention). Repeat for each collection. For environments, click &lt;strong&gt;Environments → Import&lt;/strong&gt; and feed in the Postman environment JSON; Bruno creates a &lt;code&gt;.bru&lt;/code&gt; file per environment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Import into Hoppscotch.&lt;/strong&gt; Open the Collections panel → kebab menu → &lt;em&gt;Import / Export&lt;/em&gt; → &lt;em&gt;Postman Collection&lt;/em&gt;. Same drill: pick the JSON, the collection appears in your sidebar. Environments import via the Environments panel.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit pre-request and test scripts.&lt;/strong&gt; The translation isn’t byte-perfect. Postman’s &lt;code&gt;pm.environment.set&lt;/code&gt; becomes &lt;code&gt;bru.setEnvVar&lt;/code&gt; in Bruno and &lt;code&gt;pw.env.set&lt;/code&gt; in Hoppscotch. Search-and-replace in your editor, or run a few requests and fix what breaks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Move secrets out of the committed environment file.&lt;/strong&gt; If you used Postman’s “secret” variable type, both alternatives have a similar concept (Bruno’s &lt;code&gt;.env&lt;/code&gt; override, Hoppscotch’s secret variables). Move tokens and keys there before you commit the collection.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wire it into CI.&lt;/strong&gt; A GitHub Actions step that runs &lt;code&gt;bru run --env staging&lt;/code&gt; or &lt;code&gt;hopp test collection.json&lt;/code&gt; against your staging deployment after every push gives you Postman monitor-equivalent coverage for $0.&lt;/li&gt;
&lt;/ol&gt;
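&lt;p&gt;Step 4 of the checklist above is mechanical enough to script. A minimal sketch of the rename pass for Bruno (only two mappings shown; &lt;code&gt;bru.getEnvVar&lt;/code&gt; as the getter is an assumption to verify against the Bruno docs, and the Hoppscotch pass is analogous with &lt;code&gt;pw.env.set&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

```javascript
// Migration sketch: translate common Postman script calls to Bruno's names.
// File walking is omitted; this shows only the string mapping.
function translatePostmanScript(src) {
  return src
    .replace(/pm\.environment\.set\(/g, "bru.setEnvVar(")
    .replace(/pm\.environment\.get\(/g, "bru.getEnvVar("); // assumption: getEnvVar is the getter
}

const before = 'pm.environment.set("token", body.token);';
console.log(translatePostmanScript(before));
// prints: bru.setEnvVar("token", body.token);
```

&lt;p&gt;Run a pass like this, then fire a few requests and fix whatever the sandbox still complains about.&lt;/p&gt;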

&lt;p&gt;Allow a real day for a large enterprise workspace; ten minutes covers a typical solo project.&lt;/p&gt;

&lt;h2&gt;
  
  
  Power-User Tips After Six Months on Each
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Bruno
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Add the Bruno collection folder to your editor workspace. Modern editors render &lt;code&gt;.bru&lt;/code&gt; as plain text, and editing a request directly in your IDE is faster than clicking through the GUI for batch changes.&lt;/li&gt;
&lt;li&gt;Use the collection-level pre-request script for things like base auth header construction. It runs before every request in the collection, so you don’t repeat yourself.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;seq&lt;/code&gt; field in the meta block controls request order in the sidebar — set it deliberately when you import from Postman, otherwise alphabetical ordering can scramble logical groupings.&lt;/li&gt;
&lt;li&gt;Bruno’s diff-on-PR story is the actual killer feature. Configure your repo so reviewers must approve changes to &lt;code&gt;bruno/&lt;/code&gt; the same way they approve changes to anything else.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Hoppscotch
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Install the Hoppscotch browser extension. It adds a proxy that sidesteps CORS for localhost APIs, which is otherwise the most painful part of using a browser-resident client.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;History&lt;/strong&gt; panel is a feature, not an afterthought. Every request you fire is logged locally; you can rerun, copy as cURL, or save into a collection without leaving the panel.&lt;/li&gt;
&lt;li&gt;Self-host with one Docker command if you’re using Hoppscotch with a team. The community edition is fully featured; you don’t need the hosted enterprise version unless you specifically want SAML or audit logs.&lt;/li&gt;
&lt;li&gt;The keyboard shortcut reference (press &lt;code&gt;?&lt;/code&gt;) inside Hoppscotch is dense and worth a one-time read. Most actions have a shortcut, and the app is clearly designed for keyboard-first use.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Using Bruno or Hoppscotch with OpenClaw
&lt;/h2&gt;

&lt;p&gt;If you’re using &lt;a href="https://openclaw.ai" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; to orchestrate AI agents that call APIs, an open-source local-first API client matters more than it does for normal API work. The agent’s tool definitions are usually thin wrappers around HTTP requests, and the way you discover and verify those requests is by hand-firing them in your API client first.&lt;/p&gt;

&lt;p&gt;The workflow I’ve settled on:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Spike a new agent integration by adding a Bruno request to the project’s &lt;code&gt;bruno/&lt;/code&gt; folder. Get the request right — auth, headers, body shape — until the response is what the agent will need.&lt;/li&gt;
&lt;li&gt;Translate that request into the agent’s tool schema. Because the &lt;code&gt;.bru&lt;/code&gt; file is plain text, I can paste the exact URL, headers, and body template into a Claude or Gemini chat and ask it to scaffold the OpenClaw tool definition.&lt;/li&gt;
&lt;li&gt;Keep the Bruno request in the repo as a permanent regression check. If the third-party API drifts, the Bruno test fails in CI before the agent silently breaks in production.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For ad-hoc poking around someone else’s API while building an agent — “what does Notion’s search endpoint actually return?” — Hoppscotch in a browser tab is the faster path because there’s no project to set up.&lt;/p&gt;

&lt;h2&gt;
  
  
  Should You Stay on Postman?
&lt;/h2&gt;

&lt;p&gt;There are a few cases where Postman is still the right answer in 2026:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your team already pays for the Postman Enterprise tier and uses governance, RBAC, audit logs, and the API governance rules engine. Replacing those with self-built tooling isn’t free.&lt;/li&gt;
&lt;li&gt;You depend on Postman Flows for visual workflow building and aren’t ready to express the same logic in code.&lt;/li&gt;
&lt;li&gt;Your stakeholders consume your APIs through Postman’s public documentation pages and you’re not in a position to switch documentation tools.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For everyone else — solo developer, small team, indie hacker, agency contractor — the open-source alternatives have caught up to and in places surpassed Postman’s free tier. There’s no longer a reason to accept a forced login and a cloud-by-default model on a tool you use dozens of times a day.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Reads
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/kling-pika-heygen/" rel="noopener noreferrer"&gt;Free AI Video Generators in 2026: Kling vs Pika vs HeyGen Compared&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/free-temp-email-services/" rel="noopener noreferrer"&gt;Free Temporary Email Services in 2026: 9 Best Disposable Email Tools for Developer Testing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/herosms-review/" rel="noopener noreferrer"&gt;HeroSMS Review: Receive SMS Verification Codes from 180+ Countries&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/cohere-rag-api/" rel="noopener noreferrer"&gt;Cohere Free API: The Best Free Embedding and Rerank API for RAG in 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/render-hosting-review/" rel="noopener noreferrer"&gt;Render Free Hosting Review 2026: Deploy Web Apps, Databases, and Cron Jobs for Free&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Final Verdict
&lt;/h2&gt;

&lt;p&gt;If you only adopt one, pick the one that matches how you work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use Bruno&lt;/strong&gt; if your API client lives next to a code repository and you want the request collection in Git, reviewable in pull requests, runnable in CI. The &lt;code&gt;.bru&lt;/code&gt; file format is the most underrated piece of API tooling shipped in the last three years.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use Hoppscotch&lt;/strong&gt; if your API client is a personal scratch pad across many third-party services and you want a tab you can open in any browser on any machine, with optional self-hosting when you’re ready for it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both are MIT-licensed, both are mature, both import Postman collections, both run in CI. Whichever you pick, the migration takes a fraction of a day, and the result is an API workflow that doesn’t depend on a vendor’s mood about their free tier.&lt;/p&gt;

&lt;p&gt;Get Bruno at &lt;a href="https://www.usebruno.com" rel="noopener noreferrer"&gt;usebruno.com&lt;/a&gt;, get Hoppscotch at &lt;a href="https://hoppscotch.io" rel="noopener noreferrer"&gt;hoppscotch.io&lt;/a&gt;. Both are still free, both still respect your local disk, and both will probably outlast Postman’s next pricing change.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://toolfreebie.com/postman-alternatives/" rel="noopener noreferrer"&gt;toolfreebie.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>tools</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Free Temporary Email Services in 2026: 9 Best Disposable Email Tools for Developer Testing</title>
      <dc:creator>toolfreebie</dc:creator>
      <pubDate>Sun, 03 May 2026 15:34:37 +0000</pubDate>
      <link>https://dev.to/build996/free-temporary-email-services-in-2026-9-best-disposable-email-tools-for-developer-testing-3p11</link>
      <guid>https://dev.to/build996/free-temporary-email-services-in-2026-9-best-disposable-email-tools-for-developer-testing-3p11</guid>
      <description>&lt;h2&gt;
  
  
  What Is a Temporary Email Service?
&lt;/h2&gt;

&lt;p&gt;A temporary email service hands you a working inbox you didn’t sign up for. You open the website, an address is generated for you on the spot, and any message sent to it lands in a public (or pseudo-public) inbox you can read in your browser. After ten minutes, an hour, or sometimes a few days, the address expires and the messages disappear with it.&lt;/p&gt;

&lt;p&gt;For developers, that throwaway inbox is more than a privacy tool — it’s a testing primitive. Every time you build a signup flow, a password reset link, an OTP-style verification, or a transactional email pipeline, you need somewhere to receive the mail without polluting your real Gmail account or burning through Mailgun sandbox quotas. Disposable email services solve this for free.&lt;/p&gt;

&lt;p&gt;This article walks through the nine free temp email tools I actually use in 2026, including which ones expose a clean HTTP API for end-to-end test automation. If you’ve ever written a Playwright test that has to pull a magic link out of a real inbox, or you just want a fast address to register on a forum you’ll never visit again, one of these will fit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Developers Need Disposable Email in 2026
&lt;/h2&gt;

&lt;p&gt;Temp email started life as a privacy convenience for anyone who didn’t want to give their real address to a sketchy site. That’s still a valid use case, but the developer angle has grown bigger every year:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;End-to-end signup testing.&lt;/strong&gt; Your QA pipeline registers a fresh user, clicks a confirmation link, and verifies the welcome email arrived. That requires a real, programmable inbox per run.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Magic-link and OTP flows.&lt;/strong&gt; Modern auth (Supabase Auth, Clerk, Auth0 passwordless) sends a one-time link or code. End-to-end coverage means actually fetching that email.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SaaS free-trial cycling.&lt;/strong&gt; When you’re benchmarking competitors, you need a clean inbox per trial without leaking your work email onto every marketing list.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Webhook and SMTP debugging.&lt;/strong&gt; A throwaway address is the simplest way to confirm your transactional template renders correctly across Gmail, Outlook, and Apple Mail forwarding paths.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avoiding spam during research.&lt;/strong&gt; Reading documentation, downloading whitepapers, or claiming free credits often gates the resource behind an email form. Temp email gets you past it without follow-up campaigns.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’re building any kind of AI agent or automation that has to register an account on the user’s behalf — for example, an OpenClaw workflow that signs into a third-party service to scrape pricing data — a programmatic temp email API becomes part of the agent’s toolkit. The agent generates an address, polls the inbox over HTTP, extracts the verification link, and continues. No human in the loop, no real inbox at risk.&lt;/p&gt;
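&lt;p&gt;The link-extraction step in that loop is the only fiddly part. A minimal sketch, assuming the helper below (the name is mine) receives the raw message body as HTML or plain text, and that filtering matches down to your own domain is left to the caller:&lt;/p&gt;

```python
import re

def extract_verification_link(body):
    """Pull the first http(s) URL out of an email body.

    Sketch only: real confirmation emails often put tracking links first,
    so production code should filter matches by its own domain.
    """
    match = re.search(r"https?://[^\s\"']+", body)
    return match.group(0) if match else None
```

&lt;p&gt;An agent calls this on each new message body until it gets a non-empty result, then follows the link.&lt;/p&gt;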

&lt;h2&gt;
  
  
  How I Picked These Nine Services
&lt;/h2&gt;

&lt;p&gt;There are dozens of temp email sites listed on Google, but most are ad-stuffed clones of the same three or four real providers. To make this list I tested for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;It actually works in 2026.&lt;/strong&gt; Many older domains are now blocked by SendGrid, Mailgun, and Postmark by default. I sent test mail from Resend, Brevo, and a personal Gmail to every address listed below in April 2026 — every one of them received it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No registration required for the basic flow.&lt;/strong&gt; If you have to sign up with another email to use the temp email service, that defeats the purpose.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sane UI or a documented API.&lt;/strong&gt; Either you can read the inbox in three clicks, or you can pull it from a script.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reasonable lifespan.&lt;/strong&gt; “Receives mail for ten seconds and dies” services are excluded.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Below, each tool gets a short description, who it’s best for, and (where it exists) the API surface developers care about.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Mail.tm — The Best Free Temp Email API
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://mail.tm" rel="noopener noreferrer"&gt;Mail.tm&lt;/a&gt; is, in my opinion, the best free disposable email provider for any kind of automated testing in 2026. The web UI is clean, but the killer feature is a fully documented JSON API that lets you create accounts, list messages, and fetch message bodies over HTTP without any scraping.&lt;/p&gt;

&lt;p&gt;You don’t need an API key. The flow is: hit &lt;code&gt;POST /accounts&lt;/code&gt; with a generated address and password, get a JWT, and use that JWT to read your own inbox. Each address persists for as long as you keep using it, and there’s no aggressive rate limiting on the free tier for normal test volumes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;secrets&lt;/span&gt;

&lt;span class="n"&gt;BASE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.mail.tm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Get a list of available domains
&lt;/span&gt;&lt;span class="n"&gt;domains&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;BASE&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/domains&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hydra:member&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;domain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;domains&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;domain&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Create an inbox
&lt;/span&gt;&lt;span class="n"&gt;local&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;secrets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;token_hex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;address&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;local&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;@&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;domain&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;password&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;secrets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;token_urlsafe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;BASE&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/accounts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;address&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;address&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;password&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Authenticate
&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;BASE&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/token&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;address&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;address&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;password&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;token&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# 4. Poll the inbox
&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;BASE&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That snippet is the entire integration. Drop it into a Playwright fixture and your tests get a fresh inbox per run, no shared state, no flakiness. For an AI agent, this is what you want — three endpoints, no JavaScript scraping required.&lt;/p&gt;
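&lt;p&gt;The one piece worth factoring out of that snippet is the wait: a confirmation email can take a few seconds to arrive, so step 4 becomes a retry loop in practice. A generic sketch, where &lt;code&gt;fetch&lt;/code&gt; and &lt;code&gt;predicate&lt;/code&gt; are placeholders you supply (for Mail.tm, &lt;code&gt;fetch&lt;/code&gt; would wrap the &lt;code&gt;GET /messages&lt;/code&gt; call above):&lt;/p&gt;

```python
import time

def poll_until(fetch, predicate, attempts=30, interval=1.0):
    """Retry fetch() until predicate(result) is truthy, then return result.

    Generic polling sketch: fetch and predicate are caller-supplied,
    e.g. fetch reads the inbox JSON and predicate checks that at least
    one message has arrived.
    """
    for _ in range(attempts):
        result = fetch()
        if predicate(result):
            return result
        time.sleep(interval)
    raise TimeoutError(f"no matching result after {attempts} attempts")
```

&lt;p&gt;Bounding the retries keeps a missing email from hanging the whole test run.&lt;/p&gt;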

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; automated end-to-end tests, AI agents that need to receive verification email, anyone who wants an inbox API without an API key.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Mailinator — The Public Inbox With a Pro API
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.mailinator.com" rel="noopener noreferrer"&gt;Mailinator&lt;/a&gt; is the oldest disposable email service still actively maintained, and it has the most developer-oriented design of the bunch. The free tier works on the public domain &lt;code&gt;@mailinator.com&lt;/code&gt;, where any inbox is shared — anyone who knows the username can read the mail. That sounds like a problem, but for QA it’s perfect: you generate a long random username and treat the inbox as a write-once log.&lt;/p&gt;

&lt;p&gt;The free public inbox is designed for the browser — type any name into the search box at mailinator.com and you’ll see whatever is currently in that inbox. The paid tier ($59/month) adds private domains and a clean REST API, but for most developer testing the free public flow is enough.&lt;/p&gt;

&lt;p&gt;If you want programmatic access on the free tier, you can scrape the public web inbox or use &lt;code&gt;https://www.mailinator.com/v4/public/inboxes.jsp?to=USERNAME&lt;/code&gt; as a starting point. For paid private use, the official API is documented at &lt;code&gt;api.mailinator.com&lt;/code&gt; and supports REST calls with an API token.&lt;/p&gt;
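&lt;p&gt;Since obscurity is the whole security model on the free tier, it is worth generating the username properly. A small sketch that builds a hard-to-guess address plus the public-inbox URL pattern above (the helper name is mine, not Mailinator’s):&lt;/p&gt;

```python
import secrets

def mailinator_public_inbox():
    """Return a random public Mailinator address and its web inbox URL.

    The inbox stays public regardless; 32 hex characters just keeps
    casual browsers from stumbling into your test mail.
    """
    username = secrets.token_hex(16)
    address = f"{username}@mailinator.com"
    inbox_url = f"https://www.mailinator.com/v4/public/inboxes.jsp?to={username}"
    return address, inbox_url
```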

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; teams that already pay for Mailinator, manual QA where username obscurity is enough, public-inbox-friendly automation.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Temp-Mail.io — Big Domain Pool, Browser-Friendly
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://temp-mail.io" rel="noopener noreferrer"&gt;Temp-Mail.io&lt;/a&gt; is the polished consumer-facing temp mail service. It rotates across dozens of domains, supports several mailbox lifetimes, and has Chrome and Firefox extensions if you want a one-click address from your browser bar.&lt;/p&gt;

&lt;p&gt;The killer feature for non-developers is how aggressively it handles the “I just need an inbox right now” flow: open the page, an address is already created, copy it, you’re done. There is also a private RapidAPI-hosted JSON API for paid use if you want to integrate it into automation, but the public web flow doesn’t expose a free API.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; manual signup testing, browser-driven research, anyone who hates filling forms with their personal email.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. 10 Minute Mail — The Classic Quick Hit
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://10minutemail.com" rel="noopener noreferrer"&gt;10 Minute Mail&lt;/a&gt; is the original “open the page, address ready, ten-minute timer” service, and it still works exactly as advertised in 2026. The address self-destructs after ten minutes (you can extend by another ten if you need a slow confirmation email to arrive).&lt;/p&gt;

&lt;p&gt;It has no API, no extension, no fancy features — just a single address and a single inbox view. That minimalism is actually the point. When I’m reading documentation that wants my email to grant access to a PDF, this is the service I open.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; one-off email gates, quickly grabbing a download link, never-coming-back-to-this-site signups.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Guerrilla Mail — Long-Running and Reliable
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.guerrillamail.com" rel="noopener noreferrer"&gt;Guerrilla Mail&lt;/a&gt; has been online since 2006 and remains one of the most stable temp email providers on the internet. It supports both receiving and (limited) sending, and the inbox keeps mail for one hour by default.&lt;/p&gt;

&lt;p&gt;It also exposes a documented &lt;a href="https://www.guerrillamail.com/GuerrillaMailAPI.html" rel="noopener noreferrer"&gt;JSON API&lt;/a&gt; that returns email lists and bodies as plain JSON over HTTP, with session tokens. The API is older than Mail.tm’s and a little quirkier — you set an email user via &lt;code&gt;set_email_user&lt;/code&gt;, then poll &lt;code&gt;get_email_list&lt;/code&gt; — but it works without registration and has handled my automation reliably for years.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Guerrilla Mail polling sketch
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;BASE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.guerrillamail.com/ajax.php&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Session&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;addr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BASE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_email_address&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Inbox:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email_addr&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Later, poll messages
&lt;/span&gt;&lt;span class="n"&gt;emails&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BASE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_email_list&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;offset&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;emails&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;list&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; developers who want a free programmatic inbox without account creation, longer-running test scenarios that may need an hour to complete.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. YOPmail — Predictable Inbox Names
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://yopmail.com" rel="noopener noreferrer"&gt;YOPmail&lt;/a&gt; works on a different model: every possible &lt;code&gt;@yopmail.com&lt;/code&gt; address already exists. You don’t generate one, you just type it. &lt;code&gt;foo@yopmail.com&lt;/code&gt; has an inbox right now. Mail sent to it will be visible to anyone who types &lt;code&gt;foo&lt;/code&gt; into yopmail.com.&lt;/p&gt;

&lt;p&gt;That’s both the convenience and the risk. For testing, the convenience wins: there’s no creation step, no tokens, no sessions. You just pick a unique-enough username (a UUID is fine) and use it. Inboxes hold mail for eight days, much longer than most rivals.&lt;/p&gt;
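&lt;p&gt;In a script, that pre-generation step is a one-liner. A sketch, assuming a UUID-derived local part is unique enough for your runs (the prefix is purely for readability in the YOPmail UI):&lt;/p&gt;

```python
import uuid

def yopmail_address(prefix="e2e"):
    """Pre-generate a YOPmail inbox. No creation call is needed: the
    address exists the moment mail is sent to it. The inbox is public,
    so the random suffix is what keeps strangers from guessing it.
    """
    return f"{prefix}-{uuid.uuid4().hex[:12]}@yopmail.com"
```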

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; manual testing where you just need predictable, easy-to-remember inboxes, or automation where pre-generating a UUID-style local-part is fine.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. EmailOnDeck — Slightly More Legit-Looking
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.emailondeck.com" rel="noopener noreferrer"&gt;EmailOnDeck&lt;/a&gt; positions itself slightly upmarket. The domains it uses look less obviously disposable, which matters when you’re testing a service that has aggressive temp-mail blocking. It has a small CAPTCHA on the free flow, which prevents pure automation but works fine for manual use.&lt;/p&gt;

&lt;p&gt;There’s a paid premium plan (around $5/month at the time of writing) that gives you longer-lived addresses and better domain rotation. For occasional manual testing, the free tier is enough.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; manual sign-ups for services that block obvious temp mail domains.&lt;/p&gt;

&lt;h2&gt;
  
  
  8. Mail7 — Built Specifically for QA
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.mail7.io" rel="noopener noreferrer"&gt;Mail7&lt;/a&gt; is one of the few temp email services explicitly built for software testing. Its homepage talks about Selenium and Cypress integration before it talks about privacy. It exposes a REST API that lets you create custom addresses (you control the local part), poll the inbox, and pull the full HTML/text body.&lt;/p&gt;

&lt;p&gt;The free plan is generous enough for most CI pipelines: a few hundred messages per day with full API access. Authentication is via a free API key created in their dashboard. Compared to Mail.tm, it’s more QA-shaped: you can name your test inboxes by feature (&lt;code&gt;checkout-test-1@...&lt;/code&gt;) and reuse them across runs.&lt;/p&gt;
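&lt;p&gt;Naming inboxes by feature is just string assembly, but standardizing it across the pipeline pays off. A hypothetical sketch, assuming test addresses live on the &lt;code&gt;mail7.io&lt;/code&gt; domain (verify the actual domain and API details in your Mail7 dashboard):&lt;/p&gt;

```python
def mail7_inbox(feature, run_id=""):
    """Compose a named, reusable test address per feature.

    Assumption: addresses use the mail7.io domain. Reuse the bare
    feature inbox across runs, or append a run_id to keep each CI
    run's messages separate.
    """
    local = f"{feature}-{run_id}" if run_id else feature
    return f"{local}@mail7.io"
```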

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; CI pipelines that want named, reusable test inboxes, QA teams already using Selenium or Cypress.&lt;/p&gt;

&lt;h2&gt;
  
  
  9. SimpleLogin (by Proton) — Real Aliases, Not Throwaway
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://simplelogin.io" rel="noopener noreferrer"&gt;SimpleLogin&lt;/a&gt; isn’t strictly a temp email service — it’s an email &lt;em&gt;aliasing&lt;/em&gt; service, now owned by Proton. But it belongs on this list because it solves the same problem with a more sustainable model: instead of throwaway inboxes, you create unlimited aliases that forward to your real inbox, and you can disable any alias the moment it starts receiving spam.&lt;/p&gt;

&lt;p&gt;The free tier gives you 10 aliases on the &lt;code&gt;simplelogin.com&lt;/code&gt; shared domain, no credit card. That’s enough for ten “this site looks shady” sign-ups before you need to upgrade. The paid tier ($30/year) lifts that to unlimited aliases on custom domains, which is genuinely useful if you’re tired of giving your work email to every SaaS trial.&lt;/p&gt;

&lt;p&gt;SimpleLogin won’t help with automated end-to-end testing — the whole point is forwarding to your real inbox — but it’s the closest thing to a long-term, ethical alternative to disposable email if you’re tired of typing throwaway addresses every time you read a whitepaper.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; long-term personal use, anyone moving away from giving out their primary email, Proton ecosystem users.&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparison Table: Which Temp Email Tool Should You Pick?
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;Free API&lt;/th&gt;
&lt;th&gt;Inbox Lifespan&lt;/th&gt;
&lt;th&gt;Best Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Mail.tm&lt;/td&gt;
&lt;td&gt;Yes — REST, no key&lt;/td&gt;
&lt;td&gt;Persistent (account-based)&lt;/td&gt;
&lt;td&gt;Test automation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mailinator&lt;/td&gt;
&lt;td&gt;Paid only on private inbox&lt;/td&gt;
&lt;td&gt;Public, ephemeral&lt;/td&gt;
&lt;td&gt;Public-inbox QA&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Temp-Mail.io&lt;/td&gt;
&lt;td&gt;Paid (RapidAPI)&lt;/td&gt;
&lt;td&gt;Variable&lt;/td&gt;
&lt;td&gt;Browser-driven manual use&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10 Minute Mail&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;10 minutes&lt;/td&gt;
&lt;td&gt;One-off email gates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Guerrilla Mail&lt;/td&gt;
&lt;td&gt;Yes — REST, no key&lt;/td&gt;
&lt;td&gt;1 hour&lt;/td&gt;
&lt;td&gt;Free programmatic use&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;YOPmail&lt;/td&gt;
&lt;td&gt;No (predictable URLs)&lt;/td&gt;
&lt;td&gt;8 days&lt;/td&gt;
&lt;td&gt;Predictable test inboxes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EmailOnDeck&lt;/td&gt;
&lt;td&gt;No (free) / Yes (paid)&lt;/td&gt;
&lt;td&gt;Variable&lt;/td&gt;
&lt;td&gt;Bypassing temp-mail blocks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mail7&lt;/td&gt;
&lt;td&gt;Yes — REST, free key&lt;/td&gt;
&lt;td&gt;Customizable&lt;/td&gt;
&lt;td&gt;QA pipelines (Selenium/Cypress)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SimpleLogin&lt;/td&gt;
&lt;td&gt;Yes — REST&lt;/td&gt;
&lt;td&gt;Permanent (alias-based)&lt;/td&gt;
&lt;td&gt;Long-term personal aliasing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  End-to-End Testing With Mail.tm and Playwright
&lt;/h2&gt;

&lt;p&gt;To make this concrete, here’s a full Playwright test that signs up for a real service, fetches the verification email from Mail.tm, clicks the link, and asserts the user lands on the dashboard. This is the kind of test most teams skip because writing it against Gmail is painful — but with a temp email API, it becomes routine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// signup.spec.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;expect&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@playwright/test&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;MAIL_BASE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://api.mail.tm&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;createInbox&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;domains&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;MAIL_BASE&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/domains`&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;domain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;domains&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;hydra:member&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;address&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`e2e-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;crypto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randomUUID&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;@&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;password&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;crypto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randomUUID&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;MAIL_BASE&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/accounts`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;address&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;password&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;token&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;MAIL_BASE&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/token`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;address&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;password&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;address&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;token&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;waitForLink&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;timeoutMs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;Authorization&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Bearer &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;timeoutMs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;MAIL_BASE&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/messages`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;headers&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;first&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;hydra:member&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;first&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;full&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;MAIL_BASE&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/messages/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;first&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;headers&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;match&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;full&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/https&lt;/span&gt;&lt;span class="se"&gt;?&lt;/span&gt;&lt;span class="sr"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\/\/\S&lt;/span&gt;&lt;span class="sr"&gt;+verify&lt;/span&gt;&lt;span class="se"&gt;\S&lt;/span&gt;&lt;span class="sr"&gt;*/&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;match&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;No verification email arrived&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user can sign up and verify email&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;address&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;token&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;createInbox&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://example-app.com/signup&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;input[name=email]&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;address&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;input[name=password]&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;TestPass123!&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;click&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;button[type=submit]&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;link&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;waitForLink&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;goto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;link&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;locator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;h1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;toHaveText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Welcome&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s a full real-world signup flow tested end-to-end with no shared inbox state, no human, and no out-of-band steps. Run it ten times in parallel in CI — every test gets its own inbox.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using Temp Email Inside an OpenClaw Agent
&lt;/h2&gt;

&lt;p&gt;Mail.tm and Guerrilla Mail are particularly useful inside agentic workflows. If you’re building an agent in OpenClaw (or any other AI coding/automation tool) that needs to register on a third-party service to complete a user’s task, the inbox-as-a-tool pattern looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The agent calls &lt;code&gt;create_inbox()&lt;/code&gt; — a tool that wraps Mail.tm’s &lt;code&gt;POST /accounts&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The agent uses the returned address in its target service signup.&lt;/li&gt;
&lt;li&gt;The agent calls &lt;code&gt;poll_inbox(token)&lt;/code&gt; in a loop until a verification email arrives.&lt;/li&gt;
&lt;li&gt;The agent extracts the verification link with a regex (or asks the LLM to find it) and follows it in a headless browser.&lt;/li&gt;
&lt;li&gt;Now logged in, the agent does whatever real work the user asked for.&lt;/li&gt;
&lt;/ol&gt;
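
&lt;p&gt;The five steps above can be sketched in Python. This is a sketch, not a definitive client: the helper names (&lt;code&gt;create_inbox&lt;/code&gt;, &lt;code&gt;poll_inbox&lt;/code&gt;, &lt;code&gt;extract_verification_link&lt;/code&gt;) are illustrative, and the endpoints are Mail.tm’s public API as used earlier in this article.&lt;/p&gt;

```python
import json
import re
import secrets
import time
import urllib.request

MAIL_BASE = "https://api.mail.tm"  # Mail.tm's public API root

def _api(path, payload=None, token=None):
    # Minimal JSON helper over urllib so the sketch stays dependency-free.
    req = urllib.request.Request(MAIL_BASE + path)
    if token:
        req.add_header("Authorization", "Bearer " + token)
    data = None
    if payload is not None:
        req.add_header("Content-Type", "application/json")
        data = json.dumps(payload).encode()
    with urllib.request.urlopen(req, data=data) as resp:
        return json.loads(resp.read())

def create_inbox():
    """Tool 1: wrap POST /accounts; returns (address, bearer token)."""
    domain = _api("/domains")["hydra:member"][0]["domain"]
    address = "agent-" + secrets.token_hex(6) + "@" + domain
    password = secrets.token_hex(12)
    _api("/accounts", {"address": address, "password": password})
    token = _api("/token", {"address": address, "password": password})["token"]
    return address, token

def extract_verification_link(body_text):
    """Pure helper: pull the first URL containing 'verify' from an email body."""
    match = re.search(r"https?://\S*verify\S*", body_text)
    return match.group(0) if match else None

def poll_inbox(token, timeout_s=30, interval_s=2):
    """Tool 3: poll GET /messages until something arrives; fail loudly on timeout."""
    for _ in range(max(1, timeout_s // interval_s)):
        messages = _api("/messages", token=token)["hydra:member"]
        if messages:
            return _api("/messages/" + messages[0]["id"], token=token).get("text", "")
        time.sleep(interval_s)
    raise TimeoutError("No verification email arrived")
```

&lt;p&gt;An agent framework would register &lt;code&gt;create_inbox&lt;/code&gt; and &lt;code&gt;poll_inbox&lt;/code&gt; as callable tools; &lt;code&gt;extract_verification_link&lt;/code&gt; is the deterministic fallback for step 4 when you’d rather not spend an LLM call on finding the URL.&lt;/p&gt;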

&lt;p&gt;This is the kind of automation that breaks if you try to use a real Gmail account (rate limits, security challenges, captcha on new IPs), but works smoothly with a programmatic temp inbox. As AI agents do more on the user’s behalf in 2026, this pattern is becoming standard.&lt;/p&gt;

&lt;h2&gt;
  
  
  Privacy and Security Caveats
&lt;/h2&gt;

&lt;p&gt;Temp email is convenient, but it isn’t private. A few things to keep in mind:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Public inboxes are public.&lt;/strong&gt; Mailinator’s free tier and YOPmail’s entire model expose every message to anyone who guesses the local part. Never use these for password reset emails on accounts you actually care about — anyone who learns the address gets the reset link too.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Account-based services are not encrypted.&lt;/strong&gt; Mail.tm, Guerrilla Mail, and Mail7 keep mail in plaintext on their servers. Treat anything sent there as readable by the operator.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Many SaaS apps block obvious disposable domains.&lt;/strong&gt; If your test target uses a domain blocklist (Stripe, banking apps, KYC-bound services), you may need a private alias service like SimpleLogin or a paid Mailinator/Mail7 tier with custom domains.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real personal email is still the right choice for accounts you’ll use for years.&lt;/strong&gt; Temp email is a development and research tool, not an identity replacement.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Free vs Paid Tiers: When to Upgrade
&lt;/h2&gt;

&lt;p&gt;Most developer use cases stay on the free tier forever. You’d consider paying when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;You need a custom domain&lt;/strong&gt; so the inbox doesn’t look disposable to the target service. (Mailinator paid, Mail7 paid, SimpleLogin paid.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You’re running CI at scale&lt;/strong&gt; and need higher rate limits or guaranteed uptime SLAs. (Mail7 paid is the most QA-friendly here.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want long-term aliases for personal use&lt;/strong&gt; on multiple custom domains. (SimpleLogin Premium.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For automated tests in a startup or solo project, the free tier of Mail.tm is overwhelmingly likely to be enough. Start there and only upgrade when you’ve quantified an actual limit you’ve hit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common Mistakes to Avoid
&lt;/h2&gt;

&lt;p&gt;From watching teams adopt temp email in their CI pipelines, the same few footguns come up again and again:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hardcoding a single inbox across all tests.&lt;/strong&gt; If two tests run in parallel and both expect “the latest email,” they’ll collide. Always create a fresh address per test.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tight polling loops.&lt;/strong&gt; Hitting &lt;code&gt;GET /messages&lt;/code&gt; twenty times a second is rude and gets you rate-limited. Poll every two to five seconds.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not handling the “email never arrives” path.&lt;/strong&gt; Set a reasonable timeout (30 seconds is fine for most flows, 90 seconds for sluggish ESP setups). Fail loudly when it expires.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Using a temp email for the test admin account.&lt;/strong&gt; Your test user, sure. The admin account that owns the staging environment? Use a real, recoverable inbox.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Forgetting that messages disappear.&lt;/strong&gt; If a test logs the message ID for debugging, the body may be gone by the time you investigate. Save the full body to your CI artifacts when assertions fail.&lt;/li&gt;
&lt;/ul&gt;
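
&lt;p&gt;The polling and timeout advice above boils down to one small helper. A sketch, with names of my own choosing rather than any particular library’s API:&lt;/p&gt;

```python
import time

def poll_until(check, timeout_s=30, interval_s=2.0):
    """Call check() every interval_s seconds until it returns a truthy
    value or the attempt budget runs out, then fail loudly."""
    attempts = max(1, int(timeout_s / interval_s))
    for _ in range(attempts):
        result = check()
        if result:
            return result
        time.sleep(interval_s)
    raise TimeoutError("polled for " + str(timeout_s) + "s with no result")
```

&lt;p&gt;Wrap your &lt;code&gt;GET /messages&lt;/code&gt; call in &lt;code&gt;check&lt;/code&gt; and the same helper serves every provider in this roundup; bump &lt;code&gt;timeout_s&lt;/code&gt; to 90 for the sluggish ESP setups mentioned above.&lt;/p&gt;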

&lt;h2&gt;
  
  
  Final Recommendation
&lt;/h2&gt;

&lt;p&gt;If you only remember three of these, remember:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Mail.tm&lt;/strong&gt; for any kind of test automation. Free, no API key, JSON, works.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;10 Minute Mail&lt;/strong&gt; for “just give me an address right now” manual use.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SimpleLogin&lt;/strong&gt; for protecting your real personal inbox over the long term.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Disposable email is one of those tools that looks small until you build a real automated test suite and realize it’s load-bearing. Pick one, write a thin wrapper around its API in your project, and stop thinking about email infrastructure for the rest of the year. Every minute saved on shaky Gmail-scraping tests is a minute you get back for the actual feature work.&lt;/p&gt;

&lt;p&gt;If you’re building an AI agent that needs to receive verification mail on the user’s behalf, start with the Mail.tm Python snippet above and treat the inbox as just another tool the agent can call. It’s the cleanest free piece of infrastructure in this category, and it’ll save you a surprising amount of integration pain compared to rolling your own SMTP receiver.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Reads
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/kling-pika-heygen/" rel="noopener noreferrer"&gt;Free AI Video Generators in 2026: Kling vs Pika vs HeyGen Compared&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/postman-alternatives/" rel="noopener noreferrer"&gt;Postman Alternatives in 2026: Bruno and Hoppscotch — Free, Open-Source API Clients That Don’t Force a Login&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/herosms-review/" rel="noopener noreferrer"&gt;HeroSMS Review: Receive SMS Verification Codes from 180+ Countries&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/cohere-rag-api/" rel="noopener noreferrer"&gt;Cohere Free API: The Best Free Embedding and Rerank API for RAG in 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/render-hosting-review/" rel="noopener noreferrer"&gt;Render Free Hosting Review 2026: Deploy Web Apps, Databases, and Cron Jobs for Free&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://toolfreebie.com/free-temp-email-services/" rel="noopener noreferrer"&gt;toolfreebie.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>tools</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Supabase vs Neon: Which Free PostgreSQL Database Should You Use in 2026?</title>
      <dc:creator>toolfreebie</dc:creator>
      <pubDate>Sun, 03 May 2026 12:08:05 +0000</pubDate>
      <link>https://dev.to/build996/supabase-vs-neon-which-free-postgresql-database-should-you-use-in-2026-1k4</link>
      <guid>https://dev.to/build996/supabase-vs-neon-which-free-postgresql-database-should-you-use-in-2026-1k4</guid>
      <description>&lt;h2&gt;
  
  
  PlanetScale Killed Its Free Tier — So Where Do You Go Now?
&lt;/h2&gt;

&lt;p&gt;In early 2024, PlanetScale quietly ended its Hobby plan — the free tier that tens of thousands of developers had been using for side projects, prototypes, and small production apps. The announcement landed like a cold shower. Suddenly, projects that had been humming along on zero-cost MySQL infrastructure needed a new home.&lt;/p&gt;

&lt;p&gt;That moment accelerated something that was already happening: developers switching to PostgreSQL. And in 2026, the two best free PostgreSQL options for developers are &lt;strong&gt;Supabase&lt;/strong&gt; and &lt;strong&gt;Neon&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Both are free. Both run PostgreSQL. Both work with your existing ORM and database client. But they’re built around very different philosophies — and the right choice depends entirely on what you’re building. This guide breaks down everything you need to know to pick the right one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Comparison: Supabase vs Neon Free Tiers
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Supabase (Free)&lt;/th&gt;
&lt;th&gt;Neon (Free)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Database&lt;/td&gt;
&lt;td&gt;PostgreSQL 15&lt;/td&gt;
&lt;td&gt;PostgreSQL 16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage&lt;/td&gt;
&lt;td&gt;500 MB&lt;/td&gt;
&lt;td&gt;512 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compute&lt;/td&gt;
&lt;td&gt;Shared, pauses after 1 week inactivity&lt;/td&gt;
&lt;td&gt;0.25 vCPU / 1 GB RAM, auto-suspends after 5 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Projects / Databases&lt;/td&gt;
&lt;td&gt;2 active projects&lt;/td&gt;
&lt;td&gt;10 databases per project&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Branching&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes — unlimited branches&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Built-in Auth&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes — email, OAuth, magic links&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;REST/GraphQL API&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Auto-generated from your schema&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Edge Functions&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;500K invocations/month&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage (Files)&lt;/td&gt;
&lt;td&gt;1 GB&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Realtime&lt;/td&gt;
&lt;td&gt;Yes (200 concurrent connections)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pgvector&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Connection Pooling&lt;/td&gt;
&lt;td&gt;PgBouncer built-in&lt;/td&gt;
&lt;td&gt;Neon proxy (HTTP + WebSocket)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Credit Card Required&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best For&lt;/td&gt;
&lt;td&gt;Full-stack apps with auth + storage&lt;/td&gt;
&lt;td&gt;Dev branches, AI/serverless workloads&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The short version: Supabase is a full Firebase replacement — it bundles auth, file storage, realtime, and edge functions on top of PostgreSQL. Neon is laser-focused on the database itself, with a standout feature called &lt;em&gt;branching&lt;/em&gt; that every developer should know about.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Supabase?
&lt;/h2&gt;

&lt;p&gt;Supabase is an open-source Firebase alternative built entirely on PostgreSQL. Founded in 2020, it raised $80 million and now serves millions of developers. The hosted version is free to start, and the entire platform can also be self-hosted.&lt;/p&gt;

&lt;p&gt;The pitch is simple: instead of wiring together a database + auth provider + file storage + API layer yourself, Supabase gives you all of that in one place, all talking to the same PostgreSQL database underneath.&lt;/p&gt;

&lt;p&gt;When you create a Supabase project, you get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A real PostgreSQL database (not a wrapper, not a proxy — actual Postgres)&lt;/li&gt;
&lt;li&gt;A PostgREST layer that auto-generates a REST API from your tables&lt;/li&gt;
&lt;li&gt;GoTrue for authentication (email/password, OAuth, magic links, phone OTP)&lt;/li&gt;
&lt;li&gt;A storage service for files and images&lt;/li&gt;
&lt;li&gt;Realtime subscriptions via WebSocket&lt;/li&gt;
&lt;li&gt;Deno-based edge functions&lt;/li&gt;
&lt;li&gt;A dashboard UI that’s genuinely pleasant to use&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a lot of projects, this completely replaces what you’d previously build with Express + Passport + Multer + a websocket server + a bunch of custom middleware.&lt;/p&gt;

&lt;h2&gt;
  
  
  Supabase Free Tier: What You Actually Get
&lt;/h2&gt;

&lt;p&gt;The free tier is called the Free Plan. Here’s what matters in practice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Database storage:&lt;/strong&gt; 500 MB — enough for most side projects and prototypes. A typical app with a few thousand users and moderate data rarely hits this.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;File storage:&lt;/strong&gt; 1 GB via Supabase Storage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auth:&lt;/strong&gt; Up to 50,000 monthly active users — this is enormous. Most free-tier apps will never hit this ceiling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edge Functions:&lt;/strong&gt; 500,000 invocations per month&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Realtime:&lt;/strong&gt; 200 concurrent connections, 2 million messages/month&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bandwidth:&lt;/strong&gt; 5 GB per month&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2 active projects&lt;/strong&gt; — you can have more, but only 2 are active at once&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The one catch: &lt;strong&gt;projects pause after 1 week of inactivity&lt;/strong&gt;. When a paused project gets its first request, it restarts — but this takes 1–2 seconds. For a side project you check occasionally, this is annoying. For anything in active development or light production use, you likely won’t notice it.&lt;/p&gt;
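
&lt;p&gt;If that restart latency matters to you, a small retry wrapper around the first query absorbs it. A sketch, under the assumption that any driver exception on a freshly woken project is worth one or two retries (the function and parameter names are mine, not Supabase’s):&lt;/p&gt;

```python
import time

def with_cold_start_retry(fn, retries=3, delay_s=2.0):
    """Run fn(), retrying a few times so the first request after a
    free-tier pause has time to wake the project back up."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries: surface the real error
            time.sleep(delay_s)
```
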

&lt;h2&gt;
  
  
  Getting Started with Supabase
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Create Your Free Project
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;a href="https://supabase.com" rel="noopener noreferrer"&gt;supabase.com&lt;/a&gt; and sign up with GitHub&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;“New Project”&lt;/strong&gt;, choose a region close to your users&lt;/li&gt;
&lt;li&gt;Set a strong database password (save it — you’ll need it for direct connections)&lt;/li&gt;
&lt;li&gt;Wait ~2 minutes while Supabase provisions your instance&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Once your project is ready, grab your credentials from &lt;strong&gt;Project Settings → API&lt;/strong&gt;. You’ll need the &lt;code&gt;URL&lt;/code&gt; and &lt;code&gt;anon key&lt;/code&gt; for client-side usage, or the connection string for direct Postgres access.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Connect with the Supabase JavaScript Client
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; @supabase/supabase-js
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createClient&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@supabase/supabase-js&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;supabase&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://yourproject.supabase.co&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;your-anon-key&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// Insert a row&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;supabase&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;notes&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;My first note&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Hello from Supabase&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Query with filters&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;notes&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;supabase&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;notes&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;*&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;published&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;order&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;created_at&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;ascending&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;notes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Connect with Python (psycopg2 / SQLAlchemy)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;psycopg2-binary sqlalchemy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;

&lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;db.yourproject.supabase.co&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgres&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgres&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-db-password&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5432&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For Python apps that need connection pooling (e.g. FastAPI, Django), use the Transaction mode connection string from &lt;strong&gt;Project Settings → Database → Connection string → Transaction mode&lt;/strong&gt;. This routes through PgBouncer and is much more efficient for serverless or high-concurrency scenarios.&lt;/p&gt;
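&lt;p&gt;What actually changes between the direct and pooled strings is the host and port. A minimal Python sketch of the difference (the project ref, region, and password below are placeholders; copy the real string from your dashboard):&lt;/p&gt;

```python
from urllib.parse import urlparse

# Hypothetical transaction-mode (pooled) connection string. Copy the real
# one from Project Settings; the pooler host and port differ from the
# direct connection (db.yourproject.supabase.co:5432).
POOLED_URL = (
    "postgresql://postgres.yourproject:your-db-password"
    "@aws-0-us-east-1.pooler.supabase.com:6543/postgres"
)

parsed = urlparse(POOLED_URL)
print(parsed.hostname)  # the PgBouncer pooler host, not the database host
print(parsed.port)      # 6543, the transaction-mode port (direct is 5432)
```

&lt;p&gt;Hand the pooled string to psycopg2 or SQLAlchemy exactly as you would the direct one; since PgBouncer multiplexes connections server-side, keep client-side pool sizes small.&lt;/p&gt;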

&lt;h3&gt;
  
  
  4. Built-in Auth — Zero Config
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Sign up a new user&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;supabase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;signUp&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;user@example.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;securepassword123&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;// Sign in&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;session&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;supabase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;signInWithPassword&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;user@example.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;securepassword123&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;// OAuth (GitHub, Google, Discord, etc.)&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;supabase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;signInWithOAuth&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;github&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;User records are stored in an &lt;code&gt;auth.users&lt;/code&gt; table that Supabase manages internally. You can join against it from your own tables and use Row Level Security (RLS) to enforce “users can only read their own data” rules directly in the database.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Row Level Security Example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- In the Supabase SQL editor:&lt;/span&gt;
&lt;span class="c1"&gt;-- Create a notes table&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;notes&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;gen_random_uuid&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="n"&gt;timestamptz&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Enable RLS&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;notes&lt;/span&gt; &lt;span class="n"&gt;ENABLE&lt;/span&gt; &lt;span class="k"&gt;ROW&lt;/span&gt; &lt;span class="k"&gt;LEVEL&lt;/span&gt; &lt;span class="k"&gt;SECURITY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Users can only see their own notes&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;POLICY&lt;/span&gt; &lt;span class="nv"&gt;"Users can read own notes"&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;notes&lt;/span&gt; &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt;
&lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uid&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Users can only insert their own notes&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;POLICY&lt;/span&gt; &lt;span class="nv"&gt;"Users can insert own notes"&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;notes&lt;/span&gt; &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;INSERT&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="k"&gt;CHECK&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uid&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is one of Supabase’s best features: security rules live in the database, not scattered across API handlers. Whether a request comes through the JavaScript SDK, REST API, or direct SQL, the same rules apply.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Neon?
&lt;/h2&gt;

&lt;p&gt;Neon is a serverless PostgreSQL platform that launched in 2022 with a fresh take on the database problem: separate storage from compute. Instead of a traditional “always on” Postgres instance, Neon’s compute nodes can scale down to zero when idle and spin back up on demand.&lt;/p&gt;

&lt;p&gt;This architecture has one major practical benefit for the free tier: &lt;strong&gt;you’re not paying for idle compute&lt;/strong&gt;. Neon’s free tier genuinely handles this well — your database suspends after 5 minutes of inactivity and wakes up in a few hundred milliseconds when the next query arrives.&lt;/p&gt;

&lt;p&gt;Neon is 100% standard PostgreSQL (currently PG 16). There’s no proprietary query language or SDK: if you can connect to Postgres, you can connect to Neon. It’s also what powers Vercel Postgres under the hood.&lt;/p&gt;

&lt;p&gt;The feature that sets Neon apart from everything else: &lt;strong&gt;database branching&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Neon Free Tier: What You Actually Get
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Compute:&lt;/strong&gt; 191.9 compute hours per month — at the minimum 0.25 vCPU size, that works out to roughly 767 active hours, which is more than the 744 hours in a 31-day month, so a single small compute can effectively run nonstop&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage:&lt;/strong&gt; 512 MB&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Databases:&lt;/strong&gt; Unlimited databases within one project&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Projects:&lt;/strong&gt; 1 project on the free tier&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Branches:&lt;/strong&gt; 10 branches (each branch is an instant copy-on-write snapshot)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connection pooling:&lt;/strong&gt; Built-in pooler (PgBouncer) via a pooled connection string, plus an HTTP proxy for the serverless driver&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scale to zero:&lt;/strong&gt; Compute suspends after 5 minutes of inactivity and wakes in ~300ms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The 1-project limit on the free tier is the main constraint. Everything runs inside that one project — but you can create multiple databases and branches within it, so in practice it’s less limiting than it sounds.&lt;/p&gt;
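&lt;p&gt;The compute allowance in the list above is worth a quick sanity check (plain arithmetic, not official pricing guidance):&lt;/p&gt;

```python
# Neon free tier: 191.9 compute-hours per month, where one compute-hour
# is one vCPU running for one hour. At the minimum 0.25 vCPU size:
compute_hours = 191.9
vcpu_size = 0.25
active_hours = compute_hours / vcpu_size
print(round(active_hours, 1))  # 767.6

# A 31-day month has 744 hours, so a single 0.25 vCPU compute could
# stay awake the whole month without exhausting the allowance.
hours_in_month = 31 * 24
print(active_hours > hours_in_month)  # True
```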

&lt;h2&gt;
  
  
  Neon’s Killer Feature: Database Branching
&lt;/h2&gt;

&lt;p&gt;Branching is what makes Neon genuinely different from every other hosted database. A branch is an instant copy-on-write snapshot of your database: creating one takes milliseconds and consumes almost no extra storage, because unchanged pages are shared between branches.&lt;/p&gt;

&lt;p&gt;Why does this matter?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Preview deployments:&lt;/strong&gt; Vercel and other platforms can create a fresh database branch for every pull request. Your PR gets its own isolated database with real production data (or a sanitized copy). No more “PR broke staging because someone else was testing there.”&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Testing with production data:&lt;/strong&gt; Branch from production, run your migration, test it. If something breaks, delete the branch. Your production database is untouched.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Development environments:&lt;/strong&gt; Each developer gets their own branch. No shared dev database with conflicting data.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Neon CLI&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; neonctl

&lt;span class="c"&gt;# Authenticate&lt;/span&gt;
neonctl auth

&lt;span class="c"&gt;# Create a branch from main&lt;/span&gt;
neonctl branches create &lt;span class="nt"&gt;--name&lt;/span&gt; feature/add-users &lt;span class="nt"&gt;--parent&lt;/span&gt; main

&lt;span class="c"&gt;# List your branches&lt;/span&gt;
neonctl branches list

&lt;span class="c"&gt;# Get connection string for a specific branch&lt;/span&gt;
neonctl connection-string feature/add-users
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also manage branches via the Neon dashboard or REST API. GitHub Actions integration is available for automating branch creation on PR open and deletion on PR merge.&lt;/p&gt;
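&lt;p&gt;For completeness, here is roughly what the REST route looks like from Python. This sketch only builds the request; the endpoint path follows Neon’s public v2 API, while the project ID, API key, and payload field names are placeholders to verify against the API docs:&lt;/p&gt;

```python
import json
import urllib.request

# Placeholders: substitute your real project ID and API key.
PROJECT_ID = "your-project-id"
API_KEY = "your-neon-api-key"

# Create a branch via Neon's REST API (payload shape approximated from
# the v2 API docs; check the field names there before relying on them).
payload = {"branch": {"name": "preview/pr-42"}}
req = urllib.request.Request(
    "https://console.neon.tech/api/v2/projects/" + PROJECT_ID + "/branches",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer " + API_KEY,
        "Content-Type": "application/json",
    },
    method="POST",
)

print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would actually send it.
```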

&lt;h2&gt;
  
  
  Getting Started with Neon
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Create Your Free Account
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;a href="https://neon.tech" rel="noopener noreferrer"&gt;neon.tech&lt;/a&gt; and sign up with GitHub&lt;/li&gt;
&lt;li&gt;Create your first project — choose a region and PostgreSQL version&lt;/li&gt;
&lt;li&gt;Copy your connection string from the dashboard&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  2. Connect with Python
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;psycopg2-binary
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;DATABASE_URL&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    CREATE TABLE IF NOT EXISTS users (
        id SERIAL PRIMARY KEY,
        email VARCHAR(255) UNIQUE NOT NULL,
        created_at TIMESTAMP DEFAULT NOW()
    )
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INSERT INTO users (email) VALUES (%s) RETURNING id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;alice@example.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchone&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Created user with ID: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Connect with JavaScript (Neon Serverless Driver)
&lt;/h3&gt;

&lt;p&gt;Neon has a serverless HTTP driver that works in edge runtimes like Cloudflare Workers and Vercel Edge Functions — environments where you can’t use TCP connections:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; @neondatabase/serverless
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;neon&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@neondatabase/serverless&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sql&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;neon&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DATABASE_URL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// Simple query&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;users&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;sql&lt;/span&gt;&lt;span class="s2"&gt;`SELECT * FROM users WHERE active = true LIMIT 10`&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;users&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// Parameterized query (safe from SQL injection)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;email&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;alice@example.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;sql&lt;/span&gt;&lt;span class="s2"&gt;`SELECT * FROM users WHERE email = &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Use with an ORM (Drizzle)
&lt;/h3&gt;

&lt;p&gt;Neon works great with Drizzle, the TypeScript-first ORM that’s become popular for edge deployments:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;drizzle-orm @neondatabase/serverless
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-D&lt;/span&gt; drizzle-kit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;drizzle&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;drizzle-orm/neon-http&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;neon&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@neondatabase/serverless&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;pgTable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;serial&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;timestamp&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;drizzle-orm/pg-core&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sql&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;neon&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DATABASE_URL&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;drizzle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;users&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;pgTable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;users&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;serial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;primaryKey&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;email&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;notNull&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;unique&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="na"&gt;createdAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;created_at&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;defaultNow&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;// Query&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;allUsers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;users&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;allUsers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5. Database Branching in CI/CD
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# .github/workflows/preview.yml&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Preview Deployment&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;types&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;opened&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;synchronize&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;create-preview-db&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;neondatabase/create-branch-action@v5&lt;/span&gt;
        &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;create-branch&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;project_id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ vars.NEON_PROJECT_ID }}&lt;/span&gt;
          &lt;span class="na"&gt;branch_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;preview/pr-${{ github.event.pull_request.number }}&lt;/span&gt;
          &lt;span class="na"&gt;api_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.NEON_API_KEY }}&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run migrations on preview branch&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;DATABASE_URL="${{ steps.create-branch.outputs.db_url }}" \&lt;/span&gt;
          &lt;span class="s"&gt;npm run db:migrate&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Output connection string&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;echo "Preview DB&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ steps.create-branch.outputs.db_url_with_pooler }}"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Head-to-Head: Supabase vs Neon
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Storage and Database Limits
&lt;/h3&gt;

&lt;p&gt;Both offer ~500 MB of free storage. In practice, this is enough for a portfolio project, a small SaaS with a few hundred users, or any active development workload. Once you hit the limit, Supabase’s paid tier starts at $25/month and Neon’s at $19/month.&lt;/p&gt;

&lt;h3&gt;
  
  
  Connection Handling
&lt;/h3&gt;

&lt;p&gt;This is where the experience diverges depending on your stack.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Supabase&lt;/strong&gt; includes PgBouncer connection pooling out of the box. If you’re deploying to a serverless environment (Vercel, AWS Lambda, Cloudflare Workers), you must use the pooled connection string — direct connections will exhaust your Postgres connection limit quickly. Supabase makes this easy: just use the “Transaction mode” connection string.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Neon&lt;/strong&gt;’s serverless driver solves the same problem differently. It speaks HTTP instead of TCP, so it works natively in edge runtimes without any connection pooling configuration. For serverless deployments this is genuinely more convenient. For traditional servers, you use a standard connection string, optionally via Neon’s pooled endpoint (PgBouncer), just like any other Postgres database.&lt;/p&gt;

&lt;h3&gt;
  
  
  Branching: Neon Wins Clearly
&lt;/h3&gt;

&lt;p&gt;Supabase does not have database branching on the free tier (it’s available on paid plans). Neon’s branching is available on the free tier and works extremely well.&lt;/p&gt;

&lt;p&gt;If you’re doing serious development — running migrations, testing new schemas, doing preview deployments — Neon’s branching workflow is genuinely better. It’s not a gimmick; it changes how you think about database testing. Being able to branch from production to test a migration, then delete the branch if anything looks wrong, removes a whole category of risk from database changes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Auth and APIs: Supabase Wins Clearly
&lt;/h3&gt;

&lt;p&gt;Neon is a database. Full stop. It has no built-in auth, no file storage, no REST API layer. You bring your own authentication (NextAuth, Clerk, Auth0, etc.) and your own API framework.&lt;/p&gt;

&lt;p&gt;Supabase bundles all of this. For full-stack JavaScript apps where you want everything wired together with minimal configuration, Supabase is a massive time saver. The auto-generated REST API alone can replace days of writing CRUD endpoints. Combined with Row Level Security, you can build apps where the frontend reads directly from the database securely — no separate API server needed.&lt;/p&gt;
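&lt;p&gt;That REST layer is PostgREST under the hood, so any HTTP client can use it. A sketch from Python (the project URL and anon key are placeholders); a &lt;code&gt;Range&lt;/code&gt; header stands in for &lt;code&gt;limit&lt;/code&gt; here to keep the URL simple:&lt;/p&gt;

```python
import urllib.request

# Placeholders: substitute your project URL and anon key.
SUPABASE_URL = "https://yourproject.supabase.co"
ANON_KEY = "your-anon-key"

# Rough equivalent of supabase.from('notes').select('*').limit(10):
# PostgREST exposes each table at /rest/v1/, and a Range header of
# "0-9" asks for the first ten rows.
req = urllib.request.Request(
    SUPABASE_URL + "/rest/v1/notes?select=*",
    headers={
        "apikey": ANON_KEY,
        "Authorization": "Bearer " + ANON_KEY,
        "Range": "0-9",
    },
)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) returns JSON rows, filtered by your RLS policies.
```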

&lt;h3&gt;
  
  
  Vector Search (pgvector)
&lt;/h3&gt;

&lt;p&gt;Both Supabase and Neon support the &lt;code&gt;pgvector&lt;/code&gt; extension, making either one a solid choice for AI applications that need to store and search vector embeddings.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Works on both Supabase and Neon&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;EXTENSION&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;SERIAL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1536&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;-- for OpenAI text-embedding-3-small&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Create an index for fast similarity search&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;ivfflat&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;vector_cosine_ops&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lists&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Semantic similarity search&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'[0.1, 0.2, ...]'&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;similarity&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'[0.1, 0.2, ...]'&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Supabase has better documentation and official guides for pgvector. Neon works just as well, but you’ll rely more on the upstream pgvector docs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ecosystem and Integrations
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Supabase&lt;/strong&gt; has integrations with Vercel, Netlify, Cloudflare Workers, and a growing list of tools. It also has official client libraries for JavaScript, Python, Swift, Kotlin, Flutter, and others.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Neon&lt;/strong&gt; is deeply integrated with Vercel (Vercel Postgres is Neon), which means it has first-class support in the Vercel ecosystem. It also has an official GitHub Actions integration for branch management.&lt;/p&gt;

&lt;h2&gt;
  
  
  Which Should You Use? Use Case Breakdown
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Choose Supabase if you’re building:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A full-stack web app with user accounts&lt;/strong&gt; — Supabase Auth handles the entire user lifecycle. Combined with RLS, you get secure multi-tenant data isolation with almost no backend code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A mobile app (React Native, Flutter)&lt;/strong&gt; — Supabase’s official mobile SDKs are excellent. Auth, storage, and realtime all just work.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A Firebase migration&lt;/strong&gt; — The API design is deliberately Firebase-like. If you’re moving away from Firebase/Firestore, Supabase has the most similar mental model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Something that needs file uploads&lt;/strong&gt; — Supabase Storage with 1 GB free is a clean built-in solution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A realtime application&lt;/strong&gt; — Supabase Realtime works via Postgres logical replication and is included in the free tier.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Choose Neon if you’re building:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A Next.js / Vercel app&lt;/strong&gt; — Neon is Vercel’s recommended database. The integration is tight, and Vercel Postgres is literally Neon with a Vercel wrapper.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Something that needs database branches&lt;/strong&gt; — CI/CD with preview databases, migration testing, multiple developers sharing an environment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A serverless or edge function-heavy app&lt;/strong&gt; — Neon’s HTTP driver is the cleanest solution for edge runtimes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A backend where you already have auth&lt;/strong&gt; — If you’re using Clerk, Auth0, NextAuth, or Lucia for authentication, you don’t need Supabase’s auth layer. Neon gives you a clean Postgres without anything extra in the way.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A team doing serious database migrations&lt;/strong&gt; — Branch from production, test your migration on real data, merge or discard. This workflow is worth a lot.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Connecting Your Database to AI Applications (pgvector + OpenClaw)
&lt;/h2&gt;

&lt;p&gt;One of the most popular use cases for both platforms is building AI applications with semantic search. Both Supabase and Neon support &lt;code&gt;pgvector&lt;/code&gt;, so you can store embeddings alongside your regular data and run fast similarity searches without a separate vector database like Pinecone or Chroma.&lt;/p&gt;

&lt;p&gt;Here’s a complete Python example that generates embeddings with a free API and stores them in either Supabase or Neon:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="c1"&gt;# Works with both Supabase and Neon - just change DATABASE_URL
&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;DATABASE_URL&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Create the table with vector column
&lt;/span&gt;&lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    CREATE EXTENSION IF NOT EXISTS vector;

    CREATE TABLE IF NOT EXISTS knowledge_base (
        id SERIAL PRIMARY KEY,
        title TEXT,
        content TEXT,
        embedding vector(768),
        created_at TIMESTAMP DEFAULT NOW()
    );

    CREATE INDEX IF NOT EXISTS knowledge_embedding_idx
    ON knowledge_base USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 50);
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="c1"&gt;# Using Gemini embeddings (free tier: 1500 requests/day)
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://generativelanguage.googleapis.com/v1beta/models/text-embedding-004:embedContent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;GEMINI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;models/text-embedding-004&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;}]}}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;embedding&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;values&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add_document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INSERT INTO knowledge_base (title, content, embedding) VALUES (%s, %s, %s)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;semantic_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;query_embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        SELECT title, content, 1 - (embedding &amp;lt;=&amp;gt; %s::vector) AS similarity
        FROM knowledge_base
        ORDER BY embedding &amp;lt;=&amp;gt; %s::vector
        LIMIT %s
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchall&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Example usage
&lt;/span&gt;&lt;span class="nf"&gt;add_document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Python async&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;asyncio allows concurrent code using async/await syntax...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;semantic_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;how to write concurrent Python code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; | &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you’re using &lt;a href="https://openclaw.ai/" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; for your AI agent, you can connect this knowledge base to give your agent long-term memory and semantic recall. The agent calls a tool that runs a similarity search against your Supabase or Neon Postgres database and retrieves relevant context before generating a response: a simple but powerful RAG architecture that runs entirely for free.&lt;/p&gt;
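&lt;p&gt;The retrieval-to-prompt step is worth sketching. Assuming rows shaped like the &lt;code&gt;semantic_search&lt;/code&gt; results above, tuples of &lt;code&gt;(title, content, similarity)&lt;/code&gt;, a small helper (ours, not an OpenClaw API) can pack them into a context block under a character budget before the agent generates its answer:&lt;/p&gt;

```python
def build_context(rows, max_chars=2000):
    """Format (title, content, similarity) rows into a prompt context block.

    Rows are assumed to arrive ordered by similarity, as in the
    semantic_search() query above; we keep adding snippets until the
    character budget runs out.
    """
    parts = []
    used = 0
    for title, content, score in rows:
        snippet = f"[{score:.2f}] {title}: {content}"
        if used + len(snippet) > max_chars:
            break
        parts.append(snippet)
        used += len(snippet)
    return "\n".join(parts)

# Example with stubbed rows (real rows would come from semantic_search):
rows = [
    ("Python async", "asyncio allows concurrent code using async/await syntax", 0.91),
    ("Threading", "threads share memory and need locks", 0.74),
]
print(build_context(rows))
```

&lt;p&gt;The agent then prepends the returned string to its prompt; capping by characters rather than row count keeps the context within the model’s token budget even when stored documents vary widely in length.&lt;/p&gt;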

&lt;h2&gt;
  
  
  Free Alternatives Worth Mentioning
&lt;/h2&gt;

&lt;p&gt;Supabase and Neon cover most use cases, but a few others are worth knowing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Turso&lt;/strong&gt; (LibSQL / SQLite edge): 9 GB storage free, 500 databases, perfect for edge deployments if you can use SQLite instead of Postgres&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Render PostgreSQL&lt;/strong&gt;: Free tier exists but the instance expires after 90 days — good for testing, not production&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ElephantSQL&lt;/strong&gt;: Previously offered a free 20 MB Postgres plan, but the service reached end-of-life in early 2025 and is no longer an option&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fly.io Postgres&lt;/strong&gt;: Free within Fly’s allowance — requires running a Postgres container on Fly, more DevOps effort but very flexible&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For the vast majority of developers choosing between real hosted Postgres options with sustainable free tiers, the choice is Supabase vs Neon.&lt;/p&gt;

&lt;h2&gt;
  
  
  Limitations to Know Before You Commit
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Supabase Free Tier Limitations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Project pause:&lt;/strong&gt; The database pauses after 1 week of inactivity. The cold start adds 1–2 seconds to the first request. For anything that needs to be always available, either keep the project warm or upgrade.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Only 2 active projects:&lt;/strong&gt; You can create more, but only 2 run at once. This is a real constraint if you’re spinning up experiments frequently.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shared compute:&lt;/strong&gt; The free tier runs on shared infrastructure. Don’t expect consistent query performance. For latency-sensitive apps, the Pro plan gives you dedicated compute.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edge Functions cold starts:&lt;/strong&gt; Free tier edge functions have cold starts up to a few seconds.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No custom domains on free tier:&lt;/strong&gt; Your API URL is on supabase.co — upgrade for custom domain mapping.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Neon Free Tier Limitations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Auto-suspend after 5 minutes:&lt;/strong&gt; The compute node suspends. Wake time is ~300ms, which is usually fine but will fail some health checks that expect sub-100ms responses.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Only 1 project:&lt;/strong&gt; Everything has to fit inside one Neon project. In practice, this means one production app per account on the free tier.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No dedicated support:&lt;/strong&gt; Free tier gets community support only.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage limit is shared across branches:&lt;/strong&gt; Your 512 MB total is shared between main and all branches. Branches are copy-on-write so they’re efficient, but large datasets with many active branches can still hit limits.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No built-in auth, APIs, or storage:&lt;/strong&gt; These are all things you need to add yourself. For developers who want a complete backend, this requires more configuration.&lt;/li&gt;
&lt;/ul&gt;
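&lt;p&gt;The auto-suspend is easy to absorb in application code. A minimal sketch, assuming you wrap the first query after idle time in a generic retry helper (the helper is ours, not a Neon API):&lt;/p&gt;

```python
import time

def with_retry(fn, attempts=3, base_delay=0.3):
    """Call fn(), retrying with linear backoff on failure.

    Useful around the first query after Neon's compute has suspended:
    the ~300ms wake-up occasionally surfaces as a transient connection
    error, and a single retry almost always succeeds.
    """
    last_exc = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:  # in real code, catch psycopg2.OperationalError
            last_exc = exc
            time.sleep(base_delay * (attempt + 1))
    raise last_exc

# Example with a stub that fails once, then succeeds (simulating a cold start):
calls = {"n": 0}
def flaky_query():
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("compute waking up")
    return "ok"

print(with_retry(flaky_query))  # succeeds on the second attempt
```

&lt;p&gt;In production you’d catch &lt;code&gt;psycopg2.OperationalError&lt;/code&gt; specifically rather than bare &lt;code&gt;Exception&lt;/code&gt;, so genuine query errors still surface immediately.&lt;/p&gt;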

&lt;h2&gt;
  
  
  Performance: What to Expect
&lt;/h2&gt;

&lt;p&gt;Both platforms run standard PostgreSQL, so query performance is mostly about your indexing and query design rather than the platform. That said, there are platform-level differences worth noting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Connection overhead:&lt;/strong&gt; Supabase uses PgBouncer, which is industry-standard and well-understood. Neon’s HTTP proxy introduces an additional round-trip for the first query in a cold-start scenario, but subsequent queries in the same session are comparable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cold start:&lt;/strong&gt; Neon’s 300ms wake-up time is better than Supabase’s 1–2 second project resume. For apps with intermittent traffic, Neon handles cold starts more gracefully.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query throughput:&lt;/strong&gt; Both platforms will handle typical side project and small production loads easily. Under heavy concurrent load (many connections, complex queries), dedicated compute (paid) is better on both platforms. The free tier on either is not suitable for high-traffic production without careful connection pooling and query optimization.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Verdict: Supabase or Neon?
&lt;/h2&gt;

&lt;p&gt;After building apps with both, here’s the honest take:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Supabase is the better choice if this is your first database, if you’re building a full-stack app, or if you want a complete backend-as-a-service.&lt;/strong&gt; The included auth, storage, and auto-generated APIs mean you can ship a working app with user accounts in an afternoon. The free tier is genuinely usable for small production workloads. If you’ve been on Firebase and want something better, this is where you go.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Neon is the better choice if you already have auth figured out, if you’re deploying to Vercel, or if you’re doing serious database development work.&lt;/strong&gt; The branching feature is the best developer experience improvement in database tooling in years. Being able to branch from production to test a migration, preview each pull request against its own database copy, and give each developer an isolated environment — this is how database development should work. If you’re doing anything beyond hobby projects, Neon’s branching workflow is worth the slightly more DIY setup.&lt;/p&gt;

&lt;p&gt;They’re not mutually exclusive either. Some projects use Neon for the main database (for branching and Vercel integration) and Supabase Storage for file uploads. The two free tiers are independent, so you can run both at once, though managing two platforms adds complexity; most developers pick one and stick with it.&lt;/p&gt;

&lt;p&gt;Both Supabase and Neon are significantly stronger free options than the landscape of a few years ago. The loss of PlanetScale’s free tier hurt, but the alternatives hold up, and Neon’s copy-on-write data branches go further than PlanetScale’s schema-only branching ever did.&lt;/p&gt;

&lt;p&gt;Start with &lt;a href="https://supabase.com" rel="noopener noreferrer"&gt;Supabase&lt;/a&gt; if you want a complete Firebase replacement. Start with &lt;a href="https://neon.tech" rel="noopener noreferrer"&gt;Neon&lt;/a&gt; if you want the best free Postgres for serious development workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Reads
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/render-hosting-review/" rel="noopener noreferrer"&gt;Render Free Hosting Review 2026: Deploy Web Apps, Databases, and Cron Jobs for Free&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/vercel-netlify-cloudflare/" rel="noopener noreferrer"&gt;Vercel vs Netlify vs Cloudflare Pages: Free Frontend Hosting Compared&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/railway-heroku-alternative/" rel="noopener noreferrer"&gt;Railway App Review 2026: The Best Heroku Alternative for Developers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/oracle-free-arm-vps/" rel="noopener noreferrer"&gt;Oracle Cloud Always Free: Get a 4-Core 24GB ARM VPS for Free&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/best-free-hosting-2026/" rel="noopener noreferrer"&gt;7 Best Free Web Hosting for Developers: Cloudflare Pages, Vercel, Netlify and More&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://toolfreebie.com/supabase-vs-neon/" rel="noopener noreferrer"&gt;toolfreebie.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>hosting</category>
      <category>devops</category>
    </item>
    <item>
      <title>CrewAI vs AutoGPT vs LangGraph: Which Free Agent Framework Should You Use in 2026?</title>
      <dc:creator>toolfreebie</dc:creator>
      <pubDate>Sun, 03 May 2026 12:02:34 +0000</pubDate>
      <link>https://dev.to/build996/crewai-vs-autogpt-vs-langgraph-which-free-agent-framework-should-you-use-in-2026-40pd</link>
      <guid>https://dev.to/build996/crewai-vs-autogpt-vs-langgraph-which-free-agent-framework-should-you-use-in-2026-40pd</guid>
      <description>&lt;h2&gt;
  
  
  The Agent Framework Question Every Developer Faces
&lt;/h2&gt;

&lt;p&gt;You’ve decided to build something that goes beyond a single chatbot prompt. Maybe it’s a research assistant that browses the web, summarizes findings, and drafts a report. Maybe it’s an automated code reviewer that reads a PR, runs tests, and posts feedback. Maybe it’s a customer support pipeline that triages tickets, looks up order history, and drafts responses — without you touching a thing.&lt;/p&gt;

&lt;p&gt;All of these require &lt;strong&gt;agent frameworks&lt;/strong&gt;: libraries that let you define goals, give AI models access to tools, and orchestrate multi-step workflows that can reason, retry, and adapt.&lt;/p&gt;

&lt;p&gt;Three frameworks dominate this space in 2026: &lt;strong&gt;CrewAI&lt;/strong&gt;, &lt;strong&gt;AutoGPT&lt;/strong&gt;, and &lt;strong&gt;LangGraph&lt;/strong&gt;. All three are free and open-source. All three are actively maintained with large communities. But they’re designed around fundamentally different mental models — and picking the wrong one for your use case costs you weeks.&lt;/p&gt;

&lt;p&gt;I’ve built real projects with all three, including tools that run on &lt;a href="https://openclaw.ai/" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; using free-tier AI APIs. Here’s what I’ve learned about when each framework actually shines.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Quick Answer (Before We Go Deep)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;th&gt;Mental Model&lt;/th&gt;
&lt;th&gt;Complexity&lt;/th&gt;
&lt;th&gt;GitHub Stars&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CrewAI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Structured multi-agent pipelines with defined roles&lt;/td&gt;
&lt;td&gt;A crew of specialized workers&lt;/td&gt;
&lt;td&gt;Low–Medium&lt;/td&gt;
&lt;td&gt;28,000+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AutoGPT&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Autonomous long-running tasks, no-code agent configuration&lt;/td&gt;
&lt;td&gt;A self-directed AI assistant&lt;/td&gt;
&lt;td&gt;Low (UI) / Medium (SDK)&lt;/td&gt;
&lt;td&gt;170,000+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LangGraph&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Complex stateful workflows with branching logic and human-in-the-loop&lt;/td&gt;
&lt;td&gt;A directed graph of states and transitions&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;12,000+&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you want to skip straight to a recommendation: &lt;strong&gt;start with CrewAI&lt;/strong&gt; if you’re new to agent frameworks, &lt;strong&gt;try AutoGPT&lt;/strong&gt; if you want a no-code interface, and &lt;strong&gt;use LangGraph&lt;/strong&gt; only when you need fine-grained control over execution flow that the other two can’t give you.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is CrewAI?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.crewai.com/" rel="noopener noreferrer"&gt;CrewAI&lt;/a&gt; is an open-source Python framework for building multi-agent systems where each agent has a defined &lt;strong&gt;role&lt;/strong&gt;, &lt;strong&gt;goal&lt;/strong&gt;, and &lt;strong&gt;backstory&lt;/strong&gt;. Agents collaborate as a team — a “crew” — passing outputs to each other to complete complex tasks.&lt;/p&gt;

&lt;p&gt;It’s the newest of the three (released in late 2023) and the fastest-growing. As of 2026, CrewAI has crossed &lt;strong&gt;30 million downloads&lt;/strong&gt; and 28,000+ GitHub stars — numbers that reflect real adoption, not just hype.&lt;/p&gt;

&lt;p&gt;The core insight behind CrewAI is that the most effective AI systems mirror how real teams work: a researcher finds information, a writer structures it, a reviewer checks the output. By assigning these roles explicitly, CrewAI gets more coherent results than a single catch-all agent.&lt;/p&gt;

&lt;h3&gt;
  
  
  Installing CrewAI
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;crewai crewai-tools
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Requires Python 3.10–3.13. Works with OpenAI, Groq, Gemini, Anthropic, Mistral, Ollama (local), and any OpenAI-compatible endpoint.&lt;/p&gt;

&lt;h3&gt;
  
  
  CrewAI Core Concepts
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Agent:&lt;/strong&gt; An AI worker with a role, goal, and backstory. The backstory is surprisingly important — it primes the model to behave consistently with its assigned persona.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Task:&lt;/strong&gt; A specific job with a description and expected output format, assigned to an agent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Crew:&lt;/strong&gt; The team — a list of agents and tasks, plus a process (sequential or hierarchical).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool:&lt;/strong&gt; Capabilities agents can use: web search, file read/write, code execution, database queries, and 30+ built-ins.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  A Working CrewAI Example
&lt;/h3&gt;

&lt;p&gt;This pipeline uses Groq’s free API (500 requests/day) to build a two-agent content research and writing system:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;crewai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Crew&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Process&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;crewai_tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SerperDevTool&lt;/span&gt;

&lt;span class="c1"&gt;# Use Groq free tier — set OPENAI_* env vars to point to Groq
&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-groq-api-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_BASE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.groq.com/openai/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_MODEL_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama-3.3-70b-versatile&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;search_tool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SerperDevTool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;researcher&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Senior Research Analyst&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Find accurate, up-to-date information on the given topic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;backstory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;re a meticulous researcher who always verifies sources &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;and presents findings in a structured, actionable format.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;search_tool&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;writer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Technical Content Writer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write clear, developer-friendly articles based on research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;backstory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You write engaging technical content that developers actually enjoy reading. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You prefer concrete examples over abstract claims.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;research_task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Research the current state of {topic}. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Focus on: key features, real-world use cases, limitations, and alternatives.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;expected_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A structured research brief with key findings and sources.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;researcher&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;write_task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Using the research provided, write a 600-word article about {topic}. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Include an intro, 3 key takeaways with examples, and a recommendation.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;expected_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A complete article ready for publication.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;writer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;research_task&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;crew&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Crew&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;researcher&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;writer&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;research_task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;write_task&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sequential&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;crew&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;kickoff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;topic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LangGraph vs CrewAI for production agents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What CrewAI Does Well
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fast to prototype:&lt;/strong&gt; A working multi-agent pipeline typically takes under an hour to stand up.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Role-based prompting works:&lt;/strong&gt; The role/goal/backstory model produces more consistent agent behavior than a single system prompt.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flexible LLM support:&lt;/strong&gt; Swap between OpenAI, Groq, Gemini, or local Ollama with a single environment variable change.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory and state:&lt;/strong&gt; Built-in short-term, long-term, and entity memory, backed by an embedded RAG system.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Active development:&lt;/strong&gt; CrewAI Enterprise and CrewAI Studio (visual builder) are available if you outgrow the open-source version.&lt;/li&gt;
&lt;/ul&gt;
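&lt;p&gt;The env-var swap is worth seeing concretely. Because CrewAI reads the standard &lt;code&gt;OPENAI_*&lt;/code&gt; variables, pointing the same crew at a local Ollama server instead of Groq is just a config change. A minimal sketch — the Ollama URL and model name here are illustrative defaults, not guaranteed for your setup:&lt;/p&gt;

```python
import os

# Option A: Groq free tier (OpenAI-compatible endpoint)
groq_config = {
    "OPENAI_API_BASE": "https://api.groq.com/openai/v1",
    "OPENAI_MODEL_NAME": "llama-3.3-70b-versatile",
}

# Option B: local Ollama (also OpenAI-compatible; typical default port 11434)
ollama_config = {
    "OPENAI_API_BASE": "http://localhost:11434/v1",
    "OPENAI_MODEL_NAME": "llama3.1",
}

def use_backend(config: dict) -> None:
    """Point any OpenAI-compatible client (CrewAI included) at a backend."""
    os.environ.update(config)

use_backend(ollama_config)  # the rest of the crew code is unchanged
```

&lt;p&gt;Nothing else in the crew definition changes — agents, tasks, and tools stay identical across backends.&lt;/p&gt;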

&lt;h3&gt;
  
  
  CrewAI Limitations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Limited branching:&lt;/strong&gt; Sequential and hierarchical are the two execution modes. Complex conditional logic (“if research finds X, take path A; if Y, take path B”) requires workarounds.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Opaque internal state:&lt;/strong&gt; When an agent call fails or produces garbage, debugging requires digging through verbose logs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token costs add up:&lt;/strong&gt; Each agent gets the full backstory + task description on every call. Complex crews burn tokens fast on paid APIs.&lt;/li&gt;
&lt;/ul&gt;
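&lt;p&gt;To see why crews burn tokens fast, a back-of-the-envelope estimate helps. If each agent call re-sends its backstory and task description, prompt tokens scale with agents × calls. This sketch uses the rough ~4-characters-per-token heuristic, not a real tokenizer:&lt;/p&gt;

```python
def estimate_prompt_tokens(backstory_chars: int, task_chars: int,
                           agents: int, calls_per_agent: int) -> int:
    """Rough prompt-token estimate: ~4 chars/token, resent on every call."""
    per_call = (backstory_chars + task_chars) // 4
    return per_call * agents * calls_per_agent

# Two agents with modest prompts, five LLM calls each
print(estimate_prompt_tokens(600, 400, agents=2, calls_per_agent=5))  # → 2500
```

&lt;p&gt;And that is before tool outputs and intermediate results, which also ride along in the context on each call.&lt;/p&gt;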

&lt;h2&gt;
  
  
  What Is AutoGPT?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://agpt.co/" rel="noopener noreferrer"&gt;AutoGPT&lt;/a&gt; is the original autonomous AI agent project — the one that made the entire world briefly believe AI agents were about to take everyone’s jobs. Released in March 2023, it became the fastest-growing GitHub repository in history at the time, hitting 100,000 stars in weeks.&lt;/p&gt;

&lt;p&gt;In 2026, AutoGPT has matured significantly. It’s no longer just a chaotic “let the AI do everything” experiment. The current version has two distinct faces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AutoGPT Platform:&lt;/strong&gt; A no-code interface where you configure agents, define triggers, and connect tools through a visual builder. Think Zapier, but with AI reasoning instead of just conditional logic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AutoGPT SDK:&lt;/strong&gt; A Python library for developers who want programmatic control without the visual interface.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The AutoGPT Philosophy
&lt;/h3&gt;

&lt;p&gt;Where CrewAI and LangGraph ask you to define explicit agent roles and workflow steps, AutoGPT’s original design philosophy was &lt;strong&gt;open-ended autonomy&lt;/strong&gt;: give the agent a goal, equip it with tools, and let it decide how to achieve the goal through a self-directed loop of planning → action → observation → replanning.&lt;/p&gt;

&lt;p&gt;This works brilliantly for exploratory tasks where you genuinely don’t know all the steps in advance. It works poorly for tasks where you need predictable, auditable execution paths.&lt;/p&gt;
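&lt;p&gt;The plan → action → observation → replanning loop is easy to caricature in plain Python. This sketch stubs the model with a canned planner so the control flow — and the hard step cap any autonomous loop needs to avoid running away — is visible; nothing here is AutoGPT’s actual internals:&lt;/p&gt;

```python
def run_autonomous_loop(goal: str, plan_step, act, max_steps: int = 10) -> list:
    """Caricature of an open-ended agent loop: plan, act, observe, replan."""
    history = []
    for _ in range(max_steps):
        action = plan_step(goal, history)      # plan: pick next action from context
        if action == "DONE":
            break
        observation = act(action)              # act: execute, e.g. a tool call
        history.append((action, observation))  # observe: feed result back in
    return history

# Stub planner: search once, then declare the goal achieved
def stub_planner(goal, history):
    return "DONE" if history else f"search: {goal}"

result = run_autonomous_loop("free AI APIs", stub_planner,
                             act=lambda a: f"results for {a}")
print(len(result))  # → 1
```

&lt;p&gt;With a real LLM as the planner, the loop is the same shape — the unpredictability comes entirely from what &lt;code&gt;plan_step&lt;/code&gt; decides to do next.&lt;/p&gt;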

&lt;h3&gt;
  
  
  Getting Started with AutoGPT SDK
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;autogpt-sdk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;autogpt_sdk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AutoGPT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Tool&lt;/span&gt;

&lt;span class="c1"&gt;# Define a simple tool
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_web&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Implement with any search API
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Search results for: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AutoGPT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;ai_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Research Assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ai_role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a research assistant that finds accurate information online.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="nc"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_web&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Search the web for current information&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;search_web&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;llm_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# or any compatible model
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;goals&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Research the top 3 free AI APIs available in 2026&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;For each API, find the free tier limits&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Produce a comparison table&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What AutoGPT Does Well
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No-code accessibility:&lt;/strong&gt; The visual platform lets non-developers configure powerful automation without writing Python.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Autonomous replanning:&lt;/strong&gt; When a tool call fails or returns unexpected results, AutoGPT can adapt without manual intervention.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Broad tool ecosystem:&lt;/strong&gt; Web search, email, file management, calendar, and many more integrations out of the box.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-horizon tasks:&lt;/strong&gt; For tasks that might take dozens of steps and several hours, AutoGPT’s persistent memory and goal-tracking work well.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  AutoGPT Limitations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Unpredictability:&lt;/strong&gt; The autonomous loop can go off the rails, especially with weaker models. An agent might take 40 steps to do what should take 5.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hard to audit:&lt;/strong&gt; In a production system, you often need to explain exactly why an agent took each action. AutoGPT’s autonomous planning makes this difficult.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High token consumption:&lt;/strong&gt; The planning loop re-reads the full task history on every iteration. Long-running tasks can burn through free-tier limits quickly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Less Python-native:&lt;/strong&gt; The SDK is less mature than the platform, and developers used to composing clean Python code often find the interface awkward.&lt;/li&gt;
&lt;/ul&gt;
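&lt;p&gt;The "re-reads the full history" cost compounds quadratically: if each step adds roughly &lt;em&gt;t&lt;/em&gt; tokens of history, step &lt;em&gt;n&lt;/em&gt; processes about &lt;em&gt;n·t&lt;/em&gt; tokens of accumulated context, so an &lt;em&gt;N&lt;/em&gt;-step run reads on the order of &lt;em&gt;t·N(N+1)/2&lt;/em&gt; tokens in total. A quick check of that arithmetic:&lt;/p&gt;

```python
def total_history_tokens(steps: int, tokens_per_step: int) -> int:
    """Tokens read across a run when each step re-reads all prior history."""
    return sum(n * tokens_per_step for n in range(1, steps + 1))

# A 40-step run adding 300 tokens of history per step reads ~246k tokens,
# even though only 12k tokens of history were ever written.
print(total_history_tokens(40, 300))  # → 246000
```

&lt;p&gt;That is why a 40-step detour (see the unpredictability point above) hurts twice: extra steps, each one more expensive than the last.&lt;/p&gt;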

&lt;h2&gt;
  
  
  What Is LangGraph?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://langchain-ai.github.io/langgraph/" rel="noopener noreferrer"&gt;LangGraph&lt;/a&gt; is LangChain’s framework for building &lt;strong&gt;stateful, graph-based AI workflows&lt;/strong&gt;. If you’re familiar with LangChain, LangGraph is its more powerful, more complex successor for agentic applications.&lt;/p&gt;

&lt;p&gt;The key mental model: your application is a &lt;strong&gt;directed graph&lt;/strong&gt;. Nodes are processing steps (LLM calls, tool calls, human review gates, custom logic). Edges define when to move from one node to the next — with support for conditional branching, loops, and parallel execution.&lt;/p&gt;

&lt;p&gt;This sounds abstract. In practice, it means LangGraph can represent workflows that are simply impossible to express cleanly in CrewAI or AutoGPT: “call the LLM → if it wants to use tool X, execute X and loop back; if it wants to use tool Y, branch to a different subgraph; if the human rejects the output, send it to a revision node; after 3 revision attempts, escalate to a human review queue.”&lt;/p&gt;
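&lt;p&gt;The directed-graph idea doesn’t require LangGraph to understand. This framework-free sketch runs nodes out of a plain dict and routes on each node’s return value, with a revision loop capped at three attempts — all node names and logic here are invented for illustration:&lt;/p&gt;

```python
def run_graph(nodes: dict, entry: str, state: dict) -> dict:
    """Nodes mutate state and return the next node name (or None to stop)."""
    current = entry
    while current is not None:
        current = nodes[current](state)
    return state

def draft(state):
    state["attempts"] = state.get("attempts", 0) + 1
    return "review"

def review(state):
    # Conditional edge: loop back to revision unless approved, cap at 3 attempts
    if state["attempts"] < 3 and not state.get("approved"):
        return "draft"
    return None  # terminal edge

final = run_graph({"draft": draft, "review": review}, "draft", {})
print(final["attempts"])  # → 3
```

&lt;p&gt;LangGraph’s &lt;code&gt;StateGraph&lt;/code&gt; is this pattern plus typed state, checkpointing, streaming, and parallel edges — the routing model is the same.&lt;/p&gt;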

&lt;h3&gt;
  
  
  Installing LangGraph
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;langgraph langchain langchain-openai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  A LangGraph ReAct Agent Example
&lt;/h3&gt;

&lt;p&gt;This example builds a ReAct (Reasoning + Acting) agent that loops between LLM reasoning and tool execution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Sequence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TypedDict&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.messages&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ToolMessage&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.prebuilt&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ToolNode&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;operator&lt;/span&gt;

&lt;span class="c1"&gt;# Use Groq's free tier via OpenAI-compatible endpoint
&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama-3.3-70b-versatile&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;openai_api_base&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.groq.com/openai/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;openai_api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-groq-api-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_web&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Search the web for current information.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# In production, integrate with Serper, Brave, or DuckDuckGo
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Simulated search results for: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expression&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Evaluate a mathematical expression.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;eval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expression&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__builtins__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{}},&lt;/span&gt; &lt;span class="p"&gt;{}))&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;search_web&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;calculate&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;llm_with_tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bind_tools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tool_node&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ToolNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Define state schema
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TypedDict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Sequence&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;BaseMessage&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;operator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Define nodes
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;agent_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm_with_tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;should_continue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;last_message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;hasattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;last_message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;last_message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;

&lt;span class="c1"&gt;# Build the graph
&lt;/span&gt;&lt;span class="n"&gt;workflow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent_node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_entry_point&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_conditional_edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;should_continue&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Loop back after tool execution
&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Run the agent
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is 15% of the Groq free tier daily limit (14,400 RPD)?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__class__&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
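

Stripped of the framework, the graph above is just a loop: call the agent, route on whether it requested a tool, run the tool, and go back to the agent. A stdlib-only sketch of that control flow, where agent_step and tool_step are hypothetical stand-ins for real LLM and tool calls (not LangGraph APIs), only the routing logic is the point:

```python
# Hand-rolled version of the agent/tools loop that LangGraph's conditional
# edges express. The agent_step and tool_step stubs below are hypothetical
# stand-ins for real LLM and tool calls; only the control flow is real.

def agent_step(state):
    # Pretend the model requests one tool call, then answers.
    if state["tool_results"]:
        state["messages"].append("final answer using " + state["tool_results"][-1])
        state["pending_tool"] = None
    else:
        state["pending_tool"] = ("calculator", "0.15 * 14400")
    return state

def tool_step(state):
    name, expr = state["pending_tool"]
    state["tool_results"].append(f"{name}({expr}) = {eval(expr)}")
    state["pending_tool"] = None
    return state

def should_continue(state):
    # Mirrors the conditional edge: route to tools, or end the run.
    return "tools" if state["pending_tool"] else "end"

def run(state, max_steps=10):
    # set_entry_point("agent"), the conditional edge, and the tools-to-agent edge
    for _ in range(max_steps):
        state = agent_step(state)
        if should_continue(state) == "end":
            break
        state = tool_step(state)
    return state

result = run({"messages": [], "tool_results": [], "pending_tool": None})
```

Everything LangGraph adds (typed state, checkpointing, interrupts) layers on top of exactly this loop.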



&lt;h3&gt;
  
  
  LangGraph’s Killer Feature: Interrupts and Human-in-the-Loop
&lt;/h3&gt;

&lt;p&gt;Where LangGraph truly separates itself is in &lt;strong&gt;human-in-the-loop&lt;/strong&gt; workflows — a pattern that’s increasingly critical for production AI systems where you need a human to approve or redirect agent actions before they become irreversible.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.checkpoint.memory&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MemorySaver&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.types&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;interrupt&lt;/span&gt;

&lt;span class="c1"&gt;# With a checkpointer, you can pause execution mid-graph
&lt;/span&gt;&lt;span class="n"&gt;checkpointer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MemorySaver&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;review_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Pause here and wait for human input
&lt;/span&gt;    &lt;span class="n"&gt;human_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;interrupt&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;question&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Does this output look correct?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;current_output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;human_response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;approved&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Human rejected — add correction to messages and loop back
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;human_response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;correction&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])]}&lt;/span&gt;

&lt;span class="c1"&gt;# The graph pauses at 'review_node' until a human responds
# You resume it programmatically when the human submits their decision
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern is nearly impossible to implement cleanly in CrewAI or AutoGPT without building significant custom infrastructure around them.&lt;/p&gt;
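

To build intuition for what the checkpointer plus interrupt machinery is doing, the core idea, pause at a review point, hold state, resume with the human's decision, can be mimicked with a plain Python generator. This is a conceptual sketch only; LangGraph's real implementation serializes state through its checkpointer rather than keeping a live generator around:

```python
# Conceptual sketch of interrupt/resume using a generator: execution stops
# at the first yield (the "interrupt"), and send() resumes it with the
# human's decision. This is the control-flow idea, not LangGraph internals.

def reviewed_workflow(draft):
    decision = yield {"question": "Does this output look correct?",
                      "current_output": draft}
    if decision["approved"]:
        yield {"status": "published", "output": draft}
    else:
        yield {"status": "revised", "output": decision["correction"]}

wf = reviewed_workflow("Q1 revenue grew 12%")
request = next(wf)        # runs until the review "interrupt" and pauses
# ...time passes; a human eventually reviews the payload in `request`...
final = wf.send({"approved": False, "correction": "Q1 revenue grew 21%"})
```

The piece a generator cannot give you, and a checkpointer can, is surviving a process restart between the pause and the resume.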

&lt;h3&gt;
  
  
  What LangGraph Does Well
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Full control over execution flow:&lt;/strong&gt; Branching, looping, parallel subgraphs, and interrupt points — all first-class citizens.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Built-in persistence:&lt;/strong&gt; The checkpointing system lets you pause a workflow, store its state, and resume it later (even days later). Critical for long-running tasks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human-in-the-loop:&lt;/strong&gt; The &lt;code&gt;interrupt&lt;/code&gt; primitive is the cleanest implementation of human approval workflows in any agent framework.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LangSmith integration:&lt;/strong&gt; If you use LangSmith for observability, LangGraph traces are exceptionally detailed — every node, every LLM call, every tool invocation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production-ready patterns:&lt;/strong&gt; The LangGraph team publishes reference architectures for common patterns: ReAct, plan-and-execute, multi-agent supervisor, and more.&lt;/li&gt;
&lt;/ul&gt;
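

The persistence bullet is worth making concrete. A checkpointer's contract is small: serialize state keyed by a thread id, reload it on demand, so a run can stop today and resume tomorrow. A stdlib-only sketch of that contract, with made-up class and method names rather than LangGraph's actual API:

```python
import json
import os
import tempfile

# Illustrative sketch of the checkpointer contract: persist workflow state
# keyed by a thread id so a run can pause and resume later, even after a
# process restart. Names here are invented, not LangGraph's API.

class FileCheckpointer:
    def __init__(self, path):
        self.path = path

    def save(self, thread_id, state):
        checkpoints = self._load_all()
        checkpoints[thread_id] = state
        with open(self.path, "w") as f:
            json.dump(checkpoints, f)

    def load(self, thread_id):
        return self._load_all().get(thread_id)

    def _load_all(self):
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)

path = os.path.join(tempfile.mkdtemp(), "checkpoints.json")
FileCheckpointer(path).save("thread-42", {"step": 3, "messages": ["draft ready for review"]})

# A later process picks up exactly where the workflow paused:
resumed = FileCheckpointer(path).load("thread-42")
```

LangGraph ships the same idea with pluggable backends (in-memory, SQLite, Postgres, Redis) plus versioned checkpoints per step.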

&lt;h3&gt;
  
  
  LangGraph Limitations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Steep learning curve:&lt;/strong&gt; The TypedDict state model, conditional edge functions, and checkpointer setup add real complexity. Plan for 1–2 days to get comfortable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verbose boilerplate:&lt;/strong&gt; A LangGraph workflow that does what a 30-line CrewAI script does might need 100+ lines of setup code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LangChain baggage:&lt;/strong&gt; LangGraph’s tight LangChain integration means you’re dragged into LangChain’s abstraction layers whether you want them or not.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Over-engineering risk:&lt;/strong&gt; Many developers reach for LangGraph when CrewAI would have done the job in a third of the time.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Head-to-Head: How They Compare on What Matters
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Getting Started Speed
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Time to First Working Agent&lt;/th&gt;
&lt;th&gt;Learning Curve&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CrewAI&lt;/td&gt;
&lt;td&gt;~30 minutes&lt;/td&gt;
&lt;td&gt;Low — role/goal/task maps to natural thinking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AutoGPT (Platform)&lt;/td&gt;
&lt;td&gt;~15 minutes&lt;/td&gt;
&lt;td&gt;Very low — UI-based, no code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AutoGPT (SDK)&lt;/td&gt;
&lt;td&gt;~45 minutes&lt;/td&gt;
&lt;td&gt;Medium — less intuitive than CrewAI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LangGraph&lt;/td&gt;
&lt;td&gt;~2–4 hours&lt;/td&gt;
&lt;td&gt;High — requires understanding graph, state, edges&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Flexibility and Control
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;CrewAI&lt;/th&gt;
&lt;th&gt;AutoGPT&lt;/th&gt;
&lt;th&gt;LangGraph&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Conditional branching&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Via autonomous planning&lt;/td&gt;
&lt;td&gt;Full — first-class feature&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Loops / retry logic&lt;/td&gt;
&lt;td&gt;Basic&lt;/td&gt;
&lt;td&gt;Built-in&lt;/td&gt;
&lt;td&gt;Full control&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Parallel agent execution&lt;/td&gt;
&lt;td&gt;Planned&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Native support&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Human-in-the-loop&lt;/td&gt;
&lt;td&gt;Manual workaround&lt;/td&gt;
&lt;td&gt;Platform UI&lt;/td&gt;
&lt;td&gt;Native interrupt support&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;State persistence&lt;/td&gt;
&lt;td&gt;In-memory + basic RAG&lt;/td&gt;
&lt;td&gt;Platform-managed&lt;/td&gt;
&lt;td&gt;Pluggable checkpointers (memory, DB, Redis)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Debugging visibility&lt;/td&gt;
&lt;td&gt;Verbose logs&lt;/td&gt;
&lt;td&gt;Platform dashboard&lt;/td&gt;
&lt;td&gt;LangSmith traces&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Free-Tier AI API Compatibility
&lt;/h3&gt;

&lt;p&gt;All three frameworks work with free AI APIs — but the setup differs:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Groq&lt;/th&gt;
&lt;th&gt;Gemini&lt;/th&gt;
&lt;th&gt;Ollama (local)&lt;/th&gt;
&lt;th&gt;OpenAI-compatible&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CrewAI&lt;/td&gt;
&lt;td&gt;Set OPENAI_API_BASE env var&lt;/td&gt;
&lt;td&gt;Native via &lt;code&gt;langchain-google-genai&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Native via &lt;code&gt;ollama&lt;/code&gt; LLM class&lt;/td&gt;
&lt;td&gt;Automatic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AutoGPT&lt;/td&gt;
&lt;td&gt;Limited — best with OpenAI-format&lt;/td&gt;
&lt;td&gt;Platform integration&lt;/td&gt;
&lt;td&gt;Partial support&lt;/td&gt;
&lt;td&gt;Yes (SDK)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LangGraph&lt;/td&gt;
&lt;td&gt;Via &lt;code&gt;langchain_openai&lt;/code&gt; + base_url&lt;/td&gt;
&lt;td&gt;Native via &lt;code&gt;langchain-google-genai&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Via &lt;code&gt;langchain_ollama&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Automatic&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Winner for free-tier flexibility:&lt;/strong&gt; CrewAI and LangGraph are tied. Both make swapping between free-tier providers straightforward. AutoGPT’s SDK is less flexible; the platform requires specific integrations.&lt;/p&gt;
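

The reason they tie is the same mechanism: both ultimately send OpenAI-format chat-completion requests, so swapping free providers is a base-URL and model-name change. A sketch of just the request construction, no network call; the base URLs are the providers' documented OpenAI-compatible endpoints, but the model identifiers drift over time and the helper itself is illustrative:

```python
import json

# Illustrative helper: the same OpenAI-format body works against any
# OpenAI-compatible endpoint; only base_url, key, and model change.
# Model names are examples and may be outdated by the time you read this.
PROVIDERS = {
    "groq":       ("https://api.groq.com/openai/v1", "llama-3.3-70b-versatile"),
    "ollama":     ("http://localhost:11434/v1",      "llama3.2"),
    "openrouter": ("https://openrouter.ai/api/v1",   "meta-llama/llama-3.3-70b-instruct:free"),
}

def chat_request(provider, api_key, prompt):
    base_url, model = PROVIDERS[provider]
    return {
        "url": base_url + "/chat/completions",
        "headers": {"Authorization": "Bearer " + api_key,
                    "Content-Type": "application/json"},
        "body": json.dumps({"model": model,
                            "messages": [{"role": "user", "content": prompt}]}),
    }

req = chat_request("groq", "gsk_example", "Say hi in five words")
```

Point CrewAI's OPENAI_API_BASE or LangGraph's base_url at any of these and the frameworks never notice the difference.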

&lt;h3&gt;
  
  
  Production Readiness
&lt;/h3&gt;

&lt;p&gt;This is where the differences matter most:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CrewAI:&lt;/strong&gt; Solid for production use cases where the workflow is well-defined and sequential. CrewAI Cloud and CrewAI Enterprise add managed hosting, monitoring, and scheduling. The open-source version alone gets you surprisingly far for MVP-level production deployments.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AutoGPT:&lt;/strong&gt; The platform is production-ready for no-code automation workflows. The SDK is better suited for experimentation than production. The autonomous loop is hard to make reliable at scale — errors compound.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LangGraph:&lt;/strong&gt; The most production-ready of the three for complex, stateful workflows. LangGraph Cloud (managed hosting) includes persistence, monitoring, and high-availability. The graph model forces you to think clearly about failure modes, which pays off at scale.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Which Framework Should You Use?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Use CrewAI When:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You’re building a pipeline with clear, distinct roles (researcher → writer → reviewer)&lt;/li&gt;
&lt;li&gt;You need to ship something working in a day or two&lt;/li&gt;
&lt;li&gt;Your workflow is mostly sequential with occasional decision points&lt;/li&gt;
&lt;li&gt;You want to run locally with Ollama or use free-tier APIs like Groq or Gemini&lt;/li&gt;
&lt;li&gt;You’re new to agent frameworks and don’t want to learn LangChain’s abstractions first&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example projects:&lt;/strong&gt; Content research pipelines, automated report generation, code review assistants, customer feedback analysis, lead qualification workflows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use AutoGPT When:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You want a no-code interface (AutoGPT Platform)&lt;/li&gt;
&lt;li&gt;You need the agent to handle open-ended, unpredictable tasks where the steps aren’t known in advance&lt;/li&gt;
&lt;li&gt;You’re building a personal productivity tool where occasional errors are acceptable&lt;/li&gt;
&lt;li&gt;You need broad out-of-the-box tool integrations (calendar, email, files) without custom code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example projects:&lt;/strong&gt; Personal research assistant, automated scheduling, email triage, exploratory data gathering tasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use LangGraph When:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Your workflow has complex branching: “if X do A, else if Y do B, else loop back to C”&lt;/li&gt;
&lt;li&gt;You need human-in-the-loop approval at specific points&lt;/li&gt;
&lt;li&gt;You’re building something that needs to pause, persist state, and resume later&lt;/li&gt;
&lt;li&gt;You need detailed observability and are willing to invest in LangSmith&lt;/li&gt;
&lt;li&gt;You’re already in the LangChain ecosystem and want to extend existing chains into agents&lt;/li&gt;
&lt;li&gt;Correctness and auditability matter more than development speed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example projects:&lt;/strong&gt; Financial document review with human sign-off, legal contract analysis pipelines, multi-step code generation with testing loops, long-running data processing jobs that need to resume after failures.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using These Frameworks with Free AI APIs
&lt;/h2&gt;

&lt;p&gt;One of the best things about all three frameworks: they work with free-tier AI APIs, which means you can build serious agent systems at zero cost. Here’s how to pair them effectively:&lt;/p&gt;

&lt;h3&gt;
  
  
  Best Free API Combinations
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;th&gt;Recommended Free API&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;High-throughput agent loops&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://console.groq.com/" rel="noopener noreferrer"&gt;Groq&lt;/a&gt; (Llama 3.3 70B)&lt;/td&gt;
&lt;td&gt;300–500 tokens/s means fast agent iteration; 14,400 req/day free&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Long-context reasoning&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://aistudio.google.com/" rel="noopener noreferrer"&gt;Google Gemini&lt;/a&gt; (2.5 Flash)&lt;/td&gt;
&lt;td&gt;1M token context, 1,500 req/day, multimodal — unmatched for free&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local, private agents&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; (Llama 3.2, Qwen2.5)&lt;/td&gt;
&lt;td&gt;Runs on your machine, no rate limits, no API keys, fully private&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model variety / failover&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://openrouter.ai/" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt; free models&lt;/td&gt;
&lt;td&gt;300+ models including free Llama and Mistral variants; single API key&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
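

A practical consequence of the table: because every row speaks roughly the same API, you can chain providers as fallbacks when one free tier's limit trips. A stdlib-only sketch of the pattern; the provider callables are stubs standing in for real client calls:

```python
# Failover sketch: try free providers in order, falling through on
# rate-limit errors. The two callables are stubs, not real clients.

class RateLimited(Exception):
    pass

def with_fallback(providers, prompt):
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except RateLimited as e:
            errors.append(f"{name}: {e}")
    raise RuntimeError("all providers exhausted: " + "; ".join(errors))

def groq_stub(prompt):
    raise RateLimited("14,400 RPD quota hit")   # simulate the daily cap

def gemini_stub(prompt):
    return "answer from gemini-2.5-flash"

used, answer = with_fallback([("groq", groq_stub), ("gemini", gemini_stub)], "hi")
```

In a real deployment each stub would wrap a client configured with that provider's base URL and key, in the order of the table above.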

&lt;h3&gt;
  
  
  OpenClaw + CrewAI: A Practical Pattern
&lt;/h3&gt;

&lt;p&gt;If you’re using &lt;a href="https://openclaw.ai/" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; to run Claude Code in the cloud, CrewAI is a natural fit for building automated development workflows. A common pattern: a CrewAI crew where one agent plans changes (using Groq’s free Llama for speed), another agent writes code, and a third agent reviews the diff — all running on free-tier APIs, orchestrated through a Python script that you kick off from your OpenClaw session.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# crewai_dev_pipeline.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;crewai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Crew&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Process&lt;/span&gt;

&lt;span class="c1"&gt;# Mix and match free APIs per agent based on their needs
&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-groq-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_BASE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.groq.com/openai/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_MODEL_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama-3.3-70b-versatile&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;planner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Software Architect&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Break down feature requests into clear, implementable tasks&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;backstory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a senior engineer who writes precise technical specifications.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;coder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Python Developer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Implement features based on the architect&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s specifications&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;backstory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You write clean, well-tested Python code following PEP 8.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;reviewer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Code Reviewer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Review code for bugs, security issues, and best practices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;backstory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You catch subtle bugs and provide constructive, specific feedback.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;plan_task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Given the feature request: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;{feature}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, create a technical implementation plan.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;expected_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A numbered list of implementation steps with file names and function signatures.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;planner&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;code_task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Implement the feature following the plan. Write complete, runnable Python code.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;expected_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Complete Python implementation with docstrings and basic error handling.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;coder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;plan_task&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;review_task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Review the implementation for correctness, edge cases, and security issues.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;expected_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A review report listing: bugs found, security concerns, and suggestions.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;reviewer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;code_task&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;crew&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Crew&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;planner&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;coder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reviewer&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;plan_task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;code_task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;review_task&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sequential&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;crew&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;kickoff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;feature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Add rate limiting middleware to a FastAPI application&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Combining Frameworks: When the Best Answer Is “Both”
&lt;/h2&gt;

&lt;p&gt;An underappreciated pattern: use CrewAI for the role-based orchestration layer and LangGraph for a specific subworkflow that needs fine-grained control. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A CrewAI crew handles the overall research → analysis → report pipeline&lt;/li&gt;
&lt;li&gt;The “analysis” agent is backed by a LangGraph subgraph that implements a plan-verify-revise loop with conditional retries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This gives you CrewAI’s easy agent definition and LangGraph’s precise flow control where you actually need it, without paying LangGraph’s boilerplate cost everywhere.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Framework You Don’t Need Yet: AutoGen
&lt;/h2&gt;

&lt;p&gt;Microsoft’s AutoGen deserves a mention as a fourth option. It’s powerful, especially for coding agents and multi-agent conversational patterns. But the API changed significantly between v0.2 and v0.4, making production usage riskier. If you’re evaluating CrewAI, AutoGPT, and LangGraph, finish that evaluation before adding AutoGen to the mix — the additional complexity rarely pays off unless you specifically need Microsoft’s conversational multi-agent patterns.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance and Cost on Free Tiers
&lt;/h2&gt;

&lt;p&gt;When you’re running agent frameworks on free APIs, a few practical rules keep costs (measured in rate limit hits) manageable:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Minimize agent count:&lt;/strong&gt; Every agent in a crew is at least one LLM call. Start with 2–3 agents, not 7.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use small models for simple tasks:&lt;/strong&gt; A routing or classification agent doesn’t need a 70B model. Use Groq’s Llama 3.2 3B (2,000+ tokens/s on the free tier) for simple decisions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache intermediate results:&lt;/strong&gt; If your workflow re-runs frequently, cache tool call results. A researcher agent shouldn’t re-search for the same information on every run.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set max_iter limits:&lt;/strong&gt; All three frameworks let you cap iterations — CrewAI agents accept &lt;code&gt;max_iter&lt;/code&gt;, LangGraph enforces a per-run &lt;code&gt;recursion_limit&lt;/code&gt;, and AutoGPT supports maximum step counts. Always set them. An agent that gets stuck in a loop will exhaust your daily quota in minutes.&lt;/li&gt;
&lt;/ol&gt;
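

Rules 3 and 4 combine naturally in code: memoize tool calls so reruns cost nothing, and hard-cap the loop so a stuck agent cannot drain the quota. A minimal sketch, with a call counter and stub tool that are illustrative only:

```python
from functools import lru_cache

# Rules 3 and 4 in miniature: cache repeated tool calls, and cap iterations
# so a looping agent cannot burn the daily quota. The tool is a stub; the
# counter stands in for "real API requests made".

search_calls = 0

@lru_cache(maxsize=256)
def search_tool(query):
    global search_calls
    search_calls += 1          # each cache miss would be one real request
    return f"results for {query!r}"

def run_agent(queries, max_iter=5):
    # Hard cap: never loop past max_iter, even if the plan keeps growing.
    return [search_tool(q) for q in queries[:max_iter]]

out = run_agent(["groq limits", "groq limits", "gemini limits"])
```

Two unique queries means two real calls; the duplicate is served from cache, and the cap bounds the worst case regardless of what the agent decides to do.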

&lt;h2&gt;
  
  
  Verdict: What to Actually Use in 2026
&lt;/h2&gt;

&lt;p&gt;The honest answer: &lt;strong&gt;CrewAI is the right starting point for most developers.&lt;/strong&gt; It has the best balance of power and approachability, works with every free AI API, and has a large enough community that you’ll find examples for almost any use case.&lt;/p&gt;

&lt;p&gt;Graduate to &lt;strong&gt;LangGraph&lt;/strong&gt; when you hit CrewAI’s limits — specifically when you need conditional branching, state persistence across sessions, or human approval checkpoints. That moment is clearly identifiable: you’ll find yourself writing ugly workarounds in CrewAI that LangGraph would handle natively.&lt;/p&gt;

&lt;p&gt;Use &lt;strong&gt;AutoGPT&lt;/strong&gt; if you need the no-code platform for non-technical users, or if you’re exploring open-ended autonomous tasks where you genuinely don’t know the required steps in advance. Skip the AutoGPT SDK in favor of CrewAI or LangGraph for any serious Python development.&lt;/p&gt;

&lt;p&gt;All three frameworks are actively maintained, free to use, and capable of powering real production systems — the choice is about matching the tool’s mental model to your problem, not about finding the “best” framework in the abstract.&lt;/p&gt;

&lt;p&gt;Start with CrewAI + Groq free tier this afternoon. You’ll have something working before dinner.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Reads
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/n8n-workflow-automation/" rel="noopener noreferrer"&gt;n8n: Open-Source Workflow Automation with AI Agents and 400+ Integrations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/mcp-protocol-ai-agents/" rel="noopener noreferrer"&gt;MCP (Model Context Protocol): Connect AI Agents to Any Tool or API&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/notebooklm-ai-research/" rel="noopener noreferrer"&gt;Google NotebookLM: Free AI Research Tool for Summarizing Documents and PDFs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/dify-ai-app-builder/" rel="noopener noreferrer"&gt;Dify: Free Open-Source AI App Builder for Chatbots and Workflows&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/crewai-multi-agent-framework/" rel="noopener noreferrer"&gt;CrewAI: Free Open-Source Multi-Agent AI Framework for Python&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://toolfreebie.com/crewai-vs-autogpt-vs-langgraph/" rel="noopener noreferrer"&gt;toolfreebie.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
    </item>
    <item>
      <title>Groq vs Cerebras vs Gemini: Which Free AI API Is Actually Fastest in 2026?</title>
      <dc:creator>toolfreebie</dc:creator>
      <pubDate>Sun, 03 May 2026 11:57:02 +0000</pubDate>
      <link>https://dev.to/build996/groq-vs-cerebras-vs-gemini-which-free-ai-api-is-actually-fastest-in-2026-25h1</link>
      <guid>https://dev.to/build996/groq-vs-cerebras-vs-gemini-which-free-ai-api-is-actually-fastest-in-2026-25h1</guid>
      <description>&lt;h2&gt;
  
  
  The Free AI Speed War: Groq vs Cerebras vs Gemini
&lt;/h2&gt;

&lt;p&gt;Speed is back at the center of the AI API debate — and not just in marketing copy. In 2026, the gap between a slow free API and a fast one is the difference between an AI tool that feels broken and one that feels like magic. And three providers are fighting hard for the top spot: &lt;strong&gt;Groq&lt;/strong&gt;, &lt;strong&gt;Cerebras&lt;/strong&gt;, and &lt;strong&gt;Google Gemini&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;All three offer genuinely free tiers. All three are fast enough to make GPT-4o feel sluggish by comparison. But they’re fast in different ways, for different reasons, with different trade-offs. This guide breaks down what the numbers actually mean and when you should pick each one.&lt;/p&gt;

&lt;p&gt;I’ve tested all three extensively while building AI tools with &lt;a href="https://openclaw.ai/" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt;, and the results are more nuanced than any single benchmark can capture.&lt;/p&gt;

&lt;h2&gt;
  
  
  What “Speed” Actually Means for AI APIs
&lt;/h2&gt;

&lt;p&gt;Before getting to the numbers, it’s worth being precise about what developers usually care about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Time to First Token (TTFT):&lt;/strong&gt; How long before you see the first word? Critical for interactive chat and streaming UX.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Throughput (tokens/second):&lt;/strong&gt; How fast does the full response arrive? Critical for agent loops, batch processing, and long outputs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Request latency (end-to-end):&lt;/strong&gt; TTFT + generation time + network. What your users actually experience.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Daily throughput capacity:&lt;/strong&gt; Rate limits (requests and tokens per day) combined with raw speed. How much total work can you get done in 24 hours for free?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Different providers optimize for different things. Groq’s LPU is designed for raw throughput. Cerebras’ Wafer-Scale Engine eliminates memory bandwidth bottlenecks. Gemini’s infrastructure is optimized for massive scale with very generous daily limits. Knowing which metric you care about most determines which provider wins for your use case.&lt;/p&gt;
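&lt;p&gt;These metrics combine into a simple back-of-envelope model: end-to-end latency is roughly TTFT plus output tokens divided by throughput. Plugging in rough midpoints of the ranges observed later in this article (illustrative numbers, not guarantees):&lt;/p&gt;

```python
def estimated_latency_s(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Back-of-envelope end-to-end latency: time to first token + generation time."""
    return ttft_s + output_tokens / tokens_per_s

# Approximate midpoints of observed ranges, treated as illustrative inputs
providers = {
    "Cerebras 70B": (0.15, 475),
    "Groq 70B": (0.30, 400),
    "Gemini 2.5 Flash": (0.60, 150),
}
for name, (ttft, tps) in providers.items():
    print(f"{name}: ~{estimated_latency_s(ttft, tps, 500):.1f}s for a 500-token reply")
```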

&lt;h2&gt;
  
  
  Provider Overviews
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Groq: The LPU Challenger
&lt;/h3&gt;

&lt;p&gt;Groq built custom Language Processing Units (LPUs) from the ground up for AI inference. Unlike GPUs — which were originally designed for graphics and repurposed for AI — LPUs have a deterministic, pipelined architecture optimized specifically for the sequential token generation that transformer models require.&lt;/p&gt;

&lt;p&gt;The result: on Groq’s free tier, Llama 3.3 70B typically streams at &lt;strong&gt;300–500 tokens per second&lt;/strong&gt;, while the smaller 8B model hits 1,500–2,000 tokens/s. No credit card required, and Groq hosts 16+ models, including reasoning models like DeepSeek R1.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cerebras: The Wafer-Scale Engine
&lt;/h3&gt;

&lt;p&gt;Cerebras went even further than custom chips — they built a chip the size of a dinner plate. The Wafer-Scale Engine 3 (WSE-3) has 46,225 mm² of die area (57x bigger than the largest GPU die) and enough on-chip SRAM to store the full weights of Llama 3.1 70B. No external memory fetches means no memory bandwidth bottleneck.&lt;/p&gt;

&lt;p&gt;The numbers: &lt;strong&gt;~2,100 tokens/second on the 8B model, ~450–500 tokens/second on 70B&lt;/strong&gt;. The catch is a smaller context window (8K tokens) and lower daily request limits (~900 RPD). But for short, latency-sensitive completions, nothing publicly available comes close.&lt;/p&gt;

&lt;h3&gt;
  
  
  Google Gemini: The Scale Play
&lt;/h3&gt;

&lt;p&gt;Google’s free Gemini API tier isn’t trying to win on raw throughput — it’s trying to win on what the model can actually do. Gemini 2.5 Flash on the free tier runs at around &lt;strong&gt;100–200 tokens/second&lt;/strong&gt; (slower than Groq or Cerebras), but it comes with a &lt;strong&gt;1 million token context window&lt;/strong&gt;, multimodal input (images, audio, video, documents), and some of the most generous free rate limits available:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1,500 requests per day&lt;/li&gt;
&lt;li&gt;1 million tokens per minute (with Gemini 2.5 Flash)&lt;/li&gt;
&lt;li&gt;Gemini 2.5 Pro available on free tier (limited)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your task involves processing long documents, analyzing images, or building research tools, Gemini wins — not on speed, but on capability per dollar (which is zero).&lt;/p&gt;

&lt;h2&gt;
  
  
  Head-to-Head: Speed Benchmarks
&lt;/h2&gt;

&lt;p&gt;The table below uses real-world observed numbers from hands-on testing, not just marketing claims. Speeds vary based on load, model, and prompt length.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Best Free Model&lt;/th&gt;
&lt;th&gt;8B-class Speed&lt;/th&gt;
&lt;th&gt;70B-class Speed&lt;/th&gt;
&lt;th&gt;TTFT (typical)&lt;/th&gt;
&lt;th&gt;Context Window&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cerebras&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Llama 3.3 70B&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~2,100 tokens/s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~450–500 tokens/s&lt;/td&gt;
&lt;td&gt;~100–200ms&lt;/td&gt;
&lt;td&gt;8K tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Groq&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Llama 3.3 70B&lt;/td&gt;
&lt;td&gt;~1,500–2,000 tokens/s&lt;/td&gt;
&lt;td&gt;~300–500 tokens/s&lt;/td&gt;
&lt;td&gt;~200–400ms&lt;/td&gt;
&lt;td&gt;128K tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Gemini Flash&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Gemini 2.5 Flash&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;~100–200 tokens/s&lt;/td&gt;
&lt;td&gt;~400–800ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1M tokens&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI GPT-4o (paid)&lt;/td&gt;
&lt;td&gt;GPT-4o&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;~50–100 tokens/s&lt;/td&gt;
&lt;td&gt;~500–1500ms&lt;/td&gt;
&lt;td&gt;128K tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Note: Speeds are approximate and vary by load, time of day, and prompt characteristics. Cerebras and Groq both have occasional rate-limit-induced slowdowns during peak hours.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The raw speed ranking: Cerebras &amp;gt; Groq &amp;gt; Gemini&lt;/strong&gt;. But speed isn’t the only metric that matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  Free Tier Rate Limits Compared
&lt;/h2&gt;

&lt;p&gt;This is where the picture gets more nuanced. Raw speed means nothing if you hit a rate limit every few minutes.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Cerebras&lt;/th&gt;
&lt;th&gt;Groq (per model)&lt;/th&gt;
&lt;th&gt;Gemini 2.5 Flash&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Requests per minute (RPM)&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Requests per day (RPD)&lt;/td&gt;
&lt;td&gt;~900&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;14,400&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1,500&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tokens per minute (TPM)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;60,000&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;6,000–20,000&lt;/td&gt;
&lt;td&gt;1,000,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Daily token capacity&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Very High&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Credit card required&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context window&lt;/td&gt;
&lt;td&gt;8K tokens&lt;/td&gt;
&lt;td&gt;128K tokens&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1M tokens&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multimodal support&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes (image, audio, video)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The practical numbers here: if you’re making 1,000 short API calls per day, Groq’s 14,400 RPD gives you far more headroom than Cerebras’ ~900 RPD. If you’re processing one massive document at a time, Gemini’s 1M context window means you don’t need to chunk at all. If you need a burst of fast processing within a minute, Cerebras’ 60,000 TPM lets you fly through a big batch.&lt;/p&gt;
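&lt;p&gt;You can sanity-check those claims with arithmetic. Here is a quick sketch of how many calls per day each tier actually sustains, using the limits from the table above (limits change over time, so treat these constants as a snapshot):&lt;/p&gt;

```python
def daily_call_ceiling(rpm: int, rpd: int, tpm: int, tokens_per_call: int) -> int:
    """Calls per day a free tier sustains, whichever limit binds first."""
    per_minute = min(rpm, tpm // tokens_per_call)  # calls allowed in any minute
    return min(per_minute * 60 * 24, rpd)          # then capped by the daily quota

# Limits from the comparison table; 2,000 tokens/call is a working assumption
tiers = {
    "Cerebras": (30, 900, 60_000),
    "Groq": (30, 14_400, 20_000),
    "Gemini 2.5 Flash": (10, 1_500, 1_000_000),
}
for name, (rpm, rpd, tpm) in tiers.items():
    print(f"{name}: up to {daily_call_ceiling(rpm, rpd, tpm, 2_000):,} calls/day")
```

&lt;p&gt;At 2,000 tokens per call, Cerebras’ daily request cap binds long before its generous TPM does, which is exactly why Groq wins for all-day agent workloads.&lt;/p&gt;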

&lt;h2&gt;
  
  
  Which API Is Actually Fastest for Your Use Case
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Real-Time Chat Applications
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Winner: Cerebras (short prompts) or Groq (longer conversations)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For a real-time chat app where users see tokens streaming in, speed is everything. At 2,100 tokens/second, Cerebras makes even small models feel magical — the first sentence appears before users finish reading the prompt. Groq is nearly as good at 1,500–2,000 tokens/s on 8B models and has the advantage of a 128K context window, meaning you won’t hit limits as conversations grow long.&lt;/p&gt;

&lt;p&gt;Gemini is noticeably slower here. It’s not unusable, but the difference is perceptible in side-by-side testing — especially for longer responses where the lower throughput adds up.&lt;/p&gt;

&lt;h3&gt;
  
  
  AI Agent Loops (Many Small Calls)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Winner: Groq (volume) or Cerebras (speed)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;AI agents make many small LLM calls — routing decisions, tool selection, field extraction, step summarization. If each call is under 2K tokens, Cerebras is fastest. But agents can easily hit 900 daily requests if they’re active, and Groq’s 14,400 RPD ceiling means you’re much less likely to be throttled. In an agentic workload running all day, Groq will actually complete more total work than Cerebras.&lt;/p&gt;

&lt;h3&gt;
  
  
  Document Analysis and Research
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Winner: Gemini (by a large margin)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Groq’s 128K context is good. But Gemini’s 1M token context window changes the category entirely. You can feed a full codebase, a book, a year’s worth of emails, or an entire research paper collection into a single prompt. Neither Groq nor Cerebras can compete with this. If document analysis is your primary use case, Gemini is the only answer in the free tier.&lt;/p&gt;

&lt;h3&gt;
  
  
  Image and Multimodal Tasks
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Winner: Gemini (only option)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cerebras is text-only. Groq has very limited vision support in preview. Gemini 2.5 Flash handles images, PDFs, audio, and video natively — and it’s free. For anything involving non-text inputs, Gemini is the only serious option in the free tier.&lt;/p&gt;

&lt;h3&gt;
  
  
  Batch Processing and Data Labeling
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Winner: Depends on batch size&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cerebras wins if your batches are short (under 4K tokens each) and you need fast turnaround within a minute — 60K TPM means you can generate a lot of tokens fast. Groq wins if you need sustained throughput over a full day (14,400 RPD). Gemini wins if each item in your batch is a long document or contains images.&lt;/p&gt;
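&lt;p&gt;For batch jobs, the limit you hit first is usually tokens per minute. One simple way to stay under it: track spend in a one-minute window and sleep when the budget runs out. A minimal sketch (the 60,000-token default matches Cerebras’ listed TPM; the cap and call sizes are assumptions to adapt to your provider):&lt;/p&gt;

```python
import time

class TokenBudget:
    """Pace batch API calls so token spend in any one-minute window stays under a cap."""
    def __init__(self, tpm: int = 60_000):
        self.tpm = tpm
        self.window_start = time.monotonic()
        self.spent = 0

    def reserve(self, tokens: int) -> None:
        """Call before each request; blocks until `tokens` fit in the current window."""
        now = time.monotonic()
        if now - self.window_start >= 60:
            self.window_start, self.spent = now, 0      # fresh minute, reset spend
        elif self.spent + tokens > self.tpm:
            time.sleep(60 - (now - self.window_start))  # wait out the window
            self.window_start, self.spent = time.monotonic(), 0
        self.spent += tokens

budget = TokenBudget(tpm=60_000)
budget.reserve(4_000)  # then fire the actual API request for this batch item
```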

&lt;h3&gt;
  
  
  High-Quality Reasoning and Complex Tasks
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Winner: Gemini 2.5 Pro (free, limited) or Groq (DeepSeek R1)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Groq’s free tier includes DeepSeek R1 distill models and QwQ-32B — both capable reasoning models. Gemini 2.5 Pro on the free tier (though more limited in requests) is genuinely state-of-the-art on complex reasoning benchmarks. Cerebras only runs Llama and Qwen models, which are strong but not in the same class as Gemini 2.5 Pro for hard tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Get Your API Keys
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Groq
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;a href="https://console.groq.com" rel="noopener noreferrer"&gt;console.groq.com&lt;/a&gt; and sign up&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;API Keys&lt;/strong&gt; in the sidebar&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Create API Key&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Cerebras
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;a href="https://cloud.cerebras.ai" rel="noopener noreferrer"&gt;cloud.cerebras.ai&lt;/a&gt; and create an account&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;API Keys&lt;/strong&gt; in the left sidebar&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Create new API key&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Google Gemini
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;a href="https://aistudio.google.com/app/apikey" rel="noopener noreferrer"&gt;Google AI Studio&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Create API key&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;No billing required for the free tier&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All three require no credit card and take under five minutes to set up.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmark Test Script: Measure Speed Yourself
&lt;/h2&gt;

&lt;p&gt;Don’t take these numbers on faith — test them yourself. Here’s a script that measures time to first token and throughput for all three providers with the same prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;google.generativeai&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;

&lt;span class="c1"&gt;# Configure all three clients
&lt;/span&gt;&lt;span class="n"&gt;groq_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GROQ_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.groq.com/openai/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;cerebras_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CEREBRAS_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.cerebras.ai/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;configure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GEMINI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;gemini_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GenerativeModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.0-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;TEST_PROMPT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a detailed explanation of how transformer attention mechanisms work, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;including the mathematical formulation of scaled dot-product attention, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;multi-head attention, and how positional encodings are applied. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Include Python code examples where relevant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;benchmark_openai_compatible&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;provider_name&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Benchmark an OpenAI-compatible streaming endpoint.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;[&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;provider_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] Starting benchmark...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;first_token_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;token_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TEST_PROMPT&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
        &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;800&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;delta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;first_token_time&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;first_token_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                &lt;span class="n"&gt;ttft&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;first_token_time&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;
                &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Time to first token: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ttft&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;token_count&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

    &lt;span class="n"&gt;elapsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;
    &lt;span class="n"&gt;tokens_per_sec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;token_count&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;elapsed&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;elapsed&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Total time: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;elapsed&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Estimated throughput: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tokens_per_sec&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; words/s (~&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tokens_per_sec&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;1.3&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; tokens/s)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;benchmark_gemini&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Benchmark Gemini with streaming.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;[Gemini] Starting benchmark...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;first_token_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;token_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gemini_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TEST_PROMPT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;first_token_time&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;first_token_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                &lt;span class="n"&gt;ttft&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;first_token_time&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;
                &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Time to first token: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ttft&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;token_count&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

    &lt;span class="n"&gt;elapsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;
    &lt;span class="n"&gt;tokens_per_sec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;token_count&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;elapsed&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;elapsed&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Total time: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;elapsed&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Estimated throughput: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tokens_per_sec&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; words/s (~&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tokens_per_sec&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;1.3&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; tokens/s)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Run benchmarks
&lt;/span&gt;&lt;span class="nf"&gt;benchmark_openai_compatible&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;groq_client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama-3.3-70b-versatile&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Groq (Llama 3.3 70B)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;benchmark_openai_compatible&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cerebras_client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama-3.3-70b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cerebras (Llama 3.3 70B)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;benchmark_gemini&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When I ran this on a typical afternoon (mid-load), the output looked like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Groq (Llama 3.3 70B)] Starting benchmark...
  Time to first token: 0.381s
  Total time: 3.92s
  Estimated throughput: 157 words/s (~204 tokens/s)

[Cerebras (Llama 3.3 70B)] Starting benchmark...
  Time to first token: 0.152s
  Total time: 2.87s
  Estimated throughput: 215 words/s (~280 tokens/s)

[Gemini] Starting benchmark...
  Time to first token: 0.621s
  Total time: 9.44s
  Estimated throughput: 62 words/s (~81 tokens/s)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Results vary significantly by time of day and server load — run the benchmark several times and average the results for a realistic picture.&lt;/p&gt;
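&lt;p&gt;To automate that, a small helper can run a benchmark several times and report medians, which shrug off one-off load spikes better than means. A minimal sketch (the &lt;code&gt;run_once&lt;/code&gt; callable is a stand-in for any of the benchmark functions above, assumed to return a &lt;code&gt;(ttft, total)&lt;/code&gt; pair):&lt;/p&gt;

```python
import statistics

def aggregate_runs(run_once, n=5):
    """Call run_once() n times; it must return (ttft_seconds, total_seconds).
    Returns the median of each metric across the runs."""
    samples = [run_once() for _ in range(n)]
    ttft_median = statistics.median(s[0] for s in samples)
    total_median = statistics.median(s[1] for s in samples)
    return ttft_median, total_median

# Canned numbers standing in for real runs; note the one load spike.
fake_runs = iter([(0.38, 3.9), (0.41, 4.2), (1.90, 9.8), (0.36, 3.8), (0.40, 4.0)])
ttft, total = aggregate_runs(lambda: next(fake_runs), n=5)
print(f"median TTFT: {ttft:.2f}s, median total: {total:.2f}s")
```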

&lt;h2&gt;
  
  
  Multi-Provider Setup: Using All Three for Free
&lt;/h2&gt;

&lt;p&gt;The real power move is using all three APIs together. Each has a different strength, and they’re all free. Here’s a routing pattern that picks the right provider based on prompt characteristics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;google.generativeai&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;

&lt;span class="n"&gt;cerebras&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CEREBRAS_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.cerebras.ai/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;groq&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GROQ_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.groq.com/openai/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;configure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GEMINI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;gemini&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GenerativeModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.0-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;smart_complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;has_image&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;expect_long_context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;need_reasoning&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Route to the best free provider based on task requirements.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="c1"&gt;# Multimodal: only Gemini supports it
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;has_image&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gemini&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;

    &lt;span class="c1"&gt;# Long context: Groq (128K) or Gemini (1M)
&lt;/span&gt;    &lt;span class="n"&gt;estimated_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;1.3&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;estimated_tokens&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;8000&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;expect_long_context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;estimated_tokens&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;100_000&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gemini&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;groq&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama-3.3-70b-versatile&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

    &lt;span class="c1"&gt;# Complex reasoning: use Groq's DeepSeek R1
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;need_reasoning&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;groq&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepseek-r1-distill-llama-70b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

    &lt;span class="c1"&gt;# Default: Cerebras for maximum speed on short prompts
&lt;/span&gt;    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cerebras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama-3.3-70b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Fallback to Groq if Cerebras hits daily limits
&lt;/span&gt;        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;groq&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama-3.3-70b-versatile&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

&lt;span class="c1"&gt;# Examples
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;smart_complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarize what a REST API is in two sentences.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="c1"&gt;# → Uses Cerebras (short prompt, maximum speed)
&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;smart_complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Analyze the following 50-page contract...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;expect_long_context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="c1"&gt;# → Uses Groq (fits in 128K)
&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;smart_complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Solve this logic puzzle step by step...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;need_reasoning&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="c1"&gt;# → Uses Groq with DeepSeek R1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Connecting to OpenClaw: One Config, Three Providers
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://openclaw.ai/" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; supports multiple providers via a single config file. Register each API once and switch between them with a model flag, and you get a free AI coding agent that uses the best provider for each task. The config below wires up the two OpenAI-compatible providers, Cerebras and Groq; Gemini can be added the same way via Google's OpenAI-compatible endpoint.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"merge"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"providers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"cerebras"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"baseUrl"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://api.cerebras.ai/v1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"apiKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"YOUR_CEREBRAS_API_KEY"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"api"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai-completions"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"llama-3.3-70b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Llama 3.3 70B (Cerebras - Ultra Fast)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"contextWindow"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8192&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"maxTokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"groq"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"baseUrl"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://api.groq.com/openai/v1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"apiKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"YOUR_GROQ_API_KEY"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"api"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai-completions"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"llama-3.3-70b-versatile"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Llama 3.3 70B (Groq - Long Context)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"contextWindow"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;131072&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"maxTokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8192&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"deepseek-r1-distill-llama-70b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DeepSeek R1 (Groq - Reasoning)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"contextWindow"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;131072&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"maxTokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8192&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"agents"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"defaults"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"primary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cerebras/llama-3.3-70b"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Save this to &lt;code&gt;~/.openclaw/openclaw.json&lt;/code&gt;. The default model is Cerebras for fast responses on short coding tasks. When working on a large codebase that exceeds 8K tokens of context, switch to &lt;code&gt;groq/llama-3.3-70b-versatile&lt;/code&gt; with the &lt;code&gt;--model&lt;/code&gt; flag. For hard debugging or algorithmic problems, use &lt;code&gt;groq/deepseek-r1-distill-llama-70b&lt;/code&gt; for step-by-step reasoning.&lt;/p&gt;
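&lt;p&gt;In practice the switch is one flag. The exact CLI shape below is an assumption extrapolated from the &lt;code&gt;--model&lt;/code&gt; flag described above; check &lt;code&gt;openclaw --help&lt;/code&gt; for the syntax your installed version actually uses:&lt;/p&gt;

```shell
# Default (Cerebras) for quick edits; no flag needed.
openclaw "rename this function and update its call sites"

# Large-codebase work that needs more than 8K tokens of context.
openclaw --model groq/llama-3.3-70b-versatile "summarize the architecture of src/"

# Hard debugging with step-by-step reasoning.
openclaw --model groq/deepseek-r1-distill-llama-70b "find the race condition in worker.py"
```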

&lt;p&gt;Using all three free APIs with OpenClaw gives you a capable, genuinely free AI coding assistant that rivals paid tools — you just route the right tasks to the right provider.&lt;/p&gt;

&lt;h2&gt;
  
  
  Latency vs Throughput: Which Matters More for You?
&lt;/h2&gt;

&lt;p&gt;A quick decision guide:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;th&gt;Key Metric&lt;/th&gt;
&lt;th&gt;Best Pick&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Streaming chat UI&lt;/td&gt;
&lt;td&gt;TTFT + throughput&lt;/td&gt;
&lt;td&gt;Cerebras (8B) or Groq&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI agent (many small calls)&lt;/td&gt;
&lt;td&gt;RPD limit + throughput&lt;/td&gt;
&lt;td&gt;Groq (14,400 RPD)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Document summarization&lt;/td&gt;
&lt;td&gt;Context window&lt;/td&gt;
&lt;td&gt;Gemini (1M tokens)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Image/PDF analysis&lt;/td&gt;
&lt;td&gt;Multimodal support&lt;/td&gt;
&lt;td&gt;Gemini (only option)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Batch data labeling (short)&lt;/td&gt;
&lt;td&gt;TPM + throughput&lt;/td&gt;
&lt;td&gt;Cerebras (60K TPM)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hard reasoning / math&lt;/td&gt;
&lt;td&gt;Model quality&lt;/td&gt;
&lt;td&gt;Gemini 2.5 Pro or Groq DeepSeek R1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Voice AI pipeline&lt;/td&gt;
&lt;td&gt;TTFT (latency)&lt;/td&gt;
&lt;td&gt;Cerebras (fastest TTFT)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Development / prototyping&lt;/td&gt;
&lt;td&gt;Model variety&lt;/td&gt;
&lt;td&gt;Groq (16+ models)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  The Elephant in the Room: Model Quality
&lt;/h2&gt;

&lt;p&gt;Speed comparisons can obscure an important truth: the underlying models matter. Groq and Cerebras both serve Llama 3.3 70B — it’s a strong model, but not state-of-the-art on hard benchmarks. Gemini 2.5 Flash and 2.5 Pro are measurably better on complex tasks, coding challenges, and reasoning.&lt;/p&gt;

&lt;p&gt;A 5x faster response doesn’t help if the answer is wrong or shallow. For high-stakes tasks — complex code review, nuanced analysis, hard math — the quality difference between Llama 70B and Gemini 2.5 Pro matters more than the throughput difference. For simpler tasks like summarization, classification, extraction, and short code generation, Llama 70B is entirely capable and the speed advantage of Cerebras/Groq becomes dominant.&lt;/p&gt;

&lt;h2&gt;
  
  
  Limitations and Honest Caveats
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Cerebras
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;8K context window is a hard constraint — no long documents, no extended conversations&lt;/li&gt;
&lt;li&gt;~900 RPD is the lowest of the three — runs out faster than you’d expect in agentic workloads&lt;/li&gt;
&lt;li&gt;Text-only, no vision support&lt;/li&gt;
&lt;li&gt;US-centric infrastructure — higher network latency for users in Asia/Europe&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Groq
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Per-model rate limits — if you always use the same model, you burn through that model’s daily quota faster&lt;/li&gt;
&lt;li&gt;Speed varies significantly during peak hours — marketed speeds are best-case, not sustained&lt;/li&gt;
&lt;li&gt;Context quality degrades for very long prompts (a limitation of the underlying model, not of the API)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Gemini
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Noticeably slower throughput — 100–200 tokens/s vs 500–2,100 for the others&lt;/li&gt;
&lt;li&gt;Gemini 2.5 Pro on free tier has very restricted rate limits (2 RPM as of early 2026)&lt;/li&gt;
&lt;li&gt;API terms and free tier availability may change — Google has historically been unpredictable about free API access&lt;/li&gt;
&lt;li&gt;Some features (system instructions, JSON mode) work differently than OpenAI’s API, requiring library adjustments&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Honest Verdict
&lt;/h2&gt;

&lt;p&gt;If you’re only going to use one API, pick Groq. It’s the best all-rounder: fast enough (300–500 tokens/s on 70B), generous daily limits (14,400 RPD), 128K context, 16+ models including reasoning models, and fully OpenAI-compatible. It handles 90% of use cases well without any of the awkward trade-offs.&lt;/p&gt;

&lt;p&gt;If you need maximum raw speed for short prompts, add Cerebras. It’s genuinely faster than Groq on latency and throughput when prompts fit in 8K — use it for real-time chat, voice AI, and agent tool calls.&lt;/p&gt;

&lt;p&gt;If you’re doing anything with long documents, images, or hard reasoning tasks, add Gemini. The 1M token context window and multimodal support put it in a completely different category from Groq and Cerebras for those specific tasks.&lt;/p&gt;

&lt;p&gt;The best setup? Use all three. They’re all free, they take ten minutes to set up, and they complement each other perfectly. Route by task, stack your free quotas, and you have an AI infrastructure that costs exactly nothing.&lt;/p&gt;
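&lt;p&gt;As a rough sketch, that routing logic fits in a few lines. The thresholds and provider labels below simply mirror the scenario table above; the function name is ours, not from any SDK:&lt;/p&gt;

```python
# Minimal task router mirroring the scenario table above. Thresholds and
# provider labels are illustrative; verify each provider's current limits
# before relying on them.
def pick_provider(prompt_tokens: int, needs_vision: bool = False,
                  hard_reasoning: bool = False) -> str:
    if needs_vision or prompt_tokens > 128_000:
        return "gemini"      # only multimodal option; 1M-token context
    if hard_reasoning:
        return "gemini"      # 2.5 Pro (or Groq's DeepSeek R1) for hard problems
    if prompt_tokens < 8_000:
        return "cerebras"    # fastest TTFT and throughput inside the 8K window
    return "groq"            # best all-rounder: 128K context, 14,400 RPD

print(pick_provider(2_000))                     # cerebras
print(pick_provider(50_000))                    # groq
print(pick_provider(1_000, needs_vision=True))  # gemini
```

&lt;p&gt;In a real app you would map each label to the matching client (Cerebras, Groq, or Gemini) and fall through to the next provider on rate-limit errors.&lt;/p&gt;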

&lt;p&gt;Want to go deeper on any of these providers? Check out our full guides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/groq-api-fastest-free-ai-api-2026/"&gt;Groq API: The Fastest Free AI API in 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/cerebras-inference-api-fastest-free-ai-api/"&gt;Cerebras Inference API: The Fastest Free AI API You’ve Never Heard Of&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/google-gemini-api-best-free-ai-api-2026/"&gt;Google Gemini API: The Best Free AI API in 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/10-best-free-ai-apis-2026-ultimate-comparison/"&gt;10 Best Free AI APIs in 2026: The Ultimate Comparison&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Related Reads
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/cohere-rag-api/" rel="noopener noreferrer"&gt;Cohere Free API: The Best Free Embedding and Rerank API for RAG in 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/cerebras-free-api/" rel="noopener noreferrer"&gt;Cerebras Inference API: The Fastest Free AI API You’ve Never Heard Of&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/mistral-free-api/" rel="noopener noreferrer"&gt;Mistral AI Free API: Call Nemo and Mixtral for Free with Any OpenAI SDK&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/github-models-free-api/" rel="noopener noreferrer"&gt;GitHub Models: Free GPT-4o and Llama API for Every Developer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/cloudflare-workers-ai/" rel="noopener noreferrer"&gt;Cloudflare Workers AI: Free Edge AI Inference with 47+ Models&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://toolfreebie.com/groq-vs-cerebras-vs-gemini/" rel="noopener noreferrer"&gt;toolfreebie.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Cerebras Inference API: The Fastest Free AI API You’ve Never Heard Of</title>
      <dc:creator>toolfreebie</dc:creator>
      <pubDate>Sun, 03 May 2026 11:51:30 +0000</pubDate>
      <link>https://dev.to/build996/cerebras-inference-api-the-fastest-free-ai-api-youve-never-heard-of-n9b</link>
      <guid>https://dev.to/build996/cerebras-inference-api-the-fastest-free-ai-api-youve-never-heard-of-n9b</guid>
      <description>&lt;h2&gt;
  
  
  What Is Cerebras? The Chip Company That’s Also a Free AI API
&lt;/h2&gt;

&lt;p&gt;Cerebras Systems is best known for building the Wafer-Scale Engine (WSE) — a chip the size of a dinner plate with over 4 trillion transistors, purpose-built for AI. What most developers don’t realize is that Cerebras also offers a &lt;strong&gt;free cloud inference API&lt;/strong&gt; that consistently outpaces Groq on smaller models and rivals it on 70B-class models.&lt;/p&gt;

&lt;p&gt;If you’ve only heard of Groq as the “fast free AI API,” it’s time to put Cerebras on your radar. No credit card required, OpenAI-compatible endpoint, and benchmarks that speak for themselves.&lt;/p&gt;

&lt;p&gt;In this guide, you’ll learn how to get your free Cerebras API key, make your first call with Python or JavaScript, and connect it to &lt;a href="https://openclaw.ai/" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; for a fully free, ultra-fast AI agent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Cerebras Is So Fast: The Chip Story in 60 Seconds
&lt;/h2&gt;

&lt;p&gt;To understand why Cerebras is fast, you need to understand why GPUs are slow at inference.&lt;/p&gt;

&lt;p&gt;When you run a model on a GPU cluster, the model’s weights live in external HBM (High Bandwidth Memory). Every time the chip generates a token, it has to pull weights from that external memory. This memory bandwidth bottleneck is the core reason GPU inference tops out around 100–150 tokens per second, even on expensive A100s.&lt;/p&gt;

&lt;p&gt;The Cerebras WSE-3 is different. At 46,225 mm², it’s 57x larger than the biggest GPU die. The entire Llama 3.1 70B model — all 70 billion parameters — fits directly in on-chip SRAM. There’s no external memory fetch, no bandwidth bottleneck. The chip just computes.&lt;/p&gt;

&lt;p&gt;The result is inference speeds that most GPU providers can’t touch:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Llama 3.1 8B:&lt;/strong&gt; ~2,100 tokens/second&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Llama 3.1 70B:&lt;/strong&gt; ~450–500 tokens/second&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Llama 3.3 70B:&lt;/strong&gt; ~450 tokens/second&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For context, Groq runs Llama 3.3 70B at around 300–500 tokens/second. OpenAI GPT-4o is closer to 50–100. Cerebras is legitimately the fastest publicly accessible AI inference in 2026.&lt;/p&gt;

&lt;h2&gt;
  
  
  Available Free Models on Cerebras
&lt;/h2&gt;

&lt;p&gt;Cerebras’ free tier gives you access to several high-quality open-source models:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model ID&lt;/th&gt;
&lt;th&gt;Parameters&lt;/th&gt;
&lt;th&gt;Context Window&lt;/th&gt;
&lt;th&gt;Speed (approx)&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;llama3.1-8b&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;8B&lt;/td&gt;
&lt;td&gt;8K tokens&lt;/td&gt;
&lt;td&gt;~2,100 tokens/s&lt;/td&gt;
&lt;td&gt;Maximum speed, chat, code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;llama3.1-70b&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;70B&lt;/td&gt;
&lt;td&gt;8K tokens&lt;/td&gt;
&lt;td&gt;~500 tokens/s&lt;/td&gt;
&lt;td&gt;Higher quality, reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;llama-3.3-70b&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;70B&lt;/td&gt;
&lt;td&gt;8K tokens&lt;/td&gt;
&lt;td&gt;~450 tokens/s&lt;/td&gt;
&lt;td&gt;Best quality on Cerebras&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;qwen-3-32b&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;32B&lt;/td&gt;
&lt;td&gt;32K tokens&lt;/td&gt;
&lt;td&gt;~700 tokens/s&lt;/td&gt;
&lt;td&gt;Multilingual, coding&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Note that model availability may be updated over time. Check the &lt;a href="https://cloud.cerebras.ai" rel="noopener noreferrer"&gt;Cerebras Cloud Console&lt;/a&gt; for the current model list.&lt;/p&gt;

&lt;h2&gt;
  
  
  Free Tier Rate Limits
&lt;/h2&gt;

&lt;p&gt;Cerebras’ free tier is genuinely usable for development and side projects:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Limit Type&lt;/th&gt;
&lt;th&gt;Free Tier&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Requests per minute&lt;/td&gt;
&lt;td&gt;30 RPM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tokens per minute&lt;/td&gt;
&lt;td&gt;60,000 TPM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Requests per day&lt;/td&gt;
&lt;td&gt;~900 RPD&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Credit card required&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The 60,000-token-per-minute ceiling is especially generous. At ~2,100 tokens/second for the 8B model, about 30 seconds of continuous generation would exhaust an entire minute’s allowance — but in practice, request latency means you won’t hit that ceiling often. For typical interactive workloads, the free tier is more than enough.&lt;/p&gt;
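&lt;p&gt;A quick sanity check on that math, using the approximate speeds from the model table (actual throughput varies with load):&lt;/p&gt;

```python
# Seconds of nonstop generation before the 60,000 TPM ceiling is reached,
# using the approximate per-model speeds from the table above.
TPM_LIMIT = 60_000

for model, tokens_per_second in [("llama3.1-8b", 2_100), ("llama-3.3-70b", 450)]:
    seconds = TPM_LIMIT / tokens_per_second
    print(f"{model}: ceiling reached after ~{seconds:.0f}s of continuous output")
```

&lt;p&gt;The 8B model can hit the ceiling in under 30 seconds, while the 70B models generate slowly enough that they can’t exhaust it within a one-minute window at all.&lt;/p&gt;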

&lt;h2&gt;
  
  
  How to Get Your Free Cerebras API Key
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;a href="https://cloud.cerebras.ai" rel="noopener noreferrer"&gt;cloud.cerebras.ai&lt;/a&gt; and create an account (email, Google, or GitHub)&lt;/li&gt;
&lt;li&gt;After logging in, click &lt;strong&gt;“API Keys”&lt;/strong&gt; in the left sidebar&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;“Create new API key”&lt;/strong&gt; and give it a name&lt;/li&gt;
&lt;li&gt;Copy the key — it’s only shown once&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No credit card, no billing form, no trial period countdown. You’re making API calls in under two minutes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using the Cerebras API with Python
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Option 1: Install the Official Cerebras SDK
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;cerebras-cloud-sdk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Basic Chat Completion
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;cerebras.cloud.sdk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Cerebras&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Cerebras&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_CEREBRAS_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama-3.3-70b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful coding assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a Python function that checks if a number is prime&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Streaming Responses
&lt;/h3&gt;

&lt;p&gt;With Cerebras generating 2,100 tokens/second on the 8B model, streaming feels nearly instantaneous — tokens arrive faster than you can read them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;cerebras.cloud.sdk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Cerebras&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Cerebras&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_CEREBRAS_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama3.1-8b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain Python decorators with three practical examples&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;delta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flush&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Option 2: Use the OpenAI SDK (Drop-in Replacement)
&lt;/h3&gt;

&lt;p&gt;Cerebras is fully OpenAI-compatible. If your project already uses the OpenAI Python SDK, you only change the base URL and model name:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_CEREBRAS_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.cerebras.ai/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama-3.3-70b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What are the main differences between PostgreSQL and SQLite?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes it trivially easy to add Cerebras as a fast fallback or alternative in any project that already supports OpenAI.&lt;/p&gt;
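&lt;p&gt;Here’s a minimal sketch of that fallback pattern. It’s provider-agnostic, and the two stub functions are hypothetical stand-ins for real calls like the &lt;code&gt;client.chat.completions.create(...)&lt;/code&gt; examples above:&lt;/p&gt;

```python
# Try each provider in order and return the first successful response.
def with_fallback(prompt: str, providers) -> str:
    last_error = None
    for call in providers:
        try:
            return call(prompt)
        except Exception as exc:  # rate limit, timeout, outage, etc.
            last_error = exc
    raise last_error

# Hypothetical stand-ins for real API calls (e.g. Cerebras first, Groq second):
def cerebras_call(prompt: str) -> str:
    raise TimeoutError("simulated Cerebras outage")

def groq_call(prompt: str) -> str:
    return f"groq answer to: {prompt}"

print(with_fallback("ping", [cerebras_call, groq_call]))
```

&lt;p&gt;Because every provider in the chain speaks the same OpenAI-style API, each callable can wrap the same request body with only the base URL, key, and model name swapped.&lt;/p&gt;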

&lt;h3&gt;
  
  
  Async Support
&lt;/h3&gt;

&lt;p&gt;The Cerebras SDK also supports async/await for high-throughput applications:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;cerebras.cloud.sdk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AsyncCerebras&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;batch_classify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AsyncCerebras&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_CEREBRAS_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama3.1-8b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Classify this text as positive, negative, or neutral. Reply with one word only.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;

&lt;span class="n"&gt;texts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;This product exceeded my expectations!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Totally disappointed, waste of money.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;It arrived on time and works fine.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;labels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;batch_classify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# ['positive', 'negative', 'neutral']
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  JSON Mode
&lt;/h3&gt;

&lt;p&gt;Force structured JSON output — essential for building data pipelines and parsers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;cerebras.cloud.sdk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Cerebras&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Cerebras&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_CEREBRAS_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama-3.3-70b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Extract the following from this job posting as JSON:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- title (string)&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;- company (string)&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;- salary_range (string or null)&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;- remote (boolean)&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Posting: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Senior Python Engineer at DataCorp, $140k–$170k, fully remote position.&lt;/span&gt;&lt;span class="sh"&gt;'"&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;response_format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;json_object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# {"title": "Senior Python Engineer", "company": "DataCorp", "salary_range": "$140k–$170k", "remote": true}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Using the Cerebras API with JavaScript / Node.js
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; @cerebras/cerebras_cloud_sdk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;Cerebras&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@cerebras/cerebras_cloud_sdk&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Cerebras&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CEREBRAS_API_KEY&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;llama-3.3-70b&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Write a TypeScript interface for a REST API response with pagination&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Streaming in Node.js
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;Cerebras&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@cerebras/cerebras_cloud_sdk&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Cerebras&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CEREBRAS_API_KEY&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;llama3.1-8b&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Explain event loops in JavaScript&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
  &lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;delta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]?.&lt;/span&gt;&lt;span class="nx"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stdout&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Using Cerebras API with the OpenAI SDK (Any Language)
&lt;/h2&gt;

&lt;p&gt;Because Cerebras uses an OpenAI-compatible endpoint, you can plug it into any library or framework that supports custom base URLs. Here’s a quick reference:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Base URL&lt;/td&gt;
&lt;td&gt;&lt;code&gt;https://api.cerebras.ai/v1&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API Key Header&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Authorization: Bearer YOUR_KEY&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chat endpoint&lt;/td&gt;
&lt;td&gt;&lt;code&gt;POST /v1/chat/completions&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Models endpoint&lt;/td&gt;
&lt;td&gt;&lt;code&gt;GET /v1/models&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This means Cerebras works as a drop-in replacement anywhere you use OpenAI — LangChain, LlamaIndex, LiteLLM, OpenWebUI, and hundreds of other tools.&lt;/p&gt;

&lt;h3&gt;
  
  
  Using with LiteLLM (One Line of Code)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;litellm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;litellm&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;completion&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CEREBRAS_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_CEREBRAS_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;completion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cerebras/llama-3.3-70b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What are the SOLID principles?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;LiteLLM has native Cerebras support, making it effortless to switch between Cerebras, Groq, OpenAI, and other providers in the same codebase.&lt;/p&gt;

&lt;h2&gt;
  
  
  Connect Cerebras to OpenClaw (Free Ultra-Fast AI Agent)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://openclaw.ai/" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; is an open-source AI agent platform that supports custom API endpoints. Connecting it to Cerebras gives you an AI coding agent with response times that feel nearly instant.&lt;/p&gt;

&lt;h3&gt;
  
  
  Quick Setup via Onboarding
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; openclaw@latest
openclaw onboard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When prompted for a provider, select &lt;strong&gt;Custom OpenAI-compatible&lt;/strong&gt;, enter the base URL &lt;code&gt;https://api.cerebras.ai/v1&lt;/code&gt;, paste your API key, and pick &lt;code&gt;llama-3.3-70b&lt;/code&gt; as your default model.&lt;/p&gt;

&lt;h3&gt;
  
  
  Manual Configuration
&lt;/h3&gt;

&lt;p&gt;Edit &lt;code&gt;~/.openclaw/openclaw.json&lt;/code&gt; to add Cerebras as a provider:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"merge"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"providers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"cerebras"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"baseUrl"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://api.cerebras.ai/v1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"apiKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"YOUR_CEREBRAS_API_KEY"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"api"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai-completions"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"llama-3.3-70b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Llama 3.3 70B (Cerebras)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"reasoning"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"contextWindow"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8192&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"maxTokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"llama3.1-8b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Llama 3.1 8B (Cerebras)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"reasoning"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"contextWindow"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8192&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"maxTokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"agents"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"defaults"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"primary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cerebras/llama-3.3-70b"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"cerebras/llama-3.3-70b"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once configured, OpenClaw uses Cerebras for completions. Ask it to write code, review a file, or explain a function — and watch how fast it responds. For quick tasks like generating boilerplate or explaining an error message, the 8B model at 2,100 tokens/second means you get the full answer before you’d even see the first sentence from most other providers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cerebras vs Groq: Which Free AI API Is Faster?
&lt;/h2&gt;

&lt;p&gt;Both Cerebras and Groq market themselves as the fastest AI inference available. Here’s an honest comparison:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Cerebras&lt;/th&gt;
&lt;th&gt;Groq&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;8B model speed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~2,100 tokens/s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~1,500–2,000 tokens/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;70B model speed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~450–500 tokens/s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~300–500 tokens/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best free model quality&lt;/td&gt;
&lt;td&gt;Llama 3.3 70B&lt;/td&gt;
&lt;td&gt;Llama 3.3 70B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context window&lt;/td&gt;
&lt;td&gt;8K tokens&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;128K tokens&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Free RPD&lt;/td&gt;
&lt;td&gt;~900&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;14,400&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Free TPM&lt;/td&gt;
&lt;td&gt;60,000&lt;/td&gt;
&lt;td&gt;6,000–20,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI compatible&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Credit card required&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Models available&lt;/td&gt;
&lt;td&gt;4–6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;16+&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vision support&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Limited (preview)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The honest verdict:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you need &lt;strong&gt;raw speed&lt;/strong&gt; and your prompt fits in 8K tokens: Cerebras wins (or ties) on throughput&lt;/li&gt;
&lt;li&gt;If you need &lt;strong&gt;long context&lt;/strong&gt; (documents, large codebases): Groq wins by a large margin (128K vs 8K)&lt;/li&gt;
&lt;li&gt;If you need &lt;strong&gt;higher daily request volume&lt;/strong&gt;: Groq wins at 14,400 RPD vs ~900&lt;/li&gt;
&lt;li&gt;If you need &lt;strong&gt;more model variety&lt;/strong&gt;: Groq wins with 16+ models&lt;/li&gt;
&lt;li&gt;If you need &lt;strong&gt;higher tokens per minute&lt;/strong&gt;: Cerebras wins (60K TPM vs 6K–20K)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The practical recommendation: keep both keys. Use Cerebras for short, frequent completions where speed is paramount (tool calls, classifier chains, real-time chat). Use Groq when you need long context or higher daily limits.&lt;/p&gt;
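&lt;p&gt;A tiny sketch of that two-key strategy: route each request by a rough token estimate against Cerebras&amp;#39; 8K window. The ~4-characters-per-token heuristic, the safety margin, and the Groq model id are our illustrative assumptions:&lt;/p&gt;

```python
# Sketch of the "keep both keys" routing above: send short prompts to
# Cerebras (8K window), long ones to Groq (128K). The ~4 chars/token
# heuristic and the model ids are assumptions for illustration.
CEREBRAS_CONTEXT = 8_192
SAFETY_MARGIN = 1_024  # leave headroom for the completion itself


def estimate_tokens(text):
    """Crude estimate: roughly 4 characters per token for English text."""
    return len(text) // 4 + 1


def pick_provider(prompt):
    """Return a (provider, model) pair for this prompt."""
    if estimate_tokens(prompt) > CEREBRAS_CONTEXT - SAFETY_MARGIN:
        return ("groq", "llama-3.3-70b-versatile")
    return ("cerebras", "llama-3.3-70b")


print(pick_provider("Classify this support ticket: login page returns 500"))
# → ('cerebras', 'llama-3.3-70b')
```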

&lt;h2&gt;
  
  
  Cerebras vs Other Free AI APIs
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Cerebras&lt;/th&gt;
&lt;th&gt;Groq&lt;/th&gt;
&lt;th&gt;Google Gemini&lt;/th&gt;
&lt;th&gt;DeepSeek&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Speed&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~2,100 tokens/s (8B)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~1,500 tokens/s (8B)&lt;/td&gt;
&lt;td&gt;~100 tokens/s&lt;/td&gt;
&lt;td&gt;~50–80 tokens/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best free model&lt;/td&gt;
&lt;td&gt;Llama 3.3 70B&lt;/td&gt;
&lt;td&gt;Llama 3.3 70B&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Gemini 2.5 Pro&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;DeepSeek V3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context window&lt;/td&gt;
&lt;td&gt;8K&lt;/td&gt;
&lt;td&gt;128K&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1M&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;128K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multimodal&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best for&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Ultra-fast text&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fast + high volume&lt;/td&gt;
&lt;td&gt;Complex tasks&lt;/td&gt;
&lt;td&gt;Coding, reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Real-World Use Cases Where Cerebras Shines
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Real-Time AI Chat Applications
&lt;/h3&gt;

&lt;p&gt;When you’re building a customer-facing chat product, the difference between 80 tokens/second and 2,100 tokens/second is the difference between a chat that feels broken and one that feels alive. Cerebras makes even the 8B Llama model feel snappier than GPT-4 Turbo on a good day.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Agentic Tool Calls (Many Short Completions)
&lt;/h3&gt;

&lt;p&gt;AI agents often make dozens of small LLM calls — classifying an intent, extracting a field, choosing between branches, summarizing a step. When each call takes 500ms instead of 3 seconds, your agent loop runs 6x faster. Cerebras at 2,100 tokens/second on the 8B model means a 200-token tool-call response completes in under 100ms.&lt;/p&gt;
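&lt;p&gt;The arithmetic behind that claim, counting pure generation time only (network round-trip and time-to-first-token are extra), with the 200-token response size as our assumption:&lt;/p&gt;

```python
# Back-of-envelope math for the agent-loop claim above.
TOKENS_PER_SECOND = 2_100   # Cerebras, 8B model
RESPONSE_TOKENS = 200       # typical short tool-call response (assumed)

per_call_ms = RESPONSE_TOKENS / TOKENS_PER_SECOND * 1_000
print(round(per_call_ms))             # 95 ms of generation per tool call

# A 20-step agent loop at 0.5 s/call vs 3 s/call:
print(20 * 0.5, "s vs", 20 * 3, "s")  # 10.0 s vs 60 s
```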

&lt;h3&gt;
  
  
  3. Voice AI Pipelines
&lt;/h3&gt;

&lt;p&gt;In a speech-to-text → LLM → text-to-speech pipeline, LLM latency is the bottleneck. Cerebras dramatically cuts time-to-first-token. With streaming, you can pipe the first few tokens to TTS before the full response is complete — achieving near-human response latency in voice applications.&lt;/p&gt;
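&lt;p&gt;One way to implement that trick is to buffer streamed tokens and flush complete sentences to TTS as they arrive, so speech can start before generation ends. In this sketch, &lt;code&gt;speak&lt;/code&gt; is a hypothetical stand-in for your TTS call:&lt;/p&gt;

```python
# Sketch: flush streamed tokens to TTS at sentence boundaries.
# speak() is a hypothetical stand-in for your text-to-speech call.
import re


def stream_to_tts(token_stream, speak):
    buffer = ""
    for token in token_stream:
        buffer += token
        # flush whenever the buffer contains a complete sentence
        while (m := re.search(r"^(.+?[.!?])\s+", buffer)):
            speak(m.group(1))
            buffer = buffer[m.end():]
    if buffer.strip():
        speak(buffer.strip())  # flush whatever remains at end of stream


spoken = []
stream_to_tts(iter(["Hello ", "there. ", "How can ", "I help? "]), spoken.append)
print(spoken)  # ['Hello there.', 'How can I help?']
```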

&lt;h3&gt;
  
  
  4. Batch Annotation and Labeling
&lt;/h3&gt;

&lt;p&gt;If you’re labeling data for fine-tuning, classifying thousands of records, or running structured extraction over a dataset, Cerebras’ 60,000 TPM free limit combined with its raw throughput means you can process significantly more data per hour than with GPU-based providers.&lt;/p&gt;
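&lt;p&gt;Some rough capacity math for that scenario. The 300 tokens per record and 20 records per request are illustrative assumptions, not Cerebras figures; note how the request cap, not the token cap, becomes the binding constraint:&lt;/p&gt;

```python
# Capacity math for the batch-labeling scenario. Tokens-per-record and
# records-per-request are illustrative assumptions, not Cerebras figures.
TPM_LIMIT = 60_000   # free-tier tokens per minute
RPD_LIMIT = 900      # approximate free-tier requests per day
TOKENS_PER_RECORD = 300

# The token budget alone allows a high rate...
print(TPM_LIMIT // TOKENS_PER_RECORD, "records/min by tokens")  # 200

# ...but the ~900 requests/day cap binds first, so pack several
# records into each prompt:
records_per_request = 20
print(RPD_LIMIT * records_per_request, "records/day")           # 18000
```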

&lt;h3&gt;
  
  
  5. Developer Tools and CI Integrations
&lt;/h3&gt;

&lt;p&gt;Adding AI to your git hooks, code review bots, or documentation generators? Speed matters when it’s blocking a developer’s workflow. A Cerebras-powered code reviewer that responds in 2 seconds doesn’t disrupt the development loop the way a 15-second GPU call would.&lt;/p&gt;
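&lt;p&gt;Because of the 8K context window, a CI review bot also needs to trim its input. Here is a sketch that caps a diff so the review prompt stays inside the window; the 4-characters-per-token heuristic and the token budgets are our assumptions:&lt;/p&gt;

```python
# Sketch for a CI code-review prompt: trim the diff so prompt plus
# completion fit Cerebras' 8K-token window. The chars/token heuristic
# and budgets are illustrative assumptions.
CONTEXT_TOKENS = 8_192
COMPLETION_BUDGET = 1_024   # tokens reserved for the model's answer
CHARS_PER_TOKEN = 4         # rough heuristic for English text and code


def build_review_prompt(diff):
    max_chars = (CONTEXT_TOKENS - COMPLETION_BUDGET) * CHARS_PER_TOKEN
    header = "Review this diff for bugs and style issues:\n\n"
    return header + diff[: max_chars - len(header)]


prompt = build_review_prompt("x" * 100_000)
print(len(prompt))  # 28672 — capped to fit the window
```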

&lt;h2&gt;
  
  
  Limitations to Know
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Small context window (8K tokens):&lt;/strong&gt; This is the biggest practical limitation. You can’t feed Cerebras a large codebase, a long document, or an extended conversation history. For long-context work, use Gemini Free (1M tokens) or Groq (128K).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Text only:&lt;/strong&gt; No image input, no vision support, no multimodal capabilities as of 2026. Cerebras is purely for text completions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fewer models:&lt;/strong&gt; Groq offers 16+ models; Cerebras has 4–6. If you need a specific architecture (Gemma, Qwen with vision, Mistral), you may not find it here.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lower daily request limit:&lt;/strong&gt; ~900 RPD is limiting compared to Groq’s 14,400. High-volume production workloads will hit this quickly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No fine-tuning:&lt;/strong&gt; The free tier is inference-only. No custom model training.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;US-based inference:&lt;/strong&gt; Cerebras’ infrastructure is US-centric. If your users are in Asia/Europe, you may see higher latency on the network round-trip even if the inference itself is blazing fast.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How to Check Your Current Limits and Usage
&lt;/h2&gt;

&lt;p&gt;You can see your current rate limits and usage directly in the Cerebras Cloud Console:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Log in at &lt;a href="https://cloud.cerebras.ai" rel="noopener noreferrer"&gt;cloud.cerebras.ai&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Navigate to &lt;strong&gt;“Usage”&lt;/strong&gt; in the left sidebar to see tokens consumed and request counts&lt;/li&gt;
&lt;li&gt;Navigate to &lt;strong&gt;“API Keys”&lt;/strong&gt; to manage and rotate your keys&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You can also query the limits programmatically by checking response headers after each API call. The &lt;code&gt;X-RateLimit-Remaining-Requests&lt;/code&gt; and &lt;code&gt;X-RateLimit-Remaining-Tokens&lt;/code&gt; headers tell you how much headroom you have left in the current window.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;

&lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer YOUR_CEREBRAS_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama3.1-8b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.cerebras.ai/v1/chat/completions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Remaining requests:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-RateLimit-Remaining-Requests&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Remaining tokens:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-RateLimit-Remaining-Tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;choices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Combining Cerebras and Groq: A Practical Multi-Provider Strategy
&lt;/h2&gt;

&lt;p&gt;Here’s a pattern used in production: use Cerebras for short, latency-sensitive calls and Groq as the fallback when context is longer or daily limits are exhausted.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="n"&gt;cerebras&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CEREBRAS_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.cerebras.ai/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;groq&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GROQ_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.groq.com/openai/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;smart_complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_context_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Use Cerebras for short prompts, Groq for long ones.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;estimated_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;1.3&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;estimated_tokens&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;max_context_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cerebras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama-3.3-70b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;pass&lt;/span&gt;  &lt;span class="c1"&gt;# Fall through to Groq on rate limit or error
&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;groq&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama-3.3-70b-versatile&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
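&lt;p&gt;One refinement worth making: the bare &lt;code&gt;except&lt;/code&gt; in &lt;code&gt;smart_complete&lt;/code&gt; treats a transient 429 the same as a hard failure. A small retry helper with exponential backoff (sketched here with the provider call abstracted as a plain zero-argument callable, so it is independent of either SDK) keeps short prompts on Cerebras through brief rate-limit blips instead of falling straight through to Groq:&lt;/p&gt;

```python
import time

def call_with_backoff(fn, retries=3, base_delay=1.0):
    """Retry a provider call with exponential backoff.

    fn is any zero-argument callable that raises on failure,
    e.g. a lambda wrapping cerebras.chat.completions.create.
    """
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries; let the caller fall back to Groq
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

&lt;p&gt;Wrap the Cerebras call in &lt;code&gt;call_with_backoff&lt;/code&gt; and keep the existing &lt;code&gt;try/except&lt;/code&gt; around it, so Groq remains the last-resort fallback.&lt;/p&gt;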



&lt;p&gt;This strategy gives you the best of both worlds: Cerebras' speed for short completions, Groq's long context and higher daily limits for heavier workloads — all completely free.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Reads
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/cohere-rag-api/" rel="noopener noreferrer"&gt;Cohere Free API: The Best Free Embedding and Rerank API for RAG in 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/groq-vs-cerebras-vs-gemini/" rel="noopener noreferrer"&gt;Groq vs Cerebras vs Gemini: Which Free AI API Is Actually Fastest in 2026?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/mistral-free-api/" rel="noopener noreferrer"&gt;Mistral AI Free API: Call Nemo and Mixtral for Free with Any OpenAI SDK&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/github-models-free-api/" rel="noopener noreferrer"&gt;GitHub Models: Free GPT-4o and Llama API for Every Developer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/cloudflare-workers-ai/" rel="noopener noreferrer"&gt;Cloudflare Workers AI: Free Edge AI Inference with 47+ Models&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Cerebras is the best-kept secret in free AI APIs. While everyone talks about Groq, Cerebras has been quietly delivering the fastest raw inference speeds on the market — powered by hardware that's genuinely unlike anything else in the industry.&lt;/p&gt;

&lt;p&gt;The 8K context window is a real limitation, and it means Cerebras isn't the right tool for every job. But for short, latency-critical completions — real-time chat, agentic tool calls, voice pipelines, developer tools — it's hard to beat 2,100 tokens per second with zero dollars spent.&lt;/p&gt;

&lt;p&gt;Get your free API key at &lt;a href="https://cloud.cerebras.ai" rel="noopener noreferrer"&gt;cloud.cerebras.ai&lt;/a&gt;, pair it with &lt;a href="https://openclaw.ai/" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt;, and experience what AI inference feels like when the hardware bottleneck is gone.&lt;/p&gt;

&lt;p&gt;And if you want to compare all the best free AI APIs side-by-side, check out our guide: &lt;a href="https://dev.to/10-best-free-ai-apis-2026-ultimate-comparison/"&gt;10 Best Free AI APIs in 2026: The Ultimate Comparison&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://toolfreebie.com/cerebras-free-api/" rel="noopener noreferrer"&gt;toolfreebie.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>opensource</category>
    </item>
    <item>
      <title>GitHub Models: Free GPT-4o and Llama API for Every Developer</title>
      <dc:creator>toolfreebie</dc:creator>
      <pubDate>Sun, 03 May 2026 11:45:58 +0000</pubDate>
      <link>https://dev.to/build996/github-models-free-gpt-4o-and-llama-api-for-every-developer-22oh</link>
      <guid>https://dev.to/build996/github-models-free-gpt-4o-and-llama-api-for-every-developer-22oh</guid>
      <description>&lt;h2&gt;
  
  
  What Is GitHub Models?
&lt;/h2&gt;

&lt;p&gt;GitHub Models gives every developer with a GitHub account free access to top AI models — including &lt;strong&gt;GPT-4o, GPT-4o mini, Llama 3.3, Phi-4, Mistral, and more&lt;/strong&gt; — through a standard OpenAI-compatible API. No credit card, no new account signup: you just use your existing GitHub personal access token.&lt;/p&gt;

&lt;p&gt;Launched in 2024 and now generally available, GitHub Models is built into the platform 100 million developers already use every day. Whether you’re testing a new idea, building a coding assistant, or running experiments, you’re a single API call away from production-grade AI models.&lt;/p&gt;

&lt;h2&gt;
  
  
  Available Free Models
&lt;/h2&gt;

&lt;p&gt;GitHub Models hosts a curated list of frontier models from multiple providers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Context Window&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;gpt-4o&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;128K tokens&lt;/td&gt;
&lt;td&gt;Complex reasoning, general use&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;gpt-4o-mini&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;128K tokens&lt;/td&gt;
&lt;td&gt;Fast, low-cost tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;o1-mini&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;128K tokens&lt;/td&gt;
&lt;td&gt;Math, coding, reasoning chains&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Llama-3.3-70B-Instruct&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Meta&lt;/td&gt;
&lt;td&gt;128K tokens&lt;/td&gt;
&lt;td&gt;Open-source, high quality&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Phi-4&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Microsoft&lt;/td&gt;
&lt;td&gt;16K tokens&lt;/td&gt;
&lt;td&gt;Lightweight, on-device use cases&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Mistral-small&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Mistral AI&lt;/td&gt;
&lt;td&gt;128K tokens&lt;/td&gt;
&lt;td&gt;Multilingual, EU data residency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cohere Command R+&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Cohere&lt;/td&gt;
&lt;td&gt;128K tokens&lt;/td&gt;
&lt;td&gt;RAG, enterprise search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AI21 Jamba 1.5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AI21 Labs&lt;/td&gt;
&lt;td&gt;256K tokens&lt;/td&gt;
&lt;td&gt;Long documents, summarization&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The model list grows as GitHub adds new providers. You can see the full current catalog in the &lt;a href="https://github.com/marketplace/models" rel="noopener noreferrer"&gt;GitHub Marketplace Models&lt;/a&gt; section.&lt;/p&gt;

&lt;h2&gt;
  
  
  Free Tier Rate Limits
&lt;/h2&gt;

&lt;p&gt;GitHub Models uses a tiered rate limit system based on your GitHub plan:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;Requests/Min (Low)&lt;/th&gt;
&lt;th&gt;Requests/Day (Low)&lt;/th&gt;
&lt;th&gt;Requests/Min (High)&lt;/th&gt;
&lt;th&gt;Requests/Day (High)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Free account&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;150&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Copilot Free&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;150&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Copilot Pro&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;td&gt;1,000&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;180&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Copilot Business/Enterprise&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;5,000&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;600&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Low-tier models (like gpt-4o-mini and Llama-3.3-70B) have higher rate limits than high-tier models (gpt-4o, o1-mini). For prototyping and personal projects, the free tier is more than adequate.&lt;/p&gt;
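&lt;p&gt;Because these caps are enforced server-side, it helps to throttle on the client before you ever see a 429. A minimal sliding-window limiter is one way to do that; the 15 requests/minute default below matches the free tier's low-tier cap from the table, and the class itself is an illustrative sketch, not part of any SDK:&lt;/p&gt;

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Client-side guard that blocks until a request fits the per-minute cap."""

    def __init__(self, max_per_minute=15, clock=time.monotonic):
        self.max_per_minute = max_per_minute
        self.clock = clock   # injectable for testing
        self.sent = deque()  # timestamps of requests in the last 60 seconds

    def acquire(self):
        now = self.clock()
        # Drop timestamps that have aged out of the 60-second window.
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()
        if len(self.sent) >= self.max_per_minute:
            # Sleep until the oldest request leaves the window, then retry.
            time.sleep(60 - (now - self.sent[0]))
            return self.acquire()
        self.sent.append(now)
```

&lt;p&gt;Call &lt;code&gt;limiter.acquire()&lt;/code&gt; immediately before each API request.&lt;/p&gt;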

&lt;h2&gt;
  
  
  How to Get Started in 2 Minutes
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;a href="https://github.com/settings/tokens" rel="noopener noreferrer"&gt;github.com/settings/tokens&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;“Generate new token (classic)”&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Give it a name, set an expiration, and select &lt;strong&gt;no scopes&lt;/strong&gt; — GitHub Models only requires a valid token, no special permissions&lt;/li&gt;
&lt;li&gt;Copy your token&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That’s it. No API dashboard, no payment method, no waitlist. Use your GitHub token directly as your API key.&lt;/p&gt;
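&lt;p&gt;One hygiene note before the examples below: keep the token out of your source code. A small helper that reads it from an environment variable (the &lt;code&gt;GITHUB_TOKEN&lt;/code&gt; name here is a common convention, not a requirement) fails fast when it's missing:&lt;/p&gt;

```python
import os

def github_token():
    """Read the GitHub token from the environment instead of hard-coding it."""
    token = os.environ.get("GITHUB_TOKEN")
    if not token:
        raise RuntimeError("Set GITHUB_TOKEN before calling GitHub Models")
    return token
```

&lt;p&gt;Pass &lt;code&gt;github_token()&lt;/code&gt; as the &lt;code&gt;api_key&lt;/code&gt; in the snippets that follow.&lt;/p&gt;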

&lt;h2&gt;
  
  
  Making Your First API Call
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Python (using the OpenAI SDK)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;openai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_GITHUB_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://models.inference.ai.azure.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful coding assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain the difference between async and threading in Python&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;base_url&lt;/code&gt; points to Azure’s inference endpoint, which GitHub Models uses under the hood. Your GitHub token authenticates the request transparently.&lt;/p&gt;

&lt;h3&gt;
  
  
  Switching Models
&lt;/h3&gt;

&lt;p&gt;Changing models is as simple as swapping the model string:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;            &lt;span class="c1"&gt;# GPT-4o (OpenAI)
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;       &lt;span class="c1"&gt;# GPT-4o Mini (faster, cheaper limits)
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Meta-Llama-3.3-70B-Instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# Llama 3.3 70B (Meta)
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Phi-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;             &lt;span class="c1"&gt;# Phi-4 (Microsoft)
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Mistral-small&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;     &lt;span class="c1"&gt;# Mistral Small
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
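&lt;p&gt;Since every model sits behind the same endpoint, a small lookup table keyed by task makes the swap explicit. The task labels below are illustrative; the model IDs come from the catalog above:&lt;/p&gt;

```python
# Illustrative routing table: the task labels are arbitrary,
# the model IDs match the GitHub Models catalog.
MODEL_FOR_TASK = {
    "reasoning": "gpt-4o",
    "cheap": "gpt-4o-mini",
    "open_source": "Meta-Llama-3.3-70B-Instruct",
    "lightweight": "Phi-4",
    "multilingual": "Mistral-small",
}

def pick_model(task, default="gpt-4o-mini"):
    """Return the model ID for a task label, falling back to a cheap default."""
    return MODEL_FOR_TASK.get(task, default)
```

&lt;p&gt;Then &lt;code&gt;model=pick_model("reasoning")&lt;/code&gt; in the completion call, and routing policy lives in one place.&lt;/p&gt;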



&lt;h3&gt;
  
  
  Streaming Responses
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_GITHUB_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://models.inference.ai.azure.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a Python script to parse JSON from a REST API&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;delta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flush&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Multimodal: Analyze Images with GPT-4o
&lt;/h3&gt;

&lt;p&gt;GitHub Models includes GPT-4o’s vision capabilities. Analyze screenshots, diagrams, or any image file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_GITHUB_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://models.inference.ai.azure.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;diagram.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;image_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;b64encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data:image/png;base64,&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;image_data&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What does this architecture diagram show?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  JavaScript / Node.js
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;openai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;OpenAI&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;GITHUB_TOKEN&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;baseURL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://models.inference.ai.azure.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-4o&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Review this code for security issues&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Using the Azure AI Inference SDK (Optional)
&lt;/h3&gt;

&lt;p&gt;GitHub’s own samples also use the Azure AI Inference SDK, which comes with full TypeScript types:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; @octokit/core
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;ModelClient&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@azure-rest/ai-inference&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;AzureKeyCredential&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@azure/core-auth&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ModelClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://models.inference.ai.azure.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AzureKeyCredential&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;GITHUB_TOKEN&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/chat/completions&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Meta-Llama-3.3-70B-Instruct&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Summarize the key points of the CAP theorem&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Connect GitHub Models to OpenClaw
&lt;/h2&gt;

&lt;p&gt;You can use GitHub Models as the backend for a free AI agent via &lt;a href="https://openclaw.ai/" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt;. Since the endpoint is fully OpenAI-compatible, the setup takes about a minute.&lt;/p&gt;

&lt;h3&gt;
  
  
  Quick Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; openclaw@latest
openclaw onboard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When prompted, select &lt;strong&gt;Custom / OpenAI-compatible provider&lt;/strong&gt; and enter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Base URL:&lt;/strong&gt; &lt;code&gt;https://models.inference.ai.azure.com&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Key:&lt;/strong&gt; your GitHub personal access token&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model:&lt;/strong&gt; &lt;code&gt;gpt-4o&lt;/code&gt; or &lt;code&gt;Meta-Llama-3.3-70B-Instruct&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Manual Configuration
&lt;/h3&gt;

&lt;p&gt;Edit &lt;code&gt;~/.openclaw/openclaw.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"merge"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"providers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"github-models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"baseUrl"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://models.inference.ai.azure.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"apiKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"YOUR_GITHUB_TOKEN"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"api"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai-completions"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpt-4o"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"GPT-4o (GitHub Models)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"reasoning"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"image"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"contextWindow"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;128000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"maxTokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Meta-Llama-3.3-70B-Instruct"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Llama 3.3 70B (GitHub Models)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"reasoning"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"contextWindow"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;128000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"maxTokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"agents"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"defaults"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"primary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"github-models/gpt-4o"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"github-models/gpt-4o"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this, you get a free AI agent powered by GPT-4o — the same model behind ChatGPT Plus — using nothing more than your existing GitHub account.&lt;/p&gt;

&lt;h2&gt;
  
  
  GitHub Models vs Other Free AI APIs
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;GitHub Models&lt;/th&gt;
&lt;th&gt;Google Gemini&lt;/th&gt;
&lt;th&gt;Groq&lt;/th&gt;
&lt;th&gt;OpenRouter&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GPT-4o Access&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes (free)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Limited (paid)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Signup Required&lt;/td&gt;
&lt;td&gt;No (uses GitHub)&lt;/td&gt;
&lt;td&gt;Google account&lt;/td&gt;
&lt;td&gt;New account&lt;/td&gt;
&lt;td&gt;New account&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Speed&lt;/td&gt;
&lt;td&gt;~100 tokens/s&lt;/td&gt;
&lt;td&gt;~100 tokens/s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;300–800 tokens/s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Varies by model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Free Daily Requests&lt;/td&gt;
&lt;td&gt;150–5,000&lt;/td&gt;
&lt;td&gt;100–1,500&lt;/td&gt;
&lt;td&gt;14,400&lt;/td&gt;
&lt;td&gt;~200 (free models)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vision Support&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes (GPT-4o)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Yes (select models)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model Variety&lt;/td&gt;
&lt;td&gt;15+ curated&lt;/td&gt;
&lt;td&gt;Gemini family&lt;/td&gt;
&lt;td&gt;16+ Llama/Mistral&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;300+ models&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI Compatible&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best For&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Access to GPT-4o free&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Long context tasks&lt;/td&gt;
&lt;td&gt;Real-time speed&lt;/td&gt;
&lt;td&gt;Model variety&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Practical Use Cases
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Actions automation:&lt;/strong&gt; Use your existing &lt;code&gt;GITHUB_TOKEN&lt;/code&gt; in CI/CD pipelines to add AI-powered code review, changelog generation, or PR labeling — no additional credentials needed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VS Code extensions:&lt;/strong&gt; Build Copilot-like coding assistants that use GPT-4o via GitHub Models without paying for the OpenAI API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code review bots:&lt;/strong&gt; Self-hosted bots that analyze pull requests using GPT-4o and leave detailed comments automatically&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documentation generators:&lt;/strong&gt; Parse your codebase and generate README files, API docs, or changelogs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RAG prototypes:&lt;/strong&gt; Combine Cohere Command R+ (available in GitHub Models) with a vector database to test retrieval-augmented generation at zero cost&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM benchmarking:&lt;/strong&gt; Compare GPT-4o vs Llama 3.3 70B vs Phi-4 on your specific tasks without setting up multiple API accounts&lt;/li&gt;
&lt;/ul&gt;
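
&lt;p&gt;The benchmarking idea above fits in a short loop using the same OpenAI SDK setup shown earlier. A minimal sketch (the model IDs are illustrative; confirm the current names in the Marketplace catalog):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["GITHUB_TOKEN"],
    base_url="https://models.inference.ai.azure.com",
)

prompt = "Explain idempotency in REST APIs in two sentences."

# Model IDs are examples; check the Marketplace for the current catalog
for model in ["gpt-4o", "Meta-Llama-3.3-70B-Instruct", "Phi-4"]:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"{model}: {response.choices[0].message.content[:120]}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;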

&lt;h2&gt;
  
  
  Limitations to Keep in Mind
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Rate limits are lower than dedicated providers:&lt;/strong&gt; At 150 requests/day on the free tier, GitHub Models is better for development than high-volume production workloads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No fine-tuning:&lt;/strong&gt; You can’t train or customize models — inference only&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Powered by Azure:&lt;/strong&gt; Requests go through Azure’s infrastructure, which may matter for data residency in certain jurisdictions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model availability changes:&lt;/strong&gt; The catalog is curated by GitHub and may change — check the Marketplace for the current list&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token limits per request:&lt;/strong&gt; Output is typically capped at 4,096 tokens per completion even on models with larger context windows&lt;/li&gt;
&lt;/ul&gt;
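
&lt;p&gt;When you do hit the daily cap, the API responds with HTTP 429, which the OpenAI SDK surfaces as &lt;code&gt;RateLimitError&lt;/code&gt;. A simple backoff wrapper (a sketch; tune the retry count to your own workload):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os
import time

from openai import OpenAI, RateLimitError

client = OpenAI(
    api_key=os.environ["GITHUB_TOKEN"],
    base_url="https://models.inference.ai.azure.com",
)

def chat_with_retry(messages, model="gpt-4o", retries=3):
    """Retry on HTTP 429 with exponential backoff."""
    for attempt in range(retries):
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except RateLimitError:
            time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s
    raise RuntimeError("Rate limit persisted after retries")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;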

&lt;h2&gt;
  
  
  Related Reads
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/cohere-rag-api/" rel="noopener noreferrer"&gt;Cohere Free API: The Best Free Embedding and Rerank API for RAG in 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/groq-vs-cerebras-vs-gemini/" rel="noopener noreferrer"&gt;Groq vs Cerebras vs Gemini: Which Free AI API Is Actually Fastest in 2026?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/cerebras-free-api/" rel="noopener noreferrer"&gt;Cerebras Inference API: The Fastest Free AI API You’ve Never Heard Of&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/mistral-free-api/" rel="noopener noreferrer"&gt;Mistral AI Free API: Call Nemo and Mixtral for Free with Any OpenAI SDK&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/cloudflare-workers-ai/" rel="noopener noreferrer"&gt;Cloudflare Workers AI: Free Edge AI Inference with 47+ Models&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;GitHub Models is the most developer-friendly free AI API available today. There’s no simpler path to GPT-4o access: if you have a GitHub account, you already have everything you need. The OpenAI-compatible endpoint means any existing code or tool that works with ChatGPT’s API works here with a one-line change.&lt;/p&gt;

&lt;p&gt;It’s not the fastest option (that’s Groq), nor the most generous in daily volume (also Groq, at 14,400 requests/day), but for developers who want GPT-4o without a credit card, or who want to mix and match models like Llama, Phi, and Mistral from a single endpoint, GitHub Models is unmatched.&lt;/p&gt;

&lt;p&gt;Start with &lt;a href="https://github.com/marketplace/models" rel="noopener noreferrer"&gt;github.com/marketplace/models&lt;/a&gt;, grab a token at &lt;a href="https://github.com/settings/tokens" rel="noopener noreferrer"&gt;github.com/settings/tokens&lt;/a&gt;, and you’re making GPT-4o calls in under 2 minutes.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://toolfreebie.com/github-models-free-api/" rel="noopener noreferrer"&gt;toolfreebie.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>opensource</category>
    </item>
    <item>
      <title>n8n: Open-Source Workflow Automation with AI Agents and 400+ Integrations</title>
      <dc:creator>toolfreebie</dc:creator>
      <pubDate>Sun, 03 May 2026 11:40:29 +0000</pubDate>
      <link>https://dev.to/build996/n8n-open-source-workflow-automation-with-ai-agents-and-400-integrations-3obk</link>
      <guid>https://dev.to/build996/n8n-open-source-workflow-automation-with-ai-agents-and-400-integrations-3obk</guid>
      <description>&lt;h2&gt;
  
  
  What Is n8n?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://n8n.io" rel="noopener noreferrer"&gt;n8n&lt;/a&gt; (pronounced “nodemation”) is a free, open-source workflow automation platform that lets you connect apps, APIs, and services without writing code — or with code when you need full control. It’s the self-hosted alternative to Zapier and Make.com, and unlike those platforms, &lt;strong&gt;n8n has no per-task fees&lt;/strong&gt; when you run it yourself.&lt;/p&gt;

&lt;p&gt;With over 400 built-in integrations, a visual node editor, and native AI agent support, n8n has become the go-to automation tool for developers and technical teams who want the power of Zapier without the pricing ceiling. As of 2026, n8n has 50,000+ GitHub stars and is one of the most deployed self-hosted automation tools in the world.&lt;/p&gt;

&lt;h2&gt;
  
  
  n8n Pricing: Free Self-Hosted vs Cloud
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Option&lt;/th&gt;
&lt;th&gt;Price&lt;/th&gt;
&lt;th&gt;Executions&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Self-hosted (Community)&lt;/td&gt;
&lt;td&gt;Free forever&lt;/td&gt;
&lt;td&gt;Unlimited&lt;/td&gt;
&lt;td&gt;Developers with a VPS or Docker&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;n8n Cloud Starter&lt;/td&gt;
&lt;td&gt;~$20/month&lt;/td&gt;
&lt;td&gt;2,500 executions/month&lt;/td&gt;
&lt;td&gt;No-maintenance cloud option&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;n8n Cloud Pro&lt;/td&gt;
&lt;td&gt;~$50/month&lt;/td&gt;
&lt;td&gt;10,000 executions/month&lt;/td&gt;
&lt;td&gt;Teams needing more volume&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise&lt;/td&gt;
&lt;td&gt;Custom&lt;/td&gt;
&lt;td&gt;Unlimited&lt;/td&gt;
&lt;td&gt;Large-scale organizations&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The self-hosted Community edition is completely free — no executions cap, no feature limits, no credit card. You just need a server to run it on. A cheap VPS (like &lt;a href="https://toolfreebie.com/oracle-cloud-always-free" rel="noopener noreferrer"&gt;Oracle Cloud’s free ARM instance&lt;/a&gt;) is enough to run n8n for dozens of workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started: Run n8n with Docker
&lt;/h2&gt;

&lt;p&gt;The fastest way to run n8n locally or on a VPS is with Docker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Run n8n with Docker (data persists in local volume)&lt;/span&gt;
docker run &lt;span class="nt"&gt;-it&lt;/span&gt; &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; n8n &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; 5678:5678 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; n8n_data:/home/node/.n8n &lt;span class="se"&gt;\&lt;/span&gt;
  n8nio/n8n
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open &lt;code&gt;http://localhost:5678&lt;/code&gt; in your browser. On first launch, n8n asks you to create an owner account. That’s it — your workflow editor is ready.&lt;/p&gt;

&lt;p&gt;For production deployment with auto-restart:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# docker-compose.yml&lt;/span&gt;
&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;3.8'&lt;/span&gt;

&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;n8n&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;n8nio/n8n&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;always&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5678:5678"&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;N8N_BASIC_AUTH_ACTIVE=true&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;N8N_BASIC_AUTH_USER=admin&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;N8N_BASIC_AUTH_PASSWORD=yourpassword&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;WEBHOOK_URL=https://your-domain.com/&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;n8n_data:/home/node/.n8n&lt;/span&gt;

&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;n8n_data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Start in background&lt;/span&gt;
docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;

&lt;span class="c"&gt;# Check logs&lt;/span&gt;
docker compose logs &lt;span class="nt"&gt;-f&lt;/span&gt; n8n
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For HTTPS, put n8n behind an Nginx reverse proxy or use Caddy, which handles SSL certificates automatically.&lt;/p&gt;
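
&lt;p&gt;With Caddy, that reverse proxy is only a few lines. A sketch with a placeholder domain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Caddyfile (Caddy obtains and renews the TLS certificate automatically)
your-domain.com {
    reverse_proxy localhost:5678
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;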

&lt;h2&gt;
  
  
  Install via npm (No Docker Required)
&lt;/h2&gt;

&lt;p&gt;If you prefer a direct installation without Docker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install n8n globally&lt;/span&gt;
npm &lt;span class="nb"&gt;install &lt;/span&gt;n8n &lt;span class="nt"&gt;-g&lt;/span&gt;

&lt;span class="c"&gt;# Start n8n&lt;/span&gt;
n8n start

&lt;span class="c"&gt;# Or run as a background process&lt;/span&gt;
n8n start &amp;amp;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;n8n requires Node.js 18+ and runs on Linux, macOS, and Windows.&lt;/p&gt;
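
&lt;p&gt;For anything longer-lived than a test run, a process manager is sturdier than a bare &lt;code&gt;&amp;amp;&lt;/code&gt;. A sketch with pm2 (one option among many; systemd or Docker restart policies work just as well):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Install pm2 and keep n8n running across crashes and reboots
npm install -g pm2
pm2 start n8n
pm2 save       # persist the process list
pm2 startup    # print the command that enables start-on-boot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;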

&lt;h2&gt;
  
  
  Key Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  400+ Built-In Integrations
&lt;/h3&gt;

&lt;p&gt;n8n ships with native nodes for virtually every popular service: Google Sheets, Slack, GitHub, Airtable, Notion, PostgreSQL, MySQL, Stripe, Shopify, HubSpot, Telegram, Discord, and hundreds more. Each node is pre-configured with authentication and common operations — no API documentation hunting required.&lt;/p&gt;

&lt;h3&gt;
  
  
  HTTP Request Node: Connect to Any API
&lt;/h3&gt;

&lt;p&gt;For services without a dedicated node, the HTTP Request node handles any REST API with support for OAuth, API keys, and custom headers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Example: Call a custom API in n8n HTTP Request node
Method: POST
URL: https://api.example.com/v1/data
Authentication: Header Auth
Header Name: Authorization
Header Value: Bearer your-api-key

Body (JSON):
{
  "query": "{{ $json.input }}",
  "limit": 10
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Code Node: Write JavaScript or Python
&lt;/h3&gt;

&lt;p&gt;When visual nodes aren’t enough, the Code node lets you write JavaScript or Python directly inside your workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// n8n Code node — process incoming data&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;json&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toUpperCase&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
      &lt;span class="na"&gt;processedAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
      &lt;span class="na"&gt;score&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;1.15&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Webhook Triggers
&lt;/h3&gt;

&lt;p&gt;n8n generates webhook URLs you can use to trigger workflows from external services. Set up a webhook trigger, copy the URL, and point any service at it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Example: Trigger n8n workflow via curl&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://your-n8n.com/webhook/your-workflow-id &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"event": "new_order", "order_id": "12345", "amount": 99.99}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Schedule Trigger
&lt;/h3&gt;

&lt;p&gt;Run workflows on a schedule using cron syntax. This is useful for daily reports, data syncing, or regular API calls — all without a separate cron server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="cm"&gt;/* Schedule examples in n8n:
   Every day at 9 AM:   0 9 * * *
   Every hour:          0 * * * *
   Every 15 minutes:    */&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
   &lt;span class="nx"&gt;Every&lt;/span&gt; &lt;span class="nx"&gt;weekday&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;       &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;
&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="err"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  n8n AI Agent Nodes
&lt;/h2&gt;

&lt;p&gt;n8n added native AI support with dedicated nodes for LLMs, memory, and tool use. You can build AI agents without writing a single line of Python:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AI Agent node:&lt;/strong&gt; Define a task in plain English, attach tools (web search, database queries, API calls), and the agent handles the reasoning loop&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chat Trigger:&lt;/strong&gt; Build a chatbot interface connected to any LLM&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM node:&lt;/strong&gt; Send prompts to OpenAI, Anthropic, Groq, Mistral, or Ollama&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory nodes:&lt;/strong&gt; Window buffer memory, vector store memory for long context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embeddings + Vector Store:&lt;/strong&gt; Connect Pinecone, Qdrant, or Supabase for RAG workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example AI agent workflow: a Telegram bot receives a message → AI Agent node processes it using a Groq LLM → the agent can search the web, query your database, or send emails → response goes back to Telegram. The entire pipeline is built visually, no code required.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use n8n with OpenClaw for Advanced AI Orchestration
&lt;/h2&gt;

&lt;p&gt;For teams that need even more sophisticated AI automation, combining n8n with &lt;a href="https://openclaw.ai" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; creates a powerful dual-layer system. n8n handles the workflow orchestration — scheduling, triggers, data transformation, and service integrations — while OpenClaw manages complex AI agent tasks like multi-step reasoning, tool use, and research automation.&lt;/p&gt;

&lt;p&gt;A typical integration pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// n8n HTTP Request node calling OpenClaw API&lt;/span&gt;
&lt;span class="c1"&gt;// Trigger: Webhook from your app&lt;/span&gt;
&lt;span class="c1"&gt;// Step 1: Collect and format input data&lt;/span&gt;
&lt;span class="c1"&gt;// Step 2: Call OpenClaw for AI processing&lt;/span&gt;
&lt;span class="c1"&gt;// Step 3: Send results to Slack, database, or email&lt;/span&gt;

&lt;span class="c1"&gt;// In n8n Code node — build the OpenClaw payload&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;inputData&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;first&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
  &lt;span class="na"&gt;json&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Analyze this customer feedback and categorize it: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;inputData&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;feedback&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;inputData&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;product&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;inputData&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;product&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}];&lt;/span&gt;

&lt;span class="c1"&gt;// Then in HTTP Request node:&lt;/span&gt;
&lt;span class="c1"&gt;// POST https://api.openclaw.ai/v1/run&lt;/span&gt;
&lt;span class="c1"&gt;// Body: {{ $json }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern works well for customer support automation, content moderation, data enrichment pipelines, and any task that combines structured workflows with unstructured AI reasoning.&lt;/p&gt;

&lt;h2&gt;
  
  
  n8n vs Zapier vs Make.com
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;n8n (self-hosted)&lt;/th&gt;
&lt;th&gt;Zapier&lt;/th&gt;
&lt;th&gt;Make.com&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Price&lt;/td&gt;
&lt;td&gt;Free (self-hosted)&lt;/td&gt;
&lt;td&gt;From $19.99/month&lt;/td&gt;
&lt;td&gt;From $9/month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Executions limit&lt;/td&gt;
&lt;td&gt;Unlimited&lt;/td&gt;
&lt;td&gt;100-750/month (free)&lt;/td&gt;
&lt;td&gt;1,000 ops/month (free)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Integrations&lt;/td&gt;
&lt;td&gt;400+ native&lt;/td&gt;
&lt;td&gt;6,000+&lt;/td&gt;
&lt;td&gt;1,500+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Custom code&lt;/td&gt;
&lt;td&gt;Yes (JS + Python)&lt;/td&gt;
&lt;td&gt;Yes (Code steps, JS + Python)&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI Agent nodes&lt;/td&gt;
&lt;td&gt;Native (built-in)&lt;/td&gt;
&lt;td&gt;Limited (AI actions)&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Self-hostable&lt;/td&gt;
&lt;td&gt;Yes (open source)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data privacy&lt;/td&gt;
&lt;td&gt;Full (your server)&lt;/td&gt;
&lt;td&gt;Zapier’s cloud&lt;/td&gt;
&lt;td&gt;Make’s cloud&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Webhook support&lt;/td&gt;
&lt;td&gt;Yes (unlimited)&lt;/td&gt;
&lt;td&gt;Yes (paid plans)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Database nodes&lt;/td&gt;
&lt;td&gt;Yes (Postgres, MySQL, etc.)&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best for&lt;/td&gt;
&lt;td&gt;Developers, privacy-conscious teams&lt;/td&gt;
&lt;td&gt;Non-technical users, breadth&lt;/td&gt;
&lt;td&gt;Complex visual workflows&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The key trade-off:&lt;/strong&gt; Zapier has far more integrations (6,000+ vs 400+), which matters if your app of choice doesn’t have an n8n node. But n8n’s HTTP Request node fills most gaps, and for any technical team, n8n’s self-hosting + unlimited executions + code nodes make it the clear winner on value.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-World n8n Workflow Examples
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. GitHub Issue to Slack Notifier
&lt;/h3&gt;

&lt;p&gt;Trigger → When a GitHub issue is created or updated → Filter by label → Send a formatted Slack message to your team channel. Took about 5 minutes to build, no code needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Daily Report from PostgreSQL to Email
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="cm"&gt;/* Workflow:
   1. Schedule Trigger (every day at 8 AM)
   2. PostgreSQL node: SELECT summary stats
   3. Code node: Format data into HTML table
   4. Gmail/SMTP node: Send report email
*/&lt;/span&gt;

&lt;span class="c1"&gt;// Code node — build HTML report&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tableRows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;row&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;`
  &amp;lt;tr&amp;gt;
    &amp;lt;td&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;date&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;lt;/td&amp;gt;
    &amp;lt;td&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;lt;/td&amp;gt;
    &amp;lt;td&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;revenue&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;lt;/td&amp;gt;
  &amp;lt;/tr&amp;gt;
`&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;json&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;html&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`&amp;lt;table&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;tableRows&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;lt;/table&amp;gt;`&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. AI-Powered Content Moderation
&lt;/h3&gt;

&lt;p&gt;Webhook receives user submission → LLM node analyzes content for policy violations → If flagged, create a Notion task for manual review + notify moderators via Telegram → If clean, auto-approve and store in database.&lt;/p&gt;
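&lt;p&gt;The branching step of that workflow can be sketched in a Code node; the field names below (&lt;code&gt;flagged&lt;/code&gt;, &lt;code&gt;reason&lt;/code&gt;) are illustrative stand-ins for whatever your LLM node returns, not a real moderation API:&lt;/p&gt;

```javascript
// Sketch: decide which branch a user submission takes.
// `verdict` stands in for the LLM node's parsed JSON output.
function routeSubmission(verdict) {
  if (verdict.flagged) {
    // Downstream nodes would create the Notion task and Telegram alert
    return { branch: 'manual_review', reason: verdict.reason };
  }
  // Downstream nodes would auto-approve and write to the database
  return { branch: 'auto_approve', reason: null };
}

console.log(routeSubmission({ flagged: true, reason: 'policy: spam' }));
```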

&lt;h2&gt;
  
  
  n8n Deployment Tips
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use PostgreSQL as the database:&lt;/strong&gt; By default n8n uses SQLite. Switch to PostgreSQL for better performance and reliability in production&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set up a reverse proxy:&lt;/strong&gt; Use Nginx or Caddy to serve n8n on port 443 with HTTPS&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enable basic auth or use n8n’s built-in user management:&lt;/strong&gt; Don’t expose n8n to the internet without authentication&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Back up &lt;code&gt;/home/node/.n8n&lt;/code&gt;:&lt;/strong&gt; This directory contains all your workflows and credentials&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use environment variables for secrets:&lt;/strong&gt; Store API keys as n8n credentials, not hardcoded in workflows
&lt;/li&gt;
&lt;/ul&gt;
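&lt;p&gt;The backup tip above can be a simple, cron-able one-liner; the path assumes a default install (&lt;code&gt;~/.n8n&lt;/code&gt; for npm installs, mapped from &lt;code&gt;/home/node/.n8n&lt;/code&gt; in the Docker setup):&lt;/p&gt;

```shell
# Archive the n8n data directory (workflows + encrypted credentials).
# Adjust N8N_DIR if your volume lives elsewhere.
N8N_DIR="${N8N_DIR:-$HOME/.n8n}"
mkdir -p "$N8N_DIR"
tar -czf "n8n-backup-$(date +%F).tar.gz" -C "$(dirname "$N8N_DIR")" "$(basename "$N8N_DIR")"
```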

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Production n8n with PostgreSQL&lt;/span&gt;
&lt;span class="c1"&gt;# docker-compose.yml&lt;/span&gt;

&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;3.8'&lt;/span&gt;

&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;postgres&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres:15&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;always&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;POSTGRES_DB&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;n8n&lt;/span&gt;
      &lt;span class="na"&gt;POSTGRES_USER&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;n8n&lt;/span&gt;
      &lt;span class="na"&gt;POSTGRES_PASSWORD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;n8n_password&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;postgres_data:/var/lib/postgresql/data&lt;/span&gt;

  &lt;span class="na"&gt;n8n&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;n8nio/n8n&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;always&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5678:5678"&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;DB_TYPE=postgresdb&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;DB_POSTGRESDB_HOST=postgres&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;DB_POSTGRESDB_DATABASE=n8n&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;DB_POSTGRESDB_USER=n8n&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;DB_POSTGRESDB_PASSWORD=n8n_password&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;N8N_BASIC_AUTH_ACTIVE=true&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;N8N_BASIC_AUTH_USER=admin&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;N8N_BASIC_AUTH_PASSWORD=securepassword&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;WEBHOOK_URL=https://n8n.yourdomain.com/&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;n8n_data:/home/node/.n8n&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;postgres&lt;/span&gt;

&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;postgres_data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;n8n_data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  When to Use n8n
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;n8n is the right choice when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You want unlimited workflow executions without per-task fees&lt;/li&gt;
&lt;li&gt;You need data to stay on your own server (compliance, GDPR, privacy)&lt;/li&gt;
&lt;li&gt;You want to write custom code inside your automation workflows&lt;/li&gt;
&lt;li&gt;You’re building AI agent pipelines with multiple LLM steps&lt;/li&gt;
&lt;li&gt;You need direct database access (Postgres, MySQL, MongoDB) inside workflows&lt;/li&gt;
&lt;li&gt;You’re already running a VPS and want to add automation without paying for SaaS&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Consider Zapier or Make.com when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need a specific integration that n8n doesn’t support natively&lt;/li&gt;
&lt;li&gt;You have no technical resources to maintain a self-hosted server&lt;/li&gt;
&lt;li&gt;You need a truly no-code solution for non-developers to manage&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Related Reads
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/crewai-vs-autogpt-vs-langgraph/" rel="noopener noreferrer"&gt;CrewAI vs AutoGPT vs LangGraph: Which Free Agent Framework Should You Use in 2026?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/mcp-protocol-ai-agents/" rel="noopener noreferrer"&gt;MCP (Model Context Protocol): Connect AI Agents to Any Tool or API&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/notebooklm-ai-research/" rel="noopener noreferrer"&gt;Google NotebookLM: Free AI Research Tool for Summarizing Documents and PDFs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/dify-ai-app-builder/" rel="noopener noreferrer"&gt;Dify: Free Open-Source AI App Builder for Chatbots and Workflows&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/crewai-multi-agent-framework/" rel="noopener noreferrer"&gt;CrewAI: Free Open-Source Multi-Agent AI Framework for Python&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Final Verdict
&lt;/h2&gt;

&lt;p&gt;n8n is the most powerful free automation tool available for developers and technical teams in 2026. The combination of unlimited self-hosted executions, 400+ integrations, native AI agent nodes, and full JavaScript/Python code support makes it vastly more capable than what Zapier’s free tier offers — and cheaper than Zapier’s paid tiers at scale.&lt;/p&gt;

&lt;p&gt;If you have a server (even a free Oracle Cloud ARM instance), &lt;a href="https://n8n.io" rel="noopener noreferrer"&gt;n8n&lt;/a&gt; running on Docker will cost you nothing and automate more than you’d expect. Start with the Docker one-liner, build your first workflow in under 10 minutes, and never pay per-task fees again.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://toolfreebie.com/n8n-workflow-automation/" rel="noopener noreferrer"&gt;toolfreebie.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
    </item>
    <item>
      <title>Vercel vs Netlify vs Cloudflare Pages: Free Frontend Hosting Compared</title>
      <dc:creator>toolfreebie</dc:creator>
      <pubDate>Sun, 03 May 2026 11:34:59 +0000</pubDate>
      <link>https://dev.to/build996/vercel-vs-netlify-vs-cloudflare-pages-free-frontend-hosting-compared-3h9g</link>
      <guid>https://dev.to/build996/vercel-vs-netlify-vs-cloudflare-pages-free-frontend-hosting-compared-3h9g</guid>
      <description>&lt;h2&gt;
  
  
  Vercel, Netlify, and Cloudflare Pages: The Three Free Frontends
&lt;/h2&gt;

&lt;p&gt;If you’re hosting a static site, Next.js app, or JAMstack project in 2026, you have three dominant free options: &lt;strong&gt;Vercel&lt;/strong&gt;, &lt;strong&gt;Netlify&lt;/strong&gt;, and &lt;strong&gt;Cloudflare Pages&lt;/strong&gt;. All three offer global CDN deployment, GitHub integration, and a genuinely useful free tier — but they differ significantly in build limits, function support, and performance.&lt;/p&gt;

&lt;p&gt;This article cuts through the marketing and compares what actually matters: free tier limits, deployment experience, edge performance, and which platform to choose for your specific use case.&lt;/p&gt;

&lt;h2&gt;
  
  
  Free Tier Comparison at a Glance
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Vercel&lt;/th&gt;
&lt;th&gt;Netlify&lt;/th&gt;
&lt;th&gt;Cloudflare Pages&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Bandwidth&lt;/td&gt;
&lt;td&gt;100 GB/month&lt;/td&gt;
&lt;td&gt;100 GB/month&lt;/td&gt;
&lt;td&gt;Unlimited&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Build minutes&lt;/td&gt;
&lt;td&gt;6,000 min/month&lt;/td&gt;
&lt;td&gt;300 min/month&lt;/td&gt;
&lt;td&gt;500 builds/month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Serverless functions&lt;/td&gt;
&lt;td&gt;1M requests/month&lt;/td&gt;
&lt;td&gt;125K requests/month&lt;/td&gt;
&lt;td&gt;100K requests/day&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sites / projects&lt;/td&gt;
&lt;td&gt;Unlimited&lt;/td&gt;
&lt;td&gt;Unlimited&lt;/td&gt;
&lt;td&gt;Unlimited&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Custom domains&lt;/td&gt;
&lt;td&gt;Unlimited&lt;/td&gt;
&lt;td&gt;Unlimited&lt;/td&gt;
&lt;td&gt;Unlimited&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Edge locations&lt;/td&gt;
&lt;td&gt;100+ PoPs&lt;/td&gt;
&lt;td&gt;Global CDN&lt;/td&gt;
&lt;td&gt;300+ PoPs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Preview URLs (PRs)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Commercial use (free)&lt;/td&gt;
&lt;td&gt;No (Pro required)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Free tier price&lt;/td&gt;
&lt;td&gt;Free (hobby only)&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Key takeaway:&lt;/strong&gt; Cloudflare Pages wins on raw limits (unlimited bandwidth, 300+ edge locations). Vercel wins on developer experience for React/Next.js. Netlify wins on ecosystem integrations and free forms.&lt;/p&gt;

&lt;h2&gt;
  
  
  Vercel: Best for Next.js and React
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://vercel.com" rel="noopener noreferrer"&gt;Vercel&lt;/a&gt; built Next.js — and it shows. Deploying a Next.js app on Vercel is the smoothest experience in frontend hosting. Server-side rendering, React Server Components, Edge Functions, and App Router all work out of the box with zero configuration.&lt;/p&gt;

&lt;h3&gt;
  
  
  Vercel Free Tier Limits (Hobby Plan)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Bandwidth:&lt;/strong&gt; 100 GB/month — enough for a small to mid-traffic blog or SaaS landing page&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build minutes:&lt;/strong&gt; 6,000/month — generous; a typical Next.js build takes 1–3 minutes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Serverless Functions:&lt;/strong&gt; 1M invocations/month, 100 GB-hours compute, 10-second max duration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edge Functions:&lt;/strong&gt; 500K invocations/month, unlimited duration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image optimization:&lt;/strong&gt; 1,000 source images/month (the Next.js Image component uses this)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Important limitation:&lt;/strong&gt; Vercel’s Hobby plan is for personal, non-commercial use only. If you’re building a commercial product — even a small SaaS — you technically need the Pro plan at $20/month per member. Netlify and Cloudflare Pages have no such restriction.&lt;/p&gt;

&lt;h3&gt;
  
  
  Deploy a Next.js App on Vercel
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Vercel CLI&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; vercel

&lt;span class="c"&gt;# Deploy from your project directory&lt;/span&gt;
vercel

&lt;span class="c"&gt;# Follow the prompts — it auto-detects Next.js&lt;/span&gt;
&lt;span class="c"&gt;# Production URL is live in ~60 seconds&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or connect GitHub: every push to &lt;code&gt;main&lt;/code&gt; auto-deploys. Every pull request gets a unique preview URL at &lt;code&gt;your-project-&amp;lt;hash&amp;gt;.vercel.app&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Vercel Serverless Functions
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// app/api/hello/route.ts (Next.js App Router)&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;GET&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Hello from Vercel Edge!&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// pages/api/hello.js (Pages Router)&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Hello from Vercel&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Functions deploy automatically with your Next.js app — no extra configuration needed. The 1M free invocations/month handles ~33K requests/day, which is plenty for most projects.&lt;/p&gt;

&lt;h2&gt;
  
  
  Netlify: Best for JAMstack and CMS Integration
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.netlify.com" rel="noopener noreferrer"&gt;Netlify&lt;/a&gt; pioneered the modern static hosting model in 2015 and remains the most feature-rich free tier for JAMstack sites. Where it stands out: built-in forms, split A/B testing, identity/auth management, and deep integrations with headless CMS platforms like Contentful, Sanity, and Strapi.&lt;/p&gt;

&lt;h3&gt;
  
  
  Netlify Free Tier Limits
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Bandwidth:&lt;/strong&gt; 100 GB/month&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build minutes:&lt;/strong&gt; 300/month — the most restrictive of the three; a complex Gatsby or Hugo build can eat 5–10 minutes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Serverless Functions:&lt;/strong&gt; 125,000 invocations/month, 100 GB-hours compute&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Netlify Forms:&lt;/strong&gt; 100 submissions/month — free form backend with spam filtering, no server required&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Concurrent builds:&lt;/strong&gt; 1 — you can’t run multiple builds simultaneously on the free tier&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Deploy Any Static Site on Netlify
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Netlify CLI&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; netlify-cli

&lt;span class="c"&gt;# Login&lt;/span&gt;
netlify login

&lt;span class="c"&gt;# Deploy from build directory&lt;/span&gt;
netlify deploy &lt;span class="nt"&gt;--dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;./dist

&lt;span class="c"&gt;# Deploy to production&lt;/span&gt;
netlify deploy &lt;span class="nt"&gt;--dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;./dist &lt;span class="nt"&gt;--prod&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Netlify auto-detects frameworks: React, Vue, Svelte, Gatsby, Hugo, Jekyll, Astro. Set your build command and output directory once — Netlify handles the rest.&lt;/p&gt;
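&lt;p&gt;When auto-detection guesses wrong, you can pin the build explicitly in a &lt;code&gt;netlify.toml&lt;/code&gt; at the repo root. A minimal sketch with example values (swap in your own build command and output directory):&lt;/p&gt;

```toml
# netlify.toml — minimal build configuration (example values)
[build]
  command = "npm run build"   # your framework's build command
  publish = "dist"            # your build output directory

[functions]
  directory = "netlify/functions"  # where serverless functions live
```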

&lt;h3&gt;
  
  
  Netlify Functions (Serverless)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// netlify/functions/hello.js&lt;/span&gt;
&lt;span class="nx"&gt;exports&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Hello from Netlify Functions&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Deploy to &lt;code&gt;netlify/functions/&lt;/code&gt; and the function is automatically available at &lt;code&gt;/.netlify/functions/hello&lt;/code&gt;. No routing config needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Netlify Forms: Free Backend Without a Server
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="c"&gt;&amp;lt;!-- Add data-netlify="true" to any HTML form --&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;form&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"contact"&lt;/span&gt; &lt;span class="na"&gt;method=&lt;/span&gt;&lt;span class="s"&gt;"POST"&lt;/span&gt; &lt;span class="na"&gt;data-netlify=&lt;/span&gt;&lt;span class="s"&gt;"true"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;input&lt;/span&gt; &lt;span class="na"&gt;type=&lt;/span&gt;&lt;span class="s"&gt;"hidden"&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"form-name"&lt;/span&gt; &lt;span class="na"&gt;value=&lt;/span&gt;&lt;span class="s"&gt;"contact"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;input&lt;/span&gt; &lt;span class="na"&gt;type=&lt;/span&gt;&lt;span class="s"&gt;"text"&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"name"&lt;/span&gt; &lt;span class="na"&gt;placeholder=&lt;/span&gt;&lt;span class="s"&gt;"Your Name"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;input&lt;/span&gt; &lt;span class="na"&gt;type=&lt;/span&gt;&lt;span class="s"&gt;"email"&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"email"&lt;/span&gt; &lt;span class="na"&gt;placeholder=&lt;/span&gt;&lt;span class="s"&gt;"Your Email"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;textarea&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"message"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&amp;lt;/textarea&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;button&lt;/span&gt; &lt;span class="na"&gt;type=&lt;/span&gt;&lt;span class="s"&gt;"submit"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;Send&lt;span class="nt"&gt;&amp;lt;/button&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/form&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Netlify intercepts the form POST, stores submissions in their dashboard, and can send email notifications — completely free for up to 100 submissions/month. No backend code required.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cloudflare Pages: Best for Global Performance and Unlimited Scale
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://pages.cloudflare.com" rel="noopener noreferrer"&gt;Cloudflare Pages&lt;/a&gt; launched in 2021 as Cloudflare’s answer to Vercel and Netlify. Its differentiator: Cloudflare’s 300+ edge network, the world’s largest, combined with a genuinely unlimited free bandwidth tier.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cloudflare Pages Free Tier Limits
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Bandwidth:&lt;/strong&gt; Unlimited — no overage charges, ever&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Builds:&lt;/strong&gt; 500 per month, 20,000 files per deployment&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pages Functions (serverless):&lt;/strong&gt; 100,000 requests/day (free), same Workers runtime&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sites:&lt;/strong&gt; Unlimited&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Collaborators:&lt;/strong&gt; Unlimited (vs. Vercel’s 1-person Hobby limit)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Commercial use:&lt;/strong&gt; Yes — no hobby-only restriction&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Deploy to Cloudflare Pages
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Wrangler CLI&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; wrangler

&lt;span class="c"&gt;# Login&lt;/span&gt;
wrangler login

&lt;span class="c"&gt;# Deploy&lt;/span&gt;
wrangler pages deploy ./dist &lt;span class="nt"&gt;--project-name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;my-site
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or connect via the Cloudflare dashboard: &lt;strong&gt;Pages&lt;/strong&gt; → &lt;strong&gt;Create a project&lt;/strong&gt; → &lt;strong&gt;Connect to Git&lt;/strong&gt;. Cloudflare detects the framework and configures build settings automatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cloudflare Pages Functions
&lt;/h3&gt;

&lt;p&gt;Pages Functions use the same Workers runtime — V8 isolates, not Node.js. This means faster cold starts (sub-millisecond vs. ~100ms for Node Lambda) but a different API surface.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// functions/api/hello.js&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;onRequestGet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Hello from Cloudflare Pages Functions&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;File-based routing: &lt;code&gt;functions/api/hello.js&lt;/code&gt; maps to &lt;code&gt;/api/hello&lt;/code&gt;. No configuration needed.&lt;/p&gt;
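&lt;p&gt;Dynamic routes follow the same convention: a bracketed filename becomes a parameter on &lt;code&gt;context.params&lt;/code&gt;. A minimal sketch (the &lt;code&gt;users/[id].js&lt;/code&gt; path and response body are illustrative, not from an existing project):&lt;/p&gt;

```javascript
// functions/api/users/[id].js — served at /api/users/:id
// The bracketed filename exposes that URL segment as context.params.id.
export async function onRequestGet(context) {
  const id = context.params.id;
  return new Response(JSON.stringify({ userId: id }), {
    headers: { 'Content-Type': 'application/json' },
  });
}
```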

&lt;h2&gt;
  
  
  Performance: How Fast Is Each Platform?
&lt;/h2&gt;

&lt;p&gt;All three platforms serve static assets from a global CDN, so static content loads fast everywhere. The differences show up in Time to First Byte (TTFB) for dynamic rendering and function cold starts:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Vercel&lt;/th&gt;
&lt;th&gt;Netlify&lt;/th&gt;
&lt;th&gt;Cloudflare Pages&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Static asset TTFB&lt;/td&gt;
&lt;td&gt;~20–50ms global&lt;/td&gt;
&lt;td&gt;~20–50ms global&lt;/td&gt;
&lt;td&gt;~10–30ms (300+ PoPs)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Function cold start&lt;/td&gt;
&lt;td&gt;~50–100ms (Node.js)&lt;/td&gt;
&lt;td&gt;~50–150ms (Node.js)&lt;/td&gt;
&lt;td&gt;~0–5ms (V8 isolates)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Build speed (React app)&lt;/td&gt;
&lt;td&gt;Fast (~1–2 min)&lt;/td&gt;
&lt;td&gt;Medium (~2–4 min)&lt;/td&gt;
&lt;td&gt;Fast (~1–2 min)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Edge locations&lt;/td&gt;
&lt;td&gt;100+&lt;/td&gt;
&lt;td&gt;Global CDN&lt;/td&gt;
&lt;td&gt;300+&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Cloudflare’s V8 isolate model all but eliminates cold starts — Pages Functions typically start in microseconds, not milliseconds. For latency-sensitive APIs, this is a meaningful advantage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Framework Support
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Vercel&lt;/th&gt;
&lt;th&gt;Netlify&lt;/th&gt;
&lt;th&gt;Cloudflare Pages&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Next.js&lt;/td&gt;
&lt;td&gt;Native (built by Vercel)&lt;/td&gt;
&lt;td&gt;Good (via adapter)&lt;/td&gt;
&lt;td&gt;Partial (via @cloudflare/next-on-pages)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Remix&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Excellent (official adapter)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Astro&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SvelteKit&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Good (official adapter)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gatsby&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hugo/Jekyll&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Nuxt (Vue)&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Vercel’s edge over competitors is specifically for Next.js — React Server Components, App Router, and streaming all work perfectly because Vercel controls the framework. For anything else, all three platforms are roughly equivalent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Add AI to Your Frontend with OpenClaw and Serverless Functions
&lt;/h2&gt;

&lt;p&gt;All three platforms support serverless functions, which makes them excellent hosts for AI-powered frontends using &lt;a href="https://openclaw.ai" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt;. Deploy your static frontend to any of these platforms and use their serverless functions to call OpenClaw agents — keeping your API keys server-side while serving the UI globally.&lt;/p&gt;

&lt;p&gt;Example: a Vercel API route that calls an OpenClaw agent to analyze user input:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// app/api/analyze/route.ts (Vercel, Next.js App Router)&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;NextRequest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;NextResponse&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;next/server&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;POST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;NextRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://api.openclaw.ai/v1/run&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Authorization&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Bearer &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OPENCLAW_API_KEY&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Analyze this text and summarize key points: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;NextResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same pattern works on Netlify Functions and Cloudflare Pages Functions — just adapt to each platform’s runtime. The frontend calls &lt;code&gt;/api/analyze&lt;/code&gt; and gets AI-powered results without exposing your API keys to the browser.&lt;/p&gt;
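&lt;p&gt;As a sketch of that adaptation, here is the Netlify flavor of the same proxy. The handler signature is Netlify’s; the OpenClaw endpoint and payload shape are carried over from the Vercel example above, so treat those as assumptions rather than an official SDK:&lt;/p&gt;

```javascript
// netlify/functions/analyze.mjs — the same OpenClaw proxy, rewritten
// for Netlify's event/handler signature. OPENCLAW_API_KEY is set as an
// environment variable in the Netlify dashboard, never in the frontend.
export async function handler(event) {
  const payload = JSON.parse(event.body || '{}');

  const response = await fetch('https://api.openclaw.ai/v1/run', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.OPENCLAW_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      task: `Analyze this text and summarize key points: ${payload.text}`,
    }),
  });

  // Netlify expects a { statusCode, body } object rather than a Response.
  return { statusCode: response.status, body: JSON.stringify(await response.json()) };
}
```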

&lt;h2&gt;
  
  
  When to Choose Each Platform
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Choose Vercel when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You’re building with Next.js — Vercel’s native support is unmatched&lt;/li&gt;
&lt;li&gt;You want the best developer experience (CI/CD, preview deployments, analytics)&lt;/li&gt;
&lt;li&gt;Your project is personal/hobby (free tier commercial restriction applies)&lt;/li&gt;
&lt;li&gt;You need React Server Components with zero configuration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Choose Netlify when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need built-in forms without a backend&lt;/li&gt;
&lt;li&gt;You’re running a CMS-driven site (Contentful, Sanity, DatoCMS)&lt;/li&gt;
&lt;li&gt;You want commercial use without paying&lt;/li&gt;
&lt;li&gt;You need Netlify’s built-in identity/auth features for a JAMstack app&lt;/li&gt;
&lt;li&gt;You have a Hugo, Jekyll, or Gatsby site — Netlify’s build support for these is excellent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Choose Cloudflare Pages when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need unlimited bandwidth — no surprises on bills&lt;/li&gt;
&lt;li&gt;You’re already on Cloudflare for DNS/security&lt;/li&gt;
&lt;li&gt;You need the lowest possible latency globally (300+ edge locations)&lt;/li&gt;
&lt;li&gt;You’re building with Remix or SvelteKit (excellent adapters)&lt;/li&gt;
&lt;li&gt;You need serverless functions with zero cold starts&lt;/li&gt;
&lt;li&gt;You’re building a high-traffic site and don’t want bandwidth caps&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Paid Plans (When You Outgrow Free)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Plan&lt;/th&gt;
&lt;th&gt;Vercel Pro&lt;/th&gt;
&lt;th&gt;Netlify Pro&lt;/th&gt;
&lt;th&gt;Cloudflare Pages Paid&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Price&lt;/td&gt;
&lt;td&gt;$20/month per member&lt;/td&gt;
&lt;td&gt;$19/month per member&lt;/td&gt;
&lt;td&gt;$5/month (Workers Paid plan)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bandwidth&lt;/td&gt;
&lt;td&gt;1 TB/month&lt;/td&gt;
&lt;td&gt;400 GB/month&lt;/td&gt;
&lt;td&gt;Unlimited&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Build minutes&lt;/td&gt;
&lt;td&gt;24,000/month&lt;/td&gt;
&lt;td&gt;1,000/month&lt;/td&gt;
&lt;td&gt;5,000/month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Functions&lt;/td&gt;
&lt;td&gt;1M req/month&lt;/td&gt;
&lt;td&gt;125K/month&lt;/td&gt;
&lt;td&gt;10M req/month&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Cloudflare’s $5/month Workers Paid plan offers the best value at scale — 10M function requests, unlimited bandwidth, and it powers both Workers and Pages Functions in the same account.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Reads
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/render-hosting-review/" rel="noopener noreferrer"&gt;Render Free Hosting Review 2026: Deploy Web Apps, Databases, and Cron Jobs for Free&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/supabase-vs-neon/" rel="noopener noreferrer"&gt;Supabase vs Neon: Which Free PostgreSQL Database Should You Use in 2026?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/railway-heroku-alternative/" rel="noopener noreferrer"&gt;Railway App Review 2026: The Best Heroku Alternative for Developers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/oracle-free-arm-vps/" rel="noopener noreferrer"&gt;Oracle Cloud Always Free: Get a 4-Core 24GB ARM VPS for Free&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/best-free-hosting-2026/" rel="noopener noreferrer"&gt;7 Best Free Web Hosting for Developers: Cloudflare Pages, Vercel, Netlify and More&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Final Verdict
&lt;/h2&gt;

&lt;p&gt;There’s no single winner — it depends on your use case:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Next.js project:&lt;/strong&gt; Use Vercel. The framework and platform are made by the same team, and it shows.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JAMstack with forms/CMS:&lt;/strong&gt; Use Netlify. The ecosystem integrations are unmatched.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High traffic or unlimited bandwidth:&lt;/strong&gt; Use Cloudflare Pages. No bandwidth caps, 300+ edge locations, and sub-millisecond function starts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Commercial project on a budget:&lt;/strong&gt; Netlify or Cloudflare Pages — both allow commercial use on the free tier. Vercel’s free plan is hobby-only.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For most static sites and frontends, you can start on all three for free and migrate later if needed. The good news: deploying to any of them takes under 5 minutes from a GitHub repo.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://toolfreebie.com/vercel-netlify-cloudflare/" rel="noopener noreferrer"&gt;toolfreebie.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>hosting</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
