<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: toolfreebie</title>
    <description>The latest articles on DEV Community by toolfreebie (@build996).</description>
    <link>https://dev.to/build996</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3909730%2F6972eddd-4c8f-475b-a284-e5755d0ce323.jpeg</url>
      <title>DEV Community: toolfreebie</title>
      <link>https://dev.to/build996</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/build996"/>
    <language>en</language>
    <item>
      <title>Tavily vs Brave vs Exa: Free Search APIs for AI Agents</title>
      <dc:creator>toolfreebie</dc:creator>
      <pubDate>Thu, 28 May 2026 09:08:05 +0000</pubDate>
      <link>https://dev.to/build996/tavily-vs-brave-vs-exa-free-search-apis-for-ai-agents-59m1</link>
      <guid>https://dev.to/build996/tavily-vs-brave-vs-exa-free-search-apis-for-ai-agents-59m1</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftdjfitzwegp533qckrq1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftdjfitzwegp533qckrq1.jpg" alt="Tavily vs Brave vs Exa: Free Search APIs for AI Agents" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Every AI Agent Needs a Search Tool — Here Are the Three Free Ones That Actually Work
&lt;/h2&gt;

&lt;p&gt;If you build AI agents in 2026, your model is brilliant at reasoning over text and useless at knowing what happened yesterday. The fix is the same one every framework — &lt;a href="https://toolfreebie.com/crewai-multi-agent-framework/" rel="noopener noreferrer"&gt;CrewAI&lt;/a&gt;, LangGraph, AutoGPT, &lt;a href="https://toolfreebie.com/aider-free-ai-coding-cli/" rel="noopener noreferrer"&gt;Aider&lt;/a&gt;, every &lt;a href="https://toolfreebie.com/mcp-protocol-ai-agents/" rel="noopener noreferrer"&gt;MCP server&lt;/a&gt; with a “web search” tool — eventually arrives at: hand the agent a search API.&lt;/p&gt;

&lt;p&gt;Google’s official Search API is closed off behind enterprise contracts. Bing’s search API is being retired. SerpAPI starts at $75/month. For developers prototyping or running personal agents, the practical free options have narrowed down to three serious providers, each with a different definition of “free”:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tavily&lt;/strong&gt; — purpose-built for LLM retrieval, 1,000 free API credits per month, no credit card&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Brave Search API&lt;/strong&gt; — independent web index, 2,000 free queries per month at 1 query/second, no credit card&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exa&lt;/strong&gt; — neural search engine designed for AI, $10 of free signup credit (≈2,000 search-only calls, or ≈1,000 with contents), email signup only&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All three are production-ready, all three publish OpenAPI specs you can paste into a tool definition, and all three plug straight into the agent frameworks you are already using. This guide breaks down what each free tier really gets you, which one to pair with which framework, and the corner cases where you should reach for a different tool entirely.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Comparison: Tavily vs Brave vs Exa Free Tiers
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Tavily (Free)&lt;/th&gt;
&lt;th&gt;Brave Search API (Free)&lt;/th&gt;
&lt;th&gt;Exa (Free)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Free quota&lt;/td&gt;
&lt;td&gt;1,000 API credits/month&lt;/td&gt;
&lt;td&gt;2,000 queries/month&lt;/td&gt;
&lt;td&gt;$10 signup credit (~2,000 searches, ~1,000 with contents)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rate limit&lt;/td&gt;
&lt;td&gt;~10 requests/sec, no daily cap&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1 query/sec&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~5 requests/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Credit card needed&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Free tier resets&lt;/td&gt;
&lt;td&gt;Monthly&lt;/td&gt;
&lt;td&gt;Monthly&lt;/td&gt;
&lt;td&gt;One-time credit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Index type&lt;/td&gt;
&lt;td&gt;Aggregated (Bing/others) + own crawl&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Independent crawl, own index&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Embedding-based neural index&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Optimized for&lt;/td&gt;
&lt;td&gt;LLM RAG / agent retrieval&lt;/td&gt;
&lt;td&gt;Traditional web search&lt;/td&gt;
&lt;td&gt;Semantic / similarity search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Content extraction&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes — built-in &lt;code&gt;include_raw_content&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Snippet only (paid tier adds extraction)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes — built-in &lt;code&gt;contents.text&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;News endpoint&lt;/td&gt;
&lt;td&gt;Yes (&lt;code&gt;topic="news"&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Yes (dedicated news API)&lt;/td&gt;
&lt;td&gt;Yes (via &lt;code&gt;type="neural"&lt;/code&gt; + filters)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Domain include/exclude&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Limited (goggles)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best for&lt;/td&gt;
&lt;td&gt;RAG agents that need clean text&lt;/td&gt;
&lt;td&gt;Cheap, high-volume search at scale&lt;/td&gt;
&lt;td&gt;Finding similar pages / research&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The short version: &lt;strong&gt;Tavily&lt;/strong&gt; is what you reach for when an LLM is going to read the result; &lt;strong&gt;Brave&lt;/strong&gt; is what you reach for when you want a lot of independent search results cheaply; &lt;strong&gt;Exa&lt;/strong&gt; is what you reach for when you want results that share semantic meaning rather than just keywords.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Tavily?
&lt;/h2&gt;

&lt;p&gt;Tavily is a search API built specifically for LLMs and AI agents. Founded in 2023 and now used by tens of thousands of developers, it has become the default search tool in LangChain, the recommended tool in the CrewAI documentation, and the example most MCP search servers ship with.&lt;/p&gt;

&lt;p&gt;The pitch is straightforward: a normal search API gives you ten blue links and snippets. An agent then has to spend additional turns visiting each URL, parsing HTML, stripping ads and navigation, and producing usable text. Tavily collapses that entire pipeline into a single API call — you send a query, you get back ranked URLs &lt;em&gt;plus&lt;/em&gt; a clean, model-ready text answer extracted from the top results, with optional raw content of each page.&lt;/p&gt;

&lt;p&gt;For agents, this matters in two practical ways. First, it cuts token usage: instead of feeding 10 noisy HTML pages into your context window, you feed one cleaned summary plus three extracted snippets. Second, it cuts latency: one HTTP call instead of one search call plus ten fetch calls.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tavily Free Tier: What You Actually Get
&lt;/h2&gt;

&lt;p&gt;The free tier is generous for prototyping and personal agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;1,000 API credits per month&lt;/strong&gt;, refreshed at the start of each calendar month&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1 credit = 1 basic search&lt;/strong&gt;; &lt;code&gt;search_depth="advanced"&lt;/code&gt; costs 2 credits per call&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No credit card&lt;/strong&gt; required — sign up with email or GitHub and your key is live immediately&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full API access&lt;/strong&gt; — every endpoint and parameter that paid users get&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;~10 requests per second&lt;/strong&gt; rate limit (not officially published, but consistent in practice)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a hobby agent doing 30 searches per day (about 900 credits a month), you will stay under the limit as long as every call uses the basic depth. For a production app, the next paid tier (Researcher) is $30/month for 4,000 credits, with usage-based billing on top.&lt;/p&gt;
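&lt;p&gt;To sanity-check a usage plan against those numbers, here is a minimal sketch of the credit arithmetic. The function and constant names are made up for illustration and are not part of the Tavily SDK; the per-call costs and the 1,000-credit allowance are the figures listed above and are assumed to still be current.&lt;/p&gt;

```python
# Credit costs and allowance as described in this section (assumed current).
CREDITS_PER_CALL = {"basic": 1, "advanced": 2}
FREE_MONTHLY_CREDITS = 1000


def monthly_credits(searches_per_day, depth="basic", days=30):
    """Credits a steady workload consumes in one month."""
    return searches_per_day * days * CREDITS_PER_CALL[depth]


def free_credits_left(searches_per_day, depth="basic", days=30):
    """Free credits remaining at month end; negative means the tier runs out."""
    return FREE_MONTHLY_CREDITS - monthly_credits(searches_per_day, depth, days)


print(free_credits_left(30))               # 30 basic searches/day: 100 credits to spare
print(free_credits_left(30, "advanced"))   # same volume at advanced depth: -800
```

&lt;p&gt;The second line is the case to watch: the same 30 searches per day overshoot the free tier by 800 credits once every call runs at advanced depth.&lt;/p&gt;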

&lt;p&gt;One thing to know: Tavily does not run its own crawler at the scale of Google. It aggregates from upstream providers (Bing API is a major one) &lt;em&gt;plus&lt;/em&gt; a curated crawl of high-quality sources, then re-ranks the combined results for relevance to your specific LLM query. The ranking quality is the real product, not the raw index size.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started with Tavily
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Get Your Free API Key
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;a href="https://tavily.com" rel="noopener noreferrer"&gt;tavily.com&lt;/a&gt; and click &lt;strong&gt;Get API Key&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Sign in with GitHub or email — no credit card form appears&lt;/li&gt;
&lt;li&gt;Copy the key from your dashboard (it starts with &lt;code&gt;tvly-&lt;/code&gt;)&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  2. Call the API from Python
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install tavily-python
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from tavily import TavilyClient

client = TavilyClient(api_key="tvly-YOUR_KEY")

response = client.search(
    query="What were the major Claude 4.7 release notes?",
    search_depth="basic",         # "advanced" gives deeper crawl, costs 2 credits
    max_results=5,
    include_answer=True,          # LLM-generated summary of top results
    include_raw_content=False,    # set True to get full extracted page text
)

print(response["answer"])
for r in response["results"]:
    print(f"{r['title']} - {r['url']}")
    print(r["content"][:200])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Direct curl Without the SDK
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -X POST https://api.tavily.com/search \
  -H "Content-Type: application/json" \
  -d '{
    "api_key": "tvly-YOUR_KEY",
    "query": "latest open-source LLM benchmarks",
    "search_depth": "basic",
    "include_answer": true,
    "max_results": 5
  }'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Drop Tavily into a CrewAI Agent
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from crewai import Agent
from crewai_tools import TavilySearchTool

researcher = Agent(
    role="Research Analyst",
    goal="Find authoritative sources for any topic",
    backstory="You search the open web and cite primary sources only.",
    tools=[TavilySearchTool(api_key="tvly-YOUR_KEY")],
    verbose=True,
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is the whole integration — CrewAI ships the tool wrapper, and the agent will now call Tavily whenever its reasoning step decides to “look something up.” For setting up the CrewAI side, see our &lt;a href="https://toolfreebie.com/crewai-multi-agent-framework/" rel="noopener noreferrer"&gt;free CrewAI guide&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Brave Search API?
&lt;/h2&gt;

&lt;p&gt;Brave Search API is the developer-facing endpoint of the same search index that powers the Brave Browser’s default search. Unlike Tavily (which sits on top of upstream APIs) or Exa (which is a semantic engine), Brave runs its own independent web crawler and serves ~30 billion pages from infrastructure it controls.&lt;/p&gt;

&lt;p&gt;That independence is the entire pitch. Brave is not paying Microsoft for every query, and it is not subject to Bing’s rate limits or pricing changes. If you are building a product whose value is “we search the open web and synthesize answers” — for example, a competitor to Perplexity — the underlying index has to be one you actually control or license cheaply at scale. Brave is currently the most realistic option in that category.&lt;/p&gt;

&lt;p&gt;Brave also exposes several specialized endpoints out of the box: &lt;code&gt;/web/search&lt;/code&gt;, &lt;code&gt;/news/search&lt;/code&gt;, &lt;code&gt;/videos/search&lt;/code&gt;, &lt;code&gt;/images/search&lt;/code&gt;, and a &lt;code&gt;/suggest&lt;/code&gt; autocomplete endpoint. For an agent that needs different result types in different turns, having all of that under one key is genuinely convenient.&lt;/p&gt;

&lt;h2&gt;
  
  
  Brave Search API Free Tier: What You Actually Get
&lt;/h2&gt;

&lt;p&gt;The free plan, called &lt;strong&gt;Data for Free&lt;/strong&gt;, is the lowest-friction one of the three:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;2,000 queries per month&lt;/strong&gt; across the web search endpoint&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1 query per second&lt;/strong&gt; rate limit (this is the most-cited gotcha — you cannot fan out 10 parallel searches at once)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No credit card&lt;/strong&gt; required; you add a card only if you upgrade&lt;/li&gt;
&lt;li&gt;Access to &lt;strong&gt;web, news, video, image, and suggest endpoints&lt;/strong&gt; on the free plan&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Goggles support&lt;/strong&gt; — custom rerank rules to bias toward specific domains&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The free tier returns snippets, not extracted page bodies. If you want extracted markdown content with the search result, that requires the &lt;strong&gt;Data for AI&lt;/strong&gt; plan, which costs $5 per 1,000 queries and is the cheapest pure-search-plus-extraction price on the market.&lt;/p&gt;

&lt;p&gt;The 1 query/second rate limit on the free tier is the single most important number to internalize. If your agent does parallel fan-out search (a common pattern in LangGraph workflows), you will hit 429s immediately. The simplest fix is a token-bucket wrapper around the client.&lt;/p&gt;
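&lt;p&gt;A token bucket of the kind mentioned above might look like the following minimal sketch. The class name, its parameters, and the injectable &lt;code&gt;clock&lt;/code&gt; and &lt;code&gt;sleep&lt;/code&gt; hooks are illustrative assumptions, not part of any Brave SDK, and the class is not thread-safe as written.&lt;/p&gt;

```python
import time


class TokenBucket:
    """Holds up to `capacity` tokens and refills them at `rate` tokens per second."""

    def __init__(self, rate=1.0, capacity=1, clock=time.monotonic, sleep=time.sleep):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)   # start full so the first call goes through
        self.clock = clock              # injectable for tests
        self.sleep = sleep
        self.updated = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last update, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now

    def acquire(self):
        """Block until one token is available, consume it, return seconds waited."""
        self._refill()
        # Time needed to make up the missing fraction of a token (0.0 if none missing).
        waited = max(0.0, 1.0 - self.tokens) / self.rate
        if waited:
            self.sleep(waited)
            self._refill()
        self.tokens -= 1.0
        return waited
```

&lt;p&gt;With &lt;code&gt;rate=1.0&lt;/code&gt; and &lt;code&gt;capacity=1&lt;/code&gt; this matches the free-tier limit: call &lt;code&gt;bucket.acquire()&lt;/code&gt; right before each request, and back-to-back calls are spaced roughly one second apart.&lt;/p&gt;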

&lt;h2&gt;
  
  
  Getting Started with Brave Search
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Get Your Free Key
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;a href="https://api.search.brave.com" rel="noopener noreferrer"&gt;api.search.brave.com&lt;/a&gt; and click &lt;strong&gt;Get Started Free&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Sign up with email; verify; choose the &lt;strong&gt;Data for Free&lt;/strong&gt; plan&lt;/li&gt;
&lt;li&gt;Generate a subscription token from &lt;strong&gt;API Keys&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  2. curl First Call
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -s "https://api.search.brave.com/res/v1/web/search?q=open+source+LLM+benchmarks&amp;amp;count=10" \
  -H "Accept: application/json" \
  -H "X-Subscription-Token: YOUR_TOKEN"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Python Client with Rate Limiting Built In
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os, time, requests
from collections import deque

class BraveSearch:
    def __init__(self, token, rps=1):
        self.token = token
        self.min_interval = 1.0 / rps
        self.calls = deque()

    def _throttle(self):
        now = time.time()
        while self.calls and now - self.calls[0] &amp;gt; 1.0:
            self.calls.popleft()
        if self.calls and len(self.calls) &amp;gt;= 1:
            time.sleep(self.min_interval - (now - self.calls[-1]))
        self.calls.append(time.time())

    def search(self, q, count=10, country="us"):
        self._throttle()
        r = requests.get(
            "https://api.search.brave.com/res/v1/web/search",
            headers={
                "Accept": "application/json",
                "X-Subscription-Token": self.token,
            },
            params={"q": q, "count": count, "country": country},
            timeout=20,
        )
        r.raise_for_status()
        return r.json()

brave = BraveSearch(os.environ["BRAVE_TOKEN"])
data = brave.search("free vector databases for RAG 2026")
for r in data["web"]["results"][:5]:
    print(r["title"], "-", r["url"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Use Brave with LangChain
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain_community.tools import BraveSearch

tool = BraveSearch.from_api_key(
    api_key=os.environ["BRAVE_TOKEN"],
    search_kwargs={"count": 5},
)

print(tool.run("Latest GPT-4.5 evaluations"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What Is Exa?
&lt;/h2&gt;

&lt;p&gt;Exa (formerly Metaphor Systems) is a semantic search engine built around dense vector embeddings rather than a keyword-based inverted index. Instead of matching the words in your query against words on pages, Exa maps your query and its indexed pages into the same embedding space, then returns the pages whose meaning is closest, even if they share zero surface vocabulary with the query.&lt;/p&gt;

&lt;p&gt;This sounds like a niche distinction until you actually use it. Two examples that illustrate where Exa shines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;“Articles by someone who used to work at OpenAI and now does longevity research”&lt;/strong&gt; — a query with no good keywords to match on; Exa returns relevant blog posts; Google returns junk.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;“Pages similar to this Anthropic safety post”&lt;/strong&gt; — Exa has a dedicated &lt;code&gt;find_similar&lt;/code&gt; endpoint that returns semantically nearest pages to a URL you supply; the closest equivalent on Google is “site:” with a list you maintain yourself.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Exa is the right tool when your agent’s task is research, similarity discovery, or finding non-obvious sources. It is the wrong tool when you need the absolute newest news article from this morning, because the embedding index is updated continuously but not in real time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Exa Free Tier: What You Actually Get
&lt;/h2&gt;

&lt;p&gt;Exa structures its free path differently from the other two:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;$10 of free credit&lt;/strong&gt; at signup, no credit card required&lt;/li&gt;
&lt;li&gt;Pricing: &lt;strong&gt;$5 per 1,000 searches&lt;/strong&gt; for the basic &lt;code&gt;search&lt;/code&gt; endpoint, $10 per 1,000 for &lt;code&gt;search&lt;/code&gt; + &lt;code&gt;contents&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Effective free quota: &lt;strong&gt;~2,000 search-only calls or ~1,000 search-plus-contents calls&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Once the $10 runs out, you must add a card to continue — there is no monthly refill&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full feature access&lt;/strong&gt; on the free credit: neural search, keyword search, find-similar, contents extraction, livecrawl, summaries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you blow through $10 in a week of heavy experimentation, that signals either that the tool is genuinely valuable for your use case (in which case pay) or that you are using it wrong (search-plus-contents in a loop where you should be caching). Either way, the trial credit is enough to make a real go/no-go decision.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started with Exa
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Sign Up and Grab Your Key
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;a href="https://exa.ai" rel="noopener noreferrer"&gt;exa.ai&lt;/a&gt; and click &lt;strong&gt;Get API Key&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Sign in with Google or email; you land on the dashboard with $10 of credit visible&lt;/li&gt;
&lt;li&gt;Copy your key from &lt;strong&gt;API Keys&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  2. Neural Search with Contents Extraction
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install exa-py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from exa_py import Exa

exa = Exa(api_key="YOUR_KEY")

result = exa.search_and_contents(
    "research papers about retrieval-augmented generation evaluation",
    type="neural",
    num_results=5,
    text={"max_characters": 2000},   # extracted, cleaned page text
)

for r in result.results:
    print(r.title, "-", r.url)
    print(r.text[:300])
    print()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Find Similar Pages
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Given any URL, return semantically similar pages
similar = exa.find_similar_and_contents(
    "https://www.anthropic.com/research/constitutional-ai-harmlessness-from-ai-feedback",
    num_results=5,
    text=True,
)

for r in similar.results:
    print(r.url, "score:", round(r.score, 3))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. curl Without the SDK
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -s https://api.exa.ai/search \
  -H "x-api-key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "papers comparing dense and sparse retrievers",
    "numResults": 5,
    "type": "neural",
    "contents": {"text": {"maxCharacters": 1500}}
  }'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Head-to-Head: Tavily vs Brave vs Exa
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Quota Math for a Real Agent
&lt;/h3&gt;

&lt;p&gt;The “quota per month” numbers look comparable until you do the math against a realistic agent loop. Say your agent does 5 searches per user interaction, and you have 20 daily active users:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Daily searches:&lt;/strong&gt; 20 users × 5 searches = 100 searches/day = 3,000/month&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tavily free (1,000/mo):&lt;/strong&gt; covers ~6 users/day, then hard stop&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Brave free (2,000/mo):&lt;/strong&gt; covers ~13 users/day, plus the 1 req/sec ceiling caps your parallelism&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exa $10 credit:&lt;/strong&gt; ~2,000 search-only calls (~1,000 with contents), gone in 10 to 20 days, then pay&lt;/li&gt;
&lt;/ul&gt;
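&lt;p&gt;The same arithmetic as a short script, so you can plug in your own traffic. The helper names are hypothetical; the quotas and the per-1,000 Exa prices are the ones quoted in this article and are assumed to be flat rates.&lt;/p&gt;

```python
def users_covered(monthly_quota, searches_per_user=5, days=30):
    """Whole daily users a monthly quota supports at a fixed per-user search count."""
    return monthly_quota // (searches_per_user * days)


def calls_from_credit(credit_usd, price_per_1000_usd):
    """Calls a one-time credit buys at a flat per-1,000 price."""
    return credit_usd * 1000 // price_per_1000_usd


def days_until_empty(total_calls, searches_per_day):
    """Full days a fixed pool of calls lasts at a steady daily volume."""
    return total_calls // searches_per_day


daily = 20 * 5  # 20 users x 5 searches each
print(users_covered(1000))   # Tavily free: 6
print(users_covered(2000))   # Brave free: 13
print(days_until_empty(calls_from_credit(10, 5), daily))    # Exa search-only: 20
print(days_until_empty(calls_from_credit(10, 10), daily))   # Exa with contents: 10
```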

&lt;p&gt;For anything beyond hobby scale, you will hit a paid tier. The choice is then about which paid pricing fits your access pattern: Brave’s $5/1,000 with extraction is the cheapest in absolute terms, Tavily’s $30/4,000 includes LLM-tuned ranking, and Exa’s $10/1,000 with contents gives you semantic search that the other two do not offer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Result Quality for LLM Consumption
&lt;/h3&gt;

&lt;p&gt;This is what actually matters when an agent is the consumer. We are not optimizing for human eyeballs; we are optimizing for token efficiency and answer faithfulness downstream.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tavily&lt;/strong&gt; wins on this axis by design. The &lt;code&gt;include_answer=True&lt;/code&gt; flag returns an LLM-generated summary that is already cleaned, deduplicated, and citation-tagged. The &lt;code&gt;include_raw_content=True&lt;/code&gt; flag returns extracted page text without HTML, ads, or navigation — exactly what you would pipe into a &lt;code&gt;system&lt;/code&gt; message for a downstream LLM call. You pay no extra credits for this.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Brave&lt;/strong&gt; on the free tier returns search snippets only — typically the first ~150 characters of a result, with some metadata. To get clean page bodies you need either the paid Data for AI plan or a separate scraping step. For agents, this means an extra fetch hop per result.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Exa&lt;/strong&gt; ties with Tavily on content extraction (&lt;code&gt;contents.text&lt;/code&gt; returns cleaned page bodies) and uniquely offers &lt;code&gt;contents.summary&lt;/code&gt; which runs an LLM over each result to compress it further. It uses credit faster but the output is the most LLM-ready of the three.&lt;/p&gt;

&lt;h3&gt;
  
  
  Latency
&lt;/h3&gt;

&lt;p&gt;Measured from a single US datacenter, p50 round-trip on the basic &lt;code&gt;search&lt;/code&gt; endpoint with no extraction:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Brave:&lt;/strong&gt; ~250-400 ms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tavily basic:&lt;/strong&gt; ~400-700 ms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tavily advanced:&lt;/strong&gt; ~1.5-3 s (deeper crawl)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exa neural:&lt;/strong&gt; ~400-800 ms; with &lt;code&gt;contents&lt;/code&gt;: ~1-2 s&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For interactive agents, Brave is the fastest, Tavily basic and Exa neural are comparable, and Tavily advanced is in a different latency class — only worth it when answer quality justifies the wait.&lt;/p&gt;

&lt;h3&gt;
  
  
  Index Freshness
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Brave&lt;/strong&gt; wins on freshness — its independent crawler updates within hours for major news sources, and the &lt;code&gt;/news/search&lt;/code&gt; endpoint is specifically optimized for recency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tavily&lt;/strong&gt; inherits the freshness of its upstream Bing API plus its own curated crawl; typically within 1-6 hours for news. The &lt;code&gt;topic="news"&lt;/code&gt; parameter biases toward recency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Exa&lt;/strong&gt; updates continuously but with an embedding step in between, so very recent content (last few hours) may not yet be in the neural index. The &lt;code&gt;livecrawl="always"&lt;/code&gt; parameter forces a real-time crawl for the top hits, but costs more credit.&lt;/p&gt;

&lt;h3&gt;
  
  
  Working With Your Agent Framework
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Tavily&lt;/th&gt;
&lt;th&gt;Brave&lt;/th&gt;
&lt;th&gt;Exa&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LangChain / LangGraph&lt;/td&gt;
&lt;td&gt;Native (&lt;code&gt;TavilySearchResults&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Native (&lt;code&gt;BraveSearch&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Native (&lt;code&gt;ExaSearchRetriever&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://toolfreebie.com/crewai-multi-agent-framework/" rel="noopener noreferrer"&gt;CrewAI&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Native (&lt;code&gt;TavilySearchTool&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Via custom &lt;code&gt;BaseTool&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Native (&lt;code&gt;EXASearchTool&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://toolfreebie.com/mcp-protocol-ai-agents/" rel="noopener noreferrer"&gt;MCP servers&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Official &lt;code&gt;mcp-server-tavily&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Community &lt;code&gt;brave-search-mcp&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Official &lt;code&gt;exa-mcp-server&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI Assistants&lt;/td&gt;
&lt;td&gt;Function calling wrapper&lt;/td&gt;
&lt;td&gt;Function calling wrapper&lt;/td&gt;
&lt;td&gt;Function calling wrapper&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anthropic Claude tool use&lt;/td&gt;
&lt;td&gt;Tool definition snippet&lt;/td&gt;
&lt;td&gt;Tool definition snippet&lt;/td&gt;
&lt;td&gt;Tool definition snippet&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All three have well-maintained MCP servers, official or community-built, which is the path that lets your AI assistant (Claude Desktop, Cursor, Cline, etc.) gain search powers without writing any code at all. For background on MCP, see our &lt;a href="https://toolfreebie.com/mcp-protocol-ai-agents/" rel="noopener noreferrer"&gt;guide to the Model Context Protocol&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Which One Should You Use? A Decision Tree
&lt;/h2&gt;

&lt;p&gt;Work through these questions in order, and you will land on the right tool in most cases.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Is the consumer an LLM that needs clean, summarized text?&lt;/strong&gt; → Tavily. The &lt;code&gt;include_answer&lt;/code&gt; and &lt;code&gt;include_raw_content&lt;/code&gt; options are exactly what you want; both are off unless you set them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Do you need a high volume of cheap web searches with an independent index, and you are willing to do extraction yourself?&lt;/strong&gt; → Brave. The $5/1,000 paid tier is unbeatable on raw search cost.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Are you doing research, similarity discovery, or finding non-obvious sources?&lt;/strong&gt; → Exa. Neural search and &lt;code&gt;find_similar&lt;/code&gt; have no real free-tier competitor.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Do you need news in the last hour?&lt;/strong&gt; → Brave (&lt;code&gt;/news/search&lt;/code&gt;) or Tavily (&lt;code&gt;topic="news"&lt;/code&gt;); avoid Exa for breaking news.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Are you building a Perplexity-style product where the index &lt;em&gt;is&lt;/em&gt; the moat?&lt;/strong&gt; → Brave. The independent crawl matters at scale.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Are you prototyping an agent and just want one search call that “works”?&lt;/strong&gt; → Tavily. Easiest setup, cleanest output, and a free quota that refills every month.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Combining All Three: The “Search Router” Pattern
&lt;/h2&gt;

&lt;p&gt;For serious agent systems, a single search provider is a brittle dependency. A common pattern in 2026 is to wrap all three behind a single internal tool that routes by query type:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from tavily import TavilyClient
from exa_py import Exa
import requests, os

tavily = TavilyClient(os.environ["TAVILY_KEY"])
exa = Exa(os.environ["EXA_KEY"])
BRAVE_TOKEN = os.environ["BRAVE_TOKEN"]

def smart_search(query: str, intent: str = "general"):
    """Route to the best search provider for the intent.

    intent: 'news' | 'research' | 'similar' | 'general'
    """
    if intent == "news":
        # Brave news endpoint, freshest index
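        resp = requests.get(
            "https://api.search.brave.com/res/v1/news/search",
            headers={"X-Subscription-Token": BRAVE_TOKEN, "Accept": "application/json"},
            params={"q": query, "count": 5},
            timeout=15,
        )
        resp.raise_for_status()  # surface 429s and auth errors instead of parsing an error body
        r = resp.json()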
        return [{"title": x["title"], "url": x["url"], "text": x.get("description", "")}
                for x in r.get("results", [])]

    if intent == "research":
        # Exa neural search for semantic matching
        res = exa.search_and_contents(query, type="neural", num_results=5,
                                       text={"max_characters": 1500})
        return [{"title": r.title, "url": r.url, "text": r.text} for r in res.results]

    if intent == "similar":
        # Exa find-similar (query should be a URL)
        res = exa.find_similar_and_contents(query, num_results=5, text=True)
        return [{"title": r.title, "url": r.url, "text": r.text} for r in res.results]

    # default: Tavily for LLM-optimized general retrieval
    res = tavily.search(query=query, search_depth="basic",
                        include_answer=True, max_results=5)
    return [{"answer": res["answer"]}] + \
           [{"title": r["title"], "url": r["url"], "text": r["content"]}
            for r in res["results"]]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent’s reasoning step picks the &lt;code&gt;intent&lt;/code&gt; based on its own plan, and the router sends the query to the provider best suited to it. The code above does not check quota, so add a fallback if you want it to skip a provider whose free credits are spent. Pair this with a cache keyed by &lt;code&gt;(query, intent)&lt;/code&gt;, with a TTL suited to each intent, and your real search bill stays near zero for a long time.&lt;/p&gt;
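
&lt;p&gt;The routing code above does not track quota by itself. A minimal sketch of a fallback layer that could sit on top of it follows. The &lt;code&gt;Provider&lt;/code&gt; and &lt;code&gt;QuotaRouter&lt;/code&gt; classes and the stub search functions are hypothetical, and the credit counts are placeholders rather than real plan limits.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    name: str
    search: callable      # hypothetical: any function that takes a query string
    credits_left: int     # placeholder local budget, not a real plan limit


@dataclass
class QuotaRouter:
    providers: list = field(default_factory=list)

    def search(self, query):
        """Try providers in order, skipping any whose local budget is spent."""
        for p in self.providers:
            if p.credits_left == 0:
                continue
            p.credits_left -= 1
            return p.name, p.search(query)
        raise RuntimeError("all providers are out of free credits")


# Stub functions stand in for the real Tavily and Brave calls.
router = QuotaRouter([
    Provider("tavily", lambda q: [f"tavily result for {q}"], credits_left=1),
    Provider("brave", lambda q: [f"brave result for {q}"], credits_left=2),
])

first = router.search("latest Llama model")
second = router.search("latest Llama model")
print(first[0], second[0])   # tavily brave
```

&lt;p&gt;In a real system the budget would come from the provider dashboard or response headers rather than a local counter; the counter only keeps the sketch self-contained.&lt;/p&gt;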

&lt;h2&gt;
  
  
  Common Gotchas
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Tavily: Watch the &lt;code&gt;search_depth&lt;/code&gt; Default
&lt;/h3&gt;

&lt;p&gt;The Python SDK defaults to &lt;code&gt;search_depth="basic"&lt;/code&gt; (1 credit) but the LangChain wrapper has at times defaulted to &lt;code&gt;advanced&lt;/code&gt; (2 credits). With a 1,000-credit free tier, this halves your usable quota if you do not notice. Always pass &lt;code&gt;search_depth&lt;/code&gt; explicitly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Brave: Parallel Fan-Out Will Get You 429’d
&lt;/h3&gt;

&lt;p&gt;The free tier caps you at exactly 1 query per second. If your LangGraph workflow runs &lt;code&gt;asyncio.gather()&lt;/code&gt; over 5 sub-queries at once, four of them come back as 429s. Either wrap the calls in a token bucket (see the Python client above) or upgrade to the paid plan, which lifts the limit to 20 queries/second.&lt;/p&gt;
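
&lt;p&gt;One way to avoid the burst is to space calls out before they leave your process. The sketch below uses a minimum-interval limiter, a simpler cousin of a token bucket, rather than the client shown earlier. &lt;code&gt;MinIntervalLimiter&lt;/code&gt; and &lt;code&gt;fake_brave_search&lt;/code&gt; are hypothetical names, and the fake function stands in for the real HTTP request.&lt;/p&gt;

```python
import asyncio
import time


class MinIntervalLimiter:
    """Release at most one caller per `interval` seconds, even under gather()."""

    def __init__(self, interval):
        self.interval = interval
        self._lock = asyncio.Lock()
        self._next_slot = 0.0

    async def wait(self):
        # Reserve the next free time slot under the lock, then sleep outside it.
        async with self._lock:
            now = time.monotonic()
            delay = max(0.0, self._next_slot - now)
            self._next_slot = max(now, self._next_slot) + self.interval
        await asyncio.sleep(delay)


async def fake_brave_search(limiter, query, sent_at):
    # Hypothetical stand-in for the real HTTP call to Brave.
    await limiter.wait()
    sent_at.append(time.monotonic())
    return f"results for {query}"


async def main(interval):
    limiter = MinIntervalLimiter(interval)
    sent_at = []
    queries = ["q1", "q2", "q3"]
    results = await asyncio.gather(
        *(fake_brave_search(limiter, q, sent_at) for q in queries)
    )
    return results, sent_at


# interval=1.0 would match the 1 query/second cap described above;
# a short interval is used here only so the demo finishes quickly.
results, sent_at = asyncio.run(main(interval=0.05))
print(results)
```

&lt;p&gt;The callers still start together, but each one sleeps until its reserved slot, so the requests go out one interval apart instead of all at once.&lt;/p&gt;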

&lt;h3&gt;
  
  
  Exa: &lt;code&gt;type="auto"&lt;/code&gt; Costs More Than You Think
&lt;/h3&gt;

&lt;p&gt;With &lt;code&gt;type="auto"&lt;/code&gt;, Exa chooses between neural and keyword search for you, and neural costs more credit. If you know your query is keyword-heavy (“Anthropic blog post May 2026”), force &lt;code&gt;type="keyword"&lt;/code&gt; to save credit. Save &lt;code&gt;type="neural"&lt;/code&gt; for queries that benefit from semantic matching.&lt;/p&gt;

&lt;h3&gt;
  
  
  All Three: Cache Aggressively
&lt;/h3&gt;

&lt;p&gt;An agent that asks “what is the latest Llama model” twenty times in a debugging session burns 20 credits on the same answer. A trivial in-memory or SQLite cache keyed by the query string saves more credit than any other optimization you will do. The cache TTL should be 1 hour for general queries, 15 minutes for news, 24 hours for stable reference material.&lt;/p&gt;
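
&lt;p&gt;A minimal in-memory sketch of that cache, keyed by query and intent, is shown here. The TTL values come from the paragraph above; the &lt;code&gt;SearchCache&lt;/code&gt; class, the &lt;code&gt;reference&lt;/code&gt; intent label, and &lt;code&gt;fake_search&lt;/code&gt; are hypothetical, and the injectable clock exists only so the expiry can be demonstrated without waiting an hour.&lt;/p&gt;

```python
import time

# TTLs in seconds, taken from the guidance above.
TTL_BY_INTENT = {"general": 3600, "news": 900, "reference": 86400}


class SearchCache:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}   # (normalized query, intent) maps to (expires_at, value)

    def get_or_fetch(self, query, intent, fetch):
        key = (query.strip().lower(), intent)
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]                    # still fresh: no credit spent
        value = fetch(query)                 # miss or expired: one real API call
        ttl = TTL_BY_INTENT.get(intent, TTL_BY_INTENT["general"])
        self._store[key] = (now + ttl, value)
        return value


calls = []


def fake_search(query):
    # Hypothetical stand-in for a real provider call; records each invocation.
    calls.append(query)
    return [f"result for {query}"]


fake_now = [0.0]
cache = SearchCache(clock=lambda: fake_now[0])

for _ in range(20):
    cache.get_or_fetch("what is the latest Llama model", "general", fake_search)
print(len(calls))   # 1: the other 19 lookups were served from the cache

fake_now[0] = 3601.0   # just past the 1-hour TTL for general queries
cache.get_or_fetch("what is the latest Llama model", "general", fake_search)
print(len(calls))   # 2: the entry expired, so one more real call was made
```

&lt;p&gt;Swapping the dictionary for a SQLite table keeps the same interface and lets the cache survive restarts.&lt;/p&gt;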

&lt;h2&gt;
  
  
  Pairing Search With a Free LLM
&lt;/h2&gt;

&lt;p&gt;None of these search APIs do anything on their own — they feed text to an LLM that produces the actual user-facing answer. The cheapest production stack we have seen in 2026 pairs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Search:&lt;/strong&gt; Tavily free (1,000 monthly) for general retrieval + Brave free (2,000 monthly) for news fan-out&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM:&lt;/strong&gt; Free tier from &lt;a href="https://toolfreebie.com/groq-fastest-free-ai-api/" rel="noopener noreferrer"&gt;Groq&lt;/a&gt; (14,400 requests/day), &lt;a href="https://toolfreebie.com/gemini-free-ai-api/" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt; (1M token context), or &lt;a href="https://toolfreebie.com/together-ai-free-api-llama-deepseek-flux-2026/" rel="noopener noreferrer"&gt;Together AI&lt;/a&gt; (Llama 3.3 70B free tier)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Orchestration:&lt;/strong&gt; CrewAI for multi-agent flows, LangGraph for stateful workflows, or a vanilla function-calling loop for simple cases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability:&lt;/strong&gt; &lt;a href="https://toolfreebie.com/langfuse-llm-observability/" rel="noopener noreferrer"&gt;Langfuse&lt;/a&gt; self-hosted or Hobby tier to trace every search call and LLM call&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The total monthly bill at hobby scale, with all of the above: $0. The total at small production scale (a few hundred daily users): typically $30-80, almost all of it search-API overage above the free tiers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Can I use these search APIs for commercial products?
&lt;/h3&gt;

&lt;p&gt;Yes — all three offer commercial use on every tier including the free one. Read each provider’s Terms of Service for redistribution restrictions (typically you cannot resell raw search results as a competing search engine, but you can use them in any agent or end-product feature).&lt;/p&gt;

&lt;h3&gt;
  
  
  What about SerpAPI / ScraperAPI / SearXNG?
&lt;/h3&gt;

&lt;p&gt;SerpAPI is the long-standing Google-results scraper used by many older LangChain examples. It starts at $75/month with only a 100-search trial — fine for production, expensive for prototyping. ScraperAPI is similar. SearXNG is a self-hosted metasearch aggregator — free if you host it, but the throughput and stability depend on your hosting and on upstream search engines not rate-limiting your IP.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does Google offer a free search API in 2026?
&lt;/h3&gt;

&lt;p&gt;No public, generally-available one. Google Custom Search JSON API has a free tier of 100 queries/day, but it is limited to “site search” on a list of domains you specify in advance — it is not a general web search API. Google’s Vertex AI Search is enterprise-only.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which one works best in MCP setups?
&lt;/h3&gt;

&lt;p&gt;Tavily’s official &lt;code&gt;mcp-server-tavily&lt;/code&gt; is the most polished and the one Anthropic uses in its example MCP configs. Exa’s &lt;code&gt;exa-mcp-server&lt;/code&gt; is also official and adds the &lt;code&gt;find_similar&lt;/code&gt; tool, which is uniquely useful inside Claude Desktop. Brave Search is also available through MCP servers, and they work fine.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use these inside an MCP server I build myself?
&lt;/h3&gt;

&lt;p&gt;Yes — all three are just HTTP APIs. Wrap whichever one you prefer in an MCP tool definition and your assistant inherits web search capability. See our &lt;a href="https://toolfreebie.com/mcp-protocol-ai-agents/" rel="noopener noreferrer"&gt;MCP explainer&lt;/a&gt; for the full server pattern.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I know I am hitting the free-tier ceiling?
&lt;/h3&gt;

&lt;p&gt;Tavily and Exa both expose usage on their dashboards in near-real time. Brave shows usage on the dashboard with a 5-10 minute delay. All three return a structured error on 429 responses, along with rate-limit headers such as &lt;code&gt;x-ratelimit-remaining&lt;/code&gt; and &lt;code&gt;retry-after&lt;/code&gt;. Log those headers in your client, and where a provider also sends &lt;code&gt;x-ratelimit-remaining&lt;/code&gt; on successful responses, log it there too, so you can alert before you hit the cap rather than after.&lt;/p&gt;
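
&lt;p&gt;A small helper along these lines can turn the headers into an alertable status. This is a sketch under assumptions: the header names are the two mentioned above, the &lt;code&gt;quota_status&lt;/code&gt; function and its &lt;code&gt;warn_below&lt;/code&gt; threshold are hypothetical, and real providers may use different names or value formats, in which case the helper reports &lt;code&gt;unknown&lt;/code&gt;.&lt;/p&gt;

```python
def quota_status(headers, warn_below=100):
    """Classify a response by its rate-limit headers.

    Header names are assumptions based on the ones named in the text;
    check each provider's documentation for the exact names and formats.
    """
    lowered = {k.lower(): v for k, v in headers.items()}
    retry_after = lowered.get("retry-after")
    remaining = lowered.get("x-ratelimit-remaining")
    if retry_after is not None:
        return f"throttled: retry after {retry_after}s"
    if remaining is None or not remaining.strip().isdigit():
        return "unknown"
    if int(remaining) in range(warn_below):
        return f"warning: {int(remaining)} requests left"
    return "ok"


print(quota_status({"X-RateLimit-Remaining": "900"}))   # ok
print(quota_status({"x-ratelimit-remaining": "42"}))    # warning: 42 requests left
print(quota_status({"Retry-After": "1"}))               # throttled: retry after 1s
print(quota_status({}))                                 # unknown
```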

&lt;h3&gt;
  
  
  Is there a single “best free search API”?
&lt;/h3&gt;

&lt;p&gt;No, and any article that claims one is gaming a keyword. For LLM-consumed agent retrieval, Tavily is the cleanest default. For an independent index and high volume, Brave wins. For semantic search and find-similar, Exa is the only real option. The “search router” pattern earlier in this guide is the answer when you cannot pick.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bottom Line
&lt;/h2&gt;

&lt;p&gt;The free-search-API market in 2026 has stabilized into three genuinely useful options, each with a clear specialty. Pick by access pattern, not by raw quota:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Building an agent that needs search? Start with Tavily.&lt;/strong&gt; The clean text output and the monthly 1,000-credit refresh make it the lowest-friction first integration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need cheap volume? Add Brave.&lt;/strong&gt; 2,000 free queries plus the cheapest paid tier in the market mean it is the natural second provider when Tavily runs out.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Doing research or similarity work? Reach for Exa.&lt;/strong&gt; Neural and find-similar are unique capabilities the other two simply do not offer.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Wire up the router pattern, cache aggressively, and you can run a production-grade agent with web search capability for $0/month at hobby scale and a predictable $30-80 at small production scale. Combined with a free LLM tier from &lt;a href="https://toolfreebie.com/groq-fastest-free-ai-api/" rel="noopener noreferrer"&gt;Groq&lt;/a&gt; or &lt;a href="https://toolfreebie.com/gemini-free-ai-api/" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt;, that is a complete agent stack that costs nothing meaningful until you actually have users.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Reads
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/crewai-multi-agent-framework/" rel="noopener noreferrer"&gt;CrewAI: Free Open-Source Multi-Agent AI Framework for Python&lt;/a&gt; — the most natural framework to pair with these search APIs&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/mcp-protocol-ai-agents/" rel="noopener noreferrer"&gt;MCP (Model Context Protocol): Connect AI Agents to Any Tool or API&lt;/a&gt; — how all three providers ship MCP servers&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/crewai-vs-autogpt-vs-langgraph/" rel="noopener noreferrer"&gt;CrewAI vs AutoGPT vs LangGraph: Which Free Agent Framework Should You Use in 2026?&lt;/a&gt; — choosing the orchestrator that consumes your search results&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/langfuse-llm-observability/" rel="noopener noreferrer"&gt;Langfuse: Free Open-Source LLM Observability&lt;/a&gt; — trace every search call and LLM call your agent makes&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/groq-fastest-free-ai-api/" rel="noopener noreferrer"&gt;Groq API: The Fastest Free AI API in 2026&lt;/a&gt; — the LLM half of the free agent stack&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://toolfreebie.com/tavily-vs-brave-vs-exa-search/" rel="noopener noreferrer"&gt;toolfreebie.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Langfuse: Free Open-Source LLM Observability</title>
      <dc:creator>toolfreebie</dc:creator>
      <pubDate>Thu, 28 May 2026 09:02:37 +0000</pubDate>
      <link>https://dev.to/build996/langfuse-free-open-source-llm-observability-2b8i</link>
      <guid>https://dev.to/build996/langfuse-free-open-source-llm-observability-2b8i</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwfzcjm60zr1325whjjay.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwfzcjm60zr1325whjjay.jpg" alt="Langfuse: Free Open-Source LLM Observability" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Langfuse?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://langfuse.com" rel="noopener noreferrer"&gt;Langfuse&lt;/a&gt; is a free, open-source &lt;strong&gt;LLM observability&lt;/strong&gt; platform — the tool you reach for when your AI app works in the demo and then does something baffling in production. It records every model call, agent step, retrieval, and tool use as a structured trace you can open, read, and replay. Born out of Y Combinator’s W23 batch and now one of the most-starred LLM engineering projects on GitHub (&lt;a href="https://github.com/langfuse/langfuse" rel="noopener noreferrer"&gt;langfuse/langfuse&lt;/a&gt;), it has become the default “what just happened?” layer for teams shipping anything more complex than a single chat completion.&lt;/p&gt;

&lt;p&gt;The core of Langfuse is MIT-licensed and self-hostable, which is the part that matters for this blog: you can run the entire platform on your own machine or a cheap VPS for $0, forever, with no seat limits and no trace caps. There’s also a managed Langfuse Cloud with a genuinely free Hobby tier if you’d rather not run infrastructure. Either way, the SDKs, the integrations, and the trace UI are the same.&lt;/p&gt;

&lt;p&gt;If you’re building with any of the &lt;a href="https://toolfreebie.com/best-free-ai-apis-2026/" rel="noopener noreferrer"&gt;free AI APIs&lt;/a&gt; covered here — Gemini, Groq, OpenRouter, Together — Langfuse is the missing piece that turns “I think the prompt is fine” into “here is the exact request, the exact response, the latency, and the cost.” This guide covers what LLM observability actually buys you, whether Langfuse is really free, how it compares to LangSmith and Phoenix, and how to instrument your first app in about ten minutes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why LLM Observability Matters
&lt;/h2&gt;

&lt;p&gt;Traditional application monitoring assumes deterministic code: same input, same output, and a stack trace when something breaks. LLM apps break that assumption in three ways, and each one is a reason observability stopped being optional in 2026.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Non-determinism.&lt;/strong&gt; The same prompt can return different answers on different days. Without a recorded trace of the exact input and output, “it gave a weird answer yesterday” is unreproducible and therefore unfixable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hidden multi-step chains.&lt;/strong&gt; A single user message to an agent can fan out into a dozen model calls, retrievals, and tool invocations. When the final answer is wrong, the bug is usually three steps back — a bad retrieval, a truncated context, a tool that returned an error the model ignored. You need to see the whole tree.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost and latency creep.&lt;/strong&gt; Token usage is invisible until the bill arrives. Observability surfaces per-call token counts and dollar estimates so you can catch the prompt that quietly grew to 40,000 tokens of context.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LLM observability gives you a recorded, searchable history of every AI interaction: the prompts, the completions, the latency, the token cost, the retrieved documents, and the tool calls — organized as nested traces so you can drill from a user session down to the single span that misbehaved. That’s the category Langfuse sits in, alongside LangSmith, Arize Phoenix, and Helicone.&lt;/p&gt;

&lt;h2&gt;
  
  
  Is Langfuse Really Free? Cloud vs Self-Hosted
&lt;/h2&gt;

&lt;p&gt;“Free” means two different things with Langfuse, and both are real.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Self-hosted (free forever).&lt;/strong&gt; The Langfuse core is open source under the MIT license. You run it yourself with Docker — a Postgres database, a ClickHouse analytics store, Redis, and the Langfuse web/worker containers, all wired up by the official &lt;code&gt;docker compose&lt;/code&gt; file. There are no trace limits, no seat limits, and no feature gates on the open-source build beyond a small set of enterprise add-ons (SSO enforcement, fine-grained RBAC, audit logs) that live behind a commercial license. For an individual or a small team, the MIT build does everything you need.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Langfuse Cloud Hobby (free tier).&lt;/strong&gt; If you don’t want to run infrastructure, Langfuse Cloud has a free Hobby plan that includes 50,000 units per month with no credit card required, according to the &lt;a href="https://langfuse.com/pricing" rel="noopener noreferrer"&gt;Langfuse pricing page&lt;/a&gt; (always check the page for the current limit — these numbers move). A “unit” is roughly one ingested observation, so 50,000/month comfortably covers a side project or an early-stage app in development.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Self-Hosted (MIT)&lt;/th&gt;
&lt;th&gt;Cloud Hobby (Free)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Price&lt;/td&gt;
&lt;td&gt;$0 (you pay for the server)&lt;/td&gt;
&lt;td&gt;$0, no credit card&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Trace / event volume&lt;/td&gt;
&lt;td&gt;Unlimited&lt;/td&gt;
&lt;td&gt;50,000 units/month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Team seats&lt;/td&gt;
&lt;td&gt;Unlimited&lt;/td&gt;
&lt;td&gt;Limited on free tier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data residency&lt;/td&gt;
&lt;td&gt;Your infrastructure&lt;/td&gt;
&lt;td&gt;EU or US region&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Setup effort&lt;/td&gt;
&lt;td&gt;One &lt;code&gt;docker compose up&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Sign up, copy two keys&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Maintenance&lt;/td&gt;
&lt;td&gt;You own upgrades &amp;amp; backups&lt;/td&gt;
&lt;td&gt;Managed for you&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise extras (SSO, RBAC)&lt;/td&gt;
&lt;td&gt;Commercial license&lt;/td&gt;
&lt;td&gt;Paid tiers&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The honest rule of thumb: prototype on Cloud Hobby because it takes ninety seconds to start, and move to self-hosted the moment you exceed the free volume, need unlimited seats, or have data-residency requirements that rule out a third party seeing your prompts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Langfuse vs LangSmith vs Phoenix vs Helicone
&lt;/h2&gt;

&lt;p&gt;Four tools dominate free-tier LLM observability in 2026, and they make different trade-offs between openness, framework lock-in, and how you wire them up.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Open source&lt;/th&gt;
&lt;th&gt;Free path&lt;/th&gt;
&lt;th&gt;Integration model&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Langfuse&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes (MIT core)&lt;/td&gt;
&lt;td&gt;Self-host free + Cloud Hobby (50k units/mo)&lt;/td&gt;
&lt;td&gt;SDK + decorators + OpenTelemetry, framework-agnostic&lt;/td&gt;
&lt;td&gt;Teams who want a full platform they can also self-host&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://www.langchain.com/langsmith" rel="noopener noreferrer"&gt;LangSmith&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;No (managed SaaS)&lt;/td&gt;
&lt;td&gt;Free Developer plan (~5,000 traces/mo, 1 seat)&lt;/td&gt;
&lt;td&gt;Tightest with LangChain / LangGraph&lt;/td&gt;
&lt;td&gt;Teams already all-in on the LangChain stack&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://phoenix.arize.com" rel="noopener noreferrer"&gt;Arize Phoenix&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Fully free to self-host&lt;/td&gt;
&lt;td&gt;OpenTelemetry / OpenInference, notebook-first&lt;/td&gt;
&lt;td&gt;Data scientists debugging in notebooks &amp;amp; evals&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://www.helicone.ai" rel="noopener noreferrer"&gt;Helicone&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Free tier (~10,000 requests/mo)&lt;/td&gt;
&lt;td&gt;Proxy — change one base URL&lt;/td&gt;
&lt;td&gt;The absolute lowest-effort drop-in logging&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;(Free-tier numbers above are from each vendor’s public pricing/docs and change often — verify on the linked page before you rely on them.)&lt;/p&gt;

&lt;p&gt;The clearest dividing line is &lt;em&gt;how&lt;/em&gt; they capture data. Helicone is a &lt;strong&gt;proxy&lt;/strong&gt;: you point your OpenAI base URL at Helicone and it logs every request passing through. That means zero code changes, but it only sees what flows through the proxy. LangSmith and Langfuse use an &lt;strong&gt;SDK/instrumentation&lt;/strong&gt; model: you wrap your calls or add a decorator, which means they can capture non-LLM steps (retrievals, tool calls, business logic) as spans in the same trace. Phoenix leans on the OpenTelemetry standard, which makes it portable but a little more setup-heavy.&lt;/p&gt;

&lt;p&gt;Langfuse’s pitch is “open like Phoenix, full-featured like LangSmith, framework-agnostic unlike either.” If you want one platform that handles tracing, prompt management, and evals, and you want the option to self-host it for free, Langfuse is the broadest pick. If you live entirely inside &lt;a href="https://toolfreebie.com/crewai-vs-autogpt-vs-langgraph/" rel="noopener noreferrer"&gt;LangGraph&lt;/a&gt;, LangSmith’s deeper native hooks may win on convenience.&lt;/p&gt;

&lt;h2&gt;
  
  
  Core Features That Matter
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Tracing and Spans
&lt;/h3&gt;

&lt;p&gt;The foundation. A &lt;strong&gt;trace&lt;/strong&gt; represents one unit of work — typically one user request — and contains nested &lt;strong&gt;spans&lt;/strong&gt; for each step inside it: the retrieval, each LLM call, each tool invocation. Langfuse shows this as an expandable tree with timing, token counts, and cost on every node. When an agent gives a bad answer, you open the trace and walk down to the exact span where the context went wrong. Traces can be grouped into &lt;strong&gt;sessions&lt;/strong&gt; (a multi-turn conversation) and attributed to a &lt;strong&gt;user&lt;/strong&gt;, so you can answer “show me everything user 4471 did this week.”&lt;/p&gt;
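
&lt;p&gt;To make the tree idea concrete, here is a toy model of a trace with nested spans and a token roll-up. It is a plain-Python illustration only, not Langfuse’s data model or SDK; the &lt;code&gt;Span&lt;/code&gt; class and the token figure are hypothetical.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    tokens: int = 0
    children: list = field(default_factory=list)

    def total_tokens(self):
        """Tokens used by this span plus everything nested under it."""
        return self.tokens + sum(c.total_tokens() for c in self.children)

    def render(self, depth=0):
        """Return the tree as indented lines, one per span."""
        lines = [f"{'  ' * depth}{self.name} ({self.total_tokens()} tokens)"]
        for child in self.children:
            lines.extend(child.render(depth + 1))
        return lines


# One user request: a retrieval step followed by a single model call.
trace = Span("answer", children=[
    Span("retrieve"),
    Span("llm-call", tokens=850),
])
print("\n".join(trace.render()))
# answer (850 tokens)
#   retrieve (0 tokens)
#   llm-call (850 tokens)
```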

&lt;h3&gt;
  
  
  2. Prompt Management
&lt;/h3&gt;

&lt;p&gt;Langfuse stores your prompts as versioned, named objects you fetch at runtime instead of hardcoding strings. You edit a prompt in the UI, label a version &lt;code&gt;production&lt;/code&gt;, and your app picks it up without a redeploy. Every version is linked to the traces that used it, so you can see whether v4 of your system prompt actually reduced hallucinations versus v3. This is the feature that turns prompt engineering from “edit code, commit, deploy, hope” into something measurable.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Evaluations and Scoring
&lt;/h3&gt;

&lt;p&gt;Langfuse can attach &lt;strong&gt;scores&lt;/strong&gt; to any trace — from explicit user thumbs-up/down, from an LLM-as-a-judge evaluator, from a custom function, or from manual human annotation in the UI. Over time these scores become quality metrics you can chart: “answer relevance dropped 8% after we switched models.” You can run evaluators automatically on a sample of production traffic or against a fixed test set.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Datasets
&lt;/h3&gt;

&lt;p&gt;A dataset is a curated set of inputs (and optional expected outputs) you run your app against to catch regressions before they ship. The natural workflow: find a trace where the app failed, click “add to dataset,” and that real-world failure becomes a permanent test case. Re-run the dataset after every prompt or model change and compare scores side by side.&lt;/p&gt;
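
&lt;p&gt;The regression loop itself is simple enough to sketch without any SDK. The example below is plain Python under stated assumptions: the dataset items, the keyword scorer, and the two &lt;code&gt;app_v3&lt;/code&gt; / &lt;code&gt;app_v4&lt;/code&gt; functions are all made up for illustration, and a real setup would store the items and scores in Langfuse instead of local variables.&lt;/p&gt;

```python
# Hypothetical dataset: real-world failures turned into permanent test cases.
dataset = [
    {"input": "Do you offer refunds?", "expected": "refund"},
    {"input": "What are your support hours?", "expected": "9am"},
]


def contains_expected(output, expected):
    """Toy scorer: 1.0 if the expected keyword appears in the output."""
    return 1.0 if expected.lower() in output.lower() else 0.0


def run_dataset(app, items):
    """Run the app on every item and return the mean score."""
    scores = [contains_expected(app(i["input"]), i["expected"]) for i in items]
    return sum(scores) / len(scores)


def app_v3(question):
    # Hypothetical old prompt version: deflects every question.
    return "Please contact support."


def app_v4(question):
    # Hypothetical new prompt version: answers from policy text.
    if "refund" in question.lower():
        return "Yes, refunds are available within 30 days."
    return "Support is open from 9am to 5pm."


baseline = run_dataset(app_v3, dataset)
candidate = run_dataset(app_v4, dataset)
print(baseline, candidate)   # 0.0 1.0
```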

&lt;h3&gt;
  
  
  5. Playground
&lt;/h3&gt;

&lt;p&gt;An in-app prompt playground lets you grab a failing trace, tweak the prompt or swap the model, and re-run it immediately to see if your fix works — without leaving the tool or wiring up a script. It connects to your model providers, so you can A/B a prompt against Gemini and Groq in the same window.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Metrics and Dashboards
&lt;/h3&gt;

&lt;p&gt;Aggregate views over all your traces: total cost per day, p95 latency per model, token usage by feature, score trends over time. This is where you notice that one endpoint is responsible for 70% of your spend, or that latency doubled the day you added a reranking step.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Self-Host Langfuse for Free
&lt;/h2&gt;

&lt;p&gt;The fastest way to a free, unlimited Langfuse instance is the official Docker Compose stack. On any machine with Docker installed — including a free &lt;a href="https://toolfreebie.com/oracle-free-arm-vps/" rel="noopener noreferrer"&gt;Oracle Cloud ARM VPS&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/langfuse/langfuse.git
cd langfuse
docker compose up -d
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That brings up the full stack (Langfuse web + worker, Postgres, ClickHouse, Redis, and a MinIO blob store) and serves the UI at &lt;code&gt;http://localhost:3000&lt;/code&gt;. Create an account on first load (it’s stored in your own database), make a project, and copy the public and secret API keys it generates. You now have a production-grade observability platform that no one else can see, with no trace limits, running for the cost of the server.&lt;/p&gt;

&lt;p&gt;For production you’ll want to put it behind HTTPS and back up Postgres and ClickHouse, but for development the compose file is genuinely one command. The official &lt;a href="https://langfuse.com/self-hosting" rel="noopener noreferrer"&gt;self-hosting docs&lt;/a&gt; cover the Kubernetes Helm chart and managed-database setups when you outgrow single-node.&lt;/p&gt;

&lt;h2&gt;
  
  
  Instrumenting Your App: Three Ways
&lt;/h2&gt;

&lt;p&gt;Langfuse offers progressively deeper levels of instrumentation. Start with the first one; reach for the others as your app grows. All three send data to the same project — set these environment variables once and every example below works against either Cloud or your self-hosted instance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export LANGFUSE_PUBLIC_KEY="pk-lf-..."
export LANGFUSE_SECRET_KEY="sk-lf-..."
# Cloud EU: https://cloud.langfuse.com  |  Cloud US: https://us.cloud.langfuse.com
# Self-hosted: http://localhost:3000
export LANGFUSE_HOST="https://cloud.langfuse.com"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Way 1: The OpenAI Drop-In Wrapper (zero refactor)
&lt;/h3&gt;

&lt;p&gt;If your code already uses the OpenAI SDK — which is true for most free AI APIs, since &lt;a href="https://toolfreebie.com/groq-fastest-free-ai-api/" rel="noopener noreferrer"&gt;Groq&lt;/a&gt;, &lt;a href="https://toolfreebie.com/together-ai-free-api-llama-deepseek-flux-2026/" rel="noopener noreferrer"&gt;Together&lt;/a&gt;, &lt;a href="https://toolfreebie.com/mistral-free-api/" rel="noopener noreferrer"&gt;Mistral&lt;/a&gt;, and &lt;a href="https://toolfreebie.com/openrouter-free-ai-models/" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt; are all OpenAI-compatible — you change exactly one import line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install langfuse openai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# before:  from openai import OpenAI
from langfuse.openai import openai   # drop-in replacement

client = openai.OpenAI(
    base_url="https://api.groq.com/openai/v1",   # any OpenAI-compatible endpoint
    api_key="YOUR_GROQ_KEY",
)

resp = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Explain LLM observability in one sentence."}],
)
print(resp.choices[0].message.content)
# This call is now automatically traced in Langfuse: prompt, completion,
# token usage, latency, and cost — with zero other changes.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every completion you make now shows up as a trace. This is the lowest-effort way to start and works against any OpenAI-compatible free API.&lt;/p&gt;

&lt;h3&gt;
  
  
  Way 2: The @observe Decorator (capture your own functions)
&lt;/h3&gt;

&lt;p&gt;To see your business logic — not just the model call — wrap any function with the &lt;code&gt;@observe&lt;/code&gt; decorator. Nested decorated functions automatically become nested spans in the same trace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langfuse import observe
from langfuse.openai import openai

@observe()
def retrieve(question: str) -&amp;gt; str:
    # your vector search here; return the context string
    return "...retrieved context..."

@observe()
def answer(question: str) -&amp;gt; str:
    context = retrieve(question)            # becomes a child span
    resp = openai.chat.completions.create(  # becomes another child span
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Answer using:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

answer("What does Langfuse trace?")
# One trace, three spans: answer -&amp;gt; retrieve, answer -&amp;gt; openai call.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Way 3: The LangChain / LangGraph Callback
&lt;/h3&gt;

&lt;p&gt;If you build with LangChain or &lt;a href="https://toolfreebie.com/crewai-vs-autogpt-vs-langgraph/" rel="noopener noreferrer"&gt;LangGraph&lt;/a&gt;, pass Langfuse’s callback handler and it captures the whole chain automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langfuse.langchain import CallbackHandler

handler = CallbackHandler()
result = chain.invoke(
    {"question": "What is LLM observability?"},
    config={"callbacks": [handler]},
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For TypeScript / Node projects, the same drop-in pattern exists:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import OpenAI from "openai";
import { observeOpenAI } from "langfuse";

const client = observeOpenAI(new OpenAI({
  baseURL: "https://api.groq.com/openai/v1",
  apiKey: process.env.GROQ_API_KEY,
}));

const resp = await client.chat.completions.create({
  model: "llama-3.3-70b-versatile",
  messages: [{ role: "user", content: "Hello from Node, traced by Langfuse." }],
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because Langfuse v3 is built on OpenTelemetry under the hood, any OTel-instrumented library or framework can also feed it — useful if you’re standardizing telemetry across services. Check the &lt;a href="https://langfuse.com/docs" rel="noopener noreferrer"&gt;Langfuse docs&lt;/a&gt; for the current SDK API, which evolves between major versions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tracing a RAG Pipeline End-to-End
&lt;/h2&gt;

&lt;p&gt;RAG is where observability earns its keep, because a wrong answer can come from retrieval &lt;em&gt;or&lt;/em&gt; generation and the two failure modes look identical from the outside. Picture a typical stack: a question comes in, you embed it, search a &lt;a href="https://toolfreebie.com/free-vector-database-rag/" rel="noopener noreferrer"&gt;vector database&lt;/a&gt;, rerank with &lt;a href="https://toolfreebie.com/cohere-rag-api/" rel="noopener noreferrer"&gt;Cohere&lt;/a&gt;, stuff the top chunks into a prompt, and generate an answer.&lt;/p&gt;

&lt;p&gt;With each step wrapped in &lt;code&gt;@observe&lt;/code&gt;, a single Langfuse trace shows you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The exact &lt;strong&gt;query embedding&lt;/strong&gt; step and its latency&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;documents retrieved&lt;/strong&gt; from the vector store, with their similarity scores — so you can instantly see if retrieval pulled garbage&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;reranked order&lt;/strong&gt; after Cohere, to confirm the reranker actually helped&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;final prompt&lt;/strong&gt; that went to the model, including exactly which chunks made it into the context window&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;completion&lt;/strong&gt;, token count, and cost&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When a user reports “it said we don’t offer refunds, but we do,” you open their trace and the answer is right there: either the refund policy chunk wasn’t retrieved (a retrieval/embedding problem) or it was retrieved but the model ignored it (a prompt problem). Five seconds of looking replaces an hour of guessing. That single capability — being able to &lt;em&gt;see&lt;/em&gt; which half of the RAG pipeline failed — is the most common reason teams adopt Langfuse.&lt;/p&gt;
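
&lt;p&gt;The same triage can be written down as a small check over the fields a trace records. This is an illustrative sketch only: &lt;code&gt;triage_rag_failure&lt;/code&gt; is a hypothetical helper, the inputs are plain strings rather than Langfuse objects, and a substring match is a crude stand-in for reading the trace yourself.&lt;/p&gt;

```python
def triage_rag_failure(retrieved_chunks, final_prompt, must_contain):
    """Say which stage lost the text that the answer needed."""
    needle = must_contain.lower()
    if not any(needle in chunk.lower() for chunk in retrieved_chunks):
        return "retrieval problem: the relevant chunk was never retrieved"
    if needle not in final_prompt.lower():
        return "context problem: the chunk was retrieved but dropped from the prompt"
    return "generation problem: the chunk was in the prompt but the model ignored it"


chunks = [
    "Shipping takes 3-5 days.",
    "Refund policy: full refund within 30 days.",
]
prompt = "Answer using:\n" + chunks[0]   # the refund chunk was cut from the context
print(triage_rag_failure(chunks, prompt, "refund policy"))
# context problem: the chunk was retrieved but dropped from the prompt
```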

&lt;h2&gt;
  
  
  Prompt Management Without Redeploys
&lt;/h2&gt;

&lt;p&gt;Once your prompts live in Langfuse, you fetch them by name at runtime:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langfuse import Langfuse

langfuse = Langfuse()
prompt = langfuse.get_prompt("support-agent")   # fetches the 'production' label by default
compiled = prompt.compile(customer_name="Ada", product="Widget Pro")

# use compiled as your system prompt; cached client-side, linked to the trace
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now editing the support agent’s behavior is a UI change, not a code change. Non-engineers can iterate on copy, you can roll back a bad version with one click, and because Langfuse links each prompt version to the traces and scores it produced, you get a real before/after on quality instead of vibes. Prompts are cached on the client so the fetch doesn’t add latency to your hot path.&lt;/p&gt;
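
&lt;p&gt;To make the compile step less abstract, here is a rough stand-in for what filling a stored template involves. It assumes the prompt uses &lt;code&gt;{{variable}}&lt;/code&gt; placeholders; &lt;code&gt;compile_prompt&lt;/code&gt; is a hypothetical helper, and raising on a missing variable is a choice made for this sketch, not necessarily what the SDK does.&lt;/p&gt;

```python
import re

# Matches {{name}} with optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def compile_prompt(template, **variables):
    """Fill {{name}} placeholders; fail loudly when a variable is missing."""
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"missing prompt variable: {name}")
        return str(variables[name])
    return PLACEHOLDER.sub(substitute, template)


template = "You are a support agent for {{product}}. Greet {{customer_name}} by name."
print(compile_prompt(template, customer_name="Ada", product="Widget Pro"))
# prints: You are a support agent for Widget Pro. Greet Ada by name.
```

&lt;p&gt;Because the template lives outside the codebase, a wording change only touches the stored text; the call site and its variables stay the same.&lt;/p&gt;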

&lt;h2&gt;
  
  
  Running Evaluations
&lt;/h2&gt;

&lt;p&gt;The maturity curve for an AI app usually goes: ship it, watch traces, notice a recurring failure, turn that failure into a dataset entry, then run evaluations so the failure can’t silently come back. Langfuse supports all of it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Online evaluation&lt;/strong&gt; — run an LLM-as-a-judge evaluator on a sample of live traffic and chart the score over time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Offline evaluation&lt;/strong&gt; — run your app against a fixed dataset before every release and diff the scores against the last run.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human annotation&lt;/strong&gt; — queue traces for a teammate to label in the UI, building a gold-standard set.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The judge model can be any provider you connect — including a free one. Using &lt;a href="https://toolfreebie.com/gemini-free-ai-api/" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt; or a Llama model on Groq as your evaluator keeps the whole eval loop at $0, which matters because evaluation can easily run more model calls than production itself.&lt;/p&gt;
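
&lt;p&gt;The offline loop (run a fixed dataset, score it, diff against the last run) can be sketched without any platform at all. Everything here is illustrative: the dataset, the two stand-in apps, and the substring scorer are assumptions, and a real setup would more likely use an LLM-as-a-judge score stored alongside the traces.&lt;/p&gt;

```python
def contains_expected(expected, actual):
    """Toy scorer: 1.0 if the expected phrase appears in the answer, else 0.0."""
    return 1.0 if expected.lower() in actual.lower() else 0.0


def run_eval(app, dataset, scorer=contains_expected):
    """Score the app on every dataset item; returns a dict of item id to score."""
    return {item["id"]: scorer(item["expected"], app(item["input"])) for item in dataset}


def regressions(previous, current):
    """Ids whose score dropped since the previous run."""
    return sorted(i for i in current if previous.get(i, 0.0) > current[i])


dataset = [
    {"id": "refunds", "input": "Do you offer refunds?", "expected": "30 days"},
    {"id": "shipping", "input": "How long is shipping?", "expected": "3 to 5 days"},
]


def app_v1(question):  # stand-in for the released app
    return {"Do you offer refunds?": "Yes, within 30 days.",
            "How long is shipping?": "Usually 3 to 5 days."}[question]


def app_v2(question):  # stand-in for the candidate release
    return {"Do you offer refunds?": "We do not offer refunds.",
            "How long is shipping?": "Usually 3 to 5 days."}[question]


baseline = run_eval(app_v1, dataset)
candidate = run_eval(app_v2, dataset)
print(regressions(baseline, candidate))  # prints: ['refunds']
```

&lt;p&gt;A non-empty regression list is the signal to block the release and open the matching traces.&lt;/p&gt;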

&lt;h2&gt;
  
  
  When to Use Langfuse vs Alternatives
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;You want one open-source platform for tracing + prompts + evals, with the option to self-host free&lt;/strong&gt; → Langfuse&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You are all-in on LangChain / LangGraph and want the tightest native integration&lt;/strong&gt; → &lt;a href="https://www.langchain.com/langsmith" rel="noopener noreferrer"&gt;LangSmith&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You debug mostly in Jupyter notebooks and care most about evals&lt;/strong&gt; → &lt;a href="https://phoenix.arize.com" rel="noopener noreferrer"&gt;Arize Phoenix&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want the absolute lowest-effort logging and only call one OpenAI-compatible API&lt;/strong&gt; → &lt;a href="https://www.helicone.ai" rel="noopener noreferrer"&gt;Helicone&lt;/a&gt; (proxy, one URL change)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You have strict data-residency rules and prompts can’t leave your network&lt;/strong&gt; → self-hosted Langfuse or Phoenix&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You’re prototyping today and want zero setup&lt;/strong&gt; → Langfuse Cloud Hobby (free, no card)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Is Langfuse really free?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes, two ways. The MIT-licensed core is free to self-host with no trace, seat, or feature caps (a few enterprise extras like SSO enforcement need a commercial license). Langfuse Cloud also has a free Hobby tier with 50,000 units/month and no credit card. You only pay if you want managed hosting above the free volume or enterprise governance features.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does Langfuse add latency to my app?&lt;/strong&gt;&lt;/p&gt;

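&lt;p&gt;Negligibly. The SDK batches trace data and sends it asynchronously in the background, off your request path, and prompts are cached client-side. Your users don’t wait on Langfuse. The one caveat is short-lived processes such as scripts or serverless functions: flush the client before exit so buffered events are not dropped.&lt;/p&gt;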

&lt;p&gt;&lt;strong&gt;Do I have to use LangChain to use Langfuse?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;No — that’s the point. Langfuse is framework-agnostic. The OpenAI drop-in wrapper and the &lt;code&gt;@observe&lt;/code&gt; decorator work with plain SDK calls, &lt;a href="https://toolfreebie.com/crewai-multi-agent-framework/" rel="noopener noreferrer"&gt;CrewAI&lt;/a&gt;, LlamaIndex, raw HTTP, or your own custom orchestration. LangChain is just one of many supported integrations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What’s the difference between Langfuse and LangSmith?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;LangSmith is a closed-source managed product from the LangChain team, with the deepest hooks into the LangChain ecosystem. Langfuse is open-source, can be self-hosted for free, and is deliberately framework-agnostic. If you’re not married to LangChain — or you need to keep data on your own infrastructure — Langfuse is the more flexible choice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I use Langfuse with free APIs like Gemini, Groq, or DeepSeek?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes. Any OpenAI-compatible endpoint works with the drop-in wrapper — just set the &lt;code&gt;base_url&lt;/code&gt;. &lt;a href="https://toolfreebie.com/groq-fastest-free-ai-api/" rel="noopener noreferrer"&gt;Groq&lt;/a&gt;, &lt;a href="https://toolfreebie.com/together-ai-free-api-llama-deepseek-flux-2026/" rel="noopener noreferrer"&gt;Together&lt;/a&gt;, &lt;a href="https://toolfreebie.com/deepseek-free-api/" rel="noopener noreferrer"&gt;DeepSeek&lt;/a&gt;, &lt;a href="https://toolfreebie.com/mistral-free-api/" rel="noopener noreferrer"&gt;Mistral&lt;/a&gt;, and &lt;a href="https://toolfreebie.com/openrouter-free-ai-models/" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt; all qualify, and Gemini works through its OpenAI-compatible layer or a dedicated integration.&lt;/p&gt;
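
&lt;p&gt;Switching providers then comes down to two constructor arguments. The sketch below keeps them in one table; the base URLs are the ones these providers publish at the time of writing and should be verified, while the environment variable names and the &lt;code&gt;client_kwargs&lt;/code&gt; helper are hypothetical.&lt;/p&gt;

```python
import os

# provider: (OpenAI-compatible base URL, env var holding the key; names are assumptions)
OPENAI_COMPATIBLE = {
    "groq": ("https://api.groq.com/openai/v1", "GROQ_API_KEY"),
    "together": ("https://api.together.xyz/v1", "TOGETHER_API_KEY"),
    "deepseek": ("https://api.deepseek.com", "DEEPSEEK_API_KEY"),
    "mistral": ("https://api.mistral.ai/v1", "MISTRAL_API_KEY"),
    "openrouter": ("https://openrouter.ai/api/v1", "OPENROUTER_API_KEY"),
    "gemini": ("https://generativelanguage.googleapis.com/v1beta/openai/", "GEMINI_API_KEY"),
}


def client_kwargs(provider):
    """Return keyword arguments for an OpenAI-style client constructor."""
    base_url, key_var = OPENAI_COMPATIBLE[provider]
    return {"base_url": base_url, "api_key": os.environ.get(key_var, "")}


print(client_kwargs("groq")["base_url"])  # prints: https://api.groq.com/openai/v1
```

&lt;p&gt;Assuming the drop-in wrapper is imported as &lt;code&gt;from langfuse.openai import OpenAI&lt;/code&gt;, the client would be built with &lt;code&gt;OpenAI(**client_kwargs("groq"))&lt;/code&gt; and traced like any other call.&lt;/p&gt;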

&lt;p&gt;&lt;strong&gt;Does Langfuse store my prompts and completions?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes — that’s how tracing works. On Cloud, that data lives in Langfuse’s chosen region (EU or US). If your prompts contain sensitive data you can’t send to a third party, self-host: then the data never leaves your infrastructure. The SDK also supports masking specific fields before they’re sent.&lt;/p&gt;
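
&lt;p&gt;A masking function is usually just a recursive scrub over whatever is about to be sent. The sketch below redacts email addresses only; &lt;code&gt;mask_emails&lt;/code&gt; is a hypothetical name, and both the regular expression and the &lt;code&gt;(data, **kwargs)&lt;/code&gt; signature are assumptions, so check the SDK docs for the exact hook before wiring it in.&lt;/p&gt;

```python
import re

# Deliberately simple pattern; real PII detection needs more than this.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def mask_emails(data, **kwargs):
    """Recursively replace email addresses in strings, dicts, and lists."""
    if isinstance(data, str):
        return EMAIL.sub("[REDACTED_EMAIL]", data)
    if isinstance(data, dict):
        return {key: mask_emails(value) for key, value in data.items()}
    if isinstance(data, list):
        return [mask_emails(item) for item in data]
    return data  # numbers, None, and other types pass through unchanged


event = {"input": "Refund for ada@example.com please", "tags": ["vip", 42]}
print(mask_emails(event))
# prints: {'input': 'Refund for [REDACTED_EMAIL] please', 'tags': ['vip', 42]}
```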

&lt;p&gt;&lt;strong&gt;Can it track cost?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes. Langfuse computes per-call token usage and a dollar estimate based on each model’s pricing, then aggregates it into dashboards by day, model, user, or feature — so you can find your most expensive endpoint at a glance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What database does self-hosted Langfuse need?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The current architecture uses Postgres for transactional data and ClickHouse for high-volume trace analytics, plus Redis for queuing and S3-compatible blob storage for raw event payloads. The official Docker Compose file provisions all of them, so you don’t assemble it by hand.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use Langfuse with OpenClaw
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://openclaw.ai" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; is an AI agent platform for orchestrating multi-step automated workflows — exactly the kind of long-running, multi-call system where a single failed step is otherwise invisible. Pointing OpenClaw’s model calls at the Langfuse-wrapped client gives every automated run a full trace tree.&lt;/p&gt;

&lt;p&gt;A practical pairing: OpenClaw runs an unattended nightly pipeline (summarize new tickets, draft responses, flag anomalies). Each run is one Langfuse trace, with a span for every model call and tool use. In the morning you don’t re-read logs — you scan the Langfuse dashboard for any trace with a low score or an error span, open just those, and see exactly which step went sideways. Wire OpenClaw and Langfuse to the same free &lt;a href="https://toolfreebie.com/openrouter-free-ai-models/" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt; or &lt;a href="https://toolfreebie.com/gemini-free-ai-api/" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt; key and the whole observe-and-iterate loop costs nothing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Verdict
&lt;/h2&gt;

&lt;p&gt;Langfuse is the right default in 2026 for anyone shipping an LLM app who has been burned by a bug they couldn’t reproduce. It captures the full trace tree, manages your prompts as versioned objects, and runs evaluations — and it does all of that as an open-source platform you can self-host for free with no caps, or run on a free Cloud tier in ninety seconds. The framework-agnostic SDK means it fits whatever stack you already have, and the OpenAI drop-in wrapper means your first trace is one import line away.&lt;/p&gt;

&lt;p&gt;LangSmith is the smoother ride if you live entirely in LangChain, Phoenix is the notebook-native choice for evals, and Helicone wins on pure zero-effort logging. But for the broadest combination of openness, features, and a real free path, Langfuse is the one to install first. Spin up the Docker stack or grab a Cloud Hobby key, change one import in your app, and watch your first trace appear — then ask yourself how you ever debugged AI without it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Reads
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/best-free-ai-apis-2026/" rel="noopener noreferrer"&gt;10 Best Free AI APIs in 2026: The Ultimate Comparison&lt;/a&gt; — pick the model provider you’ll trace with Langfuse&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/crewai-multi-agent-framework/" rel="noopener noreferrer"&gt;CrewAI: Free Open-Source Multi-Agent AI Framework for Python&lt;/a&gt; — multi-agent systems that beg to be observed&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/crewai-vs-autogpt-vs-langgraph/" rel="noopener noreferrer"&gt;CrewAI vs AutoGPT vs LangGraph: Which Free Agent Framework Should You Use?&lt;/a&gt; — the orchestration layer Langfuse sits underneath&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/mcp-protocol-ai-agents/" rel="noopener noreferrer"&gt;MCP (Model Context Protocol): Connect AI Agents to Any Tool or API&lt;/a&gt; — the tool calls that show up as spans in your traces&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/cohere-rag-api/" rel="noopener noreferrer"&gt;Cohere Free API: The Best Free Embedding and Rerank API for RAG&lt;/a&gt; — the RAG stack you’ll be debugging trace-by-trace&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/dify-ai-app-builder/" rel="noopener noreferrer"&gt;Dify: Free Open-Source AI App Builder for Chatbots and Workflows&lt;/a&gt; — another open-source layer in the free AI stack&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://toolfreebie.com/langfuse-llm-observability/" rel="noopener noreferrer"&gt;toolfreebie.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
    </item>
    <item>
      <title>Which Free Text-to-Speech API Should You Use in 2026?</title>
      <dc:creator>toolfreebie</dc:creator>
      <pubDate>Thu, 28 May 2026 08:57:09 +0000</pubDate>
      <link>https://dev.to/build996/which-free-text-to-speech-api-should-you-use-in-2026-20dj</link>
      <guid>https://dev.to/build996/which-free-text-to-speech-api-should-you-use-in-2026-20dj</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdzyjlta2vtjbzq36w2ya.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdzyjlta2vtjbzq36w2ya.jpg" alt="Which Free Text-to-Speech API Should You Use in 2026?" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Which Free Text-to-Speech API Should You Use in 2026?
&lt;/h2&gt;

&lt;p&gt;If you searched for a &lt;strong&gt;free text-to-speech API&lt;/strong&gt;, you are almost certainly building one of three things: a voice feature for an app that needs to read text aloud, a content pipeline that turns articles or scripts into audio, or a voice agent that needs to &lt;em&gt;speak back&lt;/em&gt; after it transcribes. The good news is that 2026 is the best year ever to do this for free. The catch is that “free” means three completely different things across the major providers, and picking the wrong one wastes either your money or your weekend.&lt;/p&gt;

&lt;p&gt;Three names dominate the search results: &lt;strong&gt;Google Cloud Text-to-Speech&lt;/strong&gt;, &lt;strong&gt;ElevenLabs&lt;/strong&gt;, and &lt;strong&gt;OpenAI&lt;/strong&gt;. Google runs a genuine recurring free tier that refills every month. ElevenLabs has the best-sounding voices and the most generous voice-cloning features, but the smallest free quota. OpenAI has no free tier at all — yet it is so cheap, and so trivial to wire into code you already wrote, that it belongs in any honest comparison.&lt;/p&gt;

&lt;p&gt;This guide compares all three on the metrics that decide the question: the real free-tier ceiling, what you pay once you cross it, voice quality and count, language coverage, latency, and the licensing fine print that quietly blocks commercial use on some “free” tiers. Every number links back to the provider’s own pricing or docs page — nothing here is invented benchmark theatre.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 30-Second Answer
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Free path&lt;/th&gt;
&lt;th&gt;Voice quality&lt;/th&gt;
&lt;th&gt;Paid rate (cheapest tier)&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Google Cloud TTS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Recurring monthly free tier (renews forever)&lt;/td&gt;
&lt;td&gt;Very good (WaveNet / Neural2 / Chirp 3)&lt;/td&gt;
&lt;td&gt;$4/1M chars (Standard), $16/1M (WaveNet)&lt;/td&gt;
&lt;td&gt;High-volume production audio on a permanent free quota&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ElevenLabs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10,000 credits/month, no card&lt;/td&gt;
&lt;td&gt;Best in class, plus instant voice cloning&lt;/td&gt;
&lt;td&gt;$5/mo (30K credits) Starter&lt;/td&gt;
&lt;td&gt;Narration, audiobooks, character voices, cloning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenAI TTS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No free tier — pay as you go&lt;/td&gt;
&lt;td&gt;Good, steerable with &lt;code&gt;gpt-4o-mini-tts&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;$15/1M chars (&lt;code&gt;tts-1&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Adding voice to an app that already calls OpenAI&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you want a free quota big enough for real volume that resets every month and never expires, &lt;strong&gt;Google Cloud TTS&lt;/strong&gt; is the one that fits: up to 4 million characters of Standard audio per month, free, indefinitely. If you care about how the voice &lt;em&gt;sounds&lt;/em&gt; above everything else (narration, audiobooks, game characters, or cloning your own voice), &lt;strong&gt;ElevenLabs&lt;/strong&gt; wins on quality even though its recurring free quota is small. If you already have an OpenAI key wired into your codebase and just want your app to talk, &lt;strong&gt;OpenAI TTS&lt;/strong&gt; is the path of least friction, even though it has no free tier.&lt;/p&gt;

&lt;p&gt;The rest of this article unpacks why.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why “Free Text-to-Speech API” Is Worth Searching For
&lt;/h2&gt;

&lt;p&gt;Text-to-speech used to be either robotic and free (the old &lt;code&gt;espeak&lt;/code&gt; era) or human-sounding and expensive. That gap closed in 2024–2025. Neural TTS that is genuinely hard to distinguish from a human reader is now a commodity, and the providers compete on price and free quota rather than raw quality.&lt;/p&gt;

&lt;p&gt;The reason a free tier matters becomes obvious the moment you run the numbers on a real workload. Take a blog-to-podcast tool that converts 50 articles a month, each averaging 8,000 characters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;50 × 8,000 = &lt;strong&gt;400,000 characters/month&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;On Google Cloud Standard voices: &lt;strong&gt;$0&lt;/strong&gt; — comfortably inside the 4M-character free tier&lt;/li&gt;
&lt;li&gt;On Google WaveNet voices: &lt;strong&gt;$0&lt;/strong&gt; — inside the 1M-character premium free tier&lt;/li&gt;
&lt;li&gt;On OpenAI &lt;code&gt;tts-1&lt;/code&gt;: 400K × $15/1M = &lt;strong&gt;$6.00/month&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;On ElevenLabs: 400K characters far exceeds the 10K free credits, and at 1 credit per character even the &lt;strong&gt;$22/month Creator&lt;/strong&gt; plan (100K credits) covers only a quarter of it, so you would need a higher tier&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The same workload ranges from free to well over $22/month depending purely on which provider you pick. That is the entire reason this comparison exists.&lt;/p&gt;
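
&lt;p&gt;The per-character part of that arithmetic is easy to rerun for your own volume. The sketch below uses only the free allowances and rates quoted in this article and assumes the free allowance is simply subtracted before billing; ElevenLabs is left out because it bills by plan, not per character. &lt;code&gt;monthly_cost&lt;/code&gt; and the plan keys are hypothetical names.&lt;/p&gt;

```python
MILLION = 1_000_000

# plan: (free characters per month, USD per 1M characters beyond that)
PER_CHAR_PLANS = {
    "google-standard": (4 * MILLION, 4.00),
    "google-wavenet": (1 * MILLION, 16.00),
    "openai-tts-1": (0, 15.00),
}


def monthly_cost(plan, chars):
    """USD owed for `chars` characters in one month on a per-character plan."""
    free_chars, usd_per_million = PER_CHAR_PLANS[plan]
    billable = max(0, chars - free_chars)  # only characters past the free allowance
    return round(billable * usd_per_million / MILLION, 2)


workload = 50 * 8_000  # 50 articles of 8,000 characters each
for plan in PER_CHAR_PLANS:
    print(plan, monthly_cost(plan, workload))
# prints:
# google-standard 0.0
# google-wavenet 0.0
# openai-tts-1 6.0
```

&lt;p&gt;Raising &lt;code&gt;workload&lt;/code&gt; past a free allowance shows where each plan starts to cost money; at 5 million characters, for example, Google Standard bills only the 1 million over the allowance.&lt;/p&gt;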

&lt;h3&gt;
  
  
  What “free” actually means in TTS (three different shapes)
&lt;/h3&gt;

&lt;p&gt;There are three distinct shapes of “free text-to-speech API” in 2026, and conflating them is the most common mistake:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Recurring free tier:&lt;/strong&gt; A quota that resets every month, forever, as long as your account is in good standing. &lt;em&gt;Google Cloud, Microsoft Azure, and ElevenLabs&lt;/em&gt; all do this (in very different sizes). This is the only shape that supports an ongoing free product.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time-limited free tier:&lt;/strong&gt; A generous quota that only lasts your first 12 months. &lt;em&gt;Amazon Polly&lt;/em&gt; uses this. Great for a launch year, then it disappears.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pay-as-you-go, no free tier:&lt;/strong&gt; No standing free quota at all, but the per-character price is so low it is effectively free at small volume. &lt;em&gt;OpenAI&lt;/em&gt; is the headline example.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A recurring free tier is what you want for a side project or a low-volume production feature. Pay-as-you-go is what you want when the integration friction of a second vendor outweighs a few dollars a month. Knowing which shape you are signing up for prevents the nasty surprise of a “free” tier evaporating after a year.&lt;/p&gt;

&lt;h2&gt;
  
  
  Google Cloud Text-to-Speech: The Largest Recurring Free Tier
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://cloud.google.com/text-to-speech/pricing" rel="noopener noreferrer"&gt;Google Cloud Text-to-Speech&lt;/a&gt; is the workhorse answer for anyone who needs real volume without a bill. Unlike a one-time signup credit, Google’s free tier &lt;strong&gt;renews every month&lt;/strong&gt; and never expires, which makes it the closest thing to a permanently free TTS API at scale.&lt;/p&gt;

&lt;h3&gt;
  
  
  The free tier (the real numbers)
&lt;/h3&gt;

&lt;p&gt;Google’s published free monthly allowances, by voice family, at the time of writing:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Voice type&lt;/th&gt;
&lt;th&gt;Free per month&lt;/th&gt;
&lt;th&gt;Paid rate after free tier&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Standard&lt;/strong&gt; (basic parametric)&lt;/td&gt;
&lt;td&gt;0–4 million characters&lt;/td&gt;
&lt;td&gt;$4.00 / 1M characters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;WaveNet / Neural2&lt;/strong&gt; (premium)&lt;/td&gt;
&lt;td&gt;0–1 million characters&lt;/td&gt;
&lt;td&gt;$16.00 / 1M characters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Studio&lt;/strong&gt; (long-form premium)&lt;/td&gt;
&lt;td&gt;0–100K characters&lt;/td&gt;
&lt;td&gt;$160 / 1M characters&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The 4-million-character Standard free tier is the headline. That is roughly &lt;strong&gt;66 hours of spoken audio every month&lt;/strong&gt; at an average speaking rate — enough to run a daily news-reader bot, an accessibility “read this page aloud” feature, or a blog-to-audio pipeline indefinitely without paying a cent. The premium WaveNet/Neural2 tier (1M chars free) is where you go when you want the more natural-sounding voices and can stay under ~16 hours of audio per month.&lt;/p&gt;
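
&lt;p&gt;The hour figures follow from a single conversion. The sketch assumes about 1,000 characters per minute of speech, which is the rate implied by the numbers in this article and not a measured value; &lt;code&gt;audio_hours&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
def audio_hours(chars, chars_per_minute=1000):
    """Estimate hours of speech; 1,000 chars/minute is an assumed average rate."""
    return chars / chars_per_minute / 60


print(round(audio_hours(4_000_000), 1))  # Standard free tier, prints: 66.7
print(round(audio_hours(1_000_000), 1))  # WaveNet / Neural2 free tier, prints: 16.7
```

&lt;p&gt;Slower voices or a lower &lt;code&gt;speaking_rate&lt;/code&gt; stretch the same characters over more audio, so treat the result as a planning estimate.&lt;/p&gt;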

&lt;h3&gt;
  
  
  Voices and languages
&lt;/h3&gt;

&lt;p&gt;Google ships &lt;strong&gt;380+ voices across 50+ languages and variants&lt;/strong&gt;, with full &lt;a href="https://cloud.google.com/text-to-speech/docs/ssml" rel="noopener noreferrer"&gt;SSML&lt;/a&gt; support — so you can control pauses, pronunciation, pitch, speaking rate, and emphasis with markup. The newer &lt;strong&gt;Chirp 3: HD&lt;/strong&gt; voices push quality close to ElevenLabs for supported languages. The trade-off versus ElevenLabs is that Google does not offer arbitrary instant voice cloning on the public API; you pick from the catalogue.&lt;/p&gt;
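
&lt;p&gt;SSML is plain XML, so it is safer to build it with an XML library than by concatenating strings, since text gets escaped for you. The sketch below wraps sentences in a slowed-down &lt;code&gt;prosody&lt;/code&gt; block with fixed pauses between them; &lt;code&gt;build_ssml&lt;/code&gt;, the pause length, and the rate are illustrative choices, and not every voice family honours every tag.&lt;/p&gt;

```python
import xml.etree.ElementTree as ET


def build_ssml(sentences, pause_ms=400, rate="95%"):
    """Join sentences with fixed pauses inside one prosody block."""
    speak = ET.Element("speak")
    prosody = ET.SubElement(speak, "prosody", rate=rate)
    for index, sentence in enumerate(sentences):
        s = ET.SubElement(prosody, "s")
        s.text = sentence  # ElementTree escapes special characters on output
        if index != len(sentences) - 1:
            ET.SubElement(prosody, "break", time=f"{pause_ms}ms")
    return ET.tostring(speak, encoding="unicode")


print(build_ssml(["Free tier first.", "Pricing second."]))
```

&lt;p&gt;With the Python client, the resulting string would be passed as &lt;code&gt;texttospeech.SynthesisInput(ssml=...)&lt;/code&gt; in place of the &lt;code&gt;text&lt;/code&gt; argument.&lt;/p&gt;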

&lt;h3&gt;
  
  
  Code: synthesize speech with Google Cloud TTS
&lt;/h3&gt;

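&lt;p&gt;The REST API takes JSON in and returns base64-encoded audio. When you authenticate with user credentials from &lt;code&gt;gcloud&lt;/code&gt;, the request usually also needs an &lt;code&gt;x-goog-user-project&lt;/code&gt; header naming the project to bill (&lt;code&gt;PROJECT_ID&lt;/code&gt; below is a variable you set yourself):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "x-goog-user-project: $PROJECT_ID" \
  -H "Content-Type: application/json" \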
  -d '{
    "input": {"text": "Hello from a free text to speech API."},
    "voice": {"languageCode": "en-US", "name": "en-US-Standard-C"},
    "audioConfig": {"audioEncoding": "MP3"}
  }' \
  "https://texttospeech.googleapis.com/v1/text:synthesize" \
  | jq -r '.audioContent' | base64 --decode &amp;gt; out.mp3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Python with the official client library:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()

response = client.synthesize_speech(
    input=texttospeech.SynthesisInput(text="Hello from a free text to speech API."),
    voice=texttospeech.VoiceSelectionParams(
        language_code="en-US",
        name="en-US-Standard-C",  # swap to en-US-Neural2-F for premium
    ),
    audio_config=texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3,
        speaking_rate=1.0,
    ),
)

with open("out.mp3", "wb") as f:
    f.write(response.audio_content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Where Google Cloud TTS is a poor fit
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Requires a GCP account with a credit card on file.&lt;/strong&gt; You won’t be charged inside the free tier, but the card and billing setup are mandatory — a higher barrier than ElevenLabs’ email-only signup.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No arbitrary voice cloning.&lt;/strong&gt; Custom Voice exists but is an enterprise onboarding process, not a self-serve “upload 30 seconds of audio” feature like ElevenLabs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auth is heavier.&lt;/strong&gt; Service-account JSON or ADC, not a single bearer token you paste into a header. Worth the setup for the free volume, but it is a setup.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  ElevenLabs: Best Voice Quality and Self-Serve Voice Cloning
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://elevenlabs.io/pricing" rel="noopener noreferrer"&gt;ElevenLabs&lt;/a&gt; is the provider people reach for when the &lt;em&gt;sound&lt;/em&gt; matters more than the price. Its voices set the bar for emotional range, breath, and prosody, and it is the only major option where instant voice cloning and a large public voice library are first-class, self-serve features.&lt;/p&gt;

&lt;h3&gt;
  
  
  The free tier: 10,000 credits/month
&lt;/h3&gt;

&lt;p&gt;ElevenLabs gives every new account &lt;strong&gt;10,000 credits per month, no credit card required&lt;/strong&gt;. For the standard Multilingual v2 model, that works out to roughly &lt;strong&gt;10 minutes of generated audio per month&lt;/strong&gt;. The lighter Flash v2.5 and Turbo v2.5 models consume half a credit per character, so the same quota stretches to about 20 minutes of audio if you use them.&lt;/p&gt;

&lt;p&gt;Two pieces of fine print matter a lot:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Attribution is required on the free tier.&lt;/strong&gt; You must credit ElevenLabs when you publish audio generated on the free plan.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Commercial use requires a paid plan.&lt;/strong&gt; The free tier is for non-commercial use; the moment you monetize the output you need at least the $5/month Starter plan (30,000 credits), which also removes attribution and unlocks instant voice cloning.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The 10K free credits are best understood as a high-quality evaluation and hobby tier, not a free production backend. If voice quality is your priority and your volume is genuinely tiny — a personal project, a demo, a handful of clips — it is excellent. If you need hours of audio per month for free, Google wins on quota.&lt;/p&gt;

&lt;h3&gt;
  
  
  Models and languages
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Strength&lt;/th&gt;
&lt;th&gt;Credit cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;eleven_multilingual_v2&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Highest quality, most expressive, 29 languages&lt;/td&gt;
&lt;td&gt;1 credit / char&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;eleven_flash_v2_5&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;~75 ms latency, ideal for real-time voice agents&lt;/td&gt;
&lt;td&gt;0.5 credit / char&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;eleven_turbo_v2_5&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Balance of quality and speed&lt;/td&gt;
&lt;td&gt;0.5 credit / char&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
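
&lt;p&gt;The credit costs in the table translate directly into how far a quota goes. The sketch below uses the per-character rates listed above; rounding each request up to a whole credit is an assumption, and &lt;code&gt;credits_for&lt;/code&gt; and &lt;code&gt;chars_within_quota&lt;/code&gt; are hypothetical helpers, not SDK functions.&lt;/p&gt;

```python
import math

# Credits per character, taken from the table above.
CREDITS_PER_CHAR = {
    "eleven_multilingual_v2": 1.0,
    "eleven_flash_v2_5": 0.5,
    "eleven_turbo_v2_5": 0.5,
}


def credits_for(text, model_id):
    """Credits one request would consume, rounded up (rounding is an assumption)."""
    return math.ceil(len(text) * CREDITS_PER_CHAR[model_id])


def chars_within_quota(model_id, monthly_credits=10_000):
    """Characters a monthly credit quota covers on the given model."""
    return int(monthly_credits / CREDITS_PER_CHAR[model_id])


print(chars_within_quota("eleven_multilingual_v2"))  # prints: 10000
print(chars_within_quota("eleven_flash_v2_5"))       # prints: 20000
```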

&lt;h3&gt;
  
  
  Code: synthesize speech with ElevenLabs
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -X POST \
  "https://api.elevenlabs.io/v1/text-to-speech/JBFqnCBsd6RMkjVDRZzb" \
  -H "xi-api-key: $ELEVENLABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello from the best-sounding free text to speech API.",
    "model_id": "eleven_multilingual_v2"
  }' \
  --output out.mp3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Python with the official SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os

from elevenlabs.client import ElevenLabs

client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])

audio = client.text_to_speech.convert(
    voice_id="JBFqnCBsd6RMkjVDRZzb",   # a stock library voice
    model_id="eleven_flash_v2_5",       # low-latency for agents
    text="Hello from the best-sounding free text to speech API.",
    output_format="mp3_44100_128",
)

with open("out.mp3", "wb") as f:
    for chunk in audio:
        f.write(chunk)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Where ElevenLabs is a poor fit
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tiny free quota.&lt;/strong&gt; 10 minutes of audio per month is for evaluation, not for shipping a free product at any volume.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No commercial use without paying.&lt;/strong&gt; If your project earns money, the free tier is off the table by license, regardless of volume.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-character cost is the highest of the three at scale.&lt;/strong&gt; You pay for the quality. For plain functional narration where any neural voice is fine, Google or OpenAI is cheaper.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  OpenAI TTS: No Free Tier, but Cheap and Frictionless
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://platform.openai.com/docs/guides/text-to-speech" rel="noopener noreferrer"&gt;OpenAI’s audio API&lt;/a&gt; has no recurring free tier — every character is billed against your OpenAI usage. It earns a place in this comparison anyway, because the per-character price is low enough to be effectively free at hobby volume, and because if you already call OpenAI for chat or Whisper, adding speech is one more method on a client you have already configured.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pricing and models
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Price&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tts-1&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Standard quality, lowest latency&lt;/td&gt;
&lt;td&gt;$15 / 1M characters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tts-1-hd&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Higher audio fidelity&lt;/td&gt;
&lt;td&gt;$30 / 1M characters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gpt-4o-mini-tts&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Newer, steerable — you can instruct tone and delivery&lt;/td&gt;
&lt;td&gt;Billed in audio tokens (≈ a few cents per long passage)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;At $15 per million characters, generating 10,000 characters — roughly the same audio length as ElevenLabs’ entire monthly free quota — costs &lt;strong&gt;$0.15&lt;/strong&gt;. For a personal project that produces a few thousand characters a day, you might spend under a dollar a month. There is no free tier, but there is also no quota to blow through; you simply pay for what you use.&lt;/p&gt;

&lt;p&gt;The standout feature of &lt;code&gt;gpt-4o-mini-tts&lt;/code&gt; is &lt;em&gt;steerability&lt;/em&gt;: you can pass an instruction like “speak in a calm, sympathetic tone” alongside the text, and the model adapts its delivery, something neither Google’s catalogue voices nor ElevenLabs’ standard endpoint does out of the box.&lt;/p&gt;

&lt;h3&gt;
  
  
  Code: synthesize speech with OpenAI
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY

with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="nova",
    input="Hello from a pay-as-you-go text to speech API.",
    instructions="Speak in a warm, upbeat tone.",
) as response:
    response.stream_to_file("out.mp3")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The eleven built-in voices (&lt;code&gt;alloy&lt;/code&gt;, &lt;code&gt;echo&lt;/code&gt;, &lt;code&gt;fable&lt;/code&gt;, &lt;code&gt;onyx&lt;/code&gt;, &lt;code&gt;nova&lt;/code&gt;, &lt;code&gt;shimmer&lt;/code&gt;, plus the newer &lt;code&gt;ash&lt;/code&gt;, &lt;code&gt;ballad&lt;/code&gt;, &lt;code&gt;coral&lt;/code&gt;, &lt;code&gt;sage&lt;/code&gt;, &lt;code&gt;verse&lt;/code&gt;) cover most needs. There is no custom voice cloning.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where OpenAI TTS is a poor fit
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No free tier at all.&lt;/strong&gt; If “$0/month” is a hard requirement, this is the wrong choice — pick Google.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No voice cloning, limited voice catalogue.&lt;/strong&gt; Eleven voices versus Google’s 380+ or ElevenLabs’ huge library.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You need a credit card and standing billing.&lt;/strong&gt; Same barrier as Google, without the recurring free quota to justify it.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Honorable Mentions: Other Free (or Free-ish) TTS APIs
&lt;/h2&gt;

&lt;p&gt;The big three above are the practical answers, but several alternatives are worth knowing about depending on your stack.&lt;/p&gt;

&lt;h3&gt;
  
  
  Microsoft Azure AI Speech
&lt;/h3&gt;

&lt;p&gt;Azure’s neural TTS includes a &lt;strong&gt;recurring free tier of 500,000 characters per month&lt;/strong&gt; for standard neural voices, renewing monthly like Google’s. It supports 400+ voices across 140+ languages and has the strongest catalogue for enterprise scenarios and custom neural voice (with approval). If your infrastructure is already on Azure, it is the natural pick.&lt;/p&gt;

&lt;h3&gt;
  
  
  Amazon Polly
&lt;/h3&gt;

&lt;p&gt;Polly’s &lt;strong&gt;free tier is time-limited to your first 12 months&lt;/strong&gt;: 5 million characters/month for standard voices and 1 million characters/month for neural voices. Generous during a launch year, but it is not a permanent free tier — after 12 months you pay standard rates. Best if you are already in the AWS ecosystem.&lt;/p&gt;

&lt;h3&gt;
  
  
  Deepgram Aura
&lt;/h3&gt;

&lt;p&gt;Deepgram added a TTS model family (&lt;strong&gt;Aura&lt;/strong&gt;) to complement its speech-to-text stack. There is no permanent free tier, but the same $200 signup credit that covers transcription also covers Aura synthesis — useful if you want one vendor for both directions of a voice pipeline. See our &lt;a href="https://toolfreebie.com/free-whisper-api-compared/" rel="noopener noreferrer"&gt;free Whisper API comparison&lt;/a&gt; for the speech-to-text side of Deepgram.&lt;/p&gt;

&lt;h3&gt;
  
  
  Self-host: Piper, Coqui, and Kokoro
&lt;/h3&gt;

&lt;p&gt;If you want zero marginal cost and have hardware to run it on, open-source TTS has caught up fast. &lt;a href="https://github.com/rhasspy/piper" rel="noopener noreferrer"&gt;Piper&lt;/a&gt; runs fast neural TTS on a Raspberry Pi. &lt;strong&gt;Kokoro-82M&lt;/strong&gt; is a tiny, high-quality open model that runs on CPU. Coqui TTS offers voice cloning locally. The trade-off is the usual self-hosting tax: you own the setup, the updates, and the crashes. For a personal tool this is genuinely free; for a SaaS, the operational overhead rarely pays off until you are well past Google’s free tier.&lt;/p&gt;

&lt;h2&gt;
  
  
  Side-by-Side Spec Sheet
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Google Cloud TTS&lt;/th&gt;
&lt;th&gt;ElevenLabs&lt;/th&gt;
&lt;th&gt;OpenAI TTS&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Free tier shape&lt;/td&gt;
&lt;td&gt;Recurring monthly (forever)&lt;/td&gt;
&lt;td&gt;Recurring monthly (forever)&lt;/td&gt;
&lt;td&gt;None (pay as you go)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Free monthly volume&lt;/td&gt;
&lt;td&gt;4M chars Standard / 1M premium&lt;/td&gt;
&lt;td&gt;10,000 credits (~10 min)&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Credit card to start&lt;/td&gt;
&lt;td&gt;Required&lt;/td&gt;
&lt;td&gt;Not required&lt;/td&gt;
&lt;td&gt;Required&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Commercial use on free tier&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No (paid plan required)&lt;/td&gt;
&lt;td&gt;Yes (it’s all paid)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Voice count&lt;/td&gt;
&lt;td&gt;380+&lt;/td&gt;
&lt;td&gt;Large library + cloning&lt;/td&gt;
&lt;td&gt;11 built-in&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Languages&lt;/td&gt;
&lt;td&gt;50+&lt;/td&gt;
&lt;td&gt;29 (Multilingual v2)&lt;/td&gt;
&lt;td&gt;Multilingual (follows input)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Voice cloning&lt;/td&gt;
&lt;td&gt;Enterprise only&lt;/td&gt;
&lt;td&gt;Yes, self-serve (paid)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lowest latency option&lt;/td&gt;
&lt;td&gt;Standard voices&lt;/td&gt;
&lt;td&gt;Flash v2.5 (~75 ms)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;tts-1&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SSML / prosody control&lt;/td&gt;
&lt;td&gt;Full SSML&lt;/td&gt;
&lt;td&gt;Limited (model-driven)&lt;/td&gt;
&lt;td&gt;Steerable via instructions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cheapest paid rate&lt;/td&gt;
&lt;td&gt;$4 / 1M chars (Standard)&lt;/td&gt;
&lt;td&gt;$5/mo (30K credits)&lt;/td&gt;
&lt;td&gt;$15 / 1M chars&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auth&lt;/td&gt;
&lt;td&gt;Service account / ADC&lt;/td&gt;
&lt;td&gt;API key header&lt;/td&gt;
&lt;td&gt;API key&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Decision Tree: Which One Should You Pick?
&lt;/h2&gt;

&lt;p&gt;Run through this list top to bottom. The first row that matches your situation is your answer.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;I need hours of audio per month, for free, forever.&lt;/strong&gt; → &lt;strong&gt;Google Cloud TTS.&lt;/strong&gt; The 4M-character recurring Standard tier is the only quota that supports this.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Voice quality is the whole point — narration, audiobook, character voice.&lt;/strong&gt; → &lt;strong&gt;ElevenLabs&lt;/strong&gt; if non-commercial and low volume; pay $5/month Starter the moment you monetize.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;I want to clone a specific voice from a short sample.&lt;/strong&gt; → &lt;strong&gt;ElevenLabs&lt;/strong&gt; (instant voice cloning, paid). No one else does this self-serve.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;I already call OpenAI for chat or Whisper and just want my app to talk.&lt;/strong&gt; → &lt;strong&gt;OpenAI TTS.&lt;/strong&gt; Same client, same key, ~$0.15 per 10K characters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;I need fine pronunciation, pause, and pitch control via markup.&lt;/strong&gt; → &lt;strong&gt;Google Cloud TTS&lt;/strong&gt; (full SSML support).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;I want the model to adapt tone from an instruction (“sound sympathetic”).&lt;/strong&gt; → &lt;strong&gt;OpenAI &lt;code&gt;gpt-4o-mini-tts&lt;/code&gt;&lt;/strong&gt;, the only one with self-serve steerability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;My infrastructure is already on Azure or AWS.&lt;/strong&gt; → &lt;strong&gt;Azure AI Speech&lt;/strong&gt; (500K chars/month free, recurring) or &lt;strong&gt;Amazon Polly&lt;/strong&gt; (free for first 12 months).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;I want zero per-character cost and have hardware to run it.&lt;/strong&gt; → &lt;strong&gt;Self-host Piper or Kokoro.&lt;/strong&gt; Free at the margin, you own the ops.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Combining Free TTS with Free Whisper and a Free LLM
&lt;/h2&gt;

&lt;p&gt;The most powerful use of a free TTS API is not standalone playback — it is the final leg of a full voice loop. A complete, no-cost voice-agent stack in 2026 looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Speech in:&lt;/strong&gt; a &lt;a href="https://toolfreebie.com/free-whisper-api-compared/" rel="noopener noreferrer"&gt;free Whisper API&lt;/a&gt; (Groq’s no-card free tier is the cleanest) transcribes the user’s audio.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reasoning:&lt;/strong&gt; a free LLM — Groq Llama 3.3 70B, &lt;a href="https://toolfreebie.com/together-ai-free-api-llama-deepseek-flux-2026/" rel="noopener noreferrer"&gt;Together AI&lt;/a&gt;, or Google Gemini — generates the response text.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Speech out:&lt;/strong&gt; Google Cloud TTS (free monthly tier) or ElevenLabs Flash v2.5 (low latency) speaks the answer.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Three free quotas and a complete speech-to-speech agent. For zero cards on file, pair Groq with ElevenLabs’ free tier (non-commercial use only); Google’s larger free quota needs a billing account with a card attached. The same architecture that costs real money on a single commercial vendor runs free as long as each provider’s monthly ceiling holds.&lt;/p&gt;
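
&lt;p&gt;The three steps above can be sketched as one function that chains interchangeable stages. This is a minimal sketch, not a working integration: the stage functions are hypothetical stand-ins, and in a real agent each would wrap whichever provider SDK or HTTP call you chose.&lt;/p&gt;

```python
from typing import Callable

# Each stage is a plain callable so any provider can be swapped in.
# The aliases and stub functions below are hypothetical placeholders,
# not real SDK calls.
Transcribe = Callable[[bytes], str]  # speech in: audio bytes to text
Reason = Callable[[str], str]        # reasoning: user text to reply text
Speak = Callable[[str], bytes]       # speech out: reply text to audio bytes


def voice_turn(audio_in: bytes, transcribe: Transcribe, reason: Reason, speak: Speak):
    """Run one speech-to-speech turn through the three stages in order."""
    user_text = transcribe(audio_in)
    reply_text = reason(user_text)
    return speak(reply_text)


# Stand-in stages so the sketch runs without network access or API keys.
def fake_transcribe(audio):
    return audio.decode("utf-8")


def fake_reason(text):
    return f"You said: {text}"


def fake_speak(text):
    return text.encode("utf-8")


if __name__ == "__main__":
    audio_out = voice_turn(b"hello", fake_transcribe, fake_reason, fake_speak)
    print(audio_out)
```

&lt;p&gt;Keeping each stage behind a plain function signature means you can move the speech-out step from one TTS provider to another without touching the rest of the loop.&lt;/p&gt;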

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is there a truly free text-to-speech API with no time limit?
&lt;/h3&gt;

&lt;p&gt;Yes — &lt;strong&gt;Google Cloud Text-to-Speech&lt;/strong&gt; and &lt;strong&gt;Microsoft Azure AI Speech&lt;/strong&gt; both offer recurring monthly free tiers that renew indefinitely (4M and 500K characters respectively for their relevant voice tiers). They require a credit card on file, but you are not charged inside the free quota. Amazon Polly’s free tier, by contrast, only lasts your first 12 months.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which free TTS API has the best voice quality?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;ElevenLabs&lt;/strong&gt; is widely regarded as the most natural and expressive, especially for long-form narration and emotional delivery. Google’s newer &lt;strong&gt;Chirp 3: HD&lt;/strong&gt; voices are very close for supported languages and come with a far larger free quota. OpenAI’s voices are good and improving, with &lt;code&gt;gpt-4o-mini-tts&lt;/code&gt; adding tone steerability. If quality is the only axis, ElevenLabs; if quality-per-free-character, Google.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use a free TTS API commercially?
&lt;/h3&gt;

&lt;p&gt;It depends on the provider. &lt;strong&gt;Google Cloud and OpenAI&lt;/strong&gt; allow commercial use of generated audio (Google inside its free tier, OpenAI as paid usage). &lt;strong&gt;ElevenLabs’ free tier is non-commercial only&lt;/strong&gt; and requires attribution — you must upgrade to at least the $5/month Starter plan to monetize the output. Always re-read each provider’s terms before shipping; licensing changes.&lt;/p&gt;

&lt;h3&gt;
  
  
  How many characters is one minute of speech?
&lt;/h3&gt;

&lt;p&gt;At a natural speaking rate of roughly 150 words per minute and ~5 characters per word plus spaces, one minute of audio is approximately &lt;strong&gt;900–1,000 characters&lt;/strong&gt;. So Google’s 4M-character Standard free tier is roughly 66 to 74 hours per month, and ElevenLabs’ 10K credits is about 10 minutes on the Multilingual v2 model.&lt;/p&gt;
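
&lt;p&gt;To rerun this conversion with your own assumptions, a few lines are enough. The sketch assumes 150 words per minute and 6 characters per word (5 letters plus a space); both constants and the helper names are illustrative.&lt;/p&gt;

```python
# Assumptions from the answer above: ~150 spoken words per minute and
# ~6 characters per word once the trailing space is counted.
WORDS_PER_MINUTE = 150
CHARS_PER_WORD = 6


def chars_per_minute(wpm=WORDS_PER_MINUTE, chars_per_word=CHARS_PER_WORD):
    """Characters of input text consumed by one minute of speech."""
    return wpm * chars_per_word


def quota_to_hours(quota_chars, per_minute):
    """Convert a character quota into hours of audio at a given density."""
    return quota_chars / per_minute / 60


if __name__ == "__main__":
    low, high = chars_per_minute(), 1_000  # the 900 to 1,000 range quoted above
    print(f"4M characters: {quota_to_hours(4_000_000, high):.1f} to "
          f"{quota_to_hours(4_000_000, low):.1f} hours")
    print(f"10K characters at {high}/min: {10_000 / high:.0f} minutes")
```

&lt;p&gt;Real speech rates vary by voice, language, and punctuation, so treat the result as a planning estimate rather than a guarantee.&lt;/p&gt;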

&lt;h3&gt;
  
  
  Which TTS API is best for a real-time voice agent?
&lt;/h3&gt;

&lt;p&gt;Latency is the deciding factor. &lt;strong&gt;ElevenLabs Flash v2.5&lt;/strong&gt; targets ~75 ms model latency and is purpose-built for conversational agents. &lt;strong&gt;OpenAI &lt;code&gt;tts-1&lt;/code&gt;&lt;/strong&gt; and Google’s Standard voices are also fast enough for most interactive use. For the lowest possible end-to-end latency, stream the audio as it is generated rather than waiting for the full file.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do these APIs support SSML?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Google Cloud TTS&lt;/strong&gt; has the most complete SSML support — pauses, pronunciation via phonemes, pitch, rate, and emphasis. &lt;strong&gt;Azure&lt;/strong&gt; also has strong SSML. &lt;strong&gt;ElevenLabs&lt;/strong&gt; relies more on its model’s inherent prosody than on markup, and &lt;strong&gt;OpenAI&lt;/strong&gt; uses natural-language instructions (with &lt;code&gt;gpt-4o-mini-tts&lt;/code&gt;) instead of SSML tags.&lt;/p&gt;

&lt;h3&gt;
  
  
  What audio formats can I get back?
&lt;/h3&gt;

&lt;p&gt;All three can return MP3, and each supports additional formats: Google offers LINEAR16 (WAV), OGG Opus, and MULAW, and asks you to choose the encoding explicitly on each request; OpenAI defaults to MP3 and also offers Opus, AAC, FLAC, WAV, and PCM; ElevenLabs defaults to MP3 at several bitrates plus PCM and µ-law for telephony. Pick Opus or low-bitrate MP3 for streaming, WAV/PCM when you need to post-process the audio.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Reads
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/free-whisper-api-compared/" rel="noopener noreferrer"&gt;Free Whisper API: Groq, Deepgram, AssemblyAI Compared&lt;/a&gt; — the speech-to-text half of any voice pipeline, and the natural companion to this guide.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/groq-fastest-free-ai-api/" rel="noopener noreferrer"&gt;Groq API: The Fastest Free AI API in 2026&lt;/a&gt; — the free LLM that powers the reasoning step between transcription and speech.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/together-ai-free-api-llama-deepseek-flux-2026/" rel="noopener noreferrer"&gt;Together AI Free API: Llama, DeepSeek, FLUX&lt;/a&gt; — another free model source for generating the text your TTS reads aloud.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/gemini-free-ai-api/" rel="noopener noreferrer"&gt;Google Gemini API: The Best Free AI API in 2026&lt;/a&gt; — pairs naturally with Google Cloud TTS for an all-Google free stack.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/best-free-ai-apis-2026/" rel="noopener noreferrer"&gt;10 Best Free AI APIs in 2026&lt;/a&gt; — the wider map of which free API does what.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://toolfreebie.com/free-text-to-speech-api/" rel="noopener noreferrer"&gt;toolfreebie.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>opensource</category>
    </item>
    <item>
      <title>5 Free AI Coding Assistants for VS Code &amp; Terminal</title>
      <dc:creator>toolfreebie</dc:creator>
      <pubDate>Thu, 28 May 2026 08:51:40 +0000</pubDate>
      <link>https://dev.to/build996/5-free-ai-coding-assistants-for-vs-code-terminal-2adj</link>
      <guid>https://dev.to/build996/5-free-ai-coding-assistants-for-vs-code-terminal-2adj</guid>
      <description>&lt;p&gt;If you write code for a living in 2026, you have probably tried Cursor, GitHub Copilot, or one of the other paid AI coding tools and walked away thinking the same thing: &lt;em&gt;this is genuinely useful, but $20 to $40 a month adds up fast&lt;/em&gt;. The good news is that the free, open-source side of this market has caught up. You can now get high-quality autocomplete, multi-file refactoring, autonomous agent loops, and even self-hosted local inference without paying a cent, as long as you are willing to bring your own free-tier API key (or run a model locally).&lt;/p&gt;

&lt;p&gt;This guide covers five free AI coding assistants that I actually use in 2026. They split across two environments: the editor (Cline, Continue.dev, and Codeium in VS Code, plus self-hosted Tabby for the privacy-first crowd) and the terminal (Aider). Every one is free, every one is open-source or has a permanently free tier, and every one is genuinely production-ready.&lt;/p&gt;

&lt;h2&gt;
  
  
  What “free AI coding assistant” actually means in 2026
&lt;/h2&gt;

&lt;p&gt;The phrase covers three different product shapes, and the differences matter when you pick one:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Autocomplete&lt;/strong&gt; — inline ghost text as you type. Continue.dev and Codeium are the strongest free options here.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chat / refactor&lt;/strong&gt; — a side panel that answers questions about your code and applies suggested edits. Every tool on this list does this; quality varies with the model behind it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent&lt;/strong&gt; — autonomous multi-file edits, terminal execution, and self-verification. This is the Cursor / Devin shape. Cline and Aider are the two strongest free agents.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pricing line in 2026 is drawn around inference cost, not features. Paid tools (Cursor, Copilot, Cody) bundle inference into a flat subscription. Free tools ask you to bring your own key from a provider with a real free tier — &lt;a href="https://toolfreebie.com/gemini-free-ai-api/" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt;, &lt;a href="https://toolfreebie.com/groq-fastest-free-ai-api/" rel="noopener noreferrer"&gt;Groq&lt;/a&gt;, &lt;a href="https://toolfreebie.com/openrouter-free-ai-models/" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt;, &lt;a href="https://toolfreebie.com/deepseek-free-api/" rel="noopener noreferrer"&gt;DeepSeek&lt;/a&gt;, or a local &lt;a href="https://toolfreebie.com/ollama-run-ai-locally/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; instance. Combine the right free key with the right open-source frontend and your effective cost is zero.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Cline — the best free agent for VS Code
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/cline/cline" rel="noopener noreferrer"&gt;Cline&lt;/a&gt; (formerly Claude Dev) is the closest free analogue to Cursor’s agent mode. It is an Apache 2.0 VS Code extension that drives a multi-step loop: read files, propose edits, execute terminal commands, verify results, and iterate. You see every step before it runs and can stop or correct it.&lt;/p&gt;

&lt;p&gt;What makes Cline stand out among free options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Plan / Act mode&lt;/strong&gt; — you can ask it to draft a plan first (read-only) and only switch to Act when you approve. This is the single biggest UX improvement over running an agent “raw.”&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BYOK with anything&lt;/strong&gt; — Gemini, OpenRouter, Groq, Together, DeepSeek, Anthropic, OpenAI, or local Ollama. The free path is Gemini 2.0 Flash (15 RPM, 1M token context) or DeepSeek V3 via OpenRouter free tier.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live cost tracking&lt;/strong&gt; — every message shows token counts and the dollar cost so far. With Gemini Flash you watch it stay at $0.00.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP support&lt;/strong&gt; — Cline is one of the first agents to integrate the &lt;a href="https://toolfreebie.com/mcp-protocol-ai-agents/" rel="noopener noreferrer"&gt;Model Context Protocol&lt;/a&gt;, so you can plug in custom tools (databases, browsers, internal APIs) without writing extension code.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Real workflow: install Cline from the VS Code marketplace, paste a Gemini API key (free from &lt;a href="https://aistudio.google.com" rel="noopener noreferrer"&gt;aistudio.google.com&lt;/a&gt;, no card), open a Python repo, and type “add type hints to every function in src/ and run mypy until it passes.” Cline reads the files, makes edits, runs &lt;code&gt;mypy&lt;/code&gt;, sees the errors, fixes them, and runs again. End-to-end on a small repo this takes 3-5 minutes and costs $0.&lt;/p&gt;

&lt;p&gt;Where it falls short: Cline is agent-only, not autocomplete. If you want ghost-text-as-you-type, you need to pair it with Continue.dev or Codeium.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Aider — the strongest terminal-native AI pair programmer
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://aider.chat" rel="noopener noreferrer"&gt;Aider&lt;/a&gt; is the answer if you spend your day in a terminal and a tmux session, not a GUI editor. It is a Python CLI (Apache 2.0) that opens an interactive prompt inside a Git repo and edits files in place, committing each change with a descriptive message you can read in &lt;code&gt;git log&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The things Aider does better than any other free tool:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Repo map via tree-sitter&lt;/strong&gt; — Aider parses your entire codebase into a symbol map and feeds the LLM only the relevant parts. On a 100k-line repo this means the model still understands cross-file dependencies without busting your context window.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Architect / editor split&lt;/strong&gt; — you can run a strong reasoning model (DeepSeek R1, o1-mini) as the architect and a cheap fast model (DeepSeek V3, Gemini Flash) as the editor. The architect plans, the editor writes. This is the cheapest way to get high-quality changes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-commit with diff messages&lt;/strong&gt; — every Aider edit becomes a Git commit you can &lt;code&gt;git revert&lt;/code&gt;. No “agent went off the rails and trashed my repo” recovery.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reproducible benchmark&lt;/strong&gt; — Aider publishes a &lt;a href="https://aider.chat/docs/leaderboards/" rel="noopener noreferrer"&gt;leaderboard&lt;/a&gt; that runs 225 Exercism exercises through each model combination, so you can pick the cheapest model that hits your accuracy bar.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Free combo I run: &lt;code&gt;aider --architect --model openrouter/deepseek/deepseek-r1 --editor-model openrouter/deepseek/deepseek-chat&lt;/code&gt; using the OpenRouter free tier. In architect mode the main &lt;code&gt;--model&lt;/code&gt; does the planning and &lt;code&gt;--editor-model&lt;/code&gt; writes the edits. End-to-end cost for a typical refactor session is under $0.05, often $0.00 when you stay under the daily free quota.&lt;/p&gt;

&lt;p&gt;Where it falls short: terminal-only, no autocomplete, no inline edit preview. If you live in VS Code, Cline is the better fit.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Continue.dev — the best free autocomplete for VS Code and JetBrains
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://continue.dev" rel="noopener noreferrer"&gt;Continue.dev&lt;/a&gt; is Apache 2.0, runs in VS Code and JetBrains, and gives you what Copilot gives you (inline ghost text + chat panel + slash commands) without the subscription. The catch, which is also the feature: you wire up your own model providers in a YAML config file.&lt;/p&gt;

&lt;p&gt;What you actually get for free:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Inline autocomplete&lt;/strong&gt; — uses a small fast model for ghost text. The recommended free option is Qwen 2.5 Coder 1.5B via Ollama (runs on CPU), or Groq’s free Llama 3.1 8B endpoint for cloud speed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chat panel&lt;/strong&gt; — point it at any chat-completions endpoint. Gemini Flash, DeepSeek V3, OpenRouter free models, or local Llama 3.3 via Ollama all work.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom slash commands&lt;/strong&gt; — define &lt;code&gt;/test&lt;/code&gt;, &lt;code&gt;/review&lt;/code&gt;, &lt;code&gt;/explain&lt;/code&gt; as YAML prompts that pull in the current file or selection. Closest thing to Cursor’s command palette in a free tool.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Indexed codebase chat&lt;/strong&gt; — Continue runs a local embedding index of your repo (free Voyage AI or local nomic-embed-text via Ollama) so chat can pull relevant context from anywhere in the codebase.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sample &lt;code&gt;config.yaml&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Free coding setup&lt;/span&gt;
&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1.0.0&lt;/span&gt;
&lt;span class="na"&gt;schema&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;models&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Chat (Gemini Flash)&lt;/span&gt;
    &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gemini&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gemini-2.0-flash-exp&lt;/span&gt;
    &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;YOUR_FREE_GEMINI_KEY&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Autocomplete (Qwen Coder)&lt;/span&gt;
    &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ollama&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;qwen2.5-coder:1.5b&lt;/span&gt;
    &lt;span class="na"&gt;roles&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;autocomplete&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="c1"&gt;# In config.yaml the embedding model is a models entry with the embed role&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Embeddings (Nomic)&lt;/span&gt;
    &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ollama&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nomic-embed-text&lt;/span&gt;
    &lt;span class="na"&gt;roles&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;embed&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where it falls short: Continue is autocomplete and chat, not a full agent. For agentic multi-file work you still want Cline.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Codeium / Windsurf — the easiest free start, no config required
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://codeium.com" rel="noopener noreferrer"&gt;Codeium&lt;/a&gt; (the free product, distinct from their paid Windsurf IDE) gives you unlimited free autocomplete and chat in VS Code, JetBrains, Neovim, Emacs, and 40+ other editors. No bring-your-own-key, no quota counter, no credit card. Their business model funds the free tier with enterprise self-hosted licenses, and they have committed to keeping the individual plan permanently free.&lt;/p&gt;

&lt;p&gt;Why it stays on this list despite not being open-source:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Zero setup&lt;/strong&gt; — install extension, sign in with email, start typing. No model config, no API keys, no Ollama running in the background.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Truly unlimited&lt;/strong&gt; — Codeium does not rate-limit individual users on autocomplete or chat. The only paid features are Cascade (their agent) and team management.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Editor coverage no one else matches&lt;/strong&gt; — if you write Go in Neovim and TypeScript in JetBrains and Python in VS Code, Codeium is the only free tool that gives you the same UX everywhere.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local-only mode for enterprise&lt;/strong&gt; — Codeium can run fully on-prem with no telemetry, which is why government and large finance shops use it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What you give up: Codeium is not open-source and the free tier sends code through their hosted models. If that is a dealbreaker for your codebase, skip to Tabby below.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Tabby — self-hosted, fully local, fully free
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://tabby.tabbyml.com" rel="noopener noreferrer"&gt;Tabby&lt;/a&gt; (Apache 2.0) is the answer when your code cannot leave your machine. It is a self-hosted AI coding assistant that runs on your laptop or a workstation, ships its own server, and exposes a VS Code / JetBrains / Vim extension that talks to &lt;code&gt;localhost&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;What Tabby gives you that nothing else on this list does:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;100% local&lt;/strong&gt; — no API key, no internet, no telemetry. Code never leaves the machine.&lt;/li&gt;
&lt;li&gt;
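&lt;strong&gt;One-command install&lt;/strong&gt; — &lt;code&gt;docker run -it --gpus all -p 8080:8080 -v $HOME/.tabby:/data tabbyml/tabby serve --model StarCoder-1B --device cuda&lt;/code&gt; and you have a coding assistant. The command as written targets an NVIDIA GPU (hence &lt;code&gt;--gpus all&lt;/code&gt; and &lt;code&gt;--device cuda&lt;/code&gt;), and the volume mount keeps downloaded models between restarts. StarCoder-1B is small enough to run on a CPU; with a consumer GPU you can run StarCoder-7B for noticeably better completions.&lt;/li&gt;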
&lt;li&gt;
&lt;strong&gt;Repo-aware retrieval&lt;/strong&gt; — Tabby indexes your codebase and pulls relevant context into each completion, the same trick Cursor uses but running entirely on your hardware.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Team-server mode&lt;/strong&gt; — point your colleagues’ editors at a shared Tabby server on a beefy machine. One GPU serves a small team.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Where it falls short: completion quality on the free local models (StarCoder, DeepSeek Coder 1.3B) is meaningfully below GPT-4-class output. Tabby is the right pick when privacy is non-negotiable, not when you want the best autocomplete.&lt;/p&gt;

&lt;h2&gt;
  
  
  Side-by-side comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Shape&lt;/th&gt;
&lt;th&gt;Environment&lt;/th&gt;
&lt;th&gt;License&lt;/th&gt;
&lt;th&gt;Setup&lt;/th&gt;
&lt;th&gt;Best free model combo&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cline&lt;/td&gt;
&lt;td&gt;Agent&lt;/td&gt;
&lt;td&gt;VS Code&lt;/td&gt;
&lt;td&gt;Apache 2.0&lt;/td&gt;
&lt;td&gt;2 min&lt;/td&gt;
&lt;td&gt;Gemini 2.0 Flash (free, 1M ctx)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aider&lt;/td&gt;
&lt;td&gt;Agent&lt;/td&gt;
&lt;td&gt;Terminal&lt;/td&gt;
&lt;td&gt;Apache 2.0&lt;/td&gt;
&lt;td&gt;1 min (pip)&lt;/td&gt;
&lt;td&gt;DeepSeek V3 + R1 via OpenRouter free tier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Continue.dev&lt;/td&gt;
&lt;td&gt;Autocomplete + chat&lt;/td&gt;
&lt;td&gt;VS Code / JetBrains&lt;/td&gt;
&lt;td&gt;Apache 2.0&lt;/td&gt;
&lt;td&gt;10 min (config)&lt;/td&gt;
&lt;td&gt;Gemini Flash chat + Qwen Coder local autocomplete&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codeium&lt;/td&gt;
&lt;td&gt;Autocomplete + chat&lt;/td&gt;
&lt;td&gt;40+ editors&lt;/td&gt;
&lt;td&gt;Proprietary (free tier)&lt;/td&gt;
&lt;td&gt;30 sec&lt;/td&gt;
&lt;td&gt;Hosted (no choice, but unlimited)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tabby&lt;/td&gt;
&lt;td&gt;Autocomplete&lt;/td&gt;
&lt;td&gt;VS Code / JetBrains / Vim&lt;/td&gt;
&lt;td&gt;Apache 2.0&lt;/td&gt;
&lt;td&gt;5 min (Docker)&lt;/td&gt;
&lt;td&gt;Local StarCoder-7B&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Which one should you actually use?
&lt;/h2&gt;

&lt;p&gt;Honest decision tree:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;You want one tool, you live in VS Code, and you want agentic multi-file edits&lt;/strong&gt; → Cline + a free Gemini API key. Stop reading.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want one tool and you live in a terminal&lt;/strong&gt; → Aider with the DeepSeek architect/editor combo via OpenRouter.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want the best free autocomplete and zero setup hassle&lt;/strong&gt; → Codeium. Install, sign in, done.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want fully local, code never leaves your machine&lt;/strong&gt; → Tabby in Docker.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want power-user autocomplete with full control over which model runs where&lt;/strong&gt; → Continue.dev with a YAML config you can commit to your repo.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want the strongest possible setup overall&lt;/strong&gt; → Cline for agent work + Codeium for inline autocomplete. They do not conflict; you get ghost text from Codeium and large refactors from Cline.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Pairing with free AI APIs
&lt;/h2&gt;

&lt;p&gt;Three of these tools (Cline, Aider, Continue.dev) need an LLM provider. The free combos that work in 2026:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Google Gemini API&lt;/strong&gt; — Gemini 2.0 Flash is free up to 15 RPM and 1,500 requests/day, with a 1M-token context window that handles huge repos. &lt;a href="https://toolfreebie.com/gemini-free-ai-api/" rel="noopener noreferrer"&gt;Setup guide&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Groq&lt;/strong&gt; — Llama 3.3 70B and Qwen 32B free, 14,400 requests/day, very fast (300-800 tokens/s). Best for autocomplete-style requests where latency matters. &lt;a href="https://toolfreebie.com/groq-fastest-free-ai-api/" rel="noopener noreferrer"&gt;Setup guide&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DeepSeek&lt;/strong&gt; — V3 chat and R1 reasoning both have a free credit grant and DeepSeek’s own API is the cheapest paid tier if you exhaust it. &lt;a href="https://toolfreebie.com/deepseek-free-api/" rel="noopener noreferrer"&gt;Setup guide&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenRouter&lt;/strong&gt; — single key, 300+ models, several with permanent free endpoints (DeepSeek V3, Llama 3.3 70B, Qwen 32B). &lt;a href="https://toolfreebie.com/openrouter-free-ai-models/" rel="noopener noreferrer"&gt;Setup guide&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local Ollama&lt;/strong&gt; — runs Llama 3.3, Qwen 2.5 Coder, DeepSeek Coder, and others entirely on your machine. Zero API cost, zero rate limit. &lt;a href="https://toolfreebie.com/ollama-run-ai-locally/" rel="noopener noreferrer"&gt;Setup guide&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Is GitHub Copilot Free a real option?&lt;/strong&gt; GitHub launched a free Copilot tier in December 2024 that is open to any individual GitHub account, with a small monthly quota (verified students and open-source maintainers can separately get the paid Pro plan at no cost). It is genuinely free, but the cap (50 chat messages, 2,000 completions per month) is low enough that for daily work the tools in this guide are more practical.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Are these as good as Cursor?&lt;/strong&gt; Cline running on Claude Sonnet 4.6 or Gemini 2.5 Pro is competitive with Cursor for agentic work — same loop, same UX patterns, same model behind the scenes. The gap is mostly polish, not capability. On free models the gap widens; you trade ~10-20% accuracy for $20/month saved.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I use these on a corporate codebase?&lt;/strong&gt; Check your security policy first. Cline, Aider, and Continue.dev send code to whichever API key you configure — Gemini, OpenRouter, etc. — and those providers have their own data-retention policies. Codeium has an opt-out for training data. Tabby is the only option that sends nothing anywhere.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do any of these work with local-only models?&lt;/strong&gt; Cline, Aider, and Continue.dev all support Ollama out of the box. Set the provider to &lt;code&gt;ollama&lt;/code&gt; and a model name like &lt;code&gt;qwen2.5-coder:32b&lt;/code&gt;. Tabby is local-only by design.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What about Cody and Tabnine?&lt;/strong&gt; Sourcegraph’s Cody is open-source with a free tier (200 autocompletes, 20 chat messages per month) — usable but capped. Tabnine has a free starter plan that is essentially a demo. Neither beats the five tools in this guide for the unlimited-free use case.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bottom line
&lt;/h2&gt;

&lt;p&gt;Free AI coding assistants in 2026 are not a downgrade from paid tools — they are the same tools with a different billing model. Cline gives you Cursor’s agent loop. Aider gives you something Cursor cannot (clean Git history, terminal-native, reproducible benchmarks). Continue.dev gives you Copilot-style autocomplete with full provider control. Codeium gives you the cleanest zero-setup install. Tabby gives you the only fully local option.&lt;/p&gt;

&lt;p&gt;Pick one based on your editor and your privacy needs, pair it with one of the free AI APIs above, and you have a setup that costs nothing and ships features at the same rate as a $40/month subscription. The only thing it costs you is ten minutes of config.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Reads
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/cline-vscode-ai-agent/" rel="noopener noreferrer"&gt;Cline: Free Open-Source AI Coding Agent for VS Code (Cursor Alternative)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/aider-free-ai-coding-cli/" rel="noopener noreferrer"&gt;Aider: Free Open-Source AI Coding Agent for Your Terminal&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/bolt-new-free-app-builder/" rel="noopener noreferrer"&gt;Bolt.new: Free AI App Builder That Codes, Runs, and Deploys in Your Browser&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/ollama-run-ai-locally/" rel="noopener noreferrer"&gt;Ollama: Run AI Models Locally for Free&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/openrouter-free-ai-models/" rel="noopener noreferrer"&gt;OpenRouter: Access 300+ Free AI Models with One API Key&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://toolfreebie.com/free-ai-coding-assistants/" rel="noopener noreferrer"&gt;toolfreebie.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
    </item>
    <item>
      <title>Qdrant vs Pinecone vs Chroma: Free Vector Database</title>
      <dc:creator>toolfreebie</dc:creator>
      <pubDate>Thu, 28 May 2026 08:46:12 +0000</pubDate>
      <link>https://dev.to/build996/qdrant-vs-pinecone-vs-chroma-free-vector-database-234k</link>
      <guid>https://dev.to/build996/qdrant-vs-pinecone-vs-chroma-free-vector-database-234k</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ydkc495175wc0721mtb.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ydkc495175wc0721mtb.jpg" alt="Qdrant vs Pinecone vs Chroma: Free Vector Database" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Qdrant vs Pinecone vs Chroma: Free Vector Database for RAG
&lt;/h1&gt;

&lt;p&gt;If you are building a retrieval-augmented generation (RAG) pipeline in 2026, the vector database is the load-bearing piece nobody talks about until it breaks. Embeddings are commoditised — &lt;a href="https://toolfreebie.com/cohere-rag-api/" rel="noopener noreferrer"&gt;Cohere&lt;/a&gt;, OpenAI, Voyage, and a dozen open models will turn your text into vectors for free or near-free. The harder question is where those vectors live, how fast you can search them, and how much you have to pay before the bill becomes scary.&lt;/p&gt;

&lt;p&gt;Three names dominate the free end of that market: &lt;strong&gt;Qdrant&lt;/strong&gt;, &lt;strong&gt;Pinecone&lt;/strong&gt;, and &lt;strong&gt;Chroma&lt;/strong&gt;. All three give you a real way to start a RAG project at zero cost. None of them requires a credit card on day one. But they sit at fundamentally different points on the open-source-vs-managed and local-vs-cloud spectrums, and the right pick depends entirely on what you are building and how far you expect it to scale.&lt;/p&gt;

&lt;p&gt;This guide compares all three on the metrics that actually matter for a free RAG stack — what the free tier really lets you do, what happens when you outgrow it, performance numbers from third-party benchmarks, and the engineering trade-offs that hit you a month into the project. Every number cited links back to the provider’s own docs, GitHub repo, or a public benchmark; nothing here is fabricated.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 30-Second Answer
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Database&lt;/th&gt;
&lt;th&gt;Free path&lt;/th&gt;
&lt;th&gt;License&lt;/th&gt;
&lt;th&gt;Free ceiling&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Qdrant&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1 GB managed cloud cluster, free forever, no card&lt;/td&gt;
&lt;td&gt;Apache 2.0&lt;/td&gt;
&lt;td&gt;1 GB RAM + ~4 GB disk on managed; unlimited self-host&lt;/td&gt;
&lt;td&gt;Production RAG with hybrid search, payload filters, no vendor lock-in&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pinecone&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Starter plan: 2 GB storage, 5 indexes, no card&lt;/td&gt;
&lt;td&gt;Closed-source SaaS&lt;/td&gt;
&lt;td&gt;2 GB storage, 2M read units, 1M write units per month&lt;/td&gt;
&lt;td&gt;Zero-ops managed RAG, fastest first-vector-to-production&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Chroma&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;100% local — &lt;code&gt;pip install chromadb&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Apache 2.0&lt;/td&gt;
&lt;td&gt;Bounded by your laptop’s RAM and disk&lt;/td&gt;
&lt;td&gt;Local prototypes, notebooks, single-tenant desktop apps&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you want the smallest possible step from &lt;em&gt;idea&lt;/em&gt; to &lt;em&gt;working RAG&lt;/em&gt; with five lines of Python and no signup, &lt;strong&gt;Chroma&lt;/strong&gt; wins. If you want a managed service that just exists at a URL with no servers to babysit, &lt;strong&gt;Pinecone&lt;/strong&gt; is the easiest. If you want a real free tier that can carry a small production app, plus the option to self-host the exact same binary later when you outgrow it, &lt;strong&gt;Qdrant&lt;/strong&gt; is the only one of the three with both at the same time.&lt;/p&gt;

&lt;p&gt;The rest of this article unpacks why.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why You Need a Vector Database for RAG at All
&lt;/h2&gt;

&lt;p&gt;RAG, at its core, is one cheap trick: instead of stuffing your entire knowledge base into every LLM prompt, you embed your documents once, store the vectors, and at query time you embed the user’s question, look up the most similar document chunks by cosine similarity, and paste only those chunks into the prompt. The LLM never sees your full corpus — it only ever sees the few passages that matter for the current question.&lt;/p&gt;
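
&lt;p&gt;The retrieval step described above can be sketched in plain Python. This is a minimal illustration, not how any of the three databases works internally: the three-dimensional toy vectors stand in for real embeddings, and &lt;code&gt;cosine_similarity&lt;/code&gt; and &lt;code&gt;top_k&lt;/code&gt; are hypothetical helpers introduced only for this example.&lt;/p&gt;

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, chunks, k=2):
    """Brute-force search: score every chunk, return the k best (text, score) pairs."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings" (assumed values); a real model produces hundreds of dimensions.
chunks = [
    ("Refund policy", [0.9, 0.1, 0.0]),
    ("Shipping times", [0.1, 0.9, 0.1]),
    ("Office dog photos", [0.0, 0.1, 0.9]),
]
query_vector = [0.8, 0.2, 0.0]  # pretend embedding of "How do I get my money back?"

results = top_k(query_vector, chunks, k=2)
context = "\n".join(text for text, _ in results)  # what gets pasted into the prompt
```

&lt;p&gt;Only the two best-matching chunks end up in &lt;code&gt;context&lt;/code&gt;; the unrelated third chunk never reaches the prompt. Scoring every chunk on every query is exactly the linear scan that a vector index is meant to avoid.&lt;/p&gt;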

&lt;p&gt;This makes the vector-search step the bottleneck. Three properties decide whether your RAG app is good:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Recall:&lt;/strong&gt; does the retriever actually return the relevant chunk? (Approximate-nearest-neighbour algorithms are tunable — you can trade speed for recall.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency:&lt;/strong&gt; how long does a single query take? If your RAG round trip is 800 ms before the LLM even starts streaming, the UX is dead.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost:&lt;/strong&gt; how much do you pay per million vectors stored, per million queries served, and per million tokens re-embedded when you change models?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A flat-array brute-force search through Python lists works for ten thousand vectors. It falls over at a million. Qdrant and Chroma use HNSW (Hierarchical Navigable Small World) graphs to get sub-linear search complexity, and Pinecone’s proprietary serverless index pursues the same goal, so none of them scans every vector on every query. All three pair that index with an API that does not melt under load. The free tiers exist because every provider knows that the marginal cost of carrying a small project is rounding error, and the developer who built their hobby app on your stack is the developer who buys the production plan later.&lt;/p&gt;

&lt;h2&gt;
  
  
  What “Free” Actually Means in Vector Database Land
&lt;/h2&gt;

&lt;p&gt;There are three meaningfully different shapes of “free” on offer:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Self-host open source:&lt;/strong&gt; the code is Apache 2.0, you run it on your own hardware, you pay only for the box. Qdrant, Chroma, Weaviate, Milvus, and pgvector all live here. Free as in &lt;em&gt;you do the work&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Managed free tier:&lt;/strong&gt; a permanent free quota on the vendor’s own cloud, refilled monthly or capped at storage. Pinecone and Qdrant Cloud both offer this. Free as in &lt;em&gt;they do the work&lt;/em&gt;, within limits.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trial credits:&lt;/strong&gt; a one-time wallet of paid-rate credit ($50–$300). Weaviate Cloud, Zilliz, and some others use this model. Useful for evaluation, not for shipping.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This guide focuses on the first two, because they are the only paths that let a real project keep running for free past the first month.&lt;/p&gt;

&lt;h2&gt;
  
  
  Qdrant: Open-Source Rust + Generous Managed Free Tier
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://qdrant.tech" rel="noopener noreferrer"&gt;Qdrant&lt;/a&gt; is a Rust-written vector database under the &lt;a href="https://github.com/qdrant/qdrant/blob/master/LICENSE" rel="noopener noreferrer"&gt;Apache 2.0 license&lt;/a&gt;. It is the rare project that gives you a credible production-grade open-source binary and a generous managed cloud free tier from the same team — which means you can prototype on the free cloud, migrate the exact same data to a self-hosted instance later, and never touch a different query language.&lt;/p&gt;

&lt;h3&gt;
  
  
  Free cloud cluster (no card)
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://qdrant.tech/documentation/cloud/" rel="noopener noreferrer"&gt;Qdrant Cloud free tier&lt;/a&gt; gives you one 1 GB cluster, free forever, with no credit card required. That is not a trial. It does not auto-convert to paid. The cluster is region-pinned, has full TLS, and exposes both REST and gRPC. You get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1 GB RAM cluster (enough for roughly 1–3 million 384-dimensional vectors if you keep vectors on disk or quantize them; uncompressed float32 vectors held entirely in RAM top out well under a million)&lt;/li&gt;
&lt;li&gt;Full HNSW indexing with all distance metrics (cosine, dot, Euclidean, Manhattan)&lt;/li&gt;
&lt;li&gt;Payload filtering (Qdrant’s headline feature — filter by metadata &lt;em&gt;during&lt;/em&gt; the ANN search, not after)&lt;/li&gt;
&lt;li&gt;Hybrid search (dense + sparse vectors in the same query) since Qdrant 1.10&lt;/li&gt;
&lt;li&gt;Snapshots, backups, monitoring dashboard&lt;/li&gt;
&lt;/ul&gt;
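
&lt;p&gt;Why filtering &lt;em&gt;during&lt;/em&gt; the search matters can be shown with a toy model. The sketch below uses exact brute-force scoring rather than HNSW, so it only illustrates the ordering problem, not Qdrant’s actual implementation; the data and the function names &lt;code&gt;post_filter_search&lt;/code&gt; and &lt;code&gt;filtered_search&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
def score(query, vector):
    """Dot product as a stand-in similarity score."""
    return sum(q * v for q, v in zip(query, vector))


def post_filter_search(query, points, wanted_lang, k):
    """Take the global top-k first, then drop points that fail the filter."""
    ranked = sorted(points, key=lambda p: score(query, p["vector"]), reverse=True)
    return [p["id"] for p in ranked[:k] if p["lang"] == wanted_lang]


def filtered_search(query, points, wanted_lang, k):
    """Apply the filter while selecting candidates, then take the top-k."""
    allowed = [p for p in points if p["lang"] == wanted_lang]
    ranked = sorted(allowed, key=lambda p: score(query, p["vector"]), reverse=True)
    return [p["id"] for p in ranked[:k]]


# Assumed toy data: the best raw matches are in the "wrong" language.
points = [
    {"id": 1, "vector": [0.9, 0.1], "lang": "en"},
    {"id": 2, "vector": [0.8, 0.2], "lang": "en"},
    {"id": 3, "vector": [0.7, 0.3], "lang": "de"},
    {"id": 4, "vector": [0.2, 0.8], "lang": "de"},
]
query = [1.0, 0.0]

after = post_filter_search(query, points, wanted_lang="de", k=2)
during = filtered_search(query, points, wanted_lang="de", k=2)
```

&lt;p&gt;With this data the post-filter variant returns an empty list, because both of its top-2 candidates fail the filter, while the filtered variant returns points 3 and 4. The more selective the filter, the more often post-filtering comes back short.&lt;/p&gt;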

&lt;h3&gt;
  
  
  Self-hosting
&lt;/h3&gt;

&lt;p&gt;One Docker command and you have a running Qdrant on your laptop or a VPS:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker run -p 6333:6333 -p 6334:6334 \
    -v $(pwd)/qdrant_storage:/qdrant/storage \
    qdrant/qdrant
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is the complete install. There is no separate metadata store, no Zookeeper, no Kafka. The binary is ~20 MB, the disk format is portable, and Qdrant ships an official &lt;a href="https://qdrant.github.io/qdrant/redoc/index.html" rel="noopener noreferrer"&gt;REST + gRPC schema&lt;/a&gt; plus first-party clients for &lt;a href="https://github.com/qdrant/qdrant-client" rel="noopener noreferrer"&gt;Python&lt;/a&gt;, JavaScript/TypeScript, Go, Rust, Java, and .NET.&lt;/p&gt;

&lt;h3&gt;
  
  
  Python in 10 lines
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

# Placeholder vector; in practice this comes from your embedding model.
embedding = [0.1] * 1024

client = QdrantClient(url="https://YOUR-CLUSTER.qdrant.io", api_key="...")
client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=1024, distance=Distance.COSINE),
)
client.upsert(
    collection_name="docs",
    points=[PointStruct(id=1, vector=embedding, payload={"title": "Hello"})],
)
# query_points supersedes the deprecated search() in recent qdrant-client releases.
hits = client.query_points(collection_name="docs", query=embedding, limit=5).points
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What pushes you off the free tier
&lt;/h3&gt;

&lt;p&gt;Storage. One gigabyte is enough for a personal knowledge base, an internal company FAQ, or a side project’s documentation — but a SaaS that ingests user content will hit the ceiling fast. The next step is the &lt;em&gt;Free Trial&lt;/em&gt; credit (currently $25) on a larger cluster, then paid tiers that start around $0.014/hour for a 4 GB cluster. Or you migrate to self-host.&lt;/p&gt;
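
&lt;p&gt;A rough way to see how quickly that ceiling arrives is to do the float32 arithmetic yourself. The sketch below is a back-of-the-envelope estimate, not a Qdrant sizing tool: the helper name &lt;code&gt;vectors_that_fit&lt;/code&gt; is hypothetical, and the 1.5x overhead allowance for index and metadata is an assumed rule of thumb that you should check against the provider’s current capacity-planning guidance.&lt;/p&gt;

```python
BYTES_PER_FLOAT32 = 4


def vectors_that_fit(budget_bytes, dimension, overhead_num=1, overhead_den=1):
    """How many float32 vectors fit in a byte budget.

    overhead_num / overhead_den is a multiplier for index and metadata overhead,
    kept as an integer fraction so the arithmetic stays exact.
    """
    bytes_per_vector = dimension * BYTES_PER_FLOAT32
    return (budget_bytes * overhead_den) // (bytes_per_vector * overhead_num)


ONE_GIB = 1024 ** 3

raw_384 = vectors_that_fit(ONE_GIB, 384)           # vectors only, no index
padded_384 = vectors_that_fit(ONE_GIB, 384, 3, 2)  # with an assumed 1.5x overhead
raw_1024 = vectors_that_fit(ONE_GIB, 1024)         # e.g. 1024-dimensional embeddings
```

&lt;p&gt;Under these assumptions 1 GiB holds 699,050 raw 384-dimensional vectors, 466,033 once the overhead allowance is applied, and 262,144 raw vectors at 1,024 dimensions. Counts above that presumably depend on keeping vectors on disk or quantizing them, so the embedding dimension you pick has a direct effect on how long the free cluster lasts.&lt;/p&gt;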

&lt;h2&gt;
  
  
  Pinecone: The Managed-First Default
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.pinecone.io" rel="noopener noreferrer"&gt;Pinecone&lt;/a&gt; was the first venture-funded managed vector database and remains the easiest one to get a production-shaped URL out of. The product is closed-source — you cannot run a Pinecone binary on your own hardware — but the trade-off is that you cannot break anything either. There is no cluster to size, no HNSW parameters to tune, no replicas to provision.&lt;/p&gt;

&lt;h3&gt;
  
  
  Starter plan free tier
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://www.pinecone.io/pricing/" rel="noopener noreferrer"&gt;Pinecone Starter plan&lt;/a&gt; gives every account a permanent free allowance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;2 GB storage&lt;/li&gt;
&lt;li&gt;5 serverless indexes&lt;/li&gt;
&lt;li&gt;2 million read units per month&lt;/li&gt;
&lt;li&gt;1 million write units per month&lt;/li&gt;
&lt;li&gt;Up to 100 namespaces per index&lt;/li&gt;
&lt;li&gt;No credit card required&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The free tier is serverless, so there are no nodes to pay for when idle. You pay (or use free units) per read and per write: a read unit corresponds roughly to one small query against a small index, and a write unit roughly to one small vector upserted. Larger namespaces, bigger result sets, and heavier metadata push the per-operation cost up, so a real query usually costs several units. For a typical chatbot, 2 million read units is on the order of hundreds of thousands of user queries a month, which is more than enough for any prototype and many small production apps.&lt;/p&gt;
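
&lt;p&gt;To turn the read-unit allowance into a query budget, divide it by what one query costs on your index. The per-query costs below are assumptions chosen for illustration, not Pinecone figures, and the helper names are hypothetical; measure a real query against your own index before relying on the result.&lt;/p&gt;

```python
MONTHLY_READ_UNITS = 2_000_000  # Starter plan allowance quoted above


def queries_per_month(read_units_per_query, budget=MONTHLY_READ_UNITS):
    """Whole queries that fit in the monthly read-unit budget."""
    return budget // read_units_per_query


def queries_per_day(read_units_per_query, days=30):
    """Average daily query allowance over a month of the given length."""
    return queries_per_month(read_units_per_query) // days


# Assumed per-query costs; your own index will differ.
light = queries_per_month(5)    # small namespace, few results
heavy = queries_per_month(10)   # larger namespace or more results
daily_light = queries_per_day(5)
```

&lt;p&gt;At an assumed 5 read units per query the allowance covers 400,000 queries a month, or about 13,333 a day; at 10 units it halves to 200,000. That is where the “hundreds of thousands” estimate comes from.&lt;/p&gt;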

&lt;h3&gt;
  
  
  Python in 10 lines
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from pinecone import Pinecone, ServerlessSpec

# Placeholder vector; in practice this comes from your embedding model.
embedding = [0.1] * 1024

pc = Pinecone(api_key="...")
pc.create_index(
    name="docs",
    dimension=1024,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)
index = pc.Index("docs")
# Each record is an (id, values, metadata) tuple.
index.upsert(vectors=[("doc-1", embedding, {"title": "Hello"})])
hits = index.query(vector=embedding, top_k=5, include_metadata=True)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What pushes you off the free tier
&lt;/h3&gt;

&lt;p&gt;The first wall is usually &lt;em&gt;query volume&lt;/em&gt;, not storage. A B2C app with any meaningful traffic will burn through 2 million read units quickly, and once you exceed the monthly allowance the index is paused (Starter plan) or you pay overage (Standard plan starts at $50/month minimum). The second wall is features: namespaces above 100, hybrid search beyond serverless’s current support window, and on-prem deployment all push you to Enterprise.&lt;/p&gt;

&lt;h2&gt;
  
  
  Chroma: The Local-First Default
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.trychroma.com" rel="noopener noreferrer"&gt;Chroma&lt;/a&gt; is the lightest possible vector database. It is also &lt;a href="https://github.com/chroma-core/chroma/blob/main/LICENSE" rel="noopener noreferrer"&gt;Apache 2.0&lt;/a&gt;, but its philosophy is the opposite of Pinecone’s: it expects to live &lt;em&gt;inside&lt;/em&gt; your Python application as an embedded library, the way SQLite lives inside your application as a file. There is a server mode, but the default getting-started path is &lt;code&gt;pip install chromadb&lt;/code&gt; and you have a working vector database in the same process as your script.&lt;/p&gt;

&lt;h3&gt;
  
  
  Free path
&lt;/h3&gt;

&lt;p&gt;The local install &lt;em&gt;is&lt;/em&gt; the free tier. There is no signup, no cluster, no API key, just a directory on disk where Chroma persists its SQLite-backed storage (releases before 0.4 used DuckDB). Chroma Cloud is in &lt;a href="https://www.trychroma.com/cloud" rel="noopener noreferrer"&gt;paid private preview&lt;/a&gt; as of late 2025, so for free-tier purposes Chroma is a pure self-host story.&lt;/p&gt;

&lt;h3&gt;
  
  
  Python in 5 lines
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import chromadb

client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("docs")
collection.add(ids=["doc-1"], documents=["Hello world"], metadatas=[{"src": "readme"}])
hits = collection.query(query_texts=["What is hello?"], n_results=5)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note the API difference: Chroma can embed text for you using a default sentence-transformer (downloads on first use), so you can pass &lt;code&gt;query_texts&lt;/code&gt; instead of pre-computed vectors. That is brilliant for prototypes and a footgun in production — the bundled embedder is small, English-only, and not what you want for a real product. For anything serious, plug in OpenAI, &lt;a href="https://toolfreebie.com/cohere-rag-api/" rel="noopener noreferrer"&gt;Cohere Embed v3&lt;/a&gt;, or a custom embedding function.&lt;/p&gt;

&lt;h3&gt;
  
  
  What pushes you off Chroma
&lt;/h3&gt;

&lt;p&gt;Concurrency, scale, and operations. Chroma’s in-process mode is single-writer. Its server mode (&lt;code&gt;chroma run&lt;/code&gt;) exists and works, but the operational story — backups, replication, monitoring, multi-region — is far less mature than Qdrant’s. Chroma is the best default for “I want a working RAG demo in five minutes” and “I want a local notebook to find similar items in my CSV.” It becomes a liability the moment you have ten concurrent users hitting the same index from a deployed web app.&lt;/p&gt;

&lt;h2&gt;
  
  
  Head-to-Head: Free Tier Limits Compared
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Limit&lt;/th&gt;
&lt;th&gt;Qdrant Cloud Free&lt;/th&gt;
&lt;th&gt;Pinecone Starter&lt;/th&gt;
&lt;th&gt;Chroma (Local)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Storage&lt;/td&gt;
&lt;td&gt;1 GB RAM (~1–3M vectors at 384d with on-disk or quantized storage)&lt;/td&gt;
&lt;td&gt;2 GB&lt;/td&gt;
&lt;td&gt;Your disk&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Indexes / collections&lt;/td&gt;
&lt;td&gt;Multiple in 1 cluster&lt;/td&gt;
&lt;td&gt;5 indexes&lt;/td&gt;
&lt;td&gt;Unlimited (your file system)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reads per month&lt;/td&gt;
&lt;td&gt;No hard cap (RAM-bound)&lt;/td&gt;
&lt;td&gt;2 M read units&lt;/td&gt;
&lt;td&gt;Unlimited (CPU-bound)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Writes per month&lt;/td&gt;
&lt;td&gt;No hard cap&lt;/td&gt;
&lt;td&gt;1 M write units&lt;/td&gt;
&lt;td&gt;Unlimited&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hybrid (dense + sparse)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Partial (sparse-dense indexes, region-limited)&lt;/td&gt;
&lt;td&gt;No (dense only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Metadata filtering during ANN&lt;/td&gt;
&lt;td&gt;Yes (payload filter inside HNSW walk)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes (post-filter)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Persistence&lt;/td&gt;
&lt;td&gt;Cloud-managed&lt;/td&gt;
&lt;td&gt;Cloud-managed&lt;/td&gt;
&lt;td&gt;Local SQLite (DuckDB before 0.4)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backups&lt;/td&gt;
&lt;td&gt;Snapshots&lt;/td&gt;
&lt;td&gt;Collection backups&lt;/td&gt;
&lt;td&gt;Copy the directory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Self-host option&lt;/td&gt;
&lt;td&gt;Yes (Apache 2.0)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes (Apache 2.0)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Credit card to start&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No (no account needed)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Two things jump out. First, Chroma does not really compete on the same axis — it is a library, not a service. Second, between the two services, Qdrant’s free tier is the only one whose cap is &lt;em&gt;storage only&lt;/em&gt;, not query volume. Pinecone will pause your index if you blow the read-unit budget. Qdrant Cloud will simply slow down if you saturate the 1 GB cluster, but the queries keep flowing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance: What the Public Benchmarks Say
&lt;/h2&gt;

&lt;p&gt;The vector-database performance picture changes every quarter, and most vendor benchmarks are theatre. Two public, reproducible benchmarks are worth looking at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The &lt;a href="https://github.com/qdrant/vector-db-benchmark" rel="noopener noreferrer"&gt;Qdrant vector-db-benchmark repo&lt;/a&gt;&lt;/strong&gt;: open-source, reproducible, runs every major engine through the same ANN-Benchmarks datasets with default and tuned configurations. Yes, it is published by Qdrant, but the harness is open and you can re-run it. Qdrant generally posts the lowest latency and highest requests per second (RPS) in its published runs; Chroma is not in the comparison set because it is single-node.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The &lt;a href="http://ann-benchmarks.com" rel="noopener noreferrer"&gt;ann-benchmarks.com&lt;/a&gt; leaderboard&lt;/strong&gt; — the canonical academic benchmark for ANN libraries (not full databases), useful for comparing the underlying index algorithms (HNSW, IVF, ScaNN).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a small free-tier project, the takeaway is that all three engines should answer a top-5 query in under 50 ms with healthy recall at the dataset sizes you can actually fit in their free quotas. Latency-per-dollar starts to matter at higher scale; at the free tier, pick on developer experience and lock-in, not on a 5 ms difference in p99.&lt;/p&gt;

&lt;h2&gt;
  
  
  Embedding Compatibility
&lt;/h2&gt;

&lt;p&gt;None of these databases generates embeddings on its own (Chroma’s default model aside). You bring vectors in, and the database stores and searches them. That means your embedding choice is independent, and worth thinking about, because the bill on embeddings can dwarf the bill on the vector DB itself.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Embedder&lt;/th&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Free tier&lt;/th&gt;
&lt;th&gt;Plays well with&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://toolfreebie.com/cohere-rag-api/" rel="noopener noreferrer"&gt;Cohere Embed v3&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;1024 (or 384 light)&lt;/td&gt;
&lt;td&gt;Trial key, no card&lt;/td&gt;
&lt;td&gt;Multilingual RAG, +Rerank in one stack&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI text-embedding-3-small&lt;/td&gt;
&lt;td&gt;1536 (or shrinkable)&lt;/td&gt;
&lt;td&gt;Pay-as-you-go ($0.02/1M tokens)&lt;/td&gt;
&lt;td&gt;Ubiquitous defaults, every library supports it&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Voyage AI voyage-3-lite&lt;/td&gt;
&lt;td&gt;512&lt;/td&gt;
&lt;td&gt;$50 trial credit&lt;/td&gt;
&lt;td&gt;Lowest latency, strong on code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BGE / E5 (open source)&lt;/td&gt;
&lt;td&gt;varies&lt;/td&gt;
&lt;td&gt;Free (self-host)&lt;/td&gt;
&lt;td&gt;Air-gapped deployments, zero per-token cost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sentence-Transformers (open source)&lt;/td&gt;
&lt;td&gt;384 / 768&lt;/td&gt;
&lt;td&gt;Free (self-host)&lt;/td&gt;
&lt;td&gt;Local notebooks, Chroma’s default&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All three vector databases accept any of these; they are agnostic about where the vectors came from as long as the dimension matches what you declared at index creation.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Choose Which: Decision Tree
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;You want a notebook-based RAG demo today, with no signup.&lt;/strong&gt;
→ Chroma. &lt;code&gt;pip install chromadb&lt;/code&gt;, five lines, done. Move on.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You are building a real product and want managed infrastructure with zero ops.&lt;/strong&gt;
→ Pinecone. The starter plan covers prototypes, the upgrade path is clean, the docs are the best in the category. You pay the price of vendor lock-in.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want a real free tier you can leave running, with an exit door to self-host when traffic grows.&lt;/strong&gt;
→ Qdrant. The 1 GB cloud cluster carries a small production app, and when you outgrow it the migration to a self-hosted Docker container is one snapshot restore away.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You need hybrid search (BM25 + dense) without paying for a premium tier.&lt;/strong&gt;
→ Qdrant. It is the only one of the three that ships full sparse-dense hybrid in its free tier.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You need to filter by tens of metadata fields during retrieval.&lt;/strong&gt;
→ Qdrant. Payload filtering happens inside the HNSW walk, not as a post-filter, which preserves recall when the filter is selective.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You are deploying to a customer’s air-gapped environment.&lt;/strong&gt;
→ Qdrant or Chroma. Pinecone is not an option here.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Your team has zero appetite for running a database.&lt;/strong&gt;
→ Pinecone. The serverless model is the closest thing to “vector DB as an HTTP function” in the market.&lt;/li&gt;
&lt;/ul&gt;
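
&lt;p&gt;The hybrid-search item in the list above means combining a keyword ranking with a dense-vector ranking. One widely used way to merge two ranked lists is reciprocal rank fusion (RRF), sketched below on made-up document IDs. It illustrates the idea only and is not a description of how any of the three databases implements hybrid search internally; the constant 60 is the conventional default for RRF, and the function name is hypothetical.&lt;/p&gt;

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of IDs: each list adds 1 / (k + rank) to an ID's score."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Assumed results from two retrievers over the same corpus.
keyword_hits = ["doc-a", "doc-b", "doc-c"]  # e.g. BM25 order
dense_hits = ["doc-c", "doc-a", "doc-d"]    # e.g. cosine-similarity order

fused = reciprocal_rank_fusion([keyword_hits, dense_hits])
```

&lt;p&gt;Documents that both retrievers rank highly (&lt;code&gt;doc-a&lt;/code&gt;, then &lt;code&gt;doc-c&lt;/code&gt;) rise to the top, ahead of documents that only one retriever found. RRF needs only rank positions, so the keyword and dense scores never have to be put on a common scale.&lt;/p&gt;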

&lt;h2&gt;
  
  
  The Self-Host vs Managed Trade-Off
&lt;/h2&gt;

&lt;p&gt;This is the question that decides 80% of the choice between Qdrant/Chroma and Pinecone. Self-hosting is free in money and expensive in attention. A small VPS — &lt;a href="https://toolfreebie.com/oracle-free-arm-vps/" rel="noopener noreferrer"&gt;Oracle Cloud’s always-free ARM tier&lt;/a&gt; gives you four cores and 24 GB of RAM for $0 forever — can comfortably run Qdrant or Chroma serving a small RAG app, and the marginal cost of growth is just whatever extra RAM you buy.&lt;/p&gt;

&lt;p&gt;What self-hosting does &lt;em&gt;not&lt;/em&gt; give you for free is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automatic snapshot-and-restore on a schedule you trust&lt;/li&gt;
&lt;li&gt;Multi-region replication for HA&lt;/li&gt;
&lt;li&gt;An on-call rotation when the disk fills up at 3 a.m.&lt;/li&gt;
&lt;li&gt;A vendor support contract when something subtle breaks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a hobby app or an MVP, those things do not matter — the cost of an outage is your own time. For anything with revenue attached, the managed option starts to look cheap. Qdrant’s strength is that the same query interface works on both, so the migration story is straightforward when the project’s stakes change.&lt;/p&gt;

&lt;h2&gt;
  
  
  Integration with LangChain, LlamaIndex, and the LLM Layer
&lt;/h2&gt;

&lt;p&gt;All three databases have first-class connectors in the major orchestration libraries — there is no reason to pick on integration coverage:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LangChain:&lt;/strong&gt; &lt;code&gt;langchain-qdrant&lt;/code&gt;, &lt;code&gt;langchain-pinecone&lt;/code&gt;, &lt;code&gt;langchain-chroma&lt;/code&gt; are all official packages with active maintenance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LlamaIndex:&lt;/strong&gt; Same story — &lt;code&gt;QdrantVectorStore&lt;/code&gt;, &lt;code&gt;PineconeVectorStore&lt;/code&gt;, &lt;code&gt;ChromaVectorStore&lt;/code&gt; all live in the core repo or first-party plugins.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Haystack, Semantic Kernel:&lt;/strong&gt; All three databases are first-tier choices.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On the LLM side, the vector database is independent of the model you use to generate answers. Free-tier RAG stacks I see most often in 2026:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Embeddings:&lt;/strong&gt; Cohere Embed v3 (free trial key)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reranker:&lt;/strong&gt; Cohere Rerank v3 (same key)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector store:&lt;/strong&gt; Qdrant Cloud free or local Chroma&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM:&lt;/strong&gt; Groq Llama 3.3, Gemini 2.5 Flash, or &lt;a href="https://toolfreebie.com/together-ai-free-api-llama-deepseek-flux-2026/" rel="noopener noreferrer"&gt;Together AI’s free model tier&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That entire pipeline costs $0 up to the point where any single component’s free quota runs out, which for most personal projects is essentially never.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is pgvector a better choice than these three?
&lt;/h3&gt;

&lt;p&gt;If you already run PostgreSQL and your collection fits in a single Postgres box, &lt;a href="https://github.com/pgvector/pgvector" rel="noopener noreferrer"&gt;pgvector&lt;/a&gt; is a serious option — one fewer service to operate, transactional consistency with your other tables, mature backups. It loses to Qdrant on filtering performance at scale and on hybrid search, and it tops out earlier on throughput. For a RAG project where Postgres is already in the stack, start there. For a new project, the specialised databases are easier to reason about.&lt;/p&gt;

&lt;h3&gt;
  
  
  What about Weaviate, Milvus, Zilliz, Vespa?
&lt;/h3&gt;

&lt;p&gt;All worth knowing. Weaviate has the most ambitious built-in module system (it ships its own embedders, rerankers, multi-tenancy, generative search), but the managed free tier is a 14-day trial, not permanent. Milvus is the heavyweight open-source choice for hundred-million-vector deployments; overkill for a starting project. Zilliz is the managed Milvus, with a serverless free tier that competes with Pinecone. Vespa is the open-source search engine that originated at Yahoo; it also does vectors well and is the right pick if you need full text + vectors + structured filters at search-engine scale. For free-tier RAG, the three covered here are the most popular for a reason: they have the lowest activation energy.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use these databases without an LLM at all?
&lt;/h3&gt;

&lt;p&gt;Yes. Vector databases are useful any time you have items and want similarity search: recommendation systems, semantic search across product catalogues, duplicate detection, image similarity (with image embeddings), and code search. RAG is the headline use case but not the only one.&lt;/p&gt;

&lt;h3&gt;
  
  
  How big does my vector index have to be before I need a real database?
&lt;/h3&gt;

&lt;p&gt;Rule of thumb: under 100 K vectors, a brute-force cosine-similarity scan over a flat numpy array is usually fast enough that a database adds only operational overhead. From 100 K to a few million, an in-process library like Chroma or FAISS is fine. Somewhere between there and 10 M vectors, you want a real database with persistence, snapshots, and a binary protocol: Qdrant, Pinecone, or Weaviate. The crossover is fuzzy; the gradient is real.&lt;/p&gt;
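
&lt;p&gt;To make the smallest tier of that rule of thumb concrete, here is a minimal brute-force search sketch. It uses only the Python standard library so it runs anywhere; with numpy the loop would collapse into a single matrix product. The function names and the toy three-dimensional vectors are illustrative assumptions, not part of any library.&lt;/p&gt;

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=3):
    """Score every stored vector against the query; return the k best (index, score) pairs."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real ones have hundreds of dimensions.
docs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
]
results = top_k([1.0, 0.0, 0.0], docs, k=2)
print(results)
```

&lt;p&gt;Every query scans every vector, so cost grows linearly with collection size. That is the trade-off behind moving to an indexed library or a database as the collection grows.&lt;/p&gt;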

&lt;h3&gt;
  
  
  Do I need to re-embed everything when I change my embedding model?
&lt;/h3&gt;

&lt;p&gt;Yes. Embeddings from different models are not interoperable — a query vector from OpenAI cannot be searched against documents embedded with Cohere. This is the single biggest hidden cost of RAG. When you change embedding models, you re-embed your entire corpus, which is also a re-write of every vector in the database. Plan migrations.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is a “write unit” or “read unit” in Pinecone’s pricing?
&lt;/h3&gt;

&lt;p&gt;Pinecone’s serverless billing splits operations into read units and write units, where one read unit roughly equals one similarity query that returns up to 10 results from a small index, and one write unit roughly equals one vector upserted. The actual conversion depends on index size and result count — the &lt;a href="https://docs.pinecone.io/guides/organizations/manage-billing/understanding-cost" rel="noopener noreferrer"&gt;Pinecone docs&lt;/a&gt; have the exact formula. For most chatbot workloads, 2 M read units a month covers far more queries than you would expect.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Reads
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/cohere-rag-api/" rel="noopener noreferrer"&gt;Cohere Free API: The Best Free Embedding and Rerank API for RAG in 2026&lt;/a&gt; — the embedding/reranker layer that pairs naturally with any of the three databases above.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/together-ai-free-api-llama-deepseek-flux-2026/" rel="noopener noreferrer"&gt;Together AI Free API: Run Llama 3.3, DeepSeek R1, and FLUX&lt;/a&gt; — the LLM side of a complete free RAG stack.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/best-free-ai-apis-2026/" rel="noopener noreferrer"&gt;10 Best Free AI APIs in 2026&lt;/a&gt; — broader survey of the free AI API landscape.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/oracle-free-arm-vps/" rel="noopener noreferrer"&gt;Oracle Cloud Always Free: 4-Core 24GB ARM VPS&lt;/a&gt; — where to host a self-managed Qdrant or Chroma for free.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/vercel-netlify-cloudflare/" rel="noopener noreferrer"&gt;Vercel vs Netlify vs Cloudflare Pages&lt;/a&gt; — frontend hosting to deploy the RAG app on top.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://toolfreebie.com/free-vector-database-rag/" rel="noopener noreferrer"&gt;toolfreebie.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>hosting</category>
      <category>devops</category>
    </item>
    <item>
      <title>Free Whisper API: Groq, Deepgram, AssemblyAI Compared</title>
      <dc:creator>toolfreebie</dc:creator>
      <pubDate>Thu, 28 May 2026 08:40:43 +0000</pubDate>
      <link>https://dev.to/build996/free-whisper-api-groq-deepgram-assemblyai-compared-562m</link>
      <guid>https://dev.to/build996/free-whisper-api-groq-deepgram-assemblyai-compared-562m</guid>
      <description>&lt;h2&gt;
  
  
  Free Whisper API: Groq, Deepgram, AssemblyAI Compared
&lt;/h2&gt;

&lt;p&gt;OpenAI’s &lt;a href="https://github.com/openai/whisper" rel="noopener noreferrer"&gt;Whisper&lt;/a&gt; changed speech-to-text the same way Llama changed open chat models: a frontier-grade ASR model the entire industry could host, fine-tune, and run on commodity hardware. More than three years after its 2022 release, the question for most developers is no longer &lt;em&gt;which model&lt;/em&gt; to use; it is &lt;em&gt;which hosted API gives me Whisper-quality transcription without a bill&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Three providers dominate the answer in 2026: &lt;strong&gt;Groq&lt;/strong&gt;, &lt;strong&gt;Deepgram&lt;/strong&gt;, and &lt;strong&gt;AssemblyAI&lt;/strong&gt;. All three give you Whisper (or a Whisper-class model) behind a hosted API with a free path to first transcription. None of them require you to spin up a GPU instance, manage CUDA drivers, or fight a Python audio dependency tree. But the meaning of “free” varies wildly between them, and the right pick depends entirely on what you are building.&lt;/p&gt;

&lt;p&gt;This guide compares the three on the metrics that actually matter — real free-tier ceilings, per-hour cost once you pass them, supported languages, latency, file-size limits, and the engineering trade-offs you will hit when traffic grows. Every number cited links back to the provider’s own pricing or docs page; nothing here is fabricated benchmark theatre.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 30-Second Answer
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Free path&lt;/th&gt;
&lt;th&gt;Whisper model&lt;/th&gt;
&lt;th&gt;Paid rate (cheapest)&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Groq&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;True free tier, no card&lt;/td&gt;
&lt;td&gt;whisper-large-v3 + turbo&lt;/td&gt;
&lt;td&gt;$0.04/hr (turbo)&lt;/td&gt;
&lt;td&gt;Fast batch transcription, hackathons, side projects&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Deepgram&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$200 signup credit&lt;/td&gt;
&lt;td&gt;Whisper Cloud (whisper-large)&lt;/td&gt;
&lt;td&gt;~$0.29/hr Whisper · $0.258/hr Nova-3&lt;/td&gt;
&lt;td&gt;Production transcription with diarization and SLAs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AssemblyAI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$50 signup credit&lt;/td&gt;
&lt;td&gt;Whisper-Streaming&lt;/td&gt;
&lt;td&gt;$0.30/hr Whisper · $0.15/hr Universal&lt;/td&gt;
&lt;td&gt;Production pipelines that need Whisper + summary/sentiment in one call&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you want a no-strings, no-card free tier you can ship a real side project on, &lt;strong&gt;Groq&lt;/strong&gt; is the only one that fits. If you want a high-quality production transcription stack with $200 of runway to evaluate it on, &lt;strong&gt;Deepgram&lt;/strong&gt; wins. If you want Whisper plus a stack of additional NLP features (chapter detection, sentiment, entity extraction, summarization) in the same request, &lt;strong&gt;AssemblyAI&lt;/strong&gt; is the cleanest single-API choice.&lt;/p&gt;

&lt;p&gt;The rest of this article unpacks why.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why “Free Whisper API” Is Worth Searching For
&lt;/h2&gt;

&lt;p&gt;The official OpenAI Whisper API costs &lt;strong&gt;$0.006 per minute of audio&lt;/strong&gt;, which works out to $0.36 per hour. That sounds cheap until you do the math on a real workload:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A podcast transcription tool processing 1,000 hours/month = $360/month on OpenAI&lt;/li&gt;
&lt;li&gt;A meeting-bot SaaS averaging 50 hours/customer/month at 200 customers = $3,600/month&lt;/li&gt;
&lt;li&gt;A user-generated content platform with 10,000 hours of audio/month = $3,600/month&lt;/li&gt;
&lt;/ul&gt;
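
&lt;p&gt;The arithmetic behind those three figures, as a small sketch. The rate is the $0.006 per minute quoted above; the function name and workload labels are made up for illustration.&lt;/p&gt;

```python
OPENAI_WHISPER_USD_PER_MINUTE = 0.006  # rate quoted above


def monthly_cost_usd(hours_per_month, usd_per_minute=OPENAI_WHISPER_USD_PER_MINUTE):
    """Monthly bill for a given number of audio hours at a per-minute rate."""
    return round(hours_per_month * 60 * usd_per_minute, 2)


workloads = {
    "podcast tool (1,000 h)": 1_000,
    "meeting bot (50 h x 200 customers)": 50 * 200,
    "UGC platform (10,000 h)": 10_000,
}
for name, hours in workloads.items():
    print(f"{name}: ${monthly_cost_usd(hours):,.2f}/month")
```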

&lt;p&gt;Self-hosting Whisper on your own GPU is cheaper at scale, but only if you actually have the GPU, the DevOps capacity to keep it running, and a workload large enough that the instance never sits idle. For the 90% of projects that don’t, the question becomes: &lt;em&gt;which hosted API offers the cheapest entry path&lt;/em&gt;? That is exactly what the providers below compete on.&lt;/p&gt;

&lt;h3&gt;
  
  
  What “free” actually means in this market
&lt;/h3&gt;

&lt;p&gt;There are two distinct shapes of “free Whisper API” on offer in 2026:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Genuine free tier:&lt;/strong&gt; A permanently free quota every account gets, refilled daily or monthly, no credit card required. &lt;em&gt;Groq&lt;/em&gt; is the only major provider doing this for speech-to-text.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Free credits at signup:&lt;/strong&gt; A one-time wallet of credits ($50–$200) you spend down at paid rates. Once gone, you pay or stop. &lt;em&gt;Deepgram&lt;/em&gt; and &lt;em&gt;AssemblyAI&lt;/em&gt; use this model.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Both are useful — they just suit different stages of a project. A free-tier API is ideal for a personal tool, a demo, or a workload with predictable low volume. Free credits are better for prototypes that need higher concurrency or premium features (diarization, summarization) up front, with a clean ramp into paid usage when the product is real.&lt;/p&gt;

&lt;h2&gt;
  
  
  Groq Whisper API: The Only True Free Tier
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://console.groq.com/docs/speech-to-text" rel="noopener noreferrer"&gt;Groq&lt;/a&gt; built its reputation around Language Processing Units (LPUs) that serve Llama and DeepSeek faster than any GPU cloud. In 2025 they extended that infrastructure to OpenAI’s Whisper models — and unlike every other Whisper host, they gave it a real, no-card free tier that anyone with an email address can use.&lt;/p&gt;

&lt;h3&gt;
  
  
  Models on offer
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model ID&lt;/th&gt;
&lt;th&gt;Paid price&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;whisper-large-v3&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;$0.111/hour&lt;/td&gt;
&lt;td&gt;OpenAI’s flagship Whisper checkpoint, highest accuracy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;whisper-large-v3-turbo&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;$0.04/hour&lt;/td&gt;
&lt;td&gt;Distilled, ~8× faster, small accuracy drop on long audio&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Both models are multilingual (99+ languages for transcription), and both support a separate translation endpoint that returns English text from any source language. The minimum billed length is 10 seconds, so even a 2-second clip is billed as 10.&lt;/p&gt;
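
&lt;p&gt;As a rough illustration of why the 10-second minimum matters for short clips, the sketch below bills each request at its real length or the minimum, whichever is larger. The helper names are hypothetical, and how Groq rounds fractional seconds is not covered here.&lt;/p&gt;

```python
MIN_BILLED_SECONDS = 10  # minimum billed length stated above


def billed_seconds(duration_seconds):
    """Seconds charged for one request: the clip length, but never less than the minimum."""
    return max(duration_seconds, MIN_BILLED_SECONDS)


def request_cost_usd(duration_seconds, usd_per_hour):
    """Cost of a single request at a per-hour rate."""
    return billed_seconds(duration_seconds) / 3600 * usd_per_hour


# 1,000 two-second clips on the turbo model at $0.04/hour
clips = [2] * 1000
billed_total = sum(billed_seconds(d) for d in clips)
total = sum(request_cost_usd(d, 0.04) for d in clips)
print(f"billed audio: {billed_total} s, cost: ${total:.4f}")
```

&lt;p&gt;Those clips contain 2,000 seconds of real audio but are billed as 10,000, so batching very short utterances into longer files is worth considering.&lt;/p&gt;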

&lt;h3&gt;
  
  
  Free-tier ceiling (the real one)
&lt;/h3&gt;

&lt;p&gt;Groq’s published rate limits for the free tier on either Whisper model are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;20 requests per minute&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;2,000 requests per day&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;7,200 audio seconds per hour&lt;/strong&gt; (2 hours of audio every hour)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;28,800 audio seconds per day&lt;/strong&gt; (8 hours of audio every day)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;25 MB max file size&lt;/strong&gt; on free tier, 100 MB on the paid Dev tier&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That ceiling is unusually generous for a “free” tier. Eight hours of transcribed audio per day, every day, with no card and no expiry, is enough to run a real podcast transcription side project or a daily meeting-notes tool for one person indefinitely. If you cross the 25 MB file limit, chunk the audio with &lt;code&gt;ffmpeg&lt;/code&gt; before sending; Groq’s docs include a recommended chunking snippet.&lt;/p&gt;
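
&lt;p&gt;One way to plan that chunking, as a sketch rather than Groq’s recommended snippet: estimate how many seconds of audio fit under the limit at a given constant bitrate, then build an &lt;code&gt;ffmpeg&lt;/code&gt; segment command. The 128 kbps bitrate, the 10% safety margin, the reading of 25 MB as 25 MiB, and the output file pattern are all assumptions; the command is only printed, not executed.&lt;/p&gt;

```python
import shlex

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # free-tier file limit, read here as 25 MiB (an assumption)


def max_segment_seconds(bitrate_kbps, safety=0.9):
    """Longest segment (whole seconds) that stays under the limit at a constant bitrate."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return int(MAX_UPLOAD_BYTES * safety / bytes_per_second)


def ffmpeg_split_command(src, segment_seconds, pattern="chunk_%03d.mp3"):
    """Build (but do not run) an ffmpeg command that splits src without re-encoding."""
    return [
        "ffmpeg", "-i", src,
        "-f", "segment",
        "-segment_time", str(segment_seconds),
        "-c", "copy",
        pattern,
    ]


seconds = max_segment_seconds(bitrate_kbps=128)
cmd = ffmpeg_split_command("meeting.mp3", seconds)
print(shlex.join(cmd))
```

&lt;p&gt;Stream copy (&lt;code&gt;-c copy&lt;/code&gt;) avoids re-encoding, but fixed-length cuts can land mid-word, so check the seams when you merge the transcripts.&lt;/p&gt;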

&lt;h3&gt;
  
  
  Code: transcribe a file with Groq
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://api.groq.com/openai/v1/audio/transcriptions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$GROQ_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-F&lt;/span&gt; &lt;span class="s2"&gt;"file=@meeting.mp3"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-F&lt;/span&gt; &lt;span class="s2"&gt;"model=whisper-large-v3-turbo"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-F&lt;/span&gt; &lt;span class="s2"&gt;"response_format=verbose_json"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Python with the OpenAI SDK (Groq is OpenAI-compatible on this endpoint):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.groq.com/openai/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GROQ_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meeting.mp3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transcriptions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;whisper-large-v3-turbo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;response_format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;verbose_json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;timestamp_granularities&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;segment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;seg&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;segments&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;seg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; – &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;seg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;seg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;verbose_json&lt;/code&gt; response includes word- or segment-level timestamps you can use for captions, search indexing, or feeding into LLM summarization. If you only need the transcript string, &lt;code&gt;response_format=text&lt;/code&gt; drops the JSON envelope.&lt;/p&gt;
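
&lt;p&gt;As an example of the captions use case, the sketch below turns segment timestamps into SRT text. It assumes segments arrive as plain dicts with &lt;code&gt;start&lt;/code&gt;, &lt;code&gt;end&lt;/code&gt;, and &lt;code&gt;text&lt;/code&gt; keys (for instance from the raw JSON response); the SDK returns objects with the same fields as attributes, so adapt the access accordingly. The function names are illustrative.&lt;/p&gt;

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Turn a list of {"start", "end", "text"} dicts into SRT caption text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)


# Hypothetical segments in the shape assumed above
segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello everyone."},
    {"start": 2.5, "end": 65.25, "text": " Let's get started."},
]
srt = segments_to_srt(segments)
print(srt)
```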

&lt;h3&gt;
  
  
  Where Groq is a poor fit
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No built-in speaker diarization.&lt;/strong&gt; Whisper itself doesn’t predict speaker turns; Deepgram and AssemblyAI run a separate diarization model alongside transcription. If you need “Speaker 1 / Speaker 2” output, plug pyannote.audio or a hosted diarizer in front of Groq, or pick a different provider.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No long-running async jobs.&lt;/strong&gt; Every request is synchronous. For files over ~60 minutes, chunk and merge yourself.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No production SLA on the free tier.&lt;/strong&gt; Limits change occasionally; production workloads should sit on the paid Dev tier.&lt;/li&gt;
&lt;/ul&gt;
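
&lt;p&gt;The “chunk and merge yourself” step can be sketched as follows, assuming each chunk was transcribed separately with segment timestamps and that you recorded where each chunk started in the original file. The dict layout and the function name are assumptions for illustration.&lt;/p&gt;

```python
def merge_chunks(chunks):
    """Merge per-chunk results into one transcript with absolute timestamps.

    Each chunk is a dict with "offset" (where the chunk starts in the original
    file, in seconds) and "segments" (dicts with chunk-relative "start"/"end").
    """
    merged = []
    for chunk in sorted(chunks, key=lambda c: c["offset"]):
        for seg in chunk["segments"]:
            merged.append({
                "start": chunk["offset"] + seg["start"],
                "end": chunk["offset"] + seg["end"],
                "text": seg["text"].strip(),
            })
    full_text = " ".join(seg["text"] for seg in merged)
    return full_text, merged


# Chunks may finish out of order; sorting by offset restores the timeline.
chunks = [
    {"offset": 600.0, "segments": [{"start": 0.0, "end": 4.0, "text": " Next item."}]},
    {"offset": 0.0, "segments": [{"start": 1.0, "end": 3.0, "text": " Welcome back."}]},
]
text, merged_segments = merge_chunks(chunks)
print(text)
```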

&lt;h2&gt;
  
  
  Deepgram Whisper Cloud: The $200 Production Path
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://deepgram.com/pricing" rel="noopener noreferrer"&gt;Deepgram&lt;/a&gt; has been one of the dominant production speech-to-text vendors since well before Whisper existed. They run their own ASR model family (Nova-3, the current flagship; Nova-2; and the real-time Flux model) and also host Whisper as a managed product called &lt;strong&gt;Whisper Cloud&lt;/strong&gt;. Whisper Cloud sits alongside their proprietary models behind one API key, so you can A/B both on the same audio and pick whichever wins for your data.&lt;/p&gt;

&lt;h3&gt;
  
  
  The free path: $200 of credit
&lt;/h3&gt;

&lt;p&gt;Deepgram gives every new account $200 of API credit at signup, no card required. Their pricing page describes it as “free $200 credit, then pay as you go.” There is no fixed expiry on the credit, which is unusual — most competitors expire credits at 30–90 days.&lt;/p&gt;

&lt;p&gt;At Whisper Cloud’s published rate (~$0.0048/minute, or roughly $0.288/hour at the time of writing), $200 of credit buys about &lt;strong&gt;700 hours of Whisper transcription&lt;/strong&gt; ($200 ÷ $0.288 ≈ 694) to evaluate the product before you commit, with concurrency capped at 5 streams while you are on the free credit. If you decide Deepgram’s own Nova-3 model is good enough, and for English audio it usually is, $200 stretches further because Nova-3 is cheaper per minute and faster.&lt;/p&gt;
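
&lt;p&gt;The runway arithmetic, sketched under the rate quoted above. The 40 hours per month volume is a hypothetical workload, and the function names are illustrative.&lt;/p&gt;

```python
def runway_hours(credit_usd, usd_per_minute):
    """Hours of audio a one-time credit buys at a per-minute rate."""
    return credit_usd / (usd_per_minute * 60)


def runway_months(credit_usd, usd_per_minute, hours_per_month):
    """Months the credit lasts at a steady monthly volume."""
    return runway_hours(credit_usd, usd_per_minute) / hours_per_month


hours = runway_hours(200, 0.0048)
months = runway_months(200, 0.0048, 40)
print(f"{hours:.0f} hours of Whisper Cloud")
print(f"{months:.1f} months at 40 h/month")
```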

&lt;h3&gt;
  
  
  Whisper Cloud vs Nova-3: the trade-off Deepgram wants you to make
&lt;/h3&gt;

&lt;p&gt;Whisper Cloud is positioned as a compatibility option for teams who already pipe through Whisper and want a hosted replacement for self-hosted inference. Deepgram’s real recommendation for new builds is Nova-3, because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Nova-3 is cheaper per minute&lt;/li&gt;
&lt;li&gt;Nova-3 has built-in &lt;strong&gt;speaker diarization&lt;/strong&gt;, &lt;strong&gt;smart formatting&lt;/strong&gt;, &lt;strong&gt;language detection&lt;/strong&gt;, and &lt;strong&gt;profanity filtering&lt;/strong&gt; in the same request&lt;/li&gt;
&lt;li&gt;Nova-3 supports real-time streaming as a first-class feature; Whisper is fundamentally batch&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For most production English transcription pipelines in 2026, Nova-3 is the better answer — and if you arrived here searching “free Whisper API,” it’s worth pricing both before you commit. Whisper Cloud remains the right pick if you specifically need Whisper’s multilingual behavior or you’re benchmarking a model swap.&lt;/p&gt;

&lt;h3&gt;
  
  
  Code: transcribe with Deepgram (Whisper or Nova)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Token &lt;/span&gt;&lt;span class="nv"&gt;$DEEPGRAM_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: audio/wav"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--data-binary&lt;/span&gt; @meeting.wav &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"https://api.deepgram.com/v1/listen?model=whisper-large&amp;amp;punctuate=true"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Swap the model to Nova-3 by changing &lt;code&gt;model=whisper-large&lt;/code&gt; to &lt;code&gt;model=nova-3&lt;/code&gt;. The Python SDK is a thin wrapper:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;deepgram&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DeepgramClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PrerecordedOptions&lt;/span&gt;

&lt;span class="n"&gt;dg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DeepgramClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DEEPGRAM_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meeting.wav&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;buffer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;

&lt;span class="n"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PrerecordedOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;whisper-large&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# or "nova-3"
&lt;/span&gt;    &lt;span class="n"&gt;punctuate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;diarize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;            &lt;span class="c1"&gt;# Nova-3 only; ignored on whisper-large
&lt;/span&gt;    &lt;span class="n"&gt;smart_format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;listen&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;v&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;transcribe_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;channels&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;alternatives&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;transcript&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
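
&lt;p&gt;To A/B both models on the same audio, one low-effort approach is to generate the request URL per model and send the same file to each. The sketch below only builds URLs from the endpoint and query parameters used in the curl example; the helper name is hypothetical, and no request is sent.&lt;/p&gt;

```python
from urllib.parse import urlencode

LISTEN_URL = "https://api.deepgram.com/v1/listen"  # endpoint from the curl example


def listen_url(model, **options):
    """Build the request URL for one model, lowercasing booleans as in the curl example."""
    params = {"model": model}
    for key, value in options.items():
        params[key] = str(value).lower() if isinstance(value, bool) else value
    return f"{LISTEN_URL}?{urlencode(params)}"


# Same audio, two models: send each URL the same request body and compare transcripts.
for model in ("whisper-large", "nova-3"):
    print(listen_url(model, punctuate=True))
```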



&lt;h3&gt;
  
  
  Where Deepgram is a poor fit
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Once the $200 runs out, you’re paying.&lt;/strong&gt; No free tier waits behind it. Budget the runway accordingly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Higher concurrency requires paid plans.&lt;/strong&gt; The five-stream cap on the trial is enough to evaluate, not to ship a real concurrent batch pipeline.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Whisper Cloud is not Deepgram’s strategic priority.&lt;/strong&gt; Expect Nova to get the new features first; Whisper Cloud is a compatibility-and-evaluation product.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  AssemblyAI: Whisper Plus the Full NLP Stack
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.assemblyai.com/pricing" rel="noopener noreferrer"&gt;AssemblyAI&lt;/a&gt; takes a different approach. Instead of competing on “we host Whisper cheaply,” they sell a layered speech intelligence platform where transcription is the foundation and the value is everything stacked on top — chapter detection, sentiment analysis, named-entity extraction, content moderation, summarization, topic classification. All available in the same request that produces the transcript.&lt;/p&gt;

&lt;h3&gt;
  
  
  The free path: $50 of credit
&lt;/h3&gt;

&lt;p&gt;AssemblyAI gives new accounts $50 of credit on signup, no credit card required. The two relevant models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Universal-3 Pro (Async)&lt;/strong&gt; — their current flagship pre-recorded model, $0.15/hr at the time of writing. Recommended for new builds.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Whisper-Streaming&lt;/strong&gt; — the open-source Whisper model hosted on AssemblyAI’s infrastructure, $0.30/hr, supports 99+ languages.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;$50 of credit covers roughly &lt;strong&gt;166 hours of Whisper-Streaming&lt;/strong&gt; or &lt;strong&gt;333 hours of Universal-3 Pro&lt;/strong&gt; — plenty to prototype, demo, or transcribe a backlog of meeting recordings before you have to pay.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why pick AssemblyAI’s Whisper over Groq’s
&lt;/h3&gt;

&lt;p&gt;The answer is almost always: &lt;em&gt;because you also want the layered features&lt;/em&gt;. If you only need transcript text, Groq’s free tier is strictly better — same model family, no card, no credit clock. The reason to buy AssemblyAI is that adding &lt;code&gt;sentiment_analysis: true&lt;/code&gt; or &lt;code&gt;auto_chapters: true&lt;/code&gt; to a single API call returns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Per-sentence sentiment (positive / negative / neutral with confidence)&lt;/li&gt;
&lt;li&gt;Auto-generated chapter boundaries with headlines for long-form audio&lt;/li&gt;
&lt;li&gt;Named entities (PERSON, ORG, LOCATION, etc.) with timestamps&lt;/li&gt;
&lt;li&gt;Topic categories from the IAB taxonomy&lt;/li&gt;
&lt;li&gt;PII redaction in the transcript&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Reproducing that stack on top of Groq means a second LLM call, your own entity-extraction prompt, and your own chaptering logic. For one project that’s fine. For a SaaS product, the integration cost of doing it yourself rapidly exceeds the price difference per hour.&lt;/p&gt;

&lt;h3&gt;
  
  
  Code: transcribe with AssemblyAI
&lt;/h3&gt;

&lt;p&gt;AssemblyAI’s API is two-step (upload + transcribe) rather than a single multipart POST:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="n"&gt;API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ASSEMBLYAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;API_KEY&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Upload audio
&lt;/span&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meeting.mp3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;upload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.assemblyai.com/v2/upload&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;audio_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;upload&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;upload_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Request transcription (with optional features)
&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.assemblyai.com/v2/transcript&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;audio_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;audio_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;speech_model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;universal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# or "whisper-streaming"
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;speaker_labels&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto_chapters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentiment_analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Poll until done
&lt;/span&gt;&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.assemblyai.com/v2/transcript/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;
    &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

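&lt;span class="c1"&gt;# A failed job has no transcript text, so raise the error message from the API instead
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RuntimeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;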
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chapter&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chapters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;chapter&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;start&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;s] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;chapter&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;headline&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Where AssemblyAI is a poor fit
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Free credit runs out fast on heavy workloads.&lt;/strong&gt; $50 is a quarter of Deepgram’s $200 signup credit.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Two-step upload adds latency.&lt;/strong&gt; Bigger files take longer to upload than to transcribe in some cases.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Universal-3 Pro is not Whisper.&lt;/strong&gt; If your codebase or contracts specifically mandate Whisper output, choose Whisper-Streaming explicitly, accept the higher per-hour rate, and don’t drift toward Universal “because it’s cheaper.”&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Honorable Mentions: Other Ways to Get Free Whisper
&lt;/h2&gt;

&lt;p&gt;The three above are the practical answers for hosted Whisper in 2026, but a few alternatives are worth knowing about.&lt;/p&gt;

&lt;h3&gt;
  
  
  Self-host with &lt;code&gt;faster-whisper&lt;/code&gt; or &lt;code&gt;whisper.cpp&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;If you already have a GPU box (or even a recent MacBook), &lt;a href="https://github.com/SYSTRAN/faster-whisper" rel="noopener noreferrer"&gt;faster-whisper&lt;/a&gt; (CTranslate2) and &lt;a href="https://github.com/ggerganov/whisper.cpp" rel="noopener noreferrer"&gt;whisper.cpp&lt;/a&gt; deliver real-time-or-better transcription on hardware you already own. Truly free at the marginal level. The catch: you own the operational complexity (driver updates, OOM crashes, queueing). For a personal tool this is fine; for a SaaS, the time it costs you is rarely worth the API savings until volume passes ~500 hours/month.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hugging Face Inference API
&lt;/h3&gt;

&lt;p&gt;Hugging Face’s free Inference API can call OpenAI Whisper checkpoints, but rate limits are aggressive and request latency on the free tier is unpredictable. Useful for one-off testing in a notebook; not a production option.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cloudflare Workers AI Whisper
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://toolfreebie.com/cloudflare-workers-ai-free-edge-ai-inference-with-47-models/" rel="noopener noreferrer"&gt;Cloudflare Workers AI&lt;/a&gt; includes Whisper among its 47+ free models, billed in “neurons” rather than minutes. If you already run your stack on Cloudflare Workers, it integrates very cleanly and the free daily neuron quota is generous. Less compelling as a standalone choice if you’re not on Cloudflare.&lt;/p&gt;

&lt;h3&gt;
  
  
  The official OpenAI Whisper API
&lt;/h3&gt;

&lt;p&gt;$0.006/minute, billed against your OpenAI usage. Not free, but worth listing as the reference price every other provider competes against. If you already have OpenAI usage running and don’t want a third API key in your codebase, it’s the path of least integration friction.&lt;/p&gt;

&lt;h2&gt;
  
  
  Side-by-Side Spec Sheet
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Groq&lt;/th&gt;
&lt;th&gt;Deepgram&lt;/th&gt;
&lt;th&gt;AssemblyAI&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Free tier shape&lt;/td&gt;
&lt;td&gt;Permanent free tier, no card&lt;/td&gt;
&lt;td&gt;$200 signup credit&lt;/td&gt;
&lt;td&gt;$50 signup credit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Whisper model&lt;/td&gt;
&lt;td&gt;large-v3, large-v3-turbo&lt;/td&gt;
&lt;td&gt;whisper-large (Whisper Cloud)&lt;/td&gt;
&lt;td&gt;Whisper-Streaming&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Native non-Whisper model&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Nova-3, Nova-2, Flux&lt;/td&gt;
&lt;td&gt;Universal-3 Pro, Universal-2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cheapest paid rate&lt;/td&gt;
&lt;td&gt;$0.04/hr (turbo)&lt;/td&gt;
&lt;td&gt;~$0.258/hr (Nova-3)&lt;/td&gt;
&lt;td&gt;$0.15/hr (Universal-2)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Speaker diarization&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes (Nova-3)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Real-time streaming&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes (Flux, Nova)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Summarization / chapters&lt;/td&gt;
&lt;td&gt;No (DIY via LLM)&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Yes (auto-chapters)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sentiment / entities&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max file size (single request)&lt;/td&gt;
&lt;td&gt;25 MB free / 100 MB dev&lt;/td&gt;
&lt;td&gt;2 GB&lt;/td&gt;
&lt;td&gt;2.2 GB (URL) / 5 GB (upload)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API style&lt;/td&gt;
&lt;td&gt;Synchronous, OpenAI-compatible&lt;/td&gt;
&lt;td&gt;Synchronous + streaming&lt;/td&gt;
&lt;td&gt;Async upload + poll&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Languages&lt;/td&gt;
&lt;td&gt;99+ (Whisper)&lt;/td&gt;
&lt;td&gt;30+ (Nova) / 99+ (Whisper)&lt;/td&gt;
&lt;td&gt;99+ (Whisper) / 17+ (Universal)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
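
&lt;p&gt;To turn the rate row into a budget, a minimal sketch like the one below may help. It uses only the prices quoted in this article (the cheapest paid rate per provider, plus the $0.006/min OpenAI reference price); the dictionary keys and function names are hypothetical labels chosen for illustration, and real invoices will differ once you enable add-on features or pricier models.&lt;/p&gt;

```python
# Cheapest paid rates in USD per audio hour, copied from the spec sheet above.
RATES_PER_HOUR = {
    "groq_turbo": 0.04,
    "deepgram_nova3": 0.258,
    "assemblyai_universal2": 0.15,
    "openai_whisper": 0.006 * 60,  # $0.006/min reference price
}

# Signup credits in USD. Groq is omitted: its free tier is rate-limited, not credit-based.
SIGNUP_CREDIT = {"deepgram_nova3": 200.0, "assemblyai_universal2": 50.0}


def monthly_cost(provider: str, hours: float) -> float:
    """Paid cost in USD for `hours` of audio at the provider's cheapest rate."""
    return round(RATES_PER_HOUR[provider] * hours, 2)


def credit_hours(provider: str) -> float:
    """Hours of audio the signup credit covers at the cheapest rate."""
    return round(SIGNUP_CREDIT[provider] / RATES_PER_HOUR[provider], 1)


if __name__ == "__main__":
    for name in RATES_PER_HOUR:
        print(f"{name}: ${monthly_cost(name, 100):.2f} per 100 hours")
    for name in SIGNUP_CREDIT:
        print(f"{name}: credit covers about {credit_hours(name)} hours")
```

&lt;p&gt;Under these assumptions the $200 Deepgram credit covers roughly 775 hours of Nova-3 audio and the $50 AssemblyAI credit roughly 333 hours of Universal-2, before any paid add-ons.&lt;/p&gt;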

&lt;h2&gt;
  
  
  Decision Tree: Which One Should You Pick?
&lt;/h2&gt;

&lt;p&gt;Run through this list top to bottom. The first row that matches your situation is your answer.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;I am building a side project / hackathon entry / personal tool.&lt;/strong&gt; → &lt;strong&gt;Groq.&lt;/strong&gt; No card, real free tier, fastest to first transcription.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;I need speaker diarization (who said what) in the output.&lt;/strong&gt; → &lt;strong&gt;Deepgram Nova-3&lt;/strong&gt; if production-bound, &lt;strong&gt;AssemblyAI&lt;/strong&gt; if you also need chapters/summary.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;I need Whisper specifically — same model my self-hosted setup uses now — as a hosted swap.&lt;/strong&gt; → &lt;strong&gt;Deepgram Whisper Cloud&lt;/strong&gt;, then evaluate Nova-3 as a downgrade test.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;I need transcript + sentiment + chapters + entities from one API call.&lt;/strong&gt; → &lt;strong&gt;AssemblyAI.&lt;/strong&gt; The integration cost saved is worth the higher per-hour rate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;I need real-time streaming transcription for a voice agent.&lt;/strong&gt; → &lt;strong&gt;Deepgram Flux/Nova&lt;/strong&gt; or &lt;strong&gt;AssemblyAI Universal Streaming.&lt;/strong&gt; Groq is batch-only.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;I have heavy multilingual audio (Spanish, Mandarin, Hindi, Arabic, etc.).&lt;/strong&gt; → &lt;strong&gt;Groq whisper-large-v3&lt;/strong&gt; for cost, &lt;strong&gt;AssemblyAI Whisper-Streaming&lt;/strong&gt; for accuracy + post-processing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;I already run my backend on Cloudflare Workers.&lt;/strong&gt; → &lt;strong&gt;Cloudflare Workers AI Whisper&lt;/strong&gt; — integration savings beat per-hour savings here.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;I already have an OpenAI key wired in and don’t want a third vendor.&lt;/strong&gt; → &lt;strong&gt;Official OpenAI Whisper API&lt;/strong&gt;, $0.006/min. Don’t optimize what you don’t need to.&lt;/li&gt;
&lt;/ul&gt;
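
&lt;p&gt;The first-match rule above can be sketched as an ordered lookup. The requirement labels (&lt;code&gt;side_project&lt;/code&gt;, &lt;code&gt;diarization&lt;/code&gt;, and so on) are hypothetical names invented for this example; only the ordering and the recommendations come from the list.&lt;/p&gt;

```python
from typing import Optional

# Ordered (requirement, recommendation) pairs mirroring the list above.
# The requirement labels are hypothetical names chosen for this sketch.
RULES = [
    ("side_project", "Groq"),
    ("diarization", "Deepgram Nova-3 or AssemblyAI"),
    ("hosted_whisper_swap", "Deepgram Whisper Cloud"),
    ("audio_intelligence", "AssemblyAI"),
    ("realtime_streaming", "Deepgram Flux/Nova or AssemblyAI Universal Streaming"),
    ("multilingual", "Groq whisper-large-v3 or AssemblyAI Whisper-Streaming"),
    ("on_cloudflare", "Cloudflare Workers AI Whisper"),
    ("openai_only", "Official OpenAI Whisper API"),
]


def pick_provider(needs: set) -> Optional[str]:
    """Return the recommendation for the first rule that matches, else None."""
    for requirement, recommendation in RULES:
        if requirement in needs:
            return recommendation
    return None


if __name__ == "__main__":
    # Diarization sits higher in the list than multilingual, so it wins.
    print(pick_provider({"multilingual", "diarization"}))
```

&lt;p&gt;Because the list is ordered, a project with several requirements resolves to whichever one appears first, which is why the order of the rows matters as much as their content.&lt;/p&gt;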

&lt;h2&gt;
  
  
  Combining Free Whisper with a Free LLM
&lt;/h2&gt;

&lt;p&gt;The real productivity unlock isn’t transcription on its own — it’s transcription &lt;em&gt;plus&lt;/em&gt; an LLM pass on the resulting text. A reasonable free stack for a side-project transcription tool in 2026 looks like:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Audio in:&lt;/strong&gt; Groq &lt;code&gt;whisper-large-v3-turbo&lt;/code&gt; (free, fast).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM pass on the transcript:&lt;/strong&gt; Groq Llama 3.3 70B, &lt;a href="https://toolfreebie.com/cohere-free-api-embedding-rerank-rag-2026/" rel="noopener noreferrer"&gt;Cohere Command R+&lt;/a&gt;, or &lt;a href="https://toolfreebie.com/together-ai-free-api-llama-deepseek-flux-2026/" rel="noopener noreferrer"&gt;Together AI Llama 3.3 70B Free&lt;/a&gt; for summarization, action-item extraction, or speaker attribution via prompt.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embedding for search:&lt;/strong&gt; Cohere Embed v3 or another free embedding tier.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Three free API keys, zero cards, end-to-end speech-to-search. The same architecture costs about $0.36/hour for transcription alone at the OpenAI reference rate ($0.006/min), yet it can run free as long as you stay within each provider’s daily ceiling.&lt;/p&gt;
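
&lt;p&gt;As a rough illustration of the first two steps, the sketch below sends audio to a speech endpoint and then passes the transcript to an LLM. It assumes Groq exposes OpenAI-compatible routes under &lt;code&gt;https://api.groq.com/openai/v1&lt;/code&gt;, that &lt;code&gt;llama-3.3-70b-versatile&lt;/code&gt; is a valid model ID, and that the key lives in a &lt;code&gt;GROQ_API_KEY&lt;/code&gt; environment variable. Check all three against the provider’s current docs before relying on them. The function names and the 12,000-character clip are illustrative choices, not anything the provider prescribes.&lt;/p&gt;

```python
import os

GROQ_BASE = "https://api.groq.com/openai/v1"  # assumed OpenAI-compatible base URL
STT_MODEL = "whisper-large-v3-turbo"
LLM_MODEL = "llama-3.3-70b-versatile"  # assumed model ID; verify in the model list


def build_summary_request(transcript: str, max_chars: int = 12000) -> dict:
    """Build a chat-completions body asking for a summary and action items."""
    clipped = transcript[:max_chars]  # crude guard against oversized prompts
    return {
        "model": LLM_MODEL,
        "messages": [
            {"role": "system", "content": "Summarize the transcript and list action items."},
            {"role": "user", "content": clipped},
        ],
    }


def transcribe_and_summarize(path: str) -> str:
    """Send audio to the speech endpoint, then the text to the LLM endpoint."""
    import requests  # third-party; imported lazily so the helper above stays dependency-free

    headers = {"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"}
    with open(path, "rb") as audio:
        stt = requests.post(
            f"{GROQ_BASE}/audio/transcriptions",
            headers=headers,
            files={"file": audio},
            data={"model": STT_MODEL},
            timeout=120,
        )
    stt.raise_for_status()
    chat = requests.post(
        f"{GROQ_BASE}/chat/completions",
        headers=headers,
        json=build_summary_request(stt.json()["text"]),
        timeout=120,
    )
    chat.raise_for_status()
    return chat.json()["choices"][0]["message"]["content"]
```

&lt;p&gt;Keeping the prompt construction in its own function makes it easy to swap the LLM step for Cohere or Together AI without touching the transcription call.&lt;/p&gt;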

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is OpenAI’s Whisper actually free?
&lt;/h3&gt;

&lt;p&gt;The Whisper &lt;em&gt;model weights&lt;/em&gt; are MIT-licensed and free to self-host. The &lt;strong&gt;OpenAI Whisper API&lt;/strong&gt; ($0.006/min) is not free — there is no free tier and you need a credit card on file. When people say “free Whisper API” they almost always mean a third-party host (Groq, Deepgram, AssemblyAI) that runs Whisper for you with a free path in.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which Whisper API is the most accurate?
&lt;/h3&gt;

&lt;p&gt;All three host the same underlying &lt;code&gt;whisper-large-v3&lt;/code&gt; checkpoint (or a distilled variant of it), so transcription accuracy on identical audio is comparable. Differences in real-world output come from preprocessing (audio normalization, VAD), post-processing (smart formatting, punctuation), and whether diarization is layered on top. Groq runs the cleanest “raw Whisper” output; Deepgram and AssemblyAI add post-processing that usually helps for English business audio.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use these APIs for real-time transcription?
&lt;/h3&gt;

&lt;p&gt;Whisper itself is a batch model — it ingests a complete audio file and returns a transcript. &lt;strong&gt;Groq&lt;/strong&gt; is batch-only. &lt;strong&gt;Deepgram&lt;/strong&gt; offers real-time streaming via Nova and the Flux model (not Whisper). &lt;strong&gt;AssemblyAI&lt;/strong&gt; offers Universal Streaming and Whisper-Streaming for real-time use. For voice-agent latency budgets, Nova-3 and AssemblyAI Universal Streaming are the practical picks; Whisper itself is not ideal for sub-second response.&lt;/p&gt;

&lt;h3&gt;
  
  
  What’s the difference between whisper-large-v3 and whisper-large-v3-turbo?
&lt;/h3&gt;

&lt;p&gt;Turbo is a distilled version of large-v3 — fewer decoder layers, ~8× faster, and substantially cheaper to serve. The accuracy gap on standard benchmarks is small (a few percent WER) and only meaningful on long, noisy, or accented audio. For most use cases turbo is the right default; reach for large-v3 only when you’ve benchmarked turbo on your data and found it lacking.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use the free tier commercially?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Groq&lt;/strong&gt; permits commercial use on the free tier within the published rate limits; their paid Dev tier exists to lift those limits and add SLA, not to gate commercial access. &lt;strong&gt;Deepgram and AssemblyAI&lt;/strong&gt; credits are usable for any purpose: they are paid usage you have not paid for yet. Always re-read each provider’s TOS before deploying commercially, because the terms change.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I handle audio files larger than 25 MB on Groq?
&lt;/h3&gt;

&lt;p&gt;Chunk the audio before sending. The simplest reliable approach is &lt;code&gt;ffmpeg -i input.mp3 -f segment -segment_time 600 -c copy chunk_%03d.mp3&lt;/code&gt; to split into 10-minute pieces, transcribe each, and concatenate the resulting text. Groq’s docs include a more aggressive recipe that downsamples to 16 kHz mono first, which both reduces file size and matches Whisper’s training audio format.&lt;/p&gt;
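
&lt;p&gt;The split-transcribe-concatenate flow described above might be wrapped like this. The sketch reuses the exact &lt;code&gt;ffmpeg&lt;/code&gt; flags from the paragraph; the function names are hypothetical, and &lt;code&gt;split_audio&lt;/code&gt; assumes &lt;code&gt;ffmpeg&lt;/code&gt; is installed and on your &lt;code&gt;PATH&lt;/code&gt;. The transcription call itself is left out, since any of the providers above can fill that slot.&lt;/p&gt;

```python
import subprocess
from pathlib import Path


def segment_command(src: str, seconds: int = 600, pattern: str = "chunk_%03d.mp3") -> list:
    """Build the ffmpeg argument list: stream copy into fixed-length segments."""
    return ["ffmpeg", "-i", src, "-f", "segment",
            "-segment_time", str(seconds), "-c", "copy", pattern]


def split_audio(src: str, workdir: str = ".") -> list:
    """Run ffmpeg in `workdir` and return the chunk paths it produced."""
    absolute_src = str(Path(src).resolve())  # keep the input valid after changing cwd
    subprocess.run(segment_command(absolute_src), cwd=workdir, check=True)
    return sorted(Path(workdir).glob("chunk_*.mp3"))


def join_transcripts(texts_by_file: dict) -> str:
    """Concatenate per-chunk transcripts in filename order."""
    ordered = sorted(texts_by_file)  # zero-padded names sort in playback order
    return " ".join(texts_by_file[name].strip() for name in ordered)


if __name__ == "__main__":
    print(" ".join(segment_command("input.mp3")))
    print(join_transcripts({"chunk_001.mp3": " world ", "chunk_000.mp3": "hello"}))
```

&lt;p&gt;Fixed-length cuts can land in the middle of a word, so expect an occasional garbled phrase at chunk boundaries; overlapping the segments slightly or cutting on silence are possible refinements this sketch does not attempt.&lt;/p&gt;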

&lt;h3&gt;
  
  
  Which one has the best multilingual support?
&lt;/h3&gt;

&lt;p&gt;Anywhere you see “whisper-large-v3” you get OpenAI’s published 99+ language coverage. Groq, Deepgram Whisper Cloud, and AssemblyAI Whisper-Streaming are all equivalent there. Deepgram Nova-3 supports a smaller set (30+ languages) but is faster and cheaper for the languages it does support: primarily English, with strong coverage of Spanish, French, German, Portuguese, Italian, Dutch, Hindi, Japanese, Korean, and Mandarin.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do any of these offer free real-time streaming?
&lt;/h3&gt;

&lt;p&gt;Not at production volume. Deepgram and AssemblyAI both bill streaming minutes against their respective free credits ($200 and $50). Groq doesn’t offer streaming at all. If real-time is core to your product, plan to pay; the free credits are useful for evaluation, not for shipping a public voice product.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Reads
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/free-text-to-speech-api/" rel="noopener noreferrer"&gt;Which Free Text-to-Speech API Should You Use in 2026?&lt;/a&gt; — the speech-to-text half meets its text-to-speech counterpart: free TTS APIs compared for the speak-back leg of a voice pipeline.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/groq-api-the-fastest-free-ai-api-in-2026-300-800-tokens-s/" rel="noopener noreferrer"&gt;Groq API: The Fastest Free AI API in 2026&lt;/a&gt; — full breakdown of Groq’s LLM offering, which pairs directly with their Whisper endpoint.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/cohere-free-api-embedding-rerank-rag-2026/" rel="noopener noreferrer"&gt;Cohere Free API: Embedding and Rerank for RAG&lt;/a&gt; — the embedding half of a free transcription-to-search pipeline.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/together-ai-free-api-llama-deepseek-flux-2026/" rel="noopener noreferrer"&gt;Together AI Free API: Llama, DeepSeek, FLUX&lt;/a&gt; — another free LLM source for post-transcription summarization.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/cloudflare-workers-ai-free-edge-ai-inference-with-47-models/" rel="noopener noreferrer"&gt;Cloudflare Workers AI&lt;/a&gt; — also hosts Whisper, billed in neurons; relevant if your stack is already on Cloudflare.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/10-best-free-ai-apis-in-2026-the-ultimate-comparison/" rel="noopener noreferrer"&gt;10 Best Free AI APIs in 2026&lt;/a&gt; — the wider context for which provider does what.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://toolfreebie.com/free-whisper-api-compared/" rel="noopener noreferrer"&gt;toolfreebie.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Bolt.new: Free AI App Builder That Codes, Runs, and Deploys in Your Browser</title>
      <dc:creator>toolfreebie</dc:creator>
      <pubDate>Thu, 28 May 2026 08:35:15 +0000</pubDate>
      <link>https://dev.to/build996/boltnew-free-ai-app-builder-that-codes-runs-and-deploys-in-your-browser-g10</link>
      <guid>https://dev.to/build996/boltnew-free-ai-app-builder-that-codes-runs-and-deploys-in-your-browser-g10</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpxc4dn34uill26584cgq.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpxc4dn34uill26584cgq.jpg" alt="Bolt.new: Free AI App Builder That Codes, Runs, and Deploys in Your Browser" width="800" height="523"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Bolt.new?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://bolt.new" rel="noopener noreferrer"&gt;Bolt.new&lt;/a&gt; is a free, browser-based AI app builder from &lt;a href="https://stackblitz.com" rel="noopener noreferrer"&gt;StackBlitz&lt;/a&gt;. You type a prompt — “build me a Next.js todo app with Supabase auth” — and Bolt scaffolds the project, writes the code, runs &lt;code&gt;npm install&lt;/code&gt;, boots a dev server, and shows you the live preview, all inside a browser tab. When you like what you see, one click deploys it to Netlify with a real URL.&lt;/p&gt;

&lt;p&gt;The trick is that none of the build happens on a remote sandbox. The Node.js runtime, the package manager, the dev server, and the file system all live inside your browser, courtesy of &lt;a href="https://webcontainers.io" rel="noopener noreferrer"&gt;StackBlitz WebContainers&lt;/a&gt;. When the AI agent edits a file, the change is immediate — there is no round trip to a Docker container in the cloud, no cold start, no queue. That single architectural choice is what makes Bolt feel a generation ahead of older AI coding tools that wrap a remote VM in a chat UI.&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://www.latent.space/p/bolt" rel="noopener noreferrer"&gt;Eric Simons’ interview on the Latent Space podcast&lt;/a&gt;, the StackBlitz CEO described Bolt.new’s growth as “the fastest software product I have ever seen go from zero to viral” — the product crossed $8M ARR within two months of launch in late 2024, before adding any salespeople. By 2026 it has settled into one of the three most-used AI app builders alongside &lt;a href="https://lovable.dev" rel="noopener noreferrer"&gt;Lovable&lt;/a&gt; and &lt;a href="https://v0.dev" rel="noopener noreferrer"&gt;v0 by Vercel&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This guide is the honest 2026 take on Bolt.new’s free tier: what 1 million tokens per month actually buys, where Bolt beats the alternatives, where it falls down, and how to combine it with free AI APIs and self-hosted clones to get more out of it without paying for Pro.&lt;/p&gt;

&lt;h2&gt;
  
  
  The WebContainer Trick: Why “AI Code in Browser” Actually Works
&lt;/h2&gt;

&lt;p&gt;Most AI coding products fall into one of two camps. The first runs in your local editor — Cursor, &lt;a href="https://toolfreebie.com/cline-vscode-ai-agent/" rel="noopener noreferrer"&gt;Cline&lt;/a&gt;, GitHub Copilot, &lt;a href="https://toolfreebie.com/aider-free-ai-coding-cli/" rel="noopener noreferrer"&gt;Aider&lt;/a&gt; — and assumes you have Node.js, Python, Docker, and the rest of your toolchain already set up. The second runs on a remote sandbox: &lt;a href="https://replit.com" rel="noopener noreferrer"&gt;Replit Agent&lt;/a&gt;, GitHub Codespaces, Gitpod. They give you a VM in the cloud and pipe a terminal back to your browser.&lt;/p&gt;

&lt;p&gt;Bolt.new is the only widely used product in a third camp. The Node.js runtime, the package manager, the file system, and the HTTP server are all compiled to &lt;a href="https://webassembly.org" rel="noopener noreferrer"&gt;WebAssembly&lt;/a&gt; and run &lt;em&gt;inside&lt;/em&gt; a single browser tab. There is no VM to rent and no laptop to set up. The first time you open Bolt.new on a fresh Chromebook, you can prompt “build me a SvelteKit blog with markdown posts” and have a running app in 60 seconds.&lt;/p&gt;

&lt;p&gt;Three concrete consequences of this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cold start is zero.&lt;/strong&gt; The dev server is ready the moment your AI agent finishes writing files — no container provisioning, no &lt;code&gt;docker pull&lt;/code&gt;, no waiting room.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compute cost is your laptop, not their cloud.&lt;/strong&gt; StackBlitz pays nothing per running project, which is a big part of why a $25/month Pro plan can include 10M tokens of Claude usage. Their marginal cost is the AI tokens, not the hosting.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You can fork and remix instantly.&lt;/strong&gt; Sharing a Bolt.new URL is the same as sharing source — anyone who opens it has a fully running clone in their browser within a few seconds. There is no “deploy this to a sandbox” intermediate step.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The only thing WebContainers cannot do is run native binaries that aren’t compiled for WASM. That rules out Docker-in-Docker, native Postgres, Python data science with NumPy/Pandas (some Python &lt;em&gt;does&lt;/em&gt; work via Pyodide, but it is slower and more limited), and anything that calls into a system library beyond the Node ecosystem. For 90% of modern web app prototyping — React, Next.js, Vue, Svelte, Astro, Remix, Vite, Tailwind, TypeScript — none of that matters. For ML pipelines or anything with a heavy native dependency, Bolt is not the right tool.&lt;/p&gt;

&lt;h2&gt;
  
  
  Free Tier: What 1M Tokens Per Month Actually Buys
&lt;/h2&gt;

&lt;p&gt;Per the &lt;a href="https://bolt.new/pricing" rel="noopener noreferrer"&gt;Bolt pricing page&lt;/a&gt;, the 2026 free tier gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;1,000,000 tokens per month&lt;/strong&gt; — the soft cap on AI usage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;300,000 tokens per day&lt;/strong&gt; — the hard daily cap; you cannot burn the whole month in one sitting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No rollover&lt;/strong&gt; — unused free-tier tokens reset every day; only paid tokens roll over (up to one extra month)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bolt branding&lt;/strong&gt; on deployed sites&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Public projects only&lt;/strong&gt; — the free tier cannot make a project private&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;“How much app is 300,000 tokens?” is the question every new user asks, and the honest answer is: &lt;em&gt;less than you’d expect&lt;/em&gt;. Each user prompt sends the entire current file tree, the conversation history, and tool definitions back to Claude. A medium-complexity edit to a 5-file Next.js project — say, “add a search bar to the header that filters the post list” — typically consumes 30,000–60,000 tokens of context. Across the day that gives you 5–10 meaningful prompts.&lt;/p&gt;

&lt;p&gt;From &lt;a href="https://www.dyad.sh/blog/free-ai-app-builders-compared" rel="noopener noreferrer"&gt;an independent 2026 comparison of free AI app builders&lt;/a&gt;, the realistic free-tier output is “roughly 3-8 meaningful prompts per day” before you hit the wall. That is enough to scaffold a small project and do a couple of refinements. It is not enough to take a real app from nothing to production in a single sitting.&lt;/p&gt;
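
&lt;p&gt;The budget arithmetic is simple enough to check yourself. The sketch below uses only the caps and the per-prompt range quoted above; the function names are hypothetical, and actual consumption depends on project size and conversation length.&lt;/p&gt;

```python
DAILY_CAP = 300_000      # free-tier daily token cap quoted above
MONTHLY_CAP = 1_000_000  # free-tier monthly token cap quoted above


def prompts_per_day(tokens_per_prompt: int) -> int:
    """Whole prompts that fit inside one day's cap."""
    return DAILY_CAP // tokens_per_prompt


def full_days_per_month(tokens_per_prompt: int) -> int:
    """Days of maxed-out daily usage the monthly cap can sustain."""
    daily_spend = prompts_per_day(tokens_per_prompt) * tokens_per_prompt
    return MONTHLY_CAP // daily_spend


if __name__ == "__main__":
    for cost in (30_000, 60_000):
        print(cost, prompts_per_day(cost), full_days_per_month(cost))
```

&lt;p&gt;At 30,000 to 60,000 tokens per prompt this reproduces the 5 to 10 prompts per day, and it also suggests that hitting the daily cap every day would exhaust the monthly allowance after about three days.&lt;/p&gt;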

&lt;p&gt;Two practical workarounds:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Plan offline first.&lt;/strong&gt; Write a tight design doc — pages, routes, data model, third-party integrations — before opening Bolt. The fewer “wait, also do X” follow-ups you need, the more you get out of 300K tokens. Throwaway prompting destroys the free tier.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use Bolt.new for scaffold + structure, then export to a real editor.&lt;/strong&gt; Bolt has a “Download” button that ships you a zip with the full project. Open that in &lt;a href="https://toolfreebie.com/cline-vscode-ai-agent/" rel="noopener noreferrer"&gt;Cline&lt;/a&gt; or &lt;a href="https://toolfreebie.com/aider-free-ai-coding-cli/" rel="noopener noreferrer"&gt;Aider&lt;/a&gt; with a free &lt;a href="https://toolfreebie.com/gemini-free-ai-api/" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt; or &lt;a href="https://toolfreebie.com/openrouter-free-ai-models/" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt; key, and continue iterating without the token budget pressure.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Bolt.new vs Lovable vs v0 vs Replit Agent vs Cursor
&lt;/h2&gt;

&lt;p&gt;Five products are usually in the same conversation in 2026. They look similar from the outside (“AI builds your app from a prompt”) but they make completely different trade-offs. This table summarizes the ones that matter for a developer choosing a default tool.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Bolt.new&lt;/th&gt;
&lt;th&gt;Lovable&lt;/th&gt;
&lt;th&gt;v0&lt;/th&gt;
&lt;th&gt;Replit Agent&lt;/th&gt;
&lt;th&gt;Cursor&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Surface&lt;/td&gt;
&lt;td&gt;Browser tab (in-browser runtime)&lt;/td&gt;
&lt;td&gt;Browser tab (remote sandbox)&lt;/td&gt;
&lt;td&gt;Browser tab (UI-only)&lt;/td&gt;
&lt;td&gt;Browser tab (remote VM)&lt;/td&gt;
&lt;td&gt;Local IDE (forked VS Code)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Free tier&lt;/td&gt;
&lt;td&gt;1M tokens/mo, 300K/day&lt;/td&gt;
&lt;td&gt;5 messages/day&lt;/td&gt;
&lt;td&gt;Generous chat-only tier&lt;/td&gt;
&lt;td&gt;Limited Agent trial that expires&lt;/td&gt;
&lt;td&gt;2-week Pro trial; limited free&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stack scope&lt;/td&gt;
&lt;td&gt;Full-stack web (Node ecosystem)&lt;/td&gt;
&lt;td&gt;Full-stack web&lt;/td&gt;
&lt;td&gt;UI components only (React/Tailwind)&lt;/td&gt;
&lt;td&gt;Full-stack + databases + cron&lt;/td&gt;
&lt;td&gt;Anything you have on disk&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Default model&lt;/td&gt;
&lt;td&gt;Claude Sonnet 4.6&lt;/td&gt;
&lt;td&gt;Claude / GPT (managed)&lt;/td&gt;
&lt;td&gt;v0-managed&lt;/td&gt;
&lt;td&gt;Claude / GPT (managed)&lt;/td&gt;
&lt;td&gt;Cursor-managed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Premium model&lt;/td&gt;
&lt;td&gt;Opus 4.7 (paid plans)&lt;/td&gt;
&lt;td&gt;Yes (paid)&lt;/td&gt;
&lt;td&gt;Higher tiers on Vercel Pro&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;One-click deploy&lt;/td&gt;
&lt;td&gt;Netlify (built in)&lt;/td&gt;
&lt;td&gt;Built-in&lt;/td&gt;
&lt;td&gt;Vercel&lt;/td&gt;
&lt;td&gt;Replit Deployments&lt;/td&gt;
&lt;td&gt;External (your choice)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Database&lt;/td&gt;
&lt;td&gt;Connect Supabase&lt;/td&gt;
&lt;td&gt;Connect Supabase&lt;/td&gt;
&lt;td&gt;Connect Supabase / Neon&lt;/td&gt;
&lt;td&gt;Built-in Postgres&lt;/td&gt;
&lt;td&gt;Whatever you wire up&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GitHub sync&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Native&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Open-source clone&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/stackblitz-labs/bolt.diy" rel="noopener noreferrer"&gt;bolt.diy&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pro pricing&lt;/td&gt;
&lt;td&gt;$25/mo (10M tokens)&lt;/td&gt;
&lt;td&gt;$25/mo&lt;/td&gt;
&lt;td&gt;$20/mo&lt;/td&gt;
&lt;td&gt;$25/mo Core&lt;/td&gt;
&lt;td&gt;$20/mo&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Three takeaways from the table that aren’t obvious to most newcomers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Replit Agent and Bolt.new are full-stack; v0 is UI-only.&lt;/strong&gt; If your prompt is “build me a complete app with auth and a database,” v0 is the wrong tool — it is meant to feed React components into an existing codebase, not to ship a finished product. Bolt and Replit both ship something runnable end-to-end.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bolt’s WebContainer beats Replit’s sandbox for iteration speed.&lt;/strong&gt; Replit Agent is more capable on long-running backend tasks (it has a real Linux VM), but every time the agent edits a file, you wait for the cloud sandbox to reload. Bolt feels nearly instant because the runtime is local to your browser tab.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lovable is built for non-developers who want polish over control.&lt;/strong&gt; If the team using the tool is not technical and the most important thing is that the output looks good without thinking about &lt;code&gt;tailwind.config.js&lt;/code&gt;, Lovable is the right pick. If you want to read and edit the code yourself, Bolt is much more developer-friendly.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Five-Minute Quickstart: From Prompt to Deployed App
&lt;/h2&gt;

&lt;p&gt;Here is the shortest path from “nothing” to “deployed Next.js app with a database” using only Bolt’s free tier.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1 — Sign in
&lt;/h3&gt;

&lt;p&gt;Open &lt;a href="https://bolt.new" rel="noopener noreferrer"&gt;bolt.new&lt;/a&gt; in any Chromium-based browser (Chrome, Edge, Brave, Arc). Firefox also works as of 2026, with slightly slower WebContainer performance. Sign in with GitHub or email. There is no credit card prompt.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2 — Write a tight first prompt
&lt;/h3&gt;

&lt;p&gt;The single biggest free-tier optimization is your first prompt. Vague prompts (“make me a website”) force Bolt to ask follow-ups, each one burning context. A specific prompt produces a working scaffold in one shot.&lt;/p&gt;

&lt;p&gt;A prompt that consistently produces a usable starter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Build a Next.js 14 (App Router) blog with the following:

Pages:
- /            list of posts (title, date, excerpt, tag pill)
- /posts/[slug] full post with markdown rendering
- /admin       password-gated form to create new posts (env var ADMIN_PASSWORD)

Data:
- Posts stored in a single posts.json file in the repo
- Each post has: slug, title, date (ISO), tag, excerpt, body (markdown)

Styling:
- Tailwind CSS, dark mode default, monospace headlines, inline code blocks styled with a card shadow

Tooling:
- TypeScript strict mode
- next.config.js with images.unoptimized = true so it deploys cleanly to Netlify

After scaffolding, add three sample posts so I can see the styling.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice that everything is concrete: framework version, routing convention, data shape, styling, deploy target, even sample data. Bolt produces a runnable app in one pass on a prompt this specific and uses about 50K–80K tokens doing it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3 — Run the preview
&lt;/h3&gt;

&lt;p&gt;The preview pane mounts as soon as &lt;code&gt;npm install&lt;/code&gt; finishes, which takes 8–15 seconds on a typical broadband connection. There is no “deploy” step yet: you are looking at the dev server running inside your browser.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4 — Connect Supabase (optional)
&lt;/h3&gt;

&lt;p&gt;For a real database instead of &lt;code&gt;posts.json&lt;/code&gt;, click “Connect Supabase” in the right sidebar. Bolt will prompt you to create a free &lt;a href="https://toolfreebie.com/supabase-vs-neon-free-postgresql-2026/" rel="noopener noreferrer"&gt;Supabase project&lt;/a&gt; and inject the credentials into &lt;code&gt;.env.local&lt;/code&gt; automatically. Then prompt: &lt;em&gt;“Move posts from posts.json to a Supabase table called posts. Replace the JSON imports with Supabase queries.”&lt;/em&gt; Bolt will write the migration, the queries, and update the routes in one pass — typically ~40K tokens.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5 — Deploy
&lt;/h3&gt;

&lt;p&gt;Click “Deploy” → choose Netlify → wait ~20 seconds. You get a real public URL like &lt;code&gt;https://celadon-dingo-12345.netlify.app&lt;/code&gt;. Custom domains and removing the Bolt branding are paid features.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 6 — Push to GitHub
&lt;/h3&gt;

&lt;p&gt;Click “Push to GitHub”. Bolt creates a public repo with all the code. From there, you can clone it locally and continue with Cline, Aider, or any free CLI agent — no longer paying the Bolt token budget for incremental edits.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Token Economy: How to Make the Free Tier Last
&lt;/h2&gt;

&lt;p&gt;The single most useful skill for any free-tier Bolt user is reading the token meter and writing prompts that respect it. Five things that meaningfully extend the daily 300K budget:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Front-load detail
&lt;/h3&gt;

&lt;p&gt;One specific 2,000-token prompt is much cheaper than five 500-token clarifying prompts, because each follow-up sends the entire conversation history again. The cost grows quadratically with how chatty you are.&lt;/p&gt;
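
&lt;p&gt;The quadratic growth is easy to check with a little arithmetic. The sketch below is a simplified model, not Bolt’s actual metering: it assumes each request resends every earlier prompt and reply, ignores file context entirely, and uses a made-up average reply size of 1,500 tokens (&lt;code&gt;REPLY&lt;/code&gt;). The function name &lt;code&gt;cumulative_input_tokens&lt;/code&gt; is illustrative.&lt;/p&gt;

```python
def cumulative_input_tokens(prompt_sizes, reply_size):
    """Total input tokens billed when every request resends the full history."""
    history = 0  # tokens already in the conversation
    total = 0    # input tokens billed so far
    for prompt in prompt_sizes:
        # Each request carries all prior turns plus the new prompt.
        total += history + prompt
        # The prompt and the model's reply both join the history.
        history += prompt + reply_size
    return total


REPLY = 1500  # assumed average reply size, purely illustrative

one_shot = cumulative_input_tokens([2000], REPLY)
chatty = cumulative_input_tokens([500] * 5, REPLY)
print(one_shot, chatty)
```

&lt;p&gt;Under these assumptions the single 2,000-token prompt bills 2,000 input tokens, while the five short prompts bill 22,500 in total, because each follow-up carries everything that came before it.&lt;/p&gt;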

&lt;h3&gt;
  
  
  2. Pin a tight system prompt
&lt;/h3&gt;

&lt;p&gt;Bolt lets you specify project-level instructions (“Always use TypeScript. Never install new packages without asking. Match the code style of the existing files.”) in the project settings. These are sent with every request and prevent Bolt from re-deciding conventions on every edit, which is a frequent token waster.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Edit small files yourself
&lt;/h3&gt;

&lt;p&gt;Bolt’s editor is a real editor. If you need to tweak a literal string, fix a typo, or change a Tailwind class, just type the edit. Don’t burn 20K tokens asking Claude to do it.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Use “Discussion mode” for design conversations
&lt;/h3&gt;

&lt;p&gt;Discussion mode lets you talk through architecture changes without editing files. It uses a much smaller context (no file tree, no diff machinery) and is meant for “should I use Server Actions or a tRPC layer?” conversations before you commit to a change.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Export and continue locally
&lt;/h3&gt;

&lt;p&gt;The 300K daily cap is the real constraint. Once you hit it, the most productive move is to download the project as a zip, open it in VS Code with &lt;a href="https://toolfreebie.com/cline-vscode-ai-agent/" rel="noopener noreferrer"&gt;Cline&lt;/a&gt; + a free &lt;a href="https://toolfreebie.com/openrouter-free-ai-models/" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt; or &lt;a href="https://toolfreebie.com/gemini-free-ai-api/" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt; key, and keep iterating with no token meter. Bolt is a great &lt;em&gt;scaffolder&lt;/em&gt;; it doesn’t have to be your only editor.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Bolt Hits Its Limits
&lt;/h2&gt;

&lt;p&gt;Bolt is not the right tool for every project. Three categories where the WebContainer model breaks down:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Anything with a native binary dependency.&lt;/strong&gt; ImageMagick, FFmpeg, Postgres-the-binary, Python with NumPy/pandas/PyTorch, the Rust toolchain, system-level audio/video. WebContainers run a Node-shaped runtime; calls into the system C library or other languages mostly don’t work. (Pyodide gives you Python-in-WASM, which Bolt can use for some scripts, but it is slow, and only packages with a WebAssembly build are available.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-running background workers.&lt;/strong&gt; A WebContainer lives only as long as the browser tab is open. There is no daemon, no cron, no queue worker. For backends that need to run a Celery/Sidekiq-style worker, deploy elsewhere. Replit Agent, with its real VM, is a better fit if you need persistent background processing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Large monorepos.&lt;/strong&gt; Bolt is happiest with one to maybe a dozen workspaces. Pulling a 500-package pnpm monorepo into a browser tab is technically possible and miserable in practice — the file watcher overhead and memory pressure tank the UX. For codebases that size, use Cline or Aider locally.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For everything else — landing pages, internal admin tools, prototypes, indie SaaS MVPs, one-off scripts, design system playgrounds, hackathon submissions — Bolt is genuinely best-in-class.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bolt.diy: The Open-Source Self-Hosted Alternative
&lt;/h2&gt;

&lt;p&gt;StackBlitz published &lt;a href="https://github.com/stackblitz-labs/bolt.diy" rel="noopener noreferrer"&gt;bolt.diy&lt;/a&gt; as the open-source variant of Bolt’s frontend in 2024, and it has a healthy community in 2026. Bolt.diy ships the same chat-and-preview UI but lets you bring your own model and run the whole thing on a &lt;a href="https://toolfreebie.com/best-free-hosting-2026/" rel="noopener noreferrer"&gt;free hosting platform&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The differences that actually matter:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Bolt.new (hosted)&lt;/th&gt;
&lt;th&gt;bolt.diy (self-hosted)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cost&lt;/td&gt;
&lt;td&gt;Free (1M tokens/mo limit)&lt;/td&gt;
&lt;td&gt;Free + your own API costs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model&lt;/td&gt;
&lt;td&gt;Claude Sonnet/Opus, managed&lt;/td&gt;
&lt;td&gt;Any model — bring your own key&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Free model option&lt;/td&gt;
&lt;td&gt;No (always burns Bolt tokens)&lt;/td&gt;
&lt;td&gt;Yes — pair with &lt;a href="https://toolfreebie.com/gemini-free-ai-api/" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt;, &lt;a href="https://toolfreebie.com/groq-fastest-free-ai-api/" rel="noopener noreferrer"&gt;Groq&lt;/a&gt;, &lt;a href="https://toolfreebie.com/together-ai-free-api-llama-deepseek-flux-2026/" rel="noopener noreferrer"&gt;Together&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WebContainer&lt;/td&gt;
&lt;td&gt;Yes (StackBlitz infra)&lt;/td&gt;
&lt;td&gt;Yes (open StackBlitz npm package)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;One-click deploy&lt;/td&gt;
&lt;td&gt;Built-in Netlify&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Setup time&lt;/td&gt;
&lt;td&gt;0 minutes&lt;/td&gt;
&lt;td&gt;~30 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Privacy&lt;/td&gt;
&lt;td&gt;Code goes through Bolt servers&lt;/td&gt;
&lt;td&gt;Stays on your machine, apart from what you send to your model provider&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The killer use case for bolt.diy: pair it with a free AI provider so you escape the 1M-tokens-per-month ceiling without paying for Bolt Pro. With &lt;a href="https://toolfreebie.com/gemini-free-ai-api/" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt;’s free tier (1,500 requests/day) or &lt;a href="https://toolfreebie.com/groq-fastest-free-ai-api/" rel="noopener noreferrer"&gt;Groq&lt;/a&gt;’s free tier (14,400 requests/day), you have effectively unlimited usage for personal projects. Quality is below Claude Sonnet 4.6 on the hardest tasks, but for most CRUD and UI work the gap is small.&lt;/p&gt;

&lt;h3&gt;
  
  
  Quick start with bolt.diy
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# clone
git clone https://github.com/stackblitz-labs/bolt.diy
cd bolt.diy

# install
pnpm install

# bring your own key in .env.local (this overwrites any existing file)
echo "GOOGLE_GENERATIVE_AI_API_KEY=your_gemini_key_here" &amp;gt; .env.local
# or use Groq (this appends, so both keys can coexist)
echo "GROQ_API_KEY=your_groq_key_here" &amp;gt;&amp;gt; .env.local

# run
pnpm run dev
# open http://localhost:5173
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s it — five commands and you have a Bolt clone running locally with no token meter, no rate limits beyond what your free AI provider imposes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is Bolt.new really free?
&lt;/h3&gt;

&lt;p&gt;Yes. The free tier gives you 1,000,000 tokens per month with a 300,000 daily cap, no credit card required, and full access to Claude Sonnet 4.6. The only meaningful restrictions are: deployed sites have Bolt branding, all your projects are public, and you cannot remove the daily cap without upgrading. For prototyping and learning, the free tier is genuinely usable. For sustained daily work on a real project, you will hit the cap and want Pro.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which AI model does Bolt.new use?
&lt;/h3&gt;

&lt;p&gt;The default is &lt;strong&gt;Claude Sonnet 4.6&lt;/strong&gt; from Anthropic. Paid plans also unlock &lt;strong&gt;Claude Opus 4.7&lt;/strong&gt; for harder reasoning tasks, and a “Standard” mode for cheap polish edits. The model selection is built into the UI; you don’t pick the provider, only the depth/cost trade-off.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use my own API key with Bolt.new?
&lt;/h3&gt;

&lt;p&gt;Not on the hosted bolt.new — the model is managed and bundled into the token cost. If you want BYOK, use the open-source &lt;a href="https://github.com/stackblitz-labs/bolt.diy" rel="noopener noreferrer"&gt;bolt.diy&lt;/a&gt; fork, which lets you plug in any model: Gemini, Groq, OpenRouter, Together AI, even local Ollama.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does Bolt.new work offline?
&lt;/h3&gt;

&lt;p&gt;The WebContainer runtime works offline once loaded — you can edit files and run code without a network — but the AI agent obviously can’t, since it calls Claude over HTTPS. For a fully offline AI coding workflow, use &lt;a href="https://toolfreebie.com/cline-vscode-ai-agent/" rel="noopener noreferrer"&gt;Cline&lt;/a&gt; or &lt;a href="https://toolfreebie.com/aider-free-ai-coding-cli/" rel="noopener noreferrer"&gt;Aider&lt;/a&gt; with &lt;a href="https://toolfreebie.com/ollama-run-ai-locally/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; instead.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I deploy a Bolt.new app to Vercel or Cloudflare instead of Netlify?
&lt;/h3&gt;

&lt;p&gt;Yes — push to GitHub, then connect the repo from &lt;a href="https://toolfreebie.com/vercel-vs-netlify-vs-cloudflare-pages-2026/" rel="noopener noreferrer"&gt;Vercel or Cloudflare Pages&lt;/a&gt;. The one-click deploy inside Bolt only goes to Netlify, but the generated project is a normal Node/Next.js/Vite codebase that works on any modern host.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does Bolt.new support backends other than Node?
&lt;/h3&gt;

&lt;p&gt;Limited. WebContainers run a Node.js-shaped runtime, so anything that compiles to JavaScript (TypeScript, Elm, ReScript) works fine. Python via Pyodide works for scripts but is too slow for full backends. Go, Rust, Ruby, Java, .NET — not natively. For polyglot stacks, Replit Agent or your local editor is the right tool.&lt;/p&gt;

&lt;h3&gt;
  
  
  What happens to my projects if I let my Pro subscription lapse?
&lt;/h3&gt;

&lt;p&gt;Projects stay where they are — Bolt does not delete code. You drop back to the free tier limits: 300K tokens/day, public projects only, Bolt branding on deployments. Already-private projects remain accessible to you but cannot be edited or visited by collaborators in the same way.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is Bolt.new safe for client work?
&lt;/h3&gt;

&lt;p&gt;Free-tier projects are public, so any client code with secrets in it is a problem. Pro lets you make projects private, which is the minimum bar for professional work. For NDA-grade work, the open-source bolt.diy on a self-hosted instance is the only fully private option.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does the token meter actually count?
&lt;/h3&gt;

&lt;p&gt;Tokens reflect what Anthropic charges StackBlitz for Claude: input tokens (your prompt + file context + history) plus output tokens (the diff Claude writes back). Per the &lt;a href="https://support.bolt.new/account-and-subscription/tokens" rel="noopener noreferrer"&gt;official Bolt token docs&lt;/a&gt;, file context dominates: a request that touches a 1,000-line file costs roughly the same regardless of how short your prompt is. This is why “edit only the relevant files” is a real technique.&lt;/p&gt;
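
&lt;p&gt;A rough estimate shows why file context dominates. The sketch below assumes about four characters per token, a common rule of thumb rather than the tokenizer Bolt actually uses. The helper names and the repeated stand-in line are hypothetical.&lt;/p&gt;

```python
CHARS_PER_TOKEN = 4  # rough rule of thumb, an assumption and not Bolt's real tokenizer


def estimate_tokens(text):
    """Very rough token estimate from character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def input_share(prompt, file_text):
    """Return (prompt_tokens, file_tokens, fraction of input taken by the file)."""
    p = estimate_tokens(prompt)
    f = estimate_tokens(file_text)
    return p, f, f / (p + f)


# Stand-in for a 1,000-line source file.
file_text = "const value = compute(input);\n" * 1000

short = input_share("Fix the typo in the header.", file_text)
wordy = input_share("Fix the typo in the header. " * 20, file_text)
print(short, wordy)
```

&lt;p&gt;With a 1,000-line file in context, making the prompt twenty times longer barely moves the file’s share of the input, which stays at about 98% or higher in this toy example.&lt;/p&gt;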

&lt;h2&gt;
  
  
  Decision Tree: Bolt.new vs Cline vs Aider vs Cursor vs Lovable
&lt;/h2&gt;

&lt;p&gt;If you want a one-line picker for which tool to start with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;You want to go from prompt to deployed app in one browser tab&lt;/strong&gt; → Bolt.new&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want a polished UI, you’re not a developer, and you’ll pay for it&lt;/strong&gt; → Lovable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want UI components to drop into an existing Next.js codebase&lt;/strong&gt; → v0&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want a real backend with a real database that runs 24/7&lt;/strong&gt; → Replit Agent&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want a serious AI agent inside VS Code with your own API key&lt;/strong&gt; → &lt;a href="https://toolfreebie.com/cline-vscode-ai-agent/" rel="noopener noreferrer"&gt;Cline&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want a terminal-first AI pair programmer that auto-commits&lt;/strong&gt; → &lt;a href="https://toolfreebie.com/aider-free-ai-coding-cli/" rel="noopener noreferrer"&gt;Aider&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want a managed forked-VS-Code experience and don’t mind paying&lt;/strong&gt; → Cursor&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want autocomplete plus a chat box, nothing more&lt;/strong&gt; → GitHub Copilot&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want to self-host the whole stack with your own free API key&lt;/strong&gt; → bolt.diy + &lt;a href="https://toolfreebie.com/gemini-free-ai-api/" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt; or &lt;a href="https://toolfreebie.com/groq-fastest-free-ai-api/" rel="noopener noreferrer"&gt;Groq&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Use Bolt.new with OpenClaw
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://openclaw.ai" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; is an AI agent platform for orchestrating multi-step automated workflows. Bolt and OpenClaw cover different parts of the build-and-ship loop, and the seam between them is genuinely useful.&lt;/p&gt;

&lt;p&gt;The pattern: use Bolt.new for the human-driven scaffold-and-design phase — describe the app you want, watch it appear, push to GitHub. Then hand the GitHub repo over to an OpenClaw flow for the unattended phase: a nightly job that bumps dependencies, runs the test suite on each push, regenerates the OpenAPI client whenever the schema changes, and files an issue if anything breaks. Bolt builds the v1; OpenClaw maintains it.&lt;/p&gt;

&lt;p&gt;A concrete pipeline an OpenClaw agent can own end-to-end without you touching it: every Monday at 7am, pull the latest Bolt-generated codebase, run &lt;code&gt;pnpm audit&lt;/code&gt;, attempt to upgrade any package with a known CVE, run the test suite, and on success open a PR titled “weekly security bump.” On failure, open an issue with the failing test names. You wake up to either a green PR ready to merge or a clear bug report.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Verdict
&lt;/h2&gt;

&lt;p&gt;Bolt.new is the best-in-class answer to “I have an idea and want a running app in 5 minutes” in 2026. The WebContainer trick is genuinely a generation ahead of products that wrap a remote VM in chat — there is no cold start, no setup, no dependency on what you have installed locally. For prototyping, hackathons, internal admin tools, and turning a spec into a deployed link in front of a stakeholder, nothing else feels as fast.&lt;/p&gt;

&lt;p&gt;The free tier’s 300K daily cap is real, though, and the cost of follow-up prompts grows fast. The right way to use Bolt.new on free is: front-load detail in your first prompt, get a runnable scaffold in one pass, push to GitHub, and continue refinements in &lt;a href="https://toolfreebie.com/cline-vscode-ai-agent/" rel="noopener noreferrer"&gt;Cline&lt;/a&gt; or &lt;a href="https://toolfreebie.com/aider-free-ai-coding-cli/" rel="noopener noreferrer"&gt;Aider&lt;/a&gt; against any free AI API. That hybrid workflow — Bolt for the “0 to 1,” a free local CLI agent for the “1 to 10” — is the cheapest serious AI coding stack available right now, and it costs literally zero dollars.&lt;/p&gt;

&lt;p&gt;If you want to escape the token ceiling entirely, &lt;a href="https://github.com/stackblitz-labs/bolt.diy" rel="noopener noreferrer"&gt;bolt.diy&lt;/a&gt; with &lt;a href="https://toolfreebie.com/gemini-free-ai-api/" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt; or &lt;a href="https://toolfreebie.com/groq-fastest-free-ai-api/" rel="noopener noreferrer"&gt;Groq&lt;/a&gt; as the model is the natural next step. You give up the polish of the hosted product but you also give up the $25/month it would otherwise cost to keep building. Open the page, type a prompt, watch your app run in your browser. There is no faster way to find out whether an idea is worth pursuing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Reads
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/cline-vscode-ai-agent/" rel="noopener noreferrer"&gt;Cline: Free Open-Source AI Coding Agent for VS Code (Cursor Alternative)&lt;/a&gt; — the right next step after Bolt scaffolds your app&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/aider-free-ai-coding-cli/" rel="noopener noreferrer"&gt;Aider: Free Open-Source AI Coding Agent for Your Terminal&lt;/a&gt; — terminal-first complement to Bolt’s browser UI&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/best-free-ai-apis-2026/" rel="noopener noreferrer"&gt;10 Best Free AI APIs in 2026: The Ultimate Comparison&lt;/a&gt; — pick a free model to pair with bolt.diy&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/openrouter-free-ai-models/" rel="noopener noreferrer"&gt;OpenRouter: Access 300+ Free AI Models with One API Key&lt;/a&gt; — the easiest single key for bolt.diy or downstream Cline&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/supabase-vs-neon-free-postgresql-2026/" rel="noopener noreferrer"&gt;Supabase vs Neon: Which Free PostgreSQL Database Should You Use in 2026?&lt;/a&gt; — the natural backend for Bolt.new apps&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://toolfreebie.com/bolt-new-free-app-builder/" rel="noopener noreferrer"&gt;toolfreebie.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
    </item>
    <item>
      <title>Aider: Free Open-Source AI Coding Agent for Your Terminal</title>
      <dc:creator>toolfreebie</dc:creator>
      <pubDate>Thu, 28 May 2026 08:29:46 +0000</pubDate>
      <link>https://dev.to/build996/aider-free-open-source-ai-coding-agent-for-your-terminal-4dbd</link>
      <guid>https://dev.to/build996/aider-free-open-source-ai-coding-agent-for-your-terminal-4dbd</guid>
      <description>&lt;h2&gt;
  
  
  What Is Aider?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://aider.chat" rel="noopener noreferrer"&gt;Aider&lt;/a&gt; is a free, open-source AI pair programmer that runs entirely in your terminal. You launch it inside a Git repository, point it at any LLM, and from then on every line of code you change goes through a conversation: you describe what you want in plain English, Aider edits the files, runs your tests if you ask it to, and auto-commits each change with a sensible message — all without leaving the shell.&lt;/p&gt;

&lt;p&gt;Where editor-based agents like &lt;a href="https://toolfreebie.com/cline-vscode-ai-agent/" rel="noopener noreferrer"&gt;Cline&lt;/a&gt;, &lt;a href="https://www.cursor.com/" rel="noopener noreferrer"&gt;Cursor&lt;/a&gt;, and &lt;a href="https://codeium.com/windsurf" rel="noopener noreferrer"&gt;Windsurf&lt;/a&gt; wrap the experience in panels and buttons, Aider keeps everything text. That sounds primitive until you sit with it for a day. The terminal-first design means Aider plays nicely with tmux, SSH sessions on remote boxes, vim/emacs, and the kind of multi-window workflow that senior engineers refuse to give up. There is no extension to install, no editor lock-in, no IDE quirks — just a single Python package and your repo.&lt;/p&gt;

&lt;p&gt;Aider has been quietly maintained since 2023 by Paul Gauthier and a healthy contributor base on &lt;a href="https://github.com/Aider-AI/aider" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; (Apache 2.0 license, tens of thousands of stars). It is one of the few AI coding tools that publishes a real, reproducible benchmark, the Aider Polyglot, and re-runs it every time a major model lands. That alone is worth the price of admission. And the price is zero.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Aider Is Different from Cline, Cursor, and Copilot
&lt;/h2&gt;

&lt;p&gt;The popular AI coding tools all sit on the same backbone — an LLM that reads files and proposes edits — but they differ in three big ways: &lt;strong&gt;where they live&lt;/strong&gt;, &lt;strong&gt;how they edit&lt;/strong&gt;, and &lt;strong&gt;how they think about Git&lt;/strong&gt;. Aider’s choices on all three are unusual.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Aider&lt;/th&gt;
&lt;th&gt;&lt;a href="https://toolfreebie.com/cline-vscode-ai-agent/" rel="noopener noreferrer"&gt;Cline&lt;/a&gt;&lt;/th&gt;
&lt;th&gt;Cursor&lt;/th&gt;
&lt;th&gt;GitHub Copilot&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Price&lt;/td&gt;
&lt;td&gt;Free (BYOK)&lt;/td&gt;
&lt;td&gt;Free (BYOK)&lt;/td&gt;
&lt;td&gt;$20/mo Pro&lt;/td&gt;
&lt;td&gt;$10/mo Individual&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Surface&lt;/td&gt;
&lt;td&gt;Terminal CLI&lt;/td&gt;
&lt;td&gt;VS Code extension&lt;/td&gt;
&lt;td&gt;Forked VS Code (separate app)&lt;/td&gt;
&lt;td&gt;Editor plugin&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Open source&lt;/td&gt;
&lt;td&gt;Yes (Apache 2.0)&lt;/td&gt;
&lt;td&gt;Yes (Apache 2.0)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Choose your model&lt;/td&gt;
&lt;td&gt;Any model via LiteLLM (100+ providers)&lt;/td&gt;
&lt;td&gt;15+ providers&lt;/td&gt;
&lt;td&gt;Cursor-managed&lt;/td&gt;
&lt;td&gt;Mostly OpenAI, some Claude&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Free model option&lt;/td&gt;
&lt;td&gt;Yes — pair with any free API or Ollama&lt;/td&gt;
&lt;td&gt;Yes — pair with any free API&lt;/td&gt;
&lt;td&gt;Limited free tier&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Git auto-commit&lt;/td&gt;
&lt;td&gt;Yes (per change, with semantic message)&lt;/td&gt;
&lt;td&gt;Optional&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Repo-wide context map&lt;/td&gt;
&lt;td&gt;Yes (tree-sitter, ranks symbols by relevance)&lt;/td&gt;
&lt;td&gt;Workspace search&lt;/td&gt;
&lt;td&gt;Codebase index&lt;/td&gt;
&lt;td&gt;Workspace search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multiple edit formats&lt;/td&gt;
&lt;td&gt;Yes (whole / diff / udiff / search-replace)&lt;/td&gt;
&lt;td&gt;One (search-replace)&lt;/td&gt;
&lt;td&gt;Internal&lt;/td&gt;
&lt;td&gt;Internal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Architect/Editor split&lt;/td&gt;
&lt;td&gt;Yes (different model for plan vs apply)&lt;/td&gt;
&lt;td&gt;Yes (Plan/Act with one model)&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Voice input&lt;/td&gt;
&lt;td&gt;Yes (Whisper)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Web URL ingestion&lt;/td&gt;
&lt;td&gt;Yes (&lt;code&gt;/web&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Yes (built-in browser)&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The headline takeaway: if you live in an editor, Cline and Cursor are the natural fit. If you live in a terminal — or if you regularly SSH into remote machines, work in tmux, pair with vim/emacs/nano, or run on a server with no GUI — Aider is the only tool in this category that was designed for that workflow from day one.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Aider Polyglot Benchmark: A Public Number You Can Trust
&lt;/h2&gt;

&lt;p&gt;Most AI coding tools publish marketing copy. Aider publishes a benchmark — and not a synthetic one. The &lt;a href="https://aider.chat/docs/leaderboards/" rel="noopener noreferrer"&gt;Aider Polyglot leaderboard&lt;/a&gt; tests every major model against 225 hand-curated coding exercises across &lt;strong&gt;C++, Go, Java, JavaScript, Python, and Rust&lt;/strong&gt;. Each exercise has hidden unit tests; a model passes only when its edits make the tests go green, with at most one self-correction round.&lt;/p&gt;

&lt;p&gt;What makes this benchmark useful, beyond being public and reproducible:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It tests the full pipeline — reading a problem statement, locating the right files, producing a syntactically valid edit, getting the diff format right, and writing code that compiles and passes tests. A model that “knows” the answer but botches the diff format scores zero, exactly like in real life.&lt;/li&gt;
&lt;li&gt;It is multi-language. A model that writes elegant Python and falls apart on Rust will not top this leaderboard, even though it might top a Python-only one.&lt;/li&gt;
&lt;li&gt;Aider re-runs it for every notable model release. The leaderboard you read today is not the one you read six months ago.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The practical use is choosing a model. Before the polyglot existed, “is GPT-4o better than Claude for refactoring Go?” was a vibe argument. Now you check the table. The leaderboard also distinguishes between &lt;strong&gt;edit-format pass rate&lt;/strong&gt; (did the model produce a syntactically applicable diff?) and &lt;strong&gt;final pass rate&lt;/strong&gt; (did the test go green?). Models that score high on the first but low on the second are confidently wrong. The opposite — high final, low edit-format — almost never happens, because if your edits are mangled they cannot pass.&lt;/p&gt;
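&lt;p&gt;To make the two columns concrete, here is a minimal sketch of how the two rates relate. It is a simplification, not the leaderboard’s actual scoring code: the &lt;code&gt;Attempt&lt;/code&gt; record, the &lt;code&gt;pass_rates&lt;/code&gt; helper, and the four sample outcomes are all hypothetical.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    exercise: str
    edit_applied: bool   # the model's edit parsed and applied cleanly
    tests_passed: bool   # the unit tests went green afterwards


def pass_rates(attempts):
    """Return (edit-format rate, final pass rate) as percentages."""
    total = len(attempts)
    if total == 0:
        return 0.0, 0.0
    well_formed = sum(1 for a in attempts if a.edit_applied)
    # A mangled edit cannot pass, so tests only count when the edit applied.
    passed = sum(1 for a in attempts if a.edit_applied and a.tests_passed)
    return 100.0 * well_formed / total, 100.0 * passed / total


# Hypothetical outcomes, for illustration only.
attempts = [
    Attempt("two-fer", True, True),
    Attempt("bowling", True, False),        # clean diff, wrong code
    Attempt("zebra-puzzle", False, False),  # diff could not be applied
    Attempt("forth", True, True),
]
edit_rate, final_rate = pass_rates(attempts)
print(f"edit-format: {edit_rate:.0f}%  final: {final_rate:.0f}%")
```

&lt;p&gt;In this toy data the gap between the two numbers comes entirely from the second attempt: a clean diff that still fails the tests.&lt;/p&gt;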

&lt;p&gt;If you are choosing a free model to pair with Aider, head to the leaderboard, sort by the latest column, and pick the highest-ranked model that has a free tier. As of 2026, the strongest options that fit that description include DeepSeek’s reasoning models, Gemini 2.5 Pro on Google AI Studio, and Llama 3.3 70B on free providers like &lt;a href="https://toolfreebie.com/groq-fastest-free-ai-api/" rel="noopener noreferrer"&gt;Groq&lt;/a&gt; or &lt;a href="https://toolfreebie.com/together-ai-free-api-llama-deepseek-flux-2026/" rel="noopener noreferrer"&gt;Together AI&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Five Features That Make Aider Worth Your Terminal Time
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. The Repo-Map: Context Without Token Bankruptcy
&lt;/h3&gt;

&lt;p&gt;The naive way to give an LLM “context” about your repo is to dump every file into the prompt. That approach pays for the whole repo in tokens on every request and stops working once the repo outgrows the model’s context window. Aider’s repo-map instead uses &lt;a href="https://tree-sitter.github.io/tree-sitter/" rel="noopener noreferrer"&gt;tree-sitter&lt;/a&gt; to parse every source file in your repo and extract a structured summary: class names, function signatures, top-level constants, exported types. It then ranks those symbols by relevance to the current chat using a PageRank-style graph over identifier references.&lt;/p&gt;

&lt;p&gt;The result is a compact map — a few hundred lines for most projects — that gives the LLM enough scaffolding to ask for the right files. You did not paste anything. Aider figured it out from the AST.&lt;/p&gt;
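&lt;p&gt;The ranking idea can be sketched in a few lines. This is an illustration under loud assumptions, not Aider’s implementation: it swaps tree-sitter for Python’s built-in &lt;code&gt;ast&lt;/code&gt; module (so it only understands Python), ranks whole files rather than individual symbols, and uses a plain power-iteration PageRank. The file names and contents are hypothetical.&lt;/p&gt;

```python
import ast
from collections import defaultdict


def definitions_and_references(source):
    """Return (defined names, referenced names) for one Python source string."""
    tree = ast.parse(source)
    defined, referenced = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.Name):
            referenced.add(node.id)
        elif isinstance(node, ast.Attribute):
            referenced.add(node.attr)
    return defined, referenced


def rank_files(sources, damping=0.85, iterations=50):
    """Rank files by how much other files reference the names they define."""
    info = {path: definitions_and_references(src) for path, src in sources.items()}
    # Edge from A to B when file A references a name that file B defines.
    links = defaultdict(set)
    for a, (_, refs) in info.items():
        for b, (defs, _) in info.items():
            if a != b and refs.intersection(defs):
                links[a].add(b)
    n = len(sources)
    rank = {path: 1.0 / n for path in sources}
    for _ in range(iterations):
        new = {path: (1.0 - damping) / n for path in sources}
        for a in sources:
            # Files with no outgoing edges spread their rank evenly.
            targets = links[a] or set(sources)
            share = damping * rank[a] / len(targets)
            for b in targets:
                new[b] += share
        rank = new
    return sorted(rank, key=rank.get, reverse=True)


# Hypothetical four-file project.
sources = {
    "db.py": "def get_session():\n    return object()\n",
    "models.py": "class User:\n    pass\n",
    "main.py": (
        "from db import get_session\nfrom models import User\n\n"
        "def handler():\n    return get_session(), User()\n"
    ),
    "cli.py": "from db import get_session\n\ndef run():\n    return get_session()\n",
}
ranked = rank_files(sources)
print(ranked)
```

&lt;p&gt;Files that many others depend on float to the top, which is why a map built this way tends to surface shared helpers before leaf scripts.&lt;/p&gt;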

&lt;p&gt;You can see the current map any time with &lt;code&gt;/map&lt;/code&gt;, and tune the size cap with &lt;code&gt;--map-tokens&lt;/code&gt;. For repos under 100K lines this works astoundingly well. For monorepos beyond that, combine it with &lt;code&gt;.aiderignore&lt;/code&gt; (same syntax as &lt;code&gt;.gitignore&lt;/code&gt;) to scope the agent to a single package.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Multiple Edit Formats: Pick the One Your Model Is Best At
&lt;/h3&gt;

&lt;p&gt;Most agents use one edit format. Aider supports four, because different models reliably do different ones better:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;whole&lt;/strong&gt;: the model returns the complete new file. Highest token cost, lowest failure rate. Good for small files and weaker models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;diff&lt;/strong&gt;: the model returns SEARCH/REPLACE blocks. Moderate token cost, fragile if the model botches whitespace.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;udiff&lt;/strong&gt;: the model returns hunks in a simplified unified-diff format. Compact and familiar to anyone who reads patches; some models handle it very well.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;diff-fenced&lt;/strong&gt;: the SEARCH/REPLACE format with the file path moved inside the code fence. Aimed at models that trip over the standard fencing of the diff format.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Aider auto-picks the right format per model based on benchmark history, but you can override with &lt;code&gt;--edit-format&lt;/code&gt;. If you have an unusual model and edits keep failing to apply, switching the format often fixes it.&lt;/p&gt;
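&lt;p&gt;Why is the diff format “fragile if the model botches whitespace”? A minimal sketch of applying one SEARCH/REPLACE edit shows it. This is not Aider’s code, and the exactly-one-match rule is an assumption made for the illustration; &lt;code&gt;apply_search_replace&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
def apply_search_replace(original, search, replace):
    """Apply one SEARCH/REPLACE edit; fail loudly unless SEARCH matches exactly once."""
    count = original.count(search)
    if count != 1:
        raise ValueError(f"SEARCH text matched {count} times; expected exactly 1")
    return original.replace(search, replace, 1)


source = "def greet(name):\n    return 'Hello ' + name\n"

# The SEARCH text matches the file byte for byte, so the edit applies.
edited = apply_search_replace(
    source,
    search="    return 'Hello ' + name\n",
    replace="    return f'Hello, {name}!'\n",
)
print(edited)

# A whitespace slip (a tab instead of four spaces) makes the edit unappliable.
try:
    apply_search_replace(source, "\treturn 'Hello ' + name\n", "    pass\n")
except ValueError as err:
    print("rejected:", err)
```

&lt;p&gt;The &lt;strong&gt;whole&lt;/strong&gt; format sidesteps this failure mode because there is nothing to match against, at the price of resending the full file.&lt;/p&gt;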

&lt;h3&gt;
  
  
  3. Architect Mode: Two Models, Each Doing What It Is Good At
&lt;/h3&gt;

&lt;p&gt;Frontier reasoning models — DeepSeek R1, OpenAI o-series, Gemini 2.5 Pro thinking — are excellent at planning and weak at producing pristine edit formats. Cheap fast models — Llama 3.3 70B, Gemini 2.0 Flash, GPT-4.1 Mini — are the reverse. Aider’s architect mode lets you assign a model to each role:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aider &lt;span class="nt"&gt;--architect&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model&lt;/span&gt; openrouter/deepseek/deepseek-r1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--editor-model&lt;/span&gt; openrouter/anthropic/claude-3.5-sonnet
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The architect produces a plan in natural language. The editor takes that plan plus the relevant files and emits the actual SEARCH/REPLACE blocks. In the benchmark results Aider has published for this mode, several architect/editor pairings scored higher than the same models working alone, though the size of the gain varies by model pair.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Auto-Commit With Real Commit Messages
&lt;/h3&gt;

&lt;p&gt;Aider commits after every accepted change, with a one-line message generated from the diff. The first time you run it your &lt;code&gt;git log&lt;/code&gt; looks like a real engineer worked on the repo all day. This sounds cosmetic until your first time using &lt;code&gt;git bisect&lt;/code&gt; on AI-generated code — granular commits with semantic messages turn a debugging nightmare into a five-minute regression hunt.&lt;/p&gt;

&lt;p&gt;Don’t want auto-commits? Pass &lt;code&gt;--no-auto-commits&lt;/code&gt; and Aider leaves its edits uncommitted in your working tree for you to review and commit yourself. Don’t have a Git repo? Aider will offer to &lt;code&gt;git init&lt;/code&gt; for you on first run, since the entire workflow assumes one.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. &lt;code&gt;/web&lt;/code&gt;: Drop a URL Into the Chat
&lt;/h3&gt;

&lt;p&gt;One of Aider’s most underrated commands. Type &lt;code&gt;/web https://docs.example.com/api&lt;/code&gt; and Aider scrapes the page, converts it to clean Markdown, and adds it to the chat as context. Now the LLM has the actual API reference for the library you are using, not the version it remembers from training. This eliminates a huge category of stale-knowledge bugs without you needing to set up a separate RAG pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Install Aider
&lt;/h2&gt;

&lt;p&gt;Aider is a Python package. The official one-liner installer handles the Python version and dependency isolation for you:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; pip &lt;span class="nb"&gt;install &lt;/span&gt;aider-install
aider-install
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That installs Aider into an isolated environment using &lt;a href="https://docs.astral.sh/uv/" rel="noopener noreferrer"&gt;uv&lt;/a&gt; and adds it to your PATH. If you prefer to manage your own venv, the older method still works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pipx &lt;span class="nb"&gt;install &lt;/span&gt;aider-chat
&lt;span class="c"&gt;# or in a venv:&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;aider-chat
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify the install:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aider &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You also need an API key for whatever LLM provider you plan to use. Aider reads keys from environment variables and from a &lt;code&gt;.env&lt;/code&gt; file in your project. The most common setup looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENROUTER_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-or-v1-...
&lt;span class="c"&gt;# or&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;GEMINI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;AIza...
&lt;span class="c"&gt;# or&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;DEEPSEEK_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is the entire install. &lt;code&gt;cd&lt;/code&gt; into a Git repo, run &lt;code&gt;aider&lt;/code&gt;, start typing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Connecting Aider to Free LLM APIs
&lt;/h2&gt;

&lt;p&gt;Aider speaks &lt;a href="https://docs.litellm.ai/" rel="noopener noreferrer"&gt;LiteLLM&lt;/a&gt; under the hood, which means it supports virtually every provider with one consistent &lt;code&gt;--model&lt;/code&gt; flag. Four practical setups in 2026, three of them free and one nearly so:&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 1: Google Gemini Free Tier (Most Generous)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://toolfreebie.com/gemini-free-ai-api/" rel="noopener noreferrer"&gt;Gemini’s free tier&lt;/a&gt; on Google AI Studio gives you Gemini 2.5 Pro with very generous request limits and a 1M-token context window, large enough to throw entire codebases at it. Set the key and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;GEMINI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;AIza...
aider &lt;span class="nt"&gt;--model&lt;/span&gt; gemini/gemini-2.5-pro
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For most personal projects, Gemini’s free tier alone keeps Aider running indefinitely with no card on file.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 2: OpenRouter Free Models
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://toolfreebie.com/openrouter-free-ai-models/" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt; aggregates dozens of providers behind one OpenAI-compatible endpoint and exposes a tier of &lt;code&gt;:free&lt;/code&gt; models you can call without spending credits.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENROUTER_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-or-v1-...
aider &lt;span class="nt"&gt;--model&lt;/span&gt; openrouter/deepseek/deepseek-r1:free
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Rate limits are tighter than Gemini’s, but the variety is unmatched. Worth keeping a key around just for fallback when other free tiers throttle you.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 3: Ollama (Fully Local, Truly Offline)
&lt;/h3&gt;

&lt;p&gt;If you have a reasonably modern laptop and want zero cloud calls, &lt;a href="https://toolfreebie.com/ollama-run-ai-locally/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; runs models on your own GPU.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull qwen2.5-coder:14b
aider &lt;span class="nt"&gt;--model&lt;/span&gt; ollama_chat/qwen2.5-coder:14b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Quality drops below frontier models, but for boilerplate, file renames, and small refactors, a local 14B coder model is good enough. And no tokens leave your machine — useful for codebases under NDA.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 4: DeepSeek Direct (Cheapest Frontier)
&lt;/h3&gt;

&lt;p&gt;DeepSeek’s API offers near-frontier quality on its V3 chat model and reasoning quality from R1, with off-peak discount pricing that often makes it the cheapest paid option in the category. Their &lt;a href="https://toolfreebie.com/deepseek-r1-free-api/" rel="noopener noreferrer"&gt;free trial credits&lt;/a&gt; are enough to evaluate, and routine use rarely costs more than a coffee per month.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;DEEPSEEK_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-...
aider &lt;span class="nt"&gt;--model&lt;/span&gt; deepseek/deepseek-chat
&lt;span class="c"&gt;# or for the reasoning model:&lt;/span&gt;
aider &lt;span class="nt"&gt;--model&lt;/span&gt; deepseek/deepseek-reasoner
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Mixed Strategy: Architect Free, Editor Free
&lt;/h3&gt;

&lt;p&gt;The architect mode shines when the two models live on different free tiers. A common 2026 pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aider &lt;span class="nt"&gt;--architect&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model&lt;/span&gt; gemini/gemini-2.5-pro &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--editor-model&lt;/span&gt; openrouter/meta-llama/llama-3.3-70b-instruct:free
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Gemini’s reasoning power for the plan, OpenRouter’s free Llama for the mechanical edit. Total cost: zero.&lt;/p&gt;

&lt;h2&gt;
  
  
  A First Real Task: From Prompt to Green Tests
&lt;/h2&gt;

&lt;p&gt;To make this concrete, here is a representative Aider session adding a feature to a Python web app. The user types one prompt; Aider does everything else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt:&lt;/strong&gt; &lt;em&gt;“Add a /healthz endpoint to the FastAPI app that returns 200 with the build commit SHA from an env var, plus a Postgres ping. Add a pytest test.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The session, abbreviated:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aider &lt;span class="nt"&gt;--model&lt;/span&gt; gemini/gemini-2.5-pro
&lt;span class="go"&gt;Aider v0.64.0
Main model: gemini/gemini-2.5-pro
Repo: 47 files, 8,200 LOC
Repo-map: 412 tokens

&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Add a /healthz endpoint to the FastAPI app that returns 200 with
&lt;span class="go"&gt;  the build commit SHA from an env var, plus a Postgres ping.
  Add a pytest test.

I'll need to look at the FastAPI app entry point and your DB session helper.
Please add these files to the chat:
  - app/main.py
  - app/db.py

&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;/add app/main.py app/db.py
&lt;span class="go"&gt;
Files added (1,420 tokens).

Plan:
  1. Add a Pydantic model HealthzResponse in app/main.py
  2. Add an async route GET /healthz that:
     - reads BUILD_COMMIT_SHA env var (default 'unknown')
     - executes SELECT 1 on the DB session, returns ok/fail
  3. Add tests/test_healthz.py with two cases (healthy, db down via mock)

Apply changes? [y/n] y

Editing app/main.py
Editing tests/test_healthz.py (new file)

Commit f4a2b81: feat: add /healthz with build SHA and Postgres ping

&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;/run pytest tests/test_healthz.py &lt;span class="nt"&gt;-q&lt;/span&gt;
&lt;span class="go"&gt;
..                                                  [100%]
2 passed in 0.34s

Add this output to the chat? [y/n] y

&lt;/span&gt;&lt;span class="gp"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Total time: under a minute on a fast model. Two files changed, two tests passing, one commit, zero buttons clicked. The same workflow over SSH on a remote dev box, in a tmux pane, on a Chromebook — Aider does not care.&lt;/p&gt;

&lt;h2&gt;
  
  
  Production Tips: Don’t Burn Tokens
&lt;/h2&gt;

&lt;p&gt;Free tiers and frontier models both have limits. A few habits keep Aider economical without giving up much:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use &lt;code&gt;/drop&lt;/code&gt; aggressively.&lt;/strong&gt; Files stay in the chat context until you remove them. After Aider edits a file, you usually do not need it in the next prompt — drop it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add a &lt;code&gt;.aiderignore&lt;/code&gt;&lt;/strong&gt; for the parts of your repo that are noise — &lt;code&gt;node_modules&lt;/code&gt;, generated code, vendored deps, large fixtures. The repo-map respects it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cap the map.&lt;/strong&gt; &lt;code&gt;--map-tokens 1024&lt;/code&gt; is plenty for most repos and slashes per-prompt cost.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use &lt;code&gt;/clear&lt;/code&gt; between unrelated tasks&lt;/strong&gt; to flush the conversation history. Multi-thousand-token chat history follows you to every prompt; clearing it can halve token cost.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reach for &lt;code&gt;--weak-model&lt;/code&gt;&lt;/strong&gt; for commit messages and summarization. Aider already uses a small model by default for those side tasks; you can point this at an even cheaper one (&lt;code&gt;gemini/gemini-2.0-flash&lt;/code&gt;, free) to save more.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-test only at task boundaries.&lt;/strong&gt; &lt;code&gt;--auto-test&lt;/code&gt; runs your test suite after every edit. On large suites that adds up. Run tests manually with &lt;code&gt;/run pytest&lt;/code&gt; when you want.&lt;/li&gt;
&lt;/ul&gt;
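&lt;p&gt;The effect of &lt;code&gt;/clear&lt;/code&gt; is easy to estimate with back-of-the-envelope arithmetic. The sketch below is a rough model rather than a measurement: it assumes every request resends a fixed context (repo-map plus added files) and all chat history since the last clear, and the token counts are hypothetical.&lt;/p&gt;

```python
def total_input_tokens(turn_sizes, fixed_context, clear_before=()):
    """Sum the input tokens sent across a session.

    Each request carries the fixed context plus all chat history
    accumulated since the last clear, plus the new turn itself.
    """
    total, history = 0, 0
    for i, size in enumerate(turn_sizes):
        if i in clear_before:
            history = 0      # models running /clear before this turn
        total += fixed_context + history + size
        history += size      # this turn joins the history for later requests
    return total


turns = [500] * 10           # ten turns of 500 tokens each (hypothetical)
without_clear = total_input_tokens(turns, fixed_context=1000)
with_clear = total_input_tokens(turns, fixed_context=1000, clear_before={5})
print(without_clear, with_clear)
```

&lt;p&gt;Under these assumptions a single clear at the midpoint cuts input tokens by a third, and the saving grows with session length because history cost accumulates quadratically.&lt;/p&gt;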

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is Aider really free?
&lt;/h3&gt;

&lt;p&gt;The Aider tool itself is free and open-source under Apache 2.0. The model calls go through whichever LLM provider you point it at — and you can absolutely run it for $0/month using free providers (Gemini, OpenRouter free tier, Ollama local). The only thing you pay for, optionally, is upgrading to a paid model when free-tier rate limits start to slow you down.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does Aider work without Git?
&lt;/h3&gt;

&lt;p&gt;Technically yes, with &lt;code&gt;--no-git&lt;/code&gt;, but you give up auto-commit, the diff-aware repo-map, and easy rollback. On a fresh project Aider will offer to &lt;code&gt;git init&lt;/code&gt; for you, which takes one keypress. Just let it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can Aider run code or commands?
&lt;/h3&gt;

&lt;p&gt;Yes. The &lt;code&gt;/run&lt;/code&gt; command runs an arbitrary shell command and offers to add the output to the chat, which suits tests, linters, or a brief dev-server start to check for startup errors. Unlike fully autonomous agents, Aider does not decide to run commands by itself: you trigger them, or you opt in ahead of time with a flag such as &lt;code&gt;--auto-test&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is Aider an autonomous agent like Cline’s Act mode?
&lt;/h3&gt;

&lt;p&gt;No, and that is a deliberate choice. Aider treats you as the loop: it proposes edits, you approve, it commits, you run tests, you describe the next step. There is no “go solve this issue end-to-end without me watching” mode. For codebases where every line matters, that is a feature. If you want fully autonomous “fix this issue” workflows, pair Aider with &lt;a href="https://toolfreebie.com/cline-vscode-ai-agent/" rel="noopener noreferrer"&gt;Cline&lt;/a&gt; or use a dedicated agent framework.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use Aider with my company’s private model gateway?
&lt;/h3&gt;

&lt;p&gt;Yes: anything LiteLLM speaks, Aider speaks. For an OpenAI-compatible gateway, set &lt;code&gt;OPENAI_API_BASE&lt;/code&gt; to the gateway URL and &lt;code&gt;OPENAI_API_KEY&lt;/code&gt; to the key it expects, then pass &lt;code&gt;--model openai/your-model-id&lt;/code&gt;. This makes Aider one of the most enterprise-friendly tools in the category.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does Aider support MCP?
&lt;/h3&gt;

&lt;p&gt;As of mid-2026 Aider’s primary tools are file editing, &lt;code&gt;/run&lt;/code&gt;, and &lt;code&gt;/web&lt;/code&gt;. &lt;a href="https://toolfreebie.com/mcp-protocol-ai-agents/" rel="noopener noreferrer"&gt;MCP&lt;/a&gt; integration is on the roadmap and discussed in active issues, but the design philosophy — minimal core, scriptable shell — means a lot of what MCP servers provide is achievable today via &lt;code&gt;/run&lt;/code&gt; and shell pipes.&lt;/p&gt;

&lt;h3&gt;
  
  
  What about voice input?
&lt;/h3&gt;

&lt;p&gt;Set &lt;code&gt;OPENAI_API_KEY&lt;/code&gt; and run &lt;code&gt;/voice&lt;/code&gt;. Aider records, transcribes via Whisper, and inserts the transcript as your next prompt. The transcription is the only place Aider phones home outside your chosen LLM, and you can disable it.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Use Aider vs Cline vs Cursor
&lt;/h2&gt;

&lt;p&gt;The three tools cover overlapping ground. A simple decision tree:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;You live in vim / emacs / a terminal multiplexer, or you SSH into remote dev machines:&lt;/strong&gt; Aider. Nothing else in this category respects that workflow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want autonomous, multi-file, multi-step “go fix this issue” execution with browser verification:&lt;/strong&gt; &lt;a href="https://toolfreebie.com/cline-vscode-ai-agent/" rel="noopener noreferrer"&gt;Cline&lt;/a&gt;. Its Plan/Act split and built-in browser are purpose-built for that.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want a managed product with one bill, one model picker, polished UI, and tab-completion-style suggestions in the editor:&lt;/strong&gt; Cursor. You give up cost transparency and model choice; you get less friction.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want Git-aware, surgical edits with full audit trail and the option to bail at any point:&lt;/strong&gt; Aider. The auto-commit + diff-format design optimizes for “I will pair with this; I will not let it run wild.”&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want to run on a free model with zero credit card and zero subscription:&lt;/strong&gt; Aider or Cline both work. Pick by surface (terminal vs editor).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Many serious developers in 2026 use both: Aider for terminal-native pairing on personal projects and remote boxes, Cline (or Cursor) for autonomous IDE work on the day job. They are not enemies; they fit different parts of the day.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pairing Aider With Free APIs: Cost Reality
&lt;/h2&gt;

&lt;p&gt;Concrete monthly costs for an engineer using an AI coding pair for, say, 60 hours a month:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setup&lt;/th&gt;
&lt;th&gt;Monthly Cost&lt;/th&gt;
&lt;th&gt;Model Quality&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Aider + Gemini 2.5 Pro free tier&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;Frontier&lt;/td&gt;
&lt;td&gt;Hits rate limits at heavy usage; works for most solo projects&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aider + OpenRouter free models&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;Strong open-source&lt;/td&gt;
&lt;td&gt;Tighter limits, huge model variety, easy fallback&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aider + Ollama Qwen2.5-Coder 14B local&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;Mid&lt;/td&gt;
&lt;td&gt;Fully offline, no rate limits, requires GPU&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aider + DeepSeek V3 (paid, off-peak)&lt;/td&gt;
&lt;td&gt;~$2–8&lt;/td&gt;
&lt;td&gt;Frontier&lt;/td&gt;
&lt;td&gt;Cheapest paid frontier; pay only for usage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aider + Anthropic Claude Sonnet (paid)&lt;/td&gt;
&lt;td&gt;~$10–40&lt;/td&gt;
&lt;td&gt;Frontier&lt;/td&gt;
&lt;td&gt;Top of polyglot leaderboard most months&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cursor Pro (subscription)&lt;/td&gt;
&lt;td&gt;$20&lt;/td&gt;
&lt;td&gt;Mixed&lt;/td&gt;
&lt;td&gt;Predictable bill, fewer choices&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GitHub Copilot Individual&lt;/td&gt;
&lt;td&gt;$10&lt;/td&gt;
&lt;td&gt;Mid&lt;/td&gt;
&lt;td&gt;No autonomous mode&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The first three rows are what make Aider compelling. A serious engineering workflow at zero dollars a month is not a trick; it is just a matter of pairing a free tool with a free API.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Reads
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/cline-vscode-ai-agent/" rel="noopener noreferrer"&gt;Cline: Free Open-Source AI Coding Agent for VS Code (Cursor Alternative)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/crewai-vs-autogpt-vs-langgraph/" rel="noopener noreferrer"&gt;CrewAI vs AutoGPT vs LangGraph: Which Free Agent Framework Should You Use in 2026?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/n8n-workflow-automation/" rel="noopener noreferrer"&gt;n8n: Open-Source Workflow Automation with AI Agents and 400+ Integrations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/mcp-protocol-ai-agents/" rel="noopener noreferrer"&gt;MCP (Model Context Protocol): Connect AI Agents to Any Tool or API&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/bolt-new-free-app-builder/" rel="noopener noreferrer"&gt;Bolt.new: Free AI App Builder That Codes, Runs, and Deploys in Your Browser&lt;/a&gt; — the in-browser counterpart to Aider&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/notebooklm-ai-research/" rel="noopener noreferrer"&gt;Google NotebookLM: Free AI Research Tool for Summarizing Documents and PDFs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For developers who want the same prompt-driven workflow but without ever leaving the browser, the natural pairing is &lt;a href="https://toolfreebie.com/bolt-new-free-app-builder/" rel="noopener noreferrer"&gt;Bolt.new&lt;/a&gt; — it scaffolds a runnable app in a single tab via WebContainer, then you can push to GitHub and continue refining locally with Aider against any free API key.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Aider is the tool you reach for when you have opinions about your code. The terminal-first, Git-aware, edit-format-conscious design assumes you are going to read every diff, that you care which commits show up in &lt;code&gt;git log&lt;/code&gt;, and that the LLM is your assistant rather than your replacement. For a certain kind of engineer — and a certain kind of repo — that is exactly the right contract.&lt;/p&gt;

&lt;p&gt;It is also the most straightforwardly free serious AI coding tool in 2026. No subscription, no enterprise upsell, no credit card screen. Pip install it, point it at &lt;a href="https://toolfreebie.com/gemini-free-ai-api/" rel="noopener noreferrer"&gt;Gemini’s free tier&lt;/a&gt; or &lt;a href="https://toolfreebie.com/ollama-run-ai-locally/" rel="noopener noreferrer"&gt;a local Ollama model&lt;/a&gt;, and start pairing. The whole thing takes ten minutes from &lt;code&gt;pip install&lt;/code&gt; to first commit, and the productivity ceiling is as high as the model you point it at.&lt;/p&gt;

&lt;p&gt;If you have not tried a terminal-native AI pair programmer before, give Aider an afternoon on a small side project. Either it will fit your workflow perfectly and replace half your editor extensions, or it will not — and you will know within a few hours which camp you are in. There is no lock-in, no sunk cost, and no reason not to try.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Reads
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/free-ai-coding-assistants/" rel="noopener noreferrer"&gt;5 Free AI Coding Assistants for VS Code &amp;amp; Terminal&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/ollama-run-ai-locally/" rel="noopener noreferrer"&gt;Ollama: Run AI Models Locally for Free&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/openrouter-free-ai-models/" rel="noopener noreferrer"&gt;OpenRouter: Access 300+ Free AI Models with One API Key&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://toolfreebie.com/aider-free-ai-coding-cli/" rel="noopener noreferrer"&gt;toolfreebie.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
    </item>
    <item>
      <title>Cline: Free Open-Source AI Coding Agent for VS Code (Cursor Alternative)</title>
      <dc:creator>toolfreebie</dc:creator>
      <pubDate>Thu, 28 May 2026 08:24:18 +0000</pubDate>
      <link>https://dev.to/build996/cline-free-open-source-ai-coding-agent-for-vs-code-cursor-alternative-36lc</link>
      <guid>https://dev.to/build996/cline-free-open-source-ai-coding-agent-for-vs-code-cursor-alternative-36lc</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flt7x3vg7ctf3avdtm9yl.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flt7x3vg7ctf3avdtm9yl.jpg" alt="Cline VS Code AI Agent" width="800" height="534"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Cline?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://cline.bot" rel="noopener noreferrer"&gt;Cline&lt;/a&gt; is a free, open-source AI coding agent that lives inside VS Code. Originally released as “Claude Dev” in 2024 and renamed to Cline in late 2024, the project has grown into one of the most popular autonomous coding assistants on the OpenVSX and VS Code marketplaces, with over a million installs as of 2026 and a GitHub repo that regularly sits near the top of the trending charts (&lt;a href="https://github.com/cline/cline" rel="noopener noreferrer"&gt;cline/cline&lt;/a&gt;, Apache 2.0).&lt;/p&gt;

&lt;p&gt;Where tools like GitHub Copilot give you single-line completions and chat boxes, Cline does the whole task: it reads your repo, plans the change, edits multiple files, runs the terminal, opens a browser to verify, and waits for your approval at every irreversible step. It’s the same shape of agent you get from &lt;a href="https://www.cursor.com/" rel="noopener noreferrer"&gt;Cursor&lt;/a&gt; or &lt;a href="https://codeium.com/windsurf" rel="noopener noreferrer"&gt;Windsurf&lt;/a&gt;, except it costs nothing to install, runs against any model you point it at, and the extension itself is open-source code you can read line-by-line.&lt;/p&gt;

&lt;p&gt;The catch — and the reason it pairs so well with the free AI APIs covered on this blog — is that Cline is BYOK (bring your own key). The extension is free, but the model calls go through whatever provider you configure. With a free &lt;a href="https://toolfreebie.com/gemini-free-ai-api/" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt;, &lt;a href="https://toolfreebie.com/openrouter-free-ai-models/" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt;, &lt;a href="https://toolfreebie.com/together-ai-free-api-llama-deepseek-flux-2026/" rel="noopener noreferrer"&gt;Together AI&lt;/a&gt;, or &lt;a href="https://toolfreebie.com/ollama-run-ai-locally/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; backend, you can run Cline at zero marginal cost.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cline vs Cursor vs GitHub Copilot
&lt;/h2&gt;

&lt;p&gt;The three tools occupy overlapping but distinct positions. A side-by-side:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Cline&lt;/th&gt;
&lt;th&gt;Cursor&lt;/th&gt;
&lt;th&gt;GitHub Copilot&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Price&lt;/td&gt;
&lt;td&gt;Free (BYOK)&lt;/td&gt;
&lt;td&gt;$20/mo Pro&lt;/td&gt;
&lt;td&gt;$10/mo Individual&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Editor&lt;/td&gt;
&lt;td&gt;VS Code extension&lt;/td&gt;
&lt;td&gt;Forked VS Code (separate app)&lt;/td&gt;
&lt;td&gt;VS Code, JetBrains, others&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Open source&lt;/td&gt;
&lt;td&gt;Yes (Apache 2.0)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Choose your model&lt;/td&gt;
&lt;td&gt;Anthropic, OpenAI, Gemini, DeepSeek, Groq, Together, Ollama, LM Studio, Bedrock, OpenRouter, LiteLLM&lt;/td&gt;
&lt;td&gt;Cursor-managed (mostly Claude/GPT)&lt;/td&gt;
&lt;td&gt;Mostly OpenAI, some Claude&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Free model option&lt;/td&gt;
&lt;td&gt;Yes — pair with any free API&lt;/td&gt;
&lt;td&gt;Limited free tier&lt;/td&gt;
&lt;td&gt;Limited free tier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Autonomous multi-file edits&lt;/td&gt;
&lt;td&gt;Yes (Act mode)&lt;/td&gt;
&lt;td&gt;Yes (Composer / Agent)&lt;/td&gt;
&lt;td&gt;Yes (agent mode)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Terminal execution&lt;/td&gt;
&lt;td&gt;Yes (with approval)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Browser automation&lt;/td&gt;
&lt;td&gt;Yes (built-in)&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;a href="https://toolfreebie.com/mcp-protocol-ai-agents/" rel="noopener noreferrer"&gt;MCP&lt;/a&gt; server support&lt;/td&gt;
&lt;td&gt;Yes (native)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Plan-then-execute mode&lt;/td&gt;
&lt;td&gt;Yes (Plan / Act toggle)&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Token cost tracker per task&lt;/td&gt;
&lt;td&gt;Yes (live, per request)&lt;/td&gt;
&lt;td&gt;No (subscription)&lt;/td&gt;
&lt;td&gt;No (subscription)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The headline trade-off: Cursor and Copilot give you a managed experience and a predictable monthly bill. Cline gives you full transparency over the model, the prompts, and the per-token cost, at the price of wiring up your own API key. For developers who already keep Gemini, OpenRouter, or DeepSeek keys around for other projects, that wiring is a five-minute job.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Features That Matter
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Plan and Act Modes
&lt;/h3&gt;

&lt;p&gt;Cline’s marquee feature in 2026 is the explicit Plan/Act toggle in the input bar. In &lt;strong&gt;Plan mode&lt;/strong&gt;, the model can only read files, search the workspace, and write you a step-by-step proposal — it cannot modify code or run commands. In &lt;strong&gt;Act mode&lt;/strong&gt;, it executes that plan, asking for approval before each tool use it considers irreversible (file writes, terminal commands, browser actions).&lt;/p&gt;

&lt;p&gt;This separation maps directly to how senior engineers actually work: think first, code second. It can also cut wasted tokens: a strong reasoning model in Plan mode produces the plan once, and a cheaper executor model then carries it out.&lt;/p&gt;
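
&lt;p&gt;One way to picture the Plan/Act split is as a permission table keyed by mode. The Python sketch below is a hypothetical model, not Cline’s actual implementation: the tool names, the &lt;code&gt;Mode&lt;/code&gt; enum, and the &lt;code&gt;decide&lt;/code&gt; function are all assumptions made for illustration.&lt;/p&gt;

```python
from enum import Enum


class Mode(Enum):
    PLAN = "plan"
    ACT = "act"


# Hypothetical tool names; Cline's real tool identifiers may differ.
READ_ONLY_TOOLS = {"read_file", "search_files"}
SIDE_EFFECT_TOOLS = {"write_file", "run_command", "browser_action"}


def decide(mode: Mode, tool: str) -> str:
    """Return 'allow', 'ask', or 'deny' for a tool request in the given mode."""
    if tool in READ_ONLY_TOOLS:
        return "allow"  # reading and searching is permitted in both modes
    if tool in SIDE_EFFECT_TOOLS:
        # Plan mode never modifies anything; Act mode asks the user first.
        return "ask" if mode is Mode.ACT else "deny"
    return "deny"  # unknown tools are rejected by default


if __name__ == "__main__":
    for tool in ("read_file", "write_file"):
        print(tool, decide(Mode.PLAN, tool), decide(Mode.ACT, tool))
```

&lt;p&gt;The point of the sketch is the asymmetry: a read-only request gets the same answer in both modes, while anything with side effects is refused outright in Plan mode and gated behind a prompt in Act mode.&lt;/p&gt;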

&lt;h3&gt;
  
  
  2. Bring-Your-Own-Model
&lt;/h3&gt;

&lt;p&gt;The provider dropdown in Cline’s settings is the longest in the category. You can route the same conversation through any of: Anthropic (Claude 3.7/4 Sonnet, Opus), OpenAI (GPT-4o, GPT-4.1, o-series), Google Gemini, DeepSeek, Groq, Together AI, Mistral, OpenRouter, AWS Bedrock, GCP Vertex AI, Azure OpenAI, OpenAI-compatible local servers (Ollama, LM Studio, llama.cpp, vLLM), and LiteLLM proxies.&lt;/p&gt;

&lt;p&gt;This matters for two reasons. First, you can pick the cost/quality point that matches the task — a tiny local model for boilerplate, a frontier model for the hard refactor. Second, it future-proofs your workflow: when a new state-of-the-art model lands, you point Cline at it the same day, no waiting for a vendor to integrate it.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Native MCP Support
&lt;/h3&gt;

&lt;p&gt;Cline was one of the earliest agents to ship native support for the &lt;a href="https://toolfreebie.com/mcp-protocol-ai-agents/" rel="noopener noreferrer"&gt;Model Context Protocol&lt;/a&gt;. Any MCP server — file system, GitHub, Postgres, Playwright, Slack, or your own — plugs into Cline’s tool list with no extra wiring. The MCP marketplace inside Cline lists hundreds of community servers you can install in two clicks.&lt;/p&gt;

&lt;p&gt;Practically, this means Cline can do things outside the editor without you teaching it custom tools: query your production-replica Postgres, file a GitHub issue from a stack trace, or drive a Playwright browser to reproduce a bug a user filed.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Browser Automation Built In
&lt;/h3&gt;

&lt;p&gt;Cline ships with a built-in headless browser tool. The agent can open a URL, screenshot the page, click on elements, type into fields, and read back the rendered DOM. The killer use case: “make this UI change and verify it visually” — Cline edits the React component, runs the dev server, opens the page, screenshots before/after, and only marks the task complete once the visual confirms the change.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Live Cost Visibility
&lt;/h3&gt;

&lt;p&gt;Every task in Cline shows a running token counter and dollar estimate based on the current provider’s pricing. You can watch a multi-step refactor consume tokens in real time, and you can hit Stop the moment it stops being economical. Few other agents in this category surface cost this directly.&lt;/p&gt;
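
&lt;p&gt;The arithmetic behind a per-task dollar estimate is simple enough to reproduce by hand. The sketch below assumes a provider that prices input and output tokens separately, per million tokens; the function name and the prices in the example are placeholders, not any provider’s real rates and not Cline’s internal code.&lt;/p&gt;

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate the dollar cost of a task from token counts and per-million prices."""
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


if __name__ == "__main__":
    # Placeholder prices (USD per million tokens), chosen only for illustration.
    cost = estimate_cost_usd(
        15_000, 3_200, input_price_per_million=1.0, output_price_per_million=4.0
    )
    print(f"${cost:.4f} for {15_000 + 3_200:,} tokens")
```

&lt;p&gt;Output tokens usually cost several times more than input tokens, so a task that generates a lot of code can cost more than one that only reads a lot of files, even at the same total token count.&lt;/p&gt;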

&lt;h2&gt;
  
  
  How to Install Cline
&lt;/h2&gt;

&lt;p&gt;Installation takes about thirty seconds:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open VS Code (or VSCodium, Cursor, Windsurf; Cline runs in any VS Code-compatible editor)&lt;/li&gt;
&lt;li&gt;Open the Extensions panel (&lt;code&gt;Ctrl+Shift+X&lt;/code&gt; or &lt;code&gt;Cmd+Shift+X&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Search for &lt;strong&gt;Cline&lt;/strong&gt; (publisher: saoudrizwan)&lt;/li&gt;
&lt;li&gt;Click Install&lt;/li&gt;
&lt;li&gt;Click the new Cline icon in the activity bar&lt;/li&gt;
&lt;li&gt;On first launch, Cline asks you to pick a provider and paste an API key&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That’s it. Cline now sits in the side panel with an input box and a Plan/Act toggle. The extension itself never makes a network call until you give it a model and start a task.&lt;/p&gt;

&lt;h2&gt;
  
  
  Connecting Cline to Free APIs
&lt;/h2&gt;

&lt;p&gt;The provider you pick determines whether Cline is genuinely free or just cheap. The four practical zero-cost options:&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 1: Google Gemini (Recommended for Starters)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://toolfreebie.com/gemini-free-ai-api/" rel="noopener noreferrer"&gt;Gemini’s free tier&lt;/a&gt; on Google AI Studio gives you Gemini 2.0 Flash and Gemini 2.5 Pro at very generous request-per-minute limits with a 1M-token context window — long enough to dump your entire repo into a single prompt for most projects.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Get a free key at &lt;a href="https://aistudio.google.com" rel="noopener noreferrer"&gt;aistudio.google.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;In Cline settings, choose provider &lt;strong&gt;Google Gemini&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Paste the key&lt;/li&gt;
&lt;li&gt;Pick model &lt;code&gt;gemini-2.5-pro&lt;/code&gt; (best for planning) or &lt;code&gt;gemini-2.0-flash&lt;/code&gt; (faster, cheaper if you flip to paid)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For most personal projects, Gemini’s free tier is enough to keep Cline running indefinitely with no card on file.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 2: OpenRouter Free Models
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://toolfreebie.com/openrouter-free-ai-models/" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt; aggregates dozens of providers behind one OpenAI-compatible endpoint, including a tier of &lt;code&gt;:free&lt;/code&gt; models you can call without spending credits.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Sign up at &lt;a href="https://openrouter.ai" rel="noopener noreferrer"&gt;openrouter.ai&lt;/a&gt; and copy your key&lt;/li&gt;
&lt;li&gt;In Cline settings, choose provider &lt;strong&gt;OpenRouter&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Paste the key&lt;/li&gt;
&lt;li&gt;In the model search, type &lt;code&gt;:free&lt;/code&gt; and pick a strong free model like &lt;code&gt;deepseek/deepseek-r1:free&lt;/code&gt; or &lt;code&gt;meta-llama/llama-3.3-70b-instruct:free&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;OpenRouter’s free tier rate limits are tighter than Gemini’s, but the variety is unmatched — you can switch between fifty different free models without changing keys.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 3: Ollama (Fully Local, Fully Free)
&lt;/h3&gt;

&lt;p&gt;If you have a reasonably powerful laptop and don’t want any cloud calls at all, &lt;a href="https://toolfreebie.com/ollama-run-ai-locally/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; runs models on your own hardware.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install Ollama and pull a model: &lt;code&gt;ollama pull qwen2.5-coder:14b&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;In Cline settings, choose provider &lt;strong&gt;Ollama&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Set the base URL to &lt;code&gt;http://localhost:11434&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Pick the model you pulled&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Quality drops below frontier models, but for boilerplate, file renames, and small refactors, a local 14B model is good enough — and zero tokens leave your machine.&lt;/p&gt;
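
&lt;p&gt;Before pointing Cline at Ollama, it can help to confirm that the server is up and the model is actually pulled. The sketch below uses Ollama’s &lt;code&gt;/api/tags&lt;/code&gt; endpoint, which lists local models; the helper names are illustrative, and the base URL is the one from the steps above.&lt;/p&gt;

```python
import json
import urllib.request

OLLAMA_BASE_URL = "http://localhost:11434"  # base URL from the setup steps


def fetch_tags(base_url: str = OLLAMA_BASE_URL) -> dict:
    """Fetch the list of locally pulled models from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


def model_names(tags_payload: dict) -> list[str]:
    """Extract the model names from an /api/tags response body."""
    return [m["name"] for m in tags_payload.get("models", [])]


def is_pulled(tags_payload: dict, wanted: str) -> bool:
    """True when the wanted model tag appears in the local model list."""
    return wanted in model_names(tags_payload)
```

&lt;p&gt;With Ollama running, &lt;code&gt;is_pulled(fetch_tags(), "qwen2.5-coder:14b")&lt;/code&gt; should return &lt;code&gt;True&lt;/code&gt; once the pull has finished. A connection error from &lt;code&gt;fetch_tags&lt;/code&gt; means the server is not listening on that port.&lt;/p&gt;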

&lt;h3&gt;
  
  
  Option 4: Together AI Free Tier
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://toolfreebie.com/together-ai-free-api-llama-deepseek-flux-2026/" rel="noopener noreferrer"&gt;Together AI&lt;/a&gt;’s &lt;code&gt;-Free&lt;/code&gt; models include Llama 3.3 70B and DeepSeek R1 Distill 70B, both strong code models. Sign up, copy the key, choose the OpenAI-compatible provider in Cline, and point the base URL to &lt;code&gt;https://api.together.xyz/v1&lt;/code&gt;.&lt;/p&gt;
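
&lt;p&gt;If Cline reports an authentication or model error, it can be useful to test the same key outside the editor. The sketch below builds an OpenAI-style chat completion request against the base URL above, using only the Python standard library. The helper name is illustrative, and the model ID in the usage note is an assumption; check Together AI’s current model list before relying on it.&lt;/p&gt;

```python
import json
import urllib.request

TOGETHER_BASE_URL = "https://api.together.xyz/v1"  # base URL from the text above


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 32,  # keep the smoke test small
    }
    return urllib.request.Request(
        f"{TOGETHER_BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

&lt;p&gt;To send it, pass the result to &lt;code&gt;urllib.request.urlopen&lt;/code&gt; with a real key and a model ID such as &lt;code&gt;meta-llama/Llama-3.3-70B-Instruct-Turbo-Free&lt;/code&gt;. Keeping request construction separate from sending makes it easy to inspect the URL, headers, and body first.&lt;/p&gt;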

&lt;h3&gt;
  
  
  Mixed Strategy: Plan with One, Act with Another
&lt;/h3&gt;

&lt;p&gt;Cline lets you set &lt;em&gt;different&lt;/em&gt; models for Plan and Act mode. A common pattern in 2026: use a frontier reasoning model (Gemini 2.5 Pro, Claude Opus, DeepSeek R1) for Plan mode, then switch to a fast cheap model (Llama 3.3 70B on &lt;a href="https://toolfreebie.com/groq-fastest-free-ai-api/" rel="noopener noreferrer"&gt;Groq&lt;/a&gt;, Gemini 2.0 Flash) for Act mode. The plan is the expensive part of cognition; the execution is mostly mechanical edits.&lt;/p&gt;

&lt;h2&gt;
  
  
  A First Real Task: Add a Feature End-to-End
&lt;/h2&gt;

&lt;p&gt;To make this concrete, here’s a typical Cline session for adding a “dark mode toggle” to a React app. The user types one prompt; everything else is Cline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt:&lt;/strong&gt; &lt;em&gt;“Add a dark mode toggle to the navbar. Persist the choice in localStorage. Use the existing Tailwind dark: variants — don’t introduce a new theming library.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In Plan mode, Cline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Reads &lt;code&gt;tailwind.config.js&lt;/code&gt;, confirms &lt;code&gt;darkMode: 'class'&lt;/code&gt; is set (or proposes enabling it)&lt;/li&gt;
&lt;li&gt;Reads &lt;code&gt;src/components/Navbar.tsx&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Searches the codebase for any existing theme context&lt;/li&gt;
&lt;li&gt;Writes a plan: add a &lt;code&gt;useDarkMode&lt;/code&gt; hook, modify Navbar to render a toggle button, add a small effect that syncs the &lt;code&gt;dark&lt;/code&gt; class on &lt;code&gt;document.documentElement&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You read the plan. If it looks right, you flip to Act mode. Cline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Creates &lt;code&gt;src/hooks/useDarkMode.ts&lt;/code&gt; — pauses, shows you the diff, waits for approval&lt;/li&gt;
&lt;li&gt;Edits &lt;code&gt;Navbar.tsx&lt;/code&gt; — pauses, shows the diff, waits for approval&lt;/li&gt;
&lt;li&gt;Runs &lt;code&gt;pnpm run dev&lt;/code&gt; in the terminal — pauses, asks before executing&lt;/li&gt;
&lt;li&gt;Opens &lt;code&gt;http://localhost:5173&lt;/code&gt; in the built-in browser&lt;/li&gt;
&lt;li&gt;Screenshots the navbar, clicks the new toggle, screenshots again, confirms the page background switched&lt;/li&gt;
&lt;li&gt;Reports done with a list of files changed and the cost (e.g. “$0.04, 18,200 tokens”)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The full task is about fifteen minutes of mostly autonomous work. You stay in the loop at the four moments that matter (the plan, two diffs, and the terminal command). For senior engineers used to writing every line, this feels strange the first time and indispensable by the third.&lt;/p&gt;

&lt;h2&gt;
  
  
  MCP: Giving Cline Superpowers
&lt;/h2&gt;

&lt;p&gt;Cline’s MCP support is what lets it reach beyond the file system. Three useful servers to install on day one (all from the Cline marketplace):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;filesystem&lt;/strong&gt; — read/write outside the open workspace, useful for cross-repo refactors&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;github&lt;/strong&gt; — open issues, file PRs, comment on existing PRs without leaving the editor&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;playwright&lt;/strong&gt; — drive a real browser to reproduce user-reported bugs against your dev server&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To install one, click the MCP icon in Cline’s panel, search the marketplace, and click Install. The server runs as a local subprocess; no cloud connection unless the server itself needs one.&lt;/p&gt;

&lt;p&gt;Custom MCP servers (anything you’ve built, or anything from the wider &lt;a href="https://toolfreebie.com/mcp-protocol-ai-agents/" rel="noopener noreferrer"&gt;MCP ecosystem&lt;/a&gt;) drop in just by adding their config to &lt;code&gt;~/.cline/mcp.json&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "mcpServers": {
    "my-postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres", "postgres://localhost/dev"]
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After a reload, Cline can run read-only SQL against your dev database without you copy-pasting schemas into the chat.&lt;/p&gt;
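
&lt;p&gt;A malformed config file is a common reason a custom server fails to appear after a reload. The sketch below checks a config string for the shape used in the example above: a top-level &lt;code&gt;mcpServers&lt;/code&gt; object whose entries each have a string &lt;code&gt;command&lt;/code&gt; and a list of string &lt;code&gt;args&lt;/code&gt;. These rules are inferred from that one example and are an assumption; real configs may allow other fields or server types.&lt;/p&gt;

```python
import json


def validate_mcp_config(text: str) -> list[str]:
    """Return a list of problems found in an MCP config; empty means it looks fine."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]

    servers = config.get("mcpServers") if isinstance(config, dict) else None
    if not isinstance(servers, dict):
        return ["top-level 'mcpServers' object is missing"]

    problems = []
    for name, entry in servers.items():
        if not isinstance(entry, dict):
            problems.append(f"{name}: entry must be an object")
            continue
        if not isinstance(entry.get("command"), str):
            problems.append(f"{name}: 'command' must be a string")
        args = entry.get("args", [])
        if not (isinstance(args, list) and all(isinstance(a, str) for a in args)):
            problems.append(f"{name}: 'args' must be a list of strings")
    return problems
```

&lt;p&gt;Running this over the file contents before reloading catches trailing commas, a misspelled &lt;code&gt;mcpServers&lt;/code&gt; key, and arguments passed as a single string instead of a list.&lt;/p&gt;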

&lt;h2&gt;
  
  
  Cost Comparison: Cline + Free API vs Cursor / Copilot
&lt;/h2&gt;

&lt;p&gt;Approximate monthly costs for a developer who uses an AI agent for roughly 60 hours of coding:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setup&lt;/th&gt;
&lt;th&gt;Monthly Cost&lt;/th&gt;
&lt;th&gt;Model Quality&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cline + Gemini 2.5 Pro free tier&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;Frontier&lt;/td&gt;
&lt;td&gt;Hits rate limits at high usage; works for most solo work&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cline + OpenRouter free models&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;Strong open&lt;/td&gt;
&lt;td&gt;Tighter limits but huge model variety&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cline + Ollama Qwen2.5-Coder 14B&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;Mid&lt;/td&gt;
&lt;td&gt;Local, no cloud calls, no rate limits&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cline + Anthropic Claude Sonnet (paid)&lt;/td&gt;
&lt;td&gt;~$10–40&lt;/td&gt;
&lt;td&gt;Frontier&lt;/td&gt;
&lt;td&gt;Pay only for what you use; transparent per-task&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cursor Pro&lt;/td&gt;
&lt;td&gt;$20&lt;/td&gt;
&lt;td&gt;Frontier (Claude/GPT)&lt;/td&gt;
&lt;td&gt;Predictable; unlimited slow model, capped fast models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GitHub Copilot Individual&lt;/td&gt;
&lt;td&gt;$10&lt;/td&gt;
&lt;td&gt;Frontier (GPT-4.1)&lt;/td&gt;
&lt;td&gt;Strong autocomplete, weaker agent UX&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cursor Pro + Cline as backup&lt;/td&gt;
&lt;td&gt;$20&lt;/td&gt;
&lt;td&gt;Frontier&lt;/td&gt;
&lt;td&gt;Both options available; Cline catches what Cursor misses&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The honest answer: if you bill clients for engineering time, $20/mo for Cursor pays for itself in the first hour. If you’re a hobbyist, student, or open-source maintainer, Cline + a free API tier gets you 80% of the experience at $0/mo. The two aren’t mutually exclusive — Cursor is itself a VS Code fork, so you can install Cline inside Cursor and have both available.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cline vs Aider vs Continue.dev
&lt;/h2&gt;

&lt;p&gt;The free open-source AI coding agent space in 2026 has three credible contenders. A quick decision matrix:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Project&lt;/th&gt;
&lt;th&gt;Surface&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;th&gt;Weakness&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cline&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;VS Code extension&lt;/td&gt;
&lt;td&gt;Visual workflows, browser verification, MCP-heavy tasks&lt;/td&gt;
&lt;td&gt;Heavier UI; needs VS Code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://aider.chat" rel="noopener noreferrer"&gt;Aider&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Terminal CLI&lt;/td&gt;
&lt;td&gt;Power users on the command line, Git-aware refactors&lt;/td&gt;
&lt;td&gt;No GUI; less hand-holding for newcomers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://continue.dev" rel="noopener noreferrer"&gt;Continue.dev&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;VS Code &amp;amp; JetBrains&lt;/td&gt;
&lt;td&gt;Enterprise teams that want shared config + autocomplete&lt;/td&gt;
&lt;td&gt;Less autonomous than Cline; more like Copilot&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you live in VS Code and want full agent autonomy, Cline. If you live in tmux and want every change tied to a Git commit, Aider. If you need a team-shareable autocomplete plus chat, Continue. None of these is wrong; they fit different working styles.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tips for Keeping Cost (and Frustration) Low
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use Plan mode aggressively.&lt;/strong&gt; A 2,000-token plan is cheaper than a 20,000-token wrong-direction execution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add a &lt;code&gt;.clineignore&lt;/code&gt; file&lt;/strong&gt; so Cline doesn’t accidentally read &lt;code&gt;node_modules&lt;/code&gt;, lock files, or build outputs into context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pin the model per-task.&lt;/strong&gt; Use a small fast model for find-and-replace work; reserve the frontier model for design and debugging.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cap with &lt;code&gt;maxRequests&lt;/code&gt;.&lt;/strong&gt; Cline has a per-task request limit that stops runaway loops — set it to 30 for most tasks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Approve diffs incrementally.&lt;/strong&gt; If a diff looks wrong, reject it and explain why in one sentence. Having Cline rewrite the diff is much faster than rolling back later.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pair with &lt;a href="https://toolfreebie.com/ollama-run-ai-locally/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; for repetitive tasks&lt;/strong&gt; like generating tests for already-written functions; the local model is “free” tokens.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Is Cline really free?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Cline extension is free under Apache 2.0. The model calls are not — you pay whichever provider you connect (or pay nothing if you stay within a free tier or run Ollama locally). There’s no Cline-the-company subscription gate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does Cline work in Cursor or Windsurf?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes — both are forks of VS Code, and Cline installs cleanly inside either. Some users actually run Cursor as their editor and Cline as a second agent for tasks they’d rather hand off entirely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can Cline read my whole codebase?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It reads files on demand using a search-and-grep flow rather than embedding the entire repo. That keeps context windows honest and means you don’t need a vector database for it to work. For very large repos, pair it with a model that has a long context window (Gemini 2.5 Pro at 1M, Claude Sonnet at 200K).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Will Cline silently delete my files?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;No. By default, every file write, terminal command, and browser action requires explicit approval before it runs; changing that is a setting you have to turn on deliberately. The agent shows you the diff or the command before you click Approve.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I use Cline offline?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes — point it at &lt;a href="https://toolfreebie.com/ollama-run-ai-locally/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; or LM Studio running locally. Once the model is pulled, Cline does not need a network connection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does Cline support tool use / function calling?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes, both natively (its built-in tools for files, terminal, browser) and via MCP servers. Models that don’t support function calling natively still work — Cline uses a structured prompt format underneath.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What’s the difference between Cline and Roo Code?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Roo Code is a popular fork of Cline with an additional set of “modes” (Architect, Code, Ask, Debug) and slightly different UI conventions. Functionally similar; pick whichever interface you prefer. Both are free and open source.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does Cline phone home or collect telemetry?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The extension itself sends only opt-in anonymous usage telemetry; the model calls go directly from your machine to whichever provider you configured. There is no Cline-operated proxy in the path.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Use Cline vs Alternatives
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;You want a free, transparent AI coding agent and don’t mind wiring an API key&lt;/strong&gt; → Cline + a free model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want zero setup and predictable monthly cost, willing to pay $20&lt;/strong&gt; → &lt;a href="https://www.cursor.com" rel="noopener noreferrer"&gt;Cursor&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You live in a terminal and want every change committed&lt;/strong&gt; → &lt;a href="https://aider.chat" rel="noopener noreferrer"&gt;Aider&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want only autocomplete plus a chat box&lt;/strong&gt; → GitHub Copilot or &lt;a href="https://continue.dev" rel="noopener noreferrer"&gt;Continue.dev&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want to plug your own MCP servers into the agent loop&lt;/strong&gt; → Cline (best-in-class MCP UX)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want to run everything locally with no cloud calls&lt;/strong&gt; → Cline + &lt;a href="https://toolfreebie.com/ollama-run-ai-locally/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Use Cline with OpenClaw
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://openclaw.ai" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; is an AI agent platform for orchestrating multi-step automated workflows. Cline plays well at the seam between human-in-the-loop coding and fully autonomous OpenClaw flows.&lt;/p&gt;

&lt;p&gt;A useful split: OpenClaw runs the long-running unattended jobs (nightly dependency updates, regenerating SDK clients from a changed OpenAPI spec, checking the build on multiple Node versions). Cline handles the human-in-the-loop work where you actually want to read every diff before it lands. The two share the same model providers — connect both to the same &lt;a href="https://toolfreebie.com/openrouter-free-ai-models/" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt; or &lt;a href="https://toolfreebie.com/together-ai-free-api-llama-deepseek-flux-2026/" rel="noopener noreferrer"&gt;Together AI&lt;/a&gt; key, and you have one billing surface for everything.&lt;/p&gt;

&lt;p&gt;A concrete example pipeline: an OpenClaw cron job watches a third-party SDK for new releases, downloads the new version, runs your test suite against it, and on failure files a GitHub issue with the failing test and stack trace. The next morning, you open the issue inside Cline (“fix issue #483”), and Cline does the actual fix work with you supervising the diffs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Verdict
&lt;/h2&gt;

&lt;p&gt;Cline is the right default in 2026 for any developer who already has a free AI API key and wants a serious coding agent without paying a subscription. The Plan/Act split is genuinely better UX than the implicit modes other agents use. Native MCP support means it grows with the ecosystem instead of getting locked into one set of built-in tools. And because the provider is your choice, you can pick the cost/quality point that matches the task and switch the moment a better model lands.&lt;/p&gt;

&lt;p&gt;Cursor and Copilot are still excellent products; for some teams the fixed monthly cost and curated model selection are exactly what’s wanted. But Cline is the option that makes “AI coding agent” available to anyone with a laptop and a free API key, with no gatekeeping and no contract. Install the extension, point it at &lt;a href="https://toolfreebie.com/gemini-free-ai-api/" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt;, give it a small task, and decide for yourself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Reads
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/free-ai-coding-assistants/" rel="noopener noreferrer"&gt;5 Free AI Coding Assistants for VS Code &amp;amp; Terminal&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/aider-free-ai-coding-cli/" rel="noopener noreferrer"&gt;Aider: Free Open-Source AI Coding Agent for Your Terminal&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/crewai-vs-autogpt-vs-langgraph/" rel="noopener noreferrer"&gt;CrewAI vs AutoGPT vs LangGraph: Which Free Agent Framework Should You Use in 2026?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/n8n-workflow-automation/" rel="noopener noreferrer"&gt;n8n: Open-Source Workflow Automation with AI Agents and 400+ Integrations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/mcp-protocol-ai-agents/" rel="noopener noreferrer"&gt;MCP (Model Context Protocol): Connect AI Agents to Any Tool or API&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/notebooklm-ai-research/" rel="noopener noreferrer"&gt;Google NotebookLM: Free AI Research Tool for Summarizing Documents and PDFs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://toolfreebie.com/cline-vscode-ai-agent/" rel="noopener noreferrer"&gt;toolfreebie.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
    </item>
    <item>
      <title>Together AI Free API: Run Llama 3.3, DeepSeek R1, and FLUX Image Generation for Free in 2026</title>
      <dc:creator>toolfreebie</dc:creator>
      <pubDate>Sun, 03 May 2026 15:54:07 +0000</pubDate>
      <link>https://dev.to/build996/together-ai-free-api-run-llama-33-deepseek-r1-and-flux-image-generation-for-free-in-2026-19of</link>
      <guid>https://dev.to/build996/together-ai-free-api-run-llama-33-deepseek-r1-and-flux-image-generation-for-free-in-2026-19of</guid>
      <description>&lt;h2&gt;
  
  
  What Is Together AI?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.together.ai" rel="noopener noreferrer"&gt;Together AI&lt;/a&gt; is an AI inference platform that hosts hundreds of open-source models behind one OpenAI-compatible API. Founded in 2022 and backed by NVIDIA, Salesforce Ventures, and Kleiner Perkins, the company built its reputation around two things developers actually care about: &lt;strong&gt;fast hosted inference for state-of-the-art open models&lt;/strong&gt; (Llama, DeepSeek, Qwen, Mixtral) and a &lt;strong&gt;genuinely free tier&lt;/strong&gt; that exposes a small but useful set of those models with no credit card required.&lt;/p&gt;

&lt;p&gt;What separates Together AI from the long list of “free AI API” providers in 2026 is the breadth of categories you can hit on a single key. One signup gives you free access to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Llama 3.3 70B Instruct Turbo (Free)&lt;/strong&gt;: Meta’s flagship 70B chat model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DeepSeek R1 Distill Llama 70B (Free)&lt;/strong&gt;: open reasoning model with chain-of-thought&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FLUX.1 [schnell] (Free)&lt;/strong&gt;: Black Forest Labs’ fast image generation model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Llama 3.2 11B Vision Instruct (Free)&lt;/strong&gt;: multimodal image-understanding model&lt;/li&gt;
&lt;li&gt;Plus hundreds of other open models on a $1 trial credit&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’re already evaluating &lt;a href="https://toolfreebie.com/groq-api-the-fastest-free-ai-api-in-2026-300-800-tokens-s/" rel="noopener noreferrer"&gt;Groq&lt;/a&gt;, &lt;a href="https://toolfreebie.com/cerebras-inference-api-fastest-free-ai-api/" rel="noopener noreferrer"&gt;Cerebras&lt;/a&gt;, &lt;a href="https://toolfreebie.com/google-gemini-api-the-best-free-ai-api-in-2026/" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt;, or &lt;a href="https://toolfreebie.com/deepseek-api-free-access-to-r1-reasoning-and-v3-chat-models/" rel="noopener noreferrer"&gt;DeepSeek&lt;/a&gt;, Together AI fills a different gap: a single endpoint that covers chat, reasoning, vision, and image generation on the same key.&lt;/p&gt;

&lt;h2&gt;
  
  
  What’s Actually Free on Together AI
&lt;/h2&gt;

&lt;p&gt;Together AI uses a clear naming convention: any model whose ID ends with the suffix &lt;code&gt;-Free&lt;/code&gt; (capitalization varies, so some IDs end in lowercase &lt;code&gt;-free&lt;/code&gt;) can be called without consuming credits. These models are slightly slower than the paid tiers (rate-limited, lower priority) but functionally complete. Everything else runs against the $1 free trial credit you get at signup.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model ID&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Context&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;meta-llama/Llama-3.3-70B-Instruct-Turbo-Free&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Chat / instruction&lt;/td&gt;
&lt;td&gt;128K tokens&lt;/td&gt;
&lt;td&gt;General assistant, RAG answer generation, code Q&amp;amp;A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;deepseek-ai/DeepSeek-R1-Distill-Llama-70B-free&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Reasoning&lt;/td&gt;
&lt;td&gt;32K tokens&lt;/td&gt;
&lt;td&gt;Math, multi-step logic, agent planning loops&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;meta-llama/Llama-Vision-Free&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Vision (multimodal)&lt;/td&gt;
&lt;td&gt;128K tokens&lt;/td&gt;
&lt;td&gt;Image captioning, OCR, chart and screenshot understanding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;black-forest-labs/FLUX.1-schnell-Free&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Image generation&lt;/td&gt;
&lt;td&gt;1024×1024 default&lt;/td&gt;
&lt;td&gt;Blog cover images, prototypes, social posts&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Beyond the explicitly free tier, the $1 trial credit is enough to exercise dozens of paid models (Mixtral 8x22B, Qwen 2.5 72B, Llama 3.1 405B, audio models like Whisper, embedding models like BGE and M2-BERT) for tens of thousands of tokens each. That is plenty to test whether the bigger models meaningfully change your results before you commit a card.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: Together AI quietly retires and renames “Free” models from time to time as newer versions land. If a model ID stops working, check the &lt;a href="https://docs.together.ai/docs/serverless-models" rel="noopener noreferrer"&gt;official model list&lt;/a&gt; for the current Free variant.&lt;/em&gt;&lt;/p&gt;
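
&lt;p&gt;Because Free model IDs get renamed, it can help to keep them in one place and check the suffix in code rather than scatter the strings through a project. A minimal sketch, assuming the four IDs from the table above are still current; &lt;code&gt;FREE_MODELS&lt;/code&gt;, &lt;code&gt;is_free_model&lt;/code&gt;, and &lt;code&gt;model_for&lt;/code&gt; are illustrative names, not part of the Together SDK:&lt;/p&gt;

```python
# Model IDs copied from the table above. Verify them against the official
# model list before relying on them, since Free variants get renamed.
FREE_MODELS = {
    "chat": "meta-llama/Llama-3.3-70B-Instruct-Turbo-Free",
    "reasoning": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B-free",
    "vision": "meta-llama/Llama-Vision-Free",
    "image": "black-forest-labs/FLUX.1-schnell-Free",
}


def is_free_model(model_id: str) -> bool:
    """Return True if the ID follows the "-Free" suffix convention.

    The comparison is case-insensitive because the table mixes
    "-Free" and "-free".
    """
    return model_id.lower().endswith("-free")


def model_for(task: str) -> str:
    """Look up the Free model for a task, failing loudly on unknown tasks."""
    try:
        return FREE_MODELS[task]
    except KeyError:
        raise ValueError(f"no Free model configured for task {task!r}") from None
```

&lt;p&gt;When an ID is retired, only the dictionary entry needs to change, and a quick &lt;code&gt;is_free_model&lt;/code&gt; check before each call guards against accidentally spending trial credit on a paid model.&lt;/p&gt;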

&lt;h2&gt;
  
  
  How to Get Your Free API Key
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;a href="https://api.together.ai" rel="noopener noreferrer"&gt;api.together.ai&lt;/a&gt; and sign up with email, Google, or GitHub&lt;/li&gt;
&lt;li&gt;Verify your email address&lt;/li&gt;
&lt;li&gt;From the dashboard, navigate to &lt;strong&gt;Settings → API Keys&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Copy your default key (a long hex string with no prefix)&lt;/li&gt;
&lt;li&gt;Set it as an environment variable: &lt;code&gt;export TOGETHER_API_KEY="your_key_here"&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No credit card. No phone number. The $1 free trial credit and access to all &lt;code&gt;-Free&lt;/code&gt; models are activated immediately on signup.&lt;/p&gt;

&lt;h2&gt;
  
  
  curl Quickstart: Your First Request in 30 Seconds
&lt;/h2&gt;

&lt;p&gt;Together AI is fully OpenAI-compatible, so the cleanest way to confirm everything works is a one-shot curl call against the chat completions endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://api.together.xyz/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$TOGETHER_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo-Free",
    "messages": [
      {"role": "user", "content": "Explain pgvector in two sentences."}
    ]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you get back a JSON response with a &lt;code&gt;choices[0].message.content&lt;/code&gt; field, you’re set. The exact same payload shape works against OpenAI; only the base URL, the API key, and the &lt;code&gt;model&lt;/code&gt; string change.&lt;/p&gt;

&lt;h2&gt;
  
  
  Python Quickstart
&lt;/h2&gt;

&lt;p&gt;The official SDK mirrors the interface of the OpenAI Python client. Install it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;together
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Basic chat completion:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;together&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Together&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Together&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TOGETHER_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta-llama/Llama-3.3-70B-Instruct-Turbo-Free&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a concise senior engineer.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;When should I prefer SQLite over Postgres?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you already have OpenAI SDK code, swapping providers is a two-line change:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TOGETHER_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.together.xyz/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta-llama/Llama-3.3-70B-Instruct-Turbo-Free&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a haiku about caching.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The parameters you’d pass to OpenAI (&lt;code&gt;temperature&lt;/code&gt;, &lt;code&gt;top_p&lt;/code&gt;, &lt;code&gt;stop&lt;/code&gt;, &lt;code&gt;response_format&lt;/code&gt;, &lt;code&gt;tools&lt;/code&gt;, &lt;code&gt;tool_choice&lt;/code&gt;) are accepted in the same shape. Support for structured output and tool calling varies by model, so check the model’s documentation before relying on them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Streaming Responses
&lt;/h2&gt;

&lt;p&gt;For chat UIs and agent loops, you almost always want token streaming. Set &lt;code&gt;stream=True&lt;/code&gt; and iterate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta-llama/Llama-3.3-70B-Instruct-Turbo-Free&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Outline a blog post about RAG.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;delta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flush&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Streaming on the Free tier is real streaming, not buffered chunks — you’ll see tokens appear at roughly the model’s true generation rate, which makes it usable for live chat UIs even before you start paying.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reasoning with DeepSeek R1 Distill
&lt;/h2&gt;

&lt;p&gt;The DeepSeek R1 family produces visible chain-of-thought reasoning before its final answer. On Together AI’s Free tier you can call the 70B distilled variant, which keeps most of the reasoning capability of the full R1 model at a fraction of the parameter count:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepseek-ai/DeepSeek-R1-Distill-Llama-70B-free&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A bookstore sold 60 books on Monday, then sales grew &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;12% each day through Friday. How many books did they &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sell in total that week? Show your work.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model’s response will include a &lt;code&gt;&amp;lt;think&amp;gt;&lt;/code&gt;…&lt;code&gt;&amp;lt;/think&amp;gt;&lt;/code&gt; block of internal reasoning followed by the final answer. For agent applications, you can either show the reasoning to the user (transparency) or strip it out (clean output) depending on the surface.&lt;/p&gt;
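
&lt;p&gt;A minimal sketch of the strip-it-out option, assuming the reasoning arrives wrapped in a single think block as described above. &lt;code&gt;split_reasoning&lt;/code&gt; is a hypothetical helper, not an SDK function; it returns the reasoning and the final answer separately so each surface can decide what to show:&lt;/p&gt;

```python
import re

# Matches one think block, including newlines inside it.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a reply that may contain a think block.

    If no complete block is found (for example, the reply was cut off by
    max_tokens before the closing tag), the whole text is treated as the answer.
    """
    match = THINK_BLOCK.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


raw = "<think>60 on Monday, then 12% growth per day.</think>\nAbout 381 books."
reasoning, answer = split_reasoning(raw)
print(answer)
```

&lt;p&gt;In the example above you would call &lt;code&gt;split_reasoning(response.choices[0].message.content)&lt;/code&gt; and log the reasoning while returning only the answer to the user.&lt;/p&gt;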

&lt;h2&gt;
  
  
  Image Generation with FLUX.1 [schnell] Free
&lt;/h2&gt;

&lt;p&gt;FLUX.1 [schnell] is Black Forest Labs’ fast text-to-image model, distilled to 4 sampling steps and open-sourced under Apache 2.0. Together AI hosts it as a free image-generation endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;black-forest-labs/FLUX.1-schnell-Free&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A clean isometric illustration of an AI agent fetching data from a cloud database, soft pastel colors, no text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;height&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;steps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The returned URL is hosted by Together AI and is best treated as temporary: download the image or copy it to your own storage or CDN soon after generation rather than hot-linking it. For blog covers, social posts, or quick mockups, FLUX.1 [schnell] often beats Stable Diffusion XL on prompt adherence at a fraction of the inference time.&lt;/p&gt;
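
&lt;p&gt;To keep a generated image, copy it somewhere you control. The sketch below uses only the standard library; &lt;code&gt;save_image&lt;/code&gt; and the destination path are illustrative, and the 30-second timeout is an arbitrary default rather than a documented limit:&lt;/p&gt;

```python
import shutil
import urllib.request
from pathlib import Path


def save_image(url: str, dest: str, timeout: float = 30.0) -> Path:
    """Download an image URL to a local file and return its path."""
    target = Path(dest)
    # Create the destination folder if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(target, "wb") as out:
        # Stream the body to disk instead of holding it all in memory.
        shutil.copyfileobj(resp, out)
    return target
```

&lt;p&gt;With the response from the example above, that would be &lt;code&gt;save_image(response.data[0].url, "covers/agent.png")&lt;/code&gt;.&lt;/p&gt;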

&lt;h2&gt;
  
  
  Vision: Llama 3.2 Vision Free
&lt;/h2&gt;

&lt;p&gt;The Free vision model accepts standard OpenAI-format multimodal messages — text plus image URLs or base64 data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta-llama/Llama-Vision-Free&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What does this dashboard show? List the three highest values.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://example.com/dashboard.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the cheapest path in 2026 to a working “describe this screenshot” or “extract data from this chart” feature without standing up your own vision pipeline. For OCR-heavy workloads on dense documents, a paid vision model will still outperform — but for screenshots, charts, product photos, and general image Q&amp;amp;A, Llama Vision Free is genuinely useful.&lt;/p&gt;
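
&lt;p&gt;For images that are not publicly reachable, the base64 route mentioned above means sending a data URL in the same &lt;code&gt;image_url&lt;/code&gt; field. A minimal sketch, assuming the endpoint accepts data URLs the way the OpenAI message format does; &lt;code&gt;image_to_data_url&lt;/code&gt; and &lt;code&gt;image_part&lt;/code&gt; are hypothetical helpers, and any payload size limit is not covered here:&lt;/p&gt;

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def image_part(path: str) -> dict:
    """Build the image_url content part used in the vision example above."""
    return {"type": "image_url", "image_url": {"url": image_to_data_url(path)}}
```

&lt;p&gt;In the vision request, replace the second entry of &lt;code&gt;content&lt;/code&gt; with &lt;code&gt;image_part("screenshot.png")&lt;/code&gt;.&lt;/p&gt;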

&lt;h2&gt;
  
  
  Together AI vs Other Free AI APIs
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Free Chat&lt;/th&gt;
&lt;th&gt;Free Reasoning&lt;/th&gt;
&lt;th&gt;Free Vision&lt;/th&gt;
&lt;th&gt;Free Image Gen&lt;/th&gt;
&lt;th&gt;OpenAI Compatible&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Together AI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Llama 3.3 70B&lt;/td&gt;
&lt;td&gt;DeepSeek R1 Distill 70B&lt;/td&gt;
&lt;td&gt;Llama 3.2 Vision 11B&lt;/td&gt;
&lt;td&gt;FLUX.1 schnell&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://toolfreebie.com/groq-api-the-fastest-free-ai-api-in-2026-300-800-tokens-s/" rel="noopener noreferrer"&gt;Groq&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Llama 3.3 70B (very fast)&lt;/td&gt;
&lt;td&gt;DeepSeek R1 Distill&lt;/td&gt;
&lt;td&gt;Llama Vision&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://toolfreebie.com/cerebras-inference-api-fastest-free-ai-api/" rel="noopener noreferrer"&gt;Cerebras&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Llama 3.3 70B (extremely fast)&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://toolfreebie.com/google-gemini-api-the-best-free-ai-api-in-2026/" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Gemini 2.0 Flash&lt;/td&gt;
&lt;td&gt;Gemini 2.0 Flash Thinking&lt;/td&gt;
&lt;td&gt;Built in&lt;/td&gt;
&lt;td&gt;Imagen (limited)&lt;/td&gt;
&lt;td&gt;Via compat layer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://toolfreebie.com/cloudflare-workers-ai-free-edge-ai-inference-with-47-models/" rel="noopener noreferrer"&gt;Cloudflare Workers AI&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Llama 3 / Mistral&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;LLaVA&lt;/td&gt;
&lt;td&gt;SDXL Lightning&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://toolfreebie.com/openrouter-access-300-free-ai-models-with-one-api-key/" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Many free models&lt;/td&gt;
&lt;td&gt;DeepSeek R1 free&lt;/td&gt;
&lt;td&gt;Several&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Where Together AI wins on the free tier:&lt;/strong&gt; coverage. It’s the only provider on this list that offers chat, reasoning, vision, &lt;em&gt;and&lt;/em&gt; image generation under one OpenAI-compatible endpoint, on one key, with no credit card. If you’re prototyping a multimodal product and don’t want to juggle three or four signups, Together AI compresses the entire surface area into one integration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where the others win:&lt;/strong&gt; raw speed (Cerebras and Groq are faster on Llama 3.3 70B), context window (Gemini’s 1M tokens is unmatched), or model variety (OpenRouter aggregates more providers).&lt;/p&gt;

&lt;h2&gt;
  
  
  Rate Limits and Fair Use
&lt;/h2&gt;

&lt;p&gt;Free-tier rate limits on Together AI exist to keep costs predictable. The exact numbers are published in the &lt;a href="https://docs.together.ai/docs/rate-limits" rel="noopener noreferrer"&gt;official rate limits page&lt;/a&gt; and change as the platform scales, but as a working mental model in 2026:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;-Free chat models:&lt;/strong&gt; low double-digit requests per minute, with smaller per-day caps than paid tiers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;-Free image models:&lt;/strong&gt; tighter caps (image inference is much more expensive), often a few requests per minute&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Paid models on trial credit:&lt;/strong&gt; the standard tier-1 limits, but capped by your $1 budget — usually thousands of requests before the credit runs out on smaller models&lt;/li&gt;
&lt;/ul&gt;
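
&lt;p&gt;With limits this tight, a prototype will hit them sooner or later, so it is worth wrapping calls in a retry. The sketch below shows exponential backoff with jitter. &lt;code&gt;with_backoff&lt;/code&gt; is a hypothetical helper, and deciding whether an exception means “rate limited” is left to a caller-supplied predicate, because how the SDK surfaces an HTTP 429 is not covered in this article:&lt;/p&gt;

```python
import random
import time


def with_backoff(call, is_rate_limited, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run call(), retrying with exponential backoff when rate limited.

    call:            zero-argument function that performs the request
    is_rate_limited: function(exception) -> bool, supplied by the caller
    sleep:           injectable so the helper can be exercised without waiting
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            last_attempt = attempt == max_attempts - 1
            # Re-raise anything that is not a rate limit, or the final failure.
            if last_attempt or not is_rate_limited(exc):
                raise
            # 1s, 2s, 4s, ... plus jitter so parallel workers do not retry in lockstep.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

&lt;p&gt;A call site would look like &lt;code&gt;with_backoff(lambda: client.chat.completions.create(...), is_rate_limited=my_check)&lt;/code&gt;, where &lt;code&gt;my_check&lt;/code&gt; is whatever test fits the errors you actually observe.&lt;/p&gt;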

&lt;p&gt;The headline takeaway: Free-tier limits are designed for development and prototyping. They are not designed to support a production user base. If your side project starts getting traction, you’ll need to either move to a paid plan or layer caching in front (request deduplication on prompts is the highest-leverage win).&lt;/p&gt;
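
&lt;p&gt;Request deduplication can be as simple as hashing the model and messages and reusing the stored reply on an exact match. A minimal in-memory sketch; &lt;code&gt;PromptCache&lt;/code&gt; is a hypothetical class, and &lt;code&gt;call_model&lt;/code&gt; stands in for whatever function actually performs the request:&lt;/p&gt;

```python
import hashlib
import json


class PromptCache:
    """In-memory cache keyed by model + messages (exact-match deduplication)."""

    def __init__(self, call_model):
        # call_model(model, messages) -> str is supplied by the caller,
        # e.g. a thin wrapper around client.chat.completions.create.
        self._call_model = call_model
        self._store = {}
        self.hits = 0

    @staticmethod
    def _key(model, messages):
        # sort_keys makes the hash independent of dict key order.
        payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def complete(self, model, messages):
        key = self._key(model, messages)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        result = self._call_model(model, messages)
        self._store[key] = result
        return result
```

&lt;p&gt;This only makes sense for prompts where a repeated answer is acceptable (FAQ-style questions, deterministic settings). A production version would also want expiry and a shared store, which this sketch leaves out.&lt;/p&gt;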

&lt;h2&gt;
  
  
  When to Use Together AI vs Alternatives
&lt;/h2&gt;

&lt;p&gt;A quick decision guide based on what you’re optimizing for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Need everything in one key — chat + reasoning + vision + images?&lt;/strong&gt; → Together AI Free tier&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need the fastest possible chat response (under 1 second to first token)?&lt;/strong&gt; → &lt;a href="https://toolfreebie.com/cerebras-inference-api-fastest-free-ai-api/" rel="noopener noreferrer"&gt;Cerebras&lt;/a&gt; or &lt;a href="https://toolfreebie.com/groq-api-the-fastest-free-ai-api-in-2026-300-800-tokens-s/" rel="noopener noreferrer"&gt;Groq&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need a 1M-token context window for long documents?&lt;/strong&gt; → &lt;a href="https://toolfreebie.com/google-gemini-api-the-best-free-ai-api-in-2026/" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need the widest catalogue of free models from many providers?&lt;/strong&gt; → &lt;a href="https://toolfreebie.com/openrouter-access-300-free-ai-models-with-one-api-key/" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need the best free embedding + reranker for RAG?&lt;/strong&gt; → &lt;a href="https://toolfreebie.com/cohere-free-api-embedding-rerank-rag-2026/" rel="noopener noreferrer"&gt;Cohere&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Building edge functions and want inference inside Cloudflare?&lt;/strong&gt; → &lt;a href="https://toolfreebie.com/cloudflare-workers-ai-free-edge-ai-inference-with-47-models/" rel="noopener noreferrer"&gt;Cloudflare Workers AI&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Together AI is the right answer when your project benefits from a single integration that covers many capabilities, especially for multimodal applications and reasoning-heavy agents that may also need image generation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use Together AI with OpenClaw
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://openclaw.ai" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; is an AI agent platform that orchestrates multiple APIs and tools into automated workflows. Together AI fits well as a &lt;strong&gt;single inference layer&lt;/strong&gt; behind an OpenClaw agent that needs to handle multiple modalities — read a screenshot, reason about what to do next, and produce a generated image as part of the output.&lt;/p&gt;

&lt;p&gt;A working example: an OpenClaw agent receives a customer support ticket that includes a screenshot of an error. The agent uses Llama Vision (Free) to extract the error message from the image, DeepSeek R1 Distill (Free) to reason about which knowledge-base article applies, Llama 3.3 70B (Free) to draft a reply, and FLUX.1 [schnell] (Free) to generate a clean diagram for the customer if a visual explanation helps. All four steps hit the same API key.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;together&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Together&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Together&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TOGETHER_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;support_pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ticket_text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;screenshot_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;A multi-modal support agent step for OpenClaw.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="c1"&gt;# 1. Extract the error from the screenshot
&lt;/span&gt;    &lt;span class="n"&gt;vision&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta-llama/Llama-Vision-Free&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Read the error message in this screenshot and return only the error text.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;screenshot_url&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;error_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vision&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

    &lt;span class="c1"&gt;# 2. Reason about which solution applies
&lt;/span&gt;    &lt;span class="n"&gt;reasoning&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepseek-ai/DeepSeek-R1-Distill-Llama-70B-free&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Ticket: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ticket_text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Error extracted: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;error_text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;What is the most likely root cause?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}],&lt;/span&gt;
        &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;800&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 3. Draft a customer-facing reply
&lt;/span&gt;    &lt;span class="n"&gt;reply&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta-llama/Llama-3.3-70B-Instruct-Turbo-Free&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a senior support engineer. Be concise and friendly.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Ticket: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ticket_text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Root cause analysis: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;reasoning&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Write the reply to the customer.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;error_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;reasoning&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reply&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;reply&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same pattern fits other OpenClaw use cases: a research agent that reads charts and reasons about them, a content agent that writes a post and generates its cover image, a QA agent that screenshots a UI and verifies what it sees. The single-key, single-SDK shape keeps the agent code small.&lt;/p&gt;
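&lt;p&gt;The pipeline code above stops after the reply is drafted. A minimal sketch of the fourth step, the diagram, could look like the following. It assumes the Together Python SDK exposes &lt;code&gt;client.images.generate&lt;/code&gt; and that the free FLUX model ID is &lt;code&gt;black-forest-labs/FLUX.1-schnell-Free&lt;/code&gt;. The helper names and the prompt wording are made up for illustration, so confirm the model ID in your dashboard before relying on it.&lt;/p&gt;

```python
from typing import Any

# Assumed model ID for the free FLUX.1 [schnell] endpoint; verify it in your dashboard.
FLUX_FREE_MODEL = "black-forest-labs/FLUX.1-schnell-Free"


def build_diagram_prompt(analysis: str, max_chars: int = 400) -> str:
    """Turn the root-cause analysis into a short image prompt (hypothetical wording)."""
    # Collapse whitespace and truncate so the prompt stays compact.
    summary = " ".join(analysis.split())[:max_chars]
    return f"Clean, minimal flat diagram explaining this fix to a customer: {summary}"


def generate_diagram(client: Any, analysis: str) -> str:
    """Step 4: request one diagram and return the URL of the generated image.

    `client` is the same Together client object used for the chat calls.
    """
    image = client.images.generate(
        model=FLUX_FREE_MODEL,
        prompt=build_diagram_prompt(analysis),
        steps=4,  # schnell is designed for very few diffusion steps
        n=1,
    )
    return image.data[0].url
```

&lt;p&gt;Inside &lt;code&gt;support_pipeline&lt;/code&gt; you would call &lt;code&gt;generate_diagram(client, reasoning.choices[0].message.content)&lt;/code&gt; and add the returned URL to the result dictionary, ideally only when a visual explanation is worth the extra call.&lt;/p&gt;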

&lt;h2&gt;
  
  
  Pricing When You Outgrow Free
&lt;/h2&gt;

&lt;p&gt;If your application moves beyond prototyping, Together AI’s serverless pricing for the same models is competitive with the rest of the market. Approximate published prices in 2026 for popular models:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Approx Price&lt;/th&gt;
&lt;th&gt;Unit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Llama 3.3 70B Instruct Turbo&lt;/td&gt;
&lt;td&gt;~$0.88&lt;/td&gt;
&lt;td&gt;per 1M tokens (blended)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Llama 3.1 8B Instruct Turbo&lt;/td&gt;
&lt;td&gt;~$0.18&lt;/td&gt;
&lt;td&gt;per 1M tokens (blended)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Llama 3.1 405B Instruct Turbo&lt;/td&gt;
&lt;td&gt;~$3.50&lt;/td&gt;
&lt;td&gt;per 1M tokens (blended)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek R1&lt;/td&gt;
&lt;td&gt;~$3.00 / $7.00&lt;/td&gt;
&lt;td&gt;per 1M input / output tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FLUX.1 [schnell]&lt;/td&gt;
&lt;td&gt;~$0.003&lt;/td&gt;
&lt;td&gt;per image (1024×1024, 4 steps)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BGE / M2-BERT embeddings&lt;/td&gt;
&lt;td&gt;~$0.008 to $0.05&lt;/td&gt;
&lt;td&gt;per 1M tokens (model-dependent)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Two things make this pricing especially friendly for solo builders. First, you only pay for what you use — there’s no monthly minimum. Second, the same key works for both the Free tier and paid models, so there’s no migration cost when you flip from free to paid for a single hot model. Check the &lt;a href="https://www.together.ai/pricing" rel="noopener noreferrer"&gt;official pricing page&lt;/a&gt; for current numbers.&lt;/p&gt;
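&lt;p&gt;To get a feel for what these rates mean for a real workload, here is a small estimator. It is only a sketch: the prices are the approximate figures from the table above, the short model keys are invented for this example, and the workload numbers are hypothetical.&lt;/p&gt;

```python
# Approximate USD prices per 1M tokens, copied from the table above.
# Treat them as placeholders and re-check the official pricing page.
BLENDED_PRICE_PER_M = {
    "llama-3.3-70b-turbo": 0.88,
    "llama-3.1-8b-turbo": 0.18,
    "llama-3.1-405b-turbo": 3.50,
}
DEEPSEEK_R1_INPUT_PER_M = 3.00
DEEPSEEK_R1_OUTPUT_PER_M = 7.00


def blended_cost(model: str, total_tokens: int) -> float:
    """Cost in USD for a model billed at a single blended rate."""
    return BLENDED_PRICE_PER_M[model] * total_tokens / 1_000_000


def deepseek_r1_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD when input and output tokens are priced separately."""
    return (
        DEEPSEEK_R1_INPUT_PER_M * input_tokens
        + DEEPSEEK_R1_OUTPUT_PER_M * output_tokens
    ) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 5,000 replies a month at about 1,500 tokens each.
    monthly_tokens = 5_000 * 1_500
    print(f"Llama 3.3 70B: ${blended_cost('llama-3.3-70b-turbo', monthly_tokens):.2f}")
    print(f"Llama 3.1 8B:  ${blended_cost('llama-3.1-8b-turbo', monthly_tokens):.2f}")
    # Same traffic split into 6M input and 1.5M output tokens.
    print(f"DeepSeek R1:   ${deepseek_r1_cost(6_000_000, 1_500_000):.2f}")
```

&lt;p&gt;At the table's rates, that 7.5M-token month comes to about $6.60 on Llama 3.3 70B, $1.35 on Llama 3.1 8B, and $28.50 on DeepSeek R1, which shows why a reasoning model is best reserved for the steps that need it.&lt;/p&gt;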

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Is Together AI’s Free tier really free, or is it a trial?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Both. Models with the &lt;code&gt;-Free&lt;/code&gt; suffix are free to call indefinitely (rate-limited but non-expiring). All other models run against a one-time $1 trial credit at signup. Once the trial credit is gone, paid models stop until you add a payment method.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do I need a credit card to sign up?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;No. The default account state has no payment method on file. You only need to add one when you want to spend beyond your trial credit on paid models — Free-tier models keep working either way.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is the API truly OpenAI-compatible?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes for chat completions, streaming, and tool calling. Image generation uses Together AI’s own endpoint shape (which closely mirrors OpenAI’s). Embeddings are also OpenAI-compatible. In practice, you can point any OpenAI SDK at &lt;code&gt;https://api.together.xyz/v1&lt;/code&gt; and most code works without changes.&lt;/p&gt;
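&lt;p&gt;To make the compatibility concrete, here is a sketch that builds an OpenAI-style chat request by hand with only the standard library. It assumes the usual OpenAI path &lt;code&gt;/chat/completions&lt;/code&gt; under the base URL above and reuses the free Llama model ID from earlier in this article. The function names are illustrative, not part of any SDK.&lt;/p&gt;

```python
import json
import urllib.request

BASE_URL = "https://api.together.xyz/v1"  # base URL quoted in the answer above


def build_chat_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {
        "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo-Free",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request: urllib.request.Request) -> str:
    """Send the request and return the first choice's text (needs network access)."""
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]
```

&lt;p&gt;The request body and the response path (&lt;code&gt;choices[0].message.content&lt;/code&gt;) are the same shapes the OpenAI SDK produces and parses, which is why swapping the base URL is usually the only change needed.&lt;/p&gt;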

&lt;p&gt;&lt;strong&gt;What’s the difference between “Turbo” and non-Turbo models?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Turbo variants are quantized (typically FP8) for higher throughput at very small quality loss. Together AI publishes evaluation numbers showing Turbo variants stay within a fraction of a percent of full-precision quality on standard benchmarks. For nearly all production use cases, prefer Turbo.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I use Together AI for commercial projects?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes — both the Free and paid tiers permit commercial use, subject to each model’s underlying license. Llama models follow Meta’s Llama Community License, FLUX.1 [schnell] is Apache 2.0, and so on. Confirm any specific model’s license on its model card before shipping.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does Together AI store my prompts or completions?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Together AI’s stated policy is that they don’t train on your data and that prompts are not retained beyond what’s needed for abuse prevention. For sensitive workloads, the dedicated/enterprise tiers offer stronger data-handling guarantees. Re-check the current &lt;a href="https://www.together.ai/privacy" rel="noopener noreferrer"&gt;privacy policy&lt;/a&gt; before sending real customer data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does the Free tier compare to running models locally with &lt;a href="https://toolfreebie.com/ollama-run-ai-models-locally-for-free/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt;?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Ollama is unbeatable for offline development and zero-cost long-running tasks, but it’s bounded by the GPU on your laptop — running Llama 3.3 70B locally requires serious hardware. Together AI’s Free tier gives you the same model running on a real datacenter GPU, just with rate limits. The two tools are complements: prototype locally with Ollama on a smaller model, then call Together AI when you need the 70B for the parts that matter.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Verdict
&lt;/h2&gt;

&lt;p&gt;Together AI’s Free tier is the most underrated entry point in the free-AI-API space because it solves a problem most other free APIs ignore: &lt;strong&gt;multimodal coverage on a single key&lt;/strong&gt;. Every other provider in this category is great at one thing — Cerebras for raw speed, Gemini for context length, Cohere for retrieval, Cloudflare for edge — and forces you to integrate three or four of them if your project needs more than one capability. Together AI’s &lt;code&gt;-Free&lt;/code&gt; models give you chat, reasoning, vision, and image generation behind one HTTPS endpoint, one SDK, and one key, with no credit card.&lt;/p&gt;

&lt;p&gt;For prototyping multimodal agents, building a side project that mixes capabilities, or just keeping one fewer signup form on your “maybe later” list, Together AI’s Free tier earns its place in any serious 2026 free-AI-API stack. Sign up at &lt;a href="https://api.together.ai" rel="noopener noreferrer"&gt;api.together.ai&lt;/a&gt;, copy the key, and your first chat completion is about three minutes away.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Reads
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/10-best-free-ai-apis-in-2026-the-ultimate-comparison/" rel="noopener noreferrer"&gt;10 Best Free AI APIs in 2026: The Ultimate Comparison&lt;/a&gt; — the master list of every free chat API worth your time&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/groq-vs-cerebras-vs-gemini-fastest-free-ai-api-2026/" rel="noopener noreferrer"&gt;Groq vs Cerebras vs Gemini: Which Free AI API Is Actually Fastest in 2026?&lt;/a&gt; — when raw speed is the deciding factor&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/deepseek-api-free-access-to-r1-reasoning-and-v3-chat-models/" rel="noopener noreferrer"&gt;DeepSeek API: Free Access to R1 Reasoning and V3 Chat Models&lt;/a&gt; — for the same R1 reasoning, sourced directly&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/openrouter-access-300-free-ai-models-with-one-api-key/" rel="noopener noreferrer"&gt;OpenRouter: Access 300+ Free AI Models with One API Key&lt;/a&gt; — when model variety matters more than coverage of a single provider&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://toolfreebie.com/cohere-free-api-embedding-rerank-rag-2026/" rel="noopener noreferrer"&gt;Cohere Free API: The Best Free Embedding and Rerank API for RAG in 2026&lt;/a&gt; — pair with Together AI for a complete free RAG stack&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://toolfreebie.com/together-ai-free-api-llama-deepseek-flux-2026/" rel="noopener noreferrer"&gt;toolfreebie.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Cohere Free API: The Best Free Embedding and Rerank API for RAG in 2026</title>
      <dc:creator>toolfreebie</dc:creator>
      <pubDate>Sun, 03 May 2026 15:50:01 +0000</pubDate>
      <link>https://dev.to/build996/cohere-free-api-the-best-free-embedding-and-rerank-api-for-rag-in-2026-5a2e</link>
      <guid>https://dev.to/build996/cohere-free-api-the-best-free-embedding-and-rerank-api-for-rag-in-2026-5a2e</guid>
      <description>&lt;h2&gt;
  
  
  What Is Cohere?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://cohere.com" rel="noopener noreferrer"&gt;Cohere&lt;/a&gt; is a Toronto-based AI company founded in 2019 by Aidan Gomez (one of the original authors of the “Attention Is All You Need” Transformer paper) and a team of ex-Google Brain researchers. Unlike OpenAI or Anthropic, Cohere built its platform from day one around a specific use case: &lt;strong&gt;enterprise retrieval and RAG (Retrieval-Augmented Generation)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That focus shows up in three places where Cohere genuinely leads the field — and where most developers don’t realize they can get it for free:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Embed v3&lt;/strong&gt; — text embeddings that consistently rank near the top of the MTEB benchmark, in both English and 100+ other languages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rerank v3&lt;/strong&gt; — the most-deployed neural reranker in production RAG systems, available via a single API call&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Command R / R+&lt;/strong&gt; — chat models specifically trained for RAG, tool use, and grounded citations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And the part most developers miss: a free Cohere trial key gives you access to &lt;em&gt;all&lt;/em&gt; of these. No credit card, no time limit. The only constraint is per-minute rate limiting, which is fine for prototyping, side projects, and real development work ahead of a production launch.&lt;/p&gt;

&lt;h2&gt;
  
  
  What’s Free on Cohere
&lt;/h2&gt;

&lt;p&gt;Cohere has two key types: &lt;strong&gt;Trial keys&lt;/strong&gt; (free) and &lt;strong&gt;Production keys&lt;/strong&gt; (paid). Trial keys never expire — they’re rate-limited but otherwise unrestricted.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Endpoint&lt;/th&gt;
&lt;th&gt;Trial Rate Limit&lt;/th&gt;
&lt;th&gt;Production Rate Limit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Chat (Command R/R+)&lt;/td&gt;
&lt;td&gt;20 calls/min&lt;/td&gt;
&lt;td&gt;500 calls/min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Embed&lt;/td&gt;
&lt;td&gt;100 calls/min&lt;/td&gt;
&lt;td&gt;2,000 calls/min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rerank&lt;/td&gt;
&lt;td&gt;10 calls/min&lt;/td&gt;
&lt;td&gt;1,000 calls/min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Classify&lt;/td&gt;
&lt;td&gt;100 calls/min&lt;/td&gt;
&lt;td&gt;1,000 calls/min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Summarize&lt;/td&gt;
&lt;td&gt;5 calls/min&lt;/td&gt;
&lt;td&gt;500 calls/min&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Notice the Embed limit: &lt;strong&gt;100 calls per minute&lt;/strong&gt; with up to 96 documents per call. That’s effectively 9,600 embeddings per minute on the free tier — more than enough to index a personal knowledge base or a small document corpus from scratch in a few minutes.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: Trial keys are not for production traffic, but they are for real development. Cohere’s documentation explicitly encourages building and testing on trial keys before upgrading.&lt;/em&gt;&lt;/p&gt;
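&lt;p&gt;The Embed arithmetic above (100 calls per minute, up to 96 documents per call) translates directly into a batching helper. This is a sketch under those two stated limits; the function names are illustrative, and the time estimate is a lower bound that ignores network latency and retries.&lt;/p&gt;

```python
import math
from typing import Iterator, List

MAX_TEXTS_PER_CALL = 96       # documents per embed call, from the paragraph above
TRIAL_CALLS_PER_MINUTE = 100  # trial Embed limit, from the table above


def batches(texts: List[str], size: int = MAX_TEXTS_PER_CALL) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` texts."""
    for start in range(0, len(texts), size):
        yield texts[start:start + size]


def minutes_to_index(num_texts: int) -> int:
    """Lower bound on whole minutes needed to embed a corpus at the trial limit."""
    calls = math.ceil(num_texts / MAX_TEXTS_PER_CALL)
    return math.ceil(calls / TRIAL_CALLS_PER_MINUTE)
```

&lt;p&gt;Each batch would be passed as the &lt;code&gt;texts&lt;/code&gt; argument of one embed call. A corpus of 25,000 chunks needs 261 calls, so &lt;code&gt;minutes_to_index(25_000)&lt;/code&gt; returns 3.&lt;/p&gt;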

&lt;h2&gt;
  
  
  How to Get Your Free API Key
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;a href="https://dashboard.cohere.com/welcome/register" rel="noopener noreferrer"&gt;dashboard.cohere.com/welcome/register&lt;/a&gt; and sign up with email or Google&lt;/li&gt;
&lt;li&gt;Verify your email address&lt;/li&gt;
&lt;li&gt;From the dashboard, navigate to &lt;strong&gt;API Keys&lt;/strong&gt; in the left sidebar&lt;/li&gt;
&lt;li&gt;Your default Trial key is already there — copy it&lt;/li&gt;
&lt;li&gt;Set it as an environment variable: &lt;code&gt;export COHERE_API_KEY="your_key_here"&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No credit card. No phone number. Two minutes from signup to your first embedding.&lt;/p&gt;

&lt;h2&gt;
  
  
  Python Quickstart: Your First Embedding
&lt;/h2&gt;

&lt;p&gt;Install the official Cohere Python SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;cohere
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Embedding three documents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cohere&lt;/span&gt;

&lt;span class="n"&gt;co&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cohere&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ClientV2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;COHERE_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;co&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cohere makes the best free embedding API for RAG.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OpenClaw is an AI agent platform for orchestrating tools.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Toronto is the headquarters of Cohere.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embed-english-v3.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;input_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_document&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding_types&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;float&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Got &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; embeddings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Each embedding is &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; dimensions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That returns three 1024-dimensional vectors you can drop into any vector database — Pinecone, Weaviate, Chroma, Qdrant, pgvector, or just a NumPy array.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;input_type&lt;/code&gt; parameter is important: Cohere’s embeddings are &lt;strong&gt;asymmetric&lt;/strong&gt;. Use &lt;code&gt;"search_document"&lt;/code&gt; when indexing your corpus, and &lt;code&gt;"search_query"&lt;/code&gt; when embedding the user’s question. Treating them differently gives noticeably better retrieval quality than symmetric embedding APIs.&lt;/p&gt;
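&lt;p&gt;Once documents are embedded with &lt;code&gt;"search_document"&lt;/code&gt; and the question with &lt;code&gt;"search_query"&lt;/code&gt;, retrieval comes down to comparing the query vector against each document vector. The sketch below uses cosine similarity in plain Python with toy 3-dimensional vectors standing in for the real 1024-dimensional ones. It assumes non-zero vectors, and the function names are illustrative.&lt;/p&gt;

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, most similar first."""
    scored = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy stand-ins: in practice doc_vecs come from input_type="search_document"
# and query_vec from a separate call with input_type="search_query".
doc_vecs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.1, 0.0, 0.9]]
query_vec = [0.8, 0.2, 0.1]

for index, score in rank_documents(query_vec, doc_vecs):
    print(f"doc {index}: {score:.3f}")
```

&lt;p&gt;With these toy values document 0 ranks first, because it points in nearly the same direction as the query. A vector database does the same comparison, just with an index that avoids scanning every vector.&lt;/p&gt;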

&lt;h2&gt;
  
  
  Embedding Models You Get for Free
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model ID&lt;/th&gt;
&lt;th&gt;Dimensions&lt;/th&gt;
&lt;th&gt;Languages&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;embed-english-v3.0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1024&lt;/td&gt;
&lt;td&gt;English&lt;/td&gt;
&lt;td&gt;Highest quality English search and RAG&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;embed-multilingual-v3.0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1024&lt;/td&gt;
&lt;td&gt;100+&lt;/td&gt;
&lt;td&gt;Multilingual search, cross-language RAG&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;embed-english-light-v3.0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;384&lt;/td&gt;
&lt;td&gt;English&lt;/td&gt;
&lt;td&gt;Smaller index, faster queries, low storage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;embed-multilingual-light-v3.0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;384&lt;/td&gt;
&lt;td&gt;100+&lt;/td&gt;
&lt;td&gt;Multilingual on a budget&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For most RAG projects, &lt;code&gt;embed-english-v3.0&lt;/code&gt; at 1024 dimensions is the sweet spot. If you’re storing millions of vectors and storage cost matters, the light variants drop to 384 dimensions, which makes the raw vectors about 62% smaller (384 / 1024 = 0.375 of the original size), with only a small quality drop.&lt;/p&gt;
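&lt;p&gt;The storage trade-off between the full and light models is easy to check with a few lines of arithmetic. This sketch assumes vectors are stored as 32-bit floats and ignores any overhead a vector database adds for its index structures, so treat the results as a floor rather than a quote.&lt;/p&gt;

```python
BYTES_PER_FLOAT32 = 4  # assumption: uncompressed float32 storage


def index_size_gb(num_vectors: int, dimensions: int) -> float:
    """Raw vector storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32 / 1e9


def size_reduction(full_dims: int, light_dims: int) -> float:
    """Fraction of storage saved by moving from full_dims to light_dims."""
    return 1 - light_dims / full_dims


if __name__ == "__main__":
    vectors = 5_000_000  # hypothetical corpus size
    print(f"1024 dims: {index_size_gb(vectors, 1024):.2f} GB")
    print(f" 384 dims: {index_size_gb(vectors, 384):.2f} GB")
    print(f"saved:     {size_reduction(1024, 384):.1%}")
```

&lt;p&gt;For five million vectors that is 20.48 GB at 1024 dimensions against 7.68 GB at 384, a saving of 62.5%.&lt;/p&gt;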

&lt;h2&gt;
  
  
  Cohere Rerank: The Secret Weapon for RAG Quality
&lt;/h2&gt;

&lt;p&gt;Here is where Cohere genuinely leads: &lt;strong&gt;Rerank&lt;/strong&gt;. After your vector database returns the top 50 or 100 candidate documents, you pass them to Rerank along with the user’s query. Rerank scores each document against the query for actual relevance and reorders the list. The top 5 reranked results are usually noticeably better than the top 5 from raw vector similarity, because the reranker reads the query and each document together instead of comparing two independently computed vectors.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cohere&lt;/span&gt;

&lt;span class="n"&gt;co&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cohere&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ClientV2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;COHERE_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How do I add a free embedding API to my chatbot?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cohere offers free embedding API access through trial keys.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Pinecone is a managed vector database service.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OpenAI embeddings cost $0.02 per million tokens.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Use embed-english-v3.0 for the best quality English embeddings.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Vector databases store high-dimensional vectors for similarity search.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;co&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rerank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rerank-english-v3.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;top_n&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Score: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relevance_score&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  |  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That returns the three documents most relevant to the query, each with a relevance score normalized between 0 and 1. In production RAG systems, adding a Rerank step often improves answer quality noticeably over vector-similarity-only retrieval (the 15–30% range is a rough rule of thumb that depends on your corpus and how you measure quality), which is why it is so widely deployed in commercial RAG stacks.&lt;/p&gt;

&lt;p&gt;And it’s &lt;strong&gt;free on the trial key&lt;/strong&gt;: 10 calls per minute, with up to 1,000 documents per call.&lt;/p&gt;
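&lt;p&gt;With only 10 calls per minute available, a batch job can hit the limit quickly. One way to stay under it is a small client-side throttle. The sketch below is a minimal, single-threaded example; &lt;code&gt;MinuteThrottle&lt;/code&gt; is a hypothetical helper written for this article, not part of the Cohere SDK.&lt;/p&gt;

```python
import time
from collections import deque


class MinuteThrottle:
    """Block until a call fits inside a sliding time window (single-threaded)."""

    def __init__(self, max_calls: int, window: float = 60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.window = window
        self.clock = clock      # injectable so the class can be tested without waiting
        self.sleep = sleep
        self.calls = deque()    # timestamps of calls still inside the window

    def wait(self) -> None:
        now = self.clock()
        # Forget calls that have already left the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest call expires, then drop it.
            self.sleep(self.window - (now - self.calls[0]))
            self.calls.popleft()
            now = self.clock()
        self.calls.append(now)


# Assumed limit from the text above: 10 Rerank calls per minute on a trial key.
rerank_throttle = MinuteThrottle(max_calls=10)
# Call rerank_throttle.wait() immediately before each co.rerank(...) request.
```

&lt;p&gt;The throttle only spaces out your own requests. It does not replace handling a rate-limit error from the API, which can still happen if several processes share one key.&lt;/p&gt;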

&lt;h2&gt;
  
  
  Chat with Command R+: Built for RAG
&lt;/h2&gt;

&lt;p&gt;Cohere’s Command R+ chat model is purpose-built for RAG. Unlike most chat APIs where you stuff retrieved documents into the system prompt, Cohere’s chat endpoint accepts a structured &lt;code&gt;documents&lt;/code&gt; parameter, and the model returns inline citations pointing to which documents each claim came from.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cohere&lt;/span&gt;

&lt;span class="n"&gt;co&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cohere&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ClientV2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;COHERE_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;co&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;command-r-plus&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Which Cohere embedding model should I use for English RAG?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embed-english-v3.0 produces 1024-dimensional embeddings and leads MTEB English benchmarks.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embed-english-light-v3.0 produces 384-dimensional embeddings, optimized for low storage cost.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embed-multilingual-v3.0 supports over 100 languages.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Citations:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;citation&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;citations&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;[]:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  - &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;citation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; from sources: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;citation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sources&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model produces a grounded answer that cites which document each fact came from. For RAG applications where users need to verify the source of every claim — legal, medical, internal knowledge bases — this is significantly more useful than free-text generation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Free Chat Models on Cohere
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model ID&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Context Window&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;command-r-plus&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;104B&lt;/td&gt;
&lt;td&gt;128k tokens&lt;/td&gt;
&lt;td&gt;Best quality, complex RAG, tool use&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;command-r&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;35B&lt;/td&gt;
&lt;td&gt;128k tokens&lt;/td&gt;
&lt;td&gt;Faster RAG, cheaper-when-paid baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;command-r7b-12-2024&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;7B&lt;/td&gt;
&lt;td&gt;128k tokens&lt;/td&gt;
&lt;td&gt;Fastest responses, simple Q&amp;amp;A&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All three are available through your free trial key and share the same 20-calls-per-minute rate limit. &lt;code&gt;command-r-plus&lt;/code&gt; is the headline model: it is positioned as comparable to GPT-4o on RAG tasks and is explicitly trained to cite the documents it draws on.&lt;/p&gt;

&lt;h2&gt;
  
  
  End-to-End RAG Pipeline (All Free)
&lt;/h2&gt;

&lt;p&gt;Here’s a complete RAG pipeline using only Cohere’s free trial key: embed, store in memory, retrieve, rerank, and answer.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cohere&lt;/span&gt;

&lt;span class="n"&gt;co&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cohere&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ClientV2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;COHERE_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Your knowledge base
&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OpenClaw is an AI agent platform for orchestrating multiple AI APIs and tools.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cohere Embed v3 produces 1024-dimensional vectors optimized for retrieval.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cohere Rerank v3 reorders candidate documents by true relevance to the query.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Command R+ is a 104B model trained specifically for RAG with citations.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Free trial keys on Cohere have no time limit — only per-minute rate limits.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Index documents
&lt;/span&gt;&lt;span class="n"&gt;doc_embeds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;co&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embed-english-v3.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;input_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_document&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding_types&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;float&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;
&lt;span class="n"&gt;doc_matrix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc_embeds&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Embed the query
&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How do I get free access to Cohere&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s RAG models?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;query_embed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;co&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embed-english-v3.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;input_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding_types&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;float&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# 4. Vector similarity — get top 3 candidates
&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;doc_matrix&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt; &lt;span class="n"&gt;query_embed&lt;/span&gt;
&lt;span class="n"&gt;top_indices&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;argsort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;:][::&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;candidates&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;top_indices&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# 5. Rerank to get best 2
&lt;/span&gt;&lt;span class="n"&gt;reranked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;co&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rerank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rerank-english-v3.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;top_n&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;top_docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;reranked&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# 6. Answer with Command R+ using grounded citations
&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;co&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;command-r-plus&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;top_docs&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s a full production-shaped RAG pipeline (embed, retrieve, rerank, generate with citations) running on a free trial key with no credit card on file.&lt;/p&gt;
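&lt;p&gt;One detail worth checking in step 4: the pipeline ranks candidates by raw dot product, which matches cosine similarity only when every vector has unit length. Whether the returned embeddings are already normalized is not stated here, so normalizing explicitly is a cheap safeguard. The sketch below uses synthetic 2-D vectors in place of real embeddings, and &lt;code&gt;cosine_top_k&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```python
import numpy as np


def cosine_top_k(doc_matrix: np.ndarray, query_vec: np.ndarray, k: int) -> list[int]:
    """Return indices of the k rows most similar to query_vec by cosine similarity."""
    # Scale every document vector and the query to unit length.
    doc_norm = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    query_norm = query_vec / np.linalg.norm(query_vec)
    scores = doc_norm @ query_norm
    # argsort is ascending, so negate the scores to get best-first order.
    return np.argsort(-scores)[:k].tolist()


# Synthetic vectors: row 0 is long but points away from the query.
docs = np.array([[10.0, 0.0], [1.0, 1.0], [1.0, 3.0]])
query = np.array([1.0, 1.0])

print(cosine_top_k(docs, query, k=2))
```

&lt;p&gt;A raw dot product would rank row 0 first purely because of its length. After normalization the top two are rows 1 and 2, the vectors that actually point in the query’s direction.&lt;/p&gt;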

&lt;h2&gt;
  
  
  JavaScript / Node.js Example
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;cohere-ai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;CohereClientV2&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;cohere-ai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;co&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;CohereClientV2&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;token&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;COHERE_API_KEY&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;co&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Cohere is the best free embedding API for RAG.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Toronto is the headquarters of Cohere.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;embed-english-v3.0&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;inputType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;search_document&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;embeddingTypes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;float&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Got &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;float&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; embeddings`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Cohere vs Other Free Embedding Options
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Free Embedding Model&lt;/th&gt;
&lt;th&gt;Dimensions&lt;/th&gt;
&lt;th&gt;Multilingual&lt;/th&gt;
&lt;th&gt;Reranker?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cohere&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;embed-english-v3.0 / multilingual-v3.0&lt;/td&gt;
&lt;td&gt;1024 (384 for the light variants)&lt;/td&gt;
&lt;td&gt;100+ languages&lt;/td&gt;
&lt;td&gt;Yes (Rerank v3)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Google Gemini&lt;/td&gt;
&lt;td&gt;text-embedding-004&lt;/td&gt;
&lt;td&gt;768&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mistral AI&lt;/td&gt;
&lt;td&gt;mistral-embed&lt;/td&gt;
&lt;td&gt;1024&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloudflare Workers AI&lt;/td&gt;
&lt;td&gt;bge-base-en-v1.5&lt;/td&gt;
&lt;td&gt;768&lt;/td&gt;
&lt;td&gt;English only&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hugging Face Inference&lt;/td&gt;
&lt;td&gt;BGE / E5 family&lt;/td&gt;
&lt;td&gt;varies&lt;/td&gt;
&lt;td&gt;Some multilingual&lt;/td&gt;
&lt;td&gt;No (manual setup)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI (paid only)&lt;/td&gt;
&lt;td&gt;text-embedding-3-large&lt;/td&gt;
&lt;td&gt;3072&lt;/td&gt;
&lt;td&gt;Strong multilingual&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Where Cohere wins on the free tier:&lt;/strong&gt; it is the only provider on this list that ships a hosted neural reranker. For RAG quality, that single feature usually matters more than which embedding model you started with. Combined with asymmetric embeddings (separate &lt;code&gt;search_query&lt;/code&gt; and &lt;code&gt;search_document&lt;/code&gt; modes), Cohere’s free tier is a credible foundation for real retrieval applications, not just a demo toy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use Cohere with OpenClaw
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://openclaw.ai" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; is an AI agent platform that orchestrates multiple APIs and tools into automated workflows. Cohere fits well as the &lt;strong&gt;retrieval and grounding layer&lt;/strong&gt; inside OpenClaw agents — the part that searches your private documents before the agent acts.&lt;/p&gt;

&lt;p&gt;A common pattern: an OpenClaw agent receives a user task (“draft a reply to this customer ticket”), uses Cohere Embed + Rerank to pull the three most relevant past tickets and policies from your knowledge base, then passes those documents to Command R+ to generate a cited reply. Because Cohere returns explicit citations, the agent can attach source links to the draft for human review.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cohere&lt;/span&gt;

&lt;span class="n"&gt;co&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cohere&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ClientV2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;COHERE_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve_and_answer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;knowledge_base&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;A retrieval-then-answer step for use inside an OpenClaw agent.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Rerank scores every document against the query, so for a small corpus it covers retrieval and ranking in one call
&lt;/span&gt;    &lt;span class="n"&gt;reranked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;co&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rerank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rerank-english-v3.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;knowledge_base&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;top_n&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;top_docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;knowledge_base&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;reranked&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;co&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;command-r-plus&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
        &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;top_docs&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sources&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;top_docs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;citations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;citations&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Example use inside an agent step
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;retrieve_and_answer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is our refund policy for digital downloads?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;knowledge_base&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;load_company_kb&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# your own loader
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that when you have only a few hundred candidate documents, you can skip the embedding and vector-database step entirely and pass everything straight to Rerank. The free trial key allows up to 1,000 documents per Rerank call, which is enough for many small-to-medium knowledge bases.&lt;/p&gt;
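&lt;p&gt;If a knowledge base grows past that per-call limit, one possible approach is to rerank it in batches and merge the results by score. The sketch below covers only the batching step. &lt;code&gt;batch_documents&lt;/code&gt; is a hypothetical helper, not part of the Cohere SDK, and the 1,000-document figure is simply the one quoted above.&lt;/p&gt;

```python
# Sketch only: batch_documents is a hypothetical helper, not part of the Cohere SDK.
# The 1,000-document limit is the trial-key figure quoted in the paragraph above.
MAX_DOCS_PER_RERANK_CALL = 1000


def batch_documents(documents, batch_size=MAX_DOCS_PER_RERANK_CALL):
    """Yield (offset, batch) pairs; offset maps batch positions back to the full list."""
    for offset in range(0, len(documents), batch_size):
        yield offset, documents[offset:offset + batch_size]


# Example with a made-up knowledge base of 2,500 short documents.
knowledge_base = [f"document {i}" for i in range(2500)]
for offset, batch in batch_documents(knowledge_base):
    print(offset, len(batch))
```

&lt;p&gt;Each batch would go to its own Rerank call, and adding &lt;code&gt;offset&lt;/code&gt; to a result's &lt;code&gt;index&lt;/code&gt; recovers its position in the full list. Merging across batches assumes relevance scores for the same query are comparable between calls, which is worth verifying on your own data.&lt;/p&gt;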

&lt;h2&gt;
  
  
  Cohere Pricing (When You Need More)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Price&lt;/th&gt;
&lt;th&gt;Unit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Command R+&lt;/td&gt;
&lt;td&gt;$2.50 input / $10.00 output&lt;/td&gt;
&lt;td&gt;per 1M tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Command R&lt;/td&gt;
&lt;td&gt;$0.15 input / $0.60 output&lt;/td&gt;
&lt;td&gt;per 1M tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Command R7B&lt;/td&gt;
&lt;td&gt;$0.0375 input / $0.15 output&lt;/td&gt;
&lt;td&gt;per 1M tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Embed v3 (English / Multilingual)&lt;/td&gt;
&lt;td&gt;$0.10&lt;/td&gt;
&lt;td&gt;per 1M tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rerank v3&lt;/td&gt;
&lt;td&gt;$2.00&lt;/td&gt;
&lt;td&gt;per 1,000 searches&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;When you graduate from a Trial key to a Production key, Command R7B at $0.15 per million output tokens is one of the cheapest production-grade models available. Embed v3 at $0.10 per million tokens is priced in line with, and in some cases below, comparable hosted embedding APIs.&lt;/p&gt;
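&lt;p&gt;To see how those prices combine, here is a rough monthly cost estimator. The prices are copied from the table above. The function name and the traffic volumes in the example are made up for illustration, so treat the result as a back-of-envelope figure rather than a quote.&lt;/p&gt;

```python
# USD per 1M tokens as (input, output), copied from the pricing table above.
CHAT_PRICES = {
    "Command R+": (2.50, 10.00),
    "Command R": (0.15, 0.60),
    "Command R7B": (0.0375, 0.15),
}
EMBED_PER_1M_TOKENS = 0.10
RERANK_PER_1K_SEARCHES = 2.00


def estimate_cost(model, input_tokens, output_tokens, embed_tokens=0, rerank_searches=0):
    """Rough USD cost for one billing period (hypothetical helper)."""
    in_price, out_price = CHAT_PRICES[model]
    chat = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    embed = embed_tokens * EMBED_PER_1M_TOKENS / 1_000_000
    rerank = rerank_searches * RERANK_PER_1K_SEARCHES / 1_000
    return round(chat + embed + rerank, 2)


# Hypothetical workload: 40M input tokens, 10M output tokens, 20,000 Rerank searches.
for model in CHAT_PRICES:
    print(model, estimate_cost(model, 40_000_000, 10_000_000, rerank_searches=20_000))
```

&lt;p&gt;At these assumed volumes, the Rerank searches cost more than the Command R7B tokens do, which is worth knowing before you rerank on every agent step.&lt;/p&gt;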

&lt;h2&gt;
  
  
  When to Use Cohere
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Cohere is the right choice when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You’re building a RAG application and want the best free embeddings + reranker combo&lt;/li&gt;
&lt;li&gt;You need multilingual retrieval across 100+ languages without changing models&lt;/li&gt;
&lt;li&gt;Your application requires grounded citations (legal, medical, internal knowledge bases)&lt;/li&gt;
&lt;li&gt;You want asymmetric embeddings (separate query and document modes) for better search quality&lt;/li&gt;
&lt;li&gt;You’re prototyping retrieval pipelines and want generous free per-minute limits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Consider alternatives when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need raw chat throughput more than retrieval quality — use Groq or Cerebras for speed, Gemini Flash for free quota&lt;/li&gt;
&lt;li&gt;You want OpenAI SDK drop-in compatibility — use Mistral AI or DeepSeek&lt;/li&gt;
&lt;li&gt;You need image, audio, or multimodal generation — Cohere is text-only&lt;/li&gt;
&lt;li&gt;You’re building a pure chatbot with no retrieval — Command R+ works, but the model isn’t priced or designed around that use case&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Related Reads
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/groq-vs-cerebras-vs-gemini/" rel="noopener noreferrer"&gt;Groq vs Cerebras vs Gemini: Which Free AI API Is Actually Fastest in 2026?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/cerebras-free-api/" rel="noopener noreferrer"&gt;Cerebras Inference API: The Fastest Free AI API You’ve Never Heard Of&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/mistral-free-api/" rel="noopener noreferrer"&gt;Mistral AI Free API: Call Nemo and Mixtral for Free with Any OpenAI SDK&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/github-models-free-api/" rel="noopener noreferrer"&gt;GitHub Models: Free GPT-4o and Llama API for Every Developer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/cloudflare-workers-ai/" rel="noopener noreferrer"&gt;Cloudflare Workers AI: Free Edge AI Inference with 47+ Models&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Final Verdict
&lt;/h2&gt;

&lt;p&gt;Cohere is the most underrated free AI API for one specific reason: it’s the only provider that ships a complete RAG stack — embeddings, reranker, and a chat model trained for grounded citations — all behind a single free trial key. Most “free AI API” articles skip Cohere because they only compare chat models, where Cohere is fine but not best-in-class. That misses the point of what the company actually built.&lt;/p&gt;

&lt;p&gt;If your project involves search over your own documents, internal knowledge bases, customer tickets, product catalogs, or anything resembling RAG, Cohere’s free tier covers more of the pipeline than any other single provider. Sign up at &lt;a href="https://dashboard.cohere.com/welcome/register" rel="noopener noreferrer"&gt;dashboard.cohere.com&lt;/a&gt;, copy your trial key, and your first reranked retrieval is about ten minutes away.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://toolfreebie.com/cohere-rag-api/" rel="noopener noreferrer"&gt;toolfreebie.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Free AI Video Generators in 2026: Kling vs Pika vs HeyGen Compared</title>
      <dc:creator>toolfreebie</dc:creator>
      <pubDate>Sun, 03 May 2026 15:45:57 +0000</pubDate>
      <link>https://dev.to/build996/free-ai-video-generators-in-2026-kling-vs-pika-vs-heygen-compared-39k6</link>
      <guid>https://dev.to/build996/free-ai-video-generators-in-2026-kling-vs-pika-vs-heygen-compared-39k6</guid>
      <description>&lt;h2&gt;
  
  
  The State of Free AI Video Generation in 2026
&lt;/h2&gt;

&lt;p&gt;Two years ago, generative video was a research demo. You’d see a five-second OpenAI Sora clip on Twitter, a Runway Gen-2 reel that looked like a melted oil painting, and a vague feeling that “real” AI video was still a year or two out. By early 2026 that’s no longer true. There are three tools I now reach for every week — &lt;strong&gt;Kling&lt;/strong&gt;, &lt;strong&gt;Pika&lt;/strong&gt;, and &lt;strong&gt;HeyGen&lt;/strong&gt; — and all three have a free tier you can use without a credit card.&lt;/p&gt;

&lt;p&gt;The three solve different problems. Kling is what you use when you want a cinematic short clip generated from a still image or a text prompt. Pika is what you use when you want to direct a scene with motion brushes, lip-sync, and quick edits. HeyGen is what you use when you want a talking-head video of a fake (or real, with permission) person reading a script you wrote. They are not competitors so much as three slots in the same AI video toolkit.&lt;/p&gt;

&lt;p&gt;This article walks through each tool, what its free tier actually includes in April 2026, where the rough edges are, and how I’ve wired all three into automation built on top of &lt;a href="https://openclaw.ai" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; for batch video generation. If you’re a creator, a developer building media tooling, or a marketer trying to stop paying $150/month for stock video, one or more of these will earn its place in your workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Quick Verdict, Up Front
&lt;/h2&gt;

&lt;p&gt;If you only have ten seconds to read this article:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Kling&lt;/strong&gt;: best for cinematic image-to-video and text-to-video. Free tier gives you 166 credits/day (roughly 6-8 short 1080p clips) with 1080p output on the standard model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pika&lt;/strong&gt;: best for scene-level direction, motion brushes, and quick edits to existing video. Free tier is 250 credits at signup with no automatic daily refresh.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HeyGen&lt;/strong&gt;: best for AI avatar talking-head videos for marketing, training, and tutorials. Free tier is three minutes of video per month with a watermark.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The rest of this article is the long version of why I picked those three over the dozen other contenders, what the actual workflow looks like, and how to chain them together for things like automated short-form video pipelines.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Picked These Three
&lt;/h2&gt;

&lt;p&gt;The free AI video space is crowded. There’s Runway, Luma Dream Machine, Hailuo (MiniMax), Vidu, Kling, Pika, HeyGen, Synthesia, D-ID, Sora when you can get a slot, and a long tail of WeChat-only Chinese tools. To narrow the field, I tested for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A real free tier in April 2026.&lt;/strong&gt; Not a “free trial that needs a card.” Several big-name tools quietly removed credit-card-free signup over the last year.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output quality I’d actually use.&lt;/strong&gt; Not just demo-reel cherry-picks. I generated the same prompt across every candidate and compared the dud rate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Different problem space.&lt;/strong&gt; A roundup of three text-to-video tools that all do the same thing isn’t useful. I picked one cinematic generator (Kling), one motion-control editor (Pika), and one talking-head avatar tool (HeyGen).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API or automation surface.&lt;/strong&gt; At least one of the three needs to be scriptable, because that’s where AI video gets interesting beyond hobby use.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Notable tools that didn’t make this list and why:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Runway Gen-3&lt;/strong&gt; — beautiful output, but the free tier is now 125 one-time credits and that’s it. Once you’ve burned them, you’re paying. Kling and Pika are more sustainable for ongoing free use.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Luma Dream Machine&lt;/strong&gt; — solid quality, but the free tier dropped to 30 generations/month in late 2025. Workable for occasional use but more limited than Kling’s daily refresh.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sora&lt;/strong&gt; — when you can get access through a ChatGPT Plus account it’s stunning, but it’s not really “free” — you’re paying for the Plus subscription.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Synthesia&lt;/strong&gt; — free tier removed in 2024. Fully paid product now.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  1. Kling: The Best Free Cinematic Video Generator
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fna9smhk0p92guu0fjbq8.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fna9smhk0p92guu0fjbq8.jpg" alt="KlingAI 3.0 community page showing the All-New KlingAI 3.0 Series hero with a real generated cinematic clip" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Kling's English-locale community landing — the desert-driving hero is itself a Kling-generated clip.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://klingai.com" rel="noopener noreferrer"&gt;Kling&lt;/a&gt; is built by Kuaishou, the Chinese short-video company with hundreds of millions of users, and it’s currently my default for “give me a five-second cinematic shot of X.” The model handles motion, light, and camera moves better than anything else available without payment in 2026. Most importantly, the free tier is unusually generous: a daily credit refresh rather than a one-time pool.&lt;/p&gt;

&lt;h3&gt;
  
  
  What the Free Tier Actually Includes
&lt;/h3&gt;

&lt;p&gt;As of April 2026, signing up for Kling with an email gives you 166 credits per day. Each generation costs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Standard text-to-video, 5s, 720p:&lt;/strong&gt; 10 credits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Standard text-to-video, 5s, 1080p:&lt;/strong&gt; 20 credits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Standard image-to-video, 5s, 1080p:&lt;/strong&gt; 20 credits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pro mode (higher quality, 10s):&lt;/strong&gt; 35 credits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lip sync, motion brush, camera control:&lt;/strong&gt; usually +5 to +10 credits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That works out to roughly 6-8 standard 1080p clips per day at no cost, or 3-4 longer Pro clips. The credits don’t roll over, so you have to use them or lose them — but the daily refresh is what makes Kling viable as a long-term free tool rather than a brief trial.&lt;/p&gt;
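&lt;p&gt;The arithmetic behind those ranges is simple enough to script. The sketch below uses the credit figures from the list above; &lt;code&gt;clips_per_day&lt;/code&gt; is a hypothetical helper, and the add-on cost is an input because the list only gives it as a rough +5 to +10 range.&lt;/p&gt;

```python
# Figures copied from the list above (April 2026 free tier).
DAILY_CREDITS = 166
COST_PER_CLIP = {
    "standard 720p, 5s": 10,
    "standard 1080p, 5s": 20,
    "pro, 10s": 35,
}


def clips_per_day(cost, extras=0):
    """Whole clips that fit in one day's credits; extras = add-on credits per clip."""
    return DAILY_CREDITS // (cost + extras)


for name, cost in COST_PER_CLIP.items():
    # Add-ons such as lip sync or camera control cost roughly +5 to +10 credits.
    print(name, clips_per_day(cost), clips_per_day(cost, extras=5), clips_per_day(cost, extras=10))
```

&lt;p&gt;With no add-ons that gives 8 standard 1080p clips or 4 Pro clips a day. A +5 add-on on every clip drops the 1080p figure to 6, and a +10 add-on drops Pro to 3, which is where the 6-8 and 3-4 ranges come from.&lt;/p&gt;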

&lt;h3&gt;
  
  
  The Standard vs Pro Difference
&lt;/h3&gt;

&lt;p&gt;Kling ships two underlying models. &lt;strong&gt;Standard&lt;/strong&gt; is fast (about 60 seconds per generation) and handles most prompts well. &lt;strong&gt;Pro&lt;/strong&gt; takes longer (3-5 minutes), produces noticeably better motion coherence, and supports the longer 10-second outputs. For text-to-video without a reference image, Pro is worth the credit hit; for image-to-video starting from a strong reference still, Standard is usually fine.&lt;/p&gt;

&lt;h3&gt;
  
  
  A First Generation
&lt;/h3&gt;

&lt;p&gt;The web UI is intentionally simple. Sign in with Google or email, pick text-to-video or image-to-video, type a prompt or upload an image, set duration and resolution, hit Generate. A queue position appears, the clip arrives in your library when ready, and you can download as MP4.&lt;/p&gt;

&lt;p&gt;The single most important Kling-specific tip: &lt;strong&gt;prompts work best when written like a film shot description, not like a Midjourney prompt&lt;/strong&gt;. Compare:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bad: &lt;em&gt;“a cat, cyberpunk, neon, 4k, detailed, cinematic, high quality”&lt;/em&gt; — Kling treats the modifiers as scene elements and produces a confused frame.&lt;/li&gt;
&lt;li&gt;Good: &lt;em&gt;“Wide shot of a black cat walking slowly through a rainy Tokyo alley at night, neon signs reflected in puddles, slight steam rising from grates, camera tracking right at hip height.”&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The good prompt produces something that looks like a real cinematographer made a deliberate choice. The bad prompt produces a beautifully lit cat that doesn’t move convincingly. Tag-spam works for image generators; Kling rewards sentences.&lt;/p&gt;

&lt;h3&gt;
  
  
  Image-to-Video Is Where Kling Shines
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpbtbl73ix628wvqlyb3v.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpbtbl73ix628wvqlyb3v.jpg" alt="Kling templates page showing pre-built image-to-video starter prompts" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Kling ships templated image-to-video recipes — the fastest way to evaluate the model on the free tier.&lt;/p&gt;

&lt;p&gt;If you upload a still image and write a short motion prompt, Kling produces output that’s substantially better than its text-to-video. The reason is structural: the model only has to invent motion, not the entire visual world. This is the workflow I use weekly:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Generate a hero still in Midjourney, Imagen, or Flux. Iterate until the image is exactly what I want.&lt;/li&gt;
&lt;li&gt;Upload that still to Kling, image-to-video mode, 1080p, 5s.&lt;/li&gt;
&lt;li&gt;Prompt with motion only: &lt;em&gt;“Camera slowly pushes in on the subject. Hair moves gently in the wind. Background trees sway.”&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Generate two or three takes (Kling is non-deterministic), pick the best one.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This pipeline costs 40-60 credits and produces output you’d otherwise pay a stock-video site $40 for. It’s the single highest-leverage use of Kling’s free tier.&lt;/p&gt;

&lt;h3&gt;
  
  
  Camera Controls and Motion Brush
&lt;/h3&gt;

&lt;p&gt;Kling’s camera control panel lets you specify pan, tilt, zoom, and orbit moves explicitly rather than hoping the prompt conveys them. Motion brush lets you mask part of the input image and tell the model “move this region in this direction.” Both features cost extra credits but eliminate most of the “the AI didn’t understand what I wanted to move” problem that plagued earlier video generators.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where Kling Falls Short
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Faces drift over longer clips.&lt;/strong&gt; A 10-second Pro clip with a clear human face will sometimes shift facial features halfway through. Workaround: keep clips at 5 seconds and stitch in DaVinci Resolve.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Text in scenes is unreadable.&lt;/strong&gt; Like every video model in 2026, signs and on-screen text are gibberish. Generate clean plates and overlay real text in post.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The free tier UI is in Chinese by default for some signup regions.&lt;/strong&gt; The English toggle is in the top right, and the visual layout is easy enough to follow until you find it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Daily credits don’t accumulate.&lt;/strong&gt; If you don’t log in for a week, you don’t have 1,162 credits waiting — you have 166. Plan your generation days.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  2. Pika: Scene-Level Direction and Motion Brushes
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqsw0yt0xejrvm38svc2m.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqsw0yt0xejrvm38svc2m.jpg" alt="Pika homepage showing the Pikaformances feature preview alongside the Google/Facebook/Discord/Email sign-in card" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Pika gates everything behind a free account — the modal you see is unavoidable, but signup itself is genuinely free.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://pika.art" rel="noopener noreferrer"&gt;Pika&lt;/a&gt; is the second tool I keep in rotation. Where Kling is best at “generate me a cinematic shot,” Pika is best at “take this clip and modify it with surgical precision.” It’s the closest thing in the free AI video space to a non-linear editor whose operations are AI primitives rather than transitions.&lt;/p&gt;

&lt;h3&gt;
  
  
  What the Free Tier Actually Includes
&lt;/h3&gt;

&lt;p&gt;Pika’s free tier in April 2026 gives you 250 credits at signup, with no automatic daily refresh — you earn small amounts of additional credits by participating in their Discord challenges or referring users. Each generation costs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pika 2.2 text-to-video, 5s, 1080p:&lt;/strong&gt; 30 credits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image-to-video, 5s, 1080p:&lt;/strong&gt; 30 credits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pikaframes (frame-to-frame interpolation):&lt;/strong&gt; 35 credits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pikaffects (specific transformation effects):&lt;/strong&gt; 25-50 credits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lip sync to audio:&lt;/strong&gt; 30 credits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s roughly 8-10 generations from your initial pool. After that you’re either paying $10/month for the Standard plan (700 credits/mo) or hunting for community credit drops. The free tier is best understood as a generous trial rather than a sustainable daily tool — the opposite shape from Kling.&lt;/p&gt;
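&lt;p&gt;Because the pool never refills on its own, it helps to budget it before you start clicking. Here is a small sketch that prices a planned set of generations against the 250 signup credits. The credit costs come from the list above; &lt;code&gt;remaining_after&lt;/code&gt; and the example plan are hypothetical.&lt;/p&gt;

```python
# Credit costs copied from the list above (April 2026 free tier).
SIGNUP_CREDITS = 250
COSTS = {
    "text_to_video": 30,
    "image_to_video": 30,
    "pikaframes": 35,
    "lip_sync": 30,
}


def remaining_after(plan, credits=SIGNUP_CREDITS):
    """Credits left after a plan of {feature: count}; a negative result means over budget."""
    spent = sum(COSTS[feature] * count for feature, count in plan.items())
    return credits - spent


# Hypothetical plan: three image-to-video clips, two lip syncs, one Pikaframes morph.
plan = {"image_to_video": 3, "lip_sync": 2, "pikaframes": 1}
print(remaining_after(plan))
```

&lt;p&gt;That plan spends 185 credits and leaves 65, enough for two more 30-credit generations. Pikaffects are left out of the table because their cost is only given as a 25-50 range.&lt;/p&gt;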

&lt;h3&gt;
  
  
  Why Pika Is Worth a Slot Anyway
&lt;/h3&gt;

&lt;p&gt;Pika ships features the others don’t. Specifically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pikaffects&lt;/strong&gt; — pre-built transformation primitives. “Inflate” makes the subject puff up, “explode” replaces them with a particle burst, “melt” liquefies them, “crush” smashes them. They’re designed for short-form social video and they look great. No competitor offers this set as one-click effects.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pikaframes&lt;/strong&gt; — give it a starting image and an ending image, get a smooth video between them. Useful for product shots (“from box to assembled”), morphs, and storyboard-to-video.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lip sync&lt;/strong&gt; — upload a video of a person and an audio file, Pika rewrites the mouth to match the new audio. Quality is the best of the free tools I tested for this specific task.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Modify region&lt;/strong&gt; — paint a mask on a frame, prompt the change (“make the shirt red”), Pika regenerates only that region across the clip.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these is a headline “generate cinematic video from scratch” feature, but together they make Pika the right tool for finishing AI video that was generated elsewhere.&lt;/p&gt;

&lt;h3&gt;
  
  
  A Realistic Workflow
&lt;/h3&gt;

&lt;p&gt;A typical week of Pika work for me looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Generate a base clip in Kling (uses Kling’s free daily credits).&lt;/li&gt;
&lt;li&gt;Bring it into Pika to apply a Pikaffect or run lip sync against a voiceover I generated in ElevenLabs or Coqui.&lt;/li&gt;
&lt;li&gt;Export and assemble in DaVinci Resolve (also free).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That pipeline produces social-media-ready short-form video without paying for any of the tools involved. Pika’s free credits are limiting if it’s your only tool, but they go a long way when used surgically on top of another generator.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where Pika Falls Short
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Initial credit pool runs out fast.&lt;/strong&gt; 250 credits sounds like a lot until you realize a single generation is 30. After your first day of experimentation, expect to be on a slower drip.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No public API on the free tier.&lt;/strong&gt; Pika has an API but it’s invite-only and paid. On the free tier, scripting Pika means driving the web UI with browser automation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pikaffects are visually distinctive — to a fault.&lt;/strong&gt; If your audience watches a lot of TikTok they’ve seen the inflate/melt/explode effects on a hundred other accounts. Use sparingly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-form text prompts get truncated.&lt;/strong&gt; Keep your prompts under ~40 words for best results.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  3. HeyGen: AI Avatars That Read Your Script
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj55czqx6rqdmqy3ooeus.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj55czqx6rqdmqy3ooeus.jpg" alt="HeyGen templates page with the Transform any idea into a compelling video hero and the live AI Agent prompt UI" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;HeyGen's AI Agent landing — type a prompt, set duration and aspect, and the avatar pipeline kicks off.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.heygen.com" rel="noopener noreferrer"&gt;HeyGen&lt;/a&gt; solves a completely different problem from the other two. Where Kling and Pika generate cinematic or stylized video, HeyGen generates a realistic-looking person speaking words you typed. It’s the tool you reach for when you want a presenter for a tutorial, a marketing video, an e-learning module, or any context where someone needs to look at a camera and explain something.&lt;/p&gt;

&lt;h3&gt;
  
  
  What the Free Tier Actually Includes
&lt;/h3&gt;

&lt;p&gt;The HeyGen free tier in April 2026 gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;3 minutes of video per month&lt;/strong&gt; across all your generations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Access to ~100 stock avatars&lt;/strong&gt; (real people who licensed their likeness)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;~300 voices in 40+ languages&lt;/strong&gt; via the built-in TTS&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;720p export with a HeyGen watermark&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Up to 1-minute video length per generation&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Three minutes a month sounds tight, and it is. Most use cases are 60-90 second explainer videos, though, so you’re realistically looking at two or three videos per month before you’d need to upgrade. Anything longer than a minute also runs into the per-generation length cap, so it has to be produced in more than one generation. For a side project or a single-person business, that’s often enough.&lt;/p&gt;
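&lt;p&gt;A quick way to sanity-check a planned video length against both limits is sketched below. The 3-minute monthly quota and 1-minute per-generation cap come from the list above. &lt;code&gt;monthly_plan&lt;/code&gt; is a hypothetical helper, and splitting a longer video into several generations that you stitch together afterwards is an assumption about workflow, not a documented HeyGen feature.&lt;/p&gt;

```python
import math

# Limits copied from the free-tier list above (April 2026).
MONTHLY_SECONDS = 3 * 60
MAX_SECONDS_PER_GENERATION = 60


def monthly_plan(video_seconds):
    """How many videos of this length fit per month, and how many generations each needs."""
    videos = MONTHLY_SECONDS // video_seconds
    # Assumption: a video longer than the cap is split into segments and stitched later.
    generations = math.ceil(video_seconds / MAX_SECONDS_PER_GENERATION)
    return {"videos_per_month": videos, "generations_per_video": generations}


for length in (60, 75, 90):
    print(length, monthly_plan(length))
```

&lt;p&gt;A 60-second video fits three times a month in a single generation each. At 75 or 90 seconds you get two videos a month, and each one needs two generations.&lt;/p&gt;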

&lt;h3&gt;
  
  
  The Killer Feature: Custom Voice Clone
&lt;/h3&gt;

&lt;p&gt;HeyGen’s standout free feature is &lt;strong&gt;Instant Voice Clone&lt;/strong&gt; — upload a 30-second clip of someone speaking (yours, or someone else’s with their permission) and HeyGen creates a TTS voice that sounds like them. You can then use that voice on any avatar in the platform. Free tier limits you to one voice clone, but the quality is genuinely good in English and the major European languages, and decent in Mandarin and Japanese.&lt;/p&gt;

&lt;p&gt;The two-step workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Record yourself reading the HeyGen onboarding paragraph at a normal speaking pace. Upload it.&lt;/li&gt;
&lt;li&gt;Wait ~5 minutes. Pick the new voice from the voice dropdown when generating any video.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Combined with the free avatar library, this gets you a presenter who looks like a paid actor and sounds like you. There’s an obvious ethical line here — only clone your own voice or one you have explicit permission for — but the technical capability is there in the free tier.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Avatar Selection
&lt;/h3&gt;

&lt;p&gt;The 100 free stock avatars cover a wide range of ages, ethnicities, and presentation styles: business-casual person at a desk, casual person against a neutral background, news-anchor framing, etc. They’re filmed people who licensed their image, not generated faces, which means they look genuinely human and don’t fall into the uncanny valley that pure-AI avatars do. Premium tiers unlock more avatars and the ability to create your own custom avatar from a video upload, but the free pool is varied enough for most general-purpose work.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Generation Workflow
&lt;/h3&gt;

&lt;p&gt;HeyGen feels more like a slide editor than a video generator. You add scenes; each scene has a background (color, image, or stock video), an avatar, and a script. You type the script, pick the voice, and generate. The avatar reads the script with synced lip movement, natural-looking head turns, and basic gestures. Total turnaround for a 60-second video is usually 2-3 minutes.&lt;/p&gt;

&lt;p&gt;The most underrated feature: &lt;strong&gt;HeyGen translates and dubs in one click&lt;/strong&gt;. Generate an English video, then use the Translate option to produce a Spanish, French, German, or Mandarin version with the same avatar lip-syncing the new language. Useful for any creator targeting multiple markets without recording multiple takes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where HeyGen Falls Short
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The watermark on the free tier is visible.&lt;/strong&gt; It’s a “Made with HeyGen” badge in the corner. Not subtle. If you’re publishing professionally you’ll want the $24/month Creator plan to remove it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avatars are static-camera talking heads.&lt;/strong&gt; No walking around, no scene changes within the avatar shot, no full-body shots. If you want a presenter doing things, you’re back to filming a real person.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;3 minutes/month runs out fast if you iterate.&lt;/strong&gt; Every generation counts against the quota, including the ones you discard. Get the script right in a text editor before generating.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Voice clone needs clean audio.&lt;/strong&gt; A 30-second clip with background noise produces a noisy clone. Record in a quiet room with a decent USB mic.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Side-by-Side Comparison
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftdxstld89ylf5muatphv.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftdxstld89ylf5muatphv.jpg" alt="Comparison table: Kling vs Pika vs HeyGen across free quota, generation time, clip length, resolution, image-to-video, voice clone, watermark, and best use case" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Where each free tier actually lands across the metrics that matter.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Kling&lt;/th&gt;
&lt;th&gt;Pika&lt;/th&gt;
&lt;th&gt;HeyGen&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Primary use case&lt;/td&gt;
&lt;td&gt;Cinematic clips&lt;/td&gt;
&lt;td&gt;Scene editing &amp;amp; effects&lt;/td&gt;
&lt;td&gt;Talking-head avatars&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Text-to-video&lt;/td&gt;
&lt;td&gt;Yes (best of the three)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No (script-to-avatar only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Image-to-video&lt;/td&gt;
&lt;td&gt;Yes (best in class)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Free tier model&lt;/td&gt;
&lt;td&gt;~166 credits/day refresh&lt;/td&gt;
&lt;td&gt;250 credits at signup&lt;/td&gt;
&lt;td&gt;3 minutes/month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Free output resolution&lt;/td&gt;
&lt;td&gt;1080p&lt;/td&gt;
&lt;td&gt;1080p&lt;/td&gt;
&lt;td&gt;720p&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Free output watermark&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max clip length (free)&lt;/td&gt;
&lt;td&gt;10s (Pro) / 5s (Standard)&lt;/td&gt;
&lt;td&gt;5s&lt;/td&gt;
&lt;td&gt;60s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lip sync to audio&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Yes (good)&lt;/td&gt;
&lt;td&gt;Yes (built into avatars)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Camera control&lt;/td&gt;
&lt;td&gt;Yes (explicit panel)&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Motion brush&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes (Modify Region)&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Voice cloning&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes (1 voice on free)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Translation/dubbing&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Public API&lt;/td&gt;
&lt;td&gt;Yes (paid)&lt;/td&gt;
&lt;td&gt;Invite-only (paid)&lt;/td&gt;
&lt;td&gt;Yes (paid tier)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best for&lt;/td&gt;
&lt;td&gt;B-roll, hero shots&lt;/td&gt;
&lt;td&gt;Effects, lip-sync, edits&lt;/td&gt;
&lt;td&gt;Tutorials, training, marketing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  How to Pick — A Decision Tree
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx1l8thp77sz8y2gk1fw7.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx1l8thp77sz8y2gk1fw7.jpg" alt="Decision tree mapping video need to recommended tool: cinematic shot to Kling, social clip to Pika, avatar to HeyGen, all-of-above to the combined pipeline" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you can answer one question — what kind of video — you don't need to read the rest of the comparison.&lt;/p&gt;

&lt;p&gt;Most of the time the choice falls out of one question: &lt;strong&gt;what does the final video need to look like?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cinematic establishing shots, B-roll, or any “make me a beautiful 5-second video” task&lt;/strong&gt; → Kling. The daily credit refresh means you can iterate without blowing through a fixed pool, and image-to-video on a strong reference still consistently produces the best output of the three.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Effects, lip sync to a voiceover, or modifying an existing clip&lt;/strong&gt; → Pika. The Pikaffects library is unique, the lip sync quality is the best of the three for re-dubbing footage you didn’t generate, and the modify-region feature is the only way to make localized edits to an AI-generated clip in any free tool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;An explainer video, tutorial, marketing pitch, or anything where someone needs to talk to camera&lt;/strong&gt; → HeyGen. The avatar quality is genuinely good, the voice clone makes it personal, and the one-click translation lets you reach non-English audiences from a single English script.&lt;/p&gt;

&lt;p&gt;The combination I use most is Kling + HeyGen — Kling for the visuals, HeyGen for any spoken intro or outro by a presenter avatar. Pika comes in when I need a specific Pikaffect or a precise edit Kling can’t make.&lt;/p&gt;

&lt;h2&gt;
  
  
  Combining All Three: A Free Short-Form Video Pipeline
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fec800on3h8nxza9rwq39.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fec800on3h8nxza9rwq39.jpg" alt="Pipeline diagram: a static image flows into Kling (image-to-video), then Pika (restyle); the same image also goes to HeyGen (avatar voiceover); both branches merge in CapCut for the final short" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The free-tier-only pipeline I actually use to produce a 30-second explainer in under five minutes of work.&lt;/p&gt;

&lt;p&gt;The pipeline I built in early 2026 to produce one short-form video per day with zero spend:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Script in any LLM.&lt;/strong&gt; A short 60-second script with a hook, three beats, and a call to action. Claude or DeepSeek for free.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Voiceover in ElevenLabs free tier or Coqui.&lt;/strong&gt; 10,000 characters/month free in ElevenLabs is enough for ~10 short scripts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hero still in Flux Schnell or Imagen 3 free.&lt;/strong&gt; One image that captures the visual concept of the video.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cinematic clip from the still in Kling.&lt;/strong&gt; Image-to-video, 1080p, 5s. Repeat 3-4 times for the different beats of the script.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lip-synced presenter intro in HeyGen.&lt;/strong&gt; 10-15 second avatar talking-head intro using the cloned voice.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edit and assemble in DaVinci Resolve free.&lt;/strong&gt; Trim, color-grade, add captions (which DaVinci’s built-in transcription generates), export to 9:16 for vertical platforms.&lt;/li&gt;
&lt;/ol&gt;
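
&lt;p&gt;To see how the steps above fit inside the free quotas, here is a back-of-the-envelope sketch in Python. It uses only figures quoted in this article (5-second Kling clips, a 10-15 second HeyGen intro, 3 minutes/month of HeyGen video, 10,000 ElevenLabs characters/month); the function and variable names are made up for illustration.&lt;/p&gt;

```python
# Back-of-the-envelope budget for the pipeline above, using only the figures
# quoted in this article. Names are illustrative.
KLING_CLIP_SECONDS = 5
HEYGEN_QUOTA_SECONDS = 3 * 60      # 3 minutes/month on the free tier
ELEVENLABS_QUOTA_CHARS = 10_000    # characters/month on the free tier


def runtime_seconds(kling_clips, intro_seconds):
    """Total runtime of one short: Kling clips plus the HeyGen intro."""
    return kling_clips * KLING_CLIP_SECONDS + intro_seconds


def intros_per_month(intro_seconds):
    """How many presenter intros of this length fit in the HeyGen quota."""
    return HEYGEN_QUOTA_SECONDS // intro_seconds


print("shortest cut:", runtime_seconds(3, 10), "s")
print("longest cut:", runtime_seconds(4, 15), "s")
print("intros/month at 15 s:", intros_per_month(15))
print("intros/month at 10 s:", intros_per_month(10))
print("chars per script:", ELEVENLABS_QUOTA_CHARS // 10)
```

&lt;p&gt;By these numbers the HeyGen quota covers roughly 12 to 18 intros a month and the ElevenLabs quota about 10 scripts, so a strictly daily cadence would likely mean skipping the presenter intro or the ElevenLabs voiceover on some days.&lt;/p&gt;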

&lt;p&gt;Cost: $0. Time: ~30 minutes per video once the workflow is dialed in. The output quality is high enough that most viewers won’t tell the difference between this pipeline and a small studio’s work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using Kling, Pika, and HeyGen with OpenClaw
&lt;/h2&gt;

&lt;p&gt;If you’re orchestrating media generation through &lt;a href="https://openclaw.ai" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt; agents, which is increasingly the right move for batch content production, the three tools fit different parts of the agent’s toolkit. None of them has a fully open free API, but Kling and HeyGen have paid APIs that an agent can call at scale (Pika’s is invite-only), and the web UIs can be driven via browser automation when the volume is small.&lt;/p&gt;

&lt;p&gt;The pattern I’ve found works:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Agent generates a script and a still-frame prompt&lt;/strong&gt; using a free LLM API like &lt;a href="https://dev.to/p/deepseek-api-free-r1-v3-models"&gt;DeepSeek&lt;/a&gt; or &lt;a href="https://dev.to/p/groq-fastest-free-ai-api-2026"&gt;Groq&lt;/a&gt;. Both give you enough free quota for hundreds of script generations per day.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent calls an image generator&lt;/strong&gt; (Flux, Imagen 3 via the Gemini free tier) for the hero still.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Browser automation step submits the still to Kling&lt;/strong&gt; in image-to-video mode, polls for completion, downloads the MP4. This is the part where, until Kling opens a free API, you’re using Playwright or similar.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent uses HeyGen’s API for the talking-head intro.&lt;/strong&gt; HeyGen’s API is paid but inexpensive — about $0.04 per second of video on the lowest tier — and well-suited to programmatic use. For pure-free workflows you can drive HeyGen’s web UI with browser automation too.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Final assembly&lt;/strong&gt; happens in FFmpeg via the agent’s shell tool. Concat clips, overlay captions, output the final file.&lt;/li&gt;
&lt;/ol&gt;
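
&lt;p&gt;The final assembly step can be sketched as follows. This is a minimal Python example, assuming the clips are already downloaded to local files; the file names and helper functions are hypothetical. It only builds the list file and the argument list for FFmpeg’s concat demuxer, and leaves actually running the command (and overlaying captions) to the agent’s shell tool.&lt;/p&gt;

```python
import shlex
from pathlib import Path


def concat_list_text(clips):
    """Build the list file that FFmpeg's concat demuxer reads."""
    lines = []
    for clip in clips:
        # Single quotes inside a path are escaped the way the demuxer expects.
        escaped = str(clip).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_command(list_file, output):
    """Return the FFmpeg argv for a stream-copy concat (no re-encode)."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0",
        "-i", str(list_file),
        "-c", "copy",
        str(output),
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    clips = [Path("intro.mp4"), Path("beat1.mp4"), Path("beat2.mp4")]
    print(concat_list_text(clips), end="")
    print(shlex.join(concat_command("clips.txt", "final.mp4")))
```

&lt;p&gt;Stream copy only works when every clip shares the same codec, resolution, and frame rate. Since the free HeyGen export is 720p and the Kling clips are 1080p, a real pipeline would probably need to scale and re-encode the clips to a common format before concatenating them.&lt;/p&gt;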

&lt;p&gt;The advantage of orchestrating through OpenClaw rather than running each tool by hand is that the agent can iterate on rejected outputs. If a Kling generation comes back with the wrong subject framing, the agent retries with a refined prompt. If the HeyGen avatar’s voiceover trips on a technical word, the agent rewrites the script using the speak-friendly equivalent. This is exactly the kind of multi-step, failure-tolerant workflow that AI agents handle better than rigid scripts — and the free tiers make experimentation cheap.&lt;/p&gt;
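
&lt;p&gt;The retry behaviour described above can be sketched as a small loop. This is a minimal illustration, not OpenClaw’s API: the three callables (generate, accept, refine) are hypothetical stand-ins for a Kling submission, a quality check, and a prompt rewrite, and the example uses stubs so it runs without any network access.&lt;/p&gt;

```python
def generate_with_retries(generate, accept, refine, prompt, max_attempts=3):
    """Call generate(prompt) until accept(result) is true or attempts run out.

    generate, accept and refine are hypothetical callables supplied by the
    caller. Returns (result, attempts_used); result is None when every
    attempt was rejected.
    """
    for attempt in range(1, max_attempts + 1):
        result = generate(prompt)
        if accept(result):
            return result, attempt
        # Rejected output: ask for a refined prompt before trying again.
        prompt = refine(prompt, result)
    return None, max_attempts


if __name__ == "__main__":
    # Stub callables so the sketch runs offline.
    outputs = iter(["subject off-centre", "subject centred"])
    result, attempts = generate_with_retries(
        generate=lambda prompt: next(outputs),
        accept=lambda clip: clip == "subject centred",
        refine=lambda prompt, clip: prompt + ", subject centred in frame",
        prompt="slow dolly shot of a lighthouse",
    )
    print(result, attempts)
```

&lt;p&gt;The cap on attempts matters on free tiers, because discarded generations still spend credits or quota.&lt;/p&gt;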

&lt;p&gt;For more on building agent workflows that call third-party tools, see our walkthrough of &lt;a href="https://dev.to/p/mcp-model-context-protocol-connect-ai-agents"&gt;MCP for connecting AI agents to any tool or API&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Honest Limitations of Free AI Video in 2026
&lt;/h2&gt;

&lt;p&gt;Four things to keep in mind before betting a real production schedule on free AI video:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Daily credit caps mean you can’t burst.&lt;/strong&gt; If a project needs 30 cinematic clips by Friday, the free Kling tier won’t get you there in time — you’d need 5+ days at the daily refresh rate. Plan accordingly or pay for a one-month bump.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output quality is non-deterministic.&lt;/strong&gt; Even the best prompt produces a dud roughly one time in three or four. Budget for regeneration credits.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Faces and hands remain the weak point.&lt;/strong&gt; All three tools handle faces well in close-ups but struggle with subtle facial drift over longer clips. For anything where a viewer will scrutinize a face, Kling’s image-to-video on a strong portrait still is your best chance, and short clips (5s, not 10s) are safer than long ones.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Terms of service vary.&lt;/strong&gt; Kling and Pika both allow free-tier output to be used commercially as of April 2026, but check before publishing — the Chinese-origin tools in particular have updated their commercial-use clauses repeatedly. HeyGen’s free tier output is technically commercial-use-allowed but the watermark makes it impractical for paid client work.&lt;/li&gt;
&lt;/ul&gt;
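
&lt;p&gt;The first two limitations interact, and a quick planning sketch makes that concrete. The Python below assumes credits do not carry over between days and takes the ~166 credits/day figure from the comparison table. The credits-per-clip value is a parameter you would have to fill in from the current pricing page; the 20 used in the example is purely hypothetical.&lt;/p&gt;

```python
import math

DAILY_CREDITS = 166  # approximate daily refresh quoted in the comparison table


def days_needed(clips, credits_per_clip, dud_rate=0.0, daily_credits=DAILY_CREDITS):
    """Estimate how many daily refreshes it takes to get enough usable clips.

    Assumes unused credits do not carry over. dud_rate is the fraction of
    generations you expect to discard (0.25 means one in four).
    """
    if dud_rate >= 1 or 0 > dud_rate:
        raise ValueError("dud_rate must be at least 0 and below 1")
    clips_per_day = daily_credits // credits_per_clip
    if clips_per_day == 0:
        raise ValueError("one clip costs more than a day of credits")
    generations = math.ceil(clips / (1 - dud_rate))
    return math.ceil(generations / clips_per_day)


# Hypothetical cost of 20 credits per clip, for illustration only.
print(days_needed(30, credits_per_clip=20))
print(days_needed(30, credits_per_clip=20, dud_rate=0.25))
```

&lt;p&gt;With that hypothetical cost, 30 clips take 4 days if every generation is usable and 5 days once one dud in four is factored in.&lt;/p&gt;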

&lt;h2&gt;
  
  
  What’s Coming in the Rest of 2026
&lt;/h2&gt;

&lt;p&gt;Three things to watch:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI Sora consumer tier.&lt;/strong&gt; Sora has been API-only and expensive; rumors of a free tier inside ChatGPT Plus could shake up this list overnight.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open-source video models catching up.&lt;/strong&gt; Hunyuan Video, Mochi 1, and CogVideoX are usable open-weight models in 2026. None yet matches Kling when run on a current consumer GPU, but they’re closing the gap fast and let you run unlimited free generation on hardware you already own.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HeyGen-style avatar generators going lower-cost.&lt;/strong&gt; D-ID’s free tier vanished, but new entrants like Hedra and Synthesia’s stripped-down “Studio Free” launched in early 2026 are trying to undercut HeyGen. Worth watching.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This list is current as of April 2026. Free tiers in this space change often, so what’s free this quarter may not be free the next. The pattern of three tools (one cinematic generator, one editor, one talking-head) will outlast any specific provider, even when the names change.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Reads
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/postman-alternatives/" rel="noopener noreferrer"&gt;Postman Alternatives in 2026: Bruno and Hoppscotch — Free, Open-Source API Clients That Don’t Force a Login&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/free-temp-email-services/" rel="noopener noreferrer"&gt;Free Temporary Email Services in 2026: 9 Best Disposable Email Tools for Developer Testing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/herosms-review/" rel="noopener noreferrer"&gt;HeroSMS Review: Receive SMS Verification Codes from 180+ Countries&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/cohere-rag-api/" rel="noopener noreferrer"&gt;Cohere Free API: The Best Free Embedding and Rerank API for RAG in 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://toolfreebie.com/render-hosting-review/" rel="noopener noreferrer"&gt;Render Free Hosting Review 2026: Deploy Web Apps, Databases, and Cron Jobs for Free&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Final Verdict
&lt;/h2&gt;

&lt;p&gt;If you’re going to use one of the three:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use &lt;a href="https://klingai.com" rel="noopener noreferrer"&gt;Kling&lt;/a&gt;&lt;/strong&gt; if you need cinematic clips and want the most generous, sustainable free tier. Daily credit refresh and 1080p output make it the best free general-purpose AI video tool in 2026.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use &lt;a href="https://pika.art" rel="noopener noreferrer"&gt;Pika&lt;/a&gt;&lt;/strong&gt; if you’re editing or transforming existing clips, lip-syncing voiceovers, or applying social-friendly effects. Limited free credits but unique features.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use &lt;a href="https://www.heygen.com" rel="noopener noreferrer"&gt;HeyGen&lt;/a&gt;&lt;/strong&gt; if you need a talking-head presenter for tutorials, marketing, or training. Voice clone and one-click translation are killer features inside the free 3 minutes/month.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want the full pipeline — and you’re willing to invest 30 minutes a day learning the tools — chain all three together. The output rivals what stock-video subscriptions and small studios charge hundreds of dollars per month for, and the cost is zero. That equation didn’t exist a year ago and probably won’t last forever, so it’s worth using while it’s there.&lt;/p&gt;

&lt;p&gt;For more free AI tools that pair well with this video pipeline, see our roundup of &lt;a href="https://dev.to/p/10-best-free-ai-apis-2026-comparison"&gt;the 10 best free AI APIs in 2026&lt;/a&gt; and our guide to &lt;a href="https://dev.to/p/notebooklm-free-ai-research-tool"&gt;Google NotebookLM for free AI research&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://toolfreebie.com/kling-pika-heygen/" rel="noopener noreferrer"&gt;toolfreebie.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>tools</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
