What Is Cohere?
Cohere is a Toronto-based AI company founded in 2019 by Aidan Gomez (one of the original authors of the “Attention Is All You Need” Transformer paper) and a team of ex-Google Brain researchers. Unlike OpenAI or Anthropic, Cohere built its platform from day one around a specific use case: enterprise retrieval and RAG (Retrieval-Augmented Generation).
That focus shows up in three places where Cohere genuinely leads the field — and where most developers don’t realize they can get it for free:
- Embed v3 — text embeddings that consistently rank near the top of the MTEB benchmark, in both English and 100+ other languages
- Rerank v3 — one of the most widely deployed neural rerankers in production RAG systems, available via a single API call
- Command R / R+ — chat models specifically trained for RAG, tool use, and grounded citations
And the part most developers miss: a free Cohere trial key gives you access to all of these. No credit card, no time limit. The only constraint is per-minute rate limiting, which is fine for prototyping, side projects, and small production workloads.
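In practice, "fine for prototyping" means your code should tolerate the occasional HTTP 429 when you brush against the per-minute limit. A minimal retry-with-backoff wrapper — a generic sketch, not part of Cohere's SDK (in real code you would catch the SDK's specific rate-limit exception rather than a bare `Exception`):

```python
import time

def with_backoff(fn, max_retries=5, base_delay=2.0):
    """Retry fn() with exponential backoff when the per-minute limit is hit."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            # Catch the SDK's rate-limit error (HTTP 429) specifically in real code.
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Usage (hypothetical): with_backoff(lambda: co.embed(...))
```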
What’s Free on Cohere
Cohere has two key types: Trial keys (free) and Production keys (paid). Trial keys never expire — they’re rate-limited but otherwise unrestricted.
| Endpoint | Trial Rate Limit | Production Rate Limit |
|---|---|---|
| Chat (Command R/R+) | 20 calls/min | 500 calls/min |
| Embed | 100 calls/min | 2,000 calls/min |
| Rerank | 10 calls/min | 1,000 calls/min |
| Classify | 100 calls/min | 1,000 calls/min |
| Summarize | 5 calls/min | 500 calls/min |
Notice the Embed limit: 100 calls per minute with up to 96 documents per call. That’s effectively 9,600 embeddings per minute on the free tier — more than enough to index a personal knowledge base or a small document corpus from scratch in a few minutes.
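At 96 documents per call, indexing a corpus is just a batching loop. A sketch of the batching math (each chunk would be passed as the `texts` argument of one embed call):

```python
def batches(items, size=96):
    """Yield successive chunks of up to `size` items, one per embed call."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# A 9,000-document corpus needs ceil(9000 / 96) = 94 calls,
# which fits inside the trial tier's 100 calls per minute.
docs = [f"doc {i}" for i in range(9000)]
print(sum(1 for _ in batches(docs)))  # 94
```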
Note: Trial keys are not for production traffic, but they are for real development. Cohere’s documentation explicitly encourages building and testing on trial keys before upgrading.
How to Get Your Free API Key
- Go to dashboard.cohere.com/welcome/register and sign up with email or Google
- Verify your email address
- From the dashboard, navigate to API Keys in the left sidebar
- Your default Trial key is already there — copy it
- Set it as an environment variable:
```bash
export COHERE_API_KEY="your_key_here"
```
No credit card. No phone number. Two minutes from signup to your first embedding.
Python Quickstart: Your First Embedding
Install the official Cohere Python SDK:
```bash
pip install cohere
```
Embedding three documents:
```python
import os
import cohere

co = cohere.ClientV2(api_key=os.environ["COHERE_API_KEY"])

response = co.embed(
    texts=[
        "Cohere makes the best free embedding API for RAG.",
        "OpenClaw is an AI agent platform for orchestrating tools.",
        "Toronto is the headquarters of Cohere."
    ],
    model="embed-english-v3.0",
    input_type="search_document",
    embedding_types=["float"]
)

print(f"Got {len(response.embeddings.float)} embeddings")
print(f"Each embedding is {len(response.embeddings.float[0])} dimensions")
```
That returns three 1024-dimensional vectors you can drop into any vector database — Pinecone, Weaviate, Chroma, Qdrant, pgvector, or just a NumPy array.
The input_type parameter is important: Cohere’s embeddings are asymmetric. Use "search_document" when indexing your corpus, and "search_query" when embedding the user’s question. Treating them differently gives noticeably better retrieval quality than symmetric embedding APIs.
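Once both sides are embedded with the right input_type, retrieval is a cosine similarity between the query vector and the corpus. A self-contained sketch with toy 2-D vectors standing in for real embed output (real calls would use input_type="search_query" for the question and input_type="search_document" for the corpus):

```python
import numpy as np

def cosine_sim(query_vec, doc_matrix):
    """Cosine similarity between one query vector and a matrix of document vectors."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    return d @ q

# Toy vectors in place of real 1024-dimensional embeddings
docs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([1.0, 0.2])
sims = cosine_sim(query, docs)
print(sims)  # highest score for the first document
```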
Embedding Models You Get for Free
| Model ID | Dimensions | Languages | Best For |
|---|---|---|---|
| embed-english-v3.0 | 1024 | English | Highest quality English search and RAG |
| embed-multilingual-v3.0 | 1024 | 100+ | Multilingual search, cross-language RAG |
| embed-english-light-v3.0 | 384 | English | Smaller index, faster queries, low storage |
| embed-multilingual-light-v3.0 | 384 | 100+ | Multilingual on a budget |
For most RAG projects, embed-english-v3.0 at 1024 dimensions is the sweet spot. If you’re storing millions of vectors and storage cost matters, the light variants drop to 384 dimensions — indexes about 62% smaller — with only a small quality drop.
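The storage difference is easy to quantify. Assuming float32 vectors (4 bytes per dimension) and ignoring index overhead:

```python
def index_size_gb(num_vectors, dims, bytes_per_dim=4):
    """Raw vector storage in GB, float32 assumed, index overhead ignored."""
    return num_vectors * dims * bytes_per_dim / 1e9

# Ten million vectors at each dimensionality
full = index_size_gb(10_000_000, 1024)   # embed-english-v3.0
light = index_size_gb(10_000_000, 384)   # embed-english-light-v3.0
print(f"{full:.1f} GB vs {light:.1f} GB")  # 41.0 GB vs 15.4 GB
```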
Cohere Rerank: The Secret Weapon for RAG Quality
Here is where Cohere genuinely leads: Rerank. After your vector database returns the top 50 or 100 candidate documents, you pass them to Rerank along with the user’s query. Rerank scores each document for actual relevance and reorders them. The top 5 reranked results are almost always dramatically better than the top 5 from raw vector similarity.
```python
import os
import cohere

co = cohere.ClientV2(api_key=os.environ["COHERE_API_KEY"])

query = "How do I add a free embedding API to my chatbot?"

documents = [
    "Cohere offers free embedding API access through trial keys.",
    "Pinecone is a managed vector database service.",
    "OpenAI embeddings cost $0.02 per million tokens.",
    "Use embed-english-v3.0 for the best quality English embeddings.",
    "Vector databases store high-dimensional vectors for similarity search."
]

response = co.rerank(
    model="rerank-english-v3.0",
    query=query,
    documents=documents,
    top_n=3
)

for result in response.results:
    print(f"Score: {result.relevance_score:.4f} | {documents[result.index]}")
```
That returns the three documents most relevant to the query, with relevance scores between 0 and 1. In production RAG systems, adding a Rerank step routinely delivers a noticeable boost in answer quality over vector-similarity-only retrieval — which is why neural rerankers have become a standard component of commercial RAG stacks.
And it’s free on the trial key: 10 calls per minute, with up to 1,000 documents per call.
Chat with Command R+: Built for RAG
Cohere’s Command R+ chat model is purpose-built for RAG. Unlike most chat APIs where you stuff retrieved documents into the system prompt, Cohere’s chat endpoint accepts a structured documents parameter — and the model returns inline citations pointing to which documents each claim came from.
```python
import os
import cohere

co = cohere.ClientV2(api_key=os.environ["COHERE_API_KEY"])

response = co.chat(
    model="command-r-plus",
    messages=[
        {"role": "user", "content": "Which Cohere embedding model should I use for English RAG?"}
    ],
    documents=[
        {"data": {"text": "embed-english-v3.0 produces 1024-dimensional embeddings and leads MTEB English benchmarks."}},
        {"data": {"text": "embed-english-light-v3.0 produces 384-dimensional embeddings, optimized for low storage cost."}},
        {"data": {"text": "embed-multilingual-v3.0 supports over 100 languages."}}
    ]
)

print(response.message.content[0].text)
print()
print("Citations:")
for citation in response.message.citations or []:
    print(f"  - '{citation.text}' from sources: {[s.id for s in citation.sources]}")
```
The model produces a grounded answer that cites which document each fact came from. For RAG applications where users need to verify the source of every claim — legal, medical, internal knowledge bases — this is significantly more useful than free-text generation.
Free Chat Models on Cohere
| Model ID | Size | Context Window | Best For |
|---|---|---|---|
| command-r-plus | 104B | 128k tokens | Best quality, complex RAG, tool use |
| command-r | 35B | 128k tokens | Faster RAG, cheaper-when-paid baseline |
| command-r7b | 7B | 128k tokens | Fastest responses, simple Q&A |
All three are available through your free trial key at the same 20-calls-per-minute rate limit. command-r-plus is the headline model — it is competitive with frontier models on RAG benchmarks while being explicitly trained to produce document-grounded citations.
End-to-End RAG Pipeline (All Free)
Here’s a complete RAG pipeline using only Cohere’s free trial key — embed, store, retrieve, rerank, and answer:
```python
import os
import numpy as np
import cohere

co = cohere.ClientV2(api_key=os.environ["COHERE_API_KEY"])

# 1. Your knowledge base
documents = [
    "OpenClaw is an AI agent platform for orchestrating multiple AI APIs and tools.",
    "Cohere Embed v3 produces 1024-dimensional vectors optimized for retrieval.",
    "Cohere Rerank v3 reorders candidate documents by true relevance to the query.",
    "Command R+ is a 104B model trained specifically for RAG with citations.",
    "Free trial keys on Cohere have no time limit — only per-minute rate limits.",
]

# 2. Index documents
doc_embeds = co.embed(
    texts=documents,
    model="embed-english-v3.0",
    input_type="search_document",
    embedding_types=["float"]
).embeddings.float
doc_matrix = np.array(doc_embeds)

# 3. Embed the query
query = "How do I get free access to Cohere's RAG models?"
query_embed = np.array(co.embed(
    texts=[query],
    model="embed-english-v3.0",
    input_type="search_query",
    embedding_types=["float"]
).embeddings.float[0])

# 4. Vector similarity — get top 3 candidates
scores = doc_matrix @ query_embed
top_indices = np.argsort(scores)[-3:][::-1]
candidates = [documents[i] for i in top_indices]

# 5. Rerank to get best 2
reranked = co.rerank(
    model="rerank-english-v3.0",
    query=query,
    documents=candidates,
    top_n=2
)
top_docs = [candidates[r.index] for r in reranked.results]

# 6. Answer with Command R+ using grounded citations
answer = co.chat(
    model="command-r-plus",
    messages=[{"role": "user", "content": query}],
    documents=[{"data": {"text": d}} for d in top_docs]
)
print(answer.message.content[0].text)
```
That’s a full production-shape RAG pipeline — embed, retrieve, rerank, generate with citations — running on a free trial key with zero credit card on file.
JavaScript / Node.js Example
```bash
npm install cohere-ai
```
```javascript
import { CohereClientV2 } from "cohere-ai";

const co = new CohereClientV2({ token: process.env.COHERE_API_KEY });

const response = await co.embed({
  texts: [
    "Cohere is the best free embedding API for RAG.",
    "Toronto is the headquarters of Cohere."
  ],
  model: "embed-english-v3.0",
  inputType: "search_document",
  embeddingTypes: ["float"]
});

console.log(`Got ${response.embeddings.float.length} embeddings`);
```
Cohere vs Other Free Embedding Options
| Provider | Free Embedding Model | Dimensions | Multilingual | Reranker? |
|---|---|---|---|---|
| Cohere | embed-english-v3.0 / multilingual-v3.0 | 1024 / 384 | 100+ languages | Yes (Rerank v3) |
| Google Gemini | text-embedding-004 | 768 | Limited | No |
| Mistral AI | mistral-embed | 1024 | Limited | No |
| Cloudflare Workers AI | bge-base-en-v1.5 | 768 | English only | No |
| Hugging Face Inference | BGE / E5 family | varies | Some multilingual | No (manual setup) |
| OpenAI (paid only) | text-embedding-3-large | 3072 | Strong multilingual | No |
Where Cohere wins on the free tier: the only provider on this list that ships a hosted neural reranker. For RAG quality, that single feature usually matters more than which embedding model you started with. Combined with asymmetric embeddings (separate search_query and search_document modes), Cohere’s free tier is a credible foundation for real retrieval applications — not just a demo toy.
Use Cohere with OpenClaw
OpenClaw is an AI agent platform that orchestrates multiple APIs and tools into automated workflows. Cohere fits well as the retrieval and grounding layer inside OpenClaw agents — the part that searches your private documents before the agent acts.
A common pattern: an OpenClaw agent receives a user task (“draft a reply to this customer ticket”), uses Cohere Embed + Rerank to pull the three most relevant past tickets and policies from your knowledge base, then passes those documents to Command R+ to generate a cited reply. Because Cohere returns explicit citations, the agent can attach source links to the draft for human review.
```python
import os
import cohere

co = cohere.ClientV2(api_key=os.environ["COHERE_API_KEY"])

def retrieve_and_answer(question: str, knowledge_base: list[str]) -> dict:
    """A retrieval-then-answer step for use inside an OpenClaw agent."""
    # Rerank handles both retrieval and ranking in one call
    reranked = co.rerank(
        model="rerank-english-v3.0",
        query=question,
        documents=knowledge_base,
        top_n=3
    )
    top_docs = [knowledge_base[r.index] for r in reranked.results]
    answer = co.chat(
        model="command-r-plus",
        messages=[{"role": "user", "content": question}],
        documents=[{"data": {"text": d}} for d in top_docs]
    )
    return {
        "answer": answer.message.content[0].text,
        "sources": top_docs,
        "citations": answer.message.citations or []
    }

# Example use inside an agent step
result = retrieve_and_answer(
    question="What is our refund policy for digital downloads?",
    knowledge_base=load_company_kb()  # your own loader
)
print(result["answer"])
```
Notice: when you only have a few hundred candidate documents, you can skip the embedding/vector-DB step entirely and just pass everything to Rerank. The free trial key allows up to 1,000 documents per Rerank call, which covers a surprising number of small-to-medium knowledge bases.
Cohere Pricing (When You Need More)
| Model | Price | Unit |
|---|---|---|
| Command R+ | $2.50 input / $10.00 output | per 1M tokens |
| Command R | $0.15 input / $0.60 output | per 1M tokens |
| Command R7B | $0.0375 input / $0.15 output | per 1M tokens |
| Embed v3 (English / Multilingual) | $0.10 | per 1M tokens |
| Rerank v3 | $2.00 | per 1,000 searches |
When you graduate from a Trial key to a Production key, Command R7B at $0.15 per million output tokens is one of the cheapest production-grade models available. Embed v3 at $0.10 per million tokens is competitive with or cheaper than every comparable hosted embedding API.
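To sanity-check what a Production key would cost you, the table above reduces to simple arithmetic. A sketch for a hypothetical monthly workload (the prices come from the table; the workload numbers are invented for illustration):

```python
# Prices from the table above: per 1M tokens, and per 1,000 rerank searches
EMBED_PER_M = 0.10
RERANK_PER_K = 2.00
R7B_IN, R7B_OUT = 0.0375, 0.15

# Hypothetical monthly workload
embed_tokens = 50_000_000          # 50M tokens indexed
searches = 100_000                 # 100k rerank calls
chat_in, chat_out = 20_000_000, 5_000_000  # Command R7B traffic

cost = (embed_tokens / 1e6 * EMBED_PER_M
        + searches / 1000 * RERANK_PER_K
        + chat_in / 1e6 * R7B_IN
        + chat_out / 1e6 * R7B_OUT)
print(f"${cost:.2f}/month")
```

Note that Rerank dominates this particular bill — worth remembering when deciding how many candidates to rerank per query.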
When to Use Cohere
Cohere is the right choice when:
- You’re building a RAG application and want the best free embeddings + reranker combo
- You need multilingual retrieval across 100+ languages without changing models
- Your application requires grounded citations (legal, medical, internal knowledge bases)
- You want asymmetric embeddings (separate query and document modes) for better search quality
- You’re prototyping retrieval pipelines and want generous free per-minute limits
Consider alternatives when:
- You need raw chat throughput more than retrieval quality — use Groq or Cerebras for speed, Gemini Flash for free quota
- You want OpenAI SDK drop-in compatibility — use Mistral AI or DeepSeek
- You need image, audio, or multimodal generation — Cohere is text-only
- You’re building a pure chatbot with no retrieval — Command R+ works, but the model isn’t priced or designed around that use case
Related Reads
- Groq vs Cerebras vs Gemini: Which Free AI API Is Actually Fastest in 2026?
- Cerebras Inference API: The Fastest Free AI API You’ve Never Heard Of
- Mistral AI Free API: Call Nemo and Mixtral for Free with Any OpenAI SDK
- GitHub Models: Free GPT-4o and Llama API for Every Developer
- Cloudflare Workers AI: Free Edge AI Inference with 47+ Models
Final Verdict
Cohere is the most underrated free AI API for one specific reason: it’s the only provider that ships a complete RAG stack — embeddings, reranker, and a chat model trained for grounded citations — all behind a single free trial key. Most “free AI API” articles skip Cohere because they only compare chat models, where Cohere is fine but not best-in-class. That misses the point of what the company actually built.
If your project involves search over your own documents, internal knowledge bases, customer tickets, product catalogs, or anything resembling RAG, Cohere’s free tier covers more of the pipeline than any other single provider. Sign up at dashboard.cohere.com, copy your trial key, and your first reranked retrieval is about ten minutes away.
Originally published at toolfreebie.com.