What Is Together AI?
Together AI is an AI inference platform that hosts hundreds of open-source models behind one OpenAI-compatible API. Founded in 2022 and backed by NVIDIA, Salesforce Ventures, and Kleiner Perkins, the company built its reputation around two things developers actually care about: fast hosted inference for state-of-the-art open models (Llama, DeepSeek, Qwen, Mixtral) and a genuinely free tier that exposes a small but useful set of those models with no credit card required.
What separates Together AI from the long list of “free AI API” providers in 2026 is the breadth of categories you can hit on a single key. One signup gives you free access to:
- Llama 3.3 70B Instruct Turbo (Free) — Meta’s flagship 70B chat model
- DeepSeek R1 Distill Llama 70B (Free) — open reasoning model with chain-of-thought
- FLUX.1 schnell — Black Forest Labs’ fast image generation model
- Llama 3.2 11B Vision Instruct (Free) — multimodal image-understanding model
- Plus hundreds of other open models on a $1 trial credit
If you’re already evaluating Groq, Cerebras, Gemini, or DeepSeek, Together AI fills a different gap: a single endpoint that covers chat, reasoning, vision, and image generation on the same key.
What’s Actually Free on Together AI
Together AI uses a clear naming convention: any model whose ID ends with the suffix -Free (lowercase -free for the DeepSeek variant) can be called without consuming credits. These models are rate-limited and scheduled at lower priority than the paid tiers — slightly slower, but functionally complete. Everything else runs against the $1 free trial credit you get at signup.
| Model ID | Type | Context | Best For |
|---|---|---|---|
| meta-llama/Llama-3.3-70B-Instruct-Turbo-Free | Chat / instruction | 128K tokens | General assistant, RAG answer generation, code Q&A |
| deepseek-ai/DeepSeek-R1-Distill-Llama-70B-free | Reasoning | 32K tokens | Math, multi-step logic, agent planning loops |
| meta-llama/Llama-Vision-Free | Vision (multimodal) | 128K tokens | Image captioning, OCR, chart and screenshot understanding |
| black-forest-labs/FLUX.1-schnell-Free | Image generation | 1024×1024 default | Blog cover images, prototypes, social posts |
Beyond the explicitly free tier, the $1 trial credit is enough to exercise dozens of paid models — Mixtral 8x22B, Qwen 2.5 72B, Llama 3.1 405B, audio models like Whisper, embeddings models like BGE and M2-BERT — for tens of thousands of tokens each, which is plenty to test whether the bigger models meaningfully change your results before you commit a card.
Note: Together AI quietly retires and renames “Free” models from time to time as newer versions land. If a model ID stops working, check the official model list for the current Free variant.
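Once you have a key (next section), you can also check programmatically. A minimal sketch — it assumes client.models.list() returns objects with an id attribute, which the current Python SDK does:

from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment

# Filter the full catalogue down to the no-cost models by suffix
free_models = sorted(m.id for m in client.models.list() if m.id.lower().endswith("-free"))
print("\n".join(free_models))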
How to Get Your Free API Key
- Go to api.together.ai and sign up with email, Google, or GitHub
- Verify your email address
- From the dashboard, navigate to Settings → API Keys
- Copy your default key (a long hex string with no prefix, unlike OpenAI’s sk- keys)
- Set it as an environment variable:
export TOGETHER_API_KEY="your_key_here"
No credit card. No phone number. The $1 free trial credit and access to all -Free models are activated immediately on signup.
curl Quickstart: Your First Request in 30 Seconds
Together AI is fully OpenAI-compatible, so the cleanest way to confirm everything works is a one-shot curl call against the chat completions endpoint:
curl https://api.together.xyz/v1/chat/completions \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo-Free",
    "messages": [
      {"role": "user", "content": "Explain pgvector in two sentences."}
    ]
  }'
If you get back a JSON response with a choices[0].message.content field, you’re set. The exact same payload shape works against OpenAI — only the base URL and the model string change.
Python Quickstart
The official SDK mirrors the OpenAI Python client’s interface, so it will feel familiar immediately. Install it:
pip install together
Basic chat completion:
import os
from together import Together
client = Together(api_key=os.environ["TOGETHER_API_KEY"])
response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo-Free",
    messages=[
        {"role": "system", "content": "You are a concise senior engineer."},
        {"role": "user", "content": "When should I prefer SQLite over Postgres?"}
    ],
    max_tokens=400,
)
print(response.choices[0].message.content)
If you already have OpenAI SDK code, swapping providers is a two-line change:
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["TOGETHER_API_KEY"],
    base_url="https://api.together.xyz/v1",
)
response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo-Free",
    messages=[{"role": "user", "content": "Write a haiku about caching."}],
)
print(response.choices[0].message.content)
Every parameter you’d pass to OpenAI — temperature, top_p, stop, response_format, tools, tool_choice — works identically.
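As a quick illustration of tool calling, here’s a hedged sketch. The get_weather tool is made up for illustration, and tool support on the -Free variant isn’t guaranteed to match the paid Turbo model — verify against the model’s documentation before relying on it:

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo-Free",
    messages=[{"role": "user", "content": "What is the weather in Oslo right now?"}],
    tools=tools,
)
message = response.choices[0].message
if message.tool_calls:  # the model may also choose to answer in plain text
    call = message.tool_calls[0]
    print(call.function.name, call.function.arguments)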
Streaming Responses
For chat UIs and agent loops, you almost always want token streaming. Set stream=True and iterate:
stream = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo-Free",
    messages=[{"role": "user", "content": "Outline a blog post about RAG."}],
    stream=True,
)
for chunk in stream:
    if not chunk.choices:  # skip keep-alive/usage chunks that carry no delta
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
Streaming on the Free tier is real streaming, not buffered chunks — you’ll see tokens appear at roughly the model’s true generation rate, which makes it usable for live chat UIs even before you start paying.
Reasoning with DeepSeek R1 Distill
The DeepSeek R1 family produces visible chain-of-thought reasoning before its final answer. On Together AI’s Free tier you can call the 70B distilled variant, which keeps most of the reasoning capability of the full R1 model at a fraction of the parameter count:
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-70B-free",
    messages=[
        {
            "role": "user",
            "content": (
                "A bookstore sold 60 books on Monday, then sales grew "
                "12% each day through Friday. How many books did they "
                "sell in total that week? Show your work."
            ),
        }
    ],
    max_tokens=2000,
)
print(response.choices[0].message.content)
The model’s response will include a <think>…</think> block of internal reasoning followed by the final answer. For agent applications, you can either show the reasoning to the user (transparency) or strip it out (clean output) depending on the surface.
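Stripping the reasoning is a few lines of plain Python — no Together-specific machinery, just a regex over the response text:

import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split a DeepSeek R1 response into (reasoning, final answer)."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text
    return match.group(1).strip(), text[match.end():].strip()

reasoning, answer = split_reasoning(response.choices[0].message.content)
print(answer)  # clean output; log `reasoning` separately if you want transparency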
Image Generation with FLUX.1 [schnell] Free
FLUX.1 [schnell] is Black Forest Labs’ fast text-to-image model, distilled to 4 sampling steps and open-sourced under Apache 2.0. Together AI hosts it as a free image-generation endpoint:
response = client.images.generate(
    model="black-forest-labs/FLUX.1-schnell-Free",
    prompt="A clean isometric illustration of an AI agent fetching data from a cloud database, soft pastel colors, no text",
    width=1024,
    height=1024,
    steps=4,
    n=1,
)
print(response.data[0].url)
The returned URL is hosted by Together AI and stays valid long enough to download or pipe into a CDN. For blog covers, social posts, or quick mockups, FLUX.1 [schnell] often beats Stable Diffusion XL on prompt adherence at a fraction of the inference time.
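Since the hosted URL won’t live forever, download the file as soon as you generate it. A minimal sketch using the requests library:

import requests  # pip install requests

img = requests.get(response.data[0].url, timeout=30)
img.raise_for_status()
with open("cover.png", "wb") as f:
    f.write(img.content)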
Vision: Llama 3.2 Vision Free
The Free vision model accepts standard OpenAI-format multimodal messages — text plus image URLs or base64 data:
response = client.chat.completions.create(
    model="meta-llama/Llama-Vision-Free",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this dashboard show? List the three highest values."},
                {"type": "image_url", "image_url": {"url": "https://example.com/dashboard.png"}}
            ]
        }
    ],
)
print(response.choices[0].message.content)
This is the cheapest path in 2026 to a working “describe this screenshot” or “extract data from this chart” feature without standing up your own vision pipeline. For OCR-heavy workloads on dense documents, a paid vision model will still outperform — but for screenshots, charts, product photos, and general image Q&A, Llama Vision Free is genuinely useful.
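For local files, encode the image as a base64 data URL inside the same message shape — a minimal sketch, assuming the endpoint accepts OpenAI-style data: URLs:

import base64

with open("screenshot.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("utf-8")
response = client.chat.completions.create(
    model="meta-llama/Llama-Vision-Free",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Summarize what this screenshot shows."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}}
        ]
    }],
)
print(response.choices[0].message.content)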
Together AI vs Other Free AI APIs
| Provider | Free Chat | Free Reasoning | Free Vision | Free Image Gen | OpenAI Compatible |
|---|---|---|---|---|---|
| Together AI | Llama 3.3 70B | DeepSeek R1 Distill 70B | Llama 3.2 Vision 11B | FLUX.1 schnell | Yes |
| Groq | Llama 3.3 70B (very fast) | DeepSeek R1 Distill | Llama Vision | No | Yes |
| Cerebras | Llama 3.3 70B (extremely fast) | Limited | No | No | Yes |
| Gemini | Gemini 2.0 Flash | Gemini 2.0 Flash Thinking | Built in | Imagen (limited) | Via compat layer |
| Cloudflare Workers AI | Llama 3 / Mistral | Limited | LLaVA | SDXL Lightning | Yes |
| OpenRouter | Many free models | DeepSeek R1 free | Several | Limited | Yes |
Where Together AI wins on the free tier: coverage. It’s the only provider on this list that offers chat, reasoning, vision, and image generation under one OpenAI-compatible endpoint, on one key, with no credit card. If you’re prototyping a multimodal product and don’t want to juggle three or four signups, Together AI compresses the entire surface area into one integration.
Where the others win: raw speed (Cerebras and Groq are faster on Llama 3.3 70B), context window (Gemini’s 1M tokens is unmatched), or model variety (OpenRouter aggregates more providers).
Rate Limits and Fair Use
Free-tier rate limits on Together AI exist to keep costs predictable. The exact numbers are published in the official rate limits page and change as the platform scales, but as a working mental model in 2026:
- -Free chat models: low double-digit requests per minute, with smaller per-day caps than paid tiers
- -Free image models: tighter caps (image inference is much more expensive), often a few requests per minute
- Paid models on trial credit: the standard tier-1 limits, but capped by your $1 budget — usually thousands of requests before the credit runs out on smaller models
The headline takeaway: Free-tier limits are designed for development and prototyping. They are not designed to support a production user base. If your side project starts getting traction, you’ll need to either move to a paid plan or layer caching in front (request deduplication on prompts is the highest-leverage win).
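A minimal sketch of that dedup layer — an in-memory dict keyed on a hash of the full request, which you’d swap for Redis or similar in anything real:

import hashlib
import json

_cache: dict[str, str] = {}

def cached_chat(client, model: str, messages: list, **kwargs) -> str:
    """Return a cached completion for identical requests instead of re-calling the API.
    Only sensible with temperature=0, or when slightly stale answers are acceptable."""
    key = hashlib.sha256(
        json.dumps({"model": model, "messages": messages, **kwargs}, sort_keys=True).encode()
    ).hexdigest()
    if key not in _cache:
        response = client.chat.completions.create(model=model, messages=messages, **kwargs)
        _cache[key] = response.choices[0].message.content
    return _cache[key]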
When to Use Together AI vs Alternatives
A simple decision tree based on what you’re optimizing for:
- Need everything in one key — chat + reasoning + vision + images? → Together AI Free tier
- Need the fastest possible chat response (under 1 second to first token)? → Cerebras or Groq
- Need a 1M-token context window for long documents? → Gemini
- Need the widest catalogue of free models from many providers? → OpenRouter
- Need the best free embedding + reranker for RAG? → Cohere
- Building edge functions and want inference inside Cloudflare? → Cloudflare Workers AI
Together AI is the right answer when your project benefits from a single integration that covers many capabilities, especially for multimodal applications and reasoning-heavy agents that may also need image generation.
Use Together AI with OpenClaw
OpenClaw is an AI agent platform that orchestrates multiple APIs and tools into automated workflows. Together AI fits well as a single inference layer behind an OpenClaw agent that needs to handle multiple modalities — read a screenshot, reason about what to do next, and produce a generated image as part of the output.
A working example: an OpenClaw agent receives a customer support ticket that includes a screenshot of an error. The agent uses Llama Vision (Free) to extract the error message from the image, DeepSeek R1 Distill (Free) to reason about which knowledge-base article applies, Llama 3.3 70B (Free) to draft a reply, and FLUX.1 schnell to generate a clean diagram for the customer if a visual explanation helps. All four steps hit the same API key.
import os
from together import Together
client = Together(api_key=os.environ["TOGETHER_API_KEY"])
def support_pipeline(ticket_text: str, screenshot_url: str) -> dict:
    """A multi-modal support agent step for OpenClaw."""
    # 1. Extract the error from the screenshot
    vision = client.chat.completions.create(
        model="meta-llama/Llama-Vision-Free",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Read the error message in this screenshot and return only the error text."},
                {"type": "image_url", "image_url": {"url": screenshot_url}}
            ]
        }]
    )
    error_text = vision.choices[0].message.content
    # 2. Reason about which solution applies
    reasoning = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-R1-Distill-Llama-70B-free",
        messages=[{
            "role": "user",
            "content": f"Ticket: {ticket_text}\nError extracted: {error_text}\nWhat is the most likely root cause?"
        }],
        max_tokens=800,
    )
    # 3. Draft a customer-facing reply
    reply = client.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct-Turbo-Free",
        messages=[
            {"role": "system", "content": "You are a senior support engineer. Be concise and friendly."},
            {"role": "user", "content": f"Ticket: {ticket_text}\nRoot cause analysis: {reasoning.choices[0].message.content}\nWrite the reply to the customer."}
        ],
    )
    return {
        "error": error_text,
        "analysis": reasoning.choices[0].message.content,
        "reply": reply.choices[0].message.content,
    }
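The optional fourth step from the scenario — generating a diagram when a visual explanation helps — is one more call against the same client. A sketch, reusing the image endpoint shown earlier:

def diagram_for(reply_text: str) -> str:
    """Optional step 4: generate a supporting diagram with FLUX.1 schnell."""
    image = client.images.generate(
        model="black-forest-labs/FLUX.1-schnell-Free",
        prompt=f"A clean technical diagram illustrating: {reply_text[:200]}, minimal style, no text",
        width=1024,
        height=1024,
        steps=4,
        n=1,
    )
    return image.data[0].url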
The same pattern fits other OpenClaw use cases: a research agent that reads charts and reasons about them, a content agent that writes a post and generates its cover image, a QA agent that screenshots a UI and verifies what it sees. The single-key, single-SDK shape keeps the agent code small.
Pricing When You Outgrow Free
If your application moves beyond prototyping, Together AI’s serverless pricing for the same models is competitive with the rest of the market. Approximate published prices in 2026 for popular models:
| Model | Approx Price | Unit |
|---|---|---|
| Llama 3.3 70B Instruct Turbo | ~$0.88 | per 1M tokens (blended) |
| Llama 3.1 8B Instruct Turbo | ~$0.18 | per 1M tokens (blended) |
| Llama 3.1 405B Instruct Turbo | ~$3.50 | per 1M tokens (blended) |
| DeepSeek R1 | ~$3.00 / $7.00 | per 1M input / output tokens |
| FLUX.1 [schnell] | ~$0.003 | per image (1024×1024, 4 steps) |
| BGE / M2-BERT embeddings | ~$0.008 to $0.05 | per 1M tokens (model-dependent) |
Two things make this pricing especially friendly for solo builders. First, you only pay for what you use — there’s no monthly minimum. Second, the same key works for both the Free tier and paid models, so there’s no migration cost when you flip from free to paid for a single hot model. Check the official pricing page for current numbers.
FAQ
Is Together AI’s Free tier really free, or is it a trial?
Both. Models with the -Free suffix are free to call indefinitely (rate-limited but non-expiring). All other models run against a one-time $1 trial credit at signup. Once the trial credit is gone, paid models stop until you add a payment method.
Do I need a credit card to sign up?
No. The default account state has no payment method on file. You only need to add one when you want to spend beyond your trial credit on paid models — Free-tier models keep working either way.
Is the API truly OpenAI-compatible?
Yes for chat completions, streaming, and tool calling. Image generation uses Together AI’s own endpoint shape (which closely mirrors OpenAI’s). Embeddings are also OpenAI-compatible. In practice, you can point any OpenAI SDK at https://api.together.xyz/v1 and most code works without changes.
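For instance, an embeddings call goes through the same client — a minimal sketch, noting that embedding models bill against trial credit rather than the Free tier, and assuming the BGE model ID below is still listed:

emb = client.embeddings.create(
    model="BAAI/bge-base-en-v1.5",  # check the model list; embeddings use trial credit
    input=["What is pgvector?", "pgvector adds vector similarity search to Postgres."],
)
print(len(emb.data), len(emb.data[0].embedding))  # 2 vectors, 768 dims each for this model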
What’s the difference between “Turbo” and non-Turbo models?
Turbo variants are quantized (typically FP8) for higher throughput at very small quality loss. Together AI publishes evaluation numbers showing Turbo variants stay within a fraction of a percent of full-precision quality on standard benchmarks. For nearly all production use cases, prefer Turbo.
Can I use Together AI for commercial projects?
Yes — both the Free and paid tiers permit commercial use, subject to each model’s underlying license. Llama models follow Meta’s Llama Community License, FLUX.1 [schnell] is Apache 2.0, and so on. Confirm any specific model’s license on its model card before shipping.
Does Together AI store my prompts or completions?
Together AI’s stated policy is that they don’t train on your data and that prompts are not retained beyond what’s needed for abuse prevention. For sensitive workloads, the dedicated/enterprise tiers offer stronger data-handling guarantees. Re-check the current privacy policy before sending real customer data.
How does the Free tier compare to running models locally with Ollama?
Ollama is unbeatable for offline development and zero-cost long-running tasks, but it’s bounded by the GPU on your laptop — running Llama 3.3 70B locally requires serious hardware. Together AI’s Free tier gives you the same model running on a real datacenter GPU, just with rate limits. The two tools are complements: prototype locally with Ollama on a smaller model, then call Together AI when you need the 70B for the parts that matter.
Final Verdict
Together AI’s Free tier is the most underrated entry point in the free-AI-API space because it solves a problem most other free APIs ignore: multimodal coverage on a single key. Every other provider in this category is great at one thing — Cerebras for raw speed, Gemini for context length, Cohere for retrieval, Cloudflare for edge — and forces you to integrate three or four of them if your project needs more than one capability. Together AI’s -Free models give you chat, reasoning, vision, and image generation behind one HTTPS endpoint, one SDK, and one key, with no credit card.
For prototyping multimodal agents, building a side project that mixes capabilities, or just keeping one fewer signup form on your “maybe later” list, Together AI’s Free tier earns its place in any serious 2026 free-AI-API stack. Sign up at api.together.ai, copy the key, and your first chat completion is about three minutes away.
Related Reads
- 10 Best Free AI APIs in 2026: The Ultimate Comparison — the master list of every free chat API worth your time
- Groq vs Cerebras vs Gemini: Which Free AI API Is Actually Fastest in 2026? — when raw speed is the deciding factor
- DeepSeek API: Free Access to R1 Reasoning and V3 Chat Models — for the same R1 reasoning, sourced directly
- OpenRouter: Access 300+ Free AI Models with One API Key — when model variety matters more than coverage of a single provider
- Cohere Free API: The Best Free Embedding and Rerank API for RAG in 2026 — pair with Together AI for a complete free RAG stack
Originally published at toolfreebie.com.