What is Together AI?
Together AI is an inference platform that lets you run open-source LLMs (Llama 3, Mixtral, DBRX, Qwen) through a simple API, at prices 2-4x lower than OpenAI. It also offers a free tier with $5 in credits to get started.
Why Together AI?
- $5 free credits: enough for ~5M tokens on Llama 3 70B at the price listed below, and more on smaller models
- OpenAI-compatible API: swap in openai.OpenAI(base_url=...) and you are done
- 70+ open-source models: Llama 3 70B, Mixtral, CodeLlama, DBRX, Qwen
- Fine-tuning: fine-tune any model with your data, starting at $2/hour
- Serverless + dedicated: scale from prototype to production
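To put the free-credit bullet in numbers, here is a minimal sketch of the arithmetic. It assumes the per-million-token prices quoted in the price comparison later in this post ($0.90 for Llama 3 70B, $0.20 for Llama 3 8B) and a single flat rate for all tokens. The function name is hypothetical, and live prices may differ.

```python
def tokens_for_budget(budget_usd: float, price_per_million_usd: float) -> int:
    """Return how many tokens a budget buys at a flat per-million-token price."""
    if price_per_million_usd <= 0:
        raise ValueError("price must be positive")
    return round(budget_usd / price_per_million_usd * 1_000_000)


# Prices below are the ones quoted in this post, not live figures.
for name, price in [("Llama 3 70B", 0.90), ("Llama 3 8B", 0.20)]:
    millions = tokens_for_budget(5, price) / 1e6
    print(f"{name}: about {millions:.1f}M tokens for $5")
```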
Quick Start
pip install together
from together import Together
client = Together(api_key="your-api-key") # Free $5 at api.together.xyz
response = client.chat.completions.create(
    model="meta-llama/Llama-3-70b-chat-hf",
    messages=[{"role": "user", "content": "Explain container orchestration in 3 sentences"}]
)
print(response.choices[0].message.content)
OpenAI-Compatible (Drop-In Replacement)
from openai import OpenAI
# Point base_url at Together and use a Together key; existing chat-completion calls keep working
client = OpenAI(
    base_url="https://api.together.xyz/v1",
    api_key="your-together-key"
)
response = client.chat.completions.create(
    model="meta-llama/Llama-3-70b-chat-hf",
    messages=[{"role": "user", "content": "Write a Dockerfile for a Python FastAPI app"}]
)
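If you want to flip between providers without touching call sites, one option is to build the client arguments from configuration. The sketch below is an illustration, not part of either SDK: `client_kwargs` is a hypothetical helper, and it assumes keys live in the `TOGETHER_API_KEY` and `OPENAI_API_KEY` environment variables.

```python
import os

TOGETHER_BASE_URL = "https://api.together.xyz/v1"


def client_kwargs(provider: str) -> dict:
    """Build keyword arguments for openai.OpenAI(...) for the chosen provider."""
    if provider == "together":
        return {
            "base_url": TOGETHER_BASE_URL,
            "api_key": os.environ.get("TOGETHER_API_KEY", ""),
        }
    if provider == "openai":
        # No base_url here: the OpenAI client falls back to its default endpoint.
        return {"api_key": os.environ.get("OPENAI_API_KEY", "")}
    raise ValueError(f"unknown provider: {provider!r}")
```

You would then create the client with `OpenAI(**client_kwargs("together"))`. The model name still has to change per provider, since Together expects identifiers like `meta-llama/Llama-3-70b-chat-hf`.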
Streaming
stream = client.chat.completions.create(
    model="meta-llama/Llama-3-70b-chat-hf",
    messages=[{"role": "user", "content": "Build a REST API for user management"}],
    stream=True
)
for chunk in stream:
    # Some chunks carry no choices or no text, so guard before printing
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
Fine-Tuning Your Own Model
# Upload training data (uses the Together SDK client from Quick Start)
resp = client.files.upload(file="training_data.jsonl")
# Start fine-tuning (pricing starts at $2/hour)
job = client.fine_tuning.create(
    model="meta-llama/Llama-3-8b-chat-hf",
    training_file=resp.id,
    n_epochs=3
)
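The upload step expects a JSONL file, one training example per line. Here is a minimal sketch for producing one. It assumes a conversational layout with a `messages` list per line, so check the fine-tuning docs for the exact schema your model expects. `write_chat_jsonl` and the sample conversation are hypothetical.

```python
import json


def write_chat_jsonl(path, conversations):
    """Write one {"messages": [...]} JSON object per line; return the line count."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for messages in conversations:
            f.write(json.dumps({"messages": messages}, ensure_ascii=False) + "\n")
            count += 1
    return count


# Made-up sample data, for illustration only.
example_conversations = [
    [
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "Open Settings, choose Security, then select Reset password."},
    ],
]
```

Calling `write_chat_jsonl("training_data.jsonl", example_conversations)` would create the file referenced in the upload call above.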
Image Generation
response = client.images.generate(
    model="black-forest-labs/FLUX.1-schnell-Free",
    prompt="A DevOps engineer deploying to Kubernetes, cyberpunk style",
    n=1
)
print(response.data[0].url)
Price Comparison
| Model | Together AI | OpenAI Equivalent | Savings |
|---|---|---|---|
| Llama 3 70B | $0.90/M tokens | GPT-4: $30/M | 97% |
| Llama 3 8B | $0.20/M tokens | GPT-3.5: $1.50/M | 87% |
| Mixtral 8x7B | $0.60/M tokens | GPT-4: $30/M | 98% |
| Codestral | $0.20/M tokens | GPT-4: $30/M | 99% |
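The Savings column follows directly from the two price columns. As a quick check, here is a sketch that recomputes it from the figures in the table. The prices are copied from this post and are not live figures, and `savings_pct` is a hypothetical helper name.

```python
def savings_pct(together_price: float, openai_price: float) -> float:
    """Percentage saved when paying together_price instead of openai_price."""
    return (1 - together_price / openai_price) * 100


# (model, Together $/M tokens, OpenAI comparison $/M tokens), as listed in the table
rows = [
    ("Llama 3 70B", 0.90, 30.00),
    ("Llama 3 8B", 0.20, 1.50),
    ("Mixtral 8x7B", 0.60, 30.00),
    ("Codestral", 0.20, 30.00),
]
for model, together_price, openai_price in rows:
    print(f"{model}: {savings_pct(together_price, openai_price):.0f}% savings")
```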
Real-World Use Case
A SaaS startup was spending $8K/month on OpenAI for their customer support chatbot. They switched to Together AI with Llama 3 70B: same quality responses (tested with blind A/B), $900/month total cost. The $7K savings paid for two junior developers.
Want to cut your AI infrastructure costs by 90%? I help teams migrate from OpenAI to open-source models. Contact spinov001@gmail.com or check my data tools on Apify.