DEV Community

Alex Spinov

Together AI Has a Free API: Run Open-Source LLMs 4x Cheaper Than OpenAI

What is Together AI?

Together AI is an inference platform that lets you run open-source LLMs (Llama 3, Mixtral, DBRX, Qwen) through a simple API — at prices 2-4x lower than OpenAI. They also offer a free tier with $5 credits to get started.

Why Together AI?

  • $5 free credits — enough for ~5M tokens with smaller models
  • OpenAI-compatible API — swap openai.OpenAI(base_url=...) and you are done
  • 70+ open-source models — Llama 3 70B, Mixtral, CodeLlama, DBRX, Qwen
  • Fine-tuning — fine-tune any model with your data, starting at $2/hour
  • Serverless + dedicated — scale from prototype to production

Quick Start

```shell
pip install together
```
```python
from together import Together

client = Together(api_key="your-api-key")  # Free $5 at api.together.xyz

response = client.chat.completions.create(
    model="meta-llama/Llama-3-70b-chat-hf",
    messages=[{"role": "user", "content": "Explain container orchestration in 3 sentences"}]
)
print(response.choices[0].message.content)
```

OpenAI-Compatible (Drop-In Replacement)

```python
from openai import OpenAI

# Just change base_url — all your existing OpenAI code keeps working
client = OpenAI(
    base_url="https://api.together.xyz/v1",
    api_key="your-together-key"
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3-70b-chat-hf",
    messages=[{"role": "user", "content": "Write a Dockerfile for a Python FastAPI app"}]
)
```
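Rather than hard-coding the key, it is safer to read it from the environment. A minimal sketch: `together_client_kwargs` is a hypothetical helper of mine, and `TOGETHER_API_KEY` is just a conventional variable name, not something the SDK requires.

```python
import os

def together_client_kwargs(env_var="TOGETHER_API_KEY"):
    """Build the kwargs for OpenAI(...) pointed at Together's endpoint."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set {env_var} before creating the client")
    return {"base_url": "https://api.together.xyz/v1", "api_key": key}

# Usage (sketch):
# client = OpenAI(**together_client_kwargs())
```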

Streaming + Function Calling

```python
stream = client.chat.completions.create(
    model="meta-llama/Llama-3-70b-chat-hf",
    messages=[{"role": "user", "content": "Build a REST API for user management"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
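For the function-calling half of this section: the OpenAI-compatible endpoint accepts the standard `tools` schema on models that support tool calls. A hedged sketch — the `get_weather` tool and all its fields are hypothetical, and you should confirm which Together models support tool calling before relying on it.

```python
# Hypothetical tool definition in the OpenAI-compatible "tools" format
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# Passed alongside messages (sketch; requires a tool-capable model):
# response = client.chat.completions.create(
#     model="meta-llama/Llama-3-70b-chat-hf",
#     messages=[{"role": "user", "content": "Weather in Berlin?"}],
#     tools=[get_weather_tool],
# )
# tool_call = response.choices[0].message.tool_calls[0]
```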

Fine-Tuning Your Own Model

```python
# Upload training data (the Together SDK exposes files.upload)
resp = client.files.upload(file="training_data.jsonl")

# Start fine-tuning ($2/hour for 7B-class models)
job = client.fine_tuning.create(
    model="meta-llama/Llama-3-8b-chat-hf",
    training_file=resp.id,
    n_epochs=3
)
```
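The `training_data.jsonl` file above holds one example per line. The chat-message layout below is an assumption based on the common chat fine-tuning convention; check Together's docs for the exact schema your model expects.

```python
import json

# Each line is one training example in a chat-message layout
# (assumed format — verify against Together's fine-tuning docs).
examples = [
    {"messages": [
        {"role": "user", "content": "What is a Dockerfile?"},
        {"role": "assistant", "content": "A Dockerfile is a build recipe for a container image."},
    ]},
    {"messages": [
        {"role": "user", "content": "What does docker compose do?"},
        {"role": "assistant", "content": "It defines and runs multi-container applications from one YAML file."},
    ]},
]

with open("training_data.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```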

Image Generation

```python
response = client.images.generate(
    model="black-forest-labs/FLUX.1-schnell-Free",
    prompt="A DevOps engineer deploying to Kubernetes, cyberpunk style",
    n=1
)
print(response.data[0].url)
```

Price Comparison

| Model | Together AI | OpenAI Equivalent | Savings |
|---|---|---|---|
| Llama 3 70B | $0.90/M tokens | GPT-4: $30/M | 97% |
| Llama 3 8B | $0.20/M tokens | GPT-3.5: $1.50/M | 87% |
| Mixtral 8x7B | $0.60/M tokens | GPT-4: $30/M | 98% |
| Codestral | $0.20/M tokens | GPT-4: $30/M | 99% |
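The savings figures above are easy to sanity-check with a few lines of arithmetic. `monthly_cost` and the 50M-token volume are illustrative assumptions, not Together APIs.

```python
def monthly_cost(tokens_millions, price_per_million):
    """Rough monthly bill: token volume (in millions) times per-million price."""
    return tokens_millions * price_per_million

# Example: 50M tokens/month at the table's prices
llama_70b = monthly_cost(50, 0.90)   # $45.00
gpt4 = monthly_cost(50, 30.00)       # $1500.00
savings = 1 - llama_70b / gpt4
print(f"Llama 3 70B: ${llama_70b:.2f}, GPT-4: ${gpt4:.2f}, savings {savings:.0%}")
```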

Real-World Use Case

A SaaS startup was spending $8K/month on OpenAI for their customer support chatbot. After switching to Together AI with Llama 3 70B, they got same-quality responses (verified in a blind A/B test) for $900/month total. The roughly $7K in monthly savings paid for two junior developers.


Want to cut your AI infrastructure costs by 90%? I help teams migrate from OpenAI to open-source models. Contact spinov001@gmail.com or check my data tools on Apify.
