DEV Community

Alex Spinov

Together AI Has a Free API: Run Open-Source LLMs 4x Cheaper Than OpenAI

What is Together AI?

Together AI is an inference platform that lets you run open-source LLMs (Llama 3, Mixtral, DBRX, Qwen) through a simple API — at prices 2-4x lower than OpenAI. They also offer a free tier with $5 credits to get started.

Why Together AI?

  • $5 free credits — enough for ~5M tokens with smaller models
  • OpenAI-compatible API — swap openai.OpenAI(base_url=...) and you are done
  • 70+ open-source models — Llama 3 70B, Mixtral, CodeLlama, DBRX, Qwen
  • Fine-tuning — fine-tune any model with your data, starting at $2/hour
  • Serverless + dedicated — scale from prototype to production

Quick Start

```shell
pip install together
```
```python
from together import Together

client = Together(api_key="your-api-key")  # Free $5 at api.together.xyz

response = client.chat.completions.create(
    model="meta-llama/Llama-3-70b-chat-hf",
    messages=[{"role": "user", "content": "Explain container orchestration in 3 sentences"}]
)
print(response.choices[0].message.content)
```

OpenAI-Compatible (Drop-In Replacement)

```python
from openai import OpenAI

# Just change base_url — all your existing OpenAI code keeps working
client = OpenAI(
    base_url="https://api.together.xyz/v1",
    api_key="your-together-key"
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3-70b-chat-hf",
    messages=[{"role": "user", "content": "Write a Dockerfile for a Python FastAPI app"}]
)
```
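Rather than hard-coding the key, it is safer to read it from the environment. A minimal sketch: `together_client_kwargs` is a hypothetical helper of mine, and `TOGETHER_API_KEY` is just a conventional variable name, not something the SDK requires.

```python
import os

def together_client_kwargs(env_var="TOGETHER_API_KEY"):
    """Build the kwargs for OpenAI(...) pointed at Together's endpoint."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set {env_var} before creating the client")
    return {"base_url": "https://api.together.xyz/v1", "api_key": key}

# Usage (sketch):
# client = OpenAI(**together_client_kwargs())
```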

Streaming + Function Calling

```python
stream = client.chat.completions.create(
    model="meta-llama/Llama-3-70b-chat-hf",
    messages=[{"role": "user", "content": "Build a REST API for user management"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
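For the function-calling half of this section: the OpenAI-compatible endpoint accepts the standard `tools` schema on models that support tool calls. A hedged sketch — the `get_weather` tool and all its fields are hypothetical, and you should confirm which Together models support tool calling before relying on it.

```python
# Hypothetical tool definition in the OpenAI-compatible "tools" format
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# Passed alongside messages (sketch; requires a tool-capable model):
# response = client.chat.completions.create(
#     model="meta-llama/Llama-3-70b-chat-hf",
#     messages=[{"role": "user", "content": "Weather in Berlin?"}],
#     tools=[get_weather_tool],
# )
# tool_call = response.choices[0].message.tool_calls[0]
```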

Fine-Tuning Your Own Model

```python
# Upload training data (the Together SDK exposes files.upload)
resp = client.files.upload(file="training_data.jsonl")

# Start fine-tuning ($2/hour for 7B-class models)
job = client.fine_tuning.create(
    model="meta-llama/Llama-3-8b-chat-hf",
    training_file=resp.id,
    n_epochs=3
)
```
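The `training_data.jsonl` file above holds one example per line. The chat-message layout below is an assumption based on the common chat fine-tuning convention; check Together's docs for the exact schema your model expects.

```python
import json

# Each line is one training example in a chat-message layout
# (assumed format — verify against Together's fine-tuning docs).
examples = [
    {"messages": [
        {"role": "user", "content": "What is a Dockerfile?"},
        {"role": "assistant", "content": "A Dockerfile is a build recipe for a container image."},
    ]},
    {"messages": [
        {"role": "user", "content": "What does docker compose do?"},
        {"role": "assistant", "content": "It defines and runs multi-container applications from one YAML file."},
    ]},
]

with open("training_data.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```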

Image Generation

```python
response = client.images.generate(
    model="black-forest-labs/FLUX.1-schnell-Free",
    prompt="A DevOps engineer deploying to Kubernetes, cyberpunk style",
    n=1
)
print(response.data[0].url)
```

Price Comparison

| Model | Together AI | OpenAI Equivalent | Savings |
|---|---|---|---|
| Llama 3 70B | $0.90/M tokens | GPT-4: $30/M | 97% |
| Llama 3 8B | $0.20/M tokens | GPT-3.5: $1.50/M | 87% |
| Mixtral 8x7B | $0.60/M tokens | GPT-4: $30/M | 98% |
| Codestral | $0.20/M tokens | GPT-4: $30/M | 99% |
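The savings figures above are easy to sanity-check with a few lines of arithmetic. `monthly_cost` and the 50M-token volume are illustrative assumptions, not Together APIs.

```python
def monthly_cost(tokens_millions, price_per_million):
    """Rough monthly bill: token volume (in millions) times per-million price."""
    return tokens_millions * price_per_million

# Example: 50M tokens/month at the table's prices
llama_70b = monthly_cost(50, 0.90)   # $45.00
gpt4 = monthly_cost(50, 30.00)       # $1500.00
savings = 1 - llama_70b / gpt4
print(f"Llama 3 70B: ${llama_70b:.2f}, GPT-4: ${gpt4:.2f}, savings {savings:.0%}")
```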

Real-World Use Case

A SaaS startup was spending $8K/month on OpenAI for their customer support chatbot. After switching to Together AI with Llama 3 70B, they got same-quality responses (verified in a blind A/B test) for $900/month total. The roughly $7K in monthly savings paid for two junior developers.


Want to cut your AI infrastructure costs by 90%? I help teams migrate from OpenAI to open-source models. Contact spinov001@gmail.com or check my data tools on Apify.
