Together AI Has a Free API — Here's How to Run Open-Source LLMs 4x Cheaper Than OpenAI

#togetherai #ai #llm #api

A startup CTO told me: 'Our OpenAI bill was $2,000/month for GPT-4. We switched the non-critical tasks to Llama 3 on Together AI — same quality for those use cases, $200/month.'

What Together AI Offers

Together AI:

Free $5 credit on signup: enough for 10M+ tokens on smaller models
100+ open-source models: Llama 3, Mixtral, CodeLlama, DBRX, and more
OpenAI-compatible API: keep the OpenAI SDK, change the base URL and API key, and pick a Together model name
Inference: fast hosted inference for Llama and other open models
Fine-tuning: train custom models on your data
Embeddings: text embeddings for RAG and search
Image generation: Stable Diffusion, FLUX

Quick Start

npm install openai  # Together exposes an OpenAI-compatible API, so the official OpenAI SDK works

import OpenAI from 'openai';

// Point the OpenAI SDK at Together's endpoint and authenticate with a Together API key
const together = new OpenAI({
  baseURL: 'https://api.together.xyz/v1',
  apiKey: process.env.TOGETHER_API_KEY
});

const response = await together.chat.completions.create({
  model: 'meta-llama/Llama-3-70b-chat-hf',
  messages: [{ role: 'user', content: 'Explain Docker in simple terms' }],
  max_tokens: 500
});

console.log(response.choices[0].message.content);
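Because only the base URL and the API key differ between the two providers, you can keep the provider choice in one place. Here is a minimal sketch of that idea. The helper name clientOptions and the PROVIDERS map are made up for this example, and it assumes the keys live in the OPENAI_API_KEY and TOGETHER_API_KEY environment variables.

```javascript
// Hypothetical helper: maps a provider name to OpenAI SDK constructor options.
// The PROVIDERS map and the env variable names are assumptions for this sketch.
const PROVIDERS = {
  openai: { baseURL: 'https://api.openai.com/v1', apiKeyEnv: 'OPENAI_API_KEY' },
  together: { baseURL: 'https://api.together.xyz/v1', apiKeyEnv: 'TOGETHER_API_KEY' },
};

function clientOptions(provider, env = process.env) {
  const config = PROVIDERS[provider];
  if (!config) {
    throw new Error(`Unknown provider: ${provider}`);
  }
  // The result has the same shape as the options object passed to `new OpenAI(...)`
  return { baseURL: config.baseURL, apiKey: env[config.apiKeyEnv] };
}

// Usage (needs the openai package and a real key):
// const client = new OpenAI(clientOptions('together'));
```

You still need to swap the model name per request, since OpenAI and Together use different model identifiers.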

REST API

# Chat completion
curl 'https://api.together.xyz/v1/chat/completions' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "meta-llama/Llama-3-70b-chat-hf",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 200
  }'

# Streaming
curl 'https://api.together.xyz/v1/chat/completions' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "meta-llama/Llama-3-70b-chat-hf",
    "messages": [{"role": "user", "content": "Tell me a joke"}],
    "stream": true
  }'

# Embeddings
curl 'https://api.together.xyz/v1/embeddings' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "togethercomputer/m2-bert-80M-8k-retrieval",
    "input": "What is machine learning?"
  }'

# Image generation
curl 'https://api.together.xyz/v1/images/generations' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "stabilityai/stable-diffusion-xl-base-1.0",
    "prompt": "A robot painting a landscape",
    "n": 1
  }'
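The streaming request above also works through the OpenAI SDK: with stream: true, the SDK returns an async iterable of chunks, each carrying a small text delta. The sketch below shows one way to consume it. The helper name collectStream is made up for this example, and it only assumes chunks shaped like { choices: [{ delta: { content } }] }.

```javascript
// Concatenate the text deltas of a streamed chat completion.
// `stream` is any async iterable of chunks shaped like the OpenAI SDK's
// streaming chunks: { choices: [{ delta: { content?: string } }] }.
// `onToken` is an optional callback invoked for every non-empty delta.
async function collectStream(stream, onToken = () => {}) {
  let text = '';
  for await (const chunk of stream) {
    // Some chunks carry no text (for example the final one), so default to ''
    const token = chunk.choices?.[0]?.delta?.content ?? '';
    if (token) {
      onToken(token);
      text += token;
    }
  }
  return text;
}

// Usage with the `together` client from the Quick Start (needs a real key):
// const stream = await together.chat.completions.create({
//   model: 'meta-llama/Llama-3-70b-chat-hf',
//   messages: [{ role: 'user', content: 'Tell me a joke' }],
//   stream: true
// });
// const full = await collectStream(stream, (t) => process.stdout.write(t));
```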

Code Generation

// CodeLlama for coding tasks
const code = await together.chat.completions.create({
  model: 'codellama/CodeLlama-34b-Instruct-hf',
  messages: [{
    role: 'user',
    content: 'Write a Python function that finds all prime numbers up to n using the Sieve of Eratosthenes'
  }],
  max_tokens: 500
});

console.log(code.choices[0].message.content);

Embeddings for RAG

// Generate embeddings for semantic search
const embedding = await together.embeddings.create({
  model: 'togethercomputer/m2-bert-80M-8k-retrieval',
  input: ['How to deploy Docker containers', 'Kubernetes vs Docker Swarm']
});

// Use embeddings for similarity search
const vectors = embedding.data.map(d => d.embedding);
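Once you have the vectors, similarity search usually means comparing a query vector against each document vector. Here is a minimal sketch using cosine similarity, which is a common choice but not the only one. The function names cosineSimilarity and rankBySimilarity are made up for this example, and a real RAG setup would likely use a vector database instead of an in-memory loop.

```javascript
// Cosine similarity between two equal-length numeric vectors.
// Returns a value in [-1, 1]; higher means more similar.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as having no similarity to anything
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank document vectors against a query vector, best match first.
function rankBySimilarity(queryVector, docVectors) {
  return docVectors
    .map((vector, index) => ({ index, score: cosineSimilarity(queryVector, vector) }))
    .sort((x, y) => y.score - x.score);
}

// Usage: embed the query with the same model, then
// const ranked = rankBySimilarity(queryVector, vectors);
```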

JSON Mode

// Request structured JSON output
const result = await together.chat.completions.create({
  model: 'meta-llama/Llama-3-70b-chat-hf',
  messages: [{
    role: 'user',
    content: 'Extract the product name, price, and rating from: "The Sony WH-1000XM5 headphones are $349 with 4.7 stars". Respond with a JSON object using the keys "product", "price", and "rating".'
  }],
  response_format: { type: 'json_object' }
});
// The JSON arrives as a string in result.choices[0].message.content, for example:
// {"product": "Sony WH-1000XM5", "price": 349, "rating": 4.7}
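The model's answer comes back as a string in result.choices[0].message.content, so you still have to parse it, and it is worth validating because nothing guarantees the key names or types. A minimal sketch, assuming you want the keys product, price, and rating. The function name parseProduct is made up for this example.

```javascript
// Parse and validate the JSON string returned by the model.
// Assumption for this sketch: the expected keys are "product" (string),
// "price" (number), and "rating" (number).
function parseProduct(content) {
  let data;
  try {
    data = JSON.parse(content);
  } catch {
    throw new Error('Model did not return valid JSON');
  }
  if (data === null || typeof data !== 'object' || Array.isArray(data)) {
    throw new Error('Expected a JSON object');
  }
  const { product, price, rating } = data;
  if (
    typeof product !== 'string' ||
    typeof price !== 'number' || !Number.isFinite(price) ||
    typeof rating !== 'number' || !Number.isFinite(rating)
  ) {
    throw new Error('Missing or malformed fields');
  }
  return { product, price, rating };
}

// Usage with the response from the JSON mode request:
// const product = parseProduct(result.choices[0].message.content);
```

Rejecting malformed output early is deliberate here: it lets you retry the request instead of passing bad data downstream.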

Cost Comparison (per 1M tokens)

Model	Provider	Input cost per 1M tokens
GPT-4o	OpenAI	$5.00
Claude 3.5 Sonnet	Anthropic	$3.00
Llama 3 70B	Together AI	$0.90
Llama 3 8B	Together AI	$0.20
Mixtral 8x7B	Together AI	$0.60
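To turn the table into numbers for your own workload, here is a small sketch of the arithmetic. The prices are copied from the table above and cover input tokens only, so treat them as a snapshot that may be out of date. The names inputCost and tokensForBudget are made up for this example.

```javascript
// USD per 1M input tokens, copied from the table above.
// Assumption: these prices may have changed since the table was written.
const INPUT_PRICE_PER_MILLION = {
  'GPT-4o': 5.0,
  'Claude 3.5 Sonnet': 3.0,
  'Llama 3 70B': 0.9,
  'Llama 3 8B': 0.2,
  'Mixtral 8x7B': 0.6,
};

function priceFor(model) {
  const price = INPUT_PRICE_PER_MILLION[model];
  if (price === undefined) throw new Error(`No price listed for ${model}`);
  return price;
}

// Cost in USD of sending `tokens` input tokens to `model`
function inputCost(model, tokens) {
  return (tokens / 1e6) * priceFor(model);
}

// How many input tokens a budget in USD buys on `model`
function tokensForBudget(model, budgetUsd) {
  return Math.round((budgetUsd / priceFor(model)) * 1e6);
}

console.log(inputCost('Llama 3 8B', 10e6).toFixed(2));   // 2.00
console.log(tokensForBudget('Llama 3 8B', 5));           // 25000000
console.log((priceFor('GPT-4o') / priceFor('Llama 3 70B')).toFixed(1)); // 5.6
```

By these numbers, the $5 signup credit covers about 25M input tokens on Llama 3 8B, and GPT-4o input costs roughly 5.6 times as much as Llama 3 70B input. Output tokens are priced separately and are not covered by this sketch.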

Need AI-powered scraping? Check out my web scraping actors on Apify — intelligent data extraction.

Need custom AI solutions? Email me at spinov001@gmail.com.
