A startup CTO told me: 'Our OpenAI bill was $2,000/month for GPT-4. We switched the non-critical tasks to Llama 3 on Together AI — same quality for those use cases, $200/month.'
What Together AI Offers
Together AI:
- Free $5 credit on signup — enough for 10M+ tokens on smaller models
- 100+ open-source models — Llama 3, Mixtral, CodeLlama, DBRX, etc.
- OpenAI-compatible API — reuse the OpenAI SDK by changing the base URL, API key, and model name
- Inference — fast hosted inference for Llama and other open models
- Fine-tuning — train custom models on your data
- Embeddings — text embeddings for RAG/search
- Image generation — Stable Diffusion, FLUX
Quick Start
npm install openai # Together uses OpenAI-compatible API
import OpenAI from 'openai';
const together = new OpenAI({
baseURL: 'https://api.together.xyz/v1',
apiKey: process.env.TOGETHER_API_KEY
});
const response = await together.chat.completions.create({
model: 'meta-llama/Llama-3-70b-chat-hf',
messages: [{ role: 'user', content: 'Explain Docker in simple terms' }],
max_tokens: 500
});
console.log(response.choices[0].message.content);
REST API
# Chat completion
curl 'https://api.together.xyz/v1/chat/completions' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
"model": "meta-llama/Llama-3-70b-chat-hf",
"messages": [{"role": "user", "content": "Hello!"}],
"max_tokens": 200
}'
# Streaming
curl 'https://api.together.xyz/v1/chat/completions' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
"model": "meta-llama/Llama-3-70b-chat-hf",
"messages": [{"role": "user", "content": "Tell me a joke"}],
"stream": true
}'
# Embeddings
curl 'https://api.together.xyz/v1/embeddings' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
"model": "togethercomputer/m2-bert-80M-8k-retrieval",
"input": "What is machine learning?"
}'
# Image generation
curl 'https://api.together.xyz/v1/images/generations' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
"model": "stabilityai/stable-diffusion-xl-base-1.0",
"prompt": "A robot painting a landscape",
"n": 1
}'
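With "stream": true, the response arrives as a series of events rather than one JSON body. The sketch below shows one way to reassemble the text, assuming the stream follows the OpenAI-style server-sent events format: lines prefixed with "data: ", each holding a JSON chunk, ending with "data: [DONE]". The function name collectStreamText and the sample events are made up for illustration and are not real API output.

```javascript
// Reassemble the message text from a streamed chat completion.
// Assumption: OpenAI-style SSE, one JSON chunk per "data: " line,
// terminated by the sentinel "data: [DONE]".
function collectStreamText(sseText) {
  let text = '';
  for (const line of sseText.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed.startsWith('data:')) continue; // skip blank separator lines
    const payload = trimmed.slice('data:'.length).trim();
    if (payload === '[DONE]') break; // end-of-stream sentinel
    const chunk = JSON.parse(payload);
    // Each chunk carries an incremental piece of the message, if any.
    text += chunk.choices?.[0]?.delta?.content ?? '';
  }
  return text;
}

// Hand-written sample events (hypothetical, for illustration only).
const sample = [
  'data: {"choices":[{"delta":{"role":"assistant"}}]}',
  '',
  'data: {"choices":[{"delta":{"content":"Why did"}}]}',
  '',
  'data: {"choices":[{"delta":{"content":" the chicken"}}]}',
  '',
  'data: [DONE]',
].join('\n');

console.log(collectStreamText(sample));
```

A real client would also need to buffer partial lines, since a network read can end in the middle of an event. The openai SDK handles this for you when you pass stream: true and iterate over the result.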
Code Generation
// CodeLlama for coding tasks
const code = await together.chat.completions.create({
model: 'codellama/CodeLlama-34b-Instruct-hf',
messages: [{
role: 'user',
content: 'Write a Python function that finds all prime numbers up to n using the Sieve of Eratosthenes'
}],
max_tokens: 500
});
Embeddings for RAG
// Generate embeddings for semantic search
const embedding = await together.embeddings.create({
model: 'togethercomputer/m2-bert-80M-8k-retrieval',
input: ['How to deploy Docker containers', 'Kubernetes vs Docker Swarm']
});
// Use embeddings for similarity search
const vectors = embedding.data.map(d => d.embedding);
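Once you have the vectors, similarity search usually comes down to comparing a query vector against each document vector. Here is a minimal sketch using cosine similarity, a common choice for text embeddings. The helper names and the toy 3-dimensional vectors are illustrative assumptions; real embeddings have hundreds of dimensions.

```javascript
// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every document vector against the query, best match first.
function rankBySimilarity(queryVector, docVectors) {
  return docVectors
    .map((vector, index) => ({ index, score: cosineSimilarity(queryVector, vector) }))
    .sort((x, y) => y.score - x.score);
}

// Toy vectors standing in for real embeddings (hypothetical data).
const docs = [[1, 0, 0], [0, 1, 0], [1, 1, 0]];
const ranked = rankBySimilarity([1, 0.1, 0], docs);
console.log(ranked.map(r => r.index)); // [ 0, 2, 1 ]
```

For a handful of documents this brute-force loop is enough. Larger collections typically move the vectors into a vector database or an approximate nearest-neighbor index.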
JSON Mode
// Force structured JSON output
const result = await together.chat.completions.create({
model: 'meta-llama/Llama-3-70b-chat-hf',
messages: [{
role: 'user',
content: 'Extract the product name, price, and rating from: "The Sony WH-1000XM5 headphones are $349 with 4.7 stars"'
}],
response_format: { type: 'json_object' }
});
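// result.choices[0].message.content is a JSON string, for example:
// {"product": "Sony WH-1000XM5", "price": 349, "rating": 4.7}
// Key names can vary unless you spell them out in the prompt.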
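The JSON comes back as a string in the message content, so it still has to be parsed, and a response cut off by a token limit will not parse. A defensive sketch follows; the helper name parseJsonContent and the result shape are my own assumptions, not part of any SDK.

```javascript
// Parse model output that is expected to be a JSON object.
// Returns { ok: true, value } on success or { ok: false, error } on failure.
function parseJsonContent(content) {
  try {
    const value = JSON.parse(content);
    // JSON mode should yield an object; reject null, arrays, and primitives.
    if (value === null || typeof value !== 'object' || Array.isArray(value)) {
      return { ok: false, error: 'Expected a JSON object' };
    }
    return { ok: true, value };
  } catch (err) {
    return { ok: false, error: err.message };
  }
}

// Sample strings standing in for result.choices[0].message.content.
const good = parseJsonContent('{"product": "Sony WH-1000XM5", "price": 349, "rating": 4.7}');
const bad = parseJsonContent('{"product": "Sony WH-1000XM5", "price":'); // truncated output

console.log(good.ok, good.value.price); // true 349
console.log(bad.ok); // false
```

Returning a result object instead of throwing makes it easy to retry the request or fall back to a default when the output is malformed.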
Cost Comparison (per 1M tokens)
| Model | Provider | Input cost |
|---|---|---|
| GPT-4o | OpenAI | $5.00 |
| Claude 3.5 Sonnet | Anthropic | $3.00 |
| Llama 3 70B | Together AI | $0.90 |
| Llama 3 8B | Together AI | $0.20 |
| Mixtral 8x7B | Together AI | $0.60 |
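To turn the table into a monthly estimate, multiply your input token volume (in millions) by the listed price. The sketch below does that with the prices copied from the table. The 50M-token workload is a hypothetical figure, and output tokens are not included because the table lists input prices only.

```javascript
// Input prices in USD per 1M tokens, copied from the table above.
const INPUT_PRICE_PER_MILLION = {
  'GPT-4o': 5.0,
  'Claude 3.5 Sonnet': 3.0,
  'Llama 3 70B': 0.9,
  'Llama 3 8B': 0.2,
  'Mixtral 8x7B': 0.6,
};

// Estimated input cost in USD for a given number of input tokens.
function estimateInputCost(model, inputTokens) {
  const price = INPUT_PRICE_PER_MILLION[model];
  if (price === undefined) {
    throw new Error(`No price listed for ${model}`);
  }
  return (inputTokens / 1_000_000) * price;
}

// Hypothetical workload: 50M input tokens per month.
const monthlyTokens = 50_000_000;
for (const model of Object.keys(INPUT_PRICE_PER_MILLION)) {
  console.log(`${model}: $${estimateInputCost(model, monthlyTokens).toFixed(2)}`);
}
```

At that volume the same input costs about $250 on GPT-4o and about $45 on Llama 3 70B, before counting output tokens.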
Need AI-powered scraping? Check out my web scraping actors on Apify — intelligent data extraction.
Need custom AI solutions? Email me at spinov001@gmail.com.