Replicate Has a Free API — Here's How to Run AI Models Without GPUs

#replicate #ai #machinelearning #api

A developer wanted to add image generation to his app. AWS SageMaker: $400/month for a GPU instance. Replicate: pay per prediction, starting at fractions of a cent. At his 100 images/day, the bill came to about $3/month, or roughly $0.001 per image.
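
As a rough check on those numbers, the arithmetic can be sketched in a few lines. The $0.001 per-image price is an assumption back-calculated from the $3/month figure, not a published Replicate price, and the 30-day month is a simplification. Both helper functions are hypothetical.

```javascript
// Hypothetical helper: estimate a monthly bill under pay-per-prediction pricing.
function estimateMonthlyCost(runsPerDay, costPerRunUsd, daysPerMonth = 30) {
  return runsPerDay * costPerRunUsd * daysPerMonth;
}

// Hypothetical helper: daily volume at which per-run pricing matches a flat monthly fee.
function breakEvenRunsPerDay(flatMonthlyUsd, costPerRunUsd, daysPerMonth = 30) {
  return Math.floor(flatMonthlyUsd / (costPerRunUsd * daysPerMonth));
}

// Assumed price: $0.001 per image (implied by 100 images/day costing $3/month).
console.log(estimateMonthlyCost(100, 0.001).toFixed(2)); // "3.00"

// Images per day before a flat $400/month instance would be cheaper.
console.log(breakEvenRunsPerDay(400, 0.001)); // 13333
```

Under these assumptions, per-prediction pricing stays cheaper until volume reaches several thousand images a day.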

What Replicate Offers

Replicate's pricing and features:

Free tier: some models are free to run
Pay per prediction: most models cost $0.0001-$0.10 per run
No GPUs to manage — models run on Replicate's infrastructure
Thousands of open-source models — Stable Diffusion, Llama, Whisper, etc.
Custom models — deploy your own with Cog
Streaming — real-time output for LLMs
Webhooks — async prediction notifications

Quick Start

npm install replicate

import Replicate from 'replicate';

const replicate = new Replicate({ auth: process.env.REPLICATE_API_TOKEN });

// Generate an image with SDXL
const output = await replicate.run(
  'stability-ai/sdxl:39ed52f2a78e934b3ba6e2a89f5b1c712de7dfea535525255b1aa35c5565e08b',
  { input: { prompt: 'A serene mountain lake at sunset, photorealistic' } }
);
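// Older client versions return URL strings: ['https://replicate.delivery/...png'].
// Newer ones (1.x) may return FileOutput objects instead; call output[0].url() for the URL.
console.log(output);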
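
The long string passed to replicate.run above has the form owner/name:version, while the REST examples in the next section pass only the version hash. Here is a small sketch of how such an identifier could be split apart. parseModelRef is a hypothetical helper, not part of the replicate package.

```javascript
// Hypothetical helper: split 'owner/name' or 'owner/name:version' into its parts.
function parseModelRef(ref) {
  const match = /^([^/:\s]+)\/([^/:\s]+)(?::([0-9a-f]+))?$/i.exec(ref);
  if (!match) throw new Error(`Invalid model reference: ${ref}`);
  const [, owner, name, version] = match;
  // version is null when the reference is not pinned to a specific version
  return { owner, name, version: version ?? null };
}

const ref = parseModelRef(
  'stability-ai/sdxl:39ed52f2a78e934b3ba6e2a89f5b1c712de7dfea535525255b1aa35c5565e08b'
);
console.log(ref.owner, ref.name); // stability-ai sdxl
```

Pinning a version keeps results reproducible, since a model's owner can publish new versions at any time.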

REST API

# Create a prediction
curl -X POST 'https://api.replicate.com/v1/predictions' \
  -H 'Authorization: Bearer YOUR_API_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{
    "version": "39ed52f2a78e934b3ba6e2a89f5b1c712de7dfea535525255b1aa35c5565e08b",
    "input": { "prompt": "A cat astronaut on Mars" }
  }'

# Get prediction result
curl 'https://api.replicate.com/v1/predictions/PREDICTION_ID' \
  -H 'Authorization: Bearer YOUR_API_TOKEN'

# List your predictions
curl 'https://api.replicate.com/v1/predictions' \
  -H 'Authorization: Bearer YOUR_API_TOKEN'
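
Creating a prediction returns before the model has finished, so a client usually polls the "get prediction" endpoint until the status settles. Below is a minimal sketch of that loop. waitForPrediction is a hypothetical helper, and the terminal statuses, polling interval and attempt limit are assumptions to check against the API reference.

```javascript
// Statuses after which a prediction is assumed not to change again.
const TERMINAL_STATUSES = new Set(['succeeded', 'failed', 'canceled']);

// Hypothetical helper: poll GET /v1/predictions/{id} until the prediction finishes.
// fetchImpl is injectable so the function can be exercised without network access.
async function waitForPrediction(
  id,
  token,
  { fetchImpl = fetch, intervalMs = 1000, maxAttempts = 60 } = {}
) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetchImpl(`https://api.replicate.com/v1/predictions/${id}`, {
      headers: { Authorization: `Bearer ${token}` },
    });
    if (!res.ok) throw new Error(`Replicate API returned HTTP ${res.status}`);

    const prediction = await res.json();
    if (TERMINAL_STATUSES.has(prediction.status)) return prediction;

    // Not finished yet: wait before asking again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Prediction ${id} did not finish after ${maxAttempts} attempts`);
}
```

Usage would look like `const done = await waitForPrediction('PREDICTION_ID', process.env.REPLICATE_API_TOKEN)`, followed by a check of `done.status` before reading `done.output`.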

Common Models

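// Note: an unpinned 'owner/name' identifier may only work for models that Replicate
// maintains as official; for other models, append ':<version-id>' as in the Quick Start.

// Image generation (SDXL)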
const image = await replicate.run('stability-ai/sdxl', {
  input: { prompt: 'A futuristic city', width: 1024, height: 1024 }
});

// Speech to text (Whisper)
const transcript = await replicate.run('openai/whisper', {
  input: { audio: 'https://example.com/audio.mp3', model: 'large-v3' }
});

// Text generation (Llama)
const text = await replicate.run('meta/meta-llama-3-70b-instruct', {
  input: {
    prompt: 'Explain quantum computing in simple terms',
    max_tokens: 500
  }
});

// Image upscaling
const upscaled = await replicate.run('nightmareai/real-esrgan', {
  input: { image: 'https://example.com/low-res.jpg', scale: 4 }
});

// Remove background
const result = await replicate.run('cjwbw/rembg', {
  input: { image: 'https://example.com/photo.jpg' }
});
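
The shape of the returned value differs between models: image models tend to return a list of files, while language models typically return text in chunks. The sketch below assumes the Llama call above returns either a string or an array of string chunks; check the model's output schema on its Replicate page. outputToText is a hypothetical helper.

```javascript
// Hypothetical helper: turn an LLM result into a single string.
// Assumes the output is either a string or an array of string chunks.
function outputToText(output) {
  if (typeof output === 'string') return output;
  if (Array.isArray(output) && output.every((chunk) => typeof chunk === 'string')) {
    return output.join('');
  }
  throw new TypeError('Expected a string or an array of strings');
}

console.log(outputToText(['Quantum ', 'computers ', 'use qubits.']));
// Quantum computers use qubits.
```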

Streaming (LLMs)

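// Stream tokens as they are generated
for await (const event of replicate.stream('meta/meta-llama-3-70b-instruct', {
  input: { prompt: 'Write a poem about coding' }
})) {
  // Print only token events; the stream also emits non-token events such as 'done'
  if (event.event === 'output') process.stdout.write(event.data);
}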
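
When the full text is needed after streaming (for example, to store it), the events can be accumulated as they arrive. The sketch below assumes each event has the shape { event, data } with event types 'output', 'error' and 'done'. collectStream is a hypothetical helper that accepts any async iterable, so it also works with the value returned by replicate.stream.

```javascript
// Hypothetical helper: gather streamed token events into one string.
async function collectStream(events) {
  let text = '';
  for await (const event of events) {
    if (event.event === 'error') throw new Error(`Stream failed: ${event.data}`);
    if (event.event === 'output') text += event.data; // token chunk
    if (event.event === 'done') break;                // model finished
  }
  return text;
}

// Stand-in stream for demonstration; real events would come from replicate.stream(...).
async function* fakeEvents() {
  yield { event: 'output', data: 'Hello' };
  yield { event: 'output', data: ', world' };
  yield { event: 'done', data: '{}' };
}

collectStream(fakeEvents()).then((text) => console.log(text)); // Hello, world
```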

Webhooks (Async)

// Start prediction with webhook callback
await replicate.predictions.create({
  version: 'stability-ai/sdxl:...',
  input: { prompt: 'A beautiful sunset' },
  webhook: 'https://yourapp.com/api/replicate-webhook',
  webhook_events_filter: ['completed']
});

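// Handle webhook (Express; saveImage is your own helper)
import express from 'express';

const app = express();
app.use(express.json()); // parse the JSON body Replicate sends

app.post('/api/replicate-webhook', (req, res) => {
  const { output, status } = req.body;
  if (status === 'succeeded') {
    saveImage(output[0]); // Save the generated image
  }
  res.sendStatus(200);
});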
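
Webhook deliveries are commonly retried, so a handler may see the same prediction more than once. Below is a minimal sketch of idempotent handling, assuming the payload carries the prediction's id, status and output fields. processWebhook and the in-memory Set are hypothetical; a real deployment would likely track handled IDs in a database or cache.

```javascript
// Prediction IDs that have already been handled (assumption: single process).
const handledPredictions = new Set();

// Hypothetical helper: decide what to do with one webhook payload.
// Returns 'saved', 'duplicate', 'failed' or 'ignored'.
function processWebhook(body, saveImage) {
  const { id, status, output } = body;
  if (handledPredictions.has(id)) return 'duplicate';

  if (status === 'succeeded' && Array.isArray(output) && output.length > 0) {
    handledPredictions.add(id);
    saveImage(output[0]); // save only the first generated image
    return 'saved';
  }
  if (status === 'failed' || status === 'canceled') {
    handledPredictions.add(id);
    return 'failed';
  }
  return 'ignored'; // still running, or nothing to save
}
```

Inside the route handler, this would be called as `processWebhook(req.body, saveImage)` before responding with 200.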

Deploy Custom Models

# cog.yaml + predict.py — deploy any model
# predict.py
from cog import BasePredictor, Input

class Predictor(BasePredictor):
    def setup(self):
        # load_my_model() is a placeholder: load your own weights here, once per container
        self.model = load_my_model()

    def predict(self, text: str = Input(description="Input text")) -> str:
        return self.model.generate(text)

cog push r8.im/your-username/your-model

Need AI-powered web scraping? Check out my web scraping actors on Apify — smart data extraction.

Need custom AI integration? Email me at spinov001@gmail.com.
