Ivan Cernja for Encore

Posted on • Originally published at encore.dev

Running AI models with Replicate and Encore

Running AI models in production typically requires managing complex infrastructure, GPUs, and scaling challenges. Replicate simplifies this by providing a cloud API to run thousands of AI models without managing any infrastructure.

In this tutorial, we'll build a backend that uses Replicate to generate images with state-of-the-art AI models like FLUX and Stable Diffusion. You'll learn how to run model predictions, handle async results, and store generated images with full type safety.

What is Replicate?

Replicate is a platform that lets you run machine learning models via a simple API. It provides:

  • Thousands of AI Models including image generation, LLMs, speech, and video
  • No Infrastructure Management - automatic scaling and GPU provisioning
  • Pay Per Use - only pay for what you run, no idle costs
  • Simple API - run any model with a few lines of code
  • Fast Cold Starts - models load in seconds
  • Custom Models - deploy your own models or fine-tune existing ones

Popular models on Replicate include FLUX (image generation), Llama (LLM), Whisper (speech-to-text), and hundreds more.

What we're building

We'll create an AI-powered backend with:

  • Image generation using FLUX and Stable Diffusion models
  • Async prediction handling with webhooks for long-running models
  • Image storage using Encore's object storage
  • Type-safe endpoints for creating and retrieving predictions
  • Status polling to track prediction progress

The backend will handle all the AI model orchestration while Replicate manages the infrastructure.

Getting started


Prefer to skip the setup? Use encore app create --example=ts/replicate-image-generator to start with a complete working example. This tutorial walks through building it from scratch to understand each component.

First, install Encore if you haven't already:

# macOS
brew install encoredev/tap/encore

# Linux
curl -L https://encore.dev/install.sh | bash

# Windows
iwr https://encore.dev/install.ps1 | iex

Create a new Encore application. This will prompt you to create a free Encore account if you don't have one (required for secret management):

encore app create replicate-app --example=ts/hello-world
cd replicate-app

Setting up Replicate

Creating your Replicate account

  1. Go to replicate.com and create a free account
  2. Navigate to your API tokens page
  3. Create a new API token and copy it

Replicate offers a free tier with credits to get started, then pay-per-use pricing based on model runtime.

Installing the Replicate SDK

Install the Replicate Node.js client:

npm install replicate

Backend implementation

Creating the AI service

Every Encore service starts with a service definition file (encore.service.ts). Services let you divide your application into logical components. At deploy time, you can decide whether to colocate them in a single process or deploy them as separate microservices, without changing a single line of code:

// ai/encore.service.ts
import { Service } from "encore.dev/service";

export default new Service("ai");

Configuring Replicate

To use Replicate's API, you need to authenticate with an API token. Encore provides built-in secrets management to securely store sensitive values like API keys:

// ai/replicate.ts
import Replicate from "replicate";
import { secret } from "encore.dev/config";

const replicateToken = secret("ReplicateToken");

export const replicate = new Replicate({
  auth: replicateToken(),
});

The secret() function creates a reference to a secret value that's stored securely outside your code. Set your Replicate API token for your local development environment:

# Development
encore secret set --dev ReplicateToken

# Production
encore secret set --prod ReplicateToken

Image generation endpoint

Create an endpoint to generate images using AI models. Replicate offers hundreds of text-to-image models including FLUX (fast, high-quality), Stable Diffusion, and specialized models for different artistic styles. We'll use FLUX Schnell, one of the fastest and most popular models:

// ai/generate.ts
import { api } from "encore.dev/api";
import { replicate } from "./replicate";

interface GenerateImageRequest {
  prompt: string;
  model?: "flux" | "stable-diffusion";
  aspectRatio?: "1:1" | "16:9" | "9:16";
}

interface GenerateImageResponse {
  id: string;
  status: string;
  output?: string[];
}

export const generateImage = api(
  { expose: true, method: "POST", path: "/ai/generate" },
  async (req: GenerateImageRequest): Promise<GenerateImageResponse> => {
    // Choose model based on request
    const modelVersion = req.model === "stable-diffusion" 
      ? "stability-ai/sdxl:39ed52f2a78e934b3ba6e2a89f5b1c712de7dfea535525255b1aa35c5565e08b"
      : "black-forest-labs/flux-schnell";

    // Use a loose record type since each model accepts different parameters
    const input: Record<string, unknown> = {
      prompt: req.prompt,
    };

    // FLUX-specific parameters
    if (req.model !== "stable-diffusion") {
      input.aspect_ratio = req.aspectRatio || "1:1";
      input.num_outputs = 1;
    }

    const prediction = await replicate.predictions.create({
      version: modelVersion,
      input,
    });

    return {
      id: prediction.id,
      status: prediction.status,
      output: prediction.output as string[] | undefined,
    };
  }
);

Checking prediction status

AI models can take anywhere from a few seconds to a few minutes to generate images, depending on the model and complexity. Instead of making your users wait, you can return immediately and let them poll for results. Create an endpoint to check if a prediction is complete:

// ai/generate.ts (continued)
interface PredictionStatusRequest {
  id: string;
}

interface PredictionStatusResponse {
  id: string;
  status: string;
  output?: string[];
  error?: string;
}

export const getPredictionStatus = api(
  { expose: true, method: "GET", path: "/ai/predictions/:id" },
  async ({ id }: PredictionStatusRequest): Promise<PredictionStatusResponse> => {
    const prediction = await replicate.predictions.get(id);

    return {
      id: prediction.id,
      status: prediction.status,
      output: prediction.output as string[] | undefined,
      error: prediction.error ? String(prediction.error) : undefined,
    };
  }
);

Storing generated images

Generated images from Replicate are temporary URLs that expire. To keep images permanently accessible, you need to store them. With Encore, you can create object storage by simply defining a bucket in your code. The framework automatically provisions the infrastructure locally using a storage emulator:

// ai/storage.ts
import { Bucket } from "encore.dev/storage/objects";

export const images = new Bucket("generated-images", {
  public: true, // Make images publicly accessible via URL
});

This creates a storage bucket that's accessible from anywhere in your application. Now add an endpoint to download images from Replicate and store them in your bucket:

// ai/generate.ts (continued)
import { images } from "./storage";

interface SaveImageRequest {
  predictionId: string;
  imageUrl: string;
}

interface SaveImageResponse {
  url: string;
  key: string;
}

export const saveImage = api(
  { expose: true, method: "POST", path: "/ai/save-image" },
  async (req: SaveImageRequest): Promise<SaveImageResponse> => {
    // Download image from Replicate
    const response = await fetch(req.imageUrl);
    if (!response.ok) {
      throw new Error(`Failed to download image: ${response.status}`);
    }
    const imageBuffer = Buffer.from(await response.arrayBuffer());

    // Generate unique key
    const key = `${req.predictionId}-${Date.now()}.png`;

    // Upload to Encore's object storage
    await images.upload(key, imageBuffer, {
      contentType: "image/png",
    });

    // Get public URL
    const publicUrl = await images.publicUrl(key);

    return {
      url: publicUrl,
      key,
    };
  }
);

Testing the backend

Start your Encore backend (make sure Docker is running first):

encore run

Your API is now running locally. Open the local development dashboard at http://localhost:9400 to explore your API with interactive documentation and test endpoints directly in the browser.

Encore's local development dashboard showing API Explorer

Generate an image

Use curl or the API Explorer to generate an image:

# Generate with FLUX (fast, high quality)
curl -X POST http://localhost:4000/ai/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A serene Japanese garden with cherry blossoms and a koi pond",
    "model": "flux",
    "aspectRatio": "16:9"
  }'

Response:

{
  "id": "abc123xyz",
  "status": "starting"
}

Check prediction status

curl http://localhost:4000/ai/predictions/abc123xyz

When complete:

{
  "id": "abc123xyz",
  "status": "succeeded",
  "output": ["https://replicate.delivery/pbxt/abc123/output.png"]
}

Save the generated image

curl -X POST http://localhost:4000/ai/save-image \
  -H "Content-Type: application/json" \
  -d '{
    "predictionId": "abc123xyz",
    "imageUrl": "https://replicate.delivery/pbxt/abc123/output.png"
  }'

Response with your stored image:

{
  "url": "https://storage.encore.dev/generated-images/abc123xyz-1234567890.png",
  "key": "abc123xyz-1234567890.png"
}

Here's an example of a generated image:

Example AI-generated image from FLUX model

Using the API Explorer

The local development dashboard shows:

  • All API endpoints with auto-generated documentation
  • Request/Response schemas extracted from TypeScript types
  • Interactive testing - run predictions directly in the browser
  • Distributed tracing - see Replicate API calls, timing, and object storage operations

Here's what a trace looks like for an image generation request:

Distributed tracing showing Replicate API call and response

Example prompts to try

FLUX and Stable Diffusion work best with detailed, descriptive prompts:

# Photorealistic scene
"A cozy coffee shop interior with warm lighting, wooden tables, 
vintage decor, and morning sunlight streaming through large windows, 
photorealistic, 8k"

# Artistic style
"A cyberpunk city street at night with neon signs, rain-slicked 
pavement, and flying cars, digital art, vibrant colors"

# Character design
"A friendly robot character with big eyes, metallic blue body, 
and glowing chest panel, cute, Pixar style, 3D render"

# Nature scene
"A misty mountain landscape at sunrise with pine trees, a calm lake 
reflection, and golden light, landscape photography"

Frontend integration

Generate a type-safe API client for your frontend:

encore gen client --output=./frontend/src/lib/client.ts

Use the client in your React/Next.js app:

import Client, { Local } from "./lib/client";

const client = new Client(Local);

// Generate image
const prediction = await client.ai.generateImage({
  prompt: "A beautiful sunset over mountains",
  model: "flux",
  aspectRatio: "16:9",
});

// Poll for completion
let status = await client.ai.getPredictionStatus({ id: prediction.id });
while (status.status !== "succeeded" && status.status !== "failed") {
  await new Promise(resolve => setTimeout(resolve, 1000));
  status = await client.ai.getPredictionStatus({ id: prediction.id });
}

// Save the image
if (status.output?.[0]) {
  const saved = await client.ai.saveImage({
    predictionId: prediction.id,
    imageUrl: status.output[0],
  });
  console.log("Image saved:", saved.url);
}

For CORS configuration when your frontend runs on a different origin, update your encore.app file:

{
  "id": "replicate-app",
  "global_cors": {
    "allow_origins_without_credentials": ["http://localhost:5173"]
  }
}

Serving a simple frontend

For quick demos and prototypes, you can serve a static HTML frontend directly from your Encore app using api.static():

// frontend/frontend.ts
import { api } from "encore.dev/api";

export const frontend = api.static(
  { 
    expose: true, 
    path: "/!path", 
    dir: "./assets"
  },
);

Create an index.html file in frontend/assets/:

<!DOCTYPE html>
<html>
<head>
    <title>AI Image Generator</title>
    <style>
        body { font-family: system-ui; max-width: 800px; margin: 40px auto; }
        input { width: 100%; padding: 12px; margin: 8px 0; }
        button { padding: 12px 24px; background: #000; color: #fff; border: none; }
        img { max-width: 100%; margin: 16px 0; }
    </style>
</head>
<body>
    <h1>AI Image Generator</h1>
    <input id="prompt" placeholder="Describe your image..." />
    <button onclick="generate()">Generate</button>
    <div id="output"></div>

    <script>
        async function generate() {
            const prompt = document.getElementById('prompt').value;
            const output = document.getElementById('output');
            output.innerHTML = 'Generating...';

            // Start generation
            const res = await fetch('/ai/generate', {
                method: 'POST',
                headers: { 'Content-Type': 'application/json' },
                body: JSON.stringify({ prompt, model: 'flux', aspectRatio: '1:1' })
            });
            const prediction = await res.json();

            // Poll for completion
            let status = prediction;
            while (status.status !== 'succeeded' && status.status !== 'failed') {
                await new Promise(r => setTimeout(r, 1000));
                const statusRes = await fetch(`/ai/predictions/${prediction.id}`);
                status = await statusRes.json();
            }

            if (status.status === 'succeeded') {
                output.innerHTML = `<img src="${status.output[0]}" />`;
            } else {
                output.innerHTML = 'Generation failed';
            }
        }
    </script>
</body>
</html>

The path: "/!path" pattern serves as a fallback route, meaning it will match any path that doesn't match your API endpoints. This works great for single-page applications.

Static files are served directly from Encore's Rust runtime with zero JavaScript execution, making them extremely fast. When you deploy with git push encore, your frontend deploys alongside your backend, giving you a single URL you can immediately share to demo your prototype.

For production applications with more complex frontend needs (React, Next.js, build pipelines), we recommend deploying your frontend to Vercel, Netlify, or similar services and using the generated API client to call your Encore backend.

Deployment

Deploy your AI-powered backend:

git add .
git commit -m "Add Replicate image generation"
git push encore

Set your production Replicate token:

encore secret set --prod ReplicateToken

Note: Encore Cloud is great for prototyping with fair use limits. For production workloads with higher usage, you can connect your AWS or GCP account and Encore will provision infrastructure directly in your cloud account.

Advanced features

Using webhooks for long-running models

For models that take longer to run, use webhooks instead of polling:

export const generateImageWebhook = api(
  { expose: true, method: "POST", path: "/ai/generate-async" },
  async (req: GenerateImageRequest): Promise<GenerateImageResponse> => {
    const prediction = await replicate.predictions.create({
      version: "black-forest-labs/flux-schnell",
      input: {
        prompt: req.prompt,
        aspect_ratio: req.aspectRatio || "1:1",
      },
      webhook: "https://your-app.com/ai/webhook",
      webhook_events_filter: ["completed"],
    });

    return {
      id: prediction.id,
      status: prediction.status,
    };
  }
);

// Handle webhook
export const handleWebhook = api.raw(
  { expose: true, method: "POST", path: "/ai/webhook" },
  async (req, res) => {
    const chunks: Buffer[] = [];
    for await (const chunk of req) {
      chunks.push(chunk);
    }
    const body = JSON.parse(Buffer.concat(chunks).toString());

    // Process completed prediction
    if (body.status === "succeeded") {
      // Save image, notify user, etc.
    }

    res.writeHead(200);
    res.end(JSON.stringify({ received: true }));
  }
);
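In production, the webhook endpoint above should also verify that requests actually come from Replicate before trusting the payload. The Replicate Node.js client provides a validateWebhook helper for this; the underlying idea is an HMAC signature computed with a signing secret (Replicate's actual scheme also covers the webhook id and timestamp, not just the body). As a rough, simplified sketch of that idea — verifyHmac and its parameters are illustrative names, not Replicate's API:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Illustrative only: check an HMAC-SHA256 signature over the raw request body.
// The length check matters because timingSafeEqual throws on buffers of
// different lengths, and the constant-time comparison prevents timing attacks.
function verifyHmac(secret: string, rawBody: string, signatureHex: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const given = Buffer.from(signatureHex, "hex");
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```

Prefer the SDK's built-in validation in real code; this sketch just shows why the raw body must be read before parsing — the signature is computed over the exact bytes received.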

Running other models

Replicate has thousands of models. Here are some popular ones:

Large Language Models:

const output = await replicate.run(
  "meta/llama-2-70b-chat",
  {
    input: {
      prompt: "Explain quantum computing in simple terms",
    },
  }
);

Speech to Text (Whisper):

const output = await replicate.run(
  "openai/whisper",
  {
    input: {
      audio: "https://example.com/audio.mp3",
    },
  }
);

Video Generation:

const output = await replicate.run(
  "stability-ai/stable-video-diffusion",
  {
    input: {
      image: "https://example.com/image.png",
    },
  }
);

Image-to-image transformations

Use Stable Diffusion for image editing:

interface TransformImageRequest {
  imageUrl: string;
  prompt: string;
  strength?: number; // 0-1, how much to transform
}

export const transformImage = api(
  { expose: true, method: "POST", path: "/ai/transform" },
  async (req: TransformImageRequest): Promise<GenerateImageResponse> => {
    const prediction = await replicate.predictions.create({
      version: "stability-ai/sdxl",
      input: {
        image: req.imageUrl,
        prompt: req.prompt,
        strength: req.strength || 0.8,
      },
    });

    return {
      id: prediction.id,
      status: prediction.status,
    };
  }
);

Cost optimization

Replicate charges based on compute time. Here are tips to optimize costs:

  1. Use faster models - FLUX Schnell is faster than FLUX Pro
  2. Batch requests - generate multiple variations at once
  3. Cache results - store generated images to avoid re-generating
  4. Set timeouts - prevent runaway costs on failed predictions
  5. Monitor usage - track prediction counts and costs in Replicate dashboard
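Tip 4 can be sketched as a small polling helper with a deadline, so a stuck or failed prediction can't keep a client polling forever. This is a minimal illustration, not part of the tutorial's code — pollUntilDone and its options are hypothetical names:

```typescript
// Hypothetical helper: poll a prediction's status until it reaches a terminal
// state ("succeeded", "failed", or "canceled"), giving up after timeoutMs.
async function pollUntilDone<T extends { status: string }>(
  getStatus: () => Promise<T>,
  { intervalMs = 1000, timeoutMs = 60_000 } = {},
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const s = await getStatus();
    if (["succeeded", "failed", "canceled"].includes(s.status)) return s;
    if (Date.now() >= deadline) throw new Error("prediction timed out");
    await new Promise((r) => setTimeout(r, intervalMs));
  }
}
```

With the generated client from earlier, you could use it as pollUntilDone(() => client.ai.getPredictionStatus({ id })), and cancel or retry the prediction when the timeout fires.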

Next steps

If you found this tutorial helpful, consider starring Encore on GitHub to help others discover it.
