In this guide, I’ll walk through how I implemented an AI-driven message generator for rental requests using Cloudflare Workers AI, TanStack Start, and Llama 3.1.
My implementation leverages a modern edge-first stack:
- Cloudflare Workers AI: Provides serverless access to open-source models (specifically @cf/meta/llama-3.1-8b-instruct-fast).
- TanStack Start: Used for the full-stack application, utilizing createServerFn for seamless server-side logic.
- Hyperdrive (PostgreSQL): Fetches real-time context (user profiles, rental details) to ground the AI's generation (see the connection sketch below).
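Hyperdrive sits between the Worker and the database and exposes an ordinary pooled connection string, so any standard Postgres client works. Here is a rough illustration of the wiring; the HYPERDRIVE binding name and the helper are my own assumptions for this sketch, not the project's actual code:
import { env } from "cloudflare:workers";
import postgres from "postgres";
// Sketch only: the HYPERDRIVE binding name is an assumption, and the real
// project may construct its client differently. Hyperdrive hands the driver
// a pooled connection string for the underlying PostgreSQL database.
export const createSql = () =>
  postgres(env.HYPERDRIVE.connectionString, {
    max: 5, // keep connection counts small in a serverless environment
  });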
I created a lightweight wrapper to interact with Cloudflare's AI REST API. This handles authentication and request formatting.
import { env } from "cloudflare:workers";

// Minimal response shape used here; the full Cloudflare API envelope
// also includes success/errors/messages fields.
type LLamaResponse = {
  result: { response: string };
  success: boolean;
};

export const runLLama = async (
  input: {
    max_tokens?: number;
    messages: { role: string; content: string }[];
    // ... validation types
  },
  model: string = "llama-3.1-8b-instruct-fast",
) => {
  const url = `https://api.cloudflare.com/client/v4/accounts/${env.CLOUDFLARE_ACCOUNT_ID}/ai/run/@cf/meta/${model}`;
  const response = await fetch(url, {
    headers: {
      accept: "application/json",
      Authorization: `Bearer ${env.CLOUDFLARE_TOKEN}`, // Secure token handling
      "Content-Type": "application/json",
    },
    method: "POST",
    body: JSON.stringify(input),
  });
  const data = (await response.json()) as LLamaResponse;
  return data.result.response;
};
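With that in place, generating text is a single awaited call. A quick illustrative usage (the prompt content here is placeholder text, not the real rental prompt):
// Illustrative only: a minimal smoke-test call to the wrapper.
const reply = await runLLama({
  max_tokens: 128,
  messages: [
    { role: "system", content: "You are a concise assistant." },
    { role: "user", content: "Write a one-sentence greeting." },
  ],
});
console.log(reply);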
The server function orchestrates the entire process. Instead of asking the AI to simply "write a message," I provide it with rich context fetched from the project database.
This function performs three key steps:
- Validates Input: Ensures I have a valid requestId and userId.
- Fetches Context: Retrieves the specific rental request and user profile from the database.
- Prompt Engineering: Constructs a detailed prompt ensuring the model follows a strict format.
export const generateMessageServerFn = createServerFn({ method: "POST" })
  .inputValidator(...) // Validate inputs
  .handler(async ({ data, context }): Promise<GeneratedMessage> => {
    // 1. Fetch Request & Profile Data
    const requestData = await getRentalRequestWithProfile(
      context.sql,
      data.requestId,
      data.userId,
    );
    // Assumed shape: the helper returns the request row, the guest profile,
    // and the resolved location names in a single object.
    const { request, profile, city_name, country_name } = requestData;

    // 2. Build Human-Readable Context Strings
    const occupantsText = [
      `${request.adults} adult${request.adults > 1 ? "s" : ""}`,
      // ... helps model understand composition (adults, children, pets)
    ].join(", ");

    // 3. Construct the Prompt
    const userPrompt = `Generate a professional and friendly introductory message...
Guest Information:
- Name: ${profile.first_name} ${profile.last_name}
- Employment: ${profile.employment}
Rental Request Details:
- Destination: ${city_name}, ${country_name}
- Duration: ${request.term_length} months
- Occupants: ${occupantsText}
Format your response EXACTLY as follows:
[Your title here]
---
[Your body text here]
`;

    // 4. Call the AI
    const response = await runLLama({
      max_tokens: 512,
      messages: [
        {
          role: "system",
          content: "You are a professional rental message writer...",
        },
        { role: "user", content: userPrompt },
      ],
    });

    // 5. Parse the Response
    const [rawTitle, rawBody] = response.split("---\n");
    return {
      title: rawTitle.replace(/\*/g, "").trim(),
      body: rawBody.trim(),
    };
  });
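On the client, TanStack Start lets you invoke the server function like an ordinary async function. A rough sketch of how the generator might be triggered from a component (the surrounding handler, state setters, and variable names are hypothetical):
// Hypothetical client-side trigger; requestId/userId come from app state.
const handleGenerate = async () => {
  const message = await generateMessageServerFn({
    data: { requestId, userId },
  });
  setTitle(message.title);
  setBody(message.body);
};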
One of the trickiest parts of working with LLMs is ensuring the output is easy to parse programmatically without over-engineering it with complex JSON schemas.
I opted for a simple, robust approach by forcing a delimiter in the prompt (---). Here is how I process the raw text response:
// 5. Parse the Response
const [rawTitle, rawBody] = response.split("---\n");
return {
  // Remove any standard Markdown bold syntax (**) that models often add to titles
  title: rawTitle.replace(/\*/g, "").trim(),
  body: rawBody.trim(),
};
Why I did this:
- split("---\n"): In the prompt, I explicitly told Llama to separate the title and the body with ---. This allows me to reliably split the single string response into two distinct parts: the headline and the message content.
- replace(/\*/g, ""): LLMs have a strong tendency to "bold" titles using Markdown (e.g., **Subject: Hello**). Since I render the title in my own UI component, which already handles styling, I use this regex to strip out those asterisk characters, ensuring I get clean, raw text.
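If the model ever ignores the delimiter, the naive split leaves rawBody undefined. A slightly more defensive variant I would consider (a sketch, not the code shipped above) falls back to splitting on the first line break:
// Defensive variant (sketch): handle a response missing the "---" delimiter.
const [rawTitle, rawBody] = response.split("---\n");
if (rawBody === undefined) {
  // No delimiter: treat the first line as the title and the rest as the body.
  const [firstLine, ...rest] = response.split("\n");
  return {
    title: firstLine.replace(/\*/g, "").trim(),
    body: rest.join("\n").trim(),
  };
}
return {
  title: rawTitle.replace(/\*/g, "").trim(),
  body: rawBody.trim(),
};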
Why This Approach Works
- Low Latency: By using Cloudflare's edge network and the "fast" variant of Llama 3.1, I achieved minimal response times.
- Structured Output: Prompting the model to use specific delimiters (e.g., ---) allows me to easily parse the title and body separately, maintaining a clean UI/UX.
- Privacy & Security: Authentication tokens are kept server-side (accessed via env), and user data is processed securely within the request lifecycle.