In this guide, I’ll walk through how I implemented an AI-driven message generator for rental requests using Cloudflare Workers AI, TanStack Start, and Llama 3.1.
My implementation leverages a modern edge-first stack:
- Cloudflare Workers AI: Provides serverless access to open-source models (specifically @cf/meta/llama-3.1-8b-instruct-fast).
- TanStack Start: Used for the full-stack application, utilizing createServerFn for seamless server-side logic.
- Hyperdrive (PostgreSQL): Fetches real-time context (user profiles, rental details) to ground the AI's generation (see the connection sketch below).
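Hyperdrive sits between the Worker and the database and exposes an ordinary pooled connection string, so any standard Postgres client works. Here is a rough illustration of the wiring; the HYPERDRIVE binding name and the helper are my own assumptions for this sketch, not the project's actual code:
import { env } from "cloudflare:workers";
import postgres from "postgres";
// Sketch only: the HYPERDRIVE binding name is an assumption, and the real
// project may construct its client differently. Hyperdrive hands the driver
// a pooled connection string for the underlying PostgreSQL database.
export const createSql = () =>
  postgres(env.HYPERDRIVE.connectionString, {
    max: 5, // keep connection counts small in a serverless environment
  });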
I created a lightweight wrapper to interact with Cloudflare's AI REST API. This handles authentication and request formatting.
import { env } from "cloudflare:workers";

// Minimal response shape used here; the full Cloudflare API envelope
// also includes success/errors/messages fields.
type LLamaResponse = {
  result: { response: string };
  success: boolean;
};

export const runLLama = async (
  input: {
    max_tokens?: number;
    messages: { role: string; content: string }[];
    // ... validation types
  },
  model: string = "llama-3.1-8b-instruct-fast",
) => {
  const url = `https://api.cloudflare.com/client/v4/accounts/${env.CLOUDFLARE_ACCOUNT_ID}/ai/run/@cf/meta/${model}`;
  const response = await fetch(url, {
    headers: {
      accept: "application/json",
      Authorization: `Bearer ${env.CLOUDFLARE_TOKEN}`, // Secure token handling
      "Content-Type": "application/json",
    },
    method: "POST",
    body: JSON.stringify(input),
  });
  const data = (await response.json()) as LLamaResponse;
  return data.result.response;
};
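With that in place, generating text is a single awaited call. A quick illustrative usage (the prompt content here is placeholder text, not the real rental prompt):
// Illustrative only: a minimal smoke-test call to the wrapper.
const reply = await runLLama({
  max_tokens: 128,
  messages: [
    { role: "system", content: "You are a concise assistant." },
    { role: "user", content: "Write a one-sentence greeting." },
  ],
});
console.log(reply);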
The server function orchestrates the entire process. Instead of asking the AI to simply "write a message," I provide it with rich context fetched from the project database.
This function performs three key steps:
- Validates Input: Ensures I have a valid requestId and userId.
- Fetches Context: Retrieves the specific rental request and user profile from the database.
- Prompt Engineering: Constructs a detailed prompt ensuring the model follows a strict format.
export const generateMessageServerFn = createServerFn({ method: "POST" })
  .inputValidator(...) // Validate inputs
  .handler(async ({ data, context }): Promise<GeneratedMessage> => {
    // 1. Fetch Request & Profile Data
    const requestData = await getRentalRequestWithProfile(
      context.sql,
      data.requestId,
      data.userId,
    );
    // Assumed shape: the helper returns the request row, the guest profile,
    // and the resolved location names in a single object.
    const { request, profile, city_name, country_name } = requestData;

    // 2. Build Human-Readable Context Strings
    const occupantsText = [
      `${request.adults} adult${request.adults > 1 ? "s" : ""}`,
      // ... helps model understand composition (adults, children, pets)
    ].join(", ");

    // 3. Construct the Prompt
    const userPrompt = `Generate a professional and friendly introductory message...
Guest Information:
- Name: ${profile.first_name} ${profile.last_name}
- Employment: ${profile.employment}
Rental Request Details:
- Destination: ${city_name}, ${country_name}
- Duration: ${request.term_length} months
- Occupants: ${occupantsText}
Format your response EXACTLY as follows:
[Your title here]
---
[Your body text here]
`;

    // 4. Call the AI
    const response = await runLLama({
      max_tokens: 512,
      messages: [
        {
          role: "system",
          content: "You are a professional rental message writer...",
        },
        { role: "user", content: userPrompt },
      ],
    });

    // 5. Parse the Response
    const [rawTitle, rawBody] = response.split("---\n");
    return {
      title: rawTitle.replace(/\*/g, "").trim(),
      body: rawBody.trim(),
    };
  });
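On the client, TanStack Start lets you invoke the server function like an ordinary async function. A rough sketch of how the generator might be triggered from a component (the surrounding handler, state setters, and variable names are hypothetical):
// Hypothetical client-side trigger; requestId/userId come from app state.
const handleGenerate = async () => {
  const message = await generateMessageServerFn({
    data: { requestId, userId },
  });
  setTitle(message.title);
  setBody(message.body);
};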
One of the trickiest parts of working with LLMs is ensuring the output is easy to parse programmatically without over-engineering it with complex JSON schemas.
I opted for a simple, robust approach by forcing a delimiter in the prompt (---). Here is how I process the raw text response:
// 5. Parse the Response
const [rawTitle, rawBody] = response.split("---\n");
return {
  // Remove any standard Markdown bold syntax (**) that models often add to titles
  title: rawTitle.replace(/\*/g, "").trim(),
  body: rawBody.trim(),
};
Why I did this:
- split("---\n"): In the prompt, I explicitly told Llama to separate the title and the body with ---. This allows me to reliably split the single string response into two distinct parts: the headline and the message content.
- replace(/\*/g, ""): LLMs have a strong tendency to "bold" titles using Markdown (e.g., **Subject: Hello**). Since I render the title in my own UI component, which already handles styling, I use this regex to strip out those asterisk characters, ensuring I get clean, raw text.
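If the model ever ignores the delimiter, the naive split leaves rawBody undefined. A slightly more defensive variant I would consider (a sketch, not the code shipped above) falls back to splitting on the first line break:
// Defensive variant (sketch): handle a response missing the "---" delimiter.
const [rawTitle, rawBody] = response.split("---\n");
if (rawBody === undefined) {
  // No delimiter: treat the first line as the title and the rest as the body.
  const [firstLine, ...rest] = response.split("\n");
  return {
    title: firstLine.replace(/\*/g, "").trim(),
    body: rest.join("\n").trim(),
  };
}
return {
  title: rawTitle.replace(/\*/g, "").trim(),
  body: rawBody.trim(),
};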
Why This Approach Works
- Low Latency: By using Cloudflare's edge network and the "fast" variant of Llama 3.1, I achieved minimal response times.
- Structured Output: Prompting the model to use specific delimiters (e.g., ---) allows me to easily parse the title and body separately, maintaining a clean UI/UX.
- Privacy & Security: Authentication tokens are kept server-side (accessed via env), and user data is processed securely within the request lifecycle.