Recently, I started working on a project that requires several data normalisation features, one of which is fetching external data and intelligently linking it to existing internal data. Data normalisation is a common requirement, especially now that we can augment searches and matches as we see fit by vectorising our data and querying the vectors with embedding models.
The project is built with Nuxt and hosted on Cloudflare for Edge availability. I initially set it up through NuxtHub Admin for CI/CD and used two of the core module's server composables: hubVectorize() and hubAi(). NuxtHub is sunsetting NuxtHub Admin at the end of this month, and the core module no longer provides Cloudflare's automatic binding setup. Here, I share my experience building a demo application with similar AI requirements using Nuxt on Cloudflare, without the NuxtHub wrapper.
This article is the first of the Nuxt & Cloudflare AI Vector Pipeline Series, a three-part series.
- Part one - Nuxt & Cloudflare Vectorize: Setting up D1, Drizzle, and Workers AI
- Part two - Nuxt & Cloudflare Queues: Building a Data Sync Pipeline using Vectorize
- Part three - Implementing semantic matching in Nuxt with Cloudflare Vectorize
Let’s get going 😁
Building an AI-powered real estate listing Nuxt application
Together, we will build the backend of a fictitious real estate property listings aggregator website. If it were real, it would feature property listings from multiple real estate agencies. The app needs to regularly retrieve a feed of listings from these agencies and add them to the internal properties database. We need to link each agent property to a standardised location.
The challenge with such scenarios is that the data feeds may not be categorised the way we need them: one agent may identify the location with a postcode, another may omit the postcode and use an area name, and a third may only mention the location inside the property listing title. On top of that, human error creeps in everywhere: typos, incomplete words, and the like. We will use AI to match agents' property locations against our locations database and categorise the properties correctly on our side.
We will do this on a single Nuxt app, working with a Cloudflare AI worker, a Vector store, a D1 database, and a couple of Queue workers. Almost a monolith, but not precisely so. Since we are binding to multiple Cloudflare services, we can call this a distributed setup 😅.
The messy data challenge
For example, a single property in Shoreditch (an area of London) might appear in three completely different ways across our incoming feeds:
- Agent A: Uses the postcode "E1 6AN" as a specific postcode attribute.
- Agent B: Uses the standard area name "Shoreditch" as a specific location attribute.
- Agent C: Does not include postcode or location but instead adds the location as part of the property title “Hidden Gem 💎 (around Shoreditch)”.
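To make this concrete, here is a sketch of what those three feed items might look like; the field names are invented purely for illustration:

// Hypothetical feed items from the three agents (illustrative only)
const incomingFeedItems = [
  { agent: 'A', title: '2-bed flat', postcode: 'E1 6AN' },
  { agent: 'B', title: 'Bright studio', location: 'Shoreditch' },
  { agent: 'C', title: 'Hidden Gem 💎 (around Shoreditch)' }, // location only in the title
]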
While our database schema includes a list of postcodes mapped to areas, it lacks a native way to derive a property's location from a mix of postcodes, addresses, and descriptions, especially when they conflict or are misspelt. Without a smart normalisation layer, we cannot reliably categorise these listings, leaving our database fragmented and preventing us from maintaining a clean "Source of Truth" for our application.
💡 It is not always ideal to process business logic on an Edge provider such as Cloudflare, and it is often unnecessary. Let me know in the comments or on socials if you’d like me to write about some of the many alternative architectures that preserve Edge availability while reducing development and operational infrastructure costs.
In this article, we’ll set up the internal database, schema migrations, and ORM, deploy it to a Cloudflare D1 database, and prepare the AI worker binding for later.
The code for the entire demo application is publicly available on GitHub: Nuxt & Cloudflare AI Vector Pipeline Series - give it a ⭐.
Drizzle DB Schema on Cloudflare D1
Let’s start with a fresh Nuxt install, then add Drizzle, Drizzle Kit (for generating database schema migration files), and the Cloudflare/Wrangler dependencies:
pnpm dlx nuxi@latest init nuxt-data-ai-sync
cd nuxt-data-ai-sync
pnpm add drizzle-orm
pnpm add -D drizzle-kit wrangler @types/node nitro-cloudflare-dev
We can also configure Nuxt to use Nitro’s experimental features, which we will need later: mainly tasks and asyncContext. Your /nuxt.config.ts should include the correct Nitro preset and these experimental features:
export default defineNuxtConfig({
  compatibilityDate: '2025-12-06',
  devtools: { enabled: true },
  modules: [
    'nitro-cloudflare-dev',
  ],
  nitro: {
    preset: 'cloudflare_module',
    experimental: {
      tasks: true,
      asyncContext: true,
    },
  },
})
Deploy Nuxt as a worker on Cloudflare
Now’s a good time to add the initial Wrangler configuration so we can deploy to Cloudflare for the first time. Create the /wrangler.toml file at the project’s root and fill in the initial content:
name = "property-sync-app"
main = "./.output/server/index.mjs"
compatibility_date = "2025-08-02"
account_id = "xxxx" # <-- your account ID here
assets = { directory = "./.output/public", binding = "ASSETS" }
Now we can build and deploy our (so far useless) app to Cloudflare:
pnpm build
npx wrangler deploy
The output should look similar to the following:
⛅️ wrangler 4.53.0
───────────────────
🌀 Building list of assets...
✨ Read 14 files from the assets directory /home/keithmifsud/projects/keithmifsud/nuxt-data-ai-sync/.output/public
🌀 Starting asset upload...
🌀 Found 10 new or modified static assets to upload. Proceeding with upload...
+ /robots.txt
+ /_nuxt/error-500.pPqbDRtt.css
+ /favicon.ico
+ /_nuxt/b-_NRaZA.js
+ /_nuxt/builds/latest.json
+ /_nuxt/error-404.ClksgLvO.css
+ /_nuxt/entry.ClxZN-oh.css
+ /_nuxt/builds/meta/23e8b19f-c1c2-45ca-83bc-9cad6a8adfcf.json
+ /_nuxt/Cv0oBIJc.js
+ /_nuxt/BVjrWf75.js
Uploaded 3 of 10 assets
Uploaded 6 of 10 assets
Uploaded 10 of 10 assets
✨ Success! Uploaded 10 files (3.21 sec)
Total Upload: 685.31 KiB / gzip: 184.57 KiB
Worker Startup Time: 7 ms
Your Worker has access to the following bindings:
Binding Resource
env.ASSETS Assets
Uploaded property-sync-app (15.49 sec)
Deployed property-sync-app triggers (4.65 sec)
https://property-sync-app.mifsud-k.workers.dev
Current Version ID: 3d88b77c-2d1f-4c8e-8f01-22823a4a1c61
You can now open the URL to confirm that the Nuxt welcome page loads.
I like to tackle the noticeable housekeeping tasks as soon as possible. Since we will definitely be dealing with queued processes, we will be working asynchronously across different contexts (Cloudflare Workers on the V8 engine and the Nitro runtime using Node.js APIs). Give the worker settings in Cloudflare’s UI a once-over and make sure your runtime has the nodejs_compat flag enabled.
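If you prefer keeping that setting in version control, the same flag can be declared in wrangler.toml instead of toggling it in the UI; this is standard Wrangler configuration:

compatibility_flags = ["nodejs_compat"]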
Modelling the database schema using Drizzle ORM
Since our schema is relatively small, we can place it in a single file. Create /server/database/schema.ts with:
import { sqliteTable, text } from 'drizzle-orm/sqlite-core'
import { relations, sql } from 'drizzle-orm'

const generateUUID = () => crypto.randomUUID()

// Agents
export const agents = sqliteTable('agents', {
  id: text('id').primaryKey().$defaultFn(generateUUID),
  name: text('name').notNull(),
  apiRoute: text('api_route').notNull().unique(),
  createdAt: text('created_at').default(sql`(CURRENT_TIMESTAMP)`),
})

// Locations (canonical)
export const locations = sqliteTable('locations', {
  id: text('id').primaryKey().$defaultFn(generateUUID),
  name: text('name').notNull(),
  // D1/SQLite stores arrays as JSON text; mode: 'json' handles (de)serialisation
  postcodes: text('postcodes', { mode: 'json' }).$type<string[]>().notNull(),
  createdAt: text('created_at').default(sql`(CURRENT_TIMESTAMP)`),
})

// Properties (incoming)
export const properties = sqliteTable('properties', {
  id: text('id').primaryKey().$defaultFn(generateUUID),
  externalRef: text('external_ref').notNull(), // The agent's own ID for the property
  agentId: text('agent_id').references(() => agents.id).notNull(),
  locationId: text('location_id').references(() => locations.id).notNull(),
  title: text('title').notNull(),
  originalLocation: text('original_location').notNull(), // The messy source string
  createdAt: text('created_at').default(sql`(CURRENT_TIMESTAMP)`),
})

export const agentsRelations = relations(agents, ({ many }) => ({
  properties: many(properties),
}))

export const propertiesRelations = relations(properties, ({ one }) => ({
  agent: one(agents, {
    fields: [properties.agentId],
    references: [agents.id],
  }),
  location: one(locations, {
    fields: [properties.locationId],
    references: [locations.id],
  }),
}))
Note that SQLite does not support UUID types, which is why we’re using the text type and a function to generate the UUIDs.
We will populate the agents table with a list of agents from whom we'll fetch a listings feed. The locations table will hold our known locations, which we'll later use to populate the vector index and eventually match against the agents' property listings. Finally, the properties table will hold agent properties with a resolved location.
I’ve also added a couple of soft (application-level) relations as an example.
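These relations don't add foreign keys beyond the ones we already defined; they power Drizzle's relational query API. As a quick sketch (assuming env.DB is the D1 binding we wire up below, and that we pass the schema to drizzle()), we could later fetch agents together with their properties in one call:

import { drizzle } from 'drizzle-orm/d1'
import * as schema from '~~/server/database/schema'

// Passing the schema enables db.query.* with the relations defined above
const db = drizzle(env.DB, { schema })
const agentsWithProperties = await db.query.agents.findMany({
  with: { properties: true },
})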
While we're here, we can generate the database migration files using Drizzle Kit. Generating them is relatively straightforward (unless you're working in a team, where Drizzle Kit can produce conflicts in the migration file order 😮💨). Just run npx drizzle-kit generate to create the migration files from the schema we just wrote.
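The generated files land in the out directory we'll point Drizzle Kit at in drizzle.config.ts below, as plain SQL. The first migration should contain statements roughly like this sketch (your exact output and file names will differ):

CREATE TABLE `agents` (
  `id` text PRIMARY KEY NOT NULL,
  `name` text NOT NULL,
  `api_route` text NOT NULL,
  `created_at` text DEFAULT (CURRENT_TIMESTAMP)
);
--> statement-breakpoint
CREATE UNIQUE INDEX `agents_api_route_unique` ON `agents` (`api_route`);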
Running Drizzle migrations on a Cloudflare D1 database
We can now create a new D1 database on Cloudflare. We can do this using Wrangler CLI:
npx wrangler d1 create property-sync-db
And add the database binding to /wrangler.toml:
[[d1_databases]]
binding = "DB"
database_name = "property-sync-db"
database_id = "the-id-returned-from-the-wrangler-command"
migrations_dir = "server/database/migrations"
Creating the D1 database on Cloudflare does not automatically bind it to our worker. Bindings are added on deployment based on the wrangler.toml file entries. Notice that we're also specifying the migrations directory location, which we will need soon.
⚠️ Note: Cloudflare’s D1 is not always a good fit. Scaling beyond its per-database size limit requires manual sharding, and, as with all SQLite databases, D1 lacks several features that production-grade applications often need.
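On the plus side, Wrangler can generate TypeScript types for every binding declared in wrangler.toml, so env.DB (and later env.AI) are typed in our server code:

npx wrangler types

This should emit a worker-configuration.d.ts file at the project root.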
We also need to enter the database credentials in /drizzle.config.ts:
import { defineConfig } from 'drizzle-kit'

export default defineConfig({
  dialect: 'sqlite',
  schema: './server/database/schema.ts',
  out: './server/database/migrations',
  driver: 'd1-http',
  dbCredentials: {
    accountId: process.env.CLOUDFLARE_ACCOUNT_ID!,
    databaseId: process.env.CLOUDFLARE_DATABASE_ID!,
    token: process.env.DRIZZLE_API_TOKEN!,
  },
})
Use environment variables rather than hard-coding the credentials in the config file, so you can swap them between environments. We don’t actually set DRIZZLE_API_TOKEN, but migrations seem to fail if the token property isn’t included in the config. My .env file looks like this:
CLOUDFLARE_ACCOUNT_ID="my_cloudflare_account_id"
CLOUDFLARE_DATABASE_ID="the_database_id_from_cloudflare"
# Notice I don't have DRIZZLE_API_TOKEN
And with that, we can run the database migrations directly on the D1 instance by running:
npx wrangler d1 migrations apply property-sync-db --remote
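To save retyping these two steps, you could wrap them in package.json scripts; the script names here are just my own convention:

{
  "scripts": {
    "db:generate": "drizzle-kit generate",
    "db:migrate": "wrangler d1 migrations apply property-sync-db --remote"
  }
}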
Let’s seed some data into our Cloudflare D1 database
I'm going to create an API endpoint to seed the data we need into the D1 database. You obviously wouldn't do it this way in a real application. Still, it moves the task forward faster and lays some groundwork for the internal API endpoints we'll need later, since Cloudflare's AI features don't run in dev environments. To make the endpoint harder for attackers to abuse, we should protect it with an environment variable on the Cloudflare worker.
We can add the secret using Wrangler:
npx wrangler secret put NUXT_INTERNAL_API_SECRET
Or, from Cloudflare’s dashboard, navigate to Workers & Pages, select this worker, and go to Settings. Then add an encrypted environment variable named NUXT_INTERNAL_API_SECRET with a secret string. Feel free to add the same variable locally if you intend to call the upcoming API endpoint from localhost.
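One gotcha worth flagging: Nuxt only maps the NUXT_INTERNAL_API_SECRET environment variable onto useRuntimeConfig() if the key is declared in runtimeConfig. Make sure /nuxt.config.ts declares it; the empty default gets overridden at runtime:

export default defineNuxtConfig({
  // ...the config from earlier stays as-is
  runtimeConfig: {
    internalApiSecret: '', // overridden by NUXT_INTERNAL_API_SECRET
  },
})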
We need to seed some dummy agents and a few locations so we can generate text embeddings, add the locations to the vector store, and later use AI to resolve the messy listings from the agents' feeds.
Create an API endpoint at: /server/api/internals/seed.get.ts and add the following contents:
import { drizzle } from 'drizzle-orm/d1'
import { agents, locations } from '~~/server/database/schema'

export default defineEventHandler(async (event) => {
  const config = useRuntimeConfig()
  if (getHeader(event, 'x-secret') !== config.internalApiSecret) {
    throw createError({ statusCode: 401, statusMessage: 'Unauthorized' })
  }

  const env = event.context.cloudflare.env
  if (!env.DB) {
    throw createError({
      statusCode: 500,
      statusMessage: 'Database not found in environment',
    })
  }

  const db = drizzle(env.DB)
  console.log('🌱 Starting Seed...')

  // Seed agents
  const agentsData = [
    {
      name: 'The Formalist',
      apiRoute: '/api/agents/formalist',
    },
    {
      name: 'The Traditionalist',
      apiRoute: '/api/agents/traditionalist',
    },
    {
      name: 'The Marketer',
      apiRoute: '/api/agents/marketer',
    },
  ]
  await db.insert(agents).values(agentsData)

  // Seed locations (our canonical data)
  const locationsData = [
    {
      name: 'Shoreditch',
      postcodes: ['E1', 'E2', 'EC1', 'EC2', 'N1'],
    },
    {
      name: 'Canary Wharf',
      postcodes: ['E14'],
    },
    {
      name: 'Brixton',
      postcodes: ['SW2', 'SW9'],
    },
    {
      name: 'Notting Hill',
      postcodes: ['W10', 'W11', 'W2'],
    },
  ]
  await db.insert(locations).values(locationsData)

  return {
    success: true,
    message: 'Database Seeded',
    stats: {
      agents: agentsData.length,
      locations: locationsData.length,
    },
  }
})
That’s it. Build, deploy, and run the seeder:
pnpm build
npx wrangler deploy
curl -H "x-secret: [YOUR_SECRET]" "https://[YOUR_WORKER_URL]/api/internals/seed"
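If everything is wired up correctly, the response should echo the stats object we return from the handler:

{
  "success": true,
  "message": "Database Seeded",
  "stats": { "agents": 3, "locations": 4 }
}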
You can now see the seeded data through Cloudflare’s Database Explorer by navigating to:
Storage & databases → D1 SQL database → property-sync-db → Explore Data
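Alternatively, you can query the remote database straight from the terminal with Wrangler:

npx wrangler d1 execute property-sync-db --remote --command "SELECT name FROM locations"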
Cloudflare Workers AI
Although this is not a tutorial about AI, it is important to note the difference between Cloudflare Workers AI and Cloudflare Vectorize. A vector store is "simply" a database of vectors; Pinecone is a popular alternative to Cloudflare Vectorize. In contrast, Workers AI is the compute part: the processor that generates text embeddings from our data, which can later be stored as vectors.
In short, Vectorize is the memory, and Workers AI is the brain. With standalone solutions like Pinecone, you generally still need separate services for the memory and the brain (e.g., Pinecone + OpenAI). The Cloudflare ecosystem bundles the generation of embeddings (Workers AI) and their storage (Vectorize) into a single, low-latency workflow.
Since Workers AI is a built-in platform service, we only need to specify the binding in our wrangler.toml config to access it directly from our Nuxt application.
Go ahead and add the binding in wrangler.toml:
[ai]
binding = "AI"
This binding will be effective on our next deployment.
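Before we wrap up, here's a minimal sketch for smoke-testing the binding once deployed. The endpoint path and the embedding model are my own choices for illustration; we'll settle on the real model in part two:

// /server/api/internals/embed-test.get.ts (hypothetical smoke test)
export default defineEventHandler(async (event) => {
  const env = event.context.cloudflare.env
  // @cf/baai/bge-base-en-v1.5 is one of Cloudflare's text embedding models
  const result = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
    text: ['Shoreditch'],
  })
  // The model returns one embedding vector per input string
  return { dimensions: result.data[0].length }
})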
Moving forward with storing vectors on Cloudflare’s Vectorize Index
In the next part of this Nuxt & Cloudflare AI Vector Pipeline Series, we will build a background data sync pipeline using Vectorize. We will also take a deep dive into working with Cloudflare Queues from our Nuxt application and, most importantly, listen for and handle the dispatched queue messages from the same Nuxt application, i.e., the same Cloudflare Worker.
I hope you found this article helpful. Feel free to ask me anything in the comments below or on Social. Please share it with your peers and subscribe to get notified when I publish new articles 💙.
Need help with Nuxt & Cloudflare?
Keith Mifsud is an Official Nuxt Agency Partner, learn more.