Introduction
Did you know that Forge SQL can be used for semantic search?
Since Forge SQL is backed by TiDB, and TiDB supports Vector Search, it is now possible to store embeddings in the database and query them by semantic similarity. That means your app can search by meaning, not only by exact keyword matches. PingCAP highlights semantic search, recommendation systems, and Retrieval-Augmented Generation (RAG) as key use cases for this capability.
This is a very practical AI pattern. Traditional search works best when users type the same words that already exist in the stored content. Semantic search works differently: both the documents and the user query are converted into embeddings, and the database returns the closest matches based on vector distance. Because of that, a query can still find the right result even when the wording is completely different.
This is also one of the core ideas behind RAG. Before an LLM generates an answer, the system first retrieves the most relevant documents and passes them as context. Better retrieval usually means better answers.
In this article, I will show how to build this pattern inside an Atlassian Forge app while still keeping the architecture aligned with Runs on Atlassian. In this example, embeddings are generated locally in the Custom UI frontend, stored in Forge SQL using Forge SQL ORM, and queried directly with vector search. No external AI API is required for the semantic search flow. Runs on Atlassian eligibility depends on meeting platform requirements, including restrictions on data egress, so this kind of local approach can be especially useful for Forge developers.
The article has two parts:
- How embeddings are generated in Custom UI and processed in the Forge backend
- A short demonstration of the app in action, with a link to the full video walkthrough
Building the Local Semantic Search Flow
To build semantic search inside a Forge app, we need a way to generate embeddings locally in the browser, send them to the backend, and then use Forge SQL to search by vector similarity.
In this example, the flow consists of four steps:
- choosing a lightweight embedding model
- configuring the frontend to run the model locally
- generating vectors from user input
- sending those vectors to the backend and querying Forge SQL
1. Choosing the embedding model
The first step was choosing an embedding model that could realistically run inside Forge Custom UI.
For this example, I used Xenova/all-MiniLM-L6-v2.
I chose it for a very practical reason: it is lightweight and fits much better into the kind of size and runtime constraints that Forge developers usually care about.
Since the goal of this example is to keep the semantic search flow local, the model needs to run directly in the browser without relying on an external AI API. A smaller model is a much better fit for that approach.
From the full model repository, I only needed a small set of files:
- config.json
- special_tokens_map.json
- tokenizer.json
- tokenizer_config.json
- onnx/model_quantized.onnx
In short, each file has a specific role:
- config.json contains the main model configuration
- special_tokens_map.json defines the special tokens used by the tokenizer
- tokenizer.json contains the tokenizer itself, including how text is split into tokens
- tokenizer_config.json contains tokenizer settings and metadata
- onnx/model_quantized.onnx is the actual neural network model used for inference in the browser
The most important part here is model_quantized.onnx. This is the file that performs embedding generation. It is a quantized ONNX model, which means it is optimized to be smaller and more practical for client-side execution.
The tokenizer files are also essential, because the model does not work directly with raw text. First, the input text must be converted into tokens in exactly the same way the model expects. Only after that can the ONNX model generate the embedding vector.
So in practice, the setup is split into two parts:
- the tokenizer files prepare the text input
- the ONNX model converts that tokenized input into an embedding vector
This is enough to run local embedding generation in the frontend, and it is a good fit for a Forge app that aims to stay simple, portable, and eligible for the Runs on Atlassian badge.
2. Configuring the frontend to run the model locally
To run semantic search fully inside the Forge app, the frontend first needs to load the embedding model locally in the browser.
The main dependency for this is:
npm i @huggingface/transformers -S
After that, the application needs two groups of files:
- the model files
- the ONNX runtime WebAssembly files
2.1 Adding the model files
The model itself, together with its tokenizer and configuration files, can be placed into the frontend public folder. In my case, I used this structure:
public/
  models/
    all-MiniLM-L6-v2/
      config.json
      special_tokens_map.json
      tokenizer.json
      tokenizer_config.json
      onnx/
        model_quantized.onnx
This allows the frontend to load the model directly from the app’s own static assets.
2.2 Adding the ONNX WebAssembly runtime files
In addition to the model, the browser also needs the ONNX runtime files used to execute the model locally.
These files can be copied from node_modules/onnxruntime-web/dist/ into public/wasm:
- ort-wasm-simd-threaded.mjs
- ort-wasm-simd-threaded.wasm
- ort-wasm-simd-threaded.asyncify.mjs
- ort-wasm-simd-threaded.asyncify.wasm
So the final structure looks like this:
public/
  models/
    all-MiniLM-L6-v2/
      onnx/
        model_quantized.onnx
      config.json
      special_tokens_map.json
      tokenizer.json
      tokenizer_config.json
  wasm/
    ort-wasm-simd-threaded.mjs
    ort-wasm-simd-threaded.wasm
    ort-wasm-simd-threaded.asyncify.mjs
    ort-wasm-simd-threaded.asyncify.wasm
At this point, the frontend has everything it needs to load the model and run inference locally.
2.3 Initializing the model in the frontend
The next step is to initialize the model when the frontend starts. This only needs to happen once. After the first load, the browser can reuse cached assets, which makes the next startup much faster.
Here is the code I used:
/// <reference types="vite/client" />
import { env, FeatureExtractionPipeline, pipeline, ProgressInfo } from "@huggingface/transformers";

env.localModelPath = `./models/`;
env.allowLocalModels = true;
env.allowRemoteModels = false;
env.useBrowserCache = false;
env.useWasmCache = true;

const isDevMode = import.meta.env.DEV;
env.backends.onnx.wasm!.wasmPaths = isDevMode ? `${window.location.origin}/wasm/` : `../wasm/`;

const MODEL_NAME = `all-MiniLM-L6-v2`;

export interface VectorBuilder {
  getVector(text: string): Promise<number[]>;
}

interface MiniLLM {
  init(progress: (progressInfo: ProgressInfo) => void): Promise<VectorBuilder>;
}

class VectorBuilderImpl implements VectorBuilder {
  private readonly extractor: FeatureExtractionPipeline;

  constructor(extractor: FeatureExtractionPipeline) {
    this.extractor = extractor;
  }

  async getVector(text: string): Promise<number[]> {
    const output = await this.extractor(text, {
      pooling: "mean",
      normalize: true,
    });
    return Array.from(output.data) as number[];
  }
}

class MiniLLMImpl implements MiniLLM {
  async init(progress: (progressInfo: ProgressInfo) => void): Promise<VectorBuilder> {
    const extractor = await pipeline("feature-extraction", MODEL_NAME, {
      progress_callback: progress,
    });
    return new VectorBuilderImpl(extractor);
  }
}

export const miniLLM: MiniLLM = new MiniLLMImpl();
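One small refinement I would suggest on top of this snippet (my own addition, not part of the example app): since the pipeline should be created only once per page load, the init promise can be cached so concurrent callers share a single load. A generic helper, assuming nothing beyond standard TypeScript:

```typescript
// Generic "init once" helper: the first call runs the factory, every later
// call (even while the first is still in flight) reuses the same promise.
function once<T>(factory: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => (cached ??= factory());
}

// Usage with the module above (hypothetical wiring):
// const getVectorBuilder = once(() => miniLLM.init(onProgress));
```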
What this configuration does
There are a few important details here.
env.localModelPath = './models/' tells transformers.js where to find the model files inside the frontend assets.
env.allowLocalModels = true and env.allowRemoteModels = false make sure that the application only uses local model files and does not try to download anything from an external model registry.
env.useWasmCache = true allows the WebAssembly runtime files to be cached, which helps reduce repeated loading costs.
The following line is especially important:
const isDevMode = import.meta.env.DEV;
env.backends.onnx.wasm!.wasmPaths = isDevMode ? `${window.location.origin}/wasm/` : `../wasm/`;
I added isDevMode because the WebAssembly path needs to be resolved differently when running through forge tunnel. Without that adjustment, the runtime files might not load correctly in local development mode.
Tracking model loading progress
The progress callback is used to show model loading progress in the UI, for example through a spinner or progress indicator.
progress: (progressInfo: ProgressInfo) => void
This is useful because the model is loaded into the frontend on startup, and that can take a little time on the first run.
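As an illustration, a minimal handler might turn those events into a status string for a spinner. Note that ProgressInfo in transformers.js is a union discriminated by status; the exact fields shown for the progress branch (file, progress) are my reading of the library and worth double-checking against its docs:

```typescript
// Sketch of a progress handler. The field names on the "progress" branch
// (file, progress as a 0-100 percentage) are assumptions about transformers.js.
type ProgressLike = { status: string; file?: string; progress?: number };

function formatProgress(info: ProgressLike): string {
  if (info.status === "progress" && info.progress !== undefined) {
    return `Loading ${info.file ?? "model"}: ${info.progress.toFixed(0)}%`;
  }
  return info.status; // e.g. "initiate", "done", "ready"
}

// Hypothetical wiring into init, driving a UI status label:
// const vectorBuilder = await miniLLM.init((info) => setStatus(formatProgress(info)));
```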
An important detail here is that these files are loaded from the app’s own host, not from an external service. So this is just normal asset loading from the frontend itself, not an external AI API call.
Result
After this setup, the frontend is ready to initialize the embedding model locally and generate vectors directly in the browser.
In the next step, we can use this initialized pipeline to convert document text and search queries into embedding vectors.
3. Generating the vector
Once the model is available in the frontend, the application can convert plain text into an embedding vector.
This happens in two places:
- when a user adds a document
- when a user enters a search query
In both cases, the frontend takes the input text, runs it through the embedding model, and produces a fixed-size numeric vector. That vector is the semantic representation of the text. Instead of relying on exact words, the application can now compare meanings by comparing vectors.
This is the core idea behind semantic search: text is first transformed into embeddings, and only then used for similarity search.
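To make "closest match" concrete, here is a tiny distance function in plain TypeScript. This is illustration only; in this app the comparison happens server-side in Forge SQL, not in the frontend:

```typescript
// Cosine distance between two equal-length vectors:
// 0 means same direction (very similar meaning), 1 means orthogonal (unrelated).
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Vectors pointing in almost the same direction have a small distance,
// while orthogonal vectors land at exactly 1.
cosineDistance([1, 0, 1], [1, 0, 0.9]); // close to 0
cosineDistance([1, 0, 0], [0, 1, 0]);   // 1
```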
In practice, generating the vector is very simple:
const vectorBuilder = await miniLLM.init(onProgress);
const vector = await vectorBuilder.getVector(text);
Here, miniLLM.init(onProgress) initializes the local embedding pipeline, and getVector(text) converts the input text into a numeric vector.
The same approach is used both for storing documents and for searching. When a document is added, the frontend generates an embedding for the document text before sending it to the backend. When a user performs a search, the frontend generates an embedding for the query text and sends that vector to the search resolver.
So from this point on, the application no longer works with plain text only. It works with semantic representations of that text, which is what makes similarity search possible.
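As a sketch of how the frontend might wire this up: in the real app, invoke comes from @forge/bridge and the resolver keys are "create" and "search". The wrapper names below and the injected invokeFn parameter are illustrative only, used so the sketch is testable outside Forge:

```typescript
// Illustrative wrappers (not from the article's source). `invokeFn` stands in
// for `invoke` from "@forge/bridge".
type InvokeFn = (functionKey: string, payload: object) => Promise<unknown>;
interface VectorBuilder { getVector(text: string): Promise<number[]>; }

async function saveDocument(
  invokeFn: InvokeFn,
  builder: VectorBuilder,
  title: string,
  document: string,
): Promise<unknown> {
  // Generate the embedding locally, then send text + vector to the "create" resolver.
  const embedding = await builder.getVector(document);
  return invokeFn("create", { data: { title, document, embedding } });
}

async function searchDocuments(
  invokeFn: InvokeFn,
  builder: VectorBuilder,
  query: string,
): Promise<unknown> {
  // Same model and getVector call as above; only the resolver differs.
  const vector = await builder.getVector(query);
  return invokeFn("search", { vector });
}

// In the real app:
// import { invoke } from "@forge/bridge";
// await saveDocument(invoke, vectorBuilder, title, text);
```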
4. Sending the vector to the backend and processing it in Forge SQL
Once the vector is generated in the frontend, it can be sent to a Forge resolver and stored in Forge SQL together with the original document.
In my example, the model is defined like this:
export const embeddedDocuments = mysqlTable(
  "embedded_documents",
  {
    id: int().autoincrement().notNull(),
    document: text().notNull(),
    title: varchar({ length: 255 }).notNull(),
    embedding: vectorTiDBType("embedding", { dimension: 384 }).notNull(),
  },
  (table) => [primaryKey({ columns: [table.id], name: "id" })],
);
And the migration looks like this:
import { MigrationRunner } from "@forge/sql/out/migration";

export default (migrationRunner: MigrationRunner): MigrationRunner => {
  return migrationRunner.enqueue(
    "v1_MIGRATION0",
    "CREATE TABLE `embedded_documents` ( `id` int AUTO_INCREMENT NOT NULL, `document` text NOT NULL, `title` VARCHAR(255) NOT NULL, `embedding` VECTOR(384) NOT NULL, CONSTRAINT `id` PRIMARY KEY(`id`) )",
  );
};
The important part here is the vector dimension. It must exactly match the output dimension of the embedding model. In this case, all-MiniLM-L6-v2 produces vectors with dimension 384, so both the ORM model and the SQL migration use 384 as well.
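Because a vector with the wrong length only fails at insert or query time, it can be worth guarding for it in the resolver. A minimal check (my own addition, not in the example app):

```typescript
// The model's output dimension; must equal VECTOR(384) in the schema above.
const EMBEDDING_DIMENSION = 384;

function assertEmbeddingDimension(vector: number[]): void {
  if (vector.length !== EMBEDDING_DIMENSION) {
    throw new Error(
      `Expected a ${EMBEDDING_DIMENSION}-dimensional embedding, got ${vector.length}`,
    );
  }
}
```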
After that, saving a document is straightforward. The frontend sends the document text, title, and generated embedding vector to the resolver, and the backend inserts them into the table:
resolver.define(
  "create",
  async (req: Request<{ data: InferInsertModel<typeof embeddedDocuments> }>): Promise<number> => {
    const payload = req.payload.data;
    const res = await forgeSQL.insert(embeddedDocuments).values([payload]);
    return res[0].insertId;
  },
);
At this point, the database stores not only the original text, but also its semantic representation as a vector.
The search flow works in a similar way. The frontend generates an embedding for the user’s query and sends that vector to the backend. Then the backend uses vecCosineDistance to compare the query vector with all stored document vectors and return the closest matches:
resolver.define(
  "search",
  async (
    req: Request<{ vector: number[] }>,
  ): Promise<{ id: number; title: string; document: string; distance: number }[]> => {
    const vector = req.payload.vector;
    const fieldAlias = sql.raw("distance");
    const distance = sql<number>`${vecCosineDistance(embeddedDocuments.embedding, vector)} as \`${fieldAlias}\``;
    return forgeSQL
      .select({
        id: embeddedDocuments.id,
        document: embeddedDocuments.document,
        title: embeddedDocuments.title,
        distance: distance,
      })
      .from(embeddedDocuments)
      .orderBy(asc(fieldAlias))
      .limit(formatLimitOffset(5));
  },
);
Under the hood, the generated SQL looks like this:
select
  `id` as `a_id_id`,
  `document` as `a_document_document`,
  `title` as `a_title_title`,
  VEC_COSINE_DISTANCE(`embedded_documents`.`embedding`, VEC_FROM_TEXT(?)) as `distance`
from `embedded_documents`
order by distance asc
limit 5
This is the key step where semantic search actually happens. Instead of checking whether the query contains the same words as the document, Forge SQL compares vector similarity and returns the nearest results.
So the backend responsibility is very simple:
- store document embeddings
- receive the query embedding
- calculate vector distance in Forge SQL
- return the nearest documents sorted by similarity
At this point, the full semantic search pipeline is complete: the frontend generates embeddings locally, the backend stores them in Forge SQL, and search works by vector similarity instead of exact keyword matching - while still preserving Runs on Atlassian eligibility.
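One small presentation detail: the cosine distance returned by the query is a number where smaller means closer, so the frontend has to convert it into the match percentages shown in the demo below. How exactly to map distance to a percentage is a display choice; a simple option consistent with the demo numbers is (1 - distance) * 100, clamped at zero:

```typescript
// Convert a cosine distance (0 = identical direction) into a UI-friendly
// percentage. This mapping is a display choice, not part of the SQL:
// distance 0 -> 100%, distance >= 1 (orthogonal or worse) -> 0%.
function toSimilarityPercent(distance: number): number {
  const similarity = Math.max(0, 1 - distance) * 100;
  return Math.round(similarity * 100) / 100; // keep two decimals, e.g. 55.66
}
```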
A short demonstration of the app in action
After the technical setup is complete, the easiest way to understand the value of semantic search is to see it working on a small set of example documents.
For this demo, I added five documents to the application. Each document has a title and a longer text description. When a document is submitted, the frontend generates its embedding locally and the backend stores both the original text and the vector in Forge SQL.
Adding sample documents
To populate the demo dataset, I added the following documents.
Title: Dogs
Document Text:
The Unwavering Bond: A Comprehensive Look at Domestic Dogs
Domestic dogs, scientifically known as *Canis lupus familiaris*, have shared a unique evolutionary journey with humans for over fifteen thousand years. Originally descended from ancient wolves, these resilient mammals have transitioned from wild predators to beloved family members, earning their reputation as "man's best friend." Their primary role has shifted significantly through history; while they were once valued strictly for their hunting prowess and guarding abilities, modern canines are now primarily cherished for their companionship and emotional support.
Physically, dogs exhibit an incredible diversity in size, coat texture, and temperament. From the tiny Chihuahua to the massive Great Dane, every breed possesses specific traits developed through centuries of selective breeding. Beyond their physical attributes, dogs are highly intelligent social animals capable of understanding human emotions and complex commands. They communicate through a sophisticated range of vocalizations, including barks and whines, alongside subtle body language like tail wagging or ear positioning.
Furthermore, the working capabilities of dogs remain vital to society today. Specialized service animals assist individuals with visual impairments, while brave search-and-rescue teams navigate treacherous terrain to save lives. Their acute sense of smell, which is thousands of times more sensitive than a human's, allows them to detect specific scents with remarkable precision. Whether they are performing a high-stakes job or simply waiting patiently for their owner to return home, dogs continue to demonstrate an unparalleled level of loyalty, devotion, and unconditional love that enriches human lives across every culture.
Title: Tree
Document Text:
The Silent Giants: Understanding the Life of Trees
Trees are the fundamental pillars of our planet's terrestrial ecosystems, serving as complex biological organisms that sustain life on Earth. As perennial plants with an elongated stem or trunk, they are uniquely characterized by their woody structure and extensive root systems. Through the remarkable process of photosynthesis, trees convert sunlight, water, and carbon dioxide into life-sustaining oxygen and glucose. This chemical transformation not only supports the tree's own growth but also regulates the global atmospheric balance, making forests the "lungs of our planet."
The internal anatomy of a tree is a marvel of natural engineering. Beneath the protective outer bark lies the cambium layer, which facilitates the growth of new cells, and the xylem, a sophisticated vascular system that transports nutrients from the earth to the highest leaves. Throughout the seasons, deciduous trees undergo dramatic transformations, shedding their foliage in autumn to conserve energy before the harsh winter months. In contrast, evergreens maintain their needles year-round, showcasing the diverse evolutionary strategies plants use to survive in varying climates.
Beyond their biological functions, trees provide critical habitats for countless species of insects, birds, and fungi. They stabilize the soil against erosion, offer cooling shade during intense heat, and contribute to the water cycle by releasing moisture through transpiration. For humanity, trees have been an essential resource for millennia, providing timber for construction, fruit for sustenance, and a profound sense of tranquility. Protecting these ancient, towering organisms is vital for maintaining biodiversity and ensuring the environmental health of future generations.
Title: Fish
Document Text:
The Aquatic Realm: Exploring the World of Fish
Fish represent a diverse group of craniate organisms that have mastered life in the world's oceans, rivers, and lakes for over five hundred million years. As cold-blooded vertebrates, they are perfectly adapted to their underwater environments, utilizing specialized organs called gills to extract life-sustaining oxygen directly from the water. Unlike land-dwelling mammals, fish possess streamlined bodies covered in protective scales and use various fins for propulsion, stability, and precise maneuvering through dense aquatic currents.
The biological variety among fish is staggering, ranging from the tiny, colorful inhabitants of tropical coral reefs to the colossal whale sharks that roam the open sea. Many species have evolved incredible sensory capabilities, such as the lateral line system, which detects minute vibrations and pressure changes in the surrounding water. This "sixth sense" allows them to navigate in complete darkness, avoid predators, and hunt with remarkable accuracy. Additionally, some fish exhibit complex social behaviors, forming massive schools that move in perfect unison to confuse attackers or increase foraging efficiency.
Reproduction and survival strategies in the aquatic world are equally fascinating. While some fish lay thousands of delicate eggs in hidden nests, others, like certain sharks, give birth to fully formed live young. Their role in the global food web is indispensable, as they serve as a primary protein source for billions of humans and countless other predators. From the deepest abyssal trenches to the shallowest mountain streams, fish continue to thrive as a testament to evolutionary resilience, playing a vital role in maintaining the delicate ecological balance of our blue planet's hydrosphere.
Title: Cat
Document Text:
The Enigmatic Grace: Understanding the Domestic Cat
The domestic cat, or *Felis catus*, is a small carnivorous mammal celebrated for its agility, independent spirit, and mysterious demeanor. Having lived alongside humans for nearly ten thousand years, cats were originally revered in ancient societies—most notably in Egypt—for their ability to protect grain stores from rodents. Unlike dogs, which were bred for cooperation, cats have largely retained their solitary hunting instincts, making them fascinatingly self-sufficient companions in modern households.
Physically, cats are marvels of biological engineering. Their skeletons are incredibly flexible, allowing them to squeeze through tight spaces and always land on their feet thanks to a highly developed righting reflex. They possess extraordinary sensory perceptions; their night vision is far superior to that of humans, and their retractable claws allow for silent stalking and efficient climbing. A cat’s communication is equally nuanced, ranging from the gentle vibration of a purr, which often signals contentment or self-healing, to the sharp hiss used for territorial defense.
Behaviorally, cats are known for their fastidious grooming habits and complex social signals. While they are often labeled as aloof, many cats form deep emotional bonds with their owners, expressing affection through "kneading" or gentle head-butts. Their predatory prowess remains intact, even in indoor environments, where they often treat toys as "prey" to satisfy their instinctive need to hunt. As one of the world's most popular pets, cats continue to captivate us with their blend of wild heritage and domestic charm, offering a quiet, observant presence that has inspired artists and thinkers for millennia.
Title: Mice
Document Text:
The Smallest Survivors: The World of Mice
Mice are small rodents belonging to the family Muridae, known for their incredible adaptability and presence in nearly every corner of the globe. Characterized by their pointed snouts, large rounded ears, and long, thin tails, these tiny mammals have successfully thrived alongside human civilizations for thousands of years. While often viewed as mere pests in granaries, mice are highly complex creatures with sophisticated social structures and remarkable survival instincts that allow them to inhabit diverse environments ranging from dense forests to urban households.
Biologically, mice are built for stealth and speed. Their whiskers, or vibrissae, are highly sensitive tactile organs that allow them to navigate in total darkness by sensing air currents and physical obstacles. They possess an extraordinary reproductive rate, a necessary evolutionary strategy to counter their role as a primary food source for numerous predators, including owls, snakes, and felines. Despite their small stature, mice are surprisingly intelligent; they exhibit problem-solving abilities and can communicate with one another using ultrasonic vocalizations that are completely inaudible to the human ear.
In the realm of science and history, the mouse has played an indispensable role. Due to their genetic similarity to humans, mice are the most commonly studied model organisms in medical research, contributing to countless breakthroughs in genetics and pharmacology. Whether they are scurrying through a field or living in a controlled laboratory setting, mice demonstrate a level of resilience and biological efficiency that far outweighs their size. Their ability to find food in the most difficult conditions and their cautious, nocturnal nature continue to make them one of the most successful mammalian species on Earth.
Trying semantic search queries
Once the dataset is ready, the next step is to test how the application behaves with natural-language queries.
The interesting part here is that the queries do not need to contain the exact title of the document. In fact, they work best when they describe the concept in a more human way.
Example 1:
I am looking for information about large organisms that live for hundreds of years, have a woody trunk, and use their leaves to turn sunlight into energy while providing shade and stabilized soil for the ecosystem.
Result: Tree (55.66%)
This query contains a lot of extra words such as “I am looking for information about”, but the important semantic signals are still there: woody trunk, sunlight into energy, shade, and stabilized soil. A keyword search might be less reliable here, but semantic search can still understand the meaning and rank Tree as the closest match.
Example 2:
I am looking for information about small domestic predators that were respected in ancient history, are very independent, have excellent night vision, and can land on their feet when they jump from high places.
Result: Cat (39.90%)
This example is useful because the query never says the word cat. Instead, it describes distinctive traits: ancient history, independence, night vision, and landing on their feet. A traditional search for the exact word would fail here, but semantic search can still connect the description to the Cat document.
Example 3:
Tell me about tiny mammals that are often found in houses or fields, which scientists use in laboratories to study genetics and develop new medicines because they breed very fast and are biologically similar to humans.
Result: Mice
This is a good example of a long user-style query. The user might not remember the exact word mice, but they remember the context: small mammals, homes and fields, laboratory research, genetics, and fast reproduction. That is exactly the kind of scenario where semantic search becomes much more useful than plain keyword matching.
What this demo shows
This small demo shows the practical difference between keyword search and semantic search.
The application is not matching documents only by title or exact words. Instead, it compares the semantic meaning of the query vector against the stored document vectors and returns the nearest results. That is why the search can still work even when the user describes the idea indirectly or uses very different wording.
If you want to see the complete flow in action, including model loading, document creation, and semantic search in the UI, you can watch the full video walkthrough here:
Conclusion
This example shows that semantic search can be implemented directly in Atlassian Forge by combining local embeddings in Custom UI with vector search in Forge SQL.
That already gives you a useful AI-powered retrieval flow: the app searches by meaning, not just by exact words, while still preserving Runs on Atlassian eligibility.
It is also a natural foundation for RAG.
If the top matching documents returned by Forge SQL are passed into the Forge LLM API as context, the app can move from semantic search to full Retrieval-Augmented Generation. The retrieval step stays in Forge SQL, and the generation step can be handled by Atlassian-hosted LLMs.
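A minimal sketch of that next step: only buildRagPrompt is concrete here, while the SearchHit shape mirrors the search resolver's return type and callLlm is a placeholder for the actual Atlassian-hosted LLM call, which is out of scope for this article:

```typescript
// Hypothetical RAG glue: take the top matches from the "search" resolver
// and build a grounded prompt for the generation step.
interface SearchHit { title: string; document: string; distance: number; }

function buildRagPrompt(question: string, hits: SearchHit[]): string {
  const context = hits
    .map((hit, i) => `[${i + 1}] ${hit.title}\n${hit.document}`)
    .join("\n\n");
  return `Answer the question using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;
}

// Hypothetical wiring:
// const hits = await invoke<SearchHit[]>("search", { vector });
// const answer = await callLlm(buildRagPrompt(userQuestion, hits));
```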
So this example is not only a semantic search demo. It is also a practical starting point for building a fully Forge-native RAG application.
If you want to explore the full source code, you can find the example application here: Example application on GitHub