Do you want the search in your FAQ system to return more precise and relevant results? A semantic search feature might be the solution you've been looking for. This article walks you through building a semantic search system with Storyblok and Orama.
To implement the semantic search, we are going to follow this process:
- extracting the embeddings from our FAQ content from Storyblok;
- creating indexes for the embeddings in Orama;
- performing a search based on the user input.
We will build a script that implements this process in JavaScript, using a few libraries as dependencies.
Installing the JavaScript libraries
We need to install these dependencies:
- Storyblok JS Client: for retrieving content from Storyblok ( https://www.npmjs.com/package/storyblok-js-client )
- Orama SDK: for creating indexes and performing the search ( https://www.npmjs.com/package/@orama/orama )
- Transformers.js: for running the model that creates embeddings from the text ( https://www.npmjs.com/package/@xenova/transformers )
- Prompts: for managing the user input ( https://www.npmjs.com/package/prompts ).
To install all of these packages in your JavaScript project you can use these commands:
npm i --save storyblok-js-client
npm i --save @orama/orama
npm i --save @xenova/transformers
npm i --save prompts
The source code
Now, we are going to create a new JavaScript file. The next sections will cover all the steps needed to implement the search.
Importing the libraries
In the JavaScript file, start by importing the packages we are going to use:
import { create, insertMultiple, search } from "@orama/orama";
import { pipeline } from "@xenova/transformers";
import prompts from "prompts";
import StoryblokClient from "storyblok-js-client";
Initializing the Storyblok Client
To access the FAQ entries in Storyblok that we want to index into Orama, we have to initialize the StoryblokClient with the access token:
const Storyblok = new StoryblokClient({
  accessToken: "youraccesstoken",
  cache: {
    clear: "auto",
    type: "memory",
  },
});
To retrieve your Storyblok access token, see: https://www.storyblok.com/faq/retrieve-and-generate-access-tokens
Getting the content from Storyblok
To retrieve the content, we perform an HTTP API call to the stories endpoint, filtering for specific content (the starts_with parameter) and limiting the number of items (the per_page parameter):
const response = await Storyblok.get("cdn/stories", {
  starts_with: "faq/",
  per_page: 100,
});
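Because per_page caps how many items a single call returns, a larger FAQ would need several calls. Here is a minimal sketch of that idea. The helper name fetchAllStories and its fetchPage callback are hypothetical and not part of the Storyblok client; the sketch only assumes that each call returns one page of stories and that a page shorter than perPage marks the end.

```javascript
// Collects all stories by requesting page after page.
// fetchPage is a hypothetical callback: (page, perPage) => Promise<Array of stories>.
async function fetchAllStories(fetchPage, perPage = 100) {
  const all = [];
  let page = 1;
  while (true) {
    const stories = await fetchPage(page, perPage);
    all.push(...stories);
    // A short (or empty) page means there is nothing left to fetch.
    if (stories.length < perPage) break;
    page += 1;
  }
  return all;
}

// Possible usage with the client from this article (assumes the stories
// endpoint accepts a `page` parameter next to `per_page`):
//
// const stories = await fetchAllStories((page, perPage) =>
//   Storyblok.get("cdn/stories", { starts_with: "faq/", per_page: perPage, page })
//     .then((response) => response.data.stories),
// );
```

For a small FAQ with fewer than 100 entries, the single call shown above is enough.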
Running the model
For indexing the questions of the FAQ (the text), we have to generate embeddings, which are numerical vectors representing the semantic meaning of the text.
We are going to use the GTE model: https://huggingface.co/Supabase/gte-small
const pipe = await pipeline("feature-extraction", "Supabase/gte-small");
let output = null;
let embedding = null;
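To make the idea of "vectors representing meaning" more concrete, here is a small sketch of cosine similarity, a common way to compare two embeddings. The function name and the three-dimensional toy vectors are made up for illustration; real embeddings from this model are much longer, and this is not necessarily how Orama computes its scores internally.

```javascript
// Cosine similarity between two equal-length numeric vectors.
// The result is in [-1, 1]; higher means the vectors point in a more similar direction.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Convention chosen for this sketch: a zero vector is similar to nothing.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors, invented for illustration only.
const refundQuestion = [0.9, 0.1, 0.0];
const moneyBackQuestion = [0.8, 0.2, 0.1];
const unrelatedQuestion = [0.0, 0.1, 0.9];

console.log(cosineSimilarity(refundQuestion, moneyBackQuestion)); // close to 1
console.log(cosineSimilarity(refundQuestion, unrelatedQuestion)); // close to 0
```

Two questions phrased with different words but similar meaning should end up with vectors that score high here, which is what makes semantic search possible.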
Collecting data for indexes
Looping through the Storyblok content, we are going to fill an array. Each element has a name, the "full slug", and the embedding.
const arrayInserting = [];
// for...of waits for each embedding before continuing;
// forEach does not wait for async callbacks, so the array could still be empty at insert time.
for (const element of response.data.stories) {
  output = await pipe(element.name, {
    pooling: "mean",
    normalize: true,
  });
  embedding = Array.from(output.data);
  arrayInserting.push({
    name: element.name,
    full_slug: element.full_slug,
    embedding: embedding,
  });
}
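The pooling: "mean" and normalize: true options used above can be illustrated with a simplified sketch: mean pooling averages the per-token vectors into one vector for the whole text, and normalization scales that vector to length 1. The function names and toy numbers below are hypothetical, and the library's own pooling likely handles details (such as padding tokens) that this sketch ignores.

```javascript
// Averages a list of per-token vectors into a single vector (simplified mean pooling).
function meanPool(tokenVectors) {
  const dims = tokenVectors[0].length;
  const pooled = new Array(dims).fill(0);
  for (const vector of tokenVectors) {
    for (let i = 0; i < dims; i++) {
      pooled[i] += vector[i];
    }
  }
  return pooled.map((value) => value / tokenVectors.length);
}

// Scales a vector so that its Euclidean (L2) length becomes 1.
function l2Normalize(vector) {
  const norm = Math.sqrt(vector.reduce((sum, value) => sum + value * value, 0));
  // A zero vector cannot be scaled; return a copy unchanged.
  return norm === 0 ? vector.slice() : vector.map((value) => value / norm);
}

// Toy example: two "token" vectors with two dimensions each (invented values).
const sentenceVector = l2Normalize(meanPool([[1, 2], [5, 6]]));
console.log(sentenceVector);
```

With normalized vectors, the dot product of two embeddings equals their cosine similarity, which keeps the comparison at search time simple.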
Creating Orama indexes
To perform a semantic search, we create the index by defining a schema and then insert the content we want indexed.
The embeddings must be stored as vector[384] for the semantic search. The length of the vector depends on the model used.
const db = await create({
  schema: {
    name: "string",
    full_slug: "string",
    embedding: "vector[384]",
  },
});
await insertMultiple(db, arrayInserting);
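Since the vector length in the schema has to match what the model produces, a quick sanity check before inserting can help catch a mismatch early, for example after switching models. This is only a sketch: the helper name findInvalidEmbeddings and the constant EMBEDDING_SIZE are hypothetical and not part of Orama.

```javascript
// Expected vector length, matching the "vector[384]" schema entry.
const EMBEDDING_SIZE = 384;

// Returns the documents whose embedding is missing, has the wrong length,
// or contains non-finite numbers.
function findInvalidEmbeddings(documents, expectedSize = EMBEDDING_SIZE) {
  return documents.filter(
    (doc) =>
      !Array.isArray(doc.embedding) ||
      doc.embedding.length !== expectedSize ||
      doc.embedding.some((value) => !Number.isFinite(value)),
  );
}

// Possible usage before inserting into the index:
//
// const invalid = findInvalidEmbeddings(arrayInserting);
// if (invalid.length > 0) {
//   throw new Error(`Found ${invalid.length} documents with unexpected embeddings`);
// }
```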
Asking for the user input
const inputUser = await prompts({
  type: "text",
  name: "question",
  message: "Ask me something about Storyblok",
});
const stringToSearch = inputUser.question;
Performing the search
Now stringToSearch holds the text to match against our index. To perform the search, we compute the embedding of that string and pass it to Orama's vector search:
output = await pipe(stringToSearch, {
  pooling: "mean",
  normalize: true,
});
embedding = Array.from(output.data);
const results = await search(db, {
  mode: "vector",
  vector: {
    value: embedding,
    property: "embedding",
  },
  similarity: 0.85, // Minimum similarity. Defaults to `0.8`
  includeVectors: true, // Defaults to `false`
  limit: 10, // Defaults to `10`
  offset: 0, // Defaults to `0`
});
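Conceptually, the similarity, limit, and offset options describe a "score, filter, sort, slice" process. The sketch below shows that idea as a brute-force search over an in-memory array. It is an illustration under the assumption that all vectors are normalized (so the dot product acts as the similarity score); it is not Orama's actual implementation, and the function names are hypothetical.

```javascript
// Dot product of two equal-length vectors.
// For normalized vectors this equals their cosine similarity.
function dotProduct(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

// Scores every document against the query, keeps those at or above the
// minimum similarity, sorts best-first, and applies offset and limit.
function bruteForceVectorSearch(
  documents,
  queryEmbedding,
  { similarity = 0.8, limit = 10, offset = 0 } = {},
) {
  return documents
    .map((document) => ({
      score: dotProduct(document.embedding, queryEmbedding),
      document,
    }))
    .filter((hit) => hit.score >= similarity)
    .sort((a, b) => b.score - a.score)
    .slice(offset, offset + limit);
}

// Toy data with 2-dimensional, already normalized vectors (invented values).
const toyDocuments = [
  { name: "How do I create a space?", embedding: [1, 0] },
  { name: "How do I invite a user?", embedding: [0.6, 0.8] },
  { name: "How do I delete my account?", embedding: [0, 1] },
];
const toyHits = bruteForceVectorSearch(toyDocuments, [1, 0], { similarity: 0.5 });
toyHits.forEach((hit) => console.log(hit.score, hit.document.name));
```

Raising the similarity threshold returns fewer but closer matches, while lowering it returns more results at the risk of less relevant ones.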
Showing the results
Now the results object holds the search results, so we can loop over results.hits and access the data the search found.
console.log("");
if (results.hits.length === 0) {
  console.log(`I can't find anything about ${stringToSearch}`);
  console.log(
    "Maybe you can add a new entry for the Frequently Asked Questions in Storyblok",
  );
}
if (results.hits.length === 1) {
  console.log(`I found a link for you for: ${stringToSearch}`);
}
if (results.hits.length > 1) {
  console.log(`I found some useful links for you for: ${stringToSearch}`);
}
console.log("");
// Print the name and URL of each matching FAQ entry
results.hits.forEach((element) => {
  console.log(` ✨ ${element.document.name}`);
  console.log(` 🔗 https://storyblok.com/${element.document.full_slug}`);
  console.log("");
});
Semantic search goes beyond the limits of syntactic (keyword) search, which typically locates text containing specific isolated words. With this approach, you can find similar content based on meaning rather than exact wording.
Watch the Video
I also created a short video to show the process.