Introduction ๐
In this article, we are going to explore cosine similarity and how to use it to implement a very simple yet effective search bar. Although the topic includes a bit of math, the implementation will be as simplistic as possible, yet we are still going to cover exactly how things work under the hood.
The problem ๐ค
If you are here, you are probably wondering, "How do I search for things in the first place?". The answer to this question lies in a field of computer science known as Data Retrieval. We are going to use one of the most fundamental methods used in data retrieval which, while not perfect, is surprisingly easy to implement and yields satisfying results.
Cosine Similarity ๐งฎ
Cosine similarity is a measure used to calculate how similar two vectors are. In simple terms, it is the cosine of the angle formed by the two vectors in space. Consider the following vectors:
In this example, the cosine similarity between vectors A and B is the cosine of angle ฮฑ, which is cos(26.57ยฐ) = 0.8943
This means that the vectors are quite similar.
We can also use this method to calculate the similarity of words.
Imagine we had words that could only have the letters A and B.
Word 1: "aaab"
Word 2: "abb"
Users input word: "aab"
The x-Axis will represent the amount of As and y-Axis the amount of Bs in each word.
At first glance, we can already see that our input "aab" is much closer to "aaab" than to "abb". Let's go ahead and calculate the cosine of each angle:
cos(8.13ยฐ) = 0.98
cos(36.86ยฐ) = 0.80
This is how we can represent words in a 2D space. But this by itself is not useful. In our world, we use more than just "a" and "b" characters. The English alphabet alone has 26 base characters. Imagine we wanted to support special characters as well, or even multiple languages.
Here we will need to use the general formula for cosine similarity:
Here, vectors A and B are n-dimensional. Note that they should have the same dimensions.
We will compare the following words using this formula:
Word 1 : "banana"
Word 2 : "apple"
User input: "bana"
For each word, we will create a 26-dimensional vector. Each slot represents the frequency of the n-th letter of the alphabet in a given word.
"banana":ย [3,1,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0]
"apple": ย ย ย [1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,2,0,0,0,0,0,0,0,0,0,0]
"bana":ย ย ย ย [2,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0]
cos(bana,banana) = 0.982
cos(bana,apple) = 0.309
Therefore, the word "bana" is more closely related to "banana" than to "apple".
Enough math yapping, time to code ๐ป
We are going to create a cosine similarity search bar in React.
Let's start:
Create a new React app
npx create-react-app search-bar-app
Create a new SearchBar.tsx file
This is our main component.
Add the SearchBar
component inside App.js
.
import logo from './logo.svg';
import './App.css';
import { SearchBar } from './SearchBar.tsx';
function App() {
return (
<div className="App">
<header className="App-header">
<div>
<SearchBar/>
</div>
</header>
</div>
);
}
export default App;
From your terminal, install the compute-cosine-similarity package
npm i compute-cosine-similarity
Let's start building our component.
We need 3 variables: one for the query given by the user, one for the results to be displayed back to them, and finally the list of searchable items.
const [query, setquery] = useState<string>("");
const [results, setResults] = useState<string[]>([]);
const items = [
'Apples',
'Bananas',
'Carrots',
'Dairy Milk',
'Eggs',
'Flour',
'Grapes',
'Honey',
'Ice Cream',
'Juice',
'Kale',
'Lettuce',
'Milk',
'Nuts',
'Oranges',
'Pasta',
'Quinoa',
'Rice',
'Spinach',
'Tomatoes',
'Udon Noodles',
'Vanilla Extract',
'Watermelon',
'Yogurt',
'Zucchini'
];
Time for our component's logic.
When a user provides new input, we have to calculate the cosine similarity between the input and every word in our list. Then we return the list of items ordered by cosine similarity in decreasing order.
To calculate the similarity, for each word we construct an array of size 26. Each position in the array represents a letter of the English alphabet. For each character of a word, if the character is an English character, we increase the value of the character's position by 1.
function isEnglishAlphabet(character: string): boolean {
return /^[a-zA-Z]$/.test(character);
}
function calculateCosineSimilarity(firstWord:string, secondWord:string):number {
const firstWordMatrix = new Array(26).fill(0);
const secondWordMatrix = new Array(26).fill(0);
for (let character of firstWord) {
if (isEnglishAlphabet(character)) {
const index = character.toLowerCase().charCodeAt(0) - 97;
firstWordMatrix[index]++;
}
}
for (let character of secondWord) {
if (isEnglishAlphabet(character)) {
const index = character.toLowerCase().charCodeAt(0) - 97;
secondWordMatrix[index]++;
}
}
console.log(cosineSimilarity(firstWordMatrix, secondWordMatrix))
return cosineSimilarity(firstWordMatrix, secondWordMatrix) || 0;
}
Notice how we access the position of the letter in the matrix. If it is an English character, we get its ASCII value and subtract 97, which is the ASCII value of the letter "a".
We can now use this method on every item to return the results to the user. For each item, we create a pair of its name and cosine similarity with the user input. Then we filter out the items that are less than 0.5 similar and finally order by similarity.
function search():void {
let newResults = items.map(item =>({
name: item,
similarity: calculateCosineSimilarity(query,item)
}))
.filter(item => item.similarity > 0.5)
.sort((a, b) => b.similarity - a.similarity)
.map(item => item.name);
setResults(newResults);
}
We can gather all the methods above in a single useEffect
that runs whenever the user input changes.
useEffect(() => {
function isEnglishAlphabet(character: string): boolean {
return /^[a-zA-Z]$/.test(character);
}
function calculateCosineSimilarity(firstWord:string, secondWord:string):number {
const firstWordMatrix = new Array(26).fill(0);
const secondWordMatrix = new Array(26).fill(0);
for (let character of firstWord) {
if (isEnglishAlphabet(character)) {
const index = character.toLowerCase().charCodeAt(0) - 97;
firstWordMatrix[index]++;
}
}
for (let character of secondWord) {
if (isEnglishAlphabet(character)) {
const index = character.toLowerCase().charCodeAt(0) - 97;
secondWordMatrix[index]++;
}
}
console.log(cosineSimilarity(firstWordMatrix, secondWordMatrix))
return cosineSimilarity(firstWordMatrix, secondWordMatrix) || 0;
}
function search():void {
let newResults = items.map(item =>({
name: item,
similarity: calculateCosineSimilarity(query,item)
}))
.filter(item => item.similarity > 0.5)
.sort((a, b) => b.similarity - a.similarity)
.map(item => item.name);
setResults(newResults);
}
search();
}, [query])
For the final step, we need to create the view of our component. We'll use a simple input that holds the value of "query" and whenever the value inside changes, the query will be updated. This will trigger the useEffect
we created. We will map the results in a list just below the search bar.
return (
<div>
<input
type="text"
placeholder="Search..."
value={query}
onChange={(e) => setquery(e.target.value)}
/>
<ul>
{results.map((item, index) => (
<li key={index}>{item}</li>
))}
</ul>
</div>
);
Final result โจ๐
Go ahead and type something into the search bar. The top results should be very relevant to what you are looking for!
Full search bar component code:
import React, { useEffect, useState } from 'react';
import cosineSimilarity from 'compute-cosine-similarity';
export const SearchBar = () => {
const [query, setquery] = useState<string>("");
const [results, setResults] = useState<string[]>([]);
const items = [
'Apples',
'Bananas',
'Carrots',
'Dairy Milk',
'Eggs',
'Flour',
'Grapes',
'Honey',
'Ice Cream',
'Juice',
'Kale',
'Lettuce',
'Milk',
'Nuts',
'Oranges',
'Pasta',
'Quinoa',
'Rice',
'Spinach',
'Tomatoes',
'Udon Noodles',
'Vanilla Extract',
'Watermelon',
'Yogurt',
'Zucchini'
];
useEffect(() => {
function isEnglishAlphabet(character: string): boolean {
return /^[a-zA-Z]$/.test(character);
}
function calculateCosineSimilarity(firstWord:string, secondWord:string):number {
const firstWordMatrix = new Array(26).fill(0);
const secondWordMatrix = new Array(26).fill(0);
for (let character of firstWord) {
if (isEnglishAlphabet(character)) {
const index = character.toLowerCase().charCodeAt(0) - 97;
firstWordMatrix[index]++;
}
}
for (let character of secondWord) {
if (isEnglishAlphabet(character)) {
const index = character.toLowerCase().charCodeAt(0) - 97;
secondWordMatrix[index]++;
}
}
console.log(cosineSimilarity(firstWordMatrix, secondWordMatrix))
return cosineSimilarity(firstWordMatrix, secondWordMatrix) || 0;
}
function search():void {
let newResults = items.map(item =>({
name: item,
similarity: calculateCosineSimilarity(query,item)
}))
.filter(item => item.similarity > 0.5)
.sort((a, b) => b.similarity - a.similarity)
.map(item => item.name);
setResults(newResults);
}
search();
}, [query])
return (
<div>
<input
type="text"
placeholder="Search..."
value={query}
onChange={(e) => setquery(e.target.value)}
/>
<ul>
{results.map((item, index) => (
<li key={index}>{item}</li>
))}
</ul>
</div>
);
}
Top comments (0)