Every developer working with Large Language Models quickly learns about vector embeddings: arrays of floating-point numbers that map words, sentences, or images into semantic spaces with hundreds or thousands of dimensions. But while we write API calls to text-embedding-3-small daily, humans lack the biological architecture to conceptualize 1,536-dimensional coordinates.
To bridge this intuition gap, we built Vector Space Explorer: an interactive web visualizer that lets developers input custom vocabularies, perform real vector arithmetic (like puppy - dog + cat = kitten), play semantic clustering games, and examine the raw JSON returned by embedding providers.
Here is the technical breakdown of how we built this application using React 18, Tailwind CSS, and pure client-side linear algebra.
1. Multi-Provider Endpoint Ingestion
Depending on budget or privacy restrictions, developers use different pipelines. To serve all of them, we wrapped our request adapters behind a single interface that speaks several standard API formats and keeps credentials on the client:
- Simulated (Mock Mode): An offline, lightweight client-side embedder that computes coordinates locally, so developers can test layouts instantly without API keys.
- OpenAI Cloud: Requests fetched from the secure gateway using text-embedding-3-small (1,536-dim).
- LM Studio (Local): Allows local offline execution of state-of-the-art open models like nomic-embed-text-v1.5 on port 1234.
- Featherless.ai & OpenRouter: Direct serverless endpoints mapping standard OpenAI-compatible JSON responses.
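To make the Mock Mode bullet concrete, here is one way such an offline embedder could work: hash the word into a seed, then draw a deterministic unit-length vector from a seeded generator. This is only an illustrative sketch, not the project's generateMockEmbedding. The name sketchMockEmbedding, the FNV-1a hash, and the mulberry32-style generator are all assumptions, and the resulting vectors carry no real semantics.

```typescript
// Hypothetical stand-in for a Mock Mode embedder (not the project's actual code).
// The same word always yields the same vector, so layouts are reproducible offline.

// FNV-1a 32-bit string hash, used to turn a word into a numeric seed.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// mulberry32-style seeded generator returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Draws `dimensions` values in [-1, 1) and scales the result to unit length.
function sketchMockEmbedding(word: string, dimensions = 1536): number[] {
  const random = mulberry32(fnv1a(word.toLowerCase()));
  const vector = Array.from({ length: dimensions }, () => random() * 2 - 1);
  const norm = Math.sqrt(vector.reduce((sum, value) => sum + value * value, 0));
  return vector.map((value) => value / norm);
}
```

Because the vectors are pseudo-random, a mock like this is useful for testing rendering and layout, not for checking that related words land near each other.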
Here is how the API layer executes the ingestion of raw words:
// excerpt from src/utils/api.ts
export async function fetchWordEmbedding(
  word: string,
  settings: SystemSettings
): Promise<number[]> {
  if (settings.demoMode) {
    return generateMockEmbedding(word); // Instant client-side PCA-friendly vector
  }

  // Every remote provider is reached through the same OpenAI-style /embeddings route
  const response = await fetch(`${settings.baseUrl}/embeddings`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${settings.apiKey}`
    },
    body: JSON.stringify({
      input: word,
      model: settings.model
    })
  });

  if (!response.ok) {
    throw new Error(`Endpoint status code error: ${response.status}`);
  }

  const json = await response.json();
  // Vector length depends on the model (1,536 floats for text-embedding-3-small)
  return json.data[0].embedding;
}
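The excerpt above trusts that json.data[0].embedding exists. Since several providers sit behind the same adapter, it may be worth validating the payload before it reaches the math layer. The helper below is a hedged sketch: parseEmbeddingResponse is a hypothetical name, and the only shape it assumes is the OpenAI-style `{ data: [{ embedding: number[] }] }` already used in the excerpt.

```typescript
// Hypothetical helper (not part of the project's api.ts): checks that an
// OpenAI-style embeddings payload really contains a numeric vector.
function parseEmbeddingResponse(json: unknown): number[] {
  const data = (json as { data?: unknown } | null)?.data;
  if (!Array.isArray(data) || data.length === 0) {
    throw new Error("Embedding response has no data entries");
  }

  const embedding = (data[0] as { embedding?: unknown } | null)?.embedding;
  if (!Array.isArray(embedding) || embedding.length === 0) {
    throw new Error("Embedding response is missing the embedding array");
  }

  // Reject NaN, Infinity, strings, and nulls before any linear algebra runs.
  const allFinite = embedding.every(
    (value) => typeof value === "number" && Number.isFinite(value)
  );
  if (!allFinite) {
    throw new Error("Embedding contains non-numeric values");
  }
  return embedding as number[];
}
```

With a helper like this, the last line of the excerpt could become `return parseEmbeddingResponse(json);`, so a misconfigured endpoint fails with a readable message instead of a TypeError.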
2. The Math Behind the Visualization: Client-Side PCA
Fetching a 1,536-dimensional array is only the beginning. To render this on a computer screen, we must flatten 1,536 axes down to just 2 dimensions ($X, Y$) while preserving as much cluster structure and relative similarity as possible.
While we could send the data to a Python backend for scikit-learn's PCA, the round trip would hurt UI responsiveness. We solved this by writing a pure TypeScript, client-side Principal Component Analysis engine built on plain matrix math:
Step 2.1: Constructing the Covariance Matrix
First, we stack the embeddings of all words currently in the sandbox into a matrix $X$ (one row per word, $n$ rows in total), mean-center each column, and compute the covariance matrix. It captures how the coordinates vary together.
$$
\Sigma = \frac{1}{n} X^T X
$$
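As a rough illustration of this step, the sketch below mean-centers a set of vectors and accumulates $\frac{1}{n} X^T X$ directly. The function name covarianceMatrix is an assumption made for this example, not an identifier from the project.

```typescript
// Sketch: covariance of mean-centered embeddings, Sigma = (1/n) X^T X.
// `rows` holds one embedding per word; every row must have the same length.
function covarianceMatrix(rows: number[][]): number[][] {
  const n = rows.length;
  if (n === 0) throw new Error("Need at least one vector");
  const d = rows[0].length;

  // Column means, used to center each dimension at zero.
  const mean = new Array<number>(d).fill(0);
  for (const row of rows) {
    for (let j = 0; j < d; j++) mean[j] += row[j] / n;
  }
  const centered = rows.map((row) => row.map((value, j) => value - mean[j]));

  // Accumulate the upper triangle, then mirror it (the matrix is symmetric).
  const sigma = Array.from({ length: d }, () => new Array<number>(d).fill(0));
  for (const row of centered) {
    for (let i = 0; i < d; i++) {
      for (let j = i; j < d; j++) sigma[i][j] += (row[i] * row[j]) / n;
    }
  }
  for (let i = 0; i < d; i++) {
    for (let j = 0; j < i; j++) sigma[i][j] = sigma[j][i];
  }
  return sigma;
}
```

For example, the two rows [1, 2] and [3, 6] have mean [2, 4], centered rows [-1, -2] and [1, 2], and covariance [[1, 2], [2, 4]].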
Step 2.2: Eigendecomposition via the Jacobi Eigenvalue Algorithm
To find the two principal components (the directions that capture the most variance), we need the eigenvectors of the covariance matrix. We wrote an iterative Jacobi sweep solver that diagonalizes symmetric matrices directly in TypeScript:
// Conceptual loop of the Jacobi eigenvalue solver (helper functions omitted)
export function solveEigenvectors(covariance: number[][], maxSweeps = 50) {
  const size = covariance.length;
  const eigenvectors = createIdentityMatrix(size); // accumulates the rotations
  let matrix = cloneMatrix(covariance);

  for (let sweep = 0; sweep < maxSweeps; sweep++) {
    const offDiagonalSum = computeOffDiagonalNorm(matrix);
    if (offDiagonalSum < 1e-9) break; // Diagonalized successfully!

    // One sweep: rotate away every off-diagonal pair (p, q) in turn
    for (let p = 0; p < size; p++) {
      for (let q = p + 1; q < size; q++) {
        const theta = calculateRotationAngle(matrix, p, q);
        const [c, s] = [Math.cos(theta), Math.sin(theta)];
        matrix = applyJacobiRotation(matrix, p, q, c, s);
        updateEigenvectorSet(eigenvectors, p, q, c, s);
      }
    }
  }

  // The diagonal now holds the eigenvalues
  return { eigenvalues: extractDiagonals(matrix), eigenvectors };
}
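The conceptual loop leaves its helpers out, so here is a self-contained sketch of the same idea that can be run as-is. It is not the project's implementation: the name jacobiEigen, the in-place update order, and the choice of rotation angle via $\tan 2\theta = 2a_{pq} / (a_{qq} - a_{pp})$ are assumptions made for this example. Eigenvectors are returned as the columns of the result matrix.

```typescript
// Sketch of a cyclic Jacobi eigenvalue solver for a symmetric matrix.
// Returns eigenvalues plus eigenvectors stored as the COLUMNS of `eigenvectors`.
function jacobiEigen(
  symmetric: number[][],
  maxSweeps = 50,
  tolerance = 1e-9
): { eigenvalues: number[]; eigenvectors: number[][] } {
  const n = symmetric.length;
  const a = symmetric.map((row) => [...row]); // working copy, diagonalized in place
  const v: number[][] = a.map((_, i) => a.map((__, j) => (i === j ? 1 : 0)));

  for (let sweep = 0; sweep < maxSweeps; sweep++) {
    // Frobenius norm of the off-diagonal part; near zero means diagonal.
    let off = 0;
    for (let i = 0; i < n; i++) {
      for (let j = 0; j < n; j++) if (i !== j) off += a[i][j] * a[i][j];
    }
    if (Math.sqrt(off) < tolerance) break;

    for (let p = 0; p < n; p++) {
      for (let q = p + 1; q < n; q++) {
        if (a[p][q] === 0) continue; // already zero, nothing to rotate
        // Angle that zeroes a[p][q]: tan(2*theta) = 2*a_pq / (a_qq - a_pp).
        // Dividing by zero yields +/-Infinity, and atan maps that to +/-pi/2.
        const theta = 0.5 * Math.atan((2 * a[p][q]) / (a[q][q] - a[p][p]));
        const c = Math.cos(theta);
        const s = Math.sin(theta);
        for (let k = 0; k < n; k++) {
          // A <- A*J (columns p and q) and V <- V*J (accumulate eigenvectors).
          const akp = a[k][p];
          const akq = a[k][q];
          a[k][p] = c * akp - s * akq;
          a[k][q] = s * akp + c * akq;
          const vkp = v[k][p];
          const vkq = v[k][q];
          v[k][p] = c * vkp - s * vkq;
          v[k][q] = s * vkp + c * vkq;
        }
        for (let k = 0; k < n; k++) {
          // A <- J^T*A (rows p and q) completes the similarity transform.
          const apk = a[p][k];
          const aqk = a[q][k];
          a[p][k] = c * apk - s * aqk;
          a[q][k] = s * apk + c * aqk;
        }
      }
    }
  }
  return { eigenvalues: a.map((row, i) => row[i]), eigenvectors: v };
}
```

For the matrix [[2, 1], [1, 2]], a single rotation by $\pi/4$ diagonalizes it, giving eigenvalues 1 and 3 with eigenvectors proportional to (1, -1) and (1, 1).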
Step 2.3: Projecting down to 2D
We sort the computed eigenvalues in descending order, pick the eigenvectors belonging to the top two, and project the mean-centered high-dimensional vectors onto those two principal components to get each word's coordinates on the responsive galaxy map.
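The projection step can be sketched as follows. The function name projectTo2D is hypothetical, and the code assumes eigenvectors are stored as columns, so that `eigenvectors[j][k]` is the j-th coordinate of component k.

```typescript
// Sketch: project vectors onto the two principal components with the
// largest eigenvalues. Assumes eigenvectors are stored as columns.
function projectTo2D(
  vectors: number[][],
  eigenvalues: number[],
  eigenvectors: number[][]
): [number, number][] {
  if (vectors.length === 0) return [];
  if (eigenvalues.length < 2) throw new Error("Need at least two components");

  // Component indices ordered from largest to smallest eigenvalue.
  const order = eigenvalues
    .map((_, k) => k)
    .sort((i, j) => eigenvalues[j] - eigenvalues[i]);
  const first = order[0];
  const second = order[1];

  // Same column means that were removed before computing the covariance.
  const mean = vectors[0].map(
    (_, j) => vectors.reduce((sum, v) => sum + v[j], 0) / vectors.length
  );

  // Dot product of the centered vector with component k.
  const coordinate = (v: number[], k: number): number =>
    v.reduce((sum, value, j) => sum + (value - mean[j]) * eigenvectors[j][k], 0);

  return vectors.map((v): [number, number] => [
    coordinate(v, first),
    coordinate(v, second),
  ]);
}
```

The returned pairs are in the units of the original embedding space, so a renderer would still need to scale them to the canvas size.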
3. Why are "puppy" and "dog" far apart visually?
A common question emerges when users construct clusters in Sandbox mode:
"If puppy and dog have a Cosine Similarity of 0.85 (extremely high), why are they rendered far apart on the 2D constellation grid?"
This apparent paradox illustrates a fundamental limitation of dimensional projection.
┌──────────────────────────┐
│ 1,536-Dimensional Space  │ ➔ (true relationship is extremely adjacent)
└─────────────┬────────────┘
              │ PCA Lossy Compress
              ▼
┌──────────────────────────┐
│     2D Screen Canvas     │ ➔ (compressed projection can distort angles)
└──────────────────────────┘
- Loss of Variance Info: These embeddings span 1,536 dimensions. The directions along which two words are similar may lie along, say, eigenvectors #12 or #24, which are discarded entirely when the map is squeezed onto components #1 and #2.
- Global Optimization Context: PCA computes its axes exclusively from the set of sandbox stars currently visible. If you only have "cat", "dog", and "puppy" in the sandbox, the top two components are spent on the small differences among those three words, so they spread apart. Add highly distinct vocabulary (e.g., "planet", "computer", "compiler", and "jupiter") and the dominant variance shifts to the contrast between topics, pulling "dog", "puppy", and "cat" together into a dense animal sub-cluster while pushing tech words across to the opposite quadrant!
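One way to quantify how much information the 2D map discards is the explained variance ratio: the share of total variance carried by the kept eigenvalues. The sketch below is an illustration of that standard PCA quantity, not a feature the article claims the app ships. The name explainedVarianceRatio is an assumption.

```typescript
// Sketch: fraction of total variance retained by the top `components`
// eigenvalues of the covariance matrix (1.0 means nothing was lost).
function explainedVarianceRatio(eigenvalues: number[], components = 2): number {
  // Clamp tiny negative values that can appear from floating-point error.
  const sorted = eigenvalues
    .map((value) => Math.max(value, 0))
    .sort((a, b) => b - a);
  const total = sorted.reduce((sum, value) => sum + value, 0);
  if (total === 0) return 0;
  const kept = sorted
    .slice(0, components)
    .reduce((sum, value) => sum + value, 0);
  return kept / total;
}
```

With eigenvalues 5, 3, 1, and 1, the top two carry 8 of 10 units of variance, a ratio of 0.8. A low ratio is a hint that on-screen distances should be read with caution.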
To offset this limitation, our side Telemetry Panel calculates both the 2D Euclidean offset and the True High-Dimensional API Cosine Similarity as you select nodes, teaching developers how to interpret projection skew.
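The comparison the Telemetry Panel makes can be sketched in a few lines. The function names and the returned field names here are hypothetical, since the article does not show the panel's code.

```typescript
// Cosine similarity between two full-dimensional embeddings.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical telemetry record: on-screen distance next to the true similarity.
function compareNodes(
  embeddingA: number[],
  embeddingB: number[],
  pointA: [number, number],
  pointB: [number, number]
): { screenDistance: number; cosine: number } {
  return {
    screenDistance: Math.hypot(pointA[0] - pointB[0], pointA[1] - pointB[1]),
    cosine: cosineSimilarity(embeddingA, embeddingB),
  };
}
```

Showing both numbers side by side makes projection skew visible: two stars can have a large screen distance while their cosine similarity stays high.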
4. Deep Vector Algebra Calculations
One of the application's proudest features is its interactive Algebra Lab, which allows executing vector math directly in the browser.
Executing vector("puppy") - vector("dog") + vector("cat") produces a new 1,536-dimensional target vector. We then search every active sandbox star for the closest semantic neighbor, ranked by cosine similarity:
$$
\text{Similarity}(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|}
$$
The app traces an animated emerald trajectory line showing how close the synthesized vector came to landing on its semantic target (like #kitten).
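A minimal sketch of the Algebra Lab search might look like this. The Star type, the function names, and the optional skipInputs flag are assumptions for illustration. The flag reflects a common convention of ignoring the three input words when ranking, which the article does not say the app applies.

```typescript
// Hypothetical shape of a sandbox entry.
type Star = { word: string; embedding: number[] };

function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((sum, value, i) => sum + value * b[i], 0);
  const normA = Math.sqrt(a.reduce((sum, value) => sum + value * value, 0));
  const normB = Math.sqrt(b.reduce((sum, value) => sum + value * value, 0));
  return dot / (normA * normB);
}

// Computes vector(a) - vector(b) + vector(c) and returns the most similar star.
function nearestToAnalogy(
  stars: Star[],
  a: string,
  b: string,
  c: string,
  skipInputs = false
): { word: string; similarity: number } | null {
  const lookup = (word: string): number[] => {
    const star = stars.find((candidate) => candidate.word === word);
    if (!star) throw new Error(`"${word}" is not in the sandbox`);
    return star.embedding;
  };
  const va = lookup(a);
  const vb = lookup(b);
  const vc = lookup(c);
  const target = va.map((value, i) => value - vb[i] + vc[i]);

  let best: { word: string; similarity: number } | null = null;
  for (const star of stars) {
    const isInput = star.word === a || star.word === b || star.word === c;
    if (skipInputs && isInput) continue;
    const similarity = cosine(target, star.embedding);
    if (best === null || similarity > best.similarity) {
      best = { word: star.word, similarity };
    }
  }
  return best;
}
```

With toy three-dimensional vectors dog = [1, 0, 0], puppy = [1, 0, 1], cat = [0, 1, 0], and kitten = [0, 1, 1], the expression puppy - dog + cat equals [0, 1, 1], so the search lands on kitten. Real embeddings rarely land this cleanly, which is why the similarity score is worth displaying.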
By maintaining robust client state, rendering dynamic SVG/Canvas layers smoothly, and keeping API keys in browser memory only, with no server-side tracking, the application keeps sandbox interactions stateless, private, and educational.
Fork the codebase on GitHub to explore adding 3D orbital environments, cluster mapping, or real-time Vector DB indexing visuals today!
Code and more: https://www.dailybuild.xyz/project/139-vector-space-explorer