Context First AI

Meaning Has a Shape: How AI Models Represent Concepts (and Why It Changes Everything About Search)

AI models represent meaning as a location in a high-dimensional space: words with similar meanings sit near each other, and unrelated concepts sit far apart. This is called an embedding. Understanding embeddings explains how semantic search works, why AI sometimes confidently produces outdated answers, and what to do about it. No maths required.

This is Part 3 of a five-part series from the Vectors pillar of Context First AI. Built for anyone starting their AI journey — developer or not. Parts 1 and 2 covered next-token prediction and tokenisation respectively. This part goes deeper into how meaning is represented.

Full series:

  • Part 1 — The Autocomplete That Ate the World
  • Part 2 — You're Not Reading Words, You're Reading Chunks
  • Part 3 — Meaning Has a Shape
  • Part 4 — You're Not Writing Prompts, You're Writing Instructions for a Very Particular Mind
  • Part 5 — What to Do When the Model Doesn't Know Enough

The Search That Kept Failing

It started during an internal tool evaluation — a 30-person team building an AI assistant over their HR documentation.

The setup seemed sound. Documents indexed. Search connected. Interface clean. But when the team ran test queries, they kept getting empty results on things that clearly existed. "How do I request time off?" — nothing. The policy was right there in the knowledge base, under the title "Leave Application Process."

The tool wasn't broken. The issue was more fundamental: the search was matching strings, not meanings. And those two things, it turns out, are very different problems.
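To make "matching strings, not meanings" concrete, here is a toy keyword matcher in plain Python (a sketch; the stop-word list and tokenisation are illustrative assumptions, far cruder than a real search engine):

```python
# A naive keyword search: a document matches only if it shares
# content words with the query. Stop words are filtered out first.
STOP_WORDS = {"how", "do", "i", "a", "the", "to", "via", "should"}

def keyword_match(query: str, document: str) -> set[str]:
    """Return the content words shared by query and document."""
    q = {w.lower().strip("?.,:") for w in query.split()} - STOP_WORDS
    d = {w.lower().strip("?.,:") for w in document.split()} - STOP_WORDS
    return q & d

query = "How do I request time off?"
doc = "Leave Application Process: Employees should submit a leave application."

print(keyword_match(query, doc))  # set() -- no shared content words
```

The query's content words (request, time, off) and the policy's (leave, application, process) share nothing, so a string matcher returns nothing, even though a human sees the connection instantly.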

What an Embedding Actually Is

To understand why the search failed — and how to fix it — you need to understand embeddings.

An embedding is a location. When a model processes a word, a sentence, or an entire document, it converts that text into a list of numbers — typically hundreds or thousands of them — that encodes its position in a high-dimensional space.

The key property of this space: semantic similarity maps to geometric proximity. Words and phrases that appear in similar contexts during training end up placed near each other. Unrelated concepts end up far apart.

You can see this directly using a sentence transformer model:

from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

phrases = [
    "how do I request time off",
    "employees should submit a leave application",
    "quarterly budget forecast",
    "invoice payment terms",
]

embeddings = model.encode(phrases)

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Compare the first phrase against all others
query = embeddings[0]
for i, phrase in enumerate(phrases[1:], 1):
    similarity = cosine_similarity(query, embeddings[i])
    print(f"Similarity to '{phrase}': {similarity:.3f}")

Running this produces something like:

Similarity to 'employees should submit a leave application': 0.721
Similarity to 'quarterly budget forecast': 0.082
Similarity to 'invoice payment terms': 0.047

The query and the HR policy sentence are close in embedding space (high cosine similarity) even though they share no keywords. The budget and invoice phrases are far away. The geometry reflects the meaning.
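A small aside on the cosine similarity formula itself: on unit-length vectors, cosine similarity reduces to a plain dot product, which is why many pipelines normalise embeddings at encode time (sentence-transformers supports this via `normalize_embeddings=True` on `encode`). A minimal demonstration with toy 2-D vectors standing in for embeddings:

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy stand-ins for embedding vectors
a = np.array([3.0, 4.0])
b = np.array([4.0, 3.0])

# Normalise each to unit length; cosine similarity is then just a dot product
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)

print(cosine_similarity(a, b))   # 0.96
print(np.dot(a_unit, b_unit))    # 0.96 -- identical
```

This matters in practice because a dot product over pre-normalised vectors is cheaper than recomputing norms on every comparison.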

The Vector Arithmetic Example

The famous demonstration of what this space encodes is worth working through directly.

from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

# Encode the four concepts
king, man, woman, queen = model.encode(["king", "man", "woman", "queen"])

# Vector arithmetic: king - man + woman
result = king - man + woman

# Find similarity to queen
def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(f"king - man + woman → similarity to 'queen': {cosine_similarity(result, queen):.3f}")
print(f"king - man + woman → similarity to 'king': {cosine_similarity(result, king):.3f}")
print(f"king - man + woman → similarity to 'man': {cosine_similarity(result, man):.3f}")


The result vector sits closer to *queen* than to any of the input words. Nobody wrote that relationship. It emerged from the geometry of how those words were used across training data — because *king* and *queen* appear in analogous contexts to *man* and *woman* respectively, and the model encoded that analogy structurally.

This is what it means for meaning to have a shape.

Building a Minimal Semantic Search

The practical application for the HR team's problem is semantic search — retrieval by meaning rather than keyword matching.

Here's a minimal working example:

from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

# Knowledge base documents (simplified)
documents = [
    "Leave Application Process: Employees should submit a leave application via the HR portal at least 5 working days in advance.",
    "Expense Reimbursement Policy: All business expenses must be submitted within 30 days of incurrence with receipts attached.",
    "Remote Work Guidelines: Employees may work remotely up to 3 days per week subject to manager approval.",
    "Performance Review Schedule: Annual reviews are conducted in January and July each year.",
]

# Index: convert all documents to embeddings at setup time
document_embeddings = model.encode(documents)

def semantic_search(query: str, top_k: int = 2) -> list[dict]:
    query_embedding = model.encode(query)

    similarities = []
    for i, doc_embedding in enumerate(document_embeddings):
        similarity = np.dot(query_embedding, doc_embedding) / (
            np.linalg.norm(query_embedding) * np.linalg.norm(doc_embedding)
        )
        similarities.append({"document": documents[i], "score": float(similarity)})

    return sorted(similarities, key=lambda x: x["score"], reverse=True)[:top_k]

# Test with the original failing query
results = semantic_search("how do I request time off")
for r in results:
    print(f"Score: {r['score']:.3f}")
    print(f"Document: {r['document'][:80]}...")
    print()

Output:

Score: 0.698
Document: Leave Application Process: Employees should submit a leave application via the HR...

Score: 0.201
Document: Remote Work Guidelines: Employees may work remotely up to 3 days per week subject...

The correct document surfaces at the top, despite sharing no keywords with the query. The retrieval is driven entirely by the semantic proximity of the embeddings.

This is the architecture underneath virtually every AI tool that finds relevant information from a document store.

Two Kinds of Knowledge — and Two Different Failure Modes

Understanding embeddings opens up a second important distinction: the difference between what a model learned and what you give it.

Parametric knowledge is baked into the model's weights during pre-training. It's vast — facts, concepts, patterns, cultural context — but it was fixed at a point in time. The model has a training cutoff, and it cannot update itself after that. Critically, it often doesn't know when it's uncertain about something in this category. It can sound equally confident whether it's right or drawing on outdated information.

Contextual knowledge is whatever you supply in the prompt — a document, a data extract, a policy, a set of instructions. The model processes this at inference time and can reason over it carefully, because it's right there in the context window.

The failure modes are different, and diagnosable:

import openai

client = openai.OpenAI()

# Parametric knowledge failure — asking the model to recall something
# it may not have, or that may have changed since training
parametric_prompt = """
What is the current interest rate set by the Bank of England?
"""

# Contextual knowledge approach — supply the information, ask for reasoning
contextual_prompt = """
The Bank of England's Monetary Policy Committee voted on [DATE] to set the 
base rate at [CURRENT RATE]%. This decision was driven by [REASON].

Based on the above, what is the current Bank of England base rate, 
and what was the stated rationale?
"""

# The parametric approach may return outdated or hallucinated data.
# The contextual approach constrains the model to reason from what you've supplied.

# Practical pattern: when accuracy matters, always use the contextual approach.
# Supply the source. Ask the model to reason from it, not from training memory.

The pattern is simple: for anything time-sensitive, domain-specific, or where you need verifiable accuracy, don't ask the model to retrieve from memory. Give it the information and ask it to reason over what you've supplied.

This is, in structural terms, the core idea behind RAG (retrieval-augmented generation) — which we cover properly in Part 5.
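The supply-the-source pattern lends itself to a small reusable helper. This is a hypothetical sketch (the template wording and the `grounded_prompt` name are my own, not a standard API), but it captures the shape of the pattern:

```python
def grounded_prompt(source: str, question: str) -> str:
    """Build a prompt that asks the model to answer only from the supplied source.

    Hypothetical helper: the template wording is one reasonable choice,
    not a fixed convention.
    """
    return (
        "Answer using ONLY the source below. "
        "If the source does not contain the answer, say so.\n\n"
        f"Source:\n{source}\n\n"
        f"Question: {question}"
    )

prompt = grounded_prompt(
    source="Annual reviews are conducted in January and July each year.",
    question="When are performance reviews held?",
)
print(prompt)
```

The "say so if it's not there" instruction is the important part: it gives the model an explicit alternative to filling gaps from parametric memory.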

Choosing an Embedding Model

Not all embedding models are equal, and the choice matters for retrieval quality.

# Common embedding model options and their tradeoffs

embedding_models = {
    "all-MiniLM-L6-v2": {
        "dimensions": 384,
        "speed": "fast",
        "quality": "good for general use",
        "use_case": "prototyping, general semantic search"
    },
    "all-mpnet-base-v2": {
        "dimensions": 768,
        "speed": "moderate",
        "quality": "higher quality general embeddings",
        "use_case": "production general search"
    },
    "text-embedding-3-small": {
        "dimensions": 1536,
        "speed": "API call latency",
        "quality": "strong across many domains",
        "use_case": "OpenAI ecosystem integration"
    },
    "text-embedding-3-large": {
        "dimensions": 3072,
        "speed": "API call latency",
        "quality": "highest quality in OpenAI family",
        "use_case": "high-stakes retrieval, complex domains"
    },
}

# Key principle: the embedding model used at index time and query time
# must be the same model. Mixing models produces meaningless similarity scores.

# For specialist domains (legal, medical, scientific), consider domain-specific
# embedding models trained on that vocabulary — general models may underperform
# on highly specialised terminology.
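The same-model principle fails loudly in the simplest case: models with different output dimensions (384 for all-MiniLM-L6-v2 vs 768 for all-mpnet-base-v2, per the table above) produce vectors you cannot even compare. A sketch using random vectors as stand-ins for the two models' embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in embeddings: 384 dims (e.g. all-MiniLM-L6-v2 at index time)
# vs 768 dims (e.g. all-mpnet-base-v2 at query time)
query_embedding = rng.standard_normal(384)
doc_embedding = rng.standard_normal(768)

try:
    np.dot(query_embedding, doc_embedding)
except ValueError as e:
    print(f"Cannot compare: {e}")
```

The subtler failure is worse: if two different models happen to share a dimension count, the dot product computes without error, but each axis means something different in each model's space, so the resulting "similarity" scores are noise.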

The final point in the comment block is worth emphasising: a poorly trained embedding model produces a space where proximity doesn't reliably encode meaning. Retrieval becomes unreliable in ways that look like the wrong documents being returned — which is often misdiagnosed as a model quality issue rather than an embedding quality issue.
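One way to catch embedding-quality problems before they reach production is a small sanity check: known-similar pairs from your domain should always score above known-dissimilar pairs. A sketch of that check, with a toy bag-of-words encoder as a stand-in for a real embedding model (the `toy_encode` function and fixed vocabulary are illustrative assumptions):

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def passes_sanity_check(encode, similar_pairs, dissimilar_pairs) -> bool:
    """Every known-similar pair should score higher than every dissimilar pair."""
    sim_scores = [cosine(encode(a), encode(b)) for a, b in similar_pairs]
    dis_scores = [cosine(encode(a), encode(b)) for a, b in dissimilar_pairs]
    return min(sim_scores) > max(dis_scores)

# Toy stand-in encoder: bag of words over a fixed vocabulary
VOCAB = ["leave", "holiday", "time", "invoice", "budget", "payment"]

def toy_encode(text):
    words = text.lower().split()
    return np.array([float(w in words) for w in VOCAB])

print(passes_sanity_check(
    toy_encode,
    similar_pairs=[("leave holiday", "holiday time")],
    dissimilar_pairs=[("leave holiday", "invoice payment")],
))  # True
```

With a real model, `encode` would be `model.encode` and the pairs would come from your actual domain vocabulary; a model that fails this check on your terminology is a candidate for replacement with a domain-specific one.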

What Comes Next

You now have three layers of the foundation: prediction, tokenisation, and meaning representation. Part 4 puts this to work practically — how to communicate with these models in ways that consistently produce better results. Prompt engineering, done correctly, is a direct consequence of understanding the mechanics we've built up across these three parts.

See you there.

*Created with AI assistance. Originally published at Context First AI.*
