Daniel Odii

Posted on May 22

Understanding AI Embeddings

#ai #programming #rag #mcp

I've been hearing about embeddings for a while now. Even as someone who uses LLMs daily and integrates them into smart systems, I wasn't really sure what embeddings were or how they connected to everything else.

In this writeup, I'll unpack what I've learned about embeddings: what they are and how to use them as a software developer/engineer.

Turning Meanings into Coordinates

Think of embeddings as turning meanings into coordinates. LLMs don't work with words the way humans do; they operate on numbers, so text is first converted into lists of numbers that represent meaning.

Take the word "dog" for example. A model can't work with the word directly; it first converts it into a list of numbers:

"dog" → [0.21, -0.88, 0.44, ...]

What the Numbers Are NOT Based On

The number of values in an embedding has nothing to do with:

Word length
Number of letters
Number of characters

This is because embeddings don't encode spelling — they encode meaning and features. The embedding size is determined by:

The embedding model's architecture
How much semantic information the model wants to represent

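So the embedding dimension is fixed by the model, not by the input: a given model returns vectors of the same length for a single word or a whole paragraph. Larger models often use more dimensions, but that is a design choice rather than a strict proportion.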
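To make the point about input length concrete, here is a toy sketch. The word vectors are made up for illustration (they do not come from any real embedding model), and `toy_embed` is a hypothetical helper that simply averages them. The only thing it demonstrates is that the output length stays the same no matter how long the text is.

```python
# Toy illustration only: these 4-dimensional vectors are invented,
# not produced by a real embedding model.
TOY_WORD_VECTORS = {
    "dog": [0.9, 0.1, 0.0, 0.2],
    "hippopotamus": [0.8, 0.0, 0.1, 0.6],
    "the": [0.0, 0.0, 0.0, 0.1],
    "barks": [0.7, 0.3, 0.0, 0.0],
}
DIMENSIONS = 4  # fixed by the (toy) model, never by the input text


def toy_embed(text):
    """Average the vectors of the known words in the text (mean pooling)."""
    words = [w for w in text.lower().split() if w in TOY_WORD_VECTORS]
    if not words:
        return [0.0] * DIMENSIONS
    sums = [0.0] * DIMENSIONS
    for word in words:
        for i, value in enumerate(TOY_WORD_VECTORS[word]):
            sums[i] += value
    return [total / len(words) for total in sums]


for text in ["dog", "hippopotamus", "the dog barks"]:
    print(f"{text!r}: {len(text)} characters -> {len(toy_embed(text))} values")
```

Every input, short or long, comes back as a list of four values because that is the size this toy "model" was built with.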

Key Properties

Similar meanings end up close together
Different meanings end up farther apart

You could say that embeddings are basically "a mathematical location for meaning."

Real-World Analogy

Imagine a large city map:

Tailors live in one district
Doctors live in another district
Developers live in a separate district

Now replace people with words, sentences, documents, or even images. That's basically embeddings!

A few more examples to drive it home:

Pair	Relationship
"JavaScript" and "React"	Close together
"Needle and thread" and "fashion design"	Close together
"Dog" and "cat"	Close together
"Bank" (money) and "banana"	Far apart

Why Do Embeddings Matter?

Embeddings are what make AI able to:

Search semantically — find results based on meaning, not just keywords
Recommend similar content
Retrieve relevant context
Power Retrieval-Augmented Generation (RAG) systems
Compare meanings instead of exact words

Case Study: Semantic Search

Without embeddings, search falls back to old-school keyword matching: a result only comes back if it shares words with the query.

With embeddings, a query like "How to fix app crashing" would also surface results like:

"Application keeps closing"
"React Native app freezes"
"Unexpected mobile app shutdown"

...because the meanings are close, even if the words are different.
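Here is a rough sketch of that difference. The vectors are hand-assigned for illustration (a real system would get them from an embedding model), and the banana bread document is an extra, unrelated entry added only for contrast. `keyword_search` and `semantic_search` are hypothetical helper names.

```python
import math

# Invented 3-dimensional vectors; the dimensions loosely stand for
# "app stability", "mobile", and "cooking". Not real model output.
DOCS = {
    "Application keeps closing": [0.9, 0.3, 0.0],
    "React Native app freezes": [0.8, 0.6, 0.0],
    "Unexpected mobile app shutdown": [0.85, 0.7, 0.0],
    "How to bake banana bread": [0.0, 0.0, 1.0],
}
QUERY = "How to fix app crashing"
QUERY_VECTOR = [0.9, 0.4, 0.0]  # also invented


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def keyword_search(query, docs):
    """Return documents containing the exact query phrase (case-insensitive)."""
    return [doc for doc in docs if query.lower() in doc.lower()]


def semantic_search(query_vector, docs, top_k=3):
    """Return the top_k documents whose vectors point closest to the query."""
    ranked = sorted(
        docs,
        key=lambda doc: cosine_similarity(query_vector, docs[doc]),
        reverse=True,
    )
    return ranked[:top_k]


print("Keyword matches:", keyword_search(QUERY, DOCS))
print("Semantic matches:", semantic_search(QUERY_VECTOR, DOCS))
```

With these made-up numbers, the exact-phrase search finds nothing, while the vector search returns the three crash-related documents and leaves the unrelated one out.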

What Can Be Embedded?

Almost anything:

Words — e.g., "King"
Sentences — e.g., "How to build a React app"
Entire documents — e.g., PDFs, docs, chats, codebases, etc.
Images (reverse image search tools typically work by comparing image embeddings)

What Happens Behind the Scenes?

The system compares embeddings using similarity/distance metrics:

Cosine similarity — measures how similar two embeddings are based on their direction, regardless of size. If two vectors point almost the same way, they likely have similar meaning.
Euclidean distance — measures the actual straight-line distance between two embeddings in vector space. A smaller distance means the meanings are closer together.
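Both metrics are short enough to write by hand. The sketch below uses plain Python and small example vectors chosen for this illustration; in practice you would usually rely on a numerical library or the vector database's built-in metric.

```python
import math


def cosine_similarity(a, b):
    """Compare direction only: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two points; 0.0 means identical."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]    # same direction as a, twice as long
c = [-3.0, 0.0, 1.0]   # perpendicular to a (dot product is 0)

print("cosine(a, b):   ", round(cosine_similarity(a, b), 4))
print("euclidean(a, b):", round(euclidean_distance(a, b), 4))
print("cosine(a, c):   ", round(cosine_similarity(a, c), 4))
```

Note how `a` and `b` have a cosine similarity of about 1 even though they are not the same point: cosine ignores length, while Euclidean distance does not. This is one reason the two metrics can rank results differently unless the vectors are normalized to the same length first.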

Applying Embeddings in RAG

Let's look at how embeddings fit into a RAG (Retrieval-Augmented Generation) pipeline. Here's an example: building an enhanced search engine for a company website.

Step 1 → Convert documents into embeddings
         (e.g., PDFs, notes, product catalogs, support docs)

Step 2 → Store them in a vector database
         (e.g., Pinecone, Weaviate, Chroma, PGVector)

Step 3 → A user asks: "How do suppliers onboard?"

Step 4 → The question is converted into an embedding too

Step 5 → The system searches for nearby embeddings
         (semantically similar documents)

Step 6 → Relevant chunks are sent to the LLM along with the question,
         and the LLM writes the answer from them

This is essentially how most "Chat with your docs" implementations work.
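The six steps can be sketched end to end in a few lines. Everything here is an assumption made to keep the example self-contained: `toy_embed` is a keyword-counting stand-in (real embedding models do not work this way), `InMemoryVectorStore` is a hypothetical replacement for a real vector database, and the three document chunks are invented sample text.

```python
import math

# Stand-in embedder: one dimension per hand-picked "concept".
# NOT a real embedding model; it only makes the pipeline runnable offline.
CONCEPT_KEYWORDS = [
    {"supplier", "suppliers", "onboard", "onboarding", "vendor", "register"},
    {"invoice", "invoices", "payment", "billing", "refund"},
    {"shipping", "delivery", "tracking", "courier"},
]


def toy_embed(text):
    words = [w.strip(".,?!:").lower() for w in text.split()]
    counts = [float(sum(w in kws for w in words)) for kws in CONCEPT_KEYWORDS]
    norm = math.sqrt(sum(c * c for c in counts))
    return [c / norm for c in counts] if norm else counts


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database (Step 2)."""

    def __init__(self):
        self.items = []  # list of (vector, chunk) pairs

    def add(self, chunk):
        self.items.append((toy_embed(chunk), chunk))  # Step 1: embed the chunk

    def search(self, query, top_k=2):
        q = toy_embed(query)  # Step 4: embed the question
        # Step 5: vectors are unit length, so the dot product is the cosine similarity
        scored = [(sum(x * y for x, y in zip(q, vec)), chunk) for vec, chunk in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [chunk for score, chunk in scored[:top_k] if score > 0]


def build_prompt(question, chunks):
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


store = InMemoryVectorStore()
for chunk in [  # invented sample documents
    "New suppliers register through the vendor portal and upload tax documents.",
    "Invoices are paid within 30 days of approval.",
    "Shipping updates are sent with a courier tracking link.",
]:
    store.add(chunk)

question = "How do suppliers onboard?"      # Step 3
relevant = store.search(question)
prompt = build_prompt(question, relevant)   # Step 6: this text would go to the LLM
print(prompt)
```

In a real system you would swap `toy_embed` for calls to an embedding model and the in-memory list for one of the vector databases mentioned above; the overall flow should stay roughly the same.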

A Common Misconception

Some people think embeddings store knowledge — but that's not quite right.

Embeddings store:

Semantic relationships
Meaning patterns

The actual reasoning still happens in the LLM. Embeddings mainly help the model find relevant information, not process it.

Embedding Models

Open-Source / Free

These can be downloaded, run locally, fine-tuned, and used without API costs:

Model	Notes
BGE Embeddings	Strong general-purpose embeddings
E5 Embeddings	Great for retrieval tasks
Sentence Transformers	Popular library and model collection for semantic search
Hugging Face Hub models	Wide variety of community embedding models available

Closed / Paid APIs

These are accessed through APIs and are typically billed per token or request:

Provider	Notes
OpenAI Embeddings	Widely used, easy to integrate
Cohere Embeddings	Strong multilingual support
Voyage AI Embeddings	Optimized for retrieval

If you've read to this point — congratulations, you're already on your way to becoming a pro RAG engineer. (Just kidding.)

Thanks for reading through, though.
