When building search for an application, you typically face two broad approaches:
- Traditional keyword-based search — match words exactly or with simple variants.
- Semantic (or vector) search — match meaning or context using AI embeddings.
There's also a hybrid approach, but I'll leave that for a future article. Instead, in this post I'll walk you through how the two broad approaches work in Python using MariaDB and an AI embedding model, highlight where they differ, and show code that you can adapt.
The key components
For this example, I used MariaDB Cloud to spin up a free serverless database. Within seconds I had a free instance ready. I grabbed the host/user/password details, connected with VS Code, created a database called demo, created a products table and loaded ~500 rows of product names via LOAD DATA LOCAL INFILE. This is an extremely small dataset, but it's enough for learning and experimentation.
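The article doesn't include the exact schema, so here's a minimal sketch of what the table, the full-text index used later, and the load step might look like. The column names and file path are assumptions; adapt them to your own dataset:

```sql
-- Assumed minimal schema; adjust column names/types to your data
CREATE TABLE products (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(255) NOT NULL
);

-- Full-text index that the keyword-search endpoint relies on
CREATE FULLTEXT INDEX idx_products_name ON products(name);

-- Bulk-load product names from a local CSV file (one name per line)
LOAD DATA LOCAL INFILE 'products.csv'
INTO TABLE products
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(name);
```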
Then I built a small Python + FastAPI app. First I implemented a simple keyword search endpoint (by product name) using a full-text index, then I implemented semantic (vector) search using AI-generated vector embeddings plus MariaDB's vector support. You can see the whole process in this video.
Keyword-based search: simple and familiar
For keyword search I used a full-text index on the name column of the products table. With this index in place, I could search by product name using this SQL query:
```sql
SELECT name
FROM products
ORDER BY MATCH(name) AGAINST(?)
LIMIT 10;
```
I exposed this functionality using a FastAPI endpoint as follows:
```python
@app.get("/products/text-search")
def text_search(query: str):
    cursor = connection.cursor()
    cursor.execute(
        "SELECT name FROM products ORDER BY MATCH(name) AGAINST(?) LIMIT 10;", (query,)
    )
    return [name for (name,) in cursor]
```
Pros:
- Runs fast.
- Works well when users type exact or close terms.
- Uses built-in SQL features (no external AI model needed).
Cons:
- Misses synonyms, context or related meaning.
- Doesn’t understand intent (if user types “running shoes”, a strict keyword search may miss “jogging trainers” or “sneakers”).
- Quality depends heavily on the wording.
In my demo, the endpoint returned several products that were not relevant to “running shoes”.
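To see this limitation concretely, here's a toy, in-memory sketch of why literal keyword matching misses semantically related items. The product names are invented for illustration and the matching logic is a deliberately naive stand-in for a full-text index:

```python
# Toy illustration: literal keyword matching vs. semantically related names.
# The product names below are invented for illustration only.
products = [
    "trail running shoes",
    "jogging trainers",
    "sneakers - white",
    "running shorts",
    "leather dress shoes",
]

def keyword_match(query: str, names: list[str]) -> list[str]:
    """Return names containing every query term literally (case-insensitive)."""
    terms = query.lower().split()
    return [n for n in names if all(t in n.lower() for t in terms)]

print(keyword_match("running shoes", products))
# Only "trail running shoes" contains both terms literally;
# "jogging trainers" and "sneakers - white" are missed despite being relevant.
```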
Semantic (vector) search: matching meaning
To go beyond keywords I implemented a second endpoint:
- Use an AI embedding model (Google Generative AI via LangChain) to convert each product name into a high-dimensional vector.
- Store those vectors in MariaDB with the vector integration for LangChain.
- At query time, embed the user's search phrase into a vector (using exactly the same embedding model as in the first step), then perform a similarity search with the highly performant HNSW algorithm in MariaDB (e.g., top 10 nearest vectors) and return the corresponding products.
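Under the hood, a similarity search ranks stored vectors by how close they are to the query vector; HNSW is what makes this fast at scale. Here's a brute-force sketch of the same idea using made-up 3-dimensional vectors (real embeddings have hundreds of dimensions, and the names and values below are invented for illustration):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means same direction, near 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up 3-d "embeddings"; a real model produces hundreds of dimensions.
stored = {
    "jogging trainers": [0.9, 0.1, 0.0],
    "running shorts": [0.7, 0.6, 0.1],
    "coffee mug": [0.0, 0.1, 0.9],
}

query = [0.95, 0.05, 0.0]  # pretend this is the embedding of "running shoes"

# Brute-force top-k ranking: the exact result that HNSW approximates efficiently
ranked = sorted(stored, key=lambda name: cosine_similarity(query, stored[name]),
                reverse=True)
print(ranked[0])  # → jogging trainers (the semantically closest product)
```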
Here’s how I implemented the ingestion endpoint:
```python
@app.post("/products/ingest")
def ingest_products():
    cursor = connection.cursor()
    cursor.execute("SELECT name FROM products;")
    vector_store.add_texts([name for (name,) in cursor])
    return "Products ingested successfully"
```
And this is the semantic search endpoint:
```python
@app.get("/products/semantic-search")
def search_products(query: str):
    results = vector_store.similarity_search(query, k=10)
    return [doc.page_content for doc in results]
```
The LangChain integration for MariaDB makes the whole process extremely easy. The integration creates two tables:
- langchain_collection: each row represents a related set of vector embeddings. I have only one collection in this demo, which corresponds to the product names.
- langchain_embedding: the vector embeddings themselves. Each vector belongs to a collection (many-to-one to langchain_collection).
When I ran the semantic search endpoint with the same query “running shoes”, the results felt much more relevant: they included products that didn’t match “running” or “shoes” literally but were semantically close.
Keyword vs. semantic — when to use which
Here’s a quick comparison:
| Approach | Pros | Cons |
|---|---|---|
| Keyword search | Quick to set up, uses SQL directly | Limited to literal term matching, less clever |
| Semantic search | Matches meaning and context, more flexible | Requires embedding model + vector support |
Pick keyword search when:
- Your search domain is small and predictable, or you explicitly need exact keyword matches.
- Users know exactly what they’re looking for (specific codes, exact names).
- You want minimal dependencies and complexity.
Pick semantic search when:
- You need to handle synonyms, similar concepts, user intent.
- The dataset or domain has natural language variation.
- You’re willing to integrate an embedding model and manage vector storage/indexing. MariaDB helps with this.
In many real-world apps you’ll use a hybrid: start with keyword search, and for higher-value queries or when exact match fails, fall back to semantic search. Or even mix the two via hybrid search. MariaDB helps with this too.
How simple the integration can be
In my demo I triggered vector ingestion via a POST endpoint (/products/ingest). That reads all product names, computes embeddings, and writes them to MariaDB. One line of code (via the LangChain + MariaDB integration) handled the insertion of ~500 rows of vectors.
Once the vectors are stored, adding a semantic search endpoint took just a few lines of code. The MariaDB vector support hides most of the complexity.
The source code
You can find the code on GitHub. There's a simple, easy-to-follow program in webinar-main.py and a more elaborate one with good practices in backend.py. Feel free to clone the repository, modify it, experiment with your own datasets, and let us know if there's anything you'd like to see in the LangChain integration for MariaDB.