A backend developer's honest take on building semantic search, RAG systems, and recommendation engines
Why I Built These Projects
As a Laravel backend developer, I'm comfortable with the usual stack – MySQL databases, Eloquent ORM, Redis for caching. But lately, I've been hearing a lot about "AI" and "vector databases" and wondered what all the fuss was about.
So I decided to build four projects to understand how these technologies actually work and how they might fit into the web development world I know. This isn't about replacing Laravel or MySQL – it's about understanding new tools that could complement what we already do well.
Here's what I discovered by building these projects from scratch.
What the Heck is a Vector Database Anyway?
Let me explain this the way I wish someone had explained it to me.
You know how in Laravel, we store data in rows and columns? Well, imagine if instead of storing "iPhone 13 Pro Max" as a string, we could store the meaning of that phrase as a list of 384 numbers. Sounds crazy, right?
Here's the magic: when someone searches for "latest Apple smartphone," the system converts that query into the same type of 384-number list. Then it finds database entries with similar number patterns. Boom – semantic search!
Think of it like this: if traditional databases are like filing cabinets where you need to know the exact folder name, vector databases are like having a super-smart librarian who understands what you actually mean.
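To make that concrete, here's a tiny, self-contained sketch of the idea. Real models like all-MiniLM-L6-v2 produce 384-dimensional vectors; the 4-number "embeddings" below are made-up values purely to show how similarity between vectors is measured.

```python
import math

def cosine_similarity(a, b):
    """How closely two vectors point the same way: ~1.0 = very similar, ~0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Pretend embeddings (hypothetical values, NOT from a real model)
iphone = [0.9, 0.8, 0.1, 0.0]   # "iPhone 13 Pro Max"
query  = [0.8, 0.9, 0.2, 0.1]   # "latest Apple smartphone"
pasta  = [0.0, 0.1, 0.9, 0.8]   # "spaghetti carbonara recipe"

print(cosine_similarity(query, iphone))  # high: same "direction" of meaning
print(cosine_similarity(query, pasta))   # low: unrelated concepts
```

That's the whole trick: the query never has to contain the same words as the stored text, it only has to point in a similar direction in vector space.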
Why These Projects Explore Vector Databases
I'm not abandoning MySQL – it's still perfect for transactional data. But these projects explore what happens when you need to understand meaning and context, which traditional databases aren't designed for.
Here's what I wanted to understand through these projects:
The Search Problem: What if users could search by meaning instead of exact keywords? How would that work technically?
The Recommendation Problem: Instead of simple "related products," what if recommendations could understand deeper relationships between items?
The Content Problem: When you have thousands of documents, how do you find related content that shares concepts but uses different words?
The Multi-Modal Problem: How can you search images using text descriptions, or find similar images?
Project 1: Understanding Document Search
My first project was intentionally simple: a semantic search system with 23 diverse documents. The goal was to understand how vector search actually works compared to traditional database queries.
The Technical Reality
In Laravel, we're used to:
$products = Product::where('name', 'LIKE', '%' . $query . '%')->get();
With vector search, the approach is fundamentally different:
# Convert text to a 384-dimensional vector
embedding = model.encode("search query")
# Find the most similar vectors in the collection
# (query_embeddings expects a list of embeddings)
results = collection.query(query_embeddings=[embedding], n_results=5)
What I discovered: When I searched for "Warcraft," the system returned documents about "Thrall meets Jaina" and "Sylvanas Windrunner" – even though those documents never contained the word "Warcraft." The system understood these were related concepts through the vector representations.
Why This Matters for Laravel Devs
Imagine you're building a blog platform. Instead of:
$posts = Post::where('title', 'LIKE', '%laravel%')
->orWhere('content', 'LIKE', '%laravel%')
->get();
You could do:
# Find posts semantically related to Laravel
posts = search_posts("web development framework PHP")
# Returns Laravel posts, Symfony posts, CodeIgniter posts, etc.
The difference? Your search understands that someone looking for "web development framework PHP" probably wants Laravel content, even if they didn't type "Laravel."
Project 2: Exploring RAG (Retrieval-Augmented Generation)
RAG stands for Retrieval-Augmented Generation – basically combining search with AI to answer questions using your own content.
The Concept I Wanted to Test
The idea is simple: instead of having AI answer questions from its general training, you first search your own documents for relevant information, then have AI answer based on that specific content.
How RAG Works in Practice
Think of it as a three-step process:
- Search your content for relevant information (the retrieval part)
- Combine the question with found content (the augmentation part)
- Generate an answer based on your specific content (the generation part)
The Technical Flow
In this project, the process works like this:
- User asks: "What are vector databases used for?"
- System searches the document collection for relevant content
- AI receives both the question and the found content as context
- AI generates an answer based only on the provided content
This approach ensures the AI answers are grounded in your actual content rather than making up information from its general training.
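Here's a minimal sketch of that flow. I've stubbed out the parts that need real infrastructure: retrieval is naive word-overlap scoring standing in for vector search, and the "generation" step stops at the prompt you'd send to an LLM (for example via the OpenAI chat API) rather than an actual call.

```python
# Minimal RAG data flow: retrieve -> augment -> (generate).
documents = [
    "Vector databases store embeddings and support similarity search.",
    "Laravel is a PHP framework for building web applications.",
    "RAG combines retrieval with text generation.",
]

def retrieve(question, docs, top_k=1):
    """Rank documents by word overlap with the question (stand-in for vector search)."""
    q_words = set(question.lower().split())
    scored = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return scored[:top_k]

def build_prompt(question, context_docs):
    """Augmentation: combine the question with the retrieved content."""
    context = "\n".join(context_docs)
    return f"Answer using ONLY this context:\n{context}\n\nQuestion: {question}"

question = "What are vector databases used for?"
context = retrieve(question, documents)
prompt = build_prompt(question, context)
# `prompt` would now be sent to an LLM; the model never sees documents
# that weren't retrieved, which is what keeps answers grounded.
print(prompt)
```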
Project 3: Exploring Multi-Modal Search
This project explores something that fascinated me: searching images using text descriptions, and vice versa.
The Traditional Approach vs. Multi-Modal
Typically in Laravel, we handle images like this:
Schema::create('products', function (Blueprint $table) {
$table->string('image_path');
$table->string('alt_text')->nullable();
});
But what if you could search images by describing what you're looking for? Like typing "a cute cat" and finding cat images, even if they were never tagged or named "cat"?
Understanding CLIP
CLIP (Contrastive Language-Image Pre-training) is a model that converts both images and text into the same 512-dimensional vector space. This means:
- Images become vectors
- Text descriptions become vectors
- Similar concepts (image + text) have similar vectors
The project demonstrates two types of search:
- Text-to-image: Search for images using text descriptions
- Image-to-image: Find similar images using another image as the query
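A toy sketch of the shared-space idea, assuming the CLIP embeddings have already been computed. Real CLIP vectors are 512-dimensional; the 3-number values below are invented purely to show the mechanics of both search directions.

```python
import math

def cos_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Hypothetical CLIP-style embeddings: images and text live in ONE vector space.
image_vectors = {
    "cat_photo.jpg": [0.9, 0.1, 0.2],
    "car_photo.jpg": [0.1, 0.9, 0.1],
    "dog_photo.jpg": [0.7, 0.2, 0.6],
}
text_query = [0.85, 0.15, 0.25]  # pretend embedding of the text "a cute cat"

# Text-to-image: rank all images against the text query
best_match = max(image_vectors, key=lambda name: cos_sim(text_query, image_vectors[name]))
print(best_match)

# Image-to-image: rank the OTHER images against the cat photo itself
query_image = image_vectors["cat_photo.jpg"]
similar = max(
    (name for name in image_vectors if name != "cat_photo.jpg"),
    key=lambda name: cos_sim(query_image, image_vectors[name]),
)
print(similar)
```

Both searches are the exact same similarity computation; the only thing that changes is whether the query vector came from text or from an image.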
Real-World Applications for Web Developers
Imagine building:
- E-commerce platforms where users upload a photo and find similar products
- Content management systems where editors can find stock photos by describing what they need
- Social platforms where you can search posts by visual content, not just hashtags
Project 4: Understanding Content-Based Recommendations
This project explores how to build recommendations based on product characteristics rather than simple category matching.
Traditional vs. Vector-Based Recommendations
In Laravel, we typically do simple recommendations like:
$related = Product::where('category_id', $product->category_id)
->where('id', '!=', $product->id)
->inRandomOrder()
->limit(4)
->get();
This project explores a different approach: what if recommendations were based on understanding the actual characteristics of products?
The Content-Based Concept
Instead of "people who bought X also bought Y," this system analyzes:
- Product descriptions and features
- User purchase history patterns
- Semantic similarities between products
The system creates vector representations of product descriptions using TF-IDF (Term Frequency-Inverse Document Frequency), then builds user profiles based on their purchase history.
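Here's a rough, dependency-free sketch of that pipeline. The product catalogue and the TF-IDF weighting below are simplified stand-ins (a real system would likely use something like scikit-learn's TfidfVectorizer), but the shape of the logic is the same: vectorize descriptions, average the purchased ones into a taste profile, rank the rest by similarity.

```python
import math
from collections import Counter

products = {
    "p1": "wireless bluetooth headphones with noise cancelling",
    "p2": "bluetooth portable speaker with deep bass",
    "p3": "stainless steel kitchen knife set",
}

def tfidf_vectors(docs):
    """Build TF-IDF vectors: term frequency weighted by inverse document frequency."""
    tokenized = {pid: text.split() for pid, text in docs.items()}
    vocab = sorted({w for words in tokenized.values() for w in words})
    n = len(docs)
    df = {w: sum(1 for words in tokenized.values() if w in words) for w in vocab}
    vectors = {}
    for pid, words in tokenized.items():
        tf = Counter(words)
        vectors[pid] = [tf[w] / len(words) * math.log((1 + n) / (1 + df[w])) for w in vocab]
    return vectors

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

vectors = tfidf_vectors(products)

# User "taste profile" = average vector of the products they bought
purchased = ["p1"]
profile = [sum(vectors[p][i] for p in purchased) / len(purchased)
           for i in range(len(vectors["p1"]))]

# Recommend the most similar product the user has NOT already bought
candidates = [p for p in products if p not in purchased]
recommendation = max(candidates, key=lambda p: cosine(profile, vectors[p]))
print(recommendation)  # the other bluetooth product, not the knife set
```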
How It Works in Practice
The project demonstrates:
- Product Analysis: Convert product descriptions into vector representations
- User Profiling: Create user "taste profiles" from their purchase history
- Similarity Matching: Find products with similar characteristics to user preferences
This approach can recommend products across different categories if they share similar characteristics or appeal to similar user preferences.
What I Learned Building These Projects
The Learning Curve
Mathematical Concepts: As a web developer, concepts like "cosine similarity" and "high-dimensional vectors" seemed intimidating at first. But they're actually just ways to measure how similar things are.
New Dependencies: Instead of just Composer packages, I was dealing with Python libraries, pre-trained models, and different types of data processing.
Different Thinking: Moving from "exact matches" to "similarity scores" required rethinking how search and recommendations work.
What Made It Manageable
Pre-trained Models: I didn't need to train anything from scratch. Using existing models like SentenceTransformer and CLIP made the projects feasible.
Simple Analogies: Vector similarity is like measuring distance between points, just in many more dimensions than we can visualize.
Incremental Complexity: Starting with basic document search, then adding AI responses, then images, then recommendations built understanding gradually.
What These Projects Taught Me
Technical Insights
These aren't just cool tech demos – they represent different approaches to common web development problems:
- Semantic search could improve site search functionality
- RAG systems could power smarter help systems or chatbots
- Multi-modal search could enhance e-commerce product discovery
- Content-based recommendations could improve user engagement
Practical Advice for Laravel Developers
Start Here, Not There
Don't: Try to build everything from scratch
Do: Use existing services like OpenAI embeddings API or Cohere
Don't: Worry about the math initially
Do: Focus on the user experience and business value
Don't: Replace your entire stack overnight
Do: Add vector search as a complementary feature
How This Could Work with Laravel
These projects showed me that you don't need to replace existing systems. Instead, you could:
- Keep Laravel for user management, transactions, and business logic
- Keep MySQL for structured data that needs consistency
- Add vector services for search and discovery features
- Use APIs to connect Laravel with AI services when needed
Practical Integration Options
Based on building these projects, here are approaches that could work:
- Managed Services: Use APIs like OpenAI embeddings or Pinecone for vector storage
- Microservices: Run Python-based AI services alongside Laravel applications
- Hybrid Search: Combine traditional database queries with semantic search
- Gradual Adoption: Start with one feature like improved search, then expand
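As an example of the hybrid idea, here's a small sketch that blends a LIKE-style keyword score with a semantic score. The semantic similarities are hard-coded stand-ins for what an embedding model would return; the score-fusion logic is the point.

```python
posts = {
    "intro-to-laravel": "Getting started with Laravel routing",
    "symfony-basics":   "Symfony controllers explained",
    "cooking-pasta":    "How to cook perfect pasta",
}

# Pretend semantic similarity to the query "PHP web framework"
# (hard-coded stand-ins for real embedding scores)
semantic_score = {"intro-to-laravel": 0.82, "symfony-basics": 0.79, "cooking-pasta": 0.05}

def keyword_score(query, text):
    """Fraction of query words that appear verbatim (the LIKE-style part)."""
    words = query.lower().split()
    return sum(1 for w in words if w in text.lower()) / len(words)

def hybrid_rank(query, alpha=0.5):
    """alpha balances keyword vs semantic weight; tuning it is application-specific."""
    scores = {
        slug: alpha * keyword_score(query, text) + (1 - alpha) * semantic_score[slug]
        for slug, text in posts.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

print(hybrid_rank("PHP web framework"))
```

The nice property of this pattern is graceful degradation: exact matches still win when they exist, and semantic similarity carries the ranking when they don't.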
Reflections on the Technology
What I Understand Now
Building these projects clarified several things:
It's About Enhancement, Not Replacement: Vector databases don't replace traditional databases – they solve different problems.
The Tools Are Maturing: Pre-trained models and managed services make this technology accessible without deep AI expertise.
Real Business Value: These aren't just demos – semantic search, intelligent Q&A, and better recommendations solve real user problems.
Areas for Further Exploration
These projects opened up questions I'd like to explore:
Performance at Scale: How do these approaches perform with millions of documents or users?
Cost Management: What are the ongoing costs of embedding generation and vector storage?
Hybrid Approaches: How can you combine traditional and semantic search for the best results?
Integration Patterns: What are the best ways to integrate these capabilities into existing Laravel applications?
Why I Think This is Worth Exploring
Building these projects taught me that vector databases and AI aren't some distant future technology – they're tools that can solve problems we face today in web development.
The barrier to entry is lower than I expected. You don't need to understand neural network architectures or train your own models. You can start with pre-built services and libraries.
For Laravel developers, this feels similar to when we first learned about Redis or Elasticsearch – initially unfamiliar, but ultimately just another tool that's good at specific tasks.
If you're curious about how these technologies work in practice, I'd recommend starting with a simple document search project. The concepts become much clearer when you see them in action rather than just reading about them.
Getting Started
The projects in this repository progress from simple to complex:
- Start with Project 1 if you want to understand basic vector search
- Try Project 2 to see how AI can answer questions using your content
- Explore Project 3 for multi-modal search capabilities
- Build Project 4 to understand content-based recommendations
Each project includes detailed documentation explaining every line of code, assuming no prior AI knowledge.
The complete code and technical documentation for all four projects is available in the vector-stores-project repository. Each project includes step-by-step explanations designed for web developers exploring AI-powered applications.
Top comments (4)
I haven't done as much research as you have. From what I understand, a vector database is single-purpose: it only does similarity searches.
Solutions like Elasticsearch, Meilisearch, and Solr can do the same, but also handle exact searches.
And as a can-do-it-all database, Postgres has an extension for vectors (pgvector).
So I think whether a standalone vector database makes sense depends on how much you need its particular strengths.
If I'm not mistaken, Elasticsearch introduced vector search as well, using kNN capabilities. I haven't tried it yet, though. This post is basically a write-up of my learning and an attempt to apply it to my current tech stack.
Yes, your journey in this post gave me some ideas I didn't have before. Thank you for that.
What I wanted to point out with my comment is that there are hybrid solutions, and that for some of the use cases in the post you don't need to go outside PHP.
In this case I'm the cautious swimmer just dipping my toes in the water, while you're cliff diving into it.
True, it's basically only for semantic search. That's why I think that if we can implement semantic search in web development (speaking as a backend developer), it would be a complementary feature rather than a complete replacement.