Most RAG tutorials start with "first, sign up for Pinecone." I'm going to skip that entirely. For the majority of Laravel applications, a dedicated vector database is overkill. You already have MySQL. You already have Laravel's queue system. That's enough to build a fully functional retrieval augmented generation pipeline that works well into the tens of thousands of documents.
RAG solves a specific problem. LLMs are trained on general data up to a cutoff date. They know nothing about your application's content, your internal docs, your product knowledge base, or anything else specific to your domain. RAG fixes this by retrieving relevant content from your own data and injecting it into the prompt as context before asking the model to answer. The model stops guessing and starts answering based on what you actually have.
Here is how to build it properly in Laravel.
What We Are Building
A pipeline that does four things:
- Accepts documents (articles, pages, PDFs, anything text-based) and stores them with their embeddings
- When a user asks a question, converts that question into an embedding
- Finds the most semantically similar documents using cosine similarity against your stored embeddings
- Feeds those documents as context to GPT and returns a grounded answer
No external services beyond OpenAI. No Docker containers for a vector DB. Just Laravel, MySQL, and two API calls per query.
Requirements
- Laravel 10 or 11
- PHP 8.1+
- MySQL 8.0+
- OpenAI API key
- Guzzle (ships with Laravel)
Step 1: The Documents Table
`php artisan make:migration create_documents_table`
```php
public function up(): void
{
    Schema::create('documents', function (Blueprint $table) {
        $table->id();
        $table->string('title');
        $table->longText('content');
        $table->longText('embedding')->nullable(); // JSON float array
        $table->string('source')->nullable();      // URL, filename, etc.
        $table->timestamps();
    });
}
```
`php artisan migrate`
The embedding column stores a JSON-encoded array of 1536 floats (for text-embedding-3-small). Yes, it's a text column, not a native vector type. MySQL 9 adds vector support but for now JSON in a longText column works fine for most use cases.
Step 2: The Document Model
`php artisan make:model Document`
```php
namespace App\Models;

use Illuminate\Database\Eloquent\Model;

class Document extends Model
{
    protected $fillable = ['title', 'content', 'embedding', 'source'];

    protected $casts = [
        'embedding' => 'array',
    ];
}
```
The `array` cast handles the JSON encoding and decoding automatically. When you set `$document->embedding = $vectorArray`, Laravel serializes it; when you read it back, you get a PHP array of floats.
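Under the hood, the `array` cast is essentially a `json_encode`/`json_decode` round trip. A quick standalone sketch of what happens to the vector on save and read (plain PHP, no Eloquent involved):

```php
// What the 'array' cast does with the embedding, conceptually:
// serialize the float array to JSON on save, decode it back on read.
$vector = [0.0123, -0.4567, 0.8910];

$stored = json_encode($vector);       // this string goes into the longText column
$loaded = json_decode($stored, true); // and comes back as a PHP array of floats

// With PHP's default serialize_precision, the floats survive intact.
$roundTrips = ($loaded === $vector); // true
```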
Step 3: The Embedding Service
Keep all OpenAI communication in one place. This makes it easy to swap providers later.
Laravel doesn't ship a `make:service` Artisan command, so create the class by hand at `app/Services/EmbeddingService.php`:
```php
namespace App\Services;

use Illuminate\Support\Facades\Http;

class EmbeddingService
{
    private string $apiKey;

    private string $model = 'text-embedding-3-small';

    public function __construct()
    {
        $this->apiKey = config('services.openai.key');
    }

    public function embed(string $text): array
    {
        // Cap input at ~32,000 characters (~8,000 tokens at roughly
        // 4 characters per token) to stay within model limits.
        $text = mb_substr(strip_tags($text), 0, 32000);

        $response = Http::withToken($this->apiKey)
            ->post('https://api.openai.com/v1/embeddings', [
                'model' => $this->model,
                'input' => $text,
            ]);

        if ($response->failed()) {
            throw new \RuntimeException('OpenAI embedding request failed: ' . $response->body());
        }

        return $response->json('data.0.embedding');
    }

    public function cosineSimilarity(array $a, array $b): float
    {
        $dot = 0.0;
        $magA = 0.0;
        $magB = 0.0;

        foreach ($a as $i => $val) {
            $dot  += $val * $b[$i];
            $magA += $val ** 2;
            $magB += $b[$i] ** 2;
        }

        $denominator = sqrt($magA) * sqrt($magB);

        return $denominator > 0 ? $dot / $denominator : 0.0;
    }
}
```
Register your API key in `config/services.php`:

```php
'openai' => [
    'key' => env('OPENAI_API_KEY'),
],
```
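If swapping providers later is a real concern, one option is to hide the service behind a small contract and type-hint that instead of the concrete class. This interface is my own sketch, not part of the article's code:

```php
namespace App\Contracts;

// Hypothetical contract so EmbeddingService (or an alternative provider)
// can be swapped via the container without touching call sites.
interface EmbedsText
{
    /** @return float[] the embedding vector */
    public function embed(string $text): array;
}
```

You would then have `EmbeddingService` implement it and bind the pair in a service provider with `$this->app->bind(EmbedsText::class, EmbeddingService::class);`.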
Step 4: Indexing Documents
A command to process documents and store their embeddings. You run this once on existing content, then hook it into your document creation flow going forward.
`php artisan make:command IndexDocuments`
```php
namespace App\Console\Commands;

use App\Models\Document;
use App\Services\EmbeddingService;
use Illuminate\Console\Command;

class IndexDocuments extends Command
{
    protected $signature = 'rag:index {--fresh : Re-index all documents}';

    protected $description = 'Generate and store embeddings for all documents';

    public function handle(EmbeddingService $embedder): int
    {
        $query = Document::query();

        if (! $this->option('fresh')) {
            $query->whereNull('embedding');
        }

        $documents = $query->get();
        $bar = $this->output->createProgressBar($documents->count());

        foreach ($documents as $doc) {
            try {
                $doc->embedding = $embedder->embed($doc->title . "\n\n" . $doc->content);
                $doc->save();
            } catch (\Exception $e) {
                $this->error("Failed on document {$doc->id}: " . $e->getMessage());
            }

            // Advance even on failure so the progress bar stays accurate.
            $bar->advance();

            // Respect OpenAI rate limits: 200ms between requests.
            usleep(200000);
        }

        $bar->finish();
        $this->newLine();
        $this->info('Indexing complete.');

        return self::SUCCESS;
    }
}
```
Run it:
`php artisan rag:index`
Notice I'm concatenating title and content before embedding. The title carries a lot of semantic weight and including it improves retrieval accuracy noticeably.
Step 5: The Retrieval Logic
This is the core of RAG. Given a query, find the most relevant documents.
```php
namespace App\Services;

use App\Models\Document;

class RetrievalService
{
    public function __construct(private EmbeddingService $embedder) {}

    public function retrieve(string $query, int $topK = 5, float $threshold = 0.75): array
    {
        $queryVector = $this->embedder->embed($query);

        $documents = Document::whereNotNull('embedding')->get();

        return $documents
            ->map(fn (Document $doc) => [
                'document' => $doc,
                'score' => $this->embedder->cosineSimilarity($queryVector, $doc->embedding),
            ])
            ->filter(fn ($item) => $item['score'] >= $threshold)
            ->sortByDesc('score')
            ->take($topK)
            ->values()
            ->all();
    }
}
```
The `$threshold` of 0.75 filters out loosely related documents. You may need to tune it for your content: lower it if you're getting no results, raise it if you're getting irrelevant ones. Anything between 0.70 and 0.85 is usually sensible.
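To build intuition for what the score means, here is the same cosine formula applied to toy 2-D vectors (standalone PHP, independent of the service class):

```php
// Same math as EmbeddingService::cosineSimilarity, on tiny vectors
// so the scores are easy to verify by hand.
function cosine(array $a, array $b): float
{
    $dot = $magA = $magB = 0.0;

    foreach ($a as $i => $v) {
        $dot  += $v * $b[$i];
        $magA += $v ** 2;
        $magB += $b[$i] ** 2;
    }

    $den = sqrt($magA) * sqrt($magB);

    return $den > 0 ? $dot / $den : 0.0;
}

cosine([1, 0], [1, 0]); // identical direction => 1.0
cosine([1, 0], [0, 1]); // orthogonal (unrelated) => 0.0
cosine([1, 1], [1, 0]); // partial overlap => ~0.707
```

Real 1536-dimensional embeddings rarely hit either extreme, which is why the useful threshold sits in a narrow band rather than near 0 or 1.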
Step 6: The RAG Query Service
This ties retrieval and generation together.
```php
namespace App\Services;

use Illuminate\Support\Facades\Http;

class RagService
{
    private string $apiKey;

    public function __construct(private RetrievalService $retriever)
    {
        $this->apiKey = config('services.openai.key');
    }

    public function ask(string $question): array
    {
        // Step 1: Retrieve relevant documents
        $results = $this->retriever->retrieve($question, topK: 4);

        if (empty($results)) {
            return [
                'answer' => 'I could not find relevant information to answer this question.',
                'sources' => [],
            ];
        }

        // Step 2: Build context from retrieved docs
        $context = collect($results)
            ->map(fn ($r) => "### {$r['document']->title}\n{$r['document']->content}")
            ->join("\n\n---\n\n");

        // Step 3: Send to GPT with context
        $response = Http::withToken($this->apiKey)
            ->post('https://api.openai.com/v1/chat/completions', [
                'model' => 'gpt-4o-mini',
                'temperature' => 0.2,
                'messages' => [
                    [
                        'role' => 'system',
                        'content' => "You are a helpful assistant. Answer questions using only the context provided below. If the answer is not in the context, say so clearly. Do not make up information.\n\nContext:\n{$context}",
                    ],
                    [
                        'role' => 'user',
                        'content' => $question,
                    ],
                ],
            ]);

        return [
            'answer' => $response->json('choices.0.message.content'),
            'sources' => collect($results)->map(fn ($r) => [
                'title' => $r['document']->title,
                'source' => $r['document']->source,
                'score' => round($r['score'], 3),
            ])->toArray(),
        ];
    }
}
```
Two things worth noting here. Temperature is set to 0.2 rather than the API default of 1. You want deterministic, factual answers when doing RAG, not creative ones. And the system prompt explicitly tells the model to stay within the provided context and admit when it doesn't know. Without that instruction, GPT will hallucinate rather than say "I don't have that information."
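One thing the code above does not guard against: four full documents can overflow the model's context window. A crude fix is to cap the assembled context by character count before sending it. The ~4-characters-per-token heuristic and the limit below are my assumptions, not measured values:

```php
// Rough guard: cap the context string so the prompt stays within budget.
// 48,000 chars is roughly 12,000 tokens at ~4 chars/token (an estimate).
function capContext(string $context, int $maxChars = 48000): string
{
    return mb_strlen($context) <= $maxChars
        ? $context
        : mb_substr($context, 0, $maxChars);
}
```

You would call this on `$context` just before building the system message. A smarter version would drop whole documents from the end rather than truncating mid-sentence.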
Step 7: The Controller
`php artisan make:controller RagController`
```php
namespace App\Http\Controllers;

use App\Services\RagService;
use Illuminate\Http\Request;

class RagController extends Controller
{
    public function __construct(private RagService $rag) {}

    public function ask(Request $request)
    {
        $request->validate(['question' => 'required|string|max:500']);

        $result = $this->rag->ask($request->input('question'));

        return response()->json($result);
    }
}
```
Register the route in `routes/api.php`:

```php
Route::post('/ask', [RagController::class, 'ask']);
```
Step 8: Test It
Seed a couple of documents first:
```php
Document::create([
    'title' => 'Laravel Queue Configuration',
    'content' => 'Laravel queues allow you to defer time-consuming tasks...',
    'source' => 'https://laravel.com/docs/queues',
]);
```
Run the indexer:
`php artisan rag:index`
Then hit the endpoint:
```shell
curl -X POST http://your-app.test/api/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "How do I configure Laravel queues?"}'
```
Response:
```json
{
  "answer": "Laravel queues are configured via the config/queue.php file...",
  "sources": [
    {
      "title": "Laravel Queue Configuration",
      "source": "https://laravel.com/docs/queues",
      "score": 0.891
    }
  ]
}
```
Where This Falls Down at Scale
This setup works well up to roughly 50,000 documents. Beyond that, loading all embeddings into memory for comparison becomes a problem. At that point your options are:
- Add a MySQL generated column + raw SQL dot product approximation to filter candidates before full cosine comparison
- Move to pgvector if you can switch to PostgreSQL, which handles this natively and efficiently
- Then and only then consider Pinecone or Weaviate
Most Laravel projects never reach that threshold. Start simple, measure, then scale the storage layer when you actually need to.
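Before changing databases at all, one incremental fix is to stop materializing every document at once: stream rows in from the database and keep only the current top-K scores, so memory stays O(k) instead of O(n). Here is a sketch of the scoring half in plain PHP; in the retrieval service you would feed it from `Document::whereNotNull('embedding')->lazy()` or `chunkById()`:

```php
/**
 * Keep only the $k highest-scoring items while streaming through them,
 * so memory usage is bounded by $k rather than the total document count.
 *
 * @param iterable<array{id: mixed, score: float}> $scored
 */
function streamTopK(iterable $scored, int $k): array
{
    $top = [];

    foreach ($scored as $item) {
        $top[] = $item;
        // Re-sorting a k+1 element array per item is fine for a sketch;
        // a real implementation might use SplPriorityQueue instead.
        usort($top, fn ($a, $b) => $b['score'] <=> $a['score']);
        $top = array_slice($top, 0, $k);
    }

    return $top;
}
```

This doesn't reduce the O(n) similarity computations, but it removes the memory cliff, which is usually the first thing to break.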
What to Build on Top of This
Once the core pipeline is working, the useful next steps are: caching query embeddings so repeated questions don't hit the API twice, chunking long documents into 500-token segments before embedding so retrieval is more granular, adding a feedback mechanism so users can flag bad answers and you can track retrieval quality over time, and per-user conversation history so the model has context across multiple turns.
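Of those, chunking is the one that changes the data model. A minimal word-based chunker as a starting point — word count is only a rough proxy for tokens (a common rule of thumb is ~0.75 words per token, so ~375 words stands in for the 500-token target; exact counts need a real tokenizer):

```php
/**
 * Split text into overlapping word-based chunks. The overlap keeps
 * sentences that straddle a boundary retrievable from either chunk.
 *
 * @return string[]
 */
function chunkText(string $text, int $wordsPerChunk = 375, int $overlap = 40): array
{
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $chunks = [];

    for ($i = 0; $i < count($words); $i += $wordsPerChunk - $overlap) {
        $chunks[] = implode(' ', array_slice($words, $i, $wordsPerChunk));
    }

    return $chunks;
}
```

Each chunk would then get its own row and embedding, with a foreign key back to the parent document so sources still resolve to something a user can open.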
That is a production-ready RAG foundation in Laravel with no external vector database. The whole thing is maybe 200 lines of actual PHP spread across four service classes and one command.
Original post: https://www.phpcmsframework.com/2026/03/building-rag-system-in-laravel-from-scratch.html