DEV Community

M M Islam Chisty

I built a custom AI assistant platform with Laravel + multi-provider LLMs + pgvector - here's what I learned

I've been a software engineer for a while. A few months ago I decided to build a SaaS product solo. This is a technical write-up of what I shipped, the architectural decisions I made, and the honest lessons from building it.

The product is ChatNexus - a platform that lets any business deploy a custom AI assistant trained on their own data. Upload documents, paste text, crawl a URL, or sync a Google Doc - your assistant learns from your content and can answer questions, handle bookings, and run 24/7 on your website.

Here's the full stack and why I chose each piece.


The Stack

Laravel 12 + Livewire 4 (PHP 8.4)

I'm a Java engineer by day. PHP wasn't the obvious choice, but Laravel's ecosystem - queues, broadcasting, Cashier, Sail - meant I could ship subscription billing, real-time chat, background jobs, and a full admin panel without stitching together five separate services. Livewire 4 kept me in a single language while still getting reactive UI. It saved weeks.

Multi-provider AI via LlmProviderInterface

I didn't want to be locked into one AI provider. The system is built around a LlmProviderInterface and a LlmProviderRegistry - chat requests are routed to whichever provider is configured for the agent's assigned model. Currently supported: Groq, OpenAI, Google Gemini, and xAI Grok - all using a shared OpenAI-compatible streaming implementation. Swapping providers is a config change, not a code change.
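The interface and registry names above come from the article, but the method signatures are not shown there. Here's a minimal sketch of how that routing pattern typically looks, with hypothetical signatures and a hypothetical supports() check:

```php
<?php
// Sketch of the multi-provider routing described above. The interface
// and registry names are from the article; the method signatures and
// the model-matching logic are illustrative assumptions.

interface LlmProviderInterface
{
    public function supports(string $model): bool;

    /** Streams completion tokens for an OpenAI-compatible chat payload. */
    public function streamChat(array $messages, string $model): iterable;
}

class LlmProviderRegistry
{
    /** @var LlmProviderInterface[] */
    private array $providers = [];

    public function register(LlmProviderInterface $provider): void
    {
        $this->providers[] = $provider;
    }

    public function forModel(string $model): LlmProviderInterface
    {
        foreach ($this->providers as $provider) {
            if ($provider->supports($model)) {
                return $provider;
            }
        }
        throw new RuntimeException("No provider registered for model: {$model}");
    }
}
```

Because every provider speaks the same OpenAI-compatible streaming dialect, the providers can share one HTTP implementation and differ only in base URL and API key - which is what makes "swap providers via config" possible.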

The default model is llama-3.1-8b-instant via Groq. Groq's LPU delivers tokens fast enough that streaming actually feels instant - which matters a lot for user experience.

OpenAI for embeddings, any provider for chat

This is a deliberate split. Embeddings are generated using OpenAI's text-embedding-3-small at 768 dimensions and cached for 24 hours. Chat completions go through whichever provider is assigned to the agent. Keeping embeddings on a consistent model means the vector space stays stable - if you swap chat providers, your existing knowledge base still works.
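The 24-hour cache is worth a sketch. The real app presumably uses Laravel's cache; this standalone version shows the idea - key by a content hash so identical text never triggers a second API call. The callApi callable stands in for the actual OpenAI embeddings request:

```php
<?php
// Illustrative sketch of a 24-hour embedding cache keyed by content
// hash. Not the app's actual implementation - the structure and names
// here are assumptions.

class EmbeddingCache
{
    /** @var array<string, array{0: int, 1: array}> hash => [expiresAt, vector] */
    private array $store = [];

    public function __construct(private int $ttlSeconds = 86400) {}

    public function get(string $text, callable $callApi): array
    {
        $key = hash('sha256', $text);
        $entry = $this->store[$key] ?? null;
        if ($entry !== null && $entry[0] > time()) {
            return $entry[1]; // cache hit: no API call, no cost
        }
        $vector = $callApi($text);
        $this->store[$key] = [time() + $this->ttlSeconds, $vector];
        return $vector;
    }
}
```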

PostgreSQL + pgvector (Neon) for RAG

I didn't want to pay for a separate vector database. pgvector runs directly in Postgres - same connection, same ORM, no extra infra. Neon is a serverless Postgres provider with pgvector support built in; it works well on a free tier for early-stage SaaS.

Documents get chunked with overlap (sentence-boundary aware - the chunker tries to break at . or \n rather than mid-word), batch-embedded in a single API call, and stored as vectors. At query time, cosine similarity retrieves the most relevant chunks and injects them into the system prompt. Results are deterministic and fast.
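A sentence-boundary-aware chunker is simple but easy to get subtly wrong. Here's a sketch of the approach described above; the sizes (1000 characters with 200 overlap) are illustrative, not the app's real parameters:

```php
<?php
// Sketch of overlap chunking that prefers to break at '.' or "\n"
// rather than mid-word, as described above. Sizes are in characters
// and the defaults here are illustrative assumptions.

function chunkText(string $text, int $size = 1000, int $overlap = 200): array
{
    $chunks = [];
    $start = 0;
    $len = strlen($text);
    while ($start < $len) {
        $end = min($start + $size, $len);
        if ($end < $len) {
            // Scan backwards inside the window for the last sentence
            // boundary so we don't cut a word in half.
            $slice = substr($text, $start, $end - $start);
            $dot = strrpos($slice, '.');
            $nl = strrpos($slice, "\n");
            $cut = max($dot === false ? -1 : $dot, $nl === false ? -1 : $nl);
            if ($cut > 0) {
                $end = $start + $cut + 1;
            }
        }
        $chunks[] = trim(substr($text, $start, $end - $start));
        if ($end >= $len) {
            break;
        }
        // Step forward, keeping $overlap characters of context so a
        // fact straddling two chunks is retrievable from either.
        $start = max($end - $overlap, $start + 1);
    }
    return $chunks;
}
```

The overlap is what makes retrieval forgiving: a sentence that happens to sit at a chunk boundary still appears whole in at least one chunk.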

Knowledge base sources supported: plain text, PDF, URL (with a built-in web scraper), and Google Docs / Google Sheets via a service account integration.

Booking system built in

Each agent can have availability schedules and handle appointment bookings directly through the chat - no third-party integration required. The availability engine checks slots, prevents double-booking, and sends confirmation/cancellation/reschedule emails to both the customer and the agent owner. Default timezone is Australia/Sydney but configurable per agent.
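The core of any double-booking check is interval overlap. This is a minimal sketch of that predicate, not ChatNexus's actual engine (which also applies the agent's availability schedule and timezone):

```php
<?php
// Sketch of a double-booking check: a requested slot is free only if
// it overlaps no existing booking. Illustrative only - the real
// availability engine is more involved.

/**
 * @param array<array{0: DateTimeImmutable, 1: DateTimeImmutable}> $existing
 */
function slotIsFree(array $existing, DateTimeImmutable $start, DateTimeImmutable $end): bool
{
    foreach ($existing as [$bookedStart, $bookedEnd]) {
        // Two intervals overlap exactly when each starts before the
        // other ends. Back-to-back slots (end == next start) are fine.
        if ($start < $bookedEnd && $bookedStart < $end) {
            return false;
        }
    }
    return true;
}
```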

Laravel Reverb for real-time streaming

Token streaming over WebSockets so responses appear word-by-word rather than all at once. Reverb is Laravel's first-party WebSocket server - self-hosted, no third-party dependency, no per-message billing.

Stripe via Laravel Cashier

Subscription billing in under a day. Cashier handles webhooks, plan changes, trials, and invoices. I didn't write a single line of raw Stripe API code.

Deployed on Render.com

Web service + background worker + Redis. Docker-based. Auto-deploys on push to master.


What I Actually Learned

1. Split your AI responsibilities by model stability.

Using OpenAI for embeddings and allowing any provider for chat was the right call. Embeddings define the shape of your vector space - if you change the model, all existing vectors become incompatible and you need to re-embed your entire knowledge base. Keeping that layer fixed while letting the chat layer be flexible saved a painful migration.

2. Queue workers on managed platforms need a process loop, not exec.

php artisan queue:work --max-time=3600 exits after one hour by design (memory management). On Render, that triggers a "crashed instance" alert every hour - even though the service auto-recovers immediately. The fix is wrapping the command in while true; do ... done so the container process stays alive and the worker cycles internally. It's a one-line change that eliminates the recurring false alarms.
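For clarity, the worker's start command ends up looking like this (a sketch of the fix described above, not my exact Render config):

```shell
#!/bin/sh
# Start command for the background worker. queue:work exits on purpose
# after --max-time; the loop restarts it immediately, so the container's
# main process never dies and the platform stops reporting a crash.
while true; do
    php artisan queue:work --max-time=3600
done
```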

3. Livewire 4 and UTF-8 BOM don't mix.

If a Blade file has a BOM byte at the start, Livewire's regex that attaches wire:id to the root element silently misplaces it onto a child element. Every wire:click and wire:model handler stops working - no JS error, no visible symptom, just dead interactivity. Checking for BOM bytes is now the first thing I do when Livewire pages render but don't respond.
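The check itself is three bytes. Here's a small sketch for detecting (and stripping) a UTF-8 BOM from a Blade file - the function names are mine, but the byte sequence EF BB BF is the standard UTF-8 BOM:

```php
<?php
// Detect and strip a UTF-8 BOM (bytes EF BB BF) from the start of a
// file. A BOM at the top of a Blade view is what breaks Livewire's
// wire:id placement, as described above.

function hasBom(string $path): bool
{
    $fh = fopen($path, 'rb');
    $head = fread($fh, 3);
    fclose($fh);
    return $head === "\xEF\xBB\xBF";
}

function stripBom(string $path): void
{
    if (hasBom($path)) {
        file_put_contents($path, substr(file_get_contents($path), 3));
    }
}
```

Running this over resources/views is a quick way to audit a whole project at once.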

4. "AI Agent" is the wrong positioning for a product that answers questions.

The term carries a market expectation: autonomous multi-step reasoning, tool orchestration, goal completion. If your product is primarily answering questions from a knowledge base, calling it an "agent" creates a mismatch - users feel underwhelmed. "AI Assistants" is more honest, and leading with what the assistant does (answers questions, books appointments, runs 24/7 on your data) converts better than a buzzword.

5. Free tools are a legitimate acquisition channel.

The product ships with three publicly accessible tools - an AI Playground, a FAQ Generator, and a System Prompt Generator - that require no login. These are standalone, genuinely useful, and link back to the main product. For a solo bootstrapped SaaS with no marketing budget, giving something useful away is more effective than any ad.

6. Batch your embedding calls.

The first version embedded each knowledge base chunk in a separate API call. For a document with 50 chunks, that's 50 round trips. Switching to batch embedding - sending all chunks in a single request - made knowledge base ingestion roughly 10× faster for large documents. The OpenAI embeddings API supports batching natively.
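The change looks roughly like this. The OpenAI embeddings endpoint accepts an array of inputs, so chunks get grouped into batches; the batch size of 100 here is illustrative (check the provider's real input limits), and callApi stands in for the actual HTTP call:

```php
<?php
// Sketch of batched embedding: group chunks into batches and send each
// batch as a single API request, instead of one round trip per chunk.
// Batch size and callApi() are illustrative assumptions.

function embedChunks(array $chunks, callable $callApi, int $batchSize = 100): array
{
    $vectors = [];
    foreach (array_chunk($chunks, $batchSize) as $batch) {
        // One request returns one vector per input, in input order,
        // so the result lines up with the original chunk order.
        $vectors = array_merge($vectors, $callApi($batch));
    }
    return $vectors;
}
```

For a 50-chunk document this turns 50 round trips into 1 - the latency win comes almost entirely from eliminating per-request overhead.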


Where It Is Now

ChatNexus is in early access. It's live at chatnexus.cloud.

I'm looking for beta users willing to:

  • Set up an agent on their real website
  • Break things and tell me what's missing
  • Give honest feedback on what's confusing

Beta access: 60-day free Starter plan (3 agents, 1,000 messages/month), no credit card. Drop a comment or reach out and I'll send the promo code.


Built solo. Feedback - good or bad - is genuinely appreciated.
