Building Stayzr meant solving real problems: PMS integration, high-throughput webhook handling, and AI that actually knows your property. Here's how we architected it.
The Stack (What's Running in Production)
Backend: Go 1.23 with Fiber framework, pgx/v5 connection pooling, Bun ORM over PostgreSQL, Redis for caching/sessions, OpenTelemetry for tracing
AI Agents Service: Python 3.11 + FastAPI (Uvicorn), LangChain primitives, Qdrant for knowledge base, ChromaDB for conversation memory
Frontend: Next.js 15 / React admin UI + marketing site
3rd-party Integrations: Mews (PMS), WhatsApp Business/Meta, Resend + Postmark (email), Azure Blob Storage (files), Gemini + OpenAI (LLM + embeddings), Infisical (secrets), SigNoz + Oneuptime (observability)
It's a polyglot monorepo: Go where throughput and concurrency matter (API, dispatch, sync), Python where the LLM/RAG ecosystem lives.
Why Go Over Python/Node/Java?
For the parts handling concurrent I/O — PMS sync workers, email dispatch worker, webhook fan-in — Go's goroutines + channels let us run in-process worker pools without pulling in a broker or heavyweight async runtime.
The dispatch worker is a for{ select } loop over a ticker and wake channel — simple and effective for our use case.
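A minimal sketch of that loop shape, not the production worker: the names runDispatch, notify, and demoDispatch are hypothetical, and the buffered wake channel of size 1 is an assumption that lets producers signal without blocking.

```go
package main

import (
	"context"
	"time"
)

// runDispatch blocks until ctx is cancelled. It calls process on every tick
// and whenever a wake signal arrives, so new work need not wait for a tick.
func runDispatch(ctx context.Context, interval time.Duration, wake <-chan struct{}, process func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			process()
		case <-wake:
			process()
		}
	}
}

// notify signals the worker without blocking the caller. With a buffer of 1,
// a wake that is already pending absorbs any further signals.
func notify(wake chan<- struct{}) {
	select {
	case wake <- struct{}{}:
	default:
	}
}

// demoDispatch wakes the worker twice and returns how often process ran.
func demoDispatch() int {
	ctx, cancel := context.WithCancel(context.Background())
	wake := make(chan struct{}, 1)
	ran := make(chan struct{})
	done := make(chan struct{})
	count := 0
	go func() {
		defer close(done)
		// A one-hour interval keeps the ticker out of this demo.
		runDispatch(ctx, time.Hour, wake, func() {
			count++
			ran <- struct{}{}
		})
	}()
	for i := 0; i < 2; i++ {
		notify(wake)
		<-ran // wait until the worker has handled this wake
	}
	cancel()
	<-done
	return count
}
```

The ticker guarantees progress even if a wake is missed, and the wake channel keeps latency low when a batch is enqueued between ticks.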
We kept Python only for the agents service because that's where LangChain, Gemini/OpenAI SDKs, and vector-store clients live. The honest answer: Go for systems work, Python where AI tooling requires it.
Multi-Tenancy: Row-Level Isolation
Shared database, shared schema, row-level isolation by organizationId. Every tenant-scoped table carries an organizationId, with a TenantDB wrapper in the data layer that auto-appends organization_id = $N to queries.
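As an illustration of the auto-appended filter, here is a sketch of what such a wrapper could look like. The ScopedSelect method and its signature are assumptions for this example, not the actual TenantDB API, and the table name is assumed to come from code rather than user input.

```go
package main

import "strconv"

// TenantDB is a hypothetical wrapper that scopes every query to one organization.
type TenantDB struct {
	orgID string
}

// ScopedSelect builds a SELECT whose WHERE clause always ends with the tenant
// filter. where uses $1..$n placeholders for args; the org filter takes the
// next free placeholder.
func (t TenantDB) ScopedSelect(table, where string, args ...any) (string, []any) {
	n := len(args) + 1
	query := "SELECT * FROM " + table + " WHERE "
	if where != "" {
		// Parentheses stop an OR in the caller's clause from bypassing the filter.
		query += "(" + where + ") AND "
	}
	query += "organization_id = $" + strconv.Itoa(n)

	// Copy args so the caller's slice is never mutated.
	scoped := make([]any, 0, n)
	scoped = append(scoped, args...)
	scoped = append(scoped, t.orgID)
	return query, scoped
}
```

Wrapping the caller's condition in parentheses matters: without it, a clause such as email = $1 OR phone = $2 would only apply the tenant filter to the second branch.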
Middleware (MultiTenantContext / RequireTenant) resolves the org from the X-Organization-ID header, query param, cookie, or JWT claim. Below org we scope further by propertyId (a hotel can have multiple properties).
The AI memory store enforces the same boundary differently: every guest's conversation memory lives in a collection keyed org:{orgId}:guest:{guestId}, with a redundant guestId metadata filter as defense-in-depth.
PMS Integration: The Hardest Part
We currently integrate with Mews, a modern cloud-based PMS. The architecture is built to support multiple providers (Opera, Cloudbeds scaffolding exists).
The Architecture Supports Multiple Providers
There's a generic Transformer[TLocal, TRemote] interface for bidirectional mapping (remote ↔ local model) with conflict-resolution strategies: REMOTE_WINS, LOCAL_WINS, MERGE_FIELDS, MANUAL_REVIEW.
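A sketch of how that interface and the four strategies might be expressed with Go generics. Only the Transformer[TLocal, TRemote] name and the strategy names come from the description above. The method names, the Resolve helper, and the Guest/MewsCustomer fields are hypothetical.

```go
package main

import (
	"errors"
	"strings"
)

// ConflictStrategy values mirror the strategies listed above.
type ConflictStrategy string

const (
	RemoteWins   ConflictStrategy = "REMOTE_WINS"
	LocalWins    ConflictStrategy = "LOCAL_WINS"
	MergeFields  ConflictStrategy = "MERGE_FIELDS"
	ManualReview ConflictStrategy = "MANUAL_REVIEW"
)

// Transformer maps between a local model and a provider's remote model.
type Transformer[TLocal, TRemote any] interface {
	ToLocal(remote TRemote) (TLocal, error)
	ToRemote(local TLocal) (TRemote, error)
}

// ErrManualReview tells the caller to park the record for a human decision.
var ErrManualReview = errors.New("conflict requires manual review")

// Resolve picks the winning value for a conflicting record.
// merge is only consulted for MERGE_FIELDS.
func Resolve[T any](s ConflictStrategy, local, remote T, merge func(local, remote T) T) (T, error) {
	switch s {
	case RemoteWins:
		return remote, nil
	case LocalWins:
		return local, nil
	case MergeFields:
		if merge == nil {
			return local, errors.New("MERGE_FIELDS needs a merge function")
		}
		return merge(local, remote), nil
	default:
		// MANUAL_REVIEW and unknown strategies leave local data untouched.
		return local, ErrManualReview
	}
}

// Example models with hypothetical field names.
type Guest struct{ Name, Email string }
type MewsCustomer struct{ FirstName, LastName, Email string }

type guestTransformer struct{}

func (guestTransformer) ToLocal(r MewsCustomer) (Guest, error) {
	return Guest{Name: r.FirstName + " " + r.LastName, Email: r.Email}, nil
}

func (guestTransformer) ToRemote(l Guest) (MewsCustomer, error) {
	first, last, _ := strings.Cut(l.Name, " ")
	return MewsCustomer{FirstName: first, LastName: last, Email: l.Email}, nil
}

// Compile-time check that guestTransformer satisfies the interface.
var _ Transformer[Guest, MewsCustomer] = guestTransformer{}
```

Keeping the strategy separate from the transformer means the same mapping code can be reused while the conflict policy varies per entity type or per tenant.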
Provider capabilities (API versions, base URLs, auth methods, supported entities, field mappings, rate limits, retry policy, webhook support) live in a pms_provider_configs table as JSONB. Adding a provider is "insert a row + implement the client."
Mews is interesting because it's REST but POST-only, with credentials in every request body rather than headers, and separate gross-pricing/net-pricing tokens.
The Real Challenge: Bidirectional Sync
Getting sync working wasn't hard because of HTTP. It was hard because of identity reconciliation and bidirectional sync without clobbering local edits.
We keep a pms_entity_mappings table translating local IDs ↔ Mews external IDs per entity type. The sync orchestrator diffs incoming records against local state, skips writes when there are no real changes, and routes genuine conflicts through the resolution strategy.
Getting "don't overwrite a staff member's local correction with stale remote data" right is the part that requires careful engineering.
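One common way to protect local corrections is a three-way comparison against the snapshot stored at the last successful sync. The sketch below shows that technique under the assumption that such a base snapshot is kept. It is not necessarily how the Stayzr orchestrator does it, and Record and ThreeWayMerge are hypothetical names.

```go
package main

import "sort"

// Record is a flat field map; real entities would be typed structs.
// A missing key is treated the same as an empty string.
type Record map[string]string

// ThreeWayMerge compares local and remote against base, the snapshot from the
// last successful sync. It returns the merged record plus the sorted names of
// fields that both sides changed to different values; those still need a
// conflict-resolution strategy, and merged keeps the local value for them.
func ThreeWayMerge(base, local, remote Record) (merged Record, conflicts []string) {
	merged = Record{}
	for k, v := range local {
		merged[k] = v
	}
	for k, rv := range remote {
		bv, lv := base[k], local[k]
		switch {
		case rv == bv:
			// Remote is unchanged since the last sync: keep local,
			// including any staff correction.
		case lv == bv || lv == rv:
			// Only remote changed, or both sides agree.
			merged[k] = rv
		default:
			// Both sides changed and they disagree.
			conflicts = append(conflicts, k)
		}
	}
	sort.Strings(conflicts)
	return merged, conflicts
}
```

If merged equals local and conflicts is empty, there is nothing to write, which corresponds to the "skip writes when there are no real changes" path.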
Rate Limits, Downtime, Slow Responses
Three layers:
- Token-bucket rate limiter in the Mews client (≈200 requests / 30s). Requests block on the bucket rather than getting 429'd.
- Retry with exponential backoff + jitter (1s, 2s, 4s…) on 429/500/502/503/504, capped at 3 attempts.
- Durable sync queue (pms_sync_queue table) processed by a polling worker. Failed items record attempt count, last error, and nextRetryAt, then retry later, so outages degrade to delayed sync, not lost sync.
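The retry layer from the list above can be sketched as follows. This is an illustration under assumptions: "3 attempts" is read as three total tries, the jitter is up to 25% of the delay, and RetryPolicy, retryable, and withJitter are hypothetical names rather than the Mews client's real API.

```go
package main

import (
	"math/rand"
	"time"
)

// RetryPolicy describes exponential backoff: Base, 2*Base, 4*Base, ...
type RetryPolicy struct {
	Base        time.Duration // delay before the first retry
	MaxAttempts int           // total tries, including the first call
}

// mewsDefault matches the numbers in the text: 1s, 2s, 4s and 3 attempts.
var mewsDefault = RetryPolicy{Base: time.Second, MaxAttempts: 3}

// retryable reports whether an HTTP status is worth retrying.
func retryable(status int) bool {
	switch status {
	case 429, 500, 502, 503, 504:
		return true
	}
	return false
}

// delay returns the un-jittered backoff for a zero-based attempt index.
func (p RetryPolicy) delay(attempt int) time.Duration {
	return p.Base << attempt
}

// withJitter adds up to 25% random extra delay so clients that failed
// together do not all retry at the same instant.
func withJitter(d time.Duration) time.Duration {
	return d + time.Duration(rand.Int63n(int64(d)/4+1))
}

// Do calls fn until it returns a non-retryable status or attempts run out.
// fn returns the HTTP status it observed.
func (p RetryPolicy) Do(fn func() int) (status, attempts int) {
	for attempt := 0; attempt < p.MaxAttempts; attempt++ {
		status = fn()
		attempts++
		if !retryable(status) || attempt == p.MaxAttempts-1 {
			return status, attempts
		}
		time.Sleep(withJitter(p.delay(attempt)))
	}
	return status, attempts
}
```

When Do gives up with a retryable status, the caller would hand the item to the durable queue with an updated attempt count and nextRetryAt instead of dropping it.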
Caching PMS Data
PMS data (guests, reservations, rooms) is synced into PostgreSQL and read from there, so Postgres acts as the cache. Redis holds auth/permissions/session data (5–15 min TTLs) and rate-limit counters (1 min).
Invalidation is event-driven: Mews webhooks (Reservation.Updated, Customer.Updated, Message.Added) enqueue re-sync of affected entities. No classic cache-invalidation race because there's no separate cache tier.
Webhooks: Ack-Fast, Process-Async
How We Handle Inbound Webhooks
- WhatsApp/Meta: Handled in the Go backend, verified via X-Hub-Signature-256 (HMAC-SHA256). Resolve business number → reconcile guest by phone → find-or-create conversation → store message → hand off async.
- Email (campaigns): Resend webhooks (Svix-style signatures) and Postmark webhooks for delivery, open, bounce, complaint, inbound replies.
- PMS: Mews webhooks at POST /webhooks/pms/mews/{integrationId}, HMAC-SHA256 validated, processed async after an immediate 200.
Every webhook: verify signature → ack fast → process in goroutine.
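The verify → ack → process flow can be sketched with the standard library. The sketch uses net/http instead of Fiber to stay self-contained. The "sha256=" + hex format is what Meta sends in X-Hub-Signature-256. The enqueue callback, the 1 MiB body cap, and the function names are assumptions for illustration.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"io"
	"net/http"
	"strings"
)

// sign returns "sha256=" + hex(HMAC-SHA256(secret, body)).
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return "sha256=" + hex.EncodeToString(mac.Sum(nil))
}

// verify checks an X-Hub-Signature-256 header value in constant time.
func verify(secret, body []byte, header string) bool {
	if !strings.HasPrefix(header, "sha256=") {
		return false
	}
	return hmac.Equal([]byte(sign(secret, body)), []byte(header))
}

// webhookHandler verifies the signature over the raw body, hands the payload
// to enqueue (persist or queue it; a worker processes it later), then acks.
func webhookHandler(secret []byte, enqueue func(body []byte) error) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(http.MaxBytesReader(w, r.Body, 1<<20))
		if err != nil {
			http.Error(w, "unreadable body", http.StatusBadRequest)
			return
		}
		if !verify(secret, body, r.Header.Get("X-Hub-Signature-256")) {
			http.Error(w, "bad signature", http.StatusUnauthorized)
			return
		}
		if err := enqueue(body); err != nil {
			// A non-2xx response lets the sender retry the delivery.
			http.Error(w, "could not enqueue", http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusOK)
	}
}
```

The signature has to be computed over the raw bytes as received. Parsing the JSON and re-serializing it first would change the bytes and break verification.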
Throughput Architecture
The intake path is cheap: verify + persist + return 200, then process out-of-band. WhatsApp processing fans out to goroutines; PMS work lands in a durable queue that drains at a controlled rate.
A spike becomes a deeper queue, not dropped messages or webhook timeouts. Throughput is bounded by the downstream workers, not by webhook handling.
Message Queue Strategy
We use Go goroutines + worker pools for fan-out and a Postgres-backed queue table (pms_sync_queue, dispatch batches) where durability across restarts matters.
Postgres-as-a-queue gives us:
- Transactional enqueue
- Visibility into stuck jobs via plain SQL
- Zero new infra to operate
When a single Postgres queue stops keeping up, that's the signal to introduce a real broker.
Message Persistence
For durable paths, queue rows track state: PENDING → PROCESSING → COMPLETED/FAILED with attempt counts and nextRetryAt. A crash mid-process leaves a retryable row.
Email dispatch uses idempotency keys (batchID:contactID) so retries can't double-send.
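A small sketch of the idempotency-key idea. The batchID:contactID key shape comes from the text. SendLog is an in-memory stand-in (an assumption) for what would more likely be a unique constraint in the database, and sendOnce is a hypothetical helper.

```go
package main

import "sync"

// idempotencyKey mirrors the batchID:contactID shape described above.
func idempotencyKey(batchID, contactID string) string {
	return batchID + ":" + contactID
}

// SendLog is an in-memory stand-in for a table with a UNIQUE key column.
type SendLog struct {
	mu   sync.Mutex
	seen map[string]bool
}

func NewSendLog() *SendLog {
	return &SendLog{seen: map[string]bool{}}
}

// Claim returns true only for the first caller of a given key.
func (l *SendLog) Claim(key string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.seen[key] {
		return false
	}
	l.seen[key] = true
	return true
}

// sendOnce delivers at most once per (batch, contact), even when the
// surrounding job is retried.
func sendOnce(log *SendLog, batchID, contactID string, deliver func()) bool {
	if !log.Claim(idempotencyKey(batchID, contactID)) {
		return false // already claimed by an earlier attempt
	}
	deliver()
	return true
}
```

Claiming before delivering gives at-most-once behavior. A real implementation also has to decide what happens when delivery fails after the claim, for example by recording a status next to the key.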
AI Concierge: API-Based, Multi-Provider, Custom ReAct
How It Works
API-based and multi-provider. The default LLM is Gemini (gemini-2.0-flash-exp), with OpenAI, DeepSeek, and Azure OpenAI selectable via an LLM factory.
It's a custom ReAct loop with a router plus specialist agents. The router LLM classifies guest intent and delegates to one of five specialists:
- Booking
- Services
- Property Info
- Atlas/knowledge
- Catch-all Concierge
Each specialist runs a bounded ReAct loop (3–4 turns), binding tools and executing them against the Go backend over HTTP.
How AI Knows Property-Specific Info
RAG over Qdrant. Each property's documents (CSVs, text, markdown) are chunked, embedded with OpenAI text-embedding-3-small (1536-dim), and stored in the stayzr_knowledge collection with payload indexes on organizationId, propertyId, category, and fileName.
Retrieval is scoped to the property (exact propertyId, or org-wide docs with a null property). Query enrichment anchors vague questions ("what's nearby?") to the property's actual city before embedding. Multi-pass search (strict threshold 0.3 → relaxed 0.15 → top-k fallback) ensures near-misses still return results.
Results carry a matchTier tag so downstream logic knows how much to trust them.
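The tiering logic can be sketched independently of the vector store. The agents service is Python, and each pass there would presumably be a separate Qdrant query with a score threshold. The Go version below is only an illustration that filters an already-fetched candidate list, with hypothetical names (Hit, multiPass) and tier labels.

```go
package main

import (
	"math"
	"sort"
)

// Hit is one scored chunk returned by the vector store.
type Hit struct {
	Text      string
	Score     float64
	MatchTier string // set by multiPass: "strict", "relaxed", or "fallback"
}

// multiPass returns up to k hits. It tries the strict threshold first, then
// the relaxed one, then falls back to the plain top-k regardless of score.
func multiPass(hits []Hit, k int) []Hit {
	// Sort a copy by descending score so the caller's slice is untouched.
	sorted := append([]Hit(nil), hits...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Score > sorted[j].Score })

	passes := []struct {
		tier  string
		floor float64
	}{
		{"strict", 0.3},
		{"relaxed", 0.15},
		{"fallback", math.Inf(-1)},
	}
	for _, p := range passes {
		var out []Hit
		for _, h := range sorted {
			if h.Score >= p.floor && len(out) < k {
				h.MatchTier = p.tier
				out = append(out, h)
			}
		}
		if len(out) > 0 {
			return out
		}
	}
	return nil
}
```

Because every returned hit is labeled with the tier that admitted it, a prompt builder can treat "fallback" results as weak hints rather than grounded facts.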
Escalation When AI Doesn't Know
An explicit escalation score sums these signals:
- Urgent/frustrated/negative sentiment
- All-tools-failed
- Low confidence (<0.4)
- Explicit "I want human/manager" requests
Score ≥0.7 → escalate, 0.3–0.7 → monitor, else none.
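A sketch of the scoring and thresholding. The thresholds (0.7, 0.3) and the low-confidence cutoff (0.4) come from the text. The per-signal weights are made-up values for illustration only, as are the type and function names.

```go
package main

// Signals holds the inputs named above.
type Signals struct {
	NegativeSentiment bool    // urgent, frustrated, or negative tone
	AllToolsFailed    bool    // every tool call in the turn failed
	Confidence        float64 // model confidence in [0, 1]
	AskedForHuman     bool    // explicit request for a human or manager
}

// escalationScore sums weighted signals and caps the result at 1.
// The weights are illustrative assumptions, not production values.
func escalationScore(s Signals) float64 {
	score := 0.0
	if s.NegativeSentiment {
		score += 0.3
	}
	if s.AllToolsFailed {
		score += 0.3
	}
	if s.Confidence < 0.4 {
		score += 0.2
	}
	if s.AskedForHuman {
		score += 0.7 // an explicit request is enough to escalate on its own
	}
	if score > 1 {
		score = 1
	}
	return score
}

// decide maps a score to an action using the thresholds from the text.
func decide(score float64) string {
	switch {
	case score >= 0.7:
		return "escalate"
	case score >= 0.3:
		return "monitor"
	default:
		return "none"
	}
}
```

With these example weights a single soft signal only triggers monitoring, while an explicit request for a human escalates immediately.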
On hard failure, the specialist returns a human-handoff line rather than hallucinating. Anti-hallucination guardrails detect ungrounded details. A duplicate-reply suppressor stops the bot from repeating the same answer without a new tool result.
Prompt Engineering Only
100% prompt engineering — no fine-tuning. Every prompt is a builder function (buildRouterPrompt, buildBookingPrompt) injecting dynamic context: property basics, amenities, policies, local time, recent guest memory, active booking state.
For a multi-tenant product where every property's facts differ, RAG + structured prompts beat fine-tuning.
Scalability & Performance
Current Architecture
Go services are mostly stateless (state lives in Postgres/Redis), so horizontal scaling is available: run N backend processes behind Nginx. Today we scale vertically and rely on the queues to absorb spikes.
Caching Strategy
Redis for:
- Org data (5 min TTL)
- User permissions (15 min)
- Session validation (5 min)
- Sliding-window rate-limit counters (1 min)
Pool sized at 100 connections. PMS/business data in Postgres, not Redis-cached.
WhatsApp Rate Limiting
The token-bucket limiter pattern that paces requests already exists in the codebase (Mews client), and the email dispatch worker has a per-minute global rate limit (default 60 RPM). WhatsApp sends can use the same pattern with a different ceiling when volume justifies it.
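For reference, a blocking token bucket can be written in a few dozen lines of standard-library Go. This is a generic sketch, not the Mews client's limiter: TokenBucket, reserve, and newMewsBucket are hypothetical names, and the 200-per-30s figure is the approximate Mews limit quoted earlier.

```go
package main

import (
	"context"
	"math"
	"sync"
	"time"
)

// TokenBucket allows bursts up to capacity and refills continuously.
type TokenBucket struct {
	mu       sync.Mutex
	capacity float64
	tokens   float64 // may go negative: each waiter takes on "debt"
	perSec   float64 // refill rate in tokens per second
	last     time.Time
}

func NewTokenBucket(capacity int, per time.Duration) *TokenBucket {
	c := float64(capacity)
	return &TokenBucket{capacity: c, tokens: c, perSec: c / per.Seconds(), last: time.Now()}
}

// newMewsBucket uses the approximate limit from the text: 200 requests / 30s.
func newMewsBucket() *TokenBucket {
	return NewTokenBucket(200, 30*time.Second)
}

// reserve takes one token and returns how long the caller must wait for it.
func (b *TokenBucket) reserve(now time.Time) time.Duration {
	b.mu.Lock()
	defer b.mu.Unlock()
	if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
		b.tokens = math.Min(b.capacity, b.tokens+elapsed*b.perSec)
		b.last = now
	}
	b.tokens--
	if b.tokens >= 0 {
		return 0
	}
	return time.Duration(-b.tokens / b.perSec * float64(time.Second))
}

// Wait blocks until a token is available or ctx is cancelled.
// A cancelled waiter does not return its token in this simple version.
func (b *TokenBucket) Wait(ctx context.Context) error {
	d := b.reserve(time.Now())
	if d == 0 {
		return nil
	}
	timer := time.NewTimer(d)
	defer timer.Stop()
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-timer.C:
		return nil
	}
}
```

Changing the ceiling for another channel is then a matter of constructing the bucket with different numbers, for example NewTokenBucket(60, time.Minute) for a 60 RPM limit.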
Observability
OpenTelemetry end-to-end. OTel collector ships traces/logs to SigNoz and Oneuptime (dual export).
Prometheus scrapes node/postgres/redis/nginx/cadvisor every 15s with Alertmanager rules. Blackbox exporter probes public health endpoints (api.stayzr.com/health, marketing site, admin).
Go backend instrumented with OTel SDK + OTLP HTTP exporter. Structured logging (zap in Go) into OTel/SigNoz pipeline, PM2 log files parsed by collector's filelog receiver.
Security & Privacy
Webhook Authentication
Every inbound webhook is signature-verified:
- WhatsApp/Meta via X-Hub-Signature-256 (HMAC-SHA256)
- Mews via HMAC-SHA256
- Resend via Svix-style HMAC over id.timestamp.body
- Postmark via server token/signature
- Agents service via X-Agent-Signature HMAC + agent-ID allowlist
No unauthenticated webhook endpoints.
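The Svix-style check differs from the plain HMAC headers in two ways: the signed content is id.timestamp.body, and the signature is base64 rather than hex. A sketch under the assumption that the standard Svix scheme applies (a "whsec_"-prefixed base64 secret, and a signature header of space-separated "v1,<signature>" entries); function names are hypothetical.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"strings"
)

// svixSign computes base64(HMAC-SHA256(key, id + "." + timestamp + "." + body)).
func svixSign(key []byte, id, timestamp string, body []byte) string {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(id + "." + timestamp + "."))
	mac.Write(body)
	return base64.StdEncoding.EncodeToString(mac.Sum(nil))
}

// verifySvix accepts a secret of the form "whsec_<base64 key>" and a signature
// header holding space-separated "v1,<base64 sig>" entries. Any matching v1
// entry passes, which allows secrets to be rotated.
func verifySvix(secret, id, timestamp string, body []byte, sigHeader string) bool {
	key, err := base64.StdEncoding.DecodeString(strings.TrimPrefix(secret, "whsec_"))
	if err != nil {
		return false
	}
	want := []byte(svixSign(key, id, timestamp, body))
	for _, entry := range strings.Fields(sigHeader) {
		version, sig, ok := strings.Cut(entry, ",")
		if ok && version == "v1" && hmac.Equal([]byte(sig), want) {
			return true
		}
	}
	return false
}
```

A production verifier would also reject timestamps outside a tolerance window to limit replay. That check is left out here to keep the sketch short.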
Encryption
In transit: TLS 1.2/1.3 everywhere via Nginx + Let's Encrypt, strong cipher suites, HSTS.
At rest: disk/storage-level encryption, plus application-level encryption for OAuth tokens via a secure token codec.
PCI Scope
We don't store card data. Payment provider configs exist (Stripe, Razorpay), and the integration model lets the processor handle cards. Never touching PANs keeps our PCI scope to a minimum.
Lessons Learned
Biggest Technical Wins
- Postgres-as-a-queue works — until it doesn't. When it stops keeping up, that's when you introduce RabbitMQ/Kafka. Not before.
- PMS APIs are idiosyncratic — they're not clean REST. POST-only, body-auth, partial webhooks, different data models. Build for reconciliation and conflict from start.
- AI latency is hundreds of ms to seconds. When a reply involves an external LLM call plus RAG plus tool calls, optimize for correctness and don't make speed claims the pipeline can't back up.
- Documentation matters — keep README aligned with actual stack. Documentation debt is invisible until someone reads it and the map doesn't match territory.
What Works Well
- Goroutines for I/O-bound work — in-process worker pools without external broker
- RAG + structured prompts — beats fine-tuning for multi-tenant with different property facts
- Event-driven invalidation — webhooks trigger re-sync, no cache-invalidation races
- Idempotency keys — prevent double-send on retries
What's Next
PMS Integrations
Opera and Cloudbeds are next (config scaffolding exists); the service-layer abstraction should make the second provider straightforward.
AI Features
Multilingual support is partially in place: the agent runtime handles locales, and migrations canonicalize guest languages. The agent runtime's tool-using design makes additional modalities feasible.
Scale
The architecture supports horizontal scaling (stateless Go services, state in Postgres/Redis). The next infrastructure investment is adding redundancy as we scale beyond current design partners.
For Developers Building Similar Systems
If you're building in hospitality tech or B2B SaaS with:
- PMS/hotel integrations
- High-throughput webhook handling
- AI concierge with property-specific knowledge
- Multi-tenant architecture
I'm happy to share patterns. Drop a comment or reach out.
Check Out Stayzr (If You're a Hotel Operator)
We're actively onboarding design partners. If you're a hotel operator drowning in guest messages, manual back-office work, or B2B travel agent requests, I'd love to show you what's possible.
30-day free trial, no strings attached.
If you found this useful, I'm planning more deep-dives on:
- Go concurrency patterns for high-throughput webhook systems
- PMS transformer abstraction (code walkthrough)
- RAG pipeline for property-specific knowledge (Qdrant + multi-pass search)
Drop a comment if you want to see any of these.
