Modern SaaS teams need deep observability without sacrificing tenant isolation or compliance. This post explains how we built a multi-tenant monitoring platform that routes logs, metrics, and traces to isolated SigNoz and OneUptime stacks, enforces strong security controls, and aligns with SOC 2 and ISO 27001 practices. The result: each customer gets a dedicated monitoring experience while we keep the operational footprint lean and repeatable.
1) Architecture Overview
We designed a hub-and-spoke model:
- A central monitoring VM hosts the observability stack.
- Each tenant has either:
  - a fully isolated SigNoz stack (frontend, query service, collector, ClickHouse), or
  - a shared stack with strict routing based on a tenant identifier (for lightweight tenants).
- Each application VM runs an OpenTelemetry (OTEL) Collector that tails PM2 logs, receives OTLP traces/metrics, and forwards to the monitoring VM.
This gives a consistent ingestion pipeline while allowing isolation-by-default where needed.
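A minimal sketch of the app-VM collector wiring, assuming an OTLP/HTTP export to the monitoring VM (the hostname and log path are placeholders; receiver details follow in section 5):

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  filelog:
    include: [/var/log/pm2/*.log]   # assumed PM2 log path; see section 5

exporters:
  otlphttp:
    endpoint: https://monitoring.example.internal:4318   # placeholder monitoring VM

service:
  pipelines:
    logs:
      receivers: [filelog, otlp]
      exporters: [otlphttp]
    traces:
      receivers: [otlp]
      exporters: [otlphttp]
    metrics:
      receivers: [otlp]
      exporters: [otlphttp]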
2) Tenant Segregation Strategy
We support two isolation modes:
1) Full isolation per tenant
   - Dedicated SigNoz stack per tenant
   - Separate ClickHouse instance
   - Separate OTEL collector upstream
   - Strongest data isolation
2) Logical isolation on a shared stack
   - Single SigNoz + ClickHouse
   - Routing by business_id (header + resource attribute)
   - Good for smaller tenants
We default to full isolation for regulated or high-traffic customers.
Key routing headers:
- x-business-id for SigNoz
- x-oneuptime-token for OneUptime
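As an illustration, these headers can be attached on the exporter side of the tenant's collector (a sketch; the endpoints are examples and ONEUPTIME_TOKEN is a hypothetical env var holding the ingest token):

exporters:
  otlphttp/signoz:
    endpoint: https://signoz.tenant-a.example:4318
    headers:
      x-business-id: ${env:BUSINESS_ID}
  otlphttp/oneuptime:
    endpoint: https://status.tenant-a.example:4318
    headers:
      x-oneuptime-token: ${env:ONEUPTIME_TOKEN}   # hypothetical env var for the OneUptime token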
3) Provisioning and Hardening the Monitoring VM
We treat the monitoring VM as a controlled production system:
- SSH keys only, no password auth
- Minimal inbound ports (22, 80, 443, 4317/4318)
- Nginx as a single TLS ingress
- Docker Compose for a declarative, reproducible service layout
Example provisioning steps (high-level):
# SSH key-based access only
az vm user update --resource-group <rg> --name <vm> --username <user> --ssh-key-value "<pubkey>"
# Open required ports (restrict SSH to trusted IPs)
az network nsg rule create ... --destination-port-ranges 22 80 443 4317 4318
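A fuller, illustrative version of the SSH rule, restricting port 22 to a trusted office/VPN range (the NSG name and CIDR are placeholders):

az network nsg rule create \
  --resource-group <rg> \
  --nsg-name <vm>-nsg \
  --name allow-ssh-trusted \
  --priority 100 \
  --direction Inbound \
  --access Allow \
  --protocol Tcp \
  --source-address-prefixes 203.0.113.0/24 \
  --destination-port-ranges 22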
4) Multi-Tenant Routing at the Edge
We use Nginx maps to route traffic by hostname for both UI and OTLP ingestion:
map $host $signoz_collector_upstream {
    signoz.tenant-a.example  signoz-otel-collector-tenant-a;
    signoz.tenant-b.example  signoz-otel-collector-tenant-b;
    default                  signoz-otel-collector-default;
}

server {
    listen 4318;

    location / {
        proxy_pass http://$signoz_collector_upstream;
    }
}
This gives us clean DNS-based tenant routing while keeping a single IP.
5) Collector Configuration: Logs, Traces, Metrics
Each tenant VM runs an OTEL Collector with the filelog and OTLP receivers. We parse the PM2 JSON log wrapper, normalize severity, and attach resource fields for fast filtering in SigNoz.
Core fields we enforce:
- severity_text (info/warn/error)
- service.name
- deployment.environment
- host.name
- business_id
Minimal config excerpt:
processors:
  resourcedetection:
    detectors: [system]
  resource:
    attributes:
      - key: business_id
        value: ${env:BUSINESS_ID}
        action: upsert
  transform/logs:
    log_statements:
      - context: log
        statements:
          - set(severity_text, attributes["severity"]) where attributes["severity"] != nil
This makes severity_text, service.name, and host.name searchable immediately in SigNoz.
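For completeness, a sketch of the matching filelog receiver. The PM2 log paths and the "message" field name are assumptions and will vary by app:

receivers:
  filelog:
    include:
      - /home/deploy/.pm2/logs/*-out.log     # assumed PM2 log location
      - /home/deploy/.pm2/logs/*-error.log
    start_at: end
    operators:
      - type: json_parser        # unwrap the PM2 JSON envelope into attributes
        parse_from: body
        parse_to: attributes
      - type: move               # assumed field: promote the wrapped message into the log body
        from: attributes.message
        to: body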
6) Client-Side Integration (Apps)
We used a consistent OTEL pattern across backend, web, and agent services:
- Backend: OTLP exporter for traces
- Web: browser traces forwarded to backend (which re-exports)
- Agents: OTEL SDK configured with OTEL_EXPORTER_OTLP_ENDPOINT
Typical environment variables:
BUSINESS_ID=tenant-a
SIGNOZ_ENDPOINT=http://signoz.tenant-a.example:4318
ONEUPTIME_ENDPOINT=http://status.tenant-a.example:4318
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://127.0.0.1:4318/v1/traces
DEPLOY_ENV=production
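On the backend (Node under PM2), the wiring is roughly the following. This is a sketch assuming the standard OTEL Node packages; the registration file name is illustrative:

// tracing.ts — register before the app starts (e.g. via --require)
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

// service.name, deployment.environment, and business_id can be supplied via
// OTEL_RESOURCE_ATTRIBUTES=service.name=backend-api,deployment.environment=production,business_id=tenant-a
const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: process.env.OTEL_EXPORTER_OTLP_TRACES_ENDPOINT, // e.g. http://127.0.0.1:4318/v1/traces
  }),
});

sdk.start();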
7) DNS and TLS (Public UX)
Each tenant gets:
- signoz.<tenant-domain> (SigNoz UI)
- status.<tenant-domain> (OneUptime)
We terminate TLS at Nginx with real certificates (ACME/Let's Encrypt):
sudo certbot --nginx -d signoz.tenant-a.example -d status.tenant-a.example
We keep per-tenant TLS policies aligned with strong ciphers and HSTS.
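An illustrative server block for a tenant UI hostname (cert paths are the ones certbot writes; the upstream name and port are placeholders, and ciphers should follow your own baseline):

server {
    listen 443 ssl;
    server_name signoz.tenant-a.example;

    ssl_certificate     /etc/letsencrypt/live/signoz.tenant-a.example/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/signoz.tenant-a.example/privkey.pem;
    ssl_protocols       TLSv1.2 TLSv1.3;

    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;

    location / {
        proxy_pass http://signoz-frontend-tenant-a:3301;   # hypothetical upstream name/port
        proxy_set_header Host $host;
    }
}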
8) Verification and Observability QA
We validate the pipeline with:
- OTEL health endpoint (/health on collector)
- Test traffic from backend
- ClickHouse queries to confirm log attributes
- SigNoz filters for severity_text, service.name, host.name
Example ClickHouse check (internal):
SELECT severity_text, count()
FROM signoz_logs.logs_v2
WHERE resources_string['business_id'] = 'tenant-a'
AND timestamp >= now() - INTERVAL 15 MINUTE
GROUP BY severity_text;
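Before reaching for ClickHouse, a quick smoke test from the app VM can confirm end-to-end ingestion (endpoints and the health-check port depend on your collector config):

# Collector health (health_check extension; port/path are configurable)
curl -s http://127.0.0.1:13133/health

# Send one OTLP/HTTP test log through the local collector
curl -s -X POST http://127.0.0.1:4318/v1/logs \
  -H 'Content-Type: application/json' \
  -d '{"resourceLogs":[{"resource":{"attributes":[{"key":"business_id","value":{"stringValue":"tenant-a"}}]},"scopeLogs":[{"logRecords":[{"severityText":"INFO","body":{"stringValue":"pipeline smoke test"}}]}]}]}'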
9) Security and Compliance (SOC 2 + ISO 27001)
We implemented controls aligned with SOC 2 and ISO 27001:
- Access control: SSH keys only, least privilege, MFA on cloud console.
- Network segmentation: minimal open ports; SSH restricted by source IP.
- Secrets management: runtime secrets stored in a vault, never in code.
- Encryption in transit: TLS everywhere, no plaintext endpoints exposed.
- Encryption at rest: disk encryption enabled on VMs and DB volumes.
- Audit trails: system logs retained; infra changes tracked in code.
- Change management: all config in repos; change reviews before deployment.
- Monitoring and alerting: OneUptime for SLOs and uptime checks.
- Incident response: documented procedures, retention and escalation.
- Backup strategy: ClickHouse backup policies per tenant.
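For the backup control, a nightly per-tenant job might look like the following, assuming the Altinity clickhouse-backup tool with a configured remote storage target (the backup name is a placeholder; retention is enforced via the tool's keep-local/keep-remote settings):

# Nightly per-tenant backup pushed to remote storage
clickhouse-backup create_remote "tenant-a-$(date +%F)"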
10) Repeatability: Infra + Tenant Config as Code
We split configuration by responsibility:
- Monitoring services repo: all infra and Nginx routing
- Tenant repos: OTEL collector config and deploy hooks
That means a new VM can be rebuilt with:
1) Pull monitoring repo and run docker compose up -d
2) Update DNS + TLS
3) Run tenant deployment scripts to install collector and env
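In practice the rebuild looks roughly like this (repo URL, domains, and script paths are placeholders):

# 1) Monitoring stack
git clone git@github.com:example/monitoring-services.git && cd monitoring-services
docker compose up -d

# 2) DNS + TLS (after pointing signoz.<tenant> / status.<tenant> at the VM)
sudo certbot --nginx -d signoz.tenant-a.example -d status.tenant-a.example

# 3) Tenant collector + env
./deploy/install-otel-collector.sh tenant-a   # hypothetical per-tenant deploy script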
Final Takeaways
This architecture gives us the best of both worlds:
- Strong tenant isolation for compliance-focused clients
- Shared ops processes and standard config
- Fast log filtering (severity/service/env/host) for high signal-to-noise debugging
- A repeatable, audited deployment flow suitable for SOC 2 and ISO 27001 requirements
