Prince Raj

Building a Multi-Tenant Observability Platform with SigNoz + OneUptime

Modern SaaS teams need deep observability without sacrificing tenant isolation or compliance. This post explains how we built a multi-tenant monitoring platform that routes logs, metrics, and traces to isolated SigNoz and OneUptime stacks, enforces strong security controls, and aligns with SOC 2 and ISO 27001 practices. The result: each customer gets a dedicated monitoring experience while we keep the operational footprint lean and repeatable.

1) Architecture Overview

We designed a hub-and-spoke model:

  • A central monitoring VM hosts the observability stack.
  • Each tenant has either:
    • a fully isolated SigNoz stack (frontend, query, collector, ClickHouse), or
    • a shared stack with strict routing based on a tenant identifier (for lightweight tenants).
  • Each application VM runs an OpenTelemetry (OTEL) Collector that tails PM2 logs, receives OTLP traces/metrics, and forwards to the monitoring VM.

This gives us a consistent ingestion pipeline across all tenants while keeping isolation the default wherever a tenant needs it.

(Diagram: the ingestion pipeline, from tenant application VMs through their OTEL Collectors to the per-tenant SigNoz and OneUptime stacks on the monitoring VM.)
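
Since the app-VM collectors do the heavy lifting here, a minimal sketch of their wiring helps make the flow concrete (the endpoint, PM2 log path, and names below are illustrative, not our exact config):

receivers:
  otlp:
    protocols:
      grpc:
      http:
  filelog/pm2:
    include: [ /home/app/.pm2/logs/*.log ]

exporters:
  otlphttp/monitoring:
    endpoint: https://signoz.tenant-a.example:4318

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlphttp/monitoring]
    metrics:
      receivers: [otlp]
      exporters: [otlphttp/monitoring]
    logs:
      receivers: [otlp, filelog/pm2]
      exporters: [otlphttp/monitoring]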

2) Tenant Segregation Strategy

We support two isolation modes:

1) Full isolation per tenant

  • Dedicated SigNoz stack per tenant
  • Separate ClickHouse instance
  • Separate OTEL collector upstream
  • Strongest data isolation

2) Logical isolation on a shared stack

  • Single SigNoz + ClickHouse
  • Routing by business_id (header + resource attribute)
  • Good for smaller tenants

We default to full isolation for regulated or high-traffic customers.

Key routing headers:

  • x-business-id for SigNoz
  • x-oneuptime-token for OneUptime
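
One way to attach these headers is on each tenant's OTEL exporters rather than in application code. A rough sketch (the endpoints and the ONEUPTIME_TOKEN variable are assumptions for illustration):

exporters:
  otlphttp/signoz:
    endpoint: https://signoz.tenant-a.example:4318
    headers:
      x-business-id: ${env:BUSINESS_ID}
  otlphttp/oneuptime:
    endpoint: https://status.tenant-a.example:4318
    headers:
      x-oneuptime-token: ${env:ONEUPTIME_TOKEN}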

3) Provisioning and Hardening the Monitoring VM

We treat the monitoring VM as a controlled production system:

  • SSH keys only, no password auth
  • Minimal inbound ports (22, 80, 443, 4317/4318)
  • Nginx as a single TLS ingress
  • Docker Compose for immutable service layout

Example provisioning steps (high-level):

# SSH key-based access only
az vm user update --resource-group <rg> --name <vm> --username <user> --ssh-key-value "<pubkey>"

# Open required ports (restrict SSH to trusted IPs)
az network nsg rule create ... --destination-port-ranges 22 80 443 4317 4318
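
For fully isolated tenants, the Compose layout on the monitoring VM looks roughly like this (service and image names are illustrative; the official SigNoz compose files remain the source of truth):

services:
  clickhouse-tenant-a:
    image: clickhouse/clickhouse-server:latest
    volumes:
      - clickhouse-tenant-a:/var/lib/clickhouse
  signoz-query-tenant-a:
    image: signoz/query-service:latest
    depends_on: [clickhouse-tenant-a]
  signoz-frontend-tenant-a:
    image: signoz/frontend:latest
    depends_on: [signoz-query-tenant-a]
  signoz-otel-collector-tenant-a:
    image: signoz/signoz-otel-collector:latest
    depends_on: [clickhouse-tenant-a]

volumes:
  clickhouse-tenant-a: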

4) Multi-Tenant Routing at the Edge

We use Nginx maps to route traffic by hostname for both UI and OTLP ingestion:

map $host $signoz_collector_upstream {
  signoz.tenant-a.example  signoz-otel-collector-tenant-a;
  signoz.tenant-b.example  signoz-otel-collector-tenant-b;
  default                  signoz-otel-collector-default;
}

server {
  listen 4318;

  # proxy_pass with a variable resolves the upstream at request time,
  # so Nginx needs a resolver (127.0.0.11 is Docker's embedded DNS)
  resolver 127.0.0.11 valid=10s;

  location / {
    # forward OTLP/HTTP to the tenant's collector on its OTLP port
    proxy_pass http://$signoz_collector_upstream:4318;
    proxy_set_header Host $host;
  }
}

This gives us clean DNS-based tenant routing while keeping a single IP.
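
A quick way to verify the routing is to hit the shared IP with different Host headers (the IP and hostnames below are placeholders):

# An empty-but-valid OTLP payload should come back 200 from tenant A's collector
curl -s -o /dev/null -w '%{http_code}\n' \
  -X POST http://203.0.113.10:4318/v1/traces \
  -H 'Host: signoz.tenant-a.example' \
  -H 'Content-Type: application/json' \
  -d '{"resourceSpans":[]}'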

5) Collector Configuration: Logs, Traces, Metrics

Each tenant VM runs OTEL Collector with filelog + OTLP. We parse PM2 logs (JSON wrapper), normalize severity, and attach resource fields for fast filtering in SigNoz.

Core fields we enforce:

  • severity_text (info/warn/error)
  • service.name
  • deployment.environment
  • host.name
  • business_id

Minimal config excerpt:

processors:
  resourcedetection:
    detectors: [system]
  resource:
    attributes:
      - key: business_id
        value: ${env:BUSINESS_ID}
        action: upsert

  transform/logs:
    log_statements:
      - context: log
        statements:
          - set(severity_text, attributes["severity"]) where attributes["severity"] != nil

This makes severity_text, service.name, and host.name searchable immediately in SigNoz.
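
For completeness, the PM2 side of the same config: the filelog receiver tails the files and operators unwrap the JSON wrapper (paths and field names are illustrative, since the exact JSON shape depends on how PM2 logging is set up):

receivers:
  filelog/pm2:
    include:
      - /home/app/.pm2/logs/*-out.log
      - /home/app/.pm2/logs/*-error.log
    operators:
      # unwrap the JSON line into attributes (message, severity, timestamp, ...)
      - type: json_parser
        parse_from: body
        parse_to: attributes
      # promote the wrapped message back to the log body
      - type: move
        from: attributes.message
        to: body
        if: attributes.message != nil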

6) Client-Side Integration (Apps)

We use a consistent OTEL pattern across backend, web, and agent services:

  • Backend: OTLP exporter for traces
  • Web: browser traces forwarded to backend (which re-exports)
  • Agents: OTEL SDK configured with OTEL_EXPORTER_OTLP_ENDPOINT

Typical environment variables:

BUSINESS_ID=tenant-a
SIGNOZ_ENDPOINT=http://signoz.tenant-a.example:4318
ONEUPTIME_ENDPOINT=http://status.tenant-a.example:4318
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://127.0.0.1:4318/v1/traces
DEPLOY_ENV=production
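
For the backend piece, a minimal bootstrap sketch (assuming a Node.js service, which PM2 implies; the service name and attribute values are illustrative):

// tracing.ts -- load before the rest of the app (e.g. via node -r after compiling)
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { Resource } from '@opentelemetry/resources';

const sdk = new NodeSDK({
  resource: new Resource({
    'service.name': 'backend-api',
    'deployment.environment': process.env.DEPLOY_ENV ?? 'production',
    'business_id': process.env.BUSINESS_ID ?? 'unknown',
  }),
  // exports to the local collector, which forwards to the tenant's SigNoz stack
  traceExporter: new OTLPTraceExporter({
    url: process.env.OTEL_EXPORTER_OTLP_TRACES_ENDPOINT, // http://127.0.0.1:4318/v1/traces
  }),
});

sdk.start();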

7) DNS and TLS (Public UX)

Each tenant gets:

  • signoz.<tenant-domain> for the SigNoz UI
  • status.<tenant-domain> for OneUptime

We terminate TLS at Nginx with real certificates (ACME/Let's Encrypt):

sudo certbot --nginx -d signoz.tenant-a.example -d status.tenant-a.example

We keep per-tenant TLS policies aligned with strong ciphers and HSTS.
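
The resulting server blocks look roughly like this (certificate paths are the certbot defaults; the frontend upstream name and port are illustrative):

server {
  listen 443 ssl;
  server_name signoz.tenant-a.example;

  ssl_certificate     /etc/letsencrypt/live/signoz.tenant-a.example/fullchain.pem;
  ssl_certificate_key /etc/letsencrypt/live/signoz.tenant-a.example/privkey.pem;
  ssl_protocols       TLSv1.2 TLSv1.3;
  ssl_prefer_server_ciphers on;

  # HSTS for the tenant-facing UI
  add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;

  location / {
    proxy_pass http://signoz-frontend-tenant-a:3301;
    proxy_set_header Host $host;
  }
}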

8) Verification and Observability QA

We validate the pipeline with:

  • OTEL health endpoint (/health on collector)
  • Test traffic from backend
  • ClickHouse queries to confirm log attributes
  • SigNoz filters for severity_text, service.name, host.name
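
The first two checks are easy to script (the health port assumes the health_check extension's defaults; the payload is a throwaway test record):

# collector health (health_check extension; adjust port/path to your config)
curl -sf http://127.0.0.1:13133/ && echo "collector healthy"

# push a single test log record over OTLP/HTTP
curl -s -X POST http://127.0.0.1:4318/v1/logs \
  -H 'Content-Type: application/json' \
  -d '{"resourceLogs":[{"resource":{"attributes":[
        {"key":"service.name","value":{"stringValue":"pipeline-check"}},
        {"key":"business_id","value":{"stringValue":"tenant-a"}}]},
      "scopeLogs":[{"logRecords":[{"severityText":"INFO",
        "body":{"stringValue":"observability pipeline test"}}]}]}]}'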

Example ClickHouse check (internal):

SELECT severity_text, count()
FROM signoz_logs.logs_v2
WHERE resources_string['business_id'] = 'tenant-a'
AND timestamp >= now() - INTERVAL 15 MINUTE
GROUP BY severity_text;

9) Security and Compliance (SOC 2 + ISO 27001)

We implemented controls aligned with SOC 2 and ISO 27001:

  • Access control: SSH keys only, least privilege, MFA on cloud console.
  • Network segmentation: minimal open ports; SSH restricted by source IP.
  • Secrets management: runtime secrets stored in a vault, never in code.
  • Encryption in transit: TLS everywhere, no plaintext endpoints exposed.
  • Encryption at rest: disk encryption enabled on VMs and DB volumes.
  • Audit trails: system logs retained; infra changes tracked in code.
  • Change management: all config in repos; change reviews before deployment.
  • Monitoring and alerting: OneUptime for SLOs and uptime checks.
  • Incident response: documented procedures, retention and escalation.
  • Backup strategy: ClickHouse backup policies per tenant.

10) Repeatability: Infra + Tenant Config as Code

We split configuration by responsibility:

  • Monitoring services repo: all infra and Nginx routing
  • Tenant repos: OTEL collector config and deploy hooks

That means a new VM can be rebuilt with:
1) Pull monitoring repo and run docker compose up -d
2) Update DNS + TLS
3) Run tenant deployment scripts to install collector and env
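
As a concrete sketch of that flow (the repository and script names here are hypothetical):

# 1) observability stack
git clone git@github.com:example/monitoring-services.git && cd monitoring-services
docker compose up -d

# 2) DNS + TLS, after pointing the tenant hostnames at the new VM
sudo certbot --nginx -d signoz.tenant-a.example -d status.tenant-a.example

# 3) tenant collector + environment
./scripts/deploy-tenant.sh tenant-a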

Final Takeaways

This architecture gives us the best of both worlds:

  • Strong tenant isolation for compliance-focused clients
  • Shared ops processes and standard config
  • Fast log filtering (severity/service/env/host) for high signal-to-noise debugging
  • A repeatable, audited deployment flow suitable for SOC 2 and ISO 27001 requirements
