Vigilmon

Posted on Jun 27

How to Monitor Kubernetes Services with Vigilmon (HTTP + TCP + Alerts)

#kubernetes #devops #monitoring #sre

Running services on Kubernetes gives you powerful orchestration — self-healing pods, rolling deployments, horizontal scaling. But Kubernetes also introduces a new class of failure modes that internal health probes and kubectl get pods simply cannot catch.

In this tutorial you'll set up comprehensive uptime monitoring for your Kubernetes services using Vigilmon — free tier, no credit card required.

Why Kubernetes services need external monitoring

Kubernetes has livenessProbe and readinessProbe built in. So why do you need external monitoring on top of that?

Because the internal probes are run by the kubelet on each node, so they only tell you what the node can see. External monitoring tells you what your users see. These are not the same thing:

LoadBalancer IP is unreachable: your pods are healthy, but the cloud provider's load balancer has a routing issue or the external IP never got provisioned
NodePort binding breaks: a node reboots and the NodePort service doesn't come back cleanly; pods show Running but traffic can't reach them
Ingress controller misconfiguration: an nginx-ingress or Traefik update changes routing rules; pods are fine, users get 404s
DNS resolution failure: your cluster-external DNS record stops resolving; Kubernetes knows nothing about this
Cross-namespace connectivity broken: a NetworkPolicy change silently drops traffic between namespaces, for example between the ingress controller's namespace and your app's; pods stay Running while user requests time out

The pattern is the same: Kubernetes reports everything green while users see downtime. External monitoring from multiple geographic regions is the only reliable way to catch this class of failure.
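The failure modes above sit at different layers, and a check that runs outside the cluster can tell them apart by testing DNS, TCP, and HTTP in order. Here is a minimal Python sketch of that idea. It is not how Vigilmon works internally; `classify` and `probe` are hypothetical names chosen for illustration.

```python
import http.client
import socket
import urllib.error
import urllib.request
from typing import Optional
from urllib.parse import urlparse


def classify(dns_ok: bool, tcp_ok: bool, http_status: Optional[int]) -> str:
    """Map layered check results to the most likely failing layer."""
    if not dns_ok:
        return "dns"  # record missing or not resolving
    if not tcp_ok:
        return "network"  # load balancer, NodePort, or firewall
    if http_status is None:
        return "no-http-response"  # port open, but no valid HTTP reply
    if http_status >= 400:
        return "application-or-ingress"  # e.g. Ingress returning 404
    return "healthy"


def probe(url: str, timeout: float = 5.0) -> str:
    """Run DNS, TCP, and HTTP checks against url from wherever this runs."""
    parts = urlparse(url)
    host = parts.hostname
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        socket.getaddrinfo(host, port)
    except socket.gaierror:
        return classify(False, False, None)
    try:
        socket.create_connection((host, port), timeout=timeout).close()
    except OSError:
        return classify(True, False, None)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # the server answered, but with an error status
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        status = None
    return classify(True, True, status)
```

Running `probe("https://api.yourdomain.com/health")` from a machine outside the cluster gives a rough first guess at which layer to investigate. A single vantage point can still be misled by its own network, which is why probing from several regions matters.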

What you'll need

A running Kubernetes cluster (any provider: EKS, GKE, AKS, k3s, etc.)
A service exposed via LoadBalancer, NodePort, or Ingress
A free Vigilmon account (sign up takes 30 seconds)

Step 1: Expose your service and add a health endpoint

Before setting up monitoring, make sure your service is reachable from outside the cluster and has a /health endpoint to probe.

Here's a minimal Kubernetes Deployment + Service for a Node.js app:

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-api
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-api
  template:
    metadata:
      labels:
        app: my-api
    spec:
      containers:
        - name: my-api
          image: your-registry/my-api:latest
          ports:
            - containerPort: 3000
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 10
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
---
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: my-api
  namespace: production
spec:
  type: LoadBalancer
  selector:
    app: my-api
  ports:
    - port: 80
      targetPort: 3000
      protocol: TCP

Apply both:

kubectl apply -f deployment.yaml
kubectl apply -f service.yaml

Get your external IP:

kubectl get svc my-api -n production
# NAME     TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)        AGE
# my-api   LoadBalancer   10.96.42.100    203.0.113.50     80:31234/TCP   2m

Make sure your /health route returns HTTP 200:

curl http://203.0.113.50/health
# {"status":"ok","timestamp":"2024-01-15T10:23:00Z"}
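If your app doesn't have a health route yet, the contract is small: answer GET /health with HTTP 200 and a short JSON body. The tutorial's app is Node.js; the sketch below shows the same contract in Python using only the standard library, as an illustration rather than the app's real code. `health_payload`, `HealthHandler`, and `serve` are hypothetical names.

```python
import json
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler, HTTPServer


def health_payload() -> dict:
    """Build the /health body; the shape mirrors the curl output above."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {"status": "ok", "timestamp": now}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/health":
            self.send_error(404)
            return
        # Compact separators produce "status":"ok" with no space after the colon.
        body = json.dumps(health_payload(), separators=(",", ":")).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 3000) -> None:
    """Listen on the containerPort used in deployment.yaml."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Calling `serve()` starts the listener on port 3000. A real health route would usually also check the dependencies the app needs (database, cache) before reporting ok.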

Step 2: Set up HTTP monitoring in Vigilmon

Now that your service is reachable, add it to Vigilmon:

Log in to vigilmon.online and go to Monitors → New Monitor
Choose HTTP / HTTPS as the type
Set the URL to your service endpoint, e.g. http://203.0.113.50/health (or https://api.yourdomain.com/health if you have a domain and TLS)
Set the check interval to 1 minute
Under Expected response, set:
- Status code: 200
- (Optional) Response body contains: "status":"ok"
Save the monitor
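To make the expected-response settings concrete, here is a rough Python sketch of the rule they describe: the status code must match, and the body must contain a given substring. This models the settings as described in this tutorial, not Vigilmon's actual implementation, and `matches_expected` is a hypothetical name.

```python
from typing import Optional


def matches_expected(
    status_code: int,
    body: str,
    expected_status: int = 200,
    must_contain: Optional[str] = '"status":"ok"',
) -> bool:
    """Return True when a response satisfies both expected-response rules."""
    if status_code != expected_status:
        return False
    if must_contain is not None and must_contain not in body:
        return False
    return True
```

A plain substring match is whitespace-sensitive: a body serialized as `{"status": "ok"}` (with a space after the colon) would not contain `"status":"ok"`. Check your app's real output with curl before you set the body rule.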

Vigilmon will probe your endpoint from multiple geographic regions. If those regions agree that your LoadBalancer is unreachable, even while your pods are running, an incident is opened.

The multi-region consensus advantage

This is where Vigilmon differs from single-region monitoring tools. Instead of declaring an outage the moment one probe fails, Vigilmon uses multi-region consensus: multiple independent probes from different geographic locations must agree that the service is unreachable before an alert fires.

For Kubernetes, this matters because:

A single probe failure could be a transient network blip on one cloud provider's transit path
A regional cloud provider incident might affect probes from one region but not others
True downtime is confirmed quickly (within seconds) when multiple regions agree

You get fewer false positives without missing real incidents.
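The consensus idea can be sketched in a few lines of Python. This is an illustration of the general technique, not Vigilmon's algorithm: the quorum of 2 and the region names in the usage note are assumptions, and `consensus_down` is a hypothetical name.

```python
from typing import Mapping


def consensus_down(results: Mapping[str, bool], quorum: int = 2) -> bool:
    """Return True when at least `quorum` regions report the target unreachable.

    `results` maps a region name to True (reachable) or False (unreachable).
    """
    failures = sum(1 for reachable in results.values() if not reachable)
    return failures >= quorum
```

With `{"us-east": False, "eu-west": True, "ap-south": True}` the function returns False, because one failing region looks like a transient blip. Once a second region also fails, it returns True and an alert would fire.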

Step 3: Monitor your TCP layer

HTTP monitoring tells you if your app responds correctly. TCP monitoring tells you if the port is reachable at all — useful for detecting load balancer routing failures before they manifest as HTTP errors.

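If you're running a database or other TCP service inside your cluster, you can expose it via NodePort so an external monitor can reach it. A NodePort opens the port on every node, so restrict access with firewall rules or security groups rather than leaving a database port open to the internet: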

# postgres-nodeport.yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres-nodeport
  namespace: production
spec:
  type: NodePort
  selector:
    app: postgres
  ports:
    - port: 5432
      targetPort: 5432
      nodePort: 30432
      protocol: TCP

Then in Vigilmon:

Go to Monitors → New Monitor
Choose TCP Port
Enter your node's hostname or IP and the NodePort number, e.g. node-1.example.com / 30432
Save the monitor

Now you'll know immediately if the TCP socket becomes unreachable — even if Kubernetes reports the pod as Running.
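Under the hood, a TCP check is just an attempt to complete a TCP handshake within a timeout. A minimal Python sketch, useful for testing a NodePort by hand before you create the monitor (`tcp_reachable` is a hypothetical helper, not part of any Vigilmon tooling):

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake with host:port completes in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

For the service above you would call `tcp_reachable("node-1.example.com", 30432)`. A successful handshake only proves the port accepts connections; it says nothing about whether Postgres can actually serve queries.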

Step 4: Configure alert channels

When a Kubernetes service goes down, you want to know in seconds, not when a user files a support ticket.

Email alerts

In Vigilmon, go to Alert Channels → Add Channel → Email
Enter your on-call email address (or your team's alert email)
Assign the channel to your monitors

Webhook alerts (Slack, PagerDuty, etc.)

Go to Alert Channels → Add Channel → Webhook
Enter your webhook URL
Assign the channel to your monitors

Example payload Vigilmon sends when a service goes down:

{
  "monitor_name": "my-api /health",
  "status": "down",
  "url": "http://203.0.113.50/health",
  "started_at": "2024-01-15T10:23:00Z",
  "duration_seconds": 120
}

Use this to route alerts into PagerDuty, Opsgenie, or a custom webhook handler that enriches the alert with Kubernetes context:

# Command a webhook handler could run to capture each pod's name, status, and restart count
kubectl get pods -n production -l app=my-api --no-headers | \
  awk '{print $1, $3, $4}' > /tmp/pod-status.txt
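A fuller handler might parse the payload and attach the pod status to a one-line summary. The Python sketch below assumes the payload fields match the example shown earlier in this section; the real webhook body may differ, so check it before relying on these keys. `summarize_alert`, `pod_status`, and `enrich` are hypothetical names.

```python
import json
import subprocess


def summarize_alert(raw_body: str) -> str:
    """Turn the webhook JSON into a one-line message."""
    alert = json.loads(raw_body)
    minutes = alert.get("duration_seconds", 0) // 60
    return (
        f"[{alert['status'].upper()}] {alert['monitor_name']} "
        f"({alert['url']}) since {alert['started_at']}, {minutes} min"
    )


def pod_status(namespace: str = "production", selector: str = "app=my-api") -> str:
    """Capture pod state with the same kubectl query as the shell snippet."""
    try:
        result = subprocess.run(
            ["kubectl", "get", "pods", "-n", namespace, "-l", selector, "--no-headers"],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        return "kubectl is not installed on this host"
    return result.stdout or result.stderr


def enrich(raw_body: str) -> str:
    """Combine the alert summary with the current pod status."""
    return summarize_alert(raw_body) + "\n" + pod_status()
```

The handler needs kubectl and a kubeconfig with read access to the namespace. Keep those credentials read-only, since the handler is triggered by requests that come from outside the cluster.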

Using `kubectl` to correlate alerts

When you get a Vigilmon alert, your first instinct should be to check whether the issue is at the Kubernetes layer or external:

# 1. Check pod status
kubectl get pods -n production -l app=my-api

# 2. Check service endpoints — are pods registered?
kubectl get endpoints my-api -n production

# 3. Check events for the service
kubectl describe svc my-api -n production

# 4. Test from inside the cluster
kubectl run tmp-curl --image=curlimages/curl --rm -it --restart=Never -- \
  curl http://my-api.production.svc.cluster.local/health

If curl from inside the cluster works but Vigilmon shows the service as down, the problem is external — load balancer, DNS, or network routing. If it fails inside the cluster too, you have a pod or service configuration issue.
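The same triage rule can be written down as a small Python sketch, for example as part of a runbook script. `ready_address_count` reads the standard Endpoints object as returned by `kubectl get endpoints my-api -n production -o json`; both function names are hypothetical.

```python
def ready_address_count(endpoints: dict) -> int:
    """Count ready pod addresses in an Endpoints object (kubectl -o json)."""
    total = 0
    for subset in endpoints.get("subsets") or []:
        total += len(subset.get("addresses") or [])
    return total


def diagnose(internal_ok: bool, external_ok: bool) -> str:
    """Apply the inside-versus-outside rule described above."""
    if internal_ok and not external_ok:
        return "external: check load balancer, DNS, or network routing"
    if not internal_ok:
        return "cluster: check pods, endpoints, and service configuration"
    return "healthy from both sides: the alert may have been transient"
```

A count of zero means no ready pod is registered behind the service, which points to the cluster side even before you run the in-cluster curl.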

Step 5: Create a public status page

If you're running production services for customers or internal users, a public status page reduces support noise when incidents happen.

In Vigilmon, go to Status Pages → New Status Page
Give it a name: "Production Cluster Services"
Add your monitors: my-api /health, TCP postgres, etc.
Group them logically — API, Database, Queue, etc.
Publish the page

You'll get a public URL like https://status.vigilmon.online/your-page. Share it in your runbook, your app's dashboard, or with stakeholders who need visibility into cluster health without kubectl access.

Putting it all together

Here's a summary of the monitors to create for a typical Kubernetes production deployment:

| Monitor | Type | What it catches |
| --- | --- | --- |
| `https://api.yourdomain.com/health` | HTTP | Pod crashes, app errors, ingress routing failures, DNS issues |
| `https://api.yourdomain.com` | HTTP | TLS/certificate issues, Ingress path routing |
| `node-1.example.com:30432` | TCP | Postgres NodePort reachability |
| `node-1.example.com:443` | TCP | HTTPS port binding at the node level |

To find the Services and Ingresses that need a monitor, list everything your cluster exposes externally:

# Check what's exposed externally
kubectl get svc -A --field-selector spec.type=LoadBalancer
kubectl get svc -A --field-selector spec.type=NodePort
kubectl get ingress -A
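If you have many services, you can turn that listing into a checklist of monitor targets. The Python sketch below reads the JSON from `kubectl get svc -A -o json` and relies only on standard Service fields (`spec.type`, `spec.ports`, `status.loadBalancer.ingress`). `monitor_targets` is a hypothetical helper, and the `<node-address>` placeholder is there because a Service object does not record node addresses.

```python
from typing import List


def monitor_targets(service_list: dict) -> List[str]:
    """List host:port targets for externally exposed services."""
    targets: List[str] = []
    for svc in service_list.get("items", []):
        spec = svc.get("spec", {})
        ports = spec.get("ports", [])
        if spec.get("type") == "LoadBalancer":
            ingress = svc.get("status", {}).get("loadBalancer", {}).get("ingress") or []
            for entry in ingress:
                host = entry.get("ip") or entry.get("hostname")
                if not host:
                    continue  # external address not provisioned yet
                targets += [f"{host}:{p['port']}" for p in ports]
        elif spec.get("type") == "NodePort":
            # Fill in a node hostname or IP by hand.
            targets += [f"<node-address>:{p['nodePort']}" for p in ports]
    return targets
```

You could feed it with `json.load(sys.stdin)` and pipe the kubectl output in. A LoadBalancer service that yields no target is worth a look on its own, because it usually means the external IP is still pending.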

What's next

SSL certificate expiry — Vigilmon monitors your TLS cert and alerts you before it lapses. Critical for Kubernetes clusters using cert-manager, where auto-renewal can silently fail
Heartbeat monitoring — if you run CronJobs in Kubernetes, Vigilmon's heartbeat monitors will alert you when a scheduled job stops reporting in, even if the pod exits cleanly
Multi-environment coverage — add separate monitors for staging and production, and use Vigilmon's status page grouping to keep them organized
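To see what a certificate-expiry check involves, here is a small Python sketch using the standard `ssl` module. It illustrates the general technique, not Vigilmon's implementation; `days_until_expiry` and `fetch_not_after` are hypothetical names.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter timestamp.

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    e.g. "Jan  5 09:34:43 2018 GMT".
    """
    now = time.time() if now is None else now
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Read notAfter from the certificate served at host:port."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

`days_until_expiry(fetch_not_after("api.yourdomain.com"))` gives the remaining lifetime in days. With the default context the TLS handshake fails outright for an already expired certificate, so treat a connection error as an alert too.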

Get started free at vigilmon.online — no credit card, monitors start running in under a minute.
