DEV Community

Wakeup Flower

Why ELB marks an instance unhealthy when it seems fine

1 — Health check path mismatch (most common)

  • ELB is checking /health, but instance A:

    • doesn’t serve that path,
    • serves it under a different route (e.g., /status),
    • or requires authentication to access it.
  • Instance returns 404 / 403 / 500 → ELB marks it unhealthy.
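
To make the status-code cases above concrete, here is a minimal Python sketch that maps a probe's HTTP status to the likely cause. It assumes a success matcher written as "200", "200,202" or "200-299" (the style Application Load Balancer target groups accept) and defaults to 200. The function names are hypothetical, not part of any AWS SDK.

```python
def matches_success_codes(status: int, matcher: str = "200") -> bool:
    """Return True if `status` falls inside a matcher such as "200", "200,202" or "200-299"."""
    for part in matcher.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # Range form, e.g. "200-299" (both ends inclusive)
            low, high = part.split("-", 1)
            if int(low) <= status <= int(high):
                return True
        elif int(part) == status:
            return True
    return False


def explain_probe(status: int, matcher: str = "200") -> str:
    """Map a probe's HTTP status to one of the causes listed in this section."""
    if matches_success_codes(status, matcher):
        return "healthy"
    if status == 404:
        return "unhealthy: path not served (check the route)"
    if status in (401, 403):
        return "unhealthy: path requires authentication"
    if 500 <= status <= 599:
        return "unhealthy: application error"
    return "unhealthy: unexpected status"
```

For example, explain_probe(404) points at the route, while explain_probe(204, "200-299") counts as healthy because the matcher was widened.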


2 — Wrong port or protocol

  • ELB checks port 80 (HTTP), but the app on instance A listens on 8080.
  • Or ELB expects HTTP, but the instance only responds on HTTPS.
  • → ELB sees a refused connection or a timeout.
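
One way to tell "connection refused" from "timeout" is to try a plain TCP connection to the health check port from a host in the same network as the load balancer. The sketch below uses only the Python standard library. The function name is an assumption for illustration, and it tests TCP reachability only, not the HTTP/HTTPS mismatch.

```python
import socket


def probe_port(host: str, port: int, timeout: float = 2.0) -> str:
    """Try a TCP connection and report "open", "refused", "timeout" or "error: ..."."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Nothing is listening on that port (e.g. the app is on 8080, not 80)
        return "refused"
    except socket.timeout:
        # Packets are silently dropped, or the host is unreachable
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

A "refused" result usually means the port is wrong. A "timeout" more often points to filtering somewhere on the network path.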

3 — Security group or NACL blocking traffic

  • ELB’s health check traffic can’t reach instance A because:

    • Instance’s security group doesn’t allow inbound traffic from the ELB on the health check port.
    • Or NACLs block it.
  • → Health check packets dropped.
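
The security group check can be sketched as a pure function over the inbound rules. The dictionary shape below follows the IpPermissions list returned by the EC2 DescribeSecurityGroups API (for example via boto3's describe_security_groups). The function name and group IDs are hypothetical. It only looks at rules that reference the load balancer's security group, not CIDR-based rules or NACLs.

```python
from typing import Any, Dict, List


def sg_allows_health_check(
    ip_permissions: List[Dict[str, Any]], lb_sg_id: str, port: int
) -> bool:
    """Return True if any inbound rule admits TCP traffic on `port` from `lb_sg_id`."""
    for rule in ip_permissions:
        protocol = rule.get("IpProtocol")
        if protocol == "-1":
            port_ok = True  # "-1" means all protocols and all ports
        elif protocol == "tcp":
            port_ok = rule.get("FromPort", 0) <= port <= rule.get("ToPort", 65535)
        else:
            continue  # udp/icmp rules cannot admit an HTTP health check
        if not port_ok:
            continue
        # Rules that reference another security group list it here
        sources = {pair.get("GroupId") for pair in rule.get("UserIdGroupPairs", [])}
        if lb_sg_id in sources:
            return True
    return False
```

If this returns False for the health check port, the instance's security group is the first thing to fix before looking at NACLs.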


4 — Application slow or erroring under load

  • App on instance A:

    • Responds too slowly (longer than ELB’s health check timeout).
    • Returns 5xx errors intermittently (crash, memory leak, DB issue).
  • → Health check fails while other instances may still pass.
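
To see whether slowness or intermittent 5xx responses are the cause, you can time the health check request yourself. This is a minimal sketch using Python's standard http.client. The function name, the default timeout of 5 seconds and the "must be 200" rule are assumptions, so match them to your own health check settings.

```python
import http.client
import time


def timed_probe(host: str, port: int, path: str = "/health", timeout: float = 5.0) -> dict:
    """Send one GET and report status, elapsed seconds, any error, and pass/fail."""
    start = time.monotonic()
    status, error = None, None
    # `timeout` applies to each socket operation (connect, read), not the whole request
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        status = conn.getresponse().status
    except (OSError, http.client.HTTPException) as exc:
        error = repr(exc)  # refused connections and timeouts land here
    finally:
        conn.close()
    elapsed = time.monotonic() - start
    # Also compare total elapsed time against the timeout, as a stricter check
    passed = status == 200 and elapsed <= timeout
    return {"status": status, "elapsed": elapsed, "error": error, "passed": passed}
```

Running it in a loop from a neighbouring host and logging the elapsed values shows whether the instance sits close to the timeout or only fails now and then.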


5 — Instance-specific misconfiguration

  • Instance A might have:

    • Wrong version of the app deployed.
    • Dependency missing.
    • Config file pointing to wrong DB.
    • Local firewall (iptables/ufw) blocking health check traffic.
  • → Only that one instance fails health checks.
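
When only one instance fails, comparing it with its healthy peers often finds the difference quickly. The sketch below assumes you have already collected one "fingerprint" string per instance (for example an app version or a config file hash). How you collect it is up to you, and the function name and instance IDs are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def find_odd_instances(fingerprints: Dict[str, str]) -> List[str]:
    """Return instance IDs whose fingerprint differs from the most common one."""
    if not fingerprints:
        return []
    # The most frequent value is treated as the "expected" deployment
    majority, _count = Counter(fingerprints.values()).most_common(1)[0]
    return sorted(iid for iid, fp in fingerprints.items() if fp != majority)
```

For example, with {"i-a": "v1.4.2", "i-b": "v1.4.3", "i-c": "v1.4.3"} the function singles out "i-a" as the instance to inspect.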


6 — Health check thresholds

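  • ELB requires a configured number of consecutive successful responses (the healthy threshold) before marking an instance “healthy”, and a configured number of consecutive failures (the unhealthy threshold) before marking it “unhealthy”.
  • If instance A is flaky and keeps failing some probes, it may never reach the required run of consecutive successes → it stays marked unhealthy.
  • The other instances pass consistently, so they stay healthy.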
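
The effect of thresholds on a flaky instance is easier to see in a small simulation. This sketch assumes thresholds count consecutive results, as AWS documents for its load balancers. The function name and the default values of 3 and 2 are illustrative, not your load balancer's settings.

```python
from typing import Iterable, List


def simulate_health(
    probes: Iterable[bool],
    healthy_threshold: int = 3,
    unhealthy_threshold: int = 2,
    start_healthy: bool = False,
) -> List[bool]:
    """Return the health state after each probe (True = healthy)."""
    healthy = start_healthy
    streak = 0  # consecutive probes that contradict the current state
    states = []
    for ok in probes:
        if ok != healthy:
            streak += 1
            needed = healthy_threshold if ok else unhealthy_threshold
            if streak >= needed:
                healthy, streak = ok, 0  # enough consecutive results: flip state
        else:
            streak = 0  # a probe that agrees with the current state resets the run
        states.append(healthy)
    return states
```

With a pattern of two successes followed by one failure, repeated, and a healthy threshold of 3, the instance never becomes healthy even though it passes two thirds of its probes.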

7 — OS or networking issue

  • Instance A’s OS/network stack may be unhealthy:

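    • High CPU load that delays responses past the health check timeout.
    • Network interface issues.
    • Misconfigured routing (OS routes on the instance, or the route table of its subnet).
  • → ELB can connect to the other instances but not to A.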


Key takeaway

If ELB marks instance A unhealthy, even though you think it’s “fine”:

  • Start with health check configuration (path, port, protocol).
  • Then verify network access (SG, NACL, firewall).
  • Finally, check application logs on that instance for errors, slowness, or path mismatches.
