Introduction
As distributed jobs scale across nodes, observability becomes essential. In this part, we explore how the spring-batch-db-cluster-partitioning framework exposes real-time cluster health, task loads, and node statuses, giving developers the visibility needed for debugging, performance tuning, and production readiness.
Exposing Cluster State with Actuator Endpoints
Spring Boot Actuator endpoints provide a natural interface to expose cluster state. This framework adds two custom indicators:
/actuator/health
The health response then includes two additional components:
"batchCluster": {
  "status": "UP",
  "details": {
    "Total Active Nodes": "3",
    "Total Nodes in Cluster": "3"
  }
},
"batchClusterNode": {
  "status": "UP",
  "details": {
    "Current Load (number of live tasks)": "2",
    "Node Status": "ACTIVE",
    "Node Id": "worker-1",
    "Last Heartbeat Update Time": "2025-08-03T02:43:07.133+00:00",
    "Start Time": "2025-08-03T02:42:32.092+00:00"
  }
}
This helps you instantly verify:
- Overall cluster health and how many nodes are available in total
- Whether the current node is active and responsive
- How many tasks it's executing
- When it last sent a heartbeat
/actuator/batch-cluster
Full Cluster Snapshot
This custom endpoint provides a full view of the entire cluster state:
{
  "nodes": [
    {
      "Started At": "2025-08-03T02:42:32.092+00:00",
      "Node Id": "worker-1",
      "Current Load (# of tasks)": 2,
      "Host": "worker-1.company.local",
      "Last Heartbeat": "2025-08-03T02:43:07.133+00:00",
      "Status": "ACTIVE"
    }
  ],
  "totalNodes": 3
}
It includes:
- All registered nodes
- Current task count
- Last heartbeat timestamp
- Node status (ACTIVE, UNREACHABLE, etc.)
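A monitoring script can read this snapshot with the JDK's built-in HTTP client. The following is a minimal sketch, assuming the application listens on http://localhost:8080, Actuator uses its default /actuator base path, and the endpoint is exposed over HTTP. The class name ClusterSnapshotClient is hypothetical and not part of the framework.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper, not part of the framework: reads the raw JSON snapshot.
final class ClusterSnapshotClient {

    // Builds a GET request for the custom endpoint under the given base URL.
    static HttpRequest buildRequest(String baseUrl) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/actuator/batch-cluster"))
                .timeout(Duration.ofSeconds(5))
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    // Sends the request and returns the response body as a JSON string.
    static String fetchSnapshot(String baseUrl) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildRequest(baseUrl), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Assumed address; adjust host and port for your deployment.
        System.out.println(fetchSnapshot("http://localhost:8080"));
    }
}
```

The returned string can then be handed to whichever JSON library your tooling already uses. The five-second timeout is an arbitrary choice for illustration.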
Node Load Metrics
Cluster load is computed based on:
- Active tasks being executed per node
- Heartbeat freshness
- Task reassignment if a node becomes unreachable
This allows external monitoring tools to:
- Detect load imbalance
- Alert on stale heartbeats
- Audit execution trends over time
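The first two checks can be sketched on the monitoring side once the snapshot has been parsed. This is only an illustration: the NodeSnapshot record, the ClusterChecks class, the 30-second staleness threshold, and the worker-2 sample values are all assumptions, not part of the framework or its output.

```java
import java.time.Duration;
import java.time.OffsetDateTime;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical monitoring-side model of one entry in the "nodes" array.
record NodeSnapshot(String nodeId, int currentLoad, OffsetDateTime lastHeartbeat) {}

final class ClusterChecks {

    // Returns the IDs of nodes whose last heartbeat is older than maxAge at time 'now'.
    static List<String> staleNodes(List<NodeSnapshot> nodes, OffsetDateTime now, Duration maxAge) {
        return nodes.stream()
                .filter(n -> Duration.between(n.lastHeartbeat(), now).compareTo(maxAge) > 0)
                .map(NodeSnapshot::nodeId)
                .collect(Collectors.toList());
    }

    // Difference between the busiest and the least busy node; 0 for an empty cluster.
    static int loadSpread(List<NodeSnapshot> nodes) {
        if (nodes.isEmpty()) {
            return 0;
        }
        int max = nodes.stream().mapToInt(NodeSnapshot::currentLoad).max().getAsInt();
        int min = nodes.stream().mapToInt(NodeSnapshot::currentLoad).min().getAsInt();
        return max - min;
    }

    public static void main(String[] args) {
        // Fixed "now" so the example is reproducible; use OffsetDateTime.now() in practice.
        OffsetDateTime now = OffsetDateTime.parse("2025-08-03T02:43:30+00:00");
        List<NodeSnapshot> nodes = List.of(
                new NodeSnapshot("worker-1", 2, OffsetDateTime.parse("2025-08-03T02:43:07.133+00:00")),
                // Made-up sample node with an old heartbeat and a higher load.
                new NodeSnapshot("worker-2", 5, OffsetDateTime.parse("2025-08-03T02:42:10+00:00")));
        System.out.println("Stale: " + staleNodes(nodes, now, Duration.ofSeconds(30)));
        System.out.println("Load spread: " + loadSpread(nodes));
    }
}
```

With these sample values, worker-2 is reported as stale and the load spread is 3. A sensible threshold depends on the heartbeat interval your cluster is configured with.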
Best Practices
- Use static node IDs (e.g., worker-1, worker-2) for easier observability
- Integrate with Prometheus/Grafana using custom endpoints or intermediate exporters
- Monitor node health to detect failures before partitions get stuck
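As one way to picture an intermediate exporter, the sketch below renders per-node task counts in the Prometheus text exposition format. It is a hedged illustration: the ClusterMetricsExporter class, the batch_cluster_node_load metric name, and the node_id label are assumptions chosen for this example, and the framework does not define them.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical exporter helper: turns per-node task counts into Prometheus text format.
final class ClusterMetricsExporter {

    // Metric name is an assumption for illustration; the framework does not define it.
    static final String METRIC = "batch_cluster_node_load";

    static String render(Map<String, Integer> loadByNodeId) {
        StringBuilder out = new StringBuilder();
        out.append("# HELP ").append(METRIC).append(" Live tasks per node\n");
        out.append("# TYPE ").append(METRIC).append(" gauge\n");
        // Copy into a TreeMap so the output order is stable and sorted by node ID.
        for (Map.Entry<String, Integer> e : new TreeMap<>(loadByNodeId).entrySet()) {
            out.append(METRIC)
               .append("{node_id=\"").append(escape(e.getKey())).append("\"} ")
               .append(e.getValue())
               .append('\n');
        }
        return out.toString();
    }

    // Escapes backslash, double quote, and newline, as label values require.
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        System.out.print(render(Map.of("worker-1", 2, "worker-2", 0)));
    }
}
```

Static node IDs pay off here: the node_id label stays the same across restarts, so each node keeps a single continuous time series in Grafana.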
What's Next (Part 6 Preview)
In the next part, we'll cover:
- Failure handling and retries
- What happens when a node crashes mid-job
- How tasks are reassigned or resumed
- Retry strategies to ensure data consistency
Want More?
- Explore the code: spring-batch-db-cluster-partitioning on GitHub
- Read earlier parts in the series: Dev.to article series