Debdut Chakraborty for Rocket.Chat

Conditions, Phases, and Declarative Phase Rules in Kubernetes Operators

TL;DR

  • You can start by experimenting with the demo: https://debdutdeb.github.io/kubernetes-phase-rules/, which links conditions with phases. It's fun. At least it was to me.
  • Spec = desired state; status = observed state. Controllers write status; the API conventions describe this split and the role of conditions.
  • Conditions are the right primitive: one observation per condition type, standardized and tooling-friendly. We use specific condition types so each observable fact is explicit and consumable.
  • Phase is still useful as a single, high-level label for observability (UIs, alerts, filters), but it should be derived from conditions, not maintained as a separate state machine.
  • Phase rules declare “when these conditions hold, phase is X.” The first matching rule wins. That keeps a single source of truth (conditions), makes phase logic testable and explicit, and avoids duplication across consumers.
  • The kubernetes-phase-rules (https://github.com/debdutdeb/kubernetes-phase-rules) package provides the rule types, matchers, and a StatusManager that keeps conditions and phase in sync and patches status for you.


The problem we faced

At Rocket.Chat we have an operator called Airlock that started out as an operator for MongoDB user management, but has recently been expanding to help us manage a few more aspects of our database operations.

We added a custom resource, Backup, whose status depended on several independent facts: Is the backup store available? Is database access granted? Has the backup job been scheduled? Did it complete or fail? Each of those is a separate observation, and in practice they’re often discovered or updated at different times, sometimes by different parts of the same controller or even by different controllers. So we ended up with multiple conditions on one resource: BucketStoreReady, MongoDBAccessRequestReady, JobScheduled, JobCompleted, JobFailed, and so on.

We also needed one phase—a single label like Pending, Running, Completed, or Failed—for UIs, alerts, and runbooks. That phase had to reflect the combination of all those conditions: “Failed if the store is missing or the job failed; Running if the store and access are ready and the job is scheduled; Completed if the job completed,” etc. The hard part was managing that mapping as we scaled out: multiple controllers or reconciliation steps each setting a subset of conditions, and every consumer (and the controller itself) needing a consistent answer to “what phase is this?” without reimplementing the same condition→phase logic in multiple places. We wanted a single source of truth (the conditions), one place that defined how conditions map to phase, and no drift between what the controller thinks the phase is and what dashboards or alerts assume.

This is about why conditions are the right primitive, why we still want a phase, and how phase rules let us derive phase from conditions in one place and keep everything in sync, including when multiple controllers touch the same resource.

This post will consistently refer to Airlock and the Backup resource as examples; it's easier for my memory to keep track of things that way.


Spec and status: desired vs observed

In the Kubernetes API, every resource that has mutable state is split into two parts:

  • spec — The desired state: what the user or automation asked for. It is the source of truth for “what should be true.”
  • status — The observed state: what the system has actually observed. Controllers write here; users typically don’t. It answers “what is true right now.”

The Kubernetes API conventions spell this out clearly: the specification is a complete description of the desired state and is persisted with the object; the status summarizes the current state and is “usually persisted with the object by automated processes.” So when you build a controller, you read spec, do work, and write what you observed into status.

That separation matters. It keeps user intent (spec) from being overwritten by controller updates, allows different access control for spec vs status, and gives clients a stable place to read “what’s actually happening” without parsing controller logic.
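In Go types, that split might look like this. This is a minimal sketch: Condition is a local stand-in for metav1.Condition, and the Backup field names (Database, Bucket) are made up for illustration, not Airlock's actual API.

```go
package main

import "fmt"

// Condition is a minimal stand-in for metav1.Condition.
type Condition struct {
	Type   string
	Status string // "True", "False", or "Unknown"
}

// BackupSpec is the desired state: what the user asked for.
type BackupSpec struct {
	Database string
	Bucket   string
}

// BackupStatus is the observed state: only the controller writes here.
type BackupStatus struct {
	Conditions         []Condition
	Phase              string
	ObservedGeneration int64
}

type Backup struct {
	Spec   BackupSpec
	Status BackupStatus
}

func main() {
	// The user declares intent in spec...
	b := Backup{Spec: BackupSpec{Database: "rocketchat", Bucket: "nightly-backups"}}
	// ...and the controller records what it observed in status.
	b.Status.Conditions = append(b.Status.Conditions,
		Condition{Type: "BucketStoreReady", Status: "True"})
	fmt.Println(b.Status.Conditions[0].Type, b.Status.Conditions[0].Status)
}
```

Keeping the two halves as separate structs also makes it natural to enable the status subresource on the CRD, so spec and status are updated through different API calls.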


Why conditions (and why specific ones)

The standard way to put “what’s actually happening” into status is conditions. A condition is a single observation: a type (e.g. Ready, JobCompleted, BucketStoreReady), a status (True, False, or Unknown), and usually a reason and message. The API conventions describe conditions as “a standard mechanism for higher-level status reporting”: they let tools and other controllers understand resource state without implementing your controller’s logic. Conditions should “complement more detailed information” in status; they’re the contract for “is this thing ready / failed / still working?”

So why specific condition types? Because each condition should represent one observable fact. If you only had a single “Status” condition, you’d lose information: you couldn’t tell “store not ready” from “job failed” from “job still running.” By defining conditions like:

  • BucketStoreReady — Can we see and use the backup store?
  • MongoDBAccessRequestReady — Is database access granted?
  • JobScheduled — Has the backup job been scheduled?
  • JobCompleted — Did the job finish successfully?
  • JobFailed — Did the job fail?

you give the controller a place to report each fact as it learns it, and you give dashboards, alerts, and other controllers a way to react to specific causes (e.g. “alert when JobFailed is True” or “show message when BucketStoreReady is False”).

The conventions also say: condition type names should describe the current observed state (adjectives or past-tense verbs like “Ready”, “Succeeded”, “Failed”), and the absence of a condition should be treated like Unknown. So you design conditions to be small, explicit observations; the controller sets them as it reconciles; and the rest of the system consumes them without reimplementing your state machine.
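That "absence means Unknown" rule is easy to encode. Here's a small sketch with a local stand-in for metav1.Condition; the statusOf helper is hypothetical, not part of any library:

```go
package main

import "fmt"

// Condition is a minimal stand-in for metav1.Condition.
type Condition struct {
	Type   string
	Status string // "True", "False", or "Unknown"
}

// statusOf returns the status of a condition type, treating an absent
// condition as "Unknown", as the API conventions recommend.
func statusOf(conds []Condition, condType string) string {
	for _, c := range conds {
		if c.Type == condType {
			return c.Status
		}
	}
	return "Unknown"
}

func main() {
	conds := []Condition{
		{Type: "BucketStoreReady", Status: "True"},
		{Type: "JobScheduled", Status: "False"},
	}
	// JobFailed was never set, so we don't claim the job succeeded or failed.
	fmt.Println(statusOf(conds, "JobFailed"))
}
```

The payoff: a controller that hasn't gotten around to setting JobFailed yet doesn't accidentally read as "the job didn't fail"; consumers see Unknown until an observation is actually made.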


Why phase when we already have conditions?

Conditions are the right primitive: they’re granular, extensible, and standardized. But they’re also many. For a Backup or any other resource for that matter, you might have five or six conditions. For a user or a dashboard, the first question is often: “What state is this in? Running? Failed? Pending?” Answering that from raw conditions means “run the same logic the controller would use to decide the high-level state” — and that logic then lives in every consumer (CLI, UI, Prometheus, runbooks). That duplicates logic and drifts over time.

A familiar example is the Node in Kubernetes. The node controller and kubelet set several conditions on a Node (e.g. Ready, DiskPressure, MemoryPressure, NetworkUnavailable). Those conditions drive behavior: the scheduler uses them to decide where to place pods; taints and other node properties can be updated based on conditions. But for “is this node usable?” you need a single picture. The Node’s phase (e.g. Running) is that summary, the final, high-level state that users and automation care about. So conditions are the levers the system uses to change node properties and make decisions; phase is the outcome you read when you want to know the node’s overall state.

So we still want a phase: a single, high-level label like Pending, Running, Completed, or Failed that means “the outcome of applying our rules to the current conditions.” Phase is the observability contract: one field that UIs can show in a column, that alerting can filter on (“alert if phase != Ready”), and that runbooks can branch on, without each consumer reimplementing the condition→state rules.

The Kubernetes API conventions actually deprecate the old use of phase as a first-class state-machine enum in core resources, because adding new enum values breaks compatibility and phase was often used instead of explicit conditions. The better pattern is: conditions are the source of truth; phase is a derived summary. So we keep conditions as the only thing the controller writes, and we compute phase from those conditions using a clear, declarative set of rules. That way we get the observability benefit of a single phase field without turning phase into an independent state machine.


The phase rule idea

Instead of the controller imperatively setting phase in code (“if store missing then phase = Failed”), we declare rules: “phase is Failed when condition A is True or B is True; phase is Running when C and D are True and E is False; …”. The relationship is one-way:

  1. The controller only updates conditions (e.g. via SetCondition).
  2. Phase is derived by evaluating an ordered list of phase rules over the current conditions.
  3. The first rule whose condition matcher matches the current conditions gives the phase; if none match, phase is Unknown.

So conditions drive phase, phase never drives conditions. The same condition set always produces the same phase for a given rule list. Rule order encodes priority (e.g. “Completed” before “Running” before “Failed” before “Pending”). Because phase is always recomputed from the current conditions, it doesn’t matter which controller or which reconciliation step last wrote a condition—whoever updates conditions, the same rules apply and phase stays consistent.

Example (conceptually): For a Backup custom resource:

  • Phase Completed when: BucketStoreReady=True, MongoDBAccessRequestReady=True, JobCompleted=True.
  • Phase Running when: store and access are True, JobScheduled=True.
  • Phase Failed when: store or access is False, or JobFailed=True.
  • Phase Pending when: store or access is Unknown, or job not yet scheduled.

Each of these is a phase rule: a phase name plus a matcher over conditions. You evaluate the list in order; the first match wins. That’s the phase rule idea: declarative, testable, and a single place to define “what phase means.”
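The ordered-rules idea can be sketched in a few lines of Go. Everything below is a self-contained re-implementation for illustration: Condition stands in for metav1.Condition, and the helper names (equals, all, anyOf) are made up, not the package's actual API.

```go
package main

import "fmt"

// Condition is a minimal stand-in for metav1.Condition.
type Condition struct {
	Type   string
	Status string // "True", "False", or "Unknown"
}

// Matcher answers: do these conditions satisfy the rule?
type Matcher func(conds []Condition) bool

// equals matches when the condition type has one of the given statuses;
// an absent condition is treated as "Unknown", per the API conventions.
func equals(condType string, statuses ...string) Matcher {
	return func(conds []Condition) bool {
		got := "Unknown"
		for _, c := range conds {
			if c.Type == condType {
				got = c.Status
				break
			}
		}
		for _, s := range statuses {
			if got == s {
				return true
			}
		}
		return false
	}
}

// all is logical AND over matchers; anyOf is logical OR.
func all(ms ...Matcher) Matcher {
	return func(conds []Condition) bool {
		for _, m := range ms {
			if !m(conds) {
				return false
			}
		}
		return true
	}
}

func anyOf(ms ...Matcher) Matcher {
	return func(conds []Condition) bool {
		for _, m := range ms {
			if m(conds) {
				return true
			}
		}
		return false
	}
}

// Rule is a phase name plus a matcher over conditions.
type Rule struct {
	Phase string
	Match Matcher
}

// computePhase evaluates rules in order; the first match wins.
func computePhase(rules []Rule, conds []Condition) string {
	for _, r := range rules {
		if r.Match(conds) {
			return r.Phase
		}
	}
	return "Unknown"
}

func main() {
	// The conceptual Backup rules from above, in priority order.
	rules := []Rule{
		{"Completed", all(equals("BucketStoreReady", "True"), equals("MongoDBAccessRequestReady", "True"), equals("JobCompleted", "True"))},
		{"Running", all(equals("BucketStoreReady", "True"), equals("MongoDBAccessRequestReady", "True"), equals("JobScheduled", "True"))},
		{"Failed", anyOf(equals("BucketStoreReady", "False"), equals("MongoDBAccessRequestReady", "False"), equals("JobFailed", "True"))},
		{"Pending", anyOf(equals("BucketStoreReady", "Unknown"), equals("MongoDBAccessRequestReady", "Unknown"), equals("JobScheduled", "False", "Unknown"))},
	}

	conds := []Condition{
		{"BucketStoreReady", "True"},
		{"MongoDBAccessRequestReady", "True"},
		{"JobScheduled", "True"},
	}
	fmt.Println(computePhase(rules, conds)) // store and access ready, job scheduled
}
```

The same condition slice always produces the same phase, and swapping two rules changes priority without touching any controller code, which is exactly why rule order matters.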


The kubernetes-phase-rules package

The kubernetes-phase-rules module provides exactly that: a small, experimental and potentially incomplete Go library for defining phase rules and computing phase from []metav1.Condition.

The core type is the PhaseRule interface. A phase rule has three methods: Satisfies(conditions) returns true if the given slice of metav1.Condition matches the rule (e.g. all required conditions present with the right statuses for an AND rule, or at least one for an OR rule); Phase() returns the phase name this rule represents (e.g. "Running", "Failed"); ComputePhase(conditions) returns that phase name when the rule is satisfied, and the constant PhaseUnknown ("Unknown") otherwise. So a phase rule is “when these conditions hold, the phase is X”; you build concrete rules with NewPhaseRule(phaseName, matcher) and pass them to the StatusManager or evaluate them yourself.

type PhaseRule interface {
    Satisfies(conditions []metav1.Condition) bool
    Phase() string
    ComputePhase(conditions []metav1.Condition) string
}
  • Package rules — You build phase rules with NewPhaseRule(phaseName, matcher). Matchers are built from:
    • ConditionEquals(conditionType, statuses...) — this condition type must have one of the given statuses (True, False, Unknown).
    • ConditionsAll(...) — all of the given condition matchers must match (logical AND).
    • ConditionsAny(...) — at least one must match (logical OR).

Given a slice of metav1.Condition, you call rule.Satisfies(conditions) or rule.ComputePhase(conditions); for a list of rules, you iterate in order and take the first satisfied rule’s phase (or PhaseUnknown).

  • Package conditions — A StatusManager ties this into controller-runtime: you give it the CR’s status conditions pointer, the CR (implementing a small interface with SetPhase / GetPhase / SetObservedGeneration), and the phase rules. When you call SetCondition or SetConditions, it updates the condition slice, recomputes phase from the rules, updates the object’s phase and observed generation, and patches status via client.Status().Patch only when something changed. So the controller only sets conditions; the manager keeps phase and status in sync.
// Yes, I ran out of name ideas.
type Object2 interface {
    SetPhase(phase string)
    GetPhase() string
    SetObservedGeneration(generation int64)
}
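Conceptually, SetCondition does something like the following. This is a self-contained sketch, not the package's actual implementation or API; the real manager also updates the observed generation and issues the status patch itself via client.Status().Patch.

```go
package main

import "fmt"

// Condition is a minimal stand-in for metav1.Condition.
type Condition struct {
	Type   string
	Status string // "True", "False", or "Unknown"
}

// manager mirrors, conceptually, what the StatusManager does: the controller
// only calls SetCondition; phase is always recomputed, never set directly.
type manager struct {
	conds   []Condition
	phase   string
	compute func([]Condition) string // ordered phase rules; first match wins
}

// SetCondition upserts the condition, recomputes phase, and reports whether
// anything changed — which is when a status patch would actually be sent.
func (m *manager) SetCondition(c Condition) bool {
	changed := false
	found := false
	for i := range m.conds {
		if m.conds[i].Type == c.Type {
			found = true
			if m.conds[i].Status != c.Status {
				m.conds[i] = c
				changed = true
			}
			break
		}
	}
	if !found {
		m.conds = append(m.conds, c)
		changed = true
	}
	// Phase is derived from the current conditions on every update.
	if p := m.compute(m.conds); p != m.phase {
		m.phase = p
		changed = true
	}
	return changed
}

func main() {
	m := &manager{compute: func(conds []Condition) string {
		for _, c := range conds {
			if c.Type == "JobCompleted" && c.Status == "True" {
				return "Completed"
			}
		}
		return "Running"
	}}
	fmt.Println(m.SetCondition(Condition{Type: "JobCompleted", Status: "True"}), m.phase)
}
```

Returning "did anything change" is what lets the manager patch status only when needed, so repeated reconciles that observe the same facts don't generate no-op API writes.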

The module is experimental and kept intentionally simple: minimal API, minimal dependencies, no feature creep. You can use it as a starting point and adapt the rules or the manager to your CRDs.


Try it in the browser

You can see phase rules in action and experiment with conditions and rule order in the interactive demo at https://debdutdeb.github.io/kubernetes-phase-rules/: load the templates or build your own rules.

The demo lets you define condition types, set their statuses, and define an ordered list of phase rules (with AND/OR and allowed statuses). It computes the resulting phase so you can build intuition for how conditions map to phase and why rule order matters.

