How it works
The reconcile loop
Every controller follows the same shape: watch a resource, compare desired to observed, take one safe step, record status, requeue. The operator decides; the agent performs host mutations; the agent reports; the operator rolls forward on the next pass.
sequenceDiagram
autonumber
participant U as nwctl / UI
participant A as K8s API
participant O as Operator (controller)
participant N as Node agent
participant H as Host / storage
U->>A: apply CRD (intent)
A-->>O: watch event
O->>O: compare desired vs observed
O->>A: patch .status (phase, next step)
O->>N: delegate one host op
N->>H: mutate (ip link / drbdadm / kairos)
N-->>A: report observed state
A-->>O: requeue
O->>O: honor gates, then advance one step
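The shape of a single pass can be sketched in a few lines. This is a minimal illustration, not the operator's actual code: `Status`, `reconcile`, and the dictionary keys are hypothetical names chosen for the example, and the real controller talks to the Kubernetes API rather than to plain dictionaries.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class Status:
    """What the controller records. It mirrors observed state only."""
    phase: str = "Pending"
    observed: dict = field(default_factory=dict)
    next_step: Optional[str] = None


def reconcile(desired: dict, observed: dict) -> Tuple[Status, bool]:
    """Run one pass: compare, pick at most one step, report, decide on requeue.

    Returns the status to patch and whether another pass is needed.
    (Hypothetical helper; names are assumptions for illustration.)
    """
    # Keys whose observed value differs from intent, in a stable order so
    # repeated passes over the same input always pick the same step.
    drift = sorted(k for k, v in desired.items() if observed.get(k) != v)

    # Status carries a copy of what was observed, never the desired values.
    status = Status(observed=dict(observed))
    if not drift:
        status.phase = "Ready"
        return status, False

    # Exactly one step per pass; everything else waits for the next requeue.
    status.phase = "Progressing"
    status.next_step = f"set {drift[0]}={desired[drift[0]]}"
    return status, True
```

Calling `reconcile` with a desired `bond_mode` of `balance-rr` against an observed `active-backup` yields a single `next_step` and a requeue, even if other fields have drifted too. Those are picked up on later passes.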
These invariants are non-negotiable — each is a hard-won lesson from the DaemonSet's incident history:
- Idempotence. Every pass is safe to retry; never leave partial state.
- Honor explicit gates. Refuse to advance past a safety gate rather than racing it.
- Backpressure. If the storage layer is unhealthy, wait — don't queue more work.
- Fail loud, don't partially proceed. The motivating failure was one node flipping while the other didn't, and the next step firing anyway. Controllers stop instead.
- Status reflects observed, never desired.
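One way to picture how the invariants above combine is a single decision function that runs before any step is delegated. The sketch below is an assumption-laden illustration: `Decision`, `PassInputs`, `decide`, and the order of the checks are all hypothetical and are not taken from the operator's code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class Decision(Enum):
    ADVANCE = "advance"
    WAIT = "wait"        # backpressure: requeue without queuing new work
    BLOCKED = "blocked"  # a safety gate is still closed
    HALT = "halt"        # previous step not confirmed everywhere: stop


@dataclass(frozen=True)
class PassInputs:
    storage_healthy: bool
    gates: Dict[str, bool]         # gate name -> is it open?
    node_results: Dict[str, bool]  # node -> did the previous step succeed?


def decide(inputs: PassInputs) -> Decision:
    """Pure function of observed inputs, so every pass is safe to retry."""
    # Fail loud: if any node did not confirm the previous step, do not
    # fire the next one. This is the "one node flipped, the other didn't" case.
    if not all(inputs.node_results.values()):
        return Decision.HALT
    # Backpressure: an unhealthy storage layer means wait, not enqueue.
    if not inputs.storage_healthy:
        return Decision.WAIT
    # Honor explicit gates: refuse to advance rather than race a gate.
    if not all(inputs.gates.values()):
        return Decision.BLOCKED
    return Decision.ADVANCE
```

Because `decide` has no side effects and reads only observed inputs, calling it twice with the same inputs gives the same answer, which is what makes a retried pass harmless.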
The storage-upgrade machine
The most battle-hardened workflow is the single→multi-node storage upgrade: flipping the storage
bond from active-backup to balance-rr so DRBD can replicate across nodes, without disrupting the
VMs running on top. In the DaemonSet this is a 21-state machine persisted in a ConfigMap. Today the
operator only observes it; next, it will model it as a first-class workflow (BondModeFlipPlan / ReplicationUpgrade).
stateDiagram-v2
direction TB
[*] --> Idle
Idle --> PreconditionsCheck: ≥2 eligible nodes, peers live
PreconditionsCheck --> AcquireLock
AcquireLock --> SuspendMigration: cluster lock held
state "Protect workloads" as prot {
SuspendMigration
}
SuspendMigration --> DisconnectDRBD: abort VMIMs, pin evictionStrategy=None
DisconnectDRBD --> DrainSatellite
DrainSatellite --> FlipNode1
FlipNode1 --> FlipNode2
FlipNode2 --> ResumeDRBD: both on balance-rr
ResumeDRBD --> RestoreMigration: peers reconnected + synced
RestoreMigration --> ReleaseLock: restore evictionStrategy=LiveMigrate
ReleaseLock --> Complete
Complete --> [*]
FlipNode1 --> BondFlipRetry: transient failure
FlipNode2 --> BondFlipRetry: transient failure
BondFlipRetry --> FlipNode1: forward-only retry (max 3)
BondFlipRetry --> Rollback: retries exhausted
ResumeDRBD --> Rollback: cannot resume safely
state "Rollback ladder" as Rollback {
direction TB
RbRestoreMigration --> RbResumeSatellite
RbResumeSatellite --> RbReconnectDrbd
RbReconnectDrbd --> RbRevertNodes
}
Rollback --> Failed
Failed --> [*]
What the diagram encodes — and why each guard exists:
| Stage | The guard | Why |
|---|---|---|
| PreconditionsCheck | ≥2 eligible nodes; every peer live | Never flip into a cluster that can't hold replicas. |
| SuspendMigration | abort in-flight VM migrations, pin evictionStrategy=None | If KubeVirt live-migrates a VM mid-flip, it can land on a node with asymmetric DRBD, which is a data-availability risk. |
| DisconnectDRBD → DrainSatellite | disconnect replication, move storage pods off the node | Flip the bond on a quiesced storage path, not a live one. |
| FlipNode1 → FlipNode2 | one node at a time, forward-only retry | A half-flipped cluster is the original incident; retries never revert an already-flipped node. |
| ResumeDRBD | reconnect only after peer convergence + two-replica sync | Never resume replication over a marginal or asymmetric link. |
| Rollback ladder | undo in reverse: restore VMs → resume storage → reconnect → revert bonds | Every workflow has a defined stuck-state and a reverse path. |
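The transitions in the diagram and table can be written down as a small transition function. This is a sketch under stated assumptions, not the DaemonSet's 21-state implementation: the state names and the retry limit come from the diagram, while `next_state`, the `ok` flag, and the choice to treat every rollback rung as always advancing are simplifications made for the example.

```python
from typing import Tuple

MAX_FLIP_RETRIES = 3  # "max 3" from the diagram

# Happy path, in diagram order.
FORWARD = [
    "Idle", "PreconditionsCheck", "AcquireLock", "SuspendMigration",
    "DisconnectDRBD", "DrainSatellite", "FlipNode1", "FlipNode2",
    "ResumeDRBD", "RestoreMigration", "ReleaseLock", "Complete",
]
# Rollback ladder, ending in the terminal Failed state.
ROLLBACK = [
    "RbRestoreMigration", "RbResumeSatellite", "RbReconnectDrbd",
    "RbRevertNodes", "Failed",
]
FLIP_STATES = {"FlipNode1", "FlipNode2"}


def next_state(state: str, ok: bool, retries: int) -> Tuple[str, int]:
    """Return (next state, retry count) after one step in `state`.

    `ok` means the state's guard held and its host operation succeeded.
    """
    if state in ("Complete", "Failed"):
        return state, retries                  # terminal
    if state in ROLLBACK:
        # Assumption: each rollback rung is modeled as always advancing.
        return ROLLBACK[ROLLBACK.index(state) + 1], retries
    if state == "BondFlipRetry":
        if retries < MAX_FLIP_RETRIES:
            return "FlipNode1", retries + 1    # forward-only retry
        return ROLLBACK[0], retries            # retries exhausted
    if ok:
        return FORWARD[FORWARD.index(state) + 1], retries
    if state in FLIP_STATES:
        return "BondFlipRetry", retries        # transient failure
    if state == "ResumeDRBD":
        return ROLLBACK[0], retries            # cannot resume safely
    return state, retries                      # guard not met: hold position
```

As in the diagram, a retry re-enters at `FlipNode1` even when `FlipNode2` was the step that failed. That is only safe if flipping an already-flipped node is a no-op, which is the idempotence invariant doing its job. A failed guard in any other forward state holds position rather than skipping ahead.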
This is why it's gated, not autonomous
Every step here mutates host networking or storage on a live cluster, so the blast radius is high. The productized version deliberately over-specifies these guards as testable acceptance criteria, and its dangerous paths run supervised with human sign-off, never as an unattended loop. The read-only observation bridge comes first precisely because it carries none of that risk.