Components
Inside the operator
The operator is one binary (cmd/operator) hosting a controller-manager. Each CRD gets its own
controller package; validation and defaulting live in an admission webhook; host work is delegated
to the agent. Backends and operating systems sit behind capability interfaces so the same
controllers run against different substrates.
Custom resources
API group/version: nodewright.spectrocloud.com/v1alpha1. Workflow CRDs carry a .status.phase and a
per-step progress list; status always reflects observed state, never desired state.
| CRD | Scope | Phase | What it is |
|---|---|---|---|
| VMVolumeMigration | namespaced | v0.1 ⭐ | Automates migrating a VM's volume to RWX (cold dd-copy / hot csi-clone) with snapshot bookends + checksum verification. The migration runbook, made declarative. |
| NodewrightCluster | cluster (singleton) | v0.1 | The cluster-level health rollup: Bootstrap · SingleNode · Upgrading · MultiNode · Degraded · Recovering. |
| NodewrightAgentStatus | per node | v0.1 | Per-node observed state: role, heartbeat, storage/DRBD health. |
| BondModeFlipPlan | per flip | v0.2 | A coordinated bond-mode flip as a workflow, with VM protection and rollback. |
| ReplicationUpgrade | per upgrade | v0.2 | The single→multi-node storage upgrade, gated and staged. |
| EdgeProfile | cluster | v0.2 | The substrate definition (topology, backend, OS, names, thresholds) as a signed, versioned artifact. |
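The status shape described above (a phase plus a per-step progress list, both derived from what was observed) might look roughly like the following Go sketch. Every type name, phase value, and step name here is an assumption made for illustration; the actual v1alpha1 types are not shown in this section.

```go
package main

import "fmt"

// Phase is a hypothetical enum for .status.phase on a workflow CRD.
type Phase string

const (
	PhasePending   Phase = "Pending"
	PhaseRunning   Phase = "Running"
	PhaseSucceeded Phase = "Succeeded"
	PhaseFailed    Phase = "Failed"
)

// StepState is a hypothetical per-step state.
type StepState string

const (
	StepPending StepState = "Pending"
	StepRunning StepState = "Running"
	StepDone    StepState = "Done"
	StepFailed  StepState = "Failed"
)

// StepProgress is one entry in the per-step progress list.
type StepProgress struct {
	Name    string
	State   StepState
	Message string
}

// WorkflowStatus holds observed state only; nothing here expresses intent.
type WorkflowStatus struct {
	Phase Phase
	Steps []StepProgress
}

// derivePhase computes the phase purely from observed step states,
// so the phase can never drift ahead of what actually happened.
func derivePhase(steps []StepProgress) Phase {
	if len(steps) == 0 {
		return PhasePending
	}
	allDone, anyStarted := true, false
	for _, s := range steps {
		switch s.State {
		case StepFailed:
			return PhaseFailed
		case StepRunning:
			anyStarted, allDone = true, false
		case StepDone:
			anyStarted = true
		default: // StepPending or unknown
			allDone = false
		}
	}
	if allDone {
		return PhaseSucceeded
	}
	if anyStarted {
		return PhaseRunning
	}
	return PhasePending
}

func main() {
	// Step names are illustrative, not the real workflow steps.
	status := WorkflowStatus{Steps: []StepProgress{
		{Name: "pre-snapshot", State: StepDone},
		{Name: "copy", State: StepRunning, Message: "copy in progress"},
		{Name: "verify-checksum", State: StepPending},
	}}
	status.Phase = derivePhase(status.Steps)
	fmt.Println(status.Phase) // prints "Running"
}
```

Deriving the phase from the step list, rather than setting it independently, is one way to keep status tied to observation.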
The observation bridge (being built now)
The first increment is deliberately read-only. Rather than take over the DaemonSet, the operator
observes it: it reads the DaemonSet's live state (a ConfigMap-backed 21-state machine plus per-node
heartbeats) and its Kubernetes Events, and rolls that up into NodewrightCluster.status and
NodewrightAgentStatus.
Why read-only first
It ships value in days, it cannot race or harm the DaemonSet, and it forces an accurate, typed model of the machine the operator will later take over. Low blast radius by construction.
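As a rough illustration of the read-only roll-up described in this section, the sketch below folds per-node heartbeats into a cluster phase without writing anything back. The phase names come from the NodewrightCluster row above; the mapping rules, the NodeObservation type, and the staleness threshold are assumptions. Upgrading and Recovering are declared but never returned, because they would depend on the DaemonSet's state machine, which is not described here.

```go
package main

import (
	"fmt"
	"time"
)

// ClusterPhase uses the phase names listed for NodewrightCluster.
type ClusterPhase string

const (
	Bootstrap  ClusterPhase = "Bootstrap"
	SingleNode ClusterPhase = "SingleNode"
	Upgrading  ClusterPhase = "Upgrading"
	MultiNode  ClusterPhase = "MultiNode"
	Degraded   ClusterPhase = "Degraded"
	Recovering ClusterPhase = "Recovering"
)

// NodeObservation is a hypothetical per-node record read from the
// DaemonSet's published state; the operator only reads it.
type NodeObservation struct {
	Node          string
	LastHeartbeat time.Time
}

// rollUp maps observations to a cluster phase. The rules are assumed:
// no nodes means Bootstrap, any stale heartbeat means Degraded,
// otherwise the node count decides SingleNode or MultiNode.
func rollUp(nodes []NodeObservation, now time.Time, staleAfter time.Duration) ClusterPhase {
	if len(nodes) == 0 {
		return Bootstrap
	}
	for _, n := range nodes {
		if now.Sub(n.LastHeartbeat) > staleAfter {
			return Degraded
		}
	}
	if len(nodes) == 1 {
		return SingleNode
	}
	return MultiNode
}

func main() {
	now := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
	nodes := []NodeObservation{
		{Node: "edge-a", LastHeartbeat: now.Add(-10 * time.Second)},
		{Node: "edge-b", LastHeartbeat: now.Add(-5 * time.Minute)}, // stale
	}
	fmt.Println(rollUp(nodes, now, 90*time.Second)) // prints "Degraded"
}
```

Because rollUp is a pure function of what was read, it cannot race the DaemonSet, which matches the low-blast-radius argument above.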
Extensibility: three seams, not a fork
The current customer's hardware is one cell of a matrix, never the hardcoded default:
- storage.Backend: each backend declares static capabilities (SupportsClone(), SupportsRWX(), …); controllers query them and the webhook rejects workflows a backend can't safely support.
- os.OSProvider: each OS declares what host operations it can perform (boot-recovery hooks, persistent network config, kernel tuning).
- EdgeProfile: bundles the topology + backend + mode + OS enums, plus site-specific names and thresholds, into one signed profile. Adding a new site or hardware shape means adding a profile, not editing controllers.
Capabilities are compile-time static — declared by the Go type, never live-probed — so a backend outage can never silently change what the operator will admit.
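A minimal sketch of the capability seam, assuming a Backend interface that exposes the SupportsClone() and SupportsRWX() methods named above. The two backend types, the Name() method, the mode constants, and admitMigration are hypothetical; the real storage.Backend interface and webhook logic may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// Backend is an assumed shape for the storage.Backend capability interface.
type Backend interface {
	Name() string
	SupportsClone() bool
	SupportsRWX() bool
}

// fullBackend and copyOnlyBackend are hypothetical backends. Their answers
// are fixed by the Go type, never probed at runtime.
type fullBackend struct{}

func (fullBackend) Name() string        { return "full" }
func (fullBackend) SupportsClone() bool { return true }
func (fullBackend) SupportsRWX() bool   { return true }

type copyOnlyBackend struct{}

func (copyOnlyBackend) Name() string        { return "copy-only" }
func (copyOnlyBackend) SupportsClone() bool { return false }
func (copyOnlyBackend) SupportsRWX() bool   { return true }

// MigrationMode mirrors the two migration styles mentioned for
// VMVolumeMigration; the string values are assumptions.
type MigrationMode string

const (
	ModeColdCopy MigrationMode = "cold-dd-copy"
	ModeHotClone MigrationMode = "hot-csi-clone"
)

// ErrUnsupported marks a workflow the backend cannot safely support.
var ErrUnsupported = errors.New("backend cannot safely support this workflow")

// admitMigration is the kind of check an admission webhook could run:
// it consults only static capabilities, so a backend outage cannot
// change the decision.
func admitMigration(b Backend, mode MigrationMode) error {
	if !b.SupportsRWX() {
		return fmt.Errorf("%w: %s has no RWX support", ErrUnsupported, b.Name())
	}
	if mode == ModeHotClone && !b.SupportsClone() {
		return fmt.Errorf("%w: %s cannot clone volumes", ErrUnsupported, b.Name())
	}
	return nil
}

func main() {
	fmt.Println(admitMigration(fullBackend{}, ModeHotClone))     // prints "<nil>"
	fmt.Println(admitMigration(copyOnlyBackend{}, ModeColdCopy)) // prints "<nil>"
	fmt.Println(admitMigration(copyOnlyBackend{}, ModeHotClone) != nil) // prints "true"
}
```

Supporting a new backend in this sketch means adding one type that implements Backend; admitMigration and its callers stay unchanged.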