Components

Inside the operator

The operator is one binary (cmd/operator) hosting a controller-manager. Each CRD gets its own controller package; validation and defaulting live in an admission webhook; host work is delegated to the agent. Backends and operating systems sit behind capability interfaces so the same controllers run against different substrates.

[Diagram: three columns.
  Controller-manager: VMVolumeMigration controller, NodewrightCluster controller, AgentStatus synthesis, Admission webhook.
  Capability interfaces: storage.Backend (SupportsClone / Snapshot / Replication / RWX …), os.OSProvider (BootRecovery / PersistentNet / KernelTuning …), EdgeProfile (topology + backend + mode + OS, names, thresholds).
  Implementations: piraeus, local-path, portworx (v1.0); kairos, ubuntu (v0.2), rhel (v1.0).
  Edge labels: queries capabilities, resolves substrate, gate CRDs on capabilities.]

Custom resources

All resources live in the API group/version nodewright.spectrocloud.com/v1alpha1. Workflow CRDs carry a .status.phase and a per-step progress list; status always reflects observed state, never desired state.

| CRD | Scope | Phase | What it is |
|---|---|---|---|
| VMVolumeMigration | namespaced | v0.1 ⭐ | Automates migrating a VM's volume to RWX (cold dd-copy / hot csi-clone) with snapshot bookends + checksum verification. The migration runbook, made declarative. |
| NodewrightCluster | cluster (singleton) | v0.1 | The cluster-level health rollup: Bootstrap · SingleNode · Upgrading · MultiNode · Degraded · Recovering. |
| NodewrightAgentStatus | per node | v0.1 | Per-node observed state: role, heartbeat, storage/DRBD health. |
| BondModeFlipPlan | per flip | v0.2 | A coordinated bond-mode flip as a workflow, with VM protection and rollback. |
| ReplicationUpgrade | per upgrade | v0.2 | The single→multi-node storage upgrade, gated and staged. |
| EdgeProfile | cluster | v0.2 | The substrate definition (topology, backend, OS, names, thresholds) as a signed, versioned artifact. |
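
As a rough illustration of a status that carries a phase and a per-step progress list, the sketch below derives the phase purely from observed step results. Every name in it (Phase values, StepProgress, DerivePhase, the step names) is hypothetical; the actual v1alpha1 types and phase enum are not shown in this document.

```go
package main

import "fmt"

// Phase is a hypothetical workflow phase. The real enum values are an assumption.
type Phase string

const (
	PhasePending   Phase = "Pending"
	PhaseRunning   Phase = "Running"
	PhaseSucceeded Phase = "Succeeded"
	PhaseFailed    Phase = "Failed"
)

// StepProgress is one entry in the per-step progress list (hypothetical shape).
type StepProgress struct {
	Name    string
	Done    bool
	Failed  bool
	Message string
}

// WorkflowStatus mirrors the described shape: a phase plus per-step progress.
type WorkflowStatus struct {
	Phase Phase
	Steps []StepProgress
}

// DerivePhase computes the phase only from what was observed for each step,
// so the status never encodes what the workflow merely intends to do.
func DerivePhase(steps []StepProgress) Phase {
	done := 0
	for _, s := range steps {
		if s.Failed {
			return PhaseFailed
		}
		if s.Done {
			done++
		}
	}
	switch {
	case done == 0:
		return PhasePending
	case done == len(steps):
		return PhaseSucceeded
	default:
		return PhaseRunning
	}
}

func main() {
	// Step names are illustrative, loosely following the migration description.
	steps := []StepProgress{
		{Name: "snapshot-before", Done: true},
		{Name: "copy", Done: true},
		{Name: "checksum-verify"},
		{Name: "snapshot-after"},
	}
	status := WorkflowStatus{Phase: DerivePhase(steps), Steps: steps}
	fmt.Println(status.Phase) // prints "Running"
}
```

Deriving the phase from the step list, rather than storing it independently, is one way to keep status tied to observed state.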

The observation bridge (being built now)

The first increment is deliberately read-only. Rather than take over the legacy DaemonSet, the operator observes it: it reads the DaemonSet's live state (a ConfigMap-backed 21-state machine plus per-node heartbeats) and its Kubernetes Events, and rolls both up into NodewrightCluster.status and NodewrightAgentStatus.

[Diagram: the Nodewright operator reads, read-only, the Legacy DaemonSet's state ConfigMap (21-state machine, heartbeats) and its Kubernetes Events, and writes a typed rollup into NodewrightCluster.status and NodewrightAgentStatus.]

Why read-only first

It ships value in days, it cannot race or harm the DaemonSet, and it forces an accurate, typed model of the machine the operator will later take over. Low blast radius by construction.

Extensibility — three seams, not a fork

The current customer's hardware is one cell of a matrix, never the hardcoded default:

  • storage.Backend — each backend declares static capabilities (SupportsClone(), SupportsRWX(), …); controllers query them and the webhook rejects workflows a backend can't safely support.
  • os.OSProvider — each OS declares what host operations it can perform (boot-recovery hooks, persistent network config, kernel tuning).
  • EdgeProfile — bundles the topology + backend + mode + OS enums, plus site-specific names and thresholds, into one signed profile. Adding a new site or hardware shape means adding a profile, not editing controllers.

Capabilities are compile-time static — declared by the Go type, never live-probed — so a backend outage can never silently change what the operator will admit.
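
A minimal sketch of how static capabilities could gate admission, assuming a Backend interface with only the two methods named above. The backend types (fullBackend, minimalBackend), their capability values, the MigrationMode constants, and ValidateMigration are hypothetical and say nothing about what piraeus, local-path, or portworx actually support.

```go
package main

import (
	"errors"
	"fmt"
)

// Backend is a cut-down stand-in for storage.Backend (assumed shape).
type Backend interface {
	Name() string
	SupportsClone() bool
	SupportsRWX() bool
}

// fullBackend and minimalBackend are illustrative only. Their answers are
// constants fixed by the Go type, never the result of probing a live system.
type fullBackend struct{}

func (fullBackend) Name() string        { return "example-full" }
func (fullBackend) SupportsClone() bool { return true }
func (fullBackend) SupportsRWX() bool   { return true }

type minimalBackend struct{}

func (minimalBackend) Name() string        { return "example-minimal" }
func (minimalBackend) SupportsClone() bool { return false }
func (minimalBackend) SupportsRWX() bool   { return true }

// MigrationMode names follow the two modes mentioned for VMVolumeMigration.
type MigrationMode string

const (
	ColdDDCopy  MigrationMode = "cold-dd-copy"
	HotCSIClone MigrationMode = "hot-csi-clone"
)

// ErrUnsupported marks a request the backend cannot safely serve.
var ErrUnsupported = errors.New("unsupported by backend")

// ValidateMigration is the kind of check an admission webhook might run:
// it rejects a workflow whose mode needs a capability the backend lacks.
func ValidateMigration(b Backend, mode MigrationMode) error {
	if !b.SupportsRWX() {
		return fmt.Errorf("%w: %s cannot provide RWX volumes", ErrUnsupported, b.Name())
	}
	if mode == HotCSIClone && !b.SupportsClone() {
		return fmt.Errorf("%w: %s cannot clone, so mode %s is rejected", ErrUnsupported, b.Name(), mode)
	}
	return nil
}

func main() {
	for _, b := range []Backend{fullBackend{}, minimalBackend{}} {
		for _, m := range []MigrationMode{ColdDDCopy, HotCSIClone} {
			fmt.Printf("%s / %s: %v\n", b.Name(), m, ValidateMigration(b, m))
		}
	}
}
```

Since each method returns a constant, the result of ValidateMigration depends only on the backend type and the requested mode, so an outage in the storage system cannot change the admission decision.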