Talos Linux¶
The cluster runs Talos Linux v1.12.4 -- a purpose-built, immutable operating system for Kubernetes. Talos has no shell, no SSH, and no package manager. All management is done through a gRPC API via talosctl.
Why Talos¶
| Property | Benefit |
|---|---|
| Immutable | The OS is read-only. No drift, no manual changes, no configuration surprises. |
| API-driven | All operations go through talosctl. Infrastructure is code, not a series of SSH commands. |
| Minimal attack surface | No shell, no SSH, no unnecessary services. The only way in is the API. |
| Declarative | Machine configs are YAML documents that describe the desired state of each node. |
| Atomic upgrades | Upgrades swap the entire OS image atomically. Rollback is automatic on failure. |
Talos Version and Factory Images¶
The cluster uses Talos v1.12.4 with custom factory images from factory.talos.dev. Each node type has a different image built from a schematic -- a YAML file that declares which system extensions and overlays to include.
How Factory Images Work¶
flowchart LR
A[Extension YAML<br/>e.g. intel.yaml] -->|POST| B[factory.talos.dev/schematics]
B -->|Returns| C[Schematic ID<br/>97bf8e92...]
C --> D[factory.talos.dev/installer/<br/>SCHEMATIC_ID:VERSION]
D --> E[Custom Talos Image<br/>with Extensions] The schematic ID is embedded in the installer image URL used by each node. For example:
factory.talos.dev/installer/97bf8e92fc6bba0f03928b859c08295d7615737b29db06a97be51dc63004e403:v1.12.4
Image Definitions¶
The justfile defines four image variables, one per node type:
cp_image := "factory.talos.dev/installer/de94b242...:v1.12.4" # RPi (control plane)
cp_amd_image := "factory.talos.dev/installer/f19ad7b4...:v1.12.4" # AMD (control plane)
worker_intel_image := "factory.talos.dev/installer/97bf8e92...:v1.12.4" # Intel workers
To regenerate schematic IDs after changing extensions:
This POSTs each extension YAML to factory.talos.dev/schematics and prints the resulting IDs.
Extensions per Node Type¶
Configuration Generation Flow¶
All Talos configuration is generated and applied through just recipes defined in pitower/talos/justfile.
flowchart TD
A[secrets.sops.yaml<br/>Encrypted] -->|sops -d| B[secrets.yaml<br/>Decrypted]
B --> C[talosctl gen config]
D[general.patch] --> C
E[controlplane.patch] --> C
C --> F[clusterconfig/controlplane.yaml]
C --> G[clusterconfig/worker.yaml]
F --> H{Per-node patches}
G --> H
H -->|worker-01.patch| I[worker-01.yaml]
H -->|worker-02.patch| J[worker-02.yaml]
H -->|worker-03.patch| K[worker-03.yaml]
H -->|worker-04.patch| L[worker-04.yaml]
H -->|worker-05.patch| M[worker-05.yaml]
H -->|worker-06.patch| N[worker-06.yaml] Step 1: Generate Base Configs¶
This decrypts secrets.sops.yaml and runs talosctl gen config with two global patches:
patches/general.patch-- applied to all nodespatches/controlplane.patch-- applied to control plane nodes only
Output goes to clusterconfig/controlplane.yaml and clusterconfig/worker.yaml.
Step 2: Apply Per-Node Patches¶
Each node gets its own patch applied on top of the base config, setting hostname, install image, network interfaces, and VIP assignments. The patched configs are written to clusterconfig/worker-XX.yaml.
Control Plane Nodes Use controlplane.yaml as Base
worker-01, worker-02, and worker-03 are control plane nodes despite their naming. Their per-node patches are applied on top of controlplane.yaml, not worker.yaml.
Key Patches¶
General Patch (all nodes)¶
machine:
kubelet:
extraArgs:
rotate-server-certificates: true
extraConfig:
imageGCHighThresholdPercent: 60
imageGCLowThresholdPercent: 50
extraMounts:
- destination: /var/mnt/extra
type: bind
source: /var/mnt/extra
options:
- rbind
- rshared
- rw
features:
hostDNS:
enabled: true
forwardKubeDNSToHost: false
resolveMemberNames: true
cluster:
network:
cni:
name: none
proxy:
disabled: true
| Setting | Purpose |
|---|---|
rotate-server-certificates | Enables automatic kubelet server certificate rotation |
imageGCHighThresholdPercent: 60 | Triggers image garbage collection when disk usage exceeds 60% |
imageGCLowThresholdPercent: 50 | Stops GC when disk usage drops below 50% |
Extra mount /var/mnt/extra | Provides a writable bind mount for workloads that need host-level storage |
hostDNS.enabled | Enables Talos host-level DNS resolution |
resolveMemberNames | Allows resolving cluster member names via host DNS |
cni.name: none | Disables default CNI -- Cilium is installed as a post-bootstrap addon |
proxy.disabled: true | Disables kube-proxy -- Cilium operates in kube-proxy replacement mode |
Control Plane Patch¶
cluster:
allowSchedulingOnControlPlanes: true
coreDNS:
disabled: true
apiServer:
certSANs:
- 127.0.0.1
extraArgs:
service-account-issuer: https://raw.githubusercontent.com/swibrow/home-ops/main/pitower/kubernetes
service-account-jwks-uri: https://k8s.cluster.internal:6443/openid/v1/jwks
| Setting | Purpose |
|---|---|
allowSchedulingOnControlPlanes | Permits workloads on control plane nodes to maximize resource use |
coreDNS.disabled | Disables built-in CoreDNS -- DNS is handled by Cilium or an alternative |
certSANs: [127.0.0.1] | Adds localhost to the API server certificate SANs |
service-account-issuer | Sets the OIDC issuer URL for service account tokens to a GitHub-hosted endpoint |
service-account-jwks-uri | JWKS endpoint for verifying service account tokens |
Per-Node Patches¶
Each node has a patch under patches/nodes/ that sets:
- Hostname (e.g.,
worker-01) - Install image (factory image URL with schematic ID and Talos version)
- Network interfaces (DHCP, VIP assignment for control plane nodes)
- Install disk (where applicable, e.g.,
/dev/mmcblk0for eMMC)
Example control plane node patch: