Talos Linux

The cluster runs Talos Linux v1.12.4 -- a purpose-built, immutable operating system for Kubernetes. Talos has no shell, no SSH, and no package manager. All management is done through a gRPC API via talosctl.

Why Talos

| Property | Benefit |
| --- | --- |
| Immutable | The OS is read-only. No drift, no manual changes, no configuration surprises. |
| API-driven | All operations go through `talosctl`. Infrastructure is code, not a series of SSH commands. |
| Minimal attack surface | No shell, no SSH, no unnecessary services. The only way in is the API. |
| Declarative | Machine configs are YAML documents that describe the desired state of each node. |
| Atomic upgrades | Upgrades swap the entire OS image atomically. Rollback is automatic on failure. |

Talos Version and Factory Images

The cluster uses Talos v1.12.4 with custom factory images from factory.talos.dev. Each node type has a different image built from a schematic -- a YAML file that declares which system extensions and overlays to include.

How Factory Images Work

flowchart LR
    A[Extension YAML<br/>e.g. intel.yaml] -->|POST| B[factory.talos.dev/schematics]
    B -->|Returns| C[Schematic ID<br/>97bf8e92...]
    C --> D[factory.talos.dev/installer/<br/>SCHEMATIC_ID:VERSION]
    D --> E[Custom Talos Image<br/>with Extensions]

The schematic ID is embedded in the installer image URL used by each node. For example:

factory.talos.dev/installer/97bf8e92fc6bba0f03928b859c08295d7615737b29db06a97be51dc63004e403:v1.12.4
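Because the reference has a fixed shape, it can be assembled from its two parts. A minimal sketch follows; `installer_image` is a hypothetical helper name used for illustration, not something defined in the justfile.

```shell
# Hypothetical helper: compose a factory installer image reference
# from a schematic ID ($1) and a Talos version tag ($2).
installer_image() {
  printf 'factory.talos.dev/installer/%s:%s\n' "$1" "$2"
}

# Rebuild the example reference shown above.
installer_image 97bf8e92fc6bba0f03928b859c08295d7615737b29db06a97be51dc63004e403 v1.12.4
```

Keeping the schematic ID and the version as separate inputs makes it clear that a Talos version bump changes only the tag, while an extension change produces a new ID.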

Image Definitions

The justfile defines one image variable per node type:

cp_image     := "factory.talos.dev/installer/de94b242...:v1.12.4"  # RPi (control plane)
cp_amd_image := "factory.talos.dev/installer/f19ad7b4...:v1.12.4"  # AMD (control plane)
worker_intel_image := "factory.talos.dev/installer/97bf8e92...:v1.12.4"  # Intel workers

To regenerate schematic IDs after changing extensions:

just image-id

This POSTs each extension YAML to factory.talos.dev/schematics and prints the resulting IDs.
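The same request can be made by hand with curl. This is a sketch rather than the recipe's actual implementation: it assumes the factory answers with a single-line JSON body of the form `{"id":"..."}`, and the function names `schematic_id` and `post_schematic` are illustrative.

```shell
# Extract the "id" field from a factory response read on stdin.
# Assumes a single-line JSON body such as {"id":"97bf8e92..."}.
schematic_id() {
  sed -n 's/.*"id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Upload one extension file ($1) and print the schematic ID it maps to.
post_schematic() {
  curl -sS -X POST --data-binary "@$1" https://factory.talos.dev/schematics | schematic_id
}

# Example (requires network access):
#   post_schematic extensions/intel.yaml
```

If `jq` is available, `jq -r .id` is a more robust replacement for the sed expression.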

Extensions per Node Type

Intel Workers

extensions/intel.yaml

customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/util-linux-tools
      - siderolabs/i915-ucode
      - siderolabs/intel-ucode

Extensions provide Intel GPU firmware (i915-ucode), CPU microcode updates (intel-ucode), and additional userspace utilities (util-linux-tools).

AMD Nodes

extensions/amd.yaml

customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/util-linux-tools
      - siderolabs/amd-ucode
      - siderolabs/amdgpu-firmware

Extensions provide AMD CPU microcode and GPU firmware for the Lenovo 440p and Acemagician AM06 nodes running AMD processors.

Configuration Generation Flow

All Talos configuration is generated and applied through just recipes defined in pitower/talos/justfile.

flowchart TD
    A[secrets.sops.yaml<br/>Encrypted] -->|sops -d| B[secrets.yaml<br/>Decrypted]
    B --> C[talosctl gen config]
    D[general.patch] --> C
    E[controlplane.patch] --> C
    C --> F[clusterconfig/controlplane.yaml]
    C --> G[clusterconfig/worker.yaml]

    F --> H{Per-node patches}
    G --> H

    H -->|worker-01.patch| I[worker-01.yaml]
    H -->|worker-02.patch| J[worker-02.yaml]
    H -->|worker-03.patch| K[worker-03.yaml]
    H -->|worker-04.patch| L[worker-04.yaml]
    H -->|worker-05.patch| M[worker-05.yaml]
    H -->|worker-06.patch| N[worker-06.yaml]

Step 1: Generate Base Configs

just config

This decrypts secrets.sops.yaml and runs talosctl gen config with two global patches:

patches/general.patch -- applied to all nodes
patches/controlplane.patch -- applied to control plane nodes only

Output goes to clusterconfig/controlplane.yaml and clusterconfig/worker.yaml.
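For orientation, the recipe is roughly equivalent to the commands sketched below. This is not the recipe body itself: the cluster name and endpoint are placeholders (assumptions), and the `run` dry-run wrapper is a hypothetical convenience so the commands can be reviewed before anything is executed.

```shell
# Placeholders (assumptions): substitute the values used in the justfile.
CLUSTER_NAME="${CLUSTER_NAME:-my-cluster}"
CLUSTER_ENDPOINT="${CLUSTER_ENDPOINT:-https://k8s.cluster.internal:6443}"

# DRY_RUN=1 (the default here) prints commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

generate_base_configs() {
  # The decrypt step needs a redirect, so it is handled outside run().
  if [ "$DRY_RUN" = "1" ]; then
    echo "sops -d secrets.sops.yaml > secrets.yaml"
  else
    sops -d secrets.sops.yaml > secrets.yaml
  fi
  # general.patch applies to every node, controlplane.patch only to
  # control plane nodes.
  run talosctl gen config "$CLUSTER_NAME" "$CLUSTER_ENDPOINT" \
    --with-secrets secrets.yaml \
    --config-patch @patches/general.patch \
    --config-patch-control-plane @patches/controlplane.patch \
    --output clusterconfig
}

generate_base_configs
```

Setting `DRY_RUN=0` runs the commands for real. The decrypted `secrets.yaml` holds cluster secrets in plain text, so it should stay out of version control.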

Step 2: Apply Per-Node Patches

just patch

Each node gets its own patch applied on top of the base config, setting hostname, install image, network interfaces, and VIP assignments. The patched configs are written to clusterconfig/worker-XX.yaml.

Control Plane Nodes Use controlplane.yaml as Base

worker-01, worker-02, and worker-03 are control plane nodes despite their naming. Their per-node patches are applied on top of controlplane.yaml, not worker.yaml.
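The base-config selection described above can be sketched as follows. This is one way to express it with `talosctl machineconfig patch`, not necessarily what the recipe runs; `base_config_for` and `patch_node` are hypothetical helper names.

```shell
# Pick the base config for a node. worker-01 to worker-03 are control
# plane nodes; every other node is treated as a worker.
base_config_for() {
  case "$1" in
    worker-01|worker-02|worker-03) echo "clusterconfig/controlplane.yaml" ;;
    *)                             echo "clusterconfig/worker.yaml" ;;
  esac
}

# Print the patch command for one node (drop the leading echo to run it).
patch_node() {
  echo talosctl machineconfig patch "$(base_config_for "$1")" \
    --patch "@patches/nodes/$1.patch" \
    --output "clusterconfig/$1.yaml"
}

for node in worker-01 worker-04; do
  patch_node "$node"
done
```

Printing the commands first makes it easy to confirm that each node is patched against the intended base before any file is written.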

Key Patches

General Patch (all nodes)

patches/general.patch

machine:
  kubelet:
    extraArgs:
      rotate-server-certificates: true
    extraConfig:
      imageGCHighThresholdPercent: 60
      imageGCLowThresholdPercent: 50
    extraMounts:
      - destination: /var/mnt/extra
        type: bind
        source: /var/mnt/extra
        options:
          - rbind
          - rshared
          - rw
  features:
    hostDNS:
      enabled: true
      forwardKubeDNSToHost: false
      resolveMemberNames: true
cluster:
  network:
    cni:
      name: none
  proxy:
    disabled: true

| Setting | Purpose |
| --- | --- |
| `rotate-server-certificates` | Enables automatic kubelet server certificate rotation |
| `imageGCHighThresholdPercent: 60` | Triggers image garbage collection when disk usage exceeds 60% |
| `imageGCLowThresholdPercent: 50` | Stops GC when disk usage drops below 50% |
| Extra mount `/var/mnt/extra` | Provides a writable bind mount for workloads that need host-level storage |
| `hostDNS.enabled` | Enables Talos host-level DNS resolution |
| `resolveMemberNames` | Allows resolving cluster member names via host DNS |
| `cni.name: none` | Disables default CNI -- Cilium is installed as a post-bootstrap addon |
| `proxy.disabled: true` | Disables kube-proxy -- Cilium operates in kube-proxy replacement mode |

Control Plane Patch

patches/controlplane.patch

cluster:
  allowSchedulingOnControlPlanes: true
  coreDNS:
    disabled: true
  apiServer:
    certSANs:
      - 127.0.0.1
    extraArgs:
      service-account-issuer: https://raw.githubusercontent.com/swibrow/home-ops/main/pitower/kubernetes
      service-account-jwks-uri: https://k8s.cluster.internal:6443/openid/v1/jwks

| Setting | Purpose |
| --- | --- |
| `allowSchedulingOnControlPlanes` | Permits workloads on control plane nodes to maximize resource use |
| `coreDNS.disabled` | Disables the CoreDNS deployment that Talos installs by default -- cluster DNS is provided by a separately deployed addon |
| `certSANs: [127.0.0.1]` | Adds localhost to the API server certificate SANs |
| `service-account-issuer` | Sets the OIDC issuer URL for service account tokens to a GitHub-hosted endpoint |
| `service-account-jwks-uri` | JWKS endpoint for verifying service account tokens |

Per-Node Patches

Each node has a patch under patches/nodes/ that sets:

Hostname (e.g., worker-01)
Install image (factory image URL with schematic ID and Talos version)
Network interfaces (DHCP, VIP assignment for control plane nodes)
Install disk (where applicable, e.g., /dev/mmcblk0 for eMMC)

Example control plane node patch:

patches/nodes/worker-01.patch

machine:
  install:
    image: factory.talos.dev/installer/f19ad7b4...:v1.12.4
  network:
    hostname: worker-01
    interfaces:
      - deviceSelector:
          physical: true
        dhcp: true
        vip:
          ip: 192.168.0.200