Rook Ceph¶
Rook Ceph provides distributed block storage for the cluster. Three 512 GB NVMe drives on the Acemagician AM06 nodes form a Ceph cluster managed by the Rook operator, delivering replicated, high-availability persistent volumes.
Architecture¶
flowchart TD
subgraph Rook Operator
OP[rook-ceph-operator\nManages Ceph lifecycle]
end
subgraph Ceph Cluster
MON1[MON\nworker-01]
MON2[MON\nworker-02]
MON3[MON\nworker-03]
MGR[MGR\nCluster manager]
OSD1[OSD\nworker-01\n512GB NVMe]
OSD2[OSD\nworker-02\n512GB NVMe]
OSD3[OSD\nworker-03\n512GB NVMe]
end
subgraph Resources
BP[CephBlockPool\nReplicated x3]
SC[StorageClass\nceph-block]
DASH[Dashboard\nrook.example.com]
end
OP --> MON1 & MON2 & MON3
OP --> MGR
OP --> OSD1 & OSD2 & OSD3
BP --> OSD1 & OSD2 & OSD3
SC --> BP
MGR --> DASH
Repository Layout¶
The Rook Ceph deployment is split into three kustomization directories:
pitower/kubernetes/apps/rook-ceph/
├── operator/ # Rook operator Helm chart + CRDs
│ ├── kustomization.yaml
│ ├── namespace.yaml
│ └── values.yaml
├── cluster/ # CephCluster CR, dashboard HTTPRoute
│ ├── kustomization.yaml
│ ├── values.yaml
│ └── httproute.yaml
└── add-ons/ # Grafana dashboards for Ceph monitoring
├── kustomization.yaml
└── dashboard/
├── kustomization.yaml
├── ceph-cluster-dashboard.json
├── ceph-osd-dashboard.json
└── ceph-pools-dashboard.json
Separation of concerns
The operator and cluster are deployed as separate ArgoCD applications. This allows the operator to be upgraded independently of the cluster, and prevents accidental cluster disruption during operator updates.
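As a rough sketch, each Application points at one of the directories above. The names, repoURL, and sync policy below are placeholders, not the repo's actual ArgoCD manifests:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: rook-ceph-operator   # a second Application, rook-ceph-cluster, mirrors this with path .../cluster
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/pitower   # placeholder URL
    targetRevision: main
    path: pitower/kubernetes/apps/rook-ceph/operator
  destination:
    server: https://kubernetes.default.svc
    namespace: rook-ceph
  syncPolicy:
    automated:
      prune: true
      selfHeal: true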
Operator¶
The Rook operator is deployed via the rook-ceph Helm chart (v1.17.9) into the rook-ceph namespace:
crds:
  enabled: true
csi:
  enableCephfsDriver: false
monitoring:
  enabled: false
resources:
  requests:
    memory: 128Mi
    cpu: 100m
  limits: {}
Key decisions:
- CephFS driver disabled -- the cluster uses block storage only (cephFileSystems: [])
- CRDs managed by the chart -- crds.enabled: true ensures CRDs are installed and upgraded with the operator
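Outside of ArgoCD, the equivalent imperative install would look roughly like this, with values.yaml being the file above:

helm repo add rook-release https://charts.rook.io/release
helm upgrade --install rook-ceph rook-release/rook-ceph \
  --namespace rook-ceph --create-namespace \
  --version v1.17.9 -f values.yaml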
Cluster Configuration¶
The rook-ceph-cluster Helm chart deploys the CephCluster custom resource:
Monitors and Managers¶
| Component | Count | Purpose |
|---|---|---|
| MON | 3 | Maintain cluster map consensus (one per node) |
| MGR | 1 | Cluster management, dashboard, metrics |
| OSD | 3 | One per NVMe drive, stores actual data |
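In the rook-ceph-cluster chart these counts correspond to the mon and mgr blocks of cephClusterSpec; a sketch matching the table above:

cephClusterSpec:
  mon:
    count: 3                     # one monitor per worker node
    allowMultiplePerNode: false  # keep quorum members on separate nodes
  mgr:
    count: 1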
Storage Nodes¶
Each OSD is pinned to a specific NVMe device by disk ID to prevent accidental data loss:
cephClusterSpec:
  storage:
    useAllNodes: false
    useAllDevices: false
    config:
      osdsPerDevice: "1"
    nodes:
      - name: "worker-01"
        devices:
          - name: "/dev/disk/by-id/nvme-AirDisk_512GB_SSD_NFQ0044006866P70GX"
      - name: "worker-02"
        devices:
          - name: "/dev/disk/by-id/nvme-AirDisk_512GB_SSD_NFQ0044007344P70GX"
      - name: "worker-03"
        devices:
          - name: "/dev/disk/by-id/nvme-AirDisk_512GB_SSD_NFQ0044010702P70GX"
Device selection
useAllNodes and useAllDevices are both set to false. Each node and device is explicitly listed to prevent Ceph from consuming unintended disks. Devices are referenced by /dev/disk/by-id/ paths for stability across reboots.
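To find the stable identifier for a new drive before adding it to the node list, list the by-id symlinks on the node:

# Run on the worker node; each NVMe device appears as a stable symlink
ls -l /dev/disk/by-id/ | grep nvme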
Network¶
Host networking is used for Ceph daemons to maximize throughput and minimize latency between OSDs and monitors.
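In the CephCluster spec this corresponds to the host network provider, nested under cephClusterSpec in the chart values:

cephClusterSpec:
  network:
    provider: host   # Ceph daemons bind directly to the node's interfaces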
Resource Limits¶
| Daemon | CPU Request | Memory Request | Memory Limit |
|---|---|---|---|
| MGR | 125m | 512Mi | 2Gi |
| MON | 49m | 512Mi | 1Gi |
| OSD | 442m | 1Gi | 6Gi |
| MGR Sidecar | 49m | 128Mi | 256Mi |
| Crash Collector | 15m | 64Mi | 64Mi |
| Log Collector | 100m | 100Mi | 1Gi |
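These values map onto the resources block of cephClusterSpec; an abbreviated sketch for two of the daemons:

cephClusterSpec:
  resources:
    mon:
      requests:
        cpu: 49m
        memory: 512Mi
      limits:
        memory: 1Gi
    osd:
      requests:
        cpu: 442m
        memory: 1Gi
      limits:
        memory: 6Gi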
Storage Resources¶
CephBlockPool¶
The default block pool provides three-way replication across the three OSD nodes. This is configured through the rook-ceph-cluster Helm chart's defaults.
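The chart defaults expand to roughly the following pool and StorageClass definition (a sketch, not the verbatim rendered output):

cephBlockPools:
  - name: ceph-blockpool
    spec:
      failureDomain: host   # place each replica on a different node
      replicated:
        size: 3             # three copies, one per OSD/node
    storageClass:
      enabled: true
      name: ceph-block
      reclaimPolicy: Delete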
Current configuration
- cephFileSystems: [] -- No CephFS filesystems are deployed
- cephObjectStores: [] -- No S3-compatible object stores are deployed
- cephBlockPoolsVolumeSnapshotClass.enabled: false -- Volume snapshot class for block pools is not yet enabled
StorageClass¶
The Helm chart creates a ceph-block StorageClass that provisions RBD (RADOS Block Device) volumes from the block pool. Applications request storage through PVCs referencing this class:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ceph-block
  resources:
    requests:
      storage: 10Gi
Dashboard¶
The Ceph dashboard is enabled and accessible via the internal gateway:
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: rook-ceph-dashboard
  namespace: rook-ceph
spec:
  hostnames:
    - rook.example.com
  parentRefs:
    - name: envoy-internal
      namespace: networking
      sectionName: https
  rules:
    - backendRefs:
        - name: rook-ceph-mgr-dashboard
          port: 7000
Access the dashboard at https://rook.example.com from the internal network or via Tailscale VPN.
Dashboard credentials
The dashboard admin password is stored in the rook-ceph-dashboard-password secret in the rook-ceph namespace:
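kubectl -n rook-ceph get secret rook-ceph-dashboard-password \
  -o jsonpath='{.data.password}' | base64 --decode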
Monitoring¶
Grafana Dashboards¶
Three Grafana dashboards are deployed as ConfigMaps with the grafana_dashboard: "true" label, automatically discovered by the Grafana sidecar:
| Dashboard | Grafana ID | Purpose |
|---|---|---|
| Ceph Cluster | 2842 | Overall cluster health, IOPS, throughput |
| Ceph OSD | 5336 | Per-OSD performance and utilization |
| Ceph Pools | 5342 | Pool-level statistics and capacity |
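One plausible way the ConfigMaps are generated from the JSON files in add-ons/dashboard/ is a kustomize configMapGenerator; a sketch, since the actual generator options may differ:

# add-ons/dashboard/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
configMapGenerator:
  - name: ceph-cluster-dashboard
    files:
      - ceph-cluster-dashboard.json
    options:
      labels:
        grafana_dashboard: "true"   # picked up by the Grafana sidecar
      disableNameSuffixHash: true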
Toolbox¶
The Rook toolbox pod is enabled (toolbox.enabled: true) for interactive Ceph CLI troubleshooting:
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
# Inside the toolbox
ceph status      # overall health, monitor quorum, and IO summary
ceph osd status  # per-OSD host, usage, and up/in state
ceph df          # raw capacity and per-pool usage
rados df         # per-pool object counts and space consumed
Health Checks¶
Common commands to verify Ceph cluster health:
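For example, via the toolbox deployment:

# Overall status: HEALTH_OK expected when all is well
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph health detail

# Operator's view of the cluster (PHASE should be Ready)
kubectl -n rook-ceph get cephcluster

# All OSDs should be "up" and "in"
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd tree

# Placement groups should be active+clean
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph pg stat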
Data safety
The cleanupPolicy.confirmation field is left empty (""). Setting it to "yes-really-destroy-data" would allow the cleanup job to wipe all Ceph data when the CephCluster resource is deleted. Never change this unless you are intentionally decommissioning the cluster.
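For reference, the relevant fragment of the cluster values:

cephClusterSpec:
  cleanupPolicy:
    # Leave empty. Setting "yes-really-destroy-data" arms the cleanup job
    # that wipes OSD disks when the CephCluster resource is deleted.
    confirmation: ""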