Skip to content

kube-prometheus-stack

The kube-prometheus-stack Helm chart deploys a complete Prometheus monitoring pipeline. In the cluster, it provides Prometheus server, node-exporter, kube-state-metrics, and a curated set of recording and alerting rules for Kubernetes internals.

What's Included

Component Purpose
Prometheus Time-series database that scrapes and stores metrics
node-exporter Exposes hardware and OS-level metrics from each node
kube-state-metrics Generates metrics about the state of Kubernetes objects (pods, deployments, PVCs)
PrometheusOperator Manages Prometheus instances and watches for ServiceMonitor/PodMonitor CRDs
Recording Rules Pre-computed queries for common Kubernetes metrics

Grafana and Alertmanager

Grafana is deployed as a separate Helm release for independent lifecycle management. Alertmanager is currently disabled (alertmanager.enabled: false).

Prometheus Configuration

Prometheus is configured with the following key settings:

prometheusSpec:
  externalLabels:
    cluster: home-ops
  ruleSelectorNilUsesHelmValues: false
  serviceMonitorSelectorNilUsesHelmValues: false
  podMonitorSelectorNilUsesHelmValues: false
  probeSelectorNilUsesHelmValues: false
  scrapeConfigSelectorNilUsesHelmValues: false
  enableAdminAPI: true
  walCompression: true
  retentionSize: 15GB
  storageSpec:
    volumeClaimTemplate:
      spec:
        storageClassName: ceph-block
        resources:
          requests:
            storage: 20Gi

Selector Configuration

All *SelectorNilUsesHelmValues: false settings ensure Prometheus discovers ServiceMonitors, PodMonitors, ProbeMonitors, and recording rules from all namespaces -- not just those created by the Helm chart. This is essential for applications in other namespaces to be scraped.

Storage

Prometheus stores its TSDB on a 20Gi Ceph block volume (ceph-block StorageClass). WAL compression is enabled to reduce write amplification and disk usage. The retention policy is size-based at 15GB, meaning Prometheus will automatically prune old data when the TSDB approaches this limit.

Access

Prometheus is exposed internally via Envoy Gateway:

route:
  main:
    enabled: true
    hostnames:
      - prometheus.example.com
    parentRefs:
      - name: envoy-internal
        namespace: networking
        sectionName: https

This makes Prometheus available at https://prometheus.example.com for internal/VPN users only.

ServiceMonitor Pattern

The cluster uses ServiceMonitor resources extensively to define scrape targets. A ServiceMonitor tells Prometheus which services to scrape, on which port, and at which path. The Prometheus Operator watches for these CRDs and automatically configures Prometheus scrape jobs.

flowchart LR
    App[Application Pod] -->|exposes /metrics| Svc[Kubernetes Service]
    SM[ServiceMonitor] -->|selects| Svc
    PO[Prometheus Operator] -->|watches| SM
    PO -->|configures| Prom[Prometheus]
    Prom -->|scrapes| Svc

Applications with ServiceMonitors

The following applications across the cluster expose ServiceMonitors:

Application Namespace Metrics
Cilium Agent kube-system eBPF datapath, policy, endpoint metrics
Cilium Operator kube-system Operator health, IPAM allocation
Hubble kube-system DNS, TCP, HTTP, ICMP, flow, drop, port-distribution
Hubble Relay kube-system Relay connection and forwarding metrics
external-dns networking DNS record sync metrics
cloudflared networking Tunnel connection metrics
nginx (external) networking HTTP request metrics
nginx (internal) networking HTTP request metrics
Authelia security Authentication and authorization metrics
External Secrets Operator security Secret sync metrics
Grafana monitoring Dashboard rendering, data source query metrics
snapshot-controller system Volume snapshot metrics
metrics-server kube-system API metrics

Creating a ServiceMonitor

To add monitoring for a new application, create a ServiceMonitor in the application's namespace. Example for an app using the bjw-s app-template:

# In your app's values.yaml
service:
  main:
    ports:
      http:
        port: 8080
      metrics:
        port: 9090

serviceMonitor:
  main:
    enabled: true
    endpoints:
      - port: metrics
        interval: 1m

For Helm charts that don't have built-in ServiceMonitor support, create one manually:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: my-namespace
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: my-app
  endpoints:
    - port: metrics
      interval: 1m
      path: /metrics

Kubernetes Component Monitoring

The kube-prometheus-stack scrapes all major Kubernetes control plane components:

Component Enabled Endpoints
kubelet Yes Auto-discovered
kube-apiserver Yes Auto-discovered
kube-controller-manager Yes 192.168.0.201, 192.168.0.202, 192.168.0.203
kube-scheduler Yes 192.168.0.201, 192.168.0.202, 192.168.0.203
etcd Yes 192.168.0.201, 192.168.0.202, 192.168.0.203
kube-proxy No Disabled (Cilium replaces kube-proxy via eBPF)

Static Endpoints

The controller-manager, scheduler, and etcd endpoints are statically configured to the three control plane node IPs because Talos Linux does not expose these components as Kubernetes services.

Metric Relabeling

The stack applies metric relabeling rules to reduce cardinality and filter out unnecessary metrics. Each component has a keep-list regex that retains only the metrics that are actually used in dashboards and alerts.

For example, the kubelet ServiceMonitor keeps only metrics matching prefixes like container_cpu, container_memory, kubelet_*, and drops high-cardinality labels like uid, id, and name:

metricRelabelings:
  - action: keep
    sourceLabels: ["__name__"]
    regex: (container_cpu|container_memory|kubelet_*|...)_(.+)
  - action: labeldrop
    regex: (uid)
  - action: labeldrop
    regex: (id|name)

This keeps storage costs down and query performance high on the 20Gi PVC.

kube-state-metrics

kube-state-metrics is configured to expose all labels on key resource types, which enables label-based filtering in Grafana dashboards:

kube-state-metrics:
  metricLabelsAllowlist:
    - "deployments=[*]"
    - "persistentvolumeclaims=[*]"
    - "pods=[*]"

A relabeling rule also adds the kubernetes_node label to every metric, derived from the pod's node name, enabling per-node breakdowns in dashboards.

Helm Chart Reference

Property Value
Chart prometheus-community/kube-prometheus-stack
Version 81.6.9
Namespace monitoring
Manifest path pitower/kubernetes/apps/monitoring/kube-prometheus-stack/