# kube-prometheus-stack

The kube-prometheus-stack Helm chart deploys a complete Prometheus monitoring pipeline: the Prometheus server, node-exporter, kube-state-metrics, and a curated set of recording and alerting rules for Kubernetes internals.
## What's Included
| Component | Purpose |
|---|---|
| Prometheus | Time-series database that scrapes and stores metrics |
| node-exporter | Exposes hardware and OS-level metrics from each node |
| kube-state-metrics | Generates metrics about the state of Kubernetes objects (pods, deployments, PVCs) |
| PrometheusOperator | Manages Prometheus instances and watches for ServiceMonitor/PodMonitor CRDs |
| Recording Rules | Pre-computed queries for common Kubernetes metrics |
!!! note "Grafana and Alertmanager"

    Grafana is deployed as a separate Helm release for independent lifecycle management. Alertmanager is currently disabled (`alertmanager.enabled: false`).
## Prometheus Configuration
Prometheus is configured with the following key settings:
```yaml
prometheusSpec:
  externalLabels:
    cluster: home-ops
  ruleSelectorNilUsesHelmValues: false
  serviceMonitorSelectorNilUsesHelmValues: false
  podMonitorSelectorNilUsesHelmValues: false
  probeSelectorNilUsesHelmValues: false
  scrapeConfigSelectorNilUsesHelmValues: false
  enableAdminAPI: true
  walCompression: true
  retentionSize: 15GB
  storageSpec:
    volumeClaimTemplate:
      spec:
        storageClassName: ceph-block
        resources:
          requests:
            storage: 20Gi
```
!!! note "Selector Configuration"

    All `*SelectorNilUsesHelmValues: false` settings ensure Prometheus discovers ServiceMonitors, PodMonitors, Probes, ScrapeConfigs, and PrometheusRules from all namespaces, not just those created by the Helm chart. This is essential for applications in other namespaces to be scraped.
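With these flags set to `false` and no explicit selectors configured, the operator is given empty (match-everything) selectors on the Prometheus custom resource. A sketch of the resulting spec, for illustration rather than verbatim chart output:

```yaml
# Illustrative: the Prometheus CR the chart renders when the
# *NilUsesHelmValues flags are false and no selectors are set.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
spec:
  serviceMonitorSelector: {}            # empty selector = match all ServiceMonitors
  serviceMonitorNamespaceSelector: {}   # ...in all namespaces
  podMonitorSelector: {}
  probeSelector: {}
  ruleSelector: {}
```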
## Storage
Prometheus stores its TSDB on a 20Gi Ceph block volume (ceph-block StorageClass). WAL compression is enabled to reduce write amplification and disk usage. The retention policy is size-based at 15GB, meaning Prometheus will automatically prune old data when the TSDB approaches this limit.
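Because size-based retention silently drops the oldest blocks, it can be worth alerting before the limit is reached. A hedged sketch of such a PrometheusRule; the alert name and 80% threshold are illustrative choices, not part of the chart:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: prometheus-tsdb-size   # hypothetical rule name
  namespace: monitoring
spec:
  groups:
    - name: prometheus-storage
      rules:
        - alert: PrometheusTSDBNearRetentionSize
          # Fires when on-disk block size exceeds ~80% of the 15GB retentionSize.
          expr: prometheus_tsdb_storage_blocks_bytes > 0.8 * 15e9
          for: 30m
          labels:
            severity: warning
          annotations:
            summary: Prometheus TSDB is approaching its size-based retention limit
```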
## Access
Prometheus is exposed internally via Envoy Gateway:
```yaml
route:
  main:
    enabled: true
    hostnames:
      - prometheus.example.com
    parentRefs:
      - name: envoy-internal
        namespace: networking
        sectionName: https
```
This makes Prometheus available at `https://prometheus.example.com` for internal/VPN users only.
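Those values render a Gateway API HTTPRoute roughly like the following. This is a sketch: the generated object name and backend Service name depend on the chart release and are assumptions here.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: kube-prometheus-stack-prometheus   # hypothetical generated name
  namespace: monitoring
spec:
  hostnames:
    - prometheus.example.com
  parentRefs:
    - name: envoy-internal
      namespace: networking
      sectionName: https
  rules:
    - backendRefs:
        - name: kube-prometheus-stack-prometheus   # assumed Service name
          port: 9090                               # Prometheus default port
```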
## ServiceMonitor Pattern
The cluster uses ServiceMonitor resources extensively to define scrape targets. A ServiceMonitor tells Prometheus which services to scrape, on which port, and at which path. The Prometheus Operator watches for these CRDs and automatically configures Prometheus scrape jobs.
```mermaid
flowchart LR
    App[Application Pod] -->|exposes /metrics| Svc[Kubernetes Service]
    SM[ServiceMonitor] -->|selects| Svc
    PO[Prometheus Operator] -->|watches| SM
    PO -->|configures| Prom[Prometheus]
    Prom -->|scrapes| Svc
```

## Applications with ServiceMonitors
The following applications across the cluster expose ServiceMonitors:
| Application | Namespace | Metrics |
|---|---|---|
| Cilium Agent | kube-system | eBPF datapath, policy, endpoint metrics |
| Cilium Operator | kube-system | Operator health, IPAM allocation |
| Hubble | kube-system | DNS, TCP, HTTP, ICMP, flow, drop, port-distribution |
| Hubble Relay | kube-system | Relay connection and forwarding metrics |
| external-dns | networking | DNS record sync metrics |
| cloudflared | networking | Tunnel connection metrics |
| nginx (external) | networking | HTTP request metrics |
| nginx (internal) | networking | HTTP request metrics |
| Authelia | security | Authentication and authorization metrics |
| External Secrets Operator | security | Secret sync metrics |
| Grafana | monitoring | Dashboard rendering, data source query metrics |
| snapshot-controller | system | Volume snapshot metrics |
| metrics-server | kube-system | API metrics |
## Creating a ServiceMonitor
To add monitoring for a new application, create a ServiceMonitor in the application's namespace. Example for an app using the bjw-s app-template:
```yaml
# In your app's values.yaml
service:
  main:
    ports:
      http:
        port: 8080
      metrics:
        port: 9090

serviceMonitor:
  main:
    enabled: true
    endpoints:
      - port: metrics
        interval: 1m
```
For Helm charts that don't have built-in ServiceMonitor support, create one manually:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: my-namespace
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: my-app
  endpoints:
    - port: metrics
      interval: 1m
      path: /metrics
```
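The `port: metrics` field refers to a *named* port on the selected Service, so the Service must both carry the matching label and expose a port named `metrics`. A minimal sketch of such a Service (names and port numbers are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: my-namespace
  labels:
    app.kubernetes.io/name: my-app   # must match the ServiceMonitor's selector
spec:
  selector:
    app.kubernetes.io/name: my-app
  ports:
    - name: metrics                  # referenced by the ServiceMonitor's `port: metrics`
      port: 9090
      targetPort: 9090
```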
## Kubernetes Component Monitoring
The kube-prometheus-stack scrapes all major Kubernetes control plane components:
| Component | Enabled | Endpoints |
|---|---|---|
| kubelet | Yes | Auto-discovered |
| kube-apiserver | Yes | Auto-discovered |
| kube-controller-manager | Yes | 192.168.0.201, 192.168.0.202, 192.168.0.203 |
| kube-scheduler | Yes | 192.168.0.201, 192.168.0.202, 192.168.0.203 |
| etcd | Yes | 192.168.0.201, 192.168.0.202, 192.168.0.203 |
| kube-proxy | No | Disabled (Cilium replaces kube-proxy via eBPF) |
!!! note "Static Endpoints"

    The controller-manager, scheduler, and etcd endpoints are statically configured to the three control plane node IPs because Talos Linux does not expose these components as Kubernetes services.
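In kube-prometheus-stack values, static endpoints are set per component. A sketch of how the table above maps onto chart values (the YAML anchor is a stylistic choice; the chart supplies the default scrape ports for each component):

```yaml
kubeControllerManager:
  endpoints: &cp
    - 192.168.0.201
    - 192.168.0.202
    - 192.168.0.203
kubeScheduler:
  endpoints: *cp
kubeEtcd:
  endpoints: *cp
kubeProxy:
  enabled: false   # Cilium replaces kube-proxy via eBPF
```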
## Metric Relabeling
The stack applies metric relabeling rules to reduce cardinality and filter out unnecessary metrics. Each component has a keep-list regex that retains only the metrics that are actually used in dashboards and alerts.
For example, the kubelet ServiceMonitor keeps only metrics matching prefixes like `container_cpu`, `container_memory`, and `kubelet_*`, and drops high-cardinality labels like `uid`, `id`, and `name`:
```yaml
metricRelabelings:
  - action: keep
    sourceLabels: ["__name__"]
    regex: (container_cpu|container_memory|kubelet_*|...)_(.+)
  - action: labeldrop
    regex: (uid)
  - action: labeldrop
    regex: (id|name)
```
This keeps storage costs down and query performance high on the 20Gi PVC.
## kube-state-metrics
kube-state-metrics is configured to expose all labels on key resource types, which enables label-based filtering in Grafana dashboards:
```yaml
kube-state-metrics:
  metricLabelsAllowlist:
    - "deployments=[*]"
    - "persistentvolumeclaims=[*]"
    - "pods=[*]"
```
A relabeling rule also adds the kubernetes_node label to every metric, derived from the pod's node name, enabling per-node breakdowns in dashboards.
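That relabeling is typically attached to the kube-state-metrics ServiceMonitor. A hedged sketch of the values involved; the `kubernetes_node` label name comes from the text above, while the exact wiring under `prometheus.monitor` is an assumption about this chart's layout:

```yaml
kube-state-metrics:
  prometheus:
    monitor:
      relabelings:
        # Copy the scraped pod's node name into a kubernetes_node label
        - action: replace
          sourceLabels: ["__meta_kubernetes_pod_node_name"]
          targetLabel: kubernetes_node
```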
## Helm Chart Reference
| Property | Value |
|---|---|
| Chart | prometheus-community/kube-prometheus-stack |
| Version | 81.6.9 |
| Namespace | monitoring |
| Manifest path | pitower/kubernetes/apps/monitoring/kube-prometheus-stack/ |