Metrics¶

The tfout exposes comprehensive Prometheus metrics to monitor its performance and behavior. These metrics help you track the health, performance, and usage patterns of your Terraform outputs synchronization.

Metrics Endpoint¶

Metrics are exposed on the controller-runtime metrics endpoint: - Path: /metrics - Port: 8080 (production) / 8080 (local development) - Format: Prometheus format - Authentication: Required in production, disabled for local development with --disable-metrics-auth

Available Metrics¶

Reconciliation Metrics¶

`terraform_outputs_reconcile_total`¶

Type: Counter Description: Total number of reconciliations performed Labels: - namespace: Namespace of the TerraformOutputs resource - name: Name of the TerraformOutputs resource - result: Result of reconciliation (success, error)

`terraform_outputs_reconcile_duration_seconds`¶

Type: Histogram Description: Duration of reconciliation operations in seconds Labels: - namespace: Namespace of the TerraformOutputs resource - name: Name of the TerraformOutputs resource - result: Result of reconciliation (success, error)

Backend Fetch Metrics¶

`terraform_outputs_backend_fetch_total`¶

Type: Counter Description: Total number of backend fetch operations Labels: - namespace: Namespace of the TerraformOutputs resource - name: Name of the TerraformOutputs resource - backend_type: Type of backend (s3) - backend_index: Index of the backend in the backends array (0-based) - result: Result of the fetch operation (success, error)

`terraform_outputs_backend_fetch_duration_seconds`¶

Type: Histogram Description: Duration of backend fetch operations in seconds Labels: - namespace: Namespace of the TerraformOutputs resource - name: Name of the TerraformOutputs resource - backend_type: Type of backend (s3) - backend_index: Index of the backend in the backends array (0-based)

Output Metrics¶

`terraform_outputs_found_total`¶

Type: Gauge Description: Total number of outputs found from all backends Labels: - namespace: Namespace of the TerraformOutputs resource - name: Name of the TerraformOutputs resource

`terraform_outputs_sensitive_total`¶

Type: Gauge Description: Total number of sensitive outputs found Labels: - namespace: Namespace of the TerraformOutputs resource - name: Name of the TerraformOutputs resource

`terraform_outputs_last_sync_timestamp`¶

Type: Gauge Description: Unix timestamp of the last successful sync Labels: - namespace: Namespace of the TerraformOutputs resource - name: Name of the TerraformOutputs resource

S3 Backend Metrics¶

`terraform_outputs_s3_requests_total`¶

Type: Counter Description: Total number of S3 API requests made Labels: - namespace: Namespace of the TerraformOutputs resource - name: Name of the TerraformOutputs resource - operation: S3 operation type (GetObject, HeadObject) - result: Result of the S3 request (success, error)

Kubernetes Resource Metrics¶

`terraform_outputs_configmap_operations_total`¶

Type: Counter Description: Total number of ConfigMap operations performed Labels: - namespace: Namespace of the TerraformOutputs resource - name: Name of the TerraformOutputs resource - operation: Kubernetes operation (create, update) - result: Result of the operation (success, error)

`terraform_outputs_secret_operations_total`¶

Type: Counter Description: Total number of Secret operations performed Labels: - namespace: Namespace of the TerraformOutputs resource - name: Name of the TerraformOutputs resource - operation: Kubernetes operation (create, update) - result: Result of the operation (success, error)

Example Queries¶

Basic Health Monitoring¶

# Reconciliation success rate
rate(terraform_outputs_reconcile_total{result="success"}[5m]) / rate(terraform_outputs_reconcile_total[5m])

# Average reconciliation duration
rate(terraform_outputs_reconcile_duration_seconds_sum[5m]) / rate(terraform_outputs_reconcile_duration_seconds_count[5m])

# Error rate
rate(terraform_outputs_reconcile_total{result="error"}[5m])

Backend Performance¶

# S3 request success rate
rate(terraform_outputs_s3_requests_total{result="success"}[5m]) / rate(terraform_outputs_s3_requests_total[5m])

# Backend fetch duration 95th percentile
histogram_quantile(0.95, rate(terraform_outputs_backend_fetch_duration_seconds_bucket[5m]))

# Failed backend fetches
rate(terraform_outputs_backend_fetch_total{result="error"}[5m])

Resource Operations¶

# ConfigMap create/update operations
rate(terraform_outputs_configmap_operations_total[5m])

# Secret operation errors
rate(terraform_outputs_secret_operations_total{result="error"}[5m])

Output Tracking¶

# Total outputs managed per resource
terraform_outputs_found_total

# Percentage of sensitive outputs
terraform_outputs_sensitive_total / terraform_outputs_found_total * 100

# Time since last successful sync
time() - terraform_outputs_last_sync_timestamp

Alerting Rules¶

Here are some example alerting rules for monitoring the operator:

groups:
- name: tfout
  rules:
  - alert: TerraformOutputsReconcileFailure
    expr: rate(terraform_outputs_reconcile_total{result="error"}[5m]) > 0
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "TerraformOutputs reconciliation failures detected"
      description: "{{ $labels.namespace }}/{{ $labels.name }} has failed reconciliations"

  - alert: TerraformOutputsS3Errors
    expr: rate(terraform_outputs_s3_requests_total{result="error"}[5m]) > 0
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "S3 request failures for TerraformOutputs"
      description: "S3 requests failing for {{ $labels.namespace }}/{{ $labels.name }}"

  - alert: TerraformOutputsStaleSync
    expr: time() - terraform_outputs_last_sync_timestamp > 3600
    for: 0m
    labels:
      severity: warning
    annotations:
      summary: "TerraformOutputs sync is stale"
      description: "{{ $labels.namespace }}/{{ $labels.name }} hasn't synced for over 1 hour"

  - alert: TerraformOutputsSlowReconcile
    expr: histogram_quantile(0.95, rate(terraform_outputs_reconcile_duration_seconds_bucket[5m])) > 30
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "TerraformOutputs reconciliation is slow"
      description: "95th percentile reconciliation time is over 30 seconds"

Grafana Dashboard¶

You can use these metrics to create comprehensive Grafana dashboards. Key panels to include:

Overview: Success rate, error rate, reconciliation count
Performance: Reconciliation duration, backend fetch duration
Outputs: Total outputs, sensitive outputs ratio
Backend Health: S3 request success rate, error breakdown
Resource Operations: ConfigMap/Secret create/update rates

Monitoring Best Practices¶

Set up alerts for reconciliation failures and S3 errors
Monitor sync freshness using terraform_outputs_last_sync_timestamp
Track performance trends with duration histograms
Monitor resource creation to ensure ConfigMaps and Secrets are being created
Watch for backend-specific issues using backend_type labels