Troubleshooting¶
This page provides solutions to common issues you might encounter when using TFOut.
Diagnostic Commands¶
Before diving into specific issues, gather diagnostic information with these commands:
# Check TerraformOutputs resources
kubectl get terraformoutputs --all-namespaces
# Get detailed status
kubectl describe terraformoutputs <name> -n <namespace>
# Check operator logs
kubectl logs -n tfout deployment/tfout-controller-manager --tail=100
# Check events
kubectl get events --field-selector involvedObject.kind=TerraformOutputs -n <namespace>
# Verify created resources
kubectl get configmaps,secrets -l managed-by=tfout -n <namespace>
Common Issues¶
1. TerraformOutputs Resource Not Syncing¶
Symptoms¶
- syncStatus shows Failed or InProgress
- lastSyncTime is not updating
- No ConfigMap/Secret created
Diagnosis¶
kubectl describe terraformoutputs <name> -n <namespace>
kubectl logs -n tfout deployment/tfout-controller-manager --tail=50
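To read only the two fields named in the symptoms, a small helper can wrap `kubectl get` with a JSONPath template. This is a minimal sketch: the function name `tfout_sync_status` is made up for illustration, and it assumes both fields live under `.status` in the custom resource, which this page does not confirm.

```shell
# Print "<syncStatus> <lastSyncTime>" for one TerraformOutputs resource.
# Assumption: the fields are .status.syncStatus and .status.lastSyncTime.
# $1: resource name, $2: namespace.
tfout_sync_status() {
  kubectl get terraformoutputs "$1" -n "$2" \
    -o jsonpath='{.status.syncStatus}{" "}{.status.lastSyncTime}{"\n"}'
}
```

Usage: `tfout_sync_status <name> <namespace>`. If the output is empty, check the real field paths with `kubectl get terraformoutputs <name> -n <namespace> -o yaml`.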
Common Causes & Solutions¶
Backend Access Issues
# Test S3 access manually
aws s3 ls s3://your-bucket/your-key
# Check IAM permissions
aws sts get-caller-identity
Invalid State File
# Verify state file exists and is valid JSON
aws s3 cp s3://your-bucket/your-key - | jq .
# Check for outputs section
aws s3 cp s3://your-bucket/your-key - | jq .outputs
Network Connectivity
# Test from within the cluster (the image's entrypoint is already `aws`, so pass only the subcommand)
kubectl run debug-pod --image=amazon/aws-cli --rm -it --restart=Never -- s3 ls s3://your-bucket/
2. Permission Denied Errors¶
Symptoms¶
- Error messages containing "Access Denied" or "403 Forbidden"
- Logs show authentication/authorization failures
Solutions¶
For EKS with IRSA
# Verify service account annotation
kubectl get serviceaccount tfout-controller-manager -n tfout -o yaml
# Check IAM role trust policy
aws iam get-role --role-name your-tfout-role
For IAM Role Assumption
# Test role assumption
aws sts assume-role \
--role-arn arn:aws:iam::123456789012:role/your-role \
--role-session-name test-session
Required IAM Policy
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject"
],
"Resource": "arn:aws:s3:::your-bucket/*"
},
{
"Effect": "Allow",
"Action": [
"s3:ListBucket"
],
"Resource": "arn:aws:s3:::your-bucket"
}
]
}
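One way to check whether a role is actually granted these permissions is the IAM policy simulator. The sketch below wraps `aws iam simulate-principal-policy`; the function name `simulate_tfout_policy` and both arguments are placeholders, the caller needs permission to run the simulator, and the result reflects identity-based policies only unless bucket policies are passed in separately.

```shell
# Simulate the object-level and bucket-level access from the policy above.
# $1: ARN of the role used by the operator (placeholder)
# $2: bucket name (placeholder)
simulate_tfout_policy() {
  role_arn="$1"
  bucket="$2"
  # Object-level action, evaluated against objects in the bucket.
  aws iam simulate-principal-policy \
    --policy-source-arn "$role_arn" \
    --action-names s3:GetObject \
    --resource-arns "arn:aws:s3:::${bucket}/*" \
    --query 'EvaluationResults[].[EvalActionName,EvalDecision]' \
    --output text
  # Bucket-level action, evaluated against the bucket itself.
  aws iam simulate-principal-policy \
    --policy-source-arn "$role_arn" \
    --action-names s3:ListBucket \
    --resource-arns "arn:aws:s3:::${bucket}" \
    --query 'EvaluationResults[].[EvalActionName,EvalDecision]' \
    --output text
}
```

Each line of output pairs an action with a decision of `allowed`, `implicitDeny`, or `explicitDeny`.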
3. ConfigMap/Secret Not Created¶
Symptoms¶
- TerraformOutputs shows Success status
- No ConfigMap or Secret appears in target namespace
Diagnosis¶
# Check if outputs exist in state
kubectl logs -n tfout deployment/tfout-controller-manager | grep "outputs found"
# Verify target namespace exists
kubectl get namespace <target-namespace>
# Check RBAC permissions
kubectl auth can-i create configmaps --as=system:serviceaccount:tfout:tfout-controller-manager -n <target-namespace>
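The single `can-i` check above can be extended to cover both resource types and several verbs. This is a sketch under assumptions: `check_tfout_rbac` is a hypothetical helper name, and the verb list (`get`, `create`, `update`, `delete`) is a guess at what the controller needs rather than something taken from the TFOut manifests.

```shell
# Check the operator's permissions on ConfigMaps and Secrets in one namespace.
# Prints one line per check and returns non-zero if any check is not "yes".
# $1: target namespace. The verb list is an assumption.
check_tfout_rbac() {
  target_ns="$1"
  sa="system:serviceaccount:tfout:tfout-controller-manager"
  denied=0
  for resource in configmaps secrets; do
    for verb in get create update delete; do
      # can-i prints "yes" or "no"; ignore its exit code and inspect the text.
      answer="$(kubectl auth can-i "$verb" "$resource" --as="$sa" -n "$target_ns" 2>/dev/null)" || true
      [ "$answer" = "yes" ] || denied=1
      printf '%s %s: %s\n' "$verb" "$resource" "${answer:-no}"
    done
  done
  return "$denied"
}
```

Usage: `check_tfout_rbac <target-namespace>`. Any line ending in `no` points at a missing rule in the ClusterRole or a missing binding.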
Solutions¶
No Outputs in State
RBAC Issues
# Check ClusterRole
kubectl get clusterrole tfout-manager-role -o yaml
# Verify ClusterRoleBinding
kubectl get clusterrolebinding tfout-manager-rolebinding -o yaml
4. Operator Pod Not Starting¶
Symptoms¶
- TFOut pods in Pending, CrashLoopBackOff, or Error state
- No operator logs available
Diagnosis¶
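A first diagnosis pass for a pod that will not start usually needs only standard kubectl subcommands. The sketch below is one possible sequence, not TFOut-specific tooling: `diagnose_tfout_pod` is a hypothetical helper name, and the `tfout` namespace is taken from the commands elsewhere on this page.

```shell
# Gather first-look data for an operator pod that fails to start.
# $1: pod name in the tfout namespace.
diagnose_tfout_pod() {
  pod="$1"
  # Current phase and restart count of all pods in the namespace.
  kubectl get pods -n tfout -o wide
  # Scheduling, image pull and probe failures appear in the Events section.
  kubectl describe pod "$pod" -n tfout
  # Logs of the previous container instance, useful for CrashLoopBackOff.
  # This fails harmlessly when the container has never started.
  kubectl logs "$pod" -n tfout --previous --tail=50 || true
}
```

Usage: `diagnose_tfout_pod <pod-name>`.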
Common Causes & Solutions¶
Image Pull Issues
# Check image pull secrets
kubectl get pods <pod-name> -n tfout -o yaml | grep -A5 imagePullSecrets
# Verify image exists
docker pull ghcr.io/swibrow/tfout:latest
Resource Constraints
# Increase resource limits
resources:
limits:
cpu: 500m
memory: 512Mi
requests:
cpu: 100m
memory: 128Mi
Security Context Issues
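# Check security policies; run whichever applies to the cluster
kubectl get psp                          # PodSecurityPolicy, removed in Kubernetes 1.25
kubectl get securitycontextconstraints   # OpenShift only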
# Try relaxed security context for debugging
securityContext: {}
podSecurityContext: {}
5. Slow Sync Performance¶
Symptoms¶
- Long delays between Terraform changes and Kubernetes updates
- High CPU/memory usage
- Timeout errors
Solutions¶
Optimize Sync Interval
Reduce State File Size
# Check state file size (ContentLength is in bytes)
aws s3api head-object --bucket your-bucket --key your-key --query ContentLength
# Split large state files or use remote state data sources
Increase Resources
6. Metrics Not Available¶
Symptoms¶
- Prometheus metrics endpoint returns 404 or connection refused
- No metrics in monitoring dashboards
Diagnosis¶
# Check metrics endpoint (port-forward blocks, so run curl from a second terminal)
kubectl port-forward -n tfout deployment/tfout-controller-manager 8080:8080
curl http://localhost:8080/metrics
# Check service
kubectl get service -n tfout
Solutions¶
Metrics Disabled
# Enable metrics in Helm values
metrics:
enabled: true
port: 8080
controller:
metricsBindAddress: ":8080" # Not "0"
Network Policy Issues
# Check network policies
kubectl get networkpolicy -n tfout
# Test connectivity
kubectl run test-pod --image=curlimages/curl --rm -it -- \
curl http://tfout-metrics.tfout.svc.cluster.local:8080/metrics
Advanced Debugging¶
Enable Debug Logging¶
# Helm values for detailed logging
controller:
development: true
logLevel: debug
env:
- name: GOMAXPROCS
value: "1"
- name: GOMEMLIMIT
value: "100MiB"
Debug Pod Access¶
Create a debug pod to test connectivity:
apiVersion: v1
kind: Pod
metadata:
name: debug-pod
namespace: tfout
spec:
serviceAccountName: tfout-controller-manager
containers:
- name: debug
image: amazon/aws-cli
command: ["/bin/sh"]
args: ["-c", "sleep 3600"]
env:
- name: AWS_REGION
value: us-west-2
Then exec into it:
kubectl exec -it debug-pod -n tfout -- /bin/sh
# Test AWS access
aws sts get-caller-identity
aws s3 ls s3://your-bucket/
# Test network connectivity
curl -v https://s3.amazonaws.com
Operator Profiling¶
Enable profiling for performance issues:
Access profiling data:
kubectl port-forward -n tfout deployment/tfout-controller-manager 6060:6060
go tool pprof http://localhost:6060/debug/pprof/profile
State File Validation¶
Verify State Format¶
Valid Terraform state should have this structure:
{
"version": 4,
"terraform_version": "1.0.0",
"outputs": {
"example_output": {
"value": "example_value",
"type": "string",
"sensitive": false
}
}
}
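The structure above can be checked mechanically with `jq`. This is a minimal sketch, assuming a local copy of the state file and a version-4 state as shown; `validate_state_file` is a hypothetical helper name, not part of TFOut.

```shell
# Validate a local Terraform state file against the structure shown above.
# Requires jq. Prints a one-line diagnosis and returns 1 on failure;
# on success, lists each output with its sensitive flag.
# $1: path to the state file.
validate_state_file() {
  state_file="$1"
  if ! jq -e 'type == "object"' "$state_file" >/dev/null 2>&1; then
    echo "not a valid JSON object"
    return 1
  fi
  if ! jq -e '.version == 4' "$state_file" >/dev/null 2>&1; then
    echo "unexpected state version"
    return 1
  fi
  if ! jq -e '(.outputs | type) == "object" and (.outputs | length) > 0' \
      "$state_file" >/dev/null 2>&1; then
    echo "no outputs defined"
    return 1
  fi
  jq -r '.outputs | to_entries[] | "\(.key) sensitive=\(.value.sensitive // false)"' \
    "$state_file"
}
```

For a state stored in S3, download it first, for example with `aws s3 cp s3://your-bucket/your-key terraform.tfstate`, then run `validate_state_file terraform.tfstate`.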
Common State Issues¶
Missing Outputs Section
Corrupted State
# Validate JSON format
jq . terraform.tfstate > /dev/null && echo "Valid JSON" || echo "Invalid JSON"
State Lock Issues
# Check for state locks (if using DynamoDB)
aws dynamodb get-item --table-name terraform-locks --key '{"LockID":{"S":"bucket/key"}}'
Resource Management¶
Clean Up Orphaned Resources¶
# Find ConfigMaps/Secrets managed by TFOut
kubectl get configmaps,secrets -l managed-by=tfout --all-namespaces
# Clean up specific resources
kubectl delete configmap <name> -n <namespace>
kubectl delete secret <name> -n <namespace>
Force Resource Recreation¶
# Delete existing resources to force recreation
kubectl delete configmap <configmap-name> -n <namespace>
kubectl delete secret <secret-name> -n <namespace>
# Trigger resync (--overwrite is required once the annotation already exists)
kubectl annotate terraformoutputs <name> -n <namespace> force-sync="$(date +%s)" --overwrite
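The delete-then-resync steps can be combined with a polling loop that confirms the ConfigMap comes back. This is a sketch under assumptions: `force_recreate_configmap` is a hypothetical helper, the `force-sync` annotation is the one used above, and the 2-second poll interval and default of 30 attempts are arbitrary choices.

```shell
# Delete a managed ConfigMap, request a resync, then poll until it reappears.
# $1: TerraformOutputs name, $2: namespace, $3: ConfigMap name,
# $4: maximum number of attempts (default 30), 2 seconds apart.
force_recreate_configmap() {
  name="$1"
  namespace="$2"
  configmap="$3"
  attempts="${4:-30}"
  kubectl delete configmap "$configmap" -n "$namespace" --ignore-not-found
  # --overwrite allows the annotation to be set again on later runs.
  kubectl annotate terraformoutputs "$name" -n "$namespace" \
    force-sync="$(date +%s)" --overwrite
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if kubectl get configmap "$configmap" -n "$namespace" >/dev/null 2>&1; then
      echo "recreated"
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  echo "timed out"
  return 1
}
```

Usage: `force_recreate_configmap <name> <namespace> <configmap-name>`. A timeout suggests the operator is not reconciling, so the operator logs are the next place to look.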
Performance Tuning¶
Memory Optimization¶
resources:
limits:
memory: 256Mi # Adjust based on state file size
requests:
memory: 128Mi
env:
- name: GOMEMLIMIT
value: "200MiB" # Go soft memory limit; Go accepts B, KiB, MiB, GiB, TiB suffixes
CPU Optimization¶
resources:
limits:
cpu: 500m
requests:
cpu: 100m
env:
- name: GOMAXPROCS
value: "2" # Cap the number of OS threads running Go code at once
Sync Optimization¶
spec:
# Adjust based on requirements
syncInterval: 15m # Longer intervals for stable environments
# Use multiple smaller TerraformOutputs instead of one large one
Getting Help¶
Collect Debug Information¶
When reporting issues, include:
# Cluster information
kubectl version
kubectl get nodes
# TFOut deployment info
helm list -n tfout
kubectl get pods -n tfout -o wide
# Resource status
kubectl get terraformoutputs --all-namespaces -o yaml > terraformoutputs.yaml
# Operator logs
kubectl logs -n tfout deployment/tfout-controller-manager --tail=200 > operator.log
# Events
kubectl get events --all-namespaces --sort-by='.lastTimestamp' -o yaml > events.yaml
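The same commands can be wrapped in one function that writes everything into a directory and archives it for attaching to an issue. This is a sketch, not an official TFOut tool: `collect_tfout_debug`, the helper `_tfout_capture`, and the file names are all made up for illustration.

```shell
# Run a command and store stdout and stderr in a file, never aborting.
# $1: output file, remaining arguments: the command to run.
_tfout_capture() {
  capture_file="$1"
  shift
  "$@" > "$capture_file" 2>&1 || true
}

# Collect the debug information listed above into a directory and archive it.
# $1: output directory (default: tfout-debug).
collect_tfout_debug() {
  out_dir="${1:-tfout-debug}"
  mkdir -p "$out_dir"
  _tfout_capture "$out_dir/version.txt" kubectl version
  _tfout_capture "$out_dir/nodes.txt" kubectl get nodes
  _tfout_capture "$out_dir/helm.txt" helm list -n tfout
  _tfout_capture "$out_dir/pods.txt" kubectl get pods -n tfout -o wide
  _tfout_capture "$out_dir/terraformoutputs.yaml" \
    kubectl get terraformoutputs --all-namespaces -o yaml
  _tfout_capture "$out_dir/operator.log" \
    kubectl logs -n tfout deployment/tfout-controller-manager --tail=200
  _tfout_capture "$out_dir/events.txt" \
    kubectl get events --all-namespaces --sort-by='.lastTimestamp'
  tar -czf "$out_dir.tar.gz" "$out_dir"
}
```

Usage: `collect_tfout_debug` produces `tfout-debug.tar.gz` in the current directory. Commands that fail still leave their error text in the corresponding file.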
Community Resources¶
Support Levels¶
- Community Support: GitHub issues and discussions
- Bug Reports: Include debug information and reproduction steps
- Feature Requests: Describe use case and expected behavior
Prevention¶
Best Practices¶
- Test in Development First: Always test configuration changes in a non-production environment
- Monitor Sync Status: Set up alerts for failed syncs
- Use Appropriate Intervals: Avoid very short sync intervals in production; a longer syncInterval such as 15m suits stable environments
- Implement RBAC: Use least-privilege access
- Version Control: Keep TerraformOutputs resources in Git
- Document Dependencies: Clearly document which applications depend on which outputs