Introduction
Troubleshooting Kubernetes issues often requires switching between logs, events, and documentation. K8sGPT simplifies this by using AI to analyze cluster problems and generate human-readable explanations and solutions.
In this summary, I’ll walk through how I used K8sGPT with Amazon Bedrock to diagnose issues inside an EKS cluster and how it improved the entire troubleshooting experience.
What is K8sGPT?
K8sGPT is an open-source CLI and operator that scans Kubernetes resources and uses AI models to:
- Detect misconfigurations
- Explain errors in simple language
- Provide actionable fixes
- Recommend best practices
Why Integrate with Amazon Bedrock?
Amazon Bedrock enhances K8sGPT by offering:
- Multiple LLM choices (Nova, Claude, Llama)
- Secure, IAM-based authentication (no separate LLM API keys to manage)
- Low-latency inference from AWS regions
- Cost savings using lightweight models like Nova Lite
Setup Overview
For the demo, I used:
- EKS cluster
- Broken and misconfigured workloads
- K8sGPT with Amazon Bedrock Nova Lite model
1. Create the EKS Cluster
I used a simple eksctl configuration to provision a small EKS cluster.
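A minimal eksctl configuration for a small cluster might look like the sketch below. The cluster name, node group name, instance type, and node count are illustrative assumptions, not the values used in the demo; the region matches the Bedrock region configured later.

```shell
# Write a minimal eksctl ClusterConfig (names and sizes are assumed, not from the demo)
cat > cluster.yaml <<'EOF'
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: k8sgpt-demo-cluster   # hypothetical cluster name
  region: us-east-1
managedNodeGroups:
  - name: demo-nodes          # hypothetical node group name
    instanceType: t3.medium   # assumed instance size
    desiredCapacity: 2
EOF

# Provision the cluster from the file (takes several minutes, needs AWS credentials):
#   eksctl create cluster -f cluster.yaml
```

Keeping the configuration in a file rather than in command-line flags makes the demo cluster easy to recreate and tear down.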
2. Install K8sGPT
Download the K8sGPT CLI binary for your platform, extract it, and move it into a directory on your PATH such as /usr/local/bin/.
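As a rough sketch of the last step, the helper below copies an already-downloaded and extracted binary into a target directory. The function name and its arguments are hypothetical; it assumes the release archive has been unpacked so that the binary exists locally.

```shell
# Copy an already-extracted k8sgpt binary into a directory on PATH.
# $1 = path to the extracted binary, $2 = target directory (default /usr/local/bin)
install_k8sgpt_binary() {
  src="$1"
  dest="${2:-/usr/local/bin}"
  # Refuse to continue if the extracted binary is missing
  [ -f "$src" ] || { echo "binary not found: $src" >&2; return 1; }
  # install copies the file and sets executable permissions in one step
  install -m 0755 "$src" "$dest/k8sgpt"
}

# Example (the target directory must be writable by the current user):
#   install_k8sgpt_binary ./k8sgpt "$HOME/.local/bin" && k8sgpt version
```

Running `k8sgpt version` afterwards is a quick way to confirm that the binary is on the PATH.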
3. Configure K8sGPT to Use Bedrock
K8sGPT supports Bedrock natively.
You configure it by adding:
- Backend provider → amazonbedrock
- Bedrock region → us-east-1
- Model → amazon.nova-lite-v1:0
Then set Bedrock as the default provider.
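The configuration above can be sketched as two CLI calls wrapped in a small helper. The function name is hypothetical, and the flag names are an assumption based on the K8sGPT CLI help and may differ between versions; confirm them with `k8sgpt auth add --help` before relying on this.

```shell
# Register Amazon Bedrock as a K8sGPT backend, then make it the default provider.
# Flag names are assumed from the CLI help; verify with `k8sgpt auth add --help`.
configure_bedrock_backend() {
  k8sgpt auth add \
    --backend amazonbedrock \
    --model amazon.nova-lite-v1:0 \
    --providerRegion us-east-1 &&
  k8sgpt auth default --provider amazonbedrock
}

# Usage, once AWS credentials are available in the environment:
#   configure_bedrock_backend
```

Afterwards, `k8sgpt auth list` should show amazonbedrock as the default backend.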
4. Create Problematic Kubernetes Resources
To test K8sGPT, I intentionally deployed workloads with issues:
Examples of injected failures
- Broken Image Pod → invalid image registry
- Resource Heavy Pod → impossible CPU/memory requests
- StatefulSet with Wrong StorageClass → PVC creation failure
These represent issues commonly faced in real Kubernetes environments.
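Manifests along the following lines reproduce the three failures. This is a sketch, not the exact files from the demo: the resource names, the registry host, and the request sizes are hypothetical, and only the k8sgpt-demo namespace comes from the commands used later in this post.

```shell
# Write three intentionally broken manifests into one file
# (resource names, image, and sizes are hypothetical)
cat > broken-workloads.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: broken-image-pod
  namespace: k8sgpt-demo
spec:
  containers:
    - name: app
      image: invalid-registry.example.com/app:1.0   # registry does not exist
---
apiVersion: v1
kind: Pod
metadata:
  name: resource-heavy-pod
  namespace: k8sgpt-demo
spec:
  containers:
    - name: app
      image: nginx:1.27
      resources:
        requests:
          cpu: "100"        # 100 cores, far beyond a small node
          memory: 500Gi
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: broken-storage
  namespace: k8sgpt-demo
spec:
  serviceName: broken-storage
  replicas: 1
  selector:
    matchLabels:
      app: broken-storage
  template:
    metadata:
      labels:
        app: broken-storage
    spec:
      containers:
        - name: app
          image: nginx:1.27
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: does-not-exist   # no such StorageClass
        resources:
          requests:
            storage: 1Gi
EOF

# Deploy with:
#   kubectl create namespace k8sgpt-demo
#   kubectl apply -f broken-workloads.yaml
```

Each manifest is valid YAML that the API server accepts, so the failures only surface at runtime, which is the kind of problem K8sGPT is meant to explain.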
5. Run K8sGPT Analysis
Once the workloads are deployed, run:
k8sgpt analyze
k8sgpt analyze --filter Pod --namespace k8sgpt-demo --explain | head -20
K8sGPT quickly identifies issues and generates fixes using AI.
Example Output (Summarized)
- Image Pull Error
  Issue: Back-off pulling image
  Fix: Correct the registry/image tag
- Insufficient Resources
  Issue: Pods cannot schedule
  Fix: Adjust CPU/memory or scale nodegroup
- Invalid StorageClass
  Issue: PVCs stuck in Pending
  Fix: Update storage class or create a valid one
These explanations are far more readable than raw Kubernetes error messages.
Essential Commands
1. Basic Analysis (No AI)
k8sgpt analyze
2. AI-Powered Analysis (Uses Bedrock)
k8sgpt analyze --explain
3. Specific Namespace
k8sgpt analyze --explain --namespace k8sgpt-demo
4. Specific Resource Type
k8sgpt analyze --explain --filter Pod
k8sgpt analyze --explain --filter Deployment
5. Multiple Filters
k8sgpt analyze --explain --filter Pod,Deployment,Service
6. Check Configuration
k8sgpt auth list
k8sgpt filters list
7. Verbose Output
k8sgpt analyze --explain --verbose
Benefits Observed
1. Intelligent Issue Detection
K8sGPT identified issues across pods, StatefulSets, services, and jobs.
2. Human-Friendly Explanations
It rewrites cryptic Kubernetes errors into simple sentences.
3. Actionable Remediation Steps
Includes:
- kubectl commands
- Configuration fixes
- Architecture recommendations
4. Speed
It analyzed more than a dozen issues in seconds.
Amazon Bedrock Model Options
| Model | Speed | Quality | Cost | Best Use Case |
|---|---|---|---|---|
| Nova Lite | Fastest | Great | Cheapest | Day-to-day troubleshooting |
| Nova Pro | High | Excellent | Moderate | Complex multi-resource analysis |
| Claude 3 Haiku | Fast | High | Good | Detailed explanations |
Recommendation: Start with Nova Lite.
Fixing Issues (Summary)
Fix Image Pull Failure
Delete the broken pod and recreate it with a valid image.
Fix Misconfigured Deployment
Patch the Deployment or update the Helm chart values.
Fix Storage Issues
Correct the StorageClass or create a valid one.
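The fixes above could be applied with kubectl roughly as follows. This is a sketch under assumptions: the function names, the pod name, the replacement image, and the request values are all hypothetical, and the namespace is the k8sgpt-demo namespace used earlier.

```shell
NS=k8sgpt-demo

# Image pull failure: delete the broken pod and recreate it with a valid image
fix_image_pull() {
  kubectl delete pod broken-image-pod -n "$NS" &&
  kubectl run broken-image-pod --image=nginx:1.27 -n "$NS"
}

# Misconfigured Deployment: lower the requests to something the nodes can satisfy
# $1 = Deployment name
fix_deployment_resources() {
  kubectl set resources deployment "$1" -n "$NS" \
    --requests=cpu=100m,memory=128Mi
}

# Storage issue: list the StorageClasses that actually exist in the cluster
list_storage_classes() {
  kubectl get storageclass
}
```

Note that a StatefulSet's volumeClaimTemplates cannot be edited in place, so correcting the StorageClass means deleting the StatefulSet and its Pending PVC, then recreating them with a class from the list.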
Production Considerations
Security
- Use IAM Roles
- Avoid embedding credentials
- Enable CloudTrail auditing
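For the IAM role, a least-privilege policy might look like the sketch below, which allows invoking only the Nova Lite model in us-east-1. The file name and policy name are hypothetical, and depending on the K8sGPT version additional Bedrock actions may be required, so treat this as a starting point.

```shell
# Least-privilege policy sketch: allow invoking only the Nova Lite model in us-east-1
cat > k8sgpt-bedrock-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["bedrock:InvokeModel"],
      "Resource": "arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-lite-v1:0"
    }
  ]
}
EOF

# Create the policy, then attach it to the role K8sGPT runs under
# (policy name is hypothetical):
#   aws iam create-policy --policy-name K8sGPTBedrockInvoke \
#     --policy-document file://k8sgpt-bedrock-policy.json
```

Scoping the Resource to a single model ARN also helps with cost control, since the role cannot call larger models by accident.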
Cost
- Monitor Bedrock usage
- Use model tiering based on workload
Integrations
- Prometheus + Grafana
- Alerting systems
- CI/CD pre-deployment checks
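One way to use K8sGPT in a pipeline is a gate step that analyzes a staging namespace and fails the job when problems are found. The sketch below is hypothetical: the function name is made up, and it assumes the JSON report carries a top-level "status" field set to "ProblemDetected" when issues exist, which should be checked against the output of your K8sGPT version.

```shell
# Fail a pipeline step when K8sGPT reports problems in a namespace.
# Assumption: the JSON report has a top-level "status" field that equals
# "ProblemDetected" when issues exist; verify against your K8sGPT version.
# $1 = namespace to analyze
k8sgpt_gate() {
  report=$(k8sgpt analyze --namespace "$1" --output json) || return 2
  if printf '%s' "$report" |
    grep -Eq '"status"[[:space:]]*:[[:space:]]*"ProblemDetected"'; then
    echo "K8sGPT found problems in namespace $1" >&2
    return 1
  fi
  echo "No problems detected in namespace $1"
}

# Usage in a CI job:
#   k8sgpt_gate staging || exit 1
```

The gate runs without `--explain`, so it does not call Bedrock and adds no inference cost to every pipeline run.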
In-Cluster Operator Deployment
You can deploy the K8sGPT operator using Helm and set Bedrock as the backend for ongoing cluster analysis.
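A rough sketch of that setup follows. The Helm repository, chart name, and custom resource fields are assumptions based on the operator's README and may change between releases; the function name, release name, and resource name are hypothetical, and the version value is a placeholder to replace.

```shell
# Install the operator chart (repository URL and chart name assumed from the operator README)
install_k8sgpt_operator() {
  helm repo add k8sgpt https://charts.k8sgpt.ai/ &&
  helm repo update &&
  helm install k8sgpt-operator k8sgpt/k8sgpt-operator \
    --namespace k8sgpt-operator-system --create-namespace
}

# K8sGPT custom resource pointing the operator at Bedrock
# (field names assumed; check the CRD installed by the chart)
cat > k8sgpt-bedrock.yaml <<'EOF'
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-bedrock
  namespace: k8sgpt-operator-system
spec:
  ai:
    enabled: true
    backend: amazonbedrock
    model: amazon.nova-lite-v1:0
    region: us-east-1
  noCache: false
  repository: ghcr.io/k8sgpt-ai/k8sgpt
  version: REPLACE_WITH_K8SGPT_VERSION   # placeholder: set to the release tag you want to run
EOF

# Usage:
#   install_k8sgpt_operator
#   kubectl apply -f k8sgpt-bedrock.yaml
```

The operator's service account still needs AWS permissions for Bedrock, for example through an IAM role bound to the service account.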
Before vs After K8sGPT
Before
- Manual debugging
- Slow MTTR
- High cognitive load
- Heavy dependency on senior engineers
After
- Fast AI-driven diagnostics
- Consistent solutions
- Junior engineers troubleshoot confidently
- Significantly reduced MTTR
Conclusion
K8sGPT combined with Amazon Bedrock is a powerful way to modernize Kubernetes troubleshooting. It minimizes time spent on debugging, improves team efficiency, and brings clarity to complex cluster issues.
If you're managing EKS environments at scale, this integration provides:
- Faster resolutions
- Clearer insights
- Better operational consistency
The future of Kubernetes troubleshooting is AI-driven, and tools like K8sGPT make that future easy to adopt.
Quick Start Checklist
- Configure AWS credentials
- Install K8sGPT
- Add Amazon Bedrock as AI backend
- Deploy workloads
- Run k8sgpt analyze
- Apply the recommended fixes
Happy Learning
Prithiviraj Rengarajan
DevOps Engineer



