One of the most interesting tech events I've attended this year was AWS Community Day Bangalore, which took place at the Conrad Hotel on May 23, 2025. Cloud enthusiasts, developers, architects, and AWS specialists from all around Bangalore gathered to share information, brainstorm, and go deeply into actual AWS deployments, and the vibe in the room was amazing.
"Failure is Inevitable — Be Ready: Chaos Testing AWS EKS with AWS FIS" was one talk that particularly caught my attention. I want to share what I learnt from this session with all of you because it completely changed the way I think about creating robust microservices.
Why Chaos Engineering Matters
The inspiring opening statement of the session was, "Failure is Inevitable." Things will break in the era of microservices and complicated distributed systems of today. Using a slide that depicted the reality of microservices. A complex network of interconnected services that seemed more like chaos than order the presenters expertly demonstrated this.
In traditional systems, features are created and then everything is left to chance. A new attitude is required when using AWS EKS (Elastic Kubernetes Service) to execute microservices. You must demonstrate your system's durability through controlled experiments; you cannot simply presume it.
Understanding Chaos Engineering
What is Chaos Engineering, then? According to the presenters, it is the process of conducting "controlled" experiments on a system in order to increase confidence in its resilience. Consider it this way: you purposefully introduce errors during business hours, when your team is prepared to watch and react, rather than waiting for your production system to break at three in the morning.
The core principles of Chaos Engineering include :
- Simulate Real World Scenarios: Replicate actual failure conditions
- Observe Impact: Monitor how your system responds
- Build Recovery Mechanisms: Design automatic healing processes
- Validate System Assumptions: Challenge your architectural beliefs
- Target Both Infrastructure & Application: Don't just test one layer
"The Need for a New Mindset" was one of the slides that truly spoke to me. The presenters highlighted five important changes in perspective:
Don't assume resilience, prove it — Don't trust what you haven't tested
Design for failures, not just functionality — Build with failure scenarios in mind
Shift from confidence to curiosity — Question your assumptions constantly
Break things before they break you — Proactive is better than reactive
Test both services and infrastructure — Holistic testing is essential
Creating Resilient Microservices
The presenters displayed an excellent diagram that outlined the process of developing resilient services. It is not enough to simply launch a new service and then forget about it. Rather, it must be built with resilience in mind from the beginning, with appropriate fallbacks and links to pre-existing services that can gracefully deal with failures.
When using AWS EKS to orchestrate numerous containerized microservices, this architecture thinking is essential. Using the AWS REST API Gateway, the presentation demonstrated a microservices architecture based on EKS. Included in the configuration were:
- AWS API Gateway as the entry point
- Network Load Balancer (NLB) and Application Load Balancer (ALB) for traffic distribution
- EKS clusters spread across multiple availability zones
- Multiple EKS nodes running different microservices (Service A, B, C)
- Databases connected to each service
You need this multi-layered, redundant design, but how can you be sure it can withstand stress? In this situation, AWS Fault Injection Service (FIS) is useful.
Enter AWS Fault Injection Service (FIS)
You may conduct controlled chaos experiments on your AWS infrastructure with AWS FIS, a fully managed solution. The presenters divided the main ideas of FIS into four parts:
Actions: The faults you inject (e.g., stop EC2 instances, failover databases)
Targets: The resources you affect (e.g., EC2 instances, ECS tasks)
Stop Conditions: Safety mechanisms to halt the experiment if things go wrong
Experiment Template: The blueprint of what to do, your chaos playbook
What impressed me most was the extensive list of AWS services FIS supports :
- Amazon CloudWatch
- Amazon DynamoDB
- Amazon EBS
- Amazon EC2
- Amazon ECS
- Amazon EKS (the focus of this talk)
- Amazon ElastiCache
- Amazon RDS
- Amazon S3
- Amazon Systems Manager
- Amazon VPC
With this degree of support for all AWS services, you can fully replicate real-world failures.
The Live Demo
The live demo walkthrough, where they used AWS FIS to illustrate EKS Pod Termination, was the session's high point. The FIS console displayed an experiment template intended to remove pods from an EKS cluster, as I could see on the screen. The setup for the experiment included:
- Experiment Template ID: Clearly defined for tracking
- S3 bucket log destination: For storing experiment logs
- Actions: Specifically targeting EKS pod deletion -** Stop conditions:** Safety nets to prevent runaway failures
- IAM Role: Proper permissions for FIS to execute the experiment
It was enlightening to watch them start the experiment and observe the system's reaction in real time.
How AWS FIS Helped Them
Following the demonstration, the presenters discussed the observable advantages of using chaotic engineering with AWS FIS:
- Proactive Risk Mitigation: Identifying weaknesses before they become incidents
- Improved Service Availability: Higher uptime due to tested resilience
- Data-Driven Reliability Improvements: Using experiment data to make informed architectural decisions
- Faster Incident Recovery: Teams already know how to respond because they've practiced
These benefits go beyond theory; they are quantifiable results that have an immediate effect on customer satisfaction and business continuity.
Advanced Capabilities
The session also covered some advanced FIS capabilities :
- Multi-Account Experiment Support: Run chaos experiments across multiple AWS accounts
- Custom Fault Injection: Create your own custom failure scenarios
- Scheduled Experiments: Automate chaos testing at regular intervals
- Advanced Monitoring using Event Bridge: Get real-time insights into experiment execution
- Safety Levers: Additional guardrails to ensure experiments don't cause unintended damage
These features make AWS FIS enterprise-ready and scalable for large organizations managing complex cloud infrastructures.
Lessons Learned
The presenters concluded by sharing priceless insights from their experience with chaotic engineering:
- Start Small, Scale Gradually: Don't try to chaos-test everything at once
- Hypotheses Are Critical: Always form a hypothesis before running an experiment
- Observability Is Non-Negotiable: You can't understand what you can't see
- Controlled Chaos Builds Confidence: Regular testing removes fear
- Operational Guardrails is must: Always have kill switches and rollback plans
Because they are based on actual experience, these lessons struck resonated deeply with me.
Call to Action
The presenters wrapped off with a useful foundation for a call to action:
Before You Begin
- Review system architecture
- Define steady state
- Form hypotheses
Planning the Experiment
- Start in test environments
- Consult team insights
- Review outage history
- Target high-impact systems
During the Experiment
- Use stop conditions
- Track metrics
- Iterate safely
Even if you're just starting out, chaotic engineering is approachable thanks to our methodical methodology.
Conclusion
This talk on chaotic engineering really stood out during the amazing learning experience that was AWS Community Day Bangalore 2025. I learned from the seminar that while failure in distributed systems is unavoidable, readiness makes all the difference. We can test our systems proactively and gain genuine trust in our architectures with the help of AWS FIS. I strongly suggest experimenting with chaotic engineering using AWS FIS if you're using microservices on EKS.
References
Event: AWS Community Day Bangalore 2025
Topic: Chaos Testing AWS EKS with AWS FIS
Date: May 23, 2025
Location: Conrad Benguluru


Top comments (0)