Aisalkyn Aidarova

Interview Questions and Answers

1. “What were your day-to-day responsibilities in your last projects?”

Answer:

“I worked mainly in Kafka/Confluent platform engineering.
I designed and maintained Kafka clusters, set up Confluent Cloud environments, built topics, ACLs, schemas, connectors, monitored cluster health, and onboarded application teams.

I also designed event-driven architectures, integrated SQL and NoSQL systems, built CI/CD automation for Kafka resources, and supported offshore teams.”


2. “Did you use any tool to monitor the Kafka cluster?”

Answer:

“Yes. I used Confluent Control Center, Confluent Metrics API, Grafana dashboards, CloudWatch logs, and Prometheus exporters to track consumer lag, throughput, partition skew, and latency.”


3. “If a user complains about slowness or delays in messages, how do you diagnose it?”

Answer:

“I break it into three layers:
Producer → Broker → Consumer.

I check producer retries and network latency, check broker health and partition hotspots in Confluent metrics, and then check consumer lag, poll delays, and rebalancing.

Most delays come from slow consumers, network latency, or partition imbalance.”


4. “Can you elaborate on designing Kafka real-time pipelines with SQL and NoSQL?”

Answer:

“Yes. In my project we streamed data from microservices and Postgres (via JDBC Source Connector), processed it using ksqlDB streams and tables, and delivered enriched analytics into Couchbase (via Couchbase Sink Connector).

I handled schema governance, Avro versioning, partitioning strategies, connector deployment, and end-to-end reliability.”
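
As a hedged illustration of that pattern, here is a minimal ksqlDB sketch of the enrichment step; the stream, table, and topic names are illustrative rather than the project's actual ones:

CREATE STREAM orders_src (order_id VARCHAR, user_id VARCHAR, amount DOUBLE)
  WITH (KAFKA_TOPIC = 'orders', VALUE_FORMAT = 'AVRO');

CREATE TABLE users_tbl (user_id VARCHAR PRIMARY KEY, region VARCHAR)
  WITH (KAFKA_TOPIC = 'users', VALUE_FORMAT = 'AVRO');

-- Enriched stream written back to Kafka, ready for the Couchbase sink connector
CREATE STREAM orders_enriched WITH (KAFKA_TOPIC = 'orders_enriched') AS
  SELECT o.order_id, o.amount, u.region
  FROM orders_src o
  JOIN users_tbl u ON o.user_id = u.user_id
  EMIT CHANGES;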


5. “What challenges did you face when Couchbase was the final sink?”

Answer:

“Main challenges were:
– Lag between Kafka and Couchbase during traffic spikes
– Duplicate writes due to at-least-once delivery
– Schema/document evolution issues
– Index and query performance when data grew

We solved these with upserts, backward-compatible schemas, connector scaling, and index tuning.”


6. “Which one do you prefer: Couchbase or MongoDB? Why?”

Answer:

“For real-time streaming workloads, Couchbase performed better for us because it’s memory-first, has fast KV operations, and supports SQL-like N1QL queries.

MongoDB is great for general document storage, but for high-volume Kafka ingestion and low-latency analytics, Couchbase was a better match.”


7. “Did you use Confluent fully managed connectors or custom connectors?”

Answer:

“Wherever possible we used fully managed connectors because they auto-scale, self-heal, and require almost no operational overhead.

For systems without a managed connector, we used custom connectors on self-managed Kafka Connect clusters.

Downsides of custom: more ops work, manual scaling, harder monitoring.”


8. “What are the limitations of custom connectors in Confluent Cloud?”

Answer:

“Custom connectors require:
– running and managing your own Connect cluster
– manual scaling
– updating and patching the connector yourself
– more responsibility for monitoring, reliability, and security

And they can’t leverage Confluent Cloud’s auto-scaling or SLAs.”


9. “If you have 4 partitions and the consumer group increases from 2 to 6, what happens?”

Answer:

“Within a consumer group, Kafka assigns each partition to at most one consumer, so only 4 of the 6 consumers will be active. The other two stay idle.

To scale to 6 consumers, you must increase partitions to at least 6.”
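
For example, assuming a topic named orders, the partition count can be raised with the standard Kafka CLI (this only affects how new records are distributed; existing data is not reshuffled):

kafka-topics --bootstrap-server <broker:9092> --alter --topic orders --partitions 6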


10. “Have you seen cases where data volume spikes massively? How do you handle sudden scaling?”

Answer:

“Yes. When traffic increased 5–10x, we scaled partitions, added more consumer instances, tuned partitioning strategy to avoid hotspots, scaled connectors, and optimized downstream systems (Couchbase, RDS).

We also revisited retention/storage policies and sometimes separated workloads into multiple Kafka clusters.”


11. “Was your Confluent cluster using private networking or public connectivity?”

Answer:

“We used private networking only — either AWS PrivateLink or VPC Peering, depending on the environment.”


12. “How did you implement private connectivity with Confluent Cloud?”

Answer (AWS PrivateLink example):

“High level steps:

  1. Create a PrivateLink endpoint in our AWS VPC
  2. Confluent provides service IDs for Kafka, Schema Registry, Connect
  3. Approve endpoint from Confluent
  4. Update routes, DNS, and security groups
  5. Update client configs to use the new private bootstrap servers

After that all traffic stays inside the private network.”


13. “Is your environment Confluent Cloud or self-managed Kafka?”

Answer:

“Confluent Cloud — mainly because of managed connectors, RBAC, metrics, and private networking.”


14. “Have you worked with Apache Flink?”

Answer:

“I haven’t used Flink heavily in production yet. My main focus has been Confluent Kafka + ksqlDB for stream processing.

I understand the architecture and how Flink integrates with Kafka, but my hands-on experience is mostly with ksqlDB.”


15. “What happens if audit asks whether confidential data is flowing through Kafka?”

Answer:

“I inventory all topics/schemas, classify data via Schema Registry metadata, scan for PII patterns, enforce compatibility rules, apply topic-level ACLs, and produce a report showing exactly which topics carry sensitive fields and how they’re secured.”


16. “Have you participated in networking design between applications and Confluent?”

Answer:

“Yes — I worked closely with network/security teams to design VPC Peering or PrivateLink setups, define inbound/outbound rules, configure DNS overrides, and validate end-to-end connectivity for producers, consumers, and connectors.”


17. “Do you work more on the application side or admin side?”

Answer:

“More on the admin and architecture side — designing clusters, managing schemas, partitions, ACLs, connectors, networking, and platform governance.

I also support application teams but my primary role is platform engineering.”

51. Describe your Terraform experience at a high level.

I’ve used Terraform extensively to build multi-account AWS environments, including VPC networking, IAM, S3, EC2, EKS clusters, Config rules, and SCP deployments.
I design infrastructure using modular architecture, use remote state in S3 with DynamoDB locking, and integrate Terraform with CI/CD pipelines and OPA checks to enforce compliance.


52. How do you structure Terraform modules?

I structure modules around logical components:

modules/
  vpc/
  subnets/
  iam/
  ec2/
  s3/
  eks/
  config/
  scp/
environments/
  dev/
  stage/
  prod/

This separation makes the code reusable, scalable, and easier to manage across multiple accounts.


53. How do you deploy SCPs with Terraform?

I deploy SCPs from the management account, but the pipeline usually runs in a workload account.
So I use a cross-account IAM role and Terraform’s assume_role:

provider "aws" {
  assume_role {
    role_arn = var.management_role_arn
  }
}

This lets Terraform manage AWS Organizations policies securely.


54. What is least privilege in Terraform IAM policies?

Least privilege means giving a role only the exact actions required to perform a task.
Example for SCP deployment:

  • organizations:CreatePolicy
  • organizations:AttachPolicy
  • organizations:ListRoots
  • organizations:UpdatePolicy

No wildcards, no admin privileges.


55. How do you know what IAM permissions are needed?

I follow a simple approach:

  1. Start with AWS documentation
  2. Run Terraform to see if AccessDenied appears
  3. Add only the missing action
  4. Validate using IAM Access Analyzer

This gives accurate and minimal permissions.


56. How do you manage Terraform remote state?

I store state in:

  • S3 bucket (encrypted, versioning enabled)
  • DynamoDB table for state locking

This avoids conflicts and provides a reliable multi-developer workflow.
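
A minimal backend configuration along those lines (bucket, table, and key names are illustrative):

terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "prod/network/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}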


57. How do you handle Terraform drift?

I use:

  • terraform plan regularly
  • AWS Config to detect out-of-band changes
  • SecurityHub for high-risk drift

OPA prevents non-compliant changes before they are applied; Config detects drift that happens outside Terraform.


58. How do you validate Terraform code?

Pipeline steps:

  1. terraform fmt
  2. terraform validate
  3. tflint
  4. OPA/Conftest
  5. terraform plan

This ensures the code is syntactically correct, logically correct, and compliant.


59. How do you manage environment-specific values?

Using:

  • tfvars files
  • Workspaces
  • Environment directories

This prevents hardcoding.


60. How do you enforce tagging using Terraform?

I create a tagging module OR enforce tags via OPA:

OPA example:

  • If tags missing → deny the deployment.

This ensures all resources follow corporate tagging standards.
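
On the Terraform side, a simple way to apply corporate tags everywhere is the AWS provider's default_tags block (a sketch; the variable names are illustrative):

provider "aws" {
  region = var.region

  default_tags {
    tags = {
      Environment = var.environment
      Owner       = var.owner
      CostCenter  = var.cost_center
    }
  }
}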


61. How do you deploy Kubernetes/EKS resources in Terraform?

By using:

  • kubernetes provider
  • helm provider

Terraform can install EKS, nodes, add-ons, and even Gatekeeper.


62. What is the Terraform lifecycle meta-argument?

Used to control behavior:

  • prevent_destroy to protect critical resources
  • ignore_changes to ignore drift
  • create_before_destroy for replacements
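
A minimal sketch showing all three on one resource (the bucket name is illustrative):

resource "aws_s3_bucket" "state" {
  bucket = "example-terraform-state"

  lifecycle {
    prevent_destroy       = true
    create_before_destroy = true
    ignore_changes        = [tags]
  }
}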

63. How do you avoid long pipeline iteration cycles?

I test locally using:

  • Local Terraform plan
  • Mocking AWS accounts
  • opa eval on sample inputs

This reduces full pipeline runs.


64. Do you use Terragrunt? Why or why not?

Yes, for complex multi-account projects.
Terragrunt simplifies:

  • managing remote state
  • DRY architecture
  • multi-account stacks

65. How do you secure Terraform state?

  • Encrypt S3 bucket
  • Enable versioning
  • Deny public access
  • Use IAM boundaries
  • Use KMS encryption

66. How do you manage sensitive variables?

Use:

  • Terraform Cloud variable sets
  • SSM Parameter Store
  • Secrets Manager
  • .tfvars excluded from Git

Never store secrets in repo.


67. What is count vs for_each? When do you use which?

  • count for simple repetition
  • for_each when managing named entities or maps

for_each keys each resource by its map key or set element, so adding or removing one item doesn't re-index the others, which gives better control.
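
A short sketch of both forms (names and values are illustrative):

# count: N identical instances, addressed by index
resource "aws_instance" "worker" {
  count         = 3
  ami           = var.ami_id
  instance_type = "t3.micro"
}

# for_each: named entities, addressed by key ("dev", "stage", "prod")
resource "aws_s3_bucket" "logs" {
  for_each = toset(["dev", "stage", "prod"])
  bucket   = "example-logs-${each.key}"
}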


68. What’s the difference between locals and variables?

  • variables are input
  • locals are computed internal values

I use locals for data transformations.


69. How do you handle module versioning?

Using Git tags or Terraform registry version numbers.


70. What’s your biggest challenge with Terraform?

Long pipeline feedback loops and keeping modules consistent across 10–20 accounts.
OPA and module standardization help solve this.


71. What are Service Control Policies (SCPs)?

SCPs are organization-level guardrails.
They do not grant permissions; they only restrict what IAM users/roles can do.

They ensure that even if IAM is misconfigured, the account stays safe.


72. What SCPs have you written?

I’ve created SCPs for:

  • Enforcing allowed AWS regions
  • Denying public S3 buckets
  • Denying disabling CloudTrail/Config
  • Blocking IAM wildcard permissions
  • Preventing root user API calls
  • Restricting creation of Internet Gateways in prod

73. How do you deploy SCPs across accounts?

Using Terraform with:

  • A cross-account management role
  • organizations provider
  • GitOps workflow

This provides full automation.


74. How do you troubleshoot SCP issues?

I check the effective permissions, which are the intersection of:

  • IAM identity policies
  • Permission boundaries
  • SCPs

Most issues come from an SCP blocking an action that IAM allows.


75. How do you restrict AWS regions with SCPs?

Example:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyOutsideAllowedRegions",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringNotEquals": { "aws:RequestedRegion": ["us-east-1", "us-west-2"] }
      }
    }
  ]
}

In practice this statement usually also exempts global services (for example IAM, CloudFront, Route 53) via NotAction so they are not accidentally blocked.

76. How do you enforce encryption using SCPs?

SCP can deny actions like:

  • s3:PutObject requests that do not include a server-side-encryption header (via a request condition)
  • kms:DisableKey and kms:ScheduleKeyDeletion

But SCP is usually coarse, so OPA + Config handle specifics.


77. How do you protect CloudTrail with SCP?

Deny:

  • cloudtrail:StopLogging
  • cloudtrail:DeleteTrail
  • s3:PutBucketPolicy that would break logging
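
A minimal sketch of the CloudTrail portion as an SCP statement (the S3 bucket-policy protection would be scoped separately to the logging bucket):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ProtectCloudTrail",
      "Effect": "Deny",
      "Action": [
        "cloudtrail:StopLogging",
        "cloudtrail:DeleteTrail"
      ],
      "Resource": "*"
    }
  ]
}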

78. How do you prevent resource deletion using SCP?

Set Deny on destructive actions in production accounts.


79. How do you ensure SCPs don’t break deployments?

I validate using:

  • Sandbox accounts
  • Policy simulator
  • CloudTrail logs

80. How do you integrate SCPs with OPA?

OPA blocks non-compliant code before apply.
SCP blocks dangerous actions at runtime.



81. What is AWS EventBridge used for?

EventBridge routes cloud events (Config, SecurityHub, IAM, CloudTrail) into automation pipelines.


82. What remediation workflows have you built?

Examples:

  • Auto-enable S3 encryption
  • Auto-close public S3 access
  • Remove 0.0.0.0/0 ingress from SG
  • Restart unhealthy EC2 instances

Triggered by Config non-compliance.


83. How does the remediation pipeline work?

Flow:

  1. Config marks resource NON_COMPLIANT
  2. EventBridge rule triggers
  3. Lambda/SSM Automation fixes the resource
  4. SecurityHub is updated
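
A hedged sketch of step 3 for the public-S3 case, assuming an EventBridge rule that matches Config "Compliance Change" events and targets this Lambda (names and the remediation choice are illustrative):

import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # Config compliance-change events carry the resource ID (the bucket name)
    detail = event["detail"]
    bucket = detail["resourceId"]

    # Guard against remediation loops: only act on NON_COMPLIANT results
    if detail["newEvaluationResult"]["complianceType"] != "NON_COMPLIANT":
        return

    # Remediate by blocking all public access on the bucket
    s3.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )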

84. How do you avoid remediation loops?

I add a condition in the Lambda:

If the resource is already remediated → skip.


85. How do you alert on non-compliance?

EventBridge → SNS / Slack → CloudWatch alarm if repeated violations occur.


86. How do you record remediation actions?

Using:

  • CloudWatch Logs
  • SecurityHub custom findings
  • Tagging remediated resources

87. How do you handle high-volume events?

Use:

  • Filtering rules
  • Dead-letter queues
  • Batch processing

88. How do you test remediation?

Replay sample events (for example with EventBridge archive and replay, or by sending a test event to the Lambda).
Or manually mark a resource NON_COMPLIANT.


89. How do you secure EventBridge pipelines?

IAM least privilege for targets, encrypted logs, KMS for secrets.


90. When should remediation be manual vs automatic?

  • High-risk issues → auto-remediate
  • Business-impacting issues → manual approval


91. How do you secure Kubernetes clusters?

Using multiple layers:

  • RBAC
  • Pod Security (OPA Gatekeeper)
  • Network Policies
  • IAM Roles for Service Accounts
  • EKS control-plane logging
  • Secret encryption

92. What is Gatekeeper?

Gatekeeper is OPA integrated into Kubernetes.
It enforces rules at admission time.


93. What Gatekeeper constraints have you created?

  • No privileged containers
  • runAsNonRoot required
  • No hostPath volumes
  • Mandatory resource limits
  • Allowed registry whitelist

94. How do you deploy Gatekeeper?

Using Helm or manifests.

Add ConstraintTemplates → Constraints → Sync.
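
For example, a minimal ConstraintTemplate plus Constraint that blocks privileged containers could look like this (a sketch; the template name and kind are illustrative):

apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sdenyprivileged
spec:
  crd:
    spec:
      names:
        kind: K8sDenyPrivileged
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sdenyprivileged

        violation[{"msg": msg}] {
          c := input.review.object.spec.containers[_]
          c.securityContext.privileged
          msg := sprintf("privileged container not allowed: %v", [c.name])
        }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sDenyPrivileged
metadata:
  name: deny-privileged-containers
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]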


95. How do you test Gatekeeper?

Run:

kubectl apply -f bad-pod.yaml --dry-run=server

If policy works → admission is denied.


96. How do you debug Gatekeeper issues?

Check:

  • Gatekeeper audit logs
  • Violations reported by kubectl get constraints
  • OPA traces

97. How do you restrict container capabilities?

Gatekeeper policy checks securityContext.capabilities.drop.


98. How do you enforce resource limits?

Require CPU/memory limits on all containers.


99. How do you enforce image repository rules?

Deny if image not from approved registries.


100. How do you manage Gatekeeper in multi-cluster environments?

Using GitOps (ArgoCD / Flux) to sync the same policies across clusters.

1. What is Open Policy Agent (OPA)?

OPA is a policy decision engine used to enforce rules before infrastructure is deployed.
I use it to implement “shift-left security,” meaning we validate compliance in CI/CD instead of after the resource is already deployed.

I typically use OPA to check Terraform, Kubernetes manifests, and cloud security controls.


2. What is Rego?

Rego is OPA’s policy language.
It lets me write infrastructure rules in a declarative way.
Instead of describing how to evaluate the rule, I describe what must be true.

Example:
“If S3 bucket encryption is missing → deny.”


3. How do you use OPA with Terraform?

I integrate OPA using Conftest, which evaluates a Terraform plan before apply.

Steps:

  1. terraform plan -out tfplan
  2. terraform show -json tfplan > plan.json
  3. conftest test plan.json

Rego checks things like:

  • encryption
  • tags
  • security groups
  • IAM permissions
  • region restrictions

If any policy fails → deployment is blocked.


4. What kinds of OPA policies have you written?

I’ve written a full set of cloud security baseline policies, including:

  • S3 encryption + versioning
  • RDS encryption
  • EC2 instance type whitelisting
  • Mandatory tags like Environment/Owner
  • Disallowing 0.0.0.0/0 on SSH
  • Denying public S3 access
  • Blocking IAM wildcard permissions
  • Kubernetes pod security (no privileged containers, require runAsNonRoot)

These enforce consistent security across teams.


5. How do you test OPA policies?

I test OPA in three ways:

  • Local development with conftest test
  • Unit tests using OPA’s built-in test framework (opa test)
  • Dry-run mode in Gatekeeper for Kubernetes

This catches policy logic issues early.


6. How do you debug Rego policies?

Tools I use:

  • opa eval with sample input
  • Add print statements inside the policy
  • Terraform plan inspection
  • Gatekeeper audit logs

Rego debugging is mostly about understanding the structure of your input data.


7. How do you integrate OPA with GitHub Actions / Jenkins?

Pipeline stages:

  1. terraform fmt
  2. terraform validate
  3. OPA/Conftest run
  4. Then terraform plan
  5. Manual approval
  6. Deploy

If OPA fails, the code never moves forward.
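
A hedged GitHub Actions sketch of that gate (action versions, the policy/ directory, and the package name in the OPA query are assumptions; cloud credentials and backend setup are omitted):

name: terraform-policy-gate
on: [pull_request]

jobs:
  policy-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - uses: open-policy-agent/setup-opa@v2
      - name: Format and validate
        run: |
          terraform fmt -check -recursive
          terraform init -backend=false
          terraform validate
      - name: Plan and export JSON
        run: |
          terraform plan -out=tfplan
          terraform show -json tfplan > plan.json
      - name: OPA policy check (job fails if any deny rule fires)
        run: opa eval --fail-defined -i plan.json -d policy/ "data.terraform.deny[x]"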


8. What is the typical structure of a Rego policy?

A Rego policy contains:

  • package name
  • deny rules
  • conditions that describe violations
  • output messages

Example:

package s3.encryption

deny[msg] {
  bucket := input.resource_changes[_]
  bucket.type == "aws_s3_bucket"
  not bucket.change.after.server_side_encryption_configuration
  msg = sprintf("Bucket %v must have encryption", [bucket.address])
}

9. What is the difference between allow and deny rules?

  • Deny rules list what should be blocked.
  • Allow rules explicitly grant permission.

For infrastructure, deny-based policies are simpler and more common.


10. How do you access input data in Rego?

input represents the Terraform plan or Kubernetes manifest.

Example:
input.resource_changes gives you all Terraform resources.


11. What is the default keyword for?

It sets baseline behavior.

Example:

default deny = false

Which means:
If no deny rule matches → allow deployment.


12. How do you enforce mandatory tags using Rego?

Example rule:

package terraform.tags

deny[msg] {
  rc := input.resource_changes[_]
  rc.type == "aws_s3_bucket"
  not rc.change.after.tags.Environment
  msg := sprintf("%v is missing the Environment tag", [rc.address])
}

Simple but effective.


13. How do you enforce S3 encryption with OPA?

Check if:

  • encryption block exists
  • algorithm is valid

If missing → deny.

OPA prevents insecure S3 buckets from being created.


14. How do you enforce security group rules with OPA?

Check for:

  • CIDR block 0.0.0.0/0
  • Ports like 22/3389

If found → deny.


15. How do you restrict AWS regions using OPA?

Whitelist the allowed regions.

If region not in list → deny.


16. How do you enforce runAsNonRoot for Kubernetes?

Check the pod securityContext:

package k8s.security

deny[msg] {
  container := input.spec.containers[_]
  not container.securityContext.runAsNonRoot
  msg := sprintf("container %v must set runAsNonRoot", [container.name])
}

17. What Gatekeeper policies have you created?

Examples:

  • No privileged containers
  • Must set CPU/memory limits
  • Must set runAsNonRoot
  • No hostPath mounts
  • Restricted container capabilities

This enforces strong pod security.


18. How do you deploy Gatekeeper?

Using Helm:

helm repo add gatekeeper https://open-policy-agent.github.io/gatekeeper/charts
helm install gatekeeper gatekeeper/gatekeeper --namespace gatekeeper-system --create-namespace

Then create templates + constraints.


19. How do you store and version OPA policies?

In Git repos with:

  • Branch protection
  • Pull request approval
  • Git tags
  • GitOps syncing

Everything is fully auditable.


20. How do you audit policy changes?

I rely on:

  • Git history
  • CI/CD logs
  • Gatekeeper audit
  • Conftest logs

This makes compliance review easy.


21. How do you maintain hundreds of policies?

I use a structured folder layout:

tags/
s3/
ec2/
iam/
networking/
kubernetes/
eks/
compliance/

Each domain has its own OPA bundle.


22. How do you optimize Rego performance?

  • Keep rules simple
  • Avoid nested loops
  • Use indexing
  • Use audit mode for heavy evaluation
  • Split large policies

23. How do you run OPA in microservices?

Two patterns:

  1. Sidecar container
  2. External authorization server (Envoy)

For IaC, I mainly use OPA externally with Conftest or Gatekeeper.


24. How do you restrict IAM role creation via OPA?

Block wildcard permissions:

package iam.wildcards

deny[msg] {
  rc := input.resource_changes[_]
  rc.type == "aws_iam_role"
  statement := json.unmarshal(rc.change.after.inline_policy[_].policy).Statement[_]
  statement.Action == "*"
  msg := sprintf("%v uses a wildcard IAM action in an inline policy", [rc.address])
}

25. What is the biggest challenge working with OPA?

Staying in sync with new AWS services and features.
Policies evolve as the cloud evolves.



26. What is AWS Config?

AWS Config is a detective control that tracks configuration changes and checks resources against compliance rules.

I use it to ensure:

  • encryption
  • tagging
  • networking restrictions
  • IAM security

27. Have you written custom AWS Config rules?

Yes.
I write Lambda-based Config rules using Python to evaluate resources that managed rules can’t cover.


28. Example of a custom AWS Config rule you wrote?

A rule that checks:

  • If S3 bucket has encryption
  • If versioning is enabled

If either is missing → NON_COMPLIANT.


29. What Python libraries did you use?

Primarily:

  • boto3 for AWS APIs
  • json for parsing input
  • logging for audit logs

30. How do you return evaluations to AWS Config?

Using:

config.put_evaluations(
   Evaluations=[{...}],
   ResultToken=event['resultToken']
)

This returns COMPLIANT or NON_COMPLIANT.


31. How do you structure a Config rule Lambda?

Flow:

  1. Parse event
  2. Get resource details
  3. Check compliance
  4. Return evaluation

Simple, predictable pattern.
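
A minimal sketch of a custom Config rule Lambda following that flow; the S3 versioning check is illustrative, not the exact production rule:

import json
import boto3

config = boto3.client("config")
s3 = boto3.client("s3")

def lambda_handler(event, context):
    # 1. Parse the event (invokingEvent arrives as a JSON string)
    invoking_event = json.loads(event["invokingEvent"])
    item = invoking_event["configurationItem"]

    # 2. Get resource details
    bucket_name = item["resourceName"]

    # 3. Check compliance (here: is versioning enabled?)
    status = s3.get_bucket_versioning(Bucket=bucket_name).get("Status")
    compliance = "COMPLIANT" if status == "Enabled" else "NON_COMPLIANT"

    # 4. Return the evaluation to AWS Config
    config.put_evaluations(
        Evaluations=[{
            "ComplianceResourceType": item["resourceType"],
            "ComplianceResourceId": item["resourceId"],
            "ComplianceType": compliance,
            "OrderingTimestamp": item["configurationItemCaptureTime"],
        }],
        ResultToken=event["resultToken"],
    )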


32. How do you validate AWS Config rules?

  • Test events in Lambda console
  • Deploy to sandbox account
  • Trigger Config with sample bucket

33. How do you remediate Config violations?

Using:

  • EventBridge → Lambda
  • SSM Automation
  • SecurityHub integration

Remediation is automated.


34. Do you use Config Aggregators?

Yes — they give centralized visibility across all AWS accounts.


35. How do you secure AWS Config?

  • Deny disabling Config via SCP
  • Encrypt Config S3 bucket
  • Enable Config in all regions

36. Example: custom rule for security groups?

Check if:

  • SG allows 0.0.0.0/0
  • Port matches restricted ports

If yes → NON_COMPLIANT.


37. How do you detect unused IAM roles?

Custom rule queries IAM:

  • lastUsedDate
  • role age

Old unused roles are NON_COMPLIANT.


38. How do you alert on Config violations?

EventBridge → SNS / Slack / CloudWatch alarm.


39. How do you remediate S3 misconfigurations?

Lambda:

  • Enable encryption
  • Enable versioning
  • Block public access

40. How do you track compliance metrics?

CloudWatch dashboards and aggregated Config reports.


41. How do you evaluate IAM compliance?

Check:

  • If policies contain "*"
  • If MFA is disabled
  • If admin access is overly broad

42. How do you test Config locally?

Mock boto3 using moto.
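
A minimal sketch of such a test, assuming moto 5.x and pytest; the compliance helper here is a simplified stand-in for the Lambda's check:

import boto3
from moto import mock_aws

def bucket_versioning_compliance(bucket_name):
    """COMPLIANT if versioning is enabled, otherwise NON_COMPLIANT."""
    s3 = boto3.client("s3", region_name="us-east-1")
    status = s3.get_bucket_versioning(Bucket=bucket_name).get("Status")
    return "COMPLIANT" if status == "Enabled" else "NON_COMPLIANT"

@mock_aws
def test_bucket_without_versioning_is_noncompliant():
    s3 = boto3.client("s3", region_name="us-east-1")
    s3.create_bucket(Bucket="test-bucket")  # versioning is off by default
    assert bucket_versioning_compliance("test-bucket") == "NON_COMPLIANT"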


43. How do you integrate Config with OPA?

OPA prevents non-compliant resources in CI/CD.
Config detects drift after deployment.


44. What is your most-used custom rule?

Tagging compliance — ensuring consistency across accounts.


45. What’s the hardest part about custom Config rules?

Ensuring fast execution — Lambda has a time limit.


46. How do you run remediation workflows?

EventBridge → SSM or Lambda → CloudWatch logs


47. How do you secure the Config delivery channel?

  • Use private S3 bucket
  • Deny public access
  • Encrypt with SSE-KMS

48. How do you handle Config in multi-account setup?

Use Aggregators + Organization-level Config rules.


49. How do you use Config with SecurityHub?

Config results feed into SecurityHub controls automatically.


50. What is your Config deployment method?

Terraform modules + Lambda zip packaging + CI/CD.

What is OPA (Open Policy Agent)?

OPA is a Policy-as-Code engine.
It allows you to write rules that allow or deny something.

Instead of manually checking security, access, or configuration, OPA enforces rules automatically.

Example real situations:

  • "Do not allow public S3 buckets"
  • "Do not allow SSH open to 0.0.0.0/0"
  • "Only approved users can deploy to production"
  • "Every resource must have cost and owner tags"

OPA works everywhere: Kubernetes, AWS services, microservices, APIs, etc.


What is Rego?

Rego is the language used to write OPA policies.

If OPA is the engine,
Rego is the language that tells the engine what to do.

Rego reads:

  • input = information about the request/resource
  • rules = your conditions
  • decision = allow or deny

Where OPA is Used in AWS

  • EKS / Kubernetes: block bad deployments at admission time (OPA Gatekeeper)
  • Terraform deployment pipeline: block non-compliant AWS resources before creation (opa eval / Conftest in CI/CD)
  • AWS account compliance: detect and auto-fix violations (AWS Config + EventBridge + OPA)
  • API security: decide whether a request is allowed (OPA sidecar / Envoy external authorization)

Real Interview-Ready Example (Clear & Simple)

Most recent real example

In my last project, I used OPA to enforce no public S3 buckets across multiple AWS accounts.

Why?
SecurityHub kept reporting public buckets → that is a major data leakage risk.

What I did:

  1. Wrote a Rego policy that checks bucket ACL and encryption.
  2. Integrated it with AWS Config + EventBridge.
  3. If a bucket becomes public:
  • It immediately triggers remediation.
  • ACL is reset to private.
  • Encryption is enabled.

Outcome:

  • Public buckets reduced to zero
  • SecurityHub compliance scores increased
  • Passed internal audit successfully

Say this in the interview — it is perfect.


What AWS Resources I Wrote OPA Policies For

  • S3 (prevent public access, enforce encryption)
  • EC2 (enforce tags, restrict AMIs to approved ones)
  • Security Groups (deny 0.0.0.0/0 for SSH or DB ports)
  • IAM Roles (restrict privileged policies)
  • EKS deployments (no privileged containers, required labels, prevent host networking)

How I Integrate OPA with EKS, API Gateway, IAM

EKS

I deploy OPA Gatekeeper → it checks Kubernetes manifests before they are applied.
If the manifest violates policy → deployment is blocked.

CI/CD (Terraform)

OPA evaluates terraform plan.
If the plan tries to create a non-compliant resource → pipeline fails.

AWS Config + EventBridge

Used for continuous monitoring and auto-remediation of live AWS resources.


How I Test and Validate OPA Policies

  1. Unit tests (opa test): test policy logic locally to catch mistakes early
  2. Dry-run mode: run the policy in audit mode first to avoid production impact
  3. CI/CD integration: validate during Terraform/Kubernetes deployments to prevent bad infrastructure releases

This shows control and safe rollout — interviewers like that.


Tools I Use to Manage OPA

  • Terraform → to deploy OPA/Gatekeeper configurations
  • GitHub Actions / Jenkins → to run policy checks in CI/CD
  • Argo CD → to sync policies to multiple clusters
  • AWS Config / SecurityHub → continuous evaluation

How I Version, Audit, and Roll Back Policies

  • All policies stored in Git
  • Every change goes through Pull Request review
  • CI automatically tests before merging
  • Git history gives traceability and audit record
  • If a policy causes issues → simply roll back the commit

Debugging OPA (Important Answer)

When a policy blocks something unexpectedly:

  1. I use opa eval --explain=full to see why.
  2. For Gatekeeper, I check audit logs to see which rule triggered.
  3. I adjust conditions to be more specific.

This shows controlled troubleshooting.


Performance Optimization

To keep OPA fast:

  • Avoid nested loops
  • Store lookup data in data (OPA's memory store)
  • Reuse computed logic instead of recalculating

This prevents OPA becoming slow in high-traffic systems.


Deploying OPA in Microservices (API Security)

  • OPA runs as a sidecar or Envoy external-auth
  • Microservice sends request metadata to OPA
  • OPA returns allow/deny
  • No central dependency → very fast and scalable

Using OPA in Kubernetes (Gatekeeper)

To manage multiple clusters:

  • We store policies in a central Git repo
  • Use Argo CD to push updates to all clusters
  • This ensures same security rules everywhere

1) Tell Me About Your Role / Project

Answer:

I built a Profile Service application on AWS using EKS, DynamoDB, and Terraform. The application is a Python Flask API deployed on Kubernetes behind an AWS ALB. I used IRSA to allow the pods to securely access DynamoDB without storing any credentials. The entire infrastructure is provisioned using Terraform, and CI/CD is automated with GitHub Actions to build, test, and deploy changes. I enabled observability using Prometheus and Grafana and implemented DynamoDB Global Tables for multi-region active-active disaster recovery. This project demonstrates end-to-end DevOps and SRE practices including automation, security, scaling, monitoring, and reliability.


2) Difference Between DevOps and SRE

Answer:

DevOps focuses on automation, CI/CD, and improving the speed of delivery.
SRE focuses on reliability, stability, and uptime in production environments.

DevOps = Ship Faster
SRE = Keep It Reliable

SRE uses SLIs, SLOs, SLAs, and Error Budgets to balance reliability with deployment velocity.


3) SLI, SLO, SLA — and How They Tie Together

Answer:

SLI is the metric we measure (e.g., success rate, latency).
SLO is the target we want to achieve for that metric (e.g., 99.9% success).
SLA is the external legal/business promise we make to customers (e.g., 99.5% uptime or credits).

So the relationship is:

  • SLI provides the data
  • SLO provides the internal goal
  • SLA provides the external commitment

4) Three Important Functions of an SRE

Answer:

  1. Reliability Management — maintain availability & performance using SLIs/SLOs & error budgets.
  2. Monitoring & Incident Response — implement observability, run on-call, troubleshoot production.
  3. Automation & Toil Reduction — eliminate manual work through scripts, pipelines, and tooling.

5) Encryption In-Transit vs At-Rest

Answer:

In Transit: I use TLS/HTTPS on the ALB to encrypt all traffic to the app, and pod-to-DynamoDB communication is encrypted via HTTPS.

At Rest: DynamoDB is encrypted using KMS, EBS volumes are encrypted, and secrets are stored using Secrets Manager or IRSA—never in plain text.


6) DynamoDB — How You Worked With It

Answer:

I provision DynamoDB using Terraform. I design the schema around access patterns, choose a partition key for scalability, and enable KMS encryption and TTL.

I use IAM Role for Service Accounts (IRSA) to allow EKS pods to access DynamoDB securely without storing any credentials.

For multi-region DR, I use DynamoDB Global Tables to achieve active-active replication across regions.
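
A minimal Terraform sketch of such a table with KMS encryption, TTL, and a Global Tables replica (names, keys, and regions are illustrative):

resource "aws_dynamodb_table" "profiles" {
  name         = "profiles"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "profile_id"

  attribute {
    name = "profile_id"
    type = "S"
  }

  server_side_encryption {
    enabled = true  # KMS encryption at rest
  }

  ttl {
    attribute_name = "expires_at"
    enabled        = true
  }

  # Global Tables (active-active) require streams with NEW_AND_OLD_IMAGES
  stream_enabled   = true
  stream_view_type = "NEW_AND_OLD_IMAGES"

  replica {
    region_name = "us-west-2"
  }
}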


7) Can DynamoDB Be Active-Active?

Answer:

Yes. DynamoDB supports active-active through Global Tables, which replicate data across multiple AWS regions automatically. Each region becomes a full read/write primary. This is commonly used in high-availability and multi-region architectures.


8) Disaster Recovery Strategy (High-Critical Systems like Netflix)

Answer:

For mission-critical systems, I recommend an Active-Active Multi-Region architecture:

  • Deploy services in multiple AWS regions
  • Use DynamoDB Global Tables for data replication
  • Use Route 53 latency routing for automatic regional failover
  • Ensure observability + health checks for traffic-shift decisions

This reduces downtime to near-zero and meets very low RPO/RTO requirements.

9) Troubleshooting: Application Cannot Reach Database

Answer:
I follow a layered debugging approach:

  1. Network: kubectl exec into pod → nslookup + nc -z <host> <port>
  2. DNS: Validate DB endpoint resolves
  3. IAM / IRSA: Ensure the correct role is attached to ServiceAccount
  4. Configuration: Validate env variables (TABLE_NAME, region, etc.)
  5. Logs / Flow Logs: Check CloudWatch + DynamoDB metrics for errors

This shows discipline & SRE troubleshooting mindset.


10) Python Automation in CI/CD

Answer:

Yes, I wrote Python automation that runs inside the CI/CD pipeline.
The script validates configs, checks required environment variables, and prevents deploying misconfigured code.
If the script fails, it returns a non-zero exit code, which causes the pipeline to stop, preventing bad deployments.
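
A hedged sketch of that kind of pre-deploy check; the required variables are illustrative:

#!/usr/bin/env python3
"""Validate deployment configuration before the pipeline proceeds."""
import os
import sys

REQUIRED_ENV_VARS = ["TABLE_NAME", "AWS_REGION", "IMAGE_TAG"]

def main() -> int:
    missing = [v for v in REQUIRED_ENV_VARS if not os.environ.get(v)]
    if missing:
        print(f"Missing required environment variables: {', '.join(missing)}")
        return 1  # non-zero exit code stops the pipeline
    print("Configuration looks good.")
    return 0

if __name__ == "__main__":
    sys.exit(main())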


11) What Happens if the Script Fails?

Answer:

The pipeline stops immediately. The deployment does not proceed. Logs indicate failure, we address the issue, and re-run. This ensures production remains in a safe, known good state.


12) Why Are You Looking for a Change?

Answer:

I gained great experience working with large systems at Bank of America. However, the environment is process-heavy. I’m looking for a faster-moving engineering culture where I can have more hands-on technical ownership, especially around Kubernetes, Terraform, observability, and reliability automation.


🎤 Closing Line to End the Interview Strong

Answer:

Thank you for the discussion. I really enjoyed this conversation. The role aligns very well with my experience in Kubernetes, Terraform, observability, and SRE mindset. I would be excited to contribute and continue growing with your team.
