Aisalkyn Aidarova

OPA (Open Policy Agent) and SRE Interview Questions and Answers

What is OPA (Open Policy Agent)?

OPA is a Policy-as-Code engine.
It lets you write rules that decide whether an action, resource, or configuration is allowed or denied.

Instead of manually checking security, access, or configuration, OPA enforces rules automatically.

Example real situations:

  • "Do not allow public S3 buckets"
  • "Do not allow SSH open to 0.0.0.0/0"
  • "Only approved users can deploy to production"
  • "Every resource must have cost and owner tags"

OPA works everywhere: Kubernetes, AWS services, microservices, APIs, etc.


What is Rego?

Rego is the language used to write OPA policies.

If OPA is the engine,
Rego is the language that tells the engine what to do.

Rego reads (see the sketch after this list):

  • input = information about the request/resource
  • rules = your conditions
  • decision = allow or deny
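
A minimal sketch of how these three pieces fit together, written in the rego.v1 style; the input field names and the data.approved_deployers list are illustrative, not tied to any particular integration:

```rego
package example

import rego.v1

# The decision defaults to deny.
default allow := false

# Non-production deployments are always allowed.
allow if {
    input.environment != "production"
}

# Production deployments are allowed only for approved users;
# data.approved_deployers is assumed to be loaded into OPA separately.
allow if {
    input.environment == "production"
    input.user in data.approved_deployers
}
```

Here input carries the facts, the two allow rules are the conditions, and the value of allow is the decision.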

Where OPA is Used in AWS

  • EKS / Kubernetes → block bad deployments (OPA Gatekeeper)
  • Terraform deployment pipeline → block non-compliant AWS resources before they are created (opa eval in CI/CD)
  • AWS account compliance → detect and auto-fix violations (AWS Config + EventBridge + OPA)
  • API security → decide whether a request is allowed (OPA sidecar / Envoy external auth)

Real Interview-Ready Example (Clear & Simple)

Most recent real example

In my last project, I used OPA to enforce no public S3 buckets across multiple AWS accounts.

Why?
SecurityHub kept reporting public buckets → that is a major data leakage risk.

What I did:

  1. Wrote a Rego policy that checks the bucket ACL and encryption settings (a sketch follows below).
  2. Integrated it with AWS Config + EventBridge.
  3. If a bucket becomes public:
  • It immediately triggers remediation.
  • ACL is reset to private.
  • Encryption is enabled.
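
A minimal sketch of the kind of Rego policy described in step 1, assuming the AWS Config item has been normalized into a small input document (the field names are illustrative):

```rego
package aws.s3.compliance

import rego.v1

# Illustrative input shape:
# {"bucket": {"name": "...", "acl": "...", "encryption": {"enabled": true}}}

deny contains msg if {
    input.bucket.acl in {"public-read", "public-read-write"}
    msg := sprintf("bucket %s must not have a public ACL", [input.bucket.name])
}

deny contains msg if {
    not input.bucket.encryption.enabled
    msg := sprintf("bucket %s must have default encryption enabled", [input.bucket.name])
}
```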

Outcome:

  • Public buckets reduced to zero
  • SecurityHub compliance scores increased
  • Passed internal audit successfully

Say this in the interview — it is perfect.


What AWS Resources I Wrote OPA Policies For

  • S3 (prevent public access, enforce encryption)
  • EC2 (enforce tags, restrict AMIs to approved ones)
  • Security Groups (deny 0.0.0.0/0 for SSH or DB ports; see the sketch after this list)
  • IAM Roles (restrict privileged policies)
  • EKS deployments (no privileged containers, required labels, prevent host networking)
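
As one concrete case from this list, a sketch for the security-group rule, assuming the group's ingress rules are passed to OPA as a list of objects with cidr, from_port, and to_port fields:

```rego
package aws.security_groups

import rego.v1

deny contains msg if {
    some rule in input.security_group.ingress
    rule.cidr == "0.0.0.0/0"
    rule.from_port <= 22
    rule.to_port >= 22
    msg := "SSH (port 22) must not be open to 0.0.0.0/0"
}
```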

How I Integrate OPA with EKS, API Gateway, IAM

EKS

I deploy OPA Gatekeeper → it checks Kubernetes manifests before they are applied.
If a manifest violates a policy → the deployment is blocked (a sketch of such a rule is below).
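
For illustration, a rule in the classic Gatekeeper ConstraintTemplate style; Gatekeeper exposes the admission request under input.review, so a no-privileged-containers check looks roughly like this:

```rego
package k8snoprivileged

# Evaluated against the admission review object; every violation
# blocks the deployment (or only reports it, in audit mode).
violation[{"msg": msg}] {
    container := input.review.object.spec.containers[_]
    container.securityContext.privileged
    msg := sprintf("container %s must not run privileged", [container.name])
}
```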

CI/CD (Terraform)

OPA evaluates the JSON output of terraform plan (a sketch is below).
If the plan would create a non-compliant resource → the pipeline fails.
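
A sketch of that kind of check, run against the JSON produced by terraform show -json for the plan; resource_changes is part of Terraform's plan JSON format, and the specific resource type and attribute here are just an example:

```rego
package terraform.s3

import rego.v1

# input is the output of `terraform show -json tfplan`.
deny contains msg if {
    some rc in input.resource_changes
    rc.type == "aws_s3_bucket_acl"
    rc.change.after.acl == "public-read"
    msg := sprintf("%s: public-read ACL is not allowed", [rc.address])
}
```

In the pipeline this can be evaluated with something like opa eval -i tfplan.json -d policy/ "data.terraform.s3.deny", failing the stage when the result is non-empty.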

AWS Config + EventBridge

Used for continuous monitoring and auto-remediation of live AWS resources.


How I Test and Validate OPA Policies

  1. Unit tests (opa test) → test the policy logic locally → find mistakes early (see the sketch after this list)
  2. Dry-run / audit mode → run the policy in audit mode first → avoid production impact
  3. CI/CD integration → validate during Terraform / Kubernetes deployment → prevent bad infrastructure from being released
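
A sketch of the unit-test step, reusing the illustrative S3 policy from the earlier example and run with opa test .:

```rego
package aws.s3.compliance_test

import rego.v1

import data.aws.s3.compliance

test_public_bucket_is_denied if {
    bucket := {"name": "demo", "acl": "public-read", "encryption": {"enabled": true}}
    count(compliance.deny) > 0 with input as {"bucket": bucket}
}

test_private_encrypted_bucket_passes if {
    bucket := {"name": "demo", "acl": "private", "encryption": {"enabled": true}}
    count(compliance.deny) == 0 with input as {"bucket": bucket}
}
```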

This shows control and safe rollout — interviewers like that.


Tools I Use to Manage OPA

  • Terraform → to deploy OPA/Gatekeeper configurations
  • GitHub Actions / Jenkins → to run policy checks in CI/CD
  • Argo CD → to sync policies to multiple clusters
  • AWS Config / SecurityHub → continuous evaluation

How I Version, Audit, and Roll Back Policies

  • All policies stored in Git
  • Every change goes through Pull Request review
  • CI automatically tests before merging
  • Git history gives traceability and audit record
  • If a policy causes issues → simply roll back the commit

Debugging OPA (Important Answer)

When a policy blocks something unexpectedly:

  1. I use opa eval --explain=full to see why.
  2. For Gatekeeper, I check audit logs to see which rule triggered.
  3. I adjust conditions to be more specific.

This shows controlled troubleshooting.


Performance Optimization

To keep OPA fast:

  • Avoid nested loops
  • Store lookup data in data (OPA's memory store)
  • Reuse computed logic instead of recalculating

This prevents OPA from becoming slow in high-traffic systems; a small sketch of these techniques follows.
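
A small sketch of these techniques, assuming an approved-AMI set has been loaded into data and that required tags are checked through a helper computed once:

```rego
package example.performance

import rego.v1

# data.approved_amis is loaded into OPA's data store ahead of time
# (e.g. from a JSON file), so this is a set lookup, not a loop over
# a large inline list.
ami_approved if input.instance.ami in data.approved_amis

# Computed once and reused by any rule that needs it.
missing_tags := {t |
    some t in {"cost-center", "owner"}
    not input.instance.tags[t]
}

deny contains "instance AMI is not on the approved list" if not ami_approved

deny contains msg if {
    count(missing_tags) > 0
    msg := sprintf("missing required tags: %v", [missing_tags])
}
```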


Deploying OPA in Microservices (API Security)

  • OPA runs as a sidecar or Envoy external-auth
  • Microservice sends request metadata to OPA
  • OPA returns an allow or deny decision (see the sketch after this list)
  • No central dependency → very fast and scalable
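
A sketch of such a decision policy for an HTTP API; the attribute names (method, path, user) depend on what the sidecar or Envoy ext_authz integration forwards, so treat them as illustrative:

```rego
package httpapi.authz

import rego.v1

default allow := false

# Any authenticated caller may read.
allow if {
    input.method == "GET"
    input.user.authenticated
}

# Writes to billing endpoints are restricted to the billing team.
allow if {
    input.method in {"POST", "PUT", "DELETE"}
    startswith(input.path, "/billing/")
    input.user.team == "billing"
}
```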

Using OPA in Kubernetes (Gatekeeper)

To manage multiple clusters:

  • We store policies in a central Git repo
  • Use Argo CD to push updates to all clusters
  • This ensures same security rules everywhere

INTERVIEW MASTER SHEET (FULL ANSWERS READY TO SPEAK)


1) Tell Me About Your Role / Project

Answer:

I built a Profile Service application on AWS using EKS, DynamoDB, and Terraform. The application is a Python Flask API deployed on Kubernetes behind an AWS ALB. I used IRSA to allow the pods to securely access DynamoDB without storing any credentials. The entire infrastructure is provisioned using Terraform, and CI/CD is automated with GitHub Actions to build, test, and deploy changes. I enabled observability using Prometheus and Grafana and implemented DynamoDB Global Tables for multi-region active-active disaster recovery. This project demonstrates end-to-end DevOps and SRE practices including automation, security, scaling, monitoring, and reliability.


2) Difference Between DevOps and SRE

Answer:

DevOps focuses on automation, CI/CD, and improving the speed of delivery.
SRE focuses on reliability, stability, and uptime in production environments.

DevOps = Ship Faster
SRE = Keep It Reliable

SRE uses SLIs, SLOs, SLAs, and Error Budgets to balance reliability with deployment velocity.


3) SLI, SLO, SLA — and How They Tie Together

Answer:

SLI is the metric we measure (e.g., success rate, latency).
SLO is the target we want to achieve for that metric (e.g., 99.9% success).
SLA is the external legal/business promise we make to customers (e.g., 99.5% uptime or credits).

So the relationship is:

  • SLI provides the data
  • SLO provides the internal goal
  • SLA provides the external commitment

4) Three Important Functions of an SRE

Answer:

  1. Reliability Management — maintain availability & performance using SLIs/SLOs & error budgets.
  2. Monitoring & Incident Response — implement observability, run on-call, troubleshoot production.
  3. Automation & Toil Reduction — eliminate manual work through scripts, pipelines, and tooling.

5) Encryption In-Transit vs At-Rest

Answer:

In Transit: I use TLS/HTTPS on the ALB to encrypt all traffic to the app, and pod-to-DynamoDB communication is encrypted via HTTPS.

At Rest: DynamoDB is encrypted using KMS, EBS volumes are encrypted, and secrets are stored using Secrets Manager or IRSA—never in plain text.


6) DynamoDB — How You Worked With It

Answer:

I provision DynamoDB using Terraform. I design the schema around access patterns, choose a partition key for scalability, and enable KMS encryption and TTL.

I use IAM Role for Service Accounts (IRSA) to allow EKS pods to access DynamoDB securely without storing any credentials.

For multi-region DR, I use DynamoDB Global Tables to achieve active-active replication across regions.


7) Can DynamoDB Be Active-Active?

Answer:

Yes. DynamoDB supports active-active through Global Tables, which replicate data across multiple AWS regions automatically. Each region becomes a full read/write primary. This is commonly used in high-availability and multi-region architectures.


8) Disaster Recovery Strategy (High-Critical Systems like Netflix)

Answer:

For mission-critical systems, I recommend an Active-Active Multi-Region architecture:

  • Deploy services in multiple AWS regions
  • Use DynamoDB Global Tables for data replication
  • Use Route 53 latency routing for automatic regional failover
  • Ensure observability + health checks for traffic-shift decisions

This reduces downtime to near zero and meets very low RPO/RTO requirements.

9) Troubleshooting: Application Cannot Reach Database

Answer:
I follow a layered debugging approach:

  1. Network: kubectl exec into pod → nslookup + nc -z <host> <port>
  2. DNS: Validate DB endpoint resolves
  3. IAM / IRSA: Ensure the correct role is attached to ServiceAccount
  4. Configuration: Validate env variables (TABLE_NAME, region, etc.)
  5. Logs / Flow Logs: Check CloudWatch + DynamoDB metrics for errors

This shows discipline & SRE troubleshooting mindset.


10) Python Automation in CI/CD

Answer:

Yes, I wrote Python automation that runs inside the CI/CD pipeline.
The script validates configs, checks required environment variables, and prevents deploying misconfigured code.
If the script fails, it returns a non-zero exit code, which causes the pipeline to stop, preventing bad deployments.


11) What Happens if the Script Fails?

Answer:

The pipeline stops immediately. The deployment does not proceed. Logs indicate failure, we address the issue, and re-run. This ensures production remains in a safe, known good state.


12) Why Are You Looking for a Change?

Answer:

I gained great experience working with large systems at Bank of America. However, the environment is process-heavy. I’m looking for a faster-moving engineering culture where I can have more hands-on technical ownership, especially around Kubernetes, Terraform, observability, and reliability automation.


🎤 Closing Line to End the Interview Strong

Answer:

Thank you for the discussion. I really enjoyed this conversation. The role aligns very well with my experience in Kubernetes, Terraform, observability, and SRE mindset. I would be excited to contribute and continue growing with your team.
