DEV Community

Krithika Murugesan
Krithika Murugesan

Posted on

How I Built a "Blind" AI Resume Screener to Fight Hiring Bias - and What AWS Taught Me Along the Way

A real-world journey through Agentic AI, cloud infrastructure, and the messy details no tutorial covers.

We talk a lot about AI's potential to transform hiring. But potential without responsibility is just hype. Unconscious bias in recruitment is a well-documented problem - and it starts long before the first interview, often in the first 30 seconds a recruiter spends skimming a resume.

So I built something to fix that.

This is the story of how I designed an AI-Powered "Blind" Resume Screener using CrewAI, OpenAI's GPT-4o, and AWS - and the real lessons I learned getting it to production.

The Problem: Bias Sneaks In Before the Conversation Starts

Names, email addresses, schools, zip codes - these data points have nothing to do with whether a candidate can do the job. Yet research consistently shows they influence hiring decisions. A resume with a "white-sounding" name gets more callbacks than an identical resume with a "Black-sounding" name. A Gmail address reads differently than a .edu one.

If we want fairer hiring, we need to remove that signal from the equation - before any human (or AI evaluator) sees the document.

The Architecture: How It Works
Step 1 - PII Redaction via LLM

Before any evaluation happens, the system passes the raw resume text through an LLM-based redaction layer. It automatically identifies and strips:

Names (first, last, full)
Email addresses
Physical addresses and zip codes
Phone numbers

The result is a clean, anonymized document that evaluators - human or AI - interact with exclusively.

Step 2 - Agentic Evaluation with CrewAI

This is where it gets interesting. I used CrewAI to orchestrate a two-agent pipeline:
Technical Skill Matcher - Compares the redacted resume against the job requirements, scoring alignment across skills, tools, and experience domains.
Technical Interviewer - Generates targeted follow-up questions based purely on the technical content, as if preparing for a first-round screen.

These agents run sequentially, each feeding context to the next, creating a structured evaluation report - all without ever knowing who the candidate is.

Step 3 - Cloud Storage & Notifications

Once the evaluation is complete:
Results are archived in AWS DynamoDB under a CandidateScores table keyed by CandidateID
A summary email is dispatched via Amazon SES to the hiring team the moment processing finishes

The whole pipeline is event-driven - submit a resume, get a structured evaluation report in your inbox. No manual steps.

The Tech Stack

AI Agent Application

├── Agent Orchestration
│ ├── CrewAI
│ └── LangChain

├── Large Language Model (LLM)
│ └── OpenAI GPT-4o

├── Cloud Storage
│ └── AWS DynamoDB

├── Notifications
│ └── Amazon SES

├── Security
│ └── AWS IAM
│ └── Custom Inline Policies

└── Programming Language
 └── Python
 └── Boto3 SDK

Lessons Learned - The Stuff No Tutorial Covers

1. Strategic LLM Pivoting: Why I Left AWS Bedrock

I originally planned to use AWS Bedrock for the PII extraction layer. It made sense on paper - keep everything in the AWS ecosystem, simplify IAM, reduce external dependencies.

In practice, I ran into model availability restrictions in my target region and latency issues that slowed the redaction step significantly. After benchmarking, I pivoted to OpenAI's gpt-4o-mini for extraction - and the difference was immediate. Faster responses, cleaner redaction output, and no regional gatekeeping.

The lesson: don't be dogmatic about staying in one vendor's ecosystem. The right tool for the job is the right tool.

2. IAM: Moving Beyond "Full Access"

When you're learning AWS, the easiest path is attaching *FullAccess policies and moving on. I started there too.

But as I got closer to a real deployment, I rebuilt my IAM policies from scratch using the Principle of Least Privilege. My agent's execution role now grants only:

dynamodb:PutItem and dynamodb:GetItem on the specific CandidateScores table
ses:SendEmail from a verified sender identity

That's it. Writing custom inline JSON policies in the IAM console felt tedious at first, but it forced me to understand exactly what my system needed - and nothing more. In a real production environment handling candidate data, this isn't optional. It's the baseline.

3. DNS & Regional Configuration: The gaierror Rabbit Hole

This one cost me the better part of an afternoon.
My local Python environment kept throwing gaierror - a socket-level DNS resolution failure - when trying to reach AWS service endpoints. The root cause was a regional mismatch: my boto3 client was initialized with one region, while my environment variables pointed to another.

The fix sounds obvious in hindsight: standardize your region configuration across every layer. Pick a region, set it in your .env, and make sure every boto3 client instantiation either reads from that env variable or has it hardcoded consistently.

For SES specifically, there's an additional gotcha - email verification is case-sensitive and sandbox-scoped. In sandbox mode, both the sender and receiver addresses must be individually verified. And Say2Name@gmail.com is not the same identity as say2name@gmail.com. The console will accept your message, AWS will silently fail to deliver it, and you'll spend an hour wondering what went wrong.

Check your spam folder. Check your verification status. Check your casing.

What This Project Really Taught Me

Building this end-to-end forced me to stop thinking about AI and cloud infrastructure as separate disciplines. The agent logic is only as reliable as the infrastructure running beneath it. IAM policies shape what your agents can actually do. Regional misconfigurations break workflows that look perfectly correct in code.

More importantly - building a responsible AI system requires more deliberate design than building a capable one. The PII redaction step doesn't make the pipeline smarter. It makes it fairer. That's a different optimization target, and it requires explicit architectural choices.

What's Next

A few directions I'm actively exploring:
Bias audit layer - post-evaluation analysis to flag whether the scoring distribution shows demographic skew across a batch of candidates

Structured output schemas - moving from free-text evaluation reports to JSON-schema-validated outputs for downstream ATS integration

Multi-model evaluation - running parallel evaluations across different LLMs and comparing scoring consistency

Try It Yourself

The full codebase is available on GitHub: https://lnkd.in/dMJ-8aAH

If you're working on responsible AI applications or building agentic systems on AWS, I'd love to hear what you're building. Drop a comment or connect.

Top comments (0)