AWS publishes new sample repos almost daily — reference architectures, agent patterns, CDK constructs, security blueprints — spread across 5 different GitHub organizations.
Most developers never see them until weeks later.
I built a system that catches them the same day. Here's how.
The Problem
AWS maintains 9,657+ repos across these orgs:
- aws-samples (8,044 repos)
- awslabs (992 repos)
- aws-ia (234 repos)
- aws-solutions (153 repos)
- aws-solutions-library-samples (234 repos)
When AWS publishes a new repo (say, a production-ready Bedrock AgentCore sample or a CDK construct for EKS), it's buried. GitHub search doesn't surface it. Google takes days to index it. By the time you find it, someone else has already built on it.
The Solution: Auto-Indexing Pipeline
I built AWS Solution Finder around a twice-daily indexing pipeline that detects and classifies new repos automatically.
Architecture
EventBridge (cron: 2 AM + 2 PM UTC daily)
↓
5 separate rules (one per org)
↓
Indexer Lambda (Docker, Python 3.12, 3GB, 15-min timeout)
↓
For each org:
- Fetch ALL repos from GitHub API (paginated)
- Compare with master index in S3 → find NEW + REMOVED repos
- Classify new repos with Bedrock Nova Pro (22 metadata fields)
- Generate Titan Embed v2 embeddings (1024-dim)
- Incrementally update FAISS index
- Save to S3 + publish new Lambda version
- Write run record to DynamoDB
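The "compare with master index" step comes down to two set differences. Here is a minimal sketch of that idea, assuming repos are identified by their `org/name` string; the function name `diff_repos` and the sample repo names are made up for illustration and are not taken from the actual pipeline.

```python
from typing import Iterable


def diff_repos(master: Iterable[str], fetched: Iterable[str]) -> tuple[list[str], list[str]]:
    """Return (new, removed) repo names, each sorted for stable output."""
    master_set, fetched_set = set(master), set(fetched)
    new = sorted(fetched_set - master_set)      # on GitHub, not yet in the index
    removed = sorted(master_set - fetched_set)  # in the index, no longer on GitHub
    return new, removed


# Hypothetical data: what the index holds vs. what GitHub returns right now.
master_index = ["aws-samples/repo-a", "aws-samples/repo-b"]
github_now = ["aws-samples/repo-b", "aws-samples/repo-c"]

new, removed = diff_repos(master_index, github_now)
print(new)      # ['aws-samples/repo-c']
print(removed)  # ['aws-samples/repo-a']
```

Only the `new` list needs classification and embedding, which is what keeps each run cheap.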
The 22 Metadata Fields
Every repo gets classified by Bedrock Nova Pro:
- Solution type, competency area, AWS services used
- Primary/secondary language, deployment tools
- Cost range, setup time, complexity
- Business value, target audience, freshness status
- Agentic capabilities (yes/no/partial)
- And more
This is what makes search actually useful — you're not just matching keywords, you're matching intent against structured metadata.
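To make "matching against structured metadata" concrete, here is a small sketch of filtering on a few fields. The `RepoMetadata` class, its field names, and the sample repos are assumptions for illustration; the real schema has 22 fields and may name them differently.

```python
from dataclasses import dataclass, field


@dataclass
class RepoMetadata:
    # A hypothetical subset of the 22 fields; real field names may differ.
    full_name: str
    solution_type: str
    primary_language: str
    complexity: str
    agentic: str  # "yes", "no", or "partial"
    aws_services: list[str] = field(default_factory=list)


def filter_repos(repos, *, service=None, language=None, agentic=None):
    """Keep repos that satisfy every filter that was provided."""
    results = []
    for repo in repos:
        if service is not None and service not in repo.aws_services:
            continue
        if language is not None and repo.primary_language != language:
            continue
        if agentic is not None and repo.agentic != agentic:
            continue
        results.append(repo)
    return results


# Made-up sample records, not real repos.
catalog = [
    RepoMetadata("example-org/agent-demo", "sample", "Python", "medium", "yes",
                 ["Bedrock", "Lambda"]),
    RepoMetadata("example-org/eks-construct", "construct", "TypeScript", "high", "no",
                 ["EKS"]),
]

matches = filter_repos(catalog, service="Bedrock", agentic="yes")
print([r.full_name for r in matches])  # ['example-org/agent-demo']
```

A filter like this can narrow the candidate set before or after the vector search, so a query for "agentic Bedrock samples in Python" does not depend on those exact words appearing in a README.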
Safety Guards
- If >50% of repos would be removed in a single run → blocked (likely GitHub API error)
- If GitHub returns 0 repos → master index NOT updated
- Ghost repo detection: repos in FAISS but not on GitHub get cleaned up
- Rate limiting on GitHub API calls (1 sec between classifications)
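The first two guards can be sketched as a single check that runs before the master index is touched. This is an illustration under assumptions: the function name `check_update`, its parameters, and the messages are hypothetical, and only the 50% threshold and the zero-repo rule come from the list above.

```python
def check_update(master_count: int, fetched_count: int, removed_count: int,
                 max_removed_ratio: float = 0.5) -> tuple[bool, str]:
    """Decide whether it is safe to apply this run's changes to the master index."""
    # Guard: an empty response most likely means an API problem, not a mass deletion.
    if fetched_count == 0:
        return False, "GitHub returned 0 repos; master index left unchanged"
    # Guard: block the run if more than half of the indexed repos would disappear.
    if master_count > 0 and removed_count / master_count > max_removed_ratio:
        ratio = removed_count / master_count
        return False, f"{ratio:.0%} of repos would be removed; run blocked"
    return True, "ok"


print(check_update(master_count=1000, fetched_count=0, removed_count=1000))
print(check_update(master_count=1000, fetched_count=400, removed_count=600))
print(check_update(master_count=1000, fetched_count=995, removed_count=10))
```

Failing closed here is the point: a skipped run costs a few hours of freshness, while a wrongly emptied index breaks search for everyone.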
The New Feature: "New Repos This Week"
Today I shipped a badge on the search page showing repos added in the last 7 days:
- 🆕 44 new repos this week
- Grouped by org (aws-samples, awslabs, etc.)
- Direct GitHub links for each
- Rolling 7-day window, updated twice daily
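A rolling 7-day window grouped by org can be computed in a few lines. The sketch below assumes each indexed repo carries an `indexed_at` ISO 8601 timestamp and a `full_name` of the form `org/name`; both field names, the function name, and the sample records are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def new_repos_this_week(records, now, window_days=7):
    """Group GitHub links for recently indexed repos by organization."""
    cutoff = now - timedelta(days=window_days)
    by_org = defaultdict(list)
    for rec in records:
        added = datetime.fromisoformat(rec["indexed_at"])
        if cutoff <= added <= now:
            org = rec["full_name"].split("/", 1)[0]
            by_org[org].append(f"https://github.com/{rec['full_name']}")
    return dict(by_org)


# Made-up records for illustration.
records = [
    {"full_name": "aws-samples/repo-new", "indexed_at": "2025-01-14T02:00:00+00:00"},
    {"full_name": "awslabs/repo-recent", "indexed_at": "2025-01-10T14:00:00+00:00"},
    {"full_name": "aws-samples/repo-old", "indexed_at": "2024-12-01T02:00:00+00:00"},
]
now = datetime(2025, 1, 15, 2, 0, tzinfo=timezone.utc)

grouped = new_repos_this_week(records, now)
total = sum(len(links) for links in grouped.values())
print(f"{total} new repos this week")  # 2 new repos this week
```

Passing `now` in explicitly, instead of calling `datetime.now()` inside the function, keeps the window easy to test.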
This gives users a reason to come back daily — not just when they need to search.
Tech Stack
| Layer | Technology |
|---|---|
| Scheduling | Amazon EventBridge (5 rules, twice daily) |
| Compute | AWS Lambda (Docker, Python 3.12, 3GB) |
| AI Classification | Amazon Bedrock Nova Pro |
| Embeddings | Amazon Bedrock Titan Embed v2 (1024-dim) |
| Vector Search | FAISS (IndexFlatL2, in-memory) |
| Storage | Amazon S3 |
| Audit Trail | Amazon DynamoDB |
| IaC | AWS CDK (TypeScript) |
Total cost for the indexing pipeline: almost nothing. The Lambda runs for ~5 minutes twice a day, and Bedrock is called only for new repos, not to re-classify existing ones.
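For a rough sense of scale, here is the Lambda compute arithmetic as a sketch. It assumes the ~5 minutes covers a whole run (if each of the 5 rules triggers its own 5-minute invocation, multiply by 5), a 30-day month, and an illustrative per-GB-second price that should be checked against current AWS pricing for your region and architecture. Bedrock, S3, and DynamoDB charges are not included.

```python
def monthly_gb_seconds(memory_gb, run_seconds, runs_per_day, days=30):
    """Lambda compute is billed on memory (GB) multiplied by duration (seconds)."""
    return memory_gb * run_seconds * runs_per_day * days


# Figures from the post: 3 GB of memory, ~5 minutes per run, two runs a day.
usage = monthly_gb_seconds(memory_gb=3, run_seconds=5 * 60, runs_per_day=2)
print(usage)  # 54000

# ASSUMPTION: illustrative on-demand price per GB-second; verify before relying on it.
ASSUMED_PRICE_PER_GB_SECOND = 0.0000166667
estimated_cost = usage * ASSUMED_PRICE_PER_GB_SECOND
print(f"~${estimated_cost:.2f} per month before any free tier")
```

Under those assumptions the compute bill stays around a dollar a month, which is why the per-repo Bedrock calls, not the Lambda, are the part worth watching.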
Try It
The 🆕 badge is live now. Free to try — 3 searches without registration.
→ awssolutionfinder.solutions.cloudnestle.com/search
What AWS repos have you discovered recently that more people should know about?
Built by Ajit NK — AWS Community Builder (Dev Tools). Building AI-powered developer tools at CloudNestle.

