Ajit for AWS Community Builders

How I Track 44 New AWS Repos Per Week Automatically with EventBridge, Bedrock, and FAISS

AWS publishes new sample repos almost daily — reference architectures, agent patterns, CDK constructs, security blueprints — spread across 5 different GitHub organizations.

Most developers never see them until weeks later.

I built a system that catches them the same day. Here's how.

The Problem

AWS maintains 9,657+ repos across these orgs:

  • aws-samples (8,044 repos)
  • awslabs (992 repos)
  • aws-ia (234 repos)
  • aws-solutions (153 repos)
  • aws-solutions-library-samples (234 repos)

When AWS publishes a new repo — say, a production-ready Bedrock AgentCore sample or a CDK construct for EKS — it's buried. GitHub search doesn't surface it. Google takes days to index it. By the time you find it, someone else already built on it.

The Solution: Auto-Indexing Pipeline

I built AWS Solution Finder with a twice-daily auto-indexing pipeline that detects and classifies new repos automatically.

Architecture

EventBridge (cron: 2 AM + 2 PM UTC daily)
        ↓
5 separate rules (one per org)
        ↓
Indexer Lambda (Docker, Python 3.12, 3 GB memory, 15-min timeout)

For each org:

  1. Fetch ALL repos from GitHub API (paginated)
  2. Compare with master index in S3 → find NEW + REMOVED repos
  3. Classify new repos with Bedrock Nova Pro (22 metadata fields)
  4. Generate Titan Embed v2 embeddings (1024-dim)
  5. Incrementally update FAISS index
  6. Save to S3 + publish new Lambda version
  7. Write run record to DynamoDB
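The diff in step 2 is simple set arithmetic once both sides are keyed by full repo name. A minimal sketch — the function and field names here are mine, not the project's actual code:

```python
def diff_repos(github_repos, master_index):
    """Compare the freshly fetched repo list with the stored master index."""
    current = {r["full_name"] for r in github_repos}
    known = set(master_index)
    new = sorted(current - known)      # on GitHub but not yet indexed
    removed = sorted(known - current)  # indexed but gone from GitHub ("ghosts")
    return new, removed

# Example: one brand-new repo, one that disappeared from GitHub.
github_repos = [
    {"full_name": "aws-samples/bedrock-agentcore-demo"},
    {"full_name": "aws-samples/eks-cdk-construct"},
]
master_index = {
    "aws-samples/eks-cdk-construct": {"indexed_at": "2025-01-10"},
    "aws-samples/old-sample": {"indexed_at": "2024-03-02"},
}
new, removed = diff_repos(github_repos, master_index)
# new     -> ["aws-samples/bedrock-agentcore-demo"]
# removed -> ["aws-samples/old-sample"]
```

Only the `new` list goes to Bedrock for classification, which is what keeps the per-run cost near zero.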

The 22 Metadata Fields

Every repo gets classified by Bedrock Nova Pro:

  • Solution type, competency area, AWS services used
  • Primary/secondary language, deployment tools
  • Cost range, setup time, complexity
  • Business value, target audience, freshness status
  • Agentic capabilities (yes/no/partial)
  • And more

This is what makes search actually useful: you're not just matching keywords, you're matching intent against structured metadata.
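Classification boils down to a prompt-then-parse step: ask Nova Pro for JSON with one key per metadata field, then parse the reply. A hedged sketch — the field names come from the list above, but the prompt wording, helper names, and JSON handling are my assumptions, not the project's actual code:

```python
import json

# Illustrative subset of the 22 fields (snake_cased from the list above).
FIELDS = [
    "solution_type", "competency_area", "aws_services",
    "primary_language", "deployment_tools", "cost_range",
    "setup_time", "complexity", "business_value",
    "target_audience", "freshness_status", "agentic_capabilities",
]

def build_prompt(repo_name, readme_excerpt):
    """Ask the model for one JSON object with a fixed key set."""
    return (
        f"Classify the AWS repo '{repo_name}'. Return only a JSON object "
        f"with these keys: {', '.join(FIELDS)}. "
        f"agentic_capabilities must be yes/no/partial.\n\n"
        f"README:\n{readme_excerpt}"
    )

def parse_classification(model_text):
    """Tolerate a markdown ```json fence around the model's reply."""
    cleaned = model_text.strip().strip("`")
    if cleaned.startswith("json"):
        cleaned = cleaned[4:]
    return json.loads(cleaned)

# The real call would go through boto3's bedrock-runtime client, roughly:
#   client = boto3.client("bedrock-runtime")
#   resp = client.converse(
#       modelId="amazon.nova-pro-v1:0",
#       messages=[{"role": "user", "content": [{"text": prompt}]}],
#   )
```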

Safety Guards

  • If >50% of repos would be removed in a single run → blocked (likely GitHub API error)
  • If GitHub returns 0 repos → master index NOT updated
  • Ghost repo detection: repos in FAISS but not on GitHub get cleaned up
  • Rate limiting on GitHub API calls (1 sec between classifications)
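The first two guards reduce to a pure predicate that runs before the master index is overwritten. A minimal sketch under stated assumptions — the threshold matches the >50% rule above, but the function signature is illustrative:

```python
def safe_to_update(total_indexed, removed, fetched_count,
                   max_removal_ratio=0.5):
    """Return True only if applying removals to the master index is safe."""
    if fetched_count == 0:
        # GitHub returned nothing: almost certainly an API failure,
        # so the master index must not be touched.
        return False
    if total_indexed and len(removed) / total_indexed > max_removal_ratio:
        # Mass removal in one run: likely a truncated/paginated-fetch error.
        return False
    return True
```

Blocking the write (rather than trying to repair it) means a bad GitHub response costs nothing; the next scheduled run simply retries.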

The New Feature: "New Repos This Week"

Today I shipped a badge on the search page showing repos added in the last 7 days:

  • 🆕 44 new repos this week
  • Grouped by org (aws-samples, awslabs, etc.)
  • Direct GitHub links for each
  • Rolling 7-day window, updated twice daily
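Computing the badge is a filter over the master index by indexing timestamp, grouped by org. A minimal sketch assuming each entry stores an ISO-8601 `indexed_at` field (the field name is my assumption):

```python
from datetime import datetime, timedelta, timezone

def new_this_week(master_index, now=None, days=7):
    """Group repos indexed in the last `days` days by GitHub org."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    groups = {}
    for full_name, meta in master_index.items():
        indexed_at = datetime.fromisoformat(meta["indexed_at"])
        if indexed_at >= cutoff:
            org = full_name.split("/")[0]   # "aws-samples/x" -> "aws-samples"
            groups.setdefault(org, []).append(full_name)
    return groups
```

Because the window is computed at read time from timestamps, each twice-daily run both adds fresh repos and silently ages old ones out of the badge.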

This gives users a reason to come back daily — not just when they need to search.

Tech Stack

| Layer | Technology |
| --- | --- |
| Scheduling | Amazon EventBridge (5 rules, twice daily) |
| Compute | AWS Lambda (Docker, Python 3.12, 3 GB) |
| AI Classification | Amazon Bedrock Nova Pro |
| Embeddings | Amazon Bedrock Titan Embed v2 (1024-dim) |
| Vector Search | FAISS (IndexFlatL2, in-memory) |
| Storage | Amazon S3 |
| Audit Trail | Amazon DynamoDB |
| IaC | AWS CDK (TypeScript) |
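FAISS's IndexFlatL2 is exhaustive brute-force L2 search, which is why incremental updates are cheap: adding newly classified repos is just appending their vectors. A stdlib-only sketch of what a single query does, with toy 3-dim vectors standing in for Titan's 1024-dim embeddings:

```python
import math

def l2_search(index_vectors, query, k=3):
    """Exhaustive L2 nearest-neighbour search -- conceptually what FAISS's
    IndexFlatL2 does (in optimized C++) over the Titan embeddings."""
    scored = [
        (sum((a - b) ** 2 for a, b in zip(vec, query)), i)
        for i, vec in enumerate(index_vectors)
    ]
    scored.sort()
    return [(i, math.sqrt(d2)) for d2, i in scored[:k]]

# Incremental update == append; "adding" a new repo is just one more row.
vectors = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
vectors.append([0.0, 2.0, 0.0])

results = l2_search(vectors, [0.9, 0.0, 0.0], k=2)
# nearest: index 1, then index 0
```

With a flat index there is no retraining or rebalancing step, so the Lambda can rebuild-and-ship the index in minutes; at ~10K vectors, brute force is entirely adequate.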

Total cost for the indexing pipeline: almost nothing. The Lambda runs for ~5 minutes twice a day. Bedrock calls are only for NEW repos (not re-classifying existing ones).

Try It

The 🆕 badge is live now. Free to try — 3 searches without registration.

awssolutionfinder.solutions.cloudnestle.com/search

What AWS repos have you discovered recently that more people should know about?


Built by Ajit NK — AWS Community Builder (Dev Tools). Building AI-powered developer tools at CloudNestle.
