R.A. Olanrewaju

Posted on Jun 22

I built a serverless agent that finds open-source issues for me every morning

#python #aws #serverless #opensource

I contribute to OWASP Nest. Before that I was spending 20-30 minutes manually browsing GitHub for issues that matched my stack, most of which turned out to be stale, already claimed, or just not a good fit. I got tired of that. So I built something to do it for me.

The result is OSS-Contrib-Scout: a Lambda function that runs every morning at 8am UTC, searches GitHub for Python issues tagged good-first-issue or help-wanted, scores each one against my actual stack using Gemini, and pushes a ranked digest to Telegram.

This post covers how it's built, what broke, and what a clean production run actually looks like.

The architecture

Three modules, one handler, one schedule.

EventBridge Scheduler (daily, 8am UTC)
        │
        ▼
   Lambda Function
        │
        ├── github_search.py   → GitHub Issues Search API
        ├── scorer.py          → Gemini API (scoring + filtering)
        └── notifier.py        → Telegram Bot API

github_search.py hits GitHub's Issues Search API with a query like is:open is:issue language:Python label:"good first issue", extracts the fields that matter (title, repo, body, labels, comment count), and returns a list.
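A minimal sketch of that step, for illustration only and not the code from the repo. The helper names build_search_params and summarize_issue are made up here, as are the sort order and the body truncation length; the item fields (title, repository_url, body, labels, comments, html_url) are the ones GitHub returns from GET /search/issues.

```python
from typing import Any


def build_search_params(language: str, label: str, max_results: int) -> dict[str, Any]:
    """Build query parameters for GET https://api.github.com/search/issues."""
    query = f'is:open is:issue language:{language} label:"{label}"'
    # Sorting by creation date is an assumption; the post does not say which sort is used.
    return {"q": query, "sort": "created", "order": "desc", "per_page": max_results}


def summarize_issue(item: dict[str, Any]) -> dict[str, Any]:
    """Keep only the fields the scorer needs from one search result item."""
    # repository_url looks like https://api.github.com/repos/<owner>/<name>
    repo = "/".join(item["repository_url"].split("/")[-2:])
    return {
        "title": item["title"],
        "repo": repo,
        # The API returns null for an empty body; 1500 characters is an arbitrary cap.
        "body": (item.get("body") or "")[:1500],
        "labels": [label["name"] for label in item.get("labels", [])],
        "comments": item.get("comments", 0),
        "url": item["html_url"],
    }
```

Trimming each item down this early keeps the prompt sent to the scorer small, which matters when every request counts against a quota.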

scorer.py sends each issue plus a short profile of my stack to Gemini and asks for a JSON response: a score from 1-10 and a one-line reason. Anything below 6 gets dropped. The top 5 by score go to Telegram.
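The filtering half of that can be sketched without touching the Gemini API at all. The functions below are hypothetical (parse_score and filter_and_rank are not names from the repo) and assume the model replies with a JSON object containing "score" and "reason" keys, possibly surrounded by extra text.

```python
import json
from typing import Any, Optional


def parse_score(raw_text: str) -> Optional[dict[str, Any]]:
    """Extract {"score": int, "reason": str} from a model reply, or None if unusable."""
    # Take the outermost {...} span so stray text around the JSON is tolerated.
    start, end = raw_text.find("{"), raw_text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        data = json.loads(raw_text[start : end + 1])
        score = int(data["score"])
    except (ValueError, KeyError, TypeError):
        return None
    if not 1 <= score <= 10:
        return None  # out-of-range scores are treated as invalid replies
    return {"score": score, "reason": str(data.get("reason", ""))}


def filter_and_rank(
    scored: list[dict[str, Any]], min_score: int = 6, max_results: int = 5
) -> list[dict[str, Any]]:
    """Drop anything below min_score, then return the best max_results by score."""
    kept = [s for s in scored if s["score"] >= min_score]
    kept.sort(key=lambda s: s["score"], reverse=True)
    return kept[:max_results]
```

Returning None for a malformed reply means one bad response costs a single issue rather than the whole digest.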

notifier.py formats the results as an HTML message and POSTs to the Telegram Bot API. I used HTML over Telegram's Markdown format because repo names often contain underscores that break Markdown parsing.
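As a rough illustration of the HTML route (a sketch under assumptions, not the repo's notifier): format_digest and build_send_message_payload are hypothetical names, and the input dicts are assumed to carry score, title, repo, url, and reason keys. The parse_mode field and the b and a tags are part of the Telegram Bot API's sendMessage method.

```python
import html
from typing import Any


def format_digest(matches: list[dict[str, Any]]) -> str:
    """Render scored issues as Telegram-flavoured HTML."""
    blocks = []
    for m in matches:
        # Escape everything that came from GitHub or the model; underscores need no escaping in HTML.
        title = html.escape(m["title"])
        repo = html.escape(m["repo"])
        url = html.escape(m["url"], quote=True)
        reason = html.escape(m["reason"])
        score = int(m["score"])
        blocks.append(
            f"<b>[{score}/10] {title}</b>\n"
            f'repo: <a href="{url}">{repo}</a>\n'
            f"why: {reason}"
        )
    return "\n\n".join(blocks)


def build_send_message_payload(chat_id: str, text: str) -> dict[str, str]:
    """Body for POST https://api.telegram.org/bot<token>/sendMessage."""
    return {"chat_id": chat_id, "text": text, "parse_mode": "HTML"}
```

With parse_mode set to HTML, a repo name such as my_repo passes through untouched, while characters like < and & in issue titles still have to be escaped.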

lambda_function.py is thin on purpose. It just calls the three functions in order and returns a status dict.

def lambda_handler(event, context):
    raw_issues = search_issues(language="Python", max_results=8)
    issues = [extract_issue_summary(i) for i in raw_issues]
    top_matches = score_and_filter(issues, min_score=6, max_results=5)
    send_digest(top_matches)
    return {"statusCode": 200, "issuesFound": len(issues), "issuesSent": len(top_matches)}

Infrastructure is defined in a SAM template: one Lambda function, one EventBridge schedule, and one IAM execution role scoped to CloudWatch Logs only. The function only makes outbound HTTPS calls to GitHub, Gemini, and Telegram, so it doesn't need any AWS service permissions beyond logging.

Why Lambda and not EC2

This job runs for about 45 seconds, once a day. An EC2 instance would sit idle for the other 23 hours and 59 minutes of every day, burning credits. Lambda charges only for actual execution time, and at this volume, the cost sits comfortably inside AWS's permanent free tier (1M requests/month, 400K GB-seconds/month). Not the trial credits, the always-free tier. The agent costs $0.00 to run.
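The arithmetic behind that claim is quick to check. The sketch below assumes a 45-second run, one invocation a day, a 31-day month, and 256 MB of configured memory (the figure that appears in the CloudWatch line later in this post); monthly_usage is a hypothetical helper, not part of the project.

```python
FREE_REQUESTS_PER_MONTH = 1_000_000
FREE_GB_SECONDS_PER_MONTH = 400_000


def monthly_usage(duration_s: float, memory_mb: int, runs_per_day: int, days: int = 31):
    """Return (invocations, GB-seconds) for one month of scheduled runs."""
    invocations = runs_per_day * days
    # Lambda bills compute as duration multiplied by configured memory in GB.
    gb_seconds = invocations * duration_s * (memory_mb / 1024)
    return invocations, gb_seconds


invocations, gb_seconds = monthly_usage(duration_s=45, memory_mb=256, runs_per_day=1)
share = gb_seconds / FREE_GB_SECONDS_PER_MONTH
print(f"{invocations} invocations, {gb_seconds:.2f} GB-seconds, {share:.4%} of the free compute allowance")
```

Under those assumptions the job uses roughly 349 GB-seconds a month, well under 0.1% of the free compute allowance.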

I also have a trial AWS account with a finite credit budget. SAA-C03 labs need those credits more than a daily cron job does.

What broke

Three things broke, in order of how annoying they were to debug.

1. Gemini's free tier has a 20 requests/day cap on some accounts.

I knew the published number was around 1,500 requests/day for Flash-Lite. What I didn't know is that the actual limit varies by account, project, and billing state. The real number for my account turned out to be 20. I found this out by running the scorer against 15 issues back-to-back and watching every single one fail with a 429.

The error body told me everything:

"quotaId": "GenerateRequestsPerDayPerProjectPerModel-FreeTier",
"quotaValue": "20"

I'd built retry logic assuming these were per-minute rate limits, the kind you can wait out. But 20 per day means once you've hit the ceiling, waiting 5 seconds and retrying just wastes another request. I ripped the retry block out, dropped max_results from 15 to 8, and moved on. The lesson: check your actual quota from the API response body, not the docs.
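One way to act on that lesson is to read the quota ID out of the 429 body before deciding whether a retry is worth it. This is a hedged sketch, not what the repo does: it assumes only that the decoded error body contains a "quotaId" string somewhere (as in the excerpt above) and searches for it recursively rather than relying on a particular nesting. The names iter_quota_ids and should_retry_429 are invented for the example.

```python
import json
from typing import Any, Iterator


def iter_quota_ids(node: Any) -> Iterator[str]:
    """Yield every "quotaId" string found anywhere in a decoded JSON error body."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "quotaId" and isinstance(value, str):
                yield value
            else:
                yield from iter_quota_ids(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_quota_ids(item)


def should_retry_429(error_body: str) -> bool:
    """Return False when the 429 comes from a per-day quota, True otherwise."""
    try:
        quota_ids = list(iter_quota_ids(json.loads(error_body)))
    except ValueError:
        return True  # unreadable body: assume a transient limit (a design choice)
    # A daily quota will not reset within any sensible backoff window.
    return not any("PerDay" in quota_id for quota_id in quota_ids)
```

Matching on the substring "PerDay" is a heuristic based on the one quota ID shown in this post; other quota IDs may be named differently.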

2. python-dotenv crashed the Lambda function.

I use load_dotenv() locally to read .env files. Lambda doesn't use .env files; it reads environment variables from its own configuration. I'd deliberately excluded python-dotenv from the Lambda package to keep dependencies light. What I forgot to do was make the import conditional.

Runtime.ImportModuleError: Unable to import module 'lambda_function': No module named 'dotenv'

The fix was to guard the import:

try:
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass

3. GitHub's API timed out twice.

Read timeout on the search call. Not a rate limit, not an auth issue, just a slow moment on GitHub's end. The fix was bumping the timeout from 10 seconds to 20 and adding a retry with backoff:

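# Excerpt from the search function; requires `import time` and `import requests`.
last_error = None
for attempt in range(3):
    try:
        response = requests.get(url, params=params, headers=headers, timeout=20)
        response.raise_for_status()
        return response.json().get("items", [])
    except requests.exceptions.RequestException as e:
        last_error = e  # keep the failure so it can be re-raised after the last attempt
        if attempt < 2:
            time.sleep(3 * (attempt + 1))  # linear backoff: 3s, then 6s
raise last_error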

The first clean run

After the Gemini quota reset overnight, the agent ran on its EventBridge schedule for the first time without me touching anything.

CloudWatch logs:

08:00:25  INIT_START
08:00:26  Starting daily OSS contribution scan...
08:00:26  GitHub API rate limit remaining: 9
08:00:26  Found 8 candidate issues from GitHub
08:01:10  4 issues passed the scoring bar
08:01:10  Digest sent.
08:01:10  Duration: 44490ms  Billed: 44884ms  Memory: 63MB/256MB

45 seconds. 63MB peak memory. $0.00.

What landed in Telegram:

[8/10] Alerts: add email (SMTP), Slack and a generic webhook
repo: SikamikanikoBG/homelab-monitor
why: Backend focus on Python, API, and common integrations. Good fit.

[7/10] Improvements: Integration tests should be async
repo: langchain-ai/langchain-google
why: Backend focus, Python, and testing align well. Scope is clear.

[7/10] API keys shows two keys when only one is added
repo: RunestoneInteractive/rs
why: Backend bug in Python/Django with security implications.

[7/10] Add a Canada PIPEDA policy profile as a data-driven config
repo: maziyarpanahi/openmed
why: Good fit for backend, Python, and security interest.

The Django API key bug is the one I'll probably look at first. It's a backend bug with security implications, which is my exact lane, and the scope is clear enough to get into in an afternoon.

The scoring prompt

The most important thing in the whole system is PROFILE_CONTEXT, the block of text Gemini reads before scoring each issue. If this is wrong, the scores are useless.

Mine looks roughly like:

Backend developer, ~5 years experience.
Core stack: Python, Django, DRF, FastAPI, PostgreSQL, Redis, Docker.
Interested in: security-related issues (OWASP-adjacent), backend bugs,
API design issues, Django/FastAPI specifically.
NOT looking for: frontend-only issues, niche ML/data-science,
documentation-only unless very quick wins.

I also tell Gemini to score down issues with 10+ comments (likely already claimed) and issues with empty bodies (hard to act on without knowing what's actually wanted).
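Putting the profile and those two rules together, the prompt assembly might look something like the sketch below. The wording of the instructions and the build_scoring_prompt helper are assumptions made for illustration; only PROFILE_CONTEXT and the two score-down rules come from the description above.

```python
from typing import Any

PROFILE_CONTEXT = """\
Backend developer, ~5 years experience.
Core stack: Python, Django, DRF, FastAPI, PostgreSQL, Redis, Docker.
Interested in: security-related issues (OWASP-adjacent), backend bugs,
API design issues, Django/FastAPI specifically.
NOT looking for: frontend-only issues, niche ML/data-science,
documentation-only unless very quick wins."""


def build_scoring_prompt(issue: dict[str, Any]) -> str:
    """Combine the profile, the scoring rules, and one issue into a single prompt."""
    labels = ", ".join(issue["labels"]) or "none"
    body = issue["body"].strip() or "(empty body)"
    return (
        f"{PROFILE_CONTEXT}\n\n"
        "Score the GitHub issue below from 1 to 10 for how well it fits this profile.\n"
        "Score down issues with 10 or more comments (likely already claimed) "
        "and issues with an empty body.\n"
        'Reply with JSON only: {"score": <integer>, "reason": "<one line>"}\n\n'
        f"Title: {issue['title']}\n"
        f"Repo: {issue['repo']}\n"
        f"Labels: {labels}\n"
        f"Comments: {issue['comments']}\n"
        f"Body: {body}"
    )
```

Passing the comment count and an explicit empty-body marker as plain fields gives the model something concrete to apply the score-down rules to.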

Deployment

It's a SAM template, so deployment is:

sam build
sam deploy --guided

The first run asks for the stack name, region, and the three secrets (Gemini key, Telegram bot token, chat ID). Subsequent runs are just sam build && sam deploy. SAM creates the Lambda function, IAM role, and EventBridge schedule as a single CloudFormation stack.

The repo is at github.com/Jpeg-create/OSS-Contrib-Scout if you want to run your own version. The PROFILE_CONTEXT in scorer.py is the main thing you'd change.

What's next

Two more agents are in progress using the same architecture: OSS-Solution-Drafter (on-demand fix drafting via DeepSeek when I decide to pursue an issue) and Job-Scout (daily job listing digest with tailored cover letter drafts). Both will get their own posts once they're running.

The code is straightforward. The interesting parts were the quota discovery and working out which retry logic actually helps versus which makes things worse. Building it taught me more about Lambda, EventBridge, and SAM than any course section covering the same topics, which was the point.