Kiran H

Posted on Mar 23

Nobody Talks About the Person Who Stops the Team from Shipping a Broken Demo

#ai #github #microsoft #python

By: Kiran H — QA, Documentation & GitHub

Hindsight Hackathon — Team 1/0 Coders

The night before the demo, I found a hardcoded API key sitting in a committed file. One grep command. Thirty seconds. That's the kind of thing that ends a project's credibility before a judge reads a single line of code.

Everyone talks about the engineers who built the memory module, wired the LLM, and designed the execution engine. Nobody talks about the person who made sure none of it fell apart before the judges saw it.

That was my job. And it turned out to be just as technical — and just as important — as any of the features we shipped.

What My Role Actually Meant

On paper, my tasks were: write the README, clean up GitHub, run QA tests, handle the submission checklist. Sounds administrative. It wasn't.

In practice it meant:

Being the first person to read the codebase as an outsider
Finding the gaps between what the code does and what the docs say it does
Running the full pipeline end to end before anyone else did
Making sure no secrets, no broken imports, no hardcoded paths made it to main

If I missed something, the judges would find it. And judges don't give partial credit for 'it works on my machine.'

The README Is Not Documentation. It's a First Impression.

I've seen good projects lose to average ones because the README was a disaster. Judges are busy. They spend maybe 90 seconds on your repo before deciding if it's worth their full attention.

Our README had to answer five questions in those 90 seconds:

What does this project do?
How does it use Hindsight? (judges specifically check this)
How do I run it?
What are the API endpoints?
Who built what?

I structured it around those five questions. Not around the code. Not around how we built it. Around what a judge needs to know.

The Hindsight section was the most important. The judges are evaluating memory integration — so I made that impossible to miss:

```markdown
## How Hindsight Memory Works

Every code submission triggers a full memory pipeline:

1. signal_tracker.py captures behavioral signals
   (time taken, edit count, error types, attempt number)
2. cognitive_analyzer.py converts signals into patterns
   (rushing, overthinking, guessing, concept_gap)
3. hindsight.py stores patterns to Hindsight Cloud
   via retain() with user_id metadata
4. On next session, recall() retrieves the user profile
   and reflect() generates adaptive instructions for the LLM
```

Clear. Sequential. No jargon. A judge who has never seen our code can understand the memory architecture in 30 seconds.
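To make steps 1 and 2 concrete, here is a rule-based sketch of turning signals into pattern labels. The field names mirror the signals in the README excerpt, but every threshold is a made-up placeholder rather than the logic in cognitive_analyzer.py. The plain dict keyed by user_id only stands in for step 3; it does not model Hindsight's retain() or recall().

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Step 1: the behavioral signals listed in the README excerpt."""
    time_taken_s: float
    edit_count: int
    error_types: list[str]
    attempt_number: int


def detect_patterns(s: Signals) -> list[str]:
    """Step 2 sketch: map signals to pattern labels.

    HYPOTHETICAL thresholds throughout; not the rules in cognitive_analyzer.py.
    """
    patterns = []
    if s.time_taken_s < 60 and s.error_types:
        patterns.append("rushing")        # fast submission that still errored
    if s.time_taken_s > 900 and s.edit_count > 30:
        patterns.append("overthinking")   # long session, many edits
    if s.attempt_number >= 3 and s.edit_count <= 2:
        patterns.append("guessing")       # repeated attempts, barely changed
    if len(s.error_types) != len(set(s.error_types)):
        patterns.append("concept_gap")    # the same error type keeps recurring
    return patterns


# Stand-in for step 3: a plain dict keyed by user_id, NOT the Hindsight client.
memory: dict[str, list[str]] = {}


def store(user_id: str, patterns: list[str]) -> None:
    memory.setdefault(user_id, []).extend(patterns)
```

For example, Signals(time_taken_s=40, edit_count=1, error_types=["IndexError", "IndexError"], attempt_number=3) comes out as ["rushing", "guessing", "concept_gap"].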

The .gitignore Problem

Before I touched the README, I ran one command:

```bash
git log --all --full-history -- .env
```

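I was checking whether anyone had ever committed a .env file. Nobody had, but memory_data.json was being tracked. That file contains real behavioral data from testing sessions: user IDs, patterns, session timestamps. Not something that belongs in a public repo. Ignoring a file git already tracks takes two steps, because .gitignore only applies to untracked files: add the pattern, then drop the file from the index with git rm --cached memory_data.json (the local copy stays). Earlier commits still contain it unless history is rewritten.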

I updated .gitignore immediately:

```gitignore
# Secrets
.env

# Runtime data
memory_data.json

# Python
__pycache__/
*.pyc
.venv/

# Node
node_modules/
dist/
```
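To catch any other file in memory_data.json's position (tracked, yet matching an ignore pattern), git has a built-in check: git ls-files -ci --exclude-standard lists tracked files that match the ignore rules. The sketch below approximates the same idea in pure Python with fnmatch. It is deliberately simplified: negation (!), anchored patterns, and ** are not handled, so treat it as an illustration of the check, not a gitignore implementation.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def parse_gitignore(text: str) -> list[str]:
    """Keep non-empty, non-comment lines as patterns."""
    return [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]


def tracked_but_ignored(tracked: list[str], patterns: list[str]) -> list[str]:
    """Return tracked paths that match an ignore pattern.

    Simplified matching: a pattern ending in "/" matches any directory
    component; any other pattern matches any path component.
    """
    flagged = []
    for path in tracked:
        parts = PurePosixPath(path).parts
        for pat in patterns:
            if pat.endswith("/"):
                hit = any(fnmatchcase(d, pat[:-1]) for d in parts[:-1])
            else:
                hit = any(fnmatchcase(p, pat) for p in parts)
            if hit:
                flagged.append(path)
                break
    return flagged
```

In a real repo, tracked would be the lines printed by git ls-files. With the .gitignore above, a tracked memory_data.json or app/__pycache__/x.pyc would be flagged, while main.py would not.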

Then I checked every Python file for hardcoded API keys:

```bash
grep -r "sk-" . --include="*.py"
grep -r "gsk_" . --include="*.py"
grep -r "API_KEY" . --include="*.py" | grep -v "os.getenv"
```

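One hit: a Groq key sitting in an old test file that never got cleaned up. I removed it, rotated the key, and committed the fix. Thirty seconds of work that would have been a very bad moment in front of judges. Rotating was the step that actually mattered: deleting the line doesn't erase the key from earlier commits, so only revoking it makes the leaked copy useless.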
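The same three checks fit in a short Python script that can run before every push. This is a sketch: the sk- (OpenAI-style) and gsk_ (Groq) prefixes come from the grep commands, but the word boundary and 16-character minimum are assumptions meant to cut false positives like "task-", and skipping .venv mirrors the .gitignore. The API_KEY rule copies the third grep, so lines that read keys through os.getenv are not flagged.

```python
import re
from pathlib import Path

# sk- and gsk_ prefixes from the grep sweep. The \b and the 16-character
# minimum are assumptions to avoid matching words such as "task-".
KEY_PATTERN = re.compile(r"\b(?:sk-|gsk_)[A-Za-z0-9_-]{16,}")


def scan_for_keys(root: str) -> list[tuple[str, int, str]]:
    """Return (file, line number, line) for every suspicious line in *.py files."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        rel = path.relative_to(root)
        if ".venv" in rel.parts:  # skip the virtualenv, as .gitignore does
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            hardcoded = KEY_PATTERN.search(line)
            # Same rule as: grep "API_KEY" | grep -v "os.getenv"
            unguarded = "API_KEY" in line and "os.getenv" not in line
            if hardcoded or unguarded:
                hits.append((rel.as_posix(), lineno, line.strip()))
    return hits
```

Running scan_for_keys(".") from the repo root and failing the push on any hit turns the one-off sweep into a habit.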

QA Testing: Finding Bugs Before the Judges Do

I ran the full 5-step integration test manually, documenting every response:

```text
Step 1: GET /get_problem/p001 → 200 OK
Step 2: POST /submit_code → 200 OK, patterns detected
Step 3: GET /memory/recall/{user_id} → patterns stored in Hindsight
Step 4: GET /user_profile/{user_id} → profile populated
Step 5: POST /get_feedback → personalized hint returned
```
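Those five steps can be scripted so the check is repeatable rather than manual. Here is a standard-library sketch. The request bodies for the two POST endpoints are hypothetical (the post doesn't show the API's schema), the demo user ID is made up, and only status codes are checked. The sender is passed in so the harness can be exercised without a live server.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Optional

USER_ID = "demo-user"  # hypothetical test user

# (method, path, JSON body, expected status). Bodies are HYPOTHETICAL:
# the post doesn't show the API's request schema.
STEPS = [
    ("GET", "/get_problem/p001", None, 200),
    ("POST", "/submit_code", {"user_id": USER_ID, "problem_id": "p001", "code": "..."}, 200),
    ("GET", f"/memory/recall/{USER_ID}", None, 200),
    ("GET", f"/user_profile/{USER_ID}", None, 200),
    ("POST", "/get_feedback", {"user_id": USER_ID, "problem_id": "p001"}, 200),
]

Sender = Callable[[str, str, Optional[dict]], int]


def http_sender(base_url: str) -> Sender:
    """Build a sender that makes a real HTTP request and returns the status."""
    def send(method: str, path: str, body: Optional[dict]) -> int:
        data = json.dumps(body).encode() if body is not None else None
        req = urllib.request.Request(
            base_url + path, data=data, method=method,
            headers={"Content-Type": "application/json"},
        )
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                return resp.status
        except urllib.error.HTTPError as err:
            return err.code  # urllib raises on 4xx/5xx; keep the status
    return send


def run_smoke_test(steps, send: Sender) -> list[str]:
    """Run the steps in order and describe every status mismatch."""
    failures = []
    for i, (method, path, body, expected) in enumerate(steps, start=1):
        status = send(method, path, body)
        if status != expected:
            failures.append(f"Step {i}: {method} {path} -> {status}, expected {expected}")
    return failures
```

Against a local server this would be run_smoke_test(STEPS, http_sender("http://localhost:8000")), where the port is an assumption. An edge case like a nonexistent problem ID fits the same table as one more row expecting 404.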

Then I tested the edge cases — the things developers forget to test because they're too close to their own code:

Infinite loop submission → should timeout in 5 seconds, not hang forever
Problem ID that doesn't exist → should return 404, not 500
New user with no memory → should return default profile, not crash
GET /problems/difficulty/hard → should return filtered list, not all problems
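The last three edge cases come down to a few handler habits, sketched here framework-agnostically with hypothetical in-memory data and (status, body) tuples. Look up with .get() and answer 404 instead of letting a KeyError surface as a 500 (in FastAPI that branch would typically raise HTTPException(status_code=404)). Hand new users a fresh copy of a default profile. Filter on the requested difficulty.

```python
import copy

# HYPOTHETICAL in-memory data; the real app's storage isn't shown in the post.
PROBLEMS = {
    "p001": {"id": "p001", "difficulty": "easy"},
    "p002": {"id": "p002", "difficulty": "hard"},
}
PROFILES: dict[str, dict] = {}
DEFAULT_PROFILE = {"patterns": [], "sessions": 0}


def get_problem(problem_id: str) -> tuple[int, dict]:
    """Unknown IDs get a 404; PROBLEMS[problem_id] would be an unhandled KeyError (a 500)."""
    problem = PROBLEMS.get(problem_id)
    if problem is None:
        return 404, {"detail": f"Problem {problem_id} not found"}
    return 200, problem


def get_user_profile(user_id: str) -> tuple[int, dict]:
    """New users with no memory get a default profile instead of an error."""
    profile = PROFILES.get(user_id)
    if profile is None:
        # deepcopy so one caller mutating "patterns" can't alter the shared default
        profile = copy.deepcopy(DEFAULT_PROFILE)
    return 200, profile


def problems_by_difficulty(level: str) -> list[dict]:
    """Return only problems at the requested difficulty, not the full list."""
    return [p for p in PROBLEMS.values() if p["difficulty"] == level]
```

The deepcopy is the subtle part: returning DEFAULT_PROFILE itself would let the first new user's patterns leak into every later one.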

The infinite loop test was the most important. If a user submits:

```python
def two_sum(nums, target):
    while True:
        pass
```

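The execution service needs to kill that within 5 seconds. If it doesn't, the server hangs and every other user's request is blocked. It worked: the threading timeout in execution_service.py caught it cleanly. One caveat worth knowing: Python's threading module has no way to stop a thread from outside, so a thread-based timeout can stop waiting for the result but can't stop the loop itself. The runaway thread keeps burning CPU until the server process exits, which is why running submissions in a separate, killable process is the sturdier option.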
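Because a child process, unlike a thread, can be killed from outside, a process-based runner is one way to guarantee the loop actually stops. This sketch is not the team's execution_service.py. It relies on subprocess.run() with timeout=, which kills the child and then raises TimeoutExpired once the limit passes. A subprocess limits time only; it is not a sandbox, so the submitted code still has whatever file and network access the server user has.

```python
import subprocess
import sys


def run_submission(code: str, timeout_s: float = 5.0) -> dict:
    """Run submitted code in a child Python process with a hard time limit.

    A sketch of a process-based runner, not the team's execution_service.py.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # on expiry, run() kills the child, then raises
        )
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "detail": f"exceeded {timeout_s} seconds"}
    if result.returncode != 0:
        return {"status": "error", "stderr": result.stderr}
    return {"status": "ok", "stdout": result.stdout}
```

A bare def never loops, so a runner would append a call (for example two_sum([2, 7, 11, 15], 9)) before executing. With that call added, the submission above returns a "timeout" status after about 5 seconds and leaves no process spinning behind it.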

Making the Repo Public Without Embarrassing the Team

Making a repo public isn't just flipping a switch. It means every commit, every branch name, every comment in every file is now visible to anyone.

I did a final sweep before flipping it:

Checked all branch names were professional
Verified the commit history told a clean story
Confirmed requirements.txt matched what's actually installed
Added repo description and topic tags: fastapi, groq, hindsight, ai, python, coding-mentor

The topic tags matter more than people think. They make the repo discoverable. Judges browsing Hindsight hackathon submissions can find us through tags.
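The requirements.txt item from that sweep can be checked mechanically with importlib.metadata. This sketch only compares exact name==version pins, strips comments, environment markers, and extras, and skips everything else (ranges, URLs, -r includes). The version lookup is injectable so the function can be tested without touching the real environment.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable


def check_pins(requirements_text: str,
               get_version: Callable[[str], str] = version) -> list[str]:
    """Compare exact name==version pins with installed versions.

    Returns one message per mismatch. Lines without '==' are skipped.
    """
    problems = []
    for raw in requirements_text.splitlines():
        # Drop comments and environment markers before parsing.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue
        name, pinned = (part.strip() for part in line.split("==", 1))
        name = name.split("[", 1)[0]  # "uvicorn[standard]" -> "uvicorn"
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: pinned {pinned}, not installed")
            continue
        if installed != pinned:
            problems.append(f"{name}: pinned {pinned}, installed {installed}")
    return problems
```

Calling check_pins(open("requirements.txt").read()) inside the project's virtualenv returns an empty list when the pins match what's installed.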

Then I merged dev into main with a single clean commit:

"feat: complete backend MVP — AI coding mentor with Hindsight memory"

One commit message. Tells the whole story. Clean history.
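The feat: prefix follows the Conventional Commits convention. As for landing dev as a single commit, one way to get that is git merge --squash dev followed by git commit. A small check like the one below can reject 'fix stuff'-style subjects before they land, for example from a commit-msg hook. The list of allowed types is an assumption: the spec itself defines only feat and fix, and the rest follow the common Angular-derived set.

```python
import re

# Conventional Commits defines feat and fix; the other types are an assumption
# borrowed from the widely used Angular-derived set.
TYPES = ("feat", "fix", "docs", "style", "refactor", "perf",
         "test", "build", "ci", "chore", "revert")

# type, optional (scope), optional "!" for breaking changes, then ": " and text
SUBJECT = re.compile(r"^(?:" + "|".join(TYPES) + r")(?:\([\w-]+\))?!?: \S")


def is_conventional(message: str) -> bool:
    """Check the subject (first line) of a commit message."""
    lines = message.splitlines()
    return bool(lines) and SUBJECT.match(lines[0]) is not None
```

For instance, is_conventional("fix stuff") is False, while the merge commit message above passes.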

The Submission Checklist Nobody Skips

The night before submission I went through every item:

GitHub repo is public
README explains Hindsight usage clearly
.env.example has all keys, no real values
All branches merged to main
Demo video link in README
Article links in README
requirements.txt complete and updated
No .env files committed
No hardcoded API keys anywhere
memory_data.json in .gitignore

Every item on that list is something that has cost a team points in a real submission. Not hypothetically — actually. A missing .env.example means a judge can't run your project. A private repo means they can't even see it.
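Much of that list can be checked by a script instead of by memory. This sketch covers the file-level items: .gitignore entries, .env.example values, a README that mentions Hindsight, and requirements.txt existing. The real-looking-key test reuses the sk- and gsk_ prefixes from the grep sweep and is an assumption, not an exhaustive secret detector. Items that live on GitHub, like repo visibility and merged branches, still need a human.

```python
import re
from pathlib import Path

# Values starting with these prefixes look like real keys (same prefixes as the
# grep sweep). An assumption for illustration, not an exhaustive secret detector.
REAL_KEY = re.compile(r"^(?:sk-|gsk_)\S+")


def check_submission(root: str) -> list[str]:
    """Return the file-level checklist items that fail for the repo at root."""
    repo = Path(root)
    failures = []

    gitignore = repo / ".gitignore"
    entries = set()
    if gitignore.exists():
        entries = {line.strip() for line in gitignore.read_text(encoding="utf-8").splitlines()}
    for entry in (".env", "memory_data.json"):
        if entry not in entries:
            failures.append(f".gitignore is missing {entry}")

    example = repo / ".env.example"
    if not example.exists():
        failures.append(".env.example is missing")
    else:
        for line in example.read_text(encoding="utf-8").splitlines():
            if line.lstrip().startswith("#"):
                continue
            key, sep, value = line.partition("=")
            if sep and REAL_KEY.match(value.strip().strip("\"'")):
                failures.append(f".env.example has a real-looking value for {key.strip()}")

    readme = repo / "README.md"
    if not readme.exists() or "hindsight" not in readme.read_text(encoding="utf-8").lower():
        failures.append("README.md does not mention Hindsight")

    if not (repo / "requirements.txt").exists():
        failures.append("requirements.txt is missing")
    return failures
```

An empty list from check_submission(".") means the file-level items pass; anything else names the item to fix before submitting.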

What I Learned

_Documentation is a product._
A judge reading your README is a user. Design it for them, not for yourself. Answer their questions in the order they'll ask them.
_Security is a QA task, not a DevOps task._
Checking for hardcoded secrets takes 5 minutes. Rotating a leaked key and explaining it to your team takes much longer.
_Edge cases are where judges look._
The happy path works for everyone. What happens with an infinite loop, a missing user, a bad problem ID — that's what separates a demo from a product.
_Clean history tells a story._
A judge reading your commit log should understand what you built and in what order. One clear commit message beats ten 'fix stuff' commits every time.
_The quality gate role is a real engineering role._
It requires understanding the full system, reading code you didn't write, and making judgment calls about what's acceptable to ship. That's not admin work. That's engineering.