By: Kiran H — QA, Documentation & GitHub
Hindsight Hackathon — Team 1/0 Coders
The night before the demo, I found a hardcoded API key sitting in a committed file. One grep command. Thirty seconds. That's the kind of thing that ends a project's credibility before a judge reads a single line of code.
Everyone talks about the engineers who built the memory module, wired the LLM, and designed the execution engine. Nobody talks about the person who made sure none of it fell apart before the judges saw it.
That was my job. And it turned out to be just as technical — and just as important — as any of the features we shipped.
What My Role Actually Meant
On paper, my tasks were: write the README, clean up GitHub, run QA tests, handle the submission checklist. Sounds administrative. It wasn't.
In practice it meant:
- Being the first person to read the codebase as an outsider
- Finding the gaps between what the code does and what the docs say it does
- Running the full pipeline end to end before anyone else did
- Making sure no secrets, no broken imports, no hardcoded paths made it to main
If I missed something, the judges would find it. And judges don't give partial credit for 'it works on my machine.'
The README Is Not Documentation. It's a First Impression.
I've seen good projects lose to average ones because the README was a disaster. Judges are busy. They spend maybe 90 seconds on your repo before deciding if it's worth their full attention.
Our README had to answer five questions in that 90 seconds:
- What does this project do?
- How does it use Hindsight? (judges specifically check this)
- How do I run it?
- What are the API endpoints?
- Who built what?
I structured it around those five questions. Not around the code. Not around how we built it. Around what a judge needs to know.
The Hindsight section was the most important. The judges are evaluating memory integration — so I made that impossible to miss:
## How Hindsight Memory Works
Every code submission triggers a full memory pipeline:
1. signal_tracker.py captures behavioral signals
(time taken, edit count, error types, attempt number)
2. cognitive_analyzer.py converts signals into patterns
(rushing, overthinking, guessing, concept_gap)
3. hindsight.py stores patterns to Hindsight Cloud
via retain() with user_id metadata
4. On next session, recall() retrieves the user profile
and reflect() generates adaptive instructions for the LLM
Clear. Sequential. No jargon. A judge who has never seen our code can understand the memory architecture in 30 seconds.
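The four README steps map onto a small amount of code. The sketch below is illustrative only, and several parts of it are assumptions:

- The `Signals` fields follow step 1.
- Every threshold in `detect_patterns` is a hypothetical placeholder.
- `InMemoryMemory` is a stand-in whose `retain`/`recall`/`reflect` methods only echo the step names. It is not the Hindsight client API and never talks to Hindsight Cloud.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Signals:
    """Step 1: behavioral signals captured for one submission."""
    time_taken_s: float
    edit_count: int
    error_types: list = field(default_factory=list)  # e.g. ["IndexError"]
    attempt_number: int = 1


def detect_patterns(s: Signals) -> list:
    """Step 2: map signals to pattern labels (HYPOTHETICAL thresholds)."""
    patterns = []
    if s.time_taken_s < 60 and s.error_types:
        patterns.append("rushing")        # fast submission that still errored
    if s.time_taken_s > 900 and s.edit_count > 40:
        patterns.append("overthinking")   # long session with heavy editing
    if s.attempt_number >= 4 and s.edit_count <= 2:
        patterns.append("guessing")       # resubmitting with barely any edits
    if any(n >= 2 for n in Counter(s.error_types).values()):
        patterns.append("concept_gap")    # the same error type keeps recurring
    return patterns


class InMemoryMemory:
    """Steps 3-4 stand-in. NOT the Hindsight client: nothing leaves the process."""

    def __init__(self):
        self._by_user = {}

    def retain(self, user_id: str, patterns: list) -> None:
        self._by_user.setdefault(user_id, []).extend(patterns)

    def recall(self, user_id: str) -> list:
        return list(self._by_user.get(user_id, []))

    def reflect(self, user_id: str) -> str:
        """Turn the most frequent stored pattern into an instruction for the LLM."""
        history = self.recall(user_id)
        if not history:
            return "No history yet: give a neutral, general hint."
        top, _ = Counter(history).most_common(1)[0]
        return f"User tends toward '{top}': tailor the hint to address it."


memory = InMemoryMemory()
signals = Signals(time_taken_s=45, edit_count=1,
                  error_types=["IndexError", "IndexError"], attempt_number=2)
memory.retain("user-1", detect_patterns(signals))
print(memory.recall("user-1"))  # ['rushing', 'concept_gap']
```

Keeping detection (step 2) separate from storage (steps 3-4) means the pattern logic can be unit-tested without any network calls.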
The .gitignore Problem
Before I touched the README, I ran one command:
git log --all --full-history -- .env
I was checking if anyone had ever committed a .env file. They hadn't — but memory_data.json was being tracked. That file contains real user behavioral data from testing sessions. User IDs, patterns, session timestamps. Not something that should be in a public repo.
I updated .gitignore immediately:
# Secrets
.env
# Runtime data
memory_data.json
# Python
__pycache__/
*.pyc
.venv/
# Node
node_modules/
dist/
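One caveat: `.gitignore` only applies to untracked files. A file that is already tracked, as `memory_data.json` was, stays tracked until it is removed from the index, for example with `git rm --cached memory_data.json`. Two git commands help here:

- `git ls-files --cached --ignored --exclude-standard` lists tracked files that match the ignore rules.
- `git check-ignore -v <path>` shows which rule matches a given path.

For a quick sanity check of the patterns above without git, here is a rough Python approximation. It handles only plain names, `*` globs and trailing-slash directory patterns. It does not handle negation (`!`), anchored paths or `**`.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

GITIGNORE = """\
# Secrets
.env
# Runtime data
memory_data.json
# Python
__pycache__/
*.pyc
.venv/
# Node
node_modules/
dist/
"""


def parse_patterns(text: str) -> list:
    """Keep non-blank, non-comment lines as patterns."""
    return [ln.strip() for ln in text.splitlines()
            if ln.strip() and not ln.strip().startswith("#")]


def is_ignored(path: str, patterns: list) -> bool:
    """Rough approximation of gitignore matching for a file path."""
    parts = PurePosixPath(path).parts
    for pat in patterns:
        if pat.endswith("/"):
            # Directory pattern: matches when any parent directory has that name.
            if any(fnmatchcase(p, pat.rstrip("/")) for p in parts[:-1]):
                return True
        elif any(fnmatchcase(p, pat) for p in parts):
            # Slash-free pattern: matches a file or directory name at any depth.
            return True
    return False


patterns = parse_patterns(GITIGNORE)
print(is_ignored("backend/memory_data.json", patterns))  # True
print(is_ignored(".env.example", patterns))              # False: stays committed
```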
Then I checked every file for hardcoded API keys:
grep -r "sk-" . --include="*.py"
grep -r "gsk_" . --include="*.py"
grep -r "API_KEY" . --include="*.py" | grep -v "os.getenv"
One hit. A Groq key sitting in an old test file that never got cleaned up. Removed it, rotated the key, committed the fix. Thirty seconds of work that would have been a very bad moment in front of judges.
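The three greps can also live in a reusable script, for example in a pre-commit hook. This sketch mirrors them in Python, with a few assumptions:

- The `sk-`/`gsk_` regex assumes a key is the prefix followed by 16 or more token characters. That is a guess about key shape, not a documented format.
- Lines that read from `os.getenv` or `os.environ` are skipped, like the `grep -v` filter.
- `.venv`, `node_modules` and `.git` are skipped to cut noise.

Like grep, it only sees the working tree. A key that was ever committed also lives in git history, which is why rotating the key matters more than deleting the line.

```python
import re
from pathlib import Path

# Key-shape ASSUMPTION: a known prefix followed by 16+ token characters.
KEY_PREFIX = re.compile(r"\b(?:sk-|gsk_)[A-Za-z0-9_\-]{16,}")
# A string literal assigned to anything ending in API_KEY.
API_KEY_LITERAL = re.compile(r"""API_KEY\s*=\s*["'][^"']+["']""")
SKIP_DIRS = {".git", ".venv", "node_modules"}


def scan_text(text: str) -> list:
    """Return (line_number, line) pairs that look like hardcoded secrets."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if "os.getenv" in line or "os.environ" in line:
            continue  # same idea as `grep -v "os.getenv"`
        if KEY_PREFIX.search(line) or API_KEY_LITERAL.search(line):
            hits.append((lineno, line.strip()))
    return hits


def scan_tree(root: str = ".") -> dict:
    """Scan every *.py file under root, skipping vendored/tooling directories."""
    findings = {}
    for path in Path(root).rglob("*.py"):
        if SKIP_DIRS.intersection(path.parts):
            continue
        hits = scan_text(path.read_text(encoding="utf-8", errors="ignore"))
        if hits:
            findings[str(path)] = hits
    return findings
```

Calling `scan_tree()` from the repository root returns a dict that maps each flagged file to its `(line_number, line)` hits. The 16-character minimum keeps strings like `sk-learn` from being flagged.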
QA Testing: Finding Bugs Before the Judges Do
I ran the full 5-step integration test manually, documenting every response:
Step 1: GET /get_problem/p001 → 200 OK
Step 2: POST /submit_code → 200 OK, patterns detected
Step 3: GET /memory/recall/{user_id} → patterns stored in Hindsight
Step 4: GET /user_profile/{user_id} → profile populated
Step 5: POST /get_feedback → personalized hint returned
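Replaying the five steps as a script makes the check repeatable. This is a minimal sketch. `send` is an injected callable, so the sequence can run without a live server; in practice it might wrap `requests` or FastAPI's `TestClient`. The request payload field names are hypothetical, since the actual schemas aren't shown here.

```python
from typing import Callable, Optional, Tuple

# send(method, path, json_body) -> (status_code, response_json)
Send = Callable[[str, str, Optional[dict]], Tuple[int, dict]]


def run_integration(send: Send, user_id: str = "demo-user") -> list:
    """Replay the five manual steps in order; stop at the first non-200."""
    submission = "def two_sum(nums, target):\n    return []\n"
    steps = [
        ("GET", "/get_problem/p001", None),
        # Payload field names below are HYPOTHETICAL.
        ("POST", "/submit_code",
         {"user_id": user_id, "problem_id": "p001", "code": submission}),
        ("GET", f"/memory/recall/{user_id}", None),
        ("GET", f"/user_profile/{user_id}", None),
        ("POST", "/get_feedback", {"user_id": user_id, "problem_id": "p001"}),
    ]
    log = []
    for method, path, body in steps:
        status, payload = send(method, path, body)
        log.append((method, path, status))
        if status != 200:
            raise RuntimeError(f"{method} {path} -> {status}: {payload}")
    return log
```

The returned log doubles as the "documenting every response" record, and stopping at the first failure shows exactly which step broke.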
Then I tested the edge cases — the things developers forget to test because they're too close to their own code:
- Infinite loop submission → should time out after 5 seconds, not hang forever
- Problem ID that doesn't exist → should return 404, not 500
- New user with no memory → should return default profile, not crash
- GET /problems/difficulty/hard → should return filtered list, not all problems
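The expected behavior for these edge cases can be pinned down in plain functions that return `(status, body)` pairs. The sample problems, the default profile shape and the function names are all hypothetical. In a FastAPI app, the 404 branch would typically be `raise HTTPException(status_code=404, detail=...)` rather than a returned tuple.

```python
# Sample data and function names are HYPOTHETICAL, for illustration only.
PROBLEMS = {
    "p001": {"id": "p001", "difficulty": "easy"},
    "p002": {"id": "p002", "difficulty": "medium"},
    "p003": {"id": "p003", "difficulty": "hard"},
}
PROFILES = {}  # user_id -> stored profile


def default_profile() -> dict:
    # A fresh dict each time, so one user's updates never leak into another's.
    return {"patterns": [], "sessions": 0}


def get_problem(problem_id: str):
    problem = PROBLEMS.get(problem_id)
    if problem is None:
        return 404, {"detail": f"Problem {problem_id!r} not found"}  # not a 500
    return 200, problem


def get_user_profile(user_id: str):
    # New users get the default profile instead of a KeyError (which becomes a 500).
    return 200, PROFILES.get(user_id, default_profile())


def problems_by_difficulty(level: str):
    # Filter, rather than returning the whole catalogue.
    return 200, [p for p in PROBLEMS.values() if p["difficulty"] == level]
```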
The infinite loop test was the most important. If a user submits:
def two_sum(nums, target):
    while True:
        pass
The execution service needs to kill that in 5 seconds. If it doesn't, the server hangs and every other user's request is blocked. It worked — the threading timeout in execution_service.py caught it cleanly.
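A note on thread-based timeouts: CPython cannot forcibly stop a thread. `thread.join(timeout=5)` returns after five seconds, but a `while True: pass` loop keeps running in the background and burning CPU.

The sketch below swaps in a different technique. It runs the submission in a child process with `subprocess.run(..., timeout=...)`, which kills the child when the timeout expires. This is not the project's `execution_service.py`; the function name and return shape are hypothetical. It also adds no sandboxing (no memory, file or network limits).

```python
import subprocess
import sys

# The submission above, plus the call a test harness would make.
INFINITE_LOOP = (
    "def two_sum(nums, target):\n"
    "    while True:\n"
    "        pass\n"
    "two_sum([2, 7, 11, 15], 9)\n"
)


def run_submission(code: str, timeout_s: float = 5.0) -> dict:
    """Run untrusted code in a separate Python process with a hard time limit."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # on expiry the child is killed, then TimeoutExpired is raised
        )
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "stdout": "", "stderr": f"exceeded {timeout_s}s"}
    status = "ok" if result.returncode == 0 else "error"
    return {"status": status, "stdout": result.stdout, "stderr": result.stderr}
```

`run_submission(INFINITE_LOOP)` comes back with `"status": "timeout"` after roughly five seconds, and the looping process is terminated rather than left running.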
Making the Repo Public Without Embarrassing the Team
Making a repo public isn't just flipping a switch. It means every commit, every branch name, every comment in every file is now visible to anyone.
I did a final sweep before flipping it:
- Checked all branch names were professional
- Verified the commit history told a clean story
- Confirmed requirements.txt matched what's actually installed
- Added repo description and topic tags: fastapi, groq, hindsight, ai, python, coding-mentor
The topic tags matter more than people think. They make the repo discoverable. Judges browsing Hindsight hackathon submissions can find us through tags.
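The "requirements.txt matches what's installed" check can be scripted with `importlib.metadata`. This sketch only compares exact `name==version` pins. It strips extras and environment markers and skips other specifiers. The `installed_version` parameter exists so the check can be tested with a fake lookup.

```python
from importlib import metadata


def parse_pins(text: str) -> dict:
    """Collect exact `name==version` pins; other specifiers are skipped."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()  # drop comments, markers
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.split("[", 1)[0].strip()] = version.strip()  # drop extras like [standard]
    return pins


def check_pins(pins: dict, installed_version=metadata.version) -> list:
    """Return human-readable mismatches between pins and the current environment."""
    problems = []
    for name, wanted in pins.items():
        try:
            have = installed_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (pinned {wanted})")
            continue
        if have != wanted:
            problems.append(f"{name}: installed {have}, pinned {wanted}")
    return problems
```

An empty list from `check_pins(parse_pins(Path("requirements.txt").read_text()))` means every pin matches the active environment.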
Then I merged dev into main with a single clean commit:
"feat: complete backend MVP — AI coding mentor with Hindsight memory"
One commit message. Tells the whole story. Clean history.
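The message above follows the Conventional Commits `type: description` shape. A tiny check like this one could run in a `commit-msg` hook. The list of allowed types is borrowed from commitlint's `config-conventional` preset, which is an assumption: the Conventional Commits spec itself only defines `feat` and `fix` and permits other types.

```python
import re

# Allowed types follow commitlint's config-conventional preset (an ASSUMPTION).
CONVENTIONAL_SUBJECT = re.compile(
    r"^(build|chore|ci|docs|feat|fix|perf|refactor|revert|style|test)"  # type
    r"(\([\w.\-]+\))?"  # optional scope, e.g. fix(api)
    r"!?"               # optional breaking-change marker
    r": \S"             # colon, one space, then the description
)


def is_conventional(subject: str) -> bool:
    """True if the first line of a commit message has the `type: description` shape."""
    return CONVENTIONAL_SUBJECT.match(subject) is not None
```

Rejecting a subject like `fix stuff` at commit time is cheaper than rewriting history before the repo goes public.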
The Submission Checklist Nobody Skips
The night before submission I went through every item:
- GitHub repo is public
- README explains Hindsight usage clearly
- .env.example has all keys, no real values
- All branches merged to main
- Demo video link in README
- Article links in README
- requirements.txt complete and updated
- No .env files committed
- No hardcoded API keys anywhere
- memory_data.json in .gitignore
Every item on that list is something that has cost a team points in a real submission. Not hypothetically — actually. A missing .env.example means a judge can't run your project. A private repo means they can't even see it.
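The `.env.example` item is easy to automate. In this sketch, the required key names and the dummy-value list are hypothetical. Any value that isn't empty, a `<placeholder>`, or a known dummy string is flagged for manual review, so harmless values like `DEBUG=false` will be flagged too.

```python
PLACEHOLDER_VALUES = {"", "changeme", "your_key_here"}  # HYPOTHETICAL dummy values


def parse_env(text: str) -> dict:
    """Parse KEY=value lines, ignoring blanks, comments and an `export ` prefix."""
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, value = line.split("=", 1)
        entries[key.strip()] = value.strip().strip("\"'")
    return entries


def check_env_example(text: str, required_keys: list):
    """Return (missing keys, keys whose value doesn't look like a placeholder)."""
    entries = parse_env(text)
    missing = [k for k in required_keys if k not in entries]
    suspicious = [
        k for k, v in entries.items()
        if v.lower() not in PLACEHOLDER_VALUES and not v.startswith("<")
    ]
    return missing, suspicious
```

An empty `missing` list means a judge has every key they need. An empty `suspicious` list means no real value slipped into the template.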
What I Learned
- _Documentation is a product._ A judge reading your README is a user. Design it for them, not for yourself. Answer their questions in the order they'll ask them.
- _Security is a QA task, not a DevOps task._ Checking for hardcoded secrets takes 5 minutes. Rotating a leaked key and explaining it to your team takes much longer.
- _Edge cases are where judges look._ The happy path works for everyone. What happens with an infinite loop, a missing user, or a bad problem ID is what separates a demo from a product.
- _Clean history tells a story._ A judge reading your commit log should understand what you built and in what order. One clear commit message beats ten 'fix stuff' commits every time.
- _The quality gate role is a real engineering role._ It requires understanding the full system, reading code you didn't write, and making judgment calls about what's acceptable to ship. That's not admin work. That's engineering.
Resources & Links
Hindsight GitHub: https://github.com/vectorize-io/hindsight
Hindsight Docs: https://hindsight.vectorize.io/
Agent Memory: https://vectorize.io/features/agent-memory