Last week, I did something I've been dreading for months: I audited every single line of code from the 86 AI applications I shipped in the past year.
As a Data Scientist at Gramener, I've been moving fast—really fast. One new AI-powered application every 4.2 days, on average. ~25 commits per project. 25,000+ lines of code across 86 repositories. Natural language database interfaces. Schema generators. Hypothesis testing tools. Document analyzers. You name it, I've probably built a version of it.
But velocity comes with a cost. And I needed to know: Was I building fast, or was I building a disaster waiting to happen? So I conducted a forensic code review of all 86 repositories. The grade? B+ (revised from C+ after I realized most "critical vulnerabilities" were false alarms). Here's the brutal truth about what I found—and what you should steal from it.
The Context: One Repository Every 4.2 Days
I work at the intersection of velocity and enterprise. My job is to take a client's ambitious AI idea and turn it into a working prototype in days, not months. The tech stack is consistent: Python backends (FastAPI), JavaScript frontends (custom), and LLMs everywhere.
The pace is relentless. Eighty-six applications in twelve months. That's roughly one every 4.2 days.
It isn't a brag; it's a confession. I needed to know if I was building value or just creating noise.
The Audit: 86 Repositories, 25,000 Lines of Code, 6 Hours of Analysis
The process was systematic:
- ✅ Deep dive into 3 major repositories (schemaforge, dailynews, routing-assistant)
- ✅ Security vulnerability scanning
- ✅ Architecture pattern identification across all 86 repos
- ✅ Code duplication analysis
- ✅ Technical debt assessment
Every repository. Every commit. Every shortcut I took when the client demo was in 48 hours and the LLM wasn't cooperating.
The initial result? C+. Multiple critical vulnerabilities. Hardcoded API keys. Prompt injection attacks. XSS vulnerabilities.
But then I looked closer. I built threat models. I asked: "Who's actually at risk here?"
The revised result? B+.
Two of the three "critical" vulnerabilities were false alarms—intentional architecture decisions that looked scary but weren't actually exploitable.
Here's what I really found.
The Good News: I'm Not Building a Disaster
Finding #1: Technical Debt is Actually Lower Than Average
The Numbers: 40-55% technical debt across all projects, compared to the industry average of 60-80%.
I was shocked. I expected to find a mess. Instead, I found that shipping fast with consistent patterns beats shipping slow with "perfect" code.
Why it worked:
- 70% of my repositories share an identical architecture
- The same pattern repeated = less cognitive load, fewer bugs
- LLM-first design means less code to maintain
The lesson: Architecture monoculture has benefits. When 70% of your apps work the same way, you get really good at that one way.
Finding #2: The LLM-First Architecture Pattern
The Pattern: 70% of my applications share this exact flow:
1. User Input
2. File Parser (XLSX.js, CSV, etc.)
3. Prompt Builder (construct LLM prompt)
4. asyncLLM Stream (streaming API calls)
5. partial-json Parser (parse incomplete JSON)
6. marked → Markdown rendering
7. lit-html → DOM rendering
8. Export/Download functionality
This isn't coincidence. It's a deliberate pattern I discovered and repeated.
The insight: Traditional architecture puts compute at the center: Input → Algorithm → Output
My architecture puts intelligence at the center: Input → LLM → Output (algorithm is emergent)
This explains the velocity. I'm not writing algorithms. I'm prompting them.
The lesson: You can ship 4x faster when you let LLMs handle the complexity.
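If you want to see what that flow looks like in code, here's a minimal sketch of steps 4-7 (streaming, incremental JSON parsing, Markdown rendering). I'm assuming asyncLLM yields cumulative { content } updates and that partial-json exports a lenient parse(); the endpoint, model, and the summary field are placeholders, so check each library's docs before copying this.

```javascript
// Minimal sketch of the streaming half of the pattern (assumed APIs; verify against each library's docs).
import { asyncLLM } from "asyncllm";           // assumed: async iterator over streamed completions
import { parse } from "partial-json";          // assumed: lenient parser for incomplete JSON
import { marked } from "marked";
import DOMPurify from "dompurify";
import { html, render } from "lit-html";
import { unsafeHTML } from "lit-html/directives/unsafe-html.js";

async function streamAnswer(prompt, apiKey, target) {
  let buffer = "";
  for await (const { content } of asyncLLM("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({ model: "gpt-4o-mini", stream: true, messages: [{ role: "user", content: prompt }] }),
  })) {
    buffer = content ?? buffer;                 // assumption: content is the cumulative text so far
    const partial = parse(buffer || "{}");      // parse whatever JSON has arrived so far
    const md = partial.summary ?? "";           // "summary" is a hypothetical field in the LLM's JSON response
    // Sanitize before rendering: LLM output is untrusted input.
    render(html`<div>${unsafeHTML(DOMPurify.sanitize(marked.parse(md)))}</div>`, target);
  }
}
```

Every one of the 8 steps above is either a library call or a prompt; the "algorithm" lives in the prompt, which is why the same skeleton reuses so well across apps.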
Finding #3: LLMs Save 75% of Development Time
Let me show you real numbers from one of my projects: SchemaForge (a tool that generates database schemas from CSV files).
Traditional approach:
- Parse CSV with custom parser: 8 hours
- Infer data types (regex hell): 12 hours
- Detect relationships: 16 hours
- Generate SQL DDL: 8 hours
- Generate DBT models: 12 hours
- Write tests: 20 hours
- Debug edge cases: 16 hours
- Total: 92 hours
My LLM-first approach:
- Parse CSV with XLSX.js: 2 hours
- Build prompt with examples: 4 hours
- Stream LLM, parse JSON: 6 hours
- Render with marked + lit-html: 3 hours
- Debug + iterate: 8 hours
- Total: 23 hours
Time saved: 69 hours (75% reduction)
The lesson: The leverage isn't in writing code faster. It's in not writing code at all.
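For context, here's roughly what the "build prompt with examples" step looks like. This is a hypothetical buildSchemaPrompt helper, not SchemaForge's actual prompt; the wording, field names, and output format are illustrative.

```javascript
// Hypothetical prompt builder for schema inference (illustrative, not SchemaForge's real prompt).
function buildSchemaPrompt(fileName, columns, sampleRows) {
  return [
    "You are a data engineer. Infer a relational schema from this CSV sample.",
    `File: ${fileName}`,
    `Columns: ${columns.join(", ")}`,
    "Sample rows (JSON):",
    JSON.stringify(sampleRows.slice(0, 20), null, 2),
    // Asking for strict JSON keeps the output machine-parseable while it streams.
    'Respond only as JSON: {"tables":[{"name":"","columns":[{"name":"","type":"","nullable":true}],"primaryKey":[]}],"relationships":[]}',
  ].join("\n\n");
}

// Usage: const prompt = buildSchemaPrompt("orders.csv", ["order_id", "customer_id", "total"], rows);
```

The 12 hours of "regex hell" for type inference become a few lines of prompt plus a couple of good examples.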
The Bad News: What I Got Wrong (And What I Got Right About Being Wrong)
The Plot Twist: 2 out of 3 "Critical" Issues Were False Alarms
I initially flagged three critical security vulnerabilities. But after building threat models and asking "Who's actually at risk?", I realized two were false positives.
False Alarm #1: "Hardcoded API Keys" 🚫
What I Found: API keys visible in public repositories.
Initial Assessment: CRITICAL
After Threat Modeling: INFO
Why the Downgrade:
These are domain-restricted demo keys. Think of Google Maps API keys embedded in websites—they're publicly visible, but only authorized domains can use them.
My architecture: Users provide their own API keys. The demo keys are for... demos. They're intentional, not accidental.
The lesson: Not all visible keys are vulnerabilities. Context matters.
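For the curious, the pattern is simple. Here's a minimal sketch, assuming keys live in the browser's localStorage (the storage key name and demo value are placeholders, not real secrets):

```javascript
// Sketch of the "users bring their own key" pattern (names are hypothetical).
const DEMO_KEY = "sk-demo-restricted"; // placeholder for a domain/quota-restricted demo key

function getApiKey() {
  // Prefer the user's own key, stored locally in their browser; fall back to the demo key.
  const userKey = localStorage.getItem("OPENAI_API_KEY");
  return userKey || DEMO_KEY;
}
```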
False Alarm #2: "Prompt Injection Attacks" 🚫
What I Found: Unsanitized user input going directly into LLM prompts.
Initial Assessment: CRITICAL
After Threat Modeling: LOW
Why the Downgrade:
Users provide their own API keys. So if someone tries a "prompt injection" attack, they're attacking themselves, with their own money, to corrupt their own results.
This isn't a security vulnerability. This is "garbage in, garbage out."
The "victim" would be the attacker. There's no exploit here.
The lesson: Threat modeling isn't about finding scary patterns. It's about asking "Who gets hurt, and how?"
Real Issue #1: XSS via unsafeHTML ⚠️
What I Found: LLM outputs rendered with unsafeHTML() directive (from lit-html) without sanitization.
Initial Assessment: HIGH
After Threat Modeling: MEDIUM-HIGH (still a problem!)
Why It Matters:
If someone uploads a file with a crafted filename like:
data<img src=x onerror=alert('xss')>.csv
And the LLM includes that filename in its response, my code would render it as HTML, executing the JavaScript and potentially stealing API keys from localStorage.
The Fix:
// BAD (what I was doing):
import { unsafeHTML } from 'lit-html/directives/unsafe-html.js';
html`<div>${unsafeHTML(llmOutput)}</div>`
// GOOD (what I should do):
import DOMPurify from 'dompurify';
const cleanHTML = DOMPurify.sanitize(llmOutput);
html`<div>${unsafeHTML(cleanHTML)}</div>`
Time to fix: 4-6 hours across affected repositories.
The lesson: LLM outputs are user input. Treat them like you'd treat form submissions.
Real Issue #2: ~500 Lines of Duplicated Code
What I Found: The same code patterns copied across multiple repositories:
- File parsing logic
- LLM streaming setup
- Error handling patterns
Why It Happened: Velocity > abstraction. I was shipping too fast to stop and refactor.
The Cost: Changes require updating 10-20 repos instead of one shared library.
The Fix: Create a shared template repository with common utilities.
Time to fix: 10-15 hours to build the template (one-time investment).
Time saved: 200+ hours over the next year.
The lesson: Third time you copy-paste something? Extract it. Don't wait until repo #86.
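As a sketch of what "extract it" means here: the copied file-parsing block becomes one shared helper that every repo imports. The module path and function name are hypothetical; the SheetJS calls are the same ones the duplicated code already uses.

```javascript
// shared/parse-file.js — hypothetical shared helper extracted from the duplicated parsing code.
import * as XLSX from "xlsx";

// Returns rows as plain objects for CSV/XLSX uploads; one implementation instead of 20 copies.
export async function parseUpload(file) {
  const workbook = XLSX.read(new Uint8Array(await file.arrayBuffer()), { type: "array" });
  const firstSheet = workbook.Sheets[workbook.SheetNames[0]];
  return XLSX.utils.sheet_to_json(firstSheet);
}
```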
What I'd Do Differently (Your Cheat Sheet)
Change #1: Build the Template at Repo #5, Not #86
The Mistake: I waited until project 86 to realize I was copying the same 500 lines of code everywhere.
The Fix: Create @prudhvi/llm-app-template with:
- DOMPurify pre-configured
- Streaming LLM setup (asyncLLM + partial-json parser)
- Common file parsers (CSV, Excel, JSON)
- lit-html + marked.js rendering pipeline
- Error handling utilities
- Pre-commit security hooks
Time investment: 10-15 hours (one time)
Time saved: 200+ hours over repos #87-172
The lesson: When you find yourself shipping one repo every 4 days, invest in a template. Future-you will thank you.
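As an example of what "DOMPurify pre-configured" could look like inside that template: a single safe-render helper so no app ever calls unsafeHTML on raw LLM output directly. The helper name and module path are hypothetical.

```javascript
// template/render.js — hypothetical helper the template would ship so sanitization is the default.
import { marked } from "marked";
import DOMPurify from "dompurify";
import { html } from "lit-html";
import { unsafeHTML } from "lit-html/directives/unsafe-html.js";

// Convert LLM Markdown to HTML, sanitize it, and hand lit-html an already-clean string.
export function safeMarkdown(llmOutput) {
  return html`${unsafeHTML(DOMPurify.sanitize(marked.parse(llmOutput ?? "")))}`;
}
```

Every repo built from the template gets the XSS fix for free, instead of me patching it 86 times.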
Change #2: Fix the Monoculture Risk
The Reality: 70% of my repos share identical architecture.
The Benefit: Fast iteration, consistent patterns, low cognitive load.
The Risk: One bug becomes 50 bugs. One dependency breaking change breaks 50 apps.
The Fix:
- Pin dependencies with exact versions (openai==1.49.0, not openai>=1.0)
- Add Subresource Integrity (SRI) hashes to CDN imports
- Test dependency updates on one repo before updating all 50
The lesson: Architecture monoculture is a feature, not a bug—but you need guardrails.
Change #3: Stop Committing Log Files
What I Found: email_log.txt committed to multiple repos.
The Fix:
# Add to .gitignore
echo "email_log.txt" >> .gitignore
echo "*.log" >> .gitignore
git rm --cached email_log.txt
Time to fix: 1-2 hours
The lesson: Simple mistake, simple fix. But it clutters git history and wastes review time.
The Patterns That Actually Work
After 86 applications, here's what consistently delivers results:
The 3-Day POC Cycle
Day 1: FastAPI backend + LLM integration + sample data
Day 2: Frontend + core user flow
Day 3: Deployment + demo prep + edge case handling
This cycle works for 80% of client POCs.
The Tech Stack That Never Failed
Backend: FastAPI
└─ Why: Type-safe, fast, auto-docs, async
Frontend: Custom JavaScript (Bootstrap only)
└─ Why: Rapid development, looks professional enough
LLM Layer: LangChain OR direct API calls
└─ LangChain for: Complex chains, multiple tools
└─ Direct calls for: Simple use cases, debugging
Deployment: GitHub, AWS (production)
└─ Why: Fast, cheap, easy demos
The Client Demo Formula
1. Show value in 60 seconds (not features)
   - Don't explain how it works
   - Show what problem it solves
2. Live demo > slides (always)
   - Slides are for context
   - Live demos sell
3. Handle "Can it do X?" with "Let's try it now"
   - LLMs are flexible
   - Live experimentation impresses clients
4. Have backup demo data ready
   - Internet fails
   - APIs go down
   - Always have a fallback
By The Numbers
The Stats:
- 86 applications shipped in one year
- 1 app every 4.2 days average velocity
- ~25 commits per repository
- 25,000+ lines of code (total across all repos)
- 40-55% technical debt (vs 60-80% industry average)
- 3 "critical" vulnerabilities found (revised to 1 real issue)
- ~500 lines of duplicated code
- 70% share identical LLM-first architecture
- 75% reduction in development time vs traditional approach
- B+ overall grade (revised from C+ after threat modeling)
What Worked:
- LLM-first architecture: Intelligence at the center, not algorithms
- Consistent patterns: Same flow repeated = less cognitive load
- Streaming UX: async + partial-json = perceived speed
- Velocity through monoculture: 70% similarity = fast iteration
What Didn't:
- No template until repo #86: 500 lines duplicated needlessly (I was just too lazy to build a template)
- XSS vulnerability: unsafeHTML without DOMPurify (fixable in 4-6 hours)
- Log file commits: Git noise from committed logs
- Monoculture risk: One bug = 50 bugs (but also: one fix = 50 fixes)
Your Takeaway Checklist
Before You Build Your First LLM App:
[ ] Create a security checklist template
[ ] Set up a shared utilities package structure
[ ] Define your standard tech stack
[ ] Create ADR template for decisions
[ ] Build a project scaffold/template
For Every LLM Project:
[ ] Run security checklist before first commit
[ ] Use consistent architecture pattern
[ ] Document major decisions (ADR)
[ ] Extract to utilities on 3rd code duplication
[ ] Demo to real user by Day 3
After Every 10 Projects:
[ ] Review patterns that emerged
[ ] Update templates with learnings
[ ] Refactor shared utilities
[ ] Update security checklist with new findings
After 50+ Projects:
[ ] Conduct full forensic audit
[ ] Grade yourself honestly
[ ] Share learnings publicly
[ ] Open-source your templates
The Reflection
Here's what surprised me most: I expected a disaster. I found intentional trade-offs.
What looked like critical vulnerabilities were architectural decisions:
- Domain-restricted demo keys (intentional, not leaked secrets)
- User-provided API keys (no third-party risk)
- LLM-first architecture (complexity outsourced, not eliminated)
The one real issue (XSS via unsafeHTML)? Fixable in an afternoon.
The bigger discovery? I've been making startup founder trade-offs, not junior developer mistakes:
- Speed over security (but users provide their own keys, limiting blast radius)
- Features over tests (but consistent patterns reduce bug surface area)
- Duplication over abstraction (but enabling rapid iteration)
- Monoculture over diversity (but creating leverage through repetition)
The B+ grade (revised from C+ after threat modeling) isn't a failure. It's proof that you can ship 86 applications in a year while maintaining lower-than-average technical debt—if you're willing to make deliberate trade-offs.
Want to Go Deeper?
Full Forensic Analysis:
I've published the complete code review with interactive charts, detailed vulnerability breakdowns, and architecture pattern analysis here: prudhvi1709.github.io/datastories/prudhvi-codereview/
GitHub Profile:
See all 86 applications (tagged with llmdemo): github.com/prudhvi1709
Browse by pattern:
- Natural language interfaces (20+ repos)
- Data automation & schema tools (15+ repos)
- ML & analytics (25+ repos)
- Enterprise POCs & prototypes (27+ repos)
Coming Soon:
I'm open-sourcing the LLM-first app template with:
- Pre-configured DOMPurify (security built-in)
- asyncLLM + partial-json streaming setup
- lit-html + marked.js rendering pipeline
- Common file parsers (CSV, Excel, JSON)
Follow me on GitHub to get notified when it launches.
Let's Connect:
Building AI applications at scale? I write about LLM architecture, velocity at scale, and lessons from 86 production apps.
- LinkedIn: prudhvi-krovvidi
- Email: kprudhvi71@gmail.com
- Website: prudhvi1709.github.io
Now if you'll excuse me, I have one XSS vulnerability to fix.
(And then I'm building that template so I never make this mistake again.)
Tags: #AI #LLM #FastAPI #Streamlit #LangChain #Security #WebDev #Python #MachineLearning #SoftwareEngineering
What about you? Have you audited your own code recently? What did you find? Drop a comment below—I'd love to hear your horror stories (or success stories).