DEV Community

Cover image for I Built 86 AI Applications in One Year. Here's My Brutal Self-Audit.
prudhvi krovvidi
prudhvi krovvidi

Posted on

I Built 86 AI Applications in One Year. Here's My Brutal Self-Audit.

Last week, I did something I've been dreading for months: I audited every single line of code from the 86 AI applications I shipped in the past year.

As a Data Scientist at Gramener, I've been moving fast—really fast. One new AI-powered application every 4.2 days, on average. ~25 commits per project. 25,000+ lines of code across 86 repositories. Natural language database interfaces. Schema generators. Hypothesis testing tools. Document analyzers. You name it, I've probably built a version of it.

But velocity comes with a cost. And I needed to know: Was I building fast, or was I building a disaster waiting to happen? So I conducted a forensic code review of all 86 repositories. The grade? B+ (revised from C+ after I realized most "critical vulnerabilities" were false alarms). Here's the brutal truth about what I found—and what you should steal from it.


The Context: One Repository Every 4.2 Days

I work at the intersection of velocity and enterprise. My job is to take a client's ambitious AI idea and turn it into a working prototype in days, not months. The tech stack is consistent: Python backends (FastAPI), JavaScript frontends (custom), and LLMs everywhere.

The pace is relentless. Eighty-six applications in twelve months. That's roughly one every 4.2 days.

I shipped 86 in one year. This isn't a brag—it's a confession. I needed to know if I was building value or just creating noise.


The Audit: 86 Repositories, 25,000 Lines of Code, 6 Hours of Analysis

The process was systematic:

  • ✅ Deep dive into 3 major repositories (schemaforge, dailynews, routing-assistant)
  • ✅ Security vulnerability scanning
  • ✅ Architecture pattern identification across all 86 repos
  • ✅ Code duplication analysis
  • ✅ Technical debt assessment

Every repository. Every commit. Every shortcut I took when the client demo was in 48 hours and the LLM wasn't cooperating.

The initial result? C+. Multiple critical vulnerabilities. Hardcoded API keys. Prompt injection attacks. XSS vulnerabilities.

But then I looked closer. I built threat models. I asked: "Who's actually at risk here?"

The revised result? B+.

Two of the three "critical" vulnerabilities were false alarms—intentional architecture decisions that looked scary but weren't actually exploitable.

Here's what I really found.


The Good News: I'm Not Building a Disaster

Finding #1: Technical Debt is Actually Lower Than Average

The Numbers: 40-55% technical debt across all projects, compared to the industry average of 60-80%.

I was shocked. I expected to find a mess. Instead, I found that shipping fast with consistent patterns beats shipping slow with "perfect" code.

Why it worked:

  • 70% of my repositories share an identical architecture
  • The same pattern repeated = less cognitive load, fewer bugs
  • LLM-first design means less code to maintain

The lesson: Architecture monoculture has benefits. When 70% of your apps work the same way, you get really good at that one way.


Finding #2: The LLM-First Architecture Pattern

The Pattern: 70% of my applications share this exact flow:

1. User Input
2. File Parser (XLSX.js, CSV, etc.)
3. Prompt Builder (construct LLM prompt)
4. asyncLLM Stream (streaming API calls)
5. partial-json Parser (parse incomplete JSON)
6. marked → Markdown rendering
7. lit-html → DOM rendering
8. Export/Download functionality
Enter fullscreen mode Exit fullscreen mode

This isn't coincidence. It's a deliberate pattern I discovered and repeated.

The insight: Traditional architecture puts compute at the center: Input → Algorithm → Output

My architecture puts intelligence at the center: Input → LLM → Output (algorithm is emergent)

This explains the velocity. I'm not writing algorithms. I'm prompting them.

The lesson: You can ship 4x faster when you let LLMs handle the complexity.


Finding #3: LLMs Save 75% of Development Time

Let me show you real numbers from one of my projects: SchemaForge (a tool that generates database schemas from CSV files).

Traditional approach:

  • Parse CSV with custom parser: 8 hours
  • Infer data types (regex hell): 12 hours
  • Detect relationships: 16 hours
  • Generate SQL DDL: 8 hours
  • Generate DBT models: 12 hours
  • Write tests: 20 hours
  • Debug edge cases: 16 hours
  • Total: 92 hours

My LLM-first approach:

  • Parse CSV with XLSX.js: 2 hours
  • Build prompt with examples: 4 hours
  • Stream LLM, parse JSON: 6 hours
  • Render with marked + lit-html: 3 hours
  • Debug + iterate: 8 hours
  • Total: 23 hours

Time saved: 69 hours (75% reduction)

The lesson: The leverage isn't in writing code faster. It's in not writing code at all.


The Bad News: What I Got Wrong (And What I Got Right About Being Wrong)

The Plot Twist: 2 out of 3 "Critical" Issues Were False Alarms

I initially flagged three critical security vulnerabilities. But after building threat models and asking "Who's actually at risk?", I realized two were false positives.


False Alarm #1: "Hardcoded API Keys" 🚫

What I Found: API keys visible in public repositories.

Initial Assessment: CRITICAL

After Threat Modeling: INFO

Why the Downgrade:

These are domain-restricted demo keys. Think of Google Maps API keys embedded in websites—they're publicly visible, but only authorized domains can use them.

My architecture: Users provide their own API keys. The demo keys are for... demos. They're intentional, not accidental.

The lesson: Not all visible keys are vulnerabilities. Context matters.


False Alarm #2: "Prompt Injection Attacks" 🚫

What I Found: Unsanitized user input going directly into LLM prompts.

Initial Assessment: CRITICAL

After Threat Modeling: LOW

Why the Downgrade:

Users provide their own API keys. So if someone tries a "prompt injection" attack, they're attacking themselves, with their own money, to corrupt their own results.

This isn't a security vulnerability. This is "garbage in, garbage out."

The "victim" would be the attacker. There's no exploit here.

The lesson: Threat modeling isn't about finding scary patterns. It's about asking "Who gets hurt, and how?"


Real Issue #1: XSS via unsafeHTML ⚠️

What I Found: LLM outputs rendered with unsafeHTML() directive (from lit-html) without sanitization.

Initial Assessment: HIGH

After Threat Modeling: MEDIUM-HIGH (still a problem!)

Why It Matters:

If someone uploads a file with a crafted filename like:

data<img src=x onerror=alert('xss')>.csv
Enter fullscreen mode Exit fullscreen mode

And the LLM includes that filename in its response, my code would render it as HTML, executing the JavaScript and potentially stealing API keys from localStorage.

The Fix:

// BAD (what I was doing):
import { unsafeHTML } from 'lit-html/directives/unsafe-html.js';
html`<div>${unsafeHTML(llmOutput)}</div>`

// GOOD (what I should do):
import DOMPurify from 'dompurify';
const cleanHTML = DOMPurify.sanitize(llmOutput);
html`<div>${unsafeHTML(cleanHTML)}</div>`
Enter fullscreen mode Exit fullscreen mode

Time to fix: 4-6 hours across affected repositories.

The lesson: LLM outputs are user input. Treat them like you'd treat form submissions.


Real Issue #2: ~500 Lines of Duplicated Code

What I Found: The same code patterns copied across multiple repositories:

  • File parsing logic
  • LLM streaming setup
  • Error handling patterns

Why It Happened: Velocity > abstraction. I was shipping too fast to stop and refactor.

The Cost: Changes require updating 10-20 repos instead of one shared library.

The Fix: Create a shared template repository with common utilities.

Time to fix: 10-15 hours to build the template (one-time investment).

Time saved: 200+ hours over the next year.

The lesson: Third time you copy-paste something? Extract it. Don't wait until repo #86.


What I'd Do Differently (Your Cheat Sheet)

Change #1: Build the Template at Repo #5, Not #86

The Mistake: I waited until project 86 to realize I was copying the same 500 lines of code everywhere.

The Fix: Create @prudhvi/llm-app-template with:

  • DOMPurify pre-configured
  • Streaming LLM setup (asyncLLM + partial-json parser)
  • Common file parsers (CSV, Excel, JSON)
  • lit-html + marked.js rendering pipeline
  • Error handling utilities
  • Pre-commit security hooks

Time investment: 10-15 hours (one time)

Time saved: 200+ hours over repos #87-172

The lesson: When you find yourself shipping one repo every 4 days, invest in a template. Future-you will thank you.


Change #2: Fix the Monoculture Risk

The Reality: 70% of my repos share identical architecture.

The Benefit: Fast iteration, consistent patterns, low cognitive load.

The Risk: One bug becomes 50 bugs. One dependency breaking change breaks 50 apps.

The Fix:

  • Pin dependencies with exact versions (openai==1.49.0, not openai>=1.0)
  • Add Subresource Integrity (SRI) hashes to CDN imports
  • Test dependency updates on one repo before updating all 50

The lesson: Architecture monoculture is a feature, not a bug—but you need guardrails.


Change #3: Stop Committing Log Files

What I Found: email_log.txt committed to multiple repos.

The Fix:

# Add to .gitignore
echo "email_log.txt" >> .gitignore
echo "*.log" >> .gitignore
git rm --cached email_log.txt
Enter fullscreen mode Exit fullscreen mode

Time to fix: 1-2 hours

The lesson: Simple mistake, simple fix. But it clutters git history and wastes review time.


The Patterns That Actually Work

After 87 applications, here's what consistently delivers results:

The 3-Day POC Cycle

Day 1: FastAPI backend + LLM integration + sample data
Day 2: Frontend + core user flow
Day 3: Deployment + demo prep + edge case handling
Enter fullscreen mode Exit fullscreen mode

This cycle works for 80% of client POCs.


The Tech Stack That Never Failed

Backend: FastAPI
  └─ Why: Type-safe, fast, auto-docs, async

Frontend: Custom (only Bootstrap)
  └─ Why: Rapid development, looks professional enough

LLM Layer: LangChain OR direct API calls
  └─ LangChain for: Complex chains, multiple tools
  └─ Direct calls for: Simple use cases, debugging

Deployment: Github, AWS (production)
  └─ Why: Fast, cheap, easy demos
Enter fullscreen mode Exit fullscreen mode

The Client Demo Formula

  1. Show value in 60 seconds (not features)

    • Don't explain how it works
    • Show what problem it solves
  2. Live demo > slides (always)

    • Slides are for context
    • Live demos sell
  3. Handle "Can it do X?" with "Let's try it now"

    • LLMs are flexible
    • Live experimentation impresses clients
  4. Have backup demo data ready

    • Internet fails
    • APIs go down
    • Always have a fallback

By The Numbers

The Stats:

  • 86 applications shipped in one year
  • 1 app every 4.2 days average velocity
  • ~25 commits per repository
  • 25,000+ lines of code (total across all repos)
  • 40-55% technical debt (vs 60-80% industry average)
  • 3 "critical" vulnerabilities found (revised to 1 real issue)
  • ~500 lines of duplicated code
  • 70% share identical LLM-first architecture
  • 75% reduction in development time vs traditional approach
  • B+ overall grade (revised from C+ after threat modeling)

What Worked:

  • LLM-first architecture: Intelligence at the center, not algorithms
  • Consistent patterns: Same flow repeated = less cognitive load
  • Streaming UX: async + partial-json = perceived speed
  • Velocity through monoculture: 70% similarity = fast iteration

What Didn't:

  • No template until repo #86: 500 lines duplicated needlessly - I was just lazy to template
  • XSS vulnerability: unsafeHTML without DOMPurify (fixable in 4-6 hours)
  • Log file commits: Git noise from committed logs
  • Monoculture risk: One bug = 50 bugs (but also: one fix = 50 fixes)

Your Takeaway Checklist

Before You Build Your First LLM App:

[ ] Create a security checklist template
[ ] Set up a shared utilities package structure
[ ] Define your standard tech stack
[ ] Create ADR template for decisions
[ ] Build a project scaffold/template
Enter fullscreen mode Exit fullscreen mode

For Every LLM Project:

[ ] Run security checklist before first commit
[ ] Use consistent architecture pattern
[ ] Document major decisions (ADR)
[ ] Extract to utilities on 3rd code duplication
[ ] Demo to real user by Day 3
Enter fullscreen mode Exit fullscreen mode

After Every 10 Projects:

[ ] Review patterns that emerged
[ ] Update templates with learnings
[ ] Refactor shared utilities
[ ] Update security checklist with new findings
Enter fullscreen mode Exit fullscreen mode

After 50+ Projects:

[ ] Conduct full forensic audit
[ ] Grade yourself honestly
[ ] Share learnings publicly
[ ] Open-source your templates
Enter fullscreen mode Exit fullscreen mode

The Reflection

Here's what surprised me most: I expected a disaster. I found intentional trade-offs.

What looked like critical vulnerabilities were architectural decisions:

  • Domain-restricted demo keys (intentional, not leaked secrets)
  • User-provided API keys (no third-party risk)
  • LLM-first architecture (complexity outsourced, not eliminated)

The one real issue (XSS via unsafeHTML)? Fixable in an afternoon.

The bigger discovery? I've been making startup founder trade-offs, not junior developer mistakes:

  • Speed over security (but users provide their own keys, limiting blast radius)
  • Features over tests (but consistent patterns reduce bug surface area)
  • Duplication over abstraction (but enabling rapid iteration)
  • Monoculture over diversity (but creating leverage through repetition)

The B+ grade (revised from C+ after threat modeling) isn't a failure. It's proof that you can ship 86 applications in a year while maintaining lower-than-average technical debt—if you're willing to make deliberate trade-offs.


Want to Go Deeper?

Full Forensic Analysis:
I've published the complete code review with interactive charts, detailed vulnerability breakdowns, and architecture pattern analysis here: prudhvi1709.github.io/datastories/prudhvi-codereview/

GitHub Profile:
See all 86 applications (tagged with llmdemo): github.com/prudhvi1709

Browse by pattern:

  • Natural language interfaces (20+ repos)
  • Data automation & schema tools (15+ repos)
  • ML & analytics (25+ repos)
  • Enterprise POCs & prototypes (27+ repos)

Coming Soon:
I'm open-sourcing the LLM-first app template with:

  • Pre-configured DOMPurify (security built-in)
  • asyncLLM + partial-json streaming setup
  • lit-html + marked.js rendering pipeline
  • Common file parsers (CSV, Excel, JSON)

Follow me on GitHub to get notified when it launches.

Let's Connect:
Building AI applications at scale? I write about LLM architecture, velocity at scale, and lessons from 86 production apps.


Now if you'll excuse me, I have one XSS vulnerability to fix.

(And then I'm building that template so I never make this mistake again.)


Tags: #AI #LLM #FastAPI #Streamlit #LangChain #Security #WebDev #Python #MachineLearning #SoftwareEngineering


What about you? Have you audited your own code recently? What did you find? Drop a comment below—I'd love to hear your horror stories (or success stories).

Top comments (0)