425 commits, 672 files, 1.5 billion tokens — and one form. A story about learning to work with AI in a real product.
⚡ TL;DR — Quick Summary

The Problem: AI coding in Enterprise projects fails because of context limitations. Models "forget" after 20-30 minutes of work.

Our Solution: strict Quality Gates, two-level task tracking (Jira for the team, Beads for AI), a Memory Bank as external memory, and a two-model split: Claude Opus plans, Gemini Flash implements.

Key Lesson: AI without constraints is like a broken combine harvester on fire — it will "optimize" everything, including things you didn't ask for.
Introduction: A Task Nobody Had Solved
Imagine this: you need to give an analyst the ability to code. Not "write a prompt to ChatGPT," but actually make changes to an Enterprise product with a three-year history and a million lines of code.
The developer isn't sitting next to them dictating every line. They set up the environment, control quality, and only intervene when something goes wrong.
Sounds like science fiction? We thought so too. Until we tried.
Why This Is Harder Than It Seems
When a programmer uses an AI assistant, they control every step. They see what's happening "under the hood." They notice oddities in the code immediately.
With an analyst, everything is different. They see the result: "the form appeared" or "the form doesn't work." But code quality, architectural decisions, potential bugs — all of this remains behind the scenes.
We decided to create a system that compensates for this blindness. A system where AI can't "cause trouble" even if it really wants to.
Tool Selection: Why Cursor
We tried several options: GitHub Copilot, Claude Code, Windsurf, various API wrappers. We settled on Cursor for several reasons:
AI Coding Tools Comparison (2025)
| Criterion | Cursor | GitHub Copilot | Claude Code | Windsurf | Codex |
|---|---|---|---|---|---|
| Multi-model | Yes: Opus, Gemini, GPT | No: GPT-4/o1 only | No: Claude only | Yes: Multiple | No: codex-1 only |
| MCP Integration | Built-in | Via Extensions | Built-in | Partial | No |
| Custom Rules | .mdc files | No | CLAUDE.md only | Yes | AGENTS.md |
| Agent Mode | Full | Workspace (beta) | Full | Cascade | Full (sandbox) |
| Context Window | 1M+ (Gemini) | 128K | 200K | 1M+ | 192K |
| Enterprise SSO | Yes | Yes | Beta | Yes | Yes |
| IDE Type | Fork VSCode | Extension | Terminal CLI | Fork VSCode | Cloud sandbox |
Conclusion: Cursor is the only tool that combines multi-model support with built-in MCP integration and flexible context-aware rules.
Multi-model Support
Cursor allows using different models for different tasks:
- Claude Opus 4.5 for architectural planning (smart but "expensive" in tokens)
- Gemini 3 Flash for implementation (fast, cheap, and most importantly — 1 million tokens of context)
MCP Integration
Model Context Protocol (MCP) — a way to connect external tools to AI:
| MCP Server | Purpose |
|---|---|
| Jira | Task management |
| Context7 | Library documentation |
| Memory Bank | Context preservation between sessions |
| Beads | Atomic task tracking |
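In Cursor, MCP servers are declared in an `mcp.json` config. Below is a rough sketch of that wiring; the server names and package names are placeholders for illustration, not the actual packages we run:

```json
{
  "mcpServers": {
    "memory-bank": {
      "command": "npx",
      "args": ["-y", "your-memory-bank-mcp-package"]
    },
    "context7": {
      "command": "npx",
      "args": ["-y", "your-context7-mcp-package"]
    }
  }
}
```

Once a server is registered this way, the agent can call its tools directly from a chat or agent session.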
Flexible Rules
Cursor allows creating .mdc files with rules that automatically load depending on context. Working on a React component — get React rules. Writing a script — get Node.js rules.
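As a sketch, a rule file for React components might look roughly like this. The glob and the rule text are invented for the example; the frontmatter fields (description, globs, alwaysApply) are the metadata Cursor reads to decide when a rule is attached:

```markdown
---
description: Conventions for React components
globs: ["src/components/**/*.tsx"]
alwaysApply: false
---

- Function components with typed props only; `any` is forbidden.
- Put the Vitest test next to the component (`Component.test.tsx`).
- Reuse existing UI primitives; do not add new dependencies.
```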
The Design-to-Code Pipeline: Figma → Prototype → Production
One crucial part of our workflow that made AI coding possible: we started from design, not from code.
The Three-Step Process
1. Figma Design — The analyst creates UX/UI mockups in Figma. No code yet, just visual design and component structure.
2. Prototype Implementation — We ask AI to transfer the Figma design to a clean, minimal project (10-15 files). This is where AI shines — small context, clear requirements, fast iteration.
3. Production Migration — Once the prototype works perfectly, we migrate it to the main product. AI handles the integration with existing patterns and styles.
💡 Why this works: AI struggles with large codebases but excels at small, focused tasks. By breaking the work into "design → prototype → production," we keep each step within AI's effective context window.
With Figma MCP, the AI agent can even read design specs directly from Figma files — colors, spacing, component hierarchy — and apply them automatically.
Security Requirements: Working Locally
Our security team set strict requirements: no access to the corporate network during development. No cloning of the production database.
This meant we needed a full mocking system. We built it on MSW (Mock Service Worker):
- 50+ handlers for all API endpoints
- Realistic data generators using @faker-js
- Full business logic emulation
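To give a feel for what these handlers look like, here is a minimal sketch of one. The `/api/users` endpoint and its fields are invented for the example and are not our real API:

```typescript
// src/mocks/handlers/users.ts: illustrative MSW v2 handler with faker-generated data
import { http, HttpResponse } from 'msw';
import { faker } from '@faker-js/faker';

// Build a realistic-looking user instead of hardcoding fixtures
const makeUser = () => ({
  id: faker.string.uuid(),
  name: faker.person.fullName(),
  email: faker.internet.email(),
  createdAt: faker.date.past().toISOString(),
});

export const userHandlers = [
  // GET /api/users returns a page of generated users
  http.get('/api/users', () =>
    HttpResponse.json({ items: Array.from({ length: 20 }, makeUser) }),
  ),
];
```

Handlers like this are registered with MSW's `setupWorker` in the browser and `setupServer` in Node tests, so both the UI and the test suite run fully offline — exactly what the security requirements demanded.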
Quality Gates: The Stricter, The Better
Here's the key insight we took from this project: AI needs strict constraints.
Without them, it starts to "create." Sees outdated code — refactors. Notices a potential vulnerability — "fixes" it. Finds a style mismatch — reformats.
Sounds useful? In practice, it means a simple task "add a field to a form" turns into a PR with 100,000 lines.
Our Quality Gates Pipeline
- commitlint — checks commit message format
- ESLint — strict TypeScript rules, import order
- TypeScript — strict mode, no `any`
- Vitest — unit tests must pass
- Secretlint — checks for accidentally committed secrets
AI cannot bypass these checks. If the code doesn't pass — the commit won't happen.
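Concretely, the enforcement is just Git hooks. A sketch of what the Husky pre-commit hook can look like, assuming each tool's standard CLI (our real scripts differ in details):

```sh
# .husky/pre-commit: every gate must pass before a commit is created
npx eslint .
npx tsc --noEmit
npx vitest run
npx secretlint "**/*"
```

The commit message itself is validated in a separate `commit-msg` hook that runs `npx --no -- commitlint --edit "$1"`. If any command exits non-zero, Git refuses the commit, no matter how insistent the AI is.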
The Context Problem: The Main Pain Point
Now for the most important part. The thing that almost killed the entire project.
Context.
When you work with a simple 10-file application, AI handles it perfectly. The entire project fits in its "memory." It sees the complete picture.
But what happens when the project is a million lines of code accumulated over three years? AI sees only a fragment. The tip of the iceberg.
Here are real numbers:
| Project Size | Tokens | AI Effective Work Time |
|---|---|---|
| Tutorial project | 100K | Unlimited |
| Medium product | 500K | 2-3 hours |
| Enterprise (3+ years) | 1M+ | 20-30 minutes |
After 30 minutes, AI starts to "forget." Repeats mistakes. Proposes solutions you've already rejected. Breaks what was just working.
Four Rakes We Stepped On
Rake #1: "It Worked on a Simple Example"
We ran an experiment. Asked an analyst to create a registration form on a clean boilerplate — minimal React project, reference rules, 10 files.
Result: 15 minutes, everything works perfectly.
The same task on a real project: nothing works. AI gets confused by dependencies, uses outdated patterns, conflicts with existing code.
Lesson: It's not about AI being "dumb." It's about lack of context.
Rake #2: AI "Fixed" the Entire Project
This was a catastrophe. We set a task: add one feature. AI completed it. And also:
- Replaced all `any` with specific types
- "Fixed" potential vulnerabilities
- Reformatted half the project
- Updated outdated dependencies
Result: PR with 100,000+ lines. GitLab physically couldn't display the diff. We spent two weeks figuring it out. The product was broken.
😰 "This was very painful."
🚜 Imagine a combine harvester that suddenly decided it's not just harvesting wheat, but also "optimizing" the entire field — plowing, seeding, and building a barn. Except the harvester is broken and on fire. That's what uncontrolled AI looks like on a large codebase.
Lesson: You need rules that explicitly limit the scope of AI work. Otherwise, you get a "helpful" AI that turns your simple feature into a full-scale renovation project — with demolition included. 🔥
Rake #3: Token Limitation
We didn't immediately understand that most models have context limited to 100-200K tokens. For an Enterprise project, this is enough for 3-5 iterations.
Then AI starts "forgetting" the beginning of the conversation. Proposes solutions you've already rejected. Repeats mistakes.
Lesson: For Enterprise, you need models with at least 1 million tokens of context.
🧠 "Enterprise projects require at least 1 million tokens of context. Otherwise, it doesn't work." — Opus
Rake #4: Auto-Mode Is a Trap
Cursor can automatically select a model. Sounds convenient? In practice, it often chooses a "cheap" model with a small context.
We wasted a lot of time before we understood: for serious work, you need to manually select the model.
Lesson: Opus for planning, Gemini Flash for implementation. No auto-mode.
How We Solved the Context Problem
After all the rakes, we developed a system. It's not perfect, but it works.
Two-Level Task Tracking
Jira — top level. Tasks for the team: "VP-385: Add registration form."
Beads — atomic level. Tasks for AI:
- "bd-1: Review file UserForm.tsx"
- "bd-2: Add email field"
- "bd-3: Write test"
Beads is stored locally, syncs with git. AI always knows what step it stopped at.
Memory Bank
This is "external memory" for AI. We store:
| File | Purpose |
|---|---|
| `activeContext.md` | Current focus — what we're working on now |
| `progress.md` | Implementation status — what's already done |
| `research-*.md` | Investigations — what we found out |
| `archive-*.md` | Completed tasks — historical reference |
Usage example:
AI: "Look at all my commits and summarize them"
Memory Bank → Indexing → Result
💡 "Memory Bank is a lifesaver. Without it, context is lost forever."
When AI "forgets" context, it can access Memory Bank and restore understanding.
SuperCode Workflows
SuperCode adds another layer of acceleration:
| Feature | Description |
|---|---|
| Smart Actions | Custom automation workflows via JSON/YML in `.supercode/actions/` |
| Prompt Updaters | Transform prompts via AI, URL, or shell commands |
| Voice Commands | Trigger actions by voice |
| Nested Workflows | Sequential execution with run: true for multi-step pipelines |
Workflow Example (from SuperCode docs):
```json
{
  "Architecture Design": {
    "mode": "SC:Architect",
    "model": "o3",
    "prompt": "Design the architecture for: $prompt",
    "run": true
  },
  "Implementation": {
    "model": "claude-4-sonnet",
    "prompt": "Implement based on the design: $prompt",
    "run": true
  },
  "Full Feature Workflow": {
    "actions": ["Architecture Design", "Implementation"]
  }
}
```
💡 "I imagine myself as Tony Stark talking to Jarvis."
Model Combination
We split work between two models:
Claude Opus 4.5 — architect. Creates plans, writes specs, conducts reviews. It has "only" 200K tokens, but for planning that's enough.
Gemini 3 Flash — executor. Implements code according to plan. 1 million tokens of context — can work for hours without losing the thread.
Cycle: Opus plans → Gemini implements → Opus reviews.
Project Statistics
Over 1.5 weeks of work on the feature/timeline branch:
| Metric | Value |
|---|---|
| Commits | 425 |
| Files changed | 672 |
| Lines added | +85,000 |
| Lines removed | -11,000 |
| Tests added | ~200 |
| Tokens spent | 1.5 billion |
What was implemented:
- ✅ Full MSW mocking system (50+ handlers)
- ✅ Schedule Timeline with Gantt chart
- ✅ Quality Gates (ESLint, TypeScript, Husky)
- ✅ Beads integration
- ✅ 200+ unit tests
Comparison: Traditional Development vs AI
Honest comparison:
| Parameter | Traditional | With AI |
|---|---|---|
| Time per feature | 2-3 weeks | 1.5 weeks* |
| Code quality | Depends on developer | High (Quality Gates) |
| Tests | Often skipped | 200+ automatically |
| Documentation | Often none | Generated |
\* Including infrastructure setup, learning, and all the rakes.
Important nuance: the first time is expensive. We spent 1.5 weeks understanding how this works. Setting up rules. Stepping on rakes.
💰 "First time is expensive. Second time is 10x faster." — Opus
The second feature will take 10 times less time.
Role Evolution
AI coding changes team roles:
Analyst no longer just "writes specs." They become a junior developer:
- ✅ Understand SQL queries
- ✅ Work with Git (branches, commits, PRs)
- ✅ Read code at a basic level
- ✅ Use AI prompts effectively
Developer no longer just "writes code." They become an architect:
- ✅ Design patterns over language syntax
- ✅ System architecture skills
- ✅ DevOps fundamentals
- ✅ Any language: Java, Node.js, Python, Go — AI writes them all
🎖️ "Developers become universal soldiers."
Developers become universal specialists. Can work with any stack because they understand principles, not syntax.
Conclusions and Recommendations
The Complete Architecture
What Works
- Figma → Prototype → Production — design first, implement in small steps
- Opus + Gemini combination — smart architect + fast executor
- Quality Gates — the stricter the constraints, the better the result
- Two-level tracking — Jira for team, Beads for AI
- Memory Bank — external memory to not lose context
- SuperCode Workflows — chain automation for AI actions
- Data mocking — complete development autonomy
What Doesn't Work
- Auto-mode for model selection
- AI without constraints (will fix the entire project)
- Models with context less than 1M tokens for Enterprise
Checklist for Getting Started
- [ ] Set up local development environment
- [ ] Implement Quality Gates (ESLint, TypeScript strict)
- [ ] Create a data mocking system (MSW)
- [ ] Connect MCP (Jira, Context7, Memory Bank)
- [ ] Train analyst on Git and SQL
- [ ] Choose the right models (Opus + Gemini)
Conclusion
The battle for context hasn't been won yet. Technologies evolve, context windows grow, but the problem remains.
Enterprise projects are too large for AI to "see" them in full. This means we need systems that help AI maintain focus. Task trackers, Memory Bank, Quality Gates.
We spent 1.5 billion tokens to understand this. I hope our experience helps you spend less.
🏆 "The battle for context hasn't been won yet. But we know how to fight."
What's your experience with AI coding in large projects? Share in the comments!
🔗 Resources & Links
AI Coding Tools
| Tool | Description |
|---|---|
| Cursor | AI-first code editor with multi-model support |
| GitHub Copilot | AI pair programmer by GitHub |
| Claude Code | Anthropic's agentic coding tool |

AI Models
| Model | Description |
|---|---|
| Claude Opus 4.5 | Anthropic's most capable model (200K context) |
| Gemini 3 Flash | Google's fast model (1M context) |
| ChatGPT | OpenAI's conversational AI |

MCP Servers
| Server | Description |
|---|---|
| Model Context Protocol | Protocol for connecting tools to AI |
| Memory Bank MCP | Persistent context storage for AI |
| Figma MCP | Design-to-code integration |
| Context7 | Library documentation for AI |
| Beads | Local atomic task tracking |

Workflow Automation
| Tool | Description |
|---|---|
| SuperCode | AI workflow chains, voice input, prompt enhancement |

Quality Gates
| Tool | Description |
|---|---|
| ESLint | JavaScript/TypeScript linter |
| TypeScript | Typed JavaScript |
| Vitest | Fast unit test framework |
| commitlint | Commit message linter |
| Secretlint | Prevent committing secrets |
| Husky | Git hooks made easy |

Mocking & Testing
| Tool | Description |
|---|---|
| MSW | Mock Service Worker for API mocking |
| @faker-js | Generate realistic fake data |
| Playwright | E2E testing automation |

Security
| Tool | Description |
|---|---|
| Snyk | Security scanning for dependencies |
Project Management
| Tool | Description |
|---|---|
| Jira | Team-level task management |

Design & Frontend
| Tool | Description |
|---|---|
| Figma | UX/UI design and mockups |
About the Author
Software Engineer.
Tools: Cursor IDE, Claude Opus 4.5, Gemini 3 Flash, SuperCode.
Try Cursor IDE — The AI Code Editor
#ai #cursor #enterprise #programming #devjournal














