425 commits, 672 files, 1.5 billion tokens — and one form. A story about learning to work with AI in a real product.
⚡ TL;DR — Quick Summary

The Problem: AI coding in Enterprise projects fails because of context limitations. Models "forget" after 20-30 minutes of work.

Our Solution: strict Quality Gates, two-level task tracking (Jira for the team, Beads for AI), a Memory Bank as external memory, and a two-model split: Claude Opus plans, Gemini Flash implements.

Key Lesson: AI without constraints is like a broken combine harvester on fire — it will "optimize" everything, including things you didn't ask for.
Introduction: A Task Nobody Had Solved
Imagine this: you need to give an analyst the ability to code. Not "write a prompt to ChatGPT," but actually make changes to an Enterprise product with a three-year history and a million lines of code.
The developer isn't sitting next to them dictating every line. They set up the environment, control quality, and only intervene when something goes wrong.
Sounds like science fiction? We thought so too. Until we tried.
Why This Is Harder Than It Seems
When a programmer uses an AI assistant, they control every step. They see what's happening "under the hood." They notice oddities in the code immediately.
With an analyst, everything is different. They see the result: "the form appeared" or "the form doesn't work." But code quality, architectural decisions, potential bugs — all of this remains behind the scenes.
We decided to create a system that compensates for this blindness. A system where AI can't "cause trouble" even if it really wants to.
Tool Selection: Why Cursor
We tried several options: GitHub Copilot, Claude Code, Windsurf, various API wrappers. We settled on Cursor for several reasons:
AI Coding Tools Comparison (2025)
| Criterion | Cursor | GitHub Copilot | Claude Code | Windsurf | Codex |
|---|---|---|---|---|---|
| Multi-model | Yes: Opus, Gemini, GPT | No: GPT-4/o1 only | No: Claude only | Yes: Multiple | No: codex-1 only |
| MCP Integration | Built-in | Via Extensions | Built-in | Partial | No |
| Custom Rules | .mdc files | No | CLAUDE.md only | Yes | AGENTS.md |
| Agent Mode | Full | Workspace (beta) | Full | Cascade | Full (sandbox) |
| Context Window | 1M+ (Gemini) | 128K | 200K | 1M+ | 192K |
| Enterprise SSO | Yes | Yes | Beta | Yes | Yes |
| IDE Type | Fork VSCode | Extension | Terminal CLI | Fork VSCode | Cloud sandbox |
Conclusion: Cursor is the only tool that combines multi-model support with built-in MCP integration and flexible context-aware rules.
Multi-model Support
Cursor allows using different models for different tasks:
- Claude Opus 4.5 for architectural planning (smart but "expensive" in tokens)
- Gemini 3 Flash for implementation (fast, cheap, and most importantly — 1 million tokens of context)
MCP Integration
Model Context Protocol (MCP) — a way to connect external tools to AI:
| MCP Server | Purpose |
|---|---|
| Jira | Task management |
| Context7 | Library documentation |
| Memory Bank | Context preservation between sessions |
| Beads | Atomic task tracking |
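In Cursor, MCP servers are declared in an `mcp.json` config. Below is a rough sketch of that wiring; the server names and package names are placeholders for illustration, not the actual packages we run:

```json
{
  "mcpServers": {
    "memory-bank": {
      "command": "npx",
      "args": ["-y", "your-memory-bank-mcp-package"]
    },
    "context7": {
      "command": "npx",
      "args": ["-y", "your-context7-mcp-package"]
    }
  }
}
```

Once a server is registered this way, the agent can call its tools directly from a chat or agent session.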
Flexible Rules
Cursor allows creating .mdc files with rules that automatically load depending on context. Working on a React component — get React rules. Writing a script — get Node.js rules.
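As a sketch, a rule file for React components might look roughly like this. The glob and the rule text are invented for the example; the frontmatter fields (description, globs, alwaysApply) are the metadata Cursor reads to decide when a rule is attached:

```markdown
---
description: Conventions for React components
globs: ["src/components/**/*.tsx"]
alwaysApply: false
---

- Function components with typed props only; `any` is forbidden.
- Put the Vitest test next to the component (`Component.test.tsx`).
- Reuse existing UI primitives; do not add new dependencies.
```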
The Design-to-Code Pipeline: Figma → Prototype → Production
One crucial part of our workflow that made AI coding possible: we started from design, not from code.
The Three-Step Process
1. Figma Design — The analyst creates UX/UI mockups in Figma. No code yet, just visual design and component structure.
2. Prototype Implementation — We ask AI to transfer the Figma design to a clean, minimal project (10-15 files). This is where AI shines — small context, clear requirements, fast iteration.
3. Production Migration — Once the prototype works perfectly, we migrate it to the main product. AI handles the integration with existing patterns and styles.
💡 Why this works: AI struggles with large codebases but excels at small, focused tasks. By breaking the work into "design → prototype → production," we keep each step within AI's effective context window.
With Figma MCP, the AI agent can even read design specs directly from Figma files — colors, spacing, component hierarchy — and apply them automatically.
Security Requirements: Working Locally
Our security team set strict requirements: no access to the corporate network during development. No cloning of the production database.
This meant we needed a full mocking system. We built it on MSW (Mock Service Worker):
- 50+ handlers for all API endpoints
- Realistic data generators using @faker-js
- Full business logic emulation
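To give a feel for what these handlers look like, here is a minimal sketch of one. The `/api/users` endpoint and its fields are invented for the example and are not our real API:

```typescript
// src/mocks/handlers/users.ts: illustrative MSW v2 handler with faker-generated data
import { http, HttpResponse } from 'msw';
import { faker } from '@faker-js/faker';

// Build a realistic-looking user instead of hardcoding fixtures
const makeUser = () => ({
  id: faker.string.uuid(),
  name: faker.person.fullName(),
  email: faker.internet.email(),
  createdAt: faker.date.past().toISOString(),
});

export const userHandlers = [
  // GET /api/users returns a page of generated users
  http.get('/api/users', () =>
    HttpResponse.json({ items: Array.from({ length: 20 }, makeUser) }),
  ),
];
```

Handlers like this are registered with MSW's `setupWorker` in the browser and `setupServer` in Node tests, so both the UI and the test suite run fully offline — exactly what the security requirements demanded.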
Quality Gates: The Stricter, The Better
Here's the key insight we took from this project: AI needs strict constraints.
Without them, it starts to "create." Sees outdated code — refactors. Notices a potential vulnerability — "fixes" it. Finds a style mismatch — reformats.
Sounds useful? In practice, it means a simple task "add a field to a form" turns into a PR with 100,000 lines.
Our Quality Gates Pipeline
- commitlint — checks commit message format
- ESLint — strict TypeScript rules, import order
- TypeScript — strict mode, no `any`
- Vitest — unit tests must pass
- Secretlint — checks for accidentally committed secrets
AI cannot bypass these checks. If the code doesn't pass — the commit won't happen.
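Concretely, the enforcement is just Git hooks. A sketch of what the Husky pre-commit hook can look like, assuming each tool's standard CLI (our real scripts differ in details):

```sh
# .husky/pre-commit: every gate must pass before a commit is created
npx eslint .
npx tsc --noEmit
npx vitest run
npx secretlint "**/*"
```

The commit message itself is validated in a separate `commit-msg` hook that runs `npx --no -- commitlint --edit "$1"`. If any command exits non-zero, Git refuses the commit, no matter how insistent the AI is.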
The Context Problem: The Main Pain Point
Now for the most important part. The thing that almost killed the entire project.
Context.
When you work with a simple 10-file application, AI handles it perfectly. The entire project fits in its "memory." It sees the complete picture.
But what happens when the project is a million lines of code accumulated over three years? AI sees only a fragment. The tip of the iceberg.
Here are real numbers:
| Project Size | Tokens | AI Effective Work Time |
|---|---|---|
| Tutorial project | 100K | Unlimited |
| Medium product | 500K | 2-3 hours |
| Enterprise (3+ years) | 1M+ | 20-30 minutes |
After 30 minutes, AI starts to "forget." Repeats mistakes. Proposes solutions you've already rejected. Breaks what was just working.
Four Rakes We Stepped On
Rake #1: "It Worked on a Simple Example"
We ran an experiment. Asked an analyst to create a registration form on a clean boilerplate — minimal React project, reference rules, 10 files.
Result: 15 minutes, everything works perfectly.
The same task on a real project: nothing works. AI gets confused by dependencies, uses outdated patterns, conflicts with existing code.
Lesson: It's not about AI being "dumb." It's about lack of context.
Rake #2: AI "Fixed" the Entire Project
This was a catastrophe. We set a task: add one feature. AI completed it. And also:
- Replaced all `any` with specific types
- "Fixed" potential vulnerabilities
- Reformatted half the project
- Updated outdated dependencies
Result: PR with 100,000+ lines. GitLab physically couldn't display the diff. We spent two weeks figuring it out. The product was broken.
😰 "This was very painful."
🚜 Imagine a combine harvester that suddenly decided it's not just harvesting wheat, but also "optimizing" the entire field — plowing, seeding, and building a barn. Except the harvester is broken and on fire. That's what uncontrolled AI looks like on a large codebase.
Lesson: You need rules that explicitly limit the scope of AI work. Otherwise, you get a "helpful" AI that turns your simple feature into a full-scale renovation project — with demolition included. 🔥
Rake #3: Token Limitation
We didn't immediately understand that most models have context limited to 100-200K tokens. For an Enterprise project, this is enough for 3-5 iterations.
Then AI starts "forgetting" the beginning of the conversation. Proposes solutions you've already rejected. Repeats mistakes.
Lesson: For Enterprise, you need models with at least 1 million tokens of context.
🧠 "Enterprise projects require at least 1 million tokens of context. Otherwise, it doesn't work." — Opus
Rake #4: Auto-Mode Is a Trap
Cursor can automatically select a model. Sounds convenient? In practice, it often chooses a "cheap" model with a small context.
We wasted a lot of time before we understood: for serious work, you need to manually select the model.
Lesson: Opus for planning, Gemini Flash for implementation. No auto-mode.
How We Solved the Context Problem
After all the rakes, we developed a system. It's not perfect, but it works.
Two-Level Task Tracking
Jira — top level. Tasks for the team: "VP-385: Add registration form."
Beads — atomic level. Tasks for AI:
- "bd-1: Review file UserForm.tsx"
- "bd-2: Add email field"
- "bd-3: Write test"
Beads is stored locally, syncs with git. AI always knows what step it stopped at.
Memory Bank
This is "external memory" for AI. We store:
| File | Purpose |
|---|---|
| `activeContext.md` | Current focus — what we're working on now |
| `progress.md` | Implementation status — what's already done |
| `research-*.md` | Investigations — what we found out |
| `archive-*.md` | Completed tasks — historical reference |
Usage example:
AI: "Look at all my commits and summarize them"
Memory Bank → Indexing → Result
💡 "Memory Bank is a lifesaver. Without it, context is lost forever."
When AI "forgets" context, it can access Memory Bank and restore understanding.
SuperCode Workflows
SuperCode adds another layer of acceleration:
| Feature | Description |
|---|---|
| Smart Actions | Custom automation workflows via JSON/YML in `.supercode/actions/` |
| Prompt Updaters | Transform prompts via AI, URL, or shell commands |
| Voice Commands | Trigger actions by voice |
| Nested Workflows | Sequential execution with run: true for multi-step pipelines |
Workflow Example (from SuperCode docs):
```json
{
  "Architecture Design": {
    "mode": "SC:Architect",
    "model": "o3",
    "prompt": "Design the architecture for: $prompt",
    "run": true
  },
  "Implementation": {
    "model": "claude-4-sonnet",
    "prompt": "Implement based on the design: $prompt",
    "run": true
  },
  "Full Feature Workflow": {
    "actions": ["Architecture Design", "Implementation"]
  }
}
```
💡 "I imagine myself as Tony Stark talking to Jarvis."
Model Combination
We split work between two models:
Claude Opus 4.5 — architect. Creates plans, writes specs, conducts reviews. It has "only" 200K tokens, but for planning that's enough.
Gemini 3 Flash — executor. Implements code according to plan. 1 million tokens of context — can work for hours without losing the thread.
Cycle: Opus plans → Gemini implements → Opus reviews.
Project Statistics
Over 1.5 weeks of work on the feature/timeline branch:
| Metric | Value |
|---|---|
| Commits | 425 |
| Files changed | 672 |
| Lines added | +85,000 |
| Lines removed | -11,000 |
| Tests added | ~200 |
| Tokens spent | 1.5 billion |
What was implemented:
- ✅ Full MSW mocking system (50+ handlers)
- ✅ Schedule Timeline with Gantt chart
- ✅ Quality Gates (ESLint, TypeScript, Husky)
- ✅ Beads integration
- ✅ 200+ unit tests
Comparison: Traditional Development vs AI
Honest comparison:
| Parameter | Traditional | With AI |
|---|---|---|
| Time per feature | 2-3 weeks | 1.5 weeks* |
| Code quality | Depends on developer | High (Quality Gates) |
| Tests | Often skipped | 200+ automatically |
| Documentation | Often none | Generated |
\* Including infrastructure setup, learning, and all the rakes.
Important nuance: the first time is expensive. We spent 1.5 weeks understanding how this works. Setting up rules. Stepping on rakes.
💰 "First time is expensive. Second time is 10x faster." — Opus
The second feature will take 10 times less time.
Role Evolution
AI coding changes team roles:
Analyst no longer just "writes specs." They become a junior developer:
- ✅ Understand SQL queries
- ✅ Work with Git (branches, commits, PRs)
- ✅ Read code at a basic level
- ✅ Use AI prompts effectively
Developer no longer just "writes code." They become an architect:
- ✅ Design patterns over language syntax
- ✅ System architecture skills
- ✅ DevOps fundamentals
- ✅ Any language: Java, Node.js, Python, Go — AI writes them all
🎖️ "Developers become universal soldiers."
Developers become universal specialists. Can work with any stack because they understand principles, not syntax.
Conclusions and Recommendations
The Complete Architecture
What Works
- Figma → Prototype → Production — design first, implement in small steps
- Opus + Gemini combination — smart architect + fast executor
- Quality Gates — the stricter the constraints, the better the result
- Two-level tracking — Jira for team, Beads for AI
- Memory Bank — external memory to not lose context
- SuperCode Workflows — chain automation for AI actions
- Data mocking — complete development autonomy
What Doesn't Work
- Auto-mode for model selection
- AI without constraints (will fix the entire project)
- Models with context less than 1M tokens for Enterprise
Checklist for Getting Started
- [ ] Set up local development environment
- [ ] Implement Quality Gates (ESLint, TypeScript strict)
- [ ] Create a data mocking system (MSW)
- [ ] Connect MCP (Jira, Context7, Memory Bank)
- [ ] Train analyst on Git and SQL
- [ ] Choose the right models (Opus + Gemini)
Conclusion
The battle for context hasn't been won yet. Technologies evolve, context windows grow, but the problem remains.
Enterprise projects are too large for AI to "see" them in full. This means we need systems that help AI maintain focus. Task trackers, Memory Bank, Quality Gates.
We spent 1.5 billion tokens to understand this. I hope our experience helps you spend less.
🏆 "The battle for context hasn't been won yet. But we know how to fight."
What's your experience with AI coding in large projects? Share in the comments!
🔗 Resources & Links
AI Coding Tools
| Tool | Description |
|---|---|
| Cursor | AI-first code editor with multi-model support |
| GitHub Copilot | AI pair programmer by GitHub |
| Claude Code | Anthropic's agentic coding tool |

AI Models
| Model | Description |
|---|---|
| Claude Opus 4.5 | Anthropic's most capable model (200K context) |
| Gemini 3 Flash | Google's fast model (1M context) |
| ChatGPT | OpenAI's conversational AI |

MCP Servers
| Server | Description |
|---|---|
| Model Context Protocol | Protocol for connecting tools to AI |
| Memory Bank MCP | Persistent context storage for AI |
| Figma MCP | Design-to-code integration |
| Context7 | Library documentation for AI |
| Beads | Local atomic task tracking |

Workflow Automation
| Tool | Description |
|---|---|
| SuperCode | AI workflow chains, voice input, prompt enhancement |

Quality Gates
| Tool | Description |
|---|---|
| ESLint | JavaScript/TypeScript linter |
| TypeScript | Typed JavaScript |
| Vitest | Fast unit test framework |
| commitlint | Commit message linter |
| Secretlint | Prevent committing secrets |
| Husky | Git hooks made easy |

Mocking & Testing
| Tool | Description |
|---|---|
| MSW | Mock Service Worker for API mocking |
| @faker-js | Generate realistic fake data |
| Playwright | E2E testing automation |

Security
| Tool | Description |
|---|---|
| Snyk | Security scanning for dependencies |
Project Management
| Tool | Description |
|---|---|
| Jira | Team-level task management |

Design & Frontend
| Tool | Description |
|---|---|
| Figma | UX/UI design and mockups |
About the Author
Software Engineer.
Tools: Cursor IDE, Claude Opus 4.5, Gemini 3 Flash, SuperCode.
Try Cursor IDE — The AI Code Editor
#ai #cursor #enterprise #programming #devjournal














