Aleksandr Polenkov

Battle for Context: How We Implemented AI Coding in an Enterprise Project Feature

425 commits, 672 files, 1.5 billion tokens — and one form. A story about learning to work with AI in a real product.

⚡ TL;DR — Quick Summary

The Problem: AI coding in Enterprise projects fails because of context limitations. Models "forget" after 20-30 minutes of work.

Our Solution:

  • Figma → Prototype → Production — Design first, then AI implements in small steps
  • Opus + Gemini combo — Opus plans (200K), Gemini implements (1M tokens)
  • Strict Quality Gates — ESLint, TypeScript, Vitest, commitlint, Secretlint
  • Two-level tracking — Jira for team, Beads for AI atomic tasks
  • Memory Bank — External memory so AI doesn't lose context
  • SuperCode Workflows — Smart Actions for automated multi-step pipelines
  • MSW Mocking — Local development without network access

Key Lesson: AI without constraints is like a broken combine harvester on fire — it will "optimize" everything, including things you didn't ask for.

Introduction: A Task Nobody Had Solved

Imagine this: you need to give an analyst the ability to code. Not "write a prompt to ChatGPT," but actually make changes to an Enterprise product with a three-year history and a million lines of code.

The developer isn't sitting next to them dictating every line. They set up the environment, control quality, and only intervene when something goes wrong.

Sounds like science fiction? We thought so too. Until we tried.

Diagram 1: Traditional vs Our Approach


Why This Is Harder Than It Seems

When a programmer uses an AI assistant, they control every step. They see what's happening "under the hood." They notice oddities in the code immediately.

With an analyst, everything is different. They see the result: "the form appeared" or "the form doesn't work." But code quality, architectural decisions, potential bugs — all of this remains behind the scenes.

Diagram 2: What Analyst Sees vs Hidden

We decided to create a system that compensates for this blindness. A system where AI can't "cause trouble" even if it really wants to.


Tool Selection: Why Cursor

We tried several options: GitHub Copilot, Claude Code, Windsurf, various API wrappers. We settled on Cursor for several reasons:

AI Coding Tools Comparison (2025)

| Criterion | Cursor | GitHub Copilot | Claude Code | Windsurf | Codex |
|---|---|---|---|---|---|
| Multi-model | Yes: Opus, Gemini, GPT | No: GPT-4/o1 only | No: Claude only | Yes: multiple | No: codex-1 only |
| MCP Integration | Built-in | Via extensions | Built-in | Partial | No |
| Custom Rules | .mdc files | No | CLAUDE.md only | Yes | AGENTS.md |
| Agent Mode | Full | Workspace (beta) | Full | Cascade | Full (sandbox) |
| Context Window | 1M+ (Gemini) | 128K | 200K | 1M+ | 192K |
| Enterprise SSO | Yes | Yes | Beta | Yes | Yes |
| IDE Type | VSCode fork | Extension | Terminal CLI | VSCode fork | Cloud sandbox |

Conclusion: Cursor is the only tool that combines multi-model support with built-in MCP integration and flexible context-aware rules.

Multi-model Support

Cursor allows using different models for different tasks:

Diagram 3: Multi-Model Support

  • Claude Opus 4.5 for architectural planning (smart but "expensive" in tokens)
  • Gemini 3 Flash for implementation (fast, cheap, and most importantly — 1 million tokens of context)

MCP Integration

Model Context Protocol (MCP) — a way to connect external tools to AI:

Diagram 4: MCP Integration

| MCP Server | Purpose |
|---|---|
| Jira | Task management |
| Context7 | Library documentation |
| Memory Bank | Context preservation between sessions |
| Beads | Atomic task tracking |

Flexible Rules

Cursor allows creating .mdc files with rules that automatically load depending on context. Working on a React component — get React rules. Writing a script — get Node.js rules.


The Design-to-Code Pipeline: Figma → Prototype → Production

One crucial part of our workflow that made AI coding possible: we started from design, not from code.

The Three-Step Process

  1. Figma Design — The analyst creates UX/UI mockups in Figma. No code yet, just visual design and component structure.

  2. Prototype Implementation — We ask AI to transfer the Figma design to a clean, minimal project (10-15 files). This is where AI shines — small context, clear requirements, fast iteration.

  3. Production Migration — Once the prototype works perfectly, we migrate it to the main product. AI handles the integration with existing patterns and styles.

💡 Why this works: AI struggles with large codebases but excels at small, focused tasks. By breaking the work into "design → prototype → production," we keep each step within AI's effective context window.

With Figma MCP, the AI agent can even read design specs directly from Figma files — colors, spacing, component hierarchy — and apply them automatically.


Security Requirements: Working Locally

Our security team set strict requirements: no access to the corporate network during development. No cloning of the production database.

This meant we needed a full mocking system. We built it on MSW (Mock Service Worker):

Diagram 5: MSW Architecture

  • 50+ handlers for all API endpoints
  • Realistic data generators using @faker-js
  • Full business logic emulation
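
To give a flavor of what those handlers look like, here is a minimal sketch of an MSW v2 handler backed by faker-generated data. The /api/users endpoint, the response shape, and the file name are illustrative assumptions, not our real API:

```typescript
// handlers/users.ts (illustrative only): the endpoint and fields are placeholders
import { http, HttpResponse } from 'msw';
import { faker } from '@faker-js/faker';

export const userHandlers = [
  // GET: return a realistic-looking list so the UI behaves as it would against the real backend
  http.get('/api/users', () => {
    const users = Array.from({ length: 20 }, () => ({
      id: faker.string.uuid(),
      name: faker.person.fullName(),
      email: faker.internet.email(),
      createdAt: faker.date.past().toISOString(),
    }));
    return HttpResponse.json(users);
  }),

  // POST: emulate the "create" rule by echoing the payload back with a server-generated id
  http.post('/api/users', async ({ request }) => {
    const body = (await request.json()) as { name: string; email: string };
    return HttpResponse.json({ id: faker.string.uuid(), ...body }, { status: 201 });
  }),
];
```

Handlers like these are registered once, with setupWorker from msw/browser during local development and setupServer from msw/node in Vitest, so neither the app nor its tests ever touch the corporate network.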

Quality Gates: The Stricter, The Better

Here's the key insight we took from this project: AI needs strict constraints.

Without them, it starts to get "creative." Sees outdated code — refactors. Notices a potential vulnerability — "fixes" it. Finds a style mismatch — reformats.

Sounds useful? In practice, it means a simple task "add a field to a form" turns into a PR with 100,000 lines.

Our Quality Gates Pipeline

Diagram 6: Quality Gates Pipeline

  1. commitlint — checks commit message format
  2. ESLint — strict TypeScript rules, import order
  3. TypeScript — strict mode, no any
  4. Vitest — unit tests must pass
  5. Secretlint — checks for accidentally committed secrets

AI cannot bypass these checks. If the code doesn't pass — the commit won't happen.
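
To make "strict" concrete, here is a minimal sketch of a type-aware ESLint flat config using typescript-eslint. Treat the file name and the exact rule selection as assumptions; our real config is larger and also wires in import-order rules:

```typescript
// eslint.config.ts: a minimal sketch of the "no any, strict mode" gate (not the full config)
import tseslint from 'typescript-eslint';

export default tseslint.config(
  ...tseslint.configs.strictTypeChecked,
  {
    languageOptions: {
      parserOptions: {
        projectService: true, // type-aware linting against the project's tsconfig
      },
    },
    rules: {
      '@typescript-eslint/no-explicit-any': 'error', // the "no any" rule, enforced as an error
      '@typescript-eslint/no-unused-vars': 'error',
    },
  },
);
```

Wired into Husky's pre-commit hook alongside commitlint, Vitest, and Secretlint, a single failing rule means the commit never happens, which is exactly the constraint the AI needs.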


The Context Problem: The Main Pain Point

Now for the most important part. The thing that almost killed the entire project.

Context.

When you work with a simple 10-file application, AI handles it perfectly. The entire project fits in its "memory." It sees the complete picture.

But what happens when the project is a million lines of code accumulated over three years? AI sees only a fragment. The tip of the iceberg.

Diagram 7: Context Iceberg

Here are real numbers:

| Project Size | Tokens | AI Effective Work Time |
|---|---|---|
| Tutorial project | 100K | Unlimited |
| Medium product | 500K | 2-3 hours |
| Enterprise (3+ years) | 1M+ | 20-30 minutes |
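
If you want a rough idea of where your own repository lands in this table, a crude estimate is enough. The sketch below assumes roughly 4 characters per token for source code, which is a heuristic rather than a real tokenizer:

```typescript
// estimate-tokens.ts: rough context-size estimate for a repository (heuristic: ~4 chars per token)
import { readdirSync, readFileSync } from 'node:fs';
import { join, extname } from 'node:path';

const CODE_EXTENSIONS = new Set(['.ts', '.tsx', '.js', '.jsx', '.css', '.json', '.md']);
const IGNORED_DIRS = new Set(['node_modules', '.git', 'dist', 'build', 'coverage']);

function countChars(dir: string): number {
  let chars = 0;
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const fullPath = join(dir, entry.name);
    if (entry.isDirectory()) {
      if (!IGNORED_DIRS.has(entry.name)) chars += countChars(fullPath);
    } else if (CODE_EXTENSIONS.has(extname(entry.name))) {
      chars += readFileSync(fullPath, 'utf8').length;
    }
  }
  return chars;
}

const totalChars = countChars(process.cwd());
console.log(`~${Math.round(totalChars / 4).toLocaleString()} tokens of source code`);
```

Run against a three-year-old Enterprise monorepo, this returns a number in the millions, which is why a 100-200K window only ever sees a slice of the project.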

After 30 minutes, AI starts to "forget." Repeats mistakes. Proposes solutions you've already rejected. Breaks what was just working.


Four Rakes We Stepped On

Diagram 8: Four Rakes & Solutions

Rake #1: "It Worked on a Simple Example"

We ran an experiment. Asked an analyst to create a registration form on a clean boilerplate — minimal React project, reference rules, 10 files.

Result: 15 minutes, everything works perfectly.

The same task on a real project: nothing works. AI gets confused by dependencies, uses outdated patterns, conflicts with existing code.

Lesson: It's not about AI being "dumb." It's about lack of context.

Rake #2: AI "Fixed" the Entire Project

This was a catastrophe. We set a task: add one feature. AI completed it. And also:

  • Replaced all any with specific types
  • "Fixed" potential vulnerabilities
  • Reformatted half the project
  • Updated outdated dependencies

Result: PR with 100,000+ lines. GitLab physically couldn't display the diff. We spent two weeks figuring it out. The product was broken.

😰 "This was very painful."

🚜 Imagine a combine harvester that suddenly decided it's not just harvesting wheat, but also "optimizing" the entire field — plowing, seeding, and building a barn. Except the harvester is broken and on fire. That's what uncontrolled AI looks like on a large codebase.

Lesson: You need rules that explicitly limit the scope of AI work. Otherwise, you get a "helpful" AI that turns your simple feature into a full-scale renovation project — with demolition included. 🔥

Rake #3: Token Limitation

We didn't immediately understand that most models have context limited to 100-200K tokens. For an Enterprise project, this is enough for 3-5 iterations.

Then AI starts "forgetting" the beginning of the conversation. Proposes solutions you've already rejected. Repeats mistakes.

Lesson: For Enterprise, you need models with at least 1 million tokens of context.

🧠 "Enterprise projects require at least 1 million tokens of context. Otherwise, it doesn't work."Opus

Rake #4: Auto-Mode Is a Trap

Cursor can automatically select a model. Sounds convenient? In practice, it often chooses a "cheap" model with a small context.

We wasted a lot of time before we understood: for serious work, you need to manually select the model.

Lesson: Opus for planning, Gemini Flash for implementation. No auto-mode.


How We Solved the Context Problem

After all the rakes, we developed a system. It's not perfect, but it works.

Two-Level Task Tracking

Diagram 9: Jira + Beads Tracking

Jira — top level. Tasks for the team: "VP-385: Add registration form."

Beads — atomic level. Tasks for AI:

  • "bd-1: Review file UserForm.tsx"
  • "bd-2: Add email field"
  • "bd-3: Write test"

Beads data is stored locally and syncs with git, so the AI always knows which step it stopped at.

Memory Bank

This is "external memory" for AI. We store:

Diagram 10: Memory Bank Structure

| File | Purpose |
|---|---|
| activeContext.md | Current focus — what we're working on now |
| progress.md | Implementation status — what's already done |
| research-*.md | Investigations — what we found out |
| archive-*.md | Completed tasks — historical reference |

Usage example:

AI: "Look at all my commits and summarize them"
Memory Bank → Indexing → Result

💡 "Memory Bank is a lifesaver. Without it, context is lost forever."

When AI "forgets" context, it can access Memory Bank and restore understanding.

SuperCode Workflows

SuperCode adds another layer of acceleration:

Diagram 11: SuperCode Workflow

| Feature | Description |
|---|---|
| Smart Actions | Custom automation workflows via JSON/YML in .supercode/actions/ |
| Prompt Updaters | Transform prompts via AI, URL, or shell commands |
| Voice Commands | Trigger actions by voice |
| Nested Workflows | Sequential execution with run: true for multi-step pipelines |

Workflow Example (from SuperCode docs):

{
  "Architecture Design": {
    "mode": "SC:Architect",
    "model": "o3",
    "prompt": "Design the architecture for: $prompt",
    "run": true
  },
  "Implementation": {
    "model": "claude-4-sonnet",
    "prompt": "Implement based on the design: $prompt",
    "run": true
  },
  "Full Feature Workflow": {
    "actions": ["Architecture Design", "Implementation"]
  }
}

💡 "I imagine myself as Tony Stark talking to Jarvis."

Model Combination

We split work between two models:

Claude Opus 4.5 — architect. Creates plans, writes specs, conducts reviews. It has "only" 200K tokens, but for planning that's enough.

Gemini 3 Flash — executor. Implements code according to plan. 1 million tokens of context — can work for hours without losing the thread.

Diagram 12: Model Combination Cycle

Cycle: Opus plans → Gemini implements → Opus reviews.


Project Statistics

Over 1.5 weeks of work on the feature/timeline branch:

| Metric | Value |
|---|---|
| Commits | 425 |
| Files changed | 672 |
| Lines added | +85,000 |
| Lines removed | -11,000 |
| Tests added | ~200 |
| Tokens spent | 1.5 billion |

What was implemented:

  • ✅ Full MSW mocking system (50+ handlers)
  • ✅ Schedule Timeline with Gantt chart
  • ✅ Quality Gates (ESLint, TypeScript, Husky)
  • ✅ Beads integration
  • ✅ 200+ unit tests

Comparison: Traditional Development vs AI

Diagram 13: Traditional vs AI Development

Honest comparison:

| Parameter | Traditional | With AI |
|---|---|---|
| Time per feature | 2-3 weeks | 1.5 weeks* |
| Code quality | Depends on developer | High (Quality Gates) |
| Tests | Often skipped | 200+ automatically |
| Documentation | Often none | Generated |

* Including infrastructure setup, learning, and all the rakes.

Important nuance: the first time is expensive. We spent 1.5 weeks understanding how this works. Setting up rules. Stepping on rakes.

💰 "First time is expensive. Second time is 10x faster."Opus

The second feature will take 10 times less time.


Role Evolution

AI coding changes team roles:

Diagram 14: Role Evolution

The analyst no longer just "writes specs." They become a junior developer:

  • ✅ Understand SQL queries
  • ✅ Work with Git (branches, commits, PRs)
  • ✅ Read code at a basic level
  • ✅ Use AI prompts effectively

The developer no longer just "writes code." They become an architect:

  • ✅ Design patterns over language syntax
  • ✅ System architecture skills
  • ✅ DevOps fundamentals
  • ✅ Any language: Java, Node.js, Python, Go — AI writes them all

🎖️ "Developers become universal soldiers."

Developers become universal specialists: they can work with any stack because they understand principles, not syntax.


Conclusions and Recommendations

The Complete Architecture

Diagram 15: Complete Architecture

What Works

  1. Figma → Prototype → Production — design first, implement in small steps
  2. Opus + Gemini combination — smart architect + fast executor
  3. Quality Gates — the stricter the constraints, the better the result
  4. Two-level tracking — Jira for team, Beads for AI
  5. Memory Bank — external memory to not lose context
  6. SuperCode Workflows — chain automation for AI actions
  7. Data mocking — complete development autonomy

What Doesn't Work

  1. Auto-mode for model selection
  2. AI without constraints (will fix the entire project)
  3. Models with context less than 1M tokens for Enterprise

Checklist for Getting Started


Conclusion

The battle for context hasn't been won yet. Technologies evolve, context windows grow, but the problem remains.

Enterprise projects are too large for AI to "see" them in full. This means we need systems that help AI maintain focus. Task trackers, Memory Bank, Quality Gates.

We spent 1.5 billion tokens to understand this. I hope our experience helps you spend less.

🏆 "The battle for context hasn't been won yet. But we know how to fight."


What's your experience with AI coding in large projects? Share in the comments!



🔗 Resources & Links

AI Coding Tools

| Tool | Description |
|---|---|
| Cursor | AI-first code editor with multi-model support |
| GitHub Copilot | AI pair programmer by GitHub |
| Claude Code | Anthropic's agentic coding tool |

AI Models

| Model | Description |
|---|---|
| Claude Opus 4.5 | Anthropic's most capable model (200K context) |
| Gemini 3 Flash | Google's fast model (1M context) |
| ChatGPT | OpenAI's conversational AI |

MCP Servers

| Server | Description |
|---|---|
| Model Context Protocol | Protocol for connecting tools to AI |
| Memory Bank MCP | Persistent context storage for AI |
| Figma MCP | Design-to-code integration |
| Context7 | Library documentation for AI |
| Beads | Local atomic task tracking |

Workflow Automation

| Tool | Description |
|---|---|
| SuperCode | AI workflow chains, voice input, prompt enhancement |

Quality Gates

| Tool | Description |
|---|---|
| ESLint | JavaScript/TypeScript linter |
| TypeScript | Typed JavaScript |
| Vitest | Fast unit test framework |
| commitlint | Commit message linter |
| Secretlint | Prevent committing secrets |
| Husky | Git hooks made easy |

Mocking & Testing

| Tool | Description |
|---|---|
| MSW | Mock Service Worker for API mocking |
| @faker-js | Generate realistic fake data |
| Playwright | E2E testing automation |

Security

| Tool | Description |
|---|---|
| Snyk | Security scanning for dependencies |

Project Management

| Tool | Description |
|---|---|
| Jira | Team-level task management |
| GitLab | DevOps platform |
| Git | Version control system |

Design & Frontend

| Tool | Description |
|---|---|
| Figma | UI/UX design tool |
| React | UI library |
| Node.js | JavaScript runtime |


About the Author

Software Engineer.

Tools: Cursor IDE, Claude Opus 4.5, Gemini 3 Flash, SuperCode.

Try Cursor IDE — The AI Code Editor

#ai #cursor #enterprise #programming #devjournal
