Cowork Forge - An open-source AI multi-agent development platform, serving as both an embeddable AI Coding engine and a standalone production-grade development tool. GitHub: https://github.com/sopaco/cowork-forge
Introduction
Have you ever encountered this scenario:
Your project has been under development for some time, and suddenly the product manager runs over and says: "We need to add a 'tags' feature for users."
If you follow the traditional development process, you might need to: manually analyze which files need modification, modify the data model, update API interfaces, modify frontend pages, update test cases... and worry about whether you've missed any files.
If you use AI tools, many tools choose "full regeneration"—regenerating the entire project's code. But this brings new problems: your previously manually optimized code gets overwritten, your added comments and documentation are lost, unrelated files get modified, and you need to review all the code again.
This is the problem that "incremental code updates" aims to solve.
The core idea of incremental code updates is: intelligently identify the scope of requirement changes, only modify affected files, and preserve user custom code.
In this article, I'll explore Cowork Forge's incremental code update mechanism in depth: how it analyzes change impact, how it generates precise update plans, and how to apply it in actual projects.
The Problem with "Full Regeneration"
Before discussing incremental updates, let's look at the problems brought by "full regeneration."
Typical Full Regeneration Workflow
When using some AI tools for code generation, the typical process is: requirement change → AI analyzes new requirements → regenerate all files → overwrite original files → user custom code lost → need to review all code again → manually restore custom code.
The problem with this workflow is: AI doesn't know which code was manually added by users and which was AI-generated, so it overwrites all files, including user custom code.
Problems with Full Regeneration
First, overwriting user custom code. This is the most serious problem. Suppose you added performance optimization code to a file: if the AI fully regenerates the project, that optimization is overwritten. The detailed field descriptions and constraints you added to the user data model are lost, and the custom validation logic you added to the user business logic is overwritten as well.
Second, losing comments and documentation. The detailed comments and documentation you added are also lost. They might contain important business logic explanations, design decision records, API usage examples, and so on; losing this information makes subsequent maintenance harder.
Third, modifying unrelated files. Full regeneration might modify some unrelated files, increasing unnecessary risk. For example, AI might modify a completely unchanged configuration file, causing configuration to be reset.
Fourth, need to review all code again. Even if only 10% of files truly need modification, you need to review 100% of the code, wasting a lot of time. Git diff shows a large number of changes, even though most changes are unnecessary.
Actual Case: Full Regeneration of a Feature Module
Suppose you have a user management module containing user data model, user API handlers, user route definitions, user business logic, and other files. Now you need to add a "user tags" feature.
If you use full regeneration, the AI regenerates all files, adding the tag field, tag-related APIs, tag routes, and tag business logic. The problem is that the caching logic you previously added to the user business logic gets overwritten, the logging you added to the user API handlers gets overwritten, and the field validation you added to the user data model gets overwritten.
What's the consequence? You need to manually restore all custom code, need to retest all features, and might introduce new bugs.
If you use incremental updates, AI analyzes the change impact, identifies the affected files, and generates an incremental plan that only adds the tag field, tag-related APIs, tag routes, and tag business logic, while preserving the caching logic. The result? Custom code is preserved, you only need to review the changed parts, and the Git diff is clear and concise.
This case clearly demonstrates the advantage of incremental updates: it only modifies files that truly need modification, preserves user custom code, and greatly reduces review and repair workload.
Core Ideas of Change Impact Analysis
The core of incremental code updates is change impact analysis—identifying which files and code are affected by requirement changes.
Layers of Impact Analysis
Change impact analysis can be divided into several layers: requirements layer analysis (identify changed requirements), design layer analysis (identify changed components), implementation layer analysis (identify changed modules), file layer analysis (identify affected files), code layer analysis (identify affected code snippets).
The benefit of this layered analysis is: from macro to micro, gradually narrowing the impact scope, ensuring analysis accuracy.
Requirements layer analysis identifies new requirements, deleted requirements, modified requirements. For example, if the PRD adds a "user tags" feature, this is a new requirement.
Design layer analysis identifies components that need to be added, components that need to be modified. For example, the user data model needs to add a tags field, the user API handler needs to add tag-related interfaces.
Implementation layer analysis identifies modules that need to be added, modules that need to be modified. For example, need to add a tag management module, need to modify the user management module.
File layer analysis identifies files that need to be added, files that need to be modified. For example, need to add tag-related API files, need to modify user data model files.
Code layer analysis identifies code snippets that need modification. For example, need to add a tags field in the User struct, need to add tag-related processing logic in user API handlers.
File Dependency Relationship Construction and Analysis
The core of change impact analysis is constructing a file dependency relationship graph.
The dependency relationship graph contains nodes (representing a file) and edges (representing dependency relationships). Nodes contain file path, file type, exported content, imported content. Edges contain dependency source, dependency target, dependency type (direct import, type reference, function call, data flow).
The process of constructing a dependency relationship graph is: scan all source files, analyze each file, parse AST (Abstract Syntax Tree), extract imports and exports, add nodes, build dependency relationships.
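To make this concrete, here is a minimal sketch of what such a dependency graph could look like. The type and field names (DependencyGraph, FileNode, DependencyKind, and so on) are illustrative assumptions for this article, not Cowork Forge's actual internal types.

```rust
use std::collections::HashMap;

// Illustrative dependency kinds, mirroring the edge types described above.
#[derive(Debug, Clone)]
enum DependencyKind {
    DirectImport,
    TypeReference,
    FunctionCall,
    DataFlow,
}

// One node per source file: path, type, and the symbols it exports/imports.
#[derive(Debug)]
struct FileNode {
    path: String,
    file_type: String,    // e.g. "rust", "typescript"
    exports: Vec<String>, // symbols this file exposes
    imports: Vec<String>, // symbols this file pulls in
}

// One edge per dependency: which file depends on which, and how.
#[derive(Debug)]
struct DependencyEdge {
    from: String, // the dependent file
    to: String,   // the file being depended on
    kind: DependencyKind,
}

#[derive(Default)]
struct DependencyGraph {
    nodes: HashMap<String, FileNode>,
    edges: Vec<DependencyEdge>,
}

impl DependencyGraph {
    fn add_node(&mut self, node: FileNode) {
        self.nodes.insert(node.path.clone(), node);
    }

    fn add_edge(&mut self, from: &str, to: &str, kind: DependencyKind) {
        self.edges.push(DependencyEdge {
            from: from.to_string(),
            to: to.to_string(),
            kind,
        });
    }

    // Files that directly depend on `path`, i.e. its direct dependents.
    fn dependents_of(&self, path: &str) -> Vec<&str> {
        self.edges
            .iter()
            .filter(|e| e.to == path)
            .map(|e| e.from.as_str())
            .collect()
    }
}
```

Building the graph would then be a matter of scanning the source files, parsing each one into an AST, extracting its imports and exports into a FileNode, and adding edges for every resolved dependency.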
Impact propagation analysis finds direct dependents (files that depend on the current file) and indirect dependents (files that depend on those direct dependents), traverses the dependency relationship graph with breadth-first search, and calculates the propagation depth.
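Here is a small, self-contained sketch of that propagation step, assuming the graph has been flattened into a reverse-dependency map (file → files that depend on it). The function name propagate_impact is a hypothetical helper for illustration.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// Breadth-first propagation from a changed file: returns each affected file
// together with the depth at which the change reaches it.
fn propagate_impact(
    dependents: &HashMap<String, Vec<String>>,
    changed_file: &str,
    max_depth: usize,
) -> Vec<(String, usize)> {
    let mut visited: HashSet<String> = HashSet::new();
    let mut queue: VecDeque<(String, usize)> = VecDeque::new();
    let mut affected = Vec::new();

    queue.push_back((changed_file.to_string(), 0));
    visited.insert(changed_file.to_string());

    while let Some((file, depth)) = queue.pop_front() {
        // Stop expanding once the configured propagation depth is reached.
        if depth >= max_depth {
            continue;
        }
        for dependent in dependents.get(&file).into_iter().flatten() {
            if visited.insert(dependent.clone()) {
                // Record this file and the depth at which it was reached.
                affected.push((dependent.clone(), depth + 1));
                queue.push_back((dependent.clone(), depth + 1));
            }
        }
    }
    affected
}
```

Calling propagate_impact with the changed file and a depth limit yields every direct and indirect dependent, which is exactly the set of files the incremental plan has to consider.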
The benefit of this dependency relationship analysis is: when a file is modified, it can quickly find all affected files, ensuring no files needing updates are missed.
API-Level Impact Propagation
Besides file-level dependencies, we also need to analyze API-level impact.
API-level impact analysis identifies API changes (add, delete, modify, rename), detects breaking changes, and identifies all affected consumers. For example, if you modify an API signature, all code calling that API needs to be updated. AI identifies this affected code and includes the changes in the incremental plan.
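A simplified sketch of how API changes could be classified and traced back to their consumers might look like this; the types (ApiSignature, ApiChange) and helper names are assumptions for illustration, not Cowork Forge's actual data model.

```rust
// Simplified representation of an API signature.
#[derive(Debug, PartialEq)]
struct ApiSignature {
    name: String,
    params: Vec<String>, // parameter type names, simplified
    return_type: String,
}

// The four change categories mentioned above: add, delete, modify, rename.
#[derive(Debug)]
enum ApiChange {
    Added(ApiSignature),
    Removed(ApiSignature),
    Modified { old: ApiSignature, new: ApiSignature },
    Renamed { old_name: String, new_name: String },
}

impl ApiChange {
    // Newly added APIs don't break anyone; removals, modifications, and
    // renames can break existing callers.
    fn is_breaking(&self) -> bool {
        !matches!(self, ApiChange::Added(_))
    }
}

// For a breaking change, every call site of the old API becomes part of
// the incremental plan.
fn affected_consumers<'a>(
    call_sites: &'a [(String, String)], // (caller file, callee API name)
    api_name: &str,
) -> Vec<&'a str> {
    call_sites
        .iter()
        .filter(|(_, callee)| callee == api_name)
        .map(|(caller, _)| caller.as_str())
        .collect()
}
```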
The benefit of this API-level impact analysis is: it can ensure API changes don't break existing callers, guaranteeing system stability.
Incremental Update Mechanism Details
After understanding change impact analysis, let's look at how Cowork Forge's incremental update mechanism works.
CodeUpdater's Working Principle
CodeUpdater is the core component responsible for incremental updates.
It contains a dependency relationship graph, code analyzer, and impact analyzer. When receiving design changes, it analyzes change impact, generates an update plan, and optimizes the update plan.
The dependency relationship graph is used to track dependencies between files, the code analyzer is used to analyze code structure, and the impact analyzer is used to analyze the scope of change impact.
Update Plan Generation
The process of generating an update plan is: sort files by dependency, generate update instructions for each file, add new files.
Sorting files by dependency ensures updates are applied in a valid order: if file A depends on file B, then file B should be modified first. This can be achieved through topological sorting.
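Here is a minimal sketch of that ordering step using Kahn's algorithm, assuming the dependencies have been collected into a simple map (file → files it depends on); topo_sort is a hypothetical helper name.

```rust
use std::collections::{HashMap, VecDeque};

// Returns files in dependency order (dependencies before dependents),
// or None if the graph contains a cycle.
fn topo_sort(deps: &HashMap<String, Vec<String>>) -> Option<Vec<String>> {
    // in_degree[f] = number of unprocessed dependencies of f
    let mut in_degree: HashMap<&str, usize> = HashMap::new();
    let mut dependents: HashMap<&str, Vec<&str>> = HashMap::new();

    for (file, file_deps) in deps {
        in_degree.entry(file).or_insert(0);
        for dep in file_deps {
            in_degree.entry(dep).or_insert(0);
            *in_degree.entry(file).or_insert(0) += 1;
            dependents.entry(dep).or_default().push(file);
        }
    }

    // Start with files that have no unprocessed dependencies.
    let mut queue: VecDeque<&str> = in_degree
        .iter()
        .filter_map(|(&f, &d)| if d == 0 { Some(f) } else { None })
        .collect();
    let mut order: Vec<String> = Vec::new();

    while let Some(file) = queue.pop_front() {
        order.push(file.to_string());
        for &dependent in dependents.get(file).into_iter().flatten() {
            let d = in_degree.get_mut(dependent).unwrap();
            *d -= 1;
            if *d == 0 {
                queue.push_back(dependent);
            }
        }
    }

    // A cycle leaves some files with nonzero in-degree.
    if order.len() == in_degree.len() {
        Some(order)
    } else {
        None
    }
}
```

If the graph contains a cycle, no valid order exists, which is one reason circular dependency detection (discussed later) matters.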
Generating update instructions means analyzing each file's changes; if a file has changes, it is added to the update plan. Each update instruction includes the file path, the change type (add, modify, delete), and the change content.
Adding new files generates templates for each new file. Templates are generated according to the project's coding standards and conventions, ensuring new files are consistent with existing code style.
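Putting these pieces together, an incremental plan could be represented by data shapes like the following; the field names and variants are illustrative assumptions, not the exact schema Cowork Forge uses.

```rust
// The three change types mentioned above.
#[derive(Debug)]
enum ChangeType {
    Add,
    Modify,
    Delete,
}

// One instruction per file in the plan.
#[derive(Debug)]
struct UpdateInstruction {
    file_path: String,
    change_type: ChangeType,
    // Description of what to change, e.g.
    // "add a tags field to the User struct, keep validate() intact".
    change_content: String,
}

// The overall plan: updates to existing files plus brand-new files,
// ordered by dependency (see the topological sort above).
#[derive(Debug, Default)]
struct IncrementalPlan {
    file_updates: Vec<UpdateInstruction>,
    file_creations: Vec<UpdateInstruction>,
}
```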
How to Preserve User Custom Code
Preserving user custom code is the core challenge of incremental updates. Cowork Forge uses the following strategies.
First, code region marking. AI-generated code is wrapped in markers, and user custom code can be marked as well. This way, during incremental updates, AI can identify which code it generated and which code the user wrote.
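As a hypothetical illustration, region markers could look like the comments below; the exact marker syntax Cowork Forge uses may differ.

```rust
// <cowork:generated id="user-model">
// Region produced by the AI; incremental updates may regenerate it.
pub struct User {
    pub id: u64,
    pub name: String,
    pub email: String,
}
// </cowork:generated>

// <cowork:custom>
// Hand-written code outside the generated region, so an incremental
// update leaves it untouched.
impl User {
    pub fn display_name(&self) -> &str {
        if self.name.is_empty() {
            "anonymous"
        } else {
            self.name.as_str()
        }
    }
}
// </cowork:custom>
```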
Second, code difference analysis. It compares the original code with the newly generated code, identifies user custom code, and produces a diff. This can be achieved by comparing the ASTs of the two versions.
Third, code merge strategy. It takes the original code, the new AI-generated code, and the user's code, identifies and resolves conflicts, and produces the merged code. The merge strategy is: preserve user code, merge in AI code, and resolve conflicts.
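At its core this is a three-way merge decision per code region. Here is a deliberately simplified sketch of that decision, assuming each region's base version, current on-disk version, and regenerated version are available; it illustrates the strategy, not the actual merge engine.

```rust
#[derive(Debug, PartialEq)]
enum MergeOutcome {
    KeepUser(String),
    TakeRegenerated(String),
    Conflict { user: String, regenerated: String },
}

// `base` is the previously generated code, `user` is what's on disk now,
// `regenerated` is the AI's new version of the same region.
fn merge_region(base: &str, user: &str, regenerated: &str) -> MergeOutcome {
    let user_changed = user != base;
    let ai_changed = regenerated != base;
    match (user_changed, ai_changed) {
        // AI didn't change this region: preserve whatever the user has.
        (_, false) => MergeOutcome::KeepUser(user.to_string()),
        // Only the AI changed it: take the regenerated version.
        (false, true) => MergeOutcome::TakeRegenerated(regenerated.to_string()),
        // Both changed it: surface a conflict for intelligent resolution.
        (true, true) => MergeOutcome::Conflict {
            user: user.to_string(),
            regenerated: regenerated.to_string(),
        },
    }
}
```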
The benefit of this design is: user custom code is preserved, AI's new code is merged, conflicts are intelligently resolved, greatly reducing user workload.
Complete Process Demonstration
Let's look at how incremental updates work through a complete case.
Scenario: Adding Tag Functionality to User Module
Suppose we have a user management module, and now need to add tag functionality.
The original user data model defines user ID, name, email, creation time, update time, and other fields, plus a user custom validation method that checks whether the user name is empty and whether the email contains an @ symbol.
Requirement Change
The requirement has changed, and the PRD adds a new requirement: users can be assigned tags for categorization and filtering.
Design Change
The design document is also updated, and the user table adds a tags field.
Incremental Update Process
The incremental update process starts by detecting the PRD change, comparing the old and new versions, and identifying the requirement differences. It then maps the affected files, generates an incremental plan, and asks the user to confirm the change plan (HITL). Once confirmed, the code executor implements the changes and the verification module runs tests. If verification passes, the TodoList status is updated; if it fails, the error analyzer diagnoses the cause: a planning error returns the flow to mapping affected files, an execution error triggers a local fix, and an environment error triggers an environment fix.
```mermaid
graph LR
A[Detect PRD Change] --> B[Compare Old and New Versions]
B --> C[Identify Requirement Differences]
C --> D[Map Affected Files]
D --> E[Generate Incremental Plan]
E --> F[HITL Confirm Change Plan]
F --> G{User Confirms?}
G -->|Yes| H[Code Executor Implements Changes]
G -->|No| I[Plan Adjustment]
I --> D
H --> J[Verification Module Executes Tests]
J --> K{Verification Results?}
K -->|Passed| L[Update TodoList Status]
K -->|Failed| M[Error Analyzer Diagnoses]
M --> N[Analyze Failure Cause]
N --> O{Error Type?}
O -->|Planning Error| D
O -->|Execution Error| P[Local Fix]
P --> J
O -->|Environment Error| Q[Environment Fix]
Q --> J
classDef process fill:#e1f5fe,stroke:#01579b,stroke-width:2px
classDef decision fill:#fff3e0,stroke:#e65100,stroke-width:2px
classDef action fill:#e8f5e8,stroke:#2e7d32,stroke-width:2px
class A,B,C,D,E,F,H,J,L,M,N process
class G,K,O decision
class I,P,Q action
```
This flowchart shows the complete incremental update process. You can see this is an intelligent process with feedback loops—if problems occur, it intelligently analyzes the cause and takes appropriate measures.
Change Impact Analysis
AI will analyze design changes and identify affected files.
Affected files include: user data model (add tags field, preserve user custom validate method), user API handler (update API processing logic, add include_tags parameter), user route definition (might need to update routes), user business logic (update business logic, add add_tag method), database migration (add new migration file).
Generate Incremental Plan
AI will generate an incremental plan containing file updates and file creation.
File updates include: user data model (add tags field, preserve user custom validate method), user API handler (update get_user handler, add include_tags parameter), user business logic (add add_tag method).
File creation includes: database migration file (create new migration file).
HITL Confirm Change Plan
User will review the change plan, seeing change plan summary (3 files modified, 1 file added, expected impact scope medium) and detailed changes (user data model adds tags field and preserves user custom validate method, user API handler updates get_user handler and adds include_tags parameter, user business logic adds add_tag method, create new database migration file).
Updated Code
After the update, the user data model adds an optional tags field (wrapped in an Option), preserving the user custom validate method. Note: the user custom validate method is completely preserved!
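For illustration, the updated model could look roughly like this; the concrete field types (including Option<Vec<String>> for tags, and strings for the timestamps) are assumptions for the example rather than the exact generated code.

```rust
pub struct User {
    pub id: u64,
    pub name: String,
    pub email: String,
    pub created_at: String,
    pub updated_at: String,
    // New field added by the incremental update.
    pub tags: Option<Vec<String>>,
}

impl User {
    // User custom validation method: left untouched by the incremental update.
    pub fn validate(&self) -> Result<(), String> {
        if self.name.is_empty() {
            return Err("name must not be empty".to_string());
        }
        if !self.email.contains('@') {
            return Err("email must contain @".to_string());
        }
        Ok(())
    }
}
```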
Verification Results
After code update completes, the verification module executes tests. The check report shows build status success, test status passed, all 18 test cases passed, user code preserved, migration applied.
Technical Challenges and Solutions
Although incremental code updates are powerful, they also face some technical challenges.
Complexity of Dependency Relationship Construction
First, multi-language support. Different programming languages express dependencies differently, dynamic-language dependencies are difficult to analyze statically, and advanced features like macros and templates further increase analysis difficulty.
The solution is to support multi-language dependency analysis: implement a language analyzer for each supported language, using language-specific parsing tools.
Second, dynamic language support. For dynamic languages, static analysis is combined with runtime information: a static analyzer extracts the code structure, a runtime analyzer collects runtime information, and the results of the two analyses are merged.
Boundary Case Identification and Handling
First, circular dependency detection. Circular dependencies are found by traversing the dependency relationship graph with depth-first search; a cycle would otherwise break the dependency-ordered update plan.
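A minimal sketch of such cycle detection, using the classic white/gray/black DFS coloring over a simple dependency map, could look like this; has_cycle is a hypothetical helper name.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq)]
enum Color {
    White, // not visited yet
    Gray,  // on the current DFS path
    Black, // fully processed
}

fn visit(
    node: &str,
    deps: &HashMap<String, Vec<String>>,
    colors: &mut HashMap<String, Color>,
) -> bool {
    colors.insert(node.to_string(), Color::Gray);
    for dep in deps.get(node).into_iter().flatten() {
        match colors.get(dep).copied().unwrap_or(Color::White) {
            // Reaching a gray node means we looped back onto the current
            // path: that is a circular dependency.
            Color::Gray => return true,
            Color::White => {
                if visit(dep, deps, colors) {
                    return true;
                }
            }
            Color::Black => {}
        }
    }
    colors.insert(node.to_string(), Color::Black);
    false
}

// `deps` maps each file to the files it depends on.
fn has_cycle(deps: &HashMap<String, Vec<String>>) -> bool {
    let mut colors = HashMap::new();
    deps.keys().any(|file| {
        colors.get(file).copied().unwrap_or(Color::White) == Color::White
            && visit(file, deps, &mut colors)
    })
}
```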
Second, conditional compilation handling. Conditional compilation directives are identified, their conditions are evaluated, and the dependencies within each conditional block are analyzed.
Performance Optimization Strategies
First, incremental analysis. Only changed files are reanalyzed: if a file's analysis result is already in the cache and the file is unchanged, the cached result is used; otherwise, the file is reanalyzed and the cache is updated.
Second, parallel analysis. Multiple files are analyzed concurrently using async tasks, which improves analysis speed.
Third, cache optimization. Before each analysis, the cache is checked; on a cache hit the cached result is returned directly, otherwise the analysis runs and the cache is updated.
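A minimal sketch of this cache-first pattern, keyed by a content hash so unchanged files are never reanalyzed, could look like the following; the type and field names are illustrative assumptions.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug)]
struct FileAnalysis {
    exports: Vec<String>,
    imports: Vec<String>,
}

#[derive(Default)]
struct AnalysisCache {
    // content hash -> previously computed analysis result
    entries: HashMap<u64, FileAnalysis>,
}

impl AnalysisCache {
    fn analyze(&mut self, content_hash: u64, content: &str) -> FileAnalysis {
        // Cache hit: the file is unchanged, return the stored result directly.
        if let Some(hit) = self.entries.get(&content_hash) {
            return hit.clone();
        }
        // Cache miss: run the (expensive) analysis, then update the cache.
        let result = run_analysis(content);
        self.entries.insert(content_hash, result.clone());
        result
    }
}

// Placeholder for the real parser/analyzer.
fn run_analysis(_content: &str) -> FileAnalysis {
    FileAnalysis { exports: vec![], imports: vec![] }
}
```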
Summary
Incremental code updates are one of Cowork Forge's core features. Through intelligent change impact analysis, it only modifies affected files and preserves user custom code.
Core Value of Incremental Updates
First, preserve user custom code. Won't overwrite user manually optimized code, preserve user-added comments and documentation, maintain code's personal style.
Second, improve development efficiency. Only modify necessary files, reduce code review workload, lower risk of introducing bugs.
Third, version control friendly. Git diff is clear and concise, change history is easy to track, code review is more efficient.
Fourth, support iterative development. Rapidly respond to requirement changes, flexibly adjust feature implementation, maintain code quality.
Applicable Scenarios and Limitations
Incremental updates are suitable for projects with frequent requirement changes, projects that need to preserve user custom code, incremental development of large projects, and multi-user collaboration projects.
But incremental updates also have limitations: projects with complex dependency relationships might have inaccurate analysis, dynamic language dependency analysis is more difficult, needs good code structure support, first-time use has some learning cost.
Future Improvement Directions
First, smarter dependency analysis. Support more programming languages, improve dynamic language analysis accuracy, support more complex code patterns.
Second, more precise change identification. Improve change impact identification precision, reduce false positives and false negatives, support more fine-grained changes.
Third, smarter code merging. Improve code merging accuracy, support more complex conflict resolution, provide better merge suggestions.
Fourth, better performance optimization. Further improve analysis speed, reduce memory usage, support ultra-large projects.
Recommendations for Developers
First, maintain good code structure. Clear module division, clear dependency relationships, consistent coding style.
Second, use code markers. Mark AI-generated code, mark user custom code, facilitating incremental update identification.
Third, regularly review change plans. Carefully review AI-generated change plans, confirm change rationality, promptly adjust inappropriate changes.
Fourth, fully leverage version control. Use Git to manage code changes, commit code regularly, facilitating rollback and backtracking.
Conclusion
Incremental code updates are an important feature of AI-driven software development. It solves the problems brought by full regeneration, making AI tools more practical and reliable.
Through intelligent change impact analysis, incremental updates can precisely identify files that need modification, preserve user custom code, and improve development efficiency.
As AI technology develops, incremental updates will become smarter and more precise, providing developers with better experience.
Future software development isn't AI completely replacing humans, but AI and humans collaborating deeply. Incremental updates are an important embodiment of this collaboration.