DEV Community

137Foundry

How to Build a Dependency Map of a Legacy Codebase Using AI Tools

Before you refactor anything in a legacy codebase, you need to know what depends on what. Change a function signature without knowing its callers, and you break things in unexpected places. Rename a class without understanding its inheritance tree, and you introduce failures that are hard to trace. The dependency map is the safety information that makes everything else in a legacy modernization project lower-risk.

Building a complete dependency map manually is expensive. AI tools accelerate the process significantly - with important caveats about where they fail that you need to know upfront.

What a Dependency Map Needs to Capture

A useful dependency map for legacy modernization is not just an import graph. It needs to capture:

  • Direct imports and module dependencies: what each file imports from where
  • Function call relationships: which functions call which other functions
  • Data flow: what data structures are created in one module and consumed in another
  • External dependencies: third-party libraries and their versions
  • Configuration dependencies: environment variables, config files, and feature flags the code depends on
  • Database schema dependencies: tables and columns the code reads from and writes to

The import graph is the easiest layer to generate automatically. The others require more work, and the deeper you go, the more valuable the map becomes.
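One way to keep these layers organized as you collect them is a simple per-module record. A minimal sketch (the field names here are invented for illustration, not a standard schema):

```python
from dataclasses import dataclass, field

@dataclass
class ModuleDeps:
    """One module's entry in the dependency map.
    Field names are illustrative, not a standard schema."""
    imports: list[str] = field(default_factory=list)           # direct imports
    calls: dict[str, list[str]] = field(default_factory=dict)  # function -> functions it calls
    produces: list[str] = field(default_factory=list)          # data structures created here
    consumes: list[str] = field(default_factory=list)          # data structures read from elsewhere
    external: dict[str, str] = field(default_factory=dict)     # third-party package -> version
    config_keys: list[str] = field(default_factory=list)       # env vars, config files, feature flags
    tables_read: list[str] = field(default_factory=list)       # DB tables read
    tables_written: list[str] = field(default_factory=list)    # DB tables written
```

Filling in only the `imports` field already gives you something useful; the deeper fields get populated as Steps 2 through 4 proceed.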

Step 1: Generate the Static Import Graph

Start with what static analysis can tell you. For Python, the built-in ast module lets you walk any Python file and extract all import statements:

import ast
from pathlib import Path

def extract_imports(filepath):
    """Extract all imports from a Python file."""
    with open(filepath) as f:
        tree = ast.parse(f.read())

    imports = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            module = node.module or ''
            imports.append(module)
    return imports

def build_import_graph(root_dir):
    """Build a dependency graph for all Python files in a directory."""
    graph = {}
    root = Path(root_dir)
    for py_file in root.rglob('*.py'):
        relative = py_file.relative_to(root).as_posix()
        try:
            graph[relative] = extract_imports(py_file)
        except SyntaxError:
            # Legacy trees often contain files the current parser
            # cannot read (e.g. Python 2 syntax); record and move on.
            graph[relative] = []
    return graph

This gives you a starting point: a dictionary of every Python file and its direct imports. For JavaScript and TypeScript codebases, ESLint import plugins and tools like madge provide similar static import analysis.
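Even this first layer answers useful questions if you invert it. The helper below is a minimal sketch that assumes the {file: [imports]} shape produced above and the usual path-to-dotted-name convention; packages using namespace tricks or unusual layouts will need more care.

```python
from collections import Counter

def module_name(relpath):
    """Convert a relative file path like 'pkg/utils.py' to 'pkg.utils'.
    Uses str.removesuffix, so this needs Python 3.9+."""
    return relpath.removesuffix('.py').removesuffix('/__init__').replace('/', '.')

def incoming_counts(graph):
    """Count how many files in the codebase import each internal module.
    External imports (os, json, third-party packages) are ignored."""
    internal = {module_name(path) for path in graph}
    counts = Counter()
    for imports in graph.values():
        for imp in imports:
            if imp in internal:
                counts[imp] += 1
    return counts
```

Modules at the top of this count are the ones where a careless change hurts the most, which feeds directly into the refactoring-sequence decision discussed at the end.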

Step 2: Use AI to Enrich the Graph with Call Relationships

Static import analysis tells you which modules depend on which other modules. It does not tell you which specific functions are called across that dependency. This is where AI assistance becomes valuable.

For each module that has dependencies you care about, prompt the AI to identify the call relationships:

Analyze the following two files. I need to understand the call relationship between them.

For every function in module A that calls a function in module B:
1. Name the calling function in module A
2. Name the function being called in module B  
3. Note what arguments are passed
4. Note how the return value is used

Be explicit about any indirect calls (through variables, dictionaries, or dynamic dispatch).

[module A code]
[module B code]

This produces a function-level call graph that is significantly more useful than the module-level import graph for planning refactoring safely.

Important limitation: AI analysis misses dynamic call patterns. If your legacy code stores function references in dictionaries and calls them by key, or if it uses Python's getattr() to call methods by name strings, these dynamic calls will not appear in AI-generated call graphs. You need manual inspection to catch them.
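To make the limitation concrete, here is the kind of legacy pattern that will not show up in an AI-generated call graph. Everything below is a hypothetical illustration, not code from any particular codebase:

```python
def process_refund(payload):
    return ('refund', payload)

def process_invoice(payload):
    return ('invoice', payload)

# Dispatch table: handlers are looked up by string key, so no direct
# call to process_refund() appears anywhere in the source.
HANDLERS = {
    'refund': process_refund,
    'invoice': process_invoice,
}

def handle(event_type, payload):
    # This call-graph edge exists only at runtime, keyed by a string.
    return HANDLERS[event_type](payload)

class Pipeline:
    def do_validate(self):
        return 'validated'

def run_step(obj, step_name):
    # getattr() builds the method name at runtime; static analysis and
    # AI call-graph extraction both tend to miss this edge.
    return getattr(obj, 'do_' + step_name)()
```

When manual inspection turns up patterns like these, record the runtime-only edges in the map explicitly, tagged as dynamic, so nobody trusts the static graph to be complete.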

Step 3: Document External Dependencies and Their Versions

Legacy codebases often have complex dependency situations: pinned versions of libraries that are years out of date, circular dependencies between internal packages, and third-party libraries that are no longer maintained.

For Python projects, your requirements.txt or Pipfile.lock is the starting point. For JavaScript, package-lock.json or yarn.lock. Extract the full dependency tree including transitive dependencies:

# For Python: pip can generate a dependency tree
# pip install pipdeptree
# pipdeptree --json > dependencies.json

import json
import subprocess

result = subprocess.run(
    ['pipdeptree', '--json'],
    capture_output=True,
    text=True
)
deps = json.loads(result.stdout)
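Once you have the JSON, a little processing turns it into a review-ready summary. The sketch below assumes the list-of-entries shape that pipdeptree --json emits, where each entry has a 'package' dict and a 'dependencies' list whose items carry a 'key' field; verify against your pipdeptree version's actual output.

```python
from collections import Counter

def dependency_fanin(deps):
    """Count how many packages directly require each dependency.

    High fan-in is a rough signal of how entangled an upgrade
    would be: many packages constrain that dependency's version.
    """
    fanin = Counter()
    for entry in deps:
        for dep in entry.get('dependencies', []):
            fanin[dep['key']] += 1
    return fanin
```

Dependencies with high fan-in are the riskiest upgrades; they belong near the top of the prioritized list you ask the AI to help build.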

Feed this dependency list to the AI and ask it to identify:

  • Which dependencies are significantly out of date
  • Which dependencies have known security vulnerabilities (verify against an advisory source such as pip-audit or OWASP Dependency-Check, since AI knowledge of CVEs can be stale)
  • Which dependencies appear to have functional overlap (where you might be able to consolidate)
  • Which dependencies are no longer actively maintained

This produces a prioritized list of dependency modernization work that is separate from but informed by your code-level refactoring plan.

Step 4: Map Database Schema Dependencies

For legacy systems with significant database logic, the schema dependency is often the most complex layer. Business logic about what tables and columns mean gets embedded in code, and changing either requires understanding both.

The AI can help by analyzing SQL queries embedded in your codebase:

Analyze the following Python module. Extract every SQL query (including those
built dynamically through string concatenation). For each query:
1. Identify the tables accessed
2. Identify the specific columns read or written
3. Note whether it is a read or a write operation
4. Note any JOIN relationships

[module with SQL here]

This produces a table-and-column-level dependency map for each module. Modules that share table dependencies need to be refactored in coordination - changing a table schema without updating all the code that touches it is a common source of legacy modernization failures.
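Before involving the AI, a cheap regex pass can inventory the obvious table references, so you know which modules to prioritize for the deeper prompt-based analysis. This is a deliberately naive sketch: it catches common FROM/JOIN/INTO/UPDATE patterns but not CTEs, subquery aliases, or queries assembled at runtime.

```python
import re

# Naive pattern: a table name following a keyword that usually
# introduces one. Dialect-specific syntax will slip through.
TABLE_RE = re.compile(
    r'\b(?:FROM|JOIN|INTO|UPDATE)\s+([A-Za-z_][A-Za-z0-9_.]*)',
    re.IGNORECASE,
)

def tables_in_query(sql):
    """Return a sorted, deduplicated list of table names in a query."""
    return sorted({m.group(1).lower() for m in TABLE_RE.finditer(sql)})
```

Modules whose queries touch many tables, or the same tables as other modules, are the ones worth the full per-query AI analysis above.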

Step 5: Assemble and Visualize

With the import graph, call graph, external dependencies, and schema dependencies documented, you have a map that is genuinely useful for planning. The next step is making it navigable.

For complex codebases, a structured Markdown document organized by module works well as a starting point - it is human-readable and can be committed to version control alongside the code. For very large codebases, Git-tracked JSON or YAML dependency files, potentially visualized with a tool like Mermaid (which GitHub renders natively in Markdown), make the relationships searchable and interactive.
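If you go the Mermaid route, the import graph from Step 1 can be rendered directly. The sketch below assumes the {file: [imports]} dictionary from Step 1 and sanitizes names just enough to produce valid Mermaid node identifiers; unusual file names could collide after sanitization.

```python
def to_mermaid(graph):
    """Render a module-level import graph as Mermaid flowchart text,
    ready to paste into a Markdown file that GitHub will render."""
    lines = ['graph TD']
    for src, targets in sorted(graph.items()):
        node = src.replace('/', '_').replace('.', '_')
        for tgt in targets:
            lines.append(f'    {node} --> {tgt.replace(".", "_")}')
    return '\n'.join(lines)
```

Regenerating this file in CI whenever the code changes keeps the visualization from drifting out of date.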

The dependency map is a living document. As you refactor modules, update the map to reflect the new structure. Over time it becomes the accurate documentation of what the codebase actually does, not just what it used to do.


What to Do With the Map

The dependency map is most valuable for two decisions:

Refactoring sequence. Modules with few incoming dependencies (few things depend on them) are the safest to refactor first. Modules with many incoming dependencies need the most careful planning and testing before they change. The map tells you which is which.

Blast radius estimation. When you make a change, the dependency map tells you the maximum set of things that could be affected. Combined with your test suite, this lets you know whether you have adequate coverage before you touch something.
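Blast radius falls straight out of the graph: reverse the edges, then walk outward from the changed module. A minimal sketch, assuming the same {module: [dependencies]} shape used throughout:

```python
from collections import deque

def blast_radius(graph, changed):
    """Return the transitive set of modules that could be affected
    if `changed` changes. `graph` maps each module to the modules it
    depends on; we walk the reversed edges (its dependents)."""
    dependents = {}
    for mod, deps in graph.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(mod)
    affected, queue = set(), deque([changed])
    while queue:
        current = queue.popleft()
        for mod in dependents.get(current, ()):
            if mod not in affected:
                affected.add(mod)
                queue.append(mod)
    return affected
```

Every module in the returned set needs test coverage before the change ships; gaps in that coverage are themselves work items for the modernization plan.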

The full workflow for using this map during AI-assisted refactoring - including the prompting patterns that work best with this level of context - is covered in the guide on using AI coding assistants for legacy code modernization.

137Foundry provides legacy modernization services that include dependency mapping as a foundational assessment phase. Prettier and ESLint are useful companion tools for enforcing code style consistency as the refactoring proceeds, and the official Node.js and Python documentation remain the authoritative references for each runtime's import and module system.

A dependency map built before refactoring begins is an investment that pays back in avoided production incidents and faster, more confident changes throughout the modernization project.
