Samuvel Pandian
I Built a LangGraph Agent That Audits Android Projects — Here's the Architecture

The Problem Every Android Team Ignores Until It Hurts

You know that feeling. You open an Android project you haven't touched in three months and the build fails. AGP needs bumping. Kotlin wants a new version. The Compose BOM has moved two releases ahead. You update, fix the cascade of breaks, and push — only for someone to flag that your targetSdk is two versions behind and Google Play will start rejecting uploads next quarter.

That's the dependency side. Meanwhile, your AndroidManifest.xml has an exported Service with no permission guard because someone added it during a hackathon. Your codebase still has AsyncTask in a file nobody wants to refactor. You're at 20% Compose adoption with 47 XML layouts and no migration plan. The tech debt isn't in one place — it's scattered across build files, manifests, source trees, and version catalogs.

The tools exist in isolation: lint catches some things, dependency update plugins catch others, manual manifest review covers the security angle. But nobody runs all of them together, nobody prioritizes the results, and nobody tells you "fix the exported BroadcastReceiver before you worry about bumping that minor AndroidX version." That's the gap I wanted to close.

The Solution: One Command, Full Audit, AI Prioritization

DroidDoctor is an open-source CLI that scans your entire Android project — Gradle dependencies, manifest security, deprecated API usage, Compose adoption — and feeds the structured findings to an LLM that returns a prioritized action plan. One command. A health score from 0-100. Concrete fix-now vs. fix-later recommendations. Built with LangGraph so the analysis pipeline is a proper state machine, not a pile of scripts.

Architecture Deep Dive: A 7-Node State Machine

Here's the graph topology:

┌─────────────────────────┐
│   scan_project_structure │  ← entry point
└────────────┬────────────┘
             │
     ┌───────▼────────┐  error?
     │ conditional     │──────────► END
     │ routing         │
     └───────┬────────┘
             │ continue
     ┌───────▼──────────────────┐
     │ analyze_gradle_deps      │
     └───────┬──────────────────┘
     ┌───────▼──────────────────┐
     │ audit_manifest            │
     └───────┬──────────────────┘
     ┌───────▼──────────────────┐
     │ detect_deprecated_apis    │
     └───────┬──────────────────┘
     ┌───────▼──────────────────┐
     │ check_compose_adoption    │
     └───────┬──────────────────┘
     ┌───────▼──────────────────┐
     │ collect_results           │  ← fan-in sync point
     └───────┬──────────────────┘
             │
     ┌───────▼──────────────────┐  --no-llm?
     │ llm_analyze               │──── skip ──┐
     └───────┬──────────────────┘             │
     ┌───────▼──────────────────┐             │
     │ generate_report           │◄────────────┘
     └───────┬──────────────────┘
             │
            END

Every node receives and returns the same AgentState TypedDict. LangGraph merges each node's return dict into the running state automatically — so nodes only return the keys they modify. This is the core pattern that makes the whole thing work cleanly.

The Key Design Decision

The LLM never scans files. Each analysis node is deterministic Python code: regex, XML parsing, HTTP version checks. The LLM only receives a structured JSON summary at the end and synthesizes it into a prioritized report. This gives you reproducible scans, fast execution, and a --no-llm offline mode for CI pipelines.

What Each Node Does

scanner — Parses settings.gradle(.kts) to discover modules via regex on include() declarations, then walks each module directory to locate build files, manifests, source trees, and layout directories. Populates a list of ModuleInfo dataclasses.
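The module-discovery step can be sketched like this; the real scanner's regex may differ, but it has to cover both the Kotlin-DSL `include(":app")` and the Groovy `include ":a", ":b"` spellings:

```python
import re

# Matches one include() declaration and captures its quoted module paths.
INCLUDE_RE = re.compile(r'include\s*\(?\s*((?:[\'"][^\'"]+[\'"]\s*,?\s*)+)')

def discover_modules(settings_text: str) -> list[str]:
    modules: list[str] = []
    for match in INCLUDE_RE.finditer(settings_text):
        for quoted in re.findall(r'[\'"]([^\'"]+)[\'"]', match.group(1)):
            # Gradle path ":feature:login" -> directory path "feature/login"
            modules.append(quoted.lstrip(":").replace(":", "/"))
    return modules
```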

gradle — Parses build.gradle(.kts) and libs.versions.toml for every module. Extracts dependencies, SDK versions, AGP/Kotlin/Compose BOM versions. Then hits the Google Maven and Maven Central HTTP APIs to fetch latest versions and classify each dependency's staleness as critical, major, or minor.

manifest — Uses xml.etree.ElementTree to parse each AndroidManifest.xml. Checks for exported components without permission guards, dangerous permissions, cleartext traffic enabled, hardcoded debuggable=true, and missing backup rules.
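The exported-without-permission check is the simplest of those to sketch. Manifest elements are un-namespaced but their attributes live under the `android:` namespace, which is the fiddly part; the real audit also considers intent-filters and implicitly exported components:

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

def find_unguarded_exports(manifest_xml: str) -> list[str]:
    """Names of components exported="true" with no android:permission."""
    root = ET.fromstring(manifest_xml)
    issues: list[str] = []
    for tag in ("service", "receiver", "activity", "provider"):
        for comp in root.iter(tag):
            exported = comp.get(ANDROID_NS + "exported") == "true"
            guarded = comp.get(ANDROID_NS + "permission") is not None
            if exported and not guarded:
                issues.append(comp.get(ANDROID_NS + "name", "<unnamed>"))
    return issues
```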

deprecated — Regex-scans every .kt and .java file against 13 patterns: AsyncTask, IntentService, startActivityForResult, LocalBroadcastManager, ViewModelProviders.of(), android.support.* imports, kotlin-android-extensions, kapt, and more. Each hit gets a concrete replacement suggestion.
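A hedged subset of that scan, pairing each pattern with its replacement suggestion (the concrete suggestions below are illustrative; the real tool ships 13 patterns):

```python
import re

DEPRECATED_PATTERNS = [
    (re.compile(r'\bAsyncTask\b'), "use Kotlin coroutines (viewModelScope.launch)"),
    (re.compile(r'\bIntentService\b'), "use WorkManager or a coroutine-based service"),
    (re.compile(r'\bstartActivityForResult\s*\('), "use the Activity Result APIs"),
    (re.compile(r'\bandroid\.support\.'), "migrate to AndroidX"),
]

def scan_source(text: str) -> list[tuple[int, str]]:
    # Returns (line number, suggestion) for every deprecated-API hit.
    hits: list[tuple[int, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern, suggestion in DEPRECATED_PATTERNS:
            if pattern.search(line):
                hits.append((lineno, suggestion))
    return hits
```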

compose — Counts XML layout files across all res/layout* directories and Kotlin files containing @Composable annotations. Calculates adoption percentage. Simple metric, but it makes migration progress visible.
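The whole metric fits in a few lines. A sketch, assuming adoption is measured as Composable-bearing Kotlin files over (layouts + Composable files); the real counting rules may differ:

```python
from pathlib import Path

def compose_adoption(project_root: str) -> float:
    root = Path(project_root)
    # XML layouts under any res/layout* directory (layout, layout-land, ...)
    xml_layouts = [p for d in root.rglob("res") for p in d.glob("layout*/*.xml")]
    # Kotlin files that declare at least one @Composable
    composable_files = [
        p for p in root.rglob("*.kt") if "@Composable" in p.read_text(errors="ignore")
    ]
    total = len(xml_layouts) + len(composable_files)
    return 100.0 * len(composable_files) / total if total else 0.0
```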

analyzer — Serializes all findings to JSON, sends them to an LLM with an opinionated system prompt, and extracts a health score from the response. Supports Claude, OpenAI, Gemini, Groq, and Ollama via dynamic imports.

reporter — Assembles everything into a markdown report with tables for dependencies, manifest issues, deprecated APIs, and the AI analysis section.

Conditional Routing

Two routing decisions exist in the graph. First, after scanning: if no build.gradle files are found, the state gets an error key and the graph routes to END immediately. Second, the --no-llm flag: when set, build_graph(skip_llm=True) wires collect_results directly to generate_report, skipping the LLM node entirely. The graph is literally different depending on the flag — not just a runtime check.

Code Walkthrough

The State That Flows Through Everything

class AgentState(TypedDict, total=False):
    project_path: str
    project_name: str
    is_multi_module: bool
    modules: list[ModuleInfo]

    llm_provider: str   # claude | openai | gemini | groq | ollama
    llm_model: str | None

    gradle_deps: list[GradleDependency]
    sdk_versions: SdkVersions
    manifest_issues: list[ManifestIssue]
    deprecated_apis: list[DeprecatedApiUsage]
    compose_metrics: ComposeAdoptionMetrics | None

    health_score: int   # 0-100
    llm_report: str
    final_report: str
    error: str | None

total=False means every key is optional — nodes only populate their slice. LangGraph handles the merge.

Version Catalog Parsing (The Tricky Part)

Gradle's libs.versions.toml has three formats for declaring a library, and you have to handle all of them:

import re

def _parse_library_line(line: str, catalog: VersionCatalog) -> None:
    alias_match = re.match(r'^(\S+)\s*=\s*(.+)$', line)
    if not alias_match:
        return

    alias = alias_match.group(1)
    value = alias_match.group(2).strip()

    # Format 1: "group:artifact:version"
    simple = re.match(r'^"([^:]+):([^:]+):([^"]+)"$', value)
    if simple:
        catalog.libraries[alias] = ParsedDependency(
            group=simple.group(1), artifact=simple.group(2),
            version=simple.group(3),
        )
        return

    # Format 2+3: { module = "g:a", version.ref = "x" }
    #         or: { group = "g", name = "a", version.ref = "x" }
    group = _extract_field(value, "group")
    name = _extract_field(value, "name")
    module = _extract_field(value, "module")
    version_ref = _extract_field(value, "version.ref")

    if module:
        parts = module.split(":")
        if len(parts) == 2:
            group, name = parts

    # Lines that never yielded coordinates shouldn't produce a junk entry.
    if not (group and name):
        return

    resolved_version = None
    if version_ref and version_ref in catalog.versions:
        resolved_version = catalog.versions[version_ref]

    catalog.libraries[alias] = ParsedDependency(
        group=group, artifact=name, version=resolved_version
    )

Then, to connect catalog entries to actual build files, you convert the TOML alias to a Gradle accessor (androidx-core-ktx becomes libs.androidx.core.ktx) and check if it appears in the build file content. This two-phase approach handles the indirection that version catalogs introduce.
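That conversion is mechanical: Gradle's documented mapping turns both `-` and `_` in a catalog alias into `.` on the generated `libs` accessor. A sketch of the check (`is_dependency_used` is an illustrative helper name, not from the source):

```python
def alias_to_accessor(alias: str) -> str:
    # Gradle maps "-" and "_" in catalog aliases to "." on the accessor.
    return "libs." + alias.replace("-", ".").replace("_", ".")

def is_dependency_used(alias: str, build_file_text: str) -> bool:
    return alias_to_accessor(alias) in build_file_text
```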

Wiring the Graph

def build_graph(skip_llm: bool = False) -> StateGraph:
    graph = StateGraph(AgentState)

    graph.add_node("scan_project_structure", scan_project_structure)
    graph.add_node("analyze_gradle_dependencies", analyze_gradle_dependencies)
    graph.add_node("audit_manifest", audit_manifest)
    graph.add_node("detect_deprecated_apis", detect_deprecated_apis)
    graph.add_node("check_compose_adoption", check_compose_adoption)
    graph.add_node("collect_results", _collect_results)

    if not skip_llm:
        graph.add_node("llm_analyze", llm_analyze)

    graph.add_node("generate_report", generate_report)
    graph.set_entry_point("scan_project_structure")

    graph.add_conditional_edges(
        "scan_project_structure",
        _route_after_scan,
        {"continue": "analyze_gradle_dependencies", "error": END},
    )

    # Sequential chain through analysis nodes
    graph.add_edge("analyze_gradle_dependencies", "audit_manifest")
    graph.add_edge("audit_manifest", "detect_deprecated_apis")
    graph.add_edge("detect_deprecated_apis", "check_compose_adoption")
    graph.add_edge("check_compose_adoption", "collect_results")

    if skip_llm:
        graph.add_edge("collect_results", "generate_report")
    else:
        graph.add_edge("collect_results", "llm_analyze")
        graph.add_edge("llm_analyze", "generate_report")

    graph.add_edge("generate_report", END)
    return graph

Notice how skip_llm changes the graph topology at build time, not at runtime. The compiled graph is a different shape depending on the flag.

The LLM Prompt

The system prompt is opinionated by design — it forces the LLM to be a senior engineer, not a polite assistant:

SYSTEM_PROMPT = """\
You are DroidDoctor, a senior Android engineer performing a project health review.

Given the audit data, produce:
1. A HEALTH SCORE (0-100) based on:
   - Dependency freshness (30% weight)
   - Security posture from manifest (30% weight)
   - Code modernization / deprecated API usage (20% weight)
   - Compose adoption progress (20% weight)

2. A prioritized action plan grouped by:
   FIX NOW — security vulnerabilities, critical outdated deps
   FIX THIS SPRINT — major version gaps, deprecated APIs
   PLAN FOR NEXT QUARTER — compose migration, minor updates

Be opinionated. A senior Android engineer would be direct.
Avoid generic advice — reference the specific deps and files found.
"""

Dynamic Provider Loading

Five LLM providers, zero hard dependencies beyond the default:

import importlib

from langchain_core.language_models import BaseChatModel

PROVIDER_CONFIG = {
    "claude": {"package": "langchain_anthropic", "class": "ChatAnthropic", "default_model": "claude-sonnet-4-20250514"},
    "openai": {"package": "langchain_openai", "class": "ChatOpenAI", "default_model": "gpt-4o"},
    "gemini": {"package": "langchain_google_genai", "class": "ChatGoogleGenerativeAI", "default_model": "gemini-2.5-pro-preview-06-05"},
    "groq": {"package": "langchain_groq", "class": "ChatGroq", "default_model": "llama-3.3-70b-versatile"},
    "ollama": {"package": "langchain_ollama", "class": "ChatOllama", "default_model": "llama3.1"},
}

def _build_llm(provider: str, model: str | None = None) -> BaseChatModel:
    config = PROVIDER_CONFIG[provider]
    module = importlib.import_module(config["package"])
    cls = getattr(module, config["class"])
    # All five LangChain chat classes accept the model name via `model=`.
    return cls(model=model or config["default_model"])

importlib.import_module means you only need to install the provider you actually use. Run with Ollama locally? No API key needed, no langchain-anthropic in your environment.

Example Output

╭──────── Health Score — MyApp ────────╮
│            62/100                     │
╰──────── 3 module(s) scanned ─────────╯

┌─────────────── Outdated Dependencies ────────────────┐
│ Dependency                 │ Current │ Latest │ Sev.  │
│ com.android.tools.build    │ 8.1.0   │ 8.7.3  │ MAJOR │
│ org.jetbrains.kotlin       │ 1.9.10  │ 2.1.0  │ CRIT. │
│ targetSdk                  │ 33      │ 35     │ CRIT. │
│ androidx.core:core-ktx     │ 1.10.0  │ 1.15.0 │ MINOR │
└──────────────────────────────────────────────────────┘

┌─────────────── Manifest Issues ──────────────────────┐
│ CRITICAL │ Service "SyncService" exported without     │
│          │ intent-filter or permission protection     │
│ CRITICAL │ Cleartext (HTTP) traffic is enabled        │
│ WARNING  │ Dangerous permission: CAMERA               │
└──────────────────────────────────────────────────────┘

╭──── Compose Adoption ────╮
│ XML Layouts: 47          │
│ Composables: 12          │
│ Coverage: 20.3%          │
│ ████░░░░░░░░░░░░░░░░ 20% │
╰──────────────────────────╯

╭──────────── AI-Powered Analysis ─────────────╮
│ FIX NOW:                                      │
│ - SyncService is exported with no permission  │
│   guard. Any app on the device can bind to    │
│   it. Add android:permission or set           │
│   exported="false".                           │
│ - Kotlin 1.9 → 2.1 is a major jump. Do it    │
│   before the ecosystem moves further.         │
│                                               │
│ THIS SPRINT:                                  │
│ - Replace AsyncTask in NetworkHelper.java     │
│   with viewModelScope.launch + Dispatchers.IO │
│ - Bump AGP 8.1 → 8.7 (required for          │
│   targetSdk 35 support)                       │
│                                               │
│ NEXT QUARTER:                                 │
│ - 47 XML layouts at 20% Compose. Start with  │
│   new screens only, convert settings/profile  │
│   screens first (low risk).                   │
╰──────────────────────────────────────────────╯

Quick Start

pip install droiddoctor

# Set your preferred LLM provider key
export ANTHROPIC_API_KEY=sk-ant-...

# Run against your project
droiddoctor /path/to/your/android-project

# Or use a different provider
droiddoctor . --provider openai

# Offline mode (no LLM, just raw scan data)
droiddoctor . --no-llm

What I Learned Building This

LangGraph vs. rolling your own orchestration. For a linear pipeline like this, you could absolutely get away with calling functions in sequence. But LangGraph gives you conditional routing, state management, and streaming progress updates out of the box. When I added --no-llm, I didn't write an if-statement in a run loop — I changed the graph topology. When I eventually add parallel execution for the analysis nodes, I change edges, not control flow. The abstraction pays for itself the moment your pipeline gets a second branch.

Deterministic nodes + LLM synthesis beats giving the LLM raw files. Early prototype: I fed build.gradle content directly to Claude and asked it to find issues. It hallucinated dependency versions. It missed the version catalog indirection. It couldn't check Maven Central for latest releases. The current architecture is: code does the scanning (correctly, every time), LLM does the reasoning about what matters most. The LLM is dramatically better at prioritization than at file parsing.

Version catalog parsing is surprisingly tricky. Three declaration formats ("g:a:v", { module = "g:a" }, { group = "g", name = "a" }), version indirection through version.ref, alias-to-accessor name conversion (androidx-core-ktx becomes libs.androidx.core.ktx in Gradle). I considered using a TOML parser library, but the line-by-line regex approach handles the formats you actually see in real projects without pulling in another dependency.

Cap penalty per category. The health score uses weighted categories with maximum deductions: dependencies can take off at most 30 points, security at most 30, deprecated APIs at most 20, Compose adoption at most 20. Without caps, a project with 50 outdated dependencies would score 0 even if the manifest was clean and the code was modern. Caps keep the score meaningful — it reflects breadth of issues, not depth of one problem.
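A sketch of that capped scheme. Only the category caps and weights (30/30/20/20) come from the post; the per-issue deduction amounts below are invented for illustration:

```python
def health_score(critical_deps: int, major_deps: int, minor_deps: int,
                 critical_manifest: int, warning_manifest: int,
                 deprecated_hits: int, compose_pct: float) -> int:
    # Each category's penalty is capped at its weight, so one bad
    # category can't drag the score below what its weight allows.
    dep_penalty = min(30, critical_deps * 10 + major_deps * 5 + minor_deps * 1)
    sec_penalty = min(30, critical_manifest * 15 + warning_manifest * 5)
    api_penalty = min(20, deprecated_hits * 2)
    compose_penalty = min(20, round((100 - compose_pct) / 5))
    return max(0, 100 - dep_penalty - sec_penalty - api_penalty - compose_penalty)
```

With caps, 50 outdated dependencies on an otherwise clean project still costs only 30 points instead of zeroing the score.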

What's Next

DroidDoctor is early and there's plenty of room to contribute:

  • Parallel analysis nodes — The four analysis nodes are sequential today. LangGraph supports fan-out/fan-in; wiring them to run in parallel would cut scan time significantly.
  • ProGuard/R8 rule validation — Check for missing keep rules, overly broad rules, and common obfuscation mistakes.
  • CI integration — GitHub Action that posts the health report as a PR comment, with score trend tracking over time.
  • Gradle build scan integration — Pull data from existing build scans instead of re-parsing files.
  • Custom rule definitions — Let teams define their own deprecated patterns and manifest policies via a config file.
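For the first item, the rewiring is mostly edge changes, as the post argues. A hedged sketch, assuming the analysis nodes keep writing disjoint state keys (`wire_parallel` is a hypothetical helper; any key written by more than one parallel node would need a reducer on the state channel):

```python
ANALYSIS_NODES = [
    "analyze_gradle_dependencies",
    "audit_manifest",
    "detect_deprecated_apis",
    "check_compose_adoption",
]

def wire_parallel(graph, entry: str = "scan_project_structure",
                  join: str = "collect_results") -> None:
    for node in ANALYSIS_NODES:
        graph.add_edge(entry, node)       # fan-out: all four start together
    graph.add_edge(ANALYSIS_NODES, join)  # fan-in: join waits for all four
```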

The repo is at github.com/samuvelp/droiddoctor. Issues and PRs welcome.
