Eresh Gorantla

Posted on Mar 22

From Stack Trace to Root Cause - Archexa's New Diagnose Command

#ai #sre #opensource #showdev

Archexa's new diagnose command correlates errors with actual source code to find the root cause, not just the symptom.

Archexa v0.2.1-beta. Apache 2.0 license. Single binary — macOS, Linux, Windows.

GitHub: github.com/ereshzealous/archexa

A stack trace tells you where your code failed. It doesn't tell you why.

NullPointerException at UserService.java:42 is the symptom. The root cause is three files upstream — a repository method that returns null instead of Optional, called by a service that doesn't check for it, triggered by a controller that passes through an unvalidated ID.

Finding that chain takes time. You read the failing line. You find the caller. You read the caller. You check what data it passed. You follow that data to where it originated. You do this across 3, 5, sometimes 10 files until you find the actual bug.

Most developers paste the stack trace into ChatGPT. It gives a plausible-sounding answer — but it doesn't have your code. It guesses based on the error message. It can't read your UserRepository.java to see that findManagerByDepartment() always returns null. It can't grep your codebase to find three other methods with the same missing null check.

That gap is what Archexa's new diagnose command fills. It has the stack trace and your codebase.

Three Ways to Feed It an Error

1. Stack Trace File (`--trace`)

Save your stack trace — from a console, crash report, or exception log — to a file:

archexa diagnose --config archexa.yaml --trace stacktrace.txt

Archexa parses stack frames from six languages:

Language	Stack Frame Format
Python	`File "path.py", line 42, in func`
Java / Kotlin	`at com.package.Class.method(File.java:42)`
Go	`/path/file.go:42 +0x1a4`
JavaScript / TypeScript	`at func (file.ts:42:10)`
Rust	`panicked at src/file.rs:42`
C#	`at Namespace.Class.Method() in File.cs:line 42`

It extracts every file path, line number, and function name from the trace — then maps them to actual files in your repository.

2. Log File (`--logs`)

Point it at your application logs:

archexa diagnose --config archexa.yaml --logs /var/log/app.log

It parses the log line by line, identifies error entries (lines containing ERROR, FATAL, EXCEPTION, panic, etc.), groups each error with its stack trace (subsequent indented lines), and extracts timestamps for filtering.

Filter by time window to focus on recent errors:

archexa diagnose --config archexa.yaml --logs app.log --last 1h      # last hour
archexa diagnose --config archexa.yaml --logs app.log --last 30m     # last 30 minutes
archexa diagnose --config archexa.yaml --logs app.log --last 2d      # last 2 days

Specify the log timezone if your timestamps aren't UTC:

archexa diagnose --config archexa.yaml --logs app.log --last 4h --tz "UTC+5:30"
archexa diagnose --config archexa.yaml --logs app.log --last 1h --tz "UTC-5"

The tool converts your log timestamps to UTC before comparing against the --last window. This way, a log entry at 10:15:33 in UTC+5:30 is correctly treated as 04:45:33 UTC.

When multiple errors are found, it analyzes the 20 most recent ones.

3. Inline Error (`--error`)

For a quick diagnosis from a single error message:

archexa diagnose --config archexa.yaml --error "NullPointerException at UserService.java:42"

It extracts any file:line references from the message and treats the whole string as the error to investigate.

Config Support

All diagnose options can be set in your archexa.yaml so you don't have to type them every time. CLI flags override config values.

archexa:
  source: "."

  openai:
    model: "google/gemini-2.5-flash"
    endpoint: "https://openrouter.ai/api/v1/"

  diagnose:
    logs: "logs/application.log"
    trace: ""
    error: ""
    last: "4h"
    tz: "UTC+5:30"

  prompts:
    diagnose: |
      Focus on the root cause, not the symptom.
      Show the exact code path that failed.
      Recommend a fix with before/after code.

With this config, you can just run:

archexa diagnose --config archexa.yaml

And it will read logs/application.log, filter errors from the last 4 hours in UTC+5:30. Override any field on the fly:

# Different log file, same config for everything else
archexa diagnose --config archexa.yaml --logs /tmp/crash.log

# Different time window
archexa diagnose --config archexa.yaml --last 1h

# Use a trace file instead of logs
archexa diagnose --config archexa.yaml --trace stacktrace.txt

What Happens Under the Hood

When you run diagnose, this is the sequence:

Parse — The input is parsed into structured data: error messages, file paths, line numbers, function names, timestamps, severity levels. A single log file might contain multiple errors — each is extracted as a separate entry with its associated stack trace lines.

Map — File paths from the stack trace are matched to actual files in your repository. UserService.java:42 becomes src/main/java/com/example/service/UserService.java. Files that don't exist in the repo (framework internals, third-party libraries) are noted but the agent focuses on what it can find.

Scan — Your repository is indexed — directory layout, file list, language distribution. No heavy parsing. Just enough for the agent to orient itself.

Investigate — This is where diagnose differs from everything else. The AI agent opens the files referenced in the stack trace at the reported line numbers. It reads the surrounding function. It finds the caller using find_references. It traces the data that caused the failure upstream using grep_codebase. It checks for similar patterns elsewhere. It makes 10-30 autonomous decisions about what to read next — exactly like a developer debugging unfamiliar code.

Synthesize — Everything the agent learned is distilled into a structured root cause analysis document.

What the Output Looks Like

The generated document follows a consistent structure:

1. Error Summary — What happened, where, and severity assessment.

2. Root Cause — The underlying reason, distinguished from the error location. "The error manifests at line 34, but the root cause is at line 37 where null originates."

3. Execution Flow — Step-by-step trace from entry point to failure, with file:line references at each step.

4. Affected Code — Table of every file involved in the error chain, what each does, and how it's affected.

5. Related Issues — Similar patterns elsewhere in the codebase that might have the same vulnerability.

6. Fix Recommendation — What to change, where, with before/after code snippets.

7. Prevention — Specific tests to write, validation to add, and monitoring suggestions.

Real Example: Spring Boot NullPointerException

I built a Spring Boot application with a deliberate bug to test diagnose:

A UserController with a /api/users/{id}/profile endpoint
A UserService that fetches the user and their department manager
A UserRepository where findManagerByDepartment() always returns null
The service calls manager.getName() without checking for null

Hit the endpoint, get a 500. Copy the stack trace, run diagnose:

archexa diagnose --config archexa.yaml --trace trace.txt

The agent traced the full chain:

Controller (UserController.java:28) → calls userService.getUserProfile(id)

Service (UserService.java:24) → calls userRepository.findManagerByDepartment(user.getDepartment())

Repository (UserRepository.java:37) → findManagerByDepartment() always returns null

Back to Service (UserService.java:34) → manager.getName() throws NullPointerException

It correctly distinguished the error location (line 34 where NPE occurs) from the root cause (line 37 where null originates).

It also found two additional bugs I hadn't asked about:

findById() returns null instead of Optional — a similar NPE waiting to happen
getActiveUser() doesn't check the active flag before returning

The fix it proposed: change the repository to return Optional<User>, add null checks in the service — with complete before/after code.

Scaling Test: FastAPI (2,600+ Files)

To test on a real-world codebase, I ran diagnose against the FastAPI framework with a simulated dependency injection cycle error.

archexa diagnose --config config-diagnose.yaml --logs sample-trace.txt --last 1d --tz "UTC+5:30"

The agent read fastapi/dependencies/utils.py (600+ lines), traced the recursive solve_dependencies function, and identified that dependency_cache caches solved values but doesn't track in-progress resolutions — meaning circular dependencies cause infinite recursion instead of a clear error.

It proposed the fix: a _solving_dependencies set parameter threaded through recursive calls, checked before each resolution. The standard cycle detection algorithm, applied to FastAPI's specific dependency resolution architecture.

18 tool calls across 10 iterations. 2,661 files scanned. 44 seconds total.

Why Deep Mode Is the Default

Every other Archexa command defaults to pipeline mode — you opt into deep mode with --deep. Diagnose is the exception. It defaults to deep mode because root cause analysis inherently requires reading actual source files.

A pipeline mode analysis can tell you "the error is probably in UserService.java." But it can't open the file, read the function, find the caller, and trace the data flow upstream. That requires the agent's tools — read_file, grep_codebase, find_references.

Use --no-deep if you want a quick pipeline assessment:

archexa diagnose --config archexa.yaml --trace crash.txt --no-deep

But the real value is in the investigation.

No Evidence Extraction

Diagnose skips Archexa's evidence extraction pipeline entirely in deep mode. The other commands (gist, analyze, query) scan every file and build AST blocks, dependency maps, and communication links before the agent starts. That takes 4-5 seconds on a large repo and produces data the agent mostly ignores because it reads the actual files anyway.

Diagnose is a focused command — the stack trace tells it exactly where to start. The agent gets a lightweight repo summary (file list, directory structure, language distribution) and the matched error files. Then it investigates with tools. No wasted work.

The same optimization applies to impact and review in deep mode as of this release.

Try It

# Install or update
curl -fsSL https://raw.githubusercontent.com/ereshzealous/archexa/main/install.sh | bash

# Create config
archexa init

# From a stack trace
archexa diagnose --config archexa.yaml --trace stacktrace.txt

# From logs with timezone
archexa diagnose --config archexa.yaml --logs app.log --last 4h --tz "UTC+5:30"

# From an error message
archexa diagnose --config archexa.yaml --error "ConnectionError at db_client.py:42"

Works with any OpenAI-compatible model. Tested with Gemini Flash via OpenRouter (~$0.03-0.05 per diagnosis). Use a larger model for complex multi-service errors.

DEV Community

From Stack Trace to Root Cause - Archexa's New Diagnose Command

Three Ways to Feed It an Error

1. Stack Trace File (`--trace`)

2. Log File (`--logs`)

3. Inline Error (`--error`)

Config Support

What Happens Under the Hood

What the Output Looks Like

Real Example: Spring Boot NullPointerException

Scaling Test: FastAPI (2,600+ Files)

Why Deep Mode Is the Default

No Evidence Extraction

Try It

Top comments (0)

Three Ways to Feed It an Error

1. Stack Trace File (--trace)

2. Log File (--logs)

3. Inline Error (--error)

Config Support

What Happens Under the Hood

What the Output Looks Like

Real Example: Spring Boot NullPointerException

Scaling Test: FastAPI (2,600+ Files)

Why Deep Mode Is the Default

No Evidence Extraction

Try It

1. Stack Trace File (`--trace`)

2. Log File (`--logs`)

3. Inline Error (`--error`)