Qimin Zhao

Posted on May 29

Turning first-pass host evidence into a DFIR handoff report

#incidentresponse #security #opensource #ai

Most incident-response writeups focus on the detection moment: a suspicious IP, a strange login, a web-root file, a Java service behaving oddly, or a process listening on a port it should not expose.

That first clue matters, but the next problem is usually more operational:

How do you turn the first 15 to 30 minutes of host triage into something another responder can trust, review, and continue?

A useful first-pass handoff is not just a paragraph that says "probably compromised." It should preserve the evidence trail, the commands or collectors used, the confidence level, the gaps, and the next manual checks.

Here is the handoff shape I like for AI-assisted host investigation.

Start with a bounded question

Instead of giving an AI agent a production shell, start with a narrow investigation question and a time window.

For example:

oi ask "Investigate this server for suspicious login, web, process, network, persistence, and recent-file clues over the last 7 days. Do not remediate." -s 7d

Or, if the alert starts from a single external address:

oi ip 203.0.113.77 -s 7d

The important part is not the exact command. The important part is the boundary: collect and correlate host evidence, but do not mutate the host.

Collect evidence by category, not by vibes

For a server case, a first pass should usually cover at least these areas:

authentication events and failed/successful login patterns
local accounts, groups, sudoers, and authorized keys summaries
process snapshots and suspicious command lines
listening sockets and active network connections
systemd, cron, startup items, and other persistence points
web logs and recent web-root file changes where relevant
Java process and memory-shell perimeter clues where relevant
recent files, packages, containers, and shell history clues

The goal is not to prove the whole incident in one pass. The goal is to avoid handing off a vague summary with no supporting trail.

Keep the case directory as the handoff object

Open Investigator writes every run into a case folder like this:

.oi/cases/<case-id>/
  case.json       # investigation input and mode
  evidence.jsonl  # evidence records with evidence_id
  commands.log    # allowed/denied command audit
  report.json     # structured report
  report.md       # human-readable report

That structure is useful because different responders need different artifacts.

The analyst reading quickly wants report.md.

The person validating claims wants evidence.jsonl.

The reviewer checking what the tool did wants commands.log.

The team integrating the result into a ticket, case system, or downstream tool wants report.json.
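
Before handing a case over, it can help to confirm that all five artifacts are actually present. Here is a minimal sketch of that check. The file names come from the layout above; the function name and the rule that an empty file counts as missing are my own assumptions for illustration, not part of Open Investigator.

```python
from pathlib import Path

# File names are taken from the case layout described in this post.
EXPECTED_ARTIFACTS = (
    "case.json",
    "evidence.jsonl",
    "commands.log",
    "report.json",
    "report.md",
)


def missing_artifacts(case_dir):
    """Return expected artifact names that are absent or empty in case_dir.

    Hypothetical helper: treating a zero-byte file as missing is an
    assumption made for this sketch.
    """
    case_path = Path(case_dir)
    missing = []
    for name in EXPECTED_ARTIFACTS:
        artifact = case_path / name
        if not artifact.is_file() or artifact.stat().st_size == 0:
            missing.append(name)
    return missing
```

An empty result means every responder role described above has the artifact it needs; anything else is worth fixing before the handoff.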

What should go into report.md

A good first-pass report should separate conclusions from evidence. I would expect sections like:

executive summary
observed signals
timeline or chain of suspicious activity
key evidence IDs
confidence and risk level
evidence gaps
recommended manual follow-up
explicit non-actions taken by the tool

The phrase "explicit non-actions" matters. If the tool did not block an IP, kill a process, delete a file, disable an account, restart a service, change firewall state, or isolate the host, the report should make that clear.

That is not just legal caution. It helps the next responder understand that the system was used for investigation, not remediation.
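
One way to keep a report honest about its own structure is a small section check. The sketch below assumes report.md uses Markdown headings (lines starting with #) and that each section can be recognized by a keyword; both the keyword list and the function name are hypothetical, so adjust them to whatever your reports actually contain.

```python
# Keywords derived from the section list above; the exact heading wording
# in a real report.md is an assumption.
REQUIRED_SECTION_KEYWORDS = (
    "executive summary",
    "observed signals",
    "timeline",
    "evidence ids",
    "confidence",
    "evidence gaps",
    "follow-up",
    "non-actions",
)


def missing_sections(report_markdown):
    """Return the required keywords that no Markdown heading contains."""
    headings = [
        line.lstrip("#").strip().lower()
        for line in report_markdown.splitlines()
        if line.startswith("#")
    ]
    return [
        keyword
        for keyword in REQUIRED_SECTION_KEYWORDS
        if not any(keyword in heading for heading in headings)
    ]
```

A report that comes back with "non-actions" in the missing list is exactly the kind that leaves the next responder guessing.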

What should go into commands.log

If an AI system can ask for host checks, the audit trail should show what was allowed and what was denied.

For example, I want to know whether the run used only sealed read-only collectors, whether a policy-filtered read-only command fallback was used, and whether anything was denied because it looked destructive or out of scope.

A useful command audit answers:

Which collector or command ran?
Why was it allowed?
Was anything denied?
Which case did the action belong to?
Did it write only to the case directory?

This is one of the places where AI incident tooling should be boring on purpose.
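
To make those questions concrete, here is a sketch of what a single audit record could carry. I am not describing the real commands.log format here: the AuditEntry fields, the "allowed"/"denied" values, and both helper functions are assumptions chosen to mirror the five questions above.

```python
from collections import Counter
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class AuditEntry:
    """Hypothetical audit record; not the tool's actual log schema."""
    case_id: str            # which case the action belonged to
    action: str             # collector name or command line
    decision: str           # "allowed" or "denied"
    reason: str             # why it was allowed or denied
    write_paths: tuple = () # paths the action wrote to, if any


def writes_outside_case(entry, case_dir):
    """Return write paths that are not located under case_dir.

    Paths are compared lexically; ".." segments are not resolved here.
    """
    case_root = PurePosixPath(case_dir)
    return [
        path
        for path in entry.write_paths
        if case_root not in PurePosixPath(path).parents
    ]


def summarize(entries):
    """Count entries per decision, e.g. {"allowed": 2, "denied": 1}."""
    return dict(Counter(entry.decision for entry in entries))
```

A reviewer can then scan the denied entries and any out-of-case writes first, which is usually where the interesting questions are.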

Do not let the summary outrun the evidence

The report should not pretend to know what it did not inspect.

Examples of honest gaps:

outbound traffic was not proven
EDR telemetry was not correlated yet
cloud control-plane logs were not reviewed
packet captures were not available
application-level impact was not validated
Java deep diagnostics were not enabled
heap or JFR artifacts were intentionally not collected

Those gaps are not failures. They are the checklist for the next responder.

Why read-only matters for handoff

In an incident, the first tool that touches the host can accidentally change the evidence story.

For a first-pass AI investigator, I want the default boundary to be:

read logs, metadata, process snapshots, network state, persistence config, and relevant file metadata
write only case artifacts and audit records
avoid remediation authority by default
require explicit flags for heavier diagnostics
preserve enough raw evidence for a human to challenge the summary

That boundary makes the result more useful to a real DFIR/SOC workflow because another person can pick up the case without guessing what the AI changed.
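
As an illustration of what a policy filter for read-only command fallback might look like, here is a deliberately small allowlist check. This is not Open Investigator's policy engine: the program list, the metacharacter rule, and the function name are all assumptions, and a name-only allowlist is weaker than a real policy because some otherwise read-only programs have flags that modify state.

```python
import shlex

# Hypothetical allowlist for illustration only.
READ_ONLY_PROGRAMS = frozenset({"ps", "ss", "last", "stat", "ls"})

# Reject anything that could chain, redirect, or substitute commands.
SHELL_METACHARACTERS = frozenset(";|&><`$")


def check_command(command_line):
    """Return (allowed, reason) for a proposed command line."""
    if any(ch in SHELL_METACHARACTERS for ch in command_line):
        return False, "shell metacharacters are not permitted"
    try:
        tokens = shlex.split(command_line)
    except ValueError as exc:
        return False, f"unparseable command: {exc}"
    if not tokens:
        return False, "empty command"
    program = tokens[0]
    if program not in READ_ONLY_PROGRAMS:
        return False, f"{program!r} is not on the read-only allowlist"
    return True, f"{program!r} is on the read-only allowlist"
```

The design choice worth copying is the default: anything not explicitly recognized is denied, and the reason is returned so it can be written to the audit log.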

A practical handoff checklist

Before sending the case to a teammate, I would check:

Are the original question and time window clear in case.json?
Does report.md cite evidence IDs instead of unsupported claims?
Does evidence.jsonl include the raw or summarized observations needed to challenge the conclusion?
Does commands.log show allowed and denied actions?
Are gaps written as follow-up tasks?
Are heavy artifacts, if any, explicitly requested and stored under the case directory?
Is the report clear that investigation happened but remediation did not?

That is the difference between "AI wrote a summary" and "the team received a case they can work."
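
The evidence-ID item on that checklist is the easiest one to automate. The sketch below cross-checks IDs cited in report.md against the evidence_id values in evidence.jsonl. The evidence_id field is described earlier in this post; the "EV-<number>" ID shape, the one-JSON-object-per-line parsing, and the function name are assumptions, so swap in whatever your case files actually use.

```python
import json
import re

# Hypothetical ID shape; replace with the format your evidence records use.
EVIDENCE_ID_PATTERN = re.compile(r"\bEV-\d+\b")


def check_citations(report_markdown, evidence_jsonl):
    """Compare IDs cited in the report with IDs present in the evidence.

    Both arguments are file contents as strings. Returns citations with no
    matching record and records that the report never cites.
    """
    known = set()
    for line in evidence_jsonl.splitlines():
        if line.strip():
            known.add(json.loads(line)["evidence_id"])
    cited = set(EVIDENCE_ID_PATTERN.findall(report_markdown))
    return {
        "unknown_citations": sorted(cited - known),
        "uncited_evidence": sorted(known - cited),
    }
```

Unknown citations are the serious finding, because they mean the summary has outrun the evidence. Uncited evidence is softer: it may just be context the report did not need.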

Where Open Investigator fits

I maintain Open Investigator at Arvanta Cyber. It is an Apache-2.0-licensed local AI server investigator for Linux and Windows incident response.

The design goal is narrow: let AI collect and correlate host evidence through sealed read-only tools, then produce reviewable case artifacts. It is not an EDR replacement, not a SIEM/SOAR replacement, and not an automated remediation system.

Useful starting points:

Product overview: https://www.arvantacyber.com/open-investigator/
AI DFIR report page: https://www.arvantacyber.com/open-investigator/ai-dfir-reporting-tool/
Open-source repo: https://github.com/SEc-123/open-investigator

I would be interested in how other DFIR, SOC, SRE, and security engineering teams structure first-pass handoff reports. The report format is often where tooling either becomes operationally useful or turns into one more summary to distrust.
