Qimin Zhao

Posted on May 29

How to investigate suspicious SSH logins without giving AI a shell

#incidentresponse #linux #security #opensource

A lot of Linux incident response starts with a login question, not a malware sample.

Someone sees a spike of failed SSH attempts. A root login appears in the wrong time window. A service account logs in from an address nobody recognizes. A helpdesk ticket says "the server looks weird" and the only concrete clue is a username or IP address.

At that point, the useful question is not "is this host compromised?" It is more boring and more important:

Did anyone actually authenticate?
Which account was involved?
Was the access through a password, a key, sudo, su, or a scheduled task?
Was the same IP seen in web logs, current sockets, process context, or command history?
Did persistence, services, packages, or recent files change near the same time?
Can another responder review exactly what evidence was collected?

That last point matters. If you let an AI assistant freely run shell commands during the first pass, you can get speed, but you also create a new risk: the model may over-collect, mutate the host, or produce a confident answer that nobody can audit later.

For a login anomaly, I prefer a read-only evidence loop.

A practical first pass

Start with the narrow clue if you have one.

If the alert names a user:

oi login --user root -s 7d

If the alert names an IP address:

oi login --ip 203.0.113.44 -s 7d

If the alert is vague, start wider:

oi login -s 7d
oi scan -s 7d

The goal of the first pass is not to prove every detail. The goal is to build a timeline that a human responder can challenge.

For a suspicious SSH login, I want the initial report to answer five things.

1. Authentication pattern

Look for the difference between noise and access.

A server can receive thousands of failed SSH attempts from the internet. That is useful background, but it is not the same as a successful session. The first split should be:

failed attempts only
successful login after many failures
accepted key from an unusual source
login by an account that normally should not be interactive
root login where root SSH should be disabled

A good report should show the exact log lines or normalized evidence behind that assessment, not just say "suspicious login observed."
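As a minimal sketch of that first split, the Python below separates failed attempts from accepted logins and keeps the raw line next to each finding. It assumes the common OpenSSH "Failed ... for ... from ..." and "Accepted ... for ... from ..." message wording. The function name, the threshold of five failures, and the sample log lines are illustrative assumptions, not output from any real host or tool.

```python
import re
from collections import Counter

# Matches the common OpenSSH auth messages, e.g.
# "Failed password for invalid user admin from 203.0.113.44 port 50001 ssh2"
AUTH_RE = re.compile(
    r"(?P<result>Failed|Accepted) (?P<method>\S+) for "
    r"(?:invalid user )?(?P<user>\S+) from (?P<ip>\S+) port \d+"
)


def summarize_auth(lines, failure_threshold=5):
    """Split sshd auth lines into noise (failures) and access (accepted logins)."""
    failures = Counter()  # source IP -> failed attempts seen so far
    accepted = []         # one dict per successful authentication
    for line in lines:
        m = AUTH_RE.search(line)
        if not m:
            continue
        ip = m["ip"]
        if m["result"] == "Failed":
            failures[ip] += 1
        else:
            accepted.append({
                "user": m["user"],
                "ip": ip,
                "method": m["method"],
                "prior_failures": failures[ip],
                "after_many_failures": failures[ip] >= failure_threshold,
                "raw": line,  # keep the exact log line as evidence
            })
    return {"failures_by_ip": dict(failures), "accepted": accepted}


# Illustrative sample lines (made up for this sketch).
sample = [
    f"May 29 10:00:0{i} host sshd[10{i}]: Failed password for root "
    f"from 203.0.113.44 port 5000{i} ssh2"
    for i in range(5)
] + [
    "May 29 10:00:09 host sshd[105]: Accepted password for root "
    "from 203.0.113.44 port 50006 ssh2",
    "May 29 10:05:00 host sshd[110]: Accepted publickey for deploy "
    "from 198.51.100.7 port 40022 ssh2",
]
summary = summarize_auth(sample)
```

A success that follows a burst of failures from the same address is only a lead to review. The raw line stays attached so another responder can check the match.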

2. Account context

The username is only one part of the story.

For the involved account, collect read-only account facts:

UID and groups
shell
home directory
recent password changes or other account metadata, if available
sudo-related context
whether the account looks like a human, service, or automation identity

This helps separate a noisy brute-force event from a potentially meaningful access event.
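The account facts above can be read without changing anything on the host. Here is a rough sketch using Python's standard pwd and grp modules (Unix only). The classification rule is an assumption for illustration: the UID 1000 cutoff and the list of non-login shells follow common distribution conventions, but they vary between systems.

```python
import grp
import pwd

# Assumed convention: shells commonly used to block interactive login.
NON_INTERACTIVE_SHELLS = {"/usr/sbin/nologin", "/sbin/nologin", "/bin/false"}


def classify_account(uid, shell):
    """Rough heuristic only: guess what kind of identity an account is."""
    if uid == 0:
        return "root"
    if shell in NON_INTERACTIVE_SHELLS:
        return "service (no login shell)"
    if uid < 1000:  # assumed UID_MIN; check /etc/login.defs on the host
        return "system account with a login shell (review)"
    return "likely human or automation"


def account_facts(username):
    """Collect read-only account facts from the passwd and group databases."""
    entry = pwd.getpwnam(username)  # raises KeyError if the account is unknown
    try:
        primary_group = grp.getgrgid(entry.pw_gid).gr_name
    except KeyError:
        primary_group = str(entry.pw_gid)  # GID with no group entry
    extra_groups = sorted(
        g.gr_name for g in grp.getgrall() if username in g.gr_mem
    )
    return {
        "user": entry.pw_name,
        "uid": entry.pw_uid,
        "primary_group": primary_group,
        "groups": extra_groups,
        "shell": entry.pw_shell,
        "home": entry.pw_dir,
        "kind": classify_account(entry.pw_uid, entry.pw_shell),
    }
```

A successful login by an account classified as a service is worth a closer look than the same event for a known administrator, but the label is a hint for the responder, not a finding.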

3. Session and process context

If the login was successful, ask what was active around the same time.

Useful read-only checks include:

current sessions
process snapshot
parent/child process hints
network sockets
command history if available and appropriate
recent files in likely working directories

None of these alone proves compromise. Together, they can show whether the login became a shell, touched a web root, started a process, connected outbound, or did nothing observable.
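For the network socket check, one read-only option on Linux is to parse /proc/net/tcp instead of running a tool. The sketch below decodes its hex endpoint fields and keeps the rows whose remote address matches the suspect IP. It assumes IPv4 and a little-endian host (x86 and most ARM), where the address appears byte-reversed. The sample table is made up for illustration.

```python
import ipaddress

# Only two of the kernel's TCP state codes are named here; others stay as hex.
TCP_STATES = {"01": "ESTABLISHED", "0A": "LISTEN"}


def decode_endpoint(hex_endpoint):
    """Decode an 'ADDR:PORT' field from /proc/net/tcp into (ip, port)."""
    addr_hex, port_hex = hex_endpoint.split(":")
    # The address is printed byte-reversed on little-endian hosts.
    ip = ipaddress.IPv4Address(bytes.fromhex(addr_hex)[::-1])
    return str(ip), int(port_hex, 16)


def sockets_for_ip(proc_net_tcp_text, suspect_ip):
    """Return sockets whose remote address equals suspect_ip."""
    matches = []
    for line in proc_net_tcp_text.splitlines():
        fields = line.split()
        # Data rows start with an index like "0:"; skip the header and blanks.
        if len(fields) < 4 or not fields[0].endswith(":"):
            continue
        remote = decode_endpoint(fields[2])
        if remote[0] == suspect_ip:
            matches.append({
                "local": decode_endpoint(fields[1]),
                "remote": remote,
                "state": TCP_STATES.get(fields[3], fields[3]),
            })
    return matches


# Illustrative sample in the /proc/net/tcp layout (values invented).
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:0016 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 1111 1
   1: 0A00000A:0016 2C7100CB:C350 01 00000000:00000000 00:00000000 00000000     0        0 2222 1
"""
found = sockets_for_ip(SAMPLE, "203.0.113.44")
```

On a live host the input would be the contents of /proc/net/tcp. This only shows sockets that exist at collection time, so an empty result says nothing about connections that already closed.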

4. Persistence and service changes

For the same time window, check persistence surfaces:

cron and timer entries
systemd services
shell startup files
recently modified files
unusual listeners
package changes
container changes if the host runs containers

The first pass should be explicit about gaps. For example: "auth logs show a successful login, but no matching persistence change was found in the collected window" is much more useful than a dramatic conclusion.
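One way to make both the findings and the gaps explicit is to return them side by side. This sketch walks a set of directories, uses only stat calls, and lists files modified within a window around the login time, along with any paths it could not read. The function name and the one-hour default window are assumptions for illustration.

```python
import os
from datetime import datetime, timedelta, timezone


def files_modified_near(roots, center, window=timedelta(hours=1)):
    """List files with mtime within +/- window of center; also report gaps."""
    start = (center - window).timestamp()
    end = (center + window).timestamp()
    hits, gaps = [], []
    for root in roots:
        if not os.path.isdir(root):
            gaps.append(root)  # missing or not a directory: say so explicitly
            continue
        walker = os.walk(root, onerror=lambda err: gaps.append(err.filename))
        for dirpath, _dirnames, filenames in walker:
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    mtime = os.lstat(path).st_mtime  # lstat: do not follow symlinks
                except OSError:
                    gaps.append(path)
                    continue
                if start <= mtime <= end:
                    stamp = datetime.fromtimestamp(mtime, timezone.utc).isoformat()
                    hits.append((stamp, path))
    return sorted(hits), gaps
```

A call such as `files_modified_near(["/etc/cron.d", "/etc/systemd/system"], login_time)` would cover two common persistence surfaces, where `login_time` is a timezone-aware datetime taken from the auth evidence. Modification times can be altered by an attacker, so an empty result means "nothing found in this window", not "nothing changed".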

5. Reviewable output

The output should not be an opaque chat answer.

For a real handoff, I want a case folder that contains:

evidence.jsonl for normalized observations
commands.log for what was collected
report.json for structured findings
report.md for the human-readable narrative

This lets a second responder inspect the evidence, rerun missing checks, disagree with the conclusion, or attach the report to a ticket.
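To show what a reviewable record could look like, here is a small sketch that appends one observation per line to evidence.jsonl and notes the collecting command in commands.log. The field names are hypothetical and chosen for illustration. They are not the actual Open Investigator schema.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def append_evidence(case_dir, source, command, raw):
    """Append one observation to evidence.jsonl and log how it was collected."""
    case = Path(case_dir)
    case.mkdir(parents=True, exist_ok=True)
    record = {
        # Hypothetical field names, not a real tool's schema.
        "collected_at": datetime.now(timezone.utc).isoformat(),
        "source": source,
        "command": command,
        "raw": raw,
        # Hash of the raw text so a reviewer can detect later edits.
        "sha256": hashlib.sha256(raw.encode("utf-8")).hexdigest(),
    }
    with open(case / "evidence.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
    with open(case / "commands.log", "a", encoding="utf-8") as f:
        f.write(f"{record['collected_at']} {command}\n")
    return record
```

Append-only, line-per-record files are easy to diff, grep, and attach to a ticket, which is the property that matters for a handoff.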

Why the AI boundary matters

An AI assistant is useful for correlation. It can connect login times, accounts, IPs, files, services, and network state faster than a tired human staring at separate terminals.

But during first-pass incident response, the assistant should not have authority to remediate by default.

It should not:

kill processes
delete files
disable accounts
restart services
block IPs
change firewall rules
install packages
upload or download tools

Those actions may be appropriate later, but they belong in an approved response step, not in evidence collection.

The safer pattern is: collect bounded evidence, write an auditable report, then let the responder decide what to do next.
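A simple way to picture that boundary is a deny-by-default gate between the assistant and the host. The sketch below only decides whether a requested command is allowed and never executes anything. The allowlist entries are illustrative assumptions, and exact argument lists are matched because some otherwise harmless tools have flags that change state.

```python
# Exact argument vectors the assistant may request; everything else is denied.
# Illustrative entries only, not a complete or vetted list.
READ_ONLY_ALLOWLIST = {
    ("who",),
    ("last", "-n", "50"),
    ("ss", "-tn"),
    ("ps", "-eo", "pid,ppid,user,args"),
}


def gate(argv):
    """Decide whether a requested command may run. Does not run it."""
    request = tuple(argv)
    if request in READ_ONLY_ALLOWLIST:
        return {"allowed": True, "argv": list(request)}
    return {
        "allowed": False,
        "argv": list(request),
        "reason": "not on the read-only allowlist; needs responder approval",
    }
```

With this shape, a request such as `gate(["kill", "-9", "1234"])` is refused and recorded rather than silently executed, and the responder decides whether it becomes an approved response step.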

Where Open Investigator fits

I maintain Open Investigator at Arvanta Cyber. It is an Apache-2.0-licensed, local AI incident investigation CLI for Linux and Windows hosts. The project is built around this read-only first-pass model: the AI gets sealed investigation tools, not arbitrary remediation authority.

For login anomalies, WebShell clues, suspicious IPs, Java service issues, and vague server alerts, the point is to turn a loose signal into a reviewable case folder.

Project:
https://github.com/SEc-123/open-investigator

Overview:
https://www.arvantacyber.com/open-investigator/

Related read-only AI incident response guide:
https://www.arvantacyber.com/open-investigator/articles/local-ai-server-incident-response/