TraceTree: Mapping malware behavior to catch supply chain attacks

#ai #security #opensource #cybersecurity

We just released an important update: our Random Forest model is now retrained on real malware behavior from the CIC-MalMem-2022 dataset. The challenge was mapping roughly 58,000 memory dump records into the same 10-feature vector space that our syscall graph extractor produces.

How it works:

1. Sandbox the target in Docker (network dropped)
2. Trace every syscall with strace -t -f
3. Parse the trace into a NetworkX directed graph
4. Extract 10 features (process count, network connections, file operations, severity scores, etc.)
5. Feed the features into RandomForest for classification
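
To make the parse and extract steps above more concrete, here is a minimal sketch, not TraceTree's actual code. It assumes the trace looks like typical strace -t -f output, uses a plain dict of edges in place of the NetworkX graph so it runs with the standard library only, and computes just three illustrative features. The names LINE_RE, build_graph, extract_features, and the feature keys are hypothetical, as are the syscall groupings.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   [pid 101] 12:00:01 openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3
# The "[pid N]" prefix is optional: strace omits it while only one process is traced.
LINE_RE = re.compile(
    r'^(?:\[pid\s+(?P<pid>\d+)\]\s+)?'
    r'(?P<time>\d{2}:\d{2}:\d{2})\s+'
    r'(?P<name>\w+)\((?P<args>.*)\)\s+=\s+(?P<ret>\S+)'
)

# Illustrative groupings only; a real extractor would cover many more syscalls.
FILE_CALLS = {"open", "openat", "unlink", "unlinkat", "rename", "chmod"}
NET_CALLS = {"socket", "connect", "bind", "sendto", "recvfrom"}
SPAWN_CALLS = {"clone", "fork", "vfork"}


def build_graph(lines, root_pid="root"):
    """Return {source_pid: [(syscall, target), ...]} as a simple directed graph."""
    edges = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skips "<... resumed>" fragments, signals, exit lines
        pid = m["pid"] or root_pid
        name, args, ret = m["name"], m["args"], m["ret"]
        if name in SPAWN_CALLS and ret.isdigit() and int(ret) > 0:
            target = ret  # the return value is the child pid
        elif name in FILE_CALLS:
            quoted = re.search(r'"([^"]*)"', args)
            target = quoted.group(1) if quoted else "<unknown file>"
        elif name in NET_CALLS:
            target = "<network>"
        else:
            continue
        edges[pid].append((name, target))
    return dict(edges)


def extract_features(edges):
    """Count processes, file operations, and network calls in the graph."""
    pids = set(edges)
    file_ops = net_calls = 0
    for calls in edges.values():
        for name, target in calls:
            if name in SPAWN_CALLS:
                pids.add(target)
            elif name in FILE_CALLS:
                file_ops += 1
            elif name in NET_CALLS:
                net_calls += 1
    return {
        "process_count": len(pids),
        "file_operations": file_ops,
        "network_calls": net_calls,
    }


# Hand-written sample lines, not output captured from a real run.
SAMPLE = [
    '12:00:00 execve("/bin/sh", ["sh", "install.sh"], 0x7ffd /* 9 vars */) = 0',
    '12:00:00 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|SIGCHLD) = 101',
    '[pid 101] 12:00:01 openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3',
    '[pid 101] 12:00:01 socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 4',
    '[pid 101] 12:00:01 connect(4, {sa_family=AF_INET, sin_port=htons(443)}, 16)'
    ' = -1 ENETUNREACH (Network is unreachable)',
    '[pid 101] 12:00:02 <... read resumed>) = 0',
]

features = extract_features(build_graph(SAMPLE))
print(features)
```

With NetworkX, each appended tuple would instead become something like a `DiGraph.add_edge(pid, target, syscall=name)` call, and the counts would be computed by iterating over the graph's edges. The resulting feature dict would then be flattened into a fixed-order vector before being passed to the classifier.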

We also resolved module-level import cycles and pinned skops for safer model deserialization in production.

We are looking for collaborators who understand malware behavior or syscall parsing, or who want to contribute detection rules. Issues and PRs are welcome.

https://github.com/tejasprasad2008-afk/TraceTree

Top comments (1)

Aliaksei Zelianouski • Jul 5

The strace-to-graph pipeline is solid work. One thing I don't get: CIC-MalMem-2022 is memory dump statistics - there are no syscall traces in it. How did you map that onto the 10 features your extractor pulls from strace? And does ransomware behavior from memory dumps actually transfer to something like a malicious install script?