We just released an important update: our Random Forest model is now retrained on real malware behavior from the CIC-MalMem-2022 dataset. The main challenge was mapping the dataset's roughly 58,000 memory-dump samples onto the same 10-feature vector that our syscall graph extractor produces.
How it works:
- Sandbox target in Docker (network dropped)
- Trace every syscall with strace -t -f
- Parse into a NetworkX directed graph
- Extract 10 features (process count, network connections, file operations, severity scores, etc.)
- Feed into RandomForest for classification
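To make the parsing and feature steps concrete, here is a minimal sketch using only the standard library. It is not the project's actual extractor. It assumes the trace was written with `strace -f -t -o <file>`, so each line looks like `PID HH:MM:SS name(args) = ret`. The names `parse_trace`, `process_edges`, and `extract_features`, the syscall groupings, and the three feature names are all hypothetical and chosen for illustration. The parent-to-child edges it returns are the kind of pairs that could be loaded into a NetworkX directed graph.

```python
import re
from collections import Counter

# Assumed line shape (strace -f -t -o file): "PID HH:MM:SS name(args) = ret"
LINE_RE = re.compile(
    r"^(?P<pid>\d+)\s+\d{2}:\d{2}:\d{2}\s+"
    r"(?P<name>\w+)\(.*\)\s+=\s+(?P<ret>-?\d+|\?)"
)

# Illustrative syscall groupings (assumptions, not the project's real lists).
SPAWN = {"clone", "clone3", "fork", "vfork"}
NETWORK = {"socket", "connect", "bind", "sendto", "recvfrom"}
FILE_OPS = {"open", "openat", "read", "write", "unlink", "unlinkat", "rename"}


def parse_trace(lines):
    """Return (pid, syscall_name, return_value) for each complete syscall line."""
    events = []
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skips signals, exit markers, unfinished/resumed calls
        ret = match.group("ret")
        events.append(
            (int(match.group("pid")), match.group("name"),
             None if ret == "?" else int(ret))
        )
    return events


def process_edges(events):
    """Parent -> child PID pairs, taken from successful spawn syscalls."""
    return [
        (pid, ret)
        for pid, name, ret in events
        if name in SPAWN and ret is not None and ret > 0
    ]


def extract_features(events):
    """Three example features; the real extractor produces ten."""
    names = Counter(name for _, name, _ in events)
    return {
        "process_count": len({pid for pid, _, _ in events}),
        "network_calls": sum(names[n] for n in NETWORK),
        "file_ops": sum(names[n] for n in FILE_OPS),
    }


# Hand-written sample lines in the assumed format (not real trace output).
sample = [
    '100 12:00:00 execve("/bin/sample", ["sample"], 0x7ffd) = 0',
    '100 12:00:01 openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3',
    '100 12:00:01 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|SIGCHLD) = 101',
    '101 12:00:02 socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 3',
    '101 12:00:02 connect(3, {sa_family=AF_INET, sin_port=htons(80)}, 16) '
    '= -1 ENETUNREACH (Network is unreachable)',
    '101 12:00:03 +++ exited with 0 +++',
]

events = parse_trace(sample)
edges = process_edges(events)
features = extract_features(events)
```

The failed `connect` call is still counted: with the sandbox network dropped, the attempt itself is the behavioral signal, so the sketch counts calls regardless of their return value.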
We also resolved module-level import cycles and pinned skops for safer model deserialization in production.
We are looking for collaborators who understand malware behavior or syscall parsing, or who want to contribute detection rules. Issues and PRs are welcome.