Yesterday my tool only looked at the filename.
Today I realised the filename is the lie attackers want you to believe.
The Moment Everything Changed
I took a harmless test script named malicious.bat and renamed it to invoice.pdf.
My old checker (Day 12) said: “Looks safe ✅”
Windows Explorer showed: invoice.pdf (icon = PDF)
A normal user would double-click without a second thought.
But the file was still a batch script.
That’s when it hit me:
The operating system doesn’t execute the name.
It executes the content.
Files Have Two Identities
- What the user sees → filename + icon (easy to fake)
- What the OS executes → magic bytes (first 2–8 bytes of the file)
Real examples:
- PDF → starts with %PDF (most readers also accept it anywhere in the first 1024 bytes)
- Windows EXE → always starts with MZ
- ELF binary (Linux) → starts with 7f 45 4c 46 (\x7fELF)
- ZIP (DOCX, XLSX, JAR…) → starts with PK\x03\x04
If the header says “executable” but the name says “document”, that’s a disguise. Game over for filename-only checkers.
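The disguise test itself fits in a few lines. This is a minimal sketch, not SafeOpen's code: `is_disguised`, `EXECUTABLE_MAGIC` and `DOCUMENT_EXTENSIONS` are hypothetical names, and both tables are small illustrative subsets. It compares what the extension promises with what the first bytes say.

```python
import os
from typing import Optional

# Hypothetical subset: headers that mean "this will run as a program"
EXECUTABLE_MAGIC = {
    b"MZ": "Windows PE executable",
    b"\x7fELF": "Linux ELF executable",
}
# Hypothetical subset: extensions a user reads as "just a document"
DOCUMENT_EXTENSIONS = {".pdf", ".docx", ".xlsx", ".txt", ".jpg", ".png"}

def is_disguised(filename: str, header: bytes) -> Optional[str]:
    """Return a warning if a document-looking name hides an executable header."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in DOCUMENT_EXTENSIONS:
        return None
    for magic, kind in EXECUTABLE_MAGIC.items():
        if header.startswith(magic):
            return f"{filename}: name says {ext}, header says {kind}"
    return None

print(is_disguised("invoice.pdf", b"MZ\x90\x00\x03\x00"))
print(is_disguised("invoice.pdf", b"%PDF-1.7\n"))
```

One honest limit: batch scripts are plain text and have no magic bytes, so a renamed .bat passes a pure header check. That is why the content checks further down (entropy, strings, URLs) matter just as much.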
So I rebuilt everything.
SafeOpen v2 — “Inspect Before You Execute”
Here are the core pieces of the tool, with an explanation of each new capability.
#!/usr/bin/env python3
"""
SafeOpen v2 — File Security Analyzer
Inspect before you execute.
"""
import sys, hashlib, mimetypes, os, math, struct, time, argparse, json, re
from datetime import datetime
# === 1. Magic Byte Detection (Upgrade #1) ===
MAGIC_SIGNATURES = {
    b"MZ": "Windows PE Executable",
    b"\x7fELF": "Linux ELF Executable",
    b"\xca\xfe\xba\xbe": "Java Class File",
    b"PK\x03\x04": "ZIP Archive (DOCX/XLSX/JAR/etc)",
    b"%PDF": "PDF Document",
    b"\x89PNG": "PNG Image",
    b"\xff\xd8\xff": "JPEG Image",
    # ... (full dict in the complete code below)
}
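def detect_magic(data):
    # Match at offset 0 only: short signatures such as b"MZ" occur by chance inside
    # ordinary data, so a substring search produces false positives.
    # Longest signatures go first so a short prefix can't shadow a longer one.
    for magic, desc in sorted(MAGIC_SIGNATURES.items(), key=lambda kv: len(kv[0]), reverse=True):
        if data.startswith(magic):
            return desc
    return None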
What this does: Reads the first 2048 bytes and matches against known headers.
Rename malware.exe → report.pdf → tool now screams “CRITICAL — Executable disguised as document”.
# === 2. Entropy — The “Malware Smell” (Upgrade #2) ===
def entropy(data):
    # Shannon entropy in bits per byte: 0.0 (one repeated byte) to 8.0 (uniform noise)
    if not data:
        return 0.0
    occur = [0] * 256
    for byte in data:
        occur[byte] += 1
    ent = 0.0
    length = len(data)
    for x in occur:
        if x == 0:
            continue
        p = x / length
        ent -= p * math.log2(p)
    return ent
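Why it matters: entropy is measured in bits per byte, from 0 (one repeated byte) to 8 (pure random noise). Normal documents have structure → entropy ~4.0–6.5.
Packed/encrypted malware → entropy >7.5 (looks like random noise).
Compressed formats (ZIP-based DOCX/XLSX, JPEG, PNG) also score high, so read entropy together with the detected file type rather than on its own.
No signatures needed. Pure mathematics.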
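To see those ranges for yourself, here is a compact variant of the same Shannon formula using `collections.Counter` (mathematically equivalent to the loop above). The sample inputs are made up: repeated invoice-like text stands in for a structured document, and `os.urandom` stands in for packed or encrypted content.

```python
import math
import os
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    length = len(data)
    # Sum -p*log2(p) over the byte values that actually occur
    return -sum((n / length) * math.log2(n / length) for n in Counter(data).values())

text_like = b"Invoice #1042\nAmount due: 250.00 EUR\nThank you for your business.\n" * 200
random_like = os.urandom(64 * 1024)  # stand-in for packed/encrypted content

print(f"text-like:   {shannon_entropy(text_like):.2f}")
print(f"random-like: {shannon_entropy(random_like):.2f}")
```

The text sample uses only a few dozen distinct byte values, so it stays near 5 bits per byte; 64 KB of random bytes comes out just under 8.0.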
# === 3. Suspicious Behaviour Indicators (Upgrade #3) ===
SUSPICIOUS_STRINGS = [
    (b"powershell", "PowerShell (possible download cradle)"),
    (b"Invoke-WebRequest", "Downloader"),
    (b"rm -rf", "Destructive delete"),
    (b"net user", "User creation"),
    # ... 20+ more patterns
]
def scan_strings(data):
    # Case-insensitive substring match of every pattern against the buffer
    hits = []
    lower = data.lower()
    for pattern, desc in SUSPICIOUS_STRINGS:
        if pattern.lower() in lower:
            hits.append(desc)
    return hits
Scans the first 512 KB for known suspicious patterns. A “document” containing powershell -c Invoke-WebRequest is not really a document.
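The 512 KB cap is the part the snippet doesn't show. Below is a minimal sketch of a bounded read followed by the same case-insensitive match; `scan_file`, `SCAN_LIMIT` and the three-entry `PATTERNS` list are hypothetical stand-ins, and the test file is a throwaway fake PDF.

```python
import os
import tempfile
from typing import List

SCAN_LIMIT = 512 * 1024  # inspect only the first 512 KB

# A few (pattern, description) pairs in the same shape as SUSPICIOUS_STRINGS
PATTERNS = [
    (b"powershell", "PowerShell (possible download cradle)"),
    (b"Invoke-WebRequest", "Downloader"),
    (b"rm -rf", "Destructive delete"),
]

def scan_file(path: str, limit: int = SCAN_LIMIT) -> List[str]:
    """Read at most `limit` bytes and return descriptions of matching patterns."""
    with open(path, "rb") as f:
        data = f.read(limit)  # bounded read: a huge file costs at most `limit` bytes
    lower = data.lower()
    return [desc for pattern, desc in PATTERNS if pattern.lower() in lower]

# A throwaway "PDF" that carries a download command
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
    tmp.write(b"%PDF-1.7\n% powershell -c Invoke-WebRequest http://example.com/a.ps1\n")
    fake_pdf = tmp.name

hits = scan_file(fake_pdf)
print(hits)
os.remove(fake_pdf)
```

The trade-off of a fixed cap: anything stored past the first 512 KB, or a pattern straddling the boundary, goes unseen. That keeps the tool fast, which suits its triage role.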
# === 4. Embedded Network Indicators (Upgrade #4) ===
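def extract_network_indicators(data):
    # Simple regex on decoded text; latin-1 never fails to decode (helper name is illustrative)
    text = data.decode("latin-1")
    urls = re.findall(r'https?://[^\s\'"<>]{4,80}', text)
    ips = re.findall(r'\b(?:\d{1,3}\.){3}\d{1,3}\b', text)
    return urls, ips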
Lists every URL and IPv4 address stored as plain text in the file (potential C2 servers) before you open it. Obfuscated or encrypted addresses won't show up.
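The IPv4 regex also matches strings like 300.1.2.3 that aren't real addresses. One way to cut that noise is to validate each hit with Python's standard `ipaddress` module. This is a sketch, not the tool's code: `network_indicators` is a hypothetical name, and dropping private and loopback ranges via `is_global` is my own design choice (you may want to keep internal IPs for lateral-movement hunting).

```python
import ipaddress
import re

URL_RE = re.compile(r'https?://[^\s\'"<>]{4,80}')
IPV4_RE = re.compile(r'\b(?:\d{1,3}\.){3}\d{1,3}\b')

def network_indicators(data: bytes):
    """Return (urls, public_ips) found as plain text in a file's bytes."""
    text = data.decode("latin-1")  # every byte maps to one character, so this never raises
    urls = URL_RE.findall(text)
    public_ips = []
    for candidate in IPV4_RE.findall(text):
        try:
            ip = ipaddress.IPv4Address(candidate)
        except ValueError:  # e.g. "300.1.2.3": matches the regex but isn't an address
            continue
        if ip.is_global:  # drop private, loopback and other reserved ranges
            public_ips.append(str(ip))
    return urls, public_ips

sample = b"beacon http://1.1.1.1/update fallback 8.8.8.8 lan 192.168.1.10 junk 300.1.2.3"
urls, ips = network_indicators(sample)
print(urls)
print(ips)
```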
# === 5. Cryptographic Hashes + PE Header Parsing (Upgrade #5) ===
def sha256sum(path): ...
def md5sum(path): ...
def get_pe_info(data):
    # Parses MZ → PE header, extracts architecture, compile time, DLL/EXE flag
    ...
Even if the file is renamed 10 times, the SHA-256 is the same.
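The PE parser tells you something like “64-bit executable compiled on 2025-11-03” (the compile timestamp is written at build time and can be faked, so treat it as a hint).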
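Both helpers are stubbed out above, so here is one way they could look. This is a sketch assuming the standard PE/COFF layout (the `e_lfanew` pointer at offset 0x3C, then `PE\0\0`, then a 20-byte COFF file header); the `MACHINES` table is a small subset, and the header bytes are synthetic test data, not a real program. The second half reproduces the rename experiment: the bytes don't change, so neither does the hash.

```python
import hashlib
import os
import shutil
import struct
import tempfile
from datetime import datetime, timezone

def sha256sum(path, chunk_size=65536):
    """SHA-256 of a file, read in chunks so large files never sit fully in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# COFF machine codes (subset)
MACHINES = {0x014C: "32-bit x86", 0x8664: "64-bit x64", 0xAA64: "64-bit ARM64"}
IMAGE_FILE_DLL = 0x2000  # Characteristics flag that marks a DLL

def get_pe_info(data):
    """MZ header -> e_lfanew -> PE signature -> COFF file header. None if not a PE."""
    if len(data) < 0x40 or not data.startswith(b"MZ"):
        return None
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00" or len(data) < pe_offset + 24:
        return None
    machine, _sections, timestamp, _symtab, _nsyms, _opt_size, flags = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4)
    return {
        "architecture": MACHINES.get(machine, hex(machine)),
        "compiled": datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%Y-%m-%d"),
        "type": "DLL" if flags & IMAGE_FILE_DLL else "EXE",
    }

# Synthetic 256-byte header: x64 machine, TimeDateStamp = 2025-11-03 00:00 UTC
header = bytearray(0x100)
header[0:2] = b"MZ"
struct.pack_into("<I", header, 0x3C, 0x80)
header[0x80:0x84] = b"PE\x00\x00"
struct.pack_into("<HHIIIHH", header, 0x84, 0x8664, 3, 1762128000, 0, 0, 0xF0, 0x0022)
info = get_pe_info(bytes(header))
print(info)

# Rename malware.exe -> report.pdf and hash it again
workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "malware.exe")
with open(original, "wb") as f:
    f.write(header)
renamed = os.path.join(workdir, "report.pdf")
os.rename(original, renamed)
same_hash = sha256sum(renamed) == hashlib.sha256(header).hexdigest()
print(same_hash)
shutil.rmtree(workdir)
```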
Risk Meter — One Number to Rule Them All
Every check adds to a 0–100 risk score:
- Extension mismatch → +25
- High entropy → +30
- Suspicious strings → +5 each
- Embedded URLs → +2 each
- PE executable in .pdf → instant jump
Then it draws a colour-coded terminal risk meter, with threat levels from CLEAN to CRITICAL.
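The scoring rules translate almost directly into code. This is a sketch: the +25/+30/+5/+2 weights come from the list above, but the value of the “instant jump” (`DISGUISE_FLOOR`), the intermediate level names and their cut-offs in `LEVELS` are placeholders I made up, not SafeOpen's real numbers. Colour (ANSI escape codes) is left out to keep it short.

```python
from typing import List

# Weights come from the list above. DISGUISE_FLOOR and the LEVELS cut-offs
# are placeholder assumptions, not SafeOpen's real numbers.
DISGUISE_FLOOR = 90
LEVELS = [(80, "CRITICAL"), (50, "HIGH"), (25, "MEDIUM"), (1, "LOW"), (0, "CLEAN")]

def risk_score(extension_mismatch: bool, high_entropy: bool,
               string_hits: List[str], url_count: int, pe_in_pdf: bool) -> int:
    score = 0
    if extension_mismatch:
        score += 25
    if high_entropy:
        score += 30
    score += 5 * len(string_hits)
    score += 2 * url_count
    if pe_in_pdf:
        score = max(score, DISGUISE_FLOOR)  # the "instant jump"
    return min(score, 100)  # clamp to the 0-100 scale

def threat_level(score: int) -> str:
    for threshold, label in LEVELS:  # highest threshold first
        if score >= threshold:
            return label
    return "CLEAN"

def meter(score: int, width: int = 20) -> str:
    filled = round(score / 100 * width)
    return "[" + "#" * filled + "-" * (width - filled) + f"] {score}/100"

s = risk_score(extension_mismatch=True, high_entropy=False,
               string_hits=["PowerShell", "Downloader"], url_count=3, pe_in_pdf=False)
print(meter(s), threat_level(s))
```

Clamping at 100 keeps a file with dozens of URL hits from overflowing the meter, and using `max` for the disguise case means a PE-in-PDF can never score lower than the floor, however clean its other signals look.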
What SafeOpen Is (and Is Not)
Is: 5-second pre-execution triage for suspicious attachments.
Is not: Antivirus, sandbox, or signature-based detector.
It solves the exact moment every SOC analyst, helpdesk tech, and power user faces:
“Hey, is this invoice.pdf safe?”
How to use it right now:
python3 safeopen.py suspicious.pdf --strings
python3 safeopen.py *.exe --json-out report.json
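The two flags above could be wired up with `argparse` roughly like this. It's my guess at the wiring, not the tool's actual parser: the help texts and the hypothetical `expand` helper are assumptions. The glob expansion is there because Windows cmd.exe, unlike Unix shells, passes `*.exe` to the script literally.

```python
import argparse
import glob

def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="safeopen.py", description="Inspect before you execute.")
    p.add_argument("files", nargs="+", help="files or glob patterns to inspect")
    p.add_argument("--strings", action="store_true", help="also report suspicious-string hits")
    p.add_argument("--json-out", metavar="PATH", help="write results to a JSON file")
    return p

def expand(patterns):
    """Expand globs ourselves; keep the argument as-is if nothing matches."""
    paths = []
    for pattern in patterns:
        matches = sorted(glob.glob(pattern))
        paths.extend(matches if matches else [pattern])
    return paths

args = build_parser().parse_args(["suspicious.pdf", "--strings", "--json-out", "report.json"])
print(expand(args.files), args.strings, args.json_out)
```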
Final Thought
Most breaches aren’t zero-days.
They’re ordinary files opened by ordinary people who trusted the filename.
Sorry for missing yesterday’s post. I got pulled into some serious debugging and real testing, and the write-up itself took longer than I expected. I didn’t want to rush it and post something half-baked, so I waited until it was stable and properly explained. Day 13 is finally here 🙂
SafeOpen gives you the habit:
Don’t execute first. Inspect first.
Drop a 🔥 if you want Day 14 tomorrow.




