Day 5/90: String Manipulation for Security -- Python Security

#cybersecurity #python #security #tutorial

90 Day Python Security Scripting Challenge

Much of security analysis comes down to handling strings: log lines, encoded payloads, and obfuscated indicators. Today I built tools for the string operations that come up during incident response, malware triage, and web application testing.

Multi-Encoding Log Reader

During incident response, you pull logs from systems running different OSes and locales. A reader that tries each encoding in strict mode, and only then falls back to lossy decoding, makes decoding failures visible instead of silently corrupting data.

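def read_log_safely(path):
    """Read a log file, trying encodings from strictest to most permissive."""
    # Order matters: ascii is a subset of utf-8, and latin-1 maps every byte
    # to a character, so it never fails and anything after it is unreachable.
    for encoding in ["ascii", "utf-8", "cp1252", "latin-1"]:
        try:
            with open(path, encoding=encoding, errors="strict") as fh:
                lines = fh.readlines()
            return lines, encoding
        except UnicodeError:
            continue
    # Safety net, reached only if latin-1 is removed from the list above.
    with open(path, encoding="utf-8", errors="replace") as fh:
        return fh.readlines(), "utf-8-fallback"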

import re

def filter_log_by_pattern(lines, pattern):
    """Search log lines for security-relevant patterns."""
    compiled = re.compile(pattern, re.IGNORECASE)
    # Return (1-based line number, text) pairs for every matching line.
    return [(i + 1, line.rstrip()) for i, line in enumerate(lines)
            if compiled.search(line)]

log_lines, enc = read_log_safely("/var/log/auth.log")
print(f"Encoding detected: {enc}, Lines: {len(log_lines)}")
failed = filter_log_by_pattern(log_lines, r"failed|invalid|denied")
for lineno, text in failed[:10]:
    print(f"  L{lineno}: {text}")

The strict-then-fallback approach ensures you know when encoding assumptions break. In forensic work, knowing that a file was not valid UTF-8 is itself useful information.
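
One case the encoding list above does not cover is UTF-16, which some Windows tools write by default. A byte order mark (BOM) check before the strict decoding attempts can catch it. The following is a minimal sketch, not part of the original tool: `sniff_bom` and the `BOMS` table are hypothetical names, and it assumes the file actually starts with a BOM, which not every writer emits.

```python
import codecs

# UTF-32 is checked before UTF-16 because the UTF-32 LE BOM
# (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def sniff_bom(data):
    """Return a codec name if data starts with a known BOM, else None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None

# Usage: read the first few bytes in binary mode, then pick the codec.
print(sniff_bom("hi".encode("utf-16")))     # BOM written by the utf-16 codec
print(sniff_bom("hi".encode("utf-8-sig")))  # UTF-8 with BOM
print(sniff_bom(b"plain ascii"))            # no BOM present
```

If `sniff_bom` returns a name, you can pass it straight to `open(path, encoding=name)`; the `utf-16`, `utf-32`, and `utf-8-sig` codecs consume the BOM themselves.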

XOR String Decryption for Malware Analysis

Single-byte XOR remains common in malware obfuscation. Brute-forcing all 255 non-zero keys is effectively instant on short strings, and a printable-ASCII filter narrows the candidates you need to review by eye.

def xor_bruteforce(raw_bytes, min_printable=0.7, min_len=5):
    """Try all single-byte XOR keys, return readable results."""
    hits = []
    if not raw_bytes:
        return hits  # nothing to decode; also avoids dividing by zero below
    for k in range(1, 256):  # key 0 would leave the input unchanged
        decoded = bytes(b ^ k for b in raw_bytes)
        try:
            text = decoded.decode("ascii", errors="strict")
        except UnicodeDecodeError:
            continue  # non-ASCII output is treated as a miss
        printable_pct = sum(c.isprintable() for c in text) / len(text)
        if printable_pct >= min_printable and len(text.strip()) >= min_len:
            hits.append({"key": f"0x{k:02x}", "text": text.strip()})
    return hits

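# Demo input: a URL on the reserved .example domain, XORed with key 0x5a.
sample = bytes(b ^ 0x5A for b in b"http://malware.example/payload")
# Several keys produce printable text; the real plaintext is among the hits.
for hit in xor_bruteforce(sample):
    print(f"Key {hit['key']}: {hit['text']}")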
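
Because many keys produce printable output, it helps to rank the candidates rather than read all of them. Here is a rough sketch of one way to do that, assuming the plaintext contains something recognizable. The `KEYWORDS` tuple and the `rank_xor_keys` helper are my own illustrative choices, not a standard list or API, and the demo string uses a reserved `.example` domain.

```python
# Hypothetical indicator list; tune it for the samples you actually see.
KEYWORDS = (b"http", b"www.", b".exe", b".dll", b"cmd", b"powershell")

def rank_xor_keys(raw_bytes, top=3):
    """Return the top (key, decoded_bytes) pairs, best candidate first."""
    scored = []
    for key in range(1, 256):
        decoded = bytes(b ^ key for b in raw_bytes)
        lowered = decoded.lower()
        # Primary score: how many known indicators appear in the output.
        keyword_hits = sum(lowered.count(word) for word in KEYWORDS)
        # Tie-breaker: fraction of bytes in the printable ASCII range.
        printable = sum(32 <= b < 127 for b in decoded) / max(len(decoded), 1)
        scored.append((keyword_hits, printable, key, decoded))
    scored.sort(key=lambda item: (item[0], item[1]), reverse=True)
    return [(key, decoded) for _, _, key, decoded in scored[:top]]

# Round trip: obfuscate a known string, then recover the key.
demo = bytes(b ^ 0x5A for b in b"cmd /c http://malware.example/a.exe")
best_key, best_text = rank_xor_keys(demo)[0]
print(f"Best key 0x{best_key:02x}: {best_text.decode('ascii')}")
```

Keyword scoring only works when you can guess what the plaintext looks like. For arbitrary text, a character-frequency score would be a more general choice.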

URL and HTML Encoding for Web Security

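Attackers layer URL encoding (for example, encoding the % sign itself as %25) to slip past filters that decode only once. Your analysis tools need to keep decoding until the text stops changing, then flag suspicious patterns in the decoded result.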

import urllib.parse
import html

def deep_url_decode(text, max_iterations=10):
    """Decode URL encoding iteratively until stable."""
    for _ in range(max_iterations):
        decoded = urllib.parse.unquote(text)
        if decoded == text:
            return decoded
        text = decoded
    return text

def check_xss_patterns(user_input):
    """Check decoded input for common XSS vectors."""
    # Substring matching is a triage heuristic, not a complete XSS detector.
    patterns = ["<script", "javascript:", "onerror=", "onload=",
                "<img", "<svg", "<iframe", "eval("]
    decoded = deep_url_decode(user_input)
    decoded = html.unescape(decoded)
    lowered = decoded.lower()  # patterns are already lowercase
    threats = [p for p in patterns if p in lowered]
    return {"decoded": decoded, "threats": threats, "safe": html.escape(decoded)}

test_payloads = [
    "%3Cscript%3Ealert(1)%3C%2Fscript%3E",
    "%253Cimg%2520onerror%253Dalert(1)%253E",
]
for payload in test_payloads:
    result = check_xss_patterns(payload)
    print(f"Input: {payload}")
    print(f"Decoded: {result['decoded']}")
    print(f"Threats: {result['threats']}")
    print(f"Escaped: {result['safe']}\n")
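
The number of decoding passes is itself a useful signal, since legitimate input rarely needs more than one. As a small sketch of that idea, here is a variant of the decoder that also reports the layer count. `count_encoding_layers` is a hypothetical helper, not part of `urllib`, and treating two or more layers as suspicious is my assumption rather than a hard rule.

```python
import urllib.parse

def count_encoding_layers(text, max_iterations=10):
    """Return (decoded_text, layers), where layers counts the decode
    passes that actually changed the text."""
    layers = 0
    for _ in range(max_iterations):
        decoded = urllib.parse.unquote(text)
        if decoded == text:
            break  # stable: nothing left to decode
        text = decoded
        layers += 1
    return text, layers

for payload in ["plain text",
                "%3Cscript%3E",
                "%253Cimg%2520onerror%253Dalert(1)%253E"]:
    decoded, layers = count_encoding_layers(payload)
    flag = "SUSPICIOUS" if layers >= 2 else "ok"
    print(f"{layers} layer(s) [{flag}]: {decoded}")
```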

Key Takeaways

These string operations chain together during real investigations. A single incident might require you to decode URL layers, extract base64, convert hex to bytes, XOR decrypt, and defang the resulting IOCs for your report. Having each step as a tested function makes your response faster.
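
The base64, hex, and defanging steps mentioned above were not shown earlier, so here is a minimal sketch of what they might look like. The helper names (`extract_base64`, `defang`), the 16-character minimum in the regex, and the `hxxp` / `[.]` defanging style are assumptions for illustration; teams use different conventions.

```python
import base64
import binascii
import re

# Runs of 16+ base64 characters with optional padding; the threshold is
# arbitrary and trades missed short strings against false positives.
B64_RE = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")

def extract_base64(text):
    """Return the decoded bytes of each plausible base64 run in text."""
    results = []
    for match in B64_RE.findall(text):
        try:
            results.append(base64.b64decode(match, validate=True))
        except (binascii.Error, ValueError):
            continue  # looked like base64 but did not decode cleanly
    return results

def defang(ioc):
    """Make a URL or domain non-clickable for reports."""
    return ioc.replace("http", "hxxp").replace(".", "[.]")

# Build a demo line around a URL on the reserved .example domain.
encoded = base64.b64encode(b"http://malware.example/a").decode("ascii")
log_line = f"cmd=run data={encoded} end"

for blob in extract_base64(log_line):
    print(defang(blob.decode("ascii")))

# Hex to bytes is built in:
print(bytes.fromhex("687474703a2f2f"))
```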
