Day 5/90: String Manipulation for Security
90 Day Python Security Scripting Challenge
Strings are the fundamental unit of security analysis. Today I built tools for the string operations that come up during incident response, malware triage, and web application testing.
Multi-Encoding Log Reader
During incident response, you pull logs from systems running different OSes and locales. A function that tries encodings in strict mode and falls back gracefully prevents silent data corruption.
import re

def read_log_safely(path):
    """Read a log file, trying common encodings in order."""
    # latin-1 maps every possible byte, so it never fails and must come last.
    # ascii is omitted: any valid ASCII file already decodes as UTF-8.
    for encoding in ["utf-8", "cp1252", "latin-1"]:
        try:
            with open(path, encoding=encoding, errors="strict") as fh:
                return fh.readlines(), encoding
        except UnicodeError:  # UnicodeDecodeError is a subclass
            continue
    # Safety net, only reached if latin-1 is removed from the list above.
    with open(path, encoding="utf-8", errors="replace") as fh:
        return fh.readlines(), "utf-8-fallback"

def filter_log_by_pattern(lines, pattern):
    """Return (line_number, text) for lines matching a security-relevant pattern."""
    compiled = re.compile(pattern, re.IGNORECASE)
    return [(i + 1, line.rstrip()) for i, line in enumerate(lines)
            if compiled.search(line)]

log_lines, enc = read_log_safely("/var/log/auth.log")
print(f"Encoding detected: {enc}, Lines: {len(log_lines)}")
failed = filter_log_by_pattern(log_lines, r"failed|invalid|denied")
for lineno, text in failed[:10]:
    print(f"  L{lineno}: {text}")
The strict-then-fallback approach ensures you know when encoding assumptions break. In forensic work, knowing that a file was not valid UTF-8 is itself useful information.
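To make that forensic signal concrete, here is a minimal sketch that reports where a byte string stops being valid UTF-8. The helper name `find_utf8_violation` and the sample log fragment are my own illustrations, not part of the tools above; it relies only on the `start` and `reason` attributes that Python's `UnicodeDecodeError` exposes.

```python
def find_utf8_violation(raw: bytes):
    """Return None if raw is valid UTF-8, else details of the first bad byte."""
    try:
        raw.decode("utf-8", errors="strict")
        return None
    except UnicodeDecodeError as exc:
        # exc.start is the byte offset at which strict decoding failed
        return {
            "offset": exc.start,
            "byte": f"0x{raw[exc.start]:02x}",
            "reason": exc.reason,
            # Count newlines before the offset to get a 1-based line number
            "line": raw.count(b"\n", 0, exc.start) + 1,
        }


# Hypothetical log fragment: the second line ends with a Latin-1 encoded byte (0xe9)
sample_log = b"sshd: accepted password for alice\nsshd: failed password for ren\xe9\n"
print(find_utf8_violation(sample_log))
```

Recording the offset and line number in your notes lets you point at the exact entry that broke the UTF-8 assumption instead of only noting that a fallback encoding was used.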
XOR String Decryption for Malware Analysis
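Single-byte XOR remains common in malware obfuscation. Brute-forcing all 255 non-zero keys is near-instant on short strings and recovers many strings hidden this way, though a printable-ratio filter will also return false positives that need manual review.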
def xor_bruteforce(raw_bytes, min_printable=0.7, min_len=5):
    """Try all single-byte XOR keys, return readable results."""
    hits = []
    if not raw_bytes:  # avoid dividing by zero on empty input
        return hits
    for k in range(1, 256):  # key 0x00 would leave the input unchanged
        decoded = bytes(b ^ k for b in raw_bytes)
        try:
            text = decoded.decode("ascii", errors="strict")
        except UnicodeDecodeError:
            continue
        # Fraction of characters that are printable ASCII
        printable_pct = sum(c.isprintable() for c in text) / len(text)
        if printable_pct >= min_printable and len(text.strip()) >= min_len:
            hits.append({"key": f"0x{k:02x}", "text": text.strip()})
    return hits

sample = bytes([0x6b, 0x74, 0x74, 0x63, 0x28, 0x25, 0x25, 0x15,
                0x14, 0x14, 0x04, 0x7e, 0x69, 0x7c, 0x6e, 0x65])
for hit in xor_bruteforce(sample):
    print(f"Key {hit['key']}: {hit['text']}")
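A printable-ratio filter usually returns more than one candidate key, so it helps to rank the results. The sketch below is one possible approach, not a standard technique: it scores each decoded candidate by how many indicator substrings it contains. The `INDICATORS` tuple, the function names, and the demo URL are all assumptions for illustration; tune the list to the samples you actually see.

```python
# Hypothetical indicator substrings; adjust for your own samples.
INDICATORS = ("http", "www.", ".exe", ".dll", "cmd", "powershell")


def score_candidate(text):
    """Count indicator substrings (case-insensitive) in a decoded candidate."""
    lowered = text.lower()
    return sum(lowered.count(marker) for marker in INDICATORS)


def rank_xor_candidates(raw_bytes, top_n=3):
    """Brute-force single-byte XOR and return the top_n highest-scoring keys."""
    scored = []
    for key in range(1, 256):
        decoded = bytes(b ^ key for b in raw_bytes)
        try:
            text = decoded.decode("ascii")
        except UnicodeDecodeError:
            continue
        score = score_candidate(text)
        if score > 0:
            scored.append((score, key, text))
    # Highest score first; ties broken by the lower key value
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:top_n]


# Demo input is built here so the plaintext is known: an example URL XORed with 0x5a
plaintext = b"http://example.test/payload.exe"
obfuscated = bytes(b ^ 0x5A for b in plaintext)
for score, key, text in rank_xor_candidates(obfuscated):
    print(f"score={score} key=0x{key:02x} text={text!r}")
```

Keyword scoring is crude and will miss strings that contain none of the indicators, so treat it as a way to sort candidates for review, not as a replacement for reading them.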
URL and HTML Encoding for Web Security
Attackers layer URL encoding to bypass filters. Your analysis tools need to decode recursively and flag suspicious patterns in the decoded result.
import urllib.parse
import html

def deep_url_decode(text, max_iterations=10):
    """Decode URL encoding iteratively until stable."""
    for _ in range(max_iterations):
        decoded = urllib.parse.unquote(text)
        if decoded == text:
            return decoded
        text = decoded
    return text

def check_xss_patterns(user_input):
    """Check decoded input for common XSS vectors."""
    patterns = ["<script", "javascript:", "onerror=", "onload=",
                "<img", "<svg", "<iframe", "eval("]
    decoded = deep_url_decode(user_input)
    decoded = html.unescape(decoded)
    lowered = decoded.lower()  # patterns are already lowercase
    threats = [p for p in patterns if p in lowered]
    return {"decoded": decoded, "threats": threats, "safe": html.escape(decoded)}

test_payloads = [
    "%3Cscript%3Ealert(1)%3C%2Fscript%3E",
    "%253Cimg%2520onerror%253Dalert(1)%253E",
]
for payload in test_payloads:
    result = check_xss_patterns(payload)
    print(f"Input: {payload}")
    print(f"Decoded: {result['decoded']}")
    print(f"Threats: {result['threats']}")
    print(f"Escaped: {result['safe']}\n")
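Since layering is itself the evasion technique, the number of decoding passes can be worth flagging alongside the decoded text. This is a small sketch under my own assumptions: the function name `count_url_encoding_layers` and the "more than one layer is suspicious" threshold are illustrative choices, and legitimate data that contains a literal percent sign followed by hex digits can trip it.

```python
import urllib.parse


def count_url_encoding_layers(text, max_iterations=10):
    """Return (decoded_text, layers): layers is how many unquote passes changed the text."""
    layers = 0
    for _ in range(max_iterations):
        decoded = urllib.parse.unquote(text)
        if decoded == text:
            break
        text = decoded
        layers += 1
    return text, layers


for payload in ["%3Cscript%3E", "%253Cimg%2520onerror%253Dalert(1)%253E", "plain"]:
    decoded, layers = count_url_encoding_layers(payload)
    # Assumed heuristic: normal clients encode once, so 2+ layers deserves a look
    flag = "SUSPICIOUS" if layers > 1 else "ok"
    print(f"{layers} layer(s) [{flag}]: {decoded}")
```

Keeping the layer count separate from the pattern check means a double-encoded payload still gets flagged even when the decoded text matches none of the XSS patterns.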
Key Takeaways
These string operations chain together during real investigations. A single incident might require you to decode URL layers, extract base64, convert hex to bytes, XOR decrypt, and defang the resulting IOCs for your report. Having each step as a tested function makes your response faster.
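The base64, hex, and defang steps mentioned above are not shown in today's code, so here is a rough sketch of what they might look like when chained. Every name here (`extract_base64`, `hex_to_bytes`, `defang`, the regex, and the demo IOC) is hypothetical and chosen for illustration; only standard-library calls are used.

```python
import base64
import binascii
import re
import urllib.parse

# Runs of 16+ base64 alphabet characters with optional padding (assumed threshold)
B64_RE = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def extract_base64(text):
    """Return decoded bytes for each base64-looking run that decodes cleanly."""
    results = []
    for match in B64_RE.finditer(text):
        try:
            results.append(base64.b64decode(match.group(0), validate=True))
        except (binascii.Error, ValueError):
            continue  # wrong length or padding: not real base64
    return results


def hex_to_bytes(hex_string):
    """Convert hex such as '68 74 74 70' or '68747470' to bytes."""
    return bytes.fromhex(hex_string)


def defang(ioc):
    """Make a URL or IP non-clickable for a report."""
    return ioc.replace("http", "hxxp", 1).replace(".", "[.]")


# Demo: wrap a hypothetical IOC in base64 plus URL encoding, then unwind it
original = "http://malware.example/dropper"
wrapped = urllib.parse.quote(base64.b64encode(original.encode()).decode(), safe="")
unquoted = urllib.parse.unquote(wrapped)
for blob in extract_base64(unquoted):
    print(defang(blob.decode("ascii", errors="replace")))
print(hex_to_bytes("68 74 74 70"))
```

Each helper takes and returns plain strings or bytes, which keeps the steps easy to test on their own and to reorder when a sample layers its encodings differently.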