rudy_candy

Posted on Apr 20 • Originally published at alsavaudomila.com

Corrupted File picoCTF Writeup

#ctf #picoctf #forensics #security

Magic bytes corruption, hex editing, and a JPEG hiding in plain sight — this is my full walkthrough of the picoCTF 2019 "Corrupted File" forensics challenge. The file command returned just "data." No extension, no hint about the format. I spent roughly 25 minutes confused, trying the wrong tools, before two bytes of hexdump finally made everything click. Here's the complete process, including every dead end.

Challenge Overview: picoCTF 2019 "Corrupted File"

This challenge is from picoCTF 2019, Forensics category, rated Easy. The setup: you receive a single file with no extension. The challenge description says it's "corrupted." Your job is to figure out what it actually is and recover the hidden flag inside it.

There are no other hints. No suggested tools. Just a broken file and your terminal.

Field	Details
CTF Name	picoCTF 2019
Challenge Name	Corrupted File
Category	Forensics
Difficulty	Easy
Flag	picoCTF{r3st0r1ng_th3_by73s_b67c1558}

First Look: The File Command Returns Nothing Useful

My first instinct with any unknown file is file. It checks the actual byte content — not the extension — and matches known magic byte patterns. I ran it expecting at least a partial identification:

$ file corrupted_file
corrupted_file: data

"data." That's the file command's way of saying it matched absolutely nothing in its signature database. Not truncated, not partially identified — just completely unknown. My first thought was that maybe the file was totally destroyed and nothing was recoverable. I sat with that for a moment.

Then I looked at the challenge name again: Corrupted File. Of course it was corrupted. That was the point. Something had been deliberately broken, and that something was almost certainly the header — the magic bytes that tell every tool on your system what type of file it's looking at.

Running strings — and finding more than I expected

Before touching any hex editor, I ran strings to see if any readable content survived the corruption:

$ strings corrupted_file | head -20
JFIF
Exif
...
picoCTF{r3st0r1ng_th3_by73s_b67c1558}

Two things jumped out immediately. First: JFIF. That's the JPEG File Interchange Format marker — it's specific to JPEG files and always appears just after the SOI header. Second, the flag was right there, readable as plain text near the end of the output. I could have submitted it and moved on.
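For anyone curious what strings is doing under the hood, here is a minimal Python sketch of the idea: collect runs of printable ASCII that are at least four bytes long. The function name printable_runs and the sample tail (a placeholder flag, not the real one) are my own illustrative assumptions, and the real tool has more options than this covers.

```python
import re

def printable_runs(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII (tab, space through tilde) of at least min_len bytes."""
    pattern = re.compile(rb"[\t\x20-\x7e]{%d,}" % min_len)
    return [m.decode("ascii") for m in pattern.findall(data)]

# Sample input: the corrupted header bytes from this challenge,
# followed by a made-up text tail standing in for the rest of the file.
sample = bytes.fromhex("5c78ffe000104a464946000101") + b"\x00\x01picoCTF{example}\x00"

print(printable_runs(sample))  # ['JFIF', 'picoCTF{example}']
```

Note that the two-character "\x" at the start is too short to be reported with a minimum length of four, which is why it never showed up in the strings output.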

But I didn't, because I wanted to understand what was actually broken. Submitting the flag without understanding the fix felt like cheating myself out of the lesson.

Why JPEG? How I Ruled Out Every Other Format

When you see an unidentified file, the first question is: what format is it supposed to be? I didn't guess JPEG randomly — I reasoned through it by eliminating alternatives.

JFIF = JPEG, full stop. The string "JFIF" doesn't appear in PNG, GIF, PDF, or ZIP files. It's specific to JPEG. Once I saw it in strings output, there was no ambiguity.
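PNG ruled out immediately. PNG files contain the ASCII string "PNG" in bytes 1–3 of their magic number (89 50 4E 47), and the "IHDR" chunk name follows at offset 12. Even if the first byte were corrupted, "IHDR" would still show up in strings, and "PNG" would too with strings -n 3 (the default minimum length is four characters). Neither did.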
GIF ruled out. GIF headers start with ASCII "GIF87a" or "GIF89a." Completely readable in strings. Absent here.
PDF ruled out. PDFs begin with %PDF, also readable ASCII. Not present.
ZIP ruled out. ZIP files start with "PK" in ASCII. Not present either.

The only format consistent with seeing "JFIF" in the strings output was JPEG. The body of the file was fine — it was just the two-byte SOI marker at the very beginning that had been tampered with.

Rabbit Hole: 20 Minutes of Wrong Approaches

I didn't go straight to hexdump. I'm being honest here: I fumbled around for a while trying things that seemed reasonable but weren't actually going to fix anything.

Attempt 1: Renaming and opening directly (~5 minutes)

My immediate reaction was to rename it to corrupted_file.jpg and try to open it in an image viewer. I figured maybe it just needed the extension. The image viewer opened, showed a loading spinner for a fraction of a second, then gave me an error: "Invalid or unsupported image format." Of course. File extensions are just labels — they don't affect how the actual bytes are interpreted. A JPEG viewer checks the magic bytes first, and FF D8 was not there.

Attempt 2: exiftool (~5 minutes)

$ exiftool corrupted_file
ExifTool Version Number         : 12.76
File Name                       : corrupted_file
File Type                       : JPEG
File Type Extension             : jpg
MIME Type                       : image/jpeg
Image Width                     : 400
Image Height                    : 300
Color Components                : 3
...

This was surprising. Exiftool identified it as JPEG — dimensions and everything — even without the correct magic bytes. That's because exiftool doesn't rely solely on the first two bytes; it looks for JFIF/Exif markers deeper in the file structure. Useful confirmation that the file was JPEG, but it didn't repair anything. I still couldn't view it.

Attempt 3: binwalk (~10 minutes)

$ binwalk corrupted_file

DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
0             0x0             Unknown data

Binwalk scans the file for known signatures, and its JPEG signature begins with FF D8. With those two bytes overwritten, nothing in the file matched. I tried binwalk -e (extract), binwalk --dd, and other flags, with no luck. Eventually I accepted that no automated tool was going to save me here. I needed to look at the raw bytes.
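A rough way to see why a signature scanner comes up empty is to search the header for the JPEG signature yourself. This is only a sketch of signature matching in general, not binwalk's actual implementation. The names JPEG_SIG and find_signature are hypothetical, and the bytes are the first 16 bytes of the challenge file.

```python
JPEG_SIG = b"\xff\xd8\xff"  # SOI marker plus the first byte of the next marker

def find_signature(data: bytes, signature: bytes) -> list[int]:
    """Return every offset at which signature occurs in data."""
    offsets = []
    start = data.find(signature)
    while start != -1:
        offsets.append(start)
        start = data.find(signature, start + 1)
    return offsets

corrupted = bytes.fromhex("5c78ffe000104a464946000101000001")
repaired = b"\xff\xd8" + corrupted[2:]

print(find_signature(corrupted, JPEG_SIG))  # []
print(find_signature(repaired, JPEG_SIG))   # [0]
```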

hexdump: Finding the Two Wrong Bytes

I finally did what I should have done ten minutes earlier:

$ hexdump -C corrupted_file | head -3
00000000  5c 78 ff e0 00 10 4a 46  49 46 00 01 01 00 00 01  |\x....JFIF......|
00000010  00 01 00 00 ff db 00 43  00 08 06 06 07 06 05 08  |.......C........|
00000020  07 07 07 09 09 08 0a 0c  14 0d 0c 0b 0b 0c 19 12  |................|

The first two bytes: 5C 78. Everything from byte 2 onward looked like legitimate JPEG data — FF E0 is the APP0 marker, and there's "JFIF" right in the ASCII column at bytes 6–9.
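If hexdump isn't available, the same view is easy to approximate in Python. The sketch below formats one 16-byte row in the style of hexdump -C. The helper name hexdump_line is my own, and it only aims to match the layout for full rows.

```python
def hexdump_line(offset: int, chunk: bytes) -> str:
    """Format up to 16 bytes as one row in the style of `hexdump -C`."""
    left = " ".join(f"{b:02x}" for b in chunk[:8])
    right = " ".join(f"{b:02x}" for b in chunk[8:16])
    hex_part = f"{left}  {right}"
    # Printable ASCII is shown as-is; every other byte becomes a dot.
    text = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk[:16])
    return f"{offset:08x}  {hex_part:<48}  |{text}|"

header = bytes.fromhex("5c78ffe000104a464946000101000001")
print(hexdump_line(0, header))
```

Both 5C and 78 are printable, so the ASCII column of that row begins with a backslash and an x, followed by dots for FF E0 00 10 and then "JFIF".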

So what is 5C 78? In ASCII, 5C is a backslash (\) and 78 is the letter x. Together: \x. That's the hex escape prefix used in Python, C, and most scripting languages. When you write \xff\xd8 in code, the \x parts are supposed to be interpreted by the language and produce the binary bytes FF D8. But here, the \x prefix had been written out as literal ASCII characters instead of being processed. So instead of the binary JPEG SOI marker, the file had the text "\x" followed by the rest of the original sequence.

A correct JPEG SOI (Start of Image) header looks like this:

FF D8 FF E0 00 10 4A 46 49 46 ...
^^^^^
SOI marker — the two bytes that identify this as JPEG

The fix: replace bytes 0 and 1 with FF D8.

The "Aha" Moment: \x as a Fingerprint of the Corruption

When I saw 5C 78 in the hexdump, I stared at it for a moment. Then it hit me. Backslash-x. That's escape notation. The person who wrote the challenge (or the script that generated it) had likely worked with the bytes as a Python string like b'\xff\xd8\xff\xe0...', and something went wrong in the encoding: the escape sequences got written as ASCII text rather than as binary.

It's such a specific kind of corruption that it immediately told me this was intentional challenge design. It's not random bit-flipping or file truncation — it's a precise two-byte substitution that takes the exact magic number and replaces it with its own escape-notation representation. Elegant, in a devious sort of way.
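To make the difference concrete, here is a small Python sketch contrasting interpreted escapes with literal ones. The raw-string route is just one plausible way to end up with 5C 78, an assumption on my part rather than the confirmed cause of the corruption.

```python
# Intended: the escape sequences are interpreted, giving four raw bytes.
interpreted = b"\xff\xd8\xff\xe0"

# One way to get it wrong (an assumption, not the confirmed cause):
# a raw string keeps the backslash and the x as literal characters.
literal = r"\xff\xd8\xff\xe0".encode("ascii")

# The challenge header looks like the literal "\x" followed by the real bytes.
mixed = literal[:2] + interpreted[2:]

print(interpreted.hex(" "))            # ff d8 ff e0
print(literal[:2].hex(" "))            # 5c 78
print(mixed.hex(" "))                  # 5c 78 ff e0
print(len(interpreted), len(literal))  # 4 16
```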

Once I understood what had happened, the fix was obvious and I felt slightly embarrassed it took me 20 minutes to get there.

The Fix: Surgical Repair with hexedit

First, copy the file. This is non-negotiable — never edit the original:

$ cp corrupted_file fixed_file
$ hexedit fixed_file

In hexedit, the display shows hex values on the left and their ASCII equivalents on the right. The cursor starts at position 0x00. I typed FF — the display updated immediately, showing the changed byte. Then D8 for the second byte. Then Ctrl+X to save and exit.

The hexdump before and after, for direct comparison:

Before — corrupted header:

00000000  5c 78 ff e0 00 10 4a 46  49 46 00 01 01 00 00 01  |\x....JFIF......|

After — repaired header:

00000000  ff d8 ff e0 00 10 4a 46  49 46 00 01 01 00 00 01  |......JFIF......|

Two bytes changed. The rest of the file: completely untouched. The JFIF marker at bytes 6–9 had been there all along, waiting for the SOI marker to precede it correctly.
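A quick way to double-check that only those two bytes changed is to diff the byte strings directly. This is a minimal sketch using the two header rows shown above. The function name differing_offsets is hypothetical.

```python
def differing_offsets(a: bytes, b: bytes) -> list[int]:
    """Return the offsets at which two equal-length byte strings differ."""
    if len(a) != len(b):
        raise ValueError("inputs must be the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

before = bytes.fromhex("5c78ffe000104a464946000101000001")
after = bytes.fromhex("ffd8ffe000104a464946000101000001")

print(differing_offsets(before, after))  # [0, 1]
```

For the real files, the same function can be fed the full contents of corrupted_file and fixed_file, each read in 'rb' mode.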

Confirmation with file:

$ file fixed_file
fixed_file: JPEG image data, JFIF standard 1.01, resolution (DPI), density 1x1, segment length 16, baseline, precision 8, 400x300, components 3

From "data" to a fully described JPEG image with dimensions, resolution, and encoding details. That output felt disproportionately satisfying for what was, mechanically, a two-byte fix.
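Part of what file reports can be read straight out of the first 13 bytes. The sketch below is my own illustration of parsing the SOI marker and the start of the JFIF APP0 segment with struct. The function name parse_jfif_app0 is hypothetical, and it deliberately stops after the version field.

```python
import struct

def parse_jfif_app0(data: bytes) -> dict:
    """Parse the SOI marker and the start of the JFIF APP0 segment."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker (FF D8)")
    # Big-endian: marker (2), segment length (2), identifier (5), version major/minor (1+1).
    marker, length, ident, major, minor = struct.unpack(">HH5sBB", data[2:13])
    if marker != 0xFFE0 or ident != b"JFIF\x00":
        raise ValueError("no JFIF APP0 segment after SOI")
    return {"segment_length": length, "version": f"{major}.{minor:02d}"}

repaired_header = bytes.fromhex("ffd8ffe000104a464946000101000001")
print(parse_jfif_app0(repaired_header))  # {'segment_length': 16, 'version': '1.01'}
```

Run against the corrupted header, the same function raises ValueError at the first check, which mirrors why format-aware tools gave up.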

The Flag

Opening the repaired image revealed the flag printed on it:

picoCTF{r3st0r1ng_th3_by73s_b67c1558}

Full Trial Process: Every Step, Honest Results

Step	Command	Result	Why it failed / What I learned
1	`file corrupted_file`	"data" — unidentified	Magic bytes at offset 0 don't match any known signature
2	Renamed to .jpg, opened in image viewer	Error: invalid image format	File extensions are labels only; viewers check actual header bytes
3	`strings corrupted_file | head -20`	Found "JFIF" and the flag as plaintext	"JFIF" identifies the intended format as JPEG
4	`exiftool corrupted_file`	Identified as JPEG (400×300)	Exiftool uses deeper format parsing; useful confirmation
5	`binwalk corrupted_file` (multiple flags)	"Unknown data", nothing extracted	Binwalk's JPEG signature also needs `FF D8`; dead end
6	`hexdump -C corrupted_file | head -3`	Bytes 0–1 are `5C 78` (ASCII "`\x`")	Literal escape text sits where `FF D8` belongs
7	`cp corrupted_file fixed_file && hexedit fixed_file`	Changed `5C 78` → `FF D8` at offset 0	Two-byte repair of the JPEG SOI marker
8	`file fixed_file`	Valid JPEG, 400×300, 3 components	Repair successful; format fully identified
9	Opened fixed_file in image viewer	Flag visible in image	Challenge complete

Real-World Relevance: Magic Byte Attacks Beyond CTF

Magic byte manipulation isn't just a CTF trick. It's a real attack technique with documented abuse cases — and understanding how it works makes you a better analyst on both sides of a security investigation.

File upload bypass attacks

Many web applications validate uploaded files by checking only the magic bytes, on the assumption that if the first few bytes say "JPEG," the file is a JPEG. Attackers exploit this by prepending valid JPEG magic bytes to a PHP shell, a Python script, or any other executable payload. The server's validator sees FF D8 and passes the check. The file is stored. If it lands in a web-served directory under an extension the server treats as executable, a direct HTTP request to the upload URL executes the code.

This attack class has appeared in CVEs against WordPress plugins, PHP image processing libraries (particularly when getimagesize() is misused as a security check), and custom file upload handlers that never perform server-side content inspection beyond the header.
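Here is a small sketch of how little a header-only check proves. The function names naive_is_jpeg and safe_storage_name are hypothetical, the payload is a harmless placeholder, and the server-chosen filename is only one partial mitigation, not a complete upload defense.

```python
import secrets

def naive_is_jpeg(upload: bytes) -> bool:
    """The weak check described above: trust the first bytes only."""
    return upload.startswith(b"\xff\xd8\xff")

def safe_storage_name() -> str:
    """Server-chosen name with a fixed extension, ignoring the client's filename."""
    return secrets.token_hex(16) + ".jpg"

# JPEG magic bytes glued onto a placeholder script body.
payload = b"\xff\xd8\xff\xe0" + b"<?php echo 'hypothetical payload'; ?>"

print(naive_is_jpeg(payload))  # True
print(safe_storage_name())
```

Because the stored name never carries an attacker-chosen extension, a file that slips past the header check is less likely to be handed to an interpreter.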

Email gateway evasion

Some email security gateways block attachments based on their detected file type. Malware authors have swapped the magic bytes of Windows executables (which start with 4D 5A, "MZ") to make them resemble PDFs or Office documents. The gateway scans the header, sees a "document," and passes the attachment. The recipient's system — which may do additional verification — then handles it correctly as an executable.

Polyglot files

A polyglot file is simultaneously valid in two different formats. The most well-known CTF example is a file that is both a valid JPEG (because it starts with FF D8) and a valid ZIP archive (because the ZIP end-of-central-directory record appears at the end and doesn't interfere with JPEG parsing). Different tools interpret the same file completely differently. This has been used in real attacks to bypass content filters that only check one format's markers.
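The JPEG-plus-ZIP case is easy to reproduce in memory. In this sketch the JPEG part is only a stub (SOI, a minimal JFIF APP0 segment, EOI), not a viewable image, and the archive member note.txt is a made-up example. It relies on Python's zipfile locating the archive from the end of the data.

```python
import io
import zipfile

# Stand-in for real JPEG data: SOI, a minimal JFIF APP0 segment, then EOI.
jpeg_stub = bytes.fromhex("ffd8ffe000104a46494600010100000100010000ffd9")

# Build a small ZIP archive in memory.
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as zf:
    zf.writestr("note.txt", "hidden in the tail")

# Concatenate: JPEG bytes first, ZIP bytes after.
polyglot = jpeg_stub + zip_buf.getvalue()

print(polyglot[:3].hex(" "))  # ff d8 ff
with zipfile.ZipFile(io.BytesIO(polyglot)) as zf:
    names = zf.namelist()
    content = zf.read("note.txt")
print(names)    # ['note.txt']
print(content)  # b'hidden in the tail'
```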

Anti-forensic header corruption

Malware samples sometimes deliberately corrupt their own PE headers or remove their magic bytes to confuse automated sandboxes. If a sandbox can't identify the file type, it may skip format-specific behavioral analysis. This buys the malware time in environments where human review only happens for samples that automated systems flag clearly.

MIME sniffing and the nosniff header

Browsers can disagree with servers about file types. A server might declare Content-Type: image/jpeg, but if the bytes look like HTML, some browsers will sniff the content and render it as HTML — potentially executing embedded JavaScript. This is why X-Content-Type-Options: nosniff is a security best practice in HTTP headers. Understanding magic bytes is fundamental to understanding why that header exists.
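As a minimal illustration, the header pair for serving an uploaded image might be assembled like this. The helper name image_response_headers is hypothetical. How the headers actually get attached depends on your web framework.

```python
def image_response_headers(content_type: str = "image/jpeg") -> dict[str, str]:
    """Headers for serving an image without letting the browser re-guess its type."""
    return {
        "Content-Type": content_type,
        "X-Content-Type-Options": "nosniff",
    }

for name, value in image_response_headers().items():
    print(f"{name}: {value}")
```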

Magic Bytes Cheat Sheet for CTF Forensics

Having these memorized saves significant time in forensics challenges:

File Type	Magic Bytes (Hex)	ASCII / Notes
JPEG	`FF D8 FF`	Non-printable; followed by APP0 `FF E0` or APP1 `FF E1`
PNG	`89 50 4E 47 0D 0A 1A 0A`	`.PNG....` — "PNG" is readable in strings
GIF	`47 49 46 38`	`GIF8` — followed by "7a" or "9a"
PDF	`25 50 44 46`	`%PDF` — fully readable ASCII
ZIP / JAR / DOCX	`50 4B 03 04`	`PK..` — many formats are ZIP-based
ELF (Linux binary)	`7F 45 4C 46`	`.ELF`
Windows PE (.exe/.dll)	`4D 5A`	`MZ`, the initials of Mark Zbikowski, who designed the DOS executable format
SQLite database	`53 51 4C 69 74 65 20 66`	`SQLite f` — starts "SQLite format 3"
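The table translates directly into a lookup. Below is a minimal sketch of a file-style identifier built from those signatures. The names SIGNATURES and identify are my own, and it only checks offset 0, so it is far less thorough than the real file command.

```python
# Signatures taken from the table above: (magic bytes, label).
SIGNATURES = [
    (bytes.fromhex("ffd8ff"), "JPEG"),
    (bytes.fromhex("89504e470d0a1a0a"), "PNG"),
    (b"GIF8", "GIF"),
    (b"%PDF", "PDF"),
    (bytes.fromhex("504b0304"), "ZIP-based"),
    (bytes.fromhex("7f454c46"), "ELF"),
    (b"MZ", "Windows PE"),
    (b"SQLite f", "SQLite"),
]

def identify(data: bytes) -> str:
    """Return the first matching label, or 'data' when nothing matches."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "data"

print(identify(bytes.fromhex("5c78ffe000104a46")))  # data
print(identify(bytes.fromhex("ffd8ffe000104a46")))  # JPEG
```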

Beginner Tips for Magic Byte Challenges

Start with file, always. If it returns "data," the header is broken. That's your diagnostic.
Run strings before reaching for a hex editor. Format-specific strings like "JFIF," "PNG," "%PDF," or "GIF8" appear in even heavily corrupted files and tell you the intended type in seconds.
Check only the first 8–16 bytes. Magic bytes live at the very beginning. hexdump -C file | head -2 is almost always sufficient to find header corruption.
Copy before editing. Always: cp original working_copy. Run all your edits on the copy. You may need the original as a reference.
hexedit navigation: Arrow keys to move, just type hex digits to overwrite. Ctrl+X saves and exits. Ctrl+C cancels without saving.
Python alternative if hexedit feels uncomfortable:

# Read the corrupted file, then write a copy whose first two bytes are the JPEG SOI marker.
with open('corrupted_file', 'rb') as f:
    data = f.read()
with open('fixed_file', 'wb') as f:
    f.write(b'\xff\xd8' + data[2:])

This replaces the first two bytes with the correct JPEG SOI marker and writes the result to a new file. A handful of lines. No hex editor needed.

What I'd Do Differently Next Time

The biggest mistake I made was spending time on tools before looking at the raw bytes. The pattern for magic byte corruption challenges is straightforward once you've done it once:

Run file — does it identify the format? If it returns "data," the header is wrong.
Run strings — what format-specific strings appear? This tells you the intended file type without opening a hex editor.
Run hexdump -C file | head -3 — find the exact bytes at the start and compare to the known magic number for that format.
Make a copy, open in hexedit (or write a short Python script), and fix the bytes.
Verify with file again.

That's a five-step process I could now execute in under two minutes. At the time it took me about 25. The difference is just pattern recognition — and that comes from working through enough of these challenges to stop second-guessing the most obvious approach.

I'd also spend five minutes before any forensics CTF round memorizing the common magic bytes: JPEG, PNG, GIF, ZIP, PDF. Having that table in your head means you recognize a corrupted header instantly from the hexdump, rather than having to stop and look things up mid-solve.
