rudy_candy

Posted on Apr 20 • Originally published at alsavaudomila.com

Audacity CTF: Audio Forensics

#ctf #security #linux #forensics

The CTF Challenge That Taught Me to Stop Listening and Start Looking

I almost submitted a wrong flag on a picoCTF audio forensics challenge because I spent twenty minutes trying to hear the answer instead of looking at it. The file was a short WAV — barely ten seconds — and I had played it probably fifteen times at different speeds, amplified it, reversed it, run strings on the raw bytes, and was about to give up when someone in the Discord server asked "did you check the spectrogram?" I had not. The flag was sitting right there, spelled out in the frequency domain in block letters, completely visible the moment I switched views. Twenty minutes of mishearing noise, solved in three seconds by looking at the right layer.

That moment reshaped how I approach every audio forensics challenge. Audacity isn't primarily a listening tool in CTF — it's a visualization tool. The spectrogram view is almost always the first place to look. This article is about what I learned from that embarrassing detour and the patterns I've recognized since, including the traps that sent me chasing the wrong thing before I understood what I was actually looking for.

Getting Audacity Ready: The Two Settings That Actually Matter

Audacity has a lot of settings. In CTF audio forensics, you touch maybe three of them regularly. Here's what to configure the moment you open a challenge file:

Switch to Spectrogram View Immediately

Click the dropdown arrow next to the track name → Spectrogram. Do this before you even press play. Most CTF flags hidden in audio are hidden visually in the frequency domain — the waveform view shows you nothing useful for these challenges.

Tune the Spectrogram Settings

The default spectrogram settings often make hidden content hard to see. Open Edit → Preferences → Spectrograms (or click the track dropdown → Spectrogram Settings) and adjust:

Window size: Start with 1024. If the image looks blurry, go to 2048 or 4096. A larger window gives sharper frequency resolution but coarser time resolution
Maximum frequency: Default is 8000 Hz. Try 20000 Hz if the flag isn't visible in the lower range — some challenges embed content in the higher frequencies precisely because the default view hides them
Color scheme: a color scheme tends to make hidden content more visible than grayscale. Recent Audacity versions name the color options "Color (default)" and "Color (classic)"

The window size setting is the one that changed my CTF accuracy more than anything else. I had a challenge where the flag was visible at window size 4096 but looked like random noise at 512. Spent fifteen minutes assuming the challenge was about something else entirely before I thought to change it.
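The trade-off behind the window size setting can be put in numbers. The sketch below assumes the usual short-time Fourier transform relationship (bin width = sample rate / window size). The function name `stft_resolution` is made up for illustration, and what Audacity actually draws also depends on the window type and zero padding.

```python
def stft_resolution(sample_rate_hz, window_size):
    """Return (frequency bin width in Hz, window length in milliseconds)."""
    bin_width_hz = sample_rate_hz / window_size
    window_ms = 1000.0 * window_size / sample_rate_hz
    return bin_width_hz, window_ms


if __name__ == "__main__":
    # Compare the window sizes mentioned above for a 44100 Hz file.
    for size in (512, 1024, 2048, 4096):
        bin_hz, win_ms = stft_resolution(44100, size)
        print(f"window {size:>4}: {bin_hz:6.2f} Hz per bin, {win_ms:6.2f} ms per window")
```

Each doubling of the window halves the bin width and doubles the time span that one column of the spectrogram covers, which is why thin horizontal strokes sharpen while fast vertical detail smears.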

Rabbit Hole: The 20 Minutes I Spent Listening Instead of Looking

Here's my actual workflow before I understood spectrogram-first thinking — I want to document this because I think it's almost exactly what beginners try:

$ file mystery.wav
mystery.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, stereo 44100 Hz

$ strings mystery.wav | grep -iE "flag|ctf|pico"
(no output)
# Flag was embedded visually in the spectrogram, not as ASCII bytes

# Opened in Audacity, waveform view
# Pressed play — heard static and what sounded like garbled tones
# Tried Effect → Amplify (3x)
# Still just louder static
# Tried Effect → Noise Reduction
# Reduced the noise. Still nothing obvious.
# Tried Effect → Reverse
# Backwards static
# Tried Effect → Change Speed → 50%
# Slower backwards static
# Inspected the left and right channels separately (Split Stereo Track)
# Both channels had the same static
# Ran strings again on exported raw bytes
# Still nothing
# --- 20 minutes in ---
# Switched to Spectrogram view (finally)
# Saw the flag immediately in large block letters between 2000-4000 Hz
# picoCTF{...} — done, instantly

The trap I fell into: I assumed audio forensics challenges were about audio. Most of them aren't. They use an audio file as a container for visual or encoded data that happens to be stored in the frequency domain. Treating it as something to listen to rather than look at is almost always the wrong instinct until you've ruled out the spectrogram.

Nine CTF Patterns in Audio Forensics (Ranked by How Often I've Seen Them)

After multiple CTF sessions with audio forensics challenges, these are the patterns I've encountered, roughly ordered by frequency. The first three account for the majority of challenges I've seen.

Pattern 1: Hidden Image in Spectrogram (The Most Common)

A flag, QR code, barcode, or image is embedded in the frequency domain. It's invisible in waveform view and may be hard to see at default spectrogram settings. This is the first thing to check on any audio challenge.

# In Audacity:
# 1. Track dropdown → Spectrogram
# 2. Track dropdown → Spectrogram Settings
#    - Window size: try 1024, then 2048, then 4096
#    - Max frequency: try 8000, then 20000
#    - Color scheme: "Color (default)" or "Color (classic)" rather than grayscale
# 3. Zoom in on the time axis if the flag appears compressed

# If a QR code is visible in the spectrogram:
# Screenshot it → use a QR scanner
# The QR code may need cleanup in an image editor first if it's faint

One pattern I've seen: the flag is embedded in the very high frequency range (16000–20000 Hz) specifically to be hidden by the default 8000 Hz ceiling. If you see a suspicious-looking region at the top of the spectrogram that looks cut off, expand the frequency range.

Pattern 2: Morse Code in the Waveform

Short and long pulses (dots and dashes) encoded as beeps or amplitude spikes. The waveform view is better for this than the spectrogram — you're looking at timing patterns, not frequency content.

# In Audacity waveform view:
# 1. Zoom in on the time axis (Ctrl+Scroll)
# 2. Look for repeating short/long pulses
# 3. Amplify if pulses are hard to distinguish (Effect → Amplify)
# 4. Apply Noise Reduction first if background noise obscures the pattern

# For cleaner decoding: export as WAV, use online Morse decoder
# or: Effect → Normalize first to even out amplitude variations

# Common mistake: mis-identifying the dot/dash boundary
# Short burst = dot, long burst = dash, long gap = letter boundary
# Very long gap = word boundary
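Once the pulse and gap lengths have been read off the waveform, the dot/dash and boundary rules above can be applied mechanically. This is a minimal sketch, not a robust decoder: `decode_runs` is a hypothetical helper, the input is a hand-measured list of `(is_tone, duration)` pairs, and it assumes the shortest tone in the message is one dot (which fails for a message made only of dashes).

```python
MORSE = {
    ".-": "A", "-...": "B", "-.-.": "C", "-..": "D", ".": "E", "..-.": "F",
    "--.": "G", "....": "H", "..": "I", ".---": "J", "-.-": "K", ".-..": "L",
    "--": "M", "-.": "N", "---": "O", ".--.": "P", "--.-": "Q", ".-.": "R",
    "...": "S", "-": "T", "..-": "U", "...-": "V", ".--": "W", "-..-": "X",
    "-.--": "Y", "--..": "Z",
    "-----": "0", ".----": "1", "..---": "2", "...--": "3", "....-": "4",
    ".....": "5", "-....": "6", "--...": "7", "---..": "8", "----.": "9",
}


def decode_runs(runs):
    """Decode [(is_tone, duration), ...] runs measured from the waveform."""
    tone_lengths = [d for is_tone, d in runs if is_tone]
    if not tone_lengths:
        return ""
    unit = min(tone_lengths)  # assumption: the shortest tone is one dot
    text, symbol = [], ""
    for is_tone, duration in runs:
        units = duration / unit
        if is_tone:
            # Standard timing is dot = 1 unit, dash = 3 units; split at 2.
            symbol += "." if units < 2 else "-"
        elif units >= 2:
            # A letter gap (about 3 units) or word gap (about 7) ends the letter.
            if symbol:
                text.append(MORSE.get(symbol, "?"))
                symbol = ""
            if units >= 5:
                text.append(" ")
    if symbol:
        text.append(MORSE.get(symbol, "?"))
    return "".join(text).strip()
```

Durations can be in any unit (seconds, milliseconds, samples) as long as they are consistent. Unknown symbols come back as `?`, which is usually a sign that a dot/dash boundary was misread.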

Pattern 3: Reversed or Speed-Manipulated Audio

Speech or tones recorded backwards, at double speed, or with pitch shifted. The tell is audio that sounds "wrong" — unusually fast, unusually pitched, or clearly speech but unintelligible.

# Reverse: Effect → Reverse (applies to selected region or entire track)
# Play again — if it now sounds like speech, you found it

# Speed manipulation:
# Effect → Change Speed (try 50% for sped-up audio, 200% for slowed-down)
# Effect → Change Tempo (preserves pitch while changing speed)
# Effect → Change Pitch (changes pitch without affecting speed)

# Common pattern: audio recorded at 2x speed
# Sounds like chipmunk speech → Change Speed → 50% → intelligible message

# Another pattern: audio pitch-shifted up by an octave
# Change Pitch → -12 semitones → back to normal range
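The reverse and speed tricks can also be reproduced outside Audacity, which helps when scripting several guesses. The sketch below uses only Python's standard `wave` module and assumes an uncompressed PCM WAV. `reverse_and_respeed` is a hypothetical helper: it changes speed by rewriting the declared sample rate, so pitch shifts along with speed, similar in effect to Change Speed rather than Change Tempo.

```python
import wave


def reverse_and_respeed(src_path, dst_path, speed=1.0, reverse=False):
    """Write a copy of a PCM WAV, optionally reversed and/or re-timed.

    speed=0.5 halves the playback rate by halving the declared sample rate.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(params.nframes)

    frame_size = params.sampwidth * params.nchannels
    if reverse:
        # Reverse whole frames so multi-byte samples and channel order stay intact.
        chunks = [frames[i:i + frame_size] for i in range(0, len(frames), frame_size)]
        frames = b"".join(reversed(chunks))

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(params.nchannels)
        dst.setsampwidth(params.sampwidth)
        dst.setframerate(max(1, round(params.framerate * speed)))
        dst.writeframes(frames)
```

For example, `reverse_and_respeed("mystery.wav", "guess.wav", speed=0.5, reverse=True)` would produce a half-speed, backwards copy to listen to (the file names here are placeholders).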

Pattern 4: Stereo Channel Hiding

Data hidden in one channel of a stereo file, or only audible when channels are combined or subtracted. The waveform may look identical on both channels but contain subtle differences.

# Split stereo: Track dropdown → Split Stereo Track
# Two separate mono tracks appear — listen to each independently

# Invert and mix trick (reveals content that differs between the channels):
# 1. Track dropdown → Split Stereo to Mono
# 2. Select one of the two mono tracks: Effect → Invert
# 3. Select both tracks → Tracks → Mix → Mix and Render
# Content common to both channels cancels out; unique content remains
# (Duplicating a track and inverting the copy would cancel to pure silence)

# Also try: Tracks → Mix → Mix Stereo Down to Mono, then compare the result with each channel
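The cancellation idea is just a per-sample subtraction of one channel from the other, so it can be checked without the GUI. A minimal sketch, assuming a 16-bit stereo PCM file; `channel_difference` is a hypothetical helper built on the standard `wave` and `struct` modules.

```python
import struct
import wave


def channel_difference(src_path, dst_path):
    """Write a mono WAV containing (left - right) / 2 for a 16-bit stereo WAV."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo PCM")
        rate = src.getframerate()
        raw = src.readframes(src.getnframes())

    # Samples are interleaved: L0, R0, L1, R1, ...
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    left, right = samples[0::2], samples[1::2]
    # Halving keeps the result inside the signed 16-bit range.
    diff = [(l - r) // 2 for l, r in zip(left, right)]

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(struct.pack("<%dh" % len(diff), *diff))
    return diff
```

If every value in the returned list is zero, the channels are identical and this pattern can be ruled out. Anything else is worth opening in the spectrogram.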

Pattern 5: DTMF Tones (Phone Keypad Encoding)

Phone keypad tones — each key produces a pair of specific frequencies. Spectrogram view reveals the frequency pairs; decoding them gives digits or characters.

# In spectrogram view, DTMF tones appear as brief horizontal lines
# occurring simultaneously at two frequencies:
#   697 Hz + 1209 Hz = "1"
#   697 Hz + 1336 Hz = "2"
#   697 Hz + 1477 Hz = "3"
#   770 Hz + 1209 Hz = "4"
#   ... (standard DTMF table)

# Export the audio and use a DTMF decoder:
# $ multimon-ng -t wav -a DTMF mystery.wav
# Output (one line per detected key): DTMF: 1, DTMF: 3, DTMF: 3, DTMF: 7, ...
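To show what a DTMF decoder does with those frequency pairs, here is a minimal sketch using the Goertzel algorithm in pure Python. It is not how multimon-ng is implemented. The names `goertzel_power` and `detect_key` are illustrative, and the code assumes the caller has already cut out a chunk of samples containing exactly one tone (it always returns some key, even for silence).

```python
import math

# Standard DTMF grid: one low-group row frequency plus one high-group column.
LOW = (697, 770, 852, 941)
HIGH = (1209, 1336, 1477, 1633)
KEYS = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, sample_rate, freq):
    """Signal power near one target frequency (Goertzel algorithm)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def detect_key(samples, sample_rate):
    """Pick the strongest row and column frequency and look up the key."""
    low = max(LOW, key=lambda f: goertzel_power(samples, sample_rate, f))
    high = max(HIGH, key=lambda f: goertzel_power(samples, sample_rate, f))
    return KEYS[LOW.index(low)][HIGH.index(high)]
```

A real decoder would also apply a power threshold and segment the audio into individual tones. For a challenge with a handful of key presses, selecting each tone by hand in Audacity and exporting it is often enough.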

Pattern 6: SSTV (Slow-Scan Television) Signal

An image transmitted as audio using SSTV encoding — a technique from amateur radio. The audio sounds like a sequence of tones with a distinct rhythmic quality. In spectrogram view, it often appears as a regular striped pattern.

# Audacity can't decode SSTV — use it to visualize and confirm the pattern,
# then export and decode externally:

# On Linux:
# $ apt install qsstv
# Open qsstv → set to receive from audio file → play the wav
# OR use an online SSTV decoder (upload the wav file)

# SSTV spectrogram signature: regular vertical stripes with
# a calibration "VIS code" tone at the beginning

Pattern 7: Binary Encoding in Tone Pulses

Two alternating tones represent 1s and 0s. Identify the two frequencies in the spectrogram, note the timing sequence, convert to binary, then to ASCII.

# Spectrogram approach:
# Two horizontal bands alternating = binary encoding
# Note which frequency = 1, which = 0 (try both interpretations)
# Count pulse durations to determine bit boundaries
# Extract sequence: e.g., 01110000 01101001 01100011 01101111 = "pico"

# Tip: if the binary doesn't decode to ASCII, try:
# - Reversing the bit order within each byte
# - Swapping the 1/0 frequency assignment
# - Reading right-to-left instead of left-to-right
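Trying the alternative interpretations by hand is error-prone, so a few lines of Python can enumerate them. A minimal sketch, assuming 8 bits per character and that the bit string has already been transcribed from the spectrogram; `bits_to_text` and `candidates` are hypothetical helper names.

```python
def bits_to_text(bits, invert=False, reverse_bytes=False):
    """Turn a string of '0'/'1' characters into text, 8 bits per character."""
    bits = "".join(ch for ch in bits if ch in "01")  # drop spaces and newlines
    if invert:
        # Swap the 1/0 frequency assignment.
        bits = bits.translate(str.maketrans("01", "10"))
    chars = []
    for i in range(0, len(bits) - len(bits) % 8, 8):
        byte = bits[i:i + 8]
        if reverse_bytes:
            # Reverse the bit order within each byte.
            byte = byte[::-1]
        chars.append(chr(int(byte, 2)))
    return "".join(chars)


def candidates(bits):
    """Return every invert/reverse combination keyed by (invert, reverse_bytes)."""
    return {
        (invert, reverse_bytes): bits_to_text(bits, invert, reverse_bytes)
        for invert in (False, True)
        for reverse_bytes in (False, True)
    }
```

Printing `candidates(...)` for the transcribed bits and scanning for readable text covers the first two tips; reading the whole sequence right-to-left is `candidates(bits[::-1])`.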

Pattern 8: LSB Steganography in WAV

Data hidden in the least significant bits of audio samples: imperceptible to the ear but extractable with dedicated tools. In Audacity this shows up, at most, as a faint and slightly irregular noise floor, and Audacity can't decode it directly.

# Audacity can only flag this as suspicious — the waveform looks mostly clean
# but with an unusual noise pattern in quiet sections.

# Extract with external tools:
# $ pip install stego-lsb
# $ stegolsb wavsteg -r -i mystery.wav -o output.txt -n 1 -b 1000
# OR
# $ python3 -c "
# from scipy.io import wavfile
# import numpy as np
# rate, data = wavfile.read('mystery.wav')
# bits = (data.flatten() & 1).tolist()
# chars = [chr(int(''.join(map(str,bits[i:i+8])),2)) for i in range(0,min(len(bits),800),8)]
# print(''.join(chars))
# "

Pattern 9: Data Appended After Audio Content

Extra bytes appended at the end of the audio file — invisible in Audacity but detectable with binwalk or strings. If the audio content seems suspiciously short relative to the file size, check the raw bytes.

$ binwalk mystery.wav
DECIMAL       HEXADECIMAL     DESCRIPTION
0             0x0             RIFF (little-endian) data, WAVE audio
1048576       0x100000        Zip archive data, at least v2.0

# Zip appended after the audio — extract with dd or binwalk -e
$ dd if=mystery.wav of=hidden.zip bs=1 skip=1048576
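A quick cross-check for appended data is to compare the size the RIFF header declares with the actual file size. A minimal sketch using only the standard library; `riff_trailing_bytes` is a hypothetical helper, and it only catches bytes after the declared end of the RIFF container, not data tucked into extra chunks inside it or files whose header size was written incorrectly.

```python
import os
import struct


def riff_trailing_bytes(path):
    """Return (declared_end, trailing_byte_count) for a RIFF/WAVE file."""
    with open(path, "rb") as fh:
        header = fh.read(12)
    if len(header) < 12 or header[:4] != b"RIFF" or header[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    # Bytes 4-7 hold the little-endian size of everything after the first 8 bytes.
    (riff_size,) = struct.unpack("<I", header[4:8])
    declared_end = riff_size + 8
    return declared_end, os.path.getsize(path) - declared_end
```

A positive trailing count gives the offset to pass to `dd` as `skip=`, which should agree with what binwalk reports when the extra data starts right after the audio.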

Audacity vs Other Audio Tools: How I Actually Decide

Audacity is the right starting point for almost every CTF audio challenge, but it's not the right ending point for all of them. Here's how I decide when to stay in Audacity and when to reach for something else:

Situation	My First Choice	Why Not Audacity?
Unknown audio file, first look	Audacity (spectrogram)	—
Decode SSTV signal	qsstv / online decoder	Audacity can't decode SSTV — only visualize it
Decode DTMF tones programmatically	multimon-ng	Audacity requires manual frequency reading
Extract LSB steganography	wavsteg / stegolsb	Audacity has no LSB extraction feature
Analyze embedded files in WAV	binwalk → then dd	Audacity only works at the audio layer, not raw bytes
Batch process multiple audio files	SoX or FFmpeg	Audacity is GUI-only — no scripting without plug-ins
Morse code decoding from clean audio	Audacity + manual / online decoder	Audacity decodes nothing — it just displays
Noise reduction before external processing	Audacity → export → external tool	Audacity's noise reduction is good; use it as a preprocessor

The mental model that helped me most: Audacity is a visualizer and manual manipulator. It shows you things and lets you transform audio — but it doesn't decode anything automatically. Every pattern that requires actual decoding (SSTV, DTMF, LSB) needs an external tool. Audacity's job is to help you identify which pattern you're dealing with, then you switch tools accordingly.

Full Trial Process Table

Step	Action	Tool / Command	Result	Why it failed / succeeded
1	File identification	file mystery.wav	RIFF WAVE, 16 bit stereo 44100 Hz	Confirmed audio format — nothing unusual at file level
2	String search	strings mystery.wav | grep flag	No output	Flag was embedded visually in the spectrogram, not as ASCII bytes
3	Listen to audio	Audacity → play	Static / garbled tones	Wasted 15+ minutes trying to hear hidden content that wasn't there acoustically
4	Amplify and reverse	Effect → Amplify, Reverse	Louder/backwards static	Wrong approach — reversal and amplification don't reveal spectrogram content
5	Noise reduction	Effect → Noise Reduction	Cleaner static, still nothing	Noise reduction removes the signal along with the noise if you don't know what you're looking for
6	Split stereo channels	Track → Split Stereo Track	Identical channels	This challenge didn't use channel hiding — but worth checking
7	Switch to spectrogram	Track dropdown → Spectrogram	Flag visible in block letters at 2000–4000 Hz	Should have done this at step 1 — this is always the first check
8	Read the flag	(visual inspection)	picoCTF{…}	Done — 20 minutes wasted on steps 3–6

Why Frequency-Domain Thinking Matters Beyond CTF

The spectrogram reveals information that the time domain (waveform) can't show you. This is the same reason that radio engineers, malware analysts working on audio-based C2 channels, and steganography researchers all use spectrogram analysis: data encoded in the frequency domain is imperceptible to a casual listener but mathematically present and extractable.

In real-world forensics contexts, audio steganography has been used to hide communications in seemingly innocent audio files — music shared on public platforms, for instance. The same spectrogram analysis techniques from CTF challenges apply directly: frequency-domain visualization first, then targeted extraction based on what you see.

The insight that sticks from CTF audio challenges: the WAV format stores raw PCM samples, which means the "audio" is just numbers. Those numbers can encode anything — an image in their frequency distribution, bits in their least significant positions, arbitrary bytes appended after the valid audio frames. The file extension says "audio" but the content layer you care about may have nothing to do with sound.

My Current First-Three-Minutes Workflow

This is what I actually do now, after learning the hard way what order matters:

# Step 1: What's the file format?
file target.wav
strings target.wav | grep -iE "flag|ctf|pico"

# Step 2: Check for appended data (before even opening Audacity)
binwalk target.wav

# Step 3: Open in Audacity — spectrogram FIRST
# Track dropdown → Spectrogram
# Adjust: Window size 1024 → 2048 → 4096 until content is clear
# Adjust: Max frequency 8000 → 20000 if the default range shows nothing

# Step 4: If spectrogram shows nothing obvious
# - Split stereo channels, check each independently
# - Try invert + mix to reveal phase-cancelled content
# - Switch to waveform: look for Morse-like pulse patterns

# Step 5: If the audio sounds wrong
# - Effect → Reverse (backwards speech)
# - Effect → Change Speed → 50% (double-speed audio)
# - Effect → Change Pitch → -12 semitones (octave-shifted)

# Step 6: If all else fails, export and use external tools
# - DTMF: multimon-ng -t wav -a DTMF target.wav
# - SSTV: qsstv or online decoder
# - LSB: wavsteg or manual Python extraction

The key change from my early approach: spectrogram before listening. It seems backwards — it's an audio file, why wouldn't you listen to it first? But for CTF audio challenges, the spectrogram is more likely to show you the flag in the first ten seconds than your ears are in the first ten minutes. I now treat Audacity's spectrogram view the same way I treat fdisk -l on a disk image: run it immediately, before anything else, to understand what layer the challenge is actually operating on.
