
Çalgan Aygün


A CAPTCHA Bypass Technique: Audio Files

This is a draft write-up from 2019, documenting a CAPTCHA bypass technique I discovered back then. All code and images shown are examples for educational purposes.


One day back in 2019, I got tired of repeatedly logging into my school's student system just to retry enrolling in a class that was already full. Then a lightbulb went off in my head: I could automate these attempts and get back to my actual work.

When it comes to automating website processes, I love using user scripts.

1. Automation Plan

Our school system expired user sessions after a while, even if you were actively working. So first, I needed to automate the login process. Once successfully logged in, navigating and enrolling in a class would be straightforward (just some click event magic).
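A minimal sketch of how a user script could decide what to do on each page load, assuming the login form and the enroll button can be told apart by a selector. The selectors `#login-form` and `#enroll-button` and the function name `nextStep` are hypothetical placeholders, not the real system's markup.

```javascript
// Decide what the script should do on the current page.
// All selectors below are hypothetical placeholders.
function nextStep(doc) {
  // Session expired (or first visit): the login form is showing
  if (doc.querySelector('#login-form')) return 'login';
  // Logged in and on the class page: the enroll button is showing
  if (doc.querySelector('#enroll-button')) return 'enroll';
  // Anything else: do nothing and wait for the next page load
  return 'wait';
}
```

In a user script this would be called as `nextStep(document)`, with the login branch handing off to the CAPTCHA step.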

2. Login UI

The school system had an internally managed CAPTCHA service. It displayed two numbers and asked for their sum. The images were slightly distorted, but not enough to prevent OCR. However, I didn't want to rely on any API or image processing system to solve this CAPTCHA.

Example captcha image

3. Inspecting the Audio Service for CAPTCHA

After deciding not to use any image-related service, I started inspecting the network traffic of the login system. I noticed that the answer to the generated CAPTCHA was stored server-side in a session associated with me.

When I clicked the audio playback button, I realized it was reading out the answer directly. Two different endpoints returned two different audio files, split into tens and ones digits.

The audio system seemed like a perfect route for me. I had no intention of feeding these audio files into a speech-to-text service; I was looking for the fastest, hackiest solution possible.

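If the clips were pre-recorded, the audio file for a given digit would have the same file size on every request, and different digits would most likely have different sizes. I decided to test this hypothesis first.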

Example HTTP traffic

And bingo! By mapping each tens and ones digit to its file size beforehand, I could automatically determine the answer.
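The mapping step can be sketched as follows, assuming the sizes are collected by hand: listen to a clip, note the digit, and record the size the server reported. The helper name `buildSizeMap` and all sizes are made up for illustration. The collision check matters because the approach only works if no two digits share a size.

```javascript
// Build a size -> digit lookup from labelled observations.
// `samples` is hypothetical hand-collected data: each entry pairs the digit
// heard in the audio with the size (in bytes) reported for that clip.
function buildSizeMap(samples) {
  const map = new Map();
  for (const { size, value } of samples) {
    if (map.has(size) && map.get(size) !== value) {
      // Two different digits with the same size would break the approach
      throw new Error(`Size collision at ${size} bytes`);
    }
    map.set(size, value);
  }
  return map;
}

// Made-up sizes, for illustration only
const onesSamples = [
  { size: 987, value: 0 },
  { size: 1023, value: 1 },
  { size: 1023, value: 1 }, // same digit observed again, same size
  { size: 1145, value: 2 },
];
const onesLookup = buildSizeMap(onesSamples);
console.log(onesLookup.get(1023)); // 1
```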

4. The Solution

Here's an example of how the code worked:

// Example mapping of file sizes to digit values
const tensMap = {
  1234: 0,    // 0 tens
  1456: 10,   // 1 ten
  1567: 20,   // 2 tens
  // ... etc
};

const onesMap = {
  987: 0,     // 0 ones
  1023: 1,    // 1 one
  1145: 2,    // 2 ones
  // ... etc
};

// Example: Fetch audio files and determine answer by file size
async function solveCaptcha() {
  const tensResponse = await fetch('/captcha/audio/tens');
  const onesResponse = await fetch('/captcha/audio/ones');

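  // Content-Length is the body size in bytes; parseInt yields NaN if the header is missing
  const tensSize = parseInt(tensResponse.headers.get('content-length'), 10);
  const onesSize = parseInt(onesResponse.headers.get('content-length'), 10);

  const tensValue = tensMap[tensSize];
  const onesValue = onesMap[onesSize];

  // Fail loudly on an unmapped size instead of silently treating it as 0
  if (tensValue === undefined || onesValue === undefined) {
    throw new Error(`Unknown audio size: tens=${tensSize}, ones=${onesSize}`);
  }

  return tensValue + onesValue;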
}

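// Use it in the login flow. The async wrapper is needed because
// top-level await only works in ES modules, which a user script may not be.
(async () => {
  const answer = await solveCaptcha();
  document.querySelector('#captcha-input').value = String(answer);
  document.querySelector('#login-button').click();
})();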

When I tested this approach, it worked perfectly!
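One caveat with reading sizes from headers: a server does not always send `Content-Length` (for example with chunked transfer encoding). A possible fallback, sketched here under the assumption that downloading the clip is acceptable, is to measure the body itself. The helper name `bodySize` is hypothetical.

```javascript
// Return the size of a response body in bytes.
// Prefers the Content-Length header and falls back to reading the body
// when the header is absent.
async function bodySize(response) {
  const header = response.headers.get('content-length');
  if (header !== null) return Number.parseInt(header, 10);
  const buffer = await response.arrayBuffer();
  return buffer.byteLength;
}
```

If the server compresses responses, the header reports the compressed size while the body length is the decoded size, so the lookup table should be built with the same method that is used at solve time.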

5. Lessons Learned

This experience taught me a few things:

  1. Look for alternative attack vectors: When the obvious solution (OCR) seems complex, there might be simpler paths (audio files, metadata, etc.).
  2. Security through obscurity fails: Just because a CAPTCHA uses audio doesn't mean it's secure, especially if the files are deterministic.
  3. File size is metadata: Even without processing the content, file properties can leak information.

This technique worked because the audio files were pre-generated and static. A more secure implementation would generate unique audio files or add random noise to prevent size-based fingerprinting.
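A rough server-side sketch of that mitigation, assuming the clips are available as raw 16-bit PCM samples (the real system's audio format is not known). The function name `randomizeClip` and its parameters are hypothetical. It appends a random amount of trailing silence so the size varies, and adds low-amplitude noise so the bytes differ on every request.

```javascript
// Perturb a clip before serving it so neither its size nor its bytes
// are stable across requests.
// Assumes raw 16-bit PCM samples in an Int16Array (an assumption for this sketch).
function randomizeClip(samples, maxPadSamples = 4000, noiseAmplitude = 8) {
  // Random number of extra samples in [0, maxPadSamples]
  const pad = Math.floor(Math.random() * (maxPadSamples + 1));
  // New typed arrays are zero-filled, so the padding is silence
  const out = new Int16Array(samples.length + pad);
  for (let i = 0; i < samples.length; i++) {
    // Integer noise in [-noiseAmplitude, +noiseAmplitude]
    const noise = Math.floor(Math.random() * (2 * noiseAmplitude + 1)) - noiseAmplitude;
    // Clamp to the valid 16-bit range to avoid wrap-around
    out[i] = Math.max(-32768, Math.min(32767, samples[i] + noise));
  }
  return out;
}
```

This only defeats size and hash fingerprinting. It does nothing against speech-to-text, and `Math.random` is used here for brevity rather than as a recommendation for security-sensitive code.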
