The Invisible Character That Cost Me Too Much Debugging Time

#testing #api #devops #opensource

The Problem

Imagine this: someone tells you they can't log in.
At first, it feels like the kind of bug you can squash before finishing your coffee. Maybe they fat-fingered their
password. Maybe their browser cache is holding onto stale session cookies. Easy.

Except it isn't.

You check the admin panel. The email is there: james.bond@mi6.com. Looks fine. The password hash matches what the user typed in. Logs show the request hitting the system cleanly. No obvious anomalies. And yet, every attempt comes back as “invalid credentials.”

You start peeling layers. Caching? No. Database encoding? No. By mid-afternoon, you're questioning whether you've misunderstood something as fundamental as string equality.

Finally, desperation pushes you into a hex editor. You copy the email from the database, paste it in, and there it is:

E2 80 8B

A zero-width space. An invisible Unicode character sitting between bond and @. Your eyes missed it, but the system
didn't.

What looked identical to a human was actually two different strings at the byte level.
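You don't need a hex editor to see the difference. Here is a minimal Python sketch of the same comparison; the variable names are illustrative, and the stored value is reconstructed from the bytes shown above rather than taken from the real system.

```python
# The value registration saved: a zero-width space (U+200B) sits before the "@".
stored = "james.bond\u200b@mi6.com"
# The value the user typed at login.
typed = "james.bond@mi6.com"

same = stored == typed
stored_hex = stored.encode("utf-8").hex(" ")

print(same)                      # False
print(len(stored), len(typed))   # 19 18
print(stored_hex)                # the run "e2 80 8b" appears right before "40" (the "@")
```

Both strings render the same on screen, but they differ by one code point, which is three bytes in UTF-8.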

How does something like this even happen?

Ghost characters are present more often than you'd expect. Some common causes:

- Copy-paste from PDFs or Word docs: Rich-text formats often inject hidden control characters.
- Email clients and chat apps: Some insert soft hyphens, directionality markers, or non-breaking spaces.
- Keyboards and IMEs: Certain language input systems add combining marks or zero-width joiners.

The registration pipeline happily accepted it:

- Regex validation? Passed.
- Database insertion? No complaint.
- API payload? Looked fine in JSON.

But during login, when the user typed the email manually (without the phantom character), the system compared apples to
almost-apples. Result: authentication failure.
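To see why each check waved it through, here is a small sketch. The regex is a typical "good enough" email pattern that I'm assuming for illustration; it is not the actual pattern from the system in this story.

```python
import json
import re

# Hypothetical validation pattern: "no whitespace, one @, a dot in the domain".
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

email = "james.bond\u200b@mi6.com"

# U+200B is a format character, not whitespace, so \s never matches it.
passes_regex = EMAIL_RE.match(email) is not None
is_space = "\u200b".isspace()

# Serialized as raw UTF-8, the character is still there but draws nothing.
as_json = json.dumps({"email": email}, ensure_ascii=False)

print(passes_regex)  # True
print(is_space)      # False
print(as_json)
```

Note that Python's default `json.dumps` escapes non-ASCII characters and would print `\u200b` visibly. Serializers that emit raw UTF-8 hide it completely.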

And it's not just zero-width spaces:

- Soft hyphen (U+00AD): invisible unless line breaks occur.
- Left-to-right / right-to-left markers (U+200E, U+200F): wreak havoc on string rendering.
- Homoglyphs: characters from different scripts that look identical (Latin a vs Cyrillic а).
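One rough way to surface these characters is to inspect Unicode categories and names. The sketch below is an assumption of mine, not part of any tool mentioned here. The helper `suspicious_chars` is hypothetical, and flagging every non-Latin letter only makes sense for fields you expect to be Latin-only.

```python
import unicodedata

def suspicious_chars(text):
    """Return (index, code point, name) for format/control chars and non-Latin letters."""
    findings = []
    for i, ch in enumerate(text):
        category = unicodedata.category(ch)
        name = unicodedata.name(ch, "UNNAMED")  # control chars have no name
        code_point = f"U+{ord(ch):04X}"
        if category in ("Cf", "Cc"):
            # Cf: zero-width space, soft hyphen, direction marks. Cc: null byte and friends.
            findings.append((i, code_point, name))
        elif category.startswith("L") and not name.startswith("LATIN"):
            # A letter from another script, e.g. a Cyrillic homoglyph.
            findings.append((i, code_point, name))
    return findings

print(suspicious_chars("james.bond\u200b@mi6.com"))
print(suspicious_chars("p\u0430ypal"))  # Cyrillic "а" in second position
```

A clean ASCII email returns an empty list, so the function can back a validation warning or a log line at registration time.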

Testing What You Can't See

The truth is that most test suites don't cover this.
Think about your own testing. If I asked you right now, would you know exactly how your API
handles these edge cases? You probably check for bad input, missing fields, maybe even SQL injection. But do you
test what happens if a username has a zero-width space? Or if someone's password contains a right-to-left marker? Or if
two strings that look the same aren't really the same?

Probably not. Few teams do, and those that do usually started only after getting burned.
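If you want to try this by hand first, a few lines are enough to generate test inputs. This is a minimal sketch with a hypothetical helper, `invisible_variants`, that takes a valid value and yields copies with one invisible character inserted at the start, middle, or end.

```python
# A small, non-exhaustive set of invisible characters to inject.
INVISIBLES = {
    "zero-width space": "\u200b",
    "soft hyphen": "\u00ad",
    "left-to-right mark": "\u200e",
    "right-to-left mark": "\u200f",
}

def invisible_variants(value):
    """Yield (description, mutated value) pairs for a known-good input."""
    positions = {"prefix": 0, "middle": len(value) // 2, "suffix": len(value)}
    for name, ch in INVISIBLES.items():
        for where, pos in positions.items():
            yield f"{name} as {where}", value[:pos] + ch + value[pos:]

variants = list(invisible_variants("john"))
for description, mutated in variants:
    print(f"{description}: {mutated!r}")
```

Feed each variant to your registration or login endpoint and check that the response is a deliberate one (a rejection or a normalized value) rather than a silent accept.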

Enter Dochia

That's where Dochia comes in. It's focused on systematic testing, not just ad-hoc bug reproductions.

The core idea: automate negative and boundary testing for real-world APIs, with a focus on invisible, hard-to-spot input bugs: boundary cases,
missing required fields, oversized inputs, and so on. Everything that's predictable and typically independent of the business logic.

Dochia takes an OpenAPI spec and generates smart payloads. Think of it as an
engine tuned specifically for negative and boundary testing.

It's as simple as:

dochia test -c api-spec.yml -s http://localhost:8080

It sends crafted requests and produces a report with concrete reproductions: the exact payload, the response, and why
it's suspicious. All within seconds.

Additional Bugs it Found

The zero-width space was one example. Dochia surfaced many others:

Passwords with null bytes (\x00)
- The backend silently truncated at the null, so hunter2\0evil matched hunter2.
- Result: partial-password login bypass.
Unicode minus sign bug (U+2212)
- Input: −25 (using the Unicode minus sign U+2212, not the ASCII hyphen-minus -).
- System accepted negative ages during registration.
Duplicate usernames bypass
- john vs john<U+200B> stored as distinct records.
- Duplicate-check logic missed it, leading to collisions.
Emoji handling
- API accepted alex🙂.
- DB stored it fine (UTF-8).
- UI list view exploded when rendering, because the frontend assumed fixed byte width per character.

These examples come from different projects, but they follow a similar pattern: edge cases the system never handled, leading to unexpected behavior.
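Most of these come down to validating or normalizing input before it is compared or stored. The sketch below shows one possible approach, not the fix used in any of those projects. `canonical_username` and `parse_age` are hypothetical helpers, and stripping every format character is an assumption that suits identifiers but would damage text that legitimately uses zero-width joiners.

```python
import re
import unicodedata

def canonical_username(raw):
    """Build the key used for duplicate checks and lookups."""
    # Reject control characters (including the null byte) instead of truncating.
    if any(unicodedata.category(ch) == "Cc" for ch in raw):
        raise ValueError("control characters are not allowed")
    # NFKC folds compatibility forms; it does NOT remove U+200B, so strip Cf explicitly.
    normalized = unicodedata.normalize("NFKC", raw)
    visible = "".join(ch for ch in normalized if unicodedata.category(ch) != "Cf")
    return visible.casefold()

def parse_age(raw):
    """Accept only one to three ASCII digits, so no sign of any kind gets through."""
    if not re.fullmatch(r"[0-9]{1,3}", raw):
        raise ValueError(f"invalid age: {raw!r}")
    return int(raw)

print(canonical_username("john") == canonical_username("john\u200b"))  # True
print(parse_age("25"))                                                 # 25
```

With a unique index on the canonical key, john and john<U+200B> collide at registration instead of becoming two accounts.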

Why Share This?

Because these are not “big company problems.”
They're the kind of bugs that any developer can hit when their system meets messy, unpredictable user input.

So Dochia is free and open source:

brew install dochia-dev/tap/dochia-cli
dochia test -c openapi.yaml -s http://localhost:8080

Run it, and you'll see what bugs it can find.

The Reality of User Input

Users don't type raw ASCII. They paste from documents, switch keyboard layouts, run browser extensions, sit behind
proxies that rewrite headers. Every layer introduces noise.

If your system assumes “well-formed UTF-8 strings without weirdness,” you'll eventually burn hours, or days, chasing
invisible bugs.

And you can either find those surprises early… or waste three days chasing an invisible space.

Don't be me.