🦄‍🪽 Unicode Characters & GlassWorm 🥛🐛

#security

Invisible Unicode Characters and the GlassWorm Malware

In October of last year, researchers at Koi Security discovered malware targeting Visual Studio Code and Open VSX extensions. By the time it was detected, it had already spread to around 35,000 machines.

It was only discovered after an extension “introduced some suspicious behavioral changes”, which prompted deeper investigation by the researchers.

Instead of inserting visible malicious code, it used invisible Unicode characters to hide instructions directly in the source. These characters occupy space in a file but have no visual representation, so the malicious code is effectively invisible to the human eye. You could review the source normally and never see the payload.

Once installed, it got right to work. The extension would:

- Harvest credentials from npm, GitHub, and Git configs
- Look for and target cryptocurrency extensions to drain funds
- Deploy proxy servers to help create botnets
- Install hidden VNC servers for remote access

It would then use all of the above to compromise other packages and continue spreading.

Because it targeted developer environments directly, it could move through dependency chains and infect other packages, turning infected machines into distribution points for the malware.

How the “Invisible Code” Works

It’s actually pretty easy to find tools that generate these characters.

Every text character in the digital world has a unique identifier called a Unicode code point. That includes letters, punctuation, emojis — everything. Computers don’t read text visually like we do; they read those numeric code points.
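To make that concrete, here is a minimal sketch that prints the code point behind a few characters. The helper name toCodePoint is my own, made up for this example; codePointAt is a standard JavaScript string method.

```javascript
// Format a single character's code point in the usual U+XXXX notation.
// toCodePoint is a hypothetical helper name used only for this illustration.
function toCodePoint(char) {
  return `U+${char.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")}`;
}

// for...of walks a string one code point at a time, so an emoji that is
// stored as two UTF-16 units is still handled as a single character.
for (const char of "A!\u{1F41B}") {
  console.log(char, toCodePoint(char));
}
```

Each line of output shows a character next to its numeric identifier. The letter A, for example, is U+0041.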

Some Unicode characters exist without a visual representation. They technically occupy space in a string, but nothing is rendered on the screen. Things like zero-width spaces or joiners fall into this category.
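Here is a small sketch of what that looks like in practice, using a zero-width space (U+200B). The variable names are only for illustration, and the invisible character is written as an escape so you can see where it sits.

```javascript
const visible = "admin";
// Same letters with a zero-width space (U+200B) after the "d".
// Written as an escape here so the extra character shows up in the source.
const spoofed = "ad\u200Bmin";

// In most editors and terminals the two strings look the same when printed.
console.log(visible, spoofed);

// The computer still sees two different strings.
console.log(visible === spoofed); // false
console.log(visible.length, spoofed.length); // 5 6
```

The extra character changes the string's length and breaks equality checks, even though nothing new appears on screen.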

You can even copy and paste them from tools like:

https://invisible-characters.com/
https://invisible-characters.net/

These characters actually have legitimate uses — especially for formatting text or controlling layout in multilingual systems or complex scripts.

Here is the malicious code from GlassWorm's initial detection:

You cannot see the malicious code at all!
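To get a feel for how text can hide like this, here is a harmless sketch that packs the bytes of a short string into Unicode variation selectors and reads them back out. This is not GlassWorm's actual code. The function names hideBytes and revealBytes and the byte-to-character mapping are my own assumptions, chosen just to illustrate the idea.

```javascript
// Map each byte (0-255) to one invisible variation selector:
// bytes 0-15 use U+FE00..U+FE0F, bytes 16-255 use U+E0100..U+E01EF.
// This mapping is an assumption for the demo, not a documented scheme.
function hideBytes(text) {
  const bytes = new TextEncoder().encode(text);
  return Array.from(bytes, (b) =>
    String.fromCodePoint(b < 16 ? 0xfe00 + b : 0xe0100 + (b - 16))
  ).join("");
}

// Reverse the mapping: collect the bytes back and decode them as UTF-8.
function revealBytes(hidden) {
  const bytes = [];
  for (const char of hidden) {
    const cp = char.codePointAt(0);
    if (cp >= 0xfe00 && cp <= 0xfe0f) bytes.push(cp - 0xfe00);
    else if (cp >= 0xe0100 && cp <= 0xe01ef) bytes.push(cp - 0xe0100 + 16);
  }
  return new TextDecoder().decode(new Uint8Array(bytes));
}

const hidden = hideBytes("hi");
console.log(`[${hidden}]`); // usually renders as empty brackets
console.log([...hidden].length); // 2
console.log(revealBytes(hidden)); // hi
```

The string between the brackets usually renders as nothing at all, yet it still carries every byte of the original text.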

I was able to find a few JavaScript Unicode detectors, so you CAN find these, but this kind of check wouldn't normally be something you'd write or run on every npm package.

Here's my example of one:

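// Report every non-ASCII character in a string (code point above 127).
// Note: this flags visible characters like emojis too, not only invisible ones.
function detectUnicode(str) {
  const unicodeChars = [];

  // for...of iterates by code point, so emojis are not split in half
  for (const char of str) {
    const codePoint = char.codePointAt(0);
    if (codePoint > 127) {
      unicodeChars.push({
        char,
        codePoint: `U+${codePoint.toString(16).toUpperCase().padStart(4, '0')}`,
        decimal: codePoint,
      });
    }
  }

  if (unicodeChars.length === 0) {
    console.log("No non-ASCII characters found.");
    return unicodeChars; // always return an array, even when it is empty
  }

  console.log(`Found ${unicodeChars.length} non-ASCII character(s):`);
  unicodeChars.forEach(({ char, codePoint, decimal }) => {
    console.log(`  '${char}' → ${codePoint} (decimal: ${decimal})`);
  });

  return unicodeChars;
}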

Copy this into your editor and call detectUnicode(string) with a string that contains emojis or invisible Unicode characters.

Code inspired by: how to get a unicode code point in JS
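A check for everything above 127 also flags ordinary emojis and accented letters. As a narrower alternative, here is a sketch that looks only for a hand-picked set of invisible code point ranges and reports where each one sits. The range list and the names INVISIBLE_RANGES, isInvisible, and findInvisible are my own assumptions, not a complete or official list.

```javascript
// Hypothetical list of ranges to flag; extend it to suit your own needs.
const INVISIBLE_RANGES = [
  [0x00ad, 0x00ad], // soft hyphen
  [0x200b, 0x200f], // zero-width space, joiners, and direction marks
  [0x202a, 0x202e], // bidirectional embedding and override controls
  [0x2060, 0x2064], // word joiner and invisible math operators
  [0xfe00, 0xfe0f], // variation selectors 1-16
  [0xfeff, 0xfeff], // zero-width no-break space (byte order mark)
  [0xe0100, 0xe01ef], // variation selectors 17-256
];

function isInvisible(codePoint) {
  return INVISIBLE_RANGES.some(([lo, hi]) => codePoint >= lo && codePoint <= hi);
}

// Return one entry per invisible character, with its 1-based line and column.
// Columns are counted in code points, not UTF-16 units.
function findInvisible(text) {
  const hits = [];
  text.split("\n").forEach((line, lineIndex) => {
    let column = 1;
    for (const char of line) {
      const codePoint = char.codePointAt(0);
      if (isInvisible(codePoint)) {
        hits.push({
          line: lineIndex + 1,
          column,
          codePoint: `U+${codePoint.toString(16).toUpperCase().padStart(4, "0")}`,
        });
      }
      column += 1;
    }
  });
  return hits;
}

// A zero-width space is tucked in after "b" on the second line.
const sample = "const a = 1;\nconst b\u200B = 2;";
console.log(findInvisible(sample));
// [ { line: 2, column: 8, codePoint: 'U+200B' } ]
```

To scan a real file in Node, you could read it with fs.readFileSync(path, "utf8") and pass the result to findInvisible. Expect some false positives: emoji sequences legitimately use U+200D and U+FE0F, so every hit still needs a human look.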

After It Was Found

Once the malware was finally detected, Microsoft and the Open VSX maintainers removed the compromised extensions from their registries.

But by that point, it had already spread widely across developer environments and package ecosystems.

GlassWorm is a good example of how supply chain attacks against developers are evolving — sometimes the exploit isn’t in the code you see, but in the characters you can’t.