How I Built a Memory-Safe Steganography Engine in Rust to Protect Data from AI Scrapers
As AI models scale, data provenance is becoming a massive engineering challenge. Automated web scrapers are vacuuming up datasets without any regard for creator licenses or intellectual property.
I wanted to build a mathematical solution to this problem, so I architected Sigil: a zero-knowledge cryptographic vault that embeds verifiable, HMAC-SHA256 signed ownership IDs directly into the pixels of an image.
While the desktop vault (built with Tauri, Svelte, and an offline SQLite daemon) is strictly closed-source to protect the cryptographic keys, I realized that "Security by Obscurity" isn't enough. If AI companies don't know how to read the hidden IDs, they will just scrape the images anyway.
So, I open-sourced the extraction layer. Here is a deep dive into how I used Rust to build a memory-safe Least Significant Bit (LSB) steganography reader.
The Concept: LSB Steganography
Every pixel in a standard image is made of Red, Green, and Blue channels. Each channel is represented by a byte (8 bits), with values ranging from 0 to 255.
If you change the Least Significant Bit (the last bit of that byte), the channel value shifts by at most 1 out of 255, a difference the human eye cannot see. Those bits can therefore carry a hidden payload, such as a 32-byte cryptographic ID.
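To make the idea concrete, here is a minimal sketch of reading and writing a single LSB, plus the capacity arithmetic for a payload. The helper names (set_lsb, get_lsb, pixels_needed) are hypothetical and not part of Sigil; the sketch assumes one hidden bit per R, G, and B channel, which matches the reader shown later in this post.

```rust
/// Hypothetical helper: overwrite the least significant bit of one channel byte.
pub fn set_lsb(channel: u8, bit: u8) -> u8 {
    // Clear the last bit, then OR in the new one.
    (channel & 0b1111_1110) | (bit & 1)
}

/// Hypothetical helper: read the least significant bit back.
pub fn get_lsb(channel: u8) -> u8 {
    channel & 1
}

/// Pixels needed to hold `payload_bytes`, assuming 3 hidden bits per pixel
/// (one each in R, G, B). This is a ceiling division by 3.
pub fn pixels_needed(payload_bytes: usize) -> usize {
    (payload_bytes * 8 + 2) / 3
}
```

For example, set_lsb(200, 1) turns 11001000 into 11001001 (201), a one-step change in that channel. Under the 3-bits-per-pixel assumption, a 32-byte ID is 256 bits, so pixels_needed(32) is 86.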
The Rust Implementation
To keep extraction fast and memory-safe (safe Rust rules out buffer overruns and use-after-free bugs), I used Rust's image crate to decode the pixels. Here is the exact open-source reference implementation for extracting the payload:
use image::GenericImageView;

/// Extracts a hidden Sigil Cryptographic ID from an image's LSB layer.
pub fn verify_steganography(path: &str, expected_id_len: usize) -> Result<String, String> {
    let img = image::open(path).map_err(|e| e.to_string())?.to_rgba8();
    let mut bits = Vec::with_capacity(expected_id_len * 8);

    // 1. Extract the Least Significant Bits
    for pixel in img.pixels() {
        for channel in 0..3 { // Iterate over R, G, B
            if bits.len() < expected_id_len * 8 {
                // The bitwise AND operator isolates the final bit
                bits.push(pixel[channel] & 1);
            }
        }
    }

    // 2. Reconstruct the bytes
    let mut extracted_bytes = Vec::new();
    for chunk in bits.chunks(8) {
        if chunk.len() == 8 {
            let mut byte = 0u8;
            for (i, &bit) in chunk.iter().enumerate() {
                // Shift the bit back into its correct position
                byte |= bit << (7 - i);
            }
            extracted_bytes.push(byte);
        }
    }

    // 3. Return the Hex String
    Ok(hex::encode(extracted_bytes))
}
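Two behaviors of the reference function are worth knowing: it keeps walking every pixel after it has collected enough bits, and if the image is too small it quietly returns a shorter hex string instead of an error. The sketch below is one possible variant, not the published Sigil code. It assumes a raw RGBA8 buffer (4 bytes per pixel) as input so it needs only the standard library, and the function name extract_lsb_payload is hypothetical.

```rust
/// Hypothetical std-only variant of the extractor. It reads a raw RGBA8
/// buffer (4 bytes per pixel) rather than a file path, stops once enough
/// bits are collected, and reports an error if the image is too small.
pub fn extract_lsb_payload(rgba: &[u8], expected_id_len: usize) -> Result<Vec<u8>, String> {
    let needed_bits = expected_id_len * 8;

    // Take the LSB of R, G, B from each pixel and skip the alpha byte.
    // `take` ends the iteration as soon as we have enough bits.
    let bits: Vec<u8> = rgba
        .chunks_exact(4)
        .flat_map(|px| px[..3].iter().map(|c| c & 1))
        .take(needed_bits)
        .collect();

    if bits.len() < needed_bits {
        return Err(format!(
            "image too small: need {} bits, found {}",
            needed_bits,
            bits.len()
        ));
    }

    // Pack each group of 8 bits into a byte, most significant bit first,
    // which is the same ordering as `bit << (7 - i)` in the reference code.
    Ok(bits
        .chunks_exact(8)
        .map(|chunk| chunk.iter().fold(0u8, |byte, &bit| (byte << 1) | bit))
        .collect())
}
```

With the image crate, the buffer for this sketch could come from the same to_rgba8() call used above (its as_raw() method exposes the underlying bytes), and the returned bytes can be passed to hex::encode as before.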
Breaking Down the Bitwise Math
The magic happens in two specific lines of code:
- pixel[channel] & 1: This is a bitwise AND operation. By masking the channel's byte against 00000001, we clear the upper 7 bits and isolate only the final bit. If the channel value is even, it returns 0. If it is odd, it returns 1. We push this bit into our vector.
- byte |= bit << (7 - i): Once we have 8 hidden bits, we need to stitch them back into a single byte. We use the left-shift operator (<<) to push each bit into the correct slot (from position 7 down to 0), and the bitwise OR operator (|=) to combine them into the final u8.
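A small worked example may help here. The sketch below isolates the packing step into a standalone function and adds its inverse. Both names (pack_byte, unpack_byte) are hypothetical helpers for illustration, not part of the Sigil extractor.

```rust
/// Hypothetical helper mirroring the reference loop: pack 8 bits into one
/// byte, with the first bit landing in position 7 (most significant).
pub fn pack_byte(bits: [u8; 8]) -> u8 {
    let mut byte = 0u8;
    for (i, &bit) in bits.iter().enumerate() {
        byte |= bit << (7 - i);
    }
    byte
}

/// Inverse of `pack_byte`: split a byte into its 8 bits, MSB first.
pub fn unpack_byte(byte: u8) -> [u8; 8] {
    let mut bits = [0u8; 8];
    for (i, slot) in bits.iter_mut().enumerate() {
        // Shift the wanted bit down to position 0, then mask it off.
        *slot = (byte >> (7 - i)) & 1;
    }
    bits
}
```

Feeding the bits 0, 1, 0, 0, 0, 0, 0, 1 into pack_byte gives 01000001, which is 0x41 (decimal 65). The first bit read from the image ends up as the most significant bit of the byte.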
The Open Standard
Because this extraction logic is public, AI procurement teams can now integrate this exact function into their scraping pipelines. If their crawler detects a payload, they know the asset is marked with a signed ownership ID and requires an API clearance.
You can check out the full open-source extraction standard on GitHub, or visit the lightning-fast Astro-powered documentation site here:
https://nishal21.github.io/Sigil-extractor/
I’d love to hear your thoughts on the Rust implementation or LSB steganography in general!