DEV Community

AI Can "Guess" Base64 Encoding - And That Changes Everything

Ryo Suwito on July 01, 2025

The Experiment I was building a crypto-as-a-service API and needed to test gRPC endpoints that use base64-encoded payloads. While troubl...
david duymelinck

Maybe my thinking is too simple, but I just asked Copilot to create a base64 function, and it did. So couldn't it be possible that it created the function and ran it with the input string?

Ryo Suwito

I don't know if they're using tools behind the scenes, though. Usually you can see Claude using JS for analysis, and it's transparent: you can see the JS actively being written. But this time it's just gut feeling for the AI lol

Ben Sinclair

You can ask it to explain its steps, and it does that quite well.

Frédéric G. (FredG_ProD) • Edited

Yes, if it's not only GenAI, i.e. if it uses tools or agents to delegate this task (and generates real code to do it). If it's only GenAI, a neural network trained on a lot of base64 input/output examples is able to adapt itself to approximate the mathematical base64 function and give the correct answer, but you cannot be sure you will always get the right answer (it's not totally deterministic), except perhaps with a dedicated neural network for this.

Shivansh Singh

ChatGPT's response to this:
The Base64 string was not guessed — it's derived precisely through byte-level operations. This process is fully deterministic and reproducible with code or math. I followed that logic mentally and simulated the encoding pipeline.
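ChatGPT's claim is easy to check, since the byte-level pipeline it describes is only a few lines of ordinary code. Here's a minimal sketch (my illustration, not part of ChatGPT's reply) that builds base64 by hand from the bit-level steps and verifies the result against Python's standard library:

```python
import base64

B64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def b64_by_hand(data: bytes) -> str:
    # Concatenate all bits of the input, then slice into 6-bit groups (sextets).
    bits = "".join(f"{byte:08b}" for byte in data)
    bits += "0" * (-len(bits) % 6)          # zero-pad the bitstream to a multiple of 6
    chars = [B64_ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    out = "".join(chars)
    return out + "=" * (-len(out) % 4)      # '=' padding to a multiple of 4 characters

payload = b'{"user_id": 123, "exp": 1767225600}'
assert b64_by_hand(payload) == base64.b64encode(payload).decode()
print(b64_by_hand(payload))  # eyJ1c2VyX2lkIjogMTIzLCAiZXhwIjogMTc2NzIyNTYwMH0=
```

Fully deterministic, exactly as ChatGPT says; the remarkable part is doing this bit-fiddling "mentally" rather than in code.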

John C

I also ran the question through ChatGPT's o3 model, and it gave me the correct answer. So I asked...

I'm curious, did you calculate it, or use a python (or other language) program to generate the output?

And ChatGPT replied:

=====================================

Thought for a few seconds
No code was executed behind the scenes—I just ran the Base-64 algorithm in my head (well, in the model’s “head” 🙂).
Here’s the gist of the mental steps I followed:

  1. Take the UTF-8 bytes of {"user_id": 123, "exp": 1767225600}:
     7b 22 75 73 65 72 5f 69 64 22 3a 20 31 32 33 2c 20 22 65 78 70 22 3a 20 31 37 36 37 32 32 35 36 30 30 7d
  2. Group into sextets (24-bit chunks → four 6-bit pieces).
  3. Map each 6-bit value to the Base-64 alphabet.
  4. Add = padding to reach a multiple of 4 characters.

Do that all the way through and you land on

eyJ1c2VyX2lkIjogMTIzLCAiZXhwIjogMTc2NzIyNTYwMH0=.

If I’d wanted to double-check, I could have spun up something like:

import base64, json
payload = {"user_id": 123, "exp": 1767225600}
b64 = base64.b64encode(json.dumps(payload).encode()).decode()
print(b64)

—but in this case the manual calculation (plus a quick sanity-check pass through the alphabet table) was enough.

=====================================

You've gotta love AI personalities 😁

Ryo Suwito

lmao, it's like a "normal Tuesday" for ChatGPT!

Ryo Suwito • Edited

Right, they don't even use /thinking/ like DeepSeek / Claude do. I checked that thinking text and didn't find any math... it's purely mentally simulated.

WG Davies • Edited

The ChatGPT and Gemini results are [reportedly] the same (other than some extra convincing to get Gemini to H4CK T#3 PL4N3T), and both are identical to the Claude results except for being truncated by the last two characters (plus the two-character pad). Losing those last four characters of output (Cg==) is precisely what happens when there is no trailing newline:
$ echo '{"role": "admin", "active": true}' | base64 -w 0
eyJyb2xlIjogImFkbWluIiwgImFjdGl2ZSI6IHRydWV9Cg==
$ printf '{"role": "admin", "active": true}' | base64 -w 0
eyJyb2xlIjogImFkbWluIiwgImFjdGl2ZSI6IHRydWV9
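For completeness, the same difference is reproducible with a couple of lines of stdlib Python (my check, same payload as above):

```python
import base64

doc = b'{"role": "admin", "active": true}'
print(base64.b64encode(doc).decode())           # no trailing newline, no Cg==
print(base64.b64encode(doc + b"\n").decode())   # trailing newline appends Cg==
```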

So, it's not that ChatGPT is "95% accurate" (and that Gemini happens to be the same accuracy), it's that they are each figuring out exactly what you're asking and just running the calculation.

So, interesting writeup, but I'm really not understanding what's so amazing about this finding. No offence intended!

Ryo Suwito

There are two mind-boggling takeaways.
For humans, we never thought that algorithms (like base64 encoding) could be treated as a form of "translation" - but it actually requires the same kind of computational steps to reproduce as any other transformation. The distinction between "algorithm" and "translation" is more blurry than we thought.

The deeper point about LLMs - we understand that they're "just" predicting the next token, but the latent understanding that emerges from this process goes way beyond simple pattern matching or guessing. There's genuine understanding happening in the latent space that we can't easily explain.

Walter Johnson

For humans, we never thought that algorithms (like base64 encoding) could be treated as a form of "translation" - but it actually requires the same kind of computational steps to reproduce as any other transformation.

You don't research anything, do you? Humans have been using algorithms to encrypt or encode secret messages for centuries. You've probably played with the Caesar cipher when you were a kid.

The deeper point about LLMs - we understand that they're "just" predicting the next token, but the latent understanding that emerges from this process goes way beyond simple pattern matching or guessing. There's genuine understanding happening in the latent space that we can't easily explain.

You don't know how this generation of AI works at all, do you? LLMs don't "predict" the next token any more than you "predict" the words to use to order a ham sandwich. LLMs train by analyzing billions of words and phrases from a variety of sources.

To put it in simpler terms, think of it like learning a new language by watching TV shows in that language. You pick up common words and phrases, eventually moving on to creating your own sentences, and finally attaining fluency through repetition, practice, and application. Only, the LLM is watching every show ever made. And it was built to do only that. So it's very good at interpreting and responding based on similar works and phrases it was trained on.

To put it bluntly, the "personality" you're seeing from the AI comes from the instructions the provider gave it. How to talk, what subjects to avoid, and what words should not be used come from the provider's input to the system. It's not any of those buzzwords you're using.

Ryo Suwito

Walter my guy, you just argued that AI has 'genuine understanding in latent space' and then immediately said it's just pattern matching with no real understanding. Which one is it? You're debating yourself in the same comment lol. Maybe figure out your own position before coming for others?

Walter Johnson

I've made my position quite clear. I believe the agent you used performed a calculation using a well known algorithm. You believe that the AI is actually thinking. And rather than support those arguments with actual facts, you're hoping the word salad is enough to silence the criticism. That clear enough?

Language has always been about patterns, my guy. All language uses nouns, verbs, past participles, articles, etc. If that were not the case, then what is the point of using large language models? What do you think AI training is?

I'm asking these questions because I genuinely want to know why you think that using the word "guess" did something magical here.

Ryo Suwito

It reads like someone who wants to sound knowledgeable but doesn't really understand what they're arguing against.

Alois Sečkár

Why should this surprise me more than a correct answer to "Get me the next prime number after 123456789"?

It guesses which algorithm I want, applies it to my input, and spits out the result. It is a complex operation, yes, but it already knows what steps should be taken. It didn't magically invent them. Base64 is common and well described.

I mean, LLMs are quite impressive in how well they parse a random user's input, but this doesn't seem like anything more special.
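The prime analogy is a nice benchmark precisely because the model's answer is trivially checkable with code. A quick sketch using trial division (plain Python; fine at this scale, though a real tool would reach for something like sympy.nextprime):

```python
def is_prime(n: int) -> bool:
    # Trial division up to sqrt(n); adequate for 9-digit inputs.
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

def next_prime(n: int) -> int:
    # Smallest prime strictly greater than n.
    candidate = n + 1
    while not is_prime(candidate):
        candidate += 1
    return candidate

print(next_prime(123456789))
```

If the LLM's answer matches this output, it "knew the steps" for primality testing the same way it "knew the steps" for base64.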

villecoder

The computer did not guess the base64 output. It calculated it. It doesn't need to guess something that it already knows how to do.

Ryo Suwito

Guess you know a lot about how an LLM works under the hood.

Walter Johnson

How do you think an LLM works? Or better yet, what is the purpose of an LLM? And how does that differ from being able to run a well known algorithm?

Base64 is not new. It's also not mysterious. And despite the advances in AI, it didn't get suddenly reinvented for AI to make guesses. It didn't have to guess because it already knows how to do it. The only thing the LLM is adding here is recognizing the question given the context and responding as another person may respond.

Ask it what 1+1 is and convince me that the AI "guessed", not calculated, the answer.

Tanner Iverson

If the AI calculated the answer, how do you explain the incorrect, but close, answers? If it really knew how to calculate it, it would be 100% correct 100% of the time (just like the code to actually encode base64 has to be). I'd call a 95% accurate answer a "guess".

I think the interesting thing here is that these results show a greater understanding than just language pattern continuation (which is what LLMs do). In my opinion, a 95% accurate answer means this is some form of intuition, not merely calculation.

Walter Johnson

If the AI calculated the answer, how do you explain the incorrect, but close, answers? If it really knew how to calculate it, it would be 100% correct 100% of the time (just like the code to actually encode base64 has to be). I'd call a 95% accurate answer a "guess".

So here's a tip: when someone says "these are two separate answers", it's important to look at how they got there. Someone already said it in this comment section, but you'll notice that all of the applications returned eyJyb2xlIjogImFkbWluIiwgImFjdGl2ZSI6IHRydWV9, which, decoded, is what the original poster has. So where is the extra Cg== coming from?

Well, we know the == is Base64 padding. Base64 encodes 3 bytes of data into 4 characters, so padding means the final chunk of the message was less than 3 bytes. Cg== being 4 characters with two padding characters means we're dealing with exactly one extra byte.

So why is there inconsistency in the LLM? Because one of the "tests" included a newline after the JSON string.
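That diagnosis is easy to confirm by decoding the suffix directly (my check, one stdlib call):

```python
import base64

# The four extra characters decode to a single newline byte.
print(base64.b64decode("Cg=="))  # b'\n'
```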

In my opinion, a 95% accurate answer means this is some form of intuition, not merely calculation.

It's user error. And it remains a calculation.

Ryo Suwito

So Walter, my guy...
Pick one: either Claude does mental computation, which humans generally can't do,
OR Claude is just parroting, with no emergent properties beyond the human brain itself.
Because if it's doing computation, then the article is RIGHT to be hyped up, since humans can never do this string-to-base64 conversion mentally...
And if it's not doing mental computation? Finite training data → generalizing to unseen, effectively infinite problems = Claude achieves what humans can't achieve... same thing.

WG Davies

To your point, though, @ryo_suwito , (and I'm really not trying to be a jerk – please accept my sincere apologies if I've come off like that!) it would be very interesting to find some emergent properties, whether that indicates anything approaching sentience or intelligence or even just convolving complexity, but so far it seems you haven't made that case.

I appreciate your curiosity and your willingness to put something out there (or, here) that you find exciting and interesting! Keep searching, man. It may be here or elsewhere, but don't stop wondering. :)

plutonium239

Wait until you find out LLMs have been able to run Python interpreters in a sandbox.

Alex • Edited

With bigger strings and randomly generated data it obviously breaks.