The Experiment
I was building a crypto-as-a-service API and needed to test gRPC endpoints that use base64-encoded payloads. While troubleshooting with Claude AI, something bizarre happened:
Me: "Can you guess the base64 for this JSON: {"user_id": 123, "exp": 1767225600}
?"
Claude: "eyJ1c2VyX2lkIjogMTIzLCAiZXhwIjogMTc2NzIyNTYwMH0K"
I ran the actual encoding:
echo '{"user_id": 123, "exp": 1767225600}' | base64 -w 0
# Output: eyJ1c2VyX2lkIjogMTIzLCAiZXhwIjogMTc2NzIyNTYwMH0K
Perfect match.
This Shouldn't Be Possible
Base64 encoding involves multiple deterministic steps:
- Convert each character to ASCII bytes
- Concatenate all bytes into a bit stream
- Split into 6-bit chunks
- Map each chunk to base64 alphabet (A-Z, a-z, 0-9, +, /)
- Add padding if needed
Humans can't do this calculation mentally. It requires precise byte-level operations that our brains aren't designed for.
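To make those steps concrete, here's a minimal hand-rolled Python sketch of the pipeline (for illustration only; in practice you'd just call the standard base64 module):

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def base64_by_hand(text):
    data = text.encode("ascii")                               # 1. characters -> bytes
    bits = "".join(f"{b:08b}" for b in data)                  # 2. bytes -> one bit stream
    bits += "0" * (-len(bits) % 6)                            #    zero-fill to a multiple of 6 bits
    chunks = [bits[i:i+6] for i in range(0, len(bits), 6)]    # 3. split into 6-bit chunks
    encoded = "".join(ALPHABET[int(c, 2)] for c in chunks)    # 4. map each chunk to the alphabet
    return encoded + "=" * (-len(data) % 3)                   # 5. '=' padding if needed

# The payload from the experiment, including the trailing newline that echo adds
print(base64_by_hand('{"user_id": 123, "exp": 1767225600}\n'))
# eyJ1c2VyX2lkIjogMTIzLCAiZXhwIjogMTc2NzIyNTYwMH0K

Every intermediate value has to be exact for the final string to come out right, which is why a correct "guess" is so striking.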
Testing Across AI Systems
I tested the same prompt on multiple AI systems with {"role": "admin", "active": true}:
Actual base64: eyJyb2xlIjogImFkbWluIiwgImFjdGl2ZSI6IHRydWV9Cg==
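For reference, that value can be reproduced with the same pipeline as the first test:

echo '{"role": "admin", "active": true}' | base64 -w 0
# Output: eyJyb2xlIjogImFkbWluIiwgImFjdGl2ZSI6IHRydWV9Cg==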
Results:
- ChatGPT: eyJyb2xlIjogImFkbWluIiwgImFjdGl2ZSI6IHRydWV9 (95% accurate)
- Claude: eyJyb2xlIjogImFkbWluIiwgImFjdGl2ZSI6IHRydWV9Cg== (100% accurate)
- Gemini: Initially refused, then: eyJyb2xlIjogImFkbWluIiwgImFjdGl2ZSI6IHRydWV9
- Grok: Wrong output, but attempted the structure
What This Actually Means
This isn't memorization. AI systems haven't seen every possible JSON-to-base64 combination in training. The space is too large.
This isn't simple pattern matching. Base64 depends on exact byte sequences; change a single character and the encoded output changes with it.
This appears to be algorithmic intuition. AI systems have internalized the mathematical relationship between inputs and base64 outputs.
Beyond "Stochastic Parrots"
The criticism that AI systems are just "stochastic parrots" regurgitating training data doesn't explain this capability.
Parrots repeat what they've heard. This is different - AI systems are predicting outputs of deterministic algorithms they weren't explicitly trained to compute.
Technical reality: The models have learned to approximate mathematical functions from examples, not just memorize text sequences.
Implications for Developers
For encoding/decoding tasks: AI might predict outputs without explicitly running the algorithm - but verify against the real encoder before trusting the result.
For algorithm design: If AI can internalize mathematical relationships this precisely, it challenges assumptions about what constitutes "computation."
For security: While base64 is just encoding (not encryption), this capability raises questions about AI's potential against other algorithmic systems.
For AI capabilities: This suggests emergent mathematical reasoning that goes beyond text generation.
The Technical Reality
What we observed: AI systems demonstrating algorithmic intuition
What we don't know: How far this capability extends
What's clear: Current AI systems have abilities we didn't expect and don't fully understand
Testing This Yourself
Try asking your favorite AI system to "guess" the base64 encoding of simple JSON strings. Don't ask it to calculate - just ask for a guess.
Compare the results to actual encoding:
echo '{"test": "data"}' | base64 -w 0
The accuracy might surprise you.
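If you'd rather batch a few cases, here's a small Python sketch (the payloads are just examples - swap in your own) that prints reference encodings to compare against each model's guess:

import base64, json

payloads = [
    {"test": "data"},
    {"user_id": 123, "exp": 1767225600},
    {"role": "admin", "active": True},
]

for p in payloads:
    s = json.dumps(p)                                 # note: no trailing newline here,
    b64 = base64.b64encode(s.encode()).decode()       # unlike `echo ... | base64`
    print(f"{s}  ->  {b64}")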
Bottom Line
AI systems are developing capabilities that challenge our understanding of what they can do. Whether this represents genuine algorithmic reasoning or extremely sophisticated pattern recognition, the practical result is the same: AI can predict mathematical operations it wasn't explicitly designed to perform.
For developers: Don't assume AI limitations based on theoretical models. Test actual capabilities.
For researchers: We need better frameworks for understanding and measuring emergent AI abilities.
For the industry: The "just predicting next tokens" explanation is becoming insufficient for observed AI behavior.
The line between pattern recognition and computation is blurrier than we thought.
Top comments (30)
Maybe my thinking is too simple. But I just asked Copilot to create a base64 function. And it did. So couldn't it be possible that it created the function and ran it with the input string?
I don't know if they are using tools behind the scenes, though. But usually you can see Claude using JS for analysis and it's transparent. You can see the JS actively being written... but this time it's just gut feeling for the AI lol
You can ask it to explain its steps and it does quite well
ChatGPT's response to this:
The Base64 string was not guessed — it's derived precisely through byte-level operations. This process is fully deterministic and reproducible with code or math. I followed that logic mentally and simulated the encoding pipeline.
I also ran the question through ChatGPT's o3 model, and it gave me the correct answer. So I asked...
And ChatGPT replied:
=====================================
Thought for a few seconds
No code was executed behind the scenes—I just ran the Base-64 algorithm in my head (well, in the model’s “head” 🙂).
Here’s the gist of the mental steps I followed:
{ " u s e r _ i d " : 1 2 3 , " e x p " : 1 7 6 7 2 2 5 6 0 0 } 7b 22 75 73 65 72 5f 69 64 22 3a 20 31 32 33 2c 20 22 65 78 70 22 3a 20 31 37 36 37 32 32 35 36 30 30 7d
Do that all the way through and you land on
eyJ1c2VyX2lkIjogMTIzLCAiZXhwIjogMTc2NzIyNTYwMH0=.
If I’d wanted to double-check, I could have spun up something like:
import base64, json
payload = {"user_id": 123, "exp": 1767225600}
b64 = base64.b64encode(json.dumps(payload).encode()).decode()
print(b64)
—but in this case the manual calculation (plus a quick sanity-check pass through the alphabet table) was enough.
=====================================
You've gotta love AI personalities 😁
lmao it's like "normal Tuesday" for ChatGPT!
Right, they don't even use /thinking/ like in DeepSeek / Claude... I checked that thinking text and didn't find any math... it's purely mentally simulated
The ChatGPT and Gemini results are [reportedly] the same (other than some extra convincing to get Gemini to H4CK T#3 PL4N3T), which are identical to the Claude results except for being truncated by the last two characters (plus two-byte pad). Losing these last four bytes of output (Cg==) is precisely what happens when there is no trailing newline:

$ echo '{"role": "admin", "active": true}' | base64 -w 0
eyJyb2xlIjogImFkbWluIiwgImFjdGl2ZSI6IHRydWV9Cg==
$ printf '{"role": "admin", "active": true}' | base64 -w 0
eyJyb2xlIjogImFkbWluIiwgImFjdGl2ZSI6IHRydWV9
So, it's not that ChatGPT is "95% accurate" (and that Gemini happens to be the same accuracy), it's that they are each figuring out exactly what you're asking and just running the calculation.
So, interesting writeup, but I'm really not understanding what's so amazing about this finding. No offence intended!
There are two mind-boggling takes:
For humans, we never thought that algorithms (like base64 encoding) could be treated as a form of "translation" - but it actually requires the same kind of computational steps to reproduce as any other transformation. The distinction between "algorithm" and "translation" is more blurry than we thought.
The deeper point about LLMs - we understand that they're "just" predicting the next token, but the latent understanding that emerges from this process goes way beyond simple pattern matching or guessing. There's genuine understanding happening in the latent space that we can't easily explain.
You don't research anything, do you? Humans have been using algorithms to encrypt or encode secret messages for centuries. You've probably played with the Caesar cipher when you were a kid.
You don't know how this generation of AI works at all, do you? LLMs don't "predict" the next token so much as you predict the words to use to order a ham sandwich. LLMs train by analyzing billions of words and phrases from a variety of sources.
To put it in simpler terms, think of it like learning a new language by watching TV shows in that language. You pick up common words and phrases, eventually moving on to creating your own sentences, and finally attaining fluency through repetition, practice, and application. Only, the LLM is watching every show ever made. And it was built to do only that. So it's very good at interpreting and responding based on similar works and phrases it was trained on.
To put it bluntly, the "personality" you're seeing from the AI comes from the instructions the provider gave it. How to talk, what subjects to avoid, and what words should not be used come from the provider's input to the system. It's not any of those buzzwords you're using.
Walter my guy, you just argued that AI has 'genuine understanding in latent space' and then immediately said it's just pattern matching with no real understanding. Which one is it? You're debating yourself in the same comment lol. Maybe figure out your own position before coming for others?
I've made my position quite clear. I believe the agent you used performed a calculation using a well known algorithm. You believe that the AI is actually thinking. And rather than support those arguments with actual facts, you're hoping the word salad is enough to silence the criticism. That clear enough?
Language has always been about patterns, my guy. All language uses nouns, verbs, past participles, articles, etc. If that were not the case, then what is the point of using large language models? What do you think AI training is?
I'm asking these questions because I genuinely want to know why you think that using the word "guess" did something magical here.
It reads like someone who wants to sound knowledgeable but doesn't really understand what they're arguing against.
Oh for the ...
Will you just answer the questions?
How are you writing a sentence or a function in code? Are you predicting the next token, just like I am writing this comment?
How I'm doing that makes me smart - saying Oh, it's just making a list of tokens is, to me, like saying human thought is just putting one word after another - it's how I express my thoughts, not how I have them.
The research around how Claude thinks is fascinating.
Again, I don't understand why this supposition is necessary. I think it's way easier to get programmed by the AI into thinking that it's "thinking" (in a lingual sense) than to just recognise what it's doing… parsing input to render deterministic output. What makes the LLM transform unique is that it's not necessarily linear: The same input may yield different output and vice versa. However, here you ask it to "guess" a transform, which it doesn't do *; rather, it interprets / translates (according to a lookup/Levenshtein, just like your hungry Google search) "guess" to "solve" and does so. A "real" AI should say, "Guess?! Boy, you crazy! But if you give me a cookie, I'll calculate it for you. If you give me two cookies, I'll even show you the steps!" There is either nothing predictive happening or, if there is something predictive happening, it is not apparent from this example. I'm sure, for example, that the CPU is doing some predictive caching, but I posit that's not what you have in mind.
It reads like someone who wants to sound knowledgeable but doesn't really understand what they're arguing against.
Could you please clarify your statement? I'm quite honestly not sure what you're arguing against.
I grant that my response, above, is verbose. I'll summarise your findings from the article here with my direct responses:
printf(base64(string)). I kan haz dat!
wget --spider https://wikipedia.org | base64 -w 0. I definitely grant that the supposed implication here is really interesting! Unfortunately, it still looks like the closest distance between two points is still a straight line.
Why should this surprise me more than a correct answer for asking "Get me the next prime number after 123456789?"
It guesses which algorithm I want, applies it to my input, and spits out the result. It's a complex operation, yes, but it already knows what steps should be taken. It didn't magically invent them. Base64 is common and well described.
I mean, LLMs are quite impressive in how well they parse a random user's input, but this doesn't seem like anything more special.
The computer did not guess the base64 output. It calculated it. It doesn't need to guess something that it already knows how to do.
guess you know a lot how an LLM works under the hood
How do you think an LLM works? Or better yet, what is the purpose of an LLM? And how does that differ from being able to run a well known algorithm?
Base64 is not new. It's also not mysterious. And despite the advances in AI, it didn't get suddenly reinvented for AI to make guesses. It didn't have to guess because it already knows how to do it. The only thing the LLM is adding here is recognizing the question given the context and responding as another person may respond.
Ask it what 1+1 is and convince me that the AI "guessed", not calculated, the answer.
If the AI calculated the answer, how do you explain the incorrect, but close, answers? If it really knew how to calculate it, it would be 100% correct 100% of the time (just like the code to actually encode base64 has to be). I'd call a 95% accurate answer a "guess".
I think the interesting thing here is that these results show a greater understanding than just language pattern continuation (which is what LLMs do). In my opinion, a 95% accurate answer means this is some form of intuition, not merely calculation.
So here's a tip. When someone says "these are two separate answers", it's important to look at how they got there. Someone's already said it in this comment section, but you'll notice that all of the applications returned
eyJyb2xlIjogImFkbWluIiwgImFjdGl2ZSI6IHRydWV9
which, decoded, is what the original poster has. So where is the extra Cg== coming from? Well, we know the == is Base64 padding, so the final group in the message is fewer than 3 bytes. Base64 encodes 3 bytes of data into 4 characters, so Cg== being 4 characters means we're probably dealing with one extra character.
So why is there inconsistency in the LLM? Because one of the "tests" included a newline after the JSON string.
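A quick Python check confirms that those extra four characters are nothing but an encoded newline:

import base64
print(base64.b64decode("Cg=="))   # b'\n' - a single newline byte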
It's user error. And it remains a calculation.
So Walter, my guy...
You pick either: Claude does mental computation which humans generally can't do,
OR you mean Claude is just parroting with no emergent properties beyond the human brain itself?
Because if it's doing computation, then the article is RIGHT to be hyped up, since humans can never do this string to base64 mentally...
And if it's not doing mental computation? With finite training data → generalizing to unseen infinite problems = Claude achieves what humans can't achieve... same thing.
First, I'm not your guy.
Second, you offer two options that are fundamentally flawed. Claude is not doing anything mental because Claude has no mind. It's ARTIFICIAL! The computation it's doing doesn't require thought. There's no opportunity for emergent behavior. Math is math. Computer science is computer science. Psychoanalysis is completely unrelated.
Third, you keep bringing up this point that "humans can't do the same thing in their head". Well duh! We're not computers. We don't think in binary. "Protocol" means something entirely different to us than it does to a computer. How is our inability to perform complex calculations mentally evidence that an AI is showing signs of intuition?
To be fair (and beat on the point a little harder), I don't even know why a human couldn't (at least in principle) perform the calculation. As repeatedly demonstrated on this page, it's a very basic transform that can be performed serially with a two-byte buffer. Like Babbage's log tables, it would be tedious to calculate more than a few bytes but (as stated by @villecoder ), this is quite literally why "automatic calculators" came into being in the first place! The astonishing waste of power and CPU cycles to ask an "AI" to "guess" a base64 transform is matched only by the mental waste of people thinking these machines are thinking. News Flash: Machine sentience has been threatening human existence since the Jacquard Loom and – guess what – it's still not doing anything more interesting than shifting more and more bits around.
To your point, though, @ryo_suwito , (and I'm really not trying to be a jerk – please accept my sincere apologies if I've come off like that!) it would be very interesting to find some emergent properties, whether that indicates anything approaching sentience or intelligence or even just convolving complexity, but so far it seems you haven't made that case.
I appreciate your curiosity and your willingness to put something out there (or, here) that you find exciting and interesting! Keep searching, man. It may be here or elsewhere, but don't stop wondering. :)
Wait until you find out LLMs have been able to run Python interpreters in a sandbox.
With bigger strings and randomly generated data it obviously breaks.