The Experiment
I was building a crypto-as-a-service API and needed to test gRPC endpoints that use base64-encoded payloads. While troubleshooting with Claude AI, something bizarre happened:
Me: "Can you guess the base64 for this JSON: {"user_id": 123, "exp": 1767225600}
?"
Claude: "eyJ1c2VyX2lkIjogMTIzLCAiZXhwIjogMTc2NzIyNTYwMH0K"
I ran the actual encoding:
echo '{"user_id": 123, "exp": 1767225600}' | base64 -w 0
# Output: eyJ1c2VyX2lkIjogMTIzLCAiZXhwIjogMTc2NzIyNTYwMH0K
Perfect match.
This Shouldn't Be Possible
Base64 encoding involves multiple deterministic steps:
- Convert each character to ASCII bytes
- Concatenate all bytes into a bit stream
- Split into 6-bit chunks
- Map each chunk to base64 alphabet (A-Z, a-z, 0-9, +, /)
- Add padding if needed
Humans can't do this calculation mentally. It requires precise byte-level operations that our brains aren't designed for.
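To make those steps concrete, here's a minimal hand-rolled Python sketch of the pipeline (for illustration only; in practice you'd just call the standard base64 module):

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def base64_by_hand(text):
    data = text.encode("ascii")                               # 1. characters -> bytes
    bits = "".join(f"{b:08b}" for b in data)                  # 2. bytes -> one bit stream
    bits += "0" * (-len(bits) % 6)                            #    zero-fill to a multiple of 6 bits
    chunks = [bits[i:i+6] for i in range(0, len(bits), 6)]    # 3. split into 6-bit chunks
    encoded = "".join(ALPHABET[int(c, 2)] for c in chunks)    # 4. map each chunk to the alphabet
    return encoded + "=" * (-len(data) % 3)                   # 5. '=' padding if needed

# The payload from the experiment, including the trailing newline that echo adds
print(base64_by_hand('{"user_id": 123, "exp": 1767225600}\n'))
# eyJ1c2VyX2lkIjogMTIzLCAiZXhwIjogMTc2NzIyNTYwMH0K

Every intermediate value has to be exact for the final string to come out right, which is why a correct "guess" is so striking.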
Testing Across AI Systems
I tested the same prompt on multiple AI systems with {"role": "admin", "active": true}:
Actual base64: eyJyb2xlIjogImFkbWluIiwgImFjdGl2ZSI6IHRydWV9Cg==
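For reference, that value can be reproduced with the same pipeline as the first test:

echo '{"role": "admin", "active": true}' | base64 -w 0
# Output: eyJyb2xlIjogImFkbWluIiwgImFjdGl2ZSI6IHRydWV9Cg==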
Results:
- ChatGPT: eyJyb2xlIjogImFkbWluIiwgImFjdGl2ZSI6IHRydWV9 (95% accurate)
- Claude: eyJyb2xlIjogImFkbWluIiwgImFjdGl2ZSI6IHRydWV9Cg== (100% accurate)
- Gemini: Initially refused, then: eyJyb2xlIjogImFkbWluIiwgImFjdGl2ZSI6IHRydWV9
- Grok: Wrong output, but attempted the structure
What This Actually Means
This isn't memorization. AI systems haven't seen every possible JSON-to-base64 combination in training. The space is too large.
This isn't simple pattern matching. Base64 depends on exact byte sequences; change a single character and the encoded output changes with it.
This appears to be algorithmic intuition. AI systems have internalized the mathematical relationship between inputs and base64 outputs.
Beyond "Stochastic Parrots"
The criticism that AI systems are just "stochastic parrots" regurgitating training data doesn't explain this capability.
Parrots repeat what they've heard. This is different - AI systems are predicting outputs of deterministic algorithms they weren't explicitly trained to compute.
Technical reality: The models have learned to approximate mathematical functions from examples, not just memorize text sequences.
Implications for Developers
For encoding/decoding tasks: AI might predict outputs without explicitly running the algorithm - but verify against the real encoder before trusting the result.
For algorithm design: If AI can internalize mathematical relationships this precisely, it challenges assumptions about what constitutes "computation."
For security: While base64 is just encoding (not encryption), this capability raises questions about AI's potential against other algorithmic systems.
For AI capabilities: This suggests emergent mathematical reasoning that goes beyond text generation.
The Technical Reality
What we observed: AI systems demonstrating algorithmic intuition
What we don't know: How far this capability extends
What's clear: Current AI systems have abilities we didn't expect and don't fully understand
Testing This Yourself
Try asking your favorite AI system to "guess" the base64 encoding of simple JSON strings. Don't ask it to calculate - just ask for a guess.
Compare the results to actual encoding:
echo '{"test": "data"}' | base64 -w 0
The accuracy might surprise you.
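If you'd rather batch a few cases, here's a small Python sketch (the payloads are just examples - swap in your own) that prints reference encodings to compare against each model's guess:

import base64, json

payloads = [
    {"test": "data"},
    {"user_id": 123, "exp": 1767225600},
    {"role": "admin", "active": True},
]

for p in payloads:
    s = json.dumps(p)                                 # note: no trailing newline here,
    b64 = base64.b64encode(s.encode()).decode()       # unlike `echo ... | base64`
    print(f"{s}  ->  {b64}")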
Bottom Line
AI systems are developing capabilities that challenge our understanding of what they can do. Whether this represents genuine algorithmic reasoning or extremely sophisticated pattern recognition, the practical result is the same: AI can predict mathematical operations it wasn't explicitly designed to perform.
For developers: Don't assume AI limitations based on theoretical models. Test actual capabilities.
For researchers: We need better frameworks for understanding and measuring emergent AI abilities.
For the industry: The "just predicting next tokens" explanation is becoming insufficient for observed AI behavior.
The line between pattern recognition and computation is blurrier than we thought.
Top comments (30)
Maybe my thinking is too simple. But I just asked Copilot to create a base64 function. And it did. So couldn't it be possible that it created the function and ran it with the input string?
I don't know if they are using tools behind the scenes, though. But usually you can see Claude using JS for analysis and it's transparent. You can see the JS actively being written... but this time it's just gut feeling for the AI lol
You can ask it to explain its steps and it does quite well
ChatGPT's response to this:
The Base64 string was not guessed — it's derived precisely through byte-level operations. This process is fully deterministic and reproducible with code or math. I followed that logic mentally and simulated the encoding pipeline.
I also ran the question through ChatGPT's o3 model, and it gave me the correct answer. So I asked...
And ChatGPT replied:
=====================================
Thought for a few seconds
No code was executed behind the scenes—I just ran the Base-64 algorithm in my head (well, in the model’s “head” 🙂).
Here’s the gist of the mental steps I followed:
{ " u s e r _ i d " : 1 2 3 , " e x p " : 1 7 6 7 2 2 5 6 0 0 } 7b 22 75 73 65 72 5f 69 64 22 3a 20 31 32 33 2c 20 22 65 78 70 22 3a 20 31 37 36 37 32 32 35 36 30 30 7d
Do that all the way through and you land on
eyJ1c2VyX2lkIjogMTIzLCAiZXhwIjogMTc2NzIyNTYwMH0=.
If I’d wanted to double-check, I could have spun up something like:
import base64, json
payload = {"user_id": 123, "exp": 1767225600}
b64 = base64.b64encode(json.dumps(payload).encode()).decode()
print(b64)
—but in this case the manual calculation (plus a quick sanity-check pass through the alphabet table) was enough.
=====================================
You've gotta love AI personalities 😁
lmao it's like "normal Tuesday" for ChatGPT!
Right, they don't even use /thinking/ like in DeepSeek / Claude... I checked that thinking text and didn't find any math... it's purely mentally simulated
The ChatGPT and Gemini results are [reportedly] the same (other than some extra convincing to get Gemini to H4CK T#3 PL4N3T), which are identical to the Claude results except for being truncated by the last two characters (plus two-byte pad). Losing these last four bytes of output (Cg==) is precisely what happens when there is no trailing newline:

$ echo '{"role": "admin", "active": true}' | base64 -w 0
eyJyb2xlIjogImFkbWluIiwgImFjdGl2ZSI6IHRydWV9Cg==
$ printf '{"role": "admin", "active": true}' | base64 -w 0
eyJyb2xlIjogImFkbWluIiwgImFjdGl2ZSI6IHRydWV9
So, it's not that ChatGPT is "95% accurate" (and that Gemini happens to be the same accuracy), it's that they are each figuring out exactly what you're asking and just running the calculation.
So, interesting writeup, but I'm really not understanding what's so amazing about this finding. No offence intended!
There are two mind-boggling takes:
For humans, we never thought that algorithms (like base64 encoding) could be treated as a form of "translation" - but it actually requires the same kind of computational steps to reproduce as any other transformation. The distinction between "algorithm" and "translation" is more blurry than we thought.
The deeper point about LLMs - we understand that they're "just" predicting the next token, but the latent understanding that emerges from this process goes way beyond simple pattern matching or guessing. There's genuine understanding happening in the latent space that we can't easily explain.
You don't research anything, do you? Humans have been using algorithms to encrypt or encode secret messages for centuries. You've probably played with the Caesar cipher when you were a kid.
You don't know how this generation of AI works at all, do you? LLMs don't "predict" the next token so much as you predict the words to use to order a ham sandwich. LLMs train by analyzing billions of words and phrases from a variety of sources.
To put it in simpler terms, think of it like learning a new language by watching TV shows in that language. You pick up common words and phrases, eventually moving on to creating your own sentences, and finally attaining fluency through repetition, practice, and application. Only, the LLM is watching every show ever made. And it was built to do only that. So it's very good at interpreting and responding based on similar works and phrases it was trained on.
To put it bluntly, the "personality" you're seeing from the AI comes from the instructions the provider gave it. How to talk, what subjects to avoid, and what words should not be used come from the provider's input to the system. It's not any of those buzzwords you're using.
Walter my guy, you just argued that AI has 'genuine understanding in latent space' and then immediately said it's just pattern matching with no real understanding. Which one is it? You're debating yourself in the same comment lol. Maybe figure out your own position before coming for others?
I've made my position quite clear. I believe the agent you used performed a calculation using a well known algorithm. You believe that the AI is actually thinking. And rather than support those arguments with actual facts, you're hoping the word salad is enough to silence the criticism. That clear enough?
Language has always been about patterns, my guy. All language uses nouns, verbs, past participles, articles, etc. If that were not the case, then what is the point of using large language models? What do you think AI training is?
I'm asking these questions because I genuinely want to know why you think that using the word "guess" did something magical here.
It reads like someone who wants to sound knowledgeable but doesn't really understand what they're arguing against.
Oh for the ...
Will you just answer the questions?
How are you writing a sentence or a function in code? Are you predicting the next token, just like I am writing this comment?
How I'm doing that makes me smart - saying Oh, it's just making a list of tokens is, to me, like saying human thought is just putting one word after another - it's how I express my thoughts, not how I have them.
The research around how Claude thinks is fascinating.
Again, I don't understand why this supposition is necessary. I think it's way easier to get programmed by the AI into thinking that it's "thinking" (in a lingual sense) than to just recognise what it's doing… parsing input to render deterministic output. What makes the LLM transform unique is that it's not necessarily linear: The same input may yield different output and vice versa. However, here you ask it to "guess" a transform, which it doesn't do *; rather, it interprets / translates (according to a lookup/Levenshtein, just like your hungry Google search) "guess" to "solve" and does so. A "real" AI should say, "Guess?! Boy, you crazy! But if you give me a cookie, I'll calculate it for you. If you give me two cookies, I'll even show you the steps!" There is either nothing predictive happening or, if there is something predictive happening, it is not apparent from this example. I'm sure, for example, that the CPU is doing some predictive caching, but I posit that's not what you have in mind.
It reads like someone who wants to sound knowledgeable but doesn't really understand what they're arguing against.
Could you please clarify your statement? I'm quite honestly not sure what you're arguing against.
I grant that my response, above, is verbose. I'll summarise your findings from the article here with my direct responses:
printf(base64(string)). I kan haz dat!
wget --spider https://wikipedia.org | base64 -w 0. I definitely grant that the supposed implication here is really interesting! Unfortunately, it still looks like the closest distance between two points is still a straight line.
Why should this surprise me more than a correct answer for asking "Get me the next prime number after 123456789?"
It guesses which algorithm I want, applies it to my input, and spits out the result. It's a complex operation, yes, but it already knows what steps should be taken. It didn't magically invent them. Base64 is common and well described.
I mean, LLMs are quite impressive in how well they parse a random user's input, but this doesn't seem like anything more special.
The computer did not guess the base64 output. It calculated it. It doesn't need to guess something that it already knows how to do.
guess you know a lot how an LLM works under the hood
How do you think an LLM works? Or better yet, what is the purpose of an LLM? And how does that differ from being able to run a well known algorithm?
Base64 is not new. It's also not mysterious. And despite the advances in AI, it didn't get suddenly reinvented for AI to make guesses. It didn't have to guess because it already knows how to do it. The only thing the LLM is adding here is recognizing the question given the context and responding as another person may respond.
Ask it what 1+1 is and convince me that the AI "guessed", not calculated, the answer.
If the AI calculated the answer, how do you explain the incorrect, but close, answers? If it really knew how to calculate it, it would be 100% correct 100% of the time (just like the code to actually encode base64 has to be). I'd call a 95% accurate answer a "guess".
I think the interesting thing here is that these results show a greater understanding than just language pattern continuation (which is what LLMs do). In my opinion, a 95% accurate answer means this is some form of intuition, not merely calculation.
So here's a tip. When someone says "these are two separate answers", it's important to look at how they got there. Someone's already said it in this comment section, but you'll notice that all of the applications returned
eyJyb2xlIjogImFkbWluIiwgImFjdGl2ZSI6IHRydWV9
which, decoded, is what the original poster has. So where is the extra Cg== coming from? Well, we know the == is Base64 padding, so the final group in the message is fewer than 3 bytes. Base64 encodes 3 bytes of data into 4 characters, so Cg== being 4 characters means we're probably dealing with one extra character.
So why is there inconsistency in the LLM? Because one of the "tests" included a newline after the JSON string.
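A quick Python check confirms that those extra four characters are nothing but an encoded newline:

import base64
print(base64.b64decode("Cg=="))   # b'\n' - a single newline byte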
It's user error. And it remains a calculation.
So Walter, my guy...
You pick either: Claude does mental computation which humans generally can't do,
OR you mean Claude is just parroting with no emergent properties beyond the human brain itself?
Because if it's doing computation, then the article is RIGHT to be hyped up, since humans can never do this string to base64 mentally...
And if it's not doing mental computation? With finite training data → generalizing to unseen infinite problems = Claude achieves what humans can't achieve... same thing.
First, I'm not your guy.
Second, you offer two options that are fundamentally flawed. Claude is not doing anything mental because Claude has no mind. It's ARTIFICIAL! The computation it's doing doesn't require thought. There's no opportunity for emergent behavior. Math is math. Computer science is computer science. Psychoanalysis is completely unrelated.
Third, you keep bringing up this point that "humans can't do the same thing in their head". Well duh! We're not computers. We don't think in binary. "Protocol" means something entirely different to us than it does to a computer. How is our inability to perform complex calculations mentally evidence that an AI is showing signs of intuition?
To be fair (and beat on the point a little harder), I don't even know why a human couldn't (at least in principle) perform the calculation. As repeatedly demonstrated on this page, it's a very basic transform that can be performed serially with a two-byte buffer. Like Babbage's log tables, it would be tedious to calculate more than a few bytes but (as stated by @villecoder ), this is quite literally why "automatic calculators" came into being in the first place! The astonishing waste of power and CPU cycles to ask an "AI" to "guess" a base64 transform is matched only by the mental waste of people thinking these machines are thinking. News Flash: Machine sentience has been threatening human existence since the Jacquard Loom and – guess what – it's still not doing anything more interesting than shifting more and more bits around.
To your point, though, @ryo_suwito , (and I'm really not trying to be a jerk – please accept my sincere apologies if I've come off like that!) it would be very interesting to find some emergent properties, whether that indicates anything approaching sentience or intelligence or even just convolving complexity, but so far it seems you haven't made that case.
I appreciate your curiosity and your willingness to put something out there (or, here) that you find exciting and interesting! Keep searching, man. It may be here or elsewhere, but don't stop wondering. :)
Wait until you find out LLMs have been able to run Python interpreters in a sandbox.
With bigger strings and randomly generated data it obviously breaks.