Recently (on 1st April), @francistrdev posted a riddle on dev.to that presented a unique challenge: a multi-line poem obscured by Caesar shifts.
However, as many soon discovered, this wasn't a standard single-key cipher. It was a "Mixed-Shift" puzzle, where different lines utilized different rotation values.
In this article, I document the iterative journey of building an automated LLM-based Caesar cipher solver, the failures we encountered (I was pair-planning with an AI, Gemini, hence "we"), and the final "Contextual Consensus" architecture that achieved 100% accuracy.
I named this approach/project DecipherLM.
The Core Problem
The Riddle -
Traditional frequency analysis (looking for 'E' and 'T') works well for long texts but fails on short, poetic lines. We turned to LLM Perplexity Scoring.
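To see concretely why frequency analysis works at all, here is a minimal sketch (not from the original post) of the classic chi-squared approach: compare each shifted candidate's letter histogram against standard English letter frequencies and keep the closest match. The `ENG_FREQ` table and helper names are my own.

```python
from collections import Counter

# Approximate English letter frequencies (percent), a standard reference table.
ENG_FREQ = {
    'e': 12.7, 't': 9.1, 'a': 8.2, 'o': 7.5, 'i': 7.0, 'n': 6.7,
    's': 6.3, 'h': 6.1, 'r': 6.0, 'd': 4.3, 'l': 4.0, 'c': 2.8,
    'u': 2.8, 'm': 2.4, 'w': 2.4, 'f': 2.2, 'g': 2.0, 'y': 2.0,
    'p': 1.9, 'b': 1.5, 'v': 1.0, 'k': 0.8, 'j': 0.2, 'x': 0.2,
    'q': 0.1, 'z': 0.1,
}

def caesar_shift(text: str, shift: int) -> str:
    # Rotate only alphabetic characters; leave punctuation untouched.
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord('A') if ch.isupper() else ord('a')
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return ''.join(out)

def chi_squared_score(text: str) -> float:
    # Lower score = letter distribution closer to typical English.
    letters = [c for c in text.lower() if c.isalpha()]
    counts = Counter(letters)
    n = len(letters) or 1
    return sum(
        (counts.get(c, 0) - n * f / 100) ** 2 / (n * f / 100)
        for c, f in ENG_FREQ.items()
    )

def best_shift_by_frequency(ciphertext: str) -> int:
    # Try all 25 rotations and keep the most English-looking one.
    return min(range(1, 26),
               key=lambda s: chi_squared_score(caesar_shift(ciphertext, s)))
```

On a long passage this recovers the key reliably; on a four-word poetic line, the histogram is too sparse for the statistics to mean anything, which is exactly the failure mode that motivated perplexity scoring.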
The Logic
An LLM is trained on trillions of tokens of natural language. If we shift a line 25 times, the version that "looks" most like English will have the lowest Perplexity (PPL)—a mathematical measurement of how "surprised" a model is by a sequence of text.
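As a toy illustration (numbers invented, not from the original post): perplexity is just the exponential of the mean negative log-likelihood over a sequence's tokens, so fluent English with low per-token losses scores far below gibberish.

```python
import math

def perplexity(token_nlls):
    # PPL = exp(mean negative log-likelihood over the tokens).
    return math.exp(sum(token_nlls) / len(token_nlls))

# Hypothetical per-token losses for a fluent line vs. a gibberish shift.
english_like = [1.2, 0.8, 1.5, 0.9]
gibberish_like = [6.1, 7.3, 5.8, 6.9]

assert perplexity(english_like) < perplexity(gibberish_like)
```

A model that assigns every token probability 1 (loss 0) would have the minimum possible perplexity of exactly 1.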
Phase 1: The "Global Master Key" Failure
Our first approach assumed the entire poem used one shift. We calculated the average perplexity for the whole block across all 25 shifts.
Result: Failure
Observation
Because lines 2, 3, and 6 used Shift +17 while the rest used Shift +9, the "correct" shift for the majority was being "poisoned" by the high perplexity of the minority lines. The model couldn't find a single key that made the whole text coherent.
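The poisoning effect can be demonstrated with invented numbers. Suppose five lines, two of which secretly use shift +17: averaging across the block still elects the majority key, but that single key can never decode the minority lines (all perplexity values below are hypothetical):

```python
# Hypothetical perplexity scores: five lines, columns = candidate shifts.
ppl = [
    {9: 22.0, 17: 950.0},   # truly shift +9
    {9: 18.0, 17: 880.0},   # truly shift +9
    {9: 20.0, 17: 900.0},   # truly shift +9
    {9: 910.0, 17: 25.0},   # truly shift +17
    {9: 940.0, 17: 21.0},   # truly shift +17
]

# Phase 1 logic: average perplexity per shift across the whole block.
avg = {s: sum(row[s] for row in ppl) / len(ppl) for s in (9, 17)}
global_key = min(avg, key=avg.get)

# The global key matches the majority...
assert global_key == 9
# ...but the minority lines prefer a different shift, so no single
# master key makes the whole poem coherent.
per_line = [min(row, key=row.get) for row in ppl]
assert per_line == [9, 9, 9, 17, 17]
```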
Phase 2: The "Line-by-Line" Noise Trap
We pivoted to scoring every line individually: if a line looks best at +17, decrypt it at +17.
Result: Partial Success
The Model Crisis
- SmolLM2-135M: Performed surprisingly well but made "hallucinated" guesses on short lines (e.g., choosing Shift +14 for a 4-word line).
- GPT-2: Failed significantly due to an outdated tokenizer that couldn't handle the "character-level" noise of a cipher.
- Large Models (360M+): Often performed worse. They were "too sensitive"—a single unusual word choice in the poem would cause a perplexity spike, leading the model to prefer a gibberish shift that happened to have a "smoother" token distribution.
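The noise trap is easy to see with invented numbers: on a very short line, the perplexities of many candidate shifts land within a few points of each other, so a plain argmin can latch onto a junk shift like +14 (scores below are hypothetical):

```python
# Hypothetical perplexities for a 4-word line: the scores are so close
# that plain argmin picks a "random noise" shift over the true +9.
short_line_ppl = {9: 41.0, 14: 38.5, 17: 43.0}

noisy_winner = min(short_line_ppl, key=short_line_ppl.get)
assert noisy_winner == 14  # a gibberish shift wins by a hair
```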
Phase 3: The "Consensus & Mode" Breakthrough
I reasoned that for this riddle, the shifts aren't random. They cluster into groups, and each group follows a pattern (the mode, i.e., the most frequent value, of the series of individual best shifts).
This was my prompt -
Trusted Pool Strategy
- Identify the Global Master Key (best shift for the whole block)
- Identify the Local Mode (most frequent best-shift across individual lines)
The Constraint
Force every line to choose only from this Trusted Pool (e.g., [9, 17]).
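Building the Trusted Pool reduces to a mode computation over the per-line winners. Here is a small sketch under my own helper names (`build_trusted_pool` is illustrative, not the article's exact function), using hypothetical per-line results:

```python
from collections import Counter

def build_trusted_pool(master_key, line_best_shifts, top_n=2):
    # Local Mode: the most frequent per-line winning shifts,
    # merged with the Global Master Key and deduplicated.
    common = [s for s, _ in Counter(line_best_shifts).most_common(top_n)]
    return sorted(set([master_key] + common))

# Hypothetical per-line winners, including two hallucinated noise shifts.
line_winners = [9, 17, 17, 9, 14, 9, 17, 9, 4, 9, 9, 17, 9]
pool = build_trusted_pool(master_key=9, line_best_shifts=line_winners)
assert pool == [9, 17]  # the +14 and +4 noise never makes the pool
```

Every line is then forced to pick its shift from `pool` only, which is what eliminates the stray +14 and +4 guesses.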
But the 135M Hugging Face model still showed no improvement, so I began to suspect the model itself was the problem. I wanted to stay with SLMs (Small Language Models) rather than LLMs, since I can't run large models locally, which left me choosing within a narrower field of candidates. I decided to go with Qwen2.5-0.5B.
With Qwen2.5-0.5B, a model with superior character-level awareness, the approach finally clicked.
I was able to eliminate the "random noise" shifts (+14, +4, etc.), as the model was no longer allowed to pick them.
The Output -
✅ Trusted Candidate Shifts: [9, 17]
==============================
🏆 FINAL DECIPHERED TEXT
==============================
Eager clicks often spoil the claim.
Vows are made I refuse to break,
Games of trust begin again.
Once you choose to follow through,
I umuwzg pcua i aqtdmz axwwv.
Gentle words, familiar flow,
Even fewer suspect as much.
Yield to answers, don’t give up--
Old replies fill the cup.
Understand what led you here,
Unless... you already hear it.
Play it.
What am I hearing?
Phase 4: Final Polish — "Contextual History"
Even with a Trusted Pool, one line:
"A memory hums a silver spoon"
was still being misidentified by the Qwen2.5-0.5B model.
Contextual Scoring

Instead of scoring a line in isolation, we appended it to the previous two decrypted lines:
```python
# The Secret Sauce
scoring_text = (history + "\n" + candidate_line).strip()
ppl = score_with_llm(scoring_text)
```
Feeding the model the "history" of the poem finally let it "understand" the narrative flow.
It recognized that:
"A memory hums..."
followed the previous lines perfectly at Shift +9, even when the raw perplexity scores were close.
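The rolling history itself is a tiny piece of string bookkeeping: after each line is decided, append it and keep only the last two lines as context. A self-contained sketch of that update:

```python
def update_history(history: str, new_line: str, window: int = 2) -> str:
    # Append the newly decrypted line, then keep only the last
    # `window` lines as rolling context for the next score.
    lines = (history + "\n" + new_line).strip().split("\n")
    return "\n".join(lines[-window:])

h = ""
h = update_history(h, "Once you choose to follow through,")
h = update_history(h, "A memory hums a silver spoon.")
h = update_history(h, "Gentle words, familiar flow,")
# Only the two most recent lines survive in the buffer.
assert h == "A memory hums a silver spoon.\nGentle words, familiar flow,"
```

Capping the buffer at two lines keeps the scoring prompt short (fast, cheap) while still giving the model enough narrative context to break ties.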
Comparison of Models
| Model | Size | Verdict |
|---|---|---|
| GPT-2 | 124M | Poor. Tokenizer is too old; struggles with cipher fragments. |
| SmolLM2-135M | 135M | Good. Great "Goldilocks" model for simple tasks, but prone to noise. |
| Qwen2.5-0.5B | 500M | Excellent. The winner. High precision and modern tokenization. |
| SmolLM2-360M | 360M | Mediocre. Surprisingly overthinks the noise in short sentences. |
Bigger isn't always better. We compared four different architectures to determine which model handled character-level cryptographic noise most efficiently. The chart below plots parameter size against the qualitative verdict.
Why Qwen2.5-0.5B Won
- Character Awareness: Superior handling of ciphered text fragments compared to BPE-heavy models.
- Modern Tokenization: Avoids the "hallucination" traps seen in smaller SmolLM or older GPT-2 variants.
- Efficiency: Sub-5s decryption on consumer GPUs with 16-bit precision.
Remarks
Solving FrancisTRDEV's riddle was never about "having a GPU". It was about understanding that Context is King.
By moving from raw statistical scoring to a "Consensus + Context" architecture, we transformed a noisy LLM into a precise cryptanalytic tool.
Final Version of DecipherLM Architecture: A Triple-Stage Pipeline
Global Analysis Module (The Macro Lens): The first stage performs a broad sweep of the entire ciphertext. By calculating the Perplexity (PPL) of the full block across all 25 shifts, it identifies the Global Master Key—the most dominant shift that makes the most sense at scale.
Consensus Engine (The Statistical Filter): Instead of trusting a single key, this engine performs a "Line-by-Line" vote. It identifies the Mode (the most frequent value) of shifts across individual lines to create a "Trusted Pool" (e.g., shifts +9 and +17). This critically filters out "linguistic noise"—those random, hallucinated shifts that might look like English in isolation but are statistically irrelevant to the whole poem.
Context-Aware Precision Stage (The Micro Lens): The final and most advanced layer uses the Qwen2.5-0.5B model with a Rolling Context Buffer. Each line is scored not in a vacuum, but alongside the context of the previous two decrypted lines. This ensures narrative and semantic continuity, allowing the model to perfectly resolve short, ambiguous lines that simple frequency analysis would miss.
This architecture moves away from "guessing" and toward a robust, multi-layered validation system that treats cryptography as a linguistic probability problem.
The Full & Final Code
First install dependencies -
```shell
uv add torch python-dotenv transformers accelerate
```
Now write code -
```python
import torch
import os
from collections import Counter
from transformers import AutoTokenizer, AutoModelForCausalLM
from dotenv import load_dotenv
from huggingface_hub import login

load_dotenv()
hf_token = os.getenv("HF_TOKEN")
if hf_token:
    login(token=hf_token)

# 1. Setup
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
DTYPE = torch.float16 if DEVICE.type == "cuda" else torch.float32
MODEL_NAME = "Qwen/Qwen2.5-0.5B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, dtype=DTYPE, trust_remote_code=True, device_map=DEVICE.type
)
model.eval()


def caesar_shift(text: str, shift: int) -> str:
    res = []
    for ch in text:
        if ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            res.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            res.append(ch)
    return "".join(res)


def score_with_llm(text: str) -> float:
    tokens = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    tokens = {k: v.to(DEVICE) for k, v in tokens.items()}
    with torch.no_grad():
        loss = model(**tokens, labels=tokens["input_ids"]).loss
    return torch.exp(loss).item()


def solve_with_consensus(ciphertext: str):
    lines = [line.strip() for line in ciphertext.split("\n") if line.strip()]

    # STEP 1: Global Master Key
    print("🌍 Step 1: Calculating Global Master Key...")
    global_scores = []
    for s in range(1, 26):
        global_scores.append((s, score_with_llm(caesar_shift(ciphertext, s))))
    master_key = min(global_scores, key=lambda x: x[1])[0]

    # STEP 2: Potential Secondary Shift
    print("🔍 Step 2: Identifying Potential Secondary Shift...")
    line_best_shifts = []
    for line in lines:
        best_s = min(
            [(s, score_with_llm(caesar_shift(line, s))) for s in range(1, 26)],
            key=lambda x: x[1],
        )[0]
        line_best_shifts.append(best_s)
    counts = Counter(line_best_shifts).most_common(2)
    primary_candidate = counts[0][0]
    secondary_candidate = counts[1][0] if len(counts) > 1 else master_key
    trusted_shifts = list(set([master_key, primary_candidate, secondary_candidate]))
    print(f"✅ Trusted Candidate Shifts: {trusted_shifts}")

    # STEP 3: Final Decryption with CONTEXT
    print("✍️ Step 3: Decrypting with contextual history...")
    final_lines = []
    history = ""  # We will store previous decrypted lines here
    for line in lines:
        scores = []
        for s in trusted_shifts:
            candidate = caesar_shift(line, s)
            # We score the candidate line WITH the previous 2 lines of context.
            # This prevents the model from choosing gibberish for short lines.
            scoring_text = (history + "\n" + candidate).strip()
            scores.append((s, score_with_llm(scoring_text), candidate))
        winner_shift, _, winner_text = min(scores, key=lambda x: x[1])
        final_lines.append(winner_text)
        # Update history (keep only the last 2 lines to save memory/tokens)
        history_lines = (history + "\n" + winner_text).strip().split("\n")
        history = "\n".join(history_lines[-2:])

    return final_lines, master_key


# Deciphering in Action
CIPHERTEXT = """Vrxvi tcztbj fwkve jgfzc kyv tcrzd.
Exfb jan vjmn R anodbn cx kanjt,
Pjvnb xo cadbc knprw jpjrw.
Fetv pfl tyffjv kf wfccfn kyiflxy,
R dvdfip yldj r jzcmvi jgffe.
Pnwcun fxamb, ojvrurja ouxf,
Vmve wvnvi jljgvtk rj dlty.
Pzvcu kf rejnvij, ufe’k xzmv lg--
Fcu ivgczvj wzcc kyv tlg.
Leuvijkreu nyrk cvu pfl yviv,
Lecvjj... pfl rcivrup yvri zk.
Gcrp zk.
Nyrk rd Z yvrizex?"""

decoded_output, m_key = solve_with_consensus(CIPHERTEXT)

print("\n" + "=" * 30)
print("🏆 FINAL DECIPHERED TEXT")
print("=" * 30 + "\n")
print("\n".join(decoded_output))
```
The console output -
Note: Environment variable`HF_TOKEN` is set and is the current active token independently from the token you've just configured.
Loading weights: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 290/290 [00:00<00:00, 502.36it/s]
🌍 Step 1: Calculating Global Master Key...
🔍 Step 2: Identifying Potential Secondary Shift...
✅ Trusted Candidate Shifts: [9, 17]
✍ Step 3: Decrypting with contextual history...
==============================
🏆 FINAL DECIPHERED TEXT
==============================
Eager clicks often spoil the claim.
Vows are made I refuse to break,
Games of trust begin again.
Once you choose to follow through,
A memory hums a silver spoon.
Gentle words, familiar flow,
Even fewer suspect as much.
Yield to answers, don’t give up--
Old replies fill the cup.
Understand what led you here,
Unless... you already hear it.
Play it.
What am I hearing?
Final Decryption Key
- Most lines: Shift +9
- Outlier lines: Shift +17
Final Verdict
The riddle is solved. DecipherLM wins.
Concluding
So, yes, ... that's a wrap!

Feel free to connect with me. :)
Thanks for reading! 🙏🏻 Written with 💚 by Debajyati Dey
Follow me on Dev...
Happy coding 🧑🏽💻👩🏽💻! Have a nice day ahead! 🚀