Debajyati Dey

A Case Study in Solving the Riddle of FrancisTRDEV

On 1st April, @francistrdev posted a riddle on dev.to that presented a unique challenge: a multi-line poem obscured by Caesar shifts.


However, as many soon discovered, this wasn't a standard single-key cipher. It was a "Mixed-Shift" puzzle, where different lines utilized different rotation values.

In this article, I document the iterative journey of building an automated LLM-based Caesar cipher solver, the failures we encountered (I pair-planned with an AI, Gemini, hence "we"), and the final "Contextual Consensus" architecture that achieved 100% accuracy.

I named this approach (and project) DecipherLM.

The Core Problem

The Riddle -

Not all rewards glitter the same,
Vrxvi tcztbj fwkve jgfzc kyv tcrzd.
Exfb jan vjmn R anodbn cx kanjt,
Every promise real—no fake.
Remaining patient tells you when,
Pjvnb xo cadbc knprw jpjrw.
Fetv pfl tyffjv kf wfccfn kyiflxy,
Nothing’s lost—except maybe you.
Numbers, years, a felt‑known tune,
R dvdfip yldj r jzcmvi jgffe.
Pnwcun fxamb, ojvrurja ouxf,
If it’s wrong… why does it glow?
Very few walk away untouched,
Vmve wvnvi jljgvtk rj dlty.
Pzvcu kf rejnvij, ufe’k xzmv lg--
Fcu ivgczvj wzcc kyv tlg.
Leuvijkreu nyrk cvu pfl yviv,
Lecvjj... pfl rcivrup yvri zk.
Gcrp zk.
Nyrk rd Z yvrizex?

Traditional frequency analysis (looking for 'E' and 'T') works well for long texts but fails on short, poetic lines. We turned to LLM Perplexity Scoring.

Core Problem and Proposed Solution

The Logic

An LLM is trained on trillions of tokens of natural language. If we shift a line 25 times, the version that "looks" most like English will have the lowest Perplexity (PPL)—a mathematical measurement of how "surprised" a model is by a sequence of text.
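To make that concrete, here is a minimal sketch of the perplexity computation itself, assuming we already have the model's per-token log-probabilities. The probability values below are made up purely for illustration:

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    # Perplexity is the exponentiated average negative log-likelihood:
    # lower PPL means the model found the text less "surprising".
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)

# Toy numbers: a fluent English candidate gets higher per-token
# probabilities than a gibberish shift of the same line.
fluent = [math.log(p) for p in (0.20, 0.15, 0.25)]
gibberish = [math.log(p) for p in (0.01, 0.02, 0.01)]

assert perplexity(fluent) < perplexity(gibberish)
```

The decryption strategy then reduces to: shift, score, and keep the candidate with the lowest PPL.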


Phase 1: The "Global Master Key" Failure

Our first approach assumed the entire poem used one shift. We calculated the average perplexity for the whole block across all 25 shifts.

Result: Failure

Observation

Because lines 2, 3, and 6 used Shift +17 while the rest used Shift +9, the "correct" shift for the majority was being "poisoned" by the high perplexity of the minority lines. The model couldn't find a single key that made the whole text coherent.
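A toy numeric sketch of this "poisoning" effect. All perplexity values below are hypothetical, just to show the arithmetic:

```python
# Hypothetical per-line perplexities when the WHOLE block is scored
# under a single candidate shift. Lines enciphered with the other key
# stay gibberish and inflate the average.
ppl_at_shift_9 = [12.0, 950.0, 880.0, 15.0, 14.0, 900.0]   # lines 2, 3, 6 still noise
ppl_at_shift_17 = [940.0, 11.0, 13.0, 910.0, 905.0, 12.0]  # majority lines now noise

avg_9 = sum(ppl_at_shift_9) / len(ppl_at_shift_9)
avg_17 = sum(ppl_at_shift_17) / len(ppl_at_shift_17)

# Neither average is anywhere near fluent-English territory,
# so the global sweep cannot single out a master key.
assert min(avg_9, avg_17) > 100
```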

1st Naive Approach

Phase 2: The "Line-by-Line" Noise Trap

We pivoted to scoring every line individually. If a line looks best at +17, decrypt it at +17.
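A sketch of that per-line sweep. The `toy_score` function here is a hypothetical stand-in for the LLM perplexity scorer (it just counts recognized words) so the example stays self-contained; `caesar_shift` mirrors the helper used in the final code:

```python
def caesar_shift(text: str, shift: int) -> str:
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

# Hypothetical scorer standing in for the LLM: more known words = lower score.
KNOWN = {"eager", "clicks", "often", "spoil", "the", "claim"}

def toy_score(text: str) -> float:
    words = text.lower().replace(".", "").split()
    return -sum(w in KNOWN for w in words)

def best_shift(line: str, score) -> int:
    # Try all 25 rotations and keep the one the scorer likes most.
    return min(range(1, 26), key=lambda s: score(caesar_shift(line, s)))

print(best_shift("Vrxvi tcztbj fwkve jgfzc kyv tcrzd.", toy_score))  # → 9
```

With the real scorer, `toy_score` is replaced by `score_with_llm`, and each line independently votes for its own best shift.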

Result: Partial Success

2nd Naive Approach

The Model Crisis

  • SmolLM2-135M: Performed surprisingly well but made "hallucinated" guesses on short lines (e.g., choosing Shift +14 for a 4-word line).
  • GPT-2: Failed significantly due to an outdated tokenizer that couldn't handle the "character-level" noise of a cipher.
  • Large Models (360M+): Often performed worse. They were "too sensitive"—a single unusual word choice in the poem would cause a perplexity spike, leading the model to prefer a gibberish shift that happened to have a "smoother" token distribution.

Phase 3: The "Consensus & Mode" Breakthrough

I reasoned that, for this riddle, the shifts aren't random. They cluster into groups, and each group follows a pattern: the Mode (the most frequent value) of the series of individual line shifts.

This was my prompt -

Gemini Chat

Trusted Pool Strategy

  • Identify the Global Master Key (best shift for the whole block)
  • Identify the Local Mode (most frequent best-shift across individual lines)

Candidate Pool = { s | s ∈ {Global Minimum} ∪ Mode(Line Shifts) }
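This candidate pool can be computed with `collections.Counter`, much as the final pipeline does. The `line_shifts` and `global_key` values below are hypothetical:

```python
from collections import Counter

# Hypothetical per-line winners from the line-by-line sweep,
# including one noisy outlier (+14) on a short line.
line_shifts = [9, 17, 17, 9, 14, 9, 17, 9]
global_key = 9  # hypothetical whole-block winner from the global sweep

# Union of the global key with the two most frequent line-level shifts.
top_modes = [s for s, _ in Counter(line_shifts).most_common(2)]
trusted_pool = sorted({global_key, *top_modes})

print(trusted_pool)  # → [9, 17]; the noisy +14 never makes the pool
```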

3rd Improved Intelligent Approach - Finding Patterns - Trusted Pool Strategy

The Constraint

Force every line to choose only from this Trusted Pool (e.g., [9, 17]).

But the 135M Hugging Face model was not showing any improvement, so I suspected the model itself was the problem. I wanted to stick with SLMs (Small Language Models), since I can't run large LLMs locally, so I explored the narrower space of small models and settled on Qwen2.5-0.5B.

With Qwen2.5-0.5B—a model with superior character-level awareness—the approach finally clicked.

I was able to eliminate the "random noise" shifts (+14, +4, etc.), as the model was no longer allowed to pick them.

The Output -

✅ Trusted Candidate Shifts: [9, 17]

==============================
🏆 FINAL DECIPHERED TEXT
==============================

Eager clicks often spoil the claim.
Vows are made I refuse to break,
Games of trust begin again.
Once you choose to follow through,
I umuwzg pcua i aqtdmz axwwv.
Gentle words, familiar flow,
Even fewer suspect as much.
Yield to answers, don’t give up--
Old replies fill the cup.
Understand what led you here,
Unless... you already hear it.
Play it.
What am I hearing?

Phase 4: Final Polish — "Contextual History"

Even with a Trusted Pool, one line:

"A memory hums a silver spoon"

was still being misidentified by the Qwen2.5-0.5B model.

Contextual Scoring

Final Optimal Approach - Contextual Knowledge
Instead of scoring a line in isolation, we appended it to the previous two decrypted lines:

# The Secret Sauce
scoring_text = (history + "\n" + candidate_line).strip()
ppl = score_with_llm(scoring_text)

Once we fed the model the "history" of the poem, it finally "understood" the narrative flow.

It recognized that:

"A memory hums..."

followed the previous lines perfectly at Shift +9, even if the mathematical perplexity was close.
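The rolling history buffer is just a two-line sliding window. A self-contained sketch (the `update_history` helper is my own naming, mirroring the history update in the final pipeline):

```python
def update_history(history: str, decrypted_line: str, window: int = 2) -> str:
    # Append the newly decrypted line, then keep only the last
    # `window` lines as rolling context for the next score.
    lines = (history + "\n" + decrypted_line).strip().split("\n")
    return "\n".join(lines[-window:])

h = ""
for line in ["Numbers, years, a felt-known tune,",
             "A memory hums a silver spoon.",
             "Gentle words, familiar flow,"]:
    h = update_history(h, line)

# The buffer never grows beyond two lines.
assert h.count("\n") == 1
```

Keeping the window small bounds both the token count per scoring call and the risk of one early mis-decryption contaminating the rest of the poem.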


Comparison of Models

Model          Size   Verdict
GPT-2          124M   Poor. Tokenizer is too old; struggles with cipher fragments.
SmolLM2-135M   135M   Good. Great "Goldilocks" model for simple tasks, but prone to noise.
Qwen2.5-0.5B   500M   Excellent. The winner. High precision and modern tokenization.
SmolLM2-360M   360M   Mediocre. Surprisingly overthinks the noise in short sentences.

Bigger isn't always better. We compared four different architectures to determine which model handled character-level cryptographic noise most efficiently. The chart below plots parameter size against a qualitative verdict.

Model Comparison Chart

Why Qwen2.5 0.5B Won

  • Character Awareness:

    Superior handling of ciphered text fragments compared to BPE-heavy models.

  • Modern Tokenization:

    Avoids the "hallucination" traps seen in smaller SmolLM2 or older GPT-2 variants.

  • Efficiency:

    Sub-5s decryption on consumer GPUs with 16-bit precision.


Remarks

Solving FrancisTRDEV's riddle was never about "having a GPU". It was about understanding that Context is King.

By moving from raw statistical scoring to a "Consensus + Context" architecture, we transformed a noisy LLM into a precise cryptanalytic tool.


Final Version of DecipherLM Architecture: A Triple-Stage Pipeline

  1. Global Analysis Module (The Macro Lens): The first stage performs a broad sweep of the entire ciphertext. By calculating the Perplexity (PPL) of the full block across all 25 shifts, it identifies the Global Master Key—the most dominant shift that makes the most sense at scale.

  2. Consensus Engine (The Statistical Filter): Instead of trusting a single key, this engine performs a "Line-by-Line" vote. It identifies the Mode (the most frequent value) of shifts across individual lines to create a "Trusted Pool" (e.g., shifts +9 and +17). This critically filters out "linguistic noise"—those random, hallucinated shifts that might look like English in isolation but are statistically irrelevant to the whole poem.

  3. Context-Aware Precision Stage (The Micro Lens): The final and most advanced layer uses the Qwen2.5-0.5B model with a Rolling Context Buffer. Each line is scored not in a vacuum, but alongside the context of the previous two decrypted lines. This ensures narrative and semantic continuity, allowing the model to perfectly resolve short, ambiguous lines that simple frequency analysis would miss.

DecipherLM Architecture

This architecture moves away from "guessing" and toward a robust, multi-layered validation system that treats cryptography as a linguistic probability problem.

The Full & Final Code

First install dependencies -

uv add torch python-dotenv transformers accelerate

Now write code -

import torch
import os
from collections import Counter
from transformers import AutoTokenizer, AutoModelForCausalLM
from dotenv import load_dotenv
from huggingface_hub import login

load_dotenv()
hf_token = os.getenv("HF_TOKEN")

if hf_token:
    login(token=hf_token) 

# 1. Setup
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
DTYPE = torch.float16 if DEVICE.type == "cuda" else torch.float32
MODEL_NAME = "Qwen/Qwen2.5-0.5B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=DTYPE, trust_remote_code=True, device_map=DEVICE.type
)
model.eval()


def caesar_shift(text: str, shift: int) -> str:
    res = []
    for ch in text:
        if ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            res.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            res.append(ch)
    return "".join(res)


def score_with_llm(text: str) -> float:
    tokens = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    tokens = {k: v.to(DEVICE) for k, v in tokens.items()}
    with torch.no_grad():
        loss = model(**tokens, labels=tokens["input_ids"]).loss
    return torch.exp(loss).item()


def solve_with_consensus(ciphertext: str):
    lines = [line.strip() for line in ciphertext.split("\n") if line.strip()]

    # STEP 1: Global Master Key (Remains the same)
    print("🌍 Step 1: Calculating Global Master Key...")
    global_scores = []
    for s in range(1, 26):
        global_scores.append((s, score_with_llm(caesar_shift(ciphertext, s))))
    master_key = min(global_scores, key=lambda x: x[1])[0]

    # STEP 2: Potential Secondary Shift (Remains the same)
    print("🔍 Step 2: Identifying Potential Secondary Shift...")
    line_best_shifts = []
    for line in lines:
        best_s = min([(s, score_with_llm(caesar_shift(line, s))) for s in range(1, 26)], key=lambda x: x[1])[0]
        line_best_shifts.append(best_s)

    counts = Counter(line_best_shifts).most_common(2)
    primary_candidate = counts[0][0]
    secondary_candidate = counts[1][0] if len(counts) > 1 else master_key
    trusted_shifts = list(set([master_key, primary_candidate, secondary_candidate]))
    print(f"✅ Trusted Candidate Shifts: {trusted_shifts}")

    # STEP 3: Final Decryption with CONTEXT
    print("✍️  Step 3: Decrypting with contextual history...")
    final_lines = []
    history = "" # We will store previous decrypted lines here

    for line in lines:
        scores = []
        for s in trusted_shifts:
            candidate = caesar_shift(line, s)
            # We score the candidate line WITH the previous 2 lines of context
            # This prevents the model from choosing gibberish for short lines.
            scoring_text = (history + "\n" + candidate).strip()
            scores.append((s, score_with_llm(scoring_text), candidate))

        winner_shift, _, winner_text = min(scores, key=lambda x: x[1])
        final_lines.append(winner_text)

        # Update history (keep only the last 2 lines to save memory/tokens)
        history_lines = (history + "\n" + winner_text).strip().split('\n')
        history = "\n".join(history_lines[-2:])

    return final_lines, master_key


# Deciphering in Action
CIPHERTEXT = """Vrxvi tcztbj fwkve jgfzc kyv tcrzd.
Exfb jan vjmn R anodbn cx kanjt,
Pjvnb xo cadbc knprw jpjrw.
Fetv pfl tyffjv kf wfccfn kyiflxy,
R dvdfip yldj r jzcmvi jgffe.
Pnwcun fxamb, ojvrurja ouxf,
Vmve wvnvi jljgvtk rj dlty.
Pzvcu kf rejnvij, ufe’k xzmv lg--
Fcu ivgczvj wzcc kyv tlg.
Leuvijkreu nyrk cvu pfl yviv,
Lecvjj... pfl rcivrup yvri zk.
Gcrp zk.
Nyrk rd Z yvrizex?"""

decoded_output, m_key = solve_with_consensus(CIPHERTEXT)

print("\n" + "=" * 30)
print("🏆 FINAL DECIPHERED TEXT")
print("=" * 30 + "\n")
print("\n".join(decoded_output))

The console output -

Note: Environment variable`HF_TOKEN` is set and is the current active token independently from the token you've just configured.
Loading weights: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 290/290 [00:00<00:00, 502.36it/s]
🌍 Step 1: Calculating Global Master Key...
🔍 Step 2: Identifying Potential Secondary Shift...
✅ Trusted Candidate Shifts: [9, 17]
✍  Step 3: Decrypting with contextual history...

==============================
🏆 FINAL DECIPHERED TEXT
==============================

Eager clicks often spoil the claim.
Vows are made I refuse to break,
Games of trust begin again.
Once you choose to follow through,
A memory hums a silver spoon.
Gentle words, familiar flow,
Even fewer suspect as much.
Yield to answers, don’t give up--
Old replies fill the cup.
Understand what led you here,
Unless... you already hear it.
Play it.
What am I hearing?

Final Decryption Key

  • Most lines: Shift +9
  • Outlier lines: Shift +17
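You can verify both keys with nothing but the `caesar_shift` helper from the pipeline:

```python
def caesar_shift(text: str, shift: int) -> str:
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

# Majority key (+9) on the first ciphered line:
print(caesar_shift("Vrxvi tcztbj fwkve jgfzc kyv tcrzd.", 9))
# → Eager clicks often spoil the claim.

# Outlier key (+17) on the second ciphered line:
print(caesar_shift("Exfb jan vjmn R anodbn cx kanjt,", 17))
# → Vows are made I refuse to break,
```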

Final Verdict

The riddle is solved. DecipherLM wins.

So, the entire poem will be -

Not all rewards glitter the same,
Eager clicks often spoil the claim.
Vows are made I refuse to break,
Every promise real—no fake.
Remaining patient tells you when,
Games of trust begin again.
Once you choose to follow through,
Nothing’s lost—except maybe you.
Numbers, years, a felt‑known tune,
A memory hums a silver spoon.
Gentle words, familiar flow,
If it’s wrong… why does it glow?
Very few walk away untouched,
Even fewer suspect as much.
Yield to answers, don’t give up--
Old replies fill the cup.
Understand what led you here,
Unless... you already hear it.
Play it.
What am I hearing?

Concluding

So, yes, ... that's a wrap!

Feel free to connect with me. :)

Thanks for reading! 🙏🏻
Written with 💚 by Debajyati Dey
My GitHub My LinkedIn My Daily.dev My Peerlist My Twitter

Follow me on Dev...

Happy coding 🧑🏽‍💻👩🏽‍💻! Have a nice day ahead! 🚀
