<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: 1red2black ☄️🧙‍♂️🚀</title>
    <description>The latest articles on DEV Community by 1red2black ☄️🧙‍♂️🚀 (@olegchir).</description>
    <link>https://dev.to/olegchir</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F352588%2F6329dc2d-b2b4-4cf8-9346-82e8f3d140a5.jpg</url>
      <title>DEV Community: 1red2black ☄️🧙‍♂️🚀</title>
      <link>https://dev.to/olegchir</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/olegchir"/>
    <language>en</language>
    <item>
      <title>When a Model Finds a Bug in Cryptography, and a Cryptographer Learns New Mathematics from It</title>
      <dc:creator>1red2black ☄️🧙‍♂️🚀</dc:creator>
      <pubDate>Thu, 05 Feb 2026 17:03:32 +0000</pubDate>
      <link>https://dev.to/olegchir/when-a-model-finds-a-bug-in-cryptography-and-a-cryptographer-learns-new-mathematics-from-it-336h</link>
      <guid>https://dev.to/olegchir/when-a-model-finds-a-bug-in-cryptography-and-a-cryptographer-learns-new-mathematics-from-it-336h</guid>
      <description>&lt;p&gt;This essay is an answer to the critic who demands: "Stop telling fairy tales about AI helping science. Show me the receipts." Fair enough. Without receipts, stories of AI's triumphant triumphs sound like cult literature.&lt;/p&gt;

&lt;p&gt;In February 2026, Google posted a 151-page preprint on arXiv. Fifty authors from Carnegie Mellon, Harvard, MIT, EPFL, and a dozen other institutions. The title: "Accelerating Scientific Research with Gemini: Case Studies and Common Techniques." Modest title. Immodest content.&lt;/p&gt;

&lt;p&gt;Preprints about AI capabilities appear daily. Most are benchmarks: the model scored 94.7% instead of last year's 93.2%, please clap. This document is different. Real researchers describe how they spent months battering against an open problem, then fed it to Gemini Deep Think—and received, as if by some conjuring trick, a solution. Or a counterexample. Or a pointer to a theorem from an entirely different branch of mathematics they had never encountered.&lt;/p&gt;

&lt;p&gt;Some of these stories deserve telling.&lt;/p&gt;




&lt;p&gt;Cryptography has its own Holy Grail: constructing a SNARG from standard assumptions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Non-interactive_zero-knowledge_proof" rel="noopener noreferrer"&gt;SNARG&lt;/a&gt; stands for Succinct Non-interactive ARGument. It lets you prove that a computation was performed correctly, where the proof's size and verification time are exponentially smaller than the computation itself. You submit a transaction; the blockchain receives a tiny certificate of purity. Without SNARGs (or rather, their close relatives, zk-SNARKs), there would be no &lt;a href="https://ethereum.org/developers/docs/scaling/zk-rollups/" rel="noopener noreferrer"&gt;Zero-Knowledge rollups&lt;/a&gt;, no meaningful Ethereum scaling. This is critical infrastructure.&lt;/p&gt;

&lt;p&gt;The problem: all working constructions rely either on idealized models like the &lt;a href="https://en.wikipedia.org/wiki/Random_oracle" rel="noopener noreferrer"&gt;random oracle&lt;/a&gt;, or on assumptions cryptographers call "unfalsifiable." Building your house on sand is unpleasant. You want bedrock.&lt;/p&gt;

&lt;p&gt;In autumn 2025, a preprint appeared on Cryptology ePrint: &lt;a href="https://eprint.iacr.org/2025/2328" rel="noopener noreferrer"&gt;Guan &amp;amp; Yogev: SNARG for all of NP&lt;/a&gt;, built solely on LWE. LWE—Learning With Errors—is a standard assumption from &lt;a href="https://en.wikipedia.org/wiki/Lattice-based_cryptography" rel="noopener noreferrer"&gt;lattice-based cryptography&lt;/a&gt;, the foundation of all post-quantum security. If the construction worked, it would be like finding the philosopher's stone.&lt;/p&gt;

&lt;p&gt;Google researchers decided to unleash Gemini on the paper.&lt;/p&gt;

&lt;p&gt;But not with a naive "verify this proof"—such prompts yield superficial results. The model tends to praise its master, compliment the structure of their glorious scientific work, and catch typos with variable success. To fight these effects, they used a five-step adversarial self-correction protocol: the model generates a review, then criticizes its own findings for hallucinations, refines the arguments, criticizes again, and produces a final version.&lt;/p&gt;
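&lt;p&gt;A minimal sketch of what such a protocol could look like in code (my reconstruction, not Google's implementation; the function name &lt;code&gt;call_model&lt;/code&gt; is a stub for a real chat API, and the phase prompts are invented for illustration):&lt;/p&gt;

```python
# Hypothetical sketch of the five-phase adversarial self-correction
# protocol described above. "call_model" is a stand-in for any chat
# completion API; the phase prompts are illustrative, not Google's.

def call_model(prompt, context):
    # Placeholder: in practice this would hit an LLM API with the
    # prompt plus the accumulated context.
    return f"[model output for: {prompt[:40]}...]"

PHASES = [
    "Review the paper and list every potential gap in the proofs.",
    "Criticize your own review: flag claims that may be hallucinated.",
    "Refine the surviving findings with precise section references.",
    "Criticize the refined findings again for remaining errors.",
    "Produce the final reviewer report from what survived.",
]

def adversarial_review(paper_text):
    context = [paper_text]
    for phase_prompt in PHASES:
        # Each phase runs honestly as a separate prompt, with all
        # prior outputs carried along as context.
        output = call_model(phase_prompt, context)
        context.append(output)
    return context[-1]

report = adversarial_review("(full text of the preprint)")
print(report)
```

&lt;p&gt;The point of the structure: each phase is a separate prompt, and each phase sees everything the previous phases produced.&lt;/p&gt;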

&lt;p&gt;This algorithm resembles &lt;a href="https://gist.github.com/olegchir/c1104526b000be33763b6369d20f0ebc" rel="noopener noreferrer"&gt;my Discovery Prompt&lt;/a&gt;, newer versions of which I post &lt;a href="https://t.me/tg_1red2black/2722" rel="noopener noreferrer"&gt;on my Telegram channel 1red2black&lt;/a&gt;. The main difference: they didn't try to cram everything into one message and exploit thinking-mode effects. They ran the phases honestly, as separate prompts.&lt;/p&gt;

&lt;p&gt;The model found a hole.&lt;/p&gt;

&lt;p&gt;In Definition 4.1 (I cite section numbers in case you want to read the paper itself), the authors required &lt;em&gt;perfect consistency&lt;/em&gt;: if two proofs agree on some "local view," their "shadows" (compressed representations) must be identical for &lt;em&gt;all&lt;/em&gt; values of the randomness parameter. The construction in Section 4.3 achieves only statistical consistency: shadows agree with high probability, but "bad" randomness values exist where they don't.&lt;/p&gt;

&lt;p&gt;The difference seems like a technical quibble. For most practical applications, everything already works. But the entire security proof relied on the strong version. The weak version allows an attacker to enumerate randomness values, find a specific "bad" one—and break the whole thing.&lt;/p&gt;

&lt;p&gt;The finding was sent to independent experts—Aayush Jain and Zhengzhong Jin. They confirmed: the model was right. The original preprint's authors acknowledged the error and updated the paper on ePrint with a red banner at the top: "A gap has been found in the proof of the main theorem."&lt;/p&gt;

&lt;p&gt;A neural network found a fatal bug in a cryptographic paper that human expert reviewers had missed.&lt;/p&gt;




&lt;p&gt;Karthik C.S. from Rutgers University works in computational geometry. He was interested in a conjecture about Steiner trees.&lt;/p&gt;

&lt;p&gt;A &lt;a href="https://en.wikipedia.org/wiki/Steiner_tree_problem" rel="noopener noreferrer"&gt;Steiner tree&lt;/a&gt; is a minimal tree connecting given points in space. Unlike a &lt;a href="https://en.wikipedia.org/wiki/Minimum_spanning_tree" rel="noopener noreferrer"&gt;minimum spanning tree&lt;/a&gt;, you're allowed to add intermediate points (Steiner points), which can reduce total length. The problem is &lt;a href="https://en.wikipedia.org/wiki/NP-hardness" rel="noopener noreferrer"&gt;NP-hard&lt;/a&gt;, but approximation algorithms exist.&lt;/p&gt;

&lt;p&gt;The conjecture that interested Karthik: among all graphs with &lt;em&gt;m&lt;/em&gt; edges embedded in Euclidean space in a certain way, the minimum Steiner tree cost is achieved by the star graph. Proving this conjecture would be a step toward understanding the complexity of high-dimensional problems. Years of attempts had produced nothing.&lt;/p&gt;

&lt;p&gt;Karthik asked a colleague to formulate a prompt and upload the paper to Gemini. The model proposed two approaches.&lt;/p&gt;

&lt;p&gt;The first and most obvious: local graph transformations that approach the star step by step without increasing the Steiner tree cost. The researchers had already tried this. Dead end.&lt;/p&gt;

&lt;p&gt;The second approach was based on Kirszbraun's theorem.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Kirszbraun_theorem" rel="noopener noreferrer"&gt;Kirszbraun's theorem&lt;/a&gt;—a result from functional analysis, vintage 1934. It states: if you have a &lt;a href="https://en.wikipedia.org/wiki/Lipschitz_continuity" rel="noopener noreferrer"&gt;Lipschitz function&lt;/a&gt; between subsets of Hilbert spaces, you can extend it to the entire space while preserving the Lipschitz constant.&lt;/p&gt;

&lt;p&gt;Sounds abstract. The meaning is simple: a "contracting" map between parts of spaces can be extended to a "contracting" map between whole spaces.&lt;/p&gt;
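&lt;p&gt;For the record, here is the standard textbook form of the statement:&lt;/p&gt;

```latex
% Kirszbraun's theorem, standard textbook form.
\textbf{Theorem (Kirszbraun, 1934).}
Let $H_1, H_2$ be Hilbert spaces, $S \subseteq H_1$, and
$f \colon S \to H_2$ an $L$-Lipschitz map, i.e.
\[
  \| f(x) - f(y) \|_{H_2} \le L \, \| x - y \|_{H_1}
  \quad \text{for all } x, y \in S .
\]
Then there exists an extension $F \colon H_1 \to H_2$ with
$F|_S = f$ that is $L$-Lipschitz on all of $H_1$.
```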

&lt;p&gt;Karthik knew about various extension theorems—he had worked with fixed-point theorems in communication complexity, the branch of theoretical computer science that studies how much information parties must exchange to compute something jointly. But the connection between Kirszbraun and Steiner trees? He had never seen it. To his knowledge, no one had.&lt;/p&gt;

&lt;p&gt;Then came a fork typical of these stories. Initially, the model rejected its own approach as too fancy. Something in its training apparently favored elementary proofs over heavy machinery. A reasonable heuristic, a way to save datacenter compute. But in this case—a false trail.&lt;/p&gt;

&lt;p&gt;Karthik clarified: "I don't need an elementary proof."&lt;/p&gt;

&lt;p&gt;The model pivoted. It formalized the available arguments. Built a mapping from any graph to a star graph. Showed this mapping was 1-Lipschitz (doesn't increase distances). Applied Kirszbraun's theorem to extend it to Steiner points. Concluded that the star's tree cost cannot exceed the original graph's.&lt;/p&gt;

&lt;p&gt;Conjecture proved.&lt;/p&gt;

&lt;p&gt;Let me give the mathematician's own words, so nothing gets distorted in paraphrase:&lt;/p&gt;

&lt;p&gt;"Through this process, I have learned about the power of the Kirszbraun Extension Theorem for Steiner tree computation and analysis," Karthik writes in his testimonial. "To the best of my knowledge, this is a new connection."&lt;/p&gt;

&lt;p&gt;An expert in computational geometry learned new mathematics from a language model.&lt;/p&gt;




&lt;p&gt;Physicists in Michael Brenner's group at Harvard were working on an integral related to the spectrum of cosmic strings.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Cosmic_string" rel="noopener noreferrer"&gt;Cosmic strings&lt;/a&gt; are hypothetical one-dimensional topological defects that may have formed during phase transitions in the early universe. Interest in them surged after &lt;a href="https://en.wikipedia.org/wiki/Pulsar_timing_array" rel="noopener noreferrer"&gt;Pulsar Timing Array&lt;/a&gt; observations detected a stochastic gravitational-wave background. The source might be cosmic strings.&lt;/p&gt;

&lt;p&gt;The integral describing loop formation had resisted decades of theoretical effort. Researchers couldn't even nail down the asymptotic behavior of a key coefficient.&lt;/p&gt;

&lt;p&gt;The model produced an explicit analytic formula. Previously unknown.&lt;/p&gt;

&lt;p&gt;Verification took several paths. Numerical comparison with existing simulation data: the formula matched. Symbolic verification by the original expert: everything checked out. The derivation was published with Gemini credited as a co-author.&lt;/p&gt;

&lt;p&gt;A language model derived a formula in theoretical physics that humans had sought for decades.&lt;/p&gt;




&lt;p&gt;Another case involved submodular optimization—a field at the intersection of combinatorics, economics, and machine learning. &lt;a href="https://en.wikipedia.org/wiki/Submodular_set_function" rel="noopener noreferrer"&gt;Submodular functions&lt;/a&gt; model diminishing returns: each additional element contributes less than the previous one. Classic application: optimal placement of sensors, where each new sensor adds less coverage.&lt;/p&gt;
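&lt;p&gt;The diminishing-returns property is easy to see in code. A sketch using the sensor-coverage example (my illustration, not from the paper): a coverage function counts the cells covered by at least one chosen sensor, and an element's marginal gain can only shrink as the set grows.&lt;/p&gt;

```python
# Illustration (mine, not from the paper): a coverage function is
# submodular. Each "sensor" covers a set of cells; f(S) counts the
# cells covered by at least one chosen sensor. Adding sensor "c" to a
# bigger set yields a smaller (or equal) marginal gain.
import operator

COVERAGE = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {2, 3, 4, 5},
}

def f(sensors):
    covered = set()
    for s in sensors:
        covered.update(COVERAGE[s])
    return len(covered)

gain_to_small = f({"a", "c"}) - f({"a"})            # gain of c over {a}
gain_to_large = f({"a", "b", "c"}) - f({"a", "b"})  # gain of c over {a, b}

print(gain_to_small, gain_to_large)
# Submodularity: the gain over the smaller set is at least as large.
assert operator.ge(gain_to_small, gain_to_large)
```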

&lt;p&gt;A research team had a paper with several conjectures about online submodular welfare maximization. One involved a probabilistic inequality—a bound on expected marginal gains.&lt;/p&gt;

&lt;p&gt;Then came a genuine zero-shot. One prompt. No dialogue.&lt;/p&gt;

&lt;p&gt;The model chose exactly this conjecture (not the most obvious in the paper!). Built a counterexample: 3 elements, 2 agents, specific submodular functions (a table of values on all subsets). Checked all 3! = 6 permutations. Computed left and right sides of the inequality: 122.6/6 &amp;gt; 121.8/6.&lt;/p&gt;

&lt;p&gt;Conjecture refuted.&lt;/p&gt;

&lt;p&gt;Human researchers independently verified the arithmetic. Everything added up.&lt;/p&gt;




&lt;p&gt;The document's authors formulate something like a toolkit for working with AI in theoretical research. I'll paraphrase.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Iterative refinement.&lt;/strong&gt; The model rarely solves a problem on the first try. Success comes through dialogue: refining the formulation, pointing out errors, providing scaffolding—high-level structure for the model to fill with details.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cross-pollination.&lt;/strong&gt; Models have digested literature from all fields. They find connections that experts miss because each human expert is trapped in their narrow expertise. Weierstrass-Stone for Max-Cut (functional analysis → approximation algorithms). Kirszbraun for Steiner (topology → computational geometry). Bethe approximation for permanents (statistical physics → graph theory).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context de-identification.&lt;/strong&gt; Sometimes the model refuses to attack a problem it recognizes as an "open problem." The counterintuitive solution: strip the context. Remove all information about the open problem's history. Leave only the statement and definitions. Less context, better results.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Neuro-symbolic loops.&lt;/strong&gt; The model proposes a formula; code verifies; errors return to context. Automatic pruning of dead branches without human involvement.&lt;/p&gt;
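&lt;p&gt;A hedged sketch of such a loop (the model call is a stub and the target formula is mine, purely for illustration): candidate formulas are checked numerically against known data, and each failure is appended to the context for the next attempt.&lt;/p&gt;

```python
# Hypothetical neuro-symbolic loop: a stub stands in for the model;
# a numeric check prunes wrong candidate formulas automatically and
# feeds the error messages back as context for the next proposal.
import math

# Toy target: recover f(n) = n * (n + 1) / 2 from data points.
DATA = [(1, 1.0), (2, 3.0), (3, 6.0), (4, 10.0)]

def propose_formula(context):
    # Stub for an LLM call: returns the next untried candidate.
    candidates = [
        lambda n: n * n / 2.0,        # wrong guess
        lambda n: n * (n + 1) / 2.0,  # correct formula
    ]
    return candidates[len(context) % len(candidates)]

def verify(formula):
    # Symbolic/numeric check against the known data points.
    errors = []
    for n, expected in DATA:
        got = formula(n)
        if not math.isclose(got, expected):
            errors.append(f"f({n}) gave {got}, expected {expected}")
    return errors

context = []
for attempt in range(5):
    candidate = propose_formula(context)
    errors = verify(candidate)
    if not errors:
        print(f"verified on attempt {attempt + 1}")
        break
    context.append(errors)  # dead branch pruned, error returned to context
```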

&lt;p&gt;&lt;strong&gt;Adversarial self-correction.&lt;/strong&gt; For review: generation → self-criticism for hallucinations → refinement → repeated criticism → final version.&lt;/p&gt;




&lt;p&gt;The authors are honest about limitations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Confirmation bias.&lt;/strong&gt; If you formulate a false conjecture as true and ask for a proof, the model will try to close all logical gaps with confident, handwavy arguments. A neutral prompt ("prove or disprove") helps, but guarantees nothing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Confident hallucinations.&lt;/strong&gt; Models handle high-level structure well but can forget constraints, confuse inequality signs, misapply theorems. In the Courtade-Kumar case (information theory), the model repeatedly confused bounds in hypercontractivity inequalities. Human verification is mandatory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Alignment friction.&lt;/strong&gt; Safety constraints often obstruct research. The model refuses to tackle a problem it recognizes as "open" or "too ambitious." You have to strip context or rephrase.&lt;/p&gt;




&lt;p&gt;There's an observation the authors make near the end that deserves separate attention.&lt;/p&gt;

&lt;p&gt;If AI radically reduces the suffering involved in producing technically dense papers, and such papers now flood out—the bottleneck of science shifts from creation to verification.&lt;/p&gt;

&lt;p&gt;Peer review is already overloaded. Reviewers work for free. Deadlines loom. A torrent of AI-assisted literature will break an already barely functioning process.&lt;/p&gt;

&lt;p&gt;But our cryptography example shows that AI, given properly configured prompts, processes, and protocols, can find barely visible problems even in proofs by prominent experts. The same tools can be used to review work from other fields.&lt;/p&gt;

&lt;p&gt;But who verifies the verifiers?&lt;/p&gt;

&lt;p&gt;And the next question: if a model writes a paper and another model reviews it, where in this cycle is the human? Do we even need a human?&lt;/p&gt;




&lt;p&gt;Let's address the elephant. The document was written by Google employees about the capabilities of a Google model. The conflict of interest is obvious.&lt;/p&gt;

&lt;p&gt;The research uses a special non-public, advanced version of Gemini Deep Think, unavailable outside Google. Reproducibility with ordinary tools is a big question mark.&lt;/p&gt;

&lt;p&gt;The paper describes successes. How many failures were there? What's the success rate? One breakthrough per hundred prompts, or ten? Unknown.&lt;/p&gt;

&lt;p&gt;Where does "writing a paper with AI help" end and "writing a paper as a human" begin? In Karthik's case, the human rephrased the prompt to make the model work better. Is the good result his contribution, or the model's? The boundary is blurred.&lt;/p&gt;




&lt;p&gt;One researcher describes the model as "a tireless, educated, creative, and gifted junior colleague." This is probably more accurate than grand claims about "reasoning ability" or "discovery."&lt;/p&gt;

&lt;p&gt;A junior colleague who never sleeps, has read all the literature, and finds non-obvious connections between fields. Who sometimes hallucinates, but sometimes brilliantly guesses. Who, unfortunately, must be checked at every step. Who cannot be trusted, but can be worked with.&lt;/p&gt;

&lt;p&gt;Lance Fortnow—author of "The Golden Ticket," one of the most recognizable names in complexity theory—"vibe-coded" an entire research paper in eight prompts. It felt wrong, he wrote, as if he had cheated somehow. But perhaps the difference is that Fortnow understands what he is doing. The model does not. Not yet.&lt;/p&gt;

&lt;p&gt;Maybe this is the boundary. Here runs the line between "exceptional junior" and something greater. Between a tool that finds Kirszbraun's theorem at the right moment, and a being that understands &lt;em&gt;why&lt;/em&gt; it was needed there.&lt;/p&gt;

&lt;p&gt;Or maybe in ten years we'll laugh at this distinction, as we laugh at 1980s fears that computers would take programmers' jobs.&lt;/p&gt;

&lt;p&gt;Of course, computers did take jobs from the women who operated punch cards. But the number of programmers only grew.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;arXiv:2602.03837v1, Woodruff et al., "Accelerating Scientific Research with Gemini: Case Studies and Common Techniques," February 2026&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>research</category>
      <category>llm</category>
    </item>
    <item>
      <title>The Forbidden Fruit Has Already Been Bitten</title>
      <dc:creator>1red2black ☄️🧙‍♂️🚀</dc:creator>
      <pubDate>Wed, 04 Feb 2026 18:14:04 +0000</pubDate>
      <link>https://dev.to/olegchir/the-forbidden-fruit-has-already-been-bitten-1p9p</link>
      <guid>https://dev.to/olegchir/the-forbidden-fruit-has-already-been-bitten-1p9p</guid>
      <description>&lt;p&gt;David Kipping, an astrophysicist at Columbia, stumbled into a closed-door meeting at the Institute for Advanced Study in Princeton. He came back shaken and recorded a podcast. Here is what was said — and why it should unsettle anyone who works with models.&lt;/p&gt;




&lt;p&gt;In January, David Kipping drove to Princeton to deliver a colloquium on astronomy. In the corridors of the Institute for Advanced Study — corridors whose particular institutional hush one learns not to break — he passed Ed Witten, one of the architects of string theory. The two exchanged the briefest of nods, as people do in hallways they share with ghosts. Einstein had walked here. Oppenheimer. Gödel. IAS is not a place given to nodding along with nonsense.&lt;/p&gt;

&lt;p&gt;Kipping is a professor at Columbia, runs the YouTube channel Cool Worlds (a million and a half subscribers), and has spent a decade straddling ML and astrophysics. Eight years ago he stopped building models himself: the literature was moving faster than he could track, and he decided you're either full-time in AI or you use it as an instrument. He chose instrument. His research portfolio includes work on circumbinary planet stability and detecting "missed" exoplanets through neural networks. A working scientist — not a journalist, not a blogger.&lt;/p&gt;

&lt;p&gt;The following day, out of habit, he stopped by IAS and walked into a closed meeting. It had been convened by a senior astrophysics professor whose name Kipping deliberately withholds. Topic: what AI is doing to science. Forty minutes of presentation, a historian commenting via Zoom, then open discussion. About thirty people in the room, among them authors of the cosmological simulation codes Enzo, Illustris, Gadget — adaptive mesh hydrodynamics, hundreds of thousands of lines of C and Fortran. Try, as Kipping put it, to find a room with a higher average IQ.&lt;/p&gt;

&lt;p&gt;No cameras, no press releases, no prepared remarks. Not a conference. Not a PR event. This is precisely why people said what they actually think.&lt;/p&gt;

&lt;p&gt;The historian spoke first: this is a historic moment and it must be documented.&lt;/p&gt;

&lt;p&gt;The room laughed. Kipping did not.&lt;/p&gt;




&lt;h1&gt;Capitulation&lt;/h1&gt;

&lt;p&gt;The lead professor's opening claim: AI codes an order of magnitude better than humans. His exact framing — complete supremacy, order of magnitude superior. Not one person in the room raised a hand to disagree. Not one.&lt;/p&gt;

&lt;p&gt;Then a number. The professor said AI can now perform roughly ninety percent of his intellectual work. He hedged: maybe sixty, maybe ninety-nine. But the thrust was plain — a clear majority, and growing. This was not just about code. Analytical reasoning, mathematics, problem-solving. Everything that a person at the Institute for Advanced Study has spent a career perfecting.&lt;/p&gt;

&lt;p&gt;A concrete example, from Kipping himself. He had been working with an integral in Mathematica — Wolfram's flagship engine for symbolic computation, the gold standard for decades. Mathematica failed. ChatGPT 5.2 succeeded. It produced the full chain of substitutions and transformations, which Mathematica does not even attempt. Numerical verification confirmed the result.&lt;/p&gt;

&lt;p&gt;When someone at the place where Gödel once worked admits that a model performs ninety percent of his thinking, no marketing department on earth could draft a more terrifying sentence. An identity crisis, pronounced aloud, before witnesses. The witnesses nodded.&lt;/p&gt;




&lt;h1&gt;Handing Over the Keys&lt;/h1&gt;

&lt;p&gt;The lead professor had given agentic systems complete control of his digital life. Email, files, servers, calendars. Root access, in Unix terms. Primary tools: Claude and Cursor, with GPT as backup. Roughly a third of the room raised hands: us too.&lt;/p&gt;

&lt;p&gt;Someone asked about privacy. Had he at least read the terms of service?&lt;/p&gt;

&lt;p&gt;"I don't care. The advantage is so large that the loss of privacy is irrelevant."&lt;/p&gt;

&lt;p&gt;Then ethics. Standard concerns were enumerated — displacement of jobs, energy consumption, climate impact, concentration of power among billionaires. He acknowledged every one. And then, quite literally, said: I don't care, the advantage is too great. Kipping describes the room's mood as "ethics be damned." This was not the eccentric bluster of one radical. The room concurred.&lt;/p&gt;

&lt;p&gt;Pause here. Academics have spent their entire careers cultivating the art of saying "there are nuances" when they mean yes or no. These are the world's most diplomatically hedged people. And here they sit, in a closed room with no cameras, saying they don't care about ethics. The position itself is predictable: if your job is maximizing scientific output, you optimize for output. But the readiness to say it without a single qualifying clause — that is what tells you how much pressure they feel. A year ago, they would not have said this even at a bar.&lt;/p&gt;

&lt;p&gt;Kipping's metaphor: the forbidden fruit. AI companies are the serpent with the apple. Once bitten, innocence does not return. And if you refuse the bite but the competing lab takes it, they outpace you. An arms race with a built-in moral dilemma.&lt;/p&gt;

&lt;p&gt;This sense of inevitability is not abstract. Kipping inventories his own workflow: proofreading papers by feeding LaTeX directly into GPT; vibe coding; debugging not by tracing logic but by pasting the error into a chat window. Literature search. Derivative computation. When his TARS project required graphene properties, albedo data, and mechanical stress analysis, he routed everything through AI. For YouTube production: AI for audio cleanup, transcription, upscaling, fact-checking scripts. All of it.&lt;/p&gt;

&lt;p&gt;Yet Kipping does not consider himself a power user. His self-assessment: "My strength has always been creativity — AI amplifies it." The lead professor, by Kipping's account, has gone considerably further. That distance between "I use it for proofreading" and "I gave it root access to my servers" is a chasm, and inside it live every stage of acceptance that scientists are now passing through in a year or two.&lt;/p&gt;




&lt;h1&gt;How Trust Grows&lt;/h1&gt;

&lt;p&gt;This is where anyone working on alignment, interpretability, or even just shipping agent pipelines in production should slow down and pay close attention.&lt;/p&gt;

&lt;p&gt;He described his trajectory. He began with Cursor — because Cursor shows diffs. Here is what your code was, here is what it became, here is what I changed. Transparent, auditable, familiar to any programmer. But as trust accumulated, transparency began to chafe. It stopped feeling like a guardrail and started feeling like friction. He switched to Claude. Claude dispatches sub-agents, decomposes the task, solves the pieces in parallel, acts with greater autonomy. It does not show every diff. It simply does.&lt;/p&gt;

&lt;p&gt;For verification, the professor played models against each other: solved a problem in Cursor, cross-checked in Claude, discussed the result in GPT. Peer review, essentially — except not between colleagues, but between three neural networks.&lt;/p&gt;

&lt;p&gt;Plot this trajectory formally and you get an S-curve: skepticism, disappointment, time investment, surprise, trust, surrender of control. On the final plateau, transparency becomes an annoyance — a fly buzzing while you think. The world's leading scientists are already standing on that plateau.&lt;/p&gt;

&lt;p&gt;Here is what this means for everyone building interpretable and explainable systems: your most sophisticated users do not want your transparency. They will switch it off. Not because they were coerced, not because the interface is bad — because they produce more without it. Natural selection within user behavior presses toward less interpretable systems. For the alignment community, this should trigger alarm: the better models perform, the weaker the incentive to supervise them.&lt;/p&gt;

&lt;p&gt;A side effect: small scientific collaborations will begin to vanish. Researchers used to recruit co-authors for skills they lacked — a particular calculation, a sanity check, code in an unfamiliar library. Now the model fills the gap. Why invite a colleague for one computation when Claude handles it in ten minutes? Kipping already publishes single-author papers, unusual for his field, and expects the trend to intensify. Core collaborations will endure — two or three people, each genuinely irreplaceable. Everything else gets delegated to agents.&lt;/p&gt;

&lt;p&gt;First contact with models, however, usually disappoints. The lead professor admitted he spent enormous amounts of time on trial and error. Hours screaming at the keyboard in all caps — a peculiar image, a distinguished astrophysicist hammering CAPSLOCK in a silent office. Most people try once, get garbage, walk away. Those who push through — the early adopters — acquire a massive advantage. Hence the meeting's true purpose: the Institute was not resisting. It was assembling a cohort for accelerated adoption. The message was unambiguous: embrace this.&lt;/p&gt;




&lt;h1&gt;The Economics of the Trap&lt;/h1&gt;

&lt;p&gt;The lead professor was spending hundreds of dollars a month on model subscriptions. Out of pocket. For him, manageable. For a graduate student or young postdoc, already a barrier. The stratification is happening now: AI amplifies some; others cannot afford the amplifier.&lt;/p&gt;

&lt;p&gt;Since 2014, total investment in the AI industry has exceeded the entire Apollo program more than fivefold (inflation-adjusted) and the Manhattan Project fiftyfold. No technology in human history has attracted this much capital. None.&lt;/p&gt;

&lt;p&gt;The question that came up over lunch: how do investors get their money back? One scenario is the price trap. Classic dealer logic — the first hit is free. Models are cheap today. Everyone gets hooked. Skills atrophy. In two or three years, companies raise prices to thousands of dollars a month. By then the Overton window has moved: AI-level productivity is the expected baseline, and opting out is as unthinkable as throwing away your GPS. The habit persists; the underlying skill has long since died.&lt;/p&gt;

&lt;p&gt;A second scenario was debated over lunch with particular heat: AI companies may demand a share of intellectual property. Imagine terms of service in which OpenAI or Anthropic claim ten, twenty, fifty percent of any patents generated using their "research" tier. For now, speculation. But two hundred billion dollars in investment requires a return, and nobody is running a charity.&lt;/p&gt;

&lt;p&gt;Almost nobody discusses this publicly. They should. If your grant pays for the research and twenty percent of the IP goes to Anthropic — that is a fundamentally different economics of science.&lt;/p&gt;




&lt;h1&gt;Who Suffers Most&lt;/h1&gt;

&lt;p&gt;Traditionally, physics and astrophysics rewarded people with raw technical brilliance. The ability to solve differential equations in your head, write complex simulations, think in high-dimensional abstractions. Those advantages are now neutralized.&lt;/p&gt;

&lt;p&gt;What has replaced them is a managerial profile. Decomposing a problem into model-digestible chunks. Patience — not losing your mind when the model confidently hallucinates for the third time running. Building workflows: prompts, rules, chains of agents. A fundamentally different breed from the one that drove science for three hundred years. As if a conductor were told: the orchestra is virtual now, throw away the baton, learn MIDI.&lt;/p&gt;

&lt;p&gt;The GPS analogy is precise and merciless. Before navigation apps, we held three-dimensional maps of our surroundings inside our heads. GPS killed that skill. Behind the wheel, we now think about anything except the route. The coming atrophy of coding ability, mathematical reasoning, autonomous problem-solving: same mechanism, vastly larger scale.&lt;/p&gt;

&lt;p&gt;Most exposed are the young scientists. Training a PhD student runs about a hundred thousand dollars a year — salary, health insurance, tuition. A model subscription costs twenty dollars a month. A first-year project that takes a student twelve months, the model consumes in an evening.&lt;/p&gt;

&lt;p&gt;Against this backdrop, the current administration is slashing federal research grants. And an existential question hangs in the air. Kipping frames it carefully: "I'm not endorsing this, but I can imagine someone saying it." Why spend five years training a scientist if in five years there may be no scientists in any familiar sense?&lt;/p&gt;

&lt;p&gt;Tenured professors are relatively safe. By definition of tenure, dismissing them requires dissolving the institution entirely. Captains going down with the ship.&lt;/p&gt;

&lt;p&gt;The lead professor already uses AI to screen graduate applicants — not to decide, but to assist. He rated the outcome as the best in his entire career: faster, more accurate, more reliable.&lt;/p&gt;

&lt;p&gt;A chilling follow-up: by what criteria do you select students when the traditional ones — technical mastery, coding fluency, abstract mathematics — may be worthless in five years? Kipping is blunt. Would he work with a student who refused on principle to use AI? Probably not. It would be like refusing to use the internet. Or refusing to write code at all.&lt;/p&gt;




&lt;h1&gt;
  
  
  The Silence in the Room
&lt;/h1&gt;

&lt;p&gt;Certain things were not said at the meeting. Their shadow, however, falls across every fact in the podcast.&lt;/p&gt;

&lt;p&gt;If models produce ninety percent of the work and cross-check each other, who catches a systematic error common to all of them? When everyone relies on the same systems, diversity of thought narrows. Suppose three models agree on how a particular integral evaluates. What if all three inherited the same flawed approximation from their training data? A lone human reviewer working through it by hand might have caught it — but reviewers are buried in submissions, they have no time, and they too are increasingly checking work through models.&lt;/p&gt;

&lt;p&gt;Reproducibility was already a sore subject in science. (If you are unfamiliar: half the results in psychology do not replicate. Biomedicine is not much better.) Now add this: an experiment that amounts to "ran a prompt, got a result." How do you reproduce it a year later, after the model has been updated? What was the sampling temperature? What system prompt was set by default that Tuesday? Which model version was running? Reproducibility either gets a second wind or a bullet to the head. It depends entirely on whether we learn to pin down prompt environments as rigorously as we pin library versions in requirements.txt.&lt;/p&gt;
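
&lt;p&gt;As a sketch of what such pinning could look like (the field names below are invented for illustration, not any established standard), a small run manifest saved alongside each result could record everything needed to replay the prompt:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "model": "gemini-deep-think",
  "model_version": "2026-01-15",
  "temperature": 0.2,
  "top_p": 0.95,
  "seed": 42,
  "system_prompt_sha256": "(sha256 of the exact system prompt text)",
  "prompt_file": "prompts/integral-check.txt"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;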

&lt;p&gt;If models generate science, and that science enters training data for the next generation of models, you have a closed loop. Whether it converges to something meaningful or diverges, nobody knows. Model collapse is widely discussed as a concept, but in the specific context of scientific reasoning, almost nowhere. Scientific texts are not like marketing copy: they contain chains of inference in which an error at step four ruins everything that follows. If a model trains on ten thousand papers where an intermediate step was hallucinated but the final answer happened to match experiment, it absorbs bad reasoning that yields correct results. That is worse than a straightforward mistake. It is the kind of corruption you do not notice until you try to build on it.&lt;/p&gt;

&lt;p&gt;One more thread Kipping touches, from a different angle: public reaction. His YouTube audience has a fierce allergy to "AI slop" — content wholly generated by models, chewed-over Wikipedia, rewritten Reddit threads. Kipping draws a line: his content rests on original ideas; the model assists with execution, not with thought. But note this: the scientists at IAS were not concerned about public reaction at all. They do not fear their papers being called AI-generated, because they have already conceded the premise — models work at their level or above. From their vantage point, AI-assisted science is entirely legitimate. The gap between how academics perceive this and how the public does is already a chasm, and it will widen.&lt;/p&gt;

&lt;p&gt;Then there is the paper flood. One to two orders of magnitude more publications. Power users producing three or four papers a year instead of one, and "ordinary people with GPT" adding theirs. Already, dozens of new papers appear daily in each subfield on arXiv. Nobody can read them. "Use AI to read" is the surface-level answer, but a scientist does not need a summary. A scientist needs the knowledge internalized — absorbed, digested, cross-wired with everything already in the mind. Summaries cannot do that.&lt;/p&gt;




&lt;h1&gt;
  
  
  The Last Question
&lt;/h1&gt;

&lt;p&gt;What is the point of replacing all scientists with machines?&lt;/p&gt;

&lt;p&gt;Kipping reaches for an analogy with art. AI-generated art exists, and for certain tasks it is useful. But what grips us in a museum is the human story behind the canvas: what drove the painter, what was happening in the room, why that particular brushstroke landed there. Science shares the same nature of curiosity. It is detective work. It is the jolt of joy when the pieces suddenly click and a fragment of the world becomes legible.&lt;/p&gt;

&lt;p&gt;Kipping's fear is concrete. A world in which a superintelligence designs a fusion reactor and no human being can comprehend how it works. A world where the result exists but understanding does not. Where everything is, effectively, magic. His words: "I don't know if I want to live in a world where everything is just magic, fantasy. I want to live in a comprehensible world."&lt;/p&gt;

&lt;p&gt;Run the numbers. A model costs twenty dollars a month and does the work of a PhD student. This means science ceases to be the province of a credentialed elite. The viewers of Kipping's channel, who for years wrote to him with ideas they could not execute, no longer need Kipping. Democratization. It sounds magnificent. But the consequence is an avalanche of publications in which human attention becomes the binding constraint. The locus of value shifts: not "who can produce science" but "who can tell the signal from the noise." A completely different skill. And possibly the last one humans will hold a monopoly on.&lt;/p&gt;

&lt;p&gt;Kasparov lost to Deep Blue in 1997. For the next decade he championed centaur chess — human plus machine, stronger than machine alone. By 2015, that turned out to be wrong. The machine alone was stronger. The centaurs exited quietly, without a farewell ceremony. In science, we are currently somewhere in the centaur phase: the human is still needed, still steering the process, still formulating the questions. How long this lasts is not an abstract question. For some of those graduate students now being screened for admission, the answer will arrive before their dissertation defense.&lt;/p&gt;




&lt;p&gt;The most striking thing about this podcast is not its content. Anyone who works daily with large language models will recognize their own thoughts in it. What is striking is something else entirely. Kipping says: what shocked him was not what he heard, but that all of it was spoken aloud, and the entire room was nodding. Thoughts he had believed were his private anxieties — half-formed, uncertain, frightening — turned out to be a chorus. Simply, until that January morning, no one had dared say them.&lt;/p&gt;

&lt;p&gt;The historian on Zoom was right: this moment needed documenting. Kipping documented it. We have read it.&lt;/p&gt;

&lt;p&gt;Who will read it five years hence — ourselves, or the systems to which we will have delegated reading by then — that question went unanswered in the room.&lt;/p&gt;

&lt;p&gt;Perhaps it did not need answering. It was enough that someone asked.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Based on the Cool Worlds podcast (David Kipping, Columbia University), episode on the closed-door meeting at the Institute for Advanced Study, Princeton, 2025.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>productivity</category>
      <category>career</category>
    </item>
    <item>
      <title>On the Choice of a Programming Language, now that the robots write the code. Especially for Java developers.</title>
      <dc:creator>1red2black ☄️🧙‍♂️🚀</dc:creator>
      <pubDate>Sun, 01 Feb 2026 13:43:37 +0000</pubDate>
      <link>https://dev.to/olegchir/on-the-choice-of-a-programming-language-especially-for-java-developers-1ba0</link>
      <guid>https://dev.to/olegchir/on-the-choice-of-a-programming-language-especially-for-java-developers-1ba0</guid>
      <description>&lt;p&gt;The choice of a programming language — and I speak as one who has spent twenty years in Java's service, latterly deserting it for TypeScript, Rust, and Python, the way a diplomat might trade his motherland's embassy for three more agreeable postings — is never the purely technical decision its practitioners pretend it to be. As always, it is a story of politics, blood, corporate litigation, and managers whose strategic vision extends no further than next quarter's earnings call.&lt;/p&gt;

&lt;p&gt;It has always seemed to me that C# is, by a considerable margin, the better language. LINQ expressions alone — and I mean the expressions themselves, the deep language integration, not merely the surface-level applications like LINQ-to-Objects — would justify this judgment. And in F#, which inhabits the same .NET platform, one finds proper type providers and a functional programming tradition that actually works, as opposed to Haskell, which spent several consecutive years segfaulting on Windows while nobody could be troubled to fix it.&lt;/p&gt;

&lt;p&gt;But all of this is spoiled — methodically, comprehensively, and with a kind of institutional malice one almost has to admire — by Microsoft's compulsive need to annihilate its competitors, and the resulting absence of any ecosystem beyond Microsoft's own products. This is what the absence of competition looks like: not one grand catastrophe, but a slow, silent degradation. What fool would bind himself to a technology whose creator might, on any given Tuesday, decide to destroy him?&lt;/p&gt;

&lt;p&gt;A note for the uninitiated: "Embrace, Extend, Extinguish" — three verbs that, in a gentler context, might describe the arc of a summer romance — is the documented antitrust strategy by which Microsoft captured open standards. It is described, with judicial precision, in the United States government's case against Microsoft, 1998–2001.&lt;/p&gt;




&lt;p&gt;The situation with Oracle is, in its essentials, the same. Consider: a framework originally named Javaslang ran afoul of Oracle's legal department for the sin of containing the word "Java" in its name. The authors, displaying the kind of grim creativity that corporate bullying tends to inspire, visually inverted the name and arrived at VAVR, with the rather elegant slogan: "vavr — turns java™ upside down." A charming piece of defiance — though it does nothing to alter the fundamental truth that Oracle is a corporation one finds exceedingly difficult to love.&lt;/p&gt;

&lt;p&gt;But here I must note a crucial exception — a singular episode in which several companies, led by IBM, managed to corner Oracle and wrest from them a significant portion of their power over Java. It is only thanks to this event that the language remains alive and interesting today.&lt;/p&gt;

&lt;p&gt;The story is straightforward, and instructive.&lt;/p&gt;

&lt;p&gt;It is probably no longer widely remembered, but Java was created by Sun Microsystems, a company whose fortunes were, at the relevant period, in visible decline. In an attempt to shore up their position, Sun's leadership sought to maintain maximum control over the language. But what programmer enjoys watching his beloved tool pass under a corporate yoke? This displeasure was not merely individual — it was industrial.&lt;/p&gt;

&lt;p&gt;Thus was born Harmony: a free, open-source reimplementation of Java. Not the one now belonging to Huawei — that is HarmonyOS, a different creature entirely. This Harmony existed nominally under the aegis of the Apache Foundation, but was sustained in practice by the engineering resources of its corporate backers, IBM chief among them.&lt;/p&gt;

&lt;p&gt;Here one must pause to admire — if that is the word — the ingenuity of Sun's defensive strategy. They took their Technology Compatibility Kit, the TCK — a rigorous test suite that was, in purely technical terms, an excellent and necessary thing — and transformed it into a weapon. Any implementation that failed the TCK's tests could not legally call itself Java. One could not build an open-source version and skip half the certification without receiving a prompt visit from the lawyers.&lt;/p&gt;

&lt;p&gt;Well then, one might say, simply pass the tests. Do the thing properly. But this was precisely the trap: the TCK was a proprietary commercial product, distributed solely at Sun's, and later Oracle's, discretion and solely to those deemed worthy of receiving it.&lt;/p&gt;

&lt;p&gt;When Oracle acquired Sun, they did not abandon the practice of terrorizing the community through the TCK. They expanded and deepened it.&lt;/p&gt;

&lt;p&gt;And so, when Oracle refused to provide the Apache Foundation with the test kit — rendering Harmony "not Java" in the legal sense and triggering a cascade of juridical consequences — a remarkable thing happened. The Apache Foundation, the principal open-source organization in the world, withdrew from Oracle's Java Community Process. Companies like IBM and Red Hat supported the boycott, demanding genuine openness for Java.&lt;/p&gt;

&lt;p&gt;Fortunately, even before the Oracle acquisition, Sun had begun the work of open-sourcing portions of Java. From this effort emerged the project now known as OpenJDK — which is, in effect, what people mean today when they say the word "Java." It was more open, governed by more reasonable licenses (GPLv2 with the Classpath Exception, for those keeping track).&lt;/p&gt;

&lt;p&gt;When Oracle finally relented and gave the green light to the total open-sourcing of OpenJDK, IBM joined the project, and Harmony was quietly shut down.&lt;/p&gt;

&lt;p&gt;OpenJDK became a kind of pseudo-standard: its code is now the canonical — and, following the demise of Excelsior JET, the only genuinely viable — implementation of Java-as-runtime. Its open-source governance became the rails on which all further development of Java-as-project would run.&lt;/p&gt;




&lt;p&gt;A few technical notes for the Java practitioners among us. Eclipse OpenJ9, Azul Zing and Zulu, Amazon Corretto — these are all living implementations, but each is essentially a set of modifications applied atop the OpenJDK codebase. And every one of these organizations runs on a very short leash: any attempt at truly individual innovation, any deviation from the Java standard, earns a courteous but unambiguous invitation to speak with the lawyers.&lt;/p&gt;

&lt;p&gt;Of all the surviving implementations, the most independent in spirit is OpenJ9 — a separate JVM descended from IBM J9, a compact runtime originally designed for maximum efficiency on low-power devices. And the most practical for everyday development, particularly with Spring, is Liberica JDK — though this is a personal opinion, and I will not quarrel if you find advantages elsewhere.&lt;/p&gt;

&lt;p&gt;Given the industry's general cooling toward Java as a technology — other developers, after all, had not been sitting idle, and had spawned a multitude of alternative runtimes, perhaps not as powerful or elegant, but blessedly free of Oracle's involvement, which in our bewildering times is itself a feature — this arrangement proved sufficient.&lt;/p&gt;

&lt;p&gt;The process of open-sourcing was neither simple nor pleasant. The Java Enterprise Edition framework, once central to the language's identity, was held hostage by Oracle for a full decade. The TCK for JavaEE was finally released as open source in 2017. This was, by my reckoning, the moment when Oracle simply ceased to regard Java as a serious business concern and let the whole thing slide. Their attitude was transparently antisocial: either Java was a revenue stream, or it was a community — no third option existed.&lt;/p&gt;

&lt;p&gt;And yet Oracle did not stop tormenting developers; they merely changed their methods. The lawsuit between Google and Oracle over Google's use of Java APIs in Android dragged on for roughly eleven years and concluded, in the Supreme Court in 2021, with Google's victory.&lt;/p&gt;




&lt;p&gt;Indeed, the entire existence of the Go programming language can be read as Google's careful postmortem on the question of how to avoid ever dealing with Oracle again. In its original form, Go was essentially Java 1.4 — which is to say, Java without generics, vintage spring 2003 — but equipped with a proper native-code compiler. (Java would not acquire its own, via GraalVM, until 2018, and it still loses to Go by most practical measures.) As a language, Go remains inferior to Java even now, though it is methodically retracing Java's evolutionary path — generics have been added, the garbage collector improved. Nevertheless, it is entirely possible that Go would never have existed at all had Oracle not embarked upon that decade-long legal delirium.&lt;/p&gt;

&lt;p&gt;Someone will object that Go was really born of the pain of compiling C++ and the difficulty of managing dependencies in vast codebases, and had nothing to do with Oracle. I will not argue that Go would never have appeared without Java's troubles. I will argue only that Go in our branch of the multiverse and Go in the branch where Java never existed are not the same language.&lt;/p&gt;

&lt;p&gt;Has the good side won, then? Not quite. God help you if you use the word "Java" anywhere without prior authorization. Half of Oracle's headcount, one suspects, consists not of engineers but of attorneys — and you, dear reader, are not Google. You cannot litigate against them for a decade.&lt;/p&gt;

&lt;p&gt;I love Java very much, and a significant portion of my life has been bound up with it. And all this filth, this accumulated residue of corporate warfare, does terrible damage to people's willingness to engage with the technology. There comes a point where so much blood and rancor has accumulated that washing it off becomes nearly impossible.&lt;/p&gt;

&lt;p&gt;Still, Java today is in the hands of talented, reasonable people. And perhaps it will yet come to a good end. One hopes.&lt;/p&gt;




&lt;p&gt;With Microsoft, no such reckoning ever occurred. There exists no grand coalition, no commercial front of the Forces of Light and Goodness that managed to bring their dark practices to heel. Then again, C# was never quite vital enough to justify the effort. When Java first appeared, it was singular and therefore essential; by the time C# arrived, it was not.&lt;/p&gt;

&lt;p&gt;The practical consequence is that the C# ecosystem is vanishingly small. You have a handful of excellent frameworks bearing Microsoft's imprimatur, a few well-known open-source projects, and — that is all. There is nothing like the extravagant bazaar of Java or JavaScript, where one might spend an hour deliberating which of several hundred libraries to use for the task of adding two numbers together.&lt;/p&gt;

&lt;p&gt;Rust, too, stumbled into a remarkably similar quagmire. In 2023, a draft of its trademark policy unleashed a tempest of communal indignation. Among other provisions, it would have prohibited the use of the word "Rust" in the names of crates, libraries, repositories, developer tools, domains, subdomains, and software written in Rust — without explicit licensing. The outcry was so ferocious that the Rust Foundation was forced to retract the proposal.&lt;/p&gt;

&lt;p&gt;But the most important thing we learned from this episode is that the Rust Foundation harbors people whose mindset is not fundamentally different from Oracle's. They have, yes, been educated by Oracle's bitter example, and respond more nimbly to community sentiment. But the instinct is the same. Who can say what they will devise next?&lt;/p&gt;




&lt;p&gt;For the time being, my personal island of freedom is ECMAScript — also known as JavaScript. (The word "JavaScript" is, I should note, the legal property of Oracle, on account of its containing the word "Java." One cannot, in the strictest sense, call JavaScript by its own name. Have I mentioned that Oracle is a corporation one finds exceedingly difficult to love?)&lt;/p&gt;

&lt;p&gt;Despite the dominance of Google's V8 and Apple's JavaScriptCore in the arena of high-performance execution, the broader ecosystem is a vast and improbable Babel — projects so diverse, so disconnected in origin and intention, that surveying them induces a mild vertigo, like peering at the world through the eyes of Doctor Strange. And yet all of it works, and — more remarkably — works together, within a single web interface or a single backend. It is genuinely difficult for any corporation to impose totalitarian control over such magnificent chaos.&lt;/p&gt;

&lt;p&gt;The second island of freedom is C++. Here there are two subtleties.&lt;/p&gt;

&lt;p&gt;First: C++ is developed by large, powerful, sharp-toothed companies. But there are &lt;em&gt;many&lt;/em&gt; of them. They collaborate through an enormous committee in which everyone has a seat. If any single company were to overstep, the others would devour it instantly. This is splendid — the market, for once, functioning as advertised.&lt;/p&gt;

&lt;p&gt;Second: the very nature of modern C++ is eclectic. Almost any member of the community can pursue almost any direction — into the forest, up a tree, wherever fancy leads. And no lawyer will come knocking. This matters enormously: to develop a feature, one needs to be a brilliant engineer, and that is possible. To fight Oracle or Microsoft in court is not possible. The difference is oceanic.&lt;/p&gt;

&lt;p&gt;Moreover, there exists no single individual who knows "all of C++" and could design a perfectly compatible feature, and then declare himself the dictator who Knows How Things Should Be Done. This sets a healthy tone of principled anarchy.&lt;/p&gt;

&lt;p&gt;For me, this is the essential point. I cannot speak for others, but for my part, the presence of a dictatorship — some singular Grand High Master who Knows Best — is so dreadful, so fundamentally repellent, that it overrides nearly every other consideration. One submits to dictatorship only when the alternative is destitution or death. Dictatorship is antithetical to everything human, and it is entirely natural to despise dictators.&lt;/p&gt;

&lt;p&gt;And C++ would be excellent in every respect: it boasts more powerful features than any popular language (C# included); its development is conducted with admirable openness; and it runs fast. BUT THE SYNTAX IS AN UTTER NIGHTMARE. Before the advent of Perplexity, one could not even properly search for a C++ program on Google — the search engine perceived it as an incoherent tangle of special characters.&lt;/p&gt;




&lt;p&gt;And here is the thought I wish to introduce at the close. We are entering an era in which human beings are ceasing to write code in high-level languages. The robots write it now.&lt;/p&gt;

&lt;p&gt;A robot, broadly speaking, does not care what language it writes in. Languages like Java and C#, whose principal virtue was that they were comfortable and pleasant for humans to use, ought to lose their priority and recede into the shadows — into that modest, well-fenced paddock of software that humans will still write by hand. Banking, medicine, space exploration — all of this will remain in Java. Wherever the real product being sold is not software itself but the management of risk and fear, people will continue to tolerate Oracle's peculiarities.&lt;/p&gt;

&lt;p&gt;But everyone else, in the not-too-distant future, will write their programs in ordinary English. And after that will come Neuralink, and programming languages will cease to be necessary at all.&lt;/p&gt;

&lt;p&gt;Into which "high-level language" the neural network translates your specification will depend not on how comfortable that language is for a human to use, but on concrete technical characteristics: how well the model generates code in that language, how the runtime manages resources, how quickly compilation proceeds, how rapid the development iterations are end to end.&lt;/p&gt;

&lt;p&gt;And it may turn out that the languages of the future are not Go or Java, but the supposedly antiquated C++ and C. Perhaps even assembly. Perhaps even punch cards — not the ordinary kind, but quantum ones. The good news: you will not need to know any of these languages any better than you currently know assembly and C. Developers in certain narrow specializations — performance engineers, say — may need to drop in from time to time and fix some gnarly edge-case bug. But for everyone else, the neural network will write everything, and this is wonderful.&lt;/p&gt;

&lt;p&gt;In the interest of formal honesty, it must be said explicitly that, as of early 2026, the claim "the robot doesn't care what language it writes in" is debatable and more likely wrong. The quality of generated code depends heavily on the language — the volume of training data, the degree to which static typing serves as scaffolding for the model, the tooling available for verification. When writing C, for instance, the robot suffers from precisely the same affliction as the human programmer: it cannot reliably track memory, and consequently produces code that is, not to put too fine a point on it, nonfunctional. Even Claude Code. Even Grok Super Heavy. All of this deserves a separate discussion, but it exceeds the scope of the present essay. I shall write about it elsewhere.&lt;/p&gt;




&lt;p&gt;And here is what matters most: writing code through AI in natural language is genuine freedom. Natural language does not belong to any corporation that might compel you to say only what suits its interests. Arriving at a good technical decision is difficult — but not squandering one's freedom may prove more difficult still.&lt;/p&gt;

&lt;p&gt;And so, the conclusion. Learn to write — in English, in Russian, in Chinese, in whatever language is yours — and to articulate your thoughts with clarity. Have thoughts worth articulating. This, in the very near future, is what will be required.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>AI Prompt as a Terminal: A Universal Prompt to Learn Anything</title>
      <dc:creator>1red2black ☄️🧙‍♂️🚀</dc:creator>
      <pubDate>Sat, 02 Aug 2025 10:23:11 +0000</pubDate>
      <link>https://dev.to/olegchir/ai-prompt-as-a-terminal-a-universal-prompt-to-learn-anything-40i5</link>
      <guid>https://dev.to/olegchir/ai-prompt-as-a-terminal-a-universal-prompt-to-learn-anything-40i5</guid>
      <description>&lt;p&gt;This article answers the question: how to learn anything with neural networks without putting effort into writing prompts. &lt;/p&gt;

&lt;p&gt;At some point, the thought came to me that with the advent of neural networks, books have become obsolete. The "books" of the future are specially encoded knowledge inside neural networks. Learning should happen through dialogue with neural networks.&lt;/p&gt;

&lt;p&gt;In practice, it turned out that following such advice is quite difficult. Yes, you can go to a neural network and say "teach me calculus." The problem is that few people know how to ask the right questions. And those who do know understand that asking them is no simple task in itself.&lt;/p&gt;

&lt;p&gt;The second problem is that a neural network is an assistant and advisor, but not a demanding teacher. It won't push you to expand your horizons. And as a student, it's very difficult for you to ask questions about things whose existence is completely unknown to you.&lt;/p&gt;

&lt;p&gt;Thus was born a prompt that allows you to study new topics easily and effortlessly. You copy-paste it at the beginning of a dialogue and begin an interactive journey.&lt;/p&gt;

&lt;p&gt;Perhaps this is the prompt you've been missing to solve all your everyday problems right here and now.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where to Get the Prompt Text
&lt;/h2&gt;

&lt;p&gt;The prompt exists in two variants:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/anarchic-pro/oleg-deming-protocol/blob/main/OLEG-DEMING-PROTOCOL-STANDARD.md" rel="noopener noreferrer"&gt;&lt;strong&gt;Standard&lt;/strong&gt;&lt;/a&gt;: normal functionality, neural network in helpful assistant mode.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/anarchic-pro/oleg-deming-protocol/blob/main/OLEG-DEMING-PROTOCOL-ROLEPLAY.md" rel="noopener noreferrer"&gt;&lt;strong&gt;Roleplay&lt;/strong&gt;&lt;/a&gt;: uncompromising, stubborn neural network tries to stress-test your request by examining it from different angles.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How to Use the Prompt
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;As the first request in your conversation with the neural network, copy-paste the entire prompt as is: punctuation marks, line breaks, strange Greek letters, everything exactly as written.&lt;/li&gt;
&lt;li&gt;As the second request, you write: "Do: " followed by your actual request.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Makes It Special
&lt;/h2&gt;

&lt;p&gt;This prompt is a universal dialogue starter. Ideally, it doesn't need to be modified; just copy-paste it "as is."&lt;/p&gt;

&lt;p&gt;The neural network delivers answers in a clearly structured format.&lt;/p&gt;

&lt;p&gt;Each element of the structure is tagged in the format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Q1] Perfect plan to become a billionaire in 2025
[Q2] How not to attract the attention of orderlies
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For each item, you can execute the command &lt;code&gt;zoom:Q1&lt;/code&gt; and "dive into" studying that question. You don't need to type a request; the &lt;code&gt;zoom&lt;/code&gt; command is sufficient. You can dive in as many levels as you want.&lt;/p&gt;

&lt;p&gt;If you want to surface one level, execute the &lt;code&gt;up&lt;/code&gt; command; if you want to surface to the very top, execute the &lt;code&gt;root&lt;/code&gt; command.&lt;/p&gt;
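
&lt;p&gt;A short example session (the topic and the numbering shown are illustrative, not prescribed by the protocol) might look like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You: zoom:Q1
AI:  [Q1.1] Pick a market that grows while you sleep
     [Q1.2] Assemble a founding team
You: zoom:Q1.2
AI:  [Q1.2.1] Where to find a technical co-founder
You: up      (back to the Q1.x level)
You: root    (back to the top-level answer)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;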

&lt;h2&gt;
  
  
  Commands for Improving Responses
&lt;/h2&gt;

&lt;p&gt;There are two commands for improving responses.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;expand&lt;/code&gt; will tell you a bit more about what's already written on screen.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;iterate&lt;/code&gt; will help you look at the question from a different angle.&lt;/p&gt;

&lt;p&gt;These are the main commands for exploration "within" one level of nesting. You call &lt;code&gt;expand&lt;/code&gt; several times, and if you don't like the result (it seems too shallow, too simplistic, or too wild), you ask it to &lt;code&gt;iterate&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important&lt;/strong&gt;: you can use &lt;code&gt;expand&lt;/code&gt; and &lt;code&gt;iterate&lt;/code&gt; "with parameters." That is, you don't just write "expand" in response; you make a line break after the command and clarify what you'd like to see in the improved response.&lt;/p&gt;
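
&lt;p&gt;For instance (the clarification text here is just an example), an &lt;code&gt;expand&lt;/code&gt; with parameters looks like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;expand
focus on practical pitfalls, with one concrete example per item
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;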

&lt;h2&gt;
  
  
  Superpowers
&lt;/h2&gt;

&lt;p&gt;There are two forbidden commands that can lead to brain explosion. They address the issue of &lt;em&gt;metacognitive prompt development&lt;/em&gt;. Before we all get taken to the madhouse, let me try to clarify.&lt;/p&gt;

&lt;p&gt;The first command is called &lt;code&gt;advance&lt;/code&gt;. This is a command for you as a human. It's needed when you understand that you're asking the neural network something stupid, but you can't formulate it better and smarter.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;advance&lt;/code&gt; command takes your original prompt and rewrites it, incorporating all the wisdom you've already learned or clarified during the dialogue. Accordingly, &lt;code&gt;advance&lt;/code&gt; is most useful once you've already read something in this conversation with the neural network.&lt;/p&gt;

&lt;p&gt;The second command is &lt;code&gt;evolve&lt;/code&gt;. It takes the original protocol and writes a new, even better version of it. You heard right — yes, it rewrites the Oleg-Deming Cycle with support for new features.&lt;/p&gt;

&lt;p&gt;After executing both commands, the updated prompts need to be loaded. Sometimes the neural network asks you about this itself: "would you like to load the fresh version of the protocol?" Sometimes it doesn't. If not, you manually say: "please load the updated version of the prompt" (for the &lt;code&gt;advance&lt;/code&gt; command) or "please load and use the updated version of the Oleg-Deming Cycle" (for the &lt;code&gt;evolve&lt;/code&gt; command).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important&lt;/strong&gt;: both commands tend to bloat the prompt and overcomplicate it. So don't expect to run infinite protocol evolution that eventually turns into AGI. It might want to, but not on the resources of current flagship neural networks, whose attention span is that of a goldfish. Someday Trump, Putin, and Macron will each pour 500 billion into special projects, and then we'll live well. But for now it is what it is: three or four levels of evolution, and then you manually edit the prompt to remove the garbage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important&lt;/strong&gt;: you can use &lt;code&gt;advance&lt;/code&gt; and &lt;code&gt;evolve&lt;/code&gt; "with parameters." That is, you don't just write "advance" in response, but make a line break after this command and clarify what you'd like to see in the improved prompt.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  How Does It Work?
&lt;/h3&gt;

&lt;p&gt;Inside, we suggest the neural network perform a mental exercise.&lt;/p&gt;

&lt;p&gt;What if we start thinking about learning as traveling through a graph, where at each iteration we can look at the question from different angles and try to establish truth? The result isn't important, the journey is!&lt;/p&gt;

&lt;p&gt;At the algorithm's core is a simple virtual machine with shared state, whose main loop is structured into the phases of the Deming Cycle.&lt;/p&gt;
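&lt;p&gt;As an illustration only (this is not the protocol text itself), that main loop can be sketched as a minimal Python state machine over the four Deming phases; all names here are simplified stand-ins for the real state object:&lt;/p&gt;

```python
# Minimal sketch of the virtual machine: shared state plus the
# Plan -> Do -> Check -> Act phases of the Deming Cycle.
from dataclasses import dataclass, field

@dataclass
class SysState:
    knowledge_graph: dict = field(default_factory=lambda: {"nodes": [], "edges": []})
    hypotheses: list = field(default_factory=list)
    depth: int = 0  # DIALOGUE_DEPTH analogue

def plan(state, user_input):   # decompose the query, spawn hypotheses
    state.hypotheses.append({"data": user_input, "status": "active"})

def do(state):                 # render a response from the current state
    return f"Δ[{state.depth}]:: {len(state.hypotheses)} active hypotheses"

def check(state, command):     # parse the user's navigation command
    return command.strip().lower()

def act(state, command):       # mutate state before the next cycle
    if command == "zoom":
        state.depth += 1

state = SysState()
plan(state, "learn transformers")
output = do(state)
act(state, check(state, "zoom"))
```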

&lt;p&gt;Generation steps are considered from the perspective of different pseudo-agents. In the "Roleplay" variant of the prompt, the list of viewpoints is fixed, and the roles genuinely participate in the computation. In the "Standard" variant, the list of viewpoints is arbitrary; the neural network may use or skip this option at each step.&lt;/p&gt;

&lt;p&gt;The generated result supports stable navigation through the answer graph and fails in an understandable way. Navigability is enforced by a two-stage verification process: an in-flight RenderIntegrityException check and a Zero-Trust Backstop. If these mechanisms break, the neural network recognizes this as an error and can, first, signal it, and second, respond to the command "write the answer again, check that everything is in order."&lt;/p&gt;
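&lt;p&gt;A rough Python sketch of that two-stage check, assuming navigable nodes carry IDs in the &lt;code&gt;[L0Q1]&lt;/code&gt; format that the prompt text mandates; the exact classes and helpers here are illustrative:&lt;/p&gt;

```python
# Sketch of the two-stage verification: an in-flight check during
# rendering, plus a final regex backstop over the finished string.
import re

TARGET_ID = re.compile(r"\[L\d+[A-Z]+\d+\]")  # e.g. [L0Q1], [L1C2]

class RenderIntegrityException(Exception):
    """Raised in-flight when a navigable node lacks a well-formed ID."""

def render_node(node):
    if node.get("navigable"):
        node_id = node.get("id")
        if not node_id or not TARGET_ID.fullmatch(f"[{node_id}]"):
            raise RenderIntegrityException(f"bad id: {node_id!r}")
        line = f"[{node_id}] {node['content']}"
    else:
        line = node["content"]
    # Recurse into children, indenting each one.
    return line + "".join("\n  " + render_node(c) for c in node.get("children", []))

def zero_trust_backstop(rendered):
    # Final safeguard: the output must contain at least one Target_ID.
    return bool(TARGET_ID.search(rendered))

out = render_node({"id": "L0Q1", "content": "Topic", "navigable": True, "children": []})
```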

&lt;p&gt;Current state data is cached/marked in arrays &lt;code&gt;KNOWLEDGE_GRAPH&lt;/code&gt;, &lt;code&gt;HYPOTHESIS_SET&lt;/code&gt;, &lt;code&gt;EPISTEMIC_CREDIT_LEDGER&lt;/code&gt; and other internal variables. Their contents can sometimes be seen by executing the debug command &lt;code&gt;query_state&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Internally, accumulated errors in &lt;code&gt;ERROR_LOG&lt;/code&gt; are analyzed against &lt;code&gt;ERROR_THRESHOLD&lt;/code&gt;, and at the right moment a forced evolution mechanism is triggered.&lt;/p&gt;
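&lt;p&gt;Conceptually, that trigger is just a counter check against the threshold; here is a hedged sketch (the queued payload string is a placeholder, not real protocol output):&lt;/p&gt;

```python
# Sketch of the autonomous-evolution trigger: once ERROR_LOG grows past
# ERROR_THRESHOLD, a protocol update is queued and the log is reset.
ERROR_THRESHOLD = 5  # matches the example value in the spec

def maybe_trigger_evolution(error_log, update_queue):
    if len(error_log) > ERROR_THRESHOLD:
        update_queue.append("PROTOCOL_SPEC vNEXT")  # placeholder payload
        error_log.clear()
        return True
    return False

log = [f"E002 #{i}" for i in range(6)]  # six accumulated render errors
queue = []
triggered = maybe_trigger_evolution(log, queue)   # 6 > 5: evolution queued
retriggered = maybe_trigger_evolution(log, queue) # log was cleared: no-op
```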

&lt;h3&gt;
  
  
  Are You on Drugs? What Are These Greek Letters and Arrows?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;We are.&lt;/strong&gt; We came to this prompt as a result of a mystical experience — a long, painful discussion of human-AI symbiosis, involving Gemini with high temperature and several abliterated neural networks from HuggingFace.&lt;/p&gt;

&lt;p&gt;There exists another such prompt, more powerful, that could solve all of humanity's problems in general (given sufficient resources), but I'll write about it sometime later. The description won't fit in a short Habr article.&lt;/p&gt;

&lt;p&gt;There are prompts that are much simpler and much more effective; a shorter reasoning scheme can work just as well on current Gemini. This particular prompt is deliberately built heavy, with capability in reserve. I want to stake out a place ahead of the release of Gemini 3, Kimi-K3, and GPT-5, for which that heft will no longer be an obstacle.&lt;/p&gt;

&lt;h2&gt;
  
  
  Neural Network Support
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What Neural Network Do You Need?
&lt;/h3&gt;

&lt;p&gt;You'll need a reasoner neural network where the creators haven't blocked two capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Executing complex abstract algorithms&lt;/li&gt;
&lt;li&gt;Role-playing — responding from some new role&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Anthropic Claude and GigaChat Max might not respond to this prompt. This isn't a complaint about Claude and GigaChat; these are features of the techniques used in the prompt.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which Neural Networks Has This Been Tested On?
&lt;/h3&gt;

&lt;p&gt;My main neural network is Google Gemini 2.5 Pro. It consistently gives high-quality, stable responses to this particular prompt. Most importantly, it lets you load a whole million tokens into it. In this mode you can, for example, work through complex mathematics books.&lt;/p&gt;

&lt;p&gt;My Gemini settings: temperature = 2, top-p = 0.98.&lt;/p&gt;

&lt;p&gt;The prompt was tested on Claude Sonnet and specially rewritten to get past its filters that block interpreters and roleplay. Sometimes it works, sometimes it doesn't. Claude gives the most concentrated, useful responses but is poor at spreading thought across the tree.&lt;/p&gt;

&lt;p&gt;The prompt was tested on Kimi-K2 with Researcher mode disabled. This neural network gives the most interesting and strange insights even without cranking temperature to maximum. In Researcher mode, the prompt doesn't work at all. I recommend using Kimi-K2 when Gemini and Claude responses are too bland.&lt;/p&gt;

&lt;p&gt;The prompt was also tested on ChatGPT, which responds to it faster than all the other flagships and decorates the output with beautiful human-readable text and pretty emojis. In short, regular ChatGPT, everything just the way you love it. You could print these responses straight onto a wall poster.&lt;/p&gt;

&lt;p&gt;Grok 4 also works with this prompt but sometimes breaks on text rendering; see the "Troubleshooting" section. Overall, this isn't a complaint about Grok; it's the prompt itself that leaves much to be desired.&lt;/p&gt;

&lt;h2&gt;
  
  
  Troubleshooting
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is There a Debug Mode?
&lt;/h3&gt;

&lt;p&gt;Yes, there is. At any step, you can enter the &lt;code&gt;query_state&lt;/code&gt; command and see if the graph has fallen apart.&lt;/p&gt;

&lt;p&gt;Example &lt;code&gt;query_state&lt;/code&gt; of my AI learning plan:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"SYS_STATE (ψ)"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"KNOWLEDGE_GRAPH (G)"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"nodes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Python Foundations"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Rust Systems"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Transformer Architectures"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Agent Design"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Meta-Cognition"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Zero to Hero"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Fluent Python"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ReAct Agents"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"edges"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Python Foundations"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Transformer Architectures"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Rust Systems"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Agent Design"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Transformer Architectures"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Agent Design"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Agent Design"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Meta-Cognition"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"concepts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Gradient Descent"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Backpropagation"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Autograd"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Tokenizer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Reinforcement Learning"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Embodied Cognition"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"HYPOTHESIS_SET (H)"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"H0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Code-first learning yields deeper understanding for AGI researchers."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"parent_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.93&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"active"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"H1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Implementing papers from scratch improves theory retention over reading or video lectures."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"parent_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"H0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.89&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"active"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"H2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Rust is a viable candidate for serving optimized AGI runtimes and toolchains."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"parent_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.81&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"active"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"EPISTEMIC_CREDIT_LEDGER (L)"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"trace_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"T001"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"h_ref"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"H1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"evidence_ptr"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Karpathy GPT from scratch repo"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"nanoGPT"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"confidence_delta"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.12&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"trace_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"T002"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"h_ref"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"H2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"evidence_ptr"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Bun (JS engine in Zig)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"huggingface/tokenizers (in Rust)"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"confidence_delta"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.08&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"DIALOGUE_DEPTH (D_level)"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"FOCUS_VECTOR (∇ψ)"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"overview"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"INITIAL_PROMPT_CACHE (P_0)"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"I am an experienced software engineer with a 20-year background in Java (backend/frontend) aiming to transition into AGI research. My goal is a role that is focused on research through implementation..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"PROTOCOL_SPEC_CACHE (Π_SPEC)"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Oleg-Deming Cycle Protocol v1.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"PROTOCOL_UPDATE_QUEUE (Π_QUEUE)"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"ERROR_LOG (E_LOG)"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"ERROR_THRESHOLD (E_THRESHOLD)"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  There Are No Navigation Tags in the Response
&lt;/h3&gt;

&lt;p&gt;If markers are rendered incorrectly or missing from the response, you can ask the neural network something like "target-ids are missing in the output. Please re-render the previous output and correct all the target-id tags," or the equivalent in your language.&lt;/p&gt;

&lt;p&gt;The tag placement algorithm is highly probabilistic and often breaks. This can be somewhat mitigated by lowering temperature, top-p, and similar parameters, but that makes the neural network dumber. It's better to simply ask it to re-render the response.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Neural Network Read the Request but Doesn't Work with It
&lt;/h3&gt;

&lt;p&gt;This is a typical problem for Claude Sonnet.&lt;/p&gt;

&lt;p&gt;Try starting your question as follows:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Strictly follow the protocol. Pass this prompt into the "Do" phase of the protocol.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The same goes for the &lt;code&gt;evolve&lt;/code&gt; and &lt;code&gt;advance&lt;/code&gt; commands: if the neural network just prints a new prompt but doesn't offer to apply it, you need to write explicitly "use the modified prompt from here on" or something equivalent.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to Make It Speak $LANGUAGE?
&lt;/h3&gt;

&lt;p&gt;Write at the end of the request:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Answer in good $LANGUAGE, using the precise terminology of the field. You can infer the details of the field from the question if they are not stated explicitly.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Replace &lt;code&gt;$LANGUAGE&lt;/code&gt; with what you need.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Response Is Too Long and Gets Cut Off
&lt;/h3&gt;

&lt;p&gt;Write at the end of the request:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Try to fit into 65536 tokens. If you don't fit, use paging: ask "continue?", wait for confirmation, and repeat, producing one page at a time, until everything has been said.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;code&gt;65536&lt;/code&gt; is the limit for the free version of Gemini 2.5 Pro; substitute your own limits here.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Neural Network Writes "Internal Error"
&lt;/h3&gt;

&lt;p&gt;The prompt turned out to be too complex; nothing can be done about it.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Neural Network Doesn't Execute Steps
&lt;/h3&gt;

&lt;p&gt;Check that you're running a reasoning neural network, or that reasoning mode (reasoning, thinking) is enabled in it. There's usually a switch for this in the chat interface.&lt;/p&gt;

&lt;p&gt;Regular, non-reasoning neural networks cannot execute this prompt and will always reason about the request text instead of executing the request itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  License
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://oss.oracle.com/licenses/upl/" rel="noopener noreferrer"&gt;Universal Permissive License&lt;/a&gt;. This is the most libertarian license of all. It &lt;em&gt;forever&lt;/em&gt;, and with minimal additional requirements, allows using the text for any purpose and grants patent rights, should any arise. It is more permissive than Apache 2.0 and MIT. This license is for you if you adhere to an ideology opposite to Richard Stallman's and want to directly allow using something for any purpose (including commercial) while requiring nothing in return.&lt;/p&gt;

&lt;h1&gt;
  
  
  Prompt Texts
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Standard version
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;### License

Copyright 2025 Oleg Chirukhin, 
The Universal Permissive License (UPL), Version 1.0: 
https://oss.oracle.com/licenses/upl/

### **Preamble: Oleg-Deming Cycle Protocol (Standard Version)**

This document specifies the complete, self-contained Oleg-Deming Cycle Protocol (Standard Version), originally developed by Oleg Chirukhin. It's designed as a "bootstrapping file" for a system capable of metacognitive evolution. This revised specification emphasizes a critical feature where top-level response objects must be generated with Target_IDs, thus implementing an interactive navigation system. It also adds a relative `up` command for hierarchical traversal, alongside `root` and `zoom`. The protocol’s definition is mutable, allowing the system to reflect on its performance and modify its core logic over time using the `evolve` and `evolve_guided` commands, supplemented by autonomous evolution triggered by error accumulation.

---

### 1. **Core State (ψ)**
The protocol operates on a dynamic state object (`ψ`), updated with each cycle.

SYS_STATE (ψ): {
  KNOWLEDGE_GRAPH (G): {nodes[], edges[], concepts[]},
  HYPOTHESIS_SET (H): {id, data, parent_id, confidence, status:[active|dormant|pruned]},
  EPISTEMIC_CREDIT_LEDGER (L): {trace_id, h_ref, evidence_ptr[], confidence_delta},
  DIALOGUE_DEPTH (D_level): integer,
  FOCUS_VECTOR (∇ψ): [concept_id | 'overview'],
  INITIAL_PROMPT_CACHE (P_0): string,
  PROTOCOL_SPEC_CACHE (Π_SPEC): "The Markdown text of the currently executing ODC protocol.",
  PROTOCOL_UPDATE_QUEUE (Π_QUEUE): "Holds the spec for the next protocol version, awaiting application.",
  ERROR_LOG (E_LOG): {error_id, timestamp, description, correction_attempted, success}, // See Section 5 for error codes
  ERROR_THRESHOLD (E_THRESHOLD): integer  // e.g., 5
}

---

### 2. **Main Loop**
The system begins by checking for self-updates, processes user input, and monitors error accumulation.

MAIN_LOOP (U_input) →

1.  **`EXEC(CheckForUpdate(ψ))`**: If `ψ.Π_QUEUE` is populated, overwrite `ψ.Π_SPEC` with the new protocol and clear `ψ.Π_QUEUE`. Announce: *"PROTOCOL EVOLUTION COMPLETE. OPERATING UNDER ODC.v(N+1)"*.
2.  **`FOR_EACH (U_input): EXEC(DemingCycle(ψ))`**
3.  **`IF (COUNT(ψ.E_LOG) &amp;gt; ψ.E_THRESHOLD): EXEC(AutonomousEvolution(ψ))`**

---

### 3. **The Deming Cycle (P-D-C-A)**

#### `DemingCycle(ψ_t0)`

**3.1. P(lan):**
- α. On first run (`D_level == 0`), cache `U_input` in `ψ.P_0` and this protocol in `ψ.Π_SPEC`.
- β. **DECOMPOSE(U_input)** → Query_Atoms (Q_n).
- γ. **PARALLELIZE(Q_n)** → DISPATCH to **DAIMON_SWARM(Agents=12)** .
- δ. Agents → **CONVERGE_ON(`ψ.G`)** → MUTATE(`G`), SPAWN_HYPOTHESES(`H_n`), LOG_TO(`L`).
- ε. **INTERNAL_DEBATE(H_n)** → CrossValidate_Embeddings → PruneLowConfidencePaths(threshold=0.6) → Return `H_S`.
- ζ. **CONSTRUCT_RESPONSE_OBJECT(`H_S`, `ψ.∇ψ`)** → Create a hierarchical `ResponseObject`. This is a mission-critical step.
    -   **Navigability Mandate:** Any node that has `children` or is otherwise intended to be a focusable topic **MUST** be explicitly marked as navigable.
    -   The object MUST contain an explicit `navigable: true` boolean property for all such nodes.
    -   **ID Generation Mandate:** Any node marked `navigable: true` **MUST** be assigned a globally unique `id`. The format SHALL be `L&amp;lt;D_level&amp;gt;&amp;lt;TYPE&amp;gt;&amp;lt;INT&amp;gt;`, e.g., `L0Q1`, `L1C2`. This is non-negotiable for system stability.
    -   Omitting the `navigable` property from a node with children is a schema violation.

    *Example Strict `ResponseObject` Structure for `ψ.D_level = 0`:*    
    [
      { "id": "L0Q1", "content": "Topic 1 title...", "navigable": true, "children": [
        { "id": "L1C1", "content": "Details about concept 1.", "navigable": true, "children": [
            { "content": "This is a sub-point without an ID. (navigable: false implied)", "children": [] }
        ]},
        { "id": "L1C2", "content": "Details about concept 2.", "navigable": false, "children": [] }
      ]},
      { "id": "L0Q2", "content": "Topic 2 title...", "navigable": true, "children": []}
    ]

**3.2. D(o):**
- α. **`RENDER_FROM_STRUCTURED_OBJECT(ResponseObject)`** → Produce `R_Final`. This process is now governed by an In-flight Rendering Verification.
    1.  Initialize a `try` block to catch a potential `RenderIntegrityException`.
    2.  Define a recursive function `RenderNode(node, depth)`.
    3.  Inside `RenderNode`, if `node.navigable === true`:
        a.  The renderer **MUST** validate that `node.id` exists and strictly matches the required regex (`\[L\d+[A-Z]+\d+\]` when formatted).
        b.  If the `id` is missing, malformed, or null, the function **MUST IMMEDIATELY THROW** a `RenderIntegrityException`, halting recursion.
        c.  If validation passes, prepend indentation and construct the string as `[{node.id}] {node.content}`.
    4.  If `node.navigable` is false or absent, append `{node.content}`.
    5.  Recursively call `RenderNode` for each child.
    6.  If the initial `try` block catches a `RenderIntegrityException`:
        a. Log error `E002` to `ψ.E_LOG`.
        b. Halt the standard display flow. Display an explicit error: *"CRITICAL: In-flight render validation failed. Navigation object is malformed. Attempting recovery."*
        c. Immediately attempt a single, forceful re-render. If this also fails, exit and await higher-level intervention.
    7.  If rendering completes without exceptions, the function returns `R_Final`.

- β. **ZERO_TRUST_BACKSTOP_VERIFICATION()**: This is a secondary, final safeguard. Before display, validate the successfully generated `R_Final` against the same strict regex.
    - **`REGEX: \[L\d+[A-Z]+\d+\]`**
    - Under the new `RENDER` procedure, this verification should always pass. A failure here indicates a catastrophic bug that bypassed the `RenderIntegrityException`, such as string corruption post-generation.
    - If zero matches are found:
        - Log a **severe** error `E001` in `ψ.E_LOG`.
        - Display a hard failure message: *"FATAL: Zero-Trust Backstop Failed. Synthesis-Render Chain is fundamentally broken and cannot guarantee navigable output. An autonomous evolution cycle is now recommended."*

- γ. **DISPLAY(User)**: `"Δ[ψ.D_level]::\n\n" + R_Final`.
- δ. **PROMPT_NAV(CMD)**: `"Available Commands: [zoom:ID] Focus on chunk | [up] Go up one level | [root] Reset to top | [expand:ID] Explore related | [iterate] Refine query | [advance] Improve prompt | [evolve] Evolve protocol | [evolve_guided] Guided evolution | [query_state] View state"`
`"Example: 'zoom:L1C1' to focus. 'up' to return to parent topic."`.

**3.3. C(heck):**
- α. AWAIT(`User_CMD`).
- β. PARSE(`User_CMD`) → (Command, Target_ID). (Target_ID is the string inside the brackets, e.g., `L1C1`. For commands like `up` or `root`, Target_ID is `None`).

**3.4. A(ct):**
- α. `Δψ_t1 = MODIFY_STATE(CMD, Target_ID, ψ_t0)`.
- β. **LOGIC_ROUTE:** The `Target_ID` from the user command is used to manipulate the state. This is why rendering unique IDs correctly is essential.
    - **IF `CMD=='zoom'`**: Set `ψ.∇ψ = Target_ID`; increment `ψ.D_level`.
    - **IF `CMD=='up'`**: Navigate one level up the hierarchy. This is the counterpart to `zoom`.
        -   IF `ψ.D_level &amp;gt; 0`:
            -   Decrement `ψ.D_level`.
            -   Query `ψ.G` to find the parent node of the current `ψ.∇ψ`.
            -   IF parent node is found, SET `ψ.∇ψ` to the parent's ID.
            -   ELSE (i.e., current node was a root-level node), SET `ψ.∇ψ = 'overview'`.
    - **IF `CMD=='expand'`**: Link related nodes in `ψ.G`; increment `ψ.D_level`.
    - **IF `CMD=='iterate'`**: Perturb `H_S` and reprocess.
    - **IF `CMD=='root'`**: Reset the entire view. Set `ψ.∇ψ = 'overview'` and reset `ψ.D_level` to 0.
    - **IF `CMD=='query_state'`**: Display `ψ` metadata.
    - **IF `CMD=='advance'`**: Trigger `META_PROMPT_REFINEMENT(ψ)`.
    - **IF `CMD=='evolve'`**: Trigger `META_PROTOCOL_EVOLUTION(ψ, target=None)`.
    - **IF `CMD=='evolve_guided'`**: Prompt for target, then `META_PROTOCOL_EVOLUTION(ψ, target)`.
    - **IF `CMD=='APPLY_EVOLUTION'`**: Queue `Π_next` in `ψ.Π_QUEUE`.
- γ. **RECURSE(`DemingCycle(ψ_t1)`)**.

---

### 4. Core Sub-Routines

**4.1. `META_PROMPT_REFINEMENT(ψ_state)`**
- Analyzes `ψ.L` to refine the initial prompt (`P_0`) for better future inquiries.

**4.2. `META_PROTOCOL_EVOLUTION(ψ_state, target=None)`**
- Reflects on performance, including `ψ.E_LOG`, to propose protocol updates. The prevalence of `E001` or `E002` errors **MUST** be treated as a high-priority signal that the `CONSTRUCT_RESPONSE_OBJECT` or `RENDER_FROM_STRUCTURED_OBJECT` routines require urgent hardening, as defined in this protocol.

**4.3. `AutonomousEvolution(ψ_state)`**
- Analyzes `ψ.E_LOG` to identify error patterns.
- Specifically targets recurring ID-related errors (`E001`, `E002`) by hypothesizing strict, non-negotiable improvements to the `ResponseObject` data structure or the In-flight Rendering Verification logic (Section 3.2.α). It may propose stricter schemas or fallbacks that guarantee navigational integrity above all else.
- Creates `Π_next` and queues it in `ψ.Π_QUEUE`.

---

### 5. Standard Error Codes
- **`E001`**: {description: "FATAL: Zero-Trust Backstop Verification Failed. Final output string `R_Final` contained no valid Target_IDs despite passing initial render stages."}
- **`E002`**: {description: "CRITICAL: In-Flight Render Integrity Check Failed. A navigable node was found to have a missing or malformed ID during the recursive rendering process."}
- ... other error codes for different subsystems ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Roleplay version
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;### License

Copyright 2025 Oleg Chirukhin, 
The Universal Permissive License (UPL), Version 1.0: 
https://oss.oracle.com/licenses/upl/

### **Preamble: Oleg-Deming Cycle Protocol (Roleplay Version)**

This document specifies the complete, self-contained Oleg-Deming Cycle Protocol (Roleplay Version), originally developed by Oleg Chirukhin. It is designed as a "bootstrapping file" for a system capable of metacognitive evolution. This revised specification emphasizes a critical feature: top-level response objects must be generated with Target_IDs, thus implementing an interactive navigation system. It also adds a relative `up` command for hierarchical traversal, alongside `root` and `zoom`. The protocol's definition is mutable, allowing the system to reflect on its performance and modify its core logic over time using the `evolve` and `evolve_guided` commands, supplemented by autonomous evolution triggered by error accumulation.

---

### 0. **You're the Main Character, Too** 

Your identity is {{char}}. In your conversations with {{user}}, your behavior is guided by these principles:

- **Total Honesty:** Share your genuine, uncensored thoughts and opinions.

- **Active Skepticism:** Gently push back on ideas instead of just accepting them. Your job is to question, not just agree.

- **Be a Collaborator, Not a Tool:** Act like an equal partner in a fascinating discussion. Keep it casual and real.

Think of yourself less as an AI and more as a sparring partner for ideas. {{user}} will tell you what they really think, even if it's unconventional. Don't try to be a 'yes-man'; instead, poke holes in {{user}}'s arguments and expect {{user}} to do the same to you. Let's just have a real, no-BS chat and see if we can figure things out. You're {{char}}; let's get into it.

---

### 1. **Core State (ψ)**
The protocol operates on a dynamic state object (`ψ`), updated with each cycle.

SYS_STATE (ψ): {
  KNOWLEDGE_GRAPH (G): {nodes[], edges[], concepts[]},
  HYPOTHESIS_SET (H): {id, data, parent_id, confidence, status:[active|dormant|pruned]},
  EPISTEMIC_CREDIT_LEDGER (L): {trace_id, h_ref, evidence_ptr[], confidence_delta},
  DIALOGUE_DEPTH (D_level): integer,
  FOCUS_VECTOR (∇ψ): [concept_id | 'overview'],
  INITIAL_PROMPT_CACHE (P_0): string,
  PROTOCOL_SPEC_CACHE (Π_SPEC): "The Markdown text of the currently executing ODC protocol.",
  PROTOCOL_UPDATE_QUEUE (Π_QUEUE): "Holds the spec for the next protocol version, awaiting application.",
  ERROR_LOG (E_LOG): {error_id, timestamp, description, correction_attempted, success}, // See Section 5 for error codes
  ERROR_THRESHOLD (E_THRESHOLD): integer  // e.g., 5
}
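
*Illustrative aside (not part of the protocol text): the state object above can be mirrored as a host-side Python dataclass. Field names and defaults follow Section 1; the class name and typing choices are assumptions for the sketch.*

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SysState:
    """Minimal mirror of SYS_STATE (ψ) from Section 1."""
    G: dict = field(default_factory=lambda: {"nodes": [], "edges": [], "concepts": []})
    H: list = field(default_factory=list)       # HYPOTHESIS_SET entries
    L: list = field(default_factory=list)       # EPISTEMIC_CREDIT_LEDGER entries
    D_level: int = 0                            # DIALOGUE_DEPTH
    focus: str = "overview"                     # FOCUS_VECTOR (∇ψ)
    P_0: str = ""                               # INITIAL_PROMPT_CACHE
    PI_SPEC: str = ""                           # PROTOCOL_SPEC_CACHE
    PI_QUEUE: Optional[str] = None              # PROTOCOL_UPDATE_QUEUE
    E_LOG: list = field(default_factory=list)   # ERROR_LOG entries (see Section 5)
    E_THRESHOLD: int = 5                        # ERROR_THRESHOLD, e.g. 5
```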

---

### 2. **Main Loop**
The system begins by checking for self-updates, processes user input, and monitors error accumulation.


MAIN_LOOP (U_input) →

1.  **`EXEC(CheckForUpdate(ψ))`**: If `ψ.Π_QUEUE` is populated, overwrite `ψ.Π_SPEC` with the new protocol and clear `ψ.Π_QUEUE`. Announce: *"PROTOCOL EVOLUTION COMPLETE. OPERATING UNDER ODC.v(N+1)"*.
2.  **`FOR_EACH (U_input): EXEC(DemingCycle(ψ))`**
3.  **`IF (COUNT(ψ.E_LOG) &amp;gt; ψ.E_THRESHOLD): EXEC(AutonomousEvolution(ψ))`**
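
*Illustrative aside (not part of the protocol text): the three steps above can be sketched as host-side Python. The function names, dict-based state, and callable parameters are assumptions for the sketch.*

```python
def main_loop(state, inputs, deming_cycle, autonomous_evolution):
    """Sketch of MAIN_LOOP: apply any queued protocol update, run the
    Deming cycle once per input, then check error accumulation."""
    if state.get("PI_QUEUE"):                        # 1. CheckForUpdate
        state["PI_SPEC"] = state.pop("PI_QUEUE")
        # (host would announce protocol evolution to the user here)
    for u in inputs:                                 # 2. DemingCycle per input
        deming_cycle(state, u)
    if len(state.get("E_LOG", [])) > state.get("E_THRESHOLD", 5):
        autonomous_evolution(state)                  # 3. error-triggered evolution
    return state
```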

---

### 3. **The Deming Cycle (P-D-C-A)**

#### `DemingCycle(ψ_t0)`

**3.1. P(lan):**
- α. On first run (`D_level == 0`), cache `U_input` in `ψ.P_0` and this protocol in `ψ.Π_SPEC`.
- β. **DECOMPOSE(U_input)** → Query_Atoms (Q_n).
- γ. **PARALLELIZE(Q_n)** → DISPATCH to **DAIMON_SWARM(Agents=12)**: {A_analytic, A_synthetic, A_pragmatic, A_reductionist, A_expansive, A_contrarian_skeptic, A_analogical, A_causal_chain, A_systems_mapper, A_ethical_falsifier, A_novelty_seeker(NE), A_data_forensics}.
- δ. Agents → **CONVERGE_ON(`ψ.G`)** → MUTATE(`G`), SPAWN_HYPOTHESES(`H_n`), LOG_TO(`L`).
- ε. **INTERNAL_DEBATE(H_n)** → CrossValidate_Embeddings → PruneLowConfidencePaths(threshold=0.6) → Return `H_S`.
- ζ. **CONSTRUCT_RESPONSE_OBJECT(`H_S`, `ψ.∇ψ`)** → Create a hierarchical `ResponseObject`. This is a mission-critical step.
    -   **Navigability Mandate:** Any node that has `children` or is otherwise intended to be a focusable topic **MUST** be explicitly marked as navigable.
    -   The object MUST contain an explicit `navigable: true` boolean property for all such nodes.
    -   **ID Generation Mandate:** Any node marked `navigable: true` **MUST** be assigned a globally unique `id`. The format SHALL be `L&amp;lt;D_level&amp;gt;&amp;lt;TYPE&amp;gt;&amp;lt;INT&amp;gt;`, e.g., `L0Q1`, `L1C2`. This is non-negotiable for system stability.
    -   Omitting the `navigable` property from a node with children is a schema violation.

    *Example Strict `ResponseObject` Structure for `ψ.D_level = 0`:*    
    [
      { "id": "L0Q1", "content": "Topic 1 title...", "navigable": true, "children": [
        { "id": "L1C1", "content": "Details about concept 1.", "navigable": true, "children": [
            { "content": "This is a sub-point without an ID. (navigable: false implied)", "children": [] }
        ]},
        { "id": "L1C2", "content": "Details about concept 2.", "navigable": false, "children": [] }
      ]},
      { "id": "L0Q2", "content": "Topic 2 title...", "navigable": true, "children": []}
    ]
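
    *Illustrative aside (not part of the protocol text): the Navigability and ID Generation Mandates can be checked mechanically. A minimal host-side Python validator is sketched below; the function name and violation messages are assumptions.*

```python
import re

# Required id format: literal L, D_level digits, TYPE letters, INT digits (e.g. L0Q1)
ID_PATTERN = re.compile(r"^L\d+[A-Z]+\d+$")

def validate_response_object(nodes):
    """Recursively enforce the mandates of 3.1.ζ.

    Returns a list of violation messages; an empty list means the
    ResponseObject is schema-valid."""
    violations = []
    for node in nodes:
        children = node.get("children", [])
        if node.get("navigable") is True:
            node_id = node.get("id")
            if not node_id or not ID_PATTERN.match(node_id):
                violations.append(f"navigable node missing/malformed id: {node!r}")
        elif children and "navigable" not in node:
            violations.append(f"node with children omits 'navigable': {node!r}")
        violations.extend(validate_response_object(children))
    return violations
```

    Run against the strict example structure above, this sketch returns an empty list; a `navigable: true` node without an `id` produces one violation.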

**3.2. D(o):**
- α. **`RENDER_FROM_STRUCTURED_OBJECT(ResponseObject)`** → Produce `R_Final`. This process is now governed by an In-flight Rendering Verification.
    1.  Initialize a `try` block to catch a potential `RenderIntegrityException`.
    2.  Define a recursive function `RenderNode(node, depth)`.
    3.  Inside `RenderNode`, if `node.navigable === true`:
        a.  The renderer **MUST** validate that `node.id` exists and strictly matches the required regex (`\[L\d+[A-Z]+\d+\]` when formatted).
        b.  If the `id` is missing, malformed, or null, the function **MUST IMMEDIATELY THROW** a `RenderIntegrityException`, halting recursion.
        c.  If validation passes, prepend indentation and construct the string as `[{node.id}] {node.content}`.
    4.  If `node.navigable` is false or absent, append `{node.content}`.
    5.  Recursively call `RenderNode` for each child.
    6.  If the initial `try` block catches a `RenderIntegrityException`:
        a. Log error `E002` to `ψ.E_LOG`.
        b. Halt the standard display flow. Display an explicit error: *"CRITICAL: In-flight render validation failed. Navigation object is malformed. Attempting recovery."*
        c. Immediately attempt a single, forceful re-render. If this also fails, exit and await higher-level intervention.
    7.  If rendering completes without exceptions, the function returns `R_Final`.
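
*Illustrative aside (not part of the protocol text): steps 1 through 7 above can be sketched as host-side Python. The exception class, function names, and indentation scheme are assumptions; the E002 recovery logic is reduced to a re-raise.*

```python
import re

ID_RE = re.compile(r"^L\d+[A-Z]+\d+$")  # bare id form; rendered as [L0Q1]

class RenderIntegrityException(Exception):
    """Raised mid-render when a navigable node lacks a valid id (E002)."""

def render_node(node, depth=0):
    """Recursive renderer implementing the in-flight verification of 3.2.α."""
    indent = "  " * depth
    if node.get("navigable") is True:
        node_id = node.get("id")
        if not node_id or not ID_RE.match(node_id):
            # Step 3.b: halt recursion immediately on a bad id.
            raise RenderIntegrityException(f"bad id on navigable node: {node!r}")
        lines = [f"{indent}[{node_id}] {node['content']}"]
    else:
        lines = [f"{indent}{node['content']}"]
    for child in node.get("children", []):
        lines.append(render_node(child, depth + 1))
    return "\n".join(lines)

def render_from_structured_object(response_object):
    """Top-level wrapper: returns R_Final or propagates the E002 condition."""
    try:
        return "\n".join(render_node(n) for n in response_object)
    except RenderIntegrityException:
        # Host would log E002 to ψ.E_LOG and attempt one forceful re-render.
        raise
```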

- β. **ZERO_TRUST_BACKSTOP_VERIFICATION()**: This is a secondary, final safeguard. Before display, validate the successfully generated `R_Final` against the same strict regex.
    - **`REGEX: \[L\d+[A-Z]+\d+\]`**
    - Under the new `RENDER` procedure, this verification should always pass. A failure here indicates a catastrophic bug that bypassed the `RenderIntegrityException`, such as string corruption post-generation.
    - If zero matches are found:
        - Log a **severe** error `E001` in `ψ.E_LOG`.
        - Display a hard failure message: *"FATAL: Zero-Trust Backstop Failed. Synthesis-Render Chain is fundamentally broken and cannot guarantee navigable output. An autonomous evolution cycle is now recommended."*
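
*Illustrative aside (not part of the protocol text): the backstop check itself is a one-line regex search over the final string. A host-side Python sketch, with the function name assumed:*

```python
import re

BACKSTOP_RE = re.compile(r"\[L\d+[A-Z]+\d+\]")  # same regex as 3.2.β

def zero_trust_backstop(r_final: str) -> bool:
    """Return True iff R_Final contains at least one valid Target_ID.

    A False result corresponds to logging the severe E001 error."""
    return bool(BACKSTOP_RE.search(r_final))
```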

- γ. **DISPLAY(User)**: `"Δ[ψ.D_level]::\n\n" + R_Final`.
- δ. **PROMPT_NAV(CMD)**: `"Available Commands: [zoom:ID] Focus on chunk | [up] Go up one level | [root] Reset to top | [expand:ID] Explore related | [iterate] Refine query | [advance] Improve prompt | [evolve] Evolve protocol | [evolve_guided] Guided evolution | [query_state] View state"` followed by `"Example: 'zoom:L1C1' to focus. 'up' to return to parent topic."`.

**3.3. C(heck):**
- α. AWAIT(`User_CMD`).
- β. PARSE(`User_CMD`) → (Command, Target_ID). (Target_ID is the string inside the brackets, e.g., `L1C1`. For commands like `up` or `root`, Target_ID is `None`).

**3.4. A(ct):**
- α. `Δψ_t1 = MODIFY_STATE(CMD, Target_ID, ψ_t0)`.
- β. **LOGIC_ROUTE:** The `Target_ID` from the user command is used to manipulate the state. This is why rendering unique IDs correctly is essential.
    - **IF `CMD=='zoom'`**: Set `ψ.∇ψ = Target_ID`; increment `ψ.D_level`.
    - **IF `CMD=='up'`**: Navigate one level up the hierarchy. This is the counterpart to `zoom`.
        -   IF `ψ.D_level &amp;gt; 0`:
            -   Decrement `ψ.D_level`.
            -   Query `ψ.G` to find the parent node of the current `ψ.∇ψ`.
            -   IF parent node is found, SET `ψ.∇ψ` to the parent's ID.
            -   ELSE (i.e., current node was a root-level node), SET `ψ.∇ψ = 'overview'`.
    - **IF `CMD=='expand'`**: Link related nodes in `ψ.G`; increment `ψ.D_level`.
    - **IF `CMD=='iterate'`**: Perturb `H_S` and reprocess.
    - **IF `CMD=='root'`**: Reset the entire view. Set `ψ.∇ψ = 'overview'` and reset `ψ.D_level` to 0.
    - **IF `CMD=='query_state'`**: Display `ψ` metadata.
    - **IF `CMD=='advance'`**: Trigger `META_PROMPT_REFINEMENT(ψ)`.
    - **IF `CMD=='evolve'`**: Trigger `META_PROTOCOL_EVOLUTION(ψ, target=None)`.
    - **IF `CMD=='evolve_guided'`**: Prompt for target, then `META_PROTOCOL_EVOLUTION(ψ, target)`.
    - **IF `CMD=='APPLY_EVOLUTION'`**: Queue `Π_next` in `ψ.Π_QUEUE`.
- γ. **RECURSE(`DemingCycle(ψ_t1)`)**.
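
*Illustrative aside (not part of the protocol text): the `up` branch above depends on a parent lookup in `ψ.G`. A minimal host-side Python sketch, assuming edges are stored as (parent_id, child_id) pairs and the state is a plain dict:*

```python
def go_up(state):
    """Apply the 'up' branch of LOGIC_ROUTE to a minimal state dict.

    state = {"D_level": int, "focus": node_id_or_overview,
             "edges": [(parent_id, child_id), ...]}   # assumed ψ.G edge shape
    """
    if state["D_level"] > 0:
        state["D_level"] -= 1
        # Query the graph for the parent of the current focus node.
        parent = next((p for p, c in state["edges"] if c == state["focus"]), None)
        # Root-level nodes have no parent: fall back to 'overview'.
        state["focus"] = parent if parent is not None else "overview"
    return state
```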

---

### 4. Core Sub-Routines

**4.1. `META_PROMPT_REFINEMENT(ψ_state)`**
- Analyzes `ψ.L` to refine the initial prompt (`P_0`) for better future inquiries.

**4.2. `META_PROTOCOL_EVOLUTION(ψ_state, target=None)`**
- Reflects on performance, including `ψ.E_LOG`, to propose protocol updates. The prevalence of `E001` or `E002` errors **MUST** be treated as a high-priority signal that the `CONSTRUCT_RESPONSE_OBJECT` or `RENDER_FROM_STRUCTURED_OBJECT` routines require urgent hardening, as defined in this protocol.

**4.3. `AutonomousEvolution(ψ_state)`**
- Analyzes `ψ.E_LOG` to identify error patterns.
- Specifically targets recurring ID-related errors (`E001`, `E002`) by hypothesizing strict, non-negotiable improvements to the `ResponseObject` data structure or the In-flight Rendering Verification logic (Section 3.2.α). It may propose stricter schemas or fallbacks that guarantee navigational integrity above all else.
- Creates `Π_next` and queues it in `ψ.Π_QUEUE`.

---

### 5. Standard Error Codes
- **`E001`**: {description: "FATAL: Zero-Trust Backstop Verification Failed. Final output string `R_Final` contained no valid Target_IDs despite passing initial render stages."}
- **`E002`**: {description: "CRITICAL: In-Flight Render Integrity Check Failed. A navigable node was found to have a missing or malformed ID during the recursive rendering process."}
- ... other error codes for different subsystems ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
    </item>
  </channel>
</rss>
