RoTSL

Posted on Apr 25 • Originally published at Medium on Apr 24

What Happens When You Try to Reverse Biology? A Deep Look at the Protein DNA Analysis Simulator

#research #alphafold #molecularbiology #protiensynthesis

Most biology tools move in one direction. DNA becomes RNA. RNA becomes protein.

That flow is drilled into anyone who has taken genetics or molecular biology. After a while, it becomes background knowledge. You stop questioning it because the pathway feels settled.

Then a project comes along and asks a slightly uncomfortable question:

What if we tried to move backward?

The Protein → DNA Synthesis Simulator explores that idea.

This is not laboratory software. It is not a validated bioinformatics pipeline. It is a simulation designed to explore a biological question.

That distinction matters from the beginning.

This project is a hypothetical simulation. It has not been tested or validated, and it is intended for educational purposes only.

Why reverse biology is harder than it sounds

At first glance, reverse translation seems simple.

If DNA creates proteins, then proteins should point back to DNA.

But biology does not preserve information perfectly.

A protein sequence contains amino acids. DNA contains codons.

The problem is that multiple codons can encode the same amino acid.

For example:

Leucine can be encoded by six codons
Serine can be encoded by six codons
Arginine can be encoded by six codons

This redundancy is called the degeneracy of the genetic code.
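The counts above can be checked with a short Python sketch. It assumes the standard genetic code (NCBI translation table 1). The names `BASES`, `AMINO`, and `codons_for` are illustrative and are not taken from the simulator.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Group all 64 codons by the amino acid they encode ("*" marks a stop codon).
codons_for = defaultdict(list)
for codon, aa in zip(product(BASES, repeat=3), AMINO):
    codons_for[aa].append("".join(codon))

# Leucine (L), serine (S), and arginine (R) versus methionine (M) and tryptophan (W).
for aa in "LSRMW":
    print(aa, len(codons_for[aa]), codons_for[aa])
```

Methionine and tryptophan sit at the other extreme: each has a single codon, so they add no ambiguity during reverse translation.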

Once translation happens, part of the original DNA detail disappears.

You keep the protein.

You lose some of the nucleotide history.

That means reverse translation is not reconstruction.

It is estimation.

The simulator starts with a simple but useful idea

The main GitHub project allows users to input a protein sequence and generate a possible DNA equivalent.

The workflow is straightforward:

Enter a protein sequence
Parse amino acids
Match amino acids to codons
Build a theoretical DNA sequence
Display a reconstructed output

The important word here is possible.

The simulator does not claim to discover the original DNA strand.

It generates a biologically plausible candidate.

That may sound like a small distinction, but it changes how the project should be understood.

This is closer to a thought experiment than a prediction engine.
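A minimal Python sketch of that workflow, assuming the standard genetic code and a deliberately naive rule: always pick the first codon listed for each amino acid. How the simulator actually chooses codons is not described here, so `reverse_translate` and `FIRST_CODON` are hypothetical names used only for illustration.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Keep only the first codon seen for each amino acid (an arbitrary choice).
FIRST_CODON = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO):
    if aa != "*":
        FIRST_CODON.setdefault(aa, "".join(codon))


def reverse_translate(protein: str) -> str:
    """Return one possible DNA sequence for a protein, not the original one."""
    residues = protein.strip().upper()
    unknown = sorted(set(residues) - set(FIRST_CODON))
    if unknown:
        raise ValueError(f"Unrecognized residues: {unknown}")
    return "".join(FIRST_CODON[aa] for aa in residues)


print(reverse_translate("MKL"))  # ATGAAATTA
```

Swapping the "first codon" rule for any other rule gives a different but equally valid candidate, which is the point of the word possible.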

The protein analysis page adds a second layer

The main simulator introduces reverse translation.

Link: https://rotsl.github.io/protein2dna_synthesis-simulator/protein_analysis/

Instead of immediately converting protein into DNA, the analysis layer pauses to inspect the sequence itself.

That shift makes the project more interesting.

A protein sequence contains more than letters.

It contains patterns.

The analysis process can reveal:

Amino acid composition
Sequence length
Repeating motifs
Structural tendencies
Hydrophobic or hydrophilic regions
Codon ambiguity possibilities
Conserved residue clusters

This changes the role of the simulator.

You stop treating proteins as outputs and start treating them as encoded information.
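A few of these checks are easy to sketch in Python. The example below assumes the Kyte-Doolittle hydropathy scale as the measure of hydrophobicity. The analysis page may use a different scale or method, and `analyze` is a hypothetical helper name.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values (positive = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def analyze(protein: str) -> dict:
    """Summarize length, composition, and overall hydropathy of a sequence."""
    residues = protein.strip().upper()
    if not residues:
        raise ValueError("Empty sequence")
    scores = [KYTE_DOOLITTLE[aa] for aa in residues]  # KeyError on unknown letters
    return {
        "length": len(residues),
        "composition": dict(Counter(residues)),
        "mean_hydropathy": sum(scores) / len(scores),
        "hydrophobic_fraction": sum(s > 0 for s in scores) / len(scores),
    }


print(analyze("MKLLVA"))
```

A whole-sequence average hides local structure. A sliding window over the same scores would be the usual next step for spotting hydrophobic or hydrophilic regions.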

Protein sequences carry clues, not complete answers

One misconception in biology education is that proteins behave like exact reflections of DNA.

They do not.

A protein preserves order.

It does not preserve every codon decision.

This becomes obvious during reverse translation.

One protein sequence may correspond to many DNA sequences.

Different organisms may also favor different codon usage patterns.

A bacterial sequence and a mammalian sequence may encode the same amino acids using different codon preferences.

The simulator exposes this uncertainty instead of hiding it.

That is one of its strongest features.
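How many is "many"? Under the standard genetic code, the number of DNA sequences that encode a protein is the product of the codon counts of its residues. A small Python sketch, with `candidate_count` as an illustrative name:

```python
from collections import Counter
from math import prod

# Standard genetic code in T, C, A, G order; each letter appears once per codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Number of codons per amino acid, ignoring stop codons ("*").
DEGENERACY = {aa: n for aa, n in Counter(AMINO).items() if aa != "*"}


def candidate_count(protein: str) -> int:
    """Count the DNA sequences that would translate to this protein."""
    return prod(DEGENERACY[aa] for aa in protein.strip().upper())


print(candidate_count("MKL"))   # 1 * 2 * 6 = 12
print(candidate_count("MLSR"))  # 1 * 6 * 6 * 6 = 216
```

The count grows multiplicatively with length, so even short peptides have more candidates than anyone would want to read through by hand.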

The protein analysis page changes how users think

Many educational biology tools focus on output.

Input something. Receive a result.

The analysis page encourages a different rhythm.

You enter a sequence and ask:

What kind of protein is this?
Are certain amino acids overrepresented?
Are there repeating patterns?
Could this suggest structural behavior?
How ambiguous is reverse translation?

These questions matter because biological interpretation happens before prediction.

Researchers rarely jump directly to answers.

They inspect patterns first.

The analysis page mirrors that mindset.
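The question about repeating patterns can be approximated with a simple count of short subsequences. This is a sketch under the assumption that a "pattern" means an exact repeated k-mer. Real motif detection is more involved, and `repeated_motifs` is a hypothetical helper.

```python
from collections import Counter


def repeated_motifs(protein: str, k: int = 3) -> dict:
    """Return every length-k subsequence that occurs more than once."""
    residues = protein.strip().upper()
    # Overlapping windows: positions 0..k-1, 1..k, and so on.
    windows = (residues[i:i + k] for i in range(len(residues) - k + 1))
    counts = Counter(windows)
    return {motif: n for motif, n in counts.items() if n > 1}


print(repeated_motifs("GAGAGAKL", k=2))  # {'GA': 3, 'AG': 2}
```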

The live simulation version feels more interactive

The GitHub build introduces the idea.

The live version feels closer to a working prototype.

Try it out here: https://protein-dna-simulator.vercel.app/

At first, both versions appear similar.

Protein input. DNA output. Translation logic.

But the live build feels faster and more responsive.

It behaves less like a static webpage and more like an active sequence workspace.

You can experiment quickly.

Change one amino acid.

Watch the sequence shift.

Remove residues.

See how output changes.

That immediate feedback matters.

Learning becomes easier when interaction is continuous.

The live version works like a live sequence interpreter

Many biology tools rely on a submission model:

Paste sequence
Configure settings
Submit request
Wait for processing
Read results

The live version shortens that cycle.

You enter data and receive near-instant interpretation.

That makes experimentation feel natural.

You stop thinking in terms of “jobs” and start thinking in terms of exploration.

The live simulator combines several biological layers

The interface appears to merge multiple ideas into one workflow.

The system includes:

Protein parsing
Amino acid recognition
Codon mapping
DNA sequence estimation
Educational translation logic
Sequence relationship visualization

This matters because reverse translation is not a single operation.

It is a chain of assumptions.

Each amino acid creates branching possibilities.

The simulator turns those possibilities into something visible.
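The branching can be made concrete by enumerating every candidate for a very short peptide. This Python sketch assumes the standard genetic code. `candidates` is an illustrative generator and says nothing about how the simulator builds its visualization.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid to all of its codons (stop codons are skipped).
CODONS_FOR = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO):
    if aa != "*":
        CODONS_FOR.setdefault(aa, []).append("".join(codon))


def candidates(protein: str):
    """Yield every DNA sequence that encodes the given protein."""
    options = [CODONS_FOR[aa] for aa in protein.strip().upper()]
    # One branch per codon choice at each position.
    for combo in product(*options):
        yield "".join(combo)


print(list(candidates("MK")))  # ['ATGAAA', 'ATGAAG']
```

A generator is used on purpose. The number of branches multiplies at every residue, so building the full list only makes sense for a handful of amino acids.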

Reverse translation exposes a hidden truth about biology

Most diagrams simplify biology into arrows.

DNA → RNA → Protein.

That model is useful.

It is also incomplete.

Real biology includes ambiguity.

Codon redundancy means several nucleotide sequences can create identical proteins.

That creates uncertainty.

The simulator does not remove uncertainty.

It places uncertainty at the center of the experience.

That makes the project more honest than many educational demos.
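The redundancy is easy to confirm in the forward direction. In this Python sketch, two different DNA strings translate to the same protein under the standard genetic code. The sequences and the `translate` helper are made up for illustration.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence, three bases at a time."""
    dna = dna.strip().upper()
    if len(dna) % 3 != 0:
        raise ValueError("Sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


first = "ATGAAATTA"
second = "ATGAAGCTG"  # differs from `first` at three positions
print(translate(first), translate(second))  # MKL MKL
```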

The project feels closer to computational biology than traditional teaching software

There is a subtle shift that happens when using the simulator.

You stop memorizing.

You start interpreting.

That makes the project feel closer to lightweight bioinformatics.

Professional sequence-analysis tools often involve:

Pattern recognition
Sequence comparison
Codon usage analysis
Structural inference
Similarity scoring
Translation mapping

The simulator is not competing with research-grade systems.

It borrows concepts from computational biology and simplifies them into something approachable.

Inspiration from published Science research

The project notes inspiration from research published in Science:

https://www.science.org/doi/abs/10.1126/science.aed1656

The simulator does not reproduce the paper’s findings.

The connection is conceptual.

Modern biology increasingly depends on inference.

Researchers often estimate relationships between biological systems rather than directly observing every process.

Protein folding prediction, sequence inference, and molecular modeling all rely on computational interpretation.

The simulator fits within that broader idea.

It asks:

If proteins preserve traces of DNA history, how much can we estimate from those traces?

That question alone makes the project worth exploring.

Why uncertainty is the most valuable part of the simulator

The strongest lesson here is not reverse translation.

It is uncertainty.

Science education sometimes creates the illusion that biology always produces clean answers.

The simulator quietly pushes against that assumption.

You expect one DNA sequence.

You discover many possibilities.

You expect certainty.

You find ambiguity.

That is closer to how real biological reasoning works.

What makes the project useful for education

The simulator works well for:

Students learning transcription and translation
Beginners exploring codon relationships
Developers interested in biological computation
Bioinformatics learners experimenting with sequence logic
People curious about molecular coding systems

The project does not require laboratory experience.

It asks users to think.

That alone gives it educational value.

Where the simulator could grow

There are several additions that could deepen the learning experience.

Organism-specific codon bias

Different organisms prefer different codons.

Adding selectable species would make reverse translation more realistic.

Multiple DNA candidates

Instead of returning one sequence, the simulator could generate ranked alternatives.

Probability scoring

Codon likelihood could help explain why some outputs are more plausible.
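The three ideas so far could fit together roughly as follows. This is only a sketch of one possible design. The weights in `HYPOTHETICAL_USAGE` are invented numbers, not measured codon frequencies for any organism, and they cover just three amino acids to keep the example small.

```python
from itertools import product
from math import prod

# Invented codon preferences for an imaginary organism (NOT real data).
HYPOTHETICAL_USAGE = {
    "M": {"ATG": 1.0},
    "K": {"AAA": 0.7, "AAG": 0.3},
    "F": {"TTT": 0.4, "TTC": 0.6},
}


def ranked_candidates(protein: str, usage=HYPOTHETICAL_USAGE, top: int = 3):
    """Score each candidate by the product of its codon weights, best first."""
    per_residue = [list(usage[aa].items()) for aa in protein.strip().upper()]
    scored = []
    for combo in product(*per_residue):
        dna = "".join(codon for codon, _ in combo)
        score = prod(weight for _, weight in combo)
        scored.append((dna, score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top]


for dna, score in ranked_candidates("MKF"):
    print(dna, round(score, 2))
```

Passing a different `usage` table per species would be one way to make the organism selectable. Multiplying weights treats each codon choice as independent, which is a simplification.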

Structural hints

The analysis page could flag patterns associated with alpha helices or beta sheets.

Sequence comparison

Comparing proteins side-by-side would help explain mutation and similarity.
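A very small version of that comparison, as a Python sketch. It assumes two sequences of equal length and performs no alignment, so it only handles substitutions. `compare` is a hypothetical helper name.

```python
def compare(a: str, b: str):
    """Return (fraction of identical positions, list of differences)."""
    a, b = a.strip().upper(), b.strip().upper()
    if not a or len(a) != len(b):
        raise ValueError("This sketch needs two non-empty, equal-length sequences")
    # Each difference is (position, residue in a, residue in b), zero-indexed.
    diffs = [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]
    identity = 1 - len(diffs) / len(a)
    return identity, diffs


print(compare("MKLV", "MRLV"))  # (0.75, [(1, 'K', 'R')])
```

Insertions and deletions would shift every later position, which is why real comparison tools align sequences first.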

These additions would not make the simulator “correct.”

They would make uncertainty easier to understand.

Final thoughts

The Protein → DNA Synthesis Simulator works best when treated as a reasoning tool.

It does not reconstruct biology.

It explores biological possibility.

The GitHub version explains the concept.

The protein analysis page adds interpretation.

The live web version makes the process interactive.

Together, they create a small ecosystem for thinking about biological information in reverse.

The project becomes more interesting once you stop asking:

“Is this the original DNA?”

and start asking:

“What assumptions make this sequence plausible?”

That shift changes the experience.

You stop seeing proteins as endpoints.

You start seeing them as traces.

And sometimes, tracing hidden information is more interesting than certainty.

Project links

Main simulator: https://rotsl.github.io/protein2dna_synthesis-simulator/
Protein analysis page: https://rotsl.github.io/protein2dna_synthesis-simulator/protein_analysis/
Interactive web version: https://protein-dna-simulator.vercel.app/

References

[1] Peiwei Deng, Heewon Lee, Carlos Armijo, Hao Wang, and Albert Gao. Protein-templated synthesis of di-nucleotide repeat DNA by an anti-phage reverse transcriptase. Science, page aed1656, 2026. PDB: 9Z6Y.

[2] Alexander A. Green, Pamela A. Silver, James J. Collins, and Peng Yin. Toehold switches: De-novo-designed regulators of gene expression. Cell, 159:925–939, 2014.

[3] Andrew V. Anzalone, Peyton B. Randolph, Jessie R. Davis, Alexander A. Sousa, Luke W. Koblan, Jonathan M. Levy, Peter J. Chen, Christopher Wilson, Gregory A. Newby, Aditya Raguram, and David R. Liu. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature, 576:149–157, 2019.

[4] Andrew V. Anzalone, Xin D. Gao, Christopher J. Podracky, Andrew T. Nelson, Luke W. Koblan, Aditya Raguram, Jonathan M. Levy, Jeffry A. M. Mercer, and David R. Liu. Programmable deletion, replacement, integration and inversion of large DNA sequences with twin prime editing. Nature Biotechnology, 40:731–740, 2022.

[5] Rohan R. DRT3b Engineering Studio: Protein → RNA → DNA simulator. https://github.com/rotsl/protein2dna_synthesis-simulator, 2026. Open-source web simulation platform. Live app: https://protein-dna-simulator.vercel.app/.

[6] Mihaly Varadi, Stephen Anyango, Mandar Deshpande, Sreenath Nair, Cindy Natassia, Galabina Yordanova, David Yuan, Oana Stroe, Gemma Wood, Agata Laydon, Augustin Žídek, Tim Green, Kathryn Tunyasuvunakool, Stig Petersen, John Jumper, Ellen Clancy, Richard Green, Oriol Vinyals, Demis Hassabis, and Sameer Velankar. AlphaFold Protein Structure Database: massive structural coverage for biology and medicine. Nucleic Acids Research, 50:D439–D444, 2022.

[7] John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, Alex Bridgland, Clemens Meyer, Simon A. A. Kohl, Andrew J. Ballard, Andrew Cowie, Bernardino Romera-Paredes, Stanislav Nikolov, Rishub Jain, Jonas Adler, Trevor Back, Stig Petersen, David Reiman, Ellen Clancy, Michal Zielinski, Martin Steinegger, Michalina Pacholska, Tamas Berghammer, Sebastian Bodenstein, David Silver, Oriol Vinyals, Andrew W. Senior, Koray Kavukcuoglu, Pushmeet Kohli, and Demis Hassabis. Highly accurate protein structure prediction with AlphaFold. Nature, 596:583–589, 2021.

[8] Kirsten L. Frieda, James M. Linton, Sahand Hormoz, Junhong Choi, Koo-Lok K. Chow, Zeba S. Singer, Mark W. Budde, Michael B. Elowitz, and Long Cai. Synthetic recording and in situ readout of lineage information in single cells. Nature, 541:107–111, 2017.

[9] Bushra Raj, Daniel E. Wagner, Aaron McKenna, Shristi Pandey, Allon M. Klein, Jay Shendure, Deepak L. Bhatt, and Bhatt Bhatt. GESTALT: a method for tracing lineage and clonal dynamics at single-cell resolution. Nature Biotechnology, 36:442–450, 2018.

[10] Wenyuan Tang, James H. Hu, and David R. Liu. PEAR: a highly multiplexed lineage recorder based on prime editing and in situ barcode readout. Nature Methods, 21:1054–1065, 2024.

[11] Junhong Choi, Wei Chen, Anna Minkina, Florence M. Chardon, Chase C. Suiter, Samuel G. Regalado, Silvia Domcke, Nobuhiko Hamazaki, Choli Lee, Beth Martin, Ryan M. Daza, and Jay Shendure. A temporally resolved, multi-symbolic molecular recorder based on sequential DNA writing. Nature Chemical Biology, 18:1204–1212, 2022.

[12] Nathaniel Roquet, Ava P. Soleimany, Alyssa C. Ferris, Scott Wick, and Timothy K. Lu. Synthetic recombinase-based state machines in living cells. Science, 353:aad8559, 2016.