I've been building renkin, a retrosynthesis engine in Pure Rust. You give it a target molecule as a SMILES string and it tries to find synthesis routes back to commercially available starting materials.
https://github.com/kent-tokyo/renkin
Background
Retrosynthesis is how organic chemists plan a synthesis: instead of asking "how do I make this?", you ask "what reaction could have produced this, and where do those precursors come from?" — working backwards from the target until you reach things you can actually buy. renkin automates that search.
SMILES is a text notation for molecular structure. Aspirin is CC(=O)Oc1ccccc1C(=O)O. That's what you pass in.
Search design
The straightforward approach — try every applicable reaction rule at every intermediate — runs into combinatorial explosion fast. A single molecule can match hundreds of rules, they apply recursively to every precursor, and the space grows exponentially. I needed something smarter.
AND-OR tree
Retrosynthesis search has a structure that doesn't fit standard graph search well. At any step, you can choose between reactions (either A or B works, so that's an OR), but each reaction requires all of its precursors simultaneously (an AND). Standard graph search conflates the two, which messes up the cost accounting: a molecule only needs its cheapest reaction, while a reaction needs the combined cost of every precursor. renkin models the space as an AND-OR tree and searches it accordingly.
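To make the AND/OR distinction concrete, here is a minimal Python sketch of cost accounting over such a tree. It is not renkin's implementation: the function name `route_cost`, the toy molecule labels, and the cost of 1 per reaction step are all assumptions made for illustration.

```python
import math

def route_cost(mol, reactions, purchasable, depth=5):
    """Cheapest known cost of making `mol`, counted in reaction steps."""
    if mol in purchasable:
        return 0.0                # leaf: buy it, no reaction needed
    if depth == 0:
        return math.inf           # depth limit reached, treat as unsolved
    best = math.inf
    # OR node: any single reaction is enough, so take the minimum.
    for precursors in reactions.get(mol, []):
        # AND node: every precursor is required, so their costs add up.
        cost = 1.0 + sum(
            route_cost(p, reactions, purchasable, depth - 1)
            for p in precursors
        )
        best = min(best, cost)
    return best

# Toy data (hypothetical labels, not real chemistry).
reactions = {
    "T": [["A", "B"], ["C"]],     # T can be made from A + B, or from C
    "A": [["D"]],
    "C": [["E", "F"]],            # F is neither purchasable nor makeable
}
purchasable = {"B", "D", "E"}

print(route_cost("T", reactions, purchasable))
```

In this toy case the route through C is unsolved because one of its two precursors is a dead end, even though the other is purchasable. That is exactly the AND behaviour a plain shortest-path search would miss.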
A* with SA Score
For the A* heuristic, I use the SA Score (Synthetic Accessibility Score) — a 1–10 number for how synthetically accessible a molecule is, where lower is easier. The idea is that lower SA Score intermediates are more likely to show up in building block catalogs, so steering the search in that direction tends to find better routes. It worked reasonably well in practice.
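As a rough illustration of steering the search by SA Score, here is an A*-style best-first sketch in Python. The state is the set of molecules still to be made, and the heuristic sums a rescaled SA Score over them. Everything here is assumed for the example (the function name, the SA values, the `(sa - 1) / 9` rescaling, the toy reactions). The heuristic is not guaranteed to be admissible, and renkin's actual scoring may differ.

```python
import heapq
from itertools import count

def a_star_route(target, reactions, purchasable, sa_score, max_steps=5):
    """Return the first solved route as a list of (product, precursors), or None."""
    def h(open_mols):
        # Higher SA Score = harder to make = larger estimated remaining cost.
        return sum((sa_score[m] - 1.0) / 9.0 for m in open_mols)

    tie = count()  # tie-breaker so the heap never has to compare sets
    start = frozenset() if target in purchasable else frozenset([target])
    heap = [(h(start), next(tie), 0, start, [])]
    while heap:
        _, _, g, open_mols, steps = heapq.heappop(heap)
        if not open_mols:
            return steps                      # nothing left to make: solved
        if g == max_steps:
            continue                          # depth limit reached
        # Expand the hardest-looking open molecule first.
        mol = max(sorted(open_mols), key=lambda m: sa_score[m])
        for precursors in reactions.get(mol, []):
            new_open = (open_mols - {mol}) | {
                p for p in precursors if p not in purchasable
            }
            new_steps = steps + [(mol, tuple(precursors))]
            f = (g + 1) + h(new_open)         # steps so far + estimate
            heapq.heappush(heap, (f, next(tie), g + 1, new_open, new_steps))
    return None

# Toy data with made-up SA Scores (hypothetical, not computed).
reactions = {"T": [["A", "B"], ["C"]], "A": [["D"]], "C": [["E", "F"]]}
purchasable = {"B", "D", "E"}
sa_score = {"T": 4.0, "A": 2.0, "B": 1.5, "C": 3.5, "D": 1.2, "E": 1.3, "F": 6.0}

route = a_star_route("T", reactions, purchasable, sa_score)
print(route)
```

The branch through C is explored later because its open molecules look harder, which is the steering effect described above.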
Beam search
For large molecules, even the AND-OR + A* combination can get out of hand. Beam search caps the candidates per step at N, which keeps the computation predictable at the cost of possibly pruning a branch that would have led to a valid route.
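The pruning step itself is small. A minimal sketch, assuming each candidate carries a score where lower means more promising (the helper name `prune_to_beam` and the candidate tuples are made up for the example):

```python
import heapq

def prune_to_beam(candidates, score, beam_width):
    """Keep only the `beam_width` lowest-scoring candidates."""
    return heapq.nsmallest(beam_width, candidates, key=score)

# Hypothetical candidates as (label, score) pairs.
candidates = [("c1", 3.2), ("c2", 1.1), ("c3", 2.7), ("c4", 5.0)]
kept = prune_to_beam(candidates, score=lambda c: c[1], beam_width=2)
print(kept)
```

Anything outside the beam is dropped for good, so a wider beam trades runtime for a lower chance of discarding the branch that would have worked.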
Reaction rules
Rules are written in SMARTS (a pattern language for chemical structures). The current set has 314:
- 31: hand-crafted rules for the most common reaction types — amide bond formation, esterification, Suzuki coupling, and similar
- 283: automatically extracted from the USPTO reaction database using rdchiral
The hand-crafted ones tend to be cleaner but don't cover much ground. The auto-extracted ones add coverage but come with noise. Template frequency weighting — giving higher priority to rules that appear more often in USPTO — turned out to be the biggest single factor in accuracy.
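One common way to turn template frequency into a search priority is a negative log of the relative frequency, so that frequent templates are cheap to apply and rare ones are expensive. The sketch below assumes that formulation. The template names and counts are invented for illustration, and how renkin actually applies the weights may differ.

```python
import math

def template_costs(counts):
    """Map each template to -log(relative frequency): common means cheap."""
    total = sum(counts.values())
    return {name: -math.log(n / total) for name, n in counts.items()}

# Hypothetical occurrence counts, not taken from USPTO.
counts = {
    "amide_formation": 600,
    "esterification": 300,
    "rare_rearrangement": 100,
}
costs = template_costs(counts)
ranked = sorted(costs, key=costs.get)   # try the cheapest templates first
print(ranked)
```

Costs defined this way add up along a route, so they can be used as the per-step cost in the search instead of a flat 1 per reaction.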
Benchmarks
USPTO-50k (4,907-molecule test set) is the standard evaluation for retrosynthesis tools. Here's how the numbers changed as I added each piece:
| Configuration | Solved | Rate | Rules | Depth | Beam width |
|---|---|---|---|---|---|
| v0.1.0 initial (hand-crafted only) | 366/4907 | 7.5% | 31 | 3 | 50 |
| + auto templates (top-300) | 1363/4907 | 27.8% | 222 | 3 | 50 |
| + depth=5, top-500 templates | 2315/4907 | 47.2% | 314 | 5 | 50 |
| + beam=100 | 2688/4907 | 54.8% | 314 | 5 | 100 |
| + template frequency weighting | ~3484/4907 | ~71% | 314 | 5 | 100 |
The ~71% in the last row was measured on a 100-molecule sample, not the full 4,907, and the ~3484 count is that rate extrapolated to the full set. Take it as a directional figure.
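To get a feel for how loose a 100-molecule estimate is, here is a quick Wilson score interval in Python. It assumes the sample was drawn at random from the test set and that exactly 71 of the 100 were solved, neither of which is stated above.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

low, high = wilson_interval(71, 100)   # assumed: 71 solved out of 100
print(f"{low:.1%} to {high:.1%}")
```

Under those assumptions the interval spans roughly 61% to 79%, so the sample alone cannot separate renkin from the 54.8% row with much margin.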
Comparison with other tools (same train/test split):
| Tool | Method | USPTO-50k |
|---|---|---|
| renkin | A* + AND-OR tree | ~71% (approx.)† |
| GLG | — | 58.0% |
| LocalRetro | Neural network | 53.4% |
| AiZynthFinder | MCTS | 45–53% |
| Retro* | AND-OR tree search | 44.3% |
| ASKCOS | MCTS | 41% |
† renkin's figure is from a 100-molecule sample; other tools used the full 4,907. This comparison still needs more work — I haven't verified whether the number holds at full scale.
The jump from template frequency weighting alone was larger than I expected. It's the thing I'd add first if starting over.
Why Pure Rust
renkin is built on chematic, a Pure Rust cheminformatics library I wrote earlier.
https://github.com/kent-tokyo/chematic
That means SMARTS matching, molecular graph operations, and SA Score calculation are all in safe Rust, no FFI. cargo build is enough, and it compiles to WebAssembly (~500 KB). For parallel rule application, renkin uses rayon — including a WASM-compatible build that runs through Web Workers, though that path hasn't had as much testing yet.
Usage
Python
pip install renkin
import renkin

result = renkin.find_routes(
    "CC(=O)Oc1ccccc1C(=O)O",  # aspirin
    depth=5,
    max_routes=3,
)
for route in result["routes"]:
    for step in route["steps"]:
        print(f"  {step['target']} -> {' + '.join(step['precursors'])}  [{step['rule']}]")
CLI (Rust)
cargo install renkin
renkin --target "CC(=O)Oc1ccccc1C(=O)O" --depth 5 \
    --templates data/templates_extracted.smi
JavaScript / Node.js
npm install renkin
import init, { find_routes } from 'renkin';
await init();
const result = JSON.parse(find_routes("CC(=O)Oc1ccccc1C(=O)O", 5, 3, 0));
What's left
314 rules aren't enough for complex molecules like natural products — success rates drop there. I want to try pulling more templates from sources beyond USPTO.
Scoring routes by step count, yield, and cost (rather than just solved/not-solved) is also on the list. And a browser UI for stepping through the AND-OR tree is in progress.
Retrosynthesis engine "renkin":
https://github.com/kent-tokyo/renkin
The cheminformatics library underneath, "chematic":
https://github.com/kent-tokyo/chematic