RDKit has been the dominant cheminformatics library since its open-source release in 2006. It is written in C++, wrapped in Python, and has accumulated nearly two decades of validated chemistry: SMILES and SMARTS parsing, multiple fingerprint types, 2D coordinate generation, 3D conformer generation, MMFF94 and UFF force fields, and a PostgreSQL cartridge. Most cheminformatics pipelines assume it is present.
In mid-2026, Rust's answer is rdkit-sys — bindings to RDKit's C++ CFFI interface — and a collection of pure-Rust crates that stalled in 2020-2021.
What exists in 2025-2026
| Crate | Type | Latest | Status |
|---|---|---|---|
| rdkit-sys | C++ FFI to RDKit | 0.4.12 (Oct 2024) | Maintained |
| openbabel | C++ FFI to Open Babel | 0.5.4 (Jan 2025) | Maintained |
| chemcore | Pure Rust | 0.4.1 (Feb 2021) | Unmaintained |
| purr | Pure Rust (SMILES parser) | 0.9.0 (Mar 2021) | Unmaintained |
| smiles-parser | Pure Rust (SMILES parser) | 0.4.1 (Nov 2020) | Unmaintained |
| cosmolkit | Pure Rust (new attempt) | 0.2.3 (May 2026) | New, unproven |
The pattern in the pure-Rust column is consistent: implementations hit a wall around 2020-2021 and stopped. The active work is FFI bindings to existing C++ tools. A new attempt (cosmolkit) appeared recently with an ambitious scope — SMILES, SDF, conformers, molecular graphs — but with under 800 downloads it is too early to evaluate.
SMILES parsing is solved. The rest is not.
Parsing a SMILES string is a context-free grammar problem, and Rust handles those well. purr implements the full OpenSMILES specification. smiles-parser does the same. Both work. Neither has had a release since 2020-2021.
The problem starts after parsing.
A SMILES string like c1ccccc1 (benzene) uses lowercase atoms to indicate aromaticity. To do anything useful — calculate molecular weight, count implicit hydrogens, check valence — you need to convert it to a Kekulé structure: alternating single and double bonds. This is kekulization, and it is a constraint-satisfaction problem on the molecular graph.
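To make the constraint-satisfaction framing concrete, here is a minimal sketch of kekulization as a backtracking search for a perfect matching. It assumes every aromatic atom needs exactly one double bond, which holds for benzene but not for atoms like the nitrogen in pyrrole, so a real implementation has to decide first which atoms take part. The function names and the edge-list representation are illustrative and not taken from chemcore or RDKit.

```rust
/// Assign double bonds to an aromatic system by backtracking.
/// Atoms are numbered 0..n_atoms and `bonds` lists the aromatic bonds.
/// Simplifying assumption: every atom must end up in exactly one double bond.
/// Returns the indices (into `bonds`) chosen as double bonds, or None if no
/// valid assignment exists.
pub fn kekulize(n_atoms: usize, bonds: &[(usize, usize)]) -> Option<Vec<usize>> {
    let mut matched = vec![false; n_atoms];
    let mut chosen = Vec::new();
    if assign(bonds, &mut matched, &mut chosen) {
        Some(chosen)
    } else {
        None
    }
}

fn assign(bonds: &[(usize, usize)], matched: &mut [bool], chosen: &mut Vec<usize>) -> bool {
    // Pick the first atom that still needs a double bond.
    let atom = match matched.iter().position(|&m| !m) {
        Some(atom) => atom,
        None => return true, // every atom is satisfied
    };
    for (i, &(a, b)) in bonds.iter().enumerate() {
        // Find the neighbour on the other end of a bond touching `atom`.
        let other = if a == atom {
            b
        } else if b == atom {
            a
        } else {
            continue;
        };
        if other == atom || matched[other] {
            continue;
        }
        // Tentatively make this bond double, then recurse.
        matched[atom] = true;
        matched[other] = true;
        chosen.push(i);
        if assign(bonds, matched, chosen) {
            return true;
        }
        // Dead end: undo the choice and try the next bond.
        chosen.pop();
        matched[atom] = false;
        matched[other] = false;
    }
    false
}
```

For benzene, written as the six ring bonds `(0, 1), (1, 2), ..., (5, 0)`, the search picks every other bond. A five-membered all-carbon ring has no valid assignment under this assumption, and the function returns `None`.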
chemcore, the most complete pure-Rust attempt, has supported kekulization since its initial release (v0.1.x, June 2020). A benchmark published alongside v0.3.1 in October 2020 showed it handling edge cases that RDKit cannot. But kekulization is one step. What chemcore does not have: fingerprints, 2D coordinate generation, SMARTS matching, or stereochemistry. The last release was February 2021. Kekulization turned out not to be the finish line.
Aromaticity: no agreed definition
Even with kekulization in place, aromaticity perception is harder than it looks — partly because aromaticity itself has no single agreed-upon definition in cheminformatics.
Hückel's rule — 4n+2 π electrons — works for monocyclic systems. For polycyclic aromatics and heteroaromatics, implementations diverge. Daylight's original SMILES aromatic model differs from RDKit's model, which differs from CDK's. An algorithm that kekulizes correctly under one model may fail under another.
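The arithmetic of Hückel's rule is the easy part, as this small sketch shows. The `RingAtom` enum and both function names are hypothetical. The hard part is deciding which variant each atom gets, and that decision is where the Daylight, RDKit, and CDK models part ways.

```rust
/// Hypothetical classification of how many pi electrons a ring atom donates.
/// Choosing the variant for each atom is the model-dependent step.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RingAtom {
    /// Atom in a ring double bond, e.g. a benzene carbon: 1 electron.
    OneElectron,
    /// Atom donating a lone pair, e.g. the nitrogen in pyrrole: 2 electrons.
    LonePair,
    /// Atom with an empty p orbital: 0 electrons.
    ZeroElectrons,
}

/// Sum the pi electrons contributed by the atoms of one ring.
pub fn ring_pi_electrons(atoms: &[RingAtom]) -> u32 {
    atoms
        .iter()
        .map(|atom| match atom {
            RingAtom::OneElectron => 1,
            RingAtom::LonePair => 2,
            RingAtom::ZeroElectrons => 0,
        })
        .sum()
}

/// True if `pi_electrons` equals 4n + 2 for some non-negative integer n.
pub fn satisfies_huckel(pi_electrons: u32) -> bool {
    pi_electrons >= 2 && (pi_electrons - 2) % 4 == 0
}
```

Benzene (six one-electron carbons) and pyrrole (four one-electron carbons plus a lone-pair nitrogen) both sum to 6 and pass. Cyclobutadiene sums to 4 and fails.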
Any pure-Rust toolkit that wants to produce output compatible with RDKit-generated SMILES needs to match RDKit's aromaticity behavior exactly, not implement some variant of Hückel. That requires reading RDKit's source code and testing against its outputs. It is months of work before any of it is visible to end users.
2D coordinate generation: not attempted
Every cheminformatics toolkit ships 2D depiction — you cannot work with molecules you cannot see. The layout problem is harder than it looks.
RDKit ships its own 2D depiction engine (rdDepictor) and also integrates Schrödinger's CoordGen library, because rdDepictor alone produces clashing depictions for complex ring systems. Two tools are needed because neither is sufficient alone. CoordGen works by matching known ring scaffold templates and running iterative geometry optimization for everything else.
No pure-Rust crate has attempted 2D coordinate generation. Getting it right requires ring perception, a library of scaffold templates, and an optimization pass to resolve clashes. It is a multi-month project, and the output is still wrong until enough templates are added.
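For a sense of scale, the simplest possible "template" is a single ring laid out as a regular polygon. The sketch below is an illustration only, not how rdDepictor or CoordGen work internally. The function name and the tuple representation are assumptions. Everything hard about 2D layout (fused rings, substituent placement, clash resolution) starts where this ends.

```rust
use std::f64::consts::PI;

/// Place the atoms of an isolated n-membered ring on a regular polygon.
/// Assumes n >= 3. Adjacent atoms end up `bond_length` apart because a
/// regular n-gon with circumradius R has side length 2 * R * sin(pi / n).
pub fn ring_template(n: usize, bond_length: f64) -> Vec<(f64, f64)> {
    let radius = bond_length / (2.0 * (PI / n as f64).sin());
    (0..n)
        .map(|i| {
            // Spread the atoms evenly around the circle.
            let angle = 2.0 * PI * i as f64 / n as f64;
            (radius * angle.cos(), radius * angle.sin())
        })
        .collect()
}
```

Calling `ring_template(6, 1.5)` gives six points on a circle, each a bond length of 1.5 away from its ring neighbours.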
Substructure search: the graph is not the chemistry
petgraph (v0.8.3, 377M total downloads) provides VF2-based subgraph isomorphism and is actively maintained. VF2 is the standard algorithm for this — roughly an order of magnitude faster than Ullmann on typical molecule-sized graphs. The graph infrastructure exists in Rust.
SMARTS matching, which is how substructure search works in cheminformatics, requires more than graph isomorphism. A SMARTS pattern [#6;r6] means "a carbon atom in a 6-membered ring." Matching it requires: parsing SMARTS syntax, knowing which atoms belong to which rings, and matching node attributes with chemical semantics — atomic number, formal charge, aromaticity flag, implicit hydrogen count.
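The node-attribute half of that list can be sketched as an atom predicate. The `Atom` and `AtomQuery` types below are hypothetical and cover only a handful of SMARTS primitives. They assume ring perception has already filled in `smallest_ring`, and they skip SMARTS parsing entirely.

```rust
/// Hypothetical atom record, filled in by earlier perception steps.
#[derive(Debug, Clone)]
pub struct Atom {
    pub atomic_number: u8,
    pub formal_charge: i8,
    pub aromatic: bool,
    pub implicit_h: u8,
    /// Size of the smallest ring containing this atom, if any.
    pub smallest_ring: Option<u8>,
}

/// A small subset of SMARTS atom primitives, already parsed into a tree.
#[derive(Debug, Clone)]
pub enum AtomQuery {
    AtomicNumber(u8),    // #6
    SmallestRing(u8),    // r6
    Aromatic(bool),      // a (true) or A (false)
    FormalCharge(i8),    // +1, -1, ...
    And(Vec<AtomQuery>), // ; or &
    Not(Box<AtomQuery>), // !
}

impl AtomQuery {
    /// Evaluate the query against one atom's perceived properties.
    pub fn matches(&self, atom: &Atom) -> bool {
        match self {
            AtomQuery::AtomicNumber(z) => atom.atomic_number == *z,
            AtomQuery::SmallestRing(size) => atom.smallest_ring == Some(*size),
            AtomQuery::Aromatic(flag) => atom.aromatic == *flag,
            AtomQuery::FormalCharge(q) => atom.formal_charge == *q,
            AtomQuery::And(parts) => parts.iter().all(|p| p.matches(atom)),
            AtomQuery::Not(inner) => !inner.matches(atom),
        }
    }
}
```

Under these assumptions, `[#6;r6]` becomes `AtomQuery::And(vec![AtomQuery::AtomicNumber(6), AtomQuery::SmallestRing(6)])`. A predicate like this is what a subgraph-isomorphism routine would call as its node-matching callback.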
Connecting petgraph's isomorphism to a chemistry-aware molecular graph is exactly the glue code that no published Rust crate provides.
Why bindings are the rational choice
RDKit's changelog goes back to 2006. The codebase contains 200+ molecular descriptors, MMFF94 and UFF force fields with their respective validation papers, an ETKDG 3D conformer generator that uses torsion angle statistics from the Cambridge Structural Database, and a PostgreSQL cartridge for large-scale screening. The Python ecosystem wraps all of this: chembl_webresource_client for ChEMBL API access, PandasTools, scikit-learn integration for ML on fingerprints.
rdkit-sys exposes a fraction of this via RDKit's CFFI interface. Choosing bindings over a rewrite is not a concession. It is what you do when you look at how much chemistry is embedded in that C++ code and how long it took to get there.
What changed in 2024-2025, and what 2026 adds so far
2024-2025: rdkit-sys had three releases in 2024, the last in October, and moved into the rdkit-rs/rdkit monorepo. openbabel (Rust bindings) released 0.5.4 in January 2025 — it exposes Open Babel's OBSmartsPattern, which matters if you need substructure search without pulling in RDKit.
2026: The only 2026-specific addition is cosmolkit (v0.2.3, May 2026, 778 downloads). It claims an ambitious scope — SMILES, SDF, conformers, molecular graphs, "AI-ready workflows" — but it is too new to evaluate. Whether it addresses aromaticity perception and 2D layout, the parts that stopped every earlier attempt, is not clear from the current documentation.
As of this writing, nothing else has shipped in 2026. The structural gap between Rust and Python cheminformatics is the same as it was in 2025.
The actual hard part
The challenging problems in cheminformatics are not Rust-specific. Ownership and lifetimes will slow you down on day one; aromaticity will block you in month three. The chemistry fundamentals (aromaticity perception, 2D layout, stereochemistry, substructure matching) require domain knowledge that does not come from a Rust tutorial.
RDKit did not get where it is because C++ is better than Rust. It got there because a team of chemists and programmers spent two decades solving specific, hard chemistry problems. Whoever builds the Rust equivalent will need to solve the same problems.
I have been working around these gaps while building chem-wasm-lens, a pure-Rust molecular analysis library targeting the browser via WebAssembly. Restricting scope — no SMARTS, no full stereochemistry — made it possible to ship. But restricted scope is different from a general-purpose toolkit, and that distinction matters.