I've been reading “LangChain for Life Sciences and Healthcare” by Ivan Reznikov, published by O'Reilly, and here's what stood out to me:
In chemistry AI, the way we represent molecules may shape how models “understand” chemistry.
Chemistry-tuned LLMs don't interpret molecules the way chemists do. They interpret them as sequences of tokens.
Those tokens can come in different molecular representations:
โข ๐๐๐๐๐๐
โข ๐๐๐๐
๐๐๐
โข ๐๐ง๐๐ก๐ ๐ข๐๐๐ง๐ญ๐ข๐๐ข๐๐ซ๐ฌ
This creates a fascinating challenge for generative AI:
Which molecular representation gives LLMs the best ability to reason about chemistry?
SMILES is compact and widely used, but it struggles with ambiguity (the same molecule can be written as many different SMILES strings), stereochemistry, and incomplete molecular context.
SELFIES is more robust because every generated sequence maps to a valid molecule.
InChI provides standardization, but its layered, complex syntax makes sequence generation harder.
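Because SMILES has its own grammar, a model emitting it token by token can produce strings that are not even syntactically valid, which SELFIES rules out by construction. The hypothetical `smiles_syntax_errors` helper below is a deliberately small sketch that checks only branch parentheses and ring-closure pairing; it ignores valence and aromaticity. In practice you would rely on a toolkit such as RDKit (its `Chem.MolFromSmiles` returns `None` for strings it cannot parse or sanitize), and the `selfies` Python package provides `encoder` and `decoder` functions to convert between SMILES and SELFIES.

```python
import re

# Bracket atoms such as [NH4+] or [13C] can contain digits that are NOT
# ring-closure labels, so they are collapsed to a placeholder first.
BRACKET_ATOM = re.compile(r"\[[^\]]*\]")


def smiles_syntax_errors(smiles: str) -> list[str]:
    """Return syntax problems found in a SMILES string (empty list = none found).

    Only branch parentheses and ring-closure pairing are checked; valence,
    aromaticity and element symbols are NOT validated.
    """
    errors = []
    body = BRACKET_ATOM.sub("*", smiles)
    if "[" in body or "]" in body:
        errors.append("unbalanced square brackets")

    depth = 0           # current branch nesting level
    open_rings = set()  # ring-closure labels seen an odd number of times
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                errors.append("unmatched ')'")
                depth = 0
        elif ch.isdigit() or ch == "%":
            # A ring-closure label is one digit, or '%' followed by two digits.
            if ch == "%":
                label = body[i + 1:i + 3]
                i += 2
            else:
                label = ch
            # The first occurrence opens a ring bond, the next one closes it.
            open_rings ^= {label}
        i += 1

    if depth > 0:
        errors.append(f"{depth} unclosed '('")
    for label in sorted(open_rings):
        errors.append(f"ring closure {label} never closed")
    return errors


if __name__ == "__main__":
    for candidate in ["CC(=O)OCC", "CC(=O", "C1CCCCC", "c1ccccc1"]:
        print(candidate, smiles_syntax_errors(candidate) or "no syntax errors found")
```

A truncated generation such as `CC(=O` or a benzene ring that never closes is exactly the kind of output a SMILES decoder can produce and a SELFIES decoder cannot.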
Molecular representation is not just a formatting choice. It directly influences how AI models learn chemical relationships.
This is where initiatives like GT4SD (Generative Toolkit for Scientific Discovery) become important.
GT4SD explores chemistry-focused generative models capable of:
🧪 Chemical reaction prediction
🧪 Retrosynthesis prediction
🧪 Description → SMILES generation
🧪 SMILES → caption generation
🧪 Paragraph → laboratory actions
The book evaluated several chemistry-tuned models, including:
โข ๐๐๐๐๐ ๐ฆ๐ฎ๐ฅ๐ญ๐ข๐ญ๐๐ฌ๐ค ๐๐ ๐ฆ๐จ๐๐๐ฅ๐ฌ
โข ๐๐จ๐ฅ๐๐
โข ๐๐๐๐๐๐๐-๐๐
What I found most interesting was not the successes, but the failures.
During reaction prediction tasks such as Fischer esterification, several models generated chemically implausible molecules, sometimes introducing atoms never present in the reactants.
And that exposes the deeper issue:
⚠️ Predicting molecular tokens is not the same as understanding chemistry.
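One cheap guard against the esterification failure described above (products containing atoms never present in the reactants) is plain atom bookkeeping on a reaction SMILES of the form `reactants>agents>products`. The sketch below counts explicit heavy atoms only (implicit hydrogens are ignored) and flags elements that appear in the products but nowhere among the reactants or agents. The helper names are hypothetical, and the “bad” prediction is an invented illustration, not an output reported in the book.

```python
import re
from collections import Counter

# Bracket atoms, two-letter halogens, then organic-subset atoms
# (uppercase = aliphatic, lowercase = aromatic). Bonds, branches,
# ring digits and '.' are ignored when counting.
ATOM_PATTERN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")
# Inside brackets: optional isotope digits, then the element symbol.
BRACKET_ELEMENT = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})")


def heavy_atom_counts(smiles: str) -> Counter:
    """Count explicit non-hydrogen atoms per element; implicit H is not counted."""
    counts = Counter()
    for token in ATOM_PATTERN.findall(smiles):
        if token.startswith("["):
            match = BRACKET_ELEMENT.match(token)
            if match is None:  # e.g. a wildcard atom [*]
                continue
            token = match.group(1)
        element = token.capitalize()  # aromatic 'c' -> 'C', 'se' -> 'Se'
        if element != "H":
            counts[element] += 1
    return counts


def new_elements_in_products(reaction_smiles: str) -> set[str]:
    """Elements present in the products but absent from reactants and agents."""
    reactants, agents, products = reaction_smiles.split(">")
    available = heavy_atom_counts(reactants) + heavy_atom_counts(agents)
    return set(heavy_atom_counts(products)) - set(available)


def heavy_atom_imbalance(reaction_smiles: str) -> dict[str, int]:
    """Per-element (products - reactants) heavy-atom difference, non-zero only."""
    reactants, _agents, products = reaction_smiles.split(">")
    before, after = heavy_atom_counts(reactants), heavy_atom_counts(products)
    return {
        element: after[element] - before[element]
        for element in sorted(set(before) | set(after))
        if after[element] != before[element]
    }


if __name__ == "__main__":
    # Fischer esterification: acetic acid + ethanol -> ethyl acetate + water
    ok = "CC(=O)O.CCO>>CC(=O)OCC.O"
    print(new_elements_in_products(ok), heavy_atom_imbalance(ok))  # set() {}

    # Invented faulty prediction: a chlorine atom appears from nowhere
    bad = "CC(=O)O.CCO>>CC(=O)OCCCl"
    print(new_elements_in_products(bad))  # {'Cl'}
    print(heavy_atom_imbalance(bad))      # {'Cl': 1, 'O': -1}
```

Passing this check says nothing about whether a predicted product is chemically right; it only catches the most blatant hallucinations, which is precisely why token prediction alone is not enough.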
Real chemistry depends on:
โข Reaction mechanisms
โข Thermodynamics
โข Electron movement
โข Stereochemistry
โข 3D spatial interactions
A token sequence can only capture part of that reality.
One takeaway from the chapter was:
“Without understanding the science behind chemical reactions, models are just guessing possible solutions.”
The future of AI-driven chemistry likely won't come from LLMs alone.
It will come from hybrid systems combining:
🔹 Language models
🔹 Graph neural networks
🔹 3D molecular representations
🔹 Physics-informed AI
🔹 Symbolic chemical reasoning
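Of the hybrid ingredients listed, graph neural networks contrast most directly with token sequences: they consume a molecule as atoms (nodes) and bonds (edges). The sketch below is a deliberately minimal, hypothetical SMILES-to-graph parser (organic subset plus simple bracket atoms, with stereo marks and charges ignored), written for illustration only; in practice the graph would come from a cheminformatics toolkit such as RDKit. It shows that `CCO`, `OCC` and `C(O)C` are three different token sequences but one graph, while `COC` (dimethyl ether, same formula) is a different graph.

```python
import re

# Tokens: bracket atoms, two-letter halogens, organic-subset atoms
# (upper = aliphatic, lower = aromatic), ring labels, bond/branch/dot symbols.
TOKEN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]|%\d{2}|\d|[=#:\-()./\\]")
BRACKET_ELEMENT = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})")
BOND_ORDER = {"-": 1.0, "=": 2.0, "#": 3.0, ":": 1.5}


def _bond_order(symbol, a, b, aromatic):
    """Explicit bond symbol wins; otherwise aromatic-aromatic is 1.5, else single."""
    if symbol in BOND_ORDER:
        return BOND_ORDER[symbol]
    return 1.5 if aromatic[a] and aromatic[b] else 1.0


def smiles_to_graph(smiles):
    """Parse a simple, well-formed SMILES string into (atoms, edges).

    atoms: element symbols, one per node; edges: (i, j, bond_order) tuples.
    """
    atoms, aromatic, edges = [], [], []
    branch_stack = []  # atom to return to when a branch ')' closes
    open_rings = {}    # ring label -> (atom index, bond symbol at opening)
    prev = None        # atom the next atom will bond to
    bond = None        # pending explicit bond symbol, if any

    for tok in TOKEN.findall(smiles):
        if tok == "(":
            branch_stack.append(prev)
        elif tok == ")":
            prev = branch_stack.pop()
        elif tok == ".":
            prev = None  # '.' separates disconnected fragments
        elif tok in BOND_ORDER or tok in ("/", "\\"):
            bond = tok   # '/' and '\' are single bonds carrying stereo info
        elif tok[0].isdigit() or tok[0] == "%":
            if prev is None:
                raise ValueError(f"ring label {tok} before any atom")
            if tok in open_rings:
                start, open_bond = open_rings.pop(tok)
                symbol = bond if bond is not None else open_bond
                edges.append((start, prev, _bond_order(symbol, start, prev, aromatic)))
            else:
                open_rings[tok] = (prev, bond)
            bond = None
        else:
            text = BRACKET_ELEMENT.match(tok).group(1) if tok.startswith("[") else tok
            atoms.append(text.capitalize())
            aromatic.append(text.islower())
            idx = len(atoms) - 1
            if prev is not None:
                edges.append((prev, idx, _bond_order(bond, prev, idx, aromatic)))
            prev, bond = idx, None

    if open_rings:
        raise ValueError(f"unclosed ring labels: {sorted(open_rings)}")
    return atoms, edges


def environment_signature(atoms, edges):
    """Crude graph invariant: each atom's element plus its sorted (neighbour, order) list.

    Equal signatures do not prove two graphs are identical (this is not an
    isomorphism test), but different SMILES for one molecule always agree.
    """
    neighbours = [[] for _ in atoms]
    for i, j, order in edges:
        neighbours[i].append((atoms[j], order))
        neighbours[j].append((atoms[i], order))
    return sorted((atoms[k], tuple(sorted(neighbours[k]))) for k in range(len(atoms)))


if __name__ == "__main__":
    ethanol = [environment_signature(*smiles_to_graph(s)) for s in ("CCO", "OCC", "C(O)C")]
    print(all(sig == ethanol[0] for sig in ethanol))                     # True
    print(environment_signature(*smiles_to_graph("COC")) == ethanol[0])  # False
```

The node and edge lists are the kind of structure a GNN consumes, although the exact input format (feature tensors, edge indices) depends on the library you use.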
We are moving from models that generate chemistry to models that may eventually understand chemistry.
