Agentic AI in chemistry

#ai #agentaichallenge #agents #llm

I’ve been reading “𝐋𝐚𝐧𝐠𝐂𝐡𝐚𝐢𝐧 𝐟𝐨𝐫 𝐋𝐢𝐟𝐞 𝐒𝐜𝐢𝐞𝐧𝐜𝐞𝐬 𝐚𝐧𝐝 𝐇𝐞𝐚𝐥𝐭𝐡𝐜𝐚𝐫𝐞” by Ivan Reznikov, published by O'Reilly, and here’s what stood out to me:
In 𝐜𝐡𝐞𝐦𝐢𝐬𝐭𝐫𝐲 𝐀𝐈, the way we represent molecules may shape how models “understand” chemistry.
𝐂𝐡𝐞𝐦𝐢𝐬𝐭𝐫𝐲-𝐭𝐮𝐧𝐞𝐝 𝐋𝐋𝐌𝐬 𝐝𝐨𝐧’𝐭 𝐢𝐧𝐭𝐞𝐫𝐩𝐫𝐞𝐭 𝐦𝐨𝐥𝐞𝐜𝐮𝐥𝐞𝐬 𝐥𝐢𝐤𝐞 𝐜𝐡𝐞𝐦𝐢𝐬𝐭𝐬 𝐝𝐨. They interpret them as 𝐬𝐞𝐪𝐮𝐞𝐧𝐜𝐞𝐬 𝐨𝐟 𝐭𝐨𝐤𝐞𝐧𝐬.
Those tokens come from a text encoding of the molecule, and several encodings are in use:
• 𝐒𝐌𝐈𝐋𝐄𝐒
• 𝐒𝐄𝐋𝐅𝐈𝐄𝐒
• 𝐈𝐧𝐂𝐡𝐈 𝐢𝐝𝐞𝐧𝐭𝐢𝐟𝐢𝐞𝐫𝐬
This creates a fascinating challenge for generative AI:
👉 𝐖𝐡𝐢𝐜𝐡 𝐦𝐨𝐥𝐞𝐜𝐮𝐥𝐚𝐫 𝐫𝐞𝐩𝐫𝐞𝐬𝐞𝐧𝐭𝐚𝐭𝐢𝐨𝐧 𝐠𝐢𝐯𝐞𝐬 𝐋𝐋𝐌𝐬 𝐭𝐡𝐞 𝐛𝐞𝐬𝐭 𝐚𝐛𝐢𝐥𝐢𝐭𝐲 𝐭𝐨 𝐫𝐞𝐚𝐬𝐨𝐧 𝐚𝐛𝐨𝐮𝐭 𝐜𝐡𝐞𝐦𝐢𝐬𝐭𝐫𝐲?
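𝐒𝐌𝐈𝐋𝐄𝐒 is compact and widely used, but it is ambiguous (one molecule can be written as many different valid strings), its stereochemistry markers are optional and easy to drop, and the string carries no 3D context.
𝐒𝐄𝐋𝐅𝐈𝐄𝐒 is more robust because every generated sequence maps to a valid molecule, so a model cannot emit a syntactically broken structure.
𝐈𝐧𝐂𝐡𝐈 provides standardization, but its layered syntax makes sequence generation harder.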
Molecular representation is not just a formatting choice. It directly influences how AI models learn chemical relationships.
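To make the "sequence of tokens" view concrete, here is a minimal Python sketch of SMILES tokenization. The regex is a pattern commonly used for this purpose; it is an assumption for illustration, not the tokenizer of any model mentioned in this post, and the name `tokenize` is my own.

```python
import re

# Bracket atoms first, then two-letter halogens, then single-letter atoms,
# two-digit ring closures (%NN), single digits, and bond/branch symbols.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOSPFIbcnosp]|%\d{2}|\d|[=#\-+\\/()@.~*:$])"
)


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into tokens; raise if any character is left over."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


# Ethanol written two ways: same molecule, different token sequences.
print(tokenize("CCO"))  # ['C', 'C', 'O']
print(tokenize("OCC"))  # ['O', 'C', 'C']

# A bracket atom carrying a stereo marker stays a single token.
print(tokenize("C[C@H](N)C(=O)O"))
```

The two ethanol strings are the ambiguity problem in miniature: a model sees two unrelated sequences unless the training data is canonicalized or augmented.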
This is where initiatives like 𝑮𝑻4𝑺𝑫 (Generative Toolkit for Scientific Discovery) become important.
GT4SD explores chemistry-focused generative models capable of:
🧪 Chemical reaction prediction
🧪 Retrosynthesis prediction
🧪 Description → SMILES generation
🧪 SMILES → caption generation
🧪 Paragraph → laboratory actions
The book evaluates several chemistry-tuned models, including:
• 𝐆𝐓𝟒𝐒𝐃 𝐦𝐮𝐥𝐭𝐢𝐭𝐚𝐬𝐤 𝐓𝟓 𝐦𝐨𝐝𝐞𝐥𝐬
• 𝐌𝐨𝐥𝐓𝟓
• 𝐂𝐇𝐄𝐌𝐋𝐋𝐌-𝟐𝐛
What I found most interesting was not the successes, but the failures.
During reaction prediction tasks such as Fischer esterification, several models generated chemically implausible molecules, sometimes introducing atoms never present in the reactants.
And that exposes the deeper issue:
⚠️ 𝐏𝐫𝐞𝐝𝐢𝐜𝐭𝐢𝐧𝐠 𝐦𝐨𝐥𝐞𝐜𝐮𝐥𝐚𝐫 𝐭𝐨𝐤𝐞𝐧𝐬 𝐢𝐬 𝐧𝐨𝐭 𝐭𝐡𝐞 𝐬𝐚𝐦𝐞 𝐚𝐬 𝐮𝐧𝐝𝐞𝐫𝐬𝐭𝐚𝐧𝐝𝐢𝐧𝐠 𝐜𝐡𝐞𝐦𝐢𝐬𝐭𝐫𝐲.
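One of the failures above, products containing atoms that no reactant had, can be caught with a purely symbolic check. The sketch below is my own illustration, not code from the book or GT4SD. It uses a simplified regex (an assumption) that reads element symbols out of SMILES and ignores implicit hydrogens, and the names `elements` and `foreign_elements` are hypothetical.

```python
import re

# Simplified element extractor (an assumption for illustration): bracket atoms,
# Cl/Br, the one-letter organic subset, and aromatic lowercase atoms.
ATOM = re.compile(
    r"\[(?:\d+)?([A-Z][a-z]?|[a-z]{1,2})[^\]]*\]"  # bracket atom, optional isotope
    r"|(Cl|Br)"                                    # two-letter halogens
    r"|([BCNOSPFI])"                               # aliphatic organic subset
    r"|([bcnops])"                                 # aromatic atoms
)


def elements(smiles: str) -> set[str]:
    """Return the set of element symbols written explicitly in a SMILES string."""
    found = set()
    for bracket, halogen, aliphatic, aromatic in ATOM.findall(smiles):
        symbol = bracket or halogen or aliphatic or aromatic
        found.add(symbol.capitalize())  # 'c' -> 'C', 'se' -> 'Se'
    return found


def foreign_elements(reactants: list[str], products: list[str]) -> set[str]:
    """Elements that appear in the products but in none of the reactants."""
    allowed = set().union(*(elements(s) for s in reactants))
    produced = set().union(*(elements(s) for s in products))
    return produced - allowed


# Fischer esterification: acetic acid + ethanol -> ethyl acetate + water.
reactants = ["CC(=O)O", "CCO"]
print(foreign_elements(reactants, ["CC(=O)OCC", "O"]))  # set()

# A made-up bad prediction that swaps the ester oxygen for nitrogen.
print(foreign_elements(reactants, ["CC(=O)NCC"]))  # {'N'}
```

This is a necessary condition only. A prediction can pass it and still be wrong, because the check knows nothing about atom counts, mechanisms, or stereochemistry.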
Real chemistry depends on:
• Reaction mechanisms
• Thermodynamics
• Electron movement
• Stereochemistry
• 3D spatial interactions
𝐀 𝟏𝐃 𝐭𝐨𝐤𝐞𝐧 𝐬𝐞𝐪𝐮𝐞𝐧𝐜𝐞 𝐜𝐚𝐧 𝐨𝐧𝐥𝐲 𝐜𝐚𝐩𝐭𝐮𝐫𝐞 𝐩𝐚𝐫𝐭 𝐨𝐟 𝐭𝐡𝐚𝐭 𝐫𝐞𝐚𝐥𝐢𝐭𝐲.
One takeaway from the chapter was:
“𝐖𝐢𝐭𝐡𝐨𝐮𝐭 𝐮𝐧𝐝𝐞𝐫𝐬𝐭𝐚𝐧𝐝𝐢𝐧𝐠 𝐭𝐡𝐞 𝐬𝐜𝐢𝐞𝐧𝐜𝐞 𝐛𝐞𝐡𝐢𝐧𝐝 𝐜𝐡𝐞𝐦𝐢𝐜𝐚𝐥 𝐫𝐞𝐚𝐜𝐭𝐢𝐨𝐧𝐬, 𝐦𝐨𝐝𝐞𝐥𝐬 𝐚𝐫𝐞 𝐣𝐮𝐬𝐭 𝐠𝐮𝐞𝐬𝐬𝐢𝐧𝐠 𝐩𝐨𝐬𝐬𝐢𝐛𝐥𝐞 𝐬𝐨𝐥𝐮𝐭𝐢𝐨𝐧𝐬.”
The future of AI-driven chemistry likely won’t come from LLMs alone.
It will come from hybrid systems combining:
🔹 Language models
🔹 Graph neural networks
🔹 3D molecular representations
🔹 Physics-informed AI
🔹 Symbolic chemical reasoning
We are moving from models that generate chemistry to models that may eventually understand chemistry.
