Drug discovery takes over a decade and costs billions. Researchers jump between PubMed, chemical databases, internal compound libraries, safety reports - all disconnected, all siloed.
Databricks just published a blueprint showing how MCP closes that gap. And honestly, it's one of the most concrete agentic AI demos I've seen from any major data platform this year.
Let me break it down.
What AiChemy Actually Is
AiChemy is a multi-agent system Databricks built on their own platform. The architecture is simple to describe but hard to execute: a supervisor agent that routes tasks across multiple specialized sub-agents, each connected to a different data source.
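To make the supervisor pattern concrete, here is a minimal sketch that assumes a naive keyword-based router. The `SubAgent` class, the keyword lists, and the `route` function are hypothetical illustrations, not AiChemy's code. A real supervisor would typically let an LLM make the routing decision, and each handler would call an MCP tool rather than a stub.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SubAgent:
    name: str
    keywords: tuple[str, ...]      # hypothetical routing hints
    handle: Callable[[str], str]   # stand-in for a real tool call


def make_stub(source: str) -> Callable[[str], str]:
    # Stub handler: reports which source would be queried, nothing more.
    return lambda task: f"[{source}] would handle: {task}"


SUB_AGENTS = [
    SubAgent("opentargets", ("disease", "gene", "target"), make_stub("OpenTargets")),
    SubAgent("pubchem", ("molecular", "property", "weight"), make_stub("PubChem")),
    SubAgent("pubmed", ("literature", "paper", "study"), make_stub("PubMed")),
]


def route(task: str) -> list[str]:
    """Send the task to every sub-agent whose keywords appear in it."""
    lowered = task.lower()
    matched = [a for a in SUB_AGENTS if any(k in lowered for k in a.keywords)]
    return [a.handle(task) for a in matched]
```

Calling `route("Find literature on disease targets")` dispatches to the OpenTargets and PubMed stubs and skips PubChem. The point is the shape: one entry point, many specialized handlers.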
The data sources include external MCP servers - OpenTargets for disease-gene associations, PubChem for molecular properties, PubMed for literature - and internal Databricks-managed MCP servers connected to proprietary chemical libraries.
One internal source is a Genie Space, which gives the agent text-to-SQL access over a structured drug properties database. The other is a Vector Search index over ZINC - a library of 250,000 commercially available molecules - embedded using ECFP4 molecular fingerprints. That's the bit that lets the agent do chemical similarity search, not just keyword search.
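To see why fingerprints enable similarity search, here is a small sketch using the Tanimoto coefficient, the conventional similarity measure for binary fingerprints such as ECFP4. The bit sets and molecule names below are made up for illustration, and nothing here says which distance metric the AiChemy index actually uses.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


# Hypothetical fingerprints: each set holds the indices of bits that are on.
LIBRARY = {
    "mol_A": {1, 5, 9, 12, 40},
    "mol_B": {1, 5, 9, 13, 77},
    "mol_C": {2, 3, 88},
}


def most_similar(query: set[int], k: int = 2) -> list[tuple[str, float]]:
    """Return the k library entries with the highest Tanimoto score."""
    scored = [(name, tanimoto(query, fp)) for name, fp in LIBRARY.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

With the query `{1, 5, 9, 12, 41}`, `most_similar` ranks `mol_A` ahead of `mol_B` because it shares more on-bits with the query. A keyword search has no equivalent notion of "structurally close."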
The result: a researcher can ask AiChemy to find compounds similar to a known drug like Elacestrant, pull disease context from OpenTargets, cross-reference with PubMed literature, and get a formatted research summary - all in one pass.
The Part Most People Are Missing: Skills
The more interesting angle here isn't even the MCP wiring. It's the Skills layer sitting on top.
Skills in this context are structured instruction sets that load into the agent when triggered. They don't change what data the agent can access - they change how it reasons and formats output for specific task types. Think of them as context injection for consistent, domain-specific behavior.
For something like drug discovery, this matters a lot. A lead identification task and a safety assessment task look completely different in terms of output format, reasoning sequence, and regulatory language. Skills let you encode that institutional knowledge in a reusable, repeatable way - without fine-tuning anything.
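Here is a rough sketch of what "context injection" could look like in practice. Everything in it is an assumption for illustration: the `Skill` class, the trigger phrases, and the instruction text are invented, and the source does not describe how AiChemy stores or triggers its Skills.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    triggers: tuple[str, ...]   # hypothetical phrases that activate the skill
    instructions: str           # text injected into the prompt when active


SKILLS = [
    Skill(
        "lead_identification",
        ("similar compounds", "lead"),
        "Rank candidate compounds and report similarity scores in a table.",
    ),
    Skill(
        "safety_assessment",
        ("safety", "toxicity"),
        "List known adverse effects first, citing the source for each.",
    ),
]

BASE_PROMPT = "You are a research assistant for drug discovery."


def build_prompt(task: str) -> str:
    """Prepend the instructions of every triggered skill to the task."""
    lowered = task.lower()
    active = [s for s in SKILLS if any(t in lowered for t in s.triggers)]
    sections = [BASE_PROMPT]
    sections += [f"## Skill: {s.name}\n{s.instructions}" for s in active]
    sections.append(f"Task: {task}")
    return "\n\n".join(sections)
```

Note that the tools and data sources never change here. Only the instructions around the task do, which is the distinction the Skills layer is drawing.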
This design pattern isn't unique to Databricks, but AiChemy is one of the first public examples of Skills being deployed at this level of domain specificity. That's worth paying attention to.
Why This Matters Beyond Pharma
The drug discovery framing is what makes this newsworthy, but the architecture applies to anything with heterogeneous, high-stakes data.
Finance. Legal. Supply chain. Any domain where an agent needs to pull from structured databases, unstructured document stores, and external knowledge bases simultaneously - and where the output needs to be traceable, not just "good enough."
The pattern Databricks demonstrated is:
- External MCP servers for public or third-party knowledge
- Databricks-managed MCP servers for internal, governed data
- Genie Spaces for structured SQL-accessible data
- Vector Search for embedding-based retrieval over proprietary corpora
- Skills for task-specific reasoning and output formatting
- A supervisor agent to orchestrate all of the above
That's the blueprint for a production-grade agentic stack, not a toy demo or a throwaway proof of concept.
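One way to make the traceability requirement concrete is to tag every finding with the source and layer it came from. The sketch below is hypothetical: the `Layer`, `Source`, and `Finding` types, the `drug_properties` and `zinc_index` names, and the `render` function are invented for illustration and do not reflect any Databricks API.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    EXTERNAL_MCP = "external MCP server"
    MANAGED_MCP = "Databricks-managed MCP server"
    GENIE_SPACE = "Genie Space"
    VECTOR_SEARCH = "Vector Search"


@dataclass(frozen=True)
class Source:
    name: str
    layer: Layer


@dataclass(frozen=True)
class Finding:
    text: str
    source: Source  # provenance travels with the claim


SOURCES = [
    Source("OpenTargets", Layer.EXTERNAL_MCP),
    Source("PubMed", Layer.EXTERNAL_MCP),
    Source("drug_properties", Layer.GENIE_SPACE),   # hypothetical name
    Source("zinc_index", Layer.VECTOR_SEARCH),      # hypothetical name
]


def render(findings: list[Finding]) -> str:
    """Format findings so each line names where it came from."""
    return "\n".join(
        f"- {f.text} (source: {f.source.name}, {f.source.layer.value})"
        for f in findings
    )
```

A supervisor that only ever passes around `Finding` objects cannot emit an unsourced claim, which is the property that matters in high-stakes domains.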
What This Tells Us About MCP Adoption in 2025
A few months ago, MCP was still being treated as an interesting protocol for developer tools. Claude Desktop add-ons. Local automation scripts.
AiChemy is a signal that MCP is now being deployed inside enterprise data platforms to solve real domain problems. Databricks built MCP Catalog, managed MCP servers with Unity Catalog governance, and an MCP tab in Agent Bricks - all within the last year. The infrastructure investment is real.
The question isn't whether enterprises will adopt MCP-native agents. It's how fast they'll build the internal knowledge infrastructure - the vector indexes, the Genie Spaces, the Skills libraries - that makes those agents actually useful.
That's the work happening right now. And it's not small.
The Bigger Picture
What Databricks built with AiChemy is less about drug discovery specifically and more about what a governed, production-ready agentic system looks like when you have serious data infrastructure behind it.
The MCP layer handles connectivity. The Skills layer handles consistency. Unity Catalog handles governance. The supervisor handles orchestration.
Each piece is independently useful. Together they're something qualitatively different from what most MCP demos show.
If you're building agentic AI systems on top of enterprise data - or writing about this space - AiChemy is worth studying carefully. The GitHub repo is public. The architecture diagrams are clear. It's one of the better documented real-world MCP deployments I've come across.