<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Kantemir Satibalov</title>
    <description>The latest articles on DEV Community by Kantemir Satibalov (@kantik001).</description>
    <link>https://dev.to/kantik001</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4009665%2Fb646b101-b5c6-4005-98d8-2fe3930bb3a7.png</url>
      <title>DEV Community: Kantemir Satibalov</title>
      <link>https://dev.to/kantik001</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kantik001"/>
    <language>en</language>
    <item>
      <title>I stopped trusting generic LLMs for horticulture — so I built a grounded assistant on ~500 scientific articles</title>
      <dc:creator>Kantemir Satibalov</dc:creator>
      <pubDate>Tue, 30 Jun 2026 13:23:58 +0000</pubDate>
      <link>https://dev.to/kantik001/i-stopped-trusting-generic-llms-for-horticulture-so-i-built-a-grounded-assistant-on-500-46p9</link>
      <guid>https://dev.to/kantik001/i-stopped-trusting-generic-llms-for-horticulture-so-i-built-a-grounded-assistant-on-500-46p9</guid>
      <description>&lt;p&gt;Last year I kept seeing the same pattern in agtech and “AI assistant” demos: a chatbot wrapped around a generic model, a handful of PDFs, and a disclaimer nobody reads.&lt;/p&gt;

&lt;p&gt;I'm a developer, not an agronomist. But I'm working on two related projects: a grounded RAG platform (grounded-llm, private repo) and its first production-shaped domain pack, a horticulture assistant built on hundreds of articles from the Russian journal Plodovodstvo i vinogradstvo Yuga Rossii (apple, pear, plum; roughly 500 source articles, not five blog posts).&lt;/p&gt;

&lt;p&gt;I didn't want another “ChatGPT for gardeners.”&lt;br&gt;
I wanted answers that behave like someone who actually read the literature — and admits when the literature doesn't cover the question.&lt;/p&gt;

&lt;p&gt;That gap turned into months of engineering. I'm sharing the story in public; the full corpus and codebase stay private.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What broke first: “sounds right” ≠ “is right”&lt;/strong&gt;&lt;br&gt;
Early experiments failed in boring, repeatable ways.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Domain language doesn't match generic retrieval&lt;/strong&gt;&lt;br&gt;
Russian horticulture is full of synonyms and notation variants: rootstock labels, disease names, regional cultivars. A user writes марссониоз; the literature may use Marssonina, abbreviations, or OCR-noisy spellings. Naive retrieval misses; the model fills the gap confidently.&lt;/p&gt;
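
&lt;p&gt;One way to picture a fix for this failure is query expansion from a glossary before retrieval. The sketch below is illustrative only: the glossary entries and the &lt;code&gt;expand_query&lt;/code&gt; helper are hypothetical, not the project's actual glossary or code.&lt;/p&gt;

```python
# Hypothetical glossary: each group lists spellings that should reach the same passages.
GLOSSARY = [
    {"марссониоз", "marssonina", "marssonina coronaria"},
    {"парша", "venturia inaequalis", "apple scab"},
]


def expand_query(query: str) -> list[str]:
    """Return the query plus every glossary variant of any term it mentions."""
    lowered = query.lower()
    variants = [query]
    for group in GLOSSARY:
        if any(term in lowered for term in group):
            # Sort for deterministic output; skip variants already present in the query.
            variants.extend(sorted(t for t in group if t not in lowered))
    return variants


print(expand_query("Как лечить марссониоз яблони?"))
```

&lt;p&gt;Each variant can then be sent to retrieval alongside the original query, so a Cyrillic question still has a chance of matching passages that use the Latin name.&lt;/p&gt;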

&lt;p&gt;&lt;strong&gt;2. Scientific text isn't FAQ-shaped&lt;/strong&gt;&lt;br&gt;
Articles contain experiment sections, tables, and “brief for the grower” blocks. One chunk size for everything → right article, wrong paragraph → fluent wrong answer.&lt;/p&gt;
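
&lt;p&gt;A minimal sketch of the alternative to one chunk size for everything: pick the chunk limit from the section type and keep that type as metadata. The section labels, the word limits, and the &lt;code&gt;chunk_section&lt;/code&gt; helper are assumptions made for illustration, not tuned values from the project.&lt;/p&gt;

```python
# Hypothetical per-section limits (in words); the values are illustrative, not tuned.
MAX_WORDS = {"experiment": 120, "table": 60, "grower_brief": 40}
DEFAULT_MAX_WORDS = 80


def chunk_section(kind: str, text: str) -> list[dict]:
    """Split one article section into chunks sized for its section type."""
    limit = MAX_WORDS.get(kind, DEFAULT_MAX_WORDS)
    words = text.split()
    chunks = []
    for start in range(0, len(words), limit):
        # Keep the section type on every chunk so retrieval can filter or boost by it.
        chunks.append({"kind": kind, "text": " ".join(words[start:start + limit])})
    return chunks


brief = chunk_section("grower_brief", "word " * 100)
print(len(brief), brief[0]["kind"])
```

&lt;p&gt;With the assumed limits, a 100-word grower brief becomes three small chunks, while the same text labeled as an experiment section would stay in one piece.&lt;/p&gt;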

&lt;p&gt;&lt;strong&gt;3. Generation is the wrong place to fix retrieval&lt;/strong&gt;&lt;br&gt;
If the right passage never reaches the prompt, no system prompt saves you. I separated concerns early:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python service → retrieval only (/rag/context)&lt;/li&gt;
&lt;li&gt;Go server → sessions, LLM calls, answer cleanup, guardrails&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not because microservices are fashionable, but because I needed to change and measure retrieval without redeploying the whole product.&lt;/p&gt;

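&lt;p&gt;&lt;strong&gt;What I built: two layers, one product&lt;/strong&gt;&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Layer&lt;/th&gt;&lt;th&gt;What it is&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Platform core&lt;/strong&gt; (grounded-llm)&lt;/td&gt;&lt;td&gt;Auth, Postgres sessions, orchestration&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Domain pack&lt;/strong&gt; (horticulture)&lt;/td&gt;&lt;td&gt;Corpus, crop config, prompts, eval baselines&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;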

&lt;p&gt;There's also a non-agricultural sandbox (demo_hr) — HR policy docs, same pipeline — to show the platform isn't hard-coded to apple diseases.&lt;/p&gt;

&lt;p&gt;The horticulture pack indexes roughly 14,500 text chunks from the journal corpus. At this scale, “vector search only” and “we'll fix it in the prompt” stop being credible.&lt;/p&gt;

&lt;p&gt;I'm not open-sourcing the full article texts (rights + focus). I am sharing architecture lessons, failure modes, and metrics — and offering controlled demos when it's worth someone's time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One question that kept me honest&lt;/strong&gt;&lt;br&gt;
Which rootstocks and training systems show up in slope / terrace planting research for our region?&lt;/p&gt;

&lt;p&gt;Generic LLMs invent varieties and numbers.&lt;br&gt;
A grounded system either retrieves relevant experimental context — rootstocks, spacing, relief, regional trials — or should refuse to answer.&lt;/p&gt;
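
&lt;p&gt;The retrieve-or-refuse rule can be sketched as a gate that runs before any generation. This is a sketch under stated assumptions: the &lt;code&gt;Passage&lt;/code&gt; type, the &lt;code&gt;gate&lt;/code&gt; function, and both thresholds are hypothetical, and real cut-offs would have to come from evaluation on domain questions.&lt;/p&gt;

```python
from dataclasses import dataclass

# Hypothetical thresholds; real values would come from evaluation on the domain suite.
MIN_SCORE = 0.55
MIN_PASSAGES = 2


@dataclass
class Passage:
    text: str
    score: float  # retrieval or reranker relevance, assumed to lie in [0, 1]


def gate(passages: list[Passage]) -> dict:
    """Decide whether to call the LLM or refuse, based on retrieved evidence only."""
    strong = [p for p in passages if p.score >= MIN_SCORE]
    if len(strong) >= MIN_PASSAGES:
        return {"action": "answer", "context": [p.text for p in strong]}
    return {"action": "refuse", "context": []}


evidence = [Passage("trial data on slope plantings", 0.81), Passage("unrelated pest note", 0.20)]
print(gate(evidence)["action"])  # only one passage clears MIN_SCORE, so this prints "refuse"
```

&lt;p&gt;Because the decision depends only on retrieved passages, it can be tested without calling a model at all.&lt;/p&gt;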

&lt;p&gt;That requirement ruled out most tutorial RAG stacks I'd seen. It also ruled out marketing “photo → disease” as the hero feature before a model is actually trained on disease imagery. Vision is on the roadmap; text grounded in papers is what's production-shaped today.&lt;/p&gt;

&lt;p&gt;What I deliberately didn't optimize for (yet):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-tenant SaaS billing&lt;/li&gt;
&lt;li&gt;Viral B2C Telegram growth&lt;/li&gt;
&lt;li&gt;Claiming diagnosis-grade vision from an ImageNet backbone&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I optimized for:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Retrieval you can regression-test&lt;/li&gt;
&lt;li&gt;Answers you can gate before users see them&lt;/li&gt;
&lt;li&gt;A platform you can re-pack for another vertical in days, not months&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;What's next (Part 2)&lt;/strong&gt;&lt;br&gt;
Part 1 was the why.&lt;/p&gt;

&lt;p&gt;Part 2 is the decision that changed everything: I don't trust the pipeline until a fixed suite of domain questions passes retrieval — today 68 questions across apple, pear, plum, and the HR sandbox — before we pay for a single generated token.&lt;/p&gt;
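
&lt;p&gt;As a rough illustration of what “passes retrieval” could mean, the sketch below checks that each question's expected chunks appear in the top-k results. Everything in it is hypothetical: the two suite entries, the chunk IDs, the &lt;code&gt;failing_questions&lt;/code&gt; helper, and the stand-in retriever. The real suite and its pass criteria are the subject of Part 2.&lt;/p&gt;

```python
from typing import Callable

# Hypothetical suite entries: a question plus the chunk IDs that must be retrieved.
SUITE = [
    {"question": "rootstocks for terrace planting", "expected": {"chunk-17"}},
    {"question": "annual leave carry-over policy", "expected": {"hr-03"}},
]


def failing_questions(retrieve: Callable[[str, int], list[str]], k: int = 5) -> list[str]:
    """Return questions whose expected chunks are missing from the top-k results."""
    failures = []
    for case in SUITE:
        top_k = set(retrieve(case["question"], k))
        if not case["expected"].issubset(top_k):
            failures.append(case["question"])
    return failures


def fake_retrieve(question: str, k: int) -> list[str]:
    """Stand-in retriever so the sketch runs without a real index."""
    index = {"rootstocks for terrace planting": ["chunk-02", "chunk-17"]}
    return index.get(question, [])[:k]


print(failing_questions(fake_retrieve))
```

&lt;p&gt;A check like this can run in CI with an empty failure list as the pass condition, and it never touches the LLM, so it costs no generated tokens.&lt;/p&gt;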

&lt;p&gt;Spoiler: getting there wasn't “use a bigger embedding model.” It was unglamorous engineering: chunking, hybrid search, reranking, glossary expansion. I'll unpack one layer per post.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If this resonates&lt;/strong&gt;&lt;br&gt;
I'm building in public through writing, not through dumping the entire corpus on GitHub.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Follow on Dev.to for Part 2&lt;/li&gt;
&lt;li&gt;Comment if you've hit similar RAG failure modes in regulated or scientific domains&lt;/li&gt;
&lt;li&gt;Reach out (GitHub / email in bio) for a short demo: HR sandbox or limited horticulture preview&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Disclaimer: assistant output is informational; field decisions require local experts and compliant product labels.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
