Some background context:
I am building a simple AI-powered plant disease diagnosis solution, inspired by my farming background and by recent advances in intelligent tooling.
You can check out the Shamba-MedCare App here. Sorry, while it's in testing you'll have to use your own API keys until the public launch; the keys are stored in the browser's local storage, so they stay private.
For context: whenever you read LLM (Large Language Model), I mostly mean Claude. I prefer the generic term because this solution can be fitted to any LLM.
I played around with several prompts to nail the best results. Here's how my prompt engineering journey with Shamba-MedCare unfolded:
My first prompt to LLM Vision was embarrassingly naive:
"What disease does this plant have?"
The response was a 2,000-word essay about plant pathology in general. Helpful for a textbook. Useless for a farmer with a dying tomato plant. Getting AI to return structured, actionable, budget-aware diagnoses took iteration. Here's what I learned.
The Architecture
Two prompts matter: the system prompt (who the LLM, e.g. Claude, pretends to be) and the analysis prompt (what to do with this specific image).
System Prompt: Creating "Shamba"
Prompts work better with a persona. I created the Shamba persona, an expert agricultural pathologist:
```
You are Shamba, an expert agricultural pathologist. You analyze
plant images to identify diseases, pests, and nutrient deficiencies.

Your expertise includes:
- 50+ crop types worldwide
- Fungal, bacterial, viral, and physiological disorders
- Traditional and modern treatment methods
- Practical advice for resource-limited farmers

Guidelines:
1. Always include at least one FREE/traditional treatment
2. Describe WHERE symptoms appear (for visual mapping)
3. Be honest about uncertainty—use confidence scores
4. Recommend professional help for severe cases
```
The key line: "Always include at least one FREE/traditional treatment."
Without that explicit instruction, the LLM defaulted to commercial products. Helpful for a suburban gardener. Useless for a farmer who can't afford a $15 fungicide.
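For reference, here's roughly how a persona like this gets wired into a vision request. This is a minimal sketch assuming the Anthropic Python SDK; the function name, model id, and file handling are illustrative, not the app's actual code:

```python
import base64
import anthropic

SYSTEM_PROMPT = """You are Shamba, an expert agricultural pathologist. ..."""  # full text above

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def diagnose(image_path: str, analysis_prompt: str) -> str:
    # Claude's vision input takes base64-encoded image data
    with open(image_path, "rb") as f:
        image_data = base64.standard_b64encode(f.read()).decode("utf-8")

    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # any vision-capable model works
        max_tokens=2048,
        system=SYSTEM_PROMPT,  # the persona lives here, not in the user turn
        messages=[{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": "image/jpeg",
                            "data": image_data}},
                {"type": "text", "text": analysis_prompt},
            ],
        }],
    )
    return response.content[0].text
```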
Failure #1: The JSON Nightmare
My first structured attempt asked the LLM to return JSON, which it did pretty well. Wrapped in markdown code fences. With helpful commentary before and after:
Here's my analysis:

```json
{ "disease": "Early Blight" }
```

This is a common fungal disease...
My parser choked. The fix was explicit:
```
Return ONLY a valid JSON object. No markdown, no commentary,
no text before or after. Start with { and end with }
```
Still failed 10% of the time. So I added backend parsing (sketched below) that:
- Strips markdown fences if present
- Extracts JSON from surrounding text
- Validates against the expected schema
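Here's a condensed sketch of that fallback chain. The function and key names are my own; the production version validates the full response schema rather than a couple of keys:

```python
import json
import re
from typing import Any, Optional

def parse_llm_json(raw: str, required_keys=("disease",)) -> Optional[dict[str, Any]]:
    """Best-effort extraction of a JSON object from an LLM response."""
    text = raw.strip()

    # 1. Strip markdown fences if present (a leading ```json line, etc.)
    fenced = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    if fenced:
        text = fenced.group(1).strip()

    # 2. Extract the JSON object from any surrounding commentary
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        data = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None

    # 3. Validate against the expected schema (minimal key check here;
    #    the real backend would use something like jsonschema or pydantic)
    if not all(key in data for key in required_keys):
        return None
    return data
```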
Failure #2: Location Descriptions
For the visual heatmap feature, I needed the LLM to describe WHERE damage appeared. My prompt asked for "affected regions."
The LLM returned: "The affected area is significant." Not helpful. I needed actual positions, so I tried several formulations. This one came close to perfect:
```
Describe affected regions with:
- Location (helpful for heatmaps): top-left, center, lower-right, edges, margins
- Coverage: percentage of area affected (e.g., "35%")
- Spread direction: "Moving from lower leaves upward."
```
Now the LLM returns:
```json
{
  "affected_regions": [
    {
      "location": "lower-left",
      "severity": "severe",
      "description": "Dark brown lesions with concentric rings",
      "coverage": 15
    },
    {
      "location": "center",
      "severity": "moderate",
      "coverage": 20
    }
  ]
}
```
That's enough to generate a heatmap overlay for now.
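Rendering that overlay is then mechanical. Here's a hypothetical sketch of the mapping step, where named locations become cells of a coarse 3x3 grid whose intensities drive the heatmap; labels like "edges" and "margins" would need their own handling, and every name below is an assumption rather than the app's real code:

```python
# Map the LLM's location labels onto (row, col) cells of a 3x3 grid
# laid over the image; the client renders the grid as a color overlay.
LOCATION_TO_CELL = {
    "top-left": (0, 0), "top": (0, 1), "top-right": (0, 2),
    "left": (1, 0), "center": (1, 1), "right": (1, 2),
    "lower-left": (2, 0), "bottom": (2, 1), "lower-right": (2, 2),
}
SEVERITY_WEIGHT = {"mild": 0.33, "moderate": 0.66, "severe": 1.0}

def heatmap_grid(affected_regions: list[dict]) -> list[list[float]]:
    """Turn affected_regions entries into a 3x3 intensity grid (0.0-1.0)."""
    grid = [[0.0] * 3 for _ in range(3)]
    for region in affected_regions:
        cell = LOCATION_TO_CELL.get(region.get("location", ""))
        if cell is None:
            continue  # unknown label: skip rather than guess a position
        weight = SEVERITY_WEIGHT.get(region.get("severity"), 0.66)
        coverage = region.get("coverage", 0) / 100  # percent -> fraction
        row, col = cell
        grid[row][col] = min(1.0, grid[row][col] + weight * coverage)
    return grid
```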
Failure #3: Treatment Cost Blindness
Early on, treatments came out in random order. Sometimes the $50 systemic fungicide appeared first; sometimes the free wood ash remedy.
The problem: the LLM has no inherent understanding of budget constraints. I had to structure it explicitly:
```
Provide treatments in EXACTLY this order:
1. FREE TIER: Traditional/home remedies ($0)
2. LOW COST: Basic solutions ($1-5)
3. MEDIUM COST: Commercial organic ($5-20)
4. HIGH COST: Synthetic/professional ($20+)

Each tier must have at least one option if applicable.
```
The response schema enforced this:
```json
{
  "treatments": [
    {
      "method": "Wood ash paste",
      "cost_tier": "free",
      "estimated_cost": "$0",
      "ingredients": ["Wood ash", "Water"],
      "application": "Apply directly to affected areas",
      "availability": "Common from cooking fires"
    },
    {
      "method": "Neem oil spray",
      "cost_tier": "low",
      "estimated_cost": "$1-3"
    }
  ]
}
```
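Prompt ordering alone is hard to guarantee, so a post-parse check is a natural backstop. A small sketch against the cost_tier field above (the function name and fail-loud strategy are my own, not necessarily what the app does):

```python
TIER_ORDER = {"free": 0, "low": 1, "medium": 2, "high": 3}

def enforce_tier_order(treatments: list[dict]) -> list[dict]:
    """Re-sort treatments cheapest-first and insist on a free option."""
    ordered = sorted(treatments,
                     key=lambda t: TIER_ORDER.get(t.get("cost_tier"), 99))
    if not ordered or ordered[0].get("cost_tier") != "free":
        # The FREE tier is mandatory: fail loudly instead of showing
        # a farmer a paid-first treatment list.
        raise ValueError("response is missing a free treatment option")
    return ordered
```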
The Plant Part-Specific Prompt Strategy
Different plant parts reveal different problems, so the prompt has to target the right symptoms and remedies for each. My prompt adapts to the part being photographed:
For leaves:

```
Examine: color patterns, spot shapes, curling, holes, coating
Common issues: fungal spots, viral mosaic, nutrient chlorosis, pest damage
```

For roots:

```
Examine: color (white=healthy, brown/black=rot), texture, galls, structure
Common issues: root rot, nematode damage, waterlogging
```
This focus improves accuracy dramatically. Asking the LLM to look for "anything wrong" produces vague results. Asking it to specifically check for concentric ring patterns in leaf spots? Now we're diagnosing Early Blight.
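In code, that part-specific focus can be just a lookup table spliced into the analysis prompt. A trimmed sketch (PART_PROMPTS is a name I made up; a real table would cover stems, fruit, flowers, and the whole plant):

```python
# Per-part examination guidance injected into the analysis prompt.
PART_PROMPTS = {
    "leaves": (
        "Examine: color patterns, spot shapes, curling, holes, coating\n"
        "Common issues: fungal spots, viral mosaic, nutrient chlorosis, pest damage"
    ),
    "roots": (
        "Examine: color (white=healthy, brown/black=rot), texture, galls, structure\n"
        "Common issues: root rot, nematode damage, waterlogging"
    ),
}
```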
The Final Prompt Structure
```
[SYSTEM PROMPT]
You are Shamba, an agricultural pathologist...

[ANALYSIS PROMPT]
Analyze this {plant_part} image from a {crop_type} plant.
User's context: {additional_context}

Provide:
1. Image validation (correct plant part? good quality?)
2. Health score (0-100)
3. Disease identification with confidence (0.0-1.0)
4. Affected region locations for visual mapping
5. Treatments by cost tier (FREE mandatory)
6. Prevention tips

Return as JSON following this schema:
{response_schema}
```
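Assembled in code, that structure is a straightforward template fill. A sketch reusing the hypothetical PART_PROMPTS lookup from earlier (the template string and function are illustrative):

```python
import json

ANALYSIS_TEMPLATE = """Analyze this {plant_part} image from a {crop_type} plant.
{part_guidance}
User's context: {additional_context}

Provide:
1. Image validation (correct plant part? good quality?)
2. Health score (0-100)
3. Disease identification with confidence (0.0-1.0)
4. Affected region locations for visual mapping
5. Treatments by cost tier (FREE mandatory)
6. Prevention tips

Return ONLY a valid JSON object following this schema:
{response_schema}"""

def build_analysis_prompt(plant_part: str, crop_type: str,
                          additional_context: str,
                          response_schema: dict) -> str:
    return ANALYSIS_TEMPLATE.format(
        plant_part=plant_part,
        crop_type=crop_type,
        part_guidance=PART_PROMPTS.get(plant_part, ""),
        additional_context=additional_context or "none provided",
        response_schema=json.dumps(response_schema, indent=2),
    )
```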
What I'd Do Differently
Design the output format first. I designed prompts around what I wanted the LLM to do. I should have designed around what the farmer needed to see.
The heatmap feature was an afterthought. If I'd planned for it from day one, the location description format would have been baked in, not retrofitted. It's genuinely useful for farmers: picture a photo of an affected plant with heat overlays highlighting the most heavily damaged areas.
Test with bad photos early. My development photos were well-lit, centered, single-issue plants. Real farmer photos are blurry, shadowy, and show three problems at once. The robustness I needed only emerged after testing with garbage inputs.