DEV Community

Aakarsh Chimmani
How We Built CropGuard AI — Plant Disease Detection with Django, MongoDB Atlas and Deep Learning

CropGuard AI is a Django-served Python web application backed by MongoDB Atlas for application persistence. The project combines plant disease detection, real-time model training with live SSE streaming, custom model diagnostics with Grad-CAM heatmaps, and MongoDB-powered analytics in one full-stack deployment. This post explains why we chose this architecture, what Django serves, and how MongoDB fits into the runtime and data flow.


Team Members

This project was developed by:

We would like to express our sincere gratitude to @chanda_rajkumar for their valuable guidance and support throughout this project. Their insights into system design, architecture, and development played a key role in shaping CropGuard AI.


The Problem We Set Out to Solve

Plant diseases destroy an estimated 20–40% of global crop yield every year. Farmers in rural areas often detect diseases too late because traditional diagnosis requires sending samples to labs — a process that takes days or even weeks.

We built CropGuard AI to solve this. Upload a photo of any plant leaf and get an instant diagnosis in under one second, powered by a production-grade deep learning model with 99.85% accuracy across 38 disease classes.

CropGuard AI Home Page

🌿 99.85% — EfficientNetV2-M Accuracy
38 plant disease classes · 14 crop types · 87,000+ training images

But the most interesting part of this project is not just the AI model. It is how we used MongoDB Atlas as the backbone for storing predictions, training metadata, and powering a real-time analytics dashboard with 7 aggregation pipelines.

CropGuard AI Live Demo — Disease Detection with Top-5 Predictions and Grad-CAM


Why Django and MongoDB Over Other Options

When we reviewed the architecture of CropGuard AI, the biggest challenge was not the neural network itself. It was managing ML model metadata, prediction history, training sessions, and analytics data in a way that could evolve as the platform grew without breaking existing data flows.

That is why we organized the project around Django as the single Python runtime while keeping MongoDB Atlas as the persistence layer. Instead of SQL tables that would need migrations every time we added a new field to model metadata, MongoDB let us store evolving document-shaped data cleanly and flexibly.

💡 Django is the runtime contract for CropGuard AI, and MongoDB Atlas is the application persistence layer. All API endpoints, auth routes, training pipeline, SSE streaming, and analytics are Django-served, while predictions, trained models, and user verification data are stored as MongoDB documents.

This matters especially for AI platforms. CropGuard AI has real-time SSE training streams, Firebase authentication, prediction history, model registry, Grad-CAM heatmap generation, and 7 aggregation pipelines for analytics. When these pieces share one Python runtime boundary, the system is easier to reason about, test, and deploy.


Application Architecture

The current CropGuard AI stack is organized around a Django project with a dedicated api application and a MongoDB Atlas persistence layer. The React frontend runs on Vite and communicates with Django via 25 REST API endpoints. The backend handles prediction, training, model testing, analytics, and authentication in one unified runtime.

{
  "runtime": "Django + Daphne ASGI server",
  "frontend": "React + Vite (port 5174)",
  "backend": "Django + Daphne (port 8000)",
  "ai_frameworks": ["PyTorch", "TensorFlow/Keras", "timm"],
  "builtin_model": "EfficientNetV2-M (99.85% accuracy, 38 classes)",
  "mongodb_collections": [
    "prediction_history",
    "model_registry",
    "users"
  ],
  "total_api_endpoints": 25,
  "real_time": "Server Sent Events (SSE) for live training charts",
  "authentication": "Firebase (Google Cloud)",
  "pdf_reports": "ReportLab — generated in memory, no disk writes",
  "deployment": "daphne -p 8000 cropguard.asgi:application"
}

CropGuard AI Results — 99.85% Accuracy, Training Curves and Per-Class F1 Score


MongoDB Data Model in CropGuard AI

MongoDB is central to CropGuard AI because the platform generates multiple types of data with different structures. Prediction results, trained model metadata, and user verification codes all have different shapes. A rigid SQL table design would require multiple joined tables for a single prediction document and would break every time we added a new field to model metadata.

With MongoDB, each record remains a self-contained document while still being grouped into meaningful collections for retrieval and analytics. This document-oriented design is especially useful for ML platforms where data shapes evolve as the platform gains new features.

Collection 1 — prediction_history

Every time a user uploads a plant leaf image, the prediction is automatically saved as a MongoDB document with no extra API call required — it happens as a side effect inside the predict endpoint:

{
  "prediction_id": "a3f2b1c4-70cf-423b-8e61-153d63756d43",
  "user_email": "farmer@email.com",
  "timestamp": "2026-04-07T11:32:09Z",
  "image_filename": "tomato_leaf.jpg",
  "predicted_class": "Tomato___Late_blight",
  "confidence": 0.797,
  "top5": [
    {"class_name": "Tomato Late Blight", "confidence": 0.797},
    {"class_name": "Tomato Mosaic Virus", "confidence": 0.053},
    {"class_name": "Tomato Early Blight", "confidence": 0.037},
    {"class_name": "Tomato Healthy", "confidence": 0.028},
    {"class_name": "Tomato Septoria Leaf Spot", "confidence": 0.018}
  ],
  "model_used": "EfficientNetV2-M",
  "plant_type": "Tomato",
  "disease_name": "Late Blight",
  "is_healthy": false
}
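As a sketch of how such a document can be assembled inside the predict view before `insert_one` — the helper name `build_prediction_doc` and the raw-probability inputs are illustrative, but the field names match the document above:

```python
import uuid
from datetime import datetime, timezone

def build_prediction_doc(user_email, filename, class_labels, probs,
                         model_name="EfficientNetV2-M"):
    """Assemble a prediction_history document from raw softmax probabilities."""
    # Pair every class with its probability, then keep the five most confident.
    ranked = sorted(zip(class_labels, probs), key=lambda p: p[1], reverse=True)
    top5 = [{"class_name": c, "confidence": round(float(p), 3)} for c, p in ranked[:5]]
    predicted = ranked[0][0]
    return {
        "prediction_id": str(uuid.uuid4()),
        "user_email": user_email,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "image_filename": filename,
        "predicted_class": predicted,
        "confidence": top5[0]["confidence"],
        "top5": top5,
        "model_used": model_name,
        "is_healthy": "healthy" in predicted.lower(),
    }
```

The view can then persist it as a side effect with `db["prediction_history"].insert_one(doc)` — no extra API call from the frontend.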

Collection 2 — model_registry

Every model trained on the platform is saved with full metadata — architecture, hyperparameters, class labels, and file paths — so it can be loaded directly into the Testing Lab without re-uploading:

{
  "model_id": "07bac6ea-70cf-423b-8e61-153d63756d43",
  "name": "Tomato Research Model",
  "base_architecture": "MobileNetV2",
  "date_trained": "2026-04-07T11:29:32Z",
  "final_train_acc": 0.35,
  "final_val_acc": 0.33,
  "num_classes": 3,
  "class_labels": [
    "Tomato_Early_Blight",
    "Tomato_Healthy",
    "Tomato_Late_Blight"
  ],
  "hyperparameters": {
    "learning_rate": 0.001,
    "batch_size": 16,
    "epochs": 3,
    "dropout": 0.3
  },
  "h5_path": "storage/models/model_07bac6ea.h5"
}

Collection 3 — users

Email verification codes are stored in MongoDB with a 10-minute expiry timestamp instead of in-memory dictionaries, which means codes persist across server restarts:

{
  "email": "user@email.com",
  "verification_code": "847291",
  "created_at": "2026-04-07T11:00:00Z"
}
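A minimal sketch of the expiry check the auth flow needs (the function name is illustrative); alternatively, a MongoDB TTL index created with `expireAfterSeconds=600` on `created_at` would let Atlas delete stale codes automatically:

```python
from datetime import datetime, timedelta, timezone

CODE_TTL = timedelta(minutes=10)

def code_is_valid(doc, now=None):
    """Return True if the stored verification code is still inside its 10-minute window."""
    now = now or datetime.now(timezone.utc)
    created = doc["created_at"]
    if isinstance(created, str):
        # Documents serialized with an ISO-8601 string, as in the example above.
        created = datetime.fromisoformat(created.replace("Z", "+00:00"))
    return now - created <= CODE_TTL
```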

Collections Summary:

| Collection | Purpose |
| --- | --- |
| prediction_history | Every prediction — disease name, confidence, plant type, top-5 results, timestamp |
| model_registry | All trained model metadata — architecture, class labels, hyperparameters, file path, accuracy |
| users | Firebase email verification codes with 10-minute expiry and a unique index on email |

The Advanced Part — 7 MongoDB Aggregation Pipelines

This is where MongoDB goes far beyond basic CRUD. Our Analytics Dashboard runs 7 aggregation pipelines entirely inside MongoDB. Python just receives the final computed result — no Python loops or manual calculations for any analytics.

✅ All 7 pipelines use MongoDB stages like $group, $match, $sort, $limit, $bucket, $avg, and $dateToString. Computation happens inside the database engine, not in Python. All 7 results are returned from a single GET /api/analytics/overview call.

| # | Pipeline | Stages |
| --- | --- | --- |
| 1 | Platform Overview Stats | $group → $sum → $avg |
| 2 | Top 10 Most Detected Diseases | $match → $group → $sort → $limit |
| 3 | Top 10 Most Affected Plant Types | $group → $sort → $limit |
| 4 | Predictions Per Day — Last 30 Days | $match → $group → $dateToString → $sort |
| 5 | Confidence Distribution | $bucket with boundaries [0, 0.5, 0.7, 0.85, 0.95, 1.0] |
| 6 | Model Performance by Architecture | $group → $avg → $max |
| 7 | Training Activity — Last 30 Days | $match → $group → $dateToString → $sort |

Example — Pipeline 2: Top Diseases

top_diseases = list(db['prediction_history'].aggregate([
    {"$match": {"is_healthy": False}},
    {"$group": {"_id": "$disease_name", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
    {"$limit": 10}
]))

Example — Pipeline 4: Predictions Per Day

from datetime import datetime, timedelta

thirty_days_ago = datetime.utcnow() - timedelta(days=30)

predictions_per_day = list(db['prediction_history'].aggregate([
    {"$match": {"timestamp": {"$gte": thirty_days_ago}}},
    {
        "$group": {
            "_id": {
                "$dateToString": {
                    "format": "%Y-%m-%d",
                    "date": "$timestamp"
                }
            },
            "count": {"$sum": 1}
        }
    },
    {"$sort": {"_id": 1}}
]))

Example — Pipeline 5: Confidence Distribution with $bucket

confidence_dist = list(db['prediction_history'].aggregate([
    {
        "$bucket": {
            "groupBy": "$confidence",
            "boundaries": [0, 0.5, 0.7, 0.85, 0.95, 1.0],
            "default": "other",
            "output": {"count": {"$sum": 1}}
        }
    }
]))
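For completeness, here is roughly what Pipeline 6 can look like — grouping the model registry by architecture and letting MongoDB compute the averages. The field names follow the model_registry document shown earlier; the variable name is illustrative:

```python
# Pipeline 6 — average and best validation accuracy per base architecture.
model_perf_pipeline = [
    {"$group": {
        "_id": "$base_architecture",          # one bucket per architecture
        "models_trained": {"$sum": 1},
        "avg_val_acc": {"$avg": "$final_val_acc"},
        "best_val_acc": {"$max": "$final_val_acc"},
    }},
    {"$sort": {"avg_val_acc": -1}},           # best-performing architecture first
]

# Executed as: list(db["model_registry"].aggregate(model_perf_pipeline))
```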

Real Time Model Training with SSE

One of the most impressive features of CropGuard AI is live model training directly in the browser. Users upload a dataset zip, choose a base architecture (MobileNetV2, EfficientNetB0, or ResNet50), set hyperparameters, and watch the model train in real time through a live Recharts chart.

Training runs on the Django backend using TensorFlow/Keras in a background Python thread. A custom Keras callback captures real metrics after every epoch and streams them to the React frontend via Server Sent Events:

class SSECallback(tf.keras.callbacks.Callback):
    """Captures real Keras metrics after each epoch for the SSE stream."""
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        # `session` is the in-memory training session the SSE endpoint reads from.
        session.metrics.append({
            "epoch": epoch + 1,
            "train_loss": round(logs.get("loss", 0), 4),
            "val_loss": round(logs.get("val_loss", 0), 4),
            "train_acc": round(logs.get("accuracy", 0) * 100, 2),
            "val_acc": round(logs.get("val_accuracy", 0) * 100, 2)
        })

After training completes the model is saved as a real .h5 file (9.58MB+) using model.save(h5_path) and the metadata is written to MongoDB model_registry. The trained model immediately appears in the My Models Repository dropdown in the Testing Lab — no re-upload required.

💡 SSE was chosen over WebSockets because training progress is one-directional — the server pushes updates to the browser, the browser never needs to send data back. SSE is simpler, lighter, and natively supported by Django's StreamingHttpResponse with content_type="text/event-stream".
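A minimal sketch of the streaming side, assuming the session is a dict-like object the callback appends to (the generator name and polling interval are illustrative). Each SSE frame is just a `data:` line followed by a blank line:

```python
import json
import time

def sse_stream(session):
    """Yield each new epoch's metrics as a text/event-stream frame."""
    sent = 0
    while not session.get("done") or sent < len(session["metrics"]):
        while sent < len(session["metrics"]):
            # SSE frame format: "data: <payload>\n\n"
            yield f"data: {json.dumps(session['metrics'][sent])}\n\n"
            sent += 1
        if not session.get("done"):
            time.sleep(0.5)   # wait for the training thread to finish the next epoch

# In Django: StreamingHttpResponse(sse_stream(session), content_type="text/event-stream")
```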


Model Comparison — User Model vs Production Model

The Diagnostics Lab lets users upload their own .h5 or .keras model trained on Colab or Kaggle and benchmark it against the production EfficientNetV2-M model side by side on the same image.

CropGuard AI Comparison — Performance Radar and Accuracy vs Inference Time

The comparison endpoint loads the user model from the MongoDB registry path and runs real TensorFlow inference — no simulation or copying of results:

import os
import tensorflow as tf
from django.conf import settings

# Load user model from MongoDB registry
registry_doc = db['model_registry'].find_one({"model_id": model_id})
h5_path = os.path.join(settings.BASE_DIR, registry_doc['h5_path'])
user_model = tf.keras.models.load_model(h5_path)

# Run real inference with user model's class labels only
class_labels = registry_doc['class_labels']
predictions = user_model.predict(img_array)[0]

top5 = sorted([
    {"class_name": class_labels[i], "confidence": float(predictions[i])}
    for i in range(len(class_labels))
], key=lambda x: x['confidence'], reverse=True)[:5]

The frontend shows a split screen with Grad-CAM heatmaps for both models, confidence delta, WINNER badge on the more confident model, and a MATCH or DRIFT indicator showing whether both models agreed on the diagnosis.
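The verdict logic behind those badges can be sketched as a small pure function — the helper name and dict shape are assumptions, but the inputs match the top-1 entries each model produces:

```python
def comparison_verdict(user_pred, prod_pred):
    """Summarize a side-by-side run: winner by confidence, agreement flag, delta."""
    delta = round(abs(user_pred["confidence"] - prod_pred["confidence"]), 4)
    winner = "user" if user_pred["confidence"] > prod_pred["confidence"] else "production"
    # MATCH when both models agree on the top-1 diagnosis, DRIFT otherwise.
    agreement = "MATCH" if user_pred["class_name"] == prod_pred["class_name"] else "DRIFT"
    return {"winner": winner, "confidence_delta": delta, "agreement": agreement}
```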


Grad-CAM — Explainable AI

Grad-CAM (Gradient-weighted Class Activation Mapping) generates a heatmap showing which pixels in the leaf image most influenced the model's prediction. This is critical for trust in agricultural AI — you need to know the model is focusing on the actual disease, not the background or lighting.
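The core Grad-CAM computation, sketched in NumPy under the assumption that the activations and gradients of the last convolutional layer have already been extracted (e.g. with `tf.GradientTape`):

```python
import numpy as np

def grad_cam_heatmap(activations, gradients):
    """Core Grad-CAM step: weight each feature map by its average gradient,
    sum across channels, apply ReLU, and normalize to [0, 1].

    activations, gradients: arrays of shape (H, W, C) from the last conv layer.
    """
    # Global-average-pool the gradients: one importance weight per channel.
    weights = gradients.mean(axis=(0, 1))                      # shape (C,)
    # Weighted sum of feature maps collapses channels into a spatial map.
    cam = np.tensordot(activations, weights, axes=([2], [0]))  # shape (H, W)
    cam = np.maximum(cam, 0)                                   # ReLU: keep positive influence
    if cam.max() > 0:
        cam /= cam.max()                                       # normalize for heatmap overlay
    return cam
```

The resulting map is upsampled to the input resolution and blended over the leaf image to produce the red/yellow overlay.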

✅ In every test on real plant disease images, the Grad-CAM heatmap correctly highlighted the diseased region of the leaf — not the background, not the stem. The red and yellow areas matched the visually obvious disease spots exactly.

Grad-CAM Heatmap — visualizing spatial regions the CNN used for classification


MongoDB Atlas Configuration

We connect to MongoDB Atlas using pymongo directly — no ORM wrapper. The connection is wrapped in a try/except so the Django server still starts cleanly even if Atlas is temporarily unreachable:

# cropguard/settings.py
import pymongo, os

try:
    MONGO_CLIENT = pymongo.MongoClient(os.environ.get('MONGODB_URI'))
    MONGO_DB = MONGO_CLIENT['cropguard']
    MONGO_CLIENT.admin.command('ping')
    print("MongoDB Atlas connected successfully")
except Exception as e:
    print(f"MongoDB connection failed: {e}")
    MONGO_DB = None

Every view that uses MongoDB checks if MONGO_DB is None first and returns a clean 503 Service Unavailable JSON response rather than crashing with an unhandled exception.
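That guard can be factored into a tiny helper — the name `mongo_guard` and the error payload wording are illustrative, not the project's exact code:

```python
def mongo_guard(db):
    """Return an error payload when the Atlas connection failed at startup, else None."""
    if db is None:
        # Views short-circuit with this body and HTTP status 503.
        return {"error": "Database unavailable, please try again later", "status": 503}
    return None

# In a view:
#     payload = mongo_guard(MONGO_DB)
#     if payload:
#         return JsonResponse(payload, status=503)
```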

Configuration comes from environment variables:

MONGODB_URI=mongodb+srv://username:password@cluster0.xxxxx.mongodb.net/cropguard
SECRET_KEY=django-secret-key
FIREBASE_CREDENTIALS_PATH=backend/firebase_credentials.json
OPENROUTER_API_KEY=your-openrouter-key
GEMINI_API_KEY=your-gemini-key

Django URL Configuration

All 25 endpoints are served by Django with zero URL prefix conflicts. The root cropguard/urls.py includes api/urls.py at path '' so every endpoint is served exactly as the React frontend expects with no changes needed on the frontend side:

# cropguard/urls.py
urlpatterns = [
    path('admin/', admin.site.urls),
    path('', include('api.urls')),
]

# api/urls.py — key endpoints
urlpatterns = [
    path('api/predict', core.predict),
    path('api/studio/train', studio.start_training),
    path('api/studio/progress/<str:session_id>', studio.stream_progress),
    path('api/studio/models/registry', studio.get_registry),
    path('api/studio/models/benchmark', studio.get_benchmark),
    path('api/studio/test/compare', studio.compare_models),
    path('api/studio/test/export-report', studio.export_report),
    path('api/analytics/overview', analytics.get_overview),
    path('api/predictions/history', core.get_prediction_history),
    path('api/predictions/<str:prediction_id>', core.delete_prediction),
    # ... 15 more endpoints
]

Validation Before Demo

We did not treat the FastAPI-to-Django migration as complete just because the homepage loaded once. We verified all 12 critical endpoints with curl before running the full visual demo in the browser:

daphne -p 8000 cropguard.asgi:application

# Verified all critical endpoints
curl -X GET  http://localhost:8000/ping
curl -X GET  http://localhost:8000/api/studio/models/registry
curl -X GET  http://localhost:8000/api/analytics/overview
curl -X GET  http://localhost:8000/api/studio/models/benchmark
curl -X POST http://localhost:8000/api/predict
curl -X POST http://localhost:8000/api/studio/train
curl -X GET  http://localhost:8000/api/studio/progress/{session_id}
curl -X POST http://localhost:8000/api/studio/test/compare
curl -X POST http://localhost:8000/api/studio/test/export-report
curl -X GET  http://localhost:8000/api/predictions/history
curl -X GET  http://localhost:8000/api/benchmark/architectures
curl -X POST http://localhost:8000/api/auth/send-code

All 12 endpoints returned correct JSON responses with MongoDB data. After verification we deleted the original FastAPI files (main.py and routers/) and confirmed the Django server was the sole runtime serving the application.


Key Takeaways

Django and MongoDB became the right combination for CropGuard AI for three reasons. First, Django gave the project a single Python runtime for all API routes, SSE streaming, background training threads, Grad-CAM generation, and PDF report creation. Second, MongoDB provided a flexible persistence model for predictions, model metadata, and user data whose shapes naturally vary. Third, MongoDB aggregation pipelines handled all analytics computation inside the database — no Python loops needed for any statistics.

✅ For an AI platform, this architecture is not just a deployment convenience. It makes the system more maintainable. When prediction storage, model registry, and analytics all live in MongoDB with consistent document structures, adding new features does not break existing data flows.


Execution

The CropGuard AI platform is available through the GitHub repository below.

🔗 GitHub Repository: github.com/Aakarsh076788/cropguard-ai

🎬 Video Demonstration:


CropGuard AI is a Django-based Python full-stack project that uses MongoDB Atlas for application persistence, including prediction history, model registry, and user verification data, with 7 aggregation pipelines powering a real-time analytics dashboard.
