Team Members
This project was developed by:
Introduction
In many parts of rural India, farmers don't have immediate access to agricultural experts. Imagine a farmer in Telangana noticing his rice crop turning yellow. The issue could be anything — nutrient deficiency, disease, or water stress — but identifying the exact problem isn't easy.
Most online resources are in English, and consulting an expert is often expensive or unavailable. Delays in diagnosis can lead to serious crop damage.
That's where FarmAI comes in.
FarmAI allows farmers to ask questions in their own language — Telugu, Hindi, Tamil, and more — and receive instant, AI-powered crop stress diagnosis along with actionable advice.
To make this system reliable and fast, we needed a backend that could handle:
- multilingual data
- flexible document structures
- real-time analytics
We chose MongoDB Atlas — and it became a central part of our architecture.
What We Built: The Tech Stack
FarmAI is built using:
- Backend: Django
- Database: MongoDB Atlas
- ML Model: TF-IDF + Logistic Regression
- NLP Pipeline: Multilingual query processing
- APIs: Django REST Framework + GraphQL
Each farmer interaction is stored as a rich document, not just plain text.
query_document = {
    "farmer_id": "farmer_001@example.com",
    "query_text": "my rice crop leaves are turning yellow",
    "input_language": "te",
    "detected_stress": "NUTRIENT_DEFICIENCY",
    "crop_detected": "Rice",
    "confidence_score": 0.87,
    "timestamp": "2026-04-07T10:23:00Z",
    "advisory": {
        "immediate_action": "Apply urea",
        "treatments": ["Spray urea solution"],
    },
}
The flexibility of MongoDB's document model made it easy to evolve this structure as our ML outputs improved.
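As a sketch of the write path (only the field names come from the document above; the helper functions and their names are illustrative, not our exact production code), assembling and sanity-checking one interaction document before inserting it might look like this:

```python
from datetime import datetime, timezone

# Fields every interaction document is expected to carry (from the schema above).
REQUIRED_FIELDS = {"farmer_id", "query_text", "input_language",
                   "detected_stress", "crop_detected", "confidence_score",
                   "timestamp", "advisory"}

def build_query_document(farmer_id, query_text, lang, prediction):
    """Assemble one interaction document from an ML prediction result."""
    return {
        "farmer_id": farmer_id,
        "query_text": query_text,
        "input_language": lang,
        "detected_stress": prediction["stress"],
        "crop_detected": prediction["crop"],
        "confidence_score": prediction["confidence"],
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "advisory": prediction["advisory"],
    }

def is_valid(doc):
    """Cheap schema check: all required fields present, confidence in [0, 1]."""
    return REQUIRED_FIELDS <= doc.keys() and 0.0 <= doc["confidence_score"] <= 1.0
```

With PyMongo, a valid document then goes straight to Atlas via `collection.insert_one(doc)`; because the schema is flexible, adding a new field later is just another key in the dict.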
MongoDB Features We Explored
Instead of relying on multiple queries and heavy backend logic, we used MongoDB's aggregation framework to push computation directly into the database.
1. $facet — Multiple Queries in a Single Round Trip
pipeline = [
    {"$match": {"farmer_id": email}},
    {"$facet": {
        "total": [{"$count": "n"}],
        "today": [...],
        "this_week": [...],
        "unique_crops": [...],
        "stress_breakdown": [...],
    }},
]
What it does
The $facet stage allows multiple aggregations to run in parallel within a single query.
Instead of executing 5 separate queries like:
- total records
- today's data
- weekly data
- unique crops
- stress analysis
we combined everything into one MongoDB call.
Impact on our project
- Reduced database round trips significantly
- Improved API response time
- Cleaner backend logic (single pipeline instead of multiple queries)
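For intuition, here is a plain-Python equivalent of what that single `$facet` call computes for us in one pass (an illustration using the field names from our documents, not the production code, which runs inside MongoDB):

```python
from collections import Counter

def dashboard_stats(docs, today_prefix):
    """One pass over a user's documents, mirroring the $facet pipeline:
    total count, today's count, unique crops, and stress breakdown."""
    total = 0
    today = 0
    crops = set()
    stress = Counter()
    for d in docs:
        total += 1
        if d["timestamp"].startswith(today_prefix):  # e.g. "2026-04-07"
            today += 1
        crops.add(d["crop_detected"])
        stress[d["detected_stress"]] += 1
    return {"total": total, "today": today,
            "unique_crops": sorted(crops),
            "stress_breakdown": dict(stress)}
```

The point of `$facet` is that MongoDB does all of this server-side in a single round trip, instead of the backend fetching documents and looping over them like this.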
2. $group + $project + $substr → Peak Usage Hours
pipeline_hours = [
    {"$match": {"farmer_id": email}},
    {"$project": {
        "hour": {"$toInt": {"$substr": ["$timestamp", 11, 2]}}
    }},
    {"$group": {"_id": "$hour", "count": {"$sum": 1}}},
    {"$sort": {"_id": 1}},
]
What it does
This pipeline extracts the hour (0–23) from a timestamp string and calculates how many queries occur in each hour.
- `$substr` → extracts the hour from the timestamp string
- `$toInt` → converts it to a number
- `$group` → counts occurrences per hour
- `$sort` → orders the results
This powers a "Peak Usage Hours" chart in our dashboard.
Impact on our project
- Enabled time-based insights
- Helped identify peak system usage
- Improved data visualization for users
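The same hour extraction is easy to express (and unit-test) in Python: the hour of an ISO-8601 timestamp lives at characters 11–12, exactly the slice `$substr: ["$timestamp", 11, 2]` takes. A minimal sketch:

```python
from collections import Counter

def peak_hours(timestamps):
    """Count queries per hour (0-23) from ISO-8601 strings like
    '2026-04-07T10:23:00Z'. Mirrors $substr + $toInt + $group + $sort:
    ts[11:13] is the two-digit hour field."""
    hours = Counter(int(ts[11:13]) for ts in timestamps)
    return sorted(hours.items())  # [(hour, count), ...] ascending by hour
```

Like `$substr` in the pipeline, this relies on every timestamp following the same ISO format, a dependency we call out in the challenges section below.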
3. $bucket → Confidence Score Distribution
pipeline_conf = [
    {"$match": {"farmer_id": email}},
    {"$bucket": {
        "groupBy": "$confidence_score",
        "boundaries": [0, 0.2, 0.4, 0.6, 0.8, 1.01],
        "output": {"count": {"$sum": 1}},
    }},
]
What it does
The $bucket stage groups values into predefined ranges — similar to a histogram.
In our case:
- 0–20%
- 20–40%
- 40–60%
- 60–80%
- 80–100%
This shows how confident the system is across predictions.
Impact on our project
- Simplified statistical analysis
- No need for frontend computation
- Enabled quick visualization of model performance
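A pure-Python histogram shows the `$bucket` semantics we relied on (this is an illustration of the stage's behavior, not code from our backend): each bucket is [lower, upper), lower bound inclusive and upper bound exclusive, which is why the last boundary is 1.01 rather than 1.0; a perfect score of 1.0 would otherwise fall outside every bucket.

```python
from bisect import bisect_right

# 1.01 as the last boundary so a perfect 1.0 lands in the final bucket,
# matching the boundaries used in the $bucket pipeline above.
BOUNDARIES = [0, 0.2, 0.4, 0.6, 0.8, 1.01]

def confidence_histogram(scores):
    """Group scores into [lower, upper) buckets, like MongoDB's $bucket."""
    counts = [0] * (len(BOUNDARIES) - 1)
    for s in scores:
        # Index of the bucket whose lower bound <= s < upper bound.
        counts[bisect_right(BOUNDARIES, s) - 1] += 1
    return counts
```

The five counts map directly onto the 0–20%, 20–40%, 40–60%, 60–80%, and 80–100% ranges listed above.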
4. $match → User-Specific Data Filtering
{"$match": {"farmer_id": email}}
What it does
Every pipeline begins with $match to filter data based on the logged-in user.
This ensures:
- Users only access their own data
- Queries remain efficient by reducing dataset size early
Impact on our project
- Improved security and data isolation
- Faster aggregation performance
- Better scalability for multi-user systems
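One way to make that guarantee hold by construction is a tiny helper that prepends the per-user `$match` to every pipeline, so no analytics endpoint can forget it (the helper name is hypothetical; it sketches the pattern, not our exact code):

```python
def user_pipeline(email, *stages):
    """Build an aggregation pipeline that is always scoped to one farmer:
    the per-user $match comes first, then the caller's stages."""
    return [{"$match": {"farmer_id": email}}, *stages]

# Usage: a per-user count pipeline.
count_pipeline = user_pipeline("farmer_001@example.com", {"$count": "n"})
```

Putting `$match` first also lets MongoDB use an index on `farmer_id` and shrink the working set before any heavier stage runs.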
Why These Features Matter
By combining these MongoDB features, we were able to:
- Replace multiple queries with single optimized pipelines
- Perform analytics directly inside the database
- Reduce backend complexity
- Improve overall performance and scalability
Implementation
Login/Register Page
User authentication interface allowing farmers to securely register and log in to access personalized services.
Dashboard Page
Interactive dashboard displaying query statistics, crop insights, and real-time analytics for the logged-in farmer.
Profile Page
User profile view showing farmer details and account-specific information used for personalized recommendations.
Query Result
AI-generated crop diagnosis with detected stress type, confidence score, and recommended actions.
Atlas Farmer Queries
MongoDB Atlas collection storing farmer queries along with predictions, timestamps, and advisory data.
MongoDB Terminal
MongoDB shell interface used to execute queries and verify database operations during development.
A Bug That Took Time to Find
Our MongoDB connection was silently failing. Queries appeared to work — the app ran fine — but nothing was reaching Atlas. On server restart, all data disappeared.
The cause: our password Password$xx contained a $ character. In a MongoDB URI, $ must be URL-encoded as %24. Without encoding, PyMongo failed to authenticate and silently fell back to in-memory storage — no error, no warning.
Fix:
# Wrong
mongodb+srv://user:Password$xx@cluster...
# Correct
mongodb+srv://user:Password%24xx@cluster...
After this fix, every query correctly reached Atlas and appeared in the collection browser under the right farmer_id.
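Rather than hand-encoding each special character, the standard library can escape the credentials for us, which is also the approach the PyMongo documentation recommends (the `build_uri` helper below is our own sketch, not a PyMongo API):

```python
from urllib.parse import quote_plus

def build_uri(user, password, host):
    """URL-encode the credentials so characters like '$', '@', or ':'
    in a password can never corrupt the connection string."""
    return f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}@{host}"
```

With this in place, `Password$xx` becomes `Password%24xx` automatically, and a future password rotation cannot silently reintroduce the bug.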
Impact on Our Project
What Changed
- 4–5 queries → one `$facet` pipeline
- No more Python data processing
- Analytics handled directly in MongoDB
Performance
- Faster dashboard load
- Single DB call instead of multiple
- Parallel processing inside MongoDB
Security
{"$match": {"farmer_id": email}}
- Each user sees only their data
- No risk of exposing other users' info
Analytics
- `$bucket` → confidence distribution
- `$group` + `$substr` → peak hours
- No backend calculations needed
Fix
- Password `$` → encoded as `%24`
- Fixed the data-not-saving issue
Result
- ✅ Faster
- ✅ Secure
- ✅ Cleaner code
Advantages
- Fewer database calls — `$facet` replaced multiple sequential queries with a single pipeline
- High performance — faster response times due to reduced DB round trips and parallel execution
- Built-in analytics — `$bucket`, `$group`, and `$substr` handle all computations inside MongoDB
- Real-time insights — dashboard charts reflect live data directly from MongoDB Atlas
- Strong data security — `$match` ensures per-user data isolation by design
- Flexible schema — easily added new fields without migrations
- Clean and maintainable backend — reduced code complexity by shifting logic to aggregation pipelines
- Efficient time-based analysis — peak usage hours derived directly from timestamps
- Complete data in one document — query, language, prediction, and advisory stored together
Disadvantages / Challenges
- Aggregation pipelines can be complex — designing multi-stage pipelines like `$facet` requires careful planning
- Hard to debug — errors inside `$facet` often return empty results instead of clear failures
- Learning curve — understanding advanced stages like `$bucket` and `$group` takes time
- Edge cases in `$bucket` — values like 1.0 required adjusting the boundaries (e.g., 1.01)
- Dependency on data format — `$substr` works only because timestamps follow a consistent ISO format
- Session handling complexity — tight integration between GraphQL and Django was required
- Request configuration issues — a missing `credentials: 'include'` caused incorrect user mapping
- Silent connection issues — an incorrect MongoDB URI caused data not to persist properly
Results / Output
MongoDB Atlas Collection View
Collection view showing structured documents for each farmer interaction.
Charts (Confidence, Stress, Peak Hours)
Visualization of model confidence distribution, stress categories, and system usage patterns.
Dashboard — Personal Query Count
Personalized dashboard metric showing total number of queries made by the logged-in user.
Conclusion
MongoDB Atlas did more than just store our data — it powered our analytics.
Features like $facet, $bucket, $group + $substr, and $match replaced large amounts of backend data-processing logic with fast, database-native pipelines. This made the system both simpler and more efficient.
For a multilingual agricultural platform like FarmAI, where data is diverse and user privacy is critical, MongoDB's document model proved to be the right fit. Each document captures the complete interaction, while aggregation pipelines generate real-time insights with minimal overhead.
The biggest takeaway from this project is clear:
Push computation into the database, not the application layer.
This approach:
- simplifies backend code
- improves performance
- ensures data remains consistent and secure
FarmAI shows that with the right tools, even a student-built system can deliver a scalable, real-world solution that makes a meaningful impact.
Acknowledgement
Behind every good project is someone who asks the right questions. For us, that was @chanda_rajkumar. His mentorship pushed FarmAI from a rough idea to a working system — and for that, we are truly grateful.
🔗 Project Links
GitHub Repository:
View Source Code on GitHub
Live Demo (Render):
Try FarmAI Live
Demo Video: