Kundanika Nampally

Breaking Language Barriers in Agriculture: How MongoDB Atlas Powers FarmAI for Intelligent Crop Diagnostics

Team Members

This project was developed by:

Introduction

In many parts of rural India, farmers don't have immediate access to agricultural experts. Imagine a farmer in Telangana noticing his rice crop turning yellow. The issue could be anything — nutrient deficiency, disease, or water stress — but identifying the exact problem isn't easy.

Most online resources are in English, and consulting an expert is often expensive or unavailable. Delays in diagnosis can lead to serious crop damage.

That's where FarmAI comes in.

FarmAI allows farmers to ask questions in their own language — Telugu, Hindi, Tamil, and more — and receive instant, AI-powered crop stress diagnosis along with actionable advice.

To make this system reliable and fast, we needed a backend that could handle:

  • multilingual data
  • flexible document structures
  • real-time analytics

We chose MongoDB Atlas — and it became a central part of our architecture.


What We Built: The Tech Stack

FarmAI is built using:

  • Backend: Django
  • Database: MongoDB Atlas
  • ML Model: TF-IDF + Logistic Regression
  • NLP Pipeline: Multilingual query processing
  • APIs: Django REST Framework + GraphQL

Each farmer interaction is stored as a rich document, not just plain text.

query_document = {
  "farmer_id": "farmer_001@example.com",
  "query_text": "my rice crop leaves are turning yellow",
  "input_language": "te",
  "detected_stress": "NUTRIENT_DEFICIENCY",
  "crop_detected": "Rice",
  "confidence_score": 0.87,
  "timestamp": "2026-04-07T10:23:00Z",
  "advisory": {
    "immediate_action": "Apply urea",
    "treatments": ["Spray urea solution"],
  }
}

The flexibility of MongoDB's document model made it easy to evolve this structure as our ML outputs improved.
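As a concrete illustration of that flexibility, here is a minimal sketch of schema evolution without migrations. The `severity` field and the helper name are hypothetical, not part of FarmAI's actual schema:

```python
# Hypothetical example: a "severity" field added after the ML model began
# emitting it. Older documents simply lack the field; no ALTER TABLE needed.
old_doc = {"detected_stress": "NUTRIENT_DEFICIENCY", "confidence_score": 0.87}
new_doc = {"detected_stress": "WATER_STRESS", "confidence_score": 0.91,
           "severity": "moderate"}  # new field, only newer documents have it

def severity_of(doc):
    """Read the newer field with a safe default for pre-evolution documents."""
    return doc.get("severity", "unknown")
```

Because MongoDB enforces no fixed schema, both document shapes can live side by side in the same collection while readers handle either one.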


MongoDB Features We Explored

Instead of relying on multiple queries and heavy backend logic, we used MongoDB's aggregation framework to push computation directly into the database.

1. $facet — Multiple Queries in a Single Round Trip

pipeline = [
   {"$match": {"farmer_id": email}},
   {"$facet": {
       "total":           [{"$count": "n"}],
       "today":           [...],
       "this_week":       [...],
       "unique_crops":    [...],
       "stress_breakdown":[...]
   }}
]

What it does

The $facet stage runs multiple aggregation sub-pipelines over the same set of input documents within a single query.

Instead of executing 5 separate queries like:

  • total records
  • today's data
  • weekly data
  • unique crops
  • stress analysis

we combined everything into one MongoDB call.

Impact on our project

  • Reduced database round trips significantly
  • Improved API response time
  • Cleaner backend logic (single pipeline instead of multiple queries)
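For readers curious what the elided sub-pipelines (`[...]`) might contain, here is a hedged sketch of a complete facet builder. It assumes timestamps are stored as ISO-8601 strings (as in the document example above), so lexicographic `$gte` comparisons work as date cutoffs; FarmAI's actual sub-pipelines may differ:

```python
from datetime import datetime, timedelta, timezone

def dashboard_pipeline(email):
    """Build one $facet pipeline covering all five dashboard statistics."""
    now = datetime.now(timezone.utc)
    today = now.strftime("%Y-%m-%dT00:00:00Z")
    week_ago = (now - timedelta(days=7)).strftime("%Y-%m-%dT00:00:00Z")
    return [
        {"$match": {"farmer_id": email}},
        {"$facet": {
            "total":     [{"$count": "n"}],
            # ISO strings sort lexicographically, so $gte acts as a date cutoff
            "today":     [{"$match": {"timestamp": {"$gte": today}}},
                          {"$count": "n"}],
            "this_week": [{"$match": {"timestamp": {"$gte": week_ago}}},
                          {"$count": "n"}],
            "unique_crops":     [{"$group": {"_id": "$crop_detected"}}],
            "stress_breakdown": [{"$group": {"_id": "$detected_stress",
                                             "count": {"$sum": 1}}}],
        }},
    ]

# results = collection.aggregate(dashboard_pipeline(email))  # one round trip
```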

2. $group + $project + $substr → Peak Usage Hours

pipeline_hours = [
   {"$match": {"farmer_id": email}},
   {"$project": {
       "hour": {"$toInt": {"$substr": ["$timestamp", 11, 2]}}
   }},
   {"$group": {"_id": "$hour", "count": {"$sum": 1}}},
   {"$sort":  {"_id": 1}},
]

What it does

This pipeline extracts the hour (0–23) from a timestamp string and calculates how many queries occur in each hour.

  • $substr → extracts hour from timestamp
  • $toInt → converts it to a number
  • $group → counts occurrences
  • $sort → orders results

This powers a "Peak Usage Hours" chart in our dashboard.

Impact on our project

  • Enabled time-based insights
  • Helped identify peak system usage
  • Improved data visualization for users
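One practical detail when charting this output: $group only emits hours that actually contain queries, so the dashboard needs the gaps filled with zeros. A small post-processing sketch (the function name is illustrative):

```python
def hourly_series(agg_results):
    """Turn [{"_id": hour, "count": n}, ...] into a 24-slot series,
    filling hours with no queries with zero for charting."""
    counts = {row["_id"]: row["count"] for row in agg_results}
    return [counts.get(hour, 0) for hour in range(24)]
```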

3. $bucket → Confidence Score Distribution

pipeline_conf = [
   {"$match": {"farmer_id": email}},
   {"$bucket": {
       "groupBy":    "$confidence_score",
       "boundaries": [0, 0.2, 0.4, 0.6, 0.8, 1.01],
       "output":     {"count": {"$sum": 1}},
   }},
]

What it does

The $bucket stage groups values into predefined ranges — similar to a histogram.

In our case:

  • 0–20%
  • 20–40%
  • 40–60%
  • 60–80%
  • 80–100%

This shows how confident the system is across predictions.

Impact on our project

  • Simplified statistical analysis
  • No need for frontend computation
  • Enabled quick visualization of model performance
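A companion sketch for the dashboard side: $bucket keys each result by its lower boundary (e.g. 0.6), so the chart layer maps boundaries back to the human-readable ranges listed above. The helper name is illustrative; the 1.01 upper bound exists because $bucket upper boundaries are exclusive, which would otherwise drop a perfect 1.0 score:

```python
BOUNDARIES = [0, 0.2, 0.4, 0.6, 0.8, 1.01]

def bucket_label(lower):
    """Map a $bucket _id (the lower boundary) to a chart label like '60-80%'."""
    i = BOUNDARIES.index(lower)
    hi = min(BOUNDARIES[i + 1], 1.0)  # cap the display at 100%
    return f"{int(round(lower * 100))}-{int(round(hi * 100))}%"
```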

4. $match → User-Specific Data Filtering

{"$match": {"farmer_id": email}}

What it does

Every pipeline begins with $match to filter data based on the logged-in user.

This ensures:

  • Users only access their own data
  • Queries remain efficient by reducing dataset size early

Impact on our project

  • Improved security and data isolation
  • Faster aggregation performance
  • Better scalability for multi-user systems
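Since every pipeline starts with the same per-user filter, one way to keep that rule in a single place is a small helper that prepends the $match stage. This is a sketch, not FarmAI's actual code:

```python
def scoped_to_user(email, stages):
    """Prepend the per-user $match so no pipeline ever runs unfiltered."""
    return [{"$match": {"farmer_id": email}}] + stages
```

Every analytics endpoint then builds its pipeline through this one function, so per-user isolation cannot be forgotten in a new query.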

Why These Features Matter

By combining these MongoDB features, we were able to:

  • Replace multiple queries with single optimized pipelines
  • Perform analytics directly inside the database
  • Reduce backend complexity
  • Improve overall performance and scalability

Implementation

Login/Register Page

User authentication interface allowing farmers to securely register and log in to access personalized services.


Dashboard Page

Interactive dashboard displaying query statistics, crop insights, and real-time analytics for the logged-in farmer.


Profile Page

User profile view showing farmer details and account-specific information used for personalized recommendations.


Query Result

AI-generated crop diagnosis with detected stress type, confidence score, and recommended actions.


Atlas Farmer Queries

MongoDB Atlas collection storing farmer queries along with predictions, timestamps, and advisory data.


MongoDB Terminal

MongoDB shell interface used to execute queries and verify database operations during development.



A Bug That Took Time to Find

Our MongoDB connection was silently failing. Queries appeared to work — the app ran fine — but nothing was reaching Atlas. On server restart, all data disappeared.

The cause: our password Password$xx contained a $ character. In a MongoDB URI, $ must be URL-encoded as %24. Without encoding, PyMongo failed to authenticate and silently fell back to in-memory storage — no error, no warning.

Fix:

# Wrong
mongodb+srv://user:Password$xx@cluster...

# Correct
mongodb+srv://user:Password%24xx@cluster...

After this fix, every query correctly reached Atlas and appeared in the collection browser under the right farmer_id.
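Rather than hand-encoding the password, the escaping can be done programmatically. The PyMongo documentation recommends urllib.parse.quote_plus for usernames and passwords containing special characters; the URI below is illustrative:

```python
from urllib.parse import quote_plus

password = "Password$xx"  # '$' must not appear raw in a MongoDB URI
# quote_plus percent-encodes every reserved character, not just '$'
uri = f"mongodb+srv://user:{quote_plus(password)}@cluster.example.mongodb.net/"
# quote_plus("Password$xx") -> "Password%24xx"
```

This way a future password change (say, one containing `@` or `:`) cannot reintroduce the same silent failure.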



Impact on Our Project

What Changed

  • 4–5 queries → 1 $facet pipeline
  • No more Python data processing
  • Analytics handled directly in MongoDB

Performance

  • Faster dashboard load
  • Single DB call instead of multiple
  • Multiple analyses computed in one pass inside MongoDB

Security

{"$match": {"farmer_id": email}}
  • Each user sees only their data
  • No risk of exposing other users' info

Analytics

  • $bucket → confidence distribution
  • $group + $substr → peak hours
  • No backend calculations needed

Fix

  • Password $ → encoded as %24
  • Fixed the data-not-persisting issue

Result

  • ✅ Faster
  • ✅ Secure
  • ✅ Cleaner code

Advantages

  • Fewer database calls — $facet replaced multiple sequential queries with a single pipeline
  • High performance — faster response times from fewer DB round trips
  • Built-in analytics — $bucket, $group, and $substr handle all computations inside MongoDB
  • Real-time insights — dashboard charts reflect live data directly from MongoDB Atlas
  • Strong data security — $match ensures per-user data isolation by design
  • Flexible schema — easily added new fields without migrations
  • Clean and maintainable backend — reduced code complexity by shifting logic to aggregation pipelines
  • Efficient time-based analysis — peak usage hours derived directly from timestamps
  • Complete data in one document — query, language, prediction, and advisory stored together

Disadvantages / Challenges

  • Aggregation pipelines can be complex — designing multi-stage pipelines like $facet requires careful planning
  • Hard to debug — errors inside $facet often return empty results instead of clear failures
  • Learning curve — understanding advanced stages like $bucket and $group takes time
  • Edge cases in $bucket — values like 1.0 required adjusting boundaries (e.g., 1.01)
  • Dependency on data format — $substr works only because timestamps follow a consistent ISO format
  • Session handling complexity — tight integration between GraphQL and Django required
  • Request configuration issues — missing credentials: 'include' caused incorrect user mapping
  • Silent connection issues — incorrect MongoDB URI caused data not to persist properly

Results / Output

MongoDB Atlas Collection View

Collection view showing structured documents for each farmer interaction.


Charts (Confidence, Stress, Peak Hours)

Visualization of model confidence distribution, stress categories, and system usage patterns.

Chart-1

Chart-2

Dashboard — Personal Query Count

Personalized dashboard metric showing total number of queries made by the logged-in user.



Conclusion

MongoDB Atlas did more than just store our data — it powered our analytics.

Features like $facet, $bucket, $group + $substr, and $match replaced large amounts of backend data-processing logic with fast, database-native pipelines. This made the system both simpler and more efficient.

For a multilingual agricultural platform like FarmAI, where data is diverse and user privacy is critical, MongoDB's document model proved to be the right fit. Each document captures the complete interaction, while aggregation pipelines generate real-time insights with minimal overhead.

The biggest takeaway from this project is clear:
Push computation into the database, not the application layer.

This approach:

  • simplifies backend code
  • improves performance
  • ensures data remains consistent and secure

FarmAI shows that with the right tools, even a student-built system can deliver a scalable, real-world solution that makes a meaningful impact.


Acknowledgement

Behind every good project is someone who asks the right questions. For us, that was @chanda_rajkumar. His mentorship pushed FarmAI from a rough idea to a working system — and for that, we are truly grateful.

🔗 Project Links

GitHub Repository:
View Source Code on GitHub

Live Demo (Render):
Try FarmAI Live

Demo Video:
