Team Members
This project was developed by:
Introduction
In many parts of rural India, farmers don't have immediate access to agricultural experts. Imagine a farmer in Telangana noticing his rice crop turning yellow. The issue could be anything — nutrient deficiency, disease, or water stress — but identifying the exact problem isn't easy.
Most online resources are in English, and consulting an expert is often expensive or unavailable. Delays in diagnosis can lead to serious crop damage.
That's where FarmAI comes in.
FarmAI allows farmers to ask questions in their own language — Telugu, Hindi, Tamil, and more — and receive instant, AI-powered crop stress diagnosis along with actionable advice.
To make this system reliable and fast, we needed a backend that could handle:
- multilingual data
- flexible document structures
- real-time analytics
We chose MongoDB Atlas — and it became a central part of our architecture.
What We Built: The Tech Stack
FarmAI is built using:
- Backend: Django
- Database: MongoDB Atlas
- ML Model: TF-IDF + Logistic Regression
- NLP Pipeline: Multilingual query processing
- APIs: Django REST Framework + GraphQL
Each farmer interaction is stored as a rich document, not just plain text.
query_document = {
    "farmer_id": "farmer_001@example.com",
    "query_text": "my rice crop leaves are turning yellow",
    "input_language": "te",
    "detected_stress": "NUTRIENT_DEFICIENCY",
    "crop_detected": "Rice",
    "confidence_score": 0.87,
    "timestamp": "2026-04-07T10:23:00Z",
    "advisory": {
        "immediate_action": "Apply urea",
        "treatments": ["Spray urea solution"],
    },
}
The flexibility of MongoDB's document model made it easy to evolve this structure as our ML outputs improved.
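As a sketch of the write path (only the field names come from the document above; the helper functions and their names are illustrative, not our exact production code), assembling and sanity-checking one interaction document before inserting it might look like this:

```python
from datetime import datetime, timezone

# Fields every interaction document is expected to carry (from the schema above).
REQUIRED_FIELDS = {"farmer_id", "query_text", "input_language",
                   "detected_stress", "crop_detected", "confidence_score",
                   "timestamp", "advisory"}

def build_query_document(farmer_id, query_text, lang, prediction):
    """Assemble one interaction document from an ML prediction result."""
    return {
        "farmer_id": farmer_id,
        "query_text": query_text,
        "input_language": lang,
        "detected_stress": prediction["stress"],
        "crop_detected": prediction["crop"],
        "confidence_score": prediction["confidence"],
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "advisory": prediction["advisory"],
    }

def is_valid(doc):
    """Cheap schema check: all required fields present, confidence in [0, 1]."""
    return REQUIRED_FIELDS <= doc.keys() and 0.0 <= doc["confidence_score"] <= 1.0
```

With PyMongo, a valid document then goes straight to Atlas via `collection.insert_one(doc)`; because the schema is flexible, adding a new field later is just another key in the dict.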
MongoDB Features We Explored
Instead of relying on multiple queries and heavy backend logic, we used MongoDB's aggregation framework to push computation directly into the database.
1. $facet — Multiple Queries in a Single Round Trip
pipeline = [
    {"$match": {"farmer_id": email}},
    {"$facet": {
        "total": [{"$count": "n"}],
        "today": [...],
        "this_week": [...],
        "unique_crops": [...],
        "stress_breakdown": [...],
    }},
]
What it does
The $facet stage allows multiple aggregations to run in parallel within a single query.
Instead of executing 5 separate queries like:
- total records
- today's data
- weekly data
- unique crops
- stress analysis
we combined everything into one MongoDB call.
Impact on our project
- Reduced database round trips significantly
- Improved API response time
- Cleaner backend logic (single pipeline instead of multiple queries)
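For intuition, here is a plain-Python equivalent of what that single `$facet` call computes for us in one pass (an illustration using the field names from our documents, not the production code, which runs inside MongoDB):

```python
from collections import Counter

def dashboard_stats(docs, today_prefix):
    """One pass over a user's documents, mirroring the $facet pipeline:
    total count, today's count, unique crops, and stress breakdown."""
    total = 0
    today = 0
    crops = set()
    stress = Counter()
    for d in docs:
        total += 1
        if d["timestamp"].startswith(today_prefix):  # e.g. "2026-04-07"
            today += 1
        crops.add(d["crop_detected"])
        stress[d["detected_stress"]] += 1
    return {"total": total, "today": today,
            "unique_crops": sorted(crops),
            "stress_breakdown": dict(stress)}
```

The point of `$facet` is that MongoDB does all of this server-side in a single round trip, instead of the backend fetching documents and looping over them like this.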
2. $group + $project + $substr → Peak Usage Hours
pipeline_hours = [
    {"$match": {"farmer_id": email}},
    {"$project": {
        "hour": {"$toInt": {"$substr": ["$timestamp", 11, 2]}}
    }},
    {"$group": {"_id": "$hour", "count": {"$sum": 1}}},
    {"$sort": {"_id": 1}},
]
What it does
This pipeline extracts the hour (0–23) from a timestamp string and calculates how many queries occur in each hour.
- `$substr` → extracts the hour from the timestamp string
- `$toInt` → converts it to a number
- `$group` → counts occurrences per hour
- `$sort` → orders the results
This powers a "Peak Usage Hours" chart in our dashboard.
Impact on our project
- Enabled time-based insights
- Helped identify peak system usage
- Improved data visualization for users
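The same hour extraction is easy to express (and unit-test) in Python: the hour of an ISO-8601 timestamp lives at characters 11–12, exactly the slice `$substr: ["$timestamp", 11, 2]` takes. A minimal sketch:

```python
from collections import Counter

def peak_hours(timestamps):
    """Count queries per hour (0-23) from ISO-8601 strings like
    '2026-04-07T10:23:00Z'. Mirrors $substr + $toInt + $group + $sort:
    ts[11:13] is the two-digit hour field."""
    hours = Counter(int(ts[11:13]) for ts in timestamps)
    return sorted(hours.items())  # [(hour, count), ...] ascending by hour
```

Like `$substr` in the pipeline, this relies on every timestamp following the same ISO format, a dependency we call out in the challenges section below.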
3. $bucket → Confidence Score Distribution
pipeline_conf = [
    {"$match": {"farmer_id": email}},
    {"$bucket": {
        "groupBy": "$confidence_score",
        "boundaries": [0, 0.2, 0.4, 0.6, 0.8, 1.01],
        "output": {"count": {"$sum": 1}},
    }},
]
What it does
The $bucket stage groups values into predefined ranges — similar to a histogram.
In our case:
- 0–20%
- 20–40%
- 40–60%
- 60–80%
- 80–100%
This shows how confident the system is across predictions.
Impact on our project
- Simplified statistical analysis
- No need for frontend computation
- Enabled quick visualization of model performance
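A pure-Python histogram shows the `$bucket` semantics we relied on (this is an illustration of the stage's behavior, not code from our backend): each bucket is [lower, upper), lower bound inclusive and upper bound exclusive, which is why the last boundary is 1.01 rather than 1.0; a perfect score of 1.0 would otherwise fall outside every bucket.

```python
from bisect import bisect_right

# 1.01 as the last boundary so a perfect 1.0 lands in the final bucket,
# matching the boundaries used in the $bucket pipeline above.
BOUNDARIES = [0, 0.2, 0.4, 0.6, 0.8, 1.01]

def confidence_histogram(scores):
    """Group scores into [lower, upper) buckets, like MongoDB's $bucket."""
    counts = [0] * (len(BOUNDARIES) - 1)
    for s in scores:
        # Index of the bucket whose lower bound <= s < upper bound.
        counts[bisect_right(BOUNDARIES, s) - 1] += 1
    return counts
```

The five counts map directly onto the 0–20%, 20–40%, 40–60%, 60–80%, and 80–100% ranges listed above.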
4. $match → User-Specific Data Filtering
{"$match": {"farmer_id": email}}
What it does
Every pipeline begins with $match to filter data based on the logged-in user.
This ensures:
- Users only access their own data
- Queries remain efficient by reducing dataset size early
Impact on our project
- Improved security and data isolation
- Faster aggregation performance
- Better scalability for multi-user systems
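One way to make that guarantee hold by construction is a tiny helper that prepends the per-user `$match` to every pipeline, so no analytics endpoint can forget it (the helper name is hypothetical; it sketches the pattern, not our exact code):

```python
def user_pipeline(email, *stages):
    """Build an aggregation pipeline that is always scoped to one farmer:
    the per-user $match comes first, then the caller's stages."""
    return [{"$match": {"farmer_id": email}}, *stages]

# Usage: a per-user count pipeline.
count_pipeline = user_pipeline("farmer_001@example.com", {"$count": "n"})
```

Putting `$match` first also lets MongoDB use an index on `farmer_id` and shrink the working set before any heavier stage runs.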
Why These Features Matter
By combining these MongoDB features, we were able to:
- Replace multiple queries with single optimized pipelines
- Perform analytics directly inside the database
- Reduce backend complexity
- Improve overall performance and scalability
Implementation
Login/Register Page
User authentication interface allowing farmers to securely register and log in to access personalized services.
Dashboard Page
Interactive dashboard displaying query statistics, crop insights, and real-time analytics for the logged-in farmer.
Profile Page
User profile view showing farmer details and account-specific information used for personalized recommendations.
Query Result
AI-generated crop diagnosis with detected stress type, confidence score, and recommended actions.
Atlas Farmer Queries
MongoDB Atlas collection storing farmer queries along with predictions, timestamps, and advisory data.
MongoDB Terminal
MongoDB shell interface used to execute queries and verify database operations during development.
A Bug That Took Time to Find
Our MongoDB connection was silently failing. Queries appeared to work — the app ran fine — but nothing was reaching Atlas. On server restart, all data disappeared.
The cause: our password Password$xx contained a $ character. In a MongoDB URI, $ must be URL-encoded as %24. Without encoding, PyMongo failed to authenticate and silently fell back to in-memory storage — no error, no warning.
Fix:
# Wrong
mongodb+srv://user:Password$xx@cluster...
# Correct
mongodb+srv://user:Password%24xx@cluster...
After this fix, every query correctly reached Atlas and appeared in the collection browser under the right farmer_id.
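Rather than hand-encoding each special character, the standard library can escape the credentials for us, which is also the approach the PyMongo documentation recommends (the `build_uri` helper below is our own sketch, not a PyMongo API):

```python
from urllib.parse import quote_plus

def build_uri(user, password, host):
    """URL-encode the credentials so characters like '$', '@', or ':'
    in a password can never corrupt the connection string."""
    return f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}@{host}"
```

With this in place, `Password$xx` becomes `Password%24xx` automatically, and a future password rotation cannot silently reintroduce the bug.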
Impact on Our Project
What Changed
- 4–5 queries → one `$facet` pipeline
- No more Python data processing
- Analytics handled directly in MongoDB
Performance
- Faster dashboard load
- Single DB call instead of multiple
- Parallel processing inside MongoDB
Security
{"$match": {"farmer_id": email}}
- Each user sees only their data
- No risk of exposing other users' info
Analytics
- `$bucket` → confidence distribution
- `$group` + `$substr` → peak hours
- No backend calculations needed
Fix
- Password `$` → encoded as `%24`
- Fixed the data-not-saving issue
Result
- ✅ Faster
- ✅ Secure
- ✅ Cleaner code
Advantages
- Fewer database calls — `$facet` replaced multiple sequential queries with a single pipeline
- High performance — faster response times due to reduced DB round trips and parallel execution
- Built-in analytics — `$bucket`, `$group`, and `$substr` handle all computations inside MongoDB
- Real-time insights — dashboard charts reflect live data directly from MongoDB Atlas
- Strong data security — `$match` ensures per-user data isolation by design
- Flexible schema — easily added new fields without migrations
- Clean and maintainable backend — reduced code complexity by shifting logic to aggregation pipelines
- Efficient time-based analysis — peak usage hours derived directly from timestamps
- Complete data in one document — query, language, prediction, and advisory stored together
Disadvantages / Challenges
- Aggregation pipelines can be complex — designing multi-stage pipelines like `$facet` requires careful planning
- Hard to debug — errors inside `$facet` often return empty results instead of clear failures
- Learning curve — understanding advanced stages like `$bucket` and `$group` takes time
- Edge cases in `$bucket` — values like 1.0 required adjusting the boundaries (e.g., 1.01)
- Dependency on data format — `$substr` works only because timestamps follow a consistent ISO format
- Session handling complexity — tight integration between GraphQL and Django was required
- Request configuration issues — a missing `credentials: 'include'` caused incorrect user mapping
- Silent connection issues — an incorrect MongoDB URI caused data not to persist properly
Results / Output
MongoDB Atlas Collection View
Collection view showing structured documents for each farmer interaction.
Charts (Confidence, Stress, Peak Hours)
Visualization of model confidence distribution, stress categories, and system usage patterns.
Dashboard — Personal Query Count
Personalized dashboard metric showing total number of queries made by the logged-in user.
Conclusion
MongoDB Atlas did more than just store our data — it powered our analytics.
Features like $facet, $bucket, $group + $substr, and $match replaced large amounts of backend data-processing logic with fast, database-native pipelines. This made the system both simpler and more efficient.
For a multilingual agricultural platform like FarmAI, where data is diverse and user privacy is critical, MongoDB's document model proved to be the right fit. Each document captures the complete interaction, while aggregation pipelines generate real-time insights with minimal overhead.
The biggest takeaway from this project is clear:
Push computation into the database, not the application layer.
This approach:
- simplifies backend code
- improves performance
- ensures data remains consistent and secure
FarmAI shows that with the right tools, even a student-built system can deliver a scalable, real-world solution that makes a meaningful impact.
Acknowledgement
Behind every good project is someone who asks the right questions. For us, that was @chanda_rajkumar. His mentorship pushed FarmAI from a rough idea to a working system — and for that, we are truly grateful.
🔗 Project Links
GitHub Repository:
View Source Code on GitHub
Live Demo (Render):
Try FarmAI Live
Demo Video: