By Vaishnavi Reddy, Siri Reddy, Hasini, and Joshi Gayatri. This project was developed under the guidance and mentorship of Professor Chanda Rajkumar.
The Idea Behind the Project
What if thousands of farmers are asking similar questions every day—but no one is truly analyzing them?
This was the thought that led to our project.
Farmers frequently raise queries about crop diseases, fertilizers, irrigation, and weather conditions through helplines, apps, and messaging platforms. These queries often contain valuable insights, but they are usually scattered, unstructured, and underutilized.
Instead of approaching this with a highly complex solution, we focused on a simpler idea:
Can we design a system that automatically analyzes farmer queries and identifies patterns to provide smarter crop advisory support?
Why Do We Think This Problem Matters?
In real-world agricultural systems, a large amount of farmer interaction data exists as unstructured text—short questions, voice-to-text inputs, or incomplete descriptions.
Within this data:
Problems are often described vaguely
Local languages and mixed dialects are used
Critical issues like pest attacks or nutrient deficiencies may be hidden in simple sentences
Manually analyzing such data is time-consuming and does not scale.
As the number of farmers using digital platforms grows, it becomes essential to build systems that can:
Understand queries automatically
Identify recurring issues
Provide timely and relevant advisory
This project explores how a lightweight NLP-based system can help bridge this gap in a practical and scalable way.
How We Set Up Our Project
The goal of the project was to create a complete system that:
Accepts farmer queries as input
Cleans and processes the text
Extracts meaningful patterns
Classifies the type of query (e.g., pest, fertilizer, irrigation)
Stores and analyzes the data for future insights
Technology Stack
To keep the system simple yet effective, we used a lightweight and practical tech stack.
Python — Core Engine
The entire system is built using Python due to its strong support for NLP and data processing.
Data Handling
Pandas → Dataset processing
NumPy → Numerical operations
NLP with NLTK
We used NLTK to preprocess farmer queries by:
Removing stopwords
Cleaning text
Normalizing input
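The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not our exact code: the function names `load_stopwords` and `preprocess`, the regex-based cleaning rule, and the small fallback stopword set are assumptions made for the example. It uses NLTK's English stopword corpus when it has been downloaded and falls back to the hard-coded set otherwise.

```python
import re

# Tiny fallback list (an assumption for this sketch), used only when
# NLTK or its stopword corpus is not available on the machine.
FALLBACK_STOPWORDS = {
    "a", "an", "the", "is", "are", "my", "in", "on", "of", "to", "and", "why",
}


def load_stopwords():
    """Return NLTK's English stopwords, or the fallback set if unavailable."""
    try:
        from nltk.corpus import stopwords
        return set(stopwords.words("english"))
    except (ImportError, LookupError):
        # LookupError is raised when the corpus has not been downloaded.
        return FALLBACK_STOPWORDS


def preprocess(query, stop_words):
    """Lowercase the query, strip non-letter characters, and drop stopwords."""
    text = query.lower()                    # normalize case
    text = re.sub(r"[^a-z\s]", " ", text)   # clean digits and punctuation
    tokens = text.split()                   # simple whitespace tokenization
    return " ".join(t for t in tokens if t not in stop_words)


stop_words = load_stopwords()
cleaned = preprocess("Why are my crop leaves turning YELLOW?", stop_words)
print(cleaned)
```

Note that the cleaning rule here keeps only the letters a to z, so it would discard text written in non-Latin scripts. Queries in regional languages would need a different rule.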
Machine Learning — Scikit-learn
Scikit-learn was used to build the classification model.
TF-IDF → Feature extraction
Logistic Regression → Query classification
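A minimal sketch of how TF-IDF and Logistic Regression fit together in Scikit-learn is shown below. The six training queries and their labels are invented purely for illustration and are not our dataset, and the hyperparameters (such as `max_iter=1000`) are assumptions rather than tuned values.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy examples invented for this sketch (not the project's real data).
train_queries = [
    "insects are eating my cotton leaves",
    "white flies on tomato plants",
    "which fertilizer should I use for paddy",
    "how much urea is needed for wheat",
    "how often should I water my maize field",
    "drip irrigation schedule for sugarcane",
]
train_labels = [
    "pest", "pest",
    "fertilizer", "fertilizer",
    "irrigation", "irrigation",
]

# TF-IDF turns each query into a weighted term vector;
# Logistic Regression then learns one set of weights per category.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_queries, train_labels)

prediction = model.predict(["there are insects on my plants"])[0]
print(prediction)
```

Wrapping both steps in one pipeline means the same vocabulary and weights learned during training are applied to every new query at prediction time.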
Backend — Flask API
We developed a simple backend using Flask to make the model accessible.
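As a rough sketch, a Flask endpoint that exposes the classifier could look like this. The `/classify` route, the JSON field names, and the `classify_query` helper are hypothetical; in particular, `classify_query` is a keyword placeholder standing in for the trained model so that the example runs on its own.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def classify_query(text):
    """Placeholder for the trained model (hypothetical keyword rule)."""
    lowered = text.lower()
    if "insect" in lowered or "pest" in lowered:
        return "pest"
    if "water" in lowered or "irrigation" in lowered:
        return "irrigation"
    return "fertilizer"


@app.route("/classify", methods=["POST"])
def classify():
    # Expect a JSON body such as {"query": "..."}.
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict):
        payload = {}
    query = str(payload.get("query", "")).strip()
    if not query:
        return jsonify({"error": "query is required"}), 400
    return jsonify({"query": query, "category": classify_query(query)})
```

A client would send a POST request with a JSON body and receive the predicted category in the response. In a real deployment the placeholder function would be replaced by a call to the fitted model.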
MongoDB Integration for Database
Farmer query data is highly unstructured and varies significantly in format.
We used MongoDB because:
It supports flexible schemas (no rigid tables)
It efficiently stores document-based data
It is scalable and suitable for real-time applications
The system workflow:
User submits a query
The model processes and classifies it
Results are stored in MongoDB for analysis
Data Stored in MongoDB
Each query is stored as a document containing:
Farmer query text
Predicted category (pest, irrigation, fertilizer, etc.)
Crop type (if identified)
Location (if available)
Timestamp
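The document structure listed above might be built and stored along these lines. The field names, the `crop_advisory` database, the `farmer_queries` collection, and the local connection URI are all assumptions chosen for this sketch. The pymongo import sits inside `save_query` so that the document-building part can be run without a database driver or server.

```python
from datetime import datetime, timezone


def build_query_document(query_text, category, crop=None, location=None):
    """Assemble one query record; optional fields stay None when unknown."""
    return {
        "query_text": query_text,
        "predicted_category": category,
        "crop_type": crop,
        "location": location,
        "timestamp": datetime.now(timezone.utc),
    }


def save_query(document, uri="mongodb://localhost:27017"):
    """Insert the document into MongoDB (requires a running server)."""
    from pymongo import MongoClient  # imported here so the rest runs without it

    client = MongoClient(uri)
    try:
        collection = client["crop_advisory"]["farmer_queries"]
        return collection.insert_one(document).inserted_id
    finally:
        client.close()


doc = build_query_document(
    "Why are my crop leaves turning yellow?", "fertilizer", crop="paddy"
)
print(doc)
```

Because MongoDB does not enforce a fixed schema, records with a missing crop type or location can be stored alongside complete ones without any table changes.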
System Architecture
The system follows a modular pipeline:
1. Input
Farmer query (text or voice-converted text)
2. Preprocessing
Cleaning, normalization, stopword removal
3. Feature Extraction
TF-IDF vectorization
4. Prediction
Classification using Logistic Regression
5. Storage
Results stored in MongoDB
Pipeline Flow
Input → Preprocessing → Feature Extraction → Model → Output
System Demonstration
The system allows users to input a farmer query such as:
"Why are my crop leaves turning yellow?"
The model processes the query and predicts its category (for this query, likely a nutrient deficiency, which falls under the fertilizer category).
The output is displayed as a short label such as:
“Fertilizer-related issue”
“Pest-related issue”
This demonstrates how the system can assist in decision-making.
Results and Insights
From our analysis:
Frequent queries were related to pests and fertilizers
Seasonal trends were clearly observed
Similar problems were reported across different regions
This shows that analyzing query patterns can help in:
Predicting upcoming issues
Providing proactive advisory
Improving agricultural decision-making
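One simple way to surface patterns like these is to count stored queries per category and month with Pandas. The sketch below uses five invented records, not our results, and assumes the field names `predicted_category` and `timestamp`; in practice the records would be read from the database.

```python
import pandas as pd

# Invented sample records for illustration only.
records = [
    {"predicted_category": "pest", "timestamp": "2024-06-03"},
    {"predicted_category": "pest", "timestamp": "2024-06-18"},
    {"predicted_category": "fertilizer", "timestamp": "2024-06-20"},
    {"predicted_category": "irrigation", "timestamp": "2024-07-02"},
    {"predicted_category": "pest", "timestamp": "2024-07-11"},
]

df = pd.DataFrame(records)
df["timestamp"] = pd.to_datetime(df["timestamp"])
df["month"] = df["timestamp"].dt.to_period("M")

# Rows are months, columns are categories, values are query counts.
monthly_counts = (
    df.groupby(["month", "predicted_category"])
    .size()
    .unstack(fill_value=0)
)
print(monthly_counts)
```

A table like this makes it easy to see which categories rise in particular months, which is the kind of signal that could feed proactive advisories.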
Future Improvements
Future enhancements can include:
Using advanced models like BERT for better accuracy
Supporting regional languages
Integrating weather and soil data
Building a real-time farmer advisory app
Conclusion
This project demonstrates how NLP and machine learning can transform unstructured farmer queries into meaningful insights.
By storing the model's predictions in MongoDB, the system moves beyond a standalone classifier toward a scalable crop advisory solution.
Ultimately, this approach helps in:
Understanding farmer needs better
Delivering timely recommendations
Supporting smarter and more sustainable agriculture





