
CHITTIPROLU DAKSHAYANI


Machine Learning Driven Crop Yield Prediction with NLP-Based Insight

Machine Learning Driven Crop Yield Prediction with NLP-Based Insight is a smart agriculture system that helps farmers make better decisions. It uses machine learning (ML) to estimate how much crop a field is likely to produce, and natural language processing (NLP) to make sense of farmer inputs, weather reports, and market trends.
The project draws on both current and historical data to make its predictions. This helps farmers choose the right crop and make better use of resources such as water, soil, and fertilisers.
Overall, the project shows how ML and NLP can make farming easier and smarter.

Team Members
This project was developed by:

@chityala_akshitha - Chityala Akshitha
@kapa_keerthi_reddy - Kapa Keerthi Reddy
@k_sahasri_4178e21fcb9e123 - Kannayavandla Sahasri

We want to express our sincere gratitude to @chanda_rajkumar for his guidance and support throughout this project. He helped us understand the problem and shape the development and architecture of Machine Learning Driven Crop Yield Prediction with NLP-Based Insight.

The Problem We Set Out to Solve

Agriculture today is not as straightforward as it used to be. Farmers have to deal with unpredictable weather, changing soil conditions, and constantly shifting market trends. Even though a lot of useful data exists—like weather reports, expert recommendations, and shared farming experiences—it often isn’t easy to combine all of this into something practical. That’s where our project comes in.
We've built a system that uses machine learning to predict crop yield based on simple inputs such as soil type, weather conditions, and crop details. The idea was to create something that doesn’t feel complicated but still gives meaningful output that can actually help in decision-making.
But we didn’t want to stop at prediction. A lot of valuable information in agriculture comes as text, such as reports, suggestions, and guidelines. So we integrated Natural Language Processing (NLP) to process this kind of data and turn it into clear, easy-to-understand insights instead of leaving it as raw text.

What we found interesting while working on this project is how powerful it becomes when you combine structured data (like numbers) with unstructured data (like text). Instead of just giving a predicted value, the system tries to provide context and useful suggestions around it.
Overall, this project is about making data actually useful. The goal is simple: to help farmers make better decisions with the information they already have, but in a way that’s easier to understand and apply in real life.

The Approach

Our approach combines crop yield prediction with natural language processing to guide farmers toward better decisions. The ML model does not rely on past data alone; it also takes the farmer's current inputs, such as soil type, weather conditions, and the crop being grown. At the same time, NLP processes text such as agricultural reports, farmer feedback, and market information to extract useful insights. Together, these turn farming from a risky, uncertain process into a more planned, data-driven one, where farmers can make decisions in advance and improve their overall productivity.

System Architecture Overview

The system takes information from farmers, including soil conditions, weather information, and crop details, and stores it in the MongoDB backend. The application passes this input to the machine learning models, which produce a prediction and flag possible risks.
At the same time, text such as farmer feedback or agricultural reports is analysed using NLP to identify common problems and surface useful information.
All of this, including the inputs, predictions, and insights, is stored in MongoDB for easy retrieval and real-time updates.
The result is a smooth path from data collection to prediction to suggestions, giving farmers clear and useful guidance.

Data Model in MongoDB

users – stores user profiles and preferences
_id, name, email, phone, preferred_language, location, role (farmer/admin), registration_date
feedback – user-submitted feedback and reports
_id, user_id, crop, region, soil, weather, notes, sentiment, date_submitted
crop_prices – stores historical and edited crop prices
_id, crop_name, price, unit, market, date_updated, updated_by
crop_predictions – machine learning crop recommendations
_id, user_id, soil, weather, region, recommended_crop, advice, prediction_date
disease_diagnosis – plant image uploads and disease results
_id, user_id, image_path, diagnosis, confidence, advice, upload_date
price_predictions – AI-predicted crop prices
_id, crop_name, market, predicted_price, prediction_date, model_version
dashboard_stats – aggregated analytics for dashboard
_id, date, crop_counts, issue_counts, feedback_summary
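
To make the data model more concrete, here is a minimal sketch of how a document for the feedback collection could be assembled before it is saved. The field names come from the list above; the helper name make_feedback_doc, the required-field check, and the UTC timestamp are assumptions for illustration, not the project's actual code.

```python
from datetime import datetime, timezone


def make_feedback_doc(user_id, crop, region, soil, weather, notes="", sentiment=None):
    """Build a dict shaped like a `feedback` document (hypothetical helper)."""
    required = {"user_id": user_id, "crop": crop, "region": region,
                "soil": soil, "weather": weather}
    # Reject submissions that leave out any of the core fields.
    missing = [name for name, value in required.items() if not value]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    doc = dict(required)
    doc["notes"] = notes.strip()
    doc["sentiment"] = sentiment  # may be filled in later by the NLP step
    doc["date_submitted"] = datetime.now(timezone.utc)
    return doc
```

With pymongo, a document built this way could then be stored with collection.insert_one(doc); MongoDB assigns the _id automatically.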

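import os
from urllib.parse import quote_plus

import certifi
from flask import Flask
from googletrans import Translator  # assumption: Translator is googletrans' Translator
from pymongo import MongoClient

app = Flask(__name__)

# Read credentials from the environment instead of hard-coding them in the URI.
# MONGO_USER, MONGO_PASSWORD and FLASK_SECRET_KEY are assumed variable names.
mongo_user = quote_plus(os.environ["MONGO_USER"])
mongo_password = quote_plus(os.environ["MONGO_PASSWORD"])
client = MongoClient(
    f"mongodb+srv://{mongo_user}:{mongo_password}@myatlasclusteredu.wizq9sn.mongodb.net/myDB?retryWrites=true&w=majority",
    tls=True,
    tlsCAFile=certifi.where(),
)
db = client["myDB"]
collection = db["feedback"]

app.secret_key = os.environ["FLASK_SECRET_KEY"]
DATA_FILE = 'feedback.csv'
UPLOAD_FOLDER = 'uploads'
app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER
translator = Translator()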

The Aggregation Pipelines

We use several aggregation pipelines over the stored data:
Crop yield trend – processes historical yield data to show how crop production changes over time
Weather impact analysis – analyses weather data such as rainfall and temperature to understand how they affect crop yield
Soil performance tracking – examines how different soil types and conditions affect productivity and crop growth
Farmer activity tracking – measures how often farmers enter data and interact with the system
Prediction distribution – shows how many predictions fall into the low, medium, or high yield categories
NLP insight trends – identifies common issues such as drought, pests, or diseases in text data and feedback
Alert summary – highlights recent high-risk situations, such as a poor yield prediction or adverse farming conditions, that need attention
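
As an illustration of one of these pipelines, the sketch below counts feedback submissions per farmer, which is one way farmer activity could be measured. It assumes the feedback collection with the user_id and date_submitted fields from the data model; the function names and the default limit are hypothetical, and the project's real pipeline may differ.

```python
def farmer_activity_pipeline(limit=10):
    """Aggregation stages that count feedback submissions per user."""
    return [
        # One group per farmer: number of submissions and the latest date.
        {"$group": {
            "_id": "$user_id",
            "submissions": {"$sum": 1},
            "last_submitted": {"$max": "$date_submitted"},
        }},
        # Most active farmers first.
        {"$sort": {"submissions": -1}},
        {"$limit": limit},
    ]


def farmer_activity(collection, limit=10):
    """Run the pipeline on a pymongo-style collection and return the rows."""
    return list(collection.aggregate(farmer_activity_pipeline(limit)))
```

Keeping the pipeline in its own function makes it easy to inspect or reuse, for example when filling the dashboard_stats collection.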

Integration of AI & ML in the System

The system uses simple machine learning and NLP algorithms to provide useful farming insights without adding complexity for the user. The user enters relevant information such as soil characteristics, weather parameters, and crop details; the system evaluates this data and predicts the crop yield and possible risks.
Additionally, if the user submits feedback or reports a problem, NLP analyses the text and identifies common factors such as pest infestation, drought, or low crop yields.
The platform has several components for this. The first, TextInsightEngine, evaluates the text provided by users and extracts useful information from it. The second is an ML model that predicts the crop yield and categorises it as low, medium, or high. Based on these two outputs, the system provides suggestions for farmers.
There is also an optional chatbot module that delivers more conversational answers, and a fallback mechanism that keeps the platform working if the other modules fail.
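
To show the kind of keyword matching a component like TextInsightEngine could perform, here is a minimal sketch. It is an illustrative stand-in rather than the project's implementation: the keyword lists, the method names, and the issue labels are all assumptions.

```python
import re
from collections import Counter

# Hypothetical keyword lists; a real deployment would tune and extend these.
ISSUE_KEYWORDS = {
    "pest": {"pest", "pests", "insect", "insects", "infestation"},
    "drought": {"drought", "dry"},
    "disease": {"disease", "diseases", "blight", "fungus"},
}


class TextInsightEngine:
    """Keyword-based issue tagger (illustrative sketch)."""

    def __init__(self, keywords=ISSUE_KEYWORDS):
        self.keywords = keywords

    def issues(self, text):
        """Return the sorted issue labels whose keywords appear in the text."""
        # Match whole words so that short keywords do not hit inside longer words.
        words = set(re.findall(r"[a-z]+", text.lower()))
        return sorted(label for label, kws in self.keywords.items() if words & kws)

    def summarize(self, notes):
        """Count how often each issue appears across many feedback notes."""
        counts = Counter()
        for note in notes:
            counts.update(self.issues(note))
        return counts
```

For example, TextInsightEngine().issues("Pests attacked the field after a dry month") returns ["drought", "pest"], and summarize() turns a batch of notes into counts that could feed an issue trend view.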

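import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier
from joblib import dump

# Load the feedback export: each row holds soil, weather, region and the crop.
df = pd.read_csv('feedback.csv')
# Copy the feature columns so that adding the encoded columns below does not
# write into a slice of df (which triggers pandas' SettingWithCopyWarning).
X = df[['soil', 'weather', 'region']].copy()
y = df['crop']

# One encoder per categorical column, so each can be reused at prediction time.
le_soil = LabelEncoder()
le_weather = LabelEncoder()
le_region = LabelEncoder()
le_crop = LabelEncoder()
X['soil_enc'] = le_soil.fit_transform(X['soil'])
X['weather_enc'] = le_weather.fit_transform(X['weather'])
X['region_enc'] = le_region.fit_transform(X['region'])
X_enc = X[['soil_enc', 'weather_enc', 'region_enc']]
y_enc = le_crop.fit_transform(y)

# Train a decision tree that maps the encoded conditions to a crop label.
clf = DecisionTreeClassifier()
clf.fit(X_enc, y_enc)

# Persist the model together with its encoders.
dump(clf, 'crop_model.joblib')
dump(le_soil, 'le_soil.joblib')
dump(le_weather, 'le_weather.joblib')
dump(le_region, 'le_region.joblib')
dump(le_crop, 'le_crop.joblib')
print('Model and encoders saved.')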
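
Once the model and encoders are saved, a prediction needs the same encoders applied to the new input. The helper below is a sketch of that step under a few assumptions: the name predict_crop and the choice to return None for values never seen in training are ours, and the objects passed in are expected to be the classifier and LabelEncoder instances from the training script.

```python
def predict_crop(clf, le_soil, le_weather, le_region, le_crop, soil, weather, region):
    """Return the predicted crop name, or None if an input value is unknown."""
    pairs = ((le_soil, soil), (le_weather, weather), (le_region, region))
    # LabelEncoder.transform raises ValueError on labels it has not seen,
    # so check against the learned classes first.
    for encoder, value in pairs:
        if value not in encoder.classes_:
            return None
    # Encode the three inputs in the same column order used for training.
    row = [[int(encoder.transform([value])[0]) for encoder, value in pairs]]
    label = clf.predict(row)[0]
    # Map the numeric class back to the crop name.
    return le_crop.inverse_transform([label])[0]
```

The saved artifacts can be restored with joblib.load('crop_model.joblib') and the matching calls for each encoder. Because the model was fitted on a DataFrame, scikit-learn may warn about missing feature names when it is given a plain list.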

Experiences of Farmers First

This project is built on the idea that farmers should not have to make decisions by guesswork or rely only on their own experience. Instead, we give them the information through a user-friendly platform so that they can make informed decisions.
Farmers get clear information, such as a yield forecast, risk factors, and insights from text analysis, based on weather and soil conditions. They do not need to read complicated reports, because they can see directly which problems the system has detected, such as pests or lack of rainfall.
Based on these insights, users receive recommendations. This creates a consistent cycle in which farmers always know their current situation and what actions they need to take.

Challenges We Faced and How We Solved Them

We faced many challenges during this project, especially when working with real-world agricultural data. Data collected from different sources, such as weather and soil records, was often incomplete, so we applied a few simple cleaning and validation steps to improve its quality.
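
A cleaning and validation step of this kind might look like the sketch below. It assumes records arrive as dictionaries with the crop, soil, weather, and region fields used elsewhere in this post; the function name and the rule of dropping incomplete rows are illustrative choices, not a description of the exact steps we applied.

```python
REQUIRED = ("crop", "soil", "weather", "region")


def clean_records(records):
    """Normalise text fields and drop rows with missing required values."""
    cleaned = []
    for record in records:
        row = {}
        for key in REQUIRED:
            value = record.get(key)
            # Treat None and whitespace-only strings as missing.
            row[key] = str(value).strip().lower() if value is not None else ""
        # Keep the row only if every required field has a value.
        if all(row.values()):
            cleaned.append(row)
    return cleaned
```

Lower-casing and trimming the values also keeps labels such as "Clay" and " clay " from being treated as different categories later on.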
To improve the accuracy of crop yield prediction, we chose a model that can learn easily from the data we have.
Farmer feedback and reports vary in language and format, so we use simple NLP models that identify keywords and surface the common problems farmers face, such as pests or low rainfall.
Performance improved once we used MongoDB's indexing and filtering, which keep queries simple. Updating data in the database also causes fewer problems.
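
As a sketch of the indexing idea, the snippet below declares a few compound indexes on collections from the data model and creates them on a pymongo-style database. Which fields to index is an assumption based on the queries described in this post; the names INDEXES and ensure_indexes are hypothetical.

```python
# (collection name, index keys) pairs; 1 = ascending, -1 = descending.
INDEXES = [
    ("feedback", [("user_id", 1), ("date_submitted", -1)]),
    ("crop_prices", [("crop_name", 1), ("market", 1)]),
    ("crop_predictions", [("user_id", 1), ("prediction_date", -1)]),
]


def ensure_indexes(db):
    """Create each index on the given database and return the index names."""
    return [db[name].create_index(keys) for name, keys in INDEXES]
```

Calling ensure_indexes(db) once at startup is usually enough, since creating an index that already exists with the same definition does nothing.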
We built the backend with Flask, as the route code in this post shows; its simple structure keeps the system easy to run and update.
With these solutions in place, we have built a system that is reliable, flexible, and easy for farmers to use.

Meet AgroAssist: Your Smart Farming Companion

AgroAssist is the chatbot in our project that assists farmers. It gives guidance and support related to farming and helps farmers understand crop yield and weather conditions in a simple way. It uses ML for predictions and NLP to understand farmer queries, and it gives short, clear replies that are easy to follow.
It is integrated into our project so that it can answer in real time, based on both current and previous data.

import re

from flask import redirect, render_template, request, session, url_for

# AgroAssist chatbot route and logic (no blueprint, no circular import).
# Assumes `app` and `get_crop_prices()` are defined elsewhere in the application.
@app.route('/agroassist', methods=['GET', 'POST'])
def agroassist():
    # Keep the conversation in the session so it survives the redirect.
    if 'chatlog' not in session:
        session['chatlog'] = []
    chatlog = session['chatlog']
    if request.method == 'POST':
        user_input = request.form['user_input']
        chatlog.append({'role': 'user', 'text': user_input})
        response = get_agroassist_response(user_input)
        chatlog.append({'role': 'bot', 'text': response})
        # Reassign so Flask notices the change to the mutable list.
        session['chatlog'] = chatlog
        return redirect(url_for('agroassist'))
    return render_template('agroassist.html', chatlog=chatlog)


def get_agroassist_response(user_input):
    text = user_input.lower()
    # Whole-word tokens, so that 'hi' does not match inside 'which' or 'this'.
    words = set(re.findall(r'[a-z]+', text))
    # Crop price Q&A
    if 'price' in text or 'cost' in text:
        prices = get_crop_prices()
        for crop in prices:
            if crop['name'].lower() in text:
                return f"Current price of {crop['name']}: ₹{crop['price']} {crop['unit']} in {crop['market']}."
        return "Please specify the crop name to get the price."
    # Crop recommendation
    if 'recommend' in text or 'which crop' in text:
        return "To get crop recommendations, use the Predict page and enter your soil, weather, and region details."
    # Disease/plant advice
    if 'disease' in text or 'problem' in text or 'plant' in text:
        return "For plant disease diagnosis, use the Crop Doctor page and upload a plant image."
    # Greetings (matched on whole words only)
    if words & {'hello', 'hi', 'hey'}:
        return "Hello! I'm AgroAssist, your farming assistant. Ask me about crop prices, advice, or feedback."
    # Fallback
    return "I'm AgroAssist. I can help with crop prices, recommendations, and farming advice. Try asking about a crop price or how to get recommendations."

What’s Next for Our Project

Our project can be improved by building new AI models for more accurate prediction. In the future, we can use stronger ML models to make deeper and better predictions. We can also add real-time notifications and alerts, so that farmers learn about weather changes, crop damage, and market trends at the right time. This will improve the predictions and lead to better suggestions.
We also plan to make the app easy to access on mobile phones, and to add sensors and devices that collect real-time data about soil, temperature, and weather.

Wrapping Up
This project is not just a crop prediction system. It is a smart farming platform that helps farmers make better decisions. It uses ML for prediction and NLP to understand farmer queries.
The system uses both present and previous data to give suggestions. It helps farmers choose the right crops and shows them how to use resources properly, which improves productivity.
Overall, the project shows how modern farming technology can be smarter and easier to use.
