Building a Deep Learning Model to Detect Potato Diseases: My Journey with PlantVillage.

#computervision #deeplearning #tensorflow #agriculture

As a data scientist with a passion for solving real-world problems, I recently embarked on a project to detect potato diseases using computer vision. The idea sprouted from the realization that in many parts of Kenya, and globally, farmers often lack quick and affordable access to agricultural expertise. A simple app that identifies potato diseases such as early blight and late blight, or confirms that a leaf is healthy, could make a real difference.

The Dataset: PlantVillage

The first step was choosing the right dataset. The PlantVillage dataset on Kaggle offered exactly what I needed: a categorized collection of potato leaf images labeled as healthy, early blight, or late blight. These high-quality images made training a model feasible without manually curating data, a huge relief!

Setting Up the Pipeline

I developed this project using TensorFlow and Keras in Python, and began by loading and preprocessing the dataset. Preprocessing is crucial in any machine learning or deep learning project, especially in computer vision, because raw data is rarely in a form that's immediately useful for training a model.

    import tensorflow as tf

    # Labels are inferred from the class-named subfolders of "PlantVillage".
    dataset = tf.keras.utils.image_dataset_from_directory(
        "PlantVillage",
        shuffle=True,
        image_size=(256, 256),
        batch_size=32,
    )
Once loaded, I split the data into training, validation, and test sets using a custom partitioning function. This was critical to ensure the model could generalize well.
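
The partitioning function itself isn't shown here. As a minimal, framework-free sketch of the idea, assuming an 80/10/10 split (the ratios, the seed, and the name `partition` are illustrative assumptions, not the project's actual values), it could look like this:

```python
import random


def partition(items, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle items and split them into train, validation, and test lists.

    The fractions are assumed values; whatever is left after the train and
    validation shares goes to the test set.
    """
    if not 0 < train_frac + val_frac < 1:
        raise ValueError("train_frac + val_frac must leave room for a test set")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Toy input: 100 batch indices standing in for real batches.
train, val, test = partition(range(100))
print(len(train), len(val), len(test))  # 80 10 10
```

With a batched `tf.data.Dataset`, the same counts would typically be applied with `take` and `skip` rather than list slicing.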

Data Augmentation & Preprocessing

To combat overfitting, I applied data augmentation techniques such as flipping, rotation, zooming, contrast adjustments, and random translations. I also normalized the images by scaling pixel values to the range 0 to 1. Normalization significantly boosted training stability.

    from tensorflow.keras import layers

    # Keras applies these random layers only in training mode.
    data_augmentation = tf.keras.Sequential([
        layers.RandomFlip("horizontal_and_vertical"),
        layers.RandomRotation(0.2),
        layers.RandomZoom(0.2),
        layers.RandomContrast(0.2),
        layers.RandomTranslation(0.2, 0.2),
    ])

The Model Architecture

I used a simple but effective Convolutional Neural Network (CNN) built with Keras’ Sequential API. The architecture included several convolutional layers with ReLU activation, max-pooling, dropout to reduce overfitting and a final softmax layer to classify into three categories.
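
To get a feel for why the number of layers matters, it helps to trace how a 256x256 input shrinks as it passes through convolution and pooling. The sketch below is plain arithmetic, not the model itself, and it assumes four blocks of a 3x3 unpadded convolution followed by 2x2 max-pooling; the block count and kernel sizes are assumptions for illustration only.

```python
def conv_out(size, kernel=3, stride=1, padding=0):
    """Output size along one axis for a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(input_size=256, n_blocks=4, conv_kernel=3, pool=2):
    """Return the feature-map width after each conv + max-pool block."""
    sizes = []
    size = input_size
    for _ in range(n_blocks):
        size = conv_out(size, kernel=conv_kernel)        # unpadded 3x3 convolution
        size = conv_out(size, kernel=pool, stride=pool)  # 2x2 max-pooling
        sizes.append(size)
    return sizes


print(trace_shapes())  # [127, 62, 30, 14]
```

Each extra block roughly halves the spatial size, so depth directly controls how much the dense layers at the end have to process.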
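One challenge I encountered here was tuning the number of filters and layers. At first, the model either underfit or overfit badly. It took a few iterations, plus the introduction of Dropout and EarlyStopping, to stabilize training.

    from tensorflow.keras.callbacks import EarlyStopping

    # Stop after 5 epochs without val_loss improvement and restore the best weights.
    early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)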
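
To make the `patience` setting concrete, here is a simplified sketch of the stopping rule. It is not Keras's actual implementation, the function name `early_stop_epoch` is hypothetical, and the loss values are made up purely for illustration.

```python
def early_stop_epoch(val_losses, patience=5):
    """Return (stop_epoch, best_epoch), both 0-indexed.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs; best_epoch is where the weights would be
    restored from when restore_best_weights=True.
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Never ran out of patience: training ends at the last epoch.
    return len(val_losses) - 1, best_epoch


# Made-up validation losses: the best value is at epoch 3.
toy_losses = [0.90, 0.70, 0.60, 0.55, 0.56, 0.58, 0.57, 0.60, 0.61, 0.63]
print(early_stop_epoch(toy_losses))  # (8, 3)
```

In this toy run, training stops at epoch 8, five epochs after the best one, and the weights from epoch 3 are the ones kept.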

Training & Evaluation

I trained for up to 50 epochs, with early stopping halting training once validation performance plateaued. I monitored accuracy and loss on both the training and validation sets.
The final evaluation on the test set yielded impressive results, with accuracy comfortably above 90%. Visualizing the confusion matrix confirmed that most misclassifications were between early and late blight, which is expected due to their visual similarity.

Challenges I Faced

Like any worthwhile journey, this project had its fair share of obstacles:

1. Data Imbalance

Initially, I noticed that the classes were not equally represented: in the PlantVillage potato subset, the healthy class has far fewer images than the two blight classes. This imbalance skewed model predictions. I overcame this by applying data augmentation more aggressively to the underrepresented class and ensuring balanced batch generation.
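
One way to picture balanced batch generation is to draw the same number of examples from every class for each batch, sampling with replacement so that small classes are simply repeated. The sketch below is an assumed illustration of that idea, not the project's actual code; the function name `balanced_batches` and the class counts are made up.

```python
import random
from collections import Counter, defaultdict


def balanced_batches(labels, per_class=4, n_batches=3, seed=0):
    """Yield lists of sample indices with `per_class` examples of every class.

    Indices are drawn with replacement, so examples from small classes
    appear more than once.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    for _ in range(n_batches):
        batch = []
        for label in sorted(by_class):
            batch.extend(rng.choices(by_class[label], k=per_class))
        rng.shuffle(batch)  # avoid grouping a batch by class
        yield batch


# Made-up class ids: two classes with 10 samples each, one with only 2.
toy_labels = [0] * 10 + [1] * 10 + [2] * 2
for batch in balanced_batches(toy_labels):
    print(sorted(Counter(toy_labels[i] for i in batch).items()))
    # [(0, 4), (1, 4), (2, 4)]
```

Because the repeated minority images would otherwise be identical, this kind of oversampling works best when paired with random augmentation.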

2. Memory Limitations

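Working with 256x256 images on a standard machine sometimes caused memory issues during training. I addressed this by tuning the input pipeline: caching and prefetching the datasets with AUTOTUNE improved performance without requiring a GPU upgrade. Note that cache() with no arguments holds the dataset in memory, so it suits data that fits in RAM; passing a file path to cache() stores it on disk instead.

    # cache() keeps loaded images after the first epoch;
    # prefetch() prepares upcoming batches while the current one trains.
    train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=tf.data.AUTOTUNE)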

3. Overfitting

Despite a good dataset, the model started overfitting after a few epochs. Dropout layers and data augmentation helped, but the real breakthrough came when I implemented early stopping, which halted training once validation loss stopped improving and restored the best weights.

4. Evaluation Bias

My initial evaluation method didn’t give the full picture. Adding visualizations such as confusion matrices and sample predictions helped me interpret where the model struggled, especially between early and late blight.
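
For readers new to confusion matrices, here is a minimal sketch of how one is built from true and predicted labels. The labels are toy values invented for illustration, not results from this project, and in practice a library routine such as `tf.math.confusion_matrix` would normally do this work.

```python
def confusion_matrix(y_true, y_pred, class_names):
    """Count predictions per class: rows are true labels, columns are predictions."""
    index = {name: i for i, name in enumerate(class_names)}
    matrix = [[0] * len(class_names) for _ in class_names]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


classes = ["early_blight", "late_blight", "healthy"]
# Made-up labels: the two blight classes are occasionally confused.
y_true = ["early_blight", "early_blight", "early_blight",
          "late_blight", "late_blight", "late_blight",
          "healthy", "healthy"]
y_pred = ["early_blight", "late_blight", "early_blight",
          "late_blight", "early_blight", "late_blight",
          "healthy", "healthy"]

for name, row in zip(classes, confusion_matrix(y_true, y_pred, classes)):
    print(f"{name:>12}: {row}")
# early_blight: [2, 1, 0]
#  late_blight: [1, 2, 0]
#      healthy: [0, 0, 2]
```

Off-diagonal counts show which pairs of classes get mixed up, which a single accuracy number hides.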

Takeaways

This project taught me a lot about building robust computer vision pipelines:

Domain-specific preprocessing (like proper augmentation) is a game-changer.
Model evaluation is more than just accuracy—it’s about understanding behavior.
Simplicity wins: A moderately deep CNN, well-regularized and properly tuned, can outperform overly complex architectures.

What's Next?

I’m now working on deploying this model, enabling farmers to take a picture of a leaf and instantly receive a diagnosis. I’m also exploring transfer learning with models like MobileNetV2 for faster inference on edge devices.
