<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jatin Sisodia</title>
    <description>The latest articles on DEV Community by Jatin Sisodia (@sisodiajatin).</description>
    <link>https://dev.to/sisodiajatin</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1067249%2F9059bc33-0e11-483c-9b16-aed0918b2fce.jpg</url>
      <title>DEV Community: Jatin Sisodia</title>
      <link>https://dev.to/sisodiajatin</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sisodiajatin"/>
    <language>en</language>
    <item>
      <title>Battle of the CNNs: ResNet vs. MobileNet vs. EfficientNet for Fruit Disease Detection</title>
      <dc:creator>Jatin Sisodia</dc:creator>
      <pubDate>Wed, 14 Jan 2026 07:48:39 +0000</pubDate>
      <link>https://dev.to/sisodiajatin/battle-of-the-cnns-resnet-vs-mobilenet-vs-efficientnet-for-fruit-disease-detection-3ffi</link>
      <guid>https://dev.to/sisodiajatin/battle-of-the-cnns-resnet-vs-mobilenet-vs-efficientnet-for-fruit-disease-detection-3ffi</guid>
      <description>&lt;p&gt;So here's the thing: I've always been fascinated by how deep learning can solve real-world problems, and fruit disease detection seemed like the perfect challenge. Not too simple, not impossibly complex, and actually useful for farmers dealing with crop losses.&lt;/p&gt;

&lt;p&gt;I ended up building FruitScan-AI and testing three different neural network architectures to see which one actually works best. Spoiler: they each have their strengths, and the "best" one totally depends on what you're building.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Even Bother With Fruit Disease Detection?&lt;/strong&gt;&lt;br&gt;
Look, I know what you're thinking: "why fruits?" But hear me out. Farmers lose something like 20 to 40% of their crops every year to diseases and pests. That's HUGE. And the traditional way of checking? Walking through fields, manually inspecting every plant, hoping you catch problems early. It's slow, inconsistent, and requires expertise that not everyone has access to.&lt;/p&gt;

&lt;p&gt;So I thought: what if we could just snap a photo and get an instant diagnosis? That's where FruitScan-AI comes in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I Built&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://github.com/sisodiajatin/FruitScan-AI" rel="noopener noreferrer"&gt;FruitScan-AI&lt;/a&gt; is basically a deep learning system that looks at fruit images and tells you two things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What kind of fruit it is&lt;/li&gt;
&lt;li&gt;Whether it's healthy or diseased&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But here's where it gets interesting... I didn't just build one model. I built three different versions using EfficientNet, MobileNetV2, and ResNet50 to compare them side by side.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Dataset&lt;/strong&gt;&lt;br&gt;
I worked with images of over 15 different fruits: apples, bananas, grapes, mangoes, tomatoes, peppers... you name it. Each category has both healthy specimens and diseased ones (bacterial spots, fungal infections, rot, all the nasty stuff).&lt;/p&gt;

&lt;p&gt;The images are high resolution enough to catch the subtle details that matter for accurate classification.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Three Architectures (And Why I Picked Them)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;EfficientNet: The Balanced One&lt;/strong&gt;&lt;/em&gt;&lt;br&gt;
EfficientNet is like that friend who's good at everything without trying too hard. It scales network depth, width, and resolution together using this "compound scaling" approach. Translation? You get great accuracy without your model becoming a computational monster.&lt;/p&gt;

&lt;p&gt;I went with EfficientNet because it lives up to its name: strong accuracy on image classification without a heavyweight compute budget.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;MobileNetV2: The Lightweight Champion&lt;/strong&gt;&lt;/em&gt;&lt;br&gt;
This one's designed for mobile devices. It uses these clever "depthwise separable convolutions" that basically do more with less. Perfect if you want to deploy your model on a phone or a Raspberry Pi in the middle of a farm.&lt;/p&gt;
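
&lt;p&gt;For intuition, here's that building block as a standalone Keras layer (a sketch for illustration, not code from the repo):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from tensorflow.keras.layers import SeparableConv2D

# A depthwise pass (one filter per input channel) followed by a 1x1
# pointwise pass that mixes channels: far fewer multiplies than a
# standard Conv2D producing the same output shape
layer = SeparableConv2D(filters=64, kernel_size=3, padding='same', activation='relu')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;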

&lt;p&gt;If I were building an app for farmers in the field, MobileNetV2 would be my go-to.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;ResNet50: The Heavyweight&lt;/strong&gt;&lt;/em&gt;&lt;br&gt;
ResNet50 is the veteran here. It introduced "skip connections" that let you train really deep networks without everything exploding (vanishing gradients are fun like that). It's deeper, it's powerful, and it can learn some seriously complex patterns.&lt;/p&gt;

&lt;p&gt;I included ResNet50 because it's battle tested and gives you a solid baseline for comparison.&lt;/p&gt;
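
&lt;p&gt;Here's a minimal sketch of the idea behind a residual block (illustrative, not the repo's code; it assumes the input already has the same number of channels as "filters"):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from tensorflow.keras import layers

def residual_block(x, filters):
    shortcut = x  # the skip connection: a shortcut path for gradients
    y = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    y = layers.Conv2D(filters, 3, padding='same')(y)
    y = layers.Add()([shortcut, y])  # add the input back to the output
    return layers.Activation('relu')(y)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;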

&lt;p&gt;&lt;strong&gt;How It Actually Works&lt;/strong&gt;&lt;br&gt;
All three models use transfer learning... basically, I'm not starting from scratch. These models were pretrained on ImageNet (millions of images), so they already know what edges, textures, and shapes look like.&lt;/p&gt;

&lt;p&gt;Here's the general approach:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# I freeze the pretrained layers at first
base_model = EfficientNetB0(weights='imagenet', include_top=False)
base_model.trainable = False

# Then add my own classification layers on top
model = Sequential([
    base_model,
    GlobalAveragePooling2D(),
    Dense(256, activation='relu'),
    Dropout(0.5),
    Dense(num_classes, activation='softmax')
])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why transfer learning? Three reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Training is way faster &lt;/li&gt;
&lt;li&gt;You need less data &lt;/li&gt;
&lt;li&gt;Better results &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Training Pipeline&lt;/strong&gt;&lt;br&gt;
Nothing fancy here, just good practices (there's a sketch of the setup right after this list):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Resize images to what each model expects (224x224 for most)&lt;/li&gt;
&lt;li&gt;Normalize using ImageNet statistics&lt;/li&gt;
&lt;li&gt;Add data augmentation (flips, rotations, brightness tweaks) so the model doesn't just memorize training data&lt;/li&gt;
&lt;li&gt;Train in batches to keep my GPU from crying&lt;/li&gt;
&lt;/ul&gt;
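
&lt;p&gt;Here's roughly what that setup looks like with Keras' ImageDataGenerator (the directory layout and parameter values are illustrative, not lifted from the repo):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from tensorflow.keras.applications.efficientnet import preprocess_input
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator(
    preprocessing_function=preprocess_input,  # model-specific normalization
    horizontal_flip=True,
    rotation_range=20,
    brightness_range=(0.8, 1.2),
).flow_from_directory(
    'data/train',             # one subfolder per class
    target_size=(224, 224),   # what most of these models expect
    batch_size=32,
    class_mode='categorical',
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;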

&lt;p&gt;I tracked the usual suspects: accuracy, precision, recall, F1 score, and confusion matrices to see where each model struggled.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Getting Started (If You Want to Try This)&lt;br&gt;
What You Need&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install tensorflow keras numpy pandas matplotlib scikit-learn
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Grab the code&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/sisodiajatin/FruitScan-AI.git
cd FruitScan-AI
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The repo's organized by architecture:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FruitScan-AI/
├── EfficiencyNet/    # EfficientNet notebooks
├── MobileNetV2/      # MobileNetV2 experiments  
├── ResNet50/         # ResNet50 implementation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Running it&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd EfficiencyNet  # or whichever model you want
jupyter notebook
# Open the training notebook and go through it cell by cell
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Making Predictions&lt;/strong&gt;&lt;br&gt;
Once you've got a trained model, using it is straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Load your model
model = load_model('fruit_disease_model.h5')

# Prep your image
img = load_img('suspicious_apple.jpg', target_size=(224, 224))
img_array = img_to_array(img)
img_array = np.expand_dims(img_array, axis=0)
img_array = preprocess_input(img_array)

# Get prediction
predictions = model.predict(img_array)
class_idx = np.argmax(predictions[0])

print(f"This looks like: {class_labels[class_idx]}")
print(f"Confidence: {predictions[0][class_idx]*100:.2f}%")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's a quick Flask wrapper if you want to turn this into an API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from flask import Flask, request, jsonify

app = Flask(__name__)
model = tf.keras.models.load_model('best_model.h5')

@app.route('/predict', methods=['POST'])
def predict():
    file = request.files['image']
    img = preprocess_image(file)
    prediction = model.predict(img)

    return jsonify({
        'fruit': get_fruit_name(prediction),
        'status': 'healthy' if is_healthy(prediction) else 'diseased',
        'confidence': float(np.max(prediction))
    })
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;So... Which Model Won?&lt;/strong&gt;&lt;br&gt;
Honestly? It depends what you're optimizing for.&lt;br&gt;
From what I've seen:&lt;/p&gt;

&lt;p&gt;EfficientNet hits around 92 to 95% accuracy with decent speed (about 30 to 50ms per image). A good all-rounder.&lt;br&gt;
MobileNetV2 gets 88 to 91% accuracy but is FAST (10 to 20ms) and tiny. Perfect for mobile apps.&lt;br&gt;
ResNet50 lands at 90 to 93% accuracy but is slower (50 to 80ms) and bigger. Great for research or when accuracy is everything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Try It Out!&lt;/strong&gt;&lt;br&gt;
If this sounds interesting:&lt;br&gt;
⭐ Star the repo - &lt;a href="https://github.com/sisodiajatin/FruitScan-AI" rel="noopener noreferrer"&gt;https://github.com/sisodiajatin/FruitScan-AI&lt;/a&gt;&lt;/p&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>python</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>😲 Your Face is the Playlist: Building an Emotion-Aware Android App</title>
      <dc:creator>Jatin Sisodia</dc:creator>
      <pubDate>Sun, 11 Jan 2026 08:53:10 +0000</pubDate>
      <link>https://dev.to/sisodiajatin/your-face-is-the-playlist-building-an-emotion-aware-android-app-4e6o</link>
      <guid>https://dev.to/sisodiajatin/your-face-is-the-playlist-building-an-emotion-aware-android-app-4e6o</guid>
      <description>&lt;p&gt;&lt;strong&gt;The Problem: Static Playlists in a Dynamic World&lt;/strong&gt;&lt;br&gt;
Music is deeply personal, but our music players are surprisingly impersonal. We’ve all been there: You’re having a rough day, but your "Daily Mix" decides it’s the perfect time for high-energy dance pop. Or you're in the zone working, and a jarring ballad breaks your flow.&lt;/p&gt;

&lt;p&gt;We curate playlists for specific moods, but scrolling through them takes effort. What if your phone could just look at you, understand how you’re feeling, and play the perfect track automatically?&lt;/p&gt;

&lt;p&gt;In this post, we’re diving into the &lt;strong&gt;EmotionToMusic-App&lt;/strong&gt;, an open-source Android project that bridges the gap between computer vision and music recommendation. We’ll explore how to build a pipeline that goes from &lt;strong&gt;Face -&amp;gt; Emotion -&amp;gt; Music&lt;/strong&gt; in real time, all on-device.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Technical Overview&lt;/strong&gt;&lt;br&gt;
This application is built natively in Kotlin and follows modern Android development practices. The core philosophy is on-device inference. By running the Machine Learning (ML) models locally, we ensure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Privacy:&lt;/strong&gt; No images of the user are ever sent to a cloud server.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency:&lt;/strong&gt; Music reaction is near-instantaneous.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Offline Capability:&lt;/strong&gt; It works without an internet connection.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Tech Stack&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Language:&lt;/strong&gt; Kotlin&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Camera:&lt;/strong&gt; CameraX (Jetpack library for easy camera lifecycle management)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ML Engine:&lt;/strong&gt; TensorFlow Lite (for emotion classification)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Face Detection:&lt;/strong&gt; Google ML Kit (to locate the face before analysis)&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;How It Works:&lt;/strong&gt;&lt;/em&gt;&lt;br&gt;
The application functions as a continuous loop. It doesn't just take a single photo; it analyzes the video stream frame-by-frame. Here is the high-level flow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Capture:&lt;/strong&gt; CameraX intercepts a frame from the live preview.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Detection:&lt;/strong&gt; ML Kit scans the full frame to find a face.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Preprocessing:&lt;/strong&gt; The face is cropped, converted to grayscale, and resized (usually to 48x48 pixels) to match the model's input requirements.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inference:&lt;/strong&gt; The processed image is fed into a TFLite model (often a CNN trained on the FER2013 dataset).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output:&lt;/strong&gt; The model returns a probability array (e.g., [Happy: 0.8, Sad: 0.1, Neutral: 0.1]).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Action:&lt;/strong&gt; The app maps the highest probability emotion to a specific genre and triggers the Media Player.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Implementation&lt;/strong&gt;&lt;br&gt;
Let's look at the code that powers this "AI DJ."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. The Analyzer (The "Eye")&lt;/strong&gt;&lt;br&gt;
The heart of the app is the &lt;strong&gt;ImageAnalysis.Analyzer&lt;/strong&gt;. This runs on a background thread and processes frames. Note how we use ML Kit first to find the face, ensuring we don't feed background noise to our emotion model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class EmotionAnalyzer(private val listener: EmotionListener) : ImageAnalysis.Analyzer {

    @androidx.annotation.OptIn(androidx.camera.core.ExperimentalGetImage::class)
    override fun analyze(imageProxy: ImageProxy) {
        val mediaImage = imageProxy.image
        if (mediaImage != null) {
            val inputImage = InputImage.fromMediaImage(mediaImage, imageProxy.imageInfo.rotationDegrees)

            // Step 1: Detect Face
            FaceDetection.getClient().process(inputImage)
                .addOnSuccessListener { faces -&amp;gt;
                    if (faces.isNotEmpty()) {
                        // Step 2: Crop Face &amp;amp; Run Inference
                        val emotion = recognizeEmotion(faces[0], mediaImage)
                        listener.onEmotionDetected(emotion)
                    }
                }
                .addOnCompleteListener {
                    imageProxy.close() // Important: Release frame!
                }
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. The Model (The "Brain")&lt;/strong&gt;&lt;br&gt;
Once we have the face, we interpret it. The &lt;strong&gt;TensorFlow Lite&lt;/strong&gt; interpreter takes the preprocessed byte buffer of the image and outputs the probabilities.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fun recognizeEmotion(face: Face, image: Image): String {
    // 1. Convert YUV image to Bitmap and crop to face bounding box
    val faceBitmap = BitmapUtils.cropToFace(image, face.boundingBox)

    // 2. Resize to model input size (e.g., 48x48) &amp;amp; Grayscale
    val processedBitmap = BitmapUtils.preprocess(faceBitmap)

    // 3. Run Inference
    val output = Array(1) { FloatArray(7) } // 7 emotions (Happy, Sad, Angry, etc.)
    tfliteInterpreter.run(processedBitmap, output)

    // 4. Get the index of the highest confidence
    val maxIndex = output[0].indices.maxByOrNull { output[0][it] } ?: 0

    return emotionLabels[maxIndex] // e.g., "Happy"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. The Mapper (The "DJ")&lt;/strong&gt;&lt;br&gt;
Finally, a simple controller maps the result to a playlist.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fun playMusicForEmotion(emotion: String) {
    val playlist = when(emotion) {
        "Happy" -&amp;gt; R.raw.upbeat_pop
        "Sad"   -&amp;gt; R.raw.melancholy_piano
        "Angry" -&amp;gt; R.raw.heavy_rock
        else    -&amp;gt; R.raw.chill_lofi
    }
    mediaPlayer.play(playlist)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Design Decisions &amp;amp; Challenges&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Lighting is the Enemy&lt;/strong&gt;&lt;br&gt;
Computer vision models struggle in low light. During testing, shadows across the face were often misclassified as "Angry" or "Sad."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;: We implemented a check for average luminosity. If the frame is too dark, the app pauses detection and prompts the user to move to the light.&lt;/p&gt;
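
&lt;p&gt;A rough sketch of that gate (the function name and threshold are illustrative, not the app's exact code): average the Y (luma) plane of the YUV frame and skip inference when it's too dark.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fun isTooDark(image: Image, threshold: Int = 60): Boolean {
    // Plane 0 of a YUV_420_888 frame is luminance
    val y = image.planes[0].buffer.duplicate().apply { rewind() }
    val count = y.remaining()
    var sum = 0L
    while (y.hasRemaining()) sum += (y.get().toInt() and 0xFF)
    return count &amp;gt; 0 &amp;amp;&amp;amp; sum / count &amp;lt; threshold
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;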

&lt;p&gt;&lt;strong&gt;2. The Jitter Problem&lt;/strong&gt;&lt;br&gt;
Real-time inference is fast. Your face might register as "Happy" for 10 frames, "Neutral" for 1 frame, and "Happy" again. If we switched the song every time the emotion flickered, the experience would be terrible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;: Smoothing. We use a buffer that stores the last 10 detected emotions and only changes the music if the dominant emotion changes for a sustained period (e.g., 2 seconds).&lt;/p&gt;
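
&lt;p&gt;A minimal sketch of that buffer, wired into the listener callback from the analyzer above (names like recent and currentEmotion are illustrative, not from the repo):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;private val recent = ArrayDeque&amp;lt;String&amp;gt;()
private var currentEmotion: String? = null

fun onEmotionDetected(emotion: String) {
    recent.addLast(emotion)
    if (recent.size &amp;gt; 10) recent.removeFirst()

    // The most frequent emotion across the last 10 frames
    val dominant = recent.groupingBy { it }.eachCount().maxByOrNull { it.value }?.key

    // Only switch music when the window is strongly dominated by a new emotion
    if (dominant != null &amp;amp;&amp;amp; dominant != currentEmotion &amp;amp;&amp;amp; recent.count { it == dominant } &amp;gt;= 8) {
        currentEmotion = dominant
        playMusicForEmotion(dominant)
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;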

&lt;p&gt;&lt;strong&gt;Final Thoughts&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The EmotionToMusic-App is a great example of how accessible AI has become. You don't need to be a data scientist to build "smart" apps; you just need to know how to wire the components together.&lt;/p&gt;

&lt;p&gt;GitHub Repository: &lt;a href="https://github.com/sisodiajatin/EmotionToMusic-App" rel="noopener noreferrer"&gt;https://github.com/sisodiajatin/EmotionToMusic-App&lt;/a&gt;&lt;/p&gt;

</description>
      <category>programming</category>
      <category>android</category>
      <category>machinelearning</category>
      <category>coding</category>
    </item>
    <item>
      <title>Inside a Scholarly Search Engine: Indexing, Ranking, and Retrieval</title>
      <dc:creator>Jatin Sisodia</dc:creator>
      <pubDate>Sun, 11 Jan 2026 07:35:54 +0000</pubDate>
      <link>https://dev.to/sisodiajatin/inside-a-scholarly-search-engine-indexing-ranking-and-retrieval-pea</link>
      <guid>https://dev.to/sisodiajatin/inside-a-scholarly-search-engine-indexing-ranking-and-retrieval-pea</guid>
      <description>&lt;p&gt;Repository: &lt;a href="https://github.com/sisodiajatin/CS547-IR-Scholarly-Search" rel="noopener noreferrer"&gt;https://github.com/sisodiajatin/CS547-IR-Scholarly-Search&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let’s be real for a second: academic search is broken.&lt;/p&gt;

&lt;p&gt;If you have ever tried to find a specific paper on a generic search engine, you know the pain. You type "neural networks," and you get a mix of Medium articles, YouTube tutorials, and maybe, if you are lucky, the actual PDF you were looking for on page 3.&lt;/p&gt;

&lt;p&gt;I ran into this exact wall recently. I realized that building a search engine is not just about matching strings; it is about understanding intent. So, instead of complaining about it, I decided to build one.&lt;/p&gt;

&lt;p&gt;This is the story of Scholarly Search, a project where I stopped relying on external search services and built a custom Information Retrieval (IR) system from the ground up using Python and Flask.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Are We Actually Building?&lt;/strong&gt;&lt;br&gt;
At its core, this project is a specialized search engine for academic papers. The goal was not just to "find text" but to rank it intelligently. If a user searches for "machine learning," a paper with that phrase in the title should rank higher than one that mentions it once in the footnotes.&lt;/p&gt;

&lt;p&gt;To make this happen, I had to move away from simple database queries and embrace the Inverted Index, the data structure that powers basically every search engine on the planet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Stack:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Core:&lt;/strong&gt; Python (handling all logic and data structures).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Web Framework:&lt;/strong&gt; Flask (serving both the API and the UI).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Frontend:&lt;/strong&gt; HTML, CSS &amp;amp; Vanilla JavaScript (keeping it lightweight and monolithic).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Secret Sauce:&lt;/strong&gt; A custom-built Inverted Index and BM25 Ranking algorithm.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The "Aha!" Moment: Why Simple Counts Do Not Work&lt;/strong&gt;&lt;br&gt;
When I first started, I thought, "Easy. I will just count how many times the word appears."&lt;/p&gt;

&lt;p&gt;I was wrong.&lt;/p&gt;

&lt;p&gt;If you search for "the analysis," the word "the" appears in almost every document. If you rank by pure frequency, your results will be dominated by papers that just happen to be wordy, not relevant.&lt;/p&gt;

&lt;p&gt;Enter &lt;strong&gt;BM25&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;BM25 is the industry standard for a reason. It does two smart things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;It penalizes common words&lt;/strong&gt;. (Inverse Document Frequency)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It penalizes long documents&lt;/strong&gt;. (Length Normalization)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here is the actual Python code used to calculate the score. It looks a bit math-heavy, but it is really just balancing term frequency against document length:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def score_bm25(n, f, qf, r, N, dl, avdl):
    # K is a scaling factor based on doc length (dl) vs average (avdl)
    K = k1 * ((1 - b) + b * (dl / avdl))

    # This part calculates relevance
    first = math.log(((r + 0.5) / (R - r + 0.5)) / ((n - r + 0.5) / (N - n - R + r + 0.5)))
    second = ((k1 + 1) * f) / (K + f)
    third = ((k2 + 1) * qf) / (k2 + qf)

    return first * second * third
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Indexing: The Heavy Lifting&lt;/strong&gt;&lt;br&gt;
The biggest challenge was speed. You can't scan 50,000 documents every time someone hits "Enter."&lt;/p&gt;

&lt;p&gt;The solution is an Inverted Index. Think of it like the index at the back of a textbook. Instead of reading the book to find "Algorithms," you look up "Algorithms" and see a list of page numbers.&lt;/p&gt;

&lt;p&gt;I wrote a script that pre-processes the raw data (stripping out punctuation, lowercasing everything) and builds this map in memory.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Simplified view of the indexing process
inverted_index = defaultdict(list)

for doc_id, text in corpus.items():
    tokens = preprocess(text) # Clean the text
    for term in tokens:
        # Map the term back to the document ID
        inverted_index[term].append(doc_id)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Trade-off Alert&lt;/strong&gt;: I chose to keep this index in memory (RAM).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pro&lt;/strong&gt;: It’s blazing fast. Sub-millisecond lookup times.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Con&lt;/strong&gt;: It eats RAM. For a dataset this size (&amp;lt;100k docs), it's fine. For anything larger, you'd want to dump this to disk.&lt;/li&gt;
&lt;/ul&gt;
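
&lt;p&gt;To tie the two pieces together, here is roughly what query time looks like (term_freqs, doc_lens, N, and avdl are assumed bookkeeping structures, not names from the repo):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def search(query, top_k=10):
    scores = defaultdict(float)
    terms = preprocess(query)
    for term in set(terms):
        postings = inverted_index.get(term, [])
        n = len(postings)          # docs containing the term
        qf = terms.count(term)     # term frequency in the query
        for doc_id in postings:
            f = term_freqs[doc_id][term]  # term frequency in the doc
            scores[doc_id] += score_bm25(n, f, qf, 0, N, doc_lens[doc_id], avdl)
    # Highest BM25 score first
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;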

&lt;p&gt;&lt;strong&gt;The Frontend: Simple &amp;amp; Effective&lt;/strong&gt;&lt;br&gt;
Because this project focuses on the backend IR logic, I kept the frontend architecture simple.&lt;/p&gt;

&lt;p&gt;Instead of over-engineering with a complex framework like React or Vue, I built the interface using &lt;strong&gt;standard HTML, CSS, and Vanilla JavaScript&lt;/strong&gt;. This keeps the application lightweight and ensures that the "search" functionality remains the star of the show.&lt;/p&gt;

&lt;p&gt;The UI logic is handled by a simple script that fetches results from the backend API asynchronously:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// A simple fetch function to query the Flask API
function search(query) {
    fetch(`/search?q=${query}`)
        .then(response =&amp;gt; response.json())
        .then(data =&amp;gt; {
            const resultsDiv = document.getElementById('results');
            resultsDiv.innerHTML = ''; // Clear old results

            data.forEach(paper =&amp;gt; {
                // Dynamically create HTML for each result
                let item = `
                    &amp;lt;div class="paper"&amp;gt;
                        &amp;lt;h3&amp;gt;${paper.title}&amp;lt;/h3&amp;gt;
                        &amp;lt;p&amp;gt;${paper.abstract}&amp;lt;/p&amp;gt;
                    &amp;lt;/div&amp;gt;
                `;
                resultsDiv.innerHTML += item;
            });
        });
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Try It Yourself&lt;/strong&gt;&lt;br&gt;
If you want to poke around the code or run it locally, I have open sourced the whole thing.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>beginners</category>
      <category>python</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
