Serge Mbela

Posted on May 8, 2025 • Edited on Jul 24, 2025

Machine learning: Malaria: Classification of Parasitized and Non-parasitized Cell Images

Blood Cell Types: What Are We Looking At?
To train any model for medical image classification, understanding the visual features of each cell type is critical. Below are the primary blood cell types you might encounter in peripheral blood smears used for malaria detection.

1. Basophils

Main Function: Allergic reactions, release of histamine
Involvement in Malaria: Not directly involved. Rarely seen or affected in malaria.
Microscopic Appearance: Small with dark purple granules that obscure the nucleus

2. Erythroblasts

Main Function: Precursors of red blood cells (in bone marrow)
Involvement in Malaria: Can increase during severe anemia as the bone marrow tries to compensate for RBC loss.
Microscopic Appearance: Larger than RBCs, have a central nucleus, bluish cytoplasm

3. Monocytes

Main Function: Phagocytosis; precursors of macrophages
Involvement in Malaria: Active in malaria. They engulf infected RBCs and parasite debris—part of the immune response.
Microscopic Appearance: Large cells with a kidney-shaped nucleus and grayish-blue cytoplasm

4. Myeloblasts

Main Function: Immature precursors of granulocytes
Involvement in Malaria: Not usually seen unless there’s bone marrow stress or a hematological disorder.
Microscopic Appearance: Very large, immature cell; round nucleus, visible nucleoli

5. Segmented Neutrophils

Main Function: First-line defense; phagocytosis of pathogens
Involvement in Malaria: Involved in inflammation. Counts may be elevated or decreased depending on disease stage and severity.
Microscopic Appearance: Multi-lobed nucleus (3–5 lobes), fine cytoplasmic granules

[Figure: example cell images for the two classes, Parasitized and Uninfected]

Preprocessing Medical Images for Deep Learning

Working with medical images comes with unique challenges — inconsistent formats, varying dimensions, and sometimes missing or degraded data. Here’s how to prepare your data properly.

1. Handle Missing Data
MCAR (Missing Completely at Random): Use simple imputation (mean, median) or remove rows.

MAR (Missing at Random): Use advanced techniques like regression or iterative imputation (IterativeImputer in scikit-learn).

NMAR (Not Missing at Random): Much harder — might require collecting more data or sensitivity analysis.
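As a rough sketch of the first two cases, the snippet below fills gaps in a small numeric table with scikit-learn. The table is invented for illustration (it is not taken from the malaria dataset), and `random_state` and `max_iter` are arbitrary choices.

```python
import numpy as np
# IterativeImputer is still experimental in scikit-learn and needs this import first.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, SimpleImputer

# Hypothetical metadata table: two correlated numeric columns with gaps.
X = np.array([
    [1.0, 2.0],
    [2.0, np.nan],
    [3.0, 6.0],
    [np.nan, 8.0],
    [5.0, 10.0],
])

# MCAR: replace each missing value with the median of its own column.
median_filled = SimpleImputer(strategy="median").fit_transform(X)

# MAR: estimate each missing value from the other columns, refining over several rounds.
iterative_filled = IterativeImputer(random_state=0, max_iter=10).fit_transform(X)

print(median_filled)
print(iterative_filled)
```

The median version ignores the relationship between the columns, while the iterative version uses it, which is why it tends to suit MAR data better.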

Tip: If you’re using JPG or PNG instead of DICOM (which retains crucial metadata), be aware that some diagnostic fidelity might be lost.

2. Manage Outliers
Once outliers are identified, you need to decide how to handle them. The approach depends on the nature of the outlier and the goal of your analysis. In fraud detection, for example, you must keep outliers because they are the signal you are looking for. If an outlier is an error, you can delete it or replace it with a more representative value, such as the mean, median, mode, or a value predicted by a machine learning model.
Median imputation is often preferred as it's less sensitive to outliers than the mean.
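A minimal sketch of the replacement strategy follows. The post does not say how outliers are detected, so the 1.5 x IQR rule used here is an assumption (it is the same rule that defines box-plot whiskers), and the `cell_areas` values are made up.

```python
import numpy as np

def replace_outliers_with_median(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] and replace them with the inlier median."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    is_outlier = (values < lower) | (values > upper)
    cleaned = values.copy()
    # The median is computed on the inliers only, so the outliers cannot distort it.
    cleaned[is_outlier] = np.median(values[~is_outlier])
    return cleaned, is_outlier

# Hypothetical measurements with one obvious error.
cell_areas = [10, 12, 11, 13, 12, 95]
cleaned, mask = replace_outliers_with_median(cell_areas)
print(mask)
print(cleaned)
```

For a task such as fraud detection you would keep the mask and skip the replacement step.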

3. Visualization Methods
Visualizing your data can often reveal outliers that might be missed by purely statistical methods.

Box Plots: Excellent for visualizing the distribution of a single variable and identifying potential outliers (points beyond the whiskers).

Histograms: Can show unusual peaks or tails in the data.

Scatter Plots: Useful for multivariate data to identify points that deviate from the general trend.
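The three views can be drawn side by side with matplotlib. This is only a sketch: the data is synthetic, with two outliers injected on purpose, and the function name `plot_outlier_views` is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np

def plot_outlier_views(values, x, y):
    """Draw a box plot and a histogram of `values`, plus a scatter plot of x against y."""
    fig, (ax_box, ax_hist, ax_scatter) = plt.subplots(1, 3, figsize=(12, 4))
    ax_box.boxplot(values)          # outliers appear as points beyond the whiskers
    ax_box.set_title("Box plot")
    ax_hist.hist(values, bins=20)   # outliers appear as isolated bars in the tail
    ax_hist.set_title("Histogram")
    ax_scatter.scatter(x, y, s=10)  # outliers sit away from the general trend
    ax_scatter.set_title("Scatter plot")
    fig.tight_layout()
    return fig

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
values = np.append(rng.normal(50, 5, 200), [95, 110])
x = rng.normal(0, 1, 200)
y = 2 * x + rng.normal(0, 0.3, 200)
fig = plot_outlier_views(values, x, y)
```

Call `fig.savefig("outlier_views.png")` to write the figure to disk.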

4. Start learning

Splitting into training and validation sets
train_test_split divides the data into training (80%) and validation (20%) sets, maintaining balanced class proportions (stratify=all_labels).
Displays the total number of images and the class distribution in each subset.
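The split described above might look like the following sketch. The paths and labels are placeholders standing in for the real lists, and `random_state=42` is an assumption added for reproducibility.

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for the real image paths and their labels.
all_paths = [f"cell_images/img_{i}.png" for i in range(100)]
all_labels = ["Parasitized"] * 50 + ["Uninfected"] * 50

# 80/20 split; stratify keeps the class ratio identical in both subsets.
X_train, X_val, y_train, y_val = train_test_split(
    all_paths, all_labels, test_size=0.2, stratify=all_labels, random_state=42
)

print(f"Total images: {len(all_paths)}")
print("Training:", Counter(y_train))
print("Validation:", Counter(y_val))
```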
Preparing image generators
Defines image dimensions (128x128 pixels) and batch size (32).
Creates two ImageDataGenerator objects:
train_datagen with transformations (data augmentation) such as rotation, shift, zoom, horizontal flip, and normalization (pixels scaled between 0 and 1).
val_datagen with only normalization (no augmentation).
Converts the X_train and y_train arrays into a Pandas DataFrame to use flow_from_dataframe, which reads images on the fly from paths and applies transformations.
train_generator loads images for training (with shuffle).
validation_generator loads images for validation (without shuffle).
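A sketch of the generator setup is shown below. The augmentation magnitudes and the column names `filename` and `label` are assumptions, since the post does not give them. Note that `ImageDataGenerator` is a legacy Keras API, so newer projects may prefer `tf.data` pipelines.

```python
import pandas as pd
from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMG_SIZE = (128, 128)
BATCH_SIZE = 32

def to_frame(paths, labels):
    """Put paths and labels in a DataFrame; class_mode='binary' expects string labels."""
    return pd.DataFrame({"filename": list(paths), "label": [str(v) for v in labels]})

def build_generators(train_df, val_df):
    # Training: augmentation plus scaling of pixel values to [0, 1].
    train_datagen = ImageDataGenerator(
        rescale=1.0 / 255,
        rotation_range=20,       # assumed value
        width_shift_range=0.1,   # assumed value
        height_shift_range=0.1,  # assumed value
        zoom_range=0.1,          # assumed value
        horizontal_flip=True,
    )
    # Validation: scaling only, so the metrics reflect unmodified images.
    val_datagen = ImageDataGenerator(rescale=1.0 / 255)

    common = dict(x_col="filename", y_col="label", target_size=IMG_SIZE,
                  batch_size=BATCH_SIZE, class_mode="binary")
    train_generator = train_datagen.flow_from_dataframe(train_df, shuffle=True, **common)
    validation_generator = val_datagen.flow_from_dataframe(val_df, shuffle=False, **common)
    return train_generator, validation_generator
```

With string labels, the generator assigns class indices in sorted order, so `Parasitized` maps to 0 and `Uninfected` to 1. Check `train_generator.class_indices` before interpreting the sigmoid output.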
Defining the CNN model
A simple sequential model with 3 convolution + max pooling blocks to extract important image features.
A flatten layer to convert outputs into a 1D vector.
A dense (fully connected) layer with 512 neurons + ReLU activation to learn complex combinations.
Dropout (0.5) to reduce overfitting.
Output dense layer with 1 neuron + sigmoid activation to produce a probability between 0 and 1, which is thresholded to obtain the binary class.
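The architecture above could be written in Keras roughly as follows. The filter counts (32, 64, 128), the 3x3 kernels, and the 3-channel RGB input are assumptions; the post only fixes the number of blocks, the 512-unit dense layer, the dropout rate, and the sigmoid output.

```python
from tensorflow.keras import layers, models

def build_cnn(input_shape=(128, 128, 3)):
    """Three Conv2D + MaxPooling2D blocks followed by a dense classifier head."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),   # filter counts are assumed
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),                                # feature maps -> 1D vector
        layers.Dense(512, activation="relu"),
        layers.Dropout(0.5),                             # drops half the units during training
        layers.Dense(1, activation="sigmoid"),           # probability of the class indexed 1
    ])

model = build_cnn()
model.summary()
```

With these assumed sizes, almost all of the parameters sit in the 512-unit dense layer, so that layer is the first place to look if the model overfits.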
Compiling the model
Adam optimizer (adaptive and efficient).
Binary cross-entropy loss function, suitable for binary classification.
Accuracy metric to evaluate performance.
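In Keras this step corresponds to `model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])`. To make the loss and the metric concrete, here is a small NumPy sketch of what they compute. The labels and predicted probabilities are made up, and the 0.5 threshold is the usual default rather than something stated in the post.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)], with p clipped to avoid log(0)."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return float(-np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob)))

def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    preds = (np.asarray(y_prob) > threshold).astype(int)
    return float(np.mean(preds == np.asarray(y_true)))

# Hypothetical labels and sigmoid outputs for four images.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.1, 0.4, 0.2]

bce = binary_cross_entropy(y_true, y_prob)
acc = binary_accuracy(y_true, y_prob)
print(f"loss={bce:.4f} accuracy={acc:.2f}")
```

The third sample shows why the loss is more informative than accuracy alone: a prediction of 0.4 for a positive image counts as a single miss for accuracy, but it also contributes most of the loss.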
Training callbacks

EarlyStopping: stops training if validation loss does not improve for 5 epochs and restores the best weights.

ModelCheckpoint: saves only the model with the best validation accuracy to best_malaria_model.h5.
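The two callbacks could be configured as in this sketch. The monitored quantities, the patience, and the file name come from the description above; `mode="max"` is an explicit assumption that a higher validation accuracy is better.

```python
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

# Stop when val_loss has not improved for 5 epochs, then roll back to the best weights.
early_stop = EarlyStopping(
    monitor="val_loss",
    patience=5,
    restore_best_weights=True,
)

# Overwrite the file only when val_accuracy reaches a new best.
checkpoint = ModelCheckpoint(
    "best_malaria_model.h5",
    monitor="val_accuracy",
    save_best_only=True,
    mode="max",
)

callbacks = [early_stop, checkpoint]
```

The two callbacks watch different quantities, so the checkpointed model and the restored weights may come from different epochs.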

Training the model
Trains for up to 20 epochs (or fewer if EarlyStopping triggers).

Images are provided by the generators.

Displays training progress information.
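A hedged sketch of the training call follows. The wrapper `train_model` is a hypothetical helper, and the smoke run underneath uses a tiny model on random arrays only so that the snippet executes on its own; in the real script the image generators and the CNN would be passed in instead.

```python
import numpy as np
import tensorflow as tf

def train_model(model, train_data, val_data, callbacks=None, epochs=20):
    """Fit for up to `epochs` epochs; verbose=1 prints a progress bar per epoch."""
    return model.fit(
        train_data,
        validation_data=val_data,
        epochs=epochs,
        callbacks=callbacks or [],
        verbose=1,
    )

# Smoke run on random data (not the malaria images), 2 epochs only.
rng = np.random.default_rng(0)
x = rng.random((16, 8, 8, 3)).astype("float32")
y = rng.integers(0, 2, size=(16, 1)).astype("float32")
train_ds = tf.data.Dataset.from_tensor_slices((x[:12], y[:12])).batch(4)
val_ds = tf.data.Dataset.from_tensor_slices((x[12:], y[12:])).batch(4)

tiny = tf.keras.Sequential([
    tf.keras.Input(shape=(8, 8, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
tiny.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

history = train_model(tiny, train_ds, val_ds, epochs=2)
print(sorted(history.history))
```

The returned `history.history` dictionary holds one value per epoch for each tracked quantity, which is what the plotting step consumes.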

Visualizing training history
Plots accuracy and loss curves for training and validation.

These curves help check whether the model converges and whether it overfits (for example, training accuracy keeps rising while validation loss stops improving or starts to climb).

Saves the plot as an image training_history.png inside the folder img_stats.
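One way to produce that plot is sketched below. The function name `plot_history` is hypothetical, and the numbers in `example_history` are invented to show the expected dictionary shape; they are not results from the post.

```python
import os
import matplotlib
matplotlib.use("Agg")  # headless backend so the script also runs without a display
import matplotlib.pyplot as plt

def plot_history(history_dict, out_dir="img_stats", filename="training_history.png"):
    """Plot accuracy and loss for training and validation, then save the figure."""
    os.makedirs(out_dir, exist_ok=True)
    fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(12, 4))

    ax_acc.plot(history_dict["accuracy"], label="training")
    ax_acc.plot(history_dict["val_accuracy"], label="validation")
    ax_acc.set_title("Accuracy")
    ax_acc.set_xlabel("Epoch")
    ax_acc.legend()

    ax_loss.plot(history_dict["loss"], label="training")
    ax_loss.plot(history_dict["val_loss"], label="validation")
    ax_loss.set_title("Loss")
    ax_loss.set_xlabel("Epoch")
    ax_loss.legend()

    out_path = os.path.join(out_dir, filename)
    fig.savefig(out_path)
    plt.close(fig)
    return out_path

# Invented values, only to illustrate the structure of history.history.
example_history = {
    "accuracy": [0.70, 0.85, 0.90],
    "val_accuracy": [0.72, 0.83, 0.86],
    "loss": [0.60, 0.35, 0.25],
    "val_loss": [0.58, 0.40, 0.38],
}
```

After training, `plot_history(history.history)` writes `img_stats/training_history.png`.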
Final evaluation
Loads the best saved model.
Evaluates its performance on the full validation set.
Displays the final loss and accuracy.
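The final evaluation could be sketched as follows. The helper name `evaluate_best_model` is hypothetical, and the default path assumes the checkpoint file named earlier exists in the working directory.

```python
from tensorflow.keras.models import load_model

def evaluate_best_model(val_data, model_path="best_malaria_model.h5"):
    """Load the checkpointed model and report its metrics on the validation data."""
    best_model = load_model(model_path)
    # return_dict=True gives named results instead of a positional list.
    results = best_model.evaluate(val_data, verbose=0, return_dict=True)
    for name, value in results.items():
        print(f"Validation {name}: {value:.4f}")
    return results
```

In the real script this would be called as `evaluate_best_model(validation_generator)`. Because the validation generator does not shuffle, each image is scored exactly once.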

DEV Community
