Introduction
Mental health issues affect millions of people worldwide, and depression is one of the most common yet underdiagnosed conditions. Traditional diagnosis often requires clinical interviews and questionnaires, which may not always be accessible or objective. With the rise of artificial intelligence, researchers are exploring automated ways to assist in early detection — one promising method is facial expression analysis using deep learning.
This article walks you through a step-by-step deep learning project that detects depression based on facial images. The project uses a public dataset, performs preprocessing, builds a CNN model, trains it, and evaluates its performance.
Dataset Acquisition
We use the Depression Dataset on Facial Expression Images available on Kaggle. This dataset contains facial images labeled as Depressed or Not Depressed.
Downloading the Dataset
The dataset is downloaded using kagglehub:
```python
import kagglehub

# The dataset slug is copied as-is from the original notebook ('ecpression' included)
path = kagglehub.dataset_download("khairunneesa/depression-dataset-on-facial-ecpression-images")
print("Path to dataset files:", path)
```
Reorganizing the Dataset Files
The original dataset has multiple emotion folders (happy, sad, neutral, etc.). Since our task is binary classification, we group them into just two categories:
- Depressed → sad and neutral
- Non-Depressed → all other emotions
The script below loops through the train, val, and test sets, checks each emotion folder, and copies images into the appropriate binary folder. Filenames are prefixed with their original set and label (e.g., train_sad_img1.jpg) to avoid overwriting.
```python
import os
import shutil

depressed_classes = ['sad', 'neutral']
binary_data_path = '/kaggle/working/binary_data'

os.makedirs(os.path.join(binary_data_path, 'depressed'), exist_ok=True)
os.makedirs(os.path.join(binary_data_path, 'non_depressed'), exist_ok=True)

for subdir in ['train', 'val', 'test']:
    subdir_path = os.path.join(path, 'Depression Data', 'data', subdir)
    for emotion_folder in os.listdir(subdir_path):
        emotion_folder_path = os.path.join(subdir_path, emotion_folder)
        if not os.path.isdir(emotion_folder_path):
            continue
        # sad and neutral count as depressed; every other emotion as non-depressed
        if emotion_folder.lower() in depressed_classes:
            target_folder = os.path.join(binary_data_path, 'depressed')
        else:
            target_folder = os.path.join(binary_data_path, 'non_depressed')
        for img_file in os.listdir(emotion_folder_path):
            src_file = os.path.join(emotion_folder_path, img_file)
            # prefix with the original split and emotion to avoid filename collisions
            dst_file = os.path.join(target_folder, f"{subdir}_{emotion_folder}_{img_file}")
            if os.path.isfile(src_file):
                shutil.copy(src_file, dst_file)
```
This ensures the model sees only two classes: depressed and non_depressed.
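Before preprocessing, it is also worth checking how balanced the two classes turned out, since a heavy imbalance can bias a binary classifier toward the majority class. A quick count over the folders created above:

```python
# Count how many images ended up in each binary class folder
for label in ['depressed', 'non_depressed']:
    folder = os.path.join(binary_data_path, label)
    print(f"{label}: {len(os.listdir(folder))} images")
```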
Image Preprocessing
We use Keras’ ImageDataGenerator and flow_from_directory for:
- Rescaling pixel values to [0, 1]
- Splitting the data into training and validation sets (80/20)
- Grayscale conversion (via color_mode='grayscale') to reduce complexity
```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

img_height, img_width = 128, 128
batch_size = 32

datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2)

train_generator = datagen.flow_from_directory(
    binary_data_path,
    target_size=(img_height, img_width),
    color_mode='grayscale',
    batch_size=batch_size,
    class_mode='binary',
    subset='training',
    shuffle=True
)

val_generator = datagen.flow_from_directory(
    binary_data_path,
    target_size=(img_height, img_width),
    color_mode='grayscale',
    batch_size=batch_size,
    class_mode='binary',
    subset='validation',
    shuffle=False
)
```
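One detail worth verifying here: flow_from_directory assigns label indices to the class folders in alphabetical order, so depressed maps to 0 and non_depressed to 1. A small sanity check of the mapping and batch shapes:

```python
# Confirm the folder-to-index mapping, e.g. {'depressed': 0, 'non_depressed': 1}
print(train_generator.class_indices)

# Pull one batch to verify shapes: images (32, 128, 128, 1), labels (32,)
images, labels = next(train_generator)
print(images.shape, labels.shape)
```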
Building the CNN Model
We built a Convolutional Neural Network (CNN) — a type of AI model that’s great at understanding images.
Here’s what each layer does in simple terms:
- Conv2D layers (32, 64, 128 filters): Think of these as smart magnifying glasses that scan the image and detect patterns like edges, shapes, and textures. Each time we add more filters (32 → 64 → 128), the model learns more detailed patterns.
- MaxPooling2D layers: These act like compressors — they shrink the image while keeping the important details, making learning faster and avoiding unnecessary noise.
- Flatten layer: Imagine taking a 3D Lego structure and laying out all the pieces in a single row so the computer can process them easily.
- Dropout layers (0.5 and 0.3): These randomly turn off some “neurons” during training, which helps prevent overfitting (where the model memorizes instead of learning).
- Dense layers: The first Dense layer (128 neurons, ReLU) acts like a decision-making stage, combining all learned patterns.
- The final Dense layer (1 neuron, sigmoid) outputs a score between 0 and 1. Because flow_from_directory orders classes alphabetically (depressed = 0, non_depressed = 1; confirm with train_generator.class_indices), the score reads:
- Close to 0 → Depressed
- Close to 1 → Non-depressed
- Finally, we used the Adam optimizer (for efficient learning) and binary cross-entropy loss (since this is a two-class problem).
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(img_height, img_width, 1)),
    MaxPooling2D(2, 2),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D(2, 2),
    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D(2, 2),
    Flatten(),
    Dropout(0.5),
    Dense(128, activation='relu'),
    Dropout(0.3),
    Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
```
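As a quick sanity check that the input and output shapes line up, you can push a dummy grayscale batch through the untrained model (a throwaway sketch, not part of the training pipeline):

```python
import numpy as np

# A single all-zero grayscale image with a batch dimension
dummy = np.zeros((1, img_height, img_width, 1), dtype='float32')

# Expected output shape: (1, 1), one sigmoid score per image
print(model.predict(dummy).shape)
```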
Training the Model
We train for 10 epochs and track validation accuracy.
```python
epochs = 10
history = model.fit(train_generator, epochs=epochs, validation_data=val_generator)
```
- We tell the model to go through the entire training dataset 10 times.
- One full pass over the dataset = 1 epoch.
- More epochs usually help the model learn better, but too many can make it memorize the training data (overfitting).
- Too few epochs cause the opposite problem, underfitting, where the model has not yet learned the underlying patterns (one common safeguard, early stopping, is sketched below).
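This project trains for a fixed 10 epochs, but if you want training to stop automatically once the model stops improving, Keras provides the EarlyStopping callback. A hedged sketch (the patience value of 3 is illustrative, not tuned):

```python
from tensorflow.keras.callbacks import EarlyStopping

# Stop when validation loss has not improved for 3 consecutive epochs,
# and restore the weights from the best epoch seen so far
early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

history = model.fit(
    train_generator,
    epochs=epochs,
    validation_data=val_generator,
    callbacks=[early_stop]
)
```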
After training, we plot the training vs. validation accuracy and loss curves:
```python
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 4))

# Plot training & validation accuracy values
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')

# Plot training & validation loss values
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')

plt.show()
```
Accuracy Curve:
The training and validation accuracy steadily increased, reaching nearly 80% by the final epoch. This tells us the model was learning to distinguish between depressed and non-depressed expressions effectively.
Loss Curve:
Both training and validation loss decreased consistently, which means the model was getting better at making correct predictions. Importantly, the two curves stayed close together, suggesting the model was generalizing well rather than just memorizing the training data.
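To attach a single number to these curves, you can also evaluate directly on the validation generator (a quick check, not in the original notebook):

```python
# Evaluate on the held-out validation split
val_loss, val_acc = model.evaluate(val_generator)
print(f"Validation accuracy: {val_acc:.2%}, loss: {val_loss:.4f}")
```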
Predicting on a User-Uploaded Image
```python
import numpy as np
from tensorflow.keras.preprocessing import image
from google.colab import files
from IPython.display import display, Image

def predict_uploaded_image(model, img_height, img_width):
    uploaded = files.upload()
    for filename in uploaded.keys():
        print(f'User uploaded file "{filename}"')
        try:
            # Display the uploaded image
            display(Image(filename))

            # Load and preprocess the image exactly as the training data was
            img = image.load_img(filename, target_size=(img_height, img_width), color_mode='grayscale')
            img_array = image.img_to_array(img)
            img_array = np.expand_dims(img_array, axis=0)  # Add batch dimension
            img_array /= 255.0  # Rescale to [0, 1]

            # Make prediction (a single sigmoid score)
            prediction = model.predict(img_array)

            # Map the score back to a class name using the generator's label mapping
            # (flow_from_directory assigns indices alphabetically: depressed=0, non_depressed=1)
            index_to_class = {v: k for k, v in train_generator.class_indices.items()}
            predicted_class = index_to_class[int(prediction[0][0] > 0.5)]

            print(f"\nThe model predicts the uploaded image is: {predicted_class}")
            print(f"Prediction score: {prediction[0][0]}")
        except Exception as e:
            print(f"Error processing file {filename}: {e}")

# Call the function to allow the user to upload an image and get a prediction
predict_uploaded_image(model, img_height, img_width)
```
In the example run above, the model classified the uploaded image as depressed with roughly 88% confidence.
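If you want to reuse the trained model without retraining, you can save and reload it (a minimal sketch; the filename is arbitrary):

```python
from tensorflow.keras.models import load_model

# Save the trained model to disk, then reload it later for inference
# (on older TensorFlow versions, use an .h5 filename instead of .keras)
model.save('depression_cnn.keras')
restored_model = load_model('depression_cnn.keras')
```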
Conclusion
Building a Convolutional Neural Network (CNN) for depression detection using facial images gave us a solid glimpse into how deep learning can support mental health research. Starting from data preprocessing, we built a CNN step by step with convolution, pooling, and dropout layers — each playing a role in teaching the model how to “see” and understand subtle patterns in faces.
Training the model for 10 epochs showed encouraging results: accuracy climbed close to 80%, and the loss steadily decreased for both training and validation sets.
Of course, this is just a starting point. Real-world applications in healthcare would require:
- Larger and more diverse datasets (different lighting, ages, ethnicities, real-life conditions).
- Model optimization (tuning hyperparameters, experimenting with more layers or transfer learning).
- Ethical considerations — ensuring fairness, privacy, and responsible use of AI in sensitive domains.
Still, the project demonstrates how machine learning can move beyond traditional applications and contribute meaningfully to mental health research. With refinement, such systems may one day support professionals in early detection and awareness, making interventions more timely and accessible.
*(PS: This is my first post here, so please be kind!)*