In computer vision today, the default move is often transfer learning: taking a large pretrained model like ResNet50 and fine-tuning it. While effective, this approach hides the fundamental mechanics of how a network actually "sees" texture.
For my latest project, I decided to build a Convolutional Neural Network (CNN) entirely from scratch using PyTorch. My goal? To build a binary classifier capable of distinguishing between hair textures (e.g., Curly vs. Straight) using the Kaggle Hair Type dataset.
Here is a look under the hood of the architecture and the engineering decisions I made.
1. The Data Pipeline: Why Augmentation Matters
The input images were standardized to $200 \times 200$ pixels. However, training a model from scratch on a small dataset carries a high risk of overfitting, where the model memorizes the training images rather than learning features that generalize.
To combat this, I engineered a robust training pipeline using torchvision.transforms.
Instead of feeding the model static images, I applied dynamic transformations:
- Random Rotations (50°): To handle different head tilts.
- Random Resized Crop: To force the model to look at different scales of the hair strands.
- Horizontal Flips: To ensure directional invariance.
Crucially, I kept the Test Set deterministic (only resizing and normalizing) to ensure I had a stable benchmark for evaluation.
2. The Architecture
I opted for a lightweight, shallow architecture to test how much information could be extracted with minimal compute.
The Stack:
- Input: (3, 200, 200)
- Feature Extraction: A generic convolutional layer (32 filters, $3\times3$ kernel) followed by ReLU activation and $2\times2$ Max Pooling.
- Dimensionality Reduction: A Flatten layer converting the 32 feature maps of size $99\times99$ into a single vector of $32 \times 99 \times 99 = 313{,}632$ features.
- Classification Head: A dense hidden layer (64 neurons) leading to a single output neuron.
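The size of the flattened vector can be checked with a few lines of arithmetic. This sketch assumes a stride of 1 and no padding for the convolution, and a stride of 2 for the pooling, which is what the figure of roughly 313,000 features implies. The helper name `conv2d_out` is hypothetical.

```python
def conv2d_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial size after a convolution or pooling window (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1

side = 200
side = conv2d_out(side, kernel=3)             # 3x3 conv, stride 1, no padding -> 198
side = conv2d_out(side, kernel=2, stride=2)   # 2x2 max pool, stride 2 -> 99
flat_features = 32 * side * side              # 32 channels, each 99 x 99

print(side, flat_features)  # 99 313632
```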
3. The "Binary" Nuance
Since I designed this as a binary classifier, the output layer and loss function had to be paired perfectly.
I used a Sigmoid activation on the final neuron to squash the output into the range 0 to 1, so it can be read as a probability. I paired this with Binary Cross Entropy Loss (BCELoss), which expects probabilities as input, rather than the standard Cross Entropy loss used in multi-class problems.
# The Classification Head
self.fc1 = nn.Linear(32 * 99 * 99, 64)
self.fc2 = nn.Linear(64, 1)
self.sigmoid = nn.Sigmoid()
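For context, here is a minimal sketch of how these layers might sit inside a complete module and be paired with BCELoss. The class name `HairTypeCNN`, the ReLU on the hidden layer, and the dummy batch are assumptions for illustration, not necessarily the exact code behind this project.

```python
import torch
import torch.nn as nn

class HairTypeCNN(nn.Module):  # hypothetical class name
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3)  # (3, 200, 200) -> (32, 198, 198)
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool2d(2)                   # -> (32, 99, 99)
        self.flatten = nn.Flatten()                   # -> 32 * 99 * 99 features
        self.fc1 = nn.Linear(32 * 99 * 99, 64)
        self.fc2 = nn.Linear(64, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.pool(self.relu(self.conv1(x)))
        x = self.flatten(x)
        x = self.relu(self.fc1(x))        # hidden-layer ReLU is an assumption
        return self.sigmoid(self.fc2(x))  # probability in [0, 1]

model = HairTypeCNN()
criterion = nn.BCELoss()

# Dummy batch: BCELoss needs float targets with the same shape as the output.
images = torch.randn(4, 3, 200, 200)
labels = torch.tensor([0.0, 1.0, 1.0, 0.0]).unsqueeze(1)  # shape (4, 1)

probs = model(images)
loss = criterion(probs, labels)
```

An alternative pairing is to drop the Sigmoid layer and use `nn.BCEWithLogitsLoss`, which applies the sigmoid internally in a numerically more stable way.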
4. Training for Reproducibility
One of the biggest challenges in ML engineering is reproducibility. To ensure my results weren't just a fluke of random initialization, I strictly seeded the environment:
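import numpy as np
import torch

SEED = 42
np.random.seed(SEED)                        # seed NumPy's RNG
torch.manual_seed(SEED)                     # seed PyTorch's RNG
torch.backends.cudnn.deterministic = True   # request deterministic cuDNN algorithms on GPU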
I used Stochastic Gradient Descent (SGD) with a learning rate of 0.002 and momentum of 0.8. I tracked the Median Training Accuracy across epochs to filter out noise and the Mean Test Loss to monitor generalization.
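The optimizer settings and the two summary metrics can be sketched as below. To keep the example fast and self-contained, it uses a tiny stand-in model, synthetic data, one full-batch update per epoch, and a 0.5 decision threshold. All of those are assumptions for illustration and not the real training setup.

```python
import statistics
import torch
import torch.nn as nn

torch.manual_seed(42)

# Tiny stand-in model and synthetic data so the sketch runs in milliseconds.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 1), nn.Sigmoid())
train_x, train_y = torch.randn(32, 3, 8, 8), torch.randint(0, 2, (32, 1)).float()
test_x, test_y = torch.randn(16, 3, 8, 8), torch.randint(0, 2, (16, 1)).float()

criterion = nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.002, momentum=0.8)

train_acc_history, test_loss_history = [], []
for epoch in range(5):
    # Training step (a single full-batch update per epoch in this sketch).
    model.train()
    optimizer.zero_grad()
    probs = model(train_x)
    loss = criterion(probs, train_y)
    loss.backward()
    optimizer.step()

    # Accuracy with an assumed 0.5 threshold on the predicted probability.
    preds = (probs >= 0.5).float()
    train_acc_history.append((preds == train_y).float().mean().item())

    # Evaluation on held-out data, without tracking gradients.
    model.eval()
    with torch.no_grad():
        test_loss_history.append(criterion(model(test_x), test_y).item())

# Summary metrics across epochs.
median_train_acc = statistics.median(train_acc_history)
mean_test_loss = statistics.mean(test_loss_history)
```

The median is less sensitive than the mean to a single unusually good or bad epoch, which is why it suits a noisy accuracy curve.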
Key Takeaways
Building this from scratch reinforced several core Deep Learning concepts:
- Input math is critical: Calculating the exact feature map size after convolution and pooling is necessary to line up the Linear layers.
- Data is king: The model performance improved significantly after introducing the RandomResizedCrop augmentation.
- Simplicity works: You don't always need a Transformer. For distinct textural differences, a simple CNN is fast, lightweight, and effective.
#MachineLearning #PyTorch #ComputerVision #DeepLearning #DataScience #CNN
