I'm a sixth-semester CS student at COMSATS University Islamabad. Over the past few months I've been doing deep learning research alongside my coursework, and we've submitted two papers to IEEE Access. Here's what we built and what I learned.
Paper 1 — IoT Intrusion Detection
IoT devices get attacked constantly. The problem with most deep learning IDS research is that models are validated on one dataset and never tested anywhere else — so you don't actually know if they generalize.
We took the CNN-DNN architecture from Nazari et al. (validated on Kitsune) and re-evaluated it on CICIDS-2017: 2.8 million network flow records, 12 attack classes. It didn't just hold up, it improved, going from 98.47% to 99.50% accuracy. That's the cross-dataset evidence of generalization the original paper left open.
We also benchmarked a Transformer. Theoretically appealing for traffic classification — self-attention should model complex feature dependencies well. In practice on tabular network flow data, it landed at 96.98% and struggled on specific classes. Not bad, but behind CNN-DNN, and consistent with what the literature says about vanilla transformers on tabular data.
The finding I'm most proud of: a Lightweight model with only 17,772 parameters hit 97.55% accuracy — 21× fewer parameters than CNN-DNN — without needing GPU acceleration. That's actually deployable on constrained IoT edge hardware.
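SMOTE applied only to the training set took CNN-DNN from 92.35% to 99.50%. Never touch the test set with synthetic samples: if you oversample before splitting, points interpolated from test rows end up in the training data and the reported accuracy is inflated.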
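Here's a minimal sketch of the "split first, oversample second" order, using only the standard library. It isn't the code from the paper: `smote_like` and `split_then_oversample` are hypothetical helpers, the split is a plain random one, and the example assumes a binary problem with at least two minority rows in the training part. In a real pipeline you would more likely reach for an established SMOTE implementation.

```python
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (the core SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def split_then_oversample(X, y, minority_label, test_fraction=0.25, seed=0):
    """Split first, then balance ONLY the training part (binary labels assumed)."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]  # real rows only, never resampled
    y_test = [y[i] for i in test_idx]

    minority = [x for x, label in zip(X_train, y_train) if label == minority_label]
    n_new = max(0, (len(y_train) - len(minority)) - len(minority))
    if len(minority) >= 2 and n_new:
        X_train += smote_like(minority, n_new, seed=seed)
        y_train += [minority_label] * n_new
    return X_train, y_train, X_test, y_test
```

The point of the ordering is that every row in `X_test` is an original record, so the test metrics describe real traffic rather than interpolated points.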
Paper 2 — Breast Cancer Grading
Binary benign/malignant histopathology classification is a mostly solved problem — models hit 98%+ routinely. But a binary label isn't clinically useful. A surgeon needs to know if it's ductal carcinoma or lobular carcinoma because the treatment differs.
We extended the CBAM-VGGNet framework from Ijaz et al. (which did binary classification and explicitly listed multi-class grading as future work) to 8-class subtype grading on the BreakHis dataset. We replaced VGGNet with ResNet50V2, evaluated on all four magnification factors combined rather than on individual subsets, and added GradCAM explainability.
Final result: 82.81% on the 8-class task, 96.27% binary accuracy. The GradCAM heatmaps attend to diagnostically relevant regions: nuclear pleomorphism for ductal carcinoma, organized acinar patterns for adenosis. That alignment with pathological criteria is what makes a model credible in a clinical setting, not just accurate.
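For readers who haven't seen how a GradCAM heatmap is put together, here's a small sketch of the standard computation: average the class-score gradient over each feature map to get a channel weight, take the weighted sum of the maps, and pass it through a ReLU. It assumes the activations and gradients of the last convolutional layer have already been pulled out of the network as nested lists; the function name `grad_cam` and the final scaling to [0, 1] are illustrative choices, not the paper's implementation.

```python
def grad_cam(activations, gradients):
    """activations, gradients: K feature maps, each an H x W nested list.
    Returns an H x W heatmap scaled to [0, 1]."""
    n_rows, n_cols = len(activations[0]), len(activations[0][0])
    heatmap = [[0.0] * n_cols for _ in range(n_rows)]
    for fmap, grad in zip(activations, gradients):
        # Channel weight: global average of the class-score gradient on this map.
        weight = sum(sum(row) for row in grad) / (n_rows * n_cols)
        for i in range(n_rows):
            for j in range(n_cols):
                heatmap[i][j] += weight * fmap[i][j]
    # ReLU keeps only the regions that push the class score up.
    heatmap = [[max(0.0, v) for v in row] for row in heatmap]
    peak = max(max(row) for row in heatmap)
    if peak > 0:
        heatmap = [[v / peak for v in row] for row in heatmap]
    return heatmap


# Toy 2 x 2 example with two channels (made-up numbers, not BreakHis data).
acts = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
grads = [[[1, 1], [1, 1]], [[-1, -1], [-1, -1]]]
print(grad_cam(acts, grads))  # [[1.0, 0.0], [0.0, 0.0]]
```

In the toy example the second channel has a negative weight, so the location it fires on is suppressed by the ReLU and only the top-left cell lights up.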
Lobular carcinoma (F1: 0.67) and papillary carcinoma (F1: 0.74) were the hardest — morphological overlap with other malignant subtypes is a real challenge, not just a model failure.
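Per-class F1 is what exposes those weak subtypes, since overall accuracy averages them away. The sketch below computes it from scratch as F1 = 2TP / (2TP + FP + FN); `per_class_f1` is a hypothetical helper and the labels are toy data, not our BreakHis predictions.

```python
def per_class_f1(y_true, y_pred):
    """Return {label: F1} using F1 = 2*TP / (2*TP + FP + FN) for each label."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A label with no true or predicted samples gets 0.0 by convention here.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Toy labels: 4 ductal and 2 lobular samples, with one mistake in each direction.
y_true = ["ductal"] * 4 + ["lobular"] * 2
y_pred = ["ductal", "ductal", "ductal", "lobular", "lobular", "ductal"]
print(per_class_f1(y_true, y_pred))  # {'ductal': 0.75, 'lobular': 0.5}
```

Accuracy on this toy set is 4/6, yet the rarer class sits at 0.5: with only two real samples, a single error moves its F1 a long way.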
What doing research as an undergrad actually taught me
Honest reporting matters more than clean numbers. Both papers include a limitations section that explicitly calls out where the models fail and why. Web Attack XSS got near-zero F1 across all three models in Paper 1 — we reported it, explained the structural reason (26 real test samples), and didn't hide it.
Two-phase fine-tuning in Paper 2 (frozen backbone first, then selective unfreezing) is not just a trick. Without it, the large early gradients from a randomly initialized head flow back into the pretrained backbone, and on a small medical dataset they can overwrite useful features before the head stabilizes.
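The schedule itself is easy to express without any framework. The sketch below only models the per-layer `trainable` flags, which is the part that decides what gets updated in each phase; the `Layer` class, the function names, and the choice to keep batch-norm layers frozen in phase two are assumptions for illustration, not details taken from the paper. In a real setup you would set the equivalent flags on the framework's layer objects and typically lower the learning rate for the second phase.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    trainable: bool = True
    is_batch_norm: bool = False


def phase_one(backbone, head):
    """Phase 1: freeze the whole pretrained backbone; only the new head learns."""
    for layer in backbone:
        layer.trainable = False
    for layer in head:
        layer.trainable = True


def phase_two(backbone, n_unfreeze):
    """Phase 2: unfreeze the last n_unfreeze backbone layers.
    Batch-norm layers stay frozen here (an assumption, a common practice)."""
    start = max(0, len(backbone) - n_unfreeze)
    for layer in backbone[start:]:
        if not layer.is_batch_norm:
            layer.trainable = True


# Hypothetical five-layer backbone and one-layer head.
backbone = [
    Layer("conv1"), Layer("bn1", is_batch_norm=True),
    Layer("conv2"), Layer("bn2", is_batch_norm=True),
    Layer("conv3"),
]
head = [Layer("dense")]

phase_one(backbone, head)       # train the head until it stabilizes
phase_two(backbone, 2)          # then open up the top of the backbone
print([layer.name for layer in backbone if layer.trainable])  # ['conv3']
```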
And SMOTE is not magic. It helped massively, but it can't compensate for insufficient real test samples. Know what it can and can't do.
Both papers are under review at IEEE Access. I'll update this post when they publish.
I'm Ahmad Mustafa, a Full Stack Developer and deep learning researcher based in Islamabad, Pakistan. I build AI-powered products and publish work in IoT security and medical imaging.
🔗 Research Repository: https://github.com/ahmadmustafa02/iot-malware-detection-research
📄 Paper 1 — IoT Intrusion Detection: Under review · IEEE Access
📄 Paper 2 — Breast Cancer Grading: Under review · IEEE Access
Website: https://ahmadmustafa.me/
LinkedIn: https://www.linkedin.com/in/ahmadmustafa01/